Characterizing synonymous variants
by leveraging gene expression and GWAS datasets
by
Tzu Yu Huang
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
December 2022
Copyright 2022 Tzu Yu Huang
TABLE OF CONTENTS
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Methods
1. Common coding variants in the human genome.
2. Fine-mapped variants impacting human complex traits
2.a Introduction to fine-mapping
2.b Fine-mapped eQTLs from the Genotype-Tissue Expression (GTEx)
2.c Fine-mapped disease and complex traits variants from the UK Biobank
3. Predicting pathogenicity of synonymous variants
3.a dRSCU
3.b SilVA
3.c TraP
3.d SynVep
3.e Other methods predicting variant pathogenicity
4. Variants implicated in Mendelian disorders.
Chapter 3: Results
1. Quantifying the impact of synonymous (and non-synonymous) variants on molecular and phenotypic traits
1.a Quantifying the impact of coding variants on gene regulation (GTEx)
1.b Quantifying the impact of coding variants on human diseases and complex traits (UK Biobank)
1.c Quantifying the impact of coding variants on predicted impact on expression (Enformer) and constraint scores (Zoonomia)
2. Evaluating synonymous scores in the context of molecular and phenotypic traits
2.a Evaluating synonymous scores in the context of gene expression
2.b Evaluating synonymous scores in the context of human diseases and complex traits
Chapter 4: Conclusion
References
List of Tables
Table 1: Numbers and enrichment of coding variants within common fine-mapped eQTLs of GTEx
Table 2: Numbers and enrichment of coding variants within common fine-mapped variants of UKBB
Table 3: Top Enformer and constraint scores of common synonymous and non-synonymous variants
List of Figures
Figure 1: GTEx enrichment: CAVIAR and SuSie method
Figure 2: MAF and SuSie estimates of squared effect sizes for fine-mapped eQTLs of GTEx
Figure 3: UKBB enrichment: SuSie and PolyFun method
Figure 4: MAF and PolyFun estimates of squared effect sizes for fine-mapped variants of UKBB
Figure 5: Distribution of Enformer and constraint scores in mammals, syn and non-syn variants
Figure 6: Predictors enrichments for causal fine-mapped eQTLs of GTEx
Figure 7: Predictors enrichments for causal fine-mapped sSNVs of UKBB
Figure 8: Predictors enrichments for pathogenic sSNVs of ClinVar
Abstract
Synonymous coding variants are usually regarded as neutral, and thus neglected in clinical genetic studies.
However, recent studies highlighted a few examples where they can play a crucial role in fine-tuning gene
expression, protein function and even contribute to human pathologies through a variety of mechanisms.
Computational tools have been developed to predict the pathogenicity effect of synonymous variants, but
their performance remains unclear. Here, we leveraged large fine-mapping studies of molecular and
phenotypic traits to characterize the regulatory and deleterious effects of common synonymous variants.
First, we estimated the enrichment of synonymous (and non-synonymous) coding variants within fine-
mapped eQTLs from GTEx and fine-mapped causal variants from UK Biobank. We observed similar
enrichment for synonymous and non-synonymous variants impacting gene expression, which are under
similar patterns of negative selection. In contrast, we found that enrichments were significantly higher for
non-synonymous variants on human disease and complex traits compared to synonymous counterparts.
Second, we leveraged fine-mapped synonymous variants in GTEx and UK Biobank datasets to evaluate the
predictive power of various predictors. Our results indicate that the impact of common synonymous and
non-synonymous variants on gene expression and human complex traits differ substantially, but that they
could be leveraged to predict the effect of rare synonymous variants on disease risk.
Chapter 1: Introduction
The degeneracy of the genetic code allows most amino acids to be encoded by several different codons
(from 1 to 6 codons per amino acid). As a consequence, synonymous mutations, also called synonymous
single nucleotide variants (sSNVs), substitute a nucleotide in a protein-coding sequence without changing
the amino acid sequence of the resulting protein. Such variants have long been referred to as ‘silent’,
i.e., without any protein- or phenotype-altering effect. Historically, sSNVs have been considered neutral,
i.e., not affected by natural selection, and resulting from random genetic drift (Kimura 1977). However,
if synonymous substitutions were strictly neutral, codons encoding the same amino acid (referred
to as synonymous codons) would be randomly and evenly distributed along genes. This is not the
case, as synonymous codons are not used at equal frequencies in protein-coding regions, a phenomenon
called codon usage bias. The non-random use of synonymous codons is now recognized as a crucial process
involved in fine-tuning of gene expression and protein function (Plotkin and Kudla 2011). Consequently,
sSNVs can influence protein biogenesis and phenotype and, thus, contribute to human pathologies (Sauna
and Kimchi-Sarfaty 2011). Their clinical impact can result from different types of molecular alterations
including aberrant pre-mRNA splicing (Chamary, Parmley, and Hurst 2006; Macaya et al. 2009), alteration
of miRNA binding sites (Brest et al. 2011), mRNA folding (Bartoszewski et al. 2010; Park et al. 2013),
and translation dynamics. Overall, synonymous variants have now been implicated in over 50 genetic
disorders (Chamary, Parmley, and Hurst 2006), including congenital heart disease (Dixit, Kumar, and
Mohapatra 2019), cancer (Supek et al. 2014; Diederichs et al. 2016; Kandoth et al. 2013; Gotea et al. 2015),
neurodevelopmental disorders (Kim et al. 2020) and psychiatric disease (Takata et al. 2016; Reitz et al.
2018).
A key challenge in clinical genetics is to identify disease-related variants among a large background
of neutral, non-pathogenic polymorphisms (Bamshad et al. 2011; MacArthur et al. 2014). Despite their
proven clinical relevance, sSNVs are still largely overlooked in clinical studies and are often discarded as
functionally irrelevant, unless directly associated with splicing defects. The under-prioritization of sSNVs
is mainly due to the lack of reliable computational tools and to the sheer number of such variants present
in every individual’s genome. While a great number of widely used algorithms exist to infer the deleterious
effect of non-synonymous variants (Liu, Jian, and Boerwinkle 2011, 2013), only very few computational
approaches are available for discovery of pathogenic sSNVs. Their development is hindered by: (1) the
extremely low number of high-confidence pathogenic sSNVs reported so far in the literature; and (2) by
the limited available experimental data investigating sSNVs effects, which could be used for training and
validation of such methods. Indeed, to our knowledge, fewer than a hundred pathogenic sSNVs
with known biological effects and experimental validation have been reported. For example, sSNV-dedicated
tools like SilVA (Buske et al. 2015) and TraP (Gelfman et al. 2017) employ machine-learning approaches to
discover pathogenic sSNVs, but their models are trained on custom-curated datasets of only 33 and 75
disease-causing sSNVs, respectively. To circumvent these limitations, some tools use the minor allele
frequency (MAF) of sSNVs as a proxy for their deleterious effect (SilVA) or employ holistic machine learning
methods with simulated datasets to determine the putative effect of each possible sSNV in the human genome
(synVep; Zeng, Aptekmann, and Bromberg 2021). These methods also present drawbacks: several studies
suggest that population frequency of a variant is poorly correlated with its effect (Zeng and Bromberg 2019)
while tools trained on simulated datasets do not necessarily reflect biological reality.
While very few pathogenic sSNVs have been reported in clinical genetics, Genome Wide
Association Studies (GWAS) identify a large number of SNPs presenting significant association with
phenotypic traits. Downstream fine-mapping of GWAS-significant SNPs can pinpoint causal variants
with a high probability of having functional effects (Weissbrod et al. 2020; Spain and Barrett 2015).
Additionally, transcriptomic resources like GTEx provide a large resource of expression quantitative trait
loci (eQTLs), i.e., variants with an impact on gene expression (GTEx Consortium, 2013). The results of
such fine-mapping and transcriptomic studies could provide a large resource of sSNVs impacting,
respectively, phenotypic and molecular traits. Such a resource could be used to address the limitations of
existing algorithms for predicting pathogenic sSNVs.
In this study, we hypothesize that synonymous variants impacting phenotypic and molecular traits
can be leveraged to train novel algorithms for assessing variant pathogenicity. First, we quantified
the impact of sSNVs on molecular and phenotypic traits, using eQTLs from GTEx and fine-mapped causal
variants from UK Biobank, respectively. Second, we used these two datasets to evaluate the predictive
power of various predictors of sSNVs pathogenicity, as well as the predictive power of gold-standard
pathogenicity predictors of coding and non-coding bases. We additionally evaluated the power of these
scores in Mendelian disorders, by leveraging the reported pathogenic sSNVs of the ClinVar dataset
(Landrum et al. 2014). Our results indicate that common synonymous variants impacting human complex
traits and gene expression could be leveraged to predict the effect of rare synonymous variants on disease
risk.
Chapter 2: Methods
1. Common coding variants in the human genome.
We restricted our analyses to variants that are common (minor allele frequency (MAF) ≥ 5%) in the
European samples of the 1000 Genomes (1000G) project (1000 Genomes Project Consortium et al. 2015). We
focused on common variants because they provide greater statistical power in expression quantitative trait
loci (eQTL) studies and genome-wide association studies (GWAS) (see below), and on Europeans
because eQTL and GWAS datasets have larger sample sizes in European populations.
We annotated 85,095 common coding variants out of the 5,961,159 common variants in 1000G
using Annovar (K. Wang, Li, and Hakonarson 2010). Of these, 18,469 variants were defined as synonymous,
16,328 as non-synonymous (i.e., missense, stop gain, or stop loss), and 50,298 as
outside of the exons or with unknown function.
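As a minimal sketch of the frequency filter described above, the snippet below computes the MAF of a biallelic variant from its alternate-allele frequency and keeps variants with MAF ≥ 5%. The function names and the toy frequencies are illustrative assumptions, not part of the actual pipeline; the last line only checks that the three reported annotation counts sum to the reported total.

```python
def minor_allele_frequency(alt_allele_freq):
    """MAF is the frequency of the less common allele at a biallelic site."""
    return min(alt_allele_freq, 1.0 - alt_allele_freq)


def is_common(alt_allele_freq, threshold=0.05):
    """Apply a MAF >= threshold filter (5% in this study)."""
    return minor_allele_frequency(alt_allele_freq) >= threshold


# Hypothetical alternate-allele frequencies, for illustration only.
toy_freqs = {"var_a": 0.02, "var_b": 0.30, "var_c": 0.97, "var_d": 0.90}
common = [name for name, freq in toy_freqs.items() if is_common(freq)]

# Consistency check on the annotation counts reported in the text.
assert 18_469 + 16_328 + 50_298 == 85_095
```

Note that a variant whose alternate allele is the major one (such as `var_d` here) is still common, because the filter applies to the rarer of the two alleles.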
2. Fine-mapped variants impacting human complex traits
2.a Introduction to fine-mapping
In GWAS, we often see nearby associated variants that all pass the genome-wide significance threshold;
most of these variants are not truly causal, but are associated because of linkage disequilibrium (LD)
between causal and non-causal variants. The purpose of statistical fine-mapping is to use GWAS data and
LD structure to disentangle these associations and identify the causal variants underlying the
significant associations. Fine-mapping can help researchers find causal variants
and their target genes, understand gene mechanisms and genetic architecture, and design better experiments
and more accurate prediction models. Two common outputs of fine-mapping are the posterior inclusion
probability (PIP) of individual variants (i.e., the posterior probability that the variant is causal) and
credible sets (i.e., sets of variants that contain a causal variant with a given confidence, e.g., 95%).
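To make the two outputs concrete, here is a minimal sketch that builds a credible set from a table of PIPs and selects variants above a PIP threshold. It assumes a single causal variant in the locus, so that PIPs sum to one; methods such as CAVIAR, SuSie, and PolyFun construct their credible sets differently. The variant IDs and PIP values are hypothetical.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of variants, taken in decreasing PIP order, whose PIPs sum to >= coverage.

    Assumes one causal variant per locus (PIPs sum to 1); this is a simplification.
    """
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        total += pip
        if total >= coverage:
            break
    return chosen


def causal_at(pips, threshold):
    """Variants treated as causal at a given PIP threshold."""
    return {variant for variant, pip in pips.items() if pip >= threshold}


# Hypothetical PIPs for four variants in one locus.
toy_pips = {"rs1": 0.60, "rs2": 0.30, "rs3": 0.07, "rs4": 0.03}
print(credible_set(toy_pips))     # ['rs1', 'rs2', 'rs3']
print(causal_at(toy_pips, 0.50))  # {'rs1'}
```

Raising the PIP threshold trades the number of retained variants for confidence that each retained variant is causal.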
In this study, we used the publicly available outputs of CAVIAR (Hormozdiari et al. 2014),
SuSie (G. Wang et al., n.d.), and PolyFun (Weissbrod et al. 2020) (see below). In the analyses that follow,
we considered variants with a PIP ≥ 0.05, 0.50, or 0.95 as causal.
2.b Fine-mapped eQTLs from the Genotype-Tissue Expression (GTEx)
Version 8 of the Genotype-Tissue Expression (GTEx) project comprises 15,201 RNA-sequencing samples from
49 tissues of 838 postmortem donors. Whole-genome sequencing (WGS)
performed on each donor detected a total of 43,066,422 single-nucleotide variants after phasing and quality
control. The GTEx project mapped genetic loci that affect the expression of protein-coding and long
intergenic noncoding RNA (lincRNA) genes, defined as eQTLs; local and distal genetic effects are
characterized as cis-eQTLs and trans-eQTLs, respectively. We used fine-mapped eQTLs obtained using
CAVIAR (Hormozdiari et al. 2014), and SuSie (unpublished data obtained from collaboration with the
Finucane laboratory).
2.c Fine-mapped disease and complex traits variants from the UK Biobank
Started in 2006, the UK Biobank is a large prospective cohort study containing deep genetic and phenotypic
data from close to 500,000 individuals, aged between 40 and 69 years and living in the UK. A large
amount of phenotypic and medical data has been collected, including biological measurements, biomarkers in
urine and blood, imaging of the brain and other body parts, and genome-wide genotype data from a
purpose-designed genotyping array. The database is anonymized and accessible to researchers around the globe
for statistical analyses aimed at better understanding human biology. We used fine-mapped GWAS
loci obtained using SuSie and PolyFun across 49 human diseases and complex traits (Weissbrod et al. 2020).
3. Predicting pathogenicity of synonymous variants
3.a dRSCU
Codons encoding the same amino acid (referred to as synonymous codons) are not used at equal frequencies
in protein coding regions, a phenomenon called codon usage bias (Quax et al. 2015). It has been previously
shown that sSNVs introducing substantial changes in codon frequencies (i.e., substitution of a very frequent
codon by a very rare one and vice versa) can lead to changes in protein expression and pathogenic
consequences (Kimchi-Sarfaty et al. 2007; Hunt et al. 2019). The relative synonymous codon usage
(RSCU) is a relative measure of the frequency at which one given codon is used in protein-coding regions
(Sharp, Tuohy, and Mosurski 1986). For a given sSNV, RSCU can be used to estimate the difference in
frequencies of the wild-type and the mutated codon, called dRSCU.
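The computation can be sketched as follows. RSCU is taken here as the observed count of a codon divided by the mean count across its synonymous family, following the usual definition; the sign convention for dRSCU (mutated minus wild-type) and the codon counts are assumptions made for illustration.

```python
def rscu(codon_counts, synonymous_family):
    """RSCU of each codon in one synonymous family: observed count / mean count in the family."""
    total = sum(codon_counts[codon] for codon in synonymous_family)
    mean_count = total / len(synonymous_family)
    return {codon: codon_counts[codon] / mean_count for codon in synonymous_family}


def drscu(rscu_values, wild_type_codon, mutated_codon):
    """Change in RSCU caused by a synonymous substitution (assumed sign: mutated - wild type)."""
    return rscu_values[mutated_codon] - rscu_values[wild_type_codon]


# Hypothetical counts for the two lysine codons.
toy_counts = {"AAA": 40, "AAG": 60}
lysine_rscu = rscu(toy_counts, ["AAA", "AAG"])
change = drscu(lysine_rscu, wild_type_codon="AAG", mutated_codon="AAA")
```

With these toy counts, replacing the more frequent codon AAG with AAA gives a negative dRSCU (about -0.4), i.e., a shift toward a less used codon.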
3.b SilVA
The Silent Variant Analyzer (SilVA) is a tool for predicting the pathogenicity of sSNVs within the human
genome (Buske et al. 2015). SilVA uses a random forest model trained on a dataset of 33 rare (MAF < 5%)
disease-causing synonymous mutations selected according to developer-defined criteria (Buske et al. 2015).
As negative controls for training and benchmarking, all 758 rare synonymous variants (MAF < 5%) from
one individual of the 1000 Genomes Project (NA10851) were used. The features of the model include numerous
characteristics of a coding variant, including pre-mRNA folding free energy, codon usage, GERP++
conservation (Davydov et al. 2010), splice sites, and exonic splicing enhancer and suppressor (ESE/ESS) motifs.
The resulting model assigns a score between 0 and 1 for each sSNV. SilVA classifies sSNVs as likely
benign with scores lower than 0.27; potentially pathogenic with scores between 0.27 and 0.485, and likely
pathogenic with scores higher than 0.485. We scored our list of common coding variants by creating a VCF
file for SilVA. The usage of SilVA requires a combination of Python, Bash, and R. Of the 18,469 sSNVs
we extracted from the 1000 Genomes Project (1000 Genomes Project Consortium et al. 2015), 17,672
could be annotated by SilVA (797 variants were dropped due to annotation errors). These 17,672 variants
were kept for all further analyses. A total of 15 sSNVs were classified as pathogenic, and 142 were classified
as likely pathogenic.
3.c TraP
The Transcript-inferred Pathogenicity (TraP) score was developed to predict the ability of an sSNV to cause
disease by disrupting a gene’s transcripts and, consequently, the translated protein (Gelfman et al. 2017).
TraP is a random forest algorithm trained exclusively on pathogenic synonymous variants (N = 75) to ensure
that it captures information unrelated to amino-acid substitutions. The sSNV features used in the TraP model
include the harboring gene and its transcript, the GERP++ conservation score (Davydov et al. 2010), changes in
sequence motifs, regulatory effects on splicing, and others. The developers proposed a TraP score
of 0.459 as the threshold below which variants are considered benign. Variants with scores ≥ 0.459 and < 0.93
are considered possibly damaging (the intermediate pathogenic range). Variants with scores ≥ 0.93 are
considered probably damaging to the final transcript. Of the 17,672 common sSNVs, 18
were classified as probably damaging, and 158 were classified as possibly damaging (intermediate pathogenic).
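The score cut-offs quoted for SilVA and TraP can be applied with a small helper such as the one sketched below. The cut-offs are those given in the text; the `classify` helper is hypothetical, and the treatment of scores that fall exactly on a SilVA boundary (assigned here to the higher class) is an assumption, since the text only specifies strict inequalities for SilVA.

```python
from bisect import bisect_right

# (cut-offs, labels) as quoted in the text for each predictor.
SILVA = ([0.27, 0.485], ["likely benign", "potentially pathogenic", "likely pathogenic"])
TRAP = ([0.459, 0.93], ["benign", "possibly damaging", "probably damaging"])


def classify(score, scheme):
    """Map a score in [0, 1] to a class label; a score equal to a cut-off goes to the higher class."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    cutoffs, labels = scheme
    return labels[bisect_right(cutoffs, score)]


print(classify(0.30, SILVA))  # potentially pathogenic
print(classify(0.93, TRAP))   # probably damaging
```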
3.d SynVep
The Synonymous Variant effect predictor (synVep) is a machine learning-based model for predicting the impact
of human sSNVs (Zeng, Aptekmann, and Bromberg 2021). Instead of training on data labeled for disease or
deleteriousness, synVep was developed using variants observed in the Genome Aggregation Database
(gnomAD) (Karczewski et al. 2020) and simulated (i.e., possible, but unseen in gnomAD) variants. We
note that, unlike previous methods, synVep does not use conservation scores (such as GERP++) for training,
in order to avoid what its developers describe as the over-reliance on conservation scores shown
by other predictors. The output of this algorithm is the probability that an sSNV has a deleterious effect
(between 0 and 1). Of the 17,672 common sSNVs, 7,038 were classified as having a potential deleterious
effect (synVep probability > 0.5).
3.e Other methods predicting variant pathogenicity
Enformer. Enformer is an algorithm that combines convolutional neural networks with a class of natural
language processing (NLP) models called Transformers (Avsec et al. 2021). Enformer can be used to
predict gene expression from a DNA sequence that is ~200kb long; the incorporation of Transformers
allows Enformer to read a longer DNA sequence than the previous method (Basenji2 (Kelley 2020)),
and thus to account for more distal regulatory DNA elements. Since Enformer predictions were available for
5,313 regulatory tracks, we used the maximum score across these tracks to predict the functional impact of
a synonymous variant.
Conservation scores. Evolutionary conservation quantifies variants' functional importance by inferring
which positions have changed more slowly than expected under neutral drift due to purifying selection. A
key advantage of conservation lies in its mechanistic agnosticism: it is detected by comparing germline
DNA and thus is independent of the specific cell-type or developmental time point, or the mechanism of
action, unlike functional assays. GERP++ is the most widely used conservation score; it quantifies
evolutionary conservation at each base pair position by comparing the human reference genome with the
reference genomes of 33 mammalian species (Davydov et al. 2010). The Zoonomia conservation scores
improve on earlier mammalian conservation scores by quantifying conservation across 241
mammalian species and 43 primates (Zoonomia Consortium 2020).
CADD. Combined Annotation Dependent Depletion (CADD) is a gold-standard pathogenicity predictor of
coding and non-coding bases in the human genome. CADD is a machine learning logistic regression model
that scores the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the
human genome (Kircher et al. 2014). CADD is purposefully not trained on variants with known pathogenic
or benign effects, as only a limited number of such variants is available. Instead, it is trained on its own
“proxy-neutral” and “proxy-deleterious” variants. “Proxy-neutral” variants are those that are fixed
or nearly fixed in human populations but absent from the inferred genome of the human-ape ancestor, while
“proxy-deleterious” variants are de novo simulated variants using allele composition. By nature of its training
material, CADD has a very large training dataset compared to other predictors. CADD outputs
a continuous phred-like score for each variant (ranging from 1 to 99), with higher values indicating a more
deleterious impact. A total of 212/17,672 sSNVs presented CADD scores above the threshold suggested by
the developers (CADD_phred ≥ 15).
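To give a sense of scale for the phred-like score, the sketch below assumes the rank-based scaling described in the CADD documentation, in which the scaled score equals -10·log10(rank/total) over all scored variants. Under that assumption, a threshold of 15 corresponds to roughly the top 3.2% of scored variants. The function names are illustrative.

```python
import math


def phred_from_rank(rank, total):
    """Phred-like score of the variant ranked `rank` (1 = most deleterious) among `total` variants."""
    return -10.0 * math.log10(rank / total)


def top_fraction(phred):
    """Fraction of scored variants expected at or above a given phred-like score."""
    return 10.0 ** (-phred / 10.0)


# Under the assumed scaling, a score of 20 marks the top 1% and a score of 15 about the top 3.2%.
fraction_at_15 = top_fraction(15)

# Share of common sSNVs above the threshold in this study, from the counts reported in the text.
observed_fraction = 212 / 17_672
```

The observed share of common sSNVs at or above 15 (about 1.2%) is smaller than that genome-wide reference fraction.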
4. Variants implicated in Mendelian disorders.
The ClinVar dataset is a public repository of genetic variants and interpretations of their significance to
disease (i.e., whether the variant should be considered pathogenic or benign), maintained at the National
Institutes of Health (Landrum et al. 2014). The clinical significance of each variant can be reported by
various members of the clinical and research community. A confidence level is assigned to each variant, based
on the level of evidence confirming its benign/pathogenic status (multiple submissions, review by a panel
of experts). For this study, no filtering based on confidence level was performed in order to maximize the
size of the dataset. We leveraged a total of 283,504 sSNVs reported in ClinVar, of which 638 sSNVs were
reported as having Pathogenic/Likely pathogenic consequences.
Chapter 3: Results
1. Quantifying the impact of synonymous (and non-synonymous) variants on molecular and
phenotypic traits
1.a Quantifying the impact of coding variants on gene regulation (GTEx)
We defined enrichment as the ratio of the fraction of common synonymous (or non-synonymous)
variants that are fine-mapped eQTLs to the fraction of all common variants that are fine-mapped
eQTLs. We found high enrichment of synonymous variants within common fine-mapped eQTLs of GTEx
(≥6.1x for PIP≥0.5) (see Figure 1 and Table 1). These enrichments were slightly smaller than those of
non-synonymous variants (≥7.5x for PIP≥0.5) (paired t test: p < 0.001). Enrichments increased with the PIP
threshold, and results were consistent between the CAVIAR and SuSie methods.
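The enrichment definition above translates directly into a ratio of two fractions. The sketch below is illustrative: the function name and the counts in the example are hypothetical and are not taken from the GTEx analysis.

```python
def enrichment(n_class_finemapped, n_class, n_all_finemapped, n_all):
    """Fraction of a variant class that is fine-mapped, divided by the same fraction over all variants."""
    fraction_in_class = n_class_finemapped / n_class
    fraction_overall = n_all_finemapped / n_all
    return fraction_in_class / fraction_overall


# Hypothetical counts: 50 of 1,000 variants in the class are fine-mapped,
# against 500 of 100,000 common variants overall.
example = enrichment(50, 1_000, 500, 100_000)  # close to 10, i.e. a 10x enrichment
```

An enrichment of 1 means the class is fine-mapped at the same rate as common variants in general; values above 1 indicate over-representation.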
Figure 1: We report line plots showing the relationship between PIP threshold and enrichment (with standard
error) within common fine-mapped eQTLs of GTEx. PIP thresholds were used to classify eQTLs as causal or
non-causal. (a) eQTLs of GTEx fine-mapped using the CAVIAR method. (b) eQTLs of GTEx
fine-mapped using the SuSie method.
Method  CAVIAR                              SuSie
Class   Synonymous        Non-synonymous    Synonymous        Non-synonymous
PIP     Enr     N         Enr     N         Enr     N         Enr     N
0.05    3.36    5,378     3.28    4,568     3.98    3,307     3.90    2,818
0.5     6.11    524       7.50    560       7.08    485       8.02    478
0.95    7.83    174       9.93    192       8.94    201       10.88   213
Table 1: We report enrichment and numbers of both synonymous and non-synonymous variants within
common fine-mapped eQTLs of GTEx, with respect to different fine-mapping methods and PIP thresholds.
Using SuSie estimates of squared effect sizes, we observed that lower-frequency synonymous fine-mapped
eQTLs tend to have higher effect sizes than higher-frequency synonymous fine-mapped
eQTLs, consistent with the action of negative selection on synonymous variants impacting gene expression
(Figure 2a). Effects were higher for non-synonymous fine-mapped eQTLs. Interestingly, more common
synonymous fine-mapped eQTLs tend to also have higher effects, consistent with the action of balancing
selection. We observed that these common variants with high effects tend to be in genes tolerant to
deleterious mutations (Figures 2b and 2c).
Figure 2: We report line plots showing the relationship between MAF and SuSie estimates of squared effect
sizes for fine-mapped eQTLs of GTEx with a PIP>0.5 (a). We split eQTLs into two halves according to the
tolerance of their affected genes to loss-of-function mutations, as measured by pLI scores. We report
fine-mapped eQTLs of GTEx impacting the more tolerant genes (b) and the more intolerant genes (c) of our
dataset. For visualization purposes, effects were smoothed using a rolling median of 100 SNPs.
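The smoothing mentioned in the caption can be sketched as a rolling median over variants ordered by MAF. The window alignment, the handling of edges (only full windows are kept here), and the use of the median MAF as the x-coordinate are assumptions; the thesis does not specify them.

```python
from statistics import median


def rolling_median(values, window):
    """Median of each full window of consecutive values (no padding at the edges)."""
    return [median(values[i:i + window]) for i in range(len(values) - window + 1)]


def smooth_by_maf(variants, window=100):
    """Sort (maf, squared_effect) pairs by MAF, then smooth both coordinates with a rolling median."""
    ordered = sorted(variants, key=lambda pair: pair[0])
    mafs = [maf for maf, _ in ordered]
    effects = [effect for _, effect in ordered]
    return rolling_median(mafs, window), rolling_median(effects, window)


print(rolling_median([1, 5, 2, 8, 3], window=3))  # [2, 5, 3]
```

A median, rather than a mean, keeps a few variants with very large effect sizes from dominating the plotted trend.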
To summarize, we observed that synonymous variants are enriched in variants impacting gene
expression on a scale comparable to non-synonymous variants, and that synonymous variants impacting
gene expression of constrained genes are under negative selection (suggesting even stronger effect on gene
expression for rarer synonymous variants). These results suggest that common and low-frequency
synonymous fine-mapped eQTLs of GTEx could be leveraged to predict the effects of rare
synonymous variants on gene expression. Non-synonymous variants had slightly higher enrichments within
common fine-mapped eQTLs (paired t test: p < 0.001), and higher effect sizes for rare variants impacting
expression of constrained genes. These results are consistent with flattening effects of negative selection for
common variants and higher deleterious effects for non-synonymous variants. They also suggest that
non-synonymous variants have a probability of impacting gene expression similar to that of synonymous
variants, but that the causal ones tend to have higher effect sizes.
1.b Quantifying the impact of coding variants on human diseases and complex traits (UK Biobank)
We defined enrichment as the ratio of the fraction of common synonymous (or non-synonymous)
variants that are fine-mapped variants of UK Biobank diseases and complex traits to
the fraction of all common variants that are fine-mapped variants of UK Biobank diseases and complex
traits. We found high enrichment of synonymous variants within common fine-mapped variants of UK
Biobank diseases and complex traits (≥5.3x for PIP ≥ 0.5) (see Figure 3 and Table 2); these enrichments
were of the same order of magnitude as those obtained with fine-mapped eQTLs. However, in contrast
to the GTEx dataset, we found that enrichments were significantly higher for non-synonymous variants (≥25x
for PIP ≥ 0.5) (paired t test: p < 0.001). Enrichments were higher when using PolyFun than when using
SuSie, but showed consistent trends and similar values at high PIP thresholds. PolyFun enrichments for
synonymous variants were similar regardless of the PIP threshold.
Figure 3: We report line plots showing the relationship between PIP threshold and enrichment (with standard
error) within common fine-mapped variants of UKBB. PIP thresholds were used to classify variants as causal or
non-causal. (a) Variants of UKBB fine-mapped using the SuSie method. (b) Variants of UKBB
fine-mapped using the PolyFun method.
Method  SuSie                               PolyFun
Class   Synonymous        Non-synonymous    Synonymous        Non-synonymous
PIP     Enr     N         Enr     N         Enr     N         Enr     N
0.05    2.64    408       4.90    658       9.09    2,839     10.88   2,958
0.5     5.33    41        25.23   169       10.01   144       28.69   359
0.95    7.07    19        38.04   89        7.87    29        43.99   141
Table 2: We report enrichment and numbers of both common synonymous and non-synonymous fine-
mapped variants of UKBB, with respect to different fine-mapping methods and PIP thresholds.
Using PolyFun estimates of squared effect sizes, we observed that lower-frequency synonymous
fine-mapped UK Biobank variants tend to have higher effect sizes, consistent with the action of negative
selection (Figure 4a). Interestingly, effect sizes of lower-frequency synonymous variants were of the same
order of magnitude as those of non-synonymous variants, whereas effect sizes of common synonymous variants
were lower.
Figure 4: We report line plots showing the relationship between MAF and PolyFun estimates of squared effect
sizes for fine-mapped UK Biobank variants with a PIP ≥ 0.5 (a). We split coding variants into two halves
according to the tolerance of their genes to loss-of-function mutations, as measured by pLI scores. We report
fine-mapped UK Biobank coding variants impacting the more tolerant genes (b) and the more intolerant genes (c)
of our dataset. For visualization purposes, effects were smoothed using a rolling median of 100 SNPs for
non-synonymous SNPs and a rolling median of 50 SNPs for synonymous SNPs (due to the low number of
fine-mapped synonymous variants).
To summarize, we observed that synonymous variants are enriched in variants impacting human
diseases and complex traits and that these variants are under negative selection (suggesting even stronger
effect on diseases and complex traits for rarer synonymous variants). These results suggest that common
and low-frequency synonymous fine-mapped variants impacting human disease and complex traits
could be leveraged to predict the effect of rare synonymous variants on disease risk.
1.c Quantifying the impact of coding variants on predicted expression (Enformer) and constraint
scores (Zoonomia)
We aimed to replicate our results by investigating the enrichment of synonymous and non-synonymous
variants using predicted gene expression and predicted deleterious effects of common variants. We used
gene expression predictions from Enformer (Avsec et al. 2021), keeping the maximum squared prediction
across 5,313 genomic tracks, and Zoonomia constraint scores across 240 mammals (Zoonomia
Consortium 2020) as predicted expression and deleterious effects, respectively.
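As a minimal sketch of how per-track Enformer predictions can be collapsed into a single score per variant, the snippet below takes the maximum of the squared predictions across tracks. It assumes that each variant comes with one signed predicted value per track (the exact form of these predictions is not detailed here); the function name `enformer_variant_score` and the toy values are hypothetical.

```python
def enformer_variant_score(track_predictions):
    """Collapse per-track predictions into one score: the maximum squared value."""
    if not track_predictions:
        raise ValueError("expected at least one track prediction")
    return max(p * p for p in track_predictions)

# Hypothetical predictions for one variant across 4 tracks
# (the analysis described above uses 5,313 tracks).
toy_predictions = [0.1, -0.8, 0.3, 0.5]
score = enformer_variant_score(toy_predictions)
```

Squaring removes the sign, so a strong predicted decrease in any track counts as much as a strong predicted increase.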
We observed a similar trend (Table 3). Synonymous and non-synonymous variants had
similar (but lower) enrichments within top Enformer scores. These results are consistent with our previous
observation that non-synonymous variants have the same probability of impacting gene expression as
synonymous variants (Results 1a). Synonymous variants were highly enriched within constrained variants
(≥ 7.56-fold), but this enrichment was significantly lower than for non-synonymous variants (≥ 18.19-fold).
Since constraint has been shown to explain a large part of the heritability of complex traits (Gazal et al. 2018),
these results indicate that non-synonymous variants are more likely to be causal in complex traits than
synonymous variants, consistent with our previous observations (Results 1b).
        Enformer                         Constraint in mammals
Top     Synonymous   Non-synonymous      Synonymous   Non-synonymous
0.5%    1.58         1.74                11.02        43.19
1%      1.71         1.94                9.90         29.58
2%      2.03         2.09                7.56         18.19
Table 3: We report enrichments of common synonymous and non-synonymous variants within the top 0.5%,
1%, and 2% of Enformer scores and of constraint scores in mammals.
In summary, while constraint in mammals seems like a strong predictor of deleterious effects that
could be leveraged to predict the impact of synonymous variants, it is unclear how to leverage Enformer
predictions due to their low enrichments.
Figure 5: We report boxplots of the distributions of Enformer (a) and Zoonomia (b) scores by
mutation type (synonymous and non-synonymous).
2. Evaluating synonymous scores in the context of molecular and phenotypic traits
Here, we evaluated the predictive power of 4 pathogenicity predictors of synonymous variants (dRSCU,
SilVA, TraP, synVep), as well as Enformer, Zoonomia, and CADD scores, on molecular and phenotypic
traits. Due to the limited number of fine-mapped variants and the lack of synonymous variants with
validated impact on gene expression and disease risk, we only evaluated the power of each predictor (and
not its false negative rate). For a wide range of score thresholds, we defined enrichment as the ratio of
the fraction of causal synonymous variants with a score greater than the threshold to the fraction of
all synonymous variants with a score greater than the threshold.
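The enrichment defined above can be written as a short function. This is a minimal sketch under stated assumptions: the name `enrichment`, the handling of the undefined case (no variant above the threshold), and the toy scores are ours and are not taken from the analysis pipeline.

```python
def enrichment(causal_scores, all_scores, threshold):
    """Fraction of causal sSNVs above `threshold` divided by the fraction of all sSNVs above it."""
    frac_causal = sum(s > threshold for s in causal_scores) / len(causal_scores)
    frac_all = sum(s > threshold for s in all_scores) / len(all_scores)
    if frac_all == 0:
        return float("nan")  # assumption: undefined when no variant exceeds the threshold
    return frac_causal / frac_all

# Toy scores (made up): 4 causal sSNVs among 10 sSNVs in total.
all_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
causal_scores = [0.3, 0.7, 0.9, 1.0]
value = enrichment(causal_scores, all_scores, threshold=0.65)
```

In this toy example, 3 of 4 causal variants and 4 of 10 variants overall exceed the threshold, so the enrichment is 0.75 / 0.4, i.e. about 1.9. A value above 1 means that high scores are over-represented among causal variants.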
2.a Evaluating synonymous scores in the context of gene expression
We evaluated predictor enrichments on 549 common fine-mapped GTEx eQTLs with PIP ≥ 0.5. We
observed the highest enrichments for Enformer (consistent with the fact that it is trained on gene expression
data, including GTEx) and CADD. Among the predictors of synonymous variants, the highest enrichment was
obtained by TraP (Figure 6a).
As we observed that synonymous variants have different deleterious effects in genes tolerant and
intolerant to loss-of-function mutations, we stratified our results based on the tolerance of the eQTL genes.
Within tolerant genes, we only observed high enrichments for Enformer and CADD (Figure 6b). However,
we observed stronger enrichments for TraP and Zoonomia within intolerant genes (Figure 6c).
Figure 6: We report line plots showing the relationship between score percentile within sSNVs and predictor
enrichments for fine-mapped GTEx eQTLs with PIP ≥ 0.5 (a). We split eQTLs in half according to
how tolerant their affected genes were to loss-of-function mutations, based on their pLI scores (same
pLI threshold as in Figure 2). We report fine-mapped GTEx eQTLs impacting the more tolerant
genes (b) and the more intolerant genes (c) of our dataset. For visualization purposes, we only kept data points
with more than 5 causal variants above a given score threshold in the plot.
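The curves described in this caption can be sketched as follows: thresholds are set at percentiles of the score distribution within all sSNVs, and points supported by 5 or fewer causal variants are dropped. The nearest-rank percentile rule, the function name `enrichment_curve`, and the toy data are assumptions made for illustration. Stratifying by pLI would amount to calling the same function separately on the variants of the more tolerant and more intolerant halves of genes.

```python
def enrichment_curve(causal_scores, all_scores, percentiles, min_causal=5):
    """Enrichment at thresholds given by percentiles of all sSNV scores.

    Points with `min_causal` or fewer causal variants above the threshold are dropped.
    """
    ranked = sorted(all_scores)
    curve = []
    for pct in percentiles:
        # Assumption: nearest-rank threshold at the given percentile of all sSNVs.
        idx = min(len(ranked) - 1, int(pct * len(ranked) // 100))
        threshold = ranked[idx]
        n_causal = sum(s > threshold for s in causal_scores)
        n_all = sum(s > threshold for s in all_scores)
        if n_causal <= min_causal or n_all == 0:
            continue  # too few causal variants above the threshold
        ratio = (n_causal / len(causal_scores)) / (n_all / len(all_scores))
        curve.append((pct, ratio))
    return curve

# Toy data (made up): scores 1..100 for all sSNVs; causal ones skew high.
all_scores = list(range(1, 101))
causal_scores = [55, 60, 70, 75, 80, 85, 90, 92, 95, 99]
curve = enrichment_curve(causal_scores, all_scores, percentiles=[50, 80, 95])
```

With these toy values only the 50th-percentile point survives the filter, because just 5 causal variants exceed the 80th-percentile threshold and 1 exceeds the 95th.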
2.b Evaluating synonymous scores in the context of human diseases and complex traits
We evaluated predictor enrichments on 143 common fine-mapped synonymous variants of UKBB with PIP
≥ 0.5. For this part, we only considered variants fine-mapped by PolyFun due to their higher number. We
observed the highest enrichments for the constraint-related predictors Zoonomia and CADD. Among the
predictors of synonymous variants, the highest enrichment was obtained by TraP (Figure 7a). As before, we
stratified our results based on the tolerance of the genes. The highest enrichment was observed for CADD
within tolerant genes (Figure 7b) and for CADD, Zoonomia and TraP within intolerant genes (Figure 7c).
We replicated the high enrichments of CADD and Zoonomia using the list of ClinVar synonymous pathogenic
variants (Figure 8); note that the high enrichments for SilVA and TraP are due to the fact that they were
trained using this ClinVar dataset.
Figure 7: We report line plots showing the relationship between score percentile within sSNVs and predictor
enrichments for fine-mapped UK Biobank variants with PIP ≥ 0.5 (a). We split coding variants in half
according to how tolerant their affected genes were to loss-of-function mutations, based on their pLI scores
(same pLI threshold as in Figure 4). We report fine-mapped UK Biobank coding variants
impacting the more tolerant genes (b) and the more intolerant genes (c) of our dataset. For visualization
purposes, we only kept data points with more than 5 causal variants above a given score threshold in the plot.
Figure 8: We report line plots showing the relationship between predictor score percentile within sSNVs and
predictor enrichments for ClinVar variants associated with Mendelian disorders. (a) Variants classified as
“Pathogenic” were compared to other variants to compute enrichment. (b) Variants classified as “Pathogenic”,
“Likely pathogenic”, or “Pathogenic/Likely pathogenic” were grouped and compared to other variants to compute
enrichment.
Chapter 4: Conclusion
Current methods for predicting the pathogenicity of rare synonymous variants suffer from the lack of
experimental data necessary for training and validation. The results of our study indicate that this problem
could be addressed by leveraging common fine-mapped sSNVs from expression and association studies.
We report that synonymous SNVs present similar enrichment compared to non-synonymous SNVs
within variants impacting gene expression (GTEx fine-mapped eQTLs) and are under negative selection.
In contrast to what we observed for GTEx, synonymous variants presented significantly lower enrichments
than non-synonymous ones within common UKBB fine-mapped variants impacting human diseases and
complex traits. Nevertheless, we observed similar effect sizes for lower-frequency synonymous and
non-synonymous variants in the UKBB dataset, indicating that non-synonymous variants are more likely to be
causal than synonymous variants, but that the causal ones have similar effect sizes. Our results also indicate
that UKBB fine-mapped sSNVs
are under negative selection. As such, GTEx and UKBB fine-mapped sSNVs can constitute a relatively
large dataset for training novel pathogenicity prediction algorithms.
Interestingly, the results of our study suggest a different functional architecture of sSNVs affecting
gene expression (GTEx) and complex phenotypic traits (UKBB). Indeed, we observed that synonymous
and non-synonymous SNVs tend to present a similar impact on gene expression: we observe similar
enrichment of synonymous and non-synonymous SNVs within GTEx eQTLs and top scores of Enformer,
which is a sequence-based gene expression predictor. Enformer also showed the best performance when
predicting the effects of sSNVs on gene expression (this observation is, however, biased as GTEx data was
used in the Enformer model for initial training). In contrast, our analysis of UKBB fine-mapped SNVs
indicated that the impact of synonymous and non-synonymous variants on complex traits seems to differ
significantly.
References
1000 Genomes Project Consortium, Adam Auton, Lisa D. Brooks, Richard M. Durbin, Erik P. Garrison,
Hyun Min Kang, Jan O. Korbel, et al. 2015. “A Global Reference for Human Genetic Variation.”
Nature 526 (7571): 68–74.
Avsec, Žiga, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle
R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R. Kelley. 2021. “Effective
Gene Expression Prediction from Sequence by Integrating Long-Range Interactions.” Nature
Methods 18 (10): 1196–1203.
Bamshad, Michael J., Sarah B. Ng, Abigail W. Bigham, Holly K. Tabor, Mary J. Emond, Deborah A.
Nickerson, and Jay Shendure. 2011. “Exome Sequencing as a Tool for Mendelian Disease Gene
Discovery.” Nature Reviews Genetics. https://doi.org/10.1038/nrg3031.
Bartoszewski, Rafal A., Michael Jablonsky, Sylwia Bartoszewska, Lauren Stevenson, Qun Dai, John
Kappes, James F. Collawn, and Zsuzsa Bebok. 2010. “A Synonymous Single Nucleotide
Polymorphism in ΔF508 CFTR Alters the Secondary Structure of the mRNA and the Expression of
the Mutant Protein.” Journal of Biological Chemistry. https://doi.org/10.1074/jbc.m110.154575.
Brest, Patrick, Pierre Lapaquette, Mouloud Souidi, Kevin Lebrigand, Annabelle Cesaro, Valérie Vouret-
Craviari, Bernard Mari, et al. 2011. “A Synonymous Variant in IRGM Alters a Binding Site for
miR-196 and Causes Deregulation of IRGM-Dependent Xenophagy in Crohn’s Disease.” Nature
Genetics 43 (3): 242–45.
Buske, Orion J., Ashokkumar Manickaraj, Seema Mital, Peter N. Ray, and Michael Brudno. 2015.
“Identification of Deleterious Synonymous Variants in Human Genomes.” Bioinformatics 31 (5):
799.
Chamary, J. V., Joanna L. Parmley, and Laurence D. Hurst. 2006. “Hearing Silence: Non-Neutral
Evolution at Synonymous Sites in Mammals.” Nature Reviews. Genetics 7 (2): 98–108.
Davydov, Eugene V., David L. Goode, Marina Sirota, Gregory M. Cooper, Arend Sidow, and Serafim
Batzoglou. 2010. “Identifying a High Fraction of the Human Genome to Be under Selective
Constraint Using GERP++.” PLoS Computational Biology 6 (12): e1001025.
Diederichs, Sven, Lorenz Bartsch, Julia C. Berkmann, Karin Fröse, Jana Heitmann, Caroline Hoppe,
Deetje Iggena, et al. 2016. “The Dark Matter of the Cancer Genome: Aberrations in Regulatory
Elements, Untranslated Regions, Splice Sites, Non-Coding RNA and Synonymous Mutations.”
EMBO Molecular Medicine 8 (5): 442–57.
Dixit, Ritu, Ashok Kumar, and Bhagyalaxmi Mohapatra. 2019. “Implication of GATA4 Synonymous
Variants in Congenital Heart Disease: A Comprehensive in-Silico Approach.” Mutation Research
813 (January): 31–38.
Gazal, Steven, Po-Ru Loh, Hilary K. Finucane, Andrea Ganna, Armin Schoech, Shamil Sunyaev, and
Alkes L. Price. 2018. “Functional Architecture of Low-Frequency Variants Highlights Strength of
Negative Selection across Coding and Non-Coding Annotations.” Nature Genetics 50 (11): 1600–
1607.
Gelfman, Sahar, Quanli Wang, K. Melodi McSweeney, Zhong Ren, Francesca La Carpia, Matt
Halvorsen, Kelly Schoch, et al. 2017. “Annotating Pathogenic Non-Coding Variants in Genic
Regions.” Nature Communications 8 (1): 236.
Gotea, Valer, Jared J. Gartner, Nouar Qutob, Laura Elnitski, and Yardena Samuels. 2015. “The
Functional Relevance of Somatic Synonymous Mutations in Melanoma and Other Cancers.”
Pigment Cell & Melanoma Research 28 (6): 673–84.
Hormozdiari, Farhad, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc, and Eleazar Eskin. 2014.
“Identifying Causal Variants at Loci with Multiple Signals of Association.” Genetics 198 (2): 497.
Hunt, Ryan, Gaya Hettiarachchi, Upendra Katneni, Nancy Hernandez, David Holcomb, Jacob Kames,
Redab Alnifaidy, et al. 2019. “A Single Synonymous Variant (c.354G>A [p.P118P]) in ADAMTS13
Confers Enhanced Specific Activity.” International Journal of Molecular Sciences.
https://doi.org/10.3390/ijms20225734.
Kandoth, Cyriac, Michael D. McLellan, Fabio Vandin, Kai Ye, Beifang Niu, Charles Lu, Mingchao Xie,
et al. 2013. “Mutational Landscape and Significance across 12 Major Cancer Types.” Nature.
https://doi.org/10.1038/nature12634.
Karczewski, Konrad J., Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo
Wang, Ryan L. Collins, et al. 2020. “The Mutational Constraint Spectrum Quantified from Variation
in 141,456 Humans.” Nature 581 (7809): 434–43.
Kelley, David R. 2020. “Cross-Species Regulatory Sequence Activity Prediction.” PLoS Computational
Biology 16 (7): e1008050.
Kim, Artem, Jérôme Le Douce, Farah Diab, Monika Ferovova, Christèle Dubourg, Sylvie Odent, Valérie
Dupé, et al. 2020. “Synonymous Variants in Holoprosencephaly Alter Codon Usage and Impact the
Sonic Hedgehog Protein.” Brain: A Journal of Neurology 143 (7): 2027–38.
Kimura, M. 1977. “Preponderance of Synonymous Changes as Evidence for the Neutral Theory of
Molecular Evolution.” Nature 267 (5608): 275–76.
Kircher, M., D. M. Witten, P. Jain, B. J. O’Roak, G. M. Cooper, and J. Shendure. 2014. “A General
Framework for Estimating the Relative Pathogenicity of Human Genetic Variants.” Nature Genetics
46 (3). https://doi.org/10.1038/ng.2892.
Landrum, Melissa J., Jennifer M. Lee, George R. Riley, Wonhee Jang, Wendy S. Rubinstein, Deanna M.
Church, and Donna R. Maglott. 2014. “ClinVar: Public Archive of Relationships among Sequence
Variation and Human Phenotype.” Nucleic Acids Research 42 (Database issue): D980–85.
Liu, Xiaoming, Xueqiu Jian, and Eric Boerwinkle. 2011. “dbNSFP: A Lightweight Database of Human
Nonsynonymous SNPs and Their Functional Predictions.” Human Mutation 32 (8): 894–99.
———. 2013. “dbNSFP v2.0: A Database of Human Non-Synonymous SNVs and Their Functional
Predictions and Annotations.” Human Mutation. https://doi.org/10.1002/humu.22376.
MacArthur, D. G., T. A. Manolio, D. P. Dimmock, H. L. Rehm, J. Shendure, G. R. Abecasis, D. R.
Adams, et al. 2014. “Guidelines for Investigating Causality of Sequence Variants in Human
Disease.” Nature 508 (7497): 469–76.
Macaya, D., S. H. Katsanis, T. W. Hefferon, S. Audlin, N. J. Mendelsohn, J. Roggenbuck, and G. R.
Cutting. 2009. “A Synonymous Mutation in TCOF1 Causes Treacher Collins Syndrome due to Mis-
Splicing of a Constitutive Exon.” American Journal of Medical Genetics Part A.
https://doi.org/10.1002/ajmg.a.32834.
Park, Chungoo, Xiaoshu Chen, Jian-Rong Yang, and Jianzhi Zhang. 2013. “Differential Requirements for
mRNA Folding Partially Explain Why Highly Expressed Proteins Evolve Slowly.” Proceedings of
the National Academy of Sciences of the United States of America 110 (8): E678–86.
Plotkin, Joshua B., and Grzegorz Kudla. 2011. “Synonymous but Not the Same: The Causes and
Consequences of Codon Bias.” Nature Reviews Genetics. https://doi.org/10.1038/nrg2899.
Quax, Tessa E. F., Nico J. Claassens, Dieter Söll, and John van der Oost. 2015. “Codon Bias as a Means
to Fine-Tune Gene Expression.” Molecular Cell 59 (2): 149–61.
Reitz, Christiane, Daniel Felsky, Ismael Santa-Maria, Badri N. Vardarajan, David A. Bennett, Philip L.
De Jager, and Richard Mayeux. 2018. “P1‐161: RARE, SYNONYMOUS VARIANTS IN CDH23,
SLC9A3R1, RHBDD2 AND ITIH2 ARE ASSOCIATED WITH ALZHEIMER’S DISEASE IN
MULTIPLEX CARIBBEAN HISPANIC FAMILIES.” Alzheimer’s & Dementia.
https://doi.org/10.1016/j.jalz.2018.06.165.
Sauna, Zuben E., and Chava Kimchi-Sarfaty. 2011. “Understanding the Contribution of Synonymous
Mutations to Human Disease.” Nature Reviews Genetics. https://doi.org/10.1038/nrg3051.
Sharp, P. M., T. M. Tuohy, and K. R. Mosurski. 1986. “Codon Usage in Yeast: Cluster Analysis Clearly
Differentiates Highly and Lowly Expressed Genes.” Nucleic Acids Research 14 (13): 5125–43.
Spain, Sarah L., and Jeffrey C. Barrett. 2015. “Strategies for Fine-Mapping Complex Traits.” Human
Molecular Genetics 24 (R1): R111–19.
Supek, Fran, Belén Miñana, Juan Valcárcel, Toni Gabaldón, and Ben Lehner. 2014. “Synonymous
Mutations Frequently Act as Driver Mutations in Human Cancers.” Cell.
https://doi.org/10.1016/j.cell.2014.01.051.
Takata, Atsushi, Iuliana Ionita-Laza, Joseph A. Gogos, Bin Xu, and Maria Karayiorgou. 2016. “De Novo
Synonymous Mutations in Regulatory Elements Contribute to the Genetic Etiology of Autism and
Schizophrenia.” Neuron. https://doi.org/10.1016/j.neuron.2016.02.024.
Wang, Gao, Abhishek Sarkar, Peter Carbonetto, and Matthew Stephens. n.d. “A Simple New Approach to
Variable Selection in Regression, with Application to Genetic Fine-Mapping.”
https://doi.org/10.1101/501114.
Wang, Kai, Mingyao Li, and Hakon Hakonarson. 2010. “ANNOVAR: Functional Annotation of Genetic
Variants from High-Throughput Sequencing Data.” Nucleic Acids Research 38 (16): e164.
Weissbrod, Omer, Farhad Hormozdiari, Christian Benner, Ran Cui, Jacob Ulirsch, Steven Gazal, Armin
P. Schoech, et al. 2020. “Functionally Informed Fine-Mapping and Polygenic Localization of
Complex Trait Heritability.” Nature Genetics 52 (12): 1355–63.
Zeng, Zishuo, Ariel A. Aptekmann, and Yana Bromberg. 2021. “Decoding the Effects of Synonymous
Variants.” Nucleic Acids Research 49 (22): 12673–91.
Zeng, Zishuo, and Yana Bromberg. 2019. “Predicting Functional Effects of Synonymous Variants: A
Systematic Review and Perspectives.” Frontiers in Genetics 10 (October): 914.
Zoonomia Consortium. 2020. “A Comparative Genomics Multitool for Scientific Discovery and
Conservation.” Nature 587 (7833): 240–45.
Abstract
Synonymous coding variants are usually regarded as neutral, and thus neglected in clinical genetic studies. However, recent studies highlighted a few examples where they can play a crucial role in fine-tuning gene expression, protein function and even contribute to human pathologies through a variety of mechanisms. Computational tools have been developed to predict the pathogenicity effect of synonymous variants, but their performance remains unclear. Here, we leveraged large fine-mapping studies of molecular and phenotypic traits to characterize the regulatory and deleterious effects of common synonymous variants. First, we estimated the enrichment of synonymous (and non-synonymous) coding variants within fine-mapped eQTLs from GTEx and fine-mapped causal variants from UK Biobank. We observed similar enrichment for synonymous and non-synonymous variants impacting gene expression, which are under similar patterns of negative selection. In contrast, we found that enrichments were significantly higher for non-synonymous variants on human disease and complex traits compared to synonymous counterparts. Second, we leveraged fine-mapped synonymous variants in GTEx and UK Biobank datasets to evaluate the predictive power of various predictors. Our results indicate that the impact of common synonymous and non-synonymous variants on gene expression and human complex traits differ substantially, but that they could be leveraged to predict the effect of rare synonymous variants on disease risk.
Asset Metadata
Creator
Huang, Tzu Yu (author)
Core Title
Characterizing synonymous variants by leveraging gene expression and GWAS datasets
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Biostatistics
Publication Date
11/17/2022
Defense Date
11/16/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
CAVIAR,ClinVar,eQTL,Expression quantitative trait loci,fine mapping,genome wide association study,GTEx,GWAS,non-synonymous variants,OAI-PMH Harvest,PolyFun,SuSie,synonymous,synonymous variants,UK Biobank,UKBB
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Steven, Gazal (committee chair), Joseph, Wiemels (committee member), Nicholas, Mancuso (committee member)
Creator Email
cornmit@gmail.com,tzuyuhua@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC112486312
Unique identifier
UC112486312
Identifier
etd-HuangTzuYu-11321.pdf (filename)
Legacy Identifier
etd-HuangTzuYu-11321
Document Type
Thesis
Rights
Huang, Tzu Yu
Internet Media Type
application/pdf
Type
texts
Source
20221121-usctheses-batch-992 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu