Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Comparative transcriptomics: connecting the genome to evolution
(USC Thesis Other)
Comparative transcriptomics: connecting the genome to evolution
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Comparative Transcriptomics: Connecting the Genome to Evolution
By
Wei Jiang
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTATIONAL BIOLOGY AND BIOINFORMATICS)
May 2024
Copyright 2024 Wei Jiang
ii
Acknowledgments
First and foremost, I would like to express my sincere gratitude to my advisors Professor
Liang Chen for her unwavering support and mentoring. Having her as my mentor over the past
six years has been the greatest fortune, and her influence on me extends well beyond research.
My special thanks also go to Professor Fengzhu Sun, Professor Michael D. Edge, and
Professor Aiichiro Nakano for serving as committee members during my qualifying exam and
dissertation defense. Additionally, I would like to express my honor and gratitude to Professor
Sika Zheng for our collaborative work. A special note of thanks for his professionalism, I have
gained valuable insights from our collaborative effort.
Lastly, I want to thank my parents, Guixiang Bai, Haishan Jiang, and my husband,
Xinyun Xiao, for their unconditional love and endless support throughout my academic journey.
iii
Table of Contents
Acknowledgments............................................................................................................................ ii
List of Figures..................................................................................................................................v
Abstract........................................................................................................................................ viii
Chapter 1 Introduction ................................................................................................................... 1
1.1 Transcriptomic analysis.......................................................................................................................1
1.2 Gene Expression profiling...................................................................................................................1
1.2.1 Tissue Specificity of Gene Expression........................................................................................................ 2
1.3 Alternative splicing .............................................................................................................................3
1.3.1 General mechanisms of pre-mRNA splicing............................................................................................... 3
1.3.2 Alternative splicing mechanisms and regulation......................................................................................... 6
1.3.3 Alternative splicing associated human disease............................................................................................ 9
1.4 Histone modification ...........................................................................................................................9
1.4.1 Regulation of chromatin by histone modification..................................................................................... 10
1.5 Comparative Analysis with Model Organisms .................................................................................11
1.5.1 Comparative genomics.............................................................................................................................. 11
1.5.2 Comparative transcriptomics..................................................................................................................... 12
Chapter 2 Tissue Specificity of Gene Expression Evolves Across Mammal Species.................... 14
2.1 Introduction .......................................................................................................................................14
2.2 Materials and Methods......................................................................................................................15
2.2.1 Data acquisition......................................................................................................................................... 15
2.2.2 RNA-seq data processing .......................................................................................................................... 16
2.2.3 Tissue specificity quantification................................................................................................................ 16
2.2.4 Identification of tissue-specific shifting pattern with the EVE model ...................................................... 17
2.2.5 Synonymous codon usage distribution...................................................................................................... 20
2.2.6 Alternative splicing of cassette exons ....................................................................................................... 20
2.3 Results and Discussion......................................................................................................................21
2.3.1 Tissue specificity landscape of gene expression in five mammalian species ........................................... 21
2.3.2 Evolutionary patterns of tissue specificity across five mammalian species.............................................. 23
2.3.3 Genes with shifting tissue specificity have a distinct conservation pattern .............................................. 26
2.3.4 Subgroups of genes with shifting tissue specificity .................................................................................. 29
2.3.5 Synonymous codon usage bias in different gene groups........................................................................... 31
2.3.6 Functional classification for different gene groups................................................................................... 32
2.3.7 Tissue specificity shifting at the alternative splicing level........................................................................ 33
2.4 Conclusion.........................................................................................................................................34
Chapter 3 Alternative Splicing and its Implications for Maximum Lifespan ............................... 36
3.1 Introduction .......................................................................................................................................36
3.2 Materials and Methods......................................................................................................................37
3.2.1 Data acquisition......................................................................................................................................... 37
3.2.2 RNA-seq-based de novo transcriptome assembly..................................................................................... 38
3.2.3 Homologous alternative splicing identification......................................................................................... 38
3.2.4 Alternative splicing quantification ............................................................................................................ 41
3.2.5 Models for detecting MLS- or age-associated alternative splicing events................................................ 41
3.2.6 Phylogenetically independent contrasts .................................................................................................... 42
iv
3.2.7 Gene set enrichment analysis .................................................................................................................... 42
3.2.8 RNA-binding protein analysis for MLS (or age)-associated alternative splicing ..................................... 42
3.3 Results...............................................................................................................................................43
3.3.1 Homologous alternative splicing events in Rodentia and Eutlipotyphyla species.................................... 43
3.3.2 Alternative splicing events associated with MLS in Rodentia and Eutlipotyphyla species...................... 47
3.3.3 MLS-associated alternative splicing events exhibit distinct patterns in the brain..................................... 51
3.3.4 Functional enrichment of genes with MLS-associated alternative splicing.............................................. 52
3.3.5 Age-associated alternative splicing events across human tissues ............................................................. 55
3.3.6 Alternative splicing events associated with both MLS and age are enriched for intrinsically disordered
proteins............................................................................................................................................................... 58
3.3.7 Exploration of upstream regulators of MLS and age-associated AS events............................................. 60
3.4 Discussion .........................................................................................................................................64
Chapter 4 Global Reprogramming of Apoptosis-Related Genes during Brain Development...... 68
4.1 Introduction .......................................................................................................................................68
4.2 Materials and Methods......................................................................................................................70
4.2.1 Definition of Apoptosis-Related Genes..................................................................................................... 70
4.2.2 Data Acquisition........................................................................................................................................ 71
4.2.3 RNA-Seq Data Analysis............................................................................................................................ 71
4.2.4 ChIP-Seq Data Analysis............................................................................................................................ 71
4.2.5 Regression Fit between RNA-Seq (or ChIP-Seq) Data and Ages............................................................. 72
4.2.6 KEGG Apoptosis Pathway........................................................................................................................ 73
4.3 Results...............................................................................................................................................73
4.3.1. Most Apoptosis-Related Genes Are Dynamically Regulated in the Developing Forebrain of a Mouse . 73
4.3.2 Temporal Expression Patterns of Apoptosis-Related Genes Are Consistent across Different Brain
Regions during Development............................................................................................................................. 75
4.3.3 The Regulation of the Classical Apoptosis Pathway during Mouse Brain Development......................... 78
4.3.4 Distinct Histone Modifications Underly the Developmental Regulation of Apoptotic Genes................. 80
4.3.5 Developmental Remodeling of H3K4me3 and H3K27me3 Marks Underscores Expression Changes of
Pidd1, Casp6, Mapk8, and Tradd ....................................................................................................................... 83
4.4 Discussion .........................................................................................................................................84
Chapter 5 Conclusions and Future work...................................................................................... 87
References..................................................................................................................................... 89
v
List of Figures
Figure 1.1 Stepwise schematic presentation of general pre-mRNA splicing. Abbreviations: BP: brunch
point; SS: splice site................................................................................................................................................ 5
Figure 2.2 Number of tissue-specific genes identified across 13 tissues in five species. (a) All genes were
considered for each species. (b) Only genes with 1-to-1 homologous genes across all five species were
considered............................................................................................................................................................. 22
Figure 2.3 Tissue-specificity shifting of gene expression between two tissues of two species. All genes
have 1-to-1 homologous counterparts across the five species. Each circle represents a comparison
between two tissues. The shifting direction is from a row tissue to a column tissue. Each comparison of
the two tissues between a pair of species is illustrated in one sector of the circle. Such species pairs are
abbreviated by their first letters, i.e., dh represents the comparison between dogs and humans. The
arrangements of the species pairs for the sectors are shown along the heatmap. The shifting
discoveries were based on the τ-only measurements. (a) Absolute frequency. (b) Normalized frequency.
For a species pair dh, the normalized tissue-specificity-shifting frequency from tissues T to S is
��(�), �(ℎ)/��(�)��(ℎ), where ��(�), �(ℎ) is the number of genes which are tissue-T-specific in
species d and tissue-S-specific in species h; ��(�) is the total number of genes which are tissue-Tspecific in species d; and ��(ℎ) is the total number of genes which are tissue-S-specific in species h. ............. 24
Figure 2.4 Comparison of the tissue-specificity shifting between the τ-based and the fold change-based
methods. (a) Normalized shifting frequency based on the τ measurements. (b) Normalized shifting
frequency based on the fold-change measurements. For each tissue pair (T and S), the normalized
tissue-specificity-shifting frequency is �, ℎ��(�), �(ℎ)/���(�)ℎ��(ℎ), where �, ℎ��(�), �(ℎ) is the
total number of genes shifting their tissue specificity between tissues T and S; ���(�) is the total
number of genes which are tissue-T-specific; and ℎ��(ℎ) is the total number of genes which are
tissue-S-specific..................................................................................................................................................... 25
Figure 2.5 Conservation comparison for genes with different degree of tissue specificity. Genes were
grouped as tissue-specificity-shifting genes, tissue-specificity-not-shifting genes, and housekeeping
genes. (a) Conservation scores of CDS regions for different gene groups. For each gene, the
conservation score was calculated as the average of the PhastCons scores across CDS positions. For
genes with multiple transcripts, the average scores of all transcripts were used. (b) Comparison of the
conservation scores at the 3rd positions of codons for different gene groups. The median PhastCons
score at each CDS 3rd position relative to the 5’ and 3’ ends is shown. (c) Comparison of the
conservation scores at promoter regions for different gene groups. The 3rd quartile of PhastCons score
at each relative position is shown. The human annotations are used. ................................................................. 27
Figure 2.6 Density curve of 5’UTR lengths for genes with different degree of tissue specificity. Genes
are grouped as tissue-specificity shifting genes, tissue-specificity non-shifting genes, and housekeeping
genes. (a) The 5’ UTR lengths are based on the human annotation (Ensembl Genes 105, GRCh38.p13).
(b) The 5’ UTR lengths are based on the mouse annotation (Ensembl Genes 105, GRCm39)............................ 29
Figure 2.7 Subgroups of genes with shifting specificity measured by EVE. (a) Number of genes for each
shifting subgroup. A: complete shifting; B: intermediate shifting; C: specificity loss; and D: initial
shifting. (b) The comparison of PhastCons scores between complete shifting (subtype A) and
incomplete shifting genes (subtypes B-D) at each promoter position. P-values from Student’s t tests are
shown. (c) The comparison of PhastCons scores between complete shifting and incomplete shifting
genes at third codon positions. P-values from Student’s t tests are shown. ......................................................... 30
Figure 2.8 Synonymous codon usage bias analysis for different gene groups. Single-letter amino acid
codes (* represents the stop codon) for the 19 codons with degeneracy are shown along the horizontal
axis. The vertical axis represents the significance of the codon usage bias (−log10 p, truncated at 300
for those very significant ones with ‘>’ on the bars) for a specific codon in the three gene groups. The
p-value was based on the Chi-square goodness of fit test. ................................................................................... 31
Figure 2.9 Gene ontology pathway analysis for genes with shifting tissue specificity. ............................................ 33
Figure 2.10 Tissue-specificity shifting of alternative splicing between two tissues of two species. All
homologous exons between two species are considered and they may not have the 1-to-1 homologous
counterparts in other species. Each circle represents a comparison between two tissues. The shifting
direction is from a row tissue to a column tissue. Each comparison of the two tissues between a pair of
species is illustrated in one sector of the circle. The shifting discoveries were based on the τ-only
measurements. (a) Absolute frequency. (b) Normalized frequency. For a species pair dh, the
vi
normalized tissue-specificity-shifting frequency from tissues T to S is ��(�), �(ℎ)/��(�)��(ℎ),
where ��(�), �(ℎ) is the number of cassette exons whose splicing ratios are tissue-T-specific in
species d and tissue-S-specific in species h; ��(�) is the total number of cassette exons whose splicing
ratios are tissue-T-specific in species d; and ��(ℎ) is the total number of cassette exons whose
splicing ratios are tissue-S-specific in species h................................................................................................... 34
Figure 3.1 Sequence regions used for homologous alternative splicing identification. (a) constitutive
splicing; (b) exon skipping (cassette exons); (c) mutually exclusive exons; (d) alternative 5’ splice sites
(alternative donors); (e) alternative 3’ splice sites (alternative acceptors); (f) intron retention; (g)
alternative first exons; (h) alternative last exons. ................................................................................................ 39
Figure 3.2 Workflow of alternative splicing identification and quantification. (a) other mammal species
and (b) human....................................................................................................................................................... 41
Figure 3.3 Homologous alternative splicing events in Rodentia and Eutlipotyphyla species. Sequences
are aligned against the mouse BLAST database................................................................................................... 45
Figure 3.4 Seven types of alternative splicing events in nonhuman. (a) AS events within individual
species. (b) Homologous AS events identified against the mouse BLAST database............................................. 46
Figure 3.5 Maximum life span (MLS)-associated alternative splicing events in nonhuman. Bar plots
illustrate the number of AS events where the percent spliced-in (PSI) level is positively (a) or
negatively (b) correlated with MLS, body mass (BM), or both. Jaccard index values are calculated to
show the similarity between BM-associated and MLS-associated AS events: (c) positive association,
and (d) negative association. ................................................................................................................................ 48
Figure 3.6 Type distribution of MLS-associated alternative splicing events. ........................................................... 48
Figure 3.7 Overlap (Jaccard index value) between positively (a) or negatively (b) MLS-associated and BMassociated gene expression................................................................................................................................... 49
Figure 3.8 Scatterplots illustrating two MLS-associated alternative splicing events. (a) Exon skipping
event in which its PSI is positively correlated with MLS in the liver. (b) Exon skipping event in which
its PSI is negatively correlated with MLS in the kidney. Spearman correlation coefficients (�) of PSI
and MLS are shown in the plot. Phylogenetically corrected coefficients are shown in the parenthesis.............. 50
Figure 3.9 Tissue specificity of MLS-associated alternative splicing events. (a) Upset plots showing
positive (Pos-MLS AS) or negative (Neg-MLS AS) MLS-correlated AS events identified in six tissues.
(b) Hierarchical biclustering of AS events across tissues based on Spearman correlation coefficients
with maximum lifespan. All considered homologous AS events are plotted......................................................... 51
Figure 3.10 Brain-specific association between alternative splicing and maximum lifespan. For these AS
events, their correlations with MLS in the brain have opposite signs compared to other tissues. ES:
exon skipping; A5: alternative 5’ splice sites; A3: alternative 3’ splice sites...................................................... 52
Figure 3.11 GSEA dot pots of enriched terms with all tissue merged for AS-MLS association calculation.
(a) Gene Ontology (GO) terms; (b) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. ............... 54
Figure 3.12 GSEA dot pots of enriched KEGG pathways within individual tissues. (a) Brain, (b)Skin, (c)
Kidney, (d)Lung, (e) Liver, (f)Heart. .................................................................................................................... 55
Figure 3.13 Age-associated alternative splicing events in humans. (a) The number of age-associated AS
events in individual human tissues. (b) Heatmap showing pairwise tissue similarity based on Jaccard
index values for age-associated AS events. These events are identified by our elastic net regression. (c)
The number of age-associated AS for six major tissues. Events from sub-tissues are pooled together. .............. 56
Figure 3.14 Heatmap showing pairwise tissue similarity for age-associated splicing identified by
traditional regression models fitting alternative splicing individually. The overlap similarity is based
on Jaccard index values........................................................................................................................................ 58
Figure 3.15 Percentage of IDP genes in humans. IDP genes are those associated with the GO term
“REGION: Disordered”. One-sided Fisher’s exact tests are used for the percentage comparison.................... 59
Figure 3.16 Top 20 most enriched RBP compared to the control group in MLS/age-associated AS events
from various tissues are pooled. ........................................................................................................................... 60
Figure 3.17 (a)Heatmap showing hierarchical biclustering of MLS-associated AS event based on percentile
of observed length-normalized frequency among all the homologous alternative splicing events. Each
row corresponds to a specific AS event, while columns represent the associated RNA Binding Proteins
(RBPs).Green line on the grey bar represent overlap with age-associated AS events. (b) detailed view
of cluster 1 (c) and cluster 2.. ............................................................................................................................... 63
vii
Figure 3.18 Heatmap showing hierarchical biclustering of age-associated AS event based on based on
percentile of observed length-normalized frequency among all the homologous alternative splicing
events. Green line on the grey bar represent overlap with MLS-associated AS events........................................ 64
Figure 4.1 Expression dynamic of apoptosis−related genes during mouse forebrain development (a)
Hierarchical clustering and expression heatmap of apoptosis−related genes across development ages
(E10.5, E11.5, E12.5, E13.5, E14.5, E15.5, E16.5, P0). Clusters are defined by the red line crossing
the dendrogram. Cluster numbers are noted next to each cluster tree. (b) mRNA expression of five
clusters defined in (a). The expression level of each gene was log−transformed (log2(TPM+1)) and
then Z−score normalized (i.e., centered by the mean and divided by the standard deviation across
different development ages). ................................................................................................................................. 75
Figure 4.2 Expression of apoptotic genes in midbrain and hindbrain regions. Gene clusters were defined
by the forebrain data............................................................................................................................................. 76
Figure 4.3 Consistent temporal expression patterns of apoptotic genes during development in three
different brain regions (a) Clustering heatmaps of apoptotic gene mRNA expression in the midbrain
and hindbrain. Gene clusters with the dendrogram are labeled on the right to the heat maps. (b) Venn
diagrams of downregulated genes and upregulated genes defined by hierarchical clustering separately
performed for three brain regions, showing substantial overlaps among the three brain regions...................... 77
Figure 4.4 The classic apoptosis pathway from the KEGG database marked with regulatory directions of
developmental gene ex-pression changes and functional activity on apoptosis................................................ 79
Figure 4.5 Distinct histone modifications are responsible for the chronical change in expression of the
downregulation and upregulation clusters. Regression coefficients (βEx) between mRNA expression
and ages for the downregulation clusters (a) and the upregulation clusters (b) in the forebrain,
midbrain, and hindbrain. Regression coefficients (βK4) between H3K4me3 signals and ages for genes
in the downregulation clusters (c) and the upregulation clusters (d) in the forebrain, midbrain, and
hindbrain. Regression coefficients (βK27) between H3K27me3 signals and ages for genes in the
downregulation clusters (e) and the upregulation clusters (f) in the forebrain, midbrain, and
hindbrain. All three brain regions show consistent patterns................................................................................ 82
Figure 4.6 Histone modification patterns of representative genes in developing mouse brain tissues. Top:
H3K4me3 (green) and H3K27me3 (red) ChIP-seq peak signals around selected TSSs at eight time
points; bottom: mRNA expression of the gene in the same corresponding brain region as shown on the
top, analyzed by RNA-seq. (a) Pidd1 in the midbrain, (b) Casp6 in the midbrain, (c) Mapk8 in the
forebrain, (d) Tradd in the forebrain.................................................................................................................... 84
viii
Abstract
The post-genomic era has witnessed the rapid development of transcriptomics, propelled by
high-throughput technologies. Comparative transcriptomic analysis, focusing on quantification
and comparison of gene expression and alternative splicing across species, plays a pivotal role in
unraveling the complexities of biological processes and evolution relationship. In this
dissertation, we developed and implemented various statistical methods and computational
pipeline to explore hidden information in the transcriptome and genome with comparative
analysis.
In Chapter 2, we explored tissue specificity in gene expression across species, utilizing
transcriptomic data from humans, cynomolgus macaques, rats, mice, and dogs in 13 tissues.
The evolutionary path of such tissue specificity provides essential information about the tissuespecific function of genes and the validity of disease animal models. We found that although
tissue specificity of homologous gene expression is largely well-conserved across species, a total
of 380 genes shifted or are in the process of shifting their tissue specificity. The tissuespecificity-shifting genes are less conserved than those preserving their tissue specificity or
housekeeping genes. Interestingly, tissue-specificity-shifting genes tend to be less conserved at
the third codon positions, likely due to their relaxed synonymous codon usage bias. Moreover,
compared to genes, cassette exons are more likely to shift their tissue specificity of splicing
across the five species.
Next, we delved deeper into explore one of regulation mechanism of gene expression:
alternative splicing with comparative approach once more. We investigated the potential
connection between alternative splicing, maximum lifespan, and longevity with a comparative
transcriptomic analysis involving 26 mammal species with varying lifespans. We identified
ix
hundreds of alternative splicing events associated with maximum lifespans. These longevityassociated alternative splicing events exhibited tissue-specific patterns, particularly in the brain.
They were enriched in genes related to metabolic processes, DNA damage or integrity
checkpoints, and mRNA surveillance pathways. Additionally, we derived a list of alternative
splicing events associated with human ages. Our analysis revealed longevity-associated splicing
is enriched in genes encoding intrinsically disordered proteins. Our findings therefore shed light
on the comprehensive role of alternative splicing in maximum lifespans and aging and present
interesting targets for future research and therapeutic interventions.
In addition to post-transcriptional regulation, transcriptional regulation like histone
modification also plays a crucial role in regulating gene expression. In Chapter 4, we tried to
answer the questions that whether and how neurons globally reprogram the expression of
apoptotic genes during development using a mouse dataset. We systematically examined the in
vivo expression of 1923 apoptosis-related genes and associated histone modifications at eight
developmental ages of mouse brains. Most apoptotic genes displayed consistent temporal
patterns across the forebrain, midbrain, and hindbrain, suggesting ubiquitous robust
developmental reprogramming. Although both anti- and pro-apoptotic genes can be up- or
downregulated, half the regulatory events in the classical apoptosis pathway are downregulation
of pro-apoptotic genes. Reduced expression in initiator caspases, apoptosome, and pro-apoptotic
Bcl-2 family members restrains effector caspase activation and attenuates neuronal apoptosis.
The developmental downregulation of apoptotic genes is attributed to decreasing histone-3-
lysine-4-trimethylation (H3K4me3) signals at promoters, where histone-3-lysine-27-
trimethylation (H3K27me3) rarely changes. By contrast, repressive H3K27me3 marks are lost in
the upregulated gene groups, for which developmental H3K4me3 changes are not predictive.
x
Hence, developing brains remove epigenetic H3K4me3 and H3K27me3 marks on different
apoptotic gene groups, contributing to their downregulation and upregulation, respectively. As
such, neurons drastically alter global apoptotic gene expression during development to transform
apoptosis controls. Research into neuronal cell death should consider maturation stages as a
biological variable.
1
Chapter 1 Introduction
1.1 Transcriptomic analysis
Transcriptomics analysis has emerged as a well-developed field in the post-genomic era. With
the advancement of high-throughput technologies, substantial amounts of transcriptomics data
have been generated through the years. The transcriptome is the complete set of transcripts in a
particular type of cell or tissue under a specific developmental stage or a physiological condition
[1, 2]. The primary goal of transcriptomic analysis is to identify differentially expressed genes
among various conditions, offering new insights into associated genes. This approach
emphasizes gene expression, providing genome-wide information on gene structure and function
to unveil the molecular mechanisms governing specific biological processes. Recent
development of next-generation sequencing (NGS) technology has significantly enriched the
field of transcriptomic analysis, notably, the diversity of species that been sequenced for its full
transcriptome increased rapidly in recent year [3, 4]. Paired with another powerful tool:
comparative transcriptomic made it feasible to dive deep into the realm of transcriptomes and
unveil novel mechanisms across various organisms.
In Chapter 1, we will briefly review the importance of gene expression and
understanding tissue specificity of it, as well as two of the pivotal mechanisms governing
expression regulation, histone modifications as transcriptional regulation, and alternative splicing
as post-transcriptional regulation. Additionally, we will discuss the significance of comparative
analysis, and its application in genomic study.
1.2 Gene Expression profiling
Gene expression, one of the most important biological processes that convert DNA information
into functional polypeptide products, is the cornerstone of every living organism. This process
involves the sequential flow of information from DNA to RNA to polypeptide is known as
2
central dogma of molecular biology. Firstly, transcription copies DNA into RNA, and in
eukaryotes, additional processing yields mature mRNA. Following by translation decodes
mRNA sequence into a polypeptide's amino acid sequence with ribosomes and tRNA molecules
[5]. While most of the time, the polypeptide products are proteins, but with non-protein coding
gene, sometime the product will be functional non-coding RNA[6]. The regulation of gene
expression is crucial for controlling cellular process, respond to internal and external stimuli or
executing developmental and growth programs[7, 8]. Therefore, quantification of gene
expression under various condition, tissue, species are essential to understanding fundamental
question of biology.
1.2.1 Tissue Specificity of Gene Expression
With the importance of gene expression, it is not surprising that tissue specificity of gene
expression sheds light on the tissue-selective manifestation of hereditary disease despite the same
DNA across all tissues. The evolutionary path of such tissue specificity provides essential
information about the tissue-specific function of genes and the validity of disease animal models.
Therefore, it is important to quantify and compare gene expression across tissues since this is
critical to explore gene function and its role in the tissue-selective manifestation of hereditary
disease. In an expression profiling analysis, often two main gene categories are focused:
housekeeping genes and tissue-specific genes. The formers are of vital for maintaining basic cell
function, on the other hand tissue-specific genes are also crucial considering the tissue-specific
nature of many diseases [9-12]. Previous study shows that genetic alternations in cancer driver
gene exhibit a noticeable spectrum of tissue specificity, there is copious experimental evidence
that cancer driver genes and the organization of oncogenic signaling pathways show tissue
specificity[13, 14]. More importantly, deciphering the tissue-specific pattern of expression will
3
shed light on molecular mechanism of tissue development and tissue-specific transcriptional
regulation. In Chapter 2, we will explore how tissue specificity of gene expression evolves with a
comprehensive dataset with five mammal species across 13 tissues.
1.3 Alternative splicing
Aside from gene expression itself, a crucial regulation mechanism of gene expression: alternative
splicing has also been investigated in-depth recently. Upon the completion of the Human
Genome Project in 2003, the discrepancy between the number of annotated protein-coding genes
and the number of observed human polypeptides reveals the widespread violation of the “one
gene–one polypeptide” hypothesis. It is now commonly accepted that in higher eukaryotes,
alternative splicing plays a remarkably important role in increasing protein diversity by allowing
one gene to generate distinct protein isoforms and increasing the complexity of gene expression
regulation [15]. In humans, up to 95% of multi-exon genes undergo alternative splicing to
encode proteins with different functions in distinct cellular processes [16]. Furthermore, around
15% of human hereditary diseases and cancers are reported to be associated with alternative
splicing [17, 18]. The analyses and studies of alternative splicing will fundamentally advance our
understanding of mRNA complexity and its regulation, provide valuable insights to understand
disease etiology, and assist the development of therapeutic interventions for splicing-related
diseases.
1.3.1 General mechanisms of pre-mRNA splicing
Constitutive splicing is the process that mRNA is spliced identically producing the same
isoforms, while alternative splicing generates different isoforms through using different sets of
exons. Typically, there are five major subtypes of alternative splicing [19]. 1) Exon skipping
(also known as cassette exons) is reported to be the most common alternative splicing event in
mammalian cells, which results in complete skipping of one or more exons [20, 21]. 2) Mutually
4
exclusive exons are a rare subtype where two or more splicing events are no longer independent.
They are executed or disabled in a coordinated manner [22]. 3) Alternative 5’ splice sites
(alternative donors): the usage of an alternative 5’ donor site, which changes the 3’ boundary of
the upstream exon. 4) Alternative 3’ splice sites (alternative acceptors): opposite to alternative 5’
splice sites, it is the usage of an alternative 3’ splice junction site causing the change of the 5’
boundary of the downstream exon. 5) Intron retention (IR) is the process that one or more introns
remain unspliced in the mRNA. The fate of those intron-retaining mRNA can be different [23,
24]. Some of them are degraded by the nonsense-mediated decay pathway, while others could
generate new protein isoforms [25]. Often, malfunctional proteins could be produced from IR
and lead to diseases. In addition to the five major subtypes, alternative polyadenylation sites and
alternative promoters are often discussed under this topic. Although these two also increase the
coding potential of genomes, they have very different mechanisms and are not directly related to
splicing.
Pre-mRNA splicing occurs in a large ribonucleoprotein complex (RNP) known as
spliceosome [26-28]. The spliceosome is a dynamic complex mainly consisting of five small
nuclear ribonucleoproteins (snRNPs) (U1, U2, U4/U6, U5) that recognize and assemble on each
intron to ultimately remove introns from a transcribed pre-mRNA (Figure 1.1) [29-33]. During
the assembly, U1 binds to the 5’ splice site with the assistance of the U2 auxiliary factor protein
(U2AF) forming base pairing between the U1 snRNA and the splice site. This earliest formed
complex (complex E or commitment complex) then binds to U2 to form the complex A (pre
spliceosome). Formation of the complex A ensures the intron to be spliced out and the last exon
to be retained during the final step. The complex B (precatalytic spliceosome) is formed by the
complex A joining U4, U5 and U6. Then a series of intricate reorganization events occur in order
5
to form the complex C (catalytic step 1 spliceosome). Firstly, the U1 interaction at the donor site
is replaced by the U6 snRNP, followed by U1 and U4 leaving the complex B. This lastly formed
complex C catalyzes two transesterification reactions of the splicing. During the first
transesterification, the phosphate at the 5’ splice site is attacked by the 2’- hydroxyl group at
branch point which results in the 5’ end of the intron being cleaved from the upstream exon, and
then joining to the branch point by a phosphodiester bond. In the second transesterification, the
phosphate at the 3’ splice site of the intron is attacked by the 3’- hydroxyl of the downstream
exon. This step finally releases the intron as well ligates the two exons by a phosphodiester bond
[20].
Figure 1.1 Stepwise schematic presentation of general pre-mRNA splicing. Abbreviations: BP: brunch point; SS: splice site.
6
1.3.2 Alternative splicing mechanisms and regulation
Alternative splicing is regulated by the interaction between cis-acting regulatory sequences and
corresponding trans-acting regulatory proteins. There are two major types of cis-acting elements
that either promote (enhancers) or inhibit (silencers) splicing activity of nearby splice sites.
Splicing enhancers can be located either in the exon (exonic splicing enhancers, ESEs) or in the
intron (intronic splicing enhancers, ISEs). They bind to splicing activator proteins such as
serine/arginine-rich family of nuclear phosphoproteins (SR protein family) to increase the chance
of an adjacent site being spliced. Splicing silencers include exonic splicing silencers harbored in
the exon (ESSs) and intronic splicing silencers harbored in the neighboring intron (ISSs). They
bind to splicing repressor proteins such as heterogeneous nuclear ribonucleoproteins (hnRNPs)
to inhibit the splicing of nearby sites or stimulate exon skipping.
Occasionally, the presence or absence of one single regulator is adequate to alternatively
splice a pre-mRNA [34-37]. But more often, more than one cis-acting regulatory sequences have
to work together with splicing activators or repressors to enhance or inhibit the spliceosome
activity at the splice site to determine an alternative spicing pathway [24, 34, 38, 39].
Interestingly, hnRNP proteins do not always act as splicing repressors nor does SR protein
always act as splicing activators. HnRNPs is a well-conserved RNA-binding protein family in
mammals [40]. In humans, hnRNPA1 usually binds to ESS or ISS to repress exon inclusion by
steric actions, or binds to both sides of a cassette exon forming a loop to skip the exon [41, 42].
However, during the interaction with Fas pre-mRNA, hnRNPA1 promotes exon 6 inclusion
through interrupting 5’ splice donor site selection of exon 5 [43]. Another member of the family
hnRNPL also has the ability to both activate or suppress exon 5 of CD45 gene [44]. More
importantly, this characteristic of splicing factors often appears to be strongly position-dependent
[45]. For example, a splicing factor would serve as a splicing suppressor or an activator in
7
different pre-mRNAs depending on whether it is bound to exons or introns. A well-studied
splicing factor, neuron-specific RNA-binding protein Nova-1 is one of such splicing factors.
CLIP Analysis has revealed that over 91% of Nova-dependent exon inclusion events occurred
near either alternative 5’ splice sites or constitutive 3’ splice sites, while 74% of Nova-dependent
exon skipping occurs near constitutive 5’ splice sites [46].
Splicing factors are imperative in regulating splicing events. Being able to pinpoint the
binding sites between RNA and splicing factors provides us valuable information about splicing
regulation. The predominant methods for investigating RNA-protein binding sites are based on
cross-linking followed by immunoprecipitation (CLIP) [36, 47]. The CLIP method coupling with
high-throughput sequencing permits the transcriptome-wide investigation of RNAs that interact
with the protein of interest [48]. A variety of CLIP-based methods have been developed. Among
them, high-throughput sequencing-CLIP (HITS-CLIP or CLIP-Seq) [48], photoactivatableribonucleoside enhanced CLIP (PAR-CLIP) [49], and individual CLIP (iCLIP) [50] are the three
major ones. HITS-CLIP was originally applied to identify genome-wide protein-RNA
interactions for the Nova protein family [46]. It is a well-established and effective method,
though the false negative rate is high as a result of low cross-linking efficiency [48]. Comparing
to HITS-CLIP, PAR-CLIP has improved the efficiency due to the incorporation of photoreactive
ribonucleoside analogs. This protocol has effectively increased the resolution and signal-to-noise
ratio [49, 51]. The potential drawback of PAR-CLIP is that the treatment could be toxic [52, 53],
but some studies suggest otherwise [49, 51]. iCLIP has achieved an even higher resolution and
efficiency compared to the first two. However, the experimental set-up and computational
analysis are complicated.
8
The three CLIP-based experiments along with other variations have extended the scope
of the genome-wide map of protein-RNA interactions. Choices of CLIP-based methods,
experimental design, and the selection of proper bioinformatic pipelines have been reviewed in
detail [52, 54, 55]. Such high-resolution maps of protein-RNA interactions advance the
understanding of splicing regulation, and how protein-RNA interactions play a role in human
physiological and pathological process. In addition, the detection of RNA-protein interaction can
reveal biomarkers, more importantly it could identify potential therapeutic targets [48].
The protein-RNA interaction is not the only player in regulating alternative splicing.
RNA structures also play a significant role [56]. They could either promote splicing or inhibit
splicing. In general, RNA structures aid splicing by bringing splicing signals close to each other.
For example, in the adenoviruses ADML gene, a stem-loop is formed in order to bring the 3’
splice site into the proximity of three possible branch points [57]. Another astonishing example
of promoting alternative splicing through RNA structures is through a docking and selector site
in the Drosophila melanogaster DSCAM gene. DSCAM encodes over 38,000 distinct mature
mRNA isoforms by mutually exclusive splicing of 95 alternative exons [58]. The majority of its
diversity is generated by the exon 6 cluster, which contains 48 alternative exons. A conserved
selector sequence is complementary to a portion of the docking site. This RNA-RNA base
pairing between the docking site and selector sequence ensures that only one of the 48 alternative
exons is used to produce mature mRNA [59]. This pattern illustrates that these competing RNA
secondary structures are the key element to maintain and assist the formation of protein isoforms.
As for RNA structure suppressing splicing, native RNA structures often sequester important
sequences required for splicing, thus repress the usage of splice sites. In some other cases,
structures close to exons or splice sites may have negative impact on the recruitment of U1 or U2
9
snRNPs [60]. To conclude, these delicate machineries integrate and interact with each other to
regulate alternative splicing in distinct cellular machines.
1.3.3 Alternative splicing associated human disease
Considering the important role of pre-mRNA splicing in composing protein diversity and
maintaining organism functionality, it is no surprise that disruption of normal splicing patterns
can cause gene dysfunction and even disease. There are around 20,000 human protein-coding
genes but almost 150,000 transcript isoforms. Thus, on average each human gene has about
seven transcript isoforms. Meanwhile, a recent study finds that over 30% of tissue-dependent
transcript variations are constituted by local splicing variations [61]. Given that such high level
of human genome complexity is exemplified by the flexibility of each gene with alternative
splicing, it is natural to consider such flexibility as a risk factor. Indeed, a great number of human
diseases have been reported to be linked to defective splicing.
Mutation disrupting either trans-acting regulatory proteins or cis-acting regulatory
sequences could lead to aberrant splicing. In general, trans-acting mutations are rarer than the
other. It is very likely due to the fact that disruptions in basal factors of the splicing machinery
are generally more lethal comparing to mutations altering splicing of a single gene in cis [62].
Given the role of alternative splicing in human disease, we aim to investigate its potential
contribution to aging and longevity, in Chapter 3, we explore its implications for maximum
lifespan across 26 mammal species.
1.4 Histone modification
Beside post-transcriptional regulation, transcriptional regulation like epigenetic regulation also is
a vital process that determine the transcriptional output of gene expression [63, 64]. One of the
main epigenetic mechanisms is histone modification.
10
Histones are proteins that play an essential role in the structure and organization of DNA
in eukaryotic cells [65, 66]. They provide structural support for a chromosome by allowing DNA
to wrap around complexes of histone proteins to form the nucleosome, giving the chromosome a
more compact shape to fit into the cell nucleus. There are five families of histones, H1 and H5
are known as linker histone, H2, H3, and H4 serves as core histones [67]. The former is highly
variable, it is important in the condensation of chromatin into a 30nm fiber, it binds to the
nucleosome particle [68, 69], and H1 also stabilizes nucleosome structure. H2, H3 and H4
together form nucleosome core that consist of two H2A-H2B dimers and a H3-H4 tetramer [70-
72]. The tight wrapping of DNA around histones ensures DNA remains untangled and prevent it
from DNA damage.
1.4.1 Regulation of chromatin by histone modification
Chromatin is an instructive structure that respond to external cues, and one of the regulation
mechanisms is the modification of histones. Histone modification includes acetylation,
phosphorylation, methylation, deimination, β-N-acetylglucosamine, ADP ribosylation,
ubiquitylation, sumoylation, etc. [73]. The combined impact of these modifications influences
DNA regulatory processes such as DNA replication, repair, and transcription both temporally
and spatially[74].
The most common type of histone modification are the methylation and acetylation of
specific lysine residues on N-terminal histone tails [75, 76]. Histone acetylation is a dynamic
process regulated by opposing action of two enzyme families, histone acetyltransferases (HATs)
and histone deacetylases (HDACs) [77]. HATs assisting the transfer of an acetyl group to the
lysine NH3
+ group is known as acetylation, while HDACs is capable of deacetylating by removal
of the acetyl group. Acetylation changes the accessibility of DNA for transcription. Histones
acetylation is often associated with gene activation, as it can lead to a more open chromatin
11
structure, allowing transcriptional machinery to access the DNA and promote gene expression
[78].
Similar to acetylation, histone methylation is also a dynamic process that methyl groups
can be added or removed by specific enzymes [79, 80]. Though methylation mainly occurs on
the side chains of lysine and arginine. Histone methylation can activate or repress gene
expression depending on which amino acids are methylated and how many methyl groups are
added [74]. Histone methylation has been shown to contribute to nearly all biological processes,
ranging from DNA repair, cell cycle regulation, and stress response to transcription,
development, differentiation, and aging [81]. In Chapter 4, our research highlights the role of
histone methylation in regulating apoptosis-related genes during brain development.
1.5 Comparative Analysis with Model Organisms
Cross-species studies is considered a key element in unraveling the intricacies of human biology.
While animal research is remained controversial and posing ethical challenges, the role of animal
models in understanding the human physiology, as well as physiopathology and medicine
development for diseases is still indispensable [82]. Various mammal species, spanning from
primates to rodents, has been utilized in biological research for their similarities to human. For
example, humans share approximately 99% of our genome with chimpanzees[83, 84], even with
mouse, the genetic similarity is over 90% [85]. Nowadays, Cross-species comparisons of
genomes, transcriptomes and gene regulation are now achievable at exceptional resolution,
allowing the comparison between human and other model organisms at the molecular-level [86].
1.5.1 Comparative genomics
Comparative genomics investigates the similarities and differences in the genomes of different
organisms. Genome comparisons reveal the remarkable commonality shared by all living
creatures, approximately 70% of the genes in bacteria and archaea came from ancient conserved
12
families that whose common ancestors existed billions of years ago[87, 88]. Identifying those
conserved sequence is a crucial step in comprehending the genome itself, it assists in finding the
sets of gene that are fundamental to life, enabling a deep understand of how mutations in these
genes may contribute to disease study.
Comparative genomics could also be used to address question regarding evolutionary
path amongst species, comparison of complete genome allows for a global view on genome
evolution, common ancestor and the evolutionary path could be inferred from it, in addition to
interspecific comparison, intraspecific genome comparison also shed light on the evolutionary
forces such as selective sweeps that acted on a species[89-91].
1.5.2 Comparative transcriptomics
Similar to comparative genomics, comparative transcriptomics focuses on examining the
similarities and differences in gene expression across different organisms, conditions, or
developmental stages. With comparative transcriptomic, understanding how genes are
transcribed and how transcriptional regulation vary among species or in response to different
environmental is achievable [86, 92, 93], moreover, beyond the instant response, it could also be
used to study physiological plasticity and adaptive evolution by elucidating the linkage between
organism and its environment [94]. More importantly, comparative transcriptomic facilitates the
identification of highly conserved molecular and cellular features that further enrich human
physiology [95-98].
Often comparative genomics and transcriptomics would be used together to gain a more
comprehensive understanding of biological and evolutionary processes. Comparative
transcriptomics focuses on the expression patterns of genes at the transcriptional level, while
comparative genomics studies the broader aspects of genome structure and function. The
collaborative use of these two tools provides researchers more insights in the associations
13
between genes and phenotypes and evolutionary path across species. In Chapter 2 and 3, we use
comparative approaches integrating RNA-seq data across to explore the evolutionary change of
tissue specific gene and longevity-associated splicing events. More importantly, comparative
expression and splicing profiling is not the only things interspecies study could offer, in Chapter
4, we will explore the epigenetics regulation of apoptosis-related genes during brain
development with mouse data.
14
Chapter 2 Tissue Specificity of Gene Expression Evolves Across
Mammal Species
2.1 Introduction
It is imperative to quantify and compare gene expression across tissues since this is critical to
understand gene function as well as its role in the tissue-selective manifestation of hereditary
disease. Usually, two major gene categories are studied in depth: housekeeping genes and tissuespecific genes. The formers are of utmost importance to the basic maintenance of cell function,
while tissue-specific genes are also invaluable as many diseases exhibit tissue specificity [9-12].
A systematic analysis showed that disease genes and protein complexes indeed tend to be
overexpressed in the tissues where defects cause pathology [99], highlighting the importance of
the tissue specificity studies of gene expression. In addition, understanding the tissue-specific
pattern of gene expression is essential to elucidate the molecular mechanisms of tissue
development and the tissue-specific transcriptional regulation.
With recent advances in the sequencing technology, large-scale transcriptomics studies
made it feasible to perform the phylogenetic comparative analyses of gene expression across
species [100-102]. The expression-level analyses help to understand how genomic variations
especially those on regulatory elements translate into phenotype variations across different
species [103, 104]. The evolutionary path of gene expression can then be related to
morphological, physiological and developmental characteristics of individual species.
Alternative splicing complicates the transcriptomes of higher eukaryotes. In humans, up
to 95% of multi-exon genes undergo alternative splicing to encode protein isoforms with
different functions [105]. Notably, about 15% of human hereditary diseases and cancers are
associated with alternative splicing events [106, 107]. The knowledge of splicing diversification
helps to further explain the phenotypic differences among different species [108].
15
Here, we combine the study of tissue specificity with the phylogenetic comparative
analyses of gene expression. We focus on the evolutionary changes in the tissue specificity of
gene expression. The divergent selection on the tissue specificity can help us to better understand
the impact of tissue-specific expression on the tissue-specific disease pathology as well as the
validity of disease animal models.
Recently Naqvi et al. [109] investigated the evolutionary path of sex differences in gene
expression by examining transcriptomes of 13 tissues in male and female humans, mice, rats,
dogs, and cynomolgus macaques. For the four nonhuman species, three females and three males
of each species were sampled. Another 740 human RNA-seq data sets were extracted from the
Genotype-Tissue Expression Consortium (https://gtexportal.org/home/) [110]. Utilizing such a
unique and valuable resource in transcriptomics, we conduct a comparative genome-wide, multitissue, multi-species RNA-study to explore the diversification of tissue specificity in gene
expression or alternative splicing.
2.2 Materials and Methods
2.2.1 Data acquisition
RNA-seq data of the four nonhuman species (cynomolgus macaque, mouse, rat and dog) were
downloaded from the GEO database (GSE125483) [109]. A total of 13 tissues (adipose, adrenal
gland, brain, colon, heart, liver, lung, muscle, pituitary, skin, spleen, testis, and thyroid) were
profiled in three male and three female samples of each species. RNA-seq data for human was
acquired from the Genotype Tissue Expression (GTEx) consortium (https://gtexportal.org/),
following the sample selection from the original publication [109]. The selected tissues were
described in GTEx as: adipose – visceral (omentum), adrenal gland, brain – cortex, colon –
transverse, heart – left ventricle, liver, lung, muscle –skeletal, pituitary, skin (not sun-exposed),
spleen, testis, and thyroid.
16
The homologous gene information was obtained from the Ensembl database Version 96
(https://uswest.ensembl.org/). The PhastCons conservation scores were downloaded from UCSC
Genome Browser (https://genome.ucsc.edu/).
2.2.2 RNA-seq data processing
For nonhuman mammals, short sequence reads were mapped to the reference genomes
(cynomolgus macaques to macFas5; mouse to mm10; rat to rn6; dog to canFam3) using STAR
[111] v2.6.0 with the parameters specified as: --outFilterMultimapNmax 50 --
outFilterMismatchNmax 999 -- outFilterMismatchNoverReadLmax 0.15 --outSAMtype BAM
SortedByCoordinate. The bam output files from STAR were further fed to Salmon [112] which
is a tool for fast transcript quantification from RNA-seq data. Salmon alignment-based mode was
used to quantify TPM (transcript per million) values for transcripts with the default parameters
and respective reference genomes. Salmon output files were then processed using our
customized scripts to obtain aggregated gene-level TPM values. For human RNA-seq data, we
downloaded the preprocessed TPM values directly from GTEx.
Our read mapping was not significantly impacted by duplicated genes. We examined the
mapping statistics from all nonhuman samples: there were on average 85.9% uniquely mapped
reads and 6.7% multi-mapped reads. In the downstream quantification step, Salmon handled the
multi-mapped reads based on the structure of the uniquely mapped reads to provide a transcriptlevel quantification. The final gene-level TPM values were then aggregated. Thus, the gene
expression quantification was accurate without strong effect caused by paralogous genes.
2.2.3 Tissue specificity quantification
The tissue specificity measurement τ was calculated as follows:
� = ∑ (#$%&!) "
!#$
($# ,
17
where �$) = %!
*+%$%!%"(%!)
, � = the number of tissues, �) = ���,(���) + 1) for the gene
expression in tissue i. If ���#-)-((�)) = 0, �$) = 1. τ varies between 0 and 1, where 0 for
ubiquitously expressed genes and 1 for tissue-specific highly expressed genes.
In order to apply the phylogenetic ANOVA model to formally test the divergence of
tissue specificity, we incorporated the τ-based tissue specificity measure with a � matrix
consisting of expression profiles of all individuals. Specifically, to associate τ with each tissue,
we defined �) as �) = τ �$). Thus, �) = � for the tissue with the highest expression and �) < �
for all other tissues.
For each gene, we created a matrix,
� = 7
�#,# ⋯ �#,#0
⋮ ⋱ ⋮
�1,# ⋯ �1,#0
;,
where rows were for individuals and columns were for tissues. A gene was defined as tissuespecific if the max column-average was greater than 0.8.
2.2.4 Identification of tissue-specific shifting pattern with the EVE model
The Expression Variance and Evolution (EVE) model [102] can be used to detect expression
divergence/diversity and branch-specific shift with expression profiles. Here we extended its
usage in the detection of tissue specificity shifting among species with the � matrix addressed
above. We supplied the model with � measures instead of expression data to examine the
shifting of tissue specificity. For example, for each of the human brain-specific genes, the �
matrix was constructed and fed into the EVE model to inspect whether any human brain-specific
gene experienced significant tissue specificity shift in other species. The likelihood ratio test
statistics produced by the EVE model was assessed by the chi-square test. To validate whether
the chi-squared (df=1) distribution was appropriate to approximate the null distribution of the
likelihood ratio test (LRT) statistics, we generated the null distribution based on parametric
18
bootstrap through EVE. We randomly selected 100 genes and generated the quantile-quantile
plots. As shown in Figure 2.1a, the null distributions were similar to the chi-squared (df=1)
distribution. The distribution of the p-values values based on the Kolmogorov-Smirnov tests is
shown in Figure 2.1b. A total of 94 P-values were greater than 0.05 and only one smaller than
0.01. Thus, the EVE model is generally applicable for the detection of tissue specificity shifting.
19
Figure 2.1 Comparison between the null distribution of the EVE model and the chi-squared distribution (df=1). A total of 100
genes were randomly chosen. (a) The quantiles of the LRT statistics from 100 parametric bootstrap samples generated from the
20
null (y axis) are plotted against the quantiles of the chi-squared distribution with 1 degree of freedom (x axis). The distributions
are similar. The six genes with p-values of the Kolmogorov-Smirnov tests smaller than 0.05 are highlighted. (b) The distribution
of the p-values based on the Kolmogorov-Smirnov tests.
2.2.5 Synonymous codon usage distribution
The coding sequences of human genes were downloaded from the Ensembl Biomart
(https://www.ensembl.org/biomart/). The codon usage for an interested gene was averaged from
all transcripts of that gene. A series of chi-square tests were performed to test the potential bias
usage of synonymous codons.
2.2.6 Alternative splicing of cassette exons
Cassette exons for each species were obtained from gene annotation GTF files in Ensembl
(release 97). The splicing ratio of a cassette exon was calculated as 0.5×inclusive junction read
counts / (0.5×inclusive junction read counts + exclusive junction read counts). Junction reads
from individuals of the same species were aggregated together. And we required that the total
number of exclusive junction reads or the total number of inclusive junction reads was greater
than 20 to obtain a valid splicing ratio calculation. Only valid splicing ratios were included in the
τ calculation.
To identify homologous cassette exons, we utilized BLASTn [113] to align cassette
exons from homologous genes of different species. We built a BLAST database from human
cassette exon sequences, and then blasted nonhuman cassette exon sequences against the
database. Cassette exon pairs with ≥50% positions aligned and belonging to homologous gene
pairs were homologous cassette exons. The similar strategy of identifying homologous exons had
been applied in Sakuma et al. [114]. The number of homologous cassette exons between two
species varied from 120 to 5004 (mean: 1058, median: 396). Since we used human sequences as
anchor sequences, species pairs involving humans and other well-annotated species had more
homologous cassette exons. Only 44 cassette exons had one-to-one homologous cassette exons
21
across all the five species. In the future, a more sophisticated model can be developed to further
incorporate evolutionary distances into the identification of homologous cassette exons.
2.3 Results and Discussion
2.3.1 Tissue specificity landscape of gene expression in five mammalian species
To measure the tissue specificity in gene expression, we calculated the Tau (τ) value which has
been reported as the most robust metric for tissue specificity [115]. τ ranges from 0 to 1, where 0
is for ubiquitously expressed genes and 1 for highly tissue-specifically expressed genes. Tissuespecific genes are those with average τ ≥ 0.8 among individuals of the same species. Based on
our analysis of the five mammalian species, the number of tissue-specific genes ranged from
4170 to 8194. The numbers were slightly higher than those reported in previous studies of
humans, mice and rats which were obtained from expression-fold-change based methods [116-
118]. Interestingly the number in cynomolgus macaques was over ten-fold greater than that in a
previous study [119] (4170 vs. 175). The discrepancy might be caused by different analysis
criteria, different numbers of considered tissues, as well as differences in sequencing
technologies. To our best knowledge, there is no prior research examining transcriptome-wide
tissue-specific gene expression in dogs.
The overall tissue-specificity pattern amongst the five species was similar: the highest
number of tissue-specific genes in the testis; followed by the brain, skin, and liver; and relatively
few in other tissues (Figure 2.2a). If only genes with 1-to-1 homologous across all the five
species were considered, the total number of tissue-specific genes as well as their tissue
distribution were still consistent across the five species. The testis and brain were the two tissues
with the most numbers of tissue-specific genes (Figure 2.2b). We confined our studies to 1-to-1
homologous protein-coding genes since our main focus is the evolutionary path of the tissue
22
specialty of gene expression. Such genes made up 15%-82% (median: 49%) of all tissue-specific
genes from
Figure 2.2 Number of tissue-specific genes identified across 13 tissues in five species. (a) All genes were considered for each
species. (b) Only genes with 1-to-1 homologous genes across all five species were considered.
23
individual tissues. Moreover, 1-to-many/many-to-many genes only consisted of 0-17% (median:
2%) of all tissue specific genes for each tissue. The remaining large number of genes lost their
homologous counterparts in at least one of the five species.
2.3.2 Evolutionary patterns of tissue specificity across five mammalian species
To understand the evolutionary path of a gene’s tissue specificity, we examined whether the gene
was enriched in the same tissue of different species. Thus, we counted the number of genes
which were tissue-specific in the same tissue of two species (i.e., tissue specificity maintains)
and genes which were tissue-specific in different tissues of two species (i.e., tissue specificity
shifts). Among protein-coding genes with one-to-one orthologous across all the five species,
3522 genes maintained their tissue specificity in at least two species, while 898 genes shifted
their tissue specificity from one tissue to another tissue across different species based on the τ
measurements. Figure 2.3 shows the pair-wise tissue comparison among the five species. Each
tissue pair corresponded to a circle and each circle was divided into 10 sections representing the
10 pair-wise comparison between the five species for this tissue pair. Note that the shift from
tissue A to tissue B was different from the shift from tissue B to A. The shifting direction here
was from a row tissue to a column tissue. Majority of tissue specificities were conserved
(diagonal circles). The shifting of tissue specificities was also observed (off diagonal circles).
Many of such shifting events happened from the testis to the brain tissues as well as the from the
brain to the testis since the two had the most numbers of tissue-specific genes (Figure 2.3a).
Similar findings were reported in Fukushima et al. [120]. Interestingly, many shifting events also
happened from pituitaries to brains (Figure 2.3a). Such pituitary → brain shifts were especially
enriched in the comparison between rats and cynomolgus macaques (“rc”) and the comparison
between rats and humans (“rh”). When the relative frequencies normalized by the numbers of
tissue-specific genes in the two considered tissues were used, the pituitary → brain and the
24
muscle → heart shifts were relatively more frequent than other shifts (Figure 2.3b). And the liver
and testis tissues were more likely to preserve their expression tissue specificity (Figure 2.3b).
Figure 2.3 Tissue-specificity shifting of gene expression between two tissues of two species. All genes have 1-to-1 homologous
counterparts across the five species. Each circle represents a comparison between two tissues. The shifting direction is from a
row tissue to a column tissue. Each comparison of the two tissues between a pair of species is illustrated in one sector of the
circle. Such species pairs are abbreviated by their first letters, i.e., dh represents the comparison between dogs and humans. The
arrangements of the species pairs for the sectors are shown along the heatmap. The shifting discoveries were based on the τ-only
measurements. (a) Absolute frequency. (b) Normalized frequency. For a species pair dh, the normalized tissue-specificity-shifting
frequency from tissues T to S is �!(#),&(')/#�!(#)�&('), where �!(#),&(') is the number of genes which are tissue-T-specific in
species d and tissue-S-specific in species h; �!(#) is the total number of genes which are tissue-T-specific in species d; and �&(')
is the total number of genes which are tissue-S-specific in species h.
To demonstrate the robustness of the results, we also applied a fold-change method to
detect tissue-specific genes and shifting ones. A gene was considered as tissue specific if (i) its
expression level was more than fivefold higher in a particular tissue as compared to all other
tissues; (ii) it was highly expressed in the tissue (FPKM >5). The fold-change based method is
much more stringent than the τ-based method and leads to fewer number of tissue-specific genes.
But the shifting patterns between tissues were similar and the enrichment of shifts between
related tissues (e.g., brain <=> pituitary gland, or heart <=> muscle) still existed (Figure 2.4).
25
Figure 2.4 Comparison of the tissue-specificity shifting between the τ-based and the fold change-based methods. (a)
Normalized shifting frequency based on the τ measurements. (b) Normalized shifting frequency based on the fold-change
measurements. For each tissue pair (T and S), the normalized tissue-specificity-shifting frequency is ∑#,' �!(#),&(') /
&∑# �!(#) ∑' �&('), where ∑#,' �!(#),&(') is the total number of genes shifting their tissue specificity between tissues T and S;
∑# �!(#) is the total number of genes which are tissue-T-specific; and ∑' �&(') is the total number of genes which are tissue-Sspecific.
To further confirm the shifts in tissue specificity of gene expression, we applied the
Expression Variance and Evolution (EVE) model [102] to conduct formal hypothesis tests. The
EVE model was originally developed to examine the evolution of gene expression by comparing
the expression variance between species and that within species. Here, we are interested in the
divergence of tissue specificity and hence focus on scenarios with differences of tissue
specificity between species larger than those within species. To obtain a tissue-specificity score
per tissue, we defined our own Phi (φt) score by associating τ to the gene expression in tissue t
normalized by its highest expression across all tissues (see Methods). Thus, φt = τ for the tissue
with the highest expression; φt < τ for all other tissues. Considering a gene in a tissue t, if it met
the tissue specificity cutoff (τ ≥ 0.8) in at least one species, we calculated φt for all individuals
across the five species and then conducted the phylogenetic ANOVA through EVE. A total of
380 genes were declared as significantly shifting their tissue specificity (FDR < 0.05). Thus, for
26
those genes, the tissue-specificity variation between species was significantly larger than that
within species.
2.3.3 Genes with shifting tissue specificity have a distinct conservation pattern
To assess the impact of the sequence-level conservation on the regulation-level conservation, we
examined both the coding sequence (CDS) regions and the promoter regions of genes with
shifting tissue specificity assessed by our EVE model. We observed that shifting genes were
significantly less conserved in CDS regions compared with other tissue-specific genes without
shifting as well as housekeeping genes (the one-sided Wilcoxon signed-rank test, p-values =
0.0027 and < 2.2×10−16, respectively, Figure 2.5a).
More interestingly, shifting genes tended to have less conserved third-positions in
codons. Thus, the 3-nt periodicity pattern for shifting genes showed deeper conservation
decreases on the third positions (Figure 2.5b). The conservation status in promoter regions (or
5’UTRs) also had a distinguishable pattern (Figure 2.5c): shifting genes were less conserved in
the immediate upstream region (−100 bp to 0) of start codons. However, they exhibited a
conservation peak around (−250bp, −100 bp) which was not observed in other non-shifting
tissue-specific genes and housekeeping genes. The lengths of 5’UTRs were similar for the three
gene groups (Figure 2.6). The median length of 5’UTRs based on the human annotation was 105
bp, 109 bp, and 105 bp for tissue-specific shifting genes, tissue-specific non-shifting genes, and
housekeeping genes, respectively. Thus the (−100 bp to 0) region upstream of start codons was
more likely to correspond to 5’UTRs and the (−250bp, −100 bp) region was more likely to be
located in promoter regions. Previous studies have shown that housekeeping genes evolve more
slowly as a result of purifying selection acting on both CDS regions and core promoter regions of
housekeeping genes [121]. Here, the conservation difference between tissue-specific genes
27
shifting their tissue specificity and those preserving their tissue specificity also suggests the
different selection constraints on these genes.
Figure 2.5 Conservation comparison for genes with different degree of tissue specificity. Genes were grouped as tissuespecificity-shifting genes, tissue-specificity-not-shifting genes, and housekeeping genes. (a) Conservation scores of CDS
regions for different gene groups. For each gene, the conservation score was calculated as the average of the PhastCons scores
across CDS positions. For genes with multiple transcripts, the average scores of all transcripts were used. (b) Comparison of the
conservation scores at the 3rd positions of codons for different gene groups. The median PhastCons score at each CDS 3rd
position relative to the 5’ and 3’ ends is shown. (c) Comparison of the conservation scores at promoter regions for different gene
groups. The 3rd quartile of PhastCons score at each relative position is shown. The human annotations are used.
28
29
Figure 2.6 Density curve of 5’UTR lengths for genes with different degree of tissue specificity. Genes are grouped as tissuespecificity shifting genes, tissue-specificity non-shifting genes, and housekeeping genes. (a) The 5’ UTR lengths are based on the
human annotation (Ensembl Genes 105, GRCh38.p13). (b) The 5’ UTR lengths are based on the mouse annotation (Ensembl
Genes 105, GRCm39).
2.3.4 Subgroups of genes with shifting tissue specificity
Among those 380 genes with significant tissue specificity divergence, we further categorized
them into four subgroups (Figure 2.7a): (A) complete shifting: tissue t specific in some species,
and tissue s (s≠t) specific in some other species (not necessarily in all). The τ-only-based study in
Figure 2.3 focused on such complete shifting. If not (A), then (B) intermediate shifting: tissue t
specific in some species; and in some other species the gene is not expressed (TPM < 5) in tissue
t and has much higher expression in other tissues although not meeting the tissue specificity
cutoff yet. If not (A) and (B), then (C) specificity loss: tissue t specific for some species; in other
species, the gene lost tissue specificity via not expressed at all (the highest TPM among all tissue
< 5). If not (A), (B) and (C), then (D): initial shifting: tissue t specific for some species; and in
other species the gene is expressed in the tissue t (TPM ≥ 5) but has much higher expression in
other tissues (at least twice of the expression in tissue t) while not meeting the tissue specificity
cutoff yet. As shown in Figure 2.7a, most of the shifting patterns fell into the subgroup C
(68.9%), which lost its tissue specificity by not being expressed in any tissue for certain species.
Intermediate shifting consisted of another 23.7% of these divergent genes. Only 13 genes
(PLET1, RDH8, FNDC9, DHRS2, CCN4, ANXA10, C14orf39, LCTL, S100A5, IRX6, RIPPLY2,
CRYGN, MROH6) completely shifted their tissue-specificity from one tissue to another tissue
under the more stringent statistical tests, compared to 898 genes in the τ-only-based study.
We examined the potential conservation differences among the complete shifting genes
and other incomplete shifting genes. As shown in Figure 2.7b, complete shifting genes tend to be
more conserved (further based on the signs of the t test statistics) in the 60-90 and 280-290 base
pair positions upstream of the CDS regions. However, in the third codon positions, no
30
significantly different evolutionary signal was observed between complete shifting and
incomplete shifting genes (Figure 2.7c), possibly due to the short amount of evolutionary time.
Although some third codon positions exhibited a significant P-value (student’s t-tests) < 0.05,
none of them passed the false discovery rate (FDR) threshold of 0.1.
Figure 2.7 Subgroups of genes with shifting specificity measured by EVE. (a) Number of genes for each shifting subgroup. A:
complete shifting; B: intermediate shifting; C: specificity loss; and D: initial shifting. (b) The comparison of PhastCons scores
between complete shifting (subtype A) and incomplete shifting genes (subtypes B-D) at each promoter position. P-values from
Student’s t tests are shown. (c) The comparison of PhastCons scores between complete shifting and incomplete shifting genes at
third codon positions. P-values from Student’s t tests are shown.
31
2.3.5 Synonymous codon usage bias in different gene groups
Since tissue-specific genes especially those shifting ones exhibited less conserved third positions
of codons, we investigated their synonymous codon usages. We examined all 19 codons with the
codon degeneracy (18 amino acids and the stop codon). As shown in Figure 2.8, all 19 codons
exhibit significantly biased synonymous codon usages (p-values ≤ 0.001, chi-square tests).
However, the severity of codon usage bias is very different for different gene groups. The codon
usage for tissue-specificity-shifting genes was more relaxed or uniform with an average p-value
of 1.1×10−4 for the Chi-square goodness of fit tests (brown bars in Figure 2.8). Housekeeping
genes exhibited the highest codon usage bias with very extreme p-values (blue bars in Figure
2.8), while tissue-specific but not shifting genes showed an intermediate level of codon usage
bias (purple bars in Figure 2.8). Thus, tissue-specificity-shifting genes had less constraint on the
selection of synonymous codons and displayed less conserved third codon positions.
Figure 2.8 Synonymous codon usage bias analysis for different gene groups. Single-letter amino acid codes (* represents the
stop codon) for the 19 codons with degeneracy are shown along the horizontal axis. The vertical axis represents the significance
of the codon usage bias (−log10 p, truncated at 300 for those very significant ones with ‘>’ on the bars) for a specific codon in
the three gene groups. The p-value was based on the Chi-square goodness of fit test.
32
2.3.6 Functional classification for different gene groups
Gene ontologies for genes maintaining or shifting their tissue specificity were functionally
clustered using the PANTHER classification system (http://pantherdb.org/, v.14.0). For genes
maintaining their tissue specificity, we generated the PANTHER analysis for each tissue
separately. We particularly focused on brain-specific and testis-specific genes since they were
the two tissues with the most numbers of tissue-specific genes. Brain-specific genes were more
enriched for ontology categories compared to testis-specific ones. For example, we found 103
function hits out of 130 brain-specific genes, comparing to only 127 hits out of 349 testisspecific genes (two-proportions z-test, p-value < 2.2×10−16). The top three molecular function
GO terms in brain-specific genes are ‘binding’, ‘transporter activity’ and ‘molecular transducer
activity’. Only 19 pathway hits were found for testis specific genes, yet unexpectedly, we found
two pathways involving neuron degenerative disease. Testis-specific genes DNAI2, GAPDHS,
CAPN11 and DNAH8 are involved in the Huntington disease pathway, while PSMA8 is in the
term of Parkinson disease. All the five genes are indeed testis enriched when we examined their
protein-level tissue specificity from the Human Protein ATLAS database
(http://www.proteinatlas.org).
In the functional classification for shifting genes combining all tissues, we found 242 hits
out of the 380 shifting genes for molecular function. The top three molecular function GO terms
are ‘binding’, ‘catalytic activity’, and ‘molecular function regulator’. For the pathway analysis, it
was not surprising to see that many shifting genes were involved in many different signaling
pathways such as Nicotinic acetylcholine receptor signaling pathway, TGF-beta signaling
pathway, Wnt signaling pathway, and so on (Figure 2.9).
33
Figure 2.9 Gene ontology pathway analysis for genes with shifting tissue specificity.
2.3.7 Tissue specificity shifting at the alternative splicing level
Besides the shifting of tissue specificity at the gene level, we examined the shifting at the
splicing level as well. We applied the Tau (τ) method substituting gene expression level with
exon splicing ratios to examine possible tissue specificity shifting at the splicing level. Figure
2.10 shows the pair-wise comparison among the five species. We still observed that majority of
tissue specificities were conserved along the diagonal line. However, the ratio of specificitymaintaining cassette exons compared to specificity-shifting ones was far less than that for genes.
A total of 366 cassette exons maintained their tissue specificity in at least two species. And 260
cassette exons shifted their tissue specificity at the splicing level. The tissue specificity of the
splicing of cassette exons was not as conserved as that of gene expression. They were more
34
prone to shift their tissue specificity (ratio: 260/366 vs. 898/3522, two-proportions z-test, p-value
< 2.2×10−16). Moreover, we observed hotspots of splicing-specificity-shifting events such as
pituitary → brain, muscle → brain, and testis → brain (Figure 2.10a). In addition, brains were
more likely to preserve their tissue specificity of splicing (Figure 2.10). Alternative splicing has
been reported to evolved more rapidly than gene expression [108]. Here we showed that the
tissue-specificity of alternative splicing also tended to evolve more rapidly than the tissuespecificity of gene expression. Since the EVE model requires using 1-to-1 homologous cassette
exons across all species, yet only 44 such cassette exons exist across all the five species, we only
explored such tissue-specificity-shifting with the τ measurements.
Figure 2.10 Tissue-specificity shifting of alternative splicing between two tissues of two species. All homologous exons between
two species are considered and they may not have the 1-to-1 homologous counterparts in other species. Each circle represents a
comparison between two tissues. The shifting direction is from a row tissue to a column tissue. Each comparison of the two
tissues between a pair of species is illustrated in one sector of the circle. The shifting discoveries were based on the τ-only
measurements. (a) Absolute frequency. (b) Normalized frequency. For a species pair dh, the normalized tissue-specificity-shifting
frequency from tissues T to S is �!(#),&(')/#�!(#)�&('), where �!(#),&(') is the number of cassette exons whose splicing ratios
are tissue-T-specific in species d and tissue-S-specific in species h; �!(#) is the total number of cassette exons whose splicing
ratios are tissue-T-specific in species d; and �&(') is the total number of cassette exons whose splicing ratios are tissue-S-specific
in species h.
2.4 Conclusion
In addition to the divergence of gene expression or alternative splicing, the tissue specificity of
gene expression also diverged along the evolutionary path. The divergence of tissue specificity
35
may underlie the diverse phenotypes across different species. The understanding of such
diversification is valuable to dissect human disease pathology.
36
Chapter 3 Alternative Splicing and its Implications for Maximum
Lifespan
3.1 Introduction
Aging is a dynamic, irreversible, inevitable process associated with gradual changes or
deterioration over multiple body systems, leading to an increased mortality risk [122]. With
advances in medical science, the life expectancy of the general population has substantially
increased. Yet, such a trend gives rise to an increased incidence of various chronic ageassociated diseases [123]. Currently, aging is considered the most important risk factor for most
chronic diseases, including cardiovascular disease, atherosclerosis, cancer, neurodegenerative
diseases, etc. [124-126]. Maximum lifespan (MLS) refers to the upper limit of how a species can
achieve. Human MLS is significantly limited by aging and chronic aging-associated diseases
[127]. Studies of maximum lifespan shed light on the fundamental aspect of the biology of aging
and provide valuable insights into the potential for extending healthy human lifespans.
Comparative studies examining the variability of lifespans across different species
provide a panoramic view of the complicated tapestry of longevity and aging. Modern
sequencing technologies allow us to pinpoint genetic signatures associated with extended or
truncated lifespans. Identifying candidate genes implicated in maximum lifespans offers a
treasure trove of targets for further investigation and potential interventions for aging-related
disease. For example, by juxtaposing the gene expression profiles of species with diverse
lifespans, we gain invaluable insights into the conserved pathways, unique adaptations, and
potential regulatory networks that contribute to the pace and trajectory of aging [128, 129]. In a
recent study, Lu et al. [130] identified thousands of genes whose expression levels exhibited
negative or positive correlations with species-specific MLS. Remarkably, genes manifesting a
negative correlation with MLS were enriched in energy metabolism, inflammation, and
37
controlled by circadian regulatory factors. Conversely, genes displaying a positive correlation
with MLS exhibited enrichment in DNA repair and were controlled by the pluripotency network.
Extensive investigations have illuminated the dynamic landscapes of gene expression
changes tied to the aging process, attributing to both transcriptional and post-transcriptional
regulations [131]. Despite these significant progresses, alternative splicing (AS) in the
comparative studies of MLS and aging has remained underexplored. Alternative splicing, by
which multiple distinct transcript isoforms are generated from a single gene, contributes to
protein diversity in higher eukaryotes [15]. In humans, up to 95% of multi-exon genes undergo
AS to encode proteins with different functions in distinct cellular processes [16]. Furthermore,
around 15% of human hereditary diseases and cancers are reported to be associated with AS [17,
18]. Yet, the study that forges the association between AS and the intertwined realms of aging
and longevity is still in its nascent stages [132].
To systematically identify longevity-associated splicing, we conducted a comparative
analysis that amalgamates the Rodentia and Eutlipotyphyla dataset with human data sourced
from the Genotype-Tissue Expression (GTEx) project. Our central aim is to uncover AS events
associated with lifespan and aging. By harnessing the power of transcriptomics, our study sheds
light on the relationship between AS and the complex trajectory of longevity, enriching our
understanding of the multifaceted aging process.
3.2 Materials and Methods
3.2.1 Data acquisition
RNA-seq data of six tissues, including brain, heart, kidney, liver, lung, and skin, from 577
samples from 141 individuals covering 26 mammal species were downloaded from the GEO
database (GSE181413 and GSE190756) [130]. For the human data, transcript- and gene-level
38
expression, splicing junction read counts, and other sample covariates were acquired from the
Genotype Tissue Expression (GTEx) consortium (https://gtexportal.org/).
3.2.2 RNA-seq-based de novo transcriptome assembly
We first implemented an analysis pipeline similar to that introduced by Lu et al. [130] to perform
the de novo transcriptome assembly. The RNA short sequence reads were trimmed with
Trim_Galore v0.6.6 (https://github.com/FelixKrueger/TrimGalore) to remove poor quality reads
and adapter sequences (Phred scores < 20). The processed reads were then aligned to the
reference genome (if available) with HISATS2 v2.2.1 [133] with parameters specified as: --dta --
very-sensitive. In cases where a reference genome was unavailable, the closest accessible
neighboring species was used. For each RNA-seq sample, the aligned reads were subsequently
processed by StringTie v2.1.7 [134] to conduct the de novo transcriptome assembly with the
default setting. The assembled transcripts from all the samples for each species were merged
using StringTie merge mode with the parameter -T 0.5 -g 50. A non-redundant set of transcripts
was then generated for the corresponding species.
3.2.3 Homologous alternative splicing identification
The GENCODE human and mouse gene annotations and the annotations derived from our de
novo transcriptome assemblies were used to extract a repertoire of seven types of AS events
(exon skipping, alternative 5’ splice sites, alternative 3’ splice sites, mutually exclusive exons,
retained intron, alternative first exons, and alternative last exons) with SUPPA v2.3 [135]. To
identify homologous AS events, we constructed a BLAST database using mouse exon/intron
sequences (mm10) as the anchor. The sequences of other mammal species and humans were then
blasted against the anchor database. Homologous exon/intron sequences were identified when
≥80% of the positions were aligned. For certain AS types involving two distinct exons, such as
mutually exclusive exons and alternative first and last exons, both exons must display qualified
39
alignments with their homologous counterparts. Further details on the specific regions used for
BLAST can be found in Figure 3.1. In cases where multiple sets of sequences met the criteria,
the one with the highest alignment percentage was selected. AS events with homologous
counterparts in at least ten species were included in the downstream analysis.
Figure 3.1 Sequence regions used for homologous alternative splicing identification. (a) constitutive splicing; (b) exon
skipping (cassette exons); (c) mutually exclusive exons; (d) alternative 5’ splice sites (alternative donors); (e) alternative 3’
splice sites (alternative acceptors); (f) intron retention; (g) alternative first exons; (h) alternative last exons.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
Pre-mRNA mRNA BLAST region
40
41
Figure 3.2 Workflow of alternative splicing identification and quantification. (a) other mammal species and (b) human
3.2.4 Alternative splicing quantification
The transcript-level and gene-level expression of each non-human sample was estimated using
the StringTie expression estimation mode. The transcript-level expression data for humans were
directly downloaded from GTEx. Subsequently, SUPPA was employed to quantify the splicing
level of each AS event in terms of percent spliced-in (PSI), utilizing transcript-level expression
data.
3.2.5 Models for detecting MLS- or age-associated alternative splicing events
We calculated Spearman’s rank correlation coefficient between splicing ratios (i.e., PSI) and
maximum life spans of the nonhuman mammals to detect MLS-associated AS events. To ensure
sufficient statistical power, we only examined AS events whose genes were expressed in at least
1/3 of tissue samples with TPM >0.5. AS events with homologs in at least ten mammal species
and the PSI range (|max PSI – min PSI|) ≥0.2 were considered. Those with an absolute
correlation >0.4 and FDR <0.05 were declared significant.
To detect significant age-associated events with human GTEx data, we implemented an
elastic net regression model that considers all splicing events simultaneously and takes into
account existing covariates to remove potential confounding factors. The covariates include sex,
the top five of the genotyping principal components, 15 PEER factors, as well as sequencing
platform and protocol. The same set of covariates has been used in the GTEx sQTL analysis. The
elastic net model was built as follows:
���) = �2 + ? �3 ∗ ���3,)
(
34# + ? �(53 ∗ ���3,)
*
34# + e );
where ���) denotes the age of human sample i, ���3,) is the kth AS event, ���3,) is the kth
covariate provided by GTEx, �� are the corresponding coefficients, and e ) is the error in the
model. To ensure robustness, we only used alternative exons whose gene was expressed
42
(TPM >0.5) in at least 1/3 of tissue samples. Additionally, we imposed a minimum difference of
0.2 between the maximum and minimum PSI values to ensure substantial splicing changes. An
AS event that has a non-zero coefficient in the elastic net model was considered age-associated.
3.2.6 Phylogenetically independent contrasts
To examine the correlation between AS and MLS in a way that accounts for the evolutionary
relationships among species, we calculated the phylogenetically independent contrasts (PIC).
The PSI measurements obtained from SUPPA were used as input and the “pic” function from
the R package APE was employed to conduct PIC [136]. The construction of the phylogenetic
tree for the considered species was through VertLife [137].
3.2.7 Gene set enrichment analysis
Gene set enrichment analysis (GSEA) [138] was conducted to explore the functional enrichment
of MLS-associated cassette exons with the R package clusterProfiler v4.4.4 [139]. All genes
were pre-ranked based on Spearman’s rank correlation coefficients between their splicing PSI
values and the maximum lifespans. In cases where a gene harbored multiple AS events, the
highest correlation coefficient among those events was selected to govern the gene's ranking.
Gene sets of the Gene Ontology database (GO) [140] and the Kyoto Encyclopedia of Genes and
Genomes database (KEGG) [141] were analyzed. GO analysis was also performed on the
overlap gene group between MLS-associated and age-associated events using the Database for
Annotation, Visualization and Integrated Discovery (DAVID) classification online tool.
3.2.8 RNA-binding protein analysis for MLS (or age)-associated alternative splicing
To examine the upstream regulators of MLS or age-associated AS events, we conducted the
following analytical approach. The exonic and (or) intronic sequences used in the homologous
AS identification were provided to RBPmap[142], a web-based tool for mapping binding sites of
RNA-binding proteins (RBPs). Default parameters and consensus RNA binding motifs for both
43
humans and mice were applied in the analysis. For each RBP, we counted the motif occurrences
and normalized them based on the region length for each AS event.
To identify enriched RBPs for our interested AS groups, we compared the MLS or ageassociated AS group with randomly selected AS groups. Specifically, we randomly selected a
group of alternative splicing events matching the same composition of alternative splicing types
observed in the target group (MLS or age group). This process was repeated 10 times to generate
multiple random groups. The enrichment ratio was computed by comparing the average motif
occurrence in the target group with that in the 10 random groups. At the individual AS event
level, we calculated the ranking of its length-normalized motif frequency among all AS events
conserved in at least ten mammal species, and then we performed hierarchical clustering based
on the rankings to detect interesting RBP and AS event clusters.
3.3 Results
3.3.1 Homologous alternative splicing events in Rodentia and Eutlipotyphyla species
To reduce variables in identifying longevity-associated splicing, we designed comparative
transcriptomic analyses across tissues and between species within the same order or closely
related orders and displaying a wide spectrum of lifespans. This design is most appropriate also
because alternative exons are generally less conserved than genes between evolutionary-distant
species, increasing the challenges of comparative transcriptomic analyses. The best available
dataset is tissue RNA-seq transcriptomes of 26 species in the orders Rodentia and Eutlipotyphyla
whose lifespans range from 2.2 to 37 years [130], First, we needed to identify homologous
alternative splicing (AS) events for the 26 species, for which we developed an integrated pipeline
through transcriptome assembly and sequence comparison (Figure 3.2). The homologous events
were obtained through sequence alignments against mouse sequences (Figure 3.1, details in
Methods). As shown in Figure 3.3, the number of homologous events correlates with the
44
evolutionary distance to mice, ranging from 900 to over 3700. Exon skipping is generally the
most abundant AS type for individual species (Figure 3.4a), and many of them are conserved
among the Rodentia and Eutlipotyphyla species (Figure 3.4b). Mutually exclusive exons are the
least AS type in 24 out of the 26 species (Figure 3.4a), but many of them are conserved with
homologs in multiple species (Figure 3.4b). On the other hand, alternative first or last exons are
commonly observed in individual species (Figure 3.4a), but few of them are conserved across the
considered Rodentia and Eutlipotyphyla species (Figure 3.4b). In this comparative study, we
focus on AS events with homologous counterparts in at least ten species. In total, we identified
1974 such homologous AS events distributed among 1267 genes.
45
Figure 3.3 Homologous alternative splicing events in Rodentia and Eutlipotyphyla species. Sequences are aligned against the
mouse BLAST database.
46
Figure 3.4 Seven types of alternative splicing events in nonhuman. (a) AS events within individual species. (b) Homologous AS
events identified against the mouse BLAST database.
47
3.3.2 Alternative splicing events associated with MLS in Rodentia and Eutlipotyphyla
species
To explore the role of AS in maximum life span (MLS), we calculated the Spearman correlation
between the PSI (percent spliced in) values of an alternative exon and the lifespans.
Approximately 220-270 AS events were identified as MSL-associated (absolute Spearman
correlation >0.4 and FDR <0.05) in each tissue. About half of them are positively (Figure 3.5a)
or negatively (Figure 3.5b) correlated with MLS. Thus, 37% (731 out of 1974) of all considered
homologous AS events are associated with MLS in at least one of the six tissues, suggesting the
prevalence of significant AS in MLS. Interestingly, the majority of these AS events are exon
skipping (Figure 3.6).
48
Figure 3.5 Maximum life span (MLS)-associated alternative splicing events in nonhuman. Bar plots illustrate the number of
AS events where the percent spliced-in (PSI) level is positively (a) or negatively (b) correlated with MLS, body mass (BM), or
both. Jaccard index values are calculated to show the similarity between BM-associated and MLS-associated AS events: (c)
positive association, and (d) negative association.
Figure 3.6 Type distribution of MLS-associated alternative splicing events.
49
Previous studies suggest that larger rodents generally have longer lifespans [143]. To
explore its possible connection to alternative splicing, we first determined body mass (BM)
associated splicing. About 250 BM-associated AS events were identified in each tissue. BMassociated events significantly overlap with MLS-associated events (Figure 3.5a and b),
supporting the idea of interconnected mechanisms underlying longevity regulation. Interestingly,
such overlap is the smallest for the brain tissue, demonstrated by the lowest Jaccard index values
(Figure 3.5c and d). This lower similarity in the brain indicates that the rate of brain aging is
more distinct from the effects caused by body mass. Interestingly, this pattern was not observed
in the correlation study with gene expression (Figure 3.7), suggesting that underlying
mechanisms could segregate AS and gene expression in the aging process.
Figure 3.7 Overlap (Jaccard index value) between positively (a) or negatively (b) MLS-associated and BM-associated gene
expression.
Figure 3.8 illustrates two examples of AS events in genes that have been implicated in
lifespan regulation. An exon skipping event in Fn1 displays significant positive correlation with
MLS. Fn1 encodes a fibronectin, a glycoprotein that binds to membrane-spanning receptor
proteins integrins, and extracellular matrix proteins like collagen, fibrin [144-146]. Previous
50
studies reported that dysregulations of the AS in this gene decrease mean and maximum
lifespans [147]. Figure 3.8b shows another exon-skipping event in Arnt3 with a significant
negative correlation with MLS. Arnt3 encodes a transcription factor that contains a basic helixloop-helix (bHLH) domain and two Per-Arnt-Sim (PAS) domains. Along with another member
of its family, CLOCK, they form the essential components of the molecular oscillator responsible
for generating circadian rhythms [148, 149]. Previous studies show that Arnt3 knockout mice
exhibit symptoms of premature aging, including shorter lifespan, sarcopenia, cataracts, decreased
subcutaneous fat, and organ shrinkage [150]. Our analysis revealed that AS of Arnt3 is
associated with MLS.
Figure 3.8 Scatterplots illustrating two MLS-associated alternative splicing events. (a) Exon skipping event in which its PSI is
positively correlated with MLS in the liver. (b) Exon skipping event in which its PSI is negatively correlated with MLS in the
kidney. Spearman correlation coefficients (�) of PSI and MLS are shown in the plot. Phylogenetically corrected coefficients are
shown in the parenthesis.
To assess the potential impact of the evolutionary history of species on our results, we
conducted phylogenetically independent contrasts of PSI with our original data. After applying a
threshold of absolute correlation greater than 0.2 for the corrected data, we observed that 83% of
associations retained their significance even after correction. This suggests the robustness of our
discovered associations.
51
3.3.3 MLS-associated alternative splicing events exhibit distinct patterns in the brain
Our study observed that although 342 out of 731 (46.8%) AS events are associated with MLS in
more than one tissue, the other 389 are tissue-specific (Figure 3.9a). The brain tissue contains the
highest proportion of tissue-specific MLS-associated AS events in both positive and negative
association. Our distinct set of MLS-associated AS events underscores the unique regulation of
brain aging.
Figure 3.9 Tissue specificity of MLS-associated alternative splicing events. (a) Upset plots showing positive (Pos-MLS AS) or
negative (Neg-MLS AS) MLS-correlated AS events identified in six tissues. (b) Hierarchical biclustering of AS events across
tissues based on Spearman correlation coefficients with maximum lifespan. All considered homologous AS events are plotted.
The unique role of AS in brains is also reflected through the hierarchical clustering of the
association coefficients across different tissues. As shown in Figure 3.9b, although the
correlation intensities between AS and MLS vary across tissues, most MLS-associated AS events
still exhibit consistent correlation direction across tissues (Figure 3.9b). However, many positive
correlations in the brain tissue display little or even slightly negative correlations in all other
52
tissues. Conversely, many negative correlations within the brain exhibit positive correlations in
the other five tissues.
Figure 3.10 show twelve examples of brain-specific MLS-associated splicing events that
consistently demonstrate an opposite correlation direction against all five other tissues. Such an
opposite correlation pattern was rarely observed in the remaining tissues, with only one such
splicing event in each of the heart and liver tissues.
Figure 3.10 Brain-specific association between alternative splicing and maximum lifespan. For these AS events, their
correlations with MLS in the brain have opposite signs compared to other tissues. ES: exon skipping; A5: alternative 5’ splice
sites; A3: alternative 3’ splice sites.
3.3.4 Functional enrichment of genes with MLS-associated alternative splicing
To identify functional enrichment of MLS associated AS events across all tissues and within
individual tissues, we ranked genes based on their splicing events’ Spearman correlation
53
coefficient with MLS to perform a gene set enrichment analysis (GSEA) for the GO and the
KEGG pathways. Figure 3.11 shows the GSEA dot pot of enriched terms for all tissues. These
analyses identified previously reported aging-associated pathways. For example, the DNA
damage checkpoint signaling pathway is crucial for preserving genome integrity and preventing
mutation accumulation that can lead to various diseases, including cancer. The pathway not only
plays a crucial role in the cellular response to DNA damage but its dysregulation and persistent
activation can contribute to cellular senescence and age-related changes. Further investigating
the AS events involved in this pathway may help unravel novel molecular mechanisms
underlying the aging process [151-153]. Similarly, the mRNA surveillance pathway (Figure
3.11b) plays an important role in monitoring and degrading abnormal or faulty mRNA
molecules. It plays a crucial role in maintaining the integrity and quality control of mRNA
transcripts within the cell [154]. The BMP signaling pathway helps mitigate the negative effect
of aging on the osteogenic differentiation and regenerative capacity of aged muscle-derived stem
cells [155, 156]. As expected, each tissue exhibited different functional enrichments since they
contain somewhat different MLS-associated events (Figure 3.12). Interestingly, we identified a
longevity-regulating pathway in the brain tissue, corresponding to the unique pattern of AS
events observed in the brain. This finding also suggests that the brain may play a role in
regulating longevity through distinct molecular mechanisms.
54
Figure 3.11 GSEA dot pots of enriched terms with all tissue merged for AS-MLS association calculation. (a) Gene Ontology
(GO) terms; (b) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
55
Figure 3.12 GSEA dot pots of enriched KEGG pathways within individual tissues. (a) Brain, (b)Skin, (c) Kidney, (d)Lung, (e)
Liver, (f)Heart.
3.3.5 Age-associated alternative splicing events across human tissues
With the GTEx human data, we investigated the association between human ages and AS
changes across multiple individuals. To tackle the complex relationships among individuals and
better utilize existing measured covariates, we employed an elastic net regression model that
considers all AS events simultaneously. Our elastic net regression resulted in similar numbers of
56
age-associated AS events for different tissue types (or subtypes) (Figure 3.13a). Interestingly, a
previous study that used the same dataset but the traditional regression mode on individual AS
events obtained a wide range of age-associated AS events across different tissues (or sub-tissues)
[157].
Figure 3.13 Age-associated alternative splicing events in humans. (a) The number of age-associated AS events in individual
human tissues. (b) Heatmap showing pairwise tissue similarity based on Jaccard index values for age-associated AS events.
57
These events are identified by our elastic net regression. (c) The number of age-associated AS for six major tissues. Events from
sub-tissues are pooled together.
To evaluate the model performance, we calculated and visualized the pairwise similarity
between tissues with the Jaccard index value (Figure 3.13b). We observed a high Jaccard index
among similar tissues. On the contrary, when AS events were considered individually in
traditional regression models, the tissue similarity is generally lower (Figure 3.14). This suggests
that the elastic net regression method effectively identified age-associated AS events by
modelling multiple AS events simultaneously. The better performance of the elastic net
regression model also indicates that AS events may work in a collaborative way, contributing to
the regulation of the aging process. Yet despite observing a clear pattern of higher pairwise
similarity between sub-tissues belonging to the same primary tissue, we found that most ageassociated splicing events are still specific to sub-tissues. The union of age-associated AS events
across all sub-tissues in the brain tissue resulted in over 1000 events (Figure 3.13c).
58
Figure 3.14 Heatmap showing pairwise tissue similarity for age-associated splicing identified by traditional regression models
fitting alternative splicing individually. The overlap similarity is based on Jaccard index values.
3.3.6 Alternative splicing events associated with both MLS and age are enriched for
intrinsically disordered proteins
Upon comparing the age-associated and MLS-associated AS events, we identified 139
overlapping AS events (i.e., age-MLS AS events) across six tissues. The distribution of different
types of AS for the overlapping events is similar to the composition of MLS-associated splicing,
with exon-skipping events being predominant. further analysis for GO enrichment using DAVID
revealed intriguing findings for this group of genes.
59
Figure 3.15 Percentage of IDP genes in humans. IDP genes are those associated with the GO term “REGION: Disordered”.
One-sided Fisher’s exact tests are used for the percentage comparison.
As shown in Figure 3.15, 90.6% of the age-MLS genes are associated with the GO term
"REGION: Disordered," indicating a significant enrichment of genes encoding intrinsically
disordered proteins (IDPs). IDP-containing proteins play an essential role in signaling and stressresponse processes [158]. In comparison, only 62.7% of human genes harboring AS events are
IDPs (p-value = 1.05×10−12, one-sided Fisher's Exact Test). For genes with homologous AS
events in humans and ≥10 nonhuman species, the proportion of IDP is also lower than age-MLS
group (84.4%, vs. 90.6%, p -value = 0.037, one-sided Fisher’s Exact Test).
When compared to the age-associated group, age-MLS AS events are significantly more
enriched with IDPs (90.6% vs. 82%, p -value = 0.009, one-sided Fisher’s Exact Test). Although
not reaching statistical significance, the age-MLS group displayed a slightly higher proportion of
IDP genes compared to the MLS-associated group (90.6% vs. 88%). The results suggest a
potentially stronger contribution from MLS-associated events to this enriched pattern than age-
60
associated events. Further investigation is needed to explore why IDPs with conserved AS
events, especially those associated with MLS, are linked to aging.
3.3.7 Exploration of upstream regulators of MLS and age-associated AS events
To better understand the potential upstream regulators linked with MLS or age-related AS
events, we employed RBPmap [142] to map the binding motifs of RNA-binding proteins (RBPs)
for each considered AS event. RBPs exhibiting enriched motif occurrences in a target group
were deemed potential master upstream regulators for this AS group. Details of the enrichment
analysis can be found in the Methods section. Figure 3.16 illustrates the enrichment ratio for the
top 20 RBPs for the MLS or the age-related AS group (AS events from various tissues are
pooled). The MLS-associated AS group demonstrates notably more significant RBP motif
enrichment than the age-associated AS group.
Figure 3.16 Top 20 most enriched RBP compared to the control group in MLS/age-associated AS events from various tissues
are pooled.
Furthermore, our researched the functions of the top 20 significantly enriched RBPs and
found that over half of them are relevant to the aging process. One notable gene is PPARG
related coactivator 1 (PPRC1), an essential regulator of energy metabolism, glucose homeostasis,
and antioxidant defense mechanisms [159-161]. A recent study revealed that upregulating
PPRC1 via PPARG leads to enhancements in oxidative stress, a reduction in inflammation, and
61
an extension of lifespan[162]. RNA-binding protein 6 (RBM6) interacts with the DNA repair
protein RAD51 and is recruited to DNA damage site, facilitating homologous recombinational
repair [163]. DNA damage significantly contributes to aging by inducing genomic instability and
cellular senescence [151, 164, 165]., EIF4G2, SRSF4, and SRSF5 are involved in cellular
senescence [166-168]. RNA-binding protein 8a (RBM8A) was reported to exhibit a declining
expression pattern associated with aging [169] and play a role in several age-associated
progressive neurological diseases, including Alzheimer's disease [170]. HNRNPH2 is a splicing
regulator of Trf2 pre-mRNA, and TRF2 is well-known for its function in telomeres [171, 172].
ZC3H10 has been identified to undergo age-related epigenetic changes in primates [173].
Together these findings suggest an intricate network of RBPs that may contribute to the
regulation of MLS or age-related AS events.
Next, we delved into tissue-specific MLS- and aging-associated AS events. Skin, brain
and heart exhibit relatively more RBP motif enrichment for MLS-associated AS events
compared to age-associated AS events (average enrichment ratio for the top 20 RBPs is 1.43 vs.
1.17 for hearts, 1.34 vs. 1.14 for brains, and 1.45 vs 1.16 for skins). This suggests that in those
tissues, AS events associated with the rate of aging are subject to tighter regulation by a small set
of RBPs compared to those linked with lifespan.
To further explore enriched RBPs for each individual AS event (instead of the whole
group), we determined the rankings (represented by percentiles) of its length-normalized motif
occurrences among all considered AS events. As shown in the hierarchical clustering of the
motif enrichment rankings (Figure 3.17), two prominent RBP clusters emerged. Cluster 1
consists of 23 RBPs and 83 MLS-associated splicing events, 14 out of the 23 RBPs have been
previously reported to be involved in age-related processes. Certain RBPs, such as CPEB1 and
62
CPEB4, have a direct impact on aging by regulating the mammalian cell cycle, particularly
during senescence [174, 175]. Similarly, RBPs like ELAVL4 and HuR, involved in mRNA
stabilization, play a role in aging [176, 177]. Others may contribute to aging through proteinprotein interactions with counterparts, such as hNRNPC-p53 and CPEB2–p53 interaction [178,
179]. Interestingly, we identified an enriched RBP cluster for age-related AS analysis which
shares almost the same set of RBPs in Cluster 1 (Figure 3.18), despite sparse overlap in their
regulated alternative splicing events (11 out of 83 MLS-associated events and out of 156 ageassociated events). The RBPs in Cluster 2 exhibit a high overlap with the top 20 significantly
enriched RBPs for the whole MLS group, with 6 out of 8 overlapping. Functional annotation
with DAVID for genes corresponding to these AS events suggests their predominant
involvement in RNA binding and transcription regulatory activity. While they may not have a
direct functional connection with aging or maximum life span, further investigation on those
genes could shed light on the mechanism underlying the aging process.
63
Figure 3.17 (a)Heatmap showing hierarchical biclustering of MLS-associated AS event based on percentile of observed lengthnormalized frequency among all the homologous alternative splicing events. Each row corresponds to a specific AS event, while
columns represent the associated RNA Binding Proteins (RBPs).Green line on the grey bar represent overlap with ageassociated AS events. (b) detailed view of cluster 1 (c) and cluster 2..
64
Figure 3.18 Heatmap showing hierarchical biclustering of age-associated AS event based on based on percentile of observed
length-normalized frequency among all the homologous alternative splicing events. Green line on the grey bar represent overlap
with MLS-associated AS events.
3.4 Discussion
Mammalian species exhibit a remarkable range of over a hundred-fold variation in their
maximum lifespans [180]. This broad diversity offers a valuable opportunity for comparative
biological investigations into the underlying mechanisms governing aging and lifespan
regulation [128]. In this comparative study, we explored the impact of AS on maximum
lifespans.
Our examination of MLS-associated AS events using Spearman's rank correlation
coefficients revealed 731 (37% of all considered events) significant associations between AS and
65
MLS across different tissues, and about 46.8% of them show up in multiple tissues. These
findings highlight the potential essential role of AS in governing lifespan dynamics.
Noteworthy observations emerged from our investigation into the overlap between MLSassociated and BM-associated AS events. Unlike other tissues, the aging processes within the
brain are influenced by factors separate from those associated with body mass (Figure 3.5a and
b). Interestingly, this distinctive pattern did not manifest in gene expression study (Figure 3.8),
implying that regulating the brain aging process through AS may entail unique underlying
mechanisms.
Furthermore, brain tissue stood out in our tissue-specific MLS-associated AS analysis. It
exhibited the highest count of tissue-specific MLS-associated AS events, and some of those even
displayed opposite correlation directions compared to non-brain tissues (Figure 3.9 and Figure
3.10). These findings underscore the pivotal role of AS in brain-related aging processes and its
potential implications for neurodegenerative diseases.
Although the elastic net model worked well with human data to identify age-associated
AS events, it was unsuitable for the nonhuman data due to its reliance on complete datasets. The
nonhuman dataset, focusing on homologous AS events, often lacked representation across all the
26 Rodentia and Eutlipotyphyla species. We attempted to address the missing data issue by
imputing missing values and removing samples and AS events with too many missing data
points. However, this approach resulted in removing a substantial portion of the data, which may
compromise the analysis. Therefore, we chose to calculate the Spearman method based on the
available nonhuman dataset.
The comparative analysis between MLS and chronological age in the context of RBPs
and AS events highlights intriguing distinctions in the regulatory landscape. Notably, the
66
enrichment analysis of RBPs' binding motifs revealed a more pronounced association with MLSassociated AS events compared to age-related AS events. This suggests that the intricate
regulatory mechanisms governing MLS may involve a more robust interplay of RBPs,
potentially contributing to the sustained functionality and homeostasis characteristic of longerlived organisms.
While RBPs remain crucial in mediating age-related AS events, the observed less
significant enrichment suggests a potentially less tightly regulated network in the context of
chronological age especially for the brain and heart tissues. This discrepancy implies that the
regulation of AS events associated with MLS may be subject to a more precise and intricate
control mechanism compared to those linked solely to chronological aging. Such nuanced
distinctions underscore the importance of considering MLS as a distinct factor in the study of
aging-related transcriptomic changes, hinting at the involvement of specialized regulatory
mechanisms that extend beyond the conventional effects of chronological aging. Further
investigation into these contrasting regulatory dynamics may offer valuable insights into the
molecular determinants underlying the complex interplay between RBPs, AS events, and the
aging process.
However, discrepancies in reference genome quality and RNA-seq sample parameters
prevented a complete capture of AS events across all species. The count of identified AS events
within species exhibited notable variance. Enhanced annotations akin to those in humans or mice
would facilitate better detection of homologous AS events among nonhuman species.
Additionally, these improvements would pave the way for the application of advanced models
(such as elastic net regression models) to nonhuman species, potentially yielding more robust
outcomes and a deeper understanding of the intricate AS landscape in the context of aging.
67
In summary, our comprehensive transcriptomic analysis across 26 mammalian species
with varying lifespans has unveiled significant associations between AS events and maximum
lifespan. Our findings in the brain tissue align with prior research, highlighting its unique role
and mechanisms in the aging process. We have identified lifespan-associated AS events and
enriched genes associated with aging-related pathways, both directly and indirectly. Our analysis
reveals an overlapping gene group between age-associated and maximum lifespan-associated AS
events enriched in the GO term "REGION: Disordered." These insights enhance our
understanding of alternative splicing’s contribution to aging and open avenues for future research
directions.
68
Chapter 4 Global Reprogramming of Apoptosis-Related Genes
during Brain Development
4.1 Introduction
Apoptosis is a ubiquitous regulated cell death pathway in multi-cellular organisms, essential for
development, tissue homeostasis, and immune system functions [181]. Various physiological or
pathological stimuli can induce apoptosis. Depending on the origin of a triggering stimulus, the
signaling cascades leading to apoptosis are commonly attributed to the intrinsic (mitochondrial)
pathway and the extrinsic (death receptor) pathway, though the two pathways can influence one
another [182-185]. Both pathways converge on activation of executioner caspases that cause
cellular structure breakdowns and DNA fragmentation [186-188].
The significance of apoptosis in the mammalian nervous system is well documented [189,
190]. Dysregulated apoptosis is implicated in a wide spectrum of neurodevelopmental disorders
and neurodegenerative diseases. During development, apoptosis plays an essential role in
establishing appropriate neuronal cell numbers and forming robust neural circuits [191, 192]. For
example, peripheral neurons depend on target-derived trophic factors for survival and without
them undergo apoptosis [193, 194]. As such, the formation of neural circuits is a selection process
through apoptotic control.
Despite the importance of apoptosis for neuronal development, neurons reduce in apoptotic
competence as they mature [195, 196]. In the peripheral nervous system, withdrawal of
neurotrophins drastically induces apoptosis in developing neurons but barely affects mature
neurons [194, 197, 198]. In the central nervous system, the vulnerability of post-mitotic cortical
neurons to apoptosis is much lower than cortical progenitors. Apoptosis occurs more frequently in
neural stem and progenitor cell-enriched ventricular and subventricular zones than in cortical
plates consisting of post-mitotic neurons [199-204]. In adult brains, mature neurons become less
69
susceptible to various apoptotic stimuli, including ionizing radiation and hypoxic stress [205-208].
The attenuation of neuronal apoptosis during development prevents mature neurons from
prematurely committing apoptotic cell death, allowing uninterrupted neuronal survival throughout
an organism’s life, which is essential for fitness and survival.
Accordingly, neurons must transform how they regulate apoptosis during development.
However, the molecular mechanisms transforming neuronal apoptosis regulation during
development are less clear. The expression control of apoptotic genes is presumably at the center
of this regulatory program. To date, 1923 genes have been annotated by the Gene Ontology
Consortium as involved in or regulators of apoptosis. This regulatory nexus formulates the check
and balance before a cell undergoes apoptosis. Altered expression of apoptotic genes is widely
implicated in cancer development and other disease progression [209, 210]. With varied
expression of apoptosis-related genes (herein also termed apoptotic genes), neural progenitors and
immature and mature post-mitotic neurons can respond differently to apoptotic stimuli.
While the role of apoptosis in neuronal development has been extensively studied and the
contributions of individual apoptotic genes have been demonstrated in piecemeal with genetically
modified mice, the question remains as to how the neuronal regulatory program of apoptotic genes
as a whole is shaped during neuronal development. In this study, we comprehensively examined
the in vivo temporal regulation of all known apoptosis-related genes in multiple brain regions
during mouse development from embryonic day 10.5 (E10.5) to postnatal day 0 (P0). This period
coincides neurogenesis and neuronal differentiation, and largely excludes gliogenesis. We
determined to what extent and with what patterns apoptotic genes are regulated during neuronal
development.
70
We further investigated whether chromatin-based gene regulation could be the underlying
regulatory mechanism. Epigenetic regulation, in cooperation with transcription factors, determines
the transcriptional output of gene expression [63, 64]. A major epigenetic regulatory mechanism
is histone modification. Histone H3 tri-methylation on lysine 4 (H3K4me3) and lysine 27
(H3K27me3) are a pair of important modifications at and around gene promoters that activate and
repress gene expression, respectively [211-214]. H3K4me3 modification is generated by specific
histone methyltransferases (e.g., UTX, SETD1A) and is highly enriched near transcription start
sites of active promoters [215, 216]. H3K27me3 deposited by histone methyltransferases (e.g.,
EZH2) of PRC complexes is a hallmark of polycomb group-mediated transcriptional silencing
[217]. These histone modifications may serve as an upstream regulation layer to orchestrate the
global chronological expression changes of apoptotic genes, but their contributions have not been
studied. Knowledge about the developmental reprogramming of the apoptotic gene expression and
the underlying regulatory mechanisms is essential to understanding the activity and regulation of
the apoptosis pathway in neural development and diseases.
4.2 Materials and Methods
4.2.1 Definition of Apoptosis-Related Genes
We downloaded GO terms of mouse genes from the Ensembl BioMart
https://uswest.ensembl.org/index.html (Oct,1st,2021). Genes with GO terms including the keyword
“apopt” (e.g., apoptosis, apoptotic) were considered, resulting in a total of 1923 apoptosis-related
genes. Based on the assigned GO terms, 1282 of 1923 apoptotic genes are involved in either
positive or negative regulation of cellular apoptosis. To analyze regulation direction, genes solely
involved in positive regulation of apoptosis were defined as pro-apoptotic genes, and those in the
negative direction were defined as anti-apoptotic genes. The 153 genes involved in both directions
71
were excluded for analyses involving regulation direction. The remaining genes have no
information available regarding regulation direction in apoptosis.
4.2.2 Data Acquisition
Datasets of mRNA expression (mRNA-seq) and histone ChIP-seq were downloaded from the
mouse Encyclopedia of DNA Elements (ENCODE) project (http://encodeproject.org). These
corresponded to three brain tissues (forebrain, midbrain, and hindbrain) of the Mus musculus
C57BL/6 strain, and eight developing ages (embryonic days 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, and
16.5, as well as to postnatal day 0). The histone modification marks included H3K4me3 and
H3K27me3 for the same brain regions and developing ages. Data from the two biological
replicates of each experimental configuration were averaged.
4.2.3 RNA-Seq Data Analysis
Processed gene quantification data of two biological replicates from the forebrain, midbrain, and
hindbrain tissues were downloaded from ENCODE. Expressed apoptosis-related genes were
defined using the following cutoffs: transcripts per million (TPM) ≥ 1 in at least one-third of data
points and maximum TPM ≥ 5 among all data points. The Z-score normalization on log2(TPM+1)
was performed before the hierarchical clustering analysis using the UPGMA agglomeration
method. Two major clusters were labeled as the upregulation cluster and the downregulation
cluster based on their overall chronological trends. The same analyses were performed for data
from the three regions separately.
4.2.4 ChIP-Seq Data Analysis
For the ChIP-seq data, “bed narrowPeak” files containing the peaks of signal enrichment and
“bigwig” files containing signal p-values were obtained from ENCODE. For each sample, the file
containing either replicated peaks or pseudoreplicated peaks was downloaded based on the default
recommendation from ENCODE. Defined by ENCODE, replicated peaks are peak calls from
pooled replicates and pseudoreplicated peaks are calls from two analysis partitions. Significant
72
peaks were annotated with the ChIPpeakAnno package [218] using mm10 Ensembl genes as
references. Peaks residing from −2 kb to 2 kb around a transcription start site (TSS) were defined
as histone modification signals over the promoter region. Peak intensity data were obtained from
the bed narrowPeak files. For a promoter overlapping with multiple peaks, the peak with maximum
intensity was taken for further analysis. The ChIP-seq tracks were visualized using the UCSC
genome browser.
4.2.5 Regression Fit between RNA-Seq (or ChIP-Seq) Data and Ages
Linear regression models were fit between expression levels (log2(TPM+1)) of each gene in the
upregulation or downregulation clusters and the considered age time points (P0 was coded as day
22). For the H3K4me3 or H3K27me3 ChIP-seq data, linear regression models were fit between
the promoter peak intensity and ages. The age coefficient for each gene was denoted as βEx, βK4,
and βK27 for expression, H3K4me3, and H3K27me3 signals respectively. The coefficient
represents the change of the expression or the histone modification when the age advances one
day.
73
4.2.6 KEGG Apoptosis Pathway
The classical apoptosis pathway (entry: hsa04210) was acquired from KEGG (Kyoto Encyclopedia
of Genes and Genomes, https://www.genome.jp/kegg/) and then replotted for visualization
purposes. Genes from the upregulation cluster in forebrain, midbrain, or hindbrain regions were
labeled as “upregulation cluster” and genes from the downregulation cluster in any of the three
brain regions were labeled as “downregulation cluster”. Genes were color labeled if they positively
or negatively regulate apoptosis according to GO. Genes implicated in both or neither directions
by GO terms were not color labeled.
4.3 Results
4.3.1. Most Apoptosis-Related Genes Are Dynamically Regulated in the Developing
Forebrain of a Mouse
To characterize expression dynamics of apoptosis-related genes in developing mouse brains, we
extracted the expression levels (TPM) of 1923 apoptotic genes in the forebrain region across seven
embryonic age points (E10.5, E11.5, E12.5, E13.5, E14.5, E15.5, and E16.5) and one postnatal
(P0) age point. Many of these genes are well known for their roles in the classic apoptosis pathway,
including caspases, Bcl-2 family proteins, death ligands and receptors, etc. Others are upstream
modulators of the classical apoptotic genes, including signaling molecules and transcription
factors, etc. The expression values from the ENCODE RNA-seq datasets were log-transformed
(log2(TPM+1)) before further Z-score normalization across developmental ages.
A total of 1306 apoptotic genes passed expression filtration (details in Materials and
Methods) and were subject to hierarchical clustering, which unbiasedly defined five clusters
(Figure 4.1a). Clusters 1 and 4 were the two major gene groups whose gene expression,
respectively, decreased and increased chronologically during brain development (Figure 4.1b).
Cluster 1 contained 603 downregulated apoptotic genes. The median fold change between the last
74
(P0) and the first (E10.5) age points was 0.45 (P0 vs. E10.5). On the contrary, cluster 4 contained
434 apoptotic genes increasing their expression levels with a median 2.45-fold change.
The remaining genes belonged to three smaller clusters. Cluster 2 contained 77 genes, the
expression of which were reduced during the early embryonic stage of development but appeared
to be upregulated around birth. By contrast, cluster 5 contained 189 genes, modestly increasing
expression in the early phase of forebrain development but decreasing expression from E16.5 to
P0. Cluster 3 contained only three genes without a clear pattern of temporal regulation. In
summary, a majority of the apoptotic genes (1037 out of 1306) falling within cluster 1 or 4
exhibited nearly unidirectional expression changes during mouse forebrain development. We
called clusters 1 and 4 the “downregulation cluster” and the “upregulation cluster”, respectively,
for further analyses.
75
Figure 4.1 Expression dynamic of apoptosis−related genes during mouse forebrain development (a) Hierarchical clustering and
expression heatmap of apoptosis−related genes across development ages (E10.5, E11.5, E12.5, E13.5, E14.5, E15.5, E16.5, P0).
Clusters are defined by the red line crossing the dendrogram. Cluster numbers are noted next to each cluster tree. (b) mRNA
expression of five clusters defined in (a). The expression level of each gene was log−transformed (log2(TPM+1)) and then Z−score
normalized (i.e., centered by the mean and divided by the standard deviation across different development ages).
4.3.2 Temporal Expression Patterns of Apoptosis-Related Genes Are Consistent across
Different Brain Regions during Development
To determine whether the dynamic regulation of these apoptosis-related genes was unique
to forebrain development or ubiquitous to general neuronal development, we examined mouse
midbrain and hindbrain transcriptomic RNA-seq data from the ENCODE project. Specifically, we
determined the temporal expression patterns of the five forebrain clusters during midbrain and
hindbrain development. Overall, the temporal patterns for each of these clusters followed a similar
76
trend among the three brain regions (Figure 4.2). In particular, forebrain cluster 1 displayed a
downregulation trend and cluster 4 consistently exhibited an upregulation pattern in both the
midbrain and the hindbrain regions.
Figure 4.2 Expression of apoptotic genes in midbrain and hindbrain regions. Gene clusters were defined by the forebrain data.
Forebrain cluster 1
Forebrain cluster 2
Forebrain cluster 1
Forebrain cluster 2
Forebrain cluster 3 Forebrain cluster 3
Forebrain cluster 4 Forebrain cluster 4
Forebrain cluster 5 Forebrain cluster 5
Midbrain Hindbrain
77
We also conducted expression clustering ab initio based on the time-series RNA-seq data
from the midbrain and hindbrain regions. As shown in Figure 4.3a, a majority of apoptotic genes
belonged to either a downregulation cluster (615 genes, or 47% of 1317 total expressed genes in
the midbrains; 594 genes, or 44% of 1340 total expressed genes in the hindbrains) or an
upregulation cluster (518 and 545 genes for midbrains and hindbrains, respectively; or 39% and
41%, respectively, of their total expressed genes).
Figure 4.3 Consistent temporal expression patterns of apoptotic genes during development in three different brain regions (a)
Clustering heatmaps of apoptotic gene mRNA expression in the midbrain and hindbrain. Gene clusters with the dendrogram are
labeled on the right to the heat maps. (b) Venn diagrams of downregulated genes and upregulated genes defined by hierarchical
clustering separately performed for three brain regions, showing substantial overlaps among the three brain regions.
78
More importantly, genes in the downregulation clusters largely overlapped between the
three brain regions (Figure 4.3b). A total of 434 downregulated apoptotic genes (56.4% of the
cluster union) were identified in all three brain regions. The genes in the upregulation clusters,
likewise, shared 344 (50.7%) of apoptotic genes among the three regions. This shows that the
regulation of apoptotic genes during neurogenesis is largely consistent across brain regions,
indicating their robust transcriptional reprogramming is intrinsic to neuronal differentiation and
ubiquitous to neuronal types.
4.3.3 The Regulation of the Classical Apoptosis Pathway during Mouse Brain Development
We focused on genes in the classic apoptosis pathway, defined by the KEGG database, for their
expression dynamics during brain development. These genes were most extensively reported for
their roles in apoptosis, whereas other GO-annotated apoptotic genes were studied in a more
limited context, for example, in a specific cell type or under a specific treatment condition. As
shown in Figure 4.4, out of the 79 classic apoptosis genes, 39 genes displayed a decreasing
expression trend during development in any of the brain regions, while 22 genes increased their
expression chronologically. The large proportion of genes changing expression (61/79 or 77%)
demonstrates, again, that differentiating neurons drastically alter their regulation of apoptosis
during development.
79
Figure 4.4 The classic apoptosis pathway from the KEGG database marked with regulatory directions of developmental gene
ex-pression changes and functional activity on apoptosis.
Among the pro-apoptotic genes, 28 were downregulated, outnumbering the 13 upregulated
genes, consistent with the notion of decreasing apoptosis competence during neuronal
differentiation. The downregulated pro-apoptotic genes included genes in the TNF signaling
pathway (TNF-R, FADD, RIP1 (RIPK1)), genes in the ER stress pathway (IRE1α (ERN1), IP3R
(ITPR1), Eif2α), pro-apoptotic Bcl-2 family members (Bax, Bak1, Noxa (PMAIP1), Bim
(BCL2L11)), and caspases (Casp2, Casp6, Casp8, Casp9). Mitochondrial proteins cytochrome C,
Diablo, and AIF (AIFM1), and DNA damage response genes (ATM, p35 (CDK5R1), p53 (TP53))
also reduced their expression during development.
Among the anti-apoptotic genes, eight were upregulated, including anti-apoptotic Bcl-2
family proteins (Bcl-2, Bcl-xL (BCL2L1)) and the PI3k-Akt signaling pathway (PI3K (PIK3CA),
80
Ras (HRAS/NRAS/ KRAS), Akt (AKT1), Erk1/2 (MAPK3/1)). On the other hand, nine genes were
downregulated, including Traf2, Flip (CFLAR), Xiap, Mcl-1, Icad (DFFA), and NFкb (NFKB1/2).
In summary, both anti- and pro-apoptotic genes can be up- or downregulated, but nearly
half of the developmentally regulatory events in the classical apoptosis pathway are the
downregulation of pro-apoptotic genes, which makes up the largest regulatory gene group (28/(28
+ 13 + 8 + 9) = 48%). Interestingly, in the downregulation cluster, classical pro-apoptotic genes
were more downregulated than anti-apoptotic genes (fold-change medians: 0.38 vs. 0.67; p-value
of the one-sided Wilcoxon signed-rank test: 0.04). Therefore, by number or magnitude, the proapoptotic genes are more downregulated than the anti-apoptotic genes.
4.3.4 Distinct Histone Modifications Underly the Developmental Regulation of Apoptotic
Genes
To further quantify the temporal expression trends of apoptotic genes, we conducted
regression fitting between the gene expression level of each gene and time points (see the Materials
and Methods section for details) to derive its regression coefficient of gene expression (βEx) across
development. The sign (+ or −) of a coefficient indicates the direction of the developmental change,
whereas the absolute value of the coefficients indicate the “speed”, or the “magnitude of the
developmental change per time unit”. As expected, the regression coefficients βEx were negative
for the downregulation clusters and positive for the upregulation clusters in all three brain regions
(Figure 4.5a and b).
To understand the mechanisms regulating the developmental expression dynamics of
apoptotic genes, we investigated the temporal changes of histone mark H3K4me3, which is known
to activate transcription and is highly enriched in gene promoter regions near transcriptional start
sites [215]. We also examined repressive histone mark H3K27me3, which also resides around
promoter areas but causes downregulation of nearby genes. We utilized the ENCODE ChIP-seq
81
data corresponding to the same developmental ages and the same brain regions for precise
inference. For each gene, we obtained the histone mark intensities at its promoter and quantified
their changes along the brain development period through a regression fitting between histone
mark intensities and time points (see Materials and Methods for details). The regression
coefficients (βK4, βK27) resulting from this analysis represented the change of histone modification
intensities over a time unit.
For the downregulation clusters, we found the H3K4me3 coefficients βK4 mostly agreed
with the developmental gene expression trends. Over three-quarters of genes displayed a negative
βK4, consistent with the notion that H3K4me3 is positively correlated with transcription (Figure
4.5). For these genes, the loss of H3K4me3 at promoters during development implicates reduced
transcription or expression levels over time. Interestingly, the repressive H3K27me3 mark showed
insignificant contributions to the expression changes, as the median βK27 hovered around 0 and
most values were relatively small (Figure 4.5d). In other words, the H3K27me3 patterns do not
change much for most genes in the downregulation clusters.
The upregulation clusters exhibited notably different patterns of βK4 and βK27 than the
downregulation clusters. If H3K4me3 alteration was the major driving force of expression
changes, the upregulation cluster was expected to contain mostly positive βK4. However, the
H3K4me3 mark was not accordant with the expression changes, as median βK4 was about 0 for the
forebrain and midbrain regions (Figure 4.5e). The hindbrain even atypically exhibited negative βK4
for most upregulated genes (median: −0.19). Therefore, H3K4me3 is not predictive of the
developmental increase in expression of most apoptotic genes in the upregulation clusters.
By contrast, the loss of repressive H3K27me3 paralleled the increase in apoptotic gene
expression. The βK27 of the upregulation cluster clearly displayed asymmetric regulatory
82
directions, with over three-quarters of them in a negative range and the others in a negligibly
positive range (Figure 4.5f). This meant, for most genes, peak intensities of H3K27me3 dropped
over the developmental period. Taken together, the developmental upregulation of apoptotic genes
can be attributed to the consistent removal of H3K27me3 marks from the promoters and, for only
a selection of genes, to the addition of H3K4me3 marks.
Figure 4.5 Distinct histone modifications are responsible for the chronical change in expression of the downregulation and
upregulation clusters. Regression coefficients (βEx) between mRNA expression and ages for the downregulation clusters (a) and
the upregulation clusters (b) in the forebrain, midbrain, and hindbrain. Regression coefficients (βK4) between H3K4me3 signals
and ages for genes in the downregulation clusters (c) and the upregulation clusters (d) in the forebrain, midbrain, and hindbrain.
Regression coefficients (βK27) between H3K27me3 signals and ages for genes in the downregulation clusters (e) and the
upregulation clusters (f) in the forebrain, midbrain, and hindbrain. All three brain regions show consistent patterns.
83
4.3.5 Developmental Remodeling of H3K4me3 and H3K27me3 Marks Underscores
Expression Changes of Pidd1, Casp6, Mapk8, and Tradd
To illustrate the different epigenetic regulations as driving factors of developmental gene
expression changes, we examined representative genes from the KEGG apoptotic pathway (Figure
4.4). For example, Pidd1 (P53-induced death domain protein 1) and Casp6 are two genes in the
downregulation clusters of all three brain regions. PIDD1, caspase-2, and adaptor protein CRADD
form a multiprotein complex, PIDDosome, that activates caspase-2 to initiate apoptosis [219, 220].
CASP6 belongs to the cysteine-aspartic acid protease (caspase) gene family and is important for
the execution phase of cell death [186-188].
As shown in Figure 4.6a and b, the H3K4me3 peaks at the promoter regions of Pidd1 and
Casp6 diminished over time. By contrast, the H3K27me3 marks exhibited no pattern of regulation.
As a result, Pidd1 and Casp6 mRNA expressions were downregulated 9- and 5-fold from E10.5
to P0 in the brain, respectively (Figure 4.6a,b). Notably, the H3K4me3 peak of Pidd1 showed the
largest drop between E10.5 and E11.5, consistent with the steepest decline in mRNA expression
at the same time. Therefore, changes in H3K4me3 rather than H3K27me3 are the main contributor
to developmental downregulation of Pidd1 and Casp6.
The upregulation cluster included Mapk8 and Tradd ((Figure 4.6c,d). Mapk8 encodes
mitogen-activated protein kinase 8, belonging to the MAP kinase family [221]. MAPK8 plays an
important role in TNF alpha-mediated cell death and UV-induced apoptosis [222-224]. TRADD
encodes TNFR1-associated death domain protein, which functions in the TNFR1 signaling
pathway to regulate TNFα-induced extrinsic apoptosis and proteasomal stress-induced apoptosis
[225-229]. For both genes, there were no clear trends of developmental changes in H3K4me3.
However, their H3K27me3 signals markedly declined over time ((Figure 4.6c,d). This led to an
overall increase in mRNA expression of these two genes during development.
84
Figure 4.6 Histone modification patterns of representative genes in developing mouse brain tissues. Top: H3K4me3 (green) and
H3K27me3 (red) ChIP-seq peak signals around selected TSSs at eight time points; bottom: mRNA expression of the gene in the
same corresponding brain region as shown on the top, analyzed by RNA-seq. (a) Pidd1 in the midbrain, (b) Casp6 in the midbrain,
(c) Mapk8 in the forebrain, (d) Tradd in the forebrain.
4.4 Discussion
Neurons contain a unique regulatory program for apoptosis, as cell death needs to be attenuated to
allow continuous neuronal survival. The regulatory program, as regards to apoptotic gene
expression, is presumably determined during neuronal development. We found most apoptosisrelated genes, including those in the classic apoptosis pathway, are subject to developmental
regulation, consistent with the idea that neurons transform how they regulate apoptosis during
differentiation. This developmental reprogramming is conserved, sharing most of the upregulated
and downregulated genes between different brain regions (forebrain, midbrain, and hindbrain),
and, therefore, is intrinsic to the process of neural differentiation and ubiquitous to various
85
neuronal types. Interestingly, neither the up- nor downregulatory trend is restricted to one category
of anti- or pro-apoptotic genes (determined by GO terms). This is probably not surprising given
the vast number of apoptosis-related genes considered in the study.
However, the core genes in the classical apoptosis pathway (defined by KEGG) exhibit
skewed regulation. Specifically, among the pro-apoptotic genes, more are downregulated than
upregulated. Among the downregulated genes, pro-apoptotic genes reduce expression more than
anti-apoptotic genes do. Notably, while anti-apoptotic Bcl-2 family proteins (Bcl-2 and Bcl-xl) are
upregulated, most pro-apoptotic Bcl-2 family proteins (Bax, Bak, Noxa, Bim) and caspases (Casp2,
Casp6, Casp8, Casp9) are downregulated. This would reduce the overall “fuel” for apoptosis,
agreeing with the notion that the balance and relative ratio of pro- and anti-apoptotic genes
determine cell death outcomes and that neurons attenuate apoptosis during development [195,
196].
Casp3 and Casp7 are the exceptions, as they increase expression during embryonic
development. They are best known as the executioner caspases of apoptosis for destruction of
cellular proteins and structures, irrespective of upstream death signals. In neurons, caspase-3 is
also important for dendritic trimming and synaptic pruning, which is necessary during the early
postnatal period to establish robust and responsive neural circuits [230-234]. Furthermore,
caspase-3 has non-apoptotic functions for synaptic plasticity in adult neurons [235, 236]. How this
localized function of caspase-3 is activated and restricted without inducing a full-blown cell death
program is an interesting subject. The essential role of caspase-3 for neural circuit formation likely
restrains its developmental downregulation, or even requires its upregulation so as to be distributed
broadly in distant dendritic processes.
86
Despite upregulation of effector caspases Casp3 and Casp7, neurons uniformly turn down
initiator caspases (Casp2, Casp8, and Casp9). This constitutes an effective strategy to attenuate
apoptosis, because, in healthy cells, caspases are zymogens with weak or little protease activity
and a hierarchy of caspase activation is needed to amplify the destructive signals leading to
apoptosis [187, 237-239]. For example, cytochrome c, released from mitochondria, interacts with
Apaf1 to form apoptosome, a complex that recruits caspase-9, causing its self-cleavage and
activation, which in turn cleaves and activates caspase-3. Cytochrome c, Apaf1, and Casp9 all
decrease mRNA expression during development (Figure 4.4), effectively reducing the chance of
caspase-3 activation.
Another regulatory checkpoint restricting apoptosis is the Bcl-2 family of proteins. Many
apoptotic signals converge on BAX and BAK1 in mitochondria by inducing their oligomerization
or formation of mitochondrial transition pores, leading to mitochondrial membrane
permeabilization and release of cytochrome c and DIABLO [240, 241]. BCL-xL and BCL-2 can
interact with BAX and BAK1 to inhibit mitochondrial membrane permeabilization [242, 243].
Intracellular ratios between anti- and pro-apoptotic BCL-2 family proteins determine cells’
responsiveness to death stimuli [244, 245]. This ratio is magnified by the upregulation of Bcl-xl
and Bcl-2 and simultaneous downregulation of Bax and Bak1, thereby significantly diminishing
neurons’ vulnerability towards apoptosis activation. Notably, Bak1 is subject to further posttranscriptional regulation. Increased splicing of Bak1 exon 5 during development generates an
alternative isoform, degraded by the nonsense-mediated mRNA decay pathway without productive
translation, such that the BAK1 protein is nearly absent in neurons [204]. The abatement of
apoptosome components and the increasing ratio of anti- to pro-apoptotic Bcl-2 family members
present two critical “choke” points to attenuate apoptosis in neurons.
87
The developmental regulation of apoptotic genes coincides with widespread changes in
histone modification marks at their promoter regions. This developmental rearrangement of
H3K4me3 and H3K27me3 for apoptotic genes may constitute a larger epigenome program. The
H3K4me3 landscape continuously changes during brain development and has been implicated in
directing neuronal lineage differentiation [216, 246, 247]. H3K27me3 regulation has also been
shown to orchestrate several aspects of neurogenesis, including the balance of self-renewal and
differentiation of neural progenitors, as well as the switch from neurogenesis to gliogenesis [248-
251].
We found the upregulation and downregulation clusters of apoptotic genes to undergo
distinct dynamic histone modification mark transitions. The predominant driving force for the
developmental downregulation of apoptotic genes is reducing H3K4me3 signals on their
promoters where H3K7me3 does not significantly change. By contrast, downregulation of
H3K27me3 is responsible for developmental upregulation of apoptotic genes, while changes in
H3K4me3 are not predictive of the increased gene expression. The differential re-modeling of
H3K4me3 and H3K27me3 during development for the upregulation vs. downregulation clusters
suggests divergent epigenetic regulatory mechanisms of apoptotic genes, which is an interesting
topic for future investigation.
Chapter 5 Conclusions and Future work
Comparative transcriptomics is a powerful tool for investigating evolutionary relationships
among species and deciphering fundamental biological processes. Throughout this dissertation,
our comprehensive exploration of the transcriptomic landscape across species has provided
valuable insights into gene expression, alternative splicing, epigenetic regulation, and their
intricate roles in tissue specificity, longevity, and apoptosis. Leveraging comparative analyses,
88
we successfully identified genes undergoing tissue specificity shifts, alternative splicing events
that are associated to longevity, as well as epigenetic regulation of apoptotic genes during brain
development.
In Chapter 2, we identified hundreds of gene that shifted or in the process of shifting their
tissue specificity, understanding how these shifts may contribute to functional implications and
physiological adaptations could be a potential direction for future research. Chapter 3 revealed
hundreds of alternative splicing events that are associated with maximum lifespans, further
exploration of events may uncover additional regulatory mechanisms of aging and potential
targets promoting healthy aging. Due to the limitation of dataset size, more homologous
alternative splicing events may be detected with a larger dataset. Therefore, future improvements
may involve refining our pipeline for detecting homologous alternative splicing events if larger
dataset containing more species is available. Methods for classifying or detecting longevityassociated events could also be improved with larger dataset, ultimately enhancing implications
even with limited data. With the current finding that longevity-associated alternative splicing
events are enriched in intrinsically disordered proteins (IDP), our future investigations aim to
delve deeper into the underlying mechanisms driving this enrichment.
As we continue to unravel the intricacies of transcriptomics across species, these findings
lay the groundwork for future research endeavors, ultimately contributing to a deeper
understanding of molecular processes in health and disease.
89
References
1. Willingham, A.T. and T.R. Gingeras, TUF love for "junk" DNA. Cell, 2006. 125(7): p.
1215-20.
2. Jacquier, A., The complex eukaryotic transcriptome: unexpected pervasive transcription
and novel small RNAs. Nat Rev Genet, 2009. 10(12): p. 833-44.
3. Mardis, E.R., Next-generation DNA sequencing methods. Annu Rev Genomics Hum
Genet, 2008. 9: p. 387-402.
4. Wang, Z., M. Gerstein, and M. Snyder, RNA-Seq: a revolutionary tool for
transcriptomics. Nat Rev Genet, 2009. 10(1): p. 57-63.
5. Crick, F., Central dogma of molecular biology. Nature, 1970. 227(5258): p. 561-3.
6. Mattick, J.S. and I.V. Makunin, Non-coding RNA. Hum Mol Genet, 2006. 15 Spec No 1:
p. R17-29.
7. Pascual-Ahuir, A., J. Fita-Torró, and M. Proft, Capturing and Understanding the
Dynamics and Heterogeneity of Gene Expression in the Living Cell. Int J Mol Sci, 2020.
21(21).
8. Atkinson, T.J. and M.S. Halfon, Regulation of gene expression in the genomic context.
Comput Struct Biotechnol J, 2014. 9: p. e201401001.
9. Winter, E.E., L. Goodstadt, and C.P. Ponting, Elevated rates of protein secretion,
evolution, and disease among tissue-specific genes. Genome Res, 2004. 14(1): p. 54-61.
10. Kitsak, M., et al., Tissue Specificity of Human Disease Module. Sci Rep, 2016. 6: p.
35241.
11. Barshir, R., et al., Comparative analysis of human tissue interactomes reveals factors
leading to tissue-specific manifestation of hereditary diseases. PLoS Comput Biol, 2014.
10(6): p. e1003632.
12. Koh, W., et al., Noninvasive in vivo monitoring of tissue-specific global gene expression
in humans. Proc Natl Acad Sci U S A, 2014. 111(20): p. 7361-6.
13. Haigis, K.M., K. Cichowski, and S.J. Elledge, Tissue-specificity in cancer: The rule, not
the exception. Science, 2019. 363(6432): p. 1150-1151.
14. Schneider, G., et al., Tissue-specific tumorigenesis: context matters. Nat Rev Cancer,
2017. 17(4): p. 239-253.
90
15. Graveley, B.R., Alternative splicing: increasing diversity in the proteomic world.
TRENDS in Genetics, 2001. 17(2): p. 100-107.
16. Pan, Q., et al., Deep surveying of alternative splicing complexity in the human
transcriptome by high-throughput sequencing. Nature genetics, 2008. 40(12): p. 1413-
1415.
17. Marquez, Y., et al., Transcriptome survey reveals increased complexity of the alternative
splicing landscape in Arabidopsis. Genome research, 2012. 22(6): p. 1184-1195.
18. Cui, Y., M. Cai, and H.E. Stanley, Comparative analysis and classification of cassette
exons and constitutive exons. BioMed research international, 2017. 2017.
19. Sammeth, M., S. Foissac, and R. Guigó, A general definition and nomenclature for
alternative splicing events. PLoS Comput Biol, 2008. 4(8): p. e1000147.
20. Black, D.L., Mechanisms of alternative pre-messenger RNA splicing. Annual review of
biochemistry, 2003. 72(1): p. 291-336.
21. Sugnet, C.W., et al., Transcriptome and genome conservation of alternative splicing
events in humans and mice. Pac Symp Biocomput, 2004: p. 66-77.
22. Sammeth, M., Complete alternative splicing events are bubbles in splicing graphs. J
Comput Biol, 2009. 16(8): p. 1117-40.
23. Vanichkina, D.P., et al. Challenges in defining the role of intron retention in normal
biology and disease. in Seminars in cell & developmental biology. 2018. Elsevier.
24. Wang, Z., et al., General and specific functions of exonic splicing silencers in splicing
control. Molecular cell, 2006. 23(1): p. 61-70.
25. Jaillon, O., et al., Translational control of intron splicing in eukaryotes. Nature, 2008.
451(7176): p. 359-362.
26. Wahl, M.C., C.L. Will, and R. Lührmann, The spliceosome: design principles of a
dynamic RNP machine. Cell, 2009. 136(4): p. 701-718.
27. Will, C.L. and R. Lührmann, Spliceosome structure and function. Cold Spring Harbor
perspectives in biology, 2011. 3(7): p. a003707.
28. Staley, J.P. and C. Guthrie, Mechanical devices of the spliceosome: motors, clocks,
springs, and things. Cell, 1998. 92(3): p. 315-326.
29. Wolf, E., et al., Exon, intron and splice site locations in the spliceosomal B complex. The
EMBO Journal, 2009. 28(15): p. 2283-2292.
91
30. Sander, B., et al., Organization of core spliceosomal components U5 snRNA loop I and
U4/U6 Di-snRNP within U4/U6. U5 Tri-snRNP as revealed by electron cryomicroscopy.
Molecular cell, 2006. 24(2): p. 267-278.
31. Boehringer, D., et al., Three-dimensional structure of a pre-catalytic human spliceosomal
complex B. Nature structural & molecular biology, 2004. 11(5): p. 463-468.
32. Jurica, M.S., et al., Three-dimensional structure of C complex spliceosomes by electron
microscopy. Nature structural & molecular biology, 2004. 11(3): p. 265-269.
33. Behzadnia, N., et al., Composition and three‐dimensional EM structure of double affinity‐
purified, human prespliceosomal A complexes. The EMBO journal, 2007. 26(6): p. 1737-
1748.
34. Matlin, A.J., F. Clark, and C.W. Smith, Understanding alternative splicing: towards a
cellular code. Nature reviews Molecular cell biology, 2005. 6(5): p. 386-398.
35. Jensen, K.B., et al., Nova-1 regulates neuron-specific alternative splicing and is essential
for neuronal viability. Neuron, 2000. 25(2): p. 359-371.
36. Ule, J., et al., CLIP identifies Nova-regulated RNA networks in the brain. Science, 2003.
302(5648): p. 1212-1215.
37. Polydorides, A.D., et al., A brain-enriched polypyrimidine tract-binding protein
antagonizes the ability of Nova to regulate neuron-specific alternative splicing.
Proceedings of the National Academy of Sciences, 2000. 97(12): p. 6350-6355.
38. Wang, Z. and C.B. Burge, Splicing regulation: from a parts list of regulatory elements to
an integrated splicing code. Rna, 2008. 14(5): p. 802-813.
39. Wang, Y., et al., Mechanism of alternative splicing and its regulation. Biomedical
reports, 2015. 3(2): p. 152-158.
40. Eversole, A. and N. Maizels, In vitro properties of the conserved mammalian protein
hnRNP D suggest a role in telomere maintenance. Molecular and cellular biology, 2000.
20(15): p. 5425-5432.
41. Cartegni, L., S.L. Chew, and A.R. Krainer, Listening to silence and understanding
nonsense: exonic mutations that affect splicing. Nature reviews genetics, 2002. 3(4): p.
285-298.
42. Blanchette, M. and B. Chabot, Modulation of exon skipping by high‐affinity hnRNP A1‐
binding sites and by intron elements that repress splice site utilization. The EMBO
journal, 1999. 18(7): p. 1939-1952.
92
43. kyung Oh, H., et al., hnRNP A1 contacts exon 5 to promote exon 6 inclusion of apoptotic
Fas gene. Apoptosis, 2013. 18(7): p. 825-835.
44. Motta-Mena, L.B., F. Heyd, and K.W. Lynch, Context-dependent regulatory mechanism
of the splicing factor hnRNP L. Molecular cell, 2010. 37(2): p. 223-234.
45. Lim, K.H., et al., Using positional distribution to identify splicing elements and predict
pre-mRNA processing defects in human genes. Proceedings of the National Academy of
Sciences, 2011. 108(27): p. 11093-11098.
46. Licatalosi, D.D., et al., HITS-CLIP yields genome-wide insights into brain alternative
RNA processing. Nature, 2008. 456(7221): p. 464-469.
47. Ule, J., et al., CLIP: a method for identifying protein–RNA interaction sites in living cells.
Methods, 2005. 37(4): p. 376-386.
48. Garcia-Blanco, M.A., A.P. Baraniak, and E.L. Lasda, Alternative splicing in disease and
therapy. Nature biotechnology, 2004. 22(5): p. 535-546.
49. Hafner, M., et al., Transcriptome-wide identification of RNA-binding protein and
microRNA target sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-141.
50. Konig, J., et al., iCLIP-transcriptome-wide mapping of protein-RNA interactions with
individual nucleotide resolution. JoVE (Journal of Visualized Experiments), 2011(50): p.
e2638.
51. Spitzer, J., et al., PAR-CLIP (Photoactivatable Ribonucleoside-Enhanced Crosslinking
and Immunoprecipitation): a step-by-step protocol to the transcriptome-wide
identification of binding sites of RNA-binding proteins, in Methods in enzymology. 2014,
Elsevier. p. 113-161.
52. Wang, T., et al., Design and bioinformatics analysis of genome-wide CLIP experiments.
Nucleic acids research, 2015. 43(11): p. 5263-5274.
53. Garzia, A., et al., Optimization of PAR-CLIP for transcriptome-wide identification of
binding sites of RNA-binding proteins. Methods, 2017. 118: p. 24-40.
54. Kishore, S., et al., A quantitative analysis of CLIP methods for identifying binding sites of
RNA-binding proteins. Nature methods, 2011. 8(7): p. 559-564.
55. Uhl, M., et al., Computational analysis of CLIP-seq data. Methods, 2017. 118: p. 60-72.
56. Buratti, E. and F.E. Baralle, Influence of RNA secondary structure on the pre-mRNA
splicing process. Molecular and cellular biology, 2004. 24(24): p. 10505-10514.
93
57. Chebli, K., et al., The 216-nucleotide intron of the E1A pre-mRNA contains a hairpin
structure that permits utilization of unusually distant branch acceptors. Molecular and
cellular biology, 1989. 9(11): p. 4852-4861.
58. Schmucker, D., et al., Drosophila Dscam is an axon guidance receptor exhibiting
extraordinary molecular diversity. Cell, 2000. 101(6): p. 671-684.
59. Graveley, B.R., Mutually exclusive splicing of the insect Dscam pre-mRNA directed by
competing intronic RNA secondary structures. Cell, 2005. 123(1): p. 65-73.
60. Warf, M.B. and J.A. Berglund, Role of RNA structure in regulating pre-mRNA splicing.
Trends in biochemical sciences, 2010. 35(3): p. 169-178.
61. Vaquero-Garcia, J., et al., A new view of transcriptome complexity and regulation
through the lens of local splicing variations. elife, 2016. 5: p. e11752.
62. Tazi, J., N. Bakkour, and S. Stamm, Alternative splicing and disease. Biochimica et
Biophysica Acta (BBA)-Molecular Basis of Disease, 2009. 1792(1): p. 14-26.
63. Kouzarides, T., Chromatin modifications and their function. Cell, 2007. 128(4): p. 693-
705.
64. Lomvardas, S. and T. Maniatis, Histone and DNA Modifications as Regulators of
Neuronal Development and Function. Cold Spring Harb Perspect Biol, 2016. 8(7).
65. Kornberg, R.D. and Y. Lorch, Twenty-five years of the nucleosome, fundamental particle
of the eukaryote chromosome. Cell, 1999. 98(3): p. 285-94.
66. Luger, K. and J.C. Hansen, Nucleosome and chromatin fiber dynamics. Curr Opin Struct
Biol, 2005. 15(2): p. 188-96.
67. McGhee, J.D. and G. Felsenfeld, Nucleosome structure. Annu Rev Biochem, 1980. 49: p.
1115-56.
68. McGhee, J.D., et al., Higher order structure of chromatin: orientation of nucleosomes
within the 30 nm chromatin solenoid is independent of species and spacer length. Cell,
1983. 33(3): p. 831-41.
69. Robinson, P.J. and D. Rhodes, Structure of the '30 nm' chromatin fibre: a key role for the
linker histone. Curr Opin Struct Biol, 2006. 16(3): p. 336-43.
70. Kornberg, R.D. and J.O. Thomas, Chromatin structure; oligomers of the histones.
Science, 1974. 184(4139): p. 865-8.
71. Kornberg, R.D., Chromatin structure: a repeating unit of histones and DNA. Science,
1974. 184(4139): p. 868-71.
94
72. Luger, K., et al., Crystal structure of the nucleosome core particle at 2.8 A resolution.
Nature, 1997. 389(6648): p. 251-60.
73. Peterson, C.L. and M.A. Laniel, Histones and histone modifications. Curr Biol, 2004.
14(14): p. R546-51.
74. Bannister, A.J. and T. Kouzarides, Regulation of chromatin by histone modifications.
Cell Res, 2011. 21(3): p. 381-95.
75. Bártová, E., et al., Histone modifications and nuclear architecture: a review. J Histochem
Cytochem, 2008. 56(8): p. 711-21.
76. Zappacosta, F., et al., A Chemical Acetylation-Based Mass Spectrometry Platform for
Histone Methylation Profiling. Mol Cell Proteomics, 2021. 20: p. 100067.
77. ALLFREY, V.G., R. FAULKNER, and A.E. MIRSKY, ACETYLATION AND
METHYLATION OF HISTONES AND THEIR POSSIBLE ROLE IN THE REGULATION
OF RNA SYNTHESIS. Proc Natl Acad Sci U S A, 1964. 51(5): p. 786-94.
78. Turner, B.M., Histone acetylation and control of gene expression. J Cell Sci, 1991. 99
( Pt 1): p. 13-20.
79. Ng, S.S., et al., Dynamic protein methylation in chromatin biology. Cell Mol Life Sci,
2009. 66(3): p. 407-22.
80. Bedford, M.T. and S.G. Clarke, Protein arginine methylation in mammals: who, what,
and why. Mol Cell, 2009. 33(1): p. 1-13.
81. Greer, E.L. and Y. Shi, Histone methylation: a dynamic mark in health, disease and
inheritance. Nat Rev Genet, 2012. 13(5): p. 343-57.
82. Domínguez-Oliva, A., et al., The Importance of Animal Models in Biomedical Research:
Current Insights and Applications. Animals (Basel), 2023. 13(7).
83. Kehrer-Sawatzki, H. and D.N. Cooper, Understanding the recent evolution of the human
genome: insights from human-chimpanzee genome comparisons. Hum Mutat, 2007.
28(2): p. 99-130.
84. Kehrer-Sawatzki, H. and D.N. Cooper, Structural divergence between the human and
chimpanzee genomes. Hum Genet, 2007. 120(6): p. 759-78.
85. Waterston, R.H., et al., Initial sequencing and comparative analysis of the mouse
genome. Nature, 2002. 420(6915): p. 520-62.
86. Breschi, A., T.R. Gingeras, and R. Guigó, Comparative transcriptomics in human and
mouse. Nat Rev Genet, 2017. 18(7): p. 425-440.
95
87. Koonin, E.V., L. Aravind, and A.S. Kondrashov, The impact of comparative genomics on
our understanding of evolution. Cell, 2000. 101(6): p. 573-6.
88. Tatusov, R.L., et al., The COG database: a tool for genome-scale analysis of protein
functions and evolution. Nucleic Acids Res, 2000. 28(1): p. 33-6.
89. Sivashankari, S. and P. Shanmughavel, Comparative genomics - a perspective.
Bioinformation, 2007. 1(9): p. 376-8.
90. Haubold, B. and T. Wiehe, Comparative genomics: methods and applications.
Naturwissenschaften, 2004. 91(9): p. 405-21.
91. Wei, L., et al., Comparative genomics approaches to study organism similarities and
differences. J Biomed Inform, 2002. 35(2): p. 142-50.
92. Alvarez, M., A.W. Schrey, and C.L. Richards, Ten years of transcriptomics in wild
populations: what have we learned about their ecology and evolution? Mol Ecol, 2015.
24(4): p. 710-25.
93. Romero, I.G., I. Ruvinsky, and Y. Gilad, Comparative studies of gene expression and the
evolution of gene regulation. Nat Rev Genet, 2012. 13(7): p. 505-16.
94. DeBiasse, M.B. and M.W. Kelly, Plastic and Evolved Responses to Global Change:
What Can We Learn from Comparative Transcriptomics? J Hered, 2016. 107(1): p. 71-
81.
95. Jorstad, N.L., et al., Comparative transcriptomics reveals human-specific cortical
features. Science, 2023. 382(6667): p. eade9516.
96. Krienen, F.M., et al., Innovations present in the primate interneuron repertoire. Nature,
2020. 586(7828): p. 262-269.
97. Fang, R., et al., Conservation and divergence of cortical cell organization in human and
mouse revealed by MERFISH. Science, 2022. 377(6601): p. 56-62.
98. Bakken, T.E., et al., Comparative cellular analysis of motor cortex in human, marmoset
and mouse. Nature, 2021. 598(7879): p. 111-119.
99. Lage, K., et al., A large-scale analysis of tissue-specific pathology and gene expression of
human disease genes and complexes. Proc Natl Acad Sci U S A, 2008. 105(52): p.
20870-5.
100. Shafer, M.E.R., Cross-Species Analysis of Single-Cell Transcriptomic Data. Front Cell
Dev Biol, 2019. 7: p. 175.
96
101. Roux, J., M. Rosikiewicz, and M. Robinson-Rechavi, What to compare and how:
Comparative transcriptomics for Evo-Devo. J Exp Zool B Mol Dev Evol, 2015. 324(4):
p. 372-82.
102. Rohlfs, R.V. and R. Nielsen, Phylogenetic ANOVA: The Expression Variance and
Evolution Model for Quantitative Trait Evolution. Syst Biol, 2015. 64(5): p. 695-708.
103. Wittkopp, P.J. and G. Kalay, Cis-regulatory elements: molecular mechanisms and
evolutionary processes underlying divergence. Nat Rev Genet, 2011. 13(1): p. 59-69.
104. Meireles-Filho, A.C. and A. Stark, Comparative genomics of gene regulationconservation and divergence of cis-regulatory information. Curr Opin Genet Dev, 2009.
19(6): p. 565-70.
105. Pan, Q., et al., Deep surveying of alternative splicing complexity in the human
transcriptome by high-throughput sequencing. Nat Genet, 2008. 40(12): p. 1413-5.
106. Cui, Y., M. Cai, and H.E. Stanley, Comparative Analysis and Classification of Cassette
Exons and Constitutive Exons. Biomed Res Int, 2017. 2017: p. 7323508.
107. Marquez, Y., et al., Transcriptome survey reveals increased complexity of the alternative
splicing landscape in Arabidopsis. Genome Res, 2012. 22(6): p. 1184-95.
108. Barbosa-Morais, N.L., et al., The evolutionary landscape of alternative splicing in
vertebrate species. Science, 2012. 338(6114): p. 1587-93.
109. Naqvi, S., et al., Conservation, acquisition, and functional impact of sex-biased gene
expression in mammals. Science, 2019. 365(6450).
110. Consortium, G., The Genotype-Tissue Expression (GTEx) project. Nat Genet, 2013.
45(6): p. 580-5.
111. Dobin, A., et al., STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 2013.
29(1): p. 15-21.
112. Patro, R., et al., Salmon provides fast and bias-aware quantification of transcript
expression. Nat Methods, 2017. 14(4): p. 417-419.
113. Altschul, S.F., et al., Basic local alignment search tool. J Mol Biol, 1990. 215(3): p. 403-
10.
114. Sakuma, M., K. Iida, and M. Hagiwara, Deciphering targeting rules of splicing
modulator compounds: case of TG003. BMC Mol Biol, 2015. 16: p. 16.
115. Kryuchkova-Mostacci, N. and M. Robinson-Rechavi, A benchmark of gene expression
tissue-specificity metrics. Brief Bioinform, 2017. 18(2): p. 205-214.
97
116. Uhlén, M., et al., Transcriptomics resources of human tissues and organs. Mol Syst Biol,
2016. 12(4): p. 862.
117. Liao, B.Y. and J. Zhang, Evolutionary conservation of expression profiles between
human and mouse orthologous genes. Mol Biol Evol, 2006. 23(3): p. 530-40.
118. Yu, Y., et al., A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4
developmental stages. Nat Commun, 2014. 5: p. 3230.
119. Huh, J.W., et al., Large-scale transcriptome sequencing and gene analyses in the crabeating macaque (Macaca fascicularis) for biomedical research. BMC Genomics, 2012.
13: p. 163.
120. Fukushima, K. and D.D. Pollock, Amalgamated cross-species transcriptomes reveal
organ-specific propensity in gene expression evolution. Nat Commun, 2020. 11(1): p.
4459.
121. Zhu, J., et al., On the nature of human housekeeping genes. Trends Genet, 2008. 24(10):
p. 481-4.
122. Kirkwood, T.B., Understanding the odd science of aging. Cell, 2005. 120(4): p. 437-47.
123. Christensen, K., et al., Ageing populations: the challenges ahead. Lancet, 2009.
374(9696): p. 1196-208.
124. Dillin, A., D.E. Gottschling, and T. Nyström, The good and the bad of being connected:
the integrons of aging. Curr Opin Cell Biol, 2014. 26: p. 107-12.
125. MacNee, W., R.A. Rabinovich, and G. Choudhury, Ageing and the border between
health and disease. Eur Respir J, 2014. 44(5): p. 1332-52.
126. Hung, W.W., et al., Recent trends in chronic disease, impairment and disability among
older adults in the United States. BMC Geriatr, 2011. 11: p. 47.
127. Vijg, J. and E. Le Bourg, Aging and the Inevitable Limit to Human Life Span.
Gerontology, 2017. 63(5): p. 432-434.
128. Gorbunova, V., et al., Comparative genetics of longevity and cancer: insights from longlived rodents. Nat Rev Genet, 2014. 15(8): p. 531-40.
129. Zhao, Y., A. Seluanov, and V. Gorbunova, Revelations About Aging and Disease from
Unconventional Vertebrate Model Organisms. Annu Rev Genet, 2021. 55: p. 135-159.
130. Lu, J.Y., et al., Comparative transcriptomics reveals circadian and pluripotency
networks as two pillars of longevity regulation. Cell Metab, 2022. 34(6): p. 836-856.e5.
98
131. Johnson, F.B., D.A. Sinclair, and L. Guarente, Molecular biology of aging. Cell, 1999.
96(2): p. 291-302.
132. Bhadra, M., et al., Alternative splicing in aging and longevity. Hum Genet, 2020. 139(3):
p. 357-369.
133. Kim, D., et al., Graph-based genome alignment and genotyping with HISAT2 and
HISAT-genotype. Nat Biotechnol, 2019. 37(8): p. 907-915.
134. Kovaka, S., et al., Transcriptome assembly from long-read RNA-seq alignments with
StringTie2. Genome Biol, 2019. 20(1): p. 278.
135. Trincado, J.L., et al., SUPPA2: fast, accurate, and uncertainty-aware differential splicing
analysis across multiple conditions. Genome Biol, 2018. 19(1): p. 40.
136. Paradis, E., J. Claude, and K. Strimmer, APE: Analyses of Phylogenetics and Evolution in
R language. Bioinformatics, 2004. 20(2): p. 289-90.
137. Upham, N.S., J.A. Esselstyn, and W. Jetz, Inferring the mammal tree: Species-level sets
of phylogenies for questions in ecology, evolution, and conservation. PLoS Biol, 2019.
17(12): p. e3000494.
138. Subramanian, A., et al., Gene set enrichment analysis: a knowledge-based approach for
interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 2005. 102(43):
p. 15545-50.
139. Wu, T., et al., clusterProfiler 4.0: A universal enrichment tool for interpreting omics
data. Innovation (Camb), 2021. 2(3): p. 100141.
140. Ashburner, M., et al., Gene ontology: tool for the unification of biology. The Gene
Ontology Consortium. Nat Genet, 2000. 25(1): p. 25-9.
141. Kanehisa, M. and S. Goto, KEGG: kyoto encyclopedia of genes and genomes. Nucleic
Acids Res, 2000. 28(1): p. 27-30.
142. Paz, I., et al., RBPmap: a web server for mapping binding sites of RNA-binding proteins.
Nucleic Acids Res, 2014. 42(Web Server issue): p. W361-7.
143. de Magalhães, J.P., J. Costa, and G.M. Church, An analysis of the relationship between
metabolism, developmental schedules, and longevity using phylogenetic independent
contrasts. J Gerontol A Biol Sci Med Sci, 2007. 62(2): p. 149-60.
144. Owens, R.J. and F.E. Baralle, Mapping the collagen-binding site of human fibronectin by
expression in Escherichia coli. EMBO J, 1986. 5(11): p. 2825-30.
99
145. Calaycay, J., et al., Primary structure of a DNA- and heparin-binding domain (Domain
III) in human plasma fibronectin. J Biol Chem, 1985. 260(22): p. 12136-41.
146. Garcia-Pardo, A., A. Rostagno, and B. Frangione, Primary structure of human plasma
fibronectin. Characterization of a 38 kDa domain containing the C-terminal heparinbinding site (Hep III site) and a region of molecular heterogeneity. Biochem J, 1987.
241(3): p. 923-8.
147. Muro, A.F., et al., Regulated splicing of the fibronectin EDA exon is essential for proper
skin wound healing and normal lifespan. J Cell Biol, 2003. 162(1): p. 149-60.
148. Panda, S., J.B. Hogenesch, and S.A. Kay, Circadian rhythms from flies to human. Nature,
2002. 417(6886): p. 329-35.
149. Lowrey, P.L. and J.S. Takahashi, Mammalian circadian biology: elucidating genomewide levels of temporal organization. Annu Rev Genomics Hum Genet, 2004. 5: p. 407-
41.
150. Kondratov, R.V., et al., Early aging and age-related pathologies in mice deficient in
BMAL1, the core componentof the circadian clock. Genes Dev, 2006. 20(14): p. 1868-73.
151. Schumacher, B., et al., The central role of DNA damage in the ageing process. Nature,
2021. 592(7856): p. 695-703.
152. Karanjawala, Z.E. and M.R. Lieber, DNA damage and aging. Mech Ageing Dev, 2004.
125(6): p. 405-16.
153. Gensler, H.L. and H. Bernstein, DNA damage as the primary cause of aging. Q Rev Biol,
1981. 56(3): p. 279-303.
154. Jamar, N.H., P. Kritsiligkou, and C.M. Grant, Loss of mRNA surveillance pathways
results in widespread protein aggregation. Sci Rep, 2018. 8(1): p. 3894.
155. Gao, X., et al., Influences of donor and host age on human muscle-derived stem cellmediated bone regeneration. Stem Cell Res Ther, 2018. 9(1): p. 316.
156. Cheng, H., et al., Bone morphogenetic protein 4 rescues the bone regenerative potential
of old muscle-derived stem cells via regulation of cell cycle inhibitors. Stem Cell Res
Ther, 2022. 13(1): p. 385.
157. Wang, K., et al., Comprehensive map of age-associated splicing changes across human
tissues and their contributions to age-associated diseases. Sci Rep, 2018. 8(1): p. 10929.
158. Wright, P.E. and H.J. Dyson, Intrinsically disordered proteins in cellular signalling and
regulation. Nat Rev Mol Cell Biol, 2015. 16(1): p. 18-29.
100
159. Luo, C., H.R. Widlund, and P. Puigserver, PGC-1 Coactivators: Shepherding the
Mitochondrial Biogenesis of Tumors. Trends Cancer, 2016. 2(10): p. 619-631.
160. Puigserver, P., et al., A cold-inducible coactivator of nuclear receptors linked to adaptive
thermogenesis. Cell, 1998. 92(6): p. 829-39.
161. Singh, F., et al., PGC-1β modulates statin-associated myotoxicity in mice. Arch Toxicol,
2019. 93(2): p. 487-504.
162. Shi, D., et al., An isocaloric moderately high-fat diet extends lifespan in male rats and
Drosophila. Cell Metab, 2021. 33(3): p. 581-597.e9.
163. Awwad, S.W., et al., Recruitment of RBM6 to DNA Double-Strand Breaks Fosters
Homologous Recombination Repair. Mol Cell Biol, 2023. 43(3): p. 130-142.
164. Korotkov, A., A. Seluanov, and V. Gorbunova, Sirtuin 6: linking longevity with genome
and epigenome stability. Trends Cell Biol, 2021. 31(12): p. 994-1006.
165. Yousefzadeh, M., et al., DNA damage-how and why we age? Elife, 2021. 10.
166. Xiaoli, L., et al., Detection of genomic structure variants associated with wrinkled skin in
Xiang pig by next generation sequencing. Aging (Albany NY), 2021. 13(22): p. 24710-
24739.
167. Kwon, S.M., et al., Global spliceosome activity regulates entry into cellular senescence.
FASEB J, 2021. 35(1): p. e21204.
168. Angarola, B.L. and O. Anczuków, Splicing alterations in healthy aging and disease.
Wiley Interdiscip Rev RNA, 2021. 12(4): p. e1643.
169. Zhu, C., et al., regulates neurogenesis and reduces Alzheimer's disease-associated
pathology in the dentate gyrus of 5×FAD mice. Neural Regen Res, 2024. 19(4): p. 863-
871.
170. Zou, D., et al., Identification of molecular correlations of RBM8A with autophagy in
Alzheimer's disease. Aging (Albany NY), 2019. 11(23): p. 11673-11685.
171. Muñoz, P., R. Blanco, and M.A. Blasco, Role of the TRF2 telomeric protein in cancer
and ageing. Cell Cycle, 2006. 5(7): p. 718-21.
172. Grammatikakis, I., et al., Alternative Splicing of Neuronal Differentiation Factor TRF2
Regulated by HNRNPH1/H2. Cell Rep, 2016. 15(5): p. 926-934.
173. Horvath, S., et al., Pan-primate studies of age and sex. Geroscience, 2023. 45(6): p.
3187-3209.
101
174. Vislovukh, A., et al., Role of 3'-untranslated region translational control in cancer
development, diagnostics and treatment. World J Biol Chem, 2014. 5(1): p. 40-57.
175. Zeng, W., et al., Restoration of CPEB4 prevents muscle stem cell senescence during
aging. Dev Cell, 2023. 58(15): p. 1383-1398.e6.
176. van der Linden, R.J., et al., RNA-binding protein ELAVL4/HuD ameliorates Alzheimer's
disease-related molecular changes in human iPSC-derived neurons. Prog Neurobiol,
2022. 217: p. 102316.
177. Brennan, C.M. and J.A. Steitz, HuR and mRNA stability. Cell Mol Life Sci, 2001. 58(2):
p. 266-77.
178. Wang, C., et al., Reciprocal modulation of long noncoding RNA EMS and p53 regulates
tumorigenesis. Proc Natl Acad Sci U S A, 2022. 119(3).
179. Sun, D., et al., LncRNA DANCR counteracts premature ovarian insufficiency by
regulating the senescence process of granulosa cells through stabilizing the interaction
between p53 and hNRNPC. J Ovarian Res, 2023. 16(1): p. 41.
180. Tacutu, R., et al., Human Ageing Genomic Resources: new and updated databases.
Nucleic Acids Res, 2018. 46(D1): p. D1083-D1090.
181. Galluzzi, L., et al., Molecular mechanisms of cell death: recommendations of the
Nomenclature Committee on Cell Death 2018. Cell Death Differ, 2018. 25(3): p. 486-
541.
182. Green, D.R. and F. Llambi, Cell Death Signaling. Cold Spring Harb Perspect Biol, 2015.
7(12).
183. Elmore, S., Apoptosis: a review of programmed cell death. Toxicol Pathol, 2007. 35(4):
p. 495-516.
184. Peter, M.E., Programmed cell death: Apoptosis meets necrosis. Nature, 2011. 471(7338):
p. 310-2.
185. Green, D.R. and G. Kroemer, The pathophysiology of mitochondrial cell death. Science,
2004. 305(5684): p. 626-9.
186. McIlwain, D.R., T. Berger, and T.W. Mak, Caspase functions in cell death and disease.
Cold Spring Harb Perspect Biol, 2015. 7(4).
187. Van Opdenbosch, N. and M. Lamkanfi, Caspases in Cell Death, Inflammation, and
Disease. Immunity, 2019. 50(6): p. 1352-1364.
102
188. Larsen, B.D. and C.S. Sørensen, The caspase-activated DNase: apoptosis and beyond.
FEBS J, 2017. 284(8): p. 1160-1170.
189. Fricker, M., et al., Neuronal Cell Death. Physiol Rev, 2018. 98(2): p. 813-880.
190. Dekkers, M.P. and Y.A. Barde, Developmental biology. Programmed cell death in
neuronal development. Science, 2013. 340(6128): p. 39-41.
191. Yamaguchi, Y. and M. Miura, Programmed cell death in neurodevelopment. Dev Cell,
2015. 32(4): p. 478-90.
192. Williams, R.W. and K. Herrup, The control of neuron number. Annu Rev Neurosci,
1988. 11: p. 423-53.
193. Kristiansen, M. and J. Ham, Programmed cell death during neuronal development: the
sympathetic neuron model. Cell Death Differ, 2014. 21(7): p. 1025-35.
194. Huang, E.J. and L.F. Reichardt, Neurotrophins: roles in neuronal development and
function. Annu Rev Neurosci, 2001. 24: p. 677-736.
195. Kole, A.J., R.P. Annis, and M. Deshmukh, Mature neurons: equipped for survival. Cell
Death Dis, 2013. 4: p. e689.
196. Benn, S.C. and C.J. Woolf, Adult neuron survival strategies--slamming on the brakes.
Nat Rev Neurosci, 2004. 5(9): p. 686-700.
197. Easton, R.M., et al., Analysis of the mechanism of loss of trophic factor dependence
associated with neuronal maturation: a phenotype indistinguishable from Bax deletion. J
Neurosci, 1997. 17(24): p. 9656-66.
198. Davies, A.M., Developmental changes in the neurotrophic factor survival requirements
of peripheral nervous system neurons. Prog Brain Res, 1998. 117: p. 47-56.
199. Spreafico, R., et al., In situ labeling of apoptotic cell death in the cerebral cortex and
thalamus of rats during development. J Comp Neurol, 1995. 363(2): p. 281-95.
200. Blaschke, A.J., K. Staley, and J. Chun, Widespread programmed cell death in
proliferative and postmitotic regions of the fetal cerebral cortex. Development, 1996.
122(4): p. 1165-74.
201. Blaschke, A.J., J.A. Weiner, and J. Chun, Programmed cell death is a universal feature
of embryonic and postnatal neuroproliferative regions throughout the central nervous
system. J Comp Neurol, 1998. 396(1): p. 39-50.
202. Thomaidou, D., et al., Apoptosis and its relation to the cell cycle in the developing
cerebral cortex. J Neurosci, 1997. 17(3): p. 1075-85.
103
203. McConnell, M.J., H.R. MacMillan, and J. Chun, Mathematical modeling supports
substantial mouse neural progenitor cell death. Neural Dev, 2009. 4: p. 28.
204. Lin, L., et al., Developmental Attenuation of Neuronal Apoptosis by Neural-Specific
Splicing of Bak1 Microexon. Neuron, 2020. 107(6): p. 1180-1196.e8.
205. Petito, C.K., et al., Selective glial vulnerability following transient global ischemia in rat
brain. J Neuropathol Exp Neurol, 1998. 57(3): p. 231-8.
206. Chow, B.M., Y.Q. Li, and C.S. Wong, Radiation-induced apoptosis in the adult central
nervous system is p53-dependent. Cell Death Differ, 2000. 7(8): p. 712-20.
207. Casafont, I., et al., Effect of ionizing radiation in sensory ganglion neurons: organization
and dynamics of nuclear compartments of DNA damage/repair and their relationship
with transcription and cell cycle. Acta Neuropathol, 2011. 122(4): p. 481-93.
208. Li, Y.Q., et al., Time course of radiation-induced apoptosis in the adult rat spinal cord.
Radiother Oncol, 1996. 39(1): p. 35-42.
209. Labi, V. and M. Erlacher, How cell death shapes cancer. Cell Death Dis, 2015. 6: p.
e1675.
210. Hanahan, D. and R.A. Weinberg, The hallmarks of cancer. Cell, 2000. 100(1): p. 57-70.
211. Zhang, T., S. Cooper, and N. Brockdorff, The interplay of histone modifications - writers
that read. EMBO Rep, 2015. 16(11): p. 1467-81.
212. Kimura, H., Histone modifications for human epigenome analysis. J Hum Genet, 2013.
58(7): p. 439-45.
213. Voigt, P., W.W. Tee, and D. Reinberg, A double take on bivalent promoters. Genes Dev,
2013. 27(12): p. 1318-38.
214. Chen, L., A link between H3K27me3 mark and exon length in the gene promoters of
pluripotent and differentiated cells. Bioinformatics, 2010. 26(7): p. 855-9.
215. Liang, G., et al., Distinct localization of histone H3 acetylation and H3-K4 methylation to
the transcription start sites in the human genome. Proc Natl Acad Sci U S A, 2004.
101(19): p. 7357-62.
216. Cheung, I., et al., Developmental regulation and individual differences of neuronal
H3K4me3 epigenomes in the prefrontal cortex. Proc Natl Acad Sci U S A, 2010. 107(19):
p. 8824-9.
217. Ku, M., et al., Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes
of bivalent domains. PLoS Genet, 2008. 4(10): p. e1000242.
104
218. Zhu, L.J., et al., ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and
ChIP-chip data. BMC Bioinformatics, 2010. 11: p. 237.
219. Tinel, A. and J. Tschopp, The PIDDosome, a protein complex implicated in activation of
caspase-2 in response to genotoxic stress. Science, 2004. 304(5672): p. 843-6.
220. Tinel, A., et al., Autoproteolysis of PIDD marks the bifurcation between pro-death
caspase-2 and pro-survival NF-kappaB pathway. EMBO J, 2007. 26(1): p. 197-208.
221. Parra, E., L. Gutiérrez, and J. Ferreira, Inhibition of basal JNK activity by small
interfering RNAs enhances cisplatin sensitivity and decreases DNA repair in T98G
glioblastoma cells. Oncol Rep, 2015. 33(1): p. 413-8.
222. Sabio, G. and R.J. Davis, TNF and MAP kinase signalling pathways. Semin Immunol,
2014. 26(3): p. 237-45.
223. Liu, J., Y. Minemoto, and A. Lin, c-Jun N-terminal protein kinase 1 (JNK1), but not
JNK2, is essential for tumor necrosis factor alpha-induced c-Jun kinase activation and
apoptosis. Mol Cell Biol, 2004. 24(24): p. 10844-56.
224. Tournier, C., et al., Requirement of JNK for stress-induced activation of the cytochrome
c-mediated death pathway. Science, 2000. 288(5467): p. 870-4.
225. Baker, E., et al., Chromosomal location of the human tumor necrosis factor receptor
genes. Cytogenet Cell Genet, 1991. 57(2-3): p. 117-8.
226. Pobezinskaya, Y.L., et al., The function of TRADD in signaling through tumor necrosis
factor receptor 1 and TRIF-dependent Toll-like receptors. Nat Immunol, 2008. 9(9): p.
1047-54.
227. Chen, N.J., et al., Beyond tumor necrosis factor receptor: TRADD signaling in toll-like
receptors. Proc Natl Acad Sci U S A, 2008. 105(34): p. 12429-34.
228. Michallet, M.C., et al., TRADD protein is an essential component of the RIG-like helicase
antiviral pathway. Immunity, 2008. 28(5): p. 651-61.
229. Xu, D., et al., Modulating TRADD to restore cellular homeostasis and inhibit apoptosis.
Nature, 2020. 587(7832): p. 133-138.
230. Hyman, B.T. and J. Yuan, Apoptotic and non-apoptotic roles of caspases in neuronal
physiology and pathophysiology. Nat Rev Neurosci, 2012. 13(6): p. 395-406.
231. D'Amelio, M., V. Cavallucci, and F. Cecconi, Neuronal caspase-3 signaling: not only
cell death. Cell Death Differ, 2010. 17(7): p. 1104-14.
105
232. Hollville, E. and M. Deshmukh, Physiological functions of non-apoptotic caspase activity
in the nervous system. Semin Cell Dev Biol, 2018. 82: p. 127-136.
233. Gulyaeva, N.V., Non-apoptotic functions of caspase-3 in nervous tissue. Biochemistry
(Mosc), 2003. 68(11): p. 1171-80.
234. Geden, M.J., S.E. Romero, and M. Deshmukh, Apoptosis versus axon pruning:
Molecular intersection of two distinct pathways for axon degeneration. Neurosci Res,
2019. 139: p. 3-8.
235. Li, Z. and M. Sheng, Caspases in synaptic plasticity. Mol Brain, 2012. 5: p. 15.
236. Chan, S.L. and M.P. Mattson, Caspase and calpain substrates: roles in synaptic plasticity
and cell death. J Neurosci Res, 1999. 58(1): p. 167-90.
237. Goyal, L., Cell death inhibition: keeping caspases in check. Cell, 2001. 104(6): p. 805-8.
238. Nuñez, G., et al., Caspases: the proteases of the apoptotic pathway. Oncogene, 1998.
17(25): p. 3237-45.
239. Cohen, G.M., Caspases: the executioners of apoptosis. Biochem J, 1997. 326 ( Pt 1): p.
1-16.
240. Jürgensmeier, J.M., et al., Bax directly induces release of cytochrome c from isolated
mitochondria. Proc Natl Acad Sci U S A, 1998. 95(9): p. 4997-5002.
241. Narita, M., et al., Bax interacts with the permeability transition pore to induce
permeability transition and cytochrome c release in isolated mitochondria. Proc Natl
Acad Sci U S A, 1998. 95(25): p. 14681-6.
242. Hardwick, J.M. and L. Soane, Multiple functions of BCL-2 family proteins. Cold Spring
Harb Perspect Biol, 2013. 5(2).
243. Czabotar, P.E., et al., Control of apoptosis by the BCL-2 protein family: implications for
physiology and therapy. Nat Rev Mol Cell Biol, 2014. 15(1): p. 49-63.
244. Chipuk, J.E., et al., The BCL-2 family reunion. Mol Cell, 2010. 37(3): p. 299-310.
245. Youle, R.J. and A. Strasser, The BCL-2 protein family: opposing activities that mediate
cell death. Nat Rev Mol Cell Biol, 2008. 9(1): p. 47-59.
246. Shulha, H.P., et al., Coordinated cell type-specific epigenetic remodeling in prefrontal
cortex begins before birth and continues into early adulthood. PLoS Genet, 2013. 9(4): p.
e1003433.
106
247. Burney, M.J., et al., An epigenetic signature of developmental potential in neural stem
cells and early neurons. Stem Cells, 2013. 31(9): p. 1868-80.
248. Pereira, J.D., et al., Ezh2, the histone methyltransferase of PRC2, regulates the balance
between self-renewal and differentiation in the cerebral cortex. Proc Natl Acad Sci U S
A, 2010. 107(36): p. 15957-62.
249. Hirabayashi, Y., et al., Polycomb limits the neurogenic competence of neural precursor
cells to promote astrogenic fate transition. Neuron, 2009. 63(5): p. 600-13.
250. Sparmann, A., et al., The chromodomain helicase Chd4 is required for Polycombmediated inhibition of astroglial differentiation. EMBO J, 2013. 32(11): p. 1598-612.
251. Egan, C.M., et al., CHD5 is required for neurogenesis and has a dual role in facilitating
gene expression and polycomb gene repression. Dev Cell, 2013. 26(3): p. 223-36.
Abstract (if available)
Abstract
The post-genomic era has witnessed the rapid development of transcriptomics, propelled by high-throughput technologies. Comparative transcriptomic analysis, focusing on quantification and comparison of gene expression and alternative splicing across species, plays a pivotal role in unraveling the complexities of biological processes and evolution relationship. In this dissertation, we developed and implemented various statistical methods and computational pipeline to explore hidden information in the transcriptome and genome with comparative analysis.
In Chapter 2, we explored tissue specificity in gene expression across species, utilizing transcriptomic data from humans, cynomolgus macaques, rats, mice, and dogs in 13 tissues.
The evolutionary path of such tissue specificity provides essential information about the tissue-specific function of genes and the validity of disease animal models. We found that although tissue specificity of homologous gene expression is largely well-conserved across species, a total of 380 genes shifted or are in the process of shifting their tissue specificity. The tissue-specificity-shifting genes are less conserved than those preserving their tissue specificity or housekeeping genes. Interestingly, tissue-specificity-shifting genes tend to be less conserved at the third codon positions, likely due to their relaxed synonymous codon usage bias. Moreover, compared to genes, cassette exons are more likely to shift their tissue specificity of splicing across the five species.
Next, we delved deeper into explore one of regulation mechanism of gene expression: alternative splicing with comparative approach once more. We investigated the potential connection between alternative splicing, maximum lifespan, and longevity with a comparative transcriptomic analysis involving 26 mammal species with varying lifespans. We identified hundreds of alternative splicing events associated with maximum lifespans. These longevity-associated alternative splicing events exhibited tissue-specific patterns, particularly in the brain. They were enriched in genes related to metabolic processes, DNA damage or integrity checkpoints, and mRNA surveillance pathways. Additionally, we derived a list of alternative splicing events associated with human ages. Our analysis revealed longevity-associated splicing is enriched in genes encoding intrinsically disordered proteins. Our findings therefore shed light on the comprehensive role of alternative splicing in maximum lifespans and aging and present interesting targets for future research and therapeutic interventions.
In addition to post-transcriptional regulation, transcriptional regulation like histone modification also plays a crucial role in regulating gene expression. In Chapter 4, we tried to answer the questions that whether and how neurons globally reprogram the expression of apoptotic genes during development using a mouse dataset. We systematically examined the in vivo expression of 1923 apoptosis-related genes and associated histone modifications at eight developmental ages of mouse brains. Most apoptotic genes displayed consistent temporal patterns across the forebrain, midbrain, and hindbrain, suggesting ubiquitous robust developmental reprogramming. Although both anti- and pro-apoptotic genes can be up- or downregulated, half the regulatory events in the classical apoptosis pathway are downregulation of pro-apoptotic genes. Reduced expression in initiator caspases, apoptosome, and pro-apoptotic Bcl-2 family members restrains effector caspase activation and attenuates neuronal apoptosis. The developmental downregulation of apoptotic genes is attributed to decreasing histone-3-lysine-4-trimethylation (H3K4me3) signals at promoters, where histone-3-lysine-27-trimethylation (H3K27me3) rarely changes. By contrast, repressive H3K27me3 marks are lost in the upregulated gene groups, for which developmental H3K4me3 changes are not predictive. Hence, developing brains remove epigenetic H3K4me3 and H3K27me3 marks on different apoptotic gene groups, contributing to their downregulation and upregulation, respectively. As such, neurons drastically alter global apoptotic gene expression during development to transform apoptosis controls. Research into neuronal cell death should consider maturation stages as a biological variable.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Genome-wide studies reveal the isoform selection of genes
PDF
Mapping genetic variants for nonsense-mediated mRNA decay regulation across human tissues
PDF
Genome-wide studies reveal the function and evolution of DNA shape
PDF
Data-driven learning for dynamical systems in biology
PDF
Model selection methods for genome wide association studies and statistical analysis of RNA seq data
PDF
Probing the genetic basis of gene expression variation through Bayesian analysis of allelic imbalance and transcriptome studies of oil palm interspecies hybrids
PDF
Isoform quantification and splicing regulation analysis in RNA-seq studies
PDF
Exploration of human microbiome through metagenomic analysis and computational algorithms
PDF
Constructing metagenome-assembled genomes and mobile genetic element host interactions using metagenomic Hi-C
PDF
Enhancing phenotype prediction through integrative analysis of heterogeneous microbiome studies
PDF
Deciphering protein-nucleic acid interactions with artificial intelligence
PDF
Application of machine learning methods in genomic data analysis
PDF
Comparative analysis of DNA methylation in mammals
PDF
Efficient algorithms to map whole genome bisulfite sequencing reads
PDF
Using molecular techniques to explore the diversity, ecology and physiology of important protistan species, with an emphasis on the Prymnesiophyceae
PDF
DNA shape at transcription factor binding sites: from purifying selection to a new alphabet
PDF
Statistical and computational approaches for analyzing metagenomic sequences with reproducibility and reliability
PDF
Understanding the 3D genome organization in topological domain level
PDF
Machine learning of DNA shape and spatial geometry
PDF
Identifying allele-specific DNA methylation in mammalian genomes
Asset Metadata
Creator
Jiang, Wei
(author)
Core Title
Comparative transcriptomics: connecting the genome to evolution
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Computational Biology and Bioinformatics
Degree Conferral Date
2024-05
Publication Date
04/15/2024
Defense Date
03/26/2024
Publisher
Los Angeles, California
(original),
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
aging,comparative transcriptomics,Evolution,OAI-PMH Harvest,tissue specificity
Format
theses
(aat)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Chen, Liang (
committee chair
), Edge, Michael (
committee member
), Nakano, Aiichiro (
committee member
), Sun, Fengzhu (
committee member
)
Creator Email
jian227@usc.edu,katejiang1011@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113872130
Unique identifier
UC113872130
Identifier
etd-JiangWei-12808.pdf (filename)
Legacy Identifier
etd-JiangWei-12808
Document Type
Dissertation
Format
theses (aat)
Rights
Jiang, Wei
Internet Media Type
application/pdf
Type
texts
Source
20240415-usctheses-batch-1140
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu
Tags
comparative transcriptomics
tissue specificity