Copyright 2020 Chao XIA
An Analysis of Conservation of Methylation
by
Chao XIA
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
May 2020
DEDICATION
This work is dedicated to my parents for all of their endless love and support.
ACKNOWLEDGMENTS
I would like to thank my advisor Dr. Paul Marjoram for being a great mentor, allowing me to
join his research, and for being incredibly supportive and helpful. In addition, I would like to thank
my committee members Dr. Darryl Shibata and Dr. Kimberly Siegmund for their expertise and
guidance.
I’d also like to thank Dr. Darryl Shibata for his permission to use his methylation data.
Finally, I would like to thank Dr. Emil Hvitfeldt for his help during my research on this thesis.
Table of Contents
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
RESULTS
DISCUSSION
CONCLUSION
REFERENCES
LIST OF TABLES
Table 1
Average Manhattan distance by tissue type, stratified by whether sites are gene-associated
Tissue Type | Total | Gene-associated: Yes | Gene-associated: No | p-value | 95% CI
Colon | 0.099 | 0.088 | 0.118 | < 0.001 | (-0.031, -0.030)
Small Intestine | 0.109 | 0.101 | 0.134 | < 0.001 | (-0.033, -0.032)
Endometrium | 0.089 | 0.081 | 0.112 | < 0.001 | (-0.031, -0.030)
Note: CI = Confidence Interval; significance level = 0.05.
Table 2
Average Manhattan distance by tissue type, stratified by relationship with CpG island
Island relation | Colon | Small intestine | Endometrium
Sea | 0.107 | 0.125 | 0.102
South Shore | 0.102 | 0.112 | 0.089
South Shelf | 0.107 | 0.121 | 0.097
North Shore | 0.105 | 0.115 | 0.092
North Shelf | 0.107 | 0.121 | 0.097
Island | 0.055 | 0.060 | 0.051
Table 3
Bonferroni-adjusted pairwise p-values (ANOVA) for colon, stratified by relationship with island
Island relation | Island | North Shelf | North Shore | South Shelf | South Shore
North Shelf | < 0.001 | - | - | - | -
North Shore | < 0.001 | 0.112 | - | - | -
South Shelf | < 0.001 | > 0.999 | 0.217 | - | -
South Shore | < 0.001 | < 0.001 | < 0.001 | < 0.001 | -
Sea | < 0.001 | < 0.001 | < 0.001 | > 0.999 | < 0.001
Note: Bonferroni-adjusted significance level = 0.003.
Table 4
Bonferroni-adjusted pairwise p-values (ANOVA) for small intestine, stratified by relationship with island
Island relation | Island | North Shelf | North Shore | South Shelf | South Shore
North Shelf | < 0.001 | - | - | - | -
North Shore | < 0.001 | < 0.001 | - | - | -
South Shelf | < 0.001 | > 0.999 | < 0.001 | - | -
South Shore | < 0.001 | < 0.001 | < 0.001 | < 0.001 | -
Sea | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001
Note: Bonferroni-adjusted significance level = 0.003.
Table 5
Bonferroni-adjusted pairwise p-values (ANOVA) for endometrium, stratified by relationship with island
Island relation | Island | North Shelf | North Shore | South Shelf | South Shore
North Shelf | < 0.001 | - | - | - | -
North Shore | < 0.001 | < 0.001 | - | - | -
South Shelf | < 0.001 | > 0.999 | < 0.001 | - | -
South Shore | < 0.001 | < 0.001 | < 0.001 | < 0.001 | -
Sea | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001
Note: Bonferroni-adjusted significance level = 0.003.
Table 6
Average Manhattan distance by tissue type, stratified by whether sites are located in the 5’UTR
Tissue Type | 5’UTR: Yes | 5’UTR: No | p-value | 95% CI
Colon | 0.079 | 0.099 | < 0.001 | (-0.010, -0.009)
Small Intestine | 0.092 | 0.113 | < 0.001 | (0.008, 0.009)
Endometrium | 0.072 | 0.093 | < 0.001 | (0.008, 0.009)
Note: CI = Confidence Interval; significance level = 0.05.
Table 7
Average Manhattan distance by tissue type, stratified by whether sites are located in TSS1500
Tissue Type | TSS1500: Yes | TSS1500: No | p-value | 95% CI
Colon | 0.089 | 0.098 | < 0.001 | (-0.001, 0.000)
Small Intestine | 0.097 | 0.113 | < 0.001 | (0.003, 0.004)
Endometrium | 0.080 | 0.092 | < 0.001 | (0.000, 0.001)
Note: CI = Confidence Interval; significance level = 0.05.
Table 8
Average Manhattan distance by tissue type, stratified by whether sites are located in TSS200
Tissue Type | TSS200: Yes | TSS200: No | p-value | 95% CI
Colon | 0.055 | 0.101 | < 0.001 | (0.032, 0.033)
Small Intestine | 0.063 | 0.115 | < 0.001 | (0.038, 0.039)
Endometrium | 0.051 | 0.094 | < 0.001 | (0.029, 0.030)
Note: CI = Confidence Interval; significance level = 0.05.
LIST OF FIGURES
Figure 1. Average Manhattan distance for single CpGs as a function of position relative to the first 5' annotated CpG site of the gene
The greater conservation (lower average Manhattan distance) around genes indicates that DNA methylation conservation generally extends for hundreds of base pairs and is not isolated to single CpG sites.
Figure 2. Distribution of bootstrap p-values for genes
Each column corresponds to a specific tissue. The top row shows results from the naive bootstrap procedure, whereas the bottom row shows the adjusted bootstrap results.
Figure 3. Frequency of overlap between pathways that are called as most conserved for each tissue type
Figure 4. Enrichment map of pathways of conserved genes in small intestine tissue.
Edges are shown between pathways if the overlap ratio is greater than 0.5. Major clusters are labeled according to the most frequent words in pathway names.
Figure 5. Relationship between pathways that are called as conserved in small intestine tissue
Figure 6. Relationship between pathways that are called as conserved in colon tissue
Figure 7. Relationship between pathways that are called as conserved in endometrium tissue
Figure 8. The relationship between conservation and expression
Genes are collected into 10 groups according to the degree of conservation measured in our data. For each group, we then show a box plot of the distribution of gene expression values (FPKM) recorded for the corresponding tissue type in the ENCODE database. Columns correspond to tissue type. The top row shows results when assessing conservation for the entire gene; the bottom row shows results when assessing conservation just for the promoter region of each gene. Gene conservation correlates with expression better than promoter conservation does.
ABSTRACT
Background: DNA methylation is a chemical modification of DNA bases. With the development of sensitive technologies, DNA methylation has been found not only to play an important role in gene expression, but also to be associated with several diseases, such as depression, hypertension and cancer. It is also likely that some functional CpGs need to be conserved between cells from the same tissue, because changing them may cause disease. To explore this issue, we developed a method to examine the relationship between conservation of DNA methylation and gene expression.
Methods: We used 32 samples of normal tissue taken from the colon, small intestine and endometrium of 8 human subjects, with methylation measured at about 850,000 CpG sites. Each sample consists of around 500-10,000 cells from an individual crypt or gland. Gene expression data from 22 samples of colon, small intestine and endometrium were also used. A novel method consisting of a pairwise distance calculation and a bootstrapping procedure was developed to analyze the relationship between conservation of CpG methylation and gene expression.
Results: Some functional regions, such as CGIs, 5’UTR, TSS1500 and TSS200, are statistically significantly more conserved than other regions. Pathways of core housekeeping functions (cell cycle, DNA replication, transcription, translation) were statistically significantly conserved in all three tissues. We find that conservation in genes correlates with gene expression levels, while conservation of gene promoter methylation does not appear to correlate with expression in any of the three tissues.
Conclusion: Gene-associated regions and some functional regions, such as CGIs, 5’UTR, TSS1500 and TSS200, are more conserved than other regions. CpGs in genes with core housekeeping functions (cell cycle, DNA replication, transcription, translation) are more likely to be conserved in native human tissues. Gene expression is correlated with average CpG conservation, but CpG conservation in the promoter is not correlated with gene expression. This suggests that epigenetic conservation can indicate which genes are likely to be more important than others in native human tissues.
INTRODUCTION
DNA methylation is a chemical modification of DNA bases. It was first detected as early as 1948 (Hotchkiss, 1948). A role for DNA methylation in gene regulation, in particular for 5-methylcytosine (5mC), was proposed in the mid-1970s by Holliday and Pugh, among others. By 1980, the functional connection between DNA methylation and gene repression had been established (Razin & Riggs, 1980), and by 1985 the existence of CpG islands (CGIs) had been described (Bird, Taggart, Frommer, Miller & Macleod, 1985).
In recent years, sensitive technologies have been developed that allow the interrogation
of DNA methylation patterns from a small number of cells. The use of these technologies has
greatly improved our knowledge of DNA methylation dynamics and heterogeneity across genes
(Greenberg & Bourc’his, 2019).
Generally, DNA methylation is catalyzed and maintained by DNA methyltransferases (DNMTs), including DNMT1, DNMT3A, DNMT3B, DNMT3C and DNMT3L. Global DNA methylation in repetitive elements, such as ALU and LINE-1, is the measure most widely used in the study of population demographics (Gu, Wang, Nekrutenko & Li, 2000).
In 2017, a survey of 542 human transcription factors found that 117 (22%) exhibited decreased binding to their motifs when the motifs were methylated compared with when they were unmethylated (Yin et al., 2017). This research indicated that certain transcription factors are sensitive to CpG methylation. DNA methylation can also influence heterochromatin formation through the recruitment of chromatin remodelers and modifiers to chromatin by DNMT proteins (Dennis, Fan, Geiman, Yan & Muegge, 2001; Deplus et al., 2002; Esteve et al., 2006; Epsztejn-Litman et al., 2008; Fuks, Burgers, Brehm, Hughes-Davies & Kouzarides, 2000; Fuks, Burgers, Godin, Kasai & Kouzarides, 2001; Myant et al., 2011; Tao et al., 2011).
Besides the association between DNA methylation and gene repression, the relationship between DNA methylation and several diseases has also been examined. In 2014, Dalton, Kolshus & McLoughlin reported a negative association between early-life stress and the epigenetic modification of gene expression. In 2015, Lockwood, Su & Youssef concluded that epigenetics could play an important role in depression and suicide in humans.
In 2017, a review examined the association between DNA methylation of seven candidate genes and depression, and found that brain-derived neurotrophic factor (BDNF) and nuclear receptor subfamily 3 group C member 1 (NR3C1) gene methylation levels may be associated with depression, while there may be no relationship between methylation of the serotonin transporter gene (SLC6A4; synonyms: 5-HTT and SERT) and depression (Chen, Meng, Pei, Zheng & Leng, 2017). In the same year, Bakusic, Schaufeli, Claes & Godderis identified correlations between depression and both global and candidate-gene DNA methylation.
In 2019, a review of earlier studies concluded that there was a significant association between depression and hyper- and hypomethylation of both the promoter regions and the CpG islands of a number of genes. However, the authors also suggested that longitudinal studies should be designed to examine dynamic changes in methylation levels (Alexeeff et al., 2013).
DNA methylation not only influences depression, but also plays an important role in hypertension. Studies on LINE-1 concluded that there was a significant inverse association between methylation levels and high systolic (SBP) and diastolic (DBP) blood pressure (Baccarelli et al., 2010; Gardiner-Garden & Frommer, 1987; Turcot et al., 2012), while hypomethylation at ALU elements was related to higher blood pressure (Turcot et al., 2012).
Furthermore, DNA methylation is related to cancer. In general, the human genome is CpG-poor. Over two-thirds of mammalian promoters are located in CGIs (Larsen, Gundersen, Lopez & Prydz, 1992; Rakyan et al., 2010), which are very rarely methylated. However, with increasing age, DNA methylation tends to increase at some CGIs, specifically at Polycomb target genes (Esteller, 2002; Teschendorff et al., 2010) and at promoters of tumor suppressor genes (Burbee et al., 2001).
Besides this, tumor-suppressor genes often reside in genomic regions that are characterized by frequent chromosomal deletions. For example, two genes in such regions, RASSF1A at 3p21 (Dammann et al., 2000; Wales et al., 1995) and hypermethylated in cancer 1 (HIC1) at 17p13.3 (Kane et al., 1997), whose regions show frequent loss of heterozygosity in several tumor types, are often hypermethylated in many important human cancers, such as lung, prostate, colon and breast cancer. Moreover, the importance of epigenetic gene silencing in cancer is also evidenced by the fact that such changes can occur readily during tumor progression. As early as 1997, Kane et al. pointed out that the mismatch-repair gene MLH1 (mutL homologue 1, colon cancer, non-polyposis type 2) is frequently hypermethylated in sporadic tumors, and in 1998 Herman et al. reported the same result in another study. These changes in the methylation of the 5′ region of MLH1 have been observed not only in the normal colonic epithelium of patients with colorectal cancer showing microsatellite instability (Nakagawa et al., 2001), but also in the hyperplastic regions from which endometrial cancers develop (Esteller et al., 1999).
In 2018, based on their study of epigenetic heterogeneity in human colorectal tumors, Ryser, Yu, Grady, Siegmund & Shibata concluded that epigenomic intratumor heterogeneity (ITH) was minimal and more consistent with neutral evolution than with frequent stepwise selection during primary human colorectal tumor growth.
This previous research shows that DNA methylation is frequently related to disease. At the same time, not all CpG methylation is associated with disease. This suggests that some functional CpG sites may be rarely methylated and remain unchanged between cells from the same tissue, because any change would decrease cell fitness and could cause disease, while unimportant CpG sites may become methylated and differ between cells.
In this thesis, we developed a method to study the relationship between conservation of DNA methylation and gene expression, and applied it to about 850,000 CpG sites in 32 crypts/glands from the human colon, small intestine and endometrium of 8 different individuals. We find that, on average, CpG methylation is more conserved in genes, particularly housekeeping genes, and in more highly expressed genes in general.
MATERIALS AND METHODS
Data
Our data consist of 32 samples of normal tissue taken from the colon, small intestine and endometrium of 8 human subjects. Each sample consists of around 500-10,000 cells from an individual crypt or gland. We use the Illumina MethylationEPIC BeadChip Infinium microarray (Moran, Arribas & Esteller, 2016) to measure methylation at ~850,000 CpG sites via hybridization-ligation. This method can measure DNA from paraffin-embedded tissues and single tumor glands (~100 ng), and it has high technical reproducibility (Pearson correlation of 0.997 between replicates; Moran et al., 2016). In this study, we first measure the proportion of methylation at each assayed CpG site in samples from the different tissues. We then contrast our results with gene expression data taken from the Encyclopedia of DNA Elements (ENCODE, https://www.encodeproject.org/). We downloaded the call sets from the ENCODE portal with the following identifiers: ENCFF182PQO, ENCFF697OWF, ENCFF365ZMW, ENCFF517LAP, ENCFF654HMF, ENCFF968IKU, ENCFF885HSL, ENCFF630RAI, ENCFF122LQH, ENCFF537WNL, ENCFF411RSF, ENCFF647PTK, ENCFF970SOK, ENCFF256ELA, ENCFF769FAA, ENCFF080HMB, ENCFF845OKW, ENCFF309DZJ, ENCFF315DBK, ENCFF602HNU, ENCFF036RQO. All of these sets use the GRCh38 mapping assembly and genome annotation V24.
Statistical Approach
Our analysis method was implemented in the statistical programming language R, version 3.6.1. It enables us to evaluate how DNA methylation varies along the genome and within specific regions or genes, in order to identify genes or regions with particularly highly conserved DNA methylation. We use a bootstrapping approach for this problem. We then assess whether such conserved regions are consistent with genes that are highly expressed in the tissue of interest.
The methylation value for each CpG site in each gene of each sample is denoted $M$. Assume we have $S$ samples. Within each gene of interest, for every pair of samples, we calculate the Manhattan distance (Sammut & Webb, 2017) over all CpG sites at which methylation was measured. We then calculate the mean of these values across all pairs of samples and normalize it by the number of CpGs in the gene. This gives a standardized measure of conservation, $C_g$, for the $g$th gene, which has $N_g$ measured CpG sites. Since Manhattan distance is used to measure conservation, the lower $C_g$ is, the more conserved the gene.
More formally, for a specific gene $g$ we have a matrix $D$ of dimension $S \times N_g$, whose $(s, n)$th element, denoted $M_{sn}$, is the methylation measured for the $s$th sample at the $n$th CpG site. The overall Manhattan (or pairwise) distance between samples $i$ and $j$ is then $\sum_{k=1}^{N_g} \lvert M_{ik} - M_{jk} \rvert$, and the average Manhattan distance over all pairs of samples in this gene is

$$\frac{1}{\binom{S}{2}} \sum_{1 \le i < j \le S} \sum_{k=1}^{N_g} \lvert M_{ik} - M_{jk} \rvert,$$

where $i$ and $j$ are integers (here $S = 32$). To control for the number of CpGs in the gene, the conservation value $C_g$ is defined as

$$C_g = \frac{1}{\binom{S}{2} \, N_g} \sum_{1 \le i < j \le S} \sum_{k=1}^{N_g} \lvert M_{ik} - M_{jk} \rvert.$$
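The definition of $C_g$ can be checked with a minimal Python sketch for a single gene. It assumes methylation values are supplied as a list of per-sample rows (an $S \times N_g$ matrix with no missing values); the function name `conservation_value` is hypothetical, and this is an illustration rather than the thesis's R implementation.

```python
from itertools import combinations
from math import comb


def conservation_value(M):
    """Return C_g for one gene.

    M is an S x N_g matrix given as a list of rows: M[i][k] is the
    methylation value of sample i at CpG site k (no missing values).
    """
    S = len(M)
    n_g = len(M[0])
    total = 0.0
    # Sum the Manhattan distance over every pair of samples with i < j.
    for i, j in combinations(range(S), 2):
        total += sum(abs(a - b) for a, b in zip(M[i], M[j]))
    # Divide by the number of sample pairs, C(S, 2), and by N_g.
    return total / (comb(S, 2) * n_g)


# Toy example: 3 samples, 2 CpG sites.
M = [[0.1, 0.2],
     [0.3, 0.2],
     [0.1, 0.6]]
print(round(conservation_value(M), 3))  # 0.2
```

Identical samples give $C_g = 0$, the most conserved possible value.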
With this definition of the conservation value, we should be able to determine which genes are the most conserved (i.e., have the lowest values of $C_g$). One approach would be to simply rank the calculated $C_g$ values and select the smallest ones (say, values less than some threshold $L$). These genes would then be regarded as the most conserved on average at their CpG sites. However, because the variance of the observed $C_g$ is inversely proportional to $N_g$, the number of CpGs measured in the gene, under the null hypothesis that conservation does not differ among genes we would expect the smallest values of $C_g$ to over-represent genes with few measured CpGs, simply because their values are more variable. This is indeed what we observed in our dataset. Because $N_g$ varies greatly across genes on the MethylationEPIC BeadChip microarray, we used two bootstrapping approaches (Efron & Tibshirani, 1993) to reduce the influence of $N_g$.
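The over-representation of genes with few CpGs can be seen in a small simulation. The sketch below assumes, purely for illustration, that per-CpG distances are independent draws from a uniform distribution on [0, 0.2] (a hypothetical null model, not the thesis data), so that $C_g$ is a mean of $N_g$ independent values and its variance shrinks like $1/N_g$.

```python
import random
import statistics


def simulate_null_cg(n_g, n_genes, rng):
    """Simulate C_g for n_genes null genes with n_g CpGs each.

    Hypothetical null model: each CpG's distance is an independent
    Uniform(0, 0.2) draw, and C_g is the mean over the gene's CpGs.
    """
    return [statistics.fmean(rng.uniform(0.0, 0.2) for _ in range(n_g))
            for _ in range(n_genes)]


rng = random.Random(1)
null = {n_g: simulate_null_cg(n_g, 2000, rng) for n_g in (2, 10, 50)}
variances = {n_g: statistics.variance(v) for n_g, v in null.items()}
print(variances)  # expected to shrink roughly fivefold from each N_g to the next

# Pool the N_g = 2 and N_g = 50 genes and take the 5% with the lowest C_g:
# almost all are small genes, although no gene is truly more conserved.
pooled = [(c, 2) for c in null[2]] + [(c, 50) for c in null[50]]
lowest = sorted(pooled)[: len(pooled) // 20]
share_small = sum(1 for _, n_g in lowest if n_g == 2) / len(lowest)
print(share_small)
```

Ranking raw $C_g$ values would therefore mostly flag genes with few measured CpGs, which is what a per-$N_g$ null distribution is meant to prevent.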
For each observed value of $N_g$, we construct a null distribution using a bootstrapping procedure. First, we construct a table of all measured CpGs that are annotated as gene-linked in the EPIC microarray documentation. We then repeatedly sample sets of $N_g$ CpG sites to act as “null genes” in the following ways:
In the approach we call the “naive bootstrap”, we repeat the following steps for $l = 1, \dots, N_s$, for some large number $N_s$:
1. Sample a set of $N_g$ of these gene-associated CpGs independently at random, without regard to their location, to form null gene $g_l$.
2. Calculate the standardized Manhattan distance for null gene $g_l$ as described above, denoted $\hat{C}_l$.
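Because the double sum in $C_g$ can be reordered, $C_g$ equals the average, over the gene's CpGs, of each CpG's mean absolute methylation difference across sample pairs. A minimal Python sketch of the naive bootstrap can therefore precompute one distance per gene-linked CpG and average random subsets of them. The helper names, the toy data and drawing CpGs without replacement within each null gene are assumptions for illustration.

```python
import random
import statistics
from itertools import combinations


def per_cpg_distance(values):
    """Mean absolute methylation difference over all sample pairs at one CpG."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def naive_bootstrap(cpg_distances, n_g, n_boot, rng):
    """Null values C_hat_l, l = 1..n_boot, for genes with n_g CpGs.

    Each null gene g_l is n_g gene-linked CpGs drawn at random (without
    replacement within the null gene), ignoring genomic location.
    """
    return [statistics.fmean(rng.sample(cpg_distances, n_g))
            for _ in range(n_boot)]


# Toy data: methylation of 6 gene-linked CpGs (rows) in 4 samples (columns).
beta = [[0.10, 0.12, 0.11, 0.10],
        [0.80, 0.60, 0.90, 0.70],
        [0.50, 0.10, 0.90, 0.30],
        [0.05, 0.05, 0.06, 0.04],
        [0.95, 0.90, 0.92, 0.97],
        [0.40, 0.45, 0.35, 0.50]]
d = [per_cpg_distance(row) for row in beta]
null = naive_bootstrap(d, n_g=3, n_boot=1000, rng=random.Random(0))
print(len(null), min(null) >= min(d), max(null) <= max(d))  # 1000 True True
```

Precomputing the per-CpG distances makes each bootstrap replicate a simple average, which keeps 1000 replicates per value of $N_g$ cheap.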
In reality, however, methylation status is correlated between neighboring CpGs, so neighboring CpGs are more likely than non-neighboring CpGs to share the same methylation state. Null genes constructed by the naive bootstrap will typically not reproduce the correlation structure observed among neighboring CpG sites in real genomes, so this approach may perform poorly (this is discussed in detail in Results). For this reason, we also provide a more refined approach, which we refer to as the “adjusted bootstrap”. In this version of the bootstrap, we proceed as follows:
1. Extract all possible sets of $N_g$ consecutive CpGs within genes, such that all $N_g$ CpGs in a set are associated with the same gene. The total collection of such sets across all genes is called $S_{N_g}$.
2. Repeat steps 3 and 4 for $l = 1, \dots, N_s$, for some large number $N_s$.
3. Sample one set of $N_g$ consecutive CpGs from $S_{N_g}$ to form null gene $g_l$, so that, like a real gene, the null gene consists of neighboring CpGs. By construction, all the CpG sites it contains are associated with the same gene, so its correlation structure is that typical of nearby CpG sites.
4. Calculate the standardized Manhattan distance for null gene $g_l$ as described above, denoted $\hat{C}_l$.
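A matching Python sketch of the adjusted bootstrap follows. It assumes each gene's per-CpG distances (each CpG's mean absolute methylation difference across sample pairs, so that averaging them over a window gives that window's $C_g$) are already ordered by genomic position. The dictionary layout, helper names and sampling windows with replacement are illustrative assumptions.

```python
import random
import statistics


def consecutive_windows(genes, n_g):
    """Build S_{N_g}: every run of n_g consecutive CpGs inside a single gene.

    genes maps a gene name to its per-CpG distances ordered by position.
    """
    windows = []
    for distances in genes.values():
        for start in range(len(distances) - n_g + 1):
            windows.append(distances[start:start + n_g])
    return windows


def adjusted_bootstrap(genes, n_g, n_boot, rng):
    """Null values C_hat_l: each null gene g_l is one window drawn from S_{N_g}."""
    windows = consecutive_windows(genes, n_g)
    if not windows:
        raise ValueError(f"no gene has at least {n_g} measured CpGs")
    return [statistics.fmean(rng.choice(windows)) for _ in range(n_boot)]


genes = {"GENE_A": [0.10, 0.20, 0.30], "GENE_B": [0.50, 0.60]}  # toy values
print(len(consecutive_windows(genes, 2)))  # 3
null = adjusted_bootstrap(genes, n_g=2, n_boot=500, rng=random.Random(0))
print(sorted({round(v, 2) for v in null}))
```

Because every window comes from one gene, neighboring CpGs that tend to share a methylation state stay together in the null genes, unlike in the naive version.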
Both approaches test the same null hypothesis: that conservation does not differ among genes. However, the adjusted bootstrap constructs sets of CpGs that better reflect the correlation structure found within the genome.
In the results reported in this thesis, we repeat each bootstrapping procedure 1000 times ($N_s = 1000$) for each distinct value of $N_g$. The resulting $N_s$ values of $\hat{C}_l$ form the null distribution for every gene with that number of CpGs, and the significance of each observed $C_g$ is assessed against this distribution. The significance level is defined as the quantile of the observed value $C_g$ in its null distribution, which can be thought of as a p-value: the lower the quantile, the more significantly conserved the gene. We extract all genes whose quantile falls between the 0th and 5th percentiles, since these represent the most conserved part of the distribution of gene conservation (Figure 1), so that we can compare them with gene expression values (described below in Comparison to public data).
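The quantile-based p-value and the 5% cutoff can be sketched in Python as follows. The function names, the dictionary layout and counting null values less than or equal to the observed value are assumptions; other conventions, such as adding one to the numerator and the denominator, also exist.

```python
def empirical_quantile(observed, null):
    """Quantile of the observed C_g in its null distribution (a p-value).

    Small values mean the gene is more conserved than the null genes.
    """
    return sum(1 for v in null if v <= observed) / len(null)


def conserved_genes(observed_cg, n_cpgs, null_by_ng, cutoff=0.05):
    """Genes whose quantile lies between 0 and cutoff (0th-5th percentile).

    observed_cg: gene -> observed C_g
    n_cpgs:      gene -> N_g, the number of measured CpGs in the gene
    null_by_ng:  N_g -> list of N_s bootstrap values C_hat_l
    """
    return sorted(
        gene for gene, c_g in observed_cg.items()
        if empirical_quantile(c_g, null_by_ng[n_cpgs[gene]]) <= cutoff
    )


null_by_ng = {5: [i / 1000 for i in range(1000)]}  # toy null, N_s = 1000
observed_cg = {"GENE_A": 0.0205, "GENE_B": 0.4}
n_cpgs = {"GENE_A": 5, "GENE_B": 5}
print(empirical_quantile(0.0205, null_by_ng[5]))         # 0.021
print(conserved_genes(observed_cg, n_cpgs, null_by_ng))  # ['GENE_A']
```

With $N_s = 1000$ the smallest attainable non-zero quantile is 0.001, so genes more extreme than every null gene all tie at 0.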
To better illustrate the importance of the highly conserved genes identified by the procedure above, a gene set enrichment analysis is then applied to the genes we call as conserved, using the ReactomePA software package (Yu & He, 2016). This flags pathways that are significantly enriched among our set of conserved genes.
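ReactomePA's `enrichPathway` performs an over-representation analysis that, to our understanding, rests on a one-sided hypergeometric test for each pathway followed by multiple-testing correction. The standalone Python sketch below shows only that per-pathway test; the function name and the toy numbers are hypothetical, and it is not ReactomePA's code.

```python
from math import comb


def hypergeometric_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (universe)
    K: background genes annotated to the pathway
    n: genes called as conserved
    k: conserved genes annotated to the pathway
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers: 10 background genes, 5 in the pathway, 5 conserved genes.
print(hypergeometric_upper_tail(5, 5, 5, 10))  # 1/252, about 0.00397
print(hypergeometric_upper_tail(0, 5, 5, 10))  # 1.0
```

A small upper-tail probability means the pathway contains more conserved genes than expected if conserved genes were drawn at random from the background.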
Besides conducting this analysis for each gene using all CpGs associated with that gene, we also use the same procedure to analyze conservation in particular genic regions (the 5’UTR, and the regions within 200 or 1500 base pairs of the transcription start site, TSS200 and TSS1500), so that we can see whether these more localized regions correlate better with gene expression.
Finally, we also compare gene expression with gene conservation in the promoter region (the region of DNA where transcription of a gene is initiated).
Comparison to public data
The final step of our analysis is to evaluate how well gene conservation is related to gene expression. Since we have no expression data for the samples we used, we instead use data from ENCODE (https://www.encodeproject.org/). From this website, we can obtain expression levels for each gene in normal colon, small intestine and endometrium tissue, calculated from RNA-seq data for tissue samples. Expression is measured in Fragments Per Kilobase of transcript per Million mapped reads (FPKM).
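For reference, FPKM scales each gene's fragment count by its length in kilobases and by the total number of mapped fragments in millions. A minimal Python sketch, in which the function and argument names are hypothetical and the length is assumed to be the (effective) transcript length in base pairs:

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    per_kilobase = fragments / (length_bp / 1_000)
    return per_kilobase / (total_mapped_fragments / 1_000_000)


print(fpkm(1_000, 2_000, 10_000_000))  # 50.0
```

The length term makes values comparable between long and short genes, and the library-size term makes them comparable between samples.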
On ENCODE, we found 8 samples for colon, 10 samples for small intestine and 4 samples for endometrium. Since the ENCODE datasets give only a gene ID for each gene, we recommend using the biomaRt package in R to clean the dataset and map these IDs to the standard human gene nomenclature of the HUGO Gene Nomenclature Committee (HGNC).
Although the samples differ, the tissues are the same, and because we focus on normal tissues we believe that expression of most genes will be similar in the ENCODE samples and in ours. We expect that any correlation between conservation and expression would be stronger if expression were measured in our own samples than in unrelated public samples, so our test for correlation should be conservative. If this method proves applicable, we hope ultimately to apply it to tumor tissues.
RESULTS
First, we analyze the distribution of gene conservation and find that it is nonuniform along the genome, with greater conservation within genes (Table 1). In this table we show the mean per-CpG Manhattan distance, $C_g$, by category. We categorize the sites into the following groups: i) gene/non-gene; ii) whether they fall in CpG Islands, Shores, Shelves or Sea (Antequera & Bird, 1993); iii) whether they are located in the 5’UTR of a gene; iv) whether they are located within 1500 base pairs (bp) or 200 bp of the transcription start site (TSS). We use two-sample t-tests for groups i), iii) and iv), and ANOVA with Bonferroni-adjusted pairwise comparisons for group ii). This allows us to determine whether conservation differs between specific functional regions and other regions within each tissue.
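With six island-relation categories (Island, North and South Shore, North and South Shelf, Sea) there are 15 pairwise comparisons, so a Bonferroni correction at an overall level of 0.05 gives a per-comparison threshold of 0.05/15 ≈ 0.0033, consistent with the 0.003 noted under Tables 3 to 5. A minimal Python sketch of the adjustment, assuming the common convention of capping adjusted p-values at 1:

```python
from itertools import combinations

regions = ["Island", "North Shore", "North Shelf",
           "South Shore", "South Shelf", "Sea"]
pairs = list(combinations(regions, 2))
m = len(pairs)            # number of pairwise comparisons: 15
threshold = 0.05 / m      # per-comparison significance level, about 0.0033


def bonferroni(p_values, m):
    """Multiply each raw p-value by m, capping the result at 1."""
    return [min(1.0, p * m) for p in p_values]


print(m, round(threshold, 4))          # 15 0.0033
print(bonferroni([0.0001, 0.2], m))
```

Comparing raw p-values with 0.05/15 and comparing adjusted p-values with 0.05 lead to the same decisions.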
If, as we assume, conservation is an indicator of biological function, $C_g$ should be lower inside genes than outside genes. We find a statistically significant difference between the mean $C_g$ of CpG sites in genes versus outside genes for colon (p < 0.001, 95% CI: -0.031, -0.030), small intestine (p < 0.001, 95% CI: -0.033, -0.032) and endometrium (p < 0.001, 95% CI: -0.031, -0.030) (Table 1).
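These confidence intervals are for a difference in mean $C_g$ between two groups of CpG sites. With hundreds of thousands of CpGs per group, a two-sample t interval is essentially the same as its normal approximation, which the hypothetical Python sketch below uses (Welch-type standard error, unequal variances assumed); it is an illustration, not the thesis's R code, and the toy values are made up.

```python
import statistics
from statistics import NormalDist


def mean_difference_ci(x, y, level=0.95):
    """Large-sample CI for mean(x) - mean(y) with a Welch-type standard error."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = (statistics.variance(x) / len(x)
          + statistics.variance(y) / len(y)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return diff - z * se, diff + z * se


# Toy per-CpG distances for gene-associated (x) and other (y) sites.
x = [0.08, 0.09, 0.10, 0.07, 0.09]
y = [0.12, 0.11, 0.13, 0.12, 0.11]
low, high = mean_difference_ci(x, y)
print(round(low, 4), round(high, 4))
```

An interval lying entirely below zero, as for the gene versus non-gene comparison, indicates lower Manhattan distance, i.e. greater conservation, in the first group.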
As we mentioned in the Introduction, CpG Islands are rarely methylated, so these regions should show consistently low methylation and therefore high conservation. Consistent with this, CGIs have the lowest average Manhattan distance in all three tissues, and regions near islands, annotated as “Shore” (closest to the Island) and “Shelf” (between Shore and Sea), have lower conservation than CGIs (Table 2). According to Bonferroni-adjusted tests, the conservation of CGIs is statistically significantly different from that of the other five regions (p < 0.001) in all three tissues, as we expected.
At the same time, there are statistically significant differences between the mean conservation values of north shelf and south shore (p < 0.001), north shelf and sea (p < 0.001), north shore and south shore (p < 0.001), north shore and sea (p < 0.001), south shelf and south shore (p < 0.001), and south shore and sea (p < 0.001) in colon (Table 3). In small intestine and endometrium, the pairwise differences in mean conservation between these regions are all statistically significant (p < 0.001) except for south shelf versus north shelf (Tables 4 and 5).
Next, the conservation values of sites located in the 5’UTR, TSS1500 and TSS200 are compared with those of sites outside these regions. There is a statistically significant difference between the mean $C_g$ of CpG sites located in versus outside the 5’UTR for colon (p < 0.001, 95% CI: -0.010, -0.009), small intestine (p < 0.001, 95% CI: 0.008, 0.009) and endometrium (p < 0.001, 95% CI: 0.008, 0.009) (Table 6). Similarly, the conservation values of sites located in and outside TSS1500 are statistically significantly different in all three tissues (p < 0.001 for all three tissues; 95% CI: colon: -0.001, 0.000; small intestine: 0.003, 0.004; endometrium: 0.000, 0.001) (Table 7). The result for TSS200 is the same as for the previous two categories (p < 0.001 for all three tissues; 95% CI: colon: 0.032, 0.033; small intestine: 0.038, 0.039; endometrium: 0.029, 0.030) (Table 8). Of these regions, sites located in TSS200 are the most conserved in all three tissues.
The conservation of CpG sites as a function of their position relative to their associated gene is then examined (Figure 1). The conservation values are averaged across all genes. Each point represents the mean absolute difference in methylation frequency at a given CpG site across all pairs of samples. CpG sites are grouped by physical position, where 0 represents the location of the first CpG associated with a given gene (per the annotation file for the EPIC array). Manhattan distance is lowest within the first 2000 bp and then increases to a steady value along the rest of the gene, meaning that CpG sites within the first 2000 bp are more conserved than those at other positions. This pattern is replicated across all three tissue types, with a slightly different level for each type.
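The positional profile can be sketched by expressing each CpG's position relative to the first annotated CpG of its gene, grouping positions into fixed-width bins and averaging the per-CpG distances in each bin. The 100 bp bin width, the tuple layout and the assumption that positions are already oriented 5' to 3' along the gene are hypothetical choices for illustration.

```python
import statistics
from collections import defaultdict


def positional_profile(cpgs, bin_width=100):
    """Mean per-CpG distance by position relative to each gene's first CpG.

    cpgs: list of (gene, position_bp, distance) tuples, with positions
    oriented 5' to 3' along the gene. Returns {bin start (bp): mean distance}.
    """
    first = {}
    for gene, pos, _ in cpgs:
        first[gene] = min(pos, first.get(gene, pos))
    bins = defaultdict(list)
    for gene, pos, distance in cpgs:
        offset = pos - first[gene]  # 0 = first CpG of the gene
        bins[(offset // bin_width) * bin_width].append(distance)
    return {start: statistics.fmean(v) for start, v in sorted(bins.items())}


cpgs = [("GENE_A", 1000, 0.05), ("GENE_A", 1050, 0.07), ("GENE_A", 1250, 0.20),
        ("GENE_B", 500, 0.03), ("GENE_B", 620, 0.10)]
print(positional_profile(cpgs))
```

Each bin pools CpGs from all genes, so the profile reflects the average behavior at a given distance from the gene start rather than any single gene.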
In Figure 2, we show how the two bootstrapping methods described earlier perform in constructing p-value distributions when applied to our data. The first row shows the distribution from the “naive method”, in which CpG sites are picked at random to construct the null distribution for genes with the same number of CpGs. From the figure, we can see that the vast majority of genes have an observed $C_g$ value that is either always larger or always smaller than those of the “null” genes created by the bootstrapping procedure. This is problematic because it results in a large number of ties, with an over-representation of p-values equal to 0 or 1, when we try to rank the genes. The p-value distribution from the alternative “adjusted bootstrap” procedure is shown in the second row. It is much more uniform, as desired, and has far fewer ties, which enables us to rank the genes more effectively. After comparing the distributions produced by the two bootstrapping procedures, we decided to use the adjusted method to produce the results shown in the rest of this thesis.
Since not all genes are expressed in all tissues, conservation is expected to vary between tissues, so genes are ranked by conservation separately within each tissue. After calculating the conservation measures described above, we take the 5% most conserved genes for each tissue separately. A gene-set enrichment analysis is then conducted to identify the pathways over-represented among these genes, which we consider “conserved pathways”. We then determine how many of the conserved pathways are conserved in one, two or all three tissues.
In Figure 3, the first three columns show the number of pathways that are called significantly over-represented in only a single tissue type, while the next four columns show how many pathways are called conserved in two or more tissue types. The overall number of pathways called significantly conserved in each tissue is shown by the colored bars at the bottom left. Endometrium and small intestine have the greatest numbers of uniquely conserved pathways, but the overlap between them was just 6 pathways. Surprisingly, however, 50 pathways were conserved in all three tissues. These pathways are enriched in core housekeeping functions (cell cycle, DNA replication, transcription, translation), which are essential to all mitotic cells (Figure 4). This suggests that conservation of methylation can be used to detect genes and pathways that play important roles in these tissues.
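The counts in Figure 3 (pathways called in exactly one tissue, in two tissues, or in all three) amount to grouping each pathway by the exact set of tissues that call it conserved. A minimal Python sketch follows, assuming each pathway is counted once under its full tissue combination; the tissue keys and the pathway labels A-E are toy placeholders, not results from the thesis.

```python
from collections import Counter

def intersection_counts(pathways_by_tissue):
    """Count pathways by the exact combination of tissues that call them conserved.

    Each pathway is counted once, under the full set of tissues in which it is
    significant (the "exclusive" counting used by UpSet-style plots).
    """
    tissues_per_pathway = {}
    for tissue, pathways in pathways_by_tissue.items():
        for pathway in pathways:
            tissues_per_pathway.setdefault(pathway, set()).add(tissue)
    return Counter(frozenset(ts) for ts in tissues_per_pathway.values())

# Toy input with hypothetical pathway labels.
called = {
    "colon": {"A", "B", "C"},
    "small_intestine": {"A", "B", "D"},
    "endometrium": {"A", "D", "E"},
}
counts = intersection_counts(called)
for combo, n in sorted(counts.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(combo), n)

# Per-tissue totals (the bars at the bottom left of such a plot) are just the set sizes.
print({tissue: len(p) for tissue, p in called.items()})
```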
The enriched gene sets are organized into a network (Figures 5-7). In these figures, the nodes represent pathways that were labelled as conserved in our analysis, and an edge between two nodes indicates that the two pathways share genes. If the overlap proportion of genes between two pathways is less than 0.5, no edge is drawn (Yu, 2019). In all three tissues studied, most of the pathways that we detect as most conserved share a large proportion of their genes. In our view, this is because they perform housekeeping roles in cell function.
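The edge rule can be sketched in Python as follows. We assume here that the "overlap proportion" is the Jaccard index (shared genes divided by all genes in either pathway); other overlap measures exist, and the measure actually used should be checked against the enrichplot documentation for the version cited (Yu, 2019). The 0.5 cut-off comes from the text; the gene sets are toy placeholders.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def enrichment_map_edges(gene_sets, min_overlap=0.5):
    """List (pathway_1, pathway_2, overlap) for pairs whose overlap is at least min_overlap.

    gene_sets: {pathway name: set of genes}. Pairs below the cut-off get no edge.
    """
    edges = []
    for (p1, g1), (p2, g2) in combinations(sorted(gene_sets.items()), 2):
        overlap = jaccard(g1, g2)
        if overlap >= min_overlap:
            edges.append((p1, p2, overlap))
    return edges

# Toy gene sets (not the thesis's pathways).
gene_sets = {
    "DNA replication": {"MCM2", "MCM3", "MCM4", "PCNA"},
    "Cell cycle": {"MCM2", "MCM3", "MCM4", "CDK1"},
    "Translation": {"RPL3", "RPS6"},
}
print(enrichment_map_edges(gene_sets))
```

Only the two pathways sharing three of their five distinct genes are joined; raising the cut-off thins the network, which is why clusters in such figures reflect shared genes rather than shared function alone.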
Finally, we examine the correlation between conservation and gene expression. Although variation in gene promoter methylation is often associated with gene silencing, whether it correlates with gene expression is still unclear (Ross et al., 2005). In Figure 8, we present the relationship of gene expression with gene conservation and with gene promoter methylation. The first row shows the adjusted conservation values obtained by our adjusted bootstrapping approach for genes (x-axis, binned by value), while the second row shows the mean methylation in promoter regions (x-axis, binned by value). The expression values (y-axis) were taken from ENCODE.
The values for conservation and for mean promoter methylation have been binned so that each bin contains an equal number of points. In the first row, a higher bin number represents more conserved genes. In the first row, expression increases from left to right as conservation increases, but there is no such trend for gene promoters in the second row. We therefore conclude that CpG conservation in genes correlates with gene expression levels, while CpG conservation in gene promoters does not appear to correlate with expression in any of the three tissues.
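Equal-count binning and a simple check of the trend across bins can be sketched in Python as below. This is an illustration under assumptions: the number of bins, the toy data and the use of Spearman's rho on per-bin means are ours; the text describes the Figure 8 trend visually and does not name a correlation statistic.

```python
from statistics import mean

def equal_count_bins(values, n_bins):
    """Bin index (0 .. n_bins - 1) for each value so that bins hold nearly equal counts.

    Values are ranked and the ranking is cut into n_bins consecutive chunks;
    tied values are split by their original order (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda j: values[j])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins

def mean_by_bin(bins, y, n_bins):
    """Mean of y within each bin; assumes every bin is non-empty (n >= n_bins)."""
    return [mean([v for b, v in zip(bins, y) if b == k]) for k in range(n_bins)]

def spearman_no_ties(x, y):
    """Spearman's rho for tie-free data (n >= 2): 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x = {i: r for r, i in enumerate(sorted(range(n), key=lambda j: x[j]))}
    rank_y = {i: r for r, i in enumerate(sorted(range(n), key=lambda j: y[j]))}
    d2 = sum((rank_x[i] - rank_y[i]) ** 2 for i in range(n))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy data (hypothetical): expression rises with conservation, plus a small wobble.
conservation = list(range(12))
expression = [c + 0.5 * (c % 2) for c in conservation]

bins = equal_count_bins(conservation, 4)
per_bin = mean_by_bin(bins, expression, 4)
print(bins)
print([round(m, 3) for m in per_bin])
print(spearman_no_ties(list(range(4)), per_bin))
```

Equal-count bins keep each point on the x-axis supported by the same number of genes, so a rise in per-bin mean expression is not driven by a few sparsely populated extreme bins.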
DISCUSSION
This thesis introduces a novel method for calculating conservation and carrying out the associated bootstrapping procedure. The method consists of two parts: calculation of a conservation value for each region, followed by a bootstrapping procedure that calculates p-values based on conservation values and region length. The function used to compute the conservation value is customizable; here we used the arithmetic mean in R. Two different bootstrapping methods are introduced in this thesis. To avoid ties, with an over-representation of p-values equal to 0 or 1 (Figure 2), we chose the so-called “adjusted bootstrapping”. Because pairwise distances are calculated, the method requires at least two samples.
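The pairwise-distance step, and why it needs at least two samples, can be sketched in Python. We assume, hypothetically, that each sample supplies one methylation beta value per CpG in the region, that each sample pair contributes its Manhattan distance divided by the number of CpGs (a mean absolute difference), and that these per-pair values are combined with a customizable summary function defaulting to the arithmetic mean. Smaller values then indicate more conserved methylation. The exact definition of $C_g$ given earlier in the thesis takes precedence over this sketch.

```python
from itertools import combinations
from statistics import mean, median

def region_pairwise_distance(betas_by_sample, summary=mean):
    """Summarized pairwise Manhattan distance for one region (hypothetical definition).

    betas_by_sample: list of samples, each a list of beta values (same CpGs, same order).
    Each pair's Manhattan distance is divided by the number of CpGs, and the
    per-pair values are combined with `summary` (arithmetic mean by default).
    """
    if len(betas_by_sample) < 2:
        raise ValueError("at least two samples are needed for a pairwise distance")
    n_cpg = len(betas_by_sample[0])
    if n_cpg == 0 or any(len(s) != n_cpg for s in betas_by_sample):
        raise ValueError("all samples must cover the same, non-empty set of CpG sites")
    per_pair = [
        sum(abs(a - b) for a, b in zip(s1, s2)) / n_cpg
        for s1, s2 in combinations(betas_by_sample, 2)
    ]
    return summary(per_pair)

# Toy region with 2 CpGs measured in 3 samples.
samples = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.7],
]
print(region_pairwise_distance(samples))
print(region_pairwise_distance(samples, summary=median))
```

Passing a different `summary` (for example `median`) mirrors the idea that the summary step is customizable, while the guard on sample count makes the two-sample requirement explicit.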
In this thesis, 32 samples of CpG methylation data and 22 samples of gene expression data from colon, small intestine and endometrium were used. To test our hypothesis that DNA methylation is correlated with gene expression, we used this method to quantify and rank the conservation of CpG DNA methylation and to relate it to gene expression in these datasets.
From our analysis, we conclude that conservation of DNA methylation differs between regions of genes. More specifically, functional regions are more conserved than other regions: gene-associated regions are more likely to be conserved than non-associated regions, and CGIs are more conserved than shelves, shores and open seas. Furthermore, CpGs in the 5’UTR, TSS1500 and TSS200 regions are more conserved than those outside these regions.
Our visualizations suggest that function is associated with conservation across multiple gene-associated CpG sites rather than at any specific CpG site (Figure 1). This may imply that the epigenetic configuration of a gene region is more informative than a single CpG site. Moreover, gene set enrichment analysis shows that genes involved in core housekeeping functions are more conserved than genes with other functions (Figures 4-7), which further indicates that some genes are more informative than others.
Most importantly, conservation correlates with gene expression levels (Figure 8). Interestingly, although gene promoter methylation is generally associated with gene silencing [46], it does not appear to correlate with expression in any of the three tissues, which may indicate that, to a certain degree, promoter methylation is less important for the expression of most genes.
From all these results, we see that known functional regions in genes have greater epigenetic conservation, and that such conservation can be used to identify which genes are more important to the survival of their cells. Although conservation at single CpG sites does not indicate function, highly conserved functional regions can serve as indicators for the unbiased discovery of important genes or noncoding regions that are related to the function or survival of human cells. In this way, epigenetic conservation can indicate which genes are under stronger selection than others in native human tissues. This provides a potential foundation for our next research step: applying this method to CpG conservation and gene expression in tumor tissues.
CONCLUSION
Our research applies a novel method to CpG methylation data from 32 samples and gene expression data from 22 samples of colon, small intestine and endometrium to study the relationship between CpG methylation and gene expression in these tissues. We find that gene-associated regions and some functional regions, such as CGIs, 5’UTR, TSS1500 and TSS200, are more conserved than other regions. Furthermore, CpGs in genes with core housekeeping functions (cell cycle, DNA replication, transcription, translation) are more likely to be conserved in native human tissues. Finally, we examine the relationship between gene expression and CpG methylation. We find that gene expression is correlated with average CpG conservation, but CpG conservation in gene promoters is not correlated with gene expression.
REFERENCES
Antequera, F. & Bird, A. (1993). CpG Islands. In Jost, J. P. & Saluz, H. P. (eds) DNA Methylation
(pp. 169-185). Switzerland: Birkhäuser Basel.
Alexeeff, S. E., Baccarelli, A. A., Halonen, J., Coull, B. A., Wright, R. O., Tarantini, L., …
Schwartz, J. (2013). Association between blood pressure and DNA methylation of
retrotransposons and pro-inflammatory genes. Int J Epidemiol, 42, 270-280. doi:
10.1093/ije/dys220
Burbee, D. G., Forgacs, E., Zöchbauer-Müller, S., Shivakumar, L., Fong, K., Gao, B., … Minna,
J. D. (2001). Epigenetic inactivation of RASSF1A in lung and breast cancers and
malignant phenotype suppression. J. Natl Cancer Inst., 93, 691-699. doi:
10.1093/jnci/93.9.691.
Bakusic, J., Schaufeli, W., Claes, S. & Godderis, L. (2017). Stress, burnout and depression: A
systematic review on DNA methylation mechanisms. J. Psychosom. Res., 92, 34-44. doi:
10.1016/j.jpsychores.2016.11.005.
Bird, A., Taggart, M., Frommer, M., Miller, O. J. & Macleod, D. (1985). A fraction of the mouse
genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell, 40, 91-99.
doi: 10.1016/0092-8674(85)90312-5.
Baccarelli, A., Wright, R., Bollati, V., Litonjua, A., Zanobetti, A., Tarantini, L., … Schwartz, J.
(2010). Ischemic heart disease and stroke in relation to blood DNA methylation.
Epidemiology, 21, 819-828. doi: 10.1097/EDE.0b013e3181f20457.
Chen, D., Meng, L., Pei, F., Zheng, Y. & Leng, J. (2017). A review of DNA methylation in
depression. J. Clin. Neurosci., 43, 39-46. doi: 10.1016/j.jocn.2017.05.022.
Deplus, R., Brenner, C., Burgers, W. A., Putmans, P., Kouzarides, T., de Launoit, Y. & Fuks, F.
(2002). Dnmt3L is a transcriptional repressor that recruits histone deacetylase. Nucleic
Acids Res., 30, 3831-3838. doi: 10.1093/nar/gkf509
Dennis, K., Fan, T., Geiman, T., Yan, Q. & Muegge, K. (2001). Lsh, a member of the SNF2 family,
is required for genome-wide methylation. Genes Dev., 15, 2940-2944. doi:
10.1101/gad.929101.
Dalton, V. S., Kolshus, E. & McLoughlin, D. M. (2014). Epigenetics and depression: return of the
repressed. J. Affect. Disord., 155, 1-12. doi: 10.1016/j.jad.2013.10.028.
Dammann, R., Li, C., Yoon, J. H., Chin, P. L., Bates, S. & Pfeifer, G. P. (2000). Epigenetic
inactivation of a RAS association domain family protein from the lung tumour
suppressor locus 3p21.3. Nature Genet., 25, 315-319. doi: 10.1038/77083.
Esteller, M. (2002). CpG island hypermethylation and tumor suppressor genes: a booming present,
a brighter future. Oncogene, 21, 5427-5440. doi: 10.1038/sj.onc.1205600.
Esteller, M., Catasus, L., Matias-Guiu, X., Mutter, G. L., Prat, J., Baylin, S. B. & Herman, J. G.
(1999). hMLH1 promoter hypermethylation is an early event in human endometrial
tumorigenesis. Am. J. Pathol., 155, 1767-1772. doi: 10.1016/S0002-9440(10)65492-2.
Esteve, P. O., Chin, H. G., Smallwood, A., Feehery, G. R., Gangisetty, O., Karpf, A. R., … Pradhan,
S. (2006). Direct interaction between DNMT1 and G9a coordinates DNA and histone
methylation during replication. Genes Dev., 20, 3089-3103. doi: 10.1101/gad.1463706.
Epsztejn-Litman, S., Feldman, N., Abu-Remaileh, M., Shufaro, Y., Gerson, A., Ueda,
J., … Bergman, Y. (2008). De novo DNA methylation promoted by G9a prevents
programming of embryonically silenced genes. Nat. Struct. Mol. Biol., 15, 1176-1183.
doi: 10.1038/nsmb.1476.
Efron, B. & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York, NY: Chapman &
Hall/CRC.
Fuks, F., Burgers, W. A., Brehm, A., Hughes-Davies, L. & Kouzarides, T. (2000). DNA
methyltransferase Dnmt1 associates with histone deacetylase activity. Nat. Genet., 24,
88-91. doi: 10.1038/71750.
Fuks, F., Burgers, W. A., Godin, N., Kasai, M. & Kouzarides, T. (2001). Dnmt3a binds
deacetylases and is recruited by a sequence‐specific repressor to silence transcription.
EMBO J., 20, 2536-2544. doi: 10.1093/emboj/20.10.2536.
Gardiner-Garden, M. & Frommer, M. (1987). CpG Islands in vertebrate genomes. J. Mol. Biol.,
196, 261-282. doi: 10.1016/0022-2836(87)90689-9.
Gu, Z., Wang, H., Nekrutenko, A. & Li, W.-H. (2000). Densities, length proportions, and other
distributional features of repetitive sequences in the human genome estimated from 430
megabases of genomic sequence. Gene, 259, 81-88. doi: 10.1016/s0378-1119(00)00434-0.
Hotchkiss, R. D. (1948). The quantitative separation of purines, pyrimidines, and nucleosides by
paper chromatography. J. Biol. Chem., 175, 315-332.
Holliday, R. & Pugh, J. E. (1975). DNA modification mechanisms and gene activity during
development. Science, 187, 226-232. doi: 10.1126/science.187.4173.226.
Herman, J. G., Umar, A., Polyak, K., Graff, J. R., Ahuja, N., Issa, J. P., … Baylin, S. B. (1998).
Incidence and functional consequences of hMLH1 promoter hypermethylation in
colorectal carcinoma. Proc. Natl Acad. Sci. USA, 95, 6870-6875. doi:
10.1073/pnas.95.12.6870.
Kane, M. F., Loda, M., Gaida, G. M., Lipman, J., Mishra, R., Goldman, H., … Kolodner, R. (1997).
Methylation of the hMLH1 promoter correlates with lack of expression of hMLH1 in
sporadic colon tumors and mismatch repair-defective human tumor cell lines. Cancer
Res., 57, 808-811.
Larsen, F., Gundersen, G., Lopez, R. & Prydz, H. (1992). CpG islands as gene markers in the
human genome. Genomics, 13, 1095-1107. doi: 10.1016/0888-7543(92)90024-m.
Lockwood, L. E., Su, S. & Youssef, N. A. (2015). The role of epigenetics in depression and suicide:
A platform for gene-environment interactions. Psychiatry Res., 228, 235-242. doi:
10.1016/j.psychres.2015.05.071.
Moran, S., Arribas, C. & Esteller, M. (2016). Validation of a DNA methylation microarray for
850,000 CpG sites of the human genome enriched in enhancer sequences. Epigenomics,
8, 389-399. doi: 10.2217/epi.15.114.
Greenberg, M. V. C. & Bourc’his, D. (2019). The diverse roles of DNA methylation in
mammalian development and disease. Nat. Rev. Mol. Cell Biol., 20, 590-607. doi:
10.1038/s41580-019-0159-6.
Myant, K., Termanis, A., Sundaram, A. Y., Boe, T., Li, C., Merusi, C., … Stancheva, I. (2011).
LSH and G9a/GLP complex are required for developmentally programmed DNA
methylation. Genome Res., 21, 83-94. doi: 10.1101/gr.108498.110.
Nakagawa, H., Nuovo, G. J., Zervos, E. E., Martin, E. W. Jr., Salovaara, R., Aaltonen, L. A. & de
la Chapelle, A. (2001). Age-related hypermethylation of the 5′ region of MLH1 in
normal colonic mucosa is associated with microsatellite-unstable colorectal cancer
development. Cancer Res., 61, 6991-6995.
Rakyan, V. K., Down, T. A., Maslau, S., Andrew, T., Yang, T. P., Beyan, H., … Spector, T. D.
(2010). Human aging-associated DNA hypermethylation occurs preferentially at
bivalent chromatin domains. Genome Res., 20, 434-439. doi: 10.1101/gr.103101.109.
Ross, M., Grafham, D., Coffey, A., Scherer, S., McLay, K., Muzny, D., … Bentley, D. R. (2005).
The DNA sequence of the human X chromosome. Nature, 434, 325–337. doi:
10.1038/nature03440.
Razin, A. & Riggs, A. D. (1980). DNA methylation and gene function. Science, 210, 604-610. doi:
10.1126/science.6254144.
Ryser, M. D., Yu, M., Grady, W., Siegmund, K., & Shibata, D. (2018). Epigenetic Heterogeneity
in Human Colorectal Tumors Reveals Preferential Conservation And Evidence of
Immune Surveillance. Scientific reports, 8, 17292. doi: 10.1038/s41598-018-35621-y.
Sammut, C. & Webb, G. I. (2017). Encyclopedia of Machine Learning and Data
Mining. Boston, MA: Springer.
Teschendorff, A. E., Menon, U., Gentry-Maharaj, A., Ramus, S. J., Weisenberger, D. J., Shen,
H., … Widschwendter, M. (2010). Age-dependent DNA methylation of genes that are
suppressed in stem cells is a hallmark of cancer. Genome Res., 20, 440-446. doi:
10.1101/gr.103606.109.
Turcot, V., Tchernof, A., Deshaies, Y., Perusse, L., Belisle, A., Marceau, S., … Vohl, M. C. (2012).
LINE-1 methylation in visceral adipose tissue of severely obese individuals is associated
with metabolic syndrome status and related phenotypes. Clin Epigenetics, 4, 10. doi:
10.1186/1868-7083-4-10.
Tao, Y., Xi, S., Shan, J., Maunakea, A., Che, A., Briones, V., … Muegge, K. (2011). Lsh,
chromatin remodeling family member, modulates genome-wide cytosine methylation
patterns at nonrepeat sequences. Proc. Natl Acad. Sci. USA, 108, 5626-5631. doi:
10.1073/pnas.1017000108.
Wales, M. M., Biel, M. A., el Deiry, W., Nelkin, B. D., Issa, J. P., Cavenee, W. K., … Baylin, S.
B. (1995). p53 activates expression of HIC-1, a new candidate tumour suppressor gene
on 17p13.3. Nature Med., 1, 570-577. doi: 10.1038/nm0695-570.
Yu, G. (2019). Enrichplot: Visualization of Functional Enrichment Result [R package version].
Retrieved from https://github.com/GuangchuangYu/enrichplot.
Yu, G. & He, Q. Y. (2016). ReactomePA: an R/Bioconductor package for reactome pathway
analysis and visualization. Molecular BioSystems, 12, 477-479. doi:
10.1039/c5mb00663e.
Yin, Y., Morgunova, E., Jolma, A., Kaasinen, E., Sahu, B., Khund-Sayeed, S., … Taipale, J. (2017).
Impact of cytosine methylation on DNA binding specificities of human transcription
factors. Science, 356, eaaj2239. doi: 10.1126/science.aaj2239.
Abstract
Background: DNA methylation is a chemical modification of DNA bases. With the development of sensitive technologies, DNA methylation has been found not only to play an important role in gene expression but also to be associated with several diseases, such as depression, hypertension and cancer. It is also likely that some functional CpGs need to be conserved between cells from the same tissue, because changing them may cause disease. To explore this issue, we developed a method to study the relationship between conservation of DNA methylation and gene expression.

Methods: Thirty-two samples of normal tissue, each covering about 850,000 CpG sites, taken from the colon, small intestine and endometrium of 8 human subjects were used in this thesis. Each sample consists of around 500-10,000 cells from an individual crypt or gland. Gene expression data from 22 samples of colon, small intestine and endometrium were also used. A novel method consisting of a pairwise distance calculation and a bootstrapping procedure was developed to analyze the relationship between CpG methylation and gene expression.

Results: Some functional regions of CpG sites, such as CGIs, 5’UTR, TSS1500 and TSS200, are statistically significantly more conserved than other regions. Pathways of core housekeeping functions (cell cycle, DNA replication, transcription, translation) were statistically significantly conserved in all three tissues. We find that conservation in genes correlates with gene expression levels, while conservation of gene promoter methylation does not appear to correlate with expression in any of the three tissues.

Conclusion: Gene-associated regions and some functional regions, such as CGIs, 5’UTR, TSS1500 and TSS200, are more conserved than other regions. CpGs in core housekeeping functions (cell cycle, DNA replication, transcription, translation) are more likely to be conserved in native human tissues. Gene expression is correlated with average CpG conservation, but CpG conservation in promoters is not correlated with gene expression. This suggests that epigenetic conservation can indicate which genes are likely to be more important than others in native human tissues.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Using average pairwise distance in a correlation analysis
Differential methylation analysis of colon tissues
DNA methylation and gene expression profiles in Vidaza treated cultured cancer cells
Identification of DNA methylation markers in diffuse large B-cell lymphoma
Comparative analysis of DNA methylation in mammals
The kinetic study of engineered MBD domain interactions with methylated DNA: insight into binding of methylated DNA by MBD2b
Modeling mutational signatures in cancer
Functional DNA methylation changes in normal and cancer cells
Understanding DNA methylation and nucleosome organization in cancer cells using single molecule sequencing
DNA methylation changes in the development of lung adenocarcinoma
Identification and analysis of shared epigenetic changes in extraembryonic development and tumorigenesis
Identification of novel epigenetic biomarkers and microRNAs for cancer therapeutics
Understanding protein–DNA recognition in the context of DNA methylation
Preprocessing and analysis of DNA methylation microarrays
Identification and characterization of cancer-associated enhancers
DNA methylation groups determined by GATA5 gene methylation level are correlated with tumor subtype, sex, smoking status, and body mass index in esophageal and gastric adenocarcinoma
DNA methylation as a biomarker in human reproductive health and disease
DNA methylation inhibitors and epigenetic regulation of microRNA expression
Finding signals in Infinium DNA methylation data
Prenatal air pollution exposure, newborn DNA methylation, and childhood respiratory health
Asset Metadata
Creator
Xia, Chao (author)
Core Title
An analysis of conservation of methylation
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Biostatistics
Publication Date
05/07/2020
Defense Date
05/06/2020
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
bootstrap,DNA methylation,Manhattan distance,OAI-PMH Harvest,r
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Marjoram, Paul (committee chair), Shibata, Darryl (committee member), Siegmund, Kimberly (committee member)
Creator Email
chaoxia@usc.edu,xcxxshy@aliyun.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-301016
Unique identifier
UC11666117
Identifier
etd-XiaChao-8455.pdf (filename),usctheses-c89-301016 (legacy record id)
Legacy Identifier
etd-XiaChao-8455.pdf
Dmrecord
301016
Document Type
Thesis
Rights
Xia, Chao
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA