Copyright 2024 Jinyao Yuan
Using Average Pairwise Distance in a Correlation Analysis
by
Jinyao Yuan
A Thesis Presented to the
FACULTY OF THE USC KECK MEDICINE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
December 2024
Table of Contents
List of Tables
List of Figures
Abstract
Chapter One: Introduction
Chapter Two: Methods
  Epigenetic Conservation
  Gene Expression Variability
  Statistical Methods
    Precision of Average PWD
    Ranking Average PWD
Chapter Three: Results
Chapter Four: Discussion
References
List of Tables
Table 1: The names and definitions of gene regions
Table 2: The number of genes having paired measures of average PWD and expression deviance for correlation analysis
Table 3: Correlation values for different gene regions in three analysis methods
List of Figures
Figure 1: Scatterplots of the variance of the average PWD relative to the number of CpGs
Figure 2: Scatterplots of gene expression variability versus gene methylation variability
Abstract
Average pairwise distance (average PWD) is a common measure of sequence variation used in
statistical analysis. We use it to measure variation in gene DNA methylation between samples. Our
ultimate goal is to evaluate the association between variation in DNA methylation and variation
in gene expression via a correlation coefficient. A property of average PWD is that, with an
independence assumption, the variance is inversely associated with the number of sites averaged.
In our application, this is the number of CpGs in a gene. This property results in variable precision
of the measures across genes having variable numbers of CpGs, which affects their ranks and our
estimate of correlation with gene expression. To address this issue, we employed a bootstrap
sampling method to estimate the quantile of gene average PWD conditional on the number of
CpGs in the gene. Using our bootstrap estimates of the average PWD quantile, the estimate of the
correlation with gene expression became stronger, demonstrating the efficacy of the bootstrap
method in correcting biases associated with the number of CpGs.
Chapter One: Introduction
In recent work, Street et al. (2023) propose to measure cellular plasticity by the correlation of
epigenetic conservation with gene expression variability. Plasticity is one of two mechanisms of
the epigenome affecting the gene expression of cells, with the second being remodeling.
Remodeling is irreversible and controls the differentiation of cells, such as red blood cells, muscle
cells, and nerve cells deriving from stem cells, while plasticity is reversible and present in stem
cells and cancer cells (Burkhardt et al., 2022; Easwaran et al., 2014; Flavahan et al., 2017; Mathur & Costello, 2021). Plasticity is reflected by a conserved and permissive epigenome
where gene expression is regulated by the microenvironment (Saxena & Shivdasani, 2021; Kim et al., 2014). Investigators seek new ways of exploring the relationship between these two
mechanisms, which is not entirely understood.
Street et al. (2023) propose to visualize plasticity by a scatterplot of variation in gene
expression as a function of variation in DNA methylation. The plasticity of normal colon cells and
colorectal cancer tumor cells is reflected through these plots. They assessed gene expression
variability using single-cell RNA sequencing data from previously published studies (Bian et al.,
2018; Li et al., 2017) and measured DNA methylation using the Illumina MethylationEPIC array.
Epigenetic conservation was determined by comparing DNA methylation between samples from
the same patient. In tumors, they quantified gene-level methylation variability by computing the
average pairwise distance (PWD) in CpG methylation between paired samples obtained from a
single tumor. They propose the correlation between conservation of gene CpG methylation and
variability in gene expression as a means for quantifying plasticity. Specifically, genes that are
epigenetically conserved (low methylation variability) show variation in gene expression while
genes with greater epigenetic variability show decreased variability in expression. Uncertain of
whether it is gene conservation or promoter conservation that is more predictive, they consider
both and compare the correlation coefficients obtained from using the two different metrics.
A concern with this analysis is the precision inherent to quantifying average pairwise distance.
As genes vary in length and number of CpGs, the precision of the average gene conservation
estimates varies; conservation estimates from genes with fewer CpGs have less precision. This
variation in precision affects the ranking of genes based on conservation, and subsequently affects
the correlation estimate between variation in gene expression and variation in gene average PWD.
Moreover, the variation in precision as a function of the number of CpGs will affect our
comparison of average PWD summaries based on different gene regions as the numbers of CpGs
will vary by region. If the correlation using promoter region conservation is weaker than the
correlation using the whole gene, we do not know if it is weaker due to the biology or due to
measurement error from averaging fewer CpGs. Thus, we seek a method that is not affected by the
number of CpGs to rank genes based on average pairwise distance.
Hvitfeldt et al. (2020) developed an approach to account for the variable precision of average
PWD as a function of the number of measurements. They proposed a bootstrap sampling approach
to rank gene average PWD conditional on the number of CpGs to overcome this bias. We applied
their method to our data and estimated stronger correlations than those obtained using the original
measures.
Chapter Two: Methods
Epigenetic Conservation
DNA methylation was measured using the Illumina MethylationEPIC BeadChip Infinium
microarray, which targets over 850,000 CpG sites (Moran et al., 2016; Pidsley et al., 2016). Each
CpG site had paired methylation values, one per sample, each representing the ratio of methylated
fluorescence intensity to total intensity at that CpG site. In our analysis, we use the names of
genes and gene regions from the UCSC database (https://genome.ucsc.edu/), resulting in a total of
26,644 genes. The target CpGs mapping to genes are classified into 7 gene regions: 1stExon,
3'UTR, 5'UTR, Body, ExonBnd, TSS1500, and TSS200. If a target CpG is assigned to multiple
genes or multiple gene regions, we use the first one reported. Table 1 gives the definitions of
some regions, also found at https://knowledge.illumina.com/microarray/general/microarraygeneral-reference_material-list/000001568.
| Gene Region Name | Definition |
| --- | --- |
| TSS200 | 0-200 bases upstream of the transcriptional start site (TSS) |
| TSS1500 | 200-1500 bases upstream of the TSS |
| 5'UTR | Within the 5' untranslated region, between the TSS and the ATG start site |
| Body | Between the ATG and stop codon; irrespective of the presence of introns, exons, TSS, or promoters |
| 3'UTR | Between the stop codon and poly-A signal |

Table 1: The names and definitions of gene regions
In our study, conservation will be summarized across three different gene regions: whole gene,
promoter, and gene body. Whole gene corresponds to CpGs mapping to any of the seven gene
regions; promoter is limited to those in the TSS200, 5'UTR, and 1stExon; and gene body refers to
those in the Body and ExonBnd.
Epigenetic conservation is measured by comparing DNA methylation values between paired
samples from a single tumor, obtained from opposite tumor halves. We define pairwise distance
by the Manhattan distance (Jukes, 2018) at the CpG level, $d_i = |x_{1i} - x_{2i}|$, in which $x_{1i}$ and $x_{2i}$
represent, respectively, the methylation values at the $i^{\text{th}}$ CpG site for the two paired tumor samples.
Since different genes will have different numbers of CpGs, we normalize this difference by taking
the average. To be more specific, we notate this mean Manhattan distance as
$\bar{d}_g = \frac{1}{n_g} \sum_{i=1}^{n_g} d_i$,
where $n_g$ is the number of CpG sites in gene $g$ (Jukes, 2018). We call this the average PWD.
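As a minimal sketch of this definition, the following computes the average PWD for one gene from two paired vectors of methylation values. The function name `average_pwd` and the three-CpG example values are hypothetical and are not taken from the thesis data.

```python
import numpy as np

def average_pwd(x1, x2):
    """Mean Manhattan distance between two paired methylation vectors."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if x1.shape != x2.shape or x1.size == 0:
        raise ValueError("x1 and x2 must be non-empty and the same length")
    # Per-CpG distance |x1_i - x2_i|, then the average over the gene's CpGs.
    return float(np.mean(np.abs(x1 - x2)))

# Hypothetical methylation values for one gene with three CpGs,
# measured on two opposite sides of the same tumor.
side_a = [0.10, 0.80, 0.55]
side_b = [0.15, 0.70, 0.55]
gene_pwd = average_pwd(side_a, side_b)  # (0.05 + 0.10 + 0.00) / 3
print(round(gene_pwd, 4))
```

Because the result is an average, a gene with a single CpG and a gene with hundreds of CpGs are placed on the same scale, although not with the same precision.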
Gene Expression Variability
We used deviance, computed from single-cell RNA sequence data, to measure gene expression
variability (Townes et al., 2019). We obtained the measure from Street et al. (2023), who computed
it using publicly available data on colorectal cancer (GSE81861 (Li et al., 2017) and GSE97693
(Bian et al., 2018)) (for details see Street et al. 2023). To control outliers while still retaining values
of zero, they applied the transformation:
$\text{transformed deviance} = \ln(\text{deviance} + 1)$
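A minimal sketch of this transformation, assuming it is the natural log of the deviance plus one (consistent with the ln-transformed deviance used in the correlation analysis). The deviance values below are hypothetical.

```python
import numpy as np

# Hypothetical expression deviance values, including a zero.
deviance = np.array([0.0, 1.5, 20.0, 400.0])

# ln(deviance + 1): compresses large values while mapping zero to zero.
transformed = np.log1p(deviance)
print(transformed)
```

Adding one before taking the log is what allows genes with zero deviance to stay in the analysis, since the log of zero is undefined.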
Statistical Methods
Precision of Average PWD
Statistical theory dictates that the variance of a mean is inversely related to the sample size for
independent and identically distributed observations. For average PWD, assuming the $d_i$ are
independent and identically distributed, we write this as:
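$$\mathrm{Var}(\bar{d}_g) = \mathrm{Var}\left(\frac{1}{n_g}\sum_{i=1}^{n_g} d_i\right) = \frac{1}{n_g^2}\,\mathrm{Var}\left(\sum_{i=1}^{n_g} d_i\right) = \frac{1}{n_g^2}\left(n_g\,\mathrm{Var}(d_i)\right) = \frac{1}{n_g}\,\mathrm{Var}(d_i)$$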
Therefore, the variance of $\bar{d}_g$ is largest for small $n_g$. In particular, when $n_g = 1$, the
distribution of $\bar{d}_g$ was right-skewed. We implement two approaches to mitigate the effect of
sample size on the analysis using average PWD. In the first, we filter the gene regions to include only
those with at least four CpGs. This reduces the influence of regions
with the fewest CpGs and the most variability, at the cost of eliminating a small subset of genes on the
array (~3.7%). The resulting correlation estimate is still subject to bias, but the bias due to
variation in CpG number is reduced compared to the analysis without filtering. In the second,
we implement a bootstrapping approach proposed by Hvitfeldt et al. (2020).
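The inverse relation between the variance of an average and the number of CpGs averaged can be checked with a small simulation. This is a sketch under stated assumptions: per-CpG distances are drawn independently from a right-skewed Beta(1, 9) distribution, which is a hypothetical stand-in and not a model fitted to the thesis data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 20000  # simulated "genes" per CpG count

# Hypothetical per-CpG distances: i.i.d. draws from a right-skewed Beta(1, 9).
var_of_mean = {}
for n_cpg in (1, 4, 16, 64):
    d = rng.beta(1.0, 9.0, size=(n_sim, n_cpg))
    var_of_mean[n_cpg] = d.mean(axis=1).var(ddof=1)

# Under independence, Var(mean of n) = Var(d_i) / n, so this ratio should be near n.
ratio_to_single = {n: var_of_mean[1] / v for n, v in var_of_mean.items()}
for n, r in ratio_to_single.items():
    print(f"n_cpg={n:3d}  Var(d_i)/Var(mean)={r:6.2f}")
```

With correlated neighboring CpGs the variance would fall more slowly than 1/n, which is one reason the ranking method described next samples adjacent sites rather than independent ones.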
Ranking Average PWD
Hvitfeldt et al. (2020) proposed two bootstrap methods to rank average PWD after correcting
for the variable precision of average PWD estimates. In the first, they assume independence of
CpG methylation in the genome, and compute bootstrap estimates of $\bar{d}_g$ from a sample of CpG
sites drawn at random from the genome. However, as there is known correlation in CpG
methylation at nearby CpG sites, depending on their location in the genome (e.g., in CpG islands
(Bell et al., 2011)), they propose a second method to sample adjacent CpG sites. This second
method is different from the simple bootstrap procedure in that it considers the correlation of
methylation status among adjacent CpGs. It is this latter approach that we implement.
The bootstrap procedure under correlation is divided into the following four steps:
1. For each $n_g$, select from each gene all sets of $n_g$ neighboring CpGs to form a new set,
$S_{n_g}$. For example, if there were 3 genes, A, B, C with CpGs $a_1, a_2, a_3, b_1, b_2, c_1$ (3 CpGs
in gene A, 2 in gene B and 1 in gene C), the new sets would be $S_1 = \{a_1, a_2, a_3, b_1, b_2, c_1\}$,
$S_2 = \{a_1 a_2, a_2 a_3, b_1 b_2\}$, $S_3 = \{a_1 a_2 a_3\}$.
2. Set the number of iterations as $N_b = 1000$, and in each iteration, sample an element from
each $S_{n_g}$ to serve as a "null gene" and notate this gene as $g^{*}$.
3. For each $g^{*}$, compute its mean Manhattan distance (aka PWD) and notate it as $\bar{d}_{g^{*}}$.
4. For each level of $n_g$, the 1000 $\bar{d}_{g^{*}}$ values form a null distribution. From this distribution, we
can determine the quantile of each gene’s observed mean PWD. For example, the gene
"A1BG" has 6 CpGs, so the quantile of its observed mean PWD should be obtained from
the null distribution corresponding to $n_g = 6$.
This method is implemented in the Methcon5 R package on GitHub
(https://github.com/USCbiostats/MethCon5).
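The four steps above can be sketched in Python as follows. This is an illustration only and is not the Methcon5 implementation: the function names `window_means` and `bootstrap_quantiles`, the toy distances, and the convention of defining the quantile as the fraction of null values at or below the observed value are all assumptions made here. Every gene is assumed to have at least one CpG.

```python
import numpy as np

def window_means(d, n):
    """Mean distance for every run of n adjacent CpGs within one gene."""
    d = np.asarray(d, dtype=float)
    return np.array([d[i:i + n].mean() for i in range(d.size - n + 1)])

def bootstrap_quantiles(gene_d, n_iter=1000, seed=0):
    """Quantile of each gene's average PWD within a null built for its CpG count.

    gene_d maps a gene name to its per-CpG distances in genomic order.
    """
    rng = np.random.default_rng(seed)
    null_by_n = {}
    quantiles = {}
    for gene, d in gene_d.items():
        d = np.asarray(d, dtype=float)
        n = d.size
        if n not in null_by_n:
            # Step 1: S_n pools all windows of n neighboring CpGs across genes.
            s_n = np.concatenate([window_means(v, n) for v in gene_d.values()])
            # Steps 2-3: draw n_iter null genes and keep their mean distances.
            null_by_n[n] = rng.choice(s_n, size=n_iter, replace=True)
        # Step 4: fraction of null means at or below the observed mean
        # (an assumed tie-handling convention).
        quantiles[gene] = float(np.mean(null_by_n[n] <= d.mean()))
    return quantiles

# Toy version of the three-gene example: A has 3 CpGs, B has 2, C has 1.
toy = {"A": [0.10, 0.20, 0.30], "B": [0.05, 0.15], "C": [0.40]}
q = bootstrap_quantiles(toy)
print(q)
```

In this toy input, gene A is the only gene with three CpGs, so its null distribution contains only its own window and its quantile is 1. Gene B is compared against windows of two adjacent CpGs drawn from both A and B.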
We use the Methcon5 R package to estimate quantiles of average PWD to use in our plots and
correlation analysis. For visualization, we created a scatterplot of gene expression variability
(measured by jittered deviance) by the bootstrap estimates of quantiles of average PWD. The
correlation between the quantiles (epigenetic variability) and the natural log values of jittered
deviance (gene expression variability) was compared to those obtained from the x-y scatterplots
created using the raw average PWD (un-bootstrapped method).
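The correlation step can be sketched as below. The data are simulated stand-ins and not thesis results, and the Pearson coefficient is an assumption, since the text does not name the coefficient used.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes = 500

# Simulated stand-ins, not thesis data: quantiles in [0, 1] and a ln-deviance
# that tends to fall as the quantile rises.
pwd_quantile = rng.uniform(0.0, 1.0, n_genes)
noise = rng.normal(0.0, 0.5, n_genes)
log_deviance = np.clip(3.0 - 2.0 * pwd_quantile + noise, 0.0, None)

# Pearson correlation (assumed; the coefficient is not specified in the text).
r = float(np.corrcoef(pwd_quantile, log_deviance)[0, 1])
print(f"correlation = {r:.2f}")
```

A negative value here corresponds to the pattern described in the Introduction, where epigenetically conserved genes show greater variation in expression.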
Chapter Three: Results
We analyze data for a single tumor, measuring DNA methylation from opposite tumor sides.
This tumor yielded data on 540,901 gene-linked CpG sites. We summarized this measurement at
the gene level by computing the average pairwise distance, resulting in 25,475 genes, of which
19,236 (75%) had deviance values for computing pairwise correlation. Table 2 reports the number
of genes having paired measures of average PWD and deviance for each analysis.
| Method \ Gene region | Whole gene | Gene body | Promoter |
| --- | --- | --- | --- |
| Average PWD (1 or more CpGs) | 19236 | 17355 | 17124 |
| Filter on >=4 CpGs | 18519 (96.3%) | 13152 (75.8%) | 12497 (73.0%) |
| Adjusted Bootstrap | 19236 | 17355 | 17124 |

Table 2: The number of genes having paired measures of average PWD and expression deviance
for correlation analysis
In Table 2, the number of genes in the whole-gene analysis decreased slightly (by 3.7%) after
excluding genes with fewer than four CpGs. However, for the gene body and promoter regions, the
reduction in sample size was considerably larger, about a quarter or more (24.2% and 27.0%, respectively).
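The retained and omitted percentages follow directly from the counts in Table 2, as this short check shows. The variable names are chosen here for illustration.

```python
# Gene counts from Table 2, before and after filtering on >= 4 CpGs.
counts_all = {"whole gene": 19236, "gene body": 17355, "promoter": 17124}
counts_filtered = {"whole gene": 18519, "gene body": 13152, "promoter": 12497}

retained_pct = {
    region: round(100 * counts_filtered[region] / counts_all[region], 1)
    for region in counts_all
}
omitted_pct = {region: round(100 - pct, 1) for region, pct in retained_pct.items()}

for region in counts_all:
    print(f"{region}: retained {retained_pct[region]}%, omitted {omitted_pct[region]}%")
```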
Figure 1 shows the variance of the average PWD as a function of the number of CpGs,
dropping missing variances when only one gene was present for a certain number of CpGs. For
example, only the gene ABR has 239 CpGs in the dataset. Genes like this were not included in the
scatterplots below (n=62).
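The grouping behind Figure 1 can be sketched as follows: genes are grouped by their number of CpGs, the variance of average PWD is computed within each group, and groups containing a single gene are dropped because a sample variance is undefined for one observation. The function name and the toy values are hypothetical.

```python
from collections import defaultdict
import statistics

def variance_by_cpg_count(genes):
    """genes: iterable of (n_cpg, average_pwd) pairs, one pair per gene."""
    groups = defaultdict(list)
    for n_cpg, pwd in genes:
        groups[n_cpg].append(pwd)
    # Sample variance needs at least two genes; singleton groups are dropped.
    return {
        n: statistics.variance(values)
        for n, values in sorted(groups.items())
        if len(values) >= 2
    }

# Toy input: three genes with 1 CpG, two with 4 CpGs, one with 239 CpGs.
toy_genes = [(1, 0.02), (1, 0.30), (1, 0.10), (4, 0.08), (4, 0.12), (239, 0.05)]
var_by_n = variance_by_cpg_count(toy_genes)
print(var_by_n)
```

In the toy input, the gene with 239 CpGs is the only one of its size and therefore contributes no point, mirroring how genes such as ABR are excluded from the scatterplots.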
Figure 1: Scatterplots of the variance of the average PWD relative to the number of CpGs
In the left subplot of Figure 1, regardless of the gene region, the variance of the average PWD
decreases as the number of CpGs increases. Three loess curves were added to summarize this trend
(for whole gene, gene body, and promoter), as shown in the right subplot of Figure 1. The change
in variance is steepest when there are fewer than 30 CpGs. This represents 73.8% of the genes on
the array.
Figure 2 shows scatterplots to visualize the relationship between gene epigenetic conservation
and gene expression variability for three different analyses. In the first two scatterplots, epigenetic
conservation was measured by the average PWD. For the third scatterplot, epigenetic conservation
was measured using the bootstrap estimates of quantiles of average PWD. For all these scatterplots,
the jittered deviance was selected as the measurement of variability to prevent an overlapping of
points with deviance equal to 0 on the horizontal line. We used (ln-transformed) expression
deviance when computing correlation for all plots.
Figure 2: Scatterplots of gene expression variability versus gene methylation variability
The points in the three scatterplots represent all genes (left), the genes with 4 or more CpGs
(middle), and all genes but with quantiles of average PWD (right), respectively. As shown in these
plots, the absolute values of correlation were 0.53, 0.56, and 0.61. Comparisons between the first
and second subplots illustrate the effect of dropping the genes with a small number of CpGs (<4).
After removing these genes, we observed a slight increase in the correlation value in the second
scatterplot. By contrast, the correlation estimated using results from the bootstrap procedure
(right) was markedly stronger (0.61 versus 0.53 in absolute value), a larger gain than that
seen in the second scatterplot. This might be
explained by the fact that while removing the genes with fewer CpGs did eliminate some bias, it
did not eliminate all of it. The bootstrap method, however, ranks each gene against a null distribution
for the same number of CpGs and so is not subject to this bias, allowing us to
effectively utilize the information from genes with a small number of CpGs.
In normal colon, average PWD varies by distance to the transcription start site (Hvitfeldt et
al., 2020). Consequently, we wondered whether the association between gene expression deviance
and average PWD varied by gene region. We considered two: gene body and promoter. We
examined three different cases: 1. average PWD with no filtering (1 or more CpGs), 2. retaining
genes having 4 or more CpGs, and 3. using the adjusted bootstrap procedure. We summarize our
correlation estimates in Table 3.
| Method \ Gene region | Whole gene | Gene body | Promoter |
| --- | --- | --- | --- |
| Average PWD (1 or more CpGs) | -0.53 | -0.37 | -0.47 |
| Restrict to nCpG>=4 | -0.56 | -0.49 | -0.52 |
| Adjusted Bootstrap | -0.61 | -0.45 | -0.58 |

Table 3: Correlation values for different gene regions in three analysis methods
Table 3 shows the strongest (negative) correlation is obtained when averaging PWD for CpGs
in the whole gene, instead of averaging those just in the promoter or just the gene body. At the
same time, the three values for the "Promoter" followed the same pattern across the different
methods as those for whole gene. This indicates that the adverse effect on the correlation of average
PWD from genes with a small number of CpGs can be mitigated by filtering genes with fewer
CpGs (<4) or using the adjusted bootstrap method. Similar to the result when averaging CpGs
from the whole gene, the adjusted bootstrap method showed the largest change in correlation estimate
from the naive analysis for the "Promoter". Interestingly, we note that the values for the "Gene
Body" region did not follow this pattern. The correlation estimate for the "Gene Body" was
stronger after applying the two correction methods, but the adjusted bootstrap method was less
effective than simply discarding the genes with 3 or fewer CpGs. Although a stronger correlation
is estimated using either correction, all correlation estimates are weaker than when using the whole
gene. Overall, these results support using the bootstrap sampling measure for the whole gene for
future studies.
Chapter Four: Discussion
In studies of human tissue, pairwise distance (PWD) in CpG methylation measures
methylation variability, and average PWD, averaging PWD across CpGs in a gene, measures gene
methylation variability. As genes vary in the number of CpG sites, the precision of average PWD
estimates will vary. The more CpGs averaged, the higher the precision of the average PWD estimate.
Naively using average PWD in a correlation analysis of gene expression variability and gene
methylation variability results in biased correlation estimates. We use a bootstrap sampling
approach to compute quantiles of average PWD conditional on the number of CpGs. This gives an
approach to compare methylation variability for different gene regions, looking either at the whole
gene, or focusing on the promoter or gene body. Like Street et al. (2023), we propose correlation
between gene expression variability and gene methylation variability as a measure of plasticity.
We estimate the greatest correlation when methylation variability is measured using all CpGs in
the gene, instead of focusing on the promoter or gene body only.
Another approach to address the poor precision of average PWD for genes with small numbers
of CpGs is to omit them from the analysis; however, this might introduce other unknown biases of
gene representation. In our analysis, we omit genes with three or fewer CpGs. The fraction of
omitted genes varied from 3.7% for the analysis of whole genes to 24.2% and 27.0% for the analysis
of gene body or promoter, respectively. Omitting these genes had a small effect on
the correlation estimates using the whole gene or promoter, but a considerably larger effect when
measuring methylation conservation in the gene body. This might be explained, in part, by the
biases of increased methylation conservation at CpG islands which occur more often in gene
promoters and the bias of CpGs targeted on the EPIC array. Further work is needed to investigate
this.
A challenge in comparing average PWD estimates from different gene regions ("Gene Body"
or "Promoter") is that the number of CpGs will change. For example, the gene "FAM41C" had two
CpGs: one located in the "Gene Body" and the other in the "TSS1500", which belongs to neither
the "Gene Body" nor the "Promoter" region. Thus, if we only consider the "Gene Body", the
number of CpGs for this gene is reduced to one. Our bootstrap sampling method provides one
approach to address this issue.
One limitation of the adjusted bootstrap method is that the quantile estimate for the gene with
the largest number of CpGs is, by definition, one. For that gene, the sampling set consists entirely
of this gene’s CpGs, resulting in the same sample outcome in each iteration. Consequently, the
average PWD quantile for this gene is 1. However, given the large number of genes, the effect of
a single gene on the overall correlation is likely negligible, making this limitation unimportant.
Lastly, a limitation of this study is the generalizability of our results. Our data were derived
solely from a single tumor, "Tumor M". These results require replication using more tumors as
well as samples of normal tissue.
In summary, this study applied a bootstrap sampling approach to correct the ranking of average
PWD estimates of methylation variability as a function of number of CpGs. Estimates of
correlation between gene expression variability and DNA methylation variability were stronger
after applying the bootstrap sample quantile estimates.
References
Burkhardt, D., Perez, B., Lock, J. G., Krishnaswamy, S., & Chaffer, C. L. (2022). Mapping
Phenotypic Plasticity upon the Cancer Cell State Landscape Using Manifold Learning.
Cancer Discovery, 12(8), 1847–1859. https://doi.org/10.1158/2159-8290.cd-21-0282
Easwaran, H., Tsai, H.-C., & Baylin, S. B. (2014). Cancer Epigenetics: Tumor
Heterogeneity, Plasticity of Stem-like States, and Drug Resistance. Molecular Cell,
54(5), 716–727. https://doi.org/10.1016/j.molcel.2014.05.015
Flavahan, W. A., Gaskell, E., & Bernstein, B. E. (2017). Epigenetic plasticity and the hallmarks
of cancer. Science, 357(6348), eaal2380. https://doi.org/10.1126/science.aal2380
Hvitfeldt, E., Xia, C., Siegmund, K. D., Shibata, D., & Marjoram, P. (2020). Epigenetic
Conservation Is a Beacon of Function: An Analysis Using Methcon5 Software for
Studying Gene Methylation. JCO Clinical Cancer Informatics, 4(4), 100–107.
https://doi.org/10.1200/cci.19.00109
Jukes, E. (2018). Encyclopedia of Machine Learning and Data Mining (2nd edition). Reference
Reviews, 32(7/8), 3–4. https://doi.org/10.1108/rr-05-2018-0084
Kaaij, L. T., van de Wetering, M., Fang, F., Decato, B., Molaro, A., van de Werken, H. J., van
Es, J. H., Schuijers, J., de Wit, E., de Laat, W., Hannon, G. J., Clevers, H. C., Smith, A.
D., & Ketting, R. F. (2013). DNA methylation dynamics during intestinal stem cell
differentiation reveals enhancers driving gene expression in the villus. Genome Biology,
14(5), R50. https://doi.org/10.1186/gb-2013-14-5-r50
Mathur, R., & Costello, J. F. (2021). Epigenomic contributions to tumor cell heterogeneity and
plasticity. Nature Genetics, 53(10), 1403–1404. https://doi.org/10.1038/s41588-021-00932-w
Moran, S., Arribas, C., & Esteller, M. (2016). Validation of a DNA methylation microarray for
850,000 CpG sites of the human genome enriched in enhancer sequences. Epigenomics,
8(3), 389–399. https://doi.org/10.2217/epi.15.114
Pidsley, R., Zotenko, E., Peters, T. J., Lawrence, M. G., Risbridger, G. P., Molloy, P., Van Djik,
S., Muhlhausler, B., Stirzaker, C., & Clark, S. J. (2016). Critical evaluation of the
Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation
profiling. Genome Biology, 17(1). https://doi.org/10.1186/s13059-016-1066-1
Saxena, M., & Shivdasani, R. A. (2021). Epigenetic Signatures and Plasticity of Intestinal and
Other Stem Cells. Annual Review of Physiology, 83(1), 405–427.
https://doi.org/10.1146/annurev-physiol-021119-034520
Street, K., Siegmund, K., & Shibata, D. (2023). Epigenetic Conservation Infers That Colorectal
Cancer Progenitors Retain The Phenotypic Plasticity Of Normal Colon.
Kim, T. Y., Li, F., Ferreiro-Neira, I., Ho, L.-L., Luyten, A., Nalapareddy, K.,
Long, H. W., Verzi, M. P., & Shivdasani, R. A. (2014). Broadly permissive
intestinal chromatin underlies lateral inhibition and cell plasticity. Nature, 506(7489),
511–515. https://doi.org/10.1038/nature12903
Asset Metadata
Creator
Yuan, Jinyao
(author)
Core Title
Using average pairwise distance in a correlation analysis
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Biostatistics
Degree Conferral Date
2024-12
Publication Date
10/02/2024
Defense Date
09/13/2024
Publisher
Los Angeles, California
(original),
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
bootstrap,correlation,gene expression,gene methylation,OAI-PMH Harvest,pairwise distance,plasticity
Format
theses
(aat)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Siegmund, Kimberly (
committee chair
), Marjoram, Paul (
committee member
), Street, Kelly (
committee member
)
Creator Email
jinyaoyu@usc.edu,laoguai@i.smu.edu.cn
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11399BEE7
Unique identifier
UC11399BEE7
Identifier
etd-YuanJinyao-13568.pdf (filename)
Legacy Identifier
etd-YuanJinyao-13568
Document Type
Thesis
Rights
Yuan, Jinyao
Internet Media Type
application/pdf
Type
texts
Source
20241002-usctheses-batch-1216
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu