Determining the epigenetic contribution of basal cell identity in cystic fibrosis
by
Aakriti Singh
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfilment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOCHEMISTRY AND MOLECULAR MEDICINE)
December 2022
Acknowledgements
First, I would like to thank the entire Marconett lab for all their support and contribution to
my academic and research pursuits. I want to thank my principal investigator, Crystal Marconett,
for all her help as a research mentor, and for being so understanding, patient, and dedicated
throughout my time in research. I’d like to thank Michele Ramos Correa for helping with
bioinformatics projects, Jonathan Castillo for being a great mentor, and Minxiao Yang for help
with wet lab experiments.
Next, I’d like to thank my thesis committee, Dr. Ite Offringa and Dr. Suhn Rhie, for all their
time, guidance, and support throughout my research. Additionally, I’d like to thank Dr. Amy Ryan
for her help and expertise on my research, for obtaining financial support for the project, and
for coordinating and processing sample collection.
Finally, I’d like to thank my friends and family for supporting me financially and emotionally
throughout my research.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter I: Introduction
    Cystic Fibrosis
    Epigenetics, CF etiology, and differentiation
    Single cell sequencing
Chapter II: Materials and Methods
Chapter III: Results
    Aim 1: Identifying cell-specific expression patterns in cystic fibrosis
    Aim 2: Identifying epigenetic changes between cystic fibrosis and non-cystic fibrosis samples
Chapter IV: Discussion
Supplementary Figures
References
List of Tables
Table 1. Patient data composition
Table 2. Cell count pre and post filtering per sample
Table 3. Top differentially expressed pathways comparing P1 to P5
Table 4. Top differentially expressed pathways comparing CF to non-CF
List of Figures
Figure 1. Improper movement of chloride ions and effect on CFTR channel
Figure 2. Cells of airway epithelium, in both healthy and diseased tissue
Figure 3. Differentiation pathway of basal cells in proximal airway epithelium
Figure 4. Preliminary quality control of first five samples (NBG18, NBG19, NBG20, NBG22, NBG26)
Figure 5. Dimensional reduction analysis for merged dataset of first five samples
Figure 6. Various iterations of clustering of merged dataset of first five samples
Figure 7. Known lung markers and their expression levels depicted via violin plot of the merged dataset of first five samples
Figure 8. Cluster identification of supervised, six clusters
Figure 9. Distinction of CF and non-CF populations depicted on UMAP
Figure 10. Quality control of merged twenty-four samples
Figure 11. UMAP depiction of merged twenty-four samples, examining unsupervised and supervised as well as CF identity
Figure 12. Fluorescence activated cell sorting of first five samples
Figure 13. Degree of overlap from bulk ATAC sequencing of first five samples
Figure 14. Composition of sample data information
Figure 15. Quality control metrics of thirteen new samples
Figure 16. Filtered thirteen new samples with dimensional reduction analysis
Figure 17. Labeled UMAP of clustering by cell population
Figure 18. Labeled UMAP depiction by CF/non-CF identity and timepoint identity
Figure 19. Healthy18 and CF16 cell lines examined through timepoints
Figure 20. IGV exploration of samples examining GAPDH and KRT5 genes
Figure 21. ATAC and WNN analysis for unfiltered data
Figure 22. ATAC and WNN analysis for filtered data
Supplementary Figure 1. Top differentially expressed markers for six supervised clusters
Supplementary Figure 2. Individual cluster analysis of batch 1 and batch 2 samples
Abstract
Cystic fibrosis is a progressive, genetic disease that severely affects the lungs and other
organs of the body. In the lungs, the major progenitor cells are basal cells, which give rise to other
differentiated functional epithelial cells in the airways. Epigenetic changes can affect how basal
cells proliferate and differentiate in the CF lung. To study cell-to-cell variation between CF and
non-CF samples, single cell sequencing was applied. From the single cell sequencing data, principal
component analysis and clustering identified cell clusters that differed in their CF and non-CF
composition. Through these clusters, it was shown that CF samples correlate with a more
differentiated phenotype (ciliated FOXJ1 and club SCGB1A1 markers), while non-CF samples had
more basal stem cell phenotypes (KRT5 and KRT15). Additionally, fluorescence activated cell
sorting data showed a higher percentage of basal cells in non-CF samples compared to CF samples.
Furthermore, QIAGEN Ingenuity Pathway Analysis showed that P5 cells had differentially
expressed transcription markers such as FAK and CREB, and CF cells showed an upregulation in
pathways relating to inflammation. Overall, single cell sequencing studies of CF and non-CF
patient samples showed a difference in cell populations between the two groups, with CF samples
correlating with differentiating cells. Single cell ATAC sequencing increased our ability to
discriminate between basal cell subtypes in culture, underscoring the need to understand the
epigenomic context of basal cell differentiation.
Chapter I
Introduction
Cystic Fibrosis
Cystic fibrosis (CF) is a progressive, inherited genetic disease that causes severe damage
to the lungs and other organs of the body (Elborn, 2016). It is caused by a mutation in the cystic
fibrosis transmembrane conductance regulator (CFTR) gene, which encodes the CFTR protein.
There are over 1800 CFTR gene mutations identified, with the most common cystic fibrosis
mutation being F508del, accounting for two-thirds of all CFTR mutations (Mathew et al., 2021).
CFTR is a transmembrane protein that is cyclic AMP-activated and regulates outwardly rectifying
chloride channels in airway epithelia. CFTR is activated by protein kinase A through the cyclic
AMP pathway (Robert et al., 2007).
The CFTR channel is responsible for the transport and movement of anions from the inside
to the outside of the cell (Moore et al., 2008). A normal and active CFTR channel allows for water
movement alongside proper sodium movement, hydrating the airway. With a defective CFTR
gene, the channel is unable to properly transport ions (Moore et al., 2018). The limited ion
movement causes a subsequent decrease in water movement, leading to dehydration. In response
to the dehydration, airway cells secrete mucus, causing a buildup of thick, viscous mucus on the
outside of the cell. The mucus harbors germs, causing frequent airway and sinus infections in
patients with cystic fibrosis, which can make breathing difficult (Orenstein et al., 2004). Figure 1
illustrates the increased mucus buildup caused by improper movement of chloride ions in a cystic
fibrosis airway with a mutated CFTR gene.
Figure 1. A mutation in the CFTR gene causes cystic fibrosis through improper movement of
chloride ions (Moore et al., 2018).
Additionally, cystic fibrosis can cause problems with the digestive system, sometimes
requiring medication, as inflammation and scarring of the pancreas can result in cystic
fibrosis-related diabetes (Doan et al., 2022). Cystic fibrosis can also negatively impact the liver and cause
bones to become thinner and weaker. In men, cystic fibrosis can also affect fertility (Ahmad et al.,
2013). Eventually, patients with cystic fibrosis will need to undergo a lung transplant as the disease
is incurable. The life expectancy for an individual with cystic fibrosis is about forty years
(McBennett et al., 2022).
Epigenetics, CF etiology and differentiation
Cystic fibrosis etiology is hypothesized to be affected by epigenetic dysregulation.
Epigenetic alterations can affect cell to cell populations, causing differences amongst CF and
non-CF patients, contributing to the disease pathology. Epigenetic modifications can affect gene
expression patterns without changing the DNA sequence by altering chromatin states (Allis et al.,
2016). Chromatin has two states: euchromatin, which is an active state, and heterochromatin,
which is an inactive state (Allis et al., 2016). Epigenetic modifications can regulate whether genes
are turned on or off, and consist of DNA methylation, histone modifications, and non-coding
RNAs. In DNA methylation, a methyl group is transferred onto the C5 position of a cytosine to
form 5-methylcytosine. This can affect gene expression through the recruiting and inhibition of
binding proteins and transcription factors to DNA (Moore et al., 2013). Histones can be affected
through posttranslational histone modifications such as phosphorylation, methylation, acetylation,
and ubiquitination which, depending on the location, can have an activating or inhibiting effect
(Hii et al., 2018). Additionally, non-coding RNAs also affect the regulation of gene expression.
They are a subset of RNAs that do not encode for functional proteins, and include small interfering
RNA (siRNA), microRNA (miRNA), and long non-coding RNA (lncRNA) (Wei et al., 2016).
While the mutation in the CFTR gene is known to cause cystic fibrosis, not much is known
about how epigenetics contributes to CFTR expression or the expression of cell populations in the
proximal airway epithelium. Induced pluripotent stem cells (iPSCs) can allow for the
reprogramming of dysfunctional progenitor cells. To do this, first CF cell populations must be
identified. Previous studies have characterized at least five different types of epithelial cells located
in the epithelium of the proximal airway: basal cells, club cells, ciliated cells, goblet cells, and
neuroendocrine cells (Pollard et al., 2016). These cells of the airway epithelium each play an
important role in disease and health. Figure 2 depicts the different roles (with blue arrows; healthy)
of the various cell populations present in the airway epithelium: basal cells act as the stem cells of
the airway, club cells secrete anti-inflammatory proteins, ciliated cells aid in clearance of mucus
with their cilia that propel particles out of the airway, goblet cells produce and secrete mucus, and
neuroendocrine cells secrete neuropeptides (Pollard et al., 2016). Improper function of these cell
populations results in a diseased state.
Figure 2. The cells of the airway epithelium. Blue arrows depict roles in healthy tissue, red
arrows depict roles in diseased tissue (Davis et al., 2021).
Core cell markers for these known cell populations are as follows: basal (KRT5, KRT14,
NGFR, ITGA6, TP63), ciliated (FOXJ1, DNAH5), club (SCGB1A1), goblet (SPDEF, MUC5AC),
and neuroendocrine (CD56) (Davis et al., 2021). Basal cells are well established as the major
progenitor cells of the proximal lung, differentiating into the previously mentioned cell populations
(Pollard et al., 2016). There are two known basal stem cell types, characterized by the presence or
absence of the KRT14 cell marker. Basal stem cells first differentiate into suprabasal cells, and
then are able to differentiate into ciliated cells, club cells, and goblet cells (Bukowy-Bierytto,
2021). First, basal cells divide to either become club cells or form further basal cells. The signal
for basal cells to differentiate into club cells is high expression of
cytokines and chemokines (Pollard et al., 2016). Club cells can additionally act as stem cells
and are able to differentiate further into ciliated cells or goblet cells. This is done through NOTCH
signaling: an increase of NOTCH signaling causes club cell differentiation into goblet cells, while
a decrease of NOTCH signaling causes club cell differentiation into ciliated cells (Pollard et al.,
2016). Club cells can also dedifferentiate back to basal cells, returning to a less
differentiated stage in their lineage. NOTCH signaling is also responsible for driving basal stem
cells into intermediate basal precursor cells, marked by increased expression of KRT8 (Pollard et
al., 2016). Additionally, a previous transcriptional analysis of cystic
fibrosis characterized three major types of cells, divided into three ciliated
subsets, five secretory subsets, and five basal cell subsets (Carraro et al., 2021). Figure 3 depicts
the differentiation pathway of basal cells in the proximal airway epithelium.
Figure 3. Differentiation of cell subsets in the proximal airway epithelium. Basal cells act as
progenitor stem cells, giving rise to club, goblet, ciliated, and neuroendocrine cell populations.
(Pollard et al., 2016).
Single cell sequencing
Single cell sequencing is a tool that can be applied for analysis of various disease systems,
such as in CF. The state of each individual cell can be analyzed to understand the
complex microenvironment and heterogeneous states that provide tissue and organ function
(Eberwine et al., 2014). Single cell sequencing can also now be used for epigenetic profiling of
DNA sequences, providing a form of comparison between abnormal and normal cells (Allis et al.,
2016). Data for single cell analysis is generally generated through two processes: cell isolation
and amplification. From here, preliminary quality control metrics, such as the total number of
cells and the number of genes and molecules detected in those cells, can be analyzed. Single cell
analysis can also be applied to trajectory analysis, in which a lineage tree can be formed
(Shapiro et al., 2013). Overall, direct
single cell sequencing can provide unbiased information in the form of single cell expression
profiles that allow for clustering and identification of subpopulations that can be compared
among various disease states.
In my thesis, I address two aims regarding the epigenetic contribution of basal cells in the
disease cystic fibrosis. The first aim is to identify expression patterns within cell types between
cystic fibrosis and non-cystic fibrosis samples, allowing an understanding of differences in
differentiation between the two subsets of samples. To answer this aim, I analyze single cell RNA
sequencing samples to study cell-to-cell population differences. The second aim is to identify the
root cause of epigenetic changes between cystic fibrosis and non-cystic fibrosis samples. For this
aim, I study multimodal single cell RNA sequencing and single cell ATAC sequencing samples
and perform an Ingenuity Pathway Analysis. Collectively, my results have identified key basal
differentiation differences between cystic fibrosis and non-cystic fibrosis samples, showing that
cystic fibrosis samples are more differentiated than non-cystic fibrosis samples.
Chapter II
Materials and Methods
Single cell sequencing data in ‘FASTQ’ format was downloaded using Globus Connect
Personal (version 3.1.6). ‘FASTQ’ data was processed through 10X Genomics Cell Ranger Cloud
(version 7.0.0) via the Single Cell Multiome ATAC and 3’ Gene Expression library types. Of the
outputs, the ‘filtered_feature_bc_matrix.h5’ files were read using the “hdf5r” package (version
1.3.5) for single cell RNA sequencing analysis with the “Seurat” package (version 4.1.1) in R
(version 4.2.1). Samples went through normalization, scaling, and principal component analysis
using “Seurat”. Clusters were identified using top differentially expressed markers via p-value
analysis. The Human Protein Atlas (proteinatlas.org, version 21.1) was used for manual cluster
identification. ScType (Aleksandr, 2022) and the package “Azimuth” (version 0.4.6), using the
packages “SeuratData” (version 0.0.2) and “lungref.SeuratData” (version 2.0.0), were
additionally used for algorithmic cluster identification. Packages “ggplot2” (version 3.3.6) and
“patchwork” (version 1.1.2) were used for plot generation. For single cell ATAC and multimodal
data analysis, the ‘atac_fragments.tsv.gz’ and ‘atac_fragments.tsv.gz.tbi’ file outputs from Cell
Ranger were analyzed using the “Signac” package (version 1.7.0). Annotations were made using
the “EnsDb.Hsapiens.v86” package (version 2.99.0). Pathway analysis was performed with
QIAGEN Ingenuity Pathway Analysis, and transcription factors were selected through p-value
and z-score analysis. Full code for all analyses can be found at:
https://github.com/singhaakriti/MS/blob/main/README.md.
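As an illustration of the normalization step, the short Python sketch below log-normalizes the raw counts of a single cell. It assumes the default “LogNormalize” behavior of “Seurat” (counts divided by the cell total, multiplied by a scale factor of 10,000, then natural-log transformed with a pseudocount of one). The normalization settings actually used are not stated above, and the function name and toy counts are hypothetical. The analyses themselves were run in R, so this is a sketch of the arithmetic only, not the code used in this thesis.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw counts of ONE cell.

    counts: dict mapping gene name -> raw molecule count.
    Returns a dict mapping gene name -> ln(1 + count / total * scale_factor).
    The scale factor of 10,000 is an assumed default, not a value from the thesis.
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell has nothing to normalize; return zeros.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Hypothetical toy cell with 100 molecules in total.
toy_cell = {"KRT5": 30, "FOXJ1": 0, "SCGB1A1": 10, "GAPDH": 60}
normalized = log_normalize(toy_cell)
```

Dividing by the cell total before taking the logarithm makes cells with different sequencing depths comparable, which matters here because the samples differ widely in molecules detected per cell.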
Chapter III
Results
Aim 1. Identifying cell-specific expression patterns in cystic fibrosis
To determine whether basal cell differentiation varies between cystic fibrosis and healthy
airway epithelium, I analyzed patient airway epithelium with respect to their cell-specific
expression patterns. The Marconett lab had previously acquired three cystic fibrosis patient
samples and two non-cystic fibrosis samples. Samples were cells isolated from explant lungs of
patients, with non-CF samples coming from patients with no prior history of chronic lung disease.
For these samples, there was no age/sex information. Single cell RNA sequencing was performed
on these samples and the data was bioinformatically analyzed using Seurat, in R (Hao et al., 2021).
I first performed preliminary quality control on the samples. Figure 4 shows the
preliminary examination and quality control of the individual samples. The cell counts
show that two of the cystic fibrosis samples (NBG18
and NBG26) have a very low cell count (< 300) compared to the other samples (Figure 4A).
Furthermore, two additional metrics were examined: nFeature_RNA and nCount_RNA.
nFeature_RNA depicts the number of genes detected in a cell, while nCount_RNA depicts the total
number of molecules detected in a cell (Zhou et al., 2020). These metrics give an
understanding of the cell population in each sample and can guide filtering. NBG18, NBG19,
NBG20, and NBG22 all have similar numbers of genes and molecules detected per cell.
However, NBG26 has a much lower
number of genes and molecules detected per cell, consistent with its low cell count
(Figure 4B). This could indicate a low sample quality for NBG26, which could affect future
analyses.
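The two metrics described above can be computed directly from a cell's raw counts. The Python sketch below is a minimal illustration under stated assumptions: the toy cells, the function names, and the filtering threshold are hypothetical and are not the values used in this analysis, which was carried out with “Seurat” in R.

```python
def qc_metrics(cell_counts):
    """Return nFeature_RNA and nCount_RNA for one cell.

    cell_counts: dict mapping gene name -> raw molecule count.
    nFeature_RNA = number of genes with at least one molecule detected.
    nCount_RNA   = total number of molecules detected.
    """
    n_feature = sum(1 for count in cell_counts.values() if count > 0)
    n_count = sum(cell_counts.values())
    return {"nFeature_RNA": n_feature, "nCount_RNA": n_count}


def keep_cell(metrics, min_features):
    """Keep a cell only if enough genes were detected (threshold is illustrative)."""
    return metrics["nFeature_RNA"] >= min_features


# Hypothetical toy cells; real cells have thousands of genes.
toy_cells = {
    "cell_a": {"KRT5": 12, "KRT15": 4, "TOP2A": 0, "SCGB1A1": 1},
    "cell_b": {"KRT5": 0, "KRT15": 0, "TOP2A": 0, "SCGB1A1": 2},
}
metrics = {name: qc_metrics(counts) for name, counts in toy_cells.items()}
kept = [name for name, m in metrics.items() if keep_cell(m, min_features=2)]
```

A cell with few detected genes, like the toy “cell_b”, is the kind of low-quality cell that such a filter removes.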
Sample             Cell Count
NBG18 (CF)         191
NBG19 (non-CF)     1216
NBG20 (CF)         2502
NBG22 (non-CF)     1349
NBG26 (CF)         280
Figure 4. Preliminary quality control of the five preliminary samples. (A) Total cell count of
each of the five preliminary samples. (B) nFeature_RNA and nCount_RNA metrics for each
sample, color coded in legend.
After examination of the samples individually, the samples were merged together for future
analyses. I next wanted to examine clustering of the merged samples, to identify various
subpopulations and to see if they vary between CF and non-CF samples. To do this, first, a principal
component analysis was performed on the merged dataset. This is a statistical technique that reduces
the dimensionality of a dataset; the resulting components are then used to identify clusters with a
nearest-neighbor approach. The principal component analysis is visually examined using two plots: an elbow plot
and a jack straw plot. The elbow plot shows the standard deviation of each of the principal
components, and the location of the elbow is an indicator identifying the optimal number of
clusters in the dataset (Figure 5A) (Lun et al., 2016). Similarly, the jack straw plot visually depicts
the statistical significance of each of the principal components (Figure 5B) (Lun et al., 2016). From
both these dimensional reduction analyses, it can be seen that the estimated optimal number of
clusters in this merged dataset is near nine clusters.
Figure 5. (A) Elbow plot visually depicting dimensional reduction for the merged dataset. The
elbow plot depicts the standard deviation of each of the principal components. (B) Jack straw
plot depicting dimensional reduction for the merged dataset, showing statistical significance of
each of the principal components.
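The elbow in this analysis was read visually from the plot. As a hedged illustration of how such a reading could be made reproducible, the Python sketch below picks the point on the standard deviation curve that lies farthest from the straight line joining its first and last points. This heuristic, the function name, and the toy values are assumptions for illustration; they are not taken from the thesis data, and the result depends on how the two axes are scaled.

```python
import math


def find_elbow(stdevs):
    """Return the 1-based index of the component at the elbow.

    stdevs: standard deviations of the principal components, in order.
    The elbow is taken as the point with the largest perpendicular distance
    from the chord joining the first and last points (an assumed heuristic).
    """
    n = len(stdevs)
    if n < 3:
        raise ValueError("need at least three components to locate an elbow")
    x1, y1 = 1, stdevs[0]
    x2, y2 = n, stdevs[-1]
    chord_length = math.hypot(x2 - x1, y2 - y1)
    best_index, best_distance = 1, 0.0
    for i, y in enumerate(stdevs, start=1):
        # Distance from point (i, y) to the line through (x1, y1) and (x2, y2).
        distance = abs((y2 - y1) * i - (x2 - x1) * y + x2 * y1 - y2 * x1) / chord_length
        if distance > best_distance:
            best_index, best_distance = i, distance
    return best_index


# Hypothetical standard deviations that drop steeply and then flatten.
toy_stdevs = [10, 6, 3, 2.8, 2.6, 2.4, 2.2, 2.0]
elbow = find_elbow(toy_stdevs)
```

For the toy values the curve flattens after the third component, which is where the heuristic places the elbow.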
Next, I explored the various cluster identities. Based on the dimensional reduction analyses,
it was estimated that the optimal number of clusters in this dataset is nine. Clusters were identified
by their top differentially expressed markers, and the expression levels of known lung cell
markers were also explored. Various iterations of clustering and identification were
performed to validate the optimal number of clusters in the dataset as nine (Figure 6). Of these
nine clusters, violin plots depicting gene expression levels showed that clusters 1, 7 and 8 had high
expression levels of FOXJ1 and DNAH5 (ciliated cell markers) (Figure 7). Clusters showing a high
expression of either goblet cell markers (MUC5B) or ciliated cell markers (FOXJ1, DNAH5,
DNAH11) with little to no basal or club cell markers were removed from further analysis. The
remaining six supervised clusters were of primary basal and club identity, and they were chosen
to study further as these subpopulations act as the proliferating cells of the proximal lung (Figure
8A).
Figure 6. Various iterations of clustering (thirteen, eight, and four clusters) via dimensional
reduction and manual identification of identities by top differentially expressed markers.
Figure 7. Known lung markers’ expression levels analyzed via violin plots of the
unsupervised dataset.
Further examination of the six supervised clusters’ top differentially expressed markers
allowed a more specific identification of each cluster’s identity. In addition, heat maps and
expression levels of known lung cell markers were also explored (Figure 8B and 8C). Based on
this examination, cluster identities were established as follows: cluster 0 as transitional basal cells
(basal, club, and ciliated markers), cluster 2 as basal (TOP2A+) proliferating, clusters 3 and 5 as
club cell clusters (SCGB1A1+), and clusters 4 and 6 as basal (KRT5+) differentiating (Figure 8A).
The top differentially expressed markers of each cluster were as follows: cluster 0: TSPAN1,
MTRNR2L8, TXN; cluster 2: TOP2A, MKI67, STMN1; cluster 3: SERPINB3, SERPINB4, IGFBP3;
clusters 4 and 6: KRT15, KRT5, MMP10; and cluster 5: SCGB1A1, MUC5B (Supplementary Figure
1).
Figure 8. (A) UMAP depiction of supervised clustering performed on the merged dataset.
Clusters of ciliated and goblet identity were removed. Remaining six clusters were of primary
basal or club cell identity. (B) Heat map examining the six supervised clusters. Yellow correlates
to higher expression, while purple correlates to lower expression. (C) Violin plots depicting
expression levels of known lung cell specific markers in the six supervised clusters.
After validation, the original sample identities (CF or non-CF) were reintroduced into the
supervised clusters. Comparing the original identity with the cluster-specific identities showed
a clear distinction between the cell-specific identities in the cystic fibrosis samples and
non-cystic fibrosis samples. The non-cystic fibrosis samples had a higher correlation with basal
TOP2A+ and KRT5+ cells, while the cystic fibrosis samples had a higher correlation with club and
transitional basal cells (Figure 9). This distinction suggests that the cystic fibrosis samples were
more differentiated compared to the control samples. However, given the small sample size and
the poor quality of one of the CF samples, this distinction may also reflect patient-specific differences.
Figure 9. Original (CF or non-CF) identities of samples brought back into supervised
clusters, depicted in UMAP. Distinction shown between CF and non-CF samples as CF samples
correlated more with club and transitional basal cells, while non-CF samples correlated with
more pure basal cells (KRT5+ and TOP2A+).
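The comparison of original identity against cluster identity amounts to asking what fraction of each cluster comes from CF or non-CF samples. A minimal Python sketch of that tabulation follows. The cluster labels, cell counts, and function name are hypothetical and chosen only to illustrate the calculation; they are not the values behind Figure 9.

```python
from collections import Counter, defaultdict


def condition_fractions(cells):
    """Fraction of each cluster's cells that come from each condition.

    cells: iterable of (cluster_label, condition) pairs, one pair per cell.
    Returns {cluster: {condition: fraction}}.
    """
    per_cluster = defaultdict(Counter)
    for cluster, condition in cells:
        per_cluster[cluster][condition] += 1
    fractions = {}
    for cluster, counts in per_cluster.items():
        total = sum(counts.values())
        fractions[cluster] = {cond: n / total for cond, n in counts.items()}
    return fractions


# Hypothetical toy data: 4 cells in a basal cluster, 5 cells in a club cluster.
toy_cells = (
    [("basal_KRT5", "non-CF")] * 3
    + [("basal_KRT5", "CF")]
    + [("club", "CF")] * 4
    + [("club", "non-CF")]
)
fractions = condition_fractions(toy_cells)
```

Because the samples contribute very different numbers of cells, fractions of this kind are easier to compare across clusters than raw counts, although they remain sensitive to a single dominant sample.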
To address the patient-specific differences, a secondary reanalysis was performed on
published, available cystic fibrosis patient samples (Carraro et al., 2021). Twenty-four samples
were used for this secondary analysis, including the five previously studied. Of the twenty-four
samples, twelve were of cystic fibrosis identity and twelve were of non-cystic fibrosis identity.
Previous research on these cystic fibrosis single cell samples defined four populations of cells in
the CF airway epithelium: basal, ciliated, secretory, and rare cell types including ionocytes,
neuroendocrine, and FOXN4+ cells (Carraro et al., 2021). Of these, subpopulations were defined
as: three ciliated, five basal, and five secretory subsets (Carraro et al., 2021). In my reanalysis, I
merged all the samples together and performed the same quality control analyses, looking at
nFeature_RNA, nCount_RNA, as well as percent.mt, a metric to examine the mitochondrial
percentage in each cell (Figure 10).
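The percent.mt metric can be sketched as the share of a cell's molecules that come from mitochondrial genes. The Python example below assumes that mitochondrial genes are recognized by the “MT-” prefix used in human gene symbols; that convention, the function name, and the toy counts are assumptions for illustration rather than details stated in this thesis.

```python
def percent_mt(cell_counts, prefix="MT-"):
    """Percentage of a cell's molecules assigned to mitochondrial genes.

    cell_counts: dict mapping gene name -> raw molecule count.
    prefix: gene-symbol prefix marking mitochondrial genes (assumed "MT-").
    """
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(
        count for gene, count in cell_counts.items() if gene.startswith(prefix)
    )
    return 100.0 * mito / total


# Hypothetical toy cell: 10 of 100 molecules are mitochondrial.
toy_cell = {"MT-CO1": 5, "MT-ND1": 5, "KRT5": 70, "GAPDH": 20}
toy_percent = percent_mt(toy_cell)
```

A high mitochondrial percentage is commonly read as a sign of a stressed or dying cell, which is why this metric is examined alongside nFeature_RNA and nCount_RNA.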
Figure 10. Quality control metrics (nFeature_RNA, nCount_RNA, and percent.mt) examined for
the merged twenty-four samples.
After quality control, the samples were clustered and visualized with a UMAP projection (Figure 11A)
(Carraro et al., 2021). From the total batch, supervised clustering was again used to filter out
ciliated and secretory clusters. This left five supervised basal clusters (Figure 11B). From previously
published research, the five basal cell clusters could be defined as: Basal1 with pure basal stem
cells (KRT5+, KRT15+), Basal2 with proliferating basal cells (TOP2A+), Basal3 with a transition
to secretory cells, Basal4 with a transition to ciliated cells (FOS/JUN markers), and Basal5 with
an expression of beta catenin (Carraro et al., 2021). These clusters defined in the literature correlate
with the six clusters I found previously. The proliferating basal cell clusters (cluster 2 & Basal2)
in both groups have a high expression of TOP2A, MKI67, and STMN1. The pure basal stem cells
(clusters 4/6 & Basal1) had high expression of KRT15, KRT5, and MMP10. Cluster 3, one of the club
clusters, correlated with Basal3 with high expression of SERPINB3, SERPINB4, and IGFBP3. The
other club cluster, cluster 5, correlated not with a basal cluster but with Secretory1 and
Secretory2, expressing SCGB1A1 and MUC5B. Only cluster 0 did not correlate with any of
the clusters defined in the literature. Furthermore, when reintroducing cystic fibrosis identity into
the supervised basal cell clusters, it can be seen that there is overlap between the cystic fibrosis
and non-cystic fibrosis cells. Of the five basal clusters, the Basal2 subpopulation shows a higher
proportion of non-cystic fibrosis cells, correlating with proliferating basal cells (Figure 11C).
Figure 11. (A) UMAP depiction of twenty-four samples (12 CO and 12 CF) merged together,
labeled by cluster identity. (B) UMAP depiction of supervised clustering, looking at five basal
clusters. (C) UMAP depiction of supervised clustering, colored by CF or non-CF (CO) identity.
Aim 2. Identifying epigenetic changes between cystic fibrosis and non-cystic fibrosis samples.
To determine the root cause of epigenetic changes between cystic fibrosis and non-cystic
fibrosis samples, fluorescence activated cell sorting (FACS), bulk ATAC sequencing, and
multimodal single cell RNA sequencing with single cell ATAC sequencing were performed and
analyzed. FACS was performed on the five original samples
and was preliminarily analyzed by members of the Amy Ryan lab (University of Iowa). Single cell
samples from the lung were first gated to examine only epithelial cells. Non-cystic fibrosis samples
from the bronchi and trachea were examined, along with two cystic fibrosis samples from bronchi.
Then, the epithelial cells were gated again to examine only basal cells (Figure 12). A clear
difference was shown between the cystic fibrosis samples and non-cystic fibrosis samples. The
non-cystic fibrosis samples had a larger basal cell population (~63% for both samples),
while the cystic fibrosis samples had a smaller basal cell population (8% and 12%),
suggesting they are more differentiated. This correlates with the data shown for Aim 1 (Figure 9),
and suggests that cystic fibrosis and non-cystic fibrosis samples differ in their degree of
differentiation.
Figure 12. (A) Non-cystic fibrosis samples taken from the trachea and bronchi, gated initially
for epithelial cells and then gated again for basal cells: ~63% basal cell population. (B) Two
cystic fibrosis samples taken from the bronchi, with similar gating protocols applied: the first
sample has a ~8% basal cell population and the second sample has a ~12% basal cell population.
Furthermore, bulk ATAC sequencing was performed on the five samples as well (Amy
Ryan lab, University of Iowa), and preliminarily analyzed by members of the Marconett lab. Bulk
ATAC sequencing measures genome-wide chromatin accessibility, identifying open chromatin
regions (which often mark transcriptionally active sites). As chromatin is epigenetically
regulated, an analysis of bulk ATAC sequencing data can provide information about the
differentiation of basal cells. However, conclusions could not be drawn from the preliminary bulk
ATAC sequencing data, as the degree of overlap between the CF and non-CF sample sets (~90%) was
greater than the degree of overlap within each sample set individually (CF: ~85%; non-CF: ~86%)
(Figure 13).
Figure 13. Degree of overlap between bulk ATAC integration peaks in non-CF sample
set (~86%), CF sample set (~85%), and between the intersect of non-CF and CF sample sets
(~90%).
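As an illustration of how an overlap percentage of this kind can be computed, the following is a minimal sketch that counts the fraction of peaks in one set that intersect at least one peak in another set. It is not the pipeline used for Figure 13; the function names and the toy peak coordinates are hypothetical, and the definition of overlap (sharing at least one base) is an assumption.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with a half-open interval [start, end).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True when two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlap(query: List[Peak], reference: List[Peak]) -> float:
    """Percentage of query peaks that intersect any reference peak."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return 100.0 * hits / len(query)


# Hypothetical toy peak sets (not data from this study).
cf_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
non_cf_peaks = [("chr1", 150, 250), ("chr2", 100, 120), ("chr3", 10, 20)]

# Two of the three CF peaks intersect a non-CF peak.
print(round(percent_overlap(cf_peaks, non_cf_peaks), 1))
```

The pairwise comparison is quadratic in the number of peaks, which is acceptable for a sketch; genome-scale peak sets would normally be sorted or indexed first.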
To perform a secondary analysis, eight additional patient samples (across two batches)
were acquired by the Marconett (University of Southern California) and Ryan (University of Iowa)
labs in a joint analysis effort. The two batches included primary isolate patient samples and
cultured basal cell lines. Of the eight samples, four were of cystic fibrosis identity, and four were
of non-cystic fibrosis identity. From the cell mixture, the nuclei were isolated and transposition
sites were attached. After pre-amplification, quality control was performed to check that regions
of interest were enriched and harvested. Multimodal single cell RNA sequencing and single cell
ATAC sequencing were performed on the eight samples, some at multiple culture timepoints
(P0, P1, P5). In total, sixteen sample timepoints were sequenced for analysis
(Figure 14). Due to poor initial sample quality (low number of initial cells), three of these
were removed from any preliminary or further analysis. The remaining thirteen samples were
analyzed in R using Seurat. Table 1 shows the patient information for these samples.
Figure 14. Composition of the sample data. Samples came from eight patients (4 CF and 4 CO);
cultured basal patient lines had up to three timepoints, giving 13 samples in total.
Sample        Sex        Age
*CF13 (P1)    No info    No info
*CF10 (P1)    M          64
*CF16 (P0)    F          54
*H14 (P1)     M          56
*H15 (P0)     No info    No info
*H15 (P1)     No info    No info
*H18 (P0)     No info    No info
H48 (P1)      M          63
H14 (P5)      M          56
CF16 (P1)     F          54
CF7 (P5)      F          26
H18 (P5)      No info    No info
H18 (P1)      No info    No info
Table 1. Patient data composition of samples. Starred (*) samples indicate samples from batch 1,
non-starred samples indicate samples from batch 2.
The samples were first assessed using the quality control metrics nFeature_RNA and
nCount_RNA. For these samples, an additional quality control metric, mitochondrial percentage
(percent.mt), was also examined. A high mitochondrial percentage in a cell can indicate poor
sample quality. The mitochondrial percentage of each of the samples is very high, with most
samples containing many cells with a mitochondrial percentage greater than 75% (Figure 15). In
downstream preliminary analysis of these samples, the top differentially expressed markers of the
clusters were all mitochondrial genes: MT-CO1, MT-ND2, MT-ATP6, and MT-CO3.
Figure 15. Quality control metrics of thirteen new samples. nFeature_RNA depicts the number
of genes detected in a cell, nCount_RNA depicts the total number of molecules detected in a cell,
and percent.mt depicts the mitochondrial percentage of a cell, acting as a sample quality
indicator. The majority of the samples have many cells with a high mitochondrial percentage.
Due to the high mitochondrial percentage seen in the majority of the samples, filtering was
performed to remove cells with a high mitochondrial percentage. At first, all cells with a
mitochondrial percentage greater than 25% were filtered out. However, preliminary downstream
analysis of the filtered dataset still showed clusters whose top markers were mitochondrial
genes. Therefore, a more stringent cutoff was applied, and all cells with a mitochondrial
percentage greater than 20% were filtered out of the dataset. The remaining cells from each
sample were used for further analysis. Table 2 shows the number of cells before and after
filtering (all cells with >20% mitochondrial content removed) for each sample.
Sample        Cell Count Pre-Filtering    Cell Count Post-Filtering (<20%)
*CF13 (P1)    2243                        1665
*CF10 (P1)    11300                       3141
*CF16 (P0)    1102                        471
*H14 (P1)     3741                        1014
*H15 (P0)     2067                        106
*H15 (P1)     10606                       1463
*H18 (P0)     6377                        507
H48 (P1)      3828                        812
H14 (P5)      651                         134
CF16 (P1)     1864                        612
CF7 (P5)      489                         95
H18 (P5)      3390                        1957
H18 (P1)      4316                        1452
Table 2. Cell counts before and after filtering out cells with greater than 20% mitochondrial
content, for the thirteen new samples. Starred (*) samples indicate samples from batch 1,
non-starred samples indicate samples from batch 2.
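The mitochondrial filtering step can be sketched as follows. This is a minimal illustration, not the Seurat code used in this work: it assumes that mitochondrial genes are identified by the "MT-" prefix (as in the marker names MT-CO1 and MT-ND2 reported above), the function names and the toy cells are hypothetical, and only the two retained-cell percentages at the end are computed from values in Table 2.

```python
from typing import Dict, List


def percent_mt(counts: Dict[str, int]) -> float:
    """Percentage of a cell's counts that come from genes prefixed 'MT-'."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith("MT-"))
    return 100.0 * mito / total


def filter_cells(cells: List[Dict[str, int]], max_percent_mt: float = 20.0) -> List[Dict[str, int]]:
    """Keep only cells whose mitochondrial percentage is below the cutoff."""
    return [cell for cell in cells if percent_mt(cell) < max_percent_mt]


# Hypothetical toy cells (gene -> count); not data from this study.
cells = [
    {"MT-CO1": 80, "KRT5": 20},   # 80% mitochondrial, removed
    {"MT-ND2": 10, "KRT5": 90},   # 10% mitochondrial, kept
    {"MT-ATP6": 25, "TP63": 75},  # 25% mitochondrial, removed at the 20% cutoff
]
kept = filter_cells(cells)

# Percentage of cells retained, using two rows of Table 2 (post / pre).
retained = {
    "CF13 (P1)": round(100 * 1665 / 2243, 1),
    "H15 (P0)": round(100 * 106 / 2067, 1),
}
print(len(kept), retained)
```

Expressing Table 2 as retained percentages makes the spread between samples easier to see: roughly three quarters of the CF13 (P1) cells pass the cutoff, compared with about one in twenty of the H15 (P0) cells.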
After being filtered individually, all thirteen samples were merged for further analysis.
The merged dataset only included cells with a mitochondrial percentage of less than 20%. As
with the previous datasets, after primary quality control analysis (Figure 16A), a principal
component analysis was performed (Figure 16B). From the elbow plot, the optimal number of
principal components appears to be around fifteen.
Figure 16. (A) Quality control metrics (nFeature_RNA, nCount_RNA, and percent.mt) for the
thirteen new samples, including their timepoints. All samples were merged and filtered to remove
cells with a mitochondrial percentage >20%. (B) Elbow plot for the dimensional reduction of the
merged dataset. The elbow plot depicts the standard deviation of each of the principal components.
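An elbow plot is usually read by eye, but the idea can be sketched numerically: keep components until the drop in standard deviation between successive components becomes small. The heuristic below is an assumption made for illustration only and was not used in this analysis; the function name, the `min_drop` threshold, and the standard deviations are all hypothetical.

```python
from typing import List


def elbow_point(stdevs: List[float], min_drop: float = 0.1) -> int:
    """Return how many leading components to keep.

    Scans the standard deviations in order and stops at the first component
    whose successor is lower by less than `min_drop`.
    """
    for k in range(1, len(stdevs)):
        if stdevs[k - 1] - stdevs[k] < min_drop:
            return k
    return len(stdevs)


# Hypothetical standard deviations for the first eight principal components.
example_stdevs = [10.0, 6.0, 4.0, 3.0, 2.5, 2.45, 2.42, 2.40]

# The curve flattens after the fifth component in this toy example.
print(elbow_point(example_stdevs))
```

The result depends strongly on the chosen threshold, which is one reason the elbow is normally treated as an approximate guide rather than an exact cutoff.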
Clustering was performed on the merged dataset and visualized via UMAP, using the number of
dimensions suggested by the elbow plot. Sixteen clusters were identified (Figure 17A). The
clusters were then annotated using a computational platform, ScType (Ianevski et al., 2022),
which provides fully automated cell type identification using specific marker combinations. These
annotations were validated through manual analysis of the top differentially expressed cluster
markers, as well as through examination of key, known lung markers of interest (Figure 17C).
Through this cluster identification, multiple cell types can be seen, including ciliated, basal,
mesothelial, secretory, and a differentiating cell type cluster (Figure 17B). Furthermore, batch 1
and batch 2 samples were also merged and clustered separately by batch to examine for any
batch-specific effects (Supplementary Figure 2).
Figure 17. (A) UMAP depiction of clustering performed using the number of dimensions
suggested by the elbow plot. Sixteen clusters are depicted. Clustering was performed after
filtering of cells. (B) Labeled UMAP depiction showing cluster cell identities. Cell identities
were labeled through the computational platform ScType (Ianevski et al., 2022) and validated
through top cluster marker analysis. (C) Basal stem cell genes (KRT5, KRT14, KRT15, TP63, ITGA6)
shown in a feature plot for the dataset.
After identification of cluster cell identities, the original sample identities were
reintroduced to examine for group effects. Two separate identities were examined: first, cystic
fibrosis samples compared to non-cystic fibrosis samples, and second, timepoint differences
(P0 versus P1 versus P5). When cystic fibrosis identity is reintroduced into the samples, looking
specifically at the basal cell populations, the cystic fibrosis samples appear to cluster with
the differentiating and basal phenotype, while the non-cystic fibrosis samples appear to cluster
with the pure basal cell population (Figure 18A). When analyzing for subpopulations, basal stem
cell identity (KRT5+ and KRT15+) appears to correlate with non-cystic fibrosis samples rather
than cystic fibrosis samples (Figure 18B). These observations are consistent with those seen in
Aim 1 (Figure 9) and in the FACS data (Figure 12), and suggest that cystic fibrosis samples have
a higher degree of differentiation than the healthy, non-cystic fibrosis samples. When timepoint
identity is reintroduced into the samples (P0 versus P1 versus P5), the samples also appear to
cluster according to timepoint. The P1 samples appear to cluster with the differentiating basal
cell type, while the P5 samples appear to cluster with the pure basal cell type (Figure 18B).
However, it is important to note that, of the thirteen samples, seven are of P1 identity, three
are of P0 identity, and three are of P5 identity. Because P1 samples outnumber P0 and P5 samples,
the clusters may not be representative of the total population, as there are highly
timepoint-specific clusters visible with little to no overlap.
Figure 18. (A) UMAP depiction of clusters colored by cystic fibrosis identity. (B) UMAP
depiction of clusters colored by timepoint identity.
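The imbalance between timepoints and between CF and non-CF samples can be tallied directly from the sample labels in Table 1. The following is a small sketch for that bookkeeping only; the helper names are hypothetical, and the rule that labels beginning with "CF" denote cystic fibrosis samples is an assumption based on the naming used in Table 1.

```python
import re
from collections import Counter

# Sample labels as listed in Table 1 (batch markers omitted).
samples = [
    "CF13 (P1)", "CF10 (P1)", "CF16 (P0)", "H14 (P1)", "H15 (P0)",
    "H15 (P1)", "H18 (P0)", "H48 (P1)", "H14 (P5)", "CF16 (P1)",
    "CF7 (P5)", "H18 (P5)", "H18 (P1)",
]


def timepoint(label: str) -> str:
    """Extract the passage label (P0, P1, P5) from a sample label."""
    match = re.search(r"\((P\d+)\)", label)
    if match is None:
        raise ValueError(f"no timepoint found in {label!r}")
    return match.group(1)


def condition(label: str) -> str:
    """Classify a sample as CF or non-CF from its label prefix (assumed convention)."""
    return "CF" if label.startswith("CF") else "non-CF"


timepoint_counts = Counter(timepoint(s) for s in samples)
condition_counts = Counter(condition(s) for s in samples)
cross_tab = Counter((condition(s), timepoint(s)) for s in samples)

print(dict(timepoint_counts))
print(dict(condition_counts))
print(dict(cross_tab))
```

The cross-tabulation shows that some combinations, such as CF at P5, are represented by a single sample, so timepoint and disease status are difficult to separate in this dataset.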
Furthermore, individual samples that were cultured and sequenced at multiple timepoints were
also analyzed to examine for timepoint- and differentiation-specific differences. Two sample sets
were selected: Healthy18 and CF16. The Healthy18 sample set had three timepoints (P0, P1, and P5),
while the CF16 cell line had two timepoints (P0 and P1). Both cell line sample sets went through
the same Seurat pipeline as the previous samples. Samples were merged by sample set
(all Healthy18 samples were merged together and all CF16 samples were merged together) and
compared. In both cell lines, the earlier timepoints clustered with more differentiating basal
cell types, while the later timepoints clustered with more progenitor basal cell types (Figure 19).
This is consistent with the total merged sample set shown in Figure 17. A confounding variable here
is that cell line CF16 had only P0 and P1 timepoints to examine, while cell line
Healthy18 also had P5, a more differentiated timepoint.
Figure 19. (A) UMAP depiction of clusters for Healthy18 cell line, colored and labeled by cell
type identity. (B) UMAP depiction of clusters for Healthy18 cell line, colored and labeled by
timepoint identity. (C) UMAP depiction of clusters for CF16 cell line, colored and labeled by
cell type identity. (D) UMAP depiction of clusters for CF16 cell line, colored and labeled by
timepoint identity.
Additionally, the single cell ATAC sequencing data of the samples were analyzed to gain
further insight, using Signac in R. First, the ATAC samples were examined in the Integrative
Genomics Viewer (IGV; Broad Institute). IGV allows for exploration of genomic data, and can be
used as a preliminary quality control step for ATAC data. In IGV, the ATAC tracks were examined
at the KRT5 and GAPDH genes to visually assess ATAC quality. The samples show greater chromatin
accessibility before and after the gene location (Figure 20). From this preliminary ATAC
analysis, the samples H15 (P0) and CF7 (P5) appear to be of lower quality than the other samples,
as seen in their tracks (Figure 20).
Figure 20. ATAC data for samples examined in the Integrative Genomics Viewer. CF samples
in red and non-CF samples in blue. (A) All ATAC samples for GAPDH gene. (B) All ATAC
samples for KRT5 gene.
Based on the preliminary ATAC analysis, the samples H15 (P0) and CF7 (P5) were
removed from further ATAC analysis. The single cell ATAC sequencing data for the remaining
eleven samples were merged for analysis and clustered similarly to the single cell RNA
sequencing data (Figure 21A). Furthermore, a weighted nearest neighbor analysis was
performed on the multimodal data to define clusters from the integrated single cell RNA and
single cell ATAC data together in one representation (Figure 21B). In the weighted nearest
neighbor and ATAC analyses of the unfiltered data, the cystic fibrosis samples appear to
cluster with the basal and ciliated sub-populations, while the non-cystic fibrosis samples appear
to cluster with more differentiating sub-populations: suprabasal, club, goblet, and ciliated
clusters (Figure 21C and 21D). This suggests that the cystic fibrosis samples consist of cells
that are either in a transitional basal state or fully differentiated into ciliated cells, while
the non-cystic fibrosis samples contain cells throughout the differentiation process of lung
progenitor cells. A confounding variable here could be that there are more non-cystic fibrosis
samples than cystic fibrosis samples in this analysis (Table 1). Additionally, when
comparing by timepoint, the P0 cells cluster with pure ciliated cells, the P1 cells cluster with
basal, club, and ciliated cells, and the P5 cells cluster with the basal and suprabasal
cells (Figure 21E and 21F). This is consistent with the timepoint analysis of the single cell RNA
sequencing data by sample (Figure 18). Similarly, a confounding variable here could be
that there are fewer P0 and P5 cells than P1 cells, affecting the analysis. Furthermore, the
weighted nearest neighbor and ATAC analyses show sub-populations of unclustered cells around
the clusters that are not seen in the single cell RNA sequencing data. This could be due to
sample error, as these data are unfiltered.
Figure 21. (A) Labeled clusters depicting sub-populations identified from unfiltered single cell
ATAC data. (B) Labeled clusters depicting sub-populations identified from unfiltered weighted
nearest neighbor analysis looking at the single cell RNA and ATAC data. (C) Single cell ATAC
clusters identified by CF/Healthy identity. (D) Weighted nearest neighbor clusters identified by
CF/Healthy identity. (E) Single cell ATAC clusters identified by timepoint identity. (F)
Weighted nearest neighbor clusters identified by timepoint identity.
Furthermore, the filtered single cell ATAC and weighted nearest neighbor analyses were
also examined. Here, all the samples were filtered with a 20% mitochondrial cutoff, as for the
single cell RNA sequencing samples. In the filtered data, there are more populations of cells
that do not cluster together, which could be a technical artifact. Both the single cell
ATAC and weighted nearest neighbor UMAPs show clusters of basal, suprabasal, and goblet
subpopulations (Figure 22A and 22B). The cystic fibrosis overlay shows that the CF samples
appear to correlate with the suprabasal cells, while non-CF samples correlate with the basal cells
(Figure 22C and 22D). Additionally, the timepoint overlay shows that the P1 subset appears to
correlate with the basal subpopulation, the P0 with the goblet subpopulation, and the P5 with the
suprabasal subpopulation (Figure 22E and 22F). The large ciliated cell population seen in the
unfiltered data appears to have been filtered out.
Figure 22. (A) Labeled clusters depicting sub-populations identified from filtered single cell
ATAC data. (B) Labeled clusters depicting sub-populations identified from filtered weighted
nearest neighbor analysis looking at the single cell RNA and ATAC data. (C) Filtered single cell
ATAC clusters identified by CF/Healthy identity. (D) Filtered weighted nearest neighbor clusters
identified by CF/Healthy identity. (E) Filtered single cell ATAC clusters identified by timepoint
identity. (F) Filtered weighted nearest neighbor clusters identified by timepoint identity.
Additionally, an Ingenuity Pathway Analysis (IPA; QIAGEN) was performed on the total
dataset comparing the P5 timepoint to the P1 timepoint. IPA is a bioinformatics tool that allows
for identification of differentially expressed cellular pathways and transcription pathways,
here when comparing the P5 timepoint to the P1 timepoint (Table 3). One of the most highly
differentially expressed pathways when comparing the P5 cells to the P1 cells is the CREB (cAMP
Response Element-Binding Protein) pathway. CREB is a transcription factor that is associated with
cellular processes relating to proliferation, survival, and differentiation (Wen et al., 2010).
The P5 subset of cells correlates with more proliferation and differentiation, as expected.
Another highly differentially expressed pathway in the P5 subset compared to the P1 subset is FAK
(focal adhesion kinase) signaling; FAK is a non-receptor tyrosine kinase involved in
proliferation and survival (Li & Hua, 2008).
Pathway                p-value    z score
FAK signaling          2.2e-16    4.07
CREB signaling         2.2e-16    4.02
Phagosome Formation    2.2e-16    3.78
SAPK/JNK signaling     2.2e-16    3.77
IL-15 signaling        2.2e-16    3.68
Table 3. Top five differentially expressed pathways/transcription markers for the P5 subset of
cells when compared to P1, indicated by p-value and z score.
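To give some intuition for the z scores reported here, the sketch below computes a simple activation z score of the kind described for pathway analysis tools: each gene contributes +1 if its observed direction of change is consistent with pathway activation and -1 if it is consistent with inhibition, and the sum is divided by the square root of the number of genes. This unweighted form is an assumption made for illustration; it is not a reproduction of the QIAGEN implementation, and the function name and input values are hypothetical.

```python
import math
from typing import Iterable


def activation_z_score(signs: Iterable[int]) -> float:
    """Unweighted activation z score: sum of +1/-1 votes divided by sqrt(N)."""
    votes = list(signs)
    if not votes:
        raise ValueError("at least one gene is required")
    if any(v not in (-1, 1) for v in votes):
        raise ValueError("each vote must be +1 or -1")
    return sum(votes) / math.sqrt(len(votes))


# Hypothetical pathway: 20 genes consistent with activation, 5 with inhibition.
z_activated = activation_z_score([1] * 20 + [-1] * 5)

# Hypothetical pathway in which all 4 genes are consistent with inhibition.
z_inhibited = activation_z_score([-1] * 4)

print(z_activated, z_inhibited)
```

Under this definition a positive z score indicates predicted activation and a negative z score predicted inhibition, with larger magnitudes reflecting more consistent evidence across more genes.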
Additionally, the QIAGEN Ingenuity Pathway Analysis was applied to identify
differentially expressed cellular pathways and transcription pathways when comparing the CF to
the non-CF subset of samples. One of the most strongly downregulated pathways when comparing the
CF cells to the non-CF cells is glutathione redox reactions. Glutathione is a known lung
antioxidant that is low in the airways of patients with CF (Kettle et al., 2014). Oxidative
stress can contribute to the lowering of glutathione, and as shown in Table 4, the CF subset
showed much lower activity of the glutathione redox reactions pathway than the non-CF subset.
Oxidative stress has been shown to promote proliferation and differentiation in different tissue
types (Hu et al., 2017), and could be affecting proliferation of basal cells as well (Paul et al.,
2014). Additionally, multiple inflammatory pathways were found to be downregulated in the CF
samples, including Th17, Nur77, and CTLA4 signaling (Moss et al., 2000).
Pathway                        p-value    z score
Glutathione Redox Reactions    2.2e-16    -2.23
CTLA4 signaling                2.2e-16    -1.15
Th17 signaling                 2.2e-16    -1.06
Nur77 signaling                2.2e-16    -1.51
Table 4. Top differentially expressed pathways/transcription markers for the non-CF subset of
cells when compared to CF, indicated by p-value and z score.
Chapter III
Discussion
Cystic fibrosis is a progressive, genetic disease that severely affects the lungs. In the lungs,
the major progenitor cells are basal cells, which give rise to various differentiated functional
epithelial cells such as goblet, club, and ciliated cells. Single cell sequencing is a tool that
can be used to understand cell-to-cell variation between cystic fibrosis and non-cystic fibrosis
samples. Therefore, using single cell sequencing to understand the differences in differentiation
between cystic fibrosis and non-cystic fibrosis samples can be clinically useful for identifying
ways to reprogram induced pluripotent stem cells into the correct cell populations.
To this end, when examining single cell RNA sequencing and single cell ATAC
sequencing data, principal component analysis identified clusters in which the cystic fibrosis
and non-cystic fibrosis samples clustered together. Through these clusters, it was identified that
cystic fibrosis samples correlate with a more differentiated phenotype than non-cystic
fibrosis samples. Top differentially expressed markers were computed and examined both
manually and algorithmically, and showed that non-cystic fibrosis samples expressed basal stem
cell (KRT5+ and KRT15+) and differentiating cell (TOP2A+) markers, whereas cystic fibrosis
samples appear to be more differentiated (with more club SCGB1A1+ and ciliated FOXJ1+ cell
markers). Additionally, fluorescence activated cell sorting data showed a higher percentage of
basal cells in non-cystic fibrosis samples than in cystic fibrosis samples.
Furthermore, timepoint analysis was performed on the sample set as an additional
layer of examination. Three timepoints were analyzed throughout the dataset: P0, P1, and P5. The
P1 sample set had the highest number of samples and cells, which could have confounded the data
analysis. P5 cells correlated with basal and suprabasal cell populations, while P1 cells correlated
with club, ciliated, and basal cells. Additionally, an Ingenuity Pathway Analysis (QIAGEN) was
performed on the timepoint comparison and on the CF versus non-CF comparison. This showed that,
compared to the P1 subset, the P5 subset showed differential expression of signaling pathways
such as FAK and CREB, which are associated with proliferation and differentiation. Additionally,
the CF samples showed downregulation of inflammatory pathways and of glutathione redox reactions,
which are linked to oxidative stress.
Limitations of these analyses include the small number of samples in each subset (CF versus
non-CF and P0/P1/P5 identity), which can introduce confounding variables during analysis. In the
future, additional sample sets are planned to be sequenced and integrated to further characterize
the differences in differentiation between cystic fibrosis and non-cystic fibrosis identity,
differentially expressed transcription factors, and differentiation by timepoint identity.
Supplementary Figures
Supplementary Figure 1. Top ten differentially expressed markers of six supervised clusters
analyzed for cluster identification.
Supplementary Figure 2. Individual cluster analysis of batch 1 and batch 2 samples prior to
merging. (A) Clusters with identities for batch 1 samples. (B) Clusters with identities for batch 2
samples.
References
Ahmad, A., Ahmed, A., & Patrizio, P. (2013). Cystic fibrosis and fertility. Current Opinion in
Obstetrics & Gynecology, 25(3), 167–172. https://doi.org/10.1097/GCO.0b013e32835f1745
Airway Epithelial Cells. (n.d.). Retrieved September 25, 2022, from
https://www.stemcell.com/airway-epithelial-cells-lp.html
Alaskhar Alhamwe, B., Khalaila, R., Wolf, J., von Bülow, V., Harb, H., Alhamdan, F., Hii, C. S.,
Prescott, S. L., Ferrante, A., Renz, H., Garn, H., & Potaczek, D. P. (2018). Histone
modifications and their role in epigenetics of atopy and allergic diseases. Allergy, Asthma &
Clinical Immunology, 14(1), 39. https://doi.org/10.1186/s13223-018-0259-4
Allis, C. D., & Jenuwein, T. (2016). The molecular hallmarks of epigenetic control. Nature
Reviews Genetics, 17(8), Article 8. https://doi.org/10.1038/nrg.2016.59
Bukowy-Bieryłło, Z. (2021). Long-term differentiating primary human airway epithelial cell
cultures: How far are we? Cell Communication and Signaling, 19(1), 63.
https://doi.org/10.1186/s12964-021-00740-z
Carraro, G., Langerman, J., Sabri, S., Lorenzana, Z., Purkayastha, A., Zhang, G., Konda, B.,
Aros, C. J., Calvert, B. A., Szymaniak, A., Wilson, E., Mulligan, M., Bhatt, P., Lu, J.,
Vijayaraj, P., Yao, C., Shia, D. W., Lund, A. J., Israely, E., … Gomperts, B. N. (2021).
Transcriptional analysis of Cystic Fibrosis airways at single cell resolution reveals altered
epithelial cell states and composition. Nature Medicine, 27(5), 806–814.
https://doi.org/10.1038/s41591-021-01332-7
Cystic Fibrosis | CDC. (2022, September 15).
https://www.cdc.gov/genomics/disease/cystic_fibrosis.htm
Davis, J. D., & Wypych, T. P. (2021). Cellular and functional heterogeneity of the airway
epithelium. Mucosal Immunology, 14(5), Article 5.
https://doi.org/10.1038/s41385-020-00370-7
Doan, L. V., & Madison, L. D. (2022). Cystic Fibrosis Related Diabetes. In StatPearls.
StatPearls Publishing. http://www.ncbi.nlm.nih.gov/books/NBK545192/
Eberwine, J., Sul, J.-Y., Bartfai, T., & Kim, J. (2014). The promise of single-cell sequencing.
Nature Methods, 11(1), Article 1. https://doi.org/10.1038/nmeth.2769
Elborn, J. S. (2016). Cystic fibrosis. The Lancet, 388(10059), 2519–2531.
https://doi.org/10.1016/S0140-6736(16)00576-6
Hu, Q., Khanna, P., Ee Wong, B. S., Lin Heng, Z. S., Subhramanyam, C. S., Thanga, L. Z., Sing
Tan, S. W., & Baeg, G. H. (2017). Oxidative stress promotes exit from the stem cell state
and spontaneous neuronal differentiation. Oncotarget, 9(3), 4223–4238.
https://doi.org/10.18632/oncotarget.23786
Ianevski, A. (2022). ScType: Fully-automated and ultra-fast cell-type identification using
specific marker combinations from single-cell transcriptomic data [HTML].
https://github.com/IanevskiAleksandr/sc-type (Original work published 2019)
Ingenuity Pathway Analysis | QIAGEN Digital Insights. (n.d.). Bioinformatics Software and
Services | QIAGEN Digital Insights. Retrieved October 7, 2022, from
https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-ipa/
Kettle, A. J., Turner, R., Gangell, C. L., Harwood, D. T., Khalilova, I. S., Chapman, A. L.,
Winterbourn, C. C., & Sly, P. D. (2014). Oxidation contributes to low glutathione in the
airways of children with cystic fibrosis. European Respiratory Journal, 44(1), 122–129.
https://doi.org/10.1183/09031936.00170213
Li, S., & Hua, Z. (2008). Chapter 3 FAK Expression: Regulation and Therapeutic Potential. In
Advances in Cancer Research (Vol. 101, pp. 45–61). Academic Press.
https://doi.org/10.1016/S0065-230X(08)00403-X
Lun, A. T. L., McCarthy, D. J., & Marioni, J. C. (2016). A step-by-step workflow for low-level
analysis of single-cell RNA-seq data with Bioconductor. F1000Research, 5, 2122.
https://doi.org/10.12688/f1000research.9501.2
Lynch, T. J., & Engelhardt, J. F. (2014). Progenitor Cells in Proximal Airway Epithelial
Development and Regeneration. Journal of Cellular Biochemistry, 115(10), 1637–1645.
https://doi.org/10.1002/jcb.24834
Mathew, A., Dirawi, M., Abou Tayoun, A., & Popatia, R. (n.d.). A Rare Cystic Fibrosis
Transmembrane Conductance Regulator (CFTR) Mutation Associated With Typical Cystic
Fibrosis in an Arab Child. Cureus, 13(2), e13526. https://doi.org/10.7759/cureus.13526
McBennett, K. A., Davis, P. B., & Konstan, M. W. (2022). Increasing life expectancy in cystic
fibrosis: Advances and challenges. Pediatric Pulmonology, 57(Suppl 1), S5–S12.
https://doi.org/10.1002/ppul.25733
Moore, L. D., Le, T., & Fan, G. (2013). DNA Methylation and Its Basic Function.
Neuropsychopharmacology, 38(1), Article 1. https://doi.org/10.1038/npp.2012.112
Moore, P., & Tarran, R. (2018). The epithelial sodium channel (ENaC) as a therapeutic target for
cystic fibrosis lung disease. Expert Opinion on Therapeutic Targets, 22.
https://doi.org/10.1080/14728222.2018.1501361
Moss, R. B., Hsu, Y.-P., & Olds, L. (2000). Cytokine dysregulation in activated cystic fibrosis
(CF) peripheral lymphocytes. Clinical and Experimental Immunology, 120(3), 518–525.
https://doi.org/10.1046/j.1365-2249.2000.01232.x
Orenstein, D. M. (2004). Cystic Fibrosis: A Guide for Patient and Family. Lippincott Williams
& Wilkins.
Paul, M., Bisht, B., Darmawan, D., Chiou, R., Ha, V., Wallace, W., Chon, A., Hegab, A.,
Grogan, T., Elashoff, D., Alva-Ornelas, J., & Gomperts, B. (2014). Dynamic changes in
intracellular ROS levels regulate airway basal stem cell homeostasis through Nrf2-
dependent Notch signaling. Cell Stem Cell, 15(2), 199–214.
https://doi.org/10.1016/j.stem.2014.05.009
Pollard, B. S., & Pollard, H. B. (2018). Induced pluripotent stem cells for treating cystic fibrosis:
State of the science. Pediatric Pulmonology, 53(S3), S12–S29.
https://doi.org/10.1002/ppul.24118
Pontén, F., Jirström, K., & Uhlen, M. (2008). The Human Protein Atlas—A tool for pathology.
The Journal of Pathology, 216(4), 387–393. https://doi.org/10.1002/path.2440
Robert, R., Savineau, J.-P., Norez, C., Becq, F., & Guibert, C. (2007). Expression and function
of cystic fibrosis transmembrane conductance regulator in rat intrapulmonary arteries. The
European Respiratory Journal, 30(5), 857–864.
https://doi.org/10.1183/09031936.00060007
Schwiebert, E. M., Morales, M. M., Devidas, S., Egan, M. E., & Guggino, W. B. (1998).
Chloride channel and chloride conductance regulator domains of CFTR, the cystic fibrosis
transmembrane conductance regulator. Proceedings of the National Academy of Sciences,
95(5), 2674–2679. https://doi.org/10.1073/pnas.95.5.2674
Shapiro, E., Biezuner, T., & Linnarsson, S. (2013). Single-cell sequencing-based technologies
will revolutionize whole-organism science. Nature Reviews Genetics, 14(9), Article 9.
https://doi.org/10.1038/nrg3542
Wen, A. Y., Sakamoto, K. M., & Miller, L. S. (2010). The role of the transcription factor CREB
in immune function. The Journal of Immunology, 185(11), 6413–6419.
https://www.jimmunol.org/content/185/11/6413.short
Tools for Single Cell Genomics. (n.d.). Retrieved October 10, 2022, from
https://satijalab.org/seurat/index.html
Wei, J.-W., Huang, K., Yang, C., & Kang, C.-S. (2017). Non-coding RNAs as regulators in
epigenetics (Review). Oncology Reports, 37(1), 3–9. https://doi.org/10.3892/or.2016.5236
Zhou, B., & Jin, W. (2020). Visualization of Single Cell RNA-Seq Data Using t-SNE in R. In B.
L. Kidder (Ed.), Stem Cell Transcriptional Networks: Methods and Protocols (pp. 159–
167). Springer US. https://doi.org/10.1007/978-1-0716-0301-7_8
Abstract
Cystic fibrosis is a progressive, genetic disease that severely affects the lungs and other organs of the body. In the lungs, the major progenitor cells are basal cells, which give rise to other differentiated functional epithelial cells in the airways. Epigenetic changes can affect how basal cells proliferate and differentiate in the CF lung. To study cell-to-cell variation between CF and non-CF samples, single cell sequencing was applied. From single cell sequencing, principal component analysis identified clusters in which CF and non-CF cells clustered together. Through these clusters, it was shown that CF samples correlate with a more differentiated phenotype (ciliated FOXJ1 and club SCGB1A1 markers), while non-CF samples had more basal stem cell phenotypes (KRT5 and KRT15). Additionally, fluorescence activated cell sorting data showed a higher percentage of basal cells in non-CF samples compared to CF samples. Furthermore, an Ingenuity Pathway Analysis (QIAGEN) showed that P5 cells had differentially expressed signaling pathways such as FAK and CREB, and CF cells showed an upregulation in pathways relating to inflammation. Overall, single cell sequencing studies of CF and non-CF patient samples showed a difference in cell populations between the two groups, with CF samples correlating with differentiating cells. The use of single cell ATAC increased our ability to discriminate between basal cell subtypes in culture, further highlighting the need for understanding the epigenomic context of basal cell differentiation.
Asset Metadata
Creator
Singh, Aakriti
Core Title
Determining the epigenetic contribution of basal cell identity in cystic fibrosis
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Biochemistry and Molecular Medicine
Degree Conferral Date
2022-12
Publication Date
11/05/2024
Defense Date
11/05/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
bioinformatics,CF,cystic fibrosis,Lung,OAI-PMH Harvest,single cell sequencing
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Marconett, Crystal (committee chair), Offringa, Ite (committee member), Rhie, Suhn (committee member)
Creator Email
aakriti@usc.edu,singhaakriti00@gmail.com
Unique identifier
UC112305795
Identifier
etd-SinghAakri-11306.pdf (filename)
Legacy Identifier
etd-SinghAakri-11306
Document Type
Dissertation
Rights
Singh, Aakriti
Internet Media Type
application/pdf
Type
texts
Source
20221107-usctheses-batch-990 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu