Deciphering Heterogeneity of Preleukemic Clonal Expansion
by
Charles Bramlett
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
in Partial Fulfilment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(DEVELOPMENT, STEM CELLS, and REGENERATIVE MEDICINE)
December 2022
Epigraph
“Cancer, we now know, is a disease caused by the uncontrolled growth of a
single cell. This growth is unleashed by mutations—changes in DNA that specifically
affect genes that incite unlimited cell growth. In a normal cell, powerful genetic circuits
regulate cell division and cell death. In a cancer cell, these circuits have been broken,
unleashing a cell that cannot stop growing...
...Malignant growth and normal growth are so genetically intertwined that
unbraiding the two might be one of the most significant scientific challenges faced by
our species. Cancer is built into our genomes: the genes that unmoor normal cell
division are not foreign to our bodies, but rather mutated, distorted versions of the very
genes that perform vital cellular functions.”
—Dr. Siddhartha Mukherjee,
The Emperor of All Maladies: A Biography of Cancer
Dedication
To all those who participated in my scientific and personal growth. In particular, those
who pushed me to strive for more than I knew I was capable of.
Acknowledgments
I am grateful to my mentor, Dr. Rong Lu, for her guidance and expertise. She has
enabled me to grow as a scientist and individual over the past six years. She always
provided sage advice regarding experimental design, grant proposals, and manuscript
revisions. Her enthusiasm and strong work ethic have been ever present whilst
balancing her own life and priorities. I am thankful for the members of my dissertation
committee, Dr. Min Yu, Dr. Unmesh Jadhav, and the late Dr. Neil Segil, who have given
me valuable advice and feedback throughout my doctoral years. I would like to thank
Dr. Jan Nolta and Johnathon Anderson for beginning my scientific journey at UC Davis.
They gave me the opportunity to dive into research at a young age and set the foundation
for all my future successes. I thank all the members of the Lu lab, past and present, for
assisting in research and providing friendship during such a formative time. I thank Ania
Nogalska for training me and providing endless support throughout my master's and
doctoral work. I thank Du Jiang for being a mentor and friend in the early days,
facilitating my late-night discussions to cover experimental details, and co-writing
the Nature Protocols manuscript. I thank the NIH and NHLBI for choosing to fund my
predoctoral training; I am grateful the U.S. government prioritizes investing in
scientific training. Lastly, I thank my friends and family for their support outside of the
laboratory. I thank my mother, Gailmarie Bramlett, for her unwavering conviction in my
success. I thank my father, John Bramlett, for continued encouragement. I am forever
thankful for both of your endless support during my journey through higher education
and beyond.
Table of Contents
Epigraph
Dedication
Acknowledgments
List of tables
List of figures
Abbreviations
Abstract
Chapter One: Introduction
Chapter Two: Improvement of clonal tracking via lentiviral genetic barcoding
Chapter Three: RNA splicing underlies heterogeneous preleukemic clonal expansion
Chapter Four: GTPase Smap1 accelerates preleukemic clonal expansion in mice
Chapter Five: Discussion
References
Appendix
List of tables
Table 2.1 Barcode Oligos for each Library ID
Table 2.2 Primer list
Table 2.3 Troubleshooting table
Table 3.1 Primers & Oligos
Table 4.1 Primers
List of figures
Figure 1.1 Simplified schematic of hematopoietic hierarchy and MPP biases
Figure 2.1 Experimental Workflow
Figure 2.2 Comparing barcode extraction replicates
Figure 2.3 qPCR Amplification of Barcode
Figure 2.4 Optimizing the edit distance thresholds
Figure 2.5 Python pipeline outputs
Figure 3.1 Heterogeneous cellular expansion induced by Tet2 knockout
Figure 3.2 Preleukemic clonal expansion in Tet2 KO mice is driven by rare, overexpanded clones
Figure 3.3 Overexpanded myeloid-biased Tet2 KO clones drive MPP3 and granulocyte expansion
Figure 3.4 Overexpanded myeloid biased Tet2 KO clones exhibit significantly reduced expression of genes associated with RNA splicing and acute myeloid leukemia
Figure 3.5 Repression of the RNA splicing factor Rbm25 accelerates expansion of Tet2 KO HSPCs
Figure 3.S1 Tet2 knockout (KO) validation
Figure 3.S2 Flow cytometry gating for peripheral blood cells
Figure 3.S3 Flow cytometry gating for bone marrow hematopoietic stem and progenitor cells
Figure 3.S4 Number of clones detected in each mouse
Figure 3.S5 Clonal abundance of top 5 and top 10 most abundant clones in each mouse
Figure 3.S6 Hierarchical clustering of all clones from control and Tet2 KO mice
Figure 3.S7 UMAP visualization of data collected from all HSCs and MPP3s
Figure 3.S8 Number of unique and shared biological processes and disease gene ontology (GO) terms
Figure 3.S9 Engineering leukemic cell lines to knockout Tet2 and knockdown Rbm25 expression
Figure 4.1 Engineering leukemic cell lines to knockout Tet2 and knockdown Smap1 expression
Figure 4.2 Repression of the Smap1 accelerates proliferation of Tet2 KO hematopoietic cell lines in vitro
Figure 4.3 Repression of the Smap1 accelerates expansion of Tet2 KO HSPCs in vivo
Abbreviations
AML Acute myeloid leukemia
cDNA complementary DNA
CH Clonal hematopoiesis
CLP Common lymphoid progenitor
CMP Common myeloid progenitor
CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
DAPI 4′,6-diamidino-2-phenylindole
DMEM Dulbecco's Modified Eagle Medium
DNA Deoxyribonucleic acid
FBS Fetal bovine serum
gDNA genomic DNA
GFP Green fluorescent protein
GMP Granulocyte monocyte progenitor
GO Gene ontology
gRNA guide RNA
HSC Hematopoietic stem cell
HSPC Hematopoietic stem and progenitor cell
IP Intraperitoneal
KO Knockout
MDS Myelodysplastic syndrome
MEP Megakaryocyte erythrocyte progenitor
MNC Mononuclear cell
MPN Myeloproliferative neoplasm
MPP Multipotent progenitor
mRNA Messenger RNA
NGS Next-generation sequencing
PBS Phosphate buffered saline
PCR Polymerase chain reaction
qPCR Quantitative PCR
RFP Red fluorescent protein
RMP RNA metabolic processing
RNA Ribonucleic acid
RT-PCR Real time-PCR
SCF Stem cell factor
sgRNA Single guide RNA
TPO Thrombopoietin
WBC White blood cell
WT Wildtype
Abstract
Clonal expansion is a critical step in the early stage of leukemia genesis. While genetic
mutations that induce expansion and increase disease risk (such as mutations in TET2,
DNMT3A, and ASXL1) have been identified, these mutations are also frequently found in
healthy individuals, many of whom never progress to malignancy. This thesis aims to further
understand the basis of preleukemic heterogeneity, especially amongst cells that share
the same genetic mutation. Here, we tracked preleukemic clonal expansion using
genetic barcoding and found that preleukemic hematopoietic stem cells (HSCs) are highly
heterogeneous at both the clonal and transcriptomic levels. We identified HSC clones
that underwent extreme expansion and characterized their associated gene expression.
These overexpanded HSCs expressed significantly lower levels of leukemia-associated
genes, predominantly genes involved in RNA splicing, compared to non-overexpanded
Tet2 knockout HSCs. These heterogeneous differences could contribute to the variable
disease risks across individuals and could help further risk-stratify patients who
harbor preleukemic mutations.
Chapter One: Introduction
Stem Cell Types and Potencies
A stem cell is defined by its ability to self-renew indefinitely and to differentiate
into diverse lineages. The most naïve of all stem cells in mammals is the zygote. The
zygote is totipotent: a single cell capable of producing every cell in an entire organism
as well as extra-embryonic cell types. In mammals it has been appreciated that the first
few divisions (resulting in eight total cells) produce cells that are totipotent. To
date, these cells remain the only recognized cells capable of generating a complete
organism (Baker & Pera, 2018; Zakrzewski et al., 2019).
Following totipotent stem cells in potency are pluripotent stem cells. Pluripotent stem
cells can produce all germ layers but lack the ability to produce extra-embryonic cell
types and fail to produce entire organisms. Pluripotent stem cells have been the focus
of many regenerative medicine efforts as they have the potential to replenish all adult
tissues. In recent years, it has been shown that cells can return to a pluripotent state
by reverting their epigenome (Smith et al., 1988). Through a process of cell
reprogramming, epigenetic plasticity enables cellular plasticity, repurposing the specific
function of individual cells (Takahashi & Yamanaka, 2006; Yamanaka, 2020). Examples
of pluripotent stem cells include embryonic stem cells (ESCs) and induced pluripotent
stem cells.
Multipotent stem cells are a more restricted type of stem cell, often responsible for
replenishing a single tissue for life. While they retain the capacity to self-renew
indefinitely, their differentiation potential is limited to a specific organ. These stem cells
are often referred to as ‘adult’ stem cells as they are responsible for life-long
rejuvenation through adulthood following embryogenesis (Weissman & Shizuru, 2008;
Zakrzewski et al., 2019). Examples of multipotent stem cells include neural stem cells,
mesenchymal stem cells, and hematopoietic stem cells (HSCs).
Oligopotent cells are lineage-restricted progenitors: they do not have the ability
to recapitulate an entire organ and often do not sustain life-long production. Examples
of oligopotent cells are neural progenitors, osteoprogenitors, myeloid progenitors, and
lymphoid progenitors. Lastly, unipotent cells are those which can only produce one cell
type, their own. These are typically terminally differentiated cells and are neither
stem nor progenitor cells. They are the stereotypical functional cells of their respective
tissues and organs (Zakrzewski et al., 2019). Examples of unipotent cell types include
neurons, osteoblasts, fibroblasts, neutrophils, and lymphocytes.
The variable potential of stem cells has been famously illustrated by Waddington
as a “ball rolling down a hill”. The paradigm of Waddington’s Landscape is that once the
ball has rolled down the gravitational potential, it cannot revert. This notion has been
refuted, most notably by Yamanaka’s anti-dogmatic work discovering induced
pluripotent stem cells. However, there is still much to attribute to this “ball rolling down a
hill” model. Many stem cell systems seemingly follow this paradigm, in which
tissue-specific stem cells sit atop their respective hierarchies and give rise to all downstream
tissue-specific cell types. While the mechanisms and processes that regulate this
are still being uncovered, it is appreciated that potency decreases
throughout differentiation and that this process is largely facilitated through diminishing
epigenetic plasticity. Perhaps no better model exists to display this biological process in
adulthood than HSCs. HSCs are a strong model for studying adult stem cells, as
they are a well-defined cell type that can be purified, self-renew, and give rise to all
hematopoietic lineages in vitro and in vivo. HSCs offer a biological model to understand
the mechanisms that govern how stem cells operate at different stages of biology.
Additionally, by furthering our understanding of HSCs and adult stem cells, we are better
equipped to leverage their regenerative potential for future therapies and cures for
otherwise untreatable diseases.
The Hematopoietic Stem Cell
Hematopoietic stem cells are the most appreciated and utilized adult, multipotent
stem cells to date. HSCs remain one of the few recognized true stem cell populations
capable of repopulating an entire organ, the hematopoietic system, from a single
cell (Weissman, 2000). HSCs were first discovered in the 1950s by Medawar and
Billingham. Their 1953 work, “Actively acquired tolerance of foreign cells”, first
highlighted the feasibility of successful hematopoietic transplant into fetal and neonatal
mice (Billingham et al., 1953). Medawar later won the 1960 Nobel Prize in Physiology or
Medicine for this seminal work. In light of these findings, Till and McCulloch began their
search to reveal the identity of the HSC. They published a series of experiments that
set the foundation for the entire hematopoietic field and still serve as paradigms today,
over 60 years later (Becker et al., 1963; Siminovitch L. et al., 1963; Till & McCulloch,
1961; Wu A. M. et al., 1967). Their findings established that hematopoietic cells existed
in the bone marrow that could give rise to themselves and to other myelo-erythroid
progeny, and that cells in the spleen could give rise to lymphoid progeny. They established
that the cells responsible for this repopulation were indeed of clonal origin by using
chromosomal labeling techniques of the day that allowed them to track single cells and
their divisional progeny. They observed that clones displayed variable potential to produce
myeloid and lymphoid lineages, providing an early indication of heterogeneity within these
stem-like clonal populations. Ultimately, Till and McCulloch proposed that these cells were
HSCs, which have the unique ability to self-renew and to repopulate all blood cell
types needed to recapitulate the hematopoietic system.
These findings spurred bone marrow transplants as a curative therapy by the
1970s and 1980s, but the HSC had still not been identified and isolated. To this point,
only a posteriori observations had been made, and the field needed an a priori
identification to properly isolate and characterize HSCs. The search for purified HSCs
began by separating cells from bone marrow by size and density (Iscove, 1990), then by
cell-surface proteins (Hulett et al., 1969; Kohler & Milstein, 1975). But characterization
assays were limited to in vitro colony-forming assays, which did not easily distinguish
stem cells from their progenitors.
Next came the development of the mouse transplantation assay, which relied on the
C57BL genetic background containing two alleles for the pan-hematopoietic surface
marker CD45 (F.-W. Shen et al., 1986). With this new research tool, donor and host cells
could be easily distinguished using monoclonal antibodies against the different CD45
alleles. This simple but powerful assay could now reveal the true nature of HSCs by
confirming that isolated and transplanted cells self-renewed indefinitely and recapitulated
the entire hematopoietic system in vivo.
Using a combination of in vitro and in vivo assays, Weissman was able to
systematically narrow the search for purified HSCs. Weissman developed antibodies
against known myeloid, erythroid, and lymphoid mature blood cells. These cell types
were defined as the lineage cell types, as they are the derivative cell types of
hematopoietic precursors, and accordingly were identified as not being stem cells (Coffman &
Weissman, 1983; Kondo et al., 2003; Muller-Sieburg et al., 1986). Using the absence of
lineage-defining markers as well as the presence of Sca1 and cKit (Ikuta & Weissman,
1992; Morrison & Weissman, 1994), repopulation efficiency increased massively,
narrowing the search for purified HSCs further. Since then, a collection of surface
markers has been identified, further refining the purity of the HSC population (Busch et
al., 2015; Chen et al., 2016; Kiel et al., 2005; Oguro et al., 2013). The Rong Lu lab at
USC defines mouse HSCs as: lineage−, cKit+, Sca1+, Flk2−, CD34−, CD150+ and human
HSCs as: lineage−, CD34+, CD38−, CD90+, CD45RA−.
The Hematopoietic Hierarchy
By systematically isolating different cell populations defined by cell surface markers,
stem cells, multipotent progenitors (MPPs), and oligopotent progenitors were eventually
elucidated. The conventional hematopoietic hierarchy was born, in which HSCs sit at the
top and give rise to all downstream blood cells. The hematopoietic system comprises two
predominant lineages, the myeloid and the lymphoid. The myeloid lineage is largely
responsible for producing the innate immune system, while the lymphoid lineage is
responsible for the adaptive immune system (Weissman & Shizuru, 2008).
From HSCs arise MPPs, which differentiate into all downstream cell types but do
not self-renew indefinitely. From MPPs arise common myeloid progenitors (CMPs) and
common lymphoid progenitors (CLPs), responsible for recapitulating their respective
hematopoietic lineages (Akashi et al., 2000; Kondo et al., n.d.). CMPs give rise to
granulocyte macrophage progenitors (GMPs) and megakaryocyte-erythroid progenitors
(MEPs). GMPs eventually give rise to granulocytes and monocytes, while MEPs
eventually give rise to megakaryocytes, erythrocytes, and platelets (Akashi et al., 2000).
CLPs give rise to lineage-restricted progenitors such as pro-T, pro-natural killer, and
pro-B cells, which produce T cells, natural killer cells, and B cells,
respectively (Karsunky et al., 2008; Kondo et al., n.d.). In more recent years, the
heterogeneity within the HSPC compartment has been elucidated by determining the
lineage preferences of MPP subtypes (Pietras et al., 2015). MPP1 has been
proposed as a short-term HSC that is more divisionally active, perhaps in preparation
for differentiation. MPP2 is an erythroid-biased progenitor that preferentially produces
MEPs and can even skip the CMP differentiation step altogether. MPP3 is a
myeloid-biased progenitor that preferentially produces CMPs. Lastly, MPP4 is a
lymphoid-biased progenitor that differentiates preferentially into CLPs and the lymphoid
lineage. These differentiation tendencies form the dogmatic representation of
hematopoiesis, which is classically defined by cell surface markers (Figure 1.1). More
recent advances in single cell genomics (both transcriptomics and epigenomics) have
shed light on hematopoiesis as a continuous process rather than the stepwise
diagrams traditionally portrayed, though this work is still ongoing (Nestorowa
et al., 2016; Pellin et al., 2019). These -omics based datasets define cell types by their
comprehensive molecular status rather than by a collection of cell surface markers and may
better define or even identify new cell types.
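The conventional, stepwise hierarchy described above can be written down as a small tree. The sketch below is purely illustrative: the dictionary layout and the function name `terminal_cell_types` are assumptions made for this example, and the MPP1 to MPP4 subtypes are collapsed into a single MPP node for brevity.

```python
# Parent -> children map for the classical (stepwise) hematopoietic hierarchy
# as described in the text. MPP1-4 subtypes are collapsed into one "MPP" node.
HIERARCHY = {
    "HSC": ["MPP"],
    "MPP": ["CMP", "CLP"],
    "CMP": ["GMP", "MEP"],
    "GMP": ["granulocyte", "monocyte"],
    "MEP": ["megakaryocyte", "erythrocyte", "platelet"],
    "CLP": ["pro-T", "pro-NK", "pro-B"],
    "pro-T": ["T cell"],
    "pro-NK": ["natural killer cell"],
    "pro-B": ["B cell"],
}


def terminal_cell_types(cell, hierarchy=HIERARCHY):
    """Return the sorted mature cell types reachable from `cell`."""
    children = hierarchy.get(cell)
    if not children:  # no children listed: treat as a mature cell type
        return [cell]
    found = set()
    for child in children:
        found.update(terminal_cell_types(child, hierarchy))
    return sorted(found)


if __name__ == "__main__":
    print(terminal_cell_types("CMP"))
    print(terminal_cell_types("CLP"))
```

A tree like this captures only the surface-marker view of hematopoiesis; the continuous picture suggested by single cell genomics would need a different representation.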
Figure 1.1 Simplified schematic of hematopoietic hierarchy and MPP biases.
HSC Heterogeneity
Despite HSPCs sharing cell surface markers, it has been well appreciated that
a high level of heterogeneity exists amongst individual stem and progenitor
cells. This was first noticed in the number and types of divisional progeny produced
by the clonal colonies observed in the search for the HSC (Till et al., 1964).
There has been a concerted, ongoing effort to understand the basis for HSC
heterogeneity in almost all aspects of hematopoietic biology, including lineage bias,
proliferation potential, aging, engraftment, and homing potential. The identification of
purified HSCs enabled the study of heterogeneity in vivo by isolating single cells and
transplanting them into conditioned recipients. Through transplantation, it has been
shown that individual HSCs fall into subsets with different lineage preferences, such as
lymphoid-biased (or myeloid-deficient), myeloid-biased (or lymphoid-deficient), and
lineage-balanced (Beerman, Bhattacharya, et al., 2010; Dykstra et al., 2007). Even
more exhaustive efforts to study repopulation have identified at least 16 different HSC
kinetic patterns following transplantation (Sieburg et al., 2006). Environmental changes
in cell signaling related to inflammation have been shown to alter mTOR (a key
regulator of cellular metabolism) and affect HSPC differentiation (Ergen et al., 2012).
Moreover, differences in mitochondrial metabolism have been shown to influence HSC
self-renewal and differentiation. Knockout mice lacking the mitochondrial gene
PTPMT1 exhibited 40-fold higher levels of self-renewal with an inability to differentiate
(Yu et al., 2013).
HSCs also exhibit high levels of heterogeneity during aging and disease genesis.
The variability of aging onset is well recognized, but the mechanisms that dictate cellular
and molecular heterogeneity are not well understood, even in models where genetic
background and environment are controlled (Mitnitski et al., 2017; Yashin et al., 2016).
In the hematopoietic system, aging manifests as an imbalance of different cell types,
particularly lymphoid deficiency and myeloid skewing. Additionally, phenotypic HSCs
undergo high levels of expansion during aging despite their reduced capacity to produce
a balanced blood supply (Beerman, Maloney, et al., 2010; Pang et al., 2017). This
phenotype is associated with a limited ability to fight infections and with increased
inflammation, which affect an organism’s overall health. Despite the acknowledgment of
hematopoietic heterogeneity, the mechanistic basis for variable HSC behaviors during
homeostasis, aging, and disease largely remains a mystery.
Clonal hematopoiesis & leukemia-genesis
Clonal expansion occurs when a single cell and its divisional progeny excessively
increase in number. Clonal expansion is common in development, regeneration, aging,
and cancer genesis (Doan et al., 2013; Jaiswal & Ebert, 2019; Jan et al., 2012; Park et
al., 2021; Zhu et al., 2019). It commonly occurs in the hematopoietic
system, where one or a few HSC clones disproportionately populate the blood system.
This process, termed clonal hematopoiesis (CH), is thought to be largely driven by
somatic mutations that increase the fitness of individual HSCs, although it also occurs
in the absence of candidate driver mutations (Jaiswal & Ebert, 2019; Zink et al.,
2017).
The frequency of somatic mutations increases with age in all cells and tissues. When
mutations occur in stem cells they produce life-long changes, as stem cells are long-lived
cells. If the mutations are advantageous, they confer a fitness advantage over
non-mutated counterparts. Clonal skewing was first recognized through non-random
inactivation of the X chromosomes in women. In a process that is thought to be
otherwise random, a disproportionate number of blood cells had the
same X chromosome inactivated, indicating they may be derived from the same
clone (Champion et al., 1997). In the ensuing decades, DNA sequencing efforts
identified a collection of somatic mutations in the tumors of patients with leukemias and
found that a small set of mutations was overrepresented, indicating a clonal
origin and advantage for such cells (Welch et al., 2012). These mutations were largely
found to occur in epigenetic and chromatin modifiers such as TET2, DNMT3A,
ASXL1, and IDH2 (Papaemmanuil et al., 2016). Experiments mutating these
genes in mice revealed their role in causing chromosomal abnormalities, stem
cell dysfunction, differentiation defects, and eventual malignancy, further validating
their role in the origins of leukemia (Abdel-Wahab et al., 2013; Challen et al., 2012;
Moran-Crusio et al., 2011).
In 2014, researchers analyzed controls from genome-wide association studies
and found mutations in genes such as TET2, DNMT3A, and ASXL1, the
same genes that had been identified as drivers of leukemia. These individuals had
increased variant allele frequencies for mutations in these genes, indicating clonal
expansions, but were otherwise considered healthy. The same individuals
displayed a 10-fold increased risk of developing a hematologic malignancy, such that
harboring an expanded clone with a high-risk mutation raised the background
malignancy rate from 0.5% to 5% (Jaiswal et al., 2014). This highlights the highly
heterogeneous nature of disease genesis and raises the question “what
changes in addition to CH need to occur to drive malignancy?”. While there have been
efforts to understand the end-stage malignancy resulting from mutations in high-risk
genes, little is known about the cellular and molecular mechanisms that occur in the early
stages of leukemia genesis and CH and that contribute to clonal outgrowths.
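The arithmetic behind these figures is simple enough to sketch. The example below is illustrative only: the read counts are made up, the function names are hypothetical, and the conversion from allele frequency to clone size assumes a heterozygous mutation on a diploid autosome, which is an assumption not stated in the text above.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of sequencing reads at a locus that carry the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def clone_fraction_from_vaf(vaf):
    """Estimate the fraction of cells belonging to the mutant clone.

    Assumes a heterozygous mutation on a diploid autosome, so each mutant
    cell contributes one variant allele out of two (fraction = 2 * VAF).
    """
    return min(1.0, 2.0 * vaf)


def relative_risk(rate_with_clone, background_rate):
    """Ratio of malignancy rates with and without an expanded clone."""
    return rate_with_clone / background_rate


if __name__ == "__main__":
    vaf = variant_allele_frequency(alt_reads=40, total_reads=400)  # made-up counts
    print(f"VAF = {vaf:.2f}, estimated clone fraction = {clone_fraction_from_vaf(vaf):.2f}")
    # Rates quoted in the text: 0.5% background versus 5% with an expanded clone.
    print(f"relative risk = {relative_risk(0.05, 0.005):.1f}")
```

Under these assumptions, a rise from 0.5% to 5% corresponds to the 10-fold relative risk quoted above, while the absolute risk for any one carrier remains low.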
Quantifying clones and single cell transcriptomes
It is intrinsically difficult to study single cells. They appear the same visually and
are not easily distinguished from one another. To tackle this hurdle, previous
researchers have performed experiments with single cells in vitro and in vivo (Dykstra et
al., 2007; Till et al., 1964). However, this process is very low throughput, requiring
substantial labor and resources to gain insight from relatively few samples. It is
therefore important to study many HSCs simultaneously. Till and McCulloch first
tackled this problem by exposing cells to different irradiation doses, which would
individually mark cells of the same origin (Becker et al., 1963). In this way, they could study
differences in clonogenicity, but the approach suffered from low resolution and was not
performed in vivo. Next, researchers experimented with retroviral insertion of transgenes to mark
unique clones via variable genomic insertion sites (Dick et al., 1985; Lemischka et al.,
1986). While this enabled in vivo tracking of clones, the technique still lacked the
resolution to study large numbers of HSCs simultaneously.
Improving on retroviral insertion techniques, the use of synthetic DNA segments
packaged into lentiviral particles ensured high recovery via PCR while
drastically increasing clonal diversity. While barcodes were originally designed to be read
out via microarray, the fast evolution of next-generation sequencing (NGS) quickly outdated the
former method (Cheung et al., 2013; Gerrits et al., 2010; Lu et al., 2011; Naik et al.,
2013; L. V. Nguyen, Cox, et al., 2014; Schepers et al., 2008). NGS provides high
sensitivity and unbiased identification of clonal barcodes, which are easily recovered from
gDNA with a single PCR. In recent years, advances have been made to incorporate
barcodes without ex vivo manipulation. Transposons, polylox, and CRISPR/Cas9
technologies have been incorporated into the germlines of mice to enable barcode
generation and tracking in vivo in naïve mice (Bowling et al., 2020; Pei et al., 2017;
Rodriguez-Fraticelli et al., 2018; Sun et al., 2014). In vivo barcoding technologies are
still developing, but their growing popularity highlights the research need to study clonal
dynamics in a perturbed state to further get at the cellular mechanisms that underlie
HSC heterogeneity.
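As a rough illustration of how barcode sequencing reads become clonal abundances, the sketch below tallies barcode reads, folds likely sequencing errors into a more abundant neighboring barcode, and normalizes the counts. It is not the pipeline described in Chapter Two: the function names, the toy reads, and the threshold of one mismatch are assumptions chosen for illustration, and Hamming distance is used here as a simple stand-in for a general edit distance.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_barcodes(reads, max_distance=1):
    """Merge rare barcodes into a more abundant barcode within max_distance.

    Barcodes are visited from most to least abundant; each is either merged
    into the first already-accepted barcode that is close enough, or accepted
    as a new clone.
    """
    raw = Counter(reads)
    accepted = {}
    # Sort by count (descending), then sequence, so results are deterministic.
    for barcode, count in sorted(raw.items(), key=lambda kv: (-kv[1], kv[0])):
        for parent in accepted:
            if len(parent) == len(barcode) and hamming(parent, barcode) <= max_distance:
                accepted[parent] += count
                break
        else:
            accepted[barcode] = count
    return accepted


def clonal_abundance(counts):
    """Convert read counts per barcode into fractions of all barcode reads."""
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}


if __name__ == "__main__":
    toy_reads = ["ACGTAC"] * 6 + ["ACGTAA"] + ["TTGCAT"] * 3  # made-up reads
    counts = collapse_barcodes(toy_reads)
    print(counts)
    print(clonal_abundance(counts))
```

The choice of distance threshold matters in practice: too strict and sequencing errors inflate the clone count, too loose and distinct clones are merged.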
In addition to studying clones, the study of single cells has become the focus of
much high-impact research. The development of single cell RNA sequencing has
opened insights into the mechanistic basis of cellular heterogeneity. The first attempt to
profile single cell transcriptomes utilized cDNA synthesis via reverse transcription with
oligo(dT) primers that bind the poly(A) tails of mRNAs (Brady et al., 1995). In the ensuing
years, this approach was combined with microarrays and high-throughput sequencing
(Kurimoto et al., 2006; Tang et al., 2009). Through the improvement of these
approaches, the entire transcriptome was profiled from a single cell. These methods
captured many previously unknown transcripts and splicing variants, which vastly expanded
the understanding of molecular biology and gene expression. Further advancing this
technology, unique molecular identifiers (UMIs) composed of random DNA sequences were
utilized during cDNA synthesis to eliminate biases introduced during multiple rounds of
PCR (Kivioja et al., 2012). Each UMI acts as a molecular barcode and tally system, so
that a transcript is counted only once regardless of how many times
it appears in the deep-sequencing result.
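The counting rule for molecular identifiers can be sketched in a few lines. This is a minimal illustration, not any published tool: the function name `count_transcripts`, the input format of (gene, identifier) pairs, and the toy reads are all assumptions, and real pipelines typically also correct sequencing errors within the identifiers.

```python
from collections import defaultdict


def count_transcripts(reads):
    """Count unique molecular identifiers per gene.

    `reads` is an iterable of (gene, umi) pairs, one per sequenced read.
    Reads sharing a gene and identifier are treated as PCR copies of one
    original molecule and are counted once.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


if __name__ == "__main__":
    toy_reads = [  # made-up reads
        ("Tet2", "AACG"), ("Tet2", "AACG"), ("Tet2", "AACG"),  # one molecule, amplified
        ("Tet2", "GGTA"),
        ("Rbm25", "CCAT"), ("Rbm25", "CCAT"),
    ]
    print(count_transcripts(toy_reads))
```

Six reads collapse to three molecules here, which is the bias correction the identifiers are meant to provide.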
Despite these improvements, single cell profiling was still low throughput, and
individual cells needed to be separated and processed one by one. Multiple methods
were designed to reduce the cost and labor associated with single cell transcriptome
profiling. CEL-seq and MARS-seq introduced unique cellular barcodes during reverse
transcription once cells were isolated, marking cells individually so that they could be
pooled after cDNA synthesis (Jaitin et al., 2014). Next, methods increased profiling
throughput by using flow cytometry, microfluidics, and droplet-based single cell
isolation (Klein et al., 2015; Macosko et al., 2015; Shalek et al., 2014). Droplet-based
technologies encapsulate cells in droplets by combining aqueous and oil solutions in a
microfluidic device. Each droplet contains a single cell, a microbead with mRNA capture
probes, reverse transcriptase enzyme, cDNA synthesis primers, and the buffers needed for
cellular lysis. Each microbead carries a unique cellular barcode that is incorporated
into each mRNA captured, ensuring single cell resolution for each cDNA molecule
within each droplet. Commercialized droplet-based sequencing products have greatly
impacted biomedical research by enabling quick and easy high-throughput profiling
of single cells. By enabling new insights into cellular heterogeneity, single cell biology is
now at the forefront of almost all stem cell and cancer biology.
Chapter Two: Improvement of clonal tracking via lentiviral genetic barcoding
Preface:
Chapter two, in full, is a manuscript published in Nature Protocols. The work in this chapter
improves on existing clonal tracking technology via lentiviral barcoding and provides a
detailed explanation of how to implement such a methodology in one’s own work. The
article provides a comprehensive 127-step protocol outlining barcode conception,
generation, and analysis.
Clonal tracking using embedded viral barcoding and high-throughput sequencing
Charles Bramlett¹,², Du Jiang¹,², Anna Nogalska¹, Jiya Eerdeng¹, Jorge Contreras¹, Rong Lu¹
¹ Eli and Edythe Broad CIRM Center for Regenerative Medicine and Stem Cell
Research, Keck School of Medicine, University of Southern California, Los Angeles, CA,
USA.
² These authors contributed equally to this work. Correspondence should be
addressed to R.L. (ronglu@usc.edu).
KEYWORDS: clonal tracking, genetic barcoding, high-throughput sequencing, cellular
heterogeneity, synthetic DNA barcodes, viral barcoding, lineage tracing
EDITORIAL SUMMARY This protocol describes the generation and delivery of
embedded viral barcode libraries to track single clones. Barcodes are amplified
from genomic DNA and quantified by high-throughput sequencing and
bioinformatics analysis.
ABSTRACT
Embedded viral barcoding in combination with high throughput sequencing is a powerful
technology to track single cell clones. It can provide clonal level insights into cellular
proliferation, development, differentiation, migration, and treatment efficacy. Here, we
present a detailed protocol of a viral barcoding procedure including the creation of
barcode libraries, the viral delivery of barcodes, the recovery of barcodes, and the
computational analysis of barcode sequencing data. The entire procedure can be
completed within a few weeks. This barcoding method requires cells to be susceptible
to viral transduction. It provides high sensitivity and throughput, and allows precise
quantification of cellular progeny. It is cost efficient and does not require any advanced
skills. It can also be easily adapted to many types of applications, including both in vitro
and in vivo experiments.
INTRODUCTION
A cell is a basic unit of biological systems. It can divide to produce progeny cells,
forming a cell clone. Tracking cell clones over time and through space can provide
critical insights into cellular behavior. As genetic material is conserved during cell
division, a cell can be marked and tracked when unique genetic information is inserted
into its genomic DNA, a procedure called genetic barcoding. Since genetic barcodes are
inherited by all progeny cells, the abundance of each barcode in a cellular population is
proportional to the number of cells derived from the original barcoded cell. In
conjunction with high throughput sequencing, genetic barcoding is a powerful technique
that allows for tracking clonal behaviors in a high-throughput manner(Lu et al., 2011).
The original approach for genetic barcoding used retroviral insertion sites to mark
individual cell clones and Southern blots to analyze the results(Dick et al., 1985; Keller
et al., 1985; Lemischka et al., 1986). Later, synthetic random DNA barcodes were used
in conjunction with micro-arrays(Schepers et al., 2008). Recently, we and others
developed viral genetic barcodes that mark cells using synthetic DNA segments
embedded within a viral construct that can be easily quantified by high throughput
sequencing(Cheung et al., 2013; Gerrits et al., 2010; Lyne et al., 2018; Naik et al., 2013;
L. V. Nguyen, Makarem, et al., 2014) (Figure 2.1). The embedded viral barcoding
technology provides high sensitivity and throughput, and allows precise quantification of
cellular progeny(Brewer et al., 2016; Lu et al., 2019; L. Nguyen et al., 2018; Wu et al.,
2014). The high-throughput nature of the improved technique reduces the impact of
experimental noise associated with single-cell measurements by greatly increasing the
number of measurements. The high sensitivity of barcode recovery provided by a single
step of PCR allows for the identification of small changes in barcode abundances. In
addition, embedded viral barcoding generates data with single cell resolution through
the use of randomized barcodes and does not involve the handling of single cells at any
point. For simplicity, the term “barcoding” will refer to embedded viral barcoding
throughout unless otherwise stated.
The barcoding method has been utilized and improved by several groups(Lyne et al.,
2018; Bystrykh et al., 2014; Bystrykh & Belderbos, 2016; Naik et al., 2014; Thielecke et
al., 2017). However, there are no standards in the field to generate and analyze
barcode data(Lyne et al., 2018). Here, we provide a detailed and easy-to-replicate
protocol to generate and implement genetic barcodes for cellular tracking studies. Since
its first publication(Lu et al., 2011), our protocol has been significantly optimized to
improve sensitivity and detection limits(Brewer et al., 2016; Lu et al., 2019; L. Nguyen et
al., 2018; Wu et al., 2014). These improvements primarily involve upgraded data
analysis algorithms and experimental procedures for barcode recovery. Here, we outline
the protocol in a general way so that it can be adapted to many types of applications,
including both in vitro and in vivo experiments. Our protocol allows new users to easily
set up barcoding at a low cost by creating their own barcode libraries and performing
computational analysis in their own labs.
Applications of the method
Barcoding can be applied to any cells that are susceptible to lentivirus
infection(Kebschull & Zador, 2018; Naik et al., 2014; Thielecke et al., 2017; Woodworth
et al., 2017) and generates clonal behavior information that is important for many fields
of research. For example, it can identify the cell of origin during development and track
the differentiation patterns of stem cells. Using this approach, we have identified a
distinct lineage origin for natural killer cells in a rhesus macaque transplantation
model(Wu et al., 2014). The high throughput nature of this technology allows for
comparing many individual cells simultaneously, and provides a direct assay of cellular
heterogeneity. For example, we used barcoding to show how hematopoietic stem cells
heterogeneously differentiate after transplantation in mice(Brewer et al., 2016; Lu et al.,
2019; L. Nguyen et al., 2018).
Barcoding can also be used to study diseases, particularly those that originate from rare
cells such as cancer(L. V. Nguyen, Cox, et al., 2014; L. V. Nguyen et al., 2015;
Woodworth et al., 2017). For example, barcoding can help reveal the cellular origins of
cancer genesis, relapse and metastasis. It can also reveal the heterogeneous
responses of cancer cells to treatment. These studies require ex vivo barcoding of
candidate cells, typically samples from patients or animal models. Tracking can then be
performed in vitro or in animal models.
Barcoding has also been applied to facilitate gene and drug screens. For example,
barcoding has been used in CRISPR screens, where the gRNAs serve as genetic
barcodes(Shalem et al., 2014; Wang et al., 2014). While these studies typically do not
require single cell resolution, genetic barcodes still provide high throughput and
tremendous cost savings. Moreover, barcoding can provide further insights regarding
cellular heterogeneity for these screens.
Comparisons with other methods
The conventional strategy for studying clonal behaviors is simply to track a single cell at
a time, e.g. single cell transplantation and single cell culture(Dykstra et al., 2007; Osawa
et al., 1996; Sieburg et al., 2006). While these approaches do not require viral
transduction, they are labor intensive and cost prohibitive for most applications. To
increase throughput and reduce cost, fluorescent proteins, either singly or in
combination, have been used to mark clonal identities(Cornils et al., 2014; Livet et al.,
2007; Rios et al., 2014; Weber et al., 2012). However, the number of fluorescent colors
is small, limiting the number of cells that can be confidently tracked at the clonal level. In
contrast, the synthetic DNA segments that we use have virtually unlimited variations,
allowing thousands of clones to be tracked with a quantifiably high degree of accuracy
in the clonal level labeling. It is also cost effective as our viral library designs allow
multiple samples to be sequenced together.
Other techniques have tried to overcome the limited number of fluorescent colors using
viral insertion sites paired with Linear Amplification-Mediated PCR (LAM-PCR)(Harkey
et al., 2007; Schmidt et al., 2007; Wu et al., 2011, 2013) and quantitative shearing linear
amplification PCR(S. Zhou et al., 2014). These methods are still in use for human
studies, particularly those that involve gene therapy. However, the difficulty of recovering
genomic insertion sites precludes obtaining the quantitative data needed for most clonal
tracking questions. In contrast, the synthetic DNA segments used in barcoding are easy
to recover and produce highly quantitative results. The design of our genetic barcodes
allows for their recovery using a single step of PCR, during which primers that are
needed for downstream sequencing are simultaneously incorporated. This simple and
elegant approach greatly reduces experimental noise during barcode recovery and
produces quantitative and reproducible data. In our day-to-day experiments, replicate
samples are highly consistent (Figure 2.2), and barcode quantification is directly
proportional to donor cell doses(Brewer et al., 2016), demonstrating the high fidelity of
our quantitative measurements when applied to in vivo experiments.
The barcoding procedure for in vivo studies requires cell transduction and
transplantation. Several approaches have attempted to eliminate the transduction and
transplantation steps by engineering cells with transposon, Polylox
or CRISPR/Cas9 technologies. Transposon-based methods temporarily express a
transposase to activate the mobilization of a DNA segment, called a transposon, which
is randomly inserted into the genome to label individual cells(Sun et al., 2014). Similar
to the viral insertion site detection technique, this approach suffers from poor
quantification because of technical difficulties in recovering the genomic insertion sites.
The Polylox-based system uses a series of unique loxP sites embedded in the genome
that are excised randomly upon exposure to Cre recombinase(Pei et al., 2017). This
approach has been commonly used to generate fluorescent protein combinations(Livet
et al., 2007). The CRISPR/Cas9-based system edits genomic DNA, or synthetic DNA
segments embedded in the genome, with the help of guide RNA (gRNA)(Frieda et al.,
2016; Kalhor et al., 2018). Both Polylox and CRISPR/Cas9 rely on the assumption that
the DNA recombination is random, which is not entirely true for either system(Frieda et
al., 2016; Kalhor et al., 2018; Rüfer & Sauer, 2002; M. W. Shen et al., 2018).
Meanwhile, the viral barcoding method suffers from random multiple labeling as well.
Taken together, the transposon, Polylox and CRISPR/Cas9 systems enable
endogenous activation of cellular labeling. Compared to viral barcoding, these
approaches do not require cell isolation, culture, transduction or transplantation, thus
enabling the study of native cellular behaviors in addition to ex vivo and transplantation-
mediated studies. Moreover, tissue-specific promoters can be implemented to address
tissue specific questions. However, these approaches can be challenging for cell types
that cannot be defined by a single promoter.
To overcome the transgenic requirement, new retrospective methods of clonal tracking
take advantage of naturally occurring mutations. These methods rely on rare somatic
mutations to reconstruct the lineage relationships between individual cells(Chapal-Ilani
et al., 2013; Lee-Six et al., 2018; Osorio et al., 2018; Wasserstrom et al., 2008). Since
neutral mutations occur during cell division in a seemingly random process, these
methods link cells to one another when they share common mutations. However, these
methods require enough cell divisions to accumulate rare mutations and cannot track
cells that do not carry any mutations. Moreover, they require whole genome sequencing
to identify the rare mutations, which is cost prohibitive for most applications. They are
also computationally intensive and require specialized knowledge of lineage
reconstruction using a population genetics approach. In contrast, barcoding can label
any cells that are accessible to viral infection and integration, is easy and inexpensive,
and does not require any advanced computational skills.
Limitations
The barcoding strategy presented here is limited to systems that tolerate cell isolation,
short term culture, and transplantation. In our protocol, cells of interest need to be
isolated from their respective tissues in order to be transduced by the lentiviral vector
carrying genetic barcodes. This can be a problem for cells that cannot be readily
isolated or require maintenance of endogenous tissue architecture. In some cases, in
vivo injection of barcode-carrying virus can be used as an alternative strategy, although
it creates new problems of labeling unwanted cells and uneven transduction. In vivo
delivery of the barcodes is not discussed further here.
Cells may potentially change their properties during culture and barcode transduction.
While many studies have shown that lentiviral integration does not cause any apparent
change in biological functions of the transduced cells(Gonzalez-Murillo et al., 2008;
McKenzie et al., 2006; Naik et al., 2014; L. V. Nguyen, Cox, et al., 2014; L. V. Nguyen
et al., 2015), it is still possible that a particular lentiviral vector may be randomly inserted
into some genomic regions and alter cell behavior. Therefore, experimental replicates
and controls must be rigorously used to exclude the possibility that rare viral insertions
cause the observed phenotype.
Different cell types have different transduction rates. The technique reported here was
optimized for mouse cells but has been used for studying primate cells as well(Wu et
al., 2014, 2018). Transduction efficiencies for primary human cells are generally lower
than for mouse cells in our hands but are sufficient to yield meaningful results. Both
mouse and human cell lines generally exhibit higher transduction efficiencies than
primary cells.
While modern high throughput sequencing has greatly improved barcode recovery,
barcode detection is still limited by experimental loss during barcode extraction and cell
collection. For example, cells collected from a part of a solid tissue may not be
representative of the whole tissue. Additionally, the sizes of some cell populations may
numerically exceed the limit of our barcode extraction protocol, and only a fraction of the
cells can be analyzed. Sequencing depth, as well as the barcode extraction procedure, may
also limit the detection of low-abundance barcodes(Bystrykh et al., 2012).
Furthermore, detection limits may vary between samples with differing cell numbers.
Experimental design
Plasmid Generation
The oligo library template can be obtained from IDT or other vendors. We suggest
HPLC purification for best results. The synthetic DNA oligos that we use to generate
barcode libraries are comprised of several parts: a BamHI restriction enzyme site, a
forward primer binding site, a 6bp library ID, a 27bp random sequence, a reverse primer
binding site, and an EcoRI restriction enzyme site (Figure 2.1a). The 6bp library ID
allows different cell populations to be simultaneously barcoded and combined during
downstream biological treatment, barcode extraction and sequencing. This saves
experimental time and resources. The 27bp random sequence generates a maximum of
4^27 (approximately 1.8 × 10^16) different barcodes in theory. This number is reduced by excluding sequences with
restriction enzyme cutting sites and with features that are difficult for PCR and sequencing,
such as poly-N stretches. Longer or shorter random sequences and random sequences with
interspersed fixed sequences can also be used. The random sequence should be neither so long
that it exceeds the sequencing capacity nor so short that it limits library diversity.
A 6bp sequence is added to both ends to ensure proper restriction enzyme cutting
(Table 2.1). The primer binding sites enable targeted PCR for barcode extraction
(Figure 2.1b). The BamHI and EcoRI sites are designed for cloning the double-stranded
barcode DNA into lentiviral backbones, such as the pCDH plasmid. Other types of
vectors are also applicable as long as they can insert DNA barcodes into the host cell’s
genome. The plasmid may also include fluorescent proteins, such as GFP, to signal the
presence of barcodes and to evaluate the transduction efficiency. The primer design
can be customized as needed. The 6bp library ID and 27bp random sequence can be
readily replaced to accommodate alternative barcode designs. Alternative barcode
designs include interspersed fixed sequences and a library of known barcode
sequences. Implementation of partially or fully pre-designed barcode sequences can
avoid restriction cutting sites and poly-N stretches.
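The design constraints on the random sequence can be sketched in a few lines of Python. This is a minimal illustration, not the filter used to build the libraries: the BamHI (GGATCC) and EcoRI (GAATTC) recognition sites are standard, but the homopolymer cutoff, the interpretation of "poly-N stretches" as runs of one repeated base, and all function names are assumptions made here.

```python
import random
import re

BAMHI_SITE = "GGATCC"   # BamHI recognition sequence
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
RANDOM_LENGTH = 27      # length of the random portion of the barcode
MAX_RUN = 5             # assumed cutoff for homopolymer runs, not from the protocol

# Theoretical number of distinct 27bp sequences over A, C, G, T
THEORETICAL_MAX = 4 ** RANDOM_LENGTH


def is_usable(sequence, max_run=MAX_RUN):
    """Return False if the sequence contains a cloning site or a long homopolymer run."""
    if BAMHI_SITE in sequence or ECORI_SITE in sequence:
        return False
    # Reject any run of a single base that is longer than max_run
    return re.search(r"(.)\1{%d,}" % max_run, sequence) is None


def random_sequence(rng, length=RANDOM_LENGTH):
    """Draw one random sequence, standing in for one synthesized oligo."""
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)
sample = [random_sequence(rng) for _ in range(10_000)]
usable_fraction = sum(is_usable(s) for s in sample) / len(sample)
print(THEORETICAL_MAX, usable_fraction)
```

Under these assumptions only a small share of random sequences is excluded, so the usable sequence space remains far larger than any library that can be cloned in practice.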
Synthesized DNA oligos are made double-stranded using a single primer “Strand2”
(Table 2.2). After cloning, each plasmid library is transformed into competent cells (E.
coli), and all bacterial colonies are amplified to achieve high barcode diversity. Bacterial
cultures are grown overnight in an incubator. Plasmids are isolated from bacterial
culture using the Qiagen Plasmid Maxi kit. Plasmid DNA concentration is then
measured using NanoDrop. Before proceeding to the next step, the plasmid needs to be
sequenced for evaluating barcode diversity, i.e. the number of unique barcodes and
their relative abundances in the library(Lu et al., 2011; Naik et al., 2014). A high library
diversity (high barcode numbers and equal representation of unique barcodes) is
essential to reducing the chance that more than one cell is labeled by the same
barcode. Optimizing the bacterial transformation step is the key to improving library
diversity. The diversity of the library dictates the number of clones that can be tracked in
a single experiment such that each barcode represents a single cell clone with statistical
certainty. An exact calculation for this limit was provided in our previous study(Lu et al.,
2011). A user-friendly calculation tool is provided with this protocol (Supplementary
Software). As a rule of thumb, a library of 40,000-50,000 barcodes typically allows
around 1,000 cells to be tracked with greater than 95% probability that more than 95%
of the barcodes represent single cells.
Lentivirus Packaging
HEK293T cells are used to produce lentiviral particles. HEK293T cells are transfected
with pCDH barcode plasmid, lentivectors Pax2 and VSV-G, in the presence of
SuperFect Transfection Reagent. The supernatant is collected and the media changed
at 48, 72, and 96 hours. The virus should always be kept on ice or at 4°C after
harvesting. After pooling and concentrating using 50% PEG-8000, the virus should be
aliquoted and kept at -80°C for long term storage. Before use, the lentiviral library needs to be tested
on cell lines by transduction, barcode extraction and sequencing, to evaluate
the viral titer and barcode diversity, i.e. the number of unique barcodes and their relative
abundances in the library. Results from sequencing plasmids and transduced cell lines
can create reference libraries that facilitate downstream bioinformatics analysis(Naik et
al., 2014).
Transducing Experimental Cells
The exact transduction time depends on the research purpose, the type of cells, and
how well the culture conditions preserve cellular properties. Using a low viral titer helps
ensure that each transduced cell receives only one viral insertion and therefore one barcode.
Cells that receive more than one barcode will be overrepresented in the results. We
previously reported that ~50% transduction efficiency resulted in >95% cells carrying
only a single barcode(Lu et al., 2011). Other studies have applied lower transduction
efficiency (~15%) to further reduce the chance of double barcoding(Naik et al., 2014).
After incubation, cells should be washed thoroughly to remove any remaining virus.
Labeled cells are now ready for experimental use.
It is important to use the same viral libraries for both control and experiment groups, and
to include biological replicates using different viral libraries or viral infection wells if
possible, to avoid experimental noise associated with viral infection. We recommend
evaluating the percentage of infected cells in every experiment by analyzing an aliquot
of the experimental cells. The fractions of cells receiving single and multiple barcodes
must be determined experimentally by analyzing the barcode copy number in genomic
DNA at the clonal level. Multiple infections of a single cell can label the cell with multiple
barcodes, and data from this cell will be overrepresented. The data are acceptable if
cells with multiple barcodes are expected to produce results similar to those of cells with single
barcodes. The number of experimental cells to be barcoded should be limited based on
the barcode diversity in the library(Lu et al., 2011). This limit is particularly important for
experiments using cell lines where each barcode is meant to represent a single cell with
statistical confidence. In addition to the cell number and viral incubation time, other
experimental parameters, such as cell numbers for barcode analysis and time to
harvest the cells, also influence experimental results and can be adapted from
previously published studies with similar experimental conditions(Guernet et al., 2016;
Lu et al., 2019; Merino et al., 2019; L. Nguyen et al., 2018; L. V. Nguyen, Cox, et al.,
2014; L. V. Nguyen et al., 2015; L. V. Nguyen, Makarem, et al., 2014; Shalem et al.,
2014; Verovskaya et al., 2013; Wang et al., 2014; Woodworth et al., 2017; Wu et al.,
2014).
Barcode Extraction
Barcodes are recovered by isolating genomic DNA from cells of interest. For a given
population, the number of cells required for analysis depends on the desired barcode
detection sensitivity. To identify barcodes that are as rare as 1 in 1,000, at least 1,000
barcoded cells have to be collected for barcode recovery. If possible, more than 10,000
barcoded cells should be collected for best results. High cell numbers allow for
identifying rare barcodes, but too many, as well as too few, cells may reduce barcode
recovery rates and present problems during barcode extraction. Sorting is not required
for collecting cells, but the collected cells should be prepared for genomic DNA
extraction and counted in preparation for the barcode extraction procedure. From the
isolated genomic DNA, barcodes are PCR amplified using designed primers (Table 2.2)
that flank the barcode region and provide binding sites for downstream high-throughput
sequencing. These primers also add indexes that enable multiplex sequencing. To
ensure precise quantification, the PCR should be halted during the exponential phase of
amplification (typically 20–27 cycles) before the curve plateaus (Figure 2.3). Samples
with different numbers of cells may require different numbers of cycles. Compared to
conventional PCR that uses a predetermined cycle number, stopping the PCR reaction
during the exponential phase prevents over amplification and reduces PCR bias in line
with the idea behind quantitative PCR. After PCR, barcode DNA is purified using
magnetic beads.
DNA Quantification & High-throughput Sequencing
The amplified barcodes need to be precisely quantified before sequencing. It is
important to choose a quantification method that is sensitive and robust. We choose
fluorescence-based quantification (Qubit assay), but other methods such as
TapeStation ScreenTape assay may also suffice.
Barcode samples prepared using different reverse primers may be pooled for
sequencing as one sample to reduce cost. Our library ID design provides an additional
option for multiplexing different barcoded samples. Additional index primers and library
IDs can reduce sequencing cost at the expense of the additional resources to create
them. While we recommend HPLC purified primers, desalted primers are also
acceptable. The depth of sequencing depends on the number of barcoded cells used
during barcode extraction. We recommend sequencing around 100 reads per barcoded
cell to ensure precise barcode quantification. While the barcode is only 33bp, we
typically sequence at least 50bp single end, so that the sequence from the 34th to the 50th bp
can be used as a quality control check.
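The recommended depth translates directly into a read budget for a pooled run. The following Python sketch applies the figure of 100 reads per barcoded cell to a set of pooled samples. The sample names and cell numbers are hypothetical, and the function name is introduced here for illustration only.

```python
READS_PER_BARCODED_CELL = 100  # depth recommended in the text


def reads_needed(barcoded_cells_per_sample, reads_per_cell=READS_PER_BARCODED_CELL):
    """Return the number of reads to allocate to each pooled sample."""
    return {
        sample: cells * reads_per_cell
        for sample, cells in barcoded_cells_per_sample.items()
    }


# Hypothetical pooled samples and the barcoded cells used for extraction
pooled_samples = {"granulocytes": 10_000, "B_cells": 25_000, "T_cells": 5_000}
reads_per_sample = reads_needed(pooled_samples)
total_reads = sum(reads_per_sample.values())
print(reads_per_sample, total_reads)
```

The total can then be compared with the output of the chosen sequencing kit to decide how many indexed samples fit in one run.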
Analyzing Sequencing Data
We developed custom Python scripts to extract barcodes from the raw sequencing
results (Supplementary software). The scripts consist of three major steps. In the first
step, the code extracts the first 50bp of each read. This 50bp should consist of the 6bp
library ID, a 27bp random sequence, and the 17bp PCR handle. In the second step, the
code aligns the last 17bp of each read to the expected sequence. The reads containing
the expected 17bp are then separated based on their first 6bp sequence, i.e., library
IDs. In this step, the code also counts the copy number for each unique sequence. In
the third step, the code generates the final results that consist of master barcodes and
their copy numbers. The generation and use of master barcodes are explained in detail
below.
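The first two steps can be sketched as follows. This is a simplified illustration and not the Supplementary Software: the library IDs and the 17bp handle sequence are placeholders, and the handle is compared by counting mismatched positions rather than by the alignment used in the actual scripts.

```python
from collections import Counter

LIBRARY_IDS = {"ACGTAC", "TTGCAA"}   # hypothetical 6bp library IDs
HANDLE = "ATCGATCGATCGATCGA"         # hypothetical 17bp PCR handle (placeholder)
ID_LEN, RANDOM_LEN, HANDLE_LEN = 6, 27, 17
READ_LEN = ID_LEN + RANDOM_LEN + HANDLE_LEN  # 50bp used from each read
MAX_HANDLE_MISMATCHES = 4            # simplified stand-in for the alignment tolerance


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def extract_barcodes(reads):
    """Return one Counter of 33bp barcodes (library ID + random part) per library ID."""
    counts = {library_id: Counter() for library_id in LIBRARY_IDS}
    for read in reads:
        if len(read) < READ_LEN:
            continue                                   # read too short to evaluate
        head = read[:READ_LEN]                         # step 1: keep the first 50bp
        library_id = head[:ID_LEN]
        barcode = head[:ID_LEN + RANDOM_LEN]
        handle = head[ID_LEN + RANDOM_LEN:]
        if library_id not in LIBRARY_IDS:
            continue                                   # library ID must match exactly
        if mismatches(handle, HANDLE) > MAX_HANDLE_MISMATCHES:
            continue                                   # step 2: handle check failed
        counts[library_id][barcode] += 1               # step 2: tally copy numbers
    return counts


random_part = "ACGT" * 6 + "ACG"                       # hypothetical 27bp random sequence
reads = [
    "ACGTAC" + random_part + HANDLE + "GGTT",          # valid, extra bases are ignored
    "ACGTAC" + random_part + HANDLE,                   # same barcode again
    "TTGCAA" + random_part + HANDLE,                   # same random part, other library
    "GGGGGG" + random_part + HANDLE,                   # unknown library ID, dropped
    "ACGTAC" + random_part + "T" * HANDLE_LEN,         # handle too different, dropped
]
barcode_counts = extract_barcodes(reads)
print(barcode_counts)
```

Separating the counts by library ID at this stage keeps pooled samples apart before any similar sequences are merged.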
As PCR and sequencing can both generate errors(Bystrykh et al., 2012), we combine
sequences that are closely similar to each other following the conventional strategy
used for analyzing high throughput sequencing data. We use Levenshtein edit distance
to quantify the similarity between different sequences. Each nucleotide substitution
results in an edit distance of 1. Each indel results in an edit distance of 2 because all
sequences are the same length and an indel creates an additional difference at the last
base pair. By default, if the edit distance between different sequences is no more than
4, they are considered to be derived from the same sequence (Figure 2.4). Our Python
scripts allow users to customize the edit distance threshold.
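The edit distance values stated above can be verified with a standard dynamic programming implementation of Levenshtein distance. The sequences below are short hypothetical examples, and the function is a generic textbook version rather than the code in the Supplementary Software.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between strings a and b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion from a
                current[j - 1] + 1,                   # insertion into a
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


original = "ACGTTGCAAGCT"

# One substituted base (fifth base T replaced by A)
substituted = "ACGTAGCAAGCT"

# One deleted base (one T of the TT pair); because reads have a fixed length,
# the next downstream base (here G) enters the read at the last position
deleted_fixed_length = "ACGTGCAAGCTG"

substitution_distance = levenshtein(original, substituted)
indel_distance = levenshtein(original, deleted_fixed_length)
print(substitution_distance, indel_distance)
```

The deletion counts once for the missing base and once for the extra base at the end of the read, which is why a single indel consumes 2 of the default threshold of 4.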
In the second step, we allow a maximum edit distance of 4 when aligning the 17bp. We
exclude reads whose first 6bp does not match exactly with any expected library IDs. In
the third step, the code performs pairwise comparisons of all the unique sequences and
groups pairs that share a common sequence and have an edit distance of no more than 4.
Within each group, the sequence with the highest copy number is kept as the “master
barcode”. The copy number for each master barcode is the sum of the copy numbers of
all barcodes within an edit distance of 4 of the master barcode.
The master barcodes are used to represent the original barcodes delivered by the
lentivirus. If a reference library from sequencing plasmids and transduced cell lines is
used, the master barcode sequences can be drawn from the reference library instead.
The sequences of the master barcodes can facilitate comparisons between different
samples that are derived from the same barcoded cell population. The third step of the
code generates a file reporting the distance between each unique sequence and its
master barcode, as well as the distance between different master barcodes. This
information can help users adjust the edit distance threshold. While there is an R
package, ‘genBaRcode’, available for similar barcode analysis(Thielecke et al., 2019),
our Python code provides a flexible alternative that is easy to implement for users with
little programming experience. Downstream data analysis and visualization depend on
the specific biological questions and can be adapted from previous studies(Brewer et
al., 2016; Bystrykh et al., 2014; Bystrykh & Belderbos, 2016; Lu et al., 2019; Lyne et al.,
2018; Naik et al., 2013, 2014; L. Nguyen et al., 2018; L. V. Nguyen, Cox, et al., 2014; L.
V. Nguyen et al., 2015; L. V. Nguyen, Makarem, et al., 2014; Wu et al., 2014, 2018).
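The grouping of similar sequences into master barcodes can be approximated with a greedy procedure. The sketch below is an illustrative simplification, not the algorithm in the Supplementary Software: it visits sequences from the highest to the lowest copy number and merges each one into the first existing master within the edit distance threshold. The sequences, counts, and function names are hypothetical.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def group_master_barcodes(copy_numbers, max_distance=4):
    """Greedily merge sequences into master barcodes.

    Returns (masters, assignment): masters maps each master barcode to its summed
    copy number; assignment maps every input sequence to its master barcode.
    """
    masters = {}
    assignment = {}
    # Most abundant sequences first, so each group is led by its top sequence
    ordered = sorted(copy_numbers.items(), key=lambda item: (-item[1], item[0]))
    for sequence, copies in ordered:
        for master in masters:
            if levenshtein(sequence, master) <= max_distance:
                masters[master] += copies
                assignment[sequence] = master
                break
        else:
            masters[sequence] = copies   # no master is close enough: start a new group
            assignment[sequence] = sequence
    return masters, assignment


# Hypothetical copy numbers for three unique sequences
copy_numbers = {
    "AAAAACCCCCGGGGGTTTTT": 100,
    "AAAAACCCCCGGGGGTTTTA": 3,    # one substitution away from the first sequence
    "TTTTTGGGGGCCCCCAAAAA": 50,   # unrelated barcode
}
masters, assignment = group_master_barcodes(copy_numbers)
print(masters)
```

A greedy pass of this kind depends on the processing order when two masters lie close together, which is one reason the reported distances between master barcodes are useful for choosing the threshold.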
MATERIALS
Biological Materials
• 5-alpha Competent E. coli, High Efficiency (New England Biolabs, cat. no. C2987I)
• HEK293T cell line (ATCC cat. no. CRL-3216, RRID:CVCL_0063)
CAUTION: Cell lines should be checked for authenticity and to ensure that they
are not infected with mycoplasma.
Reagents
• lentivirus pCDH plasmid (System Biosciences, cat no. CD523A-1)
• BamHI restriction enzyme (New England Biolabs, cat. no. R3136S)
• EcoRI restriction enzyme (New England Biolabs, cat. no. R0101S)
• 10x NEBuffer 3.1 (New England Biolabs, cat. no. B7203S)
• DNA Polymerase I, Large (Klenow) Fragment (New England Biolabs, cat. no. M0210S)
• Deoxynucleotide (dNTP) Solution Mix (New England Biolabs, cat. no. N0447S)
• 10x Reaction Buffer (New England Biolabs, cat. no. B9014S)
• Zymoclean Gel DNA Recovery Kit (Zymo Research, cat. no. D4001, Genesee Scientific,
cat. no. 11-300C)
• Oligo library (Integrated DNA Technologies, see Table 2.1 for ordering details)
• T4 DNA Ligase (New England Biolabs, cat. no. M0202S)
• SOC Outgrowth Medium (New England Biolabs, cat. no. B9020S)
• LB Agar Plates, Quality Biological, LB Agar with 100μg/mL Ampicillin (VWR, cat no.
10128-318)
• VWR Life Science AMRESCO Premixed LB Broth, Miller Formulation (VWR, cat. no.
97064-114)
• VWR Life Science AMRESCO Agarose RA (Benchmark Scientific, cat. no. A1700)
• Thermo Scientific GeneJET Gel Extraction Kit (Thermo Fisher Scientific, cat. no. K0691)
• Plasmid Maxi Kit (Qiagen, cat. no. 12162)
• Pax2 lentiviral vector (Addgene, cat. no. 35002)
• pCMV-VSV-G lentiviral vector (Addgene, cat. no. 8454)
• SuperFect Transfection Reagent (Qiagen, cat. no. 301305)
• PBS (Life Technology, cat. no. 21600-010)
• DMEM (Life Technology, cat. no. 11320033)
• Penicillin-streptomycin (ThermoFisher, cat. no. 15140122)
• Fetal Bovine Serum (Fisher Scientific, cat. no. SH3007103)
• Poly(ethylene glycol), BioUltra, for molecular biology, 8,000 (Sigma-Aldrich, cat. no.
81268-250G)
• Quick-DNA Micro-Prep kit (Zymo Research, cat. no. D3020)
• Primers, HPLC purified. (Integrated DNA Technologies, see Table 2.2 for ordering
details)
• Phusion High-Fidelity PCR Master Mix with HF Buffer (Thermo Fisher, cat. no. F531L)
• EvaGreen Dye (VWR, cat. no. 89138-984)
• SPRIselect Beads (Beckman Coulter, cat. no. B23318)
• Ethyl Alcohol Pure 200 Proof (Sigma-Aldrich, cat. no. E7023-500ML)
• Water, Ultra Pure, Sterile (Genesee Scientific, cat. no. 18-194)
• Qubit dsDNA HS Assay kit (Life Technology, cat. no. Q32854)
• NextSeq Sequencing Kit (Illumina, cat. no. FC-404-2005)
• Agarose LE (Apex Bioresearch Products, cat. no. 20-102)
• 10,000X GelGreen Nucleic Acid Gel Stain (EmbiTec, cat. no. EC-1995)
• Apex DNA Ladder III (Apex Bioresearch Products, cat. no. 42-425)
• Tris Acetate EDTA 50X (TAE) (Fisher Scientific, cat. no. MP1TAE50X01)
• Isopropanol (Sigma-Aldrich, cat. no. 190764)
Equipment
• Pipettes (Rainin, cat. nos. 17014382, 17014383, 17014384)
• Falcon® Tissue Culture Dishes, Polystyrene, Sterile (Corning, cat. no. 353003)
• Filter tips for pipettes (Denville Scientific, cat. nos. P1126, P1122, P1121, P1096-FR)
• Sterile Syringe Filters, 0.45 micron filter, 10 ml, 33mm (Fisher Scientific, cat. no.
SLHVM33RS)
• Eppendorf Flex-Tube 1.5mL Microcentrifuge Tube (Eppendorf, cat no. 022364111)
• Falcon® Centrifuge Tubes, Polypropylene, Sterile, (VWR, cat. no. 21008-918)
• 6-well cell culture plate (VWR, cat. no. 62406-161)
• 50 mL syringe (VWR, cat. no. 80062-745)
• 0.45 micron filter (VWR, cat. no. 28145-505)
• 96-well cell culture plates, untreated (VWR, cat. no.15705-064)
• C1000 Touch ThermoCycler (Bio-Rad, cat. no. 1851148)
• Portable Balances (Fisher Scientific, cat. no. S94792C)
• RunOne Electrophoresis system with timer (Embi Tec, cat. no. EP-2100)
• Safe Imager 2.0 Blue-Light Transilluminator (ThermoFisher, cat. no. G6600)
CAUTION: Always wear UV-light protective safety glasses/face shield.
• Innova 44/44R shaker (New Brunswick, cat. no. M1282-0000)
• Tissue culture incubator (Panasonic, cat. no. MCO-170AICUVL-PA)
• NanoDrop 2000c (ThermoFisher, cat. no. ND-2000)
• Bench top centrifuge (Beckman Coulter, cat. no. B30134)
• Centrifuge (Beckman Coulter, cat. no. A99465)
• Swinging Bucket Rotors (Beckman Coulter, cat. no. 366650, 360581)
• BD FACSAria™ III Cell Sorter with 530/30-nm FITC laser
• Olympus 96-Well PCR Plate, FAST-type (Genesee Scientific, cat. no. 24-310)
• ThermalSeal RTS Sealing Films, sterile (Excel Scientific, cat. no. TSS-RTQS-50)
• ViiA 7 Real-Time PCR System (Applied Biosystems, cat. no. 4453545)
• 0.2 mL Strip tube magnet (10x Genomics, cat. no. 230003)
• Qubit fluorometer (ThermoFisher, cat. no. Q33238)
• Computer with installed Windows 7+
• Microsoft Visual C++ Compiler for Python 2.7
• Anaconda Distribution, Python 2.7 version
Reagent setup
Oligos and Primers
Resuspend IDT DNA oligos (Table 2.1) and primers (Table 2.2) to 100 µM in nuclease
free water. Dilute to 10 µM concentration by adding 10 µL of 100 µM primers to 90 µL of
nuclease free water. DNA oligos and primers can be stored at 10 µM or 100 µM at
-20°C for up to two years.
DMEM Media
Mix 445 mL DMEM medium with 50 mL FBS (10% FBS vol/vol final concentration) and
5 mL penicillin-streptomycin (1% pen-strep vol/vol final concentration). Store at 4°C for
up to one month.
1X TAE Buffer
Mix 2 mL 50X TAE with 98 mL water for 100 mL 1X TAE. Store at room temperature
until expiration date on packaging.
70% (vol/vol) Ethanol Solution
Mix 700 μL Ethyl Alcohol Pure 200 Proof with 300 μL nuclease free water for 1 mL 70%
Ethanol (vol/vol) right before use. CRITICAL: Make fresh.
Agarose Gel
Prepare before use. Mix 1 g (for 1% wt/vol) or 3 g (for 3% wt/vol) agarose with 100 mL
1X TAE, heat in a microwave until the agarose completely dissolves, pour the solution into a
casting box with the comb positioned, and cool at room temperature for at least
20 minutes until the gel solidifies.
25X GelGreen DNA Dye
Mix 1 µL 10,000X GelGreen DNA dye with 399 µL nuclease free water. Store at 4°C for
up to six months.
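The recipes in this reagent setup all follow the relation C1 x V1 = C2 x V2. As a minimal sketch for rescaling them to other volumes, the hypothetical helper below (not part of code_demo.zip) returns the stock and diluent volumes for a target concentration; 200 proof ethanol is treated as 100%.

```python
from __future__ import division, print_function


def dilution_volumes(stock_conc, final_conc, final_volume):
    """Return (stock_volume, diluent_volume) from C1 * V1 = C2 * V2.

    Both concentrations must share one unit (X, uM, %), and the returned
    volumes are in the same unit as final_volume.
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Recipes from the reagent setup: (name, stock, final, total volume).
recipes = [
    ("10 uM primer from 100 uM, 100 uL", 100, 10, 100),
    ("1X TAE from 50X, 100 mL", 50, 1, 100),
    ("70% ethanol from 100%, 1000 uL", 100, 70, 1000),
    ("25X GelGreen from 10,000X, 400 uL", 10000, 25, 400),
]
for name, stock, final, total in recipes:
    stock_vol, diluent_vol = dilution_volumes(stock, final, total)
    print("{}: {:g} stock + {:g} diluent".format(name, stock_vol, diluent_vol))
```

Each printed pair should match the volumes given in the corresponding recipe above.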
Equipment setup
Tissue culture incubator
Set incubator to 37°C with 5% carbon dioxide. Keep humidifier pan full by adding sterile
water.
Software dependencies
Install Anaconda Distribution, Python 2.7 version
(https://www.anaconda.com/distribution/).
Download code_demo.zip (Supplemental Information). code_demo.zip includes:
• readme.pdf
• step-1_read-raw-data.py
• step-2_combine-library-ID.py
• step-3_combine-barcodes.py
• step-4-opt_evaluating_barcode_diversity.py
• library_ID.txt
• sample info 041519.txt
• expected_output
PROCEDURE
Plasmid Generation, Timing 2–3 days
1. Order DNA oligos listed in Table 2.1 from Integrated DNA Technologies or other
vendors.
2. Perform second strand synthesis using Strand2 primer (Table 2.2). Mix and incubate for
2 hours at 16°C. Perform a separate reaction for each virus library from Table 2.1. A
negative control to estimate the background should be set up by removing the enzyme.
Experiment setup:
Component                             Amount (μL)   Final (concentration/amount)
Oligos (100 µM; Table 2.1)            8 μL          40 µM
Strand 2 Primer (10 µM; Table 2.2)    1 μL          0.5 µM
dNTPs (10 mM)                         2 μL          1 mM
10x reaction buffer                   2 μL          1X
Klenow enzyme                         1 μL          5 Units
Nuclease-free water                   6 μL
Negative Control:
Component                             Amount (μL)   Final (concentration/amount)
Oligos (100 µM; Table 2.1)            8 μL          40 µM
Strand 2 Primer (10 µM; Table 2.2)    1 μL          0.5 µM
dNTPs (10 mM)                         2 μL          1 mM
10x reaction buffer                   2 μL          1X
Klenow enzyme                         0 μL
Nuclease-free water                   7 μL
3. Vortex SPRIselect Magnetic Beads before use.
4. Add 1.8X beads to sample from Step 2 (36 μL beads for 20 μL of reaction) and gently
mix with pipette 15 times. CRITICAL STEP: Do not vortex during steps 4–15.
5. Incubate the sample with beads at room temperature (25°C) for 5 minutes.
6. Condense beads into a pellet with magnet for 3–5 minutes.
7. Remove and discard supernatant without disturbing beads, leaving ~5 μL behind at the
bottom of the tube. Keep beads on the magnet until elution step, do not disturb the
pellet.
8. Pipette 200 μL of 70% ethanol without disturbing beads, keep beads on the magnet.
CRITICAL STEP: Prepare fresh 70% (vol/vol) ethanol. Ethanol that has been
stored for too long will have an incorrect ethanol to H2O ratio, which will decrease
DNA yield.
9. Leave ethanol with beads for 30 seconds, then remove ethanol and discard.
10. Repeat the wash (steps 8 and 9) for a total of two ethanol washes.
11. Remove as much of the ethanol as possible. Be mindful of small ethanol droplets.
12. Air dry pellet for ~1 minute. CRITICAL STEP: The drying time for beads is variable.
Be careful not to OVERDRY the pellet, which will lead to cracking and/or breakup,
reducing DNA recovery.
13. Add 20 μL of nuclease free water to all samples, and then pipette mix 15 times. Repeat
the mixing to ensure better recovery.
14. Incubate for 5 minutes.
15. Condense beads into a pellet with magnet for 3–5 minutes.
16. Collect the supernatant into a new tube. OPTIONAL: Capture carry-over beads with
magnet for 3–5 minutes, and transfer the supernatant into a new tube. Pause Point.
Store at -20°C for long term storage.
17. Digest purified product using EcoRI and BamHI.
Component              Amount        Final (concentration/amount)
DNA (step 16)          1 μL
10X NEBuffer 3.1       5 µL          1X
EcoRI                  1 µL          10 Units
BamHI                  1 µL          10 Units
Nuclease-free water    up to 50 µL
18. Incubate at 37°C for 60 minutes.
19. Add 10 µL of 6X loading dye and 2.4 µL 25X GelGreen DNA dye to the product.
20. Run all product on a 3% (wt/vol) agarose gel (in 1X TAE) at 100 Volts for 60 minutes.
CRITICAL STEP: Depending on gel well volume capacity, product may need to be
loaded into multiple wells.
21. Illuminate the DNA in the gel using UV transilluminator. The expected band should be
~100 bp. Excise the DNA fragment from the agarose gel using gel extraction tips, a
razor blade or a scalpel, and transfer it into a 1.5 mL microcentrifuge tube. CAUTION:
Always wear UV-light protective safety glasses/face shield.
22. Purify DNA from gel using Gel DNA Recovery Kit.
23. Add 3 volumes of ADB buffer provided in the kit per each 100 mg of agarose excised
from the gel (e.g., for 100 mg of agarose gel slice, add 300 µL of ADB).
24. Incubate at 37–55°C for 5–10 minutes until the gel slice is completely dissolved. For
DNA fragments >8 kb, following the incubation, add one additional volume
(corresponding to the weight of the gel slice) of water to the mixture for better DNA
recovery (e.g., 100 mg agarose, 300 µL ADB, and 100 µL water).
25. Transfer the melted agarose solution to a Zymo-Spin™ Column in a Collection Tube
and centrifuge. For steps 25–27, centrifuge at 10,000 x g at room temperature.
26. Discard the flow-through. Add 200 µL of DNA Wash Buffer to the column and centrifuge.
Discard the flow-through. Repeat the wash.
27. Add ≥ 6 µL of nuclease free water directly to the column matrix. Place the column into a
1.5 mL tube and centrifuge to elute DNA.
28. Prepare the lentivirus pCDH vector backbone (or vector of choice). Linearize 5 µg of the
pCDH vector by digesting it with EcoRI and BamHI, and run an agarose gel as described in
step 20 but using a 1% (wt/vol) agarose gel. Purify the excised product as described in
steps 21–27. Band size should be ~7150 bp.
29. Ligate the vector and the barcode insert in a microcentrifuge tube on ice. Include a negative
control without ligase to assess the background level.
Experiment setup:
Component                                      Amount         Final (concentration/amount)
T4 DNA Ligase Buffer                           2 μL           1X
Linear lentivirus pCDH vector DNA (step 28)    50 ng          2.5 ng/μL
Insert DNA (step 27)                           37.5 ng        1.875 ng/μL
Nuclease-free water                            up to 19 μL
T4 DNA Ligase                                  1 μL           400 Units
Negative control:
Component                                      Amount         Final (concentration/amount)
T4 DNA Ligase Buffer                           2 μL           1X
Linear lentivirus pCDH vector DNA (step 28)    50 ng          2.5 ng/μL
Insert DNA (step 27)                           37.5 ng        1.875 ng/μL
Nuclease-free water                            up to 20 μL
T4 DNA Ligase                                  0 μL
CRITICAL STEP: Add T4 DNA Ligase last.
30. Gently mix the reaction by pipetting up and down and centrifuge briefly.
31. Incubate at 16°C overnight or at room temperature for 10 minutes.
32. Heat inactivate at 65°C for 10 minutes. Pause Point. Store at -20°C for long term
storage.
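The vector and insert masses used in step 29 can be expressed as a molar ratio, which is useful when substituting a different vector or insert. The sketch below assumes the approximate sizes stated in this protocol (~100 bp insert from step 21, ~7150 bp vector from step 28); the function names are illustrative and are not part of the supplied code.

```python
from __future__ import division, print_function


def insert_to_vector_molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Molar ratio of insert to vector for double-stranded DNA.

    Moles are proportional to mass / length, so the average mass per
    base pair cancels out of the ratio.
    """
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


def insert_ng_for_ratio(ratio, insert_bp, vector_ng, vector_bp):
    """Insert mass (ng) needed to reach a chosen insert:vector molar ratio."""
    return ratio * vector_ng * insert_bp / vector_bp


# Masses from the step 29 table, sizes from steps 21 and 28 (approximate).
ratio = insert_to_vector_molar_ratio(37.5, 100, 50, 7150)
print("insert:vector molar ratio is about {:.1f}:1".format(ratio))
```

With these approximate sizes the insert is in large molar excess over the vector; `insert_ng_for_ratio` gives the insert mass needed to keep the same ratio with a backbone of a different length.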
33. Thaw a tube of High Efficiency 5-alpha Competent E. coli cells on ice until the last ice
crystals disappear.
34. Add the entire ligation reaction (20 μL) to the cells. Carefully flick the tube 4–5 times to mix cells
and DNA. CRITICAL STEP: Do not vortex.
35. Incubate the mixture on ice for 30 minutes. Do not mix.
36. Heat shock at exactly 42°C for exactly 30 seconds. Do not mix.
37. Place on ice for 5 minutes. Do not mix.
38. Pipette 950 µL of room temperature SOC media into the mixture.
39. Incubate at 37°C for 60 minutes, shaking vigorously (225 rpm) or rotating the samples.
40. Pre-warm the 100 µg/mL Ampicillin selection plates to 37°C.
41. Mix the cells thoroughly by flicking the tube and inverting, then perform several 100-fold
serial dilutions in SOC media with 10 µL of transformant.
42. Spread 50–100 µL of each dilution onto a pre-warmed selection plate and incubate
overnight at 37°C to estimate the transformation efficiency.
?Troubleshooting.
43. Pipette remaining transformant into 100 mL of LB broth with 100 µg/mL ampicillin and
incubate overnight at 37°C, shaking vigorously (225 rpm) or rotating the sample.
44. The following day extract plasmid using QIAGEN Plasmid Maxiprep kit.
45. Harvest overnight bacterial culture by centrifuging at 6000 x g for 15 minutes at 4°C.
46. Resuspend the bacterial pellet in 10 mL Buffer P1.
47. Add 10 mL Buffer P2, mix thoroughly by vigorously inverting 4–6 times, and incubate at
room temperature for 5 minutes. If using LyseBlue reagent, the solution will turn blue.
48. Add 10 mL prechilled Buffer P3, mix thoroughly by vigorously inverting 4–6 times.
Incubate on ice for 20 minutes. If using LyseBlue reagent, mix the solution until it is
colorless.
49. Centrifuge at ≥20,000 x g for 30 minutes at 4°C. Re-centrifuge the supernatant at
≥20,000 x g for 15 minutes at 4°C.
50. Equilibrate a QIAGEN-tip 500 by applying 10 mL Buffer QBT, and allow column to
empty by gravity flow.
51. Apply the supernatant from step 49 to the QIAGEN-tip and allow it to enter the resin by
gravity flow.
52. Wash the QIAGEN-tip with 2 x 30 mL Buffer QC. Allow Buffer QC to move through the
QIAGEN-tip by gravity flow.
53. Elute DNA with 15 mL Buffer QF into a clean 50 mL vessel.
54. Precipitate DNA by adding 10.5 mL (0.7 volumes) room temperature isopropanol to the
eluted DNA and mix. Centrifuge at ≥15,000 x g for 30 minutes at 4°C. Carefully decant
the supernatant.
55. Wash the DNA pellet with 5 mL room-temperature 70% (vol/vol) ethanol and centrifuge
at ≥15,000 x g for 10 minutes. Carefully decant supernatant.
56. Air dry pellet for 5–10 minutes and re-dissolve DNA in a suitable volume of nuclease
free water. Use NanoDrop to measure plasmid concentration.
Pause Point. Store at -20°C for long term storage.
CRITICAL STEP: Test the barcode diversity of the plasmid library by following
steps 91–127 before lentiviral packaging. Proceed only if the barcode diversity
allows for tracking single cells in the intended experiment (step 127) (Lu et al.,
2011).
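The colony counts from the dilution plates in steps 41 and 42 can be converted into an estimate of the total number of transformants, which gives a rough upper bound on how many distinct barcodes the plasmid library can contain. A minimal sketch follows; the function name, the colony count, and the ~1,000 µL recovery volume are assumptions for illustration (the actual recovery volume depends on the competent-cell aliquot used).

```python
from __future__ import division, print_function


def total_transformants(colonies, plated_ul, dilution_factor, recovery_ul):
    """Estimate transformants in the whole recovery culture from one plate.

    colonies        -- colonies counted on the plate
    plated_ul       -- volume of diluted culture spread on the plate (uL)
    dilution_factor -- total fold dilution before plating (100 for one
                       100-fold dilution, 10000 for two, and so on)
    recovery_ul     -- volume of the undiluted recovery culture (uL)
    """
    cfu_per_ul = colonies * dilution_factor / plated_ul
    return cfu_per_ul * recovery_ul


# Hypothetical example: 150 colonies from 100 uL of the second 100-fold
# dilution, with an assumed recovery culture of about 1000 uL.
estimate = total_transformants(150, 100, 100 * 100, 1000)
print("about {:.2g} transformants".format(estimate))
```

Plates with too few or too many colonies to count reliably are best left out of the estimate.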
Lentivirus Packaging, Timing 4–5 days
57. Plate 8 x 10^5 HEK293T cells per well in a 6-well plate. Grow overnight to approximately
80% confluency. CAUTION: Except for centrifugation, all downstream procedures
in steps 57–68 must be performed in a biosafety cabinet. Use freshly prepared
10% (vol/vol) bleach solution to disinfect tips after pipetting the virus.
58. Aspirate and discard media for the HEK293T cells. Add 1 mL DMEM media to each
well.
59. For each well, mix in a tube 2 μg of ligated lentivirus pCDH vector from step 56, 1.3 μg
of Pax2, and 0.7 μg of VSV-G in 100 μL final volume of DMEM media.
60. Add 6 μL SuperFect Transfection Reagent to the tubes from step 59, vortex for 10
seconds, incubate at room temperature for 5–10 minutes, then disperse drop by drop to
the cells evenly across the plate. Put the plate in 37°C incubator.
61. 8–12 hours later, aspirate and discard media then wash the HEK293T cells with PBS,
and add 2 mL of fresh DMEM media to each well.
62. Harvest 2 mL supernatant containing the virus at 48 hours, 72 hours, and 96 hours after
step 61, and pool supernatants from the same well together on ice and store at 4°C.
Replace with 2 mL fresh DMEM media at each time point.
63. Spin down for 10 minutes at 300 x g and 4°C. Collect the supernatant into a new tube.
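64. Filter the supernatant using a 0.45-micron filter mounted on a syringe into 50% PEG-8000,
using 1 volume of PEG solution per 3 volumes of supernatant so that the PEG solution makes up
¼ of the final volume (e.g., 6 mL virus supernatant into 2 mL 50% PEG-8000).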
65. Mix by inverting, and incubate at 4°C overnight.
66. The following morning spin down the tubes at 1500 x g at 4°C for 20 minutes and
discard the supernatant.
67. Spin down again at 300 x g at 4°C for 5 minutes to remove as much of the remaining liquid
as possible.
68. Resuspend the virus stock in 30 μL PBS, make 3 μL aliquots (or to preference), and
store at -80°C.
CRITICAL STEP: Avoid multiple freeze-thaw cycles to prevent virus stock
degradation.
CRITICAL STEP: Titer the virus stocks and test the barcode diversity by
performing steps 69–89 on a cell line similar to the experimental cells. Make sure
that the viral concentration and exposure time are appropriate for transduction and
result in 15–50% GFP expression.
CRITICAL STEP: Test the diversity of the lentiviral library by performing steps
69–127 on a cell line. Proceed only if the barcode diversity allows for tracking single
cells in the intended experiment (step 127) (Lu et al., 2011).
Pause Point. Store virus stock at -80°C for up to one year.
?Troubleshooting.
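One common way to report the result of the titering step is as transducing units (TU) per mL, calculated from the number of cells exposed to virus, the GFP-positive fraction, and the virus volume added. The sketch below is an illustration under that convention rather than a calculation specified by this protocol; the function name and example numbers are hypothetical, and the estimate is most reliable at low GFP fractions where few cells carry more than one integration.

```python
from __future__ import division, print_function


def functional_titer(cells_at_transduction, gfp_fraction, virus_ul):
    """Transducing units per mL estimated from a flow-cytometry GFP readout."""
    if not 0 <= gfp_fraction <= 1:
        raise ValueError("gfp_fraction must be between 0 and 1")
    if virus_ul <= 0:
        raise ValueError("virus_ul must be positive")
    transduced_cells = cells_at_transduction * gfp_fraction
    return transduced_cells * 1000.0 / virus_ul   # convert per uL to per mL


def in_target_window(gfp_fraction, low=0.15, high=0.50):
    """True if the GFP fraction falls in the 15-50% window used in this protocol."""
    return low <= gfp_fraction <= high


# Hypothetical test transduction: 50,000 cells, 1 uL virus, 30% GFP-positive.
print("{:.2g} TU/mL".format(functional_titer(50000, 0.30, 1)))
print("within 15-50% window:", in_target_window(0.30))
```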
Transducing Experimental Cells, Timing 13–16 hours
CAUTION: Except for centrifugation, all downstream procedures after step 69
must be performed in a biosafety cabinet. Use freshly prepared 10% (vol/vol)
bleach solution to disinfect tips after pipetting the virus.
69. Prepare experimental cells in a new tube. Cells can be grown in suspension or
adherent.
70. Spin down the collected cells at 300 x g at 4°C (speed may vary depending on cell
type).
71. Aspirate the supernatant with a pipette, leaving around 20 μL above the pelleted cells.
72. To allocate cells into different samples, resuspend the cells in desired culture media to
achieve 30 μL volume per sample. Aliquot each 30 μL cell suspension sample into
individual wells in a 96-well plate.
73. Add barcode virus from step 68 at determined viral concentration to each well
containing the aliquoted cells. If samples will be pooled for sequencing (step 119), each
sample needs to receive a different virus library ID. Mix with the pipette. CRITICAL
STEP: Always keep the virus stock on ice.
74. Incubate at 37°C for determined viral exposure time from step 68.
75. Collect each sample by adding 200 μL of PBS (or media of choice) to each well, mix by
pipetting, and transfer the cells into a new tube for every sample. CRITICAL STEP:
Adherent cells will need to be detached from the plate.
76. To collect residual cells, wash each well with 200 μL of PBS and transfer into the
respective tubes from step 75.
77. Repeat step 76 three more times (1 mL final volume).
78. Spin down each tube at 300 x g at 4°C for 5 minutes.
79. Aspirate and discard the supernatant leaving ~50 μL above the pelleted cells. Add 1 mL
of PBS (or media of choice) and mix by pipetting thoroughly.
80. Repeat steps 78 and 79 for a total of three washes.
81. Cells are now barcoded and ready for an experiment.
OPTIONAL: Culture a small aliquot of experimental barcoded cells for 3–5 days and
analyze GFP expression using flow cytometry to estimate the transduction rate.
?Troubleshooting.
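The GFP-positive fraction measured here also indicates how many labeled cells are likely to carry more than one barcode. As a rough sketch, assuming that the number of integrations per cell follows a Poisson distribution and that every integration produces detectable GFP (both are simplifying assumptions, not claims made by this protocol), the expected fraction of multiply barcoded cells can be estimated as follows.

```python
from __future__ import division, print_function
import math


def multi_barcode_fraction(gfp_fraction):
    """Fraction of GFP-positive cells expected to carry more than one barcode.

    Under a Poisson model, P(no integration) = exp(-m) = 1 - gfp_fraction,
    which gives the mean number of integrations per cell, m.
    """
    if not 0 < gfp_fraction < 1:
        raise ValueError("gfp_fraction must be strictly between 0 and 1")
    mean_integrations = -math.log(1 - gfp_fraction)
    exactly_one = mean_integrations * math.exp(-mean_integrations)
    return 1 - exactly_one / gfp_fraction


for fraction in (0.15, 0.30, 0.50, 0.80):
    print("GFP+ {:.0%}: about {:.1%} of labeled cells multiply barcoded".format(
        fraction, multi_barcode_fraction(fraction)))
```

Under this model the multiply barcoded fraction rises steadily with the GFP-positive fraction, which is consistent with keeping transduction in the 15–50% range.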
Barcode Extraction, Timing 4–5 hours
82. Pellet experimental barcoded cells by centrifuging at 300 x g at 4°C for 5 minutes.
Remove and discard supernatant. Pause Point. Store cell pellet at -80°C for up to six
months.
83. Extract genomic DNA using Quick-gDNA Micro Prep kit. Start by adding 400 μL of
Genomic Lysis Buffer to cell pellets (add 800 μL lysis buffer if the sample contains more
than 0.8 million cells).
84. Mix completely by vortexing for 4–6 seconds, then incubate for 5–10 minutes at room
temperature.
85. Transfer the mixture to a Zymo-Spin™ IC Column in a Collection Tube. Do not work
with more than 1,000,000 cells. For steps 85–89, centrifuge at 10,000 x g for one
minute at room temperature. Discard the Collection Tube with the flow through.
86. Transfer the Zymo-Spin™ IC Column to a new Collection Tube. Add 200 μL of DNA
Pre-Wash Buffer to the spin column. Centrifuge.
87. Add 500 μL of g-DNA Wash Buffer to the spin column. Centrifuge.
88. Transfer the spin column to a clean microcentrifuge tube. Add 20 μL DNA Elution Buffer
to the spin column. Incubate for 2–5 minutes at room temperature and then centrifuge to
elute the DNA.
89. Load the flow through again to the same column, incubate for another 5 minutes, and
then centrifuge to elute the DNA for a second time.
90. Use a 20 μL pipette to measure the eluted volume.
CRITICAL STEP: Measure genomic DNA concentration using Qubit (steps 110–118).
For the subsequent PCR reaction, use no more than 1,000 ng of gDNA in the
40 μL PCR reaction. Multiple samples can be combined into one PCR reaction if
their library IDs are different. Make sure that each cell sample is equally
represented by using equivalent amounts of gDNA in the combined PCR reaction.
Pause Point. Store gDNA at -20°C for up to one year.
?Troubleshooting.
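When several samples with different library IDs share one PCR reaction, the constraints above (equal gDNA mass per sample, at most 1,000 ng in total) combine with the 16 μL of template available in the step 92 reaction. The sketch below shows one way to compute the volumes; the function name and the example concentrations are hypothetical.

```python
from __future__ import division, print_function


def pooled_template_volumes(conc_ng_per_ul, max_total_ng=1000.0, template_ul=16.0):
    """Volumes (uL) that give every sample the same gDNA mass in one PCR.

    conc_ng_per_ul -- dict mapping sample name to gDNA concentration (ng/uL)
    Returns (volumes, ng_per_sample, water_ul).
    """
    n_samples = len(conc_ng_per_ul)
    # Mass limit: each sample gets an equal share of the allowed total.
    by_mass = max_total_ng / n_samples
    # Volume limit: largest equal mass m such that sum(m / c_i) <= template_ul.
    by_volume = template_ul / sum(1.0 / c for c in conc_ng_per_ul.values())
    ng_per_sample = min(by_mass, by_volume)
    volumes = dict((name, ng_per_sample / c) for name, c in conc_ng_per_ul.items())
    water_ul = max(0.0, template_ul - sum(volumes.values()))
    return volumes, ng_per_sample, water_ul


# Hypothetical Qubit readings for two samples with different library IDs.
volumes, ng_each, water = pooled_template_volumes({"sample1": 20.0, "sample2": 5.0})
for name in sorted(volumes):
    print("{}: {:.1f} uL".format(name, volumes[name]))
print("{:.0f} ng per sample, {:.1f} uL water".format(ng_each, water))
```

In this example the most dilute sample sets the limit, so the pool is bounded by volume rather than by the 1,000 ng cap.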
91. Barcode recovery by qPCR using Phusion High-Fidelity PCR Master Mix with HF Buffer.
Keep reagents on ice.
92. Set up a 40 μL PCR reaction. Use a negative control without DNA template to assess
background noise and contamination.
Experiment setup:
Component                                      Amount     Final (concentration/amount)
Template (from Step 90)                        16 μL
Forward Primer (10 µM; Table 2.2)              2 μL       0.5 μM
Reverse Primer of choice (10 µM; Table 2.2)    2 μL       0.5 μM
Phusion Mix (2X)                               20 μL      1X
EvaGreen dye                                   0.4 μL     1X
Negative control:
Component                                      Amount     Final (concentration/amount)
Nuclease-free water                            16 μL
Forward Primer (10 µM; Table 2.2)              2 μL       0.5 μM
Reverse Primer of choice (10 µM; Table 2.2)    2 μL       0.5 μM
Phusion Mix (2X)                               20 μL      1X
EvaGreen dye                                   0.4 μL     1X
93. Mix, briefly spin down, and load to qPCR machine.
Cycle number    Denature            Anneal              Extend
1               98°C, 30 seconds
2-27            98°C, 10 seconds    65°C, 30 seconds    72°C, 30 seconds
28                                                      72°C, 10 minutes
CRITICAL STEP: Stop the PCR when the fluorescence increases by 900,000 units
and/or after 5–8 PCR cycles from the start of the exponential phase. The PCR
curve should still be in the exponential phase when the PCR is stopped, which
typically occurs between cycles 20 and 27.
CRITICAL STEP: Run samples with similar cell numbers together. Samples with
significantly different numbers of cells need to be processed separately, because
they will need different numbers of PCR cycles.
?Troubleshooting.
94. Vortex SPRIselect Magnetic Beads before use.
95. Add 1.8X beads to sample from step 93 (72 μL beads for 40 μL PCR reaction) and
gently mix with pipette 15 times. CRITICAL STEP: Do not vortex during steps 95–107.
96. Incubate the sample with beads at room temperature for 5 minutes.
97. Condense beads into pellet with magnet for 3–5 minutes.
98. Remove and discard supernatant without disturbing beads, leaving ~5 μL behind at the
bottom of the tube. Keep the beads pelleted on the magnet until the elution step; do not disturb the pellet.
99. Pipette 200 μL of 70% (vol/vol) ethanol without disturbing beads, and keep them
pelleted. CRITICAL STEP: Prepare fresh 70% (vol/vol) ethanol. Ethanol that has
been stored for too long will have an incorrect ethanol to H2O ratio, which will
impair DNA yield.
100. Leave ethanol with beads for 30 seconds, then remove and discard ethanol.
101. Repeat the wash (steps 99 and 100) for a total of two ethanol washes.
102. Remove as much of the ethanol as possible. Be mindful of small ethanol droplets.
103. Air dry the pellet for ~1 minute. CRITICAL STEP: The drying time for beads is
variable. Be careful not to OVERDRY the pellet, which will lead to cracking and/or
breakup, thus reducing DNA recovery.
104. Add 20 μL of nuclease free water to all samples, and then pipette mix 15 times.
Repeat the mixing to ensure better recovery.
105. Incubate for 5 minutes.
106. Condense beads into pellet with magnet for 3–5 minutes.
107. Collect the supernatant into a new tube.
108. Optional: Capture carry-over beads with magnet for 3–5 minutes, and transfer the
supernatant into a new tube.
109. Quantify the concentration of the purified PCR product using Qubit fluorimeter.
Pause Point. Store purified PCR product at -20°C for up to one year.
CRITICAL STEP: Incubate with SPRIselect beads (steps 94–108) longer to
increase recovery rate.
?Troubleshooting.
DNA Quantification, Timing 0.5–1 hour
110. Quantify the concentration of the barcode library.
111. Prepare 0.5 mL tubes. The number of tubes necessary is the number of samples
plus two. Label them with the appropriate sample IDs and “S1” and “S2”.
112. Using the Qubit dsDNA Reagents HS Assay kit, prepare working solution by
mixing dsDNA HS Buffer and dsDNA HS Reagent at a ratio of 199:1. Make enough for
200 μL x (sample number + 2). For example, for 2 samples, prepare at least 200 μL x (2
+ 2) = 800 μL of working solution.
113. In “S1” tube, mix 190 μL working solution with 10 μL dsDNA HS Standard 1.
114. In “S2” tube, mix 190 μL working solution with 10 μL dsDNA HS Standard 2.
115. In sample tubes, mix 199 μL working solution with 1 μL of sample from step 107.
116. Vortex all the tubes, micro-centrifuge for a few seconds, then incubate at room
temperature for 2 minutes.
117. Turn on the Qubit fluorimeter, select “DS-HS-DNA” and run new calibration using
the S1 and S2 tubes. Then load sample tubes following the instructions on the screen.
118. Write down the measurement results.
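The working-solution arithmetic in step 112 can be written out for any number of samples. The sketch below uses only the 199:1 ratio and the 200 μL per tube stated above; the function name is illustrative.

```python
from __future__ import division, print_function


def qubit_working_solution(n_samples, per_tube_ul=200.0, n_standards=2):
    """Return (total_ul, buffer_ul, reagent_ul) for the Qubit working solution.

    The buffer and reagent are mixed 199:1, so the reagent is 1/200 of the total.
    """
    total_ul = per_tube_ul * (n_samples + n_standards)
    reagent_ul = total_ul / 200.0
    return total_ul, total_ul - reagent_ul, reagent_ul


total, buffer_ul, reagent_ul = qubit_working_solution(2)
print("{:g} uL total: {:g} uL buffer + {:g} uL reagent".format(
    total, buffer_ul, reagent_ul))
```

For the two-sample example in step 112 this gives 800 μL in total, made from 796 μL of buffer and 4 μL of reagent.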
High-throughput Sequencing, Timing 2–3 days
CRITICAL STEP: Genomic DNA from cells tagged with different library IDs may use the
same reverse primer in qPCR amplification, and can be combined into one sequencing
sample. Genomic DNA from cells tagged with the same library ID should be amplified
with different reverse primers if they will be combined in one sequencing sample (Table
2.2, qPCR). Table 2.2 provides three examples of reverse primer designs. Additional
reverse primers can be designed as needed.
119. When pooling sequencing samples, use the same amount of DNA from each cell
source. Try to save half of each sample in case of a failed run. Remaining sample can
be stored at -20°C for up to one year.
120. Sequence samples using Illumina NextSeq:
• Sequencing primer: See Table 2.2. Note that Index Sequencing Primer is not a standard
Illumina sequencing primer.
• Sequencing cycle: Read1 for at least 50 cycles; Index i7 for 6 cycles.
• Sequencing depth: Plan for 2 million reads per cell source.
• Sequencing sample name: Use the 6bp Reverse Primer index (Table 2.2) as the
sample name.
?Troubleshooting.
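The pooling rule in this section (within one sequencing sample, every cell sample needs a distinct library ID, and samples sharing a library ID need different reverse primers) can be checked before sequencing. The sketch below is a hypothetical helper, not part of the supplied code; it also totals the planned reads per sequencing sample, treating each cell sample as one cell source, which is an assumption. The library IDs come from Table 2.1 and GCCAAT is the demo primer index; the sample names are invented.

```python
from __future__ import print_function
from collections import Counter


def check_pooling(samples, reads_per_source=2000000):
    """Check a pooling plan and total the reads to request.

    samples -- list of (cell_sample, library_id, primer_index) tuples
    Returns (clashes, reads_needed), where clashes lists every
    (primer_index, library_id) pair used by more than one cell sample and
    reads_needed maps each primer index to its planned read count.
    """
    pair_counts = Counter((primer, lib) for _, lib, primer in samples)
    clashes = sorted(pair for pair, count in pair_counts.items() if count > 1)
    reads_needed = Counter()
    for _, _, primer in samples:
        reads_needed[primer] += reads_per_source
    return clashes, dict(reads_needed)


plan = [
    ("sample1", "CGTGAT", "GCCAAT"),
    ("sample2", "ACATCG", "GCCAAT"),
    ("sample3", "CGTGAT", "GCCAAT"),   # same library ID and primer as sample1
]
clashes, reads = check_pooling(plan)
print("clashing (primer index, library ID) pairs:", clashes)
print("reads to plan per sequencing sample:", reads)
```

Any clash reported here means the two cell samples could not be told apart after sequencing, so one of them needs a different reverse primer.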
Analyzing Sequencing Data, Timing 1 day
121. Install Anaconda Distribution, Python 2.7 version
(https://www.anaconda.com/distribution/).
122. From the Start Menu, locate “Anaconda2 (64-bit)” folder, then start Anaconda
Prompt, then type in the terminal window:
pip install python-Levenshtein
and
pip install biopython
In a few minutes, messages will show up indicating that the packages have been
successfully installed.
123. In the “Anaconda2 (64-bit)” folder, start Spyder software. Use the software to
open “*.py” Python scripts in the “code demo” folder (Supplemental Information).
124. Open “step-1_read-raw-data.py” file, edit “variables subject to change”
accordingly:
fastq_location: The directory where raw sequencing data are stored. Sample data is
provided in a file named “GCCAAT_S1_R1_001.fastq.gz”. “GCCAAT” is the sample
index corresponding to the reverse primer that was used to extract the barcodes.
date_today: The code uses this information to document data analysis history.
step1_output: Folder to store the output of this script. Default is “step-1”. If folder does
not exist, the code will create one.
Run the file. “GCCAAT_041519.txt” file will be generated in the output folder. It includes
the first 50 bps of all reads in this sample.
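For readers who want to see what this step amounts to, the following is a minimal sketch of reading a gzipped FASTQ file and keeping the first 50 bases of every read. It is not the code in step-1_read-raw-data.py; the function name is hypothetical, and it assumes standard four-line FASTQ records.

```python
from __future__ import print_function
import gzip
import os


def first_bases(fastq_gz_path, n_bases=50):
    """Yield the first n_bases of each read in a gzipped FASTQ file.

    Assumes four lines per record (header, sequence, '+', qualities),
    so the sequence is the second line of every group of four.
    """
    with gzip.open(fastq_gz_path, "rb") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.decode("ascii").strip()[:n_bases]


# Runs only if the demo file named in this step is in the working directory.
demo_file = "GCCAAT_S1_R1_001.fastq.gz"
if os.path.exists(demo_file):
    for count, read in enumerate(first_bases(demo_file)):
        if count == 5:
            break
        print(read)
```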
125. Open “step-2_combine-library-ID.py” file, and edit “variables subject to change”
accordingly:
step1_location: Folders storing the step-1 output. It is the input for this step.
sample_info: Tab-delimited text file. A template can be found in the “code demo” folder.
• “reverse primer” and “primer index”: Information about the reverse primer used to
extract barcodes from the sample. Note that primer index was used to name the
sequencing sample.
• “sample” and “lib ID”: Cell sample name and the corresponding virus library ID. Note
that cell samples with different virus library ID can be combined into the same
sequencing sample.
distance_allowed: Edit distance allowed when determining whether the 34th to 50th bp of
each read is the adaptor sequence. Default is 4.
step2_output: Folder to store the output of this script. Default is “step-2”. If folder does
not exist, the code will create one.
Run the file. Two types of output files will be generated in the output folder:
• {sample}.bin: For this demo, “sample1.bin” and “sample2.bin”. They store the
intermediate files that serve as input for the next step.
• {index}_stats.txt: For this demo, “GCCAAT_stats.txt”. It is the quality report for each
sequencing sample in this step.
?Troubleshooting.
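The adaptor check controlled by distance_allowed can be illustrated as follows. This is a sketch, not the code in step-2_combine-library-ID.py. It assumes a read layout inferred from the oligo design in Table 2.1: bases 1–6 are the library ID, bases 7–33 are the 27 bp random sequence, and bases 34–50 are the first 17 bases of the constant sequence that follows the N's. The function names and the example read are hypothetical.

```python
from __future__ import print_function

# First 17 bases after the random region in the Table 2.1 oligos (assumed
# to be the adaptor checked at read positions 34-50).
ADAPTOR = "AGATCGGAAGAGCTCGT"


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                      # deletion
                               current[j - 1] + 1,                   # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def parse_read(read, distance_allowed=4):
    """Return (library_id, random_barcode), or None if the read is not valid."""
    if len(read) < 50:
        return None
    if edit_distance(read[33:50], ADAPTOR) > distance_allowed:
        return None
    return read[:6], read[6:33]


# Hypothetical read: library ID 1 from Table 2.1 plus a made-up 27 bp barcode.
example = "CGTGAT" + "ACGT" * 6 + "ACG" + ADAPTOR
print(parse_read(example))
print(parse_read(example[:33] + "A" * 17))   # tail does not match the adaptor
```

Reads rejected by this kind of check are what lower the “% valid reads based on 17bp ending” figure in the step 2 quality report.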
126. Open “step-3_combine-barcodes.py” file, edit “variables subject to change”
accordingly:
step2_location: Folders storing the step-2 output. It is the input for this step.
distance_allowed: Edit distance allowed when determining whether two barcodes are
legitimately the same. Default is 4.
step3_output: Folder to store the output of this script. Default is “step-3”. If folder does
not exist, the code will create one.
Run the file. Four types of output files will be generated in the output folder:
• {sample}_{number of reads}.txt: tab-delimited text file; the first column is the barcode
sequence, and the second column is the number of reads of this barcode.
• {sample}_{number of reads}.bin: Binary file storing a dictionary whose keys are master
barcodes. Each value is itself a dictionary, in which the keys are the barcodes considered
the same as that master barcode and the values are the corresponding numbers of reads.
• {sample}.xlsx: excel file with two worksheets. The ‘intraclonal’ sheet reports the copy
number and the Levenshtein distance with the master barcode for each unique barcode;
the ‘interclonal’ sheet reports Levenshtein distance between different master barcodes.
• step3_stats.txt: Quality report for this step.
?Troubleshooting.
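To make the idea of master barcodes concrete, here is a minimal sketch of one way to merge barcodes that lie within distance_allowed of each other. It uses a simple greedy strategy (most abundant barcode first) chosen for illustration; the actual procedure in step-3_combine-barcodes.py may differ. The output mirrors the nested dictionary described for the .bin file above. The function name and example counts are hypothetical.

```python
from __future__ import print_function

try:
    # python-Levenshtein, installed in step 122.
    from Levenshtein import distance as edit_distance
except ImportError:
    def edit_distance(a, b):
        """Pure-Python Levenshtein distance used when the package is missing."""
        previous = list(range(len(b) + 1))
        for i, char_a in enumerate(a, 1):
            current = [i]
            for j, char_b in enumerate(b, 1):
                current.append(min(previous[j] + 1, current[j - 1] + 1,
                                   previous[j - 1] + (char_a != char_b)))
            previous = current
        return previous[-1]


def combine_barcodes(read_counts, distance_allowed=4):
    """Greedily group barcodes under master barcodes.

    Barcodes are visited from most to least abundant. Each one joins the
    first existing master within distance_allowed, otherwise it becomes a
    new master. Returns {master: {member_barcode: reads}}.
    """
    masters = {}
    master_order = []
    ranked = sorted(read_counts.items(), key=lambda item: (-item[1], item[0]))
    for barcode, reads in ranked:
        for master in master_order:
            if edit_distance(barcode, master) <= distance_allowed:
                masters[master][barcode] = reads
                break
        else:
            masters[barcode] = {barcode: reads}
            master_order.append(barcode)
    return masters


# Hypothetical counts: the second barcode is one substitution from the first.
counts = {"AAAAAAAAAA": 100, "AAAAAAAAAT": 3, "CCCCCCCCCC": 50}
for master, members in sorted(combine_barcodes(counts).items()):
    print(master, sum(members.values()), "reads from", len(members), "barcode(s)")
```

The greedy pass compares every barcode against every master, so it is only practical for small examples.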
127. This step only applies for evaluating plasmid library diversity post plasmid
generation (step 56) and for evaluating lentiviral library diversity post lentiviral
packaging (step 68). Open “step-4-opt_evaluating-barcode-diversity.py” file, edit
“variables subject to change” accordingly:
• library_file: Tab-delimited text file generated by “step-3_combine-barcodes.py”
• intended_cells: Number of cells to be tracked in an intended experiment
• simulation_events: Number of Monte Carlo simulation experiments. Default is 1000.
Larger values take longer to run; try increasing values until the result is
stable.
Run the file. The code will print on the screen whether the given library is sufficiently
diverse to track the intended number of cells with greater than 95% probability that more
than 95% of the barcodes represent single cells.
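The criterion printed by this step can be illustrated with a small Monte Carlo sketch. It is not the code in step-4-opt_evaluating-barcode-diversity.py. It assumes that each cell receives one barcode drawn at random with probability proportional to that barcode's read count in the library, and it counts a simulated experiment as successful when more than 95% of the barcodes drawn were drawn by exactly one cell. The function name, the fixed random seed, and the example libraries are hypothetical.

```python
from __future__ import division, print_function
import bisect
import random
from collections import Counter


def library_is_diverse(read_counts, intended_cells, simulation_events=1000,
                       threshold=0.95, seed=0):
    """Return True if more than `threshold` of simulated experiments have
    more than `threshold` of their drawn barcodes labelling a single cell.

    read_counts -- dict mapping barcode to its read count in the library
    """
    if intended_cells < 1:
        raise ValueError("intended_cells must be at least 1")
    rng = random.Random(seed)
    # Cumulative read counts allow weighted sampling with a binary search.
    cumulative = []
    running = 0
    for barcode in sorted(read_counts):
        running += read_counts[barcode]
        cumulative.append(running)
    last_index = len(cumulative) - 1
    passed = 0
    for _ in range(simulation_events):
        drawn = Counter(
            min(bisect.bisect_right(cumulative, rng.random() * running), last_index)
            for _ in range(intended_cells))
        single_cell = sum(1 for cells in drawn.values() if cells == 1)
        if single_cell / len(drawn) > threshold:
            passed += 1
    return passed / simulation_events > threshold


# Hypothetical libraries with uniform barcode abundance.
large_library = dict((str(i), 1) for i in range(100000))
small_library = dict((str(i), 1) for i in range(50))
print(library_is_diverse(large_library, intended_cells=100, simulation_events=200))
print(library_is_diverse(small_library, intended_cells=100, simulation_events=200))
```

Repeating the run with a different seed or more simulation events is one way to check that the verdict is stable.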
Timing
• Plasmid Generation, Steps 1–56 (2–3 days)
• Lentivirus Packaging, Steps 57–68 (4–5 days)
• Transducing Experimental Cells, Steps 69–81 (13–16 hours)
• Barcode Extraction, Steps 82–109 (4–5 hours)
• DNA Quantification, Steps 110–118 (0.5–1 hour)
• High-throughput Sequencing, Steps 119–120 (2–3 days)
• Analyzing Sequencing Data, Steps 121–127 (1 day)
ANTICIPATED RESULTS
Plasmid Generation
Barcode oligos should be inserted at the BamHI and EcoRI restriction enzyme sites.
Plasmid should be ~100 bp larger (~7250 bp if using the pCDH lentivirus vector) and
circularized when ligation is successful. PCR using qPCR primers from Table 2.2 should
produce a ~150 bp product. Barcode diversity, as tested in step 127, should be sufficient
for the intended experiment.
Lentivirus Packaging
Virus should have an appropriate viral titer that ensures 15–50% transduction efficiency,
and should have enough barcode diversity for the intended experiment, as tested in
step 127.
Transducing Experimental Cells
After completing steps 68 and 81, transduction should produce 15–50% GFP
expression as measured by flow cytometry; staying in this range reduces the chance of
double barcoding. When testing barcode diversity using control cell lines, a higher
percentage of GFP-positive cells is acceptable.
Barcode Extraction
qPCR amplification of barcodes should produce a typical exponential curve, which is
absent for the negative control (Figure 2.3).
DNA Quantification & High-throughput Sequencing
Barcode DNA should be ~150 bp and yield > 1 ng/µL per sample. High-throughput
sequencing will provide one “.fastq” file for each reverse primer used. The file name will
begin with the 6bp Reverse Primer Index (Table 2.2).
Analyzing sequencing data
Barcode quantification results will be generated (Figure 2.5). Additionally,
“step-2_combine-library-ID.py” and “step-3_combine-barcodes.py” will each generate a
“stats.txt” file for quality checks.
In stats.txt files generated in step 2 of the analysis (Step 125):
1. “% valid reads based on 17bp ending” should be at least 70–80%.
2. “Numbers of reads with expected virus ID” should be higher than those with unexpected
library IDs (Figure 2.5a).
Author Contributions
R.L. conceived and developed the protocol. C.B., D.J., J.C., and A.N. optimized the
barcode extraction protocol. D.J. and J.E. improved the Python code for analyzing high
throughput sequencing data. C.B., D.J., and R.L. prepared the manuscript. J.C. and
A.N. provided assistance in manuscript text preparation.
Competing Interests
The authors declare that they have no competing financial interests.
ACKNOWLEDGMENTS
We thank all members of the Rong Lu lab for helping optimize the protocol, and thank
C.L. for help editing the text. This research was funded primarily by the National
Institutes of Health (NIH) R00 early investigator grant (NIH-R00-HL113104) and R01s
(R01HL135292 and R01HL138225). R. Lu is a Scholar of the Leukemia & Lymphoma
Society and a Richard N. Merkin Assistant Professor. The project described was
supported in part by award number P30CA014089 from the National Cancer Institute.
The content is solely the responsibility of the authors and does not necessarily
represent the official views of the National Cancer Institute or the National Institutes of
Health.
Table 2.1. Barcode Oligos for each Library ID. The core of each oligo consists of a
6bp library ID and a 27bp random sequence represented as N’s. The core is flanked by
forward and reverse primer binding sites as well as restriction enzyme sites. An additional
6bp sequence is added at both ends to ensure proper restriction enzyme cutting.
Virus
library
Library
ID
DNA Oligos (5’ – 3’)
1 CGTGAT
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTCGTGATNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
2 ACATCG
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTACATCGNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
3 GCCTAA
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTGCCTAANNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
4 TGGTCA
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTTGGTCANNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
5 CACTGT
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTCACTGTNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
6 ATTGGC
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTATTGGCNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
7 GATCTG
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTGATCTGNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
8 TCAAGT
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTTCAAGTNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
9 CTGATC
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTCTGATCNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
10 AAGCTA
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTAAGCTANNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
11 GTAGCC
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTGTAGCCNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
12 TACAAG
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTTACAAGNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
13 ATGACA
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTATGACANNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
14 AGCGGT
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTAGCGGTNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
15 ACTCAG
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTACTCAGNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
16 TAACGT
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTTAACGTNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
17 TGTTAC
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTTGTTACNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
18 TCCGTA
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTTCCGTANNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
19 GAGTTC
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTGAGTTCNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
20 GTCGAG
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTGTCGAGNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
21 GCAACT
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTGCAACTNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
22 CAGTGC
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTCAGTGCNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
23 CTTATG
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTCTTATGNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
24 CGACCT
CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCTCGACCTNNNNNNNNNNNNNN
NNNNNNNNNNNNNAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG
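Every oligo in Table 2.1 shares the same flanking sequences and differs only in its 6bp library ID, so the table can be regenerated or checked with a few lines of code. The sketch below copies the two flanks from Table 2.1; the function names are hypothetical, and `split_barcode` assumes that the 33bp cellular barcode has already been trimmed out of a read.

```python
import re

# Constant flanks copied from Table 2.1; only the 6bp library ID varies.
LEFT_FLANK = "CGCCGCGGATCCACACTCTTTCCCTACACGACGCTCTTCCGATCT"
RIGHT_FLANK = "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGGAATTCCGGCG"
RANDOM_LENGTH = 27  # length of the random part of the barcode

def oligo_template(library_id):
    """Build the oligo for one library ID, with the random part written as N's."""
    if not re.fullmatch(r"[ACGT]{6}", library_id):
        raise ValueError("library_id must be 6 bases of A, C, G, or T")
    return LEFT_FLANK + library_id + "N" * RANDOM_LENGTH + RIGHT_FLANK

def split_barcode(barcode_33bp):
    """Split a 33bp cellular barcode into (library ID, 27bp random sequence)."""
    match = re.fullmatch(r"([ACGT]{6})([ACGT]{27})", barcode_33bp)
    if match is None:
        raise ValueError("expected 33 bases of A, C, G, or T")
    return match.group(1), match.group(2)

print(oligo_template("CGTGAT"))  # library 1 in Table 2.1
```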
Table 2.2. Primer list.
Procedure | Primer | Sequence (5’-3’)
Plasmid generation | Strand2 | CGCCGGAATTCCAAGCAGAAGACGGCATACGA
qPCR | Forward | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
qPCR | R1 (GCCAAT) | CAAGCAGAAGACGGCATACGAGATGCCAATACGGCATACGAGCTCTTCCGATCT
qPCR | R2 (GATCTG) | CAAGCAGAAGACGGCATACGAGATGATCTGACGGCATACGAGCTCTTCCGATCT
qPCR | R3 (TCAAGT) | CAAGCAGAAGACGGCATACGAGATTCAAGTACGGCATACGAGCTCTTCCGATCT
Sequencing | Custom Index Primer | AGATCGGAAGAGCTCGTATGCCGT
TROUBLESHOOTING
Troubleshooting advice can be found in Table 2.3.
Table 2.3. Troubleshooting table.
Step | Problem | Possible reason | Possible solution
42 | Fewer than 100 colonies in 1:10,000 dilution | Poor bacteria transformation | Improve the transformation efficiency during cloning
56 | Step-4-opt_evaluating_barcode_diversity output: “No, please increase library diversity” | Plasmid library barcode diversity too low for desired cell number | (1) Redo transformation; (2) Re-ligate DNA oligo library into vector; (3) Reorder DNA oligos from manufacturer
68 | Step-4-opt_evaluating_barcode_diversity output: “No, please increase library diversity” | Lentiviral library barcode diversity too low for desired cell number | Repackage lentiviral library
68, 81 | GFP% < 15% | Low transduction efficiency | Increase the viral titer
68, 81 | GFP% > 50% | Transduction efficiency too high | The experiment should be stopped. Set up a new transduction and either incubate the cells with virus for a shorter time or reduce the viral titer
90 | Low yield of genomic DNA (less than 1 ng/µL) | Too few barcoded cells to start with | Start with at least 10,000 barcoded cells
93 | Amplification beyond the exponential phase | PCR terminated too late | Repeat the qPCR reaction
93 | No amplification curve after 27 cycles | Missing fluorescent dye such as EvaGreen; too few barcoded cells in this sample | Add a fluorescent dye. If the sample does not have enough barcoded cells, it may not be worth further investigation
93 | Curve plateaus before fluorescence increases to 900,000 units | Too much template | Use no more than 1000 ng of the template for qPCR
109 | Low DNA yield (less than 1 ng/µL) after beads purification | Incorrect volume of beads; beads pellet dried out | Repeat the qPCR and beads purification steps
120 | Number of reads less than expected | Poor quantification of DNA library | Use a reliable method to quantify DNA concentration
120 | Failed to demultiplex | Incorrect index primer used for sequencing | Re-sequence the library using the correct custom index primer provided in Table 2.2
125 | Many reads seen with unexpected library ID | Other cell sources present in this sample | Check the experimental set-up, modify the sample information file, and re-run the data analysis
125 | More “combined codes” than expected | Misread allowance was set too stringently | Set different values for “distance_allowed” in the code. Use “4” by default
FIGURES & LEGENDS
Figure 2.1. Experimental Workflow. a. Synthesized semi-random barcode oligos
(Table 2.1) are cloned into plasmids before packaging into a lentiviral vector. Cells of
interest are then transduced. To retrieve barcodes, genomic DNA is extracted before
qPCR amplification and high-throughput sequencing. Raw sequencing data is
processed by a custom data analysis pipeline to quantify the abundance of each
barcode. b. PCR strategy. The 33bp cellular barcode is flanked by an Illumina TruSeq
read1 sequence and a custom read2 sequence so that a single PCR reaction can add
the Illumina P5 and P7 adaptors to the ends of each barcode. See Table 2.2 for primer
sequences.
Figure 2.2. Comparing barcode extraction replicates. Primary mouse hematopoietic
stem cells were barcoded and transplanted into recipient mice. Four months after
transplantation, mice were bled, and white blood cells were collected and processed
following steps 69–126. Cell lysates were divided into two replicate samples and
processed separately for genomic DNA extraction, barcode amplification, and
sequencing. Each dot represents a barcode. Barcode abundances are highly
consistent between the replicate samples. Pearson correlation: 0.99; P value: 5.3 × 10⁻¹⁴⁴.
Animal procedures were approved by the Institutional Animal Care and Use Committee
and mice were maintained at the Research Animal Facility of the University of Southern
California.
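The replicate comparison in Figure 2.2 can be reproduced in outline as follows. This is a minimal sketch, assuming each replicate is available as a mapping from barcode to read count. The legend does not state whether the reported correlation used raw, normalized, or log-transformed abundances, so the conversion to fractions is an assumption, and `pearson_r`, `replicate_correlation`, and the example counts are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)

def replicate_correlation(rep1, rep2):
    """Correlate barcode abundances (as fractions of reads) between two replicates.

    Barcodes missing from one replicate are counted as zero in that replicate.
    """
    barcodes = sorted(set(rep1) | set(rep2))
    total1 = sum(rep1.values())
    total2 = sum(rep2.values())
    x = [rep1.get(b, 0) / total1 for b in barcodes]
    y = [rep2.get(b, 0) / total2 for b in barcodes]
    return pearson_r(x, y)

# Hypothetical read counts for three barcodes in two replicates.
replicate_a = {"bc1": 500, "bc2": 120, "bc3": 30}
replicate_b = {"bc1": 480, "bc2": 135, "bc3": 25}
print(round(replicate_correlation(replicate_a, replicate_b), 3))
```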
Figure 2.3. qPCR Amplification of Barcode. BB88 cells were barcoded and 50,000
GFP positive cells were sorted via FACS one week after transduction. gDNA was
isolated and amplified using primers from Table 2.2. Shown is a multi-component plot of
barcode amplification. EvaGreen fluorescent dye (green lines) is used to quantify DNA
amount, thus no Rox signal was observed (red lines). a. Two samples with similar
amounts of genomic DNA were amplified together, and their exponential curve emerged
at similar PCR cycles. We stopped the reaction at cycle 25, which is about halfway
through the exponential phase. This is to avoid over-amplification and to reduce
background signals. No-template control samples show no amplification (two flat
green lines). b. One sample was amplified to saturation. This is an example of
over-amplification.
Figure 2.4. Optimizing the edit distance thresholds. Histograms show the distances
between unique sequences and their corresponding master barcode (red), as well as
the distances between different master barcodes (blue). Each row shows one edit
distance threshold. Data from two independent samples are shown in two columns.
An edit distance threshold of 4 was chosen because, at this threshold, the distances
between master barcodes are higher than and separated from the distances between
unique sequences and their master barcodes.
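The role of the edit distance threshold can be illustrated with a short example. The code below is a generic sketch, not the logic of “step-3_combine-barcodes.py”, which is not shown here: it assumes a greedy scheme in which sequences are visited from most to least abundant and merged into the first master barcode within the allowed distance. Only the parameter name `distance_allowed` and its default of 4 come from the protocol; everything else is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def combine_barcodes(read_counts, distance_allowed=4):
    """Greedily merge sequences into master barcodes.

    read_counts maps sequence -> read count. Returns master barcode -> total count.
    """
    masters = {}
    # Visit the most abundant sequences first so they become the masters.
    ordered = sorted(read_counts.items(), key=lambda item: (-item[1], item[0]))
    for sequence, count in ordered:
        for master in masters:
            if edit_distance(sequence, master) <= distance_allowed:
                masters[master] += count
                break
        else:
            masters[sequence] = count
    return masters

# Hypothetical toy data: one true barcode, one misread of it, one distinct barcode.
toy_counts = {"AAAAAAAAAA": 100, "AAAAAAAAAT": 3, "CCCCCCCCCC": 50}
print(combine_barcodes(toy_counts))
```

A threshold that is too small leaves misreads as separate barcodes, while one that is too large merges genuinely different barcodes, which is the trade-off the histograms in Figure 2.4 are used to assess.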
Figure 2.5. Python pipeline outputs. Primary human acute lymphoblastic leukemia
(ALL) cells were barcoded and transplanted into non-obese diabetic scid-gamma mice
(NSG). Two months after transplantation, mice were bled, and ALL cells were collected and
processed following steps 69–126. ALL cells barcoded with virus libraries 8 and 9 were
used for this example. a. Custom algorithms written in Python group reads based
on their library IDs. b. The Python algorithm quantifies each barcode while accounting
for sequencing errors. Each color represents a unique barcode, and the size represents
its relative abundance. Shown are data from library 9 in Figure 2.5a. Animal procedures
were approved by the Institutional Animal Care and Use Committee and mice were
maintained at the Research Animal Facility of the University of Southern California.
Supplementary Information
Supplementary Software
• code_demo.zip
The software for data analysis (steps 121-127)
DATA AVAILABILITY
A sample dataset has been deposited in Figshare; DOI: 10.35092/yhjc.11374446. This
dataset was used to generate Figure 2.4 and Figure 2.5.
CODE AVAILABILITY
The Python scripts have been provided in Supplementary Software of this manuscript.
Related links
Key references using this protocol
Lu, R. et al. Nat. Biotechnol. 29, 928–933 (2011): https://doi.org/10.1038/nbt.1977
Nguyen, L. et al. EMBO Rep. 19, e45702 (2018):
https://doi.org/10.15252/embr.201745702
Brewer, C. et al. Cell Rep. 15, 1848–1857 (2016):
https://doi.org/10.1016/j.celrep.2016.04.061
Chapter Three: RNA splicing underlies heterogeneous preleukemic clonal expansion
Preface:
Chapter three, in full, is a manuscript submitted to Nature Genetics. The work in chapter
three aims to dissect the heterogeneity associated with preleukemic expansion by
leveraging genetic barcoding and single-cell transcriptomics. This work identifies a
class of myeloid-biased clones that exhibit high levels of preleukemic expansion,
seemingly by repressing RNA splicing factors. Ultimately, we found that repression of a
specific RNA splicing factor, Rbm25, increased hematopoietic stem and progenitor cell
proliferation by altering pro-apoptotic and anti-apoptotic mRNAs, which may
increase the clonal fitness of such clones.
RNA splicing underlies heterogeneous preleukemic clonal expansion
Charles Bramlett¹, Jiya Eerdeng¹, Du Jiang¹, Patrick Condie¹, Yeachan Lee¹, Mary Vergel¹,
Ivon Garcia¹, Anna Nogalska¹, Rong Lu¹*
Affiliations:
¹ Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad
Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine,
University of Southern California, Los Angeles, CA, 90033, USA
*Corresponding author. Email: ronglu@usc.edu
Abstract (150 words)
Clonal expansion sets the stage for cancer genesis by allowing for the accumulation of
molecular alterations. While genetic mutations that induce clonal expansion and
malignancy, such as Tet2, have been identified, these mutations are also frequently
found in healthy individuals. Here, we tracked preleukemic clonal expansion using an
inducible Tet2 knockout mouse model and found that only a small fraction of
hematopoietic stem cells (HSCs) expand excessively upon Tet2 knockout. These
overexpanded HSCs expressed significantly lower levels of genes associated with
leukemia and RNA splicing compared to non-overexpanded Tet2 knockout HSCs.
Knockdown of Rbm25, an identified RNA splicing factor, accelerated the expansion
of Tet2-knockout hematopoietic cells in vitro and in vivo. Our data suggest that
mutations of epigenetic factors induce variability in the expression of RNA splicing
factors, which subsequently drives heterogeneous preleukemic clonal expansion. This
heterogeneous clonal expansion could contribute to the variable disease risks across
individuals.
Introduction
Clonal expansion is the process whereby the number of progeny of a cell excessively
increases (Cartier et al., 2009; Hacein-Bey-Abina et al., 2003, 2014; Jaiswal & Ebert,
2019); it plays a crucial role during the early phases of oncogenesis. Clonal expansion
is often initiated by sporadic genetic mutations or epigenetic dysregulations, which in
turn provide greater opportunities for accumulating the additional genetic and epigenetic
changes that may eventually lead to malignancy(Desai et al., 2018; Hacein-Bey-Abina
et al., 2003; Jaiswal et al., 2014; Warren & Link, 2020). In the hematopoietic system,
clonal expansion, named clonal hematopoiesis, can be easily monitored by sampling
the peripheral blood. Clonal hematopoiesis is monitored in clinical settings as an
indicator of leukemia genesis(Desai et al., 2018; Warren & Link, 2020) and provides an
easily accessible experimental model for studying the initial phase of cancer genesis.
Several commonly mutated genes have been identified in clonal hematopoiesis
(Jaiswal et al., 2014; Jaiswal & Ebert, 2019; Jan et al., 2012; Zink et al., 2017); many of
these encode epigenetic factors, including TET2, DNMT3A, ASXL1, IDH2, and NPM1
(Jaiswal et al., 2014; Jaiswal & Ebert, 2019; Jan et al., 2012; Zink et al., 2017).
Mutations of these genes are also frequently found in leukemia and preleukemic
hematopoietic disorders in patients (Abdel-Wahab & Levine, 2013; Cimmino et al.,
2011; Genovese et al., 2014; Jaiswal et al., 2014; Roche-Lestienne et al., 2011).
Knockout of these genes in mouse models drives cellular expansion and hematologic
malignancies(Abdel-Wahab et al., 2013; Challen et al., 2012; Moran-Crusio et al., 2011;
X. Zhang et al., 2016). With technical advances in high throughput sequencing that
allow for identifying clonal hematopoiesis marked by sporadic genetic mutations, recent
studies show that individuals carrying preleukemic mutations have an increased risk of
developing leukemia, although most of them remain healthy and disease free(Busque et
al., 2012; Jaiswal et al., 2014; Jaiswal & Ebert, 2019; Zink et al., 2017). For example,
mutations of the DNA demethylase TET2 induce preleukemic clonal expansion in both
humans and mice, and are frequently found in hematopoietic malignancies in
patients(Abdel-Wahab et al., 2009; Bacher et al., 2010; Cimmino et al., 2011;
Delhommeau et al., 2009; Langemeijer et al., 2009; Metzeler et al., 2011; Nakajima &
Kunimoto, 2014; Tefferi & Vardiman, 2009) as well as in people without any sign of
hematological disease(Busque et al., 2012; Jaiswal et al., 2014; Quivoron et al., 2011).
The variable outcomes in humans may arise from different responses of individual cells
to the same leukemic mutation. Most studies to date analyze bulk cell populations and
are unable to address cell-to-cell variation. Understanding this variation can help to
improve cancer prognosis by further stratifying high risk individuals and may provide
new therapeutic targets.
Hematopoietic stem cells (HSCs) sustain life-long hematopoietic regeneration
and provide unique opportunities for accumulating genetic mutations that initiate clonal
expansion and disease genesis. Recent studies have revealed that HSCs are
heterogeneous in their proliferation, differentiation and gene expression(Lu et al., 2019;
Weinreb et al., 2020). We hypothesized that the variation among individual HSCs might
generate variable responses to the same preleukemic mutation and produce
heterogeneous levels of clonal expansion. To test this hypothesis, we applied a genetic
barcoding technology (Bramlett et al., 2020; Lu et al., 2011) to track the clonal
expansion of hundreds of individual HSCs with Tet2 knockout (KO) in mice as a model.
In addition, we integrated single cell RNA sequencing with clonal tracking(Contreras-
Trujillo et al., 2021) to identify genes that were differentially expressed between Tet2
KO clones that overexpanded and those that did not. Our results suggest that only a
small fraction of Tet2 KO HSCs exhibits excessive clonal expansion and that their
expansion is associated with the reduced expression of RNA splicing factors.
Results
Tet2 deletion induces cellular expansion specifically in the myeloid lineage
To investigate the heterogeneity of Tet2 KO-induced clonal expansion, we isolated and
barcoded HSCs from Rosa26-CreERT2+ Tet2fl/fl mice and, as a control, their littermates
Rosa26-CreERT2− Tet2fl/fl mice. The barcoded HSCs were transplanted into lethally
irradiated wildtype recipients that subsequently received tamoxifen injection at one month
post-transplantation to induce Tet2 deletion. Herein, “Tet2 KO mice” refers to mice
carrying Tet2-deleted HSCs (Figure 3.1a and Figure 3.S1). We found that myeloid cells
(e.g. granulocytes), but not lymphoid cells (e.g. B cells), expanded significantly in the
peripheral blood of Tet2 KO mice compared to control mice 12 months post-
transplantation (Figure 3.1b and Figure 3.S2). At 13 months post-transplantation, the
end time point, significant expansion was detected specifically in the MPP3 population
isolated from the bone marrow (Figure 3.1c and Figure 3.S3), which is known as a
myeloid-biased multipotent progenitor(Pietras et al., 2015). The specificity of clonal
expansion in the myeloid lineage is consistent with the prevalence of TET2 induced
myeloid leukemia(Delhommeau et al., 2009) and with other studies of Tet2 mutant
mouse models(Ko et al., 2011; Moran-Crusio et al., 2011).
Preleukemic clonal expansion is initiated at the HSC level without measurable
population expansion
At the clonal level, the numbers of unique barcodes detected in the hematopoietic cells
from the peripheral blood and from the bone marrow were similar between Tet2 KO and
control mice (Figure 3.S4), suggesting that similar numbers of HSC clones were
producing blood in both groups of mice. However, the abundances of the most
abundant clones in HSCs, MPP3s, granulocytes and monocytes of the Tet2 KO mice
were significantly higher compared to the controls (Figure 3.1d-1g and Figure 3.S5).
The high abundance clones were few in number but contributed higher amounts of cells
than the low abundance clones in Tet2 KO mice (Figure 3.1e-1f). This suggested that
the highly expanded clones played an important role in driving the cellular expansion of
granulocyte and MPP3 populations (Figure 3.1b-1c). Notably, HSCs did not exhibit
gross population expansion even though their clonal expansion was evident (Figure
3.1c, 1d and 1g). The high abundance clones in Tet2 KO mice contributed substantial
numbers of HSCs whereas the low abundance clones in these mice contributed fewer
HSCs compared to those in control mice (Figure 3.1g). These data suggest that clonal
expansion can occur in the absence of gross population expansion when high
abundance clones outcompete low abundance clones while the overall cell number is
constrained.
Overexpanded clones in Tet2 KO mice produced more hematopoietic cells than
any clones in control mice
To identify clones that excessively and abnormally expanded upon Tet2 deletion, we
defined “overexpanded” clones as the Tet2 KO clones whose abundances were greater
than the most abundant clone of control mice. The average abundance of the most
abundant clone from each of the four control mice was used as the threshold.
Consistent with the aforementioned myeloid expansion at the population and clonal
levels (Figure 3.1), most overexpanded clones were found in the myeloid lineage and
myeloid-biased HSPCs and were very rarely found in megakaryocyte–erythroid
progenitors (MEPs), common lymphoid progenitors (CLPs), B cells and T cells (Figure
3.2a-2b). We identified 142 overexpanded clones across 12 hematopoietic cell types
among all 6 Tet2 KO mice, which belong to 66 unique clones, as some clones were
identified as “overexpanded” in multiple cell types (Figure 3.2c-2d).
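The definition of “overexpanded” clones given above amounts to a simple threshold rule, sketched below. The data layout (one dictionary of clone abundances per mouse, for a single cell type), the abundance units, the function names, and the example values are all hypothetical; only the rule itself, averaging the most abundant clone from each control mouse and flagging Tet2 KO clones above that value, is taken from the text.

```python
def overexpansion_threshold(control_mice):
    """Average abundance of the most abundant clone in each control mouse.

    control_mice is a list with one dict per mouse, mapping clone ID -> abundance.
    """
    top_abundances = [max(clones.values()) for clones in control_mice]
    return sum(top_abundances) / len(top_abundances)

def overexpanded_clones(ko_clones, threshold):
    """Return the Tet2 KO clone IDs whose abundance exceeds the threshold."""
    return {clone for clone, abundance in ko_clones.items() if abundance > threshold}

# Hypothetical abundances (fraction of one cell type) for illustration only.
controls = [
    {"c1": 0.10, "c2": 0.02},
    {"c3": 0.20, "c4": 0.05},
]
tet2_ko = {"k1": 0.50, "k2": 0.10, "k3": 0.16}

threshold = overexpansion_threshold(controls)
print(sorted(overexpanded_clones(tet2_ko, threshold)))
```

Applying the rule separately to each cell type would allow the same clone to be flagged in several cell types, as described for the 142 calls that map to 66 unique clones.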
The overexpanded clones represented around 13% of all Tet2 KO clones but
produced more than half of the HSPCs in the bone marrow and mature blood cells in
the peripheral blood (Figure 3.2e). Moreover, the overexpanded clones in Tet2 KO mice
on average produced significantly more MPP3s and granulocytes than “normally
expanded” clones (those that did not overexpand) in Tet2 KO mice and all clones in
control mice (Figure 3.2f). These results suggest that the identified overexpanded
clones are few but play a major role in hematopoietic regeneration and in driving the
population-level expansion of MPP3s and granulocytes (Figure 3.1b-1c).
Overexpanded clones initiated cellular expansion at the HSC level that is
unmeasurable at the population level
Although the population-level expansion was not significant amongst HSCs (Figure
3.1c), some HSC clones exhibited overexpansion (Figure 3.1d, 1g and 3.2a). These
clones produced significantly more HSCs compared to normally expanded clones in
Tet2 KO mice as well as all clones in control mice (Figure 3.2f). This suggested that the
abnormal clonal expansion was initiated at the HSC level, the uppermost level of the
hematopoietic hierarchy. Furthermore, to determine if the overexpanded clones had
already started expansion before any population level expansion was detected (Figure
3.1b), we retrospectively analyzed the peripheral blood collected at 4, 5, 6, and 7
months post-transplantation. While normally expanded clones in Tet2 KO mice
continuously produced similar amounts of granulocytes as clones in control mice, the
overexpanded clones produced significantly more granulocytes as early as 4 months
after transplantation (Figure 3.2g). These results suggest that clonal expansion is
initiated at the apex of the hematopoietic hierarchy.
Overexpanded myeloid-biased clones drive cellular expansion upon Tet2 deletion
As some of the overexpanded clones appeared to expand only at the HSC and MPP
stages (Figure 3.2d and 3.2e), we next analyzed the heterogeneity of overexpanded
clones to identify those that contributed to the observed myeloid expansion. We
performed hierarchical clustering of all clones from control and Tet2 KO mice and
identified three major clusters that represented myeloid-biased clones, lymphoid-biased
clones, and self-renewing clones (Figure 3.3a and Figure 3.S6). Myeloid-biased clones
were highly abundant in myeloid cell types, such as MPP3, CMP, GMP, granulocyte
and monocyte populations, and had low abundance in lymphoid cells, including CLPs, B
cells and T cells (Figure 3.3a and Figure 3.3b). Lymphoid-biased clones were highly
abundant in CLP, B cell, and T cell populations, as well as in MPP4, known as
lymphoid-biased multipotent progenitors(Pietras et al., 2015) (Figure 3.3a and 3.3b).
Self-renewing clones were highly abundant in HSC and MPP1/2 populations, the most
primitive multipotent progenitors immediately downstream of HSCs(Pietras et al., 2015),
and had low abundance in all downstream cell types (Figure 3.3a and 3.3b). Amongst all
Tet2 KO clones, overexpanded clones were more likely to be myeloid-biased clones
and less likely to be self-renewing clones compared to a random distribution (Figure
3.3c). Conversely, we also found that myeloid-biased Tet2 KO clones were more likely
to overexpand than lymphoid-biased and self-renewing Tet2 KO clones (Figure 3.3d).
These data suggest that myeloid-biased HSC clones are more prone to develop
excessive expansion upon Tet2 deletion.
While overexpanded clones on average contributed significantly more to HSCs,
MPP3s and granulocytes than Tet2 KO normally expanded clones and clones in control
mice (Figure 3.2f-3.2h), we found that different subsets of the overexpanded clones
made significantly different contributions (Figure 3.3e). While both myeloid-biased and
self-renewing overexpanded clones contributed substantially to HSCs (Figure 3.3e),
expansion at the MPP3 and granulocyte stages was specifically attributed to
myeloid-biased overexpanded clones. In particular, myeloid-biased overexpanded clones
contributed significantly more to MPP3s and granulocytes than Tet2 KO normally
expanded clones and clones in control mice, and more than lymphoid-biased
overexpanded clones and self-renewing overexpanded clones (Figure 3.3e). Therefore,
myeloid-biased overexpanded clones play a major role in driving the observed
population-level expansion at the MPP3 and granulocyte stages.
Myeloid-biased overexpanded and normally expanded Tet2 KO clones express
genes differently
Since overexpanded myeloid-biased clones constituted only a small subset of all Tet2
KO clones but played a dominant role in driving the expansion of HSC, MPP3, and
granulocyte populations (Figure 3.3), we asked whether distinct gene expression
characteristics underlay their expansion. We used 10X Chromium single cell RNA
sequencing to profile single cell transcriptomes and recovered the clonal tracking
barcodes from the cDNAs of individual cells as previously described (Figure 3.4a and
Figure 3.S7)(Contreras-Trujillo et al., 2021). By mapping the clonal tracking barcodes
with single cell cDNA barcodes (Figure 3.4a), we were able to compare HSCs and
MPP3s derived from overexpanded Tet2 KO myeloid-biased clones and Tet2 KO
myeloid-biased normally expanded clones in order to identify genes specifically
associated with different levels of clonal expansion (Figure 3.4b-4g).
Our results show that the most significantly differentially expressed genes were
down regulated in overexpanded clones compared to the normally expanded clones
(Figure 3.4c and 3.4f), consistent with reports that Tet2 KO HSCs undergo DNA
hypermethylation(Ko et al., 2011; Tulstrup et al., 2021). Critically, many genes
differentially expressed between overexpanded and normally expanded Tet2 KO
myeloid-biased clones exhibited little or no difference in their expression between
normally expanded Tet2 KO clones and control clones (Figure 3.4d, 3.4g), indicating
their specific association with excessive clonal expansion instead of with the Tet2
mutation.
Many differentially expressed genes that we identified had already been shown
to play essential roles in cancer genesis. For example, Smap1, Etv6, Rbm25, Tut7, and
Ythdc1 identified from myeloid-biased overexpanded HSCs (Figure 3.4d) have been
implicated in cancer-related pathways and leukemia (Cheng et al., 2021; Di Paola &
Porter, 2019; Fu et al., 2014; Ge et al., 2019; Kon et al., 2013). Similarly,
genes identified from myeloid-biased overexpanded MPP3s, such as Uba52, Erh,
Hspa8, Jun, Fut8, and Fos (Figure 3.4g), have also been implicated in cancer (Dai et al.,
2020; Han et al., 2012; Li & Ge, 2021; Siegmund et al., 2001; Weng et al., 2015; C.
Zhou et al., 2017). In particular, Jun and Fos have been identified as proto-oncogenes
that cause normal cells to become malignant when mutated. Therefore, our approach of
comparing cells carrying the same Tet2 genetic mutation but expanding at different
levels identified genes underlying cancer genesis. These results highlight the disease
relevance of the molecular differences between clones exhibiting different levels of
preleukemic clonal expansion.
Inhibition of RNA splicing underlies Tet2 KO-induced preleukemic clonal
expansion
To identify the biological processes underlying Tet2 KO-induced clonal expansion, we
performed gene ontology analysis of differentially expressed genes between myeloid-
biased overexpanded Tet2 KO clones and normally expanded Tet2 KO clones (Figure
3.4h and 4f). At both HSC and MPP3 levels, these genes were significantly enriched for
biological processes related to RNA processing, particularly mRNA splicing (Figure 3.4h
and Figure 3.S8). Moreover, these genes were also significantly relevant to acute
myeloid leukemias (AML) (Figure 3.4i and Figure 3.S8), a major disease associated
with TET2 mutations (Bacher et al., 2010; Metzeler et al., 2011; Rasmussen et al.,
2015; Tulstrup et al., 2021).
To further verify the relevance of RNA processing and AML in Tet2 KO-induced
excessive clonal expansion, we compared the average expression levels of all RNA
processing and AML related genes in overexpanded clones and normally expanded
clones. We found that these genes were expressed at significantly lower levels in
overexpanded Tet2 KO myeloid-biased HSCs compared to normally expanded Tet2 KO
myeloid-biased clones and myeloid-biased control clones without the Tet2 mutation
(Figure 3.4j). These data suggest that RNA processing plays an essential role in driving
preleukemic clonal expansion upon Tet2 mutation.
Rbm25 modulates the expansion of Tet2 mutant HSPC clones
To determine the functional role of the RNA processing genes that we identified, we
further investigated Rbm25, an RNA splicing factor that was significantly down
regulated in Tet2 KO myeloid-biased overexpanded HSCs compared to Tet2 KO
myeloid-biased normally expanded HSCs and control myeloid-biased HSCs without
Tet2 mutation (Figure 3.4d). Rbm25 was selected because it is among the most
significantly differentially expressed RNA splicing factors in our analysis (Figure 3.4c,
4d). Moreover, RBM25 is expressed at significantly lower levels in human AML blasts
that carry mutations in TET2 or other preleukemia genes encoding epigenetic factors
such as DNMT3A, ASXL1, IDH2, and NPM1 (Döhner et al., 2017) (Figure 3.5a).
To determine if the reduced Rbm25 expression triggered the expansion of Tet2
KO hematopoietic cells, we used the dCas9-KRAB system to downregulate Rbm25
expression in three mouse leukemic cell lines, BB88, M-NFS-60, and M1, that were
genetically modified to be Tet2 KO and to express dCas9-KRAB (Figure 3.5b and
Figure 3.S9). We found that all Tet2 KO cell lines proliferated significantly faster when
transduced with sgRNAs targeting Rbm25 compared to those transduced with non-
genome targeting sgRNAs (Figure 3.5c). This result suggests that Rbm25 repression
accelerates the expansion of Tet2 KO hematopoietic cells.
To determine if Rbm25 repression triggers the expansion of Tet2 KO HSPCs in
vivo, we isolated HSCs from mice carrying heterozygous Tet2 KO and dCas9-KRAB
and transduced them with Rbm25 targeting or non-genome targeting sgRNAs. These
HSCs were then transplanted into lethally irradiated wildtype recipients (Figure 3.5d and
Figure 3.S9). Three months post-transplantation, Rbm25 down regulation had already
induced significantly increased expansion of cKit+ Sca1+ lineage− progenitors, specifically
the MPP3 sub-population (Figure 3.5e and 3.5f). Additionally, the common myeloid
progenitor (CMP) population also significantly expanded (Figure 3.5f), further
demonstrating the myeloid specific expansion. This result suggests that Tet2 KO
myeloid progenitors undergo increased expansion when Rbm25 expression is
suppressed.
Tet2 KO induces variable expression of Rbm25 that subsequently alters the
splicing of Bcl2l1 mRNA
Previous bulk RNAseq analyses suggest that Rbm25 expression is down regulated
upon Tet2 KO (X. Zhang et al., 2016). Our single cell analysis showed that Rbm25 down
regulation took place specifically in overexpanded clones (Figure 3.4d). We found that
Tet2 KO resulted in increased variability of Rbm25 expression among individual HSCs
(Figure 3.5g). HSCs with lower levels of Rbm25 expression exhibited higher levels of
clonal expansion upon Tet2 KO (Figure 3.5h). This correlation only existed among Tet2
KO clones and not among control clones (Figure 3.5h), indicating that Tet2 mutation
was a prerequisite for Rbm25 to modulate clonal expansion.
To determine how Rbm25 repression induced clonal expansion in conjunction
with Tet2 KO, we examined RNA splicing in primary mouse cKit+ Sca1+ lineage-
progenitors that were transduced with Rbm25 targeting sgRNAs and carried
heterozygous Tet2 KO and dCas9-KRAB (Figure 3.5f-g). We found that Rbm25
repression changed the splicing of Bcl2l1 mRNA (Figure 3.5i). Specifically, HSPCs with
Rbm25 down regulation expressed significantly less short splicing isoform and
significantly more long splicing isoform of Bcl2l1. The former is known to promote
apoptosis and the latter is known to inhibit apoptosis (A. Zhou et al., 2008). Together,
these data suggest that Tet2 KO results in variable expression of Rbm25, which
dysregulates splicing of Bcl2l1 mRNA and promotes clonal expansion (Figure 3.5j).
Discussion
While the heterogeneity in prognosis of leukemic mutations across individuals has been
well recognized (Dorsheimer et al., 2019; Jaiswal & Ebert, 2019; Lindsley et al., 2017),
little is known about the underlying cellular and molecular mechanisms. Here, we
provide original experimental evidence that this may be partially attributed to different
expansion levels induced by a single mutation among individual cells. We identify a
surprisingly small number of clones whose expansion levels upon Tet2 KO exceed
normal expansion levels in control mice (Figure 3.2). Furthermore, we show that
overexpanded clones express low levels of RNA splicing factors (Figures 3.4-3.5) and that the
overexpansion takes place at specific stages of myeloid differentiation (Figures 3.1-3.2).
These findings suggest that sporadic preleukemic mutations in humans do not always
induce excessive clonal expansion. Additional factors such as reduced expression of
RNA splicing factors are required as shown in our model. Although preleukemic
mutations may have other effects that increase leukemia risk such as altering gene
expression and genetic stability, clonal expansion is considered as an initial step of
cancer genesis (Cartier et al., 2009; Cavazzana-Calvo et al., 2010; Hacein-Bey-Abina et
al., 2003, 2014). The variability in clonal expansion induced by a single preleukemic
mutation could explain the risk and rarity of disease genesis in humans.
Genes involved in RNA splicing can generate alternatively spliced transcripts that
may have different stabilities and translation efficiencies and may encode proteins with
different functions (Papaemmanuil et al., 2016). Previous studies have shown that these
genes are often mutated in hematopoietic malignancies and clonal hematopoiesis
(Inoue et al., 2016; Jaiswal et al., 2014; Y. Zhang et al., 2021). Moreover, mutations of
RNA splicing factors have been found to drive preleukemic diseases such as MDS (Kim
et al., 2015; Malcovati et al., 2020). Here, we show that RNA splicing factors are
significantly down regulated in overexpanded clones (Figure 3.4). Furthermore,
knocking down Rbm25, one of the RNA splicing factors that we have identified,
accelerates the expansion of Tet2 KO hematopoietic cells in vitro and in vivo (Figure
3.5). Repression of Rbm25 has been shown to increase the proliferation of AML cell
lines (Ge et al., 2019). In addition, Rbm25 repression does not affect the
expansion of primary wildtype HSPCs (Ge et al., 2019). This is in line with our
data that Rbm25 expression is negatively correlated with clonal expansion only in Tet2
KO HSCs and not in control HSCs (Figure 3.5h). Our data further reveal a new role for
Rbm25 in driving Tet2 KO associated clonal expansion in primary mouse HSPCs. Using
Tet2 and Rbm25 as examples, our data suggest a potential general mechanism in which
splicing alterations act as a secondary driver following epigenetic alterations during
the multiple steps of leukemia genesis. This discovery suggests that RNA splicing could
be used to further stratify high-risk patients and as a therapeutic target to prevent
clonal hematopoiesis and cancer genesis.
Our study reveals cell-cell variability in clonal expansion during the preleukemic
stage. Cellular heterogeneity has been increasingly recognized in cancer progression
and remains a major challenge in developing viable treatments (Meacham & Morrison,
2013). Our data here demonstrate that deletion of Tet2, which encodes an epigenetic
factor, in otherwise normal HSCs induces variability in the expression of RNA
splicing factors, such as Rbm25, that subsequently leads to heterogeneous clonal
expansion (Figures 3.4-3.5). These findings reveal the stochastic nature of cancer genesis as
observed in clinical settings (Busque et al., 2012; Genovese et al., 2014; Jaiswal et
al., 2014). Our unique approach of comparing overexpanded with normally expanded
clones brings new insights into the molecular events underlying preleukemic clonal
expansion, different from previous studies relying on population-level comparisons
between Tet2 mutant and normal cells. This approach can be extended to studying
other types of cancer and diseases driven by cellular expansion.
Acknowledgments
The authors thank: The USC Stem Cell Flow Cytometry Core, CHLA Sequencing Core,
and UCI Genomics High Throughput Facility for technical support. We also thank
Jasper Rubin-Sigler for providing dCas9-KRAB-GFP lentivirus used for in vitro
experiments. R.L. is a Scholar of the Leukemia & Lymphoma Society (LLS-1370-20)
and a Richard N. Merkin Assistant Professor, supported by NHLBI grants
R35HL150826, R01HL138225, R01HL135292, and K99/R00HL113104. C.B. was
supported by the National Heart, Lung, and Blood Institute grant 1F31HL149278-01A1. J.E.
and I.G. were supported by California Institute for Regenerative Medicine grants
EDUC4-12756 and EDUC2-12607. The content is solely the responsibility of the authors and
does not necessarily represent the official views of the National Heart, Lung, and Blood
Institute or the National Institutes of Health. This manuscript was edited at Life Science
Editors.
Author information
Authors and Affiliations
Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad
Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine,
University of Southern California, Los Angeles, CA, 90033, USA
Charles Bramlett, Jiya Eerdeng, Du Jiang, Patrick Condie, Yeachan Lee, Mary Vergel,
Ivon Garcia, Anna Nogalska, Rong Lu
Contributions
R.L. conceptualized the project, designed the experiments, and wrote the manuscript.
C.B. conceptualized the project, designed and performed the experiments, performed
data analysis, and wrote the manuscript. J.E., D.J., and P.C. performed data analysis and
generated figures. Y.L., M.V., and I.G. assisted with experiments and data collection.
A.N. assisted in writing and editing the manuscript.
METHODS
Mice
Mice used for these studies were purchased from Jackson Laboratories. Inducible Tet2
knockout mice were generated by crossing B6;129S-Tet2tm1.1Iaai/J (Tet2fl/fl, CD45.2,
stock No. 017573) and B6.129-Gt(ROSA)26Sortm1(Cre/ERT2)Tyj/J (Rosa26Cre+/+,
CD45.2, stock No. 008463). Their offspring Rosa26Cre+/- Tet2fl/fl were used for inducing
Tet2 deletion, and their offspring Rosa26Cre-/- Tet2fl/fl from the same litter were used as
controls. To induce excision of floxed alleles, 4 mg/40 g tamoxifen was intraperitoneally
injected into each mouse every other day for one week. Tet2+/- dCas9-KRAB mice were
generated by crossing B6(Cg)-Tet2tm1.2Rao/J (Tet2+/-, CD45.2, stock No. 023359) and
B6.Cg-Igs2tm1(CAG-mCherry,-cas9/ZNF10*)Mtm/J (dCas9-KRAB+/+, CD45.2, stock
No. 030000). Donor mice were 12 to 16 weeks old. Transplantation recipients were
12- to 16-week-old wildtype B6.SJL-Ptprca Pepcb/BoyJ (CD45.1, stock No. 002014).
Flushed femur bone marrow of C57BL/6J and B6.SJL-Ptprca Pepcb/BoyJ
(CD45.1/CD45.2, stock Nos. 000664 & 002014) offspring was transplanted to help
recipient mice recover from irradiation. Mice were bred and maintained at the
University of Southern California Department of Animal Research facility. All animal
procedures were approved by the Institutional Animal Care and Use Committee.
Hematopoietic stem cell isolation, transduction, and transplantation
Hematopoietic stem cells (HSCs, lineage [TCR, CD4, CD8, B220, Gr1, Mac1, Ter119]− /
cKit+ / Sca1+ / Flk2− / CD34− / CD150+) were obtained from the crushed bones of donor
mice and enriched using CD117 microbeads (Miltenyi Biotec, Auburn, CA). HSCs were
sorted via fluorescence-activated cell sorting (FACS) on the FACS-Aria III (BD
Biosciences, San Jose, CA). HSCs were transduced for 15 hours with barcode/sgRNA
lentivirus and washed three times before transplantation. Recipient mice were
preconditioned with 950 cGy x-ray irradiation before transplantation. Each recipient
mouse received 4,000-5,000 HSCs along with 200,000 whole bone marrow cells. Cells
were transplanted via retro-orbital injection.
Blood and bone marrow collection
Peripheral blood was collected into 10mM EDTA in PBS solution via a small transverse
cut in the tail vein. A 2% dextran solution was used to separate red blood cells from
white blood cells. Samples were treated with an ammonium-chloride-potassium lysis
buffer for 5 minutes on ice to eliminate remaining red blood cells. White blood cells were
resuspended in 2% fetal bovine serum in PBS. At the end time point, peripheral blood
was collected by transcardial perfusion immediately after mice were sacrificed. Perfused
blood was then processed in the same way as that collected from the tail. Bone marrow
was collected by crushing the bones, and was split into two aliquots. The first aliquot
was enriched using CD117 microbeads and was used for sorting hematopoietic stem
and progenitor cells (HSPCs). The second aliquot was not enriched and was used for
analyzing the proportions of HSPCs in the bone marrow.
Flow cytometry and FACS
Blood and bone marrow cells were stained with fluorescent antibodies and sorted on the
FACS-Aria III. Blood cells were sorted to collect T cells, B cells, granulocytes, and
monocytes. Bone marrow cells were sorted to collect HSCs, MPP1/2s, MPP3s, MPP4s,
CMPs, GMPs, MEPs, and CLPs. Antibodies were obtained from eBioscience and
BioLegend as described previously (Lu et al., 2019; L. Nguyen et al., 2018). Donor cells
were sorted based on the CD45.2 marker and barcoded cells were sorted based on
GFP. Dead cells were distinguished using 4′,6-diamidino-2-phenylindole or propidium
iodide viability dyes. The following cell surface markers were used to sort hematopoietic
populations:
• T cell: B220−/CD19−/Mac1−/Gr1−/TCRαβ+/CD4+/CD8+
• B cell: CD4−/CD8−/Gr1−/Mac1−/B220+/CD19+
• Granulocyte: CD4−/CD8−/B220−/CD19−/Mac1+/Gr1+
• Monocyte: CD4−/CD8−/B220−/CD19−/Mac1+/CD115+
• HSC: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/cKit+/Sca1+/Flk2−/CD34−/CD150+
• MPP1/2: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/cKit+/Sca1+/Flk2−/CD34+/CD150+
• MPP3: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/cKit+/Sca1+/Flk2−/CD150−
• MPP4: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/cKit+/Sca1+/Flk2+/CD150−
• CMP: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/cKit+/Sca1−/FcγR−/CD34+
• GMP: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/cKit+/Sca1−/FcγR+/CD34+
• MEP: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/cKit+/Sca1−/FcγR−/CD34−
• CLP: lineage (TCR, CD4, CD8, B220, Gr1, Mac1, Ter119)−/Flk2+/Il7rα+
DNA barcode extraction, sequencing, and analysis
Genomic DNA was extracted from sorted cells and PCR amplified as previously
described (Bramlett et al., 2020; Brewer et al., 2016; Lu et al., 2011, 2019; L. Nguyen et
al., 2018). PCR product was purified and analyzed using high-throughput sequencing.
Sequencing data were used in combination with FACS data to calculate the abundance
of each clone using the formulas below. Unique DNA barcodes were used for clonal
analysis if their abundance was greater than 0.1% of any cell type (Figure 3.1, Figure
3.S2) or 0.1% of HSCs (Figures 3.2-3.4, Figure 3.S2).
Clonal abundance amongst mononuclear cells (% MNCs):
100% × (Each cell population % MNCs) × (Donor % Each cell population) × (GFP %
Donor cells) × (number of reads for each barcode) / (total reads of all barcodes)
Clonal abundance within a cell type (% cell type):
100% × (Donor % Each cell population) × (GFP % Donor cells) × (number of reads for
each barcode) / (total reads of barcodes)
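The two formulas above can be sketched in code as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and the FACS-derived percentages are assumed to be supplied as fractions between 0 and 1.

```python
def clonal_abundance(barcode_reads, donor_frac, gfp_frac, pop_frac_of_mncs=None):
    """Return {barcode: abundance in percent} for one sorted cell population.

    barcode_reads    -- {barcode: read count} from high-throughput sequencing
    donor_frac       -- donor cells as a fraction of the population (FACS)
    gfp_frac         -- GFP+ (barcoded) cells as a fraction of donor cells (FACS)
    pop_frac_of_mncs -- population as a fraction of all MNCs; if given, the result
                        is "% MNCs", otherwise it is "% cell type"
    """
    total_reads = sum(barcode_reads.values())
    scale = 100.0 * donor_frac * gfp_frac
    if pop_frac_of_mncs is not None:
        scale *= pop_frac_of_mncs
    return {bc: scale * n / total_reads for bc, n in barcode_reads.items()}


def passes_threshold(abundance_by_cell_type, threshold=0.1):
    """True if a clone exceeds the threshold (in percent) in any listed cell type."""
    return any(a > threshold for a in abundance_by_cell_type.values())


# Hypothetical example: two barcodes in a population that is 80% donor-derived,
# of which 50% is GFP+, and that makes up 10% of all MNCs.
reads = {"barcode_A": 750, "barcode_B": 250}
pct_cell_type = clonal_abundance(reads, donor_frac=0.8, gfp_frac=0.5)
pct_mncs = clonal_abundance(reads, donor_frac=0.8, gfp_frac=0.5, pop_frac_of_mncs=0.1)
```

Passing only the HSC abundance to `passes_threshold` corresponds to the HSC-based criterion; passing all cell types corresponds to the "any cell type" criterion.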
Single-cell RNA sequencing and analysis
Single-cell RNA sequencing was performed following the manufacturer’s protocol for the
Chromium Single Cell 3’ Library (10x Genomics, V3.1). Barcoded HSCs and MPP3s
(GFP+) were sorted and loaded into the Chromium chip. The gene expression libraries
were sequenced for preliminary quality control purposes using an Illumina MiSeq at low
coverage (~200 raw reads per cell). The gene expression libraries were then sequenced
using an Illumina NovaSeq and HiSeq 4000 until coverage of 50,000 raw reads per cell
was reached (paired end; read 1: 26 cycles; i7 index: 8 cycles; read 2: 90 cycles). Raw
data were processed using the Cell Ranger pipeline (10x Genomics, v2.1.0) for cellular
barcode assignment and unique molecular identifier (UMI) quantification. From four HSC
and four MPP3 10x reactions, we successfully recovered 40,072 HSC and MPP3 cells
from the sequencing data. We excluded cells with more than 10% of UMIs mapped to
mitochondrial genes, and cells with total UMI counts or gene counts more than three
median absolute deviations from the median of all cells. After filtering, we analyzed
4,302 control and 5,820 Tet2 KO HSCs as well as 8,701 control and 15,059 Tet2 KO
MPP3s. Genes with 3 or more UMIs in at least 5% of cells were used for downstream
analyses. Expression values for gene i in cell j were calculated by dividing the UMI count
for gene i by the sum of the UMI counts in cell j, then multiplying by 10,000
to create transcripts per million (TPM)-like values, and finally calculating (TPM + 1) as
gene expression values. For comparing single-cell gene expression data, P values were
calculated using the one-sided Mann–Whitney U-test.
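The cell filtering and normalization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: each cell is represented as a hypothetical {gene: UMI count} dictionary, the median absolute deviation is used unscaled and applied to raw counts, and the gene names in the example are placeholders.

```python
from statistics import median


def within_mads(values, n_mads=3.0):
    """For each value, True if it lies within n_mads MADs of the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) <= n_mads * mad for v in values]


def qc_filter(cells, mito_genes, max_mito_frac=0.10, n_mads=3.0):
    """Drop cells with too many mitochondrial UMIs or outlying UMI/gene counts."""
    total_umis = [sum(c.values()) for c in cells]
    gene_counts = [sum(1 for v in c.values() if v > 0) for c in cells]
    ok_umis = within_mads(total_umis, n_mads)
    ok_genes = within_mads(gene_counts, n_mads)
    kept = []
    for cell, total, ok_u, ok_g in zip(cells, total_umis, ok_umis, ok_genes):
        mito = sum(v for g, v in cell.items() if g in mito_genes)
        mito_frac = mito / total if total else 1.0
        if mito_frac <= max_mito_frac and ok_u and ok_g:
            kept.append(cell)
    return kept


def normalize(cell, scale=10_000):
    """Scale UMI counts to a fixed per-cell total and add 1, as stated in the text."""
    total = sum(cell.values())
    return {g: scale * v / total + 1 for g, v in cell.items()}


# Hypothetical toy data: five cells, two genes each.
cells = [
    {"Rbm25": 95, "mt-Co1": 5},
    {"Rbm25": 105, "mt-Co1": 5},
    {"Rbm25": 70, "mt-Co1": 20},   # more than 10% mitochondrial UMIs
    {"Rbm25": 100, "mt-Co1": 5},
    {"Rbm25": 990, "mt-Co1": 10},  # total UMI count is an outlier
]
kept = qc_filter(cells, mito_genes={"mt-Co1"})
```

Some tools multiply the MAD by a consistency constant before thresholding; the text does not say whether that was done, so the unscaled form is used here.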
Integrating clonal tracking data and single cell RNA sequencing data
cDNA was used to PCR amplify molecules that contain both the clonal tracking
barcodes (TBCs) and the 10x Chromium cellular barcodes (CBCs). PCR was performed
using the single-cell cDNA library as the template and a single primer
(ACACTCTTTCCCTACACGACGCTCTTCCGATCT). The PCR products were purified
from an agarose gel (Zymo Research). These molecules were then sequenced using
the SMRT sequencing platform (Sequel II, Pacific Biosciences) to map TBCs to
CBCs, thereby connecting clonal tracking data and gene expression data at the
single cell level. Raw sequencing data were analyzed using SMRT Analysis software
(Pacific Biosciences, SMRT Link Version 5.1.0) with default parameters. Mapping
between single-cell gene expression and the clonal tracking data was established when
one or more molecules contained a 100% exact match to a TBC and a CBC. No
misreads were allowed during this mapping. Chromium cellular barcodes that were
mapped to more than one clonal tracking barcode were excluded from downstream
analyses. Around 10-15% of cells with single-cell transcriptome data were successfully
mapped to a tracking barcode. We successfully mapped 146 control, 100 Tet2 KO
normally expanded, and 204 Tet2 KO overexpanded myeloid-biased HSCs, as well as
365 control, 363 Tet2 KO normally expanded, and 599 Tet2 KO overexpanded myeloid-
biased MPP3s, that were used in differential gene expression analysis of Figure 3.4.
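The mapping rule described above (exact matches only, and exclusion of cellular barcodes linked to more than one tracking barcode) can be sketched as follows. This is an illustrative sketch only: the function name, the input format (one (CBC, TBC) pair per sequenced molecule), and the short placeholder barcodes are assumptions, not the study's code.

```python
from collections import defaultdict


def map_cells_to_clones(molecules, known_cbcs, known_tbcs):
    """Return {CBC: TBC} for cells that map unambiguously to one clone.

    molecules  -- iterable of (cbc, tbc) pairs read from long-read molecules
    known_cbcs -- set of cellular barcodes from the single-cell expression data
    known_tbcs -- set of clonal tracking barcodes from the clonal tracking data
    """
    tbcs_per_cell = defaultdict(set)
    for cbc, tbc in molecules:
        # Set membership enforces a 100% exact match; no mismatches are tolerated.
        if cbc in known_cbcs and tbc in known_tbcs:
            tbcs_per_cell[cbc].add(tbc)
    # Cells linked to more than one tracking barcode are excluded.
    return {cbc: next(iter(tbcs))
            for cbc, tbcs in tbcs_per_cell.items() if len(tbcs) == 1}


# Hypothetical toy input with shortened placeholder barcodes.
molecules = [
    ("AAAC", "T1"), ("AAAC", "T1"),   # one cell, one clone: kept
    ("CCCG", "T1"), ("CCCG", "T2"),   # one cell, two clones: excluded
    ("GGGT", "T9"),                   # tracking barcode not in the known set: ignored
    ("TTTA", "T2"),
]
mapping = map_cells_to_clones(
    molecules,
    known_cbcs={"AAAC", "CCCG", "GGGT", "TTTA"},
    known_tbcs={"T1", "T2"},
)
```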
Gene Ontology analysis
Gene ontology analysis was performed using ToppFun (ToppGene,
https://toppgene.cchmc.org/). Differentially expressed genes with a q-value < 0.1 were
used for this analysis. Significant terms in ‘Biological Process’ and ‘Disease’ were
compared between HSC and MPP3 to identify terms shared by both cell types. The mRNA
metabolic process (GO:0016071, 729 genes) and acute myeloid leukemia (C0026998,
125 genes) gene sets were used in Figure 3.4j.
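As an illustration of the q-value cutoff, the sketch below computes Benjamini–Hochberg adjusted P values and selects genes below 0.1. It assumes that the q-values are Benjamini–Hochberg adjusted P values, as stated for the adjusted P values in the Figure 3.4 legend; the function name, gene labels, and P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four genes.
genes = ["gene1", "gene2", "gene3", "gene4"]
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
selected = [g for g, q in zip(genes, q_values) if q < 0.1]
```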
AML gene expression data
AML RNAseq data were obtained from the BEAT AML data viewer (Vizome). Patients
with TET2 mutations and healthy controls were exclusively used for the analysis shown
in Figure 3.5a. Risk stratification for the ELN2017 cohort was determined by cytogenetic
classifications(Döhner et al., 2017).
Cell lines and culture
Leukemic cell lines were purchased from ATCC (#TIB-55, #CRL-1838, #TIB-192). BB88
cells were cultured in DMEM with 10% FBS, 1% penicillin/streptomycin, and 1% L-
glutamine. M-NFS-60 cells were cultured in RPMI with 10% FBS, 1%
penicillin/streptomycin, 1% L-glutamine, 0.05 mM 2-mercaptoethanol, and 62 ng/ml
human recombinant macrophage colony stimulating factor (M-CSF). M1 cells were
cultured in RPMI with 10% FBS, 1% penicillin/streptomycin, and 1% L-glutamine.
CRISPR/Cas9 genome editing
Tet2 mutation was introduced into leukemic cell lines via a Cas9/sgRNA ribonucleoprotein
(RNP) complex designed and produced through Synthego. Three single guide RNAs
(sgRNAs) were used to create indels (~100 bp) with high knockout efficiency. RNPs
were electroporated using the Neon Transfection System (Thermo Fisher). Cells were
electroporated with 0.5 µg GFP mRNA. 24 hours post electroporation, GFP+ cells were
sorted to enrich for Tet2 KO cells.
CRISPRi via dCas9-KRAB
For in vitro experiments, we transduced cell lines with dCas9-KRAB-GFP (Addgene, #71237)
lentivirus and sorted for GFP+ cells before sgRNA transduction. For in vivo
experiments, Tet2+/- dCas9-KRAB mice were used. sgRNAs targeting Rbm25 were
designed using the Broad Institute CRISPick online tool. The sgRNAs were cloned into
the BsmBI site of the pLKO5.sgRNA.EFS.tRFP lentiviral backbone (Addgene #57823).
Three sgRNAs were cloned to overcome the limitations of the CRISPick algorithm. Five
non-genome-targeting sgRNAs were cloned as controls for in vitro and in vivo
experiments. Correct insertion of sgRNA was verified by Sanger sequencing.
Lentiviruses carrying the plasmids were generated in HEK293T cells (ATCC, cat. no.
CRL-3216, RRID: CVCL_0063) by co-transfecting the packaging plasmids psPAX.2 and
pCMV-VSV-G using BioT reagent (Bioland Scientific) following manufacturers’
protocols. sgRNAs were mixed prior to lentiviral packaging to generate a single lentiviral
pool per gene target. Sorted Tet2+/- dCas9-KRAB HSCs or Tet2-/- dCas9-KRAB
leukemic cell lines were transduced in X-Vivo 15 media (Lonza, Walkersville Inc.) with
10 µg/mL polybrene.
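The sgRNA cloning oligos listed in Table 3.1 follow a regular pattern: the forward oligo is CACCG followed by the 20-nt guide, and the reverse oligo is AAAC followed by the reverse complement of the guide and a final C. The sketch below reproduces that pattern. The pattern is inferred from the table rather than from a stated protocol, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sgrna_cloning_oligos(guide):
    """Build forward/reverse oligos with the overhangs seen in Table 3.1."""
    guide = guide.upper()
    forward = "CACCG" + guide
    reverse = "AAAC" + reverse_complement(guide) + "C"
    return forward, reverse


# Guide sequence taken from the Rbm25 F-1 oligo in Table 3.1 (without CACCG).
rbm25_f1, rbm25_r1 = sgrna_cloning_oligos("TCCGAAGCGCCTACTCGCCT")
```

Running the same function on the other guide sequences in Table 3.1 is a quick way to check an oligo order for transcription errors.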
RNA isolation, cDNA synthesis, and quantitative PCR
RNA was isolated from the BB88 cell line or primary hematopoietic stem and progenitor cells
using the Quick-RNA Miniprep Plus Kit (Zymo Research). cDNA was then synthesized using
the qScript cDNA Synthesis Kit (Quantabio). RT-qPCR was performed using PerfeCTa
SYBR Green, Low ROX FastMix (Quantabio) on a ViiA 7 Real-Time PCR System
(Thermo Fisher). Expression was normalized using the housekeeping gene Gapdh.
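One common way to normalize RT-qPCR data to a housekeeping gene is the 2^-ΔCt calculation, sketched below. The text does not state which calculation was used, so this is an assumption for illustration only; it also assumes roughly 100% primer efficiency, and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to a reference gene (2^-dCt)."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Ratio of normalized expression in a test sample versus a control sample."""
    test = relative_expression(ct_target_test, ct_ref_test)
    ctrl = relative_expression(ct_target_ctrl, ct_ref_ctrl)
    return test / ctrl


# Hypothetical Ct values: the target gene versus Gapdh in cells with
# gene-targeting sgRNAs (test) and non-targeting sgRNAs (control).
fc = fold_change(ct_target_test=25.0, ct_ref_test=20.0,
                 ct_target_ctrl=24.0, ct_ref_ctrl=20.0)
```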
MTS proliferation assay
5,000 cells per well were seeded in 96-well plates in cell culture media with 2% FBS. 8
wells were used per experimental group. MTS reagent (Abcam, ab197010) was added
and incubated for 3 hours. Cell number was determined by 490 nm absorbance using a
Spectramax i3x plate reader (Molecular Devices). Initial absorbance was analyzed
immediately after plating. Different wells were then analyzed at 24 and 48 hours after
plating.
FIGURES & LEGENDS
Figure 3.1. Heterogeneous cellular expansion induced by Tet2 knockout
(a) Experimental workflow. HSCs from Rosa26-CreERT2+/- Tet2fl/fl mice and
Rosa26-CreERT2-/- Tet2fl/fl littermate mice were barcoded and transplanted into wildtype
recipients. Tet2 knockout (KO) was induced 1 month post-transplantation via tamoxifen injection.
Control mice n=4, Tet2 KO mice n=6. HSPCs, hematopoietic stem and progenitor cells.
WBCs, white blood cells.
(b) Peripheral blood analysis by FACS. Shown are the abundance of B cells and
granulocytes in control and Tet2 KO mice. Error bars represent SEM. Two-tailed
Student’s t-test.
(c) FACS analysis of hematopoietic stem and progenitor cells (HSPCs) at the end point,
13 months post-transplantation. Bars show the mean of control and Tet2 KO groups.
MNC, mononuclear cells. BM, bone marrow. Two-tailed Student’s t-test.
(d) Clonal abundance of the top 15 most abundant clones in each mouse. Each dot
represents one clone. Bars show the median of control and Tet2 KO groups. Control
clones n=60, Tet2 KO clones n=90. Two-tailed Mann-Whitney U test.
(e-g) Distribution of granulocyte clonal abundance in the peripheral blood (PB) (e) and
distribution of MPP3 (f) and HSC (g) clonal abundance in the bone marrow. Bottom x-
axis denotes the percent of unique clones in respective groups shown as bars. Top x-
axis denotes the total abundance as a percent of corresponding cell type for each bar
shown as red dots. Yellow highlights bars that represent high abundance clones only
found in the Tet2 KO group.
* P value <.05, ** P value <.01, *** P value <.001.
Figure 3.2. Preleukemic clonal expansion in Tet2 KO mice is driven by rare,
overexpanded clones.
(a) Tet2 KO overexpanded clones are defined as clones whose abundances are greater
than the most abundant clone in control mice. Blue bar shows the average abundance
of the most abundant clone from each of the four control mice, which was used as the
threshold to identify overexpanded clones. Each dot represents one overexpanded
clone.
(b) The number of Tet2 KO overexpanded clones in each cell type.
(c) Tet2 KO overexpanded clones across cell types. Each column represents one
unique clone. Some clones are identified as overexpanded in multiple cell types.
(d) Hierarchical clustering of Tet2 KO overexpanded clones based on their abundance
in different cell types.
(c-d) Clones identified from the same mouse are indicated as a unique color in the top
row.
(e) Numbers and abundances of Tet2 KO overexpanded clones among all Tet2 KO
clones.
(f) Comparing HSC and MPP3 clonal abundance in the bone marrow and granulocyte
clonal abundance in the peripheral blood of control, Tet2 KO normally expanded and
overexpanded clones. White dots represent median. Two-tailed Mann-Whitney U test.
**** P value <.0001.
(g) Clonal abundance of granulocytes in the peripheral blood before population-level
expansion is detected. Shaded regions represent SEM. MNC, mononuclear cells.
Figure 3.3. Overexpanded myeloid-biased Tet2 KO clones drive MPP3 and
granulocyte expansion
(a) Hierarchical clustering of all clones from control and Tet2 KO mice identifies three
major clusters. Clones in each cluster are grouped by Tet2 genotype and expansion
profiles.
(b) Average clonal abundance of each cell type in the three major clusters from (a).
Error bars represent SD.
(c) Expected and observed number of Tet2 KO overexpanded clones in each cluster.
Chi-squared test: χ² = 20.5, P value < .0001.
(d) Number of unique clones across the three clusters.
(e) HSC and MPP3 clonal abundance in the bone marrow and granulocyte clonal
abundance in the peripheral blood of control, Tet2 KO normally expanded and Tet2 KO
overexpanded clones across the three clusters. Bars represent median. MNC,
mononuclear cells. Two-tailed Mann-Whitney U test. ns, not significant, ** P value <.01,
*** P value <.001, **** P value <.0001.
Figure 3.4. Overexpanded myeloid biased Tet2 KO clones exhibit significantly
reduced expression of genes associated with RNA splicing and acute myeloid
leukemia.
(a) Workflow to connect clonal tracking data to single cell gene expression profiles.
(b) UMAP visualization of myeloid-biased HSCs.
(c) Differentially expressed genes comparing overexpanded and normally expanded
HSCs that are myeloid-biased and Tet2 KO.
(d) Examples of significantly differentially expressed genes identified in (c).
(e) UMAP visualization of myeloid biased MPP3s.
(b, e) Colors highlight cells identified as derived from myeloid-biased clones. The rest of
cells are shown in grey.
(f) Differentially expressed genes comparing overexpanded and normally expanded
MPP3s that are myeloid-biased and Tet2 KO.
(g) Examples of significantly differentially expressed genes identified in (f).
(d, g) One-sided Mann-Whitney U test. Shown are adjusted P values (Benjamini–
Hochberg).
(h, i) Gene ontology (GO) analysis of differentially expressed genes from (c) and (f).
Shown are all the GO terms associated with biological processes (h) and diseases (i)
shared by the GO analysis of HSC and MPP3 data.
(j) Comparing averaged expression of all genes associated with the GO terms mRNA
metabolic process (RMP, GO:0016071) and acute myeloid leukemia (AML, C0026998).
Two-tailed Mann-Whitney U test.
* P value <.05, ** P value <.01, *** P value <.001, **** P value <.0001.
Figure 3.5. Repression of the RNA splicing factor Rbm25 accelerates expansion
of Tet2 KO HSPCs
(a) Rbm25 expression in AML with TET2 mutations from patients. Controls are CD34+
bone marrow cells from healthy individuals. AML patients are grouped by cytogenetic
risk. Two-tailed Student’s t-test. Data derived from BEAT AML ELN2017 cohort.
(b) Workflow for engineering leukemic cell lines to carry Tet2 KO and Rbm25
expression knockdown. Transduced cells were sorted after each transduction.
(c) MTS cell proliferation assay of Tet2 KO leukemic cell lines. Cells were transduced
with either non-genomic targeting (NT, gray) or Rbm25-targeting (red) sgRNAs. Error
bars represent SEM. Two-tailed Student’s t-test.
(d) Experimental workflow to determine the effect of Rbm25 expression on the
expansion of Tet2 KO HSPCs. HSCs carrying heterozygous Tet2 KO and dCas9-KRAB
were transduced with either non-targeting (NT) sgRNAs or sgRNAs targeting Rbm25.
(e) Fraction of cKit+ Sca1+ Lin- (KLS) among all RFP+ mononuclear cells (MNCs) in the
bone marrow. RFP indicates successful transduction of sgRNAs.
(f) Fraction of various types of HSPCs among all RFP+ MNCs in the bone marrow.
(e,f) Each dot depicts data from one Tet2+/- Cas9-KRAB+/- mouse. NT sgRNA, n=8.
Rbm25 sgRNA, n=7. One-tailed Student’s t-test. * P value <.05, ** P value <.01.
(g) Rbm25 expression in control and Tet2 KO HSCs.
(h) Spearman correlation (rs) of clonal abundance and averaged expression level of
Rbm25 in each myeloid-biased clone. Clonal abundance ranked least to greatest, rank
1 being the least abundant.
(i) Bcl2l1 mRNA isoform abundance in Tet2+/- Cas9-KRAB+/- HSPCs transduced with NT
sgRNAs or Rbm25-targeting sgRNAs. One-tailed Student’s t-test. * P value <.05, ** P
value <.01.
(j) Model depicting how Tet2 KO and Rbm25 repression drive hematopoietic expansion.
Supplementary Figure 3.S1. Tet2 knockout (KO) validation. PCR analysis of Tet2
exon 3 and Gapdh from single cell cDNA libraries of HSCs sorted from mice treated
with tamoxifen (Figure 3.1a and Figure 3.4a). Control: HSCs derived from Cre-/- Tet2fl/fl
mice; Tet2 KO: HSCs derived from Cre+/- Tet2fl/fl mice; no template: no cDNA loaded.
Supplementary Figure 3.S2. Flow cytometry gating for peripheral blood cells. The
four harvested blood cell populations (T cells, B cells, granulocytes, and monocytes) are
highlighted in red. Donor cells were sorted as CD45.2-positive cells. GFP was used to
identify barcoded cells. A list of cell surface markers for each harvested cell population
is included in Methods.
Supplementary Figure 3.S3. Flow cytometry gating for bone marrow
hematopoietic stem and progenitor cells. The eight harvested bone marrow HSPC
populations (HSC, MPP1/2, MPP3, MPP4, CLP, CMP, GMP, MEP) are highlighted in
red. Donor cells were sorted as CD45.2-positive. GFP was used to identify barcoded
cells. The bone marrow cells were enriched using magnetic cKit and IL7Ra antibody
beads before sorting (details described in Methods). A list of cell surface markers for
each harvested cell population is included in Methods.
Supplementary Figure 3.S4. Number of clones detected in each mouse. (a)
Number of clones detected in the peripheral blood. (b) Number of clones detected in the
bone marrow. Each marker represents data from one mouse. ns, non-significant.
Supplementary Figure 3.S5. Clonal abundance of top 5 and top 10 most abundant
clones in each mouse. Each dot represents one clone. Bars show the median of
control and Tet2 KO groups. * P value <.05, ** P value <.01, *** P value <.001, **** P
value <.0001.
Supplementary Figure 3.S6. Hierarchical clustering of all clones from control and
Tet2 KO mice. Each column shows one clone.
Supplementary Figure 3.S7. UMAP visualization of data collected from all HSCs
and MPP3s comparing control and Tet2 KO cells.
Supplementary Figure 3.S8. Number of unique and shared biological processes
and disease gene ontology (GO) terms. GO terms were identified as enriched among
differentially expressed genes in HSCs and MPP3s of overexpanded myeloid biased
clones compared to normally expanded myeloid biased clones (Figures 3.4c and 3.4f).
Supplementary Figure 3.S9. Engineering leukemic cell lines to knock out Tet2 and
knock down Rbm25 expression. (a) Tet2 KO confirmation in three separate leukemic
cell lines. (b) dCas9-KRAB mediated knockdown of Rbm25 expression in the BB88 cell line
and primary mouse cKit+ Sca1+ Lin- (KLS) cells. One-tailed Student’s t-test. * P value < .05. **
P value < .01.
TABLES
Table 3.1 Primers & Oligos
Name                      Primer/Oligo       Sequence                        Note
Rosa26 Cre F              pcr primer         CTGGCTTCTGAGGACCG               mouse genotype
Rosa26 Cre R              pcr primer         CCGAAAATCTGTGGGAAGTC            mouse genotype
Rosa26 F                  pcr primer         CGTGATCTGCAACTCCAGTC            mouse genotype
Rosa26 R                  pcr primer         AGGCAAATTTTGGTGTACGG            mouse genotype
Tet2 Flox F               pcr primer         AAGAATTGCTACAGGCCTGC            mouse genotype
Tet2 Flox R               pcr primer         TTCTTTAGCCCTTGCTGAGC            mouse genotype
dCas9-KRAB F              pcr primer         GGCCATCGGGACTAATAGC             mouse genotype
dCas9-KRAB R              pcr primer         TTCTTCCTCCTGGTGTACCG            mouse genotype
Igs2 F                    pcr primer         TGGAGGAGGACAAACTGGTCAC          mouse genotype
Igs2 R                    pcr primer         TTCCCTTTCTGCTTCATCTTGC          mouse genotype
Tet2 Rao wt F             pcr primer         AGCTGATGGAAAATGCAAGC            mouse genotype
Tet2 Rao universal R      pcr primer         TCTCAGAGCAAAGAGGACTGC           mouse genotype
Tet2 Rao mut F            pcr primer         GCCACTTTAGAAGCCTATTGGA          mouse genotype
Tet2                      sgRNA              GCUUUGGGAAGAUUGCCACU            sgRNA
Tet2                      sgRNA              CUGCUAACGGGCUUCCAUUC            sgRNA
Tet2                      sgRNA              UGGGAGAAGGUGGUGCUAUC            sgRNA
Tet2 F                    pcr primer         TCTTGTTTGGATGGAGCCCA            cell line genotype
Tet2 R                    pcr primer         GCATCTGCTTGCTAGATCTAGGT         cell line genotype
Tet2                      sequencing primer  GCTAGATCTAGGTATTGGTGTAACTGTAAT  cell line genotype
genome non-targeting F-1  oligo              CACCGGTAGCGAACGTGTCCGGCGT       sgRNA oligo
genome non-targeting R-1  oligo              AAACACGCCGGACACGTTCGCTACC       sgRNA oligo
genome non-targeting F-2  oligo              CACCGGACCGGAACGATCTCGCGTA       sgRNA oligo
genome non-targeting R-2  oligo              AAACTACGCGAGATCGTTCCGGTCC       sgRNA oligo
genome non-targeting F-3  oligo              CACCGGGCAGTCGTTCGGTTGATAT       sgRNA oligo
genome non-targeting R-3  oligo              AAACATATCAACCGAACGACTGCCC       sgRNA oligo
genome non-targeting F-4  oligo              CACCGGCTTGAGCACATACGCGAAT       sgRNA oligo
genome non-targeting R-4  oligo              AAACATTCGCGTATGTGCTCAAGCC       sgRNA oligo
genome non-targeting F-5  oligo              CACCGGTGGTAGAATAACGTATTAC       sgRNA oligo
genome non-targeting R-5  oligo              AAACGTAATACGTTATTCTACCACC       sgRNA oligo
Rbm25 F-1                 oligo              CACCGTCCGAAGCGCCTACTCGCCT       sgRNA oligo
Rbm25 R-1                 oligo              AAACAGGCGAGTAGGCGCTTCGGAC       sgRNA oligo
Rbm25 F-2                 oligo              CACCGCAGGAGCTCCCCGCGCAGCG       sgRNA oligo
Rbm25 R-2                 oligo              AAACCGCTGCGCGGGGAGCTCCTGC       sgRNA oligo
Rbm25 F-3                 oligo              CACCGGCCTAGGCGAGTAGGCGCTT       sgRNA oligo
Rbm25 R-3                 oligo              AAACAAGCGCCTACTCGCCTAGGCC       sgRNA oligo
Gapdh F                   pcr primer         AGGTCGGTGTGAACGGATTTG           RT-qPCR
Gapdh R                   pcr primer         TGTAGACCATGTAGTTGAGGTCA         RT-qPCR
Rbm25 F                   pcr primer         GATCCCACCCCCACAGTTTC            RT-qPCR
Rbm25 R                   pcr primer         CACGGTAGGTACTAGGACAGTT          RT-qPCR
Bcl2l1 long F             pcr primer         AGAGACTGACAGCCTGATGC            RT-qPCR
Bcl2l1 long R             pcr primer         TCACAGAGTCACAGACACCA            RT-qPCR
Bcl2l1 short F            pcr primer         CTATCGGGAGATGCGTGGAC            RT-qPCR
Bcl2l1 short R            pcr primer         CGTCCTTCCTGAAGTCCTCC            RT-qPCR
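The paired sgRNA oligos in Table 3.1 share a visible pattern: each forward oligo reads CACC, then G, then a 20-nt guide, and each reverse oligo reads AAAC, then the reverse complement of that guide, then C. The Python sketch below reproduces that pattern so that a listed pair can be checked for internal consistency. The helper names (design_oligos, is_consistent_pair) are hypothetical, and the overhang scheme is inferred from the listed sequences alone; the text does not describe the cloning protocol.

```python
from typing import Tuple

# Minimal sketch: rebuild a paired sgRNA cloning oligo set from a 20-nt guide.
# The CACC/AAAC overhangs and the extra G/C pair are read off the sequences
# in Table 3.1; they are an inference, not a protocol stated in the text.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(guide: str) -> Tuple[str, str]:
    """Build the forward and reverse oligos for one guide sequence."""
    guide = guide.upper()
    if set(guide) - set("ACGT"):
        raise ValueError("guide must contain only A, C, G, T")
    forward = "CACC" + "G" + guide
    reverse = "AAAC" + reverse_complement(guide) + "C"
    return forward, reverse


def is_consistent_pair(forward: str, reverse: str) -> bool:
    """Check that a listed F/R oligo pair encodes the same guide."""
    if not forward.startswith("CACCG"):
        return False
    return design_oligos(forward[5:]) == (forward, reverse)


if __name__ == "__main__":
    # Two pairs copied from Table 3.1.
    pairs = {
        "genome non-targeting 1": (
            "CACCGGTAGCGAACGTGTCCGGCGT",
            "AAACACGCCGGACACGTTCGCTACC",
        ),
        "Rbm25 1": (
            "CACCGTCCGAAGCGCCTACTCGCCT",
            "AAACAGGCGAGTAGGCGCTTCGGAC",
        ),
    }
    for name, (fwd, rev) in pairs.items():
        print(name, is_consistent_pair(fwd, rev))
```

A check of this kind only confirms that the two strands of a pair match each other; it says nothing about on-target or off-target activity of the guide.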
Chapter Four: GTPase Smap1 accelerates preleukemic clonal expansion in mice
Preface:
Chapter four presents additional unpublished work on the candidate target gene Smap1.
It extends the functional validation of targets identified in chapter three. Using the
same combined preleukemic and gene repression model as chapter three, chapter four
investigates how loss of Smap1 changes hematopoietic proliferation in vitro and in
vivo.
Loss of Smap1 expression underlies Tet2 KO-induced preleukemic clonal
expansion
To identify the biological processes underlying Tet2 KO-induced clonal expansion, we
identified differentially expressed genes between myeloid-biased overexpanded Tet2
KO clones and normally expanded Tet2 KO clones (Figure 3.4h and 3.4f). From these
analyses, we identified repression of Smap1 as potentially associated with
preleukemic clonal expansion. Smap1 is a GTPase-activating protein and regulator of
clathrin-dependent endocytosis (Tanabe et al., 2005). Knockout of Smap1 in mice
resulted in the development of MDS and AML. Moreover, hematopoietic cells with Smap1
knockout have dysregulated intracellular membrane trafficking, resulting in
accumulation of c-Kit along with enhanced activation of ERK, leading to increased
cell growth (Kon et al., 2013).
To determine if reduced Smap1 expression triggered the expansion of Tet2 KO
hematopoietic cells, we used the dCas9-KRAB system to downregulate Smap1
expression (Figure 4.1). We tested three mouse leukemic cell lines, BB88, M-NFS-60,
and M1, that were genetically modified to be Tet2 KO and to express dCas9-KRAB
(Figure 3.5b and Figure 3.S9) and sgRNAs. We found that 2 of 3 Tet2 KO cell lines
proliferated significantly faster when transduced with sgRNAs targeting Smap1
than when transduced with genome non-targeting sgRNAs (Figure 4.2). This
result suggests that Smap1 repression accelerates the expansion of Tet2 KO
hematopoietic cells in vitro.
To determine if Smap1 repression triggers the expansion of Tet2 KO HSPCs in
vivo, we isolated HSCs from mice carrying heterozygous Tet2 KO and dCas9-KRAB
and transduced them with Smap1-targeting or genome non-targeting sgRNAs. These
HSCs were then transplanted into lethally irradiated wildtype recipients (Figure 3.5d).
Three months post-transplantation, Smap1 downregulation had already induced
significantly increased expansion of cKit+ Sca1+ lineage- progenitors, specifically the
HSC, MPP1/2, and MPP4 populations (Figure 4.3). Additionally, the common myeloid
progenitor (CMP) population also expanded significantly (Figure 4.3), further
demonstrating myeloid-specific expansion. The MPP3 population expanded slightly in
the Smap1 repression group, but the difference did not reach statistical significance
(p=0.07) (Figure 4.3). This result suggests that Tet2 KO myeloid progenitors undergo
increased expansion when Smap1 expression is suppressed.
Small GTPases in the ARF family play an important role in vesicle formation via
clathrin-mediated endocytosis. Smap1 is a GTPase-activating protein for Arf6, which
localizes to the plasma membrane and functions in endocytosis, vesicle recycling,
actin rearrangement and lipid metabolism (Donaldson & Jackson, 2011; Sangar et al.,
2014). Moreover, clathrin-related proteins have been implicated in altering
transferrin endocytosis and cell growth (Khwaja et al., 2010; O’Donnell et al., 2006).
While much is still unknown about how Smap1 deficiency would cause clonal expansion,
one study has suggested it may be due to dysregulated c-Kit trafficking and increased
cell growth (Kon et al., 2013). Here we show that Smap1 repression in preleukemic
Tet2 KO HSCs leads to increased expansion compared to HSCs with only Tet2 KO. These
data provide early evidence that dysregulation of clathrin-mediated endocytosis may
play an important role in the early stages of disease genesis.
FIGURES & LEGENDS
Figure 4.1. Engineering leukemic cell lines to knockout Tet2 and knockdown Smap1
expression. dCas9-KRAB mediated knockdown of Smap1 expression in the BB88 cell line.
One-tailed Student’s t-test. * P value < .05. ** P value < .01.
Figure 4.2. Repression of Smap1 accelerates proliferation of Tet2 KO
hematopoietic cell lines in vitro. MTS cell proliferation assay of Tet2 KO leukemic cell
lines. Cells were transduced with either genome non-targeting (NT, black) or
Smap1-targeting (blue) sgRNAs. Error bars represent SEM. Two-tailed Student’s t-test.
* P value < .05. ** P value < .01. *** P value < .001.
Figure 4.3. Repression of Smap1 accelerates expansion of Tet2 KO HSPCs in vivo.
Fraction of cKit+ Sca1+ Lin- (KLS) cells among all RFP+ mononuclear cells (MNCs) in
the bone marrow (left). Fraction of various types of HSPCs among all RFP+ MNCs in the
bone marrow (right). RFP indicates successful transduction of sgRNAs. Each dot depicts
data from one Tet2+/- dCas9-KRAB+/- mouse. NT sgRNA, n=5. Smap1 sgRNA, n=5. One-tailed
Student’s t-test. * P value < .05.
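The group comparisons in Figure 4.3 are reported as one-tailed Student’s t-tests with n=5 per group. As a minimal sketch of that calculation, the Python below computes a pooled-variance t statistic and its upper-tail p value using only the standard library. The function names are hypothetical, the tail probability is obtained by numerical integration of the t density (an implementation choice made here, not taken from the text), and the example values are invented for illustration and are not data from this study.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def one_tailed_t_test(treated, control):
    """Pooled-variance t test of the hypothesis mean(treated) > mean(control).

    Assumes both groups have nonzero variance overall.
    """
    n1, n2 = len(treated), len(control)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / df
    t = (mean(treated) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, t_upper_tail(t, df)


if __name__ == "__main__":
    # Invented example fractions (percent of RFP+ MNCs), NOT values from the thesis.
    smap1_sgrna = [0.42, 0.55, 0.61, 0.48, 0.70]
    nt_sgrna = [0.30, 0.41, 0.38, 0.45, 0.36]
    t_stat, p_value = one_tailed_t_test(smap1_sgrna, nt_sgrna)
    print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
```

With five animals per group the test has 8 degrees of freedom, so a one-tailed p value below .05 requires a t statistic above roughly 1.86.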
TABLES
Table 4.1 Primers
Name      Primer/Oligo  Sequence                 Note
Gapdh F   pcr primer    AGGTCGGTGTGAACGGATTTG    RT-qPCR
Gapdh R   pcr primer    TGTAGACCATGTAGTTGAGGTCA  RT-qPCR
Smap1 F   pcr primer    CTCATCCTGTCCAAACTGCTG    RT-qPCR
Smap1 R   pcr primer    TGGTCTAGGTTGACTGACTTGAC  RT-qPCR
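Table 4.1 pairs Smap1 RT-qPCR primers with Gapdh primers, which suggests Gapdh served as the reference gene for quantifying knockdown. The text does not state how relative expression was calculated. The sketch below assumes the widely used 2^-ΔΔCt calculation with roughly 100% amplification efficiency; the function name and the Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(
    ct_target_test: float,
    ct_ref_test: float,
    ct_target_ctrl: float,
    ct_ref_ctrl: float,
) -> float:
    """Fold change of a target gene in a test sample relative to a control sample.

    Assumes the 2^-ddCt method (an assumption, not stated in the text):
    each Ct is first normalized to the reference gene, then the test
    sample is compared with the control sample.
    """
    delta_test = ct_target_test - ct_ref_test  # dCt in the test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # dCt in the control sample
    return 2 ** -(delta_test - delta_ctrl)


if __name__ == "__main__":
    # Invented Ct values: target amplifies two cycles later in the test sample
    # while the reference gene is unchanged.
    fold = relative_expression(
        ct_target_test=26.0, ct_ref_test=18.0,
        ct_target_ctrl=24.0, ct_ref_ctrl=18.0,
    )
    print(f"relative expression = {fold:.2f}")
```

Under these assumptions, a target that crosses threshold two cycles later at equal reference Ct corresponds to one quarter of the control expression level.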
Chapter Five: Discussion
Many studies have identified the prevalence of somatic mutations of cancer
driver genes in otherwise healthy individuals (Busque et al., 2012; Jaiswal & Ebert,
2019; Zink et al., 2017). However, the downstream events post-mutation that enable
clonal expansions in the early stages of disease genesis are not well understood.
Although heterogeneity within populations is well established and recognized, traditional
approaches to study biological differences due to gene mutation often ignore
differences at the clonal level. While individuals with the same mutation experience
variable clonal outgrowths, experimental models largely overlook this underlying
biology. With the advent of clonal tracking, single-cell molecular, and single-cell spatial
technologies over the past 10 years, the ability to unravel cellular identity has just
begun to emerge. Recent work studying clonal hematopoiesis has highlighted the role of
inflammation, epigenetics, and modulation of the oncogenic protein MYC in outgrowth
and disease progression (Muto et al., 2022; Nakauchi et al., 2022; Nam et al.,
2022; Weinstock et al., 2021; Yeaton et al., 2022). These mechanisms likely all
play a role in the complex progression from healthy to preleukemia to leukemia. To
make these findings truly translational, there should be emphasis on three areas of
focus: (1) identifying true causes of preleukemic clonal expansions; (2) identifying
common mechanisms shared amongst preleukemias harboring mutations in different
genes; (3) specific targeting of HSCs harboring preleukemic mutations. By taking
approaches to distinguish signal from noise, we can better identify molecular differences
directly responsible for expansion. Once true mechanisms are identified, we can
evaluate if common mechanisms exist amongst preleukemias harboring different
founder mutations but resulting in the same clinical complications. Lastly, by
understanding the potential common mechanisms of many preleukemias, we can better
design therapies to treat, or prevent disease in, individuals exhibiting large clonal
expansions.
This thesis aims to further the understanding of heterogeneity in the early stages of
leukemia genesis, an understudied but critical phase in disease trajectory. We utilized
innovative clonal tracking technology to reveal the heterogeneity of preleukemic HSCs
harboring Tet2 genetic mutation, particularly regarding their differentiation and clonal
expansions. We showed that only a subset of highly abundant clones contributes to
hematopoietic outgrowth in Tet2 KO mice. This work highlights the power of clonal
resolution to precisely analyze only the cellular behaviors of interest without
unintended cellular contamination. By incorporating genetic barcoding with single-cell
transcriptomics, we narrowed our analysis to show that myeloid-biased ‘overexpanded’
clones were responsible for Tet2-associated expansions and further analyzed these
clones alone to identify the molecular basis associated with their phenotype. Moreover,
we were able to use ‘non-overexpanded’ myeloid-biased Tet2 KO cells as a comparison
group to isolate subtle but key differences that we hypothesized were responsible for
the observed hematopoietic expansion. Through our approach, we found that RNA
processing genes were dysregulated in Tet2 KO clones exhibiting the highest levels of
expansion but not in Tet2 KO clones with normal expansion levels. We identified a
splicing factor, Rbm25, which has been implicated in preleukemia and disease
progression. After repressing Rbm25 in Tet2 KO cells, we observed that they exhibited
higher expansion of myeloid lineages than cells harboring Tet2 KO alone. These data
suggest our approach is successful in identifying genes that are key players in driving
preleukemic expansion. RNA splicing factors are often mutated in AML, but here we
show that gene repression is enough to initiate preleukemic expansion on the background
of Tet2 KO. This highlights the importance of maintaining proper splicing to keep cells
in a tumor suppressive state, especially in preleukemic cells. Moreover, these findings
may offer molecular readouts in addition to somatic mutation genetic testing to
properly diagnose and prognose complex disease trajectories in the clinic.
References
Abdel-Wahab, O., Gao, J., Adli, M., Dey, A., Trimarchi, T., Chung, Y. R., Kuscu, C.,
Hricik, T., Ndiaye-Lobry, D., LaFave, L. M., Koche, R., Shih, A. H., Guryanova,
O. A., Kim, E., Li, S., Pandey, S., Shin, J. Y., Telis, L., Liu, J., … Levine, R. L.
(2013). Deletion of Asxl1 results in myelodysplasia and severe developmental
defects in vivo. Journal of Experimental Medicine, 210(12), 2641–2659.
https://doi.org/10.1084/jem.20131141
Abdel-Wahab, O., & Levine, R. L. (2013). Mutations in epigenetic modifiers in the
pathogenesis and therapy of acute myeloid leukemia. Blood, 121(18), 3563–
3572. https://doi.org/10.1182/blood-2013-01-451781
Abdel-Wahab, O., Mullally, A., Hedvat, C., Garcia-Manero, G., Patel, J., Wadleigh, M.,
Malinge, S., Yao, J., Kilpivaara, O., Bhat, R., Huberman, K., Thomas, S.,
Dolgalev, I., Heguy, A., Paietta, E., Le Beau, M. M., Beran, M., Tallman, M. S.,
Ebert, B. L., … Levine, R. L. (2009). Genetic characterization of TET1, TET2,
and TET3 alterations in myeloid malignancies. Blood, 114(1), 144–147.
https://doi.org/10.1182/blood-2009-03-210039
Akashi, K., Traver, D., Miyamoto, T., & Weissman, I. L. (2000). A clonogenic common
myeloid progenitor that gives rise to all myeloid lineages. Nature, 404(6774),
193–197. https://doi.org/10.1038/35004599
Bacher, U., Haferlach, C., Schnittger, S., Kohlmann, A., Kern, W., & Haferlach, T.
(2010). Mutations of the TET2 and CBL genes: Novel molecular markers in
myeloid malignancies. Annals of Hematology, 89(7), 643–652.
https://doi.org/10.1007/s00277-010-0920-6
Baker, C. L., & Pera, M. F. (2018). Capturing Totipotent Stem Cells. Cell Stem Cell,
22(1), 25–34. https://doi.org/10.1016/j.stem.2017.12.011
Becker, A. J., McCulloch, E. A., & Till, J. E. (1963). Cytological Demonstration of the
Clonal Nature of Spleen Colonies Derived from Transplanted Mouse Marrow
Cells. Nature, 197, 452.
Beerman, I., Bhattacharya, D., Zandi, S., Sigvardsson, M., Weissman, I. L., Bryder, D.,
& Rossi, D. J. (2010). Functionally distinct hematopoietic stem cells modulate
hematopoietic lineage potential during aging by a mechanism of clonal
expansion. Proceedings of the National Academy of Sciences, 107(12), 5465.
https://doi.org/10.1073/pnas.1000834107
Beerman, I., Maloney, W. J., Weissman, I. L., & Rossi, D. J. (2010). Stem cells and the
aging hematopoietic system. Current Opinion in Immunology, 22(4), 500–506.
https://doi.org/10.1016/j.coi.2010.06.007
Billingham, R. E., Brent, L., & Medawar, P. B. (1953). ‘Actively Acquired Tolerance’ of
Foreign Cells. Nature, 172(4379). https://doi.org/10.1038/172603a0
Bowling, S., Sritharan, D., Osorio, F. G., Nguyen, M., Cheung, P., Rodriguez-Fraticelli,
A., Patel, S., Yuan, W.-C., Fujiwara, Y., Li, B. E., Orkin, S. H., Hormoz, S., &
Camargo, F. D. (2020). An Engineered CRISPR-Cas9 Mouse Line for
Simultaneous Readout of Lineage Histories and Gene Expression Profiles in
Single Cells. Cell, 181(6), 1410-1422.e27.
https://doi.org/10.1016/j.cell.2020.04.048
Brady, G., Billia, F., Knox, J., Hoang, T., Kirsch, I. R., Voura, E. B., Hawley, R. G.,
Cumming, R., Buchwald, M., Siminovitch, K., Miyamoto, N., Boehmelt, G., &
Iscove, N. N. (1995). Analysis of gene expression in a complex differentiation
hierarchy by global amplification of cDNA from single cells. Current Biology, 5(8),
909–922. https://doi.org/10.1016/S0960-9822(95)00181-3
Bramlett, C., Jiang, D., Nogalska, A., Eerdeng, J., Contreras, J., & Lu, R. (2020). Clonal
tracking using embedded viral barcoding and high-throughput sequencing.
Nature Protocols, 15(4), 1436–1458. https://doi.org/10.1038/s41596-019-0290-z
Brewer, C., Chu, E., Chin, M., & Lu, R. (2016). Transplantation Dose Alters the
Differentiation Program of Hematopoietic Stem Cells. Cell Reports, 15(8), 1848–
1857. https://doi.org/10.1016/j.celrep.2016.04.061
Busch, K., Klapproth, K., Barile, M., Flossdorf, M., Holland-Letz, T., Schlenner, S. M.,
Reth, M., Höfer, T., & Rodewald, H.-R. (2015). Fundamental properties of
unperturbed haematopoiesis from stem cells in vivo. Nature, 518(7540), 542–
546. https://doi.org/10.1038/nature14242
Busque, L., Patel, J. P., Figueroa, M. E., Vasanthakumar, A., Provost, S., Hamilou, Z.,
Mollica, L., Li, J., Viale, A., Heguy, A., Hassimi, M., Socci, N., Bhatt, P. K.,
Gonen, M., Mason, C. E., Melnick, A., Godley, L. A., Brennan, C. W., Abdel-
Wahab, O., & Levine, R. L. (2012). Recurrent somatic TET2 mutations in normal
elderly individuals with clonal hematopoiesis. Nature Genetics, 44(11), 1179–
1181. https://doi.org/10.1038/ng.2413
Bystrykh, L. V., & Belderbos, M. E. (2016). Clonal Analysis of Cells with Cellular
Barcoding: When Numbers and Sizes Matter. In K. Turksen (Ed.), Stem Cell
Heterogeneity: Methods and Protocols (pp. 57–89). Springer New York.
https://doi.org/10.1007/7651_2016_343
Bystrykh, L. V., de Haan, G., & Verovskaya, E. (2014). Barcoded Vector Libraries and
Retroviral or Lentiviral Barcoding of Hematopoietic Stem Cells. In K. D. Bunting &
C.-K. Qu (Eds.), Hematopoietic Stem Cell Protocols (pp. 345–360). Springer New
York. https://doi.org/10.1007/978-1-4939-1133-2_23
Bystrykh, L. V., Verovskaya, E., Zwart, E., Broekhuis, M., & de Haan, G. (2012).
Counting stem cells: Methodological constraints. Nature Methods, 9, 567.
Cartier, N., Hacein-Bey-Abina, S., Bartholomae, C. C., Veres, G., Schmidt, M.,
Kutschera, I., Vidaud, M., Abel, U., Dal-Cortivo, L., Caccavelli, L., Mahlaoui, N.,
Kiermer, V., Mittelstaedt, D., Bellesme, C., Lahlou, N., Lefrère, F., Blanche, S.,
Audit, M., Payen, E., … Aubourg, P. (2009). Hematopoietic stem cell gene
therapy with a lentiviral vector in X-linked adrenoleukodystrophy. Science (New
York, N.Y.), 326(5954), 818–823. https://doi.org/10.1126/science.1171242
Cavazzana-Calvo, M., Payen, E., Negre, O., Wang, G., Hehir, K., Fusil, F., Down, J.,
Denaro, M., Brady, T., Westerman, K., Cavallesco, R., Gillet-Legrand, B.,
Caccavelli, L., Sgarra, R., Maouche-Chrétien, L., Bernaudin, F., Girot, R.,
Dorazio, R., Mulder, G.-J., … Leboulch, P. (2010). Transfusion independence
and HMGA2 activation after gene therapy of human β-thalassaemia. Nature,
467(7313), 318–322. https://doi.org/10.1038/nature09328
Challen, G. A., Sun, D., Jeong, M., Luo, M., Jelinek, J., Berg, J. S., Bock, C.,
Vasanthakumar, A., Gu, H., Xi, Y., Liang, S., Lu, Y., Darlington, G. J., Meissner,
A., Issa, J.-P. J., Godley, L. A., Li, W., & Goodell, M. A. (2012). Dnmt3a is
essential for hematopoietic stem cell differentiation. Nature Genetics, 44(1), 23–
31. https://doi.org/10.1038/ng.1009
Champion, K. M., Gilbert, J. G., Asimakopoulos, F. A., Hinshelwood, S., & Green, A. R.
(1997). Clonal haemopoiesis in normal elderly women: Implications for the
myeloproliferative disorders and myelodysplastic syndromes. British Journal of
Haematology, 97(4), 920–926.
Chapal-Ilani, N., Maruvka, Y. E., Spiro, A., Reizel, Y., Adar, R., Shlush, L. I., & Shapiro,
E. (2013). Comparing Algorithms That Reconstruct Cell Lineage Trees Utilizing
Information on Microsatellite Mutations. PLOS Computational Biology, 9(11),
e1003297. https://doi.org/10.1371/journal.pcbi.1003297
Chen, J. Y., Miyanishi, M., Wang, S. K., Yamazaki, S., Sinha, R., Kao, K. S., Seita, J.,
Sahoo, D., Nakauchi, H., & Weissman, I. L. (2016). Hoxb5 marks long-term
haematopoietic stem cells and reveals a homogenous perivascular niche.
Nature, 530(7589), 223–227. https://doi.org/10.1038/nature16943
Cheng, Y., Xie, W., Pickering, B. F., Chu, K. L., Savino, A. M., Yang, X., Luo, H.,
Nguyen, D. TT., Mo, S., Barin, E., Velleca, A., Rohwetter, T. M., Patel, D. J.,
Jaffrey, S. R., & Kharas, M. G. (2021). N6-Methyladenosine on mRNA facilitates
a phase-separated nuclear body that suppresses myeloid leukemic
differentiation. Cancer Cell, 39(7), 958-972.e8.
https://doi.org/10.1016/j.ccell.2021.04.017
Cheung, A. M. S., Nguyen, L. V., Carles, A., Beer, P., Miller, P. H., Knapp, D. J. H. F.,
Dhillon, K., Hirst, M., & Eaves, C. J. (2013). Analysis of the clonal growth and
differentiation dynamics of primitive barcoded human cord blood cells in NSG
mice. Blood, 122(18), 3129–3137. https://doi.org/10.1182/blood-2013-06-508432
Cimmino, L., Abdel-Wahab, O., Levine, R. L., & Aifantis, I. (2011). TET Family Proteins
and Their Role in Stem Cell Differentiation and Transformation. Cell Stem Cell,
9(3), 193–204. https://doi.org/10.1016/j.stem.2011.08.007
Coffman, R., & Weissman, I. (1983). Immunoglobulin gene rearrangement during pre-B
cell differentiation. The Journal of Molecular and Cellular Immunology: JMCI,
1(1), 31–41.
Contreras-Trujillo, H., Eerdeng, J., Akre, S., Jiang, D., Contreras, J., Gala, B., Vergel-
Rodriguez, M. C., Lee, Y., Jorapur, A., Andreasian, A., Harton, L., Bramlett, C.
S., Nogalska, A., Xiao, G., Lee, J.-W., Chan, L. N., Müschen, M., Merchant, A.
A., & Lu, R. (2021). Deciphering intratumoral heterogeneity using integrated
clonal tracking and single-cell transcriptome analyses. Nature Communications,
12(1), 6522. https://doi.org/10.1038/s41467-021-26771-1
Cornils, K., Thielecke, L., Hüser, S., Forgber, M., Thomaschewski, M., Kleist, N.,
Hussein, K., Riecken, K., Volz, T., Gerdes, S., Glauche, I., Dahl, A., Dandri, M.,
Roeder, I., & Fehse, B. (2014). Multiplexing clonality: Combining RGB marking
and genetic barcoding. Nucleic Acids Research, 42(7), e56.
https://doi.org/10.1093/nar/gku081
Dai, Y., Cheng, Z., Pang, Y., Jiao, Y., Qian, T., Quan, L., Cui, L., Liu, Y., Si, C., Chen,
J., Ye, X., Chen, J., Shi, J., Wu, D., Zhang, X., & Fu, L. (2020). Prognostic value
of the FUT family in acute myeloid leukemia. Cancer Gene Therapy, 27(1), 70–
80. https://doi.org/10.1038/s41417-019-0115-9
Delhommeau, F., Dupont, S., Della Valle, V., James, C., Trannoy, S., Massé, A.,
Kosmider, O., Le Couedic, J.-P., Robert, F., Alberdi, A., Lécluse, Y., Plo, I.,
Dreyfus, F. J., Marzac, C., Casadevall, N., Lacombe, C., Romana, S. P., Dessen,
P., Soulier, J., … Bernard, O. A. (2009). Mutation in TET2 in myeloid cancers.
The New England Journal of Medicine, 360(22), 2289–2301.
https://doi.org/10.1056/NEJMoa0810069
Desai, P., Mencia-Trinchant, N., Savenkov, O., Simon, M. S., Cheang, G., Lee, S.,
Samuel, M., Ritchie, E. K., Guzman, M. L., Ballman, K. V., Roboz, G. J., &
Hassane, D. C. (2018). Somatic mutations precede acute myeloid leukemia
years before diagnosis. Nature Medicine, 24(7), 1015–1023.
https://doi.org/10.1038/s41591-018-0081-z
Di Paola, J., & Porter, C. C. (2019). ETV6-related thrombocytopenia and leukemia
predisposition. Blood, 134(8), 663–667.
https://doi.org/10.1182/blood.2019852418
Dick, J. E., Magli, M. C., Huszar, D., Phillips, R. A., & Bernstein, A. (1985). Introduction
of a selectable gene into primitive stem cells capable of long-term reconstitution
of the hemopoietic system of W/Wv mice. Cell, 42(1), 71–79.
https://doi.org/10.1016/S0092-8674(85)80102-1
Doan, P. L., Himburg, H. A., Helms, K., Russell, J. L., Fixsen, E., Quarmyne, M., Harris,
J. R., Deoliviera, D., Sullivan, J. M., Chao, N. J., Kirsch, D. G., & Chute, J. P.
(2013). Epidermal Growth Factor Regulates Hematopoietic Regeneration
Following Radiation Injury. Nature Medicine, 19(3), 295–304.
https://doi.org/10.1038/nm.3070
Döhner, H., Estey, E., Grimwade, D., Amadori, S., Appelbaum, F. R., Büchner, T.,
Dombret, H., Ebert, B. L., Fenaux, P., Larson, R. A., Levine, R. L., Lo-Coco, F.,
Naoe, T., Niederwieser, D., Ossenkoppele, G. J., Sanz, M., Sierra, J., Tallman,
M. S., Tien, H.-F., … Bloomfield, C. D. (2017). Diagnosis and management of
AML in adults: 2017 ELN recommendations from an international expert panel.
Blood, 129(4), 424–447. https://doi.org/10.1182/blood-2016-08-733196
Donaldson, J. G., & Jackson, C. L. (2011). ARF family G proteins and their regulators:
Roles in membrane transport, development and disease. Nature Reviews.
Molecular Cell Biology, 12(6), 362–375. https://doi.org/10.1038/nrm3117
Dorsheimer, L., Assmus, B., Rasper, T., Ortmann, C. A., Ecke, A., Abou-El-Ardat, K.,
Schmid, T., Brüne, B., Wagner, S., Serve, H., Hoffmann, J., Seeger, F.,
Dimmeler, S., Zeiher, A. M., & Rieger, M. A. (2019). Association of Mutations
Contributing to Clonal Hematopoiesis With Prognosis in Chronic Ischemic Heart
Failure. JAMA Cardiology, 4(1), 25–33.
https://doi.org/10.1001/jamacardio.2018.3965
Dykstra, B., Kent, D., Bowie, M., McCaffrey, L., Hamilton, M., Lyons, K., Lee, S.-J.,
Brinkman, R., & Eaves, C. (2007). Long-term propagation of distinct
hematopoietic differentiation programs in vivo. Cell Stem Cell, 1(2), 218–229.
https://doi.org/10.1016/j.stem.2007.05.015
Ergen, A. V., Boles, N. C., & Goodell, M. A. (2012). Rantes/Ccl5 influences
hematopoietic stem cell subtypes and causes myeloid skewing. Blood, 119(11),
2500–2509. https://doi.org/10.1182/blood-2011-11-391730
Frieda, K. L., Linton, J. M., Hormoz, S., Choi, J., Chow, K.-H. K., Singer, Z. S., Budde,
M. W., Elowitz, M. B., & Cai, L. (2016). Synthetic recording and in situ readout of
lineage information in single cells. Nature, 541, 107.
Fu, X., Meng, Z., Liang, W., Tian, Y., Wang, X., Han, W., Lou, G., Wang, X., Lou, F.,
Yen, Y., Yu, H., Jove, R., & Huang, W. (2014). MiR-26a enhances miRNA
biogenesis by targeting Lin28B and Zcchc11 to suppress tumor growth and
metastasis. Oncogene, 33(34), 4296–4306. https://doi.org/10.1038/onc.2013.385
Ge, Y., Schuster, M. B., Pundhir, S., Rapin, N., Bagger, F. O., Sidiropoulos, N.,
Hashem, N., & Porse, B. T. (2019). The splicing factor RBM25 controls MYC
activity in acute myeloid leukemia. Nature Communications, 10(1), 172.
https://doi.org/10.1038/s41467-018-08076-y
Genovese, G., Kähler, A. K., Handsaker, R. E., Lindberg, J., Rose, S. A., Bakhoum, S.
F., Chambert, K., Mick, E., Neale, B. M., Fromer, M., Purcell, S. M., Svantesson,
O., Landén, M., Höglund, M., Lehmann, S., Gabriel, S. B., Moran, J. L., Lander,
E. S., Sullivan, P. F., … McCarroll, S. A. (2014). Clonal hematopoiesis and
blood-cancer risk inferred from blood DNA sequence. The New England Journal
of Medicine, 371(26), 2477–2487. https://doi.org/10.1056/NEJMoa1409405
Gerrits, A., Dykstra, B., Kalmykowa, O. J., Klauke, K., Verovskaya, E., Broekhuis, M. J.
C., de Haan, G., & Bystrykh, L. V. (2010). Cellular barcoding tool for clonal
analysis in the hematopoietic system. Blood, 115(13), 2610–2618.
https://doi.org/10.1182/blood-2009-06-229757
Gonzalez-Murillo, A., Lozano, M. L., Montini, E., Bueren, J. A., & Guenechea, G. (2008).
Unaltered repopulation properties of mouse hematopoietic stem cells transduced
with lentiviral vectors. Blood, 112(8), 3138–3147.
https://doi.org/10.1182/blood-2008-03-142661
Guernet, A., Mungamuri, S. K., Cartier, D., Sachidanandam, R., Jayaprakash, A.,
Adriouch, S., Vezain, M., Charbonnier, F., Rohkin, G., Coutant, S., Yao, S.,
Ainani, H., Alexandre, D., Tournier, I., Boyer, O., Aaronson, S. A., Anouar, Y., &
Grumolato, L. (2016). CRISPR-Barcoding for Intratumor Genetic Heterogeneity
Modeling and Functional Analysis of Oncogenic Driver Mutations. Molecular Cell,
63(3), 526–538. https://doi.org/10.1016/j.molcel.2016.06.017
Hacein-Bey-Abina, S., Pai, S.-Y., Gaspar, H. B., Armant, M., Berry, C. C., Blanche, S.,
Bleesing, J., Blondeau, J., de Boer, H., Buckland, K. F., Caccavelli, L., Cros, G.,
De Oliveira, S., Fernández, K. S., Guo, D., Harris, C. E., Hopkins, G., Lehmann,
L. E., Lim, A., … Thrasher, A. J. (2014). A modified γ-retrovirus vector for X-
linked severe combined immunodeficiency. The New England Journal of
Medicine, 371(15), 1407–1417. https://doi.org/10.1056/NEJMoa1404588
Hacein-Bey-Abina, S., Von Kalle, C., Schmidt, M., McCormack, M. P., Wulffraat, N.,
Leboulch, P., Lim, A., Osborne, C. S., Pawliuk, R., Morillon, E., Sorensen, R.,
Forster, A., Fraser, P., Cohen, J. I., de Saint Basile, G., Alexander, I.,
Wintergerst, U., Frebourg, T., Aurias, A., … Cavazzana-Calvo, M. (2003). LMO2-
associated clonal T cell proliferation in two patients after gene therapy for SCID-
X1. Science (New York, N.Y.), 302(5644), 415–419.
https://doi.org/10.1126/science.1088547
Han, X.-J., Lee, M.-J., Yu, G.-R., Lee, Z.-W., Bae, J.-Y., Bae, Y.-C., Kang, S.-H., & Kim,
D.-G. (2012). Altered dynamics of ubiquitin hybrid proteins during tumor cell
apoptosis. Cell Death & Disease, 3(1), e255.
https://doi.org/10.1038/cddis.2011.142
Harkey, M. A., Kaul, R., Jacobs, M. A., Kurre, P., Bovee, D., Levy, R., & Blau, C. A.
(2007). Multiarm High-Throughput Integration Site Detection: Limitations of LAM-
PCR Technology and Optimization for Clonal Analysis. Stem Cells and
Development, 16(3), 381–392. https://doi.org/10.1089/scd.2007.0015
Hulett, H. R., Bonner, W. A., Barrett, J., & Herzenberg, L. A. (1969). Cell Sorting:
Automated Separation of Mammalian Cells as a Function of Intracellular
Fluorescence. Science, 166(3906), 747.
https://doi.org/10.1126/science.166.3906.747
Ikuta, K., & Weissman, I. L. (1992). Evidence that hematopoietic stem cells express
mouse c-kit but do not depend on steel factor for their generation. Proceedings of
the National Academy of Sciences, 89(4), 1502.
https://doi.org/10.1073/pnas.89.4.1502
Inoue, D., Bradley, R. K., & Abdel-Wahab, O. (2016). Spliceosomal gene mutations in
myelodysplasia: Molecular links to clonal abnormalities of hematopoiesis. Genes
& Development, 30(9), 989–1001. https://doi.org/10.1101/gad.278424.116
Iscove, N. (1990). Searching for stem cells. Nature, 347(6289).
https://doi.org/10.1038/347126a0
Jaiswal, S., & Ebert, B. L. (2019). Clonal hematopoiesis in human aging and disease.
Science, 366(6465), eaan4673. https://doi.org/10.1126/science.aan4673
Jaiswal, S., Fontanillas, P., Flannick, J., Manning, A., Grauman, P. V., Mar, B. G.,
Lindsley, R. C., Mermel, C. H., Burtt, N., Chavez, A., Higgins, J. M., Moltchanov,
V., Kuo, F. C., Kluk, M. J., Henderson, B., Kinnunen, L., Koistinen, H. A.,
Ladenvall, C., Getz, G., … Ebert, B. L. (2014). Age-related clonal hematopoiesis
associated with adverse outcomes. The New England Journal of Medicine,
371(26), 2488–2498. https://doi.org/10.1056/NEJMoa1408617
Jaitin, D. A., Kenigsberg, E., Keren-Shaul, H., Elefant, N., Paul, F., Zaretsky, I., Mildner,
A., Cohen, N., Jung, S., Tanay, A., & Amit, I. (2014). Massively Parallel Single-
Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types.
Science, 343(6172), 776–779. https://doi.org/10.1126/science.1247651
Jan, M., Snyder, T. M., Corces-Zimmerman, M. R., Vyas, P., Weissman, I. L., Quake, S.
R., & Majeti, R. (2012). Clonal Evolution of Preleukemic Hematopoietic Stem
Cells Precedes Human Acute Myeloid Leukemia. Science Translational
Medicine, 4(149), 149ra118.
https://doi.org/10.1126/scitranslmed.3004315
Kalhor, R., Kalhor, K., Mejia, L., Leeper, K., Graveline, A., Mali, P., & Church, G. M.
(2018). Developmental barcoding of whole mouse via homing CRISPR. Science,
361(6405), eaat9804. https://doi.org/10.1126/science.aat9804
Karsunky, H., Inlay, M. A., Serwold, T., Bhattacharya, D., & Weissman, I. L. (2008).
Flk2+ common lymphoid progenitors possess equivalent differentiation potential
for the B and T lineages. Blood, 111(12), 5562–5570.
https://doi.org/10.1182/blood-2007-11-126219
Kebschull, J. M., & Zador, A. M. (2018). Cellular barcoding: Lineage tracing, screening
and beyond. Nature Methods, 15(11), 871–879.
https://doi.org/10.1038/s41592-018-0185-x
Keller, G., Paige, C., Gilboa, E., & Wagner, E. F. (1985). Expression of a foreign gene in
myeloid and lymphoid cells derived from multipotent haematopoietic precursors.
Nature, 318(6042), 149–154. https://doi.org/10.1038/318149a0
Khwaja, S. S., Liu, H., Tong, C., Jin, F., Pear, W. S., van Deursen, J., & Bram, R. J.
(2010). HIV-1 Rev-binding protein accelerates cellular uptake of iron to drive
Notch-induced T cell leukemogenesis in mice. The Journal of Clinical
Investigation, 120(7), 2537–2548. https://doi.org/10.1172/JCI41277
Kiel, M. J., Yilmaz, O. H., Iwashita, T., Yilmaz, O. H., Terhorst, C., & Morrison, S. J.
(2005). SLAM family receptors distinguish hematopoietic stem and progenitor
cells and reveal endothelial niches for stem cells. Cell, 121(7), 1109–1121.
https://doi.org/10.1016/j.cell.2005.05.026
Kim, E., Ilagan, J. O., Liang, Y., Daubner, G. M., Lee, S. C.-W., Ramakrishnan, A., Li,
Y., Chung, Y. R., Micol, J.-B., Murphy, M. E., Cho, H., Kim, M.-K., Zebari, A. S.,
Aumann, S., Park, C. Y., Buonamici, S., Smith, P. G., Deeg, H. J., Lobry, C., …
Abdel-Wahab, O. (2015). SRSF2 Mutations Contribute to Myelodysplasia by
Mutant-Specific Effects on Exon Recognition. Cancer Cell, 27(5), 617–630.
https://doi.org/10.1016/j.ccell.2015.04.006
Kivioja, T., Vähärautio, A., Karlsson, K., Bonke, M., Enge, M., Linnarsson, S., & Taipale,
J. (2012). Counting absolute numbers of molecules using unique molecular
identifiers. Nature Methods, 9(1). https://doi.org/10.1038/nmeth.1778
Klein, A. M., Mazutis, L., Akartuna, I., Tallapragada, N., Veres, A., Li, V., Peshkin, L.,
Weitz, D. A., & Kirschner, M. W. (2015). Droplet Barcoding for Single-Cell
Transcriptomics Applied to Embryonic Stem Cells. Cell, 161(5), 1187–1201.
https://doi.org/10.1016/j.cell.2015.04.044
Ko, M., Bandukwala, H. S., An, J., Lamperti, E. D., Thompson, E. C., Hastie, R.,
Tsangaratou, A., Rajewsky, K., Koralov, S. B., & Rao, A. (2011). Ten-Eleven-
Translocation 2 (TET2) negatively regulates homeostasis and differentiation of
hematopoietic stem cells in mice. Proceedings of the National Academy of
Sciences of the United States of America, 108(35), 14566–14571.
https://doi.org/10.1073/pnas.1112317108
Kohler, G., & Milstein, C. (1975). Continuous cultures of fused cells secreting antibody
of predefined specificity. Nature, 256, 495.
Kon, S., Minegishi, N., Tanabe, K., Watanabe, T., Funaki, T., Wong, W. F., Sakamoto,
D., Higuchi, Y., Kiyonari, H., Asano, K., Iwakura, Y., Fukumoto, M., Osato, M.,
Sanada, M., Ogawa, S., Nakamura, T., & Satake, M. (2013). Smap1 deficiency
perturbs receptor trafficking and predisposes mice to myelodysplasia. The
Journal of Clinical Investigation, 123(3), 1123–1137.
https://doi.org/10.1172/JCI63711
Kondo, M., Wagers, A. J., Manz, M. G., Prohaska, S. S., Scherer, D. C., Beilhack, G. F.,
Shizuru, J. A., & Weissman, I. L. (2003). Biology of hematopoietic stem cells and
progenitors: Implications for clinical application. Annual Review of Immunology,
21, 759–806. https://doi.org/10.1146/annurev.immunol.21.120601.141007
Kondo, M., Weissman, I. L., & Akashi, K. (1997). Identification of Clonogenic Common
Lymphoid Progenitors in Mouse Bone Marrow. Cell, 91(5), 661–672.
https://doi.org/10.1016/S0092-8674(00)80453-5
Kurimoto, K., Yabuta, Y., Ohinata, Y., Ono, Y., Uno, K. D., Yamada, R. G., Ueda, H. R.,
& Saitou, M. (2006). An improved single-cell cDNA amplification method for
efficient high-density oligonucleotide microarray analysis. Nucleic Acids
Research, 34(5), e42. https://doi.org/10.1093/nar/gkl050
Langemeijer, S. M. C., Kuiper, R. P., Berends, M., Knops, R., Aslanyan, M. G., Massop,
M., Stevens-Linders, E., van Hoogen, P., van Kessel, A. G., Raymakers, R. A.
P., Kamping, E. J., Verhoef, G. E., Verburgh, E., Hagemeijer, A., Vandenberghe,
P., de Witte, T., van der Reijden, B. A., & Jansen, J. H. (2009). Acquired
mutations in TET2 are common in myelodysplastic syndromes. Nature Genetics,
41(7), 838–842. https://doi.org/10.1038/ng.391
Lee-Six, H., Øbro, N. F., Shepherd, M. S., Grossmann, S., Dawson, K., Belmonte, M.,
Osborne, R. J., Huntly, B. J. P., Martincorena, I., Anderson, E., O’Neill, L.,
Stratton, M. R., Laurenti, E., Green, A. R., Kent, D. G., & Campbell, P. J. (2018).
Population dynamics of normal human blood inferred from somatic mutations.
Nature, 561(7724), 473–478. https://doi.org/10.1038/s41586-018-0497-0
Lemischka, I. R., Raulet, D. H., & Mulligan, R. C. (1986). Developmental potential and
dynamic behavior of hematopoietic stem cells. Cell, 45(6), 917–927.
Li, J., & Ge, Z. (2021). High HSPA8 expression predicts adverse outcomes of acute
myeloid leukemia. BMC Cancer, 21(1), 475. https://doi.org/10.1186/s12885-021-
08193-w
Lindsley, R. C., Saber, W., Mar, B. G., Redd, R., Wang, T., Haagenson, M. D.,
Grauman, P. V., Hu, Z.-H., Spellman, S. R., Lee, S. J., Verneris, M. R., Hsu, K.,
Fleischhauer, K., Cutler, C., Antin, J. H., Neuberg, D., & Ebert, B. L. (2017).
Prognostic Mutations in Myelodysplastic Syndrome after Stem-Cell
Transplantation. The New England Journal of Medicine, 376(6), 536–547.
https://doi.org/10.1056/NEJMoa1611604
Livet, J., Weissman, T. A., Kang, H., Draft, R. W., Lu, J., Bennis, R. A., Sanes, J. R., &
Lichtman, J. W. (2007). Transgenic strategies for combinatorial expression of
fluorescent proteins in the nervous system. Nature, 450, 56.
Lu, R., Czechowicz, A., Seita, J., Jiang, D., & Weissman, I. L. (2019). Clonal-level
lineage commitment pathways of hematopoietic stem cells in vivo. Proceedings
of the National Academy of Sciences of the United States of America, 116(4),
1447–1456. https://doi.org/10.1073/pnas.1801480116
Lu, R., Neff, N. F., Quake, S. R., & Weissman, I. L. (2011). Tracking single
hematopoietic stem cells in vivo using high-throughput sequencing in conjunction
with viral genetic barcoding. Nature Biotechnology, 29(10), 928–933.
https://doi.org/10.1038/nbt.1977
Lyne, A.-M., Kent, D. G., Laurenti, E., Cornils, K., Glauche, I., & Perié, L. (2018). A track
of the clones: New developments in cellular barcoding. Experimental
Hematology, 68, 15–20. https://doi.org/10.1016/j.exphem.2018.11.005
Macosko, E. Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., Tirosh, I.,
Bialas, A. R., Kamitaki, N., Martersteck, E. M., Trombetta, J. J., Weitz, D. A.,
Sanes, J. R., Shalek, A. K., Regev, A., & McCarroll, S. A. (2015). Highly Parallel
Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets.
Cell, 161(5), 1202–1214. https://doi.org/10.1016/j.cell.2015.05.002
Malcovati, L., Stevenson, K., Papaemmanuil, E., Neuberg, D., Bejar, R., Boultwood, J.,
Bowen, D. T., Campbell, P. J., Ebert, B. L., Fenaux, P., Haferlach, T., Heuser,
M., Jansen, J. H., Komrokji, R. S., Maciejewski, J. P., Walter, M. J., Fontenay,
M., Garcia-Manero, G., Graubert, T. A., … Cazzola, M. (2020). SF3B1-mutant
MDS as a distinct disease subtype: A proposal from the International Working
Group for the Prognosis of MDS. Blood, 136(2), 157–170.
https://doi.org/10.1182/blood.2020004850
McKenzie, J. L., Gan, O. I., Doedens, M., Wang, J. C. Y., & Dick, J. E. (2006). Individual
stem cells with highly variable proliferation and self-renewal properties comprise
the human hematopoietic stem cell compartment. Nature Immunology, 7, 1225.
Meacham, C. E., & Morrison, S. J. (2013). Tumor heterogeneity and cancer cell
plasticity. Nature, 501(7467), 328–337. https://doi.org/10.1038/nature12624
Merino, D., Weber, T. S., Serrano, A., Vaillant, F., Liu, K., Pal, B., Di Stefano, L.,
Schreuder, J., Lin, D., Chen, Y., Asselin-Labat, M. L., Schumacher, T. N.,
Cameron, D., Smyth, G. K., Papenfuss, A. T., Lindeman, G. J., Visvader, J. E., &
Naik, S. H. (2019). Barcoding reveals complex clonal behavior in patient-derived
xenografts of metastatic triple negative breast cancer. Nature Communications,
10(1), 766. https://doi.org/10.1038/s41467-019-08595-2
Metzeler, K. H., Maharry, K., Radmacher, M. D., Mrózek, K., Margeson, D., Becker, H.,
Curfman, J., Holland, K. B., Schwind, S., Whitman, S. P., Wu, Y.-Z., Blum, W.,
Powell, B. L., Carter, T. H., Wetzler, M., Moore, J. O., Kolitz, J. E., Baer, M. R.,
Carroll, A. J., … Bloomfield, C. D. (2011). TET2 mutations improve the new
European LeukemiaNet risk classification of acute myeloid leukemia: A Cancer
and Leukemia Group B study. Journal of Clinical Oncology: Official Journal of the
American Society of Clinical Oncology, 29(10), 1373–1381.
https://doi.org/10.1200/JCO.2010.32.7742
Mitnitski, A., Howlett, S. E., & Rockwood, K. (2017). Heterogeneity of Human Aging and
Its Assessment. The Journals of Gerontology. Series A, Biological Sciences and
Medical Sciences, 72(7), 877–884.
https://doi.org/10.1093/gerona/glw089
Moran-Crusio, K., Reavie, L., Shih, A., Abdel-Wahab, O., Ndiaye-Lobry, D., Lobry, C.,
Figueroa, M. E., Vasanthakumar, A., Patel, J., Zhao, X., Perna, F., Pandey, S.,
Madzo, J., Song, C., Dai, Q., He, C., Ibrahim, S., Beran, M., Zavadil, J., …
Levine, R. L. (2011). Tet2 loss leads to increased hematopoietic stem cell self-
renewal and myeloid transformation. Cancer Cell, 20(1), 11–24.
https://doi.org/10.1016/j.ccr.2011.06.001
Morrison, S. J., & Weissman, I. L. (1994). The long-term repopulating subset of
hematopoietic stem cells is deterministic and isolatable by phenotype. Immunity,
1(8), 661–673.
Muller-Sieburg, C. E., Whitlock, C. A., & Weissman, I. L. (1986). Isolation of two early B
lymphocyte progenitors from mouse marrow: A committed Pre-Pre-B cell and a
clonogenic Thy-1lo hematopoietic stem cell. Cell, 44(4), 653–662.
https://doi.org/10.1016/0092-8674(86)90274-6
Muto, T., Guillamot, M., Yeung, J., Fang, J., Bennett, J., Nadorp, B., Lasry, A.,
Redondo, L. Z., Choi, K., Gong, Y., Walker, C. S., Hueneman, K., Bolanos, L. C.,
Barreyro, L., Lee, L. H., Greis, K. D., Vasyliev, N., Khodadadi-Jamayran, A.,
Nudler, E., … Starczynowski, D. T. (2022). TRAF6 functions as a tumor
suppressor in myeloid malignancies by directly targeting MYC oncogenic activity.
Cell Stem Cell, 29(2), 298-314.e9. https://doi.org/10.1016/j.stem.2021.12.007
Naik, S. H., Perié, L., Swart, E., Gerlach, C., van Rooij, N., de Boer, R. J., &
Schumacher, T. N. (2013). Diverse and heritable lineage imprinting of early
haematopoietic progenitors. Nature, 496(7444), 229–232.
https://doi.org/10.1038/nature12013
Naik, S. H., Schumacher, T. N., & Perié, L. (2014). Cellular barcoding: A technical
appraisal. Experimental Hematology, 42(8), 598–608.
https://doi.org/10.1016/j.exphem.2014.05.003
Nakajima, H., & Kunimoto, H. (2014). TET2 as an epigenetic master regulator for
normal and malignant hematopoiesis. Cancer Science, 105(9), 1093–1099.
https://doi.org/10.1111/cas.12484
Nakauchi, Y., Azizi, A., Thomas, D., Corces, M. R., Reinisch, A., Sharma, R., Cruz
Hernandez, D., Köhnke, T., Karigane, D., Fan, A., Martinez-Krams, D., Stafford,
M., Kaur, S., Dutta, R., Phan, P., Ediriwickrema, A., McCarthy, E., Ning, Y.,
Phillips, T., … Majeti, R. (2022). The Cell Type–Specific 5hmC Landscape and
Dynamics of Healthy Human Hematopoiesis and TET2-Mutant Preleukemia.
Blood Cancer Discovery, 3(4), 346–367. https://doi.org/10.1158/2643-3230.BCD-
21-0143
Nam, A. S., Dusaj, N., Izzo, F., Murali, R., Myers, R. M., Mouhieddine, T. H., Sotelo, J.,
Benbarche, S., Waarts, M., Gaiti, F., Tahri, S., Levine, R., Abdel-Wahab, O.,
Godley, L. A., Chaligne, R., Ghobrial, I., & Landau, D. A. (2022). Single-cell multi-
omics of human clonal hematopoiesis reveals that DNMT3A R882 mutations
perturb early progenitor states through selective hypomethylation. Nature
Genetics, 54(10), Article 10. https://doi.org/10.1038/s41588-022-01179-9
Nestorowa, S., Hamey, F. K., Pijuan Sala, B., Diamanti, E., Shepherd, M., Laurenti, E.,
Wilson, N. K., Kent, D. G., & Göttgens, B. (2016). A single-cell resolution map of
mouse hematopoietic stem and progenitor cell differentiation. Blood, 128(8), e20-
31. https://doi.org/10.1182/blood-2016-05-716480
Nguyen, L. V., Cox, C. L., Eirew, P., Knapp, D. J. H. F., Pellacani, D., Kannan, N.,
Carles, A., Moksa, M., Balani, S., Shah, S., Hirst, M., Aparicio, S., & Eaves, C. J.
(2014). DNA barcoding reveals diverse growth kinetics of human breast tumour
subclones in serially passaged xenografts. Nature Communications, 5, 5871.
Nguyen, L. V., Makarem, M., Carles, A., Moksa, M., Kannan, N., Pandoh, P., Eirew, P.,
Osako, T., Kardel, M., Cheung, A. M. S., Kennedy, W., Tse, K., Zeng, T., Zhao,
Y., Humphries, R. K., Aparicio, S., Eaves, C. J., & Hirst, M. (2014). Clonal
Analysis via Barcoding Reveals Diverse Growth and Differentiation of
Transplanted Mouse and Human Mammary Stem Cells. Cell Stem Cell, 14(2),
253–263. https://doi.org/10.1016/j.stem.2013.12.011
Nguyen, L. V., Pellacani, D., Lefort, S., Kannan, N., Osako, T., Makarem, M., Cox, C. L.,
Kennedy, W., Beer, P., Carles, A., Moksa, M., Bilenky, M., Balani, S., Babovic,
S., Sun, I., Rosin, M., Aparicio, S., Hirst, M., & Eaves, C. J. (2015). Barcoding
reveals complex clonal dynamics of de novo transformed human mammary cells.
Nature, 528, 267.
Nguyen, L., Wang, Z., Chowdhury, A. Y., Chu, E., Eerdeng, J., Jiang, D., & Lu, R.
(2018). Functional compensation between hematopoietic stem cell clones in vivo.
EMBO Reports, 19(8), e45702. https://doi.org/10.15252/embr.201745702
O’Donnell, K. A., Yu, D., Zeller, K. I., Kim, J.-W., Racke, F., Thomas-Tikhonenko, A., &
Dang, C. V. (2006). Activation of transferrin receptor 1 by c-Myc enhances
cellular proliferation and tumorigenesis. Molecular and Cellular Biology, 26(6),
2373–2386. https://doi.org/10.1128/MCB.26.6.2373-2386.2006
Oguro, H., Ding, L., & Morrison, S. J. (2013). SLAM Family Markers Resolve
Functionally Distinct Subpopulations of Hematopoietic Stem Cells and
Multipotent Progenitors. Cell Stem Cell, 13(1), 102–116.
https://doi.org/10.1016/j.stem.2013.05.014
Osawa, M., Hanada, K., Hamada, H., & Nakauchi, H. (1996). Long-Term
Lymphohematopoietic Reconstitution by a Single CD34-Low/Negative
Hematopoietic Stem Cell. Science, 273(5272), 242.
https://doi.org/10.1126/science.273.5272.242
Osorio, F. G., Rosendahl Huber, A., Oka, R., Verheul, M., Patel, S. H., Hasaart, K., de
la Fonteijne, L., Varela, I., Camargo, F. D., & van Boxtel, R. (2018). Somatic
Mutations Reveal Lineage Relationships and Age-Related Mutagenesis in
Human Hematopoiesis. Cell Reports, 25(9), 2308-2316.e4.
https://doi.org/10.1016/j.celrep.2018.11.014
Pang, W. W., Schrier, S. L., & Weissman, I. L. (2017). Age-associated changes in
human hematopoietic stem cells. Seminars in Hematology, 54(1), 39–42.
https://doi.org/10.1053/j.seminhematol.2016.10.004
Papaemmanuil, E., Gerstung, M., Bullinger, L., Gaidzik, V. I., Paschka, P., Roberts, N.
D., Potter, N. E., Heuser, M., Thol, F., Bolli, N., Gundem, G., Van Loo, P.,
Martincorena, I., Ganly, P., Mudie, L., McLaren, S., O’Meara, S., Raine, K.,
Jones, D. R., … Campbell, P. J. (2016). Genomic Classification and Prognosis in
Acute Myeloid Leukemia. New England Journal of Medicine, 374(23), 2209–
2221. https://doi.org/10.1056/NEJMoa1516192
Park, S., Mali, N. M., Kim, R., Choi, J.-W., Lee, J., Lim, J., Park, J. M., Park, J. W., Kim,
D., Kim, T., Yi, K., Choi, J. H., Kwon, S. G., Hong, J. H., Youk, J., An, Y., Kim, S.
Y., Oh, S. A., Kwon, Y., … Ju, Y. S. (2021). Clonal dynamics in early human
embryogenesis inferred from somatic mutation. Nature, 597(7876), Article 7876.
https://doi.org/10.1038/s41586-021-03786-8
Pei, W., Feyerabend, T. B., Rössler, J., Wang, X., Postrach, D., Busch, K., Rode, I.,
Klapproth, K., Dietlein, N., Quedenau, C., Chen, W., Sauer, S., Wolf, S., Höfer,
T., & Rodewald, H.-R. (2017). Polylox barcoding reveals haematopoietic stem
cell fates realized in vivo. Nature, 548(7668), 456–460.
https://doi.org/10.1038/nature23653
Pellin, D., Loperfido, M., Baricordi, C., Wolock, S. L., Montepeloso, A., Weinberg, O. K.,
Biffi, A., Klein, A. M., & Biasco, L. (2019). A comprehensive single cell
transcriptional landscape of human hematopoietic progenitors. Nature
Communications, 10(1), Article 1. https://doi.org/10.1038/s41467-019-10291-0
Pietras, E. M., Reynaud, D., Kang, Y.-A., Carlin, D., Calero-Nieto, F. J., Leavitt, A. D.,
Stuart, J. M., Göttgens, B., & Passegué, E. (2015). Functionally Distinct Subsets
of Lineage-Biased Multipotent Progenitors Control Blood Production in Normal
and Regenerative Conditions. Cell Stem Cell, 17(1), 35–46.
https://doi.org/10.1016/j.stem.2015.05.003
Quivoron, C., Couronné, L., Della Valle, V., Lopez, C. K., Plo, I., Wagner-Ballon, O., Do
Cruzeiro, M., Delhommeau, F., Arnulf, B., Stern, M.-H., Godley, L., Opolon, P.,
Tilly, H., Solary, E., Duffourd, Y., Dessen, P., Merle-Beral, H., Nguyen-Khac, F.,
Fontenay, M., … Bernard, O. A. (2011). TET2 inactivation results in pleiotropic
hematopoietic abnormalities in mouse and is a recurrent event during human
lymphomagenesis. Cancer Cell, 20(1), 25–38.
https://doi.org/10.1016/j.ccr.2011.06.003
Rasmussen, K. D., Jia, G., Johansen, J. V., Pedersen, M. T., Rapin, N., Bagger, F. O.,
Porse, B. T., Bernard, O. A., Christensen, J., & Helin, K. (2015). Loss of TET2 in
hematopoietic cells leads to DNA hypermethylation of active enhancers and
induction of leukemogenesis. Genes & Development, 29(9), 910–922.
https://doi.org/10.1101/gad.260174.115
Rios, A. C., Fu, N. Y., Lindeman, G. J., & Visvader, J. E. (2014). In situ identification of
bipotent stem cells in the mammary gland. Nature, 506, 322.
Roche-Lestienne, C., Marceau, A., Labis, E., Nibourel, O., Coiteux, V., Guilhot, J.,
Legros, L., Nicolini, F., Rousselot, P., Gardembas, M., Helevaut, N., Frimat, C.,
Mahon, F.-X., Guilhot, F., Preudhomme, C., & Fi-LMC group. (2011). Mutation
analysis of TET2, IDH1, IDH2 and ASXL1 in chronic myeloid leukemia.
Leukemia, 25(10), 1661–1664. https://doi.org/10.1038/leu.2011.139
Rodriguez-Fraticelli, A. E., Wolock, S. L., Weinreb, C. S., Panero, R., Patel, S. H.,
Jankovic, M., Sun, J., Calogero, R. A., Klein, A. M., & Camargo, F. D. (2018).
Clonal analysis of lineage fate in native haematopoiesis. Nature, 553(7687),
212–216. https://doi.org/10.1038/nature25168
Rüfer, A. W., & Sauer, B. (2002). Non-contact positions impose site selectivity on Cre
recombinase. Nucleic Acids Research, 30(13), 2764–2771.
https://doi.org/10.1093/nar/gkf399
Sangar, F., Schreurs, A.-S., Umaña-Diaz, C., Clapéron, A., Desbois-Mouthon, C.,
Calmel, C., Mauger, O., Zaanan, A., Miquel, C., Fléjou, J.-F., & Praz, F. (2014).
Involvement of small ArfGAP1 (SMAP1), a novel Arf6-specific GTPase-activating
protein, in microsatellite instability oncogenesis. Oncogene, 33(21), 2758–2767.
https://doi.org/10.1038/onc.2013.211
Schepers, K., Swart, E., van Heijst, J. W. J., Gerlach, C., Castrucci, M., Sie, D.,
Heimerikx, M., Velds, A., Kerkhoven, R. M., Arens, R., & Schumacher, T. N. M.
(2008). Dissecting T cell lineage relationships by cellular barcoding. The Journal
of Experimental Medicine, 205(10), 2309–2318.
https://doi.org/10.1084/jem.20072462
Schmidt, M., Schwarzwaelder, K., Bartholomae, C., Zaoui, K., Ball, C., Pilz, I., Braun,
S., Glimm, H., & von Kalle, C. (2007). High-resolution insertion-site analysis by
linear amplification–mediated PCR (LAM-PCR). Nature Methods, 4, 1051.
Shalek, A. K., Satija, R., Shuga, J., Trombetta, J. J., Gennert, D., Lu, D., Chen, P.,
Gertner, R. S., Gaublomme, J. T., Yosef, N., Schwartz, S., Fowler, B., Weaver,
S., Wang, J., Wang, X., Ding, R., Raychowdhury, R., Friedman, N., Hacohen, N.,
… Regev, A. (2014). Single-cell RNA-seq reveals dynamic paracrine control of
cellular variation. Nature, 510(7505), Article 7505.
https://doi.org/10.1038/nature13437
Shalem, O., Sanjana, N. E., Hartenian, E., Shi, X., Scott, D. A., Mikkelsen, T. S., Heckl,
D., Ebert, B. L., Root, D. E., Doench, J. G., & Zhang, F. (2014). Genome-Scale
CRISPR-Cas9 Knockout Screening in Human Cells. Science, 343(6166), 84.
https://doi.org/10.1126/science.1247005
Shen, F.-W., Tung, J.-S., & Boyse, E. A. (1986). Further definition of the Ly-5 system.
Immunogenetics, 24(3), 146–149. https://doi.org/10.1007/BF00364741
Shen, M. W., Arbab, M., Hsu, J. Y., Worstell, D., Culbertson, S. J., Krabbe, O., Cassa,
C. A., Liu, D. R., Gifford, D. K., & Sherwood, R. I. (2018). Predictable and precise
template-free CRISPR editing of pathogenic variants. Nature, 563(7733), 646–
651. https://doi.org/10.1038/s41586-018-0686-x
Sieburg, H. B., Cho, R. H., Dykstra, B., Uchida, N., Eaves, C. J., & Muller-Sieburg, C. E.
(2006). The hematopoietic stem compartment consists of a limited number of
discrete stem cell subsets. Blood, 107(6), 2311–2316.
https://doi.org/10.1182/blood-2005-07-2970
Siegmund, D., Mauri, D., Peters, N., Juo, P., Thome, M., Reichwein, M., Blenis, J.,
Scheurich, P., Tschopp, J., & Wajant, H. (2001). Fas-associated Death Domain
Protein (FADD) and Caspase-8 Mediate Up-regulation of c-Fos by Fas Ligand
and Tumor Necrosis Factor-related Apoptosis-inducing Ligand (TRAIL) via a
FLICE Inhibitory Protein (FLIP)-regulated Pathway. Journal of Biological
Chemistry, 276(35), 32585–32590. https://doi.org/10.1074/jbc.M100444200
Smith, A. G., Heath, J. K., Donaldson, D. D., Wong, G. G., Moreau, J., Stahl, M., &
Rogers, D. (1988). Inhibition of pluripotential embryonic stem cell differentiation
by purified polypeptides. Nature, 336(6200), Article 6200.
https://doi.org/10.1038/336688a0
Sun, J., Ramos, A., Chapman, B., Johnnidis, J. B., Le, L., Ho, Y.-J., Klein, A., Hofmann,
O., & Camargo, F. D. (2014). Clonal dynamics of native haematopoiesis. Nature,
514(7522), 322–327. https://doi.org/10.1038/nature13824
Takahashi, K., & Yamanaka, S. (2006). Induction of Pluripotent Stem Cells from Mouse
Embryonic and Adult Fibroblast Cultures by Defined Factors. Cell, 126(4), 663–
676. https://doi.org/10.1016/j.cell.2006.07.024
Tanabe, K., Torii, T., Natsume, W., Braesch-Andersen, S., Watanabe, T., & Satake, M.
(2005). A Novel GTPase-activating Protein for ARF6 Directly Interacts with
Clathrin and Regulates Clathrin-dependent Endocytosis. Molecular Biology of the
Cell, 16(4), 1617–1628. https://doi.org/10.1091/mbc.e04-08-0683
Tang, F., Barbacioru, C., Wang, Y., Nordman, E., Lee, C., Xu, N., Wang, X., Bodeau, J.,
Tuch, B. B., Siddiqui, A., Lao, K., & Surani, M. A. (2009). mRNA-Seq whole-
transcriptome analysis of a single cell. Nature Methods, 6(5), Article 5.
https://doi.org/10.1038/nmeth.1315
Tefferi, A., & Vardiman, J. W. (2009). Myelodysplastic syndromes. The New England
Journal of Medicine, 361(19), 1872–1885.
https://doi.org/10.1056/NEJMra0902908
Thielecke, L., Aranyossy, T., Dahl, A., Tiwari, R., Roeder, I., Geiger, H., Fehse, B.,
Glauche, I., & Cornils, K. (2017). Limitations and challenges of genetic barcode
quantification. Scientific Reports, 7, 43249.
Thielecke, L., Cornils, K., & Glauche, I. (2019). GenBaRcode – a comprehensive R
package for genetic barcode analysis. BioRxiv, 696229.
https://doi.org/10.1101/696229
Till, J. E., McCulloch, E. A., & Siminovitch, L. (1964). A stochastic model of stem cell
proliferation, based on the growth of spleen colony-forming cells. Proceedings of
the National Academy of Sciences, 51(1), 29–36.
https://doi.org/10.1073/pnas.51.1.29
Tulstrup, M., Soerensen, M., Hansen, J. W., Gillberg, L., Needhamsen, M., Kaastrup,
K., Helin, K., Christensen, K., Weischenfeldt, J., & Grønbæk, K. (2021). TET2
mutations are associated with hypermethylation at key regulatory enhancers in
normal and malignant hematopoiesis. Nature Communications, 12(1), 6061.
https://doi.org/10.1038/s41467-021-26093-2
Verovskaya, E., Broekhuis, M. J. C., Zwart, E., Ritsema, M., van Os, R., de Haan, G., &
Bystrykh, L. V. (2013). Heterogeneity of young and aged murine hematopoietic
stem cells revealed by quantitative clonal analysis using cellular barcoding.
Blood, 122(4), 523–532. https://doi.org/10.1182/blood-2013-01-481135
Wang, T., Wei, J. J., Sabatini, D. M., & Lander, E. S. (2014). Genetic Screens in Human
Cells Using the CRISPR-Cas9 System. Science, 343(6166), 80.
https://doi.org/10.1126/science.1246981
Warren, J. T., & Link, D. C. (2020). Clonal hematopoiesis and risk for hematologic
malignancy. Blood, 136(14), 1599–1605.
https://doi.org/10.1182/blood.2019000991
Wasserstrom, A., Adar, R., Shefer, G., Frumkin, D., Itzkovitz, S., Stern, T., Shur, I.,
Zangi, L., Kaplan, S., Harmelin, A., Reisner, Y., Benayahu, D., Tzahor, E., Segal,
E., & Shapiro, E. (2008). Reconstruction of Cell Lineage Trees in Mice. PLOS
ONE, 3(4), e1939. https://doi.org/10.1371/journal.pone.0001939
Weber, K., Thomaschewski, M., Benten, D., & Fehse, B. (2012). RGB marking with
lentiviral vectors for multicolor clonal cell tracking. Nature Protocols, 7, 839.
Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D., & Klein, A. M. (2020). Lineage
tracing on transcriptional landscapes links state to fate during differentiation.
Science, 367(6479), eaaw3381. https://doi.org/10.1126/science.aaw3381
Weinstock, J. S., Gopakumar, J., Burugula, B. B., Uddin, M. M., Jahn, N., Belk, J. A.,
Daniel, B., Ly, N., Mack, T. M., Laurie, C. A., Broome, J. G., Taylor, K. D., Guo,
X., Sinner, M. F., von Falkenhausen, A. S., Kääb, S., Shuldiner, A. R., O’Connell,
J. R., Lewis, J. P., … NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium.
(2021). Clonal hematopoiesis is driven by aberrant activation of TCL1A.
bioRxiv. https://doi.org/10.1101/2021.12.10.471810
Weissman, I. L. (2000). Stem cells: Units of development, units of regeneration, and
units in evolution. Cell, 100(1), 157–168.
Weissman, I. L., & Shizuru, J. A. (2008). The origins of the identification and isolation of
hematopoietic stem cells, and their capability to induce donor-specific
transplantation tolerance and treat autoimmune diseases. Blood, 112(9), 3543–
3553. https://doi.org/10.1182/blood-2008-08-078220
Welch, J. S., Ley, T. J., Link, D. C., Miller, C. A., Larson, D. E., Koboldt, D. C.,
Wartman, L. D., Lamprecht, T. L., Liu, F., Xia, J., Kandoth, C., Fulton, R. S.,
McLellan, M. D., Dooling, D. J., Wallis, J. W., Chen, K., Harris, C. C., Schmidt, H.
K., Kalicki-Veizer, J. M., … Wilson, R. K. (2012). The Origin and Evolution of
Mutations in Acute Myeloid Leukemia. Cell, 150(2), 264–278.
https://doi.org/10.1016/j.cell.2012.06.023
Weng, M.-T., Tung, T.-H., Lee, J.-H., Wei, S.-C., Lin, H.-L., Huang, Y.-J., Wong, J.-M.,
Luo, J., & Sheu, J.-C. (2015). Enhancer of rudimentary homolog regulates DNA
damage response in hepatocellular carcinoma. Scientific Reports, 5(1), 9357.
https://doi.org/10.1038/srep09357
Woodworth, M. B., Girskis, K. M., & Walsh, C. A. (2017). Building a lineage from single
cells: Genetic techniques for cell lineage tracking. Nature Reviews Genetics, 18,
230.
Wu, C., Espinoza, D. A., Koelle, S. J., Yang, D., Truitt, L., Schlums, H., Lafont, B. A.,
Davidson-Moncada, J. K., Lu, R., Kaur, A., Hammer, Q., Li, B., Panch, S., Allan,
D. A., Donahue, R. E., Childs, R. W., Romagnani, C., Bryceson, Y. T., & Dunbar,
C. E. (2018). Clonal expansion and compartmentalized maintenance of rhesus
macaque NK cell subsets. Science Immunology, 3(29).
https://doi.org/10.1126/sciimmunol.aat9781
Wu, C., Jares, A., Winkler, T., Xie, J., Larochelle, A., & Dunbar, C. E. (2011). Tracking
Retroviral-Integrated Clones with Modified Non-Restriction Enzyme LAM-PCR
Technology. Molecular Therapy, 19, S45. https://doi.org/10.1016/S1525-
0016(16)36685-0
Wu, C., Jares, A., Winkler, T., Xie, J., Metais, J.-Y., & Dunbar, C. E. (2013). High
efficiency restriction enzyme-free linear amplification-mediated polymerase chain
reaction approach for tracking lentiviral integration sites does not abrogate
retrieval bias. Human Gene Therapy, 24(1), 38–47.
https://doi.org/10.1089/hum.2012.082
Wu, C., Li, B., Lu, R., Koelle, S. J., Yang, Y., Jares, A., Krouse, A. E., Metzger, M.,
Liang, F., Loré, K., Wu, C. O., Donahue, R. E., Chen, I. S. Y., Weissman, I., &
Dunbar, C. E. (2014). Clonal tracking of rhesus macaque hematopoiesis
highlights a distinct lineage origin for natural killer cells. Cell Stem Cell, 14(4),
486–499. https://doi.org/10.1016/j.stem.2014.01.020
Yamanaka, S. (2020). Pluripotent Stem Cell-Based Cell Therapy—Promise and
Challenges. Cell Stem Cell, 27(4), 523–531.
https://doi.org/10.1016/j.stem.2020.09.014
Yashin, A. I., Arbeev, K. G., Arbeeva, L. S., Wu, D., Akushevich, I., Kovtun, M., Yashkin,
A., Kulminski, A., Culminskaya, I., Stallard, E., Li, M., & Ukraintseva, S. V.
(2016). How the effects of aging and stresses of life are integrated in mortality
rates: Insights for genetic studies of human health and longevity. Biogerontology,
17(1), 89–107. https://doi.org/10.1007/s10522-015-9594-8
Yeaton, A., Cayanan, G., Loghavi, S., Dolgalev, I., Leddin, E. M., Loo, C. E., Torabifard,
H., Nicolet, D., Wang, J., Corrigan, K., Paraskevopoulou, V., Starczynowski, D.
T., Wang, E., Abdel-Wahab, O., Viny, A. D., Stone, R. M., Byrd, J. C.,
Guryanova, O. A., Kohli, R. M., … Guillamot, M. (2022). The Impact of
Inflammation-Induced Tumor Plasticity during Myeloid Transformation. Cancer
Discovery, 12(10), 2392–2413. https://doi.org/10.1158/2159-8290.CD-21-1146
Yu, W.-M., Liu, X., Shen, J., Jovanovic, O., Pohl, E. E., Gerson, S. L., Finkel, T.,
Broxmeyer, H. E., & Qu, C.-K. (2013). Metabolic regulation by the mitochondrial
phosphatase PTPMT1 is required for hematopoietic stem cell differentiation. Cell
Stem Cell, 12(1), 62–74. https://doi.org/10.1016/j.stem.2012.11.022
Zakrzewski, W., Dobrzyński, M., Szymonowicz, M., & Rybak, Z. (2019). Stem cells:
Past, present, and future. Stem Cell Research & Therapy, 10(1), 68.
https://doi.org/10.1186/s13287-019-1165-5
Zhang, X., Su, J., Jeong, M., Ko, M., Huang, Y., Park, H. J., Guzman, A., Lei, Y.,
Huang, Y.-H., Rao, A., Li, W., & Goodell, M. A. (2016). DNMT3A and TET2
compete and cooperate to repress lineage-specific transcription factors in
hematopoietic stem cells. Nature Genetics, 48(9), 1014–1023.
https://doi.org/10.1038/ng.3610
Zhang, Y., Qian, J., Gu, C., & Yang, Y. (2021). Alternative splicing and cancer: A
systematic review. Signal Transduction and Targeted Therapy, 6(1), 78.
https://doi.org/10.1038/s41392-021-00486-7
Zhou, A., Ou, A. C., Cho, A., Benz, E. J., & Huang, S.-C. (2008). Novel Splicing Factor
RBM25 Modulates Bcl-x Pre-mRNA 5′ Splice Site Selection. Molecular and
Cellular Biology, 28(19), 5924–5936. https://doi.org/10.1128/MCB.00560-08
Zhou, C., Martinez, E., Di Marcantonio, D., Solanki-Patel, N., Aghayev, T., Peri, S.,
Ferraro, F., Skorski, T., Scholl, C., Fröhling, S., Balachandran, S., Wiest, D. L., &
Sykes, S. M. (2017). JUN is a key transcriptional regulator of the unfolded protein
response in acute myeloid leukemia. Leukemia, 31(5), 1196–1205.
https://doi.org/10.1038/leu.2016.329
Zhou, S., Bonner, M. A., Wang, Y.-D., Rapp, S., De Ravin, S. S., Malech, H. L., &
Sorrentino, B. P. (2014). Quantitative Shearing Linear Amplification Polymerase
Chain Reaction: An Improved Method for Quantifying Lentiviral Vector Insertion
Sites in Transplanted Hematopoietic Cell Systems. Human Gene Therapy
Methods, 26(1), 4–12. https://doi.org/10.1089/hgtb.2014.122
Zhu, M., Lu, T., Jia, Y., Luo, X., Gopal, P., Li, L., Odewole, M., Renteria, V., Singal, A.
G., Jang, Y., Ge, K., Wang, S. C., Sorouri, M., Parekh, J. R., MacConmara, M.
P., Yopp, A. C., Wang, T., & Zhu, H. (2019). Somatic Mutations Increase Hepatic
Clonal Fitness and Regeneration in Chronic Liver Disease. Cell, 177(3), 608-
621.e12. https://doi.org/10.1016/j.cell.2019.03.026
Zink, F., Stacey, S. N., Norddahl, G. L., Frigge, M. L., Magnusson, O. T., Jonsdottir, I.,
Thorgeirsson, T. E., Sigurdsson, A., Gudjonsson, S. A., Gudmundsson, J.,
Jonasson, J. G., Tryggvadottir, L., Jonsson, T., Helgason, A., Gylfason, A.,
Sulem, P., Rafnar, T., Thorsteinsdottir, U., Gudbjartsson, D. F., … Stefansson, K.
(2017). Clonal hematopoiesis, with and without candidate driver mutations, is
common in the elderly. Blood, 130(6), 742–752.
https://doi.org/10.1182/blood-2017-02-769869
Appendix
List of publications
Publications published or submitted by December 2022, all completed during doctoral
training.
First author:
• Bramlett, C., Eerdeng, J., Jiang, D., Condie, P., Lee, Y., Vergel, M., Garcia, I.,
Nogalska, A., & Lu, R. (2022). RNA splicing factors underlie preleukemic clonal
expansion. In submission.
• Bramlett, C., Jiang, D., Nogalska, A., Eerdeng, J., Contreras, J., & Lu, R. (2020).
Clonal tracking using embedded viral barcoding and high-throughput sequencing.
Nature Protocols, 15(4), 1436–1458. https://doi.org/10.1038/s41596-019-0290-z
Co-author:
• Nogalska, A., Eerdeng, J., Akre, S., Bramlett, C., Wang, B., Cess, C., Finley, S.,
& Lu, R. (2022). Small subsets of stem cells drive the variability in the onset of
hematopoietic aging. In submission.
• Contreras-Trujillo, H., Eerdeng, J., Akre, S., Jiang, D., Contreras, J., Gala, B.,
Vergel-Rodriguez, M. C., Lee, Y., Jorapur, A., Andreasian, A., Harton, L.,
Bramlett, C. S., Nogalska, A., Xiao, G., Lee, J.-W., Chan, L. N., Müschen, M.,
Merchant, A. A., & Lu, R. (2021). Deciphering intratumoral heterogeneity using
integrated clonal tracking and single-cell transcriptome analyses. Nature
Communications, 12(1), 6522. https://doi.org/10.1038/s41467-021-26771-1
• Hao, J., Zhou, H., Nemes, K., Yen, D., Zhao, W., Bramlett, C., Wang, B., Lu, R.,
& Shen, K. (2021). Membrane-bound SCF and VCAM-1 synergistically regulate
the morphology of hematopoietic stem cells. Journal of Cell Biology, 220(10),
e202010118. https://doi.org/10.1083/jcb.202010118
Abstract
Clonal expansion is a critical step in the early stage of leukemogenesis. Although genetic mutations that induce expansion and increase disease risk (such as those in TET2, DNMT3A, and ASXL1) have been identified, these mutations are also frequently found in healthy individuals, many of whom never progress to malignancy. This thesis aims to further understand the basis of preleukemic heterogeneity, especially among cells that share the same genetic mutation. Here, we tracked preleukemic clonal expansion using genetic barcoding and found that preleukemic hematopoietic stem cells (HSCs) are highly heterogeneous at both the clonal and transcriptomic levels. We identified HSC clones that underwent extreme expansion and characterized their associated gene expression. Compared to non-overexpanded Tet2 knockout HSCs, these overexpanded HSCs expressed significantly lower levels of leukemia-associated genes, predominantly RNA splicing genes. These differences could contribute to the variable disease risk across individuals and could help further risk-stratify patients who harbor preleukemic mutations.
Asset Metadata
Creator
Bramlett, Charles (author)
Core Title
Deciphering heterogeneity of preleukemic clonal expansion
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Development, Stem Cells and Regenerative Medicine
Publication Date
11/21/2022
Defense Date
10/17/2022
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
cellular heterogeneity,clonal hematopoiesis,clonal tracking,OAI-PMH Harvest,preleukemia,TET2
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Yu, Min (committee chair), Jadhav, Unmesh (committee member), Lu, Rong (committee member)
Creator Email
cbramlet@usc.edu,cs.bramlett@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC112485547
Unique identifier
UC112485547
Identifier
etd-BramlettCh-11323.pdf (filename)
Legacy Identifier
etd-BramlettCh-11323
Document Type
Dissertation
Rights
Bramlett, Charles
Internet Media Type
application/pdf
Type
texts
Source
20221121-usctheses-batch-992 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu