DISSERTATION
EXPLORING STEM CELL PLURIPOTENCY THROUGH LONG RANGE
CHROMOSOME INTERACTIONS
By
David Shih Yu Huang
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements
for the Degree
DOCTOR OF PHILOSOPHY
(GENETIC, MOLECULAR AND CELLULAR BIOLOGY)
December 2018
Doctoral Committee:
Advisor: Wange Lu
Neil Segil
Michael Stallcup
Copyright 2018 David Huang
Acknowledgements
It has been a privilege to complete this dissertation, and I am grateful to have had
the opportunity to do research at USC. I could not have accomplished what I have without
the help and support of various people in my life, whom I want to thank here.
First and foremost, I would like to thank my advisor Dr. Wange Lu for his support
and encouragement in my research endeavors. Without the physical and intellectual
space he has provided me to pursue various research avenues, I would not be where I
am today. I have learned a great deal from him about what it takes to survive on the
cutting edge of research and how to drive research forward through innovative means.
Second, I want to thank the former and current members of the Lu lab. They have
been a source of strength for me and a forum for intellectual discussion as I have
navigated my project forward. In particular, I want to thank Wen Hsuan for helping me get
started in the lab and teaching me essential lab techniques to begin my lab work. I also
owe my gratitude to Byoung San, who taught me how to organize my research and helped
me put the manuscript together when I encountered challenges. Additionally, I want to
thank Mingyang for helping extensively with the bioinformatics analysis and being a moral
support in the lab, and Fan also for his help with bioinformatics analysis.
I would also like to thank members of my qualifying and dissertation committee for
their insights and constructive criticism on my research: Dr. Gerhard Coetzee, Dr. Michael
Stallcup, Dr. Neil Segil, Dr. Qilong Ying, Dr. Justin Ichida, and Dr. Alex Wong. The
guidance and advice they’ve provided during my annual research appraisal meetings
have helped steer my project to success.
I also want to thank my classmates Camila, Tim, and Ingrid for their sympathetic
support and encouragement throughout the PhD process. The many friends I’ve made in
my time working in BCC, in the stem cell department, and in general at USC are all
valuable to me. I also want to express appreciation for the staff at the graduate affairs
office for making my PhD experience as smooth as it has been.
Lastly and most importantly, I want to thank my family for their unconditional love
and support through the arduous years it’s taken for me to finish my PhD work.
Table of Contents
Acknowledgements
List of Figures
List of Tables
Abstract
Chapter 1: Nuclear Architecture in Pluripotent Stem Cells
1.1 A Three Dimensional Genome
1.2 Tools to Study Chromosome Structure: Microscopy Techniques
1.3 Chromosome Conformation Capture
1.4 DNA Adenine Methyltransferase Identification
1.5 Chromatin Structure and Transcriptional Regulation
1.6 Enhancers and Chromatin Looping
1.7 Long Range Interchromosome Interactions
1.8 Site Targeted Genome Engineering
1.9 Embryonic Stem Cells and Pluripotency
1.10 Regulation of Pluripotency By Extrinsic and Intrinsic Factors
1.11 Induced Pluripotent Stem Cells
1.12 Pluripotency and Chromatin Interactions
Chapter 2: Improved Detection of Long Range Chromatin Interactions With CRISPR-Dam
2.1 Abstract
2.2 Introduction
2.3 Materials and Methods
2.4 Results
2.4.1 Construction and Validation of CRISPR-Dam Components
2.4.2 Validation of CRISPR-Dam By Profiling Interactions at the Pou5f1 locus
2.4.3 Determination of Optimal Dox Induction Time
2.4.4 CRISPR-Dam Sequencing and Comparison with 4C-seq
2.5 Discussion
Chapter 3: The Oct4 Distal Enhancer Interacts With Distal Targets Llgl2 and Grb7 to Regulate ESC Pluripotency
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.4 Results
3.4.1 Identification of Long Range Interchromosome Targets Llgl2 and Grb7
3.4.2 Reciprocal Interaction of Llgl2 and Grb7 with the Oct4 DE
3.4.3 Oct4 distal enhancer CR4 region directly regulates Llgl2 and Grb7
3.4.4 Llgl2 and Grb7 Knockdown Correlates to a Disrupted Pluripotency State
3.4.5 Overexpression of Llgl2 and Grb7 Enhances Reprogramming Efficiency
3.5 Discussion
Chapter 4: Summary and Future Perspectives
4.1 Summary
4.2 Future Development of CRISPR-Dam
4.3 Functional Chromatin Interactions In Disease
Bibliography
List of Figures
Figure 1.1 Chromosome Territories Identified By DNA FISH
Figure 1.2 Chromosome Conformation Capture (3C) and Derived Techniques
Figure 1.3 DNA Adenine Methyltransferase Identification
Figure 1.4 Compartmentalization of the Nucleus and Chromatin Loops
Figure 1.5 Model of the β-globin gene locus
Figure 1.6 CRISPR-Cas9 Genome Editing
Figure 1.7 Derivation of Embryonic Stem Cells
Figure 1.8 Somatic Cell Reprogramming By Induced Pluripotency
Figure 2.1 Schematic of CRISPR-Dam Assay
Figure 2.2 Identification of ESC lines with Integrated Cas9-Dam
Figure 2.3 Confirmation of sgRNA Expression and Targeting
Figure 2.4 Time Course Induction of Cas9-Dam Labeling Near the Pou5f1 Locus
Figure 2.5 CRISPR-Dam Labeling in FD1 ESC clone Targeting the Oct4 DE
Figure 2.6 CRISPR-Dam Sequencing Library Preparation Strategy
Figure 2.7 Time Course CRISPR-Dam Coverage at the Pou5f1 Locus
Figure 2.8 HOMER Identified 6 Hour Induced CRISPR-Dam Peaks
Figure 2.9 Comparison of CRISPR-Dam Peaks and 4C Identified Regions
Figure 3.1 CRISPR-Dam Interaction Profile at Llgl2 Locus From Oct4 DE
Figure 3.2 CRISPR-Dam Interaction Profile at Grb7 Locus From Oct4 DE
Figure 3.3 3C–PCR gels Tiling Interactions Between Oct4 DE and Llgl2 Loci
Figure 3.4 3C–PCR gels Tiling Interactions Between Oct4 DE and Grb7 Loci
Figure 3.5 3C-PCR Plots Summarizing Contact Frequencies at Llgl2 and Grb7 Regions
Figure 3.6 DNA FISH with BACs Representing Oct4 DE, Llgl2, and Grb7 Loci
Figure 3.7 Reciprocal CRISPR-Dam Interaction Plots from Llgl2 locus to the Oct4 DE locus
Figure 3.8 Reciprocal CRISPR-Dam Interaction Plots from Grb7 locus to the Oct4 DE locus
Figure 3.9 Schematic of LoxP Sites Insertion Flanking CR4 Region Within the Oct4 DE
Figure 3.10 Characterizing Oct4 DE CR4 Floxed ESCs
Figure 3.11 Pluripotency State of Oct4 DE CR4 Floxed ESCs Is Disrupted
Figure 3.12 Loss of Llgl2 or Grb7 Expression Perturbs the Pluripotent State in ESCs
Figure 3.13 Knock Down of Llgl2 and Grb7 Induces Loss of Pluripotency Gene Expression and Increases Differentiation Gene Expression
Figure 3.14 Somatic Cell Reprogramming Is Enhanced With Overexpression of Llgl2 or Grb7
Figure 3.15 Llgl2 and Grb7 Overexpression Aids in Reestablishing Endogenous Pluripotency Gene Expression
List of Tables
Table 2.1 REACTOME Analysis of CRISPR-Dam Identified Genes
Table 2.2 CRISPR-Dam Tiling Primers at Pou5f1 Locus
Table 2.3 CRISPR-Dam Targeting sgRNA Primers for Oct4 DE
Table 3.1 BACs Used to Generate DNA FISH Probes
Table 3.2 CRISPR-Dam Tiling Primers at Llgl2 and Grb7
Table 3.3 CRISPR-Dam Targeting sgRNA Sequences for Reciprocal Pou5f1 Locus Interaction
Table 3.4 3C-PCR Primers
Table 3.5 CRISPR Knock-In LoxP Primers
Table 3.6 shRNA Targeted Sequences
Table 3.7 RT-qPCR Gene Expression Primers
Abstract
Exploration of the genome and the functional elements contained within the DNA
sequence continues to bring forth surprising discoveries about the nature and dynamics of
nuclear architecture. As many new methods for examining chromatin interactions have
emerged in the past decade, there is now a better understanding of the overall spatial
configuration of the chromosomes in the interphase cell nucleus and the overarching
principles of nuclear compartmentalization. In particular, biochemical approaches like
chromosome conformation capture (3C) and its variations have been
developed further by various research groups to continue deciphering the matrix of
interactions in different types of cells. These advances have helped bolster the idea that
spatial organization of the chromatin provides an additional layer of regulation within the
cell for controlling transcriptional programs. However, limitations inherent to the 3C
method have made study of long range interacting chromatin loci difficult. To address the
shortcomings encountered in 3C methods, I have developed a novel assay combining the
specific locus targeting by CRISPR-Cas9 and the DNA labeling function of the DNA
adenine methyltransferase protein utilized in DamID assays. Drawing upon the strengths
of both techniques and performing the assay in live cells, CRISPR-Dam is able to better
characterize the proximity mediated interaction network at a specified locus of interest.
Using the well studied pluripotency gene network in mouse embryonic stem cells as a
model system to validate the new assay, we focused on the interactions mediated by the
Oct4 distal enhancer (DE). The Oct4 DE is required for Oct4 gene expression and
maintenance of stem cell pluripotency. Previously, 4C-seq performed to characterize the
Oct4 DE interactome in mouse embryonic stem cells (ESC) was unsuccessful in
identifying distally located functional target genes of the DE. Performing the CRISPR-Dam
assay at the Oct4 DE in mouse ESCs helped identify two candidate genes, lethal
giant larvae 2 (Llgl2) and growth factor receptor-bound protein 7 (Grb7), both located on
chromosome 11. Deletion of the Oct4 DE via Cre-loxP recombination showed that Llgl2
and Grb7 are directly regulated by the Oct4 DE and their expression cannot be rescued
by ectopic OCT4. We further characterized the roles of Llgl2 and Grb7 in supporting ESC
pluripotency and found that both genes are necessary to maintain ESC pluripotency and that
overexpression of either gene enhances reprogramming efficiency, suggesting that Llgl2
and Grb7 play functional roles in ESC biology.
Chapter 1: Nuclear Architecture in Pluripotent Stem Cells
1.1 A Three Dimensional Genome
The success of whole genome sequencing of humans and other organisms has
greatly advanced knowledge of the nucleic acid content and base ordering within the cell
nucleus, but there remains much to explore regarding the usage and regulation of the DNA
sequence that governs biological function. The conventional conception of the genome in
many molecular biology studies imagines the chromatin as a simple, one-dimensional,
linear strand, with genomic elements functioning without any spatial configuration or
limitation. Additionally, proteins and other regulatory factors and complexes are assumed
to bind at chromatin sites without much restriction by the local environment. However,
as more studies have been published, genome architecture has emerged as an important
epigenetic signature, with roles in regulating cell identity and function. Through the
development of tools to investigate chromatin structure, the influence of the spatial
configuration of the chromatin on transcription and cell function can be more fully explored.
1.2 Tools to Study Chromosome Structure: Microscopy Techniques
As first observed by Rabl and Boveri at the start of the 20th century, chromatin in
the interphase cell nucleus occupies distinct domains. Using light microscopy,
individual chromosomes could be observed occupying distinct spaces within the nucleus
(Cremer and Cremer, 2010). These “territories” are maintained throughout
interphase and are only reorganized into different configurations when cells
enter prophase and metaphase of mitosis. This non-random ordering of the chromatin
suggested that biologically meaningful regulatory mechanisms were at play.
Advent of DNA Fluorescence in situ hybridization (DNA-FISH) allowed for in depth
analysis of chromatin spatial configuration and interaction by highlighting specific loci or
chromosomes within the nucleus. The DNA-FISH protocol requires generation of DNA
probes biochemically tagged with a fluorophore or hapten to hybridize at complementary
loci in the chromatin. Hapten-labeled probes carry nucleotides tagged with non-fluorophore
moieties, usually biotin or digoxigenin, and require a second incubation step with a fluorophore
conjugated antibody for signal detection (Langer-Safer et al., 1982). DNA probes are
usually generated via nick translation by DNA Polymerase I activity on bacterial artificial
chromosomes (BACs) for large regions of interest, or by in vitro transcription of short
stretches of genomic sequence. After hybridization in situ, localization of the probes can
be directly observed via laser excitation of the fluorophores. Multiple probes can be
generated with fluorophores with different excitation frequencies to map overlapping
signals that represent colocalized and potentially interacting loci. Layering of images by
z-stacking allows for 3D reconstruction of the spatial orientation of the chromatin strands
and loci. Additionally, the spatial range of each chromatin locus or region can be determined
visually to calculate a mean distribution of proximity, allowing for estimation of
interaction at the population level (Giorgetti and Heard, 2016). DNA-FISH is intrinsically
a low throughput method due to the need for layered acquisition of images representing
each probe and is limited by the number of channels available for imaging each
fluorophore. Statistical significance can only be obtained by counting large numbers of
single cells. Nevertheless, the technique has been used widely for studying the underlying
structural organization of the chromatin, showing the existence of chromosome territories
and the relocation of transcribed and silenced loci to other nuclear compartments (Cremer
and Cremer, 2001) (Figure 1.1). FISH continues to be useful for visual understanding of
chromatin dynamics and organization.
Figure 1.1 Chromosome Territories Identified By DNA FISH
Chromosome paint probes hybridized to the interphase cell nucleus reveal that each
chromosome occupies a distinct nuclear space. Intermingling of the chromosomes can also
be observed at the borders between territories, indicating potentially interacting regions.
Adapted from Speicher et al. 2005
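As a rough illustration of the population-level estimate described above, the Python sketch below computes the 3D distance between two probe signals in each nucleus, then reports the mean distance and the fraction of nuclei in which the probes fall within a colocalization cutoff. The coordinates, the function names, and the 0.5 µm cutoff are hypothetical values chosen for illustration and are not taken from any dataset or published protocol.

```python
import math

def probe_distance(a, b):
    """Euclidean distance between two 3D probe centroids (x, y, z in micrometers)."""
    return math.dist(a, b)

def summarize_proximity(cells, cutoff_um=0.5):
    """Return (mean distance, fraction of cells with distance <= cutoff)."""
    distances = [probe_distance(a, b) for a, b in cells]
    mean_distance = sum(distances) / len(distances)
    fraction_close = sum(d <= cutoff_um for d in distances) / len(distances)
    return mean_distance, fraction_close

# Hypothetical centroids for probe A and probe B in four nuclei.
cells = [
    ((0.0, 0.0, 0.0), (0.3, 0.0, 0.0)),
    ((1.0, 1.0, 1.0), (1.0, 1.0, 1.4)),
    ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),
    ((2.0, 2.0, 2.0), (2.0, 4.0, 2.0)),
]
mean_distance, fraction_close = summarize_proximity(cells)
```

With only four nuclei the summary is not meaningful statistically, which reflects the point made above that large numbers of single cells must be counted.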
1.3 Chromosome Conformation Capture
Various biochemical techniques have been developed for investigating genome
organization, with chromosome conformation capture (3C) derived techniques as the
most widely used. Focusing primarily on chromatin-chromatin contacts, the family of 3C
based techniques generally follows a similar approach, as described by Job Dekker’s group
in 2002. First, the cells are fixed with formaldehyde or another fixative that crosslinks proteins to
other proteins or DNA, which preserves the internal structures and configurations of the
chromatin inside the nucleus. Next, the cells are harvested and nuclei are lysed using
detergent to release the fixed chromatin. The chromatin is fragmented by restriction
enzymes, chosen based on the aim of the study and available sites at the loci of interest.
Depending on the resolution needed, 6-bp cutting or 4-bp cutting restriction
enzymes are most commonly used to generate “sticky ends” for each fragment. Then, the
fragments are ligated with T4 DNA ligase under very dilute conditions so that chromatin
fragments held within the same fixed protein-DNA complex ligate to each other with higher
probability than fragments from different complexes, creating chimeric ligation products
that capture the “proximity-mediated” interactome in the nucleus. The ligated, circular DNA species are
decrosslinked and purified to create a chromosome conformation capture library. Based
on the number of chosen “bait” sites, ranging from just one locus of interest to multiple
loci, PCR primers are designed to the bait and the other sites of interest to detect the products
generated by the assay. The abundance of these ligation products, measured by qPCR, is
thought to provide a quantitative measure of interaction frequency between chromatin fragments
(Göndör et al., 2008; Hagège et al., 2007) (Figure 1.2).
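The resolution trade-off between 6-bp and 4-bp cutters can be illustrated with a short calculation. Assuming random sequence with equal base frequencies (a simplification, since real genomes deviate from it), an n-bp recognition site occurs on average once every 4^n bp, so roughly every 4,096 bp for a 6-bp cutter and every 256 bp for a 4-bp cutter. The Python sketch below, written for illustration only, computes this expectation and performs a toy in silico digest. The function names and the toy sequence are hypothetical; HindIII and DpnII are used simply as familiar examples of a 6-bp and a 4-bp cutter.

```python
def expected_fragment_length(site_length):
    """Expected spacing of a recognition site in random DNA with equal base frequencies."""
    return 4 ** site_length

def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the top strand is cut.
    """
    fragments, start = [], 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments

# HindIII (A^AGCTT) is a 6-bp cutter; DpnII (^GATC) is a 4-bp cutter.
toy = "CCGATCTTAAGCTTGGGATCAA"  # hypothetical sequence
hindiii_fragments = digest(toy, "AAGCTT", 1)
dpnii_fragments = digest(toy, "GATC", 0)
```

On the toy sequence the 4-bp cutter produces three fragments and the 6-bp cutter two, mirroring the finer fragmentation, and therefore higher potential resolution, expected from the more frequent site.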
In traditional 3C, a single bait is used and primers are designed to interrogate
potential sites of interaction in a ‘one-to-one’ assay. Expanding upon this protocol,
chromosome conformation capture carbon-copy (5C) was developed to interrogate
multiple bait sites to many individual potential interacting sites within a defined region,
which allows for creation of an interaction matrix within a segment of a chromosome (Nora
et al., 2012). However, both 3C and 5C required prior knowledge of potential interaction
between bait and captured sites as they depended on having designed appropriate PCR
primers to detect the interaction. Further development of 3C assays produced circular
chromosome conformation capture (4C) assays, which incorporated inverse PCR with
primers designed at opposing ends of a single bait region to identify putative interacting
sites to one bait site in a ‘one-to-all’ assay. Because they rely on DNA sequencing of the chromosome
conformation capture libraries, 4C assays require no prior knowledge of interacting sites
and have been used to create interactome maps within defined regions, chromosomes,
and even genome wide (Zhao et al., 2006). Taking the 3C protocol to the logical extreme
to capture all interactions within the nucleus, high throughput 3C (Hi-C) was developed
as an unbiased method to generate a matrix of all sites interacting with all other sites
within the genome (Dixon et al., 2012; Lieberman-Aiden et al., 2009). This has allowed
researchers to generate high level interaction maps and 3D models of how individual
chromosomes are spatially oriented in relation to other chromosomes within the nucleus
(Denker and De Laat, 2016; Wit and Laat, 2012).
Figure 1.2 Chromosome Conformation Capture (3C) and Derived Techniques
The basic 3C assay involves crosslinking the chromatin, fragmentation, ligation, and reverse
crosslinking before PCR analysis. Various modifications of the 3C protocol have produced
assays to examine multiple bait loci (5C, Hi-C), unknown loci (4C), and protein mediated
interactions (ChIA-PET).
Adapted from Fraser et al. 2015
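The relationship between the ‘one-to-one’, ‘one-to-all’, and all-versus-all views can be made concrete with a toy data structure. The Python sketch below is a conceptual illustration only and does not represent the analysis pipeline of any of the cited assays: it tallies hypothetical ligation junctions between numbered restriction fragments into a symmetric matrix (the Hi-C-style view), reads out one row as the profile of a single bait (the 4C-style view), and reads out one cell as a single bait-target pair (the 3C-style view). All names and junction counts are invented for the example.

```python
def contact_matrix(pairs, n_fragments):
    """Symmetric all-versus-all count matrix built from ligation junctions."""
    matrix = [[0] * n_fragments for _ in range(n_fragments)]
    for i, j in pairs:
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix

def bait_profile(matrix, bait):
    """One-versus-all profile for a single bait fragment."""
    return matrix[bait]

# Hypothetical junctions between fragments numbered 0 to 3.
pairs = [(0, 1), (0, 1), (1, 2), (0, 3), (2, 3)]
matrix = contact_matrix(pairs, 4)   # all-versus-all view
profile = bait_profile(matrix, 0)   # one-to-all view from fragment 0
one_to_one = matrix[0][1]           # one-to-one view: fragment 0 with fragment 1
```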
The basic 3C technique has also been adapted for examining chromatin interactions
mediated by proteins and protein complexes. ChIP-loop, chromatin interaction analysis with
paired end tagging (ChIA-PET) and Hi-ChIP techniques all include an
immunoprecipitation step using an antibody against a protein of interest to pull down
associated chromatin fragments that are considered to be interacting partners (Cai et al.,
2006; Goh et al., 2012; Horike et al., 2005; Li et al., 2010; Mumbach et al., 2016; Ruan
and Ruan, 2012). The approach has been applied to many DNA binding proteins including
transcription factors, architectural proteins such as CTCF, cohesin components, and
Satb1 (Cai et al., 2006; Handoko et al., 2011; Heidari et al., 2014; Tang et al., 2015).
These powerful techniques help link specific protein binding to specific chromatin contacts.
Refinements of the basic 3C protocol in recent work have increased the resolution
and accuracy of the interaction data obtained. Use of sonication for fragmenting the
chromatin instead of using restriction enzymes has helped to overcome potential bias due
to uneven distribution of restriction sites and inaccessibility of certain sites (Gao et al.,
2013). Using DNA hybridizing probes in the Capture-C assay has allowed for enrichment
of bait specific chimeric ligation products and has been shown to reduce background
noise post sequencing (Davies et al., 2015). These improvements to the 3C method have
allowed for better resolution for the assay, which has contributed to wider adoption by the
chromosome structure field.
1.4 DNA Adenine Methyltransferase Identification
DNA adenine methyltransferase identification (DamID) is a parallel technique
described around the same time as 3C that has also been used to investigate
chromosome interactions. It utilizes DNA adenine methyltransferase (Dam), a
prokaryote-specific enzyme expressed in Escherichia coli that is primarily involved in labeling
the parental DNA strand during DNA replication. As the newly synthesized DNA strand lacks
methylation, errors introduced during replication can be repaired using the methylated
parental strand as the reference template (Calmann and Marinus, 2003).
Dam is also used in prokaryotes for initiating chromosome replication and regulation of
gene expression (Løbner-Olesen et al., 2003). By adapting the methyltransferase activity
of Dam for labeling chromatin loci in mammalian cells, DamID was initially used to identify
protein-binding sites within the chromatin. This was accomplished by fusing Dam to a
DNA binding protein of interest and expressing the fusion construct in cells. The Dam
fusion protein is then tethered to the native protein binding site and labels loci through
adenine methylation of GATC sequences within proximally positioned DNA strands
(Figure 1.3). Labeled sites can then be assayed directly after DNA extraction by
sequential digestion with DpnI and DpnII restriction enzymes (Cléard et al., 2006; Greil et
al., 2006; Horton et al., 2005; Steensel and Henikoff, 2000). Both DpnI and
DpnII recognize the GATC motif, but DpnI cleaves only methylated
GATC sites while DpnII is blocked from cleaving methylated GATC. The products remaining
after sequential restriction are chromatin fragments that are flanked by methylated GATC
sites but contain no internal unmethylated GATC sites, and these can serve as templates for
PCR detection. As restriction digestion is the sole ex vivo biochemical process used for processing labeled
sites, a mostly unmodified interaction profile of protein binding loci can be generated.
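The selection logic of the sequential DpnI and DpnII digestion can be sketched in a few lines of Python. This is a simplified model written for illustration, not a description of any published DamID pipeline: it treats each GATC site as either methylated or unmethylated, ignores hemimethylation and adaptor ligation, and uses hypothetical positions and function names. A fragment is reported only if both of its ends are methylated sites (DpnI cuts there) and no unmethylated site lies between them (DpnII would cut there and destroy the PCR template).

```python
def amplifiable_fragments(sites):
    """Return fragments that survive sequential DpnI and DpnII digestion.

    `sites` is a list of (position, is_methylated) tuples for GATC motifs,
    sorted by position. DpnI cuts only methylated GATC, so candidate fragments
    run between successive methylated sites. DpnII cuts only unmethylated GATC,
    so a candidate containing an unmethylated site is cut and discarded.
    """
    fragments = []
    previous_methylated = None
    intact = False
    for position, is_methylated in sites:
        if is_methylated:
            if previous_methylated is not None and intact:
                fragments.append((previous_methylated, position))
            previous_methylated = position
            intact = True
        else:
            intact = False
    return fragments

# Hypothetical GATC sites along a locus; the site at 600 was not labeled by Dam.
sites = [(100, True), (350, True), (600, False), (900, True), (1200, True)]
fragments = amplifiable_fragments(sites)
```

In this toy locus the interval from 350 to 900 is lost because it contains the unmethylated site at 600, while the two intervals bounded by adjacent methylated sites remain as templates.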
DamID has been performed in a variety of cell types and biological contexts, but
has not been as widely adopted due to the requirement of constructing the Dam fusion
protein and the difficulty of regulating Dam activity (Aughey and Southall, 2016). In an
early application of DamID, Dam was fused with nuclear Lamin B, which helped to identify
lamina associated domains (LADs) that contain silenced genes sequestered at the
nuclear periphery (Kind et al., 2015; Pickersgill et al., 2006; Reddy et al., 2008). It was
observed in these experiments that even low level expression of Dam quickly saturates
methylation of interacting loci and leads to nonspecific methylation (Steensel and Henikoff,
2000). To control Dam expression, studies have used inducible systems such as the FLP
system, driven expression of Dam fusion proteins from the leaky heat shock promoter hsp70,
or integrated a tamoxifen inducible, self splicing intein into the Dam sequence (Greil et al.,
2006; Pindyurin et al., 2016). Additionally, an experiment expressing a non-fused Dam
construct is typically performed in parallel as a nonspecific methylation control sample for
comparison in analyzing DamID results. DamID is a useful technique that provides an
alternative method for further exploration of chromatin structure.
Figure 1.3 DNA Adenine Methyltransferase Identification
Prokaryotic DNA adenine methyltransferase (Dam) fused to a DNA
binding protein can methylate proximally located GATC sites in vivo. The
labeled DNA can be detected by PCR after digestion with DpnI and
DpnII to exclude unlabeled sites.
Adapted from Greil et al. 2006
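One common way to use the non-fused Dam control is to express the signal at each region as a log2 ratio of the Dam fusion sample over the Dam-only sample after scaling each sample to its total. The Python sketch below illustrates that idea under stated assumptions: the counts are invented, the function name is hypothetical, and the pseudocount of 1 is an arbitrary choice to avoid division by zero. It is not presented as the normalization used in any of the cited studies.

```python
import math

def log2_enrichment(fusion_counts, dam_only_counts, pseudocount=1.0):
    """Per-region log2 ratio of Dam fusion signal over Dam-only signal.

    Each sample is scaled by its own total so that differences in overall
    signal between the two samples do not dominate the ratio.
    """
    fusion_total = sum(fusion_counts)
    control_total = sum(dam_only_counts)
    ratios = []
    for fusion, control in zip(fusion_counts, dam_only_counts):
        fusion_norm = (fusion + pseudocount) / fusion_total
        control_norm = (control + pseudocount) / control_total
        ratios.append(math.log2(fusion_norm / control_norm))
    return ratios

# Hypothetical counts for four regions in each sample.
fusion = [7, 31, 3, 59]
dam_only = [7, 7, 15, 71]
enrichment = log2_enrichment(fusion, dam_only)
```

Regions with a positive ratio are methylated more by the fusion protein than by freely diffusing Dam, which is the comparison the parallel control experiment is meant to support.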
1.5 Chromatin Structure and Transcriptional Regulation
As cells are restricted in their biological context and regulate gene expression
programs in response to external and internal stimuli, spatially restricting chromatin
positioning provides an additional layer of regulation that can be highly flexible and
responsive to external cues. FISH chromosome painting experiments investigating
chromosome territories revealed the first evidence that chromatin structure influences gene
expression. Observed first in the nearly spherical lymphocyte, gene-rich chromosomes
with actively transcribed genes are positioned towards the nuclear center, while gene-poor
chromosomes with weakly expressed genes are positioned towards the nuclear
periphery (Cremer and Cremer, 2001, 2010). This general trend was also observed at the
subchromosomal level, where open euchromatin regions containing actively expressed
genes were organized towards the center and closed heterochromatin regions with silent
genes were localized closer to the nuclear envelope (Mateos-Langerak et al., 2007). This
hierarchical ordering of active and inactive chromosome compartments is maintained
through mitosis, though it can undergo reorganization during cell division (Cremer and
Cremer, 2010; Ulianov et al., 2016). This compartmentalization of chromosomes into
separate states related to their overall positioning within the nucleus was the first
evidence of spatial regulatory effects (Figure 1.4).
One level down, topologically associating domains (TADs) were discovered by
characterizing all chromosome contacts in the nucleus using the Hi-C technique (Dixon
et al., 2012; Lieberman-Aiden et al., 2009) (Figure 1.4). TADs are organized in similar
interacting blocks across species and cell types, indicating that the overall chromatin
architecture is mostly stable and self-contained. Actively transcribed genes are
observed at the boundaries of TADs, which are demarcated by CTCF binding sites (Dixon
et al., 2012; Ulianov et al., 2016). These TAD boundaries confine most
active transcription by RNA polymerase II to within TADs, with little occurring between
different TADs, establishing a division between inactive and active compartments (Dixon
et al., 2012; Nora et al., 2012). As TADs have been observed to be stable across
different cell types and species, the general organization of nuclear
architecture was initially thought to be static. However, dynamic
interactions, despite making up a small portion of all interactions, have been observed and
are thought to uniquely define cell function
and identity as a signature of the existing interaction networks.
At the level of the chromatin fiber, division of active and inactive chromatin regions
is mediated by transcription factors and architectural proteins. The zinc finger containing
CCCTC-binding factor (CTCF) plays a large role in the formation of general chromatin loops
and in determining the transcriptional activity of looped loci (Handoko et al., 2011;
Phillips-Cremins et al., 2013; Sofueva et al., 2013). Along with CTCF, cohesin is a multi-subunit
complex that has also been observed mediating chromatin loop formation between
regulatory regions and is thought to function through a loop extrusion model (Barrington
et al., 2017; Kagey et al., 2010; Merkenschlager and Odom, 2013). Cell type specific
transcription factors also play a major role in cell type specific looping by binding
lineage specific enhancers that interact with specific gene promoters (Stadhouders et
al., 2018) (Figure 1.4). Each of these nuclear factors has been shown to segregate
chromatin into active and inactive compartments within the nucleus, where actively
transcribed genes tend to cluster together in active chromatin hubs (ACHs) that are highly
correlated with RNA polymerase II localization (Allahyar et al., 2018; Li et al., 2012; Tsai
et al., 2016; Zhang et al., 2013b). These ACHs, also known as “transcription factories”,
are observed to be stable and to constrain chromatin movement unless perturbed by
external signaling or developmental changes, which lead to greater mobility
of the chromatin (Davidson et al., 2013; Gu et al., 2018). The positioning of
chromatin by DNA binding factors, and the resulting clustering of chromatin into
active transcription compartments, indicates the importance of chromatin structure in
transcription regulation.
Figure 1.4 Compartmentalization of the Nucleus and Chromatin Loops
One level down from chromosome territories, chromatin is divided between active and
inactive compartments (A and B), which can be further divided into topologically associating
domains (TADs) and sub-TADs. At the level of the chromatin strand, chromatin looping can
occur to bring enhancer and promoter elements into proximity or can also silence gene
expression through enhancer-silencer and insulator-insulator looping.
Adapted from Fraser et al. 2015
1.6 Enhancers and Chromatin Looping
Visible in DNA FISH experiments, chromatin loci labeled with specific probes have
been seen converging within the same nuclear space despite separation by several
megabases on the same chromosome or even different chromosomes. This phenomenon,
described as “chromosome kissing”, spurred closer investigation of the loci involved, which
are often revealed to be associations between gene promoters and enhancers.
Enhancers are DNA sequences that contain binding sites for transcription factors
and chromatin architecture proteins and regulate the expression of distally located genes
through looping contacts with the targeted promoters. Studies examining enhancer
activity have revealed several interesting properties: enhancers can regulate multiple
genes simultaneously and create clusters of co-regulated genes in an active chromatin
hub (Beagrie et al., 2017; Palstra et al., 2003; Tolhuis et al., 2002; Tsai et al., 2016), they do
not always regulate their most linearly proximal promoters (Levine et al., 2014), and many
have been observed interacting with promoters located hundreds of kilobases away or
even on other chromosomes (Li et al., 2012; Zhang et al., 2013a). Enhancer looping is
considered to be a non-emergent phenomenon, where active long range chromatin
looping occurs non-randomly and is highly cell context dependent (Levine et al., 2014;
Soutoglou and Misteli, 2007). Enhancer dynamics have largely been characterized by
3C-based assays, which were first used to probe the regulation of the β-globin genes in
erythrocytes. The β-globin gene locus has a well characterized locus control region (LCR),
which was observed to loop to the β-globin gene cluster located 50 kb downstream from
the LCR (Palstra et al., 2003). Additionally, depending on the stage of development, the
LCR switches its site of interaction to bring different globin gene promoters into proximity,
forming what has been termed an active chromatin hub (ACH) (Palstra et al., 2003;
Tolhuis et al., 2002). The LCR has become the standard model for demonstrating
enhancer activity (Figure 1.5).
Recently, a new class of genomic regulatory element have been described as
“super enhancers”. Postulated by bioinformatics analysis clustering regions with active
enhancer histone mark H3K27ac, mediator complex subunit MED1, and multiple cell
specific transcription factor binding sites together, these genomic regions are thought to
be unique in their ability to coordinate lineage specific chromatin looping activity and cell
type specific function (Hnisz et al., 2013; Moorthy et al., 2017; Whyte et al., 2013). These
super enhancers are described to be much larger on average, at 8.7kbs, but are defined
to encompass traditionally identified enhancer regions such as the LCR at the -globin
gene cluster in erythrocytes (Allahyar et al., 2018; Moorthy et al., 2017). It is thought that
each cell type activates unique sets of super enhancers to establish a specific
transcriptional network, but many super enhancers remain to be functionally
characterized. Deletion experiments are needed to help characterize these super
enhancers and whether they have different effects compared to traditionally defined
enhancers. The super enhancer definition may be useful in identifying the most important
enhancer regions, which may help in identifying the set of regulated genes that have the
largest effect in determining cell identity.
Figure 1.5 Model of the β-globin gene locus
The archetypal enhancer, locus control
region (LCR), mediates chromatin looping
to various globin genes at specific
developmental stages (switching between
βmaj and βmin) and silences other genes
(βh1 and εy), forming an active chromatin
hub (ACH) highlighted in red.
Adapted from Tolhuis et al. 2002
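The idea of grouping clustered enhancer signal into larger candidate domains can be sketched in a few lines of Python. This is a toy illustration and not the published procedure: the 12.5kb stitching distance, the tuple layout, and the function names are assumptions chosen for demonstration, and real analyses also normalize signal and apply a rank-based cutoff.

```python
def stitch_enhancers(peaks, max_gap=12_500):
    """Merge enhancer peaks on one chromosome that lie within `max_gap` bp.

    `peaks` is a list of (start, end, signal) tuples, e.g. H3K27ac signal.
    Returns stitched (start, end, total_signal) regions. The default gap
    is an illustrative assumption, not a value taken from the text.
    """
    stitched = []
    for start, end, signal in sorted(peaks):
        if stitched and start - stitched[-1][1] <= max_gap:
            # Close enough to the previous region: extend it and add the signal.
            prev_start, prev_end, prev_signal = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def rank_by_signal(regions):
    """Order stitched regions from highest to lowest total signal."""
    return sorted(regions, key=lambda region: region[2], reverse=True)
```

Regions at the top of the ranking would be the candidates for a "super enhancer" label, which is why the definition tends to capture large, heavily marked loci.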
1.7 Long Range Interchromosome Interactions
Many studies performed to examine chromatin conformation have focused on cis-
regulatory activity between enhancers and promoters, often restricting the window to
focus on the local chromatin neighborhood. In the first genomic locus to be characterized,
the β-globin locus, chromatin loops were observed between the locus control region (LCR) and
interacting globin gene promoters located only 40-60kb away (Palstra et al., 2003; Tolhuis et
al., 2002). Even in Hi-C studies that aim to capture all chromatin contacts in the nucleus
in an unbiased manner, interaction signals detected at distally located loci are typically
not analyzed and considered noise as they occur outside of defined TADs and have much
lower signal intensity than intra-TAD signals (Lieberman-Aiden et al., 2009; Rao et al.,
2014). As a result, a majority of studies into chromosome structure do not extend beyond
the same chromosome, as many rely on 3C-based assays to profile chromatin contacts
in the nucleus. However, inherent biases of the 3C assay predispose it toward detection of
proximal interaction signals within the fixed “chromatin cages” formed during
formaldehyde crosslinking (Gavrilov et al., 2013b, 2013a). Additionally, disagreement
between 3C and FISH assays over signals that indicate interchromosome interactions
has brought into question results obtained from 3C-based assays (Giorgetti and Heard,
2016; Maass et al., 2018).
Examples from various chromatin profiling techniques lend support to
interchromosome interactions occurring in vivo. FISH studies have often observed
intermingling of chromatin at borders of chromosome territories (Cremer and Cremer,
2010). In conjunction with foci of RNA polymerase II aggregates called “transcription
factories”, intermingling of chromatin loci is often seen at the borders between
chromosome territories, suggesting that genes located on different chromosomes are
colocalized, coregulated, and co-transcribed in these locales (Davidson et al., 2013).
Long range interactions have been observed to occur in various biological systems. The
first validated example was observed in CD4 T helper cells during differentiation into TH1
or TH2 lineages. In naïve CD4 T cells, the TH1 gene interferon gamma (Ifng), located on
chromosome 10, and TH2 genes interleukin 4, 5 and 13, located on chromosome 11, are
clustered prior to differentiation. Once lineage specification occurs by cytokine signaling,
the interchromosome interaction is abolished and the appropriate genes of the specified
lineage can be immediately expressed (Spilianakis et al., 2005). Other associations in
trans with the TH2 interleukin genes have been observed with tumor necrosis factor alpha,
located on chromosome 17, and interleukin 17, located on chromosome 1 (Williams et al.,
2010). Interchromosome interactions have also been observed between the Igf2/H19
imprinting control region (ICR) on the maternal chromosome 7 and an intergenic region
between two genes, Wsb1 and Nf1, located on paternal chromosome 11 (Ling et al., 2006).
CTCF was found to mediate the looping interaction and the transcriptional activation
of Wsb1 and Nf1 by forming an active chromatin hub. Knockdown of CTCF or mutation at
the maternal ICR locus abrogated the interchromosome interaction with the paternal
genes, as well as at other imprinted genes located at distal sites on other chromosomes
(Zhao et al., 2006). These examples show that interchromosome interactions have
biological significance and can be found in various contexts.
1.8 Site Targeted Genome Engineering
Site specific targeting techniques for use in mammalian cells have been under
rapid development in recent decades. A wide variety of approaches to directly edit DNA
have emerged. Homing meganucleases, the most commonly known and utilized being
the LAGLIDADG endonucleases, are derived from archaeal and mitochondrial genomes
(Stoddard, 2011). These endonucleases exist as homodimers with a DNA recognizing
domain and a cleavage domain connected by a variable peptide linker. However, many
homing meganucleases were found to have high off-target rates in vitro, which limited
their use for in vivo studies (Scalley-Kim et al., 2007). Designer zinc finger nucleases
(ZFNs) were developed from naturally occurring zinc finger proteins that bind specific
DNA sequences through the major groove of the DNA double helix (Kim et al., 1996). By fusing
a nonspecific nuclease domain derived from the FokI restriction enzyme to a designed
sequence of zinc finger units, targeted modification of specific DNA sequences can be
achieved (Miller et al., 2007). Similar to ZFNs, transcription activator-like effector
nucleases (TALENs) were derived from Xanthomonas plant pathogens, consisting of a
nuclear localization signal and a DNA binding domain with tandem repeated arrays of 34
amino acids that bind specifically to nucleotide bases based on the repeat variable di-
residues at the end of each array (Christian et al., 2010; Cong et al., 2012). The TALE
protein sequences are fused to a FokI endonuclease, where dimerization of two TALENs
flanking a specific site causes a double strand break (DSB) to occur. TALENs have been
shown to be highly effective for genetic editing in vivo (Bedell et al., 2012; Lei et al., 2012;
Mashimo et al., 2013). Although these protein based technologies are highly adaptable
for targeting any site in the genome by their modular nature, they can be cost prohibitive
when needed for complex manipulations of the genome and may exhibit reduced
specificity due to SNPs at the locus of interest or other nucleotide substitutions.
Recently, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)
systems from bacteria and archaea have been adapted for DNA editing in mammalian cells.
The Type II CRISPR system acts as an adaptive immune defense system against
invading viruses and bacteriophages: Cas proteins first fragment foreign DNA into
22-24bp fragments and incorporate these spacer sequences into the CRISPR locus
array, interspacing them between a series of conserved direct repeat sequences
(Horvath and Barrangou, 2010). These arrays are transcribed to produce precursor
CRISPR RNA (pre-crRNA), which base pairs with a complementary trans-
activating crRNA (tracrRNA) to form an RNA duplex structure that is recognizable to the
Cas9 endonuclease protein. The active crRNA-tracrRNA-Cas9 protein complex is able to
hybridize with complementary DNA sequences that are flanked by the requisite protospacer-
adjacent motif (PAM) of 5’-NGG-3’. Through modification of the Type II CRISPR system
in Streptococcus pyogenes, combining the crRNA and tracrRNA species into a chimeric
single guide RNA sequence (sgRNA) that can be easily programmed, almost any locus in
the chromatin can serve as a template for site specific targeting and modification (Cho et
al., 2013; Cong et al., 2013; Hwang et al., 2013; Jiang et al., 2013; Jinek et al., 2012; Mali
et al., 2013) (Figure 1.6). Type II CRISPR systems have been adapted from other species
of bacteria, including Neisseria meningitidis, which utilizes a 5’-NNNNGATT-3’ PAM
sequence, and Streptococcus thermophilus, which utilizes a 5’-NGGNG-3’ PAM (Hou et al.,
2013). Two different CRISPR systems from different bacteria have been used together
in experiments requiring visualization of different genomic loci and have been shown
to not have cross reactivity (Chen et al., 2013; Fu et al., 2016). Despite some concerns
about off-target effects from the short 22-24bp sgRNA, CRISPR has largely proven to be
highly specific for targeting loci (Cho et al., 2014; Fu et al., 2013).
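To make the PAM requirement concrete, the following minimal Python sketch scans a DNA sequence for candidate target sites lying immediately 5’ of each of the three PAMs mentioned above. The function names and the 20-base protospacer default are illustrative assumptions rather than part of any published tool, and for brevity only the strand given as input is searched.

```python
import re

# IUPAC codes needed for the PAMs discussed in the text (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# PAM sequences as quoted in the text, written 5'->3'.
PAMS = {
    "S. pyogenes": "NGG",
    "N. meningitidis": "NNNNGATT",
    "S. thermophilus": "NGGNG",
}


def pam_to_regex(pam):
    """Translate a PAM containing 'N' wildcards into a regular expression."""
    return "".join(IUPAC[base] for base in pam)


def find_target_sites(seq, pam, protospacer_len=20):
    """Return (start, protospacer, pam_match) for each site on the given strand.

    A site is a protospacer of `protospacer_len` bases immediately followed
    on its 3' side by a match to `pam`. `start` is the 0-based protospacer
    position. The protospacer length is an illustrative assumption.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(
        "(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_to_regex(pam))
    )
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

Calling `find_target_sites(sequence, PAMS["S. pyogenes"])` lists every position where a guide could in principle be placed; a longer PAM such as 5’-NNNNGATT-3’ matches far fewer positions, which is one reason orthogonal systems can be targeted independently.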
To manipulate the targeted DNA sequence, the native Type II CRISPR system
utilizes the Cas9 protein, which has been further developed for a variety of applications.
Wild type Cas9 has been shown in vitro to induce DSBs at targeted loci. This leads to
repair of the genetic locus by one of two different DNA repair pathways. Non-homologous end
joining (NHEJ) is the preferred pathway for DSB repair and involves exonucleases, polymerases, and ligases
(Lieber, 2011).
Figure 1.6 CRISPR-Cas9 Genome Editing
The Cas9 protein complexes with a small RNA targeted to a specific site in the genome
containing a protospacer adjacent motif (PAM). Double strand cleavage occurs which
can be repaired via error prone non-homologous end joining (NHEJ) or precise homology
directed repair (HDR).
Adapted from Addgene CRISPR 101 Handbook
NHEJ is error prone and leads to loss of nucleotides at the ends of each
strand, allowing for deletion of targeted genes or frameshift mutations that may disable
normal protein function (Figure 1.6). This approach has been applied for many phenotype
screening experiments to identify genome wide functional genes involved in a variety of
biological contexts ranging from cancer progression and suppression (Chen et al., 2015;
Tzelepis et al., 2016), parasite fitness (Sidik et al., 2016), viral infectivity (Park et al., 2017),
to RNA splicing (Fei et al., 2017). These screens have been found to be more successful at
identifying relevant gene targets than chemical screens and RNAi screens (Dhanjal et al.,
2017). For more precise genetic editing, the Cas9 protein has been modified into a
“nickase” enzyme through the D10A mutation so that it causes only a single strand break along
the targeted DNA strand (Cong et al., 2013; Jinek et al., 2012; Ran et al., 2013). Using
two sgRNAs to direct the nickase Cas9 to sites at a single locus separated by less than 50bp
also allows for a staggered double strand break to occur and encourages activation of
the homologous recombination pathway over NHEJ for DNA repair. Homology directed
repair (HDR) requires a repair template to be provided but can more precisely target the
locus for insertion and preserves the native sequence (Figure 1.6). A variety of
approaches have been developed to increase HDR over NHEJ to optimize incorporation
of exogenous sequences (Chu et al., 2015; Pawluk et al., 2016; Robert et al., 2015; Yu
et al., 2015). Addition of small molecule drugs to inhibit components of the NHEJ pathway such as
DNA-PK has been shown to increase HDR utilization (Robert et al., 2015; Yu et al., 2015).
HDR insertion has been found to work in vitro and in vivo, generating viable animals
containing the mutation of interest (Wang et al., 2013).
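The paired nickase strategy described above reduces to a simple geometric check: the two nicks must fall on opposite strands and lie close together. The sketch below illustrates that check, assuming each guide is summarized by a hypothetical `Guide` record holding a nick position and the strand that is cut; the 50bp limit comes from the text, while the record layout and function names are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Guide:
    """Hypothetical summary of one sgRNA: where its nickase would cut."""
    name: str
    nick_pos: int  # 0-based coordinate of the nick on the reference
    strand: str    # "+" or "-": the strand that the nickase cuts


def is_valid_pair(a, b, max_offset=50):
    """True if two guides nick opposite strands less than `max_offset` bp apart."""
    opposite = {a.strand, b.strand} == {"+", "-"}
    return opposite and abs(a.nick_pos - b.nick_pos) < max_offset


def candidate_pairs(guides, max_offset=50):
    """List every guide pair that could yield a staggered double strand break."""
    return [
        (a.name, b.name)
        for a, b in combinations(guides, 2)
        if is_valid_pair(a, b, max_offset)
    ]
```

Because each guide on its own only nicks one strand, an off-target site recognized by a single guide is repaired without a double strand break, which is the rationale for requiring a valid pair.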
CRISPR/Cas9 has also been adapted for other applications not directly
manipulating the DNA sequence. Cas9 can be modified to be catalytically dead (dCas9)
through the mutations D10A and H841A within the RuvC1 and HNH nuclease domains
(Bikard et al., 2013; Qi et al., 2013). This mutant dCas9 can still recognize sgRNA
and be targeted to loci of interest. The cleavage deficient dCas9 can be fused with a
variety of effector domains. By fusing dCas9 to the omega subunit of RNA polymerase,
expression of targeted genes can be activated (Bikard et al., 2013). Fusion of the VP64
transactivation domain to dCas9 allows strong endogenous activation of target genes
that can even initiate somatic cell reprogramming into induced pluripotent stem cells (Liu
et al., 2018). It was also observed that the targeted dCas9 alone causes some level of
expression inhibition, which may be due to steric hindrance from the physical occupancy
of the protein to promoter sequences (Bikard et al., 2013). Fusion with a repressive KRAB
domain could improve gene expression silencing by modifying the epigenetic
environment, termed CRISPR interference, and can be used in a high throughput screen
for regulatory elements (Fulco et al., 2016; Gilbert et al., 2013; Qi et al., 2013). An even
more effective gene silencing technique was developed with a dCas9-LSD1 fusion
construct, as LSD1 is a known histone demethylase (Kearns et al., 2015). In other
applications, dCas9 fused with an engineered peroxidase APEX2 allows for tagging of
local protein effectors with biotin moieties in vivo, allowing for enrichment prior to mass
spectrometry (Myers et al., 2018). dCas9 has also been fused with GFP and other
fluorophores for visualizing repetitive and non-repetitive genomic loci by tiling sgRNAs at
the locus of interest (Chen et al., 2013; Fu et al., 2016; Ma et al., 2015; Maass et al.,
2018a). More interestingly, dCas9 fused to LDB1, which forms homodimers, can be used
to induce chromatin looping (Hao et al., 2017). The CRISPR/Cas9 system has become a
toolbox for exploring biological questions through targeted, site specific genetic
engineering.
1.9 Embryonic Stem Cells and Pluripotency
Embryonic stem cells (ESCs) are a transient cell type appearing in the inner cell
mass (ICM) of the pre-implantation blastocyst at development day E3.5. Surrounded by
trophectoderm cells that make up the outer layer cells of the blastocyst, the ICM
contributes to the development of the entire fetus and can differentiate into all three germ
layers of the body, a property known as pluripotency. Mouse ESCs were first
isolated by Evans and Kaufman (Evans and Kaufman, 1981). Grown in vitro
with fetal bovine serum containing growth factors bone morphogenetic protein 4 (BMP4)
and leukemia inhibitory factor (LIF) (Martin, 1981; Ying and Smith, 2003), murine ESCs
can self-renew indefinitely and are able to contribute to other pre-implantation ICM to form
chimeras. These two properties have made ESCs the ideal tool for biological research by
providing an inexhaustible source of identical cells for genetic engineering and allowing
for generation of fully mutant animals through germline transmission of modified ESCs.
ESCs have been further characterized and distinguished from other cell types of
early development in mice. ESCs stain positively for alkaline phosphatase (AP) and
express surface marker SSEA-1. Murine ESCs highly express the genes Oct3/4, Sox2,
and Nanog, which are considered to be definitive biomarkers of the pluripotent state.
ESCs also do not show X-inactivation as seen in somatic cells. ESCs are able to
demonstrate pluripotency by their ability to form embryoid bodies when grown in
suspension without LIF, and form teratomas containing all three germ layers when
implanted into mice. These attributes distinguish ESCs from trophectoderm cells and
extraembryonic cells also present during early embryogenesis and have become
hallmarks for what has been termed “naïve” state pluripotency.
Work on the in vitro derivation of mouse ESCs helped to derive human ESCs
(Thomson et al., 1998), though several differences were found to exist between the
species. Human ESCs as initially derived could only be grown with growth factors
ACTIVIN A and FGF2. Additionally, hESCs express other cell surface markers SSEA-3,
SSEA-4, TRA-1-60, and TRA-1-81. These differences from murine ESCs were the first
indications that alternate states of pluripotency may exist with different signaling pathways
involved.
Discovery of alternative ESC supporting culture conditions started to reveal the
complex interplay between signaling pathways regulating the pluripotency state. The ‘2i’
culture condition using small molecule inhibitors PD0325901 and CHIR99021, which
inhibit the Mek/Erk pathway and GSK3 (an antagonist of canonical Wnt signaling), respectively, was
found to support murine ESC self renewal and maintain pluripotency independent of
growth factors (Ying et al., 2008). This was termed “ground state” pluripotency, as ESCs
in 2i phenotypically resembled the in vivo state of the epiblast cells in the mature
blastocyst more closely than ESCs grown in traditional LIF/serum conditions, which have
been termed “naïve” state (Hackett and Azim Surani, 2014). Ground state ESCs have
also been shown to exhibit global hypomethylation and retain fewer 5-methylcytosine
marks in contrast to naïve state ESCs (Ficz et al., 2013; Habibi et al., 2013).
Figure 1.7 Derivation of Embryonic Stem Cells
Embryonic Stem Cells (ESCs) are derived from the
inner cell mass of a developing blastocyst. ESCs can
be isolated and grown in vitro and directed to
differentiate into specific cell types.
Post-implantation epiblast-derived stem cells (EpiSCs) are also pluripotent, but are unable to
contribute to chimera formation when grafted into preimplantation blastocysts (Brons et
al., 2007; Joo et al., 2014; Tesar et al., 2007). They have been shown to represent a later
time point of development where cells have become partially lineage restricted, termed
“primed” pluripotency. EpiSCs require different growth conditions and rely on different
growth factors for propagation in vitro, ACTIVIN A and FGF2. Despite maintaining
expression of Oct4, EpiSCs lack expression of Rex1 (Guo et al., 2009; Tesar et al., 2007).
EpiSCs can be transitioned into ESCs through overexpression of Klf4 via a transitional
cell type, termed epiblast-like cells (EpiLCs) (Guo et al., 2009; Hayashi et al., 2011) and
ESCs can be differentiated into EpiSCs by changing culture media conditions and adding
the growth factor FGF2 and the small molecules IWR-1 and CHIR99021 (Kim et al., 2013).
As murine EpiSCs share similar culture conditions with initially derived hESCs, it was found
that ground state hESCs could be derived by changing culture media and adding
small molecules such as HDAC inhibitors and forskolin (Gafni et al., 2013; Hanna et al., 2010;
Ware et al., 2014; Warrier et al., 2017). These different states of pluripotency pointed to
common signaling pathways utilized to express a “core” transcriptional network of genes
by pluripotent cells, but additional pathways and genes are activated or silenced to retain
the type of pluripotent cell supported by the external environment.
1.10 Regulation of Pluripotency By Extrinsic and Intrinsic Factors
ESCs are an inherently unstable cell type in vivo as they exist in a very narrow
window of development in the embryo. However, as they have been successfully derived
in vitro and propagated as cell lines without losing their ability to differentiate into any
somatic cell type, ESCs demonstrate that a stable pluripotency gene network can be
activated to maintain ESC pluripotency and self renewal. ESCs also provide a convenient
system for understanding the priming of differentiation potential and the basis for cell
immortality. A variety of factors and regulatory pathways divided into extrinsic and intrinsic
signals have been identified that converge to regulate “stemness” in ESCs.
ESCs grown in vitro require extrinsic growth factors to remain pluripotent and self
renewing. In culture, ESCs secrete FGF4 into the media in an autocrine manner,
which promotes pluripotency exit through MEK/ERK pathway activation. This intrinsic
prodifferentiation signal is a result of OCT4 and SOX2 activity and occurs naturally in vivo
to continue embryonic development (Feldman et al., 1995). Thus, to maintain ESCs in
culture, inhibition of differentiation signaling pathways such as MEK/ERK and FGF is
necessary. In conventional ESC culture media, LIF is an essential
supplement and acts through binding gp130/LIFR at the cell surface. LIFR
phosphorylates JAK kinases, which in turn activate STAT3. As a transcription factor,
STAT3 translocates to the nucleus to activate the pluripotency related genes Klf4, Socs3,
Gbx2, and Tfcp2l1, the last of which has been shown to be sufficient to maintain ESC self renewal
when force-expressed (Niwa et al., 2009; Tai and Ying, 2013; Ye et al., 2013). However, LIF also
activates the PI3K/AKT and MEK/ERK pathways, an effect thought to be counteracted
through cross signaling by other pathways activated by fetal calf serum, another essential
component in conventional ESC culture media. BMP4 is the major factor in serum that is
required to maintain ESCs and acts by binding to its receptors to phosphorylate SMAD1,
which activates genes of the inhibitor of differentiation (Id) family (Chen et al., 2008).
Downstream of extrinsic signals, the pluripotency gene network is regulated by a
host of transcription factors that activate or silence genes to maintain ESC stemness.
Studies on the pluripotent state in ESCs have centered on a trio of “core factors”
consisting of OCT4, SOX2, and NANOG. OCT4 is encoded by the Pou5f1 gene and is
known as the master regulator of ESC pluripotency and self renewal. Oct4 expression
is indispensable for the maintenance of self renewing ESCs, and Oct4 deficient embryos
fail to develop an ICM and die at the blastocyst stage (Nichols et al., 1998). OCT4
levels are tightly regulated within a narrow range, as insufficiency initiates stem cell
dedifferentiation into the trophectoderm lineage and a greater than two fold increase in expression
predisposes differentiation into primitive endoderm and mesoderm (Niwa et al., 2000).
Interestingly, ESCs heterozygous for an Oct4 knock out show stabilized expression of
pluripotency related genes, reinforcing the observation that wild type ESCs are primed
for differentiation and are unstable whereas intermediate levels of OCT4 help support the
transient in vivo pluripotent state (Karwacki-Neisius et al., 2013). Sox2, an SRY-related
HMG-box transcription factor, is also widely expressed in the ICM and additionally in
the extraembryonic ectoderm. It is known to bind with OCT4 as a heterodimer to activate
expression of Oct4 itself. ChIP-seq experiments have shown that these three factors
share similar binding sites at cis-acting regulatory elements, which can act to coregulate
target gene transcription by serving as a scaffold for recruiting coactivators for
transcription (Chen et al., 2008). Sox2 null embryos are also embryonic lethal and ESCs
with reduced SOX2 lose pluripotency and ability to differentiate into ectodermal neural
lineages. Lastly, Nanog is a homeobox transcription factor that is essential in
maintaining the pluripotency gene network, but is not critical in establishing pluripotency.
Overexpression of Nanog is able to maintain ESCs without LIF (Niwa, 2007), and SOX2 and
OCT4 heterodimers have been shown to activate Nanog expression as a complex (Rodda
et al., 2005). All three core transcription factors bind at their own promoters to
autoregulate their expression (Chew et al., 2006; You et al., 2011; Young, 2011). These
findings in various studies indicated that a self reinforcing network of genes can drive cells to
a stable pluripotent state, as evidenced in later studies on induced reprogramming.
OCT4 binds with many interacting protein partners, including SOX2 and NANOG
(van den Berg et al., 2010; Pardo et al., 2010). These complexes target other important
pluripotency related genes in the cell and largely retain heterogeneous expression in ESCs,
suggesting these may not be essential factors (Loh et al., 2006). However, both active
and inactive genes are bound by the trio of factors, the latter consisting of development
related genes, which indicates a dual role of activation and repression by the complex. The
genes targeted by OCT4, SOX2, and NANOG include transcription factors Klf4, Sall4,
Esrrb, Dax2, Tcfcp2l1 (van den Berg et al., 2010), chromatin modifying enzymes
Smarcad1, Myst3, and Set proteins, and components of signal transduction pathways for
Wnt, Dkk1 and Frat2, and TGF-β, Tdgf1 and Lefty2 (Chen et al., 2008; Loh et al., 2006).
Additionally, these three key regulatory factors also bind with known repressive epigenetic
complexes: the NuRD histone deacetylase complex, Polycomb group proteins, and the
SWI/SNF chromatin remodeling complex (Wang et al., 2006). These binding partners
help silence genes associated with differentiation and reinforce genes participating in the
pluripotency gene network.
1.11 Induced Pluripotent Stem Cells
Discovery of induced pluripotency by the Yamanaka group in 2006 changed the
notion of a terminal cell identity. Through screening overexpression of 24 candidate
factors, four transcription factors, Oct4, Sox2, Klf4, and c-Myc (OSKM), were found to be
able to induce expression of genes within the pluripotency gene network and revert
normal somatic cells to a state of pluripotency (Takahashi et al., 2007; Yamanaka, 2009)
(Figure 1.8). These cells showed properties similar to ESCs: they expressed the core
pluripotency genes Oct4, Sox2, and Nanog, formed teratomas, differentiated into tissues
of all three germ layers, and could self renew in ESC culture media containing LIF and
BMP4. However, these induced pluripotent stem cells (iPSCs) could not fully contribute
to chimera formation, indicating that these cells may not be as fully pluripotent as ESCs
(Mikkelsen et al., 2008). In spite of the incomplete process, somatic cell reprogramming
has been demonstrated successfully in a variety of cell types and organisms
(Hochedlinger and Jaenisch, 2015). Interestingly, reprogramming has also been achieved
with different sets of genes, substituting Klf4 and c-Myc with Nanog and Lin28 (Yu et al.,
2009), or Oct4 with nuclear orphan receptor Nr5a2 (Heng et al., 2010). These findings
prompted a closer look at the mechanism of transcription factor induced reprogramming.
Figure 1.8 Somatic Cell Reprogramming By Induced Pluripotency
Overexpression of transgenes Oct4, Sox2, Klf4, and c-Myc initiates transformation of
somatic cells like fibroblasts into induced pluripotent stem cells (iPSCs). During the process,
somatic genes are silenced while pluripotency genes are reactivated. Reprogramming is not
stable until endogenous pluripotency genes are active.
Adapted from Hochedlinger and Jaenisch 2015.
Initially, the efficiency of reprogramming mouse embryonic
fibroblasts (MEFs) by retroviral transduction was estimated to be one tenth of 1% of cells
incorporating all four reprogramming factors (Okita et al., 2010; Takahashi et al., 2007).
This low level of efficiency was thought to be due to imprecise control over the transduction
and expression level of each factor; Oct4 expression in ESCs, for example, must stay within a narrow
range for maintenance of pluripotency (Niwa et al., 2000). Improvements to the original
protocol in other studies have tried to increase reprogramming efficiency by transfection
of other reprogramming factors (Buganim et al., 2014; Theunissen and Jaenisch, 2014;
Wang et al., 2011), use of non-integrating episomal vectors, transposons, and mRNA
delivery modalities (Kaji et al., 2009; Kim et al., 2009b; Yu et al., 2009; Yusa et al., 2009),
and use of small molecule drugs such as valproic acid and ascorbic acid (Esteban and
Pei, 2012; Esteban et al., 2010; Huangfu et al., 2008; Shi et al., 2010). As these protocols
have shown, multiple barriers to somatic cell reprogramming exist, and overcoming them
required understanding of the process at the molecular level.
Molecular characterization of the reprogramming process has split the cascade
of events into two distinct stages, an early and late stage of reprogramming, both centered
on activity of OSKM. In the largely stochastic early stage, “elite” cells predisposed for
reprogramming that sufficiently receive exogenous expression of OSKM see OCT4,
SOX2, and KLF4 binding at native enhancer sites within closed chromatin that are
typically sequestered and inaccessible in somatic cells (Soufi et al., 2012; Zaret and
Carroll, 2011). This pioneering activity allows C-MYC to bind at OSK bound sites and
induce transcription of pluripotency related genes and reentry into the cell cycle (Soufi
and Dalton, 2016). Establishing accessibility of the endogenous
OSK loci is such an essential step that artificial binding of guided, activating Cas9-VP64 at these promoters could
replicate reprogramming (Liu et al., 2018). As those initially bound enhancers are
directed to bind to their target promoters in the later stage of reprogramming, alkaline
phosphatase expression is upregulated, followed by SSEA-1 expression, and OSK
binding activity becomes centered on promoter regions of pluripotency related genes.
When the endogenous pluripotency circuitry consisting of Oct4, Sox2, and Nanog
becomes activated and independent of ectopic factor expression, the reprogramming
process is considered complete (Brambrink et al., 2008; Sridharan et al., 2009). All of
these dynamic changes alter the epigenetic landscape within the reprogrammed cells,
where the chromatin gradually becomes globally hypomethylated and histone
modifications are remodeled (Doege et al., 2012; Papp and Plath, 2013). Specifically,
OCT4 binding at its own enhancer maintains a nucleosome depleted region (NDR) to
establish accessibility of transcription factors to the locus (You et al., 2011). Together, the steps
of occluded enhancer binding, pluripotency gene expression, and global epigenetic
resetting point to a mostly stochastic process of reprogramming with these significant
barriers to the reestablishment of pluripotency.
Reprogrammed cells have been shown to retain an epigenetic memory from
the source cells, where residual DNA methylation and histone marks can carry over and
alter subsequent iPSC fates (Kim et al., 2010; Mikkelsen et al., 2008). This “epigenetic
inertia” has largely been attributed to DNA methylation at pluripotency
gene promoters. Despite the pioneering activity of OSKM, activation of essential
pluripotency genes is still a mostly stochastic process (Sridharan et al., 2009). A more
“deterministic” reprogramming process can be achieved through silencing of Mbd3,
a NuRD repressor complex component, which prevents methylation at pluripotency gene
promoters and allows a nearly 100% reprogramming rate (Rais et al.,
2013). Silencing of methyltransferase proteins like DNMT1 has also been shown to
increase efficiency (Mikkelsen et al., 2008). Additionally, it was observed that progenitor
and other multipotent stem cells were more amenable to reprogramming, indicating that
uncommitted cells, which generally retain fewer epigenetic barriers such as epigenetic
memory, are less impeded from full reprogramming (Kim et al., 2009a, 2009b). It can be
hypothesized that other epigenetic features such as spatial configuration and DNA
accessibility may also influence the reprogramming process to establish induced
pluripotency.
1.12 Pluripotency and Chromatin Interactions
iPSC induction efficiency has been shown to be impacted by chromatin dynamics
and other epigenetic mechanisms. As the process requires reactivation of silenced
pluripotency genes that are often sequestered in heterochromatin, chromatin remodeling
is necessary for reestablishing the pluripotency gene network. Chromatin modifying
enzymes such as the SWI-SNF complex are recruited by OCT4 to heterochromatin sites
to induce gene transcription (Singhal et al., 2010). Chd1, a chromatin remodeling factor,
has been shown to actively open up the local chromatin environment it is targeted to and
influences reprogramming efficiency (Gaspar-Maia et al., 2009). Repression of
the NuRD complex component Mbd3 also helps to open up
heterochromatin and aids in reprogramming efficiency (Rais et al., 2013). The increasing
accessibility of the chromatin sees pluripotency associated genes become released from
the nuclear lamina and clustered at the nuclear center, in line with reactivation of genes
seen in other studies (Peric-Hupkes et al., 2010).
Orchestration of chromatin looping is also directed by the reprogramming factors,
in particular by Oct4, Klf4, and Nanog. Restoration of genome wide chromatin contacts
at the Oct4 and Nanog promoters has been shown to be mediated by binding of these
reprogramming factors in association with cohesin and mediator complexes (Apostolou
et al., 2013; Wei et al., 2013; Zhang et al., 2013). Stepwise engagement of pluripotency
related enhancers such as the Oct4 distal enhancer and occupancy of non-pluripotent
enhancer regions by the reprogramming factors alter the local chromatin structure and
set up specific chromatin loops to activate or silence enhancer target genes (Chronis et
al., 2017). Importantly, the pluripotency specific chromatin loops are reestablished prior
to reactivation of endogenous Oct4 transcription at early stages of reprogramming and
Nanog transcription during later stages of reprogramming (Levasseur et al., 2008; Wei et
al., 2013). Depletion of any of the reprogramming factors prevents completion and
progression of the reprogramming process. Profiling chromatin looping activity by the
reprogramming factors continues to be an active area of research for understanding stem
cell pluripotency.
Chapter 2: Improved Detection of Long Range Chromatin Interactions With
CRISPR-Dam
2.1 Abstract
4C-seq, previously employed to identify long range interacting targets of the Oct4
distal enhancer (DE), was largely unable to find functional candidate genes. To address
the challenges and limitations inherent to chromosome conformation capture assays, we
developed a technique based on the methylation labeling activity of the prokaryote-specific
DNA adenine methyltransferase (Dam) and the site specific targeting capability of
CRISPR-Cas9. A Cas9-Dam fusion construct was cloned and validated for expression in mouse
E14 ESCs. To confirm labeling specificity and functionality of the Cas9-Dam construct,
sgRNAs were designed targeting the Oct4 DE and naïve state ESC chromatin was
assayed for known interaction with the Pou5f1 gene promoter region after incremental
induction times. Appropriate interaction was detected and an optimal interval for labeling
was chosen based on the results. CRISPR-Dam samples were then prepared for
sequencing to detect genome wide interactions of the Oct4 DE. We were able to identify
more high confidence candidates labeled by Cas9-Dam than in 4C-seq and found the
CRISPR-Dam assay showed comparatively less bias for physically proximal loci, making
it more suitable for identifying long range interactions.
2.2 Introduction
Many experimental techniques for studying the spatial orientation of the chromatin
exist, but chromosome conformation capture assays are the most often used. However,
certain intrinsic limitations of the assay have been found to affect the performance of the
protocol. 3C based assays are predicated on the principle of biochemically fixing the
spatial positioning of the chromatin so that interacting loci can be detected through
measuring the frequency of ligated bait-target junction fragments (Dekker et al., 2013).
However, formaldehyde fixation of the chromatin can lead to interaction artifacts that may
not represent truly interacting loci (Brant et al., 2016; Nagano et al., 2015; Schmiedeberg
et al., 2009; Simonis et al., 2007). Examining the biochemical effect of formaldehyde
fixation has shown that 85% of fixed chromatin remains insoluble even after treatment
with restriction enzyme and SDS, leaving “cages” of chromatin that may interfere with
proximity ligation and decrease available ligation fragments (Gavrilov et al., 2013a). As a
result, when measuring actual ligation frequencies between chromatin fragments in the
well known beta-globin gene and locus control regions, it was shown that less than 1% of
total ligation products are generated by the traditional 3C protocol (Gavrilov et al., 2013b).
This can obscure rare and transient or long distance interactions and create bias to over
represent the most proximally located fragments to the bait viewpoint (O’Sullivan et al.,
2013; Schmiedeberg et al., 2009). Additionally, other 3C derived techniques that rely on
immunoprecipitation such as ChIA-PET and Hi-ChIP perform their enrichment step
using only the soluble chromatin fraction, which may exclude the majority of interactions
contained in the fixed chromatin cages (Gavrilov et al., 2013a). Such issues in 3C
generated data have resulted in discrepancies when validating interactions using other
techniques such as DNA fluorescence in situ hybridization (Giorgetti and Heard, 2016;
Schmiedeberg et al., 2009; Williamson et al., 2017). Stemming from these limitations,
very few 3C studies have investigated long range interactions greater
than a few megabases from a bait due to the high degree of difficulty in reliably capturing
these interactions.
Turning to other available methods of interrogating chromatin interactions, DamID
potentially overcomes some of these challenges. First, Dam labeling occurs in situ within
the living cell, bypassing issues related to fixation as the labeled chromatin can be directly
assayed for interaction, unlike 3C which relies on the restriction and ligation efficiencies
of utilized enzymes. The proximity ligation step, in particular, is a critical juncture in the
3C protocol that works under the assumption of sufficiently dilute conditions to prevent
inter-complex ligations. As these ex vivo manipulations may not be performed in the most
optimal experimental conditions, some rare or transient interactions may be excluded as
they are unable to generate the necessary chimeric ligation products for detection. DamID
circumvents these limitations and can provide a more accurate snapshot of interactions
that is less dependent on external enzymatic reactions (Greil et al., 2006). Additionally,
instead of proximity ligation, “proximity labeling” in live cells avoids generation of
chromatin cages and allows labeling to occur without bias of physical distance, more
faithfully representing native looping of the chromatin. DamID can create interaction maps
comparable to those generated by 3C libraries, as both can be prepared with 4-bp cutting
restriction enzymes, with DpnI and DpnII required in DamID (Cléard et al., 2006).
These advantages present DamID as a possible alternative technique to address the
challenges found in 3C.
The challenge of using DamID for profiling chromatin interaction is that most DNA
binding proteins have multiple binding sites throughout the genome, making it difficult to
specifically link individually labeled sites to each other. Previous studies investigating site
specific interaction using DamID required insertion of the yeast transcription factor GAL4
upstream sequence UAS to the site of interest to guide a Dam-GAL4 fusion protein to the
site (Cléard et al., 2006; Marshall et al., 2016). Similarly, insertion of a Tet operator
sequence was used to target a Dam-TetR fusion protein to the HML locus in Saccharomyces
cerevisiae for characterizing interaction between the locus and telomeric sequences
(Lebrun et al., 2003). Both approaches required a laborious process involving
homologous recombination and clone selection.
Considering the challenges faced by 3C-derived techniques and traditional DamID,
we aimed to construct a new method that overcomes the known drawbacks and is easy
to perform. We had previously attempted to identify putative interacting partners of the
Oct4 DE using 4C-seq, but were unable to find functional candidates. Thus, we combined
the site specific targeting technology of CRISPR-Cas9 and “proximity labeling” aspect of
DamID to construct the CRISPR-Dam labeling system to improve identification of
potential Oct4 DE interacting partners.
2.3 Materials and Methods
Plasmid Cloning
The Dam domain was cloned from pMQ430 (Addgene # 42216) which contained the E.
coli DNA adenine methyltransferase gene. The domain was fused to the C-terminus of
the catalytically dead, N-terminal FLAG-tagged Neisseria meningitidis Cas9 protein from
M-NMn-VP64 (Addgene #48676) in a vector using the FUW backbone containing a
Tetracycline Response Element (TRE) promoter. For reverse tetracycline-controlled
transactivator expression and clone selection, pSUPER-hygro was used as a backbone
with a PGK promoter driving rtTA-IRES-hygro expression. Targeting sequences for the
Nm-sgRNA were designed using the CRISPR Multi-Targeter webtool
(http://www.multicrispr.net/basic_input.html) by specifying parameters for 22-24nt
sequence length and 3’ PAM sequence “NNNNGMTT”. Highly scored sequences within
the Oct4 DE were cloned into a modified pLKO.1-puromycin vector containing an
optimized NmsgRNA loop sequence (Lee et al., 2016).
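As an illustration of the search parameters described above, the following Python sketch scans one strand of a sequence for 22-24 nt protospacers followed immediately by a 3’ PAM matching NNNNGMTT (N = any base, M = A or C). It is a minimal sketch and not the CRISPR Multi-Targeter implementation: the function names are hypothetical, only the forward strand is scanned, and no scoring of candidates is attempted.

```python
import re

# IUPAC codes needed for the PAM pattern "NNNNGMTT" (N = any base, M = A or C).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "M": "[AC]"}


def pam_to_regex(pam):
    """Translate an IUPAC PAM string into a regular expression fragment."""
    return "".join(IUPAC[base] for base in pam)


def find_protospacers(seq, pam="NNNNGMTT", min_len=22, max_len=24):
    """Return (start, protospacer, pam_match) tuples found on the given strand.

    A hit is a protospacer of min_len..max_len nt that is immediately
    followed on its 3' side by a sequence matching the PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping PAM matches are all reported.
    pam_re = re.compile("(?=(" + pam_to_regex(pam) + "))")
    hits = []
    for match in pam_re.finditer(seq):
        pam_start = match.start()
        for length in range(min_len, max_len + 1):
            start = pam_start - length
            if start >= 0:  # protospacer must fit inside the sequence
                hits.append((start, seq[start:pam_start], match.group(1)))
    return hits
```

To cover the opposite strand, the same function could be applied to the reverse complement of the region of interest.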
T7 Endonuclease I Assay
Wild type E14 cells were transfected with Nm-sgRNA and catalytically active NmCas9
plasmid pSimpleII-U6-tracr-U6-BsmBI-NLS-NmCas9-HA-NLS (Addgene # 47868) using
Lipofectamine 3000 (Invitrogen, Carlsbad, CA) as suggested by the manufacturer. Cells
were incubated for 48 hours before genomic DNA was harvested and purified with
Phenol:chloroform:isoamyl alcohol (VWR, Brooklyn, NY). High fidelity PCR was
performed using Q5 Master Mix (NEB, Ipswich, MA) and primers flanking the targeted
site within a 600bp window. Amplicons from wildtype and CRISPR targeted samples were
mixed 1:1 and hybridized on a thermocycler before treatment with the T7 Endonuclease
I (NEB). Products were run on a 1.5% agarose gel to visualize and quantify CRISPR
efficiency using ImageJ.
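The conversion of band intensities into an efficiency value is not specified here. One commonly used estimate for mismatch cleavage assays, assumed below purely for illustration, is indel % = 100 x (1 - sqrt(1 - f_cut)), where f_cut is the fraction of total band intensity found in the cleaved products. The function name and the band intensities in the usage note are hypothetical.

```python
import math


def estimate_indel_percent(uncut, cut_bands):
    """Estimate editing efficiency (%) from gel band intensities.

    uncut     -- intensity of the undigested amplicon band
    cut_bands -- intensities of the cleavage product bands
    Assumes the common estimate 100 * (1 - sqrt(1 - fraction_cut)).
    """
    total = uncut + sum(cut_bands)
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cut = sum(cut_bands) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))
```

For example, an uncut band of intensity 50 with two cleaved bands of intensity 25 each gives a cut fraction of 0.5 and an estimated efficiency of roughly 29%.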
Lentiviral particle production and transduction
Nm-dCas9-Dam and Nm-sgRNA plasmids were packaged into lentiviral particles using the 2nd
generation packaging plasmids psPAX and pMD2.G. Plasmids were co-transfected into
293T cells and incubated for 48 hours prior to harvest. Viral media was filtered and
ultracentrifuged at 28,000 rpm for 90 minutes. Viral pellets were resuspended in 200ul 1X
PBS overnight and stored at -80C until use. For transduction, 5 x 10^5 E14 cells were
prepared and 10ul of lentivirus was added to wells for transduction with 5ug/ml polybrene.
Wells were then centrifuged at 1000g for 1 hour and incubated for 24 hours. Clones
transduced with Nm-sgRNA were selected with 1ug/ml puromycin and manually picked
for expansion.
Cell culture
E14 mouse ES cells were grown in ESC media containing GMEM with 15% FBS, 1%
NEAA, 1% Sodium Pyruvate, 1% L-glutamine, 1% PenStrep, 1X β-mercaptoethanol, and
1000U LIF. Cells were passaged using 0.05% Trypsin in PBS solution. NmCas9-Dam
expressing cell lines were generated by first selecting rtTA containing E14 ESC
clones using 200ug/ml Hygromycin B (VWR) and then selecting for sgRNA with 1ug/ml
of puromycin (Santa Cruz Biotechnology, Santa Cruz, CA) for 48 hours post transduction.
Clones were manually selected and genotyped by PCR to confirm construct integration.
NmCas9-Dam labeling was performed using 5x10^6 cells induced with a single pulse of
6ug/ml doxycycline (Sigma, St. Louis, MO). Cells were harvested at 6 hour increments to
quantify FLAG-tagged Cas9-Dam construct expression by western blot with anti-FLAG
M2 (Sigma, F1804 1:10000) and anti-β-ACTIN (1:500, Santa Cruz) antibodies. Total RNA
was extracted with Trizol reagent (Invitrogen, Carlsbad, CA) and reverse transcribed
using iScript (Bio-Rad, Hercules, CA) for RT-PCR validation of NmsgRNA expression and
for cDNA templates in qPCR.
Real Time-qPCR
ESCs were harvested and total RNA was isolated using TRIzol reagent (Invitrogen). A
SuperScript III qRT-PCR kit (Invitrogen) was used to synthesize cDNA from total RNA.
Quantitative PCR was performed using a ViiA7 system (Applied Biosystems, Rockford,
IL) or LightCycler 480 system (Roche, Indianapolis, IN) with iTaq Universal SYBR Green
Master Mix (BioRad); conditions were 95°C for 10 minutes followed by 50 cycles at 95°C
for 15 sec and 60°C for 3 sec. Samples were run in triplicate and transcripts were
quantitated by comparing Cycle Threshold (Ct) values for each reaction with a Gapdh
reference.
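The comparison of Ct values against the Gapdh reference can be sketched as a standard delta-Ct calculation. The sketch below assumes ideal amplification efficiency (a doubling per cycle), which is not stated in the Methods, and uses hypothetical function names; it is meant only to make the arithmetic explicit.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts):
    """Return 2^-(mean target Ct - mean reference Ct).

    target_cts and reference_cts are the replicate Ct values (e.g. triplicates)
    for the transcript of interest and for the Gapdh reference.
    Assumes product doubles every cycle.
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)


def fold_change(sample, control):
    """Delta-delta-Ct fold change between two conditions.

    sample and control are (target_cts, reference_cts) tuples.
    """
    return relative_expression(*sample) / relative_expression(*control)
```

A transcript that crosses threshold two cycles later than Gapdh would, under these assumptions, be reported at 0.25 relative to the reference.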
CRISPR-Dam qPCR Analysis
Primer pairs were designed to 30 GATC sites 10kb upstream and downstream of the
Pou5f1 TSS. Doxycycline induced and non-induced E14 ESCs containing the doxycycline
inducible Cas9-Dam construct and DE targeting sgRNA were processed as described
above and purified DNA was assayed by qPCR. E14 ESCs containing only the
doxycycline inducible Cas9-Dam construct were also induced simultaneously as
background controls and processed in tandem. qPCR was performed in triplicate for all
30 GATC sites on duplicate samples, and signals for each site were normalized and
background subtracted as log2 values. Negative signal intensities were excluded from
visualization.
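The normalization and background subtraction described above can be sketched as follows. Because DpnII cuts only unmethylated GATC sites, an amplicon spanning a methylated site survives digestion and yields a lower Ct. The sketch assumes that each site is first normalized to a control GATC amplicon and that background subtraction is the log2 ratio of targeted to untargeted signal; the function names and the doubling-per-cycle assumption are illustrative and not taken from the original analysis.

```python
import math


def labeling_signal(ct_site, ct_control):
    """Signal at a GATC site relative to a control amplicon: 2^-(Ct_site - Ct_control)."""
    return 2.0 ** -(ct_site - ct_control)


def log2_enrichment(targeted, untargeted):
    """log2(targeted signal / untargeted signal) for one GATC site.

    targeted and untargeted are (ct_site, ct_control) tuples from the
    sgRNA-expressing sample and the Cas9-Dam-only background sample.
    """
    return math.log2(labeling_signal(*targeted) / labeling_signal(*untargeted))


def values_to_plot(enrichments):
    """Keep only sites with positive log2 enrichment (negative values are not shown)."""
    return {site: value for site, value in enrichments.items() if value > 0}
```

With hypothetical Ct values of 24 (targeted) and 27 (untargeted) at a site, and 20 for the control amplicon in both samples, the log2 enrichment is 3, corresponding to an eightfold signal over background.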
Library Preparation and Sequencing Analysis
Genomic DNA from samples was extracted and purified using the Genomic DNA Mini Kit
(Bioland Scientific LLC, Paramount, CA). For qPCR preparation, DNA was digested with
DpnII for 24 hours and purified by ethanol precipitation before assay. For high throughput
sequencing library preparation, DNA was digested for 24 hours with DpnI and ethanol
precipitated before ligating PCR adaptors. DNA was then digested for 24 hours with DpnII
to exclude amplification of fragments containing non-methylated sites. PCR was
performed to enrich for fragments flanked by methylated GATC sites. Adaptors were
removed via sonication and AlwI digestion, and the DNA was cleaned up with AMPure XP beads
(Beckman Coulter, Indianapolis, IN). Library fragments were further processed using the
NEBNext® Ultra™ DNA Library Prep Kit (NEB) and labeled with dual indexes prior to
sequencing. Sequencing was performed on the Illumina HiSeq 4000 using a paired end
151bp read strategy. Sequencing reads were checked for quality by FastQC before
adaptor removal and Burrows-Wheeler alignment. Reads were aligned to genome
assembly MGSCv37 (mm9) and analyzed using HOMER to identify significant Dam
peaks genome wide. Results were summarized through visualization with Integrative
Genomics Viewer with DNaseI hypersensitivity and histone modification tracks from
ENCODE. Enriched CRISPR-Dam peaks were also mapped to genes by Genomic
Regions Enrichment of Annotations Tool (GREAT) analysis with strict rules defining
promoter regions 5kb upstream and 1kb downstream of gene TSS. Identified genes were
further analyzed with Reactome for association with known pathways.
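The logic of the sequential DpnI and DpnII digestions can be illustrated in silico. DpnI cuts methylated GATC sites, which creates the ends that receive adaptors, while DpnII cuts unmethylated GATC sites, so only fragments bounded by two neighboring methylated sites keep adaptors on both ends and amplify. The sketch below is a simplification under that assumption (it ignores hemimethylation, fragment length limits, and PCR bias), and its function names are hypothetical.

```python
def gatc_sites(seq):
    """Return the 0-based start position of every GATC in seq."""
    seq = seq.upper()
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos)
        pos = seq.find("GATC", pos + 1)
    return sites


def amplifiable_fragments(seq, methylated):
    """Return (left_site, right_site) pairs expected to amplify.

    methylated is the set of GATC start positions carrying Dam methylation.
    A fragment qualifies when its two flanking GATC sites are neighbors and
    both are methylated, so DpnI cuts both ends and DpnII cuts nowhere inside.
    """
    sites = gatc_sites(seq)
    return [
        (left, right)
        for left, right in zip(sites, sites[1:])
        if left in methylated and right in methylated
    ]
```

In this model, an unmethylated GATC lying between two methylated sites prevents amplification, because DpnII cleavage leaves each resulting piece with an adaptor on one end only.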
Statistical analysis
Statistical analyses were performed using Excel statistical tools or Prism 6 (GraphPad
Software). Where differences between treatment groups were experimentally
hypothesized, statistical differences between two groups were analyzed using Student’s
t-test (*P< 0.05, **P< 0.005, and ***P< 0.0005). ANOVA tests (Tukey’s multiple
comparison test) were used to test hypotheses about effects in multiple groups.
Differences are indicated in figures as follows: *P< 0.01, **P< 0.001, and ***P< 0.0001.
*P< 0.01 was considered statistically significant.
2.4 Results
2.4.1 Construction and Validation of CRISPR-Dam Components
We began by constructing each component necessary for function of the CRISPR-Dam
system. The Escherichia coli DNA adenine methyltransferase (Dam) gene was first
cloned from the pMQ430 plasmid (Calmann and Marinus, 2003) and fused to the
catalytically dead Neisseria meningitidis Cas9. No linker sequence was included, to
avoid additional confounding factors influencing labeling proximity. The type II
CRISPR system in N. meningitidis was chosen over the more commonly used
Streptococcus pyogenes CRISPR-Cas9 due to the longer protospacer adjacent motif
(PAM) sequence utilized for guide RNA targeting. This could allow for comparatively more
specific locus recognition despite the reduction of available sites within the genome (Hou
et al., 2013; Lee et al., 2016).
Following the usual setup for performing DamID, we prepared three conditions for
performing the assay: one condition expressing both Cas9-Dam and sgRNA, one
expressing only the Cas9-Dam, and one only expressing the sgRNA (Figure 2.1.A). To
better control Dam activity and expression of the fusion protein in cells, we elected to have
NmCas9-Dam driven by a doxycycline-inducible multi-TetO promoter of the Tet
expression system. Tetracycline activated transcription (Tet-On) requires the expression
of a reverse tetracycline-controlled transactivator (rtTA) protein and the presence of
doxycycline to activate transcription of attached sequences. We then cloned the inducible
multi-TetO promoter-FLAG-tagged NmCas9-Dam fusion construct into a lentiviral FUW
plasmid. To confirm inducible expression of our fusion protein, constructs were
transfected into E14 mouse ESCs with integrated rtTA and induced with a single
saturating 6µg/ml dose of doxycycline. Cells were harvested every 6 hours post induction
up to 24 hours. The FLAG-NmCas9-Dam construct was also transfected into 293T cells
with rtTA and induced as a positive transfection control. Assaying by qPCR with primers
designed specific to Dam showed highly responsive linear induction of fusion construct
expression (Figure 2.1.B).
[Figure 2.1.B plot: y-axis, Dam relative expression; x-axis categories, 0 hr, 6 hr, 12 hr, 24 hr, and Transfected +]
Figure 2.1 Schematic of CRISPR-Dam Assay
A. Schematic of the CRISPR-Dam labeling assay. The doxycycline inducible Nm-dCas9-
Dam fusion protein is targeted by guide RNA to the Oct4 DE locus and proximity labeling
occurs at GATC sites within the vicinity. Control samples not containing small guide RNA
with Cas9-Dam expression and samples with guide RNA but without induction of Cas9-Dam
are needed to normalize potential nonspecific labeling.
B. Confirmation of Dam expression by qPCR in E14 ESCs transfected with the Cas9-Dam fusion
construct.
Confident that each component of the system was functional, E14 ESCs with
integrated rtTA were then transduced with lentiviral particles containing the inducible
FLAG-NmCas9-Dam construct. Individual ESC colonies were picked and induced with
doxycycline to verify fusion construct induction. Clones were harvested 24 hours after
addition of doxycycline and assayed for Dam expression by qPCR (Figure 2.2.A). Three
clones FD1, FD2, and FD3 showed inducible Cas9-Dam expression, albeit with variable
levels potentially due to differences in transduced plasmid copy numbers. The FD1 clone
was selected for further testing and induced for various time durations to confirm Cas9-
Dam RNA and protein expression (Figure 2.2.B). Correlating with previously observed
induction time frames in transfected E14 ESCs, the FLAG-tagged fusion construct could
be detected by 6 hours post doxycycline induction and is maintained through 24 hours as
seen in qPCR and Western blot for FLAG (Figure 2.2.C).
[Figure 2.2.A plot: y-axis, Dam relative expression; x-axis categories, FD1, FD2, FD3, and Negative]
Figure 2.2 Identification of ESC lines with Integrated Cas9-Dam
A. qPCR to confirm expression of Cas9-Dam fusion protein in three lentivirus transduced
E14 ESC clones after 24 hour induction with 6ug/ml doxycycline
B. qPCR of FD1 clone induced to express Cas9-Dam construct after addition of 6ug/ml
doxycycline and harvested at 6 hour intervals post induction.
C. Western blot confirming expression of FLAG-Cas9-Dam construct after induction with
doxycycline at 6 hour increments. β-actin used to normalize for total protein.
2.4.2 Validation of CRISPR-Dam By Profiling Interactions at the Pou5f1 locus
We used the well known interaction between the Oct4 DE and the Oct4 promoter
in naïve state ESCs to validate the CRISPR-Dam assay. We designed two different
sgRNAs, DE1 and DE2, both targeted to the CR4 domain within the Oct4 DE using the
CRISPR Multi-Targeter webtool. Both highly scored sgRNA sequences were cloned into
pLKO backbone plasmids and transfected into E14 cells. Expression of the sgRNA
constructs was checked by RT-qPCR (Figure 2.3.A). Confirmation of on-site
targeting was performed by co-transfection of wild type cleaving NmCas9, DNA extraction
and PCR of the Oct4 CR4 region, and treatment of PCR amplicons with T7 Endonuclease
I, which produces cleaved products at mismatching CRISPR altered loci (see Methods,
Figure 2.3.B).
2.4.3 Determination of Optimal Dox Induction Time
Figure 2.3 Confirmation of sgRNA Expression and Targeting
A. RT-PCR confirming expression of sgRNAs DE1 and DE2, both targeting the Oct4 DE,
in transfected E14 cells. Compared with positive (+) control, E14 transfected with an empty
sgRNA construct, and negative (-) control, an untransfected E14. Bands normalized to
β-actin expression.
B. T7 Endonuclease I digestion assay to confirm on site targeting by DE1 and DE2.
Expected digested products (387bp for DE1, 276bp for DE2) are indicated by red arrows.
To ensure sufficient expression of Cas9-Dam but avoid oversaturation of Dam
methylation, we induced our transfected ESCs targeting the Oct4 DE with doxycycline for
different durations in 6 hour increments and assayed by qPCR for labeling at 30 GATC sites
approximately 7kb upstream and downstream from the Oct4 DE sgRNA targeted site.
After restriction with DpnII to exclude unlabeled sites and normalization with signals
collected from each site in the non-targeted, Cas9-Dam induced ESCs, profiles of
interaction were generated (Figure 2.4.A). The 6 hour induced cells showed a robust
interaction profile along the Oct4 locus, with methylated sites adjacent to the targeted
sgRNA sequence and at the promoter and gene body sites. The promoter region of Oct4
maintained its labeling at the 12 hour time point as well, although the intensity of signal
enrichment was decreased. Interestingly, the subsequent time points showed a gradual
loss of signal intensity, with the 24 hour induced cells showing nearly no enriched labeling
at the 30 sites. This suggested that background methylation by the untargeted Dam
became sufficiently high genome wide at the 18 and 24 hour time points to occlude signals
established at the earlier 6 and 12 hour time points. The results indicated that a shorter
time for induction by dox was necessary to preserve an ideal signal to noise ratio.
[Figure 2.4.A plots: panels for 6 hour, 12 hour, 18 hour, and 24 hour induction. x-axis: bp from sgRNA site (-6000 to +7000), annotated with the Oct4 Distal Enhancer, PE, PP, and Oct4 GB. y-axis: Log2 Relative Intensity (targeted/untargeted).]
Figure 2.4 Time Course Induction of Cas9-Dam Labeling Near the Pou5f1 Locus
A. E14 cells were transfected with the FLAG-NmCas9-Dam construct and DE1 sgRNA targeting
the Oct4 DE and induced with doxycycline for different durations. The origin is centered at
the location of the DE1 sgRNA locus. 30 GATC sites around the Oct4 DE locus were assayed by
qPCR after genomic DNA processing with DpnII. Signals at each site were normalized to a
control Gapdh promoter GATC site and to signals from the untargeted (Cas9+/sgRNA-)
samples for each time point. Positive log2 values are plotted, representing labeled sites.
To confirm our observations, we performed the time course dox induction
experiment on the FD1 ESC clone containing the integrated dox inducible FLAG-NmCas9-Dam
fusion construct and sgRNA DE1 targeting the Oct4 DE. Cells were harvested at the
same 6 hour intervals post dox induction, genomic DNA was digested with DpnII, and the same
30 GATC sites adjacent to the targeted Oct4 DE site were assayed by qPCR. Signals were
normalized to the FD1 ESC clone lacking targeted sgRNA expression. qPCR results were
then plotted for visualization on the UCSC Genome Browser and aligned with tracks for
histone marks for active enhancers (H3K27ac), primed enhancers (H3K4me1), and active
promoters (H3K4me3). DNA accessibility indicated by DNaseI hypersensitivity was also
included (Figure 2.5.A). Similar to what was observed in the E14 ESCs transfected with
the CRISPR-Dam components, robust signals were detected at the promoter and gene
body of Oct4 and at sites near the Oct4 DE. In contrast to the transfected E14 cells,
signals detected at the promoter region were maintained through the 24 hour time point,
suggesting that labeling at the promoter region of Oct4 represented true positive signals and
that induction of the Cas9-Dam protein had not led to label oversaturation. Interestingly, the
labeled sites were not correlated with peaks in H3K27ac and DNaseI hypersensitivity, but
did overlap with the active promoter mark H3K4me3, which indicated specific labeling of
interactions at the Oct4 locus, in line with our previous understanding of Oct4 DE
looping to the Oct4 promoter in naïve state ESCs.
Figure 2.5 CRISPR-Dam Labeling in FD1 ESC clone Targeting the Oct4 DE
A. The FD1 ESC clone transfected with sgRNA DE1 and induced with doxycycline was
harvested at 6 hour intervals post induction. Genomic DNA was digested with DpnII and 30
GATC sites around the Pou5f1 locus were assayed by qPCR. Samples were normalized to a
control GATC site at the Gapdh promoter and the untargeted (Cas9+/sgRNA-) sample at
each site. Positive log2 values are plotted representing detected labeled sites. Pink bars
represent negative values. Signals are aligned in the UCSC genome browser with ENCODE
E14 ChIP-seq tracks for histone marks H3K27ac, H3K4me1, H3K4me3, and DNaseI
hypersensitivity tracks.
2.4.4 CRISPR-Dam Sequencing and Comparison with 4C-seq
Adapting the CRISPR-Dam assay for sequencing, we followed the typical DamID
library preparation protocol as described in other published studies (Marshall et al., 2016;
Vogel et al., 2007). The protocol is altered to accommodate an initial DpnI digestion step
to allow for PCR adaptors to be ligated at the methylated site (Figure 2.6.A). The adaptor
ligated fragments are then digested by DpnII to exclude unlabeled fragments from
amplification during the first round of PCR. Adaptors are then removed by sonication and
AlwI digestion prior to sequencing library preparation. We prepared duplicate
FD1 samples containing sgRNA DE1 and induced with dox at 6 hour increments for paired
end library construction and sequencing on the HiSeq4000 platform. As before, an
untargeted Dam expressing FD1 ESC line was also prepared for sequencing for
normalization. Results were then aligned to the mm9 mouse genome.
Figure 2.6 CRISPR-Dam Sequencing Library Preparation Strategy
A. Dam labeled fragments are first digested with DpnI and ligated with PCR adaptors. This is
followed by DpnII digestion to exclude unlabeled fragments. Fragments are PCR amplified
prior to sequencing index ligation and further library preparation. Adapted from Vogel et al.,
2007.
The Pou5f1 locus was examined with ENCODE ChIP-seq tracks for H3K27ac, H3K4me1, and H3K4me3
and DNaseI hypersensitivity to confirm agreement with our qPCR results (Figure 2.7.A).
Interestingly, the 6 hour sample contained three distinct peaks upstream from the Oct4
TSS, with only one captured in signals from our CRISPR-Dam transfected E14 samples
and FD1 ESC line. Interactions at the Oct4 promoter were surprisingly not prominent at
6 hours, but appeared in the 12 and 18 hour samples and disappeared in the 24 hour
samples (not shown). We believe this may reflect early chromatin dynamics of the Oct4
DE, where interaction between the DE and Oct4 promoter occurs albeit infrequently within
any 6 hour interval. Additionally, we observed that the identified peaks were located
independently of histone marks and DNaseI sites, demonstrating that Dam labeling and,
by extension, chromatin looping do not necessarily follow established epigenetic marks.
Although these results differed slightly from our qPCR data, we continued our analysis
using only the 6 hour samples.
Figure 2.7 Time Course CRISPR-Dam Coverage at the Pou5f1 Locus
A. Sequencing coverage tracks for CRISPR-Dam samples at different time points. Signal
tracks were normalized together and aligned in the Integrative Genomics Viewer mouse
genome mm9. ENCODE tracks for histone marks H3K27ac, H3K4me1, H3K4me3, and
DNaseI hypersensitivity tracks are shown.
[Figure 2.7.A track labels: No Dam, No sgRNA, 6 hour, H3K27ac, H3K4me1, H3K4me3, DNaseI HS]
To identify enriched Dam labeled sites, raw 6 hour Dam labeled signals were
loaded into peak calling software HOMER (Figure 2.8.A). In the FD1 ESC lines
expressing Cas9-Dam and sgRNA, 17876 peaks were identified while 9799 peaks were
identified in the untargeted Cas9+/sgRNA- samples. 935 peaks were shared among both,
making up only about 5% of the HOMER identified peaks in the Cas9+/sgRNA+ samples. 16940
unique peaks were identified in the Cas9+/sgRNA+ samples while only 8864 unique peaks were
found in the Cas9+/sgRNA- samples. The greater number of peaks called in the
Cas9+/sgRNA+ sample indicated greater enrichment of interaction labeling in these
samples over background.
[Figure 2.8.A diagram: 16940 peaks unique to Cas9+/sgRNA+, 8864 peaks unique to Cas9+/sgRNA-, 935 shared peaks]
Figure 2.8 HOMER Identified 6 Hour Induced CRISPR-Dam Peaks
A. Number of peaks called in 6 hr induced samples expressing Cas9-Dam with sgRNA
compared to peaks called in samples expressing Cas9-Dam without sgRNA. Overlapping
peaks and unique peaks in each sample are summarized in the graph above.
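The peak set comparison summarized in Figure 2.8 can be sketched as a simple interval overlap test. This is not how HOMER computes overlaps; it is a minimal illustration in which peaks are assumed to be (chromosome, start, end) tuples with half-open coordinates, and the function names are hypothetical. With the counts reported in the text, 935 shared peaks correspond to roughly 5% of the 17876 peaks called in the Cas9+/sgRNA+ samples and roughly 10% of the 9799 peaks called in the Cas9+/sgRNA- samples.

```python
def overlaps(a, b):
    """True when two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_peaks(targeted, untargeted):
    """Split targeted peaks into those shared with the untargeted set and those unique.

    Uses an all-against-all comparison, which is adequate for a sketch but
    slow for tens of thousands of peaks.
    """
    shared = [p for p in targeted if any(overlaps(p, q) for q in untargeted)]
    unique = [p for p in targeted if p not in shared]
    return shared, unique


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole
```

For genome scale peak lists, sorting each set by chromosome and start position before comparison would avoid the quadratic cost of this version.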
We compared our CRISPR-Dam data to previously performed circular
chromosome conformation capture and sequencing (4C-seq) that also mapped the Oct4
DE interactome in E14 mouse ESCs (Wei et al., 2013). The data was comparable as we
had designated the 4C bait region at the same DE locus approximately 2kb upstream of
the Pou5F1 TSS. Previously, 4C-seq reads were consolidated into megabase regions
due to a lack of resolution (Wei et al., 2013). In summary, we had observed about 85%
of reads mapping to intrachromosomal regions in chromosome 17, resulting in 1216
unique regions. Only 15% of captured reads were interchromosomal interactions shared
between two biological replicates, totaling 590 regions.
Figure 2.9 Comparison of CRISPR-Dam Peaks and 4C Identified Regions
A. Number of Cas9-Dam peaks within chromosome 17 and all other chromosomes
compared to 4C identified regions within chromosome 17 and other chromosomes.
Distribution of peaks and regions per megabase of genome within chromosome 17 and other
chromosomes.
B. Overlap of Dam peaks with 4C regions mapped from sequenced 4C reads. Samples
expressing Cas9-Dam and sgRNA overlapped with higher signal intensity at 4C identified
sites.
In contrast, only 4% of our CRISPR-Dam peaks mapped within chromosome 17, comprising 753 unique peaks,
while the majority of peaks mapped to other chromosomes, making up 96% of all
identified peaks (Figure 2.9.A). The distribution of the identified CRISPR-Dam peaks and
4C regions also differed significantly, with CRISPR-Dam peaks distributed relatively
evenly with approximately 7-8 peaks per megabase of the genome regardless of
intrachromosome or interchromosome distance, while 4C regions were identified at much
higher frequency within chromosome 17 (Figure 2.9.A). In overlapping 4C regions with
CRISPR-Dam peaks genome wide, we were able to see that the CRISPR-Dam reads
Table 2.1 REACTOME Analysis of CRISPR-Dam Identified Genes
Summary of GREAT identified genes and enriched pathways in Cas9-Dam+/sgRNA+
samples and Cas9-Dam+/sgRNA- samples.
55
obtained from samples expressing Cas9-Dam and sgRNA aligned well within previously
identified 4C regions and closely mapped to the center of 4C read regions (Figure 2.9.B).
This result indicated that our CRISPR-Dam results also captured those interactions found
previously by 4C-seq.
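As a rough consistency check on the peak densities quoted above, the density can be recomputed from a peak count and a chromosome length. The sketch below is illustrative only: the function name `peaks_per_mb` and the chromosome 17 length of about 95 Mb (an approximate figure for the mouse reference assembly) are assumptions and are not values taken from this analysis.

```python
def peaks_per_mb(n_peaks: int, length_bp: int) -> float:
    """Return the number of peaks per megabase of sequence."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return n_peaks / (length_bp / 1_000_000)


# 753 CRISPR-Dam peaks were reported within chromosome 17.
# ASSUMPTION: chromosome 17 is taken to be roughly 95 Mb long.
CHR17_LENGTH_BP = 95_000_000

chr17_density = peaks_per_mb(753, CHR17_LENGTH_BP)
print(f"chr17: {chr17_density:.1f} peaks per Mb")
```

With these assumed inputs the result falls within the 7-8 peaks per megabase range reported for the CRISPR-Dam peaks.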
CRISPR-Dam peaks were further analyzed using the Genomic Regions Enrichment
of Annotations Tool (GREAT) to associate them with gene promoters. Using a strict definition
for gene promoter regions, 453 genes were identified in Cas9-Dam+/sgRNA+
samples while 236 genes were identified in Cas9-Dam+/sgRNA- samples. As
expected, well known pluripotency related genes Pou5f1, Sox2, and Klf4 were enriched,
as well as other auxiliary pluripotency associated genes Sall4, Dppa4, and Zscan10.
Reactome analysis further characterized the pathways associated with the genes,
revealing several enriched pathways related to stem cell pluripotency and proliferation
(Table 2.1). These results confirmed the utility of our CRISPR-Dam assay to identify
distally located genes interacting with the Oct4 DE that participate in the pluripotency
gene network.
2.5 Discussion
We described a new method to identify extreme long range chromosome
interactions by combining the strengths of the DamID system and CRISPR gene editing.
Highly specific targeting of the Nm-sgRNA and in vivo labeling of proximal loci by the Dam
domain not only allows for reproducible mapping of the Oct4 distal enhancer interactome, but
also captures the dynamics of the enhancer over time. We were able to achieve
better resolution than 3C-based methods, with more appropriate controls to limit background
signals, which allowed for positive identification of genes specifically interacting with the
Oct4 DE. The CRISPR-Dam assay is distinguished from other studies on enhancer activity
and chromatin looping as we are able to observe both intrachromosome and
interchromosome interacting sites genome wide. Other studies utilizing 3C based assays
are limited in their focus on relatively proximal interactions within megabase regions or
within the same chromosome, largely due to the lack of resolution and relative scarcity of
distal signals (Amano et al., 2009; Tolhuis et al., 2002; Zhang et al., 2013). This potentially
overlooks biologically significant long range interactions that also exert functional
influence on transcription programs and cell identity. Through analyzing interaction data
from our new CRISPR-Dam technique, we found genes expected to participate in the
pluripotency gene network through interaction with the Oct4 DE, as well as novel genes
previously not identified in other studies interacting with the Oct4 DE.
Our comparison of CRISPR-Dam sequencing peaks and 4C-seq identified regions
yielded some unexpected but interesting results. Based on the high number of CRISPR-
Dam peaks found, the data suggest a very active Oct4 DE in ESCs, which may reflect the
DE’s role as a super enhancer. As the DE contains many binding sites for a multitude of
transcription factors (e.g. OCT4, SOX2, NANOG, KLF4, SALL4, ESRRB) and
architectural proteins (e.g. CTCF, Cohesin), extensive looping activity mediated by these
factors can be expected to allow DE interaction with many genes all throughout the
genome that are involved in regulating ESC pluripotency. However, it is hard to gauge if
the high level of interaction we observed may be an artifact of the CRISPR-Dam assay
itself as our study is the first of its kind tracking the looping activity of Oct4 DE in living
ESCs. The relatively even distribution of CRISPR-Dam peaks suggested that the
CRISPR-Dam labeling occurred in a less biased manner and could capture more distal
interactions in contrast to 4C. However, this may also indicate more labeling artifacts due
to insufficient normalization of background labeling in our control samples.
Notwithstanding, we could still observe enrichment of pluripotency related genes and
pathways from gene ontology analysis of our Oct4 DE targeted samples (Table 2.1).
Further optimization of the protocol and analysis pipeline can help refine the results, such
as including additional comparison with samples labeled at shorter and longer time
durations.
Other aspects of the CRISPR-Dam assay may also require further consideration
and development. Dam labeling relies on the natural distribution of GATC sites throughout
the genome, which occur on average once every 256bp and limit the available
resolution to hundreds of base pairs, but this has been shown to be sufficient to map
interactions to genes and other genomic elements. The availability of GATC sites may be
influenced by the positioning of nucleosomes and DNA binding proteins on the chromatin,
the impact of which we could not fully assess in our current study. Also, temporal control
of Dam activity is a critical parameter as noted in other studies utilizing Dam. Label
oversaturation can occur if Dam is allowed to be active for long periods of time and lead
to labeling at unrelated loci. We observed this in our time course experiment, where peaks
at the Oct4 locus became obscured by background signals by 18 and 24 hours. As a
result, we selected the 6 hour induction time point as optimal based on the
clear signals we observed, but shorter induction and labeling times may provide an even
more distinct interaction profile. We incorporated the inducible tetracycline
expression system to restrict Dam expression and included control cell lines to minimize
signal leakage, but more precise control of Dam activity can be achieved as seen in other
studies utilizing ligand-dependent or dimer activated Dam constructs (Hass et al., 2016;
Pindyurin et al., 2016). Further improvement of the protocol to incorporate use of N6-
adenine specific antibodies and single cell analysis may also help improve
sensitivity and precision of the assay. Lastly, endogenous N6-adenine
methylation has recently been reported in mammalian embryonic stem cells, along with putative
adenine demethylases (Luo et al., 2015; Wu et al., 2016). However, we expect the Dam
labeling technique to remain viable since the endogenous consensus motif discovered in
these studies does not match the GATC sites used by Dam, which should prevent endogenous
adenine methylation mechanisms from interfering with our CRISPR-Dam method.
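The figure of one GATC site every 256bp quoted above follows from the motif length: in random sequence with equal base frequencies, a 4bp motif is expected once every 4^4 = 256bp. The short sketch below reproduces that arithmetic and locates GATC sites in a sequence. The function names and the example sequence are hypothetical and chosen only for illustration; real genomes deviate from the equal base frequency assumption.

```python
def expected_spacing(motif: str) -> int:
    """Expected distance (bp) between motif occurrences in random sequence
    with equal base frequencies: one occurrence every 4**len(motif) bp."""
    return 4 ** len(motif)


def find_sites(sequence: str, motif: str = "GATC") -> list[int]:
    """Return the 0-based start position of every motif occurrence."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


print(expected_spacing("GATC"))    # 4**4
print(find_sites("aaGATCttGATC"))  # toy sequence, not genomic data
```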
Table 2.2 CRISPR-Dam Tiling Primers at Pou5f1 Locus
Primer         Forward Sequence         Reverse Sequence        Size (bp)  Distance to Oct4 TSS
1              CAGAACAGGGCTAGGTGTGG     AAAGGCATGTGCCATCATC     154        -7890
2              GGCACATGCCTTTGATCCTA     TGTTTTCAAGGCAGGGTTTC    129        -7689
3              GAAACCCTGCCTTGAAAACA     CCAAGCCTGGCATTTAGAGA    180        -7456
4              GGGGAAGCTAGGTGGGAGTA     TGGGTGGTTCTAATGCTGTG    101        -6057
5              AGGCACACCAGGAGAGAGAG     CAGCTCCCACTTGGTGACTC    114        -5628
6              CCACAGCTGCTCACCTATGT     ATAACCAGAACCCCACGACA    175        -4544
7              GAGTGATGTCGTGGGGTTCT     CACTACTGCCCGTTTCCTTT    120        -4369
8              AGCCTTCTGTTCTGCTGGTG     AGGAACCTGAGGACAGCTCA    220        -4036
9              CATGTCGCTGAAACTCCTCA     GAGTTCAGCCACCTCCTCAC    148        -3472
10             ACCTTTTCATGCTGGTGGAC     CGTTAACTTGCCACAAACCA    133        -2964
11             TGACTCTTAAAGGGGGCAGA     TGCCTCCTGGGTCTTAGAAA    190        -2556
12             AGAGTGAGTTCCAAGCCATCT    AGCTTGAAACAGATGTGCCC    189        -2424
13             GGAGCAGACAGACAAACACC     AGGAGACGGGATTAGGAGGA    172        -1227
14             GCTGGACACCTGGCTTCA       CTCTGAGCCTGGTCCGATT     150        150
15             GAACCTGGCTAAGCTTCCAA     CCCTCCGCAGAACTCGTAT     107        224
16             CTGGCCCTTGTCATGTAGGT     CCAGTCACACCCAACCTCTT    249        767
17             CCTCAGACTCCATGAGTCACC    CAAGGGTGTCCCTTTCTTGT    115        1012
18             CCCTTGGGTAGGGGAGTTT      ACACTTGCTCCAGAGTGCTG    108        1102
19             GGCATGGACATTTGGCTACT     GCTCTCCTGCCTCAGTGTCT    108        2238
20             AGACACTGAGGCAGGAGAGC     GGCTGCCGACTGTAAGTCAT    343        2426
21             GTCCCAGCTGGTGTGACTCT     GAGACCCACCAAAGAGAACG    175        3131
22             CGTTCTCTTTGGTGGGTCTC     AGTGAGTGAGGCAGCAAGAA    175        3210
23             GAGCTGCTTCTCCACAGGTA     CTTCCTCCACCCACTTCTCC    220        3557
24             TGTGAGGTGGAGTCTGGAGA     TCTTGGCACTCACATCCTTC    110        4063
25             CTCACATCGCCAATCAGCTT     ATAAGCTCTGGGATGCTGCA    152        4114
26             TGTTTCCCTGTTCCCATTCC     TAGGATGGGCACAGAGATGG    113        4220
27             ATGTCCATCTCTGTGCCCAT     GGTGTCCCTGTAGCCTCATA    120        4294
28             CAAGGCAAGGGAGGTAGACA     ATGAGTGACAGACAGGCCAG    152        4668
29             TCAGTTCCTAGAGAGCCTGC     TAGCAGAGACTTACCGCCTC    189        4815
30             GGGAGTTCTGGTGGGCTTAT     AAAGAGTTTGAGGGCTGGAG    236        5123
GAPDH control  GGGAGAACAGGGGAAATGGA     CCACCCTGGCATTTTCTTCC    176
Table 2.3 CRISPR-Dam Targeting sgRNA Primers for Oct4 DE
Chapter 3: The Oct4 Distal Enhancer Interacts With Distal Targets Llgl2 and Grb7
to Regulate ESC Pluripotency
3.1 Abstract
Through the CRISPR-Dam labeling assay, we are able to characterize extreme
long range interactions of the Oct4 DE genome wide. By further dissection of
interchromosome interactions observed previously from DNA FISH occurring on
chromosome 11, we identified lethal giant larvae 2 (Llgl2) and growth factor receptor-
bound protein 7 (Grb7) as putative functional interacting target genes. We show that the
Oct4 DE directly regulates expression of Llgl2 and Grb7 independent from OCT4 activity.
Expression of both Llgl2 and Grb7 closely correlates with the pluripotent state, where
knock down of either results in loss of pluripotency, and overexpression enhances
somatic cell reprogramming. We demonstrated that biologically important interactions of
the Oct4 DE can occur at extreme distances that are necessary for the maintenance of
the pluripotent state.
3.2 Introduction
As the focus of our study of ESC pluripotency, the Oct4 super enhancer region is
an important regulator of the pluripotent stem cell state. Work characterizing the locus
has identified three cis-regulatory regions upstream of the Pou5f1 gene transcription start
site (TSS): the proximal promoter (PP), the proximal enhancer (PE), and distal enhancer
(DE) (Minucci et al., 1996; Yeom et al., 1996). The DE and PE are utilized in a cell-type
specific manner in pluripotent cells to regulate Oct4 expression, where the DE controls
active transcription of Oct4 in naïve state pluripotency while the PE is activated in primed
state pluripotency (Buecker et al., 2014; Choi et al., 2016). Two distinct regulatory
elements within the DE have also been identified, conserved regions CR3 and CR4, that
are shared across different species and contain binding sites for many transcription
factors with positive and negative regulatory roles on Oct4 expression (Chew et al., 2006;
Minucci et al., 1996; Wu and Schöler, 2014; Yeom et al., 1996). In studying the activity of
the Oct4 locus in naïve state ESCs, previous work from our lab has shown that
endogenous Oct4 transcription occurs concurrently with long-range chromatin
interactions at the Oct4 locus and binding of looping factors like KLF4 at the DE (Wei et
al., 2009, 2013). Chromatin remodeling complexes Mediator and cohesin are also
recruited by KLF4 which subsequently leads to RNA polymerase II (RNAPII) recruitment
concomitant to loop formation. With these known chromatin looping factors actively bound
to the DE locus, we predicted that other putative functional targets of the DE should exist
that may be functionally important genes of the pluripotency gene network in ESCs.
In our study to characterize the interactions of the Oct4 DE in ESCs, we utilized
data generated by the CRISPR-Dam sequencing assay in conjunction with previously
obtained 4C-seq and DNA FISH results to identify interchromosomal targets. Two
putative targets initially identified in the distal arm of chromosome 11, containing lethal
giant larvae 2 (Llgl2) and growth factor receptor bound protein 7 (Grb7) were verified by
3C and FISH assays to interact with the Oct4 DE. To confirm direct regulation by Oct4
DE, a conditional DE deletion ESC line was constructed using CRISPR to knock in LoxP
sites flanking the CR3 and CR4 region of the DE. Oct4 DE deletion mutant ESCs were
found to lose expression of Llgl2 and Grb7 in spite of restoring OCT4 to native levels.
Characterizing the functional role of Llgl2 and Grb7 showed both are necessary for
maintaining the pluripotent state and ectopic expression of either gene enhances the
reprogramming of somatic cells into iPSCs.
3.3 Materials and Methods
Cell Culture
E14 mouse ES cells were grown in ESC media containing GMEM with 15% FBS, 1%
NEAA, 1% Sodium Pyruvate, 1% L-glutamine, 1% PenStrep, 1X β-mercaptoethanol, and
1000U LIF. Primary mouse MEF cells were derived from E12.5 embryos and cultured in
DMEM/10% FBS, 1% PenStrep, and 1% L-glutamine for two passages before use. Cells
were passaged using 0.05% Trypsin in PBS solution. Floxed CR4 ESCs were treated
with 1µg/ml tamoxifen for Cre-ERT2 induction.
Chromosome Conformation Capture
3C assay was performed following steps previously outlined (Naumova et al., 2012).
Briefly, 1x10^8 E14 cells were trypsinized and crosslinked with 37% formaldehyde for 10
minutes at room temperature. Crosslinking was quenched with ice cold 2.5M glycine.
Cells were pelleted and lysed using a Dounce homogenizer. Isolated nuclei were
resuspended in 1ml of 1.2X restriction buffer and distributed into 20 Eppendorf tubes. 1%
SDS was added and the samples were incubated at 65°C, followed by snap cooling and
addition of 10% Triton X-100. Chromatin was then digested overnight with 400U of HindIII.
Digestion efficiency was checked by SYBR Green PCR comparison of purified undigested
and digested chromatin aliquots. Only samples with more than 80% digestion efficiency were
used for further analysis. The restriction enzyme was inactivated with 10% SDS and heating
at 65°C before addition of 7.5ml of ligation cocktail containing 10% Triton
X-100, 10mg/ml BSA, 100mM ATP, and 10X ligation buffer containing 500mM Tris pH
7.5, 100mM MgCl2, 100mM DTT. 100U of T4 ligase was then added to each sample tube
and incubated overnight at 4°C and for 1 hour at room temperature the following day.
Samples were decrosslinked overnight at 65°C with 30µg/ml Proteinase K and for 1 hour
at 37°C with 30µg/ml RNAse A. Samples were then purified with 7ml of phenol-chloroform
and precipitated with 3M sodium acetate and 100% ethanol. Library pellets were
resuspended in 10mM Tris pH 7.5 and stored at -20°C. BAC control libraries were
generated following the same procedure using BAC clones RP23-143F14 representative
of the Llgl2 locus, RP23-474J5 for the Grb7 locus, and CH29-91H19 for the Oct4 locus.
PCR reactions were performed using APEX 2X HotStart Master Mix on the Bio-Rad
Bioengine Peltier Thermocycler. Titration curves were generated using BAC control
libraries to prevent signal saturation. Amplicons were run on a 2% agarose gel.
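The text does not state how digestion efficiency was computed from the SYBR Green PCR comparison. One commonly used approach, sketched below purely as an assumption, compares the Ct of an amplicon spanning a restriction site with the Ct of a control amplicon lacking a site, in digested versus undigested aliquots. The function name, argument names, and example Ct values are hypothetical.

```python
def digestion_efficiency(ct_site_dig: float, ct_ctrl_dig: float,
                         ct_site_undig: float, ct_ctrl_undig: float) -> float:
    """Percent of restriction sites cut, estimated from qPCR Ct values.

    ASSUMPTION: the amplicon spanning the site is lost when the site is cut,
    the control amplicon is unaffected, and amplification doubles per cycle.
    """
    delta_digested = ct_site_dig - ct_ctrl_dig
    delta_undigested = ct_site_undig - ct_ctrl_undig
    return 100.0 - 100.0 / (2 ** (delta_digested - delta_undigested))


# Hypothetical Ct values for one HindIII site
efficiency = digestion_efficiency(28.0, 22.0, 25.0, 22.0)
print(f"{efficiency:.1f}% digested; passes 80% cutoff: {efficiency > 80}")
```

Under this scheme a shift of three cycles corresponds to 87.5% digestion, which would pass the 80% cutoff described above.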
3C Primer Design
Genomic maps of the Oct4 distal enhancer locus, Llgl2 locus, and Grb7 locus were
downloaded from the UCSC Genome Browser mm9 for identification of appropriate
restriction sites. Primers flanking restriction sites were designed using Primer3 and
optimized to have a melting temperature of 60°C when paired with the bait primer located
at the Oct4 DE. Primers were designed to the sense strand and located within 150bp of
restriction sites, for a total fragment length of at least 70bp and less than 250bp after
ligation with the bait fragment.
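The placement rules above can be expressed as a simple filter. The sketch below locates HindIII sites (AAGCTT) in a sequence and checks whether a bait/target primer pair satisfies the stated limits. It assumes that the ligation product length can be approximated as the sum of each primer's distance to its restriction site; that approximation, the function names, and the toy sequence are illustrative and are not taken from the original design procedure.

```python
HINDIII_SITE = "AAGCTT"


def find_restriction_sites(sequence: str, site: str = HINDIII_SITE) -> list[int]:
    """Return 0-based start positions of each restriction site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def ligation_product_ok(bait_to_site: int, target_to_site: int,
                        max_primer_distance: int = 150,
                        min_len: int = 70, max_len: int = 250) -> bool:
    """Check primer placement against the limits described in the text.

    ASSUMPTION: product length is approximated as the sum of the two
    primer-to-site distances (bp).
    """
    if max(bait_to_site, target_to_site) > max_primer_distance:
        return False
    return min_len <= bait_to_site + target_to_site < max_len


print(find_restriction_sites("ccAAGCTTggggAAGCTTaa"))  # toy sequence
print(ligation_product_ok(80, 100))
```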
DNA fluorescence in situ hybridization
DNA FISH probes were produced from the BAC clones used in the 3C assays to visualize the
chromatin regions tested. BAC clones were gently extracted and purified using the Qiagen
Large Construct Kit (Qiagen, Germantown, MD). For probe generation, BACs were nick
translated with biotin-11-dUTP or digoxigenin-11-dUTP for 90 mins at 15°C and quenched
with salmon sperm DNA and mouse Cot-1 DNA. Probes were then precipitated with
ethanol and resuspended in deionized formamide and 2X hybridization buffer (2X SSC,
dextran sulfate, sodium phosphate monobasic). E14 cells were grown on 1% gelatinized
glass slides in preparation. Following a previously described protocol, cells were fixed
with 4% formaldehyde and permeabilized with 0.5% Triton X-100 on ice briefly. Then,
samples were dehydrated sequentially with ice cold ethanol and rehydrated in 2X SSC.
Cells were denatured in 50% formamide/2X SSC (pH 7.2) for 30 mins at 80°C and then
quenched in 70% ethanol. Probes were added and incubated for 48 hours in a humid
chamber. Slides were then washed in 50% formamide/2X SSC at 42°C and incubated
with anti-DIG-fluorescein (Sigma Aldrich cat. 11207741910) for 40 mins at 37°C. After
washing with 0.1% Tween 20/4X SSC, slides were then incubated with avidin-rhodamine
(Vector Labs, Burlingame, CA) for 40 mins at 37°C. Slides were then washed again in
0.1% Tween 20/2X SSC and counter stained with DAPI before mounting. Images were
collected using confocal microscopy (AxioImager with AxioCam HRc color camera; Zeiss,
Jena, Germany). Overlapping fluorescein and rhodamine signals were considered
colocalized loci. At least 200 nuclei were analyzed for a power of 0.8 at an alpha of 0.05.
CRISPR knock in LoxP Insertion at DE Construction
Wild type E14 cells were transduced with separate lentiviral vectors containing rtTA, TRE-
mOct4, and pSin-CreER(T2). Clones were identified through genomic PCR and
expanded clonally. Targeting sgRNA were designed using the sgRNA Designer tool
(portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design) to target the CR3 and
CR4 flanking sequences located within the Oct4 DE. Using the SpCas9 system, sgRNAs
were cloned into LentiCRISPRv2 (Addgene #52961), and co-transfected with repair donor
oligos containing the designated loxP sites and altered PAM sequence using
Lipofectamine 3000 (Life Technologies). Clones were picked after 48 hours and
expanded before genotyping for loxP insertion. Screening PCR was performed after
treatment with 0.1M 4-hydroxytamoxifen for 48 hours. Identified mono- and bi-allelic
colonies were stably expanded. For knockout experiment, cells were treated with 0.1M 4-
hydroxytamoxifen for 48 hours and with 0.5 ug/ml doxycycline for the appropriate
condition. Total RNA was extracted using Trizol and reverse transcribed with iScript (Bio-
Rad) in preparation for qPCR.
shRNA knock down and RT-qPCR
Knock down of Llgl2 was performed using GIPZ-based sequences procured from GE
Dharmacon. Grb7 KD sequences were cloned into pLKO.1 backbones. Sequences are
listed in Table. Vectors were transfected using Lipofectamine 3000 according to
manufacturer recommended ratio of vectors to reagent. Transfected E14 cells were
incubated for 16 hours at 37°C before selection with 1ug/ml puromycin for 36 hours. Cells
were harvested for RNA 72 hours post transfection using Trizol reagent (Life
Technologies) and purified by ethanol precipitation. RNA was measured using NanoDrop
and normalized to 1ug for reverse transcription using iScript (Bio-Rad).
Reprogramming
Llgl2 and Grb7 mRNA were cloned via Gateway Cloning into a pMX retroviral plasmid
backbone containing a T2A-mCherry in frame. The Llgl2 and Grb7
overexpression constructs, along with pMX-human Oct4, Sox2, Klf4, and c-Myc, were
transfected into Plat-E retrovirus packaging cells. Viral supernatants were harvested 3
days post transfection and used fresh. 1 x 10^5 early passage MEFs were plated in 6
well plates 1 day before transducing twice by spinfection on subsequent days with viral
supernatants and 8µg/ml of polybrene. Transduced MEFs were cultured with ESC media
for 7 days before passaging. iPS colonies emerged by day 9 and single clones were
picked on day 14 post transduction before entire wells were stained with Alkaline
Phosphatase (Vector labs) for quantification. Picked iPS colonies were expanded for 2
days before harvesting mRNA for qPCR quantification and western blot analysis using
anti-OCT4 (1:1000, Abcam Ltd, Cambridge, MA), anti-SOX2 (1:1000, Abcam), and anti-
NANOG (1:1000, Santa Cruz).
3.4 Results
3.4.1 Identification of Long Range Interchromosome Targets Llgl2 and Grb7
We initially performed CRISPR-Dam by designing sgRNAs to bind at the Oct4
DE within the CR4 region. Through our analysis pipeline, which aligned raw sequencing reads
via Burrows-Wheeler alignment and enriched for signals by subtracting control, non-
targeted labeling signals, we were able to generate a list of significantly
enriched labeled peaks in the assay. Previously, DNA FISH was performed on regions
located on chromosome 11 that were observed to colocalize more frequently than other
tested loci (Wei et al., 2013). Scanning these regions for CRISPR-Dam peaks, we
observed distinct signals located at the promoter and gene body regions of two genes,
Figure 3.1 CRISPR-Dam Interaction Profile at Llgl2 Locus From Oct4 DE
A. Profile of CRISPR-Dam sequencing read coverage at Llgl2 loci. Tracks for samples
containing Cas9-Dam+/sgRNA+ induced with doxycycline for 6 hours, Cas9-Dam+/sgRNA-
induced with doxycycline for 6 hours, and Cas9-Dam-/sgRNA+ samples. Red boxes indicate
HOMER identified peaks at approximately 5kb and 18kb downstream of Llgl2 TSS.
B. qPCR of individual GATC sites around identified sequencing peaks proximal to the transcription
start site (TSS) of Llgl2. Arrows correspond to individual GATC sites. Values calculated from
triplicate samples normalized with Cas9-Dam+/sgRNA- and Cas9-Dam-/sgRNA+ samples.
Statistical analysis by Student's t-test (**p < 0.001).
[Figure 3.1. Panel A track labels: H3K4me3, No Dam, No sgRNA, 6 hour, H3K4me1, DNaseI HS, H3K27ac. Panel B y-axis: Log2 Fold (targeted/untargeted); x-axis: GATC sites L18-L27 at Llgl2.]
Site   Distance to TSS (bp)
L18    -1492
L19    -855
L20    -433
L21    259
L22    955
L23    2810
L24    3925
L25    6593
L26    9077
L27    10824
lethal giant larvae 2, Llgl2 (Figure 3.1), and growth factor receptor-bound protein 7, Grb7 (Figure
3.2).
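The enrichment step described above, in which control (non-targeted) labeling signal is subtracted from the targeted signal, can be illustrated with binned read counts. The sketch below is a simplified stand-in, not the pipeline used in this work: it assumes both samples are first scaled to reads per million and that negative differences are floored at zero. All function names and counts are hypothetical.

```python
def reads_per_million(counts: list[int], total_reads: int) -> list[float]:
    """Scale raw per-bin read counts to reads per million mapped reads."""
    return [c * 1_000_000 / total_reads for c in counts]


def subtract_background(targeted: list[int], control: list[int],
                        targeted_total: int, control_total: int) -> list[float]:
    """Per-bin targeted signal minus control signal, floored at zero.

    ASSUMPTION: depth normalization by reads per million before subtraction.
    """
    if len(targeted) != len(control):
        raise ValueError("targeted and control must cover the same bins")
    t = reads_per_million(targeted, targeted_total)
    c = reads_per_million(control, control_total)
    return [max(ti - ci, 0.0) for ti, ci in zip(t, c)]


# Hypothetical counts for three bins from Cas9-Dam+/sgRNA+ and Cas9-Dam+/sgRNA- samples
print(subtract_background([10, 50, 5], [10, 10, 10], 1_000_000, 1_000_000))
```

Bins that retain signal after subtraction are candidates for peak calling; a dedicated peak caller would additionally apply a statistical threshold.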
To confirm the sequencing results, we repeated the CRISPR-Dam assay
using the Oct4 DE targeting sgRNAs. The same CRISPR-Dam E14 cell lines were
Figure 3.2 CRISPR-Dam Interaction Profile at Grb7 Locus From Oct4 DE
A. Profile of CRISPR-Dam sequencing read coverage at Grb7 loci. Tracks for samples
containing Cas9-Dam+/sgRNA+ induced with doxycycline for 6 hours, Cas9-Dam+/sgRNA-
induced with doxycycline for 6 hours, and Cas9-Dam-/sgRNA+ samples. Red boxes indicate
HOMER identified peaks at approximately 3kb downstream of Grb7 TSS.
B. qPCR of individual GATC sites around identified sequencing peaks proximal to the transcription
start site (TSS) of Grb7. Arrows correspond to individual GATC sites. Values calculated from
triplicate samples normalized with Cas9-Dam+/sgRNA- and Cas9-Dam-/sgRNA+ samples.
Statistical analysis by Student's t-test (**p < 0.001).
[Figure 3.2. Panel A track labels: H3K4me3, No Dam, No sgRNA, 6 hour, DNaseI HS, H3K27ac, H3K4me1. Panel B y-axis: Log2 Fold (targeted/untargeted); x-axis: GATC sites G16-G28 at Grb7.]
Site   Distance to TSS (bp)
G16    -1708
G17    -1469
G18    -650
G19    -422
G20    -116
G21    355
G22    958
G23    1117
G24    2155
G25    3038
G26    3236
G27    3865
G28    3948
Figure 3.3 3C–PCR gels Tiling Interactions Between Oct4 DE and Llgl2 Loci
A. Triplicate 3C libraries were constructed by digestion with 6bp cutter HindIII. Frequencies
are calculated from 3C-PCR band intensities averaged over triplicate samples and
normalized to an internal Oct4 promoter control and BAC control library bands.
Figure 3.4 3C–PCR gels Tiling Interactions Between Oct4 DE and Grb7 Loci
A. Triplicate 3C libraries were constructed by digestion with 6bp cutter HindIII. Frequencies
are calculated from 3C-PCR band intensities averaged over triplicate samples and
normalized to an internal Oct4 promoter control and BAC control library bands.
induced to express Cas9-Dam for 6 hours before collection and digestion with DpnII alone,
leaving only methylated sites available for detection. Primer pairs were designed to tile
GATC sites located approximately 6kb around the TSS of Llgl2 and Grb7 (Table 3.2).
qPCR probes tiling each gene locus showed signal profiles matching
the sequencing identified peaks, with enrichment at the peak nearest the Llgl2 TSS
(Figure 3.1) and in the first intron of Grb7 (Figure 3.2). Agreement of
the results from sequencing and qPCR demonstrated that our CRISPR-Dam system
could reproducibly identify interacting loci of Oct4 DE in ESCs.
To further validate our CRISPR-Dam assay results, we used well established
chromosome conformation capture (3C-PCR) and DNA-FISH approaches to support our
Figure 3.5 3C-PCR Plots Summarizing Contact Frequencies at Llgl2 and Grb7 Regions
A. Summary of interchromosomal interaction frequency between the Oct4 DE locus on
chromosome 17 and regions encompassing Llgl2 and Grb7 on chromosome 11. HindIII
sites indicated by green arrows. Values indicate mean±SEM for biological replicates.
observations. We employed semi-quantitative 3C-PCR assays to probe an approximately
10kb region around the TSS of Llgl2 and Grb7 (Table 3.3). 3C-PCR primers were solely
designed on the forward strand to HindIII sites at the gene loci. The 6bp cutting enzyme
HindIII was initially chosen as there was only a single HindIII site within the Oct4 DE we
could designate as our bait. Triplicate 3C libraries were each prepared with 1 x 10^7 E14
cells and 3C-PCR was performed to detect ligated bait-target fragments. Resultant band
intensities were normalized to an internal reference site and corresponding band
intensities from control bacterial artificial chromosome (BAC) libraries representing Llgl2
and Grb7 loci also prepared with HindIII. We found 3C interaction signals at the gene
cluster containing promoters of Caskin2, Tsen54, and Llgl2 (Figure 3.3 and Figure 3.5)
and a clear interaction signal at the Grb7 locus (Figure 3.4 and Figure 3.5). These results
supported the observations made from the CRISPR-Dam and 4C results.
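The double normalization described above (to an internal reference site and to the corresponding BAC control library bands) can be written as a ratio of ratios. The sketch below shows one plausible form of that calculation together with the mean±SEM summary across replicate libraries. The exact formula used for Figure 3.5 is not given in the text, so the function names, the formula, and the band intensities are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def relative_interaction(sample_target: float, sample_reference: float,
                         bac_target: float, bac_reference: float) -> float:
    """3C band intensity normalized to an internal reference site and to the
    same primer pairs amplified from the BAC control library (ASSUMED form)."""
    return (sample_target / sample_reference) / (bac_target / bac_reference)


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean across replicate libraries."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical band intensities from triplicate 3C libraries:
# (sample target, sample reference, BAC target, BAC reference)
replicates = [
    relative_interaction(50.0, 100.0, 25.0, 100.0),
    relative_interaction(45.0, 100.0, 25.0, 100.0),
    relative_interaction(55.0, 100.0, 25.0, 100.0),
]
avg, sem = mean_sem(replicates)
print(f"relative interaction frequency: {avg:.2f} +/- {sem:.2f}")
```

Dividing by the BAC ratio corrects for differences in primer efficiency, since all ligation products are expected to be present at comparable levels in the BAC library.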
To visually confirm the Oct4-Llgl2/Grb7 interactions in ESCs, we performed DNA-
FISH using hapten probes generated from the selected BACs encompassing our loci of
interest. Probes were generated using BAC clones used in constructing the 3C control
libraries that represented the Oct4 locus and the Llgl2 and Grb7 loci (Table 3.1). Probes
were hybridized to ESCs and at least 200 cells were counted to quantify colocalization
between Oct4 DE and Llgl2 and Grb7 regions. Both Llgl2 and Grb7 probes showed
significantly higher colocalization frequency with the Oct4 locus compared to other
regions containing known non-interacting loci, such as one containing fibroblast marker
gene Thy1 (Figure 3.6) (Stadtfeld et al., 2008). These observations demonstrated to us
that Llgl2 and Grb7 are candidate interacting gene targets of the Oct4 DE.
Figure 3.6 DNA FISH with BACs Representing Oct4 DE, Llgl2, and Grb7 Loci
A. DNA fluorescence in situ hybridization comparing colocalization of the Oct4 DE region
and interacting Llgl2 or Grb7 regions to non-interacting Thy1 region. 10µm scale.
B. Colocalization frequency between Oct4 DE region and Llgl2, Grb7, and Thy1 regions
per field of view. ANOVA tests were performed to calculate significance (*p < 0.01,
***p < 0.0001).
Table 3.1 BACs Used to Generate DNA FISH Probes
3.4.2 Reciprocal Interaction of Llgl2 and Grb7 with the Oct4 DE
To confirm that Llgl2 and Grb7 promoter regions could interact with the Oct4 DE,
we designed 2 different sets of sgRNAs to target the CRISPR-Dam identified peaks at
the promoter regions of the two candidate genes (Table 3.3). E14 cells transfected with
the inducible FLAG-Cas9-Dam construct and corresponding sgRNA were induced for 6
hours before harvesting and DpnII digestion. Fragments were cleaned up and assayed
Figure 3.7 Reciprocal CRISPR-Dam Interaction Plots from Llgl2 locus to the Oct4 DE locus
A. Llgl2 promoter targeting sgRNA 1 and 2 transfected into E14 cells were assayed for Dam
labeling at 30 GATC sites adjacent to the Pou5f1 locus after 6 hour doxycycline induction.
Signals are normalized to Cas9-Dam+/sgRNA- and Cas9-Dam-/sgRNA+ samples.
[Figure 3.7 panels: LP1, LP2]
by qPCR at 30 sites at the Oct4 locus. Signals were normalized to the internal Gapdh
promoter GATC site as a control and to Cas9-Dam+/sgRNA- and Cas9-Dam-/sgRNA+
samples. Interestingly, interaction signals were detected at the TSS and downstream
gene body region of Pou5f1 for all targeted peak sites, but only our Grb7 targeted samples
showed interaction at the DE locus (Figures 3.7 and 3.8). This suggested that
interactions with the Oct4 DE were relatively infrequent and most of the interactions
Figure 3.8 Reciprocal CRISPR-Dam Interaction Plots from Grb7 locus to the Oct4 DE locus
A. Grb7 promoter targeting sgRNA 1 and 2 transfected into E14 cells were assayed for Dam
labeling at 30 GATC sites adjacent to the Pou5f1 locus after 6 hour doxycycline induction.
Signals are normalized to Cas9-Dam+/sgRNA- and Cas9-Dam-/sgRNA+ samples.
[Figure 3.8 panels: GP1, GP2]
occurred with the promoter of Oct4. This also suggested a “poised” state, where both
genes are colocalized with the Oct4 promoter to maintain proximity to the Oct4 DE, which
leads to co-transcription of genes. This phenomenon was previously observed when
DNA-RNA FISH was performed to correlate Oct4 DE looping activity with Oct4 gene
transcription (Wei et al., 2013).
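The normalization of qPCR signals to the Gapdh promoter GATC site and to control samples, as described above, resembles a standard delta-delta-Ct calculation. The sketch below shows how a log2 fold value (targeted/untargeted) could be derived under that assumption. It assumes that amplification doubles each cycle and that a lower Ct reflects more intact, methylated template after DpnII digestion. The function name and Ct values are hypothetical.

```python
def log2_enrichment(ct_site_targeted: float, ct_ref_targeted: float,
                    ct_site_control: float, ct_ref_control: float) -> float:
    """Log2 fold enrichment (targeted/untargeted) at one GATC site.

    Each sample is first normalized to its own reference site (delta Ct),
    then the control delta Ct is compared with the targeted delta Ct.
    ASSUMPTION: 100% amplification efficiency (doubling per cycle).
    """
    delta_targeted = ct_site_targeted - ct_ref_targeted
    delta_control = ct_site_control - ct_ref_control
    return delta_control - delta_targeted


# Hypothetical Ct values: targeted sample amplifies one cycle earlier
print(log2_enrichment(24.0, 20.0, 25.0, 20.0))
```

A value of 1.0 corresponds to a twofold higher signal in the targeted sample; values near zero indicate no enrichment over the control.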
3.4.3 Oct4 Distal Enhancer CR4 Region Directly Regulates Llgl2 and Grb7
While OCT4 protein has been extensively studied for its role in pluripotency, the
role of the Oct4 DE in naïve state pluripotent cells has not been fully explored. Dissection
of the DE has revealed several critical conserved regions (CR) shared across species
containing multiple transcription factor binding sites, with the CR4 region shown to be
Figure 3.9 Schematic of LoxP Sites Insertion Flanking CR4 Region Within the Oct4 DE
A. Pairs of guide RNAs were designed with corresponding oligonucleotide repair templates
containing loxP sites and modified PAM sequences. Doxycycline inducible transgenic Oct4
mRNA and tamoxifen inducible CreERT2 were separately transduced as lentiviruses.
necessary for auto-regulation of Oct4 expression by an OCT4-SOX2 complex (Chew et
al., 2006; Wu and Schöler, 2014). Insufficient Oct4 expression directly leads to loss of
pluripotency gene expression network and initiates differentiation gene expression (Niwa,
2007). With our 4C and 3C bait regions designated at the HindIII site nearly flanking the
CR4 region, we hypothesized that looping interactions of the Oct4 DE may be regulated
by the CR4 region and be directly involved in activation of target gene expression.
To determine if the Oct4 distal enhancer plays a direct role in regulating Grb7 and
Llgl2 gene expression in addition to Oct4 gene expression, we designed doxycycline
inducible transgenic Oct4 E14 cell lines in which an independently inducible Cre-loxP
system can conditionally knock out the DE while maintaining total OCT4 levels (Figure
3.9). Mono-allelic and bi-allelic Oct4 DE deletion cell lines, +/- and -/- respectively, were
isolated and clonally expanded for comparison with wild type +/+ cells. Tamoxifen
treatment led to efficient deletion of the Oct4 distal enhancer (Figure 3.10.A). qPCR was
Figure 3.10 Characterizing Oct4 DE CR4 Floxed ESCs
A. PCR genotyping of CR4 locus in wildtype (+/+), heterozygous (+/-), and
homozygous (-/-) loxP inserted E14 cell lines after treatment with 0.1M tamoxifen
for 48 hours.
B. qPCR comparison of wild type (+/+) and bi-allelic (-/-) loxP inserted ESC
clones for Oct4, Llgl2, and Grb7 expression after tamoxifen and/or doxycycline
treatment. Data plotted are mean±SEM for three replicates. Statistical analysis by
Student's t-test (**p < 0.001).
performed to measure total Oct4 expression as well as Llgl2 and Grb7 expression after
treatment with tamoxifen or tamoxifen and doxycycline (Figure 3.10.B). The mono-allelic
+/- Oct4 DE ESCs did not show loss of Oct4 expression after tamoxifen treatment and
saw increases in Llgl2 and Grb7 expression even with addition of doxycycline. In the
Figure 3.11 Pluripotency State of Oct4 DE CR4 Floxed ESCs Is Disrupted
A. Alkaline phosphatase staining of tamoxifen and/or doxycycline treated wild type E14 and
homozygous CR4 floxed cell lines. 40X scale.
B. Quantification of cell colonies for wild type E14 and homozygous CR4 floxed cell lines.
ANOVA tests were performed to calculate significance (***p < 0.0001).
C. Higher magnification of AP stained E14 and CR4 floxed cell lines after tamoxifen or
doxycycline induction conditions. 100X scale.
D. Colony sizes of AP stained colonies by pixel area. ANOVA tests were performed to
calculate significance (***p < 0.0001).
tamoxifen only treatment condition, Oct4 gene expression was almost completely
abolished in the Oct4 DE -/- lines, confirming that the Oct4 DE is required for Oct4
expression. Additionally, expression of both Llgl2 and Grb7 was reduced in the -/- cell line,
suggesting that they may be regulated by the Oct4 DE. To rule out the possibility that the
reduction of Llgl2 and Grb7 expression is an indirect effect of reduced Oct4 gene
expression, we measured Llgl2 and Grb7 levels where Oct4 gene expression was
restored with addition of doxycycline. Neither Llgl2 nor Grb7 expression was rescued
after doxycycline treatment in the -/- cell line, in which Oct4 expression was restored to a
level similar to that of wild type cells (Figure 3.10.B). We then stained the treated cell lines
for alkaline phosphatase (AP) as an indication
of pluripotency status. All cell lines had normal levels of AP staining before and after
doxycycline only treatment, but Oct4 DE deleted cells treated with tamoxifen showed
reduced numbers of AP positive cells (Figures 3.11.A-C) and reduced colony size (Figure
3.11.D) regardless of doxycycline treatment, indicating impaired pluripotency and self-renewal.
These experiments demonstrated that the Oct4 DE directly regulates Llgl2 and Grb7
gene expression despite these genes being located on a different chromosome.
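The normalization behind qPCR comparisons of this kind can be sketched in a few lines. The example below assumes the widely used 2^-ΔΔCt method with a single reference gene; the quantification method is not stated in this section, and the function name and Ct values are hypothetical illustrations, not data from this study.

```python
from statistics import mean

def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene in a treated sample versus a control
    sample, using the 2^-ddCt method (assumed here, not stated in the text)."""
    # Normalize the target gene to the reference gene within each sample.
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    # Compare the treated sample with the control sample.
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)

# Hypothetical triplicate Ct values (illustration only).
fold = relative_expression(
    ct_target=[26.0, 26.0, 26.0],       # target gene, treated cells
    ct_ref=[18.0, 18.0, 18.0],          # reference gene, treated cells
    ct_target_ctrl=[24.0, 24.0, 24.0],  # target gene, control cells
    ct_ref_ctrl=[18.0, 18.0, 18.0],     # reference gene, control cells
)
print(fold)  # 0.25, i.e. expression reduced to one quarter of control
```

In a rescue design like the one described above, the same calculation would be run for Oct4 and for each candidate target in the doxycycline treated cells: a fold change near 1 for Oct4 together with a persistently low fold change for the candidate is the pattern interpreted as direct regulation by the enhancer.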
3.4.4 Llgl2 and Grb7 Knockdown Correlates to a Disrupted Pluripotency State
Because Llgl2 and Grb7 were newly identified as direct targets of the Oct4 DE, we examined
their functional role in pluripotent cells. Comparing the expression of both genes in ESCs
and non-pluripotent mouse embryonic fibroblasts, we observed much higher expression
of these genes in pluripotent cells, suggesting that these genes may play a role in
pluripotency (Figure 3.12.A). We then employed an shRNA strategy to deplete the mRNA
of each gene in E14 ESCs. Cells were incubated for 2 days after lipofection of shRNA
plasmids before staining for AP. Cells with Llgl2 or Grb7 knockdown by differently
targeting shRNAs formed significantly fewer AP positive ESC colonies
than controls (Figures 3.12.B-C). Knockdown colonies also began to show a
differentiation phenotype with less distinct colony borders and flattened morphology.
Expression of Llgl2 and Grb7 decreased by 70% and 75%, respectively, upon shRNA
knockdown (Figure 3.13). Key pluripotency genes such as Oct4, Nanog, Sox2, Klf4,
Rex1, and Esrrb also had significantly decreased expression after knockdown of either
Llgl2 or Grb7, with only Utf1 and Sall4 not changing significantly for both (Figure 3.13).
The trophectoderm associated genes Cdx2, Eomes, and Hand1 were significantly elevated
in the knockdown cells of either gene compared to control (Figure 3.13). Expression of
early differentiation marker genes of the three germ layers, including mesoderm (Snail and Gsc),
endoderm (Sox17, Gata4, and Gata6), and ectoderm (Sox1), was also
significantly increased (Figure 3.13). These experiments demonstrated that Llgl2 and
Grb7 are required for maintenance of stem cell self-renewal and pluripotency.
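As a small worked illustration of how a figure such as "decreased by 70%" relates to replicate measurements reported as mean±SEM, the sketch below summarizes triplicate relative expression values and converts the mean to a percent knockdown. The helper names and the numbers are hypothetical and are not taken from the study data; control expression is assumed to be normalized to 1.0.

```python
from math import sqrt
from statistics import mean, stdev

def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))

def percent_knockdown(fold_changes):
    """Percent reduction relative to control, with control set to 1.0."""
    m, _ = mean_sem(fold_changes)
    return (1.0 - m) * 100.0

# Hypothetical relative expression in knockdown cells (control = 1.0).
kd_replicates = [0.28, 0.30, 0.32]
m, sem = mean_sem(kd_replicates)
print(f"{m:.2f} +/- {sem:.3f}")                  # mean and SEM of the replicates
print(f"{percent_knockdown(kd_replicates):.0f}% knockdown")
```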
Figure 3.12 Loss of Llgl2 or Grb7 Expression Perturbs the Pluripotent State in ESCs
A. qPCR on ESCs and MEFs shows that Llgl2 and Grb7 are endogenously expressed at higher
levels in ESCs than in MEFs.
B. Llgl2 and Grb7 knockdown by shRNAs decreases numbers of AP+ staining ESC colonies
as compared to knockdown by control shRNA. Representative image shown for triplicate
wells.
C. Quantification of AP+ staining ESC colonies per area (mm²). Data plotted are mean±SEM
for replicate wells quantified in triplicate, n = 9 (***p < 0.0001).
3.4.5 Overexpression of Llgl2 and Grb7 Enhances Reprogramming Efficiency
To further elucidate the functional role of Llgl2 and Grb7 in pluripotency, we
overexpressed these genes in an induced pluripotent cell (iPS) reprogramming context
to determine whether either gene could enhance the establishment of the pluripotent
state. We employed a retroviral reprogramming strategy with pMX vectors expressing
human Oct4, Sox2, Klf4 and c-Myc. Primary MEFs were additionally transduced with
either pMX control, mLlgl2-FLAG, or mGrb7-FLAG mRNA expressing vectors with
mCherry expression to control for transduction efficiency (Figure 3.14.A-B). Wells
transduced with either Llgl2 or Grb7 showed larger colony sizes compared to controls
Figure 3.13 Knock Down of Llgl2 and Grb7 Induces Loss of Pluripotency Gene Expression and
Increases Differentiation Gene Expression
A. qPCR measuring pluripotency gene (Oct4, Nanog, Sox2, Klf4, Rex1, Esrrb, Utf1, Sall4)
expression shows an overall decrease after knockdown of Llgl2 and Grb7 compared to control
knockdown. Early differentiation gene expression for each germ layer (Cdx2, Hand1, Eomes
for trophectoderm) (Fgf5, Sox1, Otx2 for ectoderm) (Snail, Gsc, T for mesoderm) (Sox17,
Gata4, Gata6 for endoderm) is upregulated after knockdown. Data plotted are mean±SEM
for triplicate samples (*p < 0.01, **p < 0.001, ***p < 0.0001).
(Figure 3.14.C). AP staining demonstrated that significantly more AP positive colonies
were observed in wells transduced with Llgl2 and Grb7 compared to control cells (Figure
3.15.A-B). The sizes of these colonies were also larger (Figure 3.15.B). Overall, overexpression of Llgl2
or Grb7 increased the number of somatic cells reprogrammed into induced pluripotent stem cells.
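The colony size comparisons across control, Llgl2, and Grb7 wells are reported as ANOVA tests. A minimal sketch of the underlying calculation is given below, assuming a one-way design (the exact ANOVA variant and any post hoc test are not stated here). The function name and the colony areas are hypothetical; a p-value would additionally require the F distribution, for example from a statistics package.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical colony areas in pixels (illustration only).
control = [100, 200, 300]
llgl2_oe = [200, 300, 400]
grb7_oe = [300, 400, 500]
f_stat, df_b, df_w = one_way_anova_f(control, llgl2_oe, grb7_oe)
print(f_stat, df_b, df_w)  # 3.0 2 6
```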
Wells were assayed for pluripotency gene expression at 7 and 14 days post
transduction. As compared to control transduced colonies, wells overexpressing Llgl2 and
Figure 3.14 Somatic Cell Reprogramming Is Enhanced With Overexpression of Llgl2 or Grb7
A. Schematic of the reprogramming process, where a non-pluripotent cell type such as a
mouse embryonic fibroblast (MEF) is transduced with reprogramming factors OSKM. Over a
period of 2 weeks, a subpopulation of cells will become induced pluripotent stem cells
(iPSCs).
B. Time course for reprogramming of early passage MEFs with retroviral supernatants
containing pMX-hOct4, hSox2, hKlf4, hc-Myc, and either pMX-T2A-mCherry, mLlgl2-T2A-
mCherry, or mGrb7-T2A-mCherry. Representative colonies imaged at 14 days post
transduction. Images taken at 40X scale.
C. Quantification of reprogramming colony sizes by day 14 post transduction. Colony sizes
were compared and analyzed statistically with ANOVA. Number of colonies = N (**p < 0.001,
***p < 0.0001).
Grb7 expressed higher levels of pluripotency marker genes Oct4, Sox2, Nanog, Klf4, and
Rex1 as soon as 7 days post transduction compared with mock and control wells (Figure
3.15.C). A similar pattern of expression was observed at 14 days, with expression for
Sox2 and Nanog significantly increased in Llgl2 and Grb7 overexpressing wells.
Figure 3.15 Llgl2 and Grb7 Overexpression Aids in Reestablishing Endogenous Pluripotency
Gene Expression
A. Alkaline phosphatase staining of colonies on day 14 post transduction. Representative wells
shown.
B. Quantification of AP+ colony numbers and size on day 14 post transduction. Values show
mean±SEM for triplicate wells. (*p < 0.01, ***p < 0.0001).
C. Quantitative PCR analysis of pluripotency gene expression for reprogramming wells at day
7 and day 14 post transduction. Error bars show mean±SEM for triplicates. (*p<0.01,
**p<0.001, ***p < 0.0001)
D. Immunoblotting for pluripotency factors OCT4, SOX2, and NANOG in reprogrammed
colonies from MEFs. LLGL2 and GRB7 overexpression confirmed with anti-FLAG antibody.
Normalized to α-TUBULIN levels.
Comparing protein expression of OCT4, SOX2, and NANOG, we observed slightly higher
expression levels of these pluripotency markers at day 14 (Figure 3.15.D). These
experiments demonstrated that Llgl2 and Grb7 are required for pluripotency and self-renewal
and that overexpression of either gene can enhance the efficiency of reprogramming to induced pluripotency.
3.5 Discussion
We have identified two new genes that participate in the pluripotency gene network
to regulate ESC pluripotency. By characterizing the long range interactions occurring from
the Oct4 DE through CRISPR-Dam, 3C-PCR, and DNA FISH experiments, we show that
distally located genes Llgl2 and Grb7 are interaction targets of the Oct4 DE. Validating
our result with reciprocal CRISPR-Dam experiments stemming from Llgl2 and Grb7
promoter regions, we demonstrate that these genes interact with the Oct4 locus and may
be coregulated along with Oct4 by the DE. By deletion of the Oct4 DE, we show that Llgl2
and Grb7 are direct targets of the DE and are not secondary targets of the OCT4 protein.
Both Llgl2 and Grb7 are important in maintaining the pluripotent state, where loss of
expression of either gene leads to downregulation of pluripotency related genes and up
regulation of differentiation genes. Llgl2 and Grb7 may have important functional roles in
the pluripotency network as both can enhance reestablishment of the pluripotency gene
network in reprogramming. Our study is novel in demonstrating a new approach to
identifying functional interchromosome targets of important regulatory genomic elements
such as the Oct4 DE.
Within the context of the pluripotency gene regulatory network, the focus on the
core pluripotency factors, OCT4, SOX2, and NANOG, emphasizes the role of
protein mediated pluripotency. OCT4 is recognized as the “master regulator” of the pluripotent
state and responsible for activation of other pluripotency associated genes in somatic cell
reprogramming and maintenance of the pluripotent state (Li and Belmonte, 2017; Li et al.,
2012). However, the experiments performed demonstrate that the role of the Oct4 DE is
equally, if not more, important for regulation of stem cell pluripotency. Predicted to be
part of the super enhancer region upstream of Pou5f1, the DE contains multiple binding
sites for pluripotency factors and architectural chromatin proteins that are known to be
required for stem cell pluripotency and self renewal (Whyte et al., 2013; Wu and Schöler,
2014). In particular, the deletion of the CR4 region disrupts expression of Oct4, but rescue
of protein expression does not rescue expression of DE targeted genes Llgl2 and Grb7,
indicating that OCT4 expression alone is not sufficient to maintain pluripotency. As this is
the first demonstration of endogenous biallelic Oct4 DE deletion in pluripotent cells, our
results show that at least a single allele of the DE is required for sustaining pluripotency.
We expect that Llgl2 and Grb7 are not the only pluripotency related genes directly
targeted by the Oct4 DE, and genome wide profiling of Oct4 DE knockout ESCs in
conjunction with chromosome conformation assays such as CRISPR-Dam may reveal
more participants in this “proximity-mediated pluripotency network”.
As observed in our study, chromosome conformation data requires careful
interpretation and independent validation to be reliable. Some concern has emerged from
trying to correlate spatial proximity information found in DNA-FISH with interaction
frequency from 3C methods, where frequently interacting loci identified from 3C do not
correlate with FISH results (Giorgetti and Heard, 2016; Maass et al., 2018b; Williamson
et al., 2017). With many studies focused on correlating actively transcribing regions of
chromatin containing active RNA polymerase II binding sites with chromatin interaction,
there is a tendency to consider interacting genes as direct targets of transcriptional
regulation. However, looping between interacting enhancers and promoters can occur
independently of general transcription factors and RNAPII recruitment. In the well-studied
β-globin gene cluster, enhancer looping is prerequisite to RNAPII transcription, but
interacting genes can just be poised for activation and are only expressed in certain
developmental contexts (Palstra et al., 2003; Tolhuis et al., 2002). Inhibition of RNAPII
elongation can still maintain observed contacts between enhancer elements and target
promoters (Palstra et al., 2008). Conversely, forced looping of the enhancer element
through an artificial zinc finger fused transcription factor, LDB1, was sufficient to recruit
active RNAPII for partial restoration of gene transcription, suggesting that enhancers
involved in spatial organization can direct transcription activation (Deng et al., 2012, 2014).
An even more recent study utilizing orthogonal dCas9 heterodimer complexes to more
specifically induce looping between enhancer-promoter loci showed improved
transcription of targeted genes (Hao et al., 2017). Thus, investigating these important
enhancer looping dynamics can help reveal regulatory networks underlying cell specific
transcription programs and developmental processes, but requires careful study to link
chromatin organization with transcriptional outcomes.
How Llgl2 and Grb7 function within pluripotent stem cells to preserve pluripotency
still remains to be investigated, but known mechanisms in other biological contexts may
provide some hints. Llgl2 is observed to associate with Scribble and Discs-large to form
the Scribble complex, which plays a role in asymmetric cell division through establishment
of apico-basal polarity (Yamanaka et al., 2006). In the developing embryo, this axis
determination marks the first division of cells, in which the outer layer forms a polarized
epithelium that later develops into trophectoderm and the inner layer remains apolar
and contributes to the inner cell mass at later stages. Overexpression of Llgl2 in the
embryo leads to loss of polarity in epithelial cells and expands the basolateral domain
(Chalmers et al., 2005). On the other hand, loss of Llgl2 has been observed to lead to
disorganized cell polarity and excess trophectoderm development in early embryos
(Stephenson et al., 2010). This uncontrolled growth of Llgl2 depleted cells also suggests
that Llgl2 plays a role in the Hippo pathway, as a dysregulated Scribble complex has been shown
to downregulate Hippo target genes (Cao et al., 2015). Grb7 is known to function as an
adaptor protein involved in signal transduction pathways mediating cell migration. The
protein contains an SH2 domain that is able to bind to several membrane receptors and
can be phosphorylated by focal adhesion kinase (FAK) (Chu et al., 2009; Han and Guan,
1999). Grb7 has also been shown to be involved in EGF stimulated mRNA nuclear export
in neurons and regulates translation of specific mRNA species through direct binding
(Tsai et al., 2010). Functional characterization of Llgl2 and Grb7 in mESCs can clarify
their roles within the respective pathways involved in maintaining the pluripotent cell state.
Llgl2 loci
GATC site      F/R      Primer sequence         Size (bp)
L18            Forward  CTTTTGCCTTACAGCCTTGC    146
               Reverse  GGAACATCCCCACTCTGGTA
L19            Forward  CCCTCCCTTCATTCCCGATT    179
               Reverse  CTACCACCACCCAGCTGTT
L20            Forward  CAGCGGCCCCTCTGAAGA      236
               Reverse  GCACAAGAGCCAAGCGAG
L21            Forward  CTCGGCTCGGGAGGATAC      161
               Reverse  GGGCGGAACAGGAAAAGAC
L22            Forward  GTAAGGAGGGTGGGGCTTAG    167
               Reverse  CTTCTTTTAGGCAGGCCACA
L23            Forward  CCTTGAAGCATGGAAAAACC    155
               Reverse  GGCATGGCTTCAGGATAAAA
L24            Forward  CTCTTGAGGGACTTGGCTTG    165
               Reverse  CAAAGGCCACTCAGATGTCA
L25            Forward  TTCATGCCTGTGAGTGGGTA    220
               Reverse  CAGAGGCTGAGAGAGGAGGA
L26            Forward  GGTGTTTGGTTTGGAAGTGG    130
               Reverse  GGGCCAAGACTTTGCTTATG
L27            Forward  CCTGGGTGATACTGGCTCTG    164
               Reverse  CATGCCCACAGTTGTCCA

Grb7 loci
GATC site      F/R      Primer sequence         Size (bp)
G16            Forward  CATGGTGGCACATGACTGTA    156
               Reverse  TCAAGATAGGGTTTTTCCCTGT
G17            Forward  ATTCAGCTGGAGTCTGGCTT    136
               Reverse  GCCATCCATCCTGTTTGTCC
G18            Forward  TGTCTGTCGTCACTCTGTTTT   187
               Reverse  GCCTCCCTTAGCTTTTCCG
G19            Forward  CCCAGTGGCTTGTTTGTTTT    190
               Reverse  CACAGTTCTGGTGACCGTTG
G20            Forward  CCTCTTCTGGCTTCCCTTTT    174
               Reverse  TACATGGGAAGGGGAGTTTG
G21            Forward  TATTCTCCCGCCTCAACATT    153
               Reverse  GAGCACCTCCAACAGAAGGA
G22            Forward  TCCTTCCCCTTCCTCATCTT    246
               Reverse  GGACCACAACGTGGAGAGAT
G23            Forward  GGGAGTAAGGGAGAGCGACT    151
               Reverse  CACACTACACCCACCCCTCT
G24            Forward  CCAACGGAACCTTTGCTTAG    181
               Reverse  AAGCTCAGGTGCAGGCTAAA
G25            Forward  AGAGGCAGGTTGTTTGTGCT    188
               Reverse  GACTGTGAGGCTACCCAGGA
G26            Forward  CTGGGATTGGCATTTTGTCT    189
               Reverse  CTGAAAGCACCTGGCATACA
G27            Forward  AAAGGACGCACATCCTGTTC    120
               Reverse  ATAGAGAAAGGGCGGAGCTG
G28            Forward  CAGCTCCGCCCTTTCTCTAT    211
               Reverse  CCTACCAATGCATTCCCATT
GAPDH control  Forward  GGGAGAACAGGGGAAATGGA    176
               Reverse  CCACCCTGGCATTTTCTTCC

Table 3.2 CRISPR-Dam Tiling Primers at Llgl2 and Grb7
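The tiling primers above are indexed by GATC sites, the tetranucleotide motif that Dam methyltransferase methylates. As a hedged illustration of how candidate sites might be enumerated in a region before primer design, the sketch below scans a sequence for GATC motifs. The function name and the example sequence are hypothetical; this is not the site selection procedure used in this work.

```python
def find_gatc_sites(sequence):
    """Return 0-based start positions of every GATC motif in a DNA sequence."""
    seq = sequence.upper()
    positions = []
    start = seq.find("GATC")
    while start != -1:
        positions.append(start)
        # Continue searching one base past the previous hit.
        start = seq.find("GATC", start + 1)
    return positions

# Hypothetical sequence (illustration only). GATC is palindromic, so the
# same positions are found on either strand.
example = "AAGATCTTGATCGATC"
sites = find_gatc_sites(example)
print(sites)  # [2, 8, 12]
```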
Table 3.3 CRISPR-Dam Targeting sgRNA Sequences for Reciprocal Pou5f1 Locus Interaction
Table 3.4 3C-PCR Primers
sgRNA
CR4-sgRNA_F CACCGCCCACACCCAGTTCCTCCCA
CR4-sgRNA_R AAACTGGGAGGAACTGGGTGTGGGC
DE3'-sgRNA_F CACCGCAGTCTGAGGATCCCATTAC
DE3'-sgRNA_R AAACGTAATGGGATCCTCAGACTGC
Repair Template Oligos
CR4-donor
GGCTGCAGGCATACTTGAACTGTGGTGGAGAGTGCTGTCTAGGCCTT
AGAGGCTGGTCCTGGGAGGAACTGGGTGTGGGGAGGATAACTTCGT
ATAGCATACATTATACGAAGTTATTTGTAGCCCGACCCTGCCCCTCCC
CCCAGGGAGGTTGAGAGTTCTGGGCAGACGGCAGATGCATAACAAA
GGTGCATGATAGC
DE3'-donor
TCCCTTGCAGACAGGCACTCTGAGGGCTATTCTCTTGCAAAGATAACT
AAGCACCAGGGCAGTAATGATAACTTCGTATAGCATACATTATACGAA
GTTATGGATCCTCAGACTGGGCCCAGAAAACCACTCTAGGGAAGTTC
AGGGTAGGCTCTCTGCACCCCCTCCTCCTAATCCCGTCTCCTTAGTGT
CTTTCCGCC
LoxP Genotyping
CR4flankF1 TGGCCAGTGAGTCACCAAAA
CR4flankR1 GGACTCCGGTGTTCATCCTC
Cre Genotyping
CreGenoF GAACCTGATGGACATGTTCAGG
CreGenoR AGTGCGTTCGAACGCTAGAGCCTGT
Table 3.5 CRISPR Knock-In LoxP Primers
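The sgRNA oligo pairs in Table 3.5 follow a recognizable pattern: the forward oligo is CACCG followed by the 20-nt protospacer, and the reverse oligo is AAAC followed by the reverse complement of the protospacer and a trailing C. The sketch below reproduces that pattern as a consistency check. The function names are hypothetical, and the cloning vector and restriction enzyme implied by these overhangs are not stated in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def sgrna_cloning_oligos(protospacer):
    """Build forward and reverse oligos with the overhang pattern seen in
    Table 3.5 (CACCG + protospacer; AAAC + reverse complement + C)."""
    forward = "CACCG" + protospacer.upper()
    reverse = "AAAC" + reverse_complement(protospacer) + "C"
    return forward, reverse

# Protospacers read from the table by dropping the CACCG prefix.
cr4_f, cr4_r = sgrna_cloning_oligos("CCCACACCCAGTTCCTCCCA")
de3_f, de3_r = sgrna_cloning_oligos("CAGTCTGAGGATCCCATTAC")
print(cr4_f, cr4_r)
print(de3_f, de3_r)
```

Both generated pairs match the CR4 and DE3' entries in the table, which suggests the listed oligos are internally consistent.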
Llgl2 shRNA Sequences
No. GE Dharmacon No. Targeted Sequence
1 25726 TATCAGATAGAAACCTTGG
2 26808 TAGCTTGACAGCTCCAGAG
3 31673 TTCGTTATAGCTTTGCAAG
4 459622 TGAATCCACCAAACACCGG
5 459623 AGAATCTCAGTGACCTGCG
6 459624 CTGCATCTCAGTGGTCCAC
7 459626 TGCAGAGGAAAATCTTCTG
Grb7 shRNA Sequences
No. TRC No. Targeted Sequence
1 TRCN0000097204 CTCTCTCAACCAGTGAAACAT
2 TRCN0000097206 CGCCAAGTATGAACTATTCAA
3 TRCN0000097207 CTTCGCCAAGTATGAACTATT
4 TRCN0000097208 CCACCTTCACAGAAACCCATT
Scramble shRNA
No. N/A Targeted Sequence
1 GTTAACTATCCTAGCTCTAA
Table 3.6 shRNA Targeted Sequences
Gene                       F/R      Primer sequence                 Tm (℃)
mOct3/4 (total Oct4)       Forward  GGCGTTCTCTTTGGAAAGGTGTTC        68
                           Reverse  CTCGAACCACATCCTTCTCT
mOct3/4 (endogenous Oct4)  Forward  CCTGGCCTGTCTGTCACTCA            69
                           Reverse  GTGTCCCAGTCTTTATTTAAGAACAAAAT   63
mNanog                     Forward  CTCTTCAAGGCAGCCCTGAT            67
                           Reverse  CCATTGCTAGTCTTCAACCAC           63
mSox2                      Forward  GGTTACCTCTTCCTCCCACTCCAG        71
                           Reverse  TCACATGTGCGACAGGGGCAG           73
mKlf4                      Forward  CACCATGGACCCGGGCGTGGCTGCCAGAAA  84
                           Reverse  TTAGGCTGTTCTTTTCCGGGGCCACGA     76
mRex1                      Forward  GAGTGCATCATACGAGGTGAG           65
                           Reverse  TCCAGAACCTGGCGAGAAAGG           69
mEsrrb                     Forward  CGCAAGAGCTACGAGGACTG            68
                           Reverse  GGTAGCCAGAGGCAATGTCC            68
mUtf1                      Forward  AGGTTGCTCCCCAGTCGTTG            70
                           Reverse  TGGTTCAGGGTCGACAGCTG            70
mSall1                     Forward  CATCATGAATGACAGCGAGGG           66
                           Reverse  CCGTTCTCCTGGAGACTGTC            67
mCdx2                      Forward  TGCTGCAGACGCTCAACCTC            71
                           Reverse  TCCACTCGCACAGGTTTCGC            71
mHand1                     Forward  ACAGAGAGCATTAACAGCGCG           68
                           Reverse  ACGTCCATCAAGTAGGCGATG           67
mEomes                     Forward  CTGCACAAATACCAACCGAGG           66
                           Reverse  GAAGGTCTGAGTCTTGGAAGG           65
mSnail                     Forward  TAGGTCGCTCTGGCCAACAT            69
                           Reverse  CTGGAAGGTGAACTCCACACA           67
mT                         Forward  CCAGCTGCGTCTACATCCAC            68
                           Reverse  AAGCAGTGGCTGGTGATCATG           68
mGsc                       Forward  AGATGCTGCCCTACATGAACG           67
                           Reverse  TGCTCATCGGTGAAGATGGTG           67
mLlgl2                     Forward  ACCCAGAACCCAGAACCTCT            68
                           Reverse  ATGACACGGGAGGTGAAGTC            67
mGrb7                      Forward  GGTTTCATGGACGCATCTCT            65
                           Reverse  AGGTCTGTGAAACGGGTCTG            67
mFgf5                      Forward  CATCGGTTTCCATCTGCAGATC          66
                           Reverse  CTCCTCGTATTCCTACAATCCC          64
mSox1                      Forward  CATGAACGCCTTCATGGTGTGG          69
                           Reverse  TCGGACATGACCTTCCACTCG           69
mOtx2                      Forward  GGAAGTGAGTTCAGAGAGTGG           65
                           Reverse  TCCAGATAGACACTGGAGCAC           66
mSox17                     Forward  CCAGATCTGCACAACGCAGAG           69
                           Reverse  TGGTCCTGCATATGCTGCACG           71
mGata4                     Forward  ACTTCTCAGAAGGCAGAGAGTG          66
                           Reverse  GGTTGATGCCGTTCATCTTGTG          67
mGata6                     Forward  ATCCAGACGCCACTGTGGAGA           71
                           Reverse  CTGAGGCCATTCATCTTGCTG           66
mGAPDH                     Forward  CATCACCATCTTCCAGGAGC            65
                           Reverse  GCTGTAGCCGTATTCATTGTC           63

Table 3.7 RT-qPCR Gene Expression Primers
Chapter 4: Summary and Future Perspectives
4.1 Summary
As has been shown in this work and other studies, chromatin interactions exert a
significant level of influence over gene expression programs and play a role in cell
identity. Often mediated by enhancers bound by transcription factors and nuclear
architecture proteins to contact their target genes, looping mechanisms exist in the
nucleus to bring these chromatin loci into proximity with each other to allow for
transcriptional regulation and cell identity determination. Due to folding of the chromatin,
these interactions do not only occur within the local chromatin neighborhood, but also
occur at extremely long distances that allow interaction between distal loci on the same
chromosome and other chromosomes. To properly investigate all of these interactions
genome wide beyond the confines of the local chromatin, the various approaches that
currently exist need to be evaluated for potential bias and limitations.
Because of their blind spots, 3C-based assays potentially miss many significant and
functionally important interchromosomal chromatin interactions. Reliance on external
biochemical manipulation like formaldehyde fixation and proximity ligation creates
discrepancies when comparing 3C results to those obtained through other chromatin
conformation profiling techniques like FISH (Maass et al., 2018b). The key parameter to
capturing high fidelity chromatin interactions may be minimizing the effects of artificial
fixation through use of live cells. Our CRISPR-Dam results support this observation, along
with other studies performing “intrinsic” 3C assays without fixation (Brant et al., 2016).
Wider adoption of non-crosslinking assays to explore chromatin interactions will produce
more faithful interaction maps and improve identification of functional targets.
4.2 Future Development of CRISPR-Dam
We can envision the CRISPR-Dam assay being performed in other cell types to
investigate genome wide interactions of important super enhancers and other regulatory
regions. As the technique can better capture genome wide interactions occurring at all
distances, the assay can be highly useful in studies interested in identifying which targets
are directly regulated by particular super enhancers. In combination with knock out
studies, a more complete picture can be constructed to map chromatin contacts that play
significant roles in regulating determinative transcription programs in specific cells. In
pluripotent stem cells, additional super enhancers have been identified in regions
proximal to important pluripotency genes such as Sox2 and Nanog (Blinka et al., 2016; Li
et al., 2014; Whyte et al., 2013). Experiments to characterize the complex “proximity-
mediated” network sustained by these super enhancers can further reveal other important
genes involved in regulating pluripotency.
Additionally, chromatin dynamics can also be captured with our CRISPR-Dam
assay to trace changes in chromatin contacts as cells undergo differentiation or other cell
transformations. Single cell profiling of CRISPR-Dam labeling can provide insights into
real time contact frequencies of specific chromatin loci. Combining CRISPR-Dam with
other orthogonal DNA labeling methods such as adenine base editors (ABEs) may
provide multiplex capabilities to identify long range chromatin contacts of multiple
chromatin loci (Gaudelli et al., 2017). As a possible future development of the assay,
visualization of Dam labeled loci potentially can be achieved by coexpression of a
catalytically dead DpnI enzyme fused with fluorophore for following chromatin dynamics
in vivo.
4.3 Functional Chromatin Interactions In Disease
Better understanding and characterization of chromatin interactions with improved
chromosome conformation tools may help elucidate how aberrant interactions in
dysfunctional cells contribute to disease development. In the most extreme case,
chromosomal translocations are already known to be causative factors in malignant
transformation, driving progression of cancers like chronic myelogenous leukemia (CML).
A reciprocal translocation between chromosomes 9 and 22 creates a hybrid
“Philadelphia” chromosome that contains a BCR-ABL fusion gene, which is constitutively
expressed and inappropriately activates tyrosine kinase signaling, leading to uncontrolled
cell proliferation (Kang et al., 2016). In addition to the inappropriate activation of signaling
cascades, we can expect a completely altered chromatin interaction network stemming
from the translocated chromosome regions, which may also drive
transcription of other oncogenes through abnormal enhancer dynamics. Profiling
chromatin looping in cancer cells at enhancer regions that drive expression of oncogenes
or repression of tumor suppressor genes can reveal potential risk enhancer loci or driver
mutations in particular types of cancer. Such studies have already been performed in
prostate and colorectal cancer, identifying a risk SNP interacting with the Myc oncogene
to drive cancer progression (Ahmadiyeh et al., 2010; Pomerantz et al., 2009).
As we understand more about the underlying interaction networks and their
associated transcriptional programs in cells, artificial maintenance and manipulation of
nuclear architecture may become a new avenue of investigation in epigenetic medicine.
As already investigated in other studies, forced looping of chromatin into artificial
loops was found to be sufficient to activate gene transcription at previously silent genetic
loci (Deng and Blobel, 2014; Deng et al., 2014). Targeting of activating CRISPR-Cas9
constructs to pluripotency gene promoters has also been shown to induce looping at
those targeted loci and can induce reprogramming of somatic cells into iPSCs (Hao et al.,
2017; Liu et al., 2018). Induced chromatin looping using programmable targeting systems
such as CRISPR may form the basis of novel therapies in the future.
Bibliography
Ahmadiyeh, N., Pomerantz, M.M., Grisanzio, C., Herman, P., Jia, L., Almendro, V., He, H.H.,
Brown, M., Liu, S., Davis, M., et al. (2010). 8q24 prostate, breast, and colon cancer risk loci
show tissue-specific long-range interaction with MYC. PNAS 107, 9742–9746.
Allahyar, A., Vermeulen, C., Bouwman, B.A.M., Krijger, P.H.L., Verstegen, M.J.A.M., Geeven,
G., van Kranenburg, M., Pieterse, M., Straver, R., Haarhuis, J.H.I., et al. (2018). Enhancer hubs
and loop collisions identified from single-allele topologies. Nat. Genet. 1.
Amano, T., Sagai, T., Tanabe, H., Mizushina, Y., Nakazawa, H., and Shiroishi, T. (2009).
Chromosomal Dynamics at the Shh Locus: Limb Bud-Specific Differential Regulation of
Competence and Active Transcription. Dev. Cell 16, 47–57.
Apostolou, E., Ferrari, F., Walsh, R.M., Bar-Nur, O., Stadtfeld, M., Cheloufi, S., Stuart, H.T.,
Polo, J.M., Ohsumi, T.K., Borowsky, M.L., et al. (2013). Genome-wide chromatin interactions of
the nanog locus in pluripotency, differentiation, and reprogramming. Cell Stem Cell 12, 669–
677.
Aughey, G.N., and Southall, T.D. (2016). Dam it’s good! DamID profiling of protein-DNA
interactions. Wiley Interdiscip. Rev. Dev. Biol. 5, 25–37.
Bedell, V.M., Wang, Y., Campbell, J.M., Poshusta, T.L., Starker, C.G., Krug II, R.G., Tan, W.,
Penheiter, S.G., Ma, A.C., Leung, A.Y.H., et al. (2012). In vivo genome editing using a high-
efficiency TALEN system. Nature 491, 114–118.
Bikard, D., Jiang, W., Samai, P., Hochschild, A., Zhang, F., and Marraffini, L.A. (2013).
Programmable repression and activation of bacterial gene expression using an engineered
CRISPR-Cas system. Nucleic Acids Res. 41, 7429–7437.
Blinka, S., Reimer, M.H., Pulakanti, K., and Rao, S. (2016). Super-Enhancers at the Nanog
Locus Differentially Regulate Neighboring Pluripotency-Associated Genes. Cell Rep. 17, 19–28.
Brant, L., Georgomanolis, T., Nikolic, M., Brackley, C.A., Kolovos, P., van Ijcken, W., Grosveld,
F.G., Marenduzzo, D., Papantonis, A., Beneke, S., et al. (2016). Exploiting native forces to
capture chromosome conformation in mammalian cell nuclei. Mol. Syst. Biol. 12, 891.
Buecker, C., Srinivasan, R., Wu, Z., Calo, E., Acampora, D., Faial, T., Simeone, A., Tan, M.,
Swigut, T., and Wysocka, J. (2014). Reorganization of enhancer patterns in transition from
naive to primed pluripotency. Cell Stem Cell 14, 838–853.
Cai, S., Lee, C.C., and Kohwi-Shigematsu, T. (2006). SATB1 packages densely looped,
transcriptionally active chromatin for coordinated expression of cytokine genes. Nat. Genet. 38,
1278–1288.
Calmann, M.A., and Marinus, M.G. (2003). Regulated expression of the Escherichia coli dam
gene. J Bacteriol 185, 5012–5014.
Chen, B., Gilbert, L.A., Cimini, B.A., Schnitzbauer, J., Zhang, W., Li, G.W., Park, J., Blackburn,
E.H., Weissman, J.S., Qi, L.S., et al. (2013). Dynamic imaging of genomic loci in living human
cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491.
Chen, S., Sanjana, N.E., Zheng, K., Shalem, O., Lee, K., Shi, X., Scott, D.A., Song, J., Pan,
J.Q., Weissleder, R., et al. (2015). Genome-wide CRISPR screen in a mouse model of tumor
growth and metastasis. Cell 160, 1246–1260.
Chew, J., Loh, Y., Zhang, W., Chen, X., Tam, W., Yeap, L., Li, P., Ang, Y., Robson, P., Ng, H.,
et al. (2006). Reciprocal Transcriptional Regulation of Pou5f1 and Sox2 via the Oct4/Sox2 Complex in
Embryonic Stem Cells. Mol. Cell. Biol. 25, 6031–6046.
Cho, S.W., Kim, S., Kim, J.M., and Kim, J.-S. (2013). Targeted genome engineering in
human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230–232.
Cho, S.W., Kim, S., Kim, Y., Kweon, J., Kim, H.S., Bae, S., and Kim, J.-S. (2014). Analysis of
off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases. Genome
Res. 24, 132–141.
Choi, H.W., Joo, J.Y., Hong, Y.J., Kim, J.S., Song, H., Lee, J.W., Wu, G., Schöler, H.R., Do,
J.T., et al. (2016). Distinct Enhancer Activity of Oct4 in Naive and Primed Mouse
Pluripotency. Stem Cell Reports 7, 911–926.
Christian, M., Cermak, T., Doyle, E.L., Schmidt, C., Zhang, F., Hummel, A., Bogdanove, A.J.,
and Voytas, D.F. (2010). Targeting DNA double-strand breaks with TAL effector nucleases.
Genetics 186, 756–761.
Chronis, C., Fiziev, P., Papp, B., Butz, S., Bonora, G., Sabri, S., Ernst, J., and Plath, K. (2017).
Cooperative Binding of Transcription Factors Orchestrates Reprogramming. Cell 168, 442–
459.e20.
Chu, P.-Y., Huang, L.-Y., Hsu, C.-H., Liang, C.-C., Guan, J.-L., Hung, T.-H., and Shen, T.-L.
(2009). Tyrosine phosphorylation of growth factor receptor-bound protein-7 by focal adhesion
kinase in the regulation of cell migration, proliferation, and tumorigenesis. J. Biol. Chem. 284,
20215–20226.
Chu, V.T., Weber, T., Wefers, B., Wurst, W., Sander, S., Rajewsky, K., and Kühn, R. (2015).
Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene
editing in mammalian cells. Nat. Biotechnol. 33, 543–548.
Cléard, F., Moshkin, Y., Karch, F., and Maeda, R.K. (2006). Probing long-distance regulatory
interactions in the Drosophila melanogaster bithorax complex using Dam identification. Nat.
Genet. 38, 931–935.
Cong, L., Zhou, R., Kuo, Y.C., Cunniff, M., and Zhang, F. (2012). Comprehensive interrogation
of natural TALE DNA-binding modules and transcriptional repressor domains. Nat. Commun. 3,
966–968.
Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Hsu, P.D., Wu, X., Jiang, W., and Marraffini,
L.A. (2013). Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339,
819–823.
Cremer, T., and Cremer, C. (2001). Chromosome Territories, Nuclear Architecture and Gene
Regulation in Mammalian Cells. Nat. Rev. Genet. 2, 292–301.
Cremer, T., and Cremer, M. (2010). Chromosome territories. Cold Spring Harb. Perspect. Biol.
2, 1–22.
Davies, J.O.J., Telenius, J.M., McGowan, S.J., Roberts, N.A., Taylor, S., Higgs, D.R., and
Hughes, J.R. (2015). Multiplexed analysis of chromosome conformation at vastly improved
sensitivity. Nat. Methods 13, 1–10.
Dekker, J., Marti-Renom, M.A., and Mirny, L.A. (2013). Exploring the three-dimensional
organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403.
Deng, W., and Blobel, G.A. (2014). Manipulating nuclear architecture. Curr. Opin. Genet. Dev.
25, 1–7.
Deng, W., Lee, J., Wang, H., Miller, J., Reik, A., Gregory, P.D., Dean, A., and Blobel, G.A.
(2012). Controlling long-range genomic interactions at a native locus by targeted tethering of a
looping factor. Cell 149, 1233–1244.
Deng, W., Rupon, J.W., Krivega, I., Breda, L., Motta, I., Jahn, K.S., Reik, A., Gregory, P.D.,
Rivella, S., Dean, A., et al. (2014). Reactivation of developmentally silenced globin genes by
forced chromatin looping. Cell 158, 849–860.
Denker, A., and De Laat, W. (2016). The second decade of 3C technologies: Detailed insights
into nuclear organization. Genes Dev. 30, 1357–1382.
Dhanjal, J.K., Radhakrishnan, N., and Sundar, D. (2017). Identifying synthetic lethal targets
using CRISPR/Cas9 system. Methods 131, 66–73.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012).
Topological domains in mammalian genomes identified by analysis of chromatin interactions.
Nature 485, 376–380.
Fei, T., Chen, Y., Xiao, T., Li, W., Cato, L., Zhang, P., Cotter, M.B., Bowden, M., Lis, R.T., Zhao,
S.G., et al. (2017). Genome-wide CRISPR screen identifies HNRNPL as a prostate cancer
dependency regulating RNA splicing. Proc. Natl. Acad. Sci. 201617467.
Fu, Y., Foden, J.A., Khayter, C., Maeder, M.L., Reyon, D., Joung, J.K., and Sander, J.D. (2013).
High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat.
Biotechnol. 31, 822–826.
Fu, Y., Rocha, P.P., Luo, V.M., Raviram, R., Deng, Y., Mazzoni, E.O., and Skok, J.A. (2016).
CRISPR-dCas9 and sgRNA scaffolds enable dual-colour live imaging of satellite sequences and
repeat-enriched individual loci. Nat. Commun. 7, 11707.
Fulco, C.P., Munschauer, M., Anyoha, R., Munson, G., Grossman, S.R., Perez, E.M., Kane, M.,
Cleary, B., Lander, E.S., and Engreitz, J.M. (2016). Systematic mapping of functional enhancer–
promoter connections with CRISPR interference. Science 354, 769–773.
Gao, F., Wei, Z., Lu, W., and Wang, K. (2013). Comparative analysis of 4C-Seq data generated
from enzyme-based and sonication-based methods. BMC Genomics 14, 345.
Gaspar-Maia, A., Alajem, A., Polesso, F., Sridharan, R., Mason, M.J., Heidersbach, A.,
Ramalho-Santos, J., McManus, M.T., Plath, K., Meshorer, E., et al. (2009). Chd1 regulates
open chromatin and pluripotency of embryonic stem cells. Nature 460, 863–868.
Gaudelli, N.M., Komor, A.C., Rees, H.A., Packer, M.S., Badran, A.H., Bryson, D.I., and Liu, D.R.
(2017). Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage.
Nature 551, 464–471.
Gavrilov, A.A., Gushchanskaya, E.S., Strelkova, O., Zhironkina, O., Kireev, I.I., Iarovaia, O. V.,
and Razin, S. V. (2013a). Disclosure of a structural milieu for the proximity ligation reveals the
elusive nature of an active chromatin hub. Nucleic Acids Res. 41, 3563–3575.
Gavrilov, A.A., Golov, A.K., and Razin, S. V. (2013b). Actual Ligation Frequencies in the
Chromosome Conformation Capture Procedure. PLoS One 8, e60403.
Gilbert, L.A., Larson, M.H., Morsut, L., Liu, Z., Brar, G.A., Torres, S.E., Stern-Ginossar, N.,
Brandman, O., Whitehead, E.H., Doudna, J.A., et al. (2013). CRISPR-mediated modular RNA-
guided regulation of transcription in eukaryotes. Cell 154, 442–451.
Giorgetti, L., and Heard, E. (2016). Closing the loop: 3C versus DNA FISH. Genome Biol. 17,
215.
Goh, Y., Fullwood, M.J., Poh, H.M., Peh, S.Q., Ong, C.T., Zhang, J., Ruan, X., and Ruan, Y.
(2012). Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for
Mapping Chromatin Interactions and Understanding Transcription Regulation. J. Vis. Exp. 1–10.
Göndör, A., Rougier, C., and Ohlsson, R. (2008). High-resolution circular chromosome
conformation capture assay. Nat. Protoc. 3, 303–313.
Greil, F., Moorman, C., and van Steensel, B. (2006). DamID: Mapping of In Vivo Protein-
Genome Interactions Using Tethered DNA Adenine Methyltransferase. Methods Enzymol. 410,
342–359.
Hagège, H., Klous, P., Braem, C., Splinter, E., Dekker, J., Cathala, G., de Laat, W., and Forné,
T. (2007). Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nat.
Protoc. 2, 1722–1733.
Han, D.C., and Guan, J.L. (1999). Association of focal adhesion kinase with Grb7 and its role in
cell migration. J. Biol. Chem. 274, 24425–24430.
Handoko, L., Xu, H., Li, G., Ngan, C.Y., Chew, E., Schnapp, M., Lee, C.W.H., Ye, C., Ping,
J.L.H., Mulawadi, F., et al. (2011). CTCF-mediated functional chromatin interactome in
pluripotent cells. Nat. Genet. 43, 630–638.
Hao, N., Shearwin, K.E., and Dodd, I.B. (2017). Programmable DNA looping using
engineered bivalent dCas9 complexes. Nat. Commun. 8.
Hass, M.R., Liow, H.-H., Chen, X., Sharma, A., Inoue, Y.U., Inoue, T., Reeb, A., Martens, A.,
Fulbright, M., Raju, S., et al. (2016). SpDamID: Marking DNA Bound by Protein Complexes
Identifies Notch-Dimer Responsive Enhancers. Mol. Cell 64, 213.
Heidari, N., Phanstiel, D., and He, C. (2014). Genome-wide map of regulatory interactions in the
human genome. Genome Res. 24, 1905–1917.
Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-André, V., Sigova, A.A., Hoke, H.A., and
Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934–
947.
Horike, S., Cai, S., Miyano, M., Cheng, J.-F., and Kohwi-Shigematsu, T. (2005). Loss of silent-
chromatin looping and impaired imprinting of DLX5 in Rett syndrome. Nat. Genet. 37, 31–40.
Horton, J.R., Liebert, K., Hattman, S., Jeltsch, A., and Cheng, X. (2005). Transition from
nonspecific to specific DNA interactions along the substrate-recognition pathway of Dam
methyltransferase. Cell 121, 349–361.
Horvath, P., and Barrangou, R. (2010). CRISPR/Cas, the immune system of bacteria and
archaea. Science 327, 167–170.
Hou, Z., Zhang, Y., Propson, N.E., Howden, S.E., Chu, L., Sontheimer, E.J., and Thomson, J.A.
(2013). Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria
meningitidis. Proc. Natl. Acad. Sci. USA 110, 15644–15649.
Hwang, W.Y., Fu, Y., Reyon, D., Maeder, M.L., Tsai, S.Q., Sander, J.D., Peterson, R.T., Yeh,
J.-R.R.J., and Joung, J.K. (2013). Efficient genome editing in zebrafish using a CRISPR-Cas
system. Nat. Biotechnol. 31, 1–3.
Jiang, W., Bikard, D., Cox, D., Zhang, F., and Marraffini, L.A. (2013). RNA-guided editing of
bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233–239.
Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J.A., and Charpentier, E. (2012). A
Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science
337, 816–822.
Kang, Z.J., Liu, Y.F., Xu, L.Z., Long, Z.J., Huang, D., Yang, Y., Liu, B., Feng, J.X., Pan, Y.J.,
Yan, J.S., et al. (2016). The Philadelphia chromosome in leukemogenesis. Chin. J. Cancer 35,
1–15.
Kearns, N.A., Pham, H., Tabak, B., Genga, R.M., Silverstein, N.J., Garber, M., and Maehr, R.
(2015). Functional annotation of native enhancers with a Cas9-histone demethylase fusion. Nat.
Methods 12, 401–403.
Kim, Y.G., Cha, J., and Chandrasegaran, S. (1996). Hybrid restriction enzymes: zinc finger
fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. 93, 1156–1160.
Kind, J., Pagie, L., De Vries, S.S., Nahidiazar, L., Dey, S.S., Bienko, M., Zhan, Y., Lajoie, B., De
Graaf, C.A., Amendola, M., et al. (2015). Genome-wide Maps of Nuclear Lamina Interactions in
Single Human Cells. Cell 163, 134–147.
Langer-Safer, P.R., Levine, M., and Ward, D.C. (1982). Immunological method for mapping
genes on Drosophila polytene chromosomes. Proc. Natl. Acad. Sci. U. S. A. 79, 4381–4385.
Lebrun, E., Fourel, G., Gilson, E., and Defossez, P. (2003). A Methyltransferase Targeting
Assay Reveals Silencer-Telomere Interactions in Budding Yeast. Mol. Cell. Biol. 23, 1498–1508.
Lee, C.M., Cradick, T.J., and Bao, G. (2016). The Neisseria meningitidis CRISPR-Cas9 System
Enables Specific Genome Editing in Mammalian Cells. Mol. Ther. 24, 645–654.
Lei, Y., Guo, X., Liu, Y., Cao, Y., Deng, Y., Chen, X., Cheng, C.H.K., Dawid, I.B., Chen, Y., and
Zhao, H. (2012). Efficient targeted gene disruption in Xenopus embryos using engineered
transcription activator-like effector nucleases (TALENs). Proc. Natl. Acad. Sci. 109, 17484–
17489.
Levasseur, D.N., Wang, J., Dorschner, M.O., Stamatoyannopoulos, J.A., and Orkin, S.H.
(2008). Oct4 dependence of chromatin structure within the extended Nanog locus in ES cells.
Genes Dev. 22, 575–580.
Li, M., and Belmonte, J.C.I. (2017). Ground rules of the pluripotency gene regulatory network.
Nat. Rev. Genet. 18, 180–191.
Li, G., Fullwood, M.J., Xu, H., Mulawadi, F.H., Velkov, S., Vega, V., Ariyaratne, P.N., Mohamed,
Y. Bin, Ooi, H.-S., Tennakoon, C., et al. (2010). ChIA-PET tool for comprehensive chromatin
interaction analysis with paired-end tag sequencing. Genome Biol. 11, R22.
Li, M., Liu, G.-H., and Belmonte, J.C.I. (2012). Navigating the epigenetic landscape of
pluripotent stem cells. Nat. Rev. Mol. Cell Biol. 13, 524–535.
Li, Y., Rivera, C.M., Ishii, H., Jin, F., Selvaraj, S., Lee, A.Y., Dixon, J.R., and Ren, B. (2014).
CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic
stem cells. PLoS One 9, 1–17.
Lieber, M. (2011). The mechanism of double-strand DNA break repair by the nonhomologous
DNA end-joining pathway. Mol. Microbiol. 181–211.
Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A.,
Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al. (2009). Comprehensive Mapping of
Long-Range Interactions Reveals Folding Principles of the Human Genome. Science
326, 289–293.
Liu, P., Chen, M., Liu, Y., Qi, L.S., and Ding, S. (2018). CRISPR-Based Chromatin Remodeling
of the Endogenous Oct4 or Sox2 Locus Enables Reprogramming to Pluripotency. Cell Stem
Cell 1–10.
Løbner-Olesen, A., Marinus, M.G., and Hansen, F.G. (2003). Role of SeqA and Dam in
Escherichia coli gene expression: A global/microarray analysis. Proc. Natl. Acad. Sci. U. S. A.
100, 4672–4677.
Luo, G.-Z., Blanco, M.A., Greer, E.L., He, C., and Shi, Y. (2015). DNA N(6)-methyladenine: a
new epigenetic mark in eukaryotes? Nat. Rev. Mol. Cell Biol. 16, 705–710.
Ma, H., Naseri, A., Reyes-Gutierrez, P., Wolfe, S.A., Zhang, S., and Pederson, T. (2015).
Multicolor CRISPR labeling of chromosomal loci in human cells. Proc. Natl. Acad. Sci. 112,
3002–3007.
Maass, P.G., Barutcu, A.R., Shechner, D.M., Weiner, C.L., Melé, M., and Rinn, J.L. (2018a).
Spatiotemporal allele organization by allele-specific CRISPR live-cell imaging (SNP-CLING).
Nat. Struct. Mol. Biol. 25, 176–184.
Maass, P.G., Barutcu, A.R., Weiner, C.L., and Rinn, J.L. (2018b). Inter-chromosomal Contact
Properties in Live-Cell Imaging and in Hi-C. Mol. Cell 69, 1039–1045.e3.
Mali, P., Yang, L., Esvelt, K.M., Aach, J., Guell, M., DiCarlo, J.E., Norville, J.E., and Church,
G.M. (2013). RNA-guided human genome engineering via Cas9. Science 339, 823–826.
Marshall, O.J., Southall, T.D., Cheetham, S.W., and Brand, A.H. (2016). Cell-type-specific
profiling of protein-DNA interactions without cell isolation using targeted DamID with next-
generation sequencing. Nat. Protoc. 11, 1586–1598.
Mashimo, T., Kaneko, T., Sakuma, T., Kobayashi, J., Kunihiro, Y., Voigt, B., Yamamoto, T., and
Serikawa, T. (2013). Efficient gene targeting by TAL effector nucleases coinjected with
exonucleases in zygotes. Sci. Rep. 3, 1253.
Miller, J.C., Holmes, M.C., Wang, J., Guschin, D.Y., Lee, Y.-L.L., Rupniewski, I., Beausejour,
C.M., Waite, A.J., Wang, N.S., Kim, K.A., et al. (2007). An improved zinc-finger nuclease
architecture for highly specific genome editing. Nat. Biotechnol. 25, 778–785.
Minucci, S., Botquin, V., Yeom, Y.I., Dey, A., Sylvester, I., Zand, D.J., Ohbo, K., Ozato, K., and
Scholer, H.R. (1996). Retinoic acid-mediated down-regulation of Oct3/4 coincides with the loss
of promoter occupancy in vivo. EMBO J. 15, 888–899.
Moorthy, S.D., Davidson, S., Shchuka, V.M., Singh, G., Malek-Gilani, N., Langroudi, L.,
Martchenko, A., So, V., Macpherson, N.N., and Mitchell, J.A. (2017). Enhancers and super-
enhancers have an equivalent regulatory role in embryonic stem cells through regulation of
single or multiple genes. Genome Res. 27, 246–258.
Mumbach, M.R., Rubin, A.J., Flynn, R.A., Dai, C., Khavari, P.A., Greenleaf, W.J., and Chang,
H.Y. (2016). HiChIP: Efficient and sensitive analysis of protein-directed genome architecture.
Nat. Methods 13, 919–922.
Myers, S.A., Wright, J., Peckner, R., Kalish, B.T., Zhang, F., and Carr, S.A. (2018). Discovery of
proteins associated with a predefined genomic locus via dCas9–APEX-mediated proximity
labeling. Nat. Methods 15, 1–3.
Nagano, T., Várnai, C., Schoenfelder, S., Javierre, B.-M., Wingett, S.W., and Fraser, P. (2015).
Comparison of Hi-C results using in-solution versus in-nucleus ligation. Genome Biol. 16, 175.
Naumova, N., Smith, E.M., Zhan, Y., and Dekker, J. (2012). Analysis of long-range chromatin
interactions using Chromosome Conformation Capture. Methods 58, 192–203.
Nora, E.P., Lajoie, B.R., Schulz, E.G., Giorgetti, L., Okamoto, I., Servant, N., Piolot, T., van
Berkum, N.L., Meisig, J., Sedat, J., et al. (2012). Spatial partitioning of the regulatory landscape
of the X-inactivation centre. Nature 485, 381–385.
O’Sullivan, J.M., Hendy, M.D., Pichugina, T., Wake, G.C., and Langowski, J. (2013). The
statistical-mechanics of chromosome conformation capture. Nucleus 4, 390–398.
Palstra, R.-J., Tolhuis, B., Splinter, E., Nijmeijer, R., Grosveld, F., and de Laat, W. (2003). The
β-globin nuclear compartment in development and erythroid differentiation. Nat. Genet. 35, 190–
194.
Palstra, R.-J., Simonis, M., Klous, P., Brasset, E., Eijkelkamp, B., and de Laat, W. (2008).
Maintenance of long-range DNA interactions after inhibition of ongoing RNA polymerase II
transcription. PLoS One 3, e1661.
Park, R.J., Wang, T., Koundakjian, D., Hultquist, J.F., Lamothe-Molina, P., Monel, B.,
Schumann, K., Yu, H., Krupzcak, K.M., Garcia-Beltran, W., et al. (2017). A genome-wide
CRISPR screen identifies a restricted set of HIV host dependency factors. Nat. Genet. 49, 193–
203.
Pawluk, A., Amrani, N., Zhang, Y., Garcia, B., Hidalgo-Reyes, Y., Lee, J., Edraki, A., Shah, M.,
Sontheimer, E.J., Maxwell, K.L., et al. (2016). Naturally Occurring Off-Switches for CRISPR-
Cas9. Cell 167, 1829–1838.e9.
Peric-Hupkes, D., Meuleman, W., Pagie, L., Bruggeman, S.W.M., Solovei, I., Brugman, W.,
Gräf, S., Flicek, P., Kerkhoven, R.M., van Lohuizen, M., et al. (2010). Molecular Maps of the
Reorganization of Genome-Nuclear Lamina Interactions during Differentiation. Mol. Cell 38,
603–613.
Pickersgill, H., Kalverda, B., De Wit, E., Talhout, W., Fornerod, M., and Van Steensel, B. (2006).
Characterization of the Drosophila melanogaster genome at the nuclear lamina. Nat. Genet. 38,
1005–1014.
Pindyurin, A. V., Pagie, L., Kozhevnikova, E.N., Van Arensbergen, J., and Van Steensel, B.
(2016). Inducible DamID systems for genomic mapping of chromatin proteins in Drosophila.
Nucleic Acids Res. 44, 5646–5657.
Pomerantz, M.M., Ahmadiyeh, N., Jia, L., Herman, P., Verzi, M.P., Doddapaneni, H., Beckwith,
C.A., Chan, J.A., Hills, A., Davis, M., et al. (2009). The 8q24 cancer risk variant rs6983267
shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882–884.
Qi, L.S., Larson, M.H., Gilbert, L.A., Doudna, J.A., Weissman, J.S., Arkin, A.P., and Lim, W.A.
(2013). Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene
expression. Cell 152, 1173–1183.
Rais, Y., Zviran, A., Geula, S., Gafni, O., Chomsky, E., Viukov, S., Mansour, A.A., Caspi, I.,
Krupalnik, V., Zerbib, M., et al. (2013). Deterministic direct reprogramming of somatic cells to
pluripotency. Nature 502, 65–70.
Ran, F.A., Hsu, P.D., Lin, C.Y., Gootenberg, J.S., Konermann, S., Trevino, A.E., Scott, D.A.,
Inoue, A., Matoba, S., Zhang, Y., et al. (2013). Double nicking by RNA-guided CRISPR Cas9 for
enhanced genome editing specificity. Cell 154, 1380–1389.
Reddy, K.L., Zullo, J.M., Bertolino, E., and Singh, H. (2008). Transcriptional repression
mediated by repositioning of genes to the nuclear lamina. Nature 452, 243–247.
Robert, F., Barbeau, M., Éthier, S., Dostie, J., and Pelletier, J. (2015). Pharmacological
inhibition of DNA-PK stimulates Cas9-mediated genome editing. Genome Med. 7, 93.
Ruan, X., and Ruan, Y. (2012). Chromatin Interaction Analysis Using Paired-End Tag
Sequencing (ChIA-PET). Tag-Based Next Gener. Seq. Chapter 21, 185–210.
Scalley-Kim, M., McConnell-Smith, A., and Stoddard, B.L. (2007). Coevolution of a Homing
Endonuclease and Its Host Target Sequence. J. Mol. Biol. 372, 1305–1319.
Schmiedeberg, L., Skene, P., Deaton, A., and Bird, A. (2009). A temporal threshold for
formaldehyde crosslinking and fixation. PLoS One 4, e4636.
Sidik, S.M., Huet, D., Ganesan, S.M., Huynh, M.H., Wang, T., Nasamu, A.S., Thiru, P., Saeij,
J.P.J., Carruthers, V.B., Niles, J.C., et al. (2016). A Genome-wide CRISPR Screen in
Toxoplasma Identifies Essential Apicomplexan Genes. Cell 166, 1423–1435.e12.
Simonis, M., Kooren, J., and de Laat, W. (2007). An evaluation of 3C-based methods to capture
DNA interactions. Nat. Methods 4, 895–901.
Singhal, N., Graumann, J., Wu, G., Araúzo-Bravo, M.J., Han, D.W., Greber, B., Gentile, L.,
Mann, M., and Schöler, H.R. (2010). Chromatin-remodeling components of the baf complex
facilitate reprogramming. Cell 141, 943–955.
Stadtfeld, M., Maherali, N., Breault, D.T., and Hochedlinger, K. (2008). Defining Molecular
Cornerstones during Fibroblast to iPS Cell Reprogramming in Mouse. Cell Stem Cell 2, 230–
240.
Steensel, B., and Henikoff, S. (2000). Identification of in vivo DNA targets of chromatin proteins
using tethered dam methyltransferase. Nat. Biotechnol. 18, 424–428.
Stoddard, B.L. (2011). Homing endonucleases: from microbial genetic invaders to reagents for
targeted DNA modification. Structure 19, 7–15.
Tang, Z., Luo, O.J., Li, X., Zheng, M., Zhu, J.J., Szalaj, P., Trzaskoma, P., Magalska, A.,
Wlodarczyk, J., Ruszczycki, B., et al. (2015). CTCF-Mediated Human 3D Genome Architecture
Reveals Chromatin Topology for Transcription. Cell 163, 1–17.
Tolhuis, B., Palstra, R.J., Splinter, E., Grosveld, F., and De Laat, W. (2002). Looping and
interaction between hypersensitive sites in the active β-globin locus. Mol. Cell 10, 1453–1465.
Tsai, N.P., Lin, Y.L., Tsui, Y.C., and Wei, L.N. (2010). Dual action of epidermal growth factor:
Extracellular signal-stimulated nuclear-cytoplasmic export and coordinated translation of
selected messenger RNA. J. Cell Biol. 188, 325–333.
Tzelepis, K., Koike-Yusa, H., De Braekeleer, E., Li, Y., Metzakopian, E., Dovey, O.M., Mupo, A.,
Grinkevich, V., Li, M., Mazan, M., et al. (2016). A CRISPR Dropout Screen Identifies Genetic
Vulnerabilities and Therapeutic Targets in Acute Myeloid Leukemia. Cell Rep. 17, 1193–1205.
Vogel, M.J., Peric-Hupkes, D., and van Steensel, B. (2007). Detection of in vivo protein-DNA
interactions using DamID in mammalian cells. Nat. Protoc. 2, 1467–1478.
Wang, H., Yang, H., Shivalila, C.S., Dawlaty, M.M., Cheng, A.W., Zhang, F., and Jaenisch, R.
(2013). One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-
mediated genome engineering. Cell 153, 910–918.
Wei, Z., Yang, Y., Zhang, P., Andrianakos, R., Hasegawa, K., Lyu, J., Chen, X., Bai, G., Liu, C.,
Pera, M., et al. (2009). Klf4 interacts directly with Oct4 and Sox2 to promote reprogramming.
Stem Cells 27, 2969–2978.
Wei, Z., Gao, F., Kim, S., Yang, H., Lyu, J., An, W., Wang, K., and Lu, W. (2013). Klf4 Mediates
Long-Range Interactions Associated with Oct4 Loci in Reprogramming and Pluripotency. Cell
Stem Cell 13, 36–47.
Whyte, W.A., Orlando, D.A., Hnisz, D., Abraham, B.J., Lin, C.Y., Kagey, M.H., Rahl, P.B., Lee,
T.I., and Young, R.A. (2013). Master transcription factors and mediator establish super-
enhancers at key cell identity genes. Cell 153, 307–319.
Williamson, I., Berlivet, S., Eskeland, R., Boyle, S., Illingworth, R.S., Paquette, D., Dostie, J.,
and Bickmore, W.A. (2017). Spatial genome organization: contrasting views from chromosome
conformation capture and fluorescence in situ hybridization. Genes Dev. 28, 2778–2791.
Wit, E. De, and Laat, W. De (2012). A decade of 3C technologies-insights into nuclear
organization. Genes Dev. 26, 11–24.
Wu, G., and Schöler, H.R. (2014). Role of Oct4 in the early embryo development. Cell Regen.
(London, England) 3, 7.
Wu, T.P., Wang, T., Seetin, M.G., Lai, Y., Zhu, S., Lin, K., Liu, Y., Byrum, S.D., Mackintosh,
S.G., Zhong, M., et al. (2016). DNA methylation on N6-adenine in mammalian embryonic stem
cells. Nature 532, 329–333.
Yamanaka, T., Horikoshi, Y., Izumi, N., Suzuki, A., Mizuno, K., and Ohno, S. (2006). Lgl
mediates apical domain disassembly by suppressing the PAR-3-aPKC-PAR-6 complex to orient
apical membrane polarity. J. Cell Sci. 119, 2107–2118.
Yeom, Y.I., Fuhrmann, G., Ovitt, C.E., Brehm, A., Ohbo, K., Gross, M., Hubner, K., and Scholer,
H.R. (1996). Germline regulatory element of Oct-4 specific for the totipotent cycle of embryonal
cells. Development 122, 881–894.
Yu, C., Liu, Y., Ma, T., Liu, K., Xu, S., Zhang, Y., Liu, H., La Russa, M., Xie, M., Ding, S., et al.
(2015). Small molecules enhance CRISPR genome editing in pluripotent stem cells. Cell Stem Cell
16, 142–147.
Zhang, H., Jiao, W., Sun, L., Fan, J., Chen, M., Wang, H., Xu, X., Shen, A., Li, T., Niu, B., et al.
(2013). Intrachromosomal Looping Is Required for Activation of Endogenous Pluripotency
Genes during Reprogramming. Cell Stem Cell 14, 1–6.
Zhao, Z., Tavoosidana, G., Sjölinder, M., Göndör, A., Mariano, P., Wang, S., Kanduri, C.,
Lezcano, M., Sandhu, K.S., Singh, U., et al. (2006). Circular chromosome conformation capture
(4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal
interactions. Nat. Genet. 38, 1341–1347.
Abstract
Exploration of the genome and the functional elements contained within the DNA sequence continues to bring forth surprising discoveries about the nature and dynamics of nuclear architecture. As many new methods for examining chromatin interactions have emerged in the past decade, there is now a better understanding of the overall spatial configuration of chromosomes in the interphase cell nucleus and of the overarching principles of nuclear compartmentalization. In particular, biochemical approaches such as chromosome conformation capture (3C) and its variations have been developed further by various research groups to continue deciphering the matrix of interactions in different cell types. These advances have bolstered the idea that the spatial organization of chromatin provides an additional layer of regulation for controlling transcriptional programs within the cell. However, limitations inherent to the 3C method have made the study of long-range interacting chromatin loci difficult. To address the shortcomings of 3C methods, I have developed a novel assay, CRISPR-Dam, that combines the specific locus targeting of CRISPR-Cas9 with the DNA labeling function of the DNA adenine methyltransferase (Dam) protein used in DamID assays. Drawing on the strengths of both techniques and performed in live cells, CRISPR-Dam is able to better characterize the proximity-mediated interaction network at a specified locus of interest. Using the well-studied pluripotency gene network in mouse embryonic stem cells (ESCs) as a model system to validate the new assay, we focused on the interactions mediated by the Oct4 distal enhancer (DE). The Oct4 DE is required for Oct4 gene expression and maintenance of stem cell pluripotency. Previously, 4C-seq performed to characterize the Oct4 DE interactome in mouse ESCs was unsuccessful in identifying distally located functional target genes of the DE.
Performing the CRISPR-Dam assay at the Oct4 DE in mouse ESCs helped identify two candidate genes, lethal giant larvae 2 (Llgl2) and growth factor receptor-bound protein 7 (Grb7), both located on chromosome 11. Deletion of the Oct4 DE via Cre-loxP recombination showed that Llgl2 and Grb7 are directly regulated by the Oct4 DE and that their expression cannot be rescued by ectopic OCT4. We further characterized the roles of Llgl2 and Grb7 in supporting ESC pluripotency and found that both genes are necessary to maintain ESC pluripotency and that overexpression of either gene enhances reprogramming efficiency, suggesting that Llgl2 and Grb7 play functional roles in ESC biology.
Asset Metadata
Creator
Huang, David Shih Yu (author)
Core Title
Exploring stem cell pluripotency through long range chromosome interactions
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Genetic, Molecular and Cellular Biology
Publication Date
10/16/2018
Defense Date
08/30/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
chromosome conformation capture,CRISPR,distal enhancer,DNA adenine methyltransferase,embryonic stem cell,higher order chromosome structure,long range chromatin interaction,OAI-PMH Harvest,Oct4,pluripotency,Pou5f1,somatic cell reprogramming,stem cell biology
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Segil, Neil (committee chair), Lu, Wange (committee member), Stallcup, Michael (committee member)
Creator Email
davidshh@usc.edu,dhuang369@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-79863
Unique identifier
UC11671752
Identifier
etd-HuangDavid-6847.pdf (filename),usctheses-c89-79863 (legacy record id)
Dmrecord
79863
Document Type
Dissertation
Rights
Huang, David Shih Yu
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA