CHARACTERIZING AND DEVELOPING E. COLI TYPE I-E CRISPR ADAPTATION AS A
DNA RECORDING AND GENOME ENGINEERING TOOL
by
Luke Peach
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
MOLECULAR BIOLOGY
December 2024
Copyright 2024 Luke Peach
This thesis is dedicated to my parents, Andrea and Robert Peach.
ACKNOWLEDGEMENTS
Graduate school has been an exciting and demanding time. My main goal in the time spent
at USC was to contribute something useful to scientific knowledge while expanding and honing
my capabilities for science and research. Many people along the way have been a great part of my
time working in the Boedicker Lab at USC. In this section I will acknowledge some of these
people.
I would like to express my gratitude to Professor James Boedicker for giving me the
opportunity to pursue research in his lab and for providing financial support, allowing me to focus
on laboratory research. CRISPR-Cas was not a research focus in the Boedicker Lab prior to my
addition. It was certainly a risk for Professor Boedicker to support me in pursuing this research
topic. Hopefully we have grown enough expertise and inspired enough interest for this research
path to proceed beyond my time with the group. I think our discussions regarding data
interpretation and research strategies were invaluable, as was the help in learning how to clearly
present my research. My scientific background is in biology and chemistry so having the
perspective of a biophysicist was often extremely useful. I greatly appreciate the Molecular and
Computational Biology department for accepting me into the program, allowing me to explore my
research interests and improve as a scientist. The support provided by the program not just in
facilitating my progress but highlighting opportunities and options beyond the program has been
exceptional. I found that the lectures required early in the program really helped me focus in on
the available research options where I am most capable of contributing.
Fengjie Zhao, a postdoc in the Boedicker Lab, and I developed a great friendship. He
and I mostly worked on different projects, but we often spent time discussing the challenges and
experimental options for our respective research areas. Prior to joining the Boedicker Lab, Fengjie
had worked with a synthetic biology group involved in generating bacterial genome deletions. He
had a lot of good insight regarding molecular tools and approaches for modifying bacterial
genomes. A true optimist, Fengjie was always positive and encouraging, making working with him
a real pleasure. From the Chung Lab, graduate student Nicholas Scurato was great to work with.
We didn't share any projects together but relied on each other for advice about research problems.
The Chung Lab performed a lot of protein purification work. Nicholas was my go-to person with
anything related to protein expression. He always extended a helping hand if we needed anything
from his lab, including consumables in short supply. I tried to reciprocate as much as possible. I
significantly benefitted from having Nicholas in the lab next door, so I really appreciate his help
with everything. An undergraduate student, Haoyun Zhang, was a pivotal part of the CRISPR array
dynamics project. Haoyun went above and beyond with his efforts and contributions. I really
appreciate his willingness to take on difficult challenges for the benefit of the project, even while
having a full load of classes.
TABLE OF CONTENTS
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction: Prokaryotic CRISPR-Cas adaptive immunity
1.1 CRISPR-Cas adaptive immunity
1.2 CRISPR-Cas diversity
1.3 E. coli Type I-E CRISPR-Cas
1.3.1 Type I-E Adaptation
1.3.2 Type I-E Expression and processing
1.3.3 Type I-E Interference
1.4 S. pyogenes Type II-A CRISPR-Cas
1.5 Surviving CRISPR-Cas self-genome targeting in E. coli
1.6 Research Aims
Chapter 2: Assessing spacer acquisition rates in E. coli Type I-E CRISPR arrays
2.1 Introduction
2.2 Materials and Methods
2.2.1 Assessing the accuracy of PCR quantifications
2.2.2 Methods for genome-integrating DNA constructs
2.2.3 Spacer acquisition assay
2.2.4 Procedure for quantifying amplicon bands
2.2.5 Fitting procedure for CRISPR array expansion rates
2.2.6 Expanded array sequencing
2.2.7 Simulations
2.2.8 Competition experiments
2.2.9 Calculating spacer acquisition rates
2.2.10 Phage infection studies
2.2.10.1 Phage propagation
2.2.10.2 Plaque formation assay
2.2.10.3 Bacteriophage infection assay
2.3 Results
2.3.1 A model of array expansion
2.3.2 Cellular parameters modulate spacer acquisition rates
2.3.2.1 DNA substrate
2.3.2.2 Cas1-Cas2 expression
2.3.2.3 CRISPR array copy number
2.3.2.4 Expression of heterologous DNA end-joining genes
2.3.3 Enhanced array expansion rates boost phage protection
2.4 Discussion
2.5 Supporting Experiments
2.5.1 Spacer acquisition in strains with different starting array lengths
2.5.2 Spacer acquisition assay for distinct +2 expanded strains
2.5.3 NHEJ genes ku and ligD individually enhance spacer acquisition
2.6 Acknowledgements
2.7 Supplementary Material
Chapter 3: CRISPR-Cas interference: Genome streamlining
3.1 Introduction
3.2 Materials and Methods
3.2.1 Plasmid construction and strain engineering
3.2.2 E. coli Type I-E programmed self-targeting assay
3.2.3 S. pyogenes Type II-A programmed self-targeting assay
3.2.4 Tiling PCR to assess the size of self-targeting deletions
3.2.5 OD600 growth measurements
3.3 Results
3.3.1 E. coli Type I-E programmed self-targeting
3.3.2 E. coli Type I-E CRISPR autoimmunity
3.3.3 S. pyogenes Type II-A CRISPR-Cas in E. coli
Chapter 4: Ongoing research
4.1 CRISPR array contraction
4.2 Spacer associated fitness effects
4.2.1 Directed spacer acquisition
4.3 Host-derived spacers enriched post phage infection
Chapter 5: Summary and future directions
5.1 CRISPR array dynamics
5.2 CRISPR self-genome targeting
5.3 CRISPR adaptation applications
5.4 Future directions
Bibliography
LIST OF TABLES
Table 2.1 E. coli strains used in this study
Table 2.2 Plasmids used in this study
Table 2.3 Primers used in this study
Table 2.4 P values generated using the Student T-test (2-tailed, unpaired)
Table 2.5 Newly acquired spacers sequenced from clonal strains
LIST OF FIGURES
Figure 1.1 Bacterial nucleic acid sensing systems
Figure 1.2 Overview: the three phases comprising CRISPR-Cas adaptive immunity
Figure 1.3 Diversity of CRISPR-Cas systems
Figure 1.4 Components comprising the E. coli Type I-E CRISPR system
Figure 1.5 Type I-E CRISPR adaptation
Figure 1.6 Type I-E CRISPR expression and processing
Figure 1.7 Type I-E CRISPR interference
Figure 1.8 Components of CRISPR-Cas defense in the Type II-A system of S. pyogenes
Figure 1.9 Cas9 effector enzyme protospacer processing
Figure 1.10 Effects of CRISPR-Cas self-genome targeting
Figure 2.1 Validating the accuracy of PCR-based quantifications
Figure 2.2 Cas1 protein expression confirmed via SDS-PAGE separation
Figure 2.3 Cas1 protein expression vs. time and IPTG concentration
Figure 2.4 Quantification comparisons for two different sample preparation methods
Figure 2.5 Quantifying the temporal dynamics of spacer acquisition
Figure 2.6 Modeling CRISPR spacer acquisition
Figure 2.7 Array expansion associated with a fitness cost
Figure 2.8 Intracellular parameters modulate spacer acquisition
Figure 2.9 Phage protection is correlated with spacer acquisition rates
Figure 2.10 Spacer acquisition in strains with different starting array lengths
Figure 2.11 No contraction detected below array starting lengths
Figure 2.12 Spacer acquisition in strains with two newly acquired spacers
Figure 2.13 Both Ku and LigD alone increase spacer acquisition
Figure S2.1 Array contraction not detected through 5-days of culturing
Figure S2.2 Model simulations fitting to a 10-day cas1-cas2 induction experiment for the base recording strain containing pUC19
Figure S2.3 Expanded vs. unexpanded-array competition experiments without cas1-cas2 induction
Figure S2.4 Alpha values calculated from five-day cas1-cas2 induction experiments
Figure S2.5 Quantifying spacer acquisition using PCR
Figure S2.6 Cas1-Cas2 expression range in base spacer recording strain
Figure S2.7 Construction of an E. coli strain with two CRISPR arrays
Figure S2.8 NHEJ expression enhances spacer acquisition
Figure S2.9 Spacer acquisition assay
Figure S2.10 Doubling times calculated from OD600 growth curves
Figure 3.1 S. pyogenes Type II-A CRISPR adaptation machinery in E. coli
Figure 3.2 NHEJ expression increases the survivability of Cascade-Cas3 self-targeting
Figure 3.3 NHEJ expression repairs Cas3-damaged DNA with smaller deletions
Figure 4.1 Fluorescent CRISPR array to assess array stability
Figure 4.2 Recapitulating fitness effects using directed spacer integration
Figure 4.3 Directed spacer acquisition
Figure 4.4 Self-genome targeting spacers enriched during phage infection
ABSTRACT
Characterizing and developing E. coli Type I-E CRISPR adaptation
as a DNA recording and genome engineering tool
Adaptive immunological defense fortifies life coevolving in adversarial environments.
CRISPR-Cas is a prokaryotic defense mechanism providing adaptive protection against evasive
mobile genetic elements (MGEs) such as bacteriophage and plasmids. CRISPR arrays, preserved
in the host genome, are repositories of MGE-derived DNA sequences termed “spacers” that serve
as a chronological, immunological record of infections. These arrays are composed of distinct
spacer sequences flanked by conserved, palindromic repeats. Cas1 and Cas2 proteins form a
complex that captures, processes and array-integrates these infection-derived sequences. This
CRISPR adaptation process is also central to emerging biological recording technologies. In E.
coli, the Type I-E CRISPR array is expressed and processed into crRNAs, each of which guides a
surveillance complex to target protospacer sequences complementary to the spacer. Target site
binding triggers recruitment of a helicase nuclease for processive dsDNA degradation. The primary
focus of the research presented in this dissertation is CRISPR adaptation, specifically the dynamics
of spacer acquisition in the E. coli Type I-E CRISPR-Cas system.
Array-integrated spacer sequences, captured from infecting mobile genetic elements,
provide target specificity for the CRISPR-Cas immune response. The rates at which spacers
integrate into native arrays within bacterial populations have not been quantified. Here we measure
naïve spacer acquisition rates in E. coli Type I-E CRISPR, identify factors that affect these rates,
and model this process fundamental to CRISPR-Cas defense. Prolonged Cas1-Cas2 expression
produced fewer new spacers per cell on average than predicted by our model. Subsequent
experiments revealed this was due to a mean fitness reduction linked to array-expanded
populations. Also, expression of heterologous non-homologous end joining (NHEJ) DNA-repair
genes was found to augment spacer acquisition rates, translating to enhanced phage infection
defense. Together, these results demonstrate the impact of intracellular factors that modulate spacer
acquisition and identify an intrinsic fitness effect associated with array expanded populations.
We also present research characterizing self-genome deletions catalyzed by the Cascade-Cas3
CRISPR interference machinery of E. coli. Programmed host-DNA targeting produces long-range
bidirectional deletions in vivo. Introducing heterologous NHEJ expression increases the
survivability of self-targeting and generates significantly smaller, unidirectional deletions.
Chapter 1
Introduction: Prokaryotic CRISPR-Cas
adaptive immunity
1.1 CRISPR-Cas adaptive immunity
Bacteria evolve under the threat of invasion from mercurial viral adversaries known as
bacteriophage. These pernicious counterparts can hijack host bacterial machinery to replicate
before lysing the host and releasing progeny into the extracellular environment. Phage infection is thought
to be responsible for 20-40% of daily mortality in bacterial communities, exerting strong
selective pressure on bacterial evolution (Suttle et al., 2007). Prokaryotes can block phage
receptors by masking them or modifying them through mutation (Westra et al., 2012) to prevent
uptake of the infecting nucleic acids. Infected cells can also induce a programmed abortive
pathway to mitigate viral spread through the broader population. Bacteria have also evolved
dozens of distinct defense mechanisms to counteract phage infections, including restriction-modification
systems, patrolling prokaryotic argonaute proteins (pAgo) and the adaptive defense system
CRISPR-Cas (Georjan and Bernheim, 2023). Restriction-modification systems modify host
recognition sequences via methylation while degrading the same unmodified sequences present in
mobile genetic elements (MGEs) such as invading bacteriophage (Figure 1.1). Prokaryotic
argonautes utilize small nucleic acid fragments as guides to recognize the presence of MGEs,
ultimately resulting in target degradation. CRISPR-Cas is an adaptive system allowing prokaryotes
to coevolve with dynamic bacteriophage populations by counteracting immunological escape.
Through a CRISPR adaptation process, prokaryotes facing invasion can express DNA integrase
genes that capture, process and self-integrate small fragments of replicating nucleic acids. These
fragments are then expressed as RNA sequences that guide nuclease enzymes to invader target
sites for degradation.
Figure 1.1 | Bacterial nucleic acid sensing systems. Several known nucleic acid sensing systems
are present across bacterial species to detect the presence of invader sequences. Restriction
modification systems utilize sequence specific endonucleolytic restriction enzymes to destroy
invader DNA, masking host genome sequences from recognition through modifications such as
DNA-methylation. CRISPR-Cas adaptive immunity allows host defenses to coevolve with and
destroy phage DNA, silencing these invaders through RNA-guided target complementarity and
effector mediated degradation. Argonaute proteins (pAgo) are sensors that use nucleic acid guides
to first detect invader sequences, subsequently degrading them via antiphage effectors. This figure
is adapted from Georjan and Bernheim, 2023.
CRISPR-Cas adaptive immunity is present in about half of all sequenced bacterial species
and about 90% of archaea. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)
refers to the CRISPR array, the first discovered and most fundamental component of this adaptive
system. Three main phases make up CRISPR-Cas defense: adaptation, expression and processing,
and interference (Figure 1.2). A CRISPR-Cas immune response first activates the adaptation phase
through the expression of Cas1 and Cas2. Together, these proteins form a complex capable of
capturing invader nucleic acid sequences called protospacers. These sequences are processed in
the Cas1-Cas2 complex before being integrated into a CRISPR array, typically contained in the
prokaryotic host genome. These spacers, now host-integrated, serve as an immunological memory
of the infection and provide templated targeting coordinates for intervention. The array is
expressed from a promoter within a region directly upstream termed the leader sequence. Array
transcription produces a precursor CRISPR RNA transcript containing the individual spacers, each
separated by a conserved repeat sequence, typically 20-50bp in length. Through further processing,
these individual spacers are separated to form mature CRISPR RNAs (crRNAs). These crRNAs,
made up of a spacer sequence and fragments from adjacent repeats, each form a ribonucleoprotein
complex with one or more Cas effector proteins. These surveillance complexes patrol the cell,
annealing to complementary DNA protospacers and initiating target degradation through
effector-mediated endonucleolytic activity.
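The transcript layout and processing step described above can be sketched in a few lines of Python. This is an illustration only: the sequences are toy placeholders, and the function names and the `cut_offset` parameter (the position within each repeat where the cut is placed) are assumptions made for the example, not values reported in this work.

```python
def build_pre_crrna(repeat, spacers):
    """Assemble a toy precursor transcript: repeat-spacer-repeat-...-repeat."""
    return repeat + "".join(spacer + repeat for spacer in spacers)


def process_pre_crrna(transcript, repeat, cut_offset):
    """Cut once inside every repeat and return the internal fragments.

    Each returned fragment is a spacer flanked by pieces of the two
    adjacent repeats, mirroring the description of mature crRNAs.
    cut_offset is a hypothetical parameter chosen for illustration.
    """
    cuts = []
    start = transcript.find(repeat)
    while start != -1:
        cuts.append(start + cut_offset)
        start = transcript.find(repeat, start + len(repeat))
    # Fragments between consecutive cut sites are the mature crRNAs.
    return [transcript[a:b] for a, b in zip(cuts, cuts[1:])]


# Toy data: lowercase marks repeat sequence, uppercase marks spacers.
repeat = "gtgttccc"
spacers = ["AAAATTTT", "CCCCGGGG"]
pre_crrna = build_pre_crrna(repeat, spacers)
crrnas = process_pre_crrna(pre_crrna, repeat, cut_offset=5)
for crrna in crrnas:
    print(crrna)
```

With lowercase repeats and uppercase spacers, each printed fragment shows a spacer carrying a short repeat-derived piece on either side.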
Spacer integrations are polar, occurring at the leader-proximal end of the array, allowing
for the most recently acquired spacers to be nearest the array promoter (upstream of the leader-repeat1
junction). As bacteriophage evolve escape mutations, no longer allowing for full spacer-protospacer
complementarity, the system adapts. Partially annealing crRNA initiates an alternative
spacer acquisition process termed primed acquisition (Datsenko et al., 2012; Richter et al., 2014;
Musharova et al., 2021). Here, the immunological memory bank is updated by stimulating array
integration of new spacer sequences near the obsolete protospacer, amending the target repository.
Figure 1.2 | Overview: the three phases comprising CRISPR-Cas adaptive immunity. The
three phases of CRISPR-Cas adaptive immunity illustrated within a self-contained prokaryotic
cell. CRISPR-Cas adaptation initiates a response to counteract invading nucleic acids. Through
sequence capture, processing and array integration, Cas1-Cas2 generates the infection memory
utilized for host defense. The array is expressed from a promoter in the leader sequence producing
precursor crRNA, with subsequent transcript processing producing mature crRNAs. These
crRNAs guide Cas effector proteins to complementary sequences for target silencing in a process
termed interference.
1.2 CRISPR-Cas diversity
Although CRISPR-Cas loci are rapidly evolving, evolutionary classifications have been
established to distinguish different systems across species. The major classification division groups
all systems into two classes based on fundamental differences in Cas effector mechanisms for
crRNA processing and target interference (Makarova et al., 2020). The class 1 CRISPR-Cas
systems are distinct from class 2 systems in that they possess a multi-subunit protein complex that
binds and processes precursor-crRNA to produce functional, mature crRNAs. These complex
subunits function together with the crRNA to bind protospacer target sequences, often recruiting
another protein such as Cas3 for target processing. As of 2024, class 1 systems are made up of three
types (I, III and IV) and 16 subtypes with types classified based on the signature interference
protein. Cas3, Cas10 and Csf1 are the signature proteins for types I, III and IV respectively. Class
2 systems are defined by a single multi-domain protein such as Cas9 that performs several key
roles independently, including pre-crRNA processing (Type II systems also require a host RNase),
target site binding and target cleavage. Class 2 systems are made up of three types (II, V and VI)
and 20 subtypes. Cas9, Cas12 and Cas13 are the signature proteins for types II, V and VI
respectively.
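The classification scheme above maps naturally onto a small lookup table. The following Python sketch encodes only the classes, types, subtype counts and signature proteins stated in the text; the dictionary layout and the `classify` helper are hypothetical conveniences for illustration.

```python
# Class -> effector architecture, subtype count, and type -> signature protein,
# as stated in the text (after Makarova et al., 2020).
CRISPR_CLASSES = {
    "class 1": {
        "effector": "multi-subunit protein complex",
        "subtypes": 16,
        "types": {"I": "Cas3", "III": "Cas10", "IV": "Csf1"},
    },
    "class 2": {
        "effector": "single multi-domain protein",
        "subtypes": 20,
        "types": {"II": "Cas9", "V": "Cas12", "VI": "Cas13"},
    },
}


def classify(signature_protein):
    """Return (class, type) for a signature protein, or None if it is not one."""
    for class_name, info in CRISPR_CLASSES.items():
        for type_name, protein in info["types"].items():
            if protein == signature_protein:
                return class_name, type_name
    return None


print(classify("Cas3"))
print(classify("Cas9"))
```

Cas1 and Cas2 are deliberately absent from the table: they are not signature interference proteins, so `classify("Cas1")` returns `None`.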
All Cas genes are characterized for their contributions to four primary functions:
adaptation, expression and processing, interference and signal transduction (Figure 1.3). The first
three functions are fundamental to adaptive immunity, whereas signal transduction function is
confined to a small number of CRISPR-Cas systems (mostly Type III) involved in regulatory
activities. Cas1 and Cas2 are the only two proteins common to all identified CRISPR-Cas systems,
suggesting the spacer acquisition mechanism was fundamental in the early evolutionary history of
adaptive prokaryotic defense. Although many systems require only these two Cas proteins to
enable CRISPR adaptation, others also require a third protein such as Cas4 to facilitate the process.
Figure 1.3 | Diversity of CRISPR-Cas systems. Classifications for all characterized CRISPR-Cas
systems. Four functional modules make up CRISPR-Cas systems: adaptation, expression &
processing, interference and, for a small fraction, signal transduction. The two major classes of
systems are differentiated by multi-protein subunit vs. multi-domain single protein
processing/interference mechanisms. The various types and subtypes are further classified by the
signature effector protein and other associated interference proteins. Cas1 and Cas2, proteins that
facilitate spacer acquisition during CRISPR adaptation, are common to all known systems. This
figure is adapted from Makarova et al., 2020.
1.3 E. coli Type I-E CRISPR-Cas
Native E. coli CRISPR-Cas defense is classified as a Type I-E system. This system includes
two distinct CRISPR arrays within the E. coli genome (Figure 1.4). Wild-type array 2 contains 4
preexisting spacers and array 1 contains 13. Array 2 is no longer functional as the leader sequence
(directly upstream of the array) has mutated and is unable to facilitate spacer integrations. Cas1
and Cas2 are expressed from an operon that also includes the five Cascade (CRISPR associated
complex for antiviral defense) genes. Cas3, the helicase-nuclease effector, is expressed
independently of the other components. The CRISPR array is expressed from a promoter contained
within the leader sequence. An integration host factor (IHF) not exclusively involved in CRISPR-Cas
defense is required for spacer acquisition (Yoganand et al., 2017).
Figure 1.4 | Components comprising the E. coli Type I-E CRISPR-Cas system. Two separate
CRISPR arrays derived from this system are contained within the E. coli genome. The CRISPR
locus is made up of cas3, the Cascade genes, cas1-cas2 and the functional array 1. Three different
promoters (Pcas3, Pcas8, PCRISPR) control the expression of all CRISPR-Cas components. An
integration host factor (IHF), though not exclusively a CRISPR component, facilitates spacer
acquisition during CRISPR adaptation.
1.3.1 Type I-E Adaptation
Adaptation is the first step in CRISPR adaptive immunity; this process generates
immunological memory for host prokaryotes by programming target sequences into the CRISPR
array. Arrays for this system are made up of variable 33 base pair spacers separated by 28 base pair
repeat sequences. The repeats are partially palindromic, each forming a hairpin loop in the
expressed RNA transcript. Upstream of the first repeat is the leader sequence, a region containing
the array promoter and sequence motifs required for the integration of new spacers. In the E. coli
Type I-E CRISPR-Cas system, spacer acquisition is performed by proteins Cas1 and Cas2 (Figure
1.5). Together, these proteins form a hetero-hexameric integrase complex made up of two Cas1
dimers connected by a Cas2 dimer. Cas1 contains a nuclease domain allowing for prespacer
processing prior to array-integration. Prespacer substrate is primarily generated by RecBCD
activity during DNA replication (Levy et al., 2015). RecB and DNA polymerase I are required for spacer
acquisition (Ivancic-Bace et al., 2015). The process of repairing stalled replication forks during
replication produces DNA debris that can re-duplex to form potential prespacer substrate for Cas1-
Cas2-complex capture. The size of spacer sequences is determined by the fixed distance between
the two Cas1 dimers (23bp). Spacers, 33bp in length, are made up of this 23bp central duplex and
5bp overhangs at each end (Wang et al., 2015; Jackson et al., 2017). Cas1 proteins have a
protospacer adjacent motif (PAM) binding pocket that is used for prespacer selection (Kim et al.,
2020). The PAM is a short target recognition sequence directly adjacent to the protospacer,
important for both CRISPR adaptation and interference. The PAM sequence is cleaved off during
prespacer processing by a host exonuclease, prior to array integration (Wang et al., 2023). With
the PAM present at protospacer target sites but absent in the repeats adjacent to host-integrated
spacers, it serves as an interference-phase target recognition site differentiating protospacers from
the CRISPR array, preventing host targeting. A PAM sequence is not requisite for Cas1-Cas2
substrate capture but short duplex sequences containing a PAM have a stronger affinity for the
integrase complex (Wang et al., 2015). The PAM also assures the correct spacer orientation for
array integration (Staals et al., 2016; Shmakov et al., 2014; Swarts et al., 2012). The leader
sequence contains an IHF recognition site. Upon docking to this site, the IHF bends the DNA
creating a kink recognized by the Cas1-Cas2 complex. Spacer integration is polar, occurring at the
leader-proximal end of the CRISPR array. The Cas1-Cas2 complex docks with one Cas1 dimer at
the leader-repeat junction and the second at the distal end of this first repeat. The processed
prespacer ends within the integrase complex each perform a nucleophilic attack at opposite ends
of the repeat (Nuñez et al., 2015; Arslan et al., 2014; Rollie et al., 2015). This separates the two
strands of the leader-proximal repeat, with the newly integrated spacer in between. Host DNA repair
enzyme DNA polymerase I fills in the complementary repeat sequences followed by post-synaptic
complex (PSC) resolution to complete the process (Budhathoki et al., 2020). Each newly acquired
spacer expands the array by 61bp. The newest spacers are integrated closest to the array promoter,
ensuring the most relevant crRNAs are expressed first.
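As a quick check on the lengths quoted above, the spacer size and the per-integration array expansion can be reproduced arithmetically. The Python sketch below is illustrative only: the constant and function names are hypothetical, and the 28bp repeat length is the value given in the Figure 1.5 caption.

```python
# Lengths stated in the text for E. coli Type I-E (base pairs).
CENTRAL_DUPLEX_BP = 23  # fixed spacing between the two Cas1 dimers
OVERHANG_BP = 5         # overhang at each end of the central duplex
REPEAT_BP = 28          # leader-proximal repeat duplicated per integration

SPACER_BP = CENTRAL_DUPLEX_BP + 2 * OVERHANG_BP  # duplex plus both overhangs
EXPANSION_BP = SPACER_BP + REPEAT_BP             # one spacer plus one repeat


def array_growth(n_spacers: int) -> int:
    """Total array expansion (bp) after n_spacers integrations (hypothetical helper)."""
    if n_spacers < 0:
        raise ValueError("n_spacers must be non-negative")
    return n_spacers * EXPANSION_BP


if __name__ == "__main__":
    print(SPACER_BP, EXPANSION_BP, array_growth(3))
```

The two derived constants should match the 33bp spacer and 61bp expansion quoted in the text.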
The process of acquiring spacers from newly encountered MGEs is called naïve CRISPR
adaptation (Fineran et al., 2012). These spacers are generally derived from replication debris
generated during the repair of stalled replication forks (Levy et al., 2015). The greater number of
replication origins from multi-copy MGEs and relative dearth of chi sites compared to the E. coli
genome results in a significant bias for the acquisition of foreign spacers. This bias however does
not preclude the acquisition of host-chromosome derived spacers. It’s unclear if these self-targeting
spacers are mostly a detrimental tradeoff to the benefits of adaptive immunity or if they generally
embody alternative, beneficial functions. There are some examples where these host-derived
spacers play a functional role including gene regulation and editing (Du et al., 2022; Chao et al.,
2023; Wimmer et al., 2019). Target protospacers and their cognate PAMs can evade CRISPR
destruction through mutation (Semenova et al., 2011; Deveau et al., 2008; Fineran et al., 2014).
When this occurs, the host CRISPR system can trigger an updating of spacers from the target
protospacer region in a process termed “primed” acquisition (Datsenko et al., 2012; Staals et al.,
2016). During priming, spacers are rapidly captured from the regions surrounding the partially
mismatched spacer:protospacer, providing the host with fully updated and functional targeting
spacers. Cascade complex containing a mismatching spacer forms an RNA-DNA duplex with the
priming protospacer leading to the formation of a primed acquisition complex (PAC). The PAC is
made up of the bound Cascade, helicase-nuclease Cas3 and the Cas1-Cas2 complex. The PAC then
identifies a PAM and cognate 33bp spacer near the priming protospacer. The new protospacer is
loaded into Cas1-Cas2 and excised from the DNA as Cas3 degrades the surrounding region (Swarts
et al., 2012). Host nucleases RecBCD and RecJ are also required for the processing of the prespacer
(Shiriaeva et al., 2022).
Figure 1.5 | Type I-E CRISPR adaptation. The Cas1-Cas2 integrase complex catalyzes the array integration of new spacers at the leader-repeat junction. Spacer sequences are derived from either
MGEs or the host genome. Candidate prespacers contain a PAM that is cleaved off during
processing, prior to integration. PAMs differentiate true protospacer targets from CRISPR array
sequences. In E. coli Type I-E CRISPR, newly acquired spacers generally expand the array by
61bp. This expansion includes the introduction of a spacer sequence (33bp) and duplication of the
leader-proximal repeat (28bp). Purple arrows represent PCR primers.
1.3.2 Type I-E Expression and processing
Immunological targets for CRISPR-Cas defense are defined by the spacer sequences within
expressed CRISPR arrays. The array is initially expressed as a long precursor crRNA containing
both the alternating repeats and spacers on one transcript, along with 54 non-crRNA bases from
the leader sequence, between the array promoter and first repeat. The repeat sequences are partially
palindromic, each forming a stem loop in the RNA transcript between the spacers. One of the
Cascade proteins, Cas6, is an endoribonuclease. Cas6 binds to this stem loop at a site designated
CBS (Cas6 binding site) nucleating the formation of Cascade and producing an RNA-Cascade
complex (Makarova et al., 2020). Cas6 cleaves the RNA sequence on one side of the stem loop
origin, separating individual RNA sections, each made up of one spacer and part of both flanking
repeats. The stem loop occupies the 3’ end (Niewoehner et al., 2014; Sashital et al., 2011; Gao et
al., 2022). This double cleavage at adjacent stem loops generates functional, mature crRNAs
(Figure 1.6).
Figure 1.6 | Type I-E CRISPR expression and processing. The CRISPR array is expressed from
a promoter within the leader sequence. The resulting pre-crRNA transcript is made up of
alternating repeats and spacers (also included is a short leader derived sequence). Partially
palindromic sequences form stem loops containing a Cas6 binding site. Cas6 cleaves one side of
each stem loop base, producing mature crRNAs made up of a single spacer and sections from each
adjacent repeat. This figure was adapted from Liu et al., 2020.
1.3.3 Type I-E Interference
CRISPR-Cas Type I-E surveillance is achieved by a ribonucleoprotein complex made up
of Cascade bound by a crRNA. The role of Cascade is to bind a PAM sequence and unwind
adjacent DNA complementary to the spacer sequence within the crRNA. This complex maintains
an elongated seahorse-like architecture made up of 11 protein subunits of five types (Jackson et
al., 2015). As mentioned previously Cas6 binds to the crRNA stem loop cleaving one side of its
base, establishing the 3’ end of the complex. Six copies of the Cas7 protein polymerize along the
length of the crRNA forming a helical filament as the backbone of the complex. A single Cas5
protein binds to the foot of the complex with affinity for the 5’ end of the crRNA. Cas5 binds to
the end of the Cas7 chain capping the crRNA-spanning backbone. A pair of Cas11 proteins, known
as the “small subunit” assemble on the inner core of the complex by interacting with the Cas7
backbone through salt bridges that help to stabilize crRNA-DNA interactions (Venclovas et al.,
2016; Zhao et al., 2014; van Erp et al., 2015). A single Cas8 protein known as the large subunit
also binds to the 5’ end of the crRNA, directly adjacent to Cas5, rounding out the complex.
There are two main steps that comprise CRISPR targeting in the Type I-E system. The first
step is for the Cascade ribonucleoprotein surveillance complex to identify and bind to targets
with crRNA complementarity. Target recognition is achieved through identification of both a PAM
as well as a crRNA-complementary sequence in dsDNA targets, leading to high affinity binding
(Gleditzsch et al., 2019; Sashital et al., 2012). Once bound, the second step is for Cascade to recruit
Cas3 endonuclease for target sequence destruction (Figure 1.7). PAM seeking is performed by
Cascade as it scans DNA sequences predominantly through 3D search, binding transiently at PAM
sites to partially unwind the DNA and check for a spacer:protospacer match (Redding et al., 2015).
The amount of time used to probe a given site is about the same time spent diffusing to the next
site (Vink et al., 2020). The E. coli PAM recognition system is well studied, revealing a rather
broad range of sufficient 3bp motifs capable of granting adjacent sequence interrogation. These
potential PAMs include 5’-A-(T/C/A)-G-3’ with the consensus sequence being AAG (Shipman et
al., 2015). The Cas8 protein within Cascade recognizes the PAM through a lysine finger, glutamate
wedge and glycine loop (Hayes et al., 2016). Crucially, Cascade cannot recognize the PAM motif
5’-CCG-3’ the sequence found at the end of array repeats, directly adjacent to each host residing
spacer (Xue et al., 2015; Leenay et al., 2016).
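The PAM-first search described above can be illustrated with a toy sequence scan. The sketch below is a simplification and not a model of Cascade itself: it scans a single strand only, assumes the PAM sits immediately 5' of a 33bp protospacer on that strand, and uses a hypothetical function name. The motif set is the 5'-A-(T/C/A)-G-3' family listed in the text, so the repeat-derived CCG is not matched.

```python
import re

# PAMs listed in the text: 5'-A-(T/C/A)-G-3' (consensus AAG).
# A lookahead is used so that overlapping motifs are all reported.
PAM_PATTERN = re.compile(r"(?=(A[TCA]G))")
SPACER_BP = 33  # spacer length stated in the text


def find_candidate_protospacers(seq: str, spacer_len: int = SPACER_BP):
    """Return (pam_start, pam, protospacer) tuples for one strand.

    Assumption for illustration: the protospacer is the spacer_len bases
    immediately 3' of the PAM on the scanned strand.
    """
    seq = seq.upper()
    hits = []
    for match in PAM_PATTERN.finditer(seq):
        start = match.start()
        protospacer = seq[start + 3:start + 3 + spacer_len]
        if len(protospacer) == spacer_len:  # skip PAMs too close to the end
            hits.append((start, match.group(1), protospacer))
    return hits


if __name__ == "__main__":
    demo = "TTAAG" + "C" * 33 + "CCG" + "T" * 10
    for hit in find_candidate_protospacers(demo):
        print(hit)
```

In the demo sequence only the AAG site yields a candidate; the CCG motif is skipped, mirroring how the repeat-adjacent sequence escapes recognition.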
With the recognition of a correct target sequence, Cascade initiates unwinding of the
dsDNA, allowing for the crRNA sequence to anneal to the complementary DNA strand. This
hybridization leads to the formation of a nucleic acid structure called an R-loop where the
displaced, non-target strand forms a looped protrusion (Aguilera et al., 2012; Santos-Pereira et al.,
2015). Positively charged amino acids on Cas8 and Cas5 direct the non-target strand away from
the target strand, towards the Cas11 subunits (Pausch et al., 2017; Guo et al., 2017). The formation
of this R-loop induces a conformational change in Cascade, locking the R-loop in place and leading
to the recruitment of Cas3, the target degrading enzyme (He et al., 2020; Rollins et al., 2019). The
conformational changes required for Cas3 recruitment occur in the Cas8 subunit (Xue et al., 2016;
van Erp et al., 2018). Cas3 has an N-terminal HD nuclease domain and a C-terminal helicase
domain, and both activities are required for target degradation (Mulepati et al., 2013). Initially the
Cas3 generates a nick (ssDNA break) on this non-target strand (Loeff et al., 2018). While staying
bound to the Cascade complex, Cas3 reels in the ssDNA sequence through the helicase domain,
degrading it as the sequence unwinds and moves through the HD nuclease domain (Redding et al.,
2015; Nimkar et al., 2020). Cas3 has also been shown to detach from the Cascade complex and
translocate along the ssDNA sequence though no evidence for continued degradation has been
observed through this movement. This mechanism describes the process for non-target strand
degradation but it’s not clear how the target strand itself degrades. Target strand exposure upon
initial non-target strand degradation may lead to the recruitment of a second Cas3 capable of
nicking and degrading this complementary sequence.
Figure 1.7 | Type I-E CRISPR interference. The Cascade complex loaded with a crRNA
identifies a target sequence first through PAM recognition and binding prior to crRNA/target-sequence hybridization. The displaced non-target strand forms an R-loop, changing the conformation of
the Cascade, resulting in Cas3 recruitment and binding to the complex. Cas3 generates a nick in
the non-target strand. It then reels in the ssDNA sequence through its helicase domain degrading
the sequence through an HD nuclease domain. Cas3 has also been observed dissociating from the
Cascade complex and translocating laterally along the DNA sequence. This figure was adapted
from Liu et al., 2020.
1.4 S. pyogenes Type II-A CRISPR-Cas
The most well-known CRISPR-Cas protein is the Type II-A Cas9 effector enzyme of
Streptococcus pyogenes (Pennisi et al., 2013). This large endonuclease functions as a crRNA-bound ribonucleoprotein that can independently detect, secure and degrade target protospacer
sequences (Sapranauskas et al., 2011). Across a wide range of biological and in vitro research, wild
type and modified Cas9 variants have been harnessed to revolutionize genetic/epigenetic
engineering and transcriptional control. The broad adoption of this system is due to its simplicity
and efficacy, as it only requires defining a target sequence through RNA programming and
expressing the Cas9 protein, making it cheap and easy to use.
Cas9 serves as the Type II effector enzyme mediating CRISPR-Cas interference. As with
the Type I system of E. coli, several other components are necessary for CRISPR adaptation and
crRNA biogenesis to enable functional adaptive immunity. CRISPR adaptation and crRNA
processing in Type II systems, as with CRISPR interference, are distinctly different than the
homologous mechanisms in the Type I-E system of E. coli. The type II-A system of S. pyogenes is
comprised of the proteins Cas1-Cas2 and Csn2, a CRISPR array, a trans-activating crRNA
sequence (tracrRNA) and the aforementioned Cas9 effector nuclease (Figure 1.8). The Type II-A
CRISPR adaptation mechanism requires not just Cas1-Cas2 but all components of the system.
Cas1, Cas2 and Csn2 are involved in spacer acquisition and processing although the exact function
of Csn2 is not well understood. The Cas9-tracrRNA complex identifies PAM sequences within
invading DNA, enhancing prespacer selectivity for Cas1-Cas2 processing (Heler et al., 2015; Wei
et al., 2015). As with the Type I-E system previously discussed, spacers are integrated at the leader
proximal end of the array but unlike the E. coli system an integration host factor is not needed for
integration site recognition. Instead, a five base pair anchoring site located at the leader-repeat
boundary directs the integrase complex into position for precise spacer integrations (McGinn et al.,
2016; Wright et al., 2016). The array is expressed as a long precursor crRNA transcript, which is
then processed into mature crRNA facilitated by the tracrRNA sequence. Transcribed tracrRNA
contains a sequence complementary to CRISPR-array repeats. Each tracrRNA anneals to a pre-crRNA repeat, which is further stabilized by Cas9 recognition and binding to tracrRNA secondary
structure. The tracrRNA:crRNA-Cas9 is then processed by a host ribonuclease III (RNase III) and
other unknown factors, producing mature crRNAs 39-42 nucleotides long. The ribonucleoprotein
complex is then capable of surveillance and target interference (Deltcheva et al., 2011).
Figure 1.8 | Components of CRISPR-Cas defense in the Type II-A system of S. pyogenes. All
components displayed are involved in the adaptation phase resulting in spacer acquisition. Array
expression stimulates crRNA biogenesis with the pre-crRNA processed into mature crRNAs via
tracrRNA, Cas9 and host factors including RNase III. Target recognition and interference are
performed by the Cas9-crRNA ribonucleoprotein.
The Cas9 protein contains six different domains including a C-terminal PAM-recognition
domain and two nuclease domains (HNH and RuvC) together capable of generating a double
strand DNA break within crRNA complementary protospacers (Jinek et al., 2014; Nishimasu et
al., 2014). The Cas9 first searches for an NGG protospacer adjacent motif. Temporary binding
includes partial unwinding of the upstream sequence for protospacer interrogation (Sternberg et
al., 2014). If annealing does not occur, Cas9 quickly detaches and continues the search by
continuing to sample PAMs. If full base-pairing occurs in the unwound region upstream of the
PAM, as seen in Figure 1.9, this leads to the formation of an R-loop by the displaced, PAM-containing non-target strand (Jiang et al., 2016). R-loop formation initiates a conformational
change in the HNH nuclease domain leading to cleavage of both DNA strands with HNH breaking
the target strand and RuvC breaking the non-target strand, producing a blunt end dsDNA cleavage
(Jinek et al., 2012; Dagdas et al., 2017). Cas9 can also function as a negative transcriptional
regulator binding the Cas operon to repress expression of the downstream genes (Workman et al.,
2021). This activity is performed in conjunction with a “long form” tracrRNA expressed from a
different promoter than the tracrRNA utilized in crRNA biogenesis (Deltcheva et al., 2011). This
longer tracrRNA folds into a tertiary structure that presents a sequence with complementarity to
the Cas operon promoter, allowing for Cas9:tracrRNA to bind and repress expression.
Figure 1.9 | Cas9 effector enzyme protospacer processing. Cas9 bound by crRNA:tracrRNA
first identifies the protospacer adjacent motif (NGG) before upstream DNA is unwound and the
spacer anneals to the complementary protospacer sequence. With two endonuclease domains the
Cas9 generates two nicks, one on the protospacer and one on the opposite non-target strand,
producing a blunt end dsDNA break. Image is from Integrated DNA Technologies (IDT).
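The Cas9 search logic summarized above (find an NGG PAM, then test the upstream sequence against the spacer) can be sketched as a string match. This is a minimal illustration under stated assumptions rather than a targeting tool: it inspects one strand only, requires a perfect match, and places the blunt cut 3 bp upstream of the PAM, a commonly cited position that is not given in the text. The function and constant names are hypothetical.

```python
import re

CUT_OFFSET = 3  # assumption: blunt cut 3 bp upstream of the PAM (not stated in the text)


def find_cas9_targets(seq: str, guide: str) -> list:
    """Return cut indices where `guide` lies directly upstream of an NGG PAM.

    Only the given strand is scanned and only perfect matches are reported.
    """
    if not guide:
        raise ValueError("guide must be a non-empty sequence")
    seq, guide = seq.upper(), guide.upper()
    cuts = []
    for match in re.finditer(r"(?=[ACGT]GG)", seq):  # every NGG, including overlaps
        pam_start = match.start()
        upstream = seq[max(0, pam_start - len(guide)):pam_start]
        if upstream == guide:  # full base-pairing upstream of the PAM
            cuts.append(pam_start - CUT_OFFSET)
    return cuts


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"
    target = "TT" + guide + "AGG" + "TT"
    print(find_cas9_targets(target, guide))
```

A matching protospacer without an adjacent NGG, or an NGG without a matching upstream sequence, returns no cut sites, which is the behavior the PAM-first search implies.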
1.5 Surviving CRISPR-Cas self-genome targeting in E. coli
Prokaryotes can use template-based homologous recombination (HR) to accurately repair
double-strand DNA breaks that may occur in the chromosome (Cui and Bikard et al., 2016). This
generally takes place during the logarithmic growth phase when cells are replicating and multiple
copies of genome sequences are present in single cells at the same time (Pitcher et al., 2007). This
allows the undamaged DNA sequence to act as a template for repairing the damaged strand. This
HR occurs via multi-subunit helicase-nuclease complexes such as RecBCD that process the free
DNA ends and catalyze HR repair (Wigley., 2013). In E. coli, RecBCD will resect one of the two
DNA strands from a free DNA end, producing a ssDNA sequence subsequently loaded with RecA
proteins that promote homologous recombination with the template DNA sequence. This process
generally results in faithful DNA repair, restoring the original sequence and maintaining genome
integrity. When template DNA is not available prokaryotes rely on other pathways to process and
repair the free DNA ends (Bowater and Doherty., 2006; Brissett and Doherty 2009). Two non-template repair pathways present in some prokaryotes are non-homologous end joining (NHEJ)
and alternative end joining (AEJ). NHEJ is a robust DNA repair pathway in eukaryotic cells.
Bacterial species with NHEJ generally have a simplified version, made up of just two proteins, Ku
and LigD (Della et al., 2004; Pitcher et al., 2005). Ku proteins form around the broken DNA ends
as a ring-like structure, protecting them from exonuclease degradation and recruiting the LigD
enzyme. LigD is a large protein containing nuclease, polymerase and ligase domains (Shuman and
Glickman., 2007). LigD utilizes nuclease and polymerase activities to process broken DNA
sequence before catalyzing an ATP-dependent ligation of the two ends (Zhu et al., 2005; Nair et
al., 2010). AEJ is another template-independent DNA repair pathway native to E. coli and
commonly known as microhomology-mediated end joining (MMEJ). AEJ is not a robust repair pathway,
but it allows for the repair of broken DNA ends with very little DNA synthesis necessary at the
break site (Chayot et al., 2010). RecBCD first performs DNA end resection in search of DNA
microhomologies (1-9 bp) on both sides of the break site. When these matching homologies are
identified repair is facilitated by the NAD-dependent DNA ligase A, restoring a continuous
sequence and leaving behind a deleted region the size of the sequence between the original two
microhomologies.
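The outcome of microhomology-mediated repair can be illustrated at the sequence level. The sketch below is a toy model with a hypothetical function name and parameters: it takes a break position, looks for the pair of identical short sequences on either side of the break that gives the smallest deletion, and joins the ends while keeping a single copy of the microhomology. In this model the lost bases are therefore the intervening sequence plus one copy of the microhomology. Resection by RecBCD and ligation by ligase A are not modeled.

```python
def mmej_repair(seq: str, break_pos: int, mh_len: int = 4):
    """Toy microhomology-mediated end joining.

    seq       : intact sequence before the break
    break_pos : index of the first base to the right of the break
    mh_len    : microhomology length to search for (text cites 1-9 bp)

    Returns (repaired_sequence, bases_deleted), or None if no
    microhomology of length mh_len flanks the break.
    """
    if mh_len < 1:
        raise ValueError("mh_len must be at least 1")
    best = None  # (left_start, right_start, bases_deleted)
    for i in range(0, break_pos - mh_len + 1):          # motif left of the break
        left = seq[i:i + mh_len]
        for j in range(break_pos, len(seq) - mh_len + 1):  # motif right of the break
            if seq[j:j + mh_len] == left:
                deleted = j - i
                if best is None or deleted < best[2]:
                    best = (i, j, deleted)
                break  # the nearest match for this i deletes the least
    if best is None:
        return None
    i, j, deleted = best
    # Keep everything left of the first motif, then resume at the second motif.
    return seq[:i] + seq[j:], deleted


if __name__ == "__main__":
    # Break between "...GCTATT" and "CCGCTA..."; GCTA flanks both sides.
    print(mmej_repair("AAAAGCTATTCCGCTAGGGG", break_pos=10))
```

Because the repaired product no longer contains the sequence between the two microhomologies, a protospacer located there would be removed and could not be retargeted.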
Several studies have reported the cytotoxicity of CRISPR-Cas self-genome targeting in
bacteria and archaea, suggesting an inability for either HR or AEJ to efficiently repair the
damaged DNA (Jiang et al., 2013; Vercoe et al., 2013; Citorik et al., 2014; Li Y. et al., 2016). Self-
targeting can however be overcome in the presence of a repair template that omits the original
protospacer sequence. If the protospacer is still present after repair, the site can be retargeted by
CRISPR effectors. A designed template containing omissions beyond just the protospacer can
produce cells with a precisely defined protospacer-adjacent deletion (Csörgő et al., 2020). NHEJ
activity can also facilitate the repair of dsDNA breaks in E. coli. Cas9-catalyzed dsDNA breaks
repaired via Ku-LigD result in deletions of variable size (Zheng et al., 2017). Likewise,
bacteriophage T4 DNA ligase can function as a single protein NHEJ repair pathway in E. coli,
increasing survivability of self-targeting and producing deletions variable in size (Su et al., 2019).
The various possible outcomes of CRISPR-Cas self-targeting are outlined in Figure 1.10A. NHEJ
and AEJ based repair impact the target sequence by producing an indel or deletion, respectively,
allowing for repair without retargeting. Faithful DNA damage repair back to the original sequence
can apply a strong selective pressure to disrupt self-targeting, as the unchanged sequence would
require continual repair for survival. Naturally occurring mutations in the protospacer itself,
preventing spacer complementarity, or in the PAM, preventing effector recognition can eliminate
the threat of specific spacers (Figure 1.10B). Mutations can also arise in the CRISPR-array either
preventing transcription altogether or transcribing mutated crRNA sequences. Finally, mutations
in the CRISPR-associated or host facilitating machinery can also allow cells to escape the harms
of self-targeting.
Figure 1.10 | Effects of CRISPR-Cas self-genome targeting. (A) Cas9 self-targeting can result
in several distinct outcomes: i) cell death resulting from damage to an essential sequence or the
inability for a cell to repair the cleaved DNA, ii) DNA template based repair utilizing RecBCD
end resection and RecA to stimulate homologous recombination restoring the original sequence,
iii) non homologous end joining (NHEJ) can ligate unrelated DNA ends often resulting in small
indels, iv) alternative end joining (AEJ) utilizes RecBCD end resection and microhomology-mediated repair to ligate the two ends resulting in loss of the sequence between the homologies.
(B) Strong selective pressure due to self-genome targeting can lead to the emergence of distinct
types of escape mutations: i) mutations in the protospacer or PAM preventing target
complementarity or recognition, respectively, ii) modified CRISPR array sequence preventing
target-site recognition, iii) mutations in the Cas machinery averting functional self-targeting. This
figure was adapted from Wimmer and Beisel., 2020.
1.6 Research Aims
Tools derived from prokaryotic CRISPR-Cas systems have revolutionized biological
research. Genetic engineering has greatly benefitted from the explosion in tools developed for
nucleic acid targeting and modification. These highly specific genetic recognition tools such as
Cas9 come from the interference machinery utilized by prokaryotes for defense during infection.
The capabilities of the Cas1-Cas2 adaptation machinery may not be as broadly applicable as
precision targeting but their use is the central function for several emerging technologies.
Expression of CRISPR-Cas genes cas1 and cas2 in the presence of at least one CRISPR array but
the absence of interference machinery allows cells to record the presence of nucleic acids in the
intracellular environment. Spacers are incorporated through polar array integration, recording the
sequence of acquisitions in chronological order. This capability enables continuous recording of
various biological events within living cells in situ. Research presented in the following chapters
is primarily based on further illuminating the fundamental nature of spacer acquisition and
enhancing acquisition capabilities. We also explore the application potential for CRISPR
interference machinery (e.g. Cascade-Cas3; Cas9) to be guided by crRNAs derived from Cas1-
Cas2 acquired spacers. CRISPR-Cas targeting applications generally define specific targets for
which the corresponding spacers are synthesized to program crRNA expression. Through cas1-
cas2 expression we can randomize targeting in bacterial cultures, exploring suitable applications
including genome streamlining and genome optimization under stress conditions.
Chapter 2
Assessing spacer acquisition rates in E. coli
Type I-E CRISPR arrays
A portion of the work presented in this chapter has been accepted for publication with the journal
Frontiers in Microbiology.
2.1 Introduction
CRISPR-Cas defense enables adaptive invader targeting through an updating array of
clustered regularly interspaced short palindromic repeats (CRISPR) containing a repository of
immunological targets (spacers) stored in the host chromosome. Arrays are expressed and
processed into short RNA sequences (crRNA) that guide CRISPR-associated (Cas) effectors to
eliminate targets with crRNA complementarity (Barrangou et al., 2007; Garneau et al., 2010;
Marraffini, 2015). Upon infection, the CRISPR-Cas immune response begins with an adaptation
phase whereby a small fraction of infected cells incorporates invader derived spacers between
repeat sequences within an array. Acquisition of spacers from sources not previously encountered
or in the absence of Cas effector machinery is referred to as naïve spacer acquisition (Fineran et
al., 2012).
In vivo spacer acquisition studies often utilize plasmid-based arrays and deep sequencing
to identify newly integrated spacer sequences. These studies have been crucial for expanding our
fundamental understanding of CRISPR adaptation and for the development of new applications by
providing insight into the relative differences in acquisition frequencies of specific spacer
sequences (Heler et al., 2019; Sheth et al., 2017), yet the rates at which new spacers are integrated into
native arrays have not been rigorously studied. Although many mechanistic details of spacer
acquisition have been reported (Arslan et al., 2014; Ivančić-Baće et al., 2015; McGinn and
Marraffini, 2016, 2019; Nuñez et al., 2015), the temporal dynamics of this process and how these
dynamics are modulated by cellular parameters are understudied. Several promising spacer-recording applications are being developed that may benefit from a broader understanding of naïve
acquisition and a simple method to detect rate changes. These include recording intra- and
extracellular biological events within a lineage of cells over time (Munck et al., 2020; Sheth et al.,
2018), long term, ordered recording of transcriptional events (Lear et al., 2023) and digital-to-biological data storage (Shipman et al., 2017; Yim et al., 2021). In this work, naïve spacer
acquisition rates are quantified for Escherichia coli Type I-E CRISPR (Koonin et al., 2017). We
calculate mean spacer acquisition rates per cell and identify intracellular factors that modulate
these rates.
Spacer integrations are carried out by the Cas1-Cas2 integrase complex. This process not
only immunizes the host but generates a heritable and chronological memory bank of infection
history (Amitai et al., 2016; Jackson et al., 2017; Sternberg et al., 2016). CRISPR arrays identified
in wild-type bacterial genomes contain up to a few hundred spacers (Martynov et al., 2017; Pourcel
et al., 2020). The size of an array repertoire is optimized to maintain the diversity proportional to
the environmental threat, while being small enough to avoid diluting interference machinery with
obsolete spacers. Spacers are derived from sequences that contain a protospacer adjacent motif
(PAM), a short sequence that differentiates the array spacer from the protospacer target (Wang et
al., 2015). Spacer integrations are polarized, generally occurring at the leader-end of the array
(Bernick et al., 2012). Directly upstream of the array, the leader sequence contains the CRISPR
promoter and segments required for spacer integration (Díez-Villaseñor et al., 2013; Mitić et al.,
2023; Wei et al., 2015). The leader proximal repeat is duplicated with each spacer addition resulting
in array expansion by the combined length of these two elements (Jackson et al., 2017; Yosef et al.,
2012). Spacer integrations in E. coli Type I-E CRISPR usually expand the array by 61 base pairs
(33bp spacers, 28bp repeats) (Shipman et al., 2016). In this system arrays are expressed as
precursor crRNA and subsequently processed into mature crRNA by the Cas6 enzyme of the
CRISPR associated complex for antiviral defense, known as Cascade. The Cascade complex is
comprised of five different subunits (Cas5, Cas6, Cas7, Cas8 and Cas11) forming an elongated 11-
protein architecture. Each crRNA is made up of a spacer and part of each adjacent repeat. Cascade
is guided by crRNA to a target sequence (protospacer) complementary to the spacer-derived region
within the crRNA. Once bound to a target, Cascade recruits helicase-nuclease Cas3 to degrade the
DNA (He et al., 2020; Liu and Doudna et al., 2020; Mulepati et al., 2011; Yoshimi et al., 2022).
This defense strategy enables adaptive invader-targeting by updating the array as foreign DNA is
encountered over time (Bolotin et al., 2005; Mojica et al., 2005; Pourcel et al., 2005).
Several studies have quantified spacer acquisition under laboratory conditions. Genomic
array deep-sequencing data has been used to quantify array-expanded fractions from Cas1-Cas2
expressing cultures at a single time point post induction (Levy et al., 2015); however, this provides
limited insight into acquisition rates. PCR amplifications using primers flanking the leader-repeat1
integration site produce amplicon band intensities with ratios proportional to the expanded-array
subpopulations. This assay was used to accurately measure expanded fractions from a CRISPR-adapted culture (Yosef et al., 2023), but also at a single time point post induction. Plasmid
barcoding has been utilized to identify independent acquisition events in bacterial cultures to
characterize relative rates of spacer acquisition (Heler et al., 2017). This method can provide
accurate acquisition rate comparisons between strains but does not elucidate the extent of
acquisition per cell in bacterial cultures.
In this study, strains of E. coli were engineered for controlled expression of Cas genes to
quantify CRISPR-array spacer acquisition dynamics. PCR and DNA gel electrophoresis were
utilized to measure the extent of spacer acquisition in genomic CRISPR arrays within bacterial
cultures over multi-day serial passage experiments. By tracking array expansion within
populations of E. coli, rates of spacer acquisition were calculated. We modified several
intracellular parameters and quantified their respective impacts on spacer acquisition rates. These
included Cas1-Cas2 expression levels, the presence of a high copy number plasmid, the presence
of multiple CRISPR arrays in the genome, and expression of heterologous non-homologous end
joining (NHEJ) genes from Mycobacterium smegmatis. NHEJ expression significantly enhanced
spacer acquisition rates, with this increased CRISPR adaptation providing greater phage infection
defense. In modeling spacer acquisition from the array expansion data, it appeared that spacer
acquisition slowed for populations of cells with longer arrays. Model parameterization identified
reduced fitness associated with array-expanded populations as the likely cause, which was
subsequently supported with competition experiments.
2.2 Materials and Methods
2.2.1 Assessing the accuracy of PCR quantifications
Polymerase chain reaction (PCR) was utilized to track the extent of CRISPR-array
expansion within cas1-cas2 induced cultures. Spacer integration into CRISPR arrays is polar,
occurring at the leader-proximal end of the array, between the leader sequence and the first repeat.
Using a PCR primer pair that flanks this site of integration captures the presence and the extent of
spacer acquisition from a culture sample, as the genomic DNA template produced from a sample
contains array lengths proportional to the array-length subpopulations within a culture. PCR
amplifies the source template DNA to produce a large enough number of amplicons that can be
separated and visualized on an agarose gel through electrophoresis. After separation, these
amplicons can be quantified to differentiate population proportions. The amplification process can
be influenced by PCR bias, a disparity between the true template proportions within the population
and the amplicon proportions quantified on agarose gels. This bias can be mitigated by increasing
the amount of DNA template used in the PCR reactions and reducing the total number of
amplification cycles (Polz et al., 1998). Standard PCRs utilize 1-2ul DNA, running through 28-34
amplification cycles. We modified this protocol to 5ul template and 21 cycles.
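The quantification step described above, converting amplicon band intensities into array-length subpopulation proportions, can be sketched as a simple normalization. The code below is a hypothetical illustration and not the analysis pipeline used in this work: it assumes each band intensity is directly proportional to the number of template molecules of that array length, and the function names and example intensities are invented for the sketch. Because gel stains generally report DNA mass, longer amplicons may be somewhat over-represented, which is one reason known-proportion controls are useful.

```python
def expanded_fractions(band_intensities: dict) -> dict:
    """Convert band intensities to population fractions.

    band_intensities maps the number of newly acquired spacers
    (0 for the parental band, 1 for +1, ...) to a measured intensity.
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return {n: value / total for n, value in band_intensities.items()}


def mean_spacers_per_cell(fractions: dict) -> float:
    """Average number of new spacers per cell implied by the fractions."""
    return sum(n * fraction for n, fraction in fractions.items())


if __name__ == "__main__":
    # Invented intensities for the +0, +1 and +2 bands of one lane.
    lane = {0: 800.0, 1: 150.0, 2: 50.0}
    fractions = expanded_fractions(lane)
    print(fractions)
    print(mean_spacers_per_cell(fractions))
```

Tracking the mean over successive passages is one way such per-lane values could be turned into an acquisition rate, under the same proportionality assumption.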
To test the accuracy of this approach we set up control experiments that allowed us to
quantify the amplicon products from mixtures of two different clonal cultures. For example,
overnight clonal cultures from a parental +0 strain and a +1 strain were mixed together in different,
known proportions. For each clonal-culture pair six different mixtures were assessed. For a
hypothetical clonal pair A and B, the mixtures were 0%(A) 100%(B), 20%(A) 80%(B), 40%(A)
60%(B), 60%(A) 40%(B), 80%(A) 20%(B), and 100%(A) 0%(B). Prior to mixing, the overnight
cultures were normalized to OD600 by diluting the denser culture with LB media and mixing
together volumes proportional to the desired population proportions. Immediately after preparing
the six mixtures, samples were taken from each to prepare genomic DNA template for PCR. This
was done by mixing 15ul culture mixture with 15ul water in PCR tubes. The tubes were heated to
95°C for 15 minutes to extract the DNA. Tubes were then centrifuged to pellet cell debris, leaving
the genomic DNA template in the supernatant. PCRs were run as described previously using 5ul
template to generate amplicon products. These products were loaded into wells of 2% TBE-agarose
gels to separate the amplicons by size. The parental +0 amplicon product is 379 base pairs with
each spacer addition expanding the amplicon by 61 base pairs. TBE was used instead of TAE as it
is better at resolving smaller amplicons (within the expected size range). Relatively dense agarose
gels (2%) expand the lower half of the DNA ladder allowing for sufficient separation of amplicons
for individual quantification. Using GelAnalyzer software the band intensities for each of the two
amplicons were quantified. Several different clonal pairs were assessed to correlate the known
proportions of strains in each mixture to the quantified proportions calculated after amplifying and
size separating the starting DNA material. The mixed pairs assessed were as follows: +0 and +1,
+0 and +2, +1 and +2, +3 and +4, +4 and +5, with each set run in triplicate. The expanded clones
had all been isolated from a spacer recording strain that had been induced for between 5 and 11
days of Cas1-Cas2 expression. Gel images and charts for two of these quantifications (+0/+1,
+0/+2) are seen in Figure 2.1AB. The average correlation from the five mixed pairs is shown in
Figure 2.1C. The average error between amplicon quantifications and mixed proportions was
±3.62% across the 15 replicates.
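The bookkeeping behind this control can be sketched in a few lines of Python. The amplicon sizes follow from the 379 bp parental product and the 61 bp added per spacer stated above. The measured percentages are invented placeholders, and mean absolute error is only one plausible way to summarize the disagreement, so this should be read as an illustration and not as the calculation used for Figure 2.1.

```python
# Sketch: expected amplicon sizes and the error of one mixture series.
# The measured values are made-up placeholders, not data from this study.

PARENTAL_BP = 379   # +0 amplicon length (from the text)
SPACER_BP = 61      # length added per new spacer (from the text)


def amplicon_size(new_spacers):
    """Expected amplicon length for an array that gained `new_spacers` spacers."""
    return PARENTAL_BP + SPACER_BP * new_spacers


def mean_abs_error(known, measured):
    """Mean absolute difference, in percentage points, between two series."""
    if len(known) != len(measured):
        raise ValueError("known and measured must have the same length")
    return sum(abs(k - m) for k, m in zip(known, measured)) / len(known)


# Known percentage of strain A in the six mixtures described above.
known_pct_a = [0, 20, 40, 60, 80, 100]
# Hypothetical percentages recovered from gel band quantification.
measured_pct_a = [0.0, 23.1, 43.5, 57.2, 77.9, 100.0]

sizes = [amplicon_size(n) for n in range(6)]
error = mean_abs_error(known_pct_a, measured_pct_a)
print(sizes)
print(round(error, 2))
```

Averaging this quantity over all mixture series would give a single summary figure comparable in spirit to the one reported above.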
In addition to using the clonal-strain mixtures as control experiments a second control
experiment was established to further scrutinize the accuracy of the PCR-product quantification
method. This second approach required screening individual clonal colonies isolated from cas1-cas2
induced cultures. From PCRs of cas1-cas2 induced cultures, the relative proportions of the array-length
subpopulations can be determined from a single culture sample and PCR. These proportions
can also be identified by isolating clones from the culture, PCR screening each individually, and
determining array-length population proportions. The second control experiment compared these
two approaches, with clonal screening as the benchmark, since this measures the true population
proportions with sufficient sample size. Cultures from two strains (BL21AI-Cas1Cas2 recording
strain with and without pUC19 present) were serial passaged once daily for five days, with constant
induction of cas1-cas2. At the 120h mark the two cultures were each serial diluted and plated onto
LB agar to isolate individual cells within the respective populations. The cultures were also
sampled at 120 hours to perform PCRs quantifying the array-length population proportions from
amplicon separation. After 24h of incubation clonal colonies from the LB agar plates were
individually screened to determine the length of the genomic array. 50 clones from the pUC19
strain and 48 from the plasmid-free strain were screened, and the percentage of expanded clones was compared to the
mixed-population PCRs. In addition, 100 clones were screened from the pUC19-containing strain after
about 1 day of induction. Across all three experiments the average error of the mixed-culture PCR method was
~5% relative to percentages calculated from clonal analyses (Figure 2.1D). Taken together these
control experiments validated our approach for quantifying expanded-array subpopulations within
cas1-cas2 induced cultures. Newly acquired spacers from some of the individual colonies were
sequenced, showing new spacers derived from either the genome or the pUC19 plasmid, depending
on the strain (Table 2.5).
Figure 2.1 | Validating the accuracy of PCR-based quantifications (A) Clonal strains with
different array lengths were grown separately, mixed in a known ratio, and analyzed via PCR and
gel electrophoresis. Here, strains with arrays of length +0 and +1 were mixed. PCR products were
separated by gel electrophoresis and imaged. Image analysis using the ladder bands converted band
intensities to picomoles. Correlation curves were generated with known proportions charted versus
pmol proportions calculated from the PCR-gel images. (B) Clonal +0 and +2 cultures were mixed,
processed, and evaluated as in A. (C) The average correlation from 15 different mixture series is
plotted ±SD. These mixture series were made up of the following pairs: +0 /+1, +0/+2, +1/+2,
+3/+5, and +4/+5. Each mixture was run in triplicate producing 15 correlation curves. (D) The
base recording strain with and without pUC19, starting at array length +0 were grown for either 1
or 5 days with cas1-cas2 induction generating a mixed population with multiple array lengths.
Cells were isolated by plating on LB-agar, and the array length for individual, clonal isolates was
measured using PCR. The fraction of screened clones with an expanded array was compared to
the expanded fraction calculated from the culture using the PCR method. 50 colonies were
screened in the no-plasmid strain, 100 in the strain with pUC19 (1-day) and 48 (5-day). These
results show error in array-length percentages (as measured using PCR) of ±3.62% on average.
2.2.2 Methods for genome-integrating DNA constructs
Several strains needed for this study required the integration of constructs into the E.
coli BL21-AI genome. We utilized two different approaches for engineering these strains. All the
spacer recording strains utilized a T7-lac inducible cas1-cas2 operon derived from plasmid
pCas1+2 (Addgene: 72676). Lambda red plasmid pCas (Addgene: 62225) contains the three
lambda phage genes gam, beta and exo, which facilitate the homologous recombination of linear
DNA fragments into the E. coli genome. Plasmid pCas was electroporated into parental strain
BL21-AI. A linear amplicon of the cas1-cas2 operon along with the spectinomycin resistance
marker SmR was then generated from pCas1+2 using primers with homology to a known, high
expression region within the E. coli genome. This amplified fragment was then electroporated into
the E. coli strain containing pCas, already expressing the three lambda red genes. Genome
integration was confirmed, establishing the base recording strain, capable of expressing Cas1-Cas2
and acquiring spacers in the absence of plasmids.
Validation of Cas1-Cas2 expression was done by separating protein extract on SDS-PAGE
gels. This was first done in two strains (base recording strain with genomic inducible cas1-cas2
operon, and BL21-AI with the pCas1+2 plasmid) derived from cultures that were either
induced or uninduced prior to harvesting total cell lysate (Figure 2.2). Both strains produced the
Cas1 band only when IPTG and arabinose were applied to the cultures prior to
harvest, confirming T7-lac as a functional Cas1-Cas2 expression switch. In a separate set of
experiments the base recording strain with the genome integrated cas1-cas2 operon was induced
with 0.05mM IPTG and 0.2% arabinose prior to total cell lysate harvesting at several time points
post induction (Figure 2.3A). Cultures of the same strain dosed with fixed arabinose (0.2%) and
variable IPTG (0-5 mM) were harvested at 3h post induction for total cell lysate SDS-PAGE
analysis via Cas1 band intensity quantification (Figure 2.3B). The Cas1 bands were quantified
using GelAnalyzer software and normalized to the housekeeping protein GAPDH (Figure 2.3CD).
Figure 2.2 | Cas1 protein expression confirmed via SDS-PAGE separation. The E. coli Type I-E CRISPR Cas1 nuclease is 33.194 kDa. Cas1 expression was confirmed from two sources: a
strain containing a single genomic copy of the cas1 gene in an operon with cas2 and controlled by
a T7-lac promoter (left) and a strain containing a plasmid with a cas1-cas2 operon controlled by a
T7-lac promoter (right). Both instances show a clear protein band at the Cas1 size upon arabinose
and IPTG induction. The genomic construct was derived from the plasmid. Expression was
induced with 0.2% arabinose and 1mM IPTG.
Figure 2.3 | Cas1 protein expression vs. time and IPTG concentration. (A) Total cell lysate
from the base recording strain dosed with IPTG (0.05 mM) and arabinose (0.2%). Lysate was
harvested from cultures at distinct time intervals post induction with proteins separated using SDS-PAGE gel electrophoresis. (B) Total cell lysate from the base recording strain 3h post induction
with fixed arabinose dose (0.2%) and variable IPTG dose (0-5 mM). The 0mM IPTG condition
also did not receive arabinose. The Cas1 band is ~33 kDa. Band intensities were quantified and
normalized using housekeeping protein GAPDH (~35.5 kDa). The lacZ gene product β-galactosidase
is also identified, as expression of this protein is upregulated in the presence of IPTG. (C) Cas1
protein band intensities quantified and normalized from A. (D) Cas1 protein band intensities
quantified and normalized from the last 7 lanes of panel B.
The base recording strain was then further modified to produce several more strains used
in the study. A streamlined genome integration system called INTEGRATE (Vo et al., 2021) was
utilized from this point as an alternative to lambda red recombineering as it is more efficient and
allows for the integration of larger DNA sequences (up to 10kbp). INTEGRATE is based on a
naturally occurring RNA-guided CRISPR integrase system. All components necessary for targeted
genome integration are contained in a single plasmid called pSPIN (CRISPR-integrase genes,
guide-RNA expression construct and cargo flanked by integrase recognition sites). The process
requires programming in coordinates (host target site for cargo integration) through ligation of a
spacer sequence complementary to the intended integration target. Restriction enzyme digestion
of the stock pSPIN with BsaI generates staggered DNA breaks, releasing a dummy spacer in
between two array repeats and linearizing the plasmid. Individual spacers for this targeting system
are 32 base pairs. Two complementary single strand oligos with accompanying overhangs (35bp
total) are annealed together, with the resulting duplex oligos phosphorylated and subsequently
ligated with the digested pSPIN plasmid. Constitutive expression of this ligated region produces
an RNA sequence that guides the CRISPR-integrase to the complementary integration site. Within
the same plasmid a cargo of interest can be programmed into the region flanked by integrase
recognition sites via Gibson Assembly. We used the pSPIN plasmid containing a pSC101
temperature sensitive origin of replication (Addgene: 160729). This allowed us to transform our
host strain with the engineered plasmid and cure the strain, after sufficient time for integration to
occur, by culturing agar-plated cells at 41°C. This system was used to genome integrate E. coli
Type I-E cascade-cas3 (9480 bp) to enable CRISPR interference, a mini version of the native Type
I-E CRISPR array (641 bp) allowing for two arrays to independently acquire spacers, and
heterologous non-homologous end joining (NHEJ) genes ku/ligD (4529 bp) derived from
Mycobacterium smegmatis, enabling a useful DNA end-joining repair pathway. Sites of host
genome integration were chosen based on a study that identified several high expressing regions
within the E. coli genome ideal for synthetic construct integration (Park et al., 2020). Along with
the cargo, pSPIN also integrates the cargo ends recognized by the transposase. This produces a
DNA scar of about 270 base pairs for every genome integration.
2.2.3 Spacer acquisition assay
To study CRISPR adaptation in our Cas1-Cas2 enabled E. coli strains we established an
assay protocol for inducing expression and assessing the extent of spacer acquisition in the
experimental cultures. We intended to characterize the spacer acquisition rate in our base recording
strain and identify intracellular factors that impact that rate. Serial passaging experiments were
generally run for 5 or 11 days, producing 6 or 12 PCR samples for analysis, respectively. Cultures
were under constant induction for Cas1-Cas2 expression by dosing with IPTG and arabinose. All
experimental cultures were run in triplicate, each inoculated from an overnight culture scaled up
for 12-15 hours. Passaging occurred every 24 hours with 30uL (1:100 dilution) of the 3mL scaled
culture passaged to a new tube containing 3mL fresh media, antibiotics and the induction
chemicals. At the time of passaging cultures were sampled (15uL) for PCR analyses. The
completed PCRs were stored in the 4°C fridge until the experiment was completed, at which point
the PCR products from each day were run together on an agarose gel (one gel for each experimental
replicate), to separate the bands via electrophoresis. A second sample/PCR strategy was compared
to our daily-PCR method to see if storing the PCR products in the fridge for up to five days would
affect our results. For this second approach, 50uL culture samples were pelleted, decanted and
stored at -20°C until the end of the experiment at which time all PCRs were run at the same time
immediately prior to separating products on agarose gels. At the conclusion of the experiment, the
50uL samples stored for this alternative method were resuspended in 45uL water. From here, the
procedure is the same as our primary method: 15uL sample was mixed with 15uL water for a 95°C
heating cycle to produce template, 5uL of which was subsequently used for the PCR reactions.
Both strategies produced very similar results as seen in Figure 2.4. Daily PCRs and 4°C storage
were employed thereafter. Most serial passage experiments were run for five days as this time range
provided enough data to measure average spacer acquisition rates per cell. Some experiments were
run for 11 days but we found that variance across replicates was considerably greater from days 6
through 11.
For each PCR product, 20ul was mixed with 7ul dye (6x dye, NEB: B7025). From this
mixture, 20ul was added to the agarose gel well (8-well Bio-Rad comb, 1.5mm thick). The DNA ladder
used was the 1kb plus ladder (NEB: N3200). Ladder (8ul) was added to the first and last wells of the
agarose gel. The electrophoresis gel box was placed on ice prior to starting current flow. Gels
were run for 70-80 minutes at 110 volts and 400 mA. The ice prevented the gels from overheating
and dissipating signal. SYBR Safe (ThermoFisher) was used as the DNA intercalator (1uL per gel).
After electrophoresis, gels were imaged on the Bio-Rad GelDoc EZ Imager. These images were
analyzed through pixel analysis to quantify array-length subpopulations within respective cultures.
Figure 2.4 | Quantification comparisons for two different sample preparation methods. Two
distinct culture sampling methods from a spacer-recording strain induced for Cas1-Cas2
expression were compared after quantifying the array-expanded subpopulation proportion. In this
experiment samples from the same induced culture were either frozen at minus 20°C after being
pelleted and decanted (pink) or immediately prepared for array-proximal PCR with the PCR
products stored at 4°C (black). The pelleted samples were all prepared for PCR and amplified
using the array proximal primers on the same day prior to being run together on an electrophoresis
gel. The storage method maintained the PCR samples in the fridge as reactions were run on the
same day as sampling. These samples were collected from the 4°C fridge at the end of the induction
period and run on an electrophoresis gel together.
2.2.4 Procedure for quantifying amplicon bands
The ladder bands were used to convert band intensities to picomoles. Band intensities were
initially generated using GelAnalyzer software. This software enabled easy background omission
and quantification of each detectable subpopulation band within amplified PCR samples. The pmol
values, converted from the initial amplicon band intensity quantifications, were tracked over time
and used to determine spacer acquisition rates. Each experimental amplicon band size sits between
two DNA ladder bands. The parental +0 band is 379bp for example, which is between the two
ladder bands 300bp and 400bp. For the ladder bands, both the pmols and intensity were known;
for the experimental bands only the intensity was known. The pmols/intensity units for the ladder
bands were calculated to account for variability across the gel. The pmols/intensity values of the
300bp and 400bp ladder bands were used to determine the same value for the 379bp band. The
intensity value for the 379bp band was multiplied by pmols/intensity to get pmols. This procedure
was followed for all amplicon bands. The pmols from all bands in an agarose gel lane were added
together to get total pmols for a given sample. Though the total pmols varied from different
amounts of the same sample run on a gel, the band proportions did not.
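The conversion described in this section can be sketched as follows. The text does not state how the pmols/intensity values of the two flanking ladder bands were combined, so linear interpolation by fragment size is an assumption here, and every ladder and band value is an invented placeholder. The function names are likewise hypothetical.

```python
# Sketch: convert gel band intensities to pmol using the flanking ladder bands.
# Linear interpolation by fragment size is an assumption, and all ladder and
# band values below are invented placeholders.


def band_pmol(ladder, size_bp, intensity):
    """Convert one band intensity to pmol.

    `ladder` maps ladder band size (bp) -> (known pmol, measured intensity).
    """
    sizes = sorted(ladder)
    for lo, hi in zip(sizes, sizes[1:]):
        if lo <= size_bp <= hi:
            break
    else:
        raise ValueError(f"{size_bp} bp is outside the ladder range")
    lo_factor = ladder[lo][0] / ladder[lo][1]   # pmol per intensity unit, lower band
    hi_factor = ladder[hi][0] / ladder[hi][1]   # pmol per intensity unit, upper band
    weight = (size_bp - lo) / (hi - lo)
    return intensity * (lo_factor + weight * (hi_factor - lo_factor))


def lane_proportions(ladder, bands):
    """Return each band's share of the total pmol in one lane.

    `bands` maps amplicon size (bp) -> measured intensity.
    """
    pmols = {size: band_pmol(ladder, size, value) for size, value in bands.items()}
    total = sum(pmols.values())
    return {size: pmol / total for size, pmol in pmols.items()}


# Placeholder ladder: size (bp) -> (known pmol, measured intensity).
ladder = {300: (0.10, 1000.0), 400: (0.12, 1500.0), 500: (0.20, 3000.0)}
# Placeholder lane with a +0 band (379 bp) and a +1 band (440 bp).
lane = {379: 800.0, 440: 200.0}

proportions = lane_proportions(ladder, lane)
print(proportions)
```

Because every band in a lane scales together when more or less sample is loaded, the proportions returned by `lane_proportions` are unchanged by the loading amount, which mirrors the observation above.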
2.2.5 Fitting procedure for CRISPR array expansion rates
Model parameters were fit to the mean of the experimental replicates. For the initial
calculation of the array expansion rate reported, the loss of cells at array length +0 was fit to Eq. 5
using the Matlab Curve Fitting Tool with residuals weighted by the inverse of the standard
deviation. To account for the fitness effect of array expansion, array expansion rates reported were
generated by solving equations 3, 4, and 7. Data was fit from 24 h to the end of experiment. Optimal
parameter values were determined using a weighted least squares fit, implemented with Matlab’s
inbuilt lsqnonlin fitting function (trust region reflective algorithm). Weights for a given data point
were defined as the inverse of the standard deviation at that data point. In cases where no band was
detected experimentally, the residual was assigned a weight of 0. For the case of the strain with
two arrays, the equations were modified to account for fitness costs associated with expanding both
arrays simultaneously, see Text S2. Estimations of error in fit parameters were calculated by
performing bootstrapping on each data set. Parameters and errors reported in this work result from
averaging 500 bootstrapping iterations. For statistical comparisons of expansion rates, see Table
S4.
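For readers who want to see the shape of such a procedure, the following Python sketch combines an inverse-SD weighted fit with bootstrapping. It is not the Matlab code used in this work and rests on several assumptions: a single-exponential decay stands in for the fitted equation, a golden-section search replaces lsqnonlin, bootstrapping resamples whole replicate curves, and a small floor on the standard deviation (`SD_FLOOR`, a hypothetical constant) avoids division by zero. The zero-weighting of undetected bands is omitted, and the data are synthetic.

```python
import math
import random
import statistics

SD_FLOOR = 1e-3  # assumption: avoids dividing by zero when resampled replicates coincide


def weighted_sse(k, times, means, sds):
    """Weighted sum of squares for f0(t) = f0(t_start) * exp(-k * (t - t_start))."""
    total = 0.0
    for t, y, sd in zip(times, means, sds):
        model = means[0] * math.exp(-k * (t - times[0]))
        total += ((y - model) / max(sd, SD_FLOOR)) ** 2
    return total


def fit_rate(times, means, sds, lo=0.0, hi=0.1, tol=1e-9):
    """Golden-section search for the rate constant (per hour) minimising weighted_sse."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if weighted_sse(c, times, means, sds) < weighted_sse(d, times, means, sds):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2


def bootstrap_rate(times, replicates, n_iter=500, seed=0):
    """Resample replicate curves with replacement; return mean and SD of fitted rates."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_iter):
        sample = [rng.choice(replicates) for _ in replicates]
        means = [statistics.mean(column) for column in zip(*sample)]
        sds = [statistics.pstdev(column) for column in zip(*sample)]
        rates.append(fit_rate(times, means, sds))
    return statistics.mean(rates), statistics.stdev(rates)


# Synthetic replicates generated from a known rate, for illustration only.
times_h = [24, 48, 72, 96, 120]
true_rate = 0.002
base = [0.98 * math.exp(-true_rate * (t - 24)) for t in times_h]
replicates = [[scale * value for value in base] for scale in (0.99, 1.00, 1.01)]

mean_rate, rate_sd = bootstrap_rate(times_h, replicates)
print(mean_rate, rate_sd)
```

The golden-section search assumes the weighted error has a single minimum on the search interval, which holds for the synthetic data here but would need checking on real measurements.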
2.2.6 Expanded array sequencing
Clones were isolated by plating diluted cultures onto LB-agar after one-day or five-day
induction time course experiments. PCR amplicons were generated with the method previously
described to screen for array-expanded colonies. For expanded clones that were sent for
sequencing a second PCR was performed, and subsequent PCR cleanup was carried out for each
post-PCR sample. Sanger sequencing was performed on these clonal samples using one of the two
standard array-PCR primers (Table S5). Sequencing data was imported into SnapGene for
amplicon analysis to identify the newly integrated spacers.
2.2.7 Simulations
Simulations were run in Matlab, see SI Codes 1 and 2. In these simulations, a population of
9,780 cells with array length +0 and 220 cells with array length +1 comprised the initial array-type
subpopulations, based on experimental measurements of the composition 24h post
induction. The simulation had a time step of 5 min with an end time of 10 days. For the initial
model, the culture grew exponentially with a growth rate constant of 0.02 1/min. Upon reaching
a population size of $10^8$ cells, $10^4$ cells were selected at random to inoculate a new culture. At
each timepoint, each cell had a probability of $8.16833 \times 10^{-5}$ of gaining one new spacer.
Simulations were modified to incorporate reduction in the array expansion rate, reduction in the
growth rate, or mutations. For simulations with mutations, each cell had a low probability ($10^{-5}$
to $10^{-7}$) of becoming a mutant with an array expansion rate of 0 and a variable gain of fitness
(either +0% or +3%).
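A simplified Python analogue of the initial model is sketched below. It is not the Matlab code referenced above: it tracks the expected fraction of cells at each array length instead of individual cells, which is valid only when all array lengths grow at the same rate, so growth and random passaging leave the expected fractions unchanged. The cap on tracked array length (`MAX_LEN`) is an illustrative choice, while the time step, duration, starting composition and per-step probability are taken from the text.

```python
# Deterministic (expected-value) sketch of the initial array-expansion model.
# It omits stochastic sampling at each passage and assumes equal growth rates
# for every array length; both simplifications are assumptions of this sketch.

STEP_MIN = 5                  # time step in minutes (from the text)
END_MIN = 10 * 24 * 60        # end time of 10 days (from the text)
P_GAIN = 8.16833e-5           # per-cell, per-step probability of a new spacer (from the text)
MAX_LEN = 12                  # longest array length tracked (illustrative cap)


def simulate(start_fractions, n_steps, p_gain=P_GAIN):
    """Return the expected fraction of cells at each array length after n_steps."""
    fractions = list(start_fractions) + [0.0] * (MAX_LEN + 1 - len(start_fractions))
    for _ in range(n_steps):
        updated = [0.0] * len(fractions)
        for length, fraction in enumerate(fractions):
            if length < MAX_LEN:
                updated[length] += fraction * (1 - p_gain)
                updated[length + 1] += fraction * p_gain
            else:
                updated[length] += fraction   # cells at the cap stay put
        fractions = updated
    return fractions


# Starting composition: 9,780 cells at +0 and 220 cells at +1 out of 10,000.
final = simulate([0.978, 0.022], END_MIN // STEP_MIN)
mean_new_spacers = sum(length * fraction for length, fraction in enumerate(final))
print([round(f, 4) for f in final[:4]], round(mean_new_spacers, 4))
```

Adding a fitness cost or mutants would break the equal-growth assumption, at which point cell counts and the passaging step would have to be modeled explicitly.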
2.2.8 Competition experiments
For the base recording strain with and without pUC19 the standard cas1-cas2 induction
experiment was carried out for 24h. At the 24h mark cells were passaged 1:100 into fresh LB media
with antibiotics but without induction chemicals (IPTG and arabinose). From this point on, cultures
were not exposed to IPTG or arabinose. Cultures were grown from 24-32h, to allow for residual
Cas1-Cas2 to degrade. At 32h, each culture was sampled for PCR analysis across the spacer
integration site to quantify baseline (expanded cells)/(all cells) population ratios. Cultures were
again sampled for PCR experiments and passaged 1:100 at 48h and 72h, with the last PCR samples
run at 96h. The PCR ratios were quantified at each time point to assess changes in relative
proportions over time (Fig S2.3AB). From one of the three replicates of the base strain w/pUC19,
the culture from the 32h time point was plated to single cells onto LB agar. 100 clonal colonies
were PCR screened across the array integration site. 14 of the 100 clones contained an expanded
array (all +1). These +1 clones were individually competed against the parental +0. In these 14
clonal competition experiments, overnight cultures were OD600 normalized and 50:50 volumes of the
+1 clone and +0 were first mixed in a sterile 1.5mL Eppendorf tube. This mixture was used as
PCR template for the 0 time point and used to seed the initial 3mL cultures (30µL) containing
antibiotics. No induction chemicals were used. Competition experiments were run for 48h with
passaging occurring at 24h and PCRs run on samples at 0h, 24h and 48h (Fig. S2.3C-E).
Carbenicillin and spectinomycin antibiotics were dosed into all 14 cultures except for two. Slow
growing clones 8 and 14 were sensitive to spectinomycin so only carbenicillin was used for those
two competition experiments.
2.2.9 Calculating spacer acquisition rates
To calculate spacer acquisition rates, we first needed to determine the average number of
spacers per cell within our samples. This was done by multiplying the expanded populations (%
of total population) by the number of spacers within those groups and summing together (e.g. 20%
plus-1 population and 5% plus-2 population would be 0.2 + 2(0.05) = 0.3 spacers per cell). Six
data points were generated for each experimental replicate (hours after the start of the experiment:
0, 24, 48, 72, 96, 120). The 0-to-24-hour period (first growth cycle) included a Cas1-Cas2
expression ramp up period, so this period was not included in rate calculations. The remaining five time
points (24-120h) were used together to calculate rates of acquisition.
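The calculation can be illustrated with a short Python sketch. The first function reproduces the worked example above (20% at +1 and 5% at +2 giving 0.3 spacers per cell). The second estimates a rate as the least-squares slope across the 24-120h time points; the text does not specify how the five points were combined, so the slope is an assumption, and the time-course values are invented.

```python
# Sketch: average spacers per cell and an acquisition rate from a time course.
# The least-squares slope is an assumed way of combining the 24-120 h points,
# and the time-course values below are invented placeholders.


def mean_spacers_per_cell(fractions):
    """`fractions` maps number of new spacers -> fraction of the population."""
    return sum(spacers * fraction for spacers, fraction in fractions.items())


def acquisition_rate(hours, spacers_per_cell):
    """Least-squares slope in spacers per cell per hour."""
    n = len(hours)
    mean_t = sum(hours) / n
    mean_y = sum(spacers_per_cell) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, spacers_per_cell))
    denominator = sum((t - mean_t) ** 2 for t in hours)
    return numerator / denominator


# Worked example from the text: 20% at +1 and 5% at +2.
example = mean_spacers_per_cell({1: 0.20, 2: 0.05})

# Hypothetical time course, excluding the 0-24 h ramp-up period.
hours = [24, 48, 72, 96, 120]
values = [0.02, 0.05, 0.08, 0.11, 0.14]
rate = acquisition_rate(hours, values)
print(round(example, 3), rate)
```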
2.2.10 Phage infection studies
Several E. coli base recording strain variants with distinct acquisition rates were further
enabled with CRISPR interference machinery Cascade and Cas3. To test whether spacer
acquisition rates correlate with resistance to phage infection, these CRISPR-interference enabled
spacer acquisition strains with different array expansion rates were infected with
bacteriophage Lambda. We utilized an infection protocol
that tracks cell density post phage inoculation (Rajnovic et al., 2019). OD600 measurements
tracked culture growth in response to an infection. OD600 curves were compared for infected and
uninfected cultures to determine the extent of phage-induced growth inhibition.
2.2.10.1 Phage propagation
An E. coli lysogen containing bacteriophage Lambda prophage was used to produce
purified phage for our infection experiments. For isolation of bacteriophage Lambda, we used an
engineered strain of E. coli containing plasmid pB33recA730, which allows for induction of the lytic cycle
with arabinose. An overnight culture of this strain was passaged 1:200 into 3mL fresh media with
chloramphenicol. The culture was grown until it reached OD600 ~0.4, at which point it was dosed
with arabinose at a final concentration of 0.2%. The culture was protected from light and grown at
37°C until lysis occurred, and the culture became clear. The solution was then centrifuged to clear
the debris. Supernatant was transferred to a fresh tube and chloroform was added to sterilize (100
µL chloroform for 5-10 mL supernatant). The solution was transferred to a polystyrene tube to
extract the chloroform before the sample was transferred to a 15 mL conical tube and wrapped in tinfoil for
storage at 4°C.
2.2.10.2 Plaque formation assay
A ten-fold dilution series was made from purified bacteriophage. The $10^6$, $10^7$ and $10^8$
dilutions were separately plated with MG1655 E. coli suspended in 0.7% top agar containing LB
supplemented with maltose and MgSO4. The plates were incubated overnight at 37°C and the
resulting plaques enumerated to determine the purified-phage concentration.
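The arithmetic behind a plaque count can be sketched as below. The plated volume, plaque count and cell number are hypothetical, since they are not given in the text; only the fold dilutions and the multiplicity of infection of 0.02 used in the infection assay come from this chapter.

```python
# Sketch: phage titer from a plaque count, and the stock volume for a target MOI.
# Plaque count, plated volume and cell count are hypothetical placeholders.


def titer_pfu_per_ml(plaques, fold_dilution, volume_plated_ml):
    """Plaque-forming units per mL of the undiluted stock."""
    return plaques * fold_dilution / volume_plated_ml


def phage_volume_ml(moi, cell_count, titer):
    """Volume of stock that delivers `moi` phage per cell."""
    return moi * cell_count / titer


# Hypothetical plate: 42 plaques from 0.1 mL of the 10^7-fold dilution.
titer = titer_pfu_per_ml(plaques=42, fold_dilution=1e7, volume_plated_ml=0.1)
# Hypothetical inoculum of 1.5e7 cells infected at an MOI of 0.02.
volume = phage_volume_ml(moi=0.02, cell_count=1.5e7, titer=titer)
print(f"{titer:.2e} PFU/mL, {volume * 1000:.3f} uL")
```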
2.2.10.3 Bacteriophage infection assay
E. coli strains were inoculated and cultured overnight at 37°C in LB media supplemented
with maltose and MgSO4 (LBMM). 3mL fresh LBMM was prepared in 14 mL falcon tubes along
with the appropriate antibiotics and induction chemicals to express Cas1-Cas2 (IPTG, arabinose),
Cascade and Cas3 (rhamnose). Overnight cultures were normalized to OD600 prior to 15 µL
inoculations with or without phage. To induce infection, Lambda phage was inoculated at a
multiplicity of infection (MOI) of 0.02. Immediately after inoculation these 3 mL cultures were
distributed as 200 µL replicates into a flat-bottom 96-well plate (Corning). Absorbance at 600 nm
was recorded every 20 minutes for 21 hours using a microplate reader (TECAN Infinite 200 PRO).
For each strain, uninfected and infected OD600 was plotted over the course of the experiment and
area under the curve, using the trapezoidal rule, was quantified to calculate the percentage of
growth inhibition (PI). This is calculated by finding the difference in areas under the curve for
uninfected control and infected cultures (45). The areas are calculated from a start point of
detection (SPD) to an end point of detection (EPD). The SPD is the threshold at which growth is
first detected in the cultures, defined as when the uninfected control reaches a growth rate of 0.001
OD units per minute. The EPD was defined as 15 hours post SPD. The PI values were analyzed to
approximate the relative phage resistance for each E. coli strain, see Table S4 for statistical tests.
Doubling times (Td) were calculated from the uninfected cultures for each strain (Fig. S2.10).
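A minimal Python sketch of the growth-inhibition calculation follows. Several details are assumptions: the growth rate is estimated as the difference between consecutive readings, the SPD is taken as the first reading at which that rate reaches the threshold, and the difference in areas is expressed as a percentage of the uninfected area. The OD600 curves are synthetic.

```python
# Sketch: percentage of growth inhibition (PI) from OD600 curves.
# The finite-difference growth rate, the choice of SPD reading, and the
# normalisation by the control area are assumptions; the curves are synthetic.


def trapezoid_auc(times, values):
    """Area under a curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
    return area


def start_index(times_min, control_od, threshold=0.001):
    """First reading at which the control growth rate reaches `threshold` OD units/min."""
    for i in range(1, len(times_min)):
        rate = (control_od[i] - control_od[i - 1]) / (times_min[i] - times_min[i - 1])
        if rate >= threshold:
            return i
    raise ValueError("control culture never reached the growth-rate threshold")


def percent_inhibition(times_min, control_od, infected_od, window_min=15 * 60):
    """PI over the window from the SPD to the EPD (15 h after the SPD)."""
    start = start_index(times_min, control_od)
    end_time = times_min[start] + window_min
    keep = [i for i in range(start, len(times_min)) if times_min[i] <= end_time]
    window_times = [times_min[i] for i in keep]
    auc_control = trapezoid_auc(window_times, [control_od[i] for i in keep])
    auc_infected = trapezoid_auc(window_times, [infected_od[i] for i in keep])
    return 100.0 * (auc_control - auc_infected) / auc_control


# Synthetic curves: readings every 20 min for 21 h.
times_min = [20 * i for i in range(64)]
control = [0.05 + 0.002 * t for t in times_min]
infected = [0.5 * od for od in control]

pi_value = percent_inhibition(times_min, control, infected)
print(round(pi_value, 1))
```

With the infected curve fixed at half the control, the sketch returns a PI of 50%, which is a quick sanity check on the area calculation.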
2.3 Results
To monitor CRISPR array expansion over time, PCR was used to measure the proportion
of the array-expanded populations at each array length. This assay has been reported previously to
identify array expansion after culturing cells for several hours with Cas1-Cas2 expression (Wei et
al., 2016). We used E. coli BL21-AI as our host strain to study spacer acquisition (Wei et al., 2016;
Yosef et al., 2012). This strain is deficient in all Type I-E Cas components but does include a native
CRISPR array. We integrated an inducible cas1-cas2 operon into the genome. This allowed us to
study spacer acquisition in a “base recording strain” free of Cas interference machinery
(Cascade/Cas3) and plasmids. We did however transform pUC19 into some spacer recording
strains as it provides excess Cas1-Cas2 template to amplify acquisition. The parental BL21-AI
CRISPR array contains 13 conserved spacers and 14 repeats. PCR primers flanking the leader-proximal end of the array were used to amplify samples from cultures induced for Cas1-Cas2
expression (Figure 2.5A). One of the primers is complementary to part of the upstream leader
sequence and the other to conserved spacer 5. Unexpanded parental arrays produce 379 base pair
amplicons with expanded subpopulations 61 base pairs longer for each new spacer addition. PCR
products were separated by size on agarose gels through DNA electrophoresis allowing us to
differentiate band intensities (Figure 2.5B). These amplicon bands were converted to picomoles
and subsequently used to evaluate array-length subpopulation ratio changes throughout our
experiments. Control experiments verified that this method can accurately measure array lengths
that represent as little as ~0.5% of the population with an error of approximately 3.6% (Fig.
S2.5CD), consistent with previously reported data (Amlinger et al., 2017).
Over time, cas1-cas2 induction produces longer arrays within the population due to the
continued addition of new spacers (Figure 2.5B). Cultures induced for constant Cas1-Cas2
expression were grown for 5-days, with subculturing and PCR-based length measurements
performed every 24 hours. Amplicon bands were sufficiently separated via gel electrophoresis,
with band intensities proportional to the frequency of that array length within the population. The
DNA ladder with bands of known concentrations was used to convert experimental band
intensities to pmols. The fraction of cells at each array length, $f_i$, can be calculated using,
$f_i = N_i / \sum_j N_j$, Eq. 1
where $N_i$ is the number of cells with array length $i$. Cells were cultured over 120 hours, with
unexpanded parental arrays gradually decreasing as expanded subpopulations increased in
proportion (Figure 2.5C). Given the percentage of the population at each detectable array length,
the average array length can be calculated using,
$\bar{L} = \sum_i f_i L_i$, Eq. 2
where $L_i$ is the array length of subpopulation $i$ and $\bar{L}$ is the average array length across the whole
population. The extent of expansion in the population, calculated as the “average new spacers per
cell” over time, is shown in Figure 2.5D.
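As a worked illustration of Eq. 1 and Eq. 2, consider a hypothetical lane with three detectable bands. Here $N_i$ denotes the quantity in the band with $i$ new spacers, $f_i$ its fraction of the lane, and $L_i$ the number of new spacers; the pmol values are invented for the example and are not measurements from this study.

```latex
% Hypothetical lane: N_0 = 0.75, N_1 = 0.20, N_2 = 0.05 pmol (illustrative values only)
\begin{align*}
f_0 &= \frac{0.75}{0.75 + 0.20 + 0.05} = 0.75, \qquad f_1 = 0.20, \qquad f_2 = 0.05 \\
\bar{L} &= \sum_i f_i L_i = (0.75)(0) + (0.20)(1) + (0.05)(2) = 0.30
\end{align*}
```

In this example the culture would be reported as carrying an average of 0.30 new spacers per cell.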
Figure 2.5 | Quantifying the temporal dynamics of spacer acquisition. (A) An overview of the
adaptation phase of CRISPR adaptive immunity in the E. coli Type I-E system. CRISPR arrays are
made up of alternating repeats (diamonds) and spacers (ovals) along with an upstream leader
sequence. Cas1 and Cas2 form a six-subunit complex that captures and processes small fragments
of DNA before integration as spacers between the leader and first repeat. Each spacer integration
duplicates the leader proximal repeat, together expanding the array by 61 base pairs. Red arrows
represent PCR primer binding sites used to detect array expansion. (B) PCR and DNA gel
electrophoresis measure changes in array lengths within a culture of cells expressing Cas1-Cas2.
At 120h, cells within the population have gained up to 4 new spacers. The band intensity is used
to quantify the relative proportion of cells at each array length. (C) The ratio of cells at each array
length is tracked over several days. (D) The average number of new spacers acquired per cell
within a culture is calculated at each time point through the experiment.
2.3.1 A model of array expansion
Array expansion occurs as individual CRISPR arrays gain spacers in a sequential process.
This process can be modeled as shown in Figure 2.6A. A cell with array length +0, the array length
at the start of the experiment, transitions to array length +1 at a rate proportional to $k_{e,0}$. Cells with
arrays of length +1 can then transition to arrays of length +2 at a rate proportional to $k_{e,1}$, and so
on. Similarly, the model allows for contraction of the CRISPR array (spacer deletion), such that
cells with an array length of +1 can transition to length +0 at rate $k_{c,1}$. In this model, the number
of cells with array length $L_i$ is $N_i$. Cells at each array length divide with growth rate constant µ.
Combining these processes, the change in number of cells at the starting array length +0 follows:
$\frac{dN_0}{dt} = N_0 \mu - N_0 k_{e,0} + N_1 k_{c,1}$, Eq. 3
and for cells with array length $L_i$,
$\frac{dN_i}{dt} = N_i \mu - N_i k_{c,i} + N_{i+1} k_{c,i+1} - N_i k_{e,i} + N_{i-1} k_{e,i-1}$. Eq. 4
Using these equations, the experimental data can be fit to calculate the array expansion rate. In
this initial fit, the growth rate of cells and array expansion rate was assumed to be constant (i.e.
does not change over time and is independent of array length).
The rate of array contraction was set to zero, as no cells with contracted arrays were
detected over 120 h of measurements. Contraction was monitored in two ways. First, the back end
of the CRISPR array, from parental spacer-5 through the end of the array beyond parental spacer-13, was measured across a 5-day time course for the base recording strain with and without pUC19
(Fig. S2.1A). The standard acquisition measurements probe only the leader-proximal end of the
array; however, contraction of the array may occur at any point from the leader-proximal to the
leader-distal end. No spacer loss was detected in the back end of the array over 5 days. Second,
the leader-proximal end of the array was measured for several expanded clones with different array
lengths between +1 and +5. These clones were isolated from a culture of the pUC19 recording
strain that was previously induced for Cas1-Cas2 expression. A subsequent five-day non-induction
time course was run for mixtures of these expanded clones with none producing PCR bands below
the starting amplicon size, indicating no appreciable loss of the newly acquired spacers (Fig.
S2.1B). Although array contraction has been identified both experimentally and through
comparative genomics (Deecker et al., 2020; Garrett et al., 2021), these two experiments
demonstrate that spacer deletion events are insignificant in these strains over the time scale
analyzed. The replacement of spacers in the array with new sequences may be possible, but was
not observed in any sequenced arrays and seems unlikely to noticeably bias measurements of array
dynamics.
With these assumptions, the model used the decay of the percentage of cells at the starting
array length to calculate the array expansion rate. Figure 2.6B shows the change in the fraction of
cells at the original array length over time. As derived in Text S1, when assuming array
expansion and cell growth are constant for all array lengths, the change in the fraction of cells at
the original length follows:
$$f_0(t) = e^{-k_{e,0} t} \qquad \text{(Eq. 5)}$$
Figure 2.6C reports the array expansion rate fitting the data from Figure 2.6B using Equation 5.
To check if this expansion rate was consistent with the change in all array lengths over time, we
simulated the fractions of cells at all array lengths over the experimental timeframe using the rate
from Figure 2.6C. The simulation used a forward Euler algorithm and Equations 3 and 4 to predict
the change in array lengths within the population over time. Simulation results are shown next to
experimental measurements in Figure 2.6D, indicating that experimental measurements of average
array length begin to deviate from model predictions towards the end of the experiment. Figure
2.6E further shows that predictions of expansion at each array length systematically deviate from
experimental results. These comparisons suggest some form of feedback that reduces the fraction
of cells at longer array lengths.
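The forward Euler step used for this comparison can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis: the function names, the time step, the growth rate of 1 per hour and the cap on array length are assumptions chosen for the example, while the expansion rate of 0.0049 per hour is the per-cell value quoted for the simulations in this chapter. With constant rates and no contraction, the simulated fraction of +0 cells should track Eq. 5.

```python
import math


def euler_array_model(k_e, mu, hours, dt=0.01, max_len=10, k_c=0.0):
    """Integrate Eqs. 3-4 with forward Euler; return fractions at +0..+max_len.

    k_e, k_c and mu are held constant for every array length (the assumption
    of the basic model). max_len is an illustrative cap: arrays at the cap do
    not expand further, so no cells leave the simulated states.
    """
    n = [1.0] + [0.0] * max_len  # all cells start at array length +0
    for _ in range(int(round(hours / dt))):
        dn = []
        for i in range(max_len + 1):
            change = n[i] * mu                # cell division
            if i < max_len:
                change -= n[i] * k_e          # expansion i -> i+1
                change += n[i + 1] * k_c      # contraction i+1 -> i
            if i > 0:
                change -= n[i] * k_c          # contraction i -> i-1
                change += n[i - 1] * k_e      # expansion i-1 -> i
            dn.append(change)
        n = [x + dt * d for x, d in zip(n, dn)]  # Euler update
    total = sum(n)
    return [x / total for x in n]


def mean_new_spacers(fractions):
    """Average number of new spacers per cell."""
    return sum(i * f for i, f in enumerate(fractions))


# Hypothetical parameter values for illustration (mu is an assumption).
fractions = euler_array_model(k_e=0.0049, mu=1.0, hours=120)
print("simulated fraction at +0:", round(fractions[0], 3))
print("Eq. 5 fraction at +0:    ", round(math.exp(-0.0049 * 120), 3))
print("mean new spacers per cell:", round(mean_new_spacers(fractions), 3))
```

Because Eqs. 3 and 4 are linear in the cell numbers, normalising at the end gives the same fractions as tracking absolute cell counts through each dilution.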
Figure 2.6 | Modeling CRISPR spacer acquisition. (A) In the model, arrays with length $i$ expand
at rate $k_{e,i}$, resulting in the addition of one new spacer to the array. Arrays can also lose spacers at
rate $k_{c,i}$. (B,C) Assuming cells at all array lengths have the same array expansion rate and cellular
growth rate, the loss of the cells at the original array length +0 was used to calculate the rate of
array expansion. (D) Using the calculated array expansion rate, a simulation predicted the average
length of the array over time, showing deviation from experimental results at later times. (E)
Simulation results for the percentage of cells at each array length also indicate systematic deviation
from experimental results. Means are reported from three biological replicates ± SD.
To further explore the differences between model predictions and experimental data, the
array expansion measurements were run to 10 days. As shown in Figure 2.7A, the average array
length in the population increases at a lower rate over time, not following the linear growth
trajectory predicted by the model. The model was adapted to include feedback related to array
length. Three options for reducing the percentage of cells with longer arrays were considered. The
first model variant made array expansion dependent on array length (Figure 2.7B), specifically,
$$k_{e,i} = k_{e,0}\,\alpha_k^{\,i}, \qquad \text{(Eq. 6)}$$
where $\alpha_k$ is a factor that reduces the rate constant for array expansion, raised to the power $i$.
The second model variant reduces the cell growth rate for every new spacer added to the array
(Figure 2.7C). In this model,
$$\mu_i = \mu_0\,\alpha_\mu^{\,i}, \qquad \text{(Eq. 7)}$$
where $\alpha_\mu$ is a factor that reduces the rate constant for cell growth, raised to the power $i$. The
third model variant assumes that mutations appear within the population over time that deactivate
spacer acquisition (Figure 2.7D). In simulations, the mutations occur at a frequency of $10^{-6}$
mutations/(cell·min). The 10X and 0.1X mutation frequencies are $10^{-5}$ 1/(cell·min) and $10^{-7}$ 1/(cell·min),
respectively. Mutated cells have array expansion rate constants equal to zero. The potential of
mutants having a fitness advantage was also explored. These three models were compared by
running simulations that approximate the experimental procedure. In the simulation, cells grow
over time and at each timepoint individual cells add a single new spacer with probability 0.0049
1/hr. When the culture reaches $10^8$ cells, a small fraction of cells, 0.01%, are inoculated into fresh
media. Transferred cells are chosen randomly from the population.
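To make the first two model variants concrete, the sketch below applies Eq. 6 and Eq. 7 to the fractions of cells at each array length and reports the average number of new spacers per cell after 10 days. It is a deterministic simplification rather than the stochastic, serially passaged simulation described above: renormalising the fractions at every step stands in for dilution into fresh media, which on average does not change the composition of a randomly sampled population. The names `alpha_k` and `alpha_mu` (the per-spacer factors of Eqs. 6 and 7), the base growth rate of 0.5 per hour, the time step and the cap on array length are assumptions for illustration, and the mutation variant is omitted.

```python
def average_new_spacers(k_e0, mu0, alpha_k=1.0, alpha_mu=1.0,
                        hours=240.0, dt=0.01, max_len=12):
    """Average new spacers per cell after `hours` for one model variant.

    alpha_k scales the expansion rate per new spacer (Eq. 6) and alpha_mu
    scales the growth rate per new spacer (Eq. 7). Contraction is set to zero.
    """
    k_e = [k_e0 * alpha_k ** i for i in range(max_len + 1)]   # Eq. 6
    mu = [mu0 * alpha_mu ** i for i in range(max_len + 1)]    # Eq. 7
    k_e[-1] = 0.0  # illustrative cap: the longest tracked array stops expanding
    f = [1.0] + [0.0] * max_len  # every cell starts at array length +0
    for _ in range(int(round(hours / dt))):
        new = []
        for i in range(max_len + 1):
            gain = f[i - 1] * k_e[i - 1] if i > 0 else 0.0
            new.append(f[i] + dt * (f[i] * mu[i] - f[i] * k_e[i] + gain))
        total = sum(new)
        f = [x / total for x in new]  # renormalise (stands in for dilution)
    return sum(i * x for i, x in enumerate(f))


# Hypothetical parameter values; only 0.0049 1/hr and 0.9975 appear in the text.
baseline = average_new_spacers(k_e0=0.0049, mu0=0.5)
slowed = average_new_spacers(k_e0=0.0049, mu0=0.5, alpha_k=0.5)
penalised = average_new_spacers(k_e0=0.0049, mu0=0.5, alpha_mu=0.9975)
print("constant rates:          ", round(baseline, 3))
print("expansion halved/spacer: ", round(slowed, 3))
print("0.25% growth cost/spacer:", round(penalised, 3))
```

Both feedback variants lower the average array length relative to the constant-rate model; how strongly depends on the assumed growth rate, since the growth penalty acts through selection over many generations.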
Comparison of the modeling results points to a fitness cost to array expansion (see also Fig.
S2.2). The reduction in the array expansion rate would have to be very large to account for the trend in
average array length over time, more than a 50% reduction in the expansion rate for every new
spacer. The mutations that cease array expansion would have to be frequent and have a fitness
benefit of several percent. Conservatively, the rate of mutations that would impact spacer
acquisition would be less than $10^{-8}$ 1/(cell·min) (Foster et al., 2015), so the simulations represent
the extreme case of an abnormal frequency of mutation. Conversely, the reduction in fitness that
results in the average array length levelling off to 0.5 spacers per cell would be between 0.2% and
1%, only a small penalty.
Testing the fitness hypothesis experimentally, cells with arrays of different lengths were
competed over time. First, a clonal culture with a starting array length of +0 was expanded for 24h,
via cas1-cas2 induction. This resulted in a culture containing a mixture of cells with mostly +0
and +1 array lengths. Cells were then transferred to media without inducer, and after waiting 8h
for expansion to cease, the ratio of +1 to +0 cells was monitored via the PCR-based method over
64h (Figs. 2.7E and S2.3). This experiment was performed using the base recording strain with
and without pUC19. Results showed +0 cells outcompeted +1 cells over time. From these
measurements, the cells with the longer array had a growth rate constant ~0.27% smaller (Figure
2.7F). Growth rates for cells with different array lengths were predicted using this measured deficit
(Figure 2.7G). To see if these predicted growth rates matched experiments, the experimental data
from Figure 2.7A was refit, with the growth rate of cells at each array length as free parameters.
The extracted growth rates were similar to predictions based on the competition experiment
(Figure 2.7H). Additional competition experiments between individual strains with array length
+1 and the unexpanded strain revealed variable fitness consequences for array expansion that were
on average slightly negative (Fig. S2.3E).
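The growth-rate ratio can be related to the competition data with a short calculation. The following is a sketch under two assumptions that are not stated explicitly in the text: both subpopulations grow exponentially during the non-induced period, and no further spacers are acquired once the inducer is removed. Here $R(t) = N_1(t)/N_0(t)$ denotes the ratio of +1 to +0 cells and $\alpha_\mu = \mu_1/\mu_0$ denotes the per-spacer growth-rate ratio of Eq. 7.

```latex
\[
R(t) = \frac{N_1(0)\,e^{\mu_1 t}}{N_0(0)\,e^{\mu_0 t}}
     = R(0)\,e^{(\mu_1-\mu_0)t}
     = R(0)\,e^{-\mu_0(1-\alpha_\mu)t}
\]
\[
\ln R(t) = \ln R(0) - \mu_0(1-\alpha_\mu)\,t
\quad\Longrightarrow\quad
\alpha_\mu = 1 + \frac{1}{\mu_0}\,\frac{d\ln R}{dt}
\]
```

Under these assumptions, a plot of $\ln R$ against time is a straight line, and its slope divided by $\mu_0$ gives the fractional growth deficit. A deficit of ~0.27% corresponds to $\alpha_\mu \approx 0.9973$.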
To explore the consequences of this fitness change, spacer acquisition was simulated over
35 days. Array expansion in populations without a fitness reduction (Figure 2.7I) was compared
to populations with a fitness reduction (Figure 2.7J). Both models use the same rate of spacer
acquisition, highlighting the impact of fitness on average array length over time (Figure 2.7K).
Figure 2.7 | Array expansion associated with a fitness cost. (A) Spacer acquisition over 10-days
of cas1-cas2 induction for the base recording strain containing pUC19. Comparing experimental
results with a model that assumes the rate of array expansion and cell growth are independent of
array length. Given the reduced expansion observed in experiments, modifications of the model
were explored. (B) A model in which the array expansion decreases by X% for every new spacer.
(C) A model in which cell growth rate slows by X% for every new spacer. (D) A model in which
cells within the population mutate. Mutant cells have an array expansion rate of zero and may have
a gain in fitness. (E) Experimental validation of a fitness cost associated with array expansion.
Cas1-Cas2 was expressed in cells with a starting array length of +0 for 24 h, resulting in a mixed
population of cells with array lengths of +0 and +1. Cells were transferred to media without
inducer. Starting at 32 h the ratio of +1 and +0 cells was monitored over time using the PCR-based
method. n=3 for each condition. (F) From the change in the ratio of +1 to +0 cells, the ratio of cell
growth rates, $\mu_1/\mu_0$ or $\alpha_\mu$, was calculated. (G) This value of $\alpha_\mu$ predicts the growth rates for cells
with different array lengths. (H) The experimental data shown in A was fit to calculate the growth
rate for cells at each array length, revealing a similar trend of a small decrease in the growth rate
as the array expanded. (I) Simulated spacer acquisition over 35 days with a constant acquisition
rate and no fitness effects associated with array length ($\alpha_\mu = 1$). (J) Simulated spacer acquisition
over 35 days with a constant acquisition rate and a growth reduction of $\alpha_\mu = 0.9975$. (K) The
average array length over time using the simulation results from I and J. A-D means of three
independent simulations ± SD. E, F show means ± SD.
2.3.2 Cellular parameters modulate spacer acquisition rates
Next, we examined how the rate of CRISPR array expansion is affected by cellular
parameters. As shown in Figure 2.8A, array expansion involves Cas1-Cas2 DNA protospacer
substrate processing prior to insertion as a new spacer into the array. We hypothesized that parameters
influencing spacer acquisition rates include the expression level of Cas1 and Cas2 proteins, the
availability of DNA substrate, and the number of CRISPR arrays within the cell. Array expansion
rates were measured in 5-day cas1-cas2 induction experiments that modulated these three
parameters. Rates were generated by fitting the expansion data for both array expansion rate and
the fitness penalty $\alpha_\mu$ as seen in Equation 7. All $\alpha_\mu$ values from these experiments can be found in
Fig. S2.4.
2.3.2.1 DNA substrate
Our base spacer recording strain does not contain any plasmids, acquiring exclusively self-genome derived spacers (Table S1). When present, intracellular mobile genetic elements (MGEs)
such as plasmids and bacteriophage contribute to the Cas1-Cas2 substrate pool (Sheth et al., 2017;
Levy et al., 2015). We measured spacer acquisition in the base strain with and without the high
copy number plasmid pUC19. With no plasmid, the acquisition rate was 2.37E-3 spacers/cell per
hour. The rate increased roughly 1.8x to 4.28E-3 spacers/cell per hour with pUC19 present, Figure
2.8B. A few dozen colonies were PCR screened from these cultures at the end of the experiment
to identify clones with an expanded array (Table S5). From the plasmid-free strain 23 new spacers
were sequenced with all matching sequences from the host chromosome. For the strain containing
pUC19, 18 new spacers were sequenced and identified, with 3 derived from pUC19 and 15 from
the host chromosome. As shown previously, the presence of a high copy-number plasmid can
significantly increase the integration of host genome-derived spacers (Sheth et al., 2017).
2.3.2.2 Cas1-Cas2 expression
The genome integrated cas1-cas2 operon is controlled by a T7-lac inducible promoter
expressed when cells are dosed with both arabinose and IPTG (Yosef et al., 2012). IPTG releases
the repressor from an operator upstream of cas1-cas2, and arabinose induces the expression of
genomic T7 RNA-polymerase required for cas1-cas2 transcription. We sought to induce a range
of Cas1-Cas2 expression levels and measure the corresponding spacer acquisition rates. This was
done by titrating the IPTG dose with a fixed arabinose concentration (0.2%) and measuring spacer
acquisition over five days in the base strain containing pUC19. A control condition with no
arabinose or IPTG was run, revealing no detectable spacer acquisition over 5 days (Fig. S2.5AB),
indicating strong repression in the absence of both inducers. Seven arabinose-dosed conditions
were examined with the IPTG dose ranging between 0 and 5mM. The 0mM IPTG condition
produced a low but detectable level of array expansion, indicating slightly leaky expression with
arabinose alone (Figure 2.8C). The culture dosed with 0.05mM IPTG produced the fastest rate of
spacer acquisition. This experiment was also performed in the plasmid-free base strain, showing a
similar trend in spacer acquisition rates for the corresponding IPTG doses (Fig. S2.6).
2.3.2.3 CRISPR array copy number
To determine if multiple arrays affect spacer acquisition rates per cell, we compared
acquisition in strains containing one or two CRISPR arrays within the host chromosome. The
second array was derived from the native E. coli CRISPR locus and is hereafter referred to as the
“mini” array as it contains just two repeats flanking a single parental spacer (Fig. S2.7). This mini
array was integrated ~1.8Mbp away from the native CRISPR locus. Using unique pairs of primers,
the expansion of each array was independently monitored through a 5-day time course. The pUC19
plasmid was added to both strains to enhance spacer acquisition rates. 0.05 mM IPTG was used
for cas1-cas2 induction.
As shown in Figure 2.8D, both arrays in the two-array strain expanded slower than the
single acquisition locus of the one-array strain. However, the average number of spacers acquired
per cell for the two arrays combined was about the same as spacer acquisition in the one-array
strain. This suggests acquisition of new spacers was roughly split between the two arrays without
changing the overall rate of spacer acquisition per cell. In the two-array strain, the added mini-array expanded more slowly than the native array, potentially due to the native array being closer
to the chromosomal origin of replication (oriC), and therefore having a higher average copy
number than the mini-array (Skovgaard et al., 2011). Replicating bacterial cultures contain greater
sequence copy number for sequences closer to the oriC. This copy-number gradient relative to the
oriC has been shown in genomic DNA extracted from E. coli BL21-AI (Levy et al., 2015).
2.3.2.4 Expression of heterologous DNA end-joining genes
We hypothesized the Cas1-Cas2 DNA substrate pool may be impacted by pathways that
can protect and join together free DNA ends in the cell. We tested this hypothesis by introducing
a bacterial non-homologous end joining (NHEJ) system made up of two genes that specifically
serve this purpose. E. coli does not have a native NHEJ pathway, but heterologous NHEJ can be
introduced. This simple, two-component NHEJ system found in some bacterial species utilizes the
genes ku and ligD for non-homologous end joining (Shuman et al., 2007). Ku binds to free DNA
ends protecting them from exonuclease degradation and LigD ligates these DNA ends together
(Aniukwu et al., 2008). We reasoned that these functions may preserve more Cas1-Cas2 substrate
by protecting intracellular DNA debris. ku and ligD native to Mycobacterium smegmatis were
assembled into an operon and genome integrated into our base recording strain. Spacer acquisition
rates were measured for this strain with and without pUC19. Figure 2.8E shows that expression
of Ku and LigD increased the spacer acquisition rate by ~14% for the base recording strain and
~124% in the strain containing pUC19. Gel image comparisons for the pUC19 strains can be found
in Fig. S2.8.
Figure 2.8 | Intracellular parameters modulate spacer acquisition. (A) Several factors are
identified that modulate spacer acquisition: Mobile genetic elements (e.g. plasmids), Cas1-Cas2
expression level, number of CRISPR arrays and Ku-LigD expression listed as B-E, respectively.
(B) Quantifying spacer acquisition rates from cas1-cas2 induction time course experiments.
Acquisition rates are shown for the base spacer recording strain with and without the pUC19
plasmid. (C) Modulating Cas1-Cas2 expression by varying the IPTG dose and measuring the
corresponding spacer acquisition rates for the base recording strain containing pUC19. (D)
Quantifying spacer acquisition rates in the base recording strain containing pUC19 with either
1 or 2 chromosome-based CRISPR arrays. (E) The impact of DNA end joining machinery (Ku +
LigD) on spacer acquisition rates in the base strain with and without pUC19. For all charts here,
means of three biological replicates ± SD are reported. Spacer acquisition rates were generated by
fitting to a constant acquisition rate and an array-length-dependent fitness reduction represented by $\alpha_\mu$.
Statistical comparisons provided in SI Table 4.
2.3.3 Enhanced array expansion rates boost phage protection
To test if array expansion rate influences phage resistance, CRISPR-interference enabled
spacer acquisition strains with different array expansion rates were infected with bacteriophage
Lambda. We utilized an infection protocol that tracks cell density post phage inoculation (Rajnovic
et al., 2019). OD600 measurements tracked culture growth in response to an infection. OD600
curves were compared for infected and uninfected cultures to determine the extent of phage-induced growth inhibition.
All strains used in this infection assay were plasmid-free, capable of acquiring spacers derived
from either the infecting bacteriophage or the self-genome. Four strains were run in this assay to
compare their relative resistance to phage infection (Figure 2.9A), C1C2: the base acquisition
strain containing only cas1-cas2, C1C2-N: the same base strain with the addition of NHEJ, C1C2-
C3: the base strain with all CRISPR machinery but not NHEJ, and C1C2-C3-N: the base strain
with all CRISPR machinery and NHEJ. Cultures were inoculated with or without Lambda phage
and dosed with spectinomycin (50 µg/mL), IPTG (0.05mM), arabinose (0.2%) and rhamnose
(0.1%). These cultures were distributed across a 96-well plate and run on a plate reader for 21
hours with OD600 data collected every 20 minutes.
For each strain we calculated area under the curve for infected and uninfected cultures to
quantify phage induced growth inhibition. The difference in the respective areas is the percent
growth inhibition relative to uninfected cultures (Rajnovic et al., 2019). For each growth curve the
area was calculated from a start point of detection (SPD) to an end point of detection (EPD). SPD
is defined as the threshold at which the culture growth rate reaches 0.001 OD units per minute and
the EPD is 15 hours post SPD. The base recording strain C1C2, lacking CRISPR interference and
NHEJ machinery, had a phage induced growth inhibition of ~35%, Figure 2.9B. C1C2-N growth
was inhibited by ~39%. C1C2-C3, capable of utilizing spacer derived crRNAs for targeting, had
growth inhibited by ~31% and the strain combining the full CRISPR system with NHEJ (C1C2-
C3-N) had a 3-fold reduction in growth inhibition at ~12%.
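The area-under-the-curve calculation can be sketched as follows. This is an illustrative reading of the procedure described above, not the analysis script used for the thesis: the function names, the use of the trapezoid rule, and the per-curve detection of SPD from the difference between consecutive readings are assumptions. Percent inhibition is taken as the relative difference between the uninfected and infected areas, and the OD600 values are synthetic.

```python
def start_point_of_detection(times_min, od, threshold=0.001):
    """Index of the first reading whose growth rate reaches `threshold` OD/min."""
    for i in range(1, len(od)):
        rate = (od[i] - od[i - 1]) / (times_min[i] - times_min[i - 1])
        if rate >= threshold:
            return i
    raise ValueError("growth rate never reached the detection threshold")


def area_under_curve(times_min, od, window_min=15 * 60):
    """Trapezoid-rule area from SPD to EPD, where EPD = SPD + window_min."""
    start = start_point_of_detection(times_min, od)
    end_time = times_min[start] + window_min
    area = 0.0
    for i in range(start, len(od) - 1):
        if times_min[i + 1] > end_time:
            break  # stop at the end point of detection
        area += 0.5 * (od[i] + od[i + 1]) * (times_min[i + 1] - times_min[i])
    return area


def percent_growth_inhibition(times_min, od_uninfected, od_infected):
    """Percent reduction in area for the infected culture vs. the uninfected one."""
    a_uninfected = area_under_curve(times_min, od_uninfected)
    a_infected = area_under_curve(times_min, od_infected)
    return 100.0 * (1.0 - a_infected / a_uninfected)


# Synthetic curves: readings every 20 min for 21 h (values are made up).
times = [20 * i for i in range(64)]
uninfected = [min(1.2, 0.05 + 0.004 * t) for t in times]
infected = [0.5 * x for x in uninfected]
print(percent_growth_inhibition(times, uninfected, infected))
```

Detecting SPD separately for each curve means a culture whose growth is delayed by infection is integrated over its own 15-hour window; using a shared window instead would be an equally simple variant.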
A second experiment using just C1C2-C3-N was run to directly test the hypothesis that
phage protection varies with spacer acquisition rates. The IPTG dose range experiment in the base
recording strain showed that spacer acquisition rates are modulated with Cas1-Cas2 expression
levels. In this experiment we inoculated C1C2-C3-N with one of four IPTG doses from 0mM to
5mM while keeping the arabinose dose fixed (0.2%) and also inducing Cascade-Cas3 expression
with rhamnose (0.1%) as seen in Figure 2.9C. The experiment was run in the same way as the
previous plate-reader time course. Phage induced growth inhibition was lowest (~12%) for the
IPTG dose previously shown to produce the highest spacer acquisition rate (0.05mM), as seen in
Figure 2.8C, whereas inhibition in the 0mM IPTG cultures was ~60% higher. Spacer acquisition
rates associated with Cas1-Cas2 expression levels corresponded in rank order to the degree of
protection from phage infection, indicating a positive correlation for spacer acquisition rate and
phage protection. The nearly 20% growth inhibition for cells expressing NHEJ at 0 mM IPTG, compared
with 30% growth inhibition for cells without NHEJ at 0.05 mM IPTG, suggests that the potential for NHEJ
to increase spacer acquisition is exaggerated in phage-infected cells. A similar increase in the
expansion rate for NHEJ-expressing cells with plasmid compared to cells without plasmid was
observed in Fig. 2.8E.
From the most-protected C1C2-C3-N strain, 14 expanded-array clones isolated post
infection were sequenced across the CRISPR array to identify the newly acquired spacer
sequences. Interestingly, all 15 of the spacers identified were derived from the host genome. These
15 spacer sequences are listed in Table S5.
Figure 2.9 | Phage protection is correlated with spacer acquisition rates. (A) Spacer recording
strains with or without CRISPR effector machinery (Cascade-Cas3), and with or without
heterologous NHEJ (Ku-LigD), were infected with bacteriophage Lambda to measure the degree
of growth inhibition relative to uninfected controls. OD600 measurements tracked culture growth
over 16 hours. Means of five biological replicates ± SD are reported. (B) Percent growth inhibition
from (A) calculated as the percent difference in area under the curve for infected versus uninfected.
The period assessed was the 15 hours from when cultures reached a growth rate of 0.001 OD units
per minute. Expression (+) or not (-) for relevant genes is indicated below the chart. (C) Percent
growth inhibition for the most protected strain (effector + NHEJ competent) treated with a range
of IPTG doses to modulate Cas1-Cas2 expression. Means of three biological replicates ± SD are
reported. Statistical comparisons provided in SI Table 4.
2.4 Discussion
In this study we used a synthetic CRISPR-Cas system derived from E. coli to characterize
rates of spacer acquisition in the absence of interference machinery. Establishing a baseline spacer
acquisition rate for this system allowed us to identify intracellular factors that modulate array
expansion rates. We identified three intracellular factors that affect these rates: 1) Cas1-Cas2
protospacer substrate, 2) Cas1-Cas2 expression levels and 3) the number of arrays within the
genome. Introducing a high copy number plasmid (pUC19) into our base recording strain increases
the Cas1-Cas2 protospacer substrate concentration (Levy et al., 2015; Sheth et al., 2017) resulting
in a nearly 2-fold increase in the rate of spacer acquisition (Figure 2.8B). Utilizing an inducible
promoter in the cas1-cas2 operon, we varied the expression levels for these adaptation genes and
measured a corresponding range of spacer acquisition rates (Figure 2.8C). The fastest rates of
spacer acquisition occurred at a midrange IPTG concentration. With Cas1 and Cas2 proteins
coalescing to form a 6-subunit integrase complex, (Cas1)$_2$(Cas2)$_2$(Cas1)$_2$, saturating expression
(≥1mM IPTG) may produce larger, non-functional protein aggregates, reducing spacer acquisition
potential. With a second array introduced into the base strain, the rate of new spacers acquired per
cell did not change much as the acquisition rate for each array was roughly half that of the single-array strain (Figure 2.8D). We hypothesized that expression of heterologous NHEJ genes ku and
ligD may enhance spacer acquisition by stabilizing DNA fragments, increasing the concentration
of Cas1-Cas2 substrate within cells. Ku binds to and protects DNA ends from exonucleolytic
degradation and LigD can ligate these ends together. We discovered that NHEJ expression does
increase spacer acquisition, with rates boosted as much as ~124% relative to the non-NHEJ control
(Figure 2.8E). This Mycobacterium smegmatis derived NHEJ construct was also introduced into
a strain with a fully functional Type I-E CRISPR system and found to provide a 3-fold increase in
protection from phage infection. These strategies to control spacer acquisition may help to better
understand and engineer CRISPR-Cas systems in bacteria (Shivram et al., 2021). Prior work with
Type II-A CRISPR showed that transcription of the CRISPR array may affect spacer acquisition
rates. Array transcription is a mechanism to resolve the post-synaptic complex to complete spacer
integration (Budhathoki et al., 2020). It is not clear if array transcription would also influence
spacer acquisition for Type I-E CRISPR. In this study array transcription was not modified or
intentionally regulated, therefore our measurements would not reveal any impacts of array
transcription.
Using the data from these studies we modeled naïve spacer acquisition in this system. The
basic model, considering constant spacer acquisition and cell growth, predicted a linear increase
in average array length per cell over time, which was not supported by experimental data (Figure
2.6D). Several other variables were modeled including slowed acquisition, fitness effects
associated with expanded populations and array-expansion inactivation (mutation). Both the
modeling (Figs. 2.7C and S2.2) and experimental data (Figure 2.7E) suggest fitness effects linked
to array-expanded populations are the source of this disparity. Although CRISPR-Cas self-targeting of host RNA has previously been identified as an infection defense strategy, a mechanism
for spacers influencing host fitness in the absence of interference machinery has not been detected
(Meeske et al., 2019). We suspect that precursor crRNA transcripts in cells lacking Cascade
mediated processing may interact with complementary sequences in the genome or within
plasmids. Genome-complementary RNA sequences may impact host gene expression, producing
net fitness effects. It is not known if antisense sequences within unprocessed CRISPR-array
transcripts can affect the translation of mRNAs, but bacterial RNA interference (RNAi)
mechanisms are known to be involved in post-transcriptional gene silencing (PTGS) (Lioliou et
al., 2010; Rusk et al., 2012; Saberi et al., 2016). Alternatively, it has been shown that
overexpression of Cas1-Cas2 can result in non-canonical spacer integrations into non-array regions
within the genome, potentially resulting in fitness effects (Nivala et al., 2018). As shown in Fig.
S2.3, the fitness impact of array expansion was variable, presumably depending upon the sequence
of the acquired spacer. It is intriguing to consider if a gene regulatory effect from CRISPR-array
transcripts, in the absence of additional interference machinery, could be a secondary and more
primitive function of a CRISPR array.
Several emerging technologies utilize spacer acquisition as a tool for various applications
including recording the occurrence and order of events in cellular environments resulting in
transcription (Munck et al., 2020; Sheth et al., 2018; Lear et al., 2023) and converting digital data
to biological storage in CRISPR arrays (Shipman et al., 2017; Yim et al., 2021). A broader
understanding of spacer acquisition and the factors affecting rates of spacer uptake may enable
tuning the frequency of these events to optimal rates for specific applications. Also, engineering
conditions to maximize spacer acquisition rates may increase the probability of recording rare
events.
We show that faster CRISPR adaptation provides greater protection from phage infection
in our engineered E. coli host with constant CRISPR-Cas induction (Figure 2.9A). As host cells
acquire resistance to an infection, phage can coevolve through escape mutations (Barrangou et al.,
2007), with host spacer acquisition (naïve and primed) a key rate-limiting factor for the adaptive
immune response (Sternberg et al., 2016; Heler et al., 2017; Datsenko et al., 2012; Staals et al.,
2016). In this sense, faster spacer acquisition would increase survival during infection. However,
the benefit/cost ratio of spacer integration rates may be proportional to the threat of the infectious
agent as potentially damaging self-targeting spacers can also be acquired. Strains with very long
CRISPR arrays either have an increased benefit from a greater repertoire of spacer sequences, or
somehow have managed to reduce the costs associated with array growth. Over much longer times,
the rate of array contraction should play a role in setting the size of the array. Though a bias exists
for the acquisition of spacers derived from MGEs, genome-derived spacers are also acquired
during infection (Shipman et al., 2016; Levy et al., 2015). Bacteria are generally deficient in robust
double-strand DNA break repair pathways (Finger-Bou et al., 2020; Wimmer et al., 2019),
reducing the probability for self-targeting survival. Wild type bacterial cells generally maintain
strong regulatory control over CRISPR-Cas to maximize the benefit of expression in dynamic
natural environments (Markulin et al., 2020; Patterson et al., 2016). Increasing spacer acquisition
rates to enhance CRISPR-Cas efficiency may help improve bacteriophage resistance needed in
fermentation and other industrial processes (Deem, 2020; Garneau et al., 2011; Maguin et al.,
2022), with consideration for temporal control of expression. The fact that no phage-derived
spacers were detected post infection is puzzling, although prior work has shown that CRISPR-associated defense against phages does not always lead to maintenance of phage-targeting spacers
(Strotskaya et al., 2017).
NHEJ genes ku and ligD function naturally to repair double-strand DNA breaks. In the
context of CRISPR spacer acquisition, however, we hypothesized a role for these genes in
producing a larger Cas1-Cas2 DNA substrate pool. We verified increased acquisition rates in
strains expressing these NHEJ genes. This result implies that other pathways may also impact
spacer acquisition rates by altering the concentration of intracellular DNA debris. In E. coli,
exonucleases RecBCD and SbcCD degrade DNA from free ends. Directly regulating the
expression of these genes or expressing an exonuclease inhibitor such as Gam may also result in
enhanced spacer acquisition rates. Discovery of other native and heterologous factors affecting
CRISPR efficiency, as well as engineering and evolving improvements may further expand
application potentials for both CRISPR adaptation alone and for functional CRISPR-Cas defense.
2.5 Supporting Experiments
2.5.1 Spacer acquisition in strains with different starting array lengths
To this point all spacer acquisition assays were performed with strains not exposed to a
cas1-cas2 induction period prior to the spacer acquisition serial passage time course. All strains
evaluated were parental +0 (wild type CRISPR array starting length). Here, we sought to assess
spacer acquisition rates in clonal strains that were array-expanded through a previous
cas1-cas2 induction serial passage experiment. We isolated five different clones from the base
recording strain containing pUC19. These clones were array-expanded to +1, +2, +3, +4 or +5.
They had been isolated and characterized at the end of cas1-cas2 induction time course
experiments. The +1 and +2 were isolated from a parental +0 culture induced for Cas1-Cas2
expression for five days. The isolated +1 was then fed into a cas1-cas2 induction experiment and
serial passaged for eleven days, followed by agar streaking for clonal isolates. The +3, +4 and +5
clones were identified from this screen.
These five strains along with the parental +0 were each cultured in triplicate for eleven
days of constant cas1-cas2 induction with 0.05mM IPTG and 0.2% arabinose. Cultures were
passaged once a day as was done with prior induction time course experiments. Samples from
cultures were taken once daily for PCR analysis across the leader-proximal end of the array, as
described previously. The average new spacers per cell were tracked over the time course duration
as seen in Figure 2.10A. For all six strains, spacer acquisition rates slowed down significantly
over the course of the experiment. Spacer acquisition rates were calculated from a best-fit
exponential decay curve over the 24-96 hour time range. Figure 2.10B shows projections for
new spacers per cell in the six respective cultures if these rates were constant over the full 11-day
time course. The difference in area under the curve between the projected and experimental data is shown
for the +0 parental strain in Figure 2.10C. Areas were calculated by summing the trapezoids under
the respective curves between 24 and 264 hours. The differences in these areas were calculated as
percent reduction from the projected curves, as seen in Figure 2.10D. The +0 parental and +1
strains show the smallest reductions at 26.7% and 25.1% respectively. The +3 and +5 strains show
the greatest reduction in acquisition at 63.8% and 57.1% respectively whereas the +2 and +4 strains
have reductions between the other two groups, at about 45%. The total area under the curves for
the +2 and +4 strains was less than that of the +3 and +5 strains. The lower acquisition projections,
which follow from slower acquisition rates from 24-96h, explain the smaller differences under the curves for the +2
and +4 strains.
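The area comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in this work: the function names and the example values are hypothetical, and only the trapezoidal summing and the percent-reduction definition are taken from the text.

```python
def trapezoid_area(times_h, values):
    """Area under a sampled curve, summed one trapezoid per sampling interval."""
    if len(times_h) != len(values):
        raise ValueError("times_h and values must have the same length")
    area = 0.0
    for i in range(1, len(times_h)):
        width = times_h[i] - times_h[i - 1]
        area += 0.5 * (values[i] + values[i - 1]) * width
    return area


def percent_reduction(times_h, projected, experimental):
    """Reduction of the experimental area, as a percent of the projected area."""
    projected_area = trapezoid_area(times_h, projected)
    experimental_area = trapezoid_area(times_h, experimental)
    return 100.0 * (projected_area - experimental_area) / projected_area


# Hypothetical daily samples between 24 h and 264 h (not data from this study).
times_h = list(range(24, 265, 24))
projected = [0.01 * t for t in times_h]               # constant-rate projection
experimental = [min(0.01 * t, 1.2) for t in times_h]  # curve that plateaus
reduction = percent_reduction(times_h, projected, experimental)
```

Because the trapezoids are summed per interval, the same functions would also handle unevenly spaced sampling times.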
Figure 2.10 | Spacer acquisition in strains with different starting array lengths. (A) cas1-cas2
induction time course data for six strains of different starting array lengths. Array expansion was
tracked within the cultures over 11 days with constant induction. (B) Acquisition rates from panel
A were calculated from best fit exponential decay curves between 24-96h. These rates were set as
constant for the eleven-day period to project spacer acquisition trajectories for each strain
assuming constant spacer acquisition rates. (C) The difference in area under the curve for projected
spacer acquisition in the +0 parental culture for theoretical constant acquisition vs. the actual
experimental data. (D) Quantified differences in area under the curve between projected (constant
acquisition) vs. experimental data, calculated as percent reduction of area under projected curve.
Areas were calculated using the trapezoidal summing method. The means of three biological
replicates ± SD are reported.
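One way to read the rate estimate and projection in panel B is sketched below. Two assumptions are made here that the caption does not state: that the fitted quantity is the fraction of cells still at the starting array length, modeled as f(t) = f0·exp(-k·t), and that a constant per-cell rate k projects to k·t new spacers per cell. The function names and input values are hypothetical.

```python
import math


def fit_decay_rate(times_h, unexpanded_fraction):
    """Least-squares slope of ln(fraction) against time.

    Returns k (per hour) for the assumed model fraction(t) = f0 * exp(-k * t).
    """
    logs = [math.log(f) for f in unexpanded_fraction]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return -sxy / sxx


def project_constant_rate(k, times_h):
    """Expected new spacers per cell if every cell keeps acquiring at rate k."""
    return [k * t for t in times_h]


# Hypothetical fractions of cells still at the starting array length.
fit_times = [24, 48, 72, 96]
fractions = [math.exp(-0.004 * t) for t in fit_times]
k = fit_decay_rate(fit_times, fractions)
projection = project_constant_rate(k, range(24, 265, 24))
```

Fitting on the log scale keeps the sketch dependency-free; a nonlinear least-squares fit on the untransformed data would weight the time points differently.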
The +2, +3, +4 and +5 strains all show a reduction in spacer acquisition much earlier than
the parental +0 and +1 strains. These four strains show spacers/cell acquisition rates peaking at
about 120h, whereas the two shorter starting array length strains peak much later at 216h. It is
unclear why strains with longer starting array lengths are significantly more impacted than those
with shorter array lengths, but it may be due to the specific spacer sequences within the arrays
more so than just the array length itself. To test this hypothesis, we measured spacer acquisition in
strains containing the same number of new spacers but of different sequence identities (Section
2.5.2). No detectable spacer loss (array contraction) was measured across the five expanded-array
starting strains through several days of constant cas1-cas2 induction (Figure 2.11). This conclusion is based on
the absence of quantifiable amplicon bands below the starting array length bands, indicating
that newly acquired spacers are relatively stable within the arrays.
Figure 2.11 | No contraction detected below array starting lengths. Five clonal strains each
with a successively greater number of new spacers were run in an 11-day constant cas1-cas2
induction time course. None of these cultures produced quantifiable amplicon bands shorter than
the starting array length, at any of the analyzed time points. The PCR-band quantification method
can detect subpopulations as small as ~0.5% of the total population. Gel images on the right show
the 0h through 120h time points from the 11-day serial passage experiments.
2.5.2 Spacer acquisition assay for distinct +2 expanded starting strains
From the spacer acquisition experiment with strains of different starting array lengths, we
found that longer arrays, initially containing at least two new spacers prior to the start of the
experiment, acquired spacers near the parental rate over the first few days but slowed significantly
earlier than +0 and +1 strains beyond day 3. We sought to test the hypothesis that all strains of the
same array starting length expand at the same rate, regardless of the spacer sequences present. If
arrays of the same length acquire spacers at different rates, then it’s not just the array length that
sets the acquisition rate. Alternatively, what appears as relatively slower acquisition rates may in
fact be fitness effects. If the array-expanded clones of a parental strain have reduced fitness on
average, relative to parental, then the acquisition rate will appear slower.
We investigated three different +2 clonal strains isolated from previous time course
experiments. These three strains were derived from the base recording strain containing pUC19.
Each was isolated at the conclusion of a five-day constant cas1-cas2 induction time course.
Alongside these three strains we ran the parental +0, also containing pUC19. Each strain was run
in duplicate for a multi-day cas1-cas2 induction time course. All three
of the +2 strains were sequenced to identify new spacers present (Table S5). The three +2 strains
are identified as S-129, S-135 and S-139. The two new spacers in each of these strains were
classified as derived from either the host BL21-AI genome or the pUC19 plasmid and identified
as being either positive-sense or antisense. The +sense/antisense identity is determined by the
orientation of the spacer within the CRISPR array. When a spacer is derived from a protospacer
within a coding region, the precursor crRNA produced from array transcription is either the same
sequence as the source gene product or the complementary sequence, depending on the orientation.
This potential for precursor-crRNA/mRNA complementarity is an area of focus in our CRISPR
adaptation work. The two new spacers of S-129 are both genome-derived, with one positive-sense
and the other antisense. In S-135 both new spacers are genome-derived and both are positive-sense.
S-139 contains two pUC19-derived spacers, one positive-sense and the other antisense.
For the three +2 strains and +0 parental, an 11-day cas1-cas2 induction time course was
run to characterize spacer acquisition. Acquisition was tracked over time as average new spacers
per cell (Figure 2.12A). The areas under the curves between 24-264h were quantified, using the
trapezoidal summing method to compare the extent of spacer acquisition for each strain (Figure
2.12B). Spacer acquisition was greatest in the parental +0 control. For the +2 strains there was
significant variation in spacer acquisition across the three strains with S-129 and S-139 cultures
both expanding to about the same extent, but area under the curve for S-135 was about 50% greater.
The variance across the +2 strains indicates that array length alone does not determine the extent
of array expansion. Although S-135 had the greatest area under the curve for the three +2 strains
over the period analyzed, its rate of expansion was slower over the first three days relative to the
other two strains (Figure 2.12A). It’s unclear what may cause these early acquisition rate
differences but the longer period of sustained acquisition for S-135 resulted in greater overall
spacer integration within the population over the experimental period.
For all strains run in the time course, including parental +0, the average new spacers per
cell declined between 192 and 264 hours. This erosion of spacers within the population could
be due to either spacer contraction (spacer loss from within arrays typically due to slippage during
genome replication) or mean fitness reduction associated with expanded-array cells relative to
unexpanded parental cells. Array contraction is an unlikely explanation for two reasons. First, we
have performed several experiments designed to identify and quantify the extent of contraction (Fig.
S2.1), with no contracted amplicon bands reaching our quantifiable threshold. Second, the
mechanism of replication slippage whereby repeats flanking a spacer partially anneal (leading to
spacer loss) can occur on either the template strand or the replicating strand. This means slippage
can result in either spacer loss or spacer duplication (Garrett, 2021). If the rates for both forms
of this mechanism are equal, they would offset and eliminate any detectable effects from slippage.
Therefore, spacer erosion in our populations is more likely due to fitness effects associated with
expanded-array clones. Interestingly, variance across replicates increases significantly at about the
same time as spacer erosion occurs within the cultures. Figure 2.12C highlights this variance
across replicate cultures and the increasing variance over the second half of the experiment. This
experiment-wide variance was quantified using the standard deviation as a percentage of the mean,
with the average of all four strains reported at each time point (Figure 2.12D). This increased
potential for replicate divergence over the course of an experiment may also be evidence of spacer-sequence-dependent fitness effects. Fitness-enhanced array-expanded cells within a mixed culture
would gain in proportion relative to the parental +0 population over time. If this is not occurring
in the replicate culture, the variance in average spacers per cell between the cultures would increase,
as we see in Figures 2.12C and 2.12D.
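The variance measure used here (standard deviation as a percentage of the mean, averaged across strains at each time point) can be illustrated as follows. This is a hedged sketch with hypothetical names and values. It assumes the sample standard deviation (n - 1 denominator); the text does not say whether the sample or population form was used.

```python
import statistics


def cv_percent(replicate_values):
    """Sample standard deviation expressed as a percentage of the mean."""
    mean = statistics.mean(replicate_values)
    return 100.0 * statistics.stdev(replicate_values) / mean


def mean_cv_across_strains(values_by_strain):
    """Average the per-strain CV at a single time point."""
    return statistics.mean([cv_percent(v) for v in values_by_strain.values()])


# Hypothetical spacers-per-cell values for two replicates at one time point.
timepoint = {"+0": [1.0, 3.0], "S-129": [2.0, 2.0]}
average_cv = mean_cv_across_strains(timepoint)
```

Normalizing by the mean makes time points with very different spacers-per-cell levels comparable, which is why the measure suits a curve that rises over the experiment.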
The three +2 strains and the parental +0 were also cultured to track growth over time by
frequently measuring OD600. Cultures were inoculated in 96-well plates and incubated in a plate
reader with intermittent shaking and OD600 readings. Cultures were run over 17 hours either
without (Figure 2.12E) or with (Figure 2.12F) cas1-cas2 induction. Cultures grew to a greater
average density when Cas1-Cas2 was not expressed. These uninduced cultures displayed similar
growth rates with slight variance in carrying capacities across the strains. The induced cultures
also showed very little variance in growth rates but significantly greater variation in carrying
capacities. The two strains with significantly reduced carrying capacities under induction, S-129
and S-139 are the two strains producing cultures with the fewest new spacers across the 11-day
cas1-cas2 induction experiment (Figures 2.12A and 2.12B).
Figure 2.12 | Spacer acquisition in strains with two newly acquired spacers. (A) Three different
+2 strains were induced for constant expression of Cas1-Cas2 over 11 days. Average new spacers
per cell was tracked for these three strains as well as the parental +0 strain. (B) Array expansion
from A is quantified as area under the curve for each of the four strains. (C) Both replicates plotted
from the 11-day induction time course highlighting the increasing variance over the course of the
experiment. (D) The average variance between replicates quantified for each of the four strains
from panel C. Variance measured was SD as % of mean at respective time points. (E) OD600
growth curves measured over 17 hours for each of the four strains cultured with no induction of
cas1-cas2. (F) OD600 curves measured over 17 hours for each of the four strains cultured with
cas1-cas2 induction. The means of two biological replicates ±SD are reported.
2.5.3 NHEJ genes ku and ligD individually enhance spacer acquisition
Non-homologous end joining (NHEJ) genes ku and ligD of Mycobacterium smegmatis
expressed together in E. coli were shown to significantly increase the rate at which spacers are integrated
into CRISPR arrays (Figure 2.8E). To determine which of these two genes is responsible for
augmented array expansion rates, we generated strains with just one of the two genes integrated
into the host genome. Five-day constant cas1-cas2 induction experiments were run with daily
samples taken to quantify and track subpopulation proportions (Figure 2.13A). Both the ku-only
and ligD-only strains acquire spacers faster than parental but not as fast as the strain containing
both ku and ligD (Figure 2.13B). We can quantify just spacers/cell acquisition above parental by
subtracting spacers/cell of parental from the other three strains. We can then combine the values
from the ku-only and ligD-only strains. These combined spacers/cell values closely match the
acquisition of the strain containing ku and ligD together (Figure 2.13C). This result indicates
that Ku and LigD each individually enhance spacer acquisition, and that the sum of their
independent effects closely matches the array-expansion increase we observed in the strain
with both NHEJ genes.
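The subtraction-and-sum comparison described above amounts to the following arithmetic. The sketch uses hypothetical function names and made-up daily values, chosen so that the single-gene increases add up exactly; the real data match only approximately.

```python
def increase_over_parental(strain_series, parental_series):
    """Spacers/cell above parental at each time point."""
    return [s - p for s, p in zip(strain_series, parental_series)]


def combined_increase(ku_only, ligd_only, parental):
    """Sum of the ku-only and ligD-only increases over parental."""
    ku_gain = increase_over_parental(ku_only, parental)
    ligd_gain = increase_over_parental(ligd_only, parental)
    return [a + b for a, b in zip(ku_gain, ligd_gain)]


# Hypothetical average new spacers per cell on three consecutive days.
parental = [0.10, 0.20, 0.30]
ku_only = [0.15, 0.30, 0.45]
ligd_only = [0.20, 0.35, 0.50]
both = [0.25, 0.45, 0.65]

predicted = combined_increase(ku_only, ligd_only, parental)
observed = increase_over_parental(both, parental)
max_gap = max(abs(p - o) for p, o in zip(predicted, observed))
```

A small `max_gap` relative to the observed increase is what an additive (independent) contribution of the two genes would look like.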
Figure 2.13 | Both Ku and LigD alone increase spacer acquisition. (A) PCR products from gel
electrophoresis images tracking cultures through five days of cas1-cas2 induction for parental
recording strain (top left), with ku-ligD (NHEJ), ku-only, ligD-only. All four strains contain
pUC19. Gels from a single replicate of each strain are shown. (B) Quantified gel amplicons from
panel A converted to average new spacers per cell, tracked across the five-day time course. (C)
The y-value from the parental strain subtracted from the other three strains showing spacer
acquisition increases over parental. The blue dashed line combines spacer acquisition increases
from ku-only and ligD-only.
2.6 Acknowledgements
This work was supported by Army Research Office MURI Award No. W911NF1910269. We
would like to acknowledge Adam Krieger for helpful discussions and for providing the
bacteriophage Lambda bacterial lysogen.
2.7 Supplementary Material
Fig. S2.1 | Array contraction not detected through 5-days of culturing. (A) 5-day cas1-cas2
induction experiments using PCR primers spanning the array-distal end of the CRISPR array from
parental spacer 5 indicated no loss of spacers (array contraction) in that region. Two strains were
monitored (using primers LD-FP and LD-RP), the base recording strain with (right) and without
(left) pUC19. Colored parental spacers are numbered 1-13. (B) Five expanded-array clones were
isolated at the end of a 5-day induction time course containing between +1 and +5 new spacers
(base recording strain w/pUC19). These array-expanded strains were mixed in different
combinations for a 5-day time course with no cas1-cas2 induction. The leader-proximal PCRs for
these strains revealed no detectable bands below the parental bands through these experiments.
Fig. S2.2 | Model simulations fitting to a 10-day cas1-cas2 induction experiment for the base
recording strain containing pUC19. (A) The expansion of the array was simulated starting with
100% of the population being at length +0 at t=0. Array expansion, using the rate derived from +0
parental population decay was used to simulate the expansion of all array-length populations over
ten days. (B) Using a best fit to all array length populations, finding the acquisition rate and cas1-
cas2 mutation rate that together produce the best fit. Mutation did affect the cell growth rate. (C)
Best fit model for acquisition rate and fitness, using αi growth for array-length populations where
i is the number of new spacers. (D) Residuals calculated from the model fit to experimental data
for each detected expanded-array population. Experimental data is the mean of three biological
replicates ±SD. The code for running these simulations is Sim. Code 1. Fits minimize average
residual across all array lengths (+0 to +4) over all time points.
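For orientation, a toy version of an array-expansion model with a per-spacer growth penalty is sketched below. It is not Sim. Code 1 and does not reproduce the fits shown here. The discrete step, the function name, and the update rule (a fixed share of each class gains one spacer, then class i is reweighted by α^i) are all assumptions made for illustration.

```python
def simulate_array_expansion(acquisition_rate, alpha, n_steps, max_new_spacers=4):
    """Track the fraction of cells holding 0..max_new_spacers new spacers.

    Each step, a share `acquisition_rate` of every class gains one spacer,
    then class i is reweighted by alpha**i (growth penalty) and renormalized.
    """
    fractions = [1.0] + [0.0] * max_new_spacers  # 100% of cells start at +0
    history = [list(fractions)]
    for _ in range(n_steps):
        leaving = [f * acquisition_rate for f in fractions]
        leaving[-1] = 0.0  # the longest tracked class only accumulates
        after_acquisition = []
        for i, f in enumerate(fractions):
            arriving = leaving[i - 1] if i > 0 else 0.0
            after_acquisition.append(f - leaving[i] + arriving)
        grown = [f * alpha ** i for i, f in enumerate(after_acquisition)]
        total = sum(grown)
        fractions = [g / total for g in grown]
        history.append(list(fractions))
    return history


# Hypothetical parameters; one step stands for an arbitrary unit of growth.
no_penalty = simulate_array_expansion(0.1, 1.0, n_steps=1)
with_penalty = simulate_array_expansion(0.1, 0.9, n_steps=1)
```

With `alpha` below 1 the expanded classes lose share at every step, so the +0 population appears to decay more slowly than the acquisition rate alone would predict.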
Fig. S2.3 | Expanded vs. unexpanded-array competition experiments without cas1-cas2
induction. (A) Two strains, the base recording strain with and without pUC19, were induced for
Cas1-Cas2 expression for 24h. Cultures were then passed into fresh media without induction
chemicals. Cultures were grown for 8h (24-32h) to allow for residual Cas1-Cas2 to degrade. From
32-96h, PCR measurements identified changing expanded proportions. Vertical dashed lines
represent 1:100 passaging into fresh media. (B) Clones from one of the replicates (base strain
w/pUC19) were isolated from the 32h time point. 100 clones were screened across the leader
proximal end of the array, with 14 identified as expanded (all +1). (C) Competition experiments
were run for 48h competing each of the 14 expanded clones individually with +0. PCR bands were
quantified to determine the ratios of +1/+0 at each time point. (D) +1 proportion changes from the
14 competition experiments across the 24-48h period showing 12 of the fourteen +1 clones losing
in proportion to the +0 cells. (E) Percent change in the +1 proportion for each of the fourteen +1
clones from the respective competition experiments, from 0-48h. “less/more fit +1 clone” labels
refer to +1 fitness relative to the competing +0 population.
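The conversion from a quantified +1/+0 band ratio to a +1 proportion, and the percent change between two time points, can be written out as below. This is an illustrative sketch with hypothetical names and values. It assumes the culture contains only the +1 and +0 populations, so the two proportions sum to one.

```python
def proportion_from_ratio(ratio_expanded_to_parental):
    """Convert a +1/+0 band ratio into the +1 share of a two-strain culture."""
    return ratio_expanded_to_parental / (1.0 + ratio_expanded_to_parental)


def percent_change(start_proportion, end_proportion):
    """Relative change in the +1 proportion, as a percent of the starting value."""
    return 100.0 * (end_proportion - start_proportion) / start_proportion


# Hypothetical ratios at the start and end of a competition experiment.
start = proportion_from_ratio(1.0)       # equal band intensities
end = proportion_from_ratio(1.0 / 3.0)   # +1 band one third of the +0 band
change = percent_change(start, end)      # negative: the +1 clone lost share
```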
Fig. S2.4 | Alpha values calculated from five-day cas1-cas2 induction experiments. Alpha (α)
is the growth penalty for array-expanded populations. The growth rate for a strain decreases by
alpha for each new spacer added to the array, as shown in Equation 7. These alpha values were
derived from the experiments presented in Figures 2.8 and S2.6. Weighted mean alpha for all
conditions was 0.9972, calculated using a weight of 1/SD. Error bars represent the SD from
bootstrapping based on three experimental replicates.
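The weighted mean described in this caption, with each condition weighted by 1/SD, has the form sketched here. The function name and the input values are hypothetical and are not the alpha values from this study.

```python
def weighted_mean(values, sds):
    """Mean of `values` with each entry weighted by the inverse of its SD."""
    if len(values) != len(sds):
        raise ValueError("values and sds must have the same length")
    weights = [1.0 / sd for sd in sds]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical alpha estimates and bootstrap SDs for three conditions.
alphas = [0.996, 0.998, 0.997]
alpha_sds = [0.002, 0.001, 0.002]
mean_alpha = weighted_mean(alphas, alpha_sds)
```

Weighting by 1/SD lets the more precisely estimated conditions pull the mean toward their values.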
Fig. S2.5 | Quantifying spacer acquisition using PCR. (A) Base recording strain serial passaged
for five days without IPTG or arabinose, sampled daily for PCR analysis. No array expansion
detected. (B) Base recording strain with pUC19 also shows no detectable array expansion through
a five-day serial passaging experiment without cas1-cas2 induction. (C) Minimal population
proportion detection using this method is ~0.5%, as seen here for the +3 band at 72h for a spacer
recording strain. (D) The 72h bands from C are quantified using GelAnalyzer software. The +3
band as indicated by the “1.” tag represents 0.57% of the population.
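Converting quantified band intensities into population proportions and an average number of new spacers per cell can be sketched as follows. The sketch assumes each band's intensity is proportional to the share of cells with that array length and ignores any amplicon-length bias. The names and intensities are hypothetical, and GelAnalyzer output is not parsed here.

```python
def band_fractions(intensity_by_new_spacers):
    """Normalize band intensities so they sum to 1 across array lengths."""
    total = sum(intensity_by_new_spacers.values())
    return {n: value / total for n, value in intensity_by_new_spacers.items()}


def mean_new_spacers_per_cell(intensity_by_new_spacers):
    """Population-weighted average of the number of new spacers."""
    fractions = band_fractions(intensity_by_new_spacers)
    return sum(n * fraction for n, fraction in fractions.items())


# Hypothetical lane: keys are new spacers (+0, +1, +2), values are intensities.
lane = {0: 50.0, 1: 30.0, 2: 20.0}
fractions = band_fractions(lane)
average = mean_new_spacers_per_cell(lane)
```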
Fig. S2.6 | Cas1-Cas2 expression range in base spacer recording strain. Modulating Cas1-Cas2
expression by varying the IPTG dose and measuring the corresponding spacer acquisition rates for
the base recording strain containing no plasmids. The array expansion rate was calculated as in
Figure 2.8, assuming a fitness cost for array expansion. The array expansion rate is independent of
array length. Means of three biological replicates ±SD are reported. P values are provided in SI
Table 5.
Fig. S2.7 | Construction of an E. coli strain with two CRISPR arrays. (A) The E. coli BL21-
AI host strain contains a native Type I-E CRISPR array. A second “mini” array was inserted into
the genome. Primers for amplifying across the spacer integration sites are shown. (B) A map of the
E. coli BL21-AI genome annotated with the two arrays and distances relative to the origin of
replication (OriC).
Fig. S2.8 | NHEJ expression enhances spacer acquisition. (A) PCR products from a 5-day cas1-
cas2 induction time course for the base recording strain containing pUC19. (B) The same strain as
in A with the addition of NHEJ expression from an operon introduced into the genome. (C) Spacer
acquisition from A and B plotted as average new spacers/cell in each culture. Means of three
biological replicates ± SD are reported.
Fig. S2.9 | Spacer acquisition assay. (A) Cas1-Cas2 integrase complex captures and processes
short DNA sequences prior to CRISPR-array integration. Arrays contain alternating repeats (black)
and spacers. Expanded arrays can be detected using PCR with primers (purple arrows) that span
the spacer integration site (leader-repeat1 junction). Arrays are expressed from a promoter in the
leader sequence (orange), producing precursor crRNA, which can be further processed to crRNAs
via Cas6 (E. coli). DR: Direct Repeat; PS: Parental Spacer. (B) Five-day cas1-cas2 induction time
course. 1. Spacer recording strain scaled up overnight at 37ºC with no cas1-cas2 induction. 2.
Scaled overnight culture used to inoculate fresh media containing induction chemicals and
antibiotic(s). 3. After 24 hours, cultures are sampled for PCR to measure the extent of spacer
acquisition. Culture is then passaged 1:100 into fresh induction media for a subsequent round of
growth. This cycle is repeated for five days.
Fig. S2.10 | Doubling times calculated from OD600 growth curves. OD600 growth curves were
generated from 200 µL cultures in 96-well plates at 37 °C, measured every 20 minutes. (A) Log
phase growth defined for each of four strains grown in LB media, measured in a plate reader.
Vertical gray lines define the log phase growth for strains C1C2, C1C2-C3 and C1C2-N. The
orange vertical line defines the log phase growth for strain C1C2-C3-N. (B) For each strain, log
growth is fit to an exponential growth curve. The x-axis is time post log phase start. (C) Doubling
times ($T_d$) were calculated from the log growth fits. The fits are of the form $Y = Ae^{Bx}$, where $B$ is
the exponential growth rate. Doubling time $T_d = \ln(2)/B$.
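The doubling-time calculation in panel C can be illustrated with a short sketch. It fits ln(Y) = ln(A) + B·x by ordinary least squares, which is one common way to fit an exponential and may differ from the fitting routine actually used. The function names and the OD600 values are hypothetical.

```python
import math


def fit_exponential(times, od_values):
    """Fit Y = A * exp(B * x) by least squares on ln(Y); returns (A, B)."""
    logs = [math.log(v) for v in od_values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_t)
    return a, b


def doubling_time(growth_rate):
    """Td = ln(2) / B, in the same time units as the fit."""
    return math.log(2) / growth_rate


# Hypothetical log-phase readings every 20 minutes (time since log phase start).
times_min = [0, 20, 40, 60]
od600 = [0.05 * 2 ** (t / 30) for t in times_min]  # doubles every 30 minutes
a, b = fit_exponential(times_min, od600)
td = doubling_time(b)
```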
Table 2.1 | E. coli strains used in this study
Strains Description Resistance
Strain-1 BL21AI-Cas1Cas2 (Base spacer acquisition strain). S
Strain-2 Strain-1 with a constitutive NHEJ (Ku + LigD) construct integrated into the genome. S
Strain-3 Strain-1 with a second array (mini) integrated into the genome. S
Strain-4 Strain-1 with pUC19. S + A
Strain-5 Strain-2 with pUC19. S + A
Strain-6 Strain-3 with pUC19. S + A
Strain-7 Strain-1 with rhamnose-inducible Cascade-Cas3 integrated into the genome. S
Strain-8 Strain-2 with rhamnose-inducible Cascade-Cas3 integrated into the genome. S
Strain-9 DH5α NA
Strain-10 MG1655 NA
Strain-11 BL21-AI NA
Strain-12 Host strain containing Lambda prophage and pB33recA730 C
Table 2.2 | Plasmids used in this study
Plasmid Marker Origin Source ID
pCas KanR pSC101 Addgene 62225
pCas1+2 SmR CloDF13 Addgene 72676
pSPIN SmR pSC101 Addgene 160729
pCas3cRh GmR ColE1 Addgene 133773
pDSG372 TetR p15A Lab stock NA
pUC19 AmpR ColE1 NEB NA
pB33recA730 CmR f1 Lab stock NA
Table 2.3 | Primers used in this study
Name Sequence Description
C1C2-FP gtcttcagtctgatttaaataagcgttgatattcagtcaatatgcgactcctgcattagg Generates Cas1-Cas2-SmR fragment from plasmid pCas1+2 w/homology arms for BL21-AI genome integration.
C1C2-RP tcttcctgttatgtttttaatcaaacatcctgccaactccgtagcgaccgagtgagctag Generates Cas1-Cas2-SmR fragment from plasmid pCas1+2 w/homology arms for BL21-AI genome integration.
s2-FP cctgggaggattcataaagc Detection of genomic insertion at site 2 in BL21-AI.
s2-RP agctactggcttatcacgtc Detection of genomic insertion at site 2 in BL21-AI.
Array-FP tggatgtgttgtttgtgtg Amplification across native array spacer-insertion site.
Array-RP attttgcgtttcgttcaggt Amplification across native array spacer-insertion site.
Mini-FP acgacattttcctcgaggtc Amplification across mini array spacer-insertion site.
Mini-RP gcagtaaggactctagaagg Amplification across mini array spacer-insertion site.
s6-FP gttgctttatagacacccgc Detection of genomic insertion at site 6 in BL21-AI.
s6-RP ctttattcgcctgatgctcc Detection of genomic insertion at site 6 in BL21-AI.
MA-FP ttttcctcgaggtcatttccttggaatttacagcgaggcg Generates miniature E. coli CRISPR array from BL21AI for assembly with pSPIN-pSC101-site6.
MA-RP gcagtaaggactctagaaggttacccatccagggctaatc Generates miniature E. coli CRISPR array from BL21AI for assembly with pSPIN-pSC101-site6.
MAPS-FP gattagccctggatgggtaaccttctagagtccttactgc Generates the cargo-free pSPIN-pSC101-site6 backbone for a Gibson Assembly with the miniature CRISPR array.
MAPS-RP cctcgctgtaaattccaaggaaatgacctcgaggaaaatg Generates the cargo-free pSPIN-pSC101-site6 backbone for a Gibson Assembly with the miniature CRISPR array.
PSc1-SP cctcaggcatttgttgttg Primer for sequencing across assembled cargo junctions. Determining if pSPIN cargo is correct.
PSc2-SP gtaaaacgacggccagt Primer for sequencing across assembled cargo junctions. Determining if pSPIN cargo is correct.
PSs-SP attaccgcctttgagtgagc Sequencing primer to determine if the ligated spacer in pSPIN is correct.
Kan-FP agcgaccgagtgagctagctaagcgctgcatgcctatttg Switching out antibiotic resistance marker. Adding KanR to the pSPIN-pSC101 plasmid. Amplified from pRSF-DUET.
Kan-RP cttggtctgctgacaggacacagcaatagacataagcggc Switching out antibiotic resistance marker. Adding KanR to the pSPIN-pSC101 plasmid. Amplified from pRSF-DUET.
KPS-FP gccgcttatgtctattgctgtgtcctgtcagcagaccaag Generates the pSPIN-pSC101 backbone to pair with the KanR fragment for a Gibson Assembly.
KPS-RP caaataggcatgcagcgcttagctagctcactcggtcgct Generates the pSPIN-pSC101 backbone to pair with the KanR fragment for a Gibson Assembly.
MS1-FP ctaaagaggagaaaggatctatgcgcagcatttggaaagg Generates Mycobacterium smegmatis NHEJ fragment 1 for Gibson Assembly with pBAD backbone.
MS1-RP aatgcctgaggtttcagttactggctccaatccagaaaca Generates Mycobacterium smegmatis NHEJ fragment 1 for Gibson Assembly with pBAD backbone.
MS1b-FP tgtttctggattggagccagtaactgaaacctcaggcatt Generates the pBAD plasmid backbone for Gibson Assembly with M. smegmatis NHEJ fragment 1.
MS1b-RP cctttccaaatgctgcgcatagatcctttctcctctttag Generates the pBAD plasmid backbone for Gibson Assembly with M. smegmatis NHEJ fragment 1.
MS2-FP gcgggcaaagtgtttctggattggagccag Generates Mycobacterium smegmatis NHEJ fragment 2 for Gibson Assembly with plasmid-backbone-NHEJ-1.
MS2-RP aatgcctgagtgtttattgctcagcggtgg Generates Mycobacterium smegmatis NHEJ fragment 2 for Gibson Assembly with plasmid-backbone-NHEJ-1.
MS2b-FP ccaccgctgagcaataaacactcaggcatttgagaagcac Generates plasmid-backbone-NHEJ-1 for Gibson Assembly with Mycobacterium smegmatis fragment 2.
MS2b-RP tggttttcgccgcgttgttctggctccaatccagaaacac Generates plasmid-backbone-NHEJ-1 for Gibson Assembly with Mycobacterium smegmatis fragment 2.
MSp-FP ttttcctcgaggtcatttccaacctttgcggtatggcatg Generates the Mycobacterium smegmatis Ku-LigD (NHEJ) operon for G.A. as cargo into a pSPIN plasmid.
MSp-RP gcagtaaggactctagaaggtttcacttctgagttcggcc Generates the Mycobacterium smegmatis Ku-LigD (NHEJ) operon for G.A. as cargo into a pSPIN plasmid.
MSpb-FP ggccgaactcagaagtgaaaccttctagagtccttactgc Generates the pSPIN-pSC101-KanR-site6 plasmid backbone for Gibson Assembly with the Ku-LigD operon as cargo.
MSpb-RP tgccataccgcaaaggttggaaatgacctcgaggaaaatg Generates the pSPIN-pSC101-KanR-site6 plasmid backbone for Gibson Assembly with the Ku-LigD operon as cargo.
s9-FP ttgtgatgagcgcggatatg Detection of genomic insertion at site 9 in BL21-AI.
s9-RP tgccataccggtactacatg Detection of genomic insertion at site 9 in BL21-AI.
INT1-FP gctttttagactggtcgtacgtatagaaatgcaggag Generates the Cascade complex from MG1655 for G.A. with pCas3cRh backbone containing a rham-ind. promoter.
INT1-RP ctacacgaaccctttggctattacacctcaatcacagtgg Generates the Cascade complex from MG1655 for G.A. with pCas3cRh backbone containing a rham-ind. promoter.
INT1b-FP actgtgattgaggtgtaatagccaaagggttcgtgtagac Generates the pCas3cRh backbone for Gibson Assembly with the E. coli Cascade complex.
INT1b-RP ctcctgcatttctatacgtacgaccagtctaaaaagcg Generates the pCas3cRh backbone for Gibson Assembly with the E. coli Cascade complex.
INT2-FP tatctttggctccactgtgagggaggctattaatggaacc Generates the E. coli Cas3 from MG1655 for Gibson Assembly with the pCas3cRh-Cascade backbone.
INT2-RP ctttggctattacacctcaatgtacattgtgcaccttccc Generates the E. coli Cas3 from MG1655 for Gibson Assembly with the pCas3cRh-Cascade backbone.
INT2b-FP ggaaggtgcacaatgtacattgaggtgtaatagccaaagg Generates the pCas3cRh-Cascade backbone for Gibson Assembly with E. coli Cas3.
INT2b-RP gttccattaatagcctccctcacagtggagccaaagatag Generates the pCas3cRh-Cascade backbone for Gibson Assembly with E. coli Cas3.
INTpb-FP ttttcctcgaggtcatttcctggatctggcctagttaatc Generates the RhaB-Cascade-Cas3 fragment for Gibson Assembly with the pSPIN-pSC101-KanR-s6 backbone.
INTpb-RP gcagtaaggactctagaaggtgtacattgtgcaccttccc Generates the RhaB-Cascade-Cas3 fragment for Gibson Assembly with the pSPIN-pSC101-KanR-s6 backbone.
INTp-FP gggaaggtgcacaatgtacaccttctagagtccttactgc Generates the pSPIN-pSC101-KanR-sp6 backbone, for assembly with RhaB-Cascade-Cas3 as cargo.
INTp-RP ttaactaggccagatccaggaaatgacctcgaggaaaatg Generates the pSPIN-pSC101-KanR-sp6 backbone, for assembly with RhaB-Cascade-Cas3 as cargo.
Table 2.4 | P values generated using Student's t-test (2-tailed, unpaired)
P > 0.05 = red
P < 0.05 = yellow
P < 0.01 = green
P < 0.001 = blue
SAMPLE Base/pUC19
Base 0.000148514
SAMPLE 0.001mM 0.01mM 0.05mM 0.1mM 1mM 5mM
0mM 0.109246124 0.00346922 5.05278E-05 0.00631667 7.8724E-05 0.00217218
0.001mM 0.02034521 0.000141028 0.01217491 0.00086106 0.00534939
0.01mM 0.000585553 0.0592705 0.03538264 0.04384296
0.05mM 0.11733838 0.00039698 0.01854032
0.1mM 0.17522099 0.60837104
1mM 0.19486299
SAMPLE 0.001mM 0.01mM 0.05mM 0.1mM 1mM 5mM
0mM 0.725295989 0.01493278 4.34801E-05 0.00110965 3.9827E-05 0.0001806
0.001mM 0.01538193 3.75511E-05 0.00110734 2.6423E-05 0.00015833
0.01mM 0.003098678 0.01646259 0.03115102 0.02956846
0.05mM 0.50830512 0.02162303 0.001836
0.1mM 0.08530709 0.03644288
1mM 0.5937395
SAMPLE 2-array NA 2-array MA
1-array 0.005129823 0.00037386
2-array NA 0.02710738
SAMPLE Base/pUC19 Base/NHEJ Base/pUC19/NHEJ
Base 0.000148514 0.05700845 1.27806E-05
Base/pUC19 0.00021828 0.000391748
Base/NHEJ 1.52103E-05
SAMPLE C1C2-N C1C2-C3 C1C2-C3-N
C1C2 0.001359854 0.00084219 3.62028E-10
C1C2-N 6.3805E-06 8.0607E-11
C1C2-C3 8.39822E-10
SAMPLE 0.01mM 0.05mM 5mM
0mM 0.283963576 0.00295126 0.001436211
0.01mM 0.0044542 0.001912779
0.05mM 0.12592268
IPTG dose range for base strain w/NHEJ and pUC19 (Fig. 5C)
Plasmid vs. No plasmid (Fig. 4B)
IPTG dose range for base strain (Fig. S10)
IPTG dose range for base strain containing pUC19 (Fig. 4C)
1 v. 2 arrays in base strain containing pUC19 (Fig. 4D)
NHEJ vs. No NHEJ in the base strain with and without pUC19 (Fig.4E)
Growth inhibition across strains (Fig. 5B)
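For readers who want to reproduce this kind of comparison, a dependency-free sketch of a two-tailed, unpaired Student's t-test is given below. It assumes equal variances (the classical pooled form), since the table does not state which variant was used, and it obtains the p-value by numerically integrating the t density instead of calling a statistics library. The function names and replicate values are hypothetical.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, n_steps=20000):
    """Two-tailed p-value via Simpson integration of the density over [0, |t|]."""
    t_abs = abs(t_stat)
    if t_abs == 0:
        return 1.0
    h = t_abs / n_steps  # n_steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    return max(0.0, 1.0 - 2.0 * central_mass)


def student_t_test(sample_a, sample_b):
    """Pooled-variance t-test; returns (t statistic, degrees of freedom, p-value)."""
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = sum(sample_a) / na, sum(sample_b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df  # zero if both samples are constant
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t_stat = (mean_a - mean_b) / std_err
    return t_stat, df, two_tailed_p(t_stat, df)


# Hypothetical triplicate measurements for two conditions.
t_stat, df, p_value = student_t_test([0.42, 0.45, 0.40], [0.61, 0.66, 0.63])
```

The sketch raises a division error when both samples have zero variance, which a production implementation would need to handle.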
Table 2.5 | Newly acquired spacers sequenced from clonal strains.
Red: deviation from standard 33bp spacers.
Strain Sequence (5' to 3' from leader) bp Origin
gcaaaattgaatatatcgcgcccggagaacata 33 genome
gcatcatgggcgtttttacgcaataacccgctc 33 genome
ataatccatctcgccagtgaccacgacgattgc 33 genome
gtaatgtgttgcatccctattaatccgcatgat 33 genome
gattgatgtcatctggcagcatgcgatacggcg 33 genome
gtgaactggcgcagcttggcctgctggatatcc 33 genome
gtctggagtcgctgggccgtcataccatccaga 33 genome
gaccatatgcttgacgctcaaaccatcgctaca 33 genome
atacgcgcaatactggcgatataagcgagtgac 33 genome
aaagcatccccagtcgaggaatgctcttctgta 33 genome
gaatgctggcgaccaaaaatcacctccatccgc 33 genome
ggcgtttattgtttccttcctgccgccggataa 33 genome
gcagggccgttttcgttactccgccacactctc 33 genome
ggcaaacgcatcaccctctcgccgcgtaactac 33 genome
gtgatggtgatgtacctgggccgctgcgtggag 33 genome
gcgagcaatttctcccccagcaggctgcctaca 33 genome
gtgatggtgatgtacctgggccgctgcgtggag 33 genome
gcagaattcgcaaatgcagcgaaagatttcggaa 34 genome
tacctttggcgacgccatgcgcgtgccggggaa 33 genome
ggcgaaatggttaccatgccgaaccatccgaac 33 genome
gccatatgtcgtctccgttacaccttttccaca 33 genome
ggaaatgttatgcataaggagcagtagagtatt 33 genome
gtaaagaccggttggtggcagtactgaccggtct 34 genome
gtatccggtaagcggcagggtcggaacaggaga 33 pUC19
taaaccagccagccggaagggccgagcgcagaa 33 pUC19
gtaatcatggtcatagctgtttcctgtgtggaa 33 pUC19
ccagtagttatccccctccatcaggcagtttcc 33 genome
gaaaatactgtggaagaaagcgacgagaaagcc 33 genome
gtcacgaccagagtgcgggggcgggcacaaccac 34 genome
ggcacgtatatcgttctgcccgcgaagccattt 33 genome
gataccggtgcaccttcaagtaattcattcacg 33 genome
gattacacagggttgaaagaacacgacgtcatt 33 genome
gcgaccatcgctgctgatcacgcgacgaagcaa 33 genome
ggcggcagttttaacagccactacggtgtcaaa 33 genome
ggagaaattgaccatgcgccgatgctgtggattg 34 genome
caaaaatttttaatggagggaaggagcgatttg 33 genome
ggctgttttatcgctttgttgctgctggagttg 33 genome
gcctcgcgccgaacgcggaaatcggcaagatca 33 genome
gtcgctgtcgttctcaaaatcggtggagctgca 33 genome
ggtaaccatctgcgtggtgccagcgcggttatc 33 genome
gaagagcggagcagtcgggaatggtcgctgtaa 33 genome
Base strain: 5-day induction
Base strain w/pUC19: 5-day induction
ggaaccgtaaaaaggccgcgttgctggcgtttt 33 pUC19
caaatatgtatccgctcatgagacaataa 29 pUC19
gcatttatcagggttattgtctcatgagcggat 33 pUC19
gcagcgaattttgctcaccaatgcgcatccgcac 34 genome
gtagaacgtacttctcgccagccctgccgctttt 34 genome
gcgagcgcgaataaatattctgccagtaatcaa 33 genome
gccgatgatcatccactcaacgttgacgtagtt 33 genome
gtaaaactcgcacccgccgcagcatgttgtttg 33 genome
ggcgatcaggttggtatcgtcatcctgctcttc 33 genome
gaaacatcaggcgcaggagaaaaagaatggata 33 genome
gcaggatttcacgctgccatgattccagcagtg 33 genome
gacacattgttgagcgcgtgaacatcaccgtta 33 genome
gtaaaggccgtaaataaggagcgtcgctgatgt 33 genome
gtgagcagtaccgaatgcgtccattacgaacac 33 genome
ttattacgcgcaatctggctgttgaatgattac 33 genome
tcagatggcgctgggcgcaatgcgcgccattac 33 genome
tcgttcgagaatgcactcaccatatggattgct 33 genome
ctggcaatggcacctgaagaagcgcctgcaatg 33 genome
gctttttttcgccgccgcttttttcgccggcgc 33 genome
tcagatggcgctgggcgcaatgcgcgccattac 33 genome
tgatggtggttaacggcgggatataacatgagc 33 genome
ttgcaggatgcgctgagcgattaatagcccaac 33 genome
ttattgactaccggaagcagtgtgaccgtgtgc 33 genome
cttccaccacctgaccaatcttctcctgcgcgc 33 genome
caatgaaatccgggttactgtaaacgcggaaac 33 genome
cttccaccacctgaccaatcttctcctgcgcgc 33 genome
gggaatgtaattcagctccgccatcgccgcttc 33 genome
tcagccgctttgttggtggcatcttttgcacgc 33 genome
acgttcattccttttcgttttcggagcaaagac 33 genome
Base strain w/pUC19: 1-day induction
C1C2-C3-N: post infection spacers
Text 2.1 | Calculating array expansion rate in Figure 2.6C
Let $N_i$ be the number of cells with $i$ spacers in the population. $N_i$ changes with respect to time
according to:
\[ \frac{dN_0(t)}{dt} = N_0(t)\,(g - r_{a,0}) \]
\[ \frac{dN_i(t)}{dt} = N_i(t)\,(g - r_{a,i}) + N_{i-1}(t)\, r_{a,i-1} \]
where cells at all array lengths have the same growth rate constant $g$ and acquisition rate $r_a$.
Starting with an unexpanded cell population of $M_0$, solving this set of differential equations
gives:
\[ N_0(t) = M_0\, e^{(g - r_a)t} \]
\[ N_i(t) = \frac{1}{i!}\, M_0\, (r_a t)^i\, e^{(g - r_a)t} \]
The fraction of cells at array length $i$ within the population is therefore:
\[ F_i(t) = \frac{N_i(t)}{\sum_{j=0}^{\infty} N_j(t)} = \frac{\frac{1}{i!}\, M_0\, (r_a t)^i\, e^{(g - r_a)t}}{\sum_{j=0}^{\infty} \frac{1}{j!}\, M_0\, (r_a t)^j\, e^{(g - r_a)t}} \]
Cancelling $M_0$ and $e^{(g - r_a)t}$, the fraction simplifies to:
\[ F_i(t) = \frac{\frac{1}{i!}\, (r_a t)^i}{\sum_{j=0}^{\infty} \frac{1}{j!}\, (r_a t)^j} \]
showing that under the assumption where cells of different array lengths all grow at the same rate
$g$, the fractions of cells of each array length in the population over time depend only on the
acquisition rate. Furthermore, using the +0 fraction and making a substitution in the
denominator:
\[ F_0(t) = e^{-r_a t} \]
it can be shown that the acquisition rate $r_a$ equals the exponential decay rate of $F_0(t)$.
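The relationship above can be illustrated with a minimal Python sketch. It assumes a constant acquisition rate, under which the array-length fractions follow a Poisson distribution with mean equal to the acquisition rate multiplied by time, so the rate can be recovered from the slope of the negative log of the unexpanded (+0) fraction versus time. The function names, time points, and rate value are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
import math


def array_length_fractions(r_a, t, max_len=50):
    """Fractions of cells with i = 0..max_len new spacers.

    Under equal growth at all array lengths, the fractions are Poisson
    distributed with mean r_a * t (truncated here at max_len).
    """
    mean = r_a * t
    return [mean ** i * math.exp(-mean) / math.factorial(i)
            for i in range(max_len + 1)]


def estimate_acquisition_rate(times, unexpanded_fractions):
    """Least-squares slope (through the origin) of -ln(F_0) versus time."""
    num = sum(t * -math.log(f) for t, f in zip(times, unexpanded_fractions))
    den = sum(t * t for t in times)
    return num / den


# Hypothetical values for illustration only (rate per hour, times in hours).
r_true = 0.02
times = [24, 48, 72, 96, 120]
f0 = [array_length_fractions(r_true, t)[0] for t in times]
r_est = estimate_acquisition_rate(times, f0)
print(f"estimated acquisition rate: {r_est:.4f} per hour")
```

With noise-free synthetic fractions the estimate simply returns the input rate; with measured +0 fractions the same fit would give an estimate whose quality depends on how well the equal-growth assumption holds.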
Text 2.2 | Modeling array expansion with variable growth in the two-array strain
In the case of the strain containing both the native array and mini array, the number of cells with
a given array length at a given time is determined similarly to equation (Eq. 3) of the main body.
Shown below is the system of equations representing the number of cells with each native array
length.
\[ \frac{dN_0^{nat}}{dt} = g_0^{nat} N_0^{nat} - r^{nat} N_0^{nat} \]
\[ \frac{dN_i^{nat}}{dt} = (g_i^{nat} - r^{nat})\, N_i^{nat} + r^{nat} N_{i-1}^{nat} \]
The key difference arises in the determination of the growth rates associated with each array length.
In the main body, a variable growth rate was described using the form $g_i = g_0 \alpha^i$. Because each
cell contains both arrays, cells with any given native array length contain a representative
population of mini array lengths, and vice versa. Accordingly, growth at any given native or mini
array length is determined via a weighted sum of the growth rates of the other array. Shown below
for unexpanded native arrays,
\[ g_0^{nat} = g_0 \left( F_0^{mini} + \alpha F_1^{mini} + \alpha^2 F_2^{mini} + \cdots \right) \]
where
\[ F_j^{mini} = \frac{N_j^{mini}}{\sum_j N_j^{mini}} \]
Generalizing to any native array length yields the following expression:
\[ g_i^{nat} = g_i \left( \sum_j F_j^{mini} \alpha^j \right) \]
Plugging in our earlier definition of $g_i$, we can rewrite the previous expression as:
\[ g_i^{nat} = g_0 \left( \sum_j F_j^{mini} \alpha^{i+j} \right) \]
This representation also describes the growth rates for each mini array length population by
swapping the tags ‘mini’ and ‘nat’. These coupled equations, assuming a single $g_0$ and $\alpha$, were
used to determine the expansion rates presented in Fig. 2.8D of the main body.
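A rough numerical sketch of the coupled two-array model described above is given here in Python. It is not the fitting code used for Fig. 2.8D: the forward-Euler integration, the truncation to four array lengths, and every parameter value are assumptions made only to show how the weighted growth rates feed into the rate equations.

```python
def normalize(counts):
    """Convert cell counts per array length into population fractions."""
    total = sum(counts)
    return [c / total for c in counts]


def effective_growth_rates(g0, alpha, other_array_counts, n_lengths):
    """Growth rate at each array length i, weighted over the other array.

    Implements g_i = g0 * sum_j F_j * alpha**(i + j), where F_j are the
    array-length fractions of the other array in the same population.
    """
    fractions = normalize(other_array_counts)
    return [g0 * sum(f * alpha ** (i + j) for j, f in enumerate(fractions))
            for i in range(n_lengths)]


def euler_step(counts, growth, r, dt):
    """One forward-Euler step of dN_i/dt = (g_i - r) N_i + r N_{i-1}.

    The longest tracked length still loses cells at rate r; those cells
    leave the truncated system (an approximation of this sketch).
    """
    new_counts = []
    for i, n in enumerate(counts):
        inflow = r * counts[i - 1] if i > 0 else 0.0
        new_counts.append(n + dt * ((growth[i] - r) * n + inflow))
    return new_counts


# Hypothetical parameters for illustration only.
g0, alpha = 1.0, 0.9          # base growth rate and per-spacer growth factor
r_nat, r_mini = 0.05, 0.02    # acquisition rates of the two arrays
dt, steps = 0.01, 100
nat = [1.0, 0.0, 0.0, 0.0]    # all cells start with unexpanded arrays
mini = [1.0, 0.0, 0.0, 0.0]

for _ in range(steps):
    g_nat = effective_growth_rates(g0, alpha, mini, len(nat))
    g_mini = effective_growth_rates(g0, alpha, nat, len(mini))
    nat = euler_step(nat, g_nat, r_nat, dt)
    mini = euler_step(mini, g_mini, r_mini, dt)

print("native array fractions:", [round(f, 4) for f in normalize(nat)])
print("mini array fractions:  ", [round(f, 4) for f in normalize(mini)])
```

In a fitting context, the single base growth rate and growth factor would be shared by both arrays, as stated above, while the two acquisition rates are adjusted to match the measured array-length fractions.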
Chapter 3
CRISPR-Cas interference: Genome Streamlining
3.1 Introduction
Genome streamlining is the process of significantly reducing the genome size for a host
organism by eliminating non-essential genomic DNA. This process can improve the performance
of bacterial strains for biotechnological applications. Growth productivity, energy efficiency, and
biosynthetic production yields are among properties that can be improved through bacterial
genome streamlining. Whole genome synthesis and template-directed repair utilize bottom-up
and top-down approaches, respectively, representing two very different strategies for genome
streamlining. Here we describe an alternative top-down approach to streamlining based on the
production of sequential genomic deletions. In theory, this method would utilize CRISPR-Cas self-targeting
to generate random genomic deletions, accelerating the natural selection for eliminating
unnecessary DNA. This process first utilizes CRISPR adaptation in the absence of mobile genetic
elements (MGEs) to acquire self-genome derived spacers, with interference proteins subsequently
deployed to produce host-targeted genomic deletions. The Cas1-Cas2 integrase complex acquires
spacers, which ultimately guide CRISPR interference machinery to complementary sequences for
degradation. Although CRISPR-Cas systems generally function for adaptive defense, in the
absence of MGEs such as phage and plasmids the system can be co-opted to target the host genome
itself. We are currently developing this method in the model Type I-E CRISPR-Cas system of E.
coli. This approach aims to generate random, self-genome deletions through controlled expression
of CRISPR machinery, with sites targeted for deletion defined by array-integrated spacer
sequences.
The purpose of this study is to develop a system and set of tools capable of generating
genome-optimized and significantly genome-reduced E. coli chassis strains. In dynamic natural
environments bacteria benefit from genetic redundancy and plasticity. In more stable, resource rich
conditions these genes are often unnecessary and burdensome. Eliminating this DNA can cut
energy costs associated with genetic maintenance and reduce the potential for interference with
heterologous genes. Identifying combinations of deletions with either neutral or beneficial effects
may also shed light on context dependent genetic essentiality and provide alternative pathways for
genome reduction. Our CRISPR-adaptation method utilizes a top-down approach for iterative
genome reduction based on unbiased, random deletions that may lead to improved microbial
chassis and inform chassis design. The goal of this research is to enable a distinctly new capability
that facilitates microbial genome reduction. The concept of generating random genomic deletions
is simple, but development of an effective and efficient method is challenging. This method would
redirect CRISPR adaptive immunity to produce self-genome deletions in the host bacterial cells.
This process may be utilized to generate deletion libraries in E. coli populations that can then be
selected upon. Evolving these random-deletion populations may identify new combinations of
beneficial deletions while eliminating unnecessary DNA.
CRISPR arrays contain spacers that define targets for CRISPR effector proteins. In the case
of E. coli Type I-E, these effectors comprise a Cascade complex and the Cas3 helicase-nuclease.
The Cascade processes precursor crRNA, producing mature crRNA that then guides the complex
to target sites for subsequent degradation, mediated by Cas3. In the absence of cas3, Cascade alone
can bind to a target protospacer and repress expression (Luo et al., 2015). Formation of an R-loop
after Cascade-target binding creates a conformational change leading to the recruitment of Cas3.
This recruitment results in Cas3 mediated degradation of the target DNA (He et al., 2020; Jackson
et al., 2014). Alternatively, the widely utilized Type II-A CRISPR effector Cas9 of Streptococcus
pyogenes can alone (with a bound crRNA) perform the tasks of target site identification,
hybridization and degradation. Upon annealing to a target sequence Cas9 can generate a double
strand DNA break by itself. The main difference between these two nucleases (Cas3 and Cas9) is
the ability of Cas3 to reel in DNA through a helicase domain, allowing for processive degradation.
Cas9 contains two endonucleolytic domains, generating a blunt-end double strand DNA break.
Cas3 can generate a nick (single strand DNA break), with a separate Cas3 likely producing a nick
on the opposite strand, with both strands subsequently being degraded. If target site damage by either Cas3
or Cas9 is not repaired by the host or heterologous machinery, it leads to cell death. Successful
repair back to the original sequence can occur via the native homology directed repair (HDR)
pathway. A second native self-repair pathway for both Cas3 and Cas9 target degradation is called
alternative end joining (Wimmer and Beisel, 2020). This pathway utilizes strand resection via
RecBCD, which searches for microhomologies on either side of the sequence break. Once short
micro-homologous sequences are found the break is repaired by LigA, leading to deletion of the
intervening region between the two microhomologies. Alternative end joining is a useful strategy
for bacterial species such as E. coli which are deficient in robust end joining pathways such as
non-homologous end joining (NHEJ). However, some heterologous NHEJ pathways are functional
in E. coli enabling a third option for successfully repairing DNA damage.
The goal of this study was to develop two inducible switches, one for CRISPR adaptation
and the other for CRISPR interference. This would allow us to independently control the
acquisition of target sequences and the utilization of the processed crRNAs for target degradation.
The aim is to explore the potential for establishing a systematic approach to generating random
genomic deletions within the E. coli genome that may be ported to other species. The first step in
this process is enabling controllable acquisition of exclusively self-genome derived spacers. This
was achieved as described earlier through the integration of a T7lac-cas1-cas2 operon into the E.
coli genome, allowing for spacer acquisition in the absence of mobile genetic elements (MGE).
The next step is to utilize the expressed/processed precursor crRNAs via a second inducible operon
also genome integrated. We sought to quantify the lethality of this process and characterize the
types of genome deletions generated. With a relatively low self-targeting survival rate the viability
of this method would rely on identifying options for enhancing the survivability of self-targeting
while maximizing the diversity of mutated clones within E. coli cultures. Another challenge is
minimizing the potential for CRISPR-Cas-defective mutants to arise and make up a much larger
proportion of the population than deletion mutants. Two distinct host platforms are being
developed in E. coli BL21-AI using either the native Type I-E system or the S. pyogenes native
Type II-A system. Characterization of both hosts was performed using programmed crRNA for
specific targets to learn more about the survivability of self-targeting and the types of deletions
generated within those survivors. These experiments were carried out both with and without a
heterologous non-homologous end joining (NHEJ) pathway.
3.2 Materials and Methods
3.2.1 Plasmid construction and strain engineering
In our Escherichia coli self-genome targeting studies, we generated strains enabled with
either E. coli native cascade-cas3 or S. pyogenes native cas9 targeting machinery. For both systems
we assembled rhamnose inducible constructs allowing for controlled expression of the effector
proteins. Initially, these constructs were assembled in plasmids prior to being integrated, via the
pSPIN-INTEGRATE protocol, into the respective host genomes. The backbone plasmid used for
construction, containing the rhamnose-inducible system, was pCas3cRh (Addgene: 133773). The
stock version of this plasmid contains Type I-C cascade-cas3 from Pseudomonas aeruginosa
controlled by the rhaB rhamnose inducible promoter. For the E. coli Type I-E system, cascade and
cas3 were individually amplified from MG1655 and assembled with pCas3cRh backbone via
Gibson Assembly, allowing for inducible expression of the Type I-E interference genes. cascade
was first cloned into the plasmid without cas3 to enable a version of targeting where Cascade binds
and blocks transcription from protospacer sites (target repression) without affecting the DNA
sequence. A second version, with cas3 subcloned directly downstream of cascade, allowed for
expression of all interference proteins from the same inducible operon (target degradation). Both
plasmid-based operons were PCR amplified and subcloned into the cargo region of a pSPIN
plasmid to facilitate E. coli genome integration. The base recording strain containing inducible
cas1-cas2 was transformed with the pSPIN plasmid followed by cargo integration and plasmid
curing. This enabled two different applications, one for self-repression (Cascade alone) and the
other for self-deletion (Cascade-Cas3). Utilization of programmed spacers enabled targeted
modifications. For strains where Cas1-Cas2 acquired spacers were used in conjunction with
interference machinery, target sites were random.
Streptococcus pyogenes Type II-A CRISPR interference requires just the Cas9 effector
protein in addition to crRNA. We developed an inducible system in E. coli BL21-AI. cas9 was
amplified from the pCas plasmid (Addgene: 62225), and assembled into pCas3cRh (Gibson
Assembly), replacing cascade-cas3, and enabling rhamnose-inducible Cas9 expression. This
operon was subsequently integrated into the genome of E. coli BL21-AI using the pSPIN-INTEGRATE
method. The site of integration is an intergenic region known to allow for high levels
of heterologous gene expression (Park et al., 2020). Several other components native to S.
pyogenes were codon optimized and integrated into the genome of this Cas9 expressing host. These
additional components were intended to enable heterologous Type II-A CRISPR adaptation in E.
coli. The following Type II-A components native to S. pyogenes were genome-integrated: cas1,
cas2, csn2, tracrRNA, CRISPR array and rnaseIII. The RnaseIII is involved in processing
precursor crRNA prior to array integration. The three CRISPR adaptation genes were synthesized
by Twist Bioscience and assembled into a plasmid with a T7lac promoter to allow for tight control
of expression. The CRISPR array was also subcloned into this plasmid following the adaptation
operon. This array contained the constitutive, native S. pyogenes promoter within the upstream
leader sequence. The array was amplified from plasmid pAW091 (Addgene: 113253). It contained
five conserved S. pyogenes spacers. After subcloning these Type II-A CRISPR adaptation
components into pSPIN, they were integrated into the genome of the BL21-AI/RhaB-cas9 strain.
A third construct was also synthesized by Twist Bioscience. This construct contained
rnaseIII under the control of a tetR/tetA promoter, allowing for anhydrotetracycline (ATc) inducible
expression. It also contained the guideRNA scaffold tracrRNA, constitutively expressed, as well
as the AmpR antibiotic resistance marker. This third construct was integrated into the host genome
already containing inducible cas9 and inducible Type II-A adaptation machinery. The three
constructs were each integrated into separate intergenic regions within the BL21-AI host strain
using pSPIN, curing the cargo vector after each integration. The cas9 construct was integrated
between genes adiA and adiY. The CRISPR adaptation construct was integrated between the genes
focA and GSU80_RS04535. The rnaseIII construct was integrated between the genes glmS and
pstS. These three constructs and locations of genomic integrations are shown in Figure 3.1.
NHEJ genes ku and ligD of Mycobacterium smegmatis were designed as part of a genetic
construct synthesized by Twist Bioscience. The sequence was codon optimized for E. coli
expression and synthesized into two partially overlapping fragments. The two fragments were
sequentially subcloned into a plasmid containing a promoter allowing for weak constitutive
expression. Following construction, the plasmid was restriction enzyme digested and sequenced
to confirm proper assembly. This plasmid was transformed into several strains to enable expression
of Ku and LigD. The NHEJ construct, including its promoter, was subcloned from the constructed
plasmid into a pSPIN plasmid to integrate just the promoter and ku-ligD construct. Genome
integration was useful for this project to eliminate plasmids in our host strains, as they contribute
spacers during CRISPR adaptation. Acquiring exclusively host-derived spacers requires the absence of
MGEs.
Figure 3.1 | S. pyogenes Type II-A CRISPR adaptation machinery in E. coli. (A) Three S.
pyogenes DNA constructs were individually integrated into the genome of E. coli BL21-AI. 1.
Rhamnose-inducible cas9. 2. IPTG/arabinose-inducible adaptation machinery and CRISPR array.
3. aTc-inducible rnaseIII and constitutive tracrRNA. (B) Gel images confirming successful
genome integrations using primers flanking respective integration sites. From left to right shows
amplicons from constructs 1, 2 and 3, respectively. (C) E. coli BL21-AI loci utilized for the three
construct integrations. Site numbers correspond to construct numbers 1-3. The pSPIN cargo ends
are recognized by a transposase of V. cholerae with guided integration assisted by TniQ-Cascade
of the INTEGRATE system.
3.2.2 E. coli Type I-E programmed self-targeting assay
Prior to utilizing random self-targeting spacers acquired through Cas1-Cas2 expression we
sought to validate the functionality of the CRISPR interference machinery. This was done using a
plasmid that constitutively expresses a programmed crRNA. To program a crRNA of interest,
duplex DNA oligos encoding the spacer can be ligated into a KpnI and XhoI double-digested
pcrRNA plasmid (Addgene: 61285) using T4 DNA ligase. In this case we
programmed in a crRNA targeting host gene lacZ. Two ssDNA oligos (65bp and 73bp) synthesized
by IDT were annealed forming an array repeat and spacer targeting the promoter region of lacZ.
Sanger sequencing was employed to confirm a correctly assembled plasmid. This allowed us to
perform X-gal experiments and determine the frequency of clones produced with a modified
(partially or fully deleted) lacZ gene. It would also allow us to determine the size range of deletions
generated in modified clones. In X-gal experiments the lacZ product, β-galactosidase, cleaves
X-gal to produce blue colored colonies. When β-galactosidase is not expressed the
colonies are white, indicating either downregulation of the gene or gene sequence modification.
To ensure lacZ expression, cells were treated with IPTG to prevent any repression of lacZ by the
lacI repressor. To set up the experiment, 30µL overnight culture was inoculated into 3mL fresh LB
media containing spectinomycin (50µg/mL) and rhamnose (0.1%). Cultures were incubated at
37°C for five hours to allow lacZ-targeting survivors to proliferate. Samples from these cultures
were then diluted and plated onto LB-agar containing spectinomycin, IPTG and X-gal. Some of the
resulting colonies (both blue and white) were examined at the lacZ locus.
3.2.3 S. pyogenes Type II-A programmed self-targeting assay
Characterization of self-targeted DNA sequence modifications was performed via PCR in
our E. coli BL21-AI strain containing the rhamnose-inducible cas9 construct within the genome.
An individual host-genome targeting spacer can be programmed into the pTargetF plasmid that
constitutively expresses the corresponding sgRNA. A single target is programmed in via inverse
PCR, with each primer carrying half of the new spacer-repeat on its 5’ tail. After the PCR, blunt-end
ligation is performed with T4 DNA ligase, and the product is transformed into E. coli DH5α and plated
on antibiotic-selective agar. Colonies are individually scaled up in LB, subsequently plasmid
purified and sequenced across the ligation site using Sanger sequencing to identify clonal plasmids
containing the spacer. We utilized this process to introduce a lacZ targeting spacer, with the
resulting plasmid transformed into the rhaB-cas9 enabled E. coli strain. The self-targeting protocol
included rhamnose induction and ~7h of growth in LB media within a 37°C shaker incubator. The
culture was then diluted and plated onto solid agar for blue-white colony screening and
characterization of deletion frequencies and sizes.
3.2.4 Tiling PCR to assess the size of self-targeting deletions
To evaluate the types of deletions generated from lacZ self-targeting experiments we used
a process called tiling PCR. For short deletions, a PCR spanning the target site can directly
reveal the deletion size. For larger deletions however, PCR primers close to the target site may not
be sufficient as the genome annealing sites may have been part of the deletion. Deletions generated
by Cascade-Cas3 targeting and A-EJ repair are generally large, often exceeding 20 kbp, with their size determined
by the presence of microhomologies on either side of the target. The tiling PCR process consists
of designing short PCRs (<1kbp) at increasing distances from the crRNA target site. If a PCR does
not produce a product at distance X from the target site, a new PCR is run at distance 2X from the
target, with this process continued until the approximate locations of deletion borders are
identified. This process reveals the extent of sequence loss on each side of the protospacer,
approximating the full deletion size, characterized as either unidirectional or bidirectional. This
process was utilized to delineate lacZ deletions within clones from cultures targeted by Cascade-Cas3.
LB-agar plates containing antibiotics, IPTG and X-gal were plated with cells that had
undergone Cascade-Cas3 expression in the presence of a lacZ-complementary crRNA. Colonies
grown on these agar plates were either blue (intact lacZ) or white (disrupted lacZ). White colonies
were individually assessed using tiling PCR with primer pairs starting near the target site and
expanding outwards.
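The outward-doubling search described in this section can be sketched in Python as follows. This is only an illustration of the search logic for one side of the protospacer: the callback `has_product`, the starting distance, and the maximum distance are hypothetical stand-ins for running a short PCR at a given distance from the target site and scoring whether a band appears.

```python
def find_deletion_border(has_product, start_distance, max_distance):
    """Walk outward from the target site, doubling the distance each time.

    has_product(d) is a hypothetical callback that returns True when the
    short PCR placed d bp from the target site yields an amplicon.

    Returns (last_missing, first_present): the deletion border lies
    between these two distances. last_missing is 0 when the first PCR
    already yields a product; first_present is None when no product is
    found up to max_distance (border not defined).
    """
    last_missing = 0
    distance = start_distance
    while distance <= max_distance:
        if has_product(distance):
            return last_missing, distance
        last_missing = distance
        distance *= 2
    return last_missing, None


# Simulated clone whose deletion extends 12,000 bp on one side of the target.
simulated_border = 12_000
result = find_deletion_border(lambda d: d > simulated_border, 1_000, 64_000)
print(result)  # (8000, 16000)
```

Running the same search independently on each side of the protospacer gives an interval for each border, from which a deletion can be classified as unidirectional or bidirectional and its approximate size bounded. Additional PCRs inside the returned interval would narrow the border further.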
3.2.5 OD600 growth measurements
Growth curves for two different strains were measured with and without rhamnose-induced
expression of Cascade-Cas3 interference machinery. The two strains, C1C2-C3 and C1C2-C3-N,
also contained a plasmid constitutively expressing a lacZ-targeting crRNA. Overnight cultures
were scaled up from cryo-stocks with spectinomycin (50µg/mL) and chloramphenicol (15µg/mL).
The following day each was passaged 1:100 into two separate tubes containing fresh LB and
antibiotics, one with rhamnose (0.1%) and the other without rhamnose. Replicates for each of these
cultures were randomly distributed across a clear, flat bottom 96-well plate. Each replicate well
contained 200µL of newly inoculated culture. All experimental wells were surrounded by other
experimental wells or wells with water to minimize evaporation during the growth time course.
The plate was incubated at 37°C in a plate reader (TECAN Infinite 200 PRO) with shaking and
OD600 readings every 10 minutes over a 10-hour period. After the experiment OD600 densities
were charted to assess the differences between the two strains with and without cascade-cas3
induction.
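One simple way to compare such growth curves, sketched here in Python, is to record the time at which each culture first reaches a chosen OD600 threshold. The threshold and the readings below are hypothetical values used only for illustration, and this summary statistic is an assumption of the sketch rather than the analysis performed in this work; only the 10-minute reading interval comes from the protocol above.

```python
def time_to_threshold(od_readings, threshold, interval_min=10):
    """Return the time (minutes) of the first reading at or above threshold.

    od_readings is a list of OD600 values taken every interval_min minutes,
    starting at time zero. Returns None if the threshold is never reached.
    """
    for index, od in enumerate(od_readings):
        if od >= threshold:
            return index * interval_min
    return None


# Hypothetical readings for two induced cultures (values for illustration).
culture_a = [0.05, 0.06, 0.08, 0.12, 0.20, 0.35]
culture_b = [0.05, 0.05, 0.06, 0.07, 0.09, 0.12]
print(time_to_threshold(culture_a, 0.2))  # 40
print(time_to_threshold(culture_b, 0.2))  # None
```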
3.3 Results
3.3.1 E. coli Type I-E programmed self-targeting
E. coli Type I-E CRISPR interference machinery was genome-integrated into the base
recording strain containing T7-lac inducible cas1-cas2. Interference machinery comprising
cascade and cas3 was first subcloned into a plasmid with a rhamnose-inducible promoter
controlling the expression of all genes and subsequently genome-integrated. We sought to
characterize the size of genomic deletions generated from E. coli Cascade-Cas3 activity by coexpressing a lacZ-targeting crRNA. The pcrRNA plasmid constitutively expressing this crRNA
was transformed into the adaptation/interference enabled E. coli BL21-AI strain. A second strain
was generated from the first, with the addition of another genome-integrated construct, this one
expressing non-homologous end joining (NHEJ) genes ku and ligD of Mycobacterium smegmatis.
To compare the growth of each strain, cultures were monitored by tracking OD600 over ten
hours, as described in the previous section. Cultures with and without cascade-cas3 induction were
analyzed. Uninduced cultures produced essentially the same growth curve for both strains, whereas
the NHEJ expressing strain recovered significantly earlier when Cascade-Cas3 targeting lacZ was
induced (Figure 3.2A). The two strains were then assessed for blue/white colony ratios after
growing 3mL induced cultures for 7 hours and plating on solid agar containing IPTG and X-gal. The
strain containing NHEJ produced more colonies in total, with a higher proportion being white,
indicating a greater probability of DNA damage repair in lacZ-targeted cells when Ku and LigD
are expressed (Figure 3.2B). White colonies on both plates also confirm the functionality of
cascade and cas3 upon induction. Without robust dsDNA break repair pathways, E. coli cells
without heterologous NHEJ have a low probability of surviving Cascade-Cas3 self-targeting. The
blue colonies appearing on the X-gal plates are either cells repaired via HR during exponential
growth or CRISPR-Cas mutants. A strong selective pressure favoring CRISPR-Cas loss of function
mutations is exerted when Cascade-Cas3 along with a host-targeting crRNA are expressed. We did
not sequence any of these blue colonies to identify their genotypes. There are several distinct
regions where natural mutations (SNPs or indels) can render CRISPR-Cas non-functional,
including: cascade-cas3, rhamnose promoter components, lacZ-target protospacer, lacZ-target
PAM or the crRNA spacer sequence/promoter. The crRNA expression construct is an unlikely loss-of-function
source however, as host cells contain multiple copies of this plasmid.
Figure 3.2 | Heterologous NHEJ expression increases the survivability of Cascade-Cas3 self-targeting.
(A) Tracking OD600 to evaluate growth in Cascade-Cas3 competent strains with and
without heterologous NHEJ expression. For each strain cultures were both induced and not
induced for Cascade-Cas3 expression. (B) The non-NHEJ strain (C1C2-C3) and NHEJ strain
(C1C2-C3-N), induced for Cascade-Cas3 expression were incubated in liquid cultures for 7 hours
prior to diluting and plating samples on LB-agar containing antibiotics, IPTG and X-gal. Blue
colonies: + β-gal, white colonies: - β-gal.
To further characterize the lacZ loci from isolated white clones, tiling PCR was employed
to estimate the size of target site deletions. From a 7-hour cascade-cas3 induction time course,
cells were isolated on X-gal agar plates from both the NHEJ and non-NHEJ cultures. Ten white
colonies from C1C2-C3 and 12 from C1C2-C3-N (strain w/NHEJ) were examined. An initial PCR
was performed using primers annealing ~5kb on either side of the protospacer. None of the clones
from C1C2-C3 parental produced an amplicon, whereas several of the clones from C1C2-C3-N
produced a large PCR product close to 10kb in size (Figure 3.3AB). For each clone not producing
a PCR product from the first reaction, short PCR reactions were performed at increasing distances
from the lacZ protospacer, on both sides, to approximate the size and directionality of the deletion.
From the strain not containing NHEJ all ten clones were found to have bidirectional deletions with
the shortest being ~17.5kb, the largest being more than 43kb and the average being no less than
26kbp (Figure 3.3C). Repair of these cleaved/degraded DNA sequences is performed by the native
alternative end-joining (A-EJ) pathway utilizing microhomology mediated repair. Deletions across
clones from the strain with NHEJ varied more significantly. Five of the 12 clones contained
unidirectional deletions from the protospacer, ranging in size from 0.7-2.9kbp. These deletions are
much smaller than those of the non-NHEJ clones and likely the result of Ku and LigD mediated
NHEJ processing. However, some of the clones from this strain contained deletions as large as
34kb. Producing both large and small deletions, the NHEJ-enabled strain can likely utilize either
A-EJ or NHEJ to repair broken and partially degraded DNA ends. The average deletion size across
clones derived from C1C2-C3-N was ~16.5kb. Heterologous, bacterial NHEJ repair of Cas9-
mediated dsDNA breaks in E. coli has been shown to produce smaller deletions on average
compared to the non-NHEJ parent (Zheng et al., 2017). Our study is the first to demonstrate the
repair efficacy of heterologous NHEJ for Cascade-Cas3 mediated DNA damage in E. coli.
The Cascade-Cas3 mechanism for target degradation in E. coli is thought to produce
unidirectional deletions as Cas3 nicks, “reels” in and degrades ssDNA from one side of the
protospacer before dissociating from Cascade and translocating along that same side (Redding et
al., 2015). In vitro studies characterizing this process found that long-range unidirectional deletions
were generated upstream of protospacer target sites (Mulepati et al., 2013; Yoshimi et al., 2022).
In vivo work incorporating targeted Cascade-Cas3 deletions in human cells also found that
deletions were unidirectional (Dolan et al., 2019; Morisaka et al., 2019). However, human cells
contain a robust NHEJ pathway to repair double strand breaks in DNA (Chang et al., 2017). In
our interference enabled strain without NHEJ, we detect only bidirectional deletions (10-clone
sample size) resulting from self-genome targeting. It has been shown that Type I-C Cascade-Cas3
from Pseudomonas aeruginosa also generates bidirectional deletions in E. coli (Csörgő et al.,
2020). Our findings suggest that when only native E. coli repair machinery is present, bidirectional
deletions result from Cascade-Cas3 target processing. When NHEJ is present however, from either
heterologous expression in E. coli or native expression in human cells, unidirectional deletions
result from this type of end-joining repair.
Figure 3.3 | NHEJ expression repairs Cas3-damaged DNA producing smaller deletions. (A)
lacZ deletion clones isolated from strain C1C2-C3-N and PCR amplified with flanking primers
each ~5kb away from the targeting protospacer (~10kb parental amplicon). Each column
represents a clone (3 clones produced an amplicon). (B) The same as in panel A except these lacZ-disrupted
clones are from strain C1C2-C3 (none produced an amplicon). (C) 12 white clones from
the strain w/NHEJ (black) and 10 from the NHEJ-free strain (brown) were sequence-assessed near
the protospacer site targeted by a crRNA for Cascade-Cas3 degradation. Inward-facing arrows
indicate tiling-PCR primer pairs used to detect intact or deleted sequence at different distances
from the protospacer, on both sides. The vertical red line represents the protospacer site targeted
by CRISPR-Cas interference machinery Cascade and Cas3. The left deletion borders for C1C2-C3
clones 8-10 were not defined as the leftmost tiling-PCRs did not produce a product.
3.3.2 E. coli Type I-E CRISPR autoimmunity
With Type I-E CRISPR adaptation and interference functions each enabled on separate,
inducible operons we sought to use them together to explore the potential for expanding population
genetic diversity. By expressing Cas1-Cas2 in our strains containing no MGEs, spacers derived
from thousands of regions throughout the genome can potentially be integrated into the CRISPR
array, enabling exclusively self-genome targeting. Inducing cascade and cas3 then leads to the
production of crRNAs that target and potentially degrade DNA in regions adjacent to protospacer
sites. If the frequency of surviving self-genome targeting is high enough, the resulting population
may have significant cellular fitness variance to select upon. A unique attribute of CRISPR-array-based
targeting is its inherent multiplexing capability. Other methods for generating population-wide
fitness variance (e.g. sgRNA libraries with dCas9; transposon mutagenesis) produce a wide
range of affected cells each with one or maybe two genomic regions affected. The CRISPR array
can continually integrate new spacers, proliferating single-cell genomic target sites, and expanding
the option space for fitness gain. The more modifications per cell the greater the potential for
pleiotropic effects that cannot arise in single-site modified cells.
We have tried several induction strategies including the expression of
adaptation/interference modules simultaneously as well as sequentially, first acquiring spacers,
then turning off cas1-cas2 before inducing cascade-cas3 for spacer utilization. Unfortunately, we have
not confirmed the ability to generate arbitrary deletions using this approach. Expanded-array
clones were isolated after Cas1-Cas2 expression and subsequent survival of cascade-cas3
induction, with both the spacers and protospacers sequenced. We found that the target sites were
still fully intact. There are two logical explanations for these results. The first is the low survival
rate for CRISPR-Cas self-targeting in E. coli even with heterologous NHEJ expression. The
population impact of self-targeting ultimately produces a strong selective pressure favoring the
survival of cells with a modified CRISPR system. These survivors may be incapable of generating
deletions but still possibly capable of acquiring spacers. The second explanation has to do with
expression and processing of the CRISPR array itself. We did not modify the array, leader sequence
or promoter from the native E. coli CRISPR locus for these experiments. We do think the array
can work properly as the CRISPR-Cas system is functional in the presence of bacteriophage
Lambda. Perhaps the cells can differentiate self-targeting from MGE-targeting crRNAs, interfering
to prevent the highly lethal potential of dsDNA breaks within the genome. Secondly, a regulator
of CRISPR array expression is the global regulator H-NS, responsible for repressing many genes
involved in E. coli stress responses. In the absence of a true infection, expression of the array may
be at least partially repressed via H-NS.
The host escape-mutation scenario may be overcome if all necessary CRISPR-Cas
machinery is contained in multi-copy plasmids, instead of relying on single-copy expression from
the genome. With each cell containing at least a few copies of every CRISPR-Cas component,
generating the sheer number of aligned mutations needed to eliminate CRISPR-Cas function may
be improbable. Plasmids do complicate a self-targeting procedure however as spacers would also
be derived from and therefore target the plasmids themselves, not just the host genome. Even so,
this machinery redundancy may offer a viable path for single-cell accumulation of host genome
deletions.
A modified version of this random self-targeting approach expresses Cascade alone without
Cas3, in populations producing exclusively self-targeting crRNAs. The E. coli Cascade:crRNA
ribonucleoprotein complex binds to the target protospacer, effectively repressing transcription of
the region (Luo et al., 2015). The extent of repression varies depending on the binding location
along the gene and whether the protospacer is on the template or non-template strand. Expressing
Cascade (without Cas3) after a period of spacer acquisition may generate significant
random-repression-induced fitness variance across a population that could then be selected upon. Through
crRNA-guided random repression, enhanced fitness clones may proliferate specific spacers in the
population. Subsequent rounds of spacer acquisition and Cascade expression may encourage
further enrichment, proliferating cells with combinations of beneficial repression targets within
the experimental context.
3.3.3 S. pyogenes Type II-A CRISPR-Cas in E. coli
Rhamnose-inducible interference gene cas9 was integrated into E. coli BL21-AI. With the
subsequent introduction of a plasmid constitutively expressing a cognate lacZ-targeting sgRNA
we confirmed a functional, inducible cas9 construct. The production of white colonies on solid
agar containing Xgal and IPTG followed lacZ targeting through a period of cas9 induction. The
interference machinery, CRISPR array, tracrRNA and S.py-rnaseIII were incorporated into the E.
coli genome of the strain with integrated cas9. However, we were not able to acquire spacers for
this Type II-A system in E. coli. Through several attempts over multiple days expressing all
integrated Type II-A machinery (all are necessary for spacer acquisition in this system), we did not
detect any spacer acquisition. Additional, as yet unidentified factors may be required for successful
spacer acquisition. Although this S. pyogenes CRISPR-adaptation process is
functional in the species Staphylococcus aureus, it does not appear to be functional in E. coli, even
with the same set of heterologous genetic components (Heler et al., 2015).
Chapter 4
Ongoing Research
4.1 CRISPR array contraction
Cas1-Cas2 acquired spacers are directly flanked by identical repeat sequences within
CRISPR arrays. During genome replication, slippage can occur whereby two adjacent repeats
partially anneal, leading to the duplication or deletion of the intervening spacer, depending on whether
annealing occurs on the replicating strand or the template strand, respectively. Frequencies of
spacer deletions/duplications are not well defined. Comparative genomics work has identified
differences in CRISPR array content for closely related strains with nearly identical array
repertoires. These differences indicate that spacer deletions can occur, generally in the middle of
CRISPR arrays (Lillestol et al., 2006; Held et al., 2010; Achigar et al., 2017). Spacers closer to the
center of an array are more likely to be deleted or duplicated as there are more possibilities for
flanking repeats on each side to anneal, increasing the probability of replication misalignment
relative to other spacers in the array. Additionally, the proportion of cells in a population with a
particular spacer is more likely to change if that spacer changes the fitness of the host when the
spacer is utilized during target interference (Jiang et al., 2013; Delaney et al., 2012). However,
spacers can also be deleted in the absence of a selective pressure (Horvath et al., 2008;
Gudbergsdottir et al., 2011). In Shipman et al., 2016, deep sequencing of an E. coli population
across the leader-proximal end of the array found that the first spacer (an existing parental spacer)
was deleted in about 0.096% of sequenced cells after about a day of cas1-cas2 induction.
In the CRISPR array dynamics study presented in chapter 2, we did not detect the presence
of contracted array PCR bands in our cultures after several days of cas1-cas2 induction. Our
population quantification method was limited by a minimum detection threshold of about 0.5% of
the population, meaning smaller populations may be present but undetectable. To further assess
array contraction, we adapted an existing method allowing us to use fluorescence to detect
array-contracted cells. The original method, developed by Amlinger et al., 2017, utilized a modified
CRISPR array containing a truncated leader sequence with an upstream yellow fluorescent protein
(YFP) gene. The modified array also contained a downstream promoter that constitutively
expresses back upstream, through the array, truncated leader and YFP gene. This technique takes
advantage of the fact that each acquired spacer generally integrates into the same location
(leader-repeat junction) and expands the array by 61 base pairs. The unexpanded initial array produces a
stop codon within the leader sequence, so the transcript is not translated beyond the leader and into
the YFP sequence, leaving all cells non-fluorescent. When a spacer is integrated, this expands the
array by 61 base pairs producing a frame shift, eliminating the leader-based stop codon and
allowing for expression of YFP. We designed a construct with the reverse function where the
expanded array cells are not fluorescent but the unexpanded array cells are (Figure 4.1A). This
allows us to monitor nonfluorescent expanded-array parental cells for the presence of fluorescent
clones, potentially indicating cells with a contracted array (Figure 4.1B). After design the
construct was synthesized by Twist Bioscience and subsequently integrated into the genome of the
base recording strain containing inducible cas1-cas2. Genome integration was performed using
the pSPIN guided transposition system (Vo et al., 2021).
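The frame logic behind this reporter can be checked with simple arithmetic. The sketch below is illustrative only: it assumes every acquisition adds exactly 61 bp (as stated above) and it ignores where stop codons actually fall in each reading frame, which depends on construct details not given here. The function names are hypothetical.

```python
SPACER_PLUS_REPEAT_BP = 61  # length added per acquired spacer (value from the text)

def frame_shift(net_spacers: int) -> int:
    """Reading-frame offset (0, 1 or 2) caused by a net gain or loss of spacers."""
    return (net_spacers * SPACER_PLUS_REPEAT_BP) % 3

def in_starting_frame(net_spacers: int) -> bool:
    """True when the downstream reporter is read in the same frame as the initial array."""
    return frame_shift(net_spacers) == 0

# Net change in spacer number relative to the starting array.
for n in range(0, 4):
    print(n, frame_shift(n), in_starting_frame(n))
```

Because 61 leaves a remainder of 1 when divided by 3, one or two net acquisitions shift the frame, while a net change of three returns it to the starting frame. Under these assumptions, a frame-based reporter alone would not separate an unexpanded array from a +3 array.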
CRISPR adaptation via Cas1-Cas2 expression was induced in the parental strain containing
the initially YFP-expressing construct (the single spacer within this initial array is part of the
construct design). This produced a population of cells, some of which were no longer fluorescent
(as identified through colony screening). An individual non-fluorescent clone was screened via
PCR across the leader-proximal end of the array and identified as having acquired a single new
spacer. This strain was cryopreserved and serves as our starting point strain for array-contraction
quantification experiments. This strain can be further modified to identify factors that change the
rates of array contraction. This may include modifying the repeat and/or the spacer sequences, as
well as changing the length of spacers, which can vary naturally (Table 5). We can also assess the
potential for contraction rate changes based on environmental conditions, comparing for instance
cultures in standard laboratory conditions versus cultures exposed to phage infection.
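When planning such quantification experiments, it helps to estimate how many cells or colonies must be examined to observe a rare contraction event. The sketch below is a back-of-the-envelope calculation, not part of the method described here. It assumes independent sampling and borrows the roughly 0.096% deletion frequency reported by Shipman et al., 2016 purely as an illustrative input. The function name is hypothetical.

```python
import math

def colonies_needed(frequency: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one event among n colonies) >= confidence,
    assuming each colony independently carries the event with probability `frequency`."""
    if not 0.0 < frequency < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))

# Illustrative input only: ~0.096% of cells carrying a deleted spacer.
print(colonies_needed(0.00096))
```

Under these assumptions, an event at a frequency near 0.1% requires examining on the order of a few thousand colonies, a scale at which a fluorescence readout is far more practical than colony PCR.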
Figure 4.1 | Fluorescent CRISPR array to assess array stability. (A) The native E. coli Type I-E
CRISPR array was modified by truncating the leader and maintaining only a single, benign
spacer. The construct was designed with a constitutive promoter downstream and an EYFP gene
upstream from the leader sequence. This initial construct was integrated into the E. coli genome
making cells fluorescent. (B) The array-contraction experiment utilizes +1 expanded clones that
carry a frame shift that brings a stop codon into frame, preventing EYFP expression. Contraction of the +1
clones to the original array length (single spacer) can produce fluorescent cells, allowing for the
quantification of spacer deletion rates.
4.2 Spacer associated fitness effects
A key finding from our CRISPR array-dynamics paper (chapter 2) was identifying the
relative fitness differences between expanded and unexpanded array populations, as seen in Figure
2.7E. We sought to further explore these observed relative fitness differences by characterizing
expanded-array clones to see if they were also outcompeted by unexpanded parental cells.
Specifically, we wanted to quantify fitness differences relative to parental by competing the strains
against each other. We could then determine if all +1 clones for instance present the same degree
of fitness reduction or if they show a distribution of fitness effects.
Array-expanded clones are generally isolated from cas1-cas2 induced cultures, identified
through leader-proximal array PCR. These clones and the unexpanded parental are each
individually scaled up overnight. After OD600 measurements and cellular concentration
normalization, equal volumes of both cultures are mixed and vortexed in a microfuge tube. A
sample of the mixture is immediately taken for a time zero PCR and quantification of relative band
proportions. A sample from the mixture is then used as inoculum, dispensed into 3mL LB media
for growth in a 37°C shaker incubator. Cultures are then sampled and passaged at 24h intervals for
further analysis of population proportion changes via PCR amplicon quantifications. This method
was used to isolate expanded array clones from the base recording strain containing pUC19. These
clones were then competed against unexpanded parental as seen in Fig. S2.3. Several expanded
clones were also isolated from the base strain with no plasmids at 32h after cas1-cas2 induction
and competed against unexpanded parental. We recently enabled a second method for competing
strains using the more traditional approach of counting colony forming units (CFU). By generating
a lacZ deletion mutant version of the base recording strain (genomic inducible cas1-cas2), we can
compete clonal pairs and track their relative proportions through blue-white colony counting.
These strains are also cultured in isolation to assess individual growth characteristics including
growth rate and carrying capacity. These monocultures are grown at 37°C on 96-well plates in a
plate reader that tracks OD600 through exponential and stationary phases.
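One common way to summarize a pairwise competition of this kind is the slope of the log ratio of the two strains over time. The sketch below is an assumption about how such data could be analyzed, not the analysis used in this work. The proportions are made-up placeholder values and the function names are hypothetical.

```python
import math

def log_ratio(p_expanded: float) -> float:
    """Natural log of the expanded:unexpanded ratio, from the expanded-array proportion."""
    return math.log(p_expanded / (1.0 - p_expanded))

def selection_rate(times_days, proportions):
    """Least-squares slope of ln(expanded/unexpanded) versus time (per day).
    Negative values mean the expanded clone is being outcompeted."""
    ys = [log_ratio(p) for p in proportions]
    n = len(times_days)
    mean_t = sum(times_days) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_days, ys))
    return sxy / sxx

# Placeholder proportions of the expanded clone at 0, 24, 48 and 72 h (not real data).
times = [0, 1, 2, 3]
proportions = [0.50, 0.42, 0.33, 0.27]
print(round(selection_rate(times, proportions), 3))
```

The same function applies whether the proportions come from PCR band quantification or from blue-white colony counts, since only the ratio of the two strains enters the calculation.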
Described in section 4.2.1 is a method to recapitulate expanded array clones from parental
using a directed spacer integration process. If an array expanded clone isolated from a cas1-cas2
induced culture has a fitness difference (positive or negative) relative to parental, we can use
directed integration to insert the spacer into the parental strain via electroporation and test if the
fitness effect is reproduced (Figure 4.2). A working hypothesis currently being explored postulates
that spacer orientation within the array and the resulting precursor crRNA expressed can produce
fitness effects. Spacers can be derived from coding regions within genes and depending on the
orientation within an array, the spacer can produce a sequence (through array expression) either
identical or complementary to the RNA transcript (Figure 4.3A). In effect, this could allow for a
pre-crRNA to contain a host-transcript complementary sequence potentially capable of annealing as
a posttranscriptional regulator. The spacer sequences are 33 base pairs allowing the pre-crRNA to
have at least 33 nucleotides of target complementarity. Small regulatory RNAs (sRNAs) are
generally 50 to 500 nucleotides long (Papenfort et al., 2023) and in E. coli often associate
with chaperone proteins such as Hfq to enable target silencing (Santiago-Frangos et al., 2017). A
common functional domain of sRNAs is a Rho-independent transcription terminator in the form
of a stem loop, which is very similar in size and structure to the stem loops CRISPR repeats form,
directly adjacent to spacers in pre-crRNA. We have performed some preliminary experiments to
test this hypothesis. Several +2 expanded clones were isolated from a cas1-cas2 induced base
recording strain containing pUC19. These clones were each sequenced across the leader-proximal
end of their array revealing the identities of newly acquired spacers. Strain 129 contained a spacer
producing antisense RNA transcript matching 7 different sites for 16S ribosomal RNA within the
host genome, and a spacer producing positive sense RNA for the secretion pathway protein GspB
(Figure 4.2A). This strain shows a slight growth defect relative to parental when cultured without
cas1-cas2 induction and a significant growth defect when cas1-cas2 is induced, perhaps
amplifying the effect when overexpressing the adaptation genes (Figure 4.2BC). The directed
spacer integration process was utilized to partially recapitulate the genotype of Strain 129. From
the parental strain, we integrated the ribosomal-RNA derived spacer into the CRISPR array. Two
strains were generated from the parent, a version with the spacer integrated in the antisense (Strain
166) orientation and a version with the opposite, positive-sense orientation (Strain 173). OD600
growth curves for these new strains were generated with and without cas1-cas2 induction. These
two strains generated comparable growth curves without cas1-cas2 induction (Figure 4.2D). With
cas1-cas2 induced however, the antisense-integrated strain reproduced the growth defect whereas
the strain with opposite spacer orientation showed no obvious growth differences relative to the
parental control (Figure 4.2E). Continued experimentation using this approach can further
illuminate what may be the cause of these observed fitness effects.
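The sense/antisense distinction used in this hypothesis can be made concrete with a small sequence check. The sketch below is a simplified illustration, not the procedure used to annotate the strains above. It assumes the spacer is written 5' to 3' as it would appear in the pre-crRNA (in DNA letters), that the gene is given as its coding strand, and that only exact matches count. The sequences are toy examples and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]

def classify_spacer(spacer_as_transcribed: str, coding_strand: str) -> str:
    """Compare a spacer (5'->3', as transcribed into pre-crRNA) with a gene's
    coding strand, which has the same sequence as the mRNA."""
    spacer = spacer_as_transcribed.upper()
    gene = coding_strand.upper()
    if spacer in gene:
        return "sense"       # identical to the transcript, cannot base-pair with it
    if reverse_complement(spacer) in gene:
        return "antisense"   # complementary to the transcript, could base-pair with it
    return "no match"

toy_gene = "ATGGCTAGCTAGGATCCGATTACA"           # toy coding strand, not a real gene
print(classify_spacer("GCTAGGATCC", toy_gene))  # spacer copied from the coding strand
print(classify_spacer("GGATCCTAGC", toy_gene))  # its reverse complement
```

A real analysis would use the full 33-nucleotide spacers and would need to tolerate partial complementarity, which this exact-match sketch does not attempt.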
Figure 4.2 | Recapitulating fitness effects using directed spacer integration. (A) Strain 129,
identified as a +2 expanded-array clone from a cas1-cas2 induced culture, contains two new
spacers, a positive sense spacer (blue) and antisense spacer (red). Gray spacers are parental. (B)
Strain 129 growth analysis along with unexpanded parental and two other +2 clones each
containing distinct “newly acquired” spacers. Clones cultured without cas1-cas2 induction. (C)
Same strains as in B, cultured with cas1-cas2 induction revealing a strain 129 growth defect. (D)
The antisense spacer from Strain 129 was integrated into the CRISPR array of the parental strain
in both the antisense (Strain 166) and positive sense (Strain 173) orientations. Growth of these
clones was analyzed relative to parental control and Strain 129 with no cas1-cas2 induction. (E)
The same strains as in panel D, cultured with cas1-cas2 induction, revealing a partial reproduction
of the growth defect with antisense but not positive sense spacer integration.
4.2.1 Directed spacer acquisition
Directed CRISPR spacer acquisition is a method developed by Seth Shipman and the
Church lab (Shipman et al., 2016) whereby specific spacers can be incorporated in one of two
orientations into the E. coli Type I-E CRISPR array. The orientation of integration is determined
by the presence of the 5’ protospacer adjacent motif (PAM), which can be on either the top or
bottom strand (Figure 4.3A). This process can be used to array-incorporate a specific spacer to be
expressed as a crRNA for targeting a region of interest. We specifically utilized this process to
evaluate fitness effects that may result from the presence of new spacers incorporated into the
native array.
To incorporate a specific spacer, in an orientation of choice, the two complementary DNA
sequences for the spacer including the 5’ adjacent PAM (aa) were ordered from IDT. The
lyophilized DNA sequences were resuspended in annealing buffer to equimolar concentrations. In
an Eppendorf tube equal volumes of the resuspended complementary oligos were mixed and
vortexed. This mixture was incubated in a 94°C water bath for 2 minutes before removal and
cooling to room temperature, allowing the DNA to anneal. An overnight culture of the base
recording strain BL21AI-C1C2 (genomic inducible cas1-cas2) was passaged (50µL) into 3mL
fresh LB media containing 0.05mM IPTG and 0.2% arabinose. This was cultured in a 37°C shaker
for about 2 hours until the culture reached an OD600 of ~0.45. The culture was then harvested into
two 1.5mL aliquots (pre-frozen Eppendorf tubes) and centrifuged at 4°C for 3 minutes at 10000 x
g. Samples were decanted and pellets resuspended in cold water containing 10% glycerol. Samples
were then centrifuged at 3500 x g for 7 minutes also at 4°C. This wash step was performed a total
of five times. After the final wash, supernatant was decanted. The pellet was resuspended in 20µL
of cold 10% glycerol containing the annealed duplex oligos at a concentration of 3.125µM. The
20-25µL resuspension was added to an ice-cold electroporation cuvette. Electroporation was
performed at 1800 volts in an Eppendorf Eporator. Samples with time constants between 4.0 and
5.5 were advanced. Immediately after electroporation cells were resuspended in 980µL SOC media
and subsequently transferred to a 14mL culture tube for a three-hour recovery in a 37°C shaker
incubator. After recovery a culture sample was serial diluted and plated onto LB-agar containing
spectinomycin (50µg/mL) to isolate clonal colonies for screening. After scaling colonies at 37°C
overnight, they were individually screened using standard CRISPR-array primers flanking the
integration site. Unexpanded clones produce a 379bp product, whereas +1 expanded clones
produce a 440bp PCR product (Figure 4.3B). Initial screening identified array expanded clones,
but a second PCR screen was needed to specifically identify the presence of the electroporated
oligo (Figure 4.3C). The secondary PCR was performed with one primer annealing to the leader
sequence and the other primer annealing to the electroporated oligo in the guided orientation.
Therefore, the production of a PCR amplicon from the second reaction indicates the presence of
electroporated oligo in the leader proximal end of the array.
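The size-based first screen can be expressed as a small lookup from band size to spacer count. The sketch below uses the 379 bp and 440 bp amplicon sizes given above (a 61 bp step per spacer). The 15 bp tolerance for reading a gel band is an arbitrary assumption, and the function names are hypothetical.

```python
UNEXPANDED_BP = 379   # leader-proximal amplicon from an unexpanded array (from the text)
PER_SPACER_BP = 61    # spacer plus repeat added per acquisition (from the text)

def expected_amplicon(n_new_spacers: int) -> int:
    """Expected PCR product length for an array with n newly acquired spacers."""
    return UNEXPANDED_BP + PER_SPACER_BP * n_new_spacers

def spacers_from_band(size_bp: float, tolerance_bp: float = 15):
    """Number of new spacers whose expected amplicon lies within tolerance_bp of the
    observed band, or None if no expected size is close enough."""
    n = max(0, round((size_bp - UNEXPANDED_BP) / PER_SPACER_BP))
    return n if abs(size_bp - expected_amplicon(n)) <= tolerance_bp else None

for band in (379, 440, 501, 410):
    print(band, spacers_from_band(band))

# Fraction of screened colonies carrying one new spacer (2 of 26, as reported).
print(f"{100 * 2 / 26:.1f}%")
```

A size match only shows that the array gained a spacer of typical length, which is why the oligo-specific second PCR is still required to confirm the identity of the insert.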
Figure 4.3 | Directed spacer acquisition. (A) Synthesized spacers of interest can be added into
the E. coli native CRISPR array via Cas1-Cas2 integrase. Integration orientation is determined by
which of the two strands contains the 5’ PAM, in turn determining which strand is expressed as
precursor crRNA. (B) PCR screened colonies using standard leader proximal primers (FP in leader,
RP in native spacer 5) to identify clones with a new spacer. 26 colonies were screened with two
(7.7%) containing one new spacer. (C) Clones 20 and 26 were PCR screened with the leader-annealing
primer and a primer that anneals to the oligo-spacer electroporated into the cells. The
clone-20 CRISPR array was expanded but not with the electroporated oligo as indicated by the
absence of a PCR product between 100-200bp. Clone-26 did incorporate the electroporated oligo
as indicated by the PCR amplicon product of the correct length. Panel A in this figure is derived
from Jackson et al., 2017.
4.3 Host-derived spacers enriched post phage infection
Phage infection experiments performed as part of our CRISPR array dynamics study
revealed enrichment of host-derived spacers seemingly related to infection defense. E. coli strains
expressing all Type I-E CRISPR machinery were infected with bacteriophage Lambda and cultured
for about 16 hours. The purpose for the experiment was to track and quantify phage infection
resistance via frequent OD600 monitoring, and to assess the relationship between spacer
acquisition rate and the extent of phage protection. The strain with the fastest spacer acquisition
rate (C1C2-C3-N) was found to have the greatest degree of protection (Figure 2.9). Bacteriophage
Lambda is both lytic and lysogenic so after an initial lytic phase countered by the host system,
lambda transitions to a lysogenic phase, integrating into the host genome. About 20 hours after the
infection, strain C1C2-C3-N was streaked for colonies, which were subsequently screened for
array-expanded clones. Fourteen of these clones were assessed via Sanger sequencing to identify
the newly acquired spacers. Surprisingly, all 15 of the identified spacers were derived from the
host genome, not the infecting phage. It is a common defense strategy however for infected
prokaryotic cells to initiate abortive self-destruction to prevent internally proliferating phage from
lysing into the environment and further spreading the infection (Strotskaya et al., 2017). This may
partially explain the lack of phage-derived spacers identified. Most intriguing though were the
identities of these 15 host-derived spacers. Four of these spacers (3 unique) were derived from the
LacI gene (Figure 4.4A). This is significant because the LacI gene expresses a protein that represses
the inducible cas1-cas2 operon. One of the other spacers is derived from a region just outside the
terminator of the cas1-cas2 operon, and another is derived from the C-terminal end of the NHEJ
Ku protein. This ku-derived spacer is complementary to two identical, adjacent protospacer
sequences in a region associated with Ku-DNA binding. Other interesting spacers were derived
from the following genes: an ABC transporter permease, a periplasmic chaperone and a
lipopolysaccharide synthesis kinase. Though spacers from some of these genes seem to be
enriched, it’s unclear if this is related to fighting the infection.
We first assessed the host protospacer sites in each of these self-targeting clones to see if
any obvious indels were present. PCRs were performed across each of the protospacers with
products gel-separated alongside parental controls (Figure 4.4B). None of these amplicons
deviated from the controls. Each of the protospacers were then PCR amplified and sent for Sanger
sequencing. All protospacer and directly adjacent regions were found to be fully intact. It’s unclear
how infection-related spacers can enrich if the cognate protospacers are still fully intact. We
presume spacer enrichment is due to improved fitness related to the impact of spacer function.
Cascade alone, in the absence of Cas3 can form a targeting ribonucleoprotein with a crRNA,
binding to repress at protospacer sites without causing damage (Luo et al., 2014). This could
potentially reconcile observed fitness associated with specific spacers in the absence of
protospacer sequence modifications. This explanation would necessitate modifications to the cas3
DNA sequence or some kind of post-transcriptional regulation of the cas3 gene product. There is
a strong selective pressure favoring cells with non-functional cas3 when Cas1-Cas2 is also
expressed. Acquisition of self-targeting spacers is generally lethal to bacterial cells as they often
have weak double strand DNA break repair machinery.
These enriched clones can be fed back into the phage infection assay with only the
interference machinery induced to assess protection from infection relative to the parental control.
If they are more protected, this may be evident from an OD600-monitoring time course through
the infection period. To test for Cas3 function we can first sequence the gene and then utilize
directed spacer integration to incorporate a pUC19 derived spacer. From here we can quantify
pUC19 transformation efficiency compared to a control when the interference machinery is
expressed. Additionally, deep sequencing across the expanded-array region from a culture post
infection may yield clearer insight into spacer/protospacer enrichment across the host genome.
Figure 4.4 | Self-genome targeting spacers enriched during phage infection. Clonal
colonies from a CRISPR adaptation/interference strain infected with bacteriophage Lambda were
isolated after infection recovery. 15 expanded-array clones were identified. (A) Four of 15
host-derived spacers (3 unique) are complementary to the LacI gene. Two of the crRNAs are positive
sense to the LacI transcript and one is antisense. One of the spacers targets a protospacer
downstream of the inducible cas1-cas2 operon. The ku gene from an integrated heterologous NHEJ
construct contains two identical, adjacent protospacers targeted by the same sequenced crRNA.
(B) Each of the clones containing a self-targeting spacer within the array was PCR amplified across
the cognate protospacer site revealing no signs of insertions or deletions. Sanger sequencing
confirmed intact protospacers. Numbers at the top of columns represent different expanded-array
clones. Parental control amplifications are indicated by a “c”.
Chapter 5
Summary and Future Directions
5.1 CRISPR array dynamics
Spacer acquisition plays a critical role in the functioning of prokaryotic CRISPR-Cas
defense, whereby short DNA target sequences are host-encoded in a chronological repository of
immunological history. The unique capabilities of core adaptation machinery Cas1 and Cas2 are
being developed for biological data recording and storage applications. This evolved mechanism
for generating immunological targets can be repurposed in the absence of interference genes, to
capture and integrate short duplex DNA sequences that either proliferate or first appear in response
to a biological event. CRISPR arrays can also be utilized as a high density and durable biological
storage medium for preserving large amounts of information including digital data. Across
populations of bacterial cells, deep sequencing through the leader proximal end of CRISPR arrays
can reveal fluctuations in the composition of intracellular DNA debris in the form of newly
acquired spacers. These large-scale variations can reflect environmental changes including the
transient presence of mobile genetic elements or changes in actively transcribed gene
networks. The recording resolution for event detection depends on the rate at which spacers are
acquired within the population. Sufficiently rare events may not be detected using this method
with wild type spacer acquisition rates. Likewise, for data storage and preservation applications,
the cellular capacity for new spacer integrations may be limited with only a single acquiring
CRISPR array within each bacterial cell. Therefore, we explored the potential for enhancing array
expansion via three rate-limiting factors involved in spacer acquisition: 1) Cas1-Cas2 expression
level, 2) Cas1-Cas2 substrate concentration or availability and 3) the number of CRISPR arrays
present within each cell. Spacer acquisition rates were calculated from multi-day cas1-cas2
induction experiments tracking the proportions of array-expanded subpopulations within E. coli
BL21-AI cultures.
We confirmed in our base recording strain that the presence of a high copy number plasmid
increases the rate of spacer acquisition, at least partially due to the additional plasmid derived DNA
debris produced. Adding an additional CRISPR array into the genome to provide two distinct sites
for integration did not increase total spacer acquisition per cell even with pUC19 present. In the
two-array strain acquisition rates were nearly halved for each array, potentially indicating Cas1-
Cas2 substrate as limiting. We found that spacer acquisition rates increased with Cas1 expression
levels for IPTG concentrations between 0.001-0.05 mM, stabilizing at higher doses. Additionally,
expressing heterologous non-homologous end joining (NHEJ) genes enhanced spacer acquisition
rates both in the presence and absence of plasmid DNA. The two host-genome integrated NHEJ
genes ku and ligD are derived from M. smegmatis. When each NHEJ gene was introduced without
the other, they were both found to enhance spacer acquisition independently. Summing the
acquisition increases from individual expression of Ku and LigD produced about the same boost
in spacer acquisition as both being expressed together. Study-wide analysis of spacer acquisition
experiments characterizing array expansion rates in various strains, identified a systematic slowing
in acquisition relative to a model describing constant-rate spacer acquisition. Modeling several
parameters potentially responsible for this deviation, we found that fitness effects associated with
expanded-array cells best fit our experimental results. To try to refute this conclusion, we isolated
126
expanded-array clones from an early time point in a cas1-cas2 induction time course. These clones
were individually competed against the unexpanded parental strain. On average the expanded
clones were significantly outcompeted by the unexpanded parental, supporting the “fitness effects”
hypothesis drawn from our modeling.
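The contrast between constant-rate acquisition and acquisition slowed by a fitness cost can be illustrated with a minimal discrete-generation sketch. This is not the model fitted in this work; the function name, the per-generation acquisition rate and the relative fitness value are all hypothetical, chosen only to show how a growth disadvantage in expanded-array cells lowers the observed expanded fraction relative to the constant-rate expectation.

```python
def expanded_fraction(rate, generations, relative_fitness=1.0):
    """Fraction of cells carrying an expanded array after each generation.

    rate: per-generation probability that an unexpanded cell acquires a spacer
          (hypothetical value, not a measured rate).
    relative_fitness: growth of expanded cells relative to unexpanded cells
          (1.0 means no fitness effect).
    """
    unexpanded, expanded = 1.0, 0.0
    trajectory = []
    for _ in range(generations):
        # Acquisition step: a constant fraction of unexpanded cells expands.
        newly_expanded = unexpanded * rate
        unexpanded -= newly_expanded
        expanded += newly_expanded
        # Growth step: expanded cells are weighted by their relative fitness,
        # then both subpopulations are renormalized to fractions.
        expanded *= relative_fitness
        total = unexpanded + expanded
        unexpanded, expanded = unexpanded / total, expanded / total
        trajectory.append(expanded)
    return trajectory


# Illustrative parameters only.
neutral = expanded_fraction(rate=0.01, generations=50)
costly = expanded_fraction(rate=0.01, generations=50, relative_fitness=0.95)
print(f"no fitness effect: {neutral[-1]:.3f}, with fitness cost: {costly[-1]:.3f}")
```

With `relative_fitness=1.0` the sketch reduces to the constant-rate expectation, 1 - (1 - rate)^t, so any shortfall in the second trajectory comes from the assumed growth disadvantage alone.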
5.2 CRISPR-Cas self-genome targeting
CRISPR-Cas interference machinery can be directed to target the host genome itself,
eliminating a predetermined DNA sequence. Successful targeting can be achieved using just the
native DNA repair machinery or through facilitated repair from heterologous expression of non-native
machinery such as the NHEJ genes ku and ligD. Utilizing an introduced repair template
containing a designed target site modification allows for precise deletions through homology-directed
repair. In the absence of a repair template, the maximum deletion size for a targeted
dsDNA break is the length of sequence between the nearest essential genes flanking
the targeted protospacer. Any breach of essential DNA is generally lethal. In the absence of a
repair template, E. coli cells can utilize native A-EJ repair via DNA end resection, which matches short
microhomologies on either side of the break to initiate repair, with deletions being the length of the sequence
between microhomologies. Because multiple different microhomologies often lie on both sides of a target
site, this can result in a population of cells with deletions of variable size. NHEJ-mediated repair
does not require homologous sequences, allowing for repair with less damage to the targeted
region, resulting in smaller deletions.
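The upper bound on deletion size described above can be sketched as a simple coordinate lookup. The function name and all coordinates below are hypothetical, and the sketch treats the chromosome as linear for simplicity (the E. coli chromosome is circular), so it is an illustration of the reasoning rather than an analysis tool.

```python
from bisect import bisect_right


def max_deletion_window(essential_genes, cut_site):
    """Return (left_bound, right_bound, size) of the largest tolerated deletion.

    essential_genes: list of (start, end) coordinates, assumed non-overlapping.
    cut_site: coordinate of the targeted double-strand break.
    size counts the bases strictly between the two flanking essential genes.
    """
    genes = sorted(essential_genes)
    for start, end in genes:
        if start <= cut_site <= end:
            raise ValueError("cut site lies inside an essential gene")
    starts = [start for start, _ in genes]
    idx = bisect_right(starts, cut_site)  # index of first gene starting after the cut
    if idx == 0 or idx == len(genes):
        raise ValueError("cut site is not flanked by essential genes on both sides")
    left_bound = genes[idx - 1][1]  # end of the nearest upstream essential gene
    right_bound = genes[idx][0]     # start of the nearest downstream essential gene
    return left_bound, right_bound, right_bound - left_bound - 1


# Made-up coordinates for three essential genes and one target site.
essential = [(1000, 2000), (30000, 31000), (60000, 62000)]
print(max_deletion_window(essential, cut_site=15000))
```

Any deletion extending past either bound would remove essential sequence, which is why survivors of targeting are expected to carry deletions contained within this window.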
We integrated the E. coli Type I-E interference machinery, cascade and cas3, into the cas-deficient E. coli host strain BL21-AI. A single operon allowed for inducible expression of all
machinery necessary for target site binding (Cascade) and degradation (Cas3). Using a plasmid
expressing a lacZ crRNA we analyzed the impacts of self-genome targeting. We found that
Cascade-Cas3 generates large (>15 kb) bidirectional deletions through native repair. With M.
smegmatis NHEJ genes ku and ligD introduced and expressed alongside the interference
machinery, a greater percentage of the population survived targeting, with many of the survivors
containing small (~2.5 kb) unidirectional deletions. Several previous studies had characterized
Cascade-Cas3 deletions as strictly unidirectional from work done in vitro and in human cells. We
concluded that heterologous NHEJ expression in this context increases survivability due to a
broader repair capacity and the ability to reduce deletion sizes, making damage to essential regions
less likely.
Our host strains containing both operons for controlled expression of CRISPR adaptation
and CRISPR interference were explored for their potential to survive autoimmunity. The aim was
to utilize crRNAs derived from integrated spacers and expressed from the array, rather than
specifically programmed crRNAs. In theory, this would allow for effectively random-genome
targeting across the population, producing broad deletion-mutant diversity to then select upon.
This strategy was pursued for the potential of single-cell sequential targeting to facilitate genome
streamlining, and to unearth beneficial pleiotropic effects. We implemented several different
culturing schemes to introduce random deletions including alternating Cas1-Cas2/Cascade-Cas3
expression, as well as simultaneous expression of both operons. We did not succeed in generating
these mutant pools, as expanded-array clones ostensibly surviving random self-targeting
maintained intact protospacer target sites. CRISPR self-targeting, especially in bacteria with
feeble dsDNA break repair, imposes a strong selective pressure favoring the proliferation of
CRISPR-Cas-defective mutants in the population. These defects may be mutations to the Cas
machinery, the array promoter, the protospacer site or the PAM. Although NHEJ expression
improves survivability, the extent of self-targeting lethality may still be too much to sustain a
significant deletion-mutant population. Alternatively, the inability to identify self-genome
deletions from this process may in part be due to the repression of CRISPR array transcription by
H-NS.
5.3 CRISPR adaptation applications
Cellular recording tools enable tracking intracellular and extracellular events regulating
living cells across complex cultures. Several DNA recording devices have been developed
including RSM (Roquet et al., 2016), CAMERA (Tang & Liu, 2018), mSCRIBE (Perli et al., 2016)
and the CRISPR array recording system TRACE (Sheth et al., 2017). Biological signals articulated
in various forms can activate cellular responses by modulating genetic and epigenetic regulation,
leading to appreciable changes in both cellular and population behaviors (Masel and Siegal, 2009).
Quantifying and tracking the relative spatiotemporal extent of these changes can help unravel the
complexities of specific cellular processes. However, several obstacles impede the optimal
implementation of these tools. DNA-based recording systems help overcome some of these
challenges. For example, transcriptomics and immunoassays require destruction of the biological
material preventing the recording of temporal information from individual samples. Live cell
imaging with fluorescent markers allows for continuous data collection but requires direct access
to the biological material. Alternatively, DNA recording systems operate in situ, enabling temporal
tracking without requiring access during the process. Cells are harvested after a recording period,
with DNA subsequently extracted and sequences analyzed to yield data tracking
cellular/population behavior. The useful data extracted from DNA recordings includes the
identities, quantities and relative chronological order of these biological signals. These recorded
DNA sequences are only valuable if they indicate the presence of specific biological events. This
could be an event that boosts the concentration of DNA recording substrate from a particular source
as recording is ongoing, or an event that induces the expression of recording machinery that is
otherwise repressed (Sheth et al., 2017). In systems utilizing a reverse transcriptase, DNA recording can
directly track regulatory impacts of biological events by converting transcriptional products into
DNA barcodes (Lear et al., 2023). Performance metrics can be quantified to evaluate and contrast
the capabilities of different biological recording methods, including: the temporal resolution of
recording, the cellular capacity for recording transduced signals and the accuracy and stability of
data storage. Other important criteria include the multiplexing/scaling potential of recording and
the ability to port recording systems between species.
The DNA-based recording system comprised of CRISPR adaptation machinery and a
CRISPR array is termed TRACE (temporal recording in arrays by CRISPR expansion). The system
architecture is composed of four parts: 1) Signal sensing, 2) DNA writing, 3) DNA reading and 4)
Actuation (Sheth and Wang, 2018). Signal sensing is the conversion of a signal, such as an inducer,
into Cas1-Cas2 prespacer substrate. DNA writing is the integration of spacers into host or plasmid-based
CRISPR arrays. DNA reading is the retrieval of the integrated sequences, either from
individual clones via Sanger sequencing or the broader, mixed population through deep
sequencing. Actuation is the functionalization of spacer integration such that array expansion itself
triggers a phenotypic change (e.g. inducing antibiotic resistance; Díez-Villaseñor et al., 2013). The
temporal resolution of DNA recording using TRACE is limited by the transduction of signal to
substrate as spacer acquisition depends on DNA replication to produce candidate Cas1-Cas2
prespacers (Levy et al., 2015). Memory writing also limits temporal resolution as Cas1-Cas2
integrations rely on post-synaptic complex processing to conclude before additional spacers can
be acquired by a particular CRISPR array (Budhathoki et al., 2020). Single-location integration
systems bottleneck the array-expansion process, supporting sequential but not parallel integrations.
However, the TRACE method does have significant multiplexing and scaling potential. Individual
cells containing multiple orthogonal spacer acquisition systems may be able to capture spacers
induced from distinct signals and integrate them into separate arrays. Alternatively, a single
CRISPR adaptation system can recognize prespacers derived from different signals and integrate
them into the same array.
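The DNA reading step can be sketched as a lookup over sequenced arrays that recovers the identities, quantities and relative order of recorded signals. The sketch assumes that new spacers are integrated at the leader-proximal end of the array, so that reading an array away from the leader moves from newest to oldest. The function names, spacer names and source labels are hypothetical placeholders, not data from this work.

```python
from collections import Counter


def chronological_events(array_spacers, spacer_sources):
    """Return the recorded signal sources for one array, oldest first.

    array_spacers: spacers listed leader-proximal first (assumed newest first).
    spacer_sources: hypothetical lookup mapping each spacer to its source.
    """
    oldest_first = reversed(array_spacers)
    return [spacer_sources.get(spacer, "unmapped") for spacer in oldest_first]


def signal_counts(arrays, spacer_sources):
    """Count how often each source appears across a population of arrays."""
    counts = Counter()
    for array_spacers in arrays:
        counts.update(chronological_events(array_spacers, spacer_sources))
    return counts


# Toy arrays from three clones; all names are made up for illustration.
sources = {"sp1": "genome", "sp2": "plasmid_A", "sp3": "plasmid_B"}
population = [["sp3", "sp2", "sp1"], ["sp2", "sp1"], ["sp9"]]
print(chronological_events(population[0], sources))
print(signal_counts(population, sources))
```

Keeping unmatched spacers under an explicit "unmapped" label, rather than dropping them, preserves the total spacer count per array so that array length and event order stay consistent.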
TRACE-enabled tools represent a new mode of biological measurement providing insight
into the complexities of cellular and population level behaviors and interactions. Tracking the
transient presence of phage and other mobile genetic elements within microbial communities may
help reveal the dynamics of horizontal gene transfer under different conditions (Smillie et al., 2011;
Munck et al., 2020). Biological signal variance can also be recorded in CRISPR arrays across
populations illuminating heterogeneity of expression and providing insight into cellular decision
making (Eldar and Elowitz, 2010; Balázsi et al., 2011). Utilizing this recording process, cells can also
store digital data using a redox-responsive CRISPR adaptation system that directly encodes
information through electrical stimulation, providing a high-density, digital data storage medium
within living microbes (Yim et al., 2021). Additionally, TRACE-enabled cellular sentinels tasked
with passively recording internal and external environments can provide continuous feedback for
many different applications. In microbial communities, cellular sentinels could monitor pathogen-related
quorum signals to identify and quantify threats, with the potential to actuate a targeted
response (Gupta et al., 2013; Hwang et al., 2014). These monitoring cells can also detect toxic
contaminants such as heavy metals and track transient environmental conditions including
temperature and pH in more dynamic, open environments. For contained environments such as
bioreactors used for industrial processes, cellular sentinels can help guide product optimization by
monitoring cell metabolism and identifying pathways for improved productivity.
Technical challenges remain before many of these applications can be fully realized. As
previously mentioned, temporal resolution is a key performance metric limiting recording
capabilities. This can be improved by modifying the DNA writing machinery Cas1-Cas2 through
directed evolution or random mutagenesis to speed up the spacer integration process (Heler et al.,
2016). Additionally, identifying accessory molecular components that enhance Cas1-Cas2
substrate preservation or expansion can also improve temporal resolution, as we discovered with
the overexpression of heterologous NHEJ genes ku and ligD. Providing a non-limiting number of
CRISPR arrays per cell can also enhance spacer acquisition rates in the presence of excess
substrate, especially in cells recording multiple transduced signals with the same adaptation
machinery. Porting spacer acquisition to a broad range of non-native hosts is a significant challenge
not yet solved, as the full set of host components necessary for CRISPR adaptation is generally not
known. TRACE applications are mostly limited to E. coli cells and the native Type I-E CRISPR
system. Enabling porting of adaptation machinery between species would significantly increase the
potential for multiplexing and scaling, ultimately enhancing the ability to record and track complex
biological behaviors. CRISPR array stability is important for the accuracy of data sets as array
contraction or spacer duplication may distort the interpretation of biotic signals. Better
quantification of these aberrations may clarify their significance and help guide system
modifications that minimize incidence.
5.4 Future directions
Enhancing spacer acquisition rates, as mentioned previously, can improve the capabilities
for several emerging applications. These enhancements may be attained through improvements to
the individual rate-limiting factors involved in the spacer integration mechanism. Modifying the
amino acid sequences of proteins involved in spacer acquisition, from Cas1-Cas2 to Cas9 (Type
II), through directed evolution or rational design has already proven successful to this end
(Yosef et al., 2023; Heler et al., 2017). Boosting prespacer substrate concentrations for Cas1-Cas2
capture and processing also enhances acquisition rates. RecBCD exonuclease activity has been
found to significantly contribute to Cas1-Cas2 substrate pool production. As a byproduct of
repairing stalled replication forks, RecBCD processively degrades DNA from free ends until it
reaches a short sequence motif called a chi-site (Levy et al., 2015). Overexpressing RecBCD or
reducing chi-site prevalence may therefore induce expansion of the prespacer substrate pool. In
our experiments heterologous NHEJ genes ku and ligD both independently enhanced spacer
acquisition. Perhaps other proteins that either protect DNA ends from host exonuclease
degradation or ligate/synthesize fragments (e.g. bacteriophage T4 DNA ligase) can similarly
augment array expansion rates. Additionally, increasing the number of spacer integration sites
within host genomes may scale spacer acquisition rates in the presence of excess prespacer
substrate. The leader-repeat-junction DNA sequence necessary for spacer acquisition requires no
more than 25 nucleotides (Nivala et al., 2018). Duplicating this sequence throughout the genome
would significantly reduce the probability of Cas1-Cas2 integration sites limiting spacer
acquisition. These hypotheses can be further explored using our array-expansion quantification
method to analyze the results.
The 5' end of precursor crRNA transcribed from the CRISPR array is derived from the 3'
region of the leader sequence. In the Type II-A CRISPR system of S. pyogenes this RNA sequence
has been found to interact with the transcribed leader-proximal repeat sequence bordering the first
spacer (Liao et al., 2022). This interaction is facilitated by tracrRNA which anneals to the first
repeat and is thought to accelerate processing of the newest spacer, prioritizing maturation of the
most recently integrated sequence. However, the leader-transcribed RNA sequence in pre-crRNA
from Type I-E CRISPR systems is not well understood. These systems do not have tracrRNA, so if
the function of this leader sequence is also spacer prioritization, the mechanism must be different.
In E. coli this terminal 5' sequence forms an RNA stem loop reminiscent of the rho-independent
terminator common to small regulatory RNAs. These stem loops are thought to promote RNA
stability and the recruitment of RNA chaperones (Papenfort and Melamed, 2023). It may be
valuable to explore the function of this leader-derived RNA sequence in the context of crRNA
prioritization and perhaps for function in the absence of Cas6 (pre-crRNA processing enzyme)
expression.
An interesting observation comes from our array expansion data comparing spacer
acquisition rates in the base recording strain with and without the high copy number plasmid
pUC19. The degree of acquisition rate increase in the strain containing pUC19 is not accounted
for by the fraction of pUC19-derived spacers detected. The acquisition rate is approximately
doubled when pUC19 is introduced, but only about 19% of the newly acquired spacers we
sequenced were pUC19-derived. This result was also observed by Sheth et al. (2017), who found
that the presence of a high copy number plasmid appeared to significantly increase the acquisition
of genome-derived spacers. It is unclear what causes this, but it may be worth exploring further.
Several hypotheses could be tested through deep sequencing to identify and map these spacer
sequences to the host E. coli genome.
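The discrepancy can be made explicit with a short calculation using the two figures quoted above (an approximately twofold rate increase and about 19% pUC19-derived spacers). The sketch assumes that essentially all spacers in the plasmid-free strain are genome-derived; the function and variable names are illustrative only.

```python
def genome_spacer_fold_change(rate_fold_with_plasmid, plasmid_fraction):
    """Fold change in genome-derived acquisition when the plasmid is present.

    Assumes spacers in the plasmid-free strain are all genome-derived, so the
    genome-derived rate with plasmid is the total rate times the genome share.
    """
    genome_fraction = 1.0 - plasmid_fraction
    return rate_fold_with_plasmid * genome_fraction


def expected_fold_if_genome_unchanged(plasmid_fraction):
    """Total fold change expected if the plasmid only added its own spacers."""
    return 1.0 / (1.0 - plasmid_fraction)


observed_genome_fold = genome_spacer_fold_change(2.0, 0.19)
additive_total_fold = expected_fold_if_genome_unchanged(0.19)
print(f"genome-derived acquisition: about {observed_genome_fold:.2f}-fold")
print(f"total expected from plasmid spacers alone: about {additive_total_fold:.2f}-fold")
```

Under these assumptions genome-derived acquisition rises roughly 1.62-fold, whereas plasmid-derived spacers alone would raise the total rate only about 1.23-fold, which is the gap the proposed deep sequencing would help explain.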
Although many of the mechanistic details of CRISPR adaptation have been characterized,
some host components involved in the process have yet to be identified. CRISPR interference,
particularly repurposed Cas9, is a functional in vivo tool across all domains of life. Very few
examples of CRISPR adaptation, however, have been reported outside the native host. Porting
CRISPR adaptation machinery for heterologous spacer acquisition has thus far been a challenge
due to unidentified host components involved in different parts of the process. No
examples of heterologous CRISPR adaptation in eukaryotes have thus far been reported. Identifying these
unknown components is the first step in reconstituting spacer recording in eukaryotic systems.
Recording intracellular events in human cells for instance could have significant implications in
understanding and treating disease. Additionally, due to robust NHEJ pathways in eukaryotes,
combining CRISPR adaptation with interference would make these systems ideal to generate
random deletions for both genome streamlining and minimization efforts.
We found that Ku and LigD, the bacterial NHEJ machinery, facilitate DNA damage repair in
response to Cascade-Cas3 target-site degradation. Unlike the relatively simple mechanism for
generating Cas9 dsDNA breaks, the Cascade-Cas3 mechanism is complex and not fully
characterized. It is unclear at which point during Cas3 target processing NHEJ machinery
intervenes to render unidirectional rather than bidirectional deletions, but perhaps further
exploration can illuminate fundamental aspects of the DNA degradation mechanism itself.
BIBLIOGRAPHY
1. Achigar R, Magadán AH, Tremblay DM, Julia Pianzzola M, Moineau S. Phage-host
interactions in Streptococcus thermophilus: Genome analysis of phages isolated in Uruguay
and ectopic spacer acquisition in CRISPR array. Sci Rep. 2017 Mar 6;7(1):43438.
2. Aguilera A, García-Muse T. R Loops: From Transcription Byproducts to Threats to Genome
Stability. Molecular Cell. 2012 Apr;46(2):115–24.
3. Amitai G, Sorek R. CRISPR–Cas adaptation: insights into the mechanism of action. Nat
Rev Microbiol. 2016 Feb;14(2):67–76.
4. Amlinger L, Hoekzema M, Wagner EGH, Koskiniemi S, Lundgren M. Fluorescent CRISPR
Adaptation Reporter for rapid quantification of spacer acquisition. Sci Rep. 2017 Sep
4;7(1):10392.
5. Aniukwu J, Glickman MS, Shuman S. The pathways and outcomes of mycobacterial NHEJ
depend on the structure of the broken DNA ends. Genes Dev. 2008 Feb 15;22(4):512–27.
6. Arslan Z, Hermanns V, Wurm R, Wagner R, Pul Ü. Detection and characterization of spacer
integration intermediates in type I-E CRISPR–Cas system. Nucleic Acids Res. 2014 Jul
8;42(12):7884–93.
7. Balázsi G, van Oudenaarden A, Collins JJ. Cellular Decision Making and Biological Noise:
From Microbes to Mammals. Cell. 2011 Mar;144(6):910–25.
8. Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, et al. CRISPR
Provides Acquired Resistance Against Viruses in Prokaryotes. Science. 2007 Mar
23;315(5819):1709–12.
9. Bernick DL, Cox CL, Dennis PP, Lowe TM. Comparative genomic and transcriptional
analyses of CRISPR systems across the genus Pyrobaculum. Front Microbio [Internet].
2012 [cited 2024 Dec 5];3. Available from:
http://journal.frontiersin.org/article/10.3389/fmicb.2012.00251/abstract
10. Bolotin A, Quinquis B, Sorokin A, Ehrlich SD. Clustered regularly interspaced short
palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology.
2005 Aug 1;151(8):2551–61.
11. Bowater R, Doherty AJ. Making ends meet: repairing breaks in bacterial DNA by non-homologous end-joining. PLoS Genet. 2006 Feb;2(2):e8.
12. Brissett NC, Doherty AJ. Repairing DNA double-strand breaks by the prokaryotic non-homologous end-joining pathway. Biochem Soc Trans. 2009 Jun;37(Pt 3):539–45.
13. Budhathoki JB, Xiao Y, Schuler G, Hu C, Cheng A, Ding F, et al. Real-time observation of
CRISPR spacer acquisition by Cas1–Cas2 integrase. Nat Struct Mol Biol. 2020
May;27(5):489–99.
14. Chang HHY, Pannunzio NR, Adachi N, Lieber MR. Non-homologous DNA end joining and
alternative pathways to double-strand break repair. Nat Rev Mol Cell Biol. 2017
Aug;18(8):495–506.
15. Chayot R, Montagne B, Mazel D, Ricchetti M. An end-joining repair mechanism in
Escherichia coli. Proc Natl Acad Sci U S A. 2010 Feb 2;107(5):2141–6.
16. Chen H, Mayer A, Balasubramanian V. A scaling law in CRISPR repertoire sizes arises
from the avoidance of autoimmunity. Curr Biol. 2022 Jul 11;32(13):2897-2907.e5.
17. Citorik RJ, Mimee M, Lu TK. Sequence-specific antimicrobials using efficiently delivered
RNA-guided nucleases. Nat Biotechnol. 2014 Nov;32(11):1141–5.
18. Csörgő B, León LM, Chau-Ly IJ, Vasquez-Rifo A, Berry JD, Mahendra C, et al. A compact
Cascade-Cas3 system for targeted genome engineering. Nat Methods. 2020
Dec;17(12):1183–90.
19. Cui L, Bikard D. Consequences of Cas9 cleavage in the chromosome of Escherichia coli.
Nucleic Acids Res. 2016 May 19;44(9):4243–51.
20. Dagdas YS, Chen JS, Sternberg SH, Doudna JA, Yildiz A. A conformational checkpoint
between DNA binding and cleavage by CRISPR-Cas9. Sci Adv. 2017 Aug 4;3(8):eaao0027.
21. Datsenko KA, Pougach K, Tikhonov A, Wanner BL, Severinov K, Semenova E. Molecular
memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system.
Nat Commun. 2012 Jul 10;3(1):945.
22. Deecker SR, Ensminger AW. Type I-F CRISPR-Cas Distribution and Array Dynamics in
Legionella pneumophila. G3 Genes|Genomes|Genetics. 2020 Mar 1;10(3):1039–50.
23. Deem MW. CRISPR recognizes as many phage types as possible without overwhelming the
Cas machinery. Proc Natl Acad Sci USA. 2020 Apr 7;117(14):7550–2.
24. Delaney NF, Balenger S, Bonneaud C, Marx CJ, Hill GE, Ferguson-Noel N, et al. Ultrafast
Evolution and Loss of CRISPRs Following a Host Shift in a Novel Wildlife Pathogen,
Mycoplasma gallisepticum. Guttman DS, editor. PLoS Genet. 2012 Feb 9;8(2):e1002511.
25. Della M, Palmbos PL, Tseng HM, Tonkin LM, Daley JM, Topper LM, et al. Mycobacterial
Ku and ligase proteins constitute a two-component NHEJ repair machine. Science. 2004
Oct 22;306(5696):683–5.
26. Deltcheva E, Chylinski K, Sharma CM, Gonzales K, Chao Y, Pirzada ZA, et al. CRISPR
RNA maturation by trans-encoded small RNA and host factor RNase III. Nature. 2011
Mar;471(7340):602–7.
27. Deveau H, Barrangou R, Garneau JE, Labonté J, Fremaux C, Boyaval P, et al. Phage
Response to CRISPR-Encoded Resistance in Streptococcus thermophilus. J Bacteriol. 2008
Feb 15;190(4):1390–400.
28. Díez-Villaseñor C, Guzmán NM, Almendros C, García-Martínez J, Mojica FJM. CRISPR-spacer integration reporter plasmids reveal distinct genuine acquisition specificities among
CRISPR-Cas I-E variants of Escherichia coli. RNA Biology. 2013 May;10(5):792–802.
29. Dolan AE, Hou Z, Xiao Y, Gramelspacher MJ, Heo J, Howden SE, et al. Introducing a
Spectrum of Long-Range Genomic Deletions in Human Embryonic Stem Cells Using Type
I CRISPR-Cas. Molecular Cell. 2019 Jun;74(5):936-950.e5.
30. Du K, Gong L, Li M, Yu H, Xiang H. Reprogramming the endogenous type I CRISPR‐Cas
system for simultaneous gene regulation and editing in Haloarcula hispanica. mLife. 2022
Mar;1(1):40–50.
31. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010
Sep;467(7312):167–73.
32. Ferretti JJ, Stevens DL, Fischetti VA, editors. Streptococcus pyogenes: Basic Biology to
Clinical Manifestations [Internet]. Oklahoma City (OK): University of Oklahoma Health
Sciences Center; 2016 [cited 2024 Dec 5]. Available from:
http://www.ncbi.nlm.nih.gov/books/NBK333424/
33. Fineran PC, Charpentier E. Memory of viral infections by CRISPR-Cas adaptive immune
systems: Acquisition of new information. Virology. 2012 Dec;434(2):202–9.
34. Fineran PC, Gerritzen MJH, Suárez-Diez M, Künne T, Boekhorst J, Van Hijum SAFT, et al.
Degenerate target sites mediate rapid primed CRISPR adaptation. Proc Natl Acad Sci USA
[Internet]. 2014 Apr 22 [cited 2024 Dec 5];111(16). Available from:
https://pnas.org/doi/full/10.1073/pnas.1400071111
35. Finger‐Bou M, Orsi E, Van Der Oost J, Staals RHJ. CRISPR with a Happy Ending: Non‐
Templated DNA Repair for Prokaryotic Genome Engineering. Biotechnology Journal. 2020
Jul;15(7):1900404.
36. Foster PL, Lee H, Popodi E, Townes JP, Tang H. Determinants of spontaneous mutation in
the bacterium Escherichia coli as revealed by whole-genome sequencing. Proc Natl Acad
Sci USA [Internet]. 2015 Nov 3 [cited 2024 Dec 5];112(44). Available from:
https://pnas.org/doi/full/10.1073/pnas.1512136112
37. Gao F, Zheng K, Li YB, Jiang F, Han CY. A Cas6-based RNA tracking platform functioning
in a fluorescence-activation mode. Nucleic Acids Research. 2022 May 6;50(8):e46–e46.
38. Garneau JE, Moineau S. Bacteriophages of lactic acid bacteria and their impact on milk
fermentations. Microb Cell Fact. 2011;10(Suppl 1):S20.
39. Garneau JE, Dupuis MÈ, Villion M, Romero DA, Barrangou R, Boyaval P, et al. The
CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature.
2010 Nov 4;468(7320):67–71.
40. Garrett SC. Pruning and Tending Immune Memories: Spacer Dynamics in the CRISPR
Array. Front Microbiol. 2021 Apr 1;12:664299.
41. Georjon H, Bernheim A. The highly diverse antiphage defence systems of bacteria. Nat Rev
Microbiol. 2023 Oct;21(10):686–700.
42. Gleditzsch D, Pausch P, Müller-Esparza H, Özcan A, Guo X, Bange G, et al. PAM
identification by CRISPR-Cas effector complexes: diversified mechanisms and structures.
RNA Biology. 2019 Apr 3;16(4):504–17.
43. Gudbergsdottir S, Deng L, Chen Z, Jensen JVK, Jensen LR, She Q, et al. Dynamic
properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with
vector‐borne viral and plasmid genes and protospacers. Molecular Microbiology. 2011
Jan;79(1):35–49.
44. Guo TW, Bartesaghi A, Yang H, Falconieri V, Rao P, Merk A, et al. Cryo-EM Structures
Reveal Mechanism and Inhibition of DNA Targeting by a CRISPR-Cas Surveillance
Complex. Cell. 2017 Oct;171(2):414-426.e12.
45. Gupta S, Bram EE, Weiss R. Genetically Programmable Pathogen Sense and Destroy. ACS
Synth Biol. 2013 Dec 20;2(12):715–23.
46. Hayes RP, Xiao Y, Ding F, Van Erp PBG, Rajashankar K, Bailey S, et al. Structural basis for
promiscuous PAM recognition in type I–E Cascade from E. coli. Nature. 2016
Feb;530(7591):499–503.
47. He L, St. John James M, Radovcic M, Ivancic-Bace I, Bolt EL. Cas3 Protein—A Review of
a Multi-Tasking Machine. Genes. 2020 Feb 18;11(2):208.
48. Held NL, Herrera A, Cadillo-Quiroz H, Whitaker RJ. CRISPR Associated Diversity within
a Population of Sulfolobus islandicus. Planet PJ, editor. PLoS ONE. 2010 Sep
28;5(9):e12988.
49. Heler R, Samai P, Modell JW, Weiner C, Goldberg GW, Bikard D, et al. Cas9 specifies
functional viral targets during CRISPR–Cas adaptation. Nature. 2015 Mar
12;519(7542):199–202.
50. Heler R, Wright AV, Vucelja M, Bikard D, Doudna JA, Marraffini LA. Mutations in Cas9
Enhance the Rate of Acquisition of Viral Spacer Sequences during the CRISPR-Cas
Immune Response. Molecular Cell. 2017 Jan;65(1):168–75.
51. Heler R, Wright AV, Vucelja M, Doudna JA, Marraffini LA. Spacer Acquisition Rates
Determine the Immunological Diversity of the Type II CRISPR-Cas Immune Response.
Cell Host & Microbe. 2019 Feb;25(2):242-249.e3.
52. Horvath P, Romero DA, Coûté-Monvoisin AC, Richards M, Deveau H, Moineau S, et al.
Diversity, Activity, and Evolution of CRISPR Loci in Streptococcus thermophilus. J
Bacteriol. 2008 Feb 15;190(4):1401–12.
53. Hwang IY, Tan MH, Koh E, Ho CL, Poh CL, Chang MW. Reprogramming Microbes to Be
Pathogen-Seeking Killers. ACS Synth Biol. 2014 Apr 18;3(4):228–37.
54. Ivančić-Baće I, Cass SD, Wearne SJ, Bolt EL. Different genome stability proteins underpin
primed and naïve adaptation in E. coli CRISPR-Cas immunity. Nucleic Acids Res. 2015
Dec 15;43(22):10821–30.
55. Jackson RN, Wiedenheft B. A Conserved Structural Chassis for Mounting Versatile
CRISPR RNA-Guided Immune Responses. Molecular Cell. 2015 Jun;58(5):722–8.
56. Jackson SA, McKenzie RE, Fagerlund RD, Kieper SN, Fineran PC, Brouns SJJ. CRISPR-Cas: Adapting to change. Science. 2017 Apr 7;356(6333):eaal5056.
57. Jiang F, Taylor DW, Chen JS, Kornfeld JE, Zhou K, Thompson AJ, et al. Structures of a
CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science. 2016 Feb
19;351(6275):867–71.
58. Jiang W, Bikard D, Cox D, Zhang F, Marraffini LA. RNA-guided editing of bacterial
genomes using CRISPR-Cas systems. Nat Biotechnol. 2013 Mar;31(3):233–9.
59. Jiang W, Maniv I, Arain F, Wang Y, Levin BR, Marraffini LA. Dealing with the
Evolutionary Downside of CRISPR Immunity: Bacteria and Beneficial Plasmids. Matic I,
editor. PLoS Genet. 2013 Sep 26;9(9):e1003844.
60. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. A Programmable
Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science. 2012 Aug
17;337(6096):816–21.
61. Jinek M, Jiang F, Taylor DW, Sternberg SH, Kaya E, Ma E, et al. Structures of Cas9
Endonucleases Reveal RNA-Mediated Conformational Activation. Science. 2014 Mar
14;343(6176):1247997.
62. Kim S, Loeff L, Colombo S, Jergic S, Brouns SJJ, Joo C. Selective loading and processing
of prespacers for precise CRISPR adaptation. Nature. 2020 Mar 5;579(7797):141–5.
63. Koonin EV, Makarova KS, Zhang F. Diversity, classification and evolution of CRISPR-Cas
systems. Current Opinion in Microbiology. 2017 Jun;37:67–78.
64. Le Rhun A, Escalera-Maurer A, Bratovič M, Charpentier E. CRISPR-Cas in Streptococcus
pyogenes. RNA Biology. 2019 Apr 3;16(4):380–9.
65. Lear SK, Shipman SL. Molecular recording: transcriptional data collection into the genome.
Current Opinion in Biotechnology. 2023 Feb;79:102855.
66. Lear SK, Lopez SC, González-Delgado A, Bhattarai-Kline S, Shipman SL. Temporally
resolved transcriptional recording in E. coli DNA using a Retro-Cascorder. Nat Protoc.
2023 Jun;18(6):1866–92.
67. Leenay RT, Maksimchuk KR, Slotkowski RA, Agrawal RN, Gomaa AA, Briner AE, et al.
Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems.
Molecular Cell. 2016 Apr;62(1):137–47.
68. Levy A, Goren MG, Yosef I, Auster O, Manor M, Amitai G, et al. CRISPR adaptation
biases explain preference for acquisition of foreign DNA. Nature. 2015 Apr
23;520(7548):505–10.
69. Li Y, Pan S, Zhang Y, Ren M, Feng M, Peng N, et al. Harnessing Type I and Type III
CRISPR-Cas systems for genome editing. Nucleic Acids Res. 2016 Feb 29;44(4):e34–e34.
70. Liao C, Sharma S, Svensson SL, Kibe A, Weinberg Z, Alkhnbashi OS, et al. Spacer
prioritization in CRISPR-Cas9 immunity is enabled by the leader RNA. Nat Microbiol.
2022 Apr;7(4):530–41.
71. Lillestøl R, Redder P, Garrett RA, Brügger K. A putative viral defence mechanism in
archaeal cells. Archaea. 2006 Jan;2(1):59–72.
72. Lioliou E, Romilly C, Romby P, Fechter P. RNA-mediated regulation in bacteria: from
natural to artificial systems. New Biotechnology. 2010 Jul;27(3):222–35.
73. Liu C, Wang R, Li J, Cheng F, Shu X, Zhao H, et al. Widespread RNA-based cas regulation
monitors crRNA abundance and anti-CRISPR proteins. Cell Host & Microbe. 2023
Sep;31(9):1481-1493.e6.
74. Liu TY, Doudna JA. Chemistry of Class 1 CRISPR-Cas effectors: Binding, editing, and
regulation. Journal of Biological Chemistry. 2020 Oct;295(42):14473–87.
75. Luo ML, Mullis AS, Leenay RT, Beisel CL. Repurposing endogenous type I CRISPR-Cas
systems for programmable gene repression. Nucleic Acids Res. 2015 Jan;43(1):674–81.
76. Maguin P, Varble A, Modell JW, Marraffini LA. Cleavage of viral DNA by restriction
endonucleases stimulates the type II CRISPR-Cas immune response. Molecular Cell. 2022
Mar;82(5):907-919.e7.
77. Makarova KS, Wolf YI, Iranzo J, Shmakov SA, Alkhnbashi OS, Brouns SJJ, et al.
Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants.
Nat Rev Microbiol. 2020 Feb;18(2):67–83.
78. Markulin D. CRISPR-Cas in Escherichia coli: regulation by H-NS, LeuO and temperature.
PDBIAD. 2020 Dec 30;121–122(3–4):155–60.
79. Marraffini LA. CRISPR-Cas immunity in prokaryotes. Nature. 2015 Oct 1;526(7571):55–
61.
80. Martynov A, Severinov K, Ispolatov I. Optimal number of spacers in CRISPR arrays. Wilke
CO, editor. PLoS Comput Biol. 2017 Dec 18;13(12):e1005891.
81. Masel J, Siegal ML. Robustness: mechanisms and consequences. Trends in Genetics. 2009
Sep;25(9):395–403.
82. McGinn J, Marraffini LA. CRISPR-Cas Systems Optimize Their Immune Response by
Specifying the Site of Spacer Integration. Molecular Cell. 2016 Nov;64(3):616–23.
83. McGinn J, Marraffini LA. Molecular mechanisms of CRISPR–Cas spacer acquisition. Nat
Rev Microbiol. 2019 Jan;17(1):7–12.
84. Meeske AJ, Nakandakari-Higa S, Marraffini LA. Cas13-induced cellular dormancy
prevents the rise of CRISPR-resistant bacteriophage. Nature. 2019 Jun 13;570(7760):241–
5.
85. Mitić D, Bolt EL, Ivančić-Baće I. CRISPR-Cas adaptation in Escherichia coli. Bioscience
Reports. 2023 Mar 31;43(3):BSR20221198.
86. Mojica FJM, Díez-Villaseñor C, García-Martínez J, Soria E. Intervening Sequences of
Regularly Spaced Prokaryotic Repeats Derive from Foreign Genetic Elements. J Mol Evol.
2005 Feb;60(2):174–82.
87. Morisaka H, Yoshimi K, Okuzaki Y, Gee P, Kunihiro Y, Sonpho E, et al. CRISPR-Cas3
induces broad and unidirectional genome editing in human cells. Nat Commun. 2019 Dec
6;10(1):5302.
88. Mulepati S, Bailey S. Structural and Biochemical Analysis of Nuclease Domain of
Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated Protein 3
(Cas3). Journal of Biological Chemistry. 2011 Sep;286(36):31896–903.
89. Mulepati S, Bailey S. In Vitro Reconstitution of an Escherichia coli RNA-guided Immune
System Reveals Unidirectional, ATP-dependent Degradation of DNA Target. Journal of
Biological Chemistry. 2013 Aug;288(31):22184–92.
90. Munck C, Sheth RU, Freedberg DE, Wang HH. Recording mobile DNA in the gut
microbiota using an Escherichia coli CRISPR-Cas spacer acquisition platform. Nat
Commun. 2020 Jan 7;11(1):95.
91. Musharova O, Medvedeva S, Klimuk E, Guzman NM, Titova D, Zgoda V, et al. Prespacers
formed during primed adaptation associate with the Cas1-Cas2 adaptation complex and the
Cas3 interference nuclease-helicase. Proc Natl Acad Sci U S A. 2021 Jun
1;118(22):e2021291118.
92. Nair PA, Smith P, Shuman S. Structure of bacterial LigD 3’-phosphoesterase unveils a DNA
repair superfamily. Proc Natl Acad Sci U S A. 2010 Jul 20;107(29):12822
93. Niewoehner O, Jinek M, Doudna JA. Evolution of CRISPR RNA recognition and
processing by Cas6 endonucleases. Nucleic Acids Research. 2014 Jan 1;42(2):1341–53.
94. Nishimasu H, Ran FA, Hsu PD, Konermann S, Shehata SI, Dohmae N, et al. Crystal
Structure of Cas9 in Complex with Guide RNA and Target DNA. Cell. 2014
Feb;156(5):935–49.
95. Nivala J, Shipman SL, Church GM. Spontaneous CRISPR loci generation in vivo by
non-canonical spacer integration. Nat Microbiol. 2018 Jan 29;3(3):310–8.
96. Nuñez JK, Lee ASY, Engelman A, Doudna JA. Integrase-mediated spacer acquisition
during CRISPR–Cas adaptive immunity. Nature. 2015 Mar 12;519(7542):193–8.
97. Papenfort K, Melamed S. Small RNAs, Large Networks: Posttranscriptional Regulons in
Gram-Negative Bacteria. Annu Rev Microbiol. 2023 Sep 15;77(1):23–43.
98. Park Y, Espah Borujeni A, Gorochowski TE, Shin J, Voigt CA. Precision design of stable
genetic circuits carried in highly‐insulated E. coli genomic landing pads. Molecular
Systems Biology. 2020 Aug;16(8):e9584.
99. Patterson AG, Jackson SA, Taylor C, Evans GB, Salmond GPC, Przybilski R, et al. Quorum
Sensing Controls Adaptive Immunity through the Regulation of Multiple CRISPR-Cas
Systems. Molecular Cell. 2016 Dec;64(6):1102–8.
100. Pausch P, Müller-Esparza H, Gleditzsch D, Altegoer F, Randau L, Bange G. Structural
Variation of Type I-F CRISPR RNA Guided DNA Surveillance. Molecular Cell. 2017
Aug;67(4):622-632.e4.
101. Pennisi E. The CRISPR Craze. Science. 2013 Aug 23;341(6148):833–6.
102. Perli SD, Cui CH, Lu TK. Continuous genetic recording with self-targeting CRISPR-Cas in
human cells. Science. 2016 Sep 9;353(6304):aag0511.
103. Pitcher RS, Green AJ, Brzostek A, Korycka-Machala M, Dziadek J, Doherty AJ. NHEJ
protects mycobacteria in stationary phase against the harmful effects of desiccation. DNA
Repair (Amst). 2007 Sep 1;6(9):1271–6.
104. Pitcher RS, Wilson TE, Doherty AJ. New insights into NHEJ repair processes in
prokaryotes. Cell Cycle. 2005 May;4(5):675–8.
105. Pourcel C, Salvignol G, Vergnaud G. CRISPR elements in Yersinia pestis acquire new
repeats by preferential uptake of bacteriophage DNA, and provide additional tools for
evolutionary studies. Microbiology. 2005 Mar 1;151(3):653–63.
106. Pourcel C, Touchon M, Villeriot N, Vernadet JP, Couvin D, Toffano-Nioche C, et al.
CRISPRCasdb a successor of CRISPRdb containing CRISPR arrays and cas genes from
complete genome sequences, and tools to download and query lists of repeats and spacers.
Nucleic Acids Research. 2019 Oct 18;gkz915.
107. Rajnovic D, Muñoz-Berbel X, Mas J. Fast phage detection and quantification: An optical
density-based approach. Dykeman EC, editor. PLoS ONE. 2019 May 9;14(5):e0216292.
108. Richter C, Dy RL, McKenzie RE, Watson BNJ, Taylor C, Chang JT, et al. Priming in the
Type I-F CRISPR-Cas system triggers strand-independent spacer acquisition,
bi-directionally from the primed protospacer. Nucleic Acids Res. 2014 Jul;42(13):8516–26.
109. Rollie C, Schneider S, Brinkmann AS, Bolt EL, White MF. Intrinsic sequence specificity of
the Cas1 integrase directs new spacer acquisition. eLife. 2015 Aug 18;4:e08716.
110. Rollins MF, Chowdhury S, Carter J, Golden SM, Miettinen HM, Santiago-Frangos A, et al.
Structure Reveals a Mechanism of CRISPR-RNA-Guided Nuclease Recruitment and
Anti-CRISPR Viral Mimicry. Molecular Cell. 2019 Apr;74(1):132-142.e5.
111. Roquet N, Soleimany AP, Ferris AC, Aaronson S, Lu TK. Synthetic recombinase-based
state machines in living cells. Science. 2016 Jul 22;353(6297):aad8559.
112. Rusk N. Prokaryotic RNAi. Nat Methods. 2012 Mar;9(3):220–1.
113. Saberi F, Kamali M, Najafi A, Yazdanparast A, Moghaddam MM. Natural antisense RNAs
as mRNA regulatory elements in bacteria: a review on function and applications. Cell Mol
Biol Lett. 2016 Dec;21(1):6.
114. Santiago‐Frangos A, Woodson SA. Hfq chaperone brings speed dating to bacterial sRNA.
WIREs RNA. 2018 Jul;9(4):e1475.
115. Santos-Pereira JM, Aguilera A. R loops: new modulators of genome dynamics and function.
Nat Rev Genet. 2015 Oct;16(10):583–97.
116. Sapranauskas R, Gasiunas G, Fremaux C, Barrangou R, Horvath P, Siksnys V. The
Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli.
Nucleic Acids Research. 2011 Nov;39(21):9275–82.
117. Sashital DG, Jinek M, Doudna JA. An RNA-induced conformational change required for
CRISPR RNA cleavage by the endoribonuclease Cse3. Nat Struct Mol Biol. 2011
Jun;18(6):680–7.
118. Sashital DG, Wiedenheft B, Doudna JA. Mechanism of Foreign DNA Selection in a
Bacterial Adaptive Immune System. Molecular Cell. 2012 Jun;46(5):606–15.
119. Semenova E, Jore MM, Datsenko KA, Semenova A, Westra ER, Wanner B, et al.
Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is
governed by a seed sequence. Proc Natl Acad Sci USA. 2011 Jun 21;108(25):10098–103.
120. Sharan SK, Thomason LC, Kuznetsov SG, Court DL. Recombineering: a homologous
recombination-based method of genetic engineering. Nat Protoc. 2009 Feb;4(2):206–23.
121. Sheth RU, Wang HH. DNA-based memory devices for recording cellular events. Nat Rev
Genet. 2018 Nov;19(11):718–32.
122. Sheth RU, Yim SS, Wu FL, Wang HH. Multiplex recording of cellular events over time on
CRISPR biological tape. Science. 2017 Dec 15;358(6369):1457–61.
123. Shipman SL, Nivala J, Macklis JD, Church GM. Molecular recordings by directed CRISPR
spacer acquisition. Science. 2016 Jul 29;353(6298):aaf1175.
124. Shipman SL, Nivala J, Macklis JD, Church GM. CRISPR–Cas encoding of a digital movie
into the genomes of a population of living bacteria. Nature. 2017 Jul;547(7663):345–9.
125. Shiriaeva AA, Kuznedelov K, Fedorov I, Musharova O, Khvostikov T, Tsoy Y, et al. Host
nucleases generate prespacers for primed adaptation in the E. coli type I-E CRISPR-Cas
system. Sci Adv. 2022 Nov 23;8(47):eabn8650.
126. Shivram H, Cress BF, Knott GJ, Doudna JA. Controlling and enhancing CRISPR systems.
Nat Chem Biol. 2021 Jan;17(1):10–9.
127. Shuman S, Glickman MS. Bacterial DNA repair by non-homologous end joining. Nat Rev
Microbiol. 2007 Nov;5(11):852–61.
128. Skovgaard O, Bak M, Løbner-Olesen A, Tommerup N. Genome-wide detection of
chromosomal rearrangements, indels, and mutations in circular chromosomes by short read
sequencing. Genome Res. 2011 Aug;21(8):1388–93.
129. Smillie CS, Smith MB, Friedman J, Cordero OX, David LA, Alm EJ. Ecology drives a
global network of gene exchange connecting the human microbiome. Nature. 2011
Dec;480(7376):241–4.
130. Staals RHJ, Jackson SA, Biswas A, Brouns SJJ, Brown CM, Fineran PC.
Interference-driven spacer acquisition is dominant over naive and primed adaptation in a native
CRISPR–Cas system. Nat Commun. 2016 Oct 3;7(1):12853.
131. Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by the
CRISPR RNA-guided endonuclease Cas9. Nature. 2014 Mar;507(7490):62–7.
132. Sternberg SH, Richter H, Charpentier E, Qimron U. Adaptation in CRISPR-Cas Systems.
Molecular Cell. 2016 Mar;61(6):797–808.
133. Strotskaya A, Savitskaya E, Metlitskaya A, Morozova N, Datsenko KA, Semenova E, et al.
The action of Escherichia coli CRISPR-Cas system on lytic bacteriophages with different
lifestyles and development strategies. Nucleic Acids Res. 2017 Feb 28;45(4):1946–57.
134. Su T, Liu F, Chang Y, Guo Q, Wang J, Wang Q, et al. The phage T4 DNA ligase mediates
bacterial chromosome DSBs repair as single component non-homologous end joining.
Synth Syst Biotechnol. 2019 Jun;4(2):107–12.
135. Suttle CA. Marine viruses — major players in the global ecosystem. Nat Rev Microbiol.
2007 Oct;5(10):801–12.
136. Swarts DC, Mosterd C, Van Passel MWJ, Brouns SJJ. CRISPR Interference Directs Strand
Specific Spacer Acquisition. Mokrousov I, editor. PLoS ONE. 2012 Apr 27;7(4):e35888.
137. Tang W, Liu DR. Rewritable multi-event analog recording in bacterial and mammalian
cells. Science. 2018 Apr 13;360(6385):eaap8992.
138. Van Erp PBG, Patterson A, Kant R, Berry L, Golden SM, Forsman BL, et al.
Conformational Dynamics of DNA Binding and Cas3 Recruitment by the CRISPR
RNA-Guided Cascade Complex. ACS Chem Biol. 2018 Feb 16;13(2):481–90.
139. van Erp PBG, Jackson RN, Carter J, Golden SM, Bailey S, Wiedenheft B. Mechanism of
CRISPR-RNA guided recognition of DNA targets in Escherichia coli. Nucleic Acids Res.
2015 Sep 30;43(17):8381–91.
140. Venclovas Č. Structure of Csm2 elucidates the relationship between small subunits of
CRISPR‐Cas effector complexes. FEBS Letters. 2016 May;590(10):1521–9.
141. Vercoe RB, Chang JT, Dy RL, Taylor C, Gristwood T, Clulow JS, et al. Cytotoxic
Chromosomal Targeting by CRISPR/Cas Systems Can Reshape Bacterial Genomes and
Expel or Remodel Pathogenicity Islands. Hughes D, editor. PLoS Genet. 2013 Apr
18;9(4):e1003454.
142. Vink JNA, Martens KJA, Vlot M, McKenzie RE, Almendros C, Estrada Bonilla B, et al.
Direct Visualization of Native CRISPR Target Search in Live Bacteria Reveals Cascade
DNA Surveillance Mechanism. Molecular Cell. 2020 Jan;77(1):39-50.e10.
143. Vo PLH, Ronda C, Klompe SE, Chen EE, Acree C, Wang HH, et al. CRISPR RNA-guided
integrases for high-efficiency, multiplexed bacterial genome engineering. Nat Biotechnol.
2021 Apr;39(4):480–9.
144. Wang J, Li J, Zhao H, Sheng G, Wang M, Yin M, et al. Structural and Mechanistic Basis of
PAM-Dependent Spacer Acquisition in CRISPR-Cas Systems. Cell. 2015 Nov;163(4):840–
53.
145. Wang JY, Tuck OT, Skopintsev P, Soczek KM, Li G, Al-Shayeb B, et al. Genome expansion
by a CRISPR trimmer-integrase. Nature. 2023 Jun 22;618(7966):855–61.
146. Wei Y, Chesne MT, Terns RM, Terns MP. Sequences spanning the leader-repeat junction
mediate CRISPR adaptation to phage in Streptococcus thermophilus. Nucleic Acids
Research. 2015 Feb 18;43(3):1749–58.
147. Wei Y, Terns RM, Terns MP. Cas9 function and host genome sampling in Type II-A
CRISPR–Cas adaptation. Genes Dev. 2015 Feb 15;29(4):356–61.
148. Wigley DB. Bacterial DNA repair: recent insights into the mechanism of RecBCD, AddAB
and AdnAB. Nat Rev Microbiol. 2013 Jan;11(1):9–13.
149. Wimmer F, Beisel CL. CRISPR-Cas Systems and the Paradox of Self-Targeting Spacers.
Front Microbiol. 2020 Jan 22;10:3078.
150. Workman RE, Pammi T, Nguyen BTK, Graeff LW, Smith E, Sebald SM, et al. A natural
single-guide RNA repurposes Cas9 to autoregulate CRISPR-Cas expression. Cell. 2021
Feb;184(3):675-688.e19.
151. Wright AV, Doudna JA. Protecting genome integrity during CRISPR immune adaptation.
Nat Struct Mol Biol. 2016 Oct;23(10):876–83.
152. Xue C, Sashital DG. Mechanisms of Type I-E and I-F CRISPR-Cas Systems in
Enterobacteriaceae. Slauch JM, Phillips G, editors. EcoSal Plus. 2019 Dec
31;8(2):10.1128/ecosalplus.ESP-0008–2018.
153. Xue C, Seetharam AS, Musharova O, Severinov K, Brouns SJJ, Severin AJ, et al. CRISPR
interference and priming varies with individual spacer sequences. Nucleic Acids Res. 2015
Dec 15;43(22):10831–47.
154. Xue C, Whitis NR, Sashital DG. Conformational Control of Cascade Interference and
Priming Activities in CRISPR Immunity. Molecular Cell. 2016 Nov;64(4):826–34.
155. Yim SS, McBee RM, Song AM, Huang Y, Sheth RU, Wang HH. Robust direct
digital-to-biological data storage in living cells. Nat Chem Biol. 2021 Mar;17(3):246–53.
156. Yoganand KNR, Sivathanu R, Nimkar S, Anand B. Asymmetric positioning of Cas1–2
complex and Integration Host Factor induced DNA bending guide the unidirectional
homing of protospacer in CRISPR-Cas type I-E system. Nucleic Acids Res. 2017 Jan
9;45(1):367–81.
157. Yosef I, Goren MG, Qimron U. Proteins and DNA elements essential for the CRISPR
adaptation process in Escherichia coli. Nucleic Acids Research. 2012 Jul 1;40(12):5569–
76.
158. Yosef I, Mahata T, Goren MG, Degany OJ, Ben-Shem A, Qimron U. Highly active
CRISPR-adaptation proteins revealed by a robust enrichment technology. Nucleic Acids
Research. 2023 Aug 11;51(14):7552–62.
159. Yoshimi K, Takeshita K, Kodera N, Shibumura S, Yamauchi Y, Omatsu M, et al. Dynamic
mechanisms of CRISPR interference by Escherichia coli CRISPR-Cas3. Nat Commun.
2022 Aug 30;13(1):4917.
160. Zhao H, Sheng G, Wang J, Wang M, Bunkoczi G, Gong W, et al. Crystal structure of the
RNA-guided immune surveillance Cascade complex in Escherichia coli. Nature. 2014
Nov;515(7525):147–50.
161. Zheng X, Li SY, Zhao GP, Wang J. An efficient system for deletion of large DNA fragments
in Escherichia coli via introduction of both Cas9 and the non-homologous end joining
system from Mycobacterium smegmatis. Biochem Biophys Res Commun. 2017 Apr
15;485(4):768–74.
162. Zhu H, Wang LK, Shuman S. Essential constituents of the 3’-phosphoesterase domain of
bacterial DNA ligase D, a nonhomologous end-joining enzyme. J Biol Chem. 2005 Oct
7;280(40):33707–15.
Abstract
Adaptive immunological defense fortifies life coevolving in adversarial environments. CRISPR-Cas is a prokaryotic defense mechanism providing adaptive protection against evasive mobile genetic elements (MGEs) such as bacteriophages and plasmids. CRISPR arrays, preserved in the host genome, are repositories of MGE-derived DNA sequences termed “spacers” that serve as a chronological, immunological record of infections. These arrays are composed of distinct spacer sequences flanked by conserved, palindromic repeats. Cas1 and Cas2 proteins form a complex that captures, processes and array-integrates these infection-derived sequences. This CRISPR adaptation process is also central to emerging biological recording technologies. In E. coli, the Type I-E CRISPR array is expressed and processed into crRNAs, each of which guides a surveillance complex to target protospacer sequences complementary to the spacer. Target site binding triggers recruitment of a helicase-nuclease for processive dsDNA degradation. The primary focus of the research presented in this dissertation is CRISPR adaptation, specifically the dynamics of spacer acquisition in the E. coli Type I-E CRISPR-Cas system.
Array-integrated spacer sequences, captured from infecting mobile genetic elements, provide target specificity for the CRISPR-Cas immune response. The rates at which spacers integrate into native arrays within bacterial populations have not been quantified. Here we measure naïve spacer acquisition rates in E. coli Type I-E CRISPR, identify factors that affect these rates, and model this process, which is fundamental to CRISPR-Cas defense. Prolonged Cas1-Cas2 expression produced fewer new spacers per cell on average than predicted by our model. Subsequent experiments revealed this was due to a mean fitness reduction linked to array-expanded populations. Also, expression of heterologous non-homologous end joining (NHEJ) DNA-repair genes was found to augment spacer acquisition rates, translating to enhanced phage infection defense. Together, these results demonstrate the impact of intracellular factors that modulate spacer acquisition and identify an intrinsic fitness effect associated with array-expanded populations.
We also present research characterizing self-genome deletions catalyzed by the Cascade-Cas3 CRISPR interference machinery of E. coli. Programmed host-DNA targeting produces long-range bidirectional deletions in vivo. Introducing heterologous NHEJ expression increases the survivability of self-targeting and generates significantly smaller, unidirectional deletions.
Asset Metadata
Creator
Peach, Luke (author)
Core Title
Characterizing and developing E. coli Type I-E CRISPR adaptation as a DNA recording and genome engineering tool
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Degree Conferral Date
2024-12
Publication Date
01/14/2025
Defense Date
12/20/2024
Publisher
Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag
CRISPR adaptation, OAI-PMH Harvest, spacer acquisition
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Boedicker, James (committee chair), Cheng, Xianrui (committee member), Finkel, Steve (committee member), Thrash, Cameron (committee member)
Creator Email
lpeach@usc.edu,lukepeach1@gmail.com
Unique identifier
UC11399FAXC
Identifier
etd-PeachLuke-13754.pdf (filename)
Legacy Identifier
etd-PeachLuke-13754
Document Type
Dissertation
Rights
Peach, Luke
Internet Media Type
application/pdf
Type
texts
Source
20250115-usctheses-batch-1235 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu