THE MECHANISM OF R-LOOP FORMATION IN MAMMALIAN
IMMUNOGLOBULIN CLASS SWITCH RECOMBINATION
by
Deepankar Roy
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOCHEMISTRY AND MOLECULAR BIOLOGY)
May 2009
Copyright 2009 Deepankar Roy
DEDICATION
To my friend, Paul Roybal.
ACKNOWLEDGEMENTS
I would like to wholeheartedly thank Dr. Michael Lieber for giving me an
opportunity to learn about scientific research and providing me with guidance and
mentorship on our research work on R-loop formation. Dr. Chih-lin Hsieh has
always helped me with not only technical advice about my work, but with advice
that would make me a better worker. I would also like to thank Dr. Michael
Stallcup, Dr. Ebrahim Zandi and Dr. Ian Haworth for their very useful suggestions
and guidance during the course of my research studies at the Lieber lab.
I would also like to thank my present and past labmates for their advice and
support. Dr. Kefei Yu, Dr. Sathees Raghavan, Dr. Ryan Irvine, Dr. Yunmei Ma,
Dr. Noriko Shimazaki, Dr. Shigeru Sasaki, Dr. Xiaoping Cui, Dr. Haihui Lu, Dr.
Feng-Ting Huang, Dr. Albert Tsai, Jiafeng Gu, Go Watanabe, Zheng Zhang and
Zhengfei Lu have all helped me not only with my experiments, but also by being
my very good friends. I am grateful to my friend Paul Roybal for all his help.
Nothing would have been possible without the unconditional love, wishes
and patience of my parents, my little brother, and Pavinder; and I am humbled
not only for their unfaltering support, but also for their generosity of spirit.
This work is a result of all the support and guidance I received from my
mentor, Dr. Lieber, my committee members, my teachers, my parents, all of my
friends, my brother and Pavinder. I would once again like to thank everybody
and dedicate this work to all of you.
Deepankar Roy
TABLE OF CONTENTS
Dedication ii
Acknowledgements iii
List of Tables v
List of Figures vi
Abstract ix
Chapter 1 Understanding R-loop Formation: Basic Biochemical
Properties and Mechanistic Principles. 1
Chapter 1 References 16
Chapter 2 Mechanism of R-loop formation and Immunoglobulin
Class Switch Sequences. 21
Chapter 2 References 61
Chapter 3 G-Clustering is Important for the Initiation of
Transcription-Induced R-loops In vitro Whereas
High G-Density Without Clustering is Sufficient
Thereafter 66
Chapter 3 References 109
Chapter 4 Physical and Mechanical Aspects of the DNA and
RNA that Influence R-loop Formation 114
Chapter 4 References 141
Chapter 5 Concluding Remarks 142
Chapter 5 References 149
Bibliography 153
Appendices
Appendix A: Supplementary Information for Chapter 2 163
Appendix B: Supplementary Information for Chapter 3 173
Appendix C: Supplementary Information for Chapter 4 180
LIST OF TABLES
Table 2.1 Frequency of R-loop formation at switch substrates
containing various repeat lengths of Sγ3 as determined
by the colony lift hybridization assay and sequence
analysis. 41
Table 2.2 Frequency of R-loop formation at four repeats of wild
type or sequence-modified Sγ3 as determined by the
colony lift hybridization assay and sequence analysis. 47
Table 3.1 R-loop formation frequencies in different substrates
calculated by colony lift hybridization assay. 90
Table B.1 Prediction of expected change in local Gibbs free
energy (∆G) upon DNA:DNA duplex formation or
RNA:DNA hybrid formation for motifs A, C or B
containing 2x4G-clusters, 1x4G-clusters or no
G-clusters, respectively. 176
Table B.2 Prediction of expected change in Gibbs free
energy (∆G) upon DNA:DNA duplex formation or
RNA:DNA hybrid formation for pDR18A, pDR18C
and pDR18B over the length of RIZ+REZ. 177
Table B.3 Prediction of expected change in Gibbs free
energy (∆G) upon DNA:DNA duplex formation or
RNA:DNA hybrid formation for pDR22A, pDR22C
and pDR22B over the length of RIZ+REZ. 178
Table B.4 Prediction of expected change in Gibbs free
energy (∆G) upon DNA:DNA duplex formation or
RNA:DNA hybrid formation for pDR54A, pDR54C
and pDR54B over the length of RIZ+REZ. 179
LIST OF FIGURES
Figure 2.1 RNase T1 interferes with R-loop formation on a
linearized switch substrate containing 4 repeats
of murine Sγ3. 33
Figure 2.2 R-loop formation at the murine Sγ3 is not reliant
on G-quartet formation. 36
Figure 2.3 In vitro transcription of linear substrates bearing
0, 1, 2, 3 and 4 Sγ3 repeats. 38
Figure 2.4 Location of R-loops on transcribed substrates
bearing one to four Sγ3 repeats. 42
Figure 2.5 Location of R-loops on transcribed substrates
for which the G-density of the switch repeats
has been reduced. 48
Figure 2.6 Detection of transcription-induced R-loops on
a substrate with high G-density but no G-clustering. 54
Figure 3.1 Locations of G in the substrates used for
studying the effects of G-clusters in the
R-loop initiation zone (RIZ). 76
Figure 3.2 Effect of G-clusters in the RIZ on R-loop formation. 79
Figure 3.3 Effect of reducing the REZ G-clusters from GGGG
to GGG on R-loop formation. 85
Figure 3.4 Effect of reducing the REZ G-clusters from GGGG
to GG on R-loop formation. 91
Figure 3.5 Effect of reducing the REZ G-clusters from GGGG
to G while maintaining a high overall REZ G-density. 94
Figure 3.6 Role of G-clusters versus high G-density in the
R-loop initiation zone (RIZ) in R-loop formation
efficiency. 97
Figure 3.7 Model of R-loop initiation by nucleation at G-clusters. 103
Figure 4.1 Effect of distance and the non-hybridizing portion
of RNA to R-loop formation. 121
Figure 4.2 Presence of RIZ G-clusters at a distance
upstream of the switch repeat REZ enhances
R-loop formation efficiency. 124
Figure 4.3 The location and extent of R-loop structures
in linearized and negatively supercoiled
versions of T7 transcribed pDR72A. 128
Figure 4.4 Presence of a nick on the nontemplate strand
downstream of the promoter increases R-loop
formation efficiency at the downstream switch
repeats. 133
Figure 4.5 Presence of a nick on the nontemplate strand
downstream of the promoter increases R-loop
formation efficiency downstream of the nicked
position even in absence of switch repeats. 134
Figure 4.6 Presence of a nick on the nontemplate strand
downstream of the promoter increases R-loop
formation efficiency at the downstream switch
repeats separated from the promoter better than
when the nick is present far downstream of the
promoter and immediately upstream of the switch
repeats. 138
Figure A.1 Two models for the formation of long R-loops. 169
Figure A.2 Schematic of RNase T1 interference with R-loop
formation. 170
Figure A.3 Decrease in R-loop formation efficiency as a
function of the number of switch repeats. 171
Figure A.4 Decrease in R-loop formation efficiency with the
decrease in G-clustering within switch repeats. 172
Figure C.1 Model of effect of increasing distance between
promoter and switch repeat DNA. 180
Figure C.2 Model of R-loop formation at REZ switch repeats in
presence of G-clustered RIZs located at a distance
upstream of the REZ. 181
Figure C.3 Model of R-loop formation at a negatively supercoiled
substrate with the RIZ and REZ separated by a
non-G-rich sequence. 182
Figure C.4 Model of R-loop formation at substrates containing a
nontemplate strand nick upstream or downstream of the
promoter. 183
Figure C.5 Model of R-loop formation with promoter-downstream
nicks present at different locations. 184
ABSTRACT
R-loops form at the immunoglobulin (Ig) heavy chain locus in germinal center
B-cells undergoing class switch recombination (CSR). These are triple-stranded
structures formed upon transcription of the G-rich switch sequences, where the
G-rich transcript hybridizes with the C-rich DNA template strand, leaving the
nontemplate DNA strand in a single-stranded conformation. These structures aid
in targeting the CSR-initiating enzyme, activation-induced cytidine deaminase,
to the participating switch regions to initiate CSR. CSR involves exchanging the
IgM heavy chain domain gene (Cµ) with another heavy chain (CH) gene located
downstream of Cµ to generate IgG, IgA or IgE antibodies. This occurs by a
recombination event between the long switch regions located upstream of the CH
genes in the Ig locus. To understand the mechanism of R-loop formation and to
study the biochemical and structural properties of R-loops, we have performed
experiments with a minimal R-loop forming sequence as our model.
We find that R-loops form by a threadback mechanism in which the
single-stranded RNA exiting the RNA polymerase invades the upstream edge of
the transcription bubble to establish a hybrid with the template strand DNA. We
observe that R-loop formation efficiency decreases with a decreasing number of
G-rich switch repeats and G-clustering is the primary sequence feature that helps
in R-loop formation. A sequence with a high G-density but no clustering cannot
efficiently form R-loops. We find that R-loop formation on the template strand
DNA is not dependent upon G-quartet formation on the nontemplate strand DNA.
R-loops can initiate at short G-clustered regions by formation of short,
thermodynamically stable RNA:DNA hybrids [R-loop initiation zones (RIZ)].
Immediately downstream of the RIZ, R-loop elongation zones (REZ) are also
G-rich, and they can harbor R-loop structures by providing a longer G-rich region.
Whereas a G-clustered RIZ can initiate R-loop formation efficiently, G-rich but
unclustered regions cannot effectively serve as RIZs. REZs, on the other hand,
have to be G-rich but not necessarily G-clustered. Our experiments also indicate
that RIZ sequences can help in R-loop formation at distant REZ regions, and
DNA conformation is an important determinant of R-loop formation.
CHAPTER 1
UNDERSTANDING R-LOOP FORMATION: BASIC BIOCHEMICAL
PROPERTIES AND MECHANISTIC PRINCIPLES.
Introduction
In the immune system, three covalent somatic DNA modification events
are responsible for production of functional mature antibodies. These processes
are V(D)J recombination, somatic hypermutation (SHM) and class switch
recombination (CSR). V(D)J gene recombination occurs in genomes of the bone
marrow pre-B cells and makes prospective antigen binding domain exons before
antigen exposure. V(D)J recombination also occurs in pre-T cells in the thymus
to make T-cell receptors (TCR). On the other hand, SHM and CSR occur in the
mature B-cells present in the germinal centers (GC) of secondary lymphoid
organs like the spleen and lymph nodes. Upon antigen exposure, SHM
introduces point mutations in the rearranged V(D)J locus at a rate about a
thousand-fold higher than the background rate of point mutagenesis. This process
helps in increasing the variability and repertoire of antigen binding pockets by
increasing the probability of creating a better fit of the pocket of the
immunoglobulin (Ig) molecule to the antigen. After antigen exposure, different
antibody isotypes are produced with the same antigen binding pocket to achieve
various effector functions to neutralize the antigen, and these effector functions
are specified by the different Ig heavy chain isotypes. CSR is the process
responsible for the selection of the isotype of the antibody to be generated and
involves intrachromosomal deletional recombination events. After CSR, the GC
B-cells switch from a default IgM/IgD production mode to production of IgG, IgA
or IgE (5, 47).
Ig Heavy Chain Locus and Class Switch Recombination
The murine Ig heavy chain locus is composed of eight functional heavy chain
constant region exon sets arrayed on chromosome 12. A similarly organized
human Ig heavy chain locus on chromosome 14q3.2 comprises nine functional
heavy chain constant region exon sets. Each set of constant region exons
(except Cδ) has a long (2-10kb) G-rich repetitive region called a switch region.
Each set also has an "I exon" 5’ of the switch region and a specific cytokine
responsive promoter 5’ of the I exon. Transcription is necessary for CSR (22, 44,
50), as is the presence of the switch regions (23, 26, 41, 48). Sµ, the most
upstream switch region, is the donor switch region and, in CSR, it recombines
with one of the downstream acceptor switch regions. This process is followed for
generation of all isotypes except for IgD which is produced by alternative splicing.
The IgHµ promoter constantly fires and generates IgM and/or IgD type
antibodies. Upon antigen exposure and specific cytokine stimulation, additional
cytokine responsive promoters in front of the switch regions get activated and
transcribe through the G-rich switch region, thereby generating G-rich transcripts.
The downstream transcribing switch region DNA then recombines with the
transcribing Sµ region DNA in a way that is yet to be clearly defined. The
intervening sequence between the two recombining switch regions is looped out
and deleted and the joining of the recombining switch regions is accomplished by
the NHEJ double-strand DNA break repair pathway (15, 27, 36).
SHM and CSR are Dependent on the Action of Activation-Induced Cytidine
Deaminase (AID).
While SHM and CSR affect separate regions of the Ig locus, there are
multiple similarities in these processes. Both SHM and CSR occur in parallel,
in space and time, in the Ig locus of the maturing antigen-exposed GC B-cells,
and both depend on activation-induced cytidine deaminase (AID) (4, 6, 8, 27,
30, 33, 34, 37). First discovered by the Honjo laboratory in 1999 by cDNA
subtraction (31), AID is expressed in GC B-cells and is known biochemically to
deaminate deoxycytidines on single-stranded DNA substrates to uracils (3, 34,
46), creating a U:G mismatch. These U:G mismatches can be repaired by BER
(base-excision repair) or MMR (mismatch repair) and can create a mutation at
the initial mismatch location. The uracils in the DNA can be removed by uracil
DNA glycosylase (UDG) (7, 20, 35), a BER enzyme, creating an abasic site. The
phosphodiester backbone of
the affected DNA strand can be cleaved at the 5’ end of the abasic site by the
apyrimidinic/apurinic endonuclease 1 (APE1) (14). These DNA single-stranded
breaks (SSB) can further act as the substrate for the MMR/BER pathway of DNA
repair. DNA double strand breaks (DSB) can be created at these SSB sites by
action of Msh2/Msh6 proteins of the MMR pathway. DNA DSBs can also be
created if there is another AID/UDG/APE1-induced nick on the complementary
DNA strand close to the position of the initial nick. While SHM mainly entails
introduction of point mutations by this AID induced mechanism, CSR involves the
generation of switch region DNA DSBs for recombination.
Targeting of AID to Switch Regions: Role of Transcription-Induced R-loop
Structures.
The main CSR initiating enzyme, AID, has deoxycytidine deamination
activity only in the single-stranded regions of a substrate DNA (3, 34, 46). While
the activity of AID in SHM regions could generally be attributed to
transient DNA strand separation by breathing or during processes like
transcription, the targeting of the CSR machinery specifically to the switch
regions in front of the constant region set of exons indicates enhanced targeting
of these regions. Transcription is required for CSR and removal of any of the
promoters (such as the I exon promoters) abolishes class switching to the set of
exons under control of that promoter (22, 50). The switch regions also have
unique sequence properties. They are extremely long (1-12kb), contain
repetitive units (10-80 nt in length depending on the type of switch region), and
are asymmetrically G-rich in the nontemplate DNA strand with many of the G
nucleotides organized in G-clusters. Our laboratory first characterized R-loop
structures at the switch regions in 2003 (45). These R-loop structures are triple-
stranded non-B-DNA structures formed when transcription through the switch
regions generates a G-rich RNA that hybridizes with the template DNA strand
(forming an RNA:DNA hybrid) for a certain length, leaving the corresponding
length of the nontemplate strand in a single-stranded configuration. Such R-loop
structures were seen at switch regions both in vivo in the splenic B-cells induced
for CSR (16, 17, 45) as well as in vitro when smaller portions of switch region
DNA were transcribed (39, 40, 45). Such structures can serve as targets for the
CSR machinery where the single-strandedness on the nontemplate strand could
be subjected to AID action (17). Similar action on the template strand could occur at the
edges of the R-loops where there is increased breathing of the RNA:DNA hybrid,
or at places where the RNA in RNA:DNA hybrid is partially digested by cellular
RNase H (RNase H can only degrade RNA in an RNA:DNA hybrid context)
exposing small regions of single strandedness, or by misalignment of the
template and nontemplate strands because of the highly repetitive nature of the
switch repeat regions. Thus, transient single-strandedness on the nontemplate
and the template strand at R-loop structures in the transcribed switch region DNA
can be exploited by the AID to induce deaminations leading to SSBs and DSBs.
The recombining switch regions synapse (43) and the DSBs in the Sµ and a
downstream switch region are used as points of recombination and joining by the
error prone NHEJ DNA repair pathway. The region-specific targeting of the
switch regions gives rise to a regionally-specified recombination event where the
cleavage can be anywhere in the R-loop structure (9, 13, 29), and this is different
from the site-specific cleavage at the recombination signal sequences (RSS) by
the RAG complex proteins to initiate V(D)J recombination (25).
While the G-rich switch regions targeted by the AID enzyme are known to
be mammalian features, AID itself arose much earlier in evolution and is
present in sharks, bony fish and amphibians (1, 19, 42).
The conservation of AID points to its importance in evolution of an efficient
immune system with increased functional diversity and sophistication. It has
been shown that zebrafish AID can function in mammalian CSR (1). In Xenopus,
immunoglobulin diversity is far lower than in mammalian systems, and the
switch regions are predominantly A- and T-rich (32). The amphibian switch
regions probably function more like SHM-like substrates with point mutations or
SSBs occurring at a higher frequency as compared with surrounding non-switch
regions (48), but still lower than the much more efficient mammalian switch
regions where R-loop structures can serve as better targets for AID to act on.
Although replacing a 12 kb mammalian Sγ1 with a 4 kb Sµ from Xenopus can
functionally restore CSR to the IgG1 locus to a lesser extent (49), it is possible that
the rate of switching is slower for the Xenopus fragment, and this might not be
apparent at the 60 to 72 hr time points used in those studies. It is known from
both in vivo and in vitro data that the Xenopus Sµ cannot form R-loop structures
((49), and Huang, F.T. and M.R. Lieber (unpublished)). Unlike mammals,
Xenopus only weakly responds to hyperimmunization (11), and this difference
from the mammalian systems might be partly due to the lack of the asymmetric
G-richness seen in mammalian switch repeats. The increased functional diversity
of the immune system and the G-clustered switch repeats in the mammalian
immunoglobulin heavy chain constant region locus suggest that G-rich
repetitive switch regions evolved from the non-G-rich switch regions of lower
classes for two reasons: to support more efficient targeting by AID through
R-loop formation, and to better control these regions by limiting R-loop
formation to the transcribed G-rich switch regions, thereby lessening the
chances of promiscuous AID activity.
R-loop Formation in Other Experimental Systems
R-loops are known to form at bacterial colE1 origins and in mitochondrial
genomes during replication (28). R-loops are also thought to form in GC-rich
regions of the Saccharomyces cerevisiae genome, where they act as mutational
hotspots and are associated with a phenomenon of increased mutational
frequency called transcription-associated recombination (TAR) at sites of
mitotic recombination. TAR was reduced upon expression of RNase H, which
confirmed the RNA:DNA hybrid nature of the putative genomic structures (18).
These RNA:DNA hybrids occur in yeast knockout strains that lack specific
factors (Tho/TREX complex, hpr protein) that form mRNP (mRNA
ribonucleoprotein particle) complexes, which are tasked with binding the mRNA
during transcription and exporting it across the nuclear membrane. In the
absence of such factors, therefore, the RNA has an increased chance of
hybridizing with the template, thus initiating TAR (12, 21). Similar
observations have been made in chicken DT40 and HeLa cell lines depleted for
ASF/SF2 proteins belonging to the family of mRNA splicing factors (24). These
factors associated with the transcriptional machinery can bind the transcribing
mRNA and keep it away from the DNA strands upstream of the RNA polymerase,
thus allowing them to reanneal and form duplex DNA. Absence of these factors
causes gross genomic instability associated with transcription, and R-loop
structures were detected at GC-rich regions in the genome. This genomic
instability could be suppressed when RNase H was overexpressed (24).
Structural and Biochemical Properties of R-loop Structures.
R-loop structures are known to form in vivo (16, 17, 45). Earlier in vivo work
from our laboratory revealed their general properties: they can form at G-rich
mammalian switch repeats upon transcription, they can be extremely long (over
several kb), and they can have heterogeneous start and termination sites within
the G-rich regions. These structures could persist after phenol:chloroform
extraction of DNA and short-term exposure to temperatures of up to 65-70 °C.
R-looping was shown to be resistant to RNase A treatment, but sensitive to
RNase H treatment. Long stretches of single-strandedness were detected on the
nontemplate strand, while the template strand, owing to being hybridized with
the RNA, showed only background levels of C-to-T conversion upon sodium
bisulfite treatment. After RNase H treatment, these increased to short
single-stranded stretches, indicating that there might be misalignment of
template and nontemplate strand DNA (45).
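The bisulfite footprint described in this paragraph, runs of C-to-T conversion on a strand that was single-stranded, can be illustrated with a short sketch. This is not the analysis procedure used in the cited work: the function name `converted_stretches`, the `min_cs` threshold and the toy sequences are assumptions made for illustration, and the read is assumed to be already aligned to the reference with no insertions or deletions.

```python
def converted_stretches(reference, read, min_cs=3):
    """Find runs of consecutive reference Cs that read as T.

    Returns a list of (first_index, last_index, n_converted) tuples for
    every run containing at least `min_cs` converted Cs. Positions that
    are not C in the reference are ignored when deciding whether a run
    continues. The threshold is an illustrative assumption.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")

    stretches = []
    start = last = None
    count = 0
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue  # only Cs are informative for bisulfite conversion
        if read_base == "T":
            if start is None:
                start = i
            last = i
            count += 1
        else:
            # An unconverted C ends the current run.
            if count >= min_cs:
                stretches.append((start, last, count))
            start, last, count = None, None, 0
    if count >= min_cs:
        stretches.append((start, last, count))
    return stretches


# Hypothetical 12-nt reference and a read in which four consecutive Cs
# (indices 1, 2, 4 and 6) were converted, the C at index 7 was protected,
# and the C at index 10 was converted in isolation.
reference = "ACCTCGCCATCA"
read = "ATTTTGTCATTA"
print(converted_stretches(reference, read))  # [(1, 6, 4)]
```

A long run on the nontemplate strand with only isolated conversions on the template strand would correspond to the pattern reported here; the threshold for calling a run would need to be set from the background conversion rate.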
Most of these features were seen in vitro also, and we wanted to further
our understanding of the basic requirements for R-loop formation, and the role of
RNA, DNA sequence and other structural characteristics that can influence R-
loop formation. Therefore, we decided to establish a minimal system to
recapitulate R-loop formation in vitro, so that we could perform experiments with
purified components and understand the fundamental R-loop formation
requirements. Without the influence or bias of in vivo factors, this reductionist
approach would help us in deconstructing a complex phenomenon into parts with
defined principles and formulating the basic rules for R-loop formation. To
understand the nature of R-loop structures, we chose the mouse Sγ3 switch
repeats to model R-loop formation in vitro. The genomic Sγ3 locus is ~2 kb long
and comprises 41 G-rich repeat units, each ~49 bp long (9, 13). Each
repeat contains regions of G-clusters. We cloned the switch repeats under a T7
promoter so that in vitro transcription generates a G-rich transcript that can
hybridize with the template strand and form an RNA:DNA hybrid region. The
methods and experiments have been described in more detail in the following
chapters.
Chapter 2 describes our experiments to understand the R-loop formation
mechanism and some sequence-dependent aspects of R-loop formation. We
wanted to know the mechanism of R-loop formation and found that R-loop
formation was inhibited when RNase T1, an enzyme that digests single-stranded
RNA, was present during transcription. This provided us with proof that R-loops
are formed by an RNA-threadback mechanism in which the RNA comes out as a
single strand and then invades the upstream region of the transcription bubble
behind the RNA polymerase. The single-strandedness of the exiting RNA prior
to hybridization is the key mechanistic difference from another
model of R-loop formation: extended-hybrid formation. In the extended-hybrid model, the
RNA:DNA hybrid in the R-loop region forms as a direct extension of the initial ~9
bp RNA:DNA hybrid formed inside the elongating RNA polymerase complex.
However, the sensitivity of the exiting RNA to RNase T1 proves that the RNA
has to be single-stranded for a short period of time while coming out of the
polymerase, and that this window of opportunity is exploited by RNase T1 to
suppress R-loop formation. These are the first experimental data to determine the
generation mechanism of co-transcriptionally formed R-loops.
The mammalian switch regions are extremely G-rich, almost always above
45% G on the nontemplate strand, as compared with the rest of the
genome, where the average G-density is around 21%. The amphibian switch
regions also lack G-richness and do not form R-loops. This points to the
importance of G-richness in formation and stabilization of R-loop structures. Not
only that, the lengths of the mammalian switch regions are extensive and are
made up of repeating units, suggesting a natural selection based amplification of
the repeats within the switch regions. The organization of the G-content is also
an interesting point of research as most of the G-richness is accommodated in G-
clusters within the switch regions and not dispersed. We did a number of
experiments to understand the role of G-richness, the G-clustered organization
and length effects of switch repeats towards R-loop formation.
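The two sequence properties discussed in this paragraph, overall G-density and the organization of Gs into clusters, can be measured separately. A minimal sketch follows; the function names, the `min_len` threshold and both 20-nt example sequences are hypothetical and are not substrates from this work.

```python
import re


def g_density(seq: str) -> float:
    """Fraction of bases in `seq` that are G (case-insensitive)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def g_clusters(seq: str, min_len: int = 3) -> list[tuple[int, int]]:
    """Return (start, length) for each run of at least `min_len` consecutive Gs."""
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(r"G+", seq.upper())
        if len(m.group()) >= min_len
    ]


# Two hypothetical nontemplate-strand sequences with identical G-density:
# one with its Gs in two clusters of five, one with no adjacent Gs at all.
clustered = "GGGGGACTCAGGGGGTCACA"
dispersed = "GAGCGTGAGCGTGAGTGCGA"

print(g_density(clustered), g_clusters(clustered))  # 0.5 [(0, 5), (10, 5)]
print(g_density(dispersed), g_clusters(dispersed))  # 0.5 []
```

The example makes the distinction concrete: both sequences are 50% G, yet only the first contains any G-cluster.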
One mechanism related question was whether R-loop formation was
dependent on G-quartet/G-quadruplex formation at the G-clustered regions. G-
quartets are thought to occur at G-clustered regions in telomeric repeats and
other genomic regions where similar G-clustered regions are found. Since the
mammalian switch repeats have a G-clustered organization, some reports in the
literature suggest the formation of G-quartets (10) on the nontemplate strand that
can render the template strand DNA free to hybridize with the transcript. To test
the possibility that R-loop formation is dependent upon G-quadruplex formation,
we transcribed our Sγ3 repeat containing substrate in presence of various cations
like Li⁺, Na⁺, K⁺ and Cs⁺. Owing to their ionic diameters, Li⁺ and Cs⁺ are
G-quartet destabilizing cations because Li⁺ is too small and Cs⁺ is too large to stay
in the cavity in the middle of a quartet or between quartet planes in a
G-quadruplex. Na⁺ and K⁺ can fit because of their favorable size and establish
coordinate bonds thereby stabilizing the G-quartets. We found that R-loops
could form irrespective of the cations present, thus emphasizing that R-loop
formation is not dependent upon G-quartet formation in the nontemplate strand.
The switch regions are long, and this helps in establishing a wider region
for R-loop formation and better targeting by the CSR machinery, which does not act
in a site- or sequence-specific manner. Owing to the stochastic nature of R-loop
formation, a larger G-rich switch region increases the probability of R-loop
formation at the switch region. However, the switch repeats themselves are
relatively smaller in size, indicating a relatively recent evolution-driven expansion
of these repeats to make larger switch regions. To test the minimal length of
switch regions required for R-loop formation, we cloned 4, 3, 2 or 1 mouse Sγ3
repeats under T7 promoter and observed that while all substrates could support
R-loop formation, the frequencies were the highest for the 4 repeat substrate,
followed by the 3, 2 and 1 repeat substrate, which was the least efficient. The R-
loop regions were contained within the G-rich switch repeats. Unlike the gradual
drop of G-density in the genomic loci, the G-density drop between the switch
repeats (~50%) and the flanking non-switch regions (~21%) was sharp, and this
causes the termination of the R-loops observed within the G-rich switch regions.
The G-clusters are the most distinctive features of mammalian switch
repeats. To test their importance, we made several substrates in which we
varied the number of Gs in the G-clusters on the nontemplate strand, reducing
them from the average size of four or five Gs to cluster sizes of three or two, or to an
unclustered arrangement. The overall GC content was kept constant between
the substrates; and upon transcription, we found that the wild type G-clustered
substrate was the most efficient, but the reduction of the G-cluster size drastically
reduced the R-loop formation efficiency. This shows that even minor changes in
G-clustering can affect R-loop formation. This also indicates that G-clustering in
switch regions has evolved to a point where more efficient R-loop formation could
be achieved. While we kept the GC content the same in the substrates, we
gradually reduced the G-density of the nontemplate strand. To understand if the
reduction in R-loop formation was due to a change in G-clustering or change in
G-density, another substrate was created with the same nontemplate strand G-
density as the four wild-type repeat containing substrate but with all the G-
clusters disrupted so that no two Gs were adjacent to each other. This substrate
was inefficient in R-loop formation, thus proving that while a highly G-rich region
can support R-loop formation, a G-clustered organization is extremely important
and the primary determinant for efficient R-loop formation.
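A sequence with the same base composition but no two adjacent Gs can be derived from a clustered sequence by interleaving the Gs with the remaining bases. The sketch below shows one way to do this and is not the design procedure used for the substrates in this work; `max_g_run`, `decluster_g` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import groupby


def max_g_run(seq: str) -> int:
    """Length of the longest run of consecutive Gs in `seq` (0 if none)."""
    return max(
        (len(list(run)) for base, run in groupby(seq.upper()) if base == "G"),
        default=0,
    )


def decluster_g(seq: str) -> str:
    """Rearrange `seq` so that no two Gs are adjacent.

    Base composition (and therefore G-density) is preserved. Non-G bases
    keep their original relative order. Raises ValueError when there are
    too many Gs to separate (more than the number of non-G bases plus one).
    """
    seq = seq.upper()
    others = [base for base in seq if base != "G"]
    n_g = len(seq) - len(others)
    if n_g > len(others) + 1:
        raise ValueError("too many Gs to place without adjacency")

    out = []
    spacers = iter(others)
    for _ in range(n_g):
        out.append("G")
        spacer = next(spacers, None)  # may run out only after the last G
        if spacer is not None:
            out.append(spacer)
    out.extend(spacers)  # any non-G bases left over go at the end
    return "".join(out)


clustered = "GGGGGACTCAGGGGGTCACA"  # hypothetical, 50% G in two clusters
dispersed = decluster_g(clustered)

print(max_g_run(clustered))  # 5
print(max_g_run(dispersed))  # 1
print(Counter(dispersed) == Counter(clustered))  # True
```

Because composition is held fixed, any difference in behavior between two such sequences can be attributed to G-clustering rather than to G-density, which is the logic of the comparison described in this paragraph.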
Chapter 3 describes our experiments and observations to further dissect
G-clustering and G-density in R-loop formation. We determined from our
experiments that R-loop formation has an R-loop initiation phase and an R-loop
elongation or stabilization phase. Addition of G-clusters towards the beginning of
the RNA transcript enhances R-loop formation several-fold as compared to a
non-G-rich sequence. This is achieved because the high mobility of the RNA
end increases the chances of effective collisions between the RNA and the
template strand DNA. If G-clusters are present at the points of collision, they
have a higher chance of nucleating short RNA:DNA hybrids that are
thermodynamically much more stable than random-sequence RNA:DNA hybrids.
This discourages the dissociation of the
G-rich RNA from its RNA:DNA hybrid form, thereby increasing the chances of
establishing an R-loop structure around it if favorable G-rich sequences are
available. Therefore, G-clusters can act as R-loop initiation zones or RIZs. We
found that the presence of even one cluster of four Gs supported R-loop formation
better than a random sequence. While G-clustered regions are very efficient in R-loop
formation, G-rich sequences with comparable G-densities but an unclustered G
organization cannot efficiently improve R-loop initiation and thus cannot act as
RIZs. Once initiated at the RIZs, R-loops can elongate and get stabilized at R-
loop elongation zones or REZs that are required to be G-rich but do not
necessarily have to be G-clustered. This is an important point of distinction and
this study attempts to understand these mechanistic differences during the
process of R-loop formation.
Experiments described in chapter 4 were aimed towards understanding
the physical and mechanistic roles of RNA and DNA and their behavior around
each other behind the transcribing RNA polymerase during R-loop formation.
We tested the effect of distance between the transcription start site (TSS) and
the R-loop forming region and found that while increasing distances suppress
R-loop formation, removal of the non-hybridizing portion of the RNA by
RNase A increases R-loop formation, both by increasing the chances of functional
collisions between the G-rich RNA and the template strand DNA and by
removing the mass of non-hybridizing RNA that can peel off the short
initiating RNA:DNA hybrids. Next, we moved the G-clustered RIZ motifs
and placed them upstream of the switch regions, separated by random
sequence. We observed that the G-clusters are able to initiate R-loop formation
at switch regions indicating that G-clustered RIZ can act at a distance from the
REZ. Surprisingly, in linearized substrates, while the RIZ sequences can help in
R-loop formation, we found the R-loops still formed only in the REZ switch
regions indicating the transient nature of the nucleating RNA:DNA hybrid formed
at the G-clustered RIZ regions. While doing the same experiment with a
supercoiled version of DNA, we found significant changes in the position of R-
loops in that a significant number of them extended on both sides of the switch
regions into non-G-rich flanking sequences, indicating that DNA negative
supercoiling can enhance R-loop formation. DNA nicks also promoted
R-loop formation when they were located downstream of the promoter, showing that they
can potentially act as regions of R-loop initiation (RIZs) during transcription by
reducing the chances of reannealing of the two transiently separated DNA
strands, thereby giving the transcript a relative advantage to hybridize with the
DNA template strand.
These results indicate that R-loop formation depends mechanistically on
both DNA and RNA factors. The reductionist approach followed
here has broken R-loop formation down into smaller questions that have
uncovered very basic but important aspects of the process. These
principles would be useful not only to understand the role of R-loops in CSR but
can also be extended to any sequence that can potentially support R-loop
formation. Such regions can undergo DNA breakage and
recombination/translocation events in essentially the same way as in CSR,
through the participation of the same factors that carry out CSR. It is known that in
the t(8;14) translocation event, c-myc on chromosome 8 translocates to the IgH
locus in all sporadic Burkitt’s lymphoma cases and AID is required for this
translocation (38). This indicates a mistake in the CSR process resulting in this
cancerous translocation. Over 90% of multiple myelomas also show
translocation into the switch regions indicating involvement of CSR machinery
mistakenly acting on other genomic regions, most likely because they might
aberrantly recognize R-loop-like structures formed at them (2). Our studies will
help identify and understand such translocation-prone regions and events, in
addition to advancing our understanding of R-loop formation in CSR.
Chapter 1 References
1. Barreto, V. M., Q. Pan-Hammarstrom, Y. Zhao, L. Hammarstrom, Z.
Misulovin, and M. C. Nussenzweig. 2005. AID from bony fish catalyzes class
switch recombination. J. Exp. Med. 202:733-8.
2. Bergsagel P.L., M. Chesi, E. Nardini, L. Brents, S. Kirby and W.M. Kuehl.
1996. Promiscuous translocations into immunoglobulin heavy chain switch
regions in multiple myeloma. Proc. Natl. Acad. Sci. U S A. 93:13931-13936.
3. Bransteitter, R., P. Pham, M. D. Scharff, and M. F. Goodman. 2003.
Activation-induced cytidine deaminase deaminates deoxycytidine on
single-stranded DNA but requires the action of RNase. Proc. Natl. Acad. Sci. U S A.
100:4102-4107.
4. Casellas R., A. Nussenzweig, R. Wuerffel, R. Pelanda, A. Reichlin, H.
Suh, X.F. Kin, E. Besmer, A. Kenter, K. Rajewsky, and M.C. Nussenzweig. 1998.
Ku80 is required for immunoglobulin isotype switching. EMBO J. 17: 2404-2411.
5. Chaudhuri, J., and F. W. Alt. 2004. Class-switch recombination: interplay
of transcription, DNA deamination and DNA repair. Nat. Rev. Immunol. 4:541-
552.
6. Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt.
2003. Transcription-targeted DNA deamination by the AID antibody
diversification enzyme. Nature 422:726-730.
7. Di Noia J., and M.S. Neuberger. 2002. Altering the pathway of
immunoglobulin hypermutation by inhibiting uracil-DNA glycosylase. Nature.
419(6902):43-8.
8. Dickerson S.K., E. Market, E. Besmer and F.N. Papavasiliou. 2003. AID
mediates hypermutation by deaminating single stranded DNA. J. Exp. Med.
197(10):1291-6.
9. Dunnick, W. A., G. Z. Hertz, L. Scappino, and C. Gritzmacher. 1993. DNA
sequence at immunoglobulin switch region recombination sites. Nucleic Acids Res.
21:365-372.
10. Duquette, M. L., P. Handa, J. A. Vincent, A. F. Taylor, and N. Maizels.
2004. Intracellular transcription of G-rich DNAs induces formation of G-loops,
novel structures containing G4 DNA. Genes Dev. 18:1618-1629.
11. Flajnik, M. F., K. Miller, and L. D. Pasquier. 2003. Evolution of the immune
system, p. 519-570. In W. E. Paul (ed.), Fundamental Immunology. Lippincott,
Philadelphia.
12. Gonzalez-Aguilera, C., C. Tous, B. Gomez-Gonzalez, P. Huertas, R.
Luna, and A. Aguilera. 2008. The THP1-SAC3-SUS1-CDC31 complex works in
transcription elongation-mRNA export preventing RNA-mediated genome
instability. Mol Biol Cell 19:4310-8.
13. Gritzmacher C.A. 1989. Molecular aspects of heavy-chain class switching.
Crit. Rev. Immunol. 9(3):173-200.
14. Guikema J.E., E.K. Linehan, D. Tsuchimoto, Y. Nakabeppu, P.R. Strauss,
J. Stavnezer, and C.E. Schrader. 2007. APE1- and APE2-dependent DNA
breaks in immunoglobulin class switch recombination. J. Exp. Med.
204(12):3017-26.
15. Han L., and K. 2008. Altered kinetics of nonhomologous end joining and
class switch recombination in ligase IV-deficient B cells. J. Exp. Med.
205(12):2745-53.
16. Huang F.T., K. Yu, B.B. Balter, E. Selsing, Z. Oruc, A.A. Khamlichi, C.L.
Hsieh and M.R. Lieber. 2007. Sequence dependence of chromosomal R-loops at
the immunoglobulin heavy-chain Smu class switch region. Mol. Cell Biol.
27(16):5921-32.
17. Huang F.T., K. Yu, C.L. Hsieh, and M.R. Lieber. 2006. Downstream
boundary of chromosomal R-loops at murine switch regions: implications for the
mechanism of class switch recombination. Proc. Natl. Acad. Sci. U S A.
103(13):5030-5.
18. Huertas, P., and A. Aguilera. 2003. Cotranscriptionally formed DNA:RNA
hybrids mediate transcription elongation impairment and transcription-associated
recombination. Mol. Cell 12:711-21.
19. Ichikawa, H. T., M. P. Sowden, A. T. Torelli, J. Bachl, P. Huang, G. S.
Dance, S. H. Marr, J. Robert, J. E. Wedekind, H. C. Smith, and A. Bottaro. 2006.
Structural phylogenetic analysis of activation-induced deaminase function. J.
Immunol. 177:355-61.
20. Imai, K., G. Slupphaug, W. I. Lee, P. Revy, S. Nonoyama, N. Catalan, L.
Yel, M. Forveille, B. Kavli, H. E. Krokan, H. D. Ochs, A. Fischer, and A. Durandy.
2003. Human uracil-DNA glycosylase deficiency associated with profoundly
impaired immunoglobulin class-switch recombination. Nat. Immunol. 4:1023-
1028.
21. Jimeno, S., A. G. Rondon, R. Luna, and A. Aguilera. 2002. The yeast THO
complex and mRNA export factors link RNA metabolism with transcription and
genome instability. EMBO J. 21:3526-35.
22. Jung, S., K. Rajewsky, and A. Radbruch. 1993. Shutdown of Class Switch
Recombination by Deletion of a Switch Region Control Element. Science.
259:984-987.
23. Khamlichi A.A., F. Glaudet, Z. Oruc, V. Denis, M. Le Bert, M. Cogné.
2004. Immunoglobulin class-switch recombination in mice devoid of any S mu
tandem repeat. Blood. 103(10):3828-36.
24. Li, X., and J. L. Manley. 2005. Inactivation of the SR protein splicing factor
ASF/SF2 results in genomic instability. Cell 122:365-378.
25. Lieber M.R. 1991. Site-specific recombination in the immune system.
FASEB J. 14:2934-44.
26. Luby T.M., C.E. Schrader, J. Stavnezer, and E. Selsing. 2001. The mu
switch region tandem repeats are important, but not required, for antibody class
switch recombination. J. Exp. Med. 193(2):159-68.
27. Manis J.P., D. Dudley, L. Kaylor, and F.W. Alt. 2002. IgH class switch
recombination to IgG1 in DNA-PKcs-deficient B cells. Immunity. 16(4):607-17.
28. Masukata, H., and J. Tomizawa. 1990. A mechanism of formation of a
persistent hybrid between elongating RNA and template DNA. Cell 62:331-338.
29. Min, I. M., L. R. Rothlein, C. E. Schrader, J. Stavnezer, and E. Selsing.
2005. Shifts in targeting of class switch recombination sites in mice that lack mu
switch region tandem repeats or Msh2. J. Exp. Med. 201:1885-90.
30. Muramatsu M., K. Kinoshita, S. Fagarasan, S. Yamada, Y. Shinkai, and T.
Honjo. 2000. Class switch recombination and hypermutation require activation-
induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell.
102(5):553-63.
31. Muramatsu, M., V. Sankaranand, S. Anant, M. Sugai, K. Kinoshita, N.
Davidson, and T. Honjo. 1999. Specific Expression of Activation-Induced
Cytidine Deaminase (AID), a Novel Member of the RNA-Editing Deaminase
Family in Germinal Center B Cells. J. Biol. Chem. 274:18470-18476.
32. Mussmann R., M. Courtet, J. Schwager, L. Du Pasquier. 1997. Microsites
for immunoglobulin switch recombination breakpoints from Xenopus to
mammals. Eur. J. Immunol. 27(10):2610-9.
33. Petersen-Mahrt, S. K., R. S. Harris, and M. S. Neuberger. 2002. AID
mutates E. coli suggesting a DNA deamination mechanism for antibody
diversification. Nature 418:99-103.
34. Pham, P., R. Bransteitter, J. Petruska, and M. F. Goodman. 2003.
Processive AID-catalyzed cytosine deamination on single-stranded DNA
stimulates somatic hypermutation. Nature 424:103-107.
35. Rada, C., G. T. Williams, H. Nilsen, D. E. Barnes, T. Lindahl, and M. S.
Neuberger. 2002. Immunoglobulin isotype switching is inhibited and somatic
hypermutation perturbed in UNG-deficient mice. Curr. Biol. 12:1748-1755.
36. Reina-San-Martin B., S. Difilippantonio, L. Hanitsch, R.F. Masilamani, A.
Nussenzweig, and M.C. Nussenzweig. 2003. H2AX is required for recombination
between immunoglobulin switch regions but not for intra-switch region
recombination or somatic hypermutation. J. Exp. Med. 197(12):1767-78.
37. Revy, P., T. Muto, Y. Levy, F. Geissmann, A. Plebani, O. Sanal, N.
Catalan, M. Forveille, R. Dufourcq-Labelouse, A. Gennery, I. Tezcan, F. Ersoy,
H. Kayserili, A.G. Ugazio, N. Brousse, M. Muramatsu, L.D. Notarangelo, K.
Kinoshita, T. Honjo, A. Fischer, and A. Durandy. 2000. Activation-induced cytidine
deaminase (AID) deficiency causes the autosomal recessive form of the
Hyper-IgM syndrome (HIGM2). Cell. 102(5):565-75.
38. Robbiani, D.F., A. Bothmer, E. Callen, B. Reina-San-Martin, Y. Dorsett, S.
Difilippantonio, D.J. Bolland, H.T. Chen, A.E. Corcoran, A. Nussenzweig, and
M.C. Nussenzweig. 2008. AID is required for the chromosomal breaks in c-myc
that lead to c-myc/IgH translocations. Cell. 135(6):1028-38.
39. Roy, D., C.L. Hsieh, and M.R. Lieber. 2009. G-Clustering is important for
the initiation of transcription-induced R-loops in vitro whereas high G-density
without clustering is sufficient thereafter. (In press).
40. Roy, D., K. Yu, and M. R. Lieber. 2008. Mechanism of R-loop formation at
immunoglobulin class switch sequences. Mol. Cell Biol. 28:50-60.
41. Shinkura, R., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt.
2003. The influence of transcriptional orientation on endogenous switch region
function. Nat. Immunol. 4:435-441.
42. Wakae, K., B. G. Magor, H. Saunders, H. Nagaoka, A. Kawamura, K.
Kinoshita, T. Honjo, and M. Muramatsu. 2006. Evolution of class switch
recombination function in fish activation-induced cytidine deaminase, AID. Int.
Immunol. 18:41-47.
43. Wuerffel, R., L. Wang, F. Grigera, J. Manis, E. Selsing, T. Perlot, F.W. Alt,
M. Cogne, E. Pinaud, and A.L. Kenter. 2007. S-S synapsis during class switch
recombination is promoted by distantly located transcriptional elements and
activation-induced deaminase. Immunity. 27(5):711-22.
44. Xu, L., B. Gorham, S. C. Li, A. Bottaro, F. W. Alt, and P. Rothman. 1993.
Replacement of germ-line ε promoter by gene targeting alters control of
immunoglobulin heavy chain class switching. Proc. Natl. Acad. Sci. USA
90:3705-3709.
45. Yu, K., F. Chedin, C. L. Hsieh, T. E. Wilson, and M. R. Lieber. 2003. R-
loops at immunoglobulin class switch regions in the chromosomes of stimulated
B cells. Nature Immunol. 4:442-451.
46. Yu, K., F. T. Huang, and M. R. Lieber. 2004. DNA substrate length and
surrounding sequence affect the activation induced deaminase activity at
cytidine. J. Biol. Chem. 279:6496-6500.
47. Yu, K., and M. R. Lieber. 2003. Nucleic acid structures and enzymes in
the immunoglobulin class switch recombination mechanism. DNA Repair 2:1163-
1174.
48. Zarrin, A. A., F. W. Alt, J. Chaudhuri, N. Stokes, D. Kaushal, L.
DuPasquier, and M. Tian. 2004. An evolutionarily conserved target motif for
immunoglobulin class-switch recombination. Nat. Immunol. 5:1275-1281.
49. Zarrin, A.A., M. Tian, J. Wang, T. Borjeson, and F.W. Alt. 2005. Influence
of switch region length on immunoglobulin class switch recombination. Proc.
Natl. Acad. Sci. U S A. 102(7):2466-70.
50. Zhang, J., A. Bottaro, S. Li, V. Stewart, and F. W. Alt. 1993. A selective
defect in IgG2b switching as a result of targeted mutation of the I gamma 2b
promoter and exon. EMBO J. 12:3529-37.
CHAPTER 2
MECHANISM OF R-LOOP FORMATION AND IMMUNOGLOBULIN CLASS
SWITCH SEQUENCES.
Abstract
R-loops have been described in vivo at the immunoglobulin class switch
sequences and at prokaryotic and mitochondrial origins of replication. However,
the biochemical mechanism and determinants of R-loop formation are unclear.
We find that R-loop formation is nearly eliminated when RNase T1 is added
during transcription, but not when added afterwards. Hence, rather than forming
simply as an extension of the RNA:DNA hybrid of normal transcription, the RNA
must exit the RNA polymerase and compete with the nontemplate DNA strand for
an R-loop to form. R-loops persist even when transcription is done in Li+ or Cs+,
which do not support G-quartet formation. Hence, R-loop formation does not rely
on G-quartet formation. R-loop formation efficiency decreases as the number of
switch repeats is decreased, though a very low level of R-loop formation occurs
at even one 49 bp switch repeat. R-loop formation decreases sharply as G-
clustering is reduced, even when G-density is maintained constant. The critical
level for R-loop formation is approximately the same point to which evolution
drove the G-clustering and G-density on the nontemplate strand of mammalian
switch regions. This provides an independent basis for concluding that the
primary function of G-clustering, in the context of high G-density, is R-loop
formation.
Introduction
R-loops are nucleic acid structures in which an RNA strand displaces one
strand of DNA for a limited length in an otherwise duplex DNA molecule. R-loops
were named by analogy to D-loops, in which all three strands are DNA.
R-loops form in vivo at sequences that generate a G-rich transcript at the
prokaryotic origins of replication (20), mitochondrial origins of replication (18),
and mammalian immunoglobulin (Ig) class switch sequences (reviewed in (45)).
In addition, R-loop formation occurs in vivo at some G-rich transcript locations
that are distinctly high for mitotic recombination in S. cerevisiae, and this high
recombination rate is reduced upon overexpression of S. cerevisiae RNase H1
(14). When prokaryotes lack topoisomerase activity, R-loops can form at a wider
variety of sequences, and the lethality associated with this can be remedied by
overexpression of E. coli RNase H1 (7). In an avian lymphoid cell line, lack of
the ASF/SF2 RNA-binding protein favors R-loop formation at G-rich transcript
locations in the genome, and expression of human RNase H1 can abolish this
R-loop formation (19).
In vitro studies of R-loop formation at prokaryotic origins and
immunoglobulin class switch regions have paralleled many of the observations
seen in vivo. In vitro studies utilize prokaryotic RNA polymerases, often the
phage T7 or T3 RNA polymerases, and purified plasmid DNA. Ig class switch
recombination (CSR) sequences have been the focus of most of these in vitro
studies (6, 9, 28, 29, 38), though studies on mitochondrial and prokaryotic
replication origins have also been done (18, 41). The R-loops only form when in
vitro transcription occurs in the direction that results in a G-rich transcript. There
has been no systematic study of how G-rich or how long the regions must be, nor
has there been any sequence modification to assess any aspect of these G-rich
regions for their propensity to form R-loops. Ig CSR occurs at switch regions. In
mammals, recombination occurs between the Sµ region, which is located
upstream of the constant region exons encoding Igµ heavy chain, and any one of
the downstream switch regions, Sγ, Sα, or Sε, which are located upstream of the
constant exons encoding the Igγ, Igα, and Igε heavy chains, respectively (3, 5,
33, 45). In mammals, the Ig switch regions are usually several kilobases in
length, G-rich on the nontemplate strand (thereby generating a G-rich RNA
transcript), and repetitive (with a repeat length of between 25 and 80 bp). Many
of the Gs are in clusters of 2 to 5 nt. Promoters are present in front of each
switch region, the transcripts generated from these promoters do not encode any
protein (hence, the name sterile transcripts), and removal of the promoter results
in the loss of switching to that specific switch region (35, 42). Ig CSR occurs in
germinal center B cells located in the peripheral lymphoid tissues (e.g., lymph
nodes, Peyer's patches, and spleen) upon cytokine stimulation. Different
cytokines stimulate the promoters upstream of the different switch regions (36).
Ig CSR requires a cytidine deaminase called activation-induced
deaminase (AID), which is expressed in activated B cells (24). AID only
deaminates Cs when these are located in single-stranded DNA (4, 25, 44). CSR
at the downstream switch regions occurs within the switch repetitive regions, and
recombination at Sµ can sometimes occur upstream (35%) or downstream (8%)
of the Sµ switch repeats (8, 21-23). Given that AID requires single-stranded
DNA, a key question concerns how any single-strandedness is exposed within
the switch regions (33). We have shown that R-loops are detectable at the Sγ3
and Sγ2b acceptor switch regions (13, 43), and more recently at the Sµ switch
regions (12). These R-loops can be kilobases in length and provide a ready
target for AID at any of the Cs within the top strand. Single-strandedness on the
bottom strand may derive from partial or complete action by endogenous RNase
H in its removal of the RNA that is annealed to the bottom strand (43, 45). After
AID action on the top and bottom strands, uracil glycosylase (UNG2) converts
the resulting uracils to abasic sites (16, 27), and apurinic/apyrimidinic endonuclease (APE1)
may nick the phosphodiester backbone 5' of the abasic site (10). Double-strand
breaks could arise from nicks that are sufficiently close, or after nucleases, such
as Exo 1 (1), resect from the nick toward an adjacent nick.
The mechanism of R-loop formation is of key importance not only for the
immune system during CSR, but also for any processes in biology where R-loops
arise. Here we have focused on three central issues about the mechanism of R-
loop formation. First, how does the RNA that forms the R-loop arrive at a
position that permits it to reanneal with the template DNA strand? Is it simply an
extension of the standard 9 bp RNA:DNA hybrid formed during transcription and
known to form within all RNA polymerases? Or does it thread back to anneal
with the template DNA strand after traversing the exit pore that exists in all RNA
polymerases? Second, is G-quartet formation by the G-rich nontemplate strand
essential for R-loop formation? Third, what is the minimum length of Ig switch
region DNA necessary for R-loop formation, and can a reduction in G-density or
a reduction in the extent of G-clustering still permit R-loop formation?
Materials and Methods
Plasmid switch substrates
All of the Ig class switch region sequences used in this and the
subsequent chapters are from the murine Sγ3 region. The wild type switch
sequences in pDR3 include four of the 41 repeats from the murine Sγ3 region.
These four are repeat numbers 13, 14, 15, and 16 (37), and they were introduced
using oligonucleotides and PCR into pKY127. pKY127 is simply a derivative of
pBKS(+) with a deletion of the lac Z promoter and replacement with an
oligonucleotide that serves as a multipurpose cloning region.
pTW122 (also called pTW-EL91) contains 5.5 repeats and is the same as
pTW-SS91 (43) except for minor differences in the multi-purpose cloning region.
A second 4-repeat substrate (pDR18), pDR49 (1 repeat), pDR50 (2 repeats), pDR51
(3 repeats), and the substrates containing 4 modified repeats (pDR22, pDR26, and
pDR30) were made in the same way as described above for
pDR3. In pDR22, G-clusters containing 4 or 5 Gs in the wild type sequence were
reduced to clusters of 3 Gs. In pDR26, all clusters of 3, 4, or 5 Gs in the
4 repeats were reduced to clusters of 2 Gs. In pDR30, all clusters of 5, 4, 3,
and 2 Gs in the 4 Sγ3 repeats were disrupted so that no two Gs
were adjacent to each other. A transcription substrate called pDR54 was
constructed to have 49.7% G-density on the nontemplate strand such that every
alternate nucleotide is a G. This substrate was constructed to compare the effect
of G-density on R-loop formation and was made using a 189 bp XhoI digested
PCR fragment that was cloned so that transcription from the T7 promoter
generates a G-rich RNA.
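The distinction these substrates draw between G-density and G-clustering can be illustrated with a short calculation. The following is a minimal sketch and was not part of the original methods. The two 20 nt sequences are hypothetical and are not the pDR substrate sequences; they are chosen only so that both have 50% G while differing in how the Gs are grouped.

```python
import re
from collections import Counter


def g_density(seq: str) -> float:
    """Fraction of nucleotides in seq that are G."""
    seq = seq.upper()
    return seq.count("G") / len(seq)


def g_cluster_sizes(seq: str) -> Counter:
    """Histogram mapping G-run length -> number of runs of that length."""
    return Counter(len(m.group()) for m in re.finditer(r"G+", seq.upper()))


# Hypothetical nontemplate-strand sequences (illustration only).
clustered = "GGGGGACTCAGGGGGACTCA"   # Gs grouped in two runs of five
dispersed = "GAGCGTGAGCGTGAGCGTGA"   # every alternate nucleotide is a G

for name, seq in (("clustered", clustered), ("dispersed", dispersed)):
    print(name, g_density(seq), dict(g_cluster_sizes(seq)))
```

Both sequences report the same G-density, but only the first contains runs of adjacent Gs, which is the property varied in pDR22, pDR26, and pDR30 and removed entirely in pDR54.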
Plasmid DNA was extracted from cultures of transformed
bacterial colonies and purified on CsCl gradients, followed by ethanol
precipitation using standard procedures. For the G-quartet experiments,
pTW122 was purified using a CsCl gradient followed by precipitation and
reprecipitation without monovalent salt but with glycogen to preclude introduction
of other monovalent cations into the DNA preparation. An ethanol rinse was then
done (with no added salt), followed by resuspension in the transcription buffer
with the specified cation. Hence, any G-quartets formed during transcription
would need to form in the presence of the specified cation. In the experiments
where this template is linearized, the restriction digestion was done in a
Na+-free buffer containing 10 mM Tris-HCl (pH 8.0) and 10 mM MgCl2. DR050
and DR051 or DR056 were used for PCR, while DR075, DR077, or DR110 were
used as probes to detect bisulfite converted R-loop derivative molecules in the
colony lift hybridization assay. See Appendix A for the sequence of the
oligonucleotides and other related information about enzymes, reagents, and the
construction of the plasmids.
In vitro transcription of switch substrates
Supercoiled or linearized (restriction enzyme digested) switch DNA
substrates were transcribed with T7 RNA polymerase (Promega, Madison) in the
physiological orientation at 37°C for 1 h in accordance with the polymerase
manufacturer’s instructions. Radiolabeled 32P-α-UTP was added to the reaction
wherever specified. For the RNase T1 experiment, 1 µg of RNase A or 100 units of
RNase T1 was added (per microgram of DNA transcribed) during the
transcription or after heat inactivation of the reaction at 65°C for 20 minutes.
Where specified, 50 ng of purified E. coli RNase H1 was added per microgram of
DNA transcribed. Unless otherwise specified, RNase H1 and/or RNase A were
added after the transcription reaction and incubated at 37°C for 1 h. For the
G-quartet test, pTW122 was digested in a Na+-free buffer containing 10 mM Tris-HCl
(pH 8.0) and 10 mM MgCl2, and transcription was done in buffers that had the
same composition as the commercially available buffer from Promega except
that the Na+ was replaced with Li+, K+, or Cs+. The transcribed DNA was
electrophoresed on 1% agarose gel in 0.5X TBE buffer and post-stained with
ethidium bromide. Radiolabeled species were detected by exposing the gels to
phosphoimager screens and scanning them on a Molecular Dynamics Imager
445SI (Sunnyvale, CA). Bands were analyzed with ImageQuant software version
5.0.
Determination of frequency of R-loop formation.
2 µg of SalI-digested plasmid switch substrates were transcribed, treated
with RNase A, organically extracted and precipitated with ethanol. Sodium
bisulfite treatment was done as described (43). PCR amplification on bisulfite
modified DNA was done with DR050 and DR051 or DR056 (all native primers).
The PCR fragment was cloned with the TOPO-TA cloning kit (Invitrogen,
Carlsbad, CA). Bacterial colonies were lifted onto nylon membranes (13) and
probed with oligonucleotides designed to anneal to a region containing C-to-T
conversions but not to an unconverted region on the nontemplate strand. Each
probe was designed to bind with a region with about 6 C-to-T changes over a
length of approximately 25 bp. Oligonucleotide probes DR075 (for pDR18,
pDR22, pDR49, pDR50, and pDR51), DR077 (for pDR26) or DR110 (for pDR54)
were radiolabeled in the presence of 32P-γ-ATP and used for detection of putative
clones with regions containing C-to-T converted sites. The C-to-T conversions
found in the region between DR050 and DR051 are shown in appropriate figures.
Molecules with >25 nt long stretches containing at least 3 consecutive C-to-T
conversions were considered to be regions of single-strandedness.
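The scoring criterion above can be sketched in code. This is a minimal illustration under stated assumptions and is not the procedure actually used for the analysis: it assumes the clone read is already aligned to the nontemplate-strand reference without indels, it treats a "stretch" as the span from the first to the last C in a run of consecutive reference Cs that all read as T, and the function name, parameters, and example sequences are hypothetical.

```python
def single_stranded_stretches(reference, read, min_span=26, min_conversions=3):
    """Return (start, end) index pairs of stretches scored as single-stranded.

    A stretch is a run of consecutive reference Cs that all read as T,
    with at least min_conversions conversions and a span of min_span nt
    or more (26 nt corresponds to ">25 nt").
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to the same length")

    def qualifies(run):
        return len(run) >= min_conversions and run[-1] - run[0] + 1 >= min_span

    stretches = []
    run = []  # positions of consecutive reference Cs that read as T
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue                  # only cytosines are informative
        if read_base == "T":
            run.append(i)             # a converted C extends the current run
            continue
        if qualifies(run):            # an unconverted C ends the current run
            stretches.append((run[0], run[-1]))
        run = []
    if qualifies(run):
        stretches.append((run[0], run[-1]))
    return stretches


# Hypothetical 40 nt reference with a C every 5 nt (illustration only).
reference = "CAGGT" * 8
fully_converted = reference.replace("C", "T")
partially_converted = "TAGGT" * 3 + "CAGGT" * 5   # only the first 3 Cs converted

print(single_stranded_stretches(reference, fully_converted))
print(single_stranded_stretches(reference, partially_converted))
```

In this sketch the fully converted read yields one stretch covering all eight Cs, whereas the read with only three neighboring conversions is rejected because the converted span is shorter than the length threshold.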
Enrichment of R-loops by cutting the RNA:DNA hybrid bands from an agarose
gel
To study the nature of shifted DNA species, we ran 2µg of restriction
digested and T7 transcribed, RNase A treated DNA on 0.8% low melting
temperature agarose gel in 0.5X TBE and cut out the shifted fragment seen on
the gel or the corresponding position where a shift is expected (marginally above
the DNA band containing the switch region). This gel slice was incubated with
sodium bisulfite and analyzed further by sequencing.
Results
Experimental Strategy
The normal mammalian switch regions are kilobases in length. For
example, the murine Sγ3 switch region consists of 41 consecutive head-to-tail
copies of a ~49 bp repeat. We have shown previously that shorter forms of Sγ3
can still support R-loop formation on supercoiled plasmids very efficiently in vitro
(43, 46). Supercoiled plasmids form R-loops particularly efficiently because the
inherent negative supercoiling favors strand separation. We wanted to eliminate
any contribution of negative supercoiling, and hence, we have used linear DNA
substrates here except where specified, and this provides a more stringent test of
R-loop formation. Moreover, DNA in the genome, while somewhat negatively
supercoiled (17), is unlikely to be as negatively-supercoiled as prokaryotic
plasmids.
For the studies here, we use prokaryotic RNA polymerases, typically T7
RNA polymerase, and we transcribe for 1 h at 37°C. The samples
are organically extracted, ethanol precipitated, resuspended, and then run on
agarose gels. Where indicated, the transcription is done using radiolabeled
32P-α-UTP. The fraction of substrate that forms an R-loop exhibits a mobility
difference and runs more slowly than the double-stranded DNA substrate, even
on these linear substrates (7). We and others have demonstrated that the DNA
in the shifted position is R-looped, based on RNase H1-sensitivity and sodium
bisulfite chemical probing for single-strandedness (12, 31, 43).
Test of a Thread-Back Model for R-loop Formation
We were interested in how the RNA comes to be annealed to the template
DNA strand, and there are two major pathways that one can consider (see
Appendix A Fig. A.1). All RNA polymerases (prokaryotic and eukaryotic) have
exit channels where the nascent RNA normally exits. One possibility is that the
RNA comes out of the exit pore of the RNA polymerase, and then anneals to the
template strand before the two DNA strands anneal to one another. Once an
initial RNA:DNA association is formed between the G-rich RNA and the template
DNA, the rest of the RNA can thread back and form an RNA:DNA hybrid, which
is thermodynamically more stable than the double-stranded B-DNA (30). This
model of R-loop formation can be termed RNA 'threadback,' and requires that the
RNA be single-stranded for a short period of time before hybridizing with the
template DNA strand.
A second possibility, which can be called the 'extended-hybrid' model,
assumes that the transcript that forms upon transcription of the switch sequences
fails to denature from the template in the transcription bubble, owing to the high
thermodynamic stability of the short G-rich RNA:DNA hybrid. Then the
remainder of the transcript simply extends as the RNA polymerase transcribes
and moves forward on the template. If this model applies, then there would be
no free-RNA phase, and the RNA would be annealed to the template strand the
entire time.
To distinguish between the two models and dissect the mechanism of
RNA association with the DNA, we added RNase T1 during the transcription of
linearized switch substrates containing 4 wild type, murine Sγ3 (switch region)
repeats (diagrammed in Fig. 2.1C). If the nascent RNA is exposed at any time,
the RNase T1, which nicks 3' of Gs, will suppress R-loop formation (Appendix A
Fig. A.2). For this study, RNase T1 is superior to RNase A, which cuts only after
pyrimidines, because the RNA strand is relatively poor in pyrimidines and is
therefore likely to be cut less frequently by RNase A than by RNase T1.
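The reasoning about cut-site frequency can be illustrated by counting potential cleavage positions in a G-rich RNA. This is a minimal sketch that uses only the specificities stated above (RNase T1 cleaving 3' of G, RNase A cleaving 3' of the pyrimidines C and U). The transcript below is a hypothetical G-rich sequence, not the actual Sγ3 transcript, and the names are chosen for illustration.

```python
# Bases after which each enzyme cleaves (3' side), per the specificities in the text.
RNASE_T1 = "G"
RNASE_A = "CU"


def cut_sites(rna: str, cleaved_after: str) -> list:
    """Return 1-based positions of bases after which the enzyme can cut."""
    return [i + 1 for i, base in enumerate(rna.upper()) if base in cleaved_after]


# Hypothetical G-rich transcript (illustration only).
transcript = "GGGGAGCUGGGGUAGGUGGGAGUGAGGGACCAGG"

t1_sites = cut_sites(transcript, RNASE_T1)
a_sites = cut_sites(transcript, RNASE_A)
print(len(t1_sites), "potential RNase T1 sites;", len(a_sites), "potential RNase A sites")
```

For a transcript of this composition, the count of potential RNase T1 sites exceeds the count of potential RNase A sites several times over, which is the basis for preferring RNase T1 when probing whether the nascent RNA is transiently exposed.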
We transcribed 200 ng of linearized switch substrate with T7 RNA
polymerase with or without 20 units of RNase T1. Where it is specified, RNase A
was added after the transcription in order to destroy free RNA that would
otherwise obscure gel analysis. The gels show the shifted species associated
with R-loop formation when RNase T1 and RNase A were both added after the
transcription step (Fig. 2.1A, lanes 17 and 18). Importantly, the shifted species
was markedly reduced if the same amount of RNase T1 was present during the
transcription step, and the RNase A was still added afterward (Fig. 2.1, lanes 9
and 10). As a secondary point, when RNase A is present during transcription
(Fig. 2.1A, lanes 3, 4), it does not reduce the R-loop shifted species nearly as
much as T1 (Fig. 2.1B, lanes 5, 6). As mentioned, this is because the number of
cut sites in the nascent RNA for RNase A is much smaller than for T1. Because
the transcription was done with radiolabeled 32P-α-UTP, gel exposure provides
information about the location of the RNA on the gel. The radiolabeled RNA is
associated in mobility with the linear DNA fragment (Fig. 2.1B, lanes 17, 18), and
this RNA is not visible in the lanes where RNase T1 is present during
transcription (Fig. 2.1B, lanes 9 and 10). Note that when RNase A is not used at
any time in the experiment, then the lanes are obscured by the radiolabeled RNA
(Fig. 2.1B, lanes 13 & 14).
Therefore, the nascent RNA is vulnerable to single-strand-specific
RNases, such as T1, when there are sufficient cut sites at which these can act.
This finding is inconsistent with the extended hybrid model and is quite consistent
with the thread back model (see Figs. A.1 and A.2 in Appendix A).
Figure 2.1. RNase T1 interferes with R-loop formation on a linearized
switch substrate containing 4 repeats of murine Sγ3.
The substrate, pDR3, was ApaLI-linearized and then transcribed with T7
RNA polymerase in the presence of 32P-α-UTP. RNase T1 was added during or
after the transcription incubation to test if it can interfere with the mobility shift
(labeled 'Shift') caused by R-loop formation. RNase A was added where
indicated after transcription to digest excess RNA, which would otherwise
interfere with visualization of other species in the lane. The lanes are run in
duplicate as indicated by the brackets above the numbering of the lanes. A
reduction in the amount of shift is observed when RNase T1 is present during the
transcription (lanes 9 and 10) as compared to when it is added after transcription
(lanes 17 and 18). RNase T1 is more effective in interfering with R-loop
formation than RNase A (compare lanes 5 and 6 with lanes 3 and 4).
Comparison between lanes 13 and 14 with lanes 17 and 18 demonstrates the
effect of RNase A added after transcription. RNase H1 was added where noted
to show the reversal of the mobility shift (lanes 7 and 8, lanes 11 and 12, lanes
15 and 16, and lanes 19 and 20). No shift is seen when the DNA is not
transcribed (lanes 1 and 2).
(A) The ethidium bromide stained gel demonstrates both the shifted DNA
(positions marked with 'Shift') and the linear fragment that contains the switch
sequence (position marked with L). An irrelevant restriction fragment from the
plasmid runs at or near the bottom of the gel.
(B) The radioactive profile of the gel in (A) is shown. Only the shifted DNA,
but not the linearized DNA (L), becomes radioactively labeled, indicating the
presence of RNA in the shifted species.
(C) A schematic of the linearized substrate containing the four repeats of
murine Sγ3 is shown. Shaded arrows indicate the 4 switch repeats. The T7
promoter is shown with a thin solid arrow above the line.
Test of G-Quartet Formation at Ig Class Switch Sequence R-loops
As mentioned, class switch sequences are repetitive in nature and are
extremely rich in G nucleotides on the nontemplate strand (49% G for the mouse
Sγ3 nontemplate strand). In addition, the G-nucleotides in the switch regions
tend to occur in clusters of three, four or five Gs, making G-quartets a possibility
(6, 9). Such structures have been proposed to exist at other G-rich sites in the
genome, and this raises a possibility that the R-loops found at the switch
sequences are dependent upon G-quartet formation at the G-rich non-template
DNA strand. G-quartets have very specific dimensions and are stabilized by K⁺
or Na⁺ cations, which have the right size for the cavity in the center of a G-quartet
or between planes of more than one G-quartet. G-quartets are destabilized in the
presence of Cs⁺, which is too large, or by Li⁺, which is too small (32).
We wondered if R-loops are formed as a result of G-quarteting on the non-
template DNA strand. If this were true, then the presence of cations
that inhibit G-quarteting would also suppress R-looping. To test this, we
prepared DNA templates in a manner such that only one type of monovalent
cation was present (see Methods). We transcribed a minimal switch substrate
containing 5.5 repeats of murine Sγ3, either in supercoiled or linearized form,
with T7 RNA polymerase with only Li⁺, Na⁺, K⁺, or Cs⁺ present in the transcription
buffer as the monovalent cation. Hence, if G-quartets were to form during the
transcription, then they would have to do so in the presence of only these
specified cations. We then ran the samples on a gel to assess the transcription-
induced mobility shifted species. We find that in Li⁺ (Fig. 2.2; lane 4 for
supercoiled substrate and lane 12 for linearized substrate) or in Cs⁺ (Fig. 2.2;
lane 7 for supercoiled substrate and lane 15 for linearized substrate) the R-loop-
induced shift persists (compare with Na⁺ and K⁺ in Fig. 2.2; lanes 5 and 6 for
supercoiled and lanes 13 and 14 for linearized substrate). Based on this, R-loop
formation does not require G-quartet formation. Moreover, the amount of R-loop
formation is similar or higher
for all of our buffers containing Li⁺, Na⁺, K⁺ or Cs⁺ (Fig. 2.2, lanes 4-7 and 12-15
and data not shown) relative to the manufacturer's buffer for T7 RNA polymerase
(Fig. 2.2, lanes 2 and 10), which is prepared using Na⁺ (and has the same
composition as our Na⁺-based transcription buffer). Therefore, the stability of R-
loops appears very unlikely to be reliant on G-quartet formation.
Figure 2.2. R-loop formation at the murine Sγ3 is not reliant on G-quartet
formation.
In vitro transcription was directed from the T7 promoter on supercoiled
(lanes 2-7) or ApaLI-linearized (lanes 9-15) pTW122, which contains 5.5 switch
repeats (repeats 13 to 17.5) of Sγ3. The transcription was done in either
Promega transcription buffer (lanes 2 and 3, 10 and 11) or in transcription buffers
containing Li⁺ (lanes 4 and 12), Na⁺ (lanes 5 and 13), K⁺ (lanes 6 and 14) or Cs⁺
(lanes 7 and 15). Both sets of DNA (supercoiled and linearized) exhibit mobility
shifts. The supercoiled (SC) DNA shifts to the nicked circular (NC) position. The
linear DNA fragment shifts only slightly (labeled Shift). R-loop formation is not
affected despite the presence of G-quartet destabilizing cations like Li⁺ (lanes 4
and 12) or Cs⁺ (lanes 7 and 15) as compared to Na⁺ (lanes 5 and 13) or K⁺ (lanes
6 and 14). Lanes 1 and 9 are untranscribed DNA and do not show any shift;
lanes 2 and 10 are transcription reactions done in the manufacturer's buffer and
lanes 3 and 11 are transcribed DNA treated with RNase H1 to demonstrate
reversal of the shift. Lane 8 is a 1 kb DNA molecular weight marker.
Determination of the Minimum Length of Ig Switch Sequences for R-loop
Formation
Genomic loci of mammalian class switch sequences are extremely
repetitive and are known to exceed several kilobase pairs in length. While a
greater length may allow for a better regional specification and targeting of the
switch sequences by the class switch machinery, we wanted to know whether a smaller
number of switch repeats can form R-loops and, if so, what minimal length is necessary.
We cloned different lengths of murine Sγ3 repeats containing 1 to 4 repeats
downstream of the T7 promoter in the physiological orientation so that
transcription generates a G-rich transcript. If the 41 murine Sγ3 repeats are
assigned numbers 1 through 41, then pDR18 contains repeats 13 through 16 (8,
26). pDR49 contains 1 repeat (repeat 13), pDR50 contains 2 repeats (repeats 13
and 14), and pDR51 contains 3 repeats (repeats 13-15); pDR16 does not
contain any switch DNA and is used as a 'no switch' DNA control. Transcription
reactions were done using linearized substrates in the presence of radiolabeled
UTP (as described in the Methods) and run on agarose gels. Agarose gel
analysis shows that an RNase H1-sensitive shifted species is present for 2, 3
and 4 repeats but not for one or zero repeats (Fig. 2.3, lanes 2 and 16, lanes 5
and 17, lanes 8 and 18, lanes 11 and 19, and lanes 14 and 20 for pDR16,
pDR49, pDR50, pDR51 and pDR18, respectively). The amount of the shifted
species (R-loop) is less for the 2 repeat substrate (Fig. 2.3, lane 18) than for the
3 or 4 repeat substrates (Fig. 2.3, lanes 19 and 20). Therefore, R-loops can form
on linear segments of DNA containing only 2 Sγ3 repeats.
Figure 2.3. In vitro transcription of linear substrates bearing 0, 1, 2, 3 and
4 Sγ3 repeats.
Transcription was done in the presence of ³²P-α-UTP on a control
substrate bearing no switch sequences (lanes 1, 2, 3) or on substrates with 1, 2,
3 or 4 wild type Sγ3 repeats (lanes 4-6, 7-9, 10-12 and 13-15, respectively). The
first lane in each set of 3 lanes is a mock transcription reaction without any T7
polymerase. The second lane in each set of 3 lanes is transcribed DNA followed
by RNase A treatment. The third lane in each set of 3 lanes shows the result of
transcription followed by RNase A and RNase H1 treatment. The shifted bands
have been highlighted with an asterisk placed on the left side of the shifted band.
Duplicate reactions of transcribed and RNase A-treated DNA were run together
in lanes 16-20 to allow for better comparison of shifted species.
(A) The ethidium bromide stained gel is shown.
(B) The radiolabeled RNA of the R-loop has the same mobility (based on
measurement) as the shifted band on the same gel visualized using ethidium
bromide. The band for the substrate is not radiolabeled.
Quantitation of the Efficiency of R-loop Formation as a Function of the Number of
Switch Repeat Units
To assess the length, map the location, and determine the frequency of R-
loops formed at these transcribed substrates, we adapted a colony lift
hybridization assay (13). In this approach, we treated the linearized and
transcribed substrates containing 1, 2, 3 or 4 repeats (pDR49, pDR50, pDR51
and pDR18, respectively) with sodium bisulfite, and then cloned individual
molecules after PCR. The PCR amplification used native unconverted DNA
primers located at least 100 bp away from the beginning and end of the switch
sequences.
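The scoring of bisulfite-converted molecules can be illustrated with a short Python sketch. It assumes that each cloned read has already been aligned base-for-base to the unconverted nontemplate-strand reference, and it applies one reading of the criterion stated in the legend of Figure 2.4 (at least three consecutive C-to-T conversions spanning more than 25 nt). The function names, the default thresholds and the toy sequences are illustrative assumptions and are not the analysis procedure used in this work; dcm-methylated Cs, which escape conversion, are not given special treatment here.

```python
def conversion_runs(reference: str, read: str) -> list:
    """Return (start, end, n_converted) for each run of consecutive converted Cs.

    A reference C read as T counts as converted (single-stranded at the time of
    bisulfite treatment); a reference C read as anything else ends the run.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned and of equal length")
    runs = []
    start = end = 0
    count = 0
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue  # only reference cytosines are informative
        if read_base == "T":
            if count == 0:
                start = i
            end = i
            count += 1
        else:
            if count:
                runs.append((start, end, count))
            count = 0
    if count:
        runs.append((start, end, count))
    return runs


def is_r_loop_candidate(reference, read, min_converted=3, min_span_nt=26):
    """True if any run has >= min_converted converted Cs spanning >= min_span_nt."""
    return any(
        count >= min_converted and (end - start + 1) >= min_span_nt
        for start, end, count in conversion_runs(reference, read)
    )


# Toy example (not a real switch sequence): Cs at positions 0, 13, 26 and 30.
reference = "C" + "A" * 12 + "C" + "A" * 12 + "C" + "AAAC"
converted = "T" + "A" * 12 + "T" + "A" * 12 + "T" + "AAAC"  # three consecutive conversions
patchy = "T" + "A" * 12 + "C" + "A" * 12 + "T" + "AAAC"     # conversions interrupted

print(conversion_runs(reference, converted))
print(is_r_loop_candidate(reference, converted), is_r_loop_candidate(reference, patchy))
```

In this sketch a single unconverted C ends a run, which is the strictest interpretation of "consecutive"; a more tolerant rule would need to be stated explicitly.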
We found that R-loops are present for the 1, 2, 3 and 4 repeat-containing
substrates (Table 2.1). As expected from the agarose gel and radiolabel data
(Fig. 2.3), the efficiency of R-loop formation is highest for pDR18,
containing four repeats (6.7%), and decreases to 5% for 3 repeats, 1% for 2
repeats, and 0.37% for one repeat (Table 2.1 and Appendix A Fig. A.3). The
colony lift hybridization assay for the detection of the R-loops appears to be more
sensitive than the gel shift assay, which did not show any detectable R-loop
formation for the 1 repeat substrate (Fig. 2.3).
The locations of the R-loops within the 1, 2, 3 or 4 repeat units are
noteworthy (Fig. 2.4A-D). The R-loops are nearly entirely contained within the
switch repeat zones. In many instances, there is a very small amount of branch
migration extending 7 nt towards the promoter, but still within the transcribed
region (Fig. 2.4). Interestingly, all of the R-loops terminated within the switch
repeats. The lack of extension downstream of the repeats contrasts with R-loops
in the genome at Sµ and Sγ3, where a subset of R-loops extend downstream for
hundreds of base pairs (12, 13). This difference may be a function of how
quickly the G-density drops after the last switch repeat. In the genome, this drop
is very gradual, whereas on the substrates used here, the drop is very sharp.
Table 2.1. Frequency of R-loop formation at switch substrates containing
various repeat lengths of Sγ3 as determined by the colony lift hybridization
assay and sequence analysis.
Substrate | Number of Sγ3 repeats | Length of switch region (bp) | Number of molecules tested in colony lift hybridization assay (nontemplate strands only) | Number of molecules with long stretches of single-strandedness on nontemplate strand | Frequency of R-loop formation (percentage of molecules in R-looped conformation)
pDR18 | 4 | 189 | 362 | 24 | 6.7
pDR51 | 3 | 143 | 423 | 21 | 5.0
pDR50 | 2 | 96 | 793 | 8 | 1.0
pDR49 | 1 | 48 | 818 | 3 | 0.37
These data correspond to those in Figure 2.4.
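The frequencies in Table 2.1 are simple ratios of R-looped molecules to informative molecules tested, and the arithmetic can be sketched in Python as follows. The counts are transcribed from the table; the function names are illustrative assumptions. The same calculation gives the frequency that one R-looped molecule would represent when none is found, which is how an upper bound such as the <0.16% listed for pDR26 in Table 2.2 can be obtained (1 in 621). Recomputed percentages agree with the tabulated values to within about 0.1 percentage point; small differences reflect rounding.

```python
# Counts transcribed from Table 2.1:
# substrate -> (Sγ3 repeats, molecules tested, R-looped molecules, reported frequency in %)
TABLE_2_1 = {
    "pDR18": (4, 362, 24, 6.7),
    "pDR51": (3, 423, 21, 5.0),
    "pDR50": (2, 793, 8, 1.0),
    "pDR49": (1, 818, 3, 0.37),
}


def r_loop_frequency(r_looped: int, tested: int) -> float:
    """Percentage of tested molecules found in an R-looped conformation."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * r_looped / tested


def detection_limit(tested: int) -> float:
    """Frequency (%) that a single R-looped molecule would represent."""
    return r_loop_frequency(1, tested)


for name, (repeats, tested, r_looped, reported) in TABLE_2_1.items():
    freq = r_loop_frequency(r_looped, tested)
    print(f"{name}: {repeats} repeat(s), {r_looped}/{tested} = {freq:.2f}% (reported {reported}%)")

# Upper bound when no R-looped molecule is found among 621 (pDR26, Table 2.2)
print(f"pDR26 upper bound: <{detection_limit(621):.2f}%")
```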
Figure 2.4. Location of R-loops on transcribed substrates bearing one to
four Sγ3 repeats.
(A) R-loop formation on substrates with 4 Sγ3 switch repeats.
(B) R-loop formation on substrates with 3 Sγ3 switch repeats.
(C) R-loop formation on substrates with 2 Sγ3 switch repeats.
(D) R-loop formation on substrates with 1 Sγ3 switch repeat.
In each panel, the top horizontal line represents the PCR fragment
between primers DR050 and DR051 and shows the location of the T7 promoter,
the direction of transcription, and the location of the Sγ3 repeats and their head-
to-tail arrangement (the open arrows). Each repeat is drawn to scale with
respect to the number of Cs in this region. The next horizontal line displays
every C present in the nontemplate strand of the fragment (full C display), and
these have been spaced equally, regardless of the actual distance between
them. Each C is depicted by a short, thick vertical line. The numbers refer to the
nucleotide position along the sequence. The actual results for the C-to-T
conversions in individual molecules are shown in the numerous lines below the
full C display. Each long horizontal line represents the sequence from a single
transformant that was picked by colony lift hybridization. All of the molecules
have at least three consecutive C-to-T changes on the nontemplate strand for a
length >25 nt. The substrates of different lengths have been aligned on repeat
number 13 (first repeat). The length of the fragment has been noted under the
first horizontal line in each panel, and the asterisks represent bacterial dcm
methylation sites CC(A/T)GG where the second C of the sequence escapes
bisulfite modification when methylated. For ease of visualization, the Cs have
been displayed equidistant relative to one another in this and the subsequent
figures.
Effect of G-clustering on the Efficiency of R-loop Formation at Ig Switch
Sequences
Mammalian switch repeats are remarkably rich in clusters of 3, 4 or 5 Gs
together, which suggests a physiologically relevant role of G-clustering at these
sequences (11). The large number of G-clusters also drives the G-density to a high
level. In addition to being rich in G-clusters, mouse Sγ3 is approximately
49% G on the nontemplate strand, which is much higher than the mammalian
genome-wide average of ~20.5% G-content.
We sought to determine whether R-looping efficiency decreases with a
decrease in G-clustering on the nontemplate strand. To assess this, we modified
the wild type 4 repeat substrate, pDR18, to make three derivative substrates in a
manner that reduces the G-cluster size. For pDR22, the clusters of GGGGG
were changed to cGGGc, and the clusters of GGGG were changed to GGGc.
This amounts to a change of only 10 bp out of the 189 bp 4-repeat switch region,
and there is no change in GC-density; all G changes are to C. More substantial
reductions in G-density and G-clustering were made for pDR26 (Table 2.2) and
pDR30 (see Methods and Appendix A). We then used the colony lift
hybridization assay to determine the amount of R-loop formation. We found that
even these minimal changes in pDR22 decrease the R-loop frequency from 6.7%
down to 0.23% (Table 2.2). The location of the R-loops is similar to that found for
the wild type 4 repeat substrate (panel A in Fig. 2.5). The additional decrease in
pDR26 drops the frequency to an undetectable level (Table 2.2). Therefore, the
clustering of Gs is critical for R-loop formation.
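The cluster-reduction rule used for pDR22 (GGGGG to cGGGc, GGGG to GGGc) can be illustrated with a minimal Python sketch. It is applied here to the short R-loop initiation motif GGGGTGCTGGGGTAGG quoted in the Discussion, not to the full pDR22 insert, whose sequence is not reproduced in this chapter. The function names are illustrative assumptions, and runs longer than five Gs are left untouched because the text does not specify a rule for them.

```python
import re


def reduce_g_clusters(seq: str) -> str:
    """Apply the pDR22-style rule: GGGGG -> CGGGC and GGGG -> GGGC."""
    def shrink(match):
        run = match.group(0)
        if len(run) == 5:
            return "CGGGC"
        if len(run) == 4:
            return "GGGC"
        return run  # shorter (or longer) runs are left unchanged in this sketch

    # G+ matches each maximal run of Gs, so every cluster is handled exactly once
    return re.sub(r"G+", shrink, seq)


def g_density(seq: str) -> float:
    """Fraction of positions that are G."""
    return seq.count("G") / len(seq)


def gc_content(seq: str) -> float:
    """Fraction of positions that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_g_run(seq: str) -> int:
    """Length of the longest stretch of consecutive Gs."""
    return max((len(run) for run in re.findall(r"G+", seq)), default=0)


wild_type = "GGGGTGCTGGGGTAGG"  # motif at the start of repeat 13 (see Discussion)
modified = reduce_g_clusters(wild_type)

for label, seq in (("wild type", wild_type), ("modified", modified)):
    print(label, seq, f"G={g_density(seq):.3f}", f"GC={gc_content(seq):.3f}",
          f"longest G run={longest_g_run(seq)}")
```

Because every change is G to C, the GC content is unchanged while the G-density and the longest G run both fall. Applied to the full insert, ten G-to-C changes in 189 bp lower the G-density by 10/189, about 5.3 percentage points, which is consistent with the drop from 49.7% to 44.4% listed in Table 2.2.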
We were interested in evaluating a larger number of R-loops for the
pDR22 substrate. After running the transcribed sequences on a gel (Fig. 2.5B),
we cut out the shifted species and treated it with bisulfite, followed by TA cloning
and sequencing. The gel analysis confirmed that a reduction in G-clustering
reduces R-loop formation (Fig. 2.5B, triplicate lanes 2-4 for pDR18 versus lanes
7-9 for pDR22, lanes 12-14 for pDR26, and lanes 17-19 for pDR30). Using this
enrichment method, we still detected no R-loops when the G-cluster length was
decreased to GG or G. We were able to detect R-loops for pDR22 (which has
GGG clusters), and the locations of the R-loops (Fig. 2.5C for wild type and 2.5D
for pDR22) were similar to those found by filter hybridization (Fig. 2.5A) and
similar to those of R-loops formed on the wild type substrate (pDR18) (Fig. 2.5C). Hence,
small reductions in the size of G-clusters result in large reductions in R-loop
formation efficiency (Appendix A Fig. A.4).
Table 2.2. Frequency of R-loop formation at four repeats of wild type or
sequence-modified Sγ3 as determined by the colony lift hybridization assay
and sequence analysis.
Substrate | Length of insert (bp) | G-cluster size | G-density on nontemplate strand | Number of molecules in colony lift hybridization assay (nontemplate strands only) | Number of molecules with long stretches of single-strandedness on nontemplate strand | Frequency of R-loop formation (percentage of molecules in R-looped conformation)
pDR18 | 189 | GGGG | 49.7% | 362 | 24 | 6.7
pDR22 | 189 | GGG | 44.4% | 854 | 2 | 0.23
pDR26 | 189 | GG | 41.3% | 621 | None detected | <0.16
pDR54 | 189 | No G-clusters | 49.7% | 684 | 2 | 0.29
These data correspond to those in Figures 2.5A and 2.6C.
Figure 2.5. Location of R-loops on transcribed substrates for which the G-
density of the switch repeats has been reduced.
(A) The top line shows a diagram of the T7 promoter and the relative location
of the 4 modified repeats in pDR22, where all wild type G-clusters of 4 or more
Gs were reduced to GGGs. The second line shows a full display of the Cs
present on the nontemplate strand, with the approximate length in nucleotides
depicted on the top. The third and fourth lines show the results for two R-loop
molecules that were identified by colony lift hybridization assay. The asterisks
note the two dcm sites present in the sequence. The first molecule has a
conversion stretch of at least 35 nt, whereas the second molecule has a
conversion stretch of 162 nt.
(B) The substrates containing 4 wild type repeats (pDR18) or 4 modified
repeats (pDR22, pDR26, pDR30) were either mock transcribed (lanes 1, 6, 11
and 16 for pDR18, pDR22, pDR26 and pDR30, respectively) or transcribed with T7
RNA polymerase in the presence of ³²P-α-UTP and then treated with RNase A (2nd,
3rd and 4th lane for each substrate; i.e., lanes 2, 3, 4 for pDR18; lanes 7, 8, 9 for
pDR22; lanes 12, 13, 14 for pDR26 and lanes 17, 18, 19 for pDR30, respectively)
or with RNase A and RNase H1 (5th lane for each set; i.e., lanes 5, 10, 15 and 20 for
pDR18, pDR22, pDR26 and pDR30, respectively). The top gel image shows the
ethidium bromide stained gel, and the bottom image is the same gel after
phosphorimager exposure. 'Shift' designates the shifted species, and L
designates the linear fragment bearing the switch sequence. Only the wild type
repeats show a shift upon transcription (bottom panel, lanes 2-4) but not the
other three substrates. A small amount of radioactivity is seen for pDR30, but we
have confirmed that this species does not form R-loops based on the bisulfite
modification assay (after cutting out the region where a shift would be expected
on the agarose gel and doing bisulfite sequence analysis).
(C) The depiction is similar to Figure 2.4. The top line shows the location and
length of R-loops in the molecules derived from shift of the wild type substrate
bearing four repeats (pDR18). The second line is the full display of Cs on the
nontemplate strand. Asterisks mark the two dcm sites present in these
substrates.
(D) Same as (C) except for pDR22 (clusters of GGG).
(E) Same as (C) except for pDR26 (clusters of GG).
(F) Same as (C) except for pDR30 (isolated Gs; no consecutive Gs).
R-loops were detected only for the wild type repeats (C) or when the G-
clustering was reduced to GGG for pDR22 (shown in panel D) but not for
substrates with lower G-cluster lengths (for which the conversion levels dropped
to background) (shown in panels E & F). Panels C to F have all been aligned
with one another at the start of the first repeat.
Dissection of G-density from G-clustering in R-loop Formation Efficiency
After observing the effect of G-cluster size on R-loop formation, we
wanted to assess if a complete loss of G-clustering can still support R-loop
formation in the context of a high G-density (identical to the levels of G-density
on the nontemplate strand of wild-type repeats of murine switch γ3). Therefore,
we constructed a transcription substrate (pDR54) identical in length to
the repeats contained in the wild-type substrate (pDR18) and designed such that
the substrate contains 49.7% Gs over a length of 189 bp on the nontemplate
strand, but with every second nucleotide being G, thereby abolishing any G-
clustering (no two Gs next to one another). In a transcription-induced shift assay,
we found that in comparison to the wild type substrate that had a strong
transcription-induced and RNase H1-sensitive gel mobility shift, this G-dispersed
substrate (pDR54) showed no notable shift (Fig. 2.6A), suggesting that high G-density
alone is not sufficient for induction of R-looping.
However, in our colony lift assay, we did identify two R-looped
molecules (Fig. 2.6C). In these two molecules, the R-loops were located in the
G-dense region downstream of the promoter. The frequency of such R-looped
molecules was only 0.29%, which is substantially lower than the 6.7% R-loop
frequency observed for the wild type (G-clustered) repeat substrate (pDR18). An
enrichment method for detection of R-loops (as described above) allowed us to
study additional molecules that were in an R-loop conformation. These
experiments indicate that G-clustering is the most important determinant of R-
loop formation. Although R-loops can be independently supported by extremely
G-dense regions, high G-density is a less important determinant of R-loop
formation.
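The design constraint for the G-dispersed substrate (every second nucleotide a G over 189 bp, with no two Gs adjacent) can be checked with a small Python sketch. The sequence generated here is hypothetical: the non-G positions are filled at random from A, C and T, and the actual pDR54 sequence is not reproduced in this chapter. The function name and the random seed are likewise illustrative assumptions.

```python
import random


def dispersed_g_sequence(length: int, seed: int = 0) -> str:
    """Build a hypothetical sequence with G at every second position.

    Positions 2, 4, 6, ... (1-based) are G; all other positions are drawn at
    random from A, C and T, so no two Gs are ever adjacent.
    """
    rng = random.Random(seed)
    return "".join(
        "G" if i % 2 == 1 else rng.choice("ACT")  # i is the 0-based index
        for i in range(length)
    )


seq = dispersed_g_sequence(189)
g_count = seq.count("G")
g_percent = 100.0 * g_count / len(seq)

print(f"length={len(seq)}, Gs={g_count}, G-density={g_percent:.1f}%")
print("adjacent Gs present:", "GG" in seq)
```

An alternating 189-nt sequence that starts with a non-G base contains 94 Gs, and 94/189 is 49.7%, matching the G-density stated for pDR54; starting with G instead would give 95 Gs (50.3%).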
Figure 2.6. Detection of transcription-induced R-loops on a substrate with
high G-density but no G-clustering.
(A) The gels show the transcription-induced gel mobilities of pDR18 (with wild-
type G-clusters) and pDR54 (with 49.7% G-density but no G-clustering). A
shifted species (marked 'Shift') is seen running above the switch-containing linear
fragment (marked 'L') for T7-transcribed linearized pDR18 but not for pDR54
(compare triplicate lanes 2, 3, 4 for pDR18 with lanes 7, 8, and 9 for pDR54). As
expected, neither the untranscribed DNA (lane 1) nor the RNase H1-treated
transcribed DNA (lane 5) shows the shifted species for pDR18. The top panel is an
image of an ethidium bromide-stained agarose gel. The bottom panel is the same gel
as the ethidium-stained gel and shows the localization of ³²P-α-UTP-labeled RNA at
the shifted species for pDR18 (triplicate lanes 2, 3 and 4) but not at the linear
fragment 'L'. In contrast, there is almost no detectable radiolabel seen for the
corresponding lanes for pDR54 (triplicate lanes 7, 8, and 9).
(B) The top line depicts the PCR fragment of pDR54 with the G-dense region
shown as an open rectangle and downstream of the T7 promoter. The second
line depicts all of the Cs on the nontemplate strand.
(C) From 684 nontemplate-strand informative molecules analyzed by colony
lift hybridization, only two molecules were identified as having R-loops. The R-
loops are located in the G-dense region.
(D) The locations of R-loops in pDR54 are shown. These molecules were
detected using the enrichment described in the Methods. The long stretches of
conversion (R-loops) are located in the G-dense region.
Discussion
We found that the RNA in an R-loop is vulnerable to RNase T1 action
during transcription. This observation is inconsistent with an extended hybrid
model and is most consistent with a thread back model for R-loop formation
(Appendix A, Figs. A.1 and A.2). This indicates that the RNA exits the RNA
polymerase exit channel and then anneals to or threads back onto the template
DNA strand. Prokaryotic and eukaryotic RNA polymerases all have exit
channels, and hence these findings are likely to be general ones for R-loop
formation (40).
The stability of the R-loop does not require G-quartet formation (Fig. 2.2).
Therefore, whether the nascent RNA remains unassociated with the template
DNA strand or threads back to anneal with it is determined
largely by the energy difference between these two states, and this is clearly a
function of the DNA sequence of the region. That is, the RNA polymerase itself
may have little role in determining the balance between R-loop formation and no
R-loop formation. Experiments in which the species of the RNA polymerase was
varied, the temperature was varied, or the rNTP concentration was varied had
little effect on R-loop formation (K. Yu, T. E. Wilson, G. A. Daniels, & M.R. Lieber,
unpublished). These are all factors that would influence the rate of transcription,
and their lack of effect suggests that the rate of the movement of the RNA
polymerase is a secondary issue for R-loop formation. In contrast, use of ITP in
place of GTP resulted in no R-loop formation (T.E. Wilson & M.R. Lieber,
unpublished; (9)), consistent with the energy of the interaction between the RNA
and DNA strands being a critical factor.
The data here support the view that clustering of Gs is an important
determinant of R-loop formation. In line with this, there is some propensity for
the R-loops to begin at the first repeat, regardless of whether there are 3 or 4
repeats within the switch region. In particular, the R-loops frequently initiate at
the GGGGTGCTGGGGTAGG sequence at the beginning of the first repeat
(repeat 13) (Fig. 2.4 and Fig. 2.6A). However, this sequence alone cannot
efficiently form R-loops (data not shown), and this is obvious from the inefficiency
of the 1-repeat substrate in forming R-loops (Table 2.1). Therefore, the length
and the G-density of the region downstream of such an R-loop initiation site
probably determine the efficiency of any R-loop formation. This is further
supported by our observations that R-loops located on a high G-density substrate
(but with no clusters), pDR54, are contained within the zone of high G-density.
Removal of G-clusters dramatically decreases the efficiency of R-loop formation,
even if the overall G-density of the nontemplate strand is maintained. Therefore,
it is quite apparent that G-clusters support R-loop formation much more efficiently
than a corresponding region of merely high G-density. The role of G-
clusters experimentally observed here is distinct from the conjectured role of G-
clusters in G-quartet formation. As we have shown above, G-quartet formation is
not necessary for R-loop formation, and we have observed nothing to indicate
that G-quartets are forming at the R-loops in vitro or in vivo (here and (12, 13,
43)). In fact, though it occurs at low efficiency, R-loop formation does occur with
the fully dispersed G-rich substrate, pDR54 (Fig. 2.6), and this DNA would not
form consecutive planes of G-quartets. If it is not for purposes of G-quartet
formation, then why is G-clustering more important than mere G-richness for R-
loop formation? One possibility relates to the initiation of the R-loop. Clearly, the
initiation event requires that a segment of the nascent RNA begin to thread back.
This thread back must begin at a few nucleotides (a nucleation site) because the
template and nontemplate strands of DNA would not be open for a sufficient
length to permit a long segment of RNA to anneal all at once. The initiation or
nucleation site would optimally contain more than one G. Hence, G-clusters
would be favored for this R-loop initiation phase rather than for any post-initiation
stabilization phase (such as G-quartet formation).
Short R-loops may be less stable because of the ability of the nontemplate
DNA strand to branch migrate so as to displace the RNA of the R-loop. When
the R-loop achieves sufficient size, displacement of the RNA due to branch
migration of the DNA may be inefficient. We see no evidence of any R-loops
extending downstream of the switch sequences in this study. Whether R-loops
extend downstream (or whether branch migration occurs so as to extend R-loops
further downstream) is almost certainly a function of the sequence downstream
of the switch regions. For all of the substrates here, the G-density falls sharply to
a random G-density (~20-25%) immediately after the last switch repeat. This is
in contrast to switch regions in vivo, such as Sµ and Sγ3, where the G-density
decreases gradually over several hundred base pairs and, hence, where R-loops
extend downstream of the core repeat region (12, 13). Therefore, it seems that
the downstream endpoint of R-loops is determined by the G-density, and this
determines whether the nascent RNA or the nontemplate DNA strand is favored
for base pairing with the DNA template strand.
The steep dependence of R-loop formation on G-clustering and high G-
density is noteworthy in an evolutionary context. CSR evolved over a hundred
million years after AID had evolved for its function in somatic hypermutation (2,
15, 39). Amphibians have class switch regions that are rich in preferred AID
sites (WRC) but are not G-rich on the nontemplate strand (47). Mice and
humans have switch regions that are not only uniformly G-rich on the
nontemplate strand, but contain clusters of Gs, and all of these switch regions
are G-rich across much of the repetitive core regions. We speculate that the G-
clustering and the overall G-richness of mammalian switch regions evolved to
drive efficient R-loop formation so as to make a more efficient single-stranded
DNA target at which AID can act. Based on our studies here, the mammalian
switch regions are at approximately the G-clustering and G-density level that is
needed to efficiently form R-loops. From an evolutionary standpoint, there would
have been little reason for the G-clustering and G-density to evolve to even
higher levels, once they had reached a sufficient level. The fact that the G-
clustering is close to the minimum needed for efficient R-loop formation is yet
another reason for regarding R-loop formation as the basis for the high G-
clustering of the nontemplate strand. Otherwise, it is unclear why the G-clusters
would evolve to precisely this critical point.
The contribution of R-loop formation to mammalian class switch
recombination may be approximately 4-fold for each switch region, based on the
fact that the inversion of Sγ1 results in a 4-fold drop in CSR (34). The
substitution of a Xenopus Sµ region in place of Sγ1 also shows a 4-fold
reduction, and this is consistent with the Sγ1 inversion data because the
Xenopus segment does not form R-loops (47). Hence, the downstream
(acceptor) switch regions appear to have evolved a G-richness on the top strand
to improve their use as targets by the single-strand specific AID enzyme.
Though this enrichment may only be about 4-fold for each switch region, this may
improve overall CSR substantially based on the fact that the ratio of switched
isotypes to IgM is often 100-fold or more in mammals, but is typically one or less
in Xenopus.
Chapter 2 References
1. Bardwell, P. D., C. J. Woo, K. Wei, Z. Li, A. Martin, S. Z. Sack, T. Parris,
W. Edelmann, and M. D. Scharff. 2004. Altered somatic hypermutation and
reduced class-switch recombination in exonuclease 1-mutant mice. Nat.
Immunol. 5:224-229.
2. Barreto, V. M., Q. Pan-Hammarstrom, Y. Zhao, L. Hammarstrom, Z.
Misulovin, and M. C. Nussenzweig. 2005. AID from bony fish catalyzes class
switch recombination. J. Exp. Med. 202:733-8.
3. Basu, U., J. Chaudhuri, R. T. Phan, A. Datta, and F. W. Alt. 2007.
Regulation of activation induced deaminase via phosphorylation. Adv. Exp. Med.
Biol. 596:129-37.
4. Bransteitter, R., P. Pham, M. D. Scharff, and M. F. Goodman. 2003.
Activation-induced cytidine deaminase deaminates deoxycytidine on single-
stranded DNA but requires the action of RNase. Proc. Natl. Acad. Sci. 100:4102-
4107.
5. Chaudhuri, J., and F. W. Alt. 2004. Class-switch recombination: interplay
of transcription, DNA deamination and DNA repair. Nat. Rev. Immunol. 4:541-
552.
6. Daniels, G. A., and M. R. Lieber. 1995. RNA:DNA complex formation upon
transcription of immunoglobulin switch regions: implications for the mechanism
and regulation of class switch recombination. Nucl. Acids Res. 23:5006-5011.
7. Drolet, M., S. Broccoli, F. Rallu, C. Hraiky, C. Fortin, E. Masse, and I.
Baaklini. 2003. The problem of hypernegative supercoiling and R-loop formation
in transcription. Front. Biosci. 8:d210-221.
8. Dunnick, W. A., G. Z. Hertz, L. Scappino, and C. Gritzmacher. 1993. DNA
sequence at immunoglobulin switch region recombination sites. Nucl. Acids Res.
21:365-372.
9. Duquette, M. L., P. Handa, J. A. Vincent, A. F. Taylor, and N. Maizels.
2004. Intracellular transcription of G-rich DNAs induces formation of G-loops,
novel structures containing G4 DNA. Genes Dev. 18:1618-1629.
10. Fan, J., Y. Matsumoto, and D. M. Wilson. 2006. Nucleotide sequence
and DNA secondary structure, as well as replication protein A, modulate the
single-stranded abasic endonuclease activity of APE1. J. Biol. Chem. 281:3889-
3898.
11. Gritzmacher, C. A. 1989. Molecular aspects of heavy-chain class
switching. Critical Reviews in Immunology 9:173-200.
12. Huang, F.-T., K. Yu, B. B. Balter, E. Selsing, Z. Oruc, A. A. Khamlichi, C.-
L. Hsieh, and M. R. Lieber. 2007. Sequence-dependence of chromosomal R-
loops at the immunoglobulin heavy chain Smu class switch region. Mol. Cell.
Biol. (in press).
13. Huang, F.-T., K. Yu, C.-L. Hsieh, and M. R. Lieber. 2006. The downstream
boundary of chromosomal R-loops at murine switch regions: implications for the
mechanism of class switch recombination. Proc. Natl. Acad. Sci. 103:5030-5035.
14. Huertas, P., and A. Aguilera. 2003. Cotranscriptionally formed DNA:RNA
hybrids mediate transcription elongation impairment and transcription-associated
recombination. Mol. Cell 12:711-21.
15. Ichikawa, H. T., M. P. Sowden, A. T. Torelli, J. Bachl, P. Huang, G. S.
Dance, S. H. Marr, J. Robert, J. E. Wedekind, H. C. Smith, and A. Bottaro. 2006.
Structural phylogenetic analysis of activation-induced deaminase function. J
Immunol 177:355-61.
16. Imai, K., G. Slupphaug, W. I. Lee, P. Revy, S. Nonoyama, N. Catalan, L.
Yel, M. Forveille, B. Kavli, H. E. Krokan, H. D. Ochs, A. Fischer, and A. Durandy.
2003. Human uracil-DNA glycosylase deficiency associated with profoundly
impaired immunoglobulin class-switch recombination. Nat. Immunol. 4:1023-
1028.
17. Kramer, P. R., and R. R. Sinden. 1997. Measurement of unrestrained
negative supercoiling and topological domain size in living human cells.
Biochemistry 36:3151-3158.
18. Lee, D. Y., and D. A. Clayton. 1996. Properties of a primer RNA-DNA
hybrid at the mouse mitochondrial DNA leading-strand origin of replication. J.
Biol. Chem. 271:24262-24269.
19. Li, X., and J. L. Manley. 2005. Inactivation of the SR protein splicing factor
ASF/SF2 results in genomic instability. Cell 122:365-378.
20. Masukata, H., and J. Tomizawa. 1990. A mechanism of formation of a
persistent hybrid between elongating RNA and template DNA. Cell 62:331-338.
21. Min, I. M., L. R. Rothlein, C. E. Schrader, J. Stavnezer, and E. Selsing.
2005. Shifts in targeting of class switch recombination sites in mice that lack mu
switch region tandem repeats or Msh2. J. Exp. Med. 201:1885-90.
22. Min, I. M., C. E. Schrader, J. Vardo, T. M. Luby, N. D'Avirro, J. Stavnezer,
and E. Selsing. 2003. The Smu tandem repeat region is critical for Ig isotype
switching in the absence of Msh2. Immunity 19:515-24.
23. Min, I. M., and E. Selsing. 2005. Antibody class switch recombination:
roles for switch sequences and mismatch repair proteins. Adv Immunol 87:297-
328.
24. Muramatsu, M., V. Sankaranand, S. Anant, M. Sugai, K. Kinoshita, N.
Davidson, and T. Honjo. 1999. Specific Expression of Activation-Induced
Cytidine Deaminase (AID), a Novel Member of the RNA-Editing Deaminase
Family in Germinal Center B Cells. J. Biol. Chem. 274:18470-18476.
25. Petersen-Mahrt, S. K., R. S. Harris, and M. S. Neuberger. 2002. AID
mutates E. coli suggesting a DNA deamination mechanism for antibody
diversification. Nature 418:99-103.
26. Petrini, J., and W. Dunnick. 1989. Products and Implied Mechanism of H
Chain Switch Recombination. J. Immunol. 142:2932-2935.
27. Rada, C., G. T. Williams, H. Nilsen, D. E. Barnes, T. Lindahl, and M. S.
Neuberger. 2002. Immunoglobulin isotype switching is inhibited and somatic
hypermutation perturbed in UNG-deficient mice. Curr. Biol. 12:1748-1755.
28. Reaban, M. E., and J. A. Griffin. 1990. Induction of RNA-stabilized DNA
conformers by transcription of an immunoglobulin switch region. Nature 348:342-
344.
29. Reaban, M. E., J. Lebowitz, and J. A. Griffin. 1994. Transcription induces
the formation of a stable RNA.DNA hybrid in the immunoglobulin alpha switch
region. J. Biol. Chem. 269:21850-21857.
30. Roberts, R. W., and D. M. Crothers. 1992. Stability and properties of
double and triple helices: dramatic effects of RNA or DNA backbone composition.
Science 258:1463-1466.
31. Ronai, D., M. D. Iglesias-Ussel, M. Fan, Z. Li, A. Martin, and M. D. Scharff.
2007. Detection of chromatin-associated single-stranded DNA in regions targeted
for somatic hypermutation. J. Exp. Med. 204:181-90.
32. Saenger, W. 1984. Principles of Nucleic Acid Structure. Springer-Verlag,
New York.
33. Selsing, E. 2006. Ig class switching: targeting the recombinational
mechanism. Curr. Opin. Immunol. 18:249-254.
34. Shinkura, R., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt.
2003. The influence of transcriptional orientation on endogenous switch region
function. Nat. Immunol. 4:435-441.
35. Stavnezer, J., and C. T. Amemiya. 2004. Evolution of isotype switching.
Semin. Immunol. 16:257-275.
36. Stavnezer, J., and S. Sirlin. 1986. Specificity of immunoglobulin heavy
chain switch correlates with activity of germline heavy chain genes prior to
switching. EMBO J. 5:95-102.
37. Szurek, P., J. Petrini, and W. Dunnick. 1985. Complete nucleotide
sequence of the murine γ3 switch region and analysis of switch recombination
sites in two γ3-expressing hybridomas. J. Immunol. 135:620-626.
38. Tian, M., and F. W. Alt. 2000. Transcription induced cleavage of
immunoglobulin switch regions by nucleotide excision repair nucleases in vitro. J.
Biol. Chem. 275:24163-24172.
39. Wakae, K., B. G. Magor, H. Saunders, H. Nagaoka, A. Kawamura, K.
Kinoshita, T. Honjo, and M. Muramatsu. 2006. Evolution of class switch
recombination function in fish activation-induced cytidine deaminase, AID. Int.
Immunol. 18:41-47.
40. Westover, K. D., D. A. Bushnell, and R. D. Kornberg. 2004. Structural
basis of transcription: separation of RNA from DNA by RNA polymerase II.
Science 303:1014-1016.
41. Xu, B., and D. A. Clayton. 1996. RNA-DNA hybrid formation at the human
mitochondrial heavy-strand origin ceases at replication start sites: an implication
for RNA-DNA hybrids serving as primers. EMBO J. 15:3135-3143.
42. Xu, L., B. Gorham, S. C. Li, A. Bottaro, F. W. Alt, and P. Rothman. 1993.
Replacement of germ-line ε promoter by gene targeting alters control of
immunoglobulin heavy chain class switching. Proc. Natl. Acad. Sci. USA
90:3705-3709.
43. Yu, K., F. Chedin, C.-L. Hsieh, T. E. Wilson, and M. R. Lieber. 2003. R-
loops at immunoglobulin class switch regions in the chromosomes of stimulated
B cells. Nature Immunol. 4:442-451.
44. Yu, K., F. T. Huang, and M. R. Lieber. 2004. DNA substrate length and
surrounding sequence affect the activation induced deaminase activity at
cytidine. J. Biol. Chem. 279:6496-6500.
45. Yu, K., and M. R. Lieber. 2003. Nucleic acid structures and enzymes in
the immunoglobulin class switch recombination mechanism. DNA Repair 2:1163-
1174.
46. Yu, K., D. Roy, M. Bayramyan, I. S. Haworth, and M. R. Lieber. 2005.
Fine-structure analysis of activation-induced deaminase accessibility to class
switch region R-loops. Mol. Cell. Biol. 25:1730-1736.
47. Zarrin, A. A., F. W. Alt, J. Chaudhuri, N. Stokes, D. Kaushal, L.
DuPasquier, and M. Tian. 2004. An evolutionarily conserved target motif for
immunoglobulin class-switch recombination. Nat. Immunol. 5:1275-1281.
CHAPTER 3
G-CLUSTERING IS IMPORTANT FOR THE INITIATION OF TRANSCRIPTION-
INDUCED R-LOOPS IN VITRO WHEREAS HIGH G-DENSITY WITHOUT
CLUSTERING IS SUFFICIENT THEREAFTER
Abstract
R-loops form cotranscriptionally in vitro and in vivo at transcribed duplex
DNA regions when the nascent RNA is G-rich, particularly with G-clusters. This
is the case for phage polymerases, as used here (T7 RNA polymerase), as well
as RNA polymerases in bacteria, S. cerevisiae, birds, mice and humans. The
nontemplate strand is left in a single-stranded configuration within the R-loop
region. These structures are known to form at mammalian immunoglobulin class
switch regions, thus exposing regions of single-stranded DNA for the action of
AID, a single-strand-specific cytidine deaminase. R-loops form by thread-back of
the RNA onto the template DNA strand. Here, we report that G-clusters are
extremely important for the initiation phase of R-loop formation. Even very short
regions with one GGGG sequence can initiate R-loops much more efficiently
than random sequences. The high efficiencies observed with G-clusters cannot
be achieved by having a very high G-density alone. Annealing of the transcript,
which is otherwise disadvantaged relative to the nontemplate DNA strand
because of unfavorable proximity while exiting the RNA polymerase, can offer
greater stability if it occurs at the G-clusters, thereby initiating an R-loop. R-loop
elongation beyond the initiation zone occurs in a manner that is not as reliant on
G-clusters as it is on a high G-density. These results lead to a model in which G-
clusters are important to nucleate the thread-back of RNA for R-loop initiation,
and once initiated, the elongation of R-loops is primarily determined by the
density of G on the nontemplate DNA strand. Without both a favorable R-loop
initiation zone and elongation zone, R-loop formation is inefficient.
Introduction
Immunoglobulin (Ig) class switch recombination (CSR) is the process in
which IgM is changed to IgG, IgA, or IgE by rearranging the Ig heavy chain from
IgHµ to IgHγ, IgHα, or IgHε (7, 11, 40). This DNA recombination process occurs
at class switch sequences located upstream of the corresponding constant
domain exons. The class switch sequences are long (1 to 12 kb); repetitive, with
unit repeat lengths of 25 to 80 bp; transcribed by a promoter immediately
upstream of each switch region; and strikingly G-rich on the nontemplate strand,
with G-densities reaching 40 to 50% (48). Despite conservation of these
features, the actual primary switch repeat sequences themselves are not
conserved across species or even among the switch sequences of the different
isotypes (e.g., Igµ, Igγ, Igα, Igε) (15). Even the unit repeats within any one switch
region of a given species vary from one repeat to the next, such that not one of
the individual repeats precisely matches the average sequence of that switch
region.
Activation-induced deaminase (AID) is a cytidine deaminase that is
expressed in activated B cells and is essential for Ig CSR and Ig somatic
hypermutation (SHM) (31). AID only acts on cytosines located in single-stranded
DNA (ssDNA) (6). This raises the question of how the DNA becomes single-
stranded so that AID can act on these genomic regions (48). The promoters
upstream of the switch region are critical for CSR, indicating that transcription is
critical (5, 16, 21, 46, 49). Transcription is also critically important for SHM,
suggesting that some level of ssDNA is somehow exposed to AID during
transcription (28). Indeed, transcription by both mammalian RNA polymerase II
and prokaryotic or phage polymerases can generate ssDNA upstream of the
polymerase to some extent in a manner that is not very well characterized (29).
The eukaryotic ssDNA binding protein, RPA, appears to contribute to this
exposure of ssDNA in vivo, perhaps by stabilizing the single-stranded state
transiently induced by transcription (7, 8, 32, 33). Other proteins that bind either
the nascent RNA or the nontemplate DNA strand may modify the efficiency of R-
loop formation in vivo (14, 20, 22, 26).
Ig switch regions evolved several hundred million years after SHM already
existed (4, 19, 40, 42, 50). Though both SHM and CSR require AID, the
processes of CSR and SHM are quite different. SHM is a point mutagenesis
process, whereas CSR is a double-strand break recombination process. CSR
regions in Xenopus are rich in palindromic forms of preferred sites of AID action
(WRC, where W = A or T and R = A or G). Interestingly, upon
hyperimmunization, amphibians do not switch nearly as efficiently as mammals
(13). At least part of the basis of this may be due to the high asymmetric G-
density in mammals. Mammalian CSR regions achieve G-densities of nearly
50% on the nontemplate strand in some repeats, in contrast to amphibian switch
regions, which are 21% G, like random vertebrate DNA.
An RNA:DNA hybrid forms at a 140 bp subregion of the Igα switch region
upon in vitro transcription with T7 RNA polymerase (34, 35). We showed that
RNA:DNA hybrids form at all of the tested murine switch regions anywhere within
the length of their repetitive regions (9). The RNA:DNA hybrids are stable for
days and stable to phenol/chloroform extraction. The RNA:DNA hybrids are also
stable to shorter term exposure to temperatures of 65-75°C. We showed that the
structure of these RNA:DNA hybrids is an R-loop, with the G-rich DNA strand
displaced by the G-rich nascent RNA of the same sequence (47). We and others
have shown that the number of hydrogen bonds between the RNA and template
DNA strand is important for the stability of the R-loop, based on failure of R-loops
to form when inosine is substituted for guanine (I:C base pairs share only 2
hydrogen bonds rather than the 3 of G:C) (T. E. Wilson and M. R. Lieber,
unpublished)(12). Upon RNase H treatment, some misalignment of top strand
and bottom strand DNA repeats occurs in vitro, resulting in displaced loops of
ssDNA on both strands, and we have proposed that this may occur in vivo as a
way to expose single-strandedness on the template DNA strand for AID action
(47, 48).
Our in vitro studies do not show any evidence of secondary structure on
the nontemplate DNA strand of the R-loops (e.g., G-quartets), and in vitro
experiments in which no Na⁺ or K⁺ are present (only Li⁺ or Cs⁺) show unaltered
levels of R-loop formation, indicating that R-loops are stable under conditions
where G-quartets do not form (37). Therefore, if G-quartets form in vitro on the
G-rich nontemplate DNA strand, they are not essential for R-loop formation.
Moreover, at chromosomal R-loops, the regions of single-strandedness are
continuous (47, 48). One would expect G-quartets to trap intervening Cs,
thereby making them resistant to bisulfite, and we do not see this (17). For these
in vitro and in vivo reasons, we do not favor the view that G-quartets exist at
switch region R-loops; however, one cannot rule out the possibility that G-
quartets exist but that physical methods to detect them are limited (12).
We and others have demonstrated that R-loops form at Ig CSR regions in
the mouse chromosome in activated B cells at Igµ, Igγ3, Igγ2b, and Igγ1, but not
in resting B cells (36, 47). In accord with an R-loop model, inversion of switch
regions reduces their efficiency (39). The R-loops can have heterogeneous
initiation sites; are continuous until their termination; typically terminate within or
shortly downstream of the switch region sequences; and can be removed by
RNase H.
R-loops have also been demonstrated in vivo at some other genomic and
mitochondrial locations in vertebrate cells and in yeast cells (18, 24-27, 45). R-
loops are found at G-rich regions in the chicken cell line DT40 as well as in HeLa
cells, when the cells were depleted of ASF/SF2 proteins that might aid in
messenger ribonucleoprotein particle (mRNP) formation (26). Saccharomyces cerevisiae mutants
for a component of the THO complex (which is involved in mRNP biogenesis and
has functional roles in maintaining genomic stability during transcription and
integrates transcriptional elongation with mRNA export) also exhibit apparent R-
loop formation in GC-rich sequences upstream of the RNA polymerase
elongation complex. Importantly, this location is a hotspot for mitotic
recombination in wild type yeast, and the hotspot is suppressed with
overexpression of RNase H (1, 14, 18).
More recently, we showed that in vitro R-loops form by a thread-back
mechanism and form less efficiently as the number of switch repeats decreases
and as the G-cluster size of the repeats decreases (37). More specifically, we
showed that a sequence of 50% G where the Gs are alternating with A, C or T
does not form R-loops nearly as well as a sequence of 50% G in which the Gs
are clustered. Even a longer R-loop zone with a high G-density but no clustering
cannot match the R-loop formation efficiency of shorter G-clustered sequences of
equivalent G-density.
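The distinction between G-density and G-clustering drawn above can be made concrete with a short script. This is a minimal Python sketch and was not part of the original analysis; the function names and the two 32-nt test sequences are hypothetical, chosen only so that both are 50% G, one with GGGG clusters and one with every G flanked by A, C or T.

```python
import re

def g_density(seq):
    """Fraction of positions in seq that are G."""
    seq = seq.upper()
    return seq.count("G") / len(seq)

def g_clusters(seq, min_size=2):
    """Return every maximal run of consecutive Gs with at least min_size Gs."""
    return re.findall("G{%d,}" % min_size, seq.upper())

# Hypothetical illustrative sequences (not taken from the substrates in this study).
clustered = "GGGGATCT" * 4   # 50% G, arranged as GGGG clusters
dispersed = "GAGCGTGA" * 4   # 50% G, no two Gs adjacent

for name, seq in (("clustered", clustered), ("dispersed", dispersed)):
    print(name, g_density(seq), len(g_clusters(seq, 4)))
```

Both sequences report the same density, so any difference in R-loop behavior between sequences of this kind has to be attributed to the arrangement of the Gs rather than to their number.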
Here, we find that the region of initiation of R-loops, termed the R-loop
initiation zone (RIZ), relies on one or two clusters of a few Gs in the RIZ, and
without these, R-loop initiation and R-loop formation are inefficient. We believe
that this is because clusters of Gs have higher thermodynamic stability in an
RNA:DNA hybrid. This stability offsets the steric disadvantage that makes the
RNA strand inefficient at competing with the nontemplate DNA strand for the
opportunity to anneal with the template DNA strand.
Downstream of the RIZ, the R-loop elongation zone (REZ) can support extension
of the R-loop with merely a high G-density and does not require G-clustering.
The distinction between sequence and stability requirements for the R-loop
initiation and elongation zones is essential to permit predictions of where, within
transcription units, R-loops may initiate or extend in the genome. The in vitro
findings here provide a better understanding of the mechanistic aspects of R-loop
formation and fit very well with the in vivo observations of where R-loops initiate
and extend.
Materials and Methods
Oligonucleotides and plasmid substrates.
All the substrates were constructed by cloning a small double-stranded
insert downstream of a T7 promoter sequence and immediately upstream of the
wild type or modified Ig switch region. For constructing substrates shown in
Figure 3.1, we digested the parent substrate (pDR18, pDR22, pDR26 or pDR54)
with Sac I, and blunted the 3' overhangs with T4 DNA polymerase before ligating
the short double stranded inserts containing sequences for motif A, C or B. Motif
A was made by annealing oligonucleotides DR122 (5'-GGTGCTGGGGTAGG-3')
and DR123 (5'-CCTACCCCAGCACC-3'). Motif C was made by annealing
DR126 (5'-GGTGCTCGACTACA-3') and DR127 (5'-TGTAGTCGAGCACC-3')
while motif B was made by annealing DR124 (5'-TGCACTCGATCTAT-3') and
DR125 (5'-ATAGATCGAGTGCA-3'). pDR70 was made by cloning the annealed
double-stranded product of oligonucleotides DR118
(5'-TCGAGCGTGCGAGCGCGAGAGCGTGAGTGCGTGAGCGAGCGCGTGAGCGC-3')
and DR119
(5'-TCGAGCGCTCACGCGCTCGCTCACGCACTCACGCTCTCGCGCTCGCACGC-3')
into the Xho I site immediately upstream of the first of the three Sγ3 repeats in
pDR51. The clones containing DR122, DR126, DR124 or DR118 sequences in
their respective nontemplate DNA strands were selected by sequencing and
used for in vitro transcription experiments. DNA purification on CsCl gradients
and subsequent procedures were followed as described previously (37).
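As a quick consistency check on the annealed inserts, each bottom-strand oligonucleotide for motifs A, C and B should be the reverse complement of its top-strand partner. The sketch below is illustrative Python, not part of the original methods; the helper name `reverse_complement` and the dictionary layout are assumptions, while the sequences are copied from the text above.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# (top strand, bottom strand) pairs as listed in the text.
oligo_pairs = {
    "motif A": ("GGTGCTGGGGTAGG", "CCTACCCCAGCACC"),  # DR122 / DR123
    "motif C": ("GGTGCTCGACTACA", "TGTAGTCGAGCACC"),  # DR126 / DR127
    "motif B": ("TGCACTCGATCTAT", "ATAGATCGAGTGCA"),  # DR124 / DR125
}

for motif, (top, bottom) in oligo_pairs.items():
    print(motif, reverse_complement(top) == bottom)
```

DR118 and DR119 are left out of this check because both carry a 5' TCGA extension for the Xho I site, so they are not expected to be full-length reverse complements of each other.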
In vitro transcription
In vitro transcription, sodium bisulfite treatment and frequency
determination experiments using a colony lift hybridization assay were performed
as previously described (37). Briefly, Sal I-linearized substrates were mock
transcribed or transcribed with T7 RNA polymerase in the presence of ³²P-α-UTP for
1 hour at 37°C. Transcribed samples were treated with RNase A or RNase A
and RNase H1 for one additional hour at 37°C, organically extracted and
electrophoresed. Ethidium bromide staining was done afterwards to locate
restriction fragments containing the switch regions and the shifted bands. The
gels were exposed to phosphorimager screens after pressing and drying as
described previously.
Colony lift hybridization for determination of frequency of R-loop formation
To calculate the frequencies of R-loop formation, samples of T7
transcribed and RNase A treated substrates were incubated overnight with
sodium bisulfite at 37°C without any denaturation step. This would convert Cs on
the single-stranded regions on nontemplate and template DNA in the R-loop
conformation. PCR amplification was done on bisulfite-modified DNA with native
primers DR050 and DR051 that would anneal outside of the region of interest
(37). The PCR fragments were cloned with Topo-TA cloning kit, and the white
recombinant bacterial colonies were restreaked and lifted onto nylon
membranes. The membranes were then transferred to a denaturing solution (0.5
M NaOH, 1.5 M NaCl) for 15 min, followed by transfer to 1 M Tris, pH 7.5 for
15 min. This was followed by transfer to 1 M Tris, pH 7.5, 1.5 M NaCl for 15 min
and then a rinse with 2xSSC. The DNA was then fixed on the membrane by UV
crosslinking. The membranes were then rinsed with 2xSSC and incubated for
hybridization with end-labeled oligonucleotide probes. The oligonucleotide
probes were designed to anneal to regions of C-to-T conversions but not to
unconverted regions in the nontemplate strand derivative clones within the switch
regions. DR075 (5'-CAAAACTATCCAACCTGATTCCCATACTC-3') was used as
a probe for detecting nontemplate strand C-to-T converted regions in TA clones
of pDR18A, pDR18C, pDR18B, pDR22A, pDR22C and pDR22B while DR077 (5'-
CCAAAACTATCCAACCTGATTACCATACT-3') was used for probing pDR26A,
pDR26C and pDR26B. The clones corresponding to positive signals were
confirmed by DNA sequencing of the whole PCR insert and molecules with > 25-
nt stretches with at least three consecutive C-to-T conversions were considered
in this study to be informative R-loop derivatives. The total number of
nontemplate derivatives was determined by sequencing 16 white clones picked
randomly. They were then scored for C-to-T conversions on the nontemplate
strand. To calculate the frequency, the number of confirmed R-loop clones
(picked up by probing the membranes) was divided by the total number of
nontemplate derivative clones.
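The scoring rule and the frequency calculation described above can be sketched in code. The following Python is a hedged illustration only: it assumes the clone read is already aligned to the nontemplate reference at equal length, it interprets the criterion as an uninterrupted run of converted Cs that spans more than 25 nt and contains at least three conversions, and all function names, sequences and counts are hypothetical.

```python
def converted_stretches(reference, clone):
    """Return (start, end, n_converted) for each run of consecutive reference Cs
    that read as T in the clone; an unconverted C ends the current run."""
    if len(reference) != len(clone):
        raise ValueError("sequences must be pre-aligned to equal length")
    runs = []
    start, end, count = None, None, 0
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), clone.upper())):
        if ref_base != "C":
            continue  # only reference Cs are informative for bisulfite conversion
        if read_base == "T":
            if start is None:
                start = i
            end = i
            count += 1
        else:
            if start is not None:
                runs.append((start, end, count))
            start, end, count = None, None, 0
    if start is not None:
        runs.append((start, end, count))
    return runs

def is_r_loop_clone(reference, clone, min_span=25, min_conversions=3):
    """True if any converted run spans more than min_span nt with enough conversions."""
    return any(end - start + 1 > min_span and n >= min_conversions
               for start, end, n in converted_stretches(reference, clone))

def r_loop_frequency(confirmed_clones, nontemplate_clones):
    """Confirmed R-loop clones divided by total nontemplate derivative clones."""
    return confirmed_clones / nontemplate_clones

# Hypothetical 40-nt reference with a C every fourth position.
reference = "CAGT" * 10
fully_converted = reference.replace("C", "T")
partly_converted = "TAGT" * 3 + reference[12:]  # only the first three Cs converted

print(is_r_loop_clone(reference, fully_converted))
print(is_r_loop_clone(reference, partly_converted))
print(r_loop_frequency(3, 25))  # hypothetical clone counts
```

In this sketch the partly converted read is rejected because its three conversions span only 9 nt, which mirrors the intent of excluding short, sporadic conversion events.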
Figure 3.1. Locations of G in the substrates used for studying the effects
of G-clusters in the R-loop initiation zone (RIZ).
As reflected by the substrate names on the left, the substrates are
organized in groups of three (A, C and B). The positions of G on the
nontemplate strand are displayed as black dots. The first set of three substrates
(pDR18 set) has clusters of mostly GGGGs in the R-loop elongation zone (REZ).
The top molecule (pDR18A) has two additional GGGG-clusters in the R-loop
initiation zone (RIZ), which is upstream of the REZ. The second substrate
(pDR18C) has one additional GGGG-cluster motif in the RIZ, and the last
molecule in the set (pDR18B) has a random sequence in the RIZ of the same
length as the other substrates. The pDR22 set, the pDR26 set, and the pDR54
set are represented similarly. The REZ of the pDR22 set contains G-clusters but
none with a size more than GGG. The pDR26 substrate only contains GG
clusters in the REZ. In the pDR54 set, the REZ contains 49.7% Gs on the
nontemplate strand (same as the G-density in the REZ of the pDR18 set) but no
G-clusters. The Gs are distributed over the length of the REZ, with Gs
alternating with A, C or Ts. The RIZs are of the same length in the different sets
of substrates, and the same applies to the REZs. Each G is represented as a
black solid circle and other nucleotides are shown as open circles. The
nucleotide positions have been noted below the pDR54B representation.
Results
G-clustering is Important for R-loop Initiation.
In our previous work, we determined that the density of Gs is not the only
parameter that is important for R-loop formation because clustering of Gs yielded
more efficient R-looping (37). For example, we noted that for the same G-
density, clustered Gs supported more R-loop formation than dispersed G-rich
regions (7% of molecules with R-loops vs. 0.3% in Fig. 6A of Ref. (37)). Given
this, we wondered if addition of one or two additional GGGG sequences at the
very beginning of an RNA transcript might increase the percentage of DNA
templates that form R-loops. Therefore, we varied the number and location of Gs
in the sequence in the manner diagrammed in Figure 3.1. Nearly all of the
sequence shown in the top line (substrate pDR18A) consists of four ~49 bp
repeats of the Sγ3 switch region, but with the addition of a 16 bp sequence
before these four repeats. We use 3 variations of this 16 bp sequence, called A,
C, and B. The A variant of the 16 bp sequence contains two GGGG stretches on
the nontemplate strand. The C variant has only one cluster of GGGG, and the B
variant has none. We examine how these variants influence R-loop formation
upon transcription from the T7 promoter with extension through the 4 repeats
immediately downstream.
For the experiments, we incubate T7 RNA polymerase with purified,
linearized plasmid DNA for 1 hr at 37°C. RNase A is added to degrade free RNA
after transcription. The reactions are then organically extracted and analyzed on
agarose gels. RNA generated remains associated with the template DNA in the
form of an R-loop in a subset of molecules when the RNA strand is sufficiently G-
rich. For these linear templates, the fraction of molecules in an R-loop
conformation can be seen as a shifted species due to its slower gel mobility (9,
10, 30, 34, 47, 48).
Based on the ethidium stained gels (Fig. 3.2A), we find that upon T7
transcription using pDR18A, pDR18C and pDR18B, the pDR18A variant exhibits
the greatest amount of shift (i.e., R-loop formation), whereas the shift using
pDR18C (with one GGGG cluster, abbreviated 1x4G) is comparatively weaker.
The amount of shift drops even further for the pDR18B variant, which contains no
G-clusters. The transcription is done with ³²P-α-UTP present. Therefore, after
RNase A treatment, the shifted species can be visualized by phosphorimaging
(Fig. 3.2B) whereas the linearized fragment containing the transcribed sequence,
but not associated with RNA, remains largely unlabeled. The densitometric
analysis of the radiolabeled shifted species also reveals a similar trend in that the
radiolabeled shift in the substrate with the A motif (2x4G) is several fold greater
than the shift in the substrate with the B motif (random sequence; no G-clusters,
or 0x4G). The substrate with the C motif (1x4G) also shows an increase over the
B-variant. RNase H1 digests the RNA in RNA:DNA hybrids and treatment of a
fraction of transcribed sample with E. coli RNase H1 confirmed the RNA:DNA
hybrid nature of the shifted species because these were not observed in the
samples so treated (lanes 5,10 and 15 in Fig. 3.2A and 3.2B).
Figure 3.2. Effect of G-clusters in the RIZ on R-loop formation.
Analysis and maps of R-loop molecules in pDR18A, pDR18C and
pDR18B with RIZ motifs A, C or B (where A = 2x4G, C = 1x4G, B = 0x4G) are
shown. These plasmids have identical REZ regions.
(A) Linearized pDR18A, C and B substrates were either mock transcribed
(lanes 1, 6 and 11 for pDR18A, pDR18C and pDR18B respectively) or
transcribed with T7 RNA polymerase in the presence of ³²P-α-UTP and treated
with RNase A afterwards (lanes 2-4, 7-9, and 12-14 in triplicates for pDR18A,
pDR18C and pDR18B respectively). The fifth lane in each set is a transcribed
sample treated with RNase A and RNase H1 (lanes 5, 10, and 15 for pDR18A,
pDR18C and pDR18B respectively). The top panel is the ethidium stained gel
profile. The position of the linear fragment containing the switch region is
designated “L”. R-loop molecules run slower than “L”, and are seen as a shifted
band designated as “Shift”. The shifted band is not present in the RNase H-
treated lane, confirming the RNA:DNA hybrid nature of the shifted species.
(B) ³²P-α-UTP radiolabel profile of the same gel shown in panel A after
phosphorimager exposure. Most of the radiolabel localizes with the “Shift”
bands, but not with the “L” fragments, and is not seen in the mock transcribed
lanes or in the RNase H-treated samples at either position.
(C) Representation of single-stranded regions in the DNA nontemplate strand.
Transcribed substrates were treated with sodium bisulfite to convert Cs in the
single-stranded regions to Us. PCR amplification, cloning and colony lift
hybridization were done to calculate R-loop frequency (also see Table 3.1) and
detect regions of single-strandedness (read as stretches of C-to-T conversions
with sequencing). The top line is a diagram of the linearized substrate, showing
the T7 promoter, followed by the RIZ sequence A, C or B upstream (shown as an
inverted triangle) and with the REZ switch repeats represented as thick arrows.
In each set, the first line shows all Cs on the nontemplate strand as vertical lines.
Each of the following lines represents an independent nontemplate strand
derivative molecule, with vertical lines representing observed C-to-T
conversions. Some molecules with R-loop-induced single-stranded stretches of
conversion were incomplete for the conversion information on the nontemplate
strand, and only the length to which the molecule was informative for the
nontemplate strand has been shown. The asterisks mark the position of the
methylated C in bacterial dcm methylation sites CCᵐᵉᵗ(A/T)GG that remain
unconverted.
To calculate the frequency of R-looped molecules and map the sequence
location of the R-loops generated in these substrates, we treated the transcribed
samples with sodium bisulfite (as described in Materials and Methods) followed
by a colony lift hybridization assay (see Methods) to screen and sequence many
molecules for R-loops. We observed the same trend in R-loop frequencies: 12%
for the A variant (2x4G); 6% for the C variant (1x4G); and 0.8% for the B variant
(0x4G), as measured by colony lift hybridization. We find that nearly all of the R-
loops of the A variants begin within a narrow region 10 to 26 nt downstream of
the transcription initiation site within the G-clusters of the RIZ (Fig. 3.2C).
The sequence downstream of the RIZ is exactly the same in the three
variants (A, C and B); therefore, the difference in R-loop frequencies is a direct
consequence of the composition of the RIZ. Having two G-clusters in the RIZ
imparts better efficiency than having one, which in turn is better than none. The
thermodynamic stability of an RNA:DNA hybrid that initiates at the start of the
RIZ and extends to the end of the 4 switch region repeats is affected by less than
1.05-fold in the A or C variants compared to B (column 6 in Tables B.2 to B.4,
Appendix B); yet we see R-looping variation over a 15-fold range between these
three substrates. Therefore, the difference in R-loop formation efficiency is due
to the differences only in the RIZ rather than the sequences downstream.
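The fold range quoted above follows directly from the colony lift frequencies. The short Python sketch below reproduces that arithmetic; the percentages are those reported in the text for the pDR18 set, while the dictionary layout and the function name `fold_over` are illustrative assumptions.

```python
# R-loop frequencies (percent of molecules) reported for the pDR18 set.
frequencies = {"A (2x4G)": 12.0, "C (1x4G)": 6.0, "B (0x4G)": 0.8}

def fold_over(baseline, value):
    """Ratio of a frequency to the baseline frequency."""
    return value / baseline

baseline = frequencies["B (0x4G)"]
folds = {variant: fold_over(baseline, pct) for variant, pct in frequencies.items()}

for variant, fold in folds.items():
    print(f"{variant}: {fold:.1f}-fold relative to B")
```

The A variant comes out at about 15-fold over B and the C variant at about 7.5-fold, against a predicted change in overall hybrid stability of less than 1.05-fold.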
G-clustering in the Initiation Zone is Important for a Wide Range of Substrates.
We wanted to extend analysis of the RIZ effect on R-loop formation to a
range of substrates where the sequence downstream, termed the R-loop
elongation zone or REZ, has been mutated. In these mutated forms, all clusters
of 4Gs are reduced to either clusters of 3Gs (pDR22A, C and B); clusters of 2Gs
(pDR26A, C and B); or only dispersed Gs with no G-clusters but relatively high
G-density (pDR54A, C and B). The RIZ region is one of the 3 sequences
described above (A = 2x4G, C = 1x4G, and B = 0x4G). The substrates thus
constructed were linearized, transcribed with T7 RNA polymerase and treated
with RNase A, or with RNase A and RNase H1.
In the substrates with only 3G-clusters in the REZ (pDR22A, C and B), we
observed that the A variant exhibited a marked improvement in R-loop formation,
as seen on the ethidium stained gel, on the radiolabeled shift, and in the colony
lift assay. In the ethidium stained gel, the shifted species was about 5.3% of the
total transcribed substrate (Fig. 3.3A, lanes 2-4). Although a distinct shifted
species could not be detected for the C or the B variants in the ethidium stained
gel, in a more sensitive assay using the ³²P-α-UTP label incorporation in
transcribed RNA, the shifted species in the pDR22A variant showed a strong
radiolabeled band much greater than the radiolabel level in the shifted region for
the C variant, which in turn was greater that that of the B variant, which was
nearly at background levels (Fig. 3.3B). R-loop frequencies calculated by colony
lift hybridization assay also showed that the pDR22A variant was the most
efficient, with 5.1% of molecules in an R-loop conformation. In pDR22C, 1.2% of
molecules had R-loops, which was an improvement over the B-variant, in which
only 0.6% of molecules were R-looped. The substantial difference (5.1% versus
0.6%) between pDR22A and B clearly indicates the G-clusters in the RIZ are
important for R-loop formation efficiency.
Figure 3.3. Effect of reducing the REZ G-clusters from GGGG to GGG on
R-loop formation.
Analysis and maps of R-loop molecules in pDR22A, pDR22C and
pDR22B with RIZ motifs A, C or B (where A = 2x4G, C = 1x4G, B = 0x4G) and
an identical REZ with maximum G-cluster size of GGG are shown.
(A) Representation is similar to Figure 3.2A. Linearized pDR22A, C and B
substrates were either mock transcribed (lanes 1, 6 and 11 for pDR22A, pDR22C
and pDR22B respectively), transcribed with T7 RNA polymerase in the presence
of 32P-α-UTP and treated with RNase A afterwards (lanes 2-4, 7-9, and 12-14 in
triplicates for pDR22A, pDR22C and pDR22B respectively). The fifth lane in
each set is transcribed sample treated with RNase A and RNase H1 (lanes 5, 10,
and 15 for pDR22A, pDR22C and pDR22B respectively). The top panel is the
ethidium stained gel profile. The position of switch region containing linear
fragment is designated “L”. The R-loop-induced shift is designated as “Shift”.
(B) 32P-α-UTP radiolabel profile of the same gel shown in panel A after
phosphorimager exposure. Positions of shifted species and the linearized
restriction fragment have been marked as “Shift” and “L” respectively.
(C) Representation of single-stranded regions in the DNA nontemplate strand
detected by colony lift hybridization and sequencing after sodium bisulfite
treatment. Similar to the description in Figure 3.2C, the top line represents the
linearized substrate, showing the T7 promoter, followed by the RIZ sequence A,
C or B upstream (shown as an inverted triangle) and the REZ with modified
switch repeats (GGG clusters) represented as thick arrows. The first line in each
set shows all Cs on the nontemplate strand. Each of the following lines is an
independent nontemplate strand derivative molecule, with vertical lines
representing observed C-to-T conversions. Some molecules with R-loop induced
single-stranded stretches of conversions were incomplete for the conversion
information on the nontemplate strand, and only the length to which the molecule
was informative for the nontemplate strand has been shown. The asterisks mark
the position of the methylated C in bacterial dcm methylation sites (CCmetA/TGG)
that remain unconverted.
Even in the substrate with only 2G-clusters in the REZ (pDR26A, C, and
B), there was a small but discernible indication that G-clustering in the RIZ
improves R-loop formation. We could not detect any significant shifted species in
the A, C, or B variants by ethidium staining or by radiolabel incorporation
analysis, indicating that the REZ is not sufficiently capable of extending any R-
loops initiated in the RIZ, presumably because of decreased G-density of the
transcript in the REZ (Fig. 3.4A and 3.4B). However, by colony lift hybridization,
we detected one R-loop molecule out of 432 nontemplate strand derivative
molecules for pDR26A, and none for pDR26C or pDR26B (Table 3.1 and Fig. 3.4C).
Also, the pDR54 set of substrates (Fig. 3.1)
further supports a role for G-clusters in the RIZ in R-loop efficiency (Fig. 3.5).
These substrates have a high G-density in the REZ but no clustering. Compared
to the substrate with motif B (0x4G) in the RIZ, motif A (2x4G) or motif C (1x4G)
made R-loop formation more efficient (Fig. 3.5A and 3.5B). Therefore, several sets
of substrates (pDR22, 26 and 54 sets) suggest a role for G-clustering in the RIZ.
Table 3.1. R-loop formation frequencies in different substrates calculated
by colony lift hybridization assay.
Substrate | RIZ | REZ | Number of molecules with nontemplate strand information | Number of molecules with long stretches of single-strandedness on nontemplate strand | Frequency of R-loop formation (% of molecules in R-loop conformation)
pDR18A | 2x4G clusters | 4 Sγ3; mostly GGGG | 154 | 19 | 12.3
pDR18C | 1x4G cluster | 4 Sγ3; mostly GGGG | 220 | 13 | 5.9
pDR18B | No 4G clusters | 4 Sγ3; mostly GGGG | 371 | 3 | 0.8
pDR22A | 2x4G clusters | Modified repeats, GGG clusters | 334 | 17 | 5.1
pDR22C | 1x4G cluster | Modified repeats, GGG clusters | 411 | 5 | 1.2
pDR22B | No 4G clusters | Modified repeats, GGG clusters | 1051 | 6 | 0.6
pDR26A | 2x4G clusters | Modified repeats, GG clusters | 432 | 1 | 0.2
pDR26C | 1x4G cluster | Modified repeats, GG clusters | 754 | None detected | <0.1
pDR26B | No 4G clusters | Modified repeats, GG clusters | 1066 | None detected | <0.1
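The frequencies in Table 3.1 follow from the two count columns: the number of molecules with long single-stranded stretches divided by the number of molecules with nontemplate strand information. The following is a minimal sketch of that arithmetic, assuming simple rounding to one decimal place; the function and variable names are illustrative and are not part of the original analysis.

```python
# Counts transcribed from Table 3.1:
# substrate -> (molecules with nontemplate strand information,
#               molecules with long single-stranded stretches)
TABLE_3_1 = {
    "pDR18A": (154, 19),
    "pDR18C": (220, 13),
    "pDR18B": (371, 3),
    "pDR22A": (334, 17),
    "pDR22C": (411, 5),
    "pDR22B": (1051, 6),
    "pDR26A": (432, 1),
    "pDR26C": (754, 0),   # none detected; reported as <0.1 in the table
    "pDR26B": (1066, 0),  # none detected; reported as <0.1 in the table
}


def rloop_frequency(informative, rloops):
    """Percent of informative molecules found in an R-loop conformation."""
    return 100.0 * rloops / informative


def fold_difference(substrate_a, substrate_b):
    """Ratio of R-loop frequencies (only meaningful when both counts are nonzero)."""
    freq_a = rloop_frequency(*TABLE_3_1[substrate_a])
    freq_b = rloop_frequency(*TABLE_3_1[substrate_b])
    return freq_a / freq_b


for name, (informative, rloops) in TABLE_3_1.items():
    print(f"{name}: {rloop_frequency(informative, rloops):.1f}%")

# Ratio between the 2x4G and 0x4G RIZ variants with the wild-type REZ
print(f"pDR18A vs pDR18B: {fold_difference('pDR18A', 'pDR18B'):.1f}-fold")
```

The pDR18A to pDR18B ratio computed this way is consistent with the roughly 15-fold range described in the text.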
Figure 3.4. Effect of reducing the REZ G-clusters from GGGG to GG on R-
loop formation.
Experiments to detect the presence of transcription-induced R-loops in
pDR26A, pDR26C and pDR26B with RIZ motifs A, C or B (where A = 2x4G, C =
1x4G, B = 0x4G) and an identical REZ with maximum G-cluster size of GG are
shown.
(A) Representation is similar to Figure 3.2A. Linearized pDR26A, C and B
substrates were either mock transcribed (lanes 1, 6 and 11 for pDR26A, pDR26C
and pDR26B respectively), transcribed with T7 RNA polymerase in the presence
of 32P-α-UTP and treated with RNase A afterwards (lanes 2-4, 7-9, and 12-14 in
triplicates for pDR26A, pDR26C and pDR26B respectively). The fifth lane in
each set is transcribed sample treated with RNase A and RNase H1 (lanes 5, 10,
and 15 for pDR26A, pDR26C and pDR26B respectively). The top panel is the
ethidium stained gel profile. The position of switch region containing linear
fragment is designated “L”. The expected position of the R-loop-induced shift is
marked with an arrow and designated as “No shift observed” because of the lack
of a discernible shifted species.
(B) 32P-α-UTP radiolabel profile of the same gel shown in panel A after
phosphorimager exposure. No shifted species was observed for the transcribed
samples. Similar to Figure 3.4A, the expected position of shifted species and the
linearized restriction fragment have been marked with arrows as “No shift
observed” and “L”, respectively.
(C) Representation of a molecule with a long stretch of single-strandedness in
the DNA nontemplate strand detected by colony lift hybridization and sequencing
after sodium bisulfite treatment. Similar to the description in Figure 3.2C, the top
line represents the linearized substrate, showing the T7 promoter, followed by
the RIZ sequence A (shown as an inverted triangle) and the REZ with modified
switch repeats (GG clusters) represented as thick arrows. The first line in each
set shows all Cs on the nontemplate strand. The following line shows the only
nontemplate strand-derived molecule detected in an R-loop conformation (out of
432 molecules screened; also see Table 3.1), with vertical lines representing
observed C-to-T conversions. The asterisks mark the position of the methylated
C in bacterial dcm methylation sites (CCmetA/TGG) that remain unconverted.
Figure 3.5. Effect of reducing the REZ G-clusters from GGGG to G while
maintaining a high overall REZ G-density.
Experiments are shown analyzing transcription-induced R-loops in
pDR54A, pDR54C and pDR54B with RIZ motifs A, C or B (where A = 2x4G-
clusters, C = 1x4G-cluster, B = no G-clusters) and an identical REZ with high
nontemplate strand G-density (49.7%Gs, organized as GNGNGN…) with no G-
clusters.
(A) Linearized pDR54A, C and B substrates were either mock transcribed
(lanes 1, 6 and 11 for pDR54A, pDR54C and pDR54B respectively), transcribed
with T7 RNA polymerase in the presence of 32P-α-UTP and treated with RNase A
afterwards (lanes 2-4, 7-9, and 12-14 in triplicates for pDR54A, pDR54C and
pDR54B respectively). The fifth lane in each set is transcribed sample treated
with RNase A and RNase H1 (lanes 5, 10, and 15 for pDR54A, pDR54C and
pDR54B respectively). The top panel is the ethidium stained gel profile. The
position of the linear fragment containing the switch region is designated “L”.
The position of the R-loop induced shift is marked with a bracket and designated
as “Shift”.
(B) 32P-α-UTP radiolabel profile of the same gel shown in panel A after
phosphorimager exposure. The shifted position and the linearized restriction
fragment have been marked as “Shift” and “L”, respectively.
Comparison of Roles of G-clusters versus G-density in the R-loop Initiation Zone.
In the above studies, we observed that addition of G-clusters to the RIZ
improves R-loop formation. We wondered if initiation is strictly a function of the
number of G-clusters in the region immediately downstream of the promoter but
upstream of the REZ, and if a sufficiently high G-density in the RIZ can substitute
for G-clusters. The short motifs used in the previous section were too small to
effectively disrupt the G-clusters while maintaining the total G-content.
Therefore, we made a new substrate called pDR70 by inserting one repeat
length of 50% dispersed Gs with no G-clusters on the nontemplate strand,
between the T7 promoter and three Sγ3 repeats in pDR51 (37). For this
experiment, we call this the RIZ simply because it replaces the usual RIZ zone.
The overall length of the switch substrate was maintained to be equivalent to 4
repeats of Sγ3 but with the first repeat region containing no G-clusters, and
instead having a high G-density due to alternating GNGNGN. Therefore, pDR70
was constructed to contain 24Gs in the 48 nucleotides representing the one
repeat length of dispersed G region. For comparison, in pDR18 or pDR51, the
nontemplate strand G-density in the first Sγ3 repeat is 45.8% (22 Gs in the 48-nt
repeat), most of which are in G-clusters.
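The distinction drawn here between G-density (the fraction of Gs) and G-clustering (runs of consecutive Gs) can be made concrete with a short sketch. The two 48-nt sequences below are hypothetical stand-ins constructed only to match the stated compositions (24 dispersed Gs versus 22 mostly clustered Gs); they are not the actual pDR70 insert or the Sγ3 repeat sequence, and the helper names are illustrative.

```python
import re


def g_density(seq):
    """Percent of positions on the given strand that are G."""
    seq = seq.upper()
    return 100.0 * seq.count("G") / len(seq)


def g_clusters(seq, min_len=2):
    """Lengths of maximal runs of consecutive Gs that are at least min_len long."""
    runs = (len(m.group()) for m in re.finditer(r"G+", seq.upper()))
    return [n for n in runs if n >= min_len]


# Hypothetical dispersed region: alternating GNGNGN..., 24 Gs in 48 nt
dispersed = "".join("G" + n for n in "ACT" * 8)

# Hypothetical clustered region: 22 Gs in 48 nt, mostly in GGGG clusters
clustered = "GGGGACT" * 5 + "GGACTCACTCACT"

for label, seq in (("dispersed", dispersed), ("clustered", clustered)):
    print(label, len(seq), f"{g_density(seq):.1f}% G",
          "clusters of >=2 Gs:", g_clusters(seq),
          "clusters of >=4 Gs:", g_clusters(seq, min_len=4))
```

Under these assumptions the two regions have nearly the same G-density but differ completely in clustering, which is the contrast the pDR70 experiment is designed to test.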
In the ethidium stained gel, transcription-induced R-loop formation was
observed as a mobility shifted species in substrates pDR51 and pDR18 (Fig.
3.6A and 3.6B, lanes 2, 3, 4 and lanes 17, 18, 19 respectively), which have G-
clusters in Sγ3 repeats immediately downstream of the promoter but not in
pDR70 which has the alternating G region followed by 3 Sγ3 repeats (Fig. 3.6A
and 3.6B, lanes 12, 13, 14). Comparison of the radiolabel densities at the shifted
species shows that pDR70 has about 17-fold less label intensity than
pDR18 (4 Sγ3 repeats) and about 10-fold less than pDR51 (3 Sγ3
repeats). This illustrates the effect of G-clustering, even though there is a
smaller effect of distance from the promoter. The values at the shifted position
for pDR70 are comparable to pDR54, which has a 4 repeat long G-rich region
without G-clusters (Fig. 3.6A and 3.6B, lanes 7, 8, 9). Both pDR70 and pDR18
have very similar and high G-densities in the first repeat region and differ only in
the distribution of the Gs within this region where R-loops would initiate.
These results show that G-clustering in the initiation region is extremely important
for efficient R-loop initiation. That is, replacing G-clusters with unclustered, but
equivalent G-density sequences is inadequate and drastically reduces the
efficiency of R-loop formation even in the presence of downstream G-cluster-
containing switch regions. Hence, we can conclude that high G-density without
clustering cannot replace the stronger effect of G-clusters in R-loop initiation.
Figure 3.6. Role of G-clusters versus high G-density in the R-loop initiation
zone (RIZ) in R-loop formation efficiency.
Linearized pDR51 (3 Sγ3 G-clustered repeats), pDR54 (4-repeat long and
dispersed high G-density region without any G-clustering), pDR70 (1 repeat long
dispersed and high G-density region followed by 3 Sγ3 repeats) and pDR18 (4
Sγ3 G-clustered repeats) were either mock transcribed (lanes 1, 6, 11 and 16 for
pDR51, pDR54, pDR70 and pDR18 respectively), transcribed with T7 RNA
polymerase in the presence of 32P-α-UTP and treated with RNase A afterwards
(lanes 2-4, 7-9, 12-14 and 17-19 in triplicates for pDR51, pDR54, pDR70 and
pDR18 respectively). The fifth lane in each set is transcribed sample treated with
RNase A and RNase H1 (lanes 5, 10, 15 and 20 for pDR51, pDR54, pDR70 and
pDR18 respectively). The first repeat of each of these substrates (representing the RIZ)
has approximately similar G-density with dispersed (50% Gs in pDR54 and
pDR70) or clustered (45.8% Gs in pDR51 and pDR18) distribution on the
nontemplate strand.
(A) The top panel is the ethidium stained gel profile. The switch region
containing linear fragment of pDR51 contains three repeats and therefore has a
faster gel mobility than the switch region/modified switch region containing
fragments of pDR54, pDR70 and pDR18, which have four repeat long regions.
The positions of the linearized restriction fragments are marked “L”. The positions
of R-loop induced shifts are marked as “Shift”.
(B) 32P-α-UTP radiolabel profile of the same gel shown in panel A after
phosphorimager exposure. The shifted positions and the linearized restriction
fragments have been marked as “Shift” and “L” respectively. A concise
description of the substrate and a pictorial representation of the tested switch
regions is shown below panel B.
High G-density can Compensate for an REZ with no G clustering.
Though our studies above focused primarily on the RIZ, our substrates do
have significant implications for clustering versus density in the REZ. Comparison
of pDR54A with pDR26A is informative in this regard. Both substrates have the
same RIZ. The REZ in pDR54A has only isolated Gs (no clusters of even 2Gs),
but it has a relatively high G-density. The REZ in pDR26A has many GG clusters
(Fig. 3.1).
We find efficient R-loop formation in pDR54A (4.2%) based on percentage
shift values from the respective ethidium stained gels. In contrast, pDR26A has
no detectable R-loop formation (compare pDR54A in Fig. 3.5A and B, lanes 2-4
with pDR26A in Fig. 3.4A and B, lanes 2-4). In fact, pDR54A is nearly as
efficient in R-loop formation (4.2%) as pDR22A (5.3%), which has many GGG
clusters throughout the REZ (Fig. 3.3A and 3.3B, lanes 2-4 for pDR22A and Fig.
3.5A and 3.5B, lanes 2-4 for pDR54A respectively).
This clearly shows that a high G-density in the REZ can support R-loop
formation without depending upon G-clusters. Therefore, a high G-density in the
REZ compensates for an REZ with no G clustering.
Discussion
R-loops have been studied in a wide range of in vitro and in vivo
prokaryotic and eukaryotic (including mitochondrial) systems. However,
mechanistic understanding of where and how they form has been lacking. In the
previous in vitro or in vivo work on R-loops from our lab and others, no distinction
was made between the zone where R-loops initiate (RIZ) versus the remaining
zone where the R-loop is maintained (REZ) (37). Though we previously learned
a great deal about R-loops, including the threading back of the RNA for R-loop
formation, we feel that the distinction between the RIZ and the REZ is critical for
understanding where R-loops form and the length over which they extend. The
sequence basis for the RIZ and REZ can be reduced to a thermodynamic level,
which was not possible previously. In addition, definition of parameters for R-
loop initiation and elongation now permits much more specific genome-wide
searches for potential R-loop forming regions. Assaying for R-loop forming
regions requires estimates not only of where they might initiate, but also how far
downstream they might extend.
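As a rough illustration of how the two-zone view could be turned into a sequence search, the sketch below nominates a candidate initiation point at each G-cluster on the nontemplate strand and then extends the candidate region downstream while successive windows remain G-rich. All thresholds (a minimum cluster of 4 Gs, a 50-nt window, and a 40% G-density cutoff) are placeholder assumptions chosen for illustration; they are not parameters derived in this study, and the function name is hypothetical.

```python
import re


def candidate_rloop_regions(seq, min_cluster=4, window=50, min_density=0.40):
    """Return (start, end) spans: a G-cluster start plus a G-rich downstream run.

    seq is read 5' to 3' on the nontemplate strand. The thresholds are
    illustrative assumptions, not experimentally derived values.
    """
    seq = seq.upper()
    regions = []
    for match in re.finditer("G{%d,}" % min_cluster, seq):
        start, end = match.start(), match.end()
        # Skip clusters already covered by the previous candidate region.
        if regions and start < regions[-1][1]:
            continue
        # Elongation step: extend window by window while G-density stays high.
        while end < len(seq):
            chunk = seq[end:end + window]
            if chunk.count("G") / len(chunk) < min_density:
                break
            end += len(chunk)
        regions.append((start, end))
    return regions


# Toy sequence: G-poor leader, one GGGG cluster, 100 nt of dispersed Gs, G-poor tail
toy = "ACT" * 10 + "GGGG" + "AG" * 50 + "ACT" * 30
print(candidate_rloop_regions(toy))
```

A search of this kind separates the question of where an R-loop might start from the question of how far it might extend, which mirrors the RIZ and REZ distinction made above.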
Mechanism of R-loop Initiation.
The studies here markedly improve our understanding of the mechanism
of R-loop formation (Fig. 3.7). During transcription of random sequence by all
RNA polymerases, the nontemplate DNA strand is separated from the template
DNA strand. The nontemplate DNA strand appears to track along the outside of
the RNA polymerase for a length because it is solvent exposed, susceptible to
nucleases, and can be recognized by single-stranded binding proteins (2, 3, 43).
The two DNA strands appear to reanneal outside of the upstream side of the
RNA polymerase or on the surface of the polymerase in a region where both
DNA strands are solvent exposed (23, 44). It is clear that the nascent RNA that
exits the RNA polymerase is single-stranded, based on its susceptibility to
RNase A and T1 (37). R-loop formation upon transcription by phage and
mammalian polymerases indicates that the nascent RNA strand can compete
with the nontemplate DNA strand for annealing to the template DNA strand as it
emerges from the RNA polymerase.
The competition between the RNA and the nontemplate DNA strand for
annealing to the template DNA does not have to occur at the instant that all three
strands exit the RNA polymerase. Rather, the two DNA strands might anneal to
one another at or close to the surface of the RNA polymerase and then breathe
open, thereby providing an opportunity for a G-cluster of the RNA strand to
invade the DNA. Such breathing of the newly reformed DNA duplex is
particularly likely because as the two DNA strands anneal to one another on the
upstream side of the RNA polymerase, the newly reannealed DNA duplex at this
position is effectively a DNA end, and DNA ends are known to undergo
substantial breathing (41). Interestingly, breathing of G-rich sequences is
maximal when one strand is composed of consecutive Gs and the other of
consecutive Cs, just as is the case for the G-clusters in the RIZ (41). This may
be because such GGG/CCC (or longer) regions adopt a DNA conformation that
is intermediate between B-form and A-form, called B/A-intermediate DNA, and
the B/A-intermediate conformation is favorable for DNA breathing (41).
All RNA:DNA is stronger than DNA:DNA of the corresponding sequence.
However, the resumption of DNA:DNA in random DNA sequences is likely
because the nontemplate strand is closer to the template strand than the RNA
strand. Even for linear Ig switch region sequences with 50% G-density in
clusters, only a small minority of substrates assume an R-loop conformation.
Therefore, despite the thermodynamic advantage of RNA:DNA over DNA:DNA,
the proximity advantage of the nontemplate DNA strand is a dominant factor in
favoring DNA:DNA over RNA:DNA. Therefore, the initial nucleation site of the
thread-back RNA must have maximum stability via as many Gs as possible in a
short length (because there is not sufficient length of DNA strand separation to
permit a long segment of RNA to bind). It is interesting to note in this regard that
most of the R-loop substrates with two G-clusters initiate within that two G-cluster
RIZ, whereas all but one R-loop molecule in substrates with zero RIZ G-clusters
started downstream of the RIZ, mostly at or inside the switch regions
(REZ). Although the influence of the one G cluster motif in the RIZ results in
increased R-loop formation as compared to the zero G-cluster substrates, the R-
loop start sites on these substrates are more varied, pointing to the intermediate
stability of R-loops initiated at a one G-cluster RIZ.
When we compare the sequence between pDR18A and pDR18B, the two
substrates differ by two extra G-clusters in RIZ of pDR18A not present in
pDR18B. This is a comparatively small change in the overall sequence.
However, the R-loop formation efficiency of pDR18A is ~15-fold greater than
pDR18B. This disproportionate increase in pDR18A is observed because the
2x4G clusters in pDR18A are closer to the 5’ end of the transcript whereas the
position of the first G-cluster in pDR18B is internal in the transcript. The
transcript length is the same in the two substrates. Due to the higher mobility of
the nucleotide positions near the 5’ terminus compared with the internal
positions, the 5’ ends of the transcripts have a higher collision frequency with the
template DNA strand. Thus, G-clusters present towards the free 5’ end of the
transcribing RNA have a higher probability of nucleating an R-loop initiation
event. Once initiated, the overall stability of the R-loop is also higher in the A
variant because of a higher local G-content around this region (more G-clusters
in the pDR18A as compared to pDR18B; see Fig. 3.1), thereby further reducing
the RNA dissociation propensity (Fig. 3.7). Thus, whereas the stability factors
(total G-content) are additive in nature, the R-loop initiation factors (increased
molecular mobility driving higher collision frequency and RNA:DNA nucleation
upstream of the polymerase) are much more than additive.
Figure 3.7. Model of R-loop initiation by nucleation at G-clusters.
(A) This diagram depicts events prior to or without R-loop formation. The two
DNA strands separated by the RNA polymerase are reannealing to form a
duplex. The drawing is not to scale, and the reannealing of the two DNA strands
may occur anywhere between (or on) the surface of the RNA polymerase and
some uncertain number of base pairs upstream of the polymerase. The black
downward arrow (DNA on) represents the duplex formation propensity of the two
DNA strands. Thermodynamically, all RNA:DNA duplexes are more stable than
the DNA:DNA duplexes, but the DNA:DNA duplex ultimately prevails because of
more favorable proximity of the two DNA strands. The dashed black arrow
pointing upwards (DNA off) is the propensity of the DNA duplex to separate into
template and nontemplate strands (breathing). The red arrows are the
propensities of the transcript to associate (RNA on; red upward arrow) or
dissociate (RNA off; red downward arrow) with the template strand, and the red
arrows are thinner than the black arrows because the RNA transcript exits the
RNA polymerase away from the DNA, and consequently is relatively
disadvantaged sterically for association with the template DNA strand.
(B) Model of initiation of R-loop formation when G-clusters are present in the
transcript. The association of the RNA with the DNA template strand is
strengthened at the RNA:DNA hybrid regions containing G-clusters (thereby
weakening the RNA:DNA dissociation propensity; dashed red arrow). This
happens because of a considerable increase in the local thermodynamic stability
of the RNA:DNA hybrid (see Tables B.1-B.4 and Discussion in Appendix B). This
initial hybridization or stable nucleation event provides an increased opportunity
for the rest of the transcript to hybridize with the DNA template (depending on the
downstream G-density in the REZ). The presence of G-clusters on the
nontemplate strand also increases the breathing of the DNA duplex (now
depicted as a solid upward black arrow). Therefore, the RNA:DNA nucleation
event may occur after the two DNA strands anneal (on the surface of the
upstream side of the RNA polymerase) but then breathe open, thereby allowing
the RNA transcript to 'invade' and anneal to the template DNA strand. The
increased RNA:DNA hybrid length [extension of the R-loop downstream (i.e.,
REZ)], and the presence of G-richness or other G-clusters in the transcript impart
greater stability to the R-loop structure because of increased difference between
the RNA:DNA thermodynamic stability over the DNA:DNA duplex stability in favor
of the RNA:DNA hybrid. Once formed, the R-loop terminates downstream when
the difference between the RNA:DNA and DNA:DNA stability is smaller, such
that the proximity advantage of the DNA:DNA annealing prevails.
Thermodynamic Considerations.
It is useful to consider the thermodynamic aspects to fully appreciate the
mechanism of RNA:DNA hybrid formation in R-loops. We observe a many-fold
improvement in R-looping when the RIZ contains one or two clusters of GGGG.
This improvement correlates much better with the stability of annealing in
the RIZ than in the entire R-loop region (RIZ + REZ). The efficiency of R-looping
increases as the length and number of G-clusters increases. As mentioned
earlier, all RNA:DNA duplexes are stronger than DNA:DNA duplexes ((38) and
http://ozone3.chem.wayne.edu). Addition of a motif containing two GGGG
clusters or one GGGG cluster can be calculated to improve the local strength of
annealing (∆G) of the RIZ substantially (~50% or 1.5-fold with 2x4G motif and
~25% or 1.25-fold with 1x4G motif; see column 4 of Table B.1 in Appendix B).
The RIZ is so small relative to the REZ that the strength of annealing for the
entire RNA:DNA hybrid is increased relatively little (< 5% or < 1.05-fold; see
column 6 of Tables B.2-B.4 in Appendix B) compared to a random sequence in
the RIZ. Therefore, the stability of the RNA:DNA in the RIZ is the key factor for
R-loop initiation and a key factor for overall R-loop formation efficiency (Fig. 3.7).
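The contrast between a large local change and a small overall change is a matter of simple proportion, which the following sketch illustrates. It assumes that the annealing free energy of the full hybrid is the sum of a RIZ term and a REZ term, and it uses hypothetical values in arbitrary units (a RIZ contributing one-tenth as much as the REZ); the actual values are those tabulated in Appendix B and are not reproduced here.

```python
def overall_fold_change(riz_dg, rez_dg, riz_fold):
    """Fold change in total hybrid stability when only the RIZ term is scaled.

    Assumes the total free energy is the sum of the RIZ and REZ contributions.
    """
    before = riz_dg + rez_dg
    after = riz_dg * riz_fold + rez_dg
    return after / before


# Hypothetical free energies in arbitrary units (more negative = more stable)
riz_dg, rez_dg = -10.0, -100.0

for motif, local_fold in (("2x4G", 1.5), ("1x4G", 1.25)):
    total_fold = overall_fold_change(riz_dg, rez_dg, local_fold)
    print(f"{motif}: local {local_fold:.2f}-fold, overall {total_fold:.3f}-fold")
```

With a RIZ this small relative to the REZ, even a 1.5-fold local gain changes the total by less than 1.05-fold, which is why the total hybrid stability cannot account for the large differences in R-loop formation efficiency.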
In the R-loop initiation as well as elongation zones, the thermodynamic
stability of DNA:DNA that has clusters of Gs is weaker than for dispersed Gs
(see Discussion and Tables in Appendix B). This is a reflection of the B/A-
intermediate DNA structure mentioned above for regions of GGG/CCC (41). As
mentioned above for R-loop initiation, this would favor DNA:DNA strand
separation in the G-clustered regions, thereby making it somewhat easier for the
proximity-disadvantaged nascent RNA to anneal to the template DNA strand, and
thereby initiate a nucleation event.
Mechanism of R-loop Elongation.
Once an initial RNA:DNA nucleation site forms (typically in the RIZ in this
study), local RNA:DNA is more stable than DNA:DNA, as usual. Then elongation
is purely a reflection of the stability of RNA:DNA being substantially stronger than
the stability of DNA:DNA. The R-loop terminates as the difference in stability of
RNA:DNA and DNA:DNA gets smaller. At this point, the proximity advantage for
the annealing of the nontemplate DNA to the template DNA eventually prevails,
and the R-loop elongation ends.
The thermodynamic stability of clustered G duplexes in a DNA duplex is
weaker than that of dispersed sequences such as GNGNGN. The clustered G
duplexes may have better intrastrand base stacking at the expense of interstrand
interaction. A greater tendency of these G-clustered sequences to breathe (as
mentioned above) may also make the DNA:DNA component weaker. The
majority of the difference between the RNA:DNA and DNA:DNA components is
contributed by this DNA:DNA interaction effect rather than the RNA:DNA
interaction within zones that contain G-clusters.
Relevance of R-loop Initiation and Elongation Zones in vivo.
In light of our inferences here, one might wonder why mammalian switch
regions evolved to have G-clustering throughout, rather than only at the
beginning. R-loop initiation is a stochastic process and is not 100% efficient at
the first G-cluster. In fact, on linear substrates with four Sγ3 repeats, less than
10% of the molecules are in an R-loop conformation at any one time. Hence,
additional G-clusters further downstream improve the R-loop formation efficiency
overall. Therefore, it is not surprising that the Ig switch regions contain G-
clustered repeats throughout their repetitive zone. Mapping of R-loop positions
in vivo shows that the initiation point varies considerably (47), consistent with
what we see in vitro for partial switch regions (37).
The findings here provide a basis for understanding the R-loop initiation
and extension seen upstream of the Igµ switch region in vivo (17). A strong RIZ
followed downstream by a very weak REZ may be insufficient to remain stable as
an R-loop. An in vivo example of this may be the 50 bp G-clustered (50% G-
dense on the nontemplate strand) peak upstream of the Sµ repetitive region.
This might be insufficient to initiate stable R-looping in the wild type allele or an
allele that deletes the core Sµ repeats [called ∆SµTR in Ref. (17)] if not for the
region downstream of it (REZ), which is G-dense but relatively unclustered.
Therefore, the observations here for R-loop initiation and elongation are likely to
have predictive value in assessing transcription units for their propensity for R-
loop initiation and elongation.
Chapter 3 References
1. Aguilera, A., and B. Gomez-Gonzalez. 2008. Genome instability: a
mechanistic view of its causes and consequences. Nat Rev Genet 9:204-17.
2. Artsimovitch, I., and R. Landick. 2002. The transcriptional regulator RfaH
stimulates RNA chain synthesis after recruitment to elongation complexes by the
exposed nontemplate DNA strand. Cell 109:193-203.
3. Bandwar, R. P., N. Ma, S. A. Emanuel, M. Anikin, D. G. Vassylyev, S. S.
Patel, and W. T. McAllister. 2007. The transition to an elongation complex by T7
RNA polymerase is a multistep process. J Biol Chem 282:22879-86.
4. Barreto, V. M., Q. Pan-Hammarstrom, Y. Zhao, L. Hammarstrom, Z.
Misulovin, and M. C. Nussenzweig. 2005. AID from bony fish catalyzes class
switch recombination. J Exp Med 202:733-8.
5. Bottaro, A., R. Lansford, L. Xu, J. Zhang, P. Rothman, and F. W. Alt.
1994. S region transcription per se promotes basal IgE class switch
recombination but additional factors regulate the efficiency of the process. EMBO
J. 13:665-674.
6. Bransteitter, R., P. Pham, M. D. Scharff, and M. F. Goodman. 2003.
Activation-induced cytidine deaminase deaminates deoxycytidine on single-
stranded DNA but requires the action of RNase. Proc. Natl. Acad. Sci. 100:4102-
4107.
7. Chaudhuri, J., U. Basu, A. Zarrin, C. Yan, S. Franco, T. Perlot, B. Vuong,
J. Wang, R. T. Phan, A. Datta, J. Manis, and F. W. Alt. 2007. Evolution of the
immunoglobulin heavy chain class switch recombination mechanism. Adv
Immunol 94:157-214.
8. Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt.
2003. Transcription-targeted DNA deamination by the AID antibody
diversification enzyme. Nature 422:726-730.
9. Daniels, G. A., and M. R. Lieber. 1995. RNA:DNA complex formation upon
transcription of immunoglobulin switch regions: implications for the mechanism
and regulation of class switch recombination. Nucl. Acids Res. 23:5006-5011.
10. Drolet, M., S. Broccoli, F. Rallu, C. Hraiky, C. Fortin, E. Masse, and I.
Baaklini. 2003. The problem of hypernegative supercoiling and R-loop formation
in transcription. Front Biosci. 8:d210-221.
11. Dunnick, W. A., G. Z. Hertz, L. Scappino, and C. Gritzmacher. 1993. DNA
sequence at immunoglobulin switch region recombination sites. Nucl. Acids Res.
21:365-372.
12. Duquette, M. L., P. Handa, J. A. Vincent, A. F. Taylor, and N. Maizels.
2004. Intracellular transcription of G-rich DNAs induces formation of G-loops,
novel structures containing G4 DNA. Genes Dev. 18:1618-1629.
13. Flajnik, M. F., K. Miller, and L. D. Pasquier. 2003. Evolution of the immune
system, p. 519-570. In W. E. Paul (ed.), Fundamental Immunology. Lippincott,
Philadelphia.
14. Gonzalez-Aguilera, C., C. Tous, B. Gomez-Gonzalez, P. Huertas, R.
Luna, and A. Aguilera. 2008. The THP1-SAC3-SUS1-CDC31 complex works in
transcription elongation-mRNA export preventing RNA-mediated genome
instability. Mol Biol Cell 19:4310-8.
15. Gritzmacher, C. A. 1989. Molecular aspects of heavy-chain class
switching. Critical Reviews in Immunology 9:173-200.
16. Harriman, G. R., A. Bradley, S. Das, P. Rogers-Fani, and A. C. Davis.
1996. IgA class switch in Iα exon-deficient mice. J. Clin. Invest. 97:477-485.
17. Huang, F.-T., K. Yu, B. B. Balter, E. Selsing, Z. Oruc, A. A. Khamlichi, C.-
L. Hsieh, and M. R. Lieber. 2007. Sequence-dependence of chromosomal R-
loops at the immunoglobulin heavy chain Smu class switch region. Mol. Cell.
Biol. 27:5921-5932.
18. Huertas, P., and A. Aguilera. 2003. Cotranscriptionally formed DNA:RNA
hybrids mediate transcription elongation impairment and transcription-associated
recombination. Mol Cell 12:711-21.
19. Ichikawa, H. T., M. P. Sowden, A. T. Torelli, J. Bachl, P. Huang, G. S.
Dance, S. H. Marr, J. Robert, J. E. Wedekind, H. C. Smith, and A. Bottaro. 2006.
Structural phylogenetic analysis of activation-induced deaminase function. J
Immunol 177:355-61.
20. Jimeno, S., A. G. Rondon, R. Luna, and A. Aguilera. 2002. The yeast THO
complex and mRNA export factors link RNA metabolism with transcription and
genome instability. Embo J 21:3526-35.
21. Jung, S., K. Rajewsky, and A. Radbruch. 1993. Shutdown of Class Switch
Recombination by Deletion of a Switch Region Control Element. Science
259:984-987.
22. Kaneko, S., C. Chu, A. J. Shatkin, and J. L. Manley. 2007. Human capping
enzyme promotes formation of transcriptional R loops in vitro. Proc Natl Acad Sci
U S A 104:17620-5.
23. Korzheva, N., A. Mustaev, M. Kozlov, A. Malhotra, V. Nikiforov, A.
Goldfarb, and S. A. Darst. 2000. A structural model of transcription elongation.
Science 289:619-25.
24. Lee, D. Y., and D. A. Clayton. 1998. Initiation of mitochondrial DNA
replication by transcription and R-loop processing. J. Biol. Chem. 273:30614-
30621.
25. Li, X., and J. L. Manley. 2006. Cotranscriptional processes and their
influence on genome stability. Genes Dev 20:1838-47.
26. Li, X., and J. L. Manley. 2005. Inactivation of the SR protein splicing factor
ASF/SF2 results in genomic instability. Cell 122:365-378.
27. Li, X., and J. L. Manley. 2005. New talents for an old acquaintance: the
SR protein splicing factor ASF/SF2 functions in the maintenance of genome
stability. Cell Cycle 4:1706-8.
28. Liu, M., J. L. Duke, D. J. Richter, C. G. Vinuesa, C. C. Goodnow, S. H.
Kleinstein, and D. G. Schatz. 2008. Two levels of protection for the B cell
genome during somatic hypermutation. Nature 451:841-5.
29. Longerich, S., U. Basu, F. Alt, and U. Storb. 2006. AID in somatic
hypermutation and class switch recombination. Curr. Opin. Immunol. 18:164-174.
30. Masse, E., P. Phoenix, and M. Drolet. 1997. DNA topoisomerases
regulate R-loop formation during transcription of the rrnB operon in E. coli. J.
Biol. Chem. 272:12816-12823.
31. Muramatsu, M., H. Nagaoka, R. Shinkura, N. A. Begum, and T. Honjo.
2007. Discovery of activation-induced cytidine deaminase, the engraver of
antibody memory. Adv Immunol 94:1-36.
32. Pham, P., R. Bransteitter, J. Petruska, and M. F. Goodman. 2003.
Processive AID-catalyzed cytosine deamination on single-stranded DNA
stimulates somatic hypermutation. Nature 424:103-107.
33. Ramiro, A. R., P. Stavropoulos, M. Jankovic, and M. C. Nussenzweig.
2003. Transcription enhances AID-mediated cytidine deamination by exposing
single-stranded DNA on the nontemplate strand. Nat. Immunol. 4:452-456.
34. Reaban, M. E., and J. A. Griffin. 1990. Induction of RNA-stabilized DNA
conformers by transcription of an immunoglobulin switch region. Nature
348:342-344.
35. Reaban, M. E., J. Lebowitz, and J. A. Griffin. 1994. Transcription induces
the formation of a stable RNA.DNA hybrid in the immunoglobulin alpha switch
region. J. Biol. Chem. 269:21850-21857.
36. Ronai, D., M. D. Iglesias-Ussel, M. Fan, Z. Li, A. Martin, and M. D. Scharff.
2007. Detection of chromatin-associated single-stranded DNA in regions targeted
for somatic hypermutation. J Exp Med 204:181-90.
37. Roy, D., K. Yu, and M. R. Lieber. 2008. Mechanism of R-loop formation at
immunoglobulin class switch sequences. Mol Cell Biol 28:50-60.
38. SantaLucia, J. 1998. A unified view of polymer, dumbbell, and
oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci.
USA 95:1460-1465.
39. Shinkura, R., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt.
2003. The influence of transcriptional orientation on endogenous switch region
function. Nature Immunol. 4:435-441.
40. Stavnezer, J., and C. T. Amemiya. 2004. Evolution of isotype switching.
Semin. Immunol. 16:257-275.
41. Tsai, A. G., A. E. Engelhart, M. M. Hatmal, S. I. Houston, N. V. Hud, I. S.
Haworth, and M. R. Lieber. 2008. Conformational variants of duplex DNA
correlated with cytosine-rich chromosomal fragile sites. J. Biol. Chem.
284(11):7157-64.
42. Wakae, K., B. G. Magor, H. Saunders, H. Nagaoka, A. Kawamura, K.
Kinoshita, T. Honjo, and M. Muramatsu. 2006. Evolution of class switch
recombination function in fish activation-induced cytidine deaminase, AID. Int.
Immunol. 18:41-47.
43. Wang, D., and R. Landick. 1997. Nuclease cleavage of the upstream half
of the nontemplate strand DNA in an Escherichia coli transcription elongation
complex causes upstream translocation and transcriptional arrest. J Biol Chem
272:5989-94.
44. Westover, K. D., D. A. Bushnell, and R. D. Kornberg. 2004. Structural
basis of transcription: separation of RNA from DNA by RNA polymerase II.
Science 303:1014-1016.
45. Xu, B., and D. A. Clayton. 1995. A persistent RNA-DNA hybrid is formed
during transcription at a phylogenetically conserved mitochondrial DNA
sequence. Mol. Cell. Biol. 15:580-589.
46. Xu, L., B. Gorham, S. C. Li, A. Bottaro, F. W. Alt, and P. Rothman. 1993.
Replacement of germ-line ε promoter by gene targeting alters control of
immunoglobulin heavy chain class switching. Proc. Natl. Acad. Sci. USA
90:3705-3709.
47. Yu, K., F. Chedin, C.-L. Hsieh, T. E. Wilson, and M. R. Lieber. 2003. R-
loops at immunoglobulin class switch regions in the chromosomes of stimulated
B cells. Nature Immunol. 4:442-451.
48. Yu, K., and M. R. Lieber. 2003. Nucleic acid structures and enzymes in
the immunoglobulin class switch recombination mechanism. DNA Repair 2:1163-
1174.
49. Zhang, J., A. Bottaro, S. Li, V. Stewart, and F. W. Alt. 1993. A selective
defect in IgG2b switching as a result of targeted mutation of the I gamma 2b
promoter and exon. Embo J 12:3529-37.
50. Zhao, Y., Q. Pan-Hammarstrom, Z. Zhao, and L. Hammarstrom. 2005.
Identification of the activation-induced cytidine deaminase gene from zebrafish:
an evolutionary analysis. Dev Comp Immunol 29:61-71.
CHAPTER 4
PHYSICAL AND MECHANICAL ASPECTS OF THE DNA AND RNA THAT
INFLUENCE R-LOOP FORMATION
Introduction
We have attempted to study the mechanistic aspects of R-loop formation
and have presented some of the experiments done towards understanding the
biochemical and physico-structural properties of R-loop structures. While some
mechanistic properties have now been identified and the basic principles have
been defined, we are in the process of performing more experiments to address
the questions that remain.
In Chapter 2, we described the mechanism of R-loop formation and
defined the requirement of G-richness and the importance of G-clustering and
high G-density in the nontemplate strand DNA of R-loop forming regions. In
Chapter 3, we further dissected the role of G-clusters and G-richness in R-loop
formation and determined that G-clustered regions serve very efficiently as
nucleation sites (R-loop initiation zones, RIZ) because of the high thermodynamic
stability of the hybrid between a G-clustered transcript and its complementary
C-clustered template DNA strand. G-clusters perform this function most
effectively when they lie towards the free end of the transcript, upstream of the
transcribing RNA polymerase; a merely G-rich region cannot act efficiently as an
RIZ. However, R-loop formation also needs an R-loop elongation zone (REZ)
around the RIZ where the R-loop structure can be stabilized. The REZ still has to
be G-rich but does not need to be G-clustered.
With these results, we have described most of the regional DNA
sequence-dependent properties of R-loop formation. We now want to test the
effects of DNA sequence position, structure and conformation on R-loop
formation. The experiments on minimal substrates described in the previous
chapters are valid in purified systems; we now want to incorporate DNA features
that would make our observations more comparable with the in vivo situation and
help us understand the properties of in vivo R-loop formation. In the following
sections, we describe some preliminary experiments that we have done or are in
the process of performing, and some of our future directions.
Materials and Methods
Plasmids and oligonucleotides
pDR72 was constructed by annealing DR120 (5’-
TGCAGCTTGCAATCGCTAGCTCAATATCCTAACTATCATACCTATCACATAATCAC-3’)
and DR121 (5’-
TGCAGTGATTATGTGATAGGTATGATAGTTAGGATATTGAGATTGCAAGC-3’)
and cloning the resulting dsDNA into the XhoI site of pDR51 (described in
previous chapters) so that the DR120 sequence is the nontemplate strand during
transcription from the T7 promoter.
pDR72A, pDR72C and pDR72B were made in a similar way as described for
pDR18A, pDR18C or pDR18B in chapter 3. pDR87 was made by site directed
mutagenesis on pDR18 (described in Chapter 2) using oligonucleotides DR130
(5’-GAGCTGGTTTAGTGAACCTCAGCTCCGCTAGCGCTACCGG-3’) and
DR131 (5’-CCGGTAGCGCTAGCGGAGCTGAGGTTCACTAAACCAGCTC-3’) to
introduce an Nt.BbvCI site (CC*TCAGC) on the nontemplate strand DNA
upstream of the T7 promoter. The T after the * is at position -39 relative to the
transcription start site (TSS). pDR88 was
made in a similar manner to introduce an Nt.BbvCI site between the T7 promoter
and the start of the switch repeats so that the T after the * is now at +10, using
oligonucleotides DR132 (5’-
GACTCACTATAGGGCGAACCTCAGCTCTCGAGGGGTGCTG-3’) and DR133
(5’-CAGCACCCCTCGAGAGCTGAGGTTCGCCCTATAGTGAGTC-3’). pDR89
was made by digesting pDR87 with XhoI to remove the 4Sγ3 repeats and
religating the backbone. pDR90 was constructed similarly by removing the XhoI
fragment containing the 4Sγ3 repeats from pDR88. pZZ6 was made by taking
pDR72 and performing site-directed mutagenesis to introduce an Nt.BbvCI site
immediately after the T7 promoter using oligonucleotides ZZ07 (5’-
CTCACTATAGGGCGAACCTCAGCTCTCGAGCTTGCAATCT-3’) and ZZ08 (5’-
AGATTGCAAGCTCGAGAGCTGAGGTTCGCCCTATAGTGAG-3’). The T after
the cut site is at +10. pZZ7 was made similarly using ZZ09 (5’-
ACTATCATACCTATCACCTCAGCACTCGAGGGGTGCTGGG-3’) and ZZ10 (5’-
CCCAGCACCCCTCGAGTGCTGAGGTGATAGGTATGATAGT-3’) to introduce
the nick site immediately upstream of the start of the switch repeats. The T after
the cut site is at +70.
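The mutagenesis oligonucleotide pairs listed above can be sanity-checked with a short script. The following is only a sketch: it confirms that each pair is mutually reverse-complementary, locates the Nt.BbvCI site (CCTCAGC), and, for oligonucleotides that span the end of the T7 promoter, reports the position of the T after the nick. It assumes that the final G of CACTATAG is transcript position +1 (the usual T7 convention, not stated in this chapter); the names `MUTAGENESIS_PAIRS` and `nick_position` are introduced here for illustration.

```python
# Oligonucleotide pairs copied from the text above (5'->3').
MUTAGENESIS_PAIRS = {
    "pDR87": ("GAGCTGGTTTAGTGAACCTCAGCTCCGCTAGCGCTACCGG",   # DR130
              "CCGGTAGCGCTAGCGGAGCTGAGGTTCACTAAACCAGCTC"),  # DR131
    "pDR88": ("GACTCACTATAGGGCGAACCTCAGCTCTCGAGGGGTGCTG",   # DR132
              "CAGCACCCCTCGAGAGCTGAGGTTCGCCCTATAGTGAGTC"),  # DR133
    "pZZ6": ("CTCACTATAGGGCGAACCTCAGCTCTCGAGCTTGCAATCT",    # ZZ07
             "AGATTGCAAGCTCGAGAGCTGAGGTTCGCCCTATAGTGAG"),   # ZZ08
    "pZZ7": ("ACTATCATACCTATCACCTCAGCACTCGAGGGGTGCTGGG",    # ZZ09
             "CCCAGCACCCCTCGAGTGCTGAGGTGATAGGTATGATAGT"),   # ZZ10
}

NICK_SITE = "CCTCAGC"          # Nt.BbvCI nicks between CC and TCAGC
T7_PROMOTER_END = "CACTATAG"   # assumption: the final G is position +1

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def nick_position(top_strand):
    """Position of the T after the nick, counted from +1 of the transcript.

    Returns None when the oligonucleotide lacks the nick site or the end of
    the T7 promoter, or when the nick lies upstream of +1 (negative
    numbering is not handled in this sketch).
    """
    site = top_strand.find(NICK_SITE)
    promoter = top_strand.find(T7_PROMOTER_END)
    if site < 0 or promoter < 0:
        return None
    plus_one = promoter + len(T7_PROMOTER_END) - 1   # index of the +1 G
    first_base_after_nick = site + 2                 # the T in CC*TCAGC
    if first_base_after_nick < plus_one:
        return None
    return first_base_after_nick - plus_one + 1


for plasmid, (top, bottom) in MUTAGENESIS_PAIRS.items():
    paired = reverse_complement(top) == bottom
    print(plasmid, "pair is complementary:", paired,
          "| T after nick at:", nick_position(top))
```

Under the stated assumption, the DR132 and ZZ07 oligonucleotides both place the T after the nick at +10, in agreement with the positions given for pDR88 and pZZ6. The oligonucleotides for pDR87 and pZZ7 do not span the promoter end, so their positions (-39 and +70) cannot be checked from the oligonucleotide sequences alone.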
In vitro transcription
In vitro transcription was done the same way as described in previous
chapters. For nicked substrates, the plasmids were incubated with or without
Nt.BbvCI overnight, and the enzyme was then heat-inactivated at 72 °C
for 20 minutes. Nicking was checked by running the samples on an
agarose gel and post-staining with ethidium bromide to locate the ‘nicked circular’
(NC) band. The nicked substrates were then linearized with SalI, and
linearization was confirmed before the substrates were used for transcription.
All other methods used (in vitro transcription in the presence of ³²P-α-UTP, gel
analysis of transcription-induced shifted species, enrichment of R-loop species by
excision of the shifted band, sodium bisulfite treatment, colony lift hybridization,
etc.) were the same as the methods described in previous chapters. DR075 was
used as the radiolabeled probe to look for R-loop molecules in colony lift
hybridization with transcribed supercoiled pDR72A.
Results and Discussion
Effect of Distance between the Transcription Start Site (TSS) and the R-loop
Forming Region
In the genomic immunoglobulin heavy chain locus, the switch regions are
several kilobases apart from their respective cytokine promoters and are
preceded by the I-exons. Unlike the G-rich switch regions, the DNA sequence
between the promoter and the switch regions is not G-rich. For example, the
mouse Sγ3 region is about 0.9 kb downstream of the Cγ3 promoter. To study the
effect of distance, we took our switch substrate pDR51 with three Sγ3 repeats
(as defined previously) and modified it by introducing a random DNA sequence,
one repeat in length, between the T7 promoter and the first switch repeat; we
named the new substrate pDR72. Thus, while both pDR51 and pDR72 have three
wild-type Sγ3 repeats, the repeats in pDR72 are located one repeat-length
further downstream than in pDR51. Upon transcription, we found that the amount
of transcription-induced shift decreases drastically for pDR72 compared with
pDR51, showing a clear inhibitory influence on R-loop formation of increasing
distance between the TSS and the G-rich region (compare lanes 2-4 for pDR51
with lanes 7-9 for pDR72 in Fig. 4.1).
During transcription, pDR72 generates a transcript that is not G-rich in the
first ~50nts and is then followed by a G-rich transcript corresponding to the
transcription of the switch regions (Appendix C, Fig. C.1). The non-G-rich part of
the transcript does not form a stable RNA:DNA hybrid efficiently and remains
single-stranded RNA. The G-rich part of the transcript can form a hybrid with the
template strand DNA, but the preceding RNA, being free in solution, can exert
force on the hybridizing part of the RNA and ‘peel’ the transcript off
the RNA:DNA hybrid. The distance effect would be governed by the
stochastic nature of effective collisions between the G-rich regions of the RNA
and the template strand, and probabilistic advantage lies with the G-rich regions
closer to the 5’ end of the transcript. Therefore, pDR51 would show a higher
amount of R-loop formation. Moreover, peeling of the hybridized part of the
RNA by the non-hybridizing upstream portion of the RNA would reduce R-loop
formation at distant switch regions even further. As a corollary to this, if this
non-hybridizing portion of the RNA is removed, then R-loop formation at the
distant switch regions should improve. To test this hypothesis, we performed in
vitro transcription of pDR51 and pDR72 in the absence or presence of RNase A
during transcription. RNase A cuts at Cs and Us in the RNA and therefore
digests G-rich RNAs poorly. RNase T1 would be unsuitable for such a
study because it cuts at Gs in the transcript and
would therefore destroy the G-rich part of the transcript. The initial ~50 nt of the
RNA generated during transcription of pDR72 is non-G-rich (non-hybridizing
portion of the transcript) whereas the rest of it is G-rich. Although the presence of
RNase A during transcription would affect overall R-loop formation on both
substrates, the non-G-rich part of the transcript would be digested more than the
G-rich portion. This should cause a relative increase in R-loop formation efficiency
at pDR72, both by removing the non-hybridizing portion of
the transcript and by the resulting increase in molecular mobility as
the RNA bulk that can otherwise peel the hybrid off is reduced (Appendix C, Fig. C.1,
panel B). When RNase A was not present during transcription, the amount of
shift observed for pDR72 (Fig. 4.1, lanes 7-9) was ~12% of the shift
observed for pDR51 (Fig. 4.1, lanes 2-4), as judged by the radiolabel intensities
(calculated by dividing the radiolabel intensities at the shifted positions for pDR72
by those for pDR51 for each set; calculations not shown). However, in the presence of
RNase A during transcription, the amount of shift at pDR72 was ~55% of the shift
observed for pDR51, indicating a more than four-fold increase in relative R-loop
formation efficiency (compare lanes 17-19 for pDR72 with lanes 12-14 for pDR51 in Fig.
4.1). The overall decrease in shift values for pDR51 between transcription without and with
RNase A (Fig. 4.1, compare lanes 2-4 with lanes 12-14, for presence
of RNase A after or during transcription, respectively) indicates a reduction in R-loop
formation because of digestion (albeit inefficient) of the G-rich transcript at the switch
repeats by RNase A.
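The fold change quoted above follows directly from the two reported ratios. As a minimal sketch, using only the rounded percentages from the text (the raw radiolabel intensities are not shown in this chapter, and the variable names are illustrative):

```python
# Shift of pDR72 relative to pDR51, as reported in the text (rounded values).
relative_shift_rnase_after = 0.12    # RNase A added after transcription
relative_shift_rnase_during = 0.55   # RNase A present during transcription

# Fold increase in the relative R-loop formation efficiency of pDR72.
fold_increase = relative_shift_rnase_during / relative_shift_rnase_after
print(f"fold increase in relative shift: {fold_increase:.1f}")
```

Because both inputs are approximate, the result should be read only as "a little over four-fold", consistent with the statement in the text.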
Figure 4.1. Effect of distance and the non-hybridizing portion of RNA on R-loop
formation.
Linearized pDR51 (3 Sγ3 repeats after the T7 promoter) or pDR72 (3 Sγ3
repeats separated from the T7 promoter by a random sequence one repeat in length)
substrates were either mock transcribed (lanes 1 and 11 for pDR51 and lanes 6
and 16 for pDR72, respectively) or transcribed with T7 RNA polymerase in the
presence of ³²P-α-UTP. The samples were treated with RNase A either after
transcription (lanes 2-4 and 7-9 for pDR51 and pDR72, respectively) or during
transcription (lanes 12-14 and 17-19 for pDR51 and pDR72, respectively). The
fifth lane in each set is a transcribed sample treated with RNase A and RNase
H1 (lanes 5, 10, 15 and 20). The top panel is the ethidium-stained gel profile.
The position of the linear switch substrate containing fragment can be deduced
from the first lane of each set. R-loop molecules run slower than the linear
fragments and are seen as a shifted band running higher than the linear position.
The shifted band is not present in the RNase H-treated lane. The ³²P-α-UTP
radiolabel profile of the same gel is shown in the lower panel. Most of the
radiolabel localizes with the shifted bands, but not with the linear fragments, and
is not seen in the mock-transcribed lanes or in the RNase H-treated samples at
either position. The effect of increasing distance between the promoter and the
switch repeats on R-loop formation is seen by comparing lanes 2-4 for pDR51
with lanes 7-9 for pDR72. The radiolabel intensities at the shifted positions were
calculated, and the average shift in lanes 7-9 (for pDR72) was found to be ~12%
of the value in lanes 2-4 (for pDR51). When RNase A was present during
transcription, the overall radiolabel values for pDR51 in lanes 12-14 were reduced
because of cutting by RNase A. However, the relative shift of
pDR72 to pDR51 increased to ~55%, a more than four-fold increase. This increase is
the result of preferential digestion of the non-G-rich RNA that precedes the G-rich
RNA in the transcript (also see Fig. C.1 in Appendix C for the model explaining this
observation).
Effect of G-clustered RIZs at a Distance from G-rich Switch Repeats.
In the genome, while the defined G-rich switch repeats are located at large
distances from the promoters, G-clusters often are found between these two
features, as random sequence features or as parts of degenerate repeats. We
wondered if these G-clusters that are located far upstream of the switch repeats
can influence R-loop formation. We introduced short sequences containing two,
one or no GGGG-clusters (defined in Chapter 3 as motifs A, C and B, respectively) into
pDR72 at a location downstream of the T7 promoter but ~50 bp upstream of
the start of the three wild-type switch repeats. In effect, we made three more
substrates derived from pDR72 in which the RIZ motifs A (2x4G), C (1x4G) or
B (0x4G, random sequence) were placed upstream of the switch repeats (REZ)
and separated from them by a random spacer sequence of ~50 bp. The new
substrates were called pDR72A, pDR72C and pDR72B, respectively (Fig. 4.2, top
panel). Transcription-induced shift assays on linearized substrates revealed that
the G-clusters could improve R-loop formation efficiency even when they are
located at a distance from the switch repeats. pDR72B showed the baseline
amount of shift without any G-clustering upstream (Fig. 4.2A and 4.2B, lanes 12-14),
whereas R-loop formation improved for both pDR72A (2x4G)
and pDR72C (1x4G) (Fig. 4.2, lanes 2-4 for pDR72A and lanes 7-9 for pDR72C).
The improved shift with pDR72C showed that even one G-cluster is
capable of improving R-loop formation efficiency at downstream switch regions
across random sequences.
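The distinction between a G-clustered motif and a non-G-rich spacer can be made concrete by counting runs of four or more consecutive Gs. This sketch uses the motif A sequence given in the legend of Fig. 4.3 and the DR120 oligonucleotide from Materials and Methods as the nontemplate-strand spacer. The function names and the way the 4-G threshold is applied here are illustrative assumptions, not the scoring procedure used in the thesis.

```python
import re

MOTIF_A = "GGGGTGCTGGGG"   # motif A (2x4G), from the legend of Fig. 4.3
# DR120, the nontemplate-strand oligonucleotide used to build the pDR72 spacer.
SPACER_DR120 = "TGCAGCTTGCAATCGCTAGCTCAATATCCTAACTATCATACCTATCACATAATCAC"


def g_clusters(seq, min_run=4):
    """Return (start, end) spans of runs of at least `min_run` consecutive Gs."""
    pattern = r"G{%d,}" % min_run
    return [match.span() for match in re.finditer(pattern, seq.upper())]


def g_fraction(seq):
    """Return the fraction of bases in `seq` that are G."""
    seq = seq.upper()
    return seq.count("G") / len(seq)


for name, seq in (("motif A", MOTIF_A), ("DR120 spacer", SPACER_DR120)):
    print(f"{name}: {len(g_clusters(seq))} G-cluster(s), "
          f"G fraction {g_fraction(seq):.2f}")
```

Motif A yields two clusters, as its 2x4G label indicates, whereas the spacer contains none and has a low G content, which matches its description as a random, non-G-rich sequence.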
To determine the location of R-looped regions in these substrates, we
sequenced some R-loop-derived molecules after cutting out the shifted band
of linearized pDR72A and treating it with sodium bisulfite. Of the
23 molecules that were R-looped (single-stranded on the nontemplate strand) at
the switch repeats (REZ), none had single-strandedness extending to
the A motif RIZ (2x4G-clusters), located ~50 bp upstream of the start of the switch
repeats (Fig. 4.3, molecules depicted above the red line). This shows that while
G-clusters can improve R-loop formation at the switch repeats by acting as RIZs,
the R-loop structures do not start at these G-clusters because there is no
suitable REZ around them. Because the RIZ motifs are small
and lack a longer G-rich REZ nearby, the short nucleating
RNA:DNA hybrids formed at the G-clusters are less stable and are likely to
dissociate more easily than the longer R-loop regions at the downstream switch
repeats (see Fig. C.2 in Appendix C for an explanatory model). Therefore, we do
not see much R-loop-induced single-strandedness on the nontemplate strand
at the G-clustered motifs upstream of the switch repeats.
Figure 4.2. Presence of RIZ G-clusters at a distance upstream of the
switch repeat REZ enhances R-loop formation efficiency.
The top of the figure shows the presence of RIZ (red block) motifs A
(2x4G-clusters), C (1x4G-clusters) or B (0x4G-clusters) upstream of the 3 Sγ3
switch repeats (solid black arrows) separated from them by a 1-repeat long
random sequence in pDR72A, pDR72C or pDR72B respectively. Linearized
substrates were transcribed and run on agarose gel to check for the presence of
shifted species in the transcribed samples treated with RNase A after
transcription but not with RNase H1.
(A) This panel shows the ethidium bromide stained gel. Lanes 1-5 have
pDR72A, lanes 6-10 have pDR72C and lanes 11-15 have pDR72B. Lanes 1, 6
and 11 contain the mock-transcribed samples for pDR72A, pDR72C and
pDR72B respectively. The next three lanes in each set contain T7 transcribed
substrates treated with RNase A afterwards. The last lanes in each set contain
transcribed sample treated with RNase A and RNase H1 afterwards. The
position of the linearized fragment containing the switch region is marked ‘L’.
The position of the transcription-induced shifted species, seen in lanes 2-4 for
pDR72A, 7-9 for pDR72C and 12-14 for pDR72B is marked ‘Shift’.
(B) This panel shows the radiolabel densities at the ‘Shift’ positions.
Both panels A and B show an increase in the shift and in the radiolabel
density of the shifted species from pDR72B to pDR72C to pDR72A as the
number of G-clusters in the RIZ motif increases, indicating that the G-clustered
motifs can influence R-loop formation at downstream G-rich switch repeats
separated from them by random non-G-rich sequences. This effect is seen because
formation of short RNA:DNA nucleation hybrids at the G-clusters can keep the
RNA transcript in close proximity to the transcribing substrate, which can help
establish the R-loop at the downstream switch repeats.
Effect of DNA Supercoiling on R-loop Formation.
While the mammalian chromosomes are all linear and have defined 5’ and
3’ ends, the structural organization of the genomic DNA is determined by various
factors that affect local conformation of the DNA. Binding of histones in
chromatin, protein scaffold binding to DNA, DNA elements such as SARs and
MARs, biased attachment and binding of other DNA binding proteins, and even
the local sequence behavior introduce and determine the local conformational
variations in the DNA. These variations can influence biological activities based
on their conformational states (1). As an example, the FUSE element in the c-
myc locus is thought to acquire transcription induced dynamic negative
supercoiling that can then lead to binding of structure specific binding proteins.
This strongly suggests that changes in DNA conformation can contribute to
differential gene regulation. The twin domain model of Liu and Wang (2, 3)
proposes that transient negative supercoiling trails the elongating RNA
polymerase whereas positive supercoiling accumulates downstream of the RNA
polymerase. It is also known that negative supercoiling of DNA promotes
transcription from a promoter, partly because it promotes DNA strand separation.
We wanted to test how negative supercoiling would influence R-loop
formation. All the experiments described so far were done
using linearized substrates because linear substrates allow the
effect of DNA sequence features to be studied without the influence of
supercoiling. The experiments with linearized DNA were more stringent because
they removed any supercoiling effect, which can influence not only R-loop formation
but also transcription. To study the effect of supercoiling, we therefore chose to
examine the maps and distribution of R-loop regions (extent and location of
single-strandedness on the nontemplate strand) instead of comparing the frequency of
R-loop formation, which could be biased by differential
transcription.
We chose the negatively supercoiled plasmid version of pDR72A for these
studies. Having seen the R-loop locations in the linearized version of pDR72A,
we hypothesized that negative supercoiling might cause the random sequence
spacer region between the A-motif RIZ (2x4G-clusters) and the switch region
REZ to be in R-loop conformation, unlike the linear version R-loops where the R-
loops were all contained within the REZ, with none of the R-loop molecules
showing conversions at the random sequence spacer upstream of the REZ. We
transcribed supercoiled pDR72A plasmid in vitro, treated the DNA with sodium
bisulfite, and then selected molecules that were R-looped at the switch
region (REZ) by the colony lift hybridization method. Upon sequencing, we found
that of 13 molecules that were single-stranded (R-looped) at the REZ, 5
molecules (~38%) showed continuous single-strandedness beginning from
either the promoter or the RIZ motif A. Two more molecules (~15%) exhibited
single-stranded stretches of >25 nt at the promoter or the RIZ motif A that were
not continuous with the single-stranded regions of the REZ switch repeats. Even
more interestingly, 3 of these 13 molecules (~23%) also showed substantial
extension of the single-stranded region downstream of the end of the REZ switch
repeats (molecules depicted in Fig. 4.3 below the horizontal red line). These
results show that negative supercoiling in DNA can promote R-loop
formation and reduce the dependence of R-loop formation on the G-richness of
the DNA substrate. Figure C.3 in Appendix C depicts the model of R-loop
formation at supercoiled substrates.
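The idea of scoring single-stranded stretches from bisulfite sequence data can be sketched in a few lines. In the example below, a stretch is a run of consecutive reference Cs on the nontemplate strand that all read as T, and only stretches longer than 25 nt are reported, echoing the >25 nt figure quoted above. The scoring rule, the function name `converted_tracts` and the toy sequences are assumptions made for illustration; the criteria actually used are those described in the previous chapters.

```python
def converted_tracts(reference, read, min_span=25):
    """Find stretches in which every reference C reads as T in the clone.

    Returns (first_C_index, last_C_index) pairs, in reference coordinates,
    for stretches spanning more than `min_span` nucleotides.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    tracts = []
    start = last = None
    for index, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue                      # only Cs report on single-strandedness
        if read_base == "T":              # converted C: extend the current stretch
            if start is None:
                start = index
            last = index
        else:                             # unconverted C: close the current stretch
            if start is not None and last - start + 1 > min_span:
                tracts.append((start, last))
            start = last = None
    if start is not None and last - start + 1 > min_span:
        tracts.append((start, last))
    return tracts


# Toy example (not thesis data): Cs at even positions, converted up to index 30.
toy_reference = "CA" * 20
toy_read = "".join("T" if base == "C" and i <= 30 else base
                   for i, base in enumerate(toy_reference))
print(converted_tracts(toy_reference, toy_read))

# Fractions of the 13 supercoiled molecules reported in the text.
for label, count in (("start at promoter/motif A", 5),
                     ("separate >25 nt stretch", 2),
                     ("extend past the REZ", 3)):
    print(f"{label}: {count}/13 = {100 * count / 13:.0f}%")
```

Requiring every C in a stretch to be converted is a deliberately strict choice for this sketch; a tolerance for occasional unconverted Cs could be added if the data called for it.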
Figure 4.3. The location and extent of R-loop structures in linearized and
negatively supercoiled versions of T7 transcribed pDR72A.
The topmost line represents the length of pDR72A tested for the presence
of R-loop molecules and each short vertical line represents a C that can
potentially be converted to T upon sodium bisulfite treatment following
transcription. The thick green line, the red triangle and the asterisk represent, in
order, the Cs in the T7 promoter, the C in motif A (GGGGTGCTGGGG), and the
methylated C in a dcm site CCᵐᵉᵗ(A/T)GG, where the methylated second C
escapes bisulfite modification. The first of the three vertical lines represents the
position of the C in motif A, whereas the other two vertical lines represent the first
and the last Cs in the switch repeat sequence. Each of the horizontal lines below
the full display of Cs represents an observed R-loop molecule with each short
vertical line representing an observed C-to-T conversion. The molecules above
the red horizontal line are the linearized and transcribed pDR72A molecules
containing R-loop structures. The molecules below the red line are the
supercoiled version of the same substrate in R-loop conformation. R-loops on
the 23 linearized molecules shown here are mostly contained within the switch
repeats and there are only rare conversions near the promoter or the A motif. In
contrast, out of 13 supercoiled molecules with R-loop structures, 5 start near the
promoter or the G-clustered A motif, and 3 continue beyond the end of the switch
repeats. Some molecules that do not show a continuous conversion pattern still
show a comparatively higher rate of conversion around the promoter/motif region
thus attesting to the importance of DNA conformation in R-loop formation.
Effect of DNA Single-Strand Nicks on R-loop Formation
It is estimated that each cell is subject to about 10,000 to 100,000
individual molecular lesions in DNA per day. Regular cellular metabolism as well
as environmental factors such as UV and other radiation can inflict DNA damage.
Some of these lesions are DNA nicks or double-strand breaks (DSBs), which can
arise directly from damage or as intermediates in DNA repair pathways. Such DNA damage
can cause both structural and functional abnormalities of DNA. DNA nicks can
also cause substantial conformational change in regions that maintain a different
conformational identity separated from the surrounding DNA by insulators or
chromatin or some other resident DNA structures. We wondered if a DNA nick
would have any effect on R-loop formation.
To test the effect of DNA nicks on R-loop formation, we made two
substrates with a nicking enzyme site upstream or downstream of the promoter.
We took pDR18 (described earlier) which contains 4 wild-type mouse Sγ3
repeats downstream of a T7 promoter and made two variants, pDR87 and
pDR88. In pDR87, site-directed mutagenesis was used to introduce
a site for the nicking enzyme Nt.BbvCI upstream of the T7 promoter so
that the nick lies 39 nt upstream of the transcriptional start site
(TSS). Nt.BbvCI recognizes its site in double-stranded DNA but nicks only
one strand, at the position marked by the asterisk (CC*TCAGC). In pDR88, an Nt.BbvCI
site was introduced downstream of the promoter at +10 nt but upstream of the
start of the switch repeats.
For doing the experiments, we took supercoiled forms of pDR18, pDR87
and pDR88, and treated them with Nt.BbvCI to introduce the nick. pDR18 has no
recognition site for the enzyme. After confirming that pDR87 and pDR88 had
been nicked (most of the nicked plasmid DNA ran at the ‘nicked
circular’ position as opposed to the ‘supercoiled’ position), we linearized them
before transcription. Figure 4.4 shows the experiment done with linearized
nicked substrates. We found that the nicked substrates exhibited greatly
increased R-loop formation efficiency, but only when the nick was present downstream of the
promoter and upstream of the switch repeats. There was not much difference in
R-loop formation efficiency between the unnicked substrate and the substrate with
a nick upstream of the promoter. This shows that a promoter-downstream nick can
help in R-loop formation by causing the transient removal of the nontemplate
strand after the nicked position during transcription elongation. This gives the
RNA transcript an increased opportunity to compete with the nontemplate DNA
strand for hybridizing with the template strand DNA. The nontemplate strand is
less able to re-form a duplex with the template strand because the
continuity of the strand is broken at the nick. Loss of the physical
connection between the duplexed portion of the DNA (upstream of the nick) and
the region made single-stranded during transcription at and after the nick leaves
the substrate unable to use the zippering action of DNA strand reannealing
during duplex formation upstream of the transcribing RNA polymerase. This
makes it more likely for the RNA to hybridize with the template strand DNA.
Because RNA:DNA hybrids are thermodynamically more stable than the DNA
duplex, the nontemplate strand is even less likely to displace the RNA once it is
hybridized with the template strand DNA. Figure C.4 in Appendix C presents a
model of R-loop formation at nicked substrates when the nick is present
upstream (panel A) or downstream (panel B) of the promoter.
Since the gel mobility shift for pDR88 was markedly stronger than
the mobility shifts for pDR18 (no nick) or pDR87 (nick upstream of the
promoter), we wondered whether a nick downstream of a promoter can initiate R-loop
formation in the absence of a supportive G-rich switch region. Therefore, we made
pDR89 and pDR90, in which nick sites were introduced either upstream (pDR89)
or downstream (pDR90) of the T7 promoter, but which had no switch
repeats downstream of the T7 promoter. As a control substrate with no switch
sequences and no nick site, we used pDR16. Except for the position of the
nicking sites in pDR89 and pDR90, there is no other difference between them
and pDR16. We did the same experiment as described above (for pDR18,
pDR87 and pDR88) with these new substrates (pDR16, pDR89 and pDR90) and
observed that while there were no differences between pDR16 (no nick) and
pDR89 (nick upstream of promoter) or between the nicked and unnicked versions
of pDR89, the nicked version of pDR90 (nick downstream of the promoter)
showed the presence of a discernible shifted species (Fig. 4.5). This indicates
that even a non-G-rich RNA can achieve some degree of hybridization (although
lower than that of a G-rich transcript) with the template strand if the
nontemplate strand is separated from the template strand at the nick during
transcription.
Figure 4.4. Presence of a nick on the nontemplate strand downstream of
the promoter increases R-loop formation efficiency at the downstream
switch repeats.
Three substrates pDR18, pDR87 and pDR88 were mock-incubated or
incubated with the nicking enzyme Nt.BbvCI, and then linearized with SalI.
pDR87 and pDR88 are derived from pDR18, and all have 4 Sγ3 repeats
downstream of the promoter, except that pDR18 does not have a Nt.BbvCI
recognition site. A nicking site is present upstream of the promoter in pDR87,
and between the promoter and the switch repeats in pDR88 (nicking sites are
represented as blue triangles). The four solid black arrows represent the four
Sγ3 repeats. For each substrate, the first three lanes are unnicked substrates
whereas the next three lanes contain the same substrate treated with Nt.BbvCI.
In each subset of lanes, the first lane is mock transcribed substrate, followed by a
T7 transcribed substrate subsequently treated with RNase A after transcription
and before loading onto the agarose gel. The third lane contains the T7
transcribed substrate treated with RNase A and RNase H1. All samples were
organically extracted before running them on an agarose gel.
(A) This is the ethidium bromide stained profile of the agarose gel.
Comparable amounts of shifted species are seen for pDR18, pDR87 and the
unnicked version of pDR88. The nicked version of pDR88 shows a much greater
amount of shift, indicating highly efficient R-loop formation as compared to
the unnicked substrates or to a substrate with a nick upstream of the promoter.
(B) This is the same gel shown in (A) and shows the location of radiolabeled
(³²P-α-UTP) bands. While almost no radiolabel is present at the position where
the switch repeat containing linearized fragments run, we observe a
high amount of radiolabel at the shifted position for nicked and transcribed
pDR88 as compared to the shifted positions in the other substrates and in unnicked
pDR88. It should be noted that this shifted position, as also observed in panel A,
runs higher than the shifted positions in other substrates and this mobility change
could be because of the presence of the partially displaced nontemplate DNA
strand. In the same lane (lane 19), an additional radiolabeled band localizes
near the linear fragment indicating that short RNA:DNA hybrids are formed at or
near the nicked position.
Figure 4.5. Presence of a nick on the nontemplate strand downstream of
the promoter increases R-loop formation efficiency downstream of the
nicked position even in the absence of switch repeats.
Three substrates pDR16, pDR89 and pDR90 were mock-incubated or
incubated with the nicking enzyme Nt.BbvCI, and then linearized with SalI.
pDR89 and pDR90 are the same as pDR16 and have no switch repeats present.
pDR16 does not have a Nt.BbvCI recognition site, whereas a nicking site is
present upstream of the promoter in pDR89 and downstream of the promoter in
pDR90 (nicking sites are represented as blue triangles). Similar to the
arrangement in Fig. 4.4, for each substrate, the first three lanes are unnicked
substrates whereas the next three lanes contain the same substrate treated with
Nt.BbvCI. In each subset of lanes, the first lane is mock transcribed substrate,
followed by a T7 transcribed substrate subsequently treated with RNase A after
transcription and before loading onto the gel. The third lane contains the T7
transcribed substrate treated with RNase A and RNase H1.
(A) This is the ethidium bromide stained gel showing the positions of the
linearized fragments containing the T7 promoter. No shifts are seen for pDR16
or pDR89 or the unnicked version of pDR90. Although small in amount, the nicked
version of pDR90 shows a shifted band, indicating that nicked pDR90 forms
R-loop structures better than unnicked pDR16, pDR89 and pDR90, or pDR89
carrying a promoter-upstream nick.
(B) This is the same gel shown in (A) and shows the location of the radiolabeled
band (labeled with ³²P-α-UTP). The shifted position in lane 19 as deduced from
(A) is radiolabeled. While no radiolabel is present at the position where the
linearized fragments run, we observe the presence of a comparatively higher
amount of radiolabel at the shifted position for nicked and transcribed pDR90 as
compared to other substrates. In the same lane (lane 19), an additional
radiolabeled band localizes near the linear fragment indicating that short
RNA:DNA hybrids are formed at or near the nicked position, even in the absence of
an R-loop supporting switch region.
We next tested the influence of nicks at different
downstream positions along the R-loop substrates. The promoter-upstream
nicks, as deduced from the experiments discussed above, do not have much effect
on R-loop formation (Figs. 4.4 and 4.5). In the two sets of experiments described
in the previous paragraphs, the promoter-downstream nicks were positioned
close to the promoter, and in pDR88, it was also close to the downstream 4 Sγ3
switch repeats. To test if nicks are more effective closer to the promoter or
closer to the switch region, we took pDR72 and made two more substrates where
we introduced a promoter downstream nicking site either close to the T7
promoter (in pZZ6) or far downstream of the promoter but immediately upstream
of the 3 Sγ3 switch repeats (in pZZ7). If pDR72, pZZ6 and pZZ7 are aligned, the
nicking sites in pZZ6 and pZZ7 are 60 nt apart (Appendix C Fig. C.5) and the
sequence between the nick sites of pZZ6 and that of pZZ7 is random non-G-rich
sequence.
Upon transcription, we found that while there is no difference between the
unnicked versions of the three linearized substrates used, the linearized nicked
pZZ6 showed the most efficient R-loop formation (Fig. 4.6, lane 4 in second set
of substrates). The linear nicked pZZ7 (Fig. 4.6, lane 4 in third set of substrates)
was also more efficient in R-loop formation as compared to pDR72 (Fig 4.6, first
set of substrates) but was less efficient as compared to pZZ6. Therefore, a
promoter-downstream nontemplate strand nick is more efficient in initiating R-loop
formation, even at a distant R-loop forming site, if the nick is closer to the
promoter.
Appendix C, Figure C.5 shows the model of R-loop formation at these
nicked substrates. In pZZ6 (top panel), the position of the nick is very close to
the TSS (or the 5’ end of the transcript; between +9 and +10). Therefore, during
transcription, the lengths of the RNA and the DNA (from the nick) are comparable
behind the RNA polymerase. This means that the molecular mobilities of the ends
are also comparable. Thus, the transcript and the DNA end have roughly
equal chances to hybridize with the template strand DNA. However, in nicked
pZZ7 (bottom panel), the TSS is 69 nt upstream of the nick. Therefore, during
transcription near the TSS, the transcript is single-stranded more often because
of its lack of G-richness and also because the two DNA strands can reanneal
behind the RNA polymerase more efficiently in the absence of a nick. When the
polymerase reaches the nick, the transcript is already ~70 nt long, and even
though the DNA strands would reanneal less efficiently because of the nick (as
explained in previous sections) than they would without a nick, the RNA would
be even more disadvantaged because of the extra bulk of its unhybridized
portion. Besides, because the G-rich portion lies well within the transcript,
its molecular mobility is also decreased. In comparison, the 5’ DNA end at the
nick is shorter than the RNA and is therefore more mobile. This increases the
collision frequency between the nicked nontemplate DNA strand and the template
strand DNA, resulting in DNA duplex formation more often than in nicked pZZ6.
Even if an RNA:DNA
hybrid is formed between the non-G-rich portion of the RNA and the template
DNA upstream of the nick in pZZ7, the nontemplate strand can displace this RNA
more efficiently during reannealing because of a weaker thermodynamic stability
of such RNA:DNA hybrids as compared to hybrids formed between G-rich
transcripts and C-rich DNA strand.
However, even with such factors causing suppression of R-loop formation
at nicked pZZ7, it shows an improved R-loop formation efficiency when
compared with pDR72 (no nick) (Fig. 4.6, compare the 4th lanes of the first and
third sets of substrates for pDR72 and pZZ7, respectively). This demonstrates that R-loops
form more readily if there is a nick downstream of the promoter.
Figure 4.6. Presence of a nick on the nontemplate strand close to
the promoter increases R-loop formation efficiency at downstream
switch repeats separated from the promoter more than a nick
present far downstream of the promoter and immediately upstream of the
switch repeats.
Three substrates pDR72, pZZ6 and pZZ7 were mock-incubated or
incubated with the nicking enzyme Nt.BbvCI, and then linearized with SalI before
transcription. pZZ6 and pZZ7 are similar to pDR72 and have 3 Sγ3 repeats
present, with a random sequence 1 repeat long spacer DNA between the
promoter and the start of the switch repeats. pDR72 does not have a Nt.BbvCI
recognition site, whereas the nontemplate strand nicking site on pZZ6 is present
at the start of the 1 repeat long random spacer sequence, immediately
downstream of the promoter. pZZ7 has a nicking site at the end of the 1 repeat
long random spacer sequence downstream of the promoter and immediately
upstream of the start of the switch repeats (nicking sites are represented as blue
triangles). For each substrate, the first two lanes are unnicked substrates
whereas the next two lanes contain the same substrate treated with Nt.BbvCI. In
each subset of lanes, the first lane contains a mock-transcribed substrate,
whereas the second lane contains a T7 transcribed sample of the linearized
substrate, treated with RNase A after transcription and before loading onto the
gel.
(A) This is the ethidium bromide stained gel showing the positions of the
linearized fragments containing the T7 promoter. Whereas almost no shift could
be observed for transcribed pDR72 (lanes 2 and 4), a very strong shift pattern
was seen for nicked and transcribed pZZ6 (lane 8) but not for unnicked and
transcribed pZZ6 (lane 6). The two distinct bands are probably two different conformational
variants of the R-loop structure, with the top band containing the nontemplate
strand displaced at the spacer region while the lower band represents the
conformation where the nontemplate strand at the spacer is duplexed with the
template strand. pZZ7 showed a visible shift in the nicked and transcribed
substrate (lane 12) that was weaker than the corresponding lane for pZZ6 (lane 8)
but stronger than the corresponding lane for pDR72 (lane 4), showing that the
presence of a promoter-downstream nick can help in R-loop formation even if it is
located at a distance from the promoter.
(B) This is the same gel shown in (A) showing the location of radiolabeled
(³²P-α-UTP) bands. The shifted position in lane 8 as deduced from (A) is
radiolabeled. The shifted position for nicked and transcribed pZZ7 is also
radiolabeled (lane 12). While the amount of radiolabel in this band (lane 12) is
weaker than the corresponding lane for pZZ6 (lane 8), it is much stronger than
transcribed pDR72 (lanes 2 and 4). While no radiolabel is present at the position
where the linearized fragments run for the unnicked versions of the three
substrates, a radiolabeled band at this position for nicked and transcribed pZZ6
and pZZ7 is seen.
Chapter 4 References
1. Kouzine, F., S. Sanford, Z. Elisha-Feil, and D. Levens. 2008. The
functional response of upstream DNA to dynamic supercoiling in vivo. Nat.
Struct. Mol. Biol. 15(2):146-54.
2. Liu, L. F., and J. C. Wang. 1987. Supercoiling of the DNA template during
RNA transcription. Proc. Natl. Acad. Sci. USA 84:7024-7027.
3. Tsao, Y. P., H. Y. Wu, and L. F. Liu. 1989. Transcription-driven
supercoiling of DNA: direct biochemical evidence from in vitro studies. Cell.
56(1):111-8.
CHAPTER 5
CONCLUDING REMARKS
The evolution of a complex and highly efficient mammalian immune
system from a seemingly less complex shark or amphibian immune system is an
important evolutionary event. It has been marked by important changes
in the genome that have not only established efficient immune systems, but
have done so while conserving one of the most ancient protein functions (AID), so
that the SHM-like activity of AID (present in sharks and bony fishes) could be
utilized by more evolved genomic systems to achieve better targeting and
CSR-like activity (in amphibians, birds and mammals), yielding an enhanced immune
response against any antigen (20, 23). Zebrafish AID can be successfully used
in mammalian systems to induce CSR (1, 23). This represents an oddity in
evolution where successive genomes have evolved in each step to better utilize
an already established protein function to create a system much superior to the
earlier genomes in evolution.
AID deaminates deoxycytidines to uracils, and it can do so only if the
cytosine is presented in a single-stranded context (2, 4, 5, 17, 19). Although it
shows a preference for deaminating Cs in WRC motifs (W=A or T, R=A or G), the
preference is not absolute and non-WRC motifs are also acted on by AID (26,
31). So, while the amphibian switch regions are rich in these motifs to afford
targeting by AID, they are comparatively inefficient in presenting these motifs in a
single-stranded context to the AID. The mammalian switch sequences have
acquired several levels of sophistication in their switch regions: they have
not only developed G-rich sequences that can harbor transient
single-strandedness in the form of R-loops for AID action, but they also exhibit
expansion of R-loop supporting G-rich regions both within and between
different switch regions. This increases the chances of R-loop formation within
the concerned switch region and also helps to maximize the AID targeting
potential by widening the regional target zone. The maintenance of these blocks
of switch regions in one genomic locus maximizes the use of AID function at
these switch regions while keeping the potentially mutagenic AID action confined
to a limited region, with a common regulation employed for all the heavy chain
switch loci to keep DNA damage from spreading across different regions of
the genome. This conserves genomic integrity while also maintaining a high
potential for, and a large repertoire of, immunoglobulin generation when needed.
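As an illustration of the WRC preference described above, the following minimal Python sketch lists the positions of cytosines that fall within WRC motifs on a DNA strand and counts such sites on both strands. The function names and the example sequence are hypothetical and are not taken from the experiments described in this work. The sketch encodes only the motif definition (W=A or T, R=A or G); it does not model single-strandedness, and it ignores the non-WRC sites that AID also acts on.

```python
import re

# WRC motif as defined in the text: W = A or T, R = A or G, followed by C.
# A lookahead is used so that every start position is examined.
WRC_PATTERN = re.compile(r"(?=([AT][AG]C))")


def find_wrc_sites(seq):
    """Return 0-based positions of the C in each WRC motif on the given strand."""
    return [m.start() + 2 for m in WRC_PATTERN.finditer(seq.upper())]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def wrc_counts(seq):
    """Return (sites on the given strand, sites on the complementary strand)."""
    return len(find_wrc_sites(seq)), len(find_wrc_sites(reverse_complement(seq)))


if __name__ == "__main__":
    example = "AGCTACGG"  # hypothetical sequence, for illustration only
    print(find_wrc_sites(example))  # [2, 5]
    print(wrc_counts(example))      # (2, 1)
```

Counting both strands separately matters here because AID acts on single-stranded DNA, so only the sites on the strand that is actually displaced would be relevant in an R-loop.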
An understanding of the nature and function of the mammalian
immunoglobulin heavy chain locus requires an in-depth understanding of the
sequence elements and structural features that influence its behavior.
Transcription-induced chromosomal R-loop formation at the switch regions is
the most characteristic feature of mammalian heavy chain isotype recombination.
First reported from our laboratory in 2003 (29), chromosomal R-loop structures
were characterized for their location within the Ig switch regions and their
composition of two DNA strands and an RNA transcript hybridized with the
template DNA strand thus rendering the nontemplate DNA strand as a single-
stranded entity. Several further studies from our laboratory have reported the
presence and significance of genomic R-loop structures, correlating their
locations with the implications for targeting by AID and the effect on CSR (7, 8, 30).
In the current set of studies reported here, we have followed a reductionist
approach to understand the more complex question of how and why R-loop
structures form and which factors influence their formation. In
breaking up the big question into smaller testable parts, we have been able to
remove the in vivo influences on R-loop formation and have studied the role of
the participating nucleic acids at the naked DNA and RNA level, thus uncovering
some of their specific roles in R-loop formation. While in vivo studies are very
important for assessing the role of such structures and correlating them with
biological functions, they tend to be observational in nature and
give little information about the mechanistic aspects. The kind of study
undertaken here is important for understanding the mechanism in more detail and
for establishing the underlying principles of R-loop formation. These principles
are fundamental in nature and are likely to extend to the R-loop structures
formed in vivo at the Ig locus or at other genomic regions.
These studies help us understand G-clustered and G-dense
regions, the importance of their relative positions, and the role of DNA
and RNA strand dynamics and conformation in R-loop formation. These studies
are also likely to be the first functional studies to define the nature of the two
DNA strands and the RNA transcript behind the transcribing RNA polymerase.
While the crystal structures of T7, bacterial and yeast RNA polymerases are
known along with the relative positions of bases in the downstream DNA and at
the site of RNA polymerization, the upstream edge positions of the nucleic acids
are still unclear. Although possible sites of RNA exit have been determined with
high confidence, the actual picture remains unclear in the absence of unequivocal
crystal structure data for an elongating RNA polymerase with the upstream edge
nucleic acid strands (3, 6, 9, 21, 22, 24, 25, 27). Our R-loop formation data with
RNase T1 (described in Chapter 2) clearly demonstrate that the RNA comes out
single-stranded and that R-loops do not form by extension of the initial RNA:DNA
hybrid established inside the RNA polymerase at the RNA polymerization site.
This is an important point in that it means that the RNA invades the 5’ edge of the
transcription bubble behind the polymerase, indicating that the bubble can extend
for some distance on the upstream side of the polymerase, and that the two DNA
strands do not anneal immediately after passage of the RNA polymerase. Our
results from studying long transcribable substrates confirm this nature of the
upstream edge of the transcription bubble.
Besides setting up the fundamental rules for understanding and
predicting R-loop formation at the Ig locus, the significance of this study also lies
in the fact that these biochemical principles could be extended to any transcribed
genetic element in vitro, or to DNA in any transcription unit of the genome in vivo,
across species or systems. Drawing on the established correlation between the
presence of R-loops at transcribed Ig switch regions and the initiation of CSR by a
mechanism of DNA breakage at switch sequences and intrachromosomal
translocation, these experiments provide proof of principle for a possible
mechanism of initiating DNA breaks at any R-loop like structures
formed in the genome, including at non-Ig sequences. As a precedent for this line of
thinking, we can cite the example of c-myc translocation. The human c-myc locus on
chromosome 8 is translocated to the IgH locus resident on chromosome 14, and this is
seen in 100% of sporadic Burkitt’s lymphoma cases, indicating the causative
nature of this translocation. This t(8;14) event happens when the IgH locus is
undergoing CSR in germinal center B-cells, and AID is required for the
c-myc/IgH translocation (16, 18). This points to a commonality of processes at
these two participating loci: processes that should normally introduce DSBs only in
the IgH locus now act aberrantly on the c-myc locus. The c-myc locus is
known to be high in G-richness and therefore might be able to form R-loop
structures. It is then conceivable that the CSR machinery can initiate DSBs at
these structures in the c-myc locus. The c-myc FUSE element is also known to
acquire dynamic negative supercoils (11) which can further enhance transcription
through the c-myc locus, thereby increasing chances of R-loop like structures
being formed.
AID is also reported to introduce mutations in other proto-oncogenic loci
like bcl6, pim1 etc. (10, 15) and seems to be required for induction of germinal
center derived B-cell lymphomas (14) suggesting that these loci and cell stage
can harbor substantial, even if transient, single-strandedness for the AID to act
on. There are other non-Ig locations in the genome which can get mutated by
AID and may play a role in tumorigenesis (28). Some of these regions might
have the potential to form non-B-DNA structures, and owing to their sequence
characteristics, might specifically form R-loop structures (13).
Whereas the R-loop formation rules defined here do not provide all
causative and functional explanations for these scenarios, our present study
certainly helps in putting forward a model explaining some of the observations,
and extends the principles of R-loop formation to regions other than switch
regions.
As a logical extension of these studies and based on the R-loop formation
principles noted here, we are trying to establish an R-loop code and to identify
genomic regions that have the potential to harbor R-loop structures. If some
such regions are found around known or preferred DNA breakpoints,
further studies could then test those sequences in more detail.
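A first-pass screen for candidate regions of this kind can be sketched in Python as below. This is only an illustration under stated assumptions and is not the R-loop code itself, which is not defined in this work. The function names, the minimum run length for a G-cluster (3), the window size (50 nt) and the G-fraction threshold (0.4) are all hypothetical placeholders. The input is assumed to be the nontemplate strand sequence, which has the same sequence as the transcript.

```python
import re


def g_clusters(seq, min_run=3):
    """Return (start, length) for every run of at least min_run consecutive Gs."""
    pattern = re.compile("G{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


def g_dense_windows(seq, window=50, min_g_fraction=0.4):
    """Return (start, end, g_fraction) for each window meeting the G threshold.

    Positions are 0-based and the end coordinate is exclusive.
    """
    seq = seq.upper()
    if window <= 0 or len(seq) < window:
        return []
    hits = []
    g_count = seq[:window].count("G")
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window by one base: add the entering base, drop the leaving one.
            g_count += (seq[start + window - 1] == "G") - (seq[start - 1] == "G")
        fraction = g_count / window
        if fraction >= min_g_fraction:
            hits.append((start, start + window, fraction))
    return hits


if __name__ == "__main__":
    toy = "AGGGTGGGGGA"  # hypothetical sequence, for illustration only
    print(g_clusters(toy))                    # [(1, 3), (5, 5)]
    print(g_dense_windows("GGGGAAAA", 4, 0.75))  # [(0, 4, 1.0), (1, 5, 0.75)]
```

Reporting G-clusters and G-dense windows separately keeps the two sequence features apart, so that their relative positions along a transcription unit can be inspected before any threshold is chosen on experimental grounds.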
A test for these principles would be their validation in in vivo systems
where a functional outcome could be directly correlated with the efficiency of R-loop
formation, which in turn would depend on the sequence or conformation of the
DNA. A B-lymphoma cell line, CH12F3 (12), is known to switch to the α
heavy chain switch region upon induction and to express Igα. The resident Sα
region could be tested further by modification of its sequence. We predict that
most of the rules defined here for R-loop formation will hold true in
in vivo systems too. We are also attempting to test if nicked DNA substrates
(which might form R-loops upon transcription at the nick) would show higher AID
induced mutation frequencies upon transcription.
While there has been good evidence for R-loop structures at switch
regions in CSR, the understanding of the mechanistic aspects has so far been
mostly conjectural. In the study presented here, we have made some
progress in our understanding of the mechanism of R-loop formation, and we are
now closer than before to establishing some defined principles that support or
suppress R-loop formation. This helps in defining an R-loop code, which would
be an important contribution to the present body of knowledge. Because
these studies address the basic biochemical and structural
properties of R-loop forming sequences, the methods and
principles should apply both in vitro and in vivo to a wide range of potential
substrates in various functional systems. We believe that the principles put forth here will
be helpful for a better understanding of R-loop structures, provide the basis for
understanding other in vivo studies on SHM, CSR and DNA
recombinations/translocation events, and provide clues for functional studies
towards understanding some of the genomic structures and associated
biological/disease phenomenology in the long run.
Chapter 5 References
1. Barreto, V.M., Q. Pan-Hammarstrom, Y. Zhao, L. Hammarstrom, Z.
Misulovin, and M.C. Nussenzweig. 2005. AID from bony fish catalyzes class
switch recombination. J. Exp. Med. 202: 733–738.
2. Bransteitter, R., P. Pham, M.D. Scharff and M.F. Goodman. 2003.
Activation-induced cytidine deaminase deaminates deoxycytidine on
single-stranded DNA but requires the action of RNase. Proc. Natl. Acad. Sci. U S
A. 100:4102-4107.
3. Bushnell, D.A., K.D. Westover, R.E. Davis, and R.D. Kornberg. 2004. Structural
basis of transcription: an RNA polymerase II-TFIIB cocrystal at 4.5 Angstroms.
Science. 303(5660):983-8.
4. Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt.
2003. Transcription-targeted DNA deamination by the AID antibody
diversification enzyme. Nature 422:726-730.
5. Dickerson, S.K., E. Market, E. Besmer and F.N. Papavasiliou. 2003. AID
mediates hypermutation by deaminating single stranded DNA. J. Exp. Med.
197(10):1291-6.
6. Durniak, K.J., S. Bailey, and T.A. Steitz. 2008. The structure of a
transcribing T7 RNA polymerase in transition from initiation to elongation.
Science. 322(5901): 553-7.
7. Huang, F.T., K. Yu, B.B. Balter, E. Selsing, Z. Oruc, A.A. Khamlichi, C.L.
Hsieh and M.R. Lieber. 2007. Sequence dependence of chromosomal R-loops at
the immunoglobulin heavy-chain Smu class switch region. Mol. Cell Biol.
27(16):5921-32.
8. Huang, F.T., K. Yu, C.L. Hsieh, and M.R. Lieber. 2006. Downstream
boundary of chromosomal R-loops at murine switch regions: implications for the
mechanism of class switch recombination. Proc. Natl. Acad. Sci. U S A.
103(13):5030-5.
9. Korzheva, N., A. Mustaev, M. Kozlov, A. Malhotra, V. Nikiforov, A.
Goldfarb, and S.A. Darst. 2000. A structural model of transcription elongation.
Science. 289(5479): 619-25.
10. Kotani, A., I.M. Okazaki, M. Muramatsu, K. Kinoshita, N.A. Begum, T.
Nakajima, H. Saito, and T. Honjo. 2005. A target selection of somatic
hypermutations is regulated similarly between T and B cells upon activation-
induced cytidine deaminase expression. Proc. Natl. Acad. Sci. U S A. 102(12):
4506-11.
11. Kouzine, F., S. Sanford, Z. Elisha-Feil, and D. Levens. 2008. The
functional response of upstream DNA to dynamic supercoiling in vivo. Nat.
Struct. Mol. Biol. 15(2): 146-54.
12. Nakamura, M., S. Kondo, M. Sugai, M. Nazarea, S. Imamura, and T.
Honjo. 1996. High frequency class switching of an IgM+ B lymphoma clone
CH12F3 to IgA+ cells. Int. Immunol. 8(2):193-201.
13. Okazaki, I.M., H. Hiai, N. Kakazu, S. Yamada, M. Muramatsu, K.
Kinoshita, and T. Honjo. 2003. Constitutive expression of AID leads to
tumorigenesis. J. Exp. Med. 197(9): 1173-81.
14. Pasqualucci, L., G. Bhagat, M. Jankovic, M. Compagno, P. Smith, M.
Muramatsu, T. Honjo, H.C. Morse 3rd, M.C. Nussenzweig, and R. Dalla-Favera.
2008. AID is required for germinal center-derived lymphomagenesis. Nat. Genet.
40(1): 108-12.
15. Pasqualucci, L., P. Neumeister, T. Goossens, G. Nanjangud, R.S.
Chaganti, R. Kuppers, and R. Dalla-Favera. 2001. Hypermutation of multiple
proto-oncogenes in B-cell diffuse large-cell lymphoma. Nature. 412: 341–346.
16. Ramiro, A.R., M. Jankovic, T. Eisenreich, S. Difilippantonio, S. Chen-
Kiang, M. Muramatsu, T. Honjo, A. Nussenzweig, and M.C. Nussenzweig. 2004.
AID is required for c-myc/IgH chromosome translocations in vivo. Cell.
118(4):431-8.
17. Ramiro, A. R., P. Stavropoulos, M. Jankovic, and M. C. Nussenzweig.
2003. Transcription enhances AID-mediated cytidine deamination by exposing
single-stranded DNA on the nontemplate strand. Nat. Immunol. 4:452-456.
18. Robbiani, D.F., A. Bothmer, E. Callen, B. Reina-San-Martin, Y. Dorsett, S.
Difilippantonio, D.J. Bolland, H.T. Chen, A.E. Corcoran, A. Nussenzweig, and
M.C. Nussenzweig. 2008. AID is required for the chromosomal breaks in c-myc
that lead to c-myc/IgH translocations. Cell. 135(6):1028-38.
19. Sohail, A., J. Klapacz, M. Samaranayake, A. Ullah and A.S. Bhagwat.
2003. Human activation-induced cytidine deaminase causes transcription-
dependent, strand-biased C to U deaminations. Nucl. Acids Res. 31: 2990–2994.
20. Stavnezer, J., and C.T. Amemiya. 2004. Evolution of isotype switching.
Semin. Immunol. 16: 257–275.
21. Tahirov, T.H., D. Temiakov, M. Anikin, V. Patlan, W.T. McAllister, D.G.
Vassylyev, and S. Yokoyama. 2002. Structure of a T7 RNA polymerase
elongation complex at 2.9 A resolution. Nature. 420(6911): 43-50.
22. Toulokhonov, I., and R. Landick. 2006. The role of the lid element in
transcription by E. coli RNA polymerase. J. Mol. Biol. 361(4): 644-58.
23. Wakae, K., B.G. Magor, H. Saunders, H. Nagaoka, A. Kawamura, K.
Kinoshita, T. Honjo, and M. Muramatsu. 2006. Evolution of class switch
recombination function in fish activation-induced cytidine deaminase, AID. Int.
Immunol. 18: 41–47.
24. Wang, D., and R. Landick. 1997. Nuclease cleavage of the upstream half
of the nontemplate strand DNA in an Escherichia coli transcription elongation
complex causes upstream translocation and transcriptional arrest. J. Biol. Chem.
272(9):5989-94.
25. Westover, K.D., D.A. Bushnell, and R.D. Kornberg. 2004. Structural basis
of transcription: separation of RNA from DNA by RNA polymerase II. Science.
303(5660):1014-6.
26. Xue, K., C. Rada, and M.S. Neuberger. 2006. The in vivo pattern of AID
targeting to immunoglobulin switch regions deduced from mutation spectra in
msh2-/- ung-/- mice. J. Exp. Med. 203(9):2085-94.
27. Yin, Y.W., and T.A. Steitz. 2002. Structural basis for the transition from
initiation to elongation transcription in T7 RNA polymerase. Science. 298(5597):
1387-95.
28. Yoshikawa, K., I.M. Okazaki, T. Eto, K. Kinoshita, M. Muramatsu, H.
Nagaoka, and T. Honjo. 2002. AID enzyme-induced hypermutation in an actively
transcribed gene in fibroblasts. Science. 296(5575): 2033-6.
29. Yu, K., F. Chedin, C.-L. Hsieh, T. E. Wilson, and M. R. Lieber. 2003. R-
loops at immunoglobulin class switch regions in the chromosomes of stimulated
B cells. Nature Immunol. 4:442-451.
30. Yu, K., D. Roy, M. Bayramyan, I.S. Haworth, and M.R. Lieber. 2005. Fine-
structure analysis of activation-induced deaminase accessibility to class switch
region R-loops. Mol. Cell. Biol. 25(5): 1730-6.
31. Yu, K., F. T. Huang, and M. R. Lieber. 2004. DNA substrate length and
surrounding sequence affect the activation induced deaminase activity at
cytidine. J. Biol. Chem. 279:6496-6500.
BIBLIOGRAPHY
Bandwar, R. P., N. Ma, S. A. Emanuel, M. Anikin, D. G. Vassylyev, S. S. Patel,
and W. T. McAllister. 2007. The transition to an elongation complex by T7 RNA
polymerase is a multistep process. J Biol Chem 282:22879-86.
Bardwell, P. D., C. J. Woo, K. Wei, Z. Li, A. Martin, S. Z. Sack, T. Parris, W.
Edelmann, and M. D. Scharff. 2004. Altered somatic hypermutation and reduced
class-switch recombination in exonuclease 1-mutant mice. Nat. Immunol. 5:224-
229.
Barreto, V. M., Q. Pan-Hammarstrom, Y. Zhao, L. Hammarstrom, Z. Misulovin,
and M. C. Nussenzweig. 2005. AID from bony fish catalyzes class switch
recombination. J Exp. Med. 202:733-8.
Basu, U., J. Chaudhuri, R. T. Phan, A. Datta, and F. W. Alt. 2007. Regulation of
activation induced deaminase via phosphorylation. Adv. Exp. Med. Biol 596:129-
37.
Bergsagel, P.L., M. Chesi, E. Nardini, L. Brents, S. Kirby and W.M. Kuehl. 1996.
Promiscuous translocations into immunoglobulin heavy chain switch regions in
multiple myeloma. Proc. Natl. Acad. Sci. U S A. 93:13931-13936.
Bottaro, A., R. Lansford, L. Xu, J. Zhang, P. Rothman, and F. W. Alt. 1994. S
region transcription per se promotes basal IgE class switch recombination but
additional factors regulate the efficiency of the process. EMBO J. 13:665-674.
Bransteitter, R., P. Pham, M. D. Scharff, and M. F. Goodman. 2003.
Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA
but requires the action of RNase. Proc. Natl. Acad. Sci. U S A. 100:4102-4107.
Bushnell, D.A., K.D. Westover, R.E. Davis, and R.D. Kornberg. 2004. Structural basis
of transcription: an RNA polymerase II-TFIIB cocrystal at 4.5 Angstroms.
Science. 303(5660):983-8.
Casellas R., A. Nussenzweig, R. Wuerffel, R. Pelanda, A. Reichlin, H. Suh, X.F.
Qin, E. Besmer, A. Kenter, K. Rajewsky, and M.C. Nussenzweig. 1998. Ku80 is
required for immunoglobulin isotype switching. EMBO J. 17: 2404-2411.
Chaudhuri, J., and F. W. Alt. 2004. Class-switch recombination: interplay of
transcription, DNA deamination and DNA repair. Nat. Rev. Immunol. 4:541-552.
Chaudhuri, J., U. Basu, A. Zarrin, C. Yan, S. Franco, T. Perlot, B. Vuong, J.
Wang, R. T. Phan, A. Datta, J. Manis, and F. W. Alt. 2007. Evolution of the
immunoglobulin heavy chain class switch recombination mechanism. Adv
Immunol 94:157-214.
Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt. 2003.
Transcription-targeted DNA deamination by the AID antibody diversification
enzyme. Nature 422:726-730.
Di Noia J., and M.S. Neuberger. 2002. Altering the pathway of
immunoglobulin hypermutation by inhibiting uracil-DNA glycosylase. Nature.
419(6902):43-8.
Dickerson S.K., E. Market, E. Besmer and F.N. Papavasiliou. 2003. AID
mediates hypermutation by deaminating single stranded DNA. J. Exp. Med.
197(10):1291-6.
Drolet, M., S. Broccoli, F. Rallu, C. Hraiky, C. Fortin, E. Masse, and I. Baaklini.
2003. The problem of hypernegative supercoiling and R-loop formation in
transcription. Front Biosci. 8:d210-221.
Dunnick, W. A., G. Z. Hertz, L. Scappino, and C. Gritzmacher. 1993. DNA
sequence at immunoglobulin switch region recombination sites. Nucl. Acid Res.
21:365-372.
Durniak, K.J., S. Bailey, and T.A. Steitz. 2008. The structure of a transcribing T7
RNA polymerase in transition from initiation to elongation. Science. 322(5901):
553-7.
Duquette, M. L., P. Handa, J. A. Vincent, A. F. Taylor, and N. Maizels. 2004.
Intracellular transcription of G-rich DNAs induces formation of G-loops, novel
structures containing G4 DNA. Genes Dev. 18:1618-1629.
Fan, J., Y. Matsumoto, and D. M. Wilson. 2006. Nucleotide sequence and DNA
secondary structure, as well as replication protein A, modulate the single-
stranded abasic endonuclease activity of APE1. J. Biol. Chem. 281:3889-3898.
Flajnik, M. F., K. Miller, and L. D. Pasquier. 2003. Evolution of the immune
system, p. 519-570. In W. E. Paul (ed.), Fundamental Immunology. Lippincott,
Philadelphia.
Gonzalez-Aguilera, C., C. Tous, B. Gomez-Gonzalez, P. Huertas, R. Luna, and
A. Aguilera. 2008. The THP1-SAC3-SUS1-CDC31 complex works in transcription
elongation-mRNA export preventing RNA-mediated genome instability. Mol Biol
Cell 19:4310-8.
Gritzmacher, C.A. 1989. Molecular aspects of heavy-chain class switching. Crit.
Rev. Immunol. 9(3):173-200.
Guikema, J.E., E.K. Linehan, D. Tsuchimoto, Y. Nakabeppu, P.R. Strauss, J.
Stavnezer, and C.E. Schrader. 2007. APE1- and APE2-dependent DNA breaks
in immunoglobulin class switch recombination. J. Exp. Med. 204(12):3017-26.
Han, L., and K. Yu. 2008. Altered kinetics of nonhomologous end joining and
class switch recombination in ligase IV-deficient B cells. J. Exp. Med.
205(12):2745-53.
Harriman, G. R., A. Bradley, S. Das, P. Rogers-Fani, and A. C. Davis. 1996. IgA
class switch in Ia exon-deficient mice. J. Clin. Invest. 97:477-485.
Huang, F.T., K. Yu, B.B. Balter, E. Selsing, Z. Oruc, A.A. Khamlichi, C.L. Hsieh
and M.R. Lieber. 2007. Sequence dependence of chromosomal R-loops at the
immunoglobulin heavy-chain Smu class switch region. Mol. Cell Biol.
27(16):5921-32.
Huang, F.T., K. Yu, C.L. Hsieh, and M.R. Lieber. 2006. Downstream boundary of
chromosomal R-loops at murine switch regions: implications for the mechanism
of class switch recombination. Proc. Natl. Acad. Sci. U S A. 103(13):5030-5.
Huertas, P., and A. Aguilera. 2003. Cotranscriptionally formed DNA:RNA hybrids
mediate transcription elongation impairment and transcription-associated
recombination. Mol. Cell 12:711-21.
Ichikawa, H. T., M. P. Sowden, A. T. Torelli, J. Bachl, P. Huang, G. S. Dance, S.
H. Marr, J. Robert, J. E. Wedekind, H. C. Smith, and A. Bottaro. 2006. Structural
phylogenetic analysis of activation-induced deaminase function. J. Immunol.
177:355-61.
Imai, K., G. Slupphaug, W. I. Lee, P. Revy, S. Nonoyama, N. Catalan, L. Yel, M.
Forveille, B. Kavli, H. E. Krokan, H. D. Ochs, A. Fischer, and A. Durandy. 2003.
Human uracil-DNA glycosylase deficiency associated with profoundly impaired
immunoglobulin class-switch recombination. Nat. Immunol. 4:1023-1028.
Jimeno, S., A. G. Rondon, R. Luna, and A. Aguilera. 2002. The yeast THO
complex and mRNA export factors link RNA metabolism with transcription and
genome instability. Embo J 21:3526-35.
Jung, S., K. Rajewsky, and A. Radbruch. 1993. Shutdown of Class Switch
Recombination by Deletion of a Switch Region Control Element. Science.
259:984-987.
Kaneko, S., C. Chu, A. J. Shatkin, and J. L. Manley. 2007. Human capping
enzyme promotes formation of transcriptional R loops in vitro. Proc Natl Acad Sci
U S A 104:17620-5.
Khamlichi, A.A., F. Glaudet, Z. Oruc, V. Denis, M. Le Bert, and M. Cogné. 2004.
Immunoglobulin class-switch recombination in mice devoid of any S mu tandem
repeat. Blood. 103(10):3828-36.
Korzheva, N., A. Mustaev, M. Kozlov, A. Malhotra, V. Nikiforov, A. Goldfarb, and
S. A. Darst. 2000. A structural model of transcription elongation. Science
289:619-25.
Kotani, A., I.M. Okazaki, M. Muramatsu, K. Kinoshita, N.A. Begum, T. Nakajima,
H. Saito, and T. Honjo. 2005. A target selection of somatic hypermutations is
regulated similarly between T and B cells upon activation-induced cytidine
deaminase expression. Proc. Natl. Acad. Sci. U S A. 102(12): 4506-11.
Kouzine, F., S. Sanford, Z. Elisha-Feil, and D. Levens. 2008. The functional
response of upstream DNA to dynamic supercoiling in vivo. Nat. Struct. Mol. Biol.
15(2):146-54.
Lee, D. Y., and D. A. Clayton. 1996. Properties of a primer RNA-DNA hybrid at
the mouse mitochondrial DNA leading-strand origin of replication. J. Biol. Chem.
271:24262-24269.
Li, X., and J. L. Manley. 2006. Cotranscriptional processes and their influence on
genome stability. Genes Dev 20:1838-47.
Li, X., and J. L. Manley. 2005. Inactivation of the SR protein splicing factor
ASF/SF2 results in genomic instability. Cell 122:365-378.
Li, X., and J. L. Manley. 2005. New talents for an old acquaintance: the SR
protein splicing factor ASF/SF2 functions in the maintenance of genome stability.
Cell Cycle 4:1706-8.
Lieber M.R. 1991. Site-specific recombination in the immune system. FASEB J.
14:2934-44.
Liu, M., J. L. Duke, D. J. Richter, C. G. Vinuesa, C. C. Goodnow, S. H. Kleinstein,
and D. G. Schatz. 2008. Two levels of protection for the B cell genome during
somatic hypermutation. Nature 451:841-5.
Liu, L. F., and J. C. Wang. 1987. Supercoiling of the DNA template during RNA
transcription. Proc. Natl. Acad. Sci. USA 84:7024-7027.
Longerrich, S., U. Basu, F. Alt, and U. Storb. 2006. AID in somatic hypermutation
and class switch recombination. Curr. Opin. Immunol. 18:164-174.
Luby T.M., C.E. Schrader, J. Stavnezer, and E. Selsing. 2001. The mu switch
region tandem repeats are important, but not required, for antibody class switch
recombination. J. Exp. Med. 193(2):159-68.
Manis J.P., D. Dudley, L. Kaylor, and F.W. Alt. 2002. IgH class switch
recombination to IgG1 in DNA-PKcs-deficient B cells. Immunity. 16(4):607-17.
Masse, E., P. Phoenix, and M. Drolet. 1997. DNA topoisomerases regulate R-
loop formation during transcription of the rrnB operon in E. coli. J. Biol. Chem.
272:12816-12823.
Masukata, H., and J. Tomizawa. 1990. A mechanism of formation of a persistent
hybrid between elongating RNA and template DNA. Cell 62:331-338.
Min, I. M., L. R. Rothlein, C. E. Schrader, J. Stavnezer, and E. Selsing. 2005.
Shifts in targeting of class switch recombination sites in mice that lack mu switch
region tandem repeats or Msh2. J. Exp. Med. 201:1885-90.
Min, I. M., C. E. Schrader, J. Vardo, T. M. Luby, N. D'Avirro, J. Stavnezer, and E.
Selsing. 2003. The Smu tandem repeat region is critical for Ig isotype switching
in the absence of Msh2. Immunity 19:515-24.
Min, I. M., and E. Selsing. 2005. Antibody class switch recombination: roles for
switch sequences and mismatch repair proteins. Adv Immunol 87:297-328.
Muramatsu, M., H. Nagaoka, R. Shinkura, N. A. Begum, and T. Honjo. 2007.
Discovery of activation-induced cytidine deaminase, the engraver of antibody
memory. Adv Immunol 94:1-36.
Muramatsu M., K. Kinoshita, S. Fagarasan, S. Yamada, Y. Shinkai, and T. Honjo.
2000. Class switch recombination and hypermutation require activation-induced
cytidine deaminase (AID), a potential RNA editing enzyme. Cell. 102(5):553-63.
Muramatsu, M., V. Sankaranand, S. Anant, M. Sugai, K. Kinoshita, N. Davidson,
and T. Honjo. 1999. Specific Expression of Activation-Induced Cytidine
Deaminase (AID), a Novel Member of the RNA-Editing Deaminase Family in
Germinal Center B Cells. J. Biol. Chem. 274:18470-18476.
Mussmann R., M. Courtet, J. Schwager, L. Du Pasquier. 1997. Microsites for
immunoglobulin switch recombination breakpoints from Xenopus to mammals.
Eur. J. Immunol. 27(10):2610-9.
Nakamura, M., S. Kondo, M. Sugai, M. Nazarea, S. Imamura, and T. Honjo.
1996. High frequency class switching of an IgM+ B lymphoma clone CH12F3 to
IgA+ cells. Int. Immunol. 8(2):193-201.
Okazaki, I. M., H. Hiai, N. Kakazu, S. Yamada, M. Muramatsu, K. Kinoshita, and
T. Honjo. 2003. Constitutive expression of AID leads to tumorigenesis. J. Exp.
Med. 197(9): 1173-81.
Pasqualucci, L., G. Bhagat, M. Jankovic, M. Compagno, P. Smith, M.
Muramatsu, T. Honjo, H.C. Morse 3rd, M.C. Nussenzweig, and R. Dalla-Favera.
2008. AID is required for germinal center-derived lymphomagenesis. Nat. Genet.
40(1): 108-12.
Pasqualucci, L., P. Neumeister, T. Goossens, G. Nanjangud, R.S. Chaganti, R.
Kuppers, and R. Dalla-Favera, 2001. Hypermutation of multiple proto-oncogenes
in B-cell diffuse large-cell lymphoma. Nature. 412: 341–346.
Petersen-Mahrt, S. K., R. S. Harris, and M. S. Neuberger. 2002. AID mutates E.
coli suggesting a DNA deamination mechanism for antibody diversification.
Nature 418:99-103.
Petrini, J., and W. Dunnick. 1989. Products and Implied Mechanism of H Chain
Switch Recombination. J. Immunol. 142:2932-2935.
Pham, P., R. Bransteitter, J. Petruska, and M. F. Goodman. 2003. Processive
AID-catalyzed cytosine deamination on single-stranded DNA stimulates somatic
hypermutation. Nature 424:103-107.
Rada, C., G. T. Williams, H. Nilsen, D. E. Barnes, T. Lindahl, and M. S.
Neuberger. 2002. Immunoglobulin isotype switching is inhibited and somatic
hypermutation perturbed in UNG-deficient mice. Curr. Biol. 12:1748-1755.
Ramiro, A. R., M. Jankovic, T. Eisenreich, S. Difilippantonio, S. Chen-Kiang, M.
Muramatsu, T. Honjo, A. Nussenzweig, and M.C. Nussenzweig. 2004. AID is
required for c-myc/IgH chromosome translocations in vivo. Cell. 118(4):431-8.
Ramiro, A. R., P. Stavropoulos, M. Jankovic, and M. C. Nussenzweig. 2003.
Transcription enhances AID-mediated cytidine deamination by exposing single-
stranded DNA on the nontemplate strand. Nat. Immunol. 4:452-456.
Reaban, M. E., and J. A. Griffin. 1990. Induction of RNA-stabilized DNA
conformers by transcription of an immunoglobulin switch region. Nature 348:342-
344.
Reaban, M. E., J. Lebowitz, and J. A. Griffin. 1994. Transcription induces the
formation of a stable RNA.DNA hybrid in the immunoglobulin alpha switch region.
J. Biol. Chem. 269:21850-21857.
Reina-San-Martin B., S. Difilippantonio, L. Hanitsch, R.F. Masilamani, A.
Nussenzweig, and M.C. Nussenzweig. 2003. H2AX is required for recombination
between immunoglobulin switch regions but not for intra-switch region
recombination or somatic hypermutation. J. Exp. Med. 197(12):1767-78.
Revy, P., T. Muto, Y. Levy, F. Geissmann, A. Plebani, O. Sanal, N. Catalan, M.
Forveille, R. Dufourcq-Labelouse, A. Gennery, I. Tezcan, F. Ersoy, H. Kayserili,
A.G. Ugazio, N. Brousse, M. Muramatsu, L.D. Notarangelo, K. Kinoshita, T.
Honjo, A. Fischer, and A. Durandy. 2000. Activation-induced cytidine deaminase (AID)
deficiency causes the autosomal recessive form of the Hyper-IgM syndrome
(HIGM2). Cell. 102(5):565-75.
Robbiani, D.F., A. Bothmer, E. Callen, B. Reina-San-Martin, Y. Dorsett, S.
Difilippantonio, D.J. Bolland, H.T. Chen, A.E. Corcoran, A. Nussenzweig, and
M.C. Nussenzweig. 2008. AID is required for the chromosomal breaks in c-myc
that lead to c-myc/IgH translocations. Cell. 135(6):1028-38.
Roberts, R. W., and D. M. Crothers. 1992. Stability and properties of double and
triple helices: dramatic effects of RNA or DNA backbone composition. Science
258:1463-1466.
Ronai, D., M. D. Iglesias-Ussel, M. Fan, Z. Li, A. Martin, and M. D. Scharff. 2007.
Detection of chromatin-associated single-stranded DNA in regions targeted for
somatic hypermutation. J. Exp. Med. 204:181-90.
Roy, D., C.L. Hsieh, and M.R. Lieber. 2009. G-Clustering is important for the
initiation of transcription-induced R-loops in vitro whereas high G-density without
clustering is sufficient thereafter. (In press).
Roy, D., K. Yu, and M. R. Lieber. 2008. Mechanism of R-loop formation at
immunoglobulin class switch sequences. Mol. Cell Biol. 28:50-60.
Saenger, W. 1984. Principles of Nucleic Acid Structure. Springer-Verlag, New
York.
SantaLucia, J. 1998. A unified view of polymer, dumbbell, and oligonucleotide
DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA 95:1460-
1465.
Selsing, E. 2006. Ig class switching: targeting the recombinational mechanism.
Curr. Opin. Immunol. 18:249-254.
Shinkura, R., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt. 2003. The
influence of transcriptional orientation on endogenous switch region function.
Nat. Immunol. 4:435-441.
Sohail, A., J. Klapacz, M. Samaranayake, A. Ullah and A.S. Bhagwat. 2003.
Human activation-induced cytidine deaminase causes transcription-dependent,
strand-biased C to U deaminations. Nucl. Acids Res. 31: 2990–2994.
Stavnezer, J., and C. T. Amemiya. 2004. Evolution of isotype switching. Semin.
Immunol. 16:257-275.
Stavnezer, J., and S. Sirlin. 1986. Specificity of immunoglobulin heavy chain
switch correlates with activity of germline heavy chain genes prior to switching.
EMBO J. 5:95-102.
Szurek, P., J. Petrini, and W. Dunnick. 1985. Complete nucleotide sequence of
the murine γ3 switch region and analysis of switch recombination sites in two γ3-
expressing hybridomas. J. Immunol. 135:620-626.
Tahirov, T.H., D. Temiakov, M. Anikin, V. Patlan, W.T. McAllister, D.G.
Vassylyev, and S. Yokoyama. 2002. Structure of a T7 RNA polymerase
elongation complex at 2.9 A resolution. Nature. 420(6911): 43-50.
Tian, M., and F. W. Alt. 2000. Transcription induced cleavage of immunoglobulin
switch regions by nucleotide excision repair nucleases in vitro. J. Biol. Chem.
275:24163-24172.
Toulokhonov, I., and R. Landick. 2006. The role of the lid element in transcription
by E. coli RNA polymerase. J. Mol. Biol. 361(4): 644-58.
Tsai, A. G., A. E. Engelhart, M. M. Hatmal, S. I. Houston, N. V. Hud, I. S.
Haworth, and M. R. Lieber. 2008. Conformational variants of duplex DNA
correlated with cytosine-rich chromosomal fragile sites. J. Biol. Chem.
284(11):7157-64.
Tsao, Y.P., H.Y. Wu, and L.F. Liu. 1989. Transcription-driven supercoiling of
DNA: direct biochemical evidence from in vitro studies. Cell. 56(1):111-8.
Wakae, K., B. G. Magor, H. Saunders, H. Nagaoka, A. Kawamura, K. Kinoshita,
T. Honjo, and M. Muramatsu. 2006. Evolution of class switch recombination
function in fish activation-induced cytidine deaminase, AID. Int. Immunol. 18:41-
Wang, D., and R. Landick. 1997. Nuclease cleavage of the upstream half of the
nontemplate strand DNA in an Escherichia coli transcription elongation complex
causes upstream translocation and transcriptional arrest. J Biol Chem 272:5989-
94.
Westover, K. D., D. A. Bushnell, and R. D. Kornberg. 2004. Structural basis of
transcription: separation of RNA from DNA by RNA polymerase II. Science
303:1014-1016.
Wuerffel, R., L. Wang, F. Grigera, J. Manis, E. Selsing, T. Perlot, F.W. Alt, M.
Cogne, E. Pinaud, and A.L. Kenter. 2007. S-S synapsis during class switch
recombination is promoted by distantly located transcriptional elements and
activation-induced deaminase. Immunity. 27(5):711-22.
Xu, B., and D. A. Clayton. 1996. RNA-DNA hybrid formation at the human
mitochondrial heavy-strand origin ceases at replication start sites: an implication
for RNA-DNA hybrids serving at primers. EMBO J. 15:3135-3143.
Xu, L., B. Gorham, S. C. Li, A. Bottaro, F. W. Alt, and P. Rothman. 1993.
Replacement of germ-line ε promoter by gene targeting alters control of
immunoglobulin heavy chain class switching. Proc. Natl. Acad. Sci. USA
90:3705-3709.
Xue, K., C. Rada, and M.S. Neuberger. 2006. The in vivo pattern of AID targeting
to immunoglobulin switch regions deduced from mutation spectra in msh2-/- ung-
/- mice. J. Exp. Med. 203(9):2085-94.
Yin, Y.W., and T.A. Steitz. 2002. Structural basis for the transition from initiation
to elongation transcription in T7 RNA polymerase. Science. 298(5597): 1387-95.
Yoshikawa, K., I.M. Okazaki, T. Eto, K. Kinoshita, M. Muramatsu, H. Nagaoka,
and T. Honjo. 2002. AID enzyme-induced hypermutation in an actively
transcribed gene in fibroblasts. Science. 296(5575): 2033-6.
Yu, K., F. Chedin, C.-L. Hsieh, T. E. Wilson, and M. R. Lieber. 2003. R-loops at
immunoglobulin class switch regions in the chromosomes of stimulated B cells.
Nature Immunol. 4:442-451.
Yu, K., F. T. Huang, and M. R. Lieber. 2004. DNA substrate length and
surrounding sequence affect the activation induced deaminase activity at
cytidine. J. Biol. Chem. 279:6496-6500.
Yu, K., and M. R. Lieber. 2003. Nucleic acid structures and enzymes in the
immunoglobulin class switch recombination mechanism. DNA Repair 2:1163-
1174.
Yu, K., D. Roy, M. Bayramyan, I. S. Haworth, and M. R. Lieber. 2005. Fine-
structure analysis of activation-induced deaminase accessibility to class switch
region R-loops. Mol. Cell. Biol. 25:1730-1736.
Zarrin, A. A., F. W. Alt, J. Chaudhuri, N. Stokes, D. Kaushal, L. DuPasquier, and
M. Tian. 2004. An evolutionarily conserved target motif for immunoglobulin class-
switch recombination. Nat. Immunol. 5:1275-1281.
Zarrin, A. A., M. Tian, J. Wang, T. Borjeson, and F.W. Alt. 2005. Influence of
switch region length on immunoglobulin class switch recombination. Proc. Natl.
Acad. Sci. U S A. 102(7):2466-70.
Zhang, J., A. Bottaro, S. Li, V. Stewart, and F. W. Alt. 1993. A selective defect in
IgG2b switching as a result of targeted mutation of the I gamma 2b promoter and
exon. Embo J 12:3529-37.
Zhao, Y., Q. Pan-Hammarstrom, Z. Zhao, and L. Hammarstrom. 2005.
Identification of the activation-induced cytidine deaminase gene from zebrafish:
an evolutionary analysis. Dev Comp Immunol 29:61-71.
APPENDIX A: SUPPLEMENTARY INFORMATION FOR CHAPTER 2
Materials and Methods
Plasmid based switch substrates
The switch substrate, pDR3, which contains 4 Ig Sγ3 repeats of wild type
sequence, was assembled using oligonucleotides DR013 (5’-
GGGGTGCTGGGGTAGGTTAGAGCATGGGAACCAGGCTGGACAGCTCTGGA
AGCTGAGATATGTGGGGTTGTGGGGAACAGGTTGGACAGCTCT-3’) and
DR014 (5’-
GCAAGCTGCCCAGGCTGGTCCCCACATCCCACCCACCCTAGCTCCCCAGA
GCTGCCCAGCCTAGTCCCCACACCCCCAACTACCCTAGCTCCCCAGAGCTG
TCCAACCTGTT-3’), DR036 (5’-
GCGCTCGAGGGGTGCTGGGGTAGGTTAGAAG-3’) and DR037 (5’-
GCGCTCGAGCAAGCTGCCCAGCCTGGTC-3’). The ends of the resulting
fragment were cut with XhoI, and the fragment was cloned into the XhoI site of
pKY127, which is derived from pBluescript after deleting the lacZ promoter.
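The pairing between the two long oligonucleotides can be checked with a short script. The sketch below assumes that DR013 and DR014 anneal through complementary 3' ends, as in a standard overlap-extension assembly; the text above does not spell out the assembly steps, so this is an illustration rather than the protocol that was followed. The function names are hypothetical, and only the 3' tails of the two oligonucleotides (as printed above) are used.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def three_prime_overlap(top: str, bottom: str) -> str:
    """Longest suffix of `top` that can base-pair with the 3' end of `bottom`.

    Both oligonucleotides are written 5'->3'. The 3' end of `bottom` is the
    start of its reverse complement, so the overlap is the longest suffix of
    `top` that equals a prefix of reverse_complement(bottom).
    """
    bottom_rc = reverse_complement(bottom)
    for length in range(min(len(top), len(bottom_rc)), 0, -1):
        if top[-length:] == bottom_rc[:length]:
            return top[-length:]
    return ""


# 3' tails only (not the full oligonucleotides), copied from the text above.
DR013_TAIL = "GTTGTGGGGAACAGGTTGGACAGCTCT"
DR014_TAIL = "CTAGCTCCCCAGAGCTGTCCAACCTGTT"

overlap = three_prime_overlap(DR013_TAIL, DR014_TAIL)
print(len(overlap), overlap)
```

Exact string matching is a simplification: it reports only perfectly complementary 3' ends and says nothing about annealing temperature or mispriming.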
Another substrate, pDR18, which contains 4 wild type repeats, was made
by generating a PCR fragment of the switch insert in pDR3 with oligonucleotides
containing T7 and T3 promoter sequences, and then cloning this blunt end PCR
fragment into the blunted AgeI site of pDR15, which is a minor derivative of
pKY272 (one restriction site altered). pDR51 has 3 switch repeats and was
made by PCR using oligonucleotides DR013, DR014, DR036 and DR074 (5’-
GCGAAGCTTAGAGCTGCCCAGCCTA-3’). The PCR fragment was digested
with appropriate restriction enzymes and cloned in pDR16, a derivative of pDR15
containing the T7 and T3 promoters. pDR50 contains 2 switch repeats and was
constructed by generating a PCR fragment from DR013, DR036 and DR072 (5’-
GGCAAGCTTAGAGCTGTCCAACCTGTT-3’). The PCR fragment was digested
and cloned into pDR16. pDR49 contains 1 switch repeat and was constructed by
cloning annealed oligonucleotides DR070 (5’-
TCGAGGGGTGCTGGGGTAGGTTAGAGCATGGGAACCAGGCTGGACAGC
TCTGA-3’) and DR071 (5’-
AGCTTCAGAGCTGTCCAGCCTGGTTCCCATGCTCTAACCTACCCCAGCACC
CC-3’) into pDR16. The switch fragment in pDR22 (see text Methods) was
assembled by PCR using DR018 (5’-
GGGCTGTGGGCTAGGTTAGAGCATGGGAACCAGGCTGGACAGCTCTGGAG
GAGCTGAGATATGTGGGCTTGTGGGCAAC-3’), DR019 (5’-
TAGTGCCCACAGCCCGAACTACCCTAGCTGCCCAGAGCTGTCCAACCTGTT
GCCCACAAGCCCACATA-3’), DR020 (5’-
GTTCGGGCTGTGGGCACTAGGCTGGGCAGCTCTGGGCAGCTAGGGTGGGT
GGGATGTGGGCACCAGGCTGGGCAGCTTGC-3’), DR031 (5’-
GCGCTCGAGCAAGCTGCCCAGCCTGGTG-3’) and DR038 (5’-
GCGCTCGAGGGCTGCTGGGCTAGGTTAGAG-3’). The PCR fragment was
cloned into the XhoI site of pKY127. To make pDR22, this fragment was PCR
amplified using T7 and T3 oligonucleotides, and the resulting fragment
was cloned into pDR15. The same process was followed for construction of
pDR26 and pDR30 (below).
The switch fragment in pDR26 (see text Methods) was assembled by PCR
with DR022 (5’-
GGCGTGTGGCGTAGGTTAGAGCATGGCAACCAGGCTGGACAGCTCTGGAA
GGAGCTGAGATATGTGGCGTTGTGGCGAAC-3’), DR023 (5’-
TAGTCGCCACACCGCCAACTAGCCTAGCTCGCCAGAGCTGTCCAACCTG
TTCGCCACAACGCCACATA-3’), DR024 (5’-
GTTGGCGGTGTGGCGACTAGGCTGGCCAGCTCTGGCGAGCTAGGCTGG
CTGGCATGTGGCGACCAGGCTGGCCAGCTTGC-3’), DR033 (5’-
GCGCTCGAGCAAGCTGGCCAGCCTGGTC-3’) and DR039 (5’-
GCGCTCGAGGCGTGCTGGCGTAGGTTAGAG-3’). The fragment was moved
to pDR16.
The fragment for pDR30 was PCR assembled with DR026 (5’-
GCGCTGTGCGCTACGTTAGAGCATGCGAACCAGCCTGCACAGCTCTGCAG
CAGCTGAGATATGTGCGCTTGTGCGCAAC-3’), DR027 (5’-
CTAGTGCGCACACGCGCAACTACGCTAGCTGCGCAGAGCTGTGCAAGCT
GTTGCGCACAAGCGCACATA-3’), DR028 (5’-
GTTGCGCGTGTGCGCACTAGCCTGCGCAGCTCTGCGCAGCTAGCGTGCGT
GCGATGTGCGCACCACGCTGCGCAGCTTGC-3’), DR040 (5’-
GCGCTCGAGCGCTGCTGCGCTAGCTTAGAG-3’) and DR041 (5’-
GCGCTCGAGCAAGCTGCGCAGGCTGGTG-3’). The fragment was moved to
pDR16.
The G-dense fragment for pDR54 was PCR assembled with DR104 (5’-
GTGCGTGCGAGCGCGAGAGCGTGAGTGCGTGAGCGAGCGCGTGAGCGAG
CGCGAGTGTGCGAGCGAGCGTGTGCGTGAGCGTGTG-3’), DR105 (5’-
TCTCACTCGCGCACTCACTCGCACGCACTCAGCGCTCTCGCACACGCTCAC
GCACACGCT-3’), DR106 (5’-
GAGTGAGTGCGCGAGTGAGAGCGTGAGCGCGTGCGAGCGAGTGCGTGC
GAGTGCGTGAGCGCGTGAGCGAGTGCGCGAGAGCGTC-3’), DR107 (5’-
GCGCTCGAGACGCTCTCGCGCACTCGCTCACG-3’) and DR109 (5’-
GCGCTCGAGTGCGTGCGAGCGCGAGAGCGTGA-3'). This PCR fragment
was digested with XhoI and cloned into the XhoI site of pDR16 to construct pDR54.
DR050 (5’-TAGGCGTGTACGGTGGGAGGT-3’) and DR051 (5’-
GCGGTACTTCATGGTCATCTCC-3’) or DR056 (5’-
AAGGCGGCGGACAAGATGTCCTC-3’) were used to amplify the sodium
bisulfite treated DNA. DR075 (5’-CAAAACTATCCAACCTGATTCCCATACTC-3’)
was used as a probe to detect bisulfite converted R-loop derivative molecules in
the colony lift hybridization assay on pDR18, pDR22, pDR49, pDR50 and
pDR51. DR077 (5’-CCAAAACTATCCAACCTGATTACCATACT-3’) was used in
the colony lift hybridization assay on pDR26 whereas DR110 (5’-
ACACTCACTCACACACTCACTCACA-3’) was used for the colony lift hybridization
assay on pDR54. DR075, DR077 and DR110 were designed to hybridize with
the sodium bisulfite modified, PCR amplified and TA cloned nontemplate DNA
strand with C-to-T conversions. These oligonucleotides were complementary to
the anticipated C-to-U conversions on the nontemplate strand and could not bind
to unconverted (non-R-looped) derivatives. All oligonucleotides were ordered
from the Chemical core facility at USC Norris Cancer Center, Invitrogen Life
Technologies (Carlsbad, CA), Operon Biotechnologies, Inc. (Huntsville, AL) or
Integrated DNA Technologies (Coralville, IA).
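The selectivity of the colony lift probes can be illustrated with a small script. The sketch below models bisulfite treatment of the displaced nontemplate strand as complete C-to-T conversion (C to U, read as T after PCR) and treats hybridization as perfect complementarity; both are simplifying assumptions, and the function names are hypothetical. The probe is DR110, and the target is a 35-nt excerpt of DR104 as printed above.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def bisulfite_convert(strand: str) -> str:
    """Model full conversion of a single-stranded region: every C reads as T."""
    return strand.replace("C", "T")


def probe_can_anneal(probe: str, strand: str) -> bool:
    """True if `probe` is perfectly complementary to some stretch of `strand`."""
    return reverse_complement(probe) in strand


DR110 = "ACACTCACTCACACACTCACTCACA"
# Excerpt of the G-dense nontemplate strand (from DR104, joined across lines).
DR104_EXCERPT = "GAGTGCGTGAGCGAGCGCGTGAGCGAGCGCGAGTG"

converted = bisulfite_convert(DR104_EXCERPT)
print("converted strand:  ", probe_can_anneal(DR110, converted))
print("unconverted strand:", probe_can_anneal(DR110, DR104_EXCERPT))
```

Under these assumptions the probe matches only the converted strand, which is the behavior described above for R-looped versus non-R-looped derivatives.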
Enzymes and reagents
T7 RNA polymerase was purchased from Promega (Madison, WI).
RNase A and sodium bisulfite were purchased from Sigma (St. Louis, MO). E. coli
RNase H1 was amplified from E. coli genomic DNA and cloned into pTME (gift from
Dr. Carlos D.M. Filipe, McMaster University, Canada). RNase H1 purification
was done as described in Ref. 13 of Chapter 2. RNase T1 was a kind gift from
Dr. Lucio Comai at USC and was originally purchased from Boehringer Mannheim.
All restriction endonucleases were from New England Biolabs (Beverly, MA).
Radioisotope-labeled nucleotides were purchased from Perkin Elmer (Boston,
MA).
In vitro transcription of switch substrates
Transcription was done as described in the main text Methods section.
The transcribed DNA was electrophoresed on a 1% agarose gel in 0.5x TBE buffer
(44.5 mM Tris-borate, 1 mM EDTA, pH 8.4) at 5 V/cm for 3.5 h. Nucleic acid was
visualized in the gels using an ultraviolet illuminator after post-staining the gel in
0.5x TBE with ethidium bromide. When 32P-α-UTP radiolabel was added during the
transcription reaction to label the transcript (and the reaction was then run on a 1%
agarose/0.5x TBE gel), the gel was pressed flat between two sheets of blotting
paper under weights, wrapped in Saran wrap, exposed to phosphorimager
screens, and scanned on a Molecular Dynamics Imager 445SI
(Sunnyvale, CA).
Figure A.1. Two models for the formation of long R-loops.
The top panel is a schematic of the RNA thread back model. The nascent
RNA is depicted as coming out of the exit channel of the RNA polymerase and
then annealing with the template strand before the two DNA strands have a
chance to reanneal. Once this RNA:DNA association occurs, the remaining RNA
can continue to form the R-loop. The R-loop would terminate when the free
energy difference between the RNA:DNA and the DNA:DNA duplexes becomes
smaller. The bottom panel is a schematic of the extended hybrid model. During
transcription, a 9 bp RNA:DNA hybrid exists within the transcription complex. If
the RNA fails to exit the RNA polymerase exit channel, then the RNA could
remain associated with the template strand, thereby forming an extended RNA:DNA
hybrid. RNA has been depicted with a dashed line whereas DNA strands are
represented by solid lines.
Figure A.2. Schematic of RNase T1 interference with R-loop formation.
If the thread back model obtains, and if RNase T1 is present during
transcription, then it would digest the transcript coming out of the exit pore of the
RNA polymerase. If the extended hybrid model applies, then the transcript would
always be hybridized with the DNA and would not be vulnerable to digestion with
RNase T1.
Figure A.3. Decrease in R-loop formation efficiency as a function of the
number of switch repeats.
The frequency of molecules with R-loops decreases with the reduction in
switch repeat length. This graph plots the values from the colony lift hybridization
assay on substrates containing 4, 3, 2 and 1 Sγ3 repeats. Different substrates
are placed on the x-axis in the order of decreasing repeat numbers, while the y-
axis is the percentage of R-looped molecules. (See Table 2.1 for the precise
values and numbers.)
Figure A.4. Decrease in R-loop formation efficiency with the decrease in
G-clustering within switch repeats.
The frequency of molecules harboring R-loops decreases with the
decrease in G-clustering. This graph plots the values from the colony lift
hybridization assay on these substrates.
APPENDIX B: SUPPLEMENTARY INFORMATION FOR CHAPTER 3
Discussion
Thermodynamic considerations of R-loop initiation in the RIZ
Addition of the motif containing two GGGG clusters or one GGGG cluster
is predicted to improve the ∆G values of the RNA:DNA hybrid by less than 5% (see
column 6 in Tables B.2-B.4) compared to the addition of a random non-G-
clustered region over the length of the switch substrate, given that they have the
same REZ (1). However, we observe a many-fold improvement in R-loop
formation in the G-cluster containing substrates compared with the random
sequence containing substrate. The efficiencies correlate with the number of G-
clusters in the RIZ, with the 2x4G-cluster motif being most efficient in R-loop
initiation, followed by the 1x4G-cluster motif, and with the random sequence motif
(0x4G) being the least efficient, among those tested. There are two factors to be
considered in this regard as described below.
The first factor is the local contribution to increasing the RNA:DNA hybrid
stability by different motifs. Although addition of 2x4G marginally improves the
∆G of the RNA:DNA hybrid over the random motif over the length of the
substrate (RIZ + REZ), the local (i.e., RIZ) ∆GRNA:DNA of the 2x4G motif improves
by 49% as compared to the random motif thus making it a more stable hybrid.
The 1x4G motif similarly shows a local stability increase by 25% over the random
motif (see column 4 of Table B.1). Therefore, these small sequences with G-
clusters contribute by locally increasing the stability of the RNA-foot on the
template DNA strand. This added stability can direct a more efficient RNA-
threadback, resulting in higher R-loop formation efficiencies as compared to the
motifs containing random sequences.
The second factor is the increased local difference between RNA:DNA
and DNA:DNA values for the same molecule. We calculated the values of
∆GRNA:DNA and ∆GDNA:DNA, designated this difference as ∆∆G, and then compared
these ∆∆G values between motifs/substrates. Over the length of the switch
region, the ∆∆G values are very similar for the substrates with different RIZ but
the same REZ (Tables B.2-B.4, column 7). However, the local increase in
∆GRNA:DNA is almost 38% over the ∆GDNA:DNA for the 2x4G motif, 23% for the 1x4G
motif and only 11% for the random motif (Table B.1, column 5). These values
indicate that while the local RNA:DNA hybrid in 2x4G motif is 38% more stable
than the local DNA duplex, the transcript with a random sequence motif is only
11% more stable than the corresponding DNA duplex. Although the RNA:DNA
hybrid is always more stable than the DNA:DNA duplex, DNA duplex formation
after transcription might be facilitated in the random sequence owing to a small
difference between ∆GRNA:DNA and ∆GDNA:DNA as compared to the G-clustered
motifs. More interestingly, when we compare the local ∆∆G values between the
three motifs used in this study, we find that the ∆∆G for the 2x4G motif is more
than 4-fold better than the random motif. The 1x4G-cluster motif is also better as
compared to the random motif by more than 2.3-fold (Table B.1, column 6)
whereas this factor increases only by about 0.1-fold (10%) in the 2x4G motif
containing substrate as compared with the substrate with the same REZ but a
random motif (see column 8 of Tables B.2-B.4).
Thus, the factors that help G-clusters at relatively small DNA regions in
improving R-loop nucleation are: (1) local increase in the RNA:DNA hybrid
stabilization when compared to non G-clustered motifs; (2) substantially higher
local RNA:DNA hybrid stability as compared to the DNA duplex; and (3) a several-
fold increase in ∆∆G when compared with random sequence.
In the R-loop initiation as well as elongation zones, the thermodynamic
stability of DNA:DNA that has clusters of Gs is lower than for dispersed Gs
[compare ∆GDNA:DNA values for substrates with same RIZ but different REZs (e.g.,
substrates with the A motif; Tables B.2-B.4, column 4)]. This may favor
DNA:DNA strand separation in the G-clustered regions, thereby making it
somewhat easier for the proximity-disadvantaged nascent RNA to anneal to the
template DNA strand, thereby initiating a nucleation event.
Table B.1. Prediction of expected change in local Gibbs free energy (∆G)
upon DNA:DNA duplex formation or RNA:DNA hybrid formation for motifs
A, C or B containing 2x4G-clusters, 1x4G-clusters or no G-clusters,
respectively.
The ∆G values have been calculated using the Module 1 program in
HyTher server available at http://ozone3.chem.wayne.edu. The hybridization
conditions used in this and the subsequent tables were 0.01 M Na+ and 0.006 M
Mg2+ at 37 °C (same as the in vitro transcription reaction conditions used
throughout this study).
| 1. RIZ motif | 2. ∆G_DNA:DNA (kcal/mol) | 3. ∆G_RNA:DNA (kcal/mol) | 4. Percent improvement in RNA:DNA stability with motif A or motif C compared to motif B = 100% × (∆G_RNA:DNA of A or C - ∆G_RNA:DNA of B) / ∆G_RNA:DNA of B | 5. Percent improvement in RNA:DNA hybrid stability as compared to DNA:DNA (DNA duplex) stability (∆∆G) = 100% × (∆G_RNA:DNA - ∆G_DNA:DNA) / ∆G_DNA:DNA | 6. Fold change in ∆∆G with motif A or motif C compared to motif B = (∆G_RNA:DNA - ∆G_DNA:DNA of A, C or B) / (∆G_RNA:DNA - ∆G_DNA:DNA of B) |
|---|---|---|---|---|---|
| A (2x4G-clusters) | -19.8 | -27.4 | 48.9 | 38.4 | 4.2 |
| C (1x4G-clusters) | -18.6 | -22.9 | 24.5 | 23.1 | 2.4 |
| B (Random DNA, no G-clusters) | -16.6 | -18.4 | - | 10.8 | 1.0 |
Table B.2. Prediction of expected change in Gibbs free energy (∆G) upon
DNA:DNA duplex formation or RNA:DNA hybrid formation for pDR18A,
pDR18C and pDR18B over the length of RIZ+REZ.
| 1. Substrate name | 2. RIZ motif | 3. REZ sequence | 4. ∆G_DNA:DNA (kcal/mol) | 5. ∆G_RNA:DNA (kcal/mol) | 6. Percent improvement in RNA:DNA stability with motif A or motif C compared to motif B = 100% × (∆G_RNA:DNA of A or C - ∆G_RNA:DNA of B) / ∆G_RNA:DNA of B | 7. Percent improvement in RNA:DNA hybrid stability as compared to DNA:DNA (DNA duplex) stability (∆∆G) = 100% × (∆G_RNA:DNA - ∆G_DNA:DNA) / ∆G_DNA:DNA | 8. Fold change in ∆∆G with motif A or motif C compared to motif B = (∆G_RNA:DNA - ∆G_DNA:DNA of A, C or B) / (∆G_RNA:DNA - ∆G_DNA:DNA of B) |
|---|---|---|---|---|---|---|---|
| pDR18A | A | 4 repeats; mostly 4G-clusters | -283.9 | -375.3 | 2.8 | 32.2 | 1.1 |
| pDR18C | C | 4 repeats; mostly 4G-clusters | -282.0 | -370.0 | 1.3 | 31.2 | 1.0 |
| pDR18B | B | 4 repeats; mostly 4G-clusters | -279.8 | -365.1 | - | 30.5 | 1.0 |
Table B.3. Prediction of expected change in Gibbs free energy (∆G) upon
DNA:DNA duplex formation or RNA:DNA hybrid formation for pDR22A,
pDR22C and pDR22B over the length of RIZ+REZ.
| 1. Substrate name | 2. RIZ motif | 3. REZ sequence | 4. ∆G_DNA:DNA (kcal/mol) | 5. ∆G_RNA:DNA (kcal/mol) | 6. Percent improvement in RNA:DNA stability with motif A or motif C compared to motif B = 100% × (∆G_RNA:DNA of A or C - ∆G_RNA:DNA of B) / ∆G_RNA:DNA of B | 7. Percent improvement in RNA:DNA hybrid stability as compared to DNA:DNA (DNA duplex) stability (∆∆G) = 100% × (∆G_RNA:DNA - ∆G_DNA:DNA) / ∆G_DNA:DNA | 8. Fold change in ∆∆G with motif A or motif C compared to motif B = (∆G_RNA:DNA - ∆G_DNA:DNA of A, C or B) / (∆G_RNA:DNA - ∆G_DNA:DNA of B) |
|---|---|---|---|---|---|---|---|
| pDR22A | A | 4 repeats of 3G-clusters | -287.8 | -370.4 | 2.8 | 28.7 | 1.1 |
| pDR22C | C | 4 repeats of 3G-clusters | -285.9 | -365.1 | 1.4 | 27.7 | 1.0 |
| pDR22B | B | 4 repeats of 3G-clusters | -283.7 | -360.2 | - | 27.0 | 1.0 |
Table B.4. Prediction of expected change in Gibbs free energy (∆G) upon
DNA:DNA duplex formation or RNA:DNA hybrid formation for pDR54A,
pDR54C and pDR54B over the length of RIZ+REZ.
| 1. Substrate name | 2. RIZ motif | 3. REZ sequence | 4. ∆G_DNA:DNA (kcal/mol) | 5. ∆G_RNA:DNA (kcal/mol) | 6. Percent improvement in RNA:DNA stability with motif A or motif C compared to motif B = 100% × (∆G_RNA:DNA of A or C - ∆G_RNA:DNA of B) / ∆G_RNA:DNA of B | 7. Percent improvement in RNA:DNA hybrid stability as compared to DNA:DNA (DNA duplex) stability (∆∆G) = 100% × (∆G_RNA:DNA - ∆G_DNA:DNA) / ∆G_DNA:DNA | 8. Fold change in ∆∆G with motif A or motif C compared to motif B = (∆G_RNA:DNA - ∆G_DNA:DNA of A, C or B) / (∆G_RNA:DNA - ∆G_DNA:DNA of B) |
|---|---|---|---|---|---|---|---|
| pDR54A | A | 4 repeats; dispersed Gs, no G-clusters; 49.7% G | -319.6 | -379.0 | 4.9 | 18.6 | 1.1 |
| pDR54C | C | 4 repeats; dispersed Gs, no G-clusters; 49.7% G | -310.9 | -365.9 | 1.3 | 17.7 | 1.1 |
| pDR54B | B | 4 repeats; dispersed Gs with no G-clusters; 49.7% G | -309.0 | -361.3 | - | 16.9 | 1.0 |
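As a cross-check on the derived columns of Tables B.2-B.4, the sketch below recomputes columns 6-8 from the ∆G values in columns 4 and 5, using the formulas given in the column headings. The function and variable names are introduced here for illustration; only the ∆G inputs are taken from the tables.

```python
# (dG_DNA:DNA, dG_RNA:DNA) in kcal/mol, copied from columns 4 and 5
# of Tables B.2 (pDR18), B.3 (pDR22) and B.4 (pDR54).
TABLES = {
    "pDR18": {"A": (-283.9, -375.3), "C": (-282.0, -370.0), "B": (-279.8, -365.1)},
    "pDR22": {"A": (-287.8, -370.4), "C": (-285.9, -365.1), "B": (-283.7, -360.2)},
    "pDR54": {"A": (-319.6, -379.0), "C": (-310.9, -365.9), "B": (-309.0, -361.3)},
}


def derived_columns(dna, rna, dna_b, rna_b):
    """Return columns 6, 7 and 8 (rounded to one decimal) for one substrate.

    dna, rna     -- dG of the DNA:DNA duplex and RNA:DNA hybrid for the substrate
    dna_b, rna_b -- the same two values for the motif B (random) substrate
    """
    col6 = 100 * (rna - rna_b) / rna_b       # % gain in RNA:DNA stability vs motif B
    col7 = 100 * (rna - dna) / dna           # % gain of RNA:DNA over DNA:DNA
    col8 = (rna - dna) / (rna_b - dna_b)     # fold change in ddG vs motif B
    return round(col6, 1), round(col7, 1), round(col8, 1)


for family, motifs in TABLES.items():
    dna_b, rna_b = motifs["B"]
    for motif, (dna, rna) in motifs.items():
        print(family + motif, derived_columns(dna, rna, dna_b, rna_b))
```

For the motif B rows, column 6 evaluates to zero by construction, which corresponds to the "-" entries in the tables.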
APPENDIX C: SUPPLEMENTARY INFORMATION FOR CHAPTER 4
Figure C.1. Model of effect of increasing distance between promoter and
switch repeat DNA.
(A) This diagram represents the approximate positions of the T7 promoter
(black arrow) and the 3 Sγ3 switch repeats (three solid arrows showing the C-rich
repeats on the template strand DNA, complementary to G-rich nontemplate
strand) and the random sequence DNA between the promoter and the start of
the switch repeats (light green non-C-rich template strand DNA) in pDR72. The
corresponding non-G-rich portion of the RNA is shown in dark green followed by
the red G-rich part of the transcript in the switch region.
(B) In the presence of RNase A (blue oval) during transcription, the non-G-rich
part of the transcript is preferentially digested (dashed dark green line) as
compared to the G-rich part of the transcript because RNase A cuts after Cs and
Us in the RNA. This allows the G-rich RNA to hybridize with relatively higher
efficiency to the template DNA strand (represented by the red downward arrow),
both by increasing the molecular mobility of its 5’ end and by reducing the
inhibition that arises when the non-G-rich, non-hybridizing portion of the
transcript peels the already hybridized G-rich part of the RNA off the template.
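The digestion logic described in panel (B) can be sketched as follows. This is only an illustration: the two transcripts are made-up sequences rather than portions of the pDR72 transcript, and the model is an idealized complete digest that cuts after every C and U, ignoring cleavage kinetics and RNA secondary structure.

```python
import re


def rnase_a_fragments(rna: str) -> list[str]:
    """Split an RNA string on the 3' side of every C or U.

    Each fragment is a run of purines (possibly empty) ending in a single
    pyrimidine; a trailing purine-only stretch is kept as the last fragment.
    """
    return re.findall(r"[AG]*[CU]|[AG]+$", rna.upper())


# Hypothetical transcripts, chosen only to contrast the two compositions.
non_g_rich = "AUCUACUCAUCU"
g_rich = "GGGGAGCUGGGGAGGA"

for name, rna in (("non-G-rich", non_g_rich), ("G-rich", g_rich)):
    fragments = rnase_a_fragments(rna)
    print(name, fragments, "longest:", max(len(f) for f in fragments))
```

In this toy model the pyrimidine-rich transcript is reduced to fragments of one or two nucleotides, whereas the G-rich transcript retains long purine stretches, which is the asymmetry the caption relies on.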
Figure C.2. Model of R-loop formation at REZ switch repeats in the presence
of G-clustered RIZs located at a distance upstream of the REZ.
(A) The initial phase of R-loop formation is demonstrated where the short but
stable nucleating RNA:DNA hybrid forms at the G-clustered RIZ motif.
(B) The R-loop is established at the downstream switch repeats (REZ). This
process is more efficient in substrates that have RIZ motifs in the distant
upstream region than in a substrate without any G-clusters because the short
RNA:DNA hybrid at the upstream G-clusters keeps the RNA in favorable proximity
to the template strand. The RNA between the RIZ and the REZ G-clustered
regions is non-G-rich; it therefore cannot compete effectively with the
nontemplate DNA strand to hybridize with the template strand DNA and remains
single-stranded.
(C) Since a suitable REZ is not present near the RIZ, the short RNA:DNA
hybrid formed at the G-clustered RIZ dissociates relatively quickly as compared
to regions where the RIZ and REZ are close to each other. The R-loop formed at
the downstream REZ persists longer because of better stabilization. This is
why R-loop structures are seen almost exclusively at the switch repeats for
the linearized pDR72A substrate (Fig. 4.3), although G-clusters play a
supporting role in R-loop formation, as shown in Fig. 4.2.
Figure C.3. Model of R-loop formation at a negatively supercoiled
substrate with the RIZ and REZ separated by a non-G-rich sequence.
Using a supercoiled version of pDR72A, in which the G-clustered RIZ (2x4G)
is separated from the REZ (3 Sγ3 repeats), we observed many molecules with
R-loop structures that began upstream of the REZ and were continuous from RIZ
to REZ, with the intervening 1-repeat-long non-G-rich sequence also showing
continuous C-to-T conversions upon bisulfite treatment after transcription (Fig.
4.3, molecules depicted below the red line). This pattern is different from that
of the linearized molecules (depicted above the red line in Fig. 4.3). Negative
supercoiling favors transient separation of DNA strands (shown by a solid
black arrow pointing upwards). This gives the non-G-rich portion of the transcript
an increased opportunity to compete (red downward arrow) with the nontemplate
DNA strand and, consequently, we see an improved relative ability of the
non-G-rich RNA to establish an RNA:DNA hybrid with the template strand.
Figure C.4. Model of R-loop formation at substrates containing a
nontemplate strand nick upstream or downstream of the promoter.
(A) This model shows the mechanism of R-loop formation when the nick on
the nontemplate DNA strand is present upstream of the promoter. The nick
positions are schematically shown as discontinuations on the nontemplate
strand. During transcription, the elongating RNA polymerase (green rectangle)
unwinds the incoming duplexed DNA, but the upstream edge of the transiently
open transcription bubble behind the RNA polymerase closes and re-forms the
duplex relatively quickly (the solid black downward arrow depicts a relatively
strong DNA duplex formation propensity, whereas the dashed upward arrow depicts
a weaker DNA dissociation propensity). This makes the RNA less efficient at
establishing the RNA:DNA hybrid needed to form R-loops.
(B) This model shows the mechanism of R-loop formation when the nick on
the nontemplate DNA strand is present downstream of the promoter. As the
transcribing RNA polymerase approaches the nick and unwinds the duplexed
DNA, the template strand enters the RNA polymerization active site, whereas the
nontemplate strand is displaced because the nick breaks its contact with the
duplexed upstream DNA. This reduces the DNA duplex formation propensity
(dashed black downward arrow) and increases the chance that the template and
nontemplate DNA strands remain single-stranded (solid upward arrow). The
nascent RNA coming out of the RNA exit pore in the RNA polymerase thus has a
better chance to establish an RNA:DNA hybrid with the DNA template strand.
Figure C.5. Model of R-loop formation with promoter-downstream nicks
present at different locations.
The top panel shows the mechanism of R-loop formation in nicked pZZ6
when the nick (shown as a white region of discontinuation of the nontemplate
strand at the position of the nick) is present immediately downstream of the
promoter and one repeat length upstream of the 3Sγ3 repeats (three blue
arrows). When the RNA polymerase passes through the nick, the nontemplate
strand is displaced from the duplex form of DNA. Since the nick is close to the
promoter, the length of the displaced nontemplate strand DNA behind the RNA
polymerase is comparable with the length of the transcript coming out of the exit
pore of the RNA polymerase. Because the nontemplate strand after the nick is
displaced, the RNA in this case gets a better opportunity to compete with the
nontemplate strand for hybridizing with the template strand, even though the
displaced nontemplate strand after the nick and the 5’ end of the transcript,
being comparable in length, also have comparable molecular mobilities. This
results in efficient R-loop formation at the downstream switch repeats even
though the nick is close to the promoter.
The bottom panel represents the mechanism of R-loop formation when the
nontemplate strand nick is present far downstream of the promoter but
immediately upstream of the start of the switch repeats in pZZ7. Compared to
the position of the nick in pZZ6, the nick in pZZ7 is positioned 60 nts further
downstream. When the RNA polymerase reaches the nick position, the
transcript length is already over a repeat long with mostly non-G-rich (and
therefore non-hybridizing) sequence composition. After the RNA polymerase
transcribes through the nick position and displaces the nontemplate strand DNA,
the displaced nontemplate strand behind the RNA polymerase is short and
consequently has higher molecular mobility than the long non-hybridizing RNA.
This puts the RNA at a disadvantage relative to the nontemplate strand, which
can participate in a higher number of effective collisions with the template
strand. Therefore, even though the RNA:DNA hybrids have higher stability than
the corresponding DNA:DNA duplexes, the RNA in this case would be far more
compromised in its ability to hybridize with the template strand DNA for three
reasons: the extensive length of the non-hybridizing portion of the transcript,
which results in low molecular motion; the peeling of the RNA:DNA hybrids by
the extra bulk of non-hybridizing RNA; and the increased relative mobility of
the DNA end after the nick. Even with these factors working against efficient
R-loop formation, nicked pZZ7 is still better than unnicked pZZ7 or pDR72
(Fig. 4.6), showing that the presence of a nick offers the RNA a better chance
to compete with the nontemplate strand DNA by decreasing the DNA duplex
formation propensity in the nicked substrates as compared to unnicked
substrates.
Abstract
R-loops form at the immunoglobulin (Ig) heavy chain locus in germinal center B-cells undergoing class switch recombination (CSR). These are triple-stranded structures formed upon transcription of the G-rich switch sequences, where the G-rich transcript hybridizes with the C-rich DNA template strand, leaving the nontemplate DNA strand in a single-stranded conformation. These structures aid in targeting the CSR-initiating activation-induced cytidine deaminase to the participating switch regions to initiate CSR. CSR involves exchanging the IgM heavy chain domain gene (Cμ) with another heavy chain (CH) gene located downstream of Cμ to generate IgG, IgA or IgE antibodies. This occurs by a recombination event between the long switch regions located upstream of the CH genes in the Ig locus. To understand the mechanism of R-loop formation and study the biochemical and structural properties of R-loops, we have performed experiments with a minimal R-loop forming sequence as our model.
Asset Metadata
Creator
Roy, Deepankar (author)
Core Title
The mechanism of R-loop formation in mammalian immunoglobulin class switch recombination
Contributor
Electronically uploaded by the author (provenance)
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Biochemistry and Molecular Biology
Degree Conferral Date
2009-05
Publication Date
11/05/2009
Defense Date
03/30/2009
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
antibody class switch recombination,G-cluster,G-quartet,isotype switch,nontemplate strand,nucleic acid,OAI-PMH Harvest,R-loop,RNA polymerase,RNA threadback,RNA:DNA hybrid,template,thermodynamic stability,transcription
Language
English
Advisor
Lieber, Michael R. (committee chair), Hsieh, Chih-lin (committee member), Stallcup, Michael R. (committee member), Zandi, Ebrahim (committee member)
Creator Email
deepankarroy@gmail.com,deepankr@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m2183
Unique identifier
UC1146388
Identifier
etd-Roy-2824 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-244005 (legacy record id),usctheses-m2183 (legacy record id)
Legacy Identifier
etd-Roy-2824.pdf
Dmrecord
244005
Document Type
Dissertation
Rights
Roy, Deepankar
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu