Mechanistic Basis for Chromosomal Translocations at the E2A Gene
by
Di Liu
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(CANCER BIOLOGY AND GENOMICS)
August 2021
Copyright 2021 Di Liu
Acknowledgements
I would like to thank my mentors, Dr. Michael Lieber and Dr. Hsieh, for their great support
of my research project and in many aspects of my life. I have received tremendously valuable
suggestions from them on scientific studies, including experimental designs and techniques,
troubleshooting, data analysis, paper writing, and presentation skills. They have been patiently
coaching me for years and offered me help, care, support, and encouragement whenever I
needed. I have learned how to perform good research and to think critically and independently.
I had a wonderful PhD life and sincerely enjoyed it. I cannot say enough to express my gratitude
to them.
In addition, I would like to thank my committee members for their guidance and support
throughout my graduate studies: Dr. Michael Press, Dr. Ebrahim Zandi, and Dr. Robert Maxson.
I would also like to thank the former and present members of the two labs, Dr. Nicholas Pannunzio,
Dr. Go Watanabe, Dr. Bailin Zhao, Dr. Zitadel Anne Esguerra, Dr. Shuangchao Ma, Christina
Gerodimos, and Cindy Okitsu, for their helpful discussions of the experiments and kind assistance.
I also want to thank USC Norris Medical Library, especially Dr. Yong-Hwee Eddie Loh.
Eddie has provided great help in sequencing data analysis and patiently taught me how to read
and revise scripts. I am grateful to Dr. Phuong Pham and Dr. Myron Goodman of USC for the
AID protein they generously provided. I would like to thank Dr. Kun-Yi Chien from Chang Gung
University for mass spectrometry analyses and the valuable suggestions on sample preparation and
troubleshooting.
I thank my parents for their unconditional love. I thank the most important person in my
life, my husband, Bailin, who is always by my side and takes care of me. I am thankful for all the
encouragement, understanding, comfort, and support from my family.
Di Liu
May 2021
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1 Introduction
1.1 Overview of translocations
1.2 The joining phase of chromosomal translocations
1.3 Overview of causes of chromosome breaks
1.4 Timing of B cell translocations
1.5 Factors relevant to translocations occur in early B cell development
1.6 Characteristics of E2A breakpoints
Chapter 2 Monomeric AID Purification and Activity Test
2.1 Introduction
2.2 Materials and Methods
2.3 Results
2.4 Discussion
Chapter 3 DNA Methylation Analysis of CpG Sites within the E2A Fragile Zones
3.1 Introduction
3.2 Materials and Methods
3.3 Results
3.4 Discussion
Chapter 4 Assessment of the Single-Stranded Character of the DNA at the E2A Fragile Zone in Human Pre-B Cell Lines from Acute Lymphoblastic Leukemia/Lymphoma Patients
4.1 Introduction
4.2 Materials and Methods
4.3 Results
4.4 Discussion
Chapter 5 Biochemical AID Activity within the 23 bp E2A Fragile Zone in a Purified System
5.1 Introduction
5.2 Materials and Methods
5.3 Results
5.4 Discussion
Chapter 6 E2A Binding Protein Analyses with EMSA
6.1 Introduction
6.2 Materials and Methods
6.3 Results
6.4 Discussion
Chapter 7 The De Novo Oligo Pull Down Assay for Detection of Unknown DNA Binding Proteins
7.1 Introduction
7.2 Materials and Methods
7.3 Results
7.4 Discussion
Chapter 8 Concluding Remarks
8.1 Models for the clustered E2A breakage
8.2 Relevance to other major fragile regions in human lymphoid translocations
References
List of Tables
Table 1.1 Fragile zones of common haematopoietic translocations.
Table 1.2 Statistical analyses of sequence motifs in proximity of fragile region breakpoints in patients.
Table 3.1 The methylation percentage of the three CpG sites within the 265 bp E2A region varies in three pre-B cell lines.
Table 3.2 Two CpG sites within the fragile zone are independently methylated in three pre-B cell lines.
Table 4.1 Oligonucleotides.
Table 4.2 Melt temperature of E2A duplex is lower than that of the control duplex.
Table 4.3 Presence of methylcytosine on E2A and control duplexes increases the melting temperature.
Table 4.4 Similar deletion frequencies within and outside of the 23 bp fragile zone were identified.
Table 5.1 Oligonucleotides.
Table 6.1 Oligonucleotides.
Table 6.2 DNA binding proteins detected by MS.
Table 7.1 Oligonucleotides used in the oligo pull down study.
Table 7.2 Mass spectrum trial 3 identifies limited number of proteins.
List of Figures
Figure 1.1 Breakage and rejoining phases of a double strand repair.
Figure 1.2 Common chromosomal translocations in B cells.
Figure 1.3 Conversions of persistent DNA lesions caused by AID to DSBs.
Figure 1.4 PBX1 and HLF breakpoints distribution.
Figure 1.5 Breakpoints of non-IGH loci in early B cell translocations are focused within 20-600 bp fragile zone.
Figure 1.6 The E2A breakpoints clustering to a 23 bp zone in E2A intron 16 in patients.
Figure 2.1 Illustration of AID.mono and its substrate structures.
Figure 2.2 AID.mono purification.
Figure 2.3 AID.mono can deaminate cytosines on ssDNA substrate.
Figure 3.1 Traditional (denatured) bisulfite sequencing indicates both CpG sites within E2A fragile zone are methylated at some levels in pre-B cells.
Figure 4.1 Illustration of the native bisulfite assay.
Figure 4.2 Bisulfite mediated deamination of cytosine.
Figure 4.3 Preparation of the 150 bp end-labeled and internal-labeled radioactive DNA substrates.
Figure 4.4 Native ammonium bisulfite treatment show similar pattern but lower background conversion rate compared with native sodium bisulfite treatment with using Nalm-6 genomic DNA.
Figure 4.5 Whole cell native bisulfite sequencing identifies sites with single-stranded DNA character in the 23 bp E2A fragile zone and the downstream region.
Figure 4.6 Bisulfite reactivity under native (nondenaturing) conditions correlates with the length of the various C-strings at and downstream of the E2A fragile zone.
Figure 4.7 The 666 bp region containing the 23 bp E2A fragile zone and its downstream region has high C-string density in E2A intron 16.
Figure 4.8 Melting curves show decreased DNA stability of the E2A fragile zone.
Figure 4.9 Nuclease S1 digestion of the 810 bp E2A substrate suggests potential single-stranded character of the E2A fragile zone.
Figure 4.10 S1 Digestion of the 150 bp end-labeled E2A substrate shows cuttings over the entire substrate.
Figure 4.11 Digestion of 5' end labeled substrate with nuclease P1 at different temperatures indicates the quick loss of radioactive signal on the substrate.
Figure 4.12 P1 digestion of the 150 bp internal-labeled E2A indicates thermal fluctuation within the 23 bp E2A fragile zone.
Figure 4.13 Western blot analysis confirms ARM23 expression in pDL16 transfected 293/EBNA1 cells both with and without H2O2 treatment.
Figure 5.1 Sequences around the E2A fragile zone on wild type E2A, mut1, and mut2 substrates.
Figure 5.2 Illustration of Maximum-Depth Sequencing (MDS).
Figure 5.3 Illustration of reads sorting process for MDS data.
Figure 5.4 No appreciable cytosine deamination is observed in the reactions without AID.
Figure 5.5 AID deaminates cytosines on the NTS within the 23 bp E2A fragile zone and the downstream region with and without transcription.
Figure 5.6 Presence of RNase A during transcription results in markedly decreased AID deamination activity in E2A fragile zone.
Figure 5.7 AID shows similar deamination activity on the NTS of the mut1 substrate with and without transcription compared with that of wt E2A.
Figure 5.8 AID shows decreased deamination activity on the NTS of the mut2 substrate with and without transcription compared with that of wt E2A.
Figure 5.9 Potential loop formation due to misalignment between the direct repeats.
Figure 6.1 Illustration of enrichment and detection of specific DNA binding proteins with EMSA.
Figure 6.2 Protein with high affinity specific to E2A sequence in EMSA.
Figure 6.3 Absence of protein with high affinity specific to E2A sequence in EMSA using affinity beads captured fractions.
Figure 6.4 Absence of protein with high affinity specific to E2A sequence in EMSA using blunt ended hot probes.
Figure 6.5 Absence of protein with high affinity specific to E2A sequence in EMSA using single-stranded hot probes.
Figure 7.1 Illustration of the oligo pull down assay.
Figure 7.2 Buffers with low salt result in good enrichment and are compatible with HindIII enzyme activity.
Figure 7.3 Optimization of T4 DNA polymerase and HindIII treatment conditions in oligo pull down assay.
Figure 7.4 Increased copy number of the oligo annealing sites increases the pull down efficiency.
Figure 7.5 Mass spectrometry trial 1 results indicate low reproducibility of proteins identified in the assay.
Figure 7.6 Mass spectrometry trial 2 results indicate low reproducibility of proteins identified in the assay.
Figure 7.7 BSA-free PvuII can release of EF1A enriched from both doped plasmid and minichromosome from the Dynabeads.
Figure 8.1 Models of E2A breakage.
Abstract
A chromosomal translocation is the first essential step for most human lymphoid
malignancies. The translocation process involves a DNA breakage phase and a rejoining phase.
The rejoining phase is most often carried out by non-homologous end joining (NHEJ), which is
the major pathway for repair of double-strand breaks (DSBs) throughout the entire cell cycle. The
DSBs arising in the breakage phase can occur due to pathologic or physiologic causes. Previous
studies have revealed that chromosomal translocations in early B cells usually involve a DSB at
the immunoglobulin heavy chain (IGH) locus caused by the recombination activating gene (RAG)
complex (most often arising during a D to J recombination attempt) and a second DSB at the non-
IGH locus initiated by the enzyme, Activation Induced cytidine Deaminase (AID). The non-IGH
DNA breaks are usually focused on the CG motif within small zones of 20-600 bp. AID can only
deaminate cytosines in regions of single-stranded DNA (ssDNA), but the basis for restriction of
the AID activity to such narrow fragile zones has been unclear. In other words, what makes these
20-600 bp regions of duplex DNA transiently single-stranded to permit AID action? That is the
central question studied in this thesis.
In this study, I have focused on the factors that determine DNA breakage at the E2A fragile
zone to provide insight into the other B cell translocation fragile zones in human lymphoma. The
E2A gene has the smallest fragile zone (23 bp) among all oncogenes involved in human
translocations. The PBX1 gene and the HLF gene are the two most common translocation
partners of E2A. The statistical analyses using all the available junctional sequences indicate
E2A breakpoints are in significant proximity to the CG motif as well as AID hotspot motifs in both
E2A-PBX1 translocation and E2A-HLF translocation. This suggests the involvement of AID in the
breakage phase of E2A. In this thesis, I used native DNA bisulfite chemical probing on live pre-
B cells to investigate the single-stranded character of the E2A fragile zone (and surrounding DNA),
including deviations from the duplex DNA state that are transient, intermittent, and short-lived.
The results indicate the 23 bp E2A fragile zone and its downstream region have a higher degree
of transient single-stranded character in living cells, which is further supported by DNA melting
studies and a nuclease P1 assay on purified DNA. The length of consecutive cytosines (C-string)
on either strand is correlated with the native bisulfite reactivity (on a per nt basis). This finding
indicates that duplex DNA with C-strings on either strand has increased thermal fluctuation and
therefore, increased frequency of adopting an intermittent, transient ssDNA deviation from the
double-stranded DNA (dsDNA) conformation. NMR studies by others indicate that the Cs within
C-strings undergo thermal fluctuation (breathe) on average every 5 milliseconds, which is nearly
6 times more frequent than isolated Cs.
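The notion of a C-string on either strand can be made concrete with a short sketch. The Python below is only an illustration, not the analysis pipeline used in this thesis: the function name `c_strings`, the `min_len` threshold of 2, and the example sequence are all assumptions introduced here. It relies on the fact that a run of G on the top strand is a run of C on the bottom strand.

```python
import re

def c_strings(seq, min_len=2):
    """Return (start, length, strand) for runs of consecutive C on either strand.

    Positions are 0-based on the top strand. A run of G on the top strand
    is reported as a C-string on the bottom strand.
    min_len is an assumed threshold, not a value taken from the thesis.
    """
    seq = seq.upper()
    runs = []
    for base, strand in (("C", "top"), ("G", "bottom")):
        for m in re.finditer(base + "{%d,}" % min_len, seq):
            runs.append((m.start(), m.end() - m.start(), strand))
    return sorted(runs)

# Made-up sequence for illustration only; this is not the E2A fragile zone.
example = "ATCCCTAGGGGATCGCCA"
print(c_strings(example))
```

Per-position reactivity values from a native bisulfite experiment could then be grouped by the length of the C-string each cytosine belongs to, which is the kind of comparison the correlation above refers to.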
I have also studied AID activity on the E2A fragile zone using a biochemically defined
system. Four AID deamination peaks, two within the E2A fragile zone and two in the downstream
region, are observed on the non-transcribed strand (NTS) but not the transcribed strand (TS) of
E2A without transcription. AID deamination activity within the E2A fragile zone and its
downstream region increases upon transcription in this purified system using T7 RNA polymerase.
Seven AID deamination peaks, with higher mutation rate compared with that on untranscribed
E2A, are present on the NTS with transcription and six of them are within and downstream of the
E2A fragile zone.
I next investigate factors that may influence the AID targeting in the 23 bp E2A fragile zone.
I find that RNase A treatment markedly decreases the AID activity on the NTS of E2A to
background levels during transcription. This indicates that removal of the trailing nascent RNA
from the RNA polymerase (removal of RNA tail is known to reduce transcription stalling)
decreases the AID accessibility to the cytosines on the NTS.
Using the biochemical system, I also investigate the impact of C-string density on the AID
deamination of E2A using two E2A mutants, mut1 with a C-string on the NTS disrupted and mut2
with a C-string on the NTS and two C-strings on the TS disrupted. Compared with the results on
wild type E2A substrate (wt E2A), AID shows similar deamination activity on mut1 but shows
significantly decreased activity on mut2 both with and without transcription. These results
demonstrate that removal of three C-strings on mut2 leads to decreased AID targeting in the
fragile zone and the downstream region. In addition, the bisulfite sequencing assay (under
denatured DNA conditions) shows that the two CpG sites within the E2A fragile zone each has a
detectable level of methylation in the three pre-B cell lines (methylcytosine, upon deamination,
becomes T, which results in the long-lived T:G mismatch, which is a persistent lesion). A model
for the focused E2A breakage is derived based on the above results. It is likely that a
noncanonical B-form DNA (non-B DNA that adopts a B/A-intermediate DNA conformation
between A-form and B-form DNA) structure forms in the 23 bp E2A fragile zone and its
downstream region due to the high C-string density. RNA polymerase pausing in this non-B
region, especially at the C-strings, and DNA slippage between direct repeats that flank the fragile
zone can lead to this 23 bp fragile zone adopting a transient single-stranded state that is
vulnerable to AID deamination. When CpGs are methylated in this non-B region, AID acts on it
at AID WRC hotspot motifs (namely WRCG, W = A or T, R = A or G) generating a long-lived T:G
mismatch lesion, which is vulnerable to conversion to DSB by the action of RAG or Artemis.
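As a hedged illustration of the WRCG hotspot definition given above (W = A or T, R = A or G), the following Python sketch scans a sequence for WRCG on both strands. The function name `wrcg_sites` and the example sequence are hypothetical and are not part of the thesis analyses. On the bottom strand, WRCG reads as CGYW on the top strand (Y = C or T), and the target cytosine pairs with the G of that CG.

```python
import re

# Lookaheads are used so that overlapping motif occurrences are all reported.
TOP = re.compile(r"(?=[AT][AG]CG)")      # WRCG on the top strand
BOTTOM = re.compile(r"(?=CG[CT][AT])")   # reverse complement of WRCG (CGYW)

def wrcg_sites(seq):
    """Return (position, strand) of the cytosine in each WRCG motif.

    Positions are 0-based top-strand coordinates. For bottom-strand motifs
    the position is that of the top-strand G paired with the target C.
    """
    seq = seq.upper()
    sites = [(m.start() + 2, "top") for m in TOP.finditer(seq)]
    sites += [(m.start() + 1, "bottom") for m in BOTTOM.finditer(seq)]
    return sorted(sites)

# Made-up sequence for illustration only.
print(wrcg_sites("GGAACGTTTCGTACC"))
```

A scan of this kind only lists candidate sites; whether a given site is actually deaminated depends on the factors discussed above, such as single-stranded character and methylation status.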
Besides DNA sequence predisposing to a transient ssDNA that makes the fragile zone a
suitable AID substrate, I also considered the possibility that specific proteins may bind within the
fragile zone and cause strand separation to create a suitable ssDNA substrate for AID. A DNA
affinity column and an electrophoretic mobility shift assay (EMSA) were used to detect proteins
specifically binding to the E2A fragile zone. Proteins with high affinity to the E2A fragile zone
were not identified using these assays, likely due to their high complexity and low sensitivity.
A de novo oligo pull down assay utilizing the specificity of DNA pairing followed by mass
spectrometric (MS) analysis is under development to study the unknown DNA binding proteins on
any DNA sequence of interest inside live cells. Further optimization and testing are needed to
apply this assay for the study of potential E2A specific binding proteins that can separate DNA
strands to generate transient ssDNA.
Organization of the thesis
In Chapter 1, I provide a brief general introduction to the role of the E2A translocation in
the context of lymphoid malignancies.
In Chapter 2, I describe the purification of the monomeric AID.
In Chapter 3, I characterize the methylation status of the two CpG sites within the 23 bp
E2A fragile zone.
In Chapter 4, I demonstrate the increased single-stranded character of the E2A fragile
zone and its surrounding regions, using the native bisulfite chemical assay, melting studies, and
a P1 nuclease assay.
In Chapter 5, I present the AID activity on the E2A substrate using a biochemically defined
in vitro system to investigate the influence of C-string density and the nascent RNA transcript on
AID targeting within the E2A fragile zone.
In Chapter 6, I study E2A specific binding proteins by EMSA and DNA affinity resin.
In Chapter 7, I describe the development of a de novo oligo pull down assay to identify
DNA binding proteins on any DNA sequence of interest in live cells.
In Chapter 8, I summarize all the results and present a final model for the focal nature of
the E2A breakage.
Chapter 1 Introduction
1.1 Overview of translocations
A chromosomal translocation is the first essential step in the majority of
human lymphoid malignancies (lymphoid leukemias and lymphomas). It usually involves two
phases: the breakage phase (at each of two chromosomes) and the rejoining phase (Fig. 1.1)
(Lieber, 2016; Tsai et al., 2008). Non-homologous end joining (NHEJ) is active throughout the
entire cell cycle and is the major pathway employed by somatic cells to repair DNA double-strand
breaks (DSBs) in the rejoining phase (Chang et al., 2017; Zhao et al., 2020a). NHEJ joins the
correct ends from each double-strand break (DSB) in nearly all cases. However, NHEJ can also
join the incorrect broken chromosome ends from two DSBs that occasionally arise concurrently,
and this can result in a chromosomal translocation (Fig. 1.2).
Figure 1.1 Breakage and rejoining phases of a double strand repair.
Physiological and pathological causes of DSBs in mammalian somatic cells are listed at the top. At any
time in the cell cycle, DSBs can be repaired by non-homologous DNA end-joining (NHEJ). During S and
G2 of the cell cycle, homology-directed repair is common because the two sister chromatids are in close
proximity, providing a nearby homology donor. Homology-directed repair includes homologous
recombination (HR) and single-strand annealing (SSA). Proteins involved in the repair pathways are also
shown. RAG, recombination activating gene; AID, activation-induced deaminase; UDG, uracil DNA glycosylase;
APE, apurinic-apyrimidinic endonuclease 1; DNA-PKcs, DNA-dependent protein kinase catalytic subunit; Pol, DNA polymerase;
XLF, XRCC4-like factor; NBS1, Nijmegen breakage syndrome 1; MRE11A, meiotic recombination 11
homologue A.
Figure 1.2 Common chromosomal translocations in B cells.
Most of the lymphoid translocations at the early B cell stage involve a DSB that is a recombination activating
gene (RAG)-induced event (termed RAG-type break) at the immunoglobulin locus and a second DSB that
is an activation-induced cytidine deaminase (AID)-induced event (termed AID-type break) on an oncogene.
E2A-PBX1 translocation in early B cells (c) involves an AID-type break on E2A and a second break on PBX1
caused by random events such as oxidative stress or ionizing radiation. Mature B cell translocations
(f and g) commonly involve an AID-type event on one chromosome and a second event involving a failed
IGH class switch recombination (CSR) event, which is a physiologic AID-type event. MALT1, mucosa-
associated lymphoid tissue lymphoma translocation 1; CRLF2, cytokine receptor-like factor 2.
The incorrect exchange of the DNA ends between the two DSBs could occur due to the
broken ends of one DSB diffusing apart; then each DNA end from the first DSB may be joined by
random association to the broken DNA ends at a second DSB on a different chromosome. After
joining all four ends, the two resulting translocation chromosomes are called derivative
chromosomes. In cases where both derivative chromosomes have one centromere, these
chromosomes are usually stable. In cases where one chromosome has two centromeres and the
other has none, then the latter chromosome is often lost from progeny cells after cell division due
to the failure of the chromosome to align at the metaphase plate. But any derivative chromosome
bearing two centromeres will undergo breakage-fusion-bridge cycles, resulting in further variation
of that derivative chromosome (Murnane, 2006).
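The centromere bookkeeping in this paragraph can be sketched in a few lines of Python. This is a deliberately simplified illustration and not a model from the thesis: each broken chromosome is assumed to split into one centromere-bearing fragment and one acentric fragment, and the names `FATE` and `reciprocal_outcomes` are introduced here only for the example.

```python
# Fate of a derivative chromosome by centromere count, as described in the text.
FATE = {
    0: "acentric: often lost from progeny cells after division",
    1: "monocentric: usually stable",
    2: "dicentric: prone to breakage-fusion-bridge cycles",
}

def reciprocal_outcomes():
    """Enumerate the two ways the four ends of two DSBs can be misjoined.

    Each broken chromosome is modeled as a centromere-bearing fragment (1)
    and an acentric fragment (0). This two-fragment model is an
    illustrative assumption.
    """
    a_centric, a_acentric = 1, 0
    b_centric, b_acentric = 1, 0
    pairings = [
        # Centric fragment of one chromosome joined to the acentric fragment of the other.
        (a_centric + b_acentric, b_centric + a_acentric),
        # Both centric fragments joined together, both acentric fragments joined together.
        (a_centric + b_centric, a_acentric + b_acentric),
    ]
    return [tuple(FATE[n] for n in pair) for pair in pairings]

for outcome in reciprocal_outcomes():
    print(outcome)
```

The first pairing gives two monocentric derivative chromosomes; the second gives one dicentric and one acentric chromosome, matching the two cases distinguished in the text.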
DNA breakage inside somatic cells can be due to physiologic or pathologic causes (Fig.
1.1) (Lieber, 2010). V(D)J recombination and class switch recombination (CSR) are two major
physiologic processes that involve DNA breakage in vertebrates. The recombination activating
gene (RAG) complex (a heterotetramer composed of two RAG1 and two RAG2 molecules) is the
nuclease that initiates DNA breakage in V(D)J recombination at early stages of B or T cells to
generate a diverse repertoire of immunoglobulins and T cell receptors (Ma et al., 2002; Oettinger
et al., 1990; Schatz et al., 1989). Activation-Induced cytidine Deaminase (AID) is involved in CSR
and somatic hypermutation (SHM) of germinal center B cells to generate diversity in the constant
region of the antibody heavy chain (Muramatsu et al., 2000). Most of the chromosomal
translocations in human lymphoid malignancies are due to the incorrect rejoining of a DNA break
initiated by the RAG complex or AID at the immunoglobulin heavy (IGH) locus and a second break
at a non-IGH locus, typically an oncogene (Lieber, 2016). The breakage initiated by the RAG
complex or AID is located at specific DNA motifs due to the sequence preferences of each of
these lymphoid-specific enzymes, and these will be discussed in detail later.
Pathologic causes, such as oxidative free radicals, ionizing radiation, or failed
topoisomerase II reactions, are responsible for the DSB in translocations not due to failed V(D)J
recombination or CSR (Fig. 1.1). These are general causes of DNA breakage that can occur in
all types of living cells, not just lymphoid cells. The breakage initiated by these general pathologic
causes is quite often distributed over large regions because there is little or no sequence
specificity. These large regions of distribution of breakage location can be several kilobases (kb)
of an intron in some cases or even hundreds of kb in other cases. Cells that harbor translocations,
in which the fusion of two open reading frames encodes an oncogenic protein, often acquire a
survival or proliferation advantage and become neoplastic. Examples of this are BCR-ABL (the
Philadelphia chromosomal translocation) in myeloid malignancies and MLL translocations in
some hematopoietic malignancies (Table 1.1).
Table 1.1 Fragile zones of common haematopoietic translocations.
CRLF2, cytokine receptor-like factor 2; ETV6, ETS variant 6; ICR, intermediate cluster region; MALT1,
mucosa-associated lymphoid tissue lymphoma translocation 1; MBR, major breakpoint region; mcr, minor
cluster region; MLL, mixed lineage leukaemia; MTC, major translocation cluster; RUNX1, runt related
transcription factor 1.
In lymphoid cells the DNA breakage mechanism at the non-IGH loci (the oncogene) is
extremely interesting because its highly focused nature suggests a consistent mechanism. In B
cell translocations, a majority of these breaks, including the DSBs at Bcl2, E2A (TCF3), Bcl1
(CCND1), cytokine receptor-like factor 2 (CRLF2), and mucosal-associated lymphoid tissue
lymphoma translocation 1 (MALT1), take place at AID hotspot motifs in narrow zones of 20-600
bp (Table 1.1) (Cui et al., 2013; Greisman et al., 2012; Lieber, 2016; Pannunzio and Lieber, 2017;
Tsai et al., 2008). Although they occur less frequently, larger break regions of up to several kb,
such as Bcl6 and c-myc, are also hotspots of DNA breakage in B cell translocations (Lieber, 2016;
Lu et al., 2015b; Lu et al., 2013; Tsai et al., 2008). While AID is most abundant in mature B cells,
many studies have shown that there is a low level of AID expression in the pro- and pre-B cell
stages (Cantaert et al., 2015; Han et al., 2007; Kelsoe, 2014; Kumar et al., 2013; Kuraoka et al.,
2011; Kuraoka et al., 2009; Mao et al., 2004; Ueda et al., 2007; Umiker et al., 2014). The low
level of AID in early stages of B cell development is sufficient to catalyze the rare but critical
translocation events. AID can initiate a DNA lesion by deaminating cytosines in regions of
transient or stable single-stranded DNA (ssDNA) (Bransteitter et al., 2003; Pham et al., 2003),
resulting in an easily recognized and repaired U:G mismatch most of the time (Schmutte et al.,
1995; Walsh and Xu, 2006). A more persistent T:G mismatch is generated by AID in the presence
of methylated cytosine at the CpG site. This long-lived T:G lesion can be converted to a DSB by
the RAG complex or by the structure-specific nuclease, Artemis, in the context of the activated
Artemis:DNA-PKcs complex (Fig. 1.3) (Cui et al., 2013; Tsai et al., 2008). The central question,
why AID activity is limited to such focal 20-600 bp regions on these oncogenes, remains, and that
is the focus of my thesis.
Figure 1.3 Conversions of persistent DNA lesions caused by AID to DSBs.
AID can deaminate methylcytosines in a single-stranded region at a CG motif to form a long-lived lesion. A
second deamination event could occur and a 2 bp loop structure is formed. Within the small loop, the
methylcytosine can be recognized by MBD4 or TDG to be converted to an abasic site, which can be further
cut by APE. Structure-specific nucleases inside the cells, such as the RAG complex and the Artemis:DNA-PKcs
complex, can nick at the mismatches to create DSBs.
1.2 The joining phase of chromosomal translocations
After any single DSB occurs, NHEJ is the predominant pathway available through the
whole cell cycle for repairing the DNA damage (Beucher et al., 2009). The two resulting DNA
ends from a DSB are repaired, usually with nucleotide loss and modification at the joining site
prior to restoration of the overall configuration of the chromosome. For chromosomal
translocations, two DSBs are involved and result in a pathologic configuration of the chromosomal
segments. The enzymes in the NHEJ pathway are the predominant ones involved in making the
broken DNA ends chemically suitable for ligation, regardless of whether the correct or the
translocated chromosomal configuration is the result.
In NHEJ, a nuclease, multiple polymerases, and a ligase participate in the rejoining phase
(Chang et al., 2017). Direct ligation of the two ends is usually blocked by end incompatibility and
chemical modification. Ku binding at the DNA ends after breakage prevents excessive DNA end
resection (≥ 20 nt), which distinguishes NHEJ from other repair pathways (Chang et al., 2017;
Pannunzio et al., 2018). The Artemis:DNA-PKcs complex, recruited to the Ku-bound DNA ends,
resects the DNA ends, often until short microhomology (≤ 4 nt) between the strands is exposed to
facilitate proper ligation (Chang et al., 2017; Pannunzio et al., 2018).
Random nucleotide addition by polymerases is another mechanism that can generate
microhomology. Members of the polymerase X family, including terminal deoxynucleotidyl
transferase (TdT), polymerase mu (pol µ), and polymerase lambda (pol λ), are the main
polymerases that generate ligatable ends. The inserted DNA sequence at the break junction
provides some clues as to which polymerase added the nucleotides at the junction (Zhao et al.,
2020a). TdT functions in a template-independent manner with a preference for incorporating
dCTP and dGTP (Gauss and Lieber, 1996). TdT is only expressed in early B and T lymphocytes
(Li et al., 1993), and one of its main activities is the random addition of nucleotides to ssDNA
during V(D)J recombination to increase the antigen receptor and immunoglobulin diversity
(Bertocci et al., 2006). The randomly added nucleotides by TdT during the rejoining phase are
called N-nucleotides (N-nts) or N-regions (Landau et al., 1987). Pol µ and pol λ are involved in
NHEJ in all cells, and they can incorporate nucleotides in both template-independent and
template-dependent manners (Blanca et al., 2004; Maga and Hübscher, 2003; Ramadan et al.,
2004; Wood et al., 2001). The template-independent activity of pol λ is weaker than
that of pol µ and TdT (McElhinny et al., 2005). When nucleotides are incorporated in a
template-dependent manner (by copying nts in either of the two DNA ends that are being joined),
the nucleotide additions are called T-nucleotides (T-nts) (Lieber, 2016; Zhao et al., 2020a).
DNA ligase IV (Lig4) functions exclusively in NHEJ. Two major steps are involved in the
ligation of two broken dsDNA ends, the physical juxtaposition of DNA ends (synapsis) and
covalent ligation. The x-ray repair cross-complementing 4 (XRCC4) can stimulate the activity of
Lig4 (Grawunder et al., 1997). The Lig4:XRCC4 complex (X4L4) is the central component for both
ligation and DNA end synapsis, the latter being stabilized by the XRCC4-like factor (XLF) and the paralog
of XRCC4 and XLF (PAXX) (Grawunder et al., 1997; Zhao et al., 2019). When DNA ends
contain at least 1 nt microhomology, pol µ can also mediate efficient synapsis for ligation (Zhao
et al., 2020b). Once DNA ends are in close proximity with proper configurations, the covalent
ligation can occur immediately.
Of note, the proximity of the two chromosomes may affect the rejoining phase but there is
no evidence showing that chromosome proximity has any role in the DNA breakage phase in
lymphoid neoplasms. It is not likely that the distance of two chromosomes inside the cells would
affect the breakage of the DNA (Sunder and Wilson, 2019; Wilson and Sunder, 2020). If proximity
is important, we would expect translocations between nearby chromosomes to dominate,
rather than the largely random pairings across the genome that are currently known. E2A-PBX1
translocations occur between chromosome 1 and chromosome 19, which are far away from each
other according to the 3D map of human nuclei (Bolzer et al., 2005).
1.3 Overview of causes of chromosome breaks
The DSBs in lymphoid cells can be induced by the general causes that apply to all living
cells or due to the lymphoid specific factors, such as the aberrant action of the RAG complex
(during V(D)J recombination) or AID (during CSR at the IGH locus or during SHM at any of the
immunoglobulin gene loci) (Fig. 1.1) (Gostissa et al., 2011; Lieber, 2010; Mahowald et al., 2008;
Nussenzweig and Nussenzweig, 2010; Tsai et al., 2008). The two DSBs in lymphoid
chromosomal translocations may have independent causes or the same cause. In one type, the
RAG complex causes the breaks on both chromosomes, and these are referred to as RAG-type
events at both chromosomal locations (that is, RAG-type-RAG-type translocations). In the most
common cases, AID can initiate damage, causing a DSB on one chromosome (an AID-type break),
and the RAG complex would cause the break at the other chromosome (RAG-type-AID-type
translocation) (Fig. 1.2). In a third type, both breaks can be AID-type. In a fourth category,
the 'uncertain-type' breaks, a break is neither RAG-type nor AID-type but generated by one of the
other general causes mentioned above. In all cases, if the two DSBs are on the same
chromosome, then the same enzymes can cause a deletion within that chromosome, rather than
a translocation.
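The naming scheme described above can be summarized as a small lookup. The following is a minimal sketch; the function name `classify_event` and the string labels 'RAG', 'AID', and 'uncertain' are illustrative choices and do not come from any software used in this work.

```python
def classify_event(break_a: str, break_b: str, same_chromosome: bool = False) -> str:
    """Name a two-break event from the cause assigned to each DSB.

    Each break is labeled 'RAG', 'AID', or 'uncertain' (illustrative labels).
    """
    order = {"RAG": 0, "AID": 1, "uncertain": 2}
    if break_a not in order or break_b not in order:
        raise ValueError("break type must be 'RAG', 'AID', or 'uncertain'")
    # Sort the pair so that RAG/AID and AID/RAG receive the same name.
    first, second = sorted((break_a, break_b), key=order.__getitem__)
    # Two breaks on one chromosome give a deletion rather than a translocation.
    outcome = "deletion" if same_chromosome else "translocation"
    return f"{first}-type-{second}-type {outcome}"
```

For example, `classify_event("AID", "RAG")` names the most common case in B cells, and passing `same_chromosome=True` switches the outcome to an interstitial deletion.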
1.3.1 RAG-type breaks at CAC motifs
The RAG complex normally cuts at the CAC of the heptamer since this is the invariant portion of
the recombination signal sequence (RSS). RAG-type breaks can occur at CAC sequences at loci
other than the antigen receptor loci. RAG-type-RAG-type interstitial (intrachromosomal) deletion is one
of the common outcomes that can lead to human T cell malignancies. Most notably, in pre-T cell
acute lymphoblastic leukemia/lymphoma (T-ALL), RAG complex breaks at both the SCL (also
called TAL1) and SCL-interrupting locus (SIL; also known as STIL) locations are observed (Aplan
et al., 1990; Lieber, 1993). A small percentage of human B cell malignancies also appear to
involve RAG-type-RAG-type interstitial deletions, such as the CRLF2-purinergic receptor P2Y8
(P2RY8) translocation on the X chromosome in some human pre-B ALLs (Weigert and Weinstock,
2012). RAG-type-RAG-type translocations in human T-ALL, such as translocations between T
cell receptor (TCR) locus and the SCL, LIM domain only 2 (LMO2), homeobox 11 (HOX11; also
known as TLX1), and TTG1 (also known as LMO1) loci, are less common than the interstitial
deletions (Norris and Stone, 2008; Raghavan et al., 2001). The IGH-CRLF2 translocation is an
example of a RAG-type event at IGH and an AID-type event at CRLF2 in B cell ALL (Tsai et al.,
2010b).
In summary, RAG-type-RAG-type translocations account for the majority, if not all, of the
translocations in T-ALLs, which have no AID expression. However, the co-expression of the RAG
complex and AID is responsible for the most common translocations in B cells, and this is a very
critical point that cannot be overemphasized. Given the involvement of AID, the DNA structural
requirements of AID are important to consider next.
1.3.2 AID-type breaks at AID hotspot motifs
AID-type breaks occur at AID hotspot motifs, including WGCW, WRC, and CGC,
specifically at cytosines of ssDNA (Pham et al., 2003; Yu et al., 2004). It is the most common
type of breakage in the oncogenes, including Bcl2, Bcl1, E2A, MALT1, and CRLF2 in early B cells
and Bcl6 and c-myc in mature B cells, involved in B cell translocations. The IGH locus is the
common translocation partner of the B cell oncogenes.
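The hotspot motifs named above use degenerate nucleotide codes (W = A or T; R = A or G). As an illustration of how such motifs can be located in a sequence, here is a minimal sketch that scans one strand only; the function name `motif_starts` and the example sequence are made up for this illustration and are not patient data.

```python
import re
from typing import List

# Degenerate codes used in the text: W = A or T; R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}

def motif_starts(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of every (overlapping) motif match."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]

# Hypothetical sequence, for illustration only.
example = "TTAGCTCGCAAC"
hits = {motif: motif_starts(example, motif) for motif in ("WGCW", "WRC", "CGC")}
```

A fuller analysis would also scan the reverse complement, since AID can act on cytosines of either strand when the DNA is single-stranded.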
In early B cell translocations, the RAG-type-AID-type breakage is commonly observed (Fig.
1.2). The RAG motif found near the breakage sites on IGH locus indicates that the breaks in IGH
most likely arise from the aberrant V(D)J recombination events in these translocations. The
breakpoints in B cell oncogenes in Bcl2-IGH, Bcl1-IGH, MALT1-IGH, and CRLF2-IGH
translocations are in significant proximity to the CG motif and AID hotspot motifs (Table 1.2),
suggesting the involvement of AID in the breakage phase. This feature is not observed in T-ALL
translocations because of the exclusive expression of AID in B cells but not T cells (Lieber, 2016).
Table 1.2 Statistical analyses of sequence motifs in proximity of fragile region breakpoints in
patients.
The statistical analysis results of all breakpoints in different B cell fragile zones to CG and AID hotspot
motifs (CGC, WRC, WGCW) are shown in the table (W = A or T; R = A or G). The statistical analyses
were performed in the same way as described before using the Lieber Lab database (Tsai et al., 2008).
The binomial statistic gives the probability that the E2A breakpoints occur at the tested DNA motif by
random chance. The Student’s t-test and the Mann-Whitney U-test are two different tests for proximity of
the E2A breakpoints to a given motif. DNA motifs to which the breakpoints show statistically significant
proximity in all three statistics are highlighted in red. The breakpoints in Bcl2 fragile zones, Bcl1 MTC,
E2A 23 bp fragile zone, and 86 bp MALT1 fragile zone are statistically significantly close to both CG and
AID hotspot motifs. The Bcl1 breakpoints located outside of the MTC (non-MTC) show significant proximity
to CG and AID hotspot motifs when fused with IG loci, but not when fused with non-IG loci. The MALT1
breakpoints in API2-MALT1 translocations do not show proximity to the CG motif but do show proximity to the AID CGC motif. The Bcl6
and c-myc breakpoints are statistically significantly close to AID WRC and WGCW motifs when translocated
to IG loci.
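The proximity tests summarized in this legend operate on the distance between each breakpoint and the nearest occurrence of a motif. A minimal sketch of that distance measurement for the CG motif follows. It is not the Lieber Lab database code; the coordinate convention (0-based position of a breakpoint measured against the 0-based start of each CG) and the example sequence are assumptions made for illustration.

```python
from typing import List

def cg_starts(seq: str) -> List[int]:
    """0-based start positions of every CG dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def distance_to_nearest(position: int, sites: List[int]) -> int:
    """Distance in nucleotides from a breakpoint to the closest site."""
    if not sites:
        raise ValueError("no sites to measure against")
    return min(abs(position - s) for s in sites)

# Hypothetical sequence and breakpoint coordinates, for illustration only.
seq = "AATCGTTTTACGAA"
sites = cg_starts(seq)
distances = [distance_to_nearest(bp, sites) for bp in (3, 6, 12)]
```

Distances of this kind, collected for observed breakpoints and for positions expected by chance, are the sort of input that a t-test or a Mann-Whitney U-test can compare.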
Besides the common IGH translocation partner locus, other oncogenes can also be involved in
early B cell translocations. E2A-HLF and E2A-PBX1 translocations are typical AID-type-uncertain-type
events. The breakpoints of the E2A gene in both translocations are in significant
proximity to CG motif and AID motifs (Table 1.2), suggesting the key role of AID in the initiation
of the E2A breakage. In contrast to the DSB at E2A, the breakpoints on PBX1 and HLF genes
are randomly distributed within broad zones (230 kb in PBX1 intron 2 and 5.3 kb in HLF intron 3)
with no obvious clustering (Fig. 1.4). The breakpoints of PBX1 and HLF are most likely random
and due to the pathologic causes mentioned earlier. The growth advantage from the resulting
fusion protein is likely the determining factor for which introns within PBX1 and HLF are the ones
involved in the translocations.
Figure 1.4 PBX1 and HLF breakpoints distribution.
(a) PBX1 breakpoint distribution. The PBX1 breakpoints in 48 out of 49 patients are randomly dispersed in
intron 2 with no obvious clusters. (b) HLF breakpoint distribution. HLF breakpoints in 7 out of 8 patients
are located within intron 3 with no obvious pattern.
In mature B cells, AID-type-AID-type breakage accounts for most translocation events,
including Bcl6-IGH translocation and c-myc-IGH translocation. The AID motif found around the
breakpoints of IGH locus suggests an event due to CSR or SHM in the mature B cells. The
significant proximity of Bcl6 and c-myc breakpoints to CG motif and AID motifs also indicates that
AID is involved in the breakage of those genes (Table 1.2).
1.4 Timing of B cell translocations
1.4.1 Translocations occur during pro- and pre-B cell stages based on junctional
nucleotide additions
Junctional additions made by TdT during V(D)J recombination in early B and early T cells are designated
N-nts; they are C/G-rich and often contain runs of consecutive identical nucleotides. In most of the
lymphoid translocations, the N-nt feature is seen at the junctional sequences of the breaks that
occur at the non-IGH loci, such as E2A, MALT1, and CRLF2. Among the 72 junctional sequences
in E2A involved translocations that I analyzed, 66 of them contain N-nts, and the remaining ones
show no nucleotide additions. Among the six cases without nucleotide additions, two have 4 to 6
nts microhomology between E2A and its translocation partner. About 60% of the inserted
nucleotides (458 out of 773) are Cs and Gs, and 46 out of 72 (65%) junctional sequences contain
at least three consecutive Cs or Gs. N-nts are observed in all eight patients with IGH-MALT1
translocation with four of them containing consecutive Cs or Gs. Eighteen out of 19 junctional
sequences around CRLF2 breakpoints in CRLF2-IGH translocations also contain N-nts,
consistent with TdT activity.
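The two junctional features tallied above, C/G content and strings of consecutive Cs or Gs, can be scored per insertion. The sketch below assumes that a "string" means three or more copies of the same base (CCC or GGG), which is one reading of the criterion; the function name `describe_insert` and the example insertion are hypothetical.

```python
import re

def describe_insert(insert: str, min_run: int = 3) -> dict:
    """Summarize C/G content and C- or G-strings of a junctional insertion."""
    insert = insert.upper()
    gc = sum(base in "CG" for base in insert)
    # A run is min_run or more copies of the same base (CCC... or GGG...).
    has_run = re.search(r"C{%d,}|G{%d,}" % (min_run, min_run), insert) is not None
    return {
        "length": len(insert),
        "gc_fraction": gc / len(insert) if insert else 0.0,
        "has_cg_run": has_run,
    }

summary = describe_insert("GGGCTCCA")  # hypothetical insertion, not patient data
```

Applied to every junction in a data set, the `gc_fraction` values pool into the overall C/G share of inserted nucleotides, and the `has_cg_run` flags give the count of junctions carrying C- or G-strings.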
Both N-nts and T-nts are observed in fragile zones of Bcl2 and Bcl1. Over 97% of the
Bcl1 breakpoints contain inserted junctional sequences, and 57% of the 1358 inserted nucleotides
are Cs or Gs with frequent presence of C/G-strings, typical of TdT addition. T-nts of 8 to 12 nts
are observed in 9 out of the 104 Bcl1 MTC breakpoints, exhibiting mismatches with the germ-line
sequences from MTC or IGH regions (Welzel et al., 2001). Ten out of 38 breakpoints located
outside of Bcl1 MTC region show T-nts of ≥ 8 nts. Over 95% of the junctional sequences of Bcl2
show nucleotide additions, the majority of which are random nucleotides (not templated), a
characteristic of TdT activity. One study reported T-nt insertions at the Bcl2-IGH junctions in 30%
of the follicular lymphoma patients (Jager et al., 2000). The two different nucleotide addition
patterns in Bcl1 and Bcl2 may indicate the involvement of two different breakage/repair
mechanisms. The addition of T-nts may indicate that pol µ and pol λ had a longer time window
to modify the DNA ends. It is possible that an alternative end joining (aEJ) pathway may be
responsible for some T-nts (Carvajal-Garcia et al., 2020). We favor the view that all the N-nts
and a majority of the T-nts are added during NHEJ for the following reasons: (a) most of the
junctions have TdT additions, indicating that TdT is present and indicating a typical NHEJ event
in lymphoid cells (Gauss and Lieber, 1996); (b) pol µ and λ can generate direct repeats (DR) and
inverted repeats (IR), which are the essential feature of T-nts (Maga and Hübscher, 2003); (c)
aEJ typically generates at least 2 nts of microhomology at both ends of the junctional addition and
this is rarely observed in lymphoid cells (Carvajal-Garcia et al., 2020).
The presence of N-nts in the junctional regions of E2A, Bcl1, Bcl2, and MALT1 indicates
that these chromosomal translocations arise during the pro-B or pre-B cell stage. For Bcl1, Bcl2
and MALT1, this is consistent with the partner DSB arising at the IGH locus during V(D)J
recombination (which is a pro-B/pre-B cell event).
1.4.2 Translocations occur during mature B cell stage
Translocations involving IGH and Bcl6 or c-myc occur in much larger zones and do not
show features of TdT addition. Among 63 cases with Bcl6 translocated to immunoglobulin loci
with sequence information, 22 contain 1-5 nt microhomology, 23 contain random insertions,
and 18 have no identifiable insertions or microhomology. Among the 58 cases with Bcl6
translocated to non-immunoglobulin loci with sequences available, 29 have 1 to 4 nts
microhomology, 9 contain random insertions, and 20 have no insertions or microhomology. The
low percentage of N-nts presence in the junctional sequences of Bcl6 translocations is consistent
with the view that these translocations are more likely to occur in mature B cells rather than in
pro-B cells. Among the 177 c-myc breakpoints, 33 contain nucleotide insertions of 1 nt to 33 nt
in length with the majority of them less than 5 nt; 92 have microhomology of 1 to 5 nts from either
c-myc or IGH sequences with no insertions; and the remaining show no insertions or
microhomology (when the IGH break junctional sequence is available for inspection).
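The Bcl6 counts quoted above can be turned into fractions for comparison with the early B cell junctions. The short worked calculation below uses only numbers stated in this chapter; the variable names are illustrative. It treats every Bcl6 insertion as a potential N-nt addition, so the Bcl6 fractions are upper bounds.

```python
# Counts quoted in the text for Bcl6 junctions.
bcl6_ig = {"microhomology": 22, "insertion": 23, "neither": 18}
bcl6_non_ig = {"microhomology": 29, "insertion": 9, "neither": 20}

def insertion_fraction(counts: dict) -> float:
    """Fraction of junctions that carry inserted nucleotides."""
    return counts["insertion"] / sum(counts.values())

ig_fraction = insertion_fraction(bcl6_ig)          # 23 of 63 junctions
non_ig_fraction = insertion_fraction(bcl6_non_ig)  # 9 of 58 junctions
e2a_fraction = 66 / 72                             # E2A junctions with N-nts (section 1.4.1)
```

The insertion-bearing share is roughly 37% and 16% for the two Bcl6 groups, against roughly 92% for E2A.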
The characteristics of Bcl6 and c-myc junctional sequences are very different from those of
Bcl1, Bcl2, MALT1, and E2A, which mostly contain N-nts. The mutation patterns of Bcl6 and c-myc
(caused by SHM) are consistent with the distribution of breakpoints, suggesting AID activity
in germinal center B cells as the cause of the breaks (Lu et al., 2015b).
1.5 Factors relevant to translocations that occur in early B cell development
Breakage events at Bcl2, Bcl1, E2A, MALT1, and CRLF2 are likely caused by similar
mechanisms considering the time window and sequence signatures of the fragile regions. First,
the fragile regions of those genes are short, varying in length between 20 bp and 600 bp (Cui et al.,
2013; Greisman et al., 2012; Lieber, 2016; Pannunzio and Lieber, 2017; Tsai et al., 2008) (Fig.
1.5). A majority of the breakpoints are within these focal regions, which are up to 400-fold more
fragile compared with the surrounding sequences. Second, breakpoints within these fragile zones
are in proximity to the CG motif with high statistical significance (Table 1.2). Third, all the
breakage events arise from the pro-/pre-B cell stage. The inserted nucleotides in all fragile zones
have minimal microhomology to the nearby DNA of the derivative chromosomes (Tsai et al., 2008).
The presence of N-nts in most of the junctional sequences is consistent with TdT activity, which
is only expressed at pro-/pre-B/T cell stage. Fourth, the fragile zones tend to be in C-string rich
regions. DNA with C-strings is in a B/A intermediate DNA structure which undergoes more
thermal fluctuation than DNA of random sequence (Dornberger et al., 1999; Trantírek et al., 2000;
Tsai et al., 2009). Fifth, no currently known common features, including histone modifications,
transcriptional pausing, or replication origin proximity, are specific to all of these fragile regions.
Our lab has reported the critical role of AID in the breakage phase of these fragile zones
of 20-600 bp (Cui et al., 2013; Greisman et al., 2012; Pannunzio and Lieber, 2017; Tsai et al.,
2008). The central question remaining is what factors focus all the DNA breakage events to such
small 20-600 bp regions. The goal of this thesis is to define factors that contribute to the highly
focused AID activity within the fragile zones, using the 23 bp E2A fragile zone as our primary
focus because it is the smallest of all fragile zones identified thus far.
Figure 1.5 Breakpoints of non-IGH loci in early B cell translocations are focused within 20-600 bp
fragile zone.
(a) Bcl2 breakpoints in IGH-Bcl2 translocations. The Bcl2 breakpoints are scattered within a 30 kb region
with three cluster regions (red starbursts). The breakpoints that do not fall into cluster regions are plotted
as dark grey vertical lines. The 175 bp Bcl2 major break region (MBR), located at the 3’ untranslated region
(UTR) of Bcl2, contains 50% of all Bcl2 breakpoints. The breakage frequency within this 175 bp region is
300-fold higher than what one would expect to occur at random. Patient breaks within the MBR are not
uniformly distributed across the entire 175 bp, but rather are focused in 3 peaks centered around CG motifs.
The 105 bp intermediate cluster region (icr) and 561 bp minor cluster region (mcr) of Bcl2 contain 13%
and 5% of the Bcl2 breakpoints, respectively. Human lymphomas are clinically indistinguishable
regardless of the position of the breakpoint within the 29 kb. (b) Bcl1 breakpoints in Bcl1-IGH translocations.
The major translocation cluster (MTC) of Bcl1 (also known as CCND1) is located 109 kb upstream of
CCND1 gene and contains 64% of all Bcl1 breakpoints. The rest of Bcl1 breakpoints are scattered within
a 344 kb intergenic region between CCND1 gene at the telomeric end and MYEOV gene at the centromeric
end of chromosome 11. (c) E2A breakpoints in E2A-PBX1 translocations. Over 75% of the E2A
breakpoints are in a 23 bp region within E2A intron 16, making the 23 bp more than 400-fold more fragile
compared with other regions of the same E2A intron. (d) MALT1 breakpoints in IGH-MALT1 and API2-
MALT1 translocations. The breakpoints of MALT1 are focused to an 86 bp region upstream of MALT1
gene in IGH-MALT1 translocations. In contrast, the breakpoints of MALT1 in API2-MALT1 translocation
are scattered in a 29 kb region in MALT1 gene from several introns and exons.
1.6 Characteristics of E2A breakpoints
E2A breakpoint sequences were assembled from translocation positive patients described
previously (Fischer et al., 2015; Hein et al., 2019; Hunger et al., 1992; Inaba et al., 1992; Kato et
al., 2017; Paulsson et al., 2007; Tsai et al., 2008; Wiemels et al., 2002) for breakpoint analysis
and sequence motif analysis in the same manner as described before (Cui et al., 2013; Greisman
et al., 2012; Lieber, 2016; Pannunzio and Lieber, 2017; Tsai et al., 2008). The most common
translocation partner of E2A is the PBX1 gene. Fifty-nine (98%) of the 60 E2A breakpoints
sequenced from E2A-PBX1 translocations in 49 patients are mapped to E2A intron 16. Forty-six
of the 60 E2A breakpoints (77%), including 25 from single-breakpoint boundary analysis
(sequence from only one of the translocation partners is available) and 21 from the 11 pairs of
double-breakpoint boundary analysis (sequences of both translocation partners are available),
are clustered in the 23 bp E2A fragile zone (Fig. 1.6a). The remaining one breakpoint of a double-
breakpoint pair is only one nucleotide outside of the 23 bp fragile zone and is treated as in the 23
bp fragile zone for analyses described below. This region is over 400-fold more fragile than the
surrounding DNA within the 3.3 kb intron 16.
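The formula behind the fold-fragility figure is not spelled out here. One plausible reading is a ratio of breakpoint densities: breakpoints per bp inside the fragile zone divided by breakpoints per bp in the rest of the intron. The sketch below works through that reading with the counts given above; the function name `fold_fragility` and the use of 3,300 bp for the 3.3 kb intron are assumptions.

```python
def fold_fragility(in_zone: int, zone_bp: int, outside: int, region_bp: int) -> float:
    """Ratio of breakpoint density inside a zone to density in the rest of a region."""
    zone_density = in_zone / zone_bp
    outside_density = outside / (region_bp - zone_bp)
    return zone_density / outside_density

# Counts from the text: 59 breakpoints map to intron 16, 46 of them in the 23 bp zone.
# The intron length is taken as 3,300 bp (the text gives 3.3 kb).
fold = fold_fragility(in_zone=46, zone_bp=23, outside=59 - 46, region_bp=3300)
```

Under these assumptions the ratio comes out at roughly 500, which is in line with the statement that the zone is over 400-fold more fragile than the surrounding intron.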
The sequence motifs at or near each breakpoint are also informative. Three of the 12
single-breakpoints outside the 23 bp fragile zone in intron 16 are at CpG sites. Of the 25 single-
breakpoint sequences within the 23 bp fragile zone, 16 are at CpG sites. Of the 22 sequences
from 11 pairs of double-breakpoints, 12 are at CpG sites including three pairs with both
breakpoints at CpG sites. The boundaries of the initial DNA breakage site on each of the two
involved chromosomes can be more clearly identified when sequences of breakpoints from the
reciprocal translocation partners are available (Lieber, 2016). Remarkably, all of the eight
reciprocal translocations with a single or no breakpoint at CpG site have a CpG between the pair
of identified breakpoints. This suggests that the initial E2A breakage may arise at a CpG site with
some broken ends being resected by a few nucleotides, as is typical for DSB junctions repaired
by NHEJ (Zhao et al., 2020a). Statistical analysis shows that proximity of E2A breakpoints to the
CG motif is highly significant (Fig. 1.6b; p = 8.3 × 10⁻⁶ in U-test). Besides the CG motif, the AID
hotspot motifs, WRC (p = 1.1 × 10⁻³ in U-test) and its related motif WGCW (p = 1.6 × 10⁻³ in U-test),
are also statistically highly significant for their proximity to E2A breakpoints (Fig. 1.6b) (Pham
et al., 2003; Yu et al., 2004).
Another common translocation partner of E2A is the HLF gene. Twelve breakpoints, four
single-breakpoints and four pairs of double-breakpoints, sequenced from 8 patients with E2A-HLF
translocations are available for breakpoint and motif analyses. Seven breakpoints, three
single-breakpoints and two pairs of double-breakpoints, are mapped in intron 16 and all are in the 23 bp
fragile zone (Fig. 1.6c). Six of these seven E2A breakpoints are precisely at a CpG site in the 23
bp fragile zone (p = 1.3 × 10⁻³ in the binomial test). The proximity of E2A breakpoints in the E2A-HLF
translocation to the CpG site (p = 6.9 × 10⁻³ in U-test) and the AID WRC hotspot motif (p =
2.4 × 10⁻² in U-test) are statistically significant (Fig. 1.6d).
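A binomial test of this kind asks how likely it is that at least k of n breakpoints land on a motif by chance, given a per-breakpoint chance probability. The sketch below shows the calculation for k = 6 and n = 7. The null probability is not stated in this section, so the value 0.25 used here is a placeholder assumption, as is the choice of a one-sided upper tail.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 6 of 7 breakpoints at a CpG site; p = 0.25 is a placeholder null probability.
p_value = binomial_tail(k=6, n=7, p=0.25)
```

The result depends strongly on the null probability, which in practice would be derived from how often the motif occurs in the region being tested.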
The remarkable consistency of E2A break sites within the 23 bp fragile zone in patients
with the E2A-PBX1 and E2A-HLF translocations indicates the importance of this 23 bp of DNA,
either on a sequence level or a DNA structural level resulting from the sequence. It also suggests
that the CG motif and AID may play critical roles in the E2A breakage process. However, AID
requires DNA to be at least transiently single-stranded (Pham et al., 2003), and this fact compelled
us to focus extensively on the role of single-stranded character in this small region of DNA, with its over
400-fold increased fragility, in the mechanistic basis of chromosomal translocation in my study.
Therefore, the primary focus of this thesis is the role of AID action at these CG sites and the basis
for how the E2A 23 bp fragile zone can acquire transient ssDNA character to serve as an AID
substrate.
Figure 1.6 The E2A breakpoints clustering to a 23 bp zone in E2A intron 16 in patients.
(a) The distribution of E2A breakpoints in patients with E2A-PBX1 translocations. Sixty breakpoints from
49 patients were plotted as triangles. The nucleotide sequence in 23 bp E2A fragile zone is displayed in
red with the two CpG sites in green. Triangles above the E2A sequence are sequenced from derivative
chromosome 19 and below the sequence are from derivative chromosome 1. Solid triangles with matching
colors denote breakpoints on the pair of chromosomes from the same patient with a reciprocal translocation
when sequences of both junctions are available. For all the remaining patients with only one of the
derivative chromosomes sequenced, a hollow black triangle is used to mark the single breakpoint. (b)
Statistical analysis of sequence motifs in proximity of E2A breakpoints in patients with E2A-PBX1
translocations. Statistical analyses on the 60 E2A-PBX1 translocation breakpoints were performed to
measure their proximity to more than 70 DNA motifs. (c) The distribution of E2A breakpoints in patients
with E2A-HLF translocations. Twelve breakpoints from 8 patients were plotted as triangles and illustrated
in the same manner as in (a). Triangles above the sequence are sequenced from derivative chromosome
19 and below the sequence are from derivative chromosome 17. (d) Statistical analysis of sequence motifs
in proximity of E2A breakpoints in patients with E2A-HLF translocations. Analyses were performed to
measure the proximity of the 12 E2A-HLF translocation breakpoints to more than 70 DNA motifs.
For both (b) and (d): W = A or T; Y = C or T; S = G or C. The binomial statistic gives the probability that
the E2A breakpoints occur at the tested DNA motif by random chance. The Student’s t-test and the Mann-
Whitney U-test are two different tests for proximity of the E2A breakpoints to a given motif (Tsai et al., 2008).
Chapter 2 Monomeric AID Purification and Activity Test
2.1 Introduction
AID is a well characterized deaminase that converts cytosine in the WRC, CGC and
WGCW motifs to U, creating a U:G mismatch for an unmodified cytosine or a T:G mismatch in case
of a methylcytosine (Pham et al., 2003; Yu et al., 2004). AID is involved in the physiologic DSBs
during CSR in the immunoglobulin heavy chain (Lieber, 2016; Petersen et al., 2001; Roy et al.,
2008) and the breakage phase of many oncogenes, such as Bcl2, Bcl1, and Bcl6, involved in
chromosomal translocations in pre-B cells (Cui et al., 2013; Greisman et al., 2012; Lieber, 2016;
Pannunzio and Lieber, 2017; Tsai et al., 2008). AID protein purification is necessary for in vitro
studies to characterize its role in E2A breakage, which occurs at a significant proximity to AID
hotspot motifs. RNase A treatment has previously been used to remove long RNA species that
inhibit AID activity (Pham et al., 2003; Yu et al., 2004).
Figure 2.1 Illustration of AID.mono and its substrate structures.
(a) The scheme of the AID.mono construct. AID.mono is fused with an MBP tag at the N terminus. The
arrow indicates two non-conserved amino acids (130H and 131R) that are mutated in AID.mono. Six
amino acids at the N terminus and seventeen amino acids at the C terminus are truncated in AID.mono compared
with wild type AID (WT AID). (b) Possible AID substrate structures. The conformation of the substrate channel
and assistant patch of AID determines its high affinity for branched DNA and DNA structures containing
branched DNA such as the G4 structure. Figures adapted from Dr. Kefei Yu.
A recent study by the Hao Wu lab reported the purification of a highly active monomeric
AID (AID.mono) (Qiao et al., 2017). The AID.mono is able to deaminate more than 90% of the
substrate in a G-quadruplex (G4) conformation in a 40 min incubation. This AID.mono has a
maltose binding protein (MBP) tag on the N terminal of AID to increase its solubility, two mutations
at non-conserved sites (H130A/R131E) to increase the monomer yield, and has 6 amino acids at
the N terminal and 17 amino acids at the C terminal truncated compared with the wild-type AID
(WT AID) (Fig. 2.1a). Crystallization results indicate that AID prefers to bind DNA substrate with
a branched Y structure (Fig. 2.1b). This AID.mono functions the same as WT AID in a SHM-
mimic rifampicin resistance assay in bacteria and the ex vivo CSR assay in cultured murine B
cells (Qiao et al., 2017).
I optimized the procedure for AID.mono purification and tested the activity of AID.mono
and the aggregated AID. I find that the yield of AID.mono varies with cell type and buffer
composition, and that AID.mono has moderate deaminase activity that is not affected by RNase A treatment.
2.2 Materials and Methods
2.2.1 Oligonucleotides
The oligonucleotides used in this chapter were synthesized by Integrated DNA
Technologies, Inc. (IDT, San Diego, CA). The oligonucleotides were purified using denaturing
PAGE and the recovered oligos were quantitated by spectrophotometry. 5’ labeling of
oligonucleotides was done with [γ-32P] ATP (PerkinElmer Life Sciences, Waltham, MA)
using T4 polynucleotide kinase (PNK) [New England Biolabs (NEB), Beverly, MA] according to
the instructions of the manufacturer. Unincorporated radioisotope was removed by Sephadex
G25 Superfine resin (Pharmacia, Piscataway, NJ) in a spin column.
The sequences of the oligonucleotides used in this study were as follows: DL39, 5’-
TTTTTTTAGCTTTTTTT-3’; KY335, 5’-TTTTTTTTTTTTTTACGATTTTTTTTTTTT-3’.
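The hotspot context of these two substrates can be checked programmatically. The following Python sketch is illustrative only and is not part of the methods used in this work. It expands the degenerate codes used in this thesis (W = A or T; R = A or G is taken from the standard IUPAC table) into a regular expression and reports where WRC matches start. The helper names `motif_to_regex` and `find_motif` are hypothetical.

```python
import re

# Degenerate base codes; W, Y, S follow the thesis, R (A or G) is the standard IUPAC code.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "S": "[GC]"}


def motif_to_regex(motif):
    """Translate a degenerate motif such as 'WRC' into a regex fragment."""
    return "".join(IUPAC[base] for base in motif)


def find_motif(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    # A lookahead keeps matches zero-width so overlapping hits are all reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(seq)]


# Substrate sequences as listed in this section.
DL39 = "TTTTTTTAGCTTTTTTT"
KY335 = "TTTTTTTTTTTTTTACGATTTTTTTTTTTT"

for name, seq in (("DL39", DL39), ("KY335", KY335)):
    print(name, len(seq), "nt; WRC starts at:", find_motif(seq, "WRC"))
```

Because each substrate carries a single cytosine, at most one WRC match is possible per oligonucleotide, which is what makes the cleavage product in the activity assay a single defined length.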
2.2.2 AID.mono expression and purification
2.2.2.1 AID.mono baculovirus activity test
AID.mono baculovirus was a kind gift from the Hao Wu Lab (Boston Children’s Hospital).
Hi-5 insect cells (Invitrogen, Carlsbad, CA; 10 ml at a density of 1 x 10^6 cells/ml) were infected
with AID.mono baculovirus at an estimated multiplicity of infection (MOI) of 2 and harvested 48
hours post infection. Hi-5 cells without infection were used as the negative control for protein
expression. The Hi-5 cells were harvested, washed twice with ice-cold 1 x phosphate-buffered
saline (PBS), lysed with 1 ml of ice-cold lysis buffer (20 mM HEPES at pH 7.5, 300 mM NaCl, 0.5
mM TCEP, 10% glycerol, and a mixture of proteinase inhibitors), and sonicated (3 rounds with 20
pulses of 50% duty cycle for each round). The non-soluble components were removed by
centrifugation at 12,000 rpm for 30 min at 4°C (Eppendorf Centrifuge 5415C, Hamburg, Germany).
The supernatant that contained soluble cellular extract was mixed with 100 µl of amylose resin (GE
Healthcare, Chicago, IL) which was prewashed thoroughly with cell lysis buffer, and incubated at
4°C on the rotator for 1 hour. The resin was recovered by centrifugation at 2,000 rpm at 4°C for
2 min (Eppendorf Centrifuge 5415C) and washed with cell lysis buffer thoroughly. AID.mono was
eluted from the resin with 40 µl of amylose elution buffer (20 mM HEPES at pH 7.5, 300 mM NaCl,
0.5 mM TCEP, 10% glycerol, 50 mM maltose, supplied with proteinase inhibitors). The whole cell
lysate in lysis buffer, soluble cellular extract, flowthrough of amylose resin binding, and the fraction
eluted from amylose resin were analyzed on a Coomassie-stained 10% SDS-polyacrylamide gel
(SDS-PAGE).
2.2.2.2 AID.mono expression in Hi-5 cells
Similar procedures as above were performed to purify AID.mono from Hi-5 cells. Hi-5
cells (200 ml scale with 1 x 10^6 cells/ml) infected with AID.mono baculovirus were harvested and
resuspended with 14 ml of lysis buffer followed by sonication on ice. The soluble cellular extract
was mixed with 2 ml of prewashed amylose resin. AID.mono was eluted with 5 ml of cold amylose
elution buffer from the resin after a thorough wash of the resin, followed by purification with
Superdex S200 10/30 chromatography (GE Healthcare, Chicago, IL) using S200 buffer (20 mM
HEPES at pH 7.5, 200 mM NaCl, 0.5 mM TCEP).
2.2.2.3 AID.mono expression in Sf9 cells
AID.mono purification with Sf9 insect cells (Life Technologies, Carlsbad, CA) was
performed in the same manner as described above except that the amylose resin eluent was
filtered through a 0.22 µm membrane (Pall Corporation, Port Washington, NY) before being loaded
onto the S200 chromatography column.
Further optimization was done to increase the yield of AID.mono. The NaCl concentration
in the cell lysis buffer, amylose elution buffer, and S200 buffer was increased to 500 mM (referred
to as high salt buffer hereafter). AID.mono on the amylose resin was eluted twice with 2 ml of amylose
elution buffer (500 mM NaCl), and each eluate was filtered and injected onto the S200
chromatography column separately. The monomeric fractions from size exclusion chromatography were
combined, concentrated 10-fold with an Amicon® Ultra-15 10K centrifugal filter (MilliporeSigma,
Burlington, MA), and stored at -80°C.
2.2.3 AID activity test
The AID activity assay is shown schematically in Figure 2.3a (Yu et al., 2004). Uracil DNA
glycosylase (UDG) removes the uracil generated from cytosine by AID.mono, leaving an
abasic site (AP site) that can be cleaved by alkali treatment at 95°C (Abner et al., 2001). The
DNA cleavage products can be separated and visualized using denaturing PAGE to evaluate AID
activity. Each 10 µl reaction, containing 10 or 100 nM AID (monomeric or aggregated), 100 nM
of radioactively labeled DL39, and 1 unit of UDG (Invitrogen, Carlsbad, CA), with final
concentrations of 20 mM HEPES at pH 7.5, 100 mM KCl, and 1 mM DTT was incubated at 37°C
for 30 min. Then 1 µl of 2 M NaOH (0.18 M final concentration) was added to the reactions
followed by incubation at 95°C for 5 min. An equal volume of formamide was added to the reactions
followed by heating at 100°C for 5 min. The reactions were transferred immediately to ice before
being resolved by 20% denaturing PAGE (7 M urea). Gels were visualized by autoradiography
using a phosphor-imager FX (BioRad Laboratories, Hercules, CA) and quantified with ImageJ
software. The percentage of deamination was calculated by dividing the signal intensity of the
product band by that of the whole lane.
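The two calculations in this paragraph, the final NaOH concentration after addition to the reaction and the percentage of deamination from band intensities, can be written out as a short Python sketch. This is an illustration under simple assumptions (volumes are additive; band intensities are already background-corrected), not the ImageJ workflow itself, and the function names are hypothetical.

```python
def final_concentration(stock_molar, added_ul, reaction_ul):
    """Final molarity after adding `added_ul` of stock to a `reaction_ul` reaction.

    Assumes volumes are additive.
    """
    return stock_molar * added_ul / (reaction_ul + added_ul)


def percent_deaminated(product_signal, lane_signal):
    """Deamination (%) = product band intensity / whole-lane intensity x 100."""
    if lane_signal <= 0:
        raise ValueError("lane signal must be positive")
    return 100.0 * product_signal / lane_signal


# 1 µl of 2 M NaOH added to a 10 µl reaction gives 2/11 M, i.e. about 0.18 M.
print(round(final_concentration(2.0, 1.0, 10.0), 2))

# Hypothetical intensities (arbitrary units), for illustration only.
print(percent_deaminated(product_signal=8.0, lane_signal=100.0))
```

Normalizing to the whole lane rather than to the substrate band alone keeps the percentage comparable between lanes that were loaded with slightly different amounts of labeled oligonucleotide.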
The effect of RNase A on AID.mono activity was evaluated. The 10 µl reaction mixture
containing 40 nM radioactively labeled KY335, 120 nM AID.mono, and 1 unit of UDG in a buffer
containing 20 mM Tris-HCl at pH 7.5, 2 mM MgCl2, and 2 mM DTT was incubated at 37°C for 60
min followed by NaOH and formamide treatments as described above. RNase A was added at a
final concentration of 10 ng/µl when specified. All reactions were resolved by 12% denaturing
PAGE (7 M urea) and imaged in the same manner as described above.
2.3 Results
2.3.1 AID.mono can be purified from Sf9 cells with high salt buffers
The activity of the AID.mono baculovirus was evaluated before large scale AID purification.
Compared with the uninfected cells, Hi-5 cells infected with AID.mono baculovirus show high
levels of AID expression, which indicates the AID.mono baculovirus is active and efficient in
infecting cells (Fig. 2.2a).
AID.mono baculovirus was used to infect Hi-5 cells for monomeric AID purification (Fig.
2.2b). There are two peaks in the chromatography of the S200 size-exclusion column: peak 1 for
aggregated AID and peak 2 for AID.mono. The much smaller AID.mono peak compared with the
aggregated peak suggests that a low amount of AID in monomeric form is purified using Hi-5 cells.
Sf9 cells were used for AID.mono purification instead of Hi-5 cells (Fig. 2.2c). The monomeric
peak (peak 2) is larger than the one resulting from Hi-5 cells, but the majority of AID protein is still
in aggregated form (peak 1). NaCl concentration in all buffers was increased to 500 mM in order
to increase the monomer yield from Sf9 cells (Figure 2.2d). The monomeric peak (peak 2)
resulting from the high salt buffers is larger compared with the ones resulting from previous lower
salt conditions. The monomeric fractions were collected and concentrated for SDS-PAGE
analysis (Fig. 2.2e). The concentration of AID.mono was determined based on the BSA standards.
Approximately 20 µg AID.mono was purified from 200 ml of infected Sf9 cells.
Figure 2.2 AID.mono purification.
(a) AID.mono baculovirus activity test. Lane 1 to lane 4 are controls from Hi-5 cells without infection and
lane 5 to lane 8 are from AID.mono baculovirus infected Hi-5 cells. Lane 1 and lane 5: whole cell lysate;
lane 2 and lane 6: supernatant after sonication; lane 3 and lane 7: flowthrough of amylose resin; lane 4 and
lane 8: AID eluted from amylose resin. (b) S200 size exclusion chromatography of AID using Hi-5 cells. (c)
S200 size exclusion chromatography of AID expressed using Sf9 cells. (d) S200 size exclusion
chromatography of AID using Sf9 cells and high salt buffers. For (b) to (d): red line is the UV absorbance
at 260 nm. Blue line is the UV absorbance at 280 nm. Brown line is the conductivity. Peak 1 indicates the
aggregated AID.mono. Peak 2 indicates the AID.mono. (e) Fractions in AID.mono purification using Sf9
cells and high salt buffers. Lane 1: supernatant after sonication; lane 2: flowthrough of amylose resin; lane
3: first amylose elution; lane 4: second amylose elution; lane 5: first amylose elution passed through a 0.22 µm
filter; lane 6: second amylose elution passed through a 0.22 µm filter; lane 7: fraction from peak 1 (aggregated
fraction); lane 8: fraction from peak 2 (monomeric peak). Last four lanes: BSA standards of different
amounts.
2.3.2 AID.mono is active and able to deaminate cytosines of ssDNA substrate
Figure 2.3 AID.mono can deaminate cytosines on ssDNA substrate.
(a) Scheme of the AID activity assay. ssDNA is shown as a thin black line in the figure. The asterisk at the 5’
end represents the isotope label from [γ-32P] ATP. Uracil generated by AID is removed by UDG,
resulting in an AP site. DNA cleavage at AP sites is introduced by alkali treatment. (b) AID.mono activity
on ssDNA. Radioactively labeled DL39, which contains one cytosine in the middle of the oligo, is used as
the substrate. Both monomeric AID and aggregated AID are tested at AID to substrate ratios of 1:10 and
1:1. The deamination percentage is shown at the bottom of the figure for each reaction. Lane 1: substrate
control; lane 2: substrate treated with 10 nM AID.mono; lane 3: substrate treated with 100 nM AID.mono;
lane 4: substrate treated with 10 nM aggregated AID; lane 5: substrate treated with 100 nM aggregated
AID. (c) Influence of RNase A on AID.mono activity. KY335 with one cytosine in the middle is used as the
substrate. Reactions are done at an AID to substrate ratio of 3:1. RNase A is added to reaction 3. The
percentage of deaminated substrate based on the signal intensity of each band is shown at the bottom of
the figure. Lane 1: substrate control with no AID treatment; lane 2: substrate treated with 120 nM AID.mono;
lane 3: substrate treated with 120 nM AID.mono and 10 ng/µl of RNase A.
To evaluate the activity of AID.mono, a 17 nt radiolabeled oligonucleotide containing a
single cytosine in an AGC AID hotspot motif was used in the in vitro deamination assay (Figure 2.3b).
Reactions with AID.mono show a shorter deamination product, whereas those with
aggregated AID do not (lanes 2 and 3 versus lanes 4 and 5 in Figure 2.3b), and increasing the
AID.mono concentration in the reaction increases the amount of deamination product (lane 2 versus
lane 3 in Figure 2.3b). Specifically, 2% of the substrate is deaminated by AID.mono in the reaction
with AID.mono to substrate ratio at 1:10, and it increases to 8% when the AID.mono to substrate
ratio is 1:1. This result demonstrates that the AID.mono is active for cytosine deamination
whereas the aggregated form is not active. Of note, no RNase A was used in these purifications,
in contrast to purifications of WT AID used previously (Pham et al., 2003; Yu et al., 2004).
The 30 nt KY335 with one cytosine in a TAC AID hotspot motif was used to evaluate the
influence of RNase A on AID.mono activity. Three shorter bands are observed in the AID
deamination reactions (Fig. 2.3c). Multiple bands in AID deamination reactions, which were
also observed previously (Yu et al., 2005), might be caused by impurity of the KY335
substrate or by other unknown factors. AID.mono deaminates 18% of the substrate
in the presence of RNase A, compared with 16% when RNase A is not added. This result indicates
that the deamination activity of AID.mono is not significantly affected by RNase A.
2.4 Discussion
The monomeric state of AID is difficult to maintain throughout the entire purification
process. Therefore, only a small portion of AID exists in the monomeric state, as shown in the
S200 chromatography; and a large percentage of the expressed AID is in aggregated form (Fig.
2.2). AID.mono purification is affected by multiple factors, including cell types and buffer
conditions. The AID.mono purified has moderate deaminase activity (around 8%) (Fig. 2.3b),
which is much lower than that of the glutathione S-transferase (GST)-tagged murine AID purified in our
lab previously (as high as 90%) (Yu et al., 2005) or the GST-tagged human AID from the
Goodman lab (Bransteitter et al., 2003). It is also lower than that of the AID.mono purified by the Hao
Wu lab (90% on a substrate with G4 structure) (Qiao et al., 2017). It is possible that a higher
percentage of deamination product can be obtained if the ratio of AID.mono to substrate is
increased. However, the high salt concentration of the AID.mono elution buffer and the relatively
low yield of AID.mono preclude any further increase of AID.mono in the reaction.
The activity of GST-tagged AID is greatly influenced by the presence of RNase A in the
reaction (Pham et al., 2003; Yu et al., 2005), whereas RNase A does not affect the AID.mono
activity. The different impact of RNase A on AID.mono and GST-AID may be caused by the tags,
mutations of amino acids, or the overall structure of the two recombinant proteins. I purified this
variant of AID.mono with the intention of using it for biochemical studies in Chapter 5. However,
the activity was lower than described in the published work from the Hao Wu lab, and therefore,
I used human wild-type GST-AID provided as a gift from Dr. Myron Goodman’s laboratory at USC.
Chapter 3 DNA Methylation Analysis of CpG Sites within
the E2A Fragile Zones
3.1 Introduction
Methylcytosines deaminated by AID become thymine, which leads to slowly repaired T:G
mismatches in the genome (Schmutte et al., 1995; Walsh and Xu, 2006). The persistent T:G
mismatch is vulnerable to conversion into a DSB by structure-specific nucleases (e.g., the RAG complex
and Artemis) inside cells (Cui et al., 2013; Pannunzio and Lieber, 2019; Tsai et al.,
2008). Studies have shown the impact of long-lived DNA lesions on the breakage of DNA, and
this has implications for chromosomal translocations (Pannunzio and Lieber, 2017, 2018).
Our studies on the Bcl2 major breakpoint region (MBR) using minichromosome
substrates in human pre-B cells showed that Bcl2 MBR breakage is highly dependent on both
AID expression and CpG methylation (Cui et al., 2013). The methylation state of the two CpG
sites within the 23 bp E2A fragile zone has not been characterized. Bisulfite treatment of
denatured DNA is a widely used technique for cytosine methylation analysis of DNA (Frommer et
al., 1992). In this study, the methylation state of two CpG sites within the 23 bp E2A fragile zone
in pre-B cells is studied by the traditional (denatured) bisulfite assay. The results show that the
two CpG sites in the E2A fragile zone have a significant level of methylation in three pre-B cell
lines examined.
3.2 Materials and Methods
3.2.1 Oligonucleotides
The oligonucleotides used in this chapter were synthesized by IDT. The sequences of the
oligonucleotides used in this study are listed as follows. The nucleotides in lowercase
represent the bases that were changed to be complementary to the bisulfite converted DNA in
the assay. Underlined sequences are the adaptor regions for Illumina i7 and i5 index primers.
DL76: 5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCTGGGGAtAAAGGAAAAGGTTGGGGAt-3’;
DL71: 5’-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCCCTaaCCCCCACCCACTCCATa-3’.
3.2.2 Cell culture
Nalm-6, Reh, and 697 pre-B cells were cultured in RPMI 1640 medium supplemented with
10% fetal bovine serum. Cells were maintained at a density of 2.5 x 10^7/ml and usually passaged
to 0.5 x 10^7/ml. Harvested cells were washed twice with 1 x PBS buffer before further processing.
3.2.3 Denatured sodium bisulfite treatment
The denatured sodium bisulfite assay was performed as previously described (Hsieh,
1999). Briefly, genomic DNA (gDNA) extracted from Nalm-6, Reh, and 697 cells was digested
with EcoRI and denatured in a final concentration of 0.6 M NaOH with incubation at 37°C for 15
min. The 5 M sodium bisulfite solution was freshly prepared with a final 2.5 M sodium
metabisulfite (J.T.Baker, Phillipsburg, NJ) and 2.5 M NaOH. The denatured gDNA was treated
with a final concentration of 2.3 M sodium bisulfite solution and 0.5 mM hydroquinone (Sigma, St.
Louis, MO) at 55°C for 4 hours in the dark, sealed with liquid wax. The bisulfite-treated gDNA was
purified with the Wizard DNA clean-up resin (Promega, Madison, WI) after removal of the liquid
wax. Purified DNA was treated with a final concentration of 0.3 M NaOH at 37°C for 15 min
followed by ethanol precipitation.
Fully converted primers (DL76 and DL77) for converted E2A top strand were used in PCR
with Taq polymerase (30 cycles of 94°C 30s, 68°C 20s, 72°C 1 min). Illumina i7 and i5 index
primers that each contained a unique 6 nt barcode were added to each sample by a second round
of PCR using Taq polymerase (25 cycles of 94°C 30s, 65°C 75s). The final PCR products were
sequenced by paired-end sequencing with the MiSeq Reagent Kit v2 Nano (2 x 250 bp) using the Illumina
MiSeq platform (Illumina, San Diego, CA).
3.2.4 Data analysis
Reads originating from each cell line were sorted according to the i5 and i7 indexes and
further processed by the CTanalysis_v2.1 program, which was written and developed by Yong-Hwee
Eddie Loh of the USC Norris Medical Library, to screen for C to T mutations. Briefly, primer sequences
were first trimmed off; reads without primer sequences were considered low-quality
reads and discarded. Reads shorter than 220 nt after primer trimming were filtered out.
The E2A NTS was used as the reference sequence. Reads were then filtered by three criteria applied sequentially.
First, the number of mismatches should be no greater than 70 for both R1 (67 Cs in the reference
sequence) and R2 (69 Cs in the reference sequence). Second, for reads that passed the first criterion,
the number of other types of mutations should be no greater than 2 for both R1 and R2. Third,
for read pairs that passed the second criterion, R1 and R2 should be consistent with each other in
the overlapping region.
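The three sequential criteria can be sketched in Python as follows. This is not the CTanalysis_v2.1 code, which is not reproduced here; it is a minimal illustration that assumes each read has already been oriented to the reference strand and aligned without gaps, so that a read is described by its 0-based start coordinate and its sequence. All function and constant names are hypothetical.

```python
MAX_MISMATCHES = 70   # criterion 1: total mismatches per read
MAX_OTHER = 2         # criterion 2: mismatches that are not C>T or G>A
CONVERSIONS = {("C", "T"), ("G", "A")}


def count_changes(ref, start, read):
    """Return (total mismatches, non-conversion mismatches) for a gap-free read."""
    total = other = 0
    for offset, base in enumerate(read):
        ref_base = ref[start + offset]
        if base != ref_base:
            total += 1
            if (ref_base, base) not in CONVERSIONS:
                other += 1
    return total, other


def consistent_in_overlap(start1, read1, start2, read2):
    """Criterion 3: R1 and R2 must agree at every reference position both cover."""
    lo = max(start1, start2)
    hi = min(start1 + len(read1), start2 + len(read2))
    return all(read1[p - start1] == read2[p - start2] for p in range(lo, hi))


def pair_passes(ref, start1, read1, start2, read2):
    """Apply the three criteria in order to one R1/R2 pair."""
    for start, read in ((start1, read1), (start2, read2)):
        total, other = count_changes(ref, start, read)
        if total > MAX_MISMATCHES:
            return False
        if other > MAX_OTHER:
            return False
    return consistent_in_overlap(start1, read1, start2, read2)


# Toy reference and reads, for illustration only.
ref = "ACGTACGTAC"
print(pair_passes(ref, 0, "ATGTAT", 4, "ATGTAT"))  # mates agree in the overlap
print(pair_passes(ref, 0, "ATGTAT", 4, "ACGTAT"))  # mates disagree at one position
```

The mismatch ceiling of 70 sits just above the number of cytosines in either reference read (67 and 69), so a fully converted read with one or two additional sequencing errors still passes the first criterion.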
Since fully unmethylated molecules will have identical C to T conversion patterns,
identical reads were treated as separate molecules and kept for further analysis. Reads
that passed all three criteria above were further sorted into five categories:
i.) CT_only. Reads only contained C to T conversions without G to A conversions.
ii.) GA_only. Reads contained G to A conversions without C to T conversions.
iii.) CT_mix. Reads with both C to T conversions and G to A conversions. The number
of C to T conversions was greater than that of G to A conversions.
iv.) GA_mix. Reads with both C to T and G to A conversions. The number of G to A
conversions was greater than that of C to T conversions.
v.) CTGA_equal. Reads with equal number of C to T and G to A conversions,
including the perfect reads.
Reads in CT_only and CT_mix categories were grouped together in group A. They were
considered to arise from converted E2A top strand. Reads in GA_only and GA_mix categories
were grouped together in group B which were considered to arise from converted bottom strand.
Reads with equal numbers of C to T and G to A conversions were attributed either to random
mutations or to hybrid molecules generated during PCR.
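The five categories and the two strand groups described above amount to a small decision rule on two counts per read. The Python sketch below is one way to express it; the function names are hypothetical, and the counts of C to T and G to A conversions are assumed to have been computed beforehand.

```python
def classify_read(n_ct, n_ga):
    """Assign a read to one of the five categories from its conversion counts."""
    if n_ct == n_ga:
        return "CTGA_equal"          # includes perfect reads (0 and 0)
    if n_ga == 0:
        return "CT_only"
    if n_ct == 0:
        return "GA_only"
    return "CT_mix" if n_ct > n_ga else "GA_mix"


# Group A: converted top strand; group B: converted bottom strand.
GROUPS = {"CT_only": "A", "CT_mix": "A", "GA_only": "B", "GA_mix": "B"}


def strand_group(n_ct, n_ga):
    """Return 'A', 'B', or None for reads with equal conversion counts."""
    return GROUPS.get(classify_read(n_ct, n_ga))


for counts in ((60, 0), (60, 2), (0, 55), (1, 50), (3, 3)):
    print(counts, classify_read(*counts), strand_group(*counts))
```

Testing the equal-count case first guarantees that the remaining branches always see unequal counts, so every read lands in exactly one category.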
The conversion percentage of the cytosine at a certain position of E2A top strand was
calculated by dividing the number of reads with C to T conversions at that position by the total
number of reads in group A.
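As a worked illustration of this calculation, the Python sketch below computes the conversion percentage at one reference cytosine and the corresponding methylation percentage (100% minus conversion, since methylated cytosines resist bisulfite conversion). It assumes that every group A read spans the full reference without gaps; the function names and the toy reads are hypothetical.

```python
def conversion_percent(reads, ref, position):
    """Percent of reads showing T at a reference cytosine (0-based position)."""
    if ref[position] != "C":
        raise ValueError("reference base at this position is not C")
    if not reads:
        raise ValueError("no reads supplied")
    converted = sum(1 for read in reads if read[position] == "T")
    return 100.0 * converted / len(reads)


def methylation_percent(reads, ref, position):
    """Unconverted fraction at a CpG cytosine, interpreted as methylation."""
    return 100.0 - conversion_percent(reads, ref, position)


# Toy data: four group A reads over a 4 nt reference with a CpG at positions 1-2.
ref = "ACGT"
group_a = ["ATGT", "ACGT", "ATGT", "ATGT"]
print(conversion_percent(group_a, ref, 1))
print(methylation_percent(group_a, ref, 1))
```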
3.3 Results
3.3.1 CpG sites within the 23 bp E2A fragile zone have detectable methylation in pre-B
cells
Methylated CpG sites can be converted to long-lived DNA lesions after deamination by
AID (Pannunzio and Lieber, 2017; Pfeifer, 2006; Schmutte et al., 1995; Tsai et al., 2008; Walsh
and Xu, 2006). I wondered whether the two CpG sites within the E2A fragile zone are methylated
in pre-B cells. Denatured bisulfite treatment of gDNA from three pre-B cell lines followed by deep
sequencing was performed. For all three samples, more than 99% of the reads passing all three
criteria described in Section 3.2.4 are sorted to group A, which contains C to T conversions. This
indicates the high specificity of the converted primers in amplification of the converted E2A top
strand.
Cytosines in sequence contexts other than CpG sites are fully converted to
thymine in all three samples, as expected (Fig. 3.1). The complete conversion of
unmethylated cytosines at non-CpG sites in the denatured bisulfite treatment indicates the high
efficiency of the bisulfite conversion for unmethylated cytosines on ssDNA. The conversion
percentages of cytosines within the three CpG sites are less than 100%, indicating partial
methylation of those cytosines. The CpG site located upstream of the fragile zone is methylated
at ~90% in all three pre-B cell lines (Table 3.1). The methylation percentage of the two CpG sites
within the 23 bp E2A fragile zone varies among the three pre-B cell lines. The Reh cell line has the
highest methylation level at ~90%, followed by Nalm-6 cells at ~15% and 697 cells at ~4%.
Figure 3.1 Traditional (denatured) bisulfite sequencing indicates both CpG sites within E2A fragile
zone are methylated at some levels in pre-B cells.
Genomic DNA from (a) Reh, (b) 697, and (c) Nalm-6 cells is treated with 5 M sodium bisulfite
mixture after denaturation. Fully converted primers (DL76 and DL77) are used to amplify the 265 bp E2A
top strand after bisulfite treatment. The x axis denotes every position of the 265 bp E2A amplicon. The 23
bp E2A fragile zone is shaded in green. Red bars represent the conversion percentage of
cytosines within each CpG site, and indicate the percentage of reads without methylation at the CpG site.
Blue bars represent the conversion percentage of cytosines at non-CpG sites. The numbers of reads
included in the denatured bisulfite analysis are 20,933 for Reh cells, 39,587 for 697 cells, and 13,329 for
Nalm-6 cells.
3.3.2 Two CpG sites in the E2A fragile zone are independently methylated in three pre-B
cell lines
In order to investigate if the two CpG sites within the E2A fragile zone are methylated
independently, molecules in denatured bisulfite sequencing were sorted according to the
methylation status of each CpG site (Table 3.2). Among all molecules with methylcytosines, the
percentage with both CpG sites methylated is 43.6% for 697 cells, 35.6% for Nalm-6 cells, and
80.8% for Reh cells. The remaining molecules with methylcytosines have only one
of the two CpG sites methylated. This result indicates the two CpG sites are methylated
independently in the cells.
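The sorting of molecules by the methylation status of the two CpG sites can be illustrated with a short Python sketch. The molecule list below is made up for illustration and does not reproduce the counts in Table 3.2; each molecule is assumed to be summarized as a pair of booleans (site 1 methylated, site 2 methylated), and the function names are hypothetical.

```python
from collections import Counter


def tally(molecules):
    """Count molecules in each of the four methylation classes."""
    return Counter(molecules)


def percent_both_among_methylated(counts):
    """Percent of molecules with both sites methylated, among molecules with any."""
    both = counts[(True, True)]
    any_methylated = both + counts[(True, False)] + counts[(False, True)]
    if any_methylated == 0:
        raise ValueError("no methylated molecules")
    return 100.0 * both / any_methylated


# Hypothetical molecules: (site 1 methylated, site 2 methylated).
molecules = ([(True, True)] * 2 + [(True, False)] + [(False, True)]
             + [(False, False)] * 6)
counts = tally(molecules)
print(dict(counts))
print(percent_both_among_methylated(counts))
```

Restricting the denominator to molecules carrying at least one methylcytosine matches the way the percentages are reported in the text, where fully unmethylated molecules are excluded.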
Table 3.1 The methylation percentage of the three CpG sites within the 265 bp E2A region varies in
three pre-B cell lines.
The methylation percentages of the two CpG sites within the 23 bp E2A fragile zone and one CpG site
upstream of the fragile zone in three pre-B cell lines are shown in this table.
Table 3.2 Two CpG sites within the fragile zone are independently methylated in three pre-B cell
lines.
The methylation status of the two CpG sites is shown in the top two rows and the numbers of molecules
that are differently methylated at the two CpG sites in the three pre-B cell lines are listed below.
3.4 Discussion
Traditional (denatured) bisulfite sequencing of the 265 bp E2A region indicates
that both CpG sites within the 23 bp E2A fragile zone are methylated to some degree and that the
level of methylation varies among human pre-B cell lines. This finding indicates that long-
lived T:G lesions could potentially arise in the genome if AID deaminates these methylated CpG
sites within the fragile zone in human pre-B cells.
Chapter 4 Assessment of the Single-Stranded Character
of the DNA at the E2A Fragile Zone in Human Pre-B Cell
Lines from Acute Lymphoblastic Leukemia/Lymphoma
Patients
4.1 Introduction
DNA inside the cells undergoes structural fluctuation (breathing) at physiological
temperatures that reflects the heterogeneity of the dsDNA stability (von Hippel et al., 2013). The
structural fluctuation of dsDNA is sequence-dependent and has biological functions in the
regulation of protein binding (von Hippel et al., 2013). Previous studies have shown that
sequences with C-strings are structurally and thermodynamically more active than those with
alternating cytosines and guanines, resulting in a transiently unbase-paired state at these
locations (Dornberger et al., 1999; Dornberger et al., 2001; Trantírek et al., 2000; Tsai et al., 2009).
X-ray crystallography results show that DNA with C-strings adopts an intermediate B/A DNA
duplex structure, which is less stable than the canonical B DNA duplex structure. Our previous
studies of the Bcl2 MBR and Bcl1 MTC fragile regions indicate that they are in a non-B DNA
structure (Raghavan et al., 2005; Raghavan et al., 2004b; Tsai et al., 2009). The structural
fluctuation of the non-B DNA may predispose certain regions to chromosomal fragility because
the transient single-stranded region resulting from frequent DNA breathing is vulnerable to DNA
damage inside the cells.
Native bisulfite treatment is one of the optimal chemical probing methods to characterize
the structural fluctuation of dsDNA. It is a well-developed method that is widely used to probe for
ssDNA regions in otherwise native DNA (Malig et al., 2020; Raghavan et al., 2004b; Tsai et al.,
2009; Yu et al., 2003). Briefly, unpaired cytosines in ssDNA regions (e.g., within a loop structure) are
converted to uracils under native conditions (37°C). These uracils are read as thymines after the PCR
amplification step done after bisulfite treatment. Paired cytosines in dsDNA are resistant to
bisulfite conversion under native conditions and remain unchanged after PCR (Fig. 4.1).
Therefore, regions that frequently fluctuate into a single-stranded state will be revealed as
consecutively converted cytosines in native bisulfite sequencing.
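The logic of the readout can be expressed as a small in silico model. The Python sketch below is a simplification for illustration only: it treats conversion as all-or-none, whereas real native bisulfite data are probabilistic and include background conversion. A cytosine is read as T only if it is unpaired and unmethylated, and the longest run of consecutive converted cytosines is then scored. Function names and the toy sequence are hypothetical.

```python
def bisulfite_read(seq, unpaired=(), methylated=()):
    """Return the sequence as read after bisulfite treatment and PCR.

    unpaired:   0-based positions in a single-stranded state
    methylated: 0-based positions of 5-methylcytosine (resistant to conversion)
    """
    unpaired, methylated = set(unpaired), set(methylated)
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i in unpaired and i not in methylated:
            out.append("T")   # C -> U by bisulfite, read as T after PCR
        else:
            out.append(base)
    return "".join(out)


def longest_converted_run(ref, read):
    """Longest run of consecutive reference cytosines that read as T."""
    best = run = 0
    for ref_base, read_base in zip(ref, read):
        if ref_base != "C":
            continue          # only cytosines are informative
        run = run + 1 if read_base == "T" else 0
        best = max(best, run)
    return best


ref = "GCCCAGCG"
native = bisulfite_read(ref, unpaired=range(1, 4))           # small open region
denatured = bisulfite_read(ref, unpaired=range(len(ref)), methylated={6})
print(native, longest_converted_run(ref, native))
print(denatured, longest_converted_run(ref, denatured))
```

In this model the denatured assay of Chapter 3 is the special case in which every position is unpaired, so that only methylated cytosines remain unconverted.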
Figure 4.1 Illustration of the native bisulfite assay.
(a) Paired cytosines in native bisulfite treatment are resistant to conversions. Paired cytosines within duplex
DNA that cannot be converted by bisulfite conversion at native condition remain unchanged after PCR
amplification. (b) Unpaired cytosines in native bisulfite treatment are converted to thymine. Unpaired
cytosines within a loop structure can be converted by bisulfite to uracil, which are read as thymine after
PCR. Cytosines and the bases converted from the original cytosines are highlighted in red.
The rate limiting step of bisulfite cytosine conversion is the hydrolytic deamination of the
sulfonated cytosine, which is facilitated by a high concentration of HSO3- in the reaction (Fig.
4.2) (Hayatsu et al., 2008; Hayatsu et al., 1970; Sono et al., 1973). The commonly used sodium
bisulfite solution can reach a maximum concentration of 5 M. It often takes 16 to 20 hours to achieve
complete conversion of cytosines in ssDNA regions to uracil. Hayatsu et al. found that an
ammonium bisulfite solution can reach a final concentration of 10 M (Shiraishi and
Hayatsu, 2004). The high concentration of HSO3- in the 10 M ammonium bisulfite solution greatly
increases the rate of cytosine conversion. Complete C to U conversion can be achieved within
10 min at 90°C or 20 min at 70°C for DNA treated with this 10 M ammonium bisulfite solution in
denatured bisulfite sequencing, while leaving the methylated cytosines intact (Hayatsu et al.,
2008). A shorter treatment time is extremely important in native bisulfite sequencing because it can
potentially reduce the background cytosine conversions that accumulate during long incubations
under native conditions.
Figure 4.2 Bisulfite mediated deamination of cytosine.
Bisulfite can specifically react with cytosine in ssDNA regions and convert it to uracil. Three steps are
involved. Step 1 is the reversible addition of HSO3- to cytosine. The product C-SO3- from step 1 is stabilized
in acidic solution but is unstable in neutral solution. Step 2 is the hydrolytic liberation of NH3. Step 2 is the
rate-limiting step of the conversion reaction. A high concentration of HSO3- facilitates the hydrolytic
deamination reaction of step 2. The U-SO3- is stable in neutral solution and can be easily converted to
U by alkali in step 3. Step 3 is the release of HSO3- to regenerate the double bond, forming uracil.
Nuclease assays were also used in this chapter to characterize any single-strandedness
of the E2A fragile region both in vivo and in vitro. Nucleases S1 and P1 are both single-strand-
specific nucleases whose activity is hundreds of times higher on ssDNA than on dsDNA
(Box et al., 1993; Desai and Shankar, 2003; Fujimoto et al., 1974; Vogt, 1973). They are
commonly used to probe for regions that are intrinsically easy to unwind and undergo active
structural fluctuation (Collier et al., 1988; Finer et al., 1984; Naik and Raghavan, 2008; Santra et
al., 1994; Williams and Kowalski, 1993). Nucleases P1 and S1 have been used in vitro to study
the non-B form DNA structure in many human fragile regions, including the well-defined Bcl2 MBR
(Javadekar et al., 2018; Raghavan et al., 2004a; Raghavan et al., 2006).
The Artemis nuclease (ARM) is an important protein involved in NHEJ. It can process
DNA ends of various configurations when end resection is required before rejoining (Chang and
Lieber, 2016; Chang et al., 2015; Ma et al., 2002). The truncated form of the Artemis nuclease,
called ARM23, is used in this chapter to study structural fluctuation patterns of the E2A fragile
region in vivo. Artemis possesses an intrinsic 5’ exonuclease activity, which allows it to act on
ssDNA and at blunt DNA ends that breathe to an open state (Chang et al., 2015; Li et al., 2014).
The endonuclease activity of Artemis is dependent on DNA-PKcs (Pawelczak and Turchi, 2010).
DNA with a bubble structure which contains ssDNA and dsDNA transition regions can be
recognized and cut by activated Artemis (Chang and Lieber, 2016; Ma et al., 2002). ARM23, the
truncated version of Artemis, only contains 382 amino acids from the catalytic domain at the N
terminus and is continuously active independent of DNA-PKcs. DNA structural fluctuation that
results in focal areas of transient ssDNA can be recognized by ARM23, when it is expressed in
the cells, resulting in DSBs. The DNA breakage will most likely be repaired in live cells and lead
to minor sequence variation due to the nature of the NHEJ repair process generally used to repair
such sites. Deep sequencing can capture the minor sequence alterations around the transient
open region that has been cleaved by ARM23 and then repaired by NHEJ.
In this study, I show that ammonium bisulfite is more efficient than sodium bisulfite for characterizing
ssDNA under native conditions. This assay, together with the melting study and
the nuclease P1 assay, all indicate a transient single-strandedness of the 23 bp E2A fragile zone,
which is critical for AID deamination activity and the creation of DNA lesions. The ARM23
nuclease in vivo assay did not capture any transient single-stranded state of the E2A fragile zone,
and this may be due to the inability of the truncated Artemis enzyme to find rare transient ssDNA
sites in the human genome of living cells.
4.2 Materials and Methods
4.2.1 Oligonucleotides and plasmids
The oligonucleotides used in this chapter were synthesized by IDT. Sequences of the
oligonucleotides used in this study are listed in Table 4.1. Nucleotides in lowercase are the
adaptor regions for the i7 and i5 index primers for Illumina sequencing. Underlined cytosines are
5-methylcytosine.
Table 4.1 Oligonucleotides.
Oligonucleotides used in this chapter are listed in a 5’ to 3’ direction in the table. The underlined Cs are
methylcytosines. The nucleotides in lowercase are the common adaptor sequences in Illumina i5 and i7
primers.
Two plasmids were constructed for the ARM23 in vivo cutting assay. pDL16 contains an
ARM23 gene driven by the RSV promoter and a 589 bp E2A sequence from E2A intron 16 driven
by the EF1A promoter. pDL14 has the same sequence as pDL16 except that it does not contain
the ARM23 gene. Both pDL14 and pDL16 contain Epstein-Barr Virus (EBV) latent replication
origin and Epstein-Barr nuclear antigen-1 (EBNA1) and can be stably maintained and replicate
once per cell cycle in human cells expressing EBNA1 protein. Both pDL14 and pDL16 also carry
the hygromycin resistance gene and cells harboring either of these plasmids can be selected with
hygromycin.
4.2.2 Native bisulfite assay
4.2.2.1 Native bisulfite treatment of Nalm-6 gDNA
Native bisulfite treatment was carried out without denaturing the substrate. Purified gDNA
from Nalm-6 pre-B cells was treated with either sodium bisulfite or ammonium bisulfite solution.
When sodium bisulfite was used for the native bisulfite assay, the 5 M sodium bisulfite solution
was freshly prepared with a final 2.5 M sodium metabisulfite and 2.5 M NaOH. Around 3 µg of
EcoRI digested gDNA from Nalm-6 cells was treated with a final concentration of 2.3 M sodium
bisulfite solution and 0.5 mM hydroquinone at 37°C for 16 hours followed by Wizard Purification
(Promega, Madison, WI). The purified DNA was treated with fresh 0.3 M NaOH for 15 min at
37°C followed by ethanol precipitation.
When ammonium bisulfite was used for the assay, the 10 M ammonium bisulfite solution
(pH 5.3) was freshly prepared by mixing 2.08 g NaHSO3, 0.67 g (NH4)2SO3•H2O (Wako Chemicals
USA, Inc.), and 5 ml of 50% NH4HSO3 (Wako) in a 15 ml tube. The mixture was kept in a 70°C
water bath with intermittent shaking to produce a homogeneous solution. The pH of the
ammonium bisulfite mixture was determined with a pH meter. 5 µg gDNA in 50 µl volume was
treated with 500 µl of ammonium bisulfite solution (final bisulfite concentration of 9 M) at 37°C for
30 min followed by Wizard Purification. The purified DNA was treated with NaOH and
precipitated in the same way as above.
4.2.2.2 Native ammonium bisulfite treatment of whole cells
Between 1,000 and 50,000 cells were embedded in 25 µl of 1% low-melting agarose
(agarose plug) and treated with 500 µl of 10 M ammonium bisulfite mixture for 30 min at 37°C.
After removal of the ammonium bisulfite mixture, the agarose plug was washed four times with
500 µl of water, four times with 500 µl of freshly prepared 0.3 M NaOH, three times with 500 µl
of water, and twice with 500 µl of 10 mM Tris (pH 7.5). Each wash was incubated for 5 minutes
at room temperature, except the last NaOH wash, which was incubated for 10 minutes. After
the washing steps, the agarose plug was melted at 65°C and 1 µl of it was used in PCR.
4.2.2.3 Amplification of the native bisulfite treated samples
Strand-biased primers were designed to be very C-poor (no more than two Cs) for the
target strand and very C-rich (more than 45% of the bases are Cs) for the undesired strand to
amplify the bisulfite-treated DNA with high specificity to target either the NTS or the TS in the
reaction. These primers allow amplification of all the molecules (both with and without C
conversions at the priming sites) from the targeted strand and only the rare molecules without any
conversion at the many Cs at the priming sites from the undesired strand in the PCR reaction.
Two rounds of PCR amplification were performed. Primers that contain the adaptor sequence on
the 5’ end and a strand-biased complementary sequence on the 3’ end (DL63 and DL64 for
the NTS and DL68 and DL69 for the TS) were used in the first PCR round (30 cycles of 94°C 30s,
68°C 20s, 72°C 1 min) to add adaptor sequences to each sample. i5 and i7 indexed primers were
used in the second PCR round with Taq polymerase to add a unique index (6 nucleotides) to each
amplified sample (25 cycles of 94°C 30s, 65°C 75s). The indexed samples from different bisulfite
treated cells were pooled and sequenced with Illumina MiSeq Reagent Kit v2 (2 x 250 bp).
4.2.2.4 Native bisulfite sequencing data analysis
Native bisulfite sequencing data were analyzed with the same CTanalysis_v2.1 program
as mentioned in Chapter 3. Similar steps were performed with minor parameter changes. After
primer trimming and length control, reads were filtered with three criteria sequentially. First, the
number of mismatches should be no greater than the number of cytosines in R1 and R2. Second,
reads that passed the first criterion should have no more than 2 mutations of other types. Third,
the sequences of R1 and R2 in a read pair should be consistent in the overlapping region. Read
pairs that fulfilled the three criteria above were kept for further analysis. Identical reads were
treated as PCR duplicates and only one was included in the native bisulfite analysis since the
chance of two or more independent molecules having an identical conversion pattern is low. The
reads were further sorted into five categories as described in Chapter 3. Group A (CT_only plus
CT_mix) contained reads with a majority of C to T conversions, which were mainly amplified from
the NTS. Group B (GA_only plus GA_mix) contained reads with a majority of G to A conversions, which
were mainly amplified from the TS. Group C (CTGA_equal) contained reads with equal numbers of C to T
and G to A conversions, which arose from hybrid reads in PCR or from random mutations.
Reads in group A were only used to analyze the samples prepared with NTS specific primers.
Reads in group B were only used to analyze the samples prepared with TS specific primers. The
percentage of conversion at a given position on the NTS was calculated by dividing the number of
reads with C to T conversions at that position in group A by the total number of reads in group A.
The conversions of the TS were calculated in the same way using reads in group B.
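The grouping and per-position calculation described above can be illustrated with a minimal Python sketch. This is not the CTanalysis_v2.1 program; the function names and the toy sequences are hypothetical, reads are assumed to be already trimmed and aligned to the reference without indels, and reads with equal counts (including reads with no conversions at all) are placed in group C as a simplifying assumption.

```python
def classify_read(ref, read):
    """Assign a read to group A, B, or C by its dominant conversion type.

    Assumption: ties, including reads with no conversions, go to group C.
    """
    ct = sum(1 for r, q in zip(ref, read) if r == "C" and q == "T")
    ga = sum(1 for r, q in zip(ref, read) if r == "G" and q == "A")
    if ct > ga:
        return "A"  # mostly C to T: amplified mainly from the NTS
    if ga > ct:
        return "B"  # mostly G to A: amplified mainly from the TS
    return "C"


def conversion_rates(ref, reads, ref_base="C", conv_base="T"):
    """Fraction of reads converted at each ref_base position (0-based)."""
    total = len(reads)
    rates = {}
    for pos, base in enumerate(ref):
        if base != ref_base:
            continue
        converted = sum(1 for read in reads if read[pos] == conv_base)
        rates[pos] = converted / total if total else 0.0
    return rates


# Hypothetical toy data, for illustration only.
reference = "ACGTCCA"
toy_reads = ["ATGTCCA", "ATGTTCA", "ACATCCA", "ACGTCCA"]
group_a = [r for r in toy_reads if classify_read(reference, r) == "A"]
print(conversion_rates(reference, group_a))
```

For the TS, the same function would be called on group B reads with ref_base="G" and conv_base="A".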
All statistical analyses were performed with the Mann-Whitney U test. The comparison of
single C conversion rates within and outside of the fragile zone in native bisulfite sequencing of
697 cells was performed between the 6 single C positions within the fragile zone and 25 single C
positions outside of the fragile zone. The conversion rates of cytosines in C-strings of different
lengths were compared based on the average cytosine conversion rates in 1C, 2C, 3C, 4C, or 5C
motifs for each of the two strands in two pre-B cell samples. A p value of 0.05 was used as the
significance cutoff.
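As an illustration of the test used here, the following Python sketch computes the Mann-Whitney U statistic for two groups of conversion rates. The function name and the example values are hypothetical, and the p value uses a plain normal approximation without tie or continuity correction, so it will differ somewhat from the exact p value that statistical packages report for small samples.

```python
import math


def mann_whitney_u(x, y):
    """Return (U_x, U_y, approximate two-sided p) for two non-empty samples."""
    n1, n2 = len(x), len(y)
    # U_x counts the (x, y) pairs in which x is larger; ties count one half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (min(u_x, u_y) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return u_x, u_y, p


# Hypothetical conversion rates (%), not the measured values.
inside_zone = [5.1, 6.0, 7.2, 6.8, 5.9, 6.5]
outside_zone = [2.0, 2.9, 1.8, 3.1, 2.4, 2.2, 2.7]
print(mann_whitney_u(inside_zone, outside_zone))
```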
4.2.3 DNA melting assay
The 27 bp E2A and control substrates, which were generated by annealing synthesized
oligos, were used in the melting study. The E2A substrate contains the 23 bp E2A fragile zone
and 4 more bases located outside of the fragile zone. The randomized control substrate has the
same nucleotide composition as the E2A substrate. The three base pairs located at each end of
the E2A and control substrates are identical to reduce the effect of end breathing on the substrate
that may be caused by sequence differences.
E2A substrate was prepared by annealing of DL94 and DL95 at 1:1 ratio in the annealing
buffer (10 mM sodium phosphate and 1 mM EDTA, pH 7.8) followed by overnight hybridization at
room temperature. Control substrate was prepared under the same condition by annealing DL96
and DL97. To test the effect of methylcytosine on DNA melting, E2A and control oligos with
5-methylcytosine modification at two CpG sites were synthesized. The hemi-methylated E2A
substrate was prepared by annealing of methylated DL100 and unmethylated DL95 (E2A-NTS).
The fully methylated E2A substrate (E2A-BOTH) was prepared by annealing DL100 and DL101.
Methylated DL102 and unmethylated DL97 were used to prepare the hemi-methylated control
substrate (CNTL-NTS). The fully methylated control substrate (CNTL-BOTH) was prepared by
annealing DL102 and DL103.
The melting reactions (20 µl) contained 480 ng of dsDNA and 0.7x SYBR Green I in buffers
containing 1 mM EDTA and 10 mM, 20 mM, or 50 mM sodium phosphate (pH 7.8). The melting
assays for the methylated substrate were performed under the same conditions with 1 mM EDTA and
50 mM sodium phosphate (pH 7.8) buffer. The change in signal strength was recorded every
0.5°C from 60°C to 95°C with a ramp time of 5 s (BioRad, C1000 thermal cycler).
4.2.4 Nuclease S1 and P1 in vitro cutting assays
4.2.4.1 Substrate preparation
An 810 bp fragment containing a T7 promoter and a 589 bp E2A sequence from its intron
16 was amplified from pDL12 using DL40 and DL41 with Q5 polymerase from NEB (98°C 30 s;
30 cycles of 98°C 10 s, 62°C 20 s, 72°C 30 s; 72°C 2min). A 781 bp control sequence was
amplified from pDR18 with DL42 and DL43 using Q5 polymerase (98°C 30s; 30 cycles of 98°C
10s, 66°C 20s, 72°C 30s; 72°C 2min). The 810 bp E2A substrate and 781 bp control substrate
were both purified by low melting agarose gel followed by radioactive labeling with [γ-32P]ATP
as previously described.
Figure 4.3 Preparation of the 150 bp end-labeled and internal-labeled radioactive DNA substrates.
(a) Preparation of the 150 bp end-labeled E2A substrate. The 150 nt E2A NTS is prepared by ligation of
two 75 nt oligonucleotides with a 30 nt scaffold. The ligation product is purified with denaturing PAGE
followed by 5’ labeling with T4 PNK. The TS is prepared and purified in the same way but without
radioactive labeling. The 150 bp E2A substrate with NTS 5’ radioactive label results from annealing of
the two 150 nt oligonucleotides. (b) Preparation of the 150 bp internal-labeled E2A substrate. The right
arm of the E2A NTS is radioactively labeled at 5’ with T4 PNK followed by ligation to the 5’ half of E2A NTS
with the 30 nt oligonucleotide as scaffold. The 150 nt ligation product with internal label is purified by
denaturing PAGE. The 150 nt TS is prepared in the same way as described in (a). The 150 bp substrate
with internal label is prepared by annealing of the TS with the NTS. “p” represents 5’ phosphate group on
the oligonucleotide. “*” represents the 32P radioactive label. DNA substrates in the red boxes are the final
products.
The 150 bp E2A substrate with radioactive label at 5’ end of the NTS (end-labeled
substrate) was prepared by annealing of two 150 nt oligonucleotides which resulted from
the ligation of two 75 nt oligonucleotides as described below (Fig. 4.3a). The 3’ half of the E2A
NTS (DL79) was phosphorylated by T4 PNK followed by annealing with the 5’ half of the E2A top
strand (DL78) bridged by a 30 nt scaffold oligonucleotide (DL80). The annealing reaction was
performed by heating at 100°C for 5 min followed by hybridization at room temperature. The
ligation reaction was performed with T4 DNA ligase (Roche, Basel, Switzerland) according to the
manufacturer’s instructions. The 150 nt E2A NTS from the ligation reaction was purified and
recovered with urea-denaturing PAGE (7 M) followed by radioactive labeling as described above.
The 150 nt TS was the ligation product from the 75 nt DL81 and phosphorylated DL82 with the
30 nt DL83 as the scaffold and then purified in the same manner as described above. The 150
bp E2A dsDNA was prepared by annealing the radioactively labeled E2A top strand with the
bottom strand at 1:1.2 molar ratio.
The 150 bp E2A substrate with radioactive label at position 76 of the NTS (internal-labeled
substrate) was prepared by labeling the 3’ half of the E2A NTS before ligation (Fig. 4.3b). The
150 nt TS without radioactive labeling was prepared in the same way as described above. The
3’ half of E2A top strand (DL79) was first radioactively labeled and then annealed with the 5’ half
(DL78) and the 30 nt scaffold (DL80) followed by ligation using T4 DNA ligase. The 150 nt
radioactively labeled E2A NTS was purified with urea-denaturing PAGE (7 M) before annealing
with an excess of 150 nt E2A TS. The control duplex was prepared in the same manner as
described above using DL85 and the radioactive DL86 with DL87 as the bridge for the NTS and
DL88, DL89, and DL90 (bridge oligo) for the TS.
The oligonucleotide mixture, which contained DL27, DL29, BZ3, BZ12, BZ13, BZ18, BZ23,
BZ31, BZ36, BZ40, and BZ60, was radioactively labeled with T4 PNK and used as a ladder in the
denaturing sequencing gel. The length of the oligonucleotides in the mixture ranged from 17 nt
to 82 nt. The 30 nt BZ13, 44 nt DL27, 59 nt BZ3, and 82 nt BZ31 were separately labeled in
individual reactions. The single nucleotide ladder was generated in a reaction containing 50 nM of
the 5’ labeled 150 nt E2A top strand and 0.5 mU/ml snake venom phosphodiesterase, incubated
for 15 min at 37°C in 1 x NEBuffer 2.
4.2.4.2 S1 digestion of the 810 bp E2A substrate and 781 bp control substrate
The 10 µl S1 reactions for the E2A substrate containing 40 units of S1, varying amounts of
rNTP (1 nmol, 0.2 nmol, and 0.02 nmol), and 400 fmol radioactively labeled substrate in
transcription buffer (40 mM Tris pH 8.0, 6 mM MgCl2, 2 mM spermidine, 3 mM DTT, and 10 mM
KCl) were performed with or without 18 units of T7 RNA polymerase (Promega, Madison, WI).
The reactions were incubated at 37°C for 1 hour and terminated by adding EDTA to a final
concentration of 5 mM followed by 20 min incubation at 70°C. All reactions were further incubated
with 210 ng RNase A for 1 hour at 37°C to remove RNA transcripts. A 1.5% agarose gel was
used to resolve the final products of S1 digestion after phenol/chloroform extraction and ethanol
precipitation. A commercial DNA ladder without radioactive labeling was used; it was cut from the
gel, stained with EtBr, and imaged. The rest of the gel was imaged by autoradiography using a
phosphor-imager FX as mentioned in previous chapters. The size of the S1 digestion bands was
determined by alignment of the ladder image with the S1 reactions imaged by phosphor-imager.
The 20 µl S1 reactions for the control substrate containing increasing amounts of nuclease
S1 (0, 2, 10, 20, and 40 units), 1 nmol rNTP, and 400 fmol radioactively labeled control DNA in
transcription buffer were performed at 37°C for 1 hour without transcription. The 1.5% agarose
gel was used to resolve the reactions after phenol/chloroform extraction and ethanol precipitation.
The gel was imaged in the same manner as above.
4.2.4.3 S1 digestion of the 150 bp end-labeled substrate
The 10 µl S1 reactions containing 200 fmol of the 5’ end-labeled 150 bp E2A substrate
and various amounts of S1 nuclease (0, 1, 3, 6, 10, 13, 16, and 20 units) in transcription buffer
were incubated at 37°C for 30 min. The reactions were terminated by adding EDTA to a final
concentration of 5 mM. An equal volume of formamide was added to each reaction after
phenol/chloroform extraction and chloroform evaporation. The reactions were boiled for 5 min
and immediately plunged into ice before being loaded onto the 10% urea-denaturing sequencing gel
(7 M) with the radioactive ladder mix as size marker. The gels were imaged by autoradiography
as described above.
4.2.4.4 P1 digestion of the 150 bp end-labeled substrate
The 150 bp E2A substrate radioactively labeled at 5’ end of the top strand was used. The
10 µl P1 reactions containing 200 fmol E2A substrate and various amounts of P1 nuclease (0, 0.01,
0.1, 1, 10, and 100 units for 37°C; 0, 1, and 10 units for 50°C and 65°C) in 1 x P1 buffer (50 mM
sodium acetate pH 5.5 and 1 mM ZnCl2) were incubated at 37°C, 50°C, or 65°C for 30 min. The
reactions were terminated and resolved on a 10% urea-denaturing gel (7 M) in the same manner
as described above.
4.2.4.5 P1 digestion of the 150 bp internal-labeled substrate
The 150 bp E2A and control substrates with radioactive label at position 76 of the top
strand were used. The 10 µl P1 reactions contained 200 fmol internal-labeled substrate and
various amounts of P1 nuclease (0, 1, 2, 5, and 10 units) in 1 x P1 buffer. The reactions were
incubated at 37°C for 5 min and terminated with EDTA at 5 mM final concentration. An equal volume
of formamide was added to each reaction after phenol/chloroform extraction and chloroform
evaporation. The reactions were boiled for 5 min and then loaded onto a 6% urea-denaturing
sequencing gel with increased urea (8 M). Individually labeled oligonucleotides ranging from
30 nt to 82 nt were used as ladders. The size of each band on the gel was calculated from a semi-log
plot.
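A semi-log size estimate of this kind can be sketched in Python as follows. The ladder sizes (30, 44, 59, and 82 nt) are those of the individually labeled oligonucleotides; the migration distances and the function names are hypothetical, and a simple least-squares line through log10(size) versus distance is assumed rather than taken from the original analysis.

```python
import math


def fit_semilog(ladder_nt, ladder_mm):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    logs = [math.log10(size) for size in ladder_nt]
    n = len(logs)
    mean_x = sum(ladder_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ladder_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ladder_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm, slope, intercept):
    """Convert a band's migration distance back to a size in nt."""
    return 10 ** (slope * distance_mm + intercept)


ladder_sizes = [30, 44, 59, 82]                # nt, from the labeled ladder
ladder_distances = [152.0, 118.0, 93.0, 64.0]  # mm, hypothetical readings
slope, intercept = fit_semilog(ladder_sizes, ladder_distances)
print(round(estimate_size(105.0, slope, intercept), 1))
```

Because the relationship is only approximately log-linear, such an estimate is most reliable for bands that migrate between the smallest and largest ladder fragments.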
SacI was used to digest the P1 reaction products to distinguish the P1 cutting positions
within the E2A fragile zone and its symmetric region on the 150 bp E2A substrates. The P1
reactions performed with the 150 bp internally labeled E2A substrate mentioned above were
treated with 2 units of SacI in 1 x CutSmart buffer for 1 hour after phenol/chloroform extraction
and chloroform evaporation. An equal volume of formamide was added to the reaction after
phenol/chloroform extraction and chloroform evaporation. The P1 reactions with and without SacI
digestion were loaded onto the same gel after being boiled for 5 min.
4.2.5 ARM23 in vivo cutting assays
4.2.5.1 Cell culture and transfection
A derivative of the 293 human embryonic kidney carcinoma cell line was used in this study
(293/EBNA1 cell line) (Hsieh, 1994). The 293/EBNA1 cells were cultured with Dulbecco modified
Eagle medium (DMEM) supplemented with 10% fetal calf serum and penicillin-streptomycin. The
calcium phosphate transfection method was used to transfect the pDL14 and pDL16 plasmids into
293/EBNA1 cells (Wigler et al., 1979). The 175 µl transfection mixture containing 2 µg/ml plasmid
DNA and 125 mM calcium chloride in 1 x HEPES buffer was added to the 35-mm diameter plate
when the cells were at about 10% to 20% confluency. Cell culture medium was changed 24 hours
post transfection. The cells were re-plated to a 100-mm-diameter plate when 100% confluence
was reached and 200 µg/ml Hygromycin B was added to the medium for selecting cells harboring
the plasmid. About 10% of the cells were used for seeding of a new plate and 90% of the cells
were harvested whenever the cells reached 100% confluence. The harvested cells were washed
twice with 1 x PBS buffer before further usage.
The pDL14 and pDL16 transfected 293/EBNA1 cells (pDL14-293/EBNA1 and
pDL16-293/EBNA1) were cultured for 53 days before harvesting for deletion analysis. The H2O2 treatment
started at day 27. Both transfected cell lines were treated with 1 µM H2O2 for 41 days. The cells
were usually passaged every four days and 1 µM H2O2 was added a day after each cell passage.
4.2.5.2 Confirmation of ARM23 expression with western blot
ARM23 is 47.82 kDa in size including the His tag and Myc tag. The expression of ARM23
in pDL16-transfected 293/EBNA1 cells was examined by western blot 7 days after transfection
(no H2O2 treatment) and 54 days after transfection (with H2O2 treatment for 29 days). pDL14-
293/EBNA1 cells were tested in parallel as the negative control. Western blot assay was
performed with Trans-Blot SD Semi-Dry Electrophoretic Transfer Cell (Bio-Rad Laboratories,
Hercules, CA) according to the manufacturer’s protocol. Anti-Myc antibody generated by the
Lieber lab was used as the primary antibody and goat anti-mouse IgG conjugated with HRP
(Bio-Rad) was used as the secondary antibody. The membrane was washed thoroughly before
incubation with SuperSignal West Pico PLUS Chemiluminescent Substrate (Thermo Scientific,
Waltham, MA) and imaged with ChemiDoc (Bio-Rad).
4.2.5.3 Amplification of E2A
The E2A on the minichromosomes was amplified with DL91 and DL66 (30 cycles of 94°C
30 s, 60°C 20s, 72°C 30 s). A second round of PCR was done to add 6 nt unique indexes to each
sample with i5 and i7 index primers using Taq polymerase (25 cycles of 94°C 30 s, 65°C 75 s).
The indexed samples were pooled and sequenced with MiSeq Reagent Kit v2 Nano (2 x 250 bp).
4.2.5.4 Data analysis for deletion analysis
Minor sequence alterations (mainly deletions) in the amplified E2A sample were analyzed
using IndelAnalysis_v3.2 developed by Dr. Yong-Hwee Eddie Loh from Norris Medical Library.
Briefly, primer sequences on the reads from paired-end sequencing were trimmed and reads without
primer sequences were removed. R1 and R2 from paired-end sequencing were aligned with the
reference sequence. The deleted positions were filled with “-” for both aligned R1 and R2. Each
aligned read was required to cover at least 220 nt of the reference sequence, and reads that were
too short were removed. Up to ten mutations in addition to the indels were allowed for each read;
reads with more than 10 mutations were removed. Duplicated reads were kept as different
molecules. The read pairs that passed all filters were counted as the total number of reads
for deletion frequency calculation. Insertions and deletions were screened on each pair of reads.
The output of the program included number of molecules with same indel motifs, the sequence of
the indels, and the position of the indels.
The deletion frequency at a given position was calculated by dividing the number of reads
with a deleted motif starting at that position by the total number of reads. Positions with no data
points meant that no indel motifs starting from that position were detected.
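The deletion frequency calculation can be illustrated with a short Python sketch. This is not IndelAnalysis_v3.2; the function names and toy reads are hypothetical, and each read is assumed to be already aligned to the reference with deleted positions filled with "-" and without insertions, so that read positions map directly onto reference positions.

```python
from collections import Counter


def deletion_starts(aligned_read):
    """0-based start position of each run of '-' in an aligned read."""
    starts = []
    for i, ch in enumerate(aligned_read):
        if ch == "-" and (i == 0 or aligned_read[i - 1] != "-"):
            starts.append(i)
    return starts


def deletion_frequency(aligned_reads):
    """Fraction of reads with a deletion starting at each position."""
    total = len(aligned_reads)
    counts = Counter()
    for read in aligned_reads:
        counts.update(deletion_starts(read))
    # Positions absent from the result had no deletion starting there.
    return {pos: n / total for pos, n in sorted(counts.items())}


# Hypothetical aligned reads; duplicates are kept as separate molecules.
toy_reads = ["ACGTACGT", "AC--ACGT", "AC--ACGT", "ACGT-CGT"]
print(deletion_frequency(toy_reads))
```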
4.3 Results
4.3.1 Native sodium bisulfite sequencing shows similar patterns but higher conversion
rates compared with native ammonium bisulfite sequencing
Figure 4.4 Native ammonium bisulfite treatment shows a similar pattern but a lower background
conversion rate compared with native sodium bisulfite treatment using Nalm-6 genomic DNA.
The conversion rate of E2A non-transcribed strand (NTS) and the transcribed strand (TS) in native sodium
(a) and ammonium (b) bisulfite sequencing using Nalm-6 gDNA are shown. Genomic DNA was treated at
native condition (37°C) with 5 M of sodium bisulfite for 16 hours or with 10 M ammonium bisulfite solution
for 30 min. The x axis denotes every position of the 389 bp E2A amplicon which includes the 270 bp E2A
NTS and the 276 bp E2A TS in a 5’ to 3’ direction. The conversion rate of NTS is shown in blue at the
positive direction of y axis and the TS conversion rate is in orange at the negative direction of y axis. The
direction of the y axis does not indicate positive or negative values but represents the conversion results for
two strands. The 157 bp overlapping region between the NTS and the TS contains the 23 bp E2A fragile
zone which is covered in salmon. The 5 CpG sites are aligned with green dashed lines. A total of 4,323
and 3,898 reads were included in the calculation of cytosine conversion rate on the NTS and TS of native
sodium bisulfite treated Nalm-6 gDNA respectively. A total of 3,431 and 4,449 reads were included for the
NTS and the TS of native ammonium bisulfite treated Nalm-6 gDNA respectively.
Sodium bisulfite is commonly used for cytosine conversion experiments; however, the long
incubation time required for native conditions can lead to the accumulation of non-specific
cytosine conversions. Ammonium bisulfite, with a higher achievable concentration, has been
shown to be more efficient than sodium bisulfite and enables the completion of cytosine
conversion within 15 to 30 min in traditional (denatured) bisulfite treatment (Hayatsu, 2008;
Shiraishi and Hayatsu, 2004). No study has been done yet to evaluate the performance of the
ammonium bisulfite in native bisulfite treatment. The purified gDNA from Nalm-6 cells was treated
separately with sodium or ammonium bisulfite solution in native condition followed by deep
sequencing to compare the performance of the two reagents in single-stranded DNA
characterization.
The C to T conversion rate in native sodium bisulfite treatment varies between 0% and
60% at different positions of the tested E2A amplicon, with most of them above 30% (Fig. 4.4a).
The ammonium bisulfite conversion rate in native condition varies between 0% and 20% with the
majority of them below 5% (Fig. 4.4b). The overall conversion patterns of the tested E2A region
are similar for the native sodium bisulfite sequencing and native ammonium bisulfite sequencing.
This indicates that native ammonium bisulfite treatment can identify the same DNA breathing pattern as
the sodium bisulfite treatment. The sodium bisulfite reactivity across the whole E2A amplicon is
higher than that in ammonium bisulfite, likely caused by the accumulation of random DNA
breathing during the long incubation time (up to 16 hours). The shorter treatment time of native
ammonium bisulfite assay (within 30 min) compared with the sodium bisulfite assay (16 hours)
reduces the non-specific conversions and makes it easier to identify the regions with active
thermal fluctuation.
Figure 4.5 Whole cell native bisulfite sequencing identifies sites with single-stranded DNA character
in the 23 bp E2A fragile zone and the downstream region.
(a) Native ammonium bisulfite sequencing results from live Nalm-6 pre-B cells. (b) Native ammonium
bisulfite sequencing results from live 697 pre-B cells. (c) Exemplary E2A NTS molecules in native
ammonium bisulfite treated Nalm-6 cells (top panel) and 697 cells (bottom panel). A collection of 160
unique molecules sequenced from the NTS of the native bisulfite treated pre-B cells is shown in each panel.
Each row represents the sequence of a unique molecule in a 5’ to 3’ direction. The single Cs on the
reference sequence of the NTS are marked by black squares at the top row of each panel. The C to T
conversions at single C sites in each molecule are shown as red squares. The 23 bp E2A fragile zone is
highlighted in green.
For both (a) and (b): The total length of the E2A region examined is 389 bp with 157 bp of overlapping
region between the NTS and the TS. The x axis denotes every base position of the E2A NTS along the
389 bp region examined. The 23 bp E2A fragile zone is highlighted with salmon background. Cytosine in
the CG motif is marked with a green dashed line. From the beginning of the 23 bp E2A fragile zone to the
end of the E2A sequence examined is 206 bp in length. The conversion rate of cytosines on the E2A NTS
is shown above the x axis, and conversion rate of cytosines on the E2A TS is shown below the x axis. The
conversion rate of cytosines in different lengths of C-strings is represented in different colors as indicated
in the figure. The average native bisulfite conversion rate of the single Cs located outside of the 23 bp E2A
fragile zone is marked with grey dashed lines with numbers labeled on the right to serve as a reference
point for baseline conversion rate. Two major bisulfite peaks, Peak 1 (at the edge of the 23 bp E2A fragile
zone) and Peak 2 (located 100 bp downstream), are indicated by the red brackets. The location of the 176
bp region examined in the AID deamination activity experiments is indicated by the grey line with double
arrows at the bottom of the figure for easy reference to the data presented in a later chapter. A total of 1,291
and 2,144 reads were included in the calculation of cytosine conversion rate on the NTS and TS of native
bisulfite treated Nalm-6 cells respectively. A total of 1,715 and 4,317 reads were included for the NTS and
the TS for the 697 cell sample respectively.
4.3.2 Whole cell native ammonium bisulfite sequencing identifies sites with single-
stranded DNA character in the 23 bp E2A fragile zone and the downstream region
I treated live human pre-B cells (Nalm-6 and 697 cell lines) with ammonium bisulfite and
examined the 23 bp E2A fragile zone and the surrounding regions by amplifying a 270 bp region
from the non-transcribed strand (NTS) and a 276 bp region from the transcribed DNA strand (TS)
for next generation sequencing (Fig. 4.5). Combining the sequence information from NTS and
TS amplicons, the total length of the E2A region examined is 389 bp with a 157 bp overlap, which
includes the 23 bp E2A fragile zone. In both live Nalm-6 and 697 pre-B cells, two major bisulfite
reactive peaks (labeled Peak 1 and Peak 2 in Fig. 4.5) are clearly higher than the baseline
average of single Cs (isolated cytosines surrounded by G, T, or A nucleotides) conversion at sites
outside of the 23 bp fragile zone. Peak 1 observed in the native bisulfite assay is located at the
edge of the 23 bp E2A fragile zone on the NTS at the sequence motif C5AG4, and Peak 2 is ~100
bp downstream of the 23 bp E2A fragile zone on the TS also at a C5 motif.
Importantly, the conversion rate at the single Cs within the 23 bp E2A fragile zone and the
downstream region is higher than the average single C conversion baseline, which is established
based on the average conversion rate of single Cs located outside of the E2A fragile zones.
Specifically, the average conversion rate at single C sites within the 23 bp fragile zone is 6.3%,
which is ~2.5-fold higher than the baseline of 2.4% in the 697 cells (p = 1.1 x 10⁻²). Furthermore,
the single C sites within the 23 bp fragile zone in both cell types are converted at multiple
consecutive single Cs on the same molecule for a majority of the molecules, indicating the
existence of increased single-stranded character of more than a few bases in length within the
fragile zone (Fig. 4.5c). Multiple consecutive single C conversions are observed in the downstream
region immediately outside the E2A fragile zone in Nalm-6 cells, further supporting the presence
of single-stranded character in the broader region harboring the 23 bp fragile zone and its
downstream region. The native bisulfite assay using live cells reveals the transient (i.e.,
intermittent) single-stranded character in the 23 bp E2A fragile zone and the downstream region
in the E2A intron 16 examined, suggesting frequent thermal fluctuation. Of note, the single-
stranded DNA character of the 23 bp E2A fragile zone is underestimated due to the bisulfite
resistance of methylated cytosines at CpG sites (3.4% to 16.1% at the CpG sites in 697 and Nalm-
6 cells, respectively; Table 3.1). Importantly, I do not observe a long region of asymmetry of
cytosine conversions on the NTS versus TS (Fig. 4.5), and this clearly rules out the presence of
an R-loop at the E2A fragile zone or the downstream region (Yu et al., 2003; Yu and Lieber, 2019).
4.3.3 Bisulfite reactivity under native (nondenaturing) conditions correlates with the
length of the C-strings
Both native bisulfite conversion peaks (Peak 1 and Peak 2 in Fig. 4.5) are located
in C-string-rich regions. I wondered if the native bisulfite reactivity correlates with C-string length.
Sequence features, including single nucleotide density, GC density, and AT density within a 23
bp rolling window, are aligned with the native bisulfite reactivity of the E2A NTS and TS in 697
live cells separately (Fig. 4.6). I found that the bisulfite reactivity does not show any correlation
with the base composition mentioned above (Fig. 4.6a; Fig. 4.6b). In contrast, the bisulfite
conversion reactivity correlates well with the length of C-strings for both strands in both cell lines
(Fig. 4.6c). This indicates that more thermal fluctuation occurs in regions of long C-strings and that
long C-strings are in a single-stranded state more frequently.
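The rolling-window base composition described above can be illustrated with a short Python sketch. This is not the MATLAB routine used for Figure 4.6; the function name `rolling_density`, the centered window, the truncation of the window at the sequence ends, and the placeholder sequence are all assumptions made for illustration.

```python
def rolling_density(seq, bases, window=23):
    """Fraction of positions in each centered window whose base is in `bases`."""
    seq = seq.upper()
    half = window // 2
    densities = []
    for i in range(len(seq)):
        # Assumption: truncate the window at the sequence ends instead of padding.
        start = max(0, i - half)
        end = min(len(seq), i + half + 1)
        chunk = seq[start:end]
        densities.append(sum(base in bases for base in chunk) / len(chunk))
    return densities


# Hypothetical placeholder sequence, not the real E2A intron 16 sequence.
example = "ACGTCCCCAGGTTTACCCGGGATATCCCCCTGA"
gc_density = rolling_density(example, "GC")  # G+C density per position
at_density = rolling_density(example, "AT")  # A+T density per position
```

Because every base is either G/C or A/T, the two density tracks sum to 1 at each position, which is a convenient sanity check before aligning them with a reactivity profile.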
The density of C-strings with a length of 4 and greater is 1/37 bp in the 666 bp region
containing the 23 bp E2A fragile zone and its downstream region, and this is two-fold higher than
the C-string density of the entire E2A intron 16 (1/72 bp) and much higher than that of the region
further downstream (1/363 bp from downstream of the 666 bp zone to the end of the intron) (Fig.
4.7). The high C-string density in the E2A fragile zone and the 666 bp downstream region
suggests the single-stranded character may extend downstream of the 23 bp fragile zone.
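As a rough sketch of how a C-string density such as "1 per 37 bp" can be computed, the Python snippet below counts runs of at least four consecutive Cs in a region and divides by the region length. Counting G runs as C-strings on the complementary strand is an assumption about how both strands are tallied, and the function names and the toy sequence are hypothetical.

```python
import re


def c_string_lengths(seq, min_len=4, both_strands=True):
    """Lengths of C runs on the given strand and, optionally, of G runs,
    which correspond to C runs on the complementary strand."""
    seq = seq.upper()
    pattern = r"C+|G+" if both_strands else r"C+"
    return [len(m.group()) for m in re.finditer(pattern, seq)
            if len(m.group()) >= min_len]


def c_string_density(seq, min_len=4, both_strands=True):
    """Number of qualifying C-strings per bp of the region."""
    return len(c_string_lengths(seq, min_len, both_strands)) / len(seq)


# Toy 20 bp sequence for illustration only (not E2A sequence).
toy = "ATCCCCTAGGGGGATCCCTA"
lengths = c_string_lengths(toy)           # one 4C run and one 5G run
density = c_string_density(toy)           # strings per bp
bp_per_string = 1 / density               # "1 per N bp" form used in the text
```

Applied to real sequence, the same function could be run on each of the three regions of the intron to compare their densities.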
Figure 4.6 Bisulfite reactivity under native (nondenaturing) conditions correlates with the length of
the various C-strings at and downstream of the E2A fragile zone.
(a) Correlation of bisulfite reactivity under native condition with sequence features for the NTS of E2A. The
native ammonium bisulfite reactivity of the E2A NTS in whole 697 cells as shown in Figure 4.5b is displayed
in the top panel for easy reference. The 270 bp NTS sequence is displayed under this panel in the 5’
to 3’ direction. The lower panel is the nucleotide density plot in a moving window of 23 bp for all four
nucleotides, analyzed using the MATLAB function “ntdensity”. The G+C density of each site is
calculated by addition of the C density and the G density for the same position. The A+T density is
calculated by addition of the A density and the T density for each position. (b) Correlation of native bisulfite
reactivity with sequence features for the TS of E2A. All panels were assembled in the same manner as
described in (a). The TS sequence of the 276 bp is displayed under this panel in a 3’ to 5’ direction, so the
positions are aligned with the NTS illustration in (a). (c) The average conversion rate of cytosines in C-
strings of different length. The average native bisulfite conversion rates of cytosines in 1C, 2C, 3C, 4C,
and 5C motifs on the NTS of 697 cells (solid histograms), TS of 697 cells (histograms with stripes), NTS of
Nalm-6 cells (hollow histograms), and TS of Nalm-6 cells (histograms with dots) are shown. The color of
the histograms representing the length of the C-strings is matched with that in panel (a) and panel (b). The
cytosine conversion rates in the 5C motif are statistically higher than those in 1C, 2C, and 3C. A significant
conversion rate difference between cytosines in 1C and 3C motifs is also observed. A p value of 0.05
was used as the significance cutoff.
For both (a) and (b): The average conversion rate of single Cs located outside of the 23 bp E2A fragile
zone is shown with a dashed grey line with the number annotated on the right. The 23 bp E2A fragile zone
is highlighted in salmon color. The bases in all panels in (a) and (b) are aligned at the same scale.
Figure 4.7 The 666 bp region containing the 23 bp E2A fragile zone and its downstream region has
high C-string density in E2A intron 16.
The 23 bp E2A fragile zone plus the downstream region show increased C-string density. The x axis
denotes each position of the 3289 bp E2A intron 16. The C-string length of each position of E2A intron 16
is shown as the y axis. The string length is the number of consecutive Cs within which a certain C is located.
C-string lengths above 3 are shown as blue bars for the TS and orange bars for the NTS. Three
regions with different C-string densities are shaded in orange, green, and purple. The 23 bp E2A
fragile zone is at the beginning of the green zone and is colored in dark green.
4.3.4 E2A fragile region duplex shows lower melting temperature than the control duplex
of the same nucleotide composition
To explore the stability of the 23 bp E2A fragile zone, a DNA melting study was performed
using a 27 bp substrate that contains the 23 bp E2A fragile zone and a randomized control
substrate of the same nucleotide composition and length. The three nucleotides at each edge of
the E2A and the control substrates were kept identical to reduce the influence of end breathing
on DNA melting. Melting curves were acquired at three sodium phosphate concentrations
ranging from 10 mM to 50 mM (Fig. 4.8). In the melt curves, the rate of change of the fluorescent
signal with temperature is plotted against temperature. The melting temperatures of both
E2A and control substrates increase with increasing sodium phosphate concentration in
the buffer. Under all three conditions, the melting temperature of the 27 bp E2A substrate is ~2°C
lower than that of the control substrate (Table 4.2).
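One way to build a composition-matched control of the kind described above is to shuffle the interior of the sequence while leaving the three terminal nucleotides at each end untouched. The Python sketch below illustrates that idea only; how the actual control substrate was designed is not stated here, and the function name, the fixed random seed, and the 27 nt placeholder sequence are hypothetical.

```python
import random
from collections import Counter


def shuffled_control(seq, keep_ends=3, seed=0):
    """Shuffle the interior of `seq`, leaving `keep_ends` bases at each end fixed."""
    if len(seq) <= 2 * keep_ends:
        raise ValueError("sequence too short to shuffle")
    interior = list(seq[keep_ends:-keep_ends])
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rng.shuffle(interior)
    return seq[:keep_ends] + "".join(interior) + seq[-keep_ends:]


# Hypothetical 27 nt placeholder; the real substrate sequence is not reproduced here.
placeholder = "GATCCCCAGGTTTACCCGGGATACCTG"
control = shuffled_control(placeholder)

# The control keeps length, base composition, and both 3 nt ends.
same_composition = Counter(control) == Counter(placeholder)
```

A shuffle preserves nucleotide composition by construction, so any difference in melting temperature between the two duplexes would reflect base order rather than base content.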
Figure 4.8 Melting curves show decreased DNA stability of the E2A fragile zone.
Melting curve analyses of the 27 bp E2A and control substrates were performed in buffers containing 10
mM (a), 20 mM (b), and 50 mM (c) sodium phosphate as indicated. The orange dots represent the melting
curve of E2A duplex. The melting curve of the control duplex is shown in dark grey.
Unmethylated, hemi-methylated, and fully methylated E2A and control substrates were
used to investigate the influence of methylcytosine on DNA thermal stability (Table 4.3). E2A
melting temperature increases by 1.5°C when hemi-methylated substrate is used and by 2°C for
fully methylated substrate compared with that of unmethylated DNA duplex. The control substrate
shows 2°C increase in melting temperature when the cytosines are hemi-methylated and 3.5°C
when they are fully methylated. The increased melting temperature with the presence of
methylcytosines indicates that methylcytosine stabilizes the DNA duplex. The overall melting
temperature of the E2A substrate, regardless of the cytosine modifications, is lower than that of the
control substrate. These data show that the base pairing of the E2A duplex is less stable than that of the
randomized control duplex with the identical nucleotide composition, indicating that the E2A
fragile zone is prone to DNA thermal fluctuation.
Table 4.2 Melt temperature of E2A duplex is lower than that of the control duplex.
The melting temperature of E2A dsDNA and control dsDNA in buffers with 10 mM, 20 mM, and 50 mM
sodium phosphate is shown. Two replicates for each probe were done (rep 1 and rep 2).
Table 4.3 Presence of methylcytosine on E2A and control duplexes increases the melting
temperature.
The melting temperature of E2A and control substrates in buffers with 50 mM sodium phosphate is shown.
Unmethylated, hemimethylated (NTS), and fully methylated (BOTH) substrates for both E2A and control
probes were tested with two replicates (rep 1 and rep 2).
4.3.5 Nuclease P1 digestion of the 150 bp internal-labeled E2A substrate suggests the
single-stranded character of E2A fragile zone and its downstream region
4.3.5.1 Nuclease S1 digestion of the 810 bp E2A substrate suggests the potential single-stranded
character of the E2A fragile zone
To further characterize the single-stranded region of the E2A substrate, the single-strand-specific
nuclease S1 was used to cut an 810 bp E2A substrate in its native state and during
transcription (Fig. 4.9a). The E2A substrate was transcribed with reduced concentrations of NTP to
slow the movement of RNA polymerase and increase the duration of the single-stranded state of the NTS for S1
cutting (Canugovi et al., 2009; Guajardo et al., 1998). The 23 bp E2A fragile zone is ~290 bp
downstream from the 5’ end of the 810 bp E2A substrate, which has an upstream T7 promoter
and radioactive labels at both ends. The addition of 4 units/µl of S1 to the reactions
results in two major digestion products on the gel as indicated by the arrows. The two bands with
the size of ~290 bp and ~520 bp can potentially be caused by S1 cutting within the E2A fragile
zone on the substrate. Since the substrate is radioactively labeled at both 5’ ends, it is also
possible that S1 cuts in the symmetric region of the fragile zone, which is located 290 bp upstream
from the 3’ end of the top strand. In comparison, S1 does not result in specific bands on the
control substrate (Fig. 4.9b). Homogeneous degradation of the control DNA is observed
with increasing S1 concentration in the control reactions. The loss of control substrate in
the reaction with a high concentration of nuclease S1 might be caused by technical issues during
phenol/chloroform extraction and ethanol precipitation (RXN#5 in Fig. 4.9b). The specific bands
on the E2A but not on the control substrate indicate that some regions of E2A, possibly the 23 bp E2A
fragile zone, are transiently single-stranded. In addition, the specific bands that appear upon S1 digestion
of the E2A substrate are independent of NTP concentration and transcription, suggesting that the intrinsic
single-stranded character of E2A is sequence-dependent and transcription-independent.
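The ambiguity noted above, that a cut in the fragile zone and a cut at its mirror-image position give the same pair of bands, follows from simple arithmetic and can be shown with a small Python sketch. It assumes a single cut that severs the duplex at one position; the function name is hypothetical.

```python
def end_labeled_fragments(length, cut_pos):
    """Sizes of the two labeled fragments when a duplex labeled at both 5' ends
    is severed at `cut_pos` (bp from the left end of the top strand)."""
    if not 0 < cut_pos < length:
        raise ValueError("cut must fall inside the substrate")
    return sorted([cut_pos, length - cut_pos])


# Cut ~290 bp from the 5' end of the top strand (the fragile zone position).
fragile_cut = end_labeled_fragments(810, 290)

# Cut at the symmetric position, ~290 bp from the 3' end of the top strand.
mirror_cut = end_labeled_fragments(810, 810 - 290)

# Both give the same band pattern, so the two positions cannot be told apart.
indistinguishable = fragile_cut == mirror_cut
```

With labels at both ends, band sizes alone therefore cannot assign the cut to one side of the substrate, which motivates the end-labeled and internally labeled substrates used in the following sections.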
Figure 4.9 Nuclease S1 digestion of the 810 bp E2A substrate suggests potential single-stranded
character of the E2A fragile zone.
(a) Nuclease S1 digestion of the 810 bp E2A substrate with and without transcription. The 810 bp E2A
substrate contains 589 bp E2A sequence from intron 16. The 23 bp fragile zone, as indicated by the red
vertical line in the figure, is about 290 bp downstream of the 5’ end of top strand. The substrate with
radioactive label at both ends was digested with nuclease S1 at different rNTP concentrations coupled with
and without transcription. The S1 nuclease results in two bands on the gel as indicated by the blue arrows.
(b) Nuclease S1 digestion of the 781 bp control substrate without transcription. The control sequence with
radioactive label at both ends was digested with different concentrations of nuclease S1. All the reactions
were resolved with a 1.5% agarose gel.
4.3.5.2 S1 Digestion of the 150 bp end-labeled E2A substrate shows cuttings over the entire
substrate
To further narrow down the S1 cutting sites on the E2A substrate, a 150 bp E2A substrate
with the radioactive label at the 5’ end of the top strand was digested with S1 at concentrations ranging
from 0 to 2 units/µl (Fig. 4.10). The reactions performed without S1 had several bands larger than
150 nt, the size of the input substrate, likely caused by incomplete denaturation of the
substrate. The substrate was degraded to smaller products as the concentration of nuclease S1
in the reaction increased. DNA fragments generated by S1 digestion were scattered broadly in the lane
on the sequencing gel without any specific clustering. The S1 digestion results of the end-labeled
substrate show no specific site of cutting by S1 on the 150 bp E2A substrate.
Figure 4.10 S1 Digestion of the 150 bp end-labeled E2A substrate shows cuttings over the entire
substrate.
(a) Illustration of the 150 bp E2A substrate with a radioactive label at the 5’ end of the top strand. The asterisk at the 5’ end
represents the radioactive label. The 23 bp E2A fragile zone is shown as a red line. It is 45 bp from the 5’ end of
the top strand. S1 cutting within the fragile region will lead to radioactive bands of 45 nt to 68 nt on the
urea-denaturing sequencing gel. (b) Digestion of the 150 bp end-labeled substrate with nuclease S1. The
reactions were performed with different concentrations of S1 (0 to 2 units/µl) and were resolved on a 10%
sequencing gel (7 M urea). Markers are indicated as “M”. Several oligonucleotides (82 nt) were radioactively
labeled in separate reactions and loaded to different lanes. Phosphodiester digestion of the 130 nt, 44 nt,
59 nt, and 50 nt substrate was performed to generate the ladder with single nucleotide resolution (ladder
lane: 150 nt). Bands resulting from S1 cutting within the E2A fragile zone are expected to lie between
45 nt and 68 nt as marked by the two red lines. The bands in the region of interest are aligned with
the sequence of the E2A fragile zone.
4.3.5.3 P1 digestion of the 150 bp end-labeled E2A substrate results in a quick loss of radioactive
signal
Figure 4.11 Digestion of 5’ end labeled substrate with nuclease P1 at different temperatures
indicates the quick loss of radioactive signal on the substrate.
(a) Digestion of 150 bp end-labeled E2A substrate with different concentrations of P1 at 37°C. The P1
concentrations varying 10,000-fold from 0.001 to 10 units/µl were tested in different reactions. (b) Digestion
of 150 bp end-labeled E2A with different concentrations of P1 at 50°C and 65°C. Three P1 concentrations
(0, 0.1, and 1 unit/µl) were tested at the increased temperatures of 50°C and 65°C. For both (a) and (b):
the same 150 bp end-labeled substrate and ladders as shown in Figure 4.10 were used. The reactions were
loaded onto a 10% sequencing gel (7 M urea). Bands between 45 nt and 68 nt as highlighted between the
two red lines are due to P1 cutting within the E2A fragile zone. The presence of P1 in the reaction leads to
a 1 nt band as indicated by the blue arrows at the bottom of the gels.
Another single-strand-specific nuclease, P1, was used to digest the 150 bp end-labeled
E2A substrate to detect the single-stranded region of the E2A substrate (Fig. 4.11). P1
concentrations spanning a 10,000-fold range were tested at different temperatures (37°C in Fig.
4.11a; 50°C and 65°C in Fig. 4.11b). At 37°C, the sole digestion band observed in reactions with 1 and
10 units/µl of P1 is a 1 nt band at the bottom of the gel. Breathing at DNA ends leaves
the first base, which carries the radioactive label, frequently unpaired, and this is a preferred substrate of P1
nuclease. The increased incubation temperature for P1 digestion leads to the loss of the
radioactive label at a lower P1 concentration (0.1 unit/µl) compared with that at 37°C (Fig. 4.11b).
High incubation temperature accentuates the end breathing, rendering the labeled end more susceptible to P1
cutting. The end-labeled substrate cannot be used to capture the single-stranded region of E2A
because the loss of the 5’ radioactive label from the substrate makes all digestion bands undetectable
by radioactivity on the gel.
4.3.5.4 P1 digestion of the 150 bp internal-labeled E2A indicates thermal fluctuation within the 23
bp E2A fragile zone
The 150 bp E2A substrate of the same sequence but with radioactive label at position 76
of the top strand was used in the nuclease P1 assay to avoid loss of radioactive signal (Fig. 4.12a).
P1 digestion in the 23 bp E2A fragile zone gives rise to product bands between 82 nt and 105 nt
in size. A sequencing gel with 8 M urea was used to facilitate the complete denaturation of the 150
nt substrate. The reactions with P1 at 0.5 and 1 unit/µl give several sharp and specific bands as
indicated by the arrows, which are not present in the P1 reactions with the control substrate (Fig.
4.12b). Three bands with size of 86 nt, 93 nt, and 99 nt are in the region of interest between 82
nt and 105 nt. The three candidate bands can be the products of P1 cutting within the fragile
zone (shown in red in Fig. 4.12a) or its symmetric region (shown in grey).
To determine more precisely the positions where the three candidate bands originated,
the 150 bp E2A substrate with the internal label was digested with SacI. A unique SacI site is located
24 bp from the 3’ end of the top strand (Fig. 4.12a). The products resulting from P1 cutting
within the E2A fragile zone will be 24 nt smaller on the gel, while the ones from P1 digestion of the
symmetric region will stay unchanged after SacI digestion. Comparing the sizes of the P1 cutting
bands before and after SacI digestion allows precise mapping of the P1 cutting sites (Fig. 4.12c).
Three P1 cutting sites are mapped to the 23 bp E2A fragile zone and four P1 cutting sites are
located downstream of the fragile zone (Fig. 4.12d). This result indicates that the 23 bp E2A
fragile zone and its downstream region undergo frequent structural fluctuation.
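The SacI mapping logic can be sketched in Python as follows. A labeled product that loses about 24 nt after SacI digestion must extend to the 3’ end of the top strand, so the P1 cut lies on the fragile-zone side of the internal label; a product whose size is unchanged comes from a cut on the other side. The function names are hypothetical, the 6 nt error range from the Figure 4.12 legend is interpreted here as ±6 nt (an assumption), and the example band pairs are invented for illustration rather than taken from the gel.

```python
def classify_band(before_nt, after_nt, shift=24, tolerance=6):
    """Classify a labeled P1 product by its size change after SacI digestion."""
    change = before_nt - after_nt
    if abs(change - shift) <= tolerance:
        return "fragile-zone side"
    if abs(change) <= tolerance:
        return "symmetric side"
    return "unassigned"


def cut_position(before_nt, side, substrate_len=150):
    """Approximate P1 cut position, counted from the 5' end of the top strand."""
    if side == "fragile-zone side":
        # The labeled fragment runs from the cut to the 3' end of the strand.
        return substrate_len - before_nt
    if side == "symmetric side":
        # The labeled fragment runs from the 5' end of the strand to the cut.
        return before_nt
    return None


# Hypothetical (before, after) band sizes in nt, for illustration only.
pairs = [(93, 69), (99, 98), (120, 110)]
calls = [classify_band(before, after) for before, after in pairs]
positions = [cut_position(before, call) for (before, _), call in zip(pairs, calls)]
```

In this sketch a 93 nt band that shifts to 69 nt maps to a cut near position 57, which falls inside the 45 to 68 nt window occupied by the fragile zone on this substrate.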
Figure 4.12 P1 digestion of the 150 bp internal-labeled E2A indicates thermal fluctuation within the
23 bp E2A fragile zone.
(a) Illustration of the 150 bp E2A substrate. The red asterisk in the middle represents the radioactive label
on the 150 bp E2A substrate. The 23 bp E2A fragile zone and its symmetric site are shown in red and grey
lines. A SacI site is 24 bp from the 3’ end of the NTS. The 23 bp E2A fragile zone is 58 bp from the SacI site on the
top strand. Products from P1 cutting in the 23 bp E2A fragile zone followed by SacI digestion should
become 24 nt smaller on the denaturing gel, whereas the size of the products from P1 cutting in the
symmetrically positioned region should stay the same after SacI digestion. (b) P1 cutting of the 150 bp
E2A substrate followed by SacI digestion at different P1 concentrations. The sizes of the bands resulting
from P1 digestion were calculated based on a semi-log plot and are labeled on the gel next to the blue
arrows. The E2A substrate was treated with nuclease P1 at different concentrations (0, 0.2, 0.5, and 1 unit/µl).
Samples before (-) and after (+) SacI digestion were loaded onto the same gel for comparison. The 23 bp
E2A fragile zone before and after SacI digestion is labeled separately on the two sides of the gel. P1 cutting
within the 23 bp E2A fragile zone of the 150 bp E2A substrate should generate bands between 82 nt and
105 nt, and between 58 nt and 82 nt upon further digestion with SacI. (c) Comparison of bands before and
after SacI digestion provides precise P1 cutting positions. The sizes of the bands resulting from 0.5 and 1
unit/µl of P1 cutting before (-) and after (+) SacI digestion are listed in four separate columns in the table.
The bands that remain the same before and after SacI digestion are mapped to the symmetric portion of the
23 bp E2A fragile zone and are highlighted in grey in the table. The bands that become 24 nt smaller
after SacI digestion are mapped to the 23 bp E2A fragile zone and are highlighted in salmon. A 6 nt
error range is allowed in the comparison. (d) Illustration of P1 cutting sites in the 23 bp E2A fragile zone
and the surrounding regions. Nucleotide sequence of the 150 bp P1 substrate including the 23 bp E2A
fragile zone (shown in red) and the surrounding regions is illustrated with the nucleotides at which P1 cuts
in green. Under the nucleotide sequence, P1 cut sites in the 23 bp E2A fragile zone are indicated by red
boxes with the size of the DNA fragment as shown in panel (c) below each box. Similarly, P1 cut sites
downstream of the 23 bp E2A fragile zone are indicated by grey boxes with the fragment size indicated
below each box. To allow easy integration of findings from other assays, the following results are also
illustrated: the AID mutation sites observed on the E2A substrate without transcription and without RNase
A treatment (shown in a later chapter) are marked with green arrowheads. The native bisulfite reactivity from
live 697 pre-B cells (Fig. 4.5b) within the 150 bp P1 substrate is shown with light blue bars for the NTS
and light orange bars for the TS.
Figure 4.13 Western blot analysis confirms ARM23 expression in pDL16 transfected 293/EBNA1
cells both with and without H2O2 treatment.
(a) Expression of ARM23 in pDL16 and pDL14 transfected cells without H2O2 treatment. Transfected cells
were harvested 7 days after transfection without H2O2 addition. Lane 1 and lane 2: pDL16 transfected cells.
Lane 3 and lane 4: pDL14 transfected cells. (b) Expression of ARM23 in pDL16 and pDL14 transfected
293/EBNA1 cells 29 days post H2O2 treatment. The transfected cells were harvested 54 days after
transfection with 29 days of H2O2 treatment. Lane 1: pDL16 transfected cells with H2O2 treatment. Lane
2: pDL14 transfected cells with H2O2 treatment. The ARM23 band is indicated with red arrows in both
figures. ARM23 has a His tag and a Myc tag and has a molecular weight of 47.8 kDa. The western blot
was performed with anti-Myc as the primary antibody and goat anti-mouse IgG conjugated with HRP as the
secondary antibody.
Table 4.4 Similar deletion frequencies within and outside of the 23 bp fragile zone were identified.
The source of the E2A sequence is listed in column 2. Total number of molecules screened in each sample
is listed in column 3. The number of molecules with deletions is listed in column 4. Total deletion frequency
in column 5 was calculated by dividing the number of molecules with deletion in column 4 by the total
number of molecules in column 3. Deletion frequency of the 23 bp region in column 6 was calculated by
adding the deletion frequency of all positions within the 23 bp E2A fragile zone. The deletion frequency
outside the 23 bp region in column 7 was calculated by subtracting the deletion frequency of the 23 bp from
the total deletion frequency. The normalized deletion frequencies in columns 8 and 9 were calculated by
dividing the deletion frequency of a given region by the length of that region. Fold change in column 10 was
calculated by dividing the normalized deletion frequency of the 23 bp by that of the rest of the region.
4.3.6 ARM23 in vivo assay does not show increased deletion frequency within the E2A
fragile zone compared with surrounding region
I wondered if the transient single-stranded state of E2A fragile zone can be captured in
vivo. Nuclease ARM23, which can nick any DNA with single-stranded and double-stranded
boundaries due to DNA breathing, was used to capture the transient single-stranded region inside
the cells. The strand breakage is likely repaired inside the cells with end resections which would
lead to minor sequence deletions around the breakpoints. The minor sequence variations on E2A
can be examined by deep sequencing, and the frequency of deletion events in the 23 bp E2A
fragile zone can be used to evaluate the structural fluctuation frequency.
Plasmid pDL16, carrying both an E2A fragment from E2A intron 16 and the ARM23 gene, was
transfected into human cells and stably maintained in the nuclei. A control plasmid, pDL14, with
only the E2A region and without the ARM23 gene was transfected into human cells in parallel to determine
the background error rate due to cell culture, PCR, or sequencing. The cell culture time of up to
56 days after transfection allows the accumulation of the deletion events on the episome inside
the cells over time. H2O2 treatment of the transfected cells increases the oxidative stress inside
of the cells and is used to increase the frequency of the deletion events (Pannunzio and Lieber,
2017). The expression of ARM23 after transfection is verified in cells with and without H2O2
treatment (Fig. 4.13). A decreased expression level of ARM23 was observed over time of cell
culture likely due to the decrease of episome copy number in the cells with increasing length of
cell culture time.
The E2A on the episomes, but not the genomic locus, was specifically amplified, sequenced,
and analyzed for minor deletions (Table 4.4). More than 1.5 million sequencing reads were
analyzed in each sample, and 0.1% of the reads contain deletions in E2A. The normalized
deletion frequency of the E2A fragile zone is ~2 x 10^-6/bp, which is similar to that of the region
outside of the E2A fragile zone. The four samples, regardless of ARM23 expression and H2O2
treatment, do not differ in the overall deletion frequency or in the deletion frequency
within the 23 bp E2A fragile zone. This finding suggests that the ARM23 in vivo cutting assay
does not demonstrate an increased thermal fluctuation of the E2A fragile zone compared with the
surrounding region.
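The normalization described in the legend of Table 4.4 can be written out as a short Python sketch. The read counts and the per-zone frequency below are chosen only to match the orders of magnitude quoted above (about 1.5 million reads, about 0.1% with deletions, about 2 x 10^-6 per bp in the fragile zone); they are not the measured values, and the amplicon length of 523 bp and the function name are hypothetical.

```python
def normalized_deletion_frequencies(total_reads, reads_with_deletion,
                                    freq_in_zone, zone_len, amplicon_len):
    """Per-bp deletion frequencies inside and outside a zone, and their ratio."""
    total_freq = reads_with_deletion / total_reads
    freq_outside = total_freq - freq_in_zone
    norm_zone = freq_in_zone / zone_len
    norm_outside = freq_outside / (amplicon_len - zone_len)
    return {
        "total_freq": total_freq,
        "norm_zone": norm_zone,
        "norm_outside": norm_outside,
        "fold_change": norm_zone / norm_outside,
    }


# Illustrative inputs only; amplicon_len=523 is a hypothetical value.
result = normalized_deletion_frequencies(
    total_reads=1_500_000,
    reads_with_deletion=1_500,
    freq_in_zone=4.6e-5,
    zone_len=23,
    amplicon_len=523,
)
```

A fold change close to 1, as in this toy example, is what "similar deletion frequencies within and outside of the 23 bp fragile zone" means in per-bp terms.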
4.4 Discussion
Native ammonium bisulfite sequencing results reveal the transient single-stranded
character in the 23 bp E2A fragile zone and the downstream region, and this finding is further
confirmed by the nuclease P1 cutting assay and the melting studies. The increased native
bisulfite reactivity correlates with C-string length, and it is consistent with the previous studies
showing DNA with consecutive C-strings tends to be in a B/A-intermediate DNA structure (non-B
structure) (Dornberger et al., 1999; Trantírek et al., 2000; Tsai et al., 2009). The non-B structure
of the E2A fragile zone and its downstream region with increased single-stranded character may
predispose these regions to AID deamination activity.
The ARM23 in vivo cutting assay does not show a higher deletion rate within the E2A
fragile zone than in its surrounding regions on the episome in human 293/EBNA1 cells. This
negative finding may be due to the following possibilities: (a) no ssDNA formation on the episomal
E2A; (b) the formation of ssDNA is too transient to be captured by the ARM23 enzyme in the
complex environment of live cells; (c) expression of ARM23 is insufficient; (d) the E2A on the
episome is not accessible to ARM23 in the cells; or (e) the true deletion frequency cannot be
distinguished from the background noise level. While this assay does not provide supporting
evidence for the single-stranded character of E2A in vivo, it does not rule out the existence of such
character in E2A.
Chapter 5 Biochemical AID Activity within the 23 bp E2A
Fragile Zone in a Purified System
5.1 Introduction
AID can deaminate cytosines only in regions of ssDNA, even if this ssDNA state is
transient (Pham et al., 2003; Pham et al., 2019). AID activity on dsDNA during transcription has
been investigated in many studies. Transcription of the immunoglobulin region is required for the
action of AID in SHM and CSR (Storb, 1996). The transcription bubble usually occupies ~12-14
bp within which the TS is paired with RNA and the NTS is in a single-stranded state, which is
vulnerable to AID activity. The transcription rate is not uniform across the genome, with the
elongation rate of RNA polymerase II (Pol II) ranging from 1 kb to 6 kb per minute (Jonkers and
Lis, 2015). The nonuniform transcription rate of Pol II suggests that the duration of the
transcription bubble at different locations of a transcribed gene can vary. A longer duration
may be associated with an increased time for the NTS to be in the ssDNA state at specific
sites. Pol II pausing or stalling may result in a stable transcription bubble. The pervasive pausing
of Pol II at promoter-proximal regions in higher eukaryotes is well-documented (Brown et al., 1996;
Landick, 2006; Core and Lis, 2008; Gressel et al., 2017). Pol II pausing sites are not only limited
to the promoter-proximal region but widely found in the genome in a sequence-dependent manner
(Eddy et al., 2011; Shundrovsky et al., 2004; Watts et al., 2019). The GC density of certain
regions is reported to be one of the key factors determining the pausing of Pol II. Pol II tends to
pause at GC-rich regions in human fibroblasts (Watts et al., 2019), and G-rich region of the NTS
has been reported to correlate with Pol II pausing in human cells (Eddy et al., 2011). A recent in
vitro study suggested that Pol II transcription is frequently terminated in regions with C-strings
(Pham et al., 2019). It has also been shown that AID is able to deaminate cytosines at an
increased frequency during pausing and stalling (Canugovi et al., 2009).
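To give a sense of scale for the elongation rates quoted above, the Python sketch below estimates how long an uninterrupted polymerase takes to advance by one bubble length. This is a deliberately simplified assumption (constant speed, no pausing, and exposure time taken as bubble length divided by elongation rate); the function name is hypothetical.

```python
def bubble_transit_seconds(bubble_bp, rate_kb_per_min):
    """Time for the polymerase to advance one bubble length at constant speed."""
    bp_per_second = rate_kb_per_min * 1000 / 60
    return bubble_bp / bp_per_second


# Mid-range bubble size of 13 bp at the slow and fast ends of the quoted range.
slow = bubble_transit_seconds(13, 1)   # 1 kb per minute
fast = bubble_transit_seconds(13, 6)   # 6 kb per minute
```

Under these assumptions a given NTS position is exposed for well under a second during uninterrupted elongation, which illustrates why pausing or stalling would be expected to lengthen the window available to AID considerably.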
I have documented the single-stranded character of E2A fragile zone and its downstream
region in Chapter 4. In this chapter, the AID activity on E2A was studied using a biochemically
defined in vitro system. The AID deamination activity is mainly observed within the E2A fragile
zone and its downstream region, and this activity is increased upon transcription. The factors that
affect AID deamination activity within the fragile zone were investigated. RNase A treatment,
which can increase RNA polymerase processivity during transcription reactions, dramatically
decreases AID activity on E2A. The density of C-strings also affects the AID deamination activity
within and downstream of the E2A fragile zone. Disruption of three C-strings on the E2A substrate
decreases C to T conversions by AID within and downstream of the E2A fragile zone.
5.2 Materials and Methods
5.2.1 Oligonucleotides
Table 5.1 Oligonucleotides.
The sequences in lower cases are the adaptor regions for indexed i7 and i5 primers of Illumina sequencing
platform. The Ns in the sequence represent random nucleotides.
The oligonucleotides used in this chapter were synthesized by IDT. Their sequences
are listed in Table 5.1. The sequences in lower case are
the adaptor regions for the indexed i7 and i5 primers of the Illumina sequencing platform. Positions
with Ns are positions where the four deoxynucleotides (A, T, C, or G) were randomly incorporated during
oligo synthesis. The E2A mut1 and E2A mut2 sequences were synthesized as mini-genes and
inserted into pUCIDT (Amp) by IDT. Part of the mut1 and mut2 sequences are displayed in Figure
5.1.
Figure 5.1 Sequences around the E2A fragile zone on wild type E2A, mut1, and mut2 substrates.
DNA sequences around the 23 bp E2A fragile zone on wild type E2A (wt E2A), mut1, and mut2 substrates
are illustrated. The nucleotides in the 23 bp E2A fragile zone are displayed in red. The pairs of direct
repeats (DR) near the 23 bp E2A fragile zone are indicated with matching color lines above or below the
sequences. The 6 bp DR (in light blue) decreases to 5 bp in mut1 but is completely disrupted in mut2. The
two 7 bp DR (in dark blue) are the same in wt E2A and mut1 substrates. However, this 7 bp DR sequence
is altered so that it is no longer a consecutive string of C on the TS in mut2, but the direct repeat (in green)
nature remains. The C-string and its mutated form are marked by the black boxes. The two G-strings on
the NTS (C-strings on the TS) and their mutated form are marked by the grey boxes.
5.2.2 AID reactions
The 570 bp substrates for wild type (wt) E2A, mut1, and mut2 containing the T7 promoter
were generated by a BlpI and XhoI double digestion of pDL14, pDL31, and pDL32, respectively
at 37°C for 2 hours in 1x CutSmart buffer followed by native PAGE (5%) purification. The
recovered DNA was quantified by real-time qPCR with DL127, DL128, and DL129 (45 cycles of
94°C 30s, 65°C 1 min).
The AID.mono variant described in Chapter 2 that I purified has lower activity than
that described in the published work from the Hao Wu lab. Therefore, I used a human wild-type GST-
AID with a carboxy (C) terminal hexa-His-tag (Pham et al., 2019), provided as a gift from Dr. Myron
Goodman’s laboratory at USC, in the experiments described in this chapter. The AID reactions
containing 40 fmol/µl of human GST-AID, 250 µM rNTP mix, 10 mM DTT, and 2 units/µl of T7
RNA polymerase in 1 x transcription buffer (Promega), with 5 fmol/µl of the 570 bp DNA substrate
added last to initiate the transcription reaction, were incubated at 37°C for 60 min. T7 RNA
polymerase was omitted for reactions without transcription. Where indicated, 50 ng/µl of RNase
A was added at the same time as other components to the reactions. Reactions were terminated
and purified using HighPrep PCR system (MAGBIO) and then digested with HaeII restriction
enzyme at 37°C for 2 hours before a second purification with HighPrep PCR system. HaeII
digestion product containing the 23 bp E2A fragile zone was used as template in the MDS library
construction described below.
5.2.3 Maximum-Depth Sequencing (MDS) library construction
The NTS and TS of each sample were amplified in separate reactions and used for
Maximum-Depth Sequencing (MDS) (Fig. 5.2). MDS allows detection of the rare mutations by
reducing the background error rate in sequencing and PCR (Jee et al., 2016). Each DNA
molecule from the NTS and TS amplification reactions was assigned a unique 24 nt barcode (24
Ns) at the 3’ end through a 1-cycle linear extension with Taq polymerase using 76 nt barcode
primers (DL123 for the NTS and DL124 for the TS), which contain Illumina i7 adaptor sequence,
24 Ns, and strand-specific primer sequence (95°C 3 min, ramping from 70°C to 55°C by 0.1°C/s,
55°C 2 min, 72°C 2 min). Excess barcode primer and the single-stranded non-target strand were
removed by exonuclease I treatment at 37°C for 1 hour.
Following purification by the HighPrep PCR system, the recovered DNA from the barcode
assignment reaction was quantified by real-time qPCR with DL127, DL128, and DL129 (45 cycles
of 94°C 30s, 65°C 1 min). Approximately 1 million molecules from each reaction were used as
template in a 10-cycle linear amplification reaction by Taq polymerase with 0.1 nM i7 index
primers (95°C 2 min, 10 cycles of 95°C 30 s, 65°C 30 s, 72°C 1 min) and the products were
purified with the HighPrep PCR system. Multiple copies of each original molecule, each carrying
its unique 24 Ns, were generated at this step, and the dU generated by AID deamination on the
original molecule is read as T in the amplified molecules. Two rounds of exponential PCR using
Q5 high-fidelity DNA polymerase were performed to further add the unique i5 index to each
reaction and increase the total quantity of the DNA after the 10-cycle linear amplification. The
final PCR products were purified with 5% native PAGE. The samples were pooled and sequenced
with Illumina MiSeq Reagent Kit v3 (2 x 300 bp). All experiments were repeated at least twice for
validation. The sequences of molecules with the same barcode can be grouped and aligned to the
reference sequence in the analyses to eliminate random mutations that occurred during PCR or
sequencing (not from AID deamination activity) as described below.
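As a rough check on how likely two molecules are to share a barcode, the expected number of barcode collisions can be estimated. The sketch below assumes that all 4^24 possible barcodes are equally likely during primer synthesis (an idealization that is not stated in the methods) and uses the approximately 1 million template molecules carried into the linear amplification. The function name is illustrative only.

```python
def expected_colliding_pairs(n_molecules: int, barcode_length: int = 24) -> float:
    """Expected number of molecule pairs sharing a barcode (birthday approximation).

    Assumes every barcode of the given length is equally likely, which is an
    idealization; real random-primer synthesis is not perfectly uniform.
    """
    n_barcodes = 4 ** barcode_length                 # number of possible N-mers
    n_pairs = n_molecules * (n_molecules - 1) // 2   # distinct molecule pairs
    return n_pairs / n_barcodes


if __name__ == "__main__":
    pairs = expected_colliding_pairs(1_000_000)
    print(f"possible barcodes: {4 ** 24:.3e}")
    print(f"expected colliding pairs among 1e6 molecules: {pairs:.2e}")
```

Under this uniformity assumption, far fewer than one colliding pair is expected per reaction, which is consistent with treating each 24 N barcode as a tag for a single original molecule.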
5.2.4 MDS data analysis
The known genomic E2A NTS and TS sequences were used as references for NTS
samples and TS samples, respectively. The reads were first sorted by the 24 Ns located at the
3’ of R1 reads and 5’ of R2 reads (Fig. 5.3). The read pairs that contained inconsistent 24 Ns on
R1 and R2 were removed first. The R1 and R2 of molecules from each individual sample were
sorted based on the i5 and i7 indexes and read pairs with identical 24 Ns that originated from the
same original molecule were grouped together. Only groups with at least two read pairs were
retained for consensus sequence analysis. Within each group, any read with less than 70% match
to the reference was considered a poor-quality read and its sequence was replaced with all Ns. A
consensus sequence was called independently from all R1 reads and from all R2 reads in each
group, taking at each position the nucleotide present in > 50% of the reads. An N
was called in the event that two different nucleotides were present at equal ratio for the same
position. The consensus sequences (one for R1 and one for R2) of each group were then treated
as the consensus read pair of a single molecule. Consensus read pairs from all the groups were
filtered with the following criteria sequentially. First, the number of mismatches to the reference
in R1 and R2 had to be no greater than 53 and 55 (the number of Cs on the NTS and TS, respectively);
second, the R1 and R2 of each pair had to match in their overlapping region. Pairs passing both
filters were taken as validated consensus read pairs for base mutation calling.
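The consensus rules above can be sketched in a few lines. This is a minimal illustration, not the analysis program used in this work. It assumes that the reads in a group are already trimmed and aligned to the reference without indels (so each read has the same length as the reference), and that masked poor-quality reads still count toward the depth at each position. The function names and constants are hypothetical.

```python
from collections import Counter

MIN_IDENTITY = 0.70   # reads below this match to the reference are masked
MIN_DEPTH = 2         # minimum number of reads per barcode group


def identity(read: str, reference: str) -> float:
    """Fraction of positions at which read and reference agree."""
    matches = sum(r == q for r, q in zip(read, reference))
    return matches / len(reference)


def call_consensus(reads, reference: str) -> str:
    """Consensus of one barcode group (all R1 reads or all R2 reads).

    Assumes reads are pre-aligned and have the same length as the reference.
    """
    n = len(reference)
    if len(reads) < MIN_DEPTH:
        return "N" * n
    # Mask poor-quality reads; masked reads still count toward the depth
    # (an assumption of this sketch).
    masked = [r if identity(r, reference) >= MIN_IDENTITY else "N" * n
              for r in reads]
    consensus = []
    for column in zip(*masked):
        base, count = Counter(column).most_common(1)[0]
        # Strict majority (> 50%); ties and minority calls become N.
        consensus.append(base if 2 * count > len(masked) else "N")
    return "".join(consensus)
```

With two reads per group, any disagreement at a position yields an N at that position, so a true mutation is only called when it is present in both reads.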
The mutation frequency for each position was calculated by dividing the number of pairs
with a certain mutation at that position by the total number of validated pairs. The C to T mutation
rate with background subtracted is presented in all figures. For background subtraction, the C to T
conversion rates in samples without AID treatment were regarded as background and subtracted
from those of the AID-treated sample under the same condition, with negative values treated as
0. Differences in AID effect were calculated by subtraction between background-subtracted
samples.
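The per-position arithmetic described above can be written compactly. The following is a sketch with hypothetical function names. It assumes that the rates for the two samples are supplied as position-matched lists, and that negative values are kept when two background-subtracted profiles are compared (as suggested by the positive and negative bars in the difference plots).

```python
def mutation_rate(mutant_pairs: int, validated_pairs: int) -> float:
    """Per-position mutation frequency: mutant pairs / validated pairs."""
    return mutant_pairs / validated_pairs


def subtract_background(treated, control):
    """Position-wise AID-treated minus AID-free rate, with negatives set to 0."""
    return [max(t - c, 0.0) for t, c in zip(treated, control)]


def difference_in_effect(condition_a, condition_b):
    """Position-wise difference between two background-subtracted profiles.

    Negative values are kept, since they indicate higher activity in
    condition_b.
    """
    return [a - b for a, b in zip(condition_a, condition_b)]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from this study.
    treated = [mutation_rate(20, 10_000), mutation_rate(1, 10_000)]
    control = [mutation_rate(5, 10_000), mutation_rate(3, 10_000)]
    print(subtract_background(treated, control))
```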
Sequencing quality at the ends of R1 and R2 reads was low and the base calling was
unreliable; therefore, positions at the 3’ end (6 nt for the NTS samples and 1 nt for the TS samples)
were excluded from the analysis. The amplicon analyzed and shown was 176 bp. The conversion
rates of mutations other than C to T were all close to the background level and were excluded from
the figures for simplicity. Some reads contained an unexpectedly high number of G to T mutations
and were excluded from the analysis after confirmation that these mutations are independent of AID
activity and are absent on molecules with C to T mutations.
5.2.5 Statistical analyses
The Mann-Whitney U test was used in all analyses. The comparisons between
transcribed E2A samples in the AID biochemical assay were done using the conversion rates of
the seven positions shown in Figure 5.5a. The comparisons between E2A samples without
transcription were performed using the mutation rates of the four positions shown in Figure 5.5b.
When comparing an AID-treated sample with the background, the raw mutation rates without
background subtraction were used. In other cases (e.g., the effect of RNase A, wt E2A with mut1,
and wt E2A with mut2), the mutation rates after appropriate background subtraction were used.
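For readers who wish to reproduce this type of comparison, a minimal Mann-Whitney U test is sketched below. The software and the exact variant of the test used for the analyses in this chapter are not specified here, so the choices in the sketch are assumptions: it is two-sided, uses the normal approximation with tie-corrected variance, and applies no continuity correction. For groups as small as four or seven positions, an exact test may give somewhat different p values.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p), where U is the smaller of the two U statistics. Ties get
    average ranks and the variance is tie-corrected; no continuity correction
    is applied (assumptions of this sketch).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    values = [v for v, _ in pooled]
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                         # size of this tie group
        avg_rank = (i + 1 + j) / 2        # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:                     # all values identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p
```

In this setting, `x` and `y` would be the mutation rates at the selected positions (seven with transcription, four without) in the two samples being compared.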
Figure 5.2 Illustration of Maximum-Depth Sequencing (MDS).
(a) Amplification of a specific strand by MDS. The DNA region of interest, which results from restriction
enzyme digestion, is amplified in a strand-specific manner (red strand is amplified in this example) in MDS.
Four general steps are involved. First, the target strand is extended by a barcode primer in a one-cycle
linear extension. The barcode primer contains a common adaptor sequence identical to that in the i7
index primer (adaptor #2) at the 5’ end, a barcode region of 24 random nucleotides (N1) in the middle, and a
primer region that is complementary to the target strand at the 3’ end. In this step, the target strand of each
DNA molecule gets a unique barcode primer. Second, exonuclease digestion is performed to remove the
original bottom strand (in black) and the residual barcode primers. Third, i7 index primer that is
complementary to the 3’ of the extended target strand is used in the 10-cycle linear amplification. Ten
identical molecules that are complementary to the target strand are generated (blue lines) in this step. Last,
two rounds of exponential amplification are performed to give each sample a unique i5 and i7 index. The
first round of PCR uses an i7 primer and a second primer with a common i5 adaptor sequence (adaptor #1)
at the 5’ region (salmon line). The second round of PCR uses i7 and i5 index primers. Different combinations
of i7 and i5 primers are used for different samples. The same i7 primer should be used for each sample
throughout all MDS steps. The dashed vertical lines are used to define different regions on the primer in
the figure for the ease of alignment. (b) Background mutation reduction by MDS. The red region represents
the barcode. The blue region is the amplicon. Reads with barcode 1 are grouped together and aligned to
the reference sequence. The “x” on the reads represents the random mutations arising from PCR or
sequencing. The “o” indicates the true mutation on the original molecule. Random mutations can occur at
any position of the amplicon and are unlikely to fall at the same position in all the reads. The true mutation is at
the same position in all the reads amplified from the same original molecule. The random mutations will
be averaged out and only the true mutation will be left in the consensus sequence.
Figure 5.3 Illustration of the read sorting process for MDS data.
The raw data from MDS for each sample is a mixture of reads containing a variety of barcodes (B1, B2, B3,
etc.). Three general steps are involved in read sorting. The first step is trimming and filtering. The reads
from raw fastq files are first trimmed to contain the forward primer (For), reverse primer (Rev), barcode (B), and
amplicon. The barcode sequences in each pair of R1 and R2 reads are compared and only pairs of
reads with a consistent barcode on R1 and R2 are used for further analysis. Second, R1 and R2 reads with
identical barcode are sorted into the same fastq file separately. The fastq files are named by the sequence
of the barcode as illustrated in the figure. Read pairs with the same barcode sequence (e.g., B1) are
aligned to the reference sequence. Third, a pair of consensus R1 and R2 reads is generated for each
barcode. If a read has less than 70% identity to the reference sequence, its sequence is
replaced with all Ns (e.g., the second R2). The original sequence is kept if the read has ≥ 70%
match to the reference. A consensus R1 and a consensus R2 are generated separately based on all R1
and R2 reads for the same barcode. A minimum depth of 2 reads is required for each barcode in the
consensus calling; otherwise the consensus sequence will end up as all Ns. The consensus sequence is
generated position by position. For each position, the nucleotide occurring in > 50% of the reads
is regarded as the consensus nucleotide (e.g., two As on two R1 reads will lead to a consensus A). An “N”
will be called in case of equal occurrence (e.g., T on one R1 and A on the second R2). A consensus pair
in which either read is all Ns is filtered out by the CTAnalysis program in the follow-up analysis.
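The first two sorting steps (barcode consistency check and grouping by barcode) can be sketched as follows. This is an illustration with hypothetical function names, not the program used in this work. It assumes that each read has been trimmed so that the barcode sits at the very 3’ end of R1 and the very 5’ end of R2, and that R2 is given in its own sequencing orientation, so that its barcode is the reverse complement of the barcode on R1. If the reads are stored in a different orientation, the comparison would need to be adjusted.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def group_by_barcode(read_pairs, barcode_length=24):
    """Group trimmed read pairs by barcode, dropping inconsistent pairs.

    read_pairs: iterable of (r1, r2) sequences, trimmed so that the barcode
    is the last `barcode_length` bases of R1 and the first `barcode_length`
    bases of R2 (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for r1, r2 in read_pairs:
        bc1 = r1[-barcode_length:]
        bc2 = reverse_complement(r2[:barcode_length])
        if bc1 != bc2:
            continue                  # inconsistent barcode: discard the pair
        groups[bc1].append((r1, r2))
    return dict(groups)
```

Each resulting group corresponds to one original molecule and can then be passed to consensus calling.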
5.3 Results
5.3.1 AID can deaminate cytosines in the NTS within and downstream of E2A fragile zone
with and without transcription
Figure 5.4 No appreciable cytosine deamination is observed in the reactions without AID.
(a) No appreciable cytosine deamination (< 0.03%) on either the NTS or TS in the absence of AID for
transcribed E2A sample (no RNase A treatment). The mutation rate for the NTS and the TS was calculated
based on 22,639 and 10,190 unique molecules, respectively. (b) No appreciable cytosine deamination on
either the NTS or TS of E2A in the absence of AID for E2A sample without any transcription (no RNase A
treatment). The mutation rate for the NTS and the TS was calculated based on 15,493 and 11,677 unique
molecules, respectively.
For both (a) and (b): The 176 bp E2A NTS sequence in a 5’ to 3’ direction is shown in the middle of the
graph with C to T mutation rates of NTS shown above the sequence and that of TS shown below the
sequence as histograms. The scale of the mutation rate is as indicated on the left of the graph in percentage.
The 23 bp E2A fragile zone is broadly covered in salmon color. AID hotspot sites (WRC and CGC (Yu et
al., 2004)) on NTS and TS are aligned with red and grey dashed lines, respectively. The CpG sites are
highlighted by the green zones. The sequence motifs covered by the grey boxes on the x axis are the ones
disrupted in mut1 and mut2 shown later in this chapter for easy reference.
Our previous studies of other 20-600 bp human lymphoid fragile regions indicated the
important role of AID in DNA breakage (Cui et al., 2013; Greisman et al., 2012; Lieber, 2016;
Pannunzio and Lieber, 2017; Tsai et al., 2008). The statistically significant proximity of E2A
breakpoints to AID hotspot motifs strongly suggests the involvement of AID in E2A breakage. I
used an in vitro assay to identify AID deamination sites on a 570 bp duplex DNA template, which
harbors the 23 bp fragile zone near the center, with and without transcription in parallel with control
experiments without AID. Following the enzymatic reaction, sites of AID deamination activity in a
176 bp region including the 23 bp fragile zone on each molecule were examined using MDS, as
described in the methods section. The MDS method ensures that each original molecule in the
reaction is uniquely barcoded before amplification by PCR (Jee et al., 2016), and the AID
deamination activity can be identified by the presence of C to T mutation in each unique molecule
in the in vitro enzymatic assay (Fig. 5.2).
Control reactions without AID show only non-specific background mutations lower than
0.03% (average background was 0.004%) in reactions both with and without transcription (Fig.
5.4), excluding the possibility that transcription by T7 RNA polymerase could lead to cytosine
conversions and establishing the background errors generated by PCR and sequencing in MDS.
This background mutation rate is calculated based on the number of C to T mutations among all
the unique molecules (see methods for details); because each control comprises more than 10,000
unique molecules per strand (Fig. 5.4), it provides a reliable baseline. After appropriate
background is subtracted (e.g., subtraction of up to 0.03%), it is apparent that several sites,
especially within the 23 bp E2A fragile zone, were deaminated by AID on the NTS when the
substrate is transcribed with T7 RNA polymerase (Fig. 5.5a). A total of seven AID deamination
sites, five of which are known AID hotspots, were observed on the NTS in the 176 bp of E2A substrate
examined, with C to T mutation rates ranging from 0.04% to 0.20%. The C to T mutation rates at
these seven positions in AID-treated samples with transcription are statistically significantly higher
than the background levels which are established by control reactions without AID treatment (p =
1.8 x 10⁻³ compared with background level, see methods for details). Four of these seven high
deamination sites are clustered in a 9 bp region within the 23 bp E2A fragile zone with one at a
CpG site that is also the major translocation breakpoint in patients. Only limited mutations were
detected on the TS, indicating AID deamination activity on this strand is below the detection limit
of the MDS assay. These results suggest that the 23 bp E2A fragile zone is preferred by AID
when the region is transcribed.
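To put these percentages in terms of molecules, the rates can be converted back to approximate counts using the numbers of unique molecules reported for the AID-treated transcribed sample (30,566 for the NTS and 12,677 for the TS; Fig. 5.5a). The sketch below is simple arithmetic with hypothetical function names. Because the reported rates are background-subtracted, the resulting counts are approximate.

```python
def molecules_for_rate(rate_percent: float, unique_molecules: int) -> float:
    """Number of mutated molecules implied by a rate (in percent) at a given depth."""
    return rate_percent / 100 * unique_molecules


def rate_of_single_molecule(unique_molecules: int) -> float:
    """Rate (in percent) contributed by one mutated molecule."""
    return 100 / unique_molecules


if __name__ == "__main__":
    for rate in (0.04, 0.20):
        print(f"{rate:.2f}% of 30,566 NTS molecules ~ "
              f"{molecules_for_rate(rate, 30_566):.0f} molecules")
    print(f"one molecule among 12,677 TS molecules = "
          f"{rate_of_single_molecule(12_677):.4f}%")
```

By this estimate, rates of 0.04% to 0.20% on the NTS correspond to roughly 12 to 61 mutated molecules at a given position, whereas a single mutated molecule on the TS would contribute slightly less than 0.01%.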
Figure 5.5 AID deaminates cytosines on the NTS within the 23 bp E2A fragile zone and the
downstream region with and without transcription.
(a) AID deamination activity in the presence of transcription (without RNase A treatment). The C to T
mutation rates with appropriate background (AID-free reaction with transcription and without RNase A
treatment) subtracted are presented as histogram. The mutation rate for the NTS and the TS was
calculated based on 30,566 and 12,677 unique molecules in AID treated samples and 22,639 and 10,190
unique molecules in AID-free samples, respectively. (b) AID deamination activity without transcription
(without RNase A treatment). The C to T mutation rates in AID-free reaction without transcription and
without RNase A treatment are used for background subtraction. The mutation rate for the NTS and the
TS was calculated based on 24,714 and 8,142 unique molecules in AID treated samples and 15,493 and
11,677 unique molecules in AID-free samples, respectively.
For both (a) and (b): The 176 bp E2A NTS sequence in a 5’ to 3’ direction is shown in the middle of the
graph with C to T mutation rates of NTS shown above the sequence and that of TS shown below the
sequence as histograms after appropriate background is subtracted. The mutation rates at high AID
deamination sites are as annotated and AID hotspot sites are marked with an asterisk. The scale of the y
axis and markings of fragile zone, CpG sites, and AID hotspot motifs are identical to the descriptions in
Figure 5.4.
AID deamination activity is also detected on the NTS in the 176 bp region of E2A when
there is no transcription through the substrate (Fig. 5.5b), but this occurs at a lower rate than
in reactions with transcription (borderline significant with p = 6.4 x 10⁻²). AID deamination of
cytosines occurs more often on the NTS than on the TS independent of transcription, and little
difference is seen on the TS with or without transcription (Fig. 5.5a and Fig. 5.5b). Four relatively
high AID deamination sites remained without transcription, with mutation rates ranging from 0.03%
to 0.11% (p = 2.1 x 10⁻² compared with background level), and all four sites, including the second
CpG site in the 23 bp E2A fragile zone, are AID deamination hotspots. As I discussed previously,
only methylated CpG sites have the potential to become long-lived T:G mismatch lesions after
they are deaminated by AID. Other AID hotspots may give rise to C to U mutation (thus a U:G
mismatch), but cannot give rise to a long-lived lesion.
The locations of AID deamination hotspots within the 23 bp fragile zone identified in these
experiments are consistent with the P1 nuclease digestion sites and sites with single-stranded
character in the native bisulfite assay described in Chapter 4 (Fig. 4.12). The deamination activity
of AID detected in the 176 bp region of the E2A substrate both with and without transcription
shows that cytosines in the 23 bp E2A fragile zone are preferred targets for AID and that this
region can be transiently single-stranded, as required for AID substrates (Pham et al., 2016; Pham
et al., 2003; Qiao et al., 2017). In addition, the similar AID deamination activity on substrates with
and without transcription indicates that this single-stranded feature on the NTS can occur even
when there are no extrinsic factors to change the duplex nature of the substrate. Since AID can
only act on cytosines in regions of ssDNA, the increase of AID deamination activity in the 23 bp
E2A fragile zone and the downstream region by transcription indicates that transcription can
potentially increase the duration and DNA length (when multiple polymerases stall) of
single-strandedness in these regions.
5.3.2 RNase A diminishes AID activity on the NTS of E2A
Figure 5.6 Presence of RNase A during transcription results in markedly decreased AID deamination
activity in E2A fragile zone.
(a) The C to T mutation rates reflecting AID deamination activity in the presence of RNase A during
transcription on the 176 bp E2A substrate with the background (AID-free reaction with transcription and
with RNase A) subtracted. The mutation rate for the NTS and the TS was calculated based on 8,366 and
9,617 unique molecules in AID treated samples and 9,823 and 13,431 unique molecules in AID-free
samples, respectively. (b) The difference in AID effect with and without RNase A treatment was calculated
by subtracting the C to T conversion rate in transcribed E2A sample with RNase A treatment from the C to
T conversion rate at the same sites in transcribed E2A sample without RNase A treatment with background
subtracted first in each sample (see method section for details of background subtraction). The histograms
above the horizontal midline represent the results of the NTS and the histograms below the horizontal
midline represent the results of the TS. Blue bars represent positions with positive values (decreases with
RNase A treatment) and orange bars indicate negative values (increases with RNase A treatment).
For both (a) and (b): The 176 bp E2A NTS sequence in a 5’ to 3’ direction is shown in the middle of the
graph. The scale of the y axis and markings of fragile zone, CpG sites, and AID hotspot motifs are identical
to the descriptions in Figure 5.4.
RNase A specifically degrades single-stranded RNA. RNase A removal of nascent RNA
transcripts from T7 RNA polymerase reduces transcription stalling (Bentin et al., 2005). Therefore,
I investigated the effect of RNase A on AID deamination activity on the transcribed E2A substrate.
I found that the AID deamination activity in the 176 bp region is greatly reduced when RNase A is
added to the E2A transcription reaction, with only a single clear deamination site, which is
an AID hotspot motif, detected on the NTS in the region downstream of the 23 bp E2A fragile zone (Fig.
5.6a). The mutation rate at this sole AID deamination site is 0.06%, far lower than
the 0.20% observed when RNase A is absent from the reaction. All other mutation sites detected in
RNase A-free conditions decrease to near or below background levels when RNase A is present
(p = 8.8 x 10⁻³) (Fig. 5.6b). The nearly complete elimination of AID deamination activity on the
NTS of E2A suggests the loss of single-stranded character of the E2A substrate by the addition
of RNase A during transcription. This may be caused by the more rapid transit of the RNA
polymerase after removal of the nascent RNA tail.
5.3.3 C-strings within the E2A substrate are essential for AID deamination activity
targeting
Our previous studies have indicated the presence of B/A intermediate DNA structures at
regions with C-strings in human Bcl1 and Bcl2 fragile zones (Raghavan et al., 2004a; Raghavan
et al., 2004b; Tsai et al., 2009). In the current study, I detected single-stranded character at long
C-strings and relatively high AID deamination activity at the 5C-string within the 23 bp E2A fragile
zone. I next tested whether the C-strings in the 23 bp E2A fragile zone and the downstream
region contribute to structural fluctuation, which predisposes the region to AID deamination activity.
Two mutated E2A sequences (Fig. 5.1) were used to test this hypothesis in the same
experimental conditions as above. A string of five Cs at the edge of the 23 bp E2A fragile zone
on the NTS was disrupted on both ‘mut1’ and ‘mut2’ mutants, and an additional two strings of four
Gs on the NTS (two strings of four Cs on the TS) on ‘mut2’ were disrupted. The C-string
disruptions neither add new AID hotspot motifs nor delete existing ones on the mut1 and
mut2 substrates. The disruption of C-strings on mut1 and mut2 allows us to investigate the
contribution of these DNA motifs to AID deamination activities on E2A when compared with the
results of the wild type (wt) E2A substrate.
Similar to the wt E2A substrate, the same seven high AID deamination sites are present
on the NTS of mut1 with mutation rates ranging from 0.05% to 0.16% with transcription (top panel
of Fig. 5.7a). The number of high AID deamination sites decreases to three on the NTS of mut1 in
the absence of transcription, with mutation rates between 0.11% and 0.20% (bottom panel of Fig.
5.7a). Comparison of the AID deamination activity on the wt E2A substrate and mut1 reveals a
similar deamination pattern of AID on both strands of the substrate regardless of transcription (no
statistical difference, with a p > 0.05; Fig. 5.7b). These results indicate that the elimination of the
5C-string on the NTS at the boundary of the 23 bp E2A fragile zone has no detectable effect on
AID deamination activity on the substrate despite the fact that disruption of the 5C-string may
reduce the tendency of these five bases to be single stranded (based on the findings in the native
bisulfite assay).
In marked contrast, a reduced number of AID deamination sites and lower mutation rates
are observed on the mut2 substrate compared with the wt E2A substrate, both with and without
transcription (Fig. 5.8). Only four of the seven high AID deamination sites observed on the NTS
of the wt E2A substrate in the presence of transcription remained on the mut2 substrate with
transcription (top panel of Fig. 5.8a). Compared with the wt E2A substrate, it is apparent that the
AID deamination activity decreases primarily in the 23 bp fragile zone and the downstream region
on the NTS of the mut2 substrate in the presence of transcription (p = 2.7 x 10⁻³; top panel of Fig.
5.8b). Unlike the wt or mut1 substrate, no clear AID deamination site is observed on the mut2
substrate in the absence of transcription (bottom panel of Fig. 5.8a). There is decreased AID
deamination activity across the entire mut2 substrate compared with the wt E2A substrate (p =
2.1 x 10⁻²; bottom panel of Fig. 5.8b). The reduced AID deamination activity on mut2 strongly
suggests that disruption of one C-string on the NTS and two C-strings on the TS markedly
changes the single-stranded character and accessibility of AID to the mut2 substrate both with
and without transcription.
Figure 5.7 AID shows similar deamination activity on the NTS of the mut1 substrate with and without
transcription compared with that of wt E2A.
(a) AID deamination activity on mut1 substrate with (top panel) and without (bottom panel) transcription (no
RNase A). The C to T mutation rates presented are with background (AID-free reaction with transcription
and without RNase A) subtracted. For the top panel, the mutation rate for the NTS and the TS was
calculated based on 13,000 and 5,016 unique molecules in AID treated samples and 13,146 and 6,752
unique molecules in AID-free samples, respectively. For the bottom panel, the mutation rate for the NTS
and the TS was calculated based on 6,091 and 5,372 unique molecules in AID treated samples and 7,418
and 10,192 unique molecules in AID-free samples, respectively. (b) Comparisons of AID mutation rates
between wt E2A and mut1 samples. The difference in AID effect between wt E2A substrate and mut1 in
the presence of transcription shown in the top panel was calculated by subtracting the C to T conversion
rate in transcribed mut1 from that in the transcribed wt E2A sample (no RNase A treatment) after
background was subtracted from each. The difference in AID effect between wt E2A and mut1 without
transcription shown in the bottom panel was calculated by subtracting C to T conversion rate in
untranscribed mut1 from that in untranscribed wt E2A sample after background was subtracted from each
(see method section for details of background subtraction). The histograms above the horizontal midline
represent the results of the NTS and the histograms below the horizontal midline represent the results of
TS. Blue bars represent positions with positive values (higher deamination activity on wt substrate) and
orange bars indicate negative values (lower deamination activity on wt substrate).
For both (a) and (b): The 176 bp E2A NTS sequence in a 5’ to 3’ direction is shown in the middle of the
graph. The scale of the y axis and markings of fragile zone, CpG sites, and AID hotspot motifs are identical
to the descriptions in Figure 5.4.
Figure 5.8 AID shows decreased deamination activity on the NTS of the mut2 substrate with and
without transcription compared with that of wt E2A.
(a) AID deamination activity on mut2 substrate with (top panel) and without (bottom panel) transcription (no
RNase A). The C to T mutation rates presented are with background (AID-free reaction with transcription
and without RNase A) subtracted. For the top panel, the mutation rate for the NTS and the TS was
calculated based on 32,208 and 6,016 unique molecules in AID treated samples and 25,880 and 9,588
unique molecules in AID-free samples, respectively. For the bottom panel, the mutation rate for the NTS
and the TS was calculated based on 19,610 and 5,692 unique molecules in AID treated samples and 23,148
and 15,032 unique molecules in AID-free samples, respectively. (b) Comparisons of AID mutation rates
between wt E2A and mut2 samples. The same calculation between wt E2A substrate and mut2 was done
as described in Figure 5.7b.
For both (a) and (b): The 176 bp E2A NTS sequence in a 5’ to 3’ direction is shown in the middle of the
graph. The scale of the y axis and markings of fragile zone, CpG sites, and AID hotspot motifs are identical
to the descriptions in Figure 5.4.
5.4 Discussion
5.4.1 Experimental approaches chosen for this study
I have used a biochemical system in this chapter rather than a cellular assay system.
Genetic selection of translocation events in in vivo experimental systems would require
modification of the DNA surrounding the fragile zone to insert a selectable site or marker. The
low rate of AID deamination events in the genome makes it extremely difficult to detect its target
sites in the genome experimentally. In Chapter 4, the use of native bisulfite to study the DNA
structure of these regions in live pre-B cell lines from human acute lymphoblastic leukemia cases
is useful to gain insight on this challenging problem. Here, in Chapter 5, the use of an AID
biochemical system is essential to obtain precise high resolution information regarding AID
targeting. MDS can distinguish background mutations caused by amplification steps from bona
fide AID events and allows the detection of rare mutations occurring at 10⁻⁵ or even lower (Jee et
al., 2016). I used MDS as the readout of this biochemical system for the highest sensitivity
possible for the approach here. The PCR and sequencing errors can be averaged out in MDS
and only the true mutations would remain as shown in the final results (Fig. 5.2b).
5.4.2 Strand-specific AID deamination activity in the presence of transcription
The T7 RNA polymerase transcription bubble is 9-12 bases long and has an RNA:DNA
hybrid of 7-8 bp in size (Huang and Sousa, 2000; Temiakov et al., 2000). As mentioned before,
the NTS in the transcription bubble is in a single-stranded state and is vulnerable to AID
deamination activity. Consistent with the previous studies that showed strand-biased action of
AID during transcription (Beletskii and Bhagwat, 1996; Pham et al., 2003; Pham et al., 2019;
Ramiro et al., 2003; Sohail et al., 2003), my results show that AID generated mutations occur
specifically on the NTS of E2A but not on the TS (Fig. 5.5). A cluster of AID mutation sites is
observed within a 9 bp zone located inside of the 23 bp E2A fragile zone in the presence of
transcription (Fig. 5.5a). This 9 bp region is short enough to lie within a single transcription bubble; a longer
duration of T7 RNA polymerase pausing there would give AID greater access to the single-stranded NTS
and could account for the cluster of C to T mutations.
5.4.3 Factors contributing to decreased transcription rate
Any factors that increase the duration of single-stranded DNA can contribute to increased
AID action. The RNase A and C-string density effects observed in my experiments demonstrate
this. Removal of the nascent RNA tail from the RNA polymerase reduces stalling (Bentin et al.,
2005), and thus reduces the ssDNA exposure on the NTS. My observations here are consistent
with a previous report that decreased rNTP concentration slows down T7 RNA polymerase and
results in increased AID activity (Canugovi et al., 2009), likely due to the increased exposure of
ssDNA on the NTS.
Specific DNA sequence features, especially C-strings, affect transcription transit time and,
therefore, affect AID action. GC-rich regions, especially DNA with C-strings, have been reported
to be preferred sites of RNA pol II pausing (Chaudhuri et al., 2003; Eddy et al., 2011; Krohn and
Wagner, 1996; Pham et al., 2019; Watts et al., 2019). Consistent with the previous reports, I
found that the C and G distribution on the NTS and the TS affects the AID action during
transcription (Fig. 5.7; Fig. 5.8).
A direct repeat 6 bp in length is located on the upstream and downstream edges of the 23
bp E2A fragile zone. Slippage between direct repeats may lead to a transient single-stranded
state in the region between the direct repeats that is more accessible to AID (Fig. 5.9). The entire
E2A gene is actively transcribed in pre-B cells, and the frequent transcription may increase the
opportunity of misalignment between direct repeats. The single stranded region caused by DNA
slippage can be targeted by AID before it collapses back to the duplex state. The disruption of a
C-string on the NTS in mut1 reduces the length of the direct repeats to 5 bp, while disruption of
the three C-strings on mut2 fully eliminates the direct repeat that flanks the 23 bp fragile zone.
The reduced AID activity on the mut2 substrate compared with the wt E2A and the mut1
substrates supports the hypothesis that the presence of the direct repeat may increase AID
accessibility to the region, and that DNA slippage is a possible factor for the highly focused E2A
breakage.
Figure 5.9 Potential loop formation due to misalignment between the direct repeats.
The direct DNA repeat 6 bp in length flanking the 23 bp E2A fragile zone is shown in blue. The nucleotides
of the 23 bp E2A fragile zone are shown in red. Misalignment between the DNA repeats could occur during the
reannealing process after the two E2A strands are transiently separated by transcription. Two possible
secondary structures by misalignment are shown.
It has been suggested that repair machinery recruited to the initial DNA lesions can lead
to further polymerase pausing (Gonzalez et al., 2020). The increased polymerase pausing may
lead to increased ssDNA exposure and, in turn, increased AID deamination activity within the
E2A fragile zone and its downstream region.
A model incorporating all of these factors is discussed in Chapter 8.
Chapter 6 E2A Binding Protein Analyses with EMSA
6.1 Introduction
The E2A fragile zone must be in a single-stranded state, at least transiently, for AID
deamination to take place in this region (Pham et al., 2003; Tsai et al., 2008). ssDNA can arise
from intrinsic structural fluctuation due to specific DNA sequences (Tsai et al., 2009; von Hippel
et al., 2013), biological processes like transcription (Bell and Dutta, 2002; Lieber, 1997; Murakami
et al., 2002; Pannunzio and Lieber, 2016; Zuo and Steitz, 2015), special DNA structures like G-
quadruplex and R-loop (Qiao et al., 2017; Roy and Lieber, 2009; Roy et al., 2008; Thomas et al.,
1976; Yu et al., 2005), or protein binding (Jensen et al., 1976; Luger et al., 1997; Travers, 1989).
It is well characterized that DNA around a bound protein is bent (Gartenberg and Crothers, 1988;
Luger et al., 1997; Travers, 1989; Wu and Crothers, 1984), and the bends of DNA can increase
the local torsional stress and hence change the DNA conformation. The sequence-dependent
structural fluctuation of the E2A fragile zone is characterized by multiple assays as described in
Chapter 4. Transcription was found to increase the AID deamination activity within the E2A fragile
zone and its downstream region in Chapter 5. The possibility that specific DNA binding
proteins act around the E2A fragile zone has not been investigated previously. Identification of
proteins specifically bound to the E2A fragile zone would lay the foundation for further study on
the role of the specific binding proteins in breakage of the 23 bp E2A fragile zone.
One of the well-developed approaches for studying unknown DNA binding proteins on
sequences of interest is the DNA affinity column (Briggs et al., 1986; Chodosh et al., 1988;
Jones et al., 1987; Kadonaga and Tjian, 1986; Rosenfeld and Kelly, 1986; Wu et al., 1988). In
this chapter, DNA affinity beads and electrophoretic mobility shift assay (EMSA) are used to
detect the specific DNA binding proteins within the 23 bp E2A fragile zone. Affinity beads coupled
with DNA from the E2A fragile zone in either duplex or single-stranded form are used to capture the
E2A specific binding proteins in independent experiments. The proteins captured by the affinity beads,
as well as the whole pre-B cell lysate, are tested in EMSA. Proteins in some shifted bands are
further analyzed with MS. EMSA with whole pre-B cell lysate suggests potential E2A specific
binding proteins on the duplex E2A sequence; however, later experiments indicate that these are proteins
with high affinity to the 5’ overhang structure. Proteins with high affinity to the E2A sequence
are not captured by the DNA affinity beads. Possible reasons for not detecting any E2A specific
binding proteins are considered.
6.2 Materials and Methods
6.2.1 Oligonucleotides
The oligonucleotides used in this chapter were synthesized by IDT. The sequences of the
oligonucleotides used in this study are listed in Table 6.1 with methylcytosines underlined. All the
oligonucleotides were purified with native PAGE. 5’ labeling of oligonucleotides was done with T4
PNK and [gamma-32P] ATP as described previously.
Table 6.1 Oligonucleotides.
Underlined cytosines are 5-methylcytosines.
6.2.2 Preparation of DNA affinity beads
6.2.2.1 Concatemer preparation
Figure 6.1 Illustration of enrichment and detection of specific DNA binding proteins with EMSA.
(a) Preparation of DNA concatemer. A concatemer mixture with at least two units of the monomer (up to
several kilobase pairs) was generated by ligating the 40 bp monomer with complementary 5’ sticky
ends. The vertical dashed lines in the concatemer indicate the boundaries of the monomeric unit in the
ligation product. (b) Enrichment of E2A sequence specific binding proteins with affinity beads. The whole
pre-B cell lysate is incubated with affinity beads coupled either with E2A sequence (shown in red
curved lines) or control sequence (shown in blue curved lines). Common DNA binding proteins (shown in
oval) and proteins non-specifically bound to the beads (shown in rectangle) are enriched by both E2A and
control affinity beads. The proteins that specifically bind to E2A sequence (shown in red triangle) can be
specifically enriched by E2A affinity beads. (c) Illustration of EMSA using proteins enriched by E2A and
control affinity beads. The proteins enriched by E2A and control affinity beads are incubated with
radioactive E2A probe (hot probe, shown in blue with asterisk at 5’ end) and radioactive control probe
(shown in grey) separately. The bottom band on the gel represents the free probes in each reaction. The
common DNA binding protein (shown in oval) can result in the same shifted bands in all the reactions shown
as the middle bands. The E2A specific binding protein (shown in red triangle) enriched specifically by the
E2A affinity beads only binds to the hot E2A probe with high affinity and leads to a unique shifted band on
the gel (shown as the top band).
The 40 bp double-stranded E2A substrate with 4 nt 5’ overhang on both strands was
prepared by annealing 22 µg of DL1 and 22 µg of DL2 in the annealing buffer (10 mM Tris-HCl,
50 mM NaCl, pH 8.0). The 40 bp control substrate with the same 4 nt overhang was prepared with
DL3 and DL4 under the same conditions. DNA concatemers were prepared by ligating the 40 bp
substrate after phosphorylation by T4 PNK (Fig. 6.1a). The 500 µl ligation reaction containing 44
µg of the dsDNA substrate, 4 mM of ATP, and 7.5 Weiss units of T4 DNA ligase (Roche) in 1 x T4
DNA ligation buffer was incubated at 16°C for 6 hours followed by phenol/chloroform extraction
and ethanol precipitation. The residual ATP left in the recovered ligation products was removed
using G25 resin. The ligation products were verified on native PAGE before further usage. The
ligation products were a mixture of the concatemers with at least two copies of the 40 bp substrate
with size ranging from 84 bp to several kilobase pairs.
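The lower bound of 84 bp follows from the geometry of the monomer. As a rough sketch, assuming that each ligation junction is formed by complete annealing of one 4 nt overhang and that the two terminal overhangs remain single-stranded, the duplex length of an n-unit concatemer can be estimated as below. The function name and its default arguments are illustrative and are not part of the original protocol.

```python
def concatemer_duplex_bp(n_units, duplex_bp=40, overhang_nt=4):
    """Estimate the double-stranded length (bp) of an n-unit concatemer.

    Assumption: every junction between neighboring units is formed by full
    annealing of one overhang, and the two terminal overhangs stay
    single-stranded, so they are not counted as base pairs.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return n_units * duplex_bp + (n_units - 1) * overhang_nt


# Monomer, dimer, trimer, and a 25-unit product
for n in (1, 2, 3, 25):
    print(n, concatemer_duplex_bp(n))
```

Under these assumptions the dimer is 84 bp, which matches the smallest ligation product described above, and a product of roughly 25 units already exceeds one kilobase pair.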
6.2.2.2 Concatemer coupling on CNBr-activated Sepharose beads
The DNA affinity beads were prepared as previously reported with minor modifications
(Kerrigan and Kadonaga, 1998). Specifically, the E2A concatemer and control concatemer from
the ligation reactions were coupled to CNBr-activated Sepharose 4 Fast Flow (GE Healthcare,
Chicago, IL) separately. Around 600 mg of Sepharose beads were hydrated with 50 ml of 1 mM
HCl by incubation at 4°C for 30 min on the rotator. The beads were washed in a sintered glass
funnel sequentially with 500 ml of ice-cold 1 mM HCl, 100 ml of ice-cold water, and 100 ml of ice-
cold 10 mM potassium phosphate (pH 8.0). After the final wash, the beads in ice-cold potassium
phosphate (pH 8.0) were transferred to a 50 ml tube and recovered by centrifugation at 1,500 rpm
at 4°C for 5 min (Beckman Coulter GS-6KR Centrifuge, Brea, CA). Extra buffer was removed to
make the final buffer to beads volume ratio 1:2. The beads were gently resuspended in the tube
to a homogeneous thick slurry and aliquoted into three tubes. Forty-four µg of the E2A
concatemer, control concatemer, and equal volume of water were added separately to these three
tubes followed by incubation at room temperature for 8 hours (coupling reaction). The beads
were recovered by centrifugation at 1,500 rpm at 4°C for 5 min in the Beckman Coulter centrifuge
followed by thorough washes with water and 100 mM Tris-HCl (pH 8.0). The DNA concentration
in the supernatant of the coupling reaction was measured and used to calculate the coupling
efficiency. The unreacted CNBr on the beads was inactivated in 100 mM Tris-HCl (pH 8.0) buffer
by incubation at room temperature for 3 hours. The beads were thoroughly washed sequentially
with 10 mM potassium phosphate (pH 8.0), 1 M potassium phosphate (pH 8.0), 1 M potassium
chloride, and water after the inactivation reaction and stored in 5 ml of storage buffer (50 mM Tris-
HCl, 1 mM EDTA, 0.1 M NaCl, pH 7.6) at 4°C for future usage.
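The coupling efficiency calculation mentioned above can be sketched as follows. This is a minimal illustration that assumes the efficiency is defined as the fraction of input DNA that is no longer found in the coupling supernatant; the supernatant reading and volume in the example are made-up values, not measurements from this study.

```python
def coupling_efficiency(input_ug, supernatant_ng_per_ul, supernatant_ul):
    """Fraction of input DNA coupled to the beads.

    Assumption: DNA that is absent from the supernatant after the coupling
    reaction is bound to the beads.
    """
    unbound_ug = supernatant_ng_per_ul * supernatant_ul / 1000.0  # ng -> ug
    if unbound_ug > input_ug:
        raise ValueError("unbound DNA exceeds the input amount")
    return (input_ug - unbound_ug) / input_ug


# 44 ug of concatemer was used per coupling reaction (from the protocol);
# 5.5 ng/ul in 800 ul of supernatant is a hypothetical reading.
eff = coupling_efficiency(input_ug=44.0, supernatant_ng_per_ul=5.5, supernatant_ul=800.0)
print(f"coupling efficiency: {eff:.0%}")
```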
6.2.2.3 ssDNA coupling on CNBr-activated Sepharose beads
ssDNA was coupled to the CNBr-activated Sepharose beads in the same manner as
described above with minor modifications. The coupling reactions were performed with 22 µg of
DL27 and DL29 separately in buffer containing 100 mM NaHCO3 and 0.5 M NaCl, followed by
thorough washing and CNBr inactivation. After coupling, the beads were washed with low pH buffer
(0.1 M sodium acetate, 0.5 M NaCl, pH 3.8) three times, high pH buffer (0.1 M Tris-HCl, 0.5 M
NaCl, pH 8.0) three times, and column storage buffer (50 mM Tris-HCl, 1 mM EDTA, 0.1 M NaCl,
pH 7.6) once. The beads were stored in 2.4 ml of storage buffer at 4°C.
6.2.3 Protein capture with affinity beads
The nuclei of pre-B cells occupy up to 80% of the intracellular space (Valenciano et al.,
2014) and it is feasible to use whole cell lysate instead of nuclear extract to capture the E2A
specific binding proteins. Both the concatemer coupled beads and ssDNA coupled beads were
used to capture proteins from whole cell lysate of 697 cells in different experiments as specified
in the results.
Usually, 1.7 x 10^8 freshly harvested pre-B cells in a volume of 200 µl were lysed with 4
volumes of ice-cold lysis buffer (20 mM HEPES, 420 mM KCl, 1 mM DTT, 1 mM MgCl2, 1 mM
PMSF, 1% NP40, pH 7.5) and incubated on ice for 10 min. The volume of the lysis reaction was
scaled according to the number of cells. The supernatant was recovered by centrifugation at
12,000 rpm at 4°C for 10 min (Eppendorf Centrifuge 5415C) and then mixed with 3 µg of poly(dA-dT),
which is commonly used as a non-specific competitor to remove non-specific DNA binding
proteins in the whole cell lysate. The reactions were incubated on ice for 15 min followed by
centrifugation at 12,000 rpm for 10 min at 4°C in the Eppendorf centrifuge to recover the
supernatant. The recovered cell lysate was further diluted with buffer Z (25 mM HEPES, 12.5
mM MgCl2, 1 mM DTT, 10% glycerol, 0.1% NP40, pH 7.6) to make the final KCl concentration of
50 mM before incubation with affinity beads. Buffer Z/50 mM KCl (buffer Z supplemented with 50
mM KCl) was used to wash the affinity beads before usage followed by centrifugation at 1,500
rpm at 4°C for 5 min in the Eppendorf centrifuge to recover the beads. The volume of the affinity
beads used was based on the cell number (6 x 10^9 cells/ml of settled bed volume of the beads).
The E2A and control affinity beads were incubated with the cell lysate at 4°C for 15 min followed
by four washes with buffer Z/50 mM KCl. The captured proteins were sequentially eluted with 20
µl of buffer Z containing increasing concentrations of KCl (buffer Z/300 mM KCl, buffer Z/500 mM
KCl, and buffer Z/800 mM KCl). The affinity beads were recovered by centrifugation at 1,000 rpm
at 4°C for 5 min in the Eppendorf centrifuge after each wash/elution.
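The two scaling steps in this protocol, dilution of the lysate to 50 mM KCl and the choice of bead volume, are simple proportionalities and can be sketched as below. The example rests on stated assumptions: the 200 µl cell suspension is taken to contribute no KCl, so that 4 volumes of lysis buffer at 420 mM give about 336 mM in 1,000 µl; the small volume of poly(dA-dT) is ignored; and buffer Z is treated as KCl-free, as in the recipe listed above. The function names are illustrative.

```python
def buffer_to_add_ul(start_mM, start_ul, target_mM):
    """Volume of KCl-free buffer needed to reach target_mM (C1*V1 = C2*V2)."""
    if not 0 < target_mM <= start_mM:
        raise ValueError("target must be positive and no higher than the start")
    final_ul = start_mM * start_ul / target_mM
    return final_ul - start_ul


def bead_volume_ul(n_cells, cells_per_ml_beads=6e9):
    """Settled bed volume of beads (ul) for a given number of input cells."""
    return n_cells / cells_per_ml_beads * 1000.0


# Assumed starting point: 200 ul of cells + 800 ul of 420 mM KCl lysis buffer
lysate_mM = 420.0 * 800.0 / 1000.0  # about 336 mM under the assumption above
print(buffer_to_add_ul(lysate_mM, 1000.0, 50.0))  # ul of buffer Z to add
print(round(bead_volume_ul(1.7e8), 1))            # ul of settled beads
```

Under these assumptions a typical preparation of 1.7 x 10^8 cells corresponds to roughly 28 µl of settled beads.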
6.2.4 EMSA
Several probes were used in EMSA: 1) the 40 bp E2A substrate with 4 nt 5’ overhang on
both strands by annealing oligomer DL1 and DL2 and the control substrate of the same
configuration by annealing DL3 and DL4; 2) the 44 bp E2A substrate with blunt ends by annealing
DL27 and DL28; 3) the 44 bp blunt ended E2A substrate with two methylated CpG sites by
annealing DL19 and DL20; 4) 44 nt E2A non-transcribed strand (DL27), a 44 nt control ssDNA
(DL29), a 44 nt control ssDNA containing G4 motif (DL74), and a 44 nt control ssDNA with C5TG4
motif (DL75). The radioactively labeled dsDNA probes (hot probes) were prepared by annealing
of the 5’ radioactively labeled non-transcribed strand (DL1, DL3, DL27, or DL19) and unlabeled
transcribed strand (DL2, DL4, DL28, or DL20) at 1:1.2 ratio in the annealing buffer (10 mM Tris,
50 mM NaCl, pH 8.0). The corresponding unlabeled duplex probes (cold probes) were prepared
in the same way with two cold strands and used as specific competitors in EMSA.
Both the whole cell lysate and the affinity beads captured fractions were used in EMSA as
specified in the results section. Usually, 1 µl of the whole cell lysate or the affinity beads captured
fractions was incubated in buffer Z with the non-specific competitor poly(dA-dT) [0 pmol or 4 pmol
(0.57 µg)] and cold probe (0 pmol, 0.4 pmol, or 2 pmol) for 15 min on ice followed by the addition
of 0.4 pmol hot probe to make the final volume 10 µl. The reactions were incubated on ice for
another 15 min before being loaded onto native PAGE (8% for EMSA with dsDNA hot probes and
12% when ssDNA hot probes were used). The gels were autoradiographed as described in
previous chapters.
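The competitor amounts above translate directly into the molar ratios quoted in the figures. The following sketch, with illustrative function names, computes the fold excess of each competitor over the 0.4 pmol hot probe and the final probe concentration in the 10 µl reaction; it uses only the amounts stated in this section.

```python
HOT_PROBE_PMOL = 0.4   # hot probe per reaction (from the protocol)
REACTION_UL = 10.0     # final reaction volume (from the protocol)


def fold_excess(competitor_pmol, hot_pmol=HOT_PROBE_PMOL):
    """Molar excess of a competitor relative to the hot probe."""
    return competitor_pmol / hot_pmol


def concentration_nM(pmol, volume_ul=REACTION_UL):
    """Convert an amount in pmol to nM; 1 pmol/ul equals 1 uM."""
    return pmol / volume_ul * 1000.0


print("poly(dA-dT), 4 pmol:", fold_excess(4.0), "fold")
print("cold probe, 0.4 pmol:", fold_excess(0.4), "fold")
print("cold probe, 2 pmol:", fold_excess(2.0), "fold")
print("hot probe:", concentration_nM(HOT_PROBE_PMOL), "nM")
```

The 4 pmol of poly(dA-dT) corresponds to the 1:10 hot probe to competitor ratio, and the 0.4 pmol and 2 pmol cold probe amounts correspond to the 1:1 and 1:5 ratios used in the results.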
6.2.5 MS
The gel pieces with target shifted bands were recovered and submitted to the USC
Proteomics Core for mass spectrometry detection and data analyses.
6.3 Results
6.3.1 Rationale of EMSA for E2A specific binding protein detection
Affinity beads coupled with E2A sequence and control sequence were used separately to
capture sequence specific DNA binding proteins (Fig. 6.1b). The control sequence coupled beads
can capture non-specific proteins binding to general DNA sequences and to the surface of the
affinity beads. Besides the non-specific DNA binding proteins, E2A sequence coupled beads can
also capture proteins that specifically bind to the 23 bp E2A fragile zone. The E2A specific proteins
captured by the E2A sequence coupled affinity beads can be studied in EMSA using different hot
probes and protein fractions (Fig. 6.1c). The non-specific proteins binding to both E2A hot probe
and control hot probe will be revealed as common shifted bands in all reactions on the native
PAGE. A unique band is expected in the reaction using E2A hot probe and E2A affinity beads
captured proteins due to the high specificity and affinity of the E2A specific binding proteins to the
E2A hot probe.
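The logic of this comparison amounts to a set difference between the two captured fractions. The sketch below is only a schematic of that reasoning; the protein labels are hypothetical placeholders and do not refer to proteins identified in this study.

```python
def classify_captured(e2a_beads, control_beads):
    """Split captured proteins into common and bead-specific groups."""
    e2a, ctrl = set(e2a_beads), set(control_beads)
    return {
        "common": e2a & ctrl,        # expected to give shared shifted bands
        "e2a_specific": e2a - ctrl,  # expected to give the unique band
        "control_only": ctrl - e2a,
    }


# Hypothetical placeholder labels, for illustration only
result = classify_captured(
    e2a_beads=["general_binder", "bead_binder", "candidate_X"],
    control_beads=["general_binder", "bead_binder"],
)
print(sorted(result["e2a_specific"]))
```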
6.3.2 EMSA with whole cell lysate using dsDNA probes with 5’ overhangs suggests the
potential existence of proteins with higher affinity to E2A probe compared with the control
probe
Figure 6.2 Protein with high affinity specific to E2A sequence in EMSA.
(a) EMSA with 697 whole cell lysate using E2A and control hot probes with 5’ overhang. Lane 1 and lane
6 are free probe controls with only radioactive probes in the reaction buffer. The 697 whole cell lysate was
incubated with E2A and control hot probes in lane 2 and lane 7. Poly (dA-dT) functioned as a non-specific
competitor and was added to the reactions at a 10-fold molar excess over the hot probe as indicated above the
gel. E2A cold probe was used as a specific competitor and added to reactions at 1:1 molar ratio (lanes 4
and 9) and at 1:5 molar ratio (lanes 5 and 10) as indicated above the gel. The red dashed box and the red
arrow indicate the bands of interest, namely band 2 in this case. The bands in the green boxes were
recovered individually for MS analysis. (b) EMSA with whole cell lysate of three pre-B cell lines using E2A
hot probes with 5’ overhang. Similar reactions as described in (a) were performed with E2A hot probe with
5’ overhang using whole cell lysate from three different pre-B cell lines. The ratios of E2A hot probe to poly
(dA-dT) and E2A cold probe are illustrated in the table above the gel image. The same band 2 is
highlighted with the red dashed box.
A preliminary study was performed using whole pre-B cell lysate to investigate whether
there are proteins in pre-B cells having high affinity to the E2A fragile zone sequence. The 40 bp E2A
probe and control probe with 4 nt 5’ overhang were incubated separately as hot probes
with the whole cell lysate of 697 cells (Fig. 6.2a). Free probe controls from reactions with only
hot probe in the reaction buffer (lane 1 and lane 6) show one single band at the bottom of the gel.
After incubation with the whole cell lysate, the majority of the radioactively labeled hot probes
cannot migrate into the gel and remain in the gel well due to the large complex formed by the
cellular proteins and hot probes (lane 2 and lane 7). The addition of non-specific competitor
poly(dA-dT) to the reactions at 1:10 ratio releases a portion of the hot probes from the large
complex, resulting in some shifted bands on the gel including a large band right below the gel well
(band 1) and a band in the middle of the gel (band 2) (lane 3 and lane 8). Adding E2A cold probe
at 1:1 ratio (lane 4 and lane 9) and 1:5 ratio (lane 5 and lane 10) to the hot probe leads to further
reduction of the hot probe remaining in the gel well.
Theoretically, only a subset of proteins would bind to both control probe and E2A probe,
with some unique proteins binding to each probe. In general, the molecular weight of the proteins
would determine the position of the protein-DNA complex on the native gel. Although several
shifted bands are observed on the gel after the addition of the cold probes to the reactions, it is
unclear whether any specific proteins bind to the E2A probe, since the sizes of the shifted bands
are indistinguishable in the experiments using E2A probe and the control probe.
To investigate more thoroughly whether the potential E2A specific binding proteins exist
in other pre-B cell lines at a higher level that may allow detection of binding to E2A probe in EMSA,
the same experiment using the same E2A hot probe was performed using whole cell lysate from
697, Reh, and Nalm-6 cells (Fig. 6.2b). Similar to the previous results using 697 cell lysate, when
cell lysate from Reh cells is used, band 2 appears when poly(dA-dT) is added to the reaction
and the relative intensity of this band decreases with the addition of increasing ratio of the E2A
cold probe in the reactions. Technical problems may have led to the loss of E2A hot probe in the
reaction in lanes 9, 10, and 11 and the absence of band 2 in the experiment using Nalm-6 cell
lysate. Although the reduction of band 2 when an increasing amount of the cold E2A probe was
added in both Reh and 697 cells suggests potential specific protein binding to the E2A probe, it
is not entirely convincing since a band of indistinguishable size is also observed in the
experiment with the control probe.
To finally determine whether band 2 contains proteins that specifically bind to E2A DNA, the
shifted band and a control band (a and b in green boxes in Fig. 6.2a) were recovered from the gel
followed by mass spectrometric (MS) analysis. Hundreds of proteins in each gel piece are
identified, among which the DNA binding proteins are listed in Table 6.2. The list includes a
variety of proteins with cellular functions like DNA repair, transcription, and translation. However,
none of the proteins is known to separate the two strands of the duplex to generate ssDNA.
Table 6.2 DNA binding proteins detected by MS.
Proteins in the two bands in the green boxes in Figure 6.2a were analyzed by MS. Among the abundant
proteins identified by MS in each band, the potentially interesting DNA binding proteins were selected and
listed in the table.
6.3.3 EMSA with affinity beads captured proteins using probes with 5’ overhang does not
show the same shifted band
It is possible that the low abundance of the proteins that specifically bind to E2A is the basis for
the positive finding in EMSA using the whole cell lysate. I used DNA affinity beads coupled with
E2A concatemer to capture the potential E2A specific binding proteins from 697 whole cell lysate.
The proteins captured on the beads were eluted with buffer Z of stepwise increasing KCl concentrations (100
mM, 300 mM, 500 mM, and 800 mM). EMSA was performed using the affinity beads captured
proteins and E2A hot probe and control hot probe with 5’ overhang.
A different shifted band, band 3, indicated by the arrow, but not the previous band 2 of
smaller size, appears in EMSA performed with affinity beads captured protein fractions (Fig. 6.3).
Band 3 is present in reactions without the addition of any cold DNA. This finding suggests that
band 2 observed previously was not likely to be shifted by E2A specific binding proteins and the
affinity chromatography did not enrich the potential E2A specific binding protein.
Figure 6.3 Absence of protein with high affinity specific to E2A sequence in EMSA using affinity
beads captured fractions.
E2A affinity beads captured proteins were eluted with buffer of different salt concentrations (500 mM and
800 mM KCl) separately followed by incubation with E2A and control hot probes with 5’ overhang. Lane 1
and lane 10 are free probe controls. E2A hot probe was used in reactions from lane 2 to lane 9. Control
hot probe was used in reactions from lane 11 to lane 18. The ratios of hot probe to poly (dA-dT) and hot
probe to E2A cold probe in each reaction are displayed in the table above the gel image. Band 3 is the
only shifted band observed on the gel and is indicated by the arrow.
6.3.4 The shifted band observed in EMSA with whole cell lysate is likely caused by protein
binding to overhang DNA ends instead of E2A specific binding proteins
I wondered why the shifted band 2 is only present in EMSA with whole cell lysate but is
absent when affinity beads captured proteins are used. Three different configurations of dsDNA
E2A hot probes were used in EMSA to further explore the existence of E2A specific binding
proteins (Fig. 6.4). The 40 bp E2A hot probe with 5’ overhang used previously and two 44 bp
blunt-ended E2A probes, one without cytosine modification and the other with two methylated
CpG sites within the E2A fragile zone, are used separately in reactions with 697 whole cell lysate.
The addition of E2A cold probe at 1:1 ratio leads to the presence of a band 4 right below
the gel well (lanes 4, 8, and 12), likely because the protein complex is able to migrate out of the gel well
after its disruption by the increased DNA in the reactions. The same
shifted band 2 appears in reactions with E2A hot probe with 5’ overhang after the addition of
poly(dA-dT) as observed before (Fig. 6.2; Fig. 6.4). However, band 2 is absent in reactions done
with the other two blunt ended E2A hot probes, regardless of cytosine modification. This finding
suggests that band 2 is likely to be shifted by proteins with high affinity to the 5’ overhang structure
but not by E2A specific binding proteins.
Figure 6.4 Absence of protein with high affinity specific to E2A sequence in EMSA using blunt
ended hot probes.
The configurations and modifications of E2A hot probes are illustrated above the gel image. Asterisk at the
5’ end represents the radioactive label. The red dots represent methylcytosines. The 40 bp E2A probe with
4 nt 5’ overhang, the blunt ended E2A probe with two methylated CpG sites, and the blunt ended E2A probe
without modifications were incubated with the whole cell lysate from 697 cells. Lanes 1, 5, and 9 are
the free probe controls. The ratios of E2A hot probe to poly (dA-dT) and E2A cold probe are shown in the
table for each reaction. The addition of poly(dA-dT) to the reactions leads to the presence of a large band
right below the gel well (band 4). The same shifted band 2 as previously observed in Figure 6.2 is observed.
6.3.5 A common shifted band is observed in EMSA with single-stranded hot probe using
whole cell lysate
Figure 6.5 Absence of protein with high affinity specific to E2A sequence in EMSA using single-
stranded hot probes.
Single-stranded E2A top strand (DL27) was used as hot probe in reactions from lane 1 to lane 4. Single
stranded control hot probe (DL29) was used in reactions from lane 5 to lane 8. Single-stranded control hot
probe with G4 motif (GGGG) was used in reactions from lane 9 to lane 12. Single stranded control hot
probe with C5TG4 motif (CCCCCTGGGG) was used in reactions from lane 13 to lane 16. Lanes 1, 5, 9,
and 13 are the free probe controls. The ratios of hot probe to poly(dA-dT) and E2A cold probe are shown
in the table for each reaction. The common shifted band 5 pointed by the red arrow is present before the
addition of poly(dA-dT) in reactions with all four probes.
Since the single-stranded state of E2A may be important for AID participation in DSBs, it is
possible that there are proteins that specifically bind to the single-stranded E2A fragile zone sequence
instead of dsDNA. EMSA with four different single-stranded hot probes was performed with 697
whole cell lysate (Fig. 6.5). The four 44 nt ssDNA hot probes include the E2A non-transcribed
strand sequence (DL27), a random control sequence (DL29), a random sequence with G4 motif
(DL74), and a random sequence with C5TG4 motif (DL75). The hot E2A non-transcribed strand
(DL27) contains the 23 bp fragile zone and the downstream C5AG4 motif. The cold E2A sequence
(DL27) was used as the specific competitor in all the reactions. A common shifted band (band 5) is
present in all the reactions regardless of which single-stranded hot probe is used, indicating that
the proteins in band 5 are highly unlikely to bind to the single-stranded E2A probe
specifically.
6.4 Discussion
Although several shifted bands are observed in EMSA experiments using different hot probes
and protein fractions, none of the bands appear to be shifted by
proteins specifically binding to single-stranded or double-stranded E2A sequence. The
potentially interesting band 2 observed in EMSA using whole cell lysate is likely to be shifted
by proteins with high affinity to the overhang DNA structure (Fig. 6.2; Fig. 6.4). Besides, the
concatemer coupled affinity beads did not capture the proteins that bound to band 2 since band
2 was absent in the reactions using affinity bead captured proteins (Fig. 6.3). No specific protein
was detected to bind to the single stranded E2A NTS sequence in EMSA using whole cell lysate
(Fig. 6.5).
De novo detection of specific binding proteins with DNA affinity beads and EMSA is
complicated and requires extensive experience and optimization. Failing to detect the
E2A specific binding proteins in EMSA, if they exist, can be due to many reasons. The DNA
affinity beads may not be able to capture the E2A specific binding proteins. First, the E2A specific
binding proteins may not have high enough affinity to the E2A sequence to be specifically
captured by the E2A affinity beads. The binding of E2A specific proteins to the E2A affinity beads
may be sensitive to buffer conditions like salt type, salt concentration, pH, reducing reagents, and
temperature. The conditions used may disrupt the protein-DNA binding. Second, the E2A
specific proteins may be composed of several different proteins that require different conditions
to be captured. The complex may be disrupted during the capture process or the elution process.
Third, the specific binding proteins of the E2A fragile zone may be expressed conditionally. For
instance, their expression can be cell-type specific, cell-stage specific, or sensitive to culture
conditions. The current cell types or culture conditions may not be the optimal expression
condition of the E2A specific binding proteins.
It is also possible that the binding of E2A specific proteins on E2A hot probe is disrupted
on the gel in EMSA. First, the whole cell lysate containing DNA, highly abundant proteins, and
lipids can disrupt the binding of E2A specific proteins on E2A hot probes in EMSA. Second, the
low abundance of E2A specific binding proteins in cells or captured by the affinity beads can make
the signal of the shifted band too low to be detected on gel even if they have high affinity to the
E2A fragile zone sequence. Third, the resolution of the native PAGE is another factor limiting the
detection of the unique shifted bands. The unique bands that resulted from the binding between
E2A probe and its specific proteins may be indistinguishable from other strong bands on the gel.
The negative results of EMSA do not preclude the existence of proteins that specifically
bind to the DNA sequence of the E2A fragile zone. Other methods or further optimization of the
current assay are needed to study the specific binding proteins around the E2A fragile zone.
Chapter 7 The De Novo Oligo Pull Down Assay for
Detection of Unknown DNA Binding Proteins
7.1 Introduction
Eukaryotic DNA is bound and regulated by proteins inside the cells in the context of
chromatin. Deciphering the proteins bound to specific loci is a key step to understand gene
regulation and functions. Numerous methods have been developed to study the sequence-
specific unknown DNA binding proteins (Dai et al., 2017; Déjardin and Kingston, 2009; Jutras et
al., 2012; Kennedy-Darling et al., 2014; Mohammed et al., 2016; Pardue and Gall, 1969; Wu et
al., 2011).
7.1.1 DNA affinity chromatography
The most well-developed method for detection of the sequence-specific DNA binding
proteins is DNA affinity chromatography, by the column or batch method, as mentioned in Chapter
6 (Briggs et al., 1986; Chodosh et al., 1988; Jones et al., 1987; Rosenfeld and Kelly, 1986; Wu et
al., 1988). Generally, the DNA sequence of interest is coupled to the pre-activated resins (usually
by cyanogen bromide) through chemical reactions between cyanate ester group on the resin and
the DNA amino group. Proteins with high affinity to the DNA sequence on the resin will be
specifically captured by applying whole cell extract to the resin followed by extensive washing to
remove non-specific bindings. This method has been widely used for decades and many
sequence-specific factors were successfully identified by this method including the promoter
specific transcription factor Sp1 (Briggs et al., 1986), transcription factor OTF-1 for histone H2B
gene expression (Fletcher et al., 1987), cAMP response element binding protein (CREB)
(Montminy and Bilezikjian, 1987), and nuclear factor I (NF-I) (Rosenfeld and Kelly, 1986).
Affinity chromatography is a reliable and well-established method for identification of
sequence-specific DNA binding proteins. However, there are some obvious drawbacks. First,
this method requires a large amount of input material, and therefore large scale cell culture (1~100
liters) is necessary for each trial. Second, depending on the affinity of the unknown proteins to
the target DNA sequence, buffer components and purification conditions may vary considerably
for the retention of different proteins on the solid phase. Extensive work is needed to optimize
the conditions. Third, this method may not be suitable for the identification of some low affinity
proteins. The DNA-protein interaction will be disrupted in cases of weak binding at any step in
the process. Fourth, this assay may not be applicable to identify protein complexes. The protein
complex containing many interacting proteins may be disrupted in the experimental process and
not all the participating proteins can be captured by DNA on the affinity resin.
7.1.2 Affinity capture based on dCas9
With the rapid development of CRISPR-Cas9 technique for site-specific gene editing,
many attempts have been made to apply it for detection of DNA-specific binding proteins, taking
advantage of the high specificity of Cas9 system to a specific genomic locus. The Cas9 protein
used in those studies is usually deactivated (dCas9), which allows it to retain the target sequence
binding property without editing the target genomic locus. To capture the target DNA sequence
that dCas9 binds to, some assays rely on the antibody immunoprecipitation while others use
enzyme-catalyzed proximity labeling to tag the nearby proteins, either by engineered ascorbate
peroxidase (APEX2) or BirA.
dCas9 fused with FLAG tag was used in engineered DNA-binding molecule-mediated
chromatin immunoprecipitation (enChIP) assay (Fujita and Fujii, 2013; Fujita and Fujii, 2014).
Anti-FLAG antibody was used to capture the dCas9 bound to a specific genomic locus together
with the other proteins cross-linked near the target locus. However, sensitivity for mass
spectrometry (MS) analysis of unknown proteins (100 attomole; 6 x 10
7
molecules) would require
10
9
or more transfected cells for a single analysis of any genomic target, and this is not possible
for nearly all experimental situations. Furthermore, for each DNA sequence of interest in the
genome, a different guide RNA must be redesigned and tested for specificity.
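The scale of this problem can be checked with a short calculation. The sketch below converts the 100 attomole sensitivity quoted above into a molecule count and then estimates how many cells a two-copy locus would require, assuming one bound protein molecule per target copy. The `recovery_fraction` values are illustrative assumptions, not measurements from the cited studies.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molecules_from_attomole(attomole: float) -> float:
    """Convert an amount in attomoles (1e-18 mol) to a number of molecules."""
    return attomole * 1e-18 * AVOGADRO

def cells_required(molecules_needed: float, copies_per_cell: int,
                   recovery_fraction: float) -> float:
    """Cells needed so that the recovered target copies reach molecules_needed.

    recovery_fraction is a hypothetical overall yield (cross-linking, capture,
    and sample handling combined); it is an assumption made for illustration.
    """
    return molecules_needed / (copies_per_cell * recovery_fraction)

needed = molecules_from_attomole(100)       # about 6 x 10^7 molecules
ideal = cells_required(needed, 2, 1.0)      # perfect recovery, 2 copies per diploid cell
lossy = cells_required(needed, 2, 0.03)     # assumed 3% overall recovery

print(f"molecules needed: {needed:.2e}")
print(f"cells at 100% recovery: {ideal:.1e}")
print(f"cells at 3% recovery: {lossy:.1e}")
```

Even with perfect recovery, tens of millions of cells would be needed for a single-copy locus. An assumed recovery of a few percent pushes the requirement to about 10^9 cells.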
BirA is a biotin ligase in E. coli which can specifically biotinylate a 14-mer peptide substrate
(Beckett et al., 1999). Some methods were developed to fuse the 14-mer BirA acceptor peptide
to dCas9 and express the fusion protein and BirA simultaneously inside the cells (Liu et al., 2017;
Schmidtmann et al., 2016). BirA can specifically biotinylate the 14-mer acceptor peptide fused
on dCas9 after the binding of dCas9 at the DNA sequence of interest inside the cells. Following
cross-linking of the cells, the biotin-labeled dCas9 is fixed on the target DNA sequence,
together with the other unknown DNA binding proteins on the DNA sequence of interest.
Streptavidin is applied to capture the biotinylated protein complex for MS detection.
Besides BirA, APEX2 is another protein used for the in vivo protein labeling. APEX2
catalyzes the oxidation of biotin-phenol (BP) to the short-lived biotin-phenoxyl radical in the
presence of hydrogen peroxide (Rhee et al., 2013). The cells transiently treated with BP and
H2O2 generate short-lived biotin-phenoxyl radicals within 20 nm around APEX2. The radicals react
with electron-rich amino acid side chains which leads to the biotinylation of the nearby proteins.
In cells with APEX2 and dCas9 fusion protein expression, dCas9 guides the chimeric protein to
the target genomic locus and APEX2 allows the labeling of nearby polypeptides (Kalocsay, 2019).
After the BP and H2O2 treatment, the proteins close to the dCas9-APEX2 fusion protein are
labeled and then captured by streptavidin beads for MS analysis. C-BERST is one of the
representative methods that combine the dCas9 and APEX2 systems for proteomic study of a
defined genomic locus inside the cells (Gao et al., 2018; Myers et al., 2018).
By altering the sgRNA sequence, the proximity based labeling methods mentioned above
can be used to study the DNA binding proteins of different loci. They require fewer input cells
compared with the DNA affinity chromatography method. The biotinylation reaction by the
enzymes is very efficient and stable, permitting these methods to be widely used.
However, there are several concerns when using dCas9 to identify sequence-specific
DNA binding proteins. The most obvious drawback is the limited range of loci accessible to in vivo
study. Almost all the studies are restricted to telomeric, centromeric, or satellite DNA regions because of the
high copy numbers of those regions in the human genome. Non-repetitive regions that have only
two copies per diploid genome are likely below the detection limit of those methods. A
second problem is that dCas9 itself can cause non-specific proteins to be captured in these
assays. A recent study using the single-molecule Förster resonance energy transfer (smFRET)
method to study the interaction between Cas9 and DNA shows that Cas9 first finds the PAM
sequence by 3D collisions and then diffuses laterally on the DNA strand to finally find the
on-target sequence (Globyte et al., 2019). The PAM-finding and target-binding process gives rise to
a potential problem for the dCas9-related proximity labeling. When the dCas9 fusion proteins
(both fused with the 14-mer BirA acceptor peptide and with APEX2) are sliding along DNA and
searching for the target sequence, the surrounding proteins will be biotinylated unselectively. This
will lead to the inclusion of non-specific DNA binding proteins in the final sample. Another study
regarding the kinetics of Cas9 binding shows that Cas9 spends ~40% of the time searching for
PAM in each binding. dCas9 diffuses freely in the cytoplasm for 25 ± 8 ms on average and then
interacts with a PAM site for 17 ± 4 ms (Martens et al., 2019). The high percentage of free dCas9
diffusing in the cytoplasm reemphasizes the potential capture of non-specific proteins mentioned
above. These non-specific binding events and target scanning properties of DNA binding proteins
can lead to very high backgrounds in protein identification experiments, especially in a large
genome as in human cells, when they are utilized to detect sequence specific binding proteins.
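One way to read the kinetic numbers above (an interpretation, not a calculation given in the cited study) is as a simple time budget: the share of each search cycle that dCas9 spends on a PAM site is the PAM dwell time divided by the sum of the free-diffusion time and the PAM dwell time. The sketch below uses only the mean values quoted in the text and ignores the reported uncertainties.

```python
def pam_bound_fraction(free_ms: float, bound_ms: float) -> float:
    """Fraction of one search cycle spent interacting with a PAM site."""
    return bound_ms / (free_ms + bound_ms)

free_ms = 25.0    # mean free-diffusion time per cycle (from the text)
bound_ms = 17.0   # mean PAM interaction time per cycle (from the text)

bound = pam_bound_fraction(free_ms, bound_ms)
print(f"PAM-interacting fraction: {bound:.1%}")
print(f"freely diffusing fraction: {1 - bound:.1%}")
```

Under this reading, roughly 40% of the time is spent probing PAM sites and roughly 60% is spent diffusing freely. In either state a fused labeling enzyme is away from the intended target.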
Most of the published methods using dCas9 to pull down the target DNA associated
proteins yielded poor MS results. The proteins detected by those assays are commonly present
in both control and real samples (Liu et al., 2017).
7.1.3 Affinity capture based on oligonucleotide
Several studies have taken advantage of DNA base pairing instead of specific DNA
binding proteins (such as dCas9 and antibody) to capture the unknown binding proteins on a
target DNA. Proteomics of isolated chromatin segments (PICh) was developed to use the pairing
between a doped locked nucleic acid (LNA) probe and the target sequence inside the eukaryotic
cells (Déjardin and Kingston, 2009). The proteins cross-linked with the target DNA are recovered
using the specific interaction between the biotin-labeled LNA and magnetic beads and are analyzed by MS.
This assay is limited to telomere regions due to the low copy number of other regions in the diploid
genome as mentioned above. Murarka et al. further improved this assay by treating the cross-
linked genome with an exonuclease to make the target sequence single-stranded, such that it is
ready to anneal with the biotinylated oligo (Murarka and Srivastava, 2018). This assay was
performed using bacteria to detect genomic loci other than the repetitive regions, since the large
cell numbers can be easily achieved from bacterial cultures.
7.1.4 Oligo pull down assay
A reliable and efficient method is necessary for mapping proteomic composition at
genomic loci that are not repetitive in the human cells. In this study, we used an EBV-based
episome that closely simulates events that occur in the human genome (Han et al., 2001; Lin and
Hsieh, 2001; Okitsu and Hsieh, 2007; Okitsu et al., 2010). The advantage of having tens to
hundreds of copies of the minichromosomes per cell in this system (Hsieh, 1994) may circumvent
the difficulty faced by the methods discussed above.
In this chapter, we use the EF1A promoter as the target sequence to test if this assay can
detect promoter-related DNA binding proteins. We first optimize the conditions of the oligo pull
down assay, including the buffer compositions and HindIII and T4 DNA polymerase
concentrations, to achieve a high pull down efficiency of EF1A at DNA level. The qPCR analysis
reveals a specific and efficient pull down of the EF1A promoter compared with the control region,
when using a plasmid with multiple biotin oligo annealing sites; however, the MS results show low
reproducibility of the assay on protein level. Modifications are made to reduce the non-specific
proteins captured by this assay through the use of restriction enzymes to release the captured
DNA/proteins from the beads. Further experiments should focus on increasing the absolute copy
number of the EF1A promoter and its binding proteins pulled down from the cells, which will help
to reduce the background non-specific proteins and increase the reproducibility of the assay at
protein level.
Once the method is developed using episomes harboring the EF1A promoter, the DNA
binding proteins on any sequence of interest can be studied by replacing the EF1A promoter by
the sequence of interest on the episome without any further optimization, since the oligo annealing
sites remain the same. One potential application of this assay is to detect the DNA binding
proteins at the human fragile zones in human lymphomas, which may help to provide new insights
into the mechanisms of DNA breakage in human B cell malignancies.
7.2 Materials and Methods
7.2.1 Oligonucleotides, restriction enzymes, and plasmids
Table 7.1 Oligonucleotides used in the oligo pull down study.
The oligo name, sequence, length, modifications, and purification methods are shown in separate columns
in the table.
The oligonucleotides used in this study were synthesized by IDT and by Eurofins Scientific
(Fresno, CA). The sequence, purification method, and modifications of oligonucleotides used in
this chapter are listed in Table 7.1. All restriction enzymes are from NEB unless otherwise
specified. PvuII with no BSA was a kind gift from NEB with a concentration of 13,000 units/µl. It
was diluted with enzyme storage buffer (10 mM Tris-HCl, 300 mM NaCl, 1 mM DTT, 0.1 mM
EDTA, 500 µg/ml BSA, 50% Glycerol, pH 7.4) before usage.
pCLH22EF1A was derived from pCLH22 by replacing the RSVLTR promoter for luciferase
gene with EF1A promoter. Two HindIII sites on this plasmid are located 81 bp upstream and 56
bp downstream of EF1A promoter. The other plasmids used in this chapter were derived from
pCLH22EF1A. Various numbers of annealing sites for the biotinylated oligo were inserted
upstream of EF1A promoter on pDL5 (3 copies), pDL6 (4 copies), pDL7 (5 copies), pDL8 (6
copies), and pDL10 (9 copies). pDL30 was modified based on pDL8 with the sequence of the
biotinylated oligo annealing site changed to contain a SacI site and the luciferase gene
downstream of EF1A promoter replaced with TurboGFP.
7.2.2 Cell culture, transfection, and harvest
The 293/EBNA1 cells were cultured in the same way as described in previous chapters.
The calcium phosphate method was used to transfect the plasmids to 293/EBNA1 cells (Wigler
et al., 1979). Once the transfected cells in a 35-mm-diameter plate grew into 100% confluence,
they were re-plated to a 100-mm-diameter plate under hygromycin B selection (200 µg/ml) for
5~7 days. When the cells in the 100-mm plate reached confluency, about 10% of the cells were
re-plated in a 100-mm diameter plate with continuing hygromycin selection and the remaining 90%
of the cells were harvested. The harvested cells were cross-linked with 1% formaldehyde by
incubation on the shaker at room temperature for 10 min. Cross-linking was quenched with 250
mM of Tris-HCl buffer (pH 8.0) at room temperature, followed by washing with 1x PBS buffer twice.
The transfection efficiency was estimated with luciferase expression analysis when
luciferase gene-carrying plasmids (pCLH22EF1A, pDL5, pDL6, pDL7, pDL8, and pDL10) were
transfected into the 293/EBNA1 cells as described previously (Hsieh, 1994). For 293/EBNA1
cells transfected with pDL30 which contained TurboGFP gene instead of luciferase gene, the
percentage of GFP positive cells was monitored by fluorescence microscopy.
All plasmids were transfected into 293/EBNA1 cells for a single time except pDL30, which
was transfected to already transfected cells one more time to boost the minichromosome copy
number in each cell.
7.2.3 Oligo pull down assay
7.2.3.1 Oligo pull down of minichromosomes in transfected cells
The total genomic DNA (gDNA) in a cell pellet harvested from one 100-mm-diameter plate
is estimated as 36 µg based on the 6 pg gDNA per diploid human cell and an average of 4 x 10^6
cells per 100 mm plate (293/EBNA1 cells often harbor three or four copies of some chromosomes).
The HindIII and T4 DNA polymerase concentrations (unit/µg) mentioned in this chapter were
estimated based on the total gDNA amount in each reaction with both doped plasmids and
minichromosomes. For example, for reaction using cells from one 100-mm-diameter plate, a total
of 36 units of T4 DNA polymerase is needed to achieve a final concentration of 1 unit/µg.
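The enzyme amounts used throughout this chapter follow from this estimate. The sketch below reproduces the arithmetic under one explicit assumption: 4 x 10^6 diploid cells at 6 pg each would give 24 µg, so a ploidy factor of 1.5 is used here to reach the 36 µg quoted above. That factor is inferred from the remark about extra chromosome copies and is not a value stated in the text.

```python
PG_PER_DIPLOID_CELL = 6.0   # pg gDNA per diploid human cell (from the text)
CELLS_PER_PLATE = 4e6       # average cells per 100-mm plate (from the text)
PLOIDY_FACTOR = 1.5         # assumption: scales 24 ug up to the 36 ug quoted

def gdna_ug(plate_fraction: float = 1.0,
            ploidy_factor: float = PLOIDY_FACTOR) -> float:
    """Estimated total gDNA (in ug) for a given fraction of one 100-mm plate."""
    cells = CELLS_PER_PLATE * plate_fraction
    return cells * PG_PER_DIPLOID_CELL * ploidy_factor * 1e-6  # pg -> ug

def enzyme_units(units_per_ug: float, plate_fraction: float = 1.0) -> float:
    """Total enzyme units needed to reach units_per_ug of gDNA."""
    return units_per_ug * gdna_ug(plate_fraction)

full_plate_ug = gdna_ug()                 # about 36 ug
t4_units = enzyme_units(1.0)              # T4 DNA polymerase at 1 unit/ug
hindiii_units = enzyme_units(15.0)        # HindIII at 15 units/ug
fifth_plate_ug = gdna_ug(plate_fraction=0.2)

print(f"gDNA per plate: {full_plate_ug:.1f} ug")
print(f"T4 DNA polymerase: {t4_units:.0f} units; HindIII: {hindiii_units:.0f} units")
print(f"gDNA in 1/5 plate: {fifth_plate_ug:.1f} ug")
```

The 60 ng of doped plasmid used in some experiments is small relative to these gDNA amounts, so it is ignored in this sketch.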
Five steps were involved in the oligo pull down assay (Fig. 7.1). The cell membrane was
first permeated by resuspending the cross-linked cell pellet harvested from a 100-mm-diameter
plate in 1 ml of ice-cold RSB buffer (10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2, pH 7.5)
supplemented with 1 mM PMSF and 1% NP40 followed by incubation on ice for 5 min. The
permeated nuclei were recovered by centrifugation in Eppendorf centrifuge at 5,500 rpm for 5 min
at 4°C and washed with 1 ml of ice-cold RSB buffer and 1 ml of ice-cold digestion/0.05% SDS
buffer [50 mM Tris-HCl, 50 mM NaCl, 1%NP40, 0.5% deoxycholic acid, 0.5 mM MgCl2, 0.05%
SDS, pH 8.0, supplied with EDTA-free proteinase inhibitors (Roche, Basel, Switzerland)]. The
nuclei were centrifuged at 4°C for 5 min at 5,500 rpm in Eppendorf centrifuge after each wash
and resuspended in 1 ml of digestion/0.1% SDS buffer (50 mM Tris-HCl, 50 mM NaCl, 0.1% SDS,
1%NP40, 0.5% deoxycholic acid, 0.5 mM MgCl2, pH 8.0, supplied with EDTA-free proteinase
inhibitor) for in-nuclei digestion.
Figure 7.1 Illustration of the oligo pull down assay.
Plasmids harboring EBV latent origin, oriP, stay episomal in the nuclei and replicate once per cell cycle
after transfection into human cells expressing the EBV EBNA1 protein, such as 293/EBNA1 cells. Five
to 100 copies of these replicating minichromosomes can be stably maintained in 293/EBNA1 cells for
months. HindIII sites (marked with blue triangles) are located upstream and downstream of the EF1A
promoter. The transfected cells are cross-linked at harvest. The EF1A promoter region (shown in blue),
flanked by two HindIII restriction sites, is released from the rest of the plasmid by diffusing HindIII restriction
enzyme into permeated nuclei. The 3’ to 5’ exonuclease activity of the T4 DNA polymerase generates a
further recessed 3’ end and a long single stranded 5’ end on DNA fragments. An oligonucleotide (oligo)
probe with 3’ biotin label (shown as red line with an asterisk at 3’ end), complementary to the single-stranded
region upstream of the EF1A promoter, is annealed to this single stranded region of the EF1A promoter.
The streptavidin beads are used to enrich the EF1A DNA and the proteins cross-linked to it (shown as X in
brown oval) by binding with high specificity to the biotinylated oligo probe. The DNA pulled down is analyzed
by real-time qPCR and the proteins are analyzed by mass spectrometry (MS) after reverse cross-linking.
Second, HindIII digestion was performed at 37°C for 2 hours at a final concentration of 15
units/µg followed by T4 DNA polymerase treatment at 1 unit/µg for 40 min at 37°C. T4 DNA
polymerase was inactivated by adding EDTA (pH 8.0) to a final concentration of 10 mM and
incubating at 65°C for 5 min. At the end of this step, the EF1A promoter region is separated from
the rest of the minichromosome with its 5’ region in a single-stranded state to which the biotinylated
oligo can anneal. The sample was further fragmented by sonication on ice for 4 cycles with 10
pulses of 50% duty cycle in each cycle (Branson Sonifier 450, VWR Scientific, Radnor, PA).
The supernatant after centrifugation at 12,000 rpm at 4°C for 10 min was used for biotinylated
oligo annealing (Eppendorf centrifuge). To evaluate the pull down efficiency by qPCR, 10% of the
supernatant was reserved before pull down to quantitate the total amount of EF1A promoter in
the reaction.
Third, the biotinylated oligo was added to the supernatant at a final concentration of 240
pM followed by 5 min incubation at 55°C and then overnight annealing at 37°C. At this step, 5’
overhang region of EF1A promoter should be stably paired with the biotinylated oligo.
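As a rough check on probe excess, the sketch below converts the 240 pM oligo concentration into a number of molecules and compares it with the number of annealing sites in one plate of cells. The 1 ml reaction volume is assumed from the resuspension volume given above. The 100 copies per cell is the upper end of the range quoted for oriP minichromosomes, and 3 annealing sites per plasmid corresponds to pDL5. The result is an order-of-magnitude estimate only.

```python
AVOGADRO = 6.022e23  # molecules per mole

def oligo_molecules(conc_pM: float, volume_ml: float) -> float:
    """Number of oligo molecules at conc_pM (picomolar) in volume_ml (milliliters)."""
    moles = conc_pM * 1e-12 * volume_ml * 1e-3
    return moles * AVOGADRO

def annealing_sites(cells: float, copies_per_cell: float,
                    sites_per_plasmid: int) -> float:
    """Total oligo annealing sites carried by minichromosomes in the sample."""
    return cells * copies_per_cell * sites_per_plasmid

probes = oligo_molecules(240, 1.0)        # assumed 1 ml reaction volume
sites = annealing_sites(4e6, 100, 3)      # assumed 100 copies/cell, 3 sites (pDL5)
excess = probes / sites

print(f"oligo molecules: {probes:.2e}")
print(f"annealing sites: {sites:.2e}")
print(f"probe excess: about {excess:.0f}-fold")
```

With lower minichromosome copy numbers per cell, the excess would be correspondingly larger.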
Fourth, streptavidin magnetic Dynabeads were used to pull down the biotinylated
annealed EF1A promoter. NaCl was supplemented to a final 1 M concentration followed by
addition of 5 µl of Dynabeads™ MyOne™ Streptavidin C1 (Invitrogen, Carlsbad, CA). The
reaction with Dynabeads was incubated at room temperature for 1 hour on the rotator to ensure
a complete binding of the biotinylated oligo by the Dynabeads. The flowthrough from this binding
reaction was reserved for qPCR quantitation of EF1A promoter that remained in the supernatant
without annealing to the biotinylated oligo. The Dynabeads were washed twice with 1 ml of 1 x
binding buffer (5 mM Tris-HCl, 1 M NaCl, 0.5 mM EDTA, pH 7.5) and recovered with magnetic
stand (Thermo Fisher Scientific, Carlsbad, CA) after each wash. The 1 x binding buffer was
supplemented with 0.05% Tween20 in experiments involving pDL27 and pDL30.
Fifth, the pull down efficiency was analyzed by quantitating the EF1A and a control region
(Luc or Hyg) in each sample fraction with qPCR assays with TaqMan probes. The recovered
beads were heated at 95°C for 5 min in 100 µl of 1x binding buffer to disrupt the biotin-streptavidin
binding. The supernatant containing all the DNA captured by Dynabeads was recovered. All
fractions reserved for qPCR analysis were reverse cross-linked by incubation at 65°C overnight
in buffers containing at least 0.5 M NaCl, followed by phenol/chloroform extraction, ethanol
precipitation, and sample resuspension with 12 µl of 1 mM Tris (pH 7.5). Each qPCR assay
contains 1 µl of the resuspended samples as template, 50 nM of the TaqMan probe, and 300 nM
of each primer in 1 x qPCR SuperMaster mix (Thermo Fisher Scientific, Carlsbad, CA). The
cycling condition is 95°C for 3 min followed by 45 cycles of 95°C for 15 s and 60°C for 1 min.
In pull down reactions involving pDL30, the digestion/0.05% SDS buffer was used since
HindIII digests pDL30 more efficiently in this buffer than in the
digestion/0.1% SDS buffer. DL131/DL132 biotinylated probes were used in the oligo annealing step.
Different fractions were reserved for the qPCR analysis after Dynabeads incubation. The
recovered Dynabeads were not boiled but digested with BSA-free PvuII to release the EF1A
promoter from the Dynabeads. Specifically, the Dynabeads were washed with 1 ml of 1 x NEB
buffer 3 with 0.05% Tween20 after the second wash with 1x binding buffer. The Dynabeads were
then incubated in 20 µl reactions with 0.1~1 unit of BSA-free PvuII in 1 x NEB buffer 3 with 0.05%
Tween20 at 37°C for 30 mins. The supernatant of the digested reaction was reserved and the
Dynabeads were boiled for 5 min in 100 µl of 1 x binding buffer. The supernatant of the boiled
beads was recovered. All samples were reverse cross-linked for qPCR analysis.
The absolute quantifications of minichromosome copy number in each sample were
calculated based on the qPCR standard curve generated by a serial dilution of pCLH22EF1A in
each PCR run. The percentage of pull down was calculated by dividing the copy number of EF1A
recovered by the Dynabeads by the total copy number of plasmids prior to the pull down in each
reaction. For experiments involving pDL30 transfected cells, because the recovery of EF1A in
different fractions was always more than the original input, the denominator used to calculate the
percentage pull down was the total EF1A copy number in the flowthrough and enriched by the
Dynabeads, which is the highest quantity detected.
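The quantification described above can be sketched in two steps: fit a standard curve relating Ct to log10 copy number, then divide the bead-recovered copies by the chosen denominator. The standard-curve values and copy numbers below are hypothetical placeholders chosen for illustration; they are not data from this study.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get a copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

def percent_pull_down(bead_copies, input_copies=None, flowthrough_copies=None):
    """Percent of EF1A copies recovered on the beads.

    Default denominator: total input copies before pull down.
    Alternative (as described for pDL30): flowthrough plus bead copies.
    """
    if input_copies is not None:
        denominator = input_copies
    else:
        denominator = flowthrough_copies + bead_copies
    return 100.0 * bead_copies / denominator

# Hypothetical plasmid dilution series (copies per reaction) and Ct values.
std_copies = [1e3, 1e4, 1e5, 1e6]
std_ct = [30.0, 26.7, 23.4, 20.1]
slope, intercept = fit_standard_curve(std_copies, std_ct)

bead_copies = copies_from_ct(23.4, slope, intercept)
by_input = percent_pull_down(4e5, input_copies=1e6)
by_fractions = percent_pull_down(3e5, flowthrough_copies=7e5)
print(f"slope {slope:.2f}, intercept {intercept:.2f}, bead copies {bead_copies:.2e}")
print(f"pull down vs input: {by_input:.0f}%; vs flowthrough + beads: {by_fractions:.0f}%")
```

In practice, each measured copy number would also be scaled by the fraction of the sample that was assayed (for example, the 10% input aliquot and the 1 µl of template taken from 12 µl) before the percentages are compared.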
7.2.3.2 Oligo pull down with doped plasmid
pCLH22EF1A pretreated with HindIII and Klenow fragment (mentioned as pretreated
pCLH22EF1A hereafter) was used as positive controls in some experiments to evaluate the
performance of the assay in cellular context. Pretreated pCLH22EF1A was ready to be annealed
with the biotinylated oligo. The 15 µl digestion reaction containing 5 µg of plasmid and 20 units
of HindIII was incubated for 2 hours at 37°C. 0.8 unit of Klenow fragment was added to the
reaction supplemented with reaction buffer to make the final volume 25 µl. Klenow treatment was
done at 37°C for 10 min followed by inactivation at 65°C for 20 min. Usually, 60 ng of pretreated
plasmid was doped into 1/5 of cross-linked cells harvested from a 100-mm-diameter plate. The
pull down was performed in the same manner as described above for the transfected cells except
that HindIII and T4 DNA polymerase treatment and the reverse cross-link step were omitted.
Intact plasmid was doped into the cell mixture in some experiments. Around 60 ng of
intact plasmid was doped into 1/5 of cross-linked cells harvested from a 100-mm-diameter plate.
The pull down was performed in the same way as described above for the transfected cells except
that the reverse cross-link step was omitted.
7.2.4 Optimizations of the oligo pull down assay
7.2.4.1 Buffer optimization
The buffer for oligo pull down assay was optimized using pretreated pCLH22EF1A.
Following cell membrane permeation, 60 ng of the pretreated plasmid was mixed together with
250 µl of cross-linked cells (1/4 of cells from a 100-mm-diameter plate) with buffer added to a final
volume of 1 ml. Six different buffers tested contain varied concentrations of NaCl, SDS, and
MgCl2 as listed in Figure 7.2a. Specifically, buffer A was RIPA buffer. Buffer D was digestion
buffer plus 0.1%SDS (digestion/0.1%SDS buffer), and buffer F was digestion/0.05%SDS buffer.
EDTA-free proteinase inhibitor cocktail tablet was added to each buffer. The oligo pull down
assay was performed in the same manner as described above. The amount of EF1A promoter
from the pretreated plasmids pulled down in each buffer was assayed by qPCR. The luciferase
(Luc) sequence was used as the negative control region for pull down in qPCR analysis.
7.2.4.2 Optimization of T4 DNA polymerase and HindIII amount
To optimize the T4 DNA polymerase concentration in the assay, 60 ng of the intact
pCLH22EF1A was doped into 1/5 of cross-linked cells harvested from a 100-mm-diameter plate
in 1 ml of digestion/0.1% SDS buffer after cell membrane permeation. The pull down was
performed in the same way as described before. Each reaction was incubated with 15 units of
HindIII per µg of DNA for 2 hours at 37°C. Six T4 DNA polymerase concentrations (0.5, 0.67,
0.83, 1, 1.2, and 1.5 units/µg DNA) were tested individually. A control reaction with pretreated
pCLH22EF1A was done in parallel.
To further increase the efficiency of this assay, I tried to increase either the HindIII
concentration or the incubation time in the oligo pull down assay. Usually, 60 ng of the intact
pCLH22EF1A was doped into 1/5 of cross-linked cells from a 100-mm-diameter plate in 1 ml of
digestion/0.1% SDS buffer after cell membrane permeation. Four HindIII concentrations (15, 16.5,
19.5, 22.5 units/µg DNA) and varied incubation time from 3 hours to 14 hours were tested. All
samples were treated by T4 DNA polymerase at 1 unit/µg and processed in the same way as
described above.
7.2.5 Mass spectrometry sample preparation and data analysis
The recovered Dynabeads were treated differently for MS analysis than for qPCR analysis.
Minor changes were made for the different MS trials.
7.2.5.1 Trial 1--MS samples prepared with pDL5 transfected cells
Cells transfected with pDL5 containing 3 copies of the oligo annealing sites were used for
MS trial 1. Two replicates each containing one sample and one control were prepared. The
sample in each replicate contained proteins captured from pDL5 transfected 293/EBNA1 cells
using CLH-bio biotinylated oligo. A total of five 100 mm plates of transfected cells were processed
individually and combined at the last step for reverse cross-link. The control was prepared in the
same way using five 100 mm plates of 293/EBNA1 cells without transfected plasmids to measure the
non-specific protein background of this assay. One minor change of the oligo pull down assay for MS
sample preparation was that the HindIII concentration used in this trial was increased from 15
units/µg to 22 units/µg to ensure a complete digestion.
Proteins from the sample and the control in replicate 1 were released by on-beads trypsin
digestion following the thorough wash using detergent-free buffer. The supernatant of on-beads
digestion was collected and reverse cross-linked at 95°C for 1 hour. The reduction, alkylation,
and desalting were done followed by one dimensional (1D)-MS/MS analysis by the Graham Lab
of USC.
The second replicate (replicate 2) was analyzed by Dr. Kun-Yi Chien of Chang Gung
University in Taiwan. The proteins from the sample and the control in the second replicate were
directly eluted from the Dynabeads by acetonitrile and guanidinium chloride followed by
freeze-drying and reverse cross-linking. The samples were first reduced with 5 mM TCEP by incubation at
56°C for 20 min. Alkylation was done with 10 mM MMTS followed by incubation at room
temperature for 10 min. The samples were digested by 100 ng of trypsin at 37°C overnight
followed by desalting with C18 columns to remove any residual salt. The non-isobaric tag for
relative and absolute quantification (mTRAQ)-based quantitative proteomic analysis of the protein
complex captured was performed. Briefly, the trypsin-digested peptides were labeled with 2-plex
mTRAQ reagents (0 for control and 8 for sample) followed by polymer removal with strong cation
exchange (SCX) column. Another round of C18 column desalting was done before MS analysis.
Peptides from the sample and the control were mixed and loaded into an online two-dimensional
(2D) chromatography platform for in-depth proteome quantification. The MS raw data was
analyzed with Proteome Discoverer 1.4 (Thermo Fisher Scientific, Carlsbad, CA). Mascot and
percolator modules were used for peptide matching. Proteins with at least two unique peptides
were kept and others were filtered out. Within each group, the proteins identified in sample and
control were first compared and the sample-unique proteins were screened. The sample-unique
proteins between two replicates were compared for overlap which represented the consistency of
the assay.
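The filtering and comparison steps described above amount to a few set operations. The sketch below assumes each MS run has been reduced to a mapping from protein identifier to unique-peptide count. The identifiers and counts are hypothetical placeholders, not results from this study.

```python
def confident_proteins(peptide_counts, min_unique_peptides=2):
    """Keep proteins identified by at least min_unique_peptides unique peptides."""
    return {protein for protein, n in peptide_counts.items()
            if n >= min_unique_peptides}

def sample_unique(sample_counts, control_counts, min_unique_peptides=2):
    """Proteins confidently identified in the sample but not in the control."""
    sample = confident_proteins(sample_counts, min_unique_peptides)
    control = confident_proteins(control_counts, min_unique_peptides)
    return sample - control

def replicate_overlap(unique_rep1, unique_rep2):
    """Sample-unique proteins shared by both replicates."""
    return unique_rep1 & unique_rep2

# Hypothetical replicate data: protein identifier -> unique peptide count.
rep1_sample = {"P1": 5, "P2": 2, "P3": 1, "P4": 3}
rep1_control = {"P4": 4, "P5": 2}
rep2_sample = {"P1": 3, "P2": 1, "P6": 2}
rep2_control = {"P5": 6}

unique1 = sample_unique(rep1_sample, rep1_control)
unique2 = sample_unique(rep2_sample, rep2_control)
shared = replicate_overlap(unique1, unique2)
print(sorted(unique1), sorted(unique2), sorted(shared))
```

A small overlap relative to the size of each sample-unique set would indicate low reproducibility between replicates.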
7.2.5.2 Trial 2--MS samples prepared with pDL5 transfected cells with optimization
Two replicates were prepared with each containing a sample and a control. The sample
and control were prepared in the same manner as MS trial 1 with minor changes, including
avoiding using LoBind Eppendorf tubes and autoclaved pipette tips, changing tubes before protein
elution from the Dynabeads, increasing the biotinylated oligo concentration to 480 pM, decreasing
the Dynabeads from 5 µl to 3 µl for each pull down reaction, and standardizing the oligo incubation
time to 16 hours. The sample and control in both replicates were directly eluted from the
Dynabeads by acetonitrile and guanidinium chloride followed by freeze-drying and reverse cross-linking.
All samples were analyzed by Dr. Kun-Yi Chien of Chang Gung University.
7.2.5.3 Trial 3--MS samples prepared with pDL30 transfected cells and BSA-free PvuII on-beads
digestion
Cells transfected with pDL30 twice were used to increase the copy number of the
minichromosome in the cells and the recovery of the EF1A promoter binding proteins in these
trials. The cell harvest, pull down, and reverse cross-link were done as described above. Two
replicates each including three samples, two for the sample group and one for the control group,
were prepared. The HindIII concentration used in these trials was 15 units/µg DNA.
For the sample group, BSA-free PvuII digestion of the recovered Dynabeads was done to
release the captured EF1A promoter from the Dynabeads to the supernatant. The reaction
containing 0.5 unit of BSA-free PvuII and the Dynabeads in 1 x NEBuffer 3 with 0.05% Tween 20
was incubated at 37°C for 30 min. The supernatant containing the released EF1A fragment was
collected and reverse cross-linked. The portion of EF1A promoter remaining on the Dynabeads
due to the resistance to PvuII digestion was recovered by reverse cross-linking the remaining
Dynabeads. The proteins non-specifically pulled down by the Dynabeads were also present in
this fraction. The proteins pulled down from untransfected 293/EBNA1 cells were recovered by
reverse cross-linking of the recovered Dynabeads without PvuII digestion. All six samples were
analyzed individually by MS without isotope labeling by Dr. Kun-Yi Chien.
7.3 Results
7.3.1 Reasonable EF1A recovery can be achieved in HindIII compatible low salt buffers
To investigate the optimal buffer composition to maximize the efficiency of the oligo pull
down assay, we tested different buffers varying in NaCl, MgCl2, and SDS concentrations in this
assay with pretreated pCLH22EF1A mixed with cross-linked cells (Fig. 7.2a). The pull down
percentage of the EF1A region is above 35% in all six buffers, while pull down of the control
region (Luciferase gene, Luc) is near 0%. This assay gives the highest pull down (51%) of the
EF1A region in RIPA buffer (buffer A), 40% pull down in digestion/0.1%SDS buffer (buffer D), and
35% pull down in digestion/0.05%SDS buffer (buffer F). The results using pretreated plasmid
indicate that the buffer composition does not have a dramatic effect on EF1A pull down, and this
assay has the most robust performance and highest pull down of EF1A in RIPA buffer.
Figure 7.2 Buffers with low salt result in good enrichment and are compatible with HindIII enzyme
activity.
(a) Pull down results in different buffers using HindIII and Klenow fragment pretreated pCLH22EF1A. The
pull down reactions were performed with 60 ng of the pretreated plasmids in 200 µl of cell mixture. The
compositions of each buffer are listed at the bottom of the figure. All buffers are supplied with proteinase
inhibitor. Buffer A is RIPA buffer which results in highest enrichment of EF1A (51%). The other buffers
were modified from RIPA buffer. Buffer D is digestion buffer plus 0.1% SDS (digestion/0.1% SDS buffer).
Buffer F is digestion/0.05% SDS buffer. Luciferase gene (Luc) is used as the control region for non-specific
pull down. (b) Digestion of pCLH22EF1A in digestion/0.1% SDS buffer. The 30 µl digestion reaction
containing varied HindIII amount (12.5, 13.2, 15.2 units) and 1 µg of pCLH22EF1A was incubated at 37°C
for 2 hours. Reactions done with NEBuffer 2 serve as positive controls.
Since the components of the buffer do not affect pull down dramatically, we attempted to
identify a buffer with the highest HindIII restriction digestion activity. The HindIII activity was tested
by digestion of pCLH22EF1A in buffers with varied Tris-HCl, NaCl, and SDS concentrations (data
not shown). Reactions performed with low NaCl (50 mM NaCl) buffers show complete digestion
of pCLH22EF1A by HindIII while partial digestion is observed in reactions using buffers with 100
mM or 150 mM NaCl. Although the oligo pull down assay has the highest EF1A pull down
percentage using RIPA buffer, the 150 mM NaCl in RIPA buffer is unsuitable for the HindIII digestion
step in the oligo pull down assay. Both digestion/0.1% SDS buffer and digestion/0.05% SDS
buffer contain 50 mM NaCl with EF1A pull down above 35% (Fig. 7.2a), making these good
candidate buffers for the oligo pull down assay. A further HindIII activity test in digestion/0.1% SDS
buffer shows the same 1.3 kb band as the positive control reactions (Fig. 7.2b), confirming that
complete HindIII digestion can be achieved in this buffer.
7.3.2 Determination of the optimal T4 DNA polymerase and HindIII concentrations for the
assay
After the HindIII digestion, T4 DNA polymerase with 3’ to 5’ exonuclease activity acts at
the 3’ end of all DNA fragments and generates 5’ overhangs at DNA ends including those of the EF1A
promoter. The 5’ overhang at DNA ends upstream of the EF1A promoter allows the annealing of
the biotinylated oligo to that specific site. The T4 DNA polymerase concentration and reaction
condition would determine the length of the overhanging DNA ends and whether the biotinylated
oligo can be properly annealed to the EF1A target region. Therefore, we tested different T4 DNA
polymerase concentrations in order to achieve the highest EF1A pull down efficiency. The T4
DNA polymerase concentrations varying from 0.5 unit/µg to 1.5 units/µg were tested with doped
intact pCLH22EF1A to the cell mixture in digestion/0.1% SDS buffer (Fig. 7.3a). The reaction
with pretreated plasmid functioned as a positive control of the assay. HindIII treatment was done
at 15 units/µg for all the pull down reactions except the positive control, in which HindIII digestion
is not needed. The positive control reaction shows 41% pull down of EF1A promoter and is
consistent with the previous experimental result using the same buffer (Fig. 7.2a). The percent
pull down of EF1A with doped intact plasmid increased with increasing concentration of T4 DNA
polymerase to the highest point (20%) at 1 unit/µg DNA and then decreased with further increases
in T4 DNA polymerase concentration (Fig. 7.3a). This observation is consistent with how T4 DNA
polymerase functions. Inadequately low T4 DNA polymerase concentrations would result in
short 5’ overhangs that do not allow the biotinylated oligo to fully hybridize to the specific site,
and excessively high T4 DNA polymerase concentrations can lead to over-digestion of the EF1A
promoter and interfere with the qPCR detection of EF1A DNA. Importantly, these results establish
an optimal T4 DNA polymerase concentration at 1 unit/µg for this assay.
Figure 7.3 Optimization of T4 DNA polymerase and HindIII treatment conditions in oligo pull down
assay.
(a) Optimization of T4 DNA polymerase concentration. The pull down reaction was performed with 60 ng
of intact pCLH22EF1A in 200 µl of cell mixture in digestion/0.1% SDS buffer. HindIII was added at the
concentration of 15 units/µg DNA in all the reactions. The concentration of T4 DNA polymerase used in
each reaction varies as listed at the bottom of the figure. The reaction performed with HindIII and Klenow
fragment pretreated plasmid was used as positive control of the assay. Luciferase gene (Luc) serves as
the control region for the non-specific pull down in each reaction. (b) HindIII treatment condition
optimization. The pull down reactions were performed in the same way as described in (a). T4 DNA
polymerase was added at the concentration of 1 unit/µg DNA in all the reactions. The concentration of
HindIII and incubation time of each reaction are listed below the x axis of the figure.
At the optimal T4 DNA polymerase concentration, I tested if increasing the HindIII
concentration and incubation time can lead to increased pull down efficiency with intact doped
pCLH22EF1A. Various HindIII concentrations ranging from 15 to 22.5 units/µg and digestion time
from 2 hours to 14 hours followed by T4 DNA polymerase treatment (1 unit/µg DNA) were tested
(Fig. 7.3b). The pull down percentage of EF1A target region under all conditions tested is above
10% with the highest achieved at 15 units/µg with 2 hours incubation time. This result indicates
that increasing either HindIII concentration or incubation time does not lead to an obvious increase
in pull down efficiency. Because a short incubation at 37°C limits the reverse
cross-linking that may occur during prolonged incubation at 37°C (Kennedy-Darling and Smith,
2014), the HindIII digestion time is limited to 2 hours in all the pull down experiments.
HindIII concentration in different experiments hereafter is either 15 units/µg or 22.5 units/µg to
ensure the complete digestion of the doped or transfected plasmid.
7.3.3 Increased number of biotin oligo annealing sites increases EF1A pull down
efficiency
The successful annealing of the biotinylated oligo to the EF1A promoter region
depends on successful exposure of the oligo binding site by the HindIII digestion and T4 DNA
polymerase treatment. Minichromosomes, such as those used in this study, are chromatinized
and are known to be bound by histones and localized in the nuclei. The
HindIII site upstream of the EF1A promoter may be bound by histones or other unknown proteins
and therefore be inaccessible to the HindIII restriction enzyme. To increase the chance of HindIII digestion of the
minichromosome, we constructed plasmids (pDL5, pDL6, pDL7, pDL8, and pDL10) with several
copies of biotinylated oligo annealing sites using pCLH22EF1A as backbone. The efficiency of
the oligo pull down assay using plasmids with 1, 3, 4, 5, 6, and 9 copies of biotinylated oligo
annealing site was then evaluated (Fig. 7.4). Both doped intact plasmids and minichromosomes
from transfected cells were used. For doped plasmids, 12% pull down of EF1A promoter is
observed when using plasmid with one copy of biotinylated oligo annealing site (Fig. 7.4a). It
increases to 15%~18% when three and more annealing sites are present on doped plasmids. For
minichromosomes from transfected cells, only 3% of the EF1A is pulled down from cells
transfected with plasmid with one biotinylated oligo annealing site (Fig. 7.4b). The pull down
efficiency of EF1A from minichromosomes with three annealing sites is increased to 10% and the
highest pull down of EF1A at 15% is observed when using minichromosome with five copies of
the annealing sites. These results suggest that multiple copies of the biotinylated oligo annealing
site can increase the pull down efficiency of the EF1A promoter both from doped plasmids and
minichromosomes.
The absolute copy number of the EF1A promoter pulled down from transfected cells in a
100-mm-dish ranges from 4 million to 38 million in different reactions involving minichromosomes
with different copy number of biotinylated oligo annealing site (Fig. 7.4c). More than 1 x 10⁷
copies of EF1A promoter were pulled down when minichromosomes with multiple copies of the
biotinylated oligo annealing sites are used. The highest copy number pulled down is 3.8 x 10⁷
using minichromosome with three copies of annealing sites. This number demonstrates that
sufficient number of protein molecules can be collected for MS detection, which has the sensitivity
to detect 6 x 10⁷ protein molecules, when several dishes of transfected cells are processed in
parallel.
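As a rough check of this estimate, the number of dishes needed to reach the MS detection limit can be sketched as follows. The sketch assumes one bound protein molecule per captured EF1A copy and no loss during sample preparation; the function name dishes_needed and its parameters are illustrative only and are not part of the protocol.

```python
import math

MS_DETECTION_LIMIT = 6e7  # protein molecules; sensitivity quoted in the text


def dishes_needed(copies_per_dish, proteins_per_copy=1.0):
    """Smallest number of 100-mm dishes whose pooled yield reaches the MS limit.

    Assumption (hypothetical): a fixed number of protein molecules is bound per
    captured EF1A copy, and nothing is lost during sample preparation.
    """
    molecules_per_dish = copies_per_dish * proteins_per_copy
    return math.ceil(MS_DETECTION_LIMIT / molecules_per_dish)


# Lowest yield, the 1 x 10^7 threshold, and the highest yield reported per dish.
for copies in (4e6, 1e7, 3.8e7):
    print(f"{copies:.1e} copies/dish -> {dishes_needed(copies)} dishes")
```

Under these assumptions, the best-performing construct would need two dishes, whereas a yield at the 1 x 10⁷ threshold would need six.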
Figure 7.4 Increased copy number of the oligo annealing sites increases the pull down efficiency.
(a) Pull down efficiency of EF1A from doped intact plasmids. The pull down reactions were performed with
60 ng of intact plasmids with various copies of the biotinylated oligo annealing site in 200 µl of cell mixture.
(b) Pull down efficiency of EF1A from transfected minichromosomes. Plasmids with various copies of the
oligo annealing site were individually transfected to 293/EBNA1 cells. The pull down efficiency of the EF1A
promoter from minichromosomes with different copies of biotin oligo annealing site are shown. For both (a)
and (b): all reactions were performed with T4 DNA polymerase at 1 unit/µg DNA and HindIII at 22.5 units/µg
DNA in digestion/0.1% SDS buffer. The number of oligo annealing site on the plasmid used in different
reactions is shown under the x axis. Luciferase gene (Luc) is used as the control region for non-specific
pull down. (c) Absolute copy number of EF1A pulled down from minichromosomes in cells from a 100-mm
dish. The number of EF1A molecules pulled down from minichromosomes carrying different copy number
of the oligo annealing sites is shown as the y axis.
7.3.4 MS trial 1 and trial 2 show low reproducibility of the protein detected
The percentage and absolute number of EF1A molecule captured using minichromosome
with three and more oligo annealing sites are high enough to provide sufficient protein molecules
for MS analysis, if we assume one copy of unknown protein is bound on each EF1A molecule. A
sample using cells transfected with pDL5, which harbors three copies of annealing sites, was
prepared to identify proteins specifically bound to the EF1A promoter. A control sample using
293/EBNA1 cells without transfected plasmids was generated in parallel, to detect the non-specific
proteins pulled down in this assay. Two replicates, each with one sample and one control,
were analyzed in MS trial 1. The proteins unique to the sample group (sample-unique proteins) were
derived by filtering out proteins also detected in the control group in each replicate. The
sample-unique proteins of the two replicates were then compared to identify proteins
unique to the sample group in both replicates.
In the first replicate, 409 and 311 proteins with ≥ 2 unique peptides are identified in the
sample and control groups, respectively, among which 267 proteins are present in both groups
(Fig. 7.5a). A total of 142 proteins are unique to the sample group in replicate 1. In replicate 2,
1176 and 741 proteins are identified in the sample and control groups, respectively, among which
567 proteins are detected in both groups (Fig. 7.5b). There are 609 proteins unique to the sample
group. Comparison of the sample-unique proteins from replicate 1 and replicate 2 reveals
only 16 common proteins among the 142 and 609 sample-unique proteins, respectively (Fig. 7.5c).
These 16 common proteins account for about 10% of the sample-unique proteins of replicate 1
and only 3% of the sample-unique proteins of replicate 2. In addition, the majority of these 16
common proteins are located in the cytoplasm with no known relation to transcription.
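The filtering arithmetic for MS trial 1 can be retraced with a short sketch using only the counts quoted above. The helper name sample_unique is illustrative, and the percentages are computed against the sample-unique proteins of each replicate, which is the denominator used in the text.

```python
def sample_unique(n_sample, n_shared):
    """Proteins found in the sample group but not in the control group."""
    return n_sample - n_shared


# Counts quoted in the text for MS trial 1 (proteins with >= 2 unique peptides).
rep1_unique = sample_unique(409, 267)
rep2_unique = sample_unique(1176, 567)
common = 16  # sample-unique proteins detected in both replicates

print(rep1_unique, rep2_unique)
print(f"{common / rep1_unique:.1%} of replicate 1 sample-unique proteins")
print(f"{common / rep2_unique:.1%} of replicate 2 sample-unique proteins")
```

The computed fractions agree with the rounded values of about 10% and 3% given in the text.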
Figure 7.5 Mass spectrometry trial 1 results indicate low reproducibility of proteins identified in the
assay.
(a) Venn diagram of proteins identified in the sample and control groups in replicate 1. Between the 409
proteins detected in sample group and 311 proteins identified in control group, 267 proteins are overlapped.
The sample group contains 142 unique proteins compared with the control group in replicate 1. (b) Venn
diagram of proteins identified in sample group and control group in replicate 2. Between the 1176 proteins
detected in sample and 741 proteins identified in control, 567 proteins are overlapped. The sample group
contains 609 unique proteins compared with the control group in replicate 2. (c) Comparison of proteins
unique in the sample groups of replicate 1 and replicate 2. Between the 142 sample-unique proteins of
replicate 1 (shown in blue circle) and the 609 sample-unique proteins of replicate 2 (shown in red
circle), 16 proteins overlap and are listed in the table. For both (a) and (b): the proteins in the sample
group which were pulled down from pDL5 transfected 293/EBNA1 cells are shown in red circle. The proteins
in the control group which were pulled down from 293/EBNA1 cells without transfected plasmids are shown
in blue circle. The overlapped portion between the sample group and control group is shown in green.
Only proteins with ≥ 2 unique peptides are shown in the diagrams.
Some modifications, as mentioned in detail in the methods section, on the oligo pull down
experimental procedure were made to reduce the pull down of non-specific proteins. EF1A
promoter was pulled down from the same types of cells as above for MS trial 2. In replicate 1, 946
proteins and 443 proteins are identified in the sample group and control group, respectively (Fig.
7.6a). A total of 389 proteins are present in both groups and 557 proteins are unique to the
sample group including EF1A2A, which is a known EF1A binding protein (Vera et al., 2014). In
replicate 2, 1816 proteins and 1566 proteins are identified in the sample group and the control
group, respectively, among which 413 proteins are unique to the sample group (Fig. 7.6b),
including a known transcription factor Sp1 (Talukder et al., 2001; Wakabayashi-Ito and Nagata,
1994). Further comparison of the sample-unique proteins of replicate 1 and replicate 2 showed
22 proteins are common between the two replicates (Fig. 7.6c). These common proteins only
account for less than 0.5% of the total number of sample-unique proteins of both replicates. Three
of the 22 proteins are related to transcription including general transcription factor TBP (TATA
box binding protein), HDAC1 (Histone deacetylase 1), and transcription factor ARID3A (AT-rich
interactive domain-containing protein 3A).
Figure 7.6 Mass spectrometry trial 2 results indicate low reproducibility of proteins identified in the
assay.
(a) Venn diagram of proteins identified in sample and control groups in replicate 1. Between the 946
proteins detected in sample and 443 proteins identified in control, 389 proteins are common. The sample
group contains 557 unique proteins, which are absent in the control group, in replicate 1. (b) Venn diagram
of proteins identified in sample and control groups in replicate 2. Between the 1816 proteins detected in
sample group and 1566 proteins identified in control group, 1403 proteins are found in both groups. The
sample group contains 413 unique proteins that were not detected in the control group in replicate 2. (c)
Venn diagram of proteins unique to sample groups in replicate 1 and replicate 2. A total of 22 proteins are
common between the 557 sample-unique proteins in replicate 1 and the 413 sample-unique proteins in
replicate 2. These proteins found in both replicates are listed in the table with the ones related to
transcription highlighted in yellow. The figures are shown in the same manner as described in Figure 7.5.
The low percentage of proteins consistently pulled down by this assay in different
replicates of MS trial 1 and trial 2 suggests that the reproducibility of this assay needs to be
improved.
7.3.5 Non-specific protein pull down may be reduced by use of PvuII restriction digestion
to release the EF1A captured by Dynabeads
The low reproducibility of the proteins identified by the assay may be caused by the non-specific
proteins captured by the Dynabeads that limit the detection of the real signals of the EF1A binding
proteins in MS analysis. One way to reduce the non-specific protein background in the assay is
to decrease the proteins adsorbed on the surface of Dynabeads that can be eluted together with
the EF1A promoter binding proteins. Considering that the biotin-streptavidin binding is very stable
and can only be destroyed at extreme conditions like high temperature or extreme pH (Cantor
and Fidelio, 1997), several strategies using different restriction enzymes and plasmids were
tested to release the captured EF1A molecules from the Dynabeads as described below.
In the first attempt, a new plasmid (pDL27) with SphI and NheI sites inserted between the
biotinylated oligo annealing site and the EF1A promoter was constructed. The oligo pull down
results with pDL27 transfected cells indicate that neither SphI nor NheI digestion is able to release
the EF1A molecule captured on the Dynabeads (data not shown). In the second attempt, the
sequence of oligo annealing site was modified to include a SacI site in the middle on pDL30. After
annealing the biotinylated oligo to the annealing site of pDL30 and binding to Dynabeads, this
SacI site can be cut by SacI restriction enzyme to release the EF1A promoter captured by the
Dynabeads. Although SacI digestion can release EF1A from doped pDL30 captured by the
Dynabeads, the EF1A pulled down from minichromosome in the transfected cells is resistant to
SacI digestion (data not shown). The failure to release the EF1A captured in the oligo pull down
experiment by SphI, NheI, and SacI digestions is likely caused by the occupation of the restriction
enzyme binding sites by unknown proteins inside the cells.
To overcome the difficulty in digesting DNA that may be bound by proteins and provide a
readily accessible unique restriction enzyme digestion site to release the captured EF1A promoter
from Dynabeads, a pair of biotinylated oligos were designed. The 41 nt oligo, DL131, carries a
triethyleneglycol (TEG) linked biotin label and has an extra 15 nt at the 3’ end that is left
single-stranded after DL131 anneals to the 26-base annealing site on pDL30. The 15 nt oligo DL132,
complementary to these extra 15 nt on DL131, creates a PvuII site in the middle of this 15
bp region when it anneals to DL131. This newly created PvuII site would provide a unique
restriction digestion site that is not affected by protein binding and can be readily digested (Fig.
7.7). In addition, the 3’ TEG biotin on DL131 has a longer linker which can reduce any steric
hindrance of the Dynabeads on PvuII activity.
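The layout of this oligo pair can be illustrated with a minimal sketch. The sequences below are hypothetical placeholders chosen only to match the lengths given in the text (26 nt annealing region, 15 nt tail); they are not the actual DL131 and DL132 sequences. The only sequence fact used is the PvuII recognition site, CAGCTG.

```python
PVUII_SITE = "CAGCTG"  # PvuII recognition sequence (blunt cut, CAG^CTG)


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Hypothetical placeholder sequences, NOT the real oligos from this study.
annealing_region = "ATGCGTACCTGAAGTCCATGGACTTA"  # 26 nt, pairs with the plasmid site
tail = "TGACT" + PVUII_SITE + "ATCG"             # 15 nt, site near the middle

dl131_like = annealing_region + tail      # 41 nt; the 3' tail stays single-stranded
dl132_like = reverse_complement(tail)     # 15 nt; anneals to the tail

print(len(dl131_like), len(dl132_like))
print("PvuII site in tail duplex:", PVUII_SITE in tail and PVUII_SITE in dl132_like)
print("PvuII site in annealing region:", PVUII_SITE in annealing_region)
```

Because CAGCTG is palindromic, the site appears on both strands of the 15 bp duplex, while the placeholder annealing region is kept free of it so that the cut occurs only in the protein-free duplex.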
Figure 7.7 BSA-free PvuII can release EF1A enriched from both doped plasmid and
minichromosome from the Dynabeads.
The pull down experiment was performed with T4 DNA polymerase at 1 unit/µg and HindIII at 15 units/µg
in digestion/0.05% SDS buffer. EF1A pulled down from doped pDL30 and minichromosome in pDL30
transfected 293/EBNA1 cells using a biotinylated oligo DL131 and DL132 is digested by 0.1, 0.5, or 1 unit
of BSA-free PvuII to release the EF1A from the Dynabeads. The portion of EF1A released by PvuII into
the supernatant is labeled as “sup” below the x axis. The portion of EF1A that remains on the Dynabeads
after PvuII digestion is shown as “boiled beads” below the x axis. Hygromycin gene (Hyg5) is used as the control
region for non-specific pull down. The configuration of the EF1A captured on the Dynabeads is shown at
the upper right corner of the figure.
Considering that the BSA present in the storage buffer of the PvuII enzyme (for providing
stability for storage) can potentially interfere with MS analysis, the activity of BSA-free PvuII was
evaluated and confirmed in digestions of purified plasmid DNA prior to use in the oligo pull
down assay. EF1A promoter from both doped pDL30 and pDL30 transfected into cells can be
released by as little as 0.1 unit of BSA-free PvuII after capture on the Dynabeads in the
pull down assay (Fig. 7.7). Approximately 60%-70% of EF1A from the doped pDL30 and
35%-40% of EF1A from minichromosome are pulled down by the Dynabeads in the assay. BSA-free
PvuII releases up to 90% (60% out of 65%) of the captured EF1A from doped plasmid.
Approximately 61% (23% released from 38% captured) of the EF1A from minichromosome is
released in reactions with 0.1 unit of BSA-free PvuII.
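The release percentages quoted above are ratios of two percent-of-input values, which can be made explicit with a small sketch. The function name fraction_released is illustrative only; the inputs are the values stated in the text.

```python
def fraction_released(released_pct, captured_pct):
    """Share of bead-captured EF1A that PvuII releases into the supernatant.

    Both arguments are percentages of the input EF1A, as reported by qPCR.
    """
    if captured_pct <= 0:
        raise ValueError("captured_pct must be positive")
    return released_pct / captured_pct


# Percent-of-input values quoted in the text.
print(f"doped pDL30:    {fraction_released(60, 65):.0%}")
print(f"minichromosome: {fraction_released(23, 38):.0%}")
```

Expressing release relative to the captured amount, rather than relative to input, separates the efficiency of the PvuII step from the efficiency of the capture step.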
These experiments clearly showed that using the DL131 and DL132 oligos to create a BSA-free
PvuII digestible restriction site to release the captured EF1A promoter from Dynabeads can
be used to reduce the presence of non-specific proteins in the assay for MS analysis.
7.3.6 Low number of proteins identified when proteins are recovered by PvuII digestion
The samples recovered from PvuII digestion were tested in MS trial 3 to investigate if PvuII
can help to reduce the non-specific binding proteins and increase the reproducibility of protein
identification in this assay. Two replicates each with three samples were prepared. Two of the
three samples were derived from proteins captured from pDL30 double-transfected cells. The
first sample was the portion of proteins released by PvuII digestion of the Dynabeads (mentioned
as “sup sample” hereafter). The second sample was the portion of proteins remaining on the
Dynabeads after PvuII digestion (mentioned as “beads sample” hereafter). The last sample
contained proteins captured by the Dynabeads from 293/EBNA1 cells without transfected plasmid
(mentioned as “control sample” hereafter).
Table 7.2 Mass spectrometry trial 3 identifies a limited number of proteins.
Two replicates (rep 1 and rep 2) were analyzed by MS in this trial. The peak area of the identified proteins
in different samples is shown in the table. Column named “sup sample” is the group of proteins released
by PvuII from the Dynabeads. Column named “beads sample” is the group of proteins recovered from boiled
beads after PvuII digestion. Column named “control sample” is the group of proteins enriched from
untransfected 293/EBNA1 cells. The Uniprot accession for each identified protein and the abbreviation of
the proteins are listed in the first and third columns. Only proteins identified in the sup of both replicates
are shown in the table in red. The ones specific to “sup sample” but not in “beads sample” and “control
sample” are filled in yellow.
In replicate 1, a total of 46, 72, and 120 proteins are identified in the sup sample, beads
sample, and control sample, respectively (Table 7.2). For replicate 2, the protein number
identified in the sup sample, beads sample, and control sample is 86, 72, and 80, respectively. A
total of 37 proteins are present in the sup sample of both replicate 1 and replicate 2. Label-free
MS was performed in this trial and therefore, proteins are identified without quantitation. Only
4 proteins are present in the sup sample of both replicate 1 and replicate 2 but not in the
beads sample or control sample, and all four of them are irrelevant to the transcriptional activity of
the EF1A promoter. The limited number of proteins identified does not allow a comprehensive
evaluation of the assay. The low number of proteins identified may be due to poor transfection
efficiency and the low copy number of the minichromosome maintained in the 293/EBNA1 cells after
transfection and hygromycin selection, which resulted in a very low amount of EF1A promoter
pulled down in the assay. One potential cause of the low copy number of minichromosome in the
cells is the extremely robust production of GFP from the TurboGFP on pDL30, which may affect
cell survival and interfere with episome maintenance.
7.4 Discussion
7.4.1 Advantages of the oligo pull down assay
The de novo oligo pull down assay aims to detect the unknown DNA binding proteins on
any DNA sequence of interest in live cells. It has several advantages compared with the
previously published assays:
i.) The EBV based episome closely simulates events that occur in the human genome (Han
et al., 2001; Lin and Hsieh, 2001; Okitsu and Hsieh, 2007; Okitsu et al., 2010). Each cell
contains tens to hundreds of copies of the minichromosomes (Hsieh, 1994), which can
increase the DNA target number per cell 10- to 100-fold relative to genomic targets, which
are two copies per cell for non-repetitive sequences. The high copy number of episome
per cell would overcome the limitation of applying the method only to repetitive regions of
the genome such as telomeres and centromeres.
ii.) Once well established, our assay can be used directly for profiling of DNA binding proteins
of different loci by alteration of the sequence between the HindIII sites on the plasmid.
There is no need to re-optimize the specificity and efficiency of this assay for different
target sequences as required by the dCas9 based methods.
iii.) The oligo pull down assay detects the binding proteins of the target DNA in live cells, which
provides more biologically relevant results than the in vitro results
obtained by affinity chromatography.
iv.) Our method takes advantage of the high specificity between a synthesized biotin labeled
DNA oligonucleotide and the target DNA region on the minichromosome. The biotin
labeled oligo can specifically anneal to the target region and has minimal pairing chance
with other regions, which leads to lower level of non-specific proteins compared with the
dCas9 related assays.
v.) Our method uses biotin-streptavidin interaction (Kd = 10⁻¹⁴ mol/L) to capture the target
DNA and protein complexes. This ensures the stable and efficient capture step.
vi.) Our method does not require large scale cell cultures. Usually, 3 x 10⁷ cells are enough
for each MS analysis.
7.4.2 The specificity and reproducibility of the assay
Specificity and reproducibility are the two most basic and important standards for evaluating a
de novo assay. Two main questions for the oligo pull down assay are: i) can the assay specifically
capture the EF1A DNA and the proteins bound to it? ii) are the DNA and proteins pulled down by this
assay reproducible?
The oligo pull down assay shows high specificity and reproducibility at the DNA level, as
analyzed by qPCR. The specificity of this assay at the DNA level is evaluated by
comparing the pull down efficiencies of the target EF1A region and a control region located
on the same minichromosome. The use of pDL30 greatly increases the pull down efficiency of
the EF1A DNA to above 61%, and the control region captured is not detectable by qPCR when
plasmid DNA is doped into 293/EBNA1 cell harvest (Fig. 7.7). The pull down efficiency of EF1A
from pDL30 transfected cells is up to 38%, and that of the control hygromycin gene is less than
1% (Fig. 7.7). The high pull down efficiency of the promoter region increases the absolute copy
number of unknown binding protein on EF1A when same number of input cells are used. The
low pull down efficiency of the control region will lead to low non-specific proteins detected by the
assay. The specificity of our assay at DNA level is higher than the methods using dCas9 (Liu et
al., 2017; Schmidtmann et al., 2016). The specific high pull down efficiency of the EF1A region
rather than the control region is reproducible in many different trials at DNA level (Fig. 7.7 and
data not shown). The results on DNA level indicate the application of the oligo pull down assay
in profiling of unknown DNA binding proteins is achievable and practical.
MS is used to assess the performance of this assay at protein level. It is hard to assess
the specificity of the oligo pull down assay at protein level since only limited EF1A binding proteins
have been reported (Talukder et al., 2001; Vera et al., 2014; Wakabayashi-Ito and Nagata, 1994).
Since EF1A is a promoter, proteins related to transcription and DNA binding are expected in the
MS results. Some potential interesting proteins are identified in MS trial 2, such as Sp1, EF1A2A,
TBP, HDAC1, and ARID3A. The reproducibility of the assay at protein level is evaluated by
comparison of the results from two replicated samples. Ideally, the sample-unique proteins from
two MS replicates would be identical. However, because minor changes in the procedures (cell
culture, transfection efficiency, incubation time, digestion conditions, or even the pipette tips used)
can result in differences in the proteins identified, it is impossible to get the ideal results. Nevertheless,
we do expect a reasonably high level of overlap between replicates. The percentages of
overlapping sample-unique proteins between two replicates in the MS trials 1 and 2 are only
around 0.5%~1% of the total proteins identified in the sample group, with the majority of them
cytoplasmic proteins. In the following MS trial 3, PvuII is used to reduce the non-specific proteins
binding to the Dynabeads. Only limited number of proteins were identified in MS trial 3 (Table
7.2) likely due to the low total number of minichromosomes present in the transfected cells
possibly caused by the toxicity of TurboGFP protein to cell survival and minichromosome
maintenance. The total number of minichromosomes harvested from five 100-mm plates of double
transfected cells for MS trial 3 is around 10⁸, which is much lower than expected considering each
cell should have 10-100 copies of minichromosome. Future efforts should be made to improve
the reproducibility of this assay at protein level by increasing the copy number of episomes
maintained in the cells and the total minichromosome harvested as a first step.
7.4.3 Future directions
To increase the reproducibility of the oligo pull down assay at the protein level, several
possible aspects should be considered. First, further modification of the current pDL30 plasmid
is needed. The replacement of TurboGFP with luciferase may increase the survival rate of the
transfected cells and increase the copy number of the minichromosome in the cells after
transfection. Second, the copy number of minichromosomes in the cells can be followed by
luciferase assay over time and before cell harvest for the pull down assay. Cells maintaining high
copy number of the minichromosome can then be used in the pull down assay. The high copy
number of minichromosome allows the pull down of more EF1A binding proteins and increases
the signal-to-noise ratio for the MS analysis. Third, the incubation time of different steps should be
kept to a minimum to reduce the reversal of protein-DNA cross-links. Fourth, the final MS samples
should be in Tris-free buffers in order to increase the isotope labeling efficiency for MS analysis.
Once the assay is established to reproducibly capture similar proteins with the
EF1A promoter on the episome, it can be used to study the binding proteins on other DNA
sequences of interest. For example, the EF1A on the minichromosome can be replaced by the 23
bp sequence (or concatemer of the 23 bp sequence) to identify proteins specifically binding to the
E2A fragile region in live cells.
Chapter 8 Concluding Remarks
8.1 Models for the clustered E2A breakage
The E2A breakage in E2A-PBX1 and E2A-HLF translocations is highly focused within a
23 bp fragile zone in E2A intron 16 (Fig. 1.6). The fusion proteins derived from breakage within
intron 16 in the E2A-PBX1 and E2A-HLF translocations cause human B-cell malignancy
(specifically acute lymphoblastic leukemia). It is understandable that a specific intron is required
within the E2A gene to produce an oncogenic fusion protein. But within the 3.3 kb intron 16, what
accounts for the over 400-fold preference for the 23 bp fragile zone? The statistical analyses
indicate that the E2A breakpoints are in significant proximity to AID hotspot motifs (Table 1.2).
Since AID requires single-stranded DNA as substrate (Bransteitter et al., 2003; Pham et al., 2003),
the main remaining question is what causes the duplex DNA in the E2A fragile zone to be
single-stranded, thereby restricting the AID activity to the focal 23 bp zone?
There are several features unique to the 23 bp E2A fragile zone and its surrounding region.
First, the only two CpG sites within a substantial length (250 bp) are in the 23 bp fragile zone,
which is centrally located in this 250 bp region and has a 600 bp C-string rich zone immediately
downstream (Fig. 8.1a). These two CpG sites are methylated with frequencies ranging from 4%
to 90% in human pre-B cells examined in my study and up to 100% in other hematopoietic cells
(Table 3.1). Analysis of all the available translocation breakpoint sequences involving E2A clearly
shows that the DNA breaks occur predominantly at CpG sites in patients. While other Cs can be
deaminated by AID with high frequency (Fig. 5.5), only methylated CpG sites can give rise to the
long-lived T:G mismatches (rather than U:G mismatches for unmethylated Cs) that are important
for the translocation process (Pannunzio and Lieber, 2017; Pfeifer, 2006; Schmutte et al., 1995;
Tsai et al., 2008; Walsh and Xu, 2006).
Second, as mentioned above, the two CpG sites in the E2A fragile zone are located at the
beginning of a region with the highest density of C-strings on both strands in the entire 3.3 kb
intron 16 (Fig. 8.1a). It has been shown previously that C-strings cause duplex DNA to favor a
B/A-intermediate DNA structure in vitro and in vivo in which cytosines breathe (lose base
pairing) nearly 6-fold more frequently (Dornberger et al., 1999; Ng et al., 2000;
Trantírek et al., 2000; Tsai et al., 2009). DNA with three consecutive Cs has an average base
pair half time of ~5 ms and a Kd of 1.2 x 10⁻⁶, which is higher than that with alternating Cs and Gs
(Kd = 4.1 x 10⁻⁷) (Dornberger et al., 1999). This indicates an increased frequency of DNA
breathing (opening and closing) in regions with C-strings compared with random sequences.
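The size of this difference can be gauged by taking the ratio of the two quoted constants. This is a minimal sketch that assumes Kd here denotes the base pair opening (dissociation) equilibrium constant, so that the ratio approximates the relative occupancy of the open state; the variable names are illustrative.

```python
KD_C_STRING = 1.2e-6     # three consecutive Cs (value quoted in the text)
KD_ALTERNATING = 4.1e-7  # alternating Cs and Gs (value quoted in the text)

# Assumption: for Kd << 1 the open-state fraction is approximately Kd itself,
# so the ratio of the constants approximates the relative open-state occupancy.
fold = KD_C_STRING / KD_ALTERNATING
print(f"C-string base pairs are open about {fold:.1f}-fold more often")
```

On this reading, base pairs within a C-string are open roughly three times as often as those in an alternating C/G context.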
In the current study, I specifically show that the 23 bp E2A fragile zone and the immediate
downstream region have a high propensity to be single-stranded (Fig. 4.5; Fig. 4.8; Fig. 4.12),
and AID can act on this region of DNA even without transcription (Fig. 5.5b). With mutant
sequences altered at one or more of the C-strings in the 23 bp fragile zone and its downstream
region, I show that disruption of multiple C-strings and direct repeats flanking the fragile zone
reduces AID targeting to these regions (Fig. 5.8). These results strongly indicate that multiple C-
strings in the 23 bp fragile zone and its downstream region are critical and have a direct effect on
AID deamination activity in this region.
Third, I show that transcription increases the AID targeting to this region beyond the direct
effect of C-strings in the DNA. Transcription separates the strands, thereby providing additional
ssDNA exposure in the transcription bubble. Transcription is known to be slower through regions
with C-strings (Gressel et al., 2017; Pham et al., 2019; Watts et al., 2019). A decreased
transcription rate would increase the number of RNA polymerases in the region and increase the
time and length of ssDNA exposed due to the multiple transcription bubbles. Given the high
density of C-strings in the 666 bp region containing the 23 bp fragile zone and its downstream
sequence, several RNA polymerases can potentially accumulate as they progress more slowly
through this region. Each RNA polymerase leaves a region of transient negative superhelical
tension upstream (Liu and Wang, 1987; Sinden, 1994). With several RNA polymerases, the
cumulative negative superhelical tension in the 23 bp fragile zone at the beginning of this C-string
rich region would favor unwinding of the two strands of the duplex, increasing the single-
strandedness of the 23 bp fragile zone and its downstream region for the AID deamination activity,
as we observe.
I show in this study that removal of the RNA tail by RNase A results in a marked reduction
in AID deamination activity in the 23 bp E2A fragile zone and its downstream region (Fig. 5.6).
Removal of the RNA tail is known to reduce transcription stalling (Bentin et al., 2005), thus
reducing the exposure of ssDNA on the NTS. Also, transcription through regions with direct
repeats can increase the chance of misalignment (Fig. 5.9), increasing the presence of ssDNA
and the chance for AID action. The slippage between the direct repeats flanking the 23 bp E2A
fragile zone may also contribute to the focused E2A breakage within a narrow fragile zone, since
mut2, which lacks the direct repeat, shows decreased AID deamination activity (Fig. 5.8).
Findings in this study provide evidence for a model proposed below for the features
defining the 23 bp fragile zone within intron 16 of the E2A gene (Fig. 8.1b). The high density of
C-strings increases thermal fluctuation, and this results in transient ssDNA in the region, making
it accessible to AID even without transcription. Transcription exposes the single-stranded NTS in
the transcription bubble and further increases the opportunity for AID deamination of the cytosines
in the bubble. In addition, the B/A-intermediate DNA conformation, the C-string density, and
slippage of direct repeats flanking the fragile zone combine to slow the movement of RNA
polymerases, leading to the accumulation of multiple polymerases. The accumulation of
polymerases favors unwinding of the duplex DNA in the 23 bp fragile zone, in addition to its
intrinsically lower melting temperature. The retardation of RNA polymerase movement and the
unwinding of the duplex DNA in the 23 bp fragile zone increase opportunities for AID to deaminate
its preferred targets in this region. The two AID hotspots in the 23 bp fragile zone are the only
ones that overlap within the 666 bp region of high C-string density, and this is an additional factor
favoring their targeting by AID (Han et al., 2011; Wei et al., 2015). The deamination of cytosines
in the AID hotspot motif (WRC) results in U:G mismatches which can be efficiently repaired by
UDG. However, the T:G mismatch resulting from AID deamination of the 5-methylcytosine within
the WRCG motif is long-lived because thymine DNA glycosylase is in low abundance in
mammalian cells and because of the inefficiency of T:G mismatch processing enzymes (Schmutte
et al., 1995; Walsh and Xu, 2006). The persistent T:G mismatch can be cut by activated Artemis
or the RAG complex and become a DSB (Cui et al., 2013; Pannunzio and Lieber, 2019; Tsai et
al., 2008). These steps create the opportunity to form oncogenic fusion proteins with different
partners, which can lead to B-cell malignancy.
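The WRC and WRCG motifs discussed above can be located computationally. The following Python sketch is a minimal illustration, assuming the IUPAC codes W = A/T and R = A/G. The function name `find_wrc_sites` and the toy sequence are hypothetical; they are not the E2A sequence or the analysis used in this study.

```python
import re

# IUPAC codes assumed here: W = A or T, R = A or G, Y = C or T.
# Lookaheads let overlapping motifs be reported separately.
WRC_GIVEN_STRAND = re.compile(r"(?=[AT][AG]C)")
# A WRC on the opposite strand reads as G-Y-W on the given strand.
WRC_OPPOSITE_STRAND = re.compile(r"(?=G[CT][AT])")


def find_wrc_sites(seq):
    """Return sorted (position, strand, in_cpg) tuples for WRC motifs.

    position is the 0-based index, on the given strand, of the base pair
    that carries the target cytosine. strand is "+" for the given strand
    and "-" for the opposite strand. in_cpg is True when the target
    cytosine sits in a CpG, i.e. the motif is WRCG.
    """
    seq = seq.upper()
    sites = []
    for m in WRC_GIVEN_STRAND.finditer(seq):
        c_pos = m.start() + 2
        in_cpg = seq[c_pos + 1:c_pos + 2] == "G"
        sites.append((c_pos, "+", in_cpg))
    for m in WRC_OPPOSITE_STRAND.finditer(seq):
        c_pos = m.start()  # this G pairs with the target C on the other strand
        in_cpg = c_pos > 0 and seq[c_pos - 1] == "C"
        sites.append((c_pos, "-", in_cpg))
    return sorted(sites)


# Toy palindromic sequence (hypothetical) with one WRCG on each strand
# sharing the same CpG.
toy_sites = find_wrc_sites("AGCGCT")
print(toy_sites)
```

In the toy sequence the two WRCG motifs, one per strand, share a single CpG, which is one simple way that two hotspot motifs can overlap. Whether this matches the overlap definition used for the E2A fragile zone is an assumption.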
8.2 Relevance to other major fragile regions in human lymphoid translocations
The E2A, Bcl1, and Bcl2 fragile regions are all localized to very small zones of less than
600 bp, which is very different from the common human fragile sites employed in cytogenetics
upon replication poison-challenge testing. The latter are millions of bp in size and do not
contribute to somatic cell translocations because their fragility is primarily only a cell culture
phenomenon (Durkin and Glover, 2007). In contrast, natural DNA breakage in the B lymphoid
malignancies occurs at the pro-B and pre-B cell stages, and these break sites show clear
evidence of TdT activity that is characteristic of these stages of lymphoid development (Jager et
al., 2000; Tsai et al., 2008; Welzel et al., 2001). Because CpG is the most significant dinucleotide
motif in all of these fragile regions with AID-initiated lesions, a new type of breakage mechanism,
the AID-type break, was proposed (Cui et al., 2013; Greisman et al., 2012; Lu et al., 2015a; Tsai
et al., 2008; Tsai et al., 2010a).
The breakpoints in these 20-600 bp fragile regions have highly significant proximity to AID
hotspot motifs. The expression of AID in B cells but not T cells explains the lineage specificity of
the DNA breakage (B lineage rather than T lineage). The expression level of AID at the pro-B and
pre-B cell stages is low but sufficient to cause the rare translocation events (Cantaert et al., 2015;
Han et al., 2007; Kelsoe, 2014; Kumar et al., 2013; Kuraoka et al., 2011; Kuraoka et al., 2009;
Mao et al., 2004; Ueda et al., 2007; Umiker et al., 2014). The CG motif and its surrounding
sequences determine the sequence specificity of the DNA breakage. Bcl1 and Bcl2 fragile zones
adopt a B/A-intermediate DNA structure, characterized by many different assays including native
bisulfite probing (Javadekar et al., 2018; Raghavan et al., 2004a; Raghavan et al., 2004b; Tsai et
al., 2009). As the only DNA motif that can be methylated, CpG, especially when it lies within the
WRCG motifs preferred by AID in the B/A-intermediate DNA zones, further delineates the fragile region. The
CpG sites at Bcl1 and Bcl2 breakpoints are typically within WRCG motifs and regions rich in C-
strings, as I have described here for the E2A fragile zone. Deamination of the methylated CpG
sites at these fragile zones would lead to long-lived DNA lesions (T:G mismatches) in the same
manner.
A key question addressed in this thesis concerns how the DNA in lymphoid fragile zones
becomes a substrate for AID, since only the CpG sites in these zones are vulnerable to
AID. I have shown that the regions near all of these CpG sites have an elevated density of C-strings
on the NTS and TS. This favors a B/A-intermediate structure that is prone to rapid thermal
fluctuation (breathing apart of the NTS and TS). I have demonstrated increased bisulfite
sensitivity in the E2A fragile zone, which reflects the increased vulnerability of ssDNA to
nucleophilic attack. This corresponds with the Bcl2 and Bcl1 fragile zones, which also have
increased bisulfite sensitivity (Tsai et al., 2009). I have shown that transcription through the E2A
fragile zone increases AID attack, and the E2A fragile zone is unique in having two methylated
cytosines that are also within AID hotspot motifs. Moreover, the two CpG sites overlap
with one another, which further increases the likelihood of DSB formation (Han et al., 2011;
Wei et al., 2015).
Our model (RAG or activated Artemis cutting at AID-deaminated methylCpG sites)
explains the developmental stage, lymphoid lineage, and sequence specificity of the major
lymphoid translocation breakpoints (Fig. 8.1). My work explains how the essential ssDNA
required for AID action arises, a critical first step for which an explanation has been lacking
until now.
Figure 8.1 Models of E2A breakage.
(a) Factors contributing to the clustered E2A breakage. The E2A intron 16 is shown as a black line between
exon 16 and exon 17. The E2A fragile zone is marked by the red asterisks above the black line. C-strings
with a length of 4 or longer are shown as vertical blue lines, annotated above the black line for the
NTS and below the black line for the TS. The density of C-strings in each of the three regions in E2A intron
16 is shown above the bracket. The enlarged view of the 666 bp region with high C-string density on both
strands, and particularly on the NTS, is illustrated below the intron 16 by an orange horizontal line.
Cytosines in AID hotspot motifs in this region are shown as thin vertical black lines. Cytosines in both AID
hotspot motifs and CpG sites (therefore WRCG) are shown as bold vertical black lines. The two blue
arrowheads on the sequence represent the direct DNA repeats flanking the E2A fragile zone. (b) Steps leading
to E2A breakage. DNA regions with high C-string density adopt a B/A-intermediate DNA structure, which
has increased DNA thermal fluctuation and is vulnerable to AID deamination. Transcription through the
C-string rich region further increases its single-stranded character by the accumulation of RNA polymerases
and by the slippage between DNA direct repeats. The upstream zone in this high C-string density region
will be affected most, since the accumulated negative superhelical tension left behind the RNA polymerases
will be highest at the most upstream part of the region. The cytosines in WRCG motifs within this region are
preferred targets of AID deamination, which leads to persistent DNA lesions when the cytosines are methylated. The long-lived
DNA lesions at methylated CpG sites within AID hotspots are subject to nuclease activities (RAG or
activated Artemis in B cells), resulting in DSBs.
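The C-string annotation described in this legend (runs of 4 or more cytosines on the NTS and on the TS) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `c_string_summary`, the toy sequence, and the choice of "runs per 100 bp" as the density unit are assumptions, not the procedure used to generate the figure.

```python
import re


def c_string_summary(nts_seq, min_len=4):
    """Summarize C-strings of at least min_len on both strands of a duplex.

    nts_seq is the non-template strand (NTS) written 5' to 3'. A run of Gs
    on the NTS corresponds to a run of Cs on the template strand (TS).
    Density is reported as runs per 100 bp, which is an assumed unit.
    """
    seq = nts_seq.upper()
    # Each run is recorded as (0-based start on the NTS, run length).
    nts_runs = [(m.start(), len(m.group()))
                for m in re.finditer("C{%d,}" % min_len, seq)]
    ts_runs = [(m.start(), len(m.group()))
               for m in re.finditer("G{%d,}" % min_len, seq)]
    scale = 100.0 / len(seq) if seq else 0.0
    return {
        "nts_runs": nts_runs,
        "ts_runs": ts_runs,
        "nts_density_per_100bp": len(nts_runs) * scale,
        "ts_density_per_100bp": len(ts_runs) * scale,
    }


# Toy 20 bp sequence (hypothetical, not from E2A intron 16).
summary = c_string_summary("CCCCATGGGGGTACCCTTCC")
print(summary)
```

The trailing CCC and CC runs in the toy sequence fall below the length threshold of 4 and are not counted. Lowering `min_len` to 3 would match the three-consecutive-C definition cited from the in vitro measurements.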
References
Abner, C.W., Lau, A.Y., Ellenberger, T., and Bloom, L.B. (2001). Base excision and DNA binding
activities of human alkyladenine DNA glycosylase are sensitive to the base paired with a lesion.
J Biol Chem 276, 13379-13387.
Aplan, P.D., Lombardi, D.P., Ginsberg, A.M., Cossman, J., Bertness, V.L., and Kirsch, I.R. (1990).
Disruption of the human SCL locus by "illegitimate" V-(D)-J recombinase activity. Science 250,
1426-1429.
Beckett, D., Kovaleva, E., and Schatz, P.J. (1999). A minimal peptide substrate in biotin
holoenzyme synthetase-catalyzed biotinylation. Protein Science 8, 921-929.
Beletskii, A., and Bhagwat, A.S. (1996). Transcription-induced mutations: increase in C to T
mutations in the nontranscribed strand during transcription in Escherichia coli. Proceedings of the
National Academy of Sciences 93, 13919-13924.
Bell, S.P., and Dutta, A. (2002). DNA replication in eukaryotic cells. Annual review of biochemistry
71, 333-374.
Bentin, T., Cherny, D., Larsen, H.J., and Nielsen, P.E. (2005). Transcription arrest caused by long
nascent RNA chains. Biochimica et Biophysica Acta (BBA)-Gene Structure and Expression 1727,
97-105.
Bertocci, B., De Smet, A., Weill, J.-C., and Reynaud, C.-A. (2006). Nonoverlapping Functions of
DNA Polymerases Mu, Lambda, and Terminal Deoxynucleotidyltransferase during
Immunoglobulin V(D)J Recombination In Vivo. Immunity 25, 31-41.
Beucher, A., Birraux, J., Tchouandong, L., Barton, O., Shibata, A., Conrad, S., Goodarzi, A.A.,
Krempler, A., Jeggo, P.A., and Löbrich, M. (2009). ATM and Artemis promote homologous
recombination of radiation-induced DNA double-strand breaks in G2. Embo j 28, 3413-3427.
Blanca, G., Villani, G., Shevelev, I., Ramadan, K., Spadari, S., Hübscher, U., and Maga, G. (2004).
Human DNA Polymerases λ and β Show Different Efficiencies of Translesion DNA Synthesis past
Abasic Sites and Alternative Mechanisms for Frameshift Generation. Biochemistry 43, 11605-
11615.
Bolzer, A., Kreth, G., Solovei, I., Koehler, D., Saracoglu, K., Fauth, C., Müller, S., Eils, R., Cremer,
C., and Speicher, M.R. (2005). Three-dimensional maps of all chromosomes in human male
fibroblast nuclei and prometaphase rosettes. PLoS Biol 3, e157.
Box, H.C., Budzinski, E.E., Evans, M.S., French, J.B., and Maccubbin, A.E. (1993). The
differential lysis of phosphoester bonds by nuclease P1. Biochimica et Biophysica Acta (BBA)-
Protein Structure and Molecular Enzymology 1161, 291-294.
Bransteitter, R., Pham, P., Scharff, M.D., and Goodman, M.F. (2003). Activation-induced cytidine
deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase.
Proc Natl Acad Sci U S A 100, 4102-4107.
Briggs, M.R., Kadonaga, J.T., Bell, S.P., and Tjian, R. (1986). Purification and biochemical
characterization of the promoter-specific transcription factor, Sp1. Science 234, 47-52.
Brown, S.A., Imbalzano, A.N., and Kingston, R.E. (1996). Activator-dependent regulation of
transcriptional pausing on nucleosomal templates. Gene Dev 10, 1479-1490.
Cantaert, T., Schickel, J.-N., Bannock, Jason M., Ng, Y.-S., Massad, C., Oe, T., Wu, R., Lavoie,
A., Walter, Jolan E., Notarangelo, Luigi D., et al. (2015). Activation-Induced Cytidine Deaminase
Expression in Human B Cell Precursors Is Essential for Central B Cell Tolerance. Immunity 43,
884-895.
Cantor, C.R., and Fidelio, G.D. (1997). Interaction of biotin with streptavidin. Mol. Biol 272, 11288-
11294.
Canugovi, C., Samaranayake, M., and Bhagwat, A.S. (2009). Transcriptional pausing and stalling
causes multiple clustered mutations by human activation-induced deaminase. The FASEB
Journal 23, 34-44.
Carvajal-Garcia, J., Cho, J.-E., Carvajal-Garcia, P., Feng, W., Wood, R.D., Sekelsky, J., Gupta,
G.P., Roberts, S.A., and Ramsden, D.A. (2020). Mechanistic basis for microhomology
identification and genome scarring by polymerase theta. Proceedings of the National Academy of
Sciences 117, 8476-8485.
Chang, H.H., and Lieber, M.R. (2016). Structure-specific nuclease activities of Artemis and the
Artemis: DNA-PKcs complex. Nucleic acids research 44, 4991-4997.
Chang, H.H., Pannunzio, N.R., Adachi, N., and Lieber, M.R. (2017). Non-homologous DNA end
joining and alternative pathways to double-strand break repair. Nat Rev Mol Cell Bio 18, 495.
Chang, H.H., Watanabe, G., and Lieber, M.R. (2015). Unifying the DNA End-processing Roles of
the Artemis Nuclease: KU-DEPENDENT ARTEMIS RESECTION AT BLUNT DNA ENDS. J Biol
Chem 290, 24036-24050.
Chaudhuri, J., Tian, M., Khuong, C., Chua, K., Pinaud, E., and Alt, F.W. (2003). Transcription-
targeted DNA deamination by the AID antibody diversification enzyme. Nature 422, 726-730.
Chodosh, L.A., Baldwin, A.S., Carthew, R.W., and Sharp, P.A. (1988). Human CCAAT-binding
proteins have heterologous subunits. Cell 53, 11-24.
Collier, D.A., Griffin, J., and Wells, R.D. (1988). Non-B right-handed DNA conformations of
homopurine. homopyrimidine sequences in the murine immunoglobulin C alpha switch region. J
Biol Chem 263, 7397-7405.
Core, L.J., and Lis, J.T. (2008). Transcription regulation through promoter-proximal pausing of
RNA polymerase II. Science 319, 1791-1792.
Cui, X., Lu, Z., Kurosawa, A., Klemm, L., Bagshaw, A.T., Tsai, A.G., Gemmell, N., Müschen, M.,
Adachi, N., Hsieh, C.-L., et al. (2013). Both CpG Methylation and Activation-Induced Deaminase
Are Required for the Fragility of the Human bcl-2 Major Breakpoint Region: Implications for the
Timing of the Breaks in the t(14;18) Translocation. Mol Cell Biol 33, 947-957.
Dai, Y., Kennedy-Darling, J., Shortreed, M.R., Scalf, M., Gasch, A.P., and Smith, L.M. (2017).
Multiplexed Sequence-Specific Capture of Chromatin and Mass Spectrometric Discovery of
Associated Proteins. Anal Chem 89, 7841-7846.
Déjardin, J., and Kingston, R.E. (2009). Purification of proteins associated with specific genomic
Loci. Cell 136, 175-186.
Desai, N.A., and Shankar, V. (2003). Single-strand-specific nucleases. FEMS microbiology
reviews 26, 457-491.
Dornberger, U., Leijon, M., and Fritzsche, H. (1999). High base pair opening rates in tracts of GC
base pairs. J Biol Chem 274, 6957-6962.
Dornberger, U., Spacková, N., Walter, A., Gollmick, F.A., Sponer, J., and Fritzsche, H. (2001).
Solution structure of the dodecamer d-(CATGGGCC-CATG) 2 is B-DNA. Experimental and
molecular dynamics study. Journal of Biomolecular Structure and Dynamics 19, 159-174.
Durkin, S.G., and Glover, T.W. (2007). Chromosome fragile sites. Annu. Rev. Genet. 41, 169-192.
Eddy, J., Vallur, A.C., Varma, S., Liu, H., Reinhold, W.C., Pommier, Y., and Maizels, N. (2011).
G4 motifs correlate with promoter-proximal transcriptional pausing in human genes. Nucleic acids
research 39, 4975-4983.
Finer, M.H., Fodor, E., Boedtker, H., and Doty, P. (1984). Endonuclease S1-sensitive site in
chicken pro-alpha 2 (I) collagen 5'flanking gene region. Proceedings of the National Academy of
Sciences 81, 1659-1663.
Fischer, U., Forster, M., Rinaldi, A., Risch, T., Sungalee, S., Warnatz, H.-J., Bornhauser, B.,
Gombert, M., Kratsch, C., and Stütz, A.M. (2015). Genomics and drug profiling of fatal
TCF3-HLF-positive acute lymphoblastic leukemia identifies recurrent mutation patterns and therapeutic
options. Nature genetics 47, 1020-1029.
Fletcher, C., Heintz, N., and Roeder, R.G. (1987). Purification and characterization of OTF-1, a
transcription factor regulating cell cycle expression of a human histone H2b gene. Cell 51, 773-
781.
Frommer, M., McDonald, L.E., Millar, D.S., Collis, C.M., Watt, F., Grigg, G.W., Molloy, P.L., and
Paul, C.L. (1992). A genomic sequencing protocol that yields a positive display of 5-
methylcytosine residues in individual DNA strands. Proceedings of the National Academy of
Sciences 89, 1827-1831.
Fujimoto, M., Kuninaka, A., and Yoshino, H. (1974). Identity of phosphodiesterase and
phosphomonoesterase activities with nuclease P1 (a nuclease from Penicillium citrinum).
Agricultural and Biological Chemistry 38, 785-790.
Fujita, T., and Fujii, H. (2013). Efficient isolation of specific genomic regions and identification of
associated proteins by engineered DNA-binding molecule-mediated chromatin
immunoprecipitation (enChIP) using CRISPR. Biochem Biophys Res Commun 439, 132-136.
Fujita, T., and Fujii, H. (2014). Identification of proteins associated with an IFNγ-responsive
promoter by a retroviral expression system for enChIP using CRISPR. PloS one 9, e103084.
Gao, X.D., Tu, L.-C., Mir, A., Rodriguez, T., Ding, Y., Leszyk, J., Dekker, J., Shaffer, S.A., Zhu,
L.J., and Wolfe, S.A. (2018). C-BERST: defining subnuclear proteomic landscapes at genomic
elements with dCas9–APEX2. Nature methods 15, 433-436.
Gartenberg, M.R., and Crothers, D.M. (1988). DNA sequence determinants of CAP-induced
bending and protein binding affinity. Nature 333, 824-829.
Gauss, G.H., and Lieber, M.R. (1996). Mechanistic constraints on diversity in human V(D)J
recombination. Mol Cell Biol 16, 258-269.
Globyte, V., Lee, S.H., Bae, T., Kim, J.S., and Joo, C. (2019). CRISPR/Cas9 searches for a
protospacer adjacent motif by lateral diffusion. The EMBO journal 38.
Gonzalez, M.N., Blears, D., and Svejstrup, J.Q. (2020). Causes and consequences of RNA
polymerase II stalling during transcript elongation. Nat Rev Mol Cell Bio, 1-19.
Gostissa, M., Alt, F.W., and Chiarle, R. (2011). Mechanisms that promote and suppress
chromosomal translocations in lymphocytes. Annual review of immunology 29, 319-350.
Grawunder, U., Wilm, M., Wu, X., Kulesza, P., Wilson, T.E., Mann, M., and Lieber, M.R. (1997).
Activity of DNA ligase IV stimulated by complex formation with XRCC4 protein in mammalian cells.
Nature 388, 492-495.
Greisman, H.A., Lu, Z., Tsai, A.G., Greiner, T.C., Yi, H.S., and Lieber, M.R. (2012). IgH partner
breakpoint sequences provide evidence that AID initiates t(11;14) and t(8;14) chromosomal
breaks in mantle cell and Burkitt lymphomas. Blood 120, 2864-2867.
Gressel, S., Schwalb, B., Decker, T.M., Qin, W., Leonhardt, H., Eick, D., and Cramer, P. (2017).
CDK9-dependent RNA polymerase II pausing controls transcription initiation. Elife 6, e29736.
Guajardo, R., Lopez, P., Dreyfus, M., and Sousa, R. (1998). NTP concentration effects on initial
transcription by T7 RNAP indicate that translocation occurs through passive sliding and reveal
that divergent promoters have distinct NTP concentration requirements for productive initiation.
Journal of molecular biology 281, 777-792.
Han, J.-H., Akira, S., Calame, K., Beutler, B., Selsing, E., and Imanishi-Kari, T. (2007). Class
switch recombination and somatic hypermutation in early mouse B cells are mediated by B cell
and Toll-like receptors. Immunity 27, 64-75.
Han, L., Lin, I.G., and Hsieh, C.-L. (2001). Protein binding protects sites on stable episomes and
in the chromosome from de novo methylation. Mol Cell Biol 21, 3416-3424.
Han, L., Masani, S., and Yu, K. (2011). Overlapping activation-induced cytidine deaminase
hotspot motifs in Ig class-switch recombination. Proceedings of the National Academy of Sciences
108, 11584-11589.
Hayatsu, H. (2008). Discovery of bisulfite-mediated cytosine conversion to uracil, the key reaction
for DNA methylation analysis—a personal account. Proceedings of the Japan Academy, Series
B 84, 321-330.
Hayatsu, H., Shiraishi, M., and Negishi, K. (2008). Bisulfite modification for analysis of DNA
methylation. Current Protocols in Nucleic Acid Chemistry 33, 6.10. 11-16.10. 15.
Hayatsu, H., Wataya, Y., Kai, K., and Iida, S. (1970). Reaction of sodium bisulfite with uracil,
cytosine, and their derivatives. Biochemistry 9, 2858-2865.
Hein, D., Dreisig, K., Metzler, M., Izraeli, S., Schmiegelow, K., Borkhardt, A., and Fischer, U.
(2019). The preleukemic TCF3-PBX1 gene fusion can be generated in utero and is present in
≈0.6% of healthy newborns. Blood 134, 1355-1358.
Hsieh, C.-L. (1994). Dependence of transcriptional repression on CpG methylation density. Mol
Cell Biol 14, 5487-5494.
Hsieh, C.-L. (1999). Evidence that protein binding specifies sites of DNA demethylation. Mol Cell
Biol 19, 46-56.
Huang, J., and Sousa, R. (2000). T7 RNA polymerase elongation complex structure and
movement. Journal of molecular biology 303, 347-358.
Hunger, S.P., Ohyashiki, K., Toyama, K., and Cleary, M.L. (1992). Hlf, a novel hepatic bZIP
protein, shows altered DNA-binding properties following fusion to E2A in t(17;19) acute
lymphoblastic leukemia. Gene Dev 6, 1608-1620.
Inaba, T., Roberts, W.M., Shapiro, L.H., Jolly, K.W., Raimondi, S.C., Smith, S.D., and Look, A.T.
(1992). Fusion of the leucine zipper gene HLF to the E2A gene in human acute B-lineage
leukemia. Science 257, 531-534.
Jager, U., Bocskor, S., Le, T., Mitterbauer, G., Bolz, I., Chott, A., Kneba, M., Mannhalter, C., and
Nadel, B. (2000). Follicular lymphomas' BCL-2/IgH junctions contain templated nucleotide
insertions: novel insights into the mechanism of t(14;18) translocation. Blood 95, 3520-3529.
Javadekar, S.M., Yadav, R., and Raghavan, S.C. (2018). DNA structural basis for fragility at peak
III of BCL2 major breakpoint region associated with t(14;18) translocation. Biochimica et
Biophysica Acta (BBA)-General Subjects 1862, 649-659.
Jee, J., Rasouly, A., Shamovsky, I., Akivis, Y., Steinman, S.R., Mishra, B., and Nudler, E. (2016).
Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534,
693-696.
Jensen, D.E., Kelly, R., and Von Hippel, P. (1976). DNA "melting" proteins. II. Effects of
bacteriophage T4 gene 32-protein binding on the conformation and stability of nucleic acid
structures. J Biol Chem 251, 7215-7228.
Jones, K.A., Kadonaga, J.T., Rosenfeld, P.J., Kelly, T.J., and Tjian, R. (1987). A cellular DNA-
binding protein that activates eukaryotic transcription and DNA replication. Cell 48, 79-89.
Jonkers, I., and Lis, J.T. (2015). Getting up to speed with transcription elongation by RNA
polymerase II. Nat Rev Mol Cell Bio 16, 167-177.
Jutras, B.L., Verma, A., and Stevenson, B. (2012). Identification of novel DNA-binding proteins
using DNA-affinity chromatography/pull down. Current protocols in microbiology 24,
1F.1.1-1F.1.13.
Kadonaga, J.T., and Tjian, R. (1986). Affinity purification of sequence-specific DNA binding
proteins. Proc Natl Acad Sci U S A 83, 5889-5893.
Kalocsay, M. (2019). APEX Peroxidase-Catalyzed Proximity Labeling and Multiplexed
Quantitative Proteomics. In Proximity Labeling (Springer), pp. 41-55.
Kato, M., Ishimaru, S., Seki, M., Yoshida, K., Shiraishi, Y., Chiba, K., Kakiuchi, N., Sato, Y., Ueno,
H., and Tanaka, H. (2017). Long-term outcome of 6-month maintenance chemotherapy for acute
lymphoblastic leukemia in children. Leukemia 31, 580-584.
Kelsoe, G. (2014). Curiouser and curiouser: The role(s) of AID expression in self-tolerance.
European Journal of Immunology 44, 2876-2879.
Kennedy-Darling, J., Guillen-Ahlers, H., Shortreed, M.R., Scalf, M., Frey, B.L., Kendziorski, C.,
Olivier, M., Gasch, A.P., and Smith, L.M. (2014). Discovery of Chromatin-Associated Proteins via
Sequence-Specific Capture and Mass Spectrometric Protein Identification in Saccharomyces
cerevisiae. J Proteome Res 13, 3810-3825.
Kennedy-Darling, J., and Smith, L.M. (2014). Measuring the formaldehyde Protein-DNA cross-
link reversal rate. Anal Chem 86, 5678-5681.
Kerrigan, L.A., and Kadonaga, J.T. (1998). Purification of Sequence-Specific DNA-Binding
Proteins by Affinity Chromatography. Current protocols in protein science 11, 9.6.1-9.6.18.
Krohn, M., and Wagner, R. (1996). Transcriptional pausing of RNA polymerase in the presence
of guanosine tetraphosphate depends on the promoter and gene sequence. J Biol Chem 271,
23884-23894.
Kumar, S., Wuerffel, R., Achour, I., Lajoie, B., Sen, R., Dekker, J., Feeney, A.J., and Kenter, A.L.
(2013). Flexible ordering of antibody class switch and V(D)J joining during B-cell ontogeny. Gene
Dev 27, 2439-2444.
Kuraoka, M., Holl, T.M., Liao, D., Womble, M., Cain, D.W., Reynolds, A.E., and Kelsoe, G. (2011).
Activation-induced cytidine deaminase mediates central tolerance in B cells. Proceedings of the
National Academy of Sciences 108, 11560-11565.
Kuraoka, M., Liao, D., Yang, K., Allgood, S.D., Levesque, M.C., Kelsoe, G., and Ueda, Y. (2009).
Activation-induced cytidine deaminase expression and activity in the absence of germinal centers:
insights into hyper-IgM syndrome. The Journal of Immunology 183, 3237-3248.
Landau, N.R., Schatz, D.G., Rosa, M., and Baltimore, D. (1987). Increased frequency of N-region
insertion in a murine pre-B-cell line infected with a terminal deoxynucleotidyl transferase retroviral
expression vector. Mol Cell Biol 7, 3237-3243.
Landick, R. (2006). The regulatory roles and mechanism of transcriptional pausing. (Portland
Press Ltd.).
Li, S., Chang, H.H., Niewolik, D., Hedrick, M.P., Pinkerton, A.B., Hassig, C.A., Schwarz, K., and
Lieber, M.R. (2014). Evidence that the DNA endonuclease ARTEMIS also has intrinsic 5′-
exonuclease activity. J Biol Chem 289, 7825-7834.
Li, Y.S., Hayakawa, K., and Hardy, R.R. (1993). The regulated expression of B lineage associated
genes during B cell differentiation in bone marrow and fetal liver. Journal of Experimental
Medicine 178, 951-960.
Lieber, M. (1993). The Causes and Consequences of Chromosomal Translocations. The Causes
and Consequences of Chromosomal Aberrations. CRC Press, Boca Raton, FL, 239-275.
Lieber, M.R. (1997). The FEN-1 family of structure-specific nucleases in eukaryotic DNA
replication, recombination and repair. Bioessays 19, 233-240.
Lieber, M.R. (2010). The Mechanism of Double-Strand DNA Break Repair by the Nonhomologous
DNA End-Joining Pathway. Annual Review of Biochemistry 79, 181-211.
Lieber, M.R. (2016). Mechanisms of human lymphoid chromosomal translocations. Nat Rev
Cancer 16, 387-398.
Lin, I.G., and Hsieh, C.L. (2001). Chromosomal DNA demethylation specified by protein binding.
EMBO reports 2, 108-112.
Liu, L.F., and Wang, J.C. (1987). Supercoiling of the DNA template during transcription.
Proceedings of the National Academy of Sciences 84, 7024-7027.
Liu, X., Zhang, Y., Chen, Y., Li, M., Zhou, F., Li, K., Cao, H., Ni, M., Liu, Y., and Gu, Z. (2017). In
situ capture of chromatin interactions by biotinylated dCas9. Cell 170, 1028-1043. e1019.
Lu, Z., Lieber, M.R., Tsai, A.G., Pardo, C.E., Müschen, M., Kladde, M.P., and Hsieh, C.-L. (2015a).
Human lymphoid translocation fragile zones are hypomethylated and have accessible chromatin.
Mol Cell Biol 35, 1209-1222.
Lu, Z., Pannunzio, N.R., Greisman, H.A., Casero, D., Parekh, C., and Lieber, M.R. (2015b).
Convergent BCL6 and lncRNA promoters demarcate the major breakpoint region for BCL6
translocations. Blood, The Journal of the American Society of Hematology 126, 1730-1731.
Lu, Z., Tsai, A.G., Akasaka, T., Ohno, H., Jiang, Y., Melnick, A.M., Greisman, H.A., and Lieber,
M.R. (2013). BCL6 breaks occur at different AID sequence motifs in Ig–BCL6 and non-Ig–BCL6
rearrangements. Blood 121, 4551-4554.
Luger, K., Mäder, A.W., Richmond, R.K., Sargent, D.F., and Richmond, T.J. (1997). Crystal
structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251-260.
Ma, Y., Pannicke, U., Schwarz, K., and Lieber, M.R. (2002). Hairpin Opening and Overhang
Processing by an Artemis/DNA-Dependent Protein Kinase Complex in Nonhomologous End
Joining and V(D)J Recombination. Cell 108, 781-794.
Maga, G., and Hübscher, U. (2003). Proliferating cell nuclear antigen (PCNA): a dancer with many
partners. J Cell Sci 116, 3051-3060.
Mahowald, G.K., Baron, J.M., and Sleckman, B.P. (2008). Collateral damage from antigen
receptor gene diversification. Cell 135, 1009-1012.
Malig, M., Hartono, S.R., Giafaglione, J.M., Sanz, L.A., and Chedin, F. (2020). Ultra-Deep
Coverage Single-Molecule R-loop Footprinting Reveals Principles of R-loop Formation. Journal
of Molecular Biology.
Mao, C., Jiang, L., Melo-Jorge, M., Puthenveetil, M., Zhang, X., Carroll, M.C., and Imanishi-Kari,
T. (2004). T Cell-Independent Somatic Hypermutation in Murine B Cells with an Immature
Phenotype. Immunity 20, 133-144.
Martens, K.J.A., van Beljouw, S.P.B., van der Els, S., Vink, J.N.A., Baas, S., Vogelaar, G.A.,
Brouns, S.J.J., van Baarlen, P., Kleerebezem, M., and Hohlbein, J. (2019). Visualisation of dCas9
target search in vivo using an open-microscopy framework. Nature Communications 10, 3552.
McElhinny, S.A.N., Havener, J.M., Garcia-Diaz, M., Juárez, R., Bebenek, K., Kee, B.L., Blanco,
L., Kunkel, T.A., and Ramsden, D.A. (2005). A gradient of template dependence defines distinct
biological roles for family X polymerases in nonhomologous end joining. Molecular cell 19, 357-
366.
Mohammed, H., Taylor, C., Brown, G.D., Papachristou, E.K., Carroll, J.S., and D'santos, C.S.
(2016). Rapid immunoprecipitation mass spectrometry of endogenous proteins (RIME) for
analysis of chromatin complexes. Nat Protoc 11, 316.
Montminy, M.R., and Bilezikjian, L.M. (1987). Binding of a nuclear protein to the cyclic-AMP
response element of the somatostatin gene. Nature 328, 175-178.
Murakami, K.S., Masuda, S., Campbell, E.A., Muzzin, O., and Darst, S.A. (2002). Structural basis
of transcription initiation: an RNA polymerase holoenzyme-DNA complex. Science 296, 1285-
1290.
Muramatsu, M., Kinoshita, K., Fagarasan, S., Yamada, S., Shinkai, Y., and Honjo, T. (2000).
Class switch recombination and hypermutation require activation-induced cytidine deaminase
(AID), a potential RNA editing enzyme. Cell 102, 553-563.
Murarka, P., and Srivastava, P. (2018). An improved method for the isolation and identification of
unknown proteins that bind to known DNA sequences by affinity capture and mass spectrometry.
PLoS One 13, e0202602.
Murnane, J.P. (2006). Telomeres and chromosome instability. DNA Repair (Amst) 5, 1082-1092.
Myers, S.A., Wright, J., Peckner, R., Kalish, B.T., Zhang, F., and Carr, S.A. (2018). Discovery of
proteins associated with a predefined genomic locus via dCas9–APEX-mediated proximity
labeling. Nature methods 15, 437-439.
Naik, A.K., and Raghavan, S.C. (2008). P1 nuclease cleavage is dependent on length of the
mismatches in DNA. DNA repair 7, 1384-1391.
Ng, H.-L., Kopka, M.L., and Dickerson, R.E. (2000). The structure of a stable intermediate in the
A↔B DNA helix transition. Proceedings of the National Academy of Sciences 97, 2035-2039.
Norris, D., and Stone, J. (2008). WHO classification of tumours of haematopoietic and lymphoid
tissues. Geneva: WHO, 22-23.
Nussenzweig, A., and Nussenzweig, M.C. (2010). Origin of chromosomal translocations in
lymphoid cancer. Cell 141, 27-38.
Oettinger, M., Schatz, D., Gorka, C., and Baltimore, D. (1990). RAG-1 and RAG-2, adjacent genes
that synergistically activate V(D)J recombination. Science 248, 1517-1523.
Okitsu, C.Y., and Hsieh, C.-L. (2007). DNA methylation dictates histone H3K4 methylation. Mol
Cell Biol 27, 2746-2757.
Okitsu, C.Y., Hsieh, J.C.F., and Hsieh, C.-L. (2010). Transcriptional activity affects the H3K4me3
level and distribution in the coding region. Mol Cell Biol 30, 2933-2946.
Pannunzio, N.R., and Lieber, M.R. (2016). Dissecting the roles of divergent and convergent
transcription in chromosome instability. Cell reports 14, 1025-1031.
Pannunzio, N.R., and Lieber, M.R. (2017). AID and Reactive Oxygen Species Can Induce DNA
Breaks within Human Chromosomal Translocation Fragile Zones. Molecular cell 68, 901-912.e903.
Pannunzio, N.R., and Lieber, M.R. (2018). Concept of DNA lesion longevity and chromosomal
translocations. Trends in biochemical sciences 43, 490-498.
Pannunzio, N.R., and Lieber, M.R. (2019). Constitutively active Artemis nuclease recognizes
structures containing single-stranded DNA configurations. DNA repair 83, 102676.
Pannunzio, N.R., Watanabe, G., and Lieber, M.R. (2018). Nonhomologous DNA end-joining for
repair of DNA double-strand breaks. J Biol Chem 293, 10512-10523.
Pardue, M.L., and Gall, J.G. (1969). Molecular hybridization of radioactive DNA to the DNA of
cytological preparations. Proceedings of the National Academy of Sciences 64, 600-604.
Paulsson, K., Jonson, T., Ora, I., Olofsson, T., Panagopoulos, I., and Johansson, B. (2007).
Characterisation of genomic translocation breakpoints and identification of an alternative
TCF3/PBX1 fusion transcript in t(1;19)(q23;p13)-positive acute lymphoblastic leukaemias. Br J
Haematol 138, 196-201.
Pawelczak, K.S., and Turchi, J.J. (2010). Purification and characterization of exonuclease-free
Artemis: Implications for DNA-PK-dependent processing of DNA termini in NHEJ-catalyzed DSB
repair. DNA repair 9, 670-677.
Petersen, S., Casellas, R., Reina-San-Martin, B., Chen, H.T., Difilippantonio, M.J., Wilson, P.C.,
Hanitsch, L., Celeste, A., Muramatsu, M., and Pilch, D.R. (2001). AID is required to initiate Nbs1/γ-
H2AX focus formation and mutations at sites of class switching. Nature 414, 660-665.
Pfeifer, G. (2006). Mutagenesis at methylated CpG sequences. In DNA methylation: basic
mechanisms (Springer), pp. 259-281.
Pham, P., Afif, S.A., Shimoda, M., Maeda, K., Sakaguchi, N., Pedersen, L.C., and Goodman, M.F.
(2016). Structural analysis of the activation-induced deoxycytidine deaminase required in
immunoglobulin diversification. DNA repair 43, 48-56.
Pham, P., Bransteitter, R., Petruska, J., and Goodman, M.F. (2003). Processive AID-catalysed
cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 424, 103-
107.
Pham, P., Malik, S., Mak, C., Calabrese, P.C., Roeder, R.G., and Goodman, M.F. (2019). AID–
RNA polymerase II transcription-dependent deamination of IgV DNA. Nucleic Acids Research 47,
10815-10829.
Qiao, Q., Wang, L., Meng, F.L., Hwang, J.K., Alt, F.W., and Wu, H. (2017). AID Recognizes
Structured DNA for Class Switch Recombination. Mol Cell 67, 361-373 e364.
Raghavan, S.C., Houston, S., Hegde, B.G., Langen, R., Haworth, I.S., and Lieber, M.R. (2004a).
Stability and strand asymmetry in the non-B DNA structure at the bcl-2 major breakpoint region.
J Biol Chem 279, 46213-46225.
Raghavan, S.C., Kirsch, I.R., and Lieber, M.R. (2001). Analysis of the V(D)J recombination
efficiency at lymphoid chromosomal translocation breakpoints. J Biol Chem 276, 29126-29133.
Raghavan, S.C., Swanson, P.C., Ma, Y., and Lieber, M.R. (2005). Double-strand break formation
by the RAG complex at the bcl-2 major breakpoint region and at other non-B DNA structures in
vitro. Mol Cell Biol 25, 5904-5919.
Raghavan, S.C., Swanson, P.C., Wu, X., Hsieh, C.L., and Lieber, M.R. (2004b). A non-B-DNA
structure at the Bcl-2 major breakpoint region is cleaved by the RAG complex. Nature 428, 88-93.
Raghavan, S.C., Tsai, A., Hsieh, C.L., and Lieber, M.R. (2006). Analysis of Non-B DNA Structure
at Chromosomal Sites in the Mammalian Genome. Method Enzymol 409, 301-316.
Ramadan, K., Shevelev, I.V., Maga, G., and Hübscher, U. (2004). De Novo DNA Synthesis by
Human DNA Polymerase λ, DNA Polymerase μ and Terminal Deoxyribonucleotidyl Transferase.
Journal of Molecular Biology 339, 395-404.
Ramiro, A.R., Stavropoulos, P., Jankovic, M., and Nussenzweig, M.C. (2003). Transcription
enhances AID-mediated cytidine deamination by exposing single-stranded DNA on the
nontemplate strand. Nature immunology 4, 452-456.
Rhee, H.-W., Zou, P., Udeshi, N.D., Martell, J.D., Mootha, V.K., Carr, S.A., and Ting, A.Y. (2013).
Proteomic mapping of mitochondria in living cells via spatially restricted enzymatic tagging.
Science 339, 1328-1331.
Rosenfeld, P.J., and Kelly, T. (1986). Purification of nuclear factor I by DNA recognition site affinity
chromatography. J Biol Chem 261, 1398-1408.
Roy, D., and Lieber, M.R. (2009). G Clustering Is Important for the Initiation of Transcription-
Induced R-Loops In Vitro, whereas High G Density without Clustering Is Sufficient Thereafter. Mol
Cell Biol 29, 3124-3133.
Roy, D., Yu, K., and Lieber, M.R. (2008). Mechanism of R-loop formation at immunoglobulin class
switch sequences. Mol Cell Biol 28, 50-60.
Santra, M., Danielson, K.G., and Iozzo, R.V. (1994). Structural and functional characterization of
the human decorin gene promoter. A homopurine-homopyrimidine S1 nuclease-sensitive region
is involved in transcriptional control. J Biol Chem 269, 579-587.
Schatz, D.G., Oettinger, M.A., and Baltimore, D. (1989). The V(D)J recombination activating gene,
RAG-1. Cell 59, 1035-1048.
Schmidtmann, E., Anton, T., Rombaut, P., Herzog, F., and Leonhardt, H. (2016). Determination
of local chromatin composition by CasID. Nucleus 7, 476-484.
Schmutte, C., Yang, A.S., Beart, R.W., and Jones, P.A. (1995). Base excision repair of U:G
mismatches at a mutational hotspot in the p53 gene is more efficient than base excision repair of
T:G mismatches in extracts of human colon tumors. Cancer research 55, 3742-3746.
Shiraishi, M., and Hayatsu, H. (2004). High-speed conversion of cytosine to uracil in bisulfite
genomic sequencing analysis of DNA methylation. DNA research 11, 409-415.
Shundrovsky, A., Santangelo, T.J., Roberts, J.W., and Wang, M.D. (2004). A single-molecule
technique to study sequence-dependent transcription pausing. Biophys J 87, 3945-3953.
Sinden, R.R. (1994). DNA structure and function (Gulf Professional Publishing).
Sohail, A., Klapacz, J., Samaranayake, M., Ullah, A., and Bhagwat, A.S. (2003). Human
activation-induced cytidine deaminase causes transcription-dependent, strand-biased C to U
deaminations. Nucleic acids research 31, 2990-2994.
Sono, M., Wataya, Y., and Hayatsu, H. (1973). Role of bisulfite in the deamination and the
hydrogen isotope exchange of cytidylic acid. Journal of the American Chemical Society 95, 4745-
4749.
Storb, U. (1996). The molecular basis of somatic hypermutation of immunoglobulin genes. Current
opinion in immunology 8, 206-214.
Sunder, S., and Wilson, T.E. (2019). Frequency of DNA end joining in trans is not
determined by the predamage spatial proximity of double-strand breaks in yeast. Proceedings of
the National Academy of Sciences 116, 9481-9490.
Talukder, A.H., Jørgensen, H.F., Mandal, M., Mishra, S.K., Vadlamudi, R.K., Clark, B.C.,
Mendelsohn, J., and Kumar, R. (2001). Regulation of elongation factor-1α expression by growth
factors and anti-receptor blocking antibodies. J Biol Chem 276, 5636-5642.
Temiakov, D., Mentesana, P.E., Ma, K., Mustaev, A., Borukhov, S., and McAllister, W.T. (2000).
The specificity loop of T7 RNA polymerase interacts first with the promoter and then with the
elongating transcript, suggesting a mechanism for promoter clearance. Proceedings of the
National Academy of Sciences 97, 14109-14114.
Thomas, M., White, R.L., and Davis, R.W. (1976). Hybridization of RNA to double-stranded DNA:
formation of R-loops. Proc Natl Acad Sci U S A 73, 2294-2298.
Trantírek, L., Štefl, R., Vorlíčková, M., Koča, J., Sklenář, V., and Kypr, J. (2000). An A-type
double helix of DNA having B-type puckering of the deoxyribose rings. Journal of molecular
biology 297, 907-922.
Travers, A.A. (1989). DNA conformation and protein binding. Annual review of biochemistry 58,
427-452.
Tsai, A.G., Engelhart, A.E., Ma'mon, M.H., Houston, S.I., Hud, N.V., Haworth, I.S., and Lieber,
M.R. (2009). Conformational variants of duplex DNA correlated with cytosine-rich chromosomal
fragile sites. J Biol Chem 284, 7157-7164.
Tsai, A.G., Lu, H., Raghavan, S.C., Muschen, M., Hsieh, C.-L., and Lieber, M.R. (2008). Human
chromosomal translocations at CpG sites and a theoretical basis for their lineage and stage
specificity. Cell 135, 1130-1142.
Tsai, A.G., Lu, Z., and Lieber, M.R. (2010a). The t(14;18)(q32;q21)/IGH-MALT1 translocation
in MALT lymphomas is a CpG-type translocation, but the t(11;18)(q21;q21)/API2-MALT1
translocation in MALT lymphomas is not. Blood, The Journal of the American Society of
Hematology 115, 3640-3641.
Tsai, A.G., Yoda, A., Weinstock, D.M., and Lieber, M.R. (2010b). t(X;14)(p22;q32)/t(Y;14)(p11;q32)
CRLF2-IGH translocations from human B-lineage ALLs involve CpG-type breaks at CRLF2,
but CRLF2/P2RY8 intrachromosomal deletions do not. Blood 116, 1993-1994.
Ueda, Y., Liao, D., Yang, K., Patel, A., and Kelsoe, G. (2007). T-Independent Activation-Induced
Cytidine Deaminase Expression, Class-Switch Recombination, and Antibody Production by
Immature/Transitional 1 B Cells. The Journal of Immunology 178, 3593-3601.
Umiker, B.R., McDonald, G., Larbi, A., Medina, C.O., Hobeika, E., Reth, M., and Imanishi-Kari, T.
(2014). Production of IgG autoantibody requires expression of activation-induced deaminase in
early-developing B cells in a mouse model of SLE. European Journal of Immunology 44, 3093-
3108.
Valenciano, A.C., Cowell, R.L., Rizzi, T.E., and Tyler, R.D. (2014). Section 5: - Hematopoietic
Neoplasia. In Atlas of Canine and Feline Peripheral Blood Smears, A.C. Valenciano, R.L. Cowell,
T.E. Rizzi, and R.D. Tyler, eds. (Mosby), pp. 234-257.e232.
Vera, M., Pani, B., Griffiths, L.A., Muchardt, C., Abbott, C.M., Singer, R.H., and Nudler, E. (2014).
The translation elongation factor eEF1A1 couples transcription to translation during heat shock
response. Elife 3, e03164.
Vogt, V.M. (1973). Purification and further properties of single-strand-specific nuclease from
Aspergillus oryzae. European journal of biochemistry 33, 192-200.
von Hippel, P.H., Johnson, N.P., and Marcus, A.H. (2013). Fifty years of DNA “breathing”:
Reflections on old and new approaches. Biopolymers 99, 923-954.
Wakabayashi-Ito, N., and Nagata, S. (1994). Characterization of the regulatory elements in the
promoter of the human elongation factor-1 alpha gene. J Biol Chem 269, 29831-29837.
Walsh, C., and Xu, G. (2006). Cytosine methylation and DNA repair. In DNA Methylation: Basic
Mechanisms (Springer), pp. 283-315.
Watts, J.A., Burdick, J., Daigneault, J., Zhu, Z., Grunseich, C., Bruzel, A., and Cheung, V.G.
(2019). cis Elements that Mediate RNA Polymerase II Pausing Regulate Human Gene Expression.
The American Journal of Human Genetics 105, 677-688.
Wei, L., Chahwan, R., Wang, S., Wang, X., Pham, P.T., Goodman, M.F., Bergman, A., Scharff,
M.D., and MacCarthy, T. (2015). Overlapping hotspots in CDRs are critical sites for V region
diversification. Proceedings of the National Academy of Sciences 112, E728-E737.
Weigert, O., and Weinstock, D.M. (2012). The evolving contribution of hematopoietic progenitor
cells to lymphomagenesis. Blood, The Journal of the American Society of Hematology 120, 2553-
2561.
Welzel, N., Le, T., Marculescu, R., Mitterbauer, G., Chott, A., Pott, C., Kneba, M., Du, M.Q., Kusec,
R., Drach, J., et al. (2001). Templated nucleotide addition and immunoglobulin JH-gene utilization
in t(11;14) junctions: implications for the mechanism of translocation and the origin of mantle cell
lymphoma. Cancer Res 61, 1629-1636.
Wiemels, J.L., Leonard, B.C., Wang, Y., Segal, M.R., Hunger, S.P., Smith, M.T., Crouse, V., Ma,
X., Buffler, P.A., and Pine, S.R. (2002). Site-specific translocation and evidence of postnatal origin
of the t(1;19) E2A-PBX1 fusion in childhood acute lymphoblastic leukemia. Proc Natl Acad Sci U
S A 99, 15101-15106.
Wigler, M., Sweet, R., Sim, G.K., Wold, B., Pellicer, A., Lacy, E., Maniatis, T., Silverstein, S., and
Axel, R. (1979). Transformation of mammalian cells with genes from procaryotes and eucaryotes.
Cell 16, 777-785.
Williams, D.L., and Kowalski, D. (1993). Easily unwound DNA sequences and hairpin structures
in the Epstein-Barr virus origin of plasmid replication. Journal of virology 67, 2707-2715.
Wilson, T.E., and Sunder, S. (2020). Double-strand breaks in motion: implications for
chromosomal rearrangement. Current Genetics 66, 1-6.
Wood, R.D., Gearhart, P.J., Neuberger, M.S., Ruiz, J.F., Domínguez, O., Lera, T.L.d., García-Díaz,
M., Bernad, A., and Blanco, L. (2001). DNA polymerase mu, a candidate hypermutase?
Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 356,
99-109.
Wu, C., Tsai, C., and Wilson, S. (1988). Affinity chromatography of sequence-specific DNA-
binding proteins. In Genetic engineering (Springer), pp. 67-74.
Wu, C.-H., Chen, S., Shortreed, M.R., Kreitinger, G.M., Yuan, Y., Frey, B.L., Zhang, Y., Mirza, S.,
Cirillo, L.A., and Olivier, M. (2011). Sequence-specific capture of protein-DNA complexes for
mass spectrometric protein identification. PloS one 6.
Wu, H.-M., and Crothers, D.M. (1984). The locus of sequence-directed and protein-induced DNA
bending. Nature 308, 509-513.
Yu, K., Chedin, F., Hsieh, C.L., Wilson, T.E., and Lieber, M.R. (2003). R-loops at immunoglobulin
class switch regions in the chromosomes of stimulated B cells. Nat Immunol 4, 442-451.
Yu, K., Huang, F.T., and Lieber, M.R. (2004). DNA substrate length and surrounding sequence
affect the activation-induced deaminase activity at cytidine. J Biol Chem 279, 6496-6500.
Yu, K., and Lieber, M.R. (2019). Current insights into the mechanism of mammalian
immunoglobulin class switch recombination. Crit Rev Biochem Mol Biol 54, 333-351.
Yu, K., Roy, D., Bayramyan, M., Haworth, I.S., and Lieber, M.R. (2005). Fine-structure analysis
of activation-induced deaminase accessibility to class switch region R-loops. Mol Cell Biol 25,
1730-1736.
Zhao, B., Rothenberg, E., Ramsden, D.A., and Lieber, M.R. (2020a). The molecular basis and
disease relevance of non-homologous DNA end joining. Nat Rev Mol Cell Bio 21, 765-781.
Zhao, B., Watanabe, G., and Lieber, M.R. (2020b). Polymerase μ in non-homologous DNA end
joining: importance of the order of arrival at a double-strand break in a purified system. Nucleic
Acids Res 48, 3605-3618.
Zhao, B., Watanabe, G., Morten, M.J., Reid, D.A., Rothenberg, E., and Lieber, M.R. (2019). The
essential elements for the noncovalent association of two DNA ends during NHEJ synapsis. Nat
Commun 10, 3588.
Zuo, Y., and Steitz, T.A. (2015). Crystal structures of the E. coli transcription initiation complexes
with a complete bubble. Molecular cell 58, 534-540.
Abstract
A chromosomal translocation is the first essential step for most human lymphoid malignancies. The translocation process involves a DNA breakage phase and a rejoining phase. The rejoining phase is most often carried out by non-homologous end joining (NHEJ), the major pathway for repair of double-strand breaks (DSBs) throughout the cell cycle. The DSBs arising in the breakage phase can have pathologic or physiologic causes. Previous studies have revealed that chromosomal translocations in early B cells usually involve a DSB at the immunoglobulin heavy chain (IGH) locus caused by the recombination activating gene (RAG) complex (most often arising during a D to J recombination attempt) and a second DSB at the non-IGH locus initiated by the enzyme activation-induced cytidine deaminase (AID). The non-IGH DNA breaks are usually focused on the CG motif within small zones of 20-600 bp. AID can only deaminate cytosines in regions of single-stranded DNA (ssDNA), but the basis for the restriction of AID activity to such narrow fragile zones has been unclear. In other words, what makes these 20-600 bp regions of duplex DNA transiently single-stranded to permit AID action? That is the central issue studied in this thesis.
In this study, I have focused on the factors that determine DNA breakage at the E2A fragile zone to provide insight into the other B cell translocation fragile zones in human lymphoma. The E2A gene has the smallest fragile zone (23 bp) among all oncogenes involved in human translocations. The PBX1 gene and the HLF gene are the two most common translocation partners of E2A. Statistical analyses using all the available junctional sequences indicate that E2A breakpoints are in significant proximity to the CG motif as well as to AID hotspot motifs in both the E2A-PBX1 translocation and the E2A-HLF translocation. This suggests the involvement of AID in the breakage phase of E2A.
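The proximity claim above can be illustrated with a small calculation. The following is a minimal sketch, not the statistical analysis used in the thesis: it only measures the distance from a breakpoint position to the nearest occurrence of a motif such as CG. The function name `nearest_motif_distance`, the 0-based coordinate convention, and the toy sequence are assumptions made for illustration.

```python
import re


def nearest_motif_distance(seq, breakpoint, motif="CG"):
    """Distance in bp from a 0-based breakpoint index to the nearest motif.

    Returns 0 if the breakpoint falls inside an occurrence of the motif,
    and None if the motif does not occur in the sequence at all.
    """
    seq = seq.upper()
    best = None
    # A lookahead pattern reports overlapping occurrences as well.
    for match in re.finditer(f"(?={re.escape(motif)})", seq):
        start = match.start()
        end = start + len(motif) - 1  # last base of this occurrence
        if start <= breakpoint <= end:
            distance = 0
        else:
            distance = min(abs(breakpoint - start), abs(breakpoint - end))
        best = distance if best is None else min(best, distance)
    return best


# Hypothetical toy sequence with a single CG at positions 4-5.
toy = "AATTCGAATT"
for position in (0, 5, 9):
    print(position, nearest_motif_distance(toy, position))
```

A significance test would then compare such observed distances with those expected for randomly placed breakpoints. The abstract does not specify the test, so none is shown here.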
In this thesis, I used native DNA bisulfite chemical probing on live pre-B cells to investigate the single-stranded character of the E2A fragile zone (and surrounding DNA), including deviations from the duplex DNA state that are transient, intermittent, and short-lived. The results indicate that the 23 bp E2A fragile zone and its downstream region have a higher degree of transient single-stranded character in living cells, a finding further supported by DNA melting studies and a nuclease P1 assay on purified DNA. The length of consecutive cytosines (C-string) on either strand is correlated with the native bisulfite reactivity (on a per-nt basis). This finding indicates that duplex DNA with C-strings on either strand has increased thermal fluctuation and therefore an increased frequency of adopting an intermittent, transient ssDNA deviation from the double-stranded DNA (dsDNA) conformation. NMR studies by others indicate that the Cs within C-strings undergo thermal fluctuation (breathe) on average every 5 milliseconds, which is nearly 6 times more frequent than isolated Cs.
I have also studied AID activity on the E2A fragile zone using a biochemically defined system. Without transcription, four AID deamination peaks, two within the E2A fragile zone and two in the downstream region, are observed on the non-transcribed strand (NTS) but not on the transcribed strand (TS) of E2A. AID deamination activity within the E2A fragile zone and its downstream region increases upon transcription in this purified system using T7 RNA polymerase. With transcription, seven AID deamination peaks, with a higher mutation rate than on untranscribed E2A, are present on the NTS, and six of them lie within or downstream of the E2A fragile zone.
I next investigate factors that may influence AID targeting in the 23 bp E2A fragile zone. I find that RNase A treatment during transcription markedly decreases the AID activity on the NTS of E2A to background levels. This indicates that removal of the trailing nascent RNA from the RNA polymerase (removal of the RNA tail is known to reduce transcription stalling) decreases AID accessibility to the cytosines on the NTS.
Using the biochemical system, I also investigate the impact of C-string density on the AID deamination of E2A using two E2A mutants: mut1, with a C-string on the NTS disrupted, and mut2, with a C-string on the NTS and two C-strings on the TS disrupted. Compared with the wild-type E2A substrate (wt E2A), AID shows similar deamination activity on mut1 but significantly decreased activity on mut2, both with and without transcription. These results demonstrate that disruption of three C-strings in mut2 leads to decreased AID targeting in the fragile zone and the downstream region. In addition, the bisulfite sequencing assay (under denatured DNA conditions) shows that the two CpG sites within the E2A fragile zone each have a detectable level of methylation in the three pre-B cell lines (methylcytosine, upon deamination, becomes T, resulting in a long-lived T:G mismatch, which is a persistent lesion). A model for the focused E2A breakage is derived from the above results. It is likely that a noncanonical B-form DNA structure (non-B DNA that adopts a B/A-intermediate conformation between A-form and B-form DNA) forms in the 23 bp E2A fragile zone and its downstream region due to the high C-string density. RNA polymerase pausing in this non-B region, especially at the C-strings, and DNA slippage between direct repeats that flank the fragile zone can lead to this 23 bp fragile zone adopting a transient single-stranded state that is vulnerable to AID deamination. When CpGs are methylated in this non-B region, AID acts on them at AID WRC hotspot motifs (namely WRCG, W = A or T, R = A or G), generating a long-lived T:G mismatch lesion, which is vulnerable to conversion to a DSB by the action of RAG or Artemis.
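The sequence features named in the model (C-strings on either strand and WRCG hotspot motifs) can be located computationally. The sketch below is illustrative only: the function names, the minimum run length of three Cs used to call a C-string, and the toy sequence are assumptions, since the abstract does not define a length threshold and the E2A fragile zone sequence is not given here.

```python
import re

# Translation table for complementing A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, i.e. the opposite strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def c_strings(seq, min_len=3):
    """Return (start, length) for each run of at least min_len consecutive Cs.

    min_len=3 is an assumed threshold, not a value taken from the thesis.
    """
    pattern = r"C{%d,}" % min_len
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, seq.upper())]


def wrcg_sites(seq):
    """Return 0-based start positions of WRCG motifs (W = A or T, R = A or G)."""
    # The lookahead allows overlapping motifs to be reported.
    return [m.start() for m in re.finditer(r"(?=[AT][AG]CG)", seq.upper())]


# Hypothetical toy sequence, not the E2A fragile zone.
toy = "TTAGCGCCCCATACGGGG"
print("WRCG sites:", wrcg_sites(toy))
print("C-strings, given strand:", c_strings(toy))
print("C-strings, opposite strand:", c_strings(reverse_complement(toy)))
```

Scanning the reverse complement reports C-strings on the opposite strand, which appear as G runs on the given strand. Positions returned for the opposite strand are in that strand's own 5' to 3' coordinates.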
Besides DNA sequence predisposing to a transient ssDNA state that makes the fragile zone a suitable AID substrate, I also considered the possibility that specific proteins may bind within the fragile zone and cause strand separation to create a suitable ssDNA substrate for AID. A DNA affinity column and an electrophoretic mobility shift assay (EMSA) were used to detect proteins that bind specifically to the E2A fragile zone. Proteins with high affinity for the E2A fragile zone were not identified using this approach, likely due to its high complexity and low sensitivity. A de novo oligo pull-down assay, which utilizes the specificity of DNA pairing and is followed by mass spectrometric (MS) analysis, is under development to study unknown DNA-binding proteins on any DNA sequence of interest inside live cells. Further optimization and testing are needed to apply this assay to the study of potential E2A-specific binding proteins that can separate DNA strands to generate transient ssDNA.
Asset Metadata
Creator
Liu, Di (author)
Core Title
Mechanistic basis for chromosomal translocations at the E2A gene
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Cancer Biology and Genomics
Degree Conferral Date
2021-08
Publication Date
07/19/2021
Defense Date
06/07/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
activation-induced deaminase (AID), B/A-intermediate DNA, CpG, C-strings, DNA methylation, double-strand breaks, lymphoid leukemia, lymphoma, non-B DNA structure, OAI-PMH Harvest, single-stranded DNA, Transcription
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Press, Michael F. (committee chair), Hsieh, Chih-Lin (committee member), Lieber, Michael R. (committee member), Maxson, Robert (committee member), Zandi, Ebrahim (committee member)
Creator Email
dadidi@foxmail.com,diliu@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC15610518
Unique identifier
UC15610518
Legacy Identifier
etd-LiuDi-9793
Document Type
Dissertation
Rights
Liu, Di
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu