A STRUCTURE BASED STUDY OF THE HIV RESTRICTION FACTOR
APOBEC3G
by
Lauren Georgianna Holden
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
May 2011
Copyright 2011 Lauren Georgianna Holden
Epigraph
The harder I work, the luckier I get. - Kari Byron
Dedication
For my grandfather, Albert Vanderbeck McCotter. This is all your fault.
And for my first mentor, Georgia Shearer. Please check my math at the end.
Acknowledgments
In a perfect world I would be able to thank everyone who has supported me, pushed me,
guided me, or otherwise shaped me through this day; but as I have been so frequently
told, this is not a perfect world. So I will simply do my imperfect best to thank everyone
I can, and trust that those I do not name individually will either accept this blanket
thanks for everything, or never read this at all. I really do love all y'all.
Family first: Thanks to my parents for doing everything possible, imaginable and
things both beyond my imagination and a little bit inadvertent to get me here. I can
quite literally say that I could not have done this without you. I am in your debt forever,
and unfortunately I don't think I can ever possibly pay you back. Thank you.
Thanks to my sister for setting the bar so high. I never would have worked so hard
if I didn't have to live up to "Oh, you're her sister?" You will always be ahead of me,
including in line for the Senior Discounts. Although now at least, Sparkly will get there
first. And a special thank you for the goldfish, without which this thesis would not have
been written.
Thanks to all my grandparents for spoiling me with your generosity, unconditional
love and inexplicable pride. I'll keep doing my best to earn it and share it. I kept my
promise Grandpa. Miss you.
And since I am a crazy cat lady already, I have to thank my cat. My funny little
furball lets me dress her up in silly costumes, which makes me happy when science is
not. And most especially, her wonderful whiskers without which my diffraction would
have sucked.
Friends next: Thanks to both my roommates, Shawn and Liz, for putting up with
the most concentrated dose of me and somehow still remaining my friend. I can easily
say you two were the best roommates that I've ever had. Thank you Shawn for putting
up with my little allergen, introducing me to hockey, and whooping my butt on Mario
Kart. A super-sized thank you to Liz for twice as much time, carpooling nearly every
day, dragging me to the gym, decorating for all the holidays, not putting Domino's on
speed-dial, and so much more.
Thanks to Soph for coming in a close second in time spent, and a distant first in
many other ways. And yes, my dear, now we can run away together forever. Sorry,
Pretty.
Thanks to Aysen for being my brightest friend, in every sense of the word. My eyes
are open wider and are laughing more because of you. Thanks to Ian for being the best
little brother I never had. I think I still owe you a couple batches of cookies.
Thanks to Bill for being a patient mentor and a darn good role model and friend.
I've got a better rum with real glasses and a deck of cards waiting.
Thanks to Melissa for being both my Satchel and my Bucky. I wish you weren't 3000
miles away. Two bits, four bits, six bits, a peso...miss you my dear.
And Science last: Thank you Xiaojiang for allowing me to be me, and taking the
science that came from that. You've shown me how to eventually focus in on one thing,
I just hope I didn't scare you away from letting students start out with 100.
Thank you to Myron, Courtney, Ronda, and Linda for grandfathering me on this
project. Your humor, patience and knowledge were greatly appreciated and well used
and abused. This work would not look as good without you.
Thank you to Ray Stevens for lab space, resources, and advice; my career and this
project would be completely different without you.
Thank you to my committee members past and present for committing their time
and insight to aid my scientific growth, and helping me through the various flaming
hoops of grad school.
Thanks especially to my class of 2005; our study groups got me through that first
year. Meghann for infectious energy and being another girl on the IM team, Liz for your
detailed notes and cell cycle diagram, Justin for the best ribs ever, Pao-Chen and Chris
for simply being amazing, and even Sarmad for the stories.
Thank you to all my labmates, past and present, but especially: Maja for sharing
and getting me started. Danny and Georgia for being the best first mentors I could
have asked for; you are a very large reason I came this direction. Matt for football
updates from Norway and the random cat video sanity check. Scott for being Canadian
and SSRL. Irene for being a bubbling ball of goodness. Chiharu for Chiharu, arigato-
gozaimas. Mike for safety harnesses, CNS, and sarcasm. Meng for A3G, Diablo, and
late late nights. Paul for CD2 and your Unix based wizardry. Sam for bringing both
sanity and insanity back into the lab. Ganggang, thank you for sharing DnaG, a bench,
and a beginning. And a giant thank you to my power pack of proof-readers: Michelle,
Courtney, Ian, and of course, Liz.
Thank you to the MCB and Biology support staff: Joe, Christina, Eleni, Laura and
Linda. You all are amazing and don't get thanked enough. Thank you to the rest of the
MCB Department for making the retreats so memorable and illuminating.
And lastly, thank you APOBEC. You are my love, my muse, my very reason for living
these past 5 years; I'm more optimistically cynical because of you, my complicated little
protein.
Table of Contents
Epigraph
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Cytidine Deamination
1.2 APOBECs
1.2.1 APOBEC1
1.2.2 AID & APOBEC2
1.2.3 APOBEC3
1.2.4 APOBEC3G
1.3 Overview of Chapters
Chapter 2: A3G-CD2
2.1 Overview
2.2 Materials and Methods
2.2.1 Purification and Crystallization
2.2.2 Structure determination and refinement
2.2.3 Construction of A3G mutants
2.2.4 DNA binding
2.2.5 Deamination activity, processivity and directionality assays
2.3 Results and Discussion
2.4 Concluding Remarks
Chapter 3: Full-length A3G
3.1 Overview
3.2 Purification
3.3 Crystallization Screens
3.4 Dynamic Light Scattering
3.5 Deamination Activity and DNA binding
3.6 Mutants
3.7 Discussion
3.8 Concluding Remarks
Chapter 4: Summary and Future Directions
Bibliography
Appendix A: mtMCM ATPase
A.1 Abstract
A.2 Introduction
A.3 Experimental Procedures
A.3.1 Cloning and Mutagenesis
A.3.2 Protein Purification
A.3.3 Heat Stability of Mutants
A.3.4 Helicase Assay
A.3.5 DNA Binding Assay
A.3.6 ATPase assay
A.4 Results
A.4.1 Location of Mutated Residues on the mtMCM Double-Hexamer Structure
A.4.2 Mutant Protein Assembly and Stability
A.4.3 Helicase Assay
A.4.4 DNA binding
A.4.5 ATPase Activity
A.5 Discussion
A.6 Supporting Information
Appendix B: G40P ATPase
B.1 Abstract
B.2 Introduction
B.3 Results
B.3.1 Overall structure of the hexameric helicase
B.3.2 Monomer structure
B.3.3 Cis and trans structures
B.3.4 Hexamerization of the ATPase domain
B.3.5 N-terminal requirement for helicase activity and primase binding
B.3.6 Inter-tier contacts and helicase stimulation by primase
B.4 Discussion
B.5 Methods
B.5.1 Protein purification and crystallization
B.5.2 Data collection and structure determination
B.5.3 Helicase assay
B.5.4 ATPase assay
B.5.5 Native gel shift assay
B.5.6 Concluding Remarks
List of Tables
1.1 Summary of the APOBEC family members
1.2 Homology of individual domains of APOBEC3 family members
2.1 Data collection (MAD) and model refinement statistics
3.1 Summary of DLS results
List of Figures
1.1 Deamination reaction and canonical active site
1.2 A2, the first X-ray APOBEC structure
1.3 Alignment of the individual A3 domains
2.1 Enzymatic activity of A3G-CD2
2.2 X-ray structure of A3G-CD2
2.3 Structural comparison of the A3G-CD2 X-ray structure with the A3G-2K3A NMR structure
2.4 Common and distinct structural features between APOBEC proteins and other Zn-deaminase superfamily enzymes
2.5 Superposition of A3G-CD2
2.6 A3G-CD2 AC loops
2.7 A3G-CD2 Active Site
2.8 Predicted substrate groove and deamination activity of A3G mutants
3.1 Gel filtration profiles
3.2 Microcrystals and pretty salt crystals
3.3 DNA gel shift of all four Groups
A.1 Location of the five positively charged residues on the inner channel surface of N-mtMCM
A.2 Gel filtration analysis of wt and mutant mtMCM proteins
A.3 Helicase activity of wt and mutant proteins
A.4 DNA binding of wt and mutants
A.5 ATPase activity of the wt and mutants
B.1 The overall features of the G40P hexameric helicase structure
B.2 Two distinct monomeric structures of G40P
B.3 Assembly of the G40P hexamer
B.4 G40P helicase activity and helicase-primase interactions
B.5 Functional interactions of B. subtilis DnaG primase with WT and mutant G40P helicase proteins
B.6 Data collection, phasing and refinement statistics for full-length and the ATPase domain of G40P
Abstract
The Apolipoprotein B editing enzyme catalytic polypeptide-like (APOBEC) family of 11
proteins deaminates cytidines on either single-stranded DNA (ssDNA) or RNA substrates,
introducing C to U mutations [28]. These mutagenic proteins are vital for a variety
of biological purposes, ranging from proper metabolism to development of strict and
efficient antibodies to prevention of virus infection [28]. This activity is controlled by a
signature domain motif, His-Xaa-Glu-Xaa(23-28)-Pro-Cys-Xaa(2-4)-Cys, where the histidine
and two cysteine residues coordinate the zinc (Zn) required for catalysis [24, 28, 50].
A duplication event occurred during evolution that produced four APOBECs with two
of these domains [29, 54]. One such member is APOBEC3G (A3G). In A3G, it has
been shown that only the second domain (CD2) is catalytically active, while the other
is responsible for RNA binding and various protein interactions [24].
A3G has been shown to potently restrict human immunodeficiency virus (HIV)
replication [24, 50]. A3G introduces multiple dC to dU deaminations in the viral cDNA during
reverse transcription, triggering degradation of the viral cDNA or resulting in defective
proviruses [24, 50]. A3G also appears to interfere with successful infection by HIV in
a non-deamination dependent manner [24, 50]. To counteract this antiviral activity
of A3G, HIV-1 encodes the Vif protein that binds and targets A3G for
proteasome-mediated degradation [24]. However, in the absence of Vif, A3G can effectively restrict
HIV-1 replication.
Prior to this study, structural information for this family of proteins was limited to
the structure of the APOBEC2 (A2) protein, which has an undetermined physiological function [98].
Here, this study presents the first X-ray structure of the C-terminal catalytic domain
of A3G (A3G-CD2) [49]. The A3G-CD2 structure highlights key functional regions, allowing
for insights to be made into substrate binding and specificity. Further studies of A3G
were performed to answer questions of substrate recognition, oligomerization, and overall
double domain structure. Altogether, the information presented here advances the
understanding of A3G's enzymatic activity and provides a solid base for future insightful
drug design aimed at improving A3G's intrinsic HIV restriction.
Chapter 1
Introduction
Science begins with a question, and every answer raises more. Even before grad school my
favorite biological question was: "What does it look like?" I quickly found out that this
meant that I should do structural biology, because of the incredibly
tight correlation between the structure of a thing and what it can do physically. In
biological terms, how a protein is folded and shaped directly controls what it can interact
with, what that interaction can do, and whether the protein is acting outward or being acted
upon. Because of this correlation, a single amino acid mutation can render a protein
completely inactive or, potentially worse, release it from any form of regulation and leave
it in a perpetually active state (think toddler on a sugar high with no supervision and
you will get a good idea of just how catastrophic this scenario can be). However, the
flip side of this correlation is that by knowing the structure, we can literally see the
function of the protein and then, ultimately, we can design a solution uniquely fitted to
the biological problem.
We became interested in the APOBEC family of cytidine deaminases because they
create deliberate DNA mutations to achieve a wide variety of vital biological functions,
antibody maturation and viral restriction, for instance. These proteins are perfect
examples of how human biology has achieved a precarious balance that carefully controls an
inherently destructive process to produce a positive outcome.
1.1 Cytidine Deamination
The chemical difference between cytosine and uracil is a simple change from an amine
group to an oxygen [Figure 1.1 a]. This reaction can occur in multiple contexts: as a
free base, a free nucleotide, or within a polynucleotide (DNA or RNA) substrate [26]. In
the first two contexts, this reaction is often completed as a vital part of the pyrimidine
salvage pathway with the relatively straightforward effect of recycling cytosine or cytidine
into uracil or uridine [26]. The biological ramifications of this change within a nucleic acid context,
however, are anything but simple. Uracil should pair only with adenine and be present
only in RNA. Cytosine should pair only with guanine, but can exist in either
nucleic acid. The differences in pairing partner result in predictable mutations, but
the presence of uracil in DNA triggers its repair by either base excision repair (BER) or
mismatch repair (MMR) mechanisms (reviewed in [28]). These repair mechanisms create
an opportunity for both mutation and double-strand DNA (dsDNA) breaks, which can
have even larger effects (reviewed in [95, 99]).
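To make the "predictable mutations" concrete, the following is a minimal Python sketch of what happens when a deaminated strand is copied without repair. It deliberately ignores BER and MMR, and the ten-base sequence, the function names, and the 0-based position are all invented for illustration; none of them come from the experiments in this thesis.

```python
# Illustrative sketch: fate of a single C-to-U deamination if the uracil
# is never repaired. The sequence and all names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand: str, position: int) -> str:
    """Return the strand with the cytosine at `position` (0-based) changed to uracil."""
    if strand[position] != "C":
        raise ValueError(f"position {position} holds {strand[position]!r}, not 'C'")
    return strand[:position] + "U" + strand[position + 1:]


def copy_strand(template: str) -> str:
    """Synthesize the complementary strand, written 5'->3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


original = "ATGACCCTGA"
edited = deaminate(original, 6)          # uracil now sits where a C used to be
second_strand = copy_strand(edited)      # polymerase pairs the U with an A
fixed_strand = copy_strand(second_strand)  # copying back fixes a T in place of the C

print(original, edited, second_strand, fixed_strand)
```

Under these simplifying assumptions the deaminated strand ends up with a C to T change, and the complementary strand carries the matching G to A change.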
1.2 APOBECs
The APOBEC family consists of eleven members, most of which have important
biological functions [Table 1.1]. They all contain at least one canonical Zn-dependent
deaminase motif: His-Xaa-Glu-Xaa(23-28)-Pro-Cys-Xaa(2-4)-Cys. Four of these proteins
actually contain two such domains, although only one of these double-domain proteins
has been reported to have two active domains. In APOBECs this motif is predicted
to mediate the reaction as in other well-known deaminases [11, 26, 135]: the histidine and
both cysteines coordinate a Zn atom, while the glutamate coordinates and activates a
water molecule in the active center [Figure 1.1 b]. The activated water and the Zn atom
work together in a nucleophilic attack on the C4 of the target cytidine, resulting in the
release of ammonia and successful conversion to uracil [Figure 1.1 a] [11, 24, 26, 28, 50,
135].
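As an illustration of how the canonical motif can be searched for in a protein sequence, the sketch below translates it into a regular expression over one-letter amino acid codes. The function name and the toy sequence are hypothetical; the toy sequence is synthetic filler built to contain one motif and is not a real APOBEC sequence.

```python
import re

# His-Xaa-Glu-Xaa(23-28)-Pro-Cys-Xaa(2-4)-Cys in one-letter code;
# "." stands for any residue (Xaa).
ZN_DEAMINASE_MOTIF = re.compile(r"H.E.{23,28}PC.{2,4}C")


def find_deaminase_motifs(protein: str) -> list[tuple[int, int]]:
    """Return (start, end) spans (0-based, end exclusive) of non-overlapping motif matches."""
    return [match.span() for match in ZN_DEAMINASE_MOTIF.finditer(protein.upper())]


# Synthetic example: two leading residues, one motif, two trailing residues.
toy_sequence = "MK" + "HAE" + "A" * 25 + "PC" + "AAA" + "C" + "GG"

print(find_deaminase_motifs(toy_sequence))
print(find_deaminase_motifs("MKAAAAPCAAAC"))  # no His-Xaa-Glu, so no match
```

A double-domain protein would be expected to return two spans from such a scan. A sequence match alone says nothing about whether a domain is catalytically active.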
While the chemistry of the reaction has been well characterized in other
deaminases for many years [11, 26, 135], efforts to determine the overall structure and domain
composition of the APOBEC proteins have only recently seen success. First, in 2007 the X-ray structure of
Table 1.1: Summary of the APOBEC family members
| Name | Primarily expressed | Specificity | Possible physiological function(s) | References |
|---|---|---|---|---|
| AID | Activated B cells, spleen, thymus, tonsils | A/G A/T C | Immunoglobulin-gene diversification | [16, 28] |
| APOBEC1 | Gastrointestinal tissues | G/C T C | APOB mRNA editing | [97, 28] |
| APOBEC2 | Heart and skeletal muscles | unknown | Muscle development? | [98, 28] |
| APOBEC3A | Peripheral leukocytes and keratinocytes | T/C C A | Retrovirus and retroelement restriction | [24, 28] |
| APOBEC3B | Heart, spleen, prostate, testis, ovary, thymus, peripheral blood leukocyte, many cancer cell lines | C/G T C | Retrovirus and retroelement restriction | [15, 28] |
| APOBEC3C | Heart, spleen, prostate, testis, ovary, small intestine, colon, peripheral blood leukocyte, various cancer cell lines | T C/T C | Retrovirus and retroelement restriction | [24, 28] |
| APOBEC3D | Thymus, spleen, liver, pancreas, prostate, various cancer cell lines | A/G A/G C | Retrovirus and retroelement restriction | [24, 28] |
| APOBEC3F | Peripheral blood leukocytes, spleen, testis, ovary | T T C | Retrovirus and retroelement restriction | [24, 28, 44, 45, 115] |
| APOBEC3G | Peripheral blood leukocytes, spleen, testis, ovary | C C C | Retrovirus and retroelement restriction | [24, 28, 44, 80, 82] |
| APOBEC3H | Testis, ovary, fetal liver and skin | H C C | Retrovirus and retroelement restriction | [24, 28, 85] |
| APOBEC4 | Testis | unknown | unknown | [28, 101] |
Figure 1.1: Deamination reaction and canonical active site: (a) Simple
representation of the deamination reaction and conversion of cytosine to uracil. (b) A
representation of the canonical active site. Zn is represented by a red sphere and is coordinated
by a histidine and two cysteine residues. H2O is represented by a light grey sphere and
is coordinated by the glutamate residue.
A2 was solved [98], and more recently there has been a veritable plethora of structures
of the C-terminus of A3G (A3G-CD2) [21, 40, 46, 49, 110]. The first X-ray structure
of A3G-CD2 is presented in Chapter 2 [49]. These structures have served to further set
the APOBEC family apart as a unique group of proteins, not only because they are the
first proteins shown to deaminate cytidine within the context of a nucleic acid, but also
because of their unique structural features, which are discussed in further detail in Chapter 2.
1.2.1 APOBEC1
APOBEC1 (A1) is the founding member of this protein family, and the reason why the
members of this functionally diverse family are named as numbered variants of the
Apolipoprotein B (apo B) mRNA editing catalytic subunit, or APOBEC (A1, A2, A3A-H, and A4).
Human A1's gene locus was first identified in 1994 [34], but a single enzyme responsible
for deaminating cytidine 6666 of the apo B mRNA, resulting in a change from a
glutamine codon to a stop codon, had been proposed earlier, in 1991 [48]. The apo B protein is
the primary component of low-density lipoprotein (a triglyceride-rich lipoprotein) and
specifically has the ligand binding region responsible for the interaction with the
low-density lipoprotein receptor [97]. The truncated protein is only generated in the intestines
and lacks this domain region; as such, it interacts with the chylomicron remnant receptor
instead, significantly altering the catabolism of the apo B/lipoprotein complex [97]. This
was the first identified case of a tissue-specific single-base change in an mRNA producing
two functionally distinct proteins from a single mRNA transcript [97].
A1 directly catalyzes a simple chemical change that results in wide-reaching biological
effects, and is restricted both through its localized expression and its requirement of a
complementary protein for proper catalysis (reviewed in [28]). Thus, not only is A1 the
founding member simply by virtue of being discovered and studied first, but it
embodies two of the leading characteristics of the APOBEC family: simple mutation causing
complex biological changes and careful regulation of its enzymatic function.
1.2.2 AID & APOBEC2
Activation induced cytidine deaminase (AID) and APOBEC2 (A2) are the oldest
APOBECs from an evolutionary standpoint [29], so it is fitting that AID is arguably
one of the most interesting and thus best characterized APOBECs [16, 28, 29, 43, 78, 77, 93,
96, 98, 100], and A2 was the first structure solved [98].
AID was first identified in 1999 using a subtractive screen between B cells that were
either switch-induced or uninduced [78]. Class switching is an important part of the
immune system and directs whether cells become long-lived plasma cells or
antigen-specific memory cells, i.e. whether they remain in innate defense mode or adaptively
respond to antagonistic challenges [78]. The next year the same group conclusively
Figure 1.2: A2, the first X-ray APOBEC structure (PDB code 2NYT) [98]: (a)
Overall tetrameric organization of A2; it is a dimer of dimers, as there are two distinct
monomer conformations, see (b)-(c). Each core monomer consists of six α-helices and
five β-strands. Zn is represented by a red sphere. (b) Representative monomer with
the open active site conformation. There is an extended β-1 strand that pulls the loop
up and away from the Zn atom. (c) Representative monomer with the closed active
site conformation. There is a flexible loop that bends toward the Zn atom and may
physically block access to the catalytic Zn, resulting in an inactive conformation.
showed that AID was required for both class switch recombination (CSR) [77, 100] and
somatic hypermutation (SHM) [100]. Both papers clearly demonstrated that an AID
deficiency was directly responsible for deficiencies in both CSR and SHM. It was initially
suspected that AID was deaminating an RNA molecule that somehow controlled
these processes, in a manner similar to that seen with A1. However, in vitro activity
of AID was demonstrated in 2003: once an inhibitory RNA molecule had been removed by
RNase treatment during purification, AID directly deaminated cytidines
within a DNA substrate [16, 96]. Again, we see a simple activity with significant and
necessary biological effects, while the presence of an inhibitory RNA molecule
suggests a mechanism of regulation preventing widespread chaos. And indeed, AID has
been implicated in a variety of cancers, ranging from B-cell lymphomas to breast cancer
[4, 6, 24, 51, 62, 69, 86, 89, 90, 94, 102].
In fairly stark contrast to AID, very limited biological information is available
for A2. It is expressed in cardiac and skeletal muscles [67] and upregulated by TNF-α and
interleukin-1 [75]. However, in 2007 A2 was the first APOBEC structure to be solved
and provided several invaluable pieces of information applicable to other APOBECs [98].
Aside from the general protein structure and folding, because of the high homology of
this family, key residues and structural features were highlighted when the 3D structural
information was applied to residue alignments [See Figure 1.2]. This was immediately
useful for mapping AID polymorphic residues that contribute to Hyper IgM onto the
A2 structure by using an alignment between AID and A2 [98]. In addition, differences
highlighted in the alignment when compared to the structure suggested a possible
mechanism of internal regulation for A2 [98], again highlighting the risk these vitally useful
proteins pose to human health.
1.2.3 APOBEC3
Not only were the APOBEC3 proteins the most recently discovered, but they are the
evolutionarily youngest as well [54]. At the genetic level, the seven APOBEC3 proteins
cluster in a single locus on chromosome 22, presumably the result of a rapid expansion
[29, 54, 63, 85]. The APOBEC3 locus is restricted to mammals and can contain
anywhere from one gene (APOBEC3, in mouse) to seven (APOBEC3A-H, in humans and
primates) [63]. As mentioned earlier, in humans four of these APOBEC3 proteins contain
two canonical deaminase motifs (A3B, A3D, A3F, and A3G); this is likely the result
of another duplication/expansion event [See Table 1.2] [29, 54, 63, 85]. Of these four
proteins there are numerous reports indicating that only the C-terminal domain is
enzymatically active [44, 45, 80, 82, 57, 115], and one reporting activity for both domains
of A3B [15]. This is especially interesting considering the high level of inter-domain
homology [See Figure 1.3, Table 1.2].
The primary biological function of all seven of the APOBEC3 proteins is defense against
foreign nucleic acids, including the populous and pesky retrotransposons and, more
famously, retroviruses such as HIV-1 [24, 28, 50]. Nearly 45% of the human genome
is composed of DNA derived from retrotransposons. As such, human cells have been
fighting to prevent further proliferation of both transposable elements and new invasion
by retroviruses for a long time. This conflict can rather aptly be described as a struggle
to outmaneuver one another, a kind of evolutionary tug-of-war where each new adaptive
advantage results in a selection of a new response strategy. This is the evolutionary
context that may have driven the rapid expansion of the APOBEC3 locus [29, 105] and
the subsequent individualized adaptations in response to their unique challenges.
1.2.4 APOBEC3G
When I began this thesis project, my questions were: "What does an APOBEC protein look
like? How are they oligomerizing? Are they at all like A2?" In order to address these
questions I designed multiple constructs spanning the entire APOBEC family, searching
for a soluble protein construct to crystallize. Of 28 constructs, I identified the
C-terminal domain of A3G (A3G-CD2, residues 197-380) as highly soluble. A lot of
careful work went into the design of these constructs, but I saw getting this particular
construct as lucky for many reasons.
While all APOBEC3 proteins have been shown to function as either an anti-viral
or anti-retrotransposon defense mechanism, A3G is the founding member of this branch
of the family business. It was first identified in 2002, by using a subtractive screen to
isolate the endogenous host factor responsible for rendering certain cells non-permissive
to HIV infection [112]. The paper identifying the APOBEC3 gene locus on chromosome 22
had been published a mere month earlier; however, that study could only guess at a
possible RNA-based cytidine deaminase activity based on homology with the only
APOBEC member characterized in vivo, A1 [54]. Since the initial discovery of A3G, there
has been an explosion of studies characterizing every aspect imaginable.
Because of the wealth of information provided within these studies, there is now a
very detailed and complex picture of A3G. Biochemically, A3G deaminates predominantly
the 3' C of a CCC motif, and within a single-stranded DNA (ssDNA) context it acts
in a manner both directional (5'→3') and processive [18]. Functionally, A3G restricts
multiple retrotransposons and retroviruses. The most famous and best characterized is
HIV, and A3G appears to manage this restriction primarily through deaminase-dependent
mechanisms [13, 45, 47, 71, 138]. There are a few reports of deaminase-independent
restriction [82, 115], but this possibility is somewhat debated and might be concentration
dependent [50]. Both mechanisms are likely and probably result from the competitive
conflict between A3G and HIV.
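The sequence preference described above can be sketched as a simple scan of an ssDNA string for CCC trinucleotides, reporting the 3' C of each as a candidate target. This is only an illustration of the motif preference: the function name and the test sequence are hypothetical, and the sketch does not model directionality, processivity, or relative site efficiency.

```python
def ccc_target_sites(ssdna: str) -> list[int]:
    """Return 0-based indices of the 3' C of every CCC trinucleotide.

    The sequence is read 5'->3'. Overlapping motifs are all reported,
    so a run of four Cs yields two candidate sites.
    """
    seq = ssdna.upper()
    return [i + 2 for i in range(len(seq) - 2) if seq[i:i + 3] == "CCC"]


# Hypothetical ssDNA fragment written 5'->3'.
fragment = "ATCCCGTACCCCA"
print(ccc_target_sites(fragment))
```

For this fragment the scan reports the last C of the first CCC and the last two Cs of the CCCC run.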
As mentioned before, both sides of the conflict are pressured to adapt to the other.
One example of HIV adapting to A3G is that A3G only restricts HIV when the viral
infectivity factor protein (Vif) is not present (reviewed in [24]). The A3G-Vif interaction
is highly specific, requiring A3G's D128 to bind in a species-specific manner, meaning
that A3G's residue 128 determines whether HIV-1 Vif will recognize that species of A3G
(human, macaque, chimpanzee, etc.) [14, 72, 108]. Vif recognition results primarily
in A3G restriction due to targeted ubiquitination and proteasomal degradation of A3G
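| | A3H | A3G-CD2 | A3G-CD1 | A3F-CD2 | A3F-CD1 | A3D-CD2 | A3D-CD1 | A3C | A3B-CD2 | A3B-CD1 | A3A |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A3A | 33 | 65 | 30 | 39 | 31 | 41 | 31 | 35 | 88 | 32 | 100 |
| A3B-CD1 | 29 | 35 | 48 | 48 | 81 | 45 | 82 | 52 | 38 | 100 | |
| A3B-CD2 | 30 | 64 | 35 | 38 | 36 | 40 | 36 | 38 | 100 | | |
| A3C | 30 | 40 | 44 | 79 | 46 | 78 | 50 | 100 | | | |
| A3D-CD1 | 25 | 34 | 44 | 46 | 72 | 44 | 100 | | | | |
| A3D-CD2 | 30 | 41 | 44 | 87 | 44 | 100 | | | | | |
| A3F-CD1 | 30 | 32 | 60 | 45 | 100 | | | | | | |
| A3F-CD2 | 32 | 40 | 44 | 100 | | | | | | | |
| A3G-CD1 | 25 | 33 | 100 | | | | | | | | |
| A3G-CD2 | 34 | 100 | | | | | | | | | |
| A3H | 100 | | | | | | | | | | |
Table 1.2: Homology of individual domains of APOBEC3 family members: The Clustal W webtool [23] was used to
produce pairwise scores (number of identical residues divided by the total number of residues compared and converted to a
percentage) for the individual A3 domains. Values greater than 50% are highlighted in green.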
10
Figure 1.3: Alignment of the individual A3 domains: Generated using the Clustal W webtool and visualized with Jalview
[23, 129]. Purple box designates the loop 7 responsible for substrate specicity and residues involved in N-terminal to N-terminal
oligomerization of A3G.
11
[14, 24, 53, 72, 108, 139]. Although again, other mechanisms of Vif directed restriction
of A3G also exist in yet another example of how HIV and A3G are engaged in a complex
winner-take-all tug-of-war.
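The pairwise score described in the Table 1.2 legend (identical residues divided by residues compared, as a percentage) is simple to reproduce. The following Python sketch is illustrative only: the function name and the toy aligned sequences are hypothetical, and skipping alignment columns that contain a gap is an assumption about how "residues compared" is counted, not a description of how Clustal W computes its scores.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two sequences taken from the same alignment.

    Columns in which either sequence has a gap ('-') are skipped; this is an
    assumption made for the sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # gap column: not counted as a compared residue
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        raise ValueError("no residue pairs to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real A3 domain sequences).
    print(round(percent_identity("HAEL-CFL", "HVELRCFL"), 1))
```

Applied to every pair of aligned domains, a function of this kind would fill a triangular matrix like the one in Table 1.2.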
A further wrinkle in this battle of protein versus virus is that A3G is an efficient
mutator protein. This raises the possibility that A3G-induced mutations actually aid
HIV's rapid adaptation and escape. While that question is both important and
useful for highlighting the complicated balance between an aggressive function that creates an
effective defense and regulation of that function to prevent self-destruction,
it is beyond the scope of this thesis.
1.3 Overview of Chapters
The work presented in this thesis details my contributions to the overall
understanding of this important and complex protein. In Chapter 2, I describe our X-ray
structure of the C-terminal catalytic domain of A3G (A3G-CD2). We made several
mutations to residues within a visible groove in the structure and tested their influence
on substrate specificity, ssDNA binding and deaminase activity. Our results support a
unique model of ssDNA binding within this groove, and provide a strong basis for future
work aimed at understanding how A3G deaminates and restricts HIV.
Chapter 3 details my attempts to answer the next big structural question of A3G:
What does the full-length double-domain protein look like? This Chapter is a discussion
of the progress made towards crystallizing the full-length A3G protein. These results
include biophysical studies to understand the forces influencing the dynamic
oligomerization of A3G and provide a strong basis for future work aimed at determining the
structure of this anti-viral protein.
Chapter 4 summarizes the work presented in this thesis, and discusses two of the
questions raised by this work to be addressed in the future.
Chapter 2
A3G-CD2
Reproduced with permission from Holden, L.G., Prochnow, C., Chang, Y.P., Bransteitter,
R., Chelico, L., Sen, U., Stevens, R.C., Goodman, M.F., Chen, X.S. 2008. Crystal
structure of the anti-viral APOBEC3G catalytic domain and functional implications.
Nature Nov 6;456(7218):121-4. Copyright 2008 Macmillan Publishers Limited. [49]
Author contributions: L.G.H. designed, purified and crystallized the protein
construct. C.P. provided advice for the design of the project. Y.P.C. solved the structure.
C.P. and R.B. provided analysis of the structure and wrote the manuscript. L.G.H. and
L.C. designed and purified mutants and performed the biochemical assays. U.S. assisted
with the diffraction data analysis. R.C.S., M.F.G., and X.S.C. supervised the project.
2.1 Overview
The APOBEC family members are involved in diverse biological functions. APOBEC3G
restricts the replication of human immunodeficiency virus (HIV), hepatitis B virus and
retroelements by cytidine deamination on ssDNA or by RNA binding [24, 28, 43, 93]. Here
we report the high-resolution crystal structure of the C-terminal deaminase domain of
APOBEC3G (A3G-CD2) purified from Escherichia coli. The A3G-CD2 structure has
a five-stranded β-sheet core that is common to all known deaminase structures and
closely resembles the structure of another APOBEC protein, APOBEC2 (A2) [98]. A
comparison of A3G-CD2 with other deaminase structures shows a structural conservation
of the active-site loops that are directly involved in substrate binding. In the X-ray
structure, these A3G active-site loops form a continuous substrate groove around the
active center. The orientation of this putative substrate groove differs markedly (by 90°)
from the groove predicted by the NMR structure [21]. We have introduced mutations
around the groove, and have identified residues involved in substrate specificity, ssDNA
binding and deaminase activity. These results provide a basis for understanding the
underlying mechanisms of substrate specificity for the APOBEC family.
2.2 Materials and Methods
2.2.1 Purification and Crystallization
A3G-CD2 was expressed and purified as a recombinant GST-fusion protein in E. coli.
Purified GST-fusion protein was digested by PreScission protease. Further purification
of the A3G-CD2 protein was completed using Superdex-75 gel filtration chromatography
in 50 mM HEPES, pH 7.0, 250 mM NaCl and 1 mM dithiothreitol. Native and
Se-Met-labelled proteins were concentrated to 25 mg ml⁻¹. Crystals were grown at 18 °C by
hanging-drop vapor diffusion from a reservoir solution of 100 mM MES, pH 6.5, 40%
PEG 200.
2.2.2 Structure determination and refinement
Selenium-substituted methionine protein crystals were used for collecting Se-MAD data
using the ALS synchrotron beam source. Data were processed with HKL3000 [87]. A
total of three selenium sites and one zinc site were located by the SHELXD [107] program
using MAD data in the 50 to 3.0 Å resolution range. The SHARP program was
used to calculate the experimental and model-combined phases using the MAD data in
the resolution range of 50 to 2.3 Å, as well as for density modification. The model was
built with O using the high-quality electron density map obtained, and was refined with
CNS to 2.3 Å resolution with excellent statistics. The final refinement statistics and
geometry as defined by Procheck were in good agreement and are summarized in Table
2.1. Structure figures were designed using PyMOL [32].
Table 2.1: Data collection (MAD) and model refinement statistics
                                   Peak Se (λ = 0.97907 Å)    Inflection Se (λ = 0.97921 Å)
Data collection
Cell dimensions (Space group C2)
  a, b, c (Å)                      83.464, 57.329, 40.578     83.464, 57.329, 40.578
  α, β, γ (°)                      90, 96.46, 90              90, 96.46, 90
Resolution (Å)                     50-2.20 (2.28-2.20)        50-2.38 (2.38-2.30)
Observations                       9675                       8564
Rmerge                             12.1 (37.8)                9.9 (41.2)
I/σI                               19.7 (4.6)                 17.0 (3.0)
Completeness (%)                   99.8 (99.7)                99.8 (99.2)
Refinement
Resolution (Å)                     30.0-2.3
No. reflections                    15669
Rwork/Rfree                        25.10/26.70
B-factor (Averaged)                Protein 33.221; Water 34.725
R.m.s. deviations                  Bond lengths (Å) 0.010; Bond angles (°) 1.2
Highest-resolution shell values are shown in parentheses.
2.2.3 Construction of A3G mutants
Mutant A3G proteins (D316R/D317R, R313E/R320D and R374E/R376D) were
constructed by site-directed mutagenesis using the pAcG2T-A3G vector as
the template. The following primers and their complementary strands were
used:
5'-CTTCACTGCCCGCATCTATAGAAGACAAGGAAGATGTCAGGAG-3' (D316R/D317R),
5'-CTGTGCATCTTCACTGCCGAGATCTATGATGATCAAGGAGATTGTCAGGAGGGGCTGCGC-3' (R313E/R320D),
5'-GAGCACAGCCAAGACCTGAGTGGGGAGCTGGACGCCATTCTCCAGAATCAGG-3' (R374E/R376D).
The entire coding region of the A3G mutant constructs was verified
by DNA sequencing. The mutant plasmids were then co-transfected, according to the
manufacturer's protocol, with linearized baculovirus DNA (BD Biosciences) to generate
recombinant mutant A3G baculovirus. Wild-type and mutant A3G expression in Sf9
insect cells and purification were carried out as described previously [20]. Mutant E.
coli GST-A3G proteins (R213E, R215E, K249E, R256E, W285A, F289A, Y315A and
N244A) were constructed by site-directed mutagenesis using the pGEX-6P1-GST-A3G
vector as the template. The following primers and their complementary strands were
used:
5'-AATGAACCTTGGGTTGAAGGTCGTCACGAGACTTAC-3' (R213E),
5'-GAACCTTGGGTTCGTGGTGAACACGAGACTTACCTG-3' (R215E),
5'-TGTAACCAGGCCCCGCACGAGCACGGTTTTCTGGAA-3' (K249E),
5'-GCACGGTTTTCTGGAAGGTGAACACGCCGAACTGTG-3' (R256E),
5'-GTTACCTGCTTTACCTCTGCGTCCCCGTGCTTTTCC-3' (W285A),
5'-ACCTCTTGGTCCCCGTGCGCTTCCTGCGCACAAGAA-3' (F289A),
5'-ATCTTCACTGCACGTATTGCCGACGACCAGGGCCGT-3' (Y315A),
5'-CGTCGTGGTTTCCTGTGTGCCCAGGCCCCGCACAAGCAC-3' (N244A),
5'-CGTCGTGGTTTCCTGTGTAGACAGGCCCCGCACAAGCAC-3' (N244R).
The entire coding region of the A3G mutant constructs was verified by DNA sequencing.
Plasmids were expressed in XA90 E. coli cells, which were lysed by French press. Further
purification was carried out as described previously [20].
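Each mutagenic primer listed above was used together with its complementary strand. As a small illustration of how that partner strand follows from the listed sequence, here is a Python sketch that computes a reverse complement. The function and constant names are hypothetical and are not part of the original protocol; only the R213E primer sequence is taken from the list above.

```python
# Translation table for Watson-Crick complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'.

    Whitespace is ignored; any character other than A, C, G or T is rejected.
    """
    cleaned = "".join(sequence.split())
    unexpected = set(cleaned.upper()) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return cleaned.translate(_COMPLEMENT)[::-1]


# R213E forward primer as listed in the text (written 5' to 3').
R213E_FORWARD = "AATGAACCTTGGGTTGAAGGTCGTCACGAGACTTAC"

if __name__ == "__main__":
    # The complementary-strand primer, also written 5' to 3'.
    print(reverse_complement(R213E_FORWARD))
```

Applying the function twice returns the original primer, which is a quick sanity check when transcribing long oligonucleotides.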
2.2.4 DNA binding
A3G DNA binding was monitored by changes in steady-state fluorescence depolarization
(rotational anisotropy). Reaction mixtures (70 μl), containing fluorescein-labelled DNA
(50 nM) in buffer (50 mM HEPES, pH 7.3, 1 mM dithiothreitol and 5 mM MgCl₂) and
varying concentrations of A3G (0 to 500 nM), were incubated at 37 °C. The sequence of
the ssDNA was 5'- TTAGATGAGTGTAA(fluorescein-dT)GTGATATATGTGTAT -3'.
Rotational anisotropy was measured as described previously [18]. The fraction of DNA
bound to protein was determined as described previously [10].
2.2.5 Deamination activity, processivity and directionality assays
In an assay for deamination activity, A3G (0.024-10 μM) was allowed to react with 500
nM fluorescein-dT-incorporated ssDNA for 10 or 15 min and subsequently treated with
uracil-DNA glycosylase and resolved on 16% urea-PAGE for analysis as described
previously [18]. Specific activity, measured as fmoles of substrate deaminated per μg of
enzyme per minute, was calculated from the percent deamination of a ssDNA substrate
over a range of enzyme concentrations. The ssDNA substrate used for specific activity
measurements was
5'- GG (fluorescein-dT) AGTTTAGTGGTTTGTATAGAATTAATACCCAAAGAAGTGTATGTAATTGTTATGATAAGATTGAAA -3'.
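The arithmetic behind a specific activity value (substrate deaminated per unit mass of enzyme per minute) can be sketched as follows. This is only a hedged illustration in Python: the function name and all input numbers are hypothetical, the calculation is shown for a single reaction, and how values from the range of enzyme concentrations were combined is described in [18], not here.

```python
def specific_activity(percent_deaminated: float,
                      substrate_fmol: float,
                      enzyme_mass: float,
                      minutes: float) -> float:
    """Substrate deaminated (fmol) per unit enzyme mass per minute.

    The result carries whatever mass unit is used for `enzyme_mass`.
    """
    if not 0.0 <= percent_deaminated <= 100.0:
        raise ValueError("percent_deaminated must lie between 0 and 100")
    if enzyme_mass <= 0 or minutes <= 0:
        raise ValueError("enzyme_mass and minutes must be positive")

    product_fmol = substrate_fmol * percent_deaminated / 100.0
    return product_fmol / (enzyme_mass * minutes)


if __name__ == "__main__":
    # Hypothetical reaction: 12% of 5000 fmol substrate converted in 10 min
    # by 0.5 mass units of enzyme.
    print(specific_activity(12.0, 5000.0, 0.5, 10.0))
```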
To analyze processivity and directionality, substrate use was kept below 15% to
maintain single-hit kinetics. The 'processivity factor' is defined as the ratio of the
observed fraction of double deaminations (occurring at both 5'C and 3'C on the same
molecule) to the predicted fraction of independent double deaminations [18]. A
processivity factor of greater than one indicates that most of the double deaminations are
caused by the same A3G molecule acting processively on both C targets. The deamination
bias is measured by the ratio of 5'C/3'C deaminations [18]. The substrate used to determine
processivity and directionality was
5'- AAAGAGAAAGTGATACCCAAAGAGTAAAGT (fluorescein-dT) AGATAGAGAGTGATACCCAAAGAGTAAAGTTAGTAAGATGTGTAAGTATGTTAA -3'.
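The processivity factor and deamination bias defined above can be written out as a short calculation. The Python sketch below makes two assumptions that go beyond the text: the "predicted fraction of independent double deaminations" is taken to be the product of the total deamination frequencies at the 5'C and 3'C sites, and the bias is computed from single deaminations only. The function names and example band fractions are hypothetical; reference [18] should be consulted for the exact procedure.

```python
def processivity_factor(frac_5c_only: float,
                        frac_3c_only: float,
                        frac_both: float) -> float:
    """Observed double deaminations divided by the fraction expected if the
    two sites were deaminated independently.

    Inputs are fractions of total substrate (0 to 1) in each product band.
    """
    # Total frequency at each site includes the doubly deaminated molecules.
    total_5c = frac_5c_only + frac_both
    total_3c = frac_3c_only + frac_both
    predicted_both = total_5c * total_3c  # independence assumption
    if predicted_both == 0:
        raise ValueError("no deamination observed at one or both sites")
    return frac_both / predicted_both


def deamination_bias(frac_5c_only: float, frac_3c_only: float) -> float:
    """Ratio of single 5'C deaminations to single 3'C deaminations."""
    if frac_3c_only == 0:
        raise ValueError("no 3'C deaminations observed")
    return frac_5c_only / frac_3c_only


if __name__ == "__main__":
    # Hypothetical band fractions at less than 15% substrate use.
    print(processivity_factor(0.06, 0.03, 0.02))
    print(deamination_bias(0.06, 0.03))
```

With these example fractions, doubles are observed five times more often than independence would predict, which in the terms used above would indicate processive action.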
2.3 Results and Discussion
We have purified the human wild-type C-terminal cytidine deaminase domain of
APOBEC3G (A3G-CD2, residues 197-380) expressed in E. coli. A3G-CD2 (with and
without a glutathione S-transferase (GST) tag) is highly soluble, and deaminates
cytidine to uridine on single-stranded DNA (ssDNA) with a specific activity of 5 fmol mg⁻¹
min⁻¹, which is about 25-fold lower than that of the full-length A3G (GST-A3G; 126
fmol mg⁻¹ min⁻¹) expressed in E. coli [Figure 2.1 a].
We analyzed the processive and polar properties of A3G-CD2 and full-length A3G
[Figure 2.1 b]. Similar to the insect-cell derived full-length A3G [18, 20], the full-length
A3G expressed in E. coli processively deaminates cytidine in two 5' CCC 3' motifs
located on a ssDNA substrate, during one binding event [Figure 2.1 b]. The full-length
A3G also exerts a 3' to 5' deamination bias by preferentially deaminating the cytidine
in the CCC motif near the 5' end of the ssDNA substrate [Figure 2.1 b]. In contrast,
A3G-CD2 exhibits an approximate two-fold decrease in processivity and virtually no
3' to 5' deamination bias [Figure 2.1 b]. These results indicate that A3G-CD2 partially
retains the catalytic properties of full-length A3G, but that the CD1 domain in the
context of the full-length A3G is probably required for the strong processive property
and the 3' to 5' deamination bias on ssDNA.
Figure 2.1: Enzymatic activity of A3G-CD2: (a) Analysis of the deamination activity
for full-length GST-A3G, GST-A3G-CD2 (GST-CD2) and A3G-CD2 (CD2).
The 32-nucleotide (nt) band indicates deamination activity. F represents the position of
fluorescein-dT incorporated into the ssDNA. (b) A3G processivity and the 3' to 5'
deamination bias were characterized on ssDNA with two CCC motifs. Single deaminations of
the 5'C and the 3'C appear as 67- and 48-nucleotide fragments, respectively; deamination
of both the 5'C and the 3'C results in a 30-nucleotide fragment (see Methods).
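The fragment sizes quoted in the legend of Figure 2.1 can be checked against the substrate sequences given in Methods. The Python sketch below does this under stated assumptions: the deaminated base is taken to be the 3' C of each CCC motif, uracil-DNA glycosylase treatment and strand cleavage are assumed to remove the deaminated nucleotide so that it belongs to neither fragment, and only the fragment carrying the fluorescein label is counted. The placeholder character F for fluorescein-dT and all function names are hypothetical.

```python
import re

# Substrates from Methods, with "F" standing in for fluorescein-dT.
ACTIVITY_SUBSTRATE = (
    "GG" "F"
    "AGTTTAGTGGTTTGTATAGAATTAATACCCAAAGAAGTGTATGTAATTGTTATGATAAGATTGAAA"
)
PROCESSIVITY_SUBSTRATE = (
    "AAAGAGAAAGTGATACCCAAAGAGTAAAGT" "F"
    "AGATAGAGAGTGATACCCAAAGAGTAAAGTTAGTAAGATGTGTAAGTATGTTAA"
)


def target_positions(substrate: str) -> list:
    """1-based positions of the 3' C in every CCC motif."""
    return [match.start() + 3 for match in re.finditer("CCC", substrate)]


def labeled_fragment_length(substrate: str, cut_positions) -> int:
    """Length of the fragment that keeps the label after cleavage.

    The nucleotide at each cut position is assumed to be lost.
    """
    label = substrate.index("F") + 1  # 1-based label position
    left = max((c for c in cut_positions if c < label), default=0)
    right = min((c for c in cut_positions if c > label),
                default=len(substrate) + 1)
    return right - left - 1


if __name__ == "__main__":
    five_c, three_c = target_positions(PROCESSIVITY_SUBSTRATE)
    print(labeled_fragment_length(PROCESSIVITY_SUBSTRATE, [five_c]))
    print(labeled_fragment_length(PROCESSIVITY_SUBSTRATE, [three_c]))
    print(labeled_fragment_length(PROCESSIVITY_SUBSTRATE, [five_c, three_c]))
    print(labeled_fragment_length(ACTIVITY_SUBSTRATE,
                                  target_positions(ACTIVITY_SUBSTRATE)))
```

Under these assumptions the sketch reproduces the 67-, 48- and 30-nucleotide fragments for the two-motif substrate and the 32-nucleotide band for the specific activity substrate.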
We crystallized the wild-type A3G-CD2 and solved the structure by using the
multi-wavelength anomalous dispersion (MAD) phasing method with selenium-substituted
methionine (Se-Met) diffraction data. The 2.3 Å resolution X-ray structure shows a
core β-sheet that is composed of five β-strands surrounded by six α-helices [Figure 2.2].
Helices 2-4 (h2-h4) are packed alongside one face of the core β-sheet [Figure 2.2], whereas
helix 1 (h1) and helix 5 (h5) are packed against the opposite face of the β-sheet
[Figure 2.2]. Helix 6 (h6) is located at the edge of the β-sheet core and is perpendicular to
the β5 strand [Figure 2.2].
Figure 2.2: X-ray structure of A3G-CD2: Two views of the A3G-CD2 domain
rotated 90° showing the five-stranded β-sheet core surrounded by six helices. The zinc
is represented as a red sphere.
A recently reported NMR structure of an A3G-CD2 mutant (A3G-2K3A, Protein
Data Bank (PDB) accession 2JYW) [21] resembles the X-ray structure of the wild-type
A3G-CD2. However, the superposition of the two structures shows notable differences
[Figure 2.3 a]. The β2 strand and the amino-terminal helix (h1) are absent from the
NMR structure [Figure 2.3 b]. The residues which form the β2 strand in the X-ray
structure form a loop-like bulge in the NMR structure [Figure 2.3 (thickened loop)].
Superposition of the A3G-CD2 X-ray structure and the A3G-2K3A NMR structure has
an RMSD of 4.8 Å² versus an RMSD of 2.7 Å² between the A3G-CD2 and the A2 X-ray
structures. The overlay RMSD values indicate that the A3G-CD2 X-ray structure differs
significantly more from the NMR structure than it does from the A2 structure.
Figure 2.3: Structural comparison of the A3G-CD2 X-ray structure with the
A3G-2K3A NMR structure: (a) Superposition of the A3G-CD2 X-ray structure
(yellow) and the A3G-2K3A NMR structure (grey) (RMSD = 4.8 Å²) [21]. Superposition
of the A3G-CD2 (yellow) and an A2 monomer (cyan) (RMSD = 2.7 Å², inset). (b) Two
views of the superposition of the A3G-CD2 X-ray structure (yellow) and A3G-2K3A
NMR structure (grey) with helices 2, 3, and 4 removed to show the differences in h1,
β2 strand, AC-loop 1, and AC-loop 3. The view is rotated 180°. Highlighted are two of
the five point mutations, L234K and C243A, that were made in order to obtain soluble
protein for the A3G-2K3A NMR structure determination. These mutations are located
on the N- and C-termini of the β2 strand of the X-ray structure (blue), and on the
loop-like bulge of the NMR structure (green).
The β2 strand in the X-ray structure does not make crystal contacts with neighboring
monomers. Thus, the formation of the β2 strand is unlikely to be the result of crystal
contacts. Furthermore, a similar β2 strand within a five-stranded β-sheet core is a
common structural feature that is observed in all wild-type cytidine deaminase structures
available so far [Figure 2.4]. The distinct h1, h4, and h6 structures are unique to the
X-ray structures of APOBEC proteins [Figure 2.4 a-b]. The number and positions of the
surrounding helices influence how the deaminases oligomerize [Figure 2.4 b-f (insets)].
The long h4 and h6 of A3G-CD2 and A2 are unique structural features that are absent
from the other cytidine deaminases, and that guide the elongated oligomerization so that
the active sites are likely to be accessible to DNA and RNA substrates. The elongated
A2 tetramer is formed from a head-head interaction between two dimers (h4 and h6 from
each dimer are labeled) [Figure 2.4 b]. Each A2 dimer is formed through the pairing
of β2 strands from each monomer (β2 strands are labeled in left dimer). In contrast,
the human free-nucleotide cytidine deaminase (hCDA) monomer [26] and the ScCDA
monomer [135] form square-shaped tetramers [Figure 2.4 d-e]. The E. coli free-nucleotide
cytidine deaminase (ECDA) monomer [11] also forms a square-shaped oligomer. However, in
ECDA, the h4* region connects the larger catalytic N-terminal domain with the smaller
pseudo-catalytic domain at the C-terminus. The APOBEC structures [Figure 2.4 a-b]
clearly show that the "linker" region forms a long helix 4 that is followed by the β5
strand, h5 and h6 before reaching the end of the domain. There is no pseudo-catalytic
domain equivalent to that of ECDA present in A3G or other APOBEC members, and
future modeling of APOBEC oligomerization should take these structural features into
account.
As highlighted by the six X-ray structures [Figure 2.4 a-f], an intact full-length
β2 strand and the five-stranded β-sheet core are probably the defining structural features
of wild-type A3G-CD2 and all other APOBEC proteins. The structural differences
observed in the NMR structure could have resulted from the five mutations on the A3G
protein used for NMR study [Figure 2.3 a], or from the different methodology used for
determining the structure, or from both.
A superposition of the core structures of A3G-CD2 and A2 monomers shows
substantial overlap for all five β-strands and all six helices [Figure 2.5 a], suggesting that
these core structures of APOBEC family members are highly conserved. Yet, the
structural overlap shows notable differences in the active center (AC) loops, referred to as AC
loops 1 and 3, which potentially mark the differences in substrate use and activity of the
two proteins [Figure 2.5 b-c]. The AC loop 1, which connects h1 with the β1 strand, is
located further away from the active site in A3G than in A2 [Figure 2.5 b-c]. The A3G
AC loop 3 is longer than that of A2 and is positioned further away from the active site
[Figure 2.5 b-c].
Figure 2.4: Common and distinct structural features between APOBEC proteins and other Zn-deaminase
superfamily enzymes: Monomer and oligomer (insets) X-ray structures of various deaminases. The active site Zn is represented by
a red sphere. (a) The A3G-CD2 monomer (PDB code 3E1U). (b) The A2 monomer [98]; (inset) an A2 tetramer (PDB 2NYT).
(c) The Staphylococcus aureus tRNA adenosine deaminase TadA monomer [68]; (inset) a TadA dimer (PDB 2B3J). (d) The
human free-nucleotide cytidine deaminase (hCDA) monomer [26]; (inset) a square-shaped hCDA tetramer (PDB 1MQ0). (e)
The Saccharomyces cerevisiae CDA monomer (ScCDD1) [135]; (inset) a square-shaped ScCDD1 tetramer (PDB 1RST). (f) The
E. coli free-nucleotide cytidine deaminase monomer (ECDA) [11]; (inset) a square-shaped ECDA dimer (PDB 1ALN).
Figure 2.5: Superposition of A3G-CD2: (a) Core structures of A3G-CD2 (yellow)
and A2 (cyan) superimposed. The red sphere represents zinc. (b),(c) The superposition
of A3G-CD2 and an A2 monomer, with the AC loop 1 collapsed over the active site
(conformation 1, (b)) or forming an α-hairpin (conformation 2, (c)).
Figure 2.6: A3G-CD2 AC loops: (a) In A3G-CD2, the AC loop 1 R215 residue forms
hydrogen bonds (green dashes) with F204, W211, N207, E209 and W285 (pink). The
R215 aliphatic chain hydrophobically packs with F204, R313 and W285. (b) The A3G-
CD2 AC loop 3 residues, R256, F252, L253, H248 and Q245 (pink), form main-chain
hydrogen bonds (green dashes). The conserved N244 is shown in cyan. The active site
residues are H257, C288 and C291 (wheat).
The structure shows elaborate bonding interactions that can stabilize the open
conformation of A3G AC loops 1 and 3. For example, AC loop 1 forms an extensive bonding
network through R215, which anchors this loop to other parts of the structure
[Figure 2.6 a]. R215 interactions include direct contact with R313 and W285 located
near the core structure [Figure 2.6 a]. We demonstrate later that the R215E mutation
in A3G abolishes deamination activity, which is consistent with a previous study [22].
Similarly, the A3G AC loop 3 is stabilized by multiple hydrogen bonds between the
main-chain atoms of residues R256, F252, L253, H248, and Q245 within the loop
[Figure 2.6 b]. The loop residue R256 interacts with D264 on a core helix by a strong salt
bridge, and R256 hydrophobically packs with the loop residue F252 by the long aliphatic
chain [Figure 2.6 b]. All of these interactions should help stabilize the conformation of
AC loop 3. As shown later, the A3G R256E mutation, which probably disrupts the AC loop
3 conformation, greatly impairs deamination activity.
In the active site of A3G-CD2, a zinc atom is coordinated by the three residues
H257, C288, and C291, and a water molecule [Figure 2.7 a]. The closely positioned
water molecule can be activated to become a Zn-hydroxide for nucleophilic attack in the
deamination reaction [26]. Two residues (N244 and H257) on the A3G AC loop 3 show
a structural conservation with many distantly related Zn-deaminases, specifically TadA
and human CDA [26, 68] [Figure 2.7 b-c]. The two equivalent TadA residues (N42 and
H53) on a TadA loop (similar to the AC loop 3) directly contact the target base of the
RNA substrate [Figure 2.7 b]. These residues overlap well with the A3G residues N244
and H257 on the AC loop 3 in the superposition of the two structures [Figure 2.7 b] [68].
Similarly, two equivalent residues (N54 and C65) on a human CDA loop contact the
substrate/inhibitor [26], and also overlap with N244 and H257 on the AC loop 3 of A3G
[Figure 2.7 c]. This structural conservation suggests that the A3G-CD2 residues, N244
and H257, are also involved in substrate contact. In an in vitro assay, the A3G N244A
mutant had no detectable deamination activity [Figure 2.8 e]. The structural conservation
of the position of these residues suggests that the open conformation of the A3G AC loop 3
is in a position ready to bind nucleic acid.
Figure 2.7: A3G-CD2 Active Site: (a) The active site residues of A3G-CD2. The
water molecule and zinc atom are shown as cyan and red spheres, respectively. (b)
Superposition of A3G-CD2 (yellow) and TadA (light blue, PDB accession 2B3J). (c)
Superposition of A3G-CD2 (yellow) and human CDA (pink, PDB accession 1MQ0).
A surface representation of the A3G-CD2 X-ray structure shows that the AC loops
1 and 3 and the regions near the active site form a deep, spacious groove that runs
horizontally across the active center pocket [Figure 2.8 a]. This groove is not present
in the A3G-2K3A NMR structure because of the structural differences [Figure 2.8 c-d]
[21]. The structural features of this groove strongly suggest a role in binding ssDNA
substrates. The groove starts between the AC loops 1 and 3 on the right side of the
displayed structure [Figure 2.8 a], leads into a deep pocket where the Zn atom is located
and slightly exposed, and continues towards the left side over helix 6. The target base
must be positioned in the active site so that attack by the Zn hydroxyl group
can occur on the cytidine base during the deamination reaction [Figure 2.7 c]. The
ssDNA lying across this horizontal groove can present a cytidine base so that it is directed
towards the active site Zn in the correct orientation and angle to permit deamination,
as shown in the case of TadA and human CDA [Figure 2.7 a-c] [26, 68].
Within this horizontal groove is a group of charged residues (R213, R215, N244,
R256, R313, D316, D317, R320, R374 and R376) and hydrophobic residues (W285,
Y315 and F289) [Figure 2.8 b]. In our mutagenesis study, we show that all of these
residues are important for the deamination activity on ssDNA [Figure 2.8 e]. However,
they affect the deamination activity in different ways. The R374 and R376 residues are
located on one end of the groove and are positioned to interact with a negatively charged
ssDNA phosphate backbone. The ssDNA binding of the R374E/R376D double mutant
is impaired by 46% in comparison to that of the wild-type A3G, and the deamination
activity is even more disrupted [Figure 2.8 e]. On the edge of the groove, the AC loop 1
R213 residue can make contact with ssDNA. Consistent with a previous report [21], the
R213E mutant has only weak deamination activity [Figure 2.8 e].
Three of the charged residues (R256, R215, and R313) are involved in elaborate
bonding networks for the AC loops [Figure 2.6 a, b and 2.8 b] and should be important for
maintaining the groove conformation. The mutants R215E, R256E and R313E/R320D
show only minimal or no deamination activity [Figure 2.8 e]. The primary functional role
of these residues may be to maintain the conformation of the substrate groove rather than
to directly contact ssDNA. Mutation of the R313 residue can disrupt its interaction with
W285, which is located on the floor of the groove near the active site Zn [Figure 2.8 b].
Y315, next to W285, is also on the floor of the groove. Both residues could stack with
bases of ssDNA and position the DNA into the active site [Figure 2.8 a-b]. Mutants
W285A and Y315A show no detectable deamination activity [Figure 2.8 e], consistent
with a previous report [22]. Another hydrophobic residue on the edge of the groove is
F289, and the F289A mutant has greatly reduced deaminase activity [Figure 2.8 a-e].
Figure 2.8: Predicted substrate groove and deamination activity of A3G
mutants: (a) Surface representation of A3G-CD2, showing a horizontal groove with
residues (magenta) predicted to interact with ssDNA. ssDNA is represented by a green
line. (b) Cartoon and stick representation of A3G mutants. (c) Surface representation
of the NMR structure of A3G-2K3A shown in the same orientation as (a). A vertical
groove and proposed DNA binding path is represented by a dashed green line. (d)
Cartoon and stick representation of the A3G-2K3A structure. (e) Mutational data of A3G
purified from Sf9 (left) or from E. coli (right) are shown. The right inset shows the
relative deamination of the 3'C (5'CCC) or the middle C (5'CCC) on a ssDNA substrate
by Sf9 purified proteins. Error bars represent the s.d.
Notably, next to Y315 and W285 are two negatively charged residues (D316 and
D317) on the floor of the groove [Figure 2.8 a-b]. The mutant D316R/D317R has higher
deamination activity (1.6-fold), as well as higher ssDNA binding (twofold) compared to the
wild-type A3G [Figure 2.8 e]. These enhanced activities could be caused by the increased
total positive charge in the groove. Furthermore, this mutant showed altered substrate
specificity [Figure 2.8 e, inset]. Unlike wild-type A3G, which strongly favors deamination
at the 3'C of a 5' CCC 3' hot-spot motif, the D316R/D317R mutant deaminates the
middle C and the 3'C at about the same rate [Figure 2.8 e, inset]. This result indicates
that these negative residues, D316 and D317, are important for positioning the substrate
so that the 3'C is most likely to be deaminated by wild-type A3G.
The mutagenesis study supports our model of the horizontal groove, and verifies that
the residues located within and around the groove are important for deamination
activity, ssDNA binding and substrate orientation. These results provide a basis to pursue
further studies of A3G and other important APOBEC proteins (including
activation-induced cytidine deaminase, AID), which will facilitate our understanding of how they
act within our innate and adaptive immune responses to restrict HIV and other infectious
pathogens.
We thank the staff at the Berkeley Laboratory's Advanced Light Source (ALS)
BL8.2.1 and Advanced Photon Source 19ID at Argonne National Laboratory for
assistance in data collection, and M. Klein and other members of the X.S.C. laboratory for
help and discussion. This work is supported in part by CBM graduate training grants
to L.G.H. and Y.P.C., and by National Institutes of Health grants to M.F.G. and X.S.C.
2.4 Concluding Remarks
Continued analysis of the A3G-CD2 structure, including analysis of additional alternate
A3G-CD2 structures published later [40, 46, 110], has led to a continued debate about
the true structure of the β2 strand. Recently a group outside of the competing structural
labs, Autore et al., performed a series of molecular dynamics simulations on all available
A3G-CD2 structures and observed a trend towards a stable and more defined β2 structure
most similar to our original A3G-CD2 X-ray structure [2]. The continued interest and
further publications focused solely on the structure of A3G-CD2 reinforce the significance
of this structure and the data presented herein. However, they also underline the urgent
need for a full-length structure. Progress towards achieving that goal is detailed in the
following chapter.
Chapter 3
Full-length A3G
3.1 Overview
While the results presented in the previous Chapter are interesting and valuable, they
are just a beginning. For a full and complete understanding of A3G, we need the
full-length structure. As discussed in this Chapter, this is complicated by the dynamic
oligomerization of A3G, which renders it highly unlikely that
A3G will crystallize natively because of the extreme difficulty in isolating a homogeneous
sample. The results of this Chapter support this conclusion, but suggest possible avenues
of controlling oligomerization in order to obtain a homogeneous and crystallizable form
of A3G in the future.
3.2 Purification
A large part of my time working with full-length A3G has been spent optimizing the
purification scheme for a crystallization study. I began with the previously reported
method [20] and carefully altered individual variables in order to obtain the large quantity
of extremely pure protein necessary for crystallization.
A3G has previously been purified and characterized from both Sf9 insect cells and E.
coli [18, 19, 20, 21, 22, 40, 49]. However, there are distinct advantages and disadvantages
to each system. E. coli is the go-to system for many crystallization studies because it is
easy to manipulate and quick to mass-produce. The main disadvantage is that it lacks
most post-translational modifications that are common in higher-order organisms and
can be essential for proper protein function or stability. Alternatively, Sf9 cells offer an
environment that is closer to that of higher-order organisms, which can allow for a more
complex protein's needs to be addressed, such as post-translational modifications. But
there is significant financial and time investment required for Sf9 expression. Whereas
obtaining a completely new mutant protein in E. coli takes an estimated few weeks,
developing a mutant for Sf9 cells involves months of work. This is because Sf9 cloning begins
with the same initial cloning procedure, but then requires a recombination step to generate
the virus particles, followed by multiple amplifications of the virus, and then finally the
transfection and protein expression.
Because of the speed and cost advantages, it would be preferable to use E. coli if
possible. However, expressing A3G in E. coli yields low levels of a weakly
active protein that is difficult to isolate and concentrate, and thus is not optimal for
crystallization. This uncooperative behavior may be due to the lack of post-translational
modifications in a bacterial environment. What this means for our research
is that to obtain stable and fully active protein, we must take the extra time and purify
A3G from Sf9 insect cells.
Once the expression system was chosen, I obtained GST-A3G recombinant virus and
initial Sf9 pellets from Linda Chelico in Myron Goodman's laboratory. Further Sf9
culturing was outsourced to the University of Colorado Protein Production, Monoclonal
Antibody, and Tissue Culture Core. With pellets in hand, I was able to focus on
optimizing the lysis buffer conditions and purification scheme. Lysis buffer variables tested
were: buffer range of pH 6.75 to pH 8.5 (PIPES, HEPES, and Tris as appropriate), 0-10%
glycerol, 0-1% Triton, and 1-10 mM DTT. After multiple preps testing these variables, I
determined the optimal buffer to be 50 mM Tris pH 8.0, 250 mM NaCl, and 1 mM DTT,
which I maintained throughout the entire purification. The remainder of the lysis step
and GST affinity purification remained unchanged from the protocol previously reported
[19].
Following on-column digestion with thrombin, A3G was eluted, concentrated to about
10-15 mg mL⁻¹, and loaded onto a Superdex 200 gel filtration column. Fractions were
collected and pooled into four groups according to oligomeric state based on the elution
profile, which consists of two broad and connected peaks [Figure 3.1 a]. Each group was
re-concentrated and re-run on an appropriate gel filtration column to more accurately
assess molecular weight, as well as sample homogeneity. The highest molecular weight
group (Group 1) was run on a Superose 6 analytical column and showed an approximate
molecular weight of 500-600 kDa [Figure 3.1 b]. The next highest group (Group 2) was
run again on a Superdex 200 analytical gel filtration column and showed an approximate
molecular weight of 200 kDa with a tail to the right, which suggests heterogeneity within
the sample [Figure 3.1 c]. Group 3 was rerun on the Superdex 200 column and confirmed an
approximate molecular weight of 100-140 kDa, consistent with a dimer or trimer state;
the breadth of the peak demonstrates relatively high heterogeneity within the
sample [Figure 3.1 d]. Group 4 was rerun on both the Superdex 200 column and a
Superdex 75 column, and both confirmed an approximate molecular weight range of
50-100 kDa, consistent with a monomer or dimer state, although on both columns the peak
has a distinct tail to the right [Figure 3.1 e–f]. These groups were pooled and concentrated
again to about 10-15 mg mL⁻¹ for further characterization and crystallization studies.
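The assignment of oligomeric states from the apparent molecular weights above can be checked with a short calculation. The following is only a sketch: the monomer mass of about 46 kDa for full-length A3G is an assumption that is not stated in this section, and the names `MONOMER_KDA` and `subunit_range` are illustrative.

```python
# Assumption: full-length A3G monomer mass of ~46 kDa (not stated in the text).
MONOMER_KDA = 46.0


def subunit_range(mw_low_kda, mw_high_kda, monomer_kda=MONOMER_KDA):
    """Return the (low, high) number of monomer equivalents for an apparent MW range."""
    return (mw_low_kda / monomer_kda, mw_high_kda / monomer_kda)


# Apparent molecular weights (kDa) reported for each pooled group.
groups = {
    "Group 1": (500, 600),
    "Group 2": (200, 200),
    "Group 3": (100, 140),
    "Group 4": (50, 100),
}

for name, (low, high) in groups.items():
    n_low, n_high = subunit_range(low, high)
    print(f"{name}: {n_low:.1f} to {n_high:.1f} monomer equivalents")
```

Under that assumed monomer mass, Group 3 falls between roughly two and three monomer equivalents and Group 4 between roughly one and two, in line with the dimer/trimer and monomer/dimer assignments given above. Gel filtration reports an apparent size, so these values should be read as estimates only.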
3.3 Crystallization Screens
Group 3 and Group 4 from multiple purification batches were set up in most of our
standard screens (from Hampton Research and Qiagen) at a 1:1 protein to mother liquor
ratio using the Hydra II robot, at both 18 °C and 4 °C. Trays were monitored at
regular intervals: every other day for the first week, then once a week for the
first month, then every few months for the next year and a half. Promising microcrystals
were observed in multiple conditions, and these conditions were used to design optimized
screens. The optimized screens were monitored in a similar manner, but sadly nothing
better than those promising microcrystals has ever been found, except for several salt
crystals [Figure 3.2]. Within these optimized screening trials, variables such as protein
concentration, temperature, and drop ratio were evaluated as well.

Figure 3.1: Gel filtration profiles: Representative gel filtration experiments of
A3G Purification Groups. (a) A3G wild type on a Superdex 200 Preparation column.
Groups of pooled samples are identified by number 1-4. (b) Group 1 on an Analytical
Superose 6 column. (c–e) Groups 2, 3, and 4 respectively on an Analytical Superdex
200 column. (f) Group 4 on a Superdex 75 Preparation column (in pink). For molecular
weight comparison A3G-CD2 on the same column is presented in blue. (g) A3G wild
type (in pink) and A3G DL370/371A (in blue) on a Superdex 200 Preparation column.

Figure 3.2: Microcrystals and salt crystals: (a) exemplar microcrystals of A3G.
(b)–(g) Cool and unusual salt crystals formed during the A3G trials.

Protein concentration can influence protein solubility, nucleation and crystal growth.
Concentrations of approximately 3–4 mg mL⁻¹ and 8–10 mg mL⁻¹ (depending on the
purification batch) were used in optimization trials.
Temperature is a commonly adjusted variable in crystallization trials because it
influences nucleation, pH, speed of equilibration, and protein solubility. As discussed in
the next section, in the case of A3G temperature also directly affects the oligomerization
and thus the heterogeneity of our sample. Temperatures tried during this screening
included 4 °C, 18 °C, and 37 °C. In addition, I tried setting up trays at one temperature and
then transferring them to another, to assess the possibility that successful nucleation
requires one temperature and successful crystal growth another. While I did
observe differences in screening results due to temperature, none resulted in successful
crystal formation.
Drop ratio is another popular and important variable, as it directly determines the
chemical balance within the drop, which affects the rate at which the diffusion experiment
proceeds and the path that the sample follows through its phase diagram. In
other words, the drop ratio can affect the stage at which the protein sample enters a
specific phase of saturation (stable, metastable, labile, or precipitation). As mentioned
above, initial screening for crystallization hits began with a 1:1 protein to buffer ratio,
meaning that 0.4 µL each of protein sample and mother liquor were mixed together in
the tray. Further studies included drop ratios in a grid ranging from 0.5–1.25 µL
protein mixed with 0.5–1.25 µL mother liquor.
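To make the drop-ratio grid concrete, the sketch below enumerates protein and mother liquor volume pairs and the fraction of each drop contributed by the protein sample. The 0.25 µL step is an assumption (the text gives only the 0.5 to 1.25 µL range), and the function names are illustrative.

```python
from itertools import product

# Assumption: volumes sampled in 0.25 uL steps; only the range is given in the text.
VOLUMES_UL = [0.5, 0.75, 1.0, 1.25]


def protein_fraction(protein_ul, liquor_ul):
    """Fraction of the initial drop volume contributed by the protein sample."""
    return protein_ul / (protein_ul + liquor_ul)


def drop_grid(volumes=VOLUMES_UL):
    """Every protein / mother liquor volume pair in the grid, with its protein fraction."""
    return [(p, m, protein_fraction(p, m)) for p, m in product(volumes, repeat=2)]


for p, m, frac in drop_grid():
    print(f"{p:.2f} uL protein + {m:.2f} uL mother liquor: {frac:.0%} protein by volume")
```

A 1:1 drop such as the 0.4 µL + 0.4 µL initial screen starts at half the stock protein concentration; drops richer in protein start closer to the stock concentration and so begin at a different point on the phase diagram.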
Group 1 and Group 2 pools were smaller in volume and as such were set up in hanging
drop trays, again using a variety of drop ratios, with both the mini24 screen [88] and some
of the personally designed optimization screens based on the observed results of
Groups 3 and 4. The results were similar to those obtained in the screens of Groups
3 and 4.
3.4 Dynamic Light Scattering
One possible explanation for the lack of good crystal hits is the heterogeneity of the
protein sample. Crystal packing requires precise alignment of complexes to form repeating
symmetrical units. I used Dynamic Light Scattering (DLS) to test different buffer and
temperature conditions to see their effect on the polydispersity of the pooled samples.
Polydispersity (Pd) is a measurement of sample heterogeneity and a good indicator of the
crystallization prospects of a sample; the lower the percent Pd, the more homogeneous the
sample and thus the higher the likelihood of crystallization [133]. While gel filtration
data can give some indication of heterogeneity, and in fact A3G's gel filtration profiles
reflect this with their broad and double peaks [Figure 3.1], DLS has many advantages over
gel filtration, including:
1. Sample Size. DLS requires a relatively small sample, on the scale of µL
instead of mL.
2. Observation vs. Interaction. DLS examines the sample in solution without
introducing physical barriers; the passage of sample over the beads of the
column is a physical interaction that can influence results by breaking up a delicate
interaction.
3. Temperature. Using the DynaPro system (Wyatt Technologies), temperature
effects can easily be assayed, whereas gel filtration columns are often maintained
at 4 °C.
4. Time. DLS results take minutes; gel filtration runs take at least an hour.
20 µL of each of two possible purification buffers was incubated at 4 °C. Once a solid
baseline was established, 1-5 µL of approximately 200 µM A3G Group 1 or Group 4 was added
(for reference, 200 µM A3G is equivalent to approximately 9 mg mL⁻¹ of protein). Data
were collected at 10 second intervals for 30-50 total data sets. The Dynamics software
package was used to process the data [see Table 3.1 for a summary].
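The concentration equivalence quoted above, and the approximate protein concentration in the cuvette after addition to buffer, can be reproduced with the sketch below. The monomer mass of about 46 kDa is an assumption that is not stated in this section; the function names are illustrative.

```python
def micromolar_to_mg_per_ml(conc_um, mw_kda):
    """Convert a molar concentration in uM to a mass concentration in mg/mL."""
    # uM * kDa = (1e-6 mol/L) * (1e3 g/mol) = 1e-3 g/L, i.e. 1e-3 mg/mL
    return conc_um * mw_kda / 1000.0


def diluted_concentration(stock_um, stock_ul, buffer_ul):
    """Final concentration (uM) after adding stock_ul of stock to buffer_ul of buffer."""
    return stock_um * stock_ul / (stock_ul + buffer_ul)


# Assumption: ~46 kDa monomer mass for full-length A3G (not given in the text).
A3G_MONOMER_KDA = 46.0

print(micromolar_to_mg_per_ml(200, A3G_MONOMER_KDA))  # stock concentration, mg/mL
print(diluted_concentration(200, 1, 20))              # 1 uL stock into 20 uL buffer, uM
print(diluted_concentration(200, 5, 20))              # 5 uL stock into 20 uL buffer, uM
```

With the assumed mass, a 200 µM stock works out to about 9 mg/mL, which agrees with the figure given in the text. Adding 1-5 µL of that stock to 20 µL of buffer would give a final concentration of roughly 10-40 µM.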
While this was very much a quick and dirty experiment, and is by no means a complete
survey because of the small quantity of protein available at the time, the data do
reveal key pieces of information that can be used to break down the uncontrolled
oligomerization problem at the purification step. First, the data provide experimental
evidence that different batches of different oligomeric samples have comparable predicted
hydrodynamic radii (R_h) but widely variable Pd. This confirms batch-to-batch
differences despite a detailed purification protocol, highlighting the effect of each
individual fraction pooled after gel filtration. Second, Group 4 samples showed a decrease
in both R_h and Pd as the temperature was increased to the in vivo level of 37 °C. Third,
a lower pH gave a lower overall Pd, although not quite low enough for the sample to be
considered monodisperse. This was especially true of Group 1 samples, where pH appears to
have a stronger effect on Pd than temperature does, contrary to the results seen for
Group 4 samples. In addition, Pd is not an absolute measurement of heterogeneity, nor is a
completely homogeneous or monodisperse sample guaranteed to crystallize. Altogether, these
data illustrate the complexity of the factors controlling the oligomerization of A3G and
highlight certain trends within conditions that might be manipulated to isolate a
homogeneous or monodisperse sample.

Table 3.1: Summary of DLS results
Sample            Buffer                 Temp (°C)   R_h (nm)*   % Pd   % Int
Group 4 batch 2   100 mM PIPES pH 6.75   4           7.5         27.3   88.6
Group 4 batch 2   100 mM Tris pH 8       4           9.9         79.4   97.3
Group 4 batch 2   100 mM Tris pH 8       18          6.8         32.1   88.6
Group 4 batch 2   100 mM Tris pH 8       37          5.7         11.2   73.3
Group 4 batch 2   100 mM Tris pH 8       37          23.3        12.6   23.4
Group 1 batch 1   100 mM PIPES pH 6.75   4           10.9        32     95.2
Group 1 batch 1   100 mM PIPES pH 6.75   18          10.8        38.6   96.1
Group 1 batch 1   100 mM PIPES pH 6.75   37          9.1         37.8   67.5
Group 1 batch 1   100 mM PIPES pH 6.75   37          36.7        29.7   32.3
Group 1 batch 1   100 mM Tris pH 8       4           10          31.7   97.3
Group 1 batch 1   100 mM Tris pH 8       18          10.1        45.4   98.3
Group 1 batch 1   100 mM Tris pH 8       37          11.7        70.3   97.9
Group 4 batch 1   100 mM Tris pH 8       4           7           31.5   99.3
Group 4 batch 1   100 mM Tris pH 8       4           6.7         33.2   95.6
Group 4 batch 1   100 mM Tris pH 8       18          6           19.4   87.6
Group 4 batch 1   100 mM Tris pH 8       37          5.3         2.2    81
Group 1 batch 3   100 mM Tris pH 8       4           12.6        51.1   98
Group 1 batch 3   100 mM Tris pH 8       18          10.2        40.3   92.4
Group 1 batch 3   100 mM Tris pH 8       37          9.4         48.3   90
*The R_h is an indicator of the size of a globular protein; however, as A3G has
been shown to oligomerize in an elongated fashion [9, 103, 130], the R_h would be more
an approximation of half the length of the protein complex. Further, since 1 nm is
10 Å, an estimated R_h range of 5 nm to 10 nm would indicate a protein
length of 100 Å to 200 Å, which based on the A2 structure is consistent with a dimer
or tetramer of full-length A3G respectively.
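The temperature trends described for Table 3.1 can be checked directly against the tabulated values. The sketch below uses only the Tris pH 8 rows for Group 4 batch 2 and Group 1 batch 1. Where the table lists two rows at 37 °C, the row with the larger % Int is taken as the main species, which is an assumption of this sketch rather than something stated in the text.

```python
# (temperature in C, R_h in nm, % Pd) from Table 3.1, 100 mM Tris pH 8 rows.
# Where two 37 C rows are listed, the row with the larger % Int is used;
# treating that row as the main species is an assumption of this sketch.
GROUP4_BATCH2 = [(4, 9.9, 79.4), (18, 6.8, 32.1), (37, 5.7, 11.2)]
GROUP1_BATCH1 = [(4, 10.0, 31.7), (18, 10.1, 45.4), (37, 11.7, 70.3)]


def strictly_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


def trends(rows):
    """Report whether R_h and % Pd fall as temperature rises."""
    ordered = sorted(rows)  # sort by temperature
    return {
        "rh_falls": strictly_decreasing([rh for _, rh, _ in ordered]),
        "pd_falls": strictly_decreasing([pd for _, _, pd in ordered]),
    }


print("Group 4 batch 2:", trends(GROUP4_BATCH2))
print("Group 1 batch 1:", trends(GROUP1_BATCH1))
```

For these rows, both R_h and % Pd fall with temperature in the Group 4 sample, while neither does in the Group 1 sample, which matches the contrast between the two groups drawn in the text.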
3.5 Deamination Activity and DNA binding
To investigate whether the different oligomeric composition of A3G had an effect on
its deamination activity, the specific activity of Group 1 and Group 4 was measured as
pmol of substrate deaminated per µg of enzyme per minute. It should be noted here
that molar concentration values of A3G are calculated assuming a monomeric state to
ensure a consistent measurement scale. Varying amounts of 5 µM A3G were
incubated with 5 pmol of DNA substrate at 37 °C for 15 minutes, subsequently
treated with uracil-DNA glycosylase, and resolved on a 16% urea-PAGE gel for analysis
as described previously [Figure 2.1 a] [18]. The ssDNA substrate used for specific activity
measurements was
5'-GG (fluorescein-dT) AGTTTAGTGGTTTGTATAGAATTAATACCCAAAGAAGTGTATGTAATTGTTATGATAAGATTGAAA-3'.
Group 4 had a specific activity of 0.068 pmol µg⁻¹ min⁻¹, which is consistent with that
reported for A3G elsewhere [19]. Group 1 exhibited a slightly higher specific activity of
0.086 pmol µg⁻¹ min⁻¹. This indicates that there is some level of functional difference
between these isolated groups and their oligomeric states. More specifically, it indicates
that the higher order oligomers are not just aggregates of improperly folded protein, but
instead are fully functional.
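For reference, the specific activity unit used above (pmol of substrate deaminated per µg of enzyme per minute) corresponds to the simple calculation sketched below. The example inputs are hypothetical and chosen only for illustration; the two reported values are taken from the text, and the function name is not from any published protocol.

```python
def specific_activity(pmol_deaminated, enzyme_ug, minutes):
    """Specific activity in pmol substrate deaminated per ug enzyme per minute."""
    if enzyme_ug <= 0 or minutes <= 0:
        raise ValueError("enzyme mass and time must be positive")
    return pmol_deaminated / (enzyme_ug * minutes)


# Hypothetical inputs, for illustration only:
# 1 pmol of substrate deaminated by 1 ug of enzyme in a 15 minute reaction.
example = specific_activity(pmol_deaminated=1.0, enzyme_ug=1.0, minutes=15)
print(f"example specific activity: {example:.3f} pmol/ug/min")

# Values reported in the text for the two pools (pmol/ug/min).
GROUP4 = 0.068
GROUP1 = 0.086
fold = GROUP1 / GROUP4
print(f"Group 1 / Group 4 = {fold:.2f}")
```

On the reported values, Group 1 is roughly a quarter more active than Group 4 per µg of enzyme.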
This led to an analysis of the effect that oligomerization has on DNA binding, which
might explain this increase in activity. To visualize this, 1 pmol of fluorescein-labelled
DNA (previously used in the deamination assay) was mixed with increasing
amounts of A3G protein (a range of 0.5-100 µM protein for Group 1 and a range
of 0.5-20 µM protein for Groups 2–4). Samples were allowed to incubate at room
temperature for 5 minutes, then an equal volume of 50% glycerol was added and the
sample was loaded onto a 1% agarose gel and run at 100 V for approximately 1 hour on
ice. Immediately following this, the gel was soaked in approximately 50 mL of dH₂O
containing 5 µL of SYBR Green I (Invitrogen) for 15 minutes, followed by visualization
using the BioRad FX scanner [Figure 3.3].

Figure 3.3: DNA gel shift of all four Groups: Increasing amounts of A3G protein
were incubated with 1 pmol of fluorescently labelled DNA and imaged on 1% agarose.

Multiple modes of relatively tight binding can be seen with increasing amounts of
protein across all four oligomeric Groups, indicating that the isolated lower oligomeric
groups (Groups 3 and 4) can and do form higher oligomers in the presence of DNA substrate.
It should be noted that at approximately 20-fold excess protein, the Group 1 sample
exhibits complete binding of the DNA and appears to be shifted in a manner that suggests an
ultimate oligomeric state of protein bound to DNA. While this is promising for addressing
the heterogeneity plaguing our crystallization trials, co-crystallization of protein and
substrate is not a trivial matter, and indeed small scale screening of protein and substrate
mixtures has not yet resulted in any promising crystallization hits. However, the gel
shift assay represents a viable and simple option for optimizing substrate conditions prior
to large-scale screening.
3.6 Mutants
Meng Xu and I together purified three different mutants of A3G that were designed
to disrupt protein activity (E259Q) (MX), N-terminal oligomerization (FW126/127A)
(MX), and C-terminal oligomerization (DL370/371A) (LH). E259Q appears to oligomerize
in a manner consistent with wild-type A3G [data not shown]. DL370/371A exhibits
an interesting shift of the overall profile towards a smaller molecular weight [Figure 3.1 g];
however, the possibility exists that this is simply an artifact of the samples being run on
different Superdex 200 columns. FW126/127A has previously been characterized [19] and
shown to behave similarly to wild type A3G in all respects except for its oligomerization
profile [data not shown]. As first shown in Chelico et al. 2008, A3G is a mixture of
monomers, dimers, and higher oligomers both with and without ssDNA substrates, as
well as at various salt concentrations [20].
Sadly, all three mutants exhibited drawbacks during purification that likely prevented
successful crystallization. E259Q and DL370/371A had problems similar to those of the wild
type protein, with oligomerization resulting in a heterogeneous sample. While FW126/127A
addresses this particular challenge [19], it falls short of the requirement for stable and
highly concentrated protein, as it precipitates when concentrated to around 3 or 4 mg mL⁻¹.
3.7 Discussion
The data presented above implicate the dynamic oligomerization and resulting
heterogeneity of A3G as the leading challenge blocking crystallization. They show that no
single chemical or structural variable stands out as having absolute control over A3G.
The DLS data indicate that both temperature and buffer composition have small, yet
significant, effects on the relative sample Pd, or heterogeneity. This implicates these two
variables as potential avenues for crystallization optimization, but further work is needed
at an earlier stage to isolate a more homogeneous starting sample.
Previous studies implicate both the N-terminus [19] and C-terminus [9] of A3G in
mediating oligomerization. Further, SAXS and analytical centrifugation data suggest
an elongated oligomerization [9, 103, 130] similar to that seen in the A2 structure [98].
This suggests that possible oligomerization interfaces are formed between: N-terminus
to N-terminus, C-terminus to C-terminus, and N-terminus to C-terminus. Mutations
aimed at disrupting these possible interactions have been characterized. FW126/127A
effectively disrupts N-terminal based oligomerization, but cannot be concentrated to a
useful level for crystallization studies. DL370/371A shows no clear behavioral change.
It is likely that a combination of mutations designed to target all of these potential
interfaces will be required to successfully isolate a homogeneous sample of A3G.
3.8 Concluding Remarks
The A3G wild type and mutant viruses were generous gifts from Linda Chelico and
Myron Goodman. As mentioned previously, large volume cultures were obtained from the
University of Colorado Protein Production, Monoclonal Antibody, and Tissue Culture
Core. The DLS studies were performed with the assistance of Nickolas Chelyapov at the
USC NanoBiophysics Core. This work was funded by a California HIV/AIDS Research
Program fellowship award to L. G. H. and an Innovative, Developmental, Exploratory
Award (IDEA) to X. S. C.
Chapter 4
Summary and Future Directions
In summary, the APOBEC family of proteins, and A3G in particular, present a
complicated biological and structural challenge. The structure of the A3G-CD2 domain
presented in Chapter 2 partially answers my initial question of "What does A3G look
like?", while the work presented in Chapter 3 identifies the oligomerization and resulting
heterogeneity of A3G as the primary challenge to be addressed in order to achieve a
full-length A3G structure. Together the data presented in the previous Chapters answer
specific questions, but it is the nature of science for answers to lead to new questions.
Two of the important questions raised by the work of this thesis that remain for
future study are:
1. What does the full structure of A3G look like? As discussed already in Chapter
3, before this question can be answered the oligomerization of A3G must be addressed.
Due to the complex nature of this interaction, a single 'magic bullet' is not likely to solve
this problem. However, using the structural information available, careful mutations can
be designed to control the predicted oligomerization interaction and allow for successful
crystal growth.
2. How does A3G-CD2 interact with its DNA substrate? One promising avenue to
answer this question would be DNA crosslinking via cysteine-mediated disulfide
bridges. This method was successfully used by the Verdine group to covalently trap
HIV-1 reverse transcriptase with a DNA template–primer complex [52] (reviewed in [126]).
The structure presented in Chapter 2 provides a solid basis for intelligent design of the
necessary cysteine mutants and even for the specialized DNA substrates to achieve a
covalently bound protein–DNA complex.
Answers to these questions will enhance our understanding of A3G and hopefully
lead towards the overall goal of developing defenses against or new treatments for HIV
infection.
Bibliography
[1] K Arai and A Kornberg. A general priming system employing only dnaB protein
and primase for DNA replication. Proceedings of the National Academy of Sciences
USA, 76:4308–4312, 1979. 10.1073/pnas.76.9.4308.
[2] F Autore, J R C Bergeron, M H Malim, F Fraternali, and H Huthoff. Rationalisation
of the differences between APOBEC3G structures from crystallography and
NMR studies by molecular dynamics simulations. PLoS ONE, 5(7):e11515, Jan
2010.
[3] S Ayora, U Langer, and J C Alonso. Bacillus subtilis DnaG primase stabilises
the bacteriophage SPP1 G40P helicase-ssDNA complex. FEBS Letters, 439:59–62,
1998. 10.1016/S0014-5793(98)01337-4.
[4] G Babbage, C H Ottensmeier, J Blaydes, F K Stevenson, and S S Sahota.
Immunoglobulin heavy chain locus events and expression of activation-induced
cytidine deaminase in epithelial breast cancer cell lines. Cancer Research,
66(8):3996–4000, Apr 2006.
[5] S Bailey, W K Eliason, and T A Steitz. Structure of hexameric DnaB helicase
and its complex with a domain of DnaG primase. Science, 318:459–463, 2007.
10.1126/science.1147353.
[6] R C L Beale, S K Petersen-Mahrt, I N Watt, R S Harris, C Rada, and M S
Neuberger. Comparison of the differential context-dependence of DNA deamination
by APOBEC enzymes: correlation with mutation spectra in vivo. Journal of
Molecular Biology, 337(3):585–96, Mar 2004.
[7] S P Bell and A Dutta. DNA replication in eukaryotic cells. Annual Review of
Biochemistry, 71:333–74, Jan 2002.
[8] S J Benkovic, A M Valentine, and F Salinas. Replisome-mediated
DNA replication. Annual Review of Biochemistry, 70:181–208, 2001.
10.1146/annurev.biochem.70.1.181.
[9] R P Bennett, J D Salter, X Liu, J E Wedekind, and H C Smith. APOBEC3G
subunits self-associate via the C-terminal deaminase domain. Journal of Biological
Chemistry, page 19, Oct 2008.
[10] J G Bertram, L B Bloom, M O'Donnell, and M F Goodman. Increased dNTP
binding affinity reveals a nonprocessive role for Escherichia coli beta clamp with
DNA polymerase IV. Journal of Biological Chemistry, 279(32):33047–50, Aug
2004.
[11] L Betts, S Xiang, S A Short, R Wolfenden, and C W Carter Jr. Cytidine deaminase.
The 2.3 Angstrom crystal structure of an enzyme: transition-state analog complex.
Journal of Molecular Biology, 235(2):635–56, Jan 1994.
[12] L E Bird, H Pan, P Soultanas, and D B Wigley. Mapping protein-protein
interactions within a stable complex of DNA primase and DnaB helicase from Bacillus
stearothermophilus. Biochemistry, 39:171–182, 2000. 10.1021/bi9918801.
[13] K N Bishop, R K Holmes, A M Sheehy, and M H Malim. APOBEC-mediated
editing of viral RNA. Science, 305(5684):645, Jul 2004.
[14] H P Bogerd, B P Doehle, H L Wiegand, and B R Cullen. A single amino acid
difference in the host APOBEC3G protein controls the primate species specificity
of HIV type 1 virion infectivity factor. Proceedings of the National Academy of
Sciences USA, 101(11):3770–4, Mar 2004.
[15] H P Bogerd, H L Wiegand, B P Doehle, and B R Cullen. The intrinsic antiretroviral
factor APOBEC3B contains two enzymatically active cytidine deaminase. Virology,
Jan 2007.
[16] R Bransteitter, P Pham, M D Scharff, and M F Goodman. Activation-induced
cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires
the action of RNase. Proceedings of the National Academy of Sciences USA,
100(7):4102–7, Apr 2003.
[17] A T Brunger. Crystallography & NMR system: a new software suite for
macromolecular structure determination. Acta Crystallographica Section D Biological
Crystallography, 54:905–921, 1998. 10.1107/S0907444998003254.
[18] L Chelico, P Pham, P Calabrese, and M F Goodman. APOBEC3G DNA deaminase
acts processively 3' → 5' on single-stranded DNA. Nature Structural and
Molecular Biology, 13(5):392–399, May 2006.
[19] L Chelico, C Prochnow, D A Erie, X S Chen, and M F Goodman. A structural
model for deoxycytidine deamination mechanisms of the HIV-1 inactivation enzyme
APOBEC3G. Journal of Biological Chemistry, Mar 2010.
[20] L Chelico, E J Sacho, D A Erie, and M F Goodman. A model for oligomeric
regulation of APOBEC3G cytosine deaminase-dependent restriction of HIV.
Journal of Biological Chemistry, 283(20):13780–13791, Mar 2008.
[21] K M Chen, E Harjes, P J Gross, A Fahmy, Y Lu, K Shindo, R S Harris, and
H Matsuo. Structure of the DNA deaminase domain of the HIV-1 restriction
factor APOBEC3G. Nature, 452:116–119, Mar 2008.
[22] K M Chen, N Martemyanova, Y Lu, K Shindo, H Matsuo, and R S Harris. Extensive
mutagenesis experiments corroborate a structural model for the DNA deaminase
domain of APOBEC3G. FEBS Letters, 581(24):4761–6, Oct 2007.
[23] R Chenna, H Sugawara, T Koike, R Lopez, T J Gibson, D G Higgins, and
J D Thompson. Multiple sequence alignment with the Clustal series of programs.
Nucleic Acids Research, 31(13):3497–500, Jul 2003.
[24] Y L Chiu and W C Greene. The APOBEC3 cytidine deaminases: an innate
defensive network opposing exogenous retroviruses and endogenous retroelements.
Annual Review of Immunology, 26:317–53, Jan 2008.
[25] J P Chong, M K Hayashi, M N Simon, R M Xu, and B Stillman. A double-hexamer
archaeal minichromosome maintenance protein is an ATP-dependent DNA helicase.
Proceedings of the National Academy of Sciences USA, 97(4):1530–5, Feb
2000.
[26] S J Chung, J C Fromme, and G L Verdine. Structure of human cytidine deaminase
bound to a potent inhibitor. Journal of Medicinal Chemistry, 48(3):658–60, Feb
2005.
[27] M G Clarey. Nucleotide-dependent conformational changes in the DnaA-like core
of the origin recognition complex. Nature Structural & Molecular Biology,
13:684–690, 2006. 10.1038/nsmb1121.
[28] S G Conticello, M A Langlois, Z Yang, and M S Neuberger. DNA deamination in
immunity: AID in the context of its APOBEC relatives. Advances in Immunology,
94:37–73, Jun 2007.
[29] S G Conticello, C J F Thomas, S K Petersen-Mahrt, and M S Neuberger. Evolution
of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases.
Molecular Biology and Evolution, Jan 2005.
[30] J E Corn and J M Berger. Regulation of bacterial priming and daughter strand
synthesis through helicase-primase interactions. Nucleic Acids Research,
34:4082–4088, 2006. 10.1093/nar/gkl363.
[31] K Cowtan. 'DM': an automated procedure for phase improvement by density
modification. Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography,
31:34–38, 1994.
[32] Warren L DeLano. The PyMOL Molecular Graphics System. DeLano Scientific,
2002.
[33] J P Erzberger, M L Mott, and J M Berger. Structural basis for ATP-dependent
DnaA assembly and replication-origin remodeling. Nature Structural & Molecular
Biology, 13:676–683, 2006. 10.1038/nsmb1115.
[34] R Espinosa, T Funahashi, C Hadjiagapiou, M M Le Beau, and N O Davidson.
Assignment of the gene encoding the human apolipoprotein B mRNA editing
enzyme (APOBEC1) to chromosome 12p13.1. Genomics, 24(2):414–5, Nov 1994.
[35] D Fass, C E Bogden, and J M Berger. Crystal structure of the N-terminal domain
of the DnaB hexameric helicase. Structure, 7:691–698, 1999.
10.1016/S0969-2126(99)80090-2.
[36] R J Fletcher, B E Bishop, R P Leon, R A Sclafani, C M Ogata, and X S Chen. The
structure and function of MCM from archaeal M. thermoautotrophicum. Nature
Structural & Molecular Biology, 10(3):160–7, Mar 2003.
[37] R J Fletcher, J Shen, Y Gómez-Llorente, C San Martín, J M Carazo, and X S
Chen. Double hexamer disruption and biochemical activities of Methanobacterium
thermoautotrophicum MCM. Journal of Biological Chemistry, 280(51):42405–10,
Dec 2005.
[38] R J Fletcher, J Shen, L G Holden, and X S Chen. Identification of amino acids
important for the biochemical activity of Methanothermobacter thermautotrophicus
MCM. Biochemistry, 47(38):9981–6, Sep 2008.
[39] S L Forsburg. Eukaryotic MCM proteins: beyond replication initiation.
Microbiology and Molecular Biology Reviews, 68(1):109–31, Mar 2004.
[40] A Furukawa, T Nagata, A Matsugami, Y Habu, R Sugiyama, F Hayashi,
N Kobayashi, S Yokoyama, H Takaku, and M Katahira. Structure, interaction
and real-time monitoring of the enzymatic reaction of wild-type APOBEC3G. The
EMBO Journal, page 12, Jan 2009.
[41] D Gai, R Zhao, D Li, C V Finkielstein, and X S Chen. Mechanisms of
conformational change for a replicative hexameric helicase of SV40 large tumor antigen.
Cell, 119:47–60, 2004. 10.1016/j.cell.2004.09.017.
[42] B P Glover and C S McHenry. The DNA polymerase III holoenzyme: an asymmetric
dimeric replicative complex with leading and lagging strand polymerases. Cell,
105:925–934, 2001. 10.1016/S0092-8674(01)00400-7.
[43] M F Goodman, M D Scharff, and F E Romesberg. AID-initiated purposeful
mutations in immunoglobulin genes. Advances in Immunology, 94:127–155, Jan 2007.
[44] G Haché, M T Liddament, and R S Harris. The retroviral hypermutation specificity
of APOBEC3F and APOBEC3G is governed by the C-terminal DNA cytosine
deaminase domain. Journal of Biological Chemistry, 280(12):10920–4, Mar 2005.
[45] Y Hakata and N Landau. Reversed functional organization of mouse and human
APOBEC3 cytidine deaminase domains. Journal of Biological Chemistry, Jan
2006.
[46] E Harjes, P J Gross, K M Chen, Y Lu, K Shindo, R Nowarski, J D Gross, M Kotler,
R S Harris, and H Matsuo. An extended structure of the APOBEC3G catalytic domain
suggests a unique holoenzyme model. Journal of Molecular Biology, 389(5):819–32,
Jun 2009.
[47] R S Harris, K N Bishop, A M Sheehy, H M Craig, S K Petersen-Mahrt, I N Watt,
M S Neuberger, and M H Malim. DNA deamination mediates innate immunity to
retroviral infection. Cell, 113(6):803–9, Jun 2003.
[48] P E Hodges, N Navaratnam, J C Greeve, and J Scott. Site-specific creation of
uridine from cytidine in apolipoprotein B mRNA editing. Nucleic Acids Research,
19(6):1197–201, Mar 1991.
[49] L G Holden, C Prochnow, Y P Chang, R Bransteitter, L Chelico, U Sen, R C
Stevens, M F Goodman, and X S Chen. Crystal structure of the anti-viral
APOBEC3G catalytic domain and functional implications. Nature, 456:121–124,
Nov 2008.
[50] R K Holmes, M H Malim, and K N Bishop. APOBEC-mediated viral restriction:
not simply editing? Trends in Biochemical Sciences, 32(3):118–28, Mar 2007.
[51] H C Hsu, Y Wu, P Yang, Q Wu, G Job, J Chen, J Wang, M V Accavitti-Loper,
W E Grizzle, R H Carter, and J D Mountz. Overexpression of activation-induced
cytidine deaminase in B cells is associated with production of highly pathogenic
autoantibodies. Journal of Immunology, 178(8):5357–65, Apr 2007.
[52] H Huang, R Chopra, G L Verdine, and S C Harrison. Structure of a covalently
trapped catalytic complex of HIV-1 reverse transcriptase: implications for drug
resistance. Science, 282(5394):1669–75, Nov 1998.
[53] H Huthoff and M H Malim. Identification of amino acid residues in APOBEC3G
required for regulation by human immunodeficiency virus type 1 Vif and virion
encapsidation. Journal of Virology, 81(8):3807–15, Apr 2007.
[54] A Jarmuz, A Chester, J Bayliss, J Gisbourne, I Dunham, J Scott, and
N Navaratnam. An anthropoid-specific locus of orphan C to U RNA-editing enzymes on
chromosome 22. Genomics, 79(3):285–96, Mar 2002.
[55] S K Johnson, S Bhattacharyya, and M A Griep. DnaB helicase stimulates primer
synthesis activity on short oligonucleotide templates. Biochemistry, 39:736–744,
2000. 10.1021/bi991554l.
[56] T A Jones, J Y Zou, S W Cowan, and M Kjeldgaard. Improved methods
for building protein models in electron density maps and the location of
errors in these models. Acta Crystallographica Section A, 47:110–119, 1991.
10.1107/S0108767390010224.
[57] S R Jónsson, G Haché, M D Stenglein, S C Fahrenkrug, V Andrésdóttir, and R S
Harris. Evolutionarily conserved and non-conserved retrovirus restriction activities
of artiodactyl APOBEC3F proteins. Nucleic Acids Research, 34(19):5683–94, Jan
2006.
[58] C Kaito, K Kurokawa, M S Hossain, N Akimitsu, and K Sekimizu. Isolation
and characterization of temperature-sensitive mutants of the Staphylococcus aureus
dnaC gene. FEMS Microbiology Letters, 210:157–164, 2002.
10.1016/S0378-1097(02)00603-1.
[59] R Kasiviswanathan, J H Shin, E Melamud, and Z Kelman. Biochemical
characterization of the Methanothermobacter thermautotrophicus minichromosome
maintenance (MCM) helicase N-terminal domains. Journal of Biological Chemistry,
279(27):28358–66, Jul 2004.
[60] Z Kelman, J K Lee, and J Hurwitz. The single minichromosome maintenance
protein of Methanobacterium thermoautotrophicum DeltaH contains DNA helicase
activity. Proceedings of the National Academy of Sciences USA, 96(26):14783–8,
Dec 1999.
[61] Z Kelman and M F White. Archaeal DNA replication and repair. Current Opinion
in Microbiology, 8(6):669–76, Dec 2005.
[62] A Kotani, N Kakazu, T Tsuruyama, I M Okazaki, M Muramatsu, K Kinoshita,
H Nagaoka, D Yabe, and T Honjo. Activation-induced cytidine deaminase (AID)
promotes B cell lymphomagenesis in Emu-cmyc transgenic mice. Proceedings of
the National Academy of Sciences USA, 104(5):1616–20, Jan 2007.
[63] R S Larue, V Andresdottir, Y Blanchard, S G Conticello, D Derse, M Emerman,
W C Greene, S R Jonsson, N R Landau, M Lochelt, H S Malik, M H Malim,
C Munk, S J O'Brien, V K Pathak, K Strebel, S Wain-Hobson, X F Yu, N Yuhki,
and R S Harris. Guidelines for naming nonprimate APOBEC3 genes and proteins.
Journal of Virology, 83(2):494–497, Nov 2008.
[64] J B Lee. DNA primase acts as a molecular brake in DNA replication. Nature,
439:621–624, 2006. 10.1038/nature04317.
[65] M Lei and B K Tye. Initiating DNA synthesis: from recruiting to activating the
MCM complex. Journal of Cell Science, 114(Pt 8):1447–54, Apr 2001.
[66] D Li, R Zhao, W Lilyestrom, D Gai, R Zhang, J A DeCaprio, E Fanning,
A Jochimiak, G Szakonyi, and X S Chen. Structure of the replicative helicase
of the oncoprotein SV40 large tumour antigen. Nature, 423(6939):512–8, May
2003.
[67] W Liao, S H Hong, B H Chan, F B Rudolph, S C Clark, and L Chan. APOBEC-2, a
cardiac- and skeletal muscle-specific member of the cytidine deaminase supergene
family. Biochemical and Biophysical Research Communications, 260(2):398–404,
Jul 1999.
[68] H C Losey, A J Ruthenburg, and G L Verdine. Crystal structure of Staphylococcus
aureus tRNA adenosine deaminase TadA in complex with RNA. Nature Structural
and Molecular Biology, 13(2):153–9, Feb 2006.
[69] I S Lossos, R Levy, and A A Alizadeh. AID is expressed in germinal center B-
cell-like and activated B-cell-like diffuse large-cell lymphomas and is not correlated
with intraclonal heterogeneity. Leukemia, 18(11):1775–9, Nov 2004.
[70] Y B Lu, P V Ratnakar, B K Mohanty, and D Bastia. Direct physical interaction
between DnaG primase and DnaB helicase of Escherichia coli is necessary for
optimal synthesis of primer RNA. Proceedings of the National Academy of Sciences
USA, 93:12902–12907, 1996. 10.1073/pnas.93.23.12902.
[71] B Mangeat, P Turelli, G Caron, M Friedli, L Perrin, and D Trono. Broad antiretro-
viral defence by human APOBEC3G through lethal editing of nascent reverse tran-
scripts. Nature, 424(6944):99–103, Jul 2003.
[72] B Mangeat, P Turelli, S Liao, and D Trono. A single amino acid determinant
governs the species-specific sensitivity of APOBEC3G to Vif action. Journal of
Biological Chemistry, 279(15):14481–3, Apr 2004.
[73] C San Martin, M Radermacher, B Wolpensinger, A Engel, C S Miles, N E Dixon,
and J M Carazo. Three-dimensional reconstructions from cryoelectron microscopy
images reveal an intimate complex between helicase DnaB and its loading partner
DnaC. Structure, 6:501–509, 1998. 10.1016/S0969-2126(98)00051-3.
[74] M C San Martin, N P J Stamford, N Dammerova, N E Dixon, and J M
Carazo. A structural model for the Escherichia coli DnaB helicase based on
electron microscopy data. Journal of Structural Biology, 114:167–176, 1995.
10.1006/jsbi.1995.1016.
[75] T Matsumoto, H Marusawa, Y Endo, Y Ueda, Y Matsumoto, and T Chiba. Expres-
sion of APOBEC2 is transcriptionally regulated by NF-kappaB in human hepato-
cytes. FEBS Letters, 580(3):731–5, Feb 2006.
[76] A V Mitkova, S M Khopde, and S B Biswas. Mechanism and stoichiome-
try of interaction of DnaG primase with DnaB helicase of Escherichia coli in
RNA primer synthesis. Journal of Biological Chemistry, 278:52253–52261, 2003.
10.1074/jbc.M308956200.
[77] M Muramatsu, K Kinoshita, S Fagarasan, S Yamada, Y Shinkai, and T Honjo.
Class switch recombination and hypermutation require activation-induced cytidine
deaminase (AID), a potential RNA editing enzyme. Cell, 102(5):553–63, Sep 2000.
[78] M Muramatsu, V S Sankaranand, S Anant, M Sugai, K Kinoshita, N O Davidson,
and T Honjo. Specific expression of activation-induced cytidine deaminase (AID),
a novel member of the RNA-editing deaminase family in germinal center B cells.
Journal of Biological Chemistry, 274(26):18470–6, Jun 1999.
[79] G N Murshudov, A A Vagin, and E J Dodson. Refinement of macromolecular
structures by the maximum-likelihood method. Acta Crystallographica Section D
Biological Crystallography, 53:240–255, 1997. 10.1107/S0907444996012255.
[80] F Navarro, B Bollman, H Chen, R König, and Q Yu. Complementary function of
the two catalytic domains of APOBEC3G. Virology, Jan 2005.
[81] A F Neuwald, L Aravind, J L Spouge, and E V Koonin. AAA+: A class of
chaperone-like ATPases associated with the assembly, operation, and disassembly
of protein complexes. Genome Research, 9(1):27–43, Jan 1999.
[82] E N C Newman, R K Holmes, H M Craig, K C Klein, J R Lingappa, M H Malim,
and A M Sheehy. Antiviral function of APOBEC3G can be dissociated from cyti-
dine deaminase activity. Current Biology, 15(2):166–70, Jan 2005.
[83] R Nunez-Ramirez. Quaternary polymorphism of replicative helicase G40P: struc-
tural mapping and domain rearrangement. Journal of Molecular Biology, 357:1063–
1076, 2006. 10.1016/j.jmb.2006.01.091.
[84] A J Oakley. Crystal and solution structures of the helicase-binding domain of
Escherichia coli primase. Journal of Biological Chemistry, 280:11495–11504, 2005.
10.1074/jbc.M412645200.
[85] M OhAinle, J A Kerns, H S Malik, and M Emerman. Adaptive evolution and
antiviral activity of the conserved mammalian cytidine deaminase APOBEC3H.
Journal of Virology, Jan 2006.
[86] I M Okazaki, A Kotani, and T Honjo. Role of AID in tumorigenesis. Advances in
Immunology, 94:245–73, Jan 2007.
[87] Z Otwinowski and W Minor. Processing of X-ray diffraction data collected in
oscillation mode. Methods in Enzymology, 276:307–326, Jan 1997.
[88] R Page and R C Stevens. Crystallization data mining in structural genomics: using
positive and negative results to optimize protein crystallization screens. Methods,
34(3):373–89, Nov 2004.
[89] L Pasqualucci, G Bhagat, M Jankovic, M Compagno, P Smith, M Muramatsu,
T Honjo, H C Morse, M C Nussenzweig, and R Dalla-Favera. AID is required
for germinal center-derived lymphomagenesis. Nature Genetics, 40(1):108–12, Jan
2008.
[90] L Pasqualucci, R Guglielmino, J Houldsworth, J Mohr, S Aoufouchi,
R Polakiewicz, R S K Chaganti, and R Dalla-Favera. Expression of the AID
protein in normal and neoplastic B cells. Blood, 104(10):3318–25, Nov 2004.
[91] S S Patel and K M Picha. Structure and function of hexam-
eric helicases. Annual Review of Biochemistry, 69:651–697, 2000.
10.1146/ANNUREV.BIOCHEM.69.1.651.
[92] X Pedre, F Weise, S Chai, G Luder, and J C Alonso. Analysis of cis and trans
acting elements required for the initiation of DNA replication in the Bacillus sub-
tilis bacteriophage SPP1. Journal of Molecular Biology, 236:1324–1340, 1994.
10.1016/0022-2836(94)90061-2.
[93] J U Peled, F Li Kuang, M D Iglesias-Ussel, S Roa, S L Kalis, M F Goodman,
and M D Scharff. The biochemistry of somatic hypermutation. Annual Review of
Immunology, 26:481–511, Jan 2008.
[94] P Pérez-Durán, V G de Yebenes, and A R Ramiro. Oncogenic events triggered by
AID, the adverse effect of antibody diversification. Carcinogenesis, 28(12):2427–33,
Dec 2007.
[95] P Pham, R Bransteitter, and M F Goodman. Reward versus risk: DNA cytidine
deaminases triggering immunity and disease. Biochemistry, 44(8):2703–15, Mar
2005.
[96] P Pham, R Bransteitter, J Petruska, and M F Goodman. Processive AID-catalysed
cytosine deamination on single-stranded DNA simulates somatic hypermutation.
Nature, 424(6944):103–7, Jul 2003.
[97] L M Powell, S C Wallis, R J Pease, Y H Edwards, T J Knott, and J Scott. A novel
form of tissue-specific RNA processing produces apolipoprotein-B48 in intestine.
Cell, 50(6):831–40, Sep 1987.
[98] C Prochnow, R Bransteitter, M G Klein, M F Goodman, and X S Chen. The
APOBEC-2 crystal structure and functional implications for the deaminase AID.
Nature, 445(7126):447–51, Jan 2007.
[99] P Revy, D Buck, F le Deist, and J P de Villartay. The repair of DNA dam-
ages/modifications during the maturation of the immune system: lessons from
human primary immunodeficiency disorders and animal models. Advances in
Immunology, 87:237–95, Jan 2005.
[100] P Revy, T Muto, Y Levy, F Geissmann, A Plebani, O Sanal, N Catalan,
M Forveille, R Dufourcq-Labelouse, A Gennery, I Tezcan, F Ersoy, H Kayserili,
A G Ugazio, N Brousse, M Muramatsu, L D Notarangelo, K Kinoshita, T Honjo,
A Fischer, and A Durandy. Activation-induced cytidine deaminase (AID) defi-
ciency causes the autosomal recessive form of the Hyper-IgM syndrome (HIGM2).
Cell, 102(5):565–75, Sep 2000.
[101] I B Rogozin, M K Basu, I K Jordan, Y I Pavlov, and E V Koonin. APOBEC4,
a new member of the AID/APOBEC family of polynucleotide (deoxy)cytidine
deaminases predicted by computational analysis. Cell Cycle, 4(9):1281–5, Sep
2005.
[102] F Rucci, L Cattaneo, V Marrella, M G Sacco, C Sobacchi, F Lucchini, S Nicola,
S Della Bella, M L Villa, L Imberti, F Gentili, C Montagna, C Tiveron, L Tatan-
gelo, F Facchetti, P Vezzoni, and A Villa. Tissue-specific sensitivity to AID expres-
sion in transgenic mouse models. Gene, 377:150–8, Aug 2006.
[103] J D Salter, J Krucinska, J Raina, H C Smith, and J E Wedekind. A hydrodynamic
analysis of APOBEC3G reveals a monomer-dimer-tetramer self-association that
has implications for anti-HIV function. Biochemistry, 48(45):10685–7, Nov 2009.
[104] M R Sawaya, S Guo, S Tabor, C C Richardson, and T Ellenberger. Crystal struc-
ture of the helicase domain from the replicative helicase-primase of bacteriophage
T7. Cell, 99:167–177, 1999. 10.1016/S0092-8674(00)81648-7.
[105] S L Sawyer, M Emerman, and H S Malik. Ancient adaptive evolution of the primate
antiviral DNA-editing enzyme APOBEC3G. PLoS Biology, 2(9):E275, Sep 2004.
[106] P M Schaeffer, M J Headlam, and N E Dixon. Protein-protein interactions in the
eubacterial replisome. IUBMB Life, 57:5–12, 2005.
[107] T R Schneider and G M Sheldrick. Substructure solution with SHELXD. Acta
Crystallographica, 58(2):1772–9, Oct 2002.
[108] B Schröfelbauer, D Chen, and N R Landau. A single amino acid of APOBEC3G
controls its species-specific interaction with virion infectivity. Proceedings of the
National Academy of Sciences USA, Jan 2004.
[109] R A Sclafani, R J Fletcher, and X S Chen. Two heads are better than one: regula-
tion of DNA replication by hexameric helicases. Genes & Development, 18(17):2039–
45, Sep 2004.
[110] S M D Shandilya, M N L Nalam, E A Nalivaika, P J Gross, J C Valesano, K Shindo,
M Li, M Munson, W E Royer, E Harjes, T Kono, H Matsuo, R S Harris, M Soma-
sundaran, and C A Schiffer. Crystal structure of the APOBEC3G catalytic domain
reveals potential oligomerization interfaces. Structure, 18(1):28–38, Jan 2010.
[111] D F Shechter, C Y Ying, and J Gautier. The intrinsic DNA helicase activity
of Methanobacterium thermoautotrophicum delta H minichromosome maintenance
protein. Journal of Biological Chemistry, 275(20):15049–59, May 2000.
[112] A M Sheehy, N C Gaddis, J D Choi, and M H Malim. Isolation of a human gene
that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature,
418(6898):646–50, Aug 2002.
[113] M R Singleton, M R Sawaya, T Ellenberger, and D B Wigley. Crystal struc-
ture of T7 gene 4 ring helicase indicates a mechanism for sequential hydrolysis of
nucleotides. Cell, 101:589–600, 2000. 10.1016/S0092-8674(00)80871-5.
[114] P Soultanas. The bacterial helicase-primase interaction: a common struc-
tural/functional module. Structure, 13:839–844, 2005. 10.1016/j.str.2005.04.006.
[115] M D Stenglein and R S Harris. APOBEC3B and APOBEC3F inhibit L1 retro-
transposition by a DNA deamination-independent mechanism. Journal of Biolog-
ical Chemistry, Jan 2006.
[116] B V Strokopytov. Phased translation function revisited: structure solution of the
cofilin-homology domain from yeast actin-binding protein 1 using six-dimensional
searches. Acta Crystallographica Section D Biological Crystallography, 61:285–293,
2005. 10.1107/S0907444904033037.
[117] K Syson, J Thirlway, A M Hounslow, P Soultanas, and J P Waltho. Solution
structure of the helicase-interaction domain of the primase DnaG: a model for
helicase activation. Structure, 13:609–616, 2005. 10.1016/j.str.2005.01.022.
[118] T C Terwilliger and J Berendzen. Automated MAD and MIR structure solu-
tion. Acta Crystallographica Section D Biological Crystallography, 55:849–861,
1999. 10.1107/S0907444999000839.
[119] J Thirlway. DnaG interacts with a linker region that joins the N- and C-domains
of DnaB and induces the formation of 3-fold symmetric rings. Nucleic Acids
Research, 32:2977–2986, 2004. 10.1093/nar/gkh628.
[120] J Thirlway and P Soultanas. In the Bacillus stearothermophilus DnaB-DnaG
complex, the activities of the two proteins are modulated by distinct but over-
lapping networks of residues. Journal of Bacteriology, 188:1534–1539, 2006.
10.1128/JB.188.4.1534-1539.2006.
[121] K Tougu and K J Marians. The extreme C terminus of primase is required for
interaction with DnaB at the replication fork. Journal of Biological Chemistry,
271:21391–21397, 1996. 10.1074/jbc.271.35.21391.
[122] K Tougu, H Peng, and K J Marians. Identification of a domain of Escherichia
coli primase required for functional interaction with the DnaB helicase at the
replication fork. Journal of Biological Chemistry, 269:4675–4682, 1994.
[123] B K Tye. MCM proteins in DNA replication. Annual Review of Biochemistry,
68:649–86, Jan 1999.
[124] B K Tye. Minichromosome maintenance as a genetic assay for defects in DNA
replication. Methods, 18(3):329–34, Jul 1999.
[125] A A Vagin and A Teplyakov. MOLREP: an automated program for
molecular replacement. Journal of Applied Crystallography, 30:1022–1025, 1997.
10.1107/S0021889897006766.
[126] Gregory L Verdine and Derek P G Norman. Covalent trapping of protein-DNA
complexes. Annual Review of Biochemistry, 72:337–66, Jan 2003.
[127] C Vonrhein, E Blanc, P Roversi, and G Bricogne. Automated structure solution
with autoSHARP. Methods in Molecular Biology, 364:215–230, 2006.
[128] G Wang, M G Klein, E Tokonzaba, Y Zhang, L G Holden, and X S Chen. The
structure of a DnaB-family replicative helicase and its interactions with primase.
Nature Structural & Molecular Biology, 15(1):94–100, Jan 2008.
[129] A Waterhouse, J B Procter, D MA Martin, M Clamp, and G J Barton. Jalview
version 2–a multiple sequence alignment editor and analysis workbench. Bioinfor-
matics, 25(9):1189–1191, 2009.
[130] J E Wedekind, R Gillilan, A Janda, and J Krucinska. Nanostructures of
APOBEC3G support a hierarchical assembly model of high molecular mass ribonu-
cleoprotein particles from dimeric subunits. Journal of Biological Chemistry, Jan
2006.
[131] J Weigelt, S E Brown, C S Miles, N E Dixon, and G Otting. NMR structure of
the N-terminal domain of E. coli DnaB helicase: implications for structure rear-
rangements in the helicase hexamer. Structure, 7:681–690, 1999. 10.1016/S0969-
2126(99)80089-6.
[132] S Wickner and J Hurwitz. Interaction of Escherichia coli dnaB and dnaC(D)
gene products in vitro. Proceedings of the National Academy of Sciences USA,
72:921–925, 1975. 10.1073/pnas.72.3.921.
[133] W W Wilson. Light scattering as a diagnostic for protein crystal growth–a practical
approach. Journal of Structural Biology, 142(1):56–65, Apr 2003.
[134] M D Winn, G N Murshudov, and M Z Papiz. Macromolecular TLS refinement in
REFMAC at moderate resolutions. Methods in Enzymology, 374:300–321, 2003.
[135] K Xie, M P Sowden, G S C Dance, A T Torelli, H C Smith, and J E Wedekind.
The structure of a yeast RNA-editing deaminase provides insight into the fold
and function of activation-induced deaminase and APOBEC-1. Proceedings of the
National Academy of Sciences USA, 101(21):8114–9, May 2004.
[136] S Yang. Flexibility of the rings: structural asymmetry in the DnaB hexam-
eric helicase. Journal of Molecular Biology, 321:839–849, 2002. 10.1016/S0022-
2836(02)00711-8.
[137] X Yu, M J Jezewska, W Bujalowski, and E H Egelman. The hexameric E. coli
DnaB helicase can exist in different quaternary states. Journal of Molecular Biol-
ogy, 259:7–14, 1996. 10.1006/jmbi.1996.0297.
[138] H Zhang, B Yang, R J Pomerantz, C Zhang, S C Arunachalam, and L Gao. The
cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1
DNA. Nature, 424(6944):94–8, Jul 2003.
[139] L Zhang, J Saadatmand, X Li, F Guo, M Niu, J Jiang, L Kleiman, and S Cen.
Function analysis of sequences in human APOBEC3G involved in Vif-mediated
degradation. Virology, 370(1):113–21, Jan 2008.
Appendix A: mtMCM ATPase
Reproduced with permission from Ryan J Fletcher, Jingping Shen, Lauren G Holden, and
Xiaojiang S Chen. 2008. Identification of Amino Acids Important for the Biochemical
Activity of Methanobacter thermautotrophicus MCM. Biochemistry Aug 29;47 (38), pp
9981-9986. Copyright 2008 American Chemical Society. [38]
Author contributions: R.J.F. designed the project, designed and purified the proteins,
and provided data analysis. J.S. performed biochemical assays and provided data analy-
sis. L.G.H. performed the ATPase assays. X.S.C. supervised the project.
A 1 Abstract
Methanobacter thermautotrophicus minichromosomal maintenance protein (mtMCM) is
a 75 kDa protein that self-assembles into a double hexamer structure. The double hex-
amer formed by the N-terminal region of mtMCM has a highly charged (overwhelmingly
net positive) inner channel. Here we investigate the effects of point mutations of some
of these charged residues on the biological activities of mtMCM. Although all of the mutants
were similar to the wild type in protein folding and complex assembly, we found that
the mutations impaired helicase activity. The study of the DNA binding and ATPase activ-
ities of these mutants revealed that the impairment of the helicase activity was highly
correlated with a decrease in DNA binding, providing evidence consistent with the role
of these charged residues of the inner channel in interactions with DNA.
A 2 Introduction
The progression from G1 to S phase during the cell cycle is highly regulated. A group
of proteins essential for the successful transition and the completion of S phase are the
minichromosomal maintenance proteins (MCM). The MCM proteins in eukaryotes are a
subgroup of six proteins (MCM2–7) within the much larger AAA+ ATPase family [81].
MCM2–7 share a highly similar stretch of amino acids referred to as the MCM box
[124, 65, 61] located in the C-terminal half of the protein. The MCM box contains the
Walker A and Walker B motifs responsible for the binding and hydrolysis of ATP.
MCM proteins play an important role in the initiation of replication, as these pro-
teins serve as replication helicases for fork unwinding. During G1 phase, origins with
the potential to serve as sites for the initiation of DNA replication are marked by an
origin recognition complex (ORC). Subsequent binding by Cdc6 and Cdt1 promotes the
recruitment of MCM proteins to form the prereplication complex (pre-RC). Failure to
recruit MCM proteins results in loss of origin firing and G1-phase arrest. Ensuing activ-
ity by both Dbf4-dependent kinase (DDK) and cyclin-dependent kinase (CDK) promotes
the binding of Cdc45 and recruitment of a number of replication proteins to the pre-RC,
ultimately leading to the initiation of replication. The presence of all six MCM proteins
is necessary for entry into and completion of S phase (reviewed in refs [123, 7, 39]).
Methanobacter thermautotrophicus has a single MCM (mtMCM) protein that has the
ability to self-assemble into a double hexamer structure [111, 25, 36, 60, 37, 59]. Purified
mtMCM protein contains ATPase, DNA binding, and helicase activities in vitro. The
crystal structure of the N-terminal portion of double-hexameric mtMCM (N-mtMCM)
revealed an overwhelmingly net positively charged inner channel surface [36]. Here we
investigated the functional role of five residues located on the inner channel surface of
the double hexameric N-mtMCM structure. Mutational and biochemical studies revealed
that these residues play important roles in helicase function.
A 3 Experimental Procedures
A 3.1 Cloning and Mutagenesis
The gene for Methanobacter thermautotrophicus MCM was cloned into the pGEX-2T
expression vector using the restriction enzymes XhoI and XbaI for expression as a GST-
MCM fusion protein. Purified wild-type mtMCM protein was always severely degraded,
resulting in a heterogeneous population. We identified the two major cleavage sites
for this degradation by N-terminal sequencing and mass spectrometry of the degraded
protein fragments (data not shown). These two cleavage sites occurred at R275 and
R338. To decrease the degradation and provide a more homogeneous protein product,
we generated a mutant with the two arginines mutated to alanines. The degradation
of the mutant was greatly reduced compared with the unmodified wild type protein
(see Supporting Information Figure 1). At the same time, this mutant protein retained
helicase, ATPase, and DNA binding activities at levels similar to those of the wild type
protein. All the channel surface mutants reported in this study were constructed to also
contain these two mutations to eliminate differences in degradation between different
constructs. In this study, the construct containing the two arginine to alanine mutations
is referred to as wild type (wt). Five charged residues on the channel surface, R116, K117,
R122, K178, R223 (Figure 1A, B), were each mutated to alanine through standard site-
directed mutagenesis methods. The sequences of all clones were verified by sequencing
the DNA region encoding the entire MCM protein.
Figure A.1: Location of the five positively charged residues on the inner chan-
nel surface of N-mtMCM: A: N-mtMCM hexamer structure showing the five mutated
residues, R116 (yellow), K117 (pink), K178 (green), R122 (blue), and R223 (cyan). B:
A closeup of the N-mtMCM view shown in panel A. R223 is on one side of the β-
hairpin, R122 is adjacent to the β-hairpin of another monomer. K117 is also on one
side of a monomer, around the side channel formed between two neighboring subunits.
C: Sequence alignments of mtMCM (methanoMCM) and MCM2–7 from S. cerevisiae
(scMCM) and humans (humanMCM). Amino acids mutated in this study are marked
by asterisks. Red boxes highlight aligned residues that are mutated in this study.
A 3.2 Protein Purification
All proteins were expressed in Escherichia coli cells grown at 37 °C until OD600 = 0.2, and
then the temperature was reduced to 24 °C. Isopropyl-β-D-thiogalactoside (IPTG) was added
to 0.2 mM when OD600 = 0.5. The cells were shaken at 24 °C overnight. Cell pellets were
resuspended in purification buffer (50 mM Tris, pH 8.0, 250 mM NaCl, 10 mM DTT) and
sonicated on ice with the protease inhibitors pepstatin, leupeptin, and phenylmethylsulfonyl
fluoride (PMSF). The GST-MCM fusion protein in the supernatant was separated from
other proteins using glutathione resin (Amersham) at 4 °C. After thrombin cleavage of
the GST-fusion protein, the free MCM protein was further purified using Superose-6 gel
filtration column chromatography (16/60 cm). Peak fractions were collected and purity
was analyzed by SDS-PAGE. Protein was concentrated using Amicon Ultra (Millipore)
centrifuge devices and quantified using the Bio-Rad protein assay and Coomassie blue
staining of an SDS-PAGE gel.
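Because the figure legends in this appendix report protein concentrations per double hexamer, a measured mass concentration has to be converted to a molar one. The following is a minimal sketch of that conversion, assuming the nominal 75 kDa monomer mass quoted in the abstract and 12 subunits per double hexamer. The function names are illustrative only, and the exact mass of the purified construct is not stated here.

```python
# Illustrative conversion of a mass concentration to molar units.
# Assumptions (not from the purification protocol itself): the nominal
# 75 kDa monomer mass from the abstract and a 12-subunit double hexamer.
MONOMER_KDA = 75.0
SUBUNITS_PER_DOUBLE_HEXAMER = 12


def double_hexamer_molarity_nM(mg_per_mL: float) -> float:
    """Return the double hexamer concentration in nM for a given mg/mL."""
    complex_g_per_mol = MONOMER_KDA * 1000.0 * SUBUNITS_PER_DOUBLE_HEXAMER
    # mg/mL is numerically equal to g/L; dividing by g/mol gives mol/L.
    molar = mg_per_mL / complex_g_per_mol
    return molar * 1e9  # mol/L -> nmol/L


def monomer_molarity_nM(mg_per_mL: float) -> float:
    """Return the monomer concentration in nM for the same sample."""
    return double_hexamer_molarity_nM(mg_per_mL) * SUBUNITS_PER_DOUBLE_HEXAMER


if __name__ == "__main__":
    # Hypothetical stock concentration, chosen only for illustration.
    print(round(double_hexamer_molarity_nM(0.9), 3))
```

Under these assumptions a 0.9 mg/mL stock corresponds to roughly 1 µM double hexamer, which is twelve times lower than the monomer molarity of the same sample.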
A 3.3 Heat Stability of Mutants
Protein samples (10 µg) were heated at 55 °C in 15 µL of helicase buffer for 30 min. The
heat-treated protein was analyzed by Superose-6 column chromatography and on a 10%
SDS-PAGE gel.
A 3.4 Helicase Assay
A 60 nucleotide primer 5′-TTT TTT TTT TTT TTT TTT TTT TTT CGC GCG GGG
AGA GGC GGT TTG CGT ATT GGG CGC C-3′ (Operon) was purified and radio-
labeled as previously described for DNA binding assays. The labeled primer was mixed
in a 2:1 ratio with M13mp18 circular single-stranded DNA in 300 mM NaCl and 20 mM
Tris (pH 8.0), boiled in 900 µL water for 3 min and allowed to cool to room temperature
over 5 h. This helicase substrate containing 34 complementary bases and a 26 dT non-
complementary overhang was purified from excess primer using 700 µL Sephacryl 300HR
resin in a 2 mL glass column (Bio-Rad). Fractions containing helicase substrate were
pooled and stored at 4 °C. Reactions of 20 µL containing 20 mM Tris (pH 7.8), 10 mM
MgCl2, 1 mM DTT, 5 mM ATP, 0.1 mg/mL BSA, helicase substrate, and MCM protein
were assembled on ice. The reactions were then incubated at 50 °C for 30 min, after which
5x stop solution (10 mM EDTA, 0.5% SDS, 0.1% xylene cyanol, 0.1% bromophenol blue, and
50% glycerol) was added and the sample was placed on ice. Reactions were analyzed on
a 12% polyacrylamide gel in 1X TBE run on ice for 50 min at 150 V. The gel was dried
and then exposed to film and a phosphorimaging plate for quantification.
A 3.5 DNA Binding Assay
All primers (Operon) were resuspended in 10 mM NaOH and purified by FPLC using
a Mono-Q 10/100 column (Amersham). Purified DNA was concentrated and quanti-
fied using OD260. For the double-stranded DNA (dsDNA) binding assay, complemen-
tary primers were mixed, boiled for 10 min in 900 µL of water, and allowed to cool to
room temperature overnight. The annealed dsDNA was then purified by FPLC over a
Superdex-75 16/60 (Amersham) column in 50 mM NaCl. The DNA used for the single-
strand binding assay was 5′-AAA GCG CTG ACC TAT CGC GTA TAG CTC GAG GA-3′
and the complementary oligonucleotides used for the double strand binding assay were
5′-GCG CTG ACC TAT CGA CCT ATA CGG TTA GCC-3′ and 5′-GGC TAA CCG TAT
AGG TCG ATA GGT CAG CGC-3′. The primers were labeled using T4 polynucleotide
kinase (NEB) and γ-32P-ATP (Amersham, 3000 Ci/mmol). The labeled primer was puri-
fied using Microspin G-25 columns (Amersham) and stored at 4 °C. Reactions of 20 µL
were assembled at room temperature with 50 mM NaCl, 5 mM MgCl2, 50 mM Tris (pH
7.8), 1 mM DTT, 0.1 mg/mL BSA, mtMCM protein (between 0.5 and 1.0 µM), and 15 nM
labeled DNA. Reactions were incubated at 25 °C for 30 min and then 5 µL loading buffer
(50% glycerol, 0.1% xylene cyanol, and 0.1% bromophenol blue) was added. Reactions
were analyzed by gel electrophoresis and by filter binding assay. For gel analysis, the
reactions were loaded onto a 2% agarose gel and electrophoresed for 1 h at 80 V in 1X
TBE buffer. Gels were dried and exposed to film and phosphorimaging plates for quan-
tification. For the filter binding assay, 10 µL reaction mixtures were applied to a nitrocellulose
filter (Millipore, HA 0.45 µm). The filter was washed with 15 mL of reaction buffer.
The radioactivity retained on the filter was quantified by liquid scintillation counting.
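The annealing step above relies on the two dsDNA oligonucleotides being exact reverse complements of each other. The short sketch below checks that property for the sequences quoted in this section and reports their lengths. The helper names are illustrative and not part of the original methods.

```python
# Sequence sanity check for the binding-assay oligonucleotides.
# The sequences are copied from the text above; helper names are illustrative.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SS_OLIGO = "AAA GCG CTG ACC TAT CGC GTA TAG CTC GAG GA"
DS_TOP = "GCG CTG ACC TAT CGA CCT ATA CGG TTA GCC"
DS_BOTTOM = "GGC TAA CCG TAT AGG TCG ATA GGT CAG CGC"


def clean(seq: str) -> str:
    """Remove the triplet spacing and normalize to upper case."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, written 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def fully_complementary(top: str, bottom: str) -> bool:
    """True if the two strands anneal into a blunt duplex with no mismatch."""
    return reverse_complement(top) == clean(bottom)


if __name__ == "__main__":
    for name, seq in (("ss", SS_OLIGO), ("ds top", DS_TOP), ("ds bottom", DS_BOTTOM)):
        print(name, len(clean(seq)), "nt")
    print("ds strands complementary:", fully_complementary(DS_TOP, DS_BOTTOM))
```

The same check can be run on any newly ordered oligonucleotide pair before annealing, since a single transcription error in either strand would leave a mismatch in the duplex.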
A 3.6 ATPase assay
Reactions (10 µL) containing 50 mM NaCl, 5 mM MgCl2, 50 mM Tris (pH 7.8), 1 mM
DTT, 0.1 mg/mL BSA, 32P-labeled ATP (Amersham, 3000 Ci/mmol), 15 µM cold ATP, and
various amounts of MCM proteins were assembled on ice. Reactions were incubated at
50 °C for 30 min and plunged into ice water, and 10 mM EDTA was added. Aliquots (5
µL) from each reaction were placed onto a prewashed PEI cellulose TLC plate (EMD
Chemicals Inc.) and run for two hours in 2 M acetic acid and 0.5 M LiCl. Plates were
dried and exposed to film and phosphorimaging plates for quantification.
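One common way to turn the TLC signals into an amount of ATP hydrolyzed is sketched below. It assumes that the radiolabeled tracer contributes negligibly to the total ATP, so the 15 µM cold ATP in a 10 µL reaction sets the total at 150 pmol, and that the fraction hydrolyzed equals the product signal divided by the total signal in the lane. The function names and the example signals are hypothetical, and this is not necessarily the exact calculation used for the figures.

```python
# Illustrative quantitation of a TLC-based ATPase assay.
# Assumption: the radiolabeled tracer is a negligible fraction of total ATP,
# so total ATP is set by the cold ATP concentration and the reaction volume.
def total_atp_pmol(cold_atp_uM: float = 15.0, volume_uL: float = 10.0) -> float:
    """Total ATP per reaction; µM multiplied by µL gives pmol."""
    return cold_atp_uM * volume_uL


def atp_hydrolyzed_pmol(product_signal: float, atp_signal: float,
                        cold_atp_uM: float = 15.0,
                        volume_uL: float = 10.0) -> float:
    """Convert phosphorimager signals from one lane to pmol ATP hydrolyzed."""
    total_signal = product_signal + atp_signal
    if total_signal <= 0:
        raise ValueError("lane contains no signal")
    fraction_hydrolyzed = product_signal / total_signal
    return fraction_hydrolyzed * total_atp_pmol(cold_atp_uM, volume_uL)


if __name__ == "__main__":
    # Hypothetical signals in arbitrary phosphorimager units.
    print(atp_hydrolyzed_pmol(product_signal=20.0, atp_signal=80.0))
```

Expressing hydrolysis as a fraction of the lane total makes the result independent of how much of the reaction was spotted, which is why only a 5 µL aliquot needs to be analyzed.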
A 4 Results
A 4.1 Location of Mutated Residues on the mtMCM Double-Hexamer
Structure
The crystal structure of the N-terminal region of mtMCM (N-mtMCM) shows that the
five residues (R116, K117, R122, K178, R223) studied here are located close to the
central channel surface (Figure 1A,B). K117, R122, and K178 are highly conserved among
eukaryotic MCMs (Figure 1C). R223 is not strictly conserved, but there appear to be few
positively charged residues in the vicinity of R223 among the aligned MCM sequences
and R116 appears to be unique to mtMCM (Figure 1C). R116, K117, and R122 are
located on the same β-strand and K178 is on the adjacent antiparallel β-strand (Figure
1A,B). R223 is located on the previously characterized DNA binding β-hairpin finger
[36] and is close to the cluster of the other mutated amino acids.
A 4.2 Mutant Protein Assembly and Stability
We first performed experiments to test the structural integrity of the mutant mtMCM
proteins in order to rule out the possibility of disruption of protein folding or assembly
by the mutations. All mutant proteins and the wild-type (wt) protein were analyzed by
FPLC using an analytical Superose-6 gel filtration column before and after the proteins
were heat-treated at 55 °C. With or without heat treatment, all mutants had chromatographic
profiles similar to that of the wt, characterized by a peak (P2) with an apparent molecular
weight of a double hexamer of mtMCM (Figure 2A). SDS-PAGE of fractions collected
across this peak revealed a predominant band consistent with the size of the full-length
protein (Figure 2B). N-terminal sequencing showed that other minor bands were degra-
dation products of the full-length protein that could not be completely eliminated even
after extensive purification. As all mutants and wt showed similar extents of degra-
dation, the degradation should not interfere with the comparison and interpretation of
biochemical results. The chromatography analysis indicated that none of the mutations
affected the folding, oligomerization state, or heat stability of mtMCM in solution.
Figure A.2: Gel filtration analysis of wt and mutant mtMCM proteins: A:
FPLC overlay of OD280 curves from wt (blue) and the five mutant proteins run on a
preparative grade Superose 6 column. The profiles of all proteins overlapped. Peak 1
(P1) is in the void volume, P2 contains protein that has an apparent molecular weight
(Mwt) of approximately 960 kDa, and peak 3 (P3) contains degraded protein that cannot
oligomerize. P2 has an apparent molecular weight consistent with that of a double
hexamer. B: SDS–PAGE analysis of the peak fraction of P2 for the wt and each mutant
protein. Lanes 1–6 are wt, R116A, K117A, K178A, R122A, and R223A, respectively.
An asterisk marks the position of the full-length protein. The minor bands below the
full-length protein band resulted from a small amount of degradation of the
full-length protein.
Figure A.3: Helicase activity of wt and mutant proteins: A: Gel analysis of
helicase assay. Increasing concentrations of wt (lanes 3–6), R223A (lanes 7–10), R116A
(lanes 11–14), K117A (lanes 15–18), R122A (lanes 19–22), and K178A (lanes 23–26) were
incubated with radiolabeled helicase substrate at 50 °C for 30 min, and samples were then
analyzed by 12% polyacrylamide gel electrophoresis. Boiled and no protein controls are
in lanes 1 and 2, respectively. B: Quantitation of unwinding in panel A, expressed as
percentage of ssDNA released from the dsDNA substrate. Protein concentrations were
calculated based on double hexamer molecular weights. Error bars (SEM) are from a
minimum of three independent experiments.
A 4.3 Helicase Assay
A helicase assay showed that the mutants had various degrees of reduced helicase activity
when compared with the wt protein (Figure 3A,B). Over the tested protein concentration
range, helicase activity of R223A was 2–4 fold lower than the activity of the wt protein
and the helicase activities of R116A, K117A, R122A, and K178A were 3–7 fold lower than
that of the wt (Figure 3A,B). This suggests that these charged residues are important
for the helicase activity of mtMCM.
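The unwinding values referred to above are expressed as the percentage of ssDNA released from the substrate, with SEM error bars from at least three experiments. A minimal sketch of such a calculation follows. The optional background correction against the no-protein lane is an assumption commonly made in gel-based helicase assays and is not stated in the text, and all names and numbers are illustrative.

```python
# Illustrative quantitation of a gel-based helicase assay.
# Assumption: an optional background fraction, taken from the no-protein
# control lane, is subtracted and the remaining range is rescaled to 100%.
from math import sqrt
from statistics import mean, stdev


def percent_unwound(displaced: float, annealed: float,
                    background_fraction: float = 0.0) -> float:
    """Percentage of labeled primer released from the annealed substrate."""
    total = displaced + annealed
    if total <= 0:
        raise ValueError("lane contains no signal")
    if not 0.0 <= background_fraction < 1.0:
        raise ValueError("background_fraction must be in [0, 1)")
    fraction = displaced / total
    corrected = (fraction - background_fraction) / (1.0 - background_fraction)
    return 100.0 * max(corrected, 0.0)


def mean_and_sem(values):
    """Mean and standard error of the mean for replicate experiments."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for an SEM")
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical band intensities from three independent experiments.
    replicates = [percent_unwound(d, a) for d, a in ((25, 75), (30, 70), (20, 80))]
    print(mean_and_sem(replicates))
```

Normalizing each lane to its own total signal corrects for small differences in loading between lanes, which matters when comparing mutants run on the same gel.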
A 4.4 DNA binding
The ability of wt and mutants to bind single-stranded DNA (ssDNA) and double-
stranded DNA (dsDNA) was determined by native gel shift using a radiolabeled oligonu-
cleotide and four different concentrations for each protein. All mutants showed reduced
ssDNA binding compared to wt (Figure 4A). At the protein concentration of 1 µM, the
ssDNA binding activity for K117A and K178A was approximately 4 fold lower than that
for the wt, and for the other three mutants (R116A, R122A, R223A), it was roughly
2 fold lower than wt. dsDNA binding assay results were quite similar to those of the
ssDNA binding analysis (Figure 4B). Four out of the five mutants (K117A, R122A,
K178A, R223A) showed lower dsDNA binding (approximately 2–4 fold) than the wt
(Figure 4B). However, R116A bound to dsDNA at a level similar to the wt protein.
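Statements such as "approximately 2–4 fold lower" summarize a ratio taken at each protein concentration. The sketch below shows one way to compute that range from paired measurements. The data structure, the function name, and the numbers are hypothetical and serve only to illustrate the arithmetic.

```python
# Illustrative fold-reduction summary across protein concentrations.
# Inputs are dictionaries mapping protein concentration to measured activity
# (for example, percent DNA bound); all values used here are hypothetical.
def fold_reduction_range(wt: dict, mutant: dict) -> tuple:
    """Return (min, max) of wt/mutant activity over shared concentrations."""
    folds = [wt[c] / mutant[c] for c in wt if c in mutant and mutant[c] > 0]
    if not folds:
        raise ValueError("no shared concentrations with nonzero mutant activity")
    return min(folds), max(folds)


if __name__ == "__main__":
    wt_binding = {0.25: 20.0, 0.5: 40.0}
    mutant_binding = {0.25: 10.0, 0.5: 10.0}
    low, high = fold_reduction_range(wt_binding, mutant_binding)
    print(f"{low:.1f} to {high:.1f} fold lower than wt")
```

Reporting the range rather than a single ratio reflects the fact that the fold difference can change across the tested concentrations.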
A 4.5 ATPase Activity
The effect of each mutation on the ability of mtMCM to hydrolyze ATP was measured
using standard ATPase assays. The mutations (R116A, K117A, R122A, K178A, R223A)
had little effect on the level of ATP hydrolysis when compared to wt (Figure 5A–C),
suggesting that none of the residues are directly involved in ATP binding or hydrolysis.
A 5 Discussion
The charged residues under investigation (R116, K117, R122, K178, and R223) are all
located on the inner central channel surface of the N-mtMCM, a location potentially
important for interacting with DNA, which binds within the central channel, and for
helicase function. We showed here that mutation of any of these five residues to alanine
resulted in lower helicase activity compared with the wt protein. Although all mutants
showed ATPase activity comparable to that of wt, all five mutants had consistently
reduced ssDNA binding and four out of the five mutants showed reduced dsDNA binding,
an observation consistent with the role of these residues in interaction with DNA.
Figure A.4: DNA binding of wt and mutants: A: Quantitation of ssDNA binding
of wt and mutants at the indicated protein concentrations. B: Quantitation of dsDNA
binding of wt and mutants at the indicated protein concentrations. Protein concentra-
tions were calculated based on double hexamer molecular weights. Error bars (SEM) are
from a minimum of three independent experiments.
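The error bars described in these captions are standard errors of the mean (SEM) over at least three independent experiments. As a minimal sketch of that calculation (the replicate values below are hypothetical and are not taken from the figures), the SEM is the sample standard deviation divided by the square root of the number of replicates:

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, SEM) for a list of replicate measurements.

    SEM = sample standard deviation / sqrt(n), so n must be at least 2.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical percent-bound values from three independent gel shifts
# (illustrative numbers only, not data from this study).
replicates = [40.0, 46.0, 43.0]
mean, sem = mean_and_sem(replicates)
print(f"mean = {mean:.1f}, SEM = {sem:.2f}")
```

With only three replicates the SEM is itself a rough estimate, which is worth keeping in mind when comparing mutants whose error bars nearly overlap.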
Even though the positively charged residues mutated in this study are all located
on the inner channel of the double hexamer, the five residues have different degrees of
exposure to the channel surface. For example, R122, R116, and K117 are located on
the side of the monomer that faces the next monomer within the ring (Figure 1A,B),
forming the edge around the wide-open side of the channel. These three residues are not
as exposed to the channel surface as R223 and K178. In order to directly contact DNA,
Figure A.5: ATPase activity of the wt and mutants: A: TLC analysis of ATPase assay. Four protein concentrations, 16,
32, 64, and 128 nM, were used for wt (lanes 2–5 and 19–22), R116A (lanes 6–9), K117A (lanes 10–13), R122A (lanes 14–17),
K178A (lanes 23–26), and R223A (lanes 27–28). No protein control is in lanes 1 and 18. B: Quantitation of results from panel
A. Protein concentrations were calculated based on double hexamer molecular weights. C: TLC analysis of ATPase assay of
the mutants in the presence of dsDNA (10 ng/μL 1.0 kb PCR DNA product), showing that all mutants responded to dsDNA
stimulation of ATPase activity similarly to the wt. The chart on the bottom is a quantitation of ATPase stimulation results.
Error bars (SEM) in all charts are from a minimum of three independent experiments.
a modest conformational change would be required to expose these residues more to the
inner channel surface. Alternatively, DNA may interact with these residues by binding
or passing the side channel, as suggested in the looping model of SV40 LTag [66, 41, 109].
Furthermore, the location of the residues near the interface between two adjacent
subunits (R122, R116, K117) may influence the interactions between two neighboring
monomers during conformational changes necessary for helicase function. Of the residues
evaluated, R122 has been shown to have direct contact with the next neighbor through
the β-hairpin (Figure 1B). Indeed, of the mutants analyzed, R122A, R116A, and K117A
showed the most significant reduction in ssDNA and dsDNA binding (Figure 4) as well
as in helicase function (Figure 3). R116, K117, and R122 are conserved among most S.
cerevisiae and human MCMs (Figure 1C), suggesting the importance of these residues
for MCM function.
Based on the N-mtMCM structure, R223 sits on the previously characterized DNA
binding β-hairpin structure that has been shown to be important for helicase function
(10, 17). Previously, double or triple mutations of three other positively charged residues
on this β-hairpin, R226, R229, and K231, were shown to completely disrupt the DNA
binding and helicase function. Intriguingly, point mutation of R223 to alanine only
partially disrupted MCM DNA binding and helicase activity (Figure 3, Figure 4), suggesting
that the multiple positively charged residues on the β-hairpin may be partially
complementary to each other.
In summary, this mutational study clearly demonstrated that the positively charged
residues on the inner central channel, R116, K117, R122, K178 and R223, are critical
for the functional activities of mtMCM. Mutation of any of these residues resulted in
decreased helicase activity, which was strongly associated with reduced binding to both
ssDNA and dsDNA. Even though these residues are located outside of the highly con-
served MCM box (helicase domain), the critical residues for classifying MCM proteins,
four of the five residues are highly conserved in eukaryotes, underscoring the functional
significance of these charged residues of the hexameric channel surface of the N-terminal
region of MCM.
A 6 Supporting Information
Comparison of the wt mtMCM and the double mutant on the full-length mtMCM. This
material is available free of charge via the Internet at http://pubs.acs.org.
Appendix B: G40P ATPase
Reproduced with permission from Ganggang Wang, Michael G Klein, Etienne Tokonzaba,
Yi Zhang, Lauren G Holden, and Xiaojiang S Chen. Nature Structural & Molecular
Biology 15, 94 - 100. Copyright 2008 Nature Publishing Group [128]
Author contributions: G.W. designed, purified and crystallized the protein construct.
M.G.K. solved the structure and provided analysis of the structure. E.T. did the helicase
and DNA binding assays. Y.Z. made and purified mutants for the biochemical assays.
L.G.H. performed the ATPase assays. X.S.C. supervised the project.
B 1 Abstract
Helicases are essential enzymes for DNA replication, a fundamental process in all living
organisms. The DnaB family are hexameric replicative helicases that unwind duplex
DNA and coordinate with RNA primase and other proteins at the replication fork in
prokaryotes. Here, we report the full-length crystal structure of G40P, a DnaB family
helicase. The hexamer complex reveals an unusual architectural feature and a new
type of assembly mechanism. The hexamer has two tiers: a three-fold symmetric N-terminal
tier and a six-fold symmetric C-terminal tier. Monomers with two different
conformations, termed cis and trans, come together to provide a topological solution
for the dual symmetry within a hexamer. Structure-guided mutational studies indicate
an important role for the N-terminal tier in binding primase and regulating primase-
mediated stimulation of helicase activity. This study provides insights into the structural
and functional interplay between G40P helicase and DnaG primase.
B 2 Introduction
Replication of cellular genomic DNA requires the highly coordinated activities of many
factors. In Escherichia coli cells, initiation of DNA replication occurs through the con-
certed actions of DnaA, the DnaB helicase and the DnaG primase, which leads to the
assembly of the replisome complex and the formation of two replication forks (reviewed
in refs. [106, 30]). For the replication fork to form, DnaB must be recruited to the melted
origin of DNA by DnaC and DnaA [132, 33, 27]. The DnaB helicases also associate with
the DnaG primase and the polymerase loader DnaX to coordinate fork unwinding with
RNA primase and DNA polymerase activities [1, 122, 55, 114, 42].
During elongation of DNA replication, DnaB helicase unwinds double-stranded DNA
to provide templates for leading- and lagging-strand synthesis (reviewed in refs. [106,
30, 8]). Evidence indicates that the DnaB-like helicases encircle single-stranded DNA
near a DNA fork on the 5' side, with the C-terminal domain of the helicase facing
the fork (reviewed in [91]). As DnaB unwinds DNA in a 5'→3' direction, Okazaki
fragments are primed by primase for lagging-strand synthesis using the ssDNA exiting
the helicase channel. By recruiting DnaG to hexameric DnaB at the replication fork,
DnaB regulates the priming activity and processivity of RNA primase [122, 55, 70, 121].
Conversely, primase stimulates the ATPase and helicase activities of DnaB [12, 120].
Although this cross-talk between primase and helicase is well documented, no
mechanistic explanation for it is available.
The Bacillus subtilis bacteriophage SPP1 helicase, G40P, is a homolog of bacterial
DnaB helicase. G40P has the same domain structure as other bacterial DnaB homologs
(Fig. 1a) and shares 35% and 45% sequence identity with the replicative helicases from
E. coli and B. subtilis, respectively. Functionally, G40P and the cellular helicase both
interact with DnaG primase for DNA replication [3, 92].
The current structural information for DnaB hexameric helicase is limited to low-
resolution EM images obtained for DnaB and G40P. These reveal two main classes of
double-tiered hexamer: three-fold and six-fold hexamers [137, 74, 73, 136, 83]. Domain
assignments have been attempted, placing the N-terminal fragment of DnaB [35, 131]
into the smaller ring of the EM images and positioning the T7 gp4 helicase domain
[104, 113] into the larger ring.
To advance our understanding of DnaB's role at the replication fork and its interac-
tions with other replication proteins, such as DnaG primase, we determined the crystal
structure of the full-length G40P hexamer. The structure revealed a new type of archi-
tecture that had an unexpected dual symmetry: a three-fold N-terminal tier and a
near–six-fold C-terminal tier. Assembly of the two distinct tiers in a single hexamer is
achieved by using two monomer conformations. Monomers with cis and trans confor-
mations interact alternately, like a right hand holding a left hand, with three such pairs
joining in a circle to form a hexameric ring. Mutagenesis guided by the G40P structure
provided insights into structural and functional interplay with DnaG primase.
B 3 Results
B 3.1 Overall structure of the hexameric helicase
We determined the structures of the full-length and a deletion mutant of G40P helicase,
a DnaB homolog from B. subtilis bacteriophage SPP1 (see Methods). The deletion
mutant (N129) lacking the N-terminal 129 residues yielded a crystal form containing one
molecule per asymmetric unit (a.s.u.). The structure of N129 (Supplementary Fig. 1a
online) was used to help determine the full-length G40P structure. The full-length G40P
crystallized with one complete hexamer in one a.s.u.
Figure B.1: The overall features of the G40P hexameric helicase structure: (a) Diagram showing the domain organization
of the full-length SPP1 G40P replicative helicase. The 'linker' region between the N-terminal globe (N-glob, green) and
the C-terminal ATPase domain (magenta) forms an α-hairpin composed of two α-helices (cyan). (b) Side view of the full-length
G40P hexamer structure in ribbon representation. Note the distinct separation between the wider, thinner top N-terminal
tier (green and cyan) and the narrower, thicker bottom C-terminal tier (magenta). (c) Top view of the G40P hexamer in
ribbon representation. View shows the wide open, quasi–three-fold triangular N-terminal tier on the top of the pseudo–six-fold
C-terminal tier.
The G40P hexamer resembled a two-tiered ring (Fig. 1b,c). A very unusual feature
of this double-tiered ring was the presence of two distinctive symmetry patterns (Fig.
1c). The top tier, containing the N-terminal domains, had a near–three-fold symmetry.
In contrast, the bottom tier, composed of the C-terminal ATPase domains, had a quasi–six-fold
symmetry. Unexpectedly, the top N-terminal tier (N-tier) was wider than the
C-terminal tier (C-tier). This is in contrast to the structure suggested in the EM reports
for DnaB and G40P, which assigned the C-terminal ATPase ring to be the wider of the
two tiers [136, 83]. The top N-tier had a much larger channel diameter than the bottom
C-tier, 42 Å versus 17 Å, respectively, as measured between the nearest Cα carbons.
Another surprise was that the linker region that was previously assumed to be flexible
was actually well-structured in the full-length hexamer.
B 3.2 Monomer structure
The full-length G40P monomer structure was composed of three domains: an N-terminal
globular domain (residues 12–93), a 'linker' region (residues 94–147) composed of two
long α-helices, and a C-terminal RecA-like domain (residues 179–437) (Fig. 2a,b). The N-terminal
globular domain (N-globe) consisted of four α-helices (h1–h4; Fig. 2a), as seen
in the X-ray and NMR structures of the N terminus of E. coli DnaB [35, 131]. The linker
region folded into two consecutive α-helices (h5 and h6) arranged in an antiparallel fashion
to form a hairpin-like structure (α-hairpin) (Fig. 2a,b). The RecA-like C-terminal
domain (C-domain) consisted of a nine-stranded β-sheet sandwiched by three α-helices
on both sides. This β-sheet core was similar to that of T7 gp4 helicase domain [104, 113],
with a superposition of 1.233 Å r.m.s. deviation over 77 Cα atoms from the β-sheet core
(Supplementary Fig. 1b). However, superposition over 253 Cα atoms of G40P ATPase
domains and T7 gp4 helicase domain had an r.m.s. deviation of 2.825 Å, indicating a
much larger difference for the helical and loop regions outside the β-sheet core.
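The r.m.s. deviations quoted above summarize how far equivalent atoms lie from one another once two structures have been superposed. The sketch below shows only the final averaging step, assuming the two coordinate lists are already paired atom-for-atom and already superposed; the superposition itself, and the coordinates used here, are not taken from this work.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Each list holds (x, y, z) tuples in angstroms, and the i-th atom of
    coords_a is assumed to be equivalent to the i-th atom of coords_b.
    No superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy coordinates (hypothetical): every atom is displaced by 1 angstrom in z.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(model_a, model_b))
```

Because the value is an average over the chosen atoms, restricting the comparison to a conserved core, as done for the 77 core atoms above, will generally give a smaller number than a comparison over the whole domain.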
Figure B.2: Two distinct monomeric structures of G40P: (a) The cis structure,
in which the α-hairpin (helices h5, h6; green) points toward the ATPase domain (cyan).
α-helices are labeled h1–h15, strands 1–9, both from N to C termini. In this conformation, the
α-hairpin, but not the N-globe, contacts the ATPase domain. (b) The trans structure, in
which the α-hairpin (yellow) points away from the ATPase domain. In this conformation,
neither the α-hairpin nor the N-globe (blue) contacts the ATPase domain (red). (c)
Superposition of the two conformations based on the ATPase domains, which overlap
well (0.35 Å r.m.s. deviation). However, from h7 toward the N terminus, the α-hairpin and
N-globe of the two conformers have very different orientations and positions.
B 3.3 Cis and trans structures
One feature of the G40P hexamer that to our knowledge is unique was that the complex
was composed of two drastically different monomer conformations, termed cis and trans
structures (Fig. 2a versus Fig. 2b). The cis structure had the α-hairpin pointing to the
same side (cis side) as the C-domain (Fig. 2a). The N-globe in the cis structure projected
away from the C-domain. The trans structure had the α-hairpin pointing to the opposite
side (trans side) of the C-domain (Fig. 2b), which placed the N-globe in a different
position compared to that in the cis structure. Another distinction was the connecting
loop (loop1) between h7 and the α-hairpin, which was kinked in the cis structure (Fig.
2a) but nearly straight in the trans structure (Fig. 2b). The differences between the cis
and trans monomers become evident when the two C-domains are superimposed (Fig.
2c). It appears that a large rotation of the α-hairpin relative to h7 would be needed
to generate the marked positional switch of the α-hairpin and N-globe between the two
conformers.
Architecture of the N-terminal tier
The cis and trans structures assembled in the hexameric ring in alternating arrangement
(Fig. 3a). The cis monomer 1 was positioned between two trans monomers (2 and 6)
to form the 2↔1 and 1↔6 interfaces. Thus, two distinct interfaces can be found within
the N-tier, termed the tail-tail dimer (or α-hairpin dimer, Fig. 3b) and head-head dimer
(Figure B.3c), both formed by pairing a cis and a trans monomer. The α-hairpin dimer
interface buried on average a surface area of 1,781 Å². The interface interactions were
extensive and involved a total of 32 residues. In contrast to the extensive hairpin-hairpin
interface, the head-head interaction had a relatively small interface burying on average
922 Å², with 19 residues making bonding contacts.
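The alternating arrangement described above can be made concrete with a small bookkeeping sketch. It assumes a closed ring of six monomers with cis monomers at odd positions and trans monomers at even positions (so that cis monomer 1 sits between trans monomers 2 and 6, as in the text). Which of a monomer's two interfaces is tail-tail and which is head-head is assigned arbitrarily here, and the buried areas are read as 1,781 and 922 square angstroms; both of these are assumptions of the sketch, not statements from the structure.

```python
# Average buried areas quoted in the text, read here as square angstroms.
TAIL_TAIL_AREA = 1781.0
HEAD_HEAD_AREA = 922.0


def build_ring(n_monomers=6):
    """Map monomer number -> conformation, cis at odd and trans at even positions."""
    return {i: ("cis" if i % 2 == 1 else "trans") for i in range(1, n_monomers + 1)}


def ring_interfaces(ring):
    """Return (i, j, kind) for each neighbouring pair around the closed ring."""
    n = len(ring)
    result = []
    for i in range(1, n + 1):
        j = i % n + 1  # the last monomer wraps around to monomer 1
        # Assumption: the interface that starts at a cis monomer is tail-tail.
        kind = "tail-tail" if ring[i] == "cis" else "head-head"
        result.append((i, j, kind))
    return result


ring = build_ring()
interfaces = ring_interfaces(ring)
for i, j, kind in interfaces:
    print(f"{i} ({ring[i]}) <-> {j} ({ring[j]}): {kind}")

# Approximate total area buried within the N-tier, using the average values.
total_area = sum(
    TAIL_TAIL_AREA if kind == "tail-tail" else HEAD_HEAD_AREA
    for _, _, kind in interfaces
)
print(f"approximate total N-tier buried area: {total_area:.0f} square angstroms")
```

The enumeration shows why two conformations suffice: every monomer takes part in exactly one interface of each kind, so three cis-trans pairs close the ring with three-fold rather than six-fold symmetry.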
B 3.4 Hexamerization of the ATPase domain
Because no strict six-fold symmetry existed along the hexameric channel, the six interfaces
between ATPase domains within the C-tier showed marked differences. As a result,
the surface area buried ranged from 2,243 Å² at the smallest interface to 3,122 Å² at
the largest interface within the C-terminal ATPase ring (including helix 7 and the entire
ATPase domain, residues 158 to 436), suggesting plasticity in interactions between C-terminal
ATPase domains. The h7 at the N terminus of the ATPase domain serves to
hold the ATPase ring together, extending out to fit into a groove on the adjacent ATPase
domain (Fig. 3d). Comparing the six monomers in the hexamer structure in the region
of h7 showed that this hinged arm emanated from its own ATPase domain with different
angles, largely because of the flexibility provided by two glycines (Gly173 and Gly177)
Figure B.3: Assembly of the G40P hexamer: (a) The triangular N-tier, showing two
types of dimer interface: three N-globe-to-N-globe interfaces (head-head) and three
α-hairpin-to-α-hairpin interfaces. The three cis monomers (purple) form the inner triangle,
and the three trans monomers (blue) form the outer triangle. (b) The tail-tail dimer
formed by packing between the cis (purple) and trans (blue) α-hairpins. (c) The head-head
dimer formed between cis and trans N-globes. (d) Side view of the G40P hexamer
showing the inter-subunit arrangement in a hexamer assembly. The h7 of the purple or
blue monomer reaches into the ATPase domain of the next monomer, and the N-globe
of the purple monomer (cis) packs with h13 and h14 of the ATPase domain of the blue
monomer (trans). The α-hairpin of the trans monomer in blue reaches over to pack
with the α-hairpin of the cis monomer in purple to form a four-helix bundle, and the
four-helix bundle interacts with h13 and h14 of the cis ATPase domain in purple.
on the loop connecting h7 to the ATPase domain. This loop's intrinsic flexibility allows
h7 to grip the neighboring monomer when the interface area changes between adjacent
ATPase domains, possibly facilitating conformational changes of the hexamer required
for DNA unwinding.
Around the ATP binding pocket of the full-length G40P hexamer, crystallized in the
absence of nucleotide, the critical arginine finger (Arg414) from the neighboring subunit
pointed away from the p-loop (the phosphate-binding region containing the Walker A
motif) (Supplementary Fig. 1c), as in the empty site of T7 gp4 and SV40 large T helicases
[104, 113, 41]. In contrast, in the G40P-129 structure, crystallized as a complex with
the nonhydrolyzable analog ATP-γ-S, the residues involved in binding ATP, in particular
the arginine finger (Arg414) together with Lys412, contacted the triphosphate groups of
the nucleotide (Supplementary Fig. 1d).
Besides the N-to-N and C-to-C intra-tier domain interactions, there were also inter-
tier contacts, which were characterized by two types of N-to-C domain-packing inter-
actions (Fig. 3d). In one of these inter-tier contacts, an N-globe from a cis monomer
rested on the ATPase domain of an adjacent trans monomer. The second N-C contact
involved two α-hairpins (the four-helix bundle) resting on the ATPase domain from a cis
monomer. These two N-to-C inter-tier contacts were composed mostly of hydrophobic
residues. Both of these packing interactions may be critical for proper inter-tier commu-
nication, as we found them to have an unexpected role in helicase function and in the
functional interplay with DnaG primase (discussed below).
B 3.5 N-terminal requirement for helicase activity and primase binding
We investigated the requirement of the N-globe and α-hairpin regions of G40P for helicase
activity by a series of N-terminal truncations (Figure B.4). All the deletions (N92,
N108, N112, N129), as well as the full-length G40P, readily assembled into hexamers
in gel filtration chromatography, even in the absence of ATP (data not shown). In
helicase assays, the N92 mutant, which lacks the N-globe but has an intact α-hairpin,
retained 66% of the wild-type (WT) activity (Table 1). Other deletions (N108, N112,
N129) that lack an intact α-hairpin showed no detectable helicase activity. These
results suggest that the α-hairpin structure of the linker region is critical for helicase
activity. In contrast, the N-globe was not essential for helicase function, even though
deleting the N-globe consistently resulted in reduced helicase activity.
G40P/DnaB replicative helicases bind DnaG primase at the replication fork; this
interaction is important for coordinating DNA unwinding by the helicase with RNA
primer synthesis by the primase. We investigated the requirement of the N-terminal
domains of G40P for primase binding using the N-terminal deletions. In contrast to
the full-length G40P, all four N-terminal deletions (N92, N108, N112, N129)
were devoid of primase binding in a native gel-shift assay (Table 1), demonstrating
the importance of the N-terminal domains containing at least the first 92 residues in
DnaG binding. We next examined whether the isolated N-terminal domain of G40P
could bind primase. Two constructs containing only the N-terminal α-hairpin and N-globe
domains (N149 and N171, containing residues 1 to 147 and 1 to 171, respectively)
had no detectable primase binding in native gel shift assays (Fig. 4d and Table 1 as
Figure B.5). Both constructs behaved like monomers in gel filtration (data not shown).
To identify residues on the N-terminal tier of G40P that participate in primase binding,
we constructed three point mutations (mt1–mt3, Table 1 as Figure B.5). The locations
of the mutated residues are shown in Figure 4b,c. Like the WT G40P, these three
mutants assembled into hexamers in gel filtration chromatography (data not shown).
However, none of them showed any detectable primase binding (Fig. 4e and Table 1
as Figure B.5), suggesting a role of these mutated residues in mediating the interaction
with DnaG primase.
B 3.6 Inter-tier contacts and helicase stimulation by primase
Primase binding to DnaB N-tier stimulates ATPase and helicase activity. In order to
test the role of the inter-tier interactions in the primase-mediated helicase stimulation,
we designed two mutants to disrupt the interactions of the C-terminal helicase-tier with
either the N-globe (mt4) or the α-hairpin (mt5) of the N-tier (Fig. 4b). Both mutants
assembled into stable hexamers. However, mt5 was unable to bind DnaG (Fig. 4e
Figure B.4: G40P helicase activity and helicase-primase interactions: (a) Helicase
activity of G40P N-terminal deletion mutants, showing the importance of the N-globe
and α-hairpin for helicase function. Helicase activity of the mutants is expressed as
a percentage of the full-length (FL) activity. (b,c) Locations on G40P of the residues
investigated by mutational analysis to define the primase binding site. Residues mutated
in a given construct are represented by spheres of a given color: yellow, mt1; blue, mt2;
green, mt3; magenta, mt4; cyan, mt5 (see Table 1 for details). Gray ribbons, ATPase
domain; pink or blue ribbons, N-terminal domains of cis or trans monomers, respectively.
(d) Native gel shift assay of binding of primase to the two N-terminal fragments, N149
and N171, of G40P. (e) Native gel shift assay of binding of primase to WT and mutant
G40P. (f) Primase-mediated helicase stimulation of G40P WT and mutants, expressed
as a multiple of helicase activity in the absence of primase. A ratio of three primase
molecules to one G40P hexamer was used in the assay. Only those G40P mutants with
detectable helicase activity were tested. The mt3 mutant was used as a negative control,
as it had no detectable primase-binding. Error bars in a and f, s.e.m. from a minimum
of three independent experiments.
Figure B.5: Functional interactions of B. subtilis DnaG primase with WT and
mutant G40P helicase proteins: The WT and all mutant proteins of G40P isolated
from the hexamer peak in gel filtration were used for the primase-binding and functional
assays. The locations of G40P mutants on the hexamer structure are as follows: mt1
and mt2 on α-hairpin surface (Hp surface), mt3 at the interface between two N-globes
(Ngb-to-Ngb) for trans monomers or on the exposed surface for cis monomers, mt4 at
the interface between N-globe and C-tier (Ngb-to-C), and mt5 at the interface between
the α-hairpin and the C-tier (Hp-to-C). a Values for the ATPase and helicase activity
are expressed as percentage of those of WT G40P in the absence of DnaG primase (set
as 100%), with s.d. indicated by ±. b Values are given relative to full-length activity in
the absence of DnaG.
and Table 1 as Figure B.5), possibly because the packing of the N-terminal α-hairpins
with the helicase domain was disrupted, disturbing the structural integrity of the N-tier
that is important for primase-binding. Mt5 lost helicase activity (Table 1 as Figure B.5),
consistent with the essential role of the α-hairpin for helicase function. In contrast to mt5,
mt4 possessed wild-type ATPase and helicase activities and bound to primase (Table 1 as
Figure B.5 and Fig. 4e,f). Notably, mt4 showed reduced primase-mediated stimulation
of the ATPase and helicase activity of G40P (Fig. 4f and Table 1 as Figure B.5).
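Two normalizations are used when reporting these activities: mutant activity as a percentage of WT (or full-length) activity measured without primase, and primase-mediated stimulation as a multiple of the same protein's activity without primase. A minimal sketch of both follows; the function names and the raw activity values are hypothetical and serve only to illustrate the arithmetic.

```python
def percent_of_wt(activity, wt_activity):
    """Express an activity as a percentage of the WT activity (WT set to 100%)."""
    if wt_activity <= 0:
        raise ValueError("WT activity must be positive")
    return 100.0 * activity / wt_activity


def fold_stimulation(with_primase, without_primase):
    """Activity with primase as a multiple of the activity without primase."""
    if without_primase <= 0:
        raise ValueError("activity without primase must be positive")
    return with_primase / without_primase


# Hypothetical raw activities in arbitrary units (not data from this study).
wt_no_primase = 50.0
mutant_no_primase = 33.0
mutant_with_primase = 66.0

print(percent_of_wt(mutant_no_primase, wt_no_primase))
print(fold_stimulation(mutant_with_primase, mutant_no_primase))
```

Keeping the two measures separate matters for a mutant such as mt4, whose basal activity is wild-type-like while its stimulation by primase is reduced.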
B 4 Discussion
We determined the crystal structure of the full-length G40P, which formed one com-
plete hexamer per a.s.u. Two distinct monomer conformations, termed cis and trans
structures, assembled alternately into one hexamer that had an unusual dual symmetry:
a near–three-fold N-terminal tier and a pseudo–six-fold C-terminal tier. Guided mutagenesis
based on the G40P structure demonstrated the importance of the N-terminal
domains for helicase function; mapped the DnaG-binding sites onto the N-terminal tier,
which is composed of the N-globe and α-hairpin structures; and provided evidence to
suggest a mechanism by which primase-binding affects DnaB helicase function.
This study clearly demonstrated the importance of the N-terminal domains for heli-
case function, as deleting the N-globe reduced helicase activity, and further deletion
of a few residues into the α-hairpin region caused a complete loss of helicase function
(Fig. 4a and Table 1 as Figure B.5). As these deletion mutants all retained substantial
ATPase activity (Table 1), these deletion results indicated that changes to the structural
integrity of the three-fold N-tier affected the helicase function much more than the
ATPase activity.
The deletion studies revealed that the N terminus comprising the N-globe and hair-
pin regions was important in primase binding. However, isolated N-terminal fragments
containing the N-globe and α-hairpin, which exist as a monomeric form and not in the
three-fold N-tier conformation, did not show any detectable primase binding. This sug-
gests that primase may bind to the N-terminal domains only when they are assembled
into the three-fold N-tier, which may only occur in the context of the full-length hexamer.
Mutagenesis analysis of residues located on the surface of the N-tier (mt1 and mt2,
Fig. 4b,c) suggested a potential role for these residues in mediating primase interaction,
either directly or indirectly. This result is consistent with published mutational and
genetic studies in different organisms [70, 120, 119, 58]; the residues affecting primase
binding in these studies mapped to similar locations on the surface of the three-fold N-
tier of G40P (Supplementary Fig. 2 online). As this work was being reviewed, another
structure article describing DnaB and primase interactions was published [5]; according
to that study, one a.s.u. contains a DnaB dimer binding to one primase P16 fragment
through the two adjacent N-globes of DnaB. Based on this structural information, the
mutated residues on the α-hairpin surface of DnaB must not make direct contact with
the primase, suggesting that these residues disrupt primase binding indirectly. Alternatively,
because of the reported difference in primase binding by DnaB from different
organisms [114] and the different structures shown for primase P16 fragment from Bacillus
stearothermophilus and E. coli [84, 117], we cannot rule out the possibility that more
than one binding mode between DnaB and primase may exist.
The residues mutated in mt3 are in two different environments, depending on whether
they are on the cis or trans monomer: these residues on the cis monomer are exposed
(Fig. 4b,c), whereas these residues on the trans monomer (Fig. 4c) are at the interface
with a cis N-globe, participating in the globe-globe interaction. This mutant was origi-
nally designed to disrupt the globe-globe interactions within the three-fold N-tier to test
the role of this interaction in primase binding. The recent publication mentioned above
[5] suggests that the residues targeted in mt3 lie near the interface with primase and in
G40P may be involved in direct contact with primase.
DnaB helicase can be stimulated by primase binding. If primase binds to the G40P N-
tier, then primase-mediated stimulation of helicase activity should be channeled through
contacts between the N-terminal tier and the helicase domain. We showed that mutations
of residues making contact between the N- and C-terminal tiers rendered the helicase
insensitive to the primase-mediated stimulation of helicase activity, which provides evi-
dence that inter-tier interactions may channel primase stimulation from the N-tier to the
C-terminal helicase tier.
Structural and biochemical data indicate that one DnaB hexamer binds to three
DnaG primases [121, 12, 119, 5, 84, 117, 76]. Although primase binding can stimulate
helicase function in the absence of active priming, it is conceivable that when the pri-
mases start priming on the ssDNA generated by the helicase, the same primase binding
could also exert structural constraints on the helicase to negatively regulate helicase
function. In T7 replication, leading strand synthesis pauses upon lagging strand priming
at the replication fork [64], possibly because of similar primase-imposed negative struc-
tural constraints on the helicase. This may be a potential mechanism for coordinating
the leading and lagging strand synthesis. Thus, the primase-helicase interactions may
stimulate G40P and other DnaB-like helicases to function when the bound primases are
idle, but suppress helicase function when the attached primases start priming.
Herein, we describe the unusual architecture and assembly mechanism for a DnaB-like
helicase, G40P. Structural and functional analyses of G40P helicase and its interactions
with DnaG indicate that the N-terminal tier and its structural integrity are essential for
primase recognition and for primase-mediated stimulation of helicase function. This work
provides a basis for understanding how the helicase coordinates with other replication
proteins at the DNA replication fork.
B 5 Methods
B 5.1 Protein purification and crystallization
The cDNAs encoding SPP1 helicase G40P and B. subtilis DnaG primase were PCR
amplified and cloned into the E. coli expression vector pGEX-KG. All constructs were
confirmed by sequencing of the entire open reading frame. For purification of recombinant
proteins, E. coli cells were collected by centrifugation; the cell pellet was suspended
in 20 mM Tris-HCl buffer, pH 8.0, 0.5 M NaCl, 1 mM DTT and lysed using a Microfluidics
pressurized cell disruptor, followed by a brief sonication. After clarification by
centrifugation, the GST-fusion protein was isolated using a glutathione affinity column
at 4 °C. The G40P protein was cleaved from the GST-fusion with thrombin and purified
using Resource-Q ion exchange column, followed by passage through a Superdex-200
gel filtration in 20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 1 mM DTT. Proteins were
concentrated to approximately 10 mg ml⁻¹ for crystallization and biochemical assays.
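For comparison with assay concentrations given in molar units, a mass concentration can be converted once a molecular weight is chosen. The sketch below is a rough illustration only: it assumes a monomer of 442 residues and an average residue mass of about 110 Da (a common rule of thumb, not a value reported here), so the resulting molecular weights are estimates rather than measured masses.

```python
# Rule-of-thumb average mass per amino acid residue (assumption).
AVG_RESIDUE_MASS_DA = 110.0


def mg_per_ml_to_micromolar(mg_per_ml, mw_da):
    """Convert a mass concentration to micromolar.

    mg/ml is numerically equal to g/L; dividing by the molecular weight
    in g/mol gives mol/L, and multiplying by 1e6 gives micromolar.
    """
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    return mg_per_ml / mw_da * 1e6


# Estimated molecular weights for a 442-residue monomer and its hexamer.
monomer_mw = 442 * AVG_RESIDUE_MASS_DA
hexamer_mw = 6 * monomer_mw

print(f"monomer: {mg_per_ml_to_micromolar(10.0, monomer_mw):.0f} uM")
print(f"hexamer: {mg_per_ml_to_micromolar(10.0, hexamer_mw):.0f} uM")
```

The same conversion explains why it matters whether a concentration is stated per monomer or per hexamer: for a given mass concentration the two differ by a factor of six.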
Crystals of the two G40P constructs, full-length (1–442) and N129 (residues 130–442),
were obtained at 18 °C by the hanging-drop vapor-diffusion method. The P2₁2₁2₁
crystals of the full-length G40P were grown in solutions containing 0.1 M HEPES buffer,
pH 7.5, 1–1.25 M magnesium diacetate and 0.02–0.04% (w/v) β-octylglucoside. Dehydration,
by transferring crystals into slightly higher concentrations of mother liquor
and incubating for 3–5 d over reservoir solution supplemented with 25% (v/v) glycerol,
improved the diffraction resolution from 6 Å to 3.9 Å. The P6₁ crystal form was obtained
from N129 protein in mother liquor containing 0.1 M sodium citrate buffer, pH 5.6,
8–12% (w/v) PEG 4000, 0.2 M ammonium acetate in the presence of 1 mM ATP-γ-S.
B 5.2 Data collection and structure determination
Native, selenium-SAD, or selenium-MAD datasets were collected at synchrotron beamlines
using crystals frozen in liquid nitrogen. Diffraction data were processed with
HKL2000 [87]. The structure of N129 was determined with the program SOLVE [118]
using a selenium-MAD dataset. A solvent flattening step with RESOLVE [118] yielded
an electron density map containing regions of well-featured α-helices, which allowed
initial model building with the program O [56]. Higher-resolution maps were obtained by
combining a two-wavelength MAD dataset with a native dataset using the MIRAS phasing
scheme in SHARP [127]. Refinement to 2.35 Å with REFMAC5 [116] using the
native data led to a final model with an Rfree of 28.51% and Rwork of 23.81% (Table 2,
reproduced as Figure B.6).
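For readers less familiar with the statistics quoted above, the conventional crystallographic
R-factor is R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, where Rwork is computed over the reflections
used in refinement and Rfree over a held-out test set. The following is a minimal sketch of
that formula only; the function name and the toy amplitudes are hypothetical and are not
taken from the G40P datasets or from any refinement program.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Toy structure-factor amplitudes (hypothetical, for illustration only).
work_obs, work_calc = [100.0, 80.0, 60.0, 40.0], [90.0, 85.0, 50.0, 45.0]
free_obs, free_calc = [70.0, 30.0], [55.0, 40.0]

r_work = r_factor(work_obs, work_calc)  # reflections used in refinement
r_free = r_factor(free_obs, free_calc)  # held-out test reflections
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}, gap = {r_free - r_work:.3f}")
```

By the same arithmetic, the values reported for the ATPase domain correspond to an
Rfree minus Rwork gap of 28.51 - 23.81 = 4.70 percentage points.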
Figure B.6: Data collection, phasing and refinement statistics for full-length and the ATPase domain of G40P
To determine the full-length G40P structure, 58 selenium atoms were located using
SOLVE from a SAD dataset in the resolution range of 30–5.5 Å. Heavy atom refinement
and phasing were performed with SHARP. RESOLVE automatically identified the initial
six-fold symmetry operators for non-crystallographic symmetry (NCS) averaging, and the
resulting electron density map showed excellent main chain connectivity, which allowed
the unambiguous docking of six copies of the ATPase domain structure from the N129
construct, as well as six copies of the homologous crystal structure of the N-terminal
globular domain of E. coli DnaB (PDB 1B79), by phased translation searches [31] as
well as manual fitting using O. Subsequent two-domain (N-terminal and C-terminal
domains) six-fold NCS averaging and phase extension to 4.5 Å using DM [125] in CCP4
improved the density map and revealed the missing parts, with well-connected main
chain density throughout the molecule, including the -hairpin (Supplementary Fig. 3
online). The phases were further improved through phase combination of the anomalous
experimental phases with the hexameric model phases using SHARP, which produced a
contiguous density for the entire G40P hexamer.
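The idea behind six-fold NCS averaging can be illustrated with an idealized six-fold axis.
The sketch below is a toy only: it assumes pure rotations of 60° about z with no translation,
whereas the operators actually found by RESOLVE are not given in the text, and the helper
names and coordinates are hypothetical.

```python
import math


def rotation_about_z(angle_deg):
    """Return a 3x3 rotation matrix (nested lists) for a rotation about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(matrix, point):
    """Apply a 3x3 matrix to an (x, y, z) point."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in matrix)


def ncs_average(values):
    """Average map values sampled at symmetry-related positions."""
    return sum(values) / len(values)


# Six operators of an idealized six-fold axis along z: 0, 60, ..., 300 degrees.
operators = [rotation_about_z(60.0 * k) for k in range(6)]

# Hypothetical position in one subunit and its five symmetry mates.
atom = (10.0, 0.0, 5.0)
copies = [apply(op, atom) for op in operators]

# Hypothetical noisy density values at the six related positions.
density_at_copies = [1.2, 0.9, 1.1, 0.8, 1.0, 1.0]
print(copies[1], ncs_average(density_at_copies))
```

Averaging the density at related positions suppresses uncorrelated noise in each copy, which
is why averaging over six copies can improve a low-resolution map.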
MOLREP [134] placed the hexameric G40P model into the 3.90 Å native data for
torsional simulated annealing and minimization refinement using CNS [79]. At this
point, the electron density maps allowed us to build all the missing side chains, and the
58 selenium sites, along with well-featured side chain density, were helpful for checking
the registry of the polypeptide (Supplementary Figs. 3 and 4 online). NCS restraints
were applied throughout the refinement process in CNS [79] as well as in TLS [17]
refinement with REFMAC5. Four different NCS groups were used: group one, the six
N-terminal domains (six-fold); group two, the six C-terminal domains (six-fold); group
three, the three cis -hairpins (three-fold); and group four, the three trans -hairpins
(three-fold). Geometry-restrained refinement yielded Rfree and Rwork of 34.3% and
33.9%, respectively (Table 2, reproduced as Figure B.6). The final model was validated by
comparing it with the experimental map and by calculating simulated-annealing omit maps [79].
The maps calculated with data sharpened by applying a B-factor of -90 produced well-featured
side chain electron density.
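The effect of sharpening with a negative B-factor can be estimated from the standard
temperature-factor expression, in which an amplitude at resolution d is scaled by
exp(-B / (4 d²)). The sketch below assumes the B-factor is in Å² (the text gives no unit)
and uses a hypothetical function name; it is not the procedure of any specific program.

```python
import math


def b_factor_scale(d_spacing, b_factor):
    """Amplitude scale exp(-B / (4 d^2)) at resolution d (d in Å, B assumed in Å^2)."""
    if d_spacing <= 0:
        raise ValueError("d_spacing must be positive")
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


# With B = -90, higher-resolution terms (smaller d) are boosted more strongly.
for d in (10.0, 6.0, 3.9):
    print(f"d = {d:4.1f} Å  scale = {b_factor_scale(d, -90.0):.2f}")
```

Because the scale grows as d decreases, the weak high-resolution terms are up-weighted
relative to the low-resolution ones, which is what makes side chain features more visible.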
B 5.3 Helicase assay
The substrate for the helicase assay was prepared by annealing a ³²P-labeled ssDNA
(a 60-base oligonucleotide) to circular M13mp18 ssDNA. This oligonucleotide has 35
nucleotides annealed to the M13 DNA, leaving a 25-nucleotide 5' overhang. The substrate
DNA was incubated with various amounts of different G40P mutant proteins in the
presence or absence of primase at 37 °C for 30 min in a buffer containing 20 mM Tris-HCl,
pH 7.5, 5 mM ATP, 10 mM MgCl2, 1 mM DTT and 50 mM NaCl. The reaction
was terminated by adding a stop solution containing 100 mM EDTA, 0.5% (w/v) SDS
and 50% (v/v) glycerol. Samples were analyzed on a 12% native polyacrylamide gel in 1
M Tris-borate/EDTA running buffer. The unwinding of the substrate DNA was detected
by autoradiography.
B 5.4 ATPase assay
We assembled on ice 15-µl reactions containing 20 mM Tris-HCl, pH 7.5, 10 mM MgCl2,
1 mM DTT, 0.1 mg ml⁻¹ BSA, 1 µCi [-³²P]ATP (Amersham, 3,000 Ci mmol⁻¹), 100
µM cold ATP, and various amounts of G40P or G40P with varying amounts of primase.
Reactions were incubated at 37 °C for 30 min and were stopped by adding 10 mM EDTA
and placing on ice. We placed 5 µl from each reaction onto a prewashed PEI-cellulose
TLC plate (SelectoScientific), dried the plate, and ran it for 2 h in 2 M acetic acid and
0.5 M LiCl. Plates were then dried, autoradiographed using phosphorimaging plates and
quantified.
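As a worked example of how such a reaction might be quantified, the sketch below converts
the fraction of radiolabel found in the product spot into picomoles of ATP hydrolyzed. It
assumes the reaction volume is in microliters, the cold ATP in micromolar and the label in
microcuries; the spot intensities and function names are hypothetical, and the text does
not state how the plates were actually quantified.

```python
def total_atp_pmol(volume_ul, cold_atp_um, hot_uci=0.0, specific_activity_ci_per_mmol=None):
    """Total ATP in the reaction in pmol (cold ATP plus the radiolabeled tracer)."""
    cold = volume_ul * cold_atp_um  # µl x µM = pmol
    hot = 0.0
    if hot_uci and specific_activity_ci_per_mmol:
        # µCi / (Ci per mmol) gives 1e-6 mmol, i.e. 1000 pmol per unit of the ratio.
        hot = hot_uci / specific_activity_ci_per_mmol * 1000.0
    return cold + hot


def atp_hydrolyzed_pmol(product_signal, substrate_signal, total_pmol):
    """ATP hydrolyzed, from the fraction of total signal in the product spot."""
    total_signal = product_signal + substrate_signal
    if total_signal <= 0:
        raise ValueError("total signal must be positive")
    return product_signal / total_signal * total_pmol


total = total_atp_pmol(15.0, 100.0, hot_uci=1.0, specific_activity_ci_per_mmol=3000.0)
# Hypothetical phosphorimager counts for the product and remaining-ATP spots.
hydrolyzed = atp_hydrolyzed_pmol(2500.0, 7500.0, total)
print(f"total ATP = {total:.1f} pmol, hydrolyzed = {hydrolyzed:.1f} pmol, "
      f"rate = {hydrolyzed / 30.0:.2f} pmol/min over 30 min")
```

Under these assumptions the cold ATP contributes 1,500 pmol and the tracer about 0.3 pmol,
so the tracer has a negligible effect on the total substrate concentration.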
B 5.5 Native gel shift assay
We examined interactions between G40P and B. subtilis DnaG primase using a native
gel shift assay. We mixed 10 µg of various G40P constructs with 10 µg primase in a buffer
containing 25 mM Tris-HCl, pH 8.0, 50 mM NaCl, 5 mM MgCl2 and 1 mM ATP-γ-S and
incubated the mixtures for 30 min on ice. They were then analyzed by native 6% PAGE
at 150 V for 1 h at 4 °C. The gel was stained with Coomassie blue for detection.
B 5.6 Concluding Remarks
We thank the staff at Lawrence Berkeley Laboratory's Advanced Light Source beamlines
8.2.1, 8.2.2, 4.2.2 and Argonne National Laboratory's Advanced Photon Source 19ID for
assistance in data collection.
Protein Data Bank: Coordinates have been deposited with accession codes 3BGW
and 3BH0 for the full-length G40P hexamer and the ATPase domain monomer,
respectively.
Note: Supplementary information is available on the Nature Structural & Molecular
Biology website.
Abstract
The Apolipoprotein B editing enzyme catalytic polypeptide-like (APOBEC) family of 11 proteins deaminates cytidines on either single-stranded DNA (ssDNA) or RNA substrates, introducing C to U mutations [28]. These mutagenic proteins are vital for a variety of biological purposes, ranging from proper metabolism to development of strict and efficient antibodies to prevention of virus infection [28]. This activity is controlled by a signature domain motif His-Xaa-Glu-Xaa₂₃₋₂₈-Pro-Cys-Xaa₂₋₄-Cys, where the histidine and two cysteine residues coordinate the zinc (Zn) required for catalysis [24, 28, 50]. A duplication event occurred during evolution that produced four APOBECs with two of these domains [29, 54]. One such member is APOBEC3G (A3G). In A3G, it has been shown that only the second domain (CD2) is catalytically active, while the other is responsible for RNA binding and various protein interactions [24].
Asset Metadata
Creator: Holden, Lauren Georgianna (author)
Core Title: A structure based study of the HIV restriction factor APOBEC3G
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Molecular Biology
Publication Date: 03/01/2011
Defense Date: 01/28/2011
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: anti-HIV, cytidine deaminase, OAI-PMH Harvest, X-ray crystallography
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Chen, Xiaojiang S. (committee chair), Chen, Lin (committee member), Goodman, Myron F. (committee member), Warshel, Arieh (committee member)
Creator Email: ciaobella8682@gmail.com, lholden@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m3673
Unique identifier: UC1118907
Identifier: etd-Holden-4343 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-416237 (legacy record id), usctheses-m3673 (legacy record id)
Legacy Identifier: etd-Holden-4343.pdf
Dmrecord: 416237
Document Type: Dissertation
Rights: Holden, Lauren Georgianna
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu