STRUCTURAL STUDIES OF TWO KEY FACTORS FOR DNA REPLICATION IN  

EUKARYOTIC CELLS



by



Yu-Hao Paul Chang








A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirement for the Degree
DOCTOR OF PHILOSOPHY
(GENETIC, MOLECULAR AND CELLULAR BIOLOGY)





August 2010





Copyright 2010 Yu-Hao Paul Chang

Table of Contents

List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Crystal Structure of the GINS Complex and Functional Insights into Its Role in DNA Replication
  2.1 Results
  2.2 Materials and Methods
Chapter 3: The Structural Insights of SV40 Large T Antigen Assembly at Replication Origin and Mechanism of Origin Melting
  3.1 Results
  3.2 Materials and Methods
References
Appendix A: Crystal Structure of the Anti-viral APOBEC3G Catalytic Domain and Functional Implications
  A.1 Abstract
  A.2 Results
  A.3 Materials and Methods

List of Tables

Table 2.1: Data collection (MAD) and model refinement statistics of GINS complex
Table 3.1: Data collection and model refinement statistics of LTag-EP origin structure
Table A.1: Data collection (MAD) and model refinement statistics of Apobec3G

List of Figures

Figure 1.1: SV40 LTag structures and DNA sequence of core origin of replication.
Figure 2.1: The overall structural features of human GINS complex.
Figure 2.2: The structural folds of individual subunits.
Figure 2.3: Sequence alignment and secondary structure assignment of Sld5/Psf1 (panel a) and Psf2/Psf3 (panel b).
Figure 2.4: The ring structure of the GINS complex and co-localization of disordered regions on the surface.
Figure 2.5: The structure of the central pore of GINS and its accessibility.
Figure 2.6: Role of Psf3 N-terminal loop in GINS complex assembly.
Figure 2.7: Distribution of ten Arg and Lys residues on the surface of GINS complex central pore.
Figure 2.8: The mapping of the yeast temperature sensitive mutants onto the equivalent sites present in human GINS structure.
Figure 2.9: A model for the GINS complex coordinating MCM, Cdc45, Pol ε and Pol α-primase complex at the replication fork.
Figure 3.1: SV40 LTag domain structure and DNA sequence of core origin of replication.
Figure 3.2: Overall structure of the LTag dimer bound to EP-ori.
Figure 3.3: Secondary structure of LTag 131-627.
Figure 3.4: Helicase domain and interface between helicase domains.
Figure 3.5: Interactions between LTag and EP-ori DNA.
Figure 3.6: A model of LTag at the replication origin.
Figure A.1: The X-ray structure of enzymatically active APOBEC3G-CD2.
Figure A.2: Structural comparison of the Apo3G-CD2 X-ray structure with the Apo3G-2K3A NMR structure.
Figure A.3: Common and distinct structural features between APOBEC proteins and other Zn-deaminase superfamily enzymes.
Figure A.4: Structural comparison of APOBEC3G-CD2 with APOBEC2.
Figure A.5: Predicted substrate groove and deamination activity of APOBEC3G mutants.
Figure A.6: Comparison of the horizontal substrate groove of the X-ray structure (a), (b), (c) with the brim-model based on the NMR structure (d), (e), (f).


Abstract
Genomic DNA replication is essential for the transmission of genetic information; during
this process, minichromosome maintenance (MCM) complex, the cellular replicative
helicase, unwinds duplex DNA to enable DNA synthesis by polymerases. Eukaryotic
DNA replication is a tightly regulated process and the recruitment of MCM to the
replication origin and its activation require the participation of the GINS complex and
more than ten additional DNA replication factors. My thesis project focuses on
structural studies of the GINS complex and of an MCM functional homolog, the Simian
virus 40 (SV40) large T antigen (LTag) helicase.

The crystal structure of the full-length human GINS hetero-tetramer was determined in
order to further understand the functional role of GINS. Each of the four subunits has a
major domain composed of an α-helical bundle-like structure. With the exception of Psf1,
each subunit also has a small domain containing a three-stranded β-sheet core. Each
full-length protein in the crystal has unstructured regions that are all located on the
surface of GINS and are probably involved in its interaction with other replication factors.
The four subunits contact each other mainly through α-helices to form a ring-like
tetramer with a central pore. This pore is partially plugged by a 16-residue peptide from
the Psf3 N-terminus, which is unique to some eukaryotic Psf3 proteins and is not required
for tetramer formation. Removal of these N-terminal 16 residues of Psf3 from the GINS
tetramer increases the opening of the pore by 80%, suggesting a mechanism by which
accessibility to the pore may be regulated. The structural data presented here indicate that
the GINS tetramer is a highly stable complex with multiple flexible surface regions.

For the helicase part, I attempted to understand how LTag helicase assembles on the
origin, initiates origin melting and DNA unwinding by determining the structure of a
dimeric LTag bound to the EP half of the SV40 viral replication origin. This is the first
LTag structure with both the origin binding domain (OBD) and the helicase domain
visible and shows that the linker region between OBD and helicase domain is very
flexible. The structural information reveals how a dimeric LTag recognizes and
assembles on the replication origin: LTag is brought to the viral DNA by OBD
recognizing its minimal binding sequence 5’-GC-3’. Once an LTag monomer encounters
its canonical penta-nucleotide recognition sequence, it is locked onto the replication
origin. The subsequent recruitment of the second LTag molecule to the same half of the
origin is mainly mediated by protein-protein interactions between the two LTag subunits.
This dimeric structure bound to the origin may also suggest a new mechanism of origin
melting: DNA bending induced by LTag hexamer assembly at the replication origin at an
angle to the global directionality of DNA creates torsional stress that is released by origin
melting. The LTag Zn domain forms part of the LTag hexameric central channel and thus
may play an important role in DNA bending and origin melting. It is also important for
anchoring the LTag hexamer to the replication origin by encircling dsDNA. The models
proposed here for LTag assembly at the replication origin and origin melting may have
general implications for eukaryotic counterparts.

Chapter 1. Introduction
Eukaryotic DNA replication
Eukaryotic DNA replication is controlled by a series of ordered and regulated steps (4,24,55)
that commence with the binding of the six-subunit origin recognition complex (ORC) to
replication origins. During the G1 phase of the cell cycle, Cdc6 and Cdt1 are recruited to
the origin, and together with ORC, support the loading of the hetero-hexameric MCM2-7
complex (minichromosome maintenance, MCM) to form the pre-replication complex.
Although a substantial amount of data suggest that MCM acts as the replicative helicase,
MCM present in the pre-RC is devoid of helicase activity (summarized in 35). At the
G1/S transition of the cell cycle, it appears that the MCM helicase activity is activated by
a complex and as yet poorly understood series of modifications that require the actions of
two protein kinases, DDK (Cdc7-Dbf4) and CDK (cyclin-dependent kinase), as well as the
participation of at least eight additional factors, including Mcm10, Cdc45, Dpb11,
GINS, Sld2 and Sld3 (35). Unwinding of duplex DNA by the activated MCM allows the
progression of replication forks and enables DNA polymerases (Pol) ε and δ to continue
synthesis on the RNA primer that is initiated by the DNA polymerase α-primase
(Pol α-primase) complex (35).

An important MCM helicase activator - GINS
The GINS complex is an important activator of MCM helicase. DNA unwinding was
observed concomitant with the loading of GINS and Cdc45 at origins (26,55). A complex
containing near stoichiometric levels of MCM, Cdc45 and GINS, named the unwindosome, was
isolated from Drosophila and was shown to possess DNA helicase activity (43). Studies
with Xenopus extracts revealed that a complex which included MCM, Cdc45 and GINS
was found at sites at which replication forks were halted artificially by a
streptavidin-biotin complex (45). In Saccharomyces cerevisiae, GINS was shown to play a
critical role in supporting interactions between MCM and Cdc45, as well as a number of key
regulatory protein constituents that together formed a large complex, called the replisome
progression complex, that migrated with the replication fork (26). Upon selective
degradation of the Psf2 subunit of GINS, replication was halted and Cdc45 was no longer
associated with MCM. These findings suggest that interactions between MCM and other
key replication factors might be mediated by GINS. Collectively, they indicate that GINS
is an essential component of the replicative machinery that moves with the replication
fork.

More specifically, GINS is a heterotetrameric complex consisting of Sld5 (synthetic
lethal with dpb11 mutant-5), Psf1 (partner of Sld5-1), Psf2 and Psf3. It was first
discovered using a variety of genetic screens in S. cerevisiae (56). The four GINS subunits
are paralogs among which the specific subunit pairs Psf1-Sld5 and Psf2-Psf3 are more
closely related (39). Each of the subunits is relatively small (approximately 200 amino acids)
and highly conserved in all eukaryotes. In archaea, only two homologues have been
identified, Gins15 and Gins23, which appear to interact and form a dimer of heterodimers,
suggesting that, like their eukaryotic counterparts, they function as a tetramer (40). Direct
interactions between the archaeal GINS complex and the archaeal MCM as well as
primase have been reported (40). Other reports have suggested that GINS may serve as an
accessory factor for eukaryotic DNA polymerases, including DNA Pol ε (50) and the DNA
Pol α-primase complex (17).

Despite the essential roles of GINS in DNA replication, how the four subunits of GINS
interact with one another and how GINS interacts with MCM, Cdc45, and other protein
factors at the replication fork remain unclear. To better understand the
structural/functional roles of GINS in DNA replication, the crystal structure of human GINS
complex was determined. The GINS structure and insights obtained from this structure
are presented in Chapter 2.

A model system of eukaryotic DNA replication - Simian virus 40 large T antigen  
In order to simplify the elucidation of the complicated process of eukaryotic DNA
replication, Simian virus 40 (SV40) large T antigen (LTag) is often used as a model
system (22). SV40 transforms eukaryotic cells and LTag is essential for viral DNA
replication. SV40 replication uses many essential cellular replication proteins (e.g.,
primase, polymerase). However, LTag alone fulfills the functions of multiple initiator
proteins: origin recognition by ORC, origin melting, and replication fork unwinding by
MCM (52); i.e., LTag is an integrated initiator and replicative helicase for SV40 DNA
replication.

Figure 1.1 SV40 LTag structures and DNA sequence of core origin of replication. (a)
The linear representation of the LTag domain structure. (b) Structures of OBD not bound
to DNA (white, PDB ID 2ITJ) and OBD bound to PEN sequence (black, PDB ID 2ITL)
aligned. The loop region between residues 150 and 151 (boxed in blue) extends away
from the core of OBD in order to contact DNA (not shown). (c) Structure of the helicase
domain in the nucleotide-free state (PDB ID 1SVO) showing the N-terminal Zn domain
and the C-terminal AAA+ domain. (d) Structure of the hexameric helicase domain in the
nucleotide-free state showing the N-terminal Zn tier required for hexamer formation and
C-terminal AAA+ tier (side view). (e) Structures of LTag in nucleotide-free, ADP-bound
(PDB ID 1SVL) and ATP-bound states (PDB ID 1SVM) aligned showing the upward
movement of the β-hairpin in the central channel upon ATP binding (cross-section view).
(f) A model of double hexamer of LTag helicase domain at the SV40 origin (Zn tier in
white and AAA+ tier in grey). The LTag helicase domain binds to the AT rich and EP
regions. The exposed DNA region corresponds to the four PEN sequences. (g) Structures
of the hexameric helicase domains in nucleotide-free, ADP-bound and ATP-bound states
(top view). The central channel widens as ATP is hydrolyzed and ADP is released. (h)
The replication origin sequence of SV40. Each of the four GAGGC sequences/PENs (in
red) is recognized by one OBD, and the four PENs are flanked by the EP and the AT-rich
regions.

SV40 LTag has three major domains (Fig. 1.1a): the Dna-J homology domain (JD), an
origin binding domain (OBD), and a helicase domain (36). The helicase domain is a
member of the superfamily III helicases (53) and can be divided into the Zn domain and AAA+
domain (Fig. 1.1c) (36). The helicase domain can assemble into a hexamer and the
structures of this hexamer in the nucleotide-free, ADP-bound and ATP-bound states have
been determined (Fig. 1.1d, e, g) (25,36). The LTag hexamer is a two-tiered structure
consisting of the N-terminal Zn tier required for hexamerization and the C-terminal
AAA+ tier (Fig. 1.1d) (36). The orientation of the AAA+ domain relative to the Zn domain
undergoes significant changes upon ATP binding, hydrolysis and release (25), while the
orientation of the Zn domains relative to each other remains the same (Fig. 1.1e) (25). This
change in the relative orientation of the AAA+ domain generates the iris-like motion of
the hexamer (Fig. 1.1g) and a large longitudinal shift of the β-hairpin fingers along the
central channel (Fig. 1.1e) (25). These conformational changes have been proposed to be
coupled with DNA unwinding and translocation by LTag helicase (25).

The SV40 core origin DNA consists of 64 base pairs (bps) (Fig. 1.1h) that can be divided
into two halves. Each half contains two penta-nucleotides (GAGGC) (PENs) and either a
stretch of AT-rich sequence (AT) or early palindrome (EP) region (18), and each PEN is
recognized by one LTag OBD (Fig. 1.1b) (5,19). Each half of the origin DNA supports the
assembly of one LTag hexamer, and the full origin supports the formation of two
hexamers arranged in an N-to-N configuration (Fig. 1.1f) (58). The assembly of LTag
hexamer or double hexamer at the replication origin is coupled with origin DNA melting
and unwinding (31,33,41,54,58). More specifically, the AT and EP regions of the origin are
melted (6,7) and the melted DNA region is thought to be within the helicase domain of
LTag (25,36,51).
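
The origin layout described above (GAGGC penta-nucleotides distributed over the two halves of the core origin) can be illustrated with a small motif search. This is only a minimal sketch: the function names are hypothetical, the search over both strands is a general convenience, and the toy sequence is invented for illustration and is not the SV40 origin sequence.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_pens(seq: str, motif: str = "GAGGC"):
    """List (position, strand) for every occurrence of the motif.

    "+" means the motif is read on the given strand; "-" means its
    reverse complement is found, i.e. the motif lies on the other strand.
    Positions are 0-based and refer to the given strand.
    """
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


# Hypothetical toy sequence (NOT the SV40 origin) with one motif per strand.
TOY_SEQUENCE = "ATGAGGCTTGCCTCAA"
print(find_pens(TOY_SEQUENCE))  # [(2, '+'), (9, '-')]
```

Applied to a real origin sequence, the same search would simply report where each PEN sits and on which strand, which is the information needed to assign one OBD per PEN.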

Despite advances in characterizing LTag helicase domain and the elucidation of LTag
OBD interactions with the PEN recognition sequence (5,42), high-resolution information on
the interactions between LTag and the core origin, as well as the interactions of the three
domains of LTag with one another, is still lacking. Consequently, the mechanisms of
LTag assembly at the origin and the subsequent DNA melting and unwinding remain
poorly understood. To better understand the assembly process of LTag hexamer/double
hexamer from individual subunits at the replication origin and the subsequent melting
and unwinding of DNA, we determined the first crystal structure of dimeric LTag in
complex with replication origin. This structure and its implications are discussed in
Chapter 3.



Chapter 2. Crystal Structure of the GINS Complex and Functional Insights into Its
Role in DNA Replication

Reproduced with permission from Chang, Y.P., Wang, G., Bermudez, V., Hurwitz, J.,
Chen, X.S. 2007. Crystal structure of the GINS complex and functional insights into its
role in DNA replication. Proceedings of the National Academy of Sciences 104(31):
12685-12690. Copyright 2007 National Academy of Sciences.
Author contributions: YPC and VB cloned the constructs. YPC purified the protein,
crystallized the complex, and collected and processed diffraction data. YPC and GW
determined the structure.  JH and XSC supervised the project.

2.1 Results
Overall structural features of the GINS complex
To understand the structural/functional roles of GINS in replication, we crystallized the
human GINS complex and determined its crystal structure. The four full-length subunits
of GINS were co-expressed in Escherichia coli and the complex was purified to
homogeneity. The isolated complex had an apparent molecular weight of ~90 kDa as
estimated from gel filtration chromatography (Fig. 2.1a) and glycerol gradient
sedimentation (data not presented), consistent with a 1:1:1:1 molar ratio of the four
different proteins in the tetrameric complex (Fig. 2.1b). The GINS complex was
crystallized as described in 2.2 Materials and Methods, and SDS-PAGE and mass
spectrometric analyses confirmed that all four proteins present in the crystals were
full-length. The X-ray structure of the GINS complex was determined and refined to 2.36 Å
resolution (statistics presented in Table 2.1). Each asymmetric unit of the crystal
contained two GINS hetero-tetramers with identical conformations. The overall
morphology of the GINS tetramer complex resembles a slightly elongated spindle (Fig.
2.1c-f) with a visible central hole (Fig. 2.1c, d). The body of the tetramer is composed of
α-helices with few peripheral short β-strands.
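
The agreement between the ~90 kDa apparent molecular weight and a 1:1:1:1 tetramer can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration only: it assumes an average residue mass of about 110 Da (a common rule of thumb, not a value from this work) and uses the approximate subunit length of 200 amino acids quoted in Chapter 1; the constant and function names are hypothetical.

```python
# Assumption: rule-of-thumb average mass of one amino acid residue, in Da.
AVG_RESIDUE_MASS_DA = 110.0
# Approximate subunit length quoted in Chapter 1 (about 200 amino acids).
APPROX_RESIDUES_PER_SUBUNIT = 200
SUBUNITS = ("Sld5", "Psf1", "Psf2", "Psf3")


def estimated_complex_mass_kda(copies_per_subunit: int = 1) -> float:
    """Rough mass of a GINS complex with the given number of copies of each subunit."""
    total_residues = APPROX_RESIDUES_PER_SUBUNIT * len(SUBUNITS) * copies_per_subunit
    return total_residues * AVG_RESIDUE_MASS_DA / 1000.0


print(estimated_complex_mass_kda(1))  # 88.0, close to the ~90 kDa observed
print(estimated_complex_mass_kda(2))  # 176.0, far from the observed value
```

Under these assumptions, one copy of each subunit gives roughly 88 kDa, whereas two copies of each would give roughly twice the observed value.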

Figure 2.1 The overall structural features of human GINS complex. (a) Superdex-200 gel
filtration profile of the crystallized human GINS complex. (b) The SDS-PAGE analysis
of the peak fraction from panel a showing that all four subunits are present as full-length
proteins. (c), (d) Two views of the wider faces of the GINS complex structure, showing
the spindle-shaped structure and part of the central pore opening. The tetramer is
composed of predominantly α-helices that are arranged in α-helical bundles. (e), (f) Two
views of the narrower sides of the GINS complex. The four subunits are shown in
different colors as indicated.



Table 2.1 Data collection (MAD) and model refinement statistics of GINS complex.

                                  Peak Se (λ = 0.97932 Å)    Inflection Se (λ = 0.97945 Å)
Data collection
Cell dimensions (space group P1)
  a, b, c (Å)                     58.19, 88.81, 103.43       58.27, 89.10, 103.80
  α, β, γ (˚)                     104.95, 103.71, 95.04      105.02, 103.58, 95.07
Resolution (Å)                    50–2.36 (2.44–2.36)        50–2.75 (2.85–2.75)
Observations                      301933                     191159
Rmerge                            12.0 (55.6)                11.8 (57.2)
I/σI                              14.5 (2.7)                 14.6 (2.7)
Completeness (%)                  98.6 (97.7)                98.9 (98.4)
Refinement
Resolution (Å)                    30.0–2.36
No. reflections                   143641
Rwork / Rfree                     21.41 / 25.03
B-factor (averaged)               Protein 35.07; Water 31.16
R.m.s. deviations                 Bond lengths (Å) 0.0070; Bond angles (˚) 1.14

Highest-resolution shell values are shown in parentheses.

Structures of individual GINS subunits
Each of the four subunits contains a major domain composed of α-helices (α-domain, Fig.
2.2a-d). The folds of the α-domains of all four subunits are similar; each contains four to
five helices arranged more or less in a parallel fashion to form a partial three-helix bundle
structure. In three of the four subunits (Sld5, Psf2, and Psf3), there is a small β-sheet
composed of three anti-parallel β-strands (β1, β2, and β3) near one end of the α-domain.
Around the β-sheet are two helices and a β-hairpin or loop, forming a small but definable
β-domain in these three subunits.  

Psf1 has only an α-domain (residues 1-145) while its entire C-terminal 51 residues are
disordered. There is substantial space to accommodate the C-terminal 51 residues of Psf1
in at least one of the two GINS tetrameric complexes in the asymmetric unit. Nonetheless,
no definable electron density can be seen for the C-terminal 51 residues of Psf1 (residues
146-196), despite the fact that sequence alignment suggests a fold similar to the β-domain
that is present in the other three subunits (Fig. 2.3a). The structure reveals that the C-
terminal 51 residues of Psf1 are not folded in the hetero-tetrameric GINS complex and
thus not anchored to the GINS surface. These findings suggest that these Psf1 residues
could play a role in binding other protein partners.

Figure 2.2 The structural folds of individual subunits. (a), (b), (c), (d) The detailed
structures of each of the four subunits. N and C indicate the position of the N- and C-
termini in each subunit, while dashed lines indicate the location of disordered fragments.
The α-domain (α-D) and β-domain (β-D) for each subunit are indicated. Labels for the
secondary structures in the α-domains start with α and labels for the β-domains start with
β.





Figure 2.3 Sequence alignment and secondary structure assignment of Sld5/Psf1 (panel a) and Psf2/Psf3 (panel b).
Arrows represent β-strands, bars denote α-helices, and dotted lines represent disordered regions. The colored residues
indicated in the sequences of the subunit correspond to those identified in the yeast GINS subunits as temperature-sensitive
mutants.

Even though the GINS subunits have similar α- and β-domains, the relative arrangements
of the two domains differ among the subunits, as predicted previously (39). In Sld5 and
Psf1, the larger N-terminal portion forms the α-domain, and the smaller C-terminal
fragment corresponds to the β-domain (Fig. 2.3a). In contrast, the order of the α- and
β-domains of Psf2 and Psf3 is reversed (i.e., the β-domain is at the N-terminus and the
α-domain is at the C-terminus, Fig. 2.3b). In addition, the spacer between the α- and
β-domains of Psf2/Psf3 is only six residues, but is 21 residues in Sld5, and possibly about
the same length in Psf1 based on sequence alignment (Fig. 2.3a, b). Despite the
differences in spacer length, the β-domains present in Sld5, Psf2, and Psf3 appear
anchored to their respective α-domains through direct contacts (Fig. 2.2a, c, d).

Tetramer formation
In the tetramer structure, each of the four subunits interacts with two other molecules to
form a ring-like structure with a central hole (Fig. 2.4a). The four subunits are arranged
around the ring in the order: Sld5, Psf1, Psf3, Psf2 such that Psf2 contacts Sld5 to
complete the circle. Within the pore created by this ring, there are no direct bonding
contacts between Sld5 and Psf3 or Psf1 and Psf2. The inter-subunit interactions are
mainly through the sides of the α-helical bundles to generate extensive contacts between
subunits with buried interface areas between two neighboring subunits ranging from
2,900 to 4,000 Å². These large interface areas presumably provide strong bonding forces
at the subunit interfaces, which explains why the tetrameric complex is stable in solution.
Major bonding forces are provided through helix-helix interactions between adjacent
α-domains of different subunits (Sld5-Psf1, Psf1-Psf3, Psf3-Psf2, Psf2-Sld5), mediated
mostly by hydrophobic residues. However, the β-domains at the interface between Psf2 and
Sld5 also play a role in stabilizing the tetramer. The β2 of the Sld5 β-domain interacts
with a small β-strand (αs4) from the Psf2 α-domain, expanding the Sld5 β-domain to a
four-stranded β-sheet (Fig. 2.4b). Additionally, a β-hairpin from the Psf2 β-domain
contacts an α-helix of the Sld5 α-domain (Fig. 2.4c).

Figure 2.4 The ring structure of the GINS complex and co-localization of disordered
regions on the surface. (a) The ring structure of the GINS complex can be visualized
along the long axis (the axis of the α-helical bundles) of the tetramer. From this view, it is
clear that most of the inter-subunit interactions are mediated through α-helices. The
central opening is also apparent. (b), (c) These views show interactions between Sld5 and
Psf2 which involve β-strands at two locations on inter-subunit interfaces. (d) Co-localization
of the disordered regions on the tetramer surface. Each sphere indicates the
location of the residue immediately adjacent to the disordered fragment. The subunit (and
sphere) color scheme is the same as that used in panel a. The disordered fragments of the
subunits co-localize on one face of the tetramer, but at the opposite ends. These
disordered regions may become structured when bound to interacting partners.
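
The ring arrangement described in this section can be written down as a small data structure to make the contact pattern explicit. The following is a minimal sketch with hypothetical function names; it encodes only the subunit order stated in the text and derives which pairs are neighbors and which sit across the pore.

```python
# Subunit order around the ring, as stated in the text.
RING_ORDER = ["Sld5", "Psf1", "Psf3", "Psf2"]


def neighbor_pairs(ring):
    """Pairs of subunits that are adjacent in the closed ring."""
    n = len(ring)
    return [(ring[i], ring[(i + 1) % n]) for i in range(n)]


def non_neighbor_pairs(ring):
    """Pairs of subunits that are not adjacent, i.e. face each other across the pore."""
    adjacent = {frozenset(pair) for pair in neighbor_pairs(ring)}
    pairs = []
    for i in range(len(ring)):
        for j in range(i + 1, len(ring)):
            if frozenset((ring[i], ring[j])) not in adjacent:
                pairs.append((ring[i], ring[j]))
    return pairs


print(neighbor_pairs(RING_ORDER))
# [('Sld5', 'Psf1'), ('Psf1', 'Psf3'), ('Psf3', 'Psf2'), ('Psf2', 'Sld5')]
print(non_neighbor_pairs(RING_ORDER))
# [('Sld5', 'Psf3'), ('Psf1', 'Psf2')]
```

The four neighbor pairs correspond to the four inter-subunit interfaces, and the two remaining pairs are the ones reported to make no direct bonding contacts within the pore.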

Possible roles of the unstructured regions of Sld5, Psf1, and Psf3
Although our crystal structure of the GINS complex included full-length proteins of all
four subunits, portions of Sld5, Psf1, and Psf3 have no visible electron density (Fig. 2.2a-d,
2.3a, b, represented by dashed lines). The absence of these regions from the electron
density maps suggests that they are highly flexible and unstructured in the GINS complex.
The last visible C-terminal residue of Psf1 (S145) prior to the disordered C-terminal domain
is adjacent to the disordered fragment of Sld5 (residues 65-71) and to the disordered
C-terminal residues of Psf3 (residues 194-216); these disordered regions are shown as spheres in Fig.
2.4d. The co-localization of these disordered parts of three different subunits on the GINS
surface suggests that this site may bind partner proteins in the replication complex. On
the same side, but located at the other end of the tetramer, is the disordered region within
the β-domain of Psf3 (Fig. 2.4d) which may also serve as a protein binding site. These
disordered regions located on the surface of GINS may become structured upon binding
to proteins known to interact with GINS, like MCM, Cdc45, and DNA polymerases.

Most of these are large proteins or complexes and the locations of these disordered
regions on opposite ends of the tetramer may allow binding of more than one of these
factors at the same time. In keeping with this idea, previous studies in Xenopus reported
that the chromatin loading of Cdc45 and that of GINS were mutually dependent (34).
Furthermore, Kamada et al. (32) observed that the human GINS complex containing the Psf1
subunit devoid of the flexible C-terminal region (labeled Psf1 C in Fig. 2.4d) failed to bind
to the Xenopus pre-RC as well as to load Cdc45. These findings support the notion that the
flexible regions present on the GINS surface are important for its binding to replicative
proteins.

Accessibility of the central pore  
When visualized by negative stain electron microscopy (EM), the recombinant Xenopus
GINS complex had a ring-like structure with a central hole (34). Examination of our GINS
structure reveals that it does have a ring-like structure when viewed along the central pore,
as shown in Fig. 2.4a. Based on EM measurements, the ring diameter was reported to be
~9.5 nm (95 Å) with the central pore ~4 nm (40 Å) in diameter (34); in contrast, the
diameter of the ring in our crystal structure is at maximum ~78 Å (measured from edge to
edge in the large dimension) and the central pore is ~10 Å in diameter (Fig. 2.5a, b). These
quantitative differences in dimensions could be due to flattening (deformation) and
dehydration effects intrinsic to the negative stain method used to prepare EM samples.


Figure 2.5 The structure of the central pore of GINS and its accessibility. (a) A view of
the central pore of the GINS tetramer. The loop colored in red inside the pore is formed
by the N-terminal 16 residues of Psf3. The loop is not tightly bonded to the pore surface.  
(b) The surface representation of the view in panel a. (c) The same view as in panel a but
without the N-terminal residues of Psf3. (d) The surface representation of the view shown
in panel c.



Detailed examination of the central pore present in our GINS crystal structure reveals that
a 16-residue peptide loop from the N-terminus of Psf3 appears to fit loosely into the pore,
effectively restricting the opening (Fig. 2.5a, b, in red). Interactions of this peptide loop
with the surface of the pore are limited, suggesting that the peptide may enter and leave
the pore without a significant energy barrier. Interestingly, sequence alignment of this 16-
amino acid sequence at the N-terminus of Psf3 reveals that it is present in human and
some higher eukaryotes but not in many other organisms (Fig. 2.6a). These sequence
alignment data suggest that the first 16 N-terminal residues of Psf3 may not be needed
for the structural integrity of the GINS tetramer, but rather may function to regulate the
dimensions of the central pore and thus its accessibility, by plugging and unplugging this
cavity.  

To test the notion that this N-terminal 16-residue peptide is not required for tetramer
formation and stability of the GINS complex, we generated Psf3 constructs lacking either
10 or 18 residues of the N-terminus and examined whether such constructs supported
tetramer formation. Both Psf3 constructs formed stable tetramers under high and low salt
conditions and behaved similarly to the GINS complex containing full-length Psf3 in gel
filtration analyses (Fig. 2.6b). These results indicate that the N-terminal 16-residue loop
of Psf3 is not needed for folding and stability of the GINS complex.

The diameter of the central pore is ~10 Å at its narrowest point, but the opening is
increased to 18 Å upon removal of the N-terminal 16-residue loop of Psf3 (Fig. 2.5 c, d).

Figure 2.6 Role of Psf3 N-terminal loop in GINS complex assembly. (a) Sequence
alignment of the N-terminal regions of Psf3 from 31 different organisms; most of the
organisms analyzed lack the N-terminal sixteen residues present in human Psf3. (b) Gel
filtration profile of full-length (FL) human GINS and GINS containing Psf3 mutants
lacking either 10 or 18 residues at the N-terminus.



Figure 2.7 Distribution of ten Arg and Lys residues on the surface of GINS complex
central pore. These ten Arg and Lys residues are shown in stick representation with side
chain carbon atoms colored in red.



This may also partially explain the larger pore size observed in EM (34), as the acidic
negative staining solution used for EM may dislodge the N-terminal peptide of Psf3 from
the pore. Thus, the 16-residue peptide at the N-terminus of human Psf3 may
regulate pore opening and closing. Though a functional role for the central pore is as yet
unknown, we speculate that one possible role is to bind and hold a fragment from MCM,
Cdc45 or DNA polymerases during DNA synthesis. It is also possible that the pore could
bind single-stranded DNA (ssDNA) since its dimension is sufficiently large to
accommodate ssDNA even with the N-terminal peptide of Psf3 situated within the pore.
In support of this possibility, there are ten Arg and Lys residues as well as several Asn
and Gln residues distributed on the surface of the pore (Fig. 2.7), despite the overall
negatively charged outer surface of the tetramer. This charge distribution is similar to that
of PCNA and T7 gp4 helicase, both of which have positively charged and polar residues
around the central openings, even though they have an overall negatively charged outer
surface (30). If the GINS complex does bind ssDNA, it might require one or more additional
factors to do so, since DNA binding activity has not as yet been detected for the GINS
complex isolated from human (data not shown) or other organisms (40,50).

Understanding temperature sensitive mutants of GINS
A total of nine different temperature sensitive (ts) mutations have been identified in the
four yeast GINS proteins [27,56,60]. These mutated residues are highlighted (in color) in the
sequence alignments shown in Fig. 2.3a and 2.3b; eight of them are highly conserved
across eukaryotes. Our GINS structure provides a rationale for all of these ts mutations.
The nine ts mutations can be divided into three classes according to their possible
defect(s) at non-permissive temperatures. The mutants in class I involve residues
important for inter-subunit interactions. Psf1-R74G (Fig. 2.8a) and Psf3-L72P (Fig. 2.8b)
belong to this class. Psf1 R74 (yeast Psf1-1 R84G) forms hydrogen bonds with Psf3 E17
and S28 (Fig. 2.8a); a Gly substitution at this position should abolish the two hydrogen
bonds and significantly weaken interactions between Psf1-Psf3, which may lead to
instability of the complex at non-permissive temperatures. Similarly, the Psf3 L72P

Figure 2.8 The mapping of the yeast temperature sensitive mutants onto the equivalent
sites present in human GINS structure. (a), (b) The class I ts mutant residues involved in
the inter-subunit interactions between Psf1 R74 and Psf3 (panel a) and between Psf3 L72
and Psf1 (panel b). (c), (d), (e) The class II ts mutant residues Psf3 A73, Sld5 L222, and
Psf2 R124 are involved in intra-subunit interactions. Mutation of these residues, as
indicated, would be expected to affect the folding and stability of individual subunits
within the GINS complex.



mutation (yeast Psf3-21 L46P) should reduce the hydrophobic packing with Psf1 F64
(Fig. 2.8b). The ts mutants in class II involve residues important for intra-subunit
interactions. Psf3 A73 (yeast Psf3-21 A47P) and Sld5 L222 (yeast Sld5-13 L293P) are

both involved in β-domain packing, as well as α- and β-domain interactions (Fig. 2.8c,
d). Psf2 R124 (yeast Psf2-209 R133K) forms five hydrogen bonds within three helices
and two loops in its α-domain (Fig. 2.8e). Class III ts mutants involve residues buried in
the hydrophobic core structures of Sld5 and Psf3. Mutations of these buried residues
would be expected to reduce the thermodynamic stability of the structure, providing a
plausible molecular explanation for the phenotypes of these mutants. Surprisingly, ts
mutations have not as yet been identified on the surface of GINS. Since GINS
interacts directly with the MCM complex, Cdc45, and other replicative proteins in
different species [35], it is likely that the binding sites for these proteins are conserved. We
speculate that mutations on the surface of GINS that result in temperature sensitivity may
help identify these binding sites.
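
The residue correspondences and classes described above can be collected into a small lookup table. The following is only an illustrative sketch: the record layout and the names `TsMutant`, `TS_MUTANTS` and `group_by_class` are hypothetical, and only the five mutants named explicitly in the text are listed (the class III mutants are not named individually here, so they are omitted).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TsMutant:
    subunit: str        # GINS subunit carrying the mutation
    human_residue: str  # equivalent residue in the human structure
    yeast_allele: str   # yeast ts allele name as given in the text
    yeast_change: str   # amino acid substitution in yeast numbering
    ts_class: str       # "I" = inter-subunit, "II" = intra-subunit


# Only the mutants named explicitly in the text are included.
TS_MUTANTS = [
    TsMutant("Psf1", "R74", "Psf1-1", "R84G", "I"),
    TsMutant("Psf3", "L72", "Psf3-21", "L46P", "I"),
    TsMutant("Psf3", "A73", "Psf3-21", "A47P", "II"),
    TsMutant("Sld5", "L222", "Sld5-13", "L293P", "II"),
    TsMutant("Psf2", "R124", "Psf2-209", "R133K", "II"),
]


def group_by_class(mutants):
    """Group mutants by their proposed structural class."""
    grouped = defaultdict(list)
    for mutant in mutants:
        grouped[mutant.ts_class].append(mutant)
    return dict(grouped)


for ts_class, members in sorted(group_by_class(TS_MUTANTS).items()):
    names = ", ".join(f"{m.subunit} {m.human_residue}" for m in members)
    print(f"class {ts_class}: {names}")
```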

Structural conservation of GINS in Archaea
The archaeon Sulfolobus solfataricus has two GINS homologs, Gins15 and Gins23 [40].
Sequence analysis indicates that eukaryotic Psf1/Sld5 and Psf2/Psf3 are close homologs of
Gins15 and Gins23, respectively [39], suggesting that these two archaeal GINS proteins can form a
tetrameric complex via dimerization of the heterodimer. The crystal structure of human
GINS reveals that the residues involved in the interfaces between subunits are conserved
among archaeal and eukaryotic GINS proteins (data not shown), providing further
support for structural and functional conservation between archaeal and eukaryotic
GINS complexes. In keeping with these considerations, it was shown that the GINS
complex formed in archaea is a tetramer formed via the dimerization of the Gins15-Gins23
heterodimer [40].

Figure 2.9 A model for the GINS complex coordinating MCM, Cdc45, Pol ε and the Pol α-primase
complex at the replication fork. The GINS complex is shown in surface
representation. All other components (drawn as cartoons) are shown interacting directly
with the GINS complex.










A model for GINS role at the replication fork
A speculative model of how the human GINS tetrameric complex interacts with and
coordinates the activities of its binding partners is proposed in Fig. 2.9. In this model, we
suggest a direct contact between Psf1 and Pol ε because an interaction between S.
cerevisiae Psf1 and Dpb2 (the second largest subunit of Pol ε) was detected in a yeast
two-hybrid screen [56]. Because archaeal Gins23 binds to the N terminus of MCM [40] and an
interaction of Psf3 with MCM has been detected in the yeast two-hybrid system (S.
Azuma and H. Masukata, unpublished data, cited in [60]), our model depicts Psf3 in contact
with MCM. We assume that Psf2 contacts the Pol α-primase complex based on the report
that archaeal Gins23 interacts with primase [40]. The model also allows MCM, Cdc45, and
GINS to contact each other based on the isolation of this complex from Drosophila
eggs [43]. Finally, we also suggest that GINS must interact with Pol ε and the Pol α-primase
complex to coordinate leading and lagging strand synthesis, respectively [17,50].

Conclusion
The structural and biochemical data on the GINS complex presented here suggest that GINS
functions as a tight hetero-tetrameric complex. The molecular interactions between the
subunits of GINS are mediated mostly through helix-helix interactions that amplify the
helix-bundle-like structure of each individual subunit. In agreement with an EM study of
the Xenopus GINS complex [34], we find an open pore along the long axis of the ring-like
tetramer structure. The pore size appears to be influenced by the position of the N-
terminal 16-20 amino acid residues of Psf3, providing a possible mechanism for
regulating the accessibility of the central opening for its binding to a peptide from an
interacting factor or to ssDNA. The positions of disordered regions in our structure,
including the C-terminal 51 residues of Psf1, co-localize on the surface of the GINS
complex as patches, and likely serve as interaction sites for the binding of GINS to its
replication protein partners. The conservation of the GINS sequence among archaea and
eukaryotes suggests a fundamental functional role for this complex in DNA replication; it
is likely that GINS serves as a scaffold for the assembly and maintenance of an active
helicase and replication complex at the fork. This structure should provide a framework

for future studies directed at how the GINS complex interacts with other replication
proteins that jointly support both the initiation of DNA synthesis as well as fork
progression.

2.2 Materials and Methods
Cloning, protein expression and purification of GINS
hSld5 (GenBank Accession no. NM_032336), hPsf2 (GenBank Accession no. BC062444)
and hPsf3 (GenBank Accession no. BC005879) cDNAs were amplified from a HeLa
cDNA library. hPsf1 (GenBank Accession no. BC012542) cDNA was derived from
IMAGE clone 4333095. All of these cDNAs were subcloned into pGEX6P-1 (GE
Healthcare Biosciences Amersham) in the order GST-Psf2, Sld5, Psf1 and His8-Psf3 so that a
single polycistronic mRNA is transcribed. The N-terminal deletion mutants
of Psf3 (lacking 10 or 18 residues) were constructed as C-terminal His8 fusions
(Psf3-His8). The four subunits of human GINS were co-expressed in E. coli cells by
IPTG induction at 18°C. After cells were lysed by passage through a French Press, the
GINS complex was purified by nickel-affinity chromatography followed by a glutathione
resin affinity column, in a buffer containing 50 mM Tris-HCl (pH 8.0), 250 mM NaCl
(buffer A), and 5 mM β-mercaptoethanol. The GST and His8 tags were subsequently
removed by PreScission protease treatment in buffer A containing 1 mM DTT. The GINS
complex with 1:1:1:1 molar stoichiometry was obtained using RESOURCE Q ion-
exchange chromatography with a 50 to 500 mM NaCl gradient elution followed by gel
filtration chromatography through a Superdex-200 column in buffer containing 50 mM
Tris-HCl (pH 8.0), 50 mM NaCl. The typical yield from a 24-liter culture was
approximately 30 mg.

Crystallization and structure determination
Crystals were grown by the hanging drop vapor diffusion method with 20 mg/mL GINS
complex against a solution containing 60 mM MES (pH 5.5), 2% (v/v) isopropanol and
34 mM calcium chloride. Multiwavelength anomalous diffraction (MAD) data from Se-Met
crystals were collected using the synchrotron at the Argonne National Laboratory from
plate crystals approximately 200 × 100 × 20 μm in size (Table 2.1). Data were processed
with HKL2000. A total of 52 selenium sites were located by the SHELXD program using
MAD data in the 30–3.5 Å resolution range. The SHARP program was used to
calculate the experimental phases using the MAD data in the resolution range of 50–2.5
Å. RESOLVE was used for density modification resulting in a high quality electron
density map for model building with O. The model was refined with CNS to 2.36 Å
resolution. The final refinement statistics and geometry were in good agreement with
those defined by Procheck and are summarized in Table 2.1.

Chapter 3. The Structural Insights of SV40 Large T Antigen Assembly at
Replication Origin and Mechanism of Origin Melting

Chang, Y.P., Chen, X.S. Manuscript in preparation.
Author contributions: YPC purified the protein and DNA, crystallized the complex,
collected and processed diffraction data, determined the structure. XSC supervised the
project.

3.1 Results
Overall architecture of the LTag-dsDNA complex
To better understand how LTag assembles at the replication origin and subsequently
melts and unwinds DNA, we determined the crystal structure of dimeric LTag in complex
with the replication origin. The LTag construct used included both the OBD and the
helicase domain (Fig. 3.1a, boxed in blue). The complex crystallized in space group P2₁,
with each asymmetric unit containing two copies of the LTag dimer related to each other
by non-crystallographic symmetry. Each LTag dimer in the crystal recognized the 32-bp
EP half of the origin DNA (EP-ori) (Fig. 3.1b, boxed in blue) used for co-crystallization.

The electron density corresponding to the EP-ori DNA was very good and allowed
unambiguous assignment of DNA registry and fitting of DNA into the electron density
(Fig. 3.2a, b). The electron density showed 32 clear stairs, corresponding to the 32 bps of
the EP-ori DNA. The average distance and twist between base pairs are 3.2 Å and 34.9°
respectively, which are close to those of the B-form dsDNA (3.2 Å and 35.9°). The EP-ori
DNA makes bonding contacts with three LTag regions: the N-terminal OBD domain,
the Zn domain and the C-terminal β-hairpin (residues 510 to 516 between β9 and β10; see
Fig. 3.3 for reference) of the helicase domain (Fig. 3.2d, e; blue arrows).
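
To make the helical parameters quoted above more concrete, the short sketch below derives the number of base pairs per turn, the helical pitch and the approximate end-to-end length of the 32-bp duplex from the reported average rise and twist. It assumes an idealized straight duplex with uniform steps, which the real, slightly bent EP-ori only approximates; the function names are illustrative.

```python
# Average base-pair step parameters reported in the text for the EP-ori DNA.
RISE_ANGSTROM = 3.2   # average distance between base pairs (Å)
TWIST_DEG = 34.9      # average twist between base pairs (degrees)
N_BP = 32             # base pairs in the EP-ori duplex


def bp_per_turn(twist_deg: float) -> float:
    """Number of base pairs needed for one full 360-degree helical turn."""
    return 360.0 / twist_deg


def helical_pitch(twist_deg: float, rise: float) -> float:
    """Axial length of one helical turn (Å)."""
    return bp_per_turn(twist_deg) * rise


def duplex_length(n_bp: int, rise: float) -> float:
    """Length of a straight duplex: n_bp base pairs span n_bp - 1 steps."""
    return (n_bp - 1) * rise


print(f"base pairs per turn: {bp_per_turn(TWIST_DEG):.2f}")
print(f"helical pitch: {helical_pitch(TWIST_DEG, RISE_ANGSTROM):.1f} Å")
print(f"duplex length: {duplex_length(N_BP, RISE_ANGSTROM):.1f} Å")
```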

Figure 3.1 SV40 LTag domain structure and DNA sequence of core origin of replication.
(a) The linear representation of the LTag domain structure. The construct used for co-
crystallization is boxed in blue. (b) The replication origin sequence of SV40. Each of the
four GAGGC PEN sequences (in red) is recognized by one OBD, and the four PENs are
flanked by the EP and the AT-rich regions. The EP half of the origin DNA (boxed in blue)
was used for co-crystallization with LTag.







The two helicase domains of the LTag dimer each adopt a conformation similar to those
of previously reported structures of the LTag helicase domain (Fig. 3.4a) [25,36]. The
helicase domains of the LTag dimer are also arranged in a fashion similar to the arrangement
of adjacent subunits in the helicase domain hexamer of LTag [25,36] (Fig. 3.2c, 3.4d) (see
below for details). As expected, one LTag OBD recognizes PEN1 (Fig. 3.2c, d. This
LTag subunit is colored orange and is referred to hereafter as subunit one. The orange arrow
indicates the direction of the OBD from the C-terminus to the N-terminus. This also
corresponds to the 5’ to 3’ direction of PEN1 5’-GAGGC-3’ for subunit one. The
Figure 3.2 Overall structure of the LTag dimer in complex with EP-ori. (a) The electron
density map of EP-ori DNA showing clear base-pair steps. The region corresponding to
PEN1 is boxed and labeled. (b) The section corresponding to PEN1 in panel a enlarged
showing the good fitting of the DNA model in the density. (c) The overall structure of the
dimeric LTag EP-ori DNA complex. The helicase domains of the two subunits (subunits
one and two are colored orange and green respectively) in LTag dimer are arranged in a
fashion similar to the arrangement of adjacent subunits in LTag helicase domain
hexamers. The orange/green arrow indicates the orientation of the OBD from C- to N-
termini. OBD of subunit one binds to PEN1 (highlighted and labeled). A second structure
of LTag dimer bound to EP-ori is shown in white. In the second structure, subunit one
OBD is rotated closer to the helicase domain, resulting in a 6° bend of the EP-ori (indicated
by black arrows). (d) The interactions between subunit one and EP-ori. Subunit one is
oriented exactly the same as in panel c with subunit two removed to show the contacts
between EP-ori and OBD, Zn domain and β-hairpin of the AAA+ domain (indicated by
blue arrows). Each domain is labeled. (e) The contacts between subunit two and EP-ori
DNA mediated by OBD, Zn domain and β-hairpin. The helicase domain is oriented the
same as in panel d, showing the different orientation and positioning of the OBD and the
flexibility of the linker region between the OBD and the helicase domain.



 
Figure 3.3. Secondary structure of LTag 131-627. Arrows represent β-strands, bars denote α-helices.

PEN1 5’-GAGGC-3’ is highlighted and labeled.). Surprisingly, the other LTag OBD (Fig.
3.2c, e. This LTag subunit is colored green and is referred to hereafter as subunit two. The green
arrow indicates the direction of the subunit two OBD from the C- to the N-terminus.
This corresponds to the 3’ to 5’ direction of PEN1 5’-GAGGC-3’ for subunit
two.) does not bind to PEN2, and when compared to the OBD of subunit one, this
OBD is on the opposite face of the duplex DNA and has an orientation opposite to that of
the subunit one OBD (Fig. 3.2c. Note the opposite directionalities of the orange and green
arrows on opposite faces of the EP-ori.). This illustrates the flexibility of the linker region
between the OBD and Zn domains (compare the orientations of OBD in Fig. 3.2d, e).

Helicase domain and interface between helicase domains
The two Zn domains of the LTag dimer are very similar to each other (RMSD of 0.67 Å for
Cα atoms between residues 270 and 354) and to Zn domains of previously reported
structures of LTag helicase domains (RMSD between 0.38 and 0.58 Å) (Fig. 3.4a. See the
good overlap of Zn domains of LTag in various states.). Also, the two Zn domains of the
LTag dimer are arranged in a fashion similar to the way adjacent Zn domains are
arranged in LTag helicase domain hexamers [25,36] (RMSD between 0.51 and 0.62 Å) (Fig.
3.4d. See the good overlap of Zn domains of adjacent subunits of LTag in various states.).
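
For readers unfamiliar with the RMSD values quoted above, the following minimal sketch shows how a root-mean-square deviation is computed over paired Cα positions. It assumes the two coordinate sets have already been optimally superposed (the least-squares fitting step is not shown) and that atoms are paired in the same order; the toy coordinates are hypothetical and are not taken from the LTag structures.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD (in the units of the input, e.g. Å) between paired atom positions.

    Assumes the two sets are already superposed and listed in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical Cα positions for three equivalent residues in two models.
model_one = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_two = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
print(f"RMSD = {rmsd(model_one, model_two):.2f} Å")
```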

However, the relative orientations between the Zn and AAA+ domains are different
between the two subunits, and are also different from previously reported LTag structures

Figure 3.4 Helicase domain and interface between helicase domains. (a) Structural
alignment of helicase domains in nucleotide-free (PDB ID 1SVO. colored white)/ADP
(PDB ID 1SVL. colored gray)/ATP (PDB ID 1SVM. colored black) states and DNA
bound state. The helicase domains in various states are aligned with respect to the Zn
domain and this shows the similarity of Zn domains, AAA+ domain movement and
shifting of the β-hairpins in different states. For subunit one, α10 of the AAA+ domain is
shifted away from the Zn domain by approximately 6 Å; for subunit two, α10 is rotated
towards the Zn domain by 13°. These two different changes in α10 positions bring the
β-hairpins of both subunits (highlighted and boxed) into close proximity, and place them further
away from the β-hairpin positions of LTag in nucleotide-free and ADP states than those in the
ATP state. This corresponds to upward movement of β-hairpins in the central channel of
LTag hexamer closer to the Zn tier and narrowing of the central channel. (b) Contacts
between Zn and AAA+ domains in the absence of DNA. The Zn and AAA+ domains are
held together weakly by hydrogen bonding between Lys281 and Asp367 (shown in stick
representation), and van der Waals interaction between Pro311 and Ile374 (shown in
sphere representation). (c) Contacts between Zn and AAA+ domain disrupted in the
presence of DNA. In the presence of DNA, α10 shifts away from or rotates toward the Zn
domain. The van der Waals interaction between Pro311 and Ile374 is disrupted. The
hydrogen bond between Lys281 and Asp367 is totally disrupted in subunit one and is
weakened in subunit two. A new hydrogen bond is formed between Lys281 and Arg371
in subunit two. (d) A tight interface between the AAA+ domains. The two Zn domains in
the LTag dimer are arranged similarly to the way adjacent Zn domains in LTag hexamers
are arranged. Residues involved in this tight interface are
highlighted and shown in stick representation. (e) Interactions between the AAA+
domains at two areas: one surface involves β-hairpins close to the corresponding central
channel in LTag hexamer and the second area corresponds to the outer surface of LTag
hexamer. Residues at the interface are highlighted and shown in stick representation.
 

(Fig. 3.4a-c). For subunit one, α10 of the AAA+ domain is shifted away from the Zn
domain by approximately 6 Å (Fig. 3.4a). In all LTag helicase domain hexamer structures,
the Zn and AAA+ domains are held together weakly by hydrogen bonding between
Lys281 and Asp367, and van der Waals interaction between Pro311 and Ile374 (Fig. 3.4b).
This shift of α10 away from the Zn domain causes both interactions to be disrupted (Fig. 3.4c).
For subunit two, α10 is rotated towards the Zn domain by 13°. This disrupts the
above-mentioned van der Waals interaction completely, while the original hydrogen bond is
weakened and a new one between Lys281 and Arg371 is formed (Fig. 3.4c). Interestingly,
these two different changes in α10 position bring the β-hairpins of subunits one and
two into close proximity, and place them further away from the β-hairpin positions of LTag
in the nucleotide-free and ADP states than those in the ATP state (Fig. 3.4a, highlighted and
boxed). This shift in β-hairpin position corresponds to upward movement of the β-hairpin
closer to the Zn domain in the central channel of the LTag helicase domain hexamer and
narrowing of the central channel [25]. Together, these show that replication origin binding
induces changes in LTag greater than those brought about by ATP binding and hydrolysis,
and these changes in LTag correspond to narrowing of the already narrow central channel
in the LTag hexamer.

The difference in the relative orientation between the Zn and AAA+ domains creates an
interface between adjacent AAA+ domains tighter than those between adjacent subunits
of LTag in nucleotide-free and ADP states (buried interface of around 1850 Å² as
compared to 1000 and 1400 Å²) but weaker than that for LTag in the ATP state (2800 Å²)
(Fig. 3.4d). The AAA+ domains of the two subunits contact each other mainly through
hydrogen bonding and electrostatic interaction at two areas: one involves β-hairpins close
to the corresponding central channel of LTag hexamer (Fig. 3.4e on the left) and the
second is close to the area corresponding to the outer surface of LTag hexamer
(Fig. 3.4e on the right). For the former area, side chains of Lys446, Arg456, Glu460,
Lys512, His513 and main chains of residues Ala447, Lys512, His513 of subunit one
interact with side chains of Asp455, Asn508, Glu510, His513, Asn515, Lys516, Thr518
and main chains of residues Lys511, His513, and Leu514 of subunit two. For the latter
area, side chains of Asp429, Glu561, Glu565, Arg567 and Gln570, and main chain of
residue Glu565 of subunit one are involved; for subunit two, side chains of Asn415,
Lys418, Lys419, Arg498 and main chains of residues Tyr414, Asn415, Gly503, Ser504
are involved. These demonstrate that although the interaction between adjacent subunits
in LTag hexamer is quite variable, DNA and ATP binding both can strengthen the
interaction; and simultaneous binding of ATP and DNA can possibly create an even
stronger interface between subunits.

Interactions between OBD and origin
The structures of the two OBDs in each LTag dimer are similar to each other (Fig. 3.5c)
and to the previously reported OBD structures [5,42] (RMSD between 0.51 and 0.81 Å for Cα
atoms of residues 140 to 246). The OBD of subunit one interacts with PEN1 in a fashion
similar to the OBD-PEN interaction reported previously [5,42]. More specifically, the loop
region between α1 and β1, and residues just N-terminal of α3 together create a surface
that interacts both specifically and non-specifically with PEN1 at the major groove.
Hydrogen bonding and electrostatic interaction involving both the main chains and side
chains of residues at this surface give rise to the sequence specific interactions. Side

chains of Ser152, Asn153, Arg154 and Arg204 interact with bases of 5’-GAGGC-3’, 5’-
GAGGC-3’, 5’-GAGGC-3’ and 3’-CTCCG-5’ of PEN1 respectively (the bases of
interest are underlined. 3’-CTCCG-5’ is complementary to the PEN1  5’-GAGGC-3’.)
and the main chain of Asn153 interacts with the bases of 5’-GAGGC-3’ and 3’-CTCCG-5’
while that of Arg154 interacts with the base of 5’-GAGGC-3’ (Fig. 3.5a). Ser147, His148,
Leu150, Phe151, His203, Arg204, Thr210, and Asn227 interact non-specifically with
the DNA backbone.

In contrast, the OBD of subunit two does not recognize either PEN1 or PEN2, but instead
interacts sequence specifically with a minimal pseudo-PEN site of only 5’-GC-3’ that is 3
nucleotides away from PEN1 in the EP region. Because this minimal pseudo-PEN site
lacks 5’-GAGGC-3’, side chain of Arg154 swings away from the base that it would
otherwise interact with and instead forms a hydrogen bond with Asn227 (Fig. 3.5b);
similarly for Ser152 (Fig. 3.5b). So overall only 3 rather than 7 bases are involved in
specific DNA recognition. The strength of non-specific protein-DNA interaction remains
comparable with Ser147, Ser155, Arg202, His203, Arg204, Asn227, Lys228 making
contacts with the DNA backbone.

Although the non-specific protein-DNA interactions involving the OBD of subunit two are
comparable in strength to those of the subunit one OBD, the loop region between Leu150
and Phe151 moves away from the DNA (Fig. 3.5c) and adopts the same conformation
as that in the absence of DNA (Fig. 3.5c, shown in white). In the presence of the PEN
recognition sequence, this loop swings towards and interacts non-specifically with DNA.
This, together with the extra recognition brought about by the canonical PEN sequence,
brings the origin closer to the OBD by bending the DNA by 6°.

Moreover, the region between residues 216 and 219 of subunit two OBD adopts a
conformation different from that of subunit one OBD and previously reported OBD
structures (Fig. 3.5c). For subunit two, Phe218 points towards the core of OBD while
Phe218 is on the surface of the OBD in the other structures. Due to the hydrophobic
nature of the side chain of Phe, it is more favorable thermodynamically to have it tucked
away in the core of the protein rather than having it exposed on the surface of the protein.
This conformation change may thus hide or expose the binding sites of interacting
partners.

These together indicate that the OBD can bind DNA non-specifically, and this could be the
mechanism of recruiting LTag to the viral DNA. After being recruited to the viral DNA,
LTag may traverse the viral DNA in search of the replication origin, and the first LTag
subunit is locked onto each half of the replication origin when its OBD recognizes a PEN
site. At the same time, the origin is bent by the OBD (at least for the first LTag recruited to
the origin), creating stress that could facilitate melting of the origin DNA. Moreover, due
to steric constraint, the OBD of the second LTag recruited to each half of the origin is unable
to bind to the second PEN sequence, and the binding of the second LTag OBD to the
minimal binding site 5’-GC-3’ stabilizes the dimeric LTag-replication origin intermediate.
At various stages during the assembly of the double hexamer at the replication origin,
the conformation of some surface residues (for example, residues 216 to 219) may
change, thus hiding or exposing binding sites of replication partners.

Interactions between Zn domain and origin
Interactions between the EP-ori and the Zn domain are mostly non-specific involving
both strands of DNA backbone. For the Zn domain, α9 of both subunits, α5 of subunit
one, and the loop regions just N-terminal of these helices participate in binding the origin.
The resulting interaction surface on subunit one is stronger than that on subunit two
(buried surface of 980 Å² and 130 Å² for subunits one and two respectively) and is also
closer to PEN1.

More specifically, these helices and loops create 3 surfaces on LTag dimer that are
situated at approximately the same face of the DNA. Surface one involves the side chains
of Glu264, Thr265 of subunit one and Glu261 of subunit two interacting with the
phosphate backbone of 5’-GCTC-3’, 5’-GCTC-3’ and 5’-GCTC-3’ respectively (Fig.
3.5e). Gln267 of subunit one is also part of this surface and interacts with the sugar of
5’-GCTC-3’, and bases of 5’-GCTC-3’ and 3’-TTATC-5’ (Fig. 3.5e). The phosphate
backbone of a stretch of five nucleotides 3’-TTATC-5’ binds to the second surface: side
chains of Gln267, Ser269, Trp270, Lys271, Lys331 and Asn332 of subunit one; Gln267
and Asn332 of subunit two; main chain nitrogen of Val268, Trp270, Lys271, Lys331 and
Asn332; and main chain amide oxygen of Val268 (Fig. 3.5e). The phosphate backbone of
another stretch of four nucleotides 5’-CTGG-3’ binds to side chains of Gln338 and
Lys334 of both subunits, and Asp342 of subunit one. These together create the third
surface. The elucidation of LTag residues involved in protein-DNA interactions helps to
explain the origin nucleotides buried and melted by the LTag double hexamer [6,7]. This also aids
in the understanding of LTag residues necessary for assembly of LTag at the replication
origin.

Although Asp342 of subunit one is involved in the third interface, Asp342 of subunit two
is buried at the dimer interface. The same is true for residues on α5 of subunit one that
contact EP-ori. These show that during the assembly of hexamer at the replication origin,
residues that have been observed to be involved in protein-protein interactions between
adjacent subunits in the hexamer may in fact be involved in protein-DNA interactions. In
LTag hexamer, each subunit is in contact with two neighboring subunits. However,
this might not be the case for some subunits during the assembly of the hexamer. A good
example is subunits one and two in our structure, each with one interface buried and one
interface free. Residues on the free interface can then be used to contact DNA. However,
once another LTag subunit is recruited to the intermediate during hexamer formation,
the residues on the previously free interface will be buried while a new free interface might
be formed on the last LTag subunit recruited, thus propagating the free interface. These
transient protein-DNA interactions may be important for the stabilization of LTag
during assembly of the LTag hexamer at the replication origin.




Figure 3.5 Interactions between LTag and EP-ori DNA. (a) Sequence specific interactions between OBD of subunit one and
PEN1. (b) Interactions between OBD of subunit two and the minimal pseudo-PEN site of 5’-GC-3’. (c) Bending of origin due
to PEN1 recognition by OBD. OBD in the absence of DNA is shown in white (PDB ID 2ITJ), and in the presence of its
canonical PEN binding sequence is shown in black (PDB ID 2ITL). PEN1 bound to subunit one is shown with bases, sugars
and thicker backbones. DNA bound to subunit two is shown only with thinner backbones. For subunit one, the loop region
between residues 150-151 extends away from the center of the OBD to contact PEN1, and the origin is bent by 6°. Loop
region between residues 216 and 219 of subunit two adopts a closed conformation different than that in subunit one and those
in previously reported structures. (d) Interactions between the AAA+ domain and the EP region. Interactions between AAA+
domain and EP-ori are mainly mediated by residues at the tip of β-hairpin and Phe459 of α14. Residues at the subunit-subunit
interface in LTag dimer and hexamer, Asn515 of subunit one and Arg456 of subunit two, can also interact with EP-ori when
they are not buried at the subunit-subunit interface. (e) Three interaction surfaces for Zn domain binding EP-ori. Zn domains of
both subunits bind both strands of DNA on the same side of the DNA, mediated mostly by non-sequence-specific interactions.
Interactions involving α9 are parallel between the subunits while other interactions differ between the subunits.




An interesting observation is that Lys334 of the two subunits bind to consecutive
phosphates of the DNA backbone on the same DNA strand in a helical fashion (Fig.
3.5e). This is also true for Gln338 and Asn332. These observations suggest that LTag assembles at the
replication origin in a helical fashion around the DNA backbone.

Interactions between AAA+ domain and origin
The interactions between the AAA+ domain and the DNA are mainly mediated by the three
residues at the tip of the β-hairpin (Lys512, His513 and Leu514) and Phe459 of α14. They
interact with the EP-ori at the minor groove. These four residues directly participate in
binding the DNA backbones and/or bases through a combination of charge-charge
interactions, hydrogen bonds, and hydrophobic packing (Fig. 3.5d). Lys512 and His513
of subunit one interact with the phosphate backbone of 3’-TGAAGACC-5’, bases of
5’-ACTTCTGG-3’/3’-TGAAGACC-5’ and sugar of 5’-ACTTCTGG-3’. Similarly, Lys512
and His513 of subunit two interact with the backbone of 3’-TGAAGACC-5’, bases of
5’-ACTTCTGG-3’/3’-TGAAGACC-5’ and sugar of 5’-ACTTCTGG-3’. Phe459 and
Leu514 of subunit one pack against the backbone of 3’-TGAAGACC-5’ and
5’-ACTTCTGG-3’ respectively, while those of subunit two interact hydrophobically with the
backbone of 3’-TGAAGACC-5’ and 5’-ACTTCTGG-3’. Overall, the AAA+ domains of
subunits one and two interact with EP-ori in a similar fashion, but the corresponding
nucleotides they interact with are 1 base apart, and those contacted by subunit one are 1 base
closer to PEN1. This further supports LTag assembly at the replication
origin in a helical fashion around the DNA backbone.

Additionally, Asn515 of subunit one and Arg456 of subunit two interact with the
phosphate backbone of 5’-ACTTCTGG-3’ and 3’-TGAAGACC-5’ respectively. These
two residues, when present at the interface between LTag subunits, are involved in LTag
hexamer formation (Fig. 3.4e). These observations further support the notion of a propagating free
interface during the assembly of the LTag hexamer at the replication origin and the
importance of transient protein-DNA interactions during LTag assembly at the replication
origin.

Deformation of origin by LTag
A second structure of dimeric LTag bound to EP-ori was determined using a crystal
grown under a different crystallization condition (see Table 3.1). There is no major
difference between the two structures except for the relative orientation between the OBD
and helicase domain of subunit one (Fig. 3.2c, in white). The OBD of subunit one is
rotated closer to the helicase domain, again showing the flexibility of the linker region
between the OBD and helicase domain. This rotation of the OBD also results in bending of
the EP-ori further by 6° (Fig. 3.2c. DNA backbone of this second structure is shown with
thinner lines). This supports the idea that the torsional stress created by EP-ori
bending could facilitate melting of the replication origin.

Helical arrangement of LTag residues involved in binding origin
Each subunit in the LTag dimer structure interacts with EP-ori in a parallel fashion,
with the stretches of EP-ori contacted by the two subunits differing by only one base in
registry. This kind of interaction involves α9 of the Zn domain and the β-hairpin of the
AAA+ domain. On the other hand, the EP-ori interaction surfaces of the two subunits are
not identical: α5 contributes substantially (45%) to the LTag EP-ori interaction of
subunit one, whereas this interaction is totally absent in subunit two.

If all the interactions between adjacent subunits and the origin were helical in nature and
differed by one base in registry, each subunit would occupy only 36º of the dsDNA twist,
and an LTag hexamer would not be sufficient to encircle the origin DNA. An LTag hexamer
encircling the origin would have two adjacent subunits occupying 120º of the DNA surface,
and the helicase domains of the dimeric LTag occupy this much of the DNA surface, with
one helicase domain occupying roughly 50% more than the other. This translates into
subunits one and two occupying 72º and 36º of the DNA surface, respectively. The 72º of
DNA surface occupied by subunit one can be divided into two 36º halves. One of these
halves is analogous to the 36º of DNA surface occupied by subunit two, and the other
involves an interaction surface present on subunit one but not on subunit two.
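
The geometric argument above can be checked with a few lines of arithmetic. The sketch below assumes idealized B-form DNA with 10 base pairs per turn, which gives the 36º of twist per base used in the text; the function and variable names are illustrative and are not part of the original analysis.

```python
# Illustrative arithmetic only; names and the 10 bp/turn value are assumptions.
BP_PER_TURN = 10                      # idealized B-form DNA
TWIST_PER_BASE = 360 / BP_PER_TURN    # 36 degrees of twist per base step


def helical_coverage(n_subunits, bases_per_subunit=1):
    """Degrees of DNA surface covered if every subunit is offset from its
    neighbour by the same number of bases along the helix."""
    return n_subunits * bases_per_subunit * TWIST_PER_BASE


# Six subunits, each shifted by one base, cover well under a full turn.
uniform_hexamer = helical_coverage(6)

# A closed hexameric ring needs each adjacent pair to span a third of a turn.
needed_per_pair = 360 / 3

# Surface assigned in the text to subunits one and two of the dimer.
dimer_estimate = 72 + 36

print(f"uniform one-base staircase, 6 subunits: {uniform_hexamer:.0f} of 360 degrees")
print(f"needed per adjacent pair in a closed ring: {needed_per_pair:.0f} degrees")
print(f"estimated for the observed dimer: {dimer_estimate} degrees")
```

Under these assumptions a uniform one-base staircase leaves 144º of the duplex uncovered, which is consistent with the suggestion that some adjacent subunits must depart from a strict one-base registry.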

This non-planar helical arrangement of the β-hairpins in the dimeric LTag EP-ori structure
is similar to the staircase arrangement of the β-hairpins previously reported for the E1
helicase, in either the ssDNA-bound or the ssDNA-free state [21,48]. The staircase
arrangement in the E1 hexamer enables each of the six β-hairpins to follow the phosphate
backbone of the ssDNA and make almost identical contacts with different nucleotides
along the helical path of the staircase-arranged β-hairpins. The present structure
suggests that the β-hairpins of some adjacent subunits do follow the DNA backbone
to make almost identical contacts, at least during assembly at the origin. However, there
would be breaks along the helical path between some subunits to allow for stronger
LTag-origin contact and to allow a hexamer to completely encircle the DNA.

Role of the Zn domain and assembly of LTag at replication origin
In the LTag EP-ori structure, the relative orientation of the two adjacent Zn domains is
essentially the same as that in DNA-free LTag structures [25,36]. Therefore, the relative
orientation of Zn domains appears to be independent of ATP binding, hydrolysis and
DNA interaction. Thus, the part of the central channel formed by the Zn domain may serve
purely as a route to hold and direct the passage of DNA. It may not contribute directly to
origin melting and unwinding. In this sense, the Zn domain of LTag helicase is
structurally and functionally similar to the N-terminal Zn domain of the MCM complex,
whose conformation is also independent of ATP binding and hydrolysis [1,8,14,23,37].

However, the Zn domain is important for the recruitment of LTag to the replication origin,
as it can contribute up to 65 percent of the interactions (as measured by buried surface
area) between the LTag helicase domain and EP-ori. It also contributes 42 percent of the
interactions between subunits, suggesting that it can be important in recruiting
LTag subunits to the replication origin through protein-protein interactions. Further
evidence for the importance of protein-protein interactions in LTag recruitment to the
replication origin is that the interactions between LTag and the origin can be quite weak:
the interaction between the helicase domain of subunit two and EP-ori is only 45% as
strong as that of subunit one. In addition, the interaction between the two subunits is
stronger than the interaction between EP-ori and either subunit.

Together, these observations suggest that the LTag Zn domain may have different roles at
different stages of DNA replication. During the initiation of DNA replication, the Zn
domain facilitates the recruitment of LTag subunits to the origin in two different ways:
by binding the origin and by interacting with the Zn domains of adjacent LTag subunits.
During origin melting, the Zn domain together with the AAA+ domain may bend the origin,
resulting in origin melting (discussed in a subsequent section). During DNA unwinding, it
serves as a route that directs the passage of DNA, thus anchoring the LTag hexamer to the
replication fork.

Another consequence is the different roles of protein-protein and protein-DNA
interactions in recruiting different subunits to the replication origin. Protein-DNA
interactions are likely to be essential for the recruitment of the first LTag subunit to the
replication origin, with interfaces that would otherwise be buried in the LTag hexamer in
contact with the origin. However, the recruitment of the second LTag subunit to the same
half of the replication origin is mainly mediated by protein-protein interactions, with
protein-DNA interactions being important only for bringing the second subunit into the
proximity of the viral DNA. Subsequent recruitment of other subunits to the same
half of the origin is mediated by varying degrees of protein-protein and protein-DNA
interactions.

Mechanism of origin melting
Given the insights from our dimeric LTag EP-ori structure, the structure of the core
origin with each PEN sequence bound to one OBD [5], and the preference of LTag double
hexamers for binding to the PEN1 and PEN3 sites [31,57], a model of LTag at the
replication origin was constructed as described below and is shown in Fig. 3.6. The core
origin is modeled onto PEN1 by aligning the OBD at PEN1. A second LTag dimer is fitted
onto the PEN3 position of the core origin by aligning the OBD of subunit one with the OBD
at the PEN3 position in the core origin structure. Finally, one LTag hexamer is modeled
onto each of the LTag dimers by aligning the Zn domain.

The most striking feature of this model is that the directionality of the DNA is different
from that of the LTag central channels. This is because DNA is linear on a global scale,
while locally the DNA backbone runs at an angle to this linear directionality. Also,
LTag subunits likely assemble onto the origin in a helical fashion along the DNA
backbone, as discussed already. Once an LTag hexamer assembles at the replication origin,
DNA will pass through the central channel formed by the helicase domain, resulting in
bending of the DNA. This bending results in curvature of the origin DNA, consistent with
the varying degrees of curvature of LTag-origin complexes observed in a previous EM
study [28]. Bending of dsDNA results in torsional stress, which can be released without
breaking any covalent linkage by the disruption of Watson-Crick base pairing and melting
of the origin. This provides a molecular mechanism for strand separation during origin
DNA melting after assembly at the replication origin. The varying extents of DNA bending
by dimeric LTag in different states support this mechanism.

Figure 3.6 A model of LTag at the replication origin. This model is generated by (1)
modeling the core origin (PDB ID 2ITL) onto PEN1 of the dimeric LTag structure; (2)
fitting a second LTag dimer onto the PEN3 position of the core origin; and (3) modeling
one LTag hexamer (PDB ID 1SVM) onto each LTag dimer. The direction of the DNA is
different from the direction of the LTag central channel, and DNA bending is required for
it to pass through the central channel. Bending of dsDNA results in torsional stress, which
can be released by the disruption of Watson-Crick base pairing, which results in origin
melting.

Conclusion
We have described the first structure of dimeric LTag in complex with the SV40
replication origin. This is also the first LTag structure in which both the OBD and the
helicase domain are visible, and it shows that the linker region between the OBD and the
helicase domain is very flexible. The structural information reveals how a dimeric LTag
recognizes and assembles on the replication origin: LTag is brought to the viral DNA by
the OBD recognizing its minimal binding sequence 5’-GC-3’. Once an LTag monomer
encounters its canonical penta-nucleotide recognition sequence, it is locked onto the
replication origin. The subsequent recruitment of the second LTag molecule to the same
half of the origin is mainly mediated by protein-protein interactions between the two
LTag subunits. Protein-protein and protein-DNA interactions may have different
importance for the initial recruitment of different LTag subunits to the origin. Certain
amino acid residues may play different roles at different stages of LTag assembly at the
replication origin, binding either the origin or the neighboring LTag subunit. The roles
and importance of amino acid residues in contact with the origin should be investigated
further. Also, the model that we have proposed for the complete assembly of an LTag
hexamer on each half of the origin DNA does not contradict the staircase arrangement of
the β-hairpins in E1 helicase structures [21,48]. Analysis of this dimeric structure, the
structures of the hexameric LTag helicase domain [25,36] and the structure of the OBD
bound to the core origin [5] suggests a mechanism of origin melting: DNA bending induced
by LTag hexamer assembly at the replication origin, at an angle to the global
directionality of the DNA, creates torsional stress that is released by origin melting.
The LTag Zn domain is part of the LTag hexamer central channel and thus may play an
important role in DNA bending and origin melting. It is also important for anchoring the
LTag hexamer to the replication fork by encircling dsDNA. Overall, our structure provides
a starting point for future studies of LTag assembly at the replication origin, origin
melting and DNA unwinding. The models proposed here for LTag recruitment to the
replication origin, origin melting and DNA unwinding may be applicable to its eukaryotic
counterparts.

3.2 Materials and Methods
Cloning, and protein expression and purification of LTag
The LTag 131–627 construct was cloned into the pGEX-6P-1 vector as an N-terminal GST
fusion. The LTag protein was expressed in E. coli cells by IPTG induction at 18°C when
the cell density reached an OD of ~0.6. After cells were lysed by passage through a French
Press, the LTag protein was purified by glutathione affinity chromatography in a buffer
containing 50 mM Tris-HCl (pH 8.0), 250 mM NaCl and 1 mM DTT. The GST tag was
subsequently removed by PreScission protease treatment. The LTag protein was further
purified by ion exchange chromatography and by gel filtration chromatography through a
Superdex 200 column.

DNA preparation for co-crystallization
Oligonucleotides 5’-actacttctggaatagctcagaggccgaggcg-3’ and
5’-cgcctcggcctctgagctattccagaagtagt-3’ were purified by ion exchange chromatography
through a Mono Q column. The two purified oligonucleotides were mixed in a 1:1 molar
ratio and annealed by heating to 94 ºC followed by slow cooling to room temperature.
Gel filtration chromatography with Superdex 75 was used to purify the EP-half of the
origin dsDNA away from unannealed ssDNA.
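
As a quick sanity check on the duplex design, the two oligonucleotide sequences given above can be compared programmatically. The following is a minimal sketch; the sequences are copied from the text, while the helper and variable names are illustrative assumptions.

```python
# Sequences copied from the text; helper names are illustrative.
TOP = "actacttctggaatagctcagaggccgaggcg"
BOTTOM = "cgcctcggcctctgagctattccagaagtagt"

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# The two strands should pair over their full length, giving blunt ends.
fully_complementary = reverse_complement(TOP) == BOTTOM

# 0-based offset of the 5'-ACTTCTGG-3' stretch discussed in the Results.
motif_start = TOP.find("acttctgg")

print(len(TOP), len(BOTTOM), fully_complementary, motif_start)
```

The check confirms that the two 32-nucleotide strands are exact reverse complements, so annealing at a 1:1 molar ratio should give a blunt-ended duplex with no overhangs.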
 
Co-crystallization and data collection
LTag 131–627 protein (10 mg/ml) and the EP-half of the origin dsDNA were mixed in a
1:1.25 molar ratio. Crystals were grown at either 4 or 18 ºC by the sitting drop vapor
diffusion method using this complex against a solution containing 100 mM bis-Tris (pH
6.75) and 20% (v/v) PEG3350. Crystals were harvested after 2 weeks and flash-frozen
in liquid nitrogen in the crystallization buffer containing 24% glycerol. Diffraction data
were collected at beamline 8.2.1 of the Advanced Light Source, Lawrence Berkeley National
Laboratory. The datasets were processed using HKL2000 and the statistics are
summarized in Table 3.1.
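
The mixing ratio above can be turned into approximate amounts with a short calculation. This is only a rough sketch: the average masses per residue and per base pair are textbook approximations rather than values reported here, the ratio 1:1.25 is read as protein:DNA, and all variable names are illustrative.

```python
# Rough stoichiometry sketch; the average masses are textbook approximations,
# not values reported in the thesis.
AVG_RESIDUE_MASS_DA = 110      # assumed average mass per amino acid residue
AVG_BASE_PAIR_MASS_DA = 650    # assumed average mass per DNA base pair

n_residues = 627 - 131 + 1     # LTag 131-627 construct
n_base_pairs = 32              # length of the annealed duplex

protein_mass_da = n_residues * AVG_RESIDUE_MASS_DA
dna_mass_da = n_base_pairs * AVG_BASE_PAIR_MASS_DA

protein_mg_per_ml = 10.0
# mg/ml equals g/l, so dividing by g/mol gives mol/l; scale to micromolar.
protein_um = protein_mg_per_ml / protein_mass_da * 1e6

dna_per_protein = 1.25         # molar ratio, reading "1:1.25" as protein:DNA
# Micromolar is numerically equal to nmol per ml.
dna_nmol_per_ml_protein = protein_um * dna_per_protein
# nmol * (g/mol) gives ng; convert ng to mg.
dna_mg_per_ml_protein = dna_nmol_per_ml_protein * dna_mass_da * 1e-6

print(f"protein: ~{protein_um:.0f} uM")
print(f"DNA needed per ml of protein stock: ~{dna_nmol_per_ml_protein:.0f} nmol "
      f"(~{dna_mg_per_ml_protein:.1f} mg)")
```

A measured molecular mass and extinction coefficient would replace the estimated masses in practice.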

Structure determination and refinement
The initial phases for both crystal forms were obtained by molecular replacement: two
copies each of the LTag helicase domain (PDB ID 1SVM) and the OBD (PDB ID
2ITL) were found by PHASER. Very good electron density for the DNA could already be seen
at this stage, which allowed fitting of the DNA using O and COOT. The registry of the
DNA was determined from the shape of the electron density corresponding to the DNA
bases. Iterative model rebuilding with O and refinement with CNS improved the phases
further. The final refinement statistics and geometry, as assessed by PROCHECK, are
very good and are summarized in Table 3.1.

Table 3.1 Data collection and model refinement statistics of LTag-EP origin structure

                                      4 ºC Crystal             18 ºC Crystal
Data collection
Cell dimensions (space group P2₁)
  a, b, c (Å)                         73.63, 128.31, 166.12    72.90, 125.50, 160.23
  α, β, γ (˚)                         90, 89.911, 90           90, 94.936, 90
Resolution (Å)                        50–2.8 (2.9–2.8)         50–3.0 (3.11–3.0)
Observations                          172018                   153865
Rmerge                                8.3 (59.2)               9.7 (63.1)
I/σI                                  12.1 (1.2)               13.0 (1.3)
Completeness (%)                      93.2 (88.4)              96.0 (82.3)
Refinement
Resolution (Å)                        50.0–2.8                 50.0–3.0
No. reflections                       65350                    52489
Rwork / Rfree                         22.53/25.29              25.18/27.76
B-factor (averaged): Protein          58.554                   73.727
                     DNA              55.396                   68.898
R.m.s. deviations: Bond lengths (Å)   0.008655                 0.009027
                   Bond angles (˚)    1.20198                  1.22384
Highest-resolution shell values are shown in parentheses.
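
Two derived quantities can be read off Table 3.1 with a short calculation: the unit-cell volume, which for a monoclinic cell is V = a·b·c·sin β, and a rough data multiplicity. The sketch below is illustrative only; in particular it assumes that the number of reflections used in refinement is close to the number of unique reflections measured, which the table does not state, and the dictionary layout and function names are hypothetical.

```python
import math

# Values copied from Table 3.1; dictionary layout and names are illustrative.
CRYSTALS = {
    "4 C": {"cell": (73.63, 128.31, 166.12), "beta": 89.911,
            "observations": 172018, "reflections": 65350},
    "18 C": {"cell": (72.90, 125.50, 160.23), "beta": 94.936,
             "observations": 153865, "reflections": 52489},
}


def monoclinic_volume(a, b, c, beta_deg):
    """Unit-cell volume in cubic angstroms for a monoclinic cell (alpha = gamma = 90)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def approximate_multiplicity(observations, reflections):
    """Observations per reflection, assuming the refinement reflection count
    is close to the number of unique reflections measured."""
    return observations / reflections


for name, data in CRYSTALS.items():
    volume = monoclinic_volume(*data["cell"], data["beta"])
    multiplicity = approximate_multiplicity(data["observations"], data["reflections"])
    print(f"{name}: V = {volume:.3e} cubic angstroms, multiplicity ~ {multiplicity:.1f}")
```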


References
1.  Barry, E. R., McGeoch, A. T., Kelman, Z. & Bell, S. D. Archaeal MCM has
separable processivity, substrate choice and helicase domains. Nucleic Acids Res.
35, 988-998 (2007).
2.  Bertram, J. G., Bloom, L. B., O'Donnell, M. & Goodman, M. F. Increased dNTP
binding affinity reveals a nonprocessive role for Escherichia coli beta clamp with
DNA polymerase IV. J. Biol. Chem. 279, 33047-33050 (2004).
3.  Betts, L., Xiang, S., Short, S. A., Wolfenden, R. & Carter, C. W., Jr. Cytidine
deaminase. The 2.3 A crystal structure of an enzyme: transition-state analog
complex. J. Mol. Biol. 235, 635-656 (1994).
4.  Blow, J. J. & Dutta, A. Preventing re-replication of chromosomal DNA. Nat. Rev.
Mol. Cell Biol. 6, 476-486 (2005).
5.  Bochkareva, E., Martynowski, D., Seitova, A. & Bochkarev, A. Structure of the
origin-binding domain of simian virus 40 large T antigen bound to DNA. EMBO J.
25, 5961-5969 (2006).
6.  Borowiec, J. A., Dean, F. B., Bullock, P. A. & Hurwitz, J. Binding and
unwinding--how T antigen engages the SV40 origin of DNA replication. Cell 60,
181-184 (1990).
7.  Borowiec, J. A. & Hurwitz, J. Localized melting and structural changes in the
SV40 origin of replication induced by T-antigen. EMBO J. 7, 3149-3158 (1988).
8.  Brewster, A. S. et al. Crystal structure of a near-full-length archaeal MCM:
functional insights for an AAA+ hexameric helicase. Proc. Natl. Acad. Sci. U. S.
A 105, 20191-20196 (2008).
9.  Chelico, L., Pham, P., Calabrese, P. & Goodman, M. F. APOBEC3G DNA
deaminase acts processively 3’ → 5’ on single-stranded DNA.
Nature Struct. Mol. Biol. 13, 392-399 (2006).
10.  Chelico, L., Sacho, E. J., Erie, D. A. & Goodman, M. F. A model for oligomeric
regulation of APOBEC3G cytosine deaminase-dependent restriction of HIV. J.
Biol. Chem. 283, 13780-13791 (2008).
11.  Chen, K. M. Extensive mutagenesis experiments corroborate a structural model
for the DNA deaminase domain of APOBEC3G. FEBS Lett. 581, 4761-4766
(2007).

12.  Chen, K. M. Structure of the DNA deaminase domain of the HIV-1 restriction
factor APOBEC3G. Nature 452, 116-119 (2008).
13.  Chiu, Y. L. & Greene, W. C. The APOBEC3 cytidine deaminases: an innate
defensive network opposing exogenous retroviruses and endogenous
retroelements. Annu. Rev. Immunol. 26, 317-353 (2008).
14.  Chong, J. P., Hayashi, M. K., Simon, M. N., Xu, R. M. & Stillman, B. A double-
hexamer archaeal minichromosome maintenance protein is an ATP-dependent
DNA helicase. Proc. Natl. Acad. Sci. U. S. A 97, 1530-1535 (2000).
15.  Chung, S. J., Fromme, J. C. & Verdine, G. L. Structure of human cytidine
deaminase bound to a potent inhibitor. J. Med. Chem. 48, 658-660 (2005).
16.  Conticello, S. G., Langlois, M. A., Yang, Z. & Neuberger, M. S. DNA
deamination in immunity: AID in the context of its APOBEC relatives. Adv.
Immunol. 94, 37-73 (2007).
17.  De, F. M. et al. The human GINS complex binds to and specifically stimulates
human DNA polymerase alpha-primase. EMBO Rep. 8, 99-103 (2007).
18.  Deb, S., DeLucia, A. L., Baur, C. P., Koff, A. & Tegtmeyer, P. Domain structure
of the simian virus 40 core origin of replication. Mol. Cell Biol. 6, 1663-1670
(1986).
19.  Deb, S. et al. The T-antigen-binding domain of the simian virus 40 core origin of
replication. J. Virol. 61, 2143-2149 (1987).
20.  DeLano, W. L. The PyMOL Molecular Graphics System (2002).
21.  Enemark, E. J. & Joshua-Tor, L. Mechanism of DNA translocation in a
replicative hexameric helicase. Nature 442, 270-275 (2006).
22.  Fanning, E. & Zhao, K. SV40 DNA replication: from the A gene to a
nanomachine. Virology 384, 352-359 (2009).
23.  Fletcher, R. J. et al. The structure and function of MCM from archaeal M.
thermoautotrophicum. Nat. Struct. Biol. 10, 160-167 (2003).
24.  Forsburg, S. L. Eukaryotic MCM proteins: beyond replication initiation.
Microbiol. Mol. Biol. Rev. 68, 109-131 (2004).

25.  Gai, D., Zhao, R., Li, D., Finkielstein, C. V. & Chen, X. S. Mechanisms of
conformational change for a replicative hexameric helicase of SV40 large tumor
antigen. Cell 119, 47-60 (2004).
26.  Gambus, A. et al. GINS maintains association of Cdc45 with MCM in replisome
progression complexes at eukaryotic DNA replication forks. Nat. Cell Biol. 8,
358-366 (2006).
27.  Gomez, E. B., Angeles, V. T. & Forsburg, S. L. A screen for
Schizosaccharomyces pombe mutants defective in rereplication identifies new
alleles of rad4+, cut9+ and psf2+. Genetics 169, 77-89 (2005).
28.  Gomez-Lorenzo, M. G. et al. Large T antigen on the simian virus 40 origin of
replication: a 3D snapshot prior to DNA replication. EMBO J. 22, 6205-6213
(2003).
29.  Goodman, M. F., Scharff, M. D. & Romesberg, F. E. AID-initiated purposeful
mutations in immunoglobulin genes. Adv. Immunol. 94, 127-155 (2007).
30.  Gulbis, J. M., Kelman, Z., Hurwitz, J., O'Donnell, M. & Kuriyan, J. Structure of
the C-terminal region of p21(WAF1/CIP1) complexed with human PCNA. Cell
87, 297-306 (1996).
31.  Joo, W. S., Kim, H. Y., Purviance, J. D., Sreekumar, K. R. & Bullock, P. A.
Assembly of T-antigen double hexamers on the simian virus 40 core origin
requires only a subset of the available binding sites. Mol. Cell Biol. 18, 2677-2687
(1998).
32.  Kamada, K., Kubota, Y., Arata, T., Shindo, Y. & Hanaoka, F. Structure of the
human GINS complex and its assembly and functional interface in replication
initiation. Nat. Struct. Mol. Biol. 14, 388-396 (2007).
33.  Kim, H. Y. et al. Sequence requirements for the assembly of simian virus 40 T
antigen and the T-antigen origin binding domain on the viral core origin of
replication. J. Virol. 73, 7543-7555 (1999).
34.  Kubota, Y. et al. A novel ring-like complex of Xenopus proteins essential for the
initiation of DNA replication. Genes Dev. 17, 1141-1152 (2003).
35.  Labib, K. & Gambus, A. A key role for the GINS complex at DNA replication
forks. Trends Cell Biol. (2007).
36.  Li, D. et al. Structure of the replicative helicase of the oncoprotein SV40 large
tumour antigen. Nature 423, 512-518 (2003).

37.  Liu, W., Pucci, B., Rossi, M., Pisani, F. M. & Ladenstein, R. Structural analysis
of the Sulfolobus solfataricus MCM protein N-terminal domain. Nucleic Acids
Res. 36, 3235-3243 (2008).
38.  Losey, H. C., Ruthenburg, A. J. & Verdine, G. L. Crystal structure of
Staphylococcus aureus tRNA adenosine deaminase TadA in complex with RNA.
Nature Struct. Mol. Biol. 13, 153-159 (2006).
39.  Makarova, K. S., Wolf, Y. I., Mekhedov, S. L., Mirkin, B. G. & Koonin, E. V.
Ancestral paralogs and pseudoparalogs and their role in the emergence of the
eukaryotic cell. Nucleic Acids Res. 33, 4626-4638 (2005).
40.  Marinsek, N. et al. GINS, a central nexus in the archaeal DNA replication fork.
EMBO Rep. 7, 539-545 (2006).
41.  Mastrangelo, I. A. et al. ATP-dependent assembly of double hexamers of SV40 T
antigen at the viral origin of DNA replication. Nature 338, 658-662 (1989).
42.  Meinke, G. et al. The crystal structure of the SV40 T-antigen origin binding
domain in complex with DNA. PLoS. Biol. 5, e23 (2007).
43.  Moyer, S. E., Lewis, P. W. & Botchan, M. R. Isolation of the Cdc45/Mcm2-
7/GINS (CMG) complex, a candidate for the eukaryotic DNA replication fork
helicase. Proc. Natl. Acad. Sci. U. S. A 103, 10236-10241 (2006).
44.  Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in
oscillation mode. Methods Enzymol. 276, 307-326 (1997).
45.  Pacek, M., Tutter, A. V., Kubota, Y., Takisawa, H. & Walter, J. C. Localization
of MCM2-7, Cdc45, and GINS to the site of DNA unwinding during eukaryotic
DNA replication. Mol. Cell 21, 581-587 (2006).
46.  Peled, J. U. The biochemistry of somatic hypermutation. Annu. Rev. Immunol. 26,
481-511 (2008).
47.  Prochnow, C., Bransteitter, R., Klein, M. G., Goodman, M. F. & Chen, X. S. The
APOBEC-2 crystal structure and functional implications for the deaminase AID.
Nature 445, 447-451 (2007).
48.  Sanders, C. M. et al. Papillomavirus E1 helicase assembly maintains an
asymmetric state in the absence of DNA and nucleotide cofactors. Nucleic Acids
Res. 35, 6451-6457 (2007).
49.  Schneider, T. R. & Sheldrick, G. M. Substructure solution with SHELXD. Acta
Crystallogr. D 58, 1772-1779 (2002).

50.  Seki, T. et al. GINS is a DNA polymerase epsilon accessory factor during
chromosomal DNA replication in budding yeast. J. Biol. Chem. 281, 21422-
21432 (2006).
51.  Shen, J., Gai, D., Patrick, A., Greenleaf, W. B. & Chen, X. S. The roles of the
residues on the channel beta-hairpin and loop structures of simian virus 40
hexameric helicase. Proc. Natl. Acad. Sci. U. S. A 102, 11248-11253 (2005).
52.  Simmons, D. T. SV40 large T antigen functions in DNA replication and
transformation. Adv. Virus Res. 55, 75-134 (2000).
53.  Singleton, M. R., Dillingham, M. S. & Wigley, D. B. Structure and mechanism of
helicases and nucleic acid translocases. Annu. Rev. Biochem. 76, 23-50 (2007).
54.  Sreekumar, K. R., Prack, A. E., Winters, D. R., Barbaro, B. A. & Bullock, P. A.
The simian virus 40 core origin contains two separate sequence modules that
support T-antigen double-hexamer assembly. J. Virol. 74, 8589-8600 (2000).
55.  Takahashi, T. S., Wigley, D. B. & Walter, J. C. Pumps, paradoxes and
ploughshares: mechanism of the MCM2-7 DNA helicase. Trends Biochem. Sci.
30, 437-444 (2005).
56.  Takayama, Y. et al. GINS, a novel multiprotein complex required for
chromosomal DNA replication in budding yeast. Genes Dev. 17, 1153-1165
(2003).
57.  Titolo, S., Welchner, E., White, P. W. & Archambault, J. Characterization of the
DNA-binding properties of the origin-binding domain of simian virus 40 large T
antigen by fluorescence anisotropy. J. Virol. 77, 5512-5518 (2003).
58.  Valle, M., Chen, X. S., Donate, L. E., Fanning, E. & Carazo, J. M. Structural
basis for the cooperative assembly of large T antigen on the origin of replication.
J. Mol. Biol. 357, 1295-1305 (2006).
59.  Xie, K. et al. The structure of a yeast RNA-editing deaminase provides insight
into the fold and function of activation-induced deaminase and APOBEC-1. Proc.
Natl. Acad. Sci. U. S. A 101, 8114-8119 (2004).
60.  Yabuuchi, H. et al. Ordered assembly of Sld3, GINS and Cdc45 is distinctly
regulated by DDK and CDK for activation of replication origins. EMBO J. 25,
4663-4674 (2006).



Appendix A: Crystal Structure of the Anti-viral APOBEC3G Catalytic Domain and
Functional Implications

Reproduced with permission from Holden, L.G.*, Prochnow, C.*, Chang, Y.P.*,
Bransteitter, R., Chelico, L., Sen, U., Stevens, R.C., Goodman, M.F., Chen, X.S. 2008.
Crystal structure of the anti-viral APOBEC3G catalytic domain and functional
implications. Nature 456(7218):121-124. Copyright 2008 Macmillan Publishers Limited.
* These authors contributed equally to this work
Author contributions: LGH cloned the construct, and purified and crystallized the protein.
YPC collected and processed diffraction data, and determined the structure. LC did the
assays. CP and RD helped with the design, execution, and interpretation of the
experiments. MFG and XSC supervised the project.

A.1 Abstract
The APOBEC family members are involved in diverse biological functions. APOBEC3G
restricts the replication of human immunodeficiency virus (HIV), hepatitis B virus and
retroelements by cytidine deamination on single-stranded DNA or by RNA
binding [13,16,29,46]. Here we report the high-resolution crystal structure of the
carboxy-terminal deaminase domain of APOBEC3G (APOBEC3G-CD2) purified from
Escherichia coli. The APOBEC3G-CD2 structure has a five-stranded β-sheet core that is
common to all known deaminase structures and closely resembles the structure of another
APOBEC protein, APOBEC2 [47]. A comparison of APOBEC3G-CD2 with other
deaminase structures shows a structural conservation of the active-site loops that are
directly involved in substrate binding. In the X-ray structure, these APOBEC3G
active-site loops form a continuous 'substrate groove' around the active centre. The
orientation of this putative substrate groove differs markedly (by 90 degrees) from the
groove predicted by the NMR structure [12]. We have introduced mutations around the
groove, and have identified residues involved in substrate specificity, single-stranded
DNA binding and deaminase activity. These results provide a basis for understanding the
underlying mechanisms of substrate specificity for the APOBEC family.

A.2 Results
We have purified the human wild-type C-terminal cytidine deaminase domain of
APOBEC3G (APOBEC3G-CD2, residues 197–380) expressed in E. coli. APOBEC3G-CD2
(with and without a glutathione S-transferase (GST) tag) is highly soluble, and
deaminates cytidine to uridine on single-stranded DNA (ssDNA) with a specific activity
of 5 fmol μg⁻¹ min⁻¹, which is about 25-fold lower than that of the full-length
APOBEC3G (GST–APOBEC3G; 126 fmol μg⁻¹ min⁻¹) expressed in E. coli (Fig. A.1a).
We analysed the processive and polar properties of APOBEC3G-CD2 and full-length
APOBEC3G (Fig. A.1b). Similar to the insect-cell-derived full-length APOBEC3G [9,10],
the full-length APOBEC3G expressed in E. coli processively deaminates cytidine in two
5’CCC3’ motifs located on a ssDNA substrate, during one binding event (Fig. A.1b). The
full-length APOBEC3G also exerts a 3’ to 5’ deamination bias by preferentially
deaminating the cytidine in the CCC motif near the 5’ end of the ssDNA substrate (Fig.
A.1b). In contrast, the APOBEC3G-CD2 exhibits an approximate twofold decrease in
processivity and virtually no 3’ to 5’ deamination bias (Fig. A.1b). These results indicate
that APOBEC3G-CD2 partially retains the catalytic properties of full-length APOBEC3G,
but that the CD1 domain in the context of the full-length APOBEC3G is probably
required for the strong processive property and the 3’ to 5’ deamination bias on ssDNA.

Figure A.1 The X-ray structure of enzymatically active APOBEC3G-CD2. (a) Analysis
of the deamination activity for full-length GST–APOBEC3G (GST–A3G),
GST–APOBEC3G-CD2 (GST–CD2) and APOBEC3G-CD2 (CD2). The 32-nucleotide (nt)
band indicates deamination activity. F represents the position of fluorescein-dT
incorporated into the ssDNA. (b) APOBEC3G processivity and the 3' to 5' deamination
bias were characterized on ssDNA with two CCC motifs. Single deaminations of the 5'C
and the 3'C appear as 67- and 48-nucleotide fragments, respectively; deamination of both
the 5'C and the 3'C results in a 30-nucleotide fragment (see Materials and Methods). (c),
(d) Two views of the APOBEC3G-CD2 domain rotated 90° showing the five-stranded
β-sheet core surrounded by six helices. The zinc is represented as a red sphere.
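
The roughly 25-fold difference in specific activity reported in this section follows directly from the two quoted values. The snippet below is a minimal check using only those numbers; the variable names are illustrative.

```python
# Specific activities quoted in the text (fmol per microgram per minute).
cd2_activity = 5
full_length_activity = 126

fold_lower = full_length_activity / cd2_activity
print(f"APOBEC3G-CD2 is about {fold_lower:.0f}-fold less active per microgram "
      f"than full-length GST-APOBEC3G")
```

Note that this comparison is per unit mass of protein, as in the text; a per-molecule comparison would additionally require the molecular masses of the two constructs.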

We crystallized the wild-type APOBEC3G-CD2 and solved the structure by using the
multi-wavelength anomalous dispersion (MAD) phasing method with selenium-substituted
methionine (Se-Met) diffraction data. The 2.3 Å resolution X-ray structure
shows a core β-sheet that is composed of five β-strands surrounded by six α-helices (Fig.
A.1c, d). Helices 2–4 (h2–h4) are packed alongside one face of the core β-sheet (Fig.
A.1c), whereas helix 1 (h1) and helix 5 (h5) are packed against the opposite face of the β-
sheet (Fig. A.1c, d). Helix 6 (h6) is located at the edge.

A recently reported NMR structure of an APOBEC3G-CD2 mutant (APOBEC3G-2K3A,
PDB ID 2JYW)
12
resembles the X-ray structure of the wild-type APOBEC3G-CD2.
However, the superposition of the two structures shows notable differences (Fig. A.2).
The β2 strand and the amino-terminal helix (h1) are absent from the NMR structure (Fig.
A.2b, c, in grey). The β2 strand in the X-ray structure does not make crystal contact with
neighbouring monomers. Thus, the formation of the β2 strand is unlikely to be the result
of crystal contact. Furthermore, a similar β2 strand within a fives tranded β-sheet core is
the common structural feature that is observed in all wild-type cytidine deaminase
structures available so far (Fig. A.3a, b, d, e). Therefore, an intact full-length β2 strand
and the five-stranded β-sheet core is probably the feature of wild-type APOBEC3G-CD2
and all other APOBEC proteins. The structural differences observed in the NMR  

63
Figure A.2 Structural comparison of the Apo3G-CD2 X-ray structure with the Apo3G-
2K3A NMR structure. (a) Superposition of the Apo3G-CD2 X-ray structure (yellow) and
the Apo3G-2K3A NMR structure (grey) (RMSD=4.8 Å2)
12
. The residues which form
the β2 strand in the X-ray structure form a loop-like bulge in the NMR structure
(thickened loop). Superposition of Apo3G-CD2 (yellow) and an Apo2 monomer (cyan)
(RMSD= 2.7 Å2) (inset). The overlay RMSD values indicate that the Apo3G X-ray
structure differs significantly more from the NMR structure than it does from the Apo2
structure. (b), (c) Two views of the superposition of the Apo3G-CD2 X-ray structure
(yellow) and Apo3G-2K3A NMR structure (grey) with helices 2,3, and 4 removed to
show the differences in h1, β2 strand, AC-loop 1, and AC-loop 3. The view in panel c is
rotated 180º relative to that in panel b. Highlighted are two of the five point mutations,
L234K and C243A, that were made in order to obtain soluble protein for the Apo3G-
2K3A NMR structure determination. These mutations are located on the N and C-
terminus of the β2 strand of the X-ray structure (blue), and on the loop-like bulge of the
NMR structure (green).


64
Figure A.3 Common and distinct structural features between APOBEC proteins and
other Zn-deaminase superfamily enzymes. Monomer and oligomer (insets) X-ray
structures of various deaminases show a common five-stranded β-sheet core among all
the Zn-dependent deaminase superfamily. The distinct h1, h4, and h6 structures are
unique to the X-ray structures of APOBEC proteins. The number and positions of the
surrounding helices influence how the deaminases oligomerize (insets). The active site
Zn is represented by a red sphere. (a) The Apo3G-CD2 monomer. (b) The Apo2 monomer⁴⁷.
The long helix 4 and helix 6 of Apo3G-CD2 and Apo2 are unique structural features
that are absent from the other cytidine deaminases, and that guide the elongated
oligomerization so that the active sites are likely to be accessible to DNA and RNA
substrates; (inset) an Apo2 tetramer (PDB ID 2NYT). The elongated Apo2 tetramer is
formed from a head-head interaction between two dimers (h4 and h6 from each dimer are
labeled). Each Apo2 dimer is formed through the pairing of β2 strands from each
monomer (β2 strands are labeled in the left dimer). (c) The Staphylococcus aureus tRNA
adenosine deaminase TadA monomer³⁸; (inset) a TadA dimer (PDB ID 2B3J). (d) The
human free-nucleotide cytidine deaminase (hCDA) monomer¹⁵; (inset) a square-shaped
hCDA tetramer (PDB ID 1MQ0). (e) The ScCDD1 monomer⁵⁹; (inset) a square-shaped
CDD1 tetramer (PDB ID 1RST). (f) The E. coli free-nucleotide cytidine deaminase (ECDA)
monomer³. In the ECDA, this h4* region connects the larger catalytic N-terminal domain with the
smaller pseudo-catalytic domain at the C-terminus. There is no pseudo-catalytic domain
equivalent to that of ECDA present in Apo3G or other APOBEC members; (inset) a
square-shaped ECDA dimer (PDB ID 1ALN).

structure could have resulted from the five mutations on the APOBEC3G protein used for
NMR study (Fig. A.2b, c), or from the different methodology used for determining the
structure, or from both.

A superposition of the core structures of APOBEC3G-CD2 and APOBEC2 monomers
shows substantial overlap for all five β-strands and all six helices (Fig. A.4a), suggesting
that these core structures of APOBEC family members are highly conserved. Yet, the
structural overlap shows notable differences in the active centre (AC) loops, referred to as
AC loops 1 and 3, which potentially mark the differences in substrate specificity and activity of
the two proteins (Fig. A.4b, c). The AC loop 1, which connects h1 with the β1 strand, is
located further away from the active site in APOBEC3G than in APOBEC2 (Fig. A.4b, c).
The APOBEC3G AC loop 3 is longer than that of APOBEC2 and is positioned further
away from the active site (Fig. A.4b, c).
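
Superpositions such as this one are commonly summarized by a root-mean-square deviation (RMSD) over equivalent atoms, as quoted for the overlays in Fig. A.2. The following is a minimal sketch of that calculation, assuming the two structures have already been superposed and that the atom pairing is known. The function name `rmsd` and the toy coordinates are illustrative assumptions and are not taken from the software used in this work.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the coordinate sets are already superposed and that
    coords_a[i] is the atom equivalent to coords_b[i].
    The result has the same length unit as the input (e.g. angstroms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical two-atom example: one atom coincides, one is displaced by 2 units.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(toy_a, toy_b), 3))  # sqrt(4 / 2) = 1.414
```

In practice the optimal rigid-body superposition is computed first by the alignment program, and only then is the RMSD evaluated over the matched atoms.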

The structure shows elaborate bonding interactions that can stabilize the open
conformation of APOBEC3G AC loops 1 and 3. For example, AC loop 1 forms an
extensive bonding network through R215, which anchors this loop to other parts of the
structure (Fig. A.4d). R215 interactions include the direct contact with R313 and W285
located near the core structure (Fig. A.4d). We demonstrate later that the R215E mutation
in APOBEC3G abolishes deamination activity, which is consistent with a previous
study¹¹. Similarly, the APOBEC3G AC loop 3 is stabilized by multiple hydrogen bonds
between the main-chain atoms of residues R256, F252, L253, H248 and Q245 within the

Figure A.4 Structural comparison of APOBEC3G-CD2 with APOBEC2. (a) Core
structures of APOBEC3G-CD2 (yellow) and APOBEC2 (cyan) superimposed. The red
sphere represents zinc. (b), (c) The superposition of APOBEC3G-CD2 and an APOBEC2
monomer, with the AC loop 1 collapsed over the active site (conformation 1, b) or
forming an α-hairpin (conformation 2, c). (d) In APOBEC3G-CD2, the AC loop 1 R215
residue forms hydrogen bonds (green dashes) with F204, W211, N207, E209 and W285
(pink). The R215 aliphatic chain hydrophobically packs with F204, R313 and W285. (e)
The APOBEC3G-CD2 AC loop 3 residues, R256, F252, L253, H248 and Q245 (pink),
form main-chain hydrogen bonds (green dashes). The conserved N244 is shown in cyan.
The active site residues are H257, C288 and C291 (wheat).



loop (Fig. A.4e). The loop residue R256 interacts with D264 on a core helix by a strong
salt bridge, and R256 hydrophobically packs with the loop residue F252 by the long
aliphatic chain (Fig. A.4e). All of these interactions should help stabilize the
conformation of AC loop 3. As shown later, the APOBEC3G R256E mutation, which probably
disrupts the AC loop 3 conformation, greatly impairs deamination activity.

In the active site of APOBEC3G-CD2, a zinc atom is coordinated by the three residues
H257, C288 and C291, and a water molecule (Fig. A.5a). The closely positioned water
molecule can be activated to become a Zn-hydroxide for nucleophilic attack in the
deamination reaction¹⁵. Two residues (N244 and H257) on the APOBEC3G AC loop 3
show a structural conservation with many distantly related Zn-deaminases, specifically
TadA and human CDA¹⁰,¹¹ (Fig. A.5b, c). The two equivalent TadA residues (N42 and
H53) on a TadA loop (similar to the AC loop 3) directly contact the target base of the
RNA substrate (Fig. A.5b). These residues overlap well with the APOBEC3G residues
N244 and H257 on the AC loop 3 in the superposition of the two structures (Fig. A.5b)³⁸.
Similarly, two equivalent residues (N54 and C65) on a human CDA loop contact the
substrate/inhibitor¹⁵, and also overlap with N244 and H257 on the AC loop 3 of
APOBEC3G (Fig. A.5c). This structural conservation suggests that the APOBEC3G-CD2
residues, N244 and H257, are also involved in substrate contact. In an in vitro assay, the
APOBEC3G N244A mutant had no detectable deamination activity (Fig. A.5f). The
structural conservation of the position of these residues suggests that the open
conformation of the APOBEC3G AC loop 3 is in a position ready to bind nucleic acid.

Figure A.5 Predicted substrate groove and deamination activity of APOBEC3G mutants.
(a) The active site residues of APOBEC3G-CD2. The water and zinc molecules are cyan
and red spheres, respectively. (b) Superposition of APOBEC3G-CD2 (A3G-CD2, yellow)
and TadA (light blue, PDB ID 2B3J). (c) Superposition of APOBEC3G-CD2 (yellow)
and human CDA (pink, PDB ID 1MQ0). (d) Surface representation of APOBEC3G-CD2,
showing a horizontal groove with residues (magenta) predicted to interact with ssDNA.
ssDNA is represented by a green line. (f) Mutational data of APOBEC3G purified from
Sf9 (left) or from E. coli (right) are shown. The right inset shows the relative deamination
of the 3'C (5'CCC) or the middle C (5'CCC) on a ssDNA substrate by Sf9 purified
proteins. Error bars represent the standard deviation.



A surface representation of the APOBEC3G-CD2 X-ray structure shows that the AC
loops 1 and 3 and the regions near the active site form a deep, spacious groove that runs
horizontally across the active centre pocket (Fig. A.5d). This groove is not present in the
APOBEC3G-2K3A NMR structure because of the structural differences¹² (Fig. A.6d–f).
The structural features in this groove strongly suggest a role for binding ssDNA
substrates. The groove starts between the AC loops 1 and 3 on the right side of the
displayed structure (Fig. A.5d), leads into a deep pocket where the Zn atom is located and
slightly exposed, and continues towards the left side over helix 6. The target base must be
positioned into the active site so that the attachment of the Zn hydroxyl group can occur
on the cytidine base during the deamination reaction (Fig. A.5c). The ssDNA lying across
this horizontal groove can present a cytidine base so that it is directed towards the active
site Zn in the correct orientation and angle to permit deamination, as shown in the case of
TadA and human CDA (Fig. A.5a–c)¹⁵,³⁸.

Within this horizontal groove is a group of charged or polar residues (R213, R215, N244, R256,
R313, D316, D317, R320, R374 and R376) and hydrophobic residues (W285, Y315 and
F289; Fig. A.5e). In our mutagenesis study, we show that all of these residues are
important for the deamination activity on ssDNA (Fig. A.5f). However, they affect the
deamination activity in different ways. The R374 and R376 residues are located on one
end of the groove and are positioned to interact with a negatively charged ssDNA
phosphate backbone. The ssDNA binding of the R374E/R376D double mutant is
impaired by 46% in comparison to that of the wild-type APOBEC3G, and the

Figure A.6 Comparison of the horizontal substrate groove of the X-ray structure (a), (b),
(c) with the brim-model based on the NMR structure (d), (e), (f). For ease of comparison,
all panels are shown in the same orientation as used previously to describe the DNA
binding model in the report by Chen et al., 2008¹². All of the common structural features
(h2, h3, h6, and the Zn atom) of both structures occupy the same position. (a) The X-ray
structure of Apo3G-CD2 with residues predicted to interact with ssDNA (shown as sticks
in magenta). (b) A surface representation of the X-ray structure of Apo3G-CD2, showing
a horizontal groove with residues predicted to interact with ssDNA lining around the
groove (shown as sticks in magenta). Mutational analysis of most of these residues has
demonstrated their important role in deaminase activity (Fig. A.5f). The Apo3G-CD2
AC-loop 1, AC-loop3 and helix 1 (yellow) provide a wide open groove that may
accommodate DNA binding. Predicted ssDNA binding is represented by a green line
along the “substrate” groove, with the target cytidine (in green) presented to the active
site Zn atom from the only accessible direction for deamination. Notice that a small area
of the active site zinc atom is exposed on the surface of the groove floor near the
activating water molecule, as in the case of TadA and hCDA¹⁵,³⁸. This zinc exposure is
required for interaction with the target base. (c) A surface representation of the X-ray
structure of Apo3G-CD2 with the overlay of the NMR AC-loop1 (line in dark blue)
blocking the horizontal groove formed between AC loop 1 and 3 in the X-ray structure
(in yellow). (d) The NMR structure of Apo3G-2K3A with residues previously predicted
to interact with ssDNA in the brim-model (sticks in dark blue). The positions of some of
these residues on the X-ray structure are shown in Fig. A.6a in the same orientation.

Notice that the Zn atom is completely buried here, in
contrast to the X-ray structure (Fig. A.6b). (e) A surface representation of the NMR
structure of Apo3G-2K3A shown in the same orientation as in Fig. A.6d. Notice that the
horizontal groove is not present; instead a vertical groove is visible. This vertical groove
is the proposed DNA binding path (dashed line). However, binding along this vertical path
would make it difficult for the target C to be presented to the active site Zn atom. (f) A surface
representation of the NMR structure of Apo3G-2K3A with the X-ray Apo3G-CD2 helix 1
shown to block the vertical groove present in the NMR model.



deamination activity is even more disrupted (Fig. A.5f). On the edge of the groove, the
AC loop 1 R213 residue can make contact with ssDNA. Consistent with a previous
report⁶, the R213E mutant has only weak deamination activity (Fig. A.5f).

Three of the charged residues (R256, R215 and R313) are involved in elaborate bonding
networks for the AC loops (Fig. A.4d, e and A.5e), and should be important for
maintaining the groove conformation. The mutants R215E, R256E and R313E/R320D
show only minimal or no deamination activity (Fig. A.5f). The primary functional role of
these residues may be to maintain the conformation of the substrate groove rather than to
directly contact ssDNA. Mutation of the R313 residue can disrupt its interaction with
W285, which is located on the floor of the groove near the active site Zn (Fig. A.5e).
Y315 next to W285 is also on the floor of the groove. Both residues could stack with
bases of ssDNA and position the DNA into the active site (Fig. A.5d, e). Mutants W285A
and Y315A show no detectable deamination activity (Fig. A.5f), consistent with a
previous report¹¹. Another hydrophobic residue on the edge of the groove is F289, and
the F289A mutant has greatly reduced deaminase activity (Fig. A.5d–f).

Notably, next to Y315 and W285 are two negatively charged residues (D316 and D317)
on the floor of the groove (Fig. A.5d, e). The mutant D316R/D317R has higher
deamination activity (1.6-fold), as well as higher ssDNA binding (twofold) compared to
the wild-type APOBEC3G (Fig. A.5f). These enhanced activities could be caused by the
increased total positive charge in the groove. Furthermore, this mutant showed altered

substrate specificity (Fig. A.5f, inset). Unlike wild-type APOBEC3G that strongly
favours deamination at the 3’C of a 5’CCC3’ hot-spot motif, the D316R/D317R mutant
deaminates the middle C and the 3’C at about the same rate (Fig. A.5f, inset). This result
indicates that these negative residues, D316 and D317, are important for positioning the
substrate so that the 3’C is most likely to be deaminated by wild-type APOBEC3G.

The mutagenesis study supports our model of the horizontal groove, and verifies that the
residues located within and around the groove are important for deamination activity,
ssDNA binding and substrate orientation. These results provide a basis to pursue further
studies of APOBEC3G and other important APOBEC proteins (including activation-
induced cytidine deaminase, AID), which will facilitate our understanding of how they
act within our innate and adaptive immune responses to restrict HIV and other infectious
pathogens.

A.3 Materials and Methods
APOBEC3G-CD2 was expressed and purified as a recombinant GST-fusion protein in E.
coli. Purified GST-fusion protein was digested by PreScission protease. Further
purification of the APOBEC3G-CD2 protein was completed using Superdex-75 gel
filtration chromatography in 50 mM HEPES, pH 7.0, 250 mM NaCl and 1 mM
dithiothreitol. Native and Se-Met-labeled proteins were concentrated to 25 mg ml⁻¹.
Crystals were grown at 18 °C by hanging-drop vapour diffusion from a reservoir solution
of 100 mM MES, pH 6.5, 40% PEG 200. In an assay for deamination activity,

APOBEC3G (0.024–10 μM) was allowed to react with 500 nM fluorescein-dT-
incorporated ssDNA for 10 or 15 min and subsequently treated with uracil-DNA
glycosylase and resolved on 16% urea–PAGE for analysis as described previously⁹.
Specific activity, measured as fmoles of substrate deaminated per μg of enzyme per minute,
was calculated from the per cent deamination of a ssDNA substrate over a range of
enzyme concentrations. To analyse processivity and directionality, substrate use was
kept below 15% to maintain single-hit kinetics. The 'processivity factor' is defined as the
ratio of the observed fraction of double deaminations (occurring at both 5'C and 3'C on
the same molecule) to the predicted fraction of independent double deaminations⁹. A
processivity factor of greater than one indicates that most of the double deaminations are
caused by the same APOBEC3G molecule acting processively on both C targets. The
deamination bias is measured by the ratio of 5'C/3'C deaminations⁹.
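
The two ratios defined above can be written out as a short calculation. This is a minimal sketch and not the analysis code used in this work. It assumes that the predicted fraction of independent double deaminations is the product of the total fractions deaminated at the 5'C and at the 3'C, which the text does not state explicitly. The function names and the example fractions are hypothetical.

```python
def processivity_factor(frac_5c, frac_3c, frac_double):
    """Observed double deaminations relative to the independent expectation.

    frac_5c, frac_3c: fraction of substrate molecules deaminated at the
        5'C and at the 3'C (each counts molecules deaminated at both sites).
    frac_double: observed fraction deaminated at both sites.
    Assumption: the independent expectation is frac_5c * frac_3c.
    """
    expected_double = frac_5c * frac_3c
    if expected_double <= 0:
        raise ValueError("both single-site fractions must be positive")
    return frac_double / expected_double


def deamination_bias(frac_5c, frac_3c):
    """Ratio of 5'C to 3'C deaminations."""
    if frac_3c <= 0:
        raise ValueError("3'C fraction must be positive")
    return frac_5c / frac_3c


def is_single_hit(substrate_use_percent, limit_percent=15.0):
    """True if substrate use stays below the limit stated in the text."""
    return substrate_use_percent < limit_percent


# Hypothetical values, chosen only to illustrate the arithmetic.
print(processivity_factor(0.04, 0.08, 0.016))  # about 5: mostly processive events
print(deamination_bias(0.04, 0.08))            # 0.5: the 3'C is favoured
```

A factor near one would mean that double deaminations occur about as often as expected from two independent enzyme encounters.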

Structure determination and refinement
Selenium-substituted methionine protein crystals were used for collecting Se-MAD data
using the ALS synchrotron beam source. Data were processed with HKL3000⁴⁴. A total
of three selenium sites and one zinc site were located by the SHELXD⁴⁹ program using
MAD data between 50 and 3.0 Å resolution range. The SHARP program was used to
calculate the experimental and model-combined phases using the MAD data in the
resolution range of 50 to 2.3 Å as well as for density modification. The model was built
with O using the high quality electron density map obtained, and was refined with CNS
to 2.3 Å resolution with excellent statistics. The final refinement statistics and geometry

as defined by Procheck were in good agreement and are summarized in Table A.1.
Structure figures were designed using PyMOL²⁰.

Table A.1 Data collection (MAD) and model refinement statistics of APOBEC3G

                                    Peak Se (λ = 0.97907 Å)   Inflection Se (λ = 0.97921 Å)
Data collection
Cell dimensions (space group C2)
  a, b, c (Å)                       83.46, 57.33, 40.58       83.46, 57.33, 40.58
  α, β, γ (°)                       90, 96.46, 90             90, 96.46, 90
Resolution (Å)                      50–2.20 (2.28–2.20)       50–2.38 (2.38–2.30)
Observations                        9675                      8564
R_merge                             12.1 (37.8)               9.9 (41.2)
I/σI                                19.7 (4.6)                17.0 (3.0)
Completeness (%)                    99.8 (99.7)               99.8 (99.2)
Refinement
Resolution (Å)                      30.0–2.3
No. reflections                     15669
R_work / R_free                     25.10 / 26.70
B-factor (averaged)                 Protein 33.221; Water 34.725
R.m.s. deviations                   Bond lengths (Å) 0.010; Bond angles (°) 1.2

Highest-resolution shell values are shown in parentheses.

Construction of APOBEC3G mutants
Mutant APOBEC3G proteins (D316R/D317R, R313E/R320D and R374E/R376D) were
constructed by site-directed mutagenesis using the pAcG2T-APOBEC3G vector as the
template. The following primers and their complementary strands were used:
5'-CTTCACTGCCCGCATCTATAGAAGACAAGGAAGATGTCAGGAG-3' (D316R/D317R),
5'-CTGTGCATCTTCACTGCCGAGATCTATGATGATCAAGGAGATTGTCAGGAGGGGCTGCGC-3' (R313E/R320D),
and 5'-GAGCACAGCCAAGACCTGAGTGGGGAGCTGGACGCCATTCTCCAGAATCAGG-3' (R374E/R376D).
The entire coding region of the APOBEC3G mutant
constructs was verified by DNA sequencing. The mutant plasmids were then co-
transfected, according to the manufacturer's protocol, with linearized baculovirus DNA
(BD Biosciences) to generate recombinant mutant APOBEC3G baculovirus. Wild-type
and mutant APOBEC3G expression in Sf9 insect cells and purification was carried out as
described previously⁸. Mutant E. coli GST–APOBEC3G proteins (R213E, R215E,
K249E, R256E, W285A, F289A, Y315A and N244A) were constructed by site-directed
mutagenesis using the pGEX-6P1-GST-APOBEC3G vector as the template. The
following primers and their complementary strands were used:
5'-AATGAACCTTGGGTTGAAGGTCGTCACGAGACTTAC-3' (R213E),
5'-GAACCTTGGGTTCGTGGTGAACACGAGACTTACCTG-3' (R215E),
5'-TGTAACCAGGCCCCGCACGAGCACGGTTTTCTGGAA-3' (K249E),
5'-GCACGGTTTTCTGGAAGGTGAACACGCCGAACTGTG-3' (R256E),
5'-GTTACCTGCTTTACCTCTGCGTCCCCGTGCTTTTCC-3' (W285A),
5'-ACCTCTTGGTCCCCGTGCGCTTCCTGCGCACAAGAA-3' (F289A),
5'-ATCTTCACTGCACGTATTGCCGACGACCAGGGCCGT-3' (Y315A),
5'-CGTCGTGGTTTCCTGTGTGCCCAGGCCCCGCACAAGCAC-3' (N244A),
5'-CGTCGTGGTTTCCTGTCTAGACAGGCCCCGCACAAGCAC-3' (N244A).
The entire coding region of the APOBEC3G mutant constructs was verified by DNA
sequencing. Plasmids were expressed in XA90 E. coli cells and were lysed by French
press. Further purification was carried out as described previously¹⁰.
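
Because each mutagenesis primer above is listed together with "its complementary strand", the second primer of each pair can be derived from the listed sequence. The following is a minimal sketch of that derivation, assuming the complementary strand is the exact reverse complement of the listed primer. The helper name `reverse_complement` is illustrative, and the R213E sequence is copied from the list above.

```python
# Translation table for Watson-Crick base pairing (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    seq = seq.strip()
    unexpected = set(seq) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"non-DNA characters in sequence: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


# Forward primer for the R213E mutant, as listed in the text.
R213E_FORWARD = "AATGAACCTTGGGTTGAAGGTCGTCACGAGACTTAC"

# The complementary-strand primer, also written 5' to 3'.
print(reverse_complement(R213E_FORWARD))
```

Applying the function twice returns the original primer, which is a quick check that a sequence was transcribed without stray characters.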


DNA binding
APOBEC3G DNA binding was monitored by changes in steady state fluorescence
depolarization (rotational anisotropy). Reaction mixtures (70 μl), containing fluorescein-
labelled DNA (50 nM) in buffer (50 mM HEPES, pH 7.3, 1 mM dithiothreitol and 5 mM
MgCl2) and varying concentration of 0 to 500 nM APOBEC3G, were incubated at 37 °C.
The sequence of the ssDNA was 5'-TTAGATGAGTGTAA(fluorescein-
dT)GTGATATATGTGTAT-3'. Rotational anisotropy was measured as described
previously⁸. The fraction of DNA bound to protein was determined as described
previously².
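
The conversion from anisotropy to fraction bound is only cited here, not spelled out. As an illustration, the following is a minimal sketch of one common approach, assuming a two-state model (free and fully bound DNA) in which the fluorescence intensity does not change on binding. This may differ from the cited procedure. The function names, the G-factor default and the example values are hypothetical.

```python
def anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """Steady-state anisotropy from polarized emission intensities.

    r = (I_par - G * I_perp) / (I_par + 2 * G * I_perp)
    g_factor corrects for instrument polarization bias (assumed 1.0 here).
    """
    denominator = i_parallel + 2.0 * g_factor * i_perpendicular
    if denominator <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - g_factor * i_perpendicular) / denominator


def fraction_bound(r_observed, r_free, r_bound):
    """Fraction of DNA bound, by linear interpolation between the
    anisotropy of free DNA and that of saturated (fully bound) DNA.

    Assumes equal fluorescence intensity for free and bound DNA.
    The result is clamped to the range 0 to 1.
    """
    if r_bound == r_free:
        raise ValueError("r_bound and r_free must differ")
    fraction = (r_observed - r_free) / (r_bound - r_free)
    return min(1.0, max(0.0, fraction))


# Hypothetical readings: free DNA 0.05, saturated complex 0.15, sample 0.10.
print(fraction_bound(0.10, 0.05, 0.15))  # close to 0.5
```

If the fluorescence intensity changes on binding, the interpolation needs an additional correction for the relative quantum yield of the bound state.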
Processivity and directionality substrates
The substrate used to determine processivity and directionality is
5'-AAAGAGAAAGTGATACCCAAAGAGTAAAGT(fluorescein-dT)AGATAGAGAG
TGATACCCAAAGAGTAAAGTTAGTAAGATGTGTAAGTATGTTAA-3'.
For specific activity measurements, the ssDNA substrate sequence was
5’-GG(fluorescein-dT)AGTTTAGTGGTTTGTATAGAATTAATACCCAAAGAAGTG
TATGTAATTGTTATGATAAGATTGAAA-3’. 
Asset Metadata
Creator Chang, Yu-Hao Paul (author) 
Core Title Structural studies of two key factors for DNA replication in eukaryotic cells 
Contributor Electronically uploaded by the author (provenance) 
School Keck School of Medicine 
Degree Doctor of Philosophy 
Degree Program Genetic, Molecular and Cellular Biology 
Publication Date 08/06/2012 
Defense Date 04/20/2010 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag DNA replication,GINS origin,helicase,oai:digitallibrary.usc.edu:usctheses,OAI-PMH Harvest,SV40 large T antigen 
Language English
Advisor Chen, Xiaojiang S. (committee chair), Goodman, Myron F. (committee member), Langen, Ralf (committee member), Rees, Douglas (committee member) 
Creator Email yuhaopch@gmail.com,yuhaopch@usc.edu 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-m3324 
Unique identifier UC1163005 
Identifier etd-CHANG-3999 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-366483 (legacy record id),usctheses-m3324 (legacy record id) 
Legacy Identifier etd-CHANG-3999.pdf 
Dmrecord 366483 
Document Type Dissertation 
Rights Chang, Yu-Hao Paul 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Repository Name Libraries, University of Southern California
Repository Location Los Angeles, California
Repository Email uscdl@usc.edu
Abstract (if available)
Abstract Genomic DNA replication is essential for the transmission of genetic information 