Site-directed spin labeling studies of sequence-dependent DNA shape and protein-DNA recognition
(USC Thesis Other)
SITE-DIRECTED SPIN LABELING STUDIES OF SEQUENCE-DEPENDENT DNA SHAPE AND PROTEIN-DNA RECOGNITION

by

Yuan Ding

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (CHEMISTRY)

August 2015

Copyright 2015 Yuan Ding

Dedication

To my maternal grandparents, Yun, Ying and Zhang, Lvlan

Acknowledgement

During my stay at the University of Southern California, I’ve been fortunate to meet many talented and enthusiastic individuals. Their wisdom and kindness truly inspired me throughout my professional journey. It’s my great honor to thank them all.

First and foremost, I would like to express my deepest gratitude to my advisor, Professor Peter Z. Qin. From the very beginning to the end of my graduate school life, he has offered endless support, patience, intellectual input, and professional guidance. In particular, he helped me greatly in developing my own way of thinking by instilling the significance of critical thinking through everyday interaction. He is always available for guidance when needed, yet also gives me the freedom to explore my own research interests. I truly believe that the training I received from him has not only shaped me into a better scientist, but also made me a better person.

I would like to thank Professors Chi H. Mak, Stephen E. Bradforth, Ralf Langen, Remo Rohs, and Susumu Takahashi for serving on my qualifying exam and dissertation committees. A special “thank you” goes to Professor Lin Chen for the critical help he offered in our collaborative study. I’m also grateful for the interactions I had with Professors Barry C. Thompson, Ian Haworth, G.K. Surya Prakash, Andrey Vilesov, Matthew R. Pratt, and Travis J. Williams during the graduate courses they offered. I would like to acknowledge Professor Song-i Han and Drs.
John Franck and Katherine Stone from the University of California, Santa Barbara for their intellectual input during our collaboration.

My life at USC couldn’t have been this pleasant if it weren’t for the company of all the previous and current Qin group members: Dr. Xiaojun Zhang for extensive professional training and for challenging me to go the extra mile, Dewi Sri Hartati for sharing both technical skills and life lessons with me every day, Narin S. Tangprasertchai for all the assistance and helpful ideas, as well as Dr. Gian Paola Gacho Grant, Dr. Anna Popova, Dr. Phuong H. Nguyen, Yun Fang, Carolina Vazquez Reyes, Kenneth W. Tham, and Yuan Wang for being such wonderful group members. I will never forget the heated but inspiring discussions we always have here.

I want to thank my family and friends both in China and the US for always standing by my side and being supportive. In particular, I want to thank my parents for always giving me the best they can offer, and for providing me the freedom and support to pursue my dream. A special “thank you” also goes to my husband, Dr. Marcos A. Sainz, for his constant support, care, patience and tolerance. He’s my rock, and my life now couldn’t have been possible if it weren’t for his unconditional love.

Lastly, I would like to acknowledge all the previous and current staff members at the USC Department of Chemistry: Michele Dea, Magnolia Benitez, Katie McKissick, Marie de la Torre, Ross Lewis and Allen Kershaw. In particular, I want to thank Drs. Jessica Parr, Michael Quinlan, Thomas M. Bertolini and Frank Devlin for making my teaching experience so pleasant. The Women in Science and Engineering (WiSE) program and the interdisciplinary Program in Drug Discovery (iPIDD) should also be acknowledged for their generous funding during my graduate life.

Table of Contents

Dedication
Acknowledgement
List of Tables
List of Figures
Abstract
Chapter 1 Overview
1. Sequence-dependent shape of DNA duplexes
2. Roles of DNA shape in protein-DNA recognition
References
Chapter 2 Nucleic Acid Structure and Dynamics: Perspectives from Site-Directed Spin Labeling
1. Introduction
2. Chemical strategies for introducing spin label at specific sites of nucleic acids
2.1 Nitroxides covalently attached to nucleic acids
2.2 Nitroxides non-covalently attached to nucleic acids
2.3 Alternatives to nitroxide spin labels
3. Structural and dynamic information derived from singly-attached nitroxide spin labels
3.1 Theoretical basis of obtaining information from nitroxide rotational dynamics
3.1.1 Basic EPR
3.1.2 Rotational dynamics of nitroxide spin labels measured using cw-EPR
3.1.3 Correlating nitroxide motion to the local environment of the target macromolecule
3.2 Examples of application
3.2.1 Probing nucleic acid interactions using information derived from the global tumbling motion
3.2.2 Probing segmental motions: Tetrahymena group I ribozyme as an example
3.2.3 Probing local environments in nucleic acids
3.2.3.1 Sequence-dependent variations along DNA duplexes probed by a nucleotide-independent nitroxide probe
3.2.3.2 Local features in DNA duplexes probed by the rigid Ç label
4. Deriving structural information using distances measured with spin labels
4.1 Measuring inter-spin distances by determination of dipolar coupling strength
4.1.1 EPR methods for measuring dipolar coupling strength
4.1.2 Determining inter-spin distance distribution from the measured dipolar coupling strength
4.2 Correlating the measured inter-spin distances to structure of the parent molecule
4.3 Examples of application
4.3.1 Sequence-dependent shape of the p53 response elements
4.3.2 Conformational flexibility of DNA duplexes probed by inter-nitroxide distance measurements
4.3.3 Mapping the global structure of a three-way junction in a phi29 packaging RNA dimer
4.3.4 Monitoring conformational changes in folded RNAs
4.3.5 In-cell pulsed EPR spectroscopy distance measurement
5. Spin-labeling in NMR and EPR-NMR studies of nucleic acids
6. Conclusions and Perspectives
References
Chapter 3 Experimental Mapping of DNA Duplex Shape Enabled by Global Lineshape Analyses of a Sequence-independent Nitroxide Probe
1. Introduction
2. Materials and Methods
2.1 Preparation of spin-labeled DNA
2.2 EPR sample preparation
2.3 X-band EPR spectroscopy
2.4 Computation of Pearson coefficient
2.5 Sequence-dependent DNA shape parameters prediction
3. Results
3.1 Nitroxide scanning of DNA duplexes using an optimized streptavidin tethering scheme
3.2 Pair-wise Pearson coefficients for assessment of EPR lineshape variation
3.3 Pearson coefficients referenced to TEMPOL spectrum report nitroxide mobility
3.4 R5a mobility maps reveal DNA duplex shape at the level of individual base-pairs
3.5 P matrix for global lineshape analyses
4. Discussion
4.1 Pearson coefficients for EPR lineshape analysis and comparison
4.2 Mapping DNA shape using the nucleotide-independent nitroxide probe
References
Supplementary Information
Supplementary Information References
Chapter 4 Conformations of p53 Response Elements in Solution Deduced Using Site-Directed Spin Labeling and Monte Carlo Sampling
1. Introduction
2. Materials and Methods
2.1 DNA spin labeling
2.2 EPR sample preparation and DEER spectroscopy measurements of inter-spin distances
2.3 Generation of DNA models using Monte Carlo simulations
2.4 Computation of expected inter-R5 distances
2.5 Characterization of DNA duplex models
3. Results
3.1 DEER measured distances reveal that p53DBD binding induces RE conformational change
3.2 Conformation of unbound RE in solution
3.3 Assessing p53DBD bound RE conformations in solution
3.4 Distinct conformational changes at the central region of the REs upon p53DBD binding
4. Discussion
4.1 Sequence-dependent conformational changes of RE upon p53DBD binding
4.2 Mapping sequence-dependent DNA shape using the SDSL-MC approach
References
Supplementary Information
Supplementary Information References
Chapter 5 Conformations of GATA3 DNA Binding Domain Bound to DNA Studied by Site-directed spin labeling
1. Introduction
2. Materials and Methods
2.1 Protein mutagenesis
2.2 Protein expression and purification
2.3 Spin labeling of protein
2.4 Spin labeling of DNA
2.5 EPR sample assembly
2.6 Continuous-wave (cw) EPR measurement of spin labeled protein
2.7 DEER spectroscopy measurement for inter-spin distance
2.8 Computation of predicted inter-spin distances based on crystal structure
2.9 Electrophoretic mobility shift assay
3. Results
3.1 Biochemical characterization of spin-labeled GATA3 double-finger (DF)
3.2 Conformation of DF in absence of GATA DNA
3.3 Solution conformation of DF to DNA containing palindromic target sites
3.4 C-finger binds to single GATA sequence in a similar fashion as it does to palindromic target sites
3.5 N-finger fragment does not bind to DNA containing single GATA site in a specific conformation in the presence of C-finger
4. Discussion
References
Supplementary Information

List of Tables

Chapter 3
S1. DNA oligonucleotides used in this study
S2. Key for spectral number in the P-matrix
Chapter 4
S1. Comparison of DEER measured distances in unbound and bound REs
S2. Distance measurements and model characterization of the unbound p21-RE
S3. Comparison of unbound p21-RE MC models obtained using either 16 or 14 sets of DEER measured distances
S4. Distance measurements and model characterization of the unbound BAX-RE
S5. Comparison of unbound BAX-RE MC models obtained using either 18 or 16 sets of DEER measured distances
S6. Assessment of bound DNA conformations reported in co-crystal structures
Chapter 5
1. Summary of distances measured on complex formed by DF and palGATA
S1. Primers used to mutate T280 and I362 during mutagenesis
S2. Summary of DNA strands used in this study with sequences and extinction coefficients at 260 nm

List of Figures

Chapter 1
1. Nucleotide structures in DNA and base pairing pattern
2. Molecular shape of A-DNA, B-DNA and Z-DNA
3. Illustration of DNA local shape features in DNA-protein complexes
Chapter 2
1. The general strategy of site-directed spin labeling
2. Representative examples of spin labels reported in SDSL studies
3. Energy levels of a nitroxide in an external magnetic field, and the EPR lineshape and its correlation with nitroxide dynamics
4. Using SDSL to probe segment motion on Group I ribozyme in open complex
5. Probing local environments in DNA duplexes using the R5a nitroxide
6. Double electron-electron resonance (DEER) spectroscopy
7. The NASNOX program for correlating measured inter-nitroxide distances to structure of the parent nucleic acid molecules
8. SDSL assessment of DNA conformation in the BAX/p53-DBD complex
9. Deformation of p21-RE upon p53-DBD binding
10. SDSL mapping of the three-way junction in the pRNA
Chapter 3
1. Experimental design of DNA scanning using SDSL
2. X-band cw-EPR spectra of R5a-labeled streptavidin-tethered BAX and p21 duplexes
3. Characteristics of pair-wise Pearson coefficients
4. Characteristics of P_TEMPOL
5. R5a mobility maps reveal DNA duplex shape
6. A matrix of the pair-wise Pearson coefficients obtained with 31 R5a spectra
S1. DNA tethering to streptavidin examined by a native gel shift assay
S2. Examining spectral effects due to altering the relative positioning between the R5a labeling site and streptavidin
S3. Assessing potential inter-molecular dipolar interaction in the streptavidin tethered DNAs
S4. Examining the impact of noise on Pearson coefficient
S5. Sensitivity of Pearson coefficient and RMSD on spectra normalization
S6. Map of central line width and effective hyperfine splitting in the BAX and p21 duplexes
S7. Comparisons between maps of P_TEMPOL and minor groove width
S8. Comparisons between maps of P_TEMPOL and propeller twist
S9. Comparisons between maps of P_TEMPOL and helical twist
S10. Comparisons between maps of P_TEMPOL, Roll-Roll force constant and HelT-HelT force constant
S11. Overlay of spectra that showed a high degree of similarity in the P-matrix
Chapter 4
1. Experimental design and example DEER results
2. Characterization of the unbound p21-RE and BAX-RE
3. Conformational changes in REs upon p53 binding
4. Analyses of p53 RE structures
S1. Examples of DEER measurements on single-labeled p21-RE duplexes
S2. Native gel shift assay to examine p53DBD binding to double-labeled REs
S3. DEER measured distances in p21-RE
S4. Example of expected allowable R5 conformers within the p21-RE
S5. DEER measured distances in the BAX-RE
S6. Superimposition of the top-20 models derived from SDSL-MC analyses of the unbound REs
S7. Superimposition of the bound DNA (red) and the top-20 MC models of the unbound DNAs
S8. Helix axis for bound p21-RE and top-ranked model of the unbound p21-RE model
Chapter 5
1. Domain organization of GATA3 and three crystal structures of GATA3 full DNA binding domain with DNA
2. Spin labeling scheme and DNA sequences used in the study
3. Electrophoretic mobility shift assay to assess DNA binding by DF
4. Continuous-wave EPR spectra of R5-labeled DF obtained in aqueous solution at room temperature
5. DEER measurement and analysis on double-labeled naked protein
6. Distance prediction and one example of DEER measurement on the complex formed between DF and palGATA
7. Distances measured between the protein and the DNA in the complex formed between DF and palGATA
8. C-finger binding to sgGATA
9. DEER result on T280-sgX12 and T280-sgY18 at 1:1 concentration ratio
10. DEER result on T280-sgX12 and T280-sgY18 at 1:2 concentration ratio with negative control
11. Analyses of N-finger behavior on complex formed by DF and sgGATA
12. Overlay of original DEER traces measured between spin labeled T280 and I362
13. EMSA to assess complex formation between DNA duplexes and DF wild type
S1. Protein sequence expressed and used in this study
S2. Sequencing results after mutagenesis in alignment with wt sequence for T280 construct, I362 construct and DM construct
S3. Mono S trace for construct T280 purification
S4. SDS-PAGE gel of T280 after Mono S column

Abstract

Deoxyribonucleic acid (DNA) is responsible for the storage and transmission of genetic information.
An essential aspect of this function is the specific recognition of particular DNA duplex sequences by molecules such as proteins. It has been established that specific protein-DNA recognition occurs via a combination of base readout (i.e., direct interactions between the protein and functional groups of the DNA bases) and shape readout (i.e., interactions based on sequence-dependent chemical and physical features of DNA duplexes). As such, the sequence-dependent shape of DNA duplexes, particularly in the protein-free (i.e., naked) state, is an important determinant in protein-DNA recognition. However, while ample structural information exists for protein-bound DNAs, information on naked DNA remains rather limited. In this thesis, I present work on using a biophysical method, called site-directed spin labeling (SDSL), to study DNA shape and its impact on protein recognition. Using a nucleotide-independent nitroxide spin label, continuous-wave (cw) and pulsed electron paramagnetic resonance (EPR) spectroscopy, and new analytical techniques I developed, I investigated DNA shape and protein/DNA recognition in two systems: (i) the p53 response elements and their recognition by the DNA-binding domain of the p53 tumor suppressor; and (ii) the GATA family of DNAs and their complex with the double-zinc-finger GATA proteins. The results demonstrated that SDSL can be used to experimentally identify differences in DNA local shape, which may provide one of the mechanisms for achieving specific protein recognition. The work also sets a foundation for using SDSL to investigate protein-DNA complexes that consist of both well-folded and disordered domains.

Chapter 1 Overview

1. Sequence-dependent shape of DNA duplexes

Deoxyribonucleic acid (DNA) is one of the major players in biology. It plays critical roles in the maintenance and transmission of genetic information in all living organisms.
The central dogma of molecular biology states that genetic information stored in DNA is transferred into ribonucleic acid (RNA) during transcription, and RNA is then translated into proteins, which carry out the majority of cellular functions [1]. The building blocks of DNA and RNA are called nucleotides, each composed of a sugar, a base and a phosphate. In DNA, the sugar is 2’-deoxyribose, and there are four types of bases (adenine, A; guanine, G; cytosine, C; thymine, T; Figure 1). DNA adopts a double-stranded duplex conformation, with the bases pairing according to the Watson-Crick scheme (i.e., A pairs with T, and G pairs with C; Figure 1). This pairing allows accurate copying of the nucleotide sequence and thus serves as the molecular basis for copying and transmitting genetic information. We also note that, in addition to duplexes, DNA is known to adopt alternative non-duplex structures. For example, G-rich single-stranded DNAs, such as those at the ends of linear chromosomes called telomeres, form quadruplexes consisting of stacked sets of four-guanine planar units [2, 3]. Such G-quadruplex structures are known to exist in vivo and to play important functional roles in gene maintenance as well as regulation [4, 5].

Figure 1: Nucleotide structures in DNA and base pairing pattern. The figure shows a TG dinucleotide base-paired with a complementary CA dinucleotide. In each nucleotide, the base is colored blue, the sugar black, and the phosphate red. The identity of each nucleotide is labeled next to its base. Hydrogen bonds between paired bases are shown as dashed lines. Note that two hydrogen bonds form in an A/T base pair, and three in a G/C base pair.

There are at least three types of DNA duplex conformation found in nature (A-form, B-form and Z-form, Figure 2). The helical structure is held together by hydrogen bonding between base-paired nucleotides, as well as by base stacking between neighboring nucleotides [6].
Among them, B-form is the most common form of double-stranded DNA and is believed to predominate under the physiological conditions found in cells [7]. It features a right-handed double helix, with a wide, shallow major groove and a narrow, deep minor groove. A-form is another right-handed double-helical DNA structure, and is observed under dehydrated conditions and in DNA-RNA hybrid helices [8]. The A-form helix is wider than the B-form helix, and its base pairs are not oriented perpendicular to the helical axis as they are in B-form. In addition, in contrast to B-form, A-form features a narrow, deep major groove and a wide, shallow minor groove. Z-form, in contrast to both A-form and B-form, is a left-handed helical conformation of DNA. The formation of Z-DNA can be promoted by alternating purine-pyrimidine sequences under high-salt conditions [9, 10]. Aside from the different polymorphisms of the double helix, different propensities for bending also contribute to variation in DNA global shape [11].

Figure 2: Molecular shape of A-DNA, B-DNA and Z-DNA. GRASP2 images [12] of the three helical forms of DNA, constructed with the software tool 3DNA [13] from fiber diffraction data [10, 14]. Each DNA helix is a 14-mer with the alternating sequence d(GC)7. (A) A-DNA, with a narrow, deep major groove and a wide, shallow minor groove. (B) B-DNA, with a wide, shallow major groove and a narrow, deep minor groove. (C) Z-DNA, which lacks a major groove and has a narrow, deep minor groove. Figure adapted from [11].

Aside from its global conformation, DNA also exhibits localized conformational variation (Figure 3). This refers to deviation of a DNA helix from ideal B-DNA that is confined to one base pair or a small region of the helix. Such deviations are usually quantified by parameters describing DNA helix shape, examples of which include minor groove width, propeller twist, roll, and helix twist.
Two of the most commonly observed local DNA shape features are kinks and minor groove narrowing. A DNA kink, defined as a local disruption of an otherwise linear helix, usually results from a loss of base stacking at nearby base steps (Figure 3A). The dinucleotide step with the weakest base stacking is the most likely to kink. Kinks can be further stabilized by protein binding, in which case the weakened base stacking is compensated by interactions with hydrophilic groups of the protein [11]. Minor groove width is another feature of the DNA helix that can vary locally (Figure 3B) [15]. Such variation usually results from differing patterns of hydrogen bonding or stacking interactions. Recently, it was reported that a narrow minor groove can play a critical role in specific DNA recognition by proteins [15], which is made possible by the enhanced negative electrostatic potential associated with a narrow minor groove.

Figure 3: Illustration of DNA local shape features in DNA-protein complexes. (A) The Lac repressor kinks the DNA at a central CpG base-pair step (PDB ID 2kei). The helix axes calculated for both sides of the kink (blue) show an abrupt change in the helix trajectory caused by the kink. (B) Phage 434 repressor recognizes local shape deformations of its operator with arginine residues (PDB ID 2or1) [15]. The narrow region of the minor groove that is contacted by arginines is highlighted in blue. Figure adapted from [11].

2. Roles of DNA shape in protein-DNA recognition

The function of DNA relies greatly on its capacity to be specifically recognized by proteins. This recognition serves as the foundation of transcriptional regulation, DNA replication and repair, as well as DNA packaging [16, 17]. It is important to note that recognition between protein and DNA usually involves multiple tiers of specificity.
One of the most common classifications of protein-DNA recognition specificity is direct readout vs. indirect readout [17]. Direct readout refers to recognition of the particular pattern of hydrogen-bond donors and acceptors, as well as non-polar groups, in the DNA by chemically complementary groups of the protein [11, 18]. In other words, the protein directly reads out the sequence information of the DNA strand. Indirect readout, on the other hand, refers to protein-DNA interactions involving bases that are not in direct contact with the protein via hydrogen bonding or non-polar interactions. In other words, the protein recognizes the DNA indirectly, through variations in the geometry/shape of the DNA strands. A survey of protein-DNA complex structures showed that although the relative contributions of direct and indirect readout vary among recognition systems, both are present in every interaction [19]. Thus, it can be concluded that indirect readout through DNA shape is a universal mechanism of specific protein-DNA recognition. Consequently, understanding detailed local DNA shape is critical for fully comprehending the biological functions that involve specific recognition by proteins. However, knowledge of sequence-dependent naked DNA shape is very limited, with information largely derived from computational analyses [20-22]. Experimental investigation, on the other hand, remains quite challenging. Even though X-ray crystallography and NMR spectroscopy have provided high-resolution structures of DNAs [21], X-ray crystallography studies are hindered by crystal-packing biases, and NMR studies are constrained by the size of the DNA.

This thesis focuses on sequence-dependent DNA shape and specific protein-DNA recognition, investigated using a biophysical method called site-directed spin labeling (SDSL).
Using a nucleotide-independent nitroxide spin label, continuous-wave (cw) and pulsed electron paramagnetic resonance (EPR) spectroscopy, and new analytical techniques I developed, I investigated DNA shape and protein/DNA recognition in two systems. Chapter 2 [23] reviews how SDSL has been applied to studies of nucleic acid structure and dynamics. In addition to introducing SDSL and EPR spectroscopy, it gives examples of how SDSL and different types of EPR experiments have contributed to studies of nucleic acid structure and dynamics. Chapter 3 [24] introduces a novel methodology combining SDSL, cw-EPR, and a new analytical approach that enables scanning of DNA duplexes to reveal their sequence-dependent local shape features. Chapter 4 [25] presents the first example known to us of using SDSL and pulsed EPR combined with Monte Carlo simulation to determine DNA duplex conformations in solution. Chapter 5 describes preliminary data on the conformation of a DNA-binding protein, GATA3, upon binding to its target DNA, which lays the foundation for applying SDSL to investigate how DNA directs the conformations of multi-domain proteins.

References

1. Crick, F., Central dogma of molecular biology. Nature, 1970. 227(5258): p. 561-3.
2. Williamson, J.R., G-quartet structures in telomeric DNA. Annu Rev Biophys Biomol Struct, 1994. 23: p. 703-30.
3. Blackburn, E.H. and J.W. Szostak, The molecular structure of centromeres and telomeres. Annu Rev Biochem, 1984. 53: p. 163-94.
4. Huffman, K.E., et al., Telomere shortening is proportional to the size of the G-rich telomeric 3'-overhang. J Biol Chem, 2000. 275(26): p. 19719-22.
5. Zaug, A.J., E.R. Podell, and T.R. Cech, Human POT1 disrupts telomeric G-quadruplexes allowing telomerase extension in vitro. Proc Natl Acad Sci U S A, 2005. 102(31): p. 10864-9.
6. Yakovchuk, P., E. Protozanova, and M.D.
Frank-Kamenetskii, Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res, 2006. 34(2): p. 564-74.
7. Richmond, T.J. and C.A. Davey, The structure of DNA in the nucleosome core. Nature, 2003. 423(6936): p. 145-50.
8. Lu, X.J., Z. Shakked, and W.K. Olson, A-form conformational motifs in ligand-bound DNA structures. J Mol Biol, 2000. 300(4): p. 819-40.
9. Wang, A.H., et al., Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature, 1979. 282(5740): p. 680-6.
10. Arnott, S., et al., Left-handed DNA helices. Nature, 1980. 283(5749): p. 743-5.
11. Rohs, R., et al., Origins of specificity in protein-DNA recognition. Annu Rev Biochem, 2010. 79: p. 233-69.
12. Petrey, D. and B. Honig, GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences. Methods Enzymol, 2003. 374: p. 492-509.
13. Lu, X.J. and W.K. Olson, 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat Protoc, 2008. 3(7): p. 1213-27.
14. Leslie, A.G., et al., Polymorphism of DNA double helices. J Mol Biol, 1980. 143(1): p. 49-72.
15. Rohs, R., et al., The role of DNA shape in protein-DNA recognition. Nature, 2009. 461(7268): p. 1248-53.
16. Garvie, C.W. and C. Wolberger, Recognition of specific DNA sequences. Mol Cell, 2001. 8(5): p. 937-46.
17. Lawson, C.L. and H.M. Berman, Indirect readout of DNA sequence by proteins, in Protein-Nucleic Acid Interactions: Structural Biology. 2008, R. Soc. Chem: Cambridge, UK. p. 66-90.
18. Matthews, B.W., Protein-DNA interaction. No code for recognition. Nature, 1988. 335(6188): p. 294-5.
19. Gromiha, M.M., et al., Intermolecular and intramolecular readout mechanisms in protein-DNA recognition. J Mol Biol, 2004. 337(2): p. 285-94.
20. Perez, A., F.J. Luque, and M. Orozco, Frontiers in molecular dynamics simulations of DNA.
Acc Chem Res, 2012. 45(2): p. 196-205.
21. Zhou, T., et al., DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res, 2013. 41(Web Server issue): p. W56-62.
22. Olson, W.K., et al., DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci U S A, 1998. 95(19): p. 11163-8.
23. Ding, Y., P. Nguyen, N.S. Tangprasertchai, C.V. Reyes, X. Zhang, and P.Z. Qin, Nucleic acid structure and dynamics: perspectives from site-directed spin labeling, in Electron Paramagnetic Resonance. 2014.
24. Ding, Y., et al., Experimental mapping of DNA duplex shape enabled by global lineshape analyses of a nucleotide-independent nitroxide probe. Nucleic Acids Res, 2014. 42(18): p. e140.
25. Zhang, X., et al., Conformations of p53 response elements in solution deduced using site-directed spin labeling and Monte Carlo sampling. Nucleic Acids Res, 2014. 42(4): p. 2789-97.

Chapter 2 Nucleic Acid Structure and Dynamics: Perspectives from Site-Directed Spin Labeling*

Abstract

The technique of site-directed spin labeling (SDSL) provides information on biomolecular systems by monitoring the behavior of a stable radical tag (i.e., spin label) using electron paramagnetic resonance (EPR) spectroscopy. SDSL studies of nucleic acids and protein/nucleic-acid complexes have yielded unique information that is difficult to derive from other methods. In this chapter, we describe strategies used in nucleic acid SDSL investigations, and summarize advancements with a focus on those reported during the past five years.

*Ding, Y., Nguyen, P., Tangprasertchai, N.S., Reyes, C.V., Zhang, X. and Qin, P.Z. (2015), "Nucleic acid structure and dynamics: perspectives from site-directed spin labeling." Electron Paramagnetic Resonance, 24, 122–147

1. Introduction

The method of site-directed spin labeling (SDSL) refers to a biophysical technique that uses chemically stable radicals, i.e., spin labels, to obtain structural and dynamic information on bio-macromolecules.[1] The behaviors of the spin labels, monitored using electron paramagnetic resonance (EPR) spectroscopy, are used to derive information on the parent molecule (Figure 1). SDSL can be applied to study high-molecular-weight systems under physiological conditions using a small amount of sample (~50 µM in 5 µl). The method provides much more detailed structural and dynamic information than chemical probing, and avoids a number of fundamental issues faced by crystallography (e.g., crystalline sample preparation, interference from lattice packing) and NMR (e.g., limitation of molecule size). SDSL has matured into a powerful tool for investigating protein structure and dynamics, particularly for studying membranes and membrane proteins.[2-7] Nucleic acid SDSL, in which the spin labels are attached to either DNA or RNA, started in the 1970s, when nitroxide spin labels were used to investigate RNA duplex formation.[8] In SDSL studies of nucleic acids, information has been derived primarily from two types of measurements (Figure 1): (i) distances between a pair of spin labels, which provide structural constraints; and (ii) rotational dynamics of a singly attached spin label, which yield structural and dynamic information on the parent macromolecule at the labeling site. With advances in EPR technology[7] and in the capability of manipulating nucleic acids, the scope of nucleic acid SDSL has expanded significantly.
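Distance measurements of type (i) rest on the steep distance dependence of the electron-electron dipolar coupling between two spin labels. The Python sketch below is illustrative only. It assumes the point-dipole approximation and two nitroxides with g ≈ 2.0, for which the dipolar frequency is approximately 52.04 MHz·nm³ divided by r³. It reports the perpendicular (θ = 90°) frequency and ignores exchange coupling. The constant and function names are hypothetical and are not taken from this chapter or from any EPR analysis package.

```python
"""Minimal sketch: point-dipole relation between inter-nitroxide distance and
dipolar frequency. Assumes two nitroxides with g close to 2.0 and negligible
exchange coupling; values refer to the perpendicular (theta = 90 deg) orientation."""

# Approximate dipolar constant for two g ~ 2.0 spins, in MHz * nm^3 (assumed value).
DIPOLAR_CONSTANT_MHZ_NM3 = 52.04


def dipolar_frequency_mhz(r_nm: float) -> float:
    """Return the perpendicular dipolar frequency (MHz) for a distance r (nm)."""
    if r_nm <= 0:
        raise ValueError("distance must be positive")
    return DIPOLAR_CONSTANT_MHZ_NM3 / r_nm ** 3


def distance_from_frequency_nm(nu_dd_mhz: float) -> float:
    """Invert the point-dipole relation: distance (nm) from a dipolar frequency (MHz)."""
    if nu_dd_mhz <= 0:
        raise ValueError("frequency must be positive")
    return (DIPOLAR_CONSTANT_MHZ_NM3 / nu_dd_mhz) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Tabulate how quickly the coupling falls off as the inter-spin distance grows,
    # and confirm that the inverse function recovers the input distance.
    for r in (1.5, 2.0, 3.0, 4.0, 5.0, 6.0):
        nu = dipolar_frequency_mhz(r)
        print(f"r = {r:.1f} nm -> nu_dd = {nu:.3f} MHz "
              f"(round trip r = {distance_from_frequency_nm(nu):.2f} nm)")
```

Because the frequency scales as 1/r³, doubling the distance lowers it eightfold. Such weak couplings at longer distances are generally resolved with pulsed methods such as DEER rather than from line broadening in cw-EPR spectra.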
During the past five years, a number of exciting developments in nucleic acid SDSL have been reported. These include structural probing of nucleic acids and protein/nucleic-acid complexes, characterization of nanosecond motions in large folded nucleic acid molecules, measurements in intracellular environments, and combined EPR/NMR investigations of spin-labeled nucleic acid molecules.

Figure 1: The general strategy of site-directed spin labeling. Reproduced from [9] with permission.

In this chapter, we describe methodologies employed in SDSL investigations of nucleic acid and protein/nucleic-acid systems, and summarize recent advances with a focus on those reported during the past five years. We focus primarily on studies using spin labels attached to DNA or RNA. Spin labels attached to proteins have also been used to investigate protein/nucleic-acid complexes[10-14] and represent a promising direction for SDSL. We note that there are a number of excellent recent reviews on this topic,[6, 9, 15-20] which are highly recommended.

2. Chemical strategies for introducing spin labels at specific sites of nucleic acids
SDSL requires stable radical tags, which are generally absent from native nucleic acids. Therefore SDSL generally starts with introducing an extraneous spin label.[9, 16, 18] The four major naturally occurring nucleotides do not contain functional groups with sufficient reactivity under physiological conditions. Covalent attachment of spin labels therefore requires modified nucleotides that contain either the spin label moiety or other, more reactive functionalities that can be further derivatized. The most commonly used spin labels are nitroxides. They are small compared to many exogenous tags (e.g., fluorophores), which makes them less intrusive to the parent molecule.
Schemes have been reported for covalently and site-specifically attaching nitroxide spin labels at internal phosphate (Figure 2A, I), sugar (Figure 2A, II), and base (Figure 2A, III) positions, as well as at the 5’- or 3’-terminus of nucleic acids.[9, 16, 18] In addition, non-covalently attached nitroxide spin labels[21] and non-nitroxide spin labels (Figure 2B, 2C)[22] have been used in nucleic acid SDSL. 

It should be noted that the coupling between the parent macromolecule and the moiety carrying the unpaired electron(s) (e.g., the nitroxide pyrroline/piperidine ring) depends strongly on the particular chemical bonding scheme, which varies between attachment methods. Consequently, methodologies for extracting information from the observed spin label behavior (e.g., measuring inter-nitroxide distances to examine nucleic acid structure) must be tailored to each specific labeling scheme. Furthermore, as an external probe, the spin label may perturb the parent molecule to varying extents. The degree of perturbation, and its effect on the information obtained about the parent macromolecule, should be examined in each case.

Figure 2: Representative examples of spin labels reported in SDSL studies. Adapted from [17, 22, 23] with permission.

2.1 Nitroxides covalently attached to nucleic acids
Many established site-specific covalent labeling schemes use a two-step design.[9] First, modified nucleotides containing reactive functional groups are introduced at desired locations within a target nucleic acid strand. Examples include phosphorothioate,[24] 2’-amino,[25, 26] 4-thio-uridine,[27, 28] and, more recently, alkyne[29, 30] groups.
Most of these modified nucleotides can be introduced using solid-phase chemical synthesis.[9] They can also be introduced enzymatically[9] or via a combination of chemical and enzymatic syntheses.[31] In the second step, the modified oligonucleotide is reacted with an appropriate nitroxide derivative, usually under mild biochemical solution conditions that minimize damage to the spin label. 

For example, in a reported phosphorothioate labeling scheme, a family of nitroxides (the R5 series, Figure 2A, I) is attached to a phosphorothioate group introduced at a defined location of the nucleic acid backbone during solid-phase chemical synthesis.[24, 32] This method is cost-effective in terms of both time and resources, because the R5-series labels attach efficiently to phosphorothioate-modified oligonucleotides through a simple aqueous coupling reaction.[24, 32] Importantly, the labeling site is not restricted by base identity, which enables efficient scanning of large DNAs[33] and RNAs.[34] 

Alternatively, a variety of methods attach the nitroxide to the target strand during solid-phase chemical synthesis. This is done either by direct incorporation of a nitroxide-containing phosphoramidite[35-39] or by on-column derivatization.[40-42] Using this approach, one may incorporate “designer” nitroxides whose behavior with respect to the parent macromolecule is tuned to aid extraction of information on the macromolecule. However, this approach generally requires elaborate synthetic protocols. Most notably, this strategy has been used to attach the Ç label, in which a nitroxide is rigidly fused with a modified cytosine (Figure 2A, III (ii)), at specific positions of either DNA[38] or RNA[39] strands.
Ç retains the ability to form a Watson-Crick base pair with guanine. It also completely eliminates relative motion between the nitroxide and the nucleic acid base. Both features are attractive in SDSL studies of nucleic acids. Interestingly, reducing the Ç nitroxide moiety yields a highly fluorescent analogue,[38] which allows investigation by fluorescence spectroscopy.

2.2 Nitroxides non-covalently attached to nucleic acids
Nitroxides bound non-covalently to DNA duplexes were used early on to measure electrostatic potentials around DNA.[43] In that study, however, the nitroxide was brought to the DNA by an intercalator, which affords little control over the labeling site. 

More recently, Sigurdsson and co-workers reported an elegant non-covalent yet site-directed labeling scheme.[21, 44, 45] In this scheme, the ç probe is non-covalently inserted into an abasic site introduced at the intended position within a DNA duplex.[21] The ç probe corresponds to the base portion of Ç without the sugar and phosphate moieties. The ç/DNA interaction is stabilized by base stacking and hydrogen bonding. It is affected primarily by the identity of the opposite nucleotide and by temperature,[21] although the flanking sequences and the location of the abasic site also play a role.[44] This labeling scheme still relies on chemical synthesis, but it provides a simple way to direct a nitroxide to a specific site within a duplex. Because the binding is non-covalent, off-target binding should also be examined carefully.[45] The ç probe has been used to measure distances in DNA duplexes and in a protein/DNA complex.[45]

2.3 Alternatives to nitroxide spin labels
While nitroxides have been used most frequently in SDSL, other types of spin labels have also been employed to study nucleic acids. One class of spin probes is paramagnetic metal ions.
For example, the paramagnetic Mn²⁺ ion can substitute for Mg²⁺ to support RNA folding and catalysis, and has been used to investigate the structure and catalysis of ribozymes.[46] In addition, chelated Gd³⁺ ions (Figure 2B), widely used as paramagnetic relaxation-enhancing agents in magnetic resonance imaging, have been used as spin labels to measure distances in DNA,[22] peptides,[47] and proteins.[47] Gadolinium-based labels offer high sensitivity, particularly in high-field pulsed EPR measurements.[47] 

Recently, the carbon-centered triarylmethyl (trityl) radical (Figure 2C) has been explored,[48, 49] although direct application to nucleic acids has not yet been reported. Specifically, a distance of ~20 Å has been measured at 4 °C between a pair of trityl radicals covalently attached to a protein.[48] This opens up the possibility of measuring nanometer distances in biological systems at ambient temperatures.

3. Structural and dynamic information derived from singly-attached nitroxide spin labels
A large number of nucleic acid SDSL investigations have obtained information by measuring the rotational dynamics of a singly-attached nitroxide label using continuous-wave (cw-) EPR spectroscopy.[9, 15-19] In this section, we first outline the basis underlying this methodology, and then discuss examples of application.

3.1 Theoretical basis of obtaining information from nitroxide rotational dynamics
3.1.1 Basic EPR
Here we present a very brief description of the physical basis underlying cw-EPR measurements of nitroxides. A nitroxide contains one unpaired electron (electron spin quantum number S = 1/2) localized on the N-O bond. For more in-depth discussions of EPR theory, readers should consult the relevant literature.[50]

Figure 3: (A) A diagram showing the dependence of the energy levels of an unpaired electron in a ¹⁴N nitroxide upon increasing applied magnetic field B0.
Arrows between the energy levels indicate transitions induced by electromagnetic radiation at a constant frequency. (B) Simulated X-band EPR spectra of nitroxides undergoing isotropic rotation. (C) Three modes of motion that contribute to nitroxide dynamics. Adapted from [9] with permission.

For applications using singly-attached spin labels, the relevant Hamiltonian (H) includes two major terms:

$$\mathbf{H} = \mathbf{H}_{EZ} + \mathbf{H}_{HF} \qquad (1)$$

The H_EZ term represents the electron Zeeman effect, which describes interactions with the external magnetic field:

$$\mathbf{H}_{EZ} = \beta_e\, \mathbf{B} \cdot \mathbf{g}_e \cdot \mathbf{S} \qquad (2)$$

The symbols in Eq. (2) are:
- β_e is the Bohr magneton for an electron (the magnetic moment associated with one unit of orbital angular momentum);
- B is the external magnetic field;
- S is the electron spin angular momentum operator;
- g_e is the g-tensor of the unpaired electron.

The H_HF term represents the hyperfine interaction between the electron and nuclear spins:

$$\mathbf{H}_{HF} = \mathbf{I} \cdot \mathbf{A} \cdot \mathbf{S} \qquad (3)$$

where I is the nuclear spin angular momentum operator and A is the hyperfine interaction tensor. For a nitroxide, both the g- and A-tensors are anisotropic. In their respective principal frames they can be expressed as:

$$\mathbf{g}_e = \begin{pmatrix} g_{xx} & 0 & 0 \\ 0 & g_{yy} & 0 \\ 0 & 0 & g_{zz} \end{pmatrix} \qquad (4a)$$

and

$$\mathbf{A} = \begin{pmatrix} A_{xx} & 0 & 0 \\ 0 & A_{yy} & 0 \\ 0 & 0 & A_{zz} \end{pmatrix} \qquad (4b)$$

The principal frames of the g- and A-tensors are generally treated as coinciding exactly. The principal axes are oriented as follows:
- x lies along the N-O bond;
- z lies along the nitrogen π orbital;
- y is perpendicular to both the x and z axes.

For a ¹⁴N nitroxide, the I = 1 ¹⁴N nucleus has three nuclear spin states (m_I = +1, 0, −1). There are therefore three allowed EPR transitions (Δm_S = ±1 and Δm_I = 0):[51]

$$h\nu = \Delta E = g_{eff}\, \beta_e B_0 + m_I A_{eff}, \qquad m_I = +1, 0, -1 \qquad (5a)$$

where h is Planck’s constant, ν is the frequency of the transition, and

$$g_{eff} = \left(g_{xx}^2 \sin^2\theta \cos^2\varphi + g_{yy}^2 \sin^2\theta \sin^2\varphi + g_{zz}^2 \cos^2\theta\right)^{1/2} \qquad (5b)$$

$$A_{eff} = \left(A_{xx}^2 \sin^2\theta \cos^2\varphi + A_{yy}^2 \sin^2\theta \sin^2\varphi + A_{zz}^2 \cos^2\theta\right)^{1/2} \qquad (5c)$$

Here θ is the angle between the z-axis of the principal frame and the external magnetic field, and φ is the angle between the x-axis of the principal frame and the projection of the external magnetic field onto the xy-plane.[51] As such, the EPR spectrum of a ¹⁴N nitroxide shows a three-line pattern (Figure 3B), and its line positions vary with the orientation (i.e., the θ and φ values) of the nitroxide.

3.1.2 Rotational dynamics of nitroxide spin labels measured using cw-EPR
When measured as an ensemble, the observed EPR spectrum is the sum of all individual spectra. Its lineshape depends on both the rotational diffusion behavior of the nitroxides and the frequency of the applied microwave radiation.[51-54] Figure 3B shows a series of simulated X-band (~9.5 GHz) EPR spectra of ¹⁴N nitroxides undergoing isotropic tumbling with different rotational correlation times (τ). 

In the fast-motion regime (τ ~ 10⁻¹¹ to 10⁻⁹ s at X-band), the g- and A-tensors are averaged nearly completely, and the EPR spectrum shows three sharp lines (Figure 3B, τ = 0.1 ns). In the rigid limit (τ > 3×10⁻⁸ s), the distribution of nitroxide orientations can be treated as static. The cw-EPR spectrum is then the sum of lineshapes from all orientations present in the ensemble, i.e., a powder spectrum (Figure 3B, τ = 50 ns). In either the fast-motion regime or the rigid limit, the overall lineshape pattern varies little within the regime, and spectral analyses can yield quantitative information on nitroxide motion.
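To make Eqs. (5a)-(5c) concrete, the short Python sketch below computes the three ¹⁴N line positions for a given field orientation. It then samples a grid of static orientations, roughly mimicking the rigid limit. The principal g- and A-values are assumed, representative nitroxide numbers, not values taken from this chapter. A is handled directly in gauss, a common first-order approximation. The names g_eff, a_eff, and line_positions_gauss are illustrative only.

```python
import numpy as np

PLANCK_J_S = 6.62607015e-34            # Planck constant h (J s)
BOHR_MAGNETON_J_T = 9.2740100783e-24   # Bohr magneton beta_e (J/T)

# Assumed, representative principal values for a 14N nitroxide (illustrative only).
G_PRINCIPAL = (2.0089, 2.0061, 2.0023)   # g_xx, g_yy, g_zz
A_PRINCIPAL_GAUSS = (6.2, 5.9, 36.9)     # A_xx, A_yy, A_zz expressed in gauss


def g_eff(theta, phi, g=G_PRINCIPAL):
    """Effective g-factor for field orientation (theta, phi), following Eq. (5b)."""
    gxx, gyy, gzz = g
    sin2 = np.sin(theta) ** 2
    return np.sqrt(gxx**2 * sin2 * np.cos(phi) ** 2
                   + gyy**2 * sin2 * np.sin(phi) ** 2
                   + gzz**2 * np.cos(theta) ** 2)


def a_eff(theta, phi, a=A_PRINCIPAL_GAUSS):
    """Effective hyperfine splitting in gauss, following Eq. (5c)."""
    axx, ayy, azz = a
    sin2 = np.sin(theta) ** 2
    return np.sqrt(axx**2 * sin2 * np.cos(phi) ** 2
                   + ayy**2 * sin2 * np.sin(phi) ** 2
                   + azz**2 * np.cos(theta) ** 2)


def line_positions_gauss(theta, phi, freq_hz=9.5e9):
    """Resonance fields (gauss) of the m_I = +1, 0, -1 lines at a fixed microwave frequency.

    Eq. (5a) rearranged to B0 = h*nu/(g_eff*beta_e) - m_I*A_eff, with A_eff already in
    field units (first-order approximation).
    """
    center = PLANCK_J_S * freq_hz / (g_eff(theta, phi) * BOHR_MAGNETON_J_T) * 1e4  # T -> G
    splitting = a_eff(theta, phi)
    return tuple(center - m_i * splitting for m_i in (+1, 0, -1))


# Rigid limit: evaluate every orientation on a grid (a crude powder sampling).
theta, phi = np.meshgrid(np.linspace(0.0, np.pi / 2, 91), np.linspace(0.0, np.pi / 2, 91))
low, mid, high = line_positions_gauss(theta, phi)
outer_splitting = high.max() - low.min()
print(f"outer splitting = {outer_splitting:.2f} G (2*A_zz = {2 * A_PRINCIPAL_GAUSS[2]:.2f} G)")
```

With these assumed values, the outermost lines of the sampled ensemble both come from θ = 0. Their separation therefore equals 2A_zz, which is why the outer splitting 2A_eff of a static, isotropic ensemble reaches its largest value in the rigid limit.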
Specifically, in the fast-motion regime, analyzing the heights and/or widths of the peaks is sufficient to obtain the effective nitroxide rotational correlation time τ.[51, 52] In the rigid-limit regime, the splitting between the outermost peaks (2A_eff) is readily measured (Figure 3B) and used to characterize nitroxide orientation distributions.[53, 54] 

Between the fast-motion and rigid-limit regimes lies the slow-motion regime (1×10⁻⁹ s < τ < 3×10⁻⁸ s), in which averaging of the g- and A-tensors is incomplete.[52] In this regime, variations in the rotational diffusion of the nitroxide, in both the rates and the amplitudes of motion, lead to drastic lineshape variations. The simulated spectra in Figure 3B illustrate this: as nitroxide motion decreases (τ increases), the central line broadens, new features appear in the low- and high-field regions, and the splitting between the outer peaks increases. Because its lineshape varies so much more richly, the slow-motion EPR spectrum is widely used to monitor changes in nitroxide motion caused by interactions with the parent macromolecule.

In many cases, parameters measured directly from the EPR spectrum are used to characterize nitroxide dynamics.[55-57] Examples are the effective hyperfine splitting (2A_eff) and the central line width (ΔH_pp) (Figure 3B). These parameters allow a semi-quantitative assessment of nitroxide mobility, which reflects the combined effect of the rate and the amplitude of motion. For example, a broad central line gives a small (ΔH_pp)⁻¹ value and indicates low mobility. Low mobility can result either from low-frequency, large-amplitude motions or from fast, small-amplitude motions.
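ΔH_pp is read off the first-derivative spectrum as the field separation between the maximum and minimum of the central line. The sketch below illustrates that readout under one assumption: the central line is treated as a single Lorentzian. Real nitroxide lines are usually inhomogeneously broadened and not purely Lorentzian. The numerical readout is checked against the analytic result for a Lorentzian of half-width Γ, ΔH_pp = 2Γ/√3. The function names and widths are illustrative.

```python
import numpy as np


def lorentzian_derivative(field, center, hwhm):
    """First-derivative Lorentzian line (the usual presentation of a cw-EPR signal)."""
    x = field - center
    return -2.0 * hwhm * x / (np.pi * (x**2 + hwhm**2) ** 2)


def peak_to_peak_width(field, derivative_signal):
    """Delta H_pp: field separation between the derivative maximum and minimum."""
    return abs(field[np.argmin(derivative_signal)] - field[np.argmax(derivative_signal)])


field = np.linspace(3340.0, 3440.0, 100001)      # gauss, 0.001 G sampling step
for hwhm in (0.8, 1.6, 3.2):                     # assumed Lorentzian half-widths (G)
    width = peak_to_peak_width(field, lorentzian_derivative(field, 3390.0, hwhm))
    print(f"HWHM {hwhm:.1f} G -> Delta H_pp = {width:.3f} G, "
          f"(Delta H_pp)^-1 = {1.0 / width:.3f} G^-1")
```

Narrower lines give larger (ΔH_pp)⁻¹ values, in the same sense as the mobility parameter described above.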
More recently, an approach based on Pearson coefficient analysis was developed to examine similarity collectively across an ensemble of R5a spectra. The resulting Pearson coefficients were used to generate nitroxide mobility maps along the DNA, which were shown to report on DNA duplex shape.[58]

A cw-EPR spectrum can also be simulated from quantum mechanics. The most widely used approach to EPR spectral simulation is based on the stochastic Liouville equation (SLE).[59, 60] The SLE treats the electron and nuclear spins quantum mechanically, while the nitroxide re-orientation motion is treated classically and parameterized in terms of rotational diffusion constants. The SLE approach is highly efficient and can compute a spectrum in a fraction of a second. This enables iterative fitting of experimental spectra, including those in the slow-motion regime.[60-62] However, SLE-based simulations depend on the physical model used to describe the nitroxide motion, which usually requires a large number of parameters.[60-62] Unique determination of nitroxide motion from simulation therefore remains challenging. 

In another simulation approach, trajectories of nitroxide motion are generated computationally, for example by molecular dynamics (MD) simulations, and are used to compute the EPR spectrum directly.[63-67] As simulation techniques advance, this approach holds great promise for revealing direct correlations between EPR lineshape and molecular structure and dynamics at the atomic level. It would thus allow a more detailed understanding of the target macromolecule via SDSL.

3.1.3 Correlating nitroxide motion to the local environment of the target macromolecule
When a nitroxide is attached to a nucleic acid molecule, its rotation is modulated by the structural and dynamic features of the nucleic acid at the labeling site. This serves as the basis for deriving information on the macromolecule.
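The Pearson-coefficient comparison of R5a spectra mentioned in Section 3.1.2 rests on computing correlation coefficients between spectra sampled on a common field axis. The sketch below shows only that core step, using NumPy's corrcoef. The toy derivative-Lorentzian "spectra" and their widths are hypothetical. The normalization, baseline handling, and mapping of coefficients onto DNA positions used in [58] are not reproduced here.

```python
import numpy as np


def pearson_matrix(spectra):
    """Pairwise Pearson correlation coefficients between spectra (one spectrum per row)."""
    return np.corrcoef(np.asarray(spectra, dtype=float))


def toy_spectrum(field, hwhm):
    """Hypothetical stand-in for a measured spectrum: one derivative-Lorentzian line."""
    x = field - 3390.0
    signal = -x / (x**2 + hwhm**2) ** 2
    return signal / np.abs(signal).max()   # amplitude scaling; Pearson r is scale-free anyway


field = np.linspace(3340.0, 3440.0, 2001)                      # common field axis (G)
spectra = [toy_spectrum(field, w) for w in (1.0, 1.2, 4.0)]    # three hypothetical "sites"
r = pearson_matrix(spectra)
print(np.round(r, 3))
```

Spectra with similar lineshapes give coefficients close to 1, and dissimilar lineshapes give lower values. This is the property a similarity-based mobility map relies on.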
Conceptually, a macromolecule affects nitroxide rotational motion through three simultaneously acting modes (Figure 3C):[9]
- global tumbling of the macromolecule (τ_R);
- rotation about the bonds connecting the nitroxide to the macromolecule (τ_B);
- intrinsic motions of the macromolecule transmitted to the labeling site (τ_S).

The τ_R effect is independent of the labeling site, but it is modulated when the spin-labeled macromolecule interacts with other molecules. In contrast, τ_B and τ_S motions may be highly site-specific. Each of these three modes of nitroxide motion has been used to gain information on the parent nucleic acids. The key is to establish the correlation between the local macromolecular environment, the nitroxide motions, and the observed EPR spectral features. Such correlations depend on a wide range of factors, including the chemical scheme of attachment, the experimental conditions, and specific features of the system. They should be investigated for each individual system.

3.2 Examples of application
3.2.1 Probing nucleic acid interactions using information derived from the global tumbling motion
At room temperature and in aqueous solution, a globally folded biomacromolecule with a molecular weight of 15 kD is expected to have a global rotational tumbling time τ_R of approximately 8 ns. In most cases, this tumbling would affect the observed X-band cw-EPR spectrum of an attached nitroxide.[32] The observed EPR spectrum may therefore change when the size and/or shape of the parent molecule changes, because such changes alter its global tumbling behavior.
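An order-of-magnitude check on the quoted τ_R can be made with the Stokes-Einstein-Debye relation for a rigid sphere, τ_R = 4πηr³/(3k_BT). The sketch below is illustrative and uses these assumptions:
- the viscosity of water at 25 °C (0.89 mPa·s);
- a protein-like partial specific volume of 0.73 mL/g;
- a 3 Å hydration shell.

Nucleic acids are denser and often far from spherical, so these inputs would need adjusting for real systems. The function name is hypothetical.

```python
import math

BOLTZMANN_J_K = 1.380649e-23
AVOGADRO_PER_MOL = 6.02214076e23


def tumbling_time_ns(mass_kda, temp_k=298.15, viscosity_pa_s=0.89e-3,
                     specific_volume_ml_g=0.73, hydration_shell_nm=0.3):
    """Stokes-Einstein-Debye estimate tau_R = 4*pi*eta*r^3 / (3*k_B*T) for a hydrated sphere."""
    # Dry molecular volume: (g/mol) * (mL/g) -> mL/mol -> m^3/mol -> m^3 per molecule.
    volume_m3 = mass_kda * 1e3 * specific_volume_ml_g * 1e-6 / AVOGADRO_PER_MOL
    radius_m = (3.0 * volume_m3 / (4.0 * math.pi)) ** (1.0 / 3.0) + hydration_shell_nm * 1e-9
    tau_s = 4.0 * math.pi * viscosity_pa_s * radius_m**3 / (3.0 * BOLTZMANN_J_K * temp_k)
    return tau_s * 1e9


for mass in (15.0, 120.0):
    print(f"{mass:.0f} kD sphere: tau_R ~ {tumbling_time_ns(mass):.1f} ns")
```

With these inputs, the function returns several nanoseconds for 15 kD, the same order as the ~8 ns cited above. The r³ dependence is why binding events that change size or shape shift τ_R noticeably.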
The sensitivity of the spectrum to τ_R serves as the basis for detecting interactions between spin-labeled nucleic acid molecules and their partners.[32, 68] 

A recent example of using the τ_R effect comes from studies of the interaction between the 20-mer HIV-1 RNA stem loop 3 and the HIV-1 nucleocapsid Zn-finger protein (NCp7).[68] Scholes and co-workers attached a spin label at the 5’ terminus of the RNA, and used cw-EPR to monitor changes in the overall tumbling dynamics of the RNA as it interacts with NCp7. The data revealed:
- the stoichiometry of NCp7/RNA binding;
- how the binding varies with NCp7 concentration and ionic conditions;
- the importance of Zn²⁺ in sustaining a large complex in which multiple copies of NCp7 interact with the RNA.

In addition, the kinetics of NCp7/RNA association were studied over milliseconds to seconds using a specialized micro-mixer stopped-flow EPR system. The results revealed multiple kinetic events, consistent with initial rapid NCp7/RNA binding followed by a slower complex-forming process.

3.2.2 Probing segmental motions: the Tetrahymena group I ribozyme as an example
An example of SDSL probing of nucleic acid dynamics through the τ_S mode of nitroxide motion is the investigation of nanosecond dynamics of an RNA duplex within the Tetrahymena group I ribozyme (~120 kD) (Figure 4).[69, 70] The Tetrahymena group I ribozyme catalyzes site-specific cleavage of an oligonucleotide substrate (S). It has been one of the most widely used large RNA systems for understanding RNA structure, folding, and function.[71] The ribozyme recognizes its substrate by forming a duplex between S and a single-stranded segment of the ribozyme core (i.e., the P1 duplex, Figure 4A).
During the catalytic cycle, the ribozyme undergoes a structural transition between two distinct states:
- the open state, in which the P1 helix is directly connected to the ribozyme core through a single-stranded J1/2 junction and makes no tertiary contacts with the core;
- the closed state, in which the P1 duplex docks into the pre-folded ribozyme core via multiple tertiary interactions and positions the substrate for cleavage.

Qin and co-workers used SDSL to study how RNA elements within the ribozyme control the dynamics of the P1 duplex,[69, 70] which may affect both the rate and the fidelity of catalysis. In these studies, manipulation of S allowed specific nitroxide labeling at P1, as well as trapping of the ribozyme in the open or the closed state. X-band cw-EPR spectroscopy was used to obtain information on nitroxide dynamics in the 0.5–20 ns regime. The data were used to examine how mutations of the J1/2 junction alter P1 motions.

Figure 4: (A) A schematic of the group I ribozyme in the open complex. The yellow dot marks the location of the spin label. (B) Variations in P1 motions upon mutating the J1/2 junction from “AAA” to “UUU”. Adapted from [70] with permission.

These studies yielded a number of interesting findings.[69, 70] Notably, the labeling site was deliberately chosen to avoid direct contact between the spin label and the folded ribozyme core. Even so, the nitroxides reported differences in P1 mobility between the closed and open states. This established a basis for experimentally probing the nanosecond segmental motion of an RNA element (i.e., the P1 duplex) within a large folded RNA. More importantly, in the open state, both the flexible R5a probe and the rigid Ç reported that lengthening J1/2 increases P1 mobility. Ç was also able to detect an alteration of P1 motion when J1/2 was mutated from poly-A to poly-U (Figure 4B).
The EPR spectra were analyzed by both qualitative lineshape comparison and spectral simulation. These analyses provided evidence that the J1/2 junction modulates the motional ordering of P1 in the nanosecond regime. The degree of modulation correlated with the flexibility intrinsically encoded in the nucleotides that make up the J1/2 junction. Motional ordering dictates the probability of attaining a particular configuration, and is one of the key factors that affect sampling of the conformational space. The SDSL studies thus established a means to investigate motional ordering experimentally in RNA and other nucleic acid systems.

3.2.3 Probing local environments in nucleic acids
Site-specific features in cw-EPR spectra have long been used to monitor structure, folding, and interactions in nucleic acids.[9, 15-19] These features report on how the immediate environment of the attachment site modulates the τ_B and τ_S motions of the nitroxide. A number of such studies have been reviewed extensively[9, 15-19] and will not be discussed here. They include:
- investigations of ligand interactions with the HIV TAR RNA;[72-76]
- the interaction between the GAAA tetraloop and its 11-nucleotide receptor;[28, 77]
- magnesium-dependent folding of the hammerhead ribozyme.[78, 79]

Instead, the following sections summarize recent SDSL studies of DNAs at the level of individual nucleotides.

Figure 5: Probing local environments in DNA duplexes using the R5a nitroxide. (A) Examples of R5a spectra at different sites of a DNA duplex. The DNA is designated as “CS”, and spectra were obtained from mixed Rp- and Sp-diastereomers. (B) A representative Rp-R5a conformer at position 7 of the CS duplex. (C) Mutating nucleotide 8 affects the motions and EPR spectra of Rp-R5a attached at position 7. Adapted from [80, 81] with permission.
3.2.3.1 Sequence-dependent variations along DNA duplexes probed by a nucleotide-independent nitroxide probe
Qin and co-workers reported that the R5/R5a probes (Figure 2, I), which can be attached at the phosphate backbone of any nucleotide, report on site- and stereo-specific structural and dynamic features within a DNA duplex.[80-82] In these studies, R5 or R5a was attached, one site at a time, to different positions of a dodecameric DNA duplex (designated as CS, Figure 5A). The labels were shown not to significantly perturb the native DNA conformation.[82] X-band EPR spectra were obtained either from mixed Rp- and Sp-diastereomers or from individual diastereomers. The spectra varied with label location (e.g., duplex center vs. terminus) and with the surrounding DNA sequence (Figure 5A).[80, 82] 

Furthermore, SDSL and MD simulations were combined to investigate how R5a attached at the center of the CS duplex responds to a dT-to-dU mutation at the 3’ neighboring nucleotide (Figure 5B).[81] Experimentally, the mutation altered only the Rp-R5a spectrum (Figure 5C, left). MD simulations recapitulated the experimental observations. They indicated that three factors combine to give rise to the observed effect:
(i) interactions between DNA functional groups (i.e., the C5-methyl of T) and the nitroxide (i.e., the pyrroline ring and ring substituents);
(ii) intrinsic rotameric conformers of the label;
(iii) nanosecond transitions between the BI and BII DNA backbone configurations.
The study revealed that the label explores a naturally occurring transition in DNA conformation, which directly connects nitroxide motions with local DNA backbone dynamics.[81]

3.2.3.2 Local features in DNA duplexes probed by the rigid Ç label
There is a long history of investigating site-specific features in DNA duplexes using nitroxides attached to base positions, as exemplified by work from the Bobst and Robinson groups.[83, 84] One of the major recent advances in this area comes from studies using rigid labels. For example, Ç (Figure 2, III(ii)), which maintains full Watson-Crick base-pairing with G, has been used to investigate conformations and dynamics in DNA duplexes containing mismatch, bulge, and internal loop sequences.[85-87] In constructs with a single bulge neighboring a single base, the studies revealed bistable, temperature-dependent switching between conformations.[85] Interestingly, in a fully duplexed DNA without the bulge, Ç spectra showed little change when mispairing occurred. In contrast, a TC probe, in which the nitroxide is connected to a cytosine base through one rotatable bond, gave large spectral variations between different mismatched base pairs.[86] These studies further demonstrate the ability of spin labels to sense variations in the local environment of a DNA duplex. 

Ç was also used to investigate folding of a cocaine aptamer,[88] which binds cocaine at the junction of three DNA helices. Ç was incorporated independently at three different positions to monitor aptamer conformational changes upon cocaine binding. Based on EPR lineshape analyses combined with fluorescence measurements, the authors proposed that adding cocaine leads to formation of the short helix I. This is accompanied by a tilt between the pre-formed helices II and III, resulting in a Y-shaped configuration. The work provided an example of monitoring ligand-induced conformational changes in DNA.

4. Deriving structural information using distances measured with spin labels
Distances measured between spin labels yield direct structural constraints. They have been highly informative in examining structures and conformational changes in nucleic acids and protein/nucleic-acid complexes.[6, 9, 16, 17, 19, 20] In this section, we first outline the basis underlying this methodology, and then discuss examples of application.

4.1 Measuring inter-spin distances by determination of dipolar coupling strength
Distance measurements are based on determining the strength of the dipolar coupling between a pair of electron spins. The predominant dipolar interaction term between spins A and B is expressed as:[9, 16, 20, 89]

$$\omega_{dip} = D_{dip}\, S_A^z S_B^z \cdot \frac{1 - 3\cos^2\theta}{R^3} \qquad (6a)$$

and

$$D_{dip} = \frac{\mu_0\, g_A g_B\, \beta_e^2}{4\pi\hbar} \qquad (6b)$$

The symbols in Eqs. (6a) and (6b) are:
- θ is the angle between the inter-spin distance vector and the external magnetic field;
- R is the inter-spin distance;
- μ_0 is the vacuum permeability;
- g_A and g_B are the g-factors of spins A and B, respectively;
- β_e is the Bohr magneton;
- ħ is the reduced Planck constant.

Note that the angular dependence in Eq. (6a) has important implications. In most SDSL distance measurements, samples are measured in a glassy state, in which the inter-spin vector adopts a static and isotropic distribution with respect to the external field. In such cases, all orientations of the inter-spin vector are treated as equally populated (i.e., cos θ is uniformly distributed), and the resulting ω_dip distribution shows a Pake pattern.[9, 16, 20, 89]

4.1.1 EPR methods for measuring dipolar coupling strength
For inter-spin distances below 20 Å, the dipolar coupling is strong enough to appear as line broadening in the cw-EPR spectrum. The degree of broadening is determined by comparing spectra of doubly-labeled samples with the sum of the corresponding singly-labeled spectra. The dipolar interaction strength, and hence the inter-spin distance, is then obtained using deconvolution[90, 91] or convolution[92] methods.
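Eq. (6b) sets the scale of the coupling. Expressed as a frequency, D_dip/2π ≈ 52.04 MHz·nm³ for two spins with g ≈ 2.0023, so the perpendicular (θ = 90°) dipolar frequency falls off as r⁻³. The sketch below evaluates this from physical constants. It assumes both g-factors equal 2.0023 and neglects exchange coupling. The function names are illustrative.

```python
import math

MU0_OVER_4PI = 1e-7                    # mu_0 / (4*pi), T m / A
BOHR_MAGNETON_J_T = 9.2740100783e-24   # beta_e (J/T)
PLANCK_J_S = 6.62607015e-34            # h (J s)
G_NITROXIDE = 2.0023                   # assumed g-factor for both spins


def dipolar_constant_mhz_nm3(g_a=G_NITROXIDE, g_b=G_NITROXIDE):
    """D_dip / (2*pi) = mu_0 g_A g_B beta_e^2 / (4*pi*h), converted to MHz nm^3."""
    d_hz_m3 = MU0_OVER_4PI * g_a * g_b * BOHR_MAGNETON_J_T**2 / PLANCK_J_S
    return d_hz_m3 * 1e27 / 1e6        # m^3 -> nm^3, Hz -> MHz


def dipolar_frequency_mhz(r_nm, theta_rad):
    """|nu_dip| for distance r (nm) and angle theta between the inter-spin vector and B0."""
    return dipolar_constant_mhz_nm3() * abs(1.0 - 3.0 * math.cos(theta_rad) ** 2) / r_nm**3


def distance_from_perpendicular_mhz(nu_perp_mhz):
    """Invert the theta = 90 deg (Pake-pattern 'horn') frequency back to a distance in nm."""
    return (dipolar_constant_mhz_nm3() / nu_perp_mhz) ** (1.0 / 3.0)


for r in (1.5, 2.0, 3.0, 5.0):
    print(f"r = {r:.1f} nm: nu_perp = {dipolar_frequency_mhz(r, math.pi / 2):.2f} MHz")
```

The steep r⁻³ dependence makes the coupling drop quickly with distance. This is consistent with the switch from cw-EPR broadening to pulsed methods beyond about 20 Å described in the text.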
For distances longer than 20 Å, the broadening of the cw-EPR spectrum is too small to be determined accurately, and pulsed-EPR methods are used to measure inter-spin distances.

Figure 6: Double electron-electron resonance (DEER) spectroscopy. (A) Pulse sequence for four-pulse DEER. (B) Example of DEER data measured in a dodecameric DNA duplex. The measured dipolar evolution data are shown on the left, and the resulting distance distribution is shown on the right. Adapted from [24, 33] with permission.

Two major types of pulsed EPR methods have been employed in SDSL. One class measures dipolar interactions by monitoring the time dependence of the generation of double-quantum coherence (DQC) involving both spins.[93, 94] For example, a 6-pulse DQC scheme has been applied to measure a distance of 72 Å in a 26-base-pair RNA duplex.[95] The DQC method uses strong, non-selective pulses to excite all radical populations in the system. It has been applied to measure nanometer distances at ambient temperature between a pair of trityl labels.[48, 49] 

The other class measures inter-spin dipolar interactions using pulses that selectively manipulate different spin populations. The representative of this class that is widely used in SDSL is double electron-electron resonance (DEER, also known as PELDOR, Figure 6).[96] In the dead-time-free DEER scheme,[97, 98] a three-pulse sequence is applied to the ‘observer spin’ (spin A) to generate a refocused echo at a specific time. At a separate ‘pump frequency’, a fourth pulse flips a different population of spins (spin B). If spins A and B are coupled by the dipolar interaction, flipping spin B dephases spin A and reduces the amplitude of the refocused echo. When the pump pulse is applied at variable dipolar evolution times (“t”, Figure 6A), the refocused echo amplitude oscillates. The frequency of these oscillations yields the dipolar interaction strength.
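The DEER time trace described above can be sketched for a single, fixed distance by averaging the dipolar oscillation over an isotropic distribution of orientations. The sketch uses these assumptions:
- a background-free signal;
- a hypothetical modulation depth λ = 0.3;
- no orientation selection;
- the 52.04 MHz·nm³ constant for two g ≈ 2.0023 spins.

It then checks that the dominant Fourier component of the trace lies near the Pake-pattern frequency ν⊥. Function names are illustrative.

```python
import numpy as np

DIPOLAR_MHZ_NM3 = 52.04   # D_dip / (2*pi) for two g ~ 2.0023 spins, MHz nm^3 (assumed g values)


def dipolar_kernel(t_us, r_nm, n_orient=2000):
    """Orientation-averaged dipolar signal K(t) for one distance, isotropic static spins.

    K(t) = integral over x = cos(theta) in [0, 1] of cos(2*pi*nu_perp*(3x^2 - 1)*t),
    approximated here with the midpoint rule.
    """
    nu_perp = DIPOLAR_MHZ_NM3 / r_nm**3                                  # MHz
    x = (np.arange(n_orient) + 0.5) / n_orient
    phase = 2.0 * np.pi * nu_perp * np.outer(t_us, 3.0 * x**2 - 1.0)   # MHz * us -> rad
    return np.cos(phase).mean(axis=1)


def deer_trace(t_us, r_nm, modulation_depth=0.3):
    """Background-free four-pulse DEER signal V(t) = 1 - lambda * (1 - K(t))."""
    return 1.0 - modulation_depth * (1.0 - dipolar_kernel(t_us, r_nm))


t = np.linspace(0.0, 8.0, 1001)           # dipolar evolution time (us)
v = deer_trace(t, r_nm=3.0)

# Fourier transform of the oscillating part; the singularity at nu_perp should dominate.
spectrum = np.abs(np.fft.rfft(v - v.mean()))
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])   # MHz, because t is in us
print(f"dominant frequency = {freqs[spectrum.argmax()]:.2f} MHz, "
      f"nu_perp = {DIPOLAR_MHZ_NM3 / 3.0**3:.2f} MHz")
```

For r = 3 nm, ν⊥ ≈ 1.93 MHz, which corresponds to a period of about 0.52 µs. A trace of ~1 µs therefore spans roughly two dipolar periods.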
Note that in the majority of SDSL work, distance measurements are carried out at cryogenic temperatures. This limitation is due primarily to the relaxation behavior of the spin label. For example, accurately measuring an inter-nitroxide distance of 30 Å by DEER requires a dipolar evolution time t ~ 1 µs.[99] This sets a lower limit on the phase memory time of the nitroxide labels, which cannot be met at ambient temperature. A number of directions are being actively pursued to extend the measurable distance range at higher temperatures. They include new spin labels with more favorable relaxation behavior (e.g., trityls[48, 49]), as well as new instrumentation (e.g., high-field[7]) and pulse schemes[100] for better sensitivity.

4.1.2 Determining inter-spin distance distribution from the measured dipolar coupling strength
In most SDSL work, it is valid to assume that the inter-spin vector adopts a static, isotropic distribution with all orientations equally populated. Fourier transformation of the observed time-domain dipolar evolution curve then yields a Pake pattern. When the inter-spin distance distribution is narrow, the distance can be calculated directly from the dipolar interaction frequency at θ = 90° and/or θ = 0°.[101] This approach has been used to measure distances in DNA and RNA duplexes with a nitroxide connected to the base via an acetylene bond.[101, 102] 

In the majority of cases, however, direct readout from the Pake pattern is not viable, because the inter-spin distance is distributed. The distribution arises from intrinsic factors (e.g., dynamics of the parent macromolecule and the spin label) and extrinsic factors (e.g., inhomogeneity in sample preparation, quality of the measured dipolar evolution trace). The measured dipolar evolution curve is therefore fitted to obtain the inter-spin distance distribution profile, P(r).
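As a minimal illustration of model-based fitting, the sketch below recovers a known distance distribution from synthetic data. It proceeds in three steps:
- build the orientation-averaged kernel K(t, r);
- generate a synthetic, noisy form factor from a Gaussian P(r);
- recover the Gaussian mean and width by a brute-force least-squares grid search.

The sketch assumes a background-corrected form factor normalized by the modulation depth, no orientation selection, and negligible exchange coupling. The synthetic data and all function names are hypothetical. Practical analyses use proper optimizers or regularization rather than a grid search.

```python
import numpy as np

DIPOLAR_MHZ_NM3 = 52.04   # MHz nm^3 for two g ~ 2.0023 spins (assumed)


def kernel_matrix(t_us, r_nm, n_orient=1000):
    """K[i, j]: orientation-averaged dipolar signal at time t_i for a single distance r_j."""
    x = (np.arange(n_orient) + 0.5) / n_orient      # midpoint grid for x = cos(theta)
    angular = 3.0 * x**2 - 1.0
    columns = [np.cos(2.0 * np.pi * (DIPOLAR_MHZ_NM3 / r**3) * np.outer(t_us, angular)).mean(axis=1)
               for r in r_nm]
    return np.column_stack(columns)


def gaussian_pr(r_nm, mean_nm, sd_nm):
    """Single-Gaussian distance distribution, normalized to unit sum on the grid."""
    p = np.exp(-0.5 * ((r_nm - mean_nm) / sd_nm) ** 2)
    return p / p.sum()


def fit_single_gaussian(form_factor, kernel, r_nm, means, sds):
    """Brute-force least-squares search over (mean, sd); returns the best-fitting pair."""
    best_cost, best_pair = np.inf, None
    for mean in means:
        for sd in sds:
            residual = form_factor - kernel @ gaussian_pr(r_nm, mean, sd)
            cost = float(residual @ residual)
            if cost < best_cost:
                best_cost, best_pair = cost, (mean, sd)
    return best_pair


t = np.linspace(0.0, 3.0, 301)          # dipolar evolution time (us)
r = np.linspace(2.0, 6.0, 201)          # distance grid (nm)
K = kernel_matrix(t, r)

rng = np.random.default_rng(1)
data = K @ gaussian_pr(r, 3.2, 0.25) + rng.normal(0.0, 0.01, t.size)   # synthetic trace

best_mean, best_sd = fit_single_gaussian(data, K, r,
                                         means=np.arange(2.0, 5.01, 0.05),
                                         sds=np.arange(0.10, 0.61, 0.05))
print(f"fitted <r> = {best_mean:.2f} nm, sd = {best_sd:.2f} nm")
```

The same kernel matrix is also the starting point for model-free approaches, which solve for P(r) on the distance grid directly instead of assuming a functional form.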
When fitting the dipolar evolution curve, one may either proceed without assuming a particular functional form for P(r),[103-106] or fit with model functions such as one or two Gaussians.[106-108] Note that the assumption that all possible θ values are equally populated may not be valid, in which case the angular dependence must be considered explicitly in the analysis of pulsed EPR data. For example, Prisner, Schiemann, and co-workers carried out DEER measurements on pairs of rigid Ç labels incorporated in DNA duplexes.[109-111] The observed dipolar evolution traces varied with the frequency offset between the observer and pump pulses, which was attributed to selective excitation of spin sub-populations whose inter-spin vectors adopt different orientations relative to the external magnetic field. To analyze such data, one approach is to combine all traces measured at different frequency offsets and then treat the combined data in an orientation-independent fashion.[45, 111] Alternatively, a global simulation of all traces measured at different frequency offsets was developed based on a twist-stretch dynamic model of Ç within a DNA duplex, from which both the magnitude and the orientation of the inter-spin vector were determined.[109, 110]
4.2 Correlating the measured inter-spin distances to structure of the parent molecule
In SDSL studies, the distance measured between a pair of spin labels (r_spin) generally differs from the corresponding distance between the labeling sites on the target molecule (r_target). The offset between r_spin and r_target depends strongly on the chemical structure of the spin label, and can also vary significantly among individual labeling sites within the same target molecule. Therefore, one cannot assume a uniform offset between r_spin and r_target, even if the same spin label is used at different sites of a macromolecule.
Instead, one generally models the spin-label conformer (or ensemble of conformers) present at each labeling site, and r_spin is then computed from this model.[6, 9, 16] For rigidly attached labels, rotamer prediction is relatively simple. However, in the majority of studies the spin label is attached via rotatable bonds, and rotamer prediction requires a detailed understanding of the interactions between the label and the target macromolecule, which is non-trivial and an active area of investigation.[112]
Figure 7: The NASNOX program for correlating measured inter-nitroxide distances to the structure of the parent nucleic acid molecule. (A) Schematic of the R5 nitroxide. (B) Online interface of the web-based NASNOX program. (C) An example of a NASNOX-predicted distance on a DNA duplex. The DNA is colored gray, with the labeled nucleotide colored blue. Allowable conformers of R5 are colored green for the Rp diastereomer and red for the Sp diastereomer. (D) Correlation between the measured average distances (<r_DEER>) and the corresponding NASNOX predictions (<r_NASNOX>) in a DNA duplex. Adapted from [24, 33] with permission.
SDSL distance measurements in nucleic acids primarily use nitroxides, and label conformers have been predicted using a variety of approaches, including step-wise conformer searches, MD simulations, and geometric modeling.[6, 9, 16, 20] For example, Qin, Haworth, and co-workers developed a conformer search program, called NASNOX,[24, 33, 113] to identify sterically allowed conformers of the R5 label attached to a modified phosphorothioate in the backbone of DNA or RNA (Figure 7).
Starting from a fixed structure of the target nucleic acid (input in pdb format), the program models R5 at the target site using experimentally determined bond lengths and bond angles, then carries out a step-wise search over all combinations of the three torsion angles about the single bonds connecting the nitroxide pyrroline ring to the nucleic acid. Allowable conformers, defined as those with no steric collisions between the nitroxide and the parent nucleic acid, are retained. The resulting ensemble of allowable conformers is then used to compute the expected inter-spin distance distribution. NASNOX has been validated by calibration studies on both DNA[33, 113] and RNA[114] duplexes, as well as on protein/DNA complexes.[115, 116] The program is very efficient because it holds the target molecule structure fixed and considers only steric effects. This is highly beneficial when investigating systems with unknown structures (see below).
4.3 Examples of application
Over the past five years, SDSL-measured distances have been used to assess reported high-resolution structures of nucleic acids and protein/nucleic-acid complexes,[34, 45, 115, 116] and to examine nucleic acid conformations and their variations.[13, 34, 42, 111, 116-125] Selected examples are described below to illustrate strategies established in these studies, which should be generally applicable to nucleic acid SDSL.
4.3.1 Sequence-dependent shape of the p53 response elements
The tumor suppressor protein p53 is a master regulator that controls numerous signaling pathways.
As a transcription factor, p53 specifically recognizes a family of DNA sequences called the p53 response elements (REs), each consisting of two decameric half-sites that are either contiguous or separated by a spacer.[126] Hundreds of REs have been validated in humans and mice, yet it remains unclear how p53 interacts with different REs to elicit distinct biological outcomes, e.g., cell cycle arrest vs. apoptosis. Using SDSL-measured distances, Qin and co-workers investigated the conformations of two prototypic REs involved in p53 regulation of the p21 and BAX genes.[115, 116] Data obtained in the absence and presence of the p53 core DNA-binding domain (p53-DBD) revealed distinct RE conformational changes upon protein binding, and supported the hypothesis that sequence-dependent properties encoded in REs are exploited by p53 to achieve the most energetically favorable mode of deformation, thereby enhancing binding specificity.
This work illustrated a number of generalizable approaches for SDSL investigation of nucleic acid conformation. First, SDSL was used to assess crystal structures of bound REs.[115, 116] This is illustrated with the tetrameric BAX/p53-DBD complex (Figure 8), in which eight distances spanning the central region of the BAX-RE were measured in the p53-DBD-bound state. Each measured distance matched the corresponding value predicted from a recently determined crystal structure (PDB ID 4HJE), indicating that at the central region of the bound BAX-RE the crystal structure accurately reflects the conformation in solution. This is significant because the crystal structures show that the central region of the BAX-RE, which contains a one-base-pair spacer between the two half-sites, deforms upon p53-DBD binding, resulting in a tetrameric complex nearly identical to those reported for other REs without a spacer.
These SDSL measurements, carried out in solution, demonstrated that the deformation at the central region was not a crystal-packing artifact and is therefore an intrinsic feature of the p53/RE interaction.
Figure 8: SDSL assessment of DNA conformation in the BAX/p53-DBD complex. (A) BAX-RE sequence. (B) An example data set obtained with labels attached at A10 and A14. DEER data are shown on the left, and the NASNOX model on the right. (C) Comparison between measured and predicted distances at the central region of the DNA. Adapted from [115] with permission.
In addition, SDSL-measured distances were used to reveal DNA deformation upon p53-DBD binding.[116] For the p21-RE (Figure 9), p53-DBD binding changed several distances spanning the central region between the two half-sites by 2 to 5 Å. These changes exceed the error range and are not caused by changes in the nitroxide rotamer distribution; they directly reveal p53-induced RE conformational change. Note also that some distances are invariant upon p53-DBD binding (Figure 9B);[116] it is therefore important to measure multiple distances in such studies.
To understand the molecular details of p53-induced RE deformation, SDSL was combined with Monte Carlo (MC) simulations to derive models of the unbound p21- and BAX-REs.[116] This is important because no high-resolution structure of an unbound RE is available, despite a large number of bound RE structures. In these studies, multiple nanometer distances were measured in the p21-RE and BAX-RE using the R5 probe (Figure 2, I) and DEER (Figure 6A). In parallel, MC simulations were used to generate all-atom structures. Each model was evaluated with a scoring function that measures the deviation between the SDSL-measured distances and the corresponding distances predicted for that model by NASNOX.
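The model-evaluation step can be sketched as follows. This is an illustrative assumption, not the scoring function used in [116]: each labeling site is represented by a hypothetical ensemble of allowed nitroxide positions (such as a NASNOX-like conformer search would produce), the predicted distance for a site pair is the mean over all conformer pairs, and candidate models are ranked by the root-mean-square deviation (RMSD) from the measured distances. All names, coordinates, and distances below are made up.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def mean_pair_distance(site_a: Sequence[Point], site_b: Sequence[Point]) -> float:
    """Average distance over all pairs of allowed label conformers at two sites
    (each conformer reduced to one point, e.g. the N-O midpoint)."""
    dists = [math.dist(a, b) for a, b in product(site_a, site_b)]
    return sum(dists) / len(dists)


def rmsd_score(measured: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Root-mean-square deviation between measured and predicted distances
    over all labeled pairs present in `measured`."""
    sq = [(measured[k] - predicted[k]) ** 2 for k in measured]
    return math.sqrt(sum(sq) / len(sq))


def rank_models(measured: Dict[str, float],
                models: Dict[str, Dict[str, float]]) -> List[Tuple[float, str]]:
    """Return (score, model name) pairs, lowest RMSD (best agreement) first."""
    return sorted((rmsd_score(measured, pred), name) for name, pred in models.items())


if __name__ == "__main__":
    # Hypothetical conformer ensembles (Å) for one labeled pair.
    site_1 = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    site_2 = [(20.0, 0.0, 0.0)]
    print(f"<r> = {mean_pair_distance(site_1, site_2):.2f} Å")  # prints <r> = 20.05 Å

    # Hypothetical measured distances and per-model predictions (Å).
    measured = {"pair_1": 20.0, "pair_2": 30.0}
    models = {
        "model_A": {"pair_1": 21.0, "pair_2": 29.0},
        "model_B": {"pair_1": 25.0, "pair_2": 35.0},
    }
    for score, name in rank_models(measured, models):
        print(f"{name}: RMSD = {score:.1f} Å")  # model_A (1.0 Å) before model_B (5.0 Å)
```

Other choices, such as weighting each pair by its experimental uncertainty or comparing full distance distributions rather than averages, fit the same structure by swapping out `rmsd_score`.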
Models consistent with the measured distances were found (Figure 9C), and showed clear differences from a B-DNA built using uniform base-pair parameters. Thus, the combined SDSL/MC approach revealed sequence-dependent shapes of unbound REs in solution. Furthermore, comparisons between unbound and bound DNA structures revealed distinct modes of deformation at the central region of the duplex: the p21-RE undergoes a shift of the helical axis (Figure 9C), while the BAX-RE further unwinds.[116]
Overall, SDSL studies on the p21- and BAX-REs established approaches for experimentally probing DNA deformation, as well as for de novo mapping of DNA conformations. These approaches should be applicable to studying DNA shape in solution, both in the absence and presence of protein factors.
Figure 9: Deformation of p21-RE upon p53-DBD binding. (A) Nucleotide sequence of the p21-RE. (B) Examples of DEER data obtained in the absence (solid line) and presence (dashed line) of p53-DBD. (C) Comparison between the bound p21-RE (red), the best-fit model of the unbound DNA (green), and the top 20 unbound-DNA models (blue). Adapted from [116] with permission.
4.3.2 Conformational flexibility of DNA duplexes probed by inter-nitroxide distance measurements
In addition to the average or most probable distance, EPR-measured dipolar interactions between spin labels can yield a number of additional parameters, including the width of the distribution, the number of sub-populations, and the angular distribution of the inter-spin vector, which may provide very detailed information about the target macromolecule. One example comes from the work of Prisner and co-workers, in which the full range of parameters derived from EPR-measured dipolar interactions was used to examine collective modes of motion in DNA duplexes.[111] In that study, pairs of the rigid Ç label were incorporated into a series of 20-base-pair DNA duplexes.
Orientation-selective DEER experiments were carried out at both X-band and G-band (180 GHz/6.4 T), and the data were analyzed to obtain the average inter-spin distance, the width of the distance distribution, and the angular variations of the inter-spin vectors. The data collectively supported a dynamic model for double-stranded DNA in which stretching of the molecule is accompanied by a slight reduction of the helix radius, arising from a cooperative twist-stretch coupling. The study further demonstrated that SDSL-measured distances are not limited to investigations of static structures.
4.3.3 Mapping the global structure of a three-way junction in a phi29 packaging RNA dimer
SDSL-measured distances were used to map the conformation of a three-way junction in the packaging RNA (pRNA), which is part of an intricate ring-shaped protein/RNA complex in the DNA packaging motor of bacteriophage phi29. The phi29 packaging motor is one of the strongest biological motors, and its ATPase activity depends on the pRNA. Information on pRNA conformation is therefore needed to understand the mechanism of motor function. Qin and co-workers used the nucleotide-independent nitroxide probe to map the conformation of a pRNA three-way junction that bridges the binding sites for the motor ATPase and the procapsid (Figure 10).[34] A total of 17 inter-helical distances were measured in a pRNA dimer, which is the simplest ring-shaped pRNA complex and serves as a functional intermediate during motor assembly. In parallel, a pool of around 65 billion distinct models was built using rigid-body rotation and translation. The measured distances, together with steric constraints, were used as criteria to select viable three-way junction models from the pool. The results revealed a similar conformation among the viable models, with two of the helices (H_T and H_L) adopting an acute bend (Figure 10).
This finding contrasts with a reported pRNA tetramer crystal structure, in which H_T and H_L stack linearly onto each other. These studies established an SDSL method for mapping the global structures of complex RNA molecules, and provided information on pRNA conformation that aids investigations of the phi29 packaging motor and the development of pRNA-based nanomedicine and nanomaterials.
Figure 10: SDSL mapping of the three-way junction in the pRNA. Left: schematic showing the nitroxide probe used and the inter-helix distances measured; middle: an example of a measured distance; right: the best-fit model. Adapted from [34] with permission.
4.3.4 Monitoring conformational changes in folded RNAs
SDSL-measured inter-spin distances have been used to monitor RNA folding in response to ligands.[120-122] For example, DeRose and co-workers carried out SDSL distance measurements on the hammerhead ribozyme, a class of catalytic RNA whose motif consists of three A-form helices (stems) flanking a junction of conserved nucleotides. Using a pair of nitroxides attached to the 2’-sugar position of selected nucleotides within stems I and II, the inter-nitroxide distance was measured as Mg2+ was titrated into the system.[120] The results showed that as the Mg2+ concentration increases, the ribozyme converts from a set of randomly distributed configurations to a defined conformation compatible with the catalytically active form. The Mg2+ concentration at the half-maximum of the RNA conformational transition was lower than that estimated from cleavage rate measurements, indicating that the Mg2+ requirements differ between ribozyme folding and catalysis. SDSL has also been used to monitor ligand-induced conformational changes in the aptamer domain of riboswitches.
Prisner and co-workers used pulsed EPR to study an engineered neomycin-responsive riboswitch construct.[121] Four distances were measured and, surprisingly, they showed minimal changes upon neomycin binding. It was therefore concluded that the overall architecture of the riboswitch construct remains unchanged in the presence or absence of its target molecule, suggesting that the RNA forms a pre-existing neomycin binding pocket. In addition, Steinhoff and co-workers studied a different riboswitch aptamer, the tetracycline-binding switch.[122] Based on SDSL-measured distance distributions, they concluded that the RNA exists in two conformations, and that tetracycline binding efficiently shifts the equilibrium toward one of them.
4.3.5 In-cell pulsed EPR spectroscopy distance measurement
An exciting recent development is the demonstration of SDSL measurements in the intracellular environment or in cell extracts.[127, 128] In nucleic acid studies, spin-labeled DNAs or RNAs were injected into live Xenopus laevis oocytes, which provide a nucleus-like environment. After incubation, the samples were flash-frozen, and DEER measurements were carried out to determine inter-spin distances, which were used to gain structural information on the target molecule. Prisner and co-workers reported in-cell DEER measurements on spin-labeled RNA hairpins, riboswitches, and DNA duplexes,[127] and concluded that RNA folding is the same in cellulo as in vitro, whereas the short DNA duplex may stack in cellulo.
In separate studies, Hartig, Drescher, and co-workers demonstrated folding of a non-B DNA structure, the G-quadruplex, in an intracellular environment.[128] They observed two major populations in the in-cell DEER measurements, which were also observed in vitro.[128, 129] The measured distances were assigned to two topologically distinct folds of the G-quadruplex, and interconversion between these two populations was reported to be slower than in vitro. In-cell SDSL distance measurements open up new opportunities for directly assessing nucleic acid conformation in vivo. However, many challenges remain. For example, reported work was limited by the reduction of nitroxide-based spin labels in vivo.[127, 130] Extending the lifetime of the spin label, including exploring non-nitroxide labels (e.g., Gd3+ or trityl, Figure 2B, 2C), may be desirable.
5. Spin-labeling in NMR and EPR-NMR studies of nucleic acids
Spin-labeled nucleic acids have also been used in studies that gain information by monitoring interactions between electron and nuclear spins. Electron Nuclear Double Resonance (ENDOR) and Electron Spin Echo Envelope Modulation (ESEEM) directly probe nuclei in the immediate vicinity of a radical, and have been used to characterize metal ion binding sites in nucleic acids.[16, 46] Another application is Paramagnetic Relaxation Enhancement (PRE), in which the enhanced nuclear relaxation caused by unpaired electron(s) (i.e., spin labels) is used to obtain structural and dynamic information on macromolecules.[131] Varani and co-workers carried out PRE measurements between spin-labeled RNAs and proteins, and obtained distance constraints to characterize the structure of a protein/RNA complex.[27, 132, 133]
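The steep distance dependence that makes PRE useful can be sketched with the commonly used simplified Solomon-Bloembergen expression, Γ2 = (1/15)(μ0/4π)² γ_H² g² μ_B² S(S+1) r⁻⁶ [4τc + 3τc/(1 + ω_H²τc²)], which keeps only the spectral-density terms at zero frequency and at the proton Larmor frequency. The Python sketch below assumes a single S = 1/2 nitroxide, a point-dipole interaction, and one effective correlation time τc; the 5 ns and 600 MHz values are hypothetical, and the function names are illustrative.

```python
import math

# Physical constants (SI)
MU0_OVER_4PI = 1.0e-7       # T m A^-1
GAMMA_H = 2.6752218744e8    # 1H gyromagnetic ratio, rad s^-1 T^-1
MU_B = 9.2740100783e-24     # Bohr magneton, J T^-1
G_E = 2.0023                # electron g-factor (free-electron value, assumed)


def pre_gamma2(r_angstrom: float, tau_c_s: float, larmor_mhz: float,
               spin: float = 0.5) -> float:
    """Transverse 1H PRE rate (s^-1) from the simplified Solomon-Bloembergen
    equation, keeping only the J(0) and J(omega_H) terms."""
    r = r_angstrom * 1e-10                        # Å -> m
    omega_h = 2.0 * math.pi * larmor_mhz * 1e6    # rad s^-1
    k = (1.0 / 15.0) * MU0_OVER_4PI**2 * GAMMA_H**2 * G_E**2 * MU_B**2 * spin * (spin + 1.0)
    spectral = 4.0 * tau_c_s + 3.0 * tau_c_s / (1.0 + (omega_h * tau_c_s) ** 2)
    return k * spectral / r**6


def pre_distance(gamma2: float, tau_c_s: float, larmor_mhz: float,
                 spin: float = 0.5) -> float:
    """Invert Gamma2 = const / r^6 to obtain the electron-proton distance in Å."""
    gamma2_at_1a = pre_gamma2(1.0, tau_c_s, larmor_mhz, spin)
    return (gamma2_at_1a / gamma2) ** (1.0 / 6.0)


if __name__ == "__main__":
    tau_c, larmor = 5e-9, 600.0  # hypothetical: 5 ns correlation time, 600 MHz 1H frequency
    for r in (10.0, 15.0, 20.0, 25.0):
        print(f"r = {r:4.1f} Å  Gamma2 = {pre_gamma2(r, tau_c, larmor):9.2f} s^-1")
```

Because Γ2 scales as r⁻⁶, halving the distance raises the rate 64-fold, so the effect fades quickly at long range.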
Clore and co-workers have used PRE to study protein/nucleic-acid complexes; in particular, they have used paramagnetic metal compounds tagged to DNA (i.e., EDTA-Mn2+) to characterize transient, low-population states in protein/DNA complexes.[131] Very recently, nitroxide spin labels tagged at specific sites of RNAs have been used in PRE measurements to obtain distance constraints of up to 25 Å.[134, 135]
6. Conclusions and Perspectives
Data reported over the past five years indicate that a firm foundation for using SDSL to study nucleic acids has been established. Further advances in EPR technology, as well as in methods for manipulating spin labels and nucleic acids, will expand the capability of SDSL, particularly for studying large nucleic acids and protein/nucleic-acid complexes. Furthermore, synergistic combinations with other experimental (e.g., NMR, chemical probing, SAXS) and computational (e.g., MD, MC) techniques will expand the scope of the methodology and the questions that can be tackled by SDSL.
References
1. Altenbach, C., et al., Structural studies on transmembrane proteins. 2. Spin labeling of bacteriorhodopsin mutants at unique cysteines. Biochemistry, 1989. 28: p. 7806-7812.
2. Hubbell, W.L. and C. Altenbach, Investigation of structure and dynamics in membrane proteins using site-directed spin labeling. Curr. Opin. Struct. Biol., 1994. 4: p. 566-573.
3. Hubbell, W.L., D.S. Cafiso, and C. Altenbach, Identifying conformational changes with site-directed spin labeling. Nat. Struct. Biol., 2000. 7: p. 735-739.
4. Fanucci, G.E. and D.S. Cafiso, Recent Advances and applications of site-directed spin labeling. Curr. Opin. Struct. Biol., 2006. 16: p. 644-653.
5. Klare, J. and H.-J. Steinhoff, Spin labeling EPR. Photosynthesis Research, 2009. 102(2-3): p. 377-390.
6. Reginsson, G.W. and O. Schiemann, Studying biomolecular complexes with pulsed electron-electron double resonance spectroscopy. Biochem Soc Trans, 2011. 39(1): p. 128-39.
7. Hubbell, W.L., et al., Technological advances in site-directed spin labeling of proteins. Current Opinion in Structural Biology, 2013. 23(5): p. 725-733.
8. Bobst, A.M., T.K. Sinha, and Y.-C.E. Pan, Electron spin resonance for detecting polyadenylate tracts in RNA's. Science, 1975. 188: p. 153-155.
9. Sowa, G.Z. and P.Z. Qin, Site-directed spin labeling studies on nucleic acid structure and dynamics. Prog Nucleic Acid Res Mol Biol, 2008. 82: p. 147-97.
10. Zhang, Z., et al., Rotational dynamics of HIV-1 nucleocapsid protein NCp7 as probed by a spin label attached by peptide synthesis. Biopolymers, 2008. 89(12): p. 1125-35.
11. Stone, K.M., et al., Electron Spin Resonance Shows Common Structural Features for Different Classes of EcoRI–DNA Complexes. Angewandte Chemie International Edition, 2008. 47(52): p. 10192-10194.
12. Zhang, X., et al., Conformational distributions at the N-peptide/boxB RNA interface studied using site-directed spin labeling. RNA, 2010. 16: p. 2474-2483.
13. Freeman, A.D., et al., Analysis of conformational changes in the DNA junction-resolving enzyme T7 endonuclease I on binding a four-way junction using EPR. Biochemistry, 2011. 50(46): p. 9963-72.
14. Yang, Z., et al., ESR spectroscopy identifies inhibitory Cu2+ sites in a DNA-modifying enzyme to reveal determinants of catalytic specificity. Proceedings of the National Academy of Sciences, 2012. 109(17): p. E993-E1000.
15. Zhang, X., et al., Studying RNA using site-directed spin-labeling and continuous-wave electron paramagnetic resonance spectroscopy. Methods Enzymol., 2009. 469: p. 303-328.
16. Krstic, I., et al., Structure and dynamics of nucleic acids. Top Curr Chem, 2012. 321: p. 159-98.
17. Nguyen, P. and P.Z. Qin, RNA dynamics: perspectives from spin labels. WIREs: RNA, 2012. 3: p. 62-72.
18. Shelke, S.A. and S.T. Sigurdsson, Site-Directed Spin Labelling of Nucleic Acids. European Journal of Organic Chemistry, 2012(12): p. 2291-2301.
19. Zhang, X. and P.Z. Qin, Studying RNA Folding Using Site-Directed Spin Labeling, in Biophysics of RNA Folding, R. Russell, Editor. 2013, Springer New York. p. 69-87.
20. Fedorova, O.S. and Y.D. Tsvetkov, Pulsed electron double resonance in structural studies of spin-labeled nucleic acids. Acta Naturae, 2013. 5(1): p. 9-32.
21. Shelke, S.A. and S.T. Sigurdsson, Noncovalent and site-directed spin labeling of nucleic acids. Angew Chem Int Ed Engl, 2010. 49(43): p. 7984-6.
22. Song, Y., et al., Pulsed dipolar spectroscopy distance measurements in biomacromolecules labeled with Gd(III) markers. Journal of Magnetic Resonance, 2011. 210(1): p. 59-68.
23. Yang, Z., et al., Pulsed ESR dipolar spectroscopy for distance measurements in immobilized spin labeled proteins in liquid solution. J Am Chem Soc, 2012. 134(24): p. 9950-2.
24. Qin, P.Z., et al., Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nat. Protocols, 2007. 2(10): p. 2354-2365.
25. Edwards, T.E. and S.T. Sigurdsson, Site-specific incorporation of nitroxide spin-labels into 2'-positions of nucleic acids. Nature Protocols, 2007. 2: p. 1954-1962.
26. Kim, N., A. Murali, and V.J. DeRose, A distance ruler for RNA using EPR and site-directed spin labeling. Chem. Biol., 2004. 11: p. 939-48.
27. Ramos, A. and G. Varani, A new method to detect long-range protein-RNA contacts: NMR detection of electron-proton relaxation induced by nitroxide spin-labeled RNA. J. Am. Chem. Soc., 1998. 120: p. 10992-10993.
28. Qin, P.Z., et al., Monitoring RNA base structure and dynamics using site-directed spin labeling. Biochemistry, 2003. 42: p. 6772-83.
29. Jakobsen, U., et al., Site-directed spin-labeling of nucleic acids by click chemistry: detection of abasic sites in duplex DNA by EPR spectroscopy. J Am Chem Soc, 2010. 132(30): p. 10424-8.
30. Ding, P., et al., Site-Directed Spin-Labeling of DNA by the Azide–Alkyne ‘Click’ Reaction: Nanometer Distance Measurements on 7-Deaza-2′-deoxyadenosine and 2′-Deoxyuridine Nitroxide Conjugates Spatially Separated or Linked to a ‘dA-dT’ Base Pair. Chemistry – A European Journal, 2010. 16(48): p. 14385-14396.
31. Büttner, L., et al., Synthesis of spin-labeled riboswitch RNAs using convertible nucleosides and DNA-catalyzed RNA ligation. Bioorganic & Medicinal Chemistry, 2013. 21(20): p. 6171-6180.
32. Qin, P.Z., et al., Quantitative analysis of the GAAA tetraloop/receptor interaction in solution: A site-directed spin labeling study. Biochemistry, 2001. 40: p. 6929-6936.
33. Cai, Q., et al., Site-directed spin labeling measurements of nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nucleic Acids Res, 2006. 34(17): p. 4722-30.
34. Zhang, X., et al., Global structure of a three-way junction in a phi29 packaging RNA dimer determined using site-directed spin labeling. J Am Chem Soc, 2012. 134(5): p. 2644-52.
35. Spaltenstein, A., B.H. Robinson, and P.B. Hopkins, A rigid and nonperturbing probe for duplex DNA motion. J. Am. Chem. Soc., 1988. 110(4): p. 1299-1301.
36. Miller, T.R., et al., A Probe for Sequence-Dependent Nucleic Acid Dynamics. J. Am. Chem. Soc., 1995. 117(36): p. 9377-9378.
37. Gannett, P.M., et al., Probing triplex formation by EPR spectroscopy using a newly synthesized spin label for oligonucleotides. Nucleic Acids Res., 2002. 30: p. 5328-37.
38. Barhate, N., et al., A Nucleoside That Contains a Rigid Nitroxide Spin Label: A Fluorophore in Disguise. Angew. Chem. Int. Ed., 2007. 46(15): p. 2655-2658.
39. Höbartner, C., et al., Synthesis and Characterization of RNA Containing a Rigid and Nonperturbing Cytidine-Derived Spin Label. The Journal of Organic Chemistry, 2012. 77(17): p. 7749-7754.
40. Schiemann, O., et al., Spin labeling of oligonucleotides with the nitroxide TPA and use of PELDOR, a pulse EPR method, to measure intramolecular distances. Nat. Protoc., 2007. 2(4): p. 904-923.
41. Piton, N., et al., Base-specific spin-labeling of RNA for structure determination. Nucl. Acids Res., 2007. 35(9): p. 3128-3143.
42. Sicoli, G., et al., Probing secondary structures of spin-labeled RNA by pulsed EPR spectroscopy. Angew Chem Int Ed Engl, 2010. 49(36): p. 6443-7.
43. Shin, Y.K. and W.L. Hubbell, Determination of electrostatic potentials at biological interfaces using electron-electron double resonance. Biophys. J., 1992. 61: p. 1443-53.
44. Shelke, S.A. and S.T. Sigurdsson, Structural changes of an abasic site in duplex DNA affect noncovalent binding of the spin label c. Nucleic Acids Res, 2012. 40(8): p. 3732-40.
45. Reginsson, G.W., et al., Protein-induced changes in DNA structure and dynamics observed with noncovalent site-directed spin labeling and PELDOR. Nucleic Acids Res, 2013. 41(1): p. e11.
46. Hunsicker-Wang, L., M. Vogt, and V.J. Derose, EPR methods to study specific metal-ion binding sites in RNA. Methods Enzymol, 2009. 468: p. 335-67.
47. Matalon, E., et al., Gadolinium(III) Spin Labels for High-Sensitivity Distance Measurements in Transmembrane Helices. Angewandte Chemie International Edition, 2013. 52(45): p. 11831-11834.
48. Yang, Z., et al., Pulsed ESR Dipolar Spectroscopy for Distance Measurements in Immobilized Spin Labeled Proteins in Liquid Solution. Journal of the American Chemical Society, 2012. 134(24): p. 9950-9952.
49. Reginsson, G.W., et al., Trityl Radicals: Spin Labels for Nanometer-Distance Measurements. Chemistry – A European Journal, 2012. 18(43): p. 13580-13584.
50. Berliner, L.J., ed. Spin Labeling: Theory and Applications. 1976, Academic Press: New York. 592.
51. Nordio, P.L., General Magnetic Resonance Theory, in Spin Labeling: Theory and Applications, L.J. Berliner, Editor. 1976, Academic Press: New York. p. 5-52.
52. Freed, J.H., Theory of Slow Tumbling ESR Spectra for Nitroxides, in Spin Labeling: Theory and Applications, L.J. Berliner, Editor. 1976, Academic Press: New York. p. 53-130.
53. Griffith, H.O. and P.C. Jost, Lipid spin labels in biological membranes, in Spin Labeling: Theory and Applications, L.J. Berliner, Editor. 1976, Academic Press: New York. p. 453-423.
54. Marsh, D., Electron spin resonance: Spin labels. Mol. Biol. Biochem. Biophys., 1981. 31: p. 51-142.
55. Mchaourab, H.S., et al., Motion of spin-labeled side chains in T4 lysozyme. Correlation with protein structure and dynamics. Biochemistry, 1996. 35: p. 7692-7704.
56. Columbus, L., et al., Molecular motion of spin-labeled side chains in α-helices: analysis by variation of side chain structure. Biochemistry, 2001. 40: p. 3828-3846.
57. Columbus, L. and W.L. Hubbell, A new spin on protein dynamics. Trends Biochem Sci, 2002. 27: p. 288-95.
58. Ding, Y., X. Zhang, K.W. Tham, and P.Z. Qin, Experimental mapping of DNA duplex shape enabled by global lineshape analyses of a sequence-independent nitroxide probe. Submitted, 2014.
59. Moro, G. and J.H. Freed, Efficient computation of magnetic resonance spectra and related correlation functions from stochastic Liouville equations. J. Phys. Chem., 1980. 84: p. 2837-2840.
60. Khairy, K., D. Budil, and P. Fajer, Nonlinear-least-squares analysis of slow motional regime EPR spectra. J Magn Reson, 2006. 183(1): p. 152-9.
61. Budil, D.E., et al., Nonlinear-least-squares analysis of slow-motion EPR spectra in one and two dimensions using a modified Levenberg-Marquardt algorithm. J. Magn. Reson. A, 1996. 120: p. 155-189.
62. Earle, K.A. and D.E. Budil, Calculating Slow-motion ESR Spectra of Spin-Labeled Polymers, in Advanced ESR Methods in Polymer Research, S. Schlick, Editor. 2006, John Wiley and Sons: New York. p. 53-83.
63. Robinson, B.H., L.J. Slusky, and F.P. Auteri, Direct simulation of continuous wave electron paramagnetic resonance spectra from Brownian dynamics trajectories. J. Chem. Phys., 1992. 96: p. 2609-2616.
64. Steinhoff, H.J. and W.L. Hubbell, Calculation of electron paramagnetic resonance spectra from Brownian dynamics trajectories: application to nitroxide side chains in proteins. Biophys J, 1996. 71(4): p. 2201-12.
65. Budil, D.E., et al., Calculating slow-motional electron paramagnetic resonance spectra from molecular dynamics using a diffusion operator approach. J Phys Chem A, 2006. 110(10): p. 3703-13.
66. DeSensi, S.C., et al., Simulation of nitroxide electron paramagnetic resonance spectra from brownian trajectories and molecular dynamics simulations. Biophys J, 2008. 94(10): p. 3798-809.
67. Sezer, D., J.H. Freed, and B. Roux, Multifrequency electron spin resonance spectra of a spin-labeled protein calculated from molecular dynamics simulations. J Am Chem Soc, 2009. 131(7): p. 2597-605.
68. Xi, X., et al., HIV-1 nucleocapsid protein NCp7 and its RNA stem loop 3 partner: rotational dynamics of spin-labeled RNA stem loop 3. Biochemistry, 2008. 47(38): p. 10099-110.
69. Grant, G.P.G., et al., Motions of the Substrate Recognition Duplex in a Group I Intron Assessed by Site-Directed Spin Labeling. J. Am. Chem. Soc., 2009. 131(9): p. 3136-3137.
70. Nguyen, P., et al., A Single-Stranded Junction Modulates Nanosecond Motional Ordering of the Substrate Recognition Duplex of a Group I Ribozyme. ChemBioChem, 2013. 14(14): p. 1720-1723.
71. Hougland, J.L., et al., How the Group I Intron Works: A Case Study of RNA Structure and Function, in RNA World, R.F. Gesteland, T.R. Cech, and J.F. Atkins, Editors. 2006, Cold Spring Harbor Laboratory Press: Cold Spring Harbor, New York. p. 133-205.
72. Edwards, T.E., et al., Site-specific incorporation of nitroxide spin-labels into internal sites of the TAR RNA; structure-dependent dynamics of RNA by EPR spectroscopy. J Am Chem Soc, 2001. 123(7): p. 1527-8.
73. Edwards, T.E., T.M. Okonogi, and S.T. Sigurdsson, Investigation of RNA-protein and RNA-metal ion interactions by electron paramagnetic resonance spectroscopy. The HIV TAR-Tat motif. Chem Biol, 2002. 9(6): p. 699-706.
74. Edwards, T.E. and S.T. Sigurdsson, Electron paramagnetic resonance dynamic signatures of TAR RNA-small molecule complexes provide insight into RNA structure and recognition. Biochemistry, 2002. 41(50): p. 14843-7.
75. Edwards, T.E. and S.T. Sigurdsson, EPR spectroscopic analysis of TAR RNA-metal ion interactions. Biochem Biophys Res Commun, 2003. 303(2): p. 721-5.
76. Edwards, T.E., B.H. Robinson, and S.T. Sigurdsson, Identification of amino acids that promote specific and rigid TAR RNA-tat protein complex formation. Chem Biol, 2005. 12(3): p. 329-37.
77. Qin, P.Z., J. Feigon, and W.L. Hubbell, Site-directed spin labeling studies reveal solution conformational changes in a GAAA tetraloop receptor upon Mg(2+)-dependent docking of a GAAA tetraloop. J Mol Biol, 2005. 351(1): p. 1-8.
78. Edwards, T.E. and S.T. Sigurdsson, EPR spectroscopic analysis of U7 hammerhead ribozyme dynamics during metal ion induced folding. Biochemistry, 2005. 44(38): p. 12870-8.
79. Kim, N.K., A. Murali, and V.J. DeRose, Separate metal requirements for loop interactions and catalysis in the extended hammerhead ribozyme. J Am Chem Soc, 2005. 127(41): p. 14134-5.
80. Popova, A.M. and P.Z. Qin, A nucleotide-independent nitroxide probe reports on site-specific stereomeric environment in DNA. Biophysical Journal, 2010. 99(7): p. 2180-2189.
81. Popova, A.M., et al., Nitroxide sensing of a DNA microenvironment: mechanistic insights from EPR spectroscopy and molecular dynamics simulations. J Phys Chem B, 2012. 116(22): p. 6387-96.
82. Popova, A.M., et al., Site-Specific DNA Structural and Dynamic Features Revealed by Nucleotide-Independent Nitroxide Probes. Biochemistry, 2009. 48(36): p. 8540-8550.
83. Keyes, R.S. and A.M. Bobst, Spin-labeled nucleic acids, in Biological Magnetic Resonance, L.J. Berliner, Editor. 1998, Plenum Press: New York. p. 283-338.
84. Robinson, B.H., C. Mailer, and G. Drobny, Site-specific dynamics in DNA: Experiments. Annu. Rev. Biophys. Biomol. Struct., 1997. 26: p. 629-58.
85. Smith, A.L., et al., Conformational equilibria of bulged sites in duplex DNA studied by EPR spectroscopy. J Phys Chem B, 2009. 113(9): p. 2664-75.
86. Cekan, P. and S.T. Sigurdsson, Identification of single-base mismatches in duplex DNA by EPR spectroscopy. J Am Chem Soc, 2009. 131(50): p. 18054-6.
87. Cekan, P. and S.T. Sigurdsson, Conformation and dynamics of nucleotides in bulges and symmetric internal loops in duplex DNA studied by EPR and fluorescence spectroscopies. Biochemical and Biophysical Research Communications, 2012. 420(3): p. 656-661.
88. Cekan, P., E.O. Jonsson, and S.T. Sigurdsson, Folding of the cocaine aptamer studied by EPR and fluorescence spectroscopies using the bifunctional spectroscopic probe C. Nucleic Acids Res, 2009. 37(12): p. 3990-5.
89. Schiemann, O. and T.F. Prisner, Long-range distance determinations in biomacromolecules by EPR spectroscopy. Q Rev Biophys, 2007. 40(1): p. 1-53.
90. Rabenstein, M.D. and Y.K. Shin, Determination of the distance between two spin labels attached to a macromolecule. Proc Natl Acad Sci U S A, 1995. 92(18): p. 8239-43.
91. Altenbach, C., et al., Estimation of inter-residue distances in spin labeled proteins at physiological temperatures: experimental strategies and practical limitations. Biochemistry, 2001. 40(51): p. 15471-82.
92. Steinhoff, H.J., et al., Determination of interspin distances between spin labels attached to insulin: comparison of electron paramagnetic resonance data with the X-ray structure. Biophys J, 1997. 73(6): p. 3287-98.
93. Borbat, P.P. and J.H. Freed, Multiple-quantum ESR and distance measurements. Chemical Physics Letters, 1999. 313(1-2): p. 145-154.
94. Borbat, P.P., et al., Electron spin resonance in studies of membranes and proteins. Science, 2001. 291(5502): p. 266-269.
95. Borbat, P.P., et al., Measurement of large distances in biomolecules using double-quantum filtered refocused electron spin-echoes. Journal of the American Chemical Society, 2004. 126(25): p. 7746-7747.
96. Milov, A.D., A.B. Ponomarev, and Y.D. Tsvetkov, Electron-electron double resonance in electron spin echo: Model biradical systems and the sensitized photolysis of decalin. Chemical Physics Letters, 1984. 110(1): p. 67-72.
97. Martin, R.E., et al., Determination of End-to-End Distances in a Series of TEMPO Diradicals of up to 2.8 nm Length with a New Four-Pulse Double Electron Electron Resonance Experiment. Angew. Chem. Int. Ed., 1998. 37(20): p. 2834-2837.
98. Pannier, M., et al., Dead-Time Free Measurement of Dipole-Dipole Interactions between Electron Spins. J. Magn. Reson., 2000. 142: p. 331-40.
99. Jeschke, G. and Y. Polyhach, Distance measurements on spin-labelled biomacromolecules by pulsed electron paramagnetic resonance. Phys Chem Chem Phys, 2007. 9(16): p. 1895-910.
100. Borbat, P.P., E.R. Georgieva, and J.H. Freed, Improved Sensitivity for Long-Distance Measurements in Biomolecules: Five-Pulse Double Electron-Electron Resonance. J Phys Chem Lett, 2013. 4(1): p. 170-175.
101. Schiemann, O., et al., A PELDOR based nanometer distance ruler for oligonucleotides. J. Am. Chem. Soc., 2004. 126: p. 5722-9.
102. Schiemann, O., et al., Nanometer distance measurements on RNA using PELDOR. J Am Chem Soc, 2003. 125(12): p. 3434-5.
103. Jeschke, G., et al., Data analysis procedures for pulse ELDOR measurements of broad distance distributions. Applied Magnetic Resonance, 2004. 26(1-2): p. 223-244.
104. Bowman, M.K., et al., Visualization of distance distribution from pulsed double electron-electron resonance data. Appl. Magn. Reson., 2004. 26: p. 23-39.
105. Chiang, Y.W., P.P. Borbat, and J.H.
Freed, Maximum entropy: a complement to Tikhonov regularization for determination of pair distance distributions by pulsed ESR. J. Magn. Reson., 2005. 177: p. 184-196. 106. Jeschke, G., et al., DeerAnalysis2006 - a comprehensive software package for analyzing pulsed ELDOR data. Applied Magnetic Resonance, 2006. 30(3-4): p. 473-498. 107. Sen, K.I., Fajer, P. G., Analysis of DEER signal with DEFit. EPR Newsletter, 2009. 19: p. 26-28. 108. Vileno, B., et al., Broad disorder and the allosteric mechanism of myosin II regulation by phosphorylation. Proc Natl Acad Sci U S A, 2011. 108(20): p. 8218-23. 52 109. Schiemann, O., et al., Relative orientation of rigid nitroxides by PELDOR: beyond distance measurements in nucleic acids. Angew Chem Int Ed Engl, 2009. 48(18): p. 3292-5. 110. Marko, A., et al., Analytical method to determine the orientation of rigid spin labels in DNA. Phys Rev E Stat Nonlin Soft Matter Phys, 2010. 81(2 Pt 1): p. 021911. 111. Marko, A., et al., Conformational Flexibility of DNA. Journal of the American Chemical Society, 2011. 133(34): p. 13375-13379. 112. Jeschke, G., Conformational dynamics and distribution of nitroxide spin labels. Progress in Nuclear Magnetic Resonance Spectroscopy, 2013. 72(0): p. 42-60. 113. Price, E.A., et al., Computation of nitroxide-nitroxide distances for spin-labeled DNA duplexes. Biopolymers, 2007. 87: p. 40-50. 114. Cai, Q., et al., Nanometer distance measurements in RNA using site-directed spin labeling. Biophys J, 2007. 93(6): p. 2110-7. 115. Chen, Y., et al., Structure of p53 binding to the BAX response element reveals DNA unwinding and compression to accommodate base-pair insertion. Nucleic Acids Res, 2013. 41(17): p. 8368-76. 116. Zhang, X., et al., Conformations of p53 response elements in solution deduced using site- directed spin labeling and Monte Carlo sampling. Nucleic Acids Res, 2014. 42(4): p. 2789-97. 117. 
Sicoli, G., et al., Lesion-induced DNA weak structural changes detected by pulsed EPR spectroscopy combined with site-directed spin labelling. Nucleic Acids Res, 2009. 37(10): p. 3165-76. 118. Kuznetsov, N.A., et al., PELDOR study of conformations of double-spin-labeled single- and double-stranded DNA with non-nucleotide inserts. Phys Chem Chem Phys, 2009. 11(31): p. 6826-32. 119. Santangelo, M.G., et al., Can copper(II) mediate Hoogsteen base-pairing in a left-handed DNA duplex? A pulse EPR study. Chemphyschem, 2010. 11(3): p. 599-606. 120. Kim, N.K., M.K. Bowman, and V.J. DeRose, Precise mapping of RNA tertiary structure via nanometer distance measurements with double electron-electron resonance spectroscopy. J Am Chem Soc, 2010. 132(26): p. 8882-4. 121. Krstic, I., et al., PELDOR spectroscopy reveals preorganization of the neomycin- responsive riboswitch tertiary structure. J Am Chem Soc, 2010. 132(5): p. 1454-5. 122. Wunnicke, D., et al., Ligand-induced conformational capture of a synthetic tetracycline riboswitch revealed by pulse EPR. RNA, 2011. 17(1): p. 182-8. 123. Kuznetsov, N.A., et al., PELDOR analysis of enzyme-induced structural changes in damaged DNA duplexes. Mol Biosyst, 2011. 7(9): p. 2670-80. 124. Flaender, M., et al., A triple spin-labeling strategy coupled with DEER analysis to detect DNA modifications and enzymatic repair. Chembiochem, 2011. 12(17): p. 2560-3. 125. Wunnicke, D., et al., Site-Directed Spin Labeling of DNA Reveals Mismatch-Induced Nanometer Distance Changes between Flanking Nucleotides. The Journal of Physical Chemistry B, 2012. 116(14): p. 4118-4123. 126. Riley, T., et al., Transcriptional control of human p53-regulated genes. Nat Rev Mol Cell Biol, 2008. 9(5): p. 402-412. 127. Krstic, I., et al., Long-range distance measurements on nucleic acids in cells by pulsed EPR spectroscopy. Angew Chem Int Ed Engl, 2011. 50(22): p. 5070-4. 53 128. 
Azarkh, M., et al., Intracellular conformations of human telomeric quadruplexes studied by electron paramagnetic resonance spectroscopy. Chemphyschem, 2012. 13(6): p. 1444-7. 129. Singh, V., et al., Conformations of individual quadruplex units studied in the context of extended human telomeric DNA. Chemical Communications, 2012. 48(66): p. 8258-8260. 130. Azarkh, M., et al., Site-directed spin-labeling of nucleotides and the use of in-cell EPR to determine long-range distances in a biologically relevant environment. Nat Protoc, 2013. 8(1): p. 131-47. 131. Clore, G.M. and J. Iwahara, Theory, Practice, and Applications of Paramagnetic Relaxation Enhancement for the Characterization of Transient Low-Population States of Biological Macromolecules and Their Complexes. Chemical Reviews, 2009. 109(9): p. 4108-4139. 132. Ramos, A., P. Bayer, and G. Varani, Determination of the structure of the RNA complex of a double-stranded RNA-binding domain from Drosophila Staufen protein. Biopolymers, 1999. 52: p. 181-196. 133. Ramos, A., et al., RNA recognition by a Staufen double-stranded RNA-binding domain. EMBO J., 2000. 19: p. 997-1009. 134. Wunderlich, C.H., et al., A Novel Paramagnetic Relaxation Enhancement Tag for Nucleic Acids: A Tool to Study Structure and Dynamics of RNA. ACS Chemical Biology, 2013. 8(12): p. 2697-2706. 135. Helmling, C., et al., Noncovalent Spin Labeling of Riboswitch RNAs To Obtain Long- Range Structural NMR Restraints. ACS Chemical Biology, 2014. 54 Chapter 3 Experimental mapping of DNA duplex shape enabled by global lineshape analyses of a sequence-independent nitroxide probe* Abstract Sequence-dependent variation in structure and dynamics of a DNA duplex, collectively referred to as “DNA shape”, critically impacts interactions between DNA and proteins. Here a method based on the technique of site-directed spin labeling was developed to experimentally map shapes of two DNA duplexes that contain response elements of the p53 tumor suppressor. 
An R5a nitroxide spin label, covalently attached at a specific phosphate group, was scanned consecutively through the DNA duplex. X-band continuous-wave electron paramagnetic resonance spectroscopy was used to monitor rotational motions of R5a, which report on DNA structure and dynamics at the labeling site. An approach based on Pearson coefficient analysis was developed to collectively examine the degree of similarity among the ensemble of R5a spectra. The resulting Pearson coefficients were used to generate maps representing variation of R5a mobility along the DNA duplex. The R5a mobility maps were found to correlate with maps of certain DNA helical parameters, and were capable of revealing similarity and deviation in the shapes of the two closely related DNA duplexes. Collectively, the R5a probe and the Pearson coefficient based lineshape analysis scheme yielded a generalizable method for examining sequence-dependent DNA shapes.

*Ding, Y., Zhang, X., Tham, K.W. and Qin, P.Z. (2014) "Experimental mapping of DNA duplex shape enabled by global lineshape analyses of a nucleotide-independent nitroxide probe." Nucl. Acids Res., 42, e140.

1. Introduction

DNA shape refers to sequence-dependent structural and dynamic variations in a double-stranded duplex. At the global level, variations of DNA shape manifest as polymorphisms of the double helix (e.g., B-, A-, and Z-DNA) and as different propensities for bending. At the local level (i.e., a base pair or base-pair step), shape may vary both in geometric (structural) characteristics (e.g., narrowing of the minor groove in B-DNA; DNA kinks) and in elastic properties (or deformability), which are characterized by the energetics of relative rotation and displacement of neighboring base pairs (1,2). The shape of a duplex, which is encoded by its sequence, critically influences interactions between DNA and other molecules, such as proteins, small ligands, and metal ions (1,2).
As such, information on DNA shape is essential for understanding and manipulating biological functions. However, knowledge of sequence-dependent DNA shape, particularly in naked DNA, remains limited (1,2). Currently, our understanding of sequence-dependent DNA shape is derived largely from computational analyses. Most notably, Molecular Dynamics (reviewed in (3)) and Monte Carlo (4) simulations have been reported for DNAs. In addition, bioinformatic approaches have been developed to analyze DNA and protein-DNA complex structures in the Protein Data Bank, and the resulting sequence-dependent helical parameters have been used to represent DNA shape (5). On the other hand, experimental probing of naked DNA shape, which is needed to validate and refine computational results, is challenging. Footprinting experiments, such as those using hydroxyl radicals (6), have been used to probe DNA shape at the genomic scale, but their structural resolution is limited. X-ray crystallography and NMR spectroscopy have provided high-resolution structures of DNAs; however, their number is small compared to the available data for protein-DNA complexes (4). In addition, X-ray crystallography studies are hindered by crystal-packing biases, and NMR studies are constrained by the size of the DNA. We have been exploring a biophysical technique, site-directed spin labeling (SDSL), to probe the sequence-dependent shape of DNAs. In SDSL, a stable nitroxide radical is attached at a specific site of a biomolecule, and Electron Paramagnetic Resonance (EPR) spectroscopy is used to monitor the behavior of the nitroxide, from which structural and dynamic information at the labeling site can be derived (7). SDSL has matured as a technique for studying protein structure and dynamics (8,9).
For SDSL studies of nucleic acids, information has been obtained primarily from nanometer distances measured between pairs of nitroxides, as well as from nitroxide mobility on the 0.5–20 ns timescale derived from X-band continuous-wave (cw-) EPR spectroscopy (10-15). A powerful tool commonly used in protein SDSL studies is to scan a nitroxide probe through consecutive sites within a segment of the primary sequence. By collectively analyzing patterns of the measured cw-EPR spectra, one can obtain information such as the underlying protein secondary structure, as well as the spatial organization of these secondary structure elements with respect to each other and to the environment (e.g., a lipid bilayer) (7,8). Conceptually, scanning a nitroxide consecutively through a DNA duplex could reveal structural and dynamic variations along the DNA, thus yielding information on DNA shape. However, to our knowledge, consecutive nitroxide scanning has not been reported in nucleic acids, although a number of studies have attached nitroxides to multiple sites within the target molecule (e.g. (16-22)). We have developed a family of nucleotide-independent nitroxide probes that can be covalently attached at the phosphate backbone of a DNA or RNA (23,24). We have shown that X-band cw-EPR spectra of this family of probes, particularly one designated as R5a (Figure 1), vary between different internal sites of DNA duplexes (21,25,26). Such spectral variations arise from differences in how the DNA modulates the nanosecond rotational motions of the nitroxide, and therefore encode information on DNA local structural and dynamic features (21,25,26). Taking advantage of the fact that R5a can be attached to an arbitrary target DNA site, we present here work on using R5a “scanning” to map DNA duplex shape.
R5a was attached, consecutively and one nucleotide at a time, within two DNA duplexes that contain the p53 tumor suppressor response elements located at the promoters of the BAX and p21 genes, respectively (Figure 1A) (27,28). X-band cw-EPR spectra were measured under conditions at which lineshape variations report on differences in the local DNA environment, and the degree of similarity among the ensemble of R5a spectra was characterized using a newly developed approach based on Pearson coefficient analysis. The maps of the resulting Pearson coefficients along these two sequences, which represent R5a mobility variations along the duplex, provided a measure of DNA shape at the level of individual base pairs, and revealed sequence-dependent shape variations between the two closely related duplexes. Overall, the R5a probe and the Pearson coefficient based lineshape analysis tool yielded a generalizable method for examining sequence-dependent DNA shapes.

Figure 1: Experimental design. (A) Sequences of DNA duplexes studied. Both duplexes are REs of the p53 tumor suppressor, with two decameric half-sites (marked by brackets), each containing a “CWWG” core (bold). Lower case letters in the BAX duplex mark the one-base-pair spacer between the two half-sites. No spacer is present in p21. For each duplex, the numbering scheme of the phosphates is shown, and the spin-labeled sites are indicated in red. (B) Chemical structure of the R5a spin label. (C) A schematic showing EPR sample assembly. One of the DNA strands was spin-labeled (left), paired with the complementary biotinylated strand (middle), then tethered to a streptavidin tetramer to form an approximately 103 kD complex (right). Representative EPR spectra are shown below each step.

2. Materials and Methods

2.1 Preparation of spin-labeled DNA

Figure 1A shows the sequences of the DNAs studied.
For each duplex, one of the strands was subjected to spin labeling, while the complementary strand carried no nucleotide modification but had a biotin attached at its 5’ terminus. All DNA oligonucleotides were prepared by solid-phase chemical synthesis (Integrated DNA Technologies, Coralville, IA). Nitroxide spin labels, designated as R5a (Figure 1B), were site-specifically attached to DNAs using the phosphorothioate scheme as previously described (21,23,24). Briefly, during chemical synthesis, the phosphate group at the intended labeling site of a DNA strand is modified to a phosphorothioate. To attach the R5a label, crude DNA (~0.6 mM) was incubated with 200 mM 4-bromo-3-bromomethyl-2,2,5,5-tetramethyl-1-oxylpyrroline (R5a precursor) in a solution of 0.1 M MES (pH 5.8) and 40% (v/v) acetonitrile. The reaction mixture was incubated in the dark at room temperature for 24 hr under constant shaking. Labeled DNA was purified by anion-exchange HPLC, followed by desalting using a reverse-phase column (21,24). Desalted oligonucleotides were then lyophilized, resuspended in water, and stored at -20°C. The final concentrations of DNA were determined by absorbance at 260 nm using the extinction coefficients listed in Supplemental Information (SI) Table S1.

2.2 EPR sample preparation

Each EPR sample contained approximately 50 µM of R5a-labeled DNA duplex, and was assembled in two steps (Figure 1C). First, a stock solution of the DNA duplex (200 µM) was prepared by annealing the R5a-labeled strand with the proper biotin-labeled complementary strand at a ratio of 1 to 1.1. After mixing appropriate amounts of the respective strands, the mixture was heated at 95°C for 1 min and then cooled at room temperature for 1 min. Salt was then added to final concentrations of 50 mM Tris-HCl (pH 7.5) and 100 mM NaCl. The solution was left standing at room temperature for > 1 hr to allow duplex formation.
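The concentration and mixing arithmetic behind the first assembly step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' protocol: the Beer-Lambert relation c = A260/(ε·l) is standard, but the absorbance, dilution, extinction coefficient, stock concentrations, and final volume below are hypothetical placeholders (the actual ε values are in SI Table S1), and the function names are invented for illustration.

```python
def conc_from_a260(a260, epsilon_per_M_cm, path_cm=1.0, dilution=1.0):
    """Beer-Lambert law, c = A / (epsilon * l); returns micromolar (uM)."""
    return a260 * dilution / (epsilon_per_M_cm * path_cm) * 1e6


def annealing_volumes(final_uL, duplex_uM, labeled_stock_uM, biotin_stock_uM,
                      complement_excess=1.1):
    """Volumes (uL) of the R5a-labeled strand and the biotinylated complement.

    The labeled strand sets the duplex concentration; the complement is added
    at `complement_excess` (1.1 for the 1:1.1 ratio described in the text).
    """
    v_labeled = final_uL * duplex_uM / labeled_stock_uM
    v_biotin = final_uL * duplex_uM * complement_excess / biotin_stock_uM
    if v_labeled + v_biotin > final_uL:
        raise ValueError("strand stocks are too dilute for the requested duplex")
    return v_labeled, v_biotin


# Hypothetical numbers: a 1:200 dilution reading A260 = 0.75 with an assumed
# epsilon of 150,000 M^-1 cm^-1 corresponds to a ~1000 uM strand stock.
stock_uM = conc_from_a260(0.75, 150_000, dilution=200)
v_lab, v_bio = annealing_volumes(20.0, 200.0, stock_uM, stock_uM)
print(f"stock {stock_uM:.0f} uM; labeled {v_lab:.2f} uL, biotinylated {v_bio:.2f} uL")
```

The remaining volume (here about 11.6 µL of a 20 µL, 200 µM duplex stock) would be made up with water and the Tris-HCl/NaCl buffer components.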
In the second step, a spin-labeled DNA duplex was mixed with streptavidin (Amresco, Irvine CA) at a concentration ratio of 1 to 1.5 (duplex vs. streptavidin monomer) in a solution containing 50 mM HEPES (pH 7.5), 100 mM NaCl, and 5 mM MgCl2. The mixture was incubated at room temperature for 2 hr, and then used immediately for cw-EPR measurement. DNA tethering to streptavidin was confirmed by gel shift assays (SI Figure S1). Each labeled DNA sample was designated by its strand identity and labeling site. For example, “BAX_4” represents the BAX duplex with R5a attached at nucleotide 4, i.e., at the phosphorothioate between A3 and C4 (Figure 1A). Note also that the Rp and Sp phosphorothioate diastereomers present at each attachment site were not separated (21).

2.3 X-band EPR spectroscopy

Each EPR spectrum was obtained using ~5 µL of sample loaded in a round glass capillary (0.6 mm i.d. × 0.8 mm o.d.; Vitrocom, Inc., Mountain Lakes, NJ) sealed at one end. X-band EPR spectra were acquired at room temperature on a Bruker EMX spectrometer using an ER4119HS cavity. The incident microwave power was 2 mW, and the field modulation was 1 G at a frequency of 100 kHz. Each spectrum was acquired with 512 points, corresponding to a spectral range of 100 G. All spectra reported had a signal-to-noise ratio (S/N) over 350, with S/N defined as the central-line peak-to-peak height (i.e., signal) divided by the standard deviation of the first 25 points in the low-field region (i.e., noise). Spectra were corrected for free spin label if necessary, and normalized to the same number of spins using software kindly provided by the Hubbell group at UCLA. No baseline correction was carried out due to the high S/N of the samples studied.

2.4 Computation of Pearson coefficient

The Pearson coefficient (P), which measures the degree of linear correlation between data sets X_i and Y_i (i = 1, ..., n), is defined as:

$$P = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_X}\right)\left(\frac{Y_i-\bar{Y}}{s_Y}\right) \qquad (1)$$

where $\bar{X}$ and $\bar{Y}$ are the means of the respective data sets, and $s_X$ and $s_Y$ are the corresponding standard deviations. As defined, a Pearson coefficient may vary between +1 and -1, with +1 indicating that the two data sets correlate positively in a perfectly linear fashion, -1 indicating perfectly linear negative correlation, and 0 indicating no linear correlation.

A set of in-house programs written in Matlab (R2011b 7.13.0.564, The MathWorks, Inc.) was developed to automatically compute Pearson coefficients between any pair of EPR spectra. To compute a Pearson coefficient, the two spectra were first aligned at the center, defined as the point at which the amplitude changes sign within the center manifold. The spectral region spanning 40 G on either side of the center point was then selected and digitized into 412 data points. Following spectral alignment and range selection, the pair-wise Pearson coefficient was calculated using the built-in Matlab function “corr(x,y)”.

2.5 Prediction of sequence-dependent DNA shape parameters

Sequence-dependent DNA helical parameters were obtained using the “High-throughput DNA shape prediction” webserver (http://rohslab.cmb.usc.edu/DNAshape/) (4), which currently provides predictions for Roll, Helical Twist, Minor Groove Width, and Propeller Twist.

3. Results

3.1 Nitroxide scanning of DNA duplexes using an optimized streptavidin tethering scheme

Studies reported here used two DNA duplexes, designated as BAX and p21 (Figure 1A). Both duplexes are response elements (RE) of the p53 tumor suppressor, and conform to a consensus sequence of two decameric “half-sites” (RRRCWWGYYY; R = A, G; W = A, T; Y = C, T) that are either consecutive or separated by a spacer (27,28). Recently, using EPR-measured inter-nitroxide distances, we demonstrated that both REs deform at the central region between the two half-sites in response to binding by the p53 core domain, but that their modes of deformation differ from each other (29).
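The spectral comparison procedure of Section 2.4 (center alignment, a ±40 G window digitized into 412 points, then Eq. (1)) can be sketched in plain Python. This is an illustrative re-implementation, not the authors' Matlab code. Two choices are assumptions about how the alignment and digitization could be done: the center is taken as the positive-to-negative zero crossing nearest the middle of the sweep, and resampling uses linear interpolation. The synthetic three-line spectrum (`toy_triplet`, a hypothetical helper) exists only for testing.

```python
import math


def pearson(x, y):
    """Eq. (1): sum of products of deviations over (n - 1) * s_x * s_y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)


def center_field(field, amp):
    """Field (G) of the +/- zero crossing closest to mid-sweep (assumed to be
    the central line), refined by linear interpolation."""
    mid = len(amp) // 2
    crossings = [i for i in range(len(amp) - 1) if amp[i] > 0 >= amp[i + 1]]
    if not crossings:
        raise ValueError("no positive-to-negative zero crossing found")
    i = min(crossings, key=lambda k: abs(k - mid))
    frac = amp[i] / (amp[i] - amp[i + 1])
    return field[i] + frac * (field[i + 1] - field[i])


def centered_window(field, amp, half_width=40.0, npts=412):
    """Resample amp onto npts evenly spaced points spanning center +/- half_width."""
    c = center_field(field, amp)
    if c - half_width < field[0] or c + half_width > field[-1]:
        raise ValueError("window extends beyond the recorded sweep")
    out, j = [], 0
    for k in range(npts):
        b = c - half_width + 2.0 * half_width * k / (npts - 1)
        while j < len(field) - 2 and field[j + 1] < b:
            j += 1
        t = (b - field[j]) / (field[j + 1] - field[j])
        out.append(amp[j] + t * (amp[j + 1] - amp[j]))
    return out


def spectrum_p(field_a, amp_a, field_b, amp_b):
    """Align both spectra at their centers, then apply Eq. (1)."""
    return pearson(centered_window(field_a, amp_a), centered_window(field_b, amp_b))


def toy_triplet(field, center, split=17.0, width=1.0):
    """Synthetic first-derivative Lorentzian triplet (test data only)."""
    def d_lor(b, b0):
        u = (b - b0) / width
        return -2.0 * u / (width * (1.0 + u * u) ** 2)
    return [sum(d_lor(b, center + s) for s in (-split, 0.0, split)) for b in field]


field = [3300.0 + 100.0 * k / 511 for k in range(512)]   # 512 points, 100 G sweep
a = toy_triplet(field, 3350.0)
b = toy_triplet(field, 3352.0)                           # same shape, shifted 2 G
print(round(spectrum_p(field, a, field, b), 4))
```

Because both spectra are resampled onto the same grid relative to their own centers, a pure field offset between two otherwise identical spectra leaves P close to 1.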
The BAX and p21 duplexes each have a molecular weight of approximately 13 kD (Figure 1A). In aqueous solution at room temperature, the rotational correlation time of each duplex is estimated to be approximately 7 ns (30). This global tumbling of the duplex averages the anisotropic magnetic tensors of the nitroxide in a site-independent manner and, at X-band (~9.5 GHz), diminishes the observable site-dependent spectral variations. Interference due to site-independent global tumbling is not a problem unique to the current work. One commonly used approach to address this issue is to increase solution viscosity by including a co-solvent (e.g., sucrose (16,18,21,31)). However, the co-solvent may affect not only global tumbling but also local motions of the target molecules and/or the spin labels (32), potentially introducing artifacts. In this study, instead of adding a co-solvent to minimize spectral interference due to global tumbling of BAX and p21, a previously reported tethering scheme (33) was further optimized and implemented (see Methods, Figures 1C and S1). In this scheme, the spin-labeled DNA duplex was tethered to streptavidin to form a complex, increasing the overall molecular weight by ~8-fold. Tethering efficiently reduced the overall rotational motion of the duplex without interfering with local motions at the labeling sites, rendering the observed X-band EPR spectrum more sensitive to the local DNA environment. Using this tethering scheme, spectra were obtained for 16 and 15 consecutive internal sites along one strand of the BAX and p21 duplexes, respectively (Figure 2). Spectral variations, such as in hyperfine splitting, central line width, and features of the low- and high-field manifolds, can be observed among these spectra. Prior studies have shown that R5a minimally perturbs the native local environment of a DNA duplex (21,34).
Furthermore, the observed R5a spectra were not affected by changing the location of the labeling site relative to the tethering point (SI Figure S2), indicating that R5a did not directly contact streptavidin. In addition, although four DNA duplexes were present on one streptavidin tetramer (Figures 1C and S1), under the experimental conditions the observed spectra were not biased by spin-spin interactions between the duplexes (SI Figure S3). Overall, the observed lineshape variations report on differences in the local DNA environment.

Figure 2: X-band cw-EPR spectra of R5a-labeled, streptavidin-tethered BAX (panel A) and p21 (panel B) duplexes. All spectra were corrected for free spin label (< 5%) if necessary, normalized, and aligned at the central line. The S/N ratio ranged between 350 and 2300.

3.2 Pair-wise Pearson coefficients for assessment of EPR lineshape variation

To retrieve information on DNA shape from the ensemble of R5a spectra, a numerical value encoding the relevant spectral characteristics is needed. In protein nitroxide scanning, this can be accomplished by using parameters such as the central line width (ΔH_pp) or the effective hyperfine splitting (2A_eff) to empirically represent nitroxide mobility (7,8). However, 3 of the 31 R5a spectra measured did not show a discernible 2A_eff (BAX_9, p21_6, p21_16; Figure 2). The ΔH_pp values varied within a narrow range (1.5–2.3 G), as variation along a DNA duplex is likely smaller than that in a folded protein. As such, the traditional lineshape parameters are not as effective for analyzing R5a spectra in DNA mapping. We therefore developed a new spectral analysis tool based on the Pearson coefficient to retrieve information collectively embedded in the R5a spectra (see Methods). The Pearson coefficient, which ranges from +1 to -1, assesses the degree of linear correlation between two sets of variables.
A Pearson coefficient of +1 indicates two perfectly positively correlated data sets, which in this work corresponds to EPR spectra with identical lineshapes. Indeed, when BAX_19 spectra (see Methods for nomenclature) obtained from three different sample preparations were analyzed, the pair-wise Pearson coefficients (P) averaged 0.999 with a standard deviation of 0.001 (Figure 3A). Furthermore, control studies showed that the P values remained invariant to the third significant figure upon: (i) decreasing the S/N ratio down to 300 (SI Figure S4); (ii) spectral normalization (SI Figure S5); and (iii) extending the spectral range from 80 G to 100 G. Overall, for the reported R5a spectra, the Pearson coefficients were deemed accurate to three significant figures.

Figure 3: Characteristics of pair-wise Pearson coefficients. (A) Overlay of BAX_19 spectra obtained from three independent sample preparations. The pair-wise P values were 0.99892, 0.99968, and 0.99822, respectively. (B) Examples of pair-wise spectral comparisons with their corresponding P values shown. (C) Examples showing that a pair of spectra (BAX_5 and p21_12, center) with a high P value compared similarly to a third spectrum (top: p21_5; bottom: BAX_13).

Figure 3B shows examples of pair-wise spectral comparisons, which demonstrate that decreasing P values report on increasing spectral differences. The P value was 0.995 between BAX_4 and p21_4, both of which were obtained at an ApC dinucleotide step, and direct comparison revealed highly similar lineshapes. When BAX_4 was compared to BAX_5, a larger spectral difference in the splitting of the low- and high-field manifolds was observed, and a lower P value of 0.938 was obtained.
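The S/N and normalization controls above can be mimicked numerically. The sketch below is hypothetical: it uses a synthetic three-line spectrum and Gaussian noise rather than measured R5a data, and the helper names are invented for illustration. It computes S/N as defined in Section 2.3 (central-line peak-to-peak height divided by the standard deviation of the first 25 low-field points) and checks that P between clean and noisy copies stays close to 1 at S/N ≈ 300. It also checks that rescaling a spectrum, as normalization to spin count does, leaves P unchanged; this follows from Eq. (1) being invariant to x → a·x + b for a > 0.

```python
import math
import random


def pearson(x, y):
    """Eq. (1) of Section 2.4, written with sums of cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def toy_triplet(field, center=3350.0, split=17.0, width=1.0):
    """Synthetic first-derivative Lorentzian triplet (illustration only)."""
    def d_lor(b, b0):
        u = (b - b0) / width
        return -2.0 * u / (width * (1.0 + u * u) ** 2)
    return [sum(d_lor(b, center + s) for s in (-split, 0.0, split)) for b in field]


def central_peak_to_peak(field, amp, center=3350.0, half_window=3.0):
    """Peak-to-peak height of the central line, searched within +/- half_window G."""
    central = [a for b, a in zip(field, amp) if abs(b - center) <= half_window]
    return max(central) - min(central)


def signal_to_noise(field, amp, n_baseline=25, **kw):
    """S/N as defined in Section 2.3: central peak-to-peak height divided by the
    standard deviation of the first n_baseline (low-field) points."""
    base = amp[:n_baseline]
    m = sum(base) / n_baseline
    sd = math.sqrt(sum((v - m) ** 2 for v in base) / (n_baseline - 1))
    return central_peak_to_peak(field, amp, **kw) / sd


def add_noise(field, amp, target_snr, rng):
    """Add Gaussian noise with sigma = central peak-to-peak / target_snr."""
    sigma = central_peak_to_peak(field, amp) / target_snr
    return [a + rng.gauss(0.0, sigma) for a in amp]


rng = random.Random(0)
field = [3300.0 + 100.0 * k / 511 for k in range(512)]   # 512 points over 100 G
clean = toy_triplet(field)
noisy = add_noise(field, clean, 300, rng)               # cf. the S/N = 300 control
print(round(signal_to_noise(field, noisy)), round(pearson(clean, noisy), 4))
# Rescaling and offsetting (e.g., normalization to spin count) leaves P unchanged.
print(pearson(clean, [2.5 * a + 0.1 for a in clean]))
```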
Among the 465 pairs of spectra investigated, the lowest P value was 0.885, obtained between BAX_4 and p21_15. These two spectra indeed showed the largest lineshape variation, with the largest difference in splitting at both the low- and high-field manifolds and a clear deviation in central line width. Note that a P value of 0.885 indicates that the two data sets (i.e., spectra) remain positively correlated, reflecting the fact that BAX_4 and p21_15 remain within the same motional regime (Figure 3B). P values not only provide a numeric measure of similarity/dissimilarity for one pair of spectra, but also allow comparison between different pairs. For example, P(BAX_4/p21_15) < P(BAX_4/BAX_5), and direct spectral superimposition showed that the difference between BAX_4/p21_15 is larger than that between BAX_4/BAX_5 (Figure 3B). Also note that if two spectra are highly similar (i.e., P > 0.980), each of them compares similarly with a third spectrum and therefore gives a similar P value. This is illustrated in Figure 3C, in which two highly similar spectra, BAX_5 and p21_12 (P = 0.987), were compared with either p21_5 (Figure 3C, top) or BAX_13 (Figure 3C, bottom).

3.3 Pearson coefficients referenced to TEMPOL spectrum report nitroxide mobility

In addition to analyzing R5a spectra by pair-wise Pearson coefficients, we developed another parameter, P_TEMPOL, defined as the Pearson coefficient between an EPR spectrum and that of a free radical compound, TEMPOL (4-hydroxy-2,2,6,6-tetramethylpiperidinyloxy) (Figure 4). The TEMPOL spectrum, measured at room temperature in aqueous buffer without added molecular crowders, shows three sharp lines with equal amplitude (Figure 4A, bottom), and represents a nitroxide undergoing isotropic rotation at the fast motion limit. As such, a high P_TEMPOL value indicates similarity between the sample spectrum and that of TEMPOL, and likely reports high nitroxide mobility.
A low P_TEMPOL indicates deviation of the sample spectrum from that of TEMPOL and, in the context of the R5a spectra analyzed here, low nitroxide mobility.

Figure 4: Characteristics of P_TEMPOL. (A) Examples of R5a spectra showing the correlation between P_TEMPOL and lineshape variations. The reference TEMPOL spectrum is shown at the bottom. The R5a spectra were scaled to the same height at the center line, while the TEMPOL reference was scaled to 1/2 of that of R5a. Dashed lines are shown to aid visualization of 2A_eff differences. (B) Correlation between P_TEMPOL and 2A_eff. Data points (black dots) were obtained from the spectra shown in Figure 2, excluding BAX_9, p21_6 and p21_16, for which no clear 2A_eff could be measured. A linear fit (red line) yielded a coefficient of determination R² of 0.81.

To validate the connection between P_TEMPOL and R5a mobility, we examined the correlation between P_TEMPOL and 2A_eff, a well-established lineshape parameter that generally increases as nitroxide mobility decreases (11,31). Figure 4A shows examples of R5a spectra in which the computed P_TEMPOL value decreases as the corresponding 2A_eff increases. Furthermore, for the 28 spectra with measurable 2A_eff, P_TEMPOL correlates negatively and linearly with the corresponding 2A_eff (Figure 4B). Overall, the analyses demonstrated that P_TEMPOL indeed reports R5a mobility, with a larger P_TEMPOL value indicating higher mobility.

3.4 R5a mobility maps reveal DNA duplex shape at the level of individual base pairs

For each DNA duplex studied, we obtained a P_TEMPOL map by plotting the measured P_TEMPOL value against the corresponding residue number (Figure 5A). P_TEMPOL serves as a measure of R5a mobility, which has previously been shown to report structural and dynamic features at the labeling site (21,25,26). As such, the P_TEMPOL map provides a map of the DNA local environment as sensed by R5a.

Figure 5: R5a mobility maps reveal DNA duplex shape.
(A) Plots of P TEMPOL along BAX (black) and p21 (red) sequence. (B) Pair-wise P values between spectra acquired at corresponding sites along the two DNA. The CWWG core regions were shown in gray, and the central region in blue. A dashed line was drawn at P = 0.980 to aid visualization. (C) Comparisons between the P TEMPOL map (black) and that of Roll (blue) for BAX (top panel) and p21 (bottom panel). P TEMPOL value was aligned to the Roll value of the base-pair step 3’ of the spin-labeled phosphate. Pearson coefficients between the two maps were 0.848 and 0.685, respectively, for BAX and p21. The P TEMPOL maps were used to compare shapes between BAX and p21. As targets of the p53 tumor suppressor, these two DNAs follow the same consensus pattern but differ in their exact sequences (Figure 1A). At each of the corresponding CWWG core regions (marked by dashed lines, Figure 5A), the P TEMPOL maps showed very similar patterns, indicating similarity in local shapes between the CWWG cores. This is further supported by the observation that the pair-wise P values for spectra obtained at the corresponding sites were all > 0.980 (Figure 5B), 70 which indicate high degrees of spectral similarity. For example, BAX_5 and p21_5, obtained at CpW of the first core, give a P of 0.992 (Figure 3C, top left). However, at the central region between the CWWG cores, variations of R5a mobility were observed between the two DNAs. Specifically, at the “RRR” segment located 5’ of the second half-site, opposite trends of P TEMPOL alteration can be observed (Figure 5A), and the two corresponding pair-wise P values were below 0.980 (Figure 5B). Examination of the corresponding spectra indeed confirmed a high degree of spectral variation (e.g., P(BAX_13/p21_12) = 0.954, Figure 3C, bottom right). This indicates that local shapes at the central regions of these two DNA are more divergent. 
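The site-by-site comparison between BAX and p21 can be sketched as follows. The sketch assumes two lists of spectra ordered so that the k-th entries come from equivalent positions, both on the same field axis. The function names are hypothetical, and the 0.980 threshold is the visual guide used in Figure 5B. It is applied here as a simple cut-off for illustration.

```python
import numpy as np


def corresponding_site_p(spectra_a, spectra_b):
    """Pair-wise Pearson coefficients between spectra recorded at
    corresponding sites of two DNAs.

    Assumes spectra_a[k] and spectra_b[k] come from equivalent positions
    and share the same field axis."""
    if len(spectra_a) != len(spectra_b):
        raise ValueError("both DNAs need the same number of aligned sites")
    return [float(np.corrcoef(a, b)[0, 1]) for a, b in zip(spectra_a, spectra_b)]


def divergent_sites(p_values, threshold=0.980):
    """Indices of aligned site pairs whose P falls below the threshold."""
    return [k for k, p in enumerate(p_values) if p < threshold]


# P values quoted in the text: BAX_5/p21_5 (first core) and BAX_13/p21_12
flagged = divergent_sites([0.992, 0.954])   # -> [1]
```

Applied to the two P values quoted above, only the central-region pair (BAX_13/p21_12) falls below the cut-off.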
Furthermore, maps of ΔH_pp and 2A_eff along BAX and p21 supported the conclusions drawn from the P_TEMPOL maps, although the 2A_eff map had gaps and the ΔH_pp map had a narrow dynamic range (SI Figure S6).

To better understand how P_TEMPOL maps report on local DNA shape, we compared them with maps of average DNA helical parameters, which represent local geometric variations of the duplex. Using an extensively validated high-throughput server (4), we obtained predicted maps of Roll, Helical Twist (HelT), Minor Groove Width (MGW) and Propeller Twist (ProT) for BAX and p21. In both duplexes, a high degree of positive linear correlation was observed between the P_TEMPOL map and the Roll map (Figure 5C), indicating higher R5a mobility at sites with larger Roll values. In contrast, the P_TEMPOL/MGW (SI Figure S7) and P_TEMPOL/ProT (SI Figure S8) correlations were below 0.5 in both BAX and p21, indicating that R5a mobility is likely not correlated with variations in MGW and ProT. For HelT, the correlation with P_TEMPOL was low in BAX but high in p21 (SI Figure S9), so how R5a mobility reflects HelT variation remains unclear.

Another aspect of local DNA shape variation is sequence-dependent deformability (1,5). Using parameters reported by Olson and co-workers (5), we predicted maps of base-pair step elastic force constants (Roll-Roll and HelT-HelT) for BAX and p21, which serve as representations of deformability variation. However, the correlations between these maps and the P_TEMPOL map were all below 0.6 in magnitude, with no consistent pattern in the two duplexes (SI Figure S10). As such, the current analysis did not reveal a clear correlation between P_TEMPOL and DNA deformability. Overall, the analyses demonstrate that P_TEMPOL maps report on certain physical attributes of DNA duplex shape. They also reveal that BAX and p21 have very similar shapes at the CWWG cores but deviate in the central region between the two cores.
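Two kinds of correlation appear in Sections 3.3 and 3.4. The first is the linear fit of P_TEMPOL against 2A_eff (Figure 4B). The second is the Pearson coefficient between a P_TEMPOL map and a predicted helical-parameter map (Figure 5C; SI Figures S7-S10). The sketch below implements both with NumPy.

`linear_fit_r2` and `map_correlation` are hypothetical helper names. The `offset` parameter stands in for the rule that aligns each P_TEMPOL value with the base-pair step 3’ of the labeled phosphate. Its correct value depends on how sites and steps are numbered, so treat it as an assumption. Positions missing from either map (NaN, for example sites without a measurable 2A_eff) are skipped.

```python
import math

import numpy as np


def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept.

    Returns (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return float(slope), float(intercept), float(r2)


def map_correlation(site_values, step_values, offset=0):
    """Pearson coefficient between a per-site map and a per-step map.

    site_values[i] is the value (e.g. P_TEMPOL) at labeling site i, and
    step_values[j] is a helical parameter (e.g. Roll) at base-pair step j.
    Site i is paired with step i + offset (hypothetical alignment rule)."""
    pairs = []
    for i, v in enumerate(site_values):
        j = i + offset
        if 0 <= j < len(step_values):
            w = step_values[j]
            if not (math.isnan(v) or math.isnan(w)):
                pairs.append((v, w))
    if len(pairs) < 3:
        raise ValueError("too few aligned positions to correlate")
    a, b = np.array(pairs, dtype=float).T
    return float(np.corrcoef(a, b)[0, 1])
```

For an ordinary least-squares line with an intercept, R² equals the square of the Pearson coefficient between the two variables. An R² of 0.81 therefore corresponds to |r| = 0.9.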
3.5 P-matrix for global lineshape analyses

To globally analyze the lineshape features of the 31 R5a spectra obtained, we constructed a two-dimensional matrix (the P-matrix). Each cell was assigned the P value between the two spectra placed at the respective row and column (Figure 6). For this purpose, the 31 spectra were placed along the rows and columns in the same order of decreasing P_TEMPOL (i.e., decreasing nitroxide mobility) (Figure 6, SI Table S2). To aid analysis, the cells were color-coded on a linear gray scale. Cells with the highest P (i.e., 1, the diagonal cells) were pure black, the cell with the lowest P (i.e., 0.885) was pure white, and those in between were shown as a linear percentage of gray. As expected, the matrix was symmetric, and the diagonal cells (representing self-comparison) all had P = 1.000. P values generally decreased from the diagonal towards the outer edges, because the outer-edge cells compared spectra with increasingly different mobility and hence increasing lineshape deviations.

Figure 6: A matrix of the pair-wise Pearson coefficients obtained with 31 R5a spectra. For clarity, each spectrum was assigned a number, with the key given in SI Table S2. Each cell was color-coded in gray-scale based on the P value between the spectra identified by the respective row and column labels. The row/column labels are color-coded based on the di-nucleotide step, as shown in the figure legend.

With the P-matrix constructed, direct visual inspection revealed a number of interesting features. First, the off-diagonal cells in the last column on the right all showed relatively low P values (Figure 6). These cells contain the pair-wise P values between p21_15 (spectrum #31, SI Table S2) and the other spectra. This indicates that the p21_15 spectrum, obtained from an R5a attached at a CpA step, differed from every other spectrum within the ensemble. The same is true for BAX_5 (spectrum #30, SI Table S2), another CpA step, which occupies the second-to-last column on the right (Figure 6).
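The P-matrix construction can be sketched as follows, assuming the spectra are stored as rows of an array on a common field axis. `p_matrix`, `gray_fraction` and `threshold_clusters` are hypothetical names. The gray scale maps the highest P in the matrix to black (0.0) and the lowest to white (1.0), following the linear scheme described for Figure 6.

The clusters discussed in this section were identified by visual inspection. `threshold_clusters` is not the procedure used here; it shows one simple, reproducible alternative, single-linkage grouping at a threshold. The default of 0.980 borrows the guide line from Figure 5B and is an assumption.

```python
import numpy as np


def p_matrix(spectra, reference):
    """Order spectra by decreasing Pearson coefficient against `reference`
    (e.g. a TEMPOL spectrum) and return (order, pair-wise P matrix)."""
    spectra = np.asarray(spectra, dtype=float)       # shape (n_spectra, n_points)
    reference = np.asarray(reference, dtype=float)
    p_ref = np.array([np.corrcoef(s, reference)[0, 1] for s in spectra])
    order = np.argsort(-p_ref, kind="stable")       # decreasing P_ref
    return order, np.corrcoef(spectra[order])       # rows treated as variables


def gray_fraction(matrix):
    """Linear gray level per cell: 0.0 (black) at the highest P in the
    matrix, 1.0 (white) at the lowest."""
    matrix = np.asarray(matrix, dtype=float)
    p_max, p_min = matrix.max(), matrix.min()
    if p_max == p_min:
        return np.zeros_like(matrix)
    return (p_max - matrix) / (p_max - p_min)


def threshold_clusters(matrix, threshold=0.980):
    """Single-linkage grouping: connected components of the graph that links
    two spectra whenever their P is >= threshold."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                                  # depth-first search
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Pair-wise P values reported in SI Figure S11 (BAX_18, BAX_17, p21_17)
s11 = np.array([
    [1.000, 0.999, 0.998],
    [0.999, 1.000, 0.998],
    [0.998, 0.998, 1.000],
])
s11_labels = threshold_clusters(s11)   # all three fall in one group
```

`order[k]` gives the index of the spectrum shown in row and column k. Single-linkage grouping can chain dissimilar spectra through intermediate ones, so any groups it returns should be checked against the matrix itself.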
The other three spectra obtained at CpA steps (p21_5, p21_12 and BAX_16; spectra #22, #26 and #29, respectively, SI Table S2) also showed relatively low R5a mobility, although each had high P values with a few spectra obtained at non-CpA sites (Figure 6). Interestingly, pyrimidine-purine steps, including CpA, are the most easily deformed of all di-nucleotide steps and usually contribute the largest flexibility in DNA/protein complexes (5). It is possible that in the naked DNA, the high deformability/flexibility of the CpA step manifests as sampling of micro-conformations that vary with sequence context, on time scales slower than nanoseconds. This would result in broad spectra (apparent low R5a mobility) that are divergent among CpA sites (Figure 6).

For the remaining spectra, a number of off-diagonal cells had high P values, and clusters of dark gray cells were observed (Figure 6). For example, BAX_18, BAX_17 and p21_17 (spectra #23, #24 and #25, respectively, SI Table S2) cluster in the lower right corner of the matrix, and direct comparison confirmed that these spectra are highly similar (SI Figure S11). Spectra within each cluster share a relatively high degree of similarity in lineshape features, possibly indicating similar DNA shape at these sites. Between clusters, low P values are observed, indicating divergent spectral features and possible differences in DNA shape.

4. Discussion

Site-specifically attached nitroxides and their EPR lineshapes have been used widely to investigate local environments in DNA and RNA (10,11,13). However, in contrast to protein studies, nitroxide scanning of consecutive sites has, to the best of our knowledge, not been reported in nucleic acid SDSL. In this work, we scanned consecutive sites within two DNA duplexes using a sequence-independent nitroxide probe, R5a.
EPR spectral variations, which report on the local environment at the respective labeling sites, were analyzed using a newly developed method based on Pearson correlation coefficients. When plotted along the DNA sequence, the resulting Pearson coefficients provided maps that allow local DNA shape to be examined and compared in the two duplexes. These studies established an objective method for analyzing EPR spectral variation. Together with the R5a probe, this method yields an experimental approach for examining sequence-dependent DNA shape.

4.1 Pearson coefficients for EPR lineshape analysis and comparison

The Pearson-coefficient-based protocol was developed to address issues that arise when scanning DNA using R5a: sites lacking a discernible hyperfine splitting (2A_eff), and the narrow spread of central line widths (ΔH_pp). The Pearson coefficient accounts for differences in all three manifolds of the ¹⁴N nitroxide spectrum. It therefore globally reflects, in one numerical value, differences characterized by other spectral parameters, including ΔH_pp and 2A_eff. Two types of Pearson coefficients were demonstrated in mapping DNA shape. The pair-wise P values can be used to compare different DNA sites (Figures 3C and 5B), and P_TEMPOL reports R5a mobility variations along the DNA strand (Figures 4 and 5A). Pearson coefficients were robust with respect to a number of experimental factors, including a certain degree of deterioration in S/N (SI Figure S4). Furthermore, they are invariant to linear scaling of either spectrum by a positive factor, and they are not affected by, and indeed do not require, spectral normalization (SI Figure S5). This is in contrast to the root-mean-square deviation (RMSD) between two spectra, which is based on differences in absolute values (SI Figure S5).
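The different behavior of the two measures under scaling is easy to check numerically. The sketch below uses synthetic curves rather than R5a spectra, with multiplication by a positive constant as a stand-in for normalization. The invariance of P holds for positive scale factors and constant offsets only; multiplying one spectrum by a negative factor flips the sign of P.

```python
import numpy as np


def pearson(y1, y2):
    """Pearson correlation coefficient between two equally sampled curves."""
    return float(np.corrcoef(y1, y2)[0, 1])


def rmsd(y1, y2):
    """Root-mean-square deviation between two equally sampled curves."""
    diff = np.asarray(y1, dtype=float) - np.asarray(y2, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


x = np.linspace(0.0, 6.0, 400)
a = np.sin(x)          # synthetic stand-ins for two spectra
b = np.sin(x + 0.3)

p_raw, p_scaled = pearson(a, b), pearson(a, 5.0 * b)
rmsd_raw, rmsd_scaled = rmsd(a, b), rmsd(a, 5.0 * b)
# p_raw equals p_scaled to rounding, whereas rmsd_scaled is far larger
# than rmsd_raw.
```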
Additionally, RMSD is dominated by the high-amplitude region, which is usually the central line of a nitroxide spectrum. This makes it sub-optimal for comparing R5a spectra, most of which have sharp central lines.

Similar to 2A_eff and ΔH_pp, Pearson coefficients are semi-quantitative parameters derived empirically from the cw-EPR spectrum. They are not expected to provide a complete description of nitroxide motion, and their effectiveness in reporting variations of nitroxide mobility likely depends on the motional regime. For the R5a data obtained in the two DNA duplexes studied here, for example, P_TEMPOL correlated linearly with 2A_eff (Figure 4). It also represented differences in R5a mobility well enough to allow DNA mapping (Figure 5). In principle, a Pearson scale can be constructed using an arbitrary dataset as the reference spectrum. However, its effectiveness in representing the particular nitroxide mobility for mapping DNA shape would need to be examined. Nonetheless, the Pearson coefficients are obtained in an automated fashion, without highly subjective and rather qualitative assessment by a user, and should be applicable to other labels and systems. This expands the repertoire of parameters for extracting information from an ensemble of EPR spectra.

4.2 Mapping DNA shape using the nucleotide-independent nitroxide probe

We have previously shown that EPR lineshapes of the family of nucleotide-independent nitroxide probes (e.g., R5a) vary among different sites of DNA duplexes, and that such variations report on variation in the local DNA environment (21,25,26). Here we used semi-empirical lineshape parameters that report R5a mobility: the Pearson coefficients developed here (Figure 5) and the more traditional ΔH_pp and 2A_eff values (SI Figure S6). With these parameters, we generated “maps” to globally and systematically examine structural and dynamic variation along a DNA duplex.
This allowed us to assess similarities and differences in local DNA shape between duplex segments. Data obtained for BAX and p21, both of which are p53 REs, revealed that DNA shape is similar between the corresponding CWWG cores but varies in the central region bridging the two cores (Figure 5). We previously reported that the central regions of these two REs deform in different ways upon interacting with the p53 core DNA-binding domain: BAX undergoes further unwinding, while p21 incurs a pronounced shift of the helix axis (29). Variability of DNA shape in the central region of the naked DNA is consistent with those observations.

The strategy of attaching labels at multiple sites to gain a comprehensive understanding of the target DNA/RNA has been implemented in many prior SDSL studies (e.g., (16-22)). However, to the best of our knowledge, the work presented here is the first report of scanning a nitroxide consecutively through strands of nucleic acids. This is enabled by two key elements: (i) the nucleotide-independent R5a probe, which allows efficient labeling at each target site; and (ii) the method of analyzing R5a spectral variation using the Pearson coefficient, which complements the traditional ΔH_pp and 2A_eff parameters. Because the R5a maps include each individual nucleotide within the target DNA strand, a high resolution is achieved. This is advantageous for comparing DNA shape, as demonstrated by the ability to detect the highly localized differences between the central regions of BAX and p21 (Figure 5).

Previous work has established that within a DNA duplex, R5a motions can be described as Brownian diffusion under a restricted potential. The observed EPR spectrum reports the combined effect of the rate and the ordering of R5a motion (25,26). The local DNA environment modulates the sterically allowed rotameric space at the labeling site, thereby affecting R5a motional ordering and the resulting EPR spectrum.
The rotameric space likely correlates directly with local geometric variation of the duplex, giving rise to the positive correlations observed between the P_TEMPOL and Roll maps (Figure 5C).

We also reported that the R5a spectrum can be modulated by DNA backbone motions (i.e., the BI/BII transition) at the 5’ nearest neighbor and by specific contacts with the 3’ nearest neighbor (26). However, the current analyses did not reveal clear correlations between P_TEMPOL and the predicted elastic force constants representing variations in DNA deformability (SI Figure S10). It is possible that DNA deformability and backbone motions are not directly correlated. In addition, the elastic force constant maps were predicted using a dimer model (5). This is likely insufficient, as the R5a lineshape is clearly affected by sequences beyond a dimer (e.g., the variability of the CpA spectra, Figure 6).

The work reported here provides the first indication that EPR spectral parameters can be connected to actual DNA helical parameters. However, an R5a spectrum likely reflects only certain aspects of local DNA shape, and may be affected by the 5’ and 3’ neighboring nucleotides (21,26). Much more work is required to better understand the relationship between the EPR observables and the actual physical shape of a DNA duplex. Towards this goal, we have proposed establishing an empirical library in which R5a spectra are obtained and categorized according to DNA sequence (33), thus allowing systematic analyses of nitroxide/DNA coupling (26). A key requirement for such a library is the identification of clusters of sites with similar lineshape features. This has now become possible with the P-matrix analysis, which allows global assessment of the variations among the 465 pairs of R5a spectra (Figure 6). The P-matrix did reveal clustering in the observed spectra (Figure 6), supporting the feasibility of establishing an R5a library.
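The di-nucleotide analysis in the next paragraph classifies each labeling site by the purine (R) or pyrimidine (Y) identity of the two nucleotides sandwiching R5a. As a small worked check, the sketch below applies that classification to the step column of SI Table S2. That table lists the spectra in order of decreasing P_TEMPOL, which is also the row/column order of the P-matrix. The sketch then reports the matrix positions occupied by each class. The helper names are hypothetical, and the position summary is a numerical restatement of the table rather than an analysis performed in this work.

```python
from statistics import mean

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Di-nucleotide steps of spectra #1 to #31, in SI Table S2 order
TABLE_S2_STEPS = [
    "TpT", "ApC", "ApT", "ApC", "ApT", "GpT", "CpC", "TpC", "ApG", "ApG",
    "GpT", "ApC", "ApC", "ApA", "TpA", "TpG", "CpC", "GpT", "ApA", "ApG",
    "GpA", "CpA", "ApG", "ApA", "TpG", "CpA", "GpA", "GpC", "CpA", "CpA",
    "CpA",
]


def base_class(base):
    """Return 'R' for a purine or 'Y' for a pyrimidine."""
    base = base.upper()
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError(f"unexpected base: {base}")


def step_class(step):
    """Classify a step written as 'XpY', e.g. 'CpA' -> 'YpR'."""
    first, second = step.split("p")
    return f"{base_class(first)}p{base_class(second)}"


def positions_by_class(steps):
    """Map each step class to the 1-based matrix positions it occupies."""
    positions = {}
    for rank, step in enumerate(steps, start=1):
        positions.setdefault(step_class(step), []).append(rank)
    return positions


positions = positions_by_class(TABLE_S2_STEPS)
mean_rank = {cls: mean(ranks) for cls, ranks in positions.items()}
```

With the Table S2 data, YpY steps occupy positions 1, 7, 8 and 17, and YpR steps occupy positions 15, 16, 22, 25, 26, 29, 30 and 31. The mean positions are 8.25 (YpY), 10.2 (RpY), about 18.6 (RpR) and 24.25 (YpR).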
R5a is attached at the phosphate, and the di-nucleotide step sandwiching the probe likely exerts the greatest effect on its EPR spectrum. As a first step towards identifying the common physical attributes that give rise to particular lineshape features, we examined the distribution of di-nucleotide steps among the different clusters (Figure 6). We found that YpY steps (Y: pyrimidine, C or T) are located almost exclusively in the upper left region of the matrix. This indicates that R5a attached at YpY steps gives similar lineshapes, characteristic of a higher degree of mobility. This likely reflects the weaker stacking between pyrimidine bases due to their smaller surface areas (35,36), and is consistent with prior studies of R5a (21). Furthermore, RpY steps (R: purine, A or G) are mostly located in the left half of the matrix, while YpR steps strongly favor the right half. In contrast, the RpR steps are spread throughout the matrix, which may suggest that these sites are subject to influences beyond the nearest di-nucleotide. Note that although 31 spectra were analyzed here, the ensemble does not cover all the di-nucleotide steps sufficiently, let alone allow further analysis of effects beyond the nearest neighbors. Future work should expand the library size.

In summary, this work established a method for experimentally examining sequence-dependent DNA shape. Further studies, particularly those combining experimental and computational approaches (26), should advance our understanding of how DNA modulates the R5a spectrum, thus enhancing our ability to map DNA shape using SDSL.

References

1. Rohs, R., Jin, X., West, S.M., Joshi, R., Honig, B. and Mann, R.S. (2010) Origins of specificity in protein-DNA recognition. Annual Review of Biochemistry, 79, 233-269.
2. Egli, M. and Pallan, P.S. (2010) The many twists and turns of DNA: template, telomere, tool, and target. Current Opinion in Structural Biology, 20, 262-275.
3. Perez, A., Luque, F.J. and Orozco, M. (2012) Frontiers in molecular dynamics simulations of DNA. Accounts of Chemical Research, 45, 196-205.
4. Zhou, T., Yang, L., Lu, Y., Dror, I., Dantas Machado, A.C., Ghane, T., Di Felice, R. and Rohs, R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Research.
5. Olson, W.K., Gorin, A.A., Lu, X.J., Hock, L.M. and Zhurkin, V.B. (1998) DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci U S A, 95, 11163-11168.
6. Parker, S.C., Hansen, L., Abaan, H.O., Tullius, T.D. and Margulies, E.H. (2009) Local DNA topography correlates with functional noncoding regions of the human genome. Science, 324, 389-392.
7. Hubbell, W.L. and Altenbach, C. (1994) Investigation of structure and dynamics in membrane proteins using site-directed spin labeling. Curr. Opin. Struct. Biol., 4, 566-573.
8. Fanucci, G.E. and Cafiso, D.S. (2006) Recent advances and applications of site-directed spin labeling. Curr. Opin. Struct. Biol., 16, 644-653.
9. Hubbell, W.L., López, C.J., Altenbach, C. and Yang, Z. (2013) Technological advances in site-directed spin labeling of proteins. Current Opinion in Structural Biology, 23, 725-733.
10. Sowa, G.Z. and Qin, P.Z. (2008) Prog. Nucleic Acids Res. Mol. Biol., 82, 147-197.
11. Zhang, X., Cekan, P., Sigurdsson, S.T. and Qin, P.Z. (2009) Studying RNA using site-directed spin-labeling and continuous-wave electron paramagnetic resonance spectroscopy. Methods in Enzymology, 469, 303-328.
12. Reginsson, G.W. and Schiemann, O. (2011) Studying biomolecular complexes with pulsed electron-electron double resonance spectroscopy. Biochemical Society Transactions, 39, 128-139.
13. Krstic, I., Endeward, B., Margraf, D., Marko, A. and Prisner, T.F. (2012) Structure and dynamics of nucleic acids. Topics in Current Chemistry, 321, 159-198.
14. Shelke, S.A. and Sigurdsson, S.T. (2012) Site-Directed Spin Labelling of Nucleic Acids. Eur J Org Chem, 2291-2301.
15. Fedorova, O.S. and Tsvetkov, Y.D. (2013) Pulsed electron double resonance in structural studies of spin-labeled nucleic acids. Acta Naturae, 5, 9-32.
16. Edwards, T.E., Okonogi, T.M., Robinson, B.H. and Sigurdsson, S.T. (2001) Site-specific incorporation of nitroxide spin-labels into internal sites of the TAR RNA; structure-dependent dynamics of RNA by EPR spectroscopy. Journal of the American Chemical Society, 123, 1527-1528.
17. Edwards, T.E., Okonogi, T.M. and Sigurdsson, S.T. (2002) Investigation of RNA-protein and RNA-metal ion interactions by electron paramagnetic resonance spectroscopy. The HIV TAR-Tat motif. Chemistry & Biology, 9, 699-706.
18. Qin, P.Z., Hideg, K., Feigon, J. and Hubbell, W.L. (2003) Monitoring RNA base structure and dynamics using site-directed spin labeling. Biochemistry, 42, 6772-6783.
19. Edwards, T.E., Robinson, B.H. and Sigurdsson, S.T. (2005) Identification of amino acids that promote specific and rigid TAR RNA-tat protein complex formation. Chemistry & Biology, 12, 329-337.
20. Kim, N.K., Murali, A. and DeRose, V.J. (2005) Separate metal requirements for loop interactions and catalysis in the extended hammerhead ribozyme. J. Am. Chem. Soc., 127, 14134-14135.
21. Popova, A.M., Kalai, T., Hideg, K. and Qin, P.Z. (2009) Site-specific DNA structural and dynamic features revealed by nucleotide-independent nitroxide probes. Biochemistry, 48, 8540-8550.
22. Cekan, P., Jonsson, E.O. and Sigurdsson, S.T. (2009) Folding of the cocaine aptamer studied by EPR and fluorescence spectroscopies using the bifunctional spectroscopic probe C. Nucleic Acids Res., 37, 3990-3995.
23. Qin, P.Z., Butcher, S.E., Feigon, J. and Hubbell, W.L. (2001) Quantitative analysis of the GAAA tetraloop/receptor interaction in solution: a site-directed spin labeling study. Biochemistry, 40, 6929-6936.
24. Qin, P.Z., Haworth, I.S., Cai, Q., Kusnetzow, A.K., Grant, G.P.G., Price, E.A., Sowa, G.Z., Popova, A., Herreros, B. and He, H. (2007) Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nat. Protocols, 2, 2354-2365.
25. Popova, A.M. and Qin, P.Z. (2010) A nucleotide-independent nitroxide probe reports on site-specific stereomeric environment in DNA. Biophysical Journal, 99, 2180-2189.
26. Popova, A.M., Hatmal, M.M., Frushicheva, M.P., Price, E.A., Qin, P.Z. and Haworth, I.S. (2012) Nitroxide sensing of a DNA microenvironment: mechanistic insights from EPR spectroscopy and molecular dynamics simulations. The Journal of Physical Chemistry B, 116, 6387-6396.
27. Riley, T., Sontag, E., Chen, P. and Levine, A. (2008) Transcriptional control of human p53-regulated genes. Nat Rev Mol Cell Biol, 9, 402-412.
28. Menendez, D., Inga, A. and Resnick, M.A. (2009) The expanding universe of p53 targets. Nat Rev Cancer, 9, 724-737.
29. Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Y., Lu, Y., Duan, Y., Tham, K.W., Chen, L., Rohs, R. and Qin, P.Z. (2014) Conformations of p53 response elements in solution deduced using site-directed spin labeling and Monte Carlo sampling. Nucleic Acids Res., 42, 2789-2797.
30. Tirado, M.M. and Garciadelatorre, J. (1980) Rotational dynamics of rigid, symmetric top macromolecules. Application to circular cylinders. J Chem Phys, 73, 1986-1993.
31. McHaourab, H.S., Lietzow, M.A., Hideg, K. and Hubbell, W.L. (1996) Motion of spin-labeled side chains in T4 lysozyme. Correlation with protein structure and dynamics. Biochemistry, 35, 7692-7704.
32. Galiano, L., Blackburn, M.E., Veloro, A.M., Bonora, M. and Fanucci, G.E. (2009) Solute effects on spin labels at an aqueous-exposed site in the flap region of HIV-1 protease. The Journal of Physical Chemistry B, 113, 1673-1680.
33. Qin, P.Z., Iseri, J. and Oki, A. (2006) A model system for investigating lineshape/structure correlations in RNA site-directed spin labeling. Biochemical and Biophysical Research Communications, 343, 117-124.
34. Cai, Q., Kusnetzow, A.K., Hubbell, W.L., Haworth, I.S., Gacho, G.P.C., Van Eps, N., Hideg, K., Chambers, E.J. and Qin, P.Z. (2006) Site-directed spin labeling measurements of nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nucl. Acids Res., 34, 4722-4734.
35. Kool, E.T. (2001) Hydrogen bonding, base stacking, and steric effects in DNA replication. Annual Review of Biophysics and Biomolecular Structure, 30, 1-22.
36. Bommarito, S., Peyret, N. and SantaLucia, J., Jr. (2000) Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res., 28, 1929-1934.

Supplementary Information

Figure S1: DNA tethering to streptavidin examined by a native gel-shift assay. In each sample, 40 µM of DNA duplex was mixed with a trace amount of ³²P-labeled DNA duplex and then incubated with streptavidin in 50 mM HEPES (pH 7.5), 100 mM NaCl and 5 mM MgCl₂. The samples were loaded onto an 8% native polyacrylamide gel prepared in a buffer containing 50 mM HEPES (pH 7.5), 100 mM NaCl, 5 mM MgCl₂ and 89 mM boric acid. The gel was run in the same buffer at 4°C and visualized by both phosphorimaging (left) and Coomassie Blue staining (right). The data show that a single DNA-streptavidin complex forms with increasing concentration of streptavidin. This species was assigned as DNA tethered to a streptavidin tetramer, as previous studies reported that streptavidin exists as a tetramer when bound to biotin under similar conditions (1).

Figure S2: Examining spectral effects due to altering the relative positioning between the R5a labeling site and streptavidin. Two EPR spectra of streptavidin-tethered BAX_19 are shown.
The black trace was obtained with the 21-bp BAX duplex tethered directly to streptavidin (panel (A), top). In this construct, the R5a label at the BAX_19 site was closest to the tethering point and expected to have the highest probability of contacting streptavidin. The red trace was obtained with a 12-bp adaptor sequence (gray, lower case) placed between the BAX duplex and streptavidin (panel (A), bottom). The adaptor moved the BAX_19 site further away from streptavidin and was expected to eliminate or reduce R5a-streptavidin contacts. The two spectra show identical lineshapes, with a pair-wise Pearson coefficient of 0.993. This indicates that the R5a spectrum was not affected by changing the relative location of R5a and streptavidin, and hence that there are no direct R5a-streptavidin contacts.

Figure S3: Assessing potential inter-molecular dipolar interactions in the streptavidin-tethered DNAs. Two spectra of streptavidin-tethered p21_18 were compared. The black trace was obtained from a sample in which 100% of the p21 duplex was labeled with R5a, and the red trace from a sample in which 20% of the p21 duplex was labeled. No broadening was observed when comparing the 100%-labeled sample with the 20%-labeled sample, and the pair-wise P was calculated to be 0.999. This indicates that inter-molecular dipolar interactions minimally affect the observed spectral lineshape under our experimental conditions.

Figure S4: Examining the impact of noise on the Pearson coefficient. Various amounts of random noise were added to the measured BAX_19 spectrum, and the P_TEMPOL values were then computed. For signal-to-noise (S/N) ratios > 300, no change in P_TEMPOL was observed.

Figure S5: Sensitivity of the Pearson coefficient and RMSD to spectral normalization. The two spectra plotted are BAX_17 (black trace) and BAX_10 (red trace), shown both before and after normalization. P(BAX_17/BAX_10) remains unchanged upon spectral normalization, while the RMSD value changes drastically.
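The noise test of Figure S4 can be imitated with synthetic data. The sketch assumes white Gaussian noise, and defines S/N as the peak-to-peak amplitude of the noise-free spectrum divided by the noise standard deviation; that definition is an assumption. The three-line spectra are synthetic stand-ins, not the BAX_19 and TEMPOL data, so the numbers it produces say nothing about the measured threshold of 300.

```python
import numpy as np


def derivative_lorentzian(field, center, width):
    """First derivative (arbitrary units) of a Lorentzian absorption line."""
    x = (field - center) / width
    return -2.0 * x / (1.0 + x ** 2) ** 2


def three_line(field, a_iso=16.0, width=1.5):
    """Synthetic 14N nitroxide spectrum with three equally weighted lines."""
    return sum(derivative_lorentzian(field, c, width) for c in (-a_iso, 0.0, a_iso))


def add_noise(spectrum, snr, rng):
    """Add white Gaussian noise at the requested S/N (peak-to-peak / sigma)."""
    spectrum = np.asarray(spectrum, dtype=float)
    sigma = (spectrum.max() - spectrum.min()) / snr
    return spectrum + rng.normal(0.0, sigma, spectrum.shape)


def pearson(y1, y2):
    """Pearson correlation coefficient between two equally sampled curves."""
    return float(np.corrcoef(y1, y2)[0, 1])


field = np.linspace(-60.0, 60.0, 2048)
reference = three_line(field, width=0.8)   # TEMPOL-like stand-in
sample = three_line(field, width=3.0)      # broader, "less mobile" stand-in
rng = np.random.default_rng(seed=0)

p_clean = pearson(sample, reference)
p_by_snr = {snr: pearson(add_noise(sample, snr, rng), reference)
            for snr in (10, 30, 100, 300, 1000)}
```

Added noise lowers P roughly in proportion to the noise variance relative to the signal variance. P therefore barely moves at high S/N and drops noticeably only when the noise becomes comparable to the signal.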
Figure S6: Maps of central line width (ΔH_pp) (top panel) and effective hyperfine splitting (2A_eff) (bottom panel) in the BAX (black) and p21 (red) duplexes.

Figure S7: Comparison between maps of P_TEMPOL (black) and minor groove width (MGW, blue) for BAX (top panel) and p21 (bottom panel). Each P_TEMPOL value was aligned to the MGW value for the base-pair 3’ of the spin label. Pearson coefficients between the two maps were 0.419 for BAX and 0.007 for p21.

Figure S8: Comparison between maps of P_TEMPOL (black) and propeller twist (ProT, blue) for BAX (top panel) and p21 (bottom panel). Each P_TEMPOL value was aligned to the ProT value for the base-pair 3’ of the spin label. Pearson coefficients between the two maps were -0.124 for BAX and 0.237 for p21.

Figure S9: Comparison between maps of P_TEMPOL (black) and helical twist (HelT, blue) for BAX (top panel) and p21 (bottom panel). Each P_TEMPOL value was aligned to the HelT value of the base-pair step 3’ of the spin-labeled phosphate. Pearson coefficients between the two maps were 0.210 for BAX and 0.664 for p21. Note that the correlation between the predicted Roll and HelT maps is high for p21 (0.591) but low for BAX (0.035). Since P_TEMPOL correlates highly with Roll in both BAX and p21, the P_TEMPOL/HelT correlation can be high only for p21.

Figure S10: Comparison between maps of P_TEMPOL (black), the Roll-Roll force constant (red) and the HelT-HelT force constant (blue) for BAX (top panel) and p21 (bottom panel). The force constants were obtained from reference (2). Each P_TEMPOL value was aligned to the force constant for the base-pair step 3’ of the spin-labeled phosphate. For BAX, the Pearson coefficients were -0.517 for P_TEMPOL/Roll-Roll and -0.456 for P_TEMPOL/HelT-HelT. For p21, they were -0.314 and -0.599, respectively.

Figure S11: Overlay of spectra that showed a high degree of similarity in the P-matrix (main text Figure 6).
Black: spectrum 23 (BAX_18); Red: spectrum 24 (BAX_17); Blue: spectrum 25 (p21_17). The pair-wise P values among these three spectra were P(BAX_17/BAX_18): 0.999; P(BAX_18/p21_17): 0.998; and P(BAX_17/p21_17): 0.998. 93 Table S1: DNA oligonucleotides used in this study. Sequence name Sequence Extinction coefficient (M -1 ·cm -1 ) BAX-labeled strand 5’-T p C p A p C p A p A p G p T p T p A p g p A p G p A p C p A p A p G p C p C p T-3’ 211,400 BAX- complementary strand 5’-biotin- A p G p G p C p T p T p G p T p C p T p c p T p A p A p C p T p T p G p T p G p A-3’ 197,700 p21-labeled strand 5’-G p A p A p C p A p T p G p T p C p C p C p A p A p C p A p T p G p T p T p G-3’ 194,900 p21- complementary strand 5’-biotin- C p A p A p C p A p T p G p T p T p G p G p G p A p C p A p T p G p T p T p C-3’ 192,700 94 Table S2: Key for spectral number in the P-matrix. Spectral Number DNA Site P TEMPOL di-nucleotide step 1 BAX_9 0.575 TpT 2 BAX_15 0.555 ApC 3 p21_6 0.554 ApT 4 BAX_4 0.543 ApC 5 p21_16 0.540 ApT 6 p21_8 0.535 GpT 7 p21_11 0.531 CpC 8 p21_9 0.530 TpC 9 BAX_13 0.529 ApG 10 BAX_11 0.526 ApG 11 BAX_8 0.523 GpT 12 p21_4 0.520 ApC 13 p21_14 0.520 ApC 14 BAX_6 0.517 ApA 15 BAX_10 0.515 TpA 16 p21_7 0.514 TpG 17 p21_10 0.514 CpC 18 p21_18 0.511 GpT 19 p21_13 0.510 ApA 20 BAX_7 0.509 ApG 21 BAX_14 0.504 GpA 22 p21_5 0.502 CpA 23 BAX_18 0.502 ApG 24 BAX_17 0.500 ApA 25 p21_17 0.499 TpG 26 p21_12 0.495 CpA 27 BAX_12 0.492 GpA 28 BAX_19 0.491 GpC 29 BAX_16 0.488 CpA 30 BAX_5 0.484 CpA 31 p21_15 0.472 CpA 95 Supplementary Information References: 1. Sano, T. and Cantor, C.R. (1995) Intersubunit contacts made by tryptophan 120 with biotin are essential for both strong biotin binding and biotin-induced tighter subunit association of streptavidin. Proc Natl Acad Sci U S A, 92, 3180-3184. 2. Olson, W.K., Gorin, A.A., Lu, X.J., Hock, L.M. and Zhurkin, V.B. (1998) DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. 
Proc Natl Acad Sci U S A, 95, 11163-11168.

Chapter 4
Conformations of p53 Response Elements in Solution Deduced Using Site-Directed Spin Labeling and Monte Carlo Sampling*

Abstract
The tumor suppressor protein p53 regulates numerous signaling pathways by specifically recognizing diverse p53 response elements (REs). Understanding the mechanisms of p53-DNA interaction requires structural information on p53 REs. However, such information is limited, as no three-dimensional structure of any RE in the unbound form is yet available. Here, site-directed spin labeling was used to probe the solution structures of REs involved in p53 regulation of the p21 and Bax genes. Multiple nanometer distances in the p21-RE and BAX-RE, measured using a nucleotide-independent nitroxide probe and Double-Electron-Electron-Resonance spectroscopy, were used to derive molecular models of unbound REs from pools of all-atom structures generated by Monte Carlo simulations, thus enabling analyses that reveal sequence-dependent DNA shape features of unbound REs in solution. The data revealed distinct RE conformational changes upon binding to the p53 core domain, and support the hypothesis that sequence-dependent properties encoded in REs are exploited by p53 to achieve the energetically most favorable mode of deformation, consequently enhancing binding specificity. This work reveals mechanisms of p53-DNA recognition and establishes a new experimental/computational approach for studying DNA shape in solution that has far-reaching implications for studying protein-DNA interactions.

*Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Y., Lu, Y., Duan, Y., Tham, K.W., Chen, L., Rohs, R.* and Qin, P.Z.* (2014) "Conformations of p53 response elements in solution deduced using site-directed spin labeling and Monte Carlo sampling." Nucl. Acids Res., 42, 2789-2797.

1. Introduction
The tumor suppressor protein p53 plays various essential roles in maintaining the integrity of the human genome. Sequence-specific binding of the p53 core DNA-binding domain (DBD) to its response elements (REs) is a key component in the regulation of a large number of signaling pathways (1). The importance of DNA recognition by p53 is highlighted by the fact that more than 80% of the missense mutations of p53 found in human cancers are located within the DBD (2), and many cancer hot-spot mutants have been shown to impair recognition of target DNAs (3). The p53 REs are defined by two closely spaced decameric half-sites (consensus sequence: 5’-RRRCWWGYYY-(n)-RRRCWWGYYY-3’; R = A, G; W = A, T; Y = C, T; n = spacer of length 0-20 base pairs (bp)), and hundreds of them have been validated in human and mouse (1).

The mechanisms by which p53 specifically recognizes its REs have long been an open question (1,3). p53 is known to recognize REs using base readout in the major groove, as exemplified by the bidentate hydrogen bond between Arg280 and the conserved guanines in the CWWG core (3). The importance of DNA shape readout has also been noted: Arg248, the most frequently mutated residue in human cancers, recognizes its DNA target through readout of minor groove geometry and electrostatic potential (4); similarly, another cancer hot spot, Arg273, plays a role in maintaining DNA minor groove shape (5). Intriguingly, several recent crystal structures of tetrameric p53DBDs bound to full REs have revealed various deformations of the bound DNA (4,6-9), suggesting a propensity for DNA conformational change upon formation of the p53/RE complex. However, it remains unclear to what degree the observed DNA deformation may be biased by crystal packing and, more importantly, how inherent variations in RE shape affect the mode of conformational change upon p53 binding.
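The consensus half-site above is written with IUPAC degenerate codes, so how closely a natural RE follows it can be checked mechanically. The sketch below is purely illustrative and is not part of the analyses in this chapter. It assumes that the labeled-strand sequences in the oligonucleotide table (Table S1 of the preceding chapter's supplementary information) are the p21-RE and BAX-RE studied here; the nucleotide numbering used in this chapter (e.g., G7 and G17 in the p21-RE, the T9pA10 step in the BAX-RE) is consistent with them. The helper names `half_site_mismatches` and `split_re` are hypothetical.

```python
# Illustrative check of p53 half-sites against the consensus 5'-RRRCWWGYYY-3'.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},  # purine
    "W": {"A", "T"},  # weak (A or T)
    "Y": {"C", "T"},  # pyrimidine
}
HALF_SITE = "RRRCWWGYYY"


def half_site_mismatches(seq: str, consensus: str = HALF_SITE) -> list[int]:
    """Return 1-based positions at which a 10-nt half-site deviates from the consensus."""
    seq = seq.upper()
    if len(seq) != len(consensus):
        raise ValueError("half-site length must equal the consensus length")
    return [i + 1 for i, (base, code) in enumerate(zip(seq, consensus))
            if base not in IUPAC[code]]


def split_re(seq: str, spacer: int) -> tuple[str, str]:
    """Split a full RE into its two decameric half-sites, skipping `spacer` bp."""
    return seq[:10], seq[10 + spacer:20 + spacer]


if __name__ == "__main__":
    # Labeled-strand sequences; lower-case g marks the 1-bp BAX-RE spacer.
    examples = [("p21-RE", "GAACATGTCCCAACATGTTG", 0),
                ("BAX-RE", "TCACAAGTTAgAGACAAGCCT", 1)]
    for name, seq, spacer in examples:
        h1, h2 = split_re(seq, spacer)
        print(name, h1, half_site_mismatches(h1), h2, half_site_mismatches(h2))
```

For these sequences, half-site 1 of the p21-RE and half-site 2 of the BAX-RE match the consensus at every position, whereas the other half-sites deviate at two (p21-RE) and three (BAX-RE) positions, so neither natural RE studied here is a perfect consensus site.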
Answering these questions requires comparing the bound and unbound RE conformations in solution, and data on the unbound conformation are severely lacking: no atomic-resolution structure of any unbound p53 RE has yet been reported.

The challenge of deducing DNA conformation in solution is not limited to p53 REs. Current knowledge of sequence-dependent DNA shape, particularly of free DNA, remains inadequate despite its important role in protein-DNA recognition (10). For instance, structure databases sample sequence diversity poorly, with the Dickerson dodecamer accounting for ~10% of all DNA entries in the Nucleic Acid Database (11), and analyses of short non-coding DNA sequences in several eukaryotic genomes concluded that the structures of none of the abundant sequences had been determined (12). This scarcity of experimental data on intrinsic DNA shape is due, in large part, to the difficulty of obtaining unbiased structural information on “naked” DNA. Whereas high-resolution structures of free DNA have been obtained by X-ray crystallography and NMR spectroscopy, their number is small compared to the available data for protein-DNA complexes (13). In addition, X-ray crystallography studies are hindered by crystal-packing biases, and NMR studies are constrained by the size of the DNA.

Here, we introduce a new experimental/computational pipeline in which site-directed spin labeling (SDSL) is combined with all-atom Monte Carlo (MC) simulations to derive atomic-resolution data representing the sequence-dependent conformation of DNA duplexes in solution. SDSL uses electron paramagnetic resonance (EPR) spectroscopy to monitor nitroxide radicals (i.e., spin labels) attached at specific sites of biomolecules, and has matured into a tool for studying the structure and dynamics of proteins and nucleic acids (14,15).
The MC simulation technique has been shown to enable efficient conformational sampling and has been extensively validated against a large body of experimental data from X-ray crystallography, NMR spectroscopy, and hydroxyl radical cleavage experiments (13,16). In our new SDSL-MC scheme, a pulsed EPR technique, Double-Electron-Electron-Resonance (DEER) (17), is applied to measure distances between nitroxide pairs attached to the target DNA duplex. These distances are then used as constraints to query a large pool of all-atom models generated by MC simulations (18,19), thereby identifying those that conform to the experimental measurements. The SDSL-MC approach is limited neither by the size of the DNA nor by the need for crystalline samples, and provides a new method for examining the sequence-dependent shape of DNA in solution.

This work presents studies on two prototypic naturally occurring p53 REs (1): the p21-RE, with no spacer (0 bp) between the two half-sites, and the BAX-RE, with a 1-bp insertion between the two half-sites (Figure 1A). Using the SDSL-MC approach, conformations of the unbound p21-RE and BAX-RE were determined in solution. Comparing the unbound and bound DNA revealed conformational changes upon protein binding in the central region between the two half-sites, which allow formation of key protein-DNA and protein-protein contacts. The modes of conformational change, which differed between the two REs, could be linked to properties encoded in the individual nucleotide sequences, suggesting a possible means of achieving binding specificity through sequence-dependent conformational changes.

Figure 1: Experimental design and example DEER results. (A) Nucleotide sequences of the REs. The p21-RE has no spacer between the two half-sites and is located 5’ upstream of the promoter of the CDKN1A gene, which is involved in controlling cell cycle arrest. The BAX-RE has a 1-bp spacer (indicated by the lower-case letter) and is located in the promoter of the Bax gene.
For each RE, the phosphate numbering scheme is shown next to each strand. The two half-sites are noted, with the CWWG core of each half-site shown in bold and the dotted box marking the central region between the two half-sites. Colored dotted lines and the corresponding phosphates designate the example distance sets shown in (C). (B) The R5 nitroxide probe. (C) Examples of DEER data. Each dataset is designated by the RE and the corresponding labeling sites. DEER spectra measured in the absence (solid line) and presence (dashed line) of p53DBD are shown, with black dotted lines included to aid comparison. See Supplementary Information (SI) Figures S3 and S5 for additional DEER data and analyses.

2. Materials and Methods
2.1 DNA spin labeling
DNA oligonucleotides were synthesized by solid-phase chemical synthesis (Integrated DNA Technologies, Coralville, IA). Following previously reported protocols, the R5 spin labels (1-oxyl-2,2,5,5-tetramethylpyrroline, Figure 1B) were attached to specific DNA sites using the phosphorothioate scheme, and the labeled DNA was purified by HPLC (20). Concentrations of labeled DNA were determined by absorbance at 260 nm. Note that in this work the Rp and Sp phosphorothioate diastereomers present at each attachment site were not separated. Previous studies using model systems have validated the use of Rp/Sp mixtures in DEER measurements and established an appropriate method for interpreting the measured inter-nitroxide distances (21-23).

2.2 EPR sample preparation and DEER spectroscopy measurements of inter-spin distances
EPR samples were prepared following procedures described in (9). Each DEER sample contained 50-100 µM of labeled DNA or p53DBD-DNA complex, 50 mM HEPES (pH 7.5), 100 mM NaCl, 5 mM MgCl2, and 40% (v/v) glycerol. DEER measurements were carried out at 80 K on a Bruker ELEXSYS E580 X-band spectrometer equipped with an MD4 resonator.
Previously reported acquisition parameters and procedures (24) were used. Inter-spin distance distributions were computed from the resulting dipolar evolution data using DeerAnalysis2011 (25). From these distance distribution profiles, the mean distance (r0) and the width of the distance distribution (σ) were calculated as reported previously (21). Repeated measurements indicated that errors in the measured r0 are less than 1 Å.

2.3 Generation of DNA models using Monte Carlo simulations
A previously published protocol was used to generate an MC ensemble of all-atom DNA models with sequence-dependent shape (18,26). Simulations started from idealized B-DNA structures generated without any sequence-dependent structural features. The MC sampling was based on collective and internal degrees of freedom using the AMBER force field implemented as previously described (18,26), an analytic chain closure with associated Jacobians (19), and explicit sodium counterions combined with an implicit solvent description (27). MC sampling was performed over 1 million cycles, with random conformational changes of all degrees of freedom in each cycle. All-atom structures were recorded every 100 MC cycles, forming an MC ensemble of 10,000 structures for each simulation.

2.4 Computation of expected inter-R5 distances
A previously validated NASNOX program (20,22) was used to calculate expected inter-R5 distances (r_model). Briefly, for each DNA model, the program modeled R5 at the target site and then identified sterically allowed R5 conformers using the following search parameters (see (20) for details on these parameters): t1 steps: 3; t2 steps: 6; t3 steps: 6; fine search: on; t1 starting value: 180°; t2 starting value: 180°; t3 starting value: 180°; and no additional conformer search criterion.
Searches were carried out separately for the Rp and Sp diastereomers (i.e., R5 attached to the O1P or the O2P atom), and the results were combined to yield the ensemble of allowable R5 conformers at the given labeling site. r_model between two specific labeling sites was then calculated by averaging all inter-R5 distances between the two corresponding R5 ensembles. Controls showed that varying the search parameters changed r_model by less than 1 Å. For the bound REs, expected inter-R5 distances (r_crystal) were computed based on the reported crystal structures: PDB ID 3TS8 for the p21-RE (8) and 4HJE for the BAX-RE (9). A modified version of the NASNOX program was used to account for the presence of protein atoms.

2.5 Characterization of DNA duplex models
For a given model j of a DNA duplex, we defined a scoring function P_t as:

$$P_t^{j} = \prod_i \exp\left\{-\frac{\left(r^{i}_{\mathrm{model}-j} - r^{i}_{0}\right)^{2}}{2\,(\sigma^{i})^{2}}\right\} \qquad (1)$$

where i designates a particular distance in the DNA duplex, r^i_model-j is the NASNOX-computed expected distance for model j, r^i_0 is the DEER-measured mean distance, and σ^i is the width of the measured distance distribution. Heavy-atom root-mean-square deviations between DNA models (RMSD_struct) were calculated using the program VMD (28). Unless otherwise stated, calculations included only the interior of the duplex, excluding two base pairs at either terminus. DNA structural parameters were calculated using CURVES (29).

3. Results
3.1 DEER-measured distances reveal that p53DBD binding induces RE conformational change
To examine the conformation of REs, a pair of R5 probes (Figure 1B) was attached at specific DNA sites using a previously established phosphorothioate scheme (20,21), and inter-R5 distances were measured by DEER. Each measured distance was designated by the corresponding labeling site numbers; for example, [14; 34] in the p21-RE represents the distance measured with a pair of R5 attached to the phosphorothioates of nucleotides C14 and C34 (Figure 1A).
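As background for the dipolar evolution traces discussed next, the sketch below uses the standard point-dipole approximation for two electron spins, in which the dipolar frequency scales as 1/r³ (about 52 MHz at 1 nm for g ≈ 2.0023), together with the orientation-averaged single-distance dipolar kernel (full modulation depth, no intermolecular background). It also shows one way to obtain a mean distance and width from a discretized distance distribution P(r). Taking σ as the standard deviation of P(r) is an assumption: the chapter states only that r0 and σ were calculated as in (21), and the inversion of the measured traces into P(r) is not reproduced here. All function names are hypothetical.

```python
import math

# Physical constants in SI units (CODATA 2018 values)
MU0_OVER_4PI = 1.0e-7          # vacuum permeability divided by 4*pi
MU_B = 9.2740100783e-24        # Bohr magneton, J/T
H_PLANCK = 6.62607015e-34      # Planck constant, J*s
G_E = 2.00231930436            # free-electron g value (approximation for nitroxides)


def dipolar_frequency_mhz(r_nm: float, g1: float = G_E, g2: float = G_E) -> float:
    """Point-dipole coupling nu_dd (MHz) between two electron spins r_nm apart."""
    r_m = r_nm * 1e-9
    nu_hz = MU0_OVER_4PI * g1 * g2 * MU_B ** 2 / (H_PLANCK * r_m ** 3)
    return nu_hz / 1e6


def form_factor(t_us: float, r_nm: float, n: int = 2000) -> float:
    """Orientation-averaged dipolar kernel for one distance (full modulation, no background):
    V(t) = integral over x = cos(theta) in [0, 1] of cos[2*pi*nu_dd*(1 - 3x^2)*t].
    """
    nu = dipolar_frequency_mhz(r_nm)   # MHz times microseconds gives cycles
    total = 0.0
    for k in range(n):                 # midpoint rule on [0, 1]
        x = (k + 0.5) / n
        total += math.cos(2 * math.pi * nu * (1 - 3 * x * x) * t_us)
    return total / n


def mean_and_width(r_grid: list[float], p: list[float]) -> tuple[float, float]:
    """Mean distance r0 and width sigma (standard deviation) of a discretized P(r)."""
    norm = sum(p)
    r0 = sum(r * w for r, w in zip(r_grid, p)) / norm
    var = sum(w * (r - r0) ** 2 for r, w in zip(r_grid, p)) / norm
    return r0, math.sqrt(var)


if __name__ == "__main__":
    nu3 = dipolar_frequency_mhz(3.0)
    print(f"r = 3 nm: nu_dd = {nu3:.2f} MHz, period = {1 / nu3:.2f} us")
```

For example, a 3 nm separation gives ν_dd of about 1.93 MHz, i.e., roughly one oscillation every 0.52 µs; because of the 1/r³ scaling, longer distances require proportionally longer dipolar evolution windows.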
For all double-labeled REs, the measured dipolar evolution traces showed an oscillating decay pattern (see Figure 1C for examples), from which the inter-R5 distances were determined. Control measurements on single-labeled REs gave flat traces (SI Figure S1), ensuring that the distances measured in the double-labeled DNA were not biased by spin interactions arising from undesired sample aggregation. In addition, previous studies have demonstrated that R5 does not significantly distort the DNA duplex and that the measured distances accurately report on the native structure (21,22,30).

We measured multiple distances in REs in the absence and presence of p53DBD in order to examine protein-induced DNA conformational changes. In these measurements, we chose labeling sites with minimal perturbation to p53 binding, and we confirmed p53DBD-DNA complex formation by gel shift assays (SI Figure S2). For the p21-RE, 2 of the 6 datasets, [14; 34] (Figure 1C) and [13; 34] (SI Figure S3D), showed clear differences in the measured echo evolution traces upon p53DBD binding. The differences in r0 between the unbound and bound DNA were 5 Å and 3 Å, respectively (SI Table S1), well beyond the error of r0 measurements (±1 Å). In addition, control studies indicated that changes in R5 conformers upon p53DBD binding were minimal and unlikely to account for the observed r0 changes (SI Figure S4). As such, the distance changes observed in datasets [14; 34] and [13; 34] clearly revealed p21-RE conformational changes upon p53DBD binding. On the other hand, the remaining 4 datasets, including [9; 24] shown in Figure 1C, gave superimposable dipolar evolution traces in the absence and presence of p53DBD, resulting in little or no change in r0 (SI Table S1, SI Figure S3D). For the BAX-RE, 10 sets of distances were measured in unbound and bound DNA (SI Table S1, SI Figure S5). Interestingly, the observed distance changes were much smaller than those observed for the p21-RE.
The only noticeable distance changes were from datasets [15; 36] (Figure 1C) and [14; 36] (SI Figure S5D), both 2 Å, whereas the remaining 8 datasets showed little or no change in r0 (SI Table S1, SI Figure S5D). Overall, in both the p21- and BAX-RE, p53-induced distance changes were detected by DEER, unambiguously revealing RE conformational changes upon interaction with p53. However, given the mix of variable and invariant distances, it was difficult to deduce the mode of RE conformational change “intuitively”, even though crystal structures of both bound DNAs were available (8,9). This motivated us to “solve” the conformation of the unbound REs in solution.

3.2 Conformation of unbound RE in solution
To obtain all-atom models of the unbound REs, we implemented a strategy that has been successfully used in SDSL mapping of an RNA junction (24), namely using the DEER-measured distances to select sterically acceptable models from a large pool of three-dimensional structures. In our approach, these structures were generated by unrestrained MC simulations (18). For each structural model, the NASNOX program (20) was used to select sterically allowed R5 conformers at the respective labeling sites, from which the relevant average inter-R5 distances (r_model) were obtained. We then computed a scoring function P_t for each model (Equation 1, Materials and Methods), taking into account the measured and predicted average distances (r0 and r_model) and the width of the measured distance distribution (σ). Under the assumption of an idealized normal distribution, P_t represents the effective probability that a given set of r_model values matches the corresponding r0 values, with a perfect match giving the maximum P_t score of 1. For the p21-RE, we used 16 sets of DEER-measured distances (SI Figure S3, SI Table S2) to compute P_t for a pool of 10,000 models obtained from 1 million MC cycles.
The top-ranked model had a P_t score of 0.64 (Figure 2A, SI Table S2), corresponding on average to a 97% probability of matching each r_model to the corresponding r0 (0.64^(1/16) ≈ 0.97). For this top-ranked model, the root-mean-square deviation between r_model and r0 (RMSD_deer) was 0.81 Å, and the largest difference between a corresponding pair of r0 and r_model was 1.8 Å (SI Table S2). Given that r0 and r_model might each incur errors of ±1 Å, differences below 2 Å were deemed insignificant. The results indicated that the top-ranked MC model satisfies all the measured distances.

Figure 2: Characterization of the unbound p21-RE (A-C) and BAX-RE (D-F). (A) Top-ranked MC model of the unbound p21-RE. (B) Uniform 20-bp B-DNA model constructed with a standard set of base-pair parameters (helix twist: 35.9°; X-displacement: -0.66 Å; C2’-endo sugar pucker). (C) Data mining of MC-generated unbound p21-RE models using EPR-derived distances. X-axis: P_t computed based on 16 measured distances in the unbound p21-RE duplex. Y-axis: RMSD_struct computed against the top-ranked MC model. Data points corresponding to uniform B-DNA (cyan) and bound DNA from PDB ID 3TS8 (red) are also included. (D) Top-ranked MC model of the unbound BAX-RE. (E) Uniform 21-bp B-DNA model. (F) Data mining of MC-generated unbound BAX-RE models using EPR-derived distances. All color codes are the same as in (C), except that the bound DNA data point (red) was obtained using PDB ID 4HJE.

To characterize variations among the EPR-derived p21-RE models, we adopted an approach commonly used in NMR studies and further analyzed the 20 MC models with the highest P_t scores. The pairwise RMSD_struct within this top-20 ensemble was (1.0 ± 0.3) Å (average ± standard deviation; same below), suggesting that models that conformed to the measured distances were structurally similar (SI Figure S6).
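The chapter computed RMSD_struct with VMD. For readers who want to reproduce this kind of comparison outside VMD, the sketch below implements the standard Kabsch superposition with NumPy; it is a generic, swapped-in technique, not the VMD code. The optional `fit` and `measure` index sequences allow superposition on one atom subset and evaluation on another, which covers both the exclusion of two terminal base pairs and the half-site 1 alignment used in Section 3.4. Coordinates are assumed to be matched heavy-atom arrays of shape (N, 3); `kabsch_rmsd` and `pairwise_rmsd_stats` are hypothetical names, and the standard deviation uses NumPy's default population convention because the chapter does not specify one.

```python
from itertools import combinations

import numpy as np


def kabsch_rmsd(P, Q, fit=None, measure=None) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition of P onto Q.

    fit:     indices of atoms used to determine the superposition (default: all atoms).
    measure: indices of atoms over which the RMSD is evaluated (default: same as fit).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    fit = np.arange(len(P)) if fit is None else np.asarray(list(fit))
    measure = fit if measure is None else np.asarray(list(measure))
    p_c = P[fit].mean(axis=0)
    q_c = Q[fit].mean(axis=0)
    H = (P[fit] - p_c).T @ (Q[fit] - q_c)      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # proper rotation mapping P onto Q
    P_aligned = (P - p_c) @ R.T + q_c
    diff = P_aligned[measure] - Q[measure]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def pairwise_rmsd_stats(models) -> tuple[float, float]:
    """Mean and (population) standard deviation of all pairwise RMSDs in an ensemble."""
    values = [kabsch_rmsd(a, b) for a, b in combinations(models, 2)]
    return float(np.mean(values)), float(np.std(values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ensemble = [rng.normal(size=(50, 3)) for _ in range(3)]   # placeholder coordinates
    print(pairwise_rmsd_stats(ensemble))
```

Because the rotation is fitted only on the `fit` atoms, passing half-site 1 indices as `fit` and half-site 2 indices as `measure` mirrors the "align one half-site, compare the other" analysis.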
In addition, a search using only 14 of the 16 measured distances yielded the same top-ranked model and a very similar top-20 ensemble (SI Table S3). Overall, the data indicated that, within the resolution of the method, the 16 measured distances were sufficient to identify a set of all-atom MC models with convergent conformations. We also examined models of generic B- and A-form duplexes built with identical helical parameters (Figure 2B and SI Table S2), which therefore exhibited uniform shapes without sequence-dependent characteristics. As expected, a uniform 20-bp A-form duplex, which differed drastically from the top-ranked p21-RE model (RMSD_struct = 5.0 Å), fitted the measured distances poorly (P_t = 1.0 × 10^-17; RMSD_deer = 8.5 Å) (SI Table S2). More importantly, a uniform 20-bp B-form duplex (Figure 2B) also yielded a low P_t score (3.3 × 10^-6) and a high RMSD_deer value of 3.9 Å (SI Table S2), indicating that this generic B-DNA did not conform to the distances measured for the p21-RE. Indeed, the generic B-DNA had an RMSD_struct of 2.9 Å relative to the top-ranked p21-RE model, larger than the variation among the top-20 EPR-derived models (Figure 2C). Therefore, the 16 sets of distances yielded a converged conformation of the unbound p21-RE with a sequence-dependent shape. Using the SDSL-MC approach, we also obtained all-atom models for the unbound BAX-RE with 18 sets of measured distances (SI Figure S5, SI Table S4). The top-ranked BAX-RE model (Figure 2D) had a P_t score of 0.67 (SI Table S4), i.e., an average 98% probability of matching each r_model to the corresponding r0. This top-ranked MC model again differed from a 21-bp generic B-duplex (Figure 2E) and satisfied all the DEER-measured distances (SI Table S4). In addition, the pairwise RMSD_struct among the top 20 BAX-RE models was (1.0 ± 0.3) Å, indicating a high degree of structural similarity (SI Figure S6).
Furthermore, control studies showed that 18 sets of distances were sufficient to identify a set of all-atom models with convergent conformations for the unbound BAX-RE (SI Table S5).

3.3 Assessing p53DBD-bound RE conformations in solution
To evaluate the bound RE conformation in solution, we compared the expected average inter-R5 distances based on the crystal structures (r_crystal) with the corresponding DEER-measured r0 values (SI Table S6). For the p21-RE, the 4 distances spanning the central region between the two CATG cores showed differences ranging from -1.4 to 1.0 Å, within the 2 Å variability range of our measurements. This indicated that, within the resolution accessible to our method, the crystal structure of the bound p21-RE central region accurately reflected its conformation in solution. The same conclusion was previously drawn for the central region of the bound BAX-RE (9) (see also SI Table S6). Despite the general agreement observed at the central region, 4 equivalent distances, one across each of the four CWWG cores present in the p21- and BAX-RE, showed that the measured r0 values exceeded the corresponding r_crystal by 2.9 to 5.0 Å (SI Table S6). In these measurements, DNA-p53DBD complex formation was confirmed (SI Figure S2), although r0 did not change upon p53DBD binding (SI Table S1). In addition, these r0 values were substantially larger than the corresponding r_crystal distances obtained from other crystal structures (4-6). Taken together, the data indicated that in solution the CWWG core conformation in the bound REs likely differed from the conformations captured in the crystal (see Discussion). At this point, however, extensive protein-DNA contacts in the CWWG regions severely limited our ability to measure informative distances for deriving an EPR-based model, and the bound conformations of the CWWG core regions could not be deduced with certainty.
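The significance thresholds used in Sections 3.1-3.3 follow from simple error bookkeeping: with r0 and the computed distance each uncertain by up to ±1 Å, adding the bounds linearly gives the conservative 2 Å variability range, whereas adding them in quadrature would give about 1.4 Å. The sketch below, with hypothetical helper names, applies the 2 Å criterion to the difference ranges quoted in Section 3.3 (-1.4 to 1.0 Å at the p21-RE central region, 2.9 to 5.0 Å across the CWWG cores); the dictionary labels are illustrative and are not dataset identifiers.

```python
import math


def combined_error(*errors: float, worst_case: bool = True) -> float:
    """Combine error bounds: linear sum (worst case) or root-sum-square (independent errors)."""
    if worst_case:
        return sum(errors)
    return math.sqrt(sum(e * e for e in errors))


def flag_differences(deltas: dict[str, float], threshold: float) -> dict[str, bool]:
    """True where |delta| reaches the threshold, i.e. the difference is deemed significant."""
    return {label: abs(d) >= threshold for label, d in deltas.items()}


if __name__ == "__main__":
    threshold = combined_error(1.0, 1.0)                       # conservative 2 A bound
    quadrature = combined_error(1.0, 1.0, worst_case=False)    # about 1.41 A
    quoted = {"central, lower bound": -1.4, "central, upper bound": 1.0,
              "CWWG, smallest": 2.9, "CWWG, largest": 5.0}
    print(threshold, round(quadrature, 2), flag_differences(quoted, threshold))
```

Under the linear (worst-case) bound, the central-region differences fall inside the variability range while the CWWG differences do not, matching the conclusions drawn in Section 3.3.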
3.4 Distinct conformational changes at the central region of the REs upon p53DBD binding
The unbound p21-RE conformation derived from the EPR-MC pipeline differed from the bound DNA reported in the crystal structure by an RMSD_struct of 2.6 Å, which was beyond the variation among the top-20 models (Figure 2C, SI Figure S7A). We modeled the unbound DNA into the tetrameric complex by aligning half-site 1 (i.e., nucleotides A3-C10/G31-T38; Figure 1A). This half-site could be aligned reasonably well between the bound and unbound DNA, with an RMSD of 1.4 Å between heavy atoms of the aligned nucleotides (Figure 3A). However, with half-site 1 aligned, the bound and unbound DNA deviated at half-site 2 (i.e., C11-T18/A23-G30) with an RMSD of 8.4 Å (Figure 3A). Thus, if the protein tetramer were maintained, the unbound DNA conformation would not allow proper protein-DNA contacts (e.g., R280 to G7, G17, G27 and G37; Figure 3A) to form simultaneously at both half-sites. This likely drove deformation at the central region of the bound p21-RE, resulting in the previously noted displacement between the helix axes of the two half-sites (SI Figure S8) (8).

Figure 3: Conformational changes in REs upon p53 binding. In each panel, shown on the left is the superimposition of the SDSL-MC-derived unbound RE onto the corresponding co-crystal structure of the complex, with blue sticks representing the Arg280 residues and the CPK representation denoting the G7, G17, G27 and G37 nucleotides in the RE. Shown on the right are schematic representations of the bound and unbound REs. (A) The p21-RE, with the DNAs aligned at nucleotides A3-C10/G31-T38 (dashed box). Note that in this work the p53 construct included only the wild-type DBD, whereas the co-crystal structure of the complex (PDB ID 3TS8) included the DBD linked directly to the oligomerization domain without the linker present in the wild type, with mutations present in both domains (7,8).
(B) The BAX-RE, with the DNAs aligned at nucleotides A3-A10/T33-T40 (dashed box).

For the BAX-RE, RMSD_struct between the unbound and bound DNA was 2.1 Å (Figure 2F). This is smaller than that of the p21-RE and indicates that the BAX-RE underwent a more subtle p53-induced conformational change (Figure 3). When we aligned the unbound and bound BAX-RE based on half-site 1 (i.e., nucleotides A3-A10/T33-T40), the bound DNA was in effect superimposed on the ensemble of top-20 unbound RE models (SI Figure S7B), and half-site 2 (i.e., nucleotides A12-C19/G24-T31) of the top-ranked unbound BAX-RE deviated from the corresponding segment of the bound DNA with an RMSD of 3.9 Å (Figure 3B). These observations were consistent with the small distance changes observed at the central region of the BAX-RE upon p53 binding (SI Table S1), and suggest that the unbound BAX-RE may be poised to interact with the p53 tetramer owing to its sequence-dependent shape (Figure 3B). Most notably, over the 9-bp central region spanning G7/C36 and C15/G28 (Figure 1A), the unbound BAX-RE was under-wound by ~15° compared to a generic B-DNA. This facilitated the transition into the p53-bound form, in which further unwinding at the central region has been observed (9).

4. Discussion
We established a new SDSL-MC approach to study the conformations of two prototypic p53 REs in solution. The unbound RE conformations, obtained using multiple measured nanometer distances as constraints, were within the B-DNA family while exhibiting sequence-dependent structural properties distinct from a uniform B-DNA. In both REs, p53-induced DNA deformations were detected at the central region between the two half-sites. The results indicate that the sequence-dependent shapes of the unbound REs influence the mode of DNA conformational change upon interaction with p53, and thereby may serve as a mechanism for achieving p53-RE binding specificity.
4.1 Sequence-dependent conformational changes of RE upon p53DBD binding
Previous biochemical and computational studies have suggested changes in RE conformation upon p53 binding (31-33). However, the molecular details of the DNA conformational change and its relationship to individual RE sequences remained rather unclear. Earlier work suggested that the bound REs undergo bending at the CWWG region (31). However, recent structural (4-9) and biochemical (33) studies of p53DBD-bound REs generally showed rather small bending. Instead, in a number of crystal structures, deviations from canonical B-DNA characteristics were noted at a confined location between the two half-sites (i.e., the central region) (4,6-9). In this work, SDSL-measured distances unambiguously demonstrated that in solution p53DBD binding induces conformational changes at the central region of the p21- and BAX-RE (Figure 1C, SI Table S1). Perhaps unexpectedly, while the p53DBD-bound complexes exhibited a similar tetrameric scaffold for both REs (9), the p53-induced DNA alteration was more subtle in the 1-bp-spacer BAX-RE than in the 0-bp-spacer p21-RE (Figure 3). This hints that sequence-dependent structural properties encoded in a particular DNA target are exploited by p53 to achieve the energetically most favorable mode of deformation. This hypothesis is further supported by structural analyses of the unbound REs, which were enabled by the all-atom models provided by the new SDSL-MC method. The analyses indeed revealed RE shape variations and suggested tangible connections between structural features in the unbound and bound DNA (Figure 4). For the BAX-RE, the larger positive Roll of the T9pA10 base-pair step was already apparent in the unbound DNA (Figure 4C). Such an intrinsic property of the TpA step facilitates widening of the minor groove (10), which was observed in the bound form (Figure 4A).
In addition, the unbound BAX-RE was under-wound at the central region (Figures 3B and 4), thus facilitating further unwinding in order to accommodate the 9-bp central region into the same volume occupied by 8 bp in other REs with 0-bp spacers (9). On the other hand, in the unbound p21-RE the relative positioning of the two CWWG cores deviated significantly from the bound form, necessitating a shift of the helix axis at a “hinge” located at the interface between the two half-sites (Figure 3). Analyses further indicated that conformational changes at the central region of the REs occurred to facilitate proper protein-DNA interactions while maintaining the intra- and inter-dimer protein contacts (Figure 3). As the collective protein-DNA and protein-protein contacts give rise to cooperative binding, sequence-dependent conformational changes at the central region of the RE may thus modulate cooperativity in p53-RE interaction, thereby contributing to specific RE recognition.

Figure 4: Analyses of p53 RE structures. The DNA shape parameters (A) minor groove width, (B) helix twist, and (C) roll are shown for the p21-RE (left panel) and BAX-RE (right panel). Structural features were derived from the crystal structure of the complex (red), the top-ranked MC model (green), and the average of the top-20 MC models (blue). The error bars indicate the standard deviations of structural parameters among the top-20 models, demonstrating efficient conformational sampling. The structural parameters indicate that the conformations observed in the crystal structures of the bound forms were partially apparent in the intrinsic DNA shape of the unbound forms. Examples of this observation are the low helix twist values at the C10pC11 step of the p21-RE and the A10pG11 step of the BAX-RE, as well as the negative Roll at the A12pA13 step of the p21-RE and the positive Roll at the T9pA10 step of the BAX-RE.
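The ~15° under-winding of the 9-bp BAX-RE central region (Section 3.4) can be put in perspective with back-of-the-envelope arithmetic. The sketch assumes the uniform B-DNA twist of 35.9° per step quoted for Figure 2B and reads the ~15° as a deficit in cumulative twist over the region's eight base-pair steps; the 8-bp total is included only as a point of comparison with the 0-bp-spacer REs. The function names are hypothetical.

```python
B_DNA_TWIST = 35.9  # degrees per base-pair step for the uniform B-DNA of Figure 2B


def cumulative_twist(n_bp: int, twist_per_step: float = B_DNA_TWIST) -> float:
    """Total helical twist (degrees) over n_bp base pairs, i.e. n_bp - 1 steps."""
    return (n_bp - 1) * twist_per_step


def mean_step_twist(total_twist: float, n_bp: int) -> float:
    """Average twist per step (degrees) for a region of n_bp base pairs."""
    return total_twist / (n_bp - 1)


if __name__ == "__main__":
    generic_9bp = cumulative_twist(9)          # 9-bp BAX-RE central region, 8 steps
    unbound_bax = generic_9bp - 15.0           # assumed ~15 degree cumulative deficit
    print(f"generic 9 bp: {generic_9bp:.1f} deg; unbound BAX-RE: {unbound_bax:.1f} deg "
          f"({mean_step_twist(unbound_bax, 9):.1f} deg/step); "
          f"generic 8 bp: {cumulative_twist(8):.1f} deg")
```

Under these assumptions, a generic 9-bp region spans 287.2° of twist and the unbound BAX-RE about 272°, or roughly 34.0° per step, while a generic 8-bp region spans 251.3°.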
For both the bound p21- and BAX-RE, the SDSL data indicated that the solution-state conformations of the CWWG cores deviated to a certain degree from the corresponding crystal structures, although the central regions were in good agreement (SI Table S6). We cannot rule out the possibility that this discrepancy arises because the CWWG cores and the central region were affected differently by differences in experimental conditions (e.g., frozen solution vs. crystal; differences in constructs, see Figure 3 caption). However, the SDSL data may also reflect an intrinsic variability of the CWWG core, as suggested by previous studies (4-6,9,31,33). In particular, the ApT steps within the CWWG cores have been reported in either Watson-Crick (6) or Hoogsteen (4) configuration. Transitions between Watson-Crick and Hoogsteen geometry, which are associated with base flipping (34), may account for the longer distances measured in solution reported here. Furthermore, whereas p53DBD binding induced a larger degree of deformation at the central region of the p21-RE than at that of the BAX-RE, the p21-RE is known to bind p53 more tightly (35). One possible explanation for this apparently puzzling observation is that the respective CWWG cores respond differently to p53 binding. Further investigation of the CWWG core, particularly in the bound state, is required. Finally, p53/DNA interactions can be affected by sequences beyond the DBD and RE (3). Whereas biophysical studies focusing on folded p53 fragments have provided a wealth of information on p53 structure and function, expanding beyond these “truncated” systems is highly desirable. The SDSL-MC approach, which can provide molecular details in large non-crystalline complexes, is particularly suited for such studies.
4.2 Mapping sequence-dependent DNA shape using the SDSL-MC approach
Whereas early SDSL studies used DNA duplexes as model systems (21,36,37), more recent reports have used SDSL-measured distances to study DNA duplex conformation in response to base lesions (38), mismatches (39), and protein binding (40). SDSL has also been used to study higher-order DNA structures such as quadruplexes (41) and four-way junctions (42). In this work, using the R5 probe, which can be attached to any nucleotide within a target sequence, multiple distances were readily measured, and they directly revealed conformational changes between the bound and unbound DNA. In addition, synergistic integration with MC sampling allowed us to derive atomic models of the target DNA with sequence-dependent shape. In each top-20 ensemble of unbound RE, the models are (i) structurally highly similar and (ii) clearly different from a uniform B-DNA (Figures 2 & 4). This demonstrates that the SDSL-MC pipeline can provide detailed structural information on DNA duplexes. Note that the bound p21-RE structure differed from the top-ranked model of the unbound DNA by an RMSD_struct of 2.6 Å, but from uniform B-DNA by only 1.8 Å. As such, obtaining the sequence-dependent shape of the unbound DNA has a profound impact on properly assessing protein-induced deformation of DNA targets. Furthermore, intrinsic DNA shape features revealed by the SDSL-MC approach will benefit a broad range of efforts, such as the prediction of transcription factor specificities based on regression models that combine DNA sequence and shape (43,44). Nevertheless, further studies are needed to explore the utility and limitations of the SDSL-MC method. For example, the “resolution” that can be achieved by this approach remains to be investigated. In addition, DEER measures distances in a frozen solution state, whereas MC simulations target solution-state equilibrium at ambient temperature.
It is not clear how the unique aspects of each methodology affect the interpretation of the resulting DNA shape.
In summary, the results reported here demonstrate that the SDSL-MC approach reveals sequence-dependent shapes of p53 REs that advance our understanding of p53/DNA recognition. The method is not limited by the size of the system and allows parallel examination of DNA shape in both the unbound and protein-bound states. This is a step toward uncovering the role of intrinsic DNA shape in protein-DNA recognition in general.
References
1. Riley, T., Sontag, E., Chen, P. and Levine, A. (2008) Transcriptional control of human p53-regulated genes. Nat. Rev. Mol. Cell Biol., 9, 402-412.
2. Petitjean, A., Mathe, E., Kato, S., Ishioka, C., Tavtigian, S.V., Hainaut, P. and Olivier, M. (2007) Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database. Hum. Mutat., 28, 622-629.
3. Joerger, A.C. and Fersht, A.R. (2010) The Tumor Suppressor p53: From Structures to Drug Discovery. Cold Spring Harb. Perspect. Biol., 2, a000919.
4. Kitayner, M., Rozenberg, H., Rohs, R., Suad, O., Rabinovich, D., Honig, B. and Shakked, Z. (2010) Diversity in DNA recognition by p53 revealed by crystal structures with Hoogsteen base pairs. Nat. Struct. Mol. Biol., 17, 423-429.
5. Eldar, A., Rozenberg, H., Diskin-Posner, Y., Rohs, R. and Shakked, Z. (2013) Structural studies of p53 inactivation by DNA-contact mutations and its rescue by suppressor mutations via alternative protein-DNA interactions. Nucleic Acids Res.
6. Chen, Y., Dey, R. and Chen, L. (2010) Crystal Structure of the p53 Core Domain Bound to a Full Consensus Site as a Self-Assembled Tetramer. Structure, 18, 246-256.
7. Petty, T.J., Emamzadah, S., Costantino, L., Petkova, I., Stavridi, E.S., Saven, J.G., Vauthey, E. and Halazonetis, T.D. (2011) An induced fit mechanism regulates p53 DNA binding kinetics to confer sequence specificity. EMBO J., 30, 2167-2176.
8. Emamzadah, S., Tropia, L. and Halazonetis, T.D. (2011) Crystal Structure of a Multidomain Human p53 Tetramer Bound to the Natural CDKN1A (p21) p53-Response Element. Mol. Cancer Res., 9, 1493-1499.
9. Chen, Y., Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Z., Qin, P.Z., Rohs, R. and Chen, L. (2013) Structure of p53 binding to the BAX response element reveals DNA unwinding and compression to accommodate base-pair insertion. Nucleic Acids Res., 41, 8368-8376.
10. Rohs, R., West, S.M., Sosinsky, A., Liu, P., Mann, R.S. and Honig, B. (2009) The role of DNA shape in protein-DNA recognition. Nature, 461, 1248-1253.
11. Egli, M. and Pallan, P.S. (2010) The many twists and turns of DNA: template, telomere, tool, and target. Curr. Opin. Struct. Biol., 20, 262-275.
12. Subirana, J.A. and Messeguer, X. (2010) The most frequent short sequences in non-coding DNA. Nucleic Acids Res., 38, 1172-1181.
13. Zhou, T., Yang, L., Lu, Y., Dror, I., Dantas Machado, A.C., Ghane, T., Di Felice, R. and Rohs, R. (2013) DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res.
14. Fanucci, G.E. and Cafiso, D.S. (2006) Recent Advances and applications of site-directed spin labeling. Curr. Opin. Struct. Biol., 16, 644-653.
15. Sowa, G.Z. and Qin, P.Z. (2008) Prog. Nucleic Acids Res. Mol. Biol., 82, 147-197.
16. Bishop, E.P., Rohs, R., Parker, S.C., West, S.M., Liu, P., Mann, R.S., Honig, B. and Tullius, T.D. (2011) A map of minor groove shape and electrostatic potential from hydroxyl radical cleavage patterns of DNA. ACS Chem. Biol., 6, 1314-1320.
17. Schiemann, O. and Prisner, T.F. (2007) Long-range distance determinations in biomacromolecules by EPR spectroscopy. Q. Rev. Biophys., 40, 1-53.
18. Rohs, R., Sklenar, H. and Shakked, Z. (2005) Structural and energetic origins of sequence-specific DNA bending: Monte Carlo simulations of papillomavirus E2-DNA binding sites. Structure, 13, 1499-1509.
19. Sklenar, H., Wüstner, D. and Rohs, R. (2006) Using internal and collective variables in Monte Carlo simulations of nucleic acid structures: Chain breakage/closure algorithm and associated Jacobians. J. Comput. Chem., 27, 309-315.
20. Qin, P.Z., Haworth, I.S., Cai, Q., Kusnetzow, A.K., Grant, G.P.G., Price, E.A., Sowa, G.Z., Popova, A., Herreros, B. and He, H. (2007) Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nat. Protoc., 2, 2354-2365.
21. Cai, Q., Kusnetzow, A.K., Hubbell, W.L., Haworth, I.S., Gacho, G.P.C., Van Eps, N., Hideg, K., Chambers, E.J. and Qin, P.Z. (2006) Site-directed spin labeling measurements of nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nucleic Acids Res., 34, 4722-4734.
22. Price, E.A., Sutch, B.T., Cai, Q., Qin, P.Z. and Haworth, I.S. (2007) Computation of nitroxide-nitroxide distances in spin-labeled DNA duplexes. Biopolymers, 87, 40-50.
23. Cai, Q., Kusnetzow, A.K., Hideg, K., Price, E.A., Haworth, I.S. and Qin, P.Z. (2007) Nanometer Distance Measurements in RNA Using Site-Directed Spin Labeling. Biophys. J., 93, 2110-2117.
24. Zhang, X., Tung, C.-S., Sowa, G.Z., Hatmal, M.m.M., Haworth, I.S. and Qin, P.Z. (2012) Global structure of a three-way junction in a phi29 packaging RNA dimer determined using site-directed spin labeling. J. Am. Chem. Soc., 134, 2644-2652.
25. Jeschke, G., Chechik, V., Ionita, P., Godt, A., Zimmermann, H., Banham, J., Timmel, C., Hilger, D. and Jung, H. (2006) DeerAnalysis2006—a comprehensive software package for analyzing pulsed ELDOR data. Appl. Magn. Reson., 30, 473-498.
26. Rohs, R., Bloch, I., Sklenar, H. and Shakked, Z. (2005) Molecular flexibility in ab initio drug docking to DNA: binding-site and binding-mode transitions in all-atom Monte Carlo simulations. Nucleic Acids Res., 33, 7048-7057.
27. Rohs, R., Etchebest, C. and Lavery, R. (1999) Unraveling Proteins: A Molecular Mechanics Study. Biophys. J., 76, 2760-2768.
28. Humphrey, W., Dalke, A. and Schulten, K. (1996) VMD: visual molecular dynamics. J. Mol. Graph., 14, 33-38, 27-28.
29. Lavery, R. and Sklenar, H. (1989) Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dyn., 6, 655-667.
30. Popova, A.M., Kálai, T., Hideg, K. and Qin, P.Z. (2009) Site-Specific DNA Structural and Dynamic Features Revealed by Nucleotide-Independent Nitroxide Probes. Biochemistry, 48, 8540-8550.
31. Nagaich, A.K., Zhurkin, V.B., Durell, S.R., Jernigan, R.L., Appella, E. and Harrington, R.E. (1999) p53-induced DNA bending and twisting: p53 tetramer binds on the outer side of a DNA loop and increases DNA twisting. Proc. Natl. Acad. Sci. U. S. A., 96, 1875-1880.
32. Pan, Y. and Nussinov, R. (2008) p53-Induced DNA Bending: The Interplay between p53-DNA and p53-p53 Interactions. J. Phys. Chem. B, 112, 6716-6724.
33. Beno, I., Rosenthal, K., Levitine, M., Shaulov, L. and Haran, T.E. (2011) Sequence-dependent cooperative binding of p53 to DNA targets and its relationship to the structural properties of the DNA targets. Nucleic Acids Res., 39, 1919-1932.
34. Honig, B. and Rohs, R. (2011) Biophysics: Flipping Watson and Crick. Nature, 470, 472-473.
35. Weinberg, R.L., Veprintsev, D.B., Bycroft, M. and Fersht, A.R. (2005) Comparative Binding of p53 to its Promoter and DNA Recognition Elements. J. Mol. Biol., 348, 589-596.
36. Ward, R., Keeble, D.J., El-Mkami, H. and Norman, D.G. (2007) Distance determination in heterogeneous DNA model systems by pulsed EPR. ChemBioChem, 8, 1957-1964.
37. Schiemann, O., Cekan, P., Margraf, D., Prisner, T.F. and Sigurdsson, S.T. (2009) Relative orientation of rigid nitroxides by PELDOR: beyond distance measurements in nucleic acids. Angew. Chem. Int. Ed. Engl., 48, 3292-3295.
38. Sicoli, G., Mathis, G., Aci-Seche, S., Saint-Pierre, C., Boulard, Y., Gasparutto, D. and Gambarelli, S. (2009) Lesion-induced DNA weak structural changes detected by pulsed EPR spectroscopy combined with site-directed spin labelling. Nucleic Acids Res., 37, 3165-3176.
39. Wunnicke, D., Ding, P., Seela, F. and Steinhoff, H.-J. (2012) Site-Directed Spin Labeling of DNA Reveals Mismatch-Induced Nanometer Distance Changes between Flanking Nucleotides. J. Phys. Chem. B, 116, 4118-4123.
40. Reginsson, G.W., Shelke, S.A., Rouillon, C., White, M.F., Sigurdsson, S.T. and Schiemann, O. (2013) Protein-induced changes in DNA structure and dynamics observed with noncovalent site-directed spin labeling and PELDOR. Nucleic Acids Res., 41, e11.
41. Singh, V., Azarkh, M., Exner, T.E., Hartig, J.S. and Drescher, M. (2009) Human telomeric quadruplex conformations studied by pulsed EPR. Angew. Chem. Int. Ed. Engl., 48, 9728-9730.
42. Freeman, A.D.J., Ward, R., El Mkami, H., Lilley, D.M.J. and Norman, D.G. (2011) Analysis of Conformational Changes in the DNA Junction-Resolving Enzyme T7 Endonuclease I on Binding a Four-Way Junction Using EPR. Biochemistry, 50, 9963-9972.
43. Gordan, R., Shen, N., Dror, I., Zhou, T., Horton, J., Rohs, R. and Bulyk, M.L. (2013) Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep., 3, 1093-1104.
44. Dror, I., Zhou, T., Mandel-Gutfreund, Y. and Rohs, R. (2013) Covariation between homeodomain transcription factors and the shape of their DNA binding sites. Nucleic Acids Res.
Supplementary Information
Figure S1: Examples of DEER measurements on single-labeled p21-RE duplexes. Panels (A) and (B) show data obtained with a single R5 attached to site 4 and site 24, respectively.
The black traces represent the measured echo decays, and the red traces represent the background computed by fitting an exponential decay, corresponding to a homogeneous three-dimensional distribution of electron spins, to the last half of the data. Panel (C) shows the background-corrected dipolar evolution curves for the two single-labeled samples. No decay or oscillation was observed after background correction, indicating a lack of intermolecular spin dipolar interaction between the single-labeled duplexes. This suggests that the DEER measurements are not biased by random aggregation of DNA or by head-to-tail stacking between different duplexes.
Figure S2: Native gel shift assay examining p53DBD binding to double-labeled REs. Selected DEER samples were diluted 5-fold and loaded onto 6% (w/v) polyacrylamide gels. The gels were run in 0.5× TBE buffer (50 mM Tris, pH 7.6, 45 mM borate, 0.5 mM EDTA) at room temperature. (I) The p21-RE. Samples 1 to 5 are DEER samples labeled at [4; 34], [14; 24], [9; 24], [13; 34], and [14; 34], respectively. (II) The BAX-RE. Samples 1 to 6 are DEER samples labeled at [14; 36], [15; 36], [9; 15], [10; 14], [10; 25], and [12; 25], respectively. Note that DNA staining in the BAX-RE/protein complex is much stronger than that in the p21-RE/protein complex. The respective crystal structures (1,2) show that the DNA conformation is more rigid in the p21-RE complex than in the BAX-RE complex. The DNA staining agent, ethidium bromide, likely intercalates less readily into the DNA duplex within the p21-RE complex, resulting in weaker staining.
Figure S3: DEER-measured distances in the p21-RE. (A) Labeling sites and measured distances in the p21-RE. R5 labeling sites at the phosphates (P) are shown in red. Red dotted lines denote distances measured only in unbound DNA, while red dashed lines denote distances measured in both unbound and bound DNA. (B) DEER data of unbound DNA. (I) Original echo decay data. Black traces are the measured echo decays, normalized to the amplitude at t = 0. Red traces are the dipolar decay background, obtained by fitting an exponential decay to the later portion of the data as determined by the DeerAnalysis2011 program. (II) Dipolar evolution function. Black traces represent the differences between the measured echo decay and the background decay shown in (I). Red traces are the simulated echo decays computed from the corresponding distance distributions shown in (III). (III) Computed distance distributions P(r). Shaded boxes indicate the major bands in P(r), from which the average distances (r0) and the widths of the distributions (σ) were computed. Dotted lines mark the average distances.
(C) DEER data of p53DBD-bound DNA, panels (I)-(III) as in (B).
(D) Comparison of distances measured in unbound and bound DNA.
Figure S4: Example of expected allowable R5 conformers within the p21-RE. Allowable R5 conformers were obtained using the NASNOX program as described in the main text. With the unbound DNA (left), 102 and 105 rotamers are found at sites 14 and 34, respectively; the corresponding numbers in the p53DBD-bound state are 95 and 97. The similarity between the two sets of numbers indicates that the presence of p53DBD did not appreciably change the R5 rotamer distribution and therefore was not expected to cause large changes in the inter-spin distance.
Figure S5: DEER-measured distances in the BAX-RE. All symbols and representations are the same as in SI Figure S3. (A) Labeling sites and measured distances in the BAX-RE. (B) DEER data of unbound DNA. (C) DEER data of p53DBD-bound DNA. (D) Comparison of distances measured in unbound and bound DNA.
Figure S6: Superimposition of the top-20 models derived from SDSL-MC analyses of the unbound REs. Alignments were based on the interior base pairs, excluding the two base pairs at each terminus.
The respective best-fit models are shown in green.
Figure S7: Superimposition of the bound DNA (red) and the top-20 MC models of the unbound DNA (blue; the top-ranked models are shown in green). (A) p21-RE models, aligned at nucleotides A3-C10/G31-T38. (B) BAX-RE models, aligned at nucleotides A3-A10/T33-T38.
Figure S8: Helix axes of the bound p21-RE and of the top-ranked model of the unbound p21-RE. In the central region between the half sites, the helix axis of the bound DNA shows a pronounced shift, whereas that of the unbound DNA shows a slight bend.
Table S1: Comparison of DEER-measured distances (Å) in unbound and bound REs.
Dataset      Unbound   Bound
p21-RE, across the central region
[14; 34]     31        26
[13; 34]     25        22
[4; 14]      33        32
[9; 24]      32        32
p21-RE, within the CATG core
[14; 24]     24        24
[4; 34]      24        23
BAX-RE, across the central region
[14; 36]     29        27
[15; 36]     34        32
[25; 36]     36        35
[9; 15]      33        33
[10; 14]     29        28
[9; 31]      23        23
[12; 25]     21        22
[10; 25]     33        34
BAX-RE, within the CAAG core
[4; 36]      24        24
[15; 25]     24        25
Table S2: Distance measurements and model characterization of the unbound p21-RE. All distances are averages in Å; r0 is DEER-measured, and the last three columns give NASNOX-predicted distances (r_model).
Dataset      r0    Top-ranked MC model   Uniform B-DNA   Uniform A-DNA
Across the central region
[14; 34]     31    30.6                  25.8            30.4
[13; 34]     25    25.0                  20.2            26.0
[4; 14]      33    34.4                  34.0            28.1
[9; 24]      32    32.2                  35.6            15.6
[4; 30]      25    25.2                  30.2            11.5
[8; 24]      38    36.9                  39.9            22.0
[4; 28]      38    38.6                  39.9            22.1
[9; 30]      24    23.8                  20.0            23.8
[6; 17]      38    39.8                  38.1            28.8
[28; 38]     34    32.9                  34.0            26.6
Within the CATG core
[14; 24]     24    23.7                  16.7            21.4
[4; 34]      24    23.6                  16.7            23.5
[4; 8]       30    28.9                  27.5            26.2
[4; 33]      20    19.6                  16.0            18.7
[8; 34]      26    25.4                  24.9            24.6
[4; 38]      26    25.6                  24.9            26.1
RMSD_deer    -     0.81                  3.9             8.5
P_t          -     6.4 × 10^-1           3.3 × 10^-6     1.0 × 10^-17
Table S3: Comparison of unbound p21-RE MC models obtained using either 16 or 14 sets of DEER-measured distances.
Number of measured distances used to search the MC pool     16 (a)   14 (b)
RMSD_struct of the top-20 ensemble (Å) (c), referenced to the top-ranked model
  Average                                                   1.0      1.1
  Standard deviation                                        0.3      0.3
  Maximum                                                   1.4      1.4
(a) The 16 sets of distances are reported in Supplementary Table S2.
(b) Datasets [6; 17] and [28; 38] were omitted when searching the MC model pool. The distances of these two datasets vary widely (standard deviations of 4.2 Å and 4.6 Å, respectively) among all models in the MC pool and are therefore considered informative for discriminating between MC models. The similar results obtained after omitting these two informative datasets suggest that the 16 sets of distances are sufficient to determine a converged set of DNA conformations.
(c) Individual examination revealed that 15 models were present in both groups.
Table S4: Distance measurements and model characterization of the unbound BAX-RE. All distances are averages in Å; r0 is DEER-measured, and the last two columns give NASNOX-predicted distances (r_model).
Dataset      r0    Top-ranked MC model   Uniform B-DNA
Across the central region
[15; 36]     34    34.4                  32.3
[14; 36]     29    28.0                  26.4
[10; 14]     29    29.4                  27.6
[9; 15]      33    33.3                  31.4
[9; 31]      23    22.2                  17.0
[7; 28]      34    33.5                  35.5
[18; 36]     45    45.9                  43.9
[25; 36]     35    36.1                  38.1
[10; 25]     33    33.0                  35.4
[12; 25]     21    20.1                  23.8
[4; 31]      34    34.5                  35.4
[6; 17]      39    39.3                  38.1
[9; 28]      23    23.4                  23.8
[5; 14]      30    30.4                  31.5
Within the CAAG core
[15; 25]     24    24.0                  16.7
[4; 36]      24    24.4                  16.7
[4; 7]       25    24.6                  23.1
[4; 35]      21    22.1                  16.0
RMSD_deer    -     0.82                  3.4
P_t          -     6.7 × 10^-1           1.1 × 10^-5
Table S5: Comparison of unbound BAX-RE MC models obtained using either 18 or 16 sets of DEER-measured distances.
Number of measured distances used to search the MC pool     18 (a)   16 (b)
RMSD_struct of the top-20 ensemble (Å) (c), referenced to the top-ranked model
  Average                                                   1.1      1.2
  Standard deviation                                        0.3      0.4
  Maximum                                                   1.5      1.7
(a) The 18 sets of distances are reported in Supplementary Table S4.
(b) Datasets [9; 28] and [5; 14] were omitted when searching the MC model pool.
(c) Individual examination revealed that 11 models were present in both groups.
Table S6: Assessment of bound DNA conformations reported in co-crystal structures. All distances are averages in Å.
A. Across the central region between half sites
Dataset      Measured (r0)   Predicted (r_crystal)   Difference (r0 - r_crystal)
p21-RE (a)
[14; 34]     26              26.6                    -0.6
[13; 34]     22              21.0                    1.0
[4; 14]      32              33.4                    -1.4
[9; 24]      32              32.4                    -0.4
BAX-RE (b)
[15; 36]     32              32.5                    -0.5
[14; 36]     27              27.5                    -0.5
[10; 14]     28              29.0                    -1.0
[9; 15]      33              32.3                    0.7
[9; 31]      23              23.9                    -0.9
[25; 36]     35              36.1                    -1.1
[10; 25]     34              32.8                    1.2
[12; 25]     22              21.5                    0.5
(a) The central region of the p21-RE is defined as nucleotide positions G7-C14/G27-C34 (see Figure 1A). Predicted distances are based on PDB ID 3TS8.
(b) The central region of the BAX-RE is defined as nucleotide positions G7-C15/G28-C36 (see Figure 1A). Predicted distances are based on PDB ID 4HJE.
B. Within the CWWG core of a half site
Dataset      Measured (r0)   Predicted (r_crystal)   Difference (r0 - r_crystal)
p21-RE
[14; 24]     24              20.4                    3.6
[4; 34]      23              20.1                    2.9
BAX-RE
[15; 25]     25              20.0                    5.0
[4; 36]      24              19.2                    4.8
Supplementary Information References
1. Emamzadah, S., Tropia, L. and Halazonetis, T.D. (2011) Crystal Structure of a Multidomain Human p53 Tetramer Bound to the Natural CDKN1A (p21) p53-Response Element. Mol. Cancer Res., 9, 1493-1499.
2. Chen, Y., Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Z., Qin, P.Z., Rohs, R. and Chen, L. (2013) Structure of p53 binding to the BAX response element reveals DNA unwinding and compression to accommodate base-pair insertion. Nucleic Acids Res., 41, 8368-8376.
Chapter 5
Conformations of GATA3 DNA binding domain bound to DNA studied by site-directed spin labeling
Abstract
GATA3, a member of the GATA-binding-protein family, is a master regulator of T-cell differentiation and plays crucial roles in maintaining the adaptive immune system.
GATA3 features two highly conserved zinc fingers, referred to as the N-finger and C-finger, which bind to DNA targets with the consensus sequence “GATA”. The two fingers are connected by a conserved, highly flexible linker, and it has been proposed that GATA3 binding to GATA sites that are distant in the genome could be critical in reorganizing genome architecture. However, while a number of studies have revealed details of how individual GATA3 fingers, particularly the C-finger, bind target DNA, little is known about the solution conformations of DNA-bound double-finger GATA3. In this study, site-directed spin labeling was applied to study conformations of the double-finger fragment of GATA3 in the presence and absence of its DNA targets. Stable nitroxide spin labels were successfully attached to specific sites on either the protein or the target DNA. Inter-spin distances in the nanometer range were measured using a pulsed electron paramagnetic resonance technique, and the measured distances were used to assess solution conformations of the DNA-free protein as well as of complexes formed by GATA3 and its DNA targets. The results show that the solution conformation of double-finger GATA3 bound to a DNA containing palindromic GATA sites closely matches that reported in a crystallographic study. Moreover, preliminary data indicate that when presented with a target DNA containing a single GATA site, the C-finger binds to the DNA in a specific conformation similar to that adopted on palindromic DNA, whereas the N-finger does not bind in a specific conformation even with excess DNA. This work lays the foundation for spin labeling studies of multi-domain DNA-binding proteins that contain both ordered domains and disordered regions.
1. Introduction
The GATA-binding proteins are a family of transcription factors (TFs) that specifically recognize DNA with the consensus sequence “GATA” [1, 2].
The mammalian GATA family consists of six members, which function as master regulators of lineage-specific cell differentiation [3-5]. In particular, GATA3, the subject of this study (Figure 1A), is critical to the immune system and is an important regulator of T-cell development [6-8]. All six mammalian GATA proteins feature two highly conserved zinc fingers, referred to as the N-terminal zinc finger (N-finger, or Nf) and the C-terminal zinc finger (C-finger, or Cf), which are critical for protein function. The C-finger and its adjacent basic region are necessary and sufficient for binding to the cognate consensus DNA sequence WGATAR (W = A/T, R = A/G) [1, 2, 9-11]. The N-finger, on the other hand, has been shown to interact with protein partners, but can also bind DNA independently with a different sequence preference (i.e., GATC) [12-14]. On certain DNA sequences with two proximal GATA sites, the N-finger can participate in DNA binding together with the C-finger, greatly increasing binding affinity [15].
Figure 1: (A) Domain organization of GATA3. The major functional domains are: red, transactivation domains (TA1 and TA2); green, N-finger; magenta, C-finger; blue, linker; and purple, C-tail. (B) Three crystal structures of the full GATA3 DNA binding domain with DNA containing composite GATA binding sites. In each structure, the N-finger and linker regions are colored green, while the C-finger is purple. In complex 1, the two zinc fingers of the same protein bind to the palindromic GATA sites of the same DNA, while in complexes 2 and 3, the two fingers bridge two DNA duplexes by binding to GATA sites on different duplexes.
Members of the GATA family have been characterized by both X-ray crystallography and NMR spectroscopy, although currently available high-resolution structures are primarily of individual zinc fingers.
Structures have been reported for DNA-bound C-fingers [16-18], as well as for the GATA1 N-finger bound to FOG-1 zinc finger 1 [19]. More recently, the Chen lab reported crystal structures of the full GATA3 DNA binding domain (double-finger, or DF, comprising both zinc fingers and the adjacent basic regions) bound to DNA containing composite GATA sites (Figure 1B) [20]. Structures of DNA-bound double-finger GATA1 have also been deposited (PDB IDs 3VEK and 3VD6), although they have not yet been published. These structures revealed that the detailed interactions between each of the C- and N-fingers and its cognate DNA site are largely conserved, and that the basic region following the C-finger (“C-tail”) contacts both the DNA and the N-finger. These observations are consistent with the highly conserved nature of both zinc fingers and the C-tail. On the other hand, the conserved basic region between the N- and C-fingers, designated as the linker (Figure 1A), is highly disordered and allows complex and versatile DNA-binding modes. In the three DNA-bound DF crystal structures, electron density for most of the linker was missing, and very different modes of DNA binding were observed. In complex 1, the two fingers wrap around a single DNA molecule containing two cognate sites spaced by 1 base pair (bp), whereas in complexes 2 and 3, in which the cognate sites are spaced by 3 bp, one protein bridges two DNA duplexes, with the N- and C-fingers each binding to a WGATAR site on a different duplex, and the N-finger/DNA module (N-module) adopting different orientations relative to the C-finger/DNA module (C-module) (Figure 1B). The detailed binding modes of GATA3 are likely relevant to its functions.
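Because the consensus WGATAR (W = A/T, R = A/G) can occur on either strand, a single site and a palindromic (inverted) pair of sites differ only in which strands the motifs occupy. As an illustration only, the Python sketch below scans a sequence for WGATAR on both strands using a regular expression with a lookahead, so that overlapping hits are kept, and maps minus-strand hits back to top-strand coordinates. The sequences are hypothetical and are not the palGATA or sgGATA duplexes used in this work, and the sketch ignores the flanking-sequence and spacing preferences discussed above.

```python
import re

# IUPAC codes needed for the consensus WGATAR (W = A/T, R = A/G); extend as needed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_pattern(motif):
    """Compile an IUPAC motif into a regex; the zero-width lookahead allows overlapping hits."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")

def find_sites(seq, motif="WGATAR"):
    """Return sorted (start, strand) tuples; start is the 0-based top-strand index of the site."""
    seq = seq.upper()
    pattern = motif_pattern(motif)
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    revcomp = seq.translate(COMPLEMENT)[::-1]
    # A hit at index p of the reverse complement covers top-strand bases n-p-k .. n-p-1.
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp)]
    return sorted(hits)

# Hypothetical sequences for illustration only.
inverted_pair = "GGTTATCTGAGATAACC"   # one WGATAR on each strand, separated by 1 bp
single_site = "CCAGATAGCC"            # one WGATAR on the top strand
print(find_sites(inverted_pair))      # [(2, '-'), (9, '+')]
print(find_sites(single_site))        # [(2, '+')]
```

Reporting the strand alongside the position is what allows an inverted pair (one "+" and one "-" hit a few base pairs apart) to be distinguished from an isolated site.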
It has been reported that 40% of GATA1-binding sites follow the palindromic consensus motif [21], and GATA1 binds palindromic sites with higher affinity than single sites [15], supporting the hypothesis that GATA1 binding to palindromic DNA sites plays an important role in vivo [21]. Although similar studies have not been carried out on GATA3, GATA3 may also bind palindromic sites in vivo, and the complex 1 crystal structure is an example of this protein/DNA binding mode. In addition, the majority of GATA1 response elements (60%) comprise single GATA sites [21]. This is significant because binding of these isolated GATA sites by the N- and C-fingers of GATA proteins could lead to chromosome looping or inter-chromosomal interactions, and the crystal structures of complexes 2 and 3 provide evidence that DF can indeed bridge two DNA molecules. Thus, understanding the detailed mechanism by which GATA3 recognizes and binds its DNA targets would help elucidate its unique and versatile DNA-binding function. However, the high flexibility of the linker region makes crystallization of the whole protein very difficult. More significantly, it also raises concerns about how crystal packing affects the configuration between the N- and C-modules. This challenge is not unique to GATA proteins. In fact, most eukaryotic TFs contain multiple folded domains linked by flexible regions, and many of them can form oligomers to recognize multiple DNA sites, resulting in large macromolecular complexes [22-25]. However, as with the GATA proteins, structural analysis of eukaryotic TFs has been limited predominantly to sub-domains, and little is known about the structure and function of such higher-order protein/DNA complexes.
This is because the co-existence of well-folded domains and disordered linker regions presents obstacles for crystallographic studies, and the size of such complexes is usually beyond the range accessible to NMR. In this study, site-directed spin labeling (SDSL) is applied to the structural study of DF and is shown to be a powerful route for structural studies of macromolecules, in particular systems containing disordered regions. SDSL is a biophysical method that uses chemically stable radicals (spin labels) to obtain structural and dynamic information on biomacromolecules such as DNA, RNA, and proteins. After spin-label attachment, electron paramagnetic resonance (EPR) spectroscopy is used to monitor the behavior of the spin labels and thereby retrieve information about the parent molecule. Two types of information can be derived from SDSL experiments: (1) distances between a pair of spin labels, which provide direct structural constraints; and (2) rotational dynamics of a singly attached spin label, which yield structural and dynamic information about the parent macromolecule at the labeling site. In particular, SDSL is not limited by the size of the molecular system (unlike NMR) or by the nature of the sample (unlike crystallography), making it well suited for structural studies of multi-domain proteins. One commonly used EPR method for measuring distances > 20 Å is double electron-electron resonance (DEER), which determines inter-spin distances by monitoring the inter-spin dipolar interaction. In this study, SDSL was combined with DEER to investigate the DF structure and its interaction with DNA containing either a single GATA site or palindromic GATA sites.
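As a brief numerical illustration of the dipolar interaction that DEER monitors: for two electron spins treated as point dipoles, the coupling frequency at the perpendicular orientation is ν_dd = (μ0/4π) g1 g2 μB² / (h r³), about 52.04 MHz·nm³ / r³ when both g-values are set to the free-electron value (a simplifying assumption; nitroxide g-values are slightly larger). The Python sketch below evaluates this standard textbook expression and its inverse. It is not the analysis procedure used in this work; DEER data analysis fits distance distributions to the full time trace and is considerably more involved.

```python
import math

# Physical constants (SI units)
MU0_OVER_4PI = 1e-7                 # T·m/A; mu0/(4*pi), exact to ~1e-10 relative in the 2019 SI
BOHR_MAGNETON = 9.2740100783e-24    # J/T
PLANCK = 6.62607015e-34             # J·s
G_FREE = 2.00231930436              # free-electron g-value, assumed for both spins

def dipolar_frequency_mhz(r_nm, g1=G_FREE, g2=G_FREE):
    """Perpendicular (theta = 90 deg) point-dipole coupling frequency in MHz for spins r_nm apart."""
    r_m = r_nm * 1e-9
    nu_hz = MU0_OVER_4PI * g1 * g2 * BOHR_MAGNETON ** 2 / (PLANCK * r_m ** 3)
    return nu_hz / 1e6

def distance_nm(nu_mhz, g1=G_FREE, g2=G_FREE):
    """Invert the r^-3 law: inter-spin distance (nm) from a perpendicular dipolar frequency (MHz)."""
    return (dipolar_frequency_mhz(1.0, g1, g2) / nu_mhz) ** (1.0 / 3.0)

for r in (2.0, 3.0, 5.0):
    nu = dipolar_frequency_mhz(r)
    print(f"r = {r:.0f} nm: nu_dd = {nu:.2f} MHz, dipolar period = {1.0 / nu:.2f} µs")
```

The r⁻³ dependence means the dipolar oscillation slows sharply with distance: going from 2 nm to 5 nm lowers ν_dd by a factor of (5/2)³ ≈ 15.6, so longer distances require longer observation windows.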
Using a spin label previously developed in our group, both the protein and the DNA were successfully labeled, and distances were measured between protein sites, between DNA sites, and between protein and DNA sites. Comparison with the published crystal structure of DF bound to DNA with palindromic GATA sites showed that the solution structure of the complex measured by SDSL matches that of the crystallographic study. Furthermore, preliminary data were acquired on the DF conformation in the absence of DNA, as well as when bound to DNA containing a single GATA site. Further work is required to obtain detailed information on the independent binding modes of the two zinc fingers, as well as on their relative position and orientation when bound to separate DNA targets or unbound. This work lays the foundation for using SDSL to study multi-domain DNA-binding proteins and to further understand GATA protein function, in particular its possible role in inter-chromosomal looping.
2. Materials and Methods
2.1 Protein mutagenesis
To introduce a cysteine at an intended spin-labeling site, mutagenesis was carried out on the vector pET-28a (kanamycin resistant) encoding the DF (amino acids 260-370) fused with a 6× His-tag [20] (sequence shown in Supporting Information (SI) Figure S1). Two amino acids (threonine 280 on the N-finger and isoleucine 362 on the C-tail) were mutated to cysteine, either individually (T280 and I362 constructs) or simultaneously (double-mutant, or DM construct), for subsequent spin-label attachment. Site-directed mutagenesis was carried out using the QuikChange kit (Agilent Technologies, Santa Clara, CA) following the vendor's protocols. The sequences of the mutagenesis primers are listed in SI Table S1. Briefly, PCR reactions were first carried out to synthesize complementary strands containing the desired mutations.
The PCR products were then subjected to DpnI digestion to cleave the unmutated templates before transformation into ultra-competent cells for nick repair. Transformation followed a heat-shock protocol, using 3 µL of the DpnI-digested mutagenesis reaction and 50 µL of 10G or XL1-Blue ultra-competent cells. After careful mixing of the reaction with the competent cells, the mixture was incubated on ice for 30 min, heated at 42°C for 30 s, placed back on ice for 2 min, and then mixed with 0.5 mL of pre-warmed SOC or LB broth. The cells were shaken at 37°C and 225 rpm for 1 h and spread onto kanamycin agar plates. The plates were covered, air-dried, and incubated upside down at 37°C overnight. Colonies from the overnight incubation were selected and individually grown overnight in 5 mL of LB broth with kanamycin at 37°C with constant shaking at 225 rpm. Plasmids were recovered from each single-colony culture using a Miniprep kit (Qiagen, Valencia, CA), and their sequences were verified (Laragen, Inc., Culver City, CA). The mutated sequences and their alignment with the wild-type sequence are shown in SI Figure S2.
2.2 Protein expression and purification
Using the heat-shock protocol described above, plasmids with verified sequences were transformed into BL21(DE3)-Gold competent cells for protein expression. Typically, a transformation reaction used ~1 ng of plasmid and 10 µL of cells. The transformed cells were spread on an agar plate containing kanamycin (50 µg/mL) and chloramphenicol (34 µg/mL; stock stored in ethanol). Colonies were selected after overnight incubation and grown in 5 mL of 2×YT broth (16 g/L tryptone, 10 g/L yeast extract, and 5.0 g/L NaCl) containing kanamycin and chloramphenicol, with constant shaking (225 rpm) at 37°C. After overnight growth, the 5 mL culture was transferred into 1 L of 2×YT broth (with the same antibiotics) and shaken at 37°C and 225 rpm until the OD600 reached 0.6-0.8.
Protein expression was then induced by adding 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and 0.5 µM zinc acetate, and the induced cells were shaken for another 4 h at 25°C and 225 rpm. Cells were harvested by centrifugation at 7,000 rpm for 15 min. The cells from each liter of culture were resuspended in 38 mL of lysis buffer (50 mM HEPES, pH 7.5, 500 mM NaCl, and 5 mM 2-mercaptoethanol) with added protease inhibitors. The suspension was sonicated in cycles of 1 s on and 3 s off (90 s total on time) and then centrifuged at 18,000 rpm for 30 min to separate the lysate from the pellet. The lysate was mixed with 1 mL of Ni-NTA resin for 1 h at 4°C to allow the His-tagged fusion protein to bind the resin. The resin was then loaded onto a single-use column, washed with 8 mL of wash buffer (lysis buffer with 20 mM imidazole), and eluted with 6 mL of elution buffer (lysis buffer with 250 mM imidazole). The resin could be cleaned with 10 mL of cleaning buffer (lysis buffer with 500 mM imidazole) and stored at 4°C for future use. The Ni-NTA eluate was subsequently purified on a Mono S cation-exchange column (GE Healthcare Life Sciences, Piscataway, NJ) using buffer A (50 mM HEPES, pH 7.5, 0.5 µM zinc acetate, and 0.5 mM tris(2-carboxyethyl)phosphine (TCEP)) and buffer B (50 mM HEPES, pH 7.5, 1 M NaCl, 0.5 µM zinc acetate, and 0.5 mM TCEP). Before loading onto the Mono S column, the eluate was diluted with 10 mL of buffer A to lower its ionic strength. An example Mono S trace is shown in SI Figure S3. The Mono S fractions were run on a denaturing SDS-PAGE gel to identify the protein-containing fractions and to verify purity (see the example in SI Figure S4). The appropriate fractions were concentrated to the desired concentration (10-100 µM) using 10K filters, and the resulting protein was stored at -80°C with ~5% glycerol.
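The purpose of diluting the Ni-NTA eluate before the Mono S step can be made concrete with a simple mass balance. The Python sketch below assumes that the entire 6 mL eluate (lysis buffer, hence 500 mM NaCl, plus 250 mM imidazole) is mixed with the 10 mL of buffer A, which as listed contains no NaCl or imidazole; liquid retained by the resin is ignored, and the function name is illustrative only.

```python
def mixed_concentration_mM(c_sample_mM, v_sample_mL, v_diluent_mL, c_diluent_mM=0.0):
    """Concentration after mixing a sample with a diluent (simple mass balance)."""
    total_mL = v_sample_mL + v_diluent_mL
    return (c_sample_mM * v_sample_mL + c_diluent_mM * v_diluent_mL) / total_mL


# Assumption: the whole 6 mL eluate (500 mM NaCl, 250 mM imidazole) is diluted
# with 10 mL of buffer A, which contains neither NaCl nor imidazole.
nacl_mM = mixed_concentration_mM(500.0, 6.0, 10.0)
imidazole_mM = mixed_concentration_mM(250.0, 6.0, 10.0)
print(f"NaCl after dilution: {nacl_mM:.1f} mM")           # 187.5 mM
print(f"Imidazole after dilution: {imidazole_mM:.2f} mM")  # 93.75 mM
```

Under these assumptions the NaCl concentration drops about 2.7-fold, from 500 mM to 187.5 mM, which is the reduction in ionic strength the dilution step is meant to achieve before cation-exchange loading.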
Protein concentrations were estimated from the absorbance at 280 nm using an extinction coefficient of 16,960 L/(mol·cm) for the reduced protein. Control studies using the Bradford assay showed that concentrations determined from absorbance are accurate to within 20%.

2.3 Spin labeling of proteins

The R5 label attaches to a cysteine through a thioether linkage (Figure 2A). Each labeling reaction was carried out in a solution containing 50 mM HEPES (pH 7.5), 500 mM NaCl, 0.5 µM zinc acetate, 0.5 mM TCEP, 0.5 mM R5 precursor (prepared as reported previously [26]), and 10-50 µM protein. The reaction mixture was incubated overnight at 4°C in the dark with constant shaking. After incubation, unreacted R5 precursor was removed using the Mono S column, and the purified labeled protein fractions were concentrated to the desired volume. The protein spin-labeling efficiency was estimated by spin counting. Briefly, continuous-wave (cw) EPR spectra of a series of solutions of the stable radical TEMPOL with known concentrations were acquired to generate a calibration curve relating the double integral of each spectrum to the TEMPOL concentration. A cw-EPR spectrum of the spin-labeled protein was then acquired under identical instrument settings, and the spin concentration of the protein sample was determined from the double integral of its spectrum using the calibration curve. The labeling efficiency was then computed as the ratio of the EPR-measured spin concentration to the protein concentration determined by absorbance.

Figure 2: (A) R5 labeling on protein. The four torsion angles about which the nitroxide rotates are marked with red arrows. (B) R5 labeling on the DNA backbone. The three torsion angles about which the nitroxide rotates are marked with red arrows. (C) DNA sequences used in this study. The palGATA duplex contains two GATA recognition sites separated by 1 base pair, and the sgGATA duplex contains a single GATA recognition site.
The GATA binding sites are shown in bold, and all spin-labeled sites used in this study are marked with asterisks at the labeled phosphorothioates.

2.4 Spin labeling of DNAs

The R5 label was attached to DNA strands following previously published protocols (Figure 2B) [26]. Briefly, all DNA oligonucleotides were prepared by solid-phase chemical synthesis (Integrated DNA Technologies, Coralville, IA, USA), with the phosphate targeted for spin labeling replaced by a phosphorothioate during synthesis. The R5 precursor was then reacted with the phosphorothioate group in a solution containing ~0.5 mM crude DNA oligonucleotide, 200 mM R5 precursor, 0.1 M 2-(N-morpholino)ethanesulfonic acid (MES) (pH 5.8), and 40% (v/v) acetonitrile. The reaction mixture was incubated in the dark at room temperature for 24 h with constant shaking. Labeled DNA was purified by anion-exchange high-pressure liquid chromatography, followed by desalting on a reverse-phase column. Desalted oligonucleotides were then lyophilized, resuspended in water, and stored at -20°C. Two DNA duplexes were used in this study, containing either a palindromic GATA site (palGATA) or a single GATA site (sgGATA), as shown in Figure 2C. The final concentration of each DNA strand was determined from the absorbance at 260 nm using the extinction coefficients listed in SI Table S2.

2.5 EPR sample assembly

DNA duplex stocks were prepared following a previously reported annealing protocol [26]. In summary, a duplex stock was prepared by annealing the R5-labeled strand with its complementary strand at a 1:1.1 ratio, or by annealing two R5-labeled strands at a 1:1 ratio. After mixing appropriate amounts of the respective strands, the mixture was heated at 95°C for 1 min and then cooled at room temperature for 1 min. Buffer and salt were then added to final concentrations of 50 mM Tris-HCl (tris(hydroxymethyl)aminomethane hydrochloride, pH 7.5) and 100 mM NaCl.
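Both the A280-based protein concentrations (Section 2.2) and the A260-based strand concentrations rest on the Beer-Lambert law, and the strand concentrations then set the volumes mixed for a 1:1.1 annealing. A minimal Python sketch follows; the DF extinction coefficient is the value given above, whereas the absorbance reading, stock concentrations, and amount of labeled strand are hypothetical, and the per-strand A260 coefficients of SI Table S2 are not reproduced here.

```python
def conc_from_absorbance_uM(absorbance, epsilon_per_M_cm, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), returned in micromolar."""
    return absorbance / (epsilon_per_M_cm * path_cm) * 1e6


def annealing_volumes_uL(n_labeled_nmol, c_labeled_uM, c_partner_uM, partner_ratio=1.1):
    """Stock volumes for annealing a labeled strand with its partner at 1:partner_ratio.

    1 uM = 1 nmol/mL = 0.001 nmol/uL, so volume (uL) = amount (nmol) * 1000 / c (uM).
    """
    v_labeled = n_labeled_nmol * 1000.0 / c_labeled_uM
    v_partner = n_labeled_nmol * partner_ratio * 1000.0 / c_partner_uM
    return v_labeled, v_partner


# DF: extinction coefficient from the text; A280 = 0.50 in a 1 cm cell is a hypothetical reading.
print(f"DF: {conc_from_absorbance_uM(0.50, 16960):.1f} uM")
# Hypothetical 100 uM labeled-strand and 120 uM complement stocks, 5 nmol of labeled strand.
v_lab, v_comp = annealing_volumes_uL(5.0, 100.0, 120.0)
print(f"labeled strand: {v_lab:.1f} uL, complement: {v_comp:.1f} uL")
```

The same conversion applies to each DNA strand once its own 260 nm extinction coefficient is substituted.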
The solution was left at room temperature for >1 h to allow duplex formation. To assemble a DF/DNA complex, appropriate amounts of DNA duplex and protein, set by the desired molar ratio, were mixed in a high-salt solution containing 50 mM HEPES, pH 7.5, 500 mM NaCl, 0.5 µM zinc acetate, and 0.5 mM TCEP, with the protein concentration ranging from 10 to 50 µM and a total volume of at least 500 µL. When spin labels are attached only to protein sites or only to DNA sites, the unlabeled species can be added in excess to saturate the spin-labeled species and maximize the EPR signal. However, when spin labels are attached to both protein and DNA sites, the protein:DNA molar ratio must be strictly controlled so that neither species is in large excess; otherwise, the excess spin-labeled molecules that lack a labeled partner contribute to the echo but not to the dipolar modulation, lowering the DEER modulation depth. The solution was transferred into dialysis tubing and dialyzed overnight at 4°C against a low-salt solution containing 50 mM HEPES, pH 7.5, 200 mM NaCl, 0.5 µM zinc acetate, and 0.5 mM TCEP, to slowly lower the salt concentration and allow DNA/protein binding. Note that salt concentrations above 300 mM should be avoided during complex formation, since high ionic strength weakens the electrostatic interactions between protein and DNA. The dialyzed complex was concentrated to the desired volume, with a complex concentration of 200-300 µM, using 10K filters.

2.6 Continuous-wave (cw) EPR measurement of spin-labeled protein

Each cw-EPR spectrum was obtained using ~5 µL of sample loaded into a round glass capillary (0.6 mm i.d. × 0.8 mm o.d., Vitrocom, Inc., Mountain Lakes, NJ) sealed at one end. X-band cw-EPR spectra were acquired at room temperature on a Bruker EMX spectrometer using an ER4119HS cavity. The incident microwave power was 2 mW, and the field modulation amplitude was 1 G at a frequency of 100 kHz. Each spectrum was acquired with 512 points over a 100 G field range.
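Spin counting (Section 2.3) and the normalization to equal spin number used for the cw-EPR spectra both rely on the double integral of the first-derivative spectrum, which is proportional to the number of spins. The Python/NumPy sketch below uses the 512-point, 100 G field axis given above; the center field, the single Gaussian derivative lines standing in for real three-line nitroxide spectra, the TEMPOL and protein values, and the straight-line calibration through the origin are all illustrative assumptions, and baseline correction is omitted.

```python
import numpy as np

# 512 points over 100 G, as acquired; the 3460-3560 G window is a hypothetical center.
FIELD_G = np.linspace(3460.0, 3560.0, 512)


def double_integral(deriv_spectrum, field_g=FIELD_G):
    """Integrate a first-derivative spectrum twice (rectangle rule, no baseline correction)."""
    step = field_g[1] - field_g[0]
    absorption = np.cumsum(deriv_spectrum) * step   # first integral: absorption lineshape
    return float(np.sum(absorption) * step)         # second integral: proportional to spins


def toy_derivative_line(n_spins, width_g, center_g=3510.0, field_g=FIELD_G):
    """Hypothetical Gaussian derivative line whose absorption area equals n_spins."""
    gauss = n_spins / (width_g * np.sqrt(2 * np.pi)) * np.exp(
        -((field_g - center_g) ** 2) / (2 * width_g**2))
    return -(field_g - center_g) / width_g**2 * gauss


def calibration_slope(known_uM, integrals):
    """Least-squares slope through the origin: concentration = slope * double integral."""
    known, di = np.asarray(known_uM), np.asarray(integrals)
    return float(np.sum(known * di) / np.sum(di**2))


def labeling_efficiency(sample_integral, slope, protein_uM):
    """EPR spin concentration divided by the absorbance-based protein concentration."""
    return slope * sample_integral / protein_uM


def normalize_to_spins(deriv_spectrum, field_g=FIELD_G):
    """Scale a spectrum so that its double integral is 1 (same number of spins)."""
    return deriv_spectrum / double_integral(deriv_spectrum, field_g)


tempol_uM = np.array([25.0, 50.0, 100.0, 200.0])
tempol_di = [double_integral(toy_derivative_line(0.01 * c, width_g=3.0)) for c in tempol_uM]
slope = calibration_slope(tempol_uM, tempol_di)
protein_spectrum = toy_derivative_line(0.01 * 37.5, width_g=6.0)  # broader line, 37.5 uM spins
efficiency = labeling_efficiency(double_integral(protein_spectrum), slope, protein_uM=50.0)
print(f"labeling efficiency ~ {efficiency:.2f}")
```

With these synthetic values the calibrated spin concentration is 37.5 µM for 50 µM protein, i.e. a labeling efficiency of about 0.75 even though the protein line is broader than the TEMPOL lines; with real spectra the accuracy depends on baseline correction and on a field window wide enough to capture the whole spectrum.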
Following acquisition, each spectrum was averaged and normalized to the same number of spins for comparison.

2.7 DEER spectroscopy measurement of inter-spin distances

Each final DEER sample contained 50 mM HEPES, pH 7.5, 200 mM NaCl, 0.5 µM zinc acetate, and 40% (v/v) glycerol, in a total volume of at least 25 µL. The spin concentration in each sample was ~200 µM, as estimated from the protein concentration and labeling efficiency. Samples were loaded into round glass capillaries (2.0 mm i.d. × 2.4 mm o.d., Vitrocom, Inc., Mountain Lakes, NJ) sealed at one end, flash-frozen in liquid nitrogen, and used immediately for DEER measurements. DEER measurements were carried out at 78 K on a Bruker ELEXSYS E580 X-band spectrometer equipped with an MD4 resonator, following a previously reported procedure and parameters [26]. Briefly, a dead-time-free four-pulse scheme [27] was used, with the pump pulse frequency set at the center of the nitroxide spectrum and the observer frequency approximately 70 MHz higher. The observer π pulse was 32 ns, and the pump π pulse was usually set to 16 ns. The accumulation time for each measurement ranged from 12 to 24 h, with 512 shots per point. Inter-spin distances were computed from the resulting dipolar evolution data using DeerAnalysis2013 with Tikhonov regularization [28]. Before analysis, each trace was prepared by optimizing the background, zero time, phase, and cutoff, either manually or automatically. Start and end distances for the analysis were specified, and Tikhonov regularization with L-curve selection of the regularization parameter was used to simulate the data. The output contains the background-corrected trace, the simulated trace, and the distance distribution used to construct the simulated trace. From each distance distribution, the average distance was calculated as reported previously [29].
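DeerAnalysis performs the inversion internally; the steps named above (a dipolar kernel, Tikhonov regularization, an L-curve, and an average distance taken from the distribution) can be sketched in Python/NumPy as follows. This is a simplified illustration, not the DeerAnalysis implementation: it assumes a background-corrected, noise-free form factor with full modulation depth, the commonly used nitroxide dipolar constant of 52.04 MHz·nm³, a second-derivative smoothness penalty, and clipping of negative values in place of a proper non-negativity-constrained solver. The average distance is taken as the probability-weighted mean, which may differ in detail from the procedure of [29].

```python
import numpy as np

D_MHZ_NM3 = 52.04  # commonly used dipolar constant for two nitroxides, MHz * nm^3


def dipolar_kernel(t_us, r_nm, n_x=1000):
    """Powder-averaged kernel K[i, j] = mean over x of cos(2*pi*nu_dd(r_j)*(3x^2 - 1)*t_i)."""
    x = (np.arange(n_x) + 0.5) / n_x          # midpoint grid for x = cos(theta) on [0, 1]
    angular = 3.0 * x**2 - 1.0
    nu_dd = D_MHZ_NM3 / r_nm**3               # dipolar frequency for each distance, MHz
    kernel = np.empty((t_us.size, r_nm.size))
    for i, t in enumerate(t_us):              # MHz * us gives cycles
        kernel[i] = np.cos(2.0 * np.pi * np.outer(nu_dd, angular) * t).mean(axis=1)
    return kernel


def second_difference(n):
    """Smoothness operator L: discrete second derivative of the distribution."""
    return np.diff(np.eye(n), n=2, axis=0)


def solve_tikhonov(kernel, signal, alpha, L):
    """Unconstrained minimizer of ||K p - s||^2 + alpha^2 ||L p||^2 (augmented least squares)."""
    A = np.vstack([kernel, alpha * L])
    b = np.concatenate([signal, np.zeros(L.shape[0])])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p


def distance_distribution(kernel, signal, alpha, L):
    """Clip negatives (crude stand-in for a non-negativity constraint) and normalize to sum 1."""
    p = np.clip(solve_tikhonov(kernel, signal, alpha, L), 0.0, None)
    return p / p.sum()


def l_curve_points(kernel, signal, alphas, L):
    """(log residual norm, log roughness norm) per alpha; the 'turn' marks the compromise."""
    points = []
    for alpha in alphas:
        p = solve_tikhonov(kernel, signal, alpha, L)
        points.append((np.log(np.linalg.norm(kernel @ p - signal)),
                       np.log(np.linalg.norm(L @ p))))
    return points


def mean_and_width(r_nm, p):
    """Probability-weighted mean distance and standard deviation."""
    w = np.asarray(p) / np.sum(p)
    mean = float(np.sum(w * r_nm))
    return mean, float(np.sqrt(np.sum(w * (r_nm - mean) ** 2)))


# Synthetic, noise-free, background-corrected trace from a Gaussian at 3.2 nm (sigma 0.15 nm).
t = np.linspace(0.0, 2.0, 201)                # 2 us evolution window
r = np.linspace(2.0, 6.0, 81)                 # distance grid, nm
K = dipolar_kernel(t, r)
p_true = np.exp(-((r - 3.2) ** 2) / (2 * 0.15**2))
p_true /= p_true.sum()
s = K @ p_true
L = second_difference(r.size)
p_fit = distance_distribution(K, s, alpha=1.0, L=L)
print("fitted mean, width (nm):", mean_and_width(r, p_fit))
```

In practice the regularization parameter would be chosen at the corner of the L-curve (for example from the points returned by `l_curve_points`) rather than fixed at 1.0, and real traces also require the background, zero-time, and phase corrections described above.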
2.8 Computation of predicted inter-spin distances based on crystal structures

Inter-spin distances in the protein/DNA complex were predicted using ALLNOX, an updated version of NASNOX (manuscript in preparation). Briefly, R5 was modeled onto the target labeling sites, which can be on either the DNA or the protein. For R5 attached to protein via a cysteine, four torsion angles are varied to allow internal motion of R5 relative to the protein backbone (Figure 2A), whereas three torsion angles are varied for R5 attached to DNA (Figure 2B). All possible R5 conformers at each site were searched by setting each torsion angle to 60°, 180°, or 300°. The ensemble of allowable R5 conformers, defined as those without steric collision with the parent molecule, was then selected and saved. Pairwise distances were calculated between the ensembles at the two labeling sites, and the resulting set of distances was used to compute the average distance and the standard deviation of the distribution (representing its width). Note that for DNA, the ensemble of allowable conformers at a given site included R5 modeled onto both the Rp and the Sp diastereomers.

2.9 Electrophoretic mobility shift assay

An electrophoretic mobility shift assay (EMSA) was carried out to examine protein/DNA complex formation. Each sample was prepared by adding protein directly to DNA and incubating the mixture on ice for 30 min. A loading dye for native gels was prepared from low-salt buffer (50 mM HEPES, pH 7.5, 200 mM NaCl, 0.5 µM zinc acetate, and 0.5 mM TCEP) and glycerol. An appropriate amount of loading dye was mixed with each sample right before loading, to a final glycerol concentration of 5% (v/v). Samples were run on native 6% (w/v) polyacrylamide gels at 4°C in 0.5× Tris-borate-EDTA buffer for 1.5 h, followed by staining with ethidium bromide and visualization by phosphorimaging.

3. Results

3.1 Biochemical characterization of spin-labeled GATA3 double-finger (DF)

Two sites on DF were selected for labeling: T280, located on the N-finger, and I362, located on the C-tail (Figure 1A, 3A). Examination of the crystal structure of the DF/palGATA complex [20] showed that both sites are located far from the protein/DNA interface, so spin labels at these sites should have minimal effect on the protein/DNA interaction. Both sites are also solvent accessible, allowing efficient spin labeling. In addition, the distance between these two residues predicted from the complex 1 crystal structure (Figure 1B) for the protein bound to palGATA (3.16 nm) falls within the ideal range for DEER measurement. Four DF constructs were used in this study: wild type (wt), T280, I362, and double mutant (DM). The site number in a construct name indicates the residue mutated to cysteine for spin labeling, and DM indicates that both T280 and I362 were mutated to cysteine. After mutagenesis, protein expression, and spin labeling, protein activity was first tested by EMSA for binding to DNA containing palindromic GATA sites (palGATA). As shown in Figure 3A, palGATA is completely bound by wt DF at a 1:1 molar ratio, indicating that the correct protein was expressed and that it retains DNA-binding activity. For the T280 and DM constructs, binding to palGATA was tested with R5-labeled protein. At a 1:1 ratio, palGATA was completely bound by the R5-labeled protein (Figure 3B, 3C), demonstrating that, compared with wt, mutation and spin labeling did not interfere with zinc-finger formation or palGATA binding at the concentrations tested here.

Figure 3: Electrophoretic mobility shift assay to assess DNA binding by DF.
Two bands with different mobility were observed in each gel: the upper band corresponds to the palGATA/protein complex, and the lower band to free palGATA. Protein used in each panel: (A) wild type DF; (B) R5-labeled T280; (C) R5-labeled DM. The palGATA duplex concentration was 10 µM in panels (A) and (B) and 5 µM in panel (C), with the protein concentration varied according to its molar ratio to DNA.

In addition to the EMSA experiments, cw-EPR spectroscopy was carried out to assess spin labeling of the GATA3 fragments. All four protein constructs (wt, T280, I362, and DM) were labeled following exactly the same procedure. For the wt construct, the cw-EPR spectrum had very poor S/N, and the labeling efficiency was estimated to be ~6% (Figure 4A). Compared with R5-labeled T280 at a similar concentration (50-60 µM), the wt signal intensity was significantly lower, indicating a much lower spin concentration. cw-EPR spectra were also acquired for the protein constructs with exogenously introduced cysteines (Figure 4B, 4C, 4D, black traces). Spin counting of these samples yielded labeling efficiencies of up to 75%, which is sufficient for DEER measurements. Because wt DF contains 8 native cysteines buried in the zinc fingers and coordinating zinc ions (Figure 1A), background labeling of these cysteines could reduce the labeling specificity of the mutant proteins and thus affect the EPR results. The low (~6%) labeling efficiency of wt DF indicates that the spin labels attached only minimally to the native cysteines and, consistent with the EMSA results, did not interfere with the structure and function of the zinc fingers.

Figure 4: Continuous-wave EPR spectra of R5-labeled DF obtained in aqueous solution at room temperature. (A) Wild-type DF construct compared to the T280 construct at similar concentrations, without normalization.
(B) T280, (C) I362, and (D) DM constructs: spectra compared before and after binding to palGATA. In (B), (C), and (D), all spectra were normalized to the same number of spins to allow better comparison of lineshapes.

Complex formation between palGATA and R5-labeled protein was also examined by cw-EPR. When the spin-labeled protein binds DNA, the molecular size of the complex increases, which slows its global tumbling and thus broadens the cw-EPR lineshape. Binding was tested individually for each protein construct except wt, as shown in Figure 4B, 4C, and 4D (red traces). When mixed with palGATA at a 1:1 molar ratio, all protein constructs gave broader spectra than the protein alone, indicating formation of a larger complex upon protein binding to DNA.

Combining the EMSA and cw-EPR results, we conclude that the mutations and spin labeling do not affect protein binding to palGATA under the conditions tested in this work, and that the EPR signal originating from background labeling of the intrinsic cysteines is negligible. The spin-labeled protein can therefore be used to obtain conformational information on the protein in the absence or presence of its DNA target.

3.2 Conformation of DF in the absence of GATA DNA

The distance between the two spin-labeled sites on the DM construct, T280 and I362, was measured in the absence of DNA. As shown in Figure 5, a clear decay was observed in the echo decay data, indicating that dipolar interactions are present between the two spin labels. Thus, in the free protein, T280 and I362 lie close enough together for detectable dipolar interaction between the two R5 labels. However, the decaying trace shows no oscillation, indicating the lack of a well-defined distance between the two labels.
A good fit was achieved for this measurement, as shown in the middle panel, with a well-defined L-curve; the regularization parameter at the “turn” of the L-curve was selected to obtain the distance distribution. The L-curve trades off the accuracy of the fit to the trace against the smoothness of the distance distribution, and the regularization parameter at the “turn” represents the best compromise between the two. However, close examination of the distance distribution (right panel) reveals no single dominant distance population. Instead, multiple closely spaced populations were observed, ranging from 2.0 to 6.0 nm. The absence of oscillation made background selection more difficult during data analysis, since the background decay is harder to distinguish from the dipolar decay. Different background corrections and zero-time selections were therefore applied to the original decay data, and all of them yielded multiple distance peaks, with slightly different distance distributions. The multiple distances observed are thus not an artifact of ambiguity in background or zero-time selection, but are intrinsic to the system itself.

Figure 5: DEER measurement and analysis on the double-labeled free protein. A pair of R5 labels was attached to T280 and I362. From left to right: normalized original echo decay data (black) with background decay (red); background-corrected echo decay (black) compared to the simulated echo decay (red); distance distribution corresponding to the simulated echo decay. Inset: L-curve with the selected regularization parameter circled in red.

This result is not surprising given what is already known about the protein. The long, highly flexible linker between the two zinc fingers allows relatively free motion of the fingers relative to each other.
The two fingers are therefore likely to adopt an ensemble of different relative orientations, resulting in different distances between the two labeled sites. Consequently, multiple distances are expected between spin labels at these two sites, reflecting the conformational variability of the parent molecule.

3.3 Solution conformation of DF bound to DNA containing palindromic target sites

Crystal structures of DF bound to several DNA targets have been published [20] (Figure 1B), with slightly different DNA sequences leading to different complex configurations. SDSL studies were first carried out on the complex formed by DF and a DNA duplex containing palindromic GATA sites separated by 1 bp (designated palGATA) (Figure 6A), which is the same DNA used to crystallize complex 1 (Figure 1B, left panel).

Figure 6: (A) Crystal structure of complex 1 formed by DF and palGATA. The N-finger and linker regions are colored green, and the C-finger and C-tail regions are colored magenta. The two protein labeling sites, T280 and I362, are shown in red. The DNA duplex is shown in yellow, with the two labeling sites, palX15 and palY15, colored black. The clusters of blue dots near each labeling site indicate the locations of the N atom for each allowable nitroxide conformer obtained from ALLNOX modeling. (B) DEER data on the complex formed between DF and palGATA, with R5 labels attached to T280 and I362 of the protein. The distance measured by DEER and the distance predicted from the crystal structure are listed on the right.

A complex was first assembled using double R5-labeled DM and unlabeled palGATA, and the inter-R5 distance between T280 and I362 was measured by DEER (Materials and Methods; Figure 6B). In this data set, the original echo decay trace clearly shows decay with oscillations, indicating a narrowly defined distance population.
A good fit was obtained, with an L-curve showing a clear “turn”. Three populations were observed in the distance distribution, with the narrow one in the middle being the major population. This population yielded an average distance of 3.2 nm. Using ALLNOX, the expected distance was computed to be 3.16 nm based on the complex 1 structure. The measured and predicted distances differ by only 0.04 nm, indicating that the DEER-measured distance matches the distance predicted from the crystal structure very well. Of the other two populations in the T280-I362 distance distribution (Figure 6B, right panel), the short-distance population at <20 Å likely results from a very fast decay at the beginning of the DEER trace. However, it lies below the reliable DEER measurement range and accounts for only a small percentage of the total population, so it was excluded when calculating the final average distance. Another small population was observed at ~4.6 nm, but its fraction is also negligible compared with the other populations. This T280-I362 result supports the crystal structure in which DF wraps around the palGATA duplex, allowing both fingers to interact extensively with the DNA and thus giving a more defined relative conformation of the two fingers. To further test this conclusion, three additional distances were measured between DNA and protein (Table 1, Figure 7A). Two sites on palGATA, palX15 and palY15, were selected for these measurements; both have expected distances to the protein sites within the DEER measurement range and do not interfere with protein binding (Figure 2C, 6A). Furthermore, one distance was measured between the two spin-labeled DNA sites (palX15 and palY15) in the complex (Figure 7B).
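The ALLNOX prediction quoted above (Section 2.8) enumerates R5 conformers over discrete torsion states, discards those that collide with the parent molecule, and summarizes all pairwise inter-ensemble distances by their mean and standard deviation. ALLNOX itself is unpublished, so the Python/NumPy sketch below is only a toy illustration of that logic: the linker is a uniform chain (hypothetical 1.5 Å bonds and 110° angles) built with the standard NeRF construction, its terminal atom stands in for the nitroxide N, the anchor and parent-atom coordinates and the 3.0 Å clash cutoff are invented, and the Rp/Sp diastereomer averaging used for DNA sites is not modeled.

```python
import itertools

import numpy as np

TORSION_STATES = (60.0, 180.0, 300.0)  # torsion values sampled, as described in Section 2.8
BOND_A = 1.5                           # hypothetical uniform bond length, Angstrom
ANGLE_DEG = 110.0                      # hypothetical uniform bond angle, degrees
CLASH_A = 3.0                          # hypothetical steric-clash cutoff, Angstrom


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF construction: new atom d bonded to c, with angle b-c-d and torsion a-b-c-d."""
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    ang, tor = np.radians(angle_deg), np.radians(torsion_deg)
    return (c - bond * np.cos(ang) * bc
            + bond * np.sin(ang) * np.cos(tor) * m
            + bond * np.sin(ang) * np.sin(tor) * n)


def conformer_ensemble(anchors, n_torsions, parent_atoms):
    """Enumerate every torsion combination; keep the terminal atom of clash-free conformers."""
    kept = []
    for torsions in itertools.product(TORSION_STATES, repeat=n_torsions):
        chain = [np.asarray(p, dtype=float) for p in anchors]
        for tor in torsions:
            chain.append(place_atom(chain[-3], chain[-2], chain[-1], BOND_A, ANGLE_DEG, tor))
        new_atoms = np.array(chain[3:])
        gaps = np.linalg.norm(new_atoms[:, None, :] - parent_atoms[None, :, :], axis=2)
        if gaps.min() > CLASH_A:
            kept.append(chain[-1])  # terminal atom stands in for the nitroxide N
    return np.array(kept)


def distance_stats(ens1, ens2):
    """Mean and standard deviation over all pairwise distances between two ensembles."""
    d = np.linalg.norm(ens1[:, None, :] - ens2[None, :, :], axis=2).ravel()
    return float(d.mean()), float(d.std())


# Hypothetical toy sites 30 Angstrom apart, each next to a row of "parent" atoms.
anchors_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0]])
anchors_b = anchors_a + np.array([30.0, 0.0, 0.0])
parent_a = np.array([[x, -6.0, 0.0] for x in range(-4, 9, 2)], dtype=float)
parent_b = parent_a + np.array([30.0, 0.0, 0.0])

ens_a = conformer_ensemble(anchors_a, 4, parent_a)  # protein-like site: 4 torsions, 81 trials
ens_b = conformer_ensemble(anchors_b, 3, parent_b)  # DNA-like site: 3 torsions, 27 trials
mean_ang, sd_ang = distance_stats(ens_a, ens_b)
print(f"kept {len(ens_a)}/81 and {len(ens_b)}/27; "
      f"mean {mean_ang / 10:.2f} nm, sd {sd_ang / 10:.2f} nm")
```

The standard deviation returned here plays the role of the distribution width reported alongside each predicted distance.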
As shown in Figure 7, satisfactory S/N was achieved for all protein-DNA distance measurements, and well-defined distance populations were obtained. All measured protein-DNA distances were within 0.3 nm of the corresponding distances predicted from the crystal structure (Table 1, Figure 7). We have previously reported that DEER measurements using the R5 label on DNA have a variability of 0.2 nm [30]. Here, a couple of the measured distances deviate from the predicted distances by slightly more than this range. Several factors could contribute to this variation. First, R5 attached to a cysteine has 4 variable torsion angles, compared with 3 for R5 attached to a DNA phosphorothioate (Figure 2A, 2B). Protein spin labeling therefore introduces more variation in spin-label conformers, which could widen the variation range of the measurement itself. In addition, no replicate measurements were carried out on the current data sets, so it is unknown whether repeated measurements and analyses would fall within the same variation range.

Table 1: Summary of distances measured on the complex formed by DF and palGATA. Distances predicted from the crystal structure and the differences between prediction and DEER measurement are listed in the corresponding columns.

Figure 7: Distances measured between the protein and the DNA in the complex formed between DF and palGATA. The labeling sites are indicated on the left, and the corresponding predicted distances from complex 1 are listed on the right. (A) Three distances measured between protein and DNA sites in the complex. (B) One distance measured between two DNA sites in the complex with wt DF.
However, given the relatively small differences between all five distances measured by DEER in frozen solution and those predicted from the crystal structure, it can be concluded that the global structure of the DF/palGATA complex in solution is consistent with the crystal structure. In addition, the analyses suggest that distances predicted by ALLNOX are sufficiently accurate for analyzing protein/DNA complex conformations using the R5 label.

3.4 C-finger binds to a single GATA sequence in a similar fashion as to palGATA

Among the three published crystal structures of complexes formed by DF and DNA with composite GATA sites, it is interesting that in two complexes (complex 2 and complex 3) one DF binds two DNA duplexes. This binding mode may exemplify how GATA proteins bridge DNA segments that are far apart or on different chromosomes. The recognition mode of DF binding two GATA sites located farther apart in solution therefore became a question of interest. One possible model for two distant GATA sites is two separate DNA duplexes, each containing a single GATA site. A second DNA construct was therefore designed with only one possible GATA protein recognition site, designated sgGATA (Figure 2C). It has been reported previously that in the GATA family of transcription factors, both fingers can bind DNA independently [11, 12]. A slight difference in sequence preference has been reported between the two fingers, with the C-finger preferring WGATAR (W = A/T; R = A/G) and the N-finger favoring GATC [2, 13]. sgGATA is therefore expected to be favored by the C-finger. Consequently, when sgGATA and DF are mixed at equal molar concentrations, the C-finger is expected to first recognize the GATA site on sgGATA and bind to it specifically.
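The reported finger preferences can be written as IUPAC patterns, where W stands for A or T and R for A or G. A short Python sketch that scans both strands of a duplex for such a motif is given below; the sequences are hypothetical examples, not the palGATA or sgGATA sequences shown in Figure 2C, and the helper names are illustrative.

```python
import re

# IUPAC nucleotide codes needed for the motifs discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Translate an IUPAC motif such as WGATAR into a compiled regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in motif))


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motif):
    """Return (strand, start) for every match on either strand; start is 0-based on the top strand."""
    pattern, k, rc = motif_regex(motif), len(motif), revcomp(seq)
    hits = []
    for i in range(len(seq) - k + 1):
        if pattern.fullmatch(seq, i, i + k):
            hits.append(("+", i))
        if pattern.fullmatch(rc, i, i + k):
            hits.append(("-", len(seq) - i - k))  # map bottom-strand start to top-strand coordinates
    return hits


# Hypothetical top-strand sequences (not the palGATA/sgGATA sequences of Figure 2C).
print(find_sites("CGCAGATAAGCG", "WGATAR"))  # [('+', 3)]
print(find_sites("CTTATCTG", "WGATAR"))      # [('-', 1)]
print(find_sites("CGCAGATAAGCG", "GATC"))    # []
```

Because GATC is its own reverse complement, a GATC site is reported once on each strand at the same top-strand position, whereas a non-palindromic WGATAR site appears on only one strand.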
To test this hypothesis, two distances were measured on a double R5-labeled DF/sgGATA complex assembled at a 1:1 molar ratio (Figure 8). For the DEER measurement between I362 on the C-finger and sgX12 on the DNA (Figure 8B), an oscillation was observed in the echo decay trace, and an excellent fit was achieved. The resulting distance distribution shows one dominant distance population centered at 2.9 nm. sgX12, located between T and A of the GATA site, is the equivalent of palX15 in complex 1 with respect to C-finger binding (Figure 2C, 8A). The distance measured here is identical to that measured between I362 and palX15, and differs by <0.2 nm from the distance predicted at the equivalent sites in the DF/palGATA crystal structure (Figure 8A). This indicates that the local structure of C-finger/sgGATA matches that of C-finger/palGATA. Another site on sgGATA, sgY18, was also selected for distance measurement with I362. The DEER signal measured between sgY18 and I362 showed no obvious oscillation, yet a good fit was still achieved, with the L-curve showing a clear turn (Figure 8C). The distance distribution contains one major population with a minor right shoulder (Figure 8C), indicating two closely spaced distance populations. However, the shoulder represents a much smaller occupancy than the major peak, so only the major peak, centered at 3.5 nm, is considered. sgY18, located 8 bp from sgX12 on the complementary strand, is equivalent to palY17 for C-finger binding on palGATA (Figure 2C, 8A). The corresponding distance expected from the complex 1 crystal structure is 3.46 nm (Figure 8A), again very close to the measured distance, supporting a match between the conformations of the C-finger bound to sgGATA and to palGATA.

Figure 8: C-finger binding to sgGATA. (A) ALLNOX modeling.
To mimic C-finger binding to one GATA site, the input model was the complex 1 crystal structure with the N-finger and the linker of GATA3 deleted to avoid extra contacts with nitroxide conformers. palX15 and palY17 were selected as the equivalent sites for sgX12 and sgY18, respectively. All other coloring and labeling schemes are the same as in Figure 6(A). (B) DEER result on I362-sgX12. (C) DEER result on I362-sgY18.

Multiple crystal and NMR structures of the C-finger bound locally to GATA DNA have been published [10, 16, 18, 20]. Although the DNA sequences and protein constructs used vary among these structures, the interface remains highly conserved. Given this high conservation of the C-finger/DNA interface, and the close agreement between the two distances measured on C-finger/sgGATA and those predicted at equivalent sites in the C-finger/palGATA crystal structure, it is highly likely that in solution the C-finger in the DF context binds sgGATA in the same fashion as it binds palGATA.

3.5 N-finger fragment does not bind to DNA containing a single GATA site in a specific conformation in the presence of the C-finger

Although no crystal structure of the isolated N-finger bound to DNA has been published, the N-finger has been reported to bind DNA independently [12, 13], although it favors a slightly different target sequence (GATC, compared with GATA for the C-finger). Thus, when DF is mixed with sgGATA at an equal molar ratio, sgGATA is likely bound specifically by the C-finger, while the behavior of the N-finger remains unknown. Distances were therefore measured between T280 on the N-finger of DF and sgGATA sites to investigate the position of the N-finger relative to the DNA. Measurements were first carried out with DF bound to sgGATA at a 1:1 molar ratio (Figure 9).
For the DEER trace measured between T280 and sgX12, a clear decay was still observed (Figure 9A, left panel), indicating dipolar interaction between T280 on the N-finger and sgX12. The simulated trace, however, does not reproduce all features of the original trace, particularly in the initial region (Figure 9A, middle panel). The distance distribution contains multiple distances ranging from <2.0 nm to ~5 nm. As mentioned before, the shortest distance (<2.0 nm) likely originates from the ESEEM effect or a small fraction of protein aggregates, and lies below the ideal range for DEER measurement. Among the multiple populations between 2.0 nm and 5.0 nm, the one centered at 4.5 nm is the most dominant. However, the S/N of the original decay trace is far from satisfactory. In particular, the ESEEM effect is quite pronounced, as shown by the ultra-fast oscillation in the echo decay trace. The poor S/N and the ESEEM effect hindered zero-time and background selection in the subsequent analysis. During analysis, the background and zero-time selections were varied, yet multiple distance populations were observed in all trials, although the exact location and percentage of each population varied. This is one reason why confidence in the observed distance distribution is low. In addition, given the S/N of this data set, the evolution time used (2 µs) is hardly sufficient to accurately measure distances longer than 4 nm [28]. In summary, the T280-sgX12 measurement indicates that the N-finger is close enough to the DNA for dipolar interaction between the N-finger and sgGATA. However, the measured distance is not one defined population; it is more likely an ensemble of multiple populations, with the exact value and percentage of each unclear.

Figure 9: (A) DEER result on T280-sgX12 at a 1:1 molar ratio. (B) DEER result on T280-sgY18 at a 1:1 molar ratio.
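The limit on distances longer than 4 nm follows from the r⁻³ dependence of the dipolar frequency: longer distances oscillate more slowly, so a fixed evolution window covers fewer periods. The Python sketch below uses the commonly quoted nitroxide dipolar constant of 52.04 MHz·nm³ and scales the 2 µs / 4 nm reference point stated in the text by (r/4 nm)³; it is a rule-of-thumb estimate, not the criterion implemented in DeerAnalysis, and the function names are illustrative.

```python
D_MHZ_NM3 = 52.04  # commonly used dipolar constant for two nitroxides, MHz * nm^3


def dipolar_frequency_mhz(r_nm):
    """Perpendicular dipolar frequency nu_dd = D / r^3."""
    return D_MHZ_NM3 / r_nm**3


def periods_in_window(r_nm, t_max_us):
    """Number of dipolar oscillation periods covered by the evolution window."""
    return dipolar_frequency_mhz(r_nm) * t_max_us


def required_t_max_us(r_nm, r_ref_nm=4.0, t_ref_us=2.0):
    """Scale the text's reference point (2 us ~ 4 nm) by (r / r_ref)^3, since nu_dd ~ 1/r^3."""
    return t_ref_us * (r_nm / r_ref_nm) ** 3


for r in (3.16, 4.0, 4.5):
    print(f"r = {r:.2f} nm: nu_dd = {dipolar_frequency_mhz(r):.2f} MHz, "
          f"{periods_in_window(r, 2.0):.2f} periods in 2 us, "
          f"t_max needed ~ {required_t_max_us(r):.2f} us")
```

For the 3.16 nm T280-I362 distance, a 2 µs window spans about 3.3 dipolar periods, whereas at 4.5 nm it spans only about 1.1, and the cubic scaling suggests roughly 2.8 µs would be needed to match the confidence available at 4 nm.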
A similar result was observed for the distance measured between T280 on DF and sgY18 (Figure 9B). Again, the trace shows a clear decay, together with a strong ESEEM effect, poor S/N and no dipolar oscillation. The final distance analysis also shows multiple distance populations. Apart from the one below 2.0 nm, the dominant population is located at ~4.5 nm. However, for the reasons stated for the T280-sgX12 measurement, confidence in the T280-sgY18 dataset is also low.

The data described above indicate that at a 1:1 molar ratio, the C-finger specifically binds to sgGATA to form the C-module. At the same time, the N-finger could either interact with the C-module non-specifically or not interact with the C-module at all. The two sets of distances measured between T280 and sgGATA suggest that the N-finger stays close enough to the C-module to give clear dipolar interactions. However, the location of the N-finger relative to the C-module is not defined; it is more likely an ensemble of varied conformations, indicating that the N-finger does not specifically interact with sgGATA at this concentration ratio.

However, given the known capability of the N-finger to bind a DNA target independently, an alternative hypothesis was proposed: as the sgGATA:DF molar ratio is increased from 1:1 to 2:1, the N-finger may bind to a DNA molecule that is not bound by the C-finger. To test this hypothesis, the same two sets of distances between T280 and sgGATA were measured with the sgGATA:DF molar ratio increased to 2:1 (Figure 10A, 10B). In both traces, a clear echo decay was detected, indicating the presence of dipolar interactions, while there was no oscillation and the S/N was quite poor with a strong ESEEM effect. Again, multiple distance populations were observed for both measurements, with the dominant population, other than the ultra-short distance, located around 4.5 nm.

Figure 10: (A) DEER result on T280-sgX12 at 1:2 concentration ratio.
(B) DEER result on T280-sgY18 at 1:2 concentration ratio. (C) Original DEER traces and background correction for negative control samples. In the negative control samples, unlabeled DF was complexed with R5-labeled sgX12 (top) or R5-labeled sgY18 (bottom) at a molar ratio of 1:2. After background correction, no DEER signal was detected, as shown by the flat lines in the background-corrected traces (right panels).

To make sure the DEER signal does not originate from two singly-labeled sgGATA duplexes in proximity to each other, negative controls with singly-labeled sgGATA bound to unlabeled DF at a 2:1 molar ratio were carried out, and no DEER signal was detected after background correction (Figure 10C). Thus, with sgGATA:DF at 2:1, no two spin-labeled sgGATA duplexes are close enough to each other to give detectable dipolar interactions. Consequently, the DEER signal detected at this concentration ratio comes exclusively from dipolar interaction between spin-labeled T280 (N-finger) and spin-labeled sgGATA. Furthermore, the measurements between sgGATA and T280 at 1:1 and 2:1 molar ratios were compared (Figure 10A, 10B compared to 9A, 9B). Although the distance distributions for the same distance set differ slightly between molar ratios, confidence in the revealed difference is low for the reasons described in the analysis of the 1:1 DEER results earlier in this section. Moreover, when the corresponding original echo decay traces for the same distance set measured at 1:1 and 2:1 ratios were overlaid (Figure 11A, 11B), only small differences in background decay were observed, while the majority of the decay segments overlaid nearly perfectly. This suggests that excess sgGATA did not alter the interaction between the N-finger and sgGATA compared to that at a 1:1 molar ratio. Thus, under the experimental conditions described here, no evidence was obtained that the N-finger binds DNA in a specific fashion in the presence of the C-finger.
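Why a negative control becomes a flat line after background correction can be illustrated with a toy calculation. A minimal sketch, assuming a homogeneous three-dimensional distribution of spins so that the intermolecular background is B(t) = exp(-kt), fitted as a straight line to ln V(t) over the tail of the trace; the function names, decay rate, fit start, modulation depth and synthetic traces are all hypothetical, and DeerAnalysis [28] offers more complete background models than this one.

```python
import math


def fit_log_linear(times, values):
    """Least-squares fit of ln(value) = a + b * t; returns (a, b)."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    b = s_ty / s_tt
    return y_mean - b * t_mean, b


def background_correct(times, trace, fit_start):
    """Divide a DEER trace by an exponential background exp(a + b*t)
    fitted to the points with t >= fit_start (homogeneous 3D assumption)."""
    tail = [(t, v) for t, v in zip(times, trace) if t >= fit_start]
    a, b = fit_log_linear([t for t, _ in tail], [v for _, v in tail])
    return [v / math.exp(a + b * t) for t, v in zip(times, trace)]


# Hypothetical synthetic data on a 0-2 µs grid.
TIMES = [0.02 * i for i in range(101)]
K_BG = 0.3   # hypothetical background decay rate (1/µs)
DEPTH = 0.3  # hypothetical modulation depth

# Negative control: intermolecular background only.
CONTROL = [math.exp(-K_BG * t) for t in TIMES]
# Labeled pair: damped, non-oscillating intramolecular decay times background.
SIGNAL = [(1 - DEPTH * (1 - math.exp(-t / 0.2))) * math.exp(-K_BG * t)
          for t in TIMES]

if __name__ == "__main__":
    flat = background_correct(TIMES, CONTROL, fit_start=0.8)
    dipolar = background_correct(TIMES, SIGNAL, fit_start=0.8)
    print("control: max deviation from 1 =", max(abs(f - 1) for f in flat))
    print("signal: corrected value at t = 0 =", round(dipolar[0], 3))
```

The control trace divides out to a constant because its entire decay is background, whereas the labeled trace retains an early-time drop after correction; that residual drop is the intramolecular dipolar signal.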
Figure 11: (A) Overlay of original DEER traces on T280-sgX12 at concentration ratios of 1:1 (black) and 1:2 (red). (B) Overlay of original DEER traces on T280-sgY18 at concentration ratios of 1:1 (black) and 1:2 (red). (C) ALLNOX modeling. To mimic N-finger binding to one GATA site, the input model was the complex 1 crystal structure with the C-finger and C-tail of GATA3 deleted to avoid extra contacts with nitroxide conformers. palY16 and palX16 were selected as the sites equivalent to sgX12 and sgY18, respectively. All other coloring and labeling schemes are the same as in Figure 3(A).

To further examine the possibility that the N-finger recognizes the GATA site on the sgGATA DNA in a fashion similar to that observed for the bound palGATA in the crystal structure (Figure 11B), complex 1 was used to predict distances between the N-finger and the sgGATA DNA sites sgX12 (equivalent to palY16) and sgY18 (equivalent to palX16) (Figure 11C). The resulting expected distances were 3.46 nm and 2.15 nm for T280-sgX12 and T280-sgY18, respectively, both of which deviate significantly from the measured distances of 4.5 nm and 4.6 nm (Figure 9, 10A, 10B). Thus, it can be concluded that the N-finger is unlikely to recognize the GATA site on the sgGATA DNA in the fashion observed for the bound palGATA in the crystal structure. In addition, one set of distance measurements was carried out with both spin labels on the protein, at T280 and I362, at concentration ratios of 1:1 and 2:1. As shown in Figure 12, although the original decay traces show poor S/N, they present no observable difference when acquired with no DNA or with sgGATA:DF at 1:1 or 2:1, other than slight differences in background. It can thus be inferred that the addition of sgGATA does not cause a detectable change in the distance between these two protein sites under the current DEER measurement conditions.
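The comparison between modeled and measured distances reduces each pair of modeled spin-label ensembles to a single predicted value. One generic way to do this, assumed here for illustration and not necessarily the procedure implemented in ALLNOX, is to average all pairwise distances between nitroxide positions in the two conformer ensembles. The toy coordinates and function names below are hypothetical; the predicted and measured values are those reported in the text.

```python
import math
from itertools import product

Point = tuple[float, float, float]  # nitroxide position (x, y, z) in nm


def pairwise_distances(site_a: list[Point], site_b: list[Point]) -> list[float]:
    """All inter-label distances (nm) between two conformer ensembles."""
    return [math.dist(p, q) for p, q in product(site_a, site_b)]


def mean_distance(site_a: list[Point], site_b: list[Point]) -> float:
    """Mean of all pairwise inter-label distances (nm)."""
    d = pairwise_distances(site_a, site_b)
    return sum(d) / len(d)


# Distances reported in the text (nm): complex 1 predictions vs dominant DEER populations.
PREDICTED = {"T280-sgX12": 3.46, "T280-sgY18": 2.15}
MEASURED = {"T280-sgX12": 4.5, "T280-sgY18": 4.6}

if __name__ == "__main__":
    # Hypothetical toy ensembles with two conformers per site.
    site_a = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
    site_b = [(3.0, 0.0, 0.0), (3.0, 0.4, 0.0)]
    print(f"toy mean distance: {mean_distance(site_a, site_b):.2f} nm")
    for pair, r_pred in PREDICTED.items():
        delta = MEASURED[pair] - r_pred
        print(f"{pair}: predicted {r_pred} nm, measured ~{MEASURED[pair]} nm, "
              f"difference {delta:.2f} nm")
```

Differences of roughly 1 nm and 2.5 nm are well beyond the spread expected from label flexibility alone, which is the basis for ruling out a palGATA-like N-finger binding mode on sgGATA.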
Figure 12: Overlay of original DEER traces measured between spin-labeled T280 and I362 in the absence of DNA (black), in the presence of an equimolar concentration of sgGATA (red), and in the presence of sgGATA at twice the molar concentration (blue).

Given the lack of definitive changes in the DEER data upon the addition of sgGATA at 1:1 and 1:2 ratios, EMSA experiments were carried out to assess DNA binding by DF. As shown in Figure 13, upon the addition of protein to sgGATA, a slower-migrating (higher) complex band appeared. However, only one complex band was observed across the different protein/DNA concentration ratios, indicating that only one type of complex was formed despite the variation in ratio. Even when the DNA/protein concentration ratio was increased to over 3:1 (1:0.3), no additional complex population was detected. As the DEER data show that the C-finger binds to sgGATA while the N-finger probably interacts with sgGATA non-specifically, the complex observed in the mobility shift most likely represents one sgGATA (20 bp) DNA bound by the C-finger, with the N-finger either moving freely or contacting the C-module non-specifically. This complex band also runs slightly higher than the complex formed by DF and palGATA (22 bp DNA), even though the DNA is 2 bp shorter. This could be because, in the complex formed with sgGATA, the N-finger is not tightly bound to the C-finger/DNA module, leading to a less compact global structure than the one formed with palGATA, in which the protein wraps around the DNA duplex. It should also be noted that even at sgGATA:DF of 1:5, a free DNA band was still detectable. This could be due either to low binding affinity of the current protein/DNA construct or to dissociation of the complex during electrophoresis.

Figure 13: EMSA to assess complex formation between DNA duplexes and wild-type DF.
The first seven samples, with varied sgGATA:DF concentration ratios, were prepared by directly mixing the protein with sgGATA duplex stock in 200 mM NaCl and incubating on ice for 30 min. The DNA duplex concentration was kept constant at 5 µM among these samples. The DEER sample of doubly-labeled DF complexed with palGATA was included on the right as a reference; the amount loaded on the gel was not accurately measured.

In summary, while the C-finger is capable of specifically recognizing sgGATA in a similar fashion as palGATA, DEER and EMSA results indicate that under the conditions of the experiments described here, the N-finger does not bind DNA in a unique conformation, although it remains in proximity to the C-module, as shown by the dipolar interaction between T280 and sgGATA.

4. Discussion

In this work, pulsed EPR, in particular DEER, was explored to study a multi-domain protein, the double zinc-finger GATA3, and its interactions with a DNA target. Following mutagenesis and expression of the protein constructs, the R5 spin label was successfully attached to both protein and DNA sites, and was shown to minimally impact protein-DNA binding. Because spin labels can be attached simultaneously to protein and DNA sites in this work, distances can be measured not only between DNA sites or between protein sites, but also between a protein site and a DNA site. This is a distinctive feature of the work reported here and a powerful tool for obtaining structural information on large protein/DNA complexes. The results show that the relative position and orientation of the two zinc-fingers are not defined for the naked protein, as expected from the high flexibility of the linker region. When bound to the palGATA DNA in solution, defined distances were measured on the complex, and the results indicate that the protein “wraps” around the DNA as suggested by the crystal structure.
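Whether the free DNA band seen at sgGATA:DF of 1:5 in Section 3.5 points to weak binding can be gauged with the exact solution for 1:1 equilibrium binding, [PD] = ((P_t + D_t + K_d) - sqrt((P_t + D_t + K_d)^2 - 4 P_t D_t)) / 2. The sketch below evaluates the free DNA fraction at the 5 µM DNA concentration used in the EMSA samples for a few hypothetical K_d values (no K_d was measured in this work). It assumes a single 1:1 complex at equilibrium before loading and ignores dissociation during electrophoresis, the other explanation raised for that EMSA result; the function names are hypothetical.

```python
import math


def complex_concentration(p_total: float, d_total: float, kd: float) -> float:
    """Exact 1:1 binding: complex concentration from total protein,
    total DNA and Kd (all in the same units)."""
    s = p_total + d_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * d_total)) / 2.0


def free_dna_fraction(p_total: float, d_total: float, kd: float) -> float:
    """Fraction of DNA not in complex at equilibrium."""
    return 1.0 - complex_concentration(p_total, d_total, kd) / d_total


if __name__ == "__main__":
    d_total = 5.0            # µM, DNA concentration held constant in the EMSA
    p_total = 5.0 * d_total  # sgGATA:DF = 1:5
    for kd in (0.01, 0.1, 1.0, 10.0):  # hypothetical Kd values (µM)
        pct = 100 * free_dna_fraction(p_total, d_total, kd)
        print(f"Kd = {kd:>5} µM: free DNA = {pct:.2f}%")
```

Under these assumptions, a clearly visible free DNA band at five-fold protein excess would require K_d values approaching the micromolar range, so the observation is consistent with either weak binding or complex dissociation in the gel, as stated in the text.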
The agreement between DEER-measured distances and those predicted using the crystal structure also validates the newly developed program ALLNOX for modeling allowable conformers of R5 at DNA and protein sites. When the protein was bound to the sgGATA DNA, which contains a single GATA sequence, the current data indicate that the DNA is specifically bound (recognized) only by the C-finger, and that the recognition follows the same fashion as observed for palGATA bound by the C-finger. At the same time, the N-finger is located in the vicinity of the C-module, but does not bind to the DNA in a defined mode under the conditions tested.

However, many questions about the system remain unresolved. For example, using the sgGATA DNA, specific DNA binding by the N-finger was not detected in the studies described. It is possible that the folding of the N-finger is somewhat altered in these studies, leading to lower binding affinity for the DNA target. Besides, the lack of N-finger/sgGATA interaction may arise from the difficulty of bringing two DNA duplexes into close enough proximity to allow simultaneous binding by the two fingers. To address this issue, a tethering strategy could be applied as previously reported [20].

In summary, the work presented here demonstrates the use of SDSL to study multi-domain DNA-binding proteins. Further investigations are required to gain detailed information on the independent DNA-binding modes of the two zinc-fingers in GATA3, as well as the relative position and orientation of the two fingers as they interact with various DNA targets.

References

1. Ko, L.J. and J.D. Engel, DNA-binding specificities of the GATA transcription factor family. Mol Cell Biol, 1993. 13(7): p. 4011-22.
2. Merika, M. and S.H. Orkin, DNA-binding specificity of GATA family transcription factors. Mol Cell Biol, 1993. 13(7): p. 3999-4010.
3. Molkentin, J.D., The zinc finger-containing transcription factors GATA-4, -5, and -6. Ubiquitously expressed regulators of tissue-specific gene expression. J Biol Chem, 2000. 275(50): p. 38949-52.
4. Patient, R.K. and J.D. McGhee, The GATA family (vertebrates and invertebrates). Curr Opin Genet Dev, 2002. 12(4): p. 416-22.
5. Weiss, M.J. and S.H. Orkin, GATA transcription factors: key regulators of hematopoiesis. Exp Hematol, 1995. 23(2): p. 99-107.
6. Ho, I.C., et al., Human GATA-3: a lineage-restricted transcription factor that regulates the expression of the T cell receptor alpha gene. EMBO J, 1991. 10(5): p. 1187-92.
7. George, K.M., et al., Embryonic expression and cloning of the murine GATA-3 gene. Development, 1994. 120(9): p. 2673-86.
8. Zheng, W. and R.A. Flavell, The transcription factor GATA-3 is necessary and sufficient for Th2 cytokine gene expression in CD4 T cells. Cell, 1997. 89(4): p. 587-96.
9. Evans, T., M. Reitman, and G. Felsenfeld, An erythrocyte-specific DNA-binding factor recognizes a regulatory sequence common to all chicken globin genes. Proc Natl Acad Sci U S A, 1988. 85(16): p. 5976-80.
10. Omichinski, J.G., et al., A small single-"finger" peptide from the erythroid transcription factor GATA-1 binds specifically to DNA as a zinc or iron complex. Proc Natl Acad Sci U S A, 1993. 90(5): p. 1676-80.
11. Visvader, J.E., et al., The C-terminal zinc finger of GATA-1 or GATA-2 is sufficient to induce megakaryocytic differentiation of an early myeloid cell line. Mol Cell Biol, 1995. 15(2): p. 634-41.
12. Pedone, P.V., et al., The N-terminal fingers of chicken GATA-2 and GATA-3 are independent sequence-specific DNA binding domains. EMBO J, 1997. 16(10): p. 2874-82.
13. Newton, A., J. Mackay, and M. Crossley, The N-terminal zinc finger of the erythroid transcription factor GATA-1 binds GATC motifs in DNA. J Biol Chem, 2001. 276(38): p. 35794-801.
14. Martin, D.I. and S.H. Orkin, Transcriptional activation and DNA binding by the erythroid factor GF-1/NF-E1/Eryf 1. Genes Dev, 1990. 4(11): p. 1886-98.
15. Trainor, C.D., et al., A palindromic regulatory site within vertebrate GATA-1 promoters requires both zinc fingers of the GATA-1 DNA-binding domain for high-affinity interaction. Mol Cell Biol, 1996. 16(5): p. 2238-47.
16. Bates, D.L., et al., Crystal structures of multiple GATA zinc fingers bound to DNA reveal new insights into DNA recognition and self-association by GATA. J Mol Biol, 2008. 381(5): p. 1292-306.
17. Omichinski, J.G., et al., NMR structure of a specific DNA complex of Zn-containing DNA binding domain of GATA-1. Science, 1993. 261(5120): p. 438-46.
18. Starich, M.R., et al., The solution structure of a fungal AREA protein-DNA complex: an alternative binding mode for the basic carboxyl tail of GATA factors. J Mol Biol, 1998. 277(3): p. 605-20.
19. Liew, C.K., et al., Zinc fingers as protein recognition motifs: structural basis for the GATA-1/friend of GATA interaction. Proc Natl Acad Sci U S A, 2005. 102(3): p. 583-8.
20. Chen, Y., et al., DNA binding by GATA transcription factor suggests mechanisms of DNA looping and long-range gene regulation. Cell Rep, 2012. 2(5): p. 1197-206.
21. Yu, M., et al., Insights into GATA-1-mediated gene activation versus repression via genome-wide chromatin occupancy analysis. Mol Cell, 2009. 36(4): p. 682-95.
22. Leung, T.H., A. Hoffmann, and D. Baltimore, One nucleotide in a kappaB site can determine cofactor specificity for NF-kappaB dimers. Cell, 2004. 118(4): p. 453-64.
23. Meijsing, S.H., et al., DNA binding site sequence directs glucocorticoid receptor structure and activity. Science, 2009. 324(5925): p. 407-10.
24. Stroud, J.C., et al., Structure of a TonEBP-DNA complex reveals DNA encircled by a transcription factor. Nat Struct Biol, 2002. 9(2): p. 90-4.
25. Giffin, M.J., et al., Structure of NFAT1 bound as a dimer to the HIV-1 LTR kappa B element. Nat Struct Biol, 2003. 10(10): p. 800-6.
26. Qin, P.Z., et al., Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nat Protoc, 2007. 2(10): p. 2354-65.
27. Pannier, M., et al., Dead-time free measurement of dipole-dipole interactions between electron spins. J Magn Reson, 2000. 142(2): p. 331-40.
28. Jeschke, G., et al., DeerAnalysis2006 - a comprehensive software package for analyzing pulsed ELDOR data. Applied Magnetic Resonance, 2006. 30(3-4): p. 473-498.
29. Cai, Q., et al., Site-directed spin labeling measurements of nanometer distances in nucleic acids using a sequence-independent nitroxide probe. Nucleic Acids Res, 2006. 34(17): p. 4722-30.
30. Zhang, X., et al., Conformations of p53 response elements in solution deduced using site-directed spin labeling and Monte Carlo sampling. Nucleic Acids Res, 2014. 42(4): p. 2789-97.

Supplementary Information

Figure S1: Protein sequence expressed and used in this study. The protein construct contains a total of 132 amino acids; the sequence in parentheses is the wild-type human GATA3 sequence 260-370. The His-tag in the sequence is colored in red. The two labeling sites on the protein, T280 and I362, are colored in green and purple, respectively.

Figure S2: Sequencing results after mutagenesis, aligned with the wt sequence, for the T280 construct (A), I362 construct (B), and DM construct (C). In each panel, the mutated sequence is listed as the Query sequence and the wt sequence as the Subject sequence. Sequence alignment was carried out using the online tool ClustalW2. The plasmid sequence of the whole expressed region, as listed in SI Figure S1, was used for alignment, with the only mismatch in each alignment being the mutation site.

Figure S3: Mono S trace for construct T280 purification. Black: UV absorption at 280 nm; red: Buffer B percentage during elution; blue: buffer conductivity. The UV280 curve is scaled to the Y-axis on the left, while the Buffer B percentage and conductivity curves are scaled to the Y-axis on the right.
Two major UV peaks were observed on the Mono S trace, likely a result of multiple conformations of the protein arising from variation in the flexible linker region. The protein identity (size) and purity of each eluted fraction were tested by SDS-PAGE.

Figure S4: SDS-PAGE gel of T280 after the Mono S column. Six fractions identified by UV280 peaks were collected, and all showed the correct protein size with satisfactory purity. A protein ladder was loaded in the left lane as a reference, with the size of each band labeled on the left in kDa. The protein construct used in this study has a molecular weight of 14.7 kDa and thus runs slightly below the 16 kDa band in the ladder.

Table S1: Primers used to mutate T280 and I362 during mutagenesis. Constructs T280 and I362 were obtained directly by mutagenesis using the wild-type plasmid as the template, while the DM construct was obtained by mutagenesis using the T280 primers with the I362 construct plasmid as the template. The mutated sequences for cysteine are labeled in red.

Table S2: Summary of DNA strands used in this study, with sequences and extinction coefficients at 260 nm. Spin labeling sites are not listed in the sequences here.
Abstract
Deoxyribonucleic acid (DNA) is responsible for the storage and transmission of genetic information. An essential aspect of fulfilling this crucial function is the specific recognition of particular DNA duplex sequences by molecules such as proteins. It has been established that specific protein-DNA recognition occurs via a combination of base readout (i.e., direct interactions between the protein and DNA base functional groups) and shape readout (i.e., interactions based on sequence-dependent chemical and physical features of DNA duplexes). As such, the sequence-dependent shape of DNA duplexes, particularly in the protein-free (i.e., naked) state, is an important determinant of protein-DNA recognition. However, while ample structural information exists for protein-bound DNAs, information on naked DNA remains rather limited. In this thesis, I present work on using a biophysical method, site-directed spin labeling (SDSL), to study DNA shape and its impact on protein recognition. Using a nucleotide-independent nitroxide spin label, continuous-wave (cw-) and pulsed electron paramagnetic resonance (EPR) spectroscopy, and new analytical techniques I developed, I investigated DNA shape and protein/DNA recognition in two systems: (i) the p53 response elements and their recognition by the DNA-binding domain of the p53 tumor suppressor
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Enhanced site-directed spin labeling technique for monitoring RNA dynamics with rigid nitroxides
Site-directed spin labeling studies of target DNA recognition by a CRISPR nuclease
Site-directed spin labeling studies of the phi29 packaging RNA
Molecular and computational analysis of spin-labeled nucleic acids and proteins
Sensing sequence-specific DNA micro-environment with nucleotide-independent nitroxides
Decoding protein-DNA binding determinants mediated through DNA shape readout
Motions and conformations of nucleic acids studied using site-directed spin labeling
Genome-wide studies of protein–DNA binding: beyond sequence towards biophysical and physicochemical models
Genome-wide studies reveal the function and evolution of DNA shape
Quantitative modeling of in vivo transcription factor–DNA binding and beyond
Profiling transcription factor-DNA binding specificity
Site-directed spin labeling measurements of nanometer distances in nucleic acids using a sequence-independent nitroxide probe
Investigating mechanisms of DNA target recognition by CRISPR-Cas nucleases
Novel synthesis of β-glycosides for SPPS of GLCNAC glycoproteins and study of their site-specific biochemical and biophysical consequences
Understanding protein–DNA recognition in the context of DNA methylation
DNA shape at transcription factor binding sites: from purifying selection to a new alphabet
Machine learning of DNA shape and spatial geometry
Using cyclotides as a bioscaffold to target intracellular protein-protein interactions
Activation mechanism of damaged-induced DNA polymerase V in Escherichia coli
The structure and function of membrane curving proteins on different membrane shapes and their regulation by post-translational modifications
Asset Metadata
Creator
Ding, Yuan (author)
Core Title
Site-directed spin labeling studies of sequence-dependent DNA shape and protein-DNA recognition
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Chemistry
Publication Date
07/16/2017
Defense Date
06/03/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
DNA shape,EPR,OAI-PMH Harvest,protein-DNA recognition,SDSL
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Qin, Peter Z. (committee chair), Langen, Ralf (committee member), Mak, Chiho (committee member)
Creator Email
yuanding@usc.edu,yuanding1988@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-598186
Unique identifier
UC11301294
Identifier
etd-DingYuan-3634.pdf (filename),usctheses-c3-598186 (legacy record id)
Legacy Identifier
etd-DingYuan-3634.pdf
Dmrecord
598186
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Ding, Yuan
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
DNA shape
EPR
protein-DNA recognition
SDSL