DECODING PROTEIN-DNA BINDING
DETERMINANTS MEDIATED THROUGH DNA
SHAPE READOUT
by
Ana Carolina Dantas Machado
____________________________________________________________________
A DISSERTATION PRESENTED TO THE
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
December 2016
Ana Carolina Dantas Machado
Copyright 2016
DEDICATION
To my family, who always gave me their unconditional love and support.
ACKNOWLEDGEMENTS
I cannot find enough words to properly thank all those people who have played such a
huge role during my Ph.D. studies. First, I would like to thank my advisor and mentor, Dr.
Remo Rohs, for everything he has done for me. I was extremely fortunate to have been
given two big opportunities by him - the first was to join his computational biology
group, and the second, a few years later, to go back to the bench to test our hypothesis. In
both cases, he took a chance on me and supported me to broaden my field of research and
horizons, sparing no effort to help me find the environment that I needed to
succeed. Remo has contributed immensely to my scientific and personal development
over the last years, and I am very grateful for all his guidance and support. I also want to
thank Dr. Lin Chen for opening the doors of his lab to me, and for always inspiring me
through his passion and enthusiasm for science. In addition, I would like to thank Dr.
Helen Berman for all the support and helpful discussions, and Dr. Oscar Aparicio for also
opening the doors of his lab and supporting me through my endeavor. Next, I would like
to thank my office mates and lab mates, and members of Dr. Chen's and Dr. Aparicio's
laboratories for the discussions, guidance and expertise provided, which were really
helpful throughout the last years of my Ph.D. studies. I also owe a special thanks to Dr.
Rosa Di Felice, Dr. Phil Bradley, Dr. Lin Yang, Judith Kribelbauer, Xiao Lei and Yan Gan
for sharing their expertise and providing helpful discussions, and to all of our
collaborators, for the great discussions, always advancing our science. I want to thank my
friends Jordan Eboreime, Melina Butuci, Sandra Villwock, and my boyfriend Jared Peace
for making LA feel like my home away from home, always being there for me no matter
what! And thank you to all international friends for all the great moments we shared
together and for introducing me to so many great different cuisines. I also owe many
thanks to some people who were very supportive of me getting to grad school, each of
them in a very unique way - Denise Guimaraes, Paulo and Christine Medeiros, and Jon
Steiger, thank you so much! And of course, big thanks to my family for showing me the
really important things in life. A more than special thank-you to my aunts and uncles
Alvaro, Bado and Tininha, for all their support on this long journey of studies. Thank you,
Mom, Grandma, Ju and Pedro, for all the love, support and encouragement.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
REFERENCES TO ADAPTED WORK
ABSTRACT
1 INTRODUCTION
1.1 DNA-PROTEIN INTERACTIONS IN THE SPOTLIGHT
1.2 PROPERTIES OF THE DNA DOUBLE-HELIX
1.3 PROTEIN-DNA RECOGNITION MODES
1.3.1 Base Readout
1.3.2 Shape Readout
1.3.3 Interplay between shape and base readout
1.4 EXPERIMENTAL METHODS TO STUDY PROTEIN-DNA BINDING
1.4.1 Protein Binding Microarray (PBM)
1.4.2 SELEX-seq
1.5 PROTEIN-DNA BINDING SITES PRESENTED IN THIS THESIS
1.5.1 Large T antigen and origin DNA
1.5.2 Ferric uptake regulator (Fur) and its DNA sites
1.5.3 RNA polymerase αCTD and the upstream region
1.5.4 p53 and its response elements (REs)
1.5.5 DNase I endonuclease and DNA cleavage rate
1.5.6 The myocyte enhancer factor 2 (MEF2) and its DNA target sites
1.6 OVERVIEW
2 METHODS
2.1 COMPUTATIONAL ANALYSIS OF THE BOUND DNA STRUCTURE
2.1.1 General computational methodology for DNA structure analysis
2.1.2 Computational analysis of p53 BAX and 3KMD
2.2 PREDICTION OF UNBOUND DNA STRUCTURE
2.2.1 Monte Carlo simulation to predict the unbound DNA structure
2.2.2 Prediction of unmethylated and methylated DNA for DNase I studies
2.2.3 Generation of DNA models for in solution deduction of p53 conformations
2.3 ESTIMATION OF UNBOUND MINOR GROOVE WIDTH THROUGH ORCHID2
2.4 PREDICTION OF PROTEIN-DNA BINDING ENERGY USING ROSETTA
2.5 HIGH-THROUGHPUT DATA ANALYZED
2.6 EXPERIMENTAL PROCEDURES FOR SELEX-SEQ
2.6.1 Protein expression and purification
2.6.2 Oligonucleotides
2.6.3 SELEX-seq Library generation
2.6.4 Electrophoretic mobility shift assay (EMSA)
2.6.5 Generation of selected library for rounds of selection and sequencing
2.7 HIGH-THROUGHPUT PREDICTION OF DNA SHAPE
2.8 REGRESSION MODELS FOR BINDING AFFINITY PREDICTION
2.9 HIGH-THROUGHPUT DATA ANALYSIS
2.9.1 Minor groove width analysis for DNase I sites
2.9.2 DNA shape analysis of ChIP-seq for MEF2A and MEF2C
2.9.3 HT-SELEX data analysis of MEF2D
2.9.4 SELEX-seq data analysis of MEF2B
3 RESULTS
3.1 PROTEIN-DNA READOUT LESSONS FROM CRYSTAL STRUCTURES
3.1.1 Expanding the repertoire of DNA shape recognition modes
3.1.2 DNA conformational changes induced upon p53 binding
3.2 DNA SHAPE READOUT IN THE GENOMICS ERA: RESULTS FROM DNASE I CLEAVAGE AND METHYLATION STATUS
3.2.1 DNase I cleavage rate and DNA methylation
3.2.2 Evolving insights into how cytosine methylation affects protein-DNA binding
3.3 STUDIES OF THE MYOCYTE ENHANCER FACTOR-2 (MEF2)
3.3.1 Analysis of the crystal structure reveals DNA shape readout
3.3.2 High-throughput data of MEF2 TFs suggests DNA shape contributions
3.3.3 Investigating MEF2B mutants’ ability to interfere in DNA readout
3.3.4 Using SELEX-seq to interrogate MEF2B binding preferences
4 DISCUSSION
4.1 BROADER ARRAY OF READOUT MODES FROM CRYSTAL STRUCTURE ANALYSIS
4.1.1 DNA shape readout recognition by the RNA polymerase αCTD
4.1.2 Intricacies of DNA shape readout recognition by histidines
4.1.3 DNA shape readout by lysines in the minor groove
4.1.4 P53 and DNA conformational changes of its REs BAX and p21
4.2 HOW DNA SHAPE AND CPG METHYLATION CAN AFFECT DNASE I CLEAVAGE
4.3 INSIGHTS FROM MEF2-DNA RECOGNITION
4.4 OVERALL IMPLICATIONS FOR PROTEIN-DNA RECOGNITION
APPENDIX
APPENDIX A The effect of non-coding nucleotide variations in DNA shape
APPENDIX B Forkhead proteins and their DNA binding sites
APPENDIX C MEF-2 DNA binding experiments validation
BIBLIOGRAPHY
LIST OF FIGURES
FIGURE 1. SCHEMATIC REPRESENTATION OF DNA PARAMETERS.
FIGURE 2. BASE AND SHAPE READOUT CONTRIBUTE TO TF–DNA BINDING SPECIFICITY.
FIGURE 3. INTERPLAY OF BASE AND SHAPE READOUT VARIES AMONG TF FAMILIES.
FIGURE 4. OVERVIEW OF SELEX-SEQ PROTOCOL.
FIGURE 5. SHAPE READOUT BY THE αCTD SUBUNIT OF RNA POLYMERASE.
FIGURE 6. AAA+ DOMAIN OF LTAG EMPLOYS DNA SHAPE READOUT RECOGNITION.
FIGURE 7. MINOR GROOVE SHAPE READOUT THROUGH HISTIDINE RESIDUES FOR THE IFN-β ENHANCEOSOME.
FIGURE 8. NARROW DNA MINOR GROOVE GEOMETRY WHERE THE LYS15 RESIDUE BINDS.
FIGURE 9. LYSINE IS EMPLOYED TO RECOGNIZE MINOR GROOVE SHAPE.
FIGURE 10. DNA STRUCTURE ANALYSES OF P53 RE BAX COMPARED TO 3KMD DNA.
FIGURE 11. THE OUTER REGION OF THE DNA ACCOMMODATES THE EXTRA BP IN THE BAX COMPLEX.
FIGURE 12. ANALYSES OF P53 RE STRUCTURES.
FIGURE 13. MGW IS PREDICTIVE OF DNASE I CLEAVAGE RATE.
FIGURE 14. OBSERVATION AND ANALYSIS OF THE EFFECT OF METHYLATION ON DNASE I CLEAVAGE RATE.
FIGURE 15. INTRINSIC DNA METHYLATION SENSITIVITY OF DNASE I.
FIGURE 16. BASE AND SHAPE READOUT OF METHYLATED DNA.
FIGURE 17. DNA SHAPE RECOGNITION BY MEF2.
FIGURE 18. ANALYSIS OF DNA SHAPE FROM CHIP-SEQ FOR MEF2A AND MEF2C.
FIGURE 19. MODEL PERFORMANCE USING K-MERS AND SHAPE FEATURES FOR MEF2D BINDING DATA.
FIGURE 20. MEF2B MUTATIONS SELECTED FOR THIS STUDY.
FIGURE 21. INFERRING SELEX-SEQ RELATIVE AFFINITIES.
FIGURE 22. SELEX-SEQ PROFILE FOR WT MEF2B.
FIGURE 23. MULTIPLE SEQUENCE ALIGNMENT AROUND H513.
FIGURE 24. PROTEIN SEQUENCE ALIGNMENT BETWEEN MEF2 FAMILY MEMBERS AND MADS-BOX PROTEINS.
FIGURE 25. CHANGE IN MINOR GROOVE WIDTH FOR SNVS LOCATED IN TFBSS AND ITS FLANK REGIONS.
FIGURE 26. THE ADDITION OF DNA SHAPE FEATURES INCREASES PREDICTION OF FORKHEAD BINDING SPECIFICITY.
FIGURE 27. CO-EVOLUTION BETWEEN FORKHEAD PROTEINS AND THEIR TARGET SITES.
FIGURE 28. MULTIPLE PROTEIN ALIGNMENT FROM MEMBERS OF THE FKH FAMILY.
FIGURE 29. PURIFIED PROTEIN USED FOR EMSA.
FIGURE 30. VALIDATION OF SELEX-SEQ LIBRARY.
LIST OF TABLES
TABLE 1. LIST OF PUBLISHED WORK
TABLE 2. PDB IDS AND PRIMARY CITATIONS OF PROTEIN-DNA COMPLEXES ANALYZED IN THIS WORK.
TABLE 3. DATASETS USED IN THIS WORK.
TABLE 4. OLIGONUCLEOTIDES USED FOR SITE-DIRECTED MUTAGENESIS.
TABLE 5. OLIGONUCLEOTIDES USED FOR SELEX-SEQ.
TABLE 6. LIST OF SNVS IDENTIFIED FROM THE REDFLY DATABASE.
TABLE 7. MEF2B MUTATIONS OBSERVED IN LYMPHOMA PATIENTS.
LIST OF ABBREVIATIONS
3D: Three-dimensional
5mC: 5-Methyl-Cytosine
A: Adenine
ASD: αCTD-σR4-DNA
BAX: BCL-2-Associated X Protein
bHLH: Basic Helix-Loop-Helix
bp: base pair
C: Cytosine
CAD: CAP-αCTD-DNA
ChIP: Chromatin Immunoprecipitation
CTD: C-Terminal Domain
DBD: DNA Binding Domain
DEER: Double-Electron-Electron-Resonance
DNA: deoxyribonucleic acid
dsDNA: double-stranded DNA
EMSA: Electrophoretic Mobility Shift Assay
EP: Early Palindrome
EPR: Electron Paramagnetic Resonance
Fkh: Forkhead
Fur: Ferric Uptake Regulator
G: Guanine
HelT: Helix Twist
HPV: Human Papillomavirus
HT: High-Throughput
IFN: Interferon
IRF: Interferon Regulatory Factor
LTag: Large T Antigen
MBD: Methyl-CpG-Binding Domain
MC: Monte Carlo
MEF2: Myocyte Enhancer Factor 2
MGW: Minor Groove Width
MLR: Multiple Linear Regression
OBD: Origin Binding Domain
ori: Origin DNA
PBM: Protein Binding Microarray
PCR: Polymerase Chain Reaction
PDB: Protein Data Bank
ProT: Propeller Twist
PWM: Position Weight Matrix
R: Purine (A or G)
RE: Response Element
RNA: Ribonucleic Acid
RNAP: RNA Polymerase
SDSL: Site-Directed Spin Labeling
SELEX: Systematic Evolution Of Ligands By Exponential Enrichment
ssDNA: single-stranded DNA
SV40: Simian Virus 40
T: Thymine
TF: Transcription Factor
USF: Upstream Stimulating Factor
W: Weak (A or T)
WT: Wild-Type
Y: Pyrimidine (C or T)
REFERENCES TO ADAPTED WORK
Some of the work presented here is part of published papers that I co-authored.
Therefore, some of the sections were adapted from published work and appear as in the
references below:
Sections 3.1.1.2 and 4.1.1 were adapted from Y.P. Chang, M. Xu, A.C. Dantas Machado,
X.J. Yu, R. Rohs, and X.S. Chen: Mechanism of origin DNA recognition and assembly of
an initiator-helicase complex by SV40 large tumor antigen. Cell Rep. 3, 1117-1127
(2013).
Sections 3.1.1.3 and 4.1.3 were adapted from Deng, Q. Wang, Z. Liu, M. Zhang, A.C.
Dantas Machado, T.P. Chiu, C. Feng, Q. Zhang, L. Yu, L. Qi, J. Zheng, X. Wang, X.M.
Huo, X. Qi, X. Li, W. Wu, R. Rohs, Y. Li, and Z. Chen: Mechanistic insights into metal ion
activation and operator recognition by the ferric uptake regulator. Nat. Commun. 6, 7642
(2015).
Sections 2.2.3, 3.1.2.2 and 4.1.4 were adapted from X. Zhang, A.C. Dantas Machado, Y.
Ding, Y. Chen, Y. Lu, Y. Duan, K.W. Tham, L. Chen, R. Rohs, and P.Z. Qin:
Conformations of p53 response elements in solution deduced using site-directed spin
labeling and Monte Carlo sampling. Nucleic Acids Res. 42, 2789-2797 (2014).
Sections 3.1.2 and 4.1.4 were adapted from Y. Chen, X. Zhang, A.C. Dantas Machado, Y.
Ding, Z. Chen, P.Z. Qin, R. Rohs, and L. Chen: Structure of p53 binding to the BAX
response element reveals DNA unwinding and compression to accommodate base-pair
insertion. Nucleic Acids Res. 41, 8368-8376 (2013).
Sections 2.2.2, 2.9.1, 3.2.1 and subsections, and 4.2 were adapted from A. Lazarovici, T.
Zhou, A. Shafer, A.C. Dantas Machado, T. Riley, R. Sandstrom, P.J. Sabo, Y. Lu, R.
Rohs, J.A. Stamatoyannopoulos, and H.J. Bussemaker: Probing DNA shape and
methylation state on a genomic scale with DNase I. Proc. Natl. Acad. Sci. USA 110,
6376-6381 (2013).
Section 3.2.2 and subsections were adapted from A.C. Dantas Machado, T. Zhou, S. Rao,
P. Goel, C. Rastogi, A. Lazarovici, H.J. Bussemaker, and R. Rohs: Evolving insights on
how cytosine methylation affects protein-DNA binding. Brief. Funct. Genomics 14(1), 61-
73 (2014).
ABSTRACT
Protein-DNA interactions regulate many processes within the cell. Unraveling the
intricacies of these recognition modes is therefore a first step toward understanding the
larger, complex network that culminates in gene regulation and downstream pathways.
From viruses to mammals, the work presented here expands our previous understanding
of protein-DNA recognition modes by exploring specific mechanisms employed by
proteins that belong to different families and are present in various organisms to
recognize their DNA counterparts. While this level of detail is accessible through the
computational analysis of many crystal structures, data from protein-DNA binding assays
allow further investigation of binding preferences in a high-throughput context. One
of the main points raised by our studies is the role of DNA shape as a key factor in
achieving protein-DNA specificity. We show that proteins can recognize specific shape
features of the DNA to achieve binding specificity, either at the local level, for instance
through the recognition of narrow minor grooves and the impact of CpG methylation on
DNA shape, or at the global level, through DNA conformational changes as shown for
p53. Results from computational analysis prompted us to experimentally
investigate sources of protein-DNA recognition in another protein identified from our
studies as a DNA shape reader. By combining insights from structural studies, high-
throughput binding assays and computational analysis of DNA, we are now disentangling
sources of protein-DNA specificity.
1 INTRODUCTION
During the last century, tremendous advances have been made in understanding the
molecular hallmarks that govern biological processes within cells. In that
context, interactions between DNA and proteins are of paramount importance for
regulating gene expression and the complex network it controls. How entire processes
are regulated remains a question that may take many more years to answer.
Nonetheless, by studying the determinants of protein-DNA recognition one can start to
disentangle the many layers of gene regulation.
1.1 DNA-PROTEIN INTERACTIONS IN THE SPOTLIGHT
From the discovery of DNA to its ascension as the regulator of gene expression, various
methods have been used to study the intricacies of protein-DNA interactions and their
sources of specificity (Brown and Freemont, 1996; Galas and Schmitz, 1978; Mahony
and Pugh, 2015). Many nuances of protein-DNA recognition modes were uncovered through
the rise of structural biology and the large number of structures available in the Protein
Data Bank (PDB) (Berman et al., 2000), which have provided insights into the key features
of protein-DNA recognition (Garvie and Wolberger, 2001; Luscombe et al., 2000; Rohs
et al., 2010). Yet this information was usually restricted to one or a few binding sites
(Joshi et al., 2007; Stella et al., 2010), limiting large-scale studies of a specific protein.
While the emergence of the genomics field allowed for high-throughput approaches, only
recently have the structural biology and genomics fields started to communicate in order
to decipher the many codes that govern our genome (Kalhor et al., 2012; Zhou et al., 2013).
With the “omics” era, the connections between these fields have started to shed light on
long-standing questions regarding the sources of specificity between proteins and their
target sites (Abe et al., 2015; Dror et al., 2014). The rise of high-throughput (HT)
sequencing also exposed the complexity of functional elements within the genome
(ENCODE Project Consortium, 2012). Historically, attention to disease-causing
mutations focused mainly on mutations within protein-coding regions. It is now
recognized that the regulatory regions of the genome harbor non-coding variations that
can play a role in complex traits and human disease (Khurana et al., 2016; Ward and
Kellis, 2012). Thus, protein-DNA interactions can be perturbed at different levels, by
both coding and non-coding variations. How much these variations contribute to disease
and other traits, though, is specific to each case, and further studies are needed to
elucidate the mechanisms behind them. A large part of the
work presented here will focus on the characteristics of protein-DNA interactions, and
how an extra layer of specificity can be achieved by specific residues on the protein side
or at certain positions within the DNA binding site. More specifically, I present insights
into the DNA shape readout recognition mode and its importance for protein-DNA
interactions, both from structural and high-throughput analysis. We will also explore
questions regarding the impact of slight differences in DNA sequence, and how these
variations could interfere with protein-DNA binding recognition through DNA shape.
To introduce the basis of the research presented in this dissertation, the following sections
give an overview of the intricacies of the DNA three-dimensional (3D) structure, the
readout modes used to achieve binding specificity, the experimental methods from
which data were analyzed, and the proteins that were studied.
1.2 PROPERTIES OF THE DNA DOUBLE-HELIX
More than 60 years have passed since the structure of the double-helix was proposed by
Watson and Crick (Watson and Crick, 1953). The model presented at that time suggested
a mechanism for copying the genetic material. Yet many puzzles remained hidden
within that molecule’s 3D structure and the ways in which it is regulated.
Deoxyribonucleic acid (DNA) is composed of nucleotides, its building blocks. Each
nucleotide consists of a sugar, a phosphate and one of the four bases Adenine (A),
Thymine (T), Cytosine (C) or Guanine (G). The two antiparallel chains, which run along a
central axis, pair through standard Watson-Crick base pairs, in which hydrogen bonds
form between a purine (R) base, A or G, and a pyrimidine (Y) base, T or C: A pairs with
T and C pairs with G, forming AT and GC base pairs, respectively. The geometry of the
paired nucleotides gives rise to the major (wide) and minor (narrow) grooves of standard
B-DNA, the most common form of DNA.
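The pairing rules described above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this illustration; the sketch covers only the four standard bases and ignores modified bases such as 5mC.

```python
# Watson-Crick partner of each base: A pairs with T, C pairs with G.
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = frozenset("AG")      # R
PYRIMIDINES = frozenset("CT")  # Y


def reverse_complement(strand):
    """Return the antiparallel partner strand, written 5'->3'."""
    return "".join(WATSON_CRICK[base] for base in reversed(strand.upper()))


def base_pairs(strand):
    """List the base pairs along the duplex, given-strand base first."""
    return [base + WATSON_CRICK[base] for base in strand.upper()]


def is_purine_pyrimidine_pair(pair):
    """Check that a base pair joins one purine (R) with one pyrimidine (Y)."""
    return len(set(pair) & PURINES) == 1 and len(set(pair) & PYRIMIDINES) == 1
```

Because the two chains are antiparallel, the partner strand is obtained by complementing each base and reversing the order, which is why the helper reverses the input before writing it 5' to 3'.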
Despite the main building blocks being the same among DNA molecules, the DNA
structure is degenerate and dependent on its surrounding sequence and environment (el
Hassan and Calladine, 1996; Shakked and Rabinovich, 1986; Shakked et al., 1989;
Slattery et al., 2014). Thus, variations in the local DNA geometry and global structural
variations that arise from differences in sequence context can play a role in protein-DNA
recognition (Olson et al., 1998; Rohs et al., 2010). The parameters used to describe
local variations in DNA structure include intra-base-pair parameters (shear, buckle,
stretch, propeller twist, stagger and opening), inter-base-pair parameters (roll,
rise, slide, tilt, shift and twist) and helical parameters (helix twist, helix rise, x-displacement,
y-displacement, inclination and tip) (Lawson and Berman, 2008; Olson et al., 2001). A
few representations of these parameters are shown in Figure 1. Minor groove width is
also a key feature that has been observed to play a major role in protein-DNA recognition
(Rohs et al., 2009a).
Figure 1. Schematic representation of DNA parameters. For simplification, only a few parameters, including those
discussed in this work, are shown here.
Given the variations in these DNA conformations and parameters, differences in global
and local features of DNA that go beyond the chemical signatures of the bases can
ultimately affect protein-DNA recognition. For instance, narrowing of the minor
groove can enhance the negative electrostatic potential on the surface of the DNA, a
feature that can be read by proteins (Rohs et al., 2009a). In addition, DNA deformations
can vary in a sequence-dependent manner (Olson et al., 1998). Because AT base pairs
form two hydrogen bonds whereas GC base pairs form three, AT base pairs can show
greater variability in propeller twist, which could ultimately affect TF binding by
contributing to DNA flexibility and rigidity (Dror et al., 2016). Furthermore, partial
unstacking of flexible base-pair steps, such as the TpA step, can result in DNA kinks
(Rohs et al., 2010). Stacking interactions are stabilized by weak forces and depend on the
composition and sequence of the base pairs, with more stable stacking occurring i) for
dimers with GC base pairs than for those with AT base pairs and ii) for RpY steps than
for YpR steps (Koudelka et al., 2006). Therefore, nuances in the structural properties of
the double helix, in addition to the DNA sequence itself, can play a role in protein-DNA
binding specificity.
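As a small illustration of the step nomenclature used above, the following Python sketch classifies dinucleotide steps into purine/pyrimidine classes and locates TpA steps in a sequence. The helper names are hypothetical, and the sketch only labels steps; it does not compute stacking energies or flexibility.

```python
PURINES = frozenset("AG")


def step_class(step):
    """Classify a dinucleotide step as RpR, RpY, YpR or YpY."""
    return "p".join("R" if base in PURINES else "Y" for base in step.upper())


def find_steps(seq, target="TA"):
    """Return 0-based start positions of a given step (default: TpA)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == target]
```

For example, `step_class("TA")` labels the TpA step as a YpR step, the class described above as having less stable stacking than RpY.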
1.3 PROTEIN-DNA RECOGNITION MODES
Traditionally, protein-DNA interactions have been classified as direct or indirect readout
(Harrison and Aggarwal, 1990; Koudelka et al., 2006; Otwinowski et al., 1988; Seeman
et al., 1976). Because the term indirect readout is broadly defined, a more specific
classification into base (direct) and shape (indirect) readout was proposed (Rohs et al.,
2010). Base readout refers to the interactions between the specific chemical groups of
the bases and the protein residues (Rosenberg et al., 1973; Seeman et al., 1976), whereas
shape readout (Rohs et al., 2010) refers to recognition based on nuances in DNA
conformation (Rohs et al., 2009b; Travers, 1989). Figure 2 shows examples of how base
and shape readout can contribute to specificity, and the peculiarities of these readout
modes are discussed below.
Figure 2. Base and shape readout contribute to TF–DNA binding specificity. (A) Base readout describes direct
interactions between amino acids and the functional groups of the bases. Whereas the pattern of hydrogen bond
acceptors (red) and donors (blue), heterocyclic hydrogen atoms (white) and the hydrophobic methyl group (yellow) of a
Watson-Crick base pair is base pair-specific in the major groove, the pattern is degenerate in the minor groove. (B)
Shape readout includes any form of structural readout based on global and local DNA shape features, including
conformational flexibility and shape-dependent electrostatic potential. The DNA target of the IFN-β enhanceosome
(PDB ID 1t2k; top) (Panne et al., 2004) varies in minor groove shape. The human papillomavirus E2 protein binds to a
DNA binding site (PDB ID 1jj4; bottom) (Kim et al., 2000) with intrinsic curvature. Adapted from (Slattery et al.,
2014).
1.3.1 BASE READOUT
DNA base readout accounts for protein-DNA interactions in either the minor or the major
groove of the DNA, in which the chemical groups of amino acid residues recognize the
bases through hydrogen bonds, hydrophobic interactions or water-mediated hydrogen
bonds (Rohs et al., 2010; Rosenberg et al., 1973; Seeman et al., 1976). The major
groove edge of a Watson-Crick base pair in particular has a unique chemical signature
(Figure 2A), which allows protein residues to recognize specific bases. One
prominent example is the formation of bidentate hydrogen bonds in the major groove
between an arginine and a guanine (Coulocheri et al., 2007). For the protein p53, this
base readout mode specifies the C and G positions in the core “CWWG” of the p53
binding site (Chen et al., 2010). Proteins that recognize the major groove through
hydrogen bonds include homeodomains and zinc fingers, among others (Rohs
et al., 2010). Hydrophobic contacts provide another example of major groove base
readout and can be used to differentiate the pyrimidines (thymine and cytosine) at certain
positions (Harrison and Aggarwal, 1990). The DNA minor groove, on the other hand,
presents a less distinctive chemical signature: its hydrogen bond donors and acceptors do
not allow AT to be distinguished from TA, or CG from GC, base pairs (Figure 2A).
Therefore, structural features of the DNA can be used as an extra layer for achieving
binding specificity.
Traditionally, through the recognition of these specific chemical signatures, DNA binding
sites that are bound by proteins have often been described as consensus motifs (Stormo,
2000). The need for improved binding site representations became clear
with the recognition that regulatory sites presented some variability, which later led to the
emergence of position weight matrices (PWMs) (Stormo, 2000; Stormo et al., 1982). One
of the advantages of the PWM depiction is that each position within the DNA binding
site can be visualized as a DNA logo (Schneider and Stephens, 1990). However, because a PWM
considers only one position of the DNA binding site at a time, interdependencies between
neighboring bases and structural features are not taken into account, and single-position
motifs are usually not sufficient to fully explain specificity; in fact, it is now appreciated
that the interdependency between bases can play a key role in protein-DNA recognition
(Zhao et al., 2012; Zhou et al., 2015).
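The position-independence assumption behind a PWM can be made concrete with a minimal Python sketch. It builds a log-odds matrix from a handful of aligned sites and scores a sequence by summing per-position terms. The example sites, the pseudocount of 1, and the uniform background of 0.25 are illustrative assumptions, not data or parameters from the cited studies.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds} dict per position of the aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum independent per-position scores; no dinucleotide or shape terms."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Hypothetical aligned sites that follow a CWWG-like pattern (assumed, not real data)
sites = ["CATG", "CTTG", "CAAG", "CTAG"]
pwm = build_pwm(sites)
```

Because the score is a plain sum over positions, CATG and CTAG receive identical scores in this toy model whatever the true dependence between the two central bases might be, which is the limitation described above.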
1.3.2 SHAPE READOUT
DNA shape readout refers to the readout mode in which proteins recognize structural
features of the DNA. These features are deviations from standard
B-DNA, in either local parameters or global properties of the DNA (Rohs et al., 2010;
Travers, 1989). These variations include stacking interactions (Harrison and Aggarwal,
1990), groove width changes (Rohs et al., 2009a), kinks (Kim et al., 1990), bending
(Zhurkin et al., 2005), A-DNA (Lu et al., 2000) and Z-DNA (Wang et al., 1979). An
example of nuances in groove width and DNA bending is shown in Figure 2B. Because
deviations from standard B-DNA can affect protein-DNA recognition, a few of
them are summarized here. A well-established example of shape readout at the local level
is the recognition of a narrow region of the DNA minor groove, with enhanced negative
electrostatic potential, by arginine side chains of proteins (Rohs et al., 2009a). While
current research is still uncovering the contributions of shape readout for protein-DNA
recognition, recent studies have revealed this phenomenon to be widely observed for
various proteins including members of the Hox family (Joshi et al., 2007) and p53
(Kitayner et al., 2010). In the case of Hox proteins, shape readout has been observed to
allow for differentiation between binding sites of anterior and posterior members of the
family in the presence of their co-factors (Abe et al., 2015; Slattery et al., 2011). For the
tumor suppressor p53, the arginine residue (Arg248) that recognizes DNA shape has the
highest p53 mutation rate in cancer patients (IARC database), suggesting the importance
of DNA shape for protein-DNA recognition. DNA shape analysis of data from high-throughput
DNA binding assays also suggests that this mechanism is involved in DNA binding
specificity between paralogous TFs (Dror et al., 2014; Gordân et al., 2013). Even though
the main focus of our studies has been the electrostatic-dependent minor groove-mediated
readout mode, shape readout can present itself in different ways. For instance, the
presence of DNA kinks and consequent disruption of the helix linearity is another shape
readout mechanism. Such kinks can favor optimal conformations of protein-DNA or
protein-protein contacts, as in the case of EcoRV endonuclease DNA recognition where
the DNA kink is induced by a TpA step (Horton et al., 2002). Protein-DNA recognition
modes are reviewed in detail by Rohs et al. (2010).
1.3.3 INTERPLAY BETWEEN SHAPE AND BASE READOUT
Given the ever increasing number of structures deposited into the PDB (Berman et al.,
2000), structural studies have shed light on various modes by which proteins recognize
their DNA target sites (Harrison and Aggarwal, 1990; Luscombe et al., 2000; Pabo and
Sauer, 1992; Rohs et al., 2010). A prevalent observation among many structures is the
interplay between shape and base readout employed by proteins. The relative
contribution of each, though, varies according to the protein (Slattery et al.,
2014).
Taking the example of Hox proteins, co-factors might be necessary to achieve specificity
that goes beyond the recognition of the main helix in the major groove by positioning
residues involved in shape readout of the minor groove (Joshi et al., 2007) (Figure 3A).
In fact, while Hox monomers bind to their motif “TWAY” with similar specificities, the
binding of co-factors contributes to achieving their latent specificity (Slattery et al.,
2011), which is partially mediated by shape readout in the minor groove (Abe et al.,
2015).
Figure 3. Interplay of base and shape readout varies among TF families. (A) The heterodimer of the Hox homeodomain protein
Sex combs reduced (Scr; cyan; top and center) and its cofactor Extradenticle (Exd; magenta; top and center)
(PDB ID 2r5z) (Joshi et al., 2007) binds with its recognition helices through base readout to the major groove (blue box;
bottom), whereas arginine residues of the N-terminal Scr linker read minor groove shape and electrostatic potential as a
form of shape readout (pink box; bottom). (B) Upstream stimulating factor (USF) homodimer of the bHLH protein
family (green and pink; top and center) (PDB ID 1an4) (Ferré-D’Amaré et al., 1994) binds with its recognition helices
through base readout to the E-box core-binding site (blue box; bottom) and recognizes flanking sequences (pink box;
bottom) through extended linkers that connect the two α-helices of each USF monomer. (C) Human papillomavirus
(HPV) E2 homodimer (purple and chartreuse; top and center) (PDB ID 1jj4) (Kim et al., 2000) recognizes with its
recognition helices the half-sites of its binding site through base readout (blue box; bottom), whereas the intrinsic
curvature of the central spacer contributes to binding through shape readout (pink box; bottom). (D) Four DBDs of the
p53 tetramer (cyan, yellow, pink, and green; top and center) (PDB ID 3kz8) (Kitayner et al., 2010) bind to the major
groove through base readout (blue box; bottom), whereas the Arg248 residues recognize the minor groove through
shape readout (pink box; bottom). (E) Basic leucine zipper (bZIP) proteins c-Jun and ATF-2 TFs (cyan and magenta,
respectively; top and center) and helix-turn-helix (HTH) domains of interferon regulatory factors (IRF) of the IFN-β
enhanceosome (PDB ID 1t2k) (Panne et al., 2004) recognize the major groove through base readout (blue box; bottom),
whereas the IRF-3 TFs (green and yellow; top and center) also use their His40 residues to recognize the minor groove
through shape readout (pink box; bottom). Adapted from (Slattery et al., 2014).
Furthermore, proteins can also bind DNA as dimers and tetramers. The bHLH protein
upstream stimulating factor (USF) binds DNA as a dimer, where it reads the major
groove of DNA through its recognition helix and the flank sites through its linkers
(Figure 3B). Evidence that the linker interactions are important for binding specificity of
bHLH proteins was observed for the paralogous TFs Cbf1 and Tye7 (Gordân et al.,
2013). The human papillomavirus (HPV) E2 protein, on the other hand, in addition to
base readout through the recognition helix, utilizes shape readout through the recognition
of the DNA curvature (Figure 3C) (Rohs et al., 2005a). Monte Carlo simulations of the
binding sites and binding studies of the papillomavirus E2-DNA targets indicate that
changing the central bases of the DNA binding site, which are not contacted by the protein,
decreases the protein’s affinity for the DNA as a result of varying the intrinsic DNA bending
(Hines et al., 1998; Rohs et al., 2005a). The p53 protein recognizes DNA as a tetramer
and, as mentioned before, utilizes both shape and base readout as important modes of
recognition (Figure 3D). Finally, within the 3D context of the cell, multimeric complexes
can dictate various modes of recognition to achieve specificity, as seen for the
enhanceosome (Figure 3E).
Lastly, for some proteins, flanking regions and motif environment (Dror et al., 2015), as
well as non-consensus interactions, can play a role in TF binding. Interestingly, non-
consensus interactions might actually be as important as interactions that contribute to the
consensus motif, as recently described for bHLH and E2F proteins (Afek et al., 2014).
1.4 EXPERIMENTAL METHODS TO STUDY PROTEIN-DNA BINDING
Among the most established methods utilized in the past for studying protein-DNA
interactions were electrophoretic mobility shift assays (EMSA), DNase I hypersensitivity,
chromatin immunoprecipitation (ChIP), the bacterial one-hybrid system (B1H), and X-
ray crystallography (Carey et al., 2009, 2012; Luscombe et al., 2000; Meng et al., 2005).
While each one of these has its particular advantages and disadvantages and is still used
today, the emergence and availability of high-throughput (HT) sequencing has allowed
for the development of numerous methods that provide a wealth of data for
understanding protein-DNA interactions.
In vivo methods based on chromatin immunoprecipitation such as ChIP-seq (Mardis,
2007) have revealed the binding locations of many proteins within the genome
(ENCODE Project Consortium, 2012). The results usually encompass regions of
hundreds of base pairs to which algorithms for motif discovery need to be applied.
Although recent advances have allowed for higher resolution (Mahony and Pugh, 2015),
every experiment has its potential noise and bias, and the model accuracy is restricted by
other factors that occur in vivo (Stormo and Zhao, 2010). Understanding the inherent
DNA binding preferences of a protein, in addition to its in vivo binding profile, can provide a picture
of the intrinsic binding sites preferred by that protein and of how specificity is
achieved.
Many different methods exist for in vitro high-throughput analysis of protein-DNA
binding events, including Spec-seq (Stormo et al., 2014), Bind-n-seq (Zykovich et al.,
2009), Bundle-seq (Levo et al., 2015), SELEX-seq (Slattery et al., 2011), HT-SELEX
(Jolma and Taipale, 2011), protein binding microarray (PBM) (Berger and Bulyk, 2009)
and MITOMI (Rockel et al., 2012). Such methods have allowed for the identification of
specific recognition patterns that govern protein-DNA specificity, including one
interesting observation of covariation between homeodomain proteins and their DNA
binding sites (Dror et al., 2014). In this case, by analyzing in vitro data for
homeodomains, specific amino acids in the N-terminal tail were proposed to play a key
role in binding specificity through shape readout (Dror et al., 2014). This is also
supported by other in vivo and in vitro studies in which mutations of Hox residues in the
N-terminal tail lead to changes in binding specificity (Abe et al., 2015). High-throughput
studies of various TF binding datasets from in vitro experiments also revealed another
interesting finding – that by considering the interdependency between bases and adding
DNA shape to sequence-only models, binding specificity prediction for TFs can be
improved in a family-specific manner (Zhou et al., 2015).
Because high-throughput in vitro studies are well suited to identifying determinants of
protein-DNA specificity, and aiming for a deeper understanding of binding specificity,
some of the results presented in this dissertation took advantage of in vitro experiments
to dissect the contributions of base and shape readout.
1.4.1 PROTEIN BINDING MICROARRAY (PBM)
Protein binding microarrays have been used for the characterization of many protein
binding specificities (Badis et al., 2009; Berger et al., 2006; Rogers et al., 2015). The
traditional technique, universal PBM, uses a DNA microarray to which the protein of
interest is added. Binding to any location on the array is measured by the fluorescence
resulting from the binding of specific antibodies to the protein of interest (Berger and
Bulyk, 2009; Berger et al., 2006). Every 10mer appears at least once in the array, with
every 8mer appearing 32 times. Many insights into protein-DNA binding specificity have
been revealed by this approach, including changes in the DNA binding preferences of
Forkhead (Fkh) proteins during the course of evolution (Nakagawa et al., 2013), which will
be discussed later (Appendix B). A more recently developed version, genomic context
PBM (gcPBM), revealed that the in vitro and in vivo binding specificities of bHLH proteins
vary according to their genomic context, and that regions outside the core E-box motif
play a key role in specificity through DNA shape readout (Gordân et al., 2013).
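The k-mer coverage quoted above for universal PBMs is what a de Bruijn-style design would give. That such a design underlies the array is stated here as an assumption, since the text does not spell it out. The Python sketch below uses a smaller order (4 instead of 10) so that it runs instantly: every 4-mer occurs once in the cyclic sequence, every 2-mer occurs 4^2 = 16 times, and counting both strands doubles this to 32, mirroring the 10mer/8mer relationship.

```python
from collections import Counter

def de_bruijn(alphabet, n):
    """Cyclic de Bruijn sequence of order n (standard Lyndon-word construction)."""
    k = len(alphabet)
    a = [0] * (k * n)
    out = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)

def cyclic_kmer_counts(seq, k):
    """Count k-mers in seq, treating the sequence as circular."""
    extended = seq + seq[:k - 1]
    return Counter(extended[i:i + k] for i in range(len(seq)))

def revcomp(s):
    """Reverse complement of a DNA string."""
    return s[::-1].translate(str.maketrans("ACGT", "TGCA"))

order = 4                                   # stands in for order 10 on a real array
seq = de_bruijn("ACGT", order)
full = cyclic_kmer_counts(seq, order)       # each 4-mer occurs once
sub = cyclic_kmer_counts(seq, order - 2)    # each 2-mer occurs 16 times
both_strands = {m: sub[m] + sub[revcomp(m)] for m in sub}
```

On a physical array the sequence is split across many probes, so actual counts depend on how the probes overlap; the sketch only reproduces the combinatorial argument.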
1.4.2 SELEX-SEQ
Systematic evolution of ligands by exponential enrichment (SELEX), first described
many years ago (Oliphant et al., 1989; Tuerk and Gold, 1990), is a methodology used
to interrogate DNA binding sites. Also referred to as in vitro selection, it was one of the main
in vitro methods used to identify DNA binding sites. In the past, after isolation of the
bound fractions, the fragments were cloned and sequenced, which limited the number of
sequences studied. During the last decade, new technologies have been developed that
combine the power of HT-sequencing with selection of in vitro targets (Jolma et al.,
2010; Slattery et al., 2011).
The SELEX-seq methodology uses the basic concept of in vitro selection, which is that in
the presence of a ligand, higher-affinity binding sites become more enriched with each round
of selection (Wilson and Szostak, 1999). Starting from an oligonucleotide library, the
ligand of interest, in our case a purified protein, is introduced into the reaction. After
removal of the unbound fragments (for example, by gel electrophoresis), the bound
sequences are amplified by PCR and put through the next round of selection. After a few
rounds of selection, the binding sites with the highest affinities should be enriched. One
advantage of this method is that the EMSA step with selection of bound fragments
through gel electrophoresis also allows for isolation of specific fragments and the study
of protein complexes (Slattery et al., 2011). Binding sites of different lengths can be used
to study flank regions or the effect of other binding sites within neighboring regions. Due
to the complexity of the initial library and the deep sequencing of the recovered (bound)
fragments enabled by SELEX-seq, methods like this allow for estimation of binding
models and binding energies (Riley et al., 2014; Zhao et al., 2009). For SELEX-seq,
relative affinities of the protein to the selected sequences can be estimated throughout the
rounds by statistically modeling the multiple rounds of selection (Riley et al., 2014;
Slattery et al., 2011).
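One simple way to think about enrichment-based affinity estimates is sketched below in Python. It assumes that, after n rounds, the frequency of a sequence relative to the initial library scales with the n-th power of its relative affinity, so the n-th root of the enrichment, normalized to the best sequence, gives a relative affinity. This is a simplification for illustration only; the cited studies use more careful statistical models, and all counts below are invented.

```python
def relative_affinities(counts_r0, counts_rn, n_rounds):
    """Estimate relative affinities from k-mer counts before and after selection."""
    total0 = sum(counts_r0.values())
    totaln = sum(counts_rn.values())
    enrichment = {}
    for kmer, count_n in counts_rn.items():
        f0 = counts_r0[kmer] / total0      # frequency in the initial library
        fn = count_n / totaln              # frequency after n rounds
        enrichment[kmer] = (fn / f0) ** (1.0 / n_rounds)
    best = max(enrichment.values())
    return {kmer: value / best for kmer, value in enrichment.items()}

# Hypothetical k-mer counts (assumed for illustration, not real data)
r0 = {"TGATTTAT": 100, "TGATTGAT": 100, "ACGTACGT": 100}
r2 = {"TGATTTAT": 900, "TGATTGAT": 400, "ACGTACGT": 100}
aff = relative_affinities(r0, r2, n_rounds=2)
```

With these made-up counts the three sequences come out at relative affinities of 1, 2/3 and 1/3, because the per-round enrichment is the square root of the overall ninefold, fourfold and onefold changes.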
1.5 PROTEIN-DNA BINDING SITES PRESENTED IN THIS THESIS
Among the vast array of proteins that exist within the cell, a subset of them are DNA-
binding proteins that present a domain capable of DNA recognition. Based on the
recognition modes employed by proteins to interact with DNA, specific or non-specific
sequences can be selected. For instance, TFs, proteins that have been long-studied for
their ability to regulate gene expression by interacting with their transcription factor
binding sites (TFBSs), can recognize specific sequences of DNA in promoters and
enhancers of genes. On the other hand, structural proteins such as histones bind DNA at
less specific sequences.
Although gene expression control has many layers of regulation, the recognition of target
DNA binding sites is of crucial importance for its regulation. The binding of co-factors
and chromatin context can add additional layers of regulation, fine-tuning how proteins
discern their specific DNA binding sites (Slattery et al., 2014). Furthermore, variations
within the DNA binding sites can also impact the way proteins recognize their binding
sites (Ward and Kellis, 2012). Therefore, the effect of single nucleotide variants in the
context of DNA shape readout was also investigated here (Appendix A).
This work shows results from proteins present in different organisms and exerting various
functions in vivo: from replication to tumor suppression, the proteins studied have
expanded our comprehension of DNA shape as a protein readout mechanism and
challenged our views on how specificity can be achieved.
1.5.1 LARGE T ANTIGEN AND ORIGIN DNA
The Large T antigen (LTag) protein of the Simian virus 40 (SV40) is a model system to
study replication due to its dual function as both an initiator and a helicase, essential
processes for replication (Simmons, 2000). The protein contains 3 different domains: the
origin binding domain (OBD), the Zn domain, and the AAA+ domain (Figure 6A). The
target origin of replication DNA is composed of 64 bp, although the binding site presented
in this work will focus on only one half-site referred to as early palindrome (EP). The
LTag hexamer assembles on each half-site, which allows for DNA melting and
unwinding (Borowiec et al., 1990). However, in order to understand the contributions of
all 3 domains to origin recognition, a complex formed by a LTag dimer containing all 3
domains and the EP origin (ori) DNA was used for DNA analysis, which suggests a
shape readout mechanism not previously described.
1.5.2 FERRIC UPTAKE REGULATOR (FUR) AND ITS DNA SITES
The ferric uptake regulator (Fur) is a protein involved in regulating iron uptake and
homeostasis (Hantke, 2001). The protein is composed of an N-terminal DNA binding
domain and a C-terminal dimerization domain. Fur exerts its function by controlling the
transcription of genes involved in iron regulation, primarily by acting as a repressor once
bound to the DNA (Lee and Helmann, 2007). The study presented here focuses on
different DNA targets bound by Fur: the Fur box, which is a consensus sequence derived
from various Fur target sites (Escolar et al., 1998; Ochsner and Vasil, 1996), and
feoAB1, an Fe2+ transport protein operator. The analysis of DNA sites presented here refers
to the crystal structures as well as known Fur DNA binding sequences.
1.5.3 RNA POLYMERASE αCTD AND THE UPSTREAM REGION
For the RNA polymerase to exert its transcriptional function, it is recruited to promoter
regions. One of the domains, composed of two αCTD (C-terminal domain) units, is
responsible for recognition of an upstream region (UP). The presence of an AT-rich
sequence at this region can greatly affect transcription (Czarniecki et al., 1997; Estrem et
al., 1998; Ross et al., 1998). By studying crystal structures of αCTDs and their
interactions with DNA sites containing an A-tract of length six (A6, full A-tract) or five
(A5, “knockout” A-tract), we suggest the presence of a shape readout mechanism whereby
the assembly of one subunit is impaired when the full A-tract is disrupted.
1.5.4 P53 AND ITS RESPONSE ELEMENTS (RES)
p53, also known as the guardian of the genome, is a transcription factor involved in cell
cycle arrest, apoptosis and DNA repair (Vousden and Prives, 2009). p53 binds as a tetramer
(Chen et al., 2010) to two decameric half-sites of consensus RRRCWWGYYY
(El-Deiry et al., 1992; Funk et al., 1992), although some binding sites present a variable
spacer region between the two half-sites (Smeenk et al., 2008; Wei et al., 2006). This spacer
region is usually absent in high-affinity response elements (REs) of genes involved
in cell cycle arrest, while variable spacers are seen in low-affinity binding sites of
elements involved in apoptosis (Weinberg et al., 2005). The investigation of p53 DNA
binding sites in this work highlights the role of intrinsic structural properties of the DNA
for protein binding as well as the DNA conformational changes induced upon protein
binding.
1.5.5 DNASE I ENDONUCLEASE AND DNA CLEAVAGE RATE
The endonuclease DNase I has been used in molecular biology in a number of studies
(Galas and Schmitz, 1978; Hogan et al., 1989; John et al., 2011; Sabo et al., 2004). Unlike
most TFs, though, DNase I does not recognize a specific DNA sequence
motif. The binding preferences of this enzyme have long been studied (Brukner et al.,
1995; Suck et al., 1984), and DNase I has been previously reported to interact with the
DNA minor groove (Suck et al., 1988). Here, by coupling the study of DNase I cleavage
rates with methylome information, we were able to highlight the importance of DNA
structural features for DNase I recognition, as reflected in cleavage rate, and the impact of
methylation on DNA shape.
1.5.6 THE MYOCYTE ENHANCER FACTOR 2 (MEF2) AND ITS DNA TARGET SITES
Proteins of the MEF2 family are composed of a DNA binding domain (DBD) and a
transactivation domain. In humans, four members exist (MEF2A, MEF2B, MEF2C and
MEF2D), while flies and yeast present only one member (Potthoff and Olson, 2007). The
DBD contains a MADS domain and a MEF2 domain which are highly conserved not
only among the family members but also across species (Potthoff and Olson, 2007). In
fact, the MADS box is well conserved among different proteins (Wu et al., 2011),
suggesting its importance for protein-DNA recognition. MEF2 members recognize their
DNA binding sites as dimers, with contacts in both major and minor grooves of the DNA.
In humans, MEF2s have been implicated in a number of differentiation and signaling
processes, including muscle development and the regulation of neuronal cells, endothelial cells
and lymphocytes, among others (Potthoff and Olson, 2007). Thus, it is not surprising that mutations in
members of the MEF2 family lead to a wide range of conditions including
neurological disorders (Rocha et al., 2016; Zweier et al., 2010), coronary artery disease
(Oishi et al., 2010; Wang et al., 2003) and cancer (Morin et al., 2016; Ying et al., 2013;
Zhang et al., 2013). By inspecting crystal structures of MEF2 in complex with DNA and
analyzing high-throughput DNA binding data, the role of shape readout for MEF2B is
being revealed.
1.6 OVERVIEW
This dissertation presents results from analysis of protein-DNA interactions and explores
how these contacts can play a role in achieving DNA binding specificity - at the atomic
level, with analysis of crystal structures, and at the genomic level, with analysis of high-
throughput DNA binding assays. By combining computational studies with experimental
results, the analysis presented in this work highlights key aspects of protein-DNA
recognition and sheds light on future questions in the field. This dissertation is a
compilation of my Ph.D. studies, many of which involved collaborative projects. Since
not all projects are described in detail, Table 1 presents a list of all papers
that I co-authored during my Ph.D. studies.
Table 1. List of published work
Study subject Published work citations
DNA shape readout in the minor groove
Chen, Y., Bates, D.L., Dey, R., Chen, P.-H., Dantas Machado, A.C., Laird-Offringa, I.A., Rohs, R., and Chen, L.
(2012). DNA Binding by GATA Transcription Factor Suggests Mechanisms of DNA Looping and Long-Range
Gene Regulation. Cell Rep. 2, 1197–1206.
Chang, Y.P., Xu, M., Dantas Machado, A.C., Yu, X.J., Rohs, R., and Chen, X.S. (2013). Mechanism of Origin
DNA Recognition and Assembly of an Initiator-Helicase Complex by SV40 Large Tumor Antigen. Cell Rep. 3,
1117–1127.
Chen, Y., Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Z., Qin, P.Z., Rohs, R., and Chen, L. (2013).
Structure of p53 binding to the BAX response element reveals DNA unwinding and compression to accommodate
base-pair insertion. Nucleic Acids Res. 41, 8368–8376.
Deng, Z., Wang, Q., Liu, Z., Zhang, M., Dantas Machado, A.C., Chiu, T.-P., Feng, C., Zhang, Q., Yu, L., Qi, L., et
al. (2015). Mechanistic insights into metal ion activation and operator recognition by the ferric uptake regulator.
Nat. Commun. 6, 7642.
Lara-Gonzalez, S., Dantas Machado, A.C., Napoli, A.A., Birktoft, J., Rohs, R., and Lawson, C.L. RNA polymerase
recognizes the DNA shape of the Upstream Promoter element (in preparation).
Methods to analyze DNA shape/structure
Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Y., Lu, Y., Duan, Y., Tham, K.W., Chen, L., Rohs, R., and Qin,
P.Z. (2014). Conformations of p53 response elements in solution deduced using site-directed spin labeling and
Monte Carlo sampling. Nucleic Acids Res. 42, 2789–2797.
Zhou, T., Yang, L., Lu, Y., Dror, I., Dantas Machado, A.C., Ghane, T., Felice, R.D., and Rohs, R. (2013).
DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic
Acids Res. 41, W56–W62.
DNA shape contributions from high-throughput studies
Lazarovici, A., Zhou, T., Shafer, A., Dantas Machado, A.C., Riley, T.R., Sandstrom, R., Sabo, P.J., Lu, Y., Rohs,
R., Stamatoyannopoulos, J.A., et al. (2013). Probing DNA shape and methylation state on a genomic scale with
DNase I. Proc. Natl. Acad. Sci.
Levo, M., Zalckvar, E., Sharon, E., Dantas Machado, A.C., Kalma, Y., Lotam-Pompan, M., Weinberger, A.,
Yakhini, Z., Rohs, R., and Segal, E. (2015). Unraveling determinants of transcription factor binding outside the
core binding site. Genome Res. 25, 1018–1029.
Others
Slattery, M., Zhou, T., Yang, L., Dantas Machado, A.C., Gordân, R., and Rohs, R. (2014). Absence of a simple
code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399.
Dantas Machado, A.C., Zhou, T., Rao, S., Goel, P., Rastogi, C., Lazarovici, A., Bussemaker, H.J., and Rohs, R.
(2015). Evolving insights on how cytosine methylation affects protein-DNA binding. Brief. Funct. Genomics 14,
61–73.
Educational/Book Chapters/News and Views
Dantas Machado, A.C., Saleebyan, S.B., Holmes, B.T., Karelina, M., Tam, J., Kim, S.Y., Kim, K.H., Dror, I.,
Hodis, E., Martz, E., et al. (2012). Proteopedia: 3D visualization and annotation of transcription factor–DNA
readout modes. Biochem. Mol. Biol. Educ. 40, 400–401.
Harris, R.C., Mackoy, T., Dantas Machado, A.C., Xu, D., Rohs, R., and Fenley, M.O. (2012). Chapter 3: Opposites
attract: Shape and electrostatic complementarity in protein-DNA complexes. In Innovations in Biomolecular
Modeling and Simulation. Volume 2, pp. 53-80 (2012). Biomolecular Sciences Series, Royal Society of Chemistry,
RSC Publishing, London, UK.
Rohs, R., Dantas Machado, A.C., and Yang, L. (2015). Exposing the secrets of sex determination. Nat. Struct. Mol.
Biol. 22, 437–438.
2 METHODS
2.1 COMPUTATIONAL ANALYSIS OF THE BOUND DNA STRUCTURE
2.1.1 GENERAL COMPUTATIONAL METHODOLOGY FOR DNA STRUCTURE ANALYSIS
DNA structures presented in this work were analyzed with Curves (Lavery and Sklenar,
1989). The minor groove width was calculated by averaging over all levels assigned to each
nucleotide. Other DNA parameters were analyzed in a case-specific manner described
below. To calculate the electrostatic potential of the DNA, the non-linear Poisson-
Boltzmann equation was solved at a physiological ionic strength of 0.145 M (Honig and
Nicholls, 1995). The electrostatic potential was calculated at reference points located in
the center of the minor groove, at a middle point between the O4′ atom and the closest
sugar moieties across the minor groove, as previously described (Rohs et al., 2009a).
Minor groove width and electrostatic potential were plotted as a function of DNA
sequence and visualized on the DNA molecular surface using GRASP2 (Petrey and
Honig, 2003). A complete list of DNA structures obtained from protein-DNA complexes
and analyzed under the above protocol can be found in Table 2. PyMol (DeLano, 2002)
was used to generate representations of the crystal structures analyzed in this work.
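The per-nucleotide averaging of minor groove width described above can be sketched in Python as follows. The input layout (a mapping from nucleotide position to the list of width levels reported for it) is a hypothetical convenience, not the actual Curves output format, and the numbers are invented.

```python
from statistics import mean

def average_groove_width(levels_by_position):
    """Map each nucleotide position to the mean of its groove-width levels."""
    return {
        position: mean(widths)
        for position, widths in sorted(levels_by_position.items())
        if widths  # skip positions for which no level was reported
    }

# Hypothetical widths in angstroms; real values would be parsed from Curves output
levels = {1: [5.9, 5.7], 2: [4.8, 4.6, 4.4], 3: [3.9]}
mgw = average_groove_width(levels)
```

The resulting per-position values are what would then be plotted as a function of DNA sequence.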
The PDB was queried to investigate DNA shape readout among crystal structures of
protein-DNA complexes. In brief, crystal structures were screened based on X-ray
resolution (< 2.5 Å), DNA length (10 bps), and the presence of arginine or lysine contacts
in the minor groove as defined previously (Rohs et al., 2009a).
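The screening criteria above amount to a simple filter over candidate structures, sketched here in Python. The record fields and example entries are hypothetical placeholders, and reading the DNA length criterion as a minimum of 10 bp is an assumption, since the text gives only "10 bps". Deciding whether a contact counts as a minor groove contact follows the cited definition and is not reimplemented here.

```python
from dataclasses import dataclass

@dataclass
class Complex:
    label: str                    # placeholder name, not a real PDB ID
    resolution: float             # X-ray resolution in angstroms
    dna_length: int               # DNA length in base pairs
    minor_groove_arg_lys: bool    # Arg/Lys contact in the minor groove?

def passes_screen(c, max_resolution=2.5, min_dna_length=10):
    """Apply the three screening criteria; the length cutoff is assumed to be a minimum."""
    return (
        c.resolution < max_resolution
        and c.dna_length >= min_dna_length
        and c.minor_groove_arg_lys
    )

# Invented records for illustration only
candidates = [
    Complex("demo_a", 2.1, 14, True),
    Complex("demo_b", 2.8, 16, True),    # fails the resolution cutoff
    Complex("demo_c", 1.9, 8, True),     # DNA too short
    Complex("demo_d", 2.0, 12, False),   # no Arg/Lys minor groove contact
]
selected = [c for c in candidates if passes_screen(c)]
```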
Table 2. PDB IDs and primary citations of protein-DNA complexes analyzed in this work.
Study Protein-DNA complex, PDB IDs and primary structure citation
Shape readout by histidines
Large T antigen: PDB ID 4GDF (Chang et al., 2013);
Enhanceosome: PDB IDs 1T2K (Panne et al., 2004), 2PIO (Escalante et al., 2007) and 2O61 (Panne
et al., 2007).
Shape readout by lysines
Ferric uptake regulator: PDB IDs 4RB1 (Fur-Box DNA) and 4RB3 (feoAB1 operator) (Deng et al.,
2015);
Glucocorticoid receptor: PDB ID 3FYL (Meijsing et al., 2009);
DnaA: PDB ID 3PVP (Tsodikov and Biswas, 2011);
Tet3: PDB ID 4HP3 (Xu et al., 2012).
Shape readout by arginines
RNA polymerase: PDB IDs 3N4M (CAP-αCTD-DNA complex “CAD”), 3N97 (αCTD-σR4-DNA
complex “ASD”) and 5CIZ (“CAD” knockout with shortened A-tract) (Lara-Gonzalez et al., to be
published);
DNase I: 2DNJ (Lahm and Suck, 1991).
Conformational changes upon p53 binding
p53: 3KMD (Chen et al., 2010); 4HJE (Chen et al., 2013); 3TS8 (Emamzadah et al., 2011).
Shape readout by MEF2
MEF2A and MEF2B: 1EGW (Santelli and Richmond, 2000), 1N6J (Han et al., 2003), 1TQE (Han et
al., 2005), 3KOV (Wu et al., 2010), 3P57 (He et al., 2011), 3MU6 (Jayathilaka et al., 2012).
2.1.2 COMPUTATIONAL ANALYSIS OF P53 BAX AND 3KMD
For the study of p53 binding sites, two DNA sequences, here referred to as BAX and
3KMD (PDB IDs 4HJE (Chen et al., 2013) and 3KMD (Chen et al., 2010), respectively),
were analyzed. To identify differences in DNA structural features between these two
sequences, helix axis, helix diameter, helix twist, and bending were analyzed in addition
to minor groove width. To analyze DNA unwinding, the sum of helix twist at a specific
region was calculated. By aligning BAX and 3KMD sequences based on the half-site
positions, the difference in total helix twist was then analyzed and compared between the
2 sequences.
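The unwinding comparison described above reduces to summing helix twist over a region in each structure and taking the difference. A minimal Python sketch follows; the twist values and region boundaries are invented, and each region is passed separately because the two aligned regions need not contain the same number of base-pair steps.

```python
def total_twist(twists, start, end):
    """Sum helix twist (degrees) over base-pair steps start..end-1."""
    return sum(twists[start:end])

def twist_difference(twists_a, region_a, twists_b, region_b):
    """Total twist of region_a in structure A minus region_b in structure B."""
    return total_twist(twists_a, *region_a) - total_twist(twists_b, *region_b)

# Hypothetical per-step twist values in degrees (not taken from 4HJE or 3KMD)
twists_a = [36, 34, 30, 32]
twists_b = [36, 36, 35, 35]
delta = twist_difference(twists_a, (0, 4), twists_b, (0, 4))
```

A negative difference indicates that the first region is unwound relative to the second.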
2.2 PREDICTION OF UNBOUND DNA STRUCTURE
2.2.1 MONTE CARLO SIMULATION TO PREDICT THE UNBOUND DNA STRUCTURE
All-atom Monte Carlo (MC) simulation was used to predict the unbound structure of the
DNA (Joshi et al., 2007; Rohs et al., 2005a). In brief, the MC protocol used the AMBER
force field, an implicit aqueous solvent model, and explicit sodium counterions. The
simulations start with an ideal B-DNA presenting the sequence of interest, and fixed
dinucleotide conformations at the starting point. The final predicted unbound DNA
structure was obtained from 3 independent simulations, each comprising 2 million MC
cycles; the first 500,000 cycles were treated as equilibration and discarded from
further analysis unless otherwise noted. MC simulations were performed to investigate
the intrinsic structural features of DNA sequences present in the following complexes:
LTag, enhanceosome, Fur, αCTD of RNA polymerase, and MEF2B. The MC simulations
followed the above protocol unless otherwise stated below.
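The handling of equilibration in the protocol above can be illustrated with a scaled-down Python sketch: each run drops its first quarter (matching 500,000 of 2 million cycles) before an observable is averaged. How the three simulations were combined is not stated in the text, so pooling the retained cycles is an assumption, and the trajectories below are synthetic.

```python
from statistics import mean

def pooled_average(runs, burn_in):
    """Average a per-cycle observable over all runs after dropping the burn-in."""
    kept = [value for run in runs for value in run[burn_in:]]
    return mean(kept)

# Synthetic trajectories: 20 recorded values per run, the first 5 are equilibration
runs = [
    [10] * 5 + [5] * 15,
    [9] * 5 + [6] * 15,
    [8] * 5 + [4] * 15,
]
average = pooled_average(runs, burn_in=5)
```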
2.2.2 PREDICTION OF UNMETHYLATED AND METHYLATED DNA FOR DNASE I STUDIES
A modified program was used in which the MC sampling methodology was expanded by
adding one internal degree of freedom, the rotation of the cytosine 5-
methyl group, which we implemented in analogy to the rotation of the thymine methyl
group (Rohs et al., 2005a, 2005b). Partial charges for 5-methylcytosine were taken from a
set of AMBER force field parameters derived for chemically modified nucleotides (Aduri
et al., 2007). To predict MGW and roll of unbound DNA targets, we performed all-atom
MC simulations for all 256 possible hexamers of the form NNN|CGN, and the 256
corresponding sequences with 5-methylcytosines at both base pairs of the CpG
dinucleotide at positions +1 and +2. We performed one MC simulation for each of the
unmethylated and methylated forms of these 256 hexamers after adding a CGCG flank on
both sides. To evaluate end-effects for the hexamer ACTCGA, whose DNase I cleavage
rate increased approximately eightfold upon methylation, we performed nine independent MC
simulations for this sequence, three for each of the three different tetramer flanks
(CGCG-N6-CGCG, CGTA-N6-TACG, and CATG-N6-CATG). These runs were
performed for the unmethylated and CpG methylated variants of the ACT|CGA hexamer.
MC predictions yield ensembles of all-atom structures, which we analyzed for structural
changes upon methylation. MGW and roll showed the most significant effects upon CpG
methylation. For the ACTCGA sequence, we computed MGW and roll as the median
over the predictions with the three different flanking sequences (Figure 14B).
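The set of simulated sequences can be enumerated in a few lines. The sketch below only builds the unmethylated sequence strings; how 5-methylcytosine is represented in the MC program is not modeled here, and the function and variable names are illustrative.

```python
from itertools import product

BASES = "ACGT"
FLANK = "CGCG"  # flank added on both sides, as stated in the text

def cpg_hexamers():
    """Return all hexamers of the form NNN|CGN (cleavage site before the CpG)."""
    return ["".join(left) + "CG" + right
            for left in product(BASES, repeat=3)
            for right in BASES]

hexamers = cpg_hexamers()
# Full simulated sequences: flank + hexamer + flank
targets = [FLANK + h + FLANK for h in hexamers]
```

Three free positions upstream and one downstream of the CpG give 4^3 x 4 = 256 hexamers, which matches the number quoted above.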
2.2.3 GENERATION OF DNA MODELS FOR IN SOLUTION DEDUCTION OF P53
CONFORMATIONS
For the combined approach using site-directed spin labeling and MC sampling to deduce
in-solution conformations of p53, MC sampling was performed for >1 million cycles with
random conformational changes of all degrees of freedom in each cycle. All-atom
structures were recorded every 100 MC cycles, forming an MC ensemble of 10,000
structures for each simulation.
2.3 ESTIMATION OF UNBOUND MINOR GROOVE WIDTH THROUGH ORCHID2
The hydroxyl radical cleavage pattern derived from ORChID2 (Bishop et al., 2011), as
implemented in GBshape (Chiu et al., 2015), was used as a measure of the minor groove
width of the unbound Fur DNA.
2.4 PREDICTION OF PROTEIN-DNA BINDING ENERGY USING ROSETTA
We used a new version of the Rosetta molecular modeling software package for
protein-DNA interactions, provided by Dr. Phil Bradley (Fred Hutchinson Cancer Research
Center and University of Washington). The model is based on high-resolution
structural modeling. This version allows for full backbone flexibility while applying
constraints to the Cα and phosphate atoms. By inputting secondary structure
information, a kinematic setup is created using a fold-tree representation so that
backbone atoms are allowed a greater degree of freedom. For each decoy
generated during sampling, multiple energies are reported, including energies for the
unbound components. Energies of the relaxed components allow for more precise binding
energy calculations and for distinguishing mutation-induced strain caused by
inter-protein interactions from strain caused by protein-DNA interactions.
2.5 HIGH-THROUGHPUT DATA ANALYZED
Datasets containing DNA binding sites originating from protein-DNA binding studies
were obtained from different sources. ChIP-seq datasets for MEF2A and MEF2C were
obtained from the ENCODE project, release 3 (GEO accession numbers: GSM803511
and GSM803420). HT-SELEX data for MEF2D was obtained from Jolma et al. (2013)
and pre-processed by L. Yang & Y. Orenstein. PBM data for Forkhead proteins was
obtained from Nakagawa et al. (2013). Table 3 summarizes the datasets that were analyzed
here. SELEX-seq data for MEF2B was generated according to the experimental procedures in section 2.6.
Table 3. Datasets used in this work.
Study | Dataset | Reference
DNase I | (Lazarovici et al., 2013) | (Lazarovici et al., 2013)
ChIP-seq of MEF2A and MEF2C | ENCODE project, ChIP-seq data | GEO: GSM803511 and GSM803420
Forkhead | PBM data for Forkhead | (Nakagawa et al., 2013)
HT-SELEX | HT-SELEX MEF2D | (Jolma et al., 2013)
SNP data analysis for Drosophila sp. | enhancer data; ChIP-seq data | (Kvon et al., 2014; MacArthur et al., 2009)
2.6 EXPERIMENTAL PROCEDURES FOR SELEX-SEQ
The procedures used for MEF2B SELEX-seq experiments were based on previously
described protocols (Riley et al., 2014; Slattery et al., 2011) and are summarized below.
Figure 4 shows an overall representation of experimental procedures.
Figure 4. Overview of SELEX-seq protocol. The 60-base-long ssDNA containing a random region of 16 bases is
converted into dsDNA by a Klenow reaction with oligo SR1. A 60bp dsDNA library is used for EMSA in the presence
of the protein of interest. The higher band corresponding to the complex is isolated from the gel and the DNA is eluted.
After ethanol precipitation, the isolated DNA is amplified by PCR. From this step, the product can either be prepared
for sequencing or undergo a new round of selection.
2.6.1 PROTEIN EXPRESSION AND PURIFICATION
pET30-b(+) vectors containing cloned wild-type (wt) human MEF2B (residues 1-93)
with and without a HIS-tag (LVPRGSKLAAALEHHHHHH) were obtained from Dr. Lin
Chen’s lab at USC. The HIS-tagged construct was used as the template for site-directed
mutagenesis, performed as described in the QuikChange II kit manual (Agilent).
Oligonucleotides used for site-directed mutagenesis are shown in Table 4. Following
transformation and incubation, plasmids from individual colonies were isolated using the
Qiagen miniprep kit (Qiagen), and their identities were confirmed by Sanger sequencing.
Table 4. Oligonucleotides used for site-directed mutagenesis.
Sequence Name Sequence
M2BK5E_FW CATATGGGGCGTAAAGAAATCCAGATCTCCCG
M2BK5E_RV CGGGAGATCTGGATTTCTTTACGCCCCATATG
M2BR24Q_FW GTGACGTTCACCAAGCAGAAGTTCGGGCTGATG
M2BR24Q_RV CATCAGCCCGAACTTCTGCTTGGTGAACGTCAC
M2BK04E_FW GATATACATATGGGGCGTGAAAAAATCCAGATCTCCCG
M2BK04E_RV CGGGAGATCTGGATTTTTTCACGCCCCATATGTATATC
M2BR03A_FW GAAGGAGATATACATATGGGGGCTAAAAAAATCCAGATCTCCC
M2BR03A_RV GGGAGATCTGGATTTTTTTAGCCCCCATATGTATATCTCCTTC
M2BR15G_FW CATCCTGGACCAAGGCAATCGGCAGGTGAC
M2BR15G_RV GTCACCTGCCGATTGCCTTGGTCCAGGATG
M2BK23R_FW GCAGGTGACGTTCACCAGACGGAAGTTCGGGCTGATG
M2BK23R_RV CATCAGCCCGAACTTCCGTCTGGTGAACGTCACCTGC
Protein expression was carried out using Escherichia coli Rosetta™(DE3)pLysS cells
(Novagen). To this end, bacteria were inoculated in 2xYT media in the presence of
50µg/ml kanamycin and 34µg/ml chloramphenicol and incubated in a shaker at 37°C. Once
culture density reached OD600 ~ 0.6, protein expression was induced by the addition of
500mM IPTG followed by overnight incubation in a shaker at 220rpm and 23°C. Samples from before
and after induction were collected to check protein induction. Cells were collected by
centrifugation for 15min at 6000 x g. Protein purification from cell lysate was achieved
using either SP Sepharose (GE) (Wu et al., 2010) or Ni-NTA agarose (Qiagen). Cell lysis
was performed by sonication in buffer consisting of 50mM HEPES pH 7.6, 300mM
NaCl, 1mM EDTA, and 1mM DTT unless otherwise noted. For the Ni-NTA method, EDTA
was omitted and imidazole was added at 20mM. Ultracentrifugation was used to
separate the soluble and insoluble fractions of the cell lysate, which were analyzed
by SDS-PAGE. Purification of HIS-tagged proteins was performed according to
the manufacturer’s instructions, with binding, washing, and elution buffers containing 20mM,
30mM and 150mM imidazole, respectively. Protein was stored in 250mM NaCl, 10mM
HEPES pH 7.6, 1mM EDTA and 1mM DTT.
2.6.2 OLIGONUCLEOTIDES
All oligonucleotides were purchased from Integrated DNA Technologies (Coralville,
IA). For binding assays, dsDNA (Table 5) was prepared by annealing complementary
strands at 95°C in annealing buffer (IDT), followed by cooling to room temperature (RT).
Table 5. Oligonucleotides used for SELEX-seq.
Oligo id Oligo Sequence (5' - 3')
Selex-Lib GAGTTCTACAGTCCGACGATCCGC[N 16]CCTGGAATTCTCGGGTGCCA
SR1 TGGCACCCGAGAATTCCA
SF1 GAGTTCTACAGTCCGACGAT
SR1-FAM /56-FAM/TGGCACCCGAGAATTCCA
RP1 AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA
RPI# CAAGCAGAAGACGGCATACGAGAT[Index]GTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
PCTRL#1 GAGTTCTACAGTCCGACGATCCGCATCTTATAAATAGTCTCCTGGAATTCTCGGGTGCCA
NCTRL#1 GAGTTCTACAGTCCGACGATCCGCATGGTGTGGCTGGTCTCCTGGAATTCTCGGGTGCCA
PCTRL#2 ACAGTCCGACGAGTCCGCATCTTATAAATAGTCTCGAGCACGAGAGTCCCCGTGAGGCGG
NCTRL#2 ACAGTCCGACGAGTCCGCATCGTGTGGCTCGAGGCGAGCACGAGAGTCCCCGTGAGGCGG
2.6.3 SELEX-SEQ LIBRARY GENERATION
An oligonucleotide library “Selex-Lib” containing a random region of 16 nucleotides and
flanking regions compatible with Illumina sequencing was purchased from IDT with the
hand-mix option. A complementary “SR1” oligonucleotide was annealed to the Selex-Lib as
follows: 6µl of 100µM SR1 and 3µl of 100µM Selex-Lib were mixed in the presence of
3µl 10X STE buffer (10mM Tris pH 8.0, 1M NaCl, 1mM EDTA) and incubated at 94°C
for 5 min in a final 30µl reaction. The reaction was allowed to cool down to RT. A
Klenow reaction was then performed to fully extend the complementary strand using
10µl of the previous product in a 25µl final reaction containing 1X NEB buffer 2, 0.8mM
dNTPs, and 25 units of DNA polymerase I, Large (Klenow) fragment. After incubation for
30min, 0.5µl of 0.5M EDTA was added to stop the reaction. The product was then
run on a 2% agarose gel and gel-purified using the MinElute kit (Qiagen) according
to the manufacturer’s protocol. A similar protocol was used for generation of 5’
6-FAM (fluorescein)-labeled SELEX controls. DNA concentration was measured by
NanoDrop.
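The library design can be checked directly from the sequences in Table 5: the library strand should be 60 bases long, SR1 should be complementary to its 3' end (so that Klenow extension copies the random region), and SF1 should match its 5' end. The sketch below is a plain string check; the variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequences copied from Table 5
LEFT_FLANK = "GAGTTCTACAGTCCGACGATCCGC"
RIGHT_FLANK = "CCTGGAATTCTCGGGTGCCA"
SR1 = "TGGCACCCGAGAATTCCA"
SF1 = "GAGTTCTACAGTCCGACGAT"

# Library strand with the 16-nt random region written as N
selex_lib = LEFT_FLANK + "N" * 16 + RIGHT_FLANK

# SR1 pairs with the 3' end of the library strand
sr1_anneals_at_3prime_end = selex_lib.endswith(reverse_complement(SR1))
# SF1 is identical to the 5' end of the library strand
sf1_matches_5prime_end = selex_lib.startswith(SF1)
```

Both checks hold for the sequences as listed, and the assembled library strand has the stated length of 60 bases.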
2.6.4 ELECTROPHORETIC MOBILITY SHIFT ASSAY (EMSA)
Polyacrylamide gels were hand-cast (8%, 37.5:1 acrylamide/bis-acrylamide, Biorad)
and pre-run for 20min at 150V. Native gel electrophoresis was performed for
40-70min, depending on DNA fragment size, at 150V using 0.5X TBE at 4°C. All
polyacrylamide gel electrophoresis (PAGE) was carried out on a mini-PROTEAN
apparatus system (Biorad). For the initial assessment of protein-DNA binding, reactions
were carried out in 10mM HEPES (pH 7.6), 250mM NaCl, 1mM EDTA, 2.5% glycerol.
Binding reactions for SELEX-seq validation were carried out in 10mM HEPES (pH 7.6),
1mM DTT, 1mM EDTA, 5% glycerol, and variable concentrations of NaCl (50mM-
250mM), 0.05mg/ml BSA, and poly(dI-dC). Control probes of the same size were used
for validation, where the random region was replaced by a favorable (positive control,
here designated as PCTRL) or unfavorable (negative control, designated as NCTRL)
binding site for MEF2B. These controls presented either the same flank region (CTRL#1)
or a mutated flank region (CTRL#2) when compared to the random library. The DNA
concentration was kept constant, while a protein serial dilution was
made in binding buffer from previously aliquoted samples.
Binding reactions for SELEX-seq were performed in 10mM HEPES (pH 7.6), 150mM
NaCl, 0.5mM EDTA, 0.5mM DTT, 0.05mg/ml BSA, 5% glycerol. A final solution of
30µl in binding buffer containing approximately 200nM of Selex-library DNA and 40nM
protein was incubated for 20min at RT. A parallel reaction using a 5’ 6-FAM-labeled
probe was used as control. Gels were imaged on a PharosFX™ (Biorad) using either
SYBR® staining (Thermo Fisher Scientific) or the fluorescent probe (EMSA for
SELEX-seq).
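As a back-of-the-envelope check on the SELEX-seq binding reaction, the quantities above can be converted into molecule counts and compared with the number of distinct 16-mers. This is a rough sketch that assumes the nominal concentrations are exact and that the library is uniformly random; the function name is illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules(conc_nanomolar, volume_microliters):
    """Number of molecules in a solution of the given concentration and volume."""
    moles = conc_nanomolar * 1e-9 * volume_microliters * 1e-6
    return moles * AVOGADRO

dna_molecules = molecules(200, 30)      # ~200 nM library in 30 µl
protein_molecules = molecules(40, 30)   # 40 nM protein in 30 µl
possible_16mers = 4 ** 16               # distinct random regions
copies_per_sequence = dna_molecules / possible_16mers
dna_to_protein = dna_molecules / protein_molecules
```

Under these assumptions the DNA is in fivefold molar excess over protein, and each possible 16-mer is expected to be present in several hundred copies, so the sequences compete for a limiting amount of protein.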
2.6.5 GENERATION OF SELECTED LIBRARY FOR ROUNDS OF SELECTION AND SEQUENCING
Bound fractions were isolated as described previously (Riley et al., 2014), with the
exception that visualization was based on fluorescence of the control probe. Polymerase
chain reaction (PCR) was then used to amplify the isolated DNA. Five reactions, each
containing 0.4mM of primers SR1 and SF1, 0.2mM dNTP, 2.5 U of Taq polymerase
(NEB), and a maximum of 2µl of isolated DNA in 1X Taq buffer, were used. PCR
conditions were 95°C initial denaturation, 15-17 cycles of amplification (denaturation for
30s at 95°C, annealing for 15s at 55°C, extension for 20s at 68°C), with final extension
for 5min at 68°C. After PCR purification with the MinElute kit (Qiagen), the sample was
quantified by NanoDrop. This product was then used for the next round of selection or for
Illumina sequencing library preparation. For the next round of selection, all procedures
from binding reaction to purification of PCR product were repeated. For preparing the
sequencing library, a low-cycle PCR using Phusion DNA polymerase (NEB) was carried
out as described elsewhere (Riley et al., 2014). Unique RPI# indexing primers were used
to allow for library multiplexing in the same sequencing lane. The product was purified
and run on a 6% polyacrylamide gel. Electrophoresis was carried out at 100V until
the bromophenol blue dye reached the end of the gel. The desired product (140bp), containing both
adapters, was isolated from the gel, eluted in 1X NEB 2 buffer for 2h at room
temperature, and ethanol precipitated. Samples were quantified by Qubit® 2.0
(Thermo Fisher Scientific). Sequencing was performed on an Illumina NextSeq 500
platform at the USC Norris Genomic Core.
2.7 HIGH-THROUGHPUT PREDICTION OF DNA SHAPE
DNA shape parameters were predicted using a high-throughput DNA shape method as
previously described (Zhou et al., 2013). The HT DNA shape method assigns features to
a single position by using a sliding pentamer window with data mined from MC
trajectories. The resulting value is the average over all occurrences of that pentamer in
the MC data, and the features include minor groove width (MGW), roll (Roll), propeller
twist (ProT) and helical twist (HelT). Thus, DNA shape features can be estimated for DNA
of any desired length in a high-throughput manner. This method takes into account structural
information regarding neighboring sequences. For specific details on the analysis of each
dataset, see below.
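The sliding-window lookup can be illustrated for a per-base-pair feature such as MGW. This is a minimal sketch, not the published implementation: `pentamer_table` stands in for the MC-derived query table, the values in `toy_table` are made up, and step parameters such as Roll and HelT (defined between adjacent base pairs) would need separate handling.

```python
def predict_shape(sequence, pentamer_table):
    """Assign one value per position from the pentamer centred on it.

    The two positions at each end have no full pentamer and get None.
    """
    values = [None] * len(sequence)
    for i in range(2, len(sequence) - 2):
        values[i] = pentamer_table[sequence[i - 2:i + 3]]
    return values

# Placeholder table with made-up MGW values (Å); not real MC-derived data
toy_table = {"AAAAA": 3.4, "AAAAT": 3.6, "AAATT": 2.9, "AATTT": 2.9, "ATTTT": 3.6}
mgw = predict_shape("AAAAATTTT", toy_table)
```

Because every interior position is resolved by a table lookup, the cost grows linearly with sequence length, which is what makes the approach high-throughput.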
2.8 REGRESSION MODELS FOR BINDING AFFINITY PREDICTION
An L2-regularized multiple linear regression (MLR) model was trained in order to predict
relative binding affinities from SELEX data as previously described (Yang et al., 2014;
Zhou et al., 2015). A 10-fold cross-validation was used, and models encoding nucleotide
sequence, shape, augmented shape, or a combination of those were trained. The R² was
calculated between predicted and experimentally determined binding affinities.
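Two ingredients of this evaluation can be sketched without any external library: splitting the data into 10 folds and computing R² on held-out predictions. The sketch assumes the usual definition R² = 1 - SS_res/SS_tot and a simple interleaved split without shuffling; the regression fit itself is not shown, and both function names are illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test

folds = list(k_fold_indices(25, k=10))
perfect = r_squared([0.2, 0.5, 1.0], [0.2, 0.5, 1.0])
```

In each fold the model would be fitted on the `train` indices and scored on the `test` indices.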
2.9 HIGH-THROUGHPUT DATA ANALYSIS
2.9.1 MINOR GROOVE WIDTH ANALYSIS FOR DNASE I SITES
From cleavage events data for DNase I, minor groove width was predicted for each of the
4,096 unique hexamers with the HT approach described above (Zhou et al., 2013). In the
HT approach, the central base pair of each pentamer was assigned a MGW value equal to
the average of all occurrences of that pentamer in our MC dataset. For a given hexamer
($N_{-3}N_{-2}N_{-1}|N_{+1}N_{+2}N_{+3}$), where “|” represents the cleavage site, the MGW predictions for
positions $N_{-1}$ and $N_{+1}$ were unambiguous. For the remaining positions, we predicted
MGW as the average over all 16 possible dinucleotide flanks.
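One way to read this averaging is sketched below. It assumes that the two central positions use the single pentamer contained in the hexamer, and that each outer position is averaged over the 16 dinucleotides that could extend the hexamer on the side where the pentamer runs past its end. The scoring function `toy_mgw` is a made-up stand-in for the MC-derived pentamer table.

```python
from itertools import product

BASES = "ACGT"
FLANKS = ["".join(p) for p in product(BASES, repeat=2)]  # 16 dinucleotides

def hexamer_mgw(hexamer, pentamer_mgw):
    """MGW at the six hexamer positions, averaging over unknown flanks."""
    profile = []
    for pos in range(6):
        if 2 <= pos <= 3:
            # Pentamer lies fully inside the hexamer: unambiguous
            windows = [hexamer[pos - 2:pos + 3]]
        elif pos < 2:
            # Pentamer extends past the 5' end: pad on the left
            windows = [(f + hexamer)[pos:pos + 5] for f in FLANKS]
        else:
            # Pentamer extends past the 3' end: pad on the right
            windows = [(hexamer + f)[pos - 2:pos + 3] for f in FLANKS]
        profile.append(sum(pentamer_mgw(w) for w in windows) / len(windows))
    return profile

def toy_mgw(pentamer):
    """Made-up score (number of A's); stands in for the real pentamer table."""
    return float(pentamer.count("A"))

profile = hexamer_mgw("AAAAAA", toy_mgw)
```

Indices 2 and 3 of the returned list correspond to the positions immediately flanking the cleavage site.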
2.9.2 DNA SHAPE ANALYSIS OF CHIP-SEQ FOR MEF2A AND MEF2C
ChIP-seq datasets for MEF2A and MEF2C were obtained from the ENCODE project,
release 3 (GEO accession numbers: GSM803511 and GSM803420). The binding sites
were extracted using FIMO with the JASPAR (Mathelier et al., 2016a) input PWM of
MEF2A (MA0052.2) or MEF2C (MA0497.1). The sequences obtained for both MEF2A
(6393) and MEF2C (4340) binding sites were then aligned based on the core motif. The
peak score was used as a measure of protein binding. For shape readout analysis, the
minor groove width was calculated with the HT DNA shape prediction method described
above. An MLR model with a 10-fold cross validation for predicting binding affinities
was trained as described earlier. Positional minor groove width distributions along the
DNA binding sites of MEF2A and MEF2C were compared. The Mann–Whitney U test
was used to investigate whether the 2 samples had significantly different distributions.
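The statistic behind this test can be computed by counting pairs. The sketch below returns only the U statistic for made-up values at a single aligned position; a p-value additionally requires the null distribution of U, which statistical packages provide (for example `scipy.stats.mannwhitneyu`). The variable names are illustrative.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Made-up minor groove widths (Å) at one aligned position
mef2a_like = [3.1, 3.4, 3.3, 3.0]
mef2c_like = [4.0, 4.2, 3.9, 4.1]
u_a = mann_whitney_u(mef2a_like, mef2c_like)
u_b = mann_whitney_u(mef2c_like, mef2a_like)
```

The two U values always sum to the product of the sample sizes; a value near zero for one sample indicates that almost all of its observations fall below those of the other.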
2.9.3 HT-SELEX DATA ANALYSIS OF MEF2D
HT-SELEX data for MEF2D was obtained from Jolma et al. (2013) and pre-processed
by L. Yang & Y. Orenstein. An L2-regularized MLR was used to train models that
included information on sequence only (1mer, 2mer or 3mer), shape only (MGW, Roll,
ProT, HelT), or a combination of both. Model performance was evaluated based on the
coefficient of determination R².
2.9.4 SELEX-SEQ DATA ANALYSIS OF MEF2B
SELEX-seq data for MEF2B was generated according to the experimental procedures. In
order to infer relative binding affinities, a Markov model was built using the initial
library round (R0) to predict the initial bias of the sequence pool. To do so, the R0
library pool was separated into testing and training datasets. The maximum
oligonucleotide length k was then determined as the longest k (k=8 in this case)
for which a k-mer appeared at least 100 times. A Markov model was trained and
evaluated, and the resulting model was used for further analysis. To calculate the
information gain after rounds of selection, varying k-mer lengths were taken into account.
Using the generated Markov model combined with the data from round 2 (R2) of selection,
relative affinities for k-mers were inferred by calculating the square root of the
enrichment ratio. This was accomplished using the package “SELEX” for motif
discovery from SELEX-seq data (Rastogi et al., 2015) available as a Bioconductor R
package. The basis of the analysis is described elsewhere (Riley et al., 2014; Slattery et
al., 2011).
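The affinity estimate described above can be written out explicitly. The sketch below is not the SELEX package itself: it assumes the enrichment ratio is the observed R2 frequency of a k-mer divided by its R0 frequency expected from the Markov model, takes the n-th root for round n (the square root for R2), and scales the result so that the best k-mer has affinity 1. All counts and frequencies are made up.

```python
def relative_affinities(round_counts, expected_r0_freq, n_round=2):
    """Relative affinity per k-mer: (enrichment ratio)**(1/n_round), scaled to max 1."""
    total = sum(round_counts.values())
    raw = {}
    for kmer, count in round_counts.items():
        enrichment = (count / total) / expected_r0_freq[kmer]
        raw[kmer] = enrichment ** (1.0 / n_round)
    best = max(raw.values())
    return {kmer: value / best for kmer, value in raw.items()}

# Made-up R2 counts and Markov-model R0 frequencies for three 8-mers
r2_counts = {"CTATAAAT": 800, "CTAAAAAT": 200, "GTGTGGCT": 8}
r0_freq = {"CTATAAAT": 0.001, "CTAAAAAT": 0.001, "GTGTGGCT": 0.001}
aff = relative_affinities(r2_counts, r0_freq)
```

Taking the root compensates for the compounding of enrichment over successive rounds, so a fourfold difference in R2 enrichment corresponds to a twofold difference in relative affinity.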
3 RESULTS
3.1 PROTEIN-DNA READOUT LESSONS FROM CRYSTAL STRUCTURES
Tremendous advances have been made in the field of protein-DNA recognition: different
domains and folds used by DNA binding proteins have been described (Garvie and
Wolberger, 2001; Hong and Marmorstein, 2008; Pabo and Sauer, 1992; Rohs et al.,
2010); protein-DNA recognition patterns of various members of the same family have
been studied (Berger et al., 2008; Dror et al., 2014; Kissinger et al., 1990; Passner et al.,
1999). Yet, the question of which specific protein recognizes each specific piece of DNA
is not a trivial one to answer. In this section, protein-DNA readout modes dependent on
DNA shape recognition and conformational changes are explored. By studying co-crystal
structures of protein-DNA complexes, we demonstrated that histidines and lysines can
play a similar role in DNA shape readout in addition to the previously described
mechanism observed for arginines (Rohs et al., 2009a). By analyzing crystal structures of
p53 bound to DNA, we showed how variations in DNA structural parameters allow for
p53 to bind to a response element with an extra base-pair while maintaining its overall
tetrameric form. The work presented in this section is part of papers that I co-authored,
and part of collaborations with the laboratories of Drs. Catherine Lawson, Xiaojiang
Chen, Lin Chen, Peter Qin and Zhongzhou Chen. Therefore, some of the sections below
were partially adapted from my contributions to publications (Chang et al., 2013; Chen et
al., 2013; Deng et al., 2015; Zhang et al., 2014).
3.1.1 EXPANDING THE REPERTOIRE OF DNA SHAPE RECOGNITION MODES
3.1.1.1 Arginine-mediated DNA shape recognition
Previously, studies on co-crystal structures had shown the importance of DNA shape for
protein-DNA recognition. In particular, the importance of arginine residues was seen for
members of different protein families in which these arginines recognized the enhanced
negative electrostatic potential due to a narrowing of the minor groove. Here, we have
found this mechanism to be a readout mode characteristic of other proteins. Among these,
arginine-based shape readout was observed for the αCTD subunits of the Escherichia coli RNA
polymerase (RNAP), shown in Figure 5. First, two previously published re-refined
structures were analyzed (Benoff et al., 2002; Hudson et al., 2009; Lara-González et al.,
2010): the CAP-αCTD-DNA complex (referred to as CAD) and an asymmetric
αCTD-σR4-DNA complex (referred to as ASD). As seen in Figure 5A, the αCTDs interact with a region
of enhanced negative potential and narrow minor groove in both complexes. The narrow
minor groove is an intrinsic feature of the DNA, as observed by comparing the minor
groove width profiles of the bound structure (blue line) and the predicted free DNA (green
line) (Figure 5A). Further evidence for the role of shape readout employed by the αCTD
subunits comes from substituting one of the positions within the A-tract from A to C
(Figure 5B-C). With this substitution, new crystallization studies by our collaborator
revealed reduced binding of αCTD2 (not shown). Analyzing the shape of bound DNA
versus predicted free DNA revealed shape differences between the two binding sites
(Figure 5C). This arginine readout mode was also investigated for other structures (see
Table 1 and next sections for details).
Figure 5. Shape readout by the αCTD subunit of RNA Polymerase. (A) Minor groove width for bound (blue) and
unbound (green) DNAs, and electrostatic potential of the bound DNAs (red) are plotted for CAD and ASD. Arrows
point to the region where R265 inserts into the minor groove. (B) Representation showing DNA surface and R265 of
αCTD inserting into the minor groove. The location of the DNA binding site of CAD that had a position within the
A-tract substituted (A/C) is shown in red. The figure was generated with PyMol (DeLano, 2002). (C) Plot of minor groove
width for CAD (A-tract “A6”) and KO-CAD (A-tract “A5”) with A→C substitution. Structural analysis based on the
CAD, ASD and KO-CAD structures, which have been deposited under the PDB IDs 3N4M, 3N97 and 5CIZ,
respectively (to be published in Lara-Gonzalez et al.). Adapted from Lara-Gonzalez et al., manuscript in preparation.
3.1.1.2 Histidines in shape readout recognition of the Simian Virus 40 Large T Antigen
On the basis of the analysis of the co-crystal structure of the Simian Virus 40 Large T
Antigen (LTag) bound to its origin (ori) of replication DNA, published in (Chang et al., 2013),
here we show how histidines can play a role in DNA shape readout. The co-crystal
structure showed the Early Palindrome (EP) DNA region interacting with 2 subunits of
the protein. Unlike the interactions of the two Origin Binding Domains (OBDs) with ori
DNA at the major groove, the histidine residues of the AAA+ domains of both subunits 1
and 2 interact with the minor groove (Figure 6A).
Figure 6. AAA+ domain of LTag employs DNA shape readout recognition. (A) Schematic representation of
protein-DNA complex showing region of shape readout. (B-C) Narrow minor-groove geometry of the region of EP-ori DNA
where both β-hairpin H513 residues bind. (B) The shape of the molecular surface is shown with GRASP2 (concave
surfaces in dark gray; convex surfaces in green) (Petrey and Honig, 2003). The red mesh represents an isopotential
surface at 5 kT/e, calculated with DelPhi at a physiologic ionic strength of 0.145 M (Rocchia et al., 2002). The H513
residues from both subunits intrude into the minor groove in a region with enhanced negative electrostatic potential as a
result of narrowing the width of the minor groove to 4.5 Å (with an electrostatic potential of 9.4 kT/e) from the normal
width of 5.8 Å (with an electrostatic potential of 7.2 kT/e). (C) The minor-groove width of bound ori DNA in our crystal
structure (blue) and unbound ori DNA predicted in MC simulations (green), and the electrostatic potential in the center
of the minor groove calculated with DelPhi (red) illustrate that the H513 residues bind a region with an intrinsically
narrow minor groove. The enhanced negative electrostatic potential in the narrower groove region attracts H513
residues through favorable electrostatic interactions, a mechanism known as shape readout. DNA structural analysis
based on PDB ID 4GDF (Chang et al., 2013). Adapted from (Chang et al., 2013).
The H513 residues on the β-hairpin from both subunits bind to the minor groove side by
side in a nearly identical fashion with both imidazole rings lying in approximately the
same plane, following the helical path of the minor groove. The N-N distance between
the two H513 imidazole groups is ~2.7 Å, a distance indicative of the formation of a
hydrogen bond (Figure 6B), which may further stabilize the interactions within the
ternary complex of the LTag dimer and ori DNA. The minor groove region bound by the
two H513 side chains is narrower than its adjacent regions (Figure 6C, blue line), with a
minimum width of 4.5 Å (versus 5.8 Å for the minor groove width of standard B-DNA).
To distinguish between an intrinsic and a protein-induced narrow minor groove, we carried
out MC simulations of the DNA structure using the origin sequence. The result indicates
that the DNA region contacted by H513 is characterized by an intrinsically narrow
groove in the absence of protein binding (Figure 6C, green line). The narrowed
minor-groove region where the H513 residues bind has an electrostatic potential that is ~2 kT/e
more negative than the potential in the wider minor groove of adjacent regions (Figure
6B, red mesh, and C, red line). Thus, the binding of H513 residues to the EP-ori sequence
is characterized by a shape readout mechanism whereby positively charged protein
residues bind to intrinsically narrow regions of the minor groove with enhanced negative
electrostatic potential (Rohs et al., 2009a).
After observing the origin recognition mode of the LTag His513 residue in this structure,
we sought to determine whether the observation that histidine residues recognize narrow
minor groove widths and enhanced negative electrostatic potential is of a more general
nature. We analyzed the minor groove width and electrostatic potential for various
structures that are part of the IFN-β enhanceosome (Escalante et al., 2007; Panne et al.,
2004, 2007). This analysis revealed that conserved histidine residues from IRF-3 (His40)
and IRF-7 (His46) that intrude into the minor groove consistently bind regions of narrow
minor groove and enhanced negative electrostatic potential (Figure 7).
Figure 7. Minor groove shape readout through histidine residues for the IFN-β enhanceosome. (A) DNA sequence of
IFN-β enhanceosome with boxes indicating three different crystal structures containing different overlapping parts of
the multiple interferon regulatory factor (IRF)-DNA complexes. (B–D) The minor groove width and electrostatic
potential of these three crystal structures have been analyzed and labeled by their PDB IDs: (B) 1T2K (Panne et al.,
2004), (C) 2PI0 (Escalante et al., 2007), and (D) 2O61 (Panne et al., 2007). The results show that minor groove width
(blue) and electrostatic potential (red) are highly correlated. Histidine residues from IRFs contact minima in groove
width and electrostatic potential, namely His40 from IRF-3 and His46 from IRF-7, which are highly conserved residues
in IRFs. These His side chains bind DNA alone and are likely protonated due to the highly charged environment of
DNA. This observation suggests that histidine can be used to recognize DNA based on a mechanism previously found
for arginines and described as shape readout (Rohs et al., 2009a). The initial contact of the LTag AAA+ domain with
the ori DNA is formed through this mechanism. Thus, through the analysis of co-crystal structure of LTag and ori DNA
complex, together with those of the IFN-β enhanceosome, we have established a histidine-mediated DNA shape-
readout mechanism that could potentially be used in the DNA recognition by many protein families. Adapted from
(Chang et al., 2013).
3.1.1.3 Lysine residues implicated in electrostatic dependent DNA shape recognition
Fur recognizes DNA by using a combination of base readout through direct contacts in
the major groove and shape readout through recognition of the minor-groove electrostatic
potential by lysine. One of the most prominent features of DNA shape recognition by Fur
is that the DNA target contained an AT-rich region that was characterized by a narrow
minor groove (Figure 8B). While many interactions occurred in the major groove and on
the phosphodiester backbone, a different recognition mode involved Lys15 of the L1
loop, which inserts into the minor groove with few direct interactions with DNA.
However, structure analysis showed that the Lys15 residues anchored in the minor
groove of the Fur-Mn²⁺–feoAB1 operator and the Fur-Mn²⁺–P. aeruginosa Fur box
without base-specific contacts.
Figure 8. Narrow DNA minor groove geometry where the Lys15 residue binds. (A) feoAB1 operator and P. aeruginosa
Fur box (PDB IDs 4RB3 and 4RB1, respectively) (Deng et al., 2015). Shape of the molecular surface is shown with
GRASP2 (concave surfaces in dark grey, convex surfaces in green) (Petrey and Honig, 2003). The red mesh represents
an isopotential surface at 5 kT/e, calculated with DelPhi at a physiologic ionic strength of 0.145 M (Rocchia et al.,
2002). (B) Lys15 residues intrude into the minor groove in a region with enhanced negative electrostatic potential as a
result of narrowing the width of the minor groove. The minor groove width of bound DNA in our crystal structure
(blue) and the electrostatic potential in the minor groove calculated with DelPhi (red) illustrate minor groove binding
sites for the Lys15 residues. The enhanced negative electrostatic potential in the narrower groove region attracts Lys15
residues through favorable electrostatic interactions, a mechanism known as DNA shape readout. (C) Hydroxyl radical
cleavage intensities of unbound Fur binding sites derived with ORChID2. Top: consensus Fur-boxes among different
species (Baichoo and Helmann, 2002; Butcher et al., 2012; Davies et al., 2011; Ochsner and Vasil, 1996); Bottom:
average of 32 sequences bound by Fur from P. aeruginosa (Ochsner and Vasil, 1996). The different sequences were
aligned to the Fur box based on the conserved nucleotides. Positions where Lys15 would interact with the DNA minor
groove, based on the co-crystal structures from this study, are highlighted by an arrow. The sequences used to generate
top and bottom panels in (C) are listed in Supplementary Table 1 and Supplementary Figure 6 of (Deng et al., 2015).
Adapted from (Deng et al., 2015).
To uncover the recognition mechanism of Lys15, we further analyzed the DNA structure.
The minor groove region bound by the Lys15 side chains was narrower than its adjacent
regions (Figure 8B, blue line), with a minimum width of about 4 Å (versus 5.8 Å for
standard B-DNA). The narrower minor groove might be an intrinsic structural feature of
the DNA sequence, or it may be induced by Fur binding. To distinguish between these
possibilities, we probed minor groove topographies of unbound Fur targets using
hydroxyl radical-cleavage intensities (Bishop et al., 2011). This analysis indicated that
both DNA regions of the Lys15 contacts were characterized by an intrinsically narrow
groove in the absence of the protein (Figure 8C). The negative electrostatic potential in
the minor groove is enhanced as the groove width decreases. The electrostatic potential of
the narrow minor groove regions where the Lys15 residues bind was about
3 kT/e more negative than that of the wider minor groove of adjacent regions (Figure
8B, red line). Thus, the positively charged Lys15 residues favorably bind to the
intrinsically narrow minor groove with enhanced negative electrostatic potential. This
observation suggests that the binding of Lys15 residues to the target DNA is a specific
form of the DNA shape readout mechanism. Lysine, despite its positive charge and
abundance on protein surfaces, has not been widely associated with this readout mode
(Rohs et al., 2009a). Nevertheless, the results in Fur demonstrate that lysine can play this
role (Figure 8). A survey of structures from other protein families revealed additional
examples where a lysine binds to narrow minor groove regions with enhanced negative
electrostatic potential (Figure 9).
Figure 9. Lysine is employed to recognize minor groove shape. After revealing the use of Lys15 side chains in
Fur-DNA recognition, we surveyed the PDB, which yielded fewer examples than for arginine in which lysine
alone has been observed to intrude into a narrow minor groove, such as (a) the Rattus norvegicus glucocorticoid receptor
DBD homodimer (PDB ID: 3FYL) (Meijsing et al., 2009), (b) the Mycobacterium tuberculosis DnaA-DBD (PDB ID:
3PVP) (Tsodikov and Biswas, 2011), and (c) the Xenopus tropicalis Tet3 CXXC domain (PDB ID: 4HP3) (Xu et al.,
2012). Among these examples, DnaA binds to origins of replication through the insertion of Lys436 into the minor
groove, and when mutated to alanine its binding affinity decreases (Tsodikov and Biswas, 2011). The same study
shows that in other bacteria, however, the respective position is occupied by an arginine instead of lysine, which carries
the same positive charge while its desolvation cost is lower than that of lysine (Rohs et al., 2009a). Taken together, the
ternary Fur-DNA complexes presented in this study revealed the use of lysine in recognizing intrinsically narrow
regions of the minor groove with enhanced negative electrostatic potential, and the survey of the PDB suggests that
lysine can replace arginine in certain systems, thus expanding the currently known repertoire of DNA shape readout
mechanisms. Adapted from (Deng et al., 2015).
3.1.2 DNA CONFORMATIONAL CHANGES INDUCED UPON P53 BINDING
To exert their regulatory function, proteins must recognize different binding sites
within the genome. A long-standing question is how p53 can exert its
function when recognizing different binding sites. By studying the co-crystal structure of the p53
tetramer bound to a DNA response element (RE) from the promoter of the Bcl-2-associated X protein
(BAX) gene, which contains a one-base-pair spacer, we gained insight into
how the extra base pair can be accommodated. In another p53-related study, we developed an approach
that combines DNA conformations derived from Monte Carlo
simulations with EPR measurements to deduce the DNA conformational state in solution.
3.1.2.1 BAX DNA absorbs extra base pair by unwinding and compression
The RE located at the promoter of the BAX gene contains a one base-pair insertion
between the two half-sites. In spite of this, the p53 complex bound to the BAX RE is
similar to that previously seen with a contiguous site without a spacer between the two
half-sites (PDB ID 3KMD). This observation was surprising since previous studies of other members of the
p53 family and modeling studies suggested a different tetrameric architecture (Chen et
al., 2011; Ethayathulla et al., 2012; Pan and Nussinov, 2009). To better
understand how the 1-bp spacer was accommodated, we analyzed the
DNA of the BAX RE (PDB ID 4HJE) (Chen et al., 2013) and a DNA with a contiguous
site (PDB ID 3KMD) (Chen et al., 2010), referred to as BAX and 3KMD DNA from now
on. Although structural details of DNA distortion at the half-site interface are not
definitive at present, such distortions are likely responsible for correctly positioning the
two core CWWG elements to allow binding of a conserved p53 core tetramer structure to
the BAX promoter in the presence of the 1-bp insertion. Computational analysis reveals
that the DNA between the two CWWG elements exhibits a bending of ~6° (Figure 10A),
an increase in helix diameter (Figure 10B) and an increase in minor groove width in the
central region (Figure 10C). The main feature of the BAX-RE, however, is a partial
unwinding of the region between both core CWWG elements, which fully absorbs the
helix twist of the additional base-pair step and enables the formation of favorable inter-
dimer interactions.
Figure 10. DNA structure analyses of p53 RE BAX compared to 3KMD DNA. (A) Schematic representation of the
DNA conformations in the BAX (green) and 3KMD (magenta) structures. The helix axes indicate a slight bending,
while the backbones suggest an increase in helix diameter, due to deformations accommodating the additional base pair
in the BAX-RE. (B) The increase in helix diameter in the center of the BAX (green) vs. the 3KMD (magenta) DNA
target is illustrated based on a Curves analysis. (C) The minor groove width comparison of the BAX (green) vs. the
3KMD (magenta) DNA conformation. (D) The difference in helix twist between an increasing number of base pairs
centered around the interface of both half-sites of the BAX and 3KMD DNA structures demonstrates that 7 bp in the
BAX-RE are accommodated in the same rotational space as 6 bp in the 3KMD structure. This arrangement places the
CWWG core elements at a similar relative positioning, allowing for the formation of bidentate hydrogen bonds
between the guanines of the CWWG core elements and Arg280 residues of an almost identical p53 core tetramer
assembly. DNA structure analysis was based on BAX (PDB ID 4HJE) (Chen et al., 2013) and 3KMD (PDB ID 3KMD)
(Chen et al., 2010). Adapted from (Chen et al., 2013).
The structural details at the intersection of the two half-sites cannot be resolved with high
accuracy, but comparison of the BAX and 3KMD structures yields a negligible difference
in helix twist between the two most central GC base pairs of the core elements (positions 7
and 15 in the 21-mer, and positions 7 and 14 in the 20-mer), which form bidentate
hydrogen bonds with Arg280, indicating an almost identical relative positioning (Figure
10D). This is also indicated by the average helix twist of 31.3°
between the inner CG base pairs of the CWWG core elements in the BAX structure,
compared with 36.1° between the equivalent base pairs in the 3KMD structure.
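The near-identical relative positioning can be checked with simple arithmetic. The sketch below multiplies each average twist by the number of base-pair steps between the stated positions. The step counts (8 for the 21-mer, 7 for the 20-mer) are inferred here from those positions and are an assumption of this sketch, as is the function name.

```python
# Cumulative helix twist between the inner base pairs of the two CWWG cores.
# Average twist values are taken from the text; the number of steps is
# inferred from the stated positions (an assumption of this sketch).

def cumulative_twist(avg_twist_deg, first_pos, last_pos):
    """Total rotation (degrees) over the base-pair steps between two positions."""
    n_steps = last_pos - first_pos
    return avg_twist_deg * n_steps

bax_total = cumulative_twist(31.3, 7, 15)  # BAX-RE 21-mer, 8 steps
kmd_total = cumulative_twist(36.1, 7, 14)  # 3KMD 20-mer, 7 steps

print(f"BAX-RE: {bax_total:.1f} deg, 3KMD: {kmd_total:.1f} deg, "
      f"difference: {abs(bax_total - kmd_total):.1f} deg")
```

Under these assumptions the two totals are about 250° and 253°, so the extra base pair changes the cumulative rotation by only about 2°, far less than the ~36° that one additional base-pair step would otherwise contribute.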
Figure 11. The outer region of the DNA accommodates the extra bp in the BAX complex. (A) The p53-DNA interface
is located on one side of the double helix. The BAX RE forms an interface with the p53 core domain tetramer that is
almost identical to a binding site without spacer. This is possible because the required conformational adjustment of the
DNA occurs at the outer side of the double helix that is not in contact with the protein whereas the inner side closely
resembles the conformation of a binding site without spacer (3KMD). (B) Different deformations of the inner and outer
side of the double helix. The structures of the BAX RE (green) and 3KMD (magenta) are compared by a series of
distances at the outer and inner side of the double helix, with respect to the protein-DNA interface. Only one distance at
the outer side (red) varies significantly between both structures and indicates the compaction of the complex upon
binding. (C) Distance measurements at inner and outer side of the DNA. The deformation of the BAX RE is most
apparent for only one distance measurement at the outer side of the DNA (red). DNA structure analysis was based on
BAX (PDB ID 4HJE) (Chen et al., 2013) and 3KMD (PDB ID 3KMD) (Chen et al., 2010). Adapted from (Chen et al.,
2013).
With a partial unwinding of the double helix and increase of its diameter and minor
groove width, the central 7 bps between the core elements of the BAX-RE are adequately
accommodated in the space occupied by only 6 bps in the 20-mer of the previously
solved structure (3KMD), thus adjusting to the formation of the p53 core tetramer.
Further computational analysis shows that such adjustment leads to deformations mainly
on the side of the double helix that is not in close contact with the protein (outer region)
(Figure 11), which has more conformational freedom to fit the DNA into its bound form.
3.1.2.2 Conformations of unbound REs using an approach to deduce DNA conformation
in solution
We introduce a new experimental/computational pipeline, in which the method of site-
directed spin labeling (SDSL) is combined with all-atom Monte Carlo (MC) simulations
to derive atomic resolution data representing the sequence-dependent conformation of
DNA duplexes in solution. SDSL uses electron paramagnetic resonance (EPR)
spectroscopy to monitor nitroxide radicals (i.e. the spin labels) attached at specific sites
of biomolecules, and has matured as a tool for studying the structure and dynamics of
proteins and nucleic acids (Fanucci and Cafiso, 2006; Sowa and Qin, 2008). The MC
simulation technique was shown to enable efficient conformational sampling and was
extensively validated using massive experimental data from X-ray crystallography, NMR
spectroscopy and hydroxyl radical cleavage experiments (Bishop et al., 2011; Zhou et al.,
2013).
To obtain all-atom models of unbound REs, we implemented a strategy to use the double-
electron-electron-resonance (DEER) measured distances to select sterically acceptable
models from a large pool of 3D structures. In our approach, these structures were
generated by unrestrained MC simulations (Rohs et al., 2005a). For each structural
model, the NASNOX program (Qin et al., 2007) was used to select sterically allowed R5
conformers at the respective labeling sites, from which the relevant average inter-R5
distances (rmodel) were obtained. We then computed a scoring function Pt for each
model, taking into account the measured and predicted average distances (r0 and rmodel)
and the width of the measured distance distribution. We then compared the DNA
structures of the top-ranked model and top-20 models with the structure of the bound
DNA for both the p21-RE and the BAX-RE. The analyses revealed RE shape variations and
suggested tangible connections between structural features in the unbound and bound
DNA (Figure 12). For the BAX-RE, larger positive roll of the T₉pA₁₀ base pair step was
already apparent in the unbound DNA (Figure 12C). Such intrinsic property of the TpA
step facilitated widening of the minor groove, which was observed in the bound form
(Figure 12A). In addition, the unbound BAX-RE was under-wound at the central region
(Figure 12), thus facilitating further unwinding to accommodate the 9-bp central region
into the same volume occupied by 8 bp in other REs with 0-bp spacers (Chen et al.,
2013).
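The model-selection step can be sketched as follows. The exact form of the scoring function Pt is not given in this section, so the Gaussian form below, the function names, and the numerical values in the usage lines are all assumptions for illustration only; they are not the published scoring function or measured data.

```python
import math

def deer_score(r_model, r0, sigma):
    """Agreement score in (0, 1]; equals 1 when r_model matches r0.
    ASSUMPTION: a Gaussian in (r_model - r0) whose width sigma stands in
    for the width of the measured distance distribution."""
    return math.exp(-((r_model - r0) ** 2) / (2.0 * sigma ** 2))

def rank_models(models, measurements):
    """Rank structural models by the product of per-pair scores, best first.
    models:       {model_name: {label_pair: r_model}}
    measurements: {label_pair: (r0, sigma)}"""
    totals = {}
    for name, distances in models.items():
        score = 1.0
        for pair, (r0, sigma) in measurements.items():
            score *= deer_score(distances[pair], r0, sigma)
        totals[name] = score
    return sorted(totals, key=totals.get, reverse=True)

# Illustrative values only (Angstrom); not measured data.
measurements = {("site_a", "site_b"): (30.0, 2.0)}
models = {
    "model_1": {("site_a", "site_b"): 31.0},
    "model_2": {("site_a", "site_b"): 36.0},
}
ranking = rank_models(models, measurements)
print(ranking)
```

Multiplying the per-pair scores treats the labeled pairs as independent restraints, which is a further simplifying assumption of this sketch.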
Figure 12. Analyses of p53 RE structures. The DNA shape parameters (A) minor groove width, (B) helix twist and (C)
roll are shown for the p21-RE (left panel) and the BAX-RE (right panel). Structural features were derived from the
crystal structures of the complex (red), the top-ranked MC models (green) and the averages of the top-20 MC models
(blue). The error bars indicate the standard deviations of structural parameters among the top-20 models, demonstrating
an efficient conformational sampling. Analysis of DNA from crystal structures was based on the p21-RE (PDB ID
3TS8) (Emamzadah et al., 2011) and BAX (PDB ID 4HJE) (Chen et al., 2013). The structural parameters indicate that
the conformations observed in the crystal structures of the bound forms were partially apparent in the intrinsic DNA
shape of the unbound forms. Examples for this observation are the low helix twist values at the C₁₀pC₁₁ step of the p21-RE
and the A₁₀pG₁₁ step of the BAX-RE, as well as the negative Roll at the A₁₂pA₁₃ step of the p21-RE and the
positive Roll at the T₉pA₁₀ step of the BAX-RE. Adapted from (Zhang et al., 2014).
On the other hand, in the unbound p21-RE the relative positioning of the two CWWG
cores deviated significantly from the bound form, necessitating a shift of the helix axis at
a ‘hinge’ located at the interface between two half-sites (not shown). Further analyses
indicated that conformational changes at the central region of the REs occurred to
facilitate proper protein–DNA interactions, while at the same time maintaining the intra-
and inter-dimer protein contacts (not shown).
3.2 DNA SHAPE READOUT IN THE GENOMICS ERA: RESULTS FROM DNASE I
CLEAVAGE AND METHYLATION STATUS
The previous section delved into shape readout mechanisms observed from crystal
structures. With the emergence of high-throughput sequencing and the genomics field,
however, a new era began. With that, protein-DNA recognition patterns could now be
analyzed in a high-throughput manner and give insights into how binding sites could be
recognized in the genome. Likewise, a new method developed for high-throughput
prediction of DNA shape (Zhou et al., 2013) now allows for the study of thousands of
binding sites at the same time. In this section, results from high-throughput studies based
on cleavage rates of DNase I are presented. For other HT studies on Forkhead protein
binding sites, the effect of single nucleotide variants on DNA shape, and MEF2 binding
sites, refer to the appendix and to the next section. The work presented in this section
originated from a collaborative project with Professor Harmen Bussemaker’s group. The
sections below were adapted from (Dantas Machado et al., 2015; Lazarovici et al., 2013).
3.2.1 DNASE I CLEAVAGE RATE AND DNA METHYLATION
We analyzed DNase I cleavage events in the context of the hexamer surrounding the
cleavage site (3 bp upstream and 3 bp downstream) and found that DNA structural features
at certain positions around the cleavage site correlate with the DNase I
cleavage rate. More importantly, by analyzing cleavage rate and methylome data, we
found that CpG methylation can increase the cleavage rate.
3.2.1.1 Minor groove width profile is predictive of DNase I cleavage rate.
To interrogate whether a quantitative relationship existed between minor groove width
and cleavage rate, we used a HT method to predict DNA shape across all possible
hexamers. Since the hexamers occur as part of longer double-stranded DNA sequences,
we accounted for the influence of flanking sequence by averaging over all possible ways
of adding a dinucleotide flank on each side. We also used a “base pair centric” coordinate
system as previously described (Zhou et al., 2013).
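The flank-averaging step can be sketched as below. The helper `predict_shape` is a hypothetical callable standing in for the HT shape prediction method, and `toy_at_fraction` is only a placeholder that returns the local A/T fraction, not a real minor groove width. Averaging over all 16 x 16 combinations of left and right flanks is our reading of "all possible ways of adding a dinucleotide flank on each side" and is an assumption.

```python
from itertools import product
from statistics import mean

BASES = "ACGT"

def flank_averaged_shape(hexamer, predict_shape):
    """Average a per-base-pair profile of `hexamer` over all dinucleotide flanks.
    predict_shape(seq) is a HYPOTHETICAL callable that returns one value per
    base pair of seq; it stands in for a real shape predictor."""
    flanks = ["".join(p) for p in product(BASES, repeat=2)]  # 16 dinucleotides
    profiles = []
    for left in flanks:
        for right in flanks:
            full = predict_shape(left + hexamer + right)
            profiles.append(full[2:-2])  # keep the six core positions only
    # Column-wise average over all flank combinations.
    return [mean(column) for column in zip(*profiles)]

def toy_at_fraction(seq):
    """Placeholder predictor: fraction of A/T in a 3-bp window (NOT real MGW)."""
    values = []
    for i in range(len(seq)):
        window = seq[max(0, i - 1):i + 2]
        values.append(sum(base in "AT" for base in window) / len(window))
    return values

profile = flank_averaged_shape("ACTCGA", toy_at_fraction)
print([round(v, 3) for v in profile])
```

With this placeholder only the first and last core positions depend on the flanks, which illustrates why averaging matters most at the edges of the hexamer.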
To assess to what extent the variation in DNA shape might explain the observed variation
in DNase I cleavage rate, we first plotted the negative of the logarithm of the relative
DNase I cleavage rate as a function of MGW at each base pair position. We interpret this
negative logarithm as a binding free energy difference ΔΔG between a given sequence
and the optimal sequence for DNase I cleavage. This analysis revealed a clear
partitioning of the hexamer into three parts (Figure 13A): at positions −3 and −2 a narrow minor
groove is highly significantly associated with higher cleavage rate (with the t values
measuring the regression coefficient in units of its SE equal to +19.8 and +15.1,
respectively); at positions −1 and +1 this relationship is reversed but still highly
significant (t values −15.6 and −26.3); at positions +2 and +3 a less strong association is
observed (t values −6.0 and +6.4). The spatial profile of correlation between MGW and
DNase I cleavage rate is consistent with features of a crystal structure of a complex of
DNase I with a nicked DNA octamer duplex (Lahm and Suck, 1991) (Figure 13B). In
that structure, an arginine, Arg41, from DNase I can be seen to interact with the minor
groove near the −3 position, while a second arginine, Arg9, contacts the minor groove
between the −2 and −1 positions (Figure 13C). The narrower the minor groove is in the 5′
region of the hexamer at the −3 and −2 positions, the higher the cleavage rate is.
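A minimal sketch of this per-position analysis is given below. It converts relative cleavage rates into ΔΔG values and computes the regression slope together with its t value (slope divided by its standard error). The use of the natural logarithm (ΔΔG in units of kT), the function names, and the six data points are assumptions for illustration; the data points are not measured values.

```python
import math

def ddg_from_rates(rates):
    """ddG relative to the most cleavable sequence: -ln(rate / max rate).
    ASSUMPTION: natural logarithm, so ddG is expressed in units of kT."""
    top = max(rates)
    return [-math.log(r / top) for r in rates]

def slope_t_value(x, y):
    """Least-squares slope b of y = a + b*x and its t value, b / SE(b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, b / se_b

# Illustrative values only: MGW (Angstrom) at one position and relative rates.
mgw = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
rates = [1.00, 0.70, 0.40, 0.30, 0.15, 0.10]

ddg = ddg_from_rates(rates)
b, t = slope_t_value(mgw, ddg)
print(f"slope = {b:.2f} kT per Angstrom, t = {t:.1f}")
```

In this convention a positive t value means that ΔΔG rises as the groove widens, that is, a narrow minor groove is associated with a higher cleavage rate, as reported for positions −3 and −2.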
Figure 13. MGW is predictive of DNase I cleavage rate. (A) ΔΔG derived from the negative logarithm of cleavage rate
as a function of MGW at the six positions of all 4,096 unique hexamers. MGW of this region was predicted for naked
binding sites based on the HT shape prediction approach. HT predictions for all possible 16 dinucleotide flanks were
averaged and values of MGW that fall within intervals of 0.3 Å assigned to groups of sequences for which cleavage
rates are shown as box plots. (B) DNase I–DNA complex based on crystal structure (PDB ID 2DNJ) (Lahm and Suck,
1991). Base pairs at positions −3 and −2, where DNase I cleavage anti-correlates with MGW, are highlighted in blue.
Base pairs at positions −1 and +1, where DNase I cleavage correlates positively with MGW, are highlighted in green.
Regions where no correlation could be detected are shown in gray. The color code of the base pairs in the crystal
structure is equivalent to the one used for the box plots. (C) DNase I–minor groove contacts within a distance of 5 Å
from any base atom are shown for the same crystal structure. Arg41 and Arg9 bind upstream of the cleavage site, where
MGW anti-correlates with DNase I cleavage (blue base pairs). This anti-correlation likely arises from the attraction
between the positively charged arginine residues and the locally enhanced negative electrostatic potential. The cleavage
site (indicated by the orange arrow), by contrast, is located in a region where MGW correlates positively with DNase I
cleavage (green base pairs). Adapted from (Lazarovici et al., 2013).
These observations suggest that DNase I employs a previously described shape readout mechanism
in which arginine residues recognize the narrow minor groove of the DNA (Rohs
et al., 2010). The negative correlation between ΔΔG and MGW observed at positions −1 and +1
(that is, higher cleavage for a wider minor groove) is in agreement with
previous reports which showed higher DNase I cleavage rates for RpY dinucleotides,
which widen the minor groove (Brukner et al., 1990; Lomonossoff et al., 1981).
3.2.1.2 CpG methylation greatly enhances adjacent DNase I cleavage.
The results above indicate that molecular recognition of DNA by DNase I is subject to
significant dependencies between nucleotides, consistent with readout of specific features
of DNA shape (Brukner et al., 1995). Since DNA methylation has the potential to alter
the structural properties of DNA (Adams, 1990), we sought to analyze the influence of
methylation on protein–DNA binding. Whole genome shotgun bisulfite sequencing data
combined with hexamer contexts containing only hypermethylated or only
hypomethylated CpG dinucleotides allowed for direct comparison of cleavage rates
between both sets, which revealed a striking dependency on methylation status for a
subset of the hexamers (Figure 14A). A systematic search for DNA sequence features
that could explain this dependency revealed that it is almost completely explained by the
occurrence of a CpG dinucleotide immediately downstream of the cleaved bond (Figure
14A).
Upon methylation of the two cytosines within the C₊₁G₊₂ base pair step, the rate of
cleavage by DNase I is enhanced ∼eightfold (red points in Figure 14A), and for the most
cleavable CpG-containing hexamer (ACT|CGA) increases from ∼7% to ∼68% of the
maximum.
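These fold changes can be translated into the free energy scale used in the previous section. The sketch below assumes that ΔΔG is the negative natural logarithm of a rate ratio, expressed in units of kT; the variable names are ours.

```python
import math

# Fold enhancements quoted in the text.
fold_typical = 8.0          # ~eightfold increase upon CpG methylation
fold_top = 0.68 / 0.07      # ACT|CGA: ~7% to ~68% of the maximum rate

# ASSUMPTION: ddG = -ln(fold change), in units of kT.
ddg_typical = -math.log(fold_typical)
ddg_top = -math.log(fold_top)

print(f"typical: {fold_typical:.1f}-fold, ddG = {ddg_typical:.2f} kT")
print(f"ACT|CGA: {fold_top:.1f}-fold, ddG = {ddg_top:.2f} kT")
```

Under this assumption, methylation contributes a favorable change of roughly −2.1 kT for the typical hexamer and about −2.3 kT for ACT|CGA.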
Figure 14. Observation and analysis of the effect of methylation on DNase I cleavage rate. (A) The rate of cleavage
depends strongly on the DNA methylation status. We used a positional map of DNA methylation in IMR90 (Lister et
al., 2009) to delineate subsets of genomic positions with low/high degrees of CpG methylation, respectively.
Comparison between the hexamer cleavage rates derived from these respective subsets shows an ∼eightfold increase in
cleavage rate for hexamers with a methylated CpG immediately downstream of the cleaved phosphate (red points). (B)
Roll and MGW of methylated and unmethylated versions of the same hexamer based on the average of MC predictions
for three different flanking sequences (see Methods for details). Methylation leads to an increase in the positive roll
angle at the CpG dinucleotide and a narrowing of the MGW at position −2 by roughly 0.5 Å. Adapted from (Lazarovici
et al., 2013).
3.2.1.3 CpG methylation narrows the minor groove at adjacent positions.
Next, we asked whether methylation intrinsically leads to a narrowing of the minor
groove, which in turn would explain the observed increase in cleavage rate upon
methylation. To test this hypothesis, we extended the MC algorithm so that it could also
predict the shape of free DNA molecules containing 5-methylcytosine bases. We first
applied it to the most cleavable hexamer, ACTCGA. Strikingly, we observed that CpG
methylation leads to an increased roll angle at the CpG step and a narrowing of the minor
groove (Figure 14B). Roll is the angle between two adjacent base pair planes describing
the opening of a dinucleotide to either the minor or major groove. The narrowing of the
minor groove is most pronounced (∼0.5 Å) at position −2, which is exactly where,
according to Figure 13A, we expect it to have the biggest positive impact on cleavage
rate.
3.2.2 EVOLVING INSIGHTS INTO HOW CYTOSINE METHYLATION AFFECTS PROTEIN-DNA
BINDING
3.2.2.1 DNase I cleavage is dependent on DNA methylation state
Based on the previous results, Figure 15A illustrates a ‘shape-to-affinity model’ used by
Lazarovici et al., 2013. Each hexamer sequence is converted into a set of position-
specific Roll and MGW values, which serve as the independent (‘predictor’) variables in
a multiple linear regression. The negative logarithm of the relative cleavage rate serves as
the dependent (‘response’) variable, which can be interpreted as the binding free energy
ΔΔG relative to that for the most cleavable sequence (for which, by definition, ΔΔG = 0).
Analyzing the fraction of the total variance of ΔΔG among all unmethylated sequences of
type NNN|CGN (where ‘|’ indicates the cleavage site) that can be explained by different
variants of the shape-based model (Figure 15B) shows that although Roll is more
predictive than MGW, the two variables complement each other. In addition, simulations
for methylated and unmethylated sequences revealed that a change in Roll owing to the
substitution of C by 5mC was largely independent of the identity of the other four bases
within the NNN|CGN hexamer (Figure 15C).
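The comparison of model variants can be sketched as an ordinary least-squares fit followed by a calculation of the explained variance (R²). The real model uses five Roll and six MGW values per hexamer; for brevity this sketch uses one Roll and one MGW feature per sequence, and the feature values and the synthetic response are illustrative assumptions rather than data from the study. The solver and function names are also ours.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_r_squared(features, response):
    """Least-squares fit with intercept; returns (coefficients, R^2)."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, response)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in x]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - p) ** 2 for y, p in zip(response, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return coef, 1.0 - ss_res / ss_tot

# Illustrative features only: (Roll in degrees, MGW in Angstrom) per sequence.
shape = [(2.0, 4.0), (5.0, 5.0), (8.0, 4.5), (3.0, 6.0), (6.0, 3.5), (9.0, 5.5)]
# Synthetic response built from both features, so that the combined model fits exactly.
ddg = [0.5 - 0.2 * roll + 0.3 * mgw for roll, mgw in shape]

_, r2_roll = fit_r_squared([(roll,) for roll, _ in shape], ddg)
_, r2_mgw = fit_r_squared([(mgw,) for _, mgw in shape], ddg)
_, r2_both = fit_r_squared(shape, ddg)
print(f"R^2 Roll: {r2_roll:.2f}, MGW: {r2_mgw:.2f}, Roll+MGW: {r2_both:.2f}")
```

Comparing R² across the Roll-only, MGW-only, and combined fits mirrors the logic of Figure 15B: if the combined model explains more variance than either variable alone, the two features complement each other.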
Figure 15. Intrinsic DNA methylation sensitivity of DNase I. This figure illustrates the original analysis performed
previously (Lazarovici et al., 2013). (A) Schematic diagram illustrating construction of the ‘shape-to-affinity model’
for predicting the binding free energy ΔΔG (negative logarithm of the relative cleavage rate) from DNA structural features, such
as Roll and MGW. (B) Fraction of the variance in binding free energy explained by the different variants of the shape-
based model when fit to all 256 unmethylated DNA sequences containing a CpG dinucleotide. (C) Effect of cytosine
methylation on each shape parameter (distribution of difference for Roll and MGW values, along the leading strand of
the hexamer), derived from all-atom Monte Carlo simulations of methylated DNA fragments. Five Roll values describe
the base pair steps in a hexamer, whereas each of the six base pairs can be assigned a MGW value. Adapted from
(Dantas Machado et al., 2015).
3.2.2.2 Implications for DNA binding of TFs
Mechanisms explaining the change in the DNase I cleavage rate upon CpG methylation
(Lazarovici et al., 2013) suggest that 5mC could have an impact on protein–DNA
interactions in general, and on the binding specificity of TFs in particular. In the case of
DNase I, it is important that two arginine side chains be present in the minor groove
(Figure 13). Thus, the magnitude of the enhancement of an interaction based on a similar
mechanism is expected to depend strongly on the DNA recognition mechanism used by
the specific protein (Levo et al., 2015; Rohs et al., 2010; Slattery et al., 2014). More
generally, the hydrophobic methyl group can influence direct contacts in the major
groove (base readout), CpG methylation can alter the 3D structure of the DNA binding
site (shape readout) and methylated cytosines can modify the nucleosome stability (and,
thus, chromatin structure). Any of these effects can occur in isolation or combination;
hence, it is unlikely that the effect of CpG methylation on TF-DNA binding can be
summarized in a set of simple rules. Moreover, DNA methylation interacts with other
epigenetic marks in a complex manner to affect transcription and regulate gene
expression (Feldmann et al., 2013; Zilberman et al., 2007). The exact mechanisms are
still not understood, although several have been proposed. TFs might be subjected to a
physical barrier created by DNA methylation, which hinders access to TFBSs. TFBSs
that are stably bound by TFs are highly resistant to de novo methylation (Gebhard et al.,
2010). Mechanical properties of DNA, such as stiffness (Pérez et al., 2012) and strand
separation (Severin et al., 2011), change upon cytosine methylation, with possible effects
on TF binding.
3.2.2.3 Effects of DNA methylation on DNA base and shape readout
Three major families of methyl-CpG–binding proteins can be distinguished on the basis
of the domain structure that they use to interact with DNA: methyl-CpG-binding domain
(MBD) proteins, SET and RING-finger associated domain proteins, and Kaiso-like C2H2
zinc-finger proteins (Fournier et al., 2012, 2012; Zou et al., 2012). The most intuitive
effect of CpG methylation on TF–DNA binding is the addition of a methyl group at the
major groove edge of the cytosine base (Figure 16A). The 5-methyl group is present in
5mC and thymine. Thus, the 5mC-G base pair can be contacted through hydrophobic
contacts, similar to how thymine is contacted in unmethylated DNA. Cytosine
methylation alters functional group signatures in the major groove, but not in the minor
groove (Figure 16B). Addition of a methyl group at the major groove edge of 5mC can be
specifically recognized through hydrophobic base contacts (base readout). Of the few
crystal structures of DNA oligonucleotides with 5mC bases that have been reported,
some of these structures suggest that the steric hindrance of the methyl group in the major
groove counters DNA bending and twisting (Tippin and Sundaralingam, 1997). The
presence of a bulky methyl group might lead to a subtle widening of the major groove
and, consequently, a subtle narrowing of the minor groove (Figure 16C), owing to the
close proximity of the methyl group to the phosphodiester backbone, which might lead to
steric hindrance (Figure 16D).
Figure 16. Base and shape readout of methylated DNA. (A) Chemical configurations of C-G (left), 5mC-G (center) and
T-A (right) base pairs. The methyl group (indicated by the C5M carbon atom) is present at the major groove edge of the
5mC-G and T-A base pairs. (B) Signatures of functional groups at the major groove (left) and minor groove (right)
edges of C-G (top), 5mC-G (center) and T-A (bottom) base pairs. The methyl group (yellow) changes the signature of
functional groups at the major groove edge of the C-G base pair, but base readout at the minor groove edge is not
affected. (C) Presence of a 5mCpG dinucleotide (C5M carbon atom of the methyl groups shown in red) in methylated
DNA (top) can affect the widths of the major (left) and minor grooves (right) compared with unmethylated DNA
(bottom) as a function of its sequence context. (D) The methyl group of the 5mC nucleotide is in close proximity to the
sugar moiety and phosphate group of the adjacent nucleotide in the 5′ direction (here, thymine). Structures in this figure
are derived from all-atom Monte Carlo simulations of naked DNA, using a previously described protocol (Lazarovici et
al., 2013; Rohs et al., 2005a).
DNA-binding proteins recognize the 3D DNA structure (shape readout) (Rohs et al.,
2009a, 2009b), which is highly sequence dependent (Dror et al., 2014; Gordân et al.,
2013; Yang et al., 2014). As discussed above (Lazarovici et al., 2013), the impact of
DNA methylation on DNA structure can explain the methylation state-dependent DNase
I cleavage rate. Local DNA shape features were powerful predictors of the effect of
cytosine methylation on DNA shape (Lazarovici et al., 2013). The effect was strongly
sequence dependent in that it only occurred for protein–DNA binding events that led to
cleavage of the phosphate immediately 5’ of the CpG dinucleotide. However, the
favorable (negative) contribution of methylation to the overall protein–DNA binding free
energy was largely independent of the base identity at other nucleotide positions near the
cleavage site. Whereas the DNase I cleavage rates of unmethylated CpG-containing
sequences varied widely, the fold increase in these cleavage rates because of methylation
was the same for all sequences. Using all-atom MC simulations (Rohs et al., 2005a;
Zhang et al., 2014), we observed that the Roll angle of CpG dinucleotides consistently
increased upon cytosine methylation (Lazarovici et al., 2013). This effect occurred for
both fully and hemi-methylated CpG steps; only the size of the increase in Roll
depended on sequence context. Roll decreased at the two base pair steps surrounding a
CpG dinucleotide, to compensate for the increased Roll at the CpG step. This observation
was in agreement with all-atom molecular dynamics simulations, in which the increase in
Roll at CpG steps was the most pronounced effect of cytosine methylation on DNA
structure (Pérez et al., 2012).
3.2.2.4 Insights from structures of protein-DNA complexes
Few structures containing 5mC bases have been solved by X-ray crystallography or
nuclear magnetic resonance spectroscopy. Several structural studies have demonstrated
the importance of the 5mC methyl group for contacts with hydrophobic patches on the
protein surface of MBDs (Ohki et al., 2001; Scarsdale et al., 2011) or with water
molecules (Ho et al., 2008; Mayer-Jung et al., 1998). Understanding the role of
hydrophobic contacts with the 5mC methyl group in the major groove for methylation
state-specific binding ideally requires 3D structures of a protein bound to methylated and
unmethylated copies of its DNA target. However, this information is only available for a
few TFs, including the zinc-finger proteins Kaiso (Buck-Koehntop et al., 2012) and
Kruppel-like factor 4 (Klf4) (Liu et al., 2014; Schuetz et al., 2011). Recognition
processes for methylated and unmethylated DNA use similar overall geometries. When
Kaiso or Klf4 is in contact with methylated DNA (Liu et al., 2013, 2014), arginine and
glutamate form hydrophobic contacts with the cytosine methyl groups. For Kaiso, a
hydrophobic pocket built of threonine and cysteine was observed to contact another
methyl group (Buck-Koehntop et al., 2012). Interestingly, the binding affinities of Klf4 to
methylated DNA and to its unmethylated target are similar (Liu et al., 2014). In the case
of the zinc-finger protein Zfp57, structural information is only available for binding to
methylated DNA. The cytosine methyl group of the Zfp57 target site engages in
hydrophobic contacts with arginine (Liu et al., 2012). Taken together, these data support
the hypothesis that hydrophobic contacts with methyl groups merely fine-tune the
binding specificities of TFs. If the presence of a methyl group confers binding specificity,
then in principle, 5mC should mimic the presence of a thymine. This possibility has been
experimentally proven for the binding site of the lac repressor. In this case, replacement
of a thymine by cytosine caused loss of activity, which was restored when thymine was
replaced by 5mC (Razin and Riggs, 1980). Another example is the complex of the P22 c2
repressor and its operator. In this case, four thymine methyl groups form a hydrophobic
binding pocket for a specific valine contact (Watkins et al., 2008). Contacts with the
thymine methyl group were described as intermediates between weak hydrogen bonds
and strong van der Waals interactions (Mandel-Gutfreund et al., 1998). A TF or other
DNA-binding protein physically interacts via the electrostatic potential at the molecular
surface of the DNA (Rohs et al., 2009a). DNA is a polyanion, and its surface is
dominated by a negative electrostatic potential. Whereas DNA phosphate groups and
bases carry negative charges, the sugar moieties, thymine and 5mC methyl groups of
DNA are positively charged (Rohs et al., 2010). Thus, the electrostatic potential at the
molecular surface of these hydrophobic groups is less negative than at the remainder of
the DNA surface.
3.2.2.5 Large-scale binding assays reveal methylation-dependent binding
The hypothesis that DNA methylation leads to transcriptional inhibition is still being
debated. Despite studies on the effects of cytosine methylation on gene expression
(Gutierrez-Arcelus et al., 2013), the underlying mechanisms are still not understood. A
recent high-throughput study of TF binding (Hu et al., 2013), which combined in vitro
binding assays with in vivo validations, offered a rich perspective on this question. This
study used protein microarrays to probe the binding of 1321 human TFs and 210
cofactors to 154 TF binding motifs containing at least one CpG dinucleotide. In general,
DNA methylation did not inhibit the binding of TFs from different families. The in vitro
assay indicated interactions of at least one, and a median of eight, TFs with each of the
studied CpG-containing motifs. Moreover, 5mC did not inhibit binding per se. A subset
of 41 TFs and 6 cofactors from different protein families exhibited specific or nonspecific
5mC-dependent binding. Further analysis of methylated consensus TFBSs with known
binding motifs showed almost no correlation, suggesting that cytosine methylation
created completely different binding sites for some TFs (Hu et al., 2013).
3.3 STUDIES OF THE MYOCYTE ENHANCER FACTOR-2 (MEF2)
After studying many factors that contribute to protein-DNA recognition mediated through
DNA shape readout across various proteins (Chang et al., 2013; Chen et al., 2013; Dantas
Machado et al., 2015; Deng et al., 2015; Lazarovici et al., 2013), I sought to focus on
a specific TF to better understand the main mechanisms driving DNA recognition. To this
end, MEF2 was selected from a pool of candidate proteins for further study.
Features of members of the MEF2 family that made them particularly good candidates
for understanding DNA shape readout include 1) interactions within a narrow minor groove
of the DNA, suggesting the previously described mechanism of shape recognition; and 2)
mutations either in the protein or in MEF2 DNA binding sites that have been associated
with different health conditions, motivating further studies to elucidate disease
mechanisms.
In order to understand the role of DNA shape in MEF2-DNA recognition, we analyzed
structural data and high-throughput protein-DNA binding data from members of the
MEF2 family. We then focused on MEF2B, a member of the MEF2 family shown to be
mutated in B-cell lymphoma (Morin et al., 2011; Zhang et al., 2013). The results
presented here suggest a possible role of DNA shape in MEF2 binding preferences.
3.3.1 ANALYSIS OF THE CRYSTAL STRUCTURE REVEALS DNA SHAPE READOUT
Members of the MEF2 family were selected as top candidates for unraveling the
contributions of DNA structural features to protein-DNA recognition based on analysis of
their crystal structures. One particular feature that made MEF2 a good candidate was its
interactions with the DNA minor groove, which were very pronounced for the N-terminal
tail and the two helices that interact with the central DNA region (Figure 17A).
Structural analysis of the DNA in complex with the MEF2B dimer (Han et al., 2003) reveals
basic residues of the N-terminal tail and the alpha-helix interacting with the DNA in a
region where the minor groove is narrow and the negative electrostatic potential is
enhanced (Figure 17B). The narrowest region of the minor groove measures 2.7 Å, compared
to 5.8 Å in standard B-DNA (Figure 17B, blue line). The electrostatic potential in the
central region is also more negative; from the central narrowest point of the minor
groove to the edge of the binding site, the electrostatic potential varies by about
7 kT/e (Figure 17B, red line). Analysis of other complexes in which MEF2 binds to DNA
reveals similar features with regard to the enhanced negative potential induced by the
narrowing of the minor groove (Figure 17C, minor groove profile of bound DNA).
Furthermore, the predicted minor groove width for the unbound DNA also shows a narrowing
of the central region of the binding site (Figure 17C, prediction of unbound DNA).
Therefore, DNA recognition by MEF2 conforms to a mechanism of DNA shape readout in the
minor groove whereby positively charged residues recognize DNA shape.
Figure 17. DNA shape recognition by MEF2. (A) A representation of the MEF2B in complex with DNA (PDB ID
1N6J) (Han et al., 2003) shows where basic residues (illustrated as sticks) are interacting with the minor groove. (B)
Plot of minor groove width and electrostatic potential as a function of nucleotide sequence. Minor groove width is
depicted in blue and electrostatic potential is depicted in red. Arrows indicate positions in which basic residues (as
marked in “A”) interact with the minor groove. (C) Minor groove width and electrostatic potential for other complexes
of MEF2-DNA aligned based on the core binding site. PDB IDs are shown on the left side of the panels for results of
the respective DNA sites bound by MEF2. A complete list of references is shown in Table 2. Darker green represents
narrower regions of minor groove width, and dark red represents more negative regions of electrostatic potential.
3.3.2 HIGH-THROUGHPUT DATA OF MEF2 TFS SUGGESTS DNA SHAPE CONTRIBUTIONS
Based on previous results and the fact that MEF2 proteins have a consensus DNA
binding site indicative of shape readout, we next sought to determine whether
high-throughput protein-DNA binding data could reveal specific DNA shape preference
patterns for MEF2.
We first analyzed the minor groove width profiles of ChIP-seq binding sites for MEF2A
and MEF2C. Figure 18A shows heatmaps in which each column
represents a position within the binding site and each row represents a bin of 30
sequences aligned based on the PWM. While ordering the binding sites by PWM score
reveals a clear pattern of narrow minor groove in the center, ordering them by peak
enrichment does not reveal any positional pattern within the binding site for the top
50% of peaks by intensity (not shown). When ranked by PWM score, the narrow
minor groove pattern in the central region is seen among the top-scoring motifs (bottom
of heatmap), which is expected since the consensus motif has a central A-tract.
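The binning behind such heatmaps can be illustrated with a minimal sketch. It assumes that per-position minor groove width values have already been predicted for each aligned site and that the sites are sorted by the chosen ranking; the function name, the toy values and the bin size of 2 are illustrative assumptions and not the code used for Figure 18.

```python
from statistics import mean

def bin_average_profiles(profiles, bin_size=30):
    """Average per-position values over consecutive bins of `bin_size` rows.

    `profiles` is a list of equal-length lists (one row per binding site),
    already sorted by the chosen ranking. Returns one averaged row per bin;
    the last bin may hold fewer than `bin_size` rows.
    """
    rows = []
    for start in range(0, len(profiles), bin_size):
        chunk = profiles[start:start + bin_size]
        # zip(*chunk) iterates over columns, i.e. binding-site positions
        rows.append([mean(column) for column in zip(*chunk)])
    return rows

# Toy data (made-up values): 4 sites x 3 positions, averaged in bins of 2
toy_profiles = [
    [5.0, 4.0, 5.0],
    [5.2, 3.6, 5.4],
    [6.0, 5.0, 6.0],
    [6.2, 5.4, 5.8],
]
heatmap_rows = bin_average_profiles(toy_profiles, bin_size=2)
```

Each returned row corresponds to one heatmap row, and each entry to one position within the binding site.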
Figure 18. Analysis of DNA shape from ChIP-seq data for MEF2A and MEF2C. (A) Heatmaps show minor groove width profiles
across the top 50% of binding sites for each TF. Each column of the heatmap represents a position within the binding site, and
each row represents a bin holding, for each position, the average minor groove width of 30 sequences. Rows
are aligned based on PWM score. Bottom logos show the PWM for each dataset obtained from JASPAR (see Methods for
details). (B) Boxplot comparing minor groove width distributions of the top 50% of binding sites between MEF2A
and MEF2C. Red asterisks indicate positions at which the distributions of MGW values differed significantly (Mann-Whitney
U test, p-value < 0.001).
Even though the PWM logos are similar between MEF2A and MEF2C, nuances in DNA
features beyond what can be seen in PWMs could account for binding preferences. A
boxplot comparing positions within the binding sites of the two datasets shows
differences in MGW distributions between MEF2A and MEF2C (Figure 18B; red asterisks,
Mann-Whitney U test, p < 0.001).
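A per-position comparison of this kind can be sketched as follows. This is a minimal pure-Python Mann-Whitney U test that uses the normal approximation with tie correction and no continuity correction; the function names are hypothetical, and the statistical package actually used for Figure 18B may compute p-values differently (for example, exactly for small samples).

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic of sample x, p-value). Tied values receive average
    ranks and the variance carries the standard tie correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0      # mean of the 1-based ranks i+1 .. j
        t = j - i                         # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0                     # all values identical
    z = (u - mu) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2.0))

def differing_positions(profiles_a, profiles_b, alpha=0.001):
    """Indices of positions whose value distributions differ at level alpha."""
    hits = []
    columns = zip(zip(*profiles_a), zip(*profiles_b))
    for position, (col_a, col_b) in enumerate(columns):
        _, p_value = mann_whitney_u(col_a, col_b)
        if p_value < alpha:
            hits.append(position)
    return hits
```

Applied to two sets of aligned minor groove width profiles, `differing_positions` returns the positions that would be marked with an asterisk under these assumptions. No correction for multiple testing is applied in this sketch.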
We next used in vitro DNA binding data from HT-SELEX to investigate intrinsic DNA
shape preferences of MEF2D. To this end, multiple linear regression (MLR) was used
to predict binding to DNA targets. The model was trained with 10-fold cross-validation,
and model accuracy was measured by the coefficient of determination R². Adding
shape parameters (MGW, Roll, PropT, HelT) to a model that considers only
sequence (1mer) increases prediction accuracy by 12% for the addition of 1st order shape
parameters, and by 21% for the addition of 1st + 2nd order shape parameters (Figure 19B-C).
Prediction accuracy also increases with the addition of k-mers (2mers and 3mers) that
account for dependencies between adjacent bases (Figure 19 E-F).
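How the inputs to such models might be assembled can be sketched as below. The one-hot k-mer encoding and the R² formula are standard; treating 2nd order shape features as products of shape values at adjacent positions is an assumption made here for illustration, as are all function names. The regression fit itself is omitted.

```python
from itertools import product

BASES = "ACGT"

def kmer_one_hot(seq, k=1):
    """One-hot encode every k-mer window of `seq` (4**k entries per window)."""
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    features = []
    for start in range(len(seq) - k + 1):
        block = [0] * (4 ** k)
        block[index[seq[start:start + k]]] = 1
        features.extend(block)
    return features

def second_order(shape_profile):
    """Products of a shape feature at adjacent positions (assumed definition)."""
    return [a * b for a, b in zip(shape_profile, shape_profile[1:])]

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k interleaved folds."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test

# Feature vector for one toy site: 1mer one-hot plus 1st and 2nd order MGW
toy_site = "CTAAAAATAG"
toy_mgw = [5.1, 4.6, 3.9, 3.4, 3.2, 3.4, 4.0, 4.8]   # made-up values
feature_vector = kmer_one_hot(toy_site, k=1) + toy_mgw + second_order(toy_mgw)
```

In a cross-validated setting, R² would be computed on the held-out fold of each split and then averaged; a model that adds 2mer or 3mer terms simply concatenates `kmer_one_hot(seq, k=2)` or `kmer_one_hot(seq, k=3)` to the feature vector.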
Figure 19. Model performance using k-mers and shape features for MEF2D binding data. (A) Model with shape
features only (MGW, Roll, PropT, HelT). (B) Combination of 1mer information and shape features (as described in
“A”). (C) Combination of 1mer information, shape features and 2nd order shape features. (E-G) Performance of models
using different k-mers.
These results suggest that DNA shape features are important for binding specificity and
contribute to binding affinity prediction in a model that considers interdependency
between adjacent nucleotides.
3.3.3 INVESTIGATING MEF2B MUTANTS’ ABILITY TO INTERFERE IN DNA READOUT
Given the previous results showing a possible role for DNA shape in MEF2-DNA
recognition, at both the structural and the high-throughput DNA binding levels, we next
asked whether MEF2 mutations could alter the protein’s ability to recognize its DNA
targets in vitro. For this purpose, MEF2B mutations previously described in B cell
lymphoma (Morin et al., 2011; Ying et al., 2013) or listed in the COSMIC database
(Forbes et al., 2015) were collected. A compilation of these mutations is shown in
Table 7. Figure 20A shows the positions of the mutated residues selected for this study
in stick representation on the previously published structure of MEF2B (Han et al.,
2003). The positions of the mutations selected for this work are also shown aligned to
the protein secondary structure and to additional reported mutations (Figure 20B). While
many of these variants could affect the ability of the protein to either dimerize or
recognize its binding partners (Molkentin et al., 1995; Ying et al., 2013), here we
focused on the variants that appear to interfere with the DNA binding interface.
Simulations of protein-DNA complexes with the Rosetta software reveal that the binding
energy distributions of the WT and some of the selected mutants are significantly
different (p < 0.05). Given that binding energies were predicted to differ for some of
these mutations, we asked whether shape and base readout could also be affected,
altering the intrinsic binding preferences of MEF2B.
Figure 20. MEF2B mutations selected for this study. (A) Representation of the MEF2B in complex with DNA (PDB ID
1N6J) (Han et al., 2003) showing selected residues that were mutated. (B) The protein secondary structure of MEF2B
obtained from PDBsum (de Beer et al., 2014) is displayed along with amino acid sequence and conservation for each
position. The aligned panels illustrate the position of missense mutations of MEF2B selected for this study, indicated as
black tick marks, and the distribution of mutations for MEF2B from the COSMIC database
(http://cancer.sanger.ac.uk/cosmic) (Forbes et al., 2015), where height indicates the frequency and different colors the
identity of the amino acid. (C) Plot of binding energy (energy of the bound complex minus energy of unbound
components) obtained from the Rosetta sampling method for protein-DNA complexes. Each box of the boxplot
corresponds to energies for each of the indicated substitutions on the protein. Asterisks indicate p-value < 0.05 (black)
and p-value < 0.001 (red).
3.3.4 USING SELEX-SEQ TO INTERROGATE MEF2B BINDING PREFERENCES
In order to investigate the DNA binding preferences of MEF2B and its mutants, we chose
to carry out SELEX-seq. Over multiple rounds of enrichment, this method yields
extensive sequence information, allowing for a better characterization of this
transcription factor’s binding repertoire. First, MEF2B WT and mutant proteins were
purified, followed by EMSAs to assess protein-DNA binding (Appendix C, Figure 29).
The SELEX-seq library was designed as described in the Methods section, and
validation included the use of two different sets of controls as well as varying
concentrations of salt and competitor (Appendix C, Figure 30).
Figure 21. Inferring SELEX-seq relative affinities. (A) Markov models of different orders were built and evaluated based
on model performance (R²). (B) Information gain was calculated for selection at R2 (round 2) as a function of sequence length.
(C) 9mers were chosen for further affinity-based studies so as to increase the k-mer count for high-affinity binding sites (> 0.9
relative affinity). (D) Comparison between relative affinities for 9mers calculated from enrichments of R0/R2 and
R0/R1.
To take into consideration the non-randomness of the initial oligonucleotide library pool
used for selection, Markov models of different orders were built and evaluated, where a
fifth-order Markov model was found to be the best at fitting our data (Figure 21A). The
information gain after two rounds of selection was then calculated to determine the
effective length of the binding site. In spite of the low information gain, 10mers
seem to be the best length of the DNA binding site for MEF2B (Figure 21B). The
subsequent analyses were done on sequences of length k=9 so as to increase the number of
sequences available, such that the k-mer count was higher than 10⁴ for higher affinity
sequences (Figure 21C). Relative affinities calculated between R0/R1 (initial pool versus
Round 1) and R0/R2 show comparable enrichments for each k-mer evaluated between
these rounds (Figure 21D). The fact that higher affinity sequences have not converged
towards very few sequences could indicate a bigger repertoire used by MEF2B to recognize
its sequence targets, as well as the need for further rounds of selection.
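The two steps above, modeling the initial pool and deriving relative affinities, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the pipeline used in this work: the Markov model is estimated from simple window counts in the R0 reads, and relative affinity is assumed to be the enrichment of a k-mer (observed frequency after selection divided by its expected R0 frequency) raised to the power 1/(number of rounds) and scaled so that the best k-mer equals 1. All function names and the toy reads are hypothetical.

```python
from collections import Counter

def train_markov(reads, order):
    """Count windows of length `order` and `order` + 1 in the R0 reads."""
    contexts, extensions = Counter(), Counter()
    for read in reads:
        for i in range(len(read) - order + 1):
            contexts[read[i:i + order]] += 1
        for i in range(len(read) - order):
            extensions[read[i:i + order + 1]] += 1
    return contexts, extensions

def kmer_probability(kmer, contexts, extensions, order):
    """P(kmer) = P(first `order` bases) * product of P(base | preceding bases)."""
    prob = contexts[kmer[:order]] / sum(contexts.values())
    for i in range(order, len(kmer)):
        context = kmer[i - order:i]
        total = sum(extensions[context + base] for base in "ACGT")
        if total == 0:
            return 0.0                    # context never observed in R0
        prob *= extensions[context + kmer[i]] / total
    return prob

def relative_affinities(selected_freq, expected_r0_freq, n_rounds):
    """Enrichment ** (1 / n_rounds), scaled so the top k-mer has affinity 1."""
    enrichment = {
        kmer: (freq / expected_r0_freq[kmer]) ** (1.0 / n_rounds)
        for kmer, freq in selected_freq.items()
        if expected_r0_freq.get(kmer, 0) > 0
    }
    top = max(enrichment.values())
    return {kmer: value / top for kmer, value in enrichment.items()}

# Toy example with made-up reads and frequencies
r0_reads = ["ACGT", "ACGA"]
contexts, extensions = train_markov(r0_reads, order=1)
p_acg = kmer_probability("ACG", contexts, extensions, order=1)
toy_affinities = relative_affinities(
    {"AAA": 0.4, "CCC": 0.1}, {"AAA": 0.1, "CCC": 0.1}, n_rounds=2
)
```

Because the result is normalized to the top k-mer, constant factors such as sequencing depth cancel out, which is why frequencies rather than raw counts are used in this sketch.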
Analysis of the selected sequences from R2, though, indicates that high relative binding
affinities are observed for k-mers conforming to the consensus MEF2 motif
YTA[W]₄TAR (Figure 22). When considering k-mers with relative binding affinity > 0.9,
the PWM logo obtained with the MEME-ChIP software reveals a sequence motif similar
to the consensus site for MEF2, although with more variability at the central region
(Figure 22A). Likewise, the sequence distribution of k-mers indicates that sites
conforming to the motif (YTAWWWWTA) are among the k-mers with highest relative
binding affinities (Figure 22B). Interestingly, small differences in sequence seem to
affect relative affinities of these k-mers. For example, varying the pyrimidine base (C or
T) at the first position relative to the consensus site (YTAWWWWTA) seems to decrease
overall relative binding affinity. Furthermore, sequences not perfectly matching the
consensus sites are observed among the high affinity sites, suggesting that a combination
of readout modes beyond the sequence content itself plays a role in binding affinity
(Figure 22B, orange, lime green and cyan dots).
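Checking whether a k-mer conforms to a degenerate consensus such as YTAWWWWTA can be sketched with the standard IUPAC nucleotide codes (Y = C or T, R = A or G, W = A or T, N = any base). The function names are illustrative, and the choice to also test the reverse complement is an assumption about how double-stranded sites would be scored.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def matches(kmer, motif):
    """True if every base of `kmer` is allowed by the IUPAC code at that position."""
    return len(kmer) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(kmer, motif)
    )

def matches_either_strand(kmer, motif="YTAWWWWTA"):
    """True if the k-mer or its reverse complement conforms to the motif."""
    return matches(kmer, motif) or matches(reverse_complement(kmer), motif)
```

For example, `matches_either_strand("CTAAAAATA")` is true, whereas a k-mer with a G or C in the central four positions, such as `"CTAGAAATA"`, does not match on either strand.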
Figure 22. SELEX-seq profile for WT MEF2B. (A) Logos representing PWMs for 9mers with relative affinity > 0.9
generated using MEME-ChIP (Bailey et al., 2009). (B) Strip chart of relative binding affinity of k-mers compared to the
consensus motif YTAWWWWTA. (C-E) DNA shape parameters compared between lower affinity (< 0.2) (n=60) and higher
affinity (> 0.6) (n=71) sequences matching YTANNNNTA, suggesting DNA
shape differences at specific positions between higher and lower affinity sites.
To investigate DNA binding determinants between high affinity and low affinity sites, we
compared DNA shape parameters for k-mers conforming to the sequence YTANNNNTA
(Figure 22C-E). This analysis shows differences in propeller twist, helix twist and minor
groove width between these 2 groups. More specifically, a more negative propeller twist
of the central region, together with an increased helix twist and narrower minor groove
width, are seen for higher affinity sites (Figure 22C-E, magenta line) in comparison to
lower affinity binding sites. Therefore, our preliminary results indicate that MEF2B
employs base and shape readout to achieve specificity. Further studies with more rounds
of selection, utilizing MEF2B mutants, combining k-mer and shape features, and using
regression models to predict the binding affinities derived from our in vitro approach
will likely elucidate the determinants of MEF2B-DNA binding specificity.
4 DISCUSSION
The readout modes used by proteins to recognize their DNA binding sites are diverse
(Rohs et al., 2010). Uncovering the determinants of these finely tuned interactions is a
key component in understanding the much more complex system that controls regulatory
processes within cells. In this dissertation, I investigated how histidines and lysines,
in addition to arginines, could play a role in DNA shape readout. In addition, nuances in
DNA geometry that can accommodate binding of p53 differently than previously
hypothesized were examined. Through high-throughput analysis of protein-DNA
interactions, I explored how DNA shape can be used as a mechanism by various proteins
to recognize their target sites.
4.1 BROADER ARRAY OF READOUT MODES FROM CRYSTAL STRUCTURE ANALYSIS
4.1.1 DNA SHAPE READOUT RECOGNITION BY THE RNA POLYMERASE αCTD
The results described for DNA recognition by the αCTD subunits conform to the
previously described shape readout mechanism mediated by arginines (Rohs et al.,
2009a). In the case observed here, R265 is responsible for DNA shape readout. The
importance of shape readout by R265 is supported by previous studies in which
mutation of this arginine residue abrogates binding to the UP region (Ross and Gourse,
2005). Furthermore, changing a position within the target site such that the length
of the A-tract decreases (from A₆ to A₅) reduces binding for one of the αCTD subunits.
Therefore, it is possible that in the presence of high affinity binding sites, both subunits
could be bound to the DNA, while reduced binding is observed for sites that deviate
from the highest affinity targets.
4.1.2 INTRICACIES OF DNA SHAPE READOUT RECOGNITION BY HISTIDINES
The electrostatic attraction of histidine to DNA requires its protonation, and histidine
residues are frequently protonated in the highly charged environment of DNA (Joshi et al.,
2007). In the case of the LTag dimer, the presence of a hydrogen bond is indicative of
one histidine being protonated to provide a hydrogen-bond donor, while the second
histidine is unprotonated. Thus, the total charge of the histidine pair is likely +1. The
H513 pair establishes two crucial interactions within the ternary complex: (1) a hydrogen
bond within the LTag dimer and (2) electrostatic attraction of the His513 pair into the
minor groove. This readout mechanism of sequence-dependent DNA shape may explain
the initial anchoring of the subunits to the DNA, which may be of key importance for
LTag assembly. Histidine was previously observed to bind a narrow minor groove region
of the binding site of the Hox protein Sex combs reduced (Scr), but in tandem with an
arginine residue (Joshi et al., 2007), whose guanidinium group forms a hydrogen bond
with the histidine. Similar to the situation in the LTag dimer, this hydrogen bond
can only form when the histidine is not protonated, assigning the Arg3-His-12 pair a total
charge of +1. This conclusion is supported by the fact that a His-12 mutant of Scr had
only a minor effect on binding and in vivo activity compared with a mutant of the charged
Arg3 (Joshi et al., 2007). In LTag dimer binding to DNA, however, histidine takes on a
key role in protein-DNA recognition without the presence of an arginine side chain. The
use of DNA shape readout by histidine on its own, as observed for LTag, is also present
in other biological systems based on our analysis of co-crystal structures of the IFN-β
enhanceosome (Escalante et al., 2007; Panne et al., 2004,
2007) (Figure 7). We conclude that histidine in general uses a readout mechanism for
protein-DNA interactions in a manner similar to that previously described for arginines
(Rohs et al., 2009a). The critical role of histidine in the interaction of LTag with DNA is
also apparent from its high sequence conservation, as the H513 residue is highly
conserved among all LTag proteins in polyomaviruses and within the distantly related E1
helicase of papillomaviruses (Figure 23).
Figure 23. Multiple sequence alignment around H513. The alignment shows that H513 (blue highlight) is absolutely
conserved in polyomaviruses and in the distantly related E1 helicase of papillomaviruses. The alignment was done using the
alignment tool T-Coffee and viewed with Jalview. Adapted from (Chang et al., 2013).
Published results show that mutation of H513 of LTag to alanine affects DNA unwinding
and origin melting (Kumar et al., 2007; Shen et al., 2005). Additionally, mutation of the
residue equivalent to H513 to alanine in E1 helicase disrupted ori DNA binding and
unwinding (Liu et al., 2007; Schuck and Stenlund, 2005).
4.1.3 DNA SHAPE READOUT BY LYSINES IN THE MINOR GROOVE
In contrast to our previous observations for arginine (Rohs et al., 2009a) and histidine
(Chang et al., 2013), Fur uses lysine to recognize the enhanced negative electrostatic
potential in the minor groove. Insertion into the narrow minor groove is associated with
desolvation, whose energetic cost is higher for lysine compared with arginine (Rohs et
al., 2009a). While lysine insertion into the minor groove has been previously observed
(Glasfeld et al., 1999), through its recognition of enhanced negative electrostatic potential
in narrow minor groove regions, Fur extends the possible repertoire of shape readout by
basic amino acids. A survey of the PDB revealed other examples of lysine recognizing
the narrow minor groove (Figure 9), although the number of examples was much smaller
than for arginine (Rohs et al., 2009a).
We suggest that these sequence-dependent effects on DNA structure confer the specific
DNA conformations observed for Fur. Consensus sequences bound by Fur in E. coli, P.
aeruginosa, V. cholerae and C. jejuni are also AT rich (Butcher et al., 2012; Davies et al.,
2011; de Lorenzo et al., 1987; Ochsner and Vasil, 1996). Hydroxyl radical-cleavage
intensities demonstrated that the narrow minor groove is an intrinsic feature of the P.
aeruginosa Fur box and other Fur-binding sequences. In our structures, residues in the L1
loop and α2 helix detected the minor groove width through interactions with the DNA
backbone, and positioned Lys15 so that it inserted into the minor groove and interacted
with its negative electrostatic potential. Thus, shape readout, rather than specific
sequences for high-affinity recognition, may be a hallmark of the Fur proteins. Instead of
being able to read only a stringent promoter sequence, this specific property of
recognizing DNA shape confers a global regulatory function to Fur.
Individual contacts between Fur and DNA have precedents in other structural classes of
DNA-binding proteins. For example, a recognition helix of a helix-turn-helix motif
inserting into the major groove (base readout) is commonly found in repressor proteins.
Side chains of positively charged residues inserting into a narrow minor groove are
observed in homeodomains and other protein families (Joshi et al., 2007; Rohs et al.,
2009a). This interplay between base and shape readout has been described for many
transcription factors (Slattery et al., 2014). We showed that Lys15 and a narrow minor
groove are essential features for Fur function. AT-rich sequences tend to form a narrow
minor groove because of negative propeller twisting, which is stabilized by inter-base-
pair hydrogen bonds in the major groove (Hancock et al., 2013).
4.1.4 P53 AND DNA CONFORMATIONAL CHANGES OF ITS RES BAX AND P21
In prior studies, it has been assumed that an extra base pair inserted between the two half-
sites will change the relative orientation and distance between the two p53 dimers,
leading to a loss of binding cooperativity. Another possible outcome could be that one
p53 dimer binds to a half-site, while the other dimer binds nonspecifically by moving out
of register by 1 bp to maintain the stabilizing protein–protein interactions. This
mechanism has been observed for the binding of the homomeric glucocorticoid receptor
to a DNA target with an additional 1-bp insertion (Luisi et al., 1991).
However, our studies show that p53 binds the BAX RE through a conserved core domain
tetramer architecture similar to the one observed on the contiguous DNA of 3KMD. The
DNA is unwound and compressed to accommodate the 1-bp insertion in the BAX-RE.
Unlike many higher-order protein–DNA complexes in which DNA bends to facilitate
protein–protein interactions (Chen et al., 1998), the DNA structural change observed here
is not a global bend but rather a highly localized distortion of the double helix in the
central region between the two half-sites. This unique feature of structural change in
DNA is likely imposed by the four protein–protein interfaces and the planar structure of
the p53 core domain tetramer. In other words, to compensate for the 1-bp insertion, DNA
bending apparently cannot simultaneously satisfy the geometric constraints of all four
protein–protein interfaces, whereas unwinding and compression of the central DNA
region will enable the p53 dimers to bind in a sequence-specific manner and reestablish
the spatial orientation required for core tetramer assembly. For the BAX RE to
accommodate the additional base pair, the DNA is deformed and partially disordered
around the spacer region, resulting in an apparent unwinding and compression, such that
the interactions between the dimers are maintained.
Further work on the BAX and p21 REs also demonstrated that, in solution, p53DBD binding
induces conformational changes at the central region of the p21 and BAX REs. Perhaps
unexpectedly, the degree of p53-induced DNA alteration was more subtle in the
1-bp-spacer BAX-RE as compared with that in the 0-bp-spacer p21-RE, whereas the
p53DBD-bound complexes exhibited a similar tetrameric scaffold for both REs. This provides a
hint that sequence-dependent structural properties encoded in a particular DNA target are
exploited by p53 to achieve the energetically most favorable mode of deformation. This
hypothesis is further supported by structural analyses of unbound REs, which were
enabled by the all-atom models provided by the new SDSL-MC method.
4.2 HOW DNA SHAPE AND CPG METHYLATION CAN AFFECT DNASE I CLEAVAGE
The relationship between MGW and DNase I cleavage rate indicates a recognition
mechanism similar to the recently described binding of arginine residues to narrow
regions of the minor groove (Rohs et al., 2010). Such minor groove shape readout is
based on the enhancement of negative electrostatic potential in narrow groove regions,
which in turn allows for a stronger interaction with positively charged arginine residues
(Rohs et al., 2009a). The increase in DNase I cleavage rate with narrowing of the minor
groove is likely to be based on the attraction of the two arginine side chains through such
locally enhanced negative electrostatic potential. The opposite sign of the correlation
between MGW and cleavage rate at the −1 and +1 positions (Figure 13) also makes
structural sense. Earlier reports have shown that the phosphodiester backbone at
purine–pyrimidine (RpY) dinucleotides, which intrinsically widen the minor groove, is
cleaved by DNase I at higher rates (Brukner et al., 1990; Lomonossoff et al., 1981). Having a
widened minor groove where the backbone is cleaved would thus seem to be beneficial
(Bishop et al., 2011). The finding that CpG methylation can enhance DNase I cleavage at
adjacent sites is consistent with, and greatly extends, an earlier observation that
methylation of the central cytosine in the sequence GCGC renders the 5′ phosphate more
susceptible to cleavage by DNase I (Fox, 1986; Kochanek et al., 1993). It has been
previously noted that DNA binding proteins such as DNase I may be regarded as
structural probes of DNA conformation and flexibility (Brukner et al., 1990). The present
work shows that intrinsic DNA shape is an important recognition signal. Finally, and
perhaps most importantly, our in-depth study of DNase I allowed us to uncover a
structural mechanism that plausibly answers the long-standing question of how cytosine
methylation modulates protein–DNA interaction.
Further analysis of the same dataset strongly suggests that DNA methylation affects
DNA shape, although a difference in roll angle seems more apparent than the narrowing
of the minor groove itself. DNase I activity is probably sensitive to this change in DNA
conformation, which could explain our observation of enhanced cleavage adjacent to
methylated CpG base pair steps. Change in DNA shape may thus be the general
mechanism by which the addition or removal of methyl groups in the major groove
influences gene expression.
4.3 INSIGHTS FROM MEF2-DNA RECOGNITION
We investigated features of the DNA binding sites of MEF2B in order to explore the
mechanisms employed for recognition of its TFBSs. Previous studies have
revealed a consensus binding site YTA[W]₄TAR (Potthoff and Olson, 2007) and
highlighted the importance of an AT-rich sequence in the central region of the target site
(Andrés et al., 1995; Katoh et al., 1998; Santelli and Richmond, 2000). Building on
these studies, this work aims to shed light on the contributions of base and shape
readout to DNA recognition by MEF2B.
Here, by analyzing crystal structures that revealed 3D features of MEF2-DNA
recognition (Han et al., 2003; Santelli and Richmond, 2000), we found that MEF2
conforms to a shape readout mechanism in which positively charged residues recognize the
enhanced negative electrostatic potential induced by the narrowing of the minor groove
(Rohs et al., 2009a). This is supported by early observations of a narrow minor
groove in the binding site of MEF2A (Santelli and Richmond, 2000), and by the
importance of the central A-tract for proteins containing a MADS-box domain (Muiño et
al., 2014). Furthermore, studies of mutations in residues that are conserved among
members of the MEF2 family and that seem to play a role in shape readout support the key
role of these residues in achieving binding specificity both in vitro (Molkentin et al.,
1996) and in vivo (Rocha et al., 2016; Ying et al., 2013).
Our analysis of ChIP-seq data for MEF2 TFs also shows a pattern of narrowing of the
minor groove in the central region, consistent with the consensus binding site and the
importance of the minor groove for the MADS-box TF SEP3 in plants (Muiño et al.,
2014). Despite the similarities in PWMs between MEF2A and MEF2C, DNA shape
profiles for the targets of these two TFs show positional differences in minor groove
width distributions (Figure 18B). Nuances in the DNA geometry of the binding site can
play a role in achieving specificity for different DNA targets; the observed differences
could result from intrinsic differences in binding site preferences between the two
proteins, from the binding of other factors, or from chromatin structure. Such subtle
variations in DNA shape have been seen for bHLH TFs (Yang et al., 2014).
We also analyzed previously published in vitro HT-SELEX data for MEF2D (Jolma et al.,
2013) to circumvent the issues encountered with in vivo data. For MEF2D, adding shape
features to a sequence-only model to predict binding affinity greatly improves model
performance. The concept of DNA shape contributing to protein-DNA binding specificity
has been gaining attention and was recently investigated for different TFs (Zhou et al.,
2015). While including shape features or the dependency between nucleotides can
improve binding specificity prediction, these contributions are TF-specific and most
likely greatly depend on the position within the binding site, as recently shown for Hox
TFs (Abe et al., 2015).
Based on the previous analysis of MEF2 TFs, we next chose to study MEF2B for its
association with B cell lymphoma. Recent studies investigating frequently mutated genes
in lymphoma have found MEF2B to be a recurrent target of somatic mutations (Morin et
al., 2011; Pasqualucci et al., 2011; Zhang et al., 2013). The way such mutations
contribute to disease, though, varies, with possible mechanisms including disruption of
dimerization and binding of co-repressors in the case of MEF2 (Ying et al., 2013). While
disease-associated mutations have been described at different positions and for different
MEF2 family members, we focused on MEF2B and mutations on residues observed to be
involved in either shape or base readout that could interfere with the ability of the protein
to recognize its DNA targets. The rationale is that by analyzing DNA binding sites of WT
and mutants we would be able to identify changes in binding site preferences based on
those readout modes. The mutations selected for this study were based on previous
studies (Forbes et al., 2015; Ying et al., 2013). Initial analysis using the Rosetta software
package pointed us towards the mutations with the highest changes in binding energies,
and further studies are being carried out with our selected mutants to unravel their
sources of binding specificity and preferences.
Preliminary SELEX-seq results for wild-type MEF2B corroborate its use of base and
shape readout for achieving binding specificity. However, a perfect palindromic site
might not be required for high affinity, since sequences deviating slightly from the
consensus are observed to have a relatively high binding affinity in comparison to
other sequences (Figure 22B). Furthermore, the use of shape
readout seems to be a feature employed by MEF2 at specific positions within the binding
site to achieve specificity. First, higher affinity binding sites present a more negative
propeller twist at the central 4 base pairs, which could indicate a higher degree of
flexibility required at these positions. The propeller twist pattern of high affinity binding
sites is also in agreement with the fact that MEF2 binding sites are known to be AT-rich,
which can result in a more negative propeller twist (Yoon et al., 1988). Second, the
narrower minor groove width observed for higher affinity sites relative to lower
affinity ones is also consistent with the electrostatic-dependent DNA shape recognition
mode. Such a readout mode relies on the insertion of basic residues into the narrow minor
groove of the DNA (Rohs et al., 2009a). Third, an increase in helix twist is also observed
for high affinity sites. Therefore, intrinsic structural features of the DNA can contribute
to MEF2-DNA recognition. These findings are supported by recent studies in which in vivo
binding site predictions of MEF2A and MEF2C were improved by the addition of DNA
shape features, with propeller twist particularly important for predicting TFBSs for
MADS-box TFs (Mathelier et al., 2016b). It will be interesting to see whether the
selected MEF2B mutants will interfere with these observed DNA structural features.
Based on our studies and on the analysis of the DNA binding sites in complex with
MEF2B, a few hypotheses have been raised as to how MEF2B mutants could affect DNA
binding preferences. Mutations of the N-terminal tail residues (R03A, K04E, K05E) could
change DNA shape readout preferences. For example, mutations R3T, K04S and K05I
in the N-terminal tail that interacts with the minor groove have been shown to affect
DNA binding activity of MEF2C in a mutational analysis study (Molkentin et al., 1996),
and the mutation R3S, also located in the same region, has been found to be associated
with a severe case of neurological disorder (Rocha et al., 2016). For MEF2 TFs, the N-
terminal tail is highly conserved. In fact, the residue corresponding to the arginine that
inserts into the minor groove (Arg3) is very conserved among proteins containing a
MADS-box (Figure 24), suggesting its critical importance for DNA recognition. On the
other hand, immediately surrounding residues Gly2 and Lys4 show some variations in
different members, with the change of glycine to glutamate in the MCM1 structure being
suggested to position the arginine differently in the MCM1-DNA complex (Santelli and
Richmond, 2000).
Figure 24. Protein sequence alignment between MEF2 family members and MADS-box proteins. The highlighted
arginines represent the Arg3 position of the MEF2 proteins which is implicated in shape readout. The alignment was
generated through the UniProt webserver (UniProt Consortium, 2015), for which the protein identifiers are located on
the left side of each sequence.
In a different readout mode, protein residues that recognize the major groove of the
DNA through base readout could also affect protein-DNA recognition. Besides affecting
the binding affinity for MEF2B’s intrinsic targets, mutations K23R and R15G could
affect base readout in the major groove, with possible effects on the preferred binding site
“YTAWWWWTAR”. Based on published studies of MEF2-DNA crystal structures, very
few base-specific contacts have been reported in the major groove, which include Arg15
and Lys23. For the binding site “YTAWWWWTAR”, while Lys23 could hydrogen bond
to the last 2 positions (9th and 10th positions) and have hydrophobic interactions with the
9th position of the binding site, Arg15 reaches outside of the core binding motif to form
hydrogen bonds, although there is no strong sequence preference at that position.
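The degenerate site “YTAWWWWTAR” is written in IUPAC nucleotide codes (Y = C or T, W = A or T, R = A or G). As an illustration only, and not the procedure used in this work, the following minimal sketch shows how such a motif could be expanded into a regular expression and located in a sequence. The function names and the example probe sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif string into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_sites(sequence, motif="YTAWWWWTAR"):
    """Return (start, site) for every match, including overlapping ones."""
    # A lookahead with a capturing group reports overlapping matches.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical probe sequence containing one MEF2-like site.
example = "GGCTATAAATAGCC"
sites = find_sites(example)
print(sites)
```

Positions reported by this sketch are 0-based and refer to the forward strand only; scanning the reverse complement would be a separate step.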
Some other missense mutations observed in lymphomas could also affect readout modes
by restricting the flexibility of side chains that interact with DNA (Ile8, Phe26, Tyr33) or
by affecting local DNA geometry (Arg24, Lys30, Glu34).
Mutational studies on different members of the MEF2 family also support the importance
of the residues that we have chosen to study. Mutations in the Drosophila melanogaster
D-mef2 gene disrupting conserved residues (R15C and R24C) rendered the mutant proteins
unable to bind a consensus DNA recognition sequence in vitro and caused severe myogenic
defects in vivo (Nguyen et al., 2002). The MEF2C mutation L38Q has been linked to severe
mental retardation in patients. Modeling studies revealed that L38 is important
for properly positioning R24, whereas the mutant leaves R24 in a position that is not
optimal for interacting with DNA (Zweier et al., 2010). Thus, the high-throughput study of MEF2B
mutations will presumably assist in disentangling the determinants of binding specificity
used by this protein.
The in vivo DNA recognition modes employed by MEF2 likely involve other
factors as well. Early studies had described that muscle activation was dependent on
binding of bHLH proteins (Molkentin et al., 1995), and more recent ChIP-exo results for
MEF2A showed different DNA motifs enriched in the vicinity of its target sites between
cardiomyocytes and skeletal myoblasts, suggesting that different co-factors might be used
for achieving in vivo binding specificity (Wales et al., 2014).
4.4 OVERALL IMPLICATIONS FOR PROTEIN-DNA RECOGNITION
For a long time, protein-DNA binding preferences were described mainly as a string of
letters. This is because hydrogen bond donors and acceptors and hydrophobic
interactions between protein residues and the DNA bases can select for a specific
sequence of letters based on their interaction pattern (Seeman et al., 1976). However, it
was noticed early on that DNA sequence alone could not fully explain the preferred
binding sites of many proteins and that additional unaccounted contributions must be
important to achieve specificity (Gartenberg and Crothers, 1988; Hegde, 2002; Koudelka
et al., 2006; Otwinowski et al., 1988; Watkins et al., 2008). In fact, it is now widely
accepted that the interdependency between DNA bases and the conformations adopted by
DNA as a result of its sequence environment can vary and serve as an extra layer of
binding specificity for protein-DNA recognition (Koudelka et al., 2006; Lawson and
Berman, 2008; Rohs et al., 2010). Despite early attempts based on structural biology to
highlight the importance of such contributions, methods to analyze variations across
thousands of sequences at once emerged only recently, allowing
a deeper understanding of how the DNA structure can affect protein-DNA
recognition on a large scale (Berger and Bulyk, 2009; Carey et al., 2009; Meng et al.,
2005; Riley et al., 2014).
Since the forces that govern protein-DNA interactions are dependent on many
surrounding factors, it can be challenging to find a specific code for these interactions.
However, the work presented here has shed light on the long-standing question of how
proteins recognize their specific DNA target sites, revealing extra layers of regulation
mediated through DNA shape. Further work in progress will likely elucidate sources of
specificity contributions for another TF. There is evidence, though, that the electrostatic
interactions described here as a main driver of DNA shape readout are extremely
important for protein-DNA recognition. In one case, mutagenesis experiments followed
by DNA binding studies of Hox members suggested that TFBS preferences might be
switched when mutating residues involved in DNA shape readout (Abe et al., 2015).
Strikingly, another study of members of the WOPR-domain family revealed that, despite
the main contacts being in the major groove, the minor groove presented some of the highest
affinity interactions, with mutations on the residue that interacts with the minor groove
having the strongest effect on Kd (Lohse et al., 2014).
Naturally, the ability of a TF to identify its binding sites according to the base and shape
readout of the DNA will vary by TF. However, further work needs to be carried out in
order for us to understand the role of single nucleotide variations in the genome, their
effect on protein-DNA interactions and subsequent consequences on gene regulation.
Protein-DNA interactions could be disrupted in a number of ways. One possibility is that
certain non-coding mutations in regulatory regions of the DNA can affect how TFs
recognize their DNA target sites. In this scenario, regulatory processes can be modified by
a change in the TFBS, as suggested for a SNP within the KLF5 locus associated with
increased risk of hypertension, which was located in the TFBS of MEF2A (Oishi et al.,
2010). Another possibility is that mutations on protein residues can affect protein-DNA
binding. In fact, many of the studies regarding protein-DNA mechanisms of recognition
focused on the latter case (Ward and Kellis, 2012). Therefore, variations that affect
protein-DNA binding could lead to gain-of-function or loss-of-function mutations, or
dominant-negative effects (Pon and Marra, 2016; Spielmann and Mundlos, 2016; Zhang
et al., 2016).
In addition, understanding the variations that affect protein-DNA interactions and how
they regulate gene expression can aid in the design of specific artificial TFs and designer
enzymes (Eguchi et al., 2014). Such artificial TFs can up-regulate or down-regulate
transcription of certain factors and be used as therapeutics for certain conditions.
Furthermore, DNA binding molecules such as polyamides can be used to target the DNA,
which could be used to target multiple sites of cis-regulatory elements within the genome
(Erwin et al., 2014).
Given the complexity of the network that governs the genetic code, there is certainly
much progress to be made before genes can just be turned on and off at the precise time,
location and concentration to achieve the desired output. However, according to the
points presented in this work, the identification of determinants of protein-DNA
regulation is of paramount importance, especially in a future where genome sequencing
might become routine.
Since the completion of the Human Genome Project (Lander et al., 2001), with the first
blueprint of the human genome, tremendous advances have been made in the genomics
field. The ENCODE project, targeted at functional elements in the human genome, has
generated massive amounts of data, revealing the largest variations within the non-coding
regions of the genome (ENCODE Project Consortium, 2012). More recent initiatives
such as the 100,000 Genomes Project in the UK and the Precision Medicine Initiatives by
the US and the Chinese governments certainly have the potential to shed light on
different aspects of gene regulation and revolutionize the way medicine works. Thus, an
interdisciplinary approach combining structural biology, phenotypic information,
biological data and computational models will likely aid in identifying the determinants
of protein-DNA interactions and how they govern our genomes.
APPENDIX
APPENDIX A THE EFFECT OF NON-CODING NUCLEOTIDE VARIATIONS IN DNA
SHAPE
Among the single nucleotide variants (SNVs) located in non-coding regions of genes,
many questions remain to be answered, including how non-coding variants affect gene
regulation. To shed light on this, our lab collaborated with the Nuzhdin Lab at USC and
the DePace Lab at Harvard. Data for SNVs, minor allele frequency (MAF), phastCons
conservation scores, and enhancers from Drosophila melanogaster were provided by our
collaborators. From a dataset of enhancers (Kvon et al., 2014), TFBSs for a number of
TFs were predicted based on existing PWMs and provided to us by the DePace Lab.
Information for SNVs was then combined with MAF (Nuzhdin Lab), conservation
score, Euclidean distance of minor groove width, and TFBS status.
To test the hypothesis that DNA shape is under selection due to TF binding, the
distributions of the Euclidean distance of DNA shape for SNVs were compared between
conserved and non-conserved regions, for variants with low vs high MAF. Indeed, as
expected and previously reported for another dataset by our group, SNVs in conserved
regions are under selection (Figure 25 A-B). To further distinguish between the regions
under most selection, conserved regions were separated into TFBS and non-TFBS regions based on
binding site predictions. To this end, the same TFBS predictions were used. Such analysis
did not allow for a differentiation between the areas where DNA shape variations had the
most effect (Figure 25 C-D). ChIP-seq peaks were also incorporated into this analysis,
yielding no better results (not shown). This is likely a result of combining multiple types
of TFs and SNPs within one large group, where the effects of specific occurrences get
diluted in a more complex mix of DNA sequences that are selected by other criteria than
DNA shape only (Figure 25E-F). DNA shape requirements might vary based on the TF
as well as other factors that were not accounted for in this analysis. Therefore,
investigating specific TFBSs might be a better approach.
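The comparison described above rests on two quantities: the Euclidean distance between the minor groove width (MGW) profiles of the two alleles of an SNV, and a Kolmogorov–Smirnov comparison of the resulting distance distributions. The following is a minimal sketch of both calculations in plain Python, not the code used for this analysis. The MGW values and the two small samples are hypothetical, and the sketch reports only the KS statistic; a p-value would normally be obtained from a statistics library.

```python
import bisect
import math


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two equal-length shape profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def ks_statistic(sample_1, sample_2):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    largest_gap = 0.0
    for value in sorted(set(s1) | set(s2)):
        cdf_1 = bisect.bisect_right(s1, value) / len(s1)
        cdf_2 = bisect.bisect_right(s2, value) / len(s2)
        largest_gap = max(largest_gap, abs(cdf_1 - cdf_2))
    return largest_gap


# Hypothetical MGW profiles (in angstroms) around one SNV for each allele.
ref_mgw = [5.2, 4.6, 3.9, 4.4, 5.1]
alt_mgw = [5.2, 4.9, 4.3, 4.4, 5.1]
snv_distance = euclidean_distance(ref_mgw, alt_mgw)

# Hypothetical distance samples for low-MAF and high-MAF variants.
low_maf_distances = [0.1, 0.2, 0.3]
high_maf_distances = [0.4, 0.5, 0.6]
d_stat = ks_statistic(low_maf_distances, high_maf_distances)
print(snv_distance, d_stat)
```

A larger distance indicates that the variant changes the local minor groove geometry more strongly, and a larger KS statistic indicates that the two groups of variants differ more in how much they alter DNA shape.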
Figure 25. Change in minor groove width for SNVs located in TFBSs and their flanking regions. (A-D) Distributions of the
Euclidean distance between minor groove width of SNVs located in conserved regions and non-conserved regions. (A-
B) The distributions between SNVs with low MAF versus high MAF are significantly different for those located in
conserved regions (Kolmogorov–Smirnov test, p < 0.001), but not in non-conserved regions of TFBSs and flanking
regions. (C-D) The distributions between SNVs with low MAF versus high MAF seem different (p < 0.05) for SNVs
located in conserved regions of TFBSs and non-TFBSs. (E-F) Boxplots showing the Euclidean distance of MGW for
SNVs located in TFBS (E) or flanks (F) for each specific TF (x-axis; n= “total number of occurrences”).
To test the importance of DNA shape, known cases in the literature can be selected to
confirm the hypothesis. The best approach would be to select a TF for which there is
structure and literature data on the SNP so that we can better understand the effect of
different sequences in shape readout with structural analysis support. To achieve this
goal, enhancer and SNV data were combined with data from the REDfly database
(http://redfly.ccr.buffalo.edu/) (Gallo et al., 2011), which comprises curated TFBS and
CRM information. Based on this approach, a list of approximately 100 SNVs was
collected, some of which had great impact on DNA shape. Further studies should be
carried out to assess the effect of those SNPs on TFBS recognition.
Table 6. List of SNVs identified from the REDfly database.
APPENDIX B FORKHEAD PROTEINS AND THEIR DNA BINDING SITES
Based on the crystal structure analysis of some Forkhead (Fkh) protein members, a
mechanism of DNA shape readout was observed in wings 1 and 2, which interact with
DNA. While the known Forkhead consensus motif does not necessarily indicate the
occurrence of such a readout mode, some reports from the literature have pointed towards its
importance. For example, mutations to the arginine residue that interacts with the DNA
minor groove of FoxC1 wing2 are seen in patients with Axenfeld-Rieger malformation
(Murphy et al., 2004). Due to the possible role of DNA shape contributions for Fkh
proteins and the availability of protein binding microarray data (PBM) for a number of
Fkh proteins from various organisms, we sought to investigate whether DNA shape
contributed to DNA recognition by the Fkh proteins. In order to do so, I analyzed PBM
data for 30 members of the Fkh family (Nakagawa et al., 2013) with methods previously
described (Dror et al., 2014; Yang et al., 2014; Zhou et al., 2015).
Addition of shape features increases prediction accuracy for Fkh.
First, an MLR approach (as described in the Methods section) was used to predict
binding specificities derived from the PBM experiment based on the input sequence only
or a combination of sequence + shape parameters. As seen in Figure 26, adding shape
parameters (1mer + shape) to a sequence-only model (1mer) improves its accuracy (dots
above the line). However, a 2mer model does better than a 1mer + shape (not shown).
Still, the improvement obtained by adding shape features rather than k-mer features
to a 1mer model is remarkable given how many more features a k-mer model
requires in comparison to shape features.
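To make the difference in feature counts concrete, the sketch below encodes a binding site as 1mer features, as 2mer features, and as 1mer features extended with shape profiles. It is an illustration under stated assumptions rather than the feature encoding used in the Methods section: the function names are hypothetical, the shape values are invented, and which shape parameters enter the model is left to the caller. The resulting vectors would serve as inputs to a multiple linear regression.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def encode_1mer(seq):
    """One-hot encode each position: 4 features per nucleotide."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def encode_2mer(seq):
    """One-hot encode each dinucleotide step: 16 features per step."""
    return [
        1.0 if seq[i:i + 2] == dinuc else 0.0
        for i in range(len(seq) - 1)
        for dinuc in DINUCLEOTIDES
    ]


def encode_1mer_shape(seq, shape_profiles):
    """Append caller-supplied shape profiles to the 1mer encoding."""
    features = encode_1mer(seq)
    for profile in shape_profiles:
        features.extend(profile)
    return features


site = "CTATAAATAG"
# Hypothetical per-position values for two shape parameters.
minor_groove_width = [5.1, 4.8, 4.2, 3.9, 3.6, 3.5, 3.8, 4.3, 4.9, 5.2]
propeller_twist = [-6.0, -8.5, -10.2, -12.4, -14.1, -14.0, -12.0, -9.8, -8.1, -6.3]

n_1mer = len(encode_1mer(site))
n_2mer = len(encode_2mer(site))
n_1mer_shape = len(encode_1mer_shape(site, [minor_groove_width, propeller_twist]))
print(n_1mer, n_2mer, n_1mer_shape)
```

For a site of length L, the 1mer encoding has 4L features and the 2mer encoding has 16(L - 1), whereas each shape parameter adds only about L values, so a 1mer + shape model stays considerably smaller than a 2mer model.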
Figure 26. The addition of DNA shape features increases prediction of Forkhead binding specificity. By adding DNA
shape to a model that considers only sequence, model performance, represented by the coefficient of determination R²,
improves for PBM data of Forkhead proteins.
Investigating co-evolution between Fkh proteins and their DNA binding sites.
Next, motivated by the improvement in model performance observed by adding shape
features, we investigated if there was a correlation between Forkhead proteins and their
binding sites, as previously described for Homeodomains (Dror et al., 2014). In order to
do so, the same data was divided into groups according to its DNA binding site
similarities (Group Fkh primary: RYAAAYA; Group FHL: GAYGC). The DNA binding
sites and protein sequence for species within each group were aligned. For the DNA
binding sites, either a position frequency matrix (PFM) or a minor groove profile was
generated from the aligned sequences. A pairwise similarity score was calculated for each
pair of Fkh data. For PFMs the similarity score was calculated using the Pearson
correlation, while DNA shape similarity scores were calculated using the absolute value
of the Euclidean distance. Pairwise similarities between aligned amino acid sequences
were calculated based on BLOSUM45 (Henikoff and Henikoff, 1992). The Pearson
correlation coefficient was then used to measure the dependency between PFM and
protein sequence, or DNA shape and protein sequence. For the Fkh primary motif (Figure
27A-E), which is composed of Fkh members that align to the consensus GTAAACA
(canonical RYAAAYA), there is a small correlation (0.24) between the DNA sequence
and protein similarities, and an even smaller inverse correlation (-0.19) between DNA
shape and protein similarities (Figure 27A-B). When the protein sequence is divided into
C-terminal vs N-terminal regions, accounting for the regions that interact mainly with the
major groove (N-terminal) or minor groove (C-terminal), no improvement is seen (Figure
27A). For the FHL motif (GACGC), however, the results are different (Figure 27F-J).
The correlation between the Euclidean distance of DNA shape and protein similarity is
greater than the correlation between DNA sequence and protein similarity (Figure 27F-
G). When N-terminal and C-terminal protein regions are taken into account (see Figure
28 for protein alignment), the greatest correlation is seen between the N-terminal protein
similarity and minor groove similarity (Euclidean distance), suggesting a possible role of
DNA shape for this second class of observed motifs (Figure 27I-J). In fact, the heatmap
of overall DNA minor groove profiles for each species indicates a possible function of
the flank positions around the binding site in achieving DNA specificity (Figure 27H).
This shape pattern, though, is seen only for a specific subset of Forkhead
proteins, which restricts possible experimental studies to those members of the family.
Further experimental studies combined with computational analysis of DNA should be
conducted in order to evaluate the possible contributions of shape readout outside of the
core motif suggested by analysis of Fkh proteins.
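The covariation analysis described above can be sketched as follows: compute a protein similarity and a DNA shape distance for every pair of family members, then correlate the two lists of pairwise values. This is a simplified illustration, not the analysis pipeline itself. In particular, the fraction of identical aligned residues is used here as a stand-in for the BLOSUM45-based score, and the aligned sequences and minor groove profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def euclidean(profile_a, profile_b):
    """Euclidean distance between two aligned shape profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def identity_similarity(seq_a, seq_b):
    """Fraction of identical aligned residues (stand-in for BLOSUM45)."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def covariation(proteins, shape_profiles):
    """Correlate pairwise protein similarity with pairwise shape distance."""
    pairs = list(combinations(range(len(proteins)), 2))
    similarity = [identity_similarity(proteins[i], proteins[j]) for i, j in pairs]
    distance = [euclidean(shape_profiles[i], shape_profiles[j]) for i, j in pairs]
    return pearson(similarity, distance)


# Hypothetical aligned protein fragments and minor groove profiles.
proteins = ["AAAA", "AAAT", "TTTT"]
mgw_profiles = [[1.0], [1.5], [5.0]]
r = covariation(proteins, mgw_profiles)
print(r)
```

Because the Euclidean distance measures dissimilarity, a negative coefficient in this sketch means that more similar proteins tend to bind sites with more similar shape profiles.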
Figure 27. Co-evolution between Forkhead proteins and their target sites. A-E) Analysis for DNA binding motifs of
form RYAAAYA, aligning to the Fkh primary consensus motif GTAAACA. F-J) Analysis for DNA binding motifs of
form GAYGCN, aligning to the FHL consensus motif GACGC. PCC= Pearson correlation coefficient.
Figure 28. Multiple protein alignment from members of the Fkh family. The alignment includes members that present
the FHL motif according to Nakagawa et al. (2013). The DNA binding domain (DBD) was divided into N-terminal and C-
terminal regions, according to the different regions that interact with the DNA. (A) N-terminal region is where the recognition
helix is located. (B) C-terminal region is where the two wings are located and interact with the DNA minor groove.
APPENDIX C MEF-2 DNA BINDING EXPERIMENTS VALIDATION
After protein purification, initial EMSAs were performed (Figure 29) to assess protein
binding to the probe used in previous experiments (Wu et al., 2010). For the SELEX-seq
experiments, a library compatible with Illumina sequencing containing a random 16-base
region was constructed and evaluated. For library validation, we used a positive control
probe (PCTRL) in which the variable region contains a sequence known to bind MEF2,
while the negative control probe (NCTRL) was an unfavorable sequence, presenting a
GC-rich sequence in the same region. Binding in the presence of a competitor (Figure
30A-C) or at higher salt concentration (Figure 30E-G) is favored for the PCTRL and
disfavored for the NCTRL. Furthermore, in the presence of an unlabeled competitor, the
binding is still observed for the PCTRL, but not for the NCTRL (Figure 30D). EMSA for
SELEX-seq was then performed using the SELEX library as seen in Figure 30H.
Figure 29. Purified protein used for EMSA. (A) An example of protein induction. SDS-PAGE of samples from bacterial
culture before induction (BI) and after induction (AI). (B) Ni-NTA purification of His-tagged MEF2B with mutation
K05E. When purification using Ni-NTA agarose did not yield a high level of purification, a combination of SP
Sepharose and Ni-NTA purification was used. (C) Western blot of purified protein using mouse anti-His as primary
antibody and HRP-conjugated anti-mouse secondary antibody (GE). Samples: 1) WT; 2) WT; 3) R15G; 4) K04E; 5) R24Q; 6) R03A.
Immobilon Western chemiluminescent HRP substrate was used with a ChemiDoc (Bio-Rad) to allow visualization. (D)
EMSA for WT and mutants. EMSA for initial assessment of protein-DNA binding was performed as described in the
Methods section. Lanes: C = DNA only; “1-4” refer to protein concentrations (9 µM, 14 µM, 18 µM, 27 µM). Probe was
kept constant at 10 µM DNA. On SDS-PAGE gels, the marker lane (M) allows for visualization of the band at the expected
position.
Figure 30. Validation of SELEX-seq library. (A-D) EMSA in the presence of competitor poly(dI-dC). (E-H) EMSA
under varying salt concentrations.
Table 7. MEF2B mutations observed in lymphoma patients.
Position  Mutation  Count     Source
1         M1K       1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
4         K4E       2      *  COSMIC
5         K5E       1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
6         I6T       1         COSMIC
6         I6S       1      *  COSMIC
8         I8F       1         COSMIC
8         I8V       1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
10        R10H      2         COSMIC
15        R15G      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
17        R17W      3         COSMIC
23        K23R      2      *  COSMIC
23        K23R      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
24        R24Q      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
26        F26V      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
33        Y33S      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
38        L38I      1      *  COSMIC
47        I47T      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
49        N49S      1      *  COSMIC
53        R53H      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
54        L54P      1      *  COSMIC
58        A58D      2         COSMIC
60        T60M      1         COSMIC
63        D63G      1         COSMIC
64        R64H      2         COSMIC
67        L67R      1      *  COSMIC
67        L67R      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
69        Y69H      1      *  COSMIC
69        Y69C      1      *  COSMIC
70        T70K      1      *  COSMIC
70        T70N      1      *  COSMIC
76        H76R      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
77        E77K      2      *  COSMIC
77        E77K      1      *  Morin RD, et al., 2011, Nature;476(7360):298-303
78        S78N      1      *  COSMIC
78        S78R      1      *  COSMIC
79        R79H      1         COSMIC
81        N81Y      1      *  COSMIC
81        N81K      1      *  COSMIC
82        T82A      1      *  COSMIC
83        D83A      2      *  COSMIC
83        D83V      8      *  COSMIC
92        G92V      2         COSMIC
BIBLIOGRAPHY
Abe, N., Dror, I., Yang, L., Slattery, M., Zhou, T., Bussemaker, H.J., Rohs, R., and
Mann, R.S. (2015). Deconvolving the Recognition of DNA Shape from Sequence. Cell
161, 307–318.
Adams, R.L. (1990). DNA methylation. The effect of minor bases on DNA-protein
interactions. Biochem. J. 265, 309–320.
Aduri, R., Psciuk, B.T., Saro, P., Taniga, H., Schlegel, H.B., and SantaLucia, J. (2007).
AMBER Force Field Parameters for the Naturally Occurring Modified Nucleosides in
RNA. J. Chem. Theory Comput. 3, 1464–1475.
Afek, A., Schipper, J.L., Horton, J., Gordân, R., and Lukatsky, D.B. (2014). Protein-
DNA binding in the absence of specific base-pair recognition. Proc. Natl. Acad. Sci. U.
S. A. 111, 17140–17145.
Andrés, V., Cervera, M., and Mahdavi, V. (1995). Determination of the Consensus
Binding Site for MEF2 Expressed in Muscle and Brain Reveals Tissue-specific Sequence
Constraints. J. Biol. Chem. 270, 23246–23249.
Badis, G., Berger, M.F., Philippakis, A.A., Talukder, S., Gehrke, A.R., Jaeger, S.A.,
Chan, E.T., Metzler, G., Vedenko, A., Chen, X., et al. (2009). Diversity and Complexity
in DNA Recognition by Transcription Factors. Science 324, 1720–1723.
Baichoo, N., and Helmann, J.D. (2002). Recognition of DNA by Fur: a reinterpretation of
the Fur box consensus sequence. J. Bacteriol. 184, 5826–5832.
Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li,
W.W., and Noble, W.S. (2009). MEME SUITE: tools for motif discovery and searching.
Nucleic Acids Res. 37, W202–W208.
de Beer, T.A.P., Berka, K., Thornton, J.M., and Laskowski, R.A. (2014). PDBsum
additions. Nucleic Acids Res. 42, D292–D296.
Benoff, B., Yang, H., Lawson, C.L., Parkinson, G., Liu, J., Blatter, E., Ebright, Y.W.,
Berman, H.M., and Ebright, R.H. (2002). Structural basis of transcription activation: the
CAP-alpha CTD-DNA complex. Science 297, 1562–1566.
Berger, M.F., and Bulyk, M.L. (2009). Universal protein-binding microarrays for the
comprehensive characterization of the DNA-binding specificities of transcription factors.
Nat. Protoc. 4, 393–411.
Berger, M.F., Philippakis, A.A., Qureshi, A.M., He, F.S., Estep, P.W., and Bulyk, M.L.
(2006). Compact, universal DNA microarrays to comprehensively determine
transcription-factor binding site specificities. Nat. Biotechnol. 24, 1429–1435.
Berger, M.F., Badis, G., Gehrke, A.R., Talukder, S., Philippakis, A.A., Peña-Castillo, L.,
Alleyne, T.M., Mnaimneh, S., Botvinnik, O.B., Chan, E.T., et al. (2008). Variation in
homeodomain DNA binding revealed by high-resolution analysis of sequence
preferences. Cell 133, 1266–1276.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H.,
Shindyalov, I.N., and Bourne, P.E. (2000). The Protein Data Bank. Nucleic Acids Res.
28, 235–242.
Bishop, E.P., Rohs, R., Parker, S.C.J., West, S.M., Liu, P., Mann, R.S., Honig, B., and
Tullius, T.D. (2011). A map of minor groove shape and electrostatic potential from
hydroxyl radical cleavage patterns of DNA. ACS Chem. Biol. 6, 1314–1320.
Borowiec, J.A., Dean, F.B., Bullock, P.A., and Hurwitz, J. (1990). Binding and
unwinding--how T antigen engages the SV40 origin of DNA replication. Cell 60, 181–
184.
Brown, D.G., and Freemont, P.S. (1996). Crystallography in the study of protein-DNA
interactions. Methods Mol. Biol. Clifton NJ 56, 293–318.
Brukner, I., Jurukovski, V., and Savic, A. (1990). Sequence-dependent structural
variations of DNA revealed by DNase I. Nucleic Acids Res. 18, 891–894.
Brukner, I., Sánchez, R., Suck, D., and Pongor, S. (1995). Sequence-dependent bending
propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J. 14,
1812–1818.
Buck-Koehntop, B.A., Stanfield, R.L., Ekiert, D.C., Martinez-Yamout, M.A., Dyson,
H.J., Wilson, I.A., and Wright, P.E. (2012). Molecular basis for recognition of
methylated and specific DNA sequences by the zinc finger protein Kaiso. Proc. Natl.
Acad. Sci. 109, 15229–15234.
Butcher, J., Sarvan, S., Brunzelle, J.S., Couture, J.-F., and Stintzi, A. (2012). Structure
and regulon of Campylobacter jejuni ferric uptake regulator Fur define apo-Fur
regulation. Proc. Natl. Acad. Sci. U. S. A. 109, 10047–10052.
Carey, M.F., Peterson, C.L., and Smale, S.T. (2009). Chromatin Immunoprecipitation
(ChIP). Cold Spring Harb. Protoc. 2009, pdb.prot5279.
Carey, M.F., Peterson, C.L., and Smale, S.T. (2012). Experimental Strategies for the
Identification of DNA-Binding Proteins. Cold Spring Harb. Protoc. 2012, pdb.top067470.
Chang, Y.P., Xu, M., Dantas Machado, A.C., Yu, X.J., Rohs, R., and Chen, X.S. (2013).
Mechanism of Origin DNA Recognition and Assembly of an Initiator-Helicase Complex
by SV40 Large Tumor Antigen. Cell Rep. 3, 1117–1127.
Chen, C., Gorlatova, N., Kelman, Z., and Herzberg, O. (2011). Structures of p63 DNA
binding domain in complexes with half-site and with spacer-containing full response
elements. Proc. Natl. Acad. Sci. U. S. A. 108, 6456–6461.
Chen, L., Glover, J.N., Hogan, P.G., Rao, A., and Harrison, S.C. (1998). Structure of the
DNA-binding domains from NFAT, Fos and Jun bound specifically to DNA. Nature 392,
42–48.
Chen, Y., Dey, R., and Chen, L. (2010). Crystal Structure of the p53 Core Domain Bound
to a Full Consensus Site as a Self-Assembled Tetramer. Structure 18, 246–256.
Chen, Y., Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Z., Qin, P.Z., Rohs, R., and
Chen, L. (2013). Structure of p53 binding to the BAX response element reveals DNA
unwinding and compression to accommodate base-pair insertion. Nucleic Acids Res. 41,
8368–8376.
Chiu, T.-P., Yang, L., Zhou, T., Main, B.J., Parker, S.C.J., Nuzhdin, S.V., Tullius, T.D.,
and Rohs, R. (2015). GBshape: a genome browser database for DNA shape annotations.
Nucleic Acids Res. 43, D103–D109.
Coulocheri, S.A., Pigis, D.G., Papavassiliou, K.A., and Papavassiliou, A.G. (2007).
Hydrogen bonds in protein-DNA complexes: where geometry meets plasticity. Biochimie
89, 1291–1303.
Czarniecki, D., Noel, R.J., and Reznikoff, W.S. (1997). The -45 Region of the
Escherichia Coli Lac Promoter: CAP-Dependent and CAP-Independent Transcription. J.
Bacteriol. 179, 423–429.
Dantas Machado, A.C., Zhou, T., Rao, S., Goel, P., Rastogi, C., Lazarovici, A.,
Bussemaker, H.J., and Rohs, R. (2015). Evolving insights on how cytosine methylation
affects protein-DNA binding. Brief. Funct. Genomics 14, 61–73.
Davies, B.W., Bogard, R.W., and Mekalanos, J.J. (2011). Mapping the regulon of Vibrio
cholerae ferric uptake regulator expands its known network of gene regulation. Proc.
Natl. Acad. Sci. U. S. A. 108, 12467–12472.
DeLano, W.L. (2002). The PyMOL Molecular Graphics System.
Deng, Z., Wang, Q., Liu, Z., Zhang, M., Dantas Machado, A.C., Chiu, T.-P., Feng, C.,
Zhang, Q., Yu, L., Qi, L., et al. (2015). Mechanistic insights into metal ion activation and
operator recognition by the ferric uptake regulator. Nat. Commun. 6, 7642.
Dror, I., Zhou, T., Mandel-Gutfreund, Y., and Rohs, R. (2014). Covariation between
homeodomain transcription factors and the shape of their DNA binding sites. Nucleic
Acids Res. 42, 430–441.
Dror, I., Golan, T., Levy, C., Rohs, R., and Mandel-Gutfreund, Y. (2015). A widespread
role of the motif environment in transcription factor binding across diverse protein
families. Genome Res. 25, 1268–1280.
Dror, I., Rohs, R., and Mandel-Gutfreund, Y. (2016). How motif environment influences
transcription factor search dynamics: Finding a needle in a haystack. BioEssays News
Rev. Mol. Cell. Dev. Biol. 38, 605–612.
Eguchi, A., Lee, G.O., Wan, F., Erwin, G.S., and Ansari, A.Z. (2014). Controlling gene
networks and cell fate with precision-targeted DNA-binding proteins and small-
molecule-based genome readers. Biochem. J. 462, 397–413.
El-Deiry, W.S., Kern, S.E., Pietenpol, J.A., Kinzler, K.W., and Vogelstein, B. (1992).
Definition of a consensus binding site for p53. Nat. Genet. 1, 45–49.
Emamzadah, S., Tropia, L., and Halazonetis, T.D. (2011). Crystal structure of a
multidomain human p53 tetramer bound to the natural CDKN1A (p21) p53-response
element. Mol. Cancer Res. MCR 9, 1493–1499.
ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in
the human genome. Nature 489, 57–74.
Erwin, G.S., Bhimsaria, D., Eguchi, A., and Ansari, A.Z. (2014). Mapping polyamide-
DNA interactions in human cells reveals a new design strategy for effective targeting of
genomic sites. Angew. Chem. Int. Ed Engl. 53, 10124–10128.
Escalante, C.R., Nistal-Villán, E., Shen, L., García-Sastre, A., and Aggarwal, A.K.
(2007). Structure of IRF-3 bound to the PRDIII-I regulatory element of the human
interferon-beta enhancer. Mol. Cell 26, 703–716.
Escolar, L., Pérez-Martín, J., and de Lorenzo, V. (1998). Binding of the fur (ferric uptake
regulator) repressor of Escherichia coli to arrays of the GATAAT sequence. J. Mol. Biol.
283, 537–547.
Estrem, S.T., Gaal, T., Ross, W., and Gourse, R.L. (1998). Identification of an UP
element consensus sequence for bacterial promoters. Proc. Natl. Acad. Sci. 95, 9761–
9766.
Ethayathulla, A.S., Tse, P.-W., Monti, P., Nguyen, S., Inga, A., Fronza, G., and Viadiu,
H. (2012). Structure of p73 DNA-binding domain tetramer modulates p73
transactivation. Proc. Natl. Acad. Sci. U. S. A. 109, 6066–6071.
Fanucci, G.E., and Cafiso, D.S. (2006). Recent advances and applications of site-directed
spin labeling. Curr. Opin. Struct. Biol. 16, 644–653.
Feldmann, A., Ivanek, R., Murr, R., Gaidatzis, D., Burger, L., and Schübeler, D. (2013).
Transcription factor occupancy can mediate active turnover of DNA methylation at
regulatory regions. PLoS Genet. 9, e1003994.
Ferré-D’Amaré, A.R., Pognonec, P., Roeder, R.G., and Burley, S.K. (1994). Structure
and function of the b/HLH/Z domain of USF. EMBO J. 13, 180–189.
Forbes, S.A., Beare, D., Gunasekaran, P., Leung, K., Bindal, N., Boutselakis, H., Ding,
M., Bamford, S., Cole, C., Ward, S., et al. (2015). COSMIC: exploring the world’s
knowledge of somatic mutations in human cancer. Nucleic Acids Res. 43, D805–D811.
Fournier, A., Sasai, N., Nakao, M., and Defossez, P.-A. (2012). The role of methyl-
binding proteins in chromatin organization and epigenome maintenance. Brief. Funct.
Genomics 11, 251–264.
Fox, K.R. (1986). The effect of HhaI methylation on DNA local structure. Biochem. J.
234, 213–216.
Funk, W.D., Pak, D.T., Karas, R.H., Wright, W.E., and Shay, J.W. (1992). A
transcriptionally active DNA-binding site for human p53 protein complexes. Mol. Cell.
Biol. 12, 2866–2871.
Galas, D.J., and Schmitz, A. (1978). DNAse footprinting: a simple method for the
detection of protein-DNA binding specificity. Nucleic Acids Res. 5, 3157–3170.
Gallo, S.M., Gerrard, D.T., Miner, D., Simich, M., Des Soye, B., Bergman, C.M., and
Halfon, M.S. (2011). REDfly v3.0: toward a comprehensive database of transcriptional
regulatory elements in Drosophila. Nucleic Acids Res. 39, D118–D123.
Gartenberg, M.R., and Crothers, D.M. (1988). DNA sequence determinants of CAP-
induced bending and protein binding affinity. Nature 333, 824–829.
Garvie, C.W., and Wolberger, C. (2001). Recognition of specific DNA sequences. Mol.
Cell 8, 937–946.
Gebhard, C., Benner, C., Ehrich, M., Schwarzfischer, L., Schilling, E., Klug, M.,
Dietmaier, W., Thiede, C., Holler, E., Andreesen, R., et al. (2010). General transcription
factor binding at CpG islands in normal cells correlates with resistance to de novo DNA
methylation in cancer cells. Cancer Res. 70, 1398–1407.
Glasfeld, A., Koehler, A.N., Schumacher, M.A., and Brennan, R.G. (1999). The role of
lysine 55 in determining the specificity of the purine repressor for its operators through
minor groove interactions. J. Mol. Biol. 291, 347–361.
Gordân, R., Shen, N., Dror, I., Zhou, T., Horton, J., Rohs, R., and Bulyk, M.L. (2013).
Genomic regions flanking E-box binding sites influence DNA binding specificity of
bHLH transcription factors through DNA shape. Cell Rep. 3, 1093–1104.
Gutierrez-Arcelus, M., Lappalainen, T., Montgomery, S.B., Buil, A., Ongen, H.,
Yurovsky, A., Bryois, J., Giger, T., Romano, L., Planchon, A., et al. (2013). Passive and
active DNA methylation and the interplay with genetic variation in gene regulation. eLife
2, e00523.
Han, A., Pan, F., Stroud, J.C., Youn, H.-D., Liu, J.O., and Chen, L. (2003). Sequence-
specific recruitment of transcriptional co-repressor Cabin1 by myocyte enhancer factor-2.
Nature 422, 730–734.
Han, A., He, J., Wu, Y., Liu, J.O., and Chen, L. (2005). Mechanism of Recruitment of
Class II Histone Deacetylases by Myocyte Enhancer Factor-2. J. Mol. Biol. 345, 91–102.
Hancock, S.P., Ghane, T., Cascio, D., Rohs, R., Di Felice, R., and Johnson, R.C. (2013).
Control of DNA minor groove width and Fis protein binding by the purine 2-amino
group. Nucleic Acids Res. 41, 6750–6760.
Hantke, K. (2001). Iron and metal regulation in bacteria. Curr. Opin. Microbiol. 4, 172–
177.
Harrison, S.C., and Aggarwal, A.K. (1990). DNA recognition by proteins with the helix-
turn-helix motif. Annu. Rev. Biochem. 59, 933–969.
el Hassan, M.A., and Calladine, C.R. (1996). Propeller-twisting of base-pairs and the
conformational mobility of dinucleotide steps in DNA. J. Mol. Biol. 259, 95–103.
He, J., Ye, J., Cai, Y., Riquelme, C., Liu, J.O., Liu, X., Han, A., and Chen, L. (2011).
Structure of P300 Bound to MEF2 on DNA Reveals a Mechanism of Enhanceosome
Assembly. Nucleic Acids Res. 39, 4464–4474.
Hegde, R.S. (2002). The papillomavirus E2 proteins: structure, function, and biology.
Annu. Rev. Biophys. Biomol. Struct. 31, 343–360.
Henikoff, S., and Henikoff, J.G. (1992). Amino acid substitution matrices from protein
blocks. Proc. Natl. Acad. Sci. U. S. A. 89, 10915–10919.
Hines, C.S., Meghoo, C., Shetty, S., Biburger, M., Brenowitz, M., and Hegde, R.S.
(1998). DNA structure and flexibility in the sequence-specific binding of papillomavirus
E2 proteins. J. Mol. Biol. 276, 809–818.
Ho, K.L., McNae, I.W., Schmiedeberg, L., Klose, R.J., Bird, A.P., and Walkinshaw,
M.D. (2008). MeCP2 binding to DNA depends upon hydration at methyl-CpG. Mol. Cell
29, 525–531.
Hogan, M.E., Roberson, M.W., and Austin, R.H. (1989). DNA Flexibility Variation May
Dominate DNase I Cleavage. Proc. Natl. Acad. Sci. 86, 9273–9277.
Hong, M., and Marmorstein, R. (2008). Chapter 3: Structural Basis for Sequence-Specific
DNA Recognition by Transcription Factors and Their Complexes. In Protein-Nucleic Acid
Interactions: Structural Biology, (Cambridge, UK: The Royal Society of Chemistry),
pp. 47–65.
Honig, B., and Nicholls, A. (1995). Classical electrostatics in biology and chemistry.
Science 268, 1144–1149.
Horton, N.C., Dorner, L.F., and Perona, J.J. (2002). Sequence selectivity and degeneracy
of a restriction endonuclease mediated by DNA intercalation. Nat. Struct. Mol. Biol. 9,
42–47.
Hu, S., Wan, J., Su, Y., Song, Q., Zeng, Y., Nguyen, H.N., Shin, J., Cox, E., Rho, H.S.,
Woodard, C., et al. (2013). DNA methylation presents distinct binding sites for human
transcription factors. eLife 2, e00726.
Hudson, B.P., Quispe, J., Lara-González, S., Kim, Y., Berman, H.M., Arnold, E.,
Ebright, R.H., and Lawson, C.L. (2009). Three-Dimensional EM Structure of an Intact
Activator-Dependent Transcription Initiation Complex. Proc. Natl. Acad. Sci. 106,
19830–19835.
Jayathilaka, N., Han, A., Gaffney, K.J., Dey, R., Jarusiewicz, J.A., Noridomi, K., Philips,
M.A., Lei, X., He, J., Ye, J., et al. (2012). Inhibition of the function of class IIa HDACs
by blocking their interaction with MEF2. Nucleic Acids Res. 40, 5378–5388.
John, S., Sabo, P.J., Thurman, R.E., Sung, M.-H., Biddie, S.C., Johnson, T.A., Hager,
G.L., and Stamatoyannopoulos, J.A. (2011). Chromatin accessibility pre-determines
glucocorticoid receptor binding patterns. Nat. Genet. 43, 264–268.
Jolma, A., and Taipale, J. (2011). Methods for Analysis of Transcription Factor DNA-
Binding Specificity In Vitro. In A Handbook of Transcription Factors, T.R. Hughes, ed.
(Springer Netherlands), pp. 155–173.
Jolma, A., Kivioja, T., Toivonen, J., Cheng, L., Wei, G., Enge, M., Taipale, M.,
Vaquerizas, J.M., Yan, J., Sillanpää, M.J., et al. (2010). Multiplexed massively parallel
SELEX for characterization of human transcription factor binding specificities. Genome
Res. 20, 861–873.
Jolma, A., Yan, J., Whitington, T., Toivonen, J., Nitta, K.R., Rastas, P., Morgunova, E.,
Enge, M., Taipale, M., Wei, G., et al. (2013). DNA-Binding Specificities of Human
Transcription Factors. Cell 152, 327–339.
Joshi, R., Passner, J.M., Rohs, R., Jain, R., Sosinsky, A., Crickmore, M.A., Jacob, V.,
Aggarwal, A.K., Honig, B., and Mann, R.S. (2007). Functional specificity of a Hox
protein mediated by the recognition of minor groove structure. Cell 131, 530–543.
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F., and Chen, L. (2012). Genome
architectures revealed by tethered chromosome conformation capture and population-
based modeling. Nat. Biotechnol. 30, 90–98.
Katoh, Y., Molkentin, J.D., Dave, V., Olson, E.N., and Periasamy, M. (1998). MEF2B Is
a Component of a Smooth Muscle-specific Complex That Binds an A/T-rich Element
Important for Smooth Muscle Myosin Heavy Chain Gene Expression. J. Biol. Chem.
273, 1511–1518.
Khurana, E., Fu, Y., Chakravarty, D., Demichelis, F., Rubin, M.A., and Gerstein, M.
(2016). Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 17, 93–108.
Kim, S.S., Tam, J.K., Wang, A.F., and Hegde, R.S. (2000). The structural basis of DNA
target discrimination by papillomavirus E2 proteins. J. Biol. Chem. 275, 31245–31254.
Kim, Y.C., Grable, J.C., Love, R., Greene, P.J., and Rosenberg, J.M. (1990). Refinement
of Eco RI endonuclease crystal structure: a revised protein chain tracing. Science 249,
1307–1309.
Kissinger, C.R., Liu, B.S., Martin-Blanco, E., Kornberg, T.B., and Pabo, C.O. (1990).
Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a
framework for understanding homeodomain-DNA interactions. Cell 63, 579–590.
Kitayner, M., Rozenberg, H., Rohs, R., Suad, O., Rabinovich, D., Honig, B., and
Shakked, Z. (2010). Diversity in DNA recognition by p53 revealed by crystal structures
with Hoogsteen base pairs. Nat. Struct. Mol. Biol. 17, 423–429.
Kochanek, S., Renz, D., and Doerfler, W. (1993). Differences in the accessibility of
methylated and unmethylated DNA to DNase I. Nucleic Acids Res. 21, 5843–5845.
Koudelka, G.B., Mauro, S.A., and Ciubotaru, M. (2006). Indirect readout of DNA
sequence by proteins: the roles of DNA sequence-dependent intrinsic and extrinsic
forces. Prog. Nucleic Acid Res. Mol. Biol. 81, 143–177.
Kumar, A., Meinke, G., Reese, D.K., Moine, S., Phelan, P.J., Fradet-Turcotte, A.,
Archambault, J., Bohm, A., and Bullock, P.A. (2007). Model for T-antigen-dependent
melting of the simian virus 40 core origin based on studies of the interaction of the beta-
hairpin with DNA. J. Virol. 81, 4808–4818.
Kvon, E.Z., Kazmar, T., Stampfel, G., Yáñez-Cuna, J.O., Pagani, M., Schernhuber, K.,
Dickson, B.J., and Stark, A. (2014). Genome-scale functional characterization of
Drosophila developmental enhancers in vivo. Nature 512, 91–95.
Lahm, A., and Suck, D. (1991). DNase I-induced DNA conformation. 2 Å structure of a
DNase I-octamer complex. J. Mol. Biol. 222, 645–667.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K.,
Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the
human genome. Nature 409, 860–921.
Lara-González, S., Birktoft, J.J., and Lawson, C.L. (2010). Structure of the Escherichia
coli RNA polymerase alpha subunit C-terminal domain. Acta Crystallogr. D Biol.
Crystallogr. 66, 806–812.
Lavery, R., and Sklenar, H. (1989). Defining the structure of irregular nucleic acids:
conventions and principles. J. Biomol. Struct. Dyn. 6, 655–667.
Lawson, C.L., and Berman, H.M. (2008). Chapter 4: Indirect Readout of DNA Sequence
by Proteins. In Protein-Nucleic Acid Interactions: Structural Biology, (Cambridge, UK:
The Royal Society of Chemistry), pp. 66–90.
Lazarovici, A., Zhou, T., Shafer, A., Dantas Machado, A.C., Riley, T.R., Sandstrom, R.,
Sabo, P.J., Lu, Y., Rohs, R., Stamatoyannopoulos, J.A., et al. (2013). Probing DNA shape
and methylation state on a genomic scale with DNase I. Proc. Natl. Acad. Sci. 110, 6376–
6381.
Lee, J.-W., and Helmann, J.D. (2007). Functional specialization within the Fur family of
metalloregulators. Biometals Int. J. Role Met. Ions Biol. Biochem. Med. 20, 485–499.
Levo, M., Zalckvar, E., Sharon, E., Dantas Machado, A.C., Kalma, Y., Lotam-Pompan,
M., Weinberger, A., Yakhini, Z., Rohs, R., and Segal, E. (2015). Unraveling determinants
of transcription factor binding outside the core binding site. Genome Res. 25, 1018–1029.
Lister, R., Pelizzola, M., Dowen, R.H., Hawkins, R.D., Hon, G., Tonti-Filippini, J., Nery,
J.R., Lee, L., Ye, Z., Ngo, Q.-M., et al. (2009). Human DNA methylomes at base
resolution show widespread epigenomic differences. Nature 462, 315–322.
Liu, X., Schuck, S., and Stenlund, A. (2007). Adjacent residues in the E1 initiator beta-
hairpin define different roles of the beta-hairpin in Ori melting, helicase loading, and
helicase activity. Mol. Cell 25, 825–837.
Liu, Y., Toh, H., Sasaki, H., Zhang, X., and Cheng, X. (2012). An atomic model of Zfp57
recognition of CpG methylation within a specific DNA sequence. Genes Dev. 26, 2374–
2379.
Liu, Y., Zhang, X., Blumenthal, R.M., and Cheng, X. (2013). A common mode of
recognition for methylated CpG. Trends Biochem. Sci. 38, 177–183.
Liu, Y., Olanrewaju, Y.O., Zheng, Y., Hashimoto, H., Blumenthal, R.M., Zhang, X., and
Cheng, X. (2014). Structural basis for Klf4 recognition of methylated DNA. Nucleic
Acids Res. 42, 4859–4867.
Lohse, M.B., Rosenberg, O.S., Cox, J.S., Stroud, R.M., Finer-Moore, J.S., and Johnson,
A.D. (2014). Structure of a new DNA-binding domain which regulates pathogenesis in a
wide variety of fungi. Proc. Natl. Acad. Sci. 111, 10404–10410.
Lomonossoff, G.P., Butler, P.J., and Klug, A. (1981). Sequence-dependent variation in
the conformation of DNA. J. Mol. Biol. 149, 745–760.
de Lorenzo, V., Wee, S., Herrero, M., and Neilands, J.B. (1987). Operator sequences of
the aerobactin operon of plasmid ColV-K30 binding the ferric uptake regulation (fur)
repressor. J. Bacteriol. 169, 2624–2630.
Lu, X.J., Shakked, Z., and Olson, W.K. (2000). A-form conformational motifs in ligand-
bound DNA structures. J. Mol. Biol. 300, 819–840.
Luisi, B.F., Xu, W.X., Otwinowski, Z., Freedman, L.P., Yamamoto, K.R., and Sigler,
P.B. (1991). Crystallographic analysis of the interaction of the glucocorticoid receptor
with DNA. Nature 352, 497–505.
Luscombe, N.M., Austin, S.E., Berman, H.M., and Thornton, J.M. (2000). An overview
of the structures of protein-DNA complexes. Genome Biol. 1, REVIEWS001.
MacArthur, S., Li, X.-Y., Li, J., Brown, J.B., Chu, H.C., Zeng, L., Grondona, B.P.,
Hechmer, A., Simirenko, L., Keränen, S.V.E., et al. (2009). Developmental roles of 21
Drosophila transcription factors are determined by quantitative differences in binding to
an overlapping set of thousands of genomic regions. Genome Biol. 10, R80.
Mahony, S., and Pugh, B.F. (2015). Protein-DNA binding in high-resolution. Crit. Rev.
Biochem. Mol. Biol. 50, 269–283.
Mandel-Gutfreund, Y., Margalit, H., Jernigan, R.L., and Zhurkin, V.B. (1998). A role for
CH...O interactions in protein-DNA recognition. J. Mol. Biol. 277, 1129–1140.
Mardis, E.R. (2007). ChIP-seq: welcome to the new frontier. Nat. Methods 4, 613–614.
Mathelier, A., Fornes, O., Arenillas, D.J., Chen, C.-Y., Denay, G., Lee, J., Shi, W., Shyr,
C., Tan, G., Worsley-Hunt, R., et al. (2016a). JASPAR 2016: a major expansion and
update of the open-access database of transcription factor binding profiles. Nucleic Acids
Res. 44, D110–D115.
Mathelier, A., Xin, B., Chiu, T.-P., Yang, L., Rohs, R., and Wasserman, W.W. (2016b).
DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo.
Cell Syst. 0.
Mayer-Jung, C., Moras, D., and Timsit, Y. (1998). Hydration and recognition of
methylated CpG steps in DNA. EMBO J. 17, 2709–2718.
Meijsing, S.H., Pufall, M.A., So, A.Y., Bates, D.L., Chen, L., and Yamamoto, K.R.
(2009). DNA binding site sequence directs glucocorticoid receptor structure and activity.
Science 324, 407–410.
Meng, X., Brodsky, M.H., and Wolfe, S.A. (2005). A bacterial one-hybrid system for
determining the DNA-binding specificity of transcription factors. Nat. Biotechnol. 23,
988–994.
Molkentin, J.D., Black, B.L., Martin, J.F., and Olson, E.N. (1995). Cooperative activation
of muscle gene expression by MEF2 and myogenic bHLH proteins. Cell 83, 1125–1136.
Molkentin, J.D., Black, B.L., Martin, J.F., and Olson, E.N. (1996). Mutational analysis of
the DNA binding, dimerization, and transcriptional activation domains of MEF2C. Mol.
Cell. Biol. 16, 2627–2636.
Morin, R.D., Mendez-Lago, M., Mungall, A.J., Goya, R., Mungall, K.L., Corbett, R.D.,
Johnson, N.A., Severson, T.M., Chiu, R., Field, M., et al. (2011). Frequent mutation of
histone-modifying genes in non-Hodgkin lymphoma. Nature 476, 298–303.
Morin, R.D., Assouline, S., Alcaide, M., Mohajeri, A., Johnston, R.L., Chong, L.,
Grewal, J., Yu, S., Fornika, D., Bushell, K., et al. (2016). Genetic Landscapes of
Relapsed and Refractory Diffuse Large B-Cell Lymphomas. Clin. Cancer Res. Off. J.
Am. Assoc. Cancer Res. 22, 2290–2300.
Muiño, J.M., Smaczniak, C., Angenent, G.C., Kaufmann, K., and Dijk, A.D.J. van
(2014). Structural determinants of DNA recognition by plant MADS-domain
transcription factors. Nucleic Acids Res. 42, 2138–2146.
Murphy, T.C., Saleem, R.A., Footz, T., Ritch, R., McGillivray, B., and Walter, M.A.
(2004). The wing 2 region of the FOXC1 forkhead domain is necessary for normal DNA-
binding and transactivation functions. Invest. Ophthalmol. Vis. Sci. 45, 2531–2538.
Nakagawa, S., Gisselbrecht, S.S., Rogers, J.M., Hartl, D.L., and Bulyk, M.L. (2013).
DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc.
Natl. Acad. Sci. U. S. A. 110, 12349–12354.
Nguyen, T., Wang, J., and Schulz, R.A. (2002). Mutations within the conserved MADS
box of the D-MEF2 muscle differentiation factor result in a loss of DNA binding ability
and lethality in Drosophila. Differentiation 70, 438–446.
Ochsner, U.A., and Vasil, M.L. (1996). Gene repression by the ferric uptake regulator in
Pseudomonas aeruginosa: cycle selection of iron-regulated genes. Proc. Natl. Acad. Sci.
U. S. A. 93, 4409–4414.
Ohki, I., Shimotake, N., Fujita, N., Jee, J., Ikegami, T., Nakao, M., and Shirakawa, M.
(2001). Solution structure of the methyl-CpG binding domain of human MBD1 in
complex with methylated DNA. Cell 105, 487–497.
Oishi, Y., Manabe, I., Imai, Y., Hara, K., Horikoshi, M., Fujiu, K., Tanaka, T., Aizawa,
T., Kadowaki, T., and Nagai, R. (2010). Regulatory Polymorphism in Transcription
Factor KLF5 at the MEF2 Element Alters the Response to Angiotensin II and Is
Associated with Human Hypertension. FASEB J. 24, 1780–1788.
Oliphant, A.R., Brandl, C.J., and Struhl, K. (1989). Defining the sequence specificity of
DNA-binding proteins by selecting binding sites from random-sequence
oligonucleotides: analysis of yeast GCN4 protein. Mol. Cell. Biol. 9, 2944–2949.
Olson, W.K., Gorin, A.A., Lu, X.-J., Hock, L.M., and Zhurkin, V.B. (1998). DNA
sequence-dependent deformability deduced from protein–DNA crystal complexes. Proc.
Natl. Acad. Sci. 95, 11163–11168.
Olson, W.K., Bansal, M., Burley, S.K., Dickerson, R.E., Gerstein, M., Harvey, S.C.,
Heinemann, U., Lu, X.J., Neidle, S., Shakked, Z., et al. (2001). A standard reference
frame for the description of nucleic acid base-pair geometry. J. Mol. Biol. 313, 229–237.
Otwinowski, Z., Schevitz, R.W., Zhang, R.G., Lawson, C.L., Joachimiak, A.,
Marmorstein, R.Q., Luisi, B.F., and Sigler, P.B. (1988). Crystal structure of trp
repressor/operator complex at atomic resolution. Nature 335, 321–329.
Pabo, C.O., and Sauer, R.T. (1992). Transcription factors: structural families and
principles of DNA recognition. Annu. Rev. Biochem. 61, 1053–1095.
Pan, Y., and Nussinov, R. (2009). Cooperativity dominates the genomic organization of
p53-response elements: a mechanistic view. PLoS Comput. Biol. 5, e1000448.
Panne, D., Maniatis, T., and Harrison, S.C. (2004). Crystal structure of ATF-2/c-Jun and
IRF-3 bound to the interferon-β enhancer. EMBO J. 23, 4384–4393.
Panne, D., Maniatis, T., and Harrison, S.C. (2007). An Atomic Model of the Interferon-β
Enhanceosome. Cell 129, 1111–1123.
Pasqualucci, L., Trifonov, V., Fabbri, G., Ma, J., Rossi, D., Chiarenza, A., Wells, V.A.,
Grunn, A., Messina, M., Elliot, O., et al. (2011). Analysis of the coding genome of
diffuse large B-cell lymphoma. Nat. Genet. 43, 830–837.
Passner, J.M., Ryoo, H.D., Shen, L., Mann, R.S., and Aggarwal, A.K. (1999). Structure
of a DNA-bound Ultrabithorax-Extradenticle homeodomain complex. Nature 397, 714–
719.
Pérez, A., Castellazzi, C.L., Battistini, F., Collinet, K., Flores, O., Deniz, O., Ruiz, M.L.,
Torrents, D., Eritja, R., Soler-López, M., et al. (2012). Impact of methylation on the
physical properties of DNA. Biophys. J. 102, 2140–2148.
Petrey, D., and Honig, B. (2003). GRASP2: visualization, surface properties, and
electrostatics of macromolecular structures and sequences. Methods Enzymol. 374, 492–
509.
Pon, J.R., and Marra, M.A. (2016). MEF2 transcription factors: developmental regulators
and emerging cancer genes. Oncotarget 7, 2297–2312.
Potthoff, M.J., and Olson, E.N. (2007). MEF2: a central regulator of diverse
developmental programs. Development 134, 4131–4140.
Qin, P.Z., Haworth, I.S., Cai, Q., Kusnetzow, A.K., Grant, G.P.G., Price, E.A., Sowa,
G.Z., Popova, A., Herreros, B., and He, H. (2007). Measuring nanometer distances in
nucleic acids using a sequence-independent nitroxide probe. Nat. Protoc. 2, 2354–2365.
Rastogi, C., Liu, D., and Bussemaker, H.J. (2015). SELEX: Functions for analyzing
SELEX-seq data. R package version 1.4.0.
Razin, A., and Riggs, A.D. (1980). DNA methylation and gene function. Science 210,
604–610.
Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J.
(2014). SELEX-seq: a method for characterizing the complete repertoire of binding site
preferences for transcription factor complexes. Methods Mol. Biol. Clifton NJ 1196,
255–278.
Rocchia, W., Sridharan, S., Nicholls, A., Alexov, E., Chiabrera, A., and Honig, B.
(2002). Rapid grid-based construction of the molecular surface and the use of induced
surface charge to calculate reaction field energies: applications to the molecular systems
and geometric objects. J. Comput. Chem. 23, 128–137.
Rocha, H., Sampaio, M., Rocha, R., Fernandes, S., and Leão, M. (2016). MEF2C
haploinsufficiency syndrome: Report of a new MEF2C mutation and review. Eur. J. Med.
Genet.
Rockel, S., Geertz, M., and Maerkl, S.J. (2012). MITOMI: a microfluidic platform for in
vitro characterization of transcription factor-DNA interaction. Methods Mol. Biol. Clifton
NJ 786, 97–114.
Rogers, J.M., Barrera, L.A., Reyon, D., Sander, J.D., Kellis, M., Joung, J.K., and Bulyk,
M.L. (2015). Context influences on TALE-DNA binding revealed by quantitative
profiling. Nat. Commun. 6, 7440.
Rohs, R., Sklenar, H., and Shakked, Z. (2005a). Structural and Energetic Origins of
Sequence-Specific DNA Bending: Monte Carlo Simulations of Papillomavirus E2-DNA
Binding Sites. Structure 13, 1499–1509.
Rohs, R., Bloch, I., Sklenar, H., and Shakked, Z. (2005b). Molecular flexibility in ab
initio drug docking to DNA: binding-site and binding-mode transitions in all-atom Monte
Carlo simulations. Nucleic Acids Res. 33, 7048–7057.
Rohs, R., West, S.M., Sosinsky, A., Liu, P., Mann, R.S., and Honig, B. (2009a). The role
of DNA shape in protein–DNA recognition. Nature 461, 1248–1253.
Rohs, R., West, S.M., Liu, P., and Honig, B. (2009b). Nuance in the double-helix and its
role in protein-DNA recognition. Curr. Opin. Struct. Biol. 19, 171–177.
Rohs, R., Jin, X., West, S.M., Joshi, R., Honig, B., and Mann, R.S. (2010). Origins of
Specificity in Protein-DNA Recognition. Annu. Rev. Biochem. 79, 233–269.
Rosenberg, J.M., Seeman, N.C., Kim, J.J., Suddath, F.L., Nicholas, H.B., and Rich, A.
(1973). Double helix at atomic resolution. Nature 243, 150–154.
Ross, W., and Gourse, R.L. (2005). Sequence-Independent Upstream DNA–αCTD
Interactions Strongly Stimulate Escherichia coli RNA Polymerase-lacUV5 Promoter
Association. Proc. Natl. Acad. Sci. U. S. A. 102, 291–296.
Ross, W., Aiyar, S.E., Salomon, J., and Gourse, R.L. (1998). Escherichia coli Promoters
with UP Elements of Different Strengths: Modular Structure of Bacterial Promoters. J.
Bacteriol. 180, 5375–5383.
Sabo, P.J., Hawrylycz, M., Wallace, J.C., Humbert, R., Yu, M., Shafer, A., Kawamoto, J.,
Hall, R., Mack, J., Dorschner, M.O., et al. (2004). Discovery of functional noncoding
elements by digital analysis of chromatin structure. Proc. Natl. Acad. Sci. U. S. A. 101,
16837–16842.
Santelli, E., and Richmond, T.J. (2000). Crystal structure of MEF2A core bound to DNA
at 1.5 Å resolution. J. Mol. Biol. 297, 437–449.
Scarsdale, J.N., Webb, H.D., Ginder, G.D., and Williams, D.C. (2011). Solution structure
and dynamic analysis of chicken MBD2 methyl binding domain bound to a target-
methylated DNA sequence. Nucleic Acids Res. 39, 6741–6752.
Schneider, T.D., and Stephens, R.M. (1990). Sequence logos: a new way to display
consensus sequences. Nucleic Acids Res. 18, 6097–6100.
Schuck, S., and Stenlund, A. (2005). Assembly of a double hexameric helicase. Mol. Cell
20, 377–389.
Schuetz, A., Nana, D., Rose, C., Zocher, G., Milanovic, M., Koenigsmann, J., Blasig, R.,
Heinemann, U., and Carstanjen, D. (2011). The structure of the Klf4 DNA-binding
domain links to self-renewal and macrophage differentiation. Cell. Mol. Life Sci. CMLS
68, 3121–3131.
Seeman, N.C., Rosenberg, J.M., and Rich, A. (1976). Sequence-specific recognition of
double helical nucleic acids by proteins. Proc. Natl. Acad. Sci. U. S. A. 73, 804–808.
Severin, P.M.D., Zou, X., Gaub, H.E., and Schulten, K. (2011). Cytosine methylation
alters DNA mechanical properties. Nucleic Acids Res. 39, 8740–8751.
Shakked, Z., and Rabinovich, D. (1986). The effect of the base sequence on the fine
structure of the DNA double helix. Prog. Biophys. Mol. Biol. 47, 159–195.
Shakked, Z., Guerstein-Guzikevich, G., Eisenstein, M., Frolow, F., and Rabinovich, D.
(1989). The conformation of the DNA double helix in the crystal is dependent on its
environment. Nature 342, 456–460.
Shen, J., Gai, D., Patrick, A., Greenleaf, W.B., and Chen, X.S. (2005). The roles of the
residues on the channel beta-hairpin and loop structures of simian virus 40 hexameric
helicase. Proc. Natl. Acad. Sci. U. S. A. 102, 11248–11253.
Simmons, D.T. (2000). SV40 large T antigen functions in DNA replication and
transformation. Adv. Virus Res. 55, 75–134.
Slattery, M., Riley, T., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R.,
Honig, B., Bussemaker, H.J., et al. (2011). Cofactor Binding Evokes Latent Differences
in DNA Binding Specificity between Hox Proteins. Cell 147, 1270–1282.
Slattery, M., Zhou, T., Yang, L., Dantas Machado, A.C., Gordân, R., and Rohs, R.
(2014). Absence of a simple code: how transcription factors read the genome. Trends
Biochem. Sci. 39, 381–399.
Smeenk, L., van Heeringen, S.J., Koeppel, M., van Driel, M.A., Bartels, S.J.J., Akkers,
R.C., Denissov, S., Stunnenberg, H.G., and Lohrum, M. (2008). Characterization of
genome-wide p53-binding sites upon stress response. Nucleic Acids Res. 36, 3639–3654.
Sowa, G.Z., and Qin, P.Z. (2008). Site-directed spin labeling studies on nucleic acid
structure and dynamics. Prog. Nucleic Acid Res. Mol. Biol. 82, 147–197.
Spielmann, M., and Mundlos, S. (2016). Looking beyond the genes: the role of non-
coding variants in human disease. Hum. Mol. Genet. 25, R157–R165.
Stella, S., Cascio, D., and Johnson, R.C. (2010). The shape of the DNA minor groove
directs binding by the DNA-bending protein Fis. Genes Dev. 24, 814–826.
Stormo, G.D. (2000). DNA binding sites: representation and discovery. Bioinformatics
16, 16–23.
Stormo, G.D., and Zhao, Y. (2010). Determining the specificity of protein–DNA
interactions. Nat. Rev. Genet. 11, 751–760.
Stormo, G.D., Schneider, T.D., Gold, L., and Ehrenfeucht, A. (1982). Use of the
“Perceptron” algorithm to distinguish translational initiation sites in E. coli. Nucleic
Acids Res. 10, 2997–3011.
Stormo, G.D., Zuo, Z., and Chang, Y.K. (2014). Spec-seq: determining protein–DNA-
binding specificity by sequencing. Brief. Funct. Genomics elu043.
Suck, D., Oefner, C., and Kabsch, W. (1984). Three-dimensional structure of bovine
pancreatic DNase I at 2.5 Å resolution. EMBO J. 3, 2423–2430.
Suck, D., Lahm, A., and Oefner, C. (1988). Structure refined to 2 Å of a nicked DNA
octanucleotide complex with DNase I. Nature 332, 464–468.
Tippin, D.B., and Sundaralingam, M. (1997). Nine polymorphic crystal structures of
d(CCGGGCCCGG), d(CCGGGCCm5CGG), d(Cm5CGGGCCm5CGG) and
d(CCGGGCC(Br)5CGG) in three different conformations: effects of spermine binding
and methylation on the bending and condensation of A-DNA. J. Mol. Biol. 267, 1171–
1185.
Travers, A.A. (1989). DNA conformation and protein binding. Annu. Rev. Biochem. 58,
427–452.
Tsodikov, O.V., and Biswas, T. (2011). Structural and thermodynamic signatures of
DNA recognition by Mycobacterium tuberculosis DnaA. J. Mol. Biol. 410, 461–476.
Tuerk, C., and Gold, L. (1990). Systematic evolution of ligands by exponential
enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505–510.
UniProt Consortium (2015). UniProt: a hub for protein information. Nucleic Acids Res.
43, D204–D212.
Vousden, K.H., and Prives, C. (2009). Blinded by the Light: The Growing Complexity of
p53. Cell 137, 413–431.
Wales, S., Hashemi, S., Blais, A., and McDermott, J.C. (2014). Global MEF2 target gene
analysis in cardiac and skeletal muscle reveals novel regulation of DUSP6 by p38MAPK-
MEF2 signaling. Nucleic Acids Res. 42, 11349–11362.
Wang, A.H., Quigley, G.J., Kolpak, F.J., Crawford, J.L., van Boom, J.H., van der Marel,
G., and Rich, A. (1979). Molecular structure of a left-handed double helical DNA
fragment at atomic resolution. Nature 282, 680–686.
Wang, L., Fan, C., Topol, S.E., Topol, E.J., and Wang, Q. (2003). Mutation of MEF2A in
an Inherited Disorder with Features of Coronary Artery Disease. Science 302, 1578–
1581.
Ward, L.D., and Kellis, M. (2012). Interpreting noncoding genetic variation in complex
traits and human disease. Nat. Biotechnol. 30, 1095–1106.
Watkins, D., Hsiao, C., Woods, K.K., Koudelka, G.B., and Williams, L.D. (2008). P22 c2
repressor-operator complex: mechanisms of direct and indirect readout. Biochemistry
47, 2325–2338.
Watson, J.D., and Crick, F.H.C. (1953). Molecular Structure of Nucleic Acids: A
Structure for Deoxyribose Nucleic Acid. Nature 171, 737–738.
Wei, C.-L., Wu, Q., Vega, V.B., Chiu, K.P., Ng, P., Zhang, T., Shahab, A., Yong, H.C.,
Fu, Y., Weng, Z., et al. (2006). A Global Map of p53 Transcription-Factor Binding Sites
in the Human Genome. Cell 124, 207–219.
Weinberg, R.L., Veprintsev, D.B., Bycroft, M., and Fersht, A.R. (2005). Comparative
binding of p53 to its promoter and DNA recognition elements. J. Mol. Biol. 348, 589–
596.
Wilson, D.S., and Szostak, J.W. (1999). In vitro selection of functional nucleic acids.
Annu. Rev. Biochem. 68, 611–647.
Wu, W., Huang, X., Cheng, J., Li, Z., Folter, S. de, Huang, Z., Jiang, X., Pang, H., and
Tao, S. (2011). Conservation and Evolution in and among SRF- and MEF2-Type MADS
Domains and Their Binding Sites. Mol. Biol. Evol. 28, 501–511.
Wu, Y., Dey, R., Han, A., Jayathilaka, N., Philips, M., Ye, J., and Chen, L. (2010).
Structure of the MADS-box/MEF2 Domain of MEF2A Bound to DNA and Its
Implication for Myocardin Recruitment. J. Mol. Biol. 397, 520–533.
Xu, Y., Xu, C., Kato, A., Tempel, W., Abreu, J.G., Bian, C., Hu, Y., Hu, D., Zhao, B.,
Cerovina, T., et al. (2012). Tet3 CXXC domain and dioxygenase activity cooperatively
regulate key genes for Xenopus eye and neural development. Cell 151, 1200–1213.
Yang, L., Zhou, T., Dror, I., Mathelier, A., Wasserman, W.W., Gordân, R., and Rohs, R.
(2014). TFBSshape: a motif database for DNA shape features of transcription factor
binding sites. Nucleic Acids Res. 42, D148–D155.
Ying, C.Y., Dominguez-Sola, D., Fabi, M., Lorenz, I.C., Hussein, S., Bansal, M.,
Califano, A., Pasqualucci, L., Basso, K., and Dalla-Favera, R. (2013). MEF2B mutations
lead to deregulated expression of the oncogene BCL6 in diffuse large B cell lymphoma.
Nat. Immunol. 14, 1084–1092.
Yoon, C., Privé, G.G., Goodsell, D.S., and Dickerson, R.E. (1988). Structure of an
Alternating-B DNA Helix and Its Relationship to A-Tract DNA. Proc. Natl. Acad. Sci. U.
S. A. 85, 6332–6336.
Zhang, J., Grubor, V., Love, C.L., Banerjee, A., Richards, K.L., Mieczkowski, P.A.,
Dunphy, C., Choi, W., Au, W.Y., Srivastava, G., et al. (2013). Genetic heterogeneity of
diffuse large B-cell lymphoma. Proc. Natl. Acad. Sci. U. S. A. 110, 1398–1403.
Zhang, X., Dantas Machado, A.C., Ding, Y., Chen, Y., Lu, Y., Duan, Y., Tham, K.W.,
Chen, L., Rohs, R., and Qin, P.Z. (2014). Conformations of p53 response elements in
solution deduced using site-directed spin labeling and Monte Carlo sampling. Nucleic
Acids Res. 42, 2789–2797.
Zhang, Y., Van Coillie, S., Fang, J.-Y., and Xu, J. (2016). Gain of function of mutant p53:
R282W on the peak? Oncogenesis 5, e196.
Zhao, Y., Granas, D., and Stormo, G.D. (2009). Inferring binding energies from selected
binding sites. PLoS Comput. Biol. 5, e1000590.
Zhao, Y., Ruan, S., Pandey, M., and Stormo, G.D. (2012). Improved models for
transcription factor binding site identification using nonindependent interactions.
Genetics 191, 781–790.
Zhou, T., Yang, L., Lu, Y., Dror, I., Dantas Machado, A.C., Ghane, T., Di Felice, R., and
Rohs, R. (2013). DNAshape: a method for the high-throughput prediction of DNA
structural features on a genomic scale. Nucleic Acids Res. 41, W56–W62.
Zhou, T., Shen, N., Yang, L., Abe, N., Horton, J., Mann, R.S., Bussemaker, H.J., Gordân,
R., and Rohs, R. (2015). Quantitative modeling of transcription factor binding
specificities using DNA shape. Proc. Natl. Acad. Sci. U. S. A. 112, 4654–4659.
Zhurkin, V.B., Tolstorukov, M.Y., Xu, F., Colasanti, A.V., and Olson, W.K. (2005).
Sequence-Dependent Variability of B-DNA. In DNA Conformation and Transcription, T.
Ohyama, ed. (Boston, MA: Springer US), pp. 18–34.
Zilberman, D., Gehring, M., Tran, R.K., Ballinger, T., and Henikoff, S. (2007). Genome-
wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence
between methylation and transcription. Nat. Genet. 39, 61–69.
Zou, X., Ma, W., Solov’yov, I.A., Chipot, C., and Schulten, K. (2012). Recognition of
methylated DNA through methyl-CpG binding domain proteins. Nucleic Acids Res. 40,
2747–2758.
Zweier, M., Gregor, A., Zweier, C., Engels, H., Sticht, H., Wohlleber, E., Bijlsma, E.K.,
Holder, S.E., Zenker, M., Rossier, E., et al. (2010). Mutations in MEF2C from the
5q14.3q15 microdeletion syndrome region are a frequent cause of severe mental
retardation and diminish MECP2 and CDKL5 expression. Hum. Mutat. 31, 722–733.
Zykovich, A., Korf, I., and Segal, D.J. (2009). Bind-n-Seq: high-throughput analysis of in
vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res.
37, e151.
Abstract
Protein-DNA interactions regulate many processes within the cell, so unraveling the intricacies of these recognition modes is a first step toward understanding the larger, complex network that culminates in gene regulation and downstream pathways. Spanning organisms from viruses to mammals, the work presented here expands our understanding of protein-DNA recognition by exploring the specific mechanisms that proteins from different families and various organisms use to recognize their DNA targets. While this level of detail is accessible through the computational analysis of many crystal structures, data from protein-DNA binding assays allow further investigation of binding preferences in a high-throughput context. One of the main points raised by our studies is the role of DNA shape as a key factor in achieving protein-DNA specificity. Whether at the local level, for instance through the recognition of narrow minor grooves and the impact of CpG methylation on DNA shape, or at the global level, through DNA conformational changes as shown for p53, we have shown that proteins can recognize specific shape features of the DNA to achieve binding specificity. Results from the computational analysis prompted us to investigate experimentally the sources of protein-DNA recognition in another protein that our studies identified as a DNA shape reader. By combining insights from structural studies, high-throughput binding assays, and computational analysis of DNA, we are now disentangling the sources of protein-DNA specificity.
Asset Metadata
Creator: Dantas Machado, Ana Carolina (author)
Core Title: Decoding protein-DNA binding determinants mediated through DNA shape readout
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Molecular Biology
Publication Date: 04/13/2018
Defense Date: 08/31/2016
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: 5-methylcytosine, DNA methylation, DNA minor groove, DNA structure, high-throughput binding assay, OAI-PMH Harvest, protein–DNA interaction, protein-DNA recognition, transcription factors
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Rohs, Remo (committee chair), Aparicio, Oscar (committee member), Berman, Helen (committee member), Chen, Lin (committee member)
Creator Email: dantasma@usc.edu, karoldantas1@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-314485
Unique identifier: UC11213716
Identifier: etd-DantasMach-4880.pdf (filename), usctheses-c40-314485 (legacy record id)
Legacy Identifier: etd-DantasMach-4880.pdf
Dmrecord: 314485
Document Type: Dissertation
Rights: Dantas Machado, Ana Carolina
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA