Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
The effects of divalent counterions on the formation and stabilization of RNA tertiary structure
(USC Thesis Other)
The effects of divalent counterions on the formation and stabilization of RNA tertiary structure
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
The Effects of Divalent Counterions on the Formation and Stabilization
of RNA Tertiary Structure
by
Paul S. Henke
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PHYSICS)
December 2015
Copyright 2015 Paul Henke
i
To Terri.
ii
Acknowledgements
My most sincere thanks go to my mentor and advisor, Chi Mak, for his guidance and support
over the past five years. His enthusiasm for teaching and relentless pursuit of knowledge is truly
inspirational, and has motivated me to push myself intellectually much farther than I ever
thought possible. Under his direction and with his support I have become far more confident in
myself and proficient as a scientist.
I would also like to thank the many professors at USC who offered their time and energy
to help me learn and grow as a physicist – notably Aiichiro Nakano, Howard Taylor, Stephan
Haas, and Richard Thompson. Thanks to their dedication to teaching and their helpful attitudes,
I have not only learned a tremendous amount about physics, but I feel that I have become a better
person for having had the chance to work with and study under them.
My deepest appreciation goes to my parents, Jon and Trudy, for being a constant pillar of
support throughout my education. From reading to me as an infant to moving to a new city so
that I could go to a high school where I could play my saxophone and learn computer science, to
doing my laundry in college every other week, I could never fully express my gratitude and love
for them. I have been incredibly lucky to have them as my parents.
Finally, I dedicate this thesis to my fiancée, Terri, who has been consistently tough but
fair throughout my time in graduate school. She is warm-hearted, beautiful, and a joy to be
around, and is always there and willing to share my successes and my failures, and I could not
have made it to where I am without her. My happiest success as a graduate student is
undoubtedly that I found and fell in love with her.
iii
Table of Contents
Acknowledgements ................................................................................................................... ii
Table of Contents .................................................................................................................... iii
Abstract ................................................................................................................................... vi
Chapter 1: Introduction ............................................................................................................1
1.1 RNA Function and Biological Relevance ...........................................................................1
1.2 RNA Structure ...................................................................................................................6
1.3 Charged nature of RNA ................................................................................................... 10
1.4 RNA Simulation .............................................................................................................. 11
1.5 Outline ............................................................................................................................. 13
Chapter 2: Background .......................................................................................................... 15
2.1 Debye-Hückel Theory ...................................................................................................... 15
2.2 Counterion Condensation Theory ..................................................................................... 19
2.3 Recent Developments ...................................................................................................... 23
Chapter 3: Tight-Binding Theory of the RNA-Counterion Interaction ............................... 25
3.1 Abstract ........................................................................................................................... 25
3.2 Introduction ..................................................................................................................... 26
3.3 The Ion RNA Model ........................................................................................................ 27
3.3.1. Mean Field Treatment of the Unassociated Ions........................................................ 34
3.3.2. Monte Carlo Treatment of the Associated Ions. ........................................................ 40
3.4 Association Equilibrium of Counterions to the RNA ........................................................ 42
3.4.1 Placement of Ion Interaction Sites ............................................................................. 43
3.4.2. Equilibrium Distribution between Associated and Dissociated Ions. ......................... 45
3.4.3 Higher-Valency Counterions ..................................................................................... 49
iv
3.5 Free Energies and Ion-Mediated RNA Fold Stabilities ..................................................... 51
3.5.1 Monte Carlo Calculation of the Free Energy .............................................................. 52
3.6 Ion-Mediated Free Energies and RNA Fold Stabilities ..................................................... 53
3.6.1 The Hammerhead Ribozyme ..................................................................................... 53
3.6.1.1 Ion-Ion Correlation and Other Ion Effects ........................................................... 56
3.6.1.2 Ion Titrations of Specific Ion Interaction Sites .................................................... 59
3.6.2 The Group I Intron in Tetrahymena ........................................................................... 63
3.6.2.1 P4-P6 Domain of the Group I Intron ................................................................... 63
3.6.2.2 The Full Tetrahymena Group I Intron.................................................................. 71
3.7 Conclusion ....................................................................................................................... 73
Chapter 4: Computational Methodology for Calculating the Free Energy of Interaction
between RNAs and Divalent Counterions .............................................................................. 76
4.1 Abstract ........................................................................................................................... 76
4.2 Introduction ..................................................................................................................... 77
4.3 Theory ............................................................................................................................. 78
4.3.1 The RNA-Counterion Model .................................................................................... 80
4.3.2 Ion Configurations Most Relevant to the Free Energy ............................................... 83
4.3.3 Physical Characteristics of Lowest-Energy Counterion Configurations ..................... 86
4.3.4 Domain Boundary Transition Operations .................................................................. 92
4.4 Results ............................................................................................................................. 93
4.4.1 Boundary Search ...................................................................................................... 93
4.4.2 Monte Carlo Search and Validation .......................................................................... 96
4.4.3 Implementation in Atomistic Simulations ................................................................. 98
4.5 Conclusion ..................................................................................................................... 104
v
Chapter 5: Coarse-Grained Model of RNA with Implicit Divalent Counterions ............... 105
5.1 Abstract ......................................................................................................................... 105
5.2. Introduction .................................................................................................................. 105
5.3. Methodology ................................................................................................................ 107
5.3.1. Theory .................................................................................................................... 108
5.3.2 Model ...................................................................................................................... 113
5.3.3. Validation............................................................................................................... 121
5.4. Results .......................................................................................................................... 129
5.5. Conclusion .................................................................................................................... 134
Chapter 6: Conclusion .......................................................................................................... 135
Bibliography .......................................................................................................................... 138
vi
Abstract
Ribonucleic Acid (RNA) is a ubiquitous, highly functional biomolecule found throughout life.
As a polyelectrolyte, RNA is negatively charged and requires the presence of positive
counterions to condense into the specific, folded structures that are necessary for the chemistry
of life. RNA molecules are especially influenced by divalent metal ions such as Mg
2+
, the
presence of which not only helps to neutralize the RNA, but creates an overall attractive free
energy – an important effect not found under simple monovalent salt conditions.
Methods of modeling ionic and polyionic systems date back to the early 20
th
century,
when Peter Debye and Erich Hückel developed their purely analytical, Poisson-Boltzmann
formalism of electrolytes that persists as the primary model used for most theoretical
applications. However, the Debye- Hückel model breaks down when considering highly
correlated polyions such as nucleic acids, as well as divalent and other high-valence ionic
solutions in concentrations similar to those found in vivo. The theoretical limits of the Debye-
Hückel model in tandem with the rise of simulation and numerical methods for studying
biological systems has led to a need for alternative models of polyelectrolyte systems that will
correctly account for complex ionic conditions while simultaneously allowing for calculations
fast enough to be incorporated into simulation.
In this thesis, we study the interaction between RNA and positive counterions, focusing
specifically on divalent cations. We begin with a discussion the structure and function of RNA,
and we cover new research showing that RNA has more functionality and biochemically relevant
applications than ever previously thought. This is followed by a recap of prior theoretical studies
and models of electrolyte and polyelectrolyte systems – both for RNAs specifically as well as
general models. We then present our own model for RNA-ion interactions, which allows us to
calculate the counterion-mediated free energy of RNA, but only through an NP-complete
optimization problem. A heuristic approach to this NP-complete problem is developed, which is
shown to work very well for all functional RNAs smaller than the ribosome, and is
vii
computationally fast and effective enough to be incorporated into full simulation. Monte Carlo
simulations of RNA are run with our implicit divalent ion model incorporated, and we show that
our model matches closely with experiment. Most biomolecular simulation is done via
molecular dynamics methodology, and to account for this, we further develop our numerical
approach so that ensemble forces may be calculated and incorporated into molecular dynamics
simulations. Finally, we develop a full coarse-grained model of RNA that accurately handles the
attractive electrostatic forces generated by the presence of divalent counterions, and use this
model to simulate and study RNA 2-way junctions.
1
Chapter 1:
Introduction
Ribonucleic Acid (RNA) is one of the most important, versatile, and active biomolecules found
in all living cells. Known for the essential role it plays in the processes of gene expression,
regulation, and suppression, RNA also plays an active biochemical role in vivo as a catalyst for
important reactions in biological pathways critical to life. As our understanding of the
biochemical role of RNA is expanded, we must also form a deeper physical understanding of
inter- and intra-molecular interactions of RNA as a means of predicting intracellular behavior
and ultimately controlling genetic disease.
This thesis primarily serves as a discussion of the current state of RNA structure
prediction and the importance and impact of the ionic environment of the cell in RNA folding.
Simulation as a means of RNA structure prediction is dependent on an accurate model of the
physics involved in the intracellular system. In this thesis we model and capture these physics
in order to improve the accuracy and predictive power simulation. In this first chapter, we
preface the discussion by detailing the biological relevance of RNA, its chemical structure, and
the cellular environment in which RNA is typically found.
1.1 RNA Function and Biological Relevance
From the biological perspective, RNA is most well understood for its role in gene expression.
Deoxyribonucleic acid (DNA) encodes all of the genetic information within cells in the form of a
sequence of nucleotides. That sequence, the genome, is used selectively by different types of
cells to build and maintain the appropriate protein enzymes that do the bulk of chemical work in
the cell. Importantly, when DNA is structured in its well-known double helix, it proves to be an
extremely stable molecule, and in eukaryotic cells it is further encapsulated and stabilized within
a heavily regulated nucleus. The chemical stability and physical separation of DNA from the
rest of the cell makes it a powerful medium for long term storage of genetic information because
2
it is of the utmost importance that the gene sequence be preserved for as long as possible within a
single cell. However, due to its conservative properties, DNA is a poor choice to actually pass
information or to dynamically adjust to a variable cellular environment. For instance, if the salt
concentration in a cell begins to increase and the cell needs to react by creating more enzymes
that remove the salt, then a static DNA molecule cannot change, nor would it benefit the cell for
that to happen.
Understanding the nature of the cell and DNA in this way sets the stage for understanding
the role of RNA in vivo. RNA is similar to DNA in that it is also a nucleic acid, ultimately
defined by its sequence of nucleotides, but RNA does not form any structure as stable as the
DNA duplex, and so is not suited to storing long-term genetic information. Rather, RNA is
utilized within the cell for its more dynamic and modifiable properties.
Primarily, RNA is used as a conveyer of genetic information in the form of messenger
RNA (mRNA). mRNA is created by a protein enzyme, RNA Polymerase, which reads
information from the DNA and simultaneously creates a complementary RNA sequence in a
process called transcription. That sequence, the ‘message’, can then be transported away from
the DNA and used elsewhere in the cell. Furthermore, mRNA is easily broken down by
ribonuclease (RNase) enzymes that are ubiquitous in and around any cell, which makes the
mRNA a relatively short-lived molecule. The information carrying capacity and short lifespan of
mRNA allows the cell to quickly adapt to changing conditions which would not be possible with
DNA alone.
The actual message that mRNA encodes is almost always that of a ‘gene sequence’,
which is a sequence of nucleotides that can be used by the cell to create a protein. Proteins are
powerful and functional molecules that act as the workhorse of all biochemistry, but they are
composed of amino acids, and so they cannot be directly copied from DNA in the same way
RNA is transcribed. Instead, a cellular structure called the ribosome reads the gene sequence on
mRNAs and creates a protein with a specific sequence of amino acids in a process called
‘translation’. The ribosome is a very large macromolecular complex (in fact it is so large that it
3
is often visible with simple light microscopy), composed mostly of RNA with many protein
‘chaperone’ molecules that help maintain its structure. The ribosomal RNA (rRNA) that forms
the bulk of the structure is the main functional actor in the ribosome, and is typically 4000–6000
nucleotides (nt) in length depending on the type of cell and ribosomal unit. Interestingly,
because of its huge importance in cellular maintenance and vitality, the nucleotide sequence of
rRNA and the functional sequence in particular are some of the most highly conserved gene
sequences across all life [1].
The mRNA sequence is read by the ribosome in the process of translation, during which
the ribosome utilizes transfer RNA (tRNA) molecules to recruit amino acids into the budding
protein strand. There are dozens of unique tRNA sequences; each one able to pair a specific 3-
nucleotide sequence (known as a codon) in the tRNA sequence itself to a specific amino acid
bound by the tRNA. Thus, the ribosome reads the mRNA one codon at a time, and uses tRNA to
exactly translate the given gene sequence into the desired protein.
mRNA, rRNA, and tRNA represent the canonical forms of RNA which combine to create
the gene expression platform used by all living cells. However, the usefulness of RNA in vivo
only begins with this platform. In fact, a cascade of discoveries over the past 30 years have led
to a much deeper understanding of the importance of RNA, and have helped to shape opinions in
evolutionary biology [2].
Even within the three types of RNA described so far, the biological versatility of RNA is
readily apparent. RNA is able to encode and transmit information via a sequence of nucleotides,
bind with extreme specificity to non-nucleobase molecules such as amino acids, and catalyze the
formation of new chemical bonds when building proteins from individual amino acids. It should
then come as no surprise that there are other types of RNAs that utilize these properties to carry
out many cellular activities.
Two of the most recent and relevant discoveries relating to RNA come in the form of
micro RNA (miRNA) and small interfering RNA (siRNA) [3]. Although RNA double helices
are not as stable as their DNA counterparts, double stranded RNA (dsRNA) can nonetheless be
4
formed from two complementary RNA sequences. miRNA and siRNA both take advantage of
this pairing. When the cell needs to quickly stop expressing a gene, but mRNA that codes for
that gene still exists in the cell and has yet to break down, the cell may transcribe miRNA, which
are short RNA sequences of RNA ( nt), and are semi-complementary to the gene that needs
to stop being transcribed. Once the miRNA finds and binds to the mRNA in question, the
ribosome can no longer translate it, because the ribosome can only translate single-stranded RNA
(ssRNA). In this way, miRNA is an effective way of dynamically maintaining the amount of
genes that can be transcribed and allows the cell to either destroy or store the mRNA for later
use [4,5]. In a very similar way, some cells defend themselves from retrovirus attacks by
creating short siRNAs that are exactly complementary to the single-stranded RNA that the cell
wishes to destroy. When the siRNA binds tightly to the viral mRNA, it becomes a target for
RNases that seek out dsRNAs, and is destroyed [6].
Non-coding, functional RNA molecules are RNAs that do not directly utilize their base
sequence as a part of gene expression or regulation. Much like the ribosome, the sequence of
these RNAs allows them to ‘fold’ into a compact, biologically active structure. These functional
RNAs come in two flavors: ribozymes and riboswitches.
Ribozymes are RNA corollaries to protein enzymes, and help to catalyze important
biochemical reactions [7]. Most known ribozymes are self-excising introns [8,9]. Introns are
sequences of mRNA that must be cut out of the sequence before transcription (exons are the
counterparts of introns, and are transcribed portions of RNA). These ribozymes catalyze a single
excision reaction, in which they cleave the covalent bonds holding them in the RNA sequence,
and then splice together the two loose RNA tails left behind. This self-excision is a one-time
reaction that arises purely due to the base sequence. The only ribozyme besides the ribosome
known to be a true catalyst that can turnover multiple reactions, is the ribonuclease RNase
P [10,11]. Like other ribonucleases, which are all protein enzymes, RNase P cleaves longer
RNA sequences. Ribozymes have been found to exist in both prokaryotes and eukaryotes.
5
Riboswitches are regulatory RNA sequences that exist primarily in prokaryotic cells.
Unlike ribozymes, which fold into a compact form that contains an active site allowing for
catalytic action, riboswitches fold into a compact form that can specifically bind ligands in the
cellular milieu [12,13]. Typically a riboswitch is a part of a gene that helps to regulate the
concentration of a specific type of molecule to which the riboswitch is attuned. As an explicit
example, the bacterial bpuE gene encodes for a purine pump that removes purine molecules such
as adenine from the cell. The bpuE gene contains a riboswitch that is attuned to adenine. During
transcription, the riboswitch will fold due to base-pairing interactions, and if a large
concentration of adenine is present it will likely bind to the riboswitch via hydrogen-bonding
interactions and stabilize the riboswitch structure thereby allowing transcription to continue and
the purine pump to be forged [14]. Had the adenine concentration been low, however, the
riboswitch structure would have not been stabilized and transcription would have halted,
disallowing creation of the purine pump. It is easy to see that the name ‘riboswitch’ is especially
apt, as the sequence acts as a simple switch attuned to the concentration of a specific ligand that
can turn gene expression on or off.
In addition to siRNA, miRNA, and functional ribozymes and riboswitches, there is a
wealth of exciting experimental research currently being conducted on so-called ‘dark RNA’. Of
a DNA sequence, the ‘exome’ is the sequence of exons that is known to actuall y code for
proteins. In a complex organism such as a human, the exome only comprises of the total
DNA sequence. Until recently, the rest of the DNA was thought to be unused by the cell, and
was termed ‘junk DNA’. However, recent discoveries have sho wn that a significant portion of
this non-coding DNA may encode RNAs that play an important role in the cell. Specifically,
experiments have shown that embryo development is heavily dependent on the presence of this
dark RNA [15], and that there are highly-conserved sequences of DNA that codes for non-coding
RNA across a large spectrum of complex organisms [16].
6
1.2 RNA Structure
The structure of RNA is broken up hierarchically into three tiers: primary, secondary, and
tertiary structure. This hierarchy allows for discussion of the RNA structure at different levels,
where the primary structure is the most simple and explicit description, and tertiary structure is
the most complex. Formation of a complete RNA structure is chronological, and each type of
structure is formed sequentially in time [17]. In practice, formation of each structural tier
necessarily follows the formation of the previous tier: primary structure is formed first, followed
by establishment of secondary structure, after which the tertiary structure can form [18,19].
Figure 1.1 A short RNA sequence,
, that illustrates the primary structure of RNA. In the primary
structure, only the base sequence is described, which gives a one-dimensional description of the molecule.
RNA primary structure is defined by the sequence of nucleotides. A nucleotide is a
monomer unit of a nucleic acid, and is comprised of a phosphate group, a ribose sugar, and a
nucleobase. In RNA, the nucleobases are the purines adenine (A) and guanine (G), and the
pyrimidines, cytosine (C) and uracil (U). A nucleoside is the ribose and base without the
phosphate. Nucleotides are sequentially bound together by a phosphodiester bond that connects
7
the 5’ carbon of one nucleoside ribose to the 3’ carbon of the next. An RNA sequence is
asymmetrical, and can unambiguously defined by writing it in the 5’ to 3’ direction. A short
RNA primary structure,
is shown in Fig. 1.1.
Secondary structure is defined by the pairing of bases in one or multiple RNA strands.
RNA nucleotides can base-pair by forming multiple stabilizing hydrogen bonds between bases.
The canonical Watson-Crick base-pairs are A-U and G-C. A-U has two hydrogen bonds and G-
C is more stable with three. Although these sets of base pairs are by far the most commonly
found in RNA secondary structure, it is important to note that nucleobases often interact with
each other in non-canonical ways such that they form stable, non-Watson-Crick base pairs.
These include the ‘wobble pair’ G -U and non-canonical Hoogsteen base pairs, which pair A-U
and G-C, but with a weaker hydrogen bonding arrangement.
Figure 1.2 Secondary structure of a hairpin loop (left), and associated three-dimensional structure (right) of a typical
RNA A-form helix (PDB ID: 4FNJ) [20]. The secondary structure is fully defined by the base-pairing interactions
between nucleotides, which are depicted as bars for standard Watson-Crick hydrogen bonding, and dots for non-
Watson-Crick base pairs, that are nonetheless stabilized by hydrogen bonding. The hairpin loop is the most
common RNA secondary structural motif, and typically forms as an A-form helix.
The most important secondary structural motif is the hairpin loop. This structure forms
via base pairing of multiple consecutive bases along two separate lengths of a single strand of
8
RNA. In this case, there is a loop region that connects the two base-paired sections that is
typically three to five nucleotides in length. The three dimensional structure most often
associated with a hairpin loop is an A-form helix, an illustration of which can be seen in Fig.
1.2. In addition to the hairpin loop, the other major secondary structure motif is the pseudoknot,
which is a base-pairing arrangement where two hairpin loops share each other loop sections as
part of their base-paired stems [21].
RNA tertiary structure is defined by the full set of three-dimensional intra-molecular
interactions within a single RNA molecule. Whereas primary structure is completely defined by
sequence, and secondary structure is completely defined by base-pairing, tertiary structure is
significantly more ephemeral in its meaning. Because all molecules are in constant diffusive
motion, a set of atomic coordinates is not enough to define the true structure of an RNA. That is,
a set of coordinates may define one possible specific structure, but the true tertiary structure is
defined by a set of conserved tertiary interactions.
There are far more RNA tertiary interactions than there are secondary structure motifs.
These interactions are often related to non-secondary base pairing, as in the case of tetraloop-
tetraloop receptor interaction, wherein the free nucleobases in a hairpin loop form hydrogen
bonds with free bases in another part of the molecule [22]. The complete list of tertiary
interactions is too long to detail here, but some of the other important tertiary motifs are kink
turns [23], which are tight, stable bends in the RNA helix; A-minor interactions, which involve
adenine bases forming base-triples in the minor groove of a helix; and metal ion-metal ion
receptor interactions, wherein the RNA is stabilized at a specific point about a metal ion [24].
There are several ways of experimentally determining the tertiary structure of a nucleic
acid. The most popular method by far is X-ray crystallography. Originally developed for
biomolecules in 1958 by Kendrew et. al. [25], the structures of over 100,000 biomolecules have
been solved using crystallography as of 2014 [26], To determine an RNA structure using X-Ray
crystallography, a solution with high concentration of an RNA is prepared, and the RNA is then
carefully crystallized out of solution, often at very low temperatures [27,28]. The RNA crystal is
9
then bombarded by a beam of X-rays, and the resulting diffraction pattern is analyzed to
reconstruct the crystallographic spatial structure of the RNA molecule. This method gives
extremely fine resolution, often of the order of 1-2 Å, and with very few exceptions reveals the
true native conformation of the molecule being analyzed [29]. Note that if the molecule being
analyzed has large flexible regions, it may not be susceptible to crystallization, and it is standard
procedure for an RNA sequence to be modified or truncated in order to produce a useable
crystal. An example of an X-Ray crystal RNA structure is shown in Fig. 1.3.
Figure 1.3 Crystal structure of a mutant guanine riboswitch aptamer domain (PDB ID: 4FEN) [30]. This structure is
67 nucleotides in length and has an atomic resolution of , which is very fine resolution for an RNA structure
of this size.
A corollary to X-Ray crystallography, small angle x-ray scattering (SAXS) also takes
advantage of the principle of elastic X-Ray scattering. However, in the SAXS method
biomolecules are analyzed in solution rather than as a crystal. The advantage to this is that large-
scale polymer properties like the radius of gyration, scaling exponent, and a rough average shape
can be determined for molecules that may not be crystallizable [31,32]. SAXS studies of RNA
in salt solutions have contributed a great deal of knowledge to our understanding of the
interaction between RNA and ions [33–35].
10
The other most commonly used method for revealing a detailed structure of RNA is
nuclear magnetic resonance (NMR) spectroscopy. In NMR, a purified solution of molecule is
placed in a strong magnetic field which orients the spins of the atoms within the molecule. Once
equilibrated, the solution is exposed to radiation in the radio frequency range which is absorbed
by some spins, causing the spins to flip. Measuring this characteristic frequency for all spins
thus reveals the composition and structure of the molecule [36,37]. Because the molecules are
studied in solution, NMR spectroscopy often reveals an ensemble of existing structures, rather
than a single structure revealed by crystallography. There has been much success in studying
nucleic acids via NMR spectroscopy [38].
Beyond X-ray scattering and NMR, several other methods for determining the structure
and dynamics of RNA have been introduced including electron microscopy [39] and neutron
scattering [40]. One other method of note is fluorescence resonance energy transfer (FRET)
which utilizes two chromophore molecules that fluoresce when they are spatially near to each
other. The chromophores can be attached to the RNA molecule at different parts of the
sequence, and then fluorescence is observed over time. The scale and intensity of fluorescence
allows experimentalists to observe RNA dynamics as a function of a varying the
environment [12,41].
1.3 Charged nature of RNA
Having described the many forms of RNA that fold into compact conformations stabilized by
secondary and tertiary structural interactions, it is perhaps surprising to note that RNA is
inherently a highly charged polymer. The phosphate molecules that bind together nucleosides,
PO
4
3-
, naturally ionize in solution. In the RNA polymer, two of the free electrons are bound by
neighboring carbon atoms and so there is a net charge of on each nucleotide where is the
proton charge.
Considering that the phosphates along the polymer backbone naturally repel each other
via the Coulombic interaction, the additional interaction between the RNA molecule and the free
11
ions in the surrounding environment is what allows for folding and stabilization. More
specifically, the cellular environment is an ionic milieu in which ion concentrations are heavily
regulated to maintain optimal conditions for life. With regard to RNA, positive cations are
attracted to the negatively charged RNA. These ions help to neutralize the effective charge on
the RNA and allow it to form compact conformations.
The four cations with the largest concentrations in the cell are K+ ( ) [42],
Na+ ( ) [42], Ca
2+
( ) [43], and Mg
2+
( ) [43]. RNAs will fold
in the presence of high concentrations of any cation, but is especially attuned to Mg
2+
[44,45].
Magnesium concentration is not heavily regulated on an individual cellular basis [46] as is the
case with monovalent potassium and sodium ions but is instead regulated on an intercellular
level in multicellular organisms [43].
While many of the tertiary interactions discussed in the previous section are well
understood in terms of how they form and stabilize, the interaction between RNAs and divalent
ions like Mg
2+
is still not entirely understood. It has been shown experimentally that ribozymes
will fold in the presence of Mg
2+
at physiological concentrations, but that concentrations of
monovalent cations must be many times their physiological levels, indicating that non-
monovalent cations are especially important to RNA folding [47]. In this thesis, we increase our
understanding of the RNA-divalent ion interaction through theory and simulation.
1.4 RNA Simulation
The earliest simulations of RNA were carried out in 1984 by Harvey et. al., who utilized
molecular dynamics (MD) to simulate the Phenylalanine tRNA for 12 picoseconds [48]. tRNA
is a relatively large biomolecule, especially for computers at the time to handle, and so these
simulations were carried out with no explicit solvent molecules. However, due to the charged
nature of RNA and the lack of efficient methods at the time for handling long-range electrostatic
interactions, it was not until the mid-1990s that computational advances allowed for fully-
atomistic MD simulations of small RNA hairpins utilizing explicit solvent [49–51]. Advances in
12
MD simulations and biomolecular force fields have advanced to the point where, in 2013, Chen
et. al. was able to demonstrate fully accurate de novo folding of RNA tetraloops eight
nucleotides in length [52].
While much progress has been made in full atomistic MD simulation of RNA over the
past three decades in terms of accuracy and simulation time, it leaves much to be desired in terms
of a being of practical use and in a predictive capacity. Due to the inherently serial nature of MD
numerical integration, there is a relatively hard limit on the amount of time that can be simulated
in practice. Optimizations in numerical algorithms and advances in extreme parallel processing
has yielded fully atomistic simulations of biomolecules in the 100-μs time range [53], and highly
specialized supercomputers have been able to simulate proteins in the 1-ms time scale [54].
However, simulations such as these are incredibly resource-intensive, are restricted to very small
molecules. In order to study larger, more biologically relevant RNAs, two alternatives to fully-
atomistic MD simulations are currently being pursued.
The first alternative is atomistic Monte Carlo (MC) simulations. MC simulations of
molecules proceed by stochastically generating random molecular conformations and applying a
statistical methodology to the resulting conformational space. In this way, MC is not restricted
by time, and is technically able to explore the full space of the system. MC simulation of RNA
was originally studied by Ulmschneider et. al. whose MC ‘moves’ consisted of rel atively small
backbone bending [55]. Loops MC, an RNA simulation platform developed by Mak [56] allows
much larger conformational moves which allows for study of large RNA molecules. Ultimately,
the effectiveness of MC simulation depends on the efficiency of the moves being applied. As
there are no ‘incorrect’ ways of developing these moves, this area remains largely open to future
study.
The second alternative to atomistic MD with explicit solvent is to reduce the number of
atoms being simulated by either simulating the an atomistic model in implicit solvent, by running
coarse-grained (CG) MD, or both. All RNA in life exists in an aqueous solution, and as such, a
common approach to approximate water without explicitly simulating it is by simply considering
13
it to be constant dielectric medium. Although the relative dielectric of water is , the small
scale of the molecules means a more accurate dielectric value may be between and [57].
Occasionally, a distance-dependent sigmoidal dielectric is applied, with the idea being that atoms
close to each other likely have few water molecules in between, while far away atoms are
essentially separated by bulk water [58–60]. These implicit solvent models typically ignore the
solvation effects of water molecules, which certainly plays a large role in biomolecular
dynamics, and tend to also neglect the ionic milieu. Certain continuum approximations to the
ionic solution exist, and are discussed in much more detail in Chapter 2.
In CGMD, commonly arising groups of atoms are integrated out of the molecular model
in such a way that the overall properties of the molecule remain the same, but with ‘beads’
representing the groups of atoms that are no longer treated explicitly. The advantage of reducing
the number of objects being simulated is that the simulation is able to proceed much faster. For
instance, if there is an order of magnitude reduction in the number of objects simulated, then
operations that run in (
) time run orders of magnitude faster. CGMD methods for
nucleic acids have proliferated in the past decade [61–65], and we build on this work to
introduce our own CGMD model in Chapter 5.
1.5 Outline
The goal of this thesis is to outline our development and application of a novel model of
the charged nature of RNA and its interaction with solvent ions. There are six chapters in this
thesis including this one.
Chapter 2 is a review of electrolyte theory developed to this point. Attempts to
understand the behavior of electrolytes and polyelectrolytes date back nearly 100 years,
producing many fruitful results and creating a rich field with many challenging problems yet to
be solved. This chapter introduces the Debye-Hückel mean-field theory of electrolyte solutions,
the counterion condensation model of polyelectrolytes, and more modern tight-binding models
14
that have arisen in large part due to the rise of cheap and powerful computers in combination
with more efficient numerical methods.
In Chapter 3 we introduce our own tight-binding model in detail. This is a tight-binding
model built mainly with multivalent ions in mind. We show that our model is effective for many
different species of electrolyte solutions and that resulting free energies of interaction compare
favorably with experimental results. We go on to show that this model can be fully incorporated
into an atomistic MC simulation.
Chapter 4 introduces a novel numerical approach to calculating the counterion mediated
free energy based on our RNA-ion model. This chapter introduces several algorithmic
approaches, including uniform-cost search, which are utilized as a means of speeding up the free
energy calculation developed in Chapter 3. The methods developed in this chapter are then fully
incorporated into an implicit-solvent simulation without creating a computational bottleneck, and
we demonstrate the effects of including divalent ions in simulation by simulating true folding of
large RNA structures due to the attractive interaction arising from the divalent ions.
In chapter 5 we develop a full two-state coarse-grained molecular dynamics model of
RNA based around the divalent-ion techniques developed in Chapter 4. We benchmark our
CGMD model against experimental results by reporting the scaling and persistence length of
single-stranded RNA. Microsecond simulations are also run that demonstrate the ability of our
CGMD simulation to accurately reproduce important RNA tertiary structures, and presence of
divalent ions are shown to induce compact tertiary structures.
Finally, Chapter 6 concludes the dissertation with a discussion of where we have left off,
future direct applications of what we have presented herein, and other future directions that can
be investigated in the theory of RNA and electrolytes.
15
Chapter 2:
Background
An electrolyte is a substance that ionizes in an aqueous solution. As a common example,
table salt, NaCl, is a stable neutral compound, but ionizes into Na
+
and Cl
–
ions when dissolved
in water. By extension, a polyelectrolyte is a polymer with monomer groups that ionize when
put into solution. Nucleic acids are examples of polyelectrolytes, as each phosphate group is
ionized in solution to carry a negative charge. As mentioned in the introduction, an RNA with
nucleotides has a net charge of – . This charge naturally plays an important role in the
interaction of RNA with itself and the cellular environment and helps determine how a stable,
compact molecular structure may form.
2.1 Debye-Hückel Theory
The theoretical study of electrolytes and polyelectrolytes dates back to the 1920s, when
Peter Debye and Erich Hückel developed the mean-field, Poisson-Boltzmann theory of
electrolyte solutions [66]. Modern theories are almost entirely underpinned by Poisson-
Boltzmann and its linearized form, known as Debye-Hückel theory. As such, we re-derive the
theory here for the convenience of the reader.
Consider an ionic solution with two ion species of charge
and
. Each individual ion
is treated as a hard sphere of radius containing a centralized point charge of magnitude
defined by its species. Interactions between two ions and are governed by Coulomb’s Law:
where
is the charge on ion , is the proton charge,
is the permittivity of free space,
is
the dielectric constant of the solution, and is the distance between the two charges. The
(2.1)
16
minimum distance between any two ions is due to steric interaction, and so the ion atmosphere
surrounding any given ion starts at a distance from that ion as depicted in Fig. 2.1.
Figure 2.1 Illustration of the basis for Poisson-Boltzmann theory. Each ion is treated as a hard sphere of radius
such that the minimum distance between central point charges is . We treat one ion as the ‘central’ ion during the
derivation, but the results can naturally be applied to any ion in the electrolyte solution.
The total potential at a distance from a central ion is defined as:
where
is the potential due to the central ion and
is the potential due to
the ionic atmosphere.
itself could be calculated via Poisson’s equation:
where
is the charge density of the ionic solution excluding the central ion, and
is the
absolute permittivity. However, because Poisson’s equation is valid onl y for a well-defined
charge distribution, it does not immediately apply to an electrolyte solution, which is a system of
discrete charges in constant movement. To rectify this, we take a statistical time-averaging over
the ion atmosphere so that it may be treated as if it were continuous.
Using the Maxwell-Boltzmann distribution, we can find the average number of ions,
,
for each ion species in solution at a distance from the central ion:
(2.2)
(2.3)
17
where
is the total number of ions of species ,
is Boltzmann’s constant, and is the
temperature. A sum can be performed over all ion species to find a general equation for a time-
averaged ion distribution at distances :
Therefore, by taking
∑
(
)
, and using it as the charge distribution in Eq. 2.3, we can
write down the Poisson-Boltzmann equation:
which gives
as a function of a continuum charge distribution. It is very important to note here
that we have made two assumptions. First, we have assumed that the ionic atmosphere obeys a
Maxwell-Boltzmann spherically-symmetric distribution. This assumption neglects the fact that
ions can be very heavily charged, and may form pairs or other non-symmetric structures.
Second, we assume that
in the Maxwell-Boltzmann distribution is equivalent to
in
Poisson’s equation. In effect, we are taking the continuous, statistically averaged charge
distribution from Maxwell-Boltzmann and saying that it is equivalent to the true, discrete
distribution of ions. Therefore we say this is a “ mean-field” theory, in which the major
approximation is the averaging out of a discrete system in order to treat it as a continuum.
For small
the exponential in Eq. 2.6 may be expanded about
, which gives the
linearized solution:
(
) (2.4)
∑
(
) ∑
(
)
(2.5)
∑
(
)
(2.6)
∑
(
) (2.7)
18
Eq. 2.7 is called the Debye-Hückel equation, and is accurate to (
) because the system is
electrically neutral, and the sum over even powers is zero. The Poisson-Boltzmann equation
given in Eq. 2.5 cannot be solved analytically, but Eq. 2.7 can, and has the solution:
where
has units of inverse length, and
is called the Debye Length. This value represents the
electrostatic shielding effect due to the ionic solution, and is visibile in the exponential decay
term in Eq. 2.8.
The Debye-Hückel model has proven to work very well for mild salt solutions at low
concentrations, in which case it accurately predicts the activity coefficient of the ionic
solution [67,68]. Extensions of the model that incorporate larger charged structures have been
solved as well, including a cell model of the charged rod by Fuoss & Alfrey [69], and the single
charged rod by Kotin & Nagasawa, and separately by Stigter [70,71]. The extensibility of the
model for large charged structures suggests that it could also apply to polyelectrolyte solutions.
In fact, Debye-Hückel has seen a tremendous amount of use in the theoretical study of electrolyte
and polyelectrolyte solutions since its development. In particular, most simulations designed for
study of polyelectrolytes have implemented Debye- Hückel electrostatics [72–76].
Despite the successes of Debye-Hückel, there remain many shortcomings of the model
with respect to polyelectrolytes. First and foremost, Poisson-Boltzmann is a mean-field theory
that assumes that the ions in solution obey a statistical averaging. For multivalent ions, it
becomes more probable that the ions will pair with each other and thus result in large statistical
correlations the larger their individual charges. Bjerrum attempted to deal with this problem in
(2.8)
√
(∑
)
(2.9)
19
his development of the theory of ion association [77,78]. In his modification of the Debye-
Hückel model, Bjerrum treated ion pairs separately, and their relationship with the ion
atmosphere is ignored. The Bjerrum model is therefore only applicable when the ion
concentration is very low, and the potential arising from the ion atmosphere is negligible.
In the case of polyelectrolytes, the charges along the polymer are necessarily highly
correlated with each other. Furthermore, biopolymers must exist and operate in vivo, in which
salt concentrations are fairly high, and there are many different species of ions. Although the
Debye- Hückel model works well for experiments that can be strictly controlled, with
biomolecules we must work within the constraints provided by life. Thus, when considering
multivalent ions at physiological concentrations, Debye- Hückel does not provide a good
solution due to the powerful correlations between ions, and Bjerrum’s theory of ion association
because the salt concentrations are too high [79].
There are many additional problems that arise in the study of the theory of electrolyte
solutions that are addressed by neither Debye- Hückel nor the models presented in this thesis.
These issues include polarizability, solvation, and chelation of ions – very real physical
occurrences that certainly impact the interactions of electrolytes. Addressing these concerns is
the subject of much ongoing work [80–82], but even this short list of complications goes to show
that the problem of understanding electrolytes from a theoretical perspective has a tremendous
depth and that even the most effective models to this point cannot and do not account for
everything.
2.2 Counterion Condensation Theory
As mentioned in Section 2.1, one of the major problems with extending the Debye-
Hückel model to polyelectrolytes is that it inherently does not account for highly concentrated or
correlated ions that necessarily arise when considering a charged chain like a polyelectrolyte.
While it is true that Debye-Hückel can theoretically account for an ionic solution of any valence
(i.e. a polyelectrolyte with 100 charged sites can be considered an ion of valence 100), the
20
concentrations of polyions for which Debye-Hückel is valid are far too small to be meaningful.
In response to these issues with Debye-Hückel, which was the prevailing theory at the time,
Gerald Manning introduced his theory of counterion condensation in 1969 [83,84].
Manning lays the foundation to his approach to polyelectrolytes with several
assumptions. The first assumption is that the screening effect of the electrolyte solution seen in
Eq. 2.8 is strong enough to consider different sections of polyelectrolyte independently. That is,
because one section of polyelectrolyte can only effectively interact with other electrostatic
particles in its immediate vicinity, then all sections of polyelectrolyte are equivalent.
Furthermore, Manning assumes that the polyelectrolyte can be assumed to be fully extended
because of this fact, although this assumption can only truly hold for a polymer with a very long
persistence length. That is, a polymer that has a very slowly decaying angle-correlation function
between consecutive monomers.
Figure 2.2 Illustration of the model for counterion condensation theory. The polyelectrolyte is treated as a
continuous linear charge in space, . Given a critical distance,
, any ions of distance
away from the
polyelectrolyte are counted as counterions and will neutralize the total charge within the cylinder about to a
certain degree. Ions of distance
away from the polyelectrolyte are shown here as fuzzy ions, are treated
with a Debye-Hückel mean field theory, and produce an overall screening effect on the Coulomb interaction.
If the polymer is fully extended with length and number of charge sites , then it is safe
to treat the discretely charged monomer units as one continuous line charge with charge density
, where
is the valence on each monomer unit, is the electron charge magnitude,
21
and is the distance between discrete charge sites. The standard unscreened Coulomb
interaction between an ion in solution with valence
a distance from the line charge is:
Consider that there is a critical value for ,
, where any ions distance
are not
affected by Debye screening as depicted in Fig. 2.2. All other ions with
contribute a
finite factor, (
) to the phase integral, then the contribution from the region
is given
by:
where
is the most relevant variable within the theory. Noting that the Bjerrum length,
, is the characteristic length at which the Coulomb energy equals the thermal energy of the
system, we can rewrite
.
Manning asserts that the ion atmosphere in close proximity to the polyion is unstable for
values |
|
which is equivalent for 1-1 monovalent salt solutions. The physical
interpretation of this instability is that counterions – that is, ions with opposite valence of the
polyelectrolyte – will ‘condense’ onto the polyelectrolyte until . This will neutralize a
fraction of the polyion equal to (
)
. The ions unaffected by this condensation effect
are treated using a Debye-Hückel approximation, as the instability has been resolved and those
ions are stable in solution. The effective concentration of counterions in solution that contribute
to Debye-Hückel screening is:
( )
(2.10)
(
) (
)∫ (
( ))
(2.11)
(
)∫
(
)
(2.12)
(2.13)
22
Using these primary assumptions, Manning goes on to derive equations for the activity
coefficient of polyelectrolyte solutions and shows that it matches fairly well with some
experimental results. However, more recent evaluations of CC theory reveal that it is not as
powerful of a model when directly compared to full Poisson-Boltzmann theory [85].
As nucleic acids are such important polyelectrolytes, Manning and others did a great deal
of work attempting to apply counterion condensation theory to DNA and RNA [86–89].
However, his primary assumption of treating the polyion as a line charge necessitates that all
DNA and RNA molecules must also be considered as line charge. In the case of a single-
stranded DNA, , and so for ssDNA, the fraction of charge neutralized according to CC
theory is with monovalent salt. However, in a double stranded DNA helix there is a
between base pairs and two charges per base pair, suggesting a value and a neutralized
fraction of for monovalent salt. Going one step further, DNA is known to form a ‘30nm
chromatin filament’ within the chromosome, as a super -compact structure of DNA with
monomers per which would suggest a neutralized fraction of . Thus very different
values are given for what is essentially the same molecule.
The reason these disparate values arise and what is ultimately never accounted for by
Manning is that the structure of these polyelectrolytes cannot always be reduced to a 1-
dimensional single line charge. This is especially true for short polymers with a highly compact
structure, as is the case with many RNA molecules. The assumption that one section of
polyelectrolyte will not interact with another section must immediately be discarded, as a small
compact molecule necessarily interacts a great deal with itself. Furthermore, the known high
degree of flexibility in single-stranded nucleic acids indicates that we cannot assume that the
polymer is fully extended, and instead we must acknowledge the polymer in the conformations it
is capable of forming. Considering these limits of CC theory, it is not truly applicable to modern
simulations of nucleic acids. However, as an analytical approximation, it does provide an
(
)
(2.14)
23
important framework that we build on in Chapter 3, in which we use some of the main concepts
of CC theory, but are able to account for any polyion structure thanks to more advanced
numerical techniques that Manning did not have access to during his development of CC theory.
2.3 Recent Developments
Deeper interest in simulation as a tool for studying RNAs combined with advances in
computational methodologies have led to new theoretical approaches to understanding the
interactions of polyelectrolytes with electrolyte solutions. New approaches typically rely on
numerical analysis of these systems rather than aiming to model these systems in a completely
closed-form way.
Linearization of the Poisson-Boltzmann equation defined in Eq. 2.7 is allowed on the
assumption that the potential arising from solution,
, is small. However, both large electrolyte
concentration and high-valence salts (or polyelectrolytes) lead to a larger value of
. Because
of this, one popular approach to studying electrolytes is to skip the linearization and instead solve
the entire Non-Linear Poisson-Boltzmann (NLPB) equation given in Eq. 2.6 for the case of high
salt concentrations or multivalent ions [90–92]. Many non-linear differential equation solvers
have been created specifically for use with biomolecular systems. [93,94] However, it has been
shown that simple application of NLPB does not completely account for the true effects of
multivalent ions like Mg
2+
[95]. Modifications to NLPB equation have been made in an attempt
to rectify these issues and are the focus of much ongoing study. Modifications that account for
steric effects [96–98] and solvent [99] have been shown to correct some issues with NLPB, but
have not been rigorously tested over a wide range of nucleic acid sizes and structures.
Another common approach has been similar to the CC approach of Manning, wherein a
small subset of counterions is treated separately from the bulk of the electrolyte solution. In
these methods, the explicit counterions that are closely associated with the polyelectrolyte are
equilibrated using Monte Carlo [99–102]. The capacity for computers to analyze a small set of
ions explicitly in this way has allowed researchers to better understand how these ions interact
24
with the charged polyelectrolyte. The model we develop in the Chapter 3 is similar to these
methods, in that we will treat some ions explicitly. However, unlike these methods, our model
accepts any arbitrary nucleic acid conformation which allows for full incorporation into any
dynamic simulation.
25
Chapter 3:
Tight-Binding Theory of the RNA-
Counterion Interaction
Reprinted with permission from Mak, C.H. & Henke, P.S., J. Chem. Theory Comput.2013, 9,
621−639. Copyright 2012 American Chemical Society.
3.1 Abstract
We present an implicit ion model for the calculation of the electrostatic free energies of RNA
conformations in the presence of divalent counterions such as Mg
2+
. The model was applied to
the native and several non-native structures of the hammerhead ribozyme and the group I intron
in Tetrahymena to study the stability of candidate unfolding intermediates. Based on a rigorous
statistical mechanical treatment of the counterions that are closely associated with the RNA
while handling the rest of the ions in the solution via a mean field theory in the Grand Canonical
ensemble, the implicit ion model accurately reproduces the ordering of their free energies,
correctly identifying the native fold as the most stable structure out of the other alternatives. For
RNA concentrations in the range below , divalent concentrations of or above,
and over a wide range of solvent dielectric constants, the equilibrium number of divalent ions
associated with the RNA remains close to what is needed to exactly neutralize the phosphate
negative charges, but the stability of compact RNA folds can be reversed when the divalent ion
concentration is lower than , causing the number of associated ions to underneutralize
the RNA. In addition to calculating counterion-mediated free energies, the model is also able to
identify potential high-affinity electronegative ion binding pockets on the RNA. The model can
be easily integrated into an all-atom Monte Carlo RNA simulation as an implicit counterion
model.
26
3.2 Introduction
Experimentally, it is well-established that the stability of RNA folds is strongly influenced by the
concentration of divalent ions such as Mg
2+
[90,103,104].
Theoretically, the understanding of
how
counterions mediate the effective attractive interactions that are
needed to stabilize the
structure of a RNA is incomplete.
Although there are unique cases where the binding of divalent
ions such as Mg
2+
to specific sites on an RNA is important for
establishing key tertiary
interaction(s) [105–107], the divalent ion−RNA association seems to be nonspecific [108–110]
for many others. From a practical point of view, the lack of a reliable theoretical model for
RNA−ion interactions has hampered the computational studies of RNA folding/unfolding,
because RNA fold stabilities are directed by the counterion-mediate electrostatic free energies.
Because of the large Coulombic forces involved, RNA simulations including explicit divalent
counterions are difficult to equilibrate. Consequently, there is a need for an implicit ion model,
which is simple enough to be computed on the fly and yet realistic enough to reflect the stability
of RNA folds accurately. This paper reports an effort along this direction.
There have been many theoretical attempts aimed at understanding the nature of the ion-
mediated free energies experienced by nucleic acids in the presence of divalent counterions.
Some of the earliest attempts were directed at explaining the stabilities of DNA
bundles [101,111–115], where the thermodynamic forces that mediate the effective attractions
between rigid DNA molecules are thought to arise from correlated counterion fluctuations on
different duplexes. There were also simulation evidence that flexible polyelectrolytes also show
both interchain and intrachain attractions [116,117] in the presence of counterions. However, the
existence and the precise nature of a possibly attractive effective intrachain interaction that could
stabilize compact chain structures are still not completely clear [118–120]. However, it is
increasingly clear that, for RNAs and multivalent counterions, the treatment of the electrostatic
free energy by mean-field theories such as Poisson−Boltzmann is insufficient to correctly capture
the counterion-induced effects [104,110,121]. Because of the strong electrostatic forces
27
involved, the ions close to the RNA are often highly correlated, and a mean-field theory fails to
describe the essential qualitative effects. This problem is exacerbated especially by higher-
valence counterions.
In this paper, we develop a statistical mechanical model for RNA−ion interactions, using
a consistent strategy to treat the strongly correlated counterions exactly, while employing a
mean-field treatment for the ions in the diffuse charge cloud around the RNA. This allows us to
both describe the equilibrium association of the counterions onto the RNA as well as compute
the ion-mediated electrostatic free energy of any RNA conformation. A numerically feasible
method has been devised to solve the statistical mechanics and calculate the free-energy
difference between different RNA conformations so that the model may be integrated into an all-
atom RNA simulation as an implicit ion model. In this paper, we have adopted a particularly
simple model to describe the ion interaction sites on the RNA similar to what is often used in
coarse-grained or tight-binding models for electrostatics [122–124], but our model can easily be
extended to include more atomic details on the RNA. Section 3.3 will describe our RNA−ion
model, as well as the mean-field treatment of the unassociated ions and the Monte Carlo
treatment of the associated counterions. Section 3.4 will discuss the equilibrium between
associated and unassociated ions. Section 3.5 shows how the free energy difference between two
RNA conformations may be computed, and Section 3.6 applies the formulation to the native and
non-native conformations of two example systems: the Schistosoma hammerhead ribozyme and
the group I intron in Tetrahymena.
3.3 The Ion RNA Model
At a sufficiently low RNA concentration, we consider placing one RNA molecule in a box of
volume to match its desired molarity. The rest of the box isfilled with a solvent with a
dielectric constant ε, and dispersed throughout it are the positive and negative ions. The box is
coupled to a infinitely large ion reservoir with positive and negative ion concentrations
and
, respectively.
28
The Hamiltonian of the system is
where the Hamiltonian of the RNA is
where
is the momentum of the th RNA atom,
is its mass and the potential
is a
function of the coordinates {
}, the Hamiltonian for the solution containing the position and
negative ions is
where
is the momentum of the th ion in the solution,
is its mass, and the potential
is a
function of the ion coordinates {
}.
is the potential between the RNA and the ions. Since
we are interested in studying electrostatic effects involving the RNA and the ions, we will
separate the potential of the RNA into an electrostatic part
and a nonelectrostatic part
,
such that
.
It will be convenient to work in a mixed ensemble where the RNA is treated canonically
with a fixed number of molecules (i.e., one) within the box while the solution is treated in a
Grand Canonical ensemble with a fluctuating number of ions at fixed chemical potentials
and
. The partition function of this mixed ensemble is defined as
where
denotes a classical canonical trace over the RNA coordinates {
},
is a Grand
Canonical trace over the ion coordinates {
}, as well as over all possible positive and negative
(3.1)
∑
(
)
(3.2)
∑
|
|
(
)
(3.3)
[ (
)] (3.4)
29
ion numbers
and
,
and
is the Boltzmann constant. The integral over the
momenta in
is separable from the configurational integral over positions, which can be
written as
where the ion-mediated free energy is given by
For every conformation of the RNA, {
}, our objective is to evaluate arising from a
Grand Canonical ensemble average over the ions.
Figure 3.1 Schematic of the model. Cation interaction sites on the RNA are shown as open circles. The positive ions
of charge
, shown as closed circles, are either associated with one of the interaction sites or in the diffuse charge
close in the solution. Negative co-ions of charge
are indicated by minus symbols. The total RNA plus solution
has a volume . The system is coupled to an ion reservoir with positive and negative ion chemical potentials
and
.
To be able to account accurately for the ions that are strongly correlated with the RNA,
we must treat them distinctly from those in the diffuse charge cloud around the molecule. We
(3.5)
(3.6)
30
imagine that there are certain places on the RNA where which positive counterions would be
preferably drawn. Fig. 3.1 depicts these possible ion interaction sites on the RNA schematically
by the large open circles superposed on an image of a RNA chain conformation. We will call the
positions of these ion interaction sites {
}. Clearly, the phosphate groups should have some of
the strongest affinities for high-valency counterions, such as Mg
2+
. Therefore, a minimal model
should incorporate at least as many ion interaction sites as there are phosphate groups. Ions that
are associated to these sites interact strongly with the RNA and also with each other, because
they are all close to the RNA backbone, and the correlations among them must be treated
accurately. We can accommodate any number of additional interaction sites as needed. In the
extreme case, we can tile the space around the RNA completely with interaction sites, turning the
entire system into a lattice model for the ions. However, this would hardly be beneficial or
necessary, since only the ions that are most strongly correlated with the RNA need to be treated
explicitly this way. In Fig. 3.1, the positive ions are represented by small filled circles. Some of
them are shown as being associated to the ion interaction sites on the RNA, while the rest are as
in the solution. Negative ions, depicted by minus symbols in Fig. 3.1, are also present in the
solution. Since the bare RNA is strongly negatively charged, we will consider only interaction
sites for the positive ions on the RNA for simplicity, but the model can easily be extended to
include interaction sites for negative ions also, if needed.
While this model is quite general, the focus of this paper is on non-site-specific ion−RNA
interactions. In many RNAs, there are known specific ion interaction sites which are either
important to the stability of the tertiary structures or to their chemistry. For example, in the
hammerhead ribozyme, a magnesium binding site close to the catalytic core has been found to be
important for its cleavage activity [47,125–128]. Similarly, in the P4−P6 do main of the
Tetrahymena group I intron, the tertiary structure seems to nucleate around a Mg ion core aided
by a tetraloop−receptor interaction [129–134]. In many of these cases, the key Mg ions involved
are either chelated to specific atoms in the RNA or they are bound to specific sites on the RNA
through hydration waters. While these site-specific ion−RNA interactions are critical for certain
31
RNAs, there is another, more-generic class of ion−RNA interactions that do not appear to be
site- or RNA-specific. The general observation that the structures of almost all RNAs can be
stabilized by a divalent ion concentration in the millimolar range suggests that non-site specific
ion−RNA interactions is an important contributor to RNA fold stability, and these nonspecific
interactions are believed to arise from correlated fluctuations in those counterions that are
strongly associated with the RNA. In addition to these, the distributed counterion background in
the solution can also interact with the RNA in the form of a diffuse charge cloud that is only
loosely associated with the RNA. The formulation in this paper is aimed at modeling nonspecific
ion−RNA interactions arising from both the correlated and diffuse counterions, but not the ones
associated with specific sites, such as chelated ions, and those bound to the RNA through their
solvation waters. However, we will also show in the examples below that certain gross features
of site-specific ion−RNA interactions can be extracted from our calculations, even though the
model is not specifically intended to address them. In our model, the ions in the diffuse charge
cloud around the RNA will be handled effectively by a mean field approximation. This
separation is justified when the number of ions associated to the interaction sites on the RNA is
roughly sufficient to neutralize the negative charges on the phosphate backbone. In this case, the
RNA plus its associated ions presents itself to the diffuse charge cloud as an overall
approximately neutral object, and the resulting charge density fluctuations in the solution are
expected to be weak. To segregate out these ions, we consider the dissociated ions in the solution
as a separate subsystem from the associated ones. Notice that even though we have defined
specific interaction sites on the RNA, our model does not necessitate any ion to actually be
bound to the RNA. Since the two subsystems are both in equilibrium with the ion reservoir, the
equilibrium number of associated ions will be controlled by the thermodynamic conditions as
well as the system’s physical parameters. Therefore, we can use our model to investigate shifts in
the equilibrium distribution between the associated and unassociated ions as external conditions
change.
32
Our model is aimed at understanding the electrostatic free energy resulting from the
association of divalent counterions to RNAs. Physiologically, the most relevant +2 ion is
probably Mg
2+
, while experimentally, ions such as Ca
2+
, Sr
2+
, Ba
2+
, Mn
2+
, and Cd
2+
have shown
non-universal effects. Our model is not sensitive to the chemical differences among these +2
ions, and it does not directly address specific ion-binding properties.
Although parameters in the model, such as the size of the ion and its differential
interaction strengths with different atomic sites on the RNA, can be calibrated to partially
account for ions-specific binding effects, this will be left for future study. Since there is no
explicit water in our model, it also will not address specific effects related to hydration, except
through a dielectric continuum.
With the two ion subsystems defined, we can now divide
in Eq. 3.6 further.
Based on the current RNA conformation {
}, we first define the positions of the ion interaction
sites {
}. To each possible ion interaction site , we assign an occupation number:
When an associated ion leaves the solution and enters an ion interaction site, it is confined to a
smaller free volume, compared to what is available to it in the solution. To properly account for
this, we add a free-energy contribution each time an ion associates with the RNA of the form
where
(
) ,
is the effective free volume of ion interaction site ,
(
)
is the thermal wavelength of a positive ion and
is its mass. Furthermore,
when a positive ion of charge
is associated with interaction site , it interacts with the partial
charge
on each RNA atom , as well as the associated ions on the other ion interaction sites.
These interactions are given by
{
(3.7)
∑
(3.8)
33
where
is the Bjerrum length, is the proton charge,
is the vacuum
permittivity, and is the dielectric of the solvent. Because each RNA atom has n impenetrable
core, the unassociated ions cannot approach it closer than some critical distance
and the
Hamiltonian must contain a hardcore interaction between every ion at position
and every
RNA atom in the form
where is the Heaviside function:
and we let
. The charges on the RNA atoms and the associated ions around it present an
electric potential to the unassociated ions (both positive and negative) in the solution in the form
of
where the first term arises from the RNA atoms and the second term arises from the associated
ions. In Eq. 3.12, we have included a step function that turns off the Coulombic interaction
between a RNA atom and an unassociated ion when their distance is closer than the atom’s hard
core radius. (Because of the presence of the hard-core interactions in
, this cutoff does not
physically alter anything. However, in the mean field treatment of the unassociated ions below,
the introduction of this cutoff will simplify the mathematics.) The interactions of the
unassociated ions with themselves and with the RNA and its associated ions are included in the
Hamiltonian:
∑
|
|
∑
∑
(3.9)
∑∑
(
)
(3.10)
( ) {
(3.11)
( ) ∑
(
) ∑
(3.12)
34
where
is the position of the th unassociated ion and
is its charge (either
for a positive
ion or
for a negative ion).
The ion-mediated RNA free energy in Eq. 3.6 can now be rewritten as
where
denotes a trace over the occupation variables and {
},
∑
is the total
number of associated (positive) ions. The effective free energy for the occupation
configuration {
} induced by the unassociated ions is given by
3.3.1. Mean Field Treatment of the Unassociated Ions
In this section, we will address the calculation of . Using mean field theory, we will
perform the trace over the unassociated ions, and at the end the mean field results, it will
resemble a Poisson−Boltzmann theory or, when linearized, a Debye−H ckel treatment, of the
unassociated ions. However, as pointed out by Zoetekouw and van Roij [135,136], a consistent
Grand Canonical treatment will also produce additional volume-dependent free energy terms that
are absent in the conventional canonical treatment. With very minor differences, our Grand
Canonical mean field theory is similar to that given in their work [135].
We want to compute the grand ensemble free energy in Eq. 3.15 given a certain RNA
conformation {
} and associated ion configuration {
}. To formulate a mean field theory for
the
unassociated ions in the grand ensemble, we start with the classical density functional
theory expression for the free energy [137]:
∑
∑
|
|
∑ (
)
(3.13)
[ (
) ] (3.14)
[ ({
} {
})]
[ (
)] (3.15)
35
where
are the positive and negative ion densities in the solution, ( ) is the external potential
and is a functional of the ion densities not dependent on . The “external”potential , given
by Eq. 3.12, is imposed on the unassociated ions by the RNA and its associated ions. According
to density functional theory (DFT), the equilibrium densities
are those that minimize the
functional :
To proceed, we apply the standard mean field treatment [138] to derive an approximate
expression for :
where the charge density ( )
( )
( ) and
is the free energy of an ideal
solution of point ions,
and
, the concentrations of the ions in the ion reservoir are related to their chemical potentials
by
(
)
(
). In Eqs. 18 and 19, the greater than (“> ”) signs on the integrals
indicate that we have restricted the integrations to space outside the RNA atom cores. We can
easily minimize Eq. 3.18, with respect to variations in
. If we define an effective potential
as
∫
[
( )
( )] ( ) [
] (3.16)
( )
( )
(3.17)
∫
( ) ( )
∫
( ) (
)
[
]
[
] (3.18)
[
] ∫
( )(
( )
)
(3.19)
( )
∫
( )
( ) (3.20)
36
and the densities
( )
[ ( )], minimizing will lead to a Poisson−Boltzmann
equation for the charge density
outside the atom cores. The
Poisson−Boltzmann equation can be solved numerically [92,139,140], but since we need to solve
it not only for every RNA conformation but also for every associated ion configuration {
}, we
will linearize the Poisson−Boltzmann equation to simplify it further. This leads to a
Debye−H ckel theory for the unassociated ions, but as we have previously mentioned, other
terms proportional to the volume, which are absent in the conventional Debye−H ckel theory,
will also emerge in the Grand Canonical treatment below.
Expanding Eq. 3.19 to second order in
gives
where
is the free volume of the solution (the total volume minus the hard-core volumes of the
RNA) and ( )
( ) ̅
is the deviation of the ion densities from their means defined
by
̅
∫
( ) . Minimizing Eq. 3.18 with Eq. 3.21 yields two self-consistent
equations:
In the above, the mean densities ̅
inside the box are not necessarily equal to the limiting ion
concentrations in the ion reservoir. The relationship between them is determined by integrating
Eq. 3.22 over the entire box, giving
and
̅
is the integral of over the entire box. The parameter
̅
, in turn, is related to the physical
parameters of the system via the neutrality of the system inside the box:
[
]
̅
(
)
̅
∫
( )
̅
∫
[
( )]
(3.21)
̅
( )
̅
( ) (3.22)
̅
(
̅
) (3.23)
37
where the right-hand side is the total charge of the RNA plus its associated ions, with
being
the total charge of the RNA (
∑
) . For every configuration of the associated ions,
can thus be determined before
( ) is actually computed. However, before actually solving for
, Eq. 3.22 can also be substituted back into Eq. 3.21 to obtain a simpler expression for . It is
straightforward to show that
where ̅
̅
̅
is the average of the total charge density. The second term gives rise to
one of the free-energy contributions proportional to the volume. To solve for
, we employ two
linear combinations:
and
̅
̅
, and use Eq. 3.23 to
rewrite Eq. 3.22 as
which applies to the space outside the RNA atom cores and
[
̅
̅
] is the ionic
strength of the solution. The second part of Eq. 3.26 is trivial, while the first part, being a linear
integral equation in , can be solved using Fourier transforms. However, since the integral for
in Eq. 3.20 is restricted to space outside of the atom cores while the Fourier transform is
performed over all space, we will add a core potential term in the form of
∫
( )∑
(
)
to Eq. 3.18 to expel the unassociated ion away from the
RNA atom cores and extend the spatial integration to cover all space. Physically, adding this
extra potential should not alter anything since there is already a hard wall potential in
for all
unassociated ions. Reminimizing with this additional term leads to
[
(
̅
)
(
̅
)]
∑
(3.24)
∫
( ) ( ) ( ̅
̅
̅
̅
)
(3.25)
( ) ̅ [ ( )
̅
] ( )
(3.26)
38
which can now be solved by Fourier transform, yielding
where ̂( ) ∫
( ) and (
)
is the reciprocal of the Debye screening
length. While thefirst term in ̂ does not contribute to ( ) , it will contribute to the free energy
and gives rise to one of the volume-dependent terms. The second term arises from the
interactions of the charge density with the associated ions, and the third from interactions with
the RNA atoms. Finally, setting each
(
) and inverse transforming
̂( ) , we find that the charge density in the solution is a sum of one-particle densities, each one
centered around a charge center, which is either a RNA atom having partial charge
and radius
or an associated ion with charge
and radius :
where
( ) ̅ [ ( )
̅
] ∑
(
)
(3.27)
̂( )
(
̅
̅ )
( )
∑
(
) (
)
∑ (
)[
(
)
(
) (
)]
(3.28)
( ) ∑
( )
(
)
∑
( )
(
)
(3.29)
( )
( ) {
(
( )
)
( )
(3.30)
39
Notice that, although the individual
( )
vanishes inside the RNA atom cores, their sum in Eq.
3.29 does leak into the cores. Substitution of Eq. 3.29 back into Eq. 3.25 and properly
accounting for the term in the ̂( ) yields thefinal expression for the free energy:
where the effective interaction induced by the unassociated ions between two charge centers with
hard-core radii
and
is
Notice that is asymmetric with respect to
and
,an artifact of the meanfield approximation.
Also, whenever
, the screening length (Eq. 3.32) reverts to the standard
Debye−Hückel result [66]. The first three terms in correspond to the interactions of the RNA
atoms with themselves, of the associated ions with themselves, and between the RNA atoms and
the associated ions, respectively. The unscreened anti-Coulombic terms in (i.e., those
proportional to the reciprocal distances but with minus signs) will cancel the Coulombic
interactions in the original RNA electrostatic potential
, as well as in the Hamiltonian
and
replace them with the screened interactionsugiven by Eq. 3.32.
∑
[ (|
|
)
|
|
]
∑
[ (|
| )
|
|
]
∑
[ (
)
]
(
̅
̅ )
(
∑
)
( ̅
̅
̅
̅
)
(3.31)
(
) {
[ (
)][ (
)]
(
)
(3.32)
40
3.3.2. Monte Carlo Treatment of the Associated Ions.
While the calculation of in Eq. 3.15 has been performed using a mean field approximation, the
calculation of in Eq. 3.14 will require an exact treatment because of the strong correlations
involved. We will use a Monte Carlo simulation to explicitly carry out the ensemble average
over {
}. In this section, we will only address the calculation of Grand Canonical ensemble
averages, leaving the actual calculation of the free energy to Section 3.4.
Using the expressions for
,
, and above, the trace in Eq. 3.14 can be rewritten as
where the effective Hamiltonian for the occupation numbers
takes the general form
In Eq. 3.34, the two-body terms
come solely from the screened interactions in of the
associated ions with themselves
since the anti-Coulombic terms in exactly cancel their Coulombic counterparts in
. The one-
body terms
in Eq. 3.34 arise from the interactions of the associated ions with the RNA, after
the anti-Coulombic terms in cancel the Coulombic ones in
, together with the entropic
factors in
, giving
Finally,
results from the RNA-RNA interactions as well as the volume terms in :
( )
[ (
)] (3.33)
∑
∑
(3.34)
(|
| ) (3.35)
(|
|
)
(3.36)
41
where the first term represents the screened interactions among the RNA atoms left after the anti-
Coulombic terms in exactly cancel
. Notice that
is not dependent on the individual
occupation numbers but only on their total
∑
.
Eq. 3.34 appears to have a Hamiltonian resembling a lattice-gas model. However, the
parameters
,
, and
are not constants. They all are dependent on the total number of
associated ions (
), which enters
, ̅
, and ̅ in complicated ways. These, in turn, affect the
magnitude of the screened Coulombic interactionsu in Eq. 3.32. Therefore, when the number of
associated ions
changes in the simulation, all of the parameters
,
, and
in
will have
to be re-evaluated. Even though it is somewhat cumbersome, this task is numerically quite
manageable as long as the number of ion interaction sites on the RNA is not too large.
Any Grand Canonical ensemble average can now be calculated based on the trace in Eq.
3.34. For example, the ensemble average of the total occupation number is
The averaging of the type in Eq. 3.38 can be carried out using a standard Metropolis
algorithm [141]. For the results presented in Section 3.4, a combination of two types of Monte
Carlo (MC) moves were typically used. The first consisted of a single-site flip: If
, a new
configuration where
was considered, and vice versa. The second MC move consisted of a
left or right cyclic permutation of a randomly selected block of sites. The first MC move alters
the total number of associated ions
, while the second conserves it. Starting with a random
∑
(|
|
)
(
̅
̅ )
(
)
( ̅
̅
̅
̅
)
(3.37)
〈
〉
[ (
)
]
[ (
)]
(3.38)
42
occupation number configuration, the system was first equilibrated, and statistics were then
collected using several long MC simulations for each set of conditions.
3.4 Association Equilibrium of Counterions to the RNA
In this section, we will discuss the results of Monte Carlo simulations using the RNA−ion model
described in Section 3.3, with the hammerhead ribozyme as an example. In two previous
studies [56,142], we have investigated the possible conformational changes involved in the
unfolding of the Schistosoma hammerhead ribozyme and have identified several characteristic
gateway states between the native conformation and the unfolded three-way junction structure.
We want to use the RNA−ion model here to evaluate the free -energy differences between these
gateway states and the native structure to better understand their stabilities and what role the ions
play in the folding sequence. In this section, we will first examine the equilibrium distribution of
associated/unassociated ions for different RNA conformations under a variety of physical
conditions. The actual calculation of their free energy differences will be addressed in Section
5.1.
The equilibrium partitioning of the counterions between their associated and unassociated
states is key to the understanding of RNA fold stabilities. From the RNA−ion model above, we
know that if there are no associated ions, the unfolded state of the RNA is highly favored. Even
though the unassociated ions produce a screening of the electrostatic interactions of the RNA
with itself, the overall effects of all the charges on the RNA still lead to a net repulsive self -
interaction. hile the Debye−H ckel scr eening reduces the strength of these interactions, it does
not alter the overall sign of the electrostatic interactions, which remains predominantly repulsive.
Therefore, fold stabilization within our model is impossible without associated ions. On the other
hand, an overloading of associated counterions leads to the opposite result. If the total charge
coming from the associated +2 ions is much larger than what is needed to neutralize the negative
charge on the RNA, the overall net positive charge will give rise to a repulsion that again favors
43
the unfolded structure. So if the ions are to be able to induce a net attractive self-interaction for
the RNA, the number of associated ions must be somewhere between these two limits. But even
that does not guarantee the existence of a net attractive self-interaction, the proof of which must
come from a careful free-energy calculation, which we will present in Section 3.5. The
remainder of this section will focus on the equilibrium partitioning of ions between their
associated and unassociated states.
3.4.1 Placement of Ion Interaction Sites
Given each RNA conformation, we first need to decide where to locate the positive-ion
interaction sites on the RNA. Since the negative partial charges are concentrated on the
phosphate groups along the backbone, the simplest model is to assume that the strongest
interaction sites for the positive ions are located near the phosphate groups. Since there is one
phosphate per nucleotide, in a zero-order model, there should be as many ion interaction sites as
the number of nucleotides in the RNA. For simplicity, we will assume that when a positive ion is
associated with one of these sites, it is, on average, at a distance from the P atom. Furthermore,
we will assume that the negative charge of each phosphate group is concentrated on the P atom,
giving each P atom a complete charge. While this zero-order model seems rather crude, since
we are focusing on nonspecific ion−RNA interactions, we do not expect the general nature of the
RNA−ion interaction to be dependent very specifically on the atomistic details of the RNA
structure, since the ability of multivalent ions to stabilize RNA folds seems to be quite
generic [90]. Therefore, this zero-order model for the ion interaction sites on the RNA is a
reasonable starting point. As we will see below, this simple model already captures many of the
essential qualitative features of the RNA−ion interactions that are needed to p roperly
discriminate the native fold from the unfolded conformations, and it also produces free energies
that are quantitatively quite reasonable. If necessary, we can refine this zero-order model by
introducing additional ion interaction sites, as well as accounting for the partial charges of the
individual RNA atoms to try to produce better agreement with experiments or simulations.
44
Figure 3.2 (Left) The two-dimensional (2D) structure of the Schistosoma hammerhead ribozyme [56,142]. (Right)
Six structures of the hammerhead ribozyme considered in this paper. These structures were identified by a Monte
Carlo simulation using large conformational moves as possible gateway structures to the unfolding of the
hammerhead. (N) is the native tertiary structure. The other five have varying degrees of openness: (E) and (S) are
the most extended, and (Y) and (YB) are the least open.
For each Monte Carlo calculation, we consider a single conformation of the full-length
Schistosoma hammerhead ribozyme [143] at a RNA concentration in the range of to
. The counterions are ions, and we assume the solution also contains added ions
(
) and ions (
) with a
salt concentration ranging between and . The
native conformation of the hammerhead, as well as five alternatives characteristic unfolded
conformations that we have identified in previous large-scale Monte Carlo simulations [56,142],
are shown in Fig. 3.2. In the full-length Schistosoma hammerhead, there are 61 phosphate
groups. For each RNA conformation, we therefore position 61 ion interaction sites along the
backbone, one at each P atom. As in the zero-order model, we place a full charge on each P
atom and assume that the average separation between it and a ion associated with it is ,
which is appropriate for a size similar to Mg
2+
. For the calculation of the entropic factors
, we
assume that each associated ion has a translational free volume corresponding to a -thick
spherical shell around each P atom. In the calculations, we have also examined the effect of the
45
dielectric of the solvent by varying it between 1 and 80. The two-dimensional (2D) structure of
the hammerhead is shown on the left of Fig. 3.2. The hammerhead is composed of two
hybridized strands. Starting from the 5′ end, the longer str and is responsible for the enzymatic
action of the molecule, while the substrate is the shorter strand with the scission site at
(colored green). The three-dimensional (3D) structures of some of the conformations we have
studied by Monte Carlo are also shown in Fig. 3.2. The native structure is labeled “N”. The other
five structures were identified by a Monte Carlo simulation using large conformational
moves [56,142]. These five non-native structures are candidate intermediates along the different
unfolding pathways, and they have various degree of openness, with “Y” being the most compact
structure and “S” being the most extended. For each of them, we placed the ion interaction sites
on the P atoms as described in Section 3.4.1 and performed a series of Monte Carlo studies
according to the method delineated in Section 3.3.2 to calculate the equilibrium number of
associated ions under a variety of different physical conditions. The results are presented in the
next section.
3.4.2. Equilibrium Distribution between Associated and Dissociated Ions.
For each of the six conformations in Fig. 3.2, we have performed Monte Carlo simulations at
different RNA and ions concentrations to study the equilibrium between associated and
unassociated ions. All simulations were carried out for . Fig. 3.3a shows results for
the average number of associated ions,
, for a solution with a RNA concentration in
the native conformation (structure N in Fig. 3.2) with various ion reservoir concentrations of the
salt, plotted as a function of the dielectric constant ( ) of the solvent. The four sets of data
correspond to salt concentration ranging from to . In this and all other plots in
this paper, the error bars are smaller than the size of the plotting symbols, unless otherwise
indicated. The dashed line indicates the number of +2 ions that would exactly neutralize the
RNA backbone, which is 30.5 in this case, since there are 61 phosphate groups on the
hammerhead structure. As expected, when the +2 ion concentration
in the reservoir is weak
46
(
), the equilibrium favors the unassociated ions. In contrast to this, higher +2 ion
concentrations (
) produce the opposite result, favoring associated ions instead.
Figure 3.3 Monte Carlo results for the average number of associated ions
for (a) the native structure N and two
other conformations ((b) YB and (c) S). The concentration of the RNA in all three is , while the MgCl
2
molarity ranges from to . The horizontal dotted line in each graph indicates the number of Mg
2+
needed
to exactly neutralize the RNA (30.5).
is plotted as afunction of the solvent dielectric constant ( ).
47
However, it is also clear that the equilibrium position is sensitive to . Especially for weak +2
ion concentrations, changing ε signific antly shifts the distribution between associated and
unassociated. On the other hand, when the +2 ion concentration is sufficiently high (
and in Fig. 3.3a), the effect of changing is highly suppressed, except at
extremely small values (less than ). Comparing the results in Fig. 3.3a for
and , it appears that when the concentration of +2 ions is in the range of or above,
the equilibrium distribution is firmly in favor of associated ions, to the extent that they
approximately neutralize the RNA backbone charge, i.e.
. Except for high +2 ion
concentrations and very low dielectric solvents,
remains very close to this value, indicating
that the RNA and its associated ions form a roughly neutral entity for
in the range of
or above. Since the physiological concentration of Mg
2+
ions in the cell is estimated to be in the
range of [110] and this level of +2 ion concentration appears to be sufficient to
stabilize most RNA tertiary structures, our Monte Carlo results seem to be quite reasonable.
However, we should also mention that the physiological concentrations of different RNAs are
not as well established; therefore, it is also important to examine how this
associated/unassociated ion equilibrium is affected by the RNA concentration.
To address this question, we have performed additional Monte Carlo simulations at
several different RNA concentrations. The results for two of them are shown in Fig. 3.4. In Fig.
3.4a,
is plotted against εfor the native conformation N at a concentration of , which is
10 times smaller than that shown in Fig. 3.3a. The two sets of results are qualitatively very
similar, except the transition to the associated ionic state now occurs at lower concentrations of
+2 ions. This is consistent with our expectation that a smaller RNA concentration should shift
the equilibrium transition between associated and unassociated ions to lower ionic
concentrations. But comparing the data for
and to those in Fig. 3.3a, we see
that the conclusion that we have reached above – that, at sufficiently high +2 ion concentrations,
the ions are associated to the extent that they would roughly neutralize the RNA backbone –
seems to hold true and remain unchanged for all lower RNA concentrations. Fig. 3.4b shows the
48
same results for a RNA concentration of , which is 10 times larger than that in Fig. 3.3a. In
this case, the equilibrium position shifts in the opposite direction to higher ionic concentrations,
which is consistent with expectations.
Figure 3.4 Plots showing the same Monte Carlo results as those observed in Figure 3 for two other RNA
concentrations, and , for the native fold.
Returning to Fig. 3.3, we examine the Monte Carlo results for other conformations. Two
examples are shown in Figs. 3.3b and 3.3c, at the same RNA concentration of as
observed in Fig. 3.3a. Results in Fig. 3.3b were obtained for the conformation labeled YB in Fig.
3.2, whereas those in Fig. 3.3c were for structure S. Compared to the native structure (N), which
is the most compact, structure S is the most extended of the six conformations that we have
studied and of all structures identified in the original conformational study of Mak et.
al. [56,142] Structure YB is a non-native structure between the two extremes. As data in Fig. 3.3
49
show, except for very minor differences, the equilibrium distribution of associated and
unassociated ions are virtually indistinguishable among the three conformations, even though the
compactness of their structures is vastly different. (The same conclusion could be reached for the
other three conformations in Fig. 3.2 and their Monte Carlo results will not be shown explicitly.)
This suggests that upon the folding or unfolding of the RNA, the equilibrium number of ions
associated to the RNA remains largely unchanged, as long as the concentration of the +2 ion is in
the range of or above.
3.4.3 Higher-Valency Counterions
In the literature, there is substantial agreement that the role that +1 counterions, such as Na
+
and
K
+
, play in stabilizing RNA folds is qualitatively different from that of +2 ions such as
Mg
2+
[110]. Although there are examples where Mg
2+
can act as chelating ions on specific sites
in certain RNAs [107,129], the majority of folding studies on RNAs suggest that Mg
2+
association is nonspecific, although the precise nature of how +2 ions stabilize RNA folds is still
debated. The key difference between +1 and +2 counterions is their entropies. Because it takes
half an many +2 ions to neutralize the same RNA, the association of +2 counterions causes less
entropic penalty, and this leads to a qualitative difference between how +1 and +2 ions interact
with the RNA [101,112]. A second factor is the size of the ion. Mg
2+
, being a smaller ion than
either Na
+
and K
+
, is better able to hold onto its hydration waters stronger than Na
+
or K
+
, even
when associated with an RNA [144]. If this is true, aside from electrostatics, binding Mg
2+
will be more favorable for the RNA than binding Na
+
or K
+
, because the ion solvation shells will
not have to be disturbed when Mg
2+
binds. Notice that the model we have developed in this
paper does not require a specific model for how the +2 ions bind. The only key assumption in
our model is that the statistical mechanics of RNA−ion interactions is controlled by electrostatics
but not by specific interactions. Even if Mg
2+
binds as a hydrated ion, the primary interactions
between the ions and the RNA remain electrostatic in nature, albeit possibly with a different
dielectric constant. Our model can easily be extended to accommodate these necessary
50
modifications. But even if the nonspecific association of Mg
2+
to a RNA can be proven to be
thermodynamically stable, binding itself does not necessarily imply that the ions are able to
induce an effective interaction that could stabilize the tertiary structure of the RNA. This is why
an accurate estimate of the free energies is crucial, and this will be addressed in Section 3.5.
Figure 3.5 Plot showing the same Monte Carlo results as those observed in Fig. 3.3 for the native fold with +3
counterions instead of +2. The number of ions needed to neutralize the RNA is 20.3, which is indicated by the
dotted line.
If the association of +2 counterions to RNAs is nonspecific and they do give rise to
stabilization of RNA folds, then one may speculate that the association of +3 of +4 ions would
lead to even stronger stabilities [110]. Of the two key factors that influence association
mentioned above, the entropic advantage of binding +3 ions versus +2 ions will be weaker than
the difference between +1 ions and +2 ions, and the integrity of the hydration shell will be
clearly dependent on the identity of the ion. To understand how changing to +3 counterions
affect the statistical mechanics of the system, we have carried out Monte Carlo simulations using
a fictitious ion X
3+
, assuming that it has the same size and mass as Mg
2+
. The results are shown
in Fig. 3.5 for a RNA molarity of in the native conformation. Except for the fact that the
number of +3 ions needed to neutralize the RNA backbone is reduced to (indicated by the
dashed line), these results are qualitatively very similar to those in Fig. 3.4a with the same RNA
concentration. Therefore, the nature of the effects behind the association of +3 counterions seems
to be similar to that of the +2 ions. However, the ion reservoir concentrations used to obtained
51
the data in Fig. 3.5 are actually 10 times lower than those used in Fig. 3.4a; therefore, as
expected, the association of +3 ions is thermodynamically more stable than that of +2 ions.
3.5 Free Energies and Ion-Mediated RNA Fold Stabilities
While the Monte Carlo results in the last section can address the association of the ions to the
RNA, the question of whether the associated ions are indeed able to produce the effective
interactions required to stabilize the folded structure of an RNA can only be answered by
calculating the difference in free energies between two RNA conformations. If this is indeed
possible, we can integrate the simple model developed in this paper into a fully atomistic
molecular dynamics or Monte Carlo simulation to calculate the counterion-mediated free energy
without including explicit counterions or solvent molecules, giving the simulation an immense
numerical and speed advantage. In our work, we are particularly interested in building this
effective counterion model into our own Loops MC simulation program [56] to carry out large-
scale RNA folding and unfolding simulations.
Before we go into the details of the free-energy calculations, we return to one of the key
observations made in Section 3.4 from Fig. 3.3. We noticed that the ion-association
characteristics are very similar among all the different RNA conformations. While the Grand
Canonical simulations in Section 3.4 enable us to study ion association, during the course of the
folding or unfolding of an RNA, the number of associated ions remains essentially close to
neutralizing condition. Therefore, once we have determined
using the Grand Canonical
Monte Carlo simulation above for the desired RNA and ion concentrations, we can then switch
to a canonical ensemble for the counterions using afixed
equal to
to study the folding or
the unfolding of the RNA. This is advantageous because, in the effective Hamiltonian described
by Eq. 3.34 for the associated ions, the factors
,
, and hσare
-dependent. In the Grand
Canonical simulations, when the number of associated ions changes, all these factors must be
recomputed. But if
is constant during the folding or the unfolding of the RNA, then the
parameters in Eq. 3.34 are fixed and the corresponding Monte Carlo simulation in the canonical
52
ensemble would be computational much more efficiently. This is particularly important if this
calculation is to be integrated into an atomistic RNA simulation as the solvent model, because
the algorithm must be fast enough to be called every time step.
3.5.1 Monte Carlo Calculation of the Free Energy
In the canonical ensemble for the associated ions where
is fixed, the parameters
,
, and
in the effective Hamiltonian for the associated ions
in Eq. 3.34 are constants for a given
RNA conformation {
}. To emphasize this, we will write them as
( ) ,
( ) , and
( ) ,
where denotes a RNA conformation{
}, so that the effective Hamiltonian for the occupation
numbers {
} is a function of :
To compare the stabilities of two RNA folds
and
,we must compute the free-energy
difference (
) (
) between the two conformations, with being given by Eq.
3.33 and the Grand Canonical trace
replaced by a canonical one at fixed
. To calculate
, we employed the Bennett acceptance ratio (BAR) method [145]. For this, we ran two
parallel simulations: one on
(
) , to generate a Monte Carlo sampling of the occupation
number configuration {
}
; and the other on
(
) , generating a sampling of {
}
. Both
simulations are determined at the samefixed
. Periodically, we considered switching the
instantaneous occupation number configuration {
}
from one RNA conformation
to
with the Metropolis acceptance probability [141],
and vice versa. The free-energy difference is given in the BAR method by the ratio
({
} ) ∑
( )
∑
( )
( )
(3.39)
( ) ( [ [
({
}
)
({
}
)]]) (3.40)
( [ (
) (
)])
〈 ( )〉
〈 ( )〉
(3.41)
53
where
denotes an ensemble average of along Monte Carlo trajectory {
}
.
For the results presented below, a combination of two types of Monte Carlo moves were
typically used. The first consisted of exchanging the occupation numbers between two sites. The
second Monte Carlo move consisted of a left or right cyclic permutation of a randomly selected
block of sites. Both MC moves conserve
. Starting with a random occupation number
configuration, the system was first equilibrated, and statistics were then collected using several
long MC simulations for each pair of RNA conformations. In the following, we will
demonstrate this using three specific examples.
3.6 Ion-Mediated Free Energies and RNA Fold Stabilities
3.6.1 The Hammerhead Ribozyme
Using the BAR method described above, we have computed the free-energy difference between
every pair of the hammerhead conformations shown in Fig. 3.2 with different number of
associated +2 ions and at varying dielectric constants. For all of these calculations, the RNA
concentration was and the salt concentration was . The results are summarized in
Fig. 3.6, where the free energy of each conformation, relative to the native fold (N), is plotted as
a function of the dielectric constant . With 61 phosphate groups on the hammerhead and
,
and are the two associated ion numbers closest to neutralizing the RNA.
For every RNA conformation, the shown in Figs. 3.6a and 3.6b for
and are
almost quantitatively identical. Moreover, Fig. 3.6 reveals that the free energy for every non-
native conformation is higher than the native structure, providing clear evidence that the model
we have formulated is able to correctly identify the native structure as the lowest free-energy
fold out of many vastly diverse alternative conformations, as long as the number of associated
ions is close to neutralizing the RNA.
54
Figure 3.6 Monte Carlo results from canonical ensemble simulations using the BAR method to calculate the free-
energy difference between each of thefive non-native conformations in Fig. 3.6 (E, S,TT, YB, and Y) and the
native state. The data in panels (a) and (b) are for the two
values, 30 and 31, closest to the number of +2 ions
required to exactly neutralize the RNA. for the five non-native conformations indicate that the native fold is
most stable and the extended structures (E) and (S) are the least stable.
The data in Fig. 3.6a for
are replotted in Fig. 3.7 for several values of ,
showing more clearly how the counterion-induced free energy F varies with structure. Of the six
structures in Fig. 3.2, the native conformation (N) is clearly the most compact, while E and S are
the most extended. Other than these, conformation TT is more open than Y and YB. From Figs.
3.6 and 3.7, the stability of each conformation is clearly seen to be correlated with the degree of
openness of the structure. Structures E and S have the highest free energies, while structures YB
and Y have free energies closest to that of the native state. Structure TT has intermediate free
energy. This correlation between the stability of the fold to the compactness of its structure holds
true over the entire range of the dielectric constant, from to , although the relative
55
stabilities of the structures can vary with (e.g., between Y and YB). Different measures of the
compactness of these structures, including their radii of gyration and rootmean-square deviation
(RMSD) from the native conformation, as well as the solvent-accessible surface areas (SASA),
are given in Table 3.1. Since the free energy decreases when the structure becomes more
compact, these results are consistent with the common view that the free-energy landscape can
form a funnel to direct the folding of the RNA into its lowest free-energy native state.
Figure 3.7 Free energies from Fig. 3.6a replotted as a function of the six structures in Fig. 3.2 for several values of
the dielectric constant.
Bulk measurements of the counterion-mediated free energy cannot be made for a specific
conformation. This makes a direct comparison between the computed free energies and
experiments difficult. Experimentally, the unfolding of a small RNA typically goes through a
compact but disordered “intermediate state”, which is often characterized by a collection of
nonspecific structures; therefore, the experimental folding free energy does not necessarily
correspond to the calculated free energy of any specific conformation. However, the folding free
energy can provide an experimental validation for the order of magnitude of the calculated free
energies. Draper et al. have analyzed experimental and theoretical free energy data for several
small RNAs similar in size to the hammerhead [44,107,146]. For tRNA [147] they attributed a
free energy difference between the folded and the intermediate states of to the
counterions, drawing evidence from previous experimental Mg
2+
binding data from Romer and
56
Hach [148]. For a 58-nucleotide fragment of rRNA [107,146], they reported a somewhat larger
free energy difference between the folded and the intermediate states coming
from nonspecific Mg
2+
association with the RNA. The magnitudes of these estimates are in very
good agreement with our calculated free energies at low dielectric. Computing reliable ion-
mediated electrostatic free energies for RNA folds can be a numerically demanding task. We
have seen that, depending on ε, the free energies of the non -native structures are on the order of
to at room temperature. Given that the individual electrostatic interactions
between charges can be 2 or 3 orders of magnitude stronger, these relatively minute free-energy
differences must be the results of the cancellations of many large and counterbalancing terms,
and computing them requires a high degree of computational effort, as well as numerical
precision.
3.6.1.1 Ion-Ion Correlation and Other Ion Effects
Fig. 3.8 shows typical occupancies of the ion interactions sites observed in the Monte Carlo
simulations for three conformations (Y), (E), and (S) at a solvent dielectric . Notice that
the occupied (yellow) and unoccupied (blue) Mg
2+
interaction sites tend to alternate along the
backbone of the RNA, indicating that the ions when associating with the RNA attempt to
organize themselves in such a way to try to minimize the repulsive Coulombic energy between
closest neighbors. However, this alternating pattern does become less pronounced when is
increased.
57
Figure 3.8 Snapshots showing representative configurations of the occupation of the ion interactions sites for three
RNA conformations Y, E, and S by Mg
2+
at and
. Occupied ion interaction sites are shown in
yellow and unoccupied sites are shown in blue.
In Fig. 3.9, we examine the effect of moving the associated ion number
farther away
from exactly neutralizing the RNA (30.5 in this case). Instead of a logarithmic scale, as
observed in Fig. 3.6, is plotted on a linear scale in Fig. 3.9. The results for
in Fig.
3.9a shows a similar correlation between the stability of the fold with the compactness of its
structure. The ordering of the stabilities of the five non-native structures are similar to those in
Figs. 3.6a and 3.6b, but the overall magnitude of the free-energy differences appear to diminish
when
moves away from the number required to exactly neutralize the RNA. The results for
are similar to those in Fig. 3.9a and are not shown. In Fig. 3.9b, we show results for
, which is farther away from the neutralizing condition. Now the free energies of all the
non-native conformations become lower than the native structure, indicating that, far away from
the neutralizing condition, the counterions actually destabilize the folded structure and the RNA
will instead favor an open structure, unless it is counteracted by non-electrostatic forces. This is
consistent with experimental findings that, at low concentrations of +2 ions, RNAs are generally
observed to unfold [90]. Therefore, the stability of the folded structure promoted by the
counterions is favorable only when the number of associated +2 ions is close to neutralizing the
58
RNA. However, as the results in Figure 3 show, there is a wide range of concentration conditions
where
meets this criterion; therefore, the stability of the folded structure seems to be fairly
robust when the +2 ion molarity is near the physiological value of Mg
2+
of .
Figure 3.9 Monte Carlo results as in Fig. 3.6 for of thefive nonnative structures at two values of
farther away
from what is needed to neutralize the RNA: (a)
, showing that the native state remains the most stable fold,
and (b)
, where the native conformation has now become the least stable.
We have also investigated the effects of using +3 counterions instead of +2 ions such as
Mg
2+
. The results are similar to the +2 ion data and therefore will not be displayed explicitly. For
associated ion number
, which is close to neutralizing the RNA, and a XCl
3
concentration of , the same trend similar to that observed for +2 ions in Fig. 3.6 is seen,
but the magnitude of the stabilization free energy for the native fold, relative to the open
structures, is overall stronger for +3 ions.
59
Conformation
measure of compactness N Y YB TT S E
radius of gyration
( )
root-mean-square deviation, RMSD ( )
solvent-accessible surface area, SASA (
)
Table 3.1 Several Measures of Compactness of the Various Conformations of the Hammerhead Ribozyme: Radius
of Gyration, Root-Mean-Square Deviation from the Native Structure, and Solvent-Accessible Surface Area.
3.6.1.2 Ion Titrations of Specific Ion Interaction Sites
Although the formulation of our model is intended to address nonspecific ion−RNA interactions,
it is nonetheless interesting to determine whether information about specific ion binding sites on
the RNA may be extracted from this simple model. This is important, because, in the full-length
hammerhead ribozyme, a specific ion binding site near the catalytic core has been shown to be
important to cleavage activity [47,125,128], although its role in stabilizing the tertiary fold has
not been firmly correlated with catalytic activities [47]. This putative ion binding site is located
within an electronegative recruiting pocket inside the folded structure [128]. Because our model
captures the gross electrostatics of the system, we should also be able to identify any
electronegative regions in the RNA that can serve as potential ion-binding sites. Demonstrating
this ability provides further validation for the model.
Fig. 3.10a shows the average occupancies, on an intensity scale, of the various ion
interaction sites along the backbone of the hammerhead ribozyme in its native conformation, in a
titration simulation where an increasing number of +2 ions (
) are added to the system. A
dielectric constant of was used, but similar results were observed for values between 4
and 20.
60
Figure 3.10 Mg
2+
ion titration results for the hammerhead ribozyme at ε= 10: (a) native state (N) with increasing
Mg
2+
and no Na
+
, revealing possible electronegative ion binding pockets as Mg
2+
are titrated into the system; (b)
native state (N) with 5 Mg
2+
ions and different numbers of Na
+
ions (high-occupancy sites identified at low Mg
2+
from panel (a) continue to be the most relevant when Na
+
is present); and (c) conformer E with 5 Mg
2+
ions and
different numbers of Na
+
ions.
61
We expect ion interaction sites in the neighborhood of the most electronegative region(s) in the
RNA to be filled first, and as more +2 ions are titrated into the system, they should spread into
other sites in the vicinity of these highly electronegative areas, and finally into the remaining
available sites. Therefore, a careful ion titration simulation should be able to identify any
potential ion recruiting pocket in the structure. As more +2 ions are titrated into the system, the
ion−ion correlations (see Section 3.6.1.1) will eventually dominate the site occupancies, since
like-charged divalent ions repel each other strongly. The results of the ion titration simulations in
Fig. 3.10a are thus consistent with these expectations.
According to the ion titration data in Fig. 3.10a, with
divalent ions, the first two
ion interaction sites filled by them are sites 8 and 48. These two sites are shown in red in the
native hammerhead structure on the right-hand side of Fig. 3.10a. Site 48 is on residue C17 at the
catalytic core of the hammerhead [143]. Previous ion occupancy studies [47,125,128] have
shown that there is a specific ion binding pocket at the scission site between A9 and C17 of the
hammerhead; therefore, the ion titration data are consistent with the identification of the region
in the neighborhood of C17 as being the key ion recruiting pocket. The other high-occupancy ion
interaction site with
ions, site 8, is in the vicinity of the B4 cytosine near the interface
between the stem I bulge/stem II loop interaction region. Tertiary interactions in this region are
thought to be important for the stability of the tertiary fold of the full-length hammerhead
structure [143]. While a specific ion-binding site has not been previously identified in this
region, it is possible that the electrostatics in this region may facilitate a highly electronegative
but nonspecific ion interaction pocket that helps to stabilize the tertiary structure.
In the second lane of Fig. 3.10a, titrating
+2 ions into the system leads to the
filling of three other ion interaction sites: 20, 27, and 31. These are shown in orange in the native
structure. Site 20 corresponds to the A9 residue at the catalytic core, which is in direct contact
with C17. Therefore, the filling of site 20 following site 48 corroborates the suggestion that there
is an electronegative binding pocket in the vicinity of the catalytic core. On the other hand, sites
31 and 27 correspond to the L5 uracil and the L1 cytosine, respectively, located near the stem I
62
bulge/stem II loop interaction region. The occupation of these two sites following the filling of
site 8 at the B4 cytosine on the stem I bulge again suggests that the stem I bulge/stem II loop
interaction region is a second possible metal ion pocket on the hammerhead.
As more ions are added, the
data reveal three more high occupancy sites: 17,
36, and 55. The locations of these are shown in green on the right-hand side of Fig. 3.10a, and
they are all in the neighborhood of the rest of the other high occupancy sites. Therefore, the ion
titration results suggest that there are two important electronegative regions on the hammerhead,
which can potentially act as ion recruiting pockets. One is located at the catalytic core near
residues A9 and C17. The other is in the stem I bulge/stem II loop region.
Experimentally, investigations of divalent ion binding are often carried out in high
concentrations of monovalent ions such at Na
+
to ensure the stability of the tertiary structure of
the RNA. When Na
+
are present, they may compete with Mg
2+
ions for the electronegative
binding pockets. Although the +2 charge of Mg
2+
ions causes them to bind more favorably than
Na
+
,a sufficiently large Na+ concentration can alter the chemical potential enough for the Na+
to be able to outcompete the Mg
2+
ions. Therefore, it is also important to study the robustness of
the high-occupancy +2 ion sites that we have identified above when Na
+
ions are titrated into the
system. Fig. 3.10b shows the average ion interaction site occupancy for
+2 ions in the
presence of different numbers of Na
+
ions. The first lane in Fig. 3.10b is identical to the second
lane in Fig. 3.10a with no Na+ ions, while the remainder are titration results with increasing
numbers of Na
+
ions. The high occupancy sites identified using the data in Fig. 3.10a, colored
red, orange, and green, continue to be some of the most relevant, even in the presence of Na
+
ions. Among these, site 20, which corresponds to residue A9 at the catalytic core, is clearly the
most robust. This is followed by site 48/49 at C17. On the other hand, the +2 ion occupancy at
three other sites – 8, 27, and 31 in the stem I bulge/stem II loop region – appear to become more
diffuse as Na
+
ions are titrated in, but a significant density of +2 ions remains in their vicinity.
This is consistent with the suggestion that we made previously, that these sites may represent a
63
more diffuse and nonspecific +2 ion binding pocket, whereas the ion recruiting pocket in the
catalytic core is site-specific.
Finally, in Fig. 3.10c, we show average +2 ion occupancies of the ion interaction sites for
conformation E of the hammerhead (see Fig. 3.2) in the presence of sodium. Comparing this to
the data in Fig. 3.10b for the N state, the +2 ion binding patterns of the two different
conformations are clearly quite distinct, but, surprisingly, the ion interaction sites in the
neighborhood of certain nucleotides also appear to have consistently high occupation in both
conformations. These consensus high occupancy regions are located at sites 19/20, which is near
the A9 residue, at sites 48/49, which is near C17, and at site 36, which in near residue G12. In
the native conformation, G12 is positioned directly against C17, and all three of these
nucleotides are packed into the catalytic core in close proximity with each other. However, in
conformation E, the catalytic core has been disassembled, and these three nucleotides are no
longer near each other in 3D space. Interestingly, these nucleotides, which are directly involved
in the catalytic activity of the hammerhead, receive intrinsically high +2 occupancy, even in the
absence of an assembled catalytic core.
3.6.2 The Group I Intron in Tetrahymena
To demonstrate the robustness of our model we have also performed free-energy calculations on
two larger systems: the Tetrahymena group I intron, as well as its standalone P4− P6 domain. The
details of these calculations are similar to those for the hammerhead ribozyme described above.
3.6.2.1 P4-P6 Domain of the Group I Intron
The P4−P6 domain serves as an especially good target for this analysis. ith 157 phosphate
groups, it is roughly double the size of the hammerhead ribozyme. The full X-ray structure of the
P4−P6 domain of the Tetrahymena group I intron is known (PDB ID: 1GID) [149], and this
molecule has been studied extensively [9,150], providing an opportunity to compare our results
directly with experimental data. The structure of the P4−P6 domain is composed of two helical
stems connected by a hinge loop between the P5 and P5abc helices. Experiments have shown
64
that the folding-unfolding process is dominated by the closing and opening of this hinge, while
the major P4−P5−P6 and P5abc helices remain l argely extended and undisturbed [134].
Therefore, the simplicity of this unfolding process suggests that intermediate structures are likely
represented by conformations with varying degrees of hinge opening between the open and
closed states. Using a Monte Carlo simulation that makes large conformational moves, we have
identified several candidate conformations that are likely intermediates in the unfolding process.
Some of these structures are shown in Fig. 3.11, and they are labeled N for the native structure,
and J, F, A, and O, from the most compact to the most open state. Measures of their compactness
are given in Table 3.2.
Figure 3.11 The six structures of the P4−P6 domain in the group I intron of Tetrahymena studied in this paper. The
two-dimensional structure of the molecule is shown on the left. These six structures were found using large
conformational Monte Carlo moves. The primary structural difference among these conformations is the variation of
the P5-hinge angle. Conformations are realigned such that the P4 and P6 helix is oriented vertically, revealing the
variation in their hinge angle.
Results of free energy calculations for
and are shown in Fig. 3.12 for the
P4−P6 domain. As with the hammerhead ribozyme, we once again find that the native state of
the P4−P6 domain has the lowest free energy of all conformations and, thus, is the most stable
65
thermodynamically. Between and , the most open state (O) is several kilocalories
in free energy above the native state, and the magnitude of this value is consistent with previous
estimates [134].
Conformation
measure of compactness N Y YB TT S
radius of gyration
( )
root-mean-square deviation, RMSD ( )
solvent-accessible surface area, SASA (
)
Table 3.2 Several Measures of Compactness of the Various Conformations of the P4-P6 Domain.
Figure 3.12 Monte Carlo results from canonical ensemble simulations using the BAR method to calculate the free-
energy difference, , between each of thefive non-native conformations in Fig. 3.11 (J, F, A,O, and the native state
(N)). The data in panels (a) and (b) are for and , respectively.
66
While our model is not intended for studying specific ion binding sites, ion titration
simulations can reveal where the electronegative pockets may be and help to identify potential
Mg
2+
binding sites. This is particularly interesting for the P4−P6 domain since many
experimental studies have focused on its folding due to the simplicity of its tertiary
structure [129,134]. There are two essential tertiary interactions thought to be important in the
native structure of the P4−P6 domain. The metal ion core on a bulge in the P5a region is believed
to be able to bind multiple Mg
2+
ions, causing an interaction with the metal core receptor on P4.
There is also a GAAA tetraloop-receptor interaction between P5b and P6a [129] that appears to
stabilize the fold.
Figure 3.13 Mg
2+
ion titration results for the P4−P6 domain of the Tetrahymena group I intron: (a) native state (N)
with increasing numbers of Mg
2+
ions and no Na
+
ions; and (b) native state (N) with 5 Mg
2+
ions and different
numbers of Na
+
ions.
67
Ion titration data shown in Fig. 3.13a suggest that the two most electronegative ion
association sites are located near residues 184 and 214. Site 184 is close to the metal ion core,
while site 214, which is in close proximity to it, is on the other side on the P4−P6 helix situat ed
near the ion core receptor. These two sites are colored red in the tertiary structure of the P4−P6
domain shown at the right-hand side of Fig. 3.13a. These results provide strong evidence that the
purported metal ion core is indeed one of the most electronegative ion recruiting pockets on the
native conformation of this molecule.
Number of Na
+
Ions
neighborhoods in the (N)
conformation
metal core/metal core receptor
P5b/P5c
tetraloop/tetraloop receptor
P5 hinge
Table 3.3 Several Measures of Compactness of the Various Conformations of the P4-P6 Domain.
Titrating more +2 ions into the system causes the additional ions to move into regions
proximal to ion sites 141 and 164, shown in orange in Fig. 3.13a. Sites 141 and 164 are both
situated near the P5b/P5c junction, which apparently is the second-most electronegative region in
the P4−P6 native structure, although no specific Mg
2+
ion binding site has previously been
reported here. As even more ions are titrated into the system, two effects are seen. First, more
ions continue to enter the metal ion core-receptor region, in the surroundings of residues 183 to
188 and 212 to 214, as well as the P5b/P5c junction region around residues 141 and 164. But at
the same time, a few other sites begin to be populated. They are colored green in Fig. 3.13a. No
specific ion-binding sites have previously been reported for these regions either, but they are
located at seemingly critical positions in the tertiary structure of the P4−P6 domain to which ions
associate and may be essential for the stability of the fold.
68
Figure 3.14 Ion titration results for the native conformer of the P4−P6 domain, with increasing number of Mg
2+
ions
and enough Na+ ions to neutralize the RNA. Difference regions in the RNA are designated as follows: J5/a = P5
hinge, P5bc = P5b/P5c junction, MC = metal core, and TLR = tetraloop receptor. The Mg
2+
ion occupancies at the
P5 hinge and P5b/P5c regions appear to continue to rise as the number of Mg
2+
ions increases, while the occupancies
at the metal core and the tetraloop receptor regions seem to have reached saturation after Mg
2+
ions.
Experiments have also been performed aimed at ascertaining how many Mg
2+
ions are
actually bound at the metal core. Some of these relied on determining the Hill coefficient [133],
which relates the folding free energy to the Mg ion concentration. These experiments are usually
performed in a large background of Na
+
concentration to keep the folded structure intact.
Titrating in Mg
2+
ions then competes out Na
+
ions from the highest affinity binding sites and
replace them with Mg
2+
. Calculating the Hill coefficient by quantitatively simulating these
experiments with mixed Na
+
and Mg
2+
is not possible within our model for two simple reasons:
(a) +1 counterions are expected to form a much more diffused ion cloud around the RNA, rather
than being tightly associated like Mg
2+
, and (b) in locating the Mg
2+
ion association sites only
near the phosphate P atoms, our model does not permit a precise determination of specific ion
binding sites. However, as we have seen above for the hammerhead and from the +2 ion titration
data for the P4−P6 domain, our model does capture the essential electrostatics properly and it can
identify high +2 ion affinity regions on the RNA. It will therefore be interesting to try to estimate
69
the average number of +2 ions bound in different regions on the RNA and to compare this
against previous findings for the P4−P6 domain.
Simulations with five +2 ions and increasing numbers of Na
+
ions are shown in Fig.
3.13b. The first lane, with no sodium, is identical to the second lane in Fig. 3.13a. The rest show
average ion associate site occupancies with increasing numbers of Na
+
ions. The Na
+
ions
compete with the +2 ion for these binding sites and, as the data show, they tend to smear out the
+2 ion occupancies. This is expected because, when bound near a phosphate, a +1 ion renders
that site neutral and reduces the electrostatics heterogeneity in its neighborhood. On the other
hand, the +1 ion occupancies also fluctuate more than magnesium, because they are held less
tightly. Fig. 3.13b shows that the four electronegative regions identified in Fig. 3.13a with five
+2 ions but no Na
+
ions (first lane) continue to hold onto significant +2 ion populations when
Na
+
ions are added, but additional +2 ion binding regions also open up when Na
+
ions are titrated
into the system. In particular, the sites close to 124 are situated near the hinge opening on the
J5/5a region, and they apparently receive a significant +2 ion population in the presence of Na
+
ions. Another cluster of +2 ion sites that develops with the addition of Na
+
are found near site
224 and between sites 244 and 260. These positions are near the tetraloop receptor on P6. Table
3.3 gives the average number of +2 ion within different regions of the native (N) conformation in
the P4−P6 domain. Since our model is not designed to address specific ion binding, we have
summed the ion occupancies on the sites within a small neighborhood of each region and
displayed them as a function of increasing Na
+
. For all cases, these four neighborhoods account
for almost all five +2 ions in the system. While the metal core/metal core receptor region holds
three +2 ions when no sodium is present, it seems to bind +2 ions when Na
+
ions are
present. On the other hand, while both the tetraloop/tetraloop receptor region and the P5 hinge
region hold no +2 ions when sodium is absent, they seem to hold close to 0.8 +2 ions, each when
Na
+
is present. The P5b/P5c junction also seems to take up close to one +2 ion with sodium.
While no explicit estimate for the number of Mg
2+
ions associated with other regions exist in the
literature, there is evidence that the metal core/metal core receptor region holds anywhere from 2
70
to 4 Mg
2+
ions [134] and the simulated ion titration data from our model are consistent with this
observation.
Figure 3.15 The six structures of the Tetrahymena group I intron. The 2-D structure is shown on the left-hand side.
The full intron contains the P4−P6 domain previously analyzed, as well as several additional helices in the P3−P9
domain. The varying conformations being analyzed are shown on the right-hand side. These were found using large
conformational Monte Carlo moves. For orientation, the P4, P5 helix remains vertical in each conformation shown.
N denotes the native structure; G and T show twists in the P6 and P8 helices while the P5 helices remain static. B, S,
and O retain those twists while the P5-hinge opens.
Finally, to determine how the various potential ion binding pockets are taken over by
Mg
2+
ions under high Na
+
concentration, we show +2 ion titration data in Fig. 3.14 for the P4−P6
domain in a background of Na
+
ions under almost neutral conditions, such that the net charge in
the system remains close to zero. Titrating in 1−5 +2 ions causes the various electronegative
binding pockets to be filled differently. While the P5 hinge (orange) and the P5b/P5c (yellow)
regions continue to gain +2 ion density, the occupancies at the metal core (red) and the tetraloop
receptor (green) appear to saturate when the number of +2 reaches . Titrating in additional +2
ions does not substantially increase the ion occupancies at the metal core and the tetraloop
71
receptor beyond 5. According to Table 3.3, the +2 ion occupancies at the metal core and the
tetraloop receptor at saturation are and , respectively.
3.6.2.2 The Full Tetrahymena Group I Intron
Figure 3.16 Monte Carlo results from canonical ensemble simulations using the BAR method to calculate the free-
energy difference ( ) between each of thefive non-native conformations in Fig. 3.15 (B, G, O, S, T) and the native
state (N). The data in panels (a) and (b) are for the two
values (121 and 123). These data demonstrates that the
native state is the most stable and also shows that this method remains effective, even with large molecules.
As a final application of the effective counterion model and a complement to the study on the
P4−P6 domain, we have also carried out calculations on the full Tetrahymena group I intron that
has 246 residues. The full intron structure includes the P4− P6 domain previously discussed, as
well as a P3−P9 domain, which is aligned perpendicularly in the native state. Again, several
intermediate conformations were identified by large conformational Monte Carlo simulations,
and their free energies were computed relative to the known native structure. These
72
conformations are shown in Fig. 3.15 and are labeled N, G, T, B, S, and O, in order from native
to open.
Results of the free-energy calculations for
and are shown in Fig. 3.16.
(Table 3.4 shows the measure of compactness for different conformations of the group I intron.)
Once again, the native state (N) proves to have the lowest calculated free energy. Although this
molecule is nearly four times as large as the hammerhead, the complexity of the calculations is
not much more demanding, compared to the hammerhead, and the effective counterion model
that we have proposed can likely be applied to an RNA of even larger sizes without problem.
Larger molecules do require more computational time, and we are exploring analytical methods
to augment this effective counterion model in order to further improve its computational
efficiency, so that it may be incorporated directly into all-atom folding/unfolding simulations of
moderate-size RNAs.
Conformation
measure of compactness N G T B S O
radius of gyration
( )
root-mean-square deviation, RMSD ( )
solvent-accessible surface area, SASA (
)
Table 3.4 Several Measures of Compactness of the Various Conformations of the Hammerhead Ribozyme: Radius
of Gyration, Root-Mean-Square Deviation from the Native Structure, and Solvent-Accessible Surface Area.
3.6.3 Prospect for an Implicit Ion Model for All-Atom RNA Simulations
The question of whether the RNA−ion model introduced above can be incorporated into a full -
scale RNA simulation as a viable implicit ion model will be dependent on its accuracy and its
speed. As Fig. 3.6 shows, the free energies produced by the model follow what appears to be the
correct order as a function of the compactness of the RNA structure, and it properly identifies the
native fold as the most stable of all the alternatives. The magnitudes of the free-energy difference
( ) between each of the non-native folds and the native one are of the order of a few kcal/mol
73
to for . In the model, a continuum dielectric constant has been assumed
for the solvent and there is an ambiguity of what dielectric constant to use. One empirical
approach to this question would be to try to either select the that best matches experimental
thermodynamic measurements or the one that is best able to stabilize the native structure against
all the other thermodynamic forces that favor the unfolded conformations (e.g., RNA
conformational entropy). Another possible approach to this would be to refine the model to
account for a possibly distance-dependent dielectric constant [60]. This work is currently
underway.
Computational speed is another important consideration. The data in Figs. 3.6 and 3. 9
were obtained using 1000 passes for equilibration and 10000 passes for the accumulation of the
BAR free energy. To produce reliable with the BAR method for dielectric constants ,
it is generally sufficient to use a few hundred passes for the accumulation, which on an AMD
Phenom II X4 955 processor requires about of CPU time. We are currently investigating
more efficiently alternative numerical algorithms to try to reduce this CPU requirement even
further. For integration with molecular dynamics simulations, we also need to evaluate
derivatives of the free energy. Since the BAR free energy is a noisy estimator, the computation
of the derivatives of the free energy will require more numerical precision, and this is currently
under study.
3.7 Conclusion
We have presented a statistical mechanical model to analyze the role of higher-valency
counterions in their ability to help stabilize compact RNA folds. Using a combination of exact
Monte Carlo calculations to handle the strongly correlated ions that are associated with the RNA
and a mean-field treatment of the weakly correlated ones in the diffuse charge cloud around the
RNA, we are able to accurately assess the equilibrium distribution between counterions that are
closely associated with the RNA and the unassociated one. Under physiologically relevant RNA
and +2 ion concentrations and a wide range of solvent dielectric, the number of +2 ions
74
associated with the backbone of the RNA is close to the number needed to exactly neutralize the
negative charges of phosphates on the RNA.
Using the model proposed, we have investigated the ion binding characteristics of
partially or fully unfolded structures of the Schistosoma hammerhead ribozyme as well as the
group I intron in Tetrahymena. It was found that the +2 ion association characteristics do not
change appreciably when the hammerhead unfolds from the native state into the extended three-
way junction structure. Similar conclusions were derived for +3 counterions.
More importantly, our RNA−ion model enables us to calculate the free -energy difference
( ) between any conformations due to the counterions and the co-ions. When coupled to an all-
atom molecular dynamics or Monte Carlo simulation of the RNA, this will provide a simple but
accurate implicit counterion model, allowing us to carry out RNAs simulations without having to
explicitly simulate the ions. We used this method to evaluate the free-energy differences between
partially or fully unfolded structures relative to the native fold, and found that every non-native
structure has higher free energy, compared to the most compact native fold. Moreover, it appears
that the more compact the fold, the lower its free energy. Therefore, the ion-mediated free energy
seems to impart a directionality to the folding process, directing it to home into the native state.
Our free energy calculations are based on the Bennett’s acceptance ratio method, and it is
sufficient fast to be able to be deployed in an all-atom RNA simulation. We are in the process of
integrating this counterion model into our Loops MC simulation package.
In the future, there are two aspects of the model that we will particularly focus on
improving. First, in the current formulation, we have assumed a uniform dielectric constant ( )
for the entire solvent. This value is used to describe both the ion−ion interactions as well as the
interaction between the ion and the RNA. Since the bulk value is not reached unless there is a
sufficient number of solvent molecules between the two interaction charges, assuming that the
bulk dielectric constant applies to the interaction between the RNA and the counterion
interaction sites on it could be problematic. Using a distance-dependent dielectric constant [60]
would provide a more realistic alternative treatment. Another possible improvement is to try to
75
identify numerical techniques to accelerate the BAR calculation of the free energies. Currently,
this requires a full Monte Carlo sampling on two RNA conformations in parallel. Even though
this calculation is manageable for the hammerhead, which contains only 63 nucleotides, it will
eventually become too expensive for larger RNAs. Therefore, a more-efficient numerical method
for sampling the occupations of the ion interaction site must be sought in order for this method to
be able to scale up to substantially larger systems.
76
Chapter 4:
Computational Methodology for Calculating
the Free Energy of Interaction between
RNAs and Divalent Counterions
Reprinted with permission from Henke, P.S., & Mak, C.H., J. Phys. Chem. 2014, 141, 064116.
4.1 Abstract
The thermodynamic stability of a folded RNA is intricately tied to the counterions, and the free
energy of this interaction must be accounted for in any realistic RNA simulations. Extending a
tight-binding model published previously, in this paper we investigate the fundamental structure
of charges arising from the interaction between small functional RNA molecules and divalent
ions such as Mg
2+
that are especially conducive to stabilizing folded conformations. The
characteristic nature of these charges is utilized to construct a discretely connected energy
landscape that is then traversed via a novel application of a deterministic graph search technique.
This search method can be incorporated into larger simulations of small RNA molecules and
provides a fast and accurate way to calculate the free energy arising from the interactions
between an RNA and divalent counterions. The utility of this algorithm is demonstrated within a
fully atomistic Monte Carlo simulation of the P4-P6 domain of the Tetrahymena group I intron,
in which it is shown that the counterion-mediated free energy conclusively directs folding into a
compact structure.
77
4.2 Introduction
Small functional RNA molecules such as ribozymes and riboswitches perform important
biological functions inside the cell, including the catalysis of biochemical reactions and
regulation of gene expression [10,13,151]. In order to operate correctly in vivo, these molecules
must fold into highly specific three-dimensional structures. Accurate simulations of RNAs and
the understanding of how they attain their thermodynamic ground state is of much scientific
interest, but atomistic simulations that are able to address these questions remain out of
reach [121,152]. A key component in the free energy of folded RNAs comes from the
interactions between their polyelectrolyte phosphate backbone and the surrounding counterions.
Due to the highly negatively charged groups along the backbone of RNA molecules,
the process of folding relies on positive counterions to stabilize a compact folded structure. Not
only do these counterions serve to passively maintain electrical neutrality, but they also produce
a net attractive free energy that aides in the folding to arrive at compact RNA
structures [90,103,104]. In vivo, divalent ions such as Mg
2+
appear to play an especially
powerful role in this folding process [153,154]. Sub-millimolar concentrations of Mg
2+
can
produce stabilization effects comparable to those arising from relevant +1 counterions at more
than one thousand times that concentration [47].
Many theories have been advanced to try to understand this RNA-ion
interaction [101,110,114,115], and more importantly to try to accurately determine the
magnitude of the counterion-mediated free energy. It is known that mean-field theories like
Debye- Hückel (or linearized Poisson-Boltzmann) produce no net stabilization. Higher-level
theories like nonlinear Poisson-Boltzmann have also been used, but they are numerically
expensive and generally cannot account for the strong correlations between high-valent
counterions [104,121]. For more rigid structures such as DNAs and ion channels, course-grained
theories [101,114,115] and tight-binding type models have been used [123,155]. Our own
previous attempt to model this system expands on these approaches by treating most of the ions
78
via an implicit mean-field theory, but goes further by explicitly considering the specific ions that
associate most closely with the RNA [156]. Previously reported Monte Carlo (MC)
calculations [156] provide compelling evidence that the free energy arising from our RNA-
counterion model accurately reproduces the attractive free energy needed to stabilize compact
RNA folds. However, the numerical effort required to stochastically sample the energy
landscape proves too expensive for such a method to be practical for an on-the-fly implicit free
energy evaluation during a molecular dynamics (MD) or a Monte Carlo simulation of the RNA.
In this paper, we expound on our earlier approach by analyzing the most relevant
case of divalent ions interacting with RNA. We formally describe the RNA-counterion system
and utilize the characteristic nature of small RNA molecules to define a system of generally
alternating charges. Here ‘small’ RNA molecules are defined as those less than a few hundreds
nucleotides (nt) in length, in contrast with ‘large’ RNAs such as those in the ribosome or viral
genomes.
In this new formulation, we describe the most relevant states by a set of
‘perturbations’ away from an energe tically important reference state instead of considering the
complete set of all possible charge configurations. Thoughtful connectivity of the discrete
energy landscape allows us to use a deterministic graph-search algorithm to calculate the RNA-
counterion free energy as well as the counterion-mediated forces, which can be integrated into a
conventional MD or MC simulation of RNAs. Furthermore, simulation of folding of the P4-P6
domain of the Tetrahymena group I intron is presented as an explicit example of the utility of
this new method.
4.3 Theory
In biomolecular simulation of nucleic acids, use of an implicit solvent helps to reduce the
computational requirements by removing a large number of degrees of freedom. However, a
fully accurate simulation still must account for the variation in free energy of the system due to
the solvent molecules, including ions. We have shown in a previous paper [156] that a tight-
79
binding model can be used to describe this interaction for any RNA structure, and that the free
energy arising from tight ion binding is lower for the compact native structure as compared to
extended ones.
Figure 4.1 The magnitude of the counterion mediated free energy plotted against the radius of gyration
for
opening conformations of the P4P6 domain of the Tetrahymena group I intron. The native crystal structure shown
on the bottom left of the plot has a minimized free energy value. As the compaction of the molecule dissipates, the
free energy becomes larger, indicating that the effect of the counterions is less stabilizing on more extended
conformations.
An explicit validation of this tight-binding model is shown in Fig. 4.1, wherein the RNA-
counterion free energy arising from our counterion model is calculated along an opening
trajectory of the P4P6 domain of the Tetrahymena group I intron (PDB ID: 1GID) [149]. This
calculation was performed with the graph search algorithm which will be explained in detail
below. The different molecular conformations were sampled by Loops MC, a fully atomistic
80
Monte Carlo RNA simulation [56]. As the molecule is pulled open, the radius of gyration (
)
increases accompanied by a rise in the counterion-induced free energy ( ); however, the
relationship between and
is not clearly nonmonotonic.
In practice, the calculation of this free energy must be accurate for each RNA
conformation and it must also be sufficiently efficient so as not to create a numerical bottleneck
in the overall simulation. A numerical solution for this and its theoretical basis are described
below.
4.3.1 The RNA-Counterion Model
Our RNA-counterion model has been discussed in detail previously [156]. Here we briefly
summarize the essential features of the model before describing the new approach. The basic
idea relies on a tight-binding description of the ion association sites on the RNA, at the same
time accounting for the highly flexible nature of RNA backbones.
Figure 4.2 The RNA molecule is considered to have ion interaction sites, shown here as large open circles, at highly
charged regions. Positive ions of charge
, shown as small filled circles, surround the RNA, and occasionally
associate with the ion interaction sites. This system has constant volume and is coupled to an ion resevoir with
chemical potentials
and
for positive and negative ions respectively.
81
The basic model is illustrated in Fig. 4.2. A mixed ensemble is employed in which the
RNA molecule is treated in a canonical ensemble while the surrounding ion cloud is treated in a
grand canonical ensemble. The electrostatic ion-mediated free energy of this system is generally
defined:
For which
is the grand canonical trace over ion coordinates { }.
is the electrostatic
potential of the RNA with atomic coordinates { },
is the electrostatic potential of the ions in
solution, and
is the interaction potential between the RNA and the ions.
and
are the
fixed chemical potential and variable number of ions for the positive and negative ions
respectively.
⁄ for the Boltzmann constant
and temperature .
To reduce the degrees of freedom of the positive ions, an assumption is made that there
exist specific sites highly correlated to the conformation of the RNA to which the positive ions
are most attracted. These sites are deemed ‘ion occupation sites’. Any such site is defined to be
‘occupied’ if it is strongly interacting with an associated ion, and we can therefore define an
occupation number for each site
The total number of these associated ions is
∑
(
) , but because RNA has
strictly negative charges, only positive counterions are considered, and
.
On the RNA itself, each bound phosphate holds a single extra electron, and so the negative
charge is concentrated at the phosphodiester bridges connecting the nucleosides. Since we are
interested in the counter-induced free energy arising from the electrostatics between the RNA
and the ions, we assume that these phosphate sites are the most relevant. Therefore, the
phosphates are chosen to be the most important ion association sites, with each site having a
background charge of . In the case of divalent counterions such as Mg
2+
, an ion bound to a
phosphate then confers an overall charge of to it.
(
)
(
)
(
)
(4.1)
{
(4.2)
82
Reformulation of as defined in Eq. 4.1 using the ion-occupation variables in conjunction
with a mean field treatment of the surrounding diffuse ion cloud results in the expression:
The effective Hamiltonian
evaluated in this trace is a function of both
and , the spatial
conformation of the RNA, and represents the electrostatic potential between the ion sites and the
associated ions:
As might be expected for a system with occupation site variables, the Hamiltonian in Eq. 4.4
closely resembles that of a lattice gas model. There is an electrostatic two-body interaction term
, a one-body term,
, arising from the interaction between the charge at site
and the
constant electrostatic potential created by all other sites , as well as a constant energy term
that represents the energy cost associated with the binding of ions to ion sites
The Hamiltonian derived in Eq. 4.4 can be used to generate grand canonical ensemble
averages via a standard Metropolis MC approach. The ensemble average for total occupation
number is:
Evaluation of the average in Eq. 4.5 reveals that, for physiological salt concentrations of 0.5 mM
MgCl
2
to 5 mM MgCl
2
the number of ions that associate closely with the RNA is nearly constant
such that the associating ions neutralize the overall RNA charge. This is physically not
surprising, and it agrees with counterion condensation theory. We can therefore define the
number of associated ions to be constant:
〈
〉 where 〈
〉
⁄ ,
being the charge
on the positive ions, which is +2 for divalent ions. This condition frees the Hamiltonian of its
grand canonical constraint and allows it to be defined canonically, solely as a function of the
RNA conformation
(
)
(4.3)
∑
(
)
∑
(
)
(
)
(4.4)
〈
〉
(
)
(
)
(4.5)
83
Specializing to only divalent counterions leads to a final Hamiltonian:
where the charges can be only
{ }, is the dielectric constant,
, and
is the
distance between ion association sites
and
.
The range of salt concentrations over which the canonical approximation is made also
overlaps with the physiological concentrations for which RNAs are observed to fold [47], and
evaluation of the free energy within the canonical ensemble provides a reasonable first order
approximation. However, specific tertiary elements, such as kink-turns, have been shown to
vary in compactness over this same range of salt concentrations [157], which would not be well
represented by the canonical contribution alone. We are presently more concerned with the
dominant role in the free energy arising from the associated ions, but future efforts will focus on
the relatively less important contribution to the free energy due to the mean-field treatment of
ions in solution. This grand-canonical contribution is dependent on ion concentration, and may
account for the concentration-induced effects observed experimentally.
4.3.2 Ion Configurations Most Relevant to the Free Energy
Analysis of our tightly-bound ion model has revealed that, in the case of divalent ions, the RNA-
counterion system is effectively reduced to a system of charges
at coordinates defined
by the phosphate sites of the RNA: ⃑
⃑(
) . It is convenient to re-express the Hamiltonian in
Eq. 4.7 in terms of a row vector of charges with elements {
}:
({
} ) ∑
( )
∑
( )
( )
(4.6)
∑
(4.7)
( )
(4.8)
84
for the inverse distance matrix
and total number of charge sites . Note that for the
total system charge to be neutral, must be even. In the canonical ensemble, the Helmholtz free
energy is related to the partition function by
( ) , with
for all energies
defined by the set of charge configurations with conserved charge: ∑
.
Figure 4.3 MC-determined energy spectrums of four RNA crystal structures for . Noting that physiological
, the low energy states are well separated with respect to thermal energy even at larger values of
the dielectric.
Due to the combinatorial nature of the set of possible configurations , its size is clearly too
large for explicit calculation of for RNAs with more than just a few nucleotides. However, if
the ground state energy and other low lying energy states are well separated from the rest of the
energy spectrum, then the free energy is dominated by these low-lying states. Therefore the low-
energy spectrum alone may be used to compute the counterion-mediated free energy. We have
computed the full energy spectra for four small RNA structures via exhaustive MC calculation in
order to confirm that their respective spectrums for do indeed have meaningful separation
∑ [
]
(4.9)
85
near the ground state. These are shown in Fig. 4.3. These structures were chosen because they
are very well resolved and are representative of different RNA lengths. In addition to the P4-P6
domain, the others analyzed here are: the cobalamin aptamer domain riboswitch (PDB ID: 4FRG
– 82 charge sites) [158], the tetrahydrofolate riboswitch (PDB ID: 3SD1 – 88 charge
sites) [159], and the Tetrahymena full group I intron (PDB ID: 1X8W – 246 charge sites) [160].
In general, the value of the dielectric is distance-dependent, and varies with the type and
amount of particles surrounding the charge sites [161]. The dielectric constant in vacuum is
with the bulk limit of water at an upper bound of . For the sake of simplicity, we
opt to use a constant value of the dielectric, for which a value of to is often assumed for
a polar molecular solvent such as water [57]. However, we note that implementation of an
implicit, distance-dependent dielectric is easily achieved [60], and would not change the methods
of our approach.
From Eq. 4.8, the energies of each counterion configuration in scales inversely with .
So with
at , it can be seen that the energy spectra of the four
different RNAs shown in Fig. 4.3 at indeed remain well separated if a nominal values of
to is used instead. This suggests that only the ground state energy and possibly the
low-lying excited states will be relevant to the overall free energy value, and given that these few
are the energy values that must be identified, the task of computing the free energy reduces to a
search for the lowest-energy ion states in the spectrum. This appears to be generally true for
small RNAs up to a few hundred nucleotides, but for larger RNAs of a few thousand nucleotides,
there may not always be a clean energy separation.
While the free energy is controlled by the ground and few low-lying states, it is important
to note that correctly identifying the ground state charge configuration is an NP-complete
problem because the only way to do so is through a direct binary comparison with all other
allowed counterion states. This is conceptually similar to identifying the ground state of a spin-
glass system [162]. Furthermore, the most relevant counterion configurations are different for
different RNA conformations, and we must be able to efficiently compute the low-energy
86
spectrum for every RNA conformation during the course of an MD or MC simulation. To
develop a reliable method for doing this, we must first understand the nature of these low-energy
ion states.
4.3.3 Physical Characteristics of Lowest-Energy Counterion Configurations
To motivate our discussion, we start by considering the simplest possible system – a fully
extended RNA chain. For this, the charge sites are given by an array of points along an
approximately linear axis. The ground state configuration is clearly that of an alternating pattern
of { } charges. This is because the RNA sequence also defines the closest-
distance charge sites in space, the energies of which dominate the total energy by minimizing
electrostatic repulsions. However, functional RNA structures are characterized by complex
tertiary structures, and the closest-distance sites do not necessarily follow the nucleotide
sequence. This implies that the ground state configuration will not always be that of simple
alternating charges. Despite the complex structures being considered, an alternating
configuration of plus and minus charges along the sequence is likely to be one of the dominant
ion states for several reasons. In the presence of 1mM Mg
2+
, the persistence length of single-
stranded RNA is
[163]. The average distance between two adjacent charge sites on an
RNA backbone is
̅
, and so for any short segment of RNA the charges are typically well
separated. Additionally, steric interactions from the nucleosides further prohibit charge sites
from coming too close to each other. Analysis of several RNA crystal structures in Fig. 4.4a
reveals that for small RNAs, typically fewer than of the charge sites have a closest-distance
site that is not its nearest neighbor in sequence.
A system of alternating charges can be parameterized by an order parameter
∑
(4.10)
87
where the Kronecker delta
is if integers and zero otherwise. Monte Carlo
calculation of for several RNA crystal structures with PDB IDs shown in Fig. 4.4b reveals that
for a nominal dielectric constant, charges in small RNA structures are highly ordered in a
regularly alternating pattern along the backbone. Varying in Fig. 4.4b is analogous to varying
the temperature as the value of the energy barriers between states is inversely proportional to the
dielectric.
Figure 4.4 a) Percent of closest-distance charge sites not given by the RNA backbone sequence for several RNA
crystal structures. The number of these sites is characteristically for small RNAs. b) Alternating order
parameter, , for RNA crystal structures of several different lengths. For small RNAs, the alternating order is stable
and suggests near complete alternation.
88
Based on the observation that the ground state charge sequence will be almost entirely
alternating, we will formally define a discrete space which will provide a basis for efficiently
finding the counterion ground state. Defining the counterion configuration with purely
alternating charges as the “reference state”, with explicit charges: {
}, all other counterion configurations with conserved charge can be generated by making
perturbations to the reference state. Note that due to the degeneracy of the Hamiltonian in Eq.
4.8, we choose to always set the first charge in the sequence to without losing any relevant
configurations.
Perturbations to the reference state arise through the introduction of “domain
boundaries”. A boundary at charge index implies that
, with possible boundaries at
the first sites. Sequential charge sites between domain boundaries are purely alternating
by definition, and are deemed “alternating domains”. The full ch arge vector associated with a
given set of domain boundaries {
} is defined as
with charges defined
by the recursive relation:
This concept is perhaps best understood visually, as shown in Fig. 4.5. For a typical,
extended RNA molecule like that seen in Fig. 4.5a, the reference state is also the most
energetically favorable configuration. However, the ground state of more compact spatial
configurations, including commonly found RNA structural motifs, is likely to require some
modification to the reference state. Fig. 4.5b shows a partial RNA α -helix. In the reference
state, many like-charge interactions arise from the base-pairs, depicted by dashed lines, and may
negatively impact the total energy value. Insertion of a domain boundary ‘flips’ the charges on
one half of the helix, thereby making those base-paired site interactions favorable, which may
lower the total energy. Similarly, tertiary interactions such as the tetraloop-tetraloop receptor
interaction in Fig. 4.5c may bring sequentially distant charge sites to within physical distances in
real space that are closer, and therefore more energetically relevant, than the sequence-defined
{
(4.11)
89
neighbors. In this case, a domain boundary that flips one set of these charges may resolve many
unfavorable interactions while preserving most of the preexisting favorable interactions. The
number of these types of long-distance interactions is shown in Fig. 4.4a for several small RNAs.
Reformulating the system in terms of domain boundaries conceptually simplifies the
problem. Rather than defining ion states in the system by explicit charges, it is easier to use
the locations of the domain boundaries instead. However, the charge neutrality requirement on
the allowed counterion configurations introduces nontrivial constraints on the positions of these
domain boundaries, as well as their relative locations. To generate all possible ion states, we
must first determine these constraints. But at the same time, from our analysis of it is expected
that the ground state configuration of small RNAs should not have more than 5 or 6 boundaries,
and so we usually do not have to generate perturbations that are very far from the reference state.
Figure 4.5 A visual example of the usefulness of domain boundaries. An RNA structure derived from experiment
or simulation defines specific charge sites. a) For an extended single-stranded RNA, charges are expected to be
alternating along the RNA backbone. For other, more compact RNA structural conformations, the short spatial
distance between charge sites that are sequentially distant indicates that the reference state alone may not be that of
the ground state energy. b) For a standard RNA α -helix, accumulation of unfavorable like-charge interactions may
be alleviated by ‘flipping’ the charges on one side of the helix via introduction of a domain boundary. Similar
benefits may exist for c) a tetraloop-tetraloop receptor tertiary interaction, in which the long-distance polymer
interactions may be more relevant than the short-distance polymer interactions.
90
The system is composed of charges, and so charge neutrality in the overall system is
dependent on both the properties of the domains as well as their ordering. Each domain can be
defined in terms of three properties:
defines its size as even or odd, its charge,
, and its
phase,
. The size of domain is defined by the parity of the number of charge sites within the
domain:
The charge of domain with beginning and ending indices and , respectively, is
defined by its overall total charge
∑
. Charges alternate within a domain, and so
can be or . Finally, the phase of a domain is given by the charge at its initial index:
or . Example sets of domains are shown in Fig. 4.6 with the domain
properties listed.
Figure 4.6 a) The reference state for . b) A single domain boundary changes the pattern of
charges away from that of pure alternation, but otherwise retains an alternating motif. In this case
a domain boundary at charge site index 20 creates two adjacent negative charges at
. c
& d) More complex sets of domain boundaries.
{
(4.12)
91
It is easy to see that these properties are related by a set of basic rules:
From Eq. 4.13a, even domains, i.e. those with
, necessarily have
. Even
domains do not affect the overall charge neutrality of the system. Conversely, odd domains, i.e.
those with
, have
. Thus, in order for the entire system to be charge neutral, there
must be an even number of odd domains. Furthermore, for each odd domain with
, there
must be a counter domain with
. Eq. 4.13b follows from the convention of Eq. 4.11,
arbitrarily placing a charge on the first site. Eq. 4.13c relates the phases of sequential
domains. If domain is odd, then
. On the other hand, if domain is even, then
. More generally, if there are an even number (including zero) of even domains
separating two odd domains and ,
. If there are an odd number of even domains
separating and , then
.
To focus attention on charge neutrality, we use a ‘ +’ sign to denote a domain with
and a ‘ -’ sign for
. e use a ’ 0‘ to denote an even domain. Because they are
all neutral they do not affect charge neutrality. Any charge configuration can now be visualized
as a sequence of even and odd domains with the condition that the number of + domains is equal
to the number of - domains. For example, the charge configurations from Fig. 4.6 become: a)
{0}, b) {00}, c) {+00+0--}, d) {+++0---000}. Because neutral domains do not affect
charge neutrality, they can be compressed further according to their effect on the phases of the
two domains they connect. For example, an even number of consecutive even domains such as
00, 0000, 000000, etc. preserves the phase of the next domain, whereas an odd number of
consecutive even domains such as 0, 000, etc. changes the phase of the next domain. A linker
consisting of an even number of even domains is represented by an ‘ =’, and a linker consisting of
an odd number of even domains is represented by an ‘ x’. Using the linker representation, the
(4.13a)
(4.13b)
( )
(4.13c)
92
charge sequences in Fig. 4.6 can be entirely rewritten as: a) {x}, b) {=}, c) {+=+x-=-}, d)
{+=+=+x-=-=-x}, so each one is manifestly charge-neutral.
4.3.4 Domain Boundary Transition Operations
In order to utilize this domain boundary formulation, we must first develop the corresponding
operations to make transitions among the different allowed states. These transition operations
define the ‘connectivity’ among the discrete charge states, and th ereby define the energy
landscape. Once these transitions have been defined, they may be incorporated into either a
graph search algorithm or a Monte Carlo method in order to efficiently sample the low-energy
part of configuration space.
Note that all charge-neutral configurations in the set can be described by a set of
domain boundaries, but not all sets of boundaries conserve charge. This is critically important
because transitions involving non-charge-neutral states would introduce large artificial energy
barriers between states. Therefore, the connectivity scheme will ensure that charge is always
conserved and that connected states are energetically similar in order to minimize and avoid any
energetic barriers that would otherwise impede the search.
The simplest class of transition operations to use is the addition and removal of domain
boundaries. All possible effects of these two operations are easily analyzed within the linker
formulation detailed above, but are not explicitly enumerated here for brevity. Importantly, in
order for the space to be ergodically explored within the charge-neutrality constraint, the
minimal set of transition operations is the addition or removal of both single boundaries and
pairs of boundaries. Consequently, we consider two states to be connected if and only if they
differ by at most two boundaries. By this definition of connectedness, the number of states
connected to any given state is of (
) . The polynomial size of this neighborhood suggests
that a search of the space will be manageable.
Energetically, the difference between two connected sets of domain boundaries, and ,
with charge vectors
and
and associated energies
and
given by Eq. 4.8 is:
93
for
(
) and ⟨
̃
| (
) . The advantage to writing the
energy difference in this way is that the latter vector will necessarily be mostly zeroes due to the
operations we have defined, resulting in an ( ) calculation.
4.4 Results
4.4.1 Boundary Search
We now pursue the idea that the operations describing the connected landscape of ion
states effectively create well-defined ‘neighborhoods’ in discrete space. That is, for any given
charge configuration, a set of similar configurations is described by the connecting boundaries.
Furthermore, because the discrete space is defined to be connected in such a way that transitions
between states are relatively smooth energetically, it can be treated as a graph and searched to
find the low energy states, which can then be used to calculate the free energy of the system.
Discrete space search proceeds in a uniform cost fashion for which the cost function is
the total system energy. Uniform cost search is the discrete space corollary to gradient search in
continuous space. Given a graph ( ) defined by a set of nodes connected by a set of
weighted edges , a uniform cost search algorithm maintains a set of ‘explored’ nodes ,
for which the adjacent edges have been evaluated. At each iteration, the set of nodes (
) is evaluated for which the edge { } is on the frontier of the explored nodes. The node
with the minimum total edge cost is then added to the set and the next iteration proceeds. For
known graphs it has been proven that this algorithm gives the shortest possible distance between
all nodes in ( ) time [164].
The size of the discrete space being searched here is too large to fully enumerate, and so
there must be a concession to guarantee termination. In our case, a user defined variable,
max_nodes, is introduced, for which the search automatically stops when max_nodes.
(
)
⟨
| |
̃
⟩
(4.14)
94
Explicit pseudocode is as follows:
Let P be the set of explored energy nodes
Let R be the set of nodes with connected edges {u,v}, where
and
Initialize P to be empty
Initialize
While |P| < max_nodes
Select the node that has the lowest total energy value
{ }
Foreach node with connected edge {u,v}
{ }
End foreach
Endwhile
Using the low energy set of states, calculate the free energy or
ensemble forces
The run time of this algorithm is (
) . This is pseudopolynomial, but if
max_nodes is small, then the search terminates quickly. Optimizations worth noting are the use
of a priority queue for storage of and a hash table to reference nodes in . A parallel
implementation is also easily achieved in practice. We expect to release a Fortran version of this
algorithm in the near future for general use.
Validation of the domain search algorithm is accomplished by comparing the magnitude
of the free energy, determined by boundary search to the value of max_nodes as seen in Fig.
4.7. Any state can be used as the starting node of the search, but in this example, the search is
initiated using the reference state. This allows all possible domain boundaries to be enumerated
first and is a good initial guess in general, as the number of boundaries in the ground state is
expected to be small. In addition to the aforementioned small RNA crystal structures, two
different crystal structures of the lysine riboswitch are included in order to demonstrate the
effectiveness of the search method for multiple conformations. These are PDB ID: 4ERL – 160
charge sites [165]; PDB ID: 3DIQ – 172 charge sites [166].
95
Figure 4.7 Convergence of is established typically within 10 iterations of boundary search. Full convergence may
take more iterations for larger RNAs, such as the full Tetrahymena group I intron (PDB ID: 1X8W), but a very
accurate approximation can be made with a cutoff value of max_nodes .
The very fast convergence to a stable free energy indicates that confidence in finding the
ground state energy is very high for small RNAs. Additionally, because these are the crystal
structures, it can be assumed that alternate conformations found in simulation may be as
compact, but never moreso, and that therefore the number of boundaries in the low-energy states
of the spectrum should be roughly equal to or less than those found in the ground state of the
crystal structure. This adds further confidence to use of this method in simulation.
In terms of incorporating this boundary search method into simulation, there are two
cases to consider. The first is that of Monte Carlo simulation, in which RNA conformations are
stochastically generated and the energy of the system is evaluated. In this instance, boundary
search can be directly implemented as an extra calculation between each MC step, and the free
energy value may be added to the total energy of each state.
The more common RNA simulation platform is that of molecular dynamics for which the
forces on all particles must be calculated as the system evolves. For MD, the results of
boundary search can be used to generate the forces arising from the RNA-counterion interaction.
96
The general equation for the vector of average forces
⃑⃑⃑
arising from the Helmholtz free energy,
, due to a charge conformation is:
for charge site coordinates
⃑⃑⃑⃑. Evaluation of Eq. 4.15 reveals that the force on charge site in the
conformation is the canonical ensemble average of the electrostatic forces on that charge site:
with
⃑⃑⃑⃑⃑
being the force on charge site for charge configuration . The
electrostatic potential energy
is given in Eq. 4.8.
Because boundary search finds the most relevant energy states, and knowing that these
systems have a well-separated energy spectrum, the ensemble force in Eq. 4.16 can be evaluated
directly once the search completes by evaluating the forces of the most relevant charge states. In
practice, the distances used in calculating these forces are precalculated before the search, and so
actual computation of the forces will not add significant computational requirements into a
simulation.
4.4.2 Monte Carlo Search and Validation
As noted in the previous section, the boundary search method is not complete, and only
explores a small subset of the entire combinatoric space. It is therefore important to verify both
the ergodicity of the operations being used as well as confirm that the low-energy spectrum
found by search is the true spectrum of the system. To do this we will incorporate the transition
operations into a full Monte Carlo method called Boundary Monte Carlo (BMC).
The two types of boundary operations defined above ( boundary; 2 boundaries) are
incorporated into BMC as the transition ‘moves’. Each of these sets of moves can be treated
independently, but because the number of connected states for each move varies with the number
and type of domains in a given charge state, each type of move must be evaluated to preserve
detailed balance so that the probability of transitioning between two states is equal. Therefore, in
⃑⃑⃑
⃑⃑⃑⃑
(4.15)
〈
⃑ ⃑⃑
〉
∑
⃑⃑⃑⃑⃑
∑
(4.16)
97
addition to the application of a standard Metropolis transition probability,
( [
] ) , the probability of going from one state to another state is
given by:
Where
is the number of transitions out of state for either adding ( ) or removing
( ) boundaries within either set of moves and
is the number of transitions out of the
connected state with the reversed type of move. Note that this transition probability implies
that some MC steps may not result in attempted moves, but provided that
, the
number of these nonproductive steps is small. These values are easily calculated, showing that
this is indeed the case for the moves being applied, and so this condition is well satisfied. The
overall transition probability between states is then:
. Hence,
,
and detailed balance is satisfied.
Figure 4.8 A comparison of the total system energy, , of Boundary MC and Naive MC scaled by the dielectric, ,
and the number of charge sites, , with energy values offset for clarity. At low , BMC equilibrates quickly to the
ground state energy also found by boundary search. At high , BMC and NMC converge to the same energy,
demonstrating that the transition operations being used are appropriately ergodic.
(
)
(4.17)
98
In order to demonstrate ergodicity of the boundary transition moves, BMC is compared to
a naive MC implementation. In the naive method, the only move attempted is the swapping of
two charges on randomly chosen charge sites with a Metropolis transition probability. Analysis
of the total system energy scaled by the dielectric and the number of charge sites for both
MC methods is shown for the Tetrahymena P4-P6 domain and full Group I intron in Fig. 4.8. At
low , BMC equilibrates to a ground state energy equal to that found by boundary search. Naive
MC is too simple to effectively equilibrate to the ground state for low values of , but instead
provides a good reference for the appropriate system energy at high . Convergence of both
methods at high implies that BMC is correctly ergodic and that therefore boundary search is as
well.
4.4.3 Implementation in Atomistic Simulations
Figure 4.9 The Tetrahymena P4-P6 group I Intron. a) The secondary structure consists of two helical stems
connected by a hinge loop. b) The tertiary structure of the equilibrated crystal structure, used as the initial
conformation in the simulations discussed.
As an example of the practical utility of boundary search in simulation, we integrate the
algorithm into the Loops MC simulation platform, a fully atomistic Monte Carlo program that
samples large conformational changes of nucleic acids [56]. This platform possesses the unique
99
capability of enabling large-scale conformational sampling to be carried out while preserving the
native secondary-structural elements or any subset thereof, allowing for systematic study of
tertiary folding processes with respect to important secondary elements of the structure. All
calculations described below were carried out using Loops MC with the Amber ff94 energy
functions.
As a model RNA system we use the P4-P6 helical domain of the Tetrahymena group I
intron. The crystal structure of P4-P6 is well known and it has been the subject of many
experimental folding studies. Structurally, P4-P6 is composed of two helical stems connected by
a hinge loop. One helical stem is coaxially stacked (P4-P5-P6), while the other forms a 3-way
junction (P5abc) as seen in Fig. 4.9a. In its native conformation the two stems dock with each
other in parallel, stabilized by two important tertiary interactions: a tetraloop-tetraloop receptor
interaction (TL/TLR) and a metal core-metal core receptor interaction (MC/MCR). P4-P6 has
been shown to fold in the presence of divalent ions, and exhibits a relatively simple folding
mechanism [9,134]. In the presence of Mg
2+
, the metal ion core is formed first, which brings
together the two helical stems via the MC/MCR interaction [167,168], which is primarily
stabilized by the recruitment of Mg
2+
into the MC/MCR pocket. The tertiary fold is further
stabilized by the hydrogen bonds formed in the TL/TLR interaction [22].
In order to properly simulate the RNA molecule, the crystal structure (PDB ID:
1GID) [160] was first equilibrated to remove the large van der Waals (vdW) energies present
due to steric overlap of atoms. 1000 Loops MC passes were run using the vdW potential, which
is expressed as:
for atomic vdW terms and as defined by Amber ff94. This equilibration process reduced the
total vdW energy of the crystal structure from
to . The
equilibrated, which is shown in Fig. 4.9b, was used as the initial configuration for all following
simulations.
( ) [(
)
(
)
]
(4.18)
100
All simulations we discuss began with an ’unfolding phase’ of 50 Loops MC passes using
the purely repulsive Weeks-Chandler-Andersen (WCA) potential in place of the attractive vdW
potential. The WCA potential is defined as:
sharing the same values for and as
. The WCA potential allows the molecule to move
without collapsing onto itself by only enforcing steric hard-sphere interactions. We note here
that steric overlap is the major impediment to effectively sampling a large set of conformations.
In order to alleviate this, all simulations we performed had explicit hydrogens removed from the
mainchain, decreasing the likelihood of the RNA becoming sterically trapped. Additionally, all
base-pairing interactions were conserved except for the four responsible for the TL/TLR tertiary
interactions.
In total, four sets of MC experiments were performed, each employing a different set of
potentials in order to gauge the impact of the inclusion of counterions on the RNA structure. We
characterized the structure of the RNA by its radius of gyration (
). Because we want to
determine how much the counterion-induced free energy was able to direct the folding of the
structure, we deliberately omitted all tertiary interactions from the potential. The structures
achieved were measured only for their compactness and not their correctness.
( ) {
[(
)
(
)
]
(4.19)
101
Figure 4.10 Typical folding trajectories for the Loops MC simulations being run. b & e) Radius of gyration (
)
vs. Monte Carlo step. Each of these trajectories depicts major compaction of the RNA molecule following the
initial, 50 MC-step, unfolding phase. b) The counterion free energy alone produces an overall attractive potential
that can be seen in the formation of a stable, compact conformation. e) Contrastingly, when the vdW potential is
included, the compact state produced is far more stable. a & d) Open conformations taken from the final step of the
unfolding phase for the WCA and vdW runs shown in (b) and (e), respectively. c & f) Stable folded conformations
taken from the final step of the overall trajectory for the WCA and vdW runs shown in (b) and (e), respectively.
The first set of simulations was a control in which only WCA remained active following
the unfolding phase. We expected the molecule to remain unfolded since there was no attractive
potential, which is indeed what was observed. We consider an MC run to have produced a
stable, folded conformation if the trajectory generated 25 consecutive conformations with
. The radius of gyration of the initial conformation is , and so a stable structure
close to this size can be reasonably considered ‘compact’ by our estimation. Based on this
definition of folding, none of the 48 WCA-only trajectories produced a stabilized fold, which
102
implies that the molecule is entropically driven into an open conformation in the absence of any
attractive forces.
In the second set of MC experiments, we wanted to evaluate how the counterion free
energy would influence the compactness without any additional attractive forces. Following the
unfolding phase, 400 MC passes were run in which the counterion-mediated free energy was
incorporated using domain search with a dielectric in addition to the WCA potential. In
this set of simulations, any attractive force must be due to the presence of the counterions, as the
WCA potential alone cannot sustain a compact structure as shown in the control. The effects of
the counterions were readily observed. In total, 24 of 48 trajectories produced a stable folded
structure by our definition. One example is shown in Fig. 4.10b which illustrates the radius of
gyration during the course of the simulation. The unfolded structure after the first 50 MC passes
with WCA only is shown in Fig. 4.10a, while the final structure after 450 total MC passes is
shown in Fig. 4.10c. These results suggest that the counterion-induced free energy is strong
enough to overcome the entropic tendency of the molecule to stay open.
In the third set of 48 simulations, the unfolding phase was succeeded by 400 MC passes
that also included the counterion-mediated free energy, but for which the full vdW potential was
used in place of the WCA potential. Of these, 34 of 48 produced a stable fold. One example
trajectory is shown in Fig. 4.10e, with the structure generated by the unfolding phase shown in
Fig. 4.10d, and the final structure in Fig. 4.10f. The larger percentage of folded structures
compared to the second set of simulations is due to the additional attractive force from the vdW
terms on top of that produced by the counterions. These simulations also tended to produce far
more stable compact structures.
In the final set of simulations, we used the same set of potentials as in the third
experiment after the unfolding phase, but also included the free energy of hydration via the Ooi-
Scheraga continuum solvation model [169], which is calculated using the solvation accessible
surface area (SASA). The free energy of hydration is given by:
103
for which
are the Ooi-Scheraga coefficients and
is the SASA for atom type . In these
runs, the hydration of the P4-P6 molecule tended to keep the two helical stems from contacting
each other, thereby preventing docking and lessening the impact of the two attractive potentials.
In total, 2 of 48 trajectories of 400 MC passes produced a stable fold.
Figure 4.11 Normalized histograms comparing the equilibrium radius of gyration (
) for each set of trajectories
generated. The control, WCA-only trajectories tend toward open conformations. By contrast, inclusion of ions with
both WCA and the full vdW potential produces far more compact conformers. When the Ooi-Scheraga SASA
hydration potential is included, the attractive effects of the other potentials are impeded, leading to a broader range
of conformations.
To gain a better understanding of the ensemble of equilibrium structures obtained from
each of the four experiments, the distributions of the radius of gyration for the four sets of
trajectories are shown in Fig. 4.11. The WCA-only structures were by far the most open, while
those structures generated with the WCA potential plus the counterions are much more compact,
clearly revealing the attractive influence of the counterion free energy. The structures generated
with the vdW potential and ions tended to be the most compact, while inclusion of the hydration
free energy led to a centered distribution. These simulations clearly demonstrated the centrality
∑
(4.20)
104
of the counterion-induced free energy in directing the folding of the P4-P6 structure to prefer
compact conformations.
4.5 Conclusion
Use of an established tight-binding RNA-counterion model provides a simplified view of
the RNA-divalent ion system, which reduces to a system of opposite charges. The physically
preferred spatial arrangement of the charges leads to an attractive, stabilizing free energy in
compact RNA structures. Analysis of the preferred internal charge structure provides a basis for
a polynomial-time search of the charge configuration landscape which verifiably elucidates the
low energy states and thereby provides a fast and accurate estimate of the RNA-counterion free
energy. We explicitly demonstrate that this search method can be easily incorporated into
existing MC implicit-solvent simulations without creating a computational bottleneck. The
algorithm can also be used in fully atomistic MD simulations to calculate the mean force arising
from the free energy of the counterion-RNA interaction.
105
Chapter 5:
Coarse-Grained Model of RNA with Implicit
Divalent Counterions
5.1 Abstract
Coarse-grained molecular dynamics simulations of RNA using implicit solvent have become
increasingly popular thanks largely to the computational gains associated with simulating fewer
explicit atoms while still retaining the most important dynamical aspects of the polymer. RNA is
a negatively charged polyelectrolyte, and divalent counterions like Mg
2+
are known to play an
important role in RNA folding and stabilization, yet no implicit-solvent coarse-grained RNA
models account for the electrostatic effects associated with non-monovalent counterions. In this
paper, we present a modified coarse-grained model of RNA that incorporates a previously
described implicit divalent counterion model. This two-state RNA model is shown to closely
match experimental results of RNA dynamics in the presence of both monovalent and divalent
cations, and is capable of forming and maintaining A-form helices in the presence of an
attractive electrostatic free energy. Incorporation of this coarse-grained model into simulation
reveals that RNA two-way junctions tend to be more likely to fold in the presence of divalent
counterions than under electrically neutral conditions, supporting the need for more realistic ion-
effects in future RNA folding simulations.
5.2. Introduction
RNA structure prediction has become an increasingly important problem as experimental results
continue to reveal that RNA plays a much larger biochemical role than once thought.
Functional, non-coding RNA molecules like ribozymes and riboswitches serve as important
106
regulators of gene expression in vivo [7,13,170], while studies on micro RNAs and small
interfering RNAs have led to new understanding and possible treatments of genetic maladies like
cancer [171,172], and other, less understood non-coding RNAs have been shown to have
significant impact on epigenetics and embryonic development [15,173]. To ably carry out this
myriad of cellular roles, functional RNAs often must fold into highly compact and specific
tertiary structures, and improved RNA structure prediction is necessary to fully understand and
take advantage of the variety of RNA.
Molecular dynamics (MD) simulations of RNA have yielded many promising results
toward this end, producing accurate predictions of small RNA structures [52,174]. While there is
much promise in atomistic simulation as a tool to better understand RNA dynamics and
ultimately predict full RNA tertiary structures, there remain hard limits on the amount of real
time that fully atomistic MD simulations can account for, even in the case of small
biomolecules [53,54]. These limits have led to a desire for a more computationally efficient
methodologies, which has increased scientific interest in both implicit solute modeling and
coarse-grained molecular modeling of nucleic acids [64,175,176]. Coarse-grained models of
RNA [63,65,177] and DNA [62,178–180] abstract out the important physical interactions of a
fully atomistic model, thereby reducing the number of explicit polymer atoms needed to be
simulated by an order of magnitude or more. Similarly, implicit salt models allow for the
electrostatic interactions between the nucleic acid and solute ions to be treated globally, which
eliminates the need for difficult ion-equilibration and further reduces the number of atoms to be
simulated [181,182].
RNA, a negatively charged polyelectrolyte, relies on positive counterions to fold into a
compact, functional structure. While monovalent counterions can fill this role in high
concentrations, divalent ions like Mg
2+
have been found to have an especially powerful
stabilizing effect on folded RNA structure [45,47,90,95,106,183]. However, coarse-grained
models of RNA proposed to this point have either included only implicit monovalent ions or
have omitted the electrostatic effects of salts entirely. While these models have proven to be
107
useful for studying RNA structure and thermodynamics [65,177], large-scale conformational
shifts are often reliant on the attractive forces arising from the presence of higher valence
cations. To study these effects, we have previously proposed a tight-binding counterion model
that accounts for highly charged counterions implicitly, as well as a computational approach for
implementing this salt model in simulation [156,184].
In this paper, we present a new, two-state, coarse-grained molecular dynamics model of
RNA designed around and specifically tuned for our implicit salt model. We also modify our
previously described computational search method for calculating the counterion-mediated free
energy so that it can be feasibly incorporated into MD simulations without introducing a
computational bottleneck. We verify that our model is capable of maintaining stabilized
secondary structural features even under the influence of the attractive counterion-mediated free
energy. We also show that our model can be tuned so that it correctly mirrors experimentally
determined dynamics of single-stranded RNA (ssRNA) sections. Using our model, we also
study the effects of divalent salt on the stabilization of RNA tertiary structure by simulating
RNA two-way junctions, and we show that divalent ions play an important role in shifting the
RNA conformational ensemble toward a more generally folded state.
5.3. Methodology
In this section we formulate a coarse-grained (CG) model that meaningfully incorporates our
previously developed and validated implicit ion model, which can be used to compute the free
energy of interaction between an RNA and its associated counterions in simulation. The salt
model we use relies heavily on the RNA backbone conformation, as discussed in more detail in
Section 2.1. Because of the reliance on backbone conformation, the CG model we propose is
primarily designed to accurately match the backbone topology of true RNA secondary structures
while under the influence of an attractive electrostatic free energy. We validate this CG model
by verifying that the counterion-mediated free energy calculated in simulation correctly
approaches that found for RNA structures determined experimentally.
108
5.3.1. Theory
Figure 5.1 A given RNA molecule with a fixed conformation is coupled with a constant-volume ion reservoir with
chemical potentials
for positive and negative ions. The RNA is considered to have charged sites with through
which it interacts most strongly with the ions in solution. These interaction sites are shown here as open circles.
The RNA is negatively charged, and so positive ions of charge
, depicted as small filled circles, surround the
RNA and may associate with the ion interaction sites.
With very few exceptions [185], nearly all previously proposed CG models of RNA and DNA
either omit electrostatics entirely, or utilize a linearized Poisson-Boltzmann potential to account
for the effects of counterions [182]. This electrostatic potential applies a Debye-Hückel
screening factor to the negatively charged monomers, as shown explicitly here:
where
is the charge on site ,
is the distance between sites and , is the relative
dielectric,
is the Debye length, and Coulomb’s constant
. For a nucleic acid, the
charge on each phosphate site is negative,
, for the proton charge , and
is therefore
purely repulsive for nucleic acids.
provides a good approximation for understanding
polymeric behavior in most relevant monovalent salt concentrations, but fails to encapsulate the
∑
(5.1)
109
truly important effects of the actual salt conditions found in vivo. The cellular ionic milieu
contains physiologically impactful concentrations of divalent and trivalent ions, which not only
help to neutralize the inherent charge on the RNA, but also actively stabilize a compact
conformation via a net attractive free energy of interaction. Divalent ions and Mg
2+
in particular
are known to have a universally stabilizing effect on folded RNA conformations arising from a
net attractive electrostatic potential [90], and in that sense the free energy of interaction due to
divalent ions is inherently different from
.
Although our implicit counterion model has been presented in much more detail
previously [156], we will briefly review the major points of the model in order to motivate the
discussion of our coarse-grained RNA model. This model is used to compute the counterion
mediated free energy,
, which is incorporated into the overall CG potential defined in Section
5.3.2.
The counterion model is illustrated in Fig. 5.1. Given an RNA conformation, the RNA
molecule is considered to exist in a canonical fixed conformational state within a cloud of
positive and negative ions that is treated separately within a grand canonical ensemble. The full
electrostatic free energy of this system includes
, the self-interaction potential of the RNA
with coordinates { };
, the self-interaction potential of the ions with coordinates { }; and
,
the interaction potential of the ions with the RNA:
where
is the grand canonical trace over all ion coordinates.
and
are the constant
chemical potential and number of ions, for both the positive and negative ions, respectively.
for temperature and Boltzmann constant
.
The positive ions in solution are most likely to associate with the RNA at regions where
the negative charge is greatest. These electronegative regions are termed ‘ion occupation sites’,
and any such site is defined to be ‘occupied’ if it is currently ass ociated with a positive ion. For
(
)
(
)
(
)
(5.2)
110
an RNA with total ion occupation sites, any given site is described by its occupation
number:
The negative charges on the RNA are concentrated at each phosphodiester bridge connecting the
sequential nucleosides. Each bridge holds a single bound phosphate with a full charge. Due
to their high electronegativity, we therefore assume that the phosphate sites are the most likely
ion-association sites, and define the positions of the phosphates to be the positions of the ion
occupation sites: { } {
}.
Using the ion occupation representation defined by Eq. 5.2 in tandem with a mean field
treatment of the surrounding, non-associated ion cloud, the free energy as defined in Eq. 5.1 is
then reduced to:
where
is the sum over occupation numbers, and
is the effective Hamiltonian, which
depends on the total number of associated ions,
, and the spatial conformation of the RNA, :
This Hamiltonian includes an electrostatic two-body interaction term between occupied sites,
, a one-body term arising from the interaction of occupied site with the surrounding field,
,
and an energy term,
, accounting for the interaction of non-occupied sites.
Use of
to evaluate the ensemble average for the total occupation number, 〈
〉,
reveals that the number of ions associated with the RNA at physiological salt concentrations in
the range from 0.05 mM to 5 mM MgCl
2
is effectively constant, such that the total ion charge
neutralizes the total charge of the RNA. That is, 〈
〉
for the relevant salt concentrations.
As an important simplificiation, we therefore consider
to be constant, and treat the full
system solely in a canonical manner – strictly as a function of the RNA structure, . All ion
{
(5.3)
(
(
)
) (5.4)
∑
(
)
∑
(
)
(
)
(5.5)
111
configurations in this ensemble are charge neutral, ∑
, and the set of all such ion
configurations is .
Considering only divalent ions, the relevant Hamiltonian is reduced to:
Each site occupied by a divalent ion has a net +1 charge, and therefore
has charges strictly
defined to be
{ }.
To evaluate
in practice, we expand on a previously described search algorithm called
‘boundary search’. Boundary search take s a static RNA conformation as input, and then defines
a discrete-space mapping of the canonical ion-configuration space about that
conformation. [184] Once the ion configuration space has been constructed, boundary search
applies a uniform-cost search method to efficiently find the energy states most relevant to the
ensemble. For a well-formed RNA, the ground state and other low-lying energy states dominate
the canonical ensemble and can be used to generate a very good approximation for
. In our
previous study, boundary search was incorporated into a fully-atomistic Monte Carlo simulation
of RNA and shown to work well for compact RNA structures up to 400nt in length.
Furthermore, because boundary search finds a full ensemble of relevant states, it can be used to
calculate the ensemble forces arising from the free energy. Generally, the vector of these forces
is given by:
for charge-site coordinates {
}. For a given configuration of ions , the force on the site
is
, where
is the energy for configuration given by Eq. 5.6. Evaluation of Eq.
5.7 shows that the ensemble average force on site is then:
∑
(5.6)
(5.7)
〈
〉
∑
∑
(5.8)
112
While the original boundary search algorithm detailed in [184] can be directly
incorporated into a molecular dynamics simulation to calculate the forces in Eq. 5.8, there are
some significant computational efficiency issues that arise in doing so. In particular, the full
boundary search algorithm runs in (
) time for number of nucleotides . In practice, this run
time makes the calculation of
and
⃑⃑⃑
a major computational bottleneck. We work around this
bottlneck by modifying boundary search so that it does not run a complete search for each
dynamic time step. This modification can be made when considering the following two points.
First, the small time steps of a dynamical simulation necessarily prohibit large motions of
particles between time steps, and so the relevant ion configurations are slow to change over the
course of a simulation. Using our coarse-grained model for RNA as described in Section 5.3.2,
we measure this behavior and find that the average time that a particular ground state persists is
for an ssRNA loop 20nt in length with a persistence length of .
Moreover, boundary search does a good job of finding many different low-lying energy
configurations beyond just the ground state. This is because the search is guaranteed to find a
new state on each iteration, unlike a Monte Carlo approach which may only find a subset of
relevant states. We take advantage of this feature by keeping track of the low energy ensemble
once the search has completed, and then utilize that ensemble to generate the ensemble forces for
several MD iterations before running boundary search again.
The number of charge configurations found in search is defined heuristically by a
parameter max_nodes. max_nodes determines how many search iterations are performed,
and therefore how many ion configurations are evaluated. Our previous report on boundary
search found that a value of max_nodes = 10 does a good job of elucidating the relevant
ensemble for RNA crystal structures up to 400nt in length [184].
113
Full pseudocode for the modified boundary search is shown here:
Let P be a persistent set of explored energy nodes
Let R be the set of nodes with connected edges {u,v}, where and
If a full search is to be performed
Initialize P to be empty
Initialize with node
While |P| < max_nodes
Select the node that has the lowest total energy value
{ }
Foreach node with connected edge {u,v}
{ }
End foreach
Endwhile
Else
Recalculate energy values in P for current charge site configuration
Use all ion configurations in P to calculate
and
In practice, a full repetition of boundary search is performed every , and we use a
value of max_nodes = 20. These values were chosen to be conservative considering the
bounds established above. Use of modified boundary search provides an approximate speedup
of calculation proportional to the number of searches skipped. In our case, where we use a time
step of , we find an order of decrease computational time spent calculating
when compared to using a full search on each time step. Modified boundary search can therefore
be incorporated into our CG model presented in Section 5.3.2 without seriously impacting the
time required to run a full simulation.
5.3.2 Model
The counterion-mediated free energy,
is heavily dependent on the conformation of the RNA
backbone. Specifically, the modified boundary search algorithm works on the assumption that
the RNA phosphate sites are generally well separated in such a way that the neighboring
phosphate sites along the backbone are usually the closest to each other in space. In all previous
analyses of boundary search, crystal structures with well-defined secondary structural features
were used as a basis for calculations, and the rigidity of these features is necessary for boundary
search to be effective.
114
The principal secondary structure of RNA, the A-form helix, is structurally very stiff,
with a persistence length
[186,187]. Conversely, single-stranded loop
sections are much more flexible with a persistence length
[35,163]. The flexible
loop sections allow for most large-scale RNA motion, and it is therefore important that the model
defined in this section has appropriately well-defined secondary structure while simultaneously
preserving the dynamics of single-stranded loop sections. Additionally, the model must account
for the overall attractive electrostatic free energy. Unlike most other CG potentials,
is not
defined around equlibrium positions in space, and the electrostatics alone could easily collapse
any RNA structure if not appropriately balanced by the other structural forces.
We implement a 3-bead per nucleotide CG model wherein each nucleotide is encapsulated by
three bound and sterically-interacting beads, which represent the phosphate, sugar, and
nucleobase. Similar models have been used for many previous CG nucleic acid simulations [62–
64] because they represent a strong balance between the computational speed increases gained
from coarse-graining, which in this case reduces the number of simulated atoms by an order of
magnitude, and the resulting control over the backbone and tertiary structure necessary for
validation of physical correctness.
Compared to most other 3-bead models, our model incorporates two important
modifications. Because of the vast difference in dynamics between ssRNA and dsRNA sections,
it is natural to treat each nucleotide differently depending on if it is base-paired within a helix or
if it is within a single-stranded loop section. We therefore utilize a two state model which is
incorporated by applying a different set of potentials to each nucleotide depending on its current
state. The nucleotide state is explicitly defined via a state parameter,
:
If the nucleotide is base paired, then a dihedral potential is imposed upon it, forcing
consecutive base-paired nucleotides into a rigid A-form helical structure. Conversely, if the
nucleotide is unpaired then no dihedral potential is applied, and the torsion angles can rotate
{
(5.9)
115
freely, as would be the case in a true RNA molecule. This separation of paired and unpaired
sections of RNA has successfully been used previously to study the topology of two-way
junctions [188]. Although the unpaired nucleotides are free to rotate, we also impose a stiffness
potential which allows for control of the flexibility of the single-stranded sections and prevents
collapse due to electrostatic attraction. The differences between paired and unpaired nucleotides
are illustrated in Fig. 5.2.
Our CG model is focused more on backbone conformations than the details of base-
pairing, but we also allow for generally defined base types in an effort to broaden future
applications. To accomplish this, base-pairs are allowed to be specifically defined in a dynamic
manner. That is,
is toggled dynamically in simulation, and is dependent on the structure of the
RNA. This is explicitly done by monitoring the distance between interacting bases over the
course of simulation. Once two interacting bases have been within a critical distance,
, of
each other for a set amount of time,
, the bases are internally declared to be ‘paired’, and
is
set to for each associated nucleotide.
Naturally, this switching of base-pairing states must be reversible for a generally
applicable model. It is expected that RNA base pairs have a lifetime around the nanosecond time
scale [189], and so a ‘breaking’ probability is imposed that randomly attempts to break base pai rs
on each time step with probability
, where is the length of time step. When
broken, each nucleotide in the pair is set to
, and interaction between the two is turned off
for so that re-equilibration can occur.
116
Figure 5.2 Each coarse-grained nucleotide is treated differently depending on its internal state. A) if the nucleotide
is unpaired, a stiffness force is applied such that angle is fully extended. B) If the nucleotide is base-paired, then a
dihedral force is applied about each torsional angle . Additionally, the base-pair is kept structurally rigid by
applying a force to extend angle , between bases, as well as a distance harmonic force to keep the bases separated
by a distance
. In both cases, bond lengths are set at a distance
and bond angles are set to , depending on the
type of beads involved.
The second major modification we implement stems from the fact that we are using an
overall attractive electrostatic free energy. It is standard for coarse-grained models to strictly
control all of the bond distances, angles, and dihedral angles through the use of harmonic
potentials such as harmonic, quaternary, and Lennard-Jones potentials. In the case of a
polyelectrolyte such as RNA, a repulsive electrostatic potential like
in Eq. 5.1 has a decay
term that is strong enough to prevent the electrostatics from overwhelming the other forces, and
breaking the polymer apart or disturbing its overall desired structure.
On the other hand, the opposite problem arises in the case of an attractive electrostatic
free energy like
. That is, rather than being concerned about the electrostatics repeling the
structure, the main concern is in preventing the electrostatics from causing the polymer to
117
collapse. The lowest-energy state for opposite charges is at a distance of zero, and so the other
potentials must be appropriately balanced to prevent this collapse from occurring.
Our coarse-grained model is comprised of nine potentials in addition to
, the
calculation of which has already been discussed. Each of the other potentials is discussed
individually below. Parameterization of these potentials has been carried out to specifically
match A-form helix structure in double-stranded sections and to match experimentally
determined scaling and dynamics in the single-stranded sections. The complete energy of the
CG model is given here as:
The general polymeric potentials,
and
are defined:
is the standard Weeks-Chandler-Andersen purely repulsive potential which is
included to prevent steric overlap of beads. For all bead pairs except phosphate-phosphate pairs,
. However, for the charge-carrying phosphate beads,
. The
larger value of more closely approximates the true Van der Waals radius of a phosphate
molecule ( ) , and importantly puts a hard limit on the minimum distance between point
charges.
is the well-established bonding potential for finite extensible non-linear elastic
polymers [190]. This potential is applied to all beads bound to each other along the RNA
backbone. It is used specifically because it has been parameterized to disallow the
polyelectrolyte from crossing over itself with parameters
and
[116,117]. In
these and all other potentials, is the distance between the bead pairs being evaluated.
[
]
(5.10)
∑ ((
)
(
)
)
(5.11)
∑
( (
)
)
(5.12)
118
Single-stranded structure is enforced by the potentials
,
, and
, which are
defined:
is applied across neighboring sugar bead triplets, where
is the angle between
three consecutive sugars about a central sugar as depicted in Fig. 5.2A. and
. The
minimum of this potential is at an angle of , and therefore
gives the RNA backbone an
inherent stiffness that can compete with and balance out an attractive electrostatic free
energy [191]. Moreover,
effectively allows for direct parameterization of the flexibility
of the single-stranded sections of the RNA through variation of the parameter
. It is important
to note that
is only applied to sugars in unpaired nucleotides (
).
is the base-stacking potential which is treated with a the general Lennard-Jones form, so
that the equilibrium distance between bases is representative of an A-form helix.
and
.
is necessary to help maintain helical structure and to facilitate base-
pairing by keeping neighboring bases together, which then allows consecutive base pairs to
dynamically form more easily following an original nucleation.
Bead Triplet
value (degrees)
P-S-B 102.0
P-S-P 120.15
S-P-S 94.49
Table 5.1 Bond angle values for
. Each bond angle is defined between a triplet of beads bound by
.
∑
(
)
(
)
(5.13)
∑
((
)
(
)
)
(5.14)
∑
(
)
(5.15)
119
is a harmonic potential that enforces a given bond angle between all consecutively bound
beads.
is the angle between the bead triplet about a center bead and
. Table 5.1
gives the values for
, which has been parameterized in combination with
and
to help preserve the appropriate distance between nearest-neighbor phosphate beads, as well as to
enforce an A-form helical geometry in base-paired sections of RNA.
The base-pairing potentials
,
, and
are defined as:
is the general, long-distance base-base interaction.
is applied only to Watson-Crick
base pairs (A-U, G-C) with
and
, and provides an attractive potential which
allows those bases to converge. However, without specifying base pairs, this potential may give
rise to clusters of three or more bases due to its general radial symmetry.
is a harmonic distance force, where
,
, and
. This potential keeps paired nucleotides at an appropriate distance from each
other, and restricts the electrostatics between paired phosphate sites from overwhelming the
helical structure. Note that the distance between bases here does not accurately mirror the true
distance between bases in a true A-form helix. This is not particularly important because we are
more interested in the backbone structure, and so the only important physical distance is the
distance between paired phosphates.
In order to keep the RNA helix appropriately rigid,
is applied where
and
is the angle about each base, its paired base, and its bound sugar as illustrated
∑
( (
)
(
)
)
(5.16)
∑ [
(
)
(
)
]
(5.17)
∑
(
)
(5.18)
120
in Fig. 5.2B.
is applied symmetrically to each nucleotide in a particular pair, and
forces the nucleotides apart orthogonally with respect to the helical axis.
In addition to the potentials applied to the bases themselves, nucleotides in paired state also have
the potential
applied to the beads:
where
is the torsional angle between four consecutive beads, and
is given in Table 5.2, and
. This potential allows neighboring base-pairs to arrange into an A-form helix.
Bead Quartet
value (degrees)
P-S-P-S -154.8
S-P-S-P -179.17
B-S-P-S 33.6
S-P-S-B 54.0
Table 5.2 Equilibrium dihedral angle values for
where the bead sequence is read in the 5’ to 3’ direction.
Note that many MD and CGMD simulations apply a standard harmonic potential about
the desired dihedral angle,
∑
(
)
. However, the related force term
has a
singularity and is unstable for some values of
[192]. Because
is toggled
dynamically in this model, the initial value of
is not known before it is applied, and it cannot
be guaranteed that this singularity will not arise and terminate the simulation. As a result, there
is a symmetry in
and the helices that form in our model may be left- or right-handed
depending on the intial conditions of base-pairing. Additionally, although an explicit singularity
is avoided by using
instead of
, some initial angles of
were observed to create
forces strong enough to break the polymer if simply toggled on in a single time step. To avoid
this, a time-dependent scaling factor ( ) is applied to
, which linearly increases the
force over after the pair has been toggled.
( ) ∑
[ (
) (
)]
(5.19)
121
In some cases we are interested in how the RNA behaves under ionic conditions that are less
than saturated, as determined by
. In these cases,
is substituted with a simple partial-
charge electrostatic potential
, which is defined as:
Rather than vary the dielectric constant as is done in
, the dielectric is constant for
, and the partial charge on each phosphate site,
, is adjusted. As
is decreased to zero,
the effect is similar to saturating the system with monovalent ions. That is, the electrostatics
become fully screened without creating any attractive electrostatic interactions. Note that
because there is no screening term applied to
as there is in the Debye-Hückel screened
potential like that found in Eq. 5.1,
is also not screened so that it may be compared more
directly with
.
5.3.3. Validation
We validate the model proposed in Section 5.3.2 by independently analyzing the
conformational statistics of both ssRNA unpaired loops and base-paired RNA hairpins.
Typically, some combination of these two states will exist in a given simulation, and so it is
important to verify that each state correctly models its corresponding RNA dynamics and
structure.
For the simulations performed, we apply a standard Velocity-Verlet numerical integration
in the context of Langevin Dynamics with a thermal bath temperature of . Time steps of
are used, and all simulations are initialized with a fully extended single-stranded
RNA structure with bond angles minimized and torsion angles set to an A-form helix as seen in
Fig. 5.3. Velocities are randomly initialized with a Gaussian distribution about the desired
temperature. Additionally, because the forces that control structure are applied asymmetrically
( ) {
(5.20)
∑
(
)
(5.21)
122
to the last two nucleotides on both the and ends of the RNA, those four nucleotides, colored
in all backbone figures as purple, are excluded from base pairing and electrostatic interactions.
This specifically prevents poorly defined end-effects from interfering with the overall RNA
dynamics. All statistics gathered also ignore these four end nucleotides.
Figure 5.3 Initial conformation of the RNA structure for all simulations is a fully extended polymer conformation
with bond angles and bond length equilibrated. On top is the full structure with all beads explicitly depicted. Below
is the backbone structure, which illustrates the overall RNA conformation more simply. Purple tail sections are inert
and orange sections have non-interacting bases. Seen here is an initial conformation for a hairpin loop with
secondary structure found in Fig. 5.6, with
.
We first show that our model is capable of correctly matching experimental results for
single-stranded loop sections of RNA. Specifically, we aim to match the dynamics of true
ssRNA by maintaining a persistence length,
, equivalent to that found experimentally.
is a
good measure of flexibility at the scale of individual nucleotides, and so it is important to
accurately match that flexibility in simulation. There have been many more experimental studies
of ssDNA [35,163,193–199] than ssRNA [163,193,194], however it is typically found that
ssRNA is marginally more rigid than ssDNA because of the extra hydroxyl group on the ribose
which slightly reduces flexibility through steric collision. Experimental estimates for the
persistence length of ssDNA vary with cation concentration and valence, and range from
under very concentrated salt conditions of MgCl
2
[193] to
at
NaCl [163] to
at urea [200]. Similarly, ssRNA persistence lengths
123
have been shown to vary widely, from
in MgCl
2
[193], to
in
NaCl [163].
Figure 5.4 Persistence length,
, under the influence of different electrostatic potentials. On the left hand side,
results for
are shown as a function of the dielectric constant, . On the right hand side, results for
are shown
as a function of the partial charge,
. Both potentials converge to an electrostatically neutral value of
. As is decreased in
, the
approaches experimental estimates for RNA in the presence of high-
concentration divalent counterions, while increasing
in
leads to values of
, which is what is found
experimentally for RNA in dilute monovalent salt conditions.
Even for a single concentration of salt, experimental results for
tend to vary. For
example, for ssDNA in NaCl, experimental values of
range from [201] to
[198]. Given this wide range of experimental results, we will not attempt to match any
one particular value for
. We instead take a more general approach and analyze the range of
dynamics that the model is capable of matching under different electrostatic conditions. When
is used as the electrostatic potential, an attractive force is generated, and the persistence
length of the single-stranded chain decreases as the dielectric constant, , decreases. On the
other hand, when
is used as the electrostatic potential, the persistence length decreases as
decreases toward an electrically neutral state.
124
The persistence length of ssRNA loops is determined by simulating a RNA with no
applied base pairing interactions.
is calculated using orientational correlation length
definition of persistence length [202], where the vectors between adjacent sugar beads are used
as the correlated vectors. Results of these simulations can be seen in Fig. 5.6. In the case when
electrostatics are excluded, we find a persistence length of nucleotides, which, when
multiplied by the average distance between sugar beads, gives a value of
.
When
is used as the electrostatic potential, the persistence length ranges from
(at
) to
(at
). For values of
, the persistence length becomes too long to effectively measure with an RNA of the length
we simulate. Regardless, the range allowed by this model accounts for most relevant
monovalent salt concentrations, and
increases expectedly with
. When
is included as
the electrostatic potential, values of result in a persistence length equal to that found with
no electrostatics. However, results in a value
and results in a
value
. Below , the magnitude of
becomes too large to be reasonable,
as the electrostatics overwhelm the other energies. These values of
are shorter than those
found with
, as expected, and are more relevant for divalent salts. Rivetti et. al. found a
value of
for ssDNA in and MgCl
2
[197], which is close to
physiological divalent salt concentration, but well below physiological monovalent salt
concentration, which is typically on the order of . Under true physiological salt
conditions, this value for ssDNA would likely be slightly shorter. We therefore postulate that
provides a good estimate for the value of
for ssRNA under similar physiological salt
conditions or a higher purely divalent counterion concentration.
One other good measure of ssRNA statistics is the scaling exponent, . The radius of
gyration,
is expected to decrease as increasing salt concentration more effectively neutralizes
the nucleic acid chain. This polymer size dependence scales as a power law with the number of
nucleotides:
, where and are constants for any given salt concentration.
125
Experimental scaling values for ssRNA are not readily available, but scaling has been studied
experimentally for ssDNA [203,204], and scaling is likely very similar between the two different
nucleic acids. To study the scaling, we simulated ssRNAs of lengths { }
under the effects various electrostatic potentials. The average 〈
〉
was then taken across all
simulations for each value of , and was determined by fitting the power law curve to our
results.
Figure 5.5 Scaling exponent, , under the influence of different electrostatic potentials. On the left hand side,
results for
are shown as a function of the dielectric constant, . On the right hand side, results for
are shown
as a function of the partial charge,
. Both potentials converge to an electrostatically neutral value of
. As is decreased in
, also decreases, which demonstrates the attractive forces arising from
allow the
ssRNA to approach ideal polymer scaling of . On the other hand, as
is increased in
, the ssRNA
approaches linear scaling, indicating that the electrostatics overwhelm all other potentials and force the ssRNA into
a fully extended conformation.
Results for the scaling properties of our model can be seen in Fig. 5.7. For a neutral, self-
avoiding coil, the scaling exponent is expected to be [205], and a similar value was
found experimentally for denatured proteins [206]. Our results from simulating ssRNA with no
electrostatics applied gives , which closely matches the theoretical and
experimental results. Experimental studies of poly-dT ssDNA find scaling values ranging from
126
in NaCl concentrations of to respectively [35]. Applying
results in values of for
, and for
which
mirrors the experimental scaling in monovalent salt. As
is increased further, the charged sites
overwhelm all other forces, and the scaling exponent approaches unity.
Conversely when
is applied, the attractive forces generated allow the scaling
exponent to fall below the self-avoiding neutral polymer scaling. As with the persistence length,
values of give results similar to the neutral system, but gives
and gives . Scaling values have not been experimentally determined for
divalent cations, but these values agree with the values of
that we find, which do match
experimental estimates.
Figure 5.6 Illustration of a standard dsRNA hairpin loop structure with a tetraloop bend at one end. The 5’ and 3’
ends, shown here in purple, are non-interacting. The gray section is completely base-paired, with
for all
nucleotides in this area. The orange tetraloop is not base-paired, and nucleotides in the loop have
, allowing
for free dihedral rotation.
We next show that the helical RNA duplex conformations dynamically generated in
simulation appropriately mirror the known structure of RNA A-form helices found in
experimentally determined RNA crystal structures. The magnitude of
is especially
responsive to the backbone structure of the RNA conformation, and small changes in structure
show up prominantly in the value of
. Taking advantage of this property, and because our
model is primarily constructed around the inclusion of
, we use it as a means of validating the
structure generated.
127
To analyze the structure of a dsRNA hairpin loop structure, base pairs are
specifically enforced and not allowed to break once formed. Explicitly enforcing base pairing in
this way allows us to verify that the structures form and behave as expected. Base pairing is
defined so that a tetraloop structure is formed with
being the number of C-G base pairs in
the hairpin with secondary structure illustrated in Fig. 5.6.
Figure 5.7 A) A hairpin loop with
is simulated and compared to a crystal structure of an equivalent RNA
hairpin loop, shown at left. An example of the backbone structure determined in simulation is shown at right. B) A
longer hairpin loop with
. A crystal structure is shown at left compared to a dynamically generated
structure shown at right. At bottom is a 2ns trace of
for each of these RNAs shown below in black, compared to
the value of the experimentally determined structures, shown as an orange dashed line.
128
Dynamically generated hairpin loops are compared to crystal structures for
and
. For
, a standard crystal structure (PDB ID: 1GID) [149] has a value
. At equilibrium, a hairpin loop of equivalent size was found to have an average
value of . Similarly, a crystal structure (PDB ID: 4FNJ) [20] with
has a value
, and an equivalent CG structure has an average value of
in simulation. The crystal structures and example dynamic structures
used in this calculation are shown in Fig. 5.7 alongside traces of
over compared to the
constant crystal structure value. These results show that our model does a good job of
maintaining the important A-form helical features even in the presence of both a heat bath and an
attractive electrostatic free energy.
In the CG RNA duplex helical structure, the distance between adjacent phosphate
beads is found to be
. We compare this to the average value of
measured across 15 crystal structures of RNA, which is found to be . The average
distance is exactly the same, although the variance in the CG model is substantially smaller than
the variance found experimentally. This is likely due to the stiffness in the potentials that are
necessary within the scope of coarse-graining compared to a fully atomistic model, which has an
inherent stiffness arising from more complex rigid chemical structures but has overall more
extensible monomer structures.
These analyses confirm that this CG RNA model is capable of demonstrating
experimentally accurate behavior in the presence of both monovalent or divalent counterions.
ssRNA loop sections are especially attuned to the different electrostatic potentials applied, and
our results show that the tuning parameters for each potential can be easily adjusted to mirror any
experimental results obtained to date. Similarly, helical stem portions of RNA are much more
stable experimentally, and our model preserves this stability by applying strong conformational
forces to the nucleotides determined to be in a stable, secondary structure.
129
5.4. Results
Previous theoretical studies have predicted that two helices or two like-charged rods or
nucleic acid helices will attract each other in the presence of divalent ions [101,114,124].
Follow-up experimental studies have demonstrated that DNA duplexes may not demonstrate this
predicted attraction under the influence of divalent ions [207], but that RNA duplexes do enter
into an attractive regime at concentrations of
and above [208]. In our previous
studies [156], we have also found that
is predicted to be negative in the presence of Mg
2+
at
similar concentrations, suggesting a net attractive average force between RNA helices. To test if
our CG model correctly reproduces this behavior, we have run a number of MD simulations of
RNA helices joined by a 2-way junction. These simulations were designed to study the attractive
nature of RNA helices, and we do find an effect similar to that expected experimentally.
Figure 5.8 Illustration of the secondary structure of the RNAs being simulated. Two equal-length helical stems with
base pairs each are connected by two ssRNA sections, each of length
. The orange tetraloop and inner
loop sections have no base interactions, and the final two nucleotides on the and end are non-interacting and
colored purple.
We ran simulations of two connected, symmetric RNA helical stems with
base pairs
each. The two stems are connected by two flexible loop sections where each loop is
nucleotides in length. The RNA secondary structure is depicted in Fig. 5.8, and the base pairs
are not allowed to break once formed, thereby keeping a strong helical stem structure on either
side of the loop. Two sets of twelve simulations were run for each value of
. The first set of
was run with no electrostatics, for which the loop section has a persistence length
, and the second set was run with counterions included via
with , for which the
loop section has persistence length
. A value of was chosen because it is
130
most closely emulates true salt conditions found in vivo, as proposed in Section 5.3.3. Because
the helical sections are rigidly held in place, the only portion of the structure affected by the
single-stranded persistence length is the loop section, and because
is short, its flexibility
should have a mimimal effect on the larger-scale interaction between helices.
Figure 5.9 The bend angle, , is calculated by taking the vector along each helical axis. Here, two conformations
generated in simulation for
are shown. A) An ‘open’ conformation with and B) a ‘closed’
conformation with .
Simulations were run for
and . In all cases
, which, at
over three times the persistence length, is long enough to allow the two helices to interact
meaningfully without interfering in the overall conformational ensemble. The primary quantity
that we measured is that of the ‘bend angle’, , between the two helices. This is measured by
taking the center point of two base pairs on each helix symmetrically about the loop region, and
then using the vector connecting those two points as the long helical axis. is then found by
taking the inner product between the two axis vectors for each helix. Examples of two
structures, one closed and one open, with their associated axis vectors shown in Fig. 5.9.
131
Figure 5.10 Normalized distributions of the bend angle, , for
in plots A, B, and C respectively
for simulations with no electrostatics (Neutral), shown as a solid black line, and those with implicit divalent ions via
with , shown as a dashed blue line. In all cases, as well as for
(not shown), the
conformational distribution is shifted toward a more folded state as depicted in Fig. 5.9B. The distribution of for
the neutral simulations is nearly identical in all cases, with 〈 〉 in all cases. The distribution of with
divalent ions is slightly more varied, but indicates that the distribution is not necessarily a function of
.
132
Normalized histograms of the bend angles for
and can found in Fig. 10.
These histograms show that that inclusion of counterions via
meaningfully increase the
amount of time that the RNA is in a closed conformation. Furthermore, with no counterions it
can be seen that the bend angle statistics remain the same as
is varied, with an average value
of 〈 〉 for each value of
. With counterions included, the average value of is
shifted down slightly in every case, ranging from 〈 〉 for
to 〈 〉
for
. It is clear that the value of is not correlated with
, as the
distribution shifts are all nearly the same, regardless of
.
Furthermore, the correlation function measured between and the magnitude of
shows that the two are uncorrelated for all values of
. This suggests that
plays a general
role in helping the helices to come together as an overall attractive free energy rather than
generating specific forces that act on the helices alone.
The overall effect on the conformational ensemble due to the divalent counterions is
relatively small, as there is only a slight shift away from a neutral ensemble, with an average
difference in of across all values of
. In spite of this relatively small impact, it should
be noted that in folded RNAs are typically stabillized by multiple other specific and strong
tertiary interactions, and that the counterion-mediated free energy is indeed only a small aspect
of overall RNA folding. The fact that divalent counterions allow the RNA to form a slightly
more closed conformation even some of the time indicates that close-range tertiary interactions
like those between tetraloop and tetraloop-receptors would more likely form in the presence of
divalent ions, which suggests that role divalent ions play is indeed quite important.
In addition to the bend angle, the radius of gyration,
, was also measured for each
value of
.
gives a good estimate of the size of the size of the RNA molecule, and a plot of
as a function of
can be seen in Fig. 5.11. The difference between
values for
simulations with divalent ions versus neutral simulations does not vary much with
. This
suggests that much of the difference is due to a more compacted loop region rather than due to
133
the larger conformational shifts affecting change in . However, as shown in Fig. 5.10, the
RNAs with divalent ions are certainly folded more often than their neutral counterparts, and so
some of the difference is also due to the shift in .
Figure 5.11 Radius of gyration,
vs. helix length,
for simulations with both no electrostatics (Neutral) and
with
imposed with . The size of the RNA with
applied is always smaller than with no electrostatics,
which owes mainly to the inter-helical attraction as well as the loop sections being slightly more compacted in the
presence of divalent counterions.
Results from these simulations demonstrate that our CG model is capable of recreating
the influence of high-concentration divalent cationic salts seen experimentally [208]. The overall
effect on the RNA conformations due to
is small and non-specific, but nevertheless very
important when considering overall RNA folding dynamics. That
permits the RNA to more
often reach a semi-folded state indicates that tertiary interactions that further stabilize a given
RNA folded structure are more likely to form in the presence of divalent counterions than in the
presence of monovalent ions alone.
134
5.5. Conclusion
We have developed a new coarse-grained RNA model for use with an implicit divalent
counterion model. This particular model diverges from previous coarse-grained models, in that
it is primarily designed around an attractive electrostatic free energy, whereas all other models to
date have incorporated a repulsive Coulomb or screened Coulomb potential. We have verified
and benchmarked our model against experimental analyses of ssRNA, demonstrating that it is
fully capable of matching true ssRNA dynamics under a range of salt conditions, including
MgCl
2
. Additionally, as a two-state model, it is able to dynamically form stable A-form helical
structures in the presence of both repulsive and attractive electrostatics. Finally, from extensive
simulations using our CG model, we have shown that divalent counterions play an important role
in allowing RNAs in a two-way junction to form folded conformations more often, which
suggests that other tertiary interactions may work cooperatively with divalent ions to form stable,
folded RNA structures.
135
Chapter 6:
Conclusion
In this thesis we have presented a complete story of the theory of RNA-ion interactions
and detailed our contribution to the field of polyelectrolyte theory and simulation. We developed
a novel model of polyelectrolyte-counterion interaction that is complex enough to meaningfully
represent the most relevant counterions in an explicit manner, but simple enough to analyze
numerically without massive computational power. We then developed novel algorithmic
approach to calculating the most relevant physical properties of this interaction – the free energy
of interaction between the RNA and its associated ions, and the forces associated with that
interaction – both necessary for incorporation of our model into simulation. Finally, through the
lens of molecular simulation we studied the effects of divalent ions on RNA molecules and
found that it is certainly a very important interaction that cannot not be neglected when studying
nucleic acid systems, especially when attempting to replicate true environmental conditions. The
attractive free energy that arises in the presence of divalent ions is inherently different from the
Debye- Hückel screening energy that is typically used as a model for monovalent ions, and
therefore must be treated in a unique fashion.
Initial application of our counterion model to the medium-sized functional RNAs of the
Schistosoma Hammerhead Ribozyme and the Tetrahymena Group I Intron P4-P6 domain have a
smaller RNA-ion free energy of interaction in their native folded state compared to their
unfolded state by , which is of the magnitude expected experimentally. This work
also shows that the native state is stabilized by the RNA-ion interaction in addition to the other
stabilizing interactions, such as base-pairing and tertiary interactions. Further application of our
model to the full Tetrahymena Group I Intron revealed the model’s ability to handle large RNAs
of up to 400 nucleotides in length. This size limit includes most all functional RNAs besides the
136
ribosome, and suggests that incorporation of the model into most meaningful RNA simulations is
feasible.
In order to most effectively incorporate our model into simulation, we developed a
method of mapping the ion-configuration space onto an undirected graph space. That space can
then be searched using a uniform-cost search algorithm, which finds the most relevant ion-
configurations energetically and does so many times faster than an equivalent Monte Carlo-based
optimization algorithm. Integration of this algorithm into Loops MC, the fully atomistic Monte
Carlo conformational simulation platform, revealed that the RNA-counterion free energy alone is
enough to help fold large RNA molecules. When combined with other attractive potentials like
the van der Waals potential, the RNA finds realistic and compact folded states. Specifically,
simulation of Tetrahymena Group I Intron P4-P6 domain in the presence of divalent ions showed
a folding behavior identical to that found experimentally.
Because the algorithm we have developed finds all of the relevant ion configurations that
contribute to the partition function, we were able to extend it to calculate the ensemble average
forces that impact the RNA, and thus we were able to incorporate it into molecular dynamics
simulations. We developed a coarse-grained model of RNA that is specifically designed to
handle the potentially large attractive electrostatic interactions generated by divalent ions while
still matching experimental results for both single-stranded RNA and helical duplex sections of
RNA. Simulation of two-way RNA junctions using our CG model reveal that the attractive
forces arising from the divalent ions allow the helical sections of RNA to attract each other
despite their natively opposing charges. This result was expected theoretically and
experimentally for RNA due to its A-form helical structure.
Moving forward, there are many opportunities for improvements on and further
developments of the work presented here. Likely the most pressing issue not covered in this
dissertation is the study of a general salt model. While much of the prior work covered in
Chapter 2 was focused on pure monovalent electrolytes, and our work has focused on divalent
cationic electrolytes, the true salt solution in vivo is a milieu of both of the above, plus trivalent
137
and other multivalent cations that may undoubtedly be heavily correlated with polyelectrolytes
and thus impact the interaction between ions and RNA. The model we present in Chapter 3 is
able to accommodate cations of varying valence, and it would be valuable to investigate a more
robust salt solution within that framework. Also related to this is incorporation of Debye
screening within our free energy calculation. The Debye screening length is relatively long, and
so for the small single molecules that we studied in this thesis, it is not necessarily relevant.
However, in order to study larger molecules and intermolecular systems, that aspect of our model
will need to be further fleshed out.
Lastly, while the numerical approach we present in Chapter 4 works well for modeling
divalent ions in the cases presented, it is important to note that is ultimately a heuristic method.
Due to the NP-completeness of the ion-configuration problem, there is no one correct or optimal
method for calculating the counterion mediated free energy of interaction. Rather, there may be
other heuristic approaches that are either faster or better suited to other types of ions that should
continue to be studied. For example, a goal of studying the explicit ions associated with the
ribosome, an RNA structure comprised of thousands of nucleotides, will require an approach that
is able to much more thoroughly explore the ion configuration space. What we have presented in
this dissertation is only a starting point to a much deeper conversation and study of our model of
polyelectrolytes and counterions as a whole. Nevertheless, this work combined with all the
ongoing work of many others in the field are continuing to show progress in studying this very
difficult subject with the ultimate goal of fully-realized simulations of RNA in conditions
equivalent to those in vivo, which would greatly deepen our understanding of the building blocks
of life.
138
Bibliography
[1] G. J. Olsen and C. R. Woese, J. Fed. Am. Soc. Exp. Biol. 7, 113 (1993).
[2] S. D. Copley, E. Smith, and H. J. Morowitz, Bioorg. Chem. 35, 430 (2007).
[3] R. W. Carthew and E. J. Sontheimer, Cell 136, 642 (2009).
[4] D. P. Bartel, R. Lee, and R. Feinbaum, 116, 281 (2004).
[5] L. He and G. J. Hannon, Nat. Rev. Genet. 5, 522 (2004).
[6] A. J. Hamilton and D. C. Baulcombe, 213, 1997 (1999).
[7] E. A. Doherty and J. A. Doudna, Annu. Rev. Biochem. 69, 597 (2000).
[8] C. Hammann and D. M. J. Lilley, Chembiochem 3, 690 (2002).
[9] F. L. Murphy and T. R. Cech, Biochemistry 32, 5291 (1993).
[10] C. Guerrier-Takada, K. Gardiner, T. Marsh, N. Pace, and S. Altman, Cell 35, 849 (1983).
[11] S. Altman, Mol. Biosyst. 3, 604 (2007).
[12] M. C. Hammond, Nat. Chem. Biol. 7, 342 (2011).
[13] A. Serganov and D. J. Patel, Nat. Rev. Genet. 8, 776 (2007).
[14] K. L. Frieda and S. M. Block, Science 338, 397 (2012).
[15] A. Pauli, J. L. Rinn, and A. F. Schier, Nat. Publ. Gr. 12, 136 (2011).
[16] M. Guttman, I. Amit, M. Garber, C. French, M. F. Lin, D. Feldser, M. Huarte, O. Zuk, B.
W. Carey, J. P. Cassady, M. N. Cabili, R. Jaenisch, T. S. Mikkelsen, T. Jacks, N.
Hacohen, B. E. Bernstein, M. Kellis, A. Regev, J. L. Rinn, and E. S. Lander, Nature 458,
223 (2009).
[17] P. Brion and E. Westhof, Annu. Rev. Biophys. Biomol. Struct. 26, 113 (1997).
[18] W. J. Greenleaf, K. L. Frieda, D. a N. Foster, M. T. Woodside, and S. M. Block, Science
319, 630 (2008).
[19] D. H. Mathews, J. Mol. Biol. 359, 526 (2006).
139
[20] L. A. Coonrod, J. R. Lohman, and J. A. Berglund, Biochemistry 51, 8330 (2012).
[21] D. H. Mathews and D. H. Turner, Curr. Opin. Struct. Biol. 16, 270 (2006).
[22] B. T. Young and S. K. Silverman, Biochemistry 41, 12271 (2002).
[23] D. J. Klein, T. M. Schmeing, P. B. Moore, and T. A. Steitz, EMBO J. 20, 4214 (2001).
[24] S. E. Butcher and A. M. Pyle, Acc. Chem. Res. 44, 1302 (2011).
[25] J. C. Kendrew, G. Bodo, H. M. Dintzis, R. G. Parrish, and H. Wyckoff, Nature 181, 662
(1958).
[26] Y. Shi, Cell 159, 995 (2014).
[27] M. S. Smyth and J. H. J. Martin, J. Clin. Pathol. - Mol. Pathol. 53, 8 (2000).
[28] N. E. Chayen, Acta Crystallogr. Sect. D Biol. Crystallogr. 54, 8 (1998).
[29] A. Wlodawer, W. Minor, Z. Dauter, and M. Jaskolski, FEBS J. 275, 1 (2008).
[30] C. D. Stoddard, J. Widmann, J. J. Trausch, J. G. Marcano-velázquez, R. Knight, and R. T.
Batey, J. Mol. Biol. 425, 1596 (2013).
[31] C. D. Putnam, M. Hammel, G. L. Hura, and J. A. Tainer, Q. Rev. Biophys. 40, 191
(2007).
[32] C. E. Blanchet and D. I. Svergun, Annu. Rev. Phys. Chem. 64, 37 (2013).
[33] L. Pollack, Annu. Rev. Biophys. 40, 225 (2011).
[34] S. P. Meisburger, J. L. Sutton, H. Chen, S. A. Pabit, S. Kirmizialtin, R. Elber, and L.
Pollack, Biopolymers 99, 1032 (2013).
[35] A. Y. L. Sim, J. Lipfert, D. Herschlag, and S. Doniach, Phys. Rev. E 86, 021901 (2012).
[36] D. Marion, Mol. Cell. Proteomics 12, 3006 (2013).
[37] G. Wider, Biotechniques 29, 1278 (2000).
[38] J. Flinders and T. Dieckmann, Prog. Nucl. Magn. Reson. Spectrosc. 48, 137 (2006).
[39] Y. Hashem, A. des Georges, V. Dhote, R. Langlois, H. Y. Liao, R. A. Grassucci, T. V
Pestova, C. U. T. Hellen, and J. Frank, Nature 503, 539 (2013).
[40] P. B. Moore, D. M. Engelman, and B. P. Schoenborn, J. Mol. Biol. 91, 101 (1975).
140
[41] L. Stryer and R. Haugland, Proc. Natl. Acad. Sci. U. S. A. 58, 719 (1967).
[42] S. G. Schultz and A. K. Solomon, J. Gen. Physiol. 45, 355 (1961).
[43] W. Jahnen-Dechent and M. Ketteler, Clin. Kidney J. 5, i3 (2012).
[44] V. K. Misra and D. E. Draper, J. Mol. Biol. 317, 507 (2002).
[45] V. K. Misra and D. E. Draper, Biopolymers 48, 113 (1998).
[46] C. Hurwitz and C. L. Rosano, J. Biol. Chem. 242, 3719 (1967).
[47] J. L. Boots, M. D. Canny, E. Azimi, and A. Pardi, RNA 14, 2212 (2008).
[48] S. C. Harvey, M. Prabhakaran, B. Mao, and J. A. Mccammon, Science 223, 1189 (1984).
[49] D. A. Zichi, J. Am. Chem. Soc. 117, 1425 (1995).
[50] P. Auffinger and E. Westhof, Biophys. J. 71, 940 (1996).
[51] P. Auffinger and E. Westhof, J. Mol. Biol. 269, 326 (1997).
[52] A. A. Chen and A. E. García, Proc. Natl. Acad. Sci. U. S. A. 110, 16820 (2013).
[53] J. L. Klepeis, K. Lindorff-Larsen, R. O. Dror, and D. E. Shaw, Curr. Opin. Struct. Biol.
19, 120 (2009).
[54] D.-A. Silva, D. R. Weiss, F. Pardo Avila, L.-T. Da, M. Levitt, D. Wang, and X. Huang,
Proc. Natl. Acad. Sci. U. S. A. 111, 7665 (2014).
[55] J. P. Ulmschneider and W. L. Jorgensen, J. Phys. Chem. B 108, 16883 (2004).
[56] C. H. Mak, Mol. Simul. 37, 537 (2011).
[57] C. N. Schutz and A. Warshel, 417, 400 (2001).
[58] R. Lavery, K. Zakrzewska, and H. Sklenar, Comput. Phys. Commun. 91, 135 (1995).
[59] B. E. Hingerty, R. H. Ritchie, T. L. Ferrell, and J. E. Turner, Biopolymers 24, 427 (1985).
[60] R. Rohs, C. Etchebest, and R. Lavery, Biophys. J. 76, 2760 (1999).
[61] S. Cao and S. Chen, RNA 11, 1884 (2005).
[62] T. A. Knotts, N. Rathore, D. C. Schwartz, and J. J. de Pablo, J. Chem. Phys. 126, 084901
(2007).
141
[63] S. Pasquali and P. Derreumaux, J. Phys. Chem. B 114, 11957 (2010).
[64] T. Cragnolini, P. Derreumaux, and S. Pasquali, J. Phys. Chem. B 117, 8047 (2013).
[65] N. A. Denesyuk and D. Thirumalai, J. Phys. Chem. B 117, 4901 (2013).
[66] P. Debye and E. Hückel, Phys. Zeitschrift 24, 185 (1923).
[67] G. Manov and R. Bates, J. Am. Chem. Soc. 65, 1765 (1943).
[68] R. Stokes and R. Robinson, J. Am. Chem. Soc. 70, 1870 (1948).
[69] T. Alfrey, P. Berg, and H. Morawetz, J. Polym. Sci. 7, 543 (1996).
[70] L. Kotin and M. Nagasawa, J. Chem. Phys. 36, 873 (1962).
[71] D. Stigter, J. Colloid Interface Sci. 53, 296 (1975).
[72] U. Micka and K. Kremer, 54, 2653 (1996).
[73] A. Dobrynin and M. Rubinstein, Prog. Polym. Sci. 30, 1049 (2005).
[74] A. Dobrynin, Curr. Opin. Colloid Interface Sci. 13, 376 (2008).
[75] C. Reed and W. Reed, J. Chem. Phys. 92, 6916 (1990).
[76] C. E. Reed and W. F. Reed, J. Chem. Phys. 94, 8479 (1991).
[77] J. Bjerrum, Chem. Rev. 46, 381 (1950).
[78] R. Fuoss, J. Am. Chem. Soc. 967, 5059 (1958).
[79] M. R. Wright, An Introduction to Aqueous Electrolyte Solutions (Wiley, 2007), pp. 386–
400.
[80] J. B. Hubbard, L. Onsager, W. M. van Beek, and M. Mandel, Proc. Natl. Acad. Sci. U. S.
A. 74, 401 (1977).
[81] E. R. Nightingale, J. Phys. Chem. 63, 1381 (1959).
[82] N. Sahai and D. A. Sverjensky, Geochim. Cosmochim. Acta 61, 2827 (1997).
[83] G. S. Manning, J. Chem. Phys. 51, 934 (1969).
[84] G. S. Manning, J. Chem. Phys. 51, 3249 (1969).
142
[85] D. Stigter, Biophys. J. 69, 380 (1995).
[86] G. S. Manning, Q. Rev. Biophys. 11, 179 (1978).
[87] G. S. Manning, Biopolymers 11, 937 (1972).
[88] G. S. Manning, Biopolymers 11, 951 (1972).
[89] N. D. E. Marky and G. S. Manning, Biopolymers 14, 1407 (1975).
[90] D. E. Draper, D. Grilley, and A. M. Soto, Annu. Rev. Biophys. Biomol. Struct. 34, 221
(2005).
[91] V. K. Misra and D. E. Draper, Proc. Natl. Acad. Sci. U. S. A. 98, 12456 (2001).
[92] B. Honig and A. Nicholls, Science 268, 1144 (1995).
[93] J. Colmenares, J. Ortiz, and W. Rocchia, Bioinformatics 30, 569 (2014).
[94] T. J. Dolinsky, J. E. Nielsen, J. A. McCammon, and N. a Baker, Nucleic Acids Res. 32,
W665 (2004).
[95] J. Lipfert, S. Doniach, R. Das, and D. Herschlag, Understanding Nucleic Acid–Ion
Interactions (2014), pp. 813–841.
[96] A. H. Boschitsch and P. V Danilov, J. Comput. Chem. 33, 1152 (2012).
[97] V. B. Chu, Y. Bai, J. Lipfert, D. Herschlag, and S. Doniach, Biophys. J. 93, 3202 (2007).
[98] I. Borukhov, D. Andelman, and H. Orland, Phys. Rev. Lett. 79, 435 (1997).
[99] Z. He and S.-J. Chen, J. Phys. Chem. B 117, 7221 (2013).
[100] Z.-J. Tan and S.-J. Chen, Biophys. J. 99, 1565 (2010).
[101] B.-Y. Ha and A. Liu, Phys. Rev. Lett. 79, 1289 (1997).
[102] T. Nishio and A. Minakata, J. Chem. Phys. 113, 10784 (2000).
[103] D. E. Draper, RNA 10, 335 (2004).
[104] G. C. L. Wong and L. Pollack, Annu. Rev. Phys. Chem. 61, 171 (2010).
[105] R. Hanna and J. A. Doudna, Curr. Opin. Chem. Biol. 4, 166 (2000).
[106] S. a Woodson, Curr. Opin. Chem. Biol. 9, 104 (2005).
143
[107] D. Leipply and D. E. Draper, J. Am. Chem. Soc. 133, 13397 (2011).
[108] S. L. Heilman-Miller, J. Pan, D. Thirumalai, and S. a Woodson, J. Mol. Biol. 309, 57
(2001).
[109] E. Koculi, C. Hyeon, D. Thirumalai, and S. a Woodson, J. Am. Chem. Soc. 129, 2676
(2007).
[110] D. E. Draper, Biophys. J. 95, 5489 (2008).
[111] L. Guldbrand, L. G. Nilsson, and L. Nordenski ld, . Chem. Phys. 85, 6686 (1986).
[112] J. Ray and G. S. Manning, Langmuir 10, 2450 (1994).
[113] A. P. Lyubartsev and L. Nordenskio, 5647, 4335 (1997).
[114] N. Grønbech-Jensen, R. Mashl, R. Bruinsma, and W. Gelbart, Phys. Rev. Lett. 78, 2477
(1997).
[115] A. Diehl, H. Carmona, and Y. Levin, Phys. Rev. E 64, 011804 (2001).
[116] M. J. Stevens and K. Kremer, J. Chem. Phys. 103, 1669 (1995).
[117] J. C. Chu and C. H. Mak, J. Chem. Phys. 110, 2669 (1999).
[118] H. Schiessel and P. Pincus, Macromolecules 31, 7953 (1998).
[119] F. J. Solis and M. O. de la Cruz, J. Chem. Phys. 112, 2030 (2000).
[120] C.-L. Lee and M. Muthukumar, J. Chem. Phys. 130, 024904 (2009).
[121] S.-J. Chen, Annu. Rev. Biophys. 37, 197 (2008).
[122] A. Dryga and A. Warshel, Biophys. J. 102, 686a (2012).
[123] Z.-J. Tan and S.-J. Chen, J. Chem. Phys. 122, 44903 (2005).
[124] Z.-J. Tan and S.-J. Chen, Biophys. J. 91, 518 (2006).
[125] A. Peracchi, L. Beigelman, E. C. Scott, O. C. Uhlenbeck, and D. Herschlag, J. Biol.
Chem. 272, 26822 (1997).
[126] N.-K. Kim, A. Murali, and V. J. DeRose, J. Am. Chem. Soc. 127, 14134 (2005).
[127] M. Vogt, S. Lahiri, C. G. Hoogstraten, R. D. Britt, and V. J. DeRose, J. Am. Chem. Soc.
128, 16764 (2006).
144
[128] T.-S. Lee, G. M. Giambaşu, C. P. Sosa, M. Martick, . G. Scott, and D. M. York, J. Mol.
Biol. 388, 195 (2009).
[129] J. Cate, R. Hanna, and J. Doudna, Nat. Struct. Mol. Biol. 4, 553 (1997).
[130] S. K. Silverman and T. R. Cech, Biochemistry 38, 8691 (1999).
[131] S. K. Silverman and T. R. Cech, RNA 7, 161 (2001).
[132] K. Takamoto, R. Das, Q. He, S. Doniach, M. Brenowitz, D. Herschlag, and M. R. Chance,
J. Mol. Biol. 343, 1195 (2004).
[133] R. Das, K. J. Travers, Y. Bai, and D. Herschlag, J. Am. Chem. Soc. 127, 8272 (2005).
[134] M. Greenfeld, S. V Solomatin, and D. Herschlag, J. Biol. Chem. 286, 19872 (2011).
[135] B. Zoetekouw and R. van Roij, Phys. Rev. E 73, 021403 (2006).
[136] B. Zoetekouw and R. van Roij, Phys. Rev. Lett. 97, 258302 (2006).
[137] N. D. Mermin, Phys. Rev. 137, A1441 (1965).
[138] P. M. Chaikin and T. C. Lubensky, Principles of Condensed Matter Physics (Cambridge
University Press, Cambridge, U.K., 2000), pp. 204–208.
[139] K. Chin, K. A. Sharp, B. Honig, and A. M. Pyle, 6, 1055 (1999).
[140] N. A. Baker, D. Sept, S. Joseph, M. J. Holst, and J. A. McCammon, Proc. Natl. Acad. Sci.
U. S. A. 98, 10037 (2001).
[141] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem.
Phys. 21, 1087 (1953).
[142] C. H. Mak, W.-Y. Chung, and N. D. Markovskiy, J. Chem. Theory Comput. 7, 1198
(2011).
[143] M. Martick and W. G. Scott, Cell 126, 309 (2006).
[144] T. Ikeda, M. Boero, and K. Terakura, J. Chem. Phys. 127, 074503 (2007).
[145] C. Bennett, J. Comput. Phys. 268, 245 (1976).
[146] D. Grilley, A. M. Soto, and D. E. Draper, Proc. Natl. Acad. Sci. U. S. A. 103, 14003
(2006).
[147] V. K. Misra and D. E. Draper, J. Mol. Biol. 299, 813 (2000).
145
[148] R. Römer and R. Hach, Eur. J. Biochem. 55, 271 (1975).
[149] J. H. Cate, A. R. Gooding, E. Podell, K. Zhou, B. L. Golden, C. E. Kundrot, T. R. Cech,
and J. A. Doudna, Science 273, 1678 (1996).
[150] T. Uchida, Q. He, C. Y. Ralston, M. Brenowitz, and M. R. Chance, Biochemistry 41, 5799
(2002).
[151] M. Mandal and R. R. Breaker, Nat. Rev. Mol. Cell Biol. 5, 451 (2004).
[152] P. Sripakdeevong, K. Beauchamp, and R. Das, in RNA 3D Struct. Anal. Predict., edited by
N. Leontis and E. Westhof (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012), pp.
43–65.
[153] D. C. Lynch and P. R. Schimmel, Biochemistry 13, 1841 (1974).
[154] a Stein and D. M. Crothers, Biochemistry 15, 160 (1976).
[155] Z.-J. Tan and S.-J. Chen, Biophys. J. 103, 827 (2012).
[156] C. H. Mak and P. S. Henke, J. Chem. Theory Comput. 9, 621 (2013).
[157] J. I. A. Liu and D. M. J. Lilley, RNA 13, 200 (2007).
[158] J. E. Johnson, F. E. Reyes, J. T. Polaski, and R. T. Batey, Nature 492, 133 (2012).
[159] J. J. Trausch, P. Ceres, F. E. Reyes, and R. T. Batey, Structure 19, 1413 (2011).
[160] F. Guo, A. R. Gooding, and T. R. Cech, Mol. Cell 16, 351 (2004).
[161] A. Warshel and S. T. Russell, Q. Rev. Biophys. 17, 283 (1984).
[162] F. Barahona, J. Phys. A. Math. Gen. 15, 3241 (1982).
[163] H. Chen, S. P. Meisburger, S. A. Pabit, J. L. Sutton, W. W. Webb, and L. Pollack, Proc.
Natl. Acad. Sci. U. S. A. 109, 799 (2012).
[164] E. Dijkstra, Numer. Math. 1, 269 (1959).
[165] A. D. Garst, E. B. Porter, and R. T. Batey, J. Mol. Biol. 423, 17 (2012).
[166] A. Serganov, L. Huang, and D. Patel, Nature 455, 1263 (2008).
[167] M. Lindqvist, K. Sandström, V. Liepins, R. Strömberg, and A. Gräslund, RNA 7, 1115
(2001).
146
[168] K. Travers, N. Boyd, and D. Herschlag, RNA 13, 1205 (2007).
[169] T. Ooi, M. Oobatake, G. Némethy, and H. A. Scheraga, Proc. Natl. Acad. Sci. U. S. A. 84,
3086 (1987).
[170] T. M. Henkin, Genes Dev. 22, 3383 (2008).
[171] Y. S. Lee and A. Dutta, Annu. Rev. Pathol. 4, 199 (2009).
[172] R. M. Schiffelers, A. Ansari, J. Xu, Q. Zhou, Q. Tang, G. Storm, G. Molema, P. Y. Lu, P.
V Scaria, and M. C. Woodle, Nucleic Acids Res. 32, e149 (2004).
[173] J. L. Rinn, M. Kertesz, J. K. Wang, S. L. Squazzo, X. Xu, S. A. Brugmann, L. H.
Goodnough, J. A. Helms, P. J. Farnham, E. Segal, and H. Y. Chang, Cell 129, 1311
(2007).
[174] R. Das and D. Baker, Proc. Natl. Acad. Sci. U. S. A. 104, 14664 (2007).
[175] W. G. Noid, J. Chem. Phys. 139, 090901 (2013).
[176] M. G. Saunders and G. A. Voth, Annu. Rev. Biophys. 42, 73 (2013).
[177] Z. Xia, D. R. Bell, Y. Shi, and P. Ren, J. Phys. Chem. B 117, 3135 (2013).
[178] A.-M. Florescu and M. Joyeux, J. Chem. Phys. 135, 085105 (2011).
[179] E. J. Sambriski, D. C. Schwartz, and J. J. de Pablo, Biophys. J. 96, 1675 (2009).
[180] T. E. Ouldridge, A. A. Louis, and J. P. K. Doye, J. Chem. Phys. 134, 085101 (2011).
[181] Y. Liu, E. Haddadian, T. R. Sosnick, K. F. Freed, and H. Gong, Biophys. J. 105, 1248
(2013).
[182] Y.-Z. Shi, F.-H. Wang, Y.-Y. Wu, and Z.-J. Tan, J. Chem. Phys. 141, 105102 (2014).
[183] D. Leipply and D. Draper, Biochemistry 50, 2790 (2011).
[184] P. S. Henke and C. H. Mak, J. Chem. Phys. 141, 064116 (2014).
[185] F.-H. Wang, Y.-Y. Wu, and Y.-Y. W. Z.-J. Tan, Biopolymers 99, 370 (2013).
[186] J. A. Abels, F. Moreno-Herrero, T. van der Heijden, C. Dekker, and N. H. Dekker,
Biophys. J. 88, 2737 (2005).
[187] P. J. Hagerman, Annu. Rev. Biophys. Biomol. Struct. 26, 139 (1997).
147
[188] A. M. Mustoe, H. M. Al-Hashimi, and C. L. Brooks, J. Phys. Chem. B 118, 2615 (2014).
[189] Y. Pan, Nucleic Acids Res. 31, 7131 (2003).
[190] R. C. Armstrong, J. Chem. Phys. 60, 724 (1974).
[191] M. C. Linak, R. Tourdot, and K. D. Dorfman, J. Chem. Phys. 135, 205102 (2011).
[192] M. Bulacu, N. Goga, W. Zhao, G. Rossi, L. Monticelli, X. Periole, D. P. Tieleman, and S.
J. Marrink, J. Chem. Theory Comput. 9, 3282 (2013).
[193] D. R. Jacobson, D. B. McIntosh, and O. A. Saleh, Biophys. J. 105, 2569 (2013).
[194] C. V Bizarro, A. Alemany, and F. Ritort, Nucleic Acids Res. 40, 6922 (2012).
[195] J. B. Mills, E. Vacano, and P. J. Hagerman, J. Mol. Biol. 285, 245 (1999).
[196] S. B. Smith, Y. Cui, and C. Bustamante, Science 271, 795 (1996).
[197] C. Rivetti, C. Walker, and C. Bustamante, J. Mol. Biol. 280, 41 (1998).
[198] M. C. Murphy, I. Rasnik, W. Cheng, T. M. Lohman, and T. Ha, Biophys. J. 86, 2530
(2004).
[199] E. U. Esis, I. Achter, and G. I. Eixese, Biopolymers 10, 1625 (1971).
[200] B. Tinland, A. Pluen, J. Sturm, and G. Weill, Macromolecules 30, 5763 (1997).
[201] S. V Kuznetsov, Y. Shen, A. S. Benight, and A. Ansari, Biophys. J. 81, 2864 (2001).
[202] M. Ullner and C. E. Woodward, Macromolecules 35, 1437 (2002).
[203] K. Rechendorff, G. Witz, J. Adamcik, and G. Dietler, J. Chem. Phys. 131, 095103 (2009).
[204] F. Alarcón, G. Pérez-Hernández, E. Pérez, and A. Gama Goicochea, Eur. Biophys. J. 42,
661 (2013).
[205] M. Doi and M. Edward, The Theory of Polymer Dynamics (Oxford University Press,
Oxford, UK, 1986).
[206] J. E. Kohn, I. S. Millett, J. Jacob, B. Zagrovic, T. M. Dillon, N. Cingel, R. S. Dothager, S.
Seifert, P. Thiyagarajan, T. R. Sosnick, M. Z. Hasan, V. S. Pande, I. Ruczinski, S.
Doniach, and K. W. Plaxco, Proc. Natl. Acad. Sci. U. S. A. 101, 12491 (2004).
[207] Y. Bai, R. Das, I. S. Millett, D. Herschlag, and S. Doniach, Proc. Natl. Acad. Sci. U. S. A.
102, 1035 (2005).
148
[208] S. A. Pabit, X. Qiu, J. S. Lamb, L. Li, S. P. Meisburger, and L. Pollack, Nucleic Acids
Res. 37, 3887 (2009).
Abstract (if available)
Abstract
Ribonucleic Acid (RNA) is a ubiquitous, highly functional biomolecule found throughout life. As a polyelectrolyte, RNA is negatively charged and requires the presence of positive counterions to condense into the specific, folded structures that are necessary for the chemistry of life. RNA molecules are especially influenced by divalent metal ions such as Mg²⁺, the presence of which not only helps to neutralize the RNA, but creates an overall attractive free energy—an important effect not found under simple monovalent salt conditions. ❧ Methods of modeling ionic and polyionic systems date back to the early 20th century, when Peter Debye and Erich Hückel developed their purely analytical, Poisson-Boltzmann formalism of electrolytes that persists as the primary model used for most theoretical applications. However, the Debye-Hückel model breaks down when considering highly correlated polyions such as nucleic acids, as well as divalent and other high-valence ionic solutions in concentrations similar to those found in vivo. The theoretical limits of the Debye-Hückel model in tandem with the rise of simulation and numerical methods for studying biological systems has led to a need for alternative models of polyelectrolyte systems that will correctly account for complex ionic conditions while simultaneously allowing for calculations fast enough to be incorporated into simulation. ❧ In this thesis, we study the interaction between RNA and positive counterions, focusing specifically on divalent cations. We begin with a discussion the structure and function of RNA, and we cover new research showing that RNA has more functionality and biochemically relevant applications than ever previously thought. This is followed by a recap of prior theoretical studies and models of electrolyte and polyelectrolyte systems—both for RNAs specifically as well as general models. We then present our own model for RNA-ion interactions, which allows us to calculate the counterion-mediated free energy of RNA, but only through an NP-complete optimization problem. A heuristic approach to this NP-complete problem is developed, which is shown to work very well for all functional RNAs smaller than the ribosome, and is computationally fast and effective enough to be incorporated into full simulation. Monte Carlo simulations of RNA are run with our implicit divalent ion model incorporated, and we show that our model matches closely with experiment. Most biomolecular simulation is done via molecular dynamics methodology, and to account for this, we further develop our numerical approach so that ensemble forces may be calculated and incorporated into molecular dynamics simulations. Finally, we develop a full coarse-grained model of RNA that accurately handles the attractive electrostatic forces generated by the presence of divalent counterions, and use this model to simulate and study RNA 2-way junctions.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Investigating voltage activated potassium and proton channels
PDF
Capturing the complexity of ion channels: simulations of long-time dynamics, conformational changes and the effect of the membrane potential
PDF
Theoretical studies of lipid bilayer electroporation using molecular dynamics simulations
PDF
Molecular simulations of water and monovalent ion dynamics in the electroporation of phospholipid bilayers
PDF
The physics of membrane protein polyhedra
PDF
CI approach to spectra of quantum dots
PDF
Molecular dynamics and virial expansions for equilibrium thermodynamics in the solar interior
PDF
Deterministic evolutionary game theory models for the nonlinear dynamics and control of chemotherapeutic resistance
PDF
A diagrammatic analysis of the secondary structural ensemble of CNG trinucleotide repeat
PDF
Multiscale modeling of cilia mechanics and function
PDF
Theoretical modeling of nanoscale systems: applications and method development
PDF
Electronic correlation effects in multi-band systems
PDF
Out-of-equilibrium dynamics of inhomogeneous quantum systems
PDF
Shock-induced poration, cholesterol flip-flop and small interfering RNA transfection in a phospholipid membrane: multimillion atom, microsecond molecular dynamics simulations
PDF
Modeling graphene: magnetic, transport and optical properties
PDF
Data-driven approaches to studying protein-DNA interactions from a structural point of view
PDF
Simulating the helicase motor of SV40 large tumor antigen
PDF
Discovery of mature microRNA sequences within the protein- coding regions of global HIV-1 genomes: Predictions of novel mechanisms for viral infection and pathogenicity
PDF
Step‐wise pulling protocols for non-equilibrium dynamics
PDF
Bacterial nanowires of Shewanella oneidensis MR-1: electron transport mechanism, composition, and role of multiheme cytochromes
Asset Metadata
Creator
Henke, Paul S.
(author)
Core Title
The effects of divalent counterions on the formation and stabilization of RNA tertiary structure
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Physics
Publication Date
10/20/2015
Defense Date
10/05/2015
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
coarse graining,counterions,divalent ions,Mg²⁺,molecular dynamics,Monte Carlo,OAI-PMH Harvest,RNA,simulation
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Haas, Stephan (
committee chair
), Bickers, Nelson (
committee member
), Boedicker, James (
committee member
), Haselwandter, Christoph A. (
committee member
), Mak, Chi H. (
committee member
)
Creator Email
paulhenke@gmail.com,phenke@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-190494
Unique identifier
UC11277964
Identifier
etd-HenkePaulS-3979.pdf (filename),usctheses-c40-190494 (legacy record id)
Legacy Identifier
etd-HenkePaulS-3979-1.pdf
Dmrecord
190494
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Henke, Paul S.
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
coarse graining
counterions
divalent ions
Mg²⁺
molecular dynamics
RNA
simulation