Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Computer modeling and free energy calculatons of biological processes
(USC Thesis Other)
Computer modeling and free energy calculatons of biological processes
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
C O M PU TER MODELING AND FR E E ENERGY CALCULATIONS OF BIOLOGICAL PROCESSES by Frederick Siklun Lee A D issertation Presented to the FACULTY OF TH E GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In P artial Fulfillment of the Requirements for the Degree D O CTO R OF PHILOSOPHY (Chem istry-Chem ical Physics) May 1992 Copyright 1992 Frederick Siklun Lee UNIVERSITY OF SOUTHERN CALIFORNIA P h - P - C b e * l A T \ 9 7 |S & l ^ a under the direction of ...... D issertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillm ent of re quirem ents for the degree of D O C T O R OF PH ILO SO PH Y D ean o f Graduate Studies D a te ....... DISSERTATION COMMITTEE \x ) THE GRADUATE SCHOOL UNIVERSITY PARK LOS ANGELES, CALIFORNIA 90007 This dissertation, w ritten by Frederick Siklun Lee To La.ii Siu Yin Acknowledgements I would like to thank my m other for her love, patience and hope as well as my brothers Chris and Andy for their prayers and support. I would like to thank my professor Arieh Warshel for providing me the op portunities, guidance, training and understanding throughout the course of this program . I would also like to thank my friends Terry Catlin, Jung Hur, Mike Ito, Gia Kim, Michelle Kim, Sylvia Law, James Ng, James Priestley and Christine Russell for their kindness, help and concern over the years. Contents D e d ica tio n ii A ck n o w led g em en ts iii L ist o f T ables v i L ist o f F igu res v iii P refa ce x 1 M icroscop ic and sem im icroscop ic calcu lation s o f e le c tr o sta tic en erg ies in p ro tein s by th e P O L A R IS and E N Z Y M IX p rogram s 1 1.1 In tro d u c tio n ...................................................................................................... 1 1.2 Theoretical methods ..................................................................................... 3 1.2.1 The im plem entation of the simplified microscopic PDLD method in P O L A R IS ...................................................................... 3 1.2.2 The im plem entation of the surface constrained all-atom model and the FE P m ethod in E N Z Y M IX ............................ 17 1.2.3 Scaled microscopic models can extend the precision of elec trostatic calculations: the PD LD /S m ethod ......................... 30 1.2.4 Some macroscopic m o d e l s ............................................................ 36 1.3 Examining the performance of different m o d els.................................... 37 1.3.1 Solvation energies of small molecules in s o lu tio n ................... 38 1.3.2 Self-energies of single charges in solvated p r o t e i n s ......................42 1.3.3 Ion pairs in solvated proteins ...................................................... 49 1.3.3.1 Ion pairs in w a te r............................................................ 50 1.3.3.2 Charge-charge interactions between surface groups in proteins ........................................................ 51 1.3.3.3 Charge-charge interactions in protein interiors . . 52 1.3.3.4 Ionic strength m odulation of charge-charge inter actions in p r o te in s ........................................................ 5? iv 1.3.4 Electrostatic free energy in enzyme catalysis and ligand binding .............................................................................................. 59 1.3.4.1 Electrostatic energies in enzym atic reactions . . . 59 1.3.4.2 Calculations of relative and absolute binding en ergies 62 1.4 Concluding re m a rk s........................................................................................ 67 M icro sco p ic sim u la tio n s o f m acroscop ic d ielectric c o n sta n ts o f so lv a ted p ro tein s 75 2.1 In tro d u c tio n ...................................................................................................... 75 2.2 Theoretical approaches for microscopic calculations of e ................... 79 2.2.1 The Kirkwood-Frohlich e from microscopic sim ula tions: M ethod I ................................................................................ 79 2.2.2 Evaluating e using (E): M ethod II ............................................ 89 2.3 Exam ination of e in solvated protein.......................................................... 91 2.4 Concluding re m a rk s...........................................................................................100 A L ocal R ea ctio n F ield M eth o d for fast ev a lu a tio n o f lon g-ran ge e le c tr o sta tic in tera ctio n s in m olecu lar sim u la tio n s 111 3.1 In tro d u c tio n .........................................................................................................I l l 3.2 The Local Reaction Field m e th o d ............................................................... 112 3.3 Results and discu ssio n s....................................................................................121 3.4 Concluding re m a rk s...........................................................................................131 v List of Tables 1.1 Van der Waals param eters of the PDLD m odel..................................... 17 1.2 P artial atom ic charges for some common amino acids......................... 18 1.3 P artial atom ic charges for some protonated am ines...................................21 1.4 P artial atom ic charges for some other organic m olecules.........................22 1.5 Some common van der Waals param eters of the all-atom model. . 29 1.6 Solvation energies of several ions and molecules in solution ob tained by the PDLD and F E P m ethods................................................... 39 1.7 Electrostatic energies of different ionized groups in proteins eval uated by different macroscopic and microscopic m ethods.......................45 1.8 The F E P self-energies of Asp 3 and Glu 7 of B P T I evaluated in the presence of another ionized acid...............................................................48 1.9 T he effect of a distant chloride ion on the self-energy of a sodium ion in water obtained by the PDLD m ethod........................................... 51 1.10 PDLD calculations of the effect of removing surface charged groups in subtilisin on the p K a of His 64............................................................... 53 1.11 Energetics of ion pairs in proteins obtained by th e macroscopic and microscopic m ethods............................................................................... 55 1.12 T he effect of ionic strength on the energetics of an ion pair in w ater obtained by the PDLD m ethod....................................................... 58 1.13 The effect of ionic strength on the p K a shifts of His 64 of subtilisin upon m utating Asp 99 or Glu 156 to a serine estim ated by the PDLD m ethod.................................................... 58 1.14 Reaction free energies of enzym atic reactions obtained by the macroscopic and microscopic m ethods...................................................... 60 1.15 Absolute and relative binding free energies of phosphorylcholine analogs by McPC603 obtained by the macroscopic and m icro scopic m ethods.................................................................................................. 63 1.16 PD L D /S estim ates of the absolute and relative binding energies and the corresponding IC50’s for different ligands of McPC603. . . 65 2.1 The dielectric constants obtained for various w ater spheres w ith and w ithout the reaction field...................................................................... 86 vi 2.2 The dielectric constants evaluated at different sites of the trypsin system w ith and w ithout the solvent effect............................................. 95 2.3 The dielectric constant evaluated at the oxyanion hole of trypsin with inner perm anent dipoles neutralized...................................................107 3.1 A comparison of the potential energy and force obtained w ith the 2nd-order LRF m ethod and Coulomb’s law............................................... 122 3.2 A comparison of the potential energy and force obtained w ith the 3rd-order LRF m ethod and Coulomb’s law................................................123 3.3 The sensitivity of the long-range potential obtained w ith the LRF m ethod to the cutoff radius.............................................................................124 3.4 Com paring the results of the truncation m ethod, the LRF m ethod and the no-cutoff m ethod for the average free energies of charging a sodium ion in w ater........................................................................................126 3.5 LRF calculations of the free energies of charging the acidic residues of B P T I.................................................................................................................. 128 3.6 LRF calculations of the self-energies of the acidic residues of B P T I obtained from single prolonged trajectories............................................... 130 vii List of Figures 1.1 Illustrating the relationship among the microscopic, simplified m i croscopic and macroscopic models................... 6 1.2 An illustration of the four regions of the protein system in the PDLD m ethod................................................................................................... 9 1.3 An illustration of the five regions of the protein system in the surface constrained all-atom model which is used in the F E P cal culations 25 1.4 Illustrating the relationship between microscopic and macroscopic descriptions of the energetics of an ion pair............................................ 31 1.5 T he therm odynam ic cycle used in the PD LD /S m ethod for calcu lating the free energy of charges in a. protein.......................................... 34 1.6 A plot of the observed and calculated solvation energies given in Table 1.6.............................................................................................................. 41 1.7 A therm odjm am ic cycle used to estim ate the energetic of dissoci ation of an acidic group of a protein........................................................... 43 1.8 Illustrating the fact th at a simplified microscopic model such as the PDLD model provides a ruler th at allows one to explore the range between the microscopic FEP models and the macroscopic m odels.................................................................................................................. 69 2.1 An illustration of the four regions of the SCAAS water model. . . 83 2.2 Illustrating the effect of E R F on e for the 4 A water system ....... 87 2.3 A plot of e for different radii of region 1 in the 12 A w ater system . 88 2.4 An illustration of the five regions of the protein m odel.................... 92 2.5 Illustrating the effect of the solvent on e for a site centered at the oxyanion hole of trypsin................................................................................. 96 2.6 Illustrating the effect of the solvent on e for a site centered at His 57 of trypsin............................................................................................... 98 2.7 Illustrating the effect of the solvent on e for a site centered at Ser 54 of trypsin............................................................................................... 99 2.8 The autocorrelation functions Cm (I) °f the total dipole m om ent for different protein sites with solvent effect................................................101 viii 2.9 The autocorrelation functions Cjv#(fc) of the total dipole m om ent for the trypsin oxyanion hole with ancl w ithout solvent effect. . . . 103 3.1 A schem atic picture of a system of water molecules illustrating the basic idea behind the LRF m ethod....................................................... 115 3.2 A schem atic diagram of the positions of the charges of the Ah and j t h groups of the system relative to an expansion center..................... 116 ix Preface During the last decade, com puter sim ulation has slowly found its way to become an im portant technique in many areas of biological research. A lthough this field of research is still at its adolescence, there is no doubt th a t, in years to come, it will bring many advances to structure and function correlations of biological molecules. I am very fortunate and grateful to have been a p art of a research group which has pioneered microscopic com puter modeling of biological molecules. Coming from a molecular biology discipline, my work in Dr. W arshel’s group has m ainly been learning the techniques of the research. All th a t I can m anage in the last few years merely allows me to appreciate the potential and challenge of this new discipline. This dissertation, which is organized in three chapters, provides a description of the few problems th at I have been exposed to during the course of the training. T he first chapter of this dissertation gives a description of some of the new developm ents th at I have contributed to our microscopic models and sum m arized the applications of these programs in several crucial test cases. These test cases allow us to objectively gauge the perform ance of our program s and to illustrate the applications of these programs to im portant biological problems such as ion or ligand binding to proteins, ion selectivity and perm eation through biological channels, m echanisms of catalysis as well as effects of m utations on protein func tions. T he chapter also tries to clarify some of the common m isunderstanding of electrostatic effects in solution and in macromolecules. It identifies th e correct evaluation of the self-energies of charges in proteins as the key prerequisite to the correct evaluation of electrostatic effects in proteins. x T he second chapter addresses the issue of macroscopic dielectric constants of proteins, which has been a very controversial subject among researchers in the field of molecular simulation of macromolecules. In practice, the use of a dielectric constant in molecular simulations is unnecessary when detailed m icro scopic approaches such as ours are employed. However, from a theoretical point of view, it is useful to develop com putational m ethods th a t can estim ate this quantity in proteins. This chapter describes such an effort which was initiated by a graduate of our group, Gregory King, who developed the microscopic ap proaches described in the chapter. It was found from using these m ethods th a t the dielectric constant around the active site of trypsin can be as large as 10. These results, however, conflict with the traditional view th a t describes proteins as low dielectric media. T he last chapter describes a new numerical m ethod which effectively elim i nates the crucial problem associated w ith the use of cutoff radii in electrostatic calculations in molecular simulations. In practice, large sizes of m acrom olecular systems and lim ited availability of com puter tim e dictate the use of small cutoff radii for the nonbonded interactions in these simulations. This truncation ap proach implies the neglect of the long-range forces, which often leads to incorrect results. Although other methods th at treat the long-range problems have been reported, only this m ethod has been dem onstrated to be applicable to m olecular sim ulations of protein systems. In particular, this m ethod is able to reproduce the results as obtained by a procedure which uses no cutoff but at a speed th a t approaches those of the truncation procedures. This level of perform ance allows us to use our current com puter power to average the results of free energy calcu lations over different sets of initial conditions. This averaging procedure appears to provide results th a t better represent the ensemble averages of a system. Finalfy, I hope th at this dissertation is able to provide the reader a sense of the current art of the technology. Future advances in com puter hardw are, b etter potential energy functions for macromolecules and more detailed models X I will definitely bring progress to areas such as refinements of X -ray and NM R d ata on protein structures as well as predictions of the effects of m utations on m acrom olecular functions. Los Angeles. California Frederick Lee March, 1992 Chapter 1 Microscopic and semimicroscopic calculations of electrostatic energies in proteins by the POLARIS and ENZYMIX programs 1.1 Introduction E lectrostatic energies provide what is probably the most powerful correlation of structure and function in biological molecules [2-4, 6]. However, obtaining stable and reliable results from electrostatic calculations of proteins and other m acro molecules is far from being simple, due to the long-range nature of electrostatic effects and to the non-homogeneous nature of the m icroenvironm ents of biolog ical molecules [2-4]. Microscopic calculations of electrostatic energies encounter m ajor problems in term s of their accuracies since they reflect the com pensation of large energy contributions and frequently involve a rather small cutoff radius for dipolar interactions. Macroscopic calculations, on the other hand, involve the use of assumed dielectric constants th at cannot be deduced from macroscopic considerations w ithout direct experim ental inform ation about the energetics of the relevant groups in the given macromolecule [4]. Probably the best way to select an optim al approach for electrostatic calcula tions of macromolecules is to examine the ability of different alternative models to reproduce the observed energetics of many different test cases. In fact, m any 1 of the traps encountered by early macroscopic studies (see discussion in Ref. [7]) ' could have been avoided if these m ethods were used to calculate other properties i i in addition to charge-charge interactions between surface groups. 1 This work adopts the above philosophy and considers an extensive set of test 1 cases ranging from the solvation of ions in water, the self-energies of charges in ; proteins, th e energetics of ion pairs in proteins, and electrostatic contributions to catalysis and binding. The m ethods exam ined include an adiabatic charg ing F E P approach th a t is rigorous in principle (with infinite num ber of solvent molecules and infinite sim ulation tim e), the simplified microscopic m odel of the PDLD m ethod, and the recently introduced PD LD /S m ethod. In addition, we j exam ine the perform ance of some commonly used, yet inconsistent, macroscopic ■ models. The comparison of the different m ethods provides useful inform ation ; about the range of applicability and the com puter tim e associated w ith each of these approaches. Special attention is given to the perform ance of the PD L D /S m ethod. This m ethod has been introduced to avoid some of the difficulties associated w ith the precision of the microscopic models by exploiting the inherent precision of the macroscopic models. T hat is, the macroscopic models are more precise th an the microscopic models since the electrostatic potentials and fields are scaled by a dielectric constant which represents the assumed com pensation between th e different microscopic contributions. However, the macroscopic models are i | frequently less accurate than the microscopic models. Fortunately, it is possible i to use the PD L D /S m ethod and to scale the microscopic PDLD contributions l in a way th a t avoids the m ain traps th at eluded most early m acroscopic studies of proteins and to obtain an expression th at considers explicitly the protein perm anent dipoles but scale their effects and the relevant self-energies by an 1 assum ed dielectric constant, efn (see Refs. [4, 8] for the actual m eaning of this 1 “dielectric constant”). It is found th at the PD LD /S m ethod can provide useful i estim ates of electrostatic energies in macromolecules. However, a combined use 2 of the PDLD , PD L D /S and F E P m ethods is highly recom m ended since it can provide a clear indication of the error range associated with the given electrostatic energy. 1.2 Theoretical methods This section considers our microscopic approaches for calculations of electrostatic energies and their im plem entations in the corresponding sim ulation program s. T he problems and relative advantages associated with each approach will be considered in the results section. 1.2.1 T h e im p lem en ta tio n o f th e sim p lified m icro sco p ic P D L D m e th o d in P O L A R IS T he PDLD m odel emerged quite early [2-4, 9] as a practical way of evaluating electrostatic energies in proteins on a microscopic level. This was essential at th a t stage in order to avoid the traps and uncertainties associated w ith the use of macroscopic models. The PDLD m ethod approxim ates th e average energy of the protein and w ater system by the electrostatic energy evaluated at the average structure. The average protein structure can be taken from the X-ray structure (corresponding to a tim e-average in the crystalline state), which is sometimes rem inim ized for different charge configurations. It is also possible to average the calculations over different structures generated by m olecular dynam ics (MD) sim ulations (see below). The average solvent polarization is evaluated using a three-dim ensional grid model. T hat is, the water molecules are represented by point dipoles on a grid constructed around the protein (which are allowed to p en etrate cavities if such exist). The grid is then truncated to a sphere of a certain radius and each point dipole is polarized towards the local field due to 3 the protein atom s as well as other solvent dipoles except th e nearest neighbors. Thus the local field on the *th Langevin dipole can be expressed as: C = it + i t - i t , (1.1) where is the field due to the protein perm anent charges (which is evaluated only once since the protein structure is fixed), £L is the field due to all other Langevin dipoles and £c is the field due to the Langevin dipoles th at are the nearest neighbors to the tth dipole.1 The polarization of the ith Langevin dipole by its local field ^ of Eq. 1.1 is approxim ated by a Langevin-type function [3]: ( 1.2) where e,- is a unit vector in the direction of the local field C' is a param eter and f.i0 is taken as 1.8D. The equation for the effective Langevin dipoles, f i f , is solved iteratively (/a- 1+ 1 is determ ined by the field from the previous iteration). The relevant polarization law can be obtained by calibrating the pa- types so th at the solvation energies of various ionic species w ith different charges and radii are reproduced. This rather simple solvent model appears to give quite reliable solvation energies, reflecting the fact th at the physics of solvation ef- served solvation energies and radial distribution functions and not in the source of the solvent charges or nonbonded potential. The above statm ent m ight sound 1T h e field which was used in Ref. [9] was replaced in som e PD LD versions (e.g. Ref. [10]) by giving faster convergence and results th a t were very sim ilar to those obtained w ith T h is useful treatm en t, which was recently criticized [11], is a reasonable approxim ation for m odels th a t represent perm anent dipoles by their projections, as d em onstrated in Fig. 5 of Ref. [3]. T he key idea here is th a t the distribution function for the average polarization of the solvent follows som e given polarization law and th a t a m odel which reproduces this law also would reproduce the electrostatic interaction between the solute and the solvent. ram eter C and the solute-solvent van der W aals’ distances for different atom fects can be described by many alternative dipolar models [2-4]. T h at is, the key for obtaining reliable energetics lies in the consistent calibration using ob- 4 strange to some since it is frequently stated in the literature th a t the hydro gen bonding properties of different solvents are very special. However, although different solvents m ight have different structural properties, it is found experi m entally th a t different polar solvents (with and w ithout hydrogen bonds) give very sim ilar solvation energies. For instance, the free energy of solvation for a N a+ ion in water and in am m onia is -98 and -96 kcal m ol-1 , respectively. Ap parently, the physics of electrostatic interactions involves m any com pensating effects so th a t the overall solvation energy turns out to be sim ilar even though its individual components are quite different. This m ight be one reason why solvation effects can be reproduced even with rather simple continuum models using an adjustable cavity radius [12]. It m ight be useful to point here th a t some workers who found macroscopic models to be com pletely valid have felt th a t the PDLD model represents a m ajor approxim ation [13]. A pparently, it is hard to appreciate the difference between the approxim ations involved in the simplified microscopic models (such as the PDLD model) and those involved in macroscopic models. In fact, while the PDLD model approxim ates the solvent by dipoles and then approxim ate the average polarization of each dipole by a Langevin dipole, continuum models approxim ate a collection of dipoles by a vol um e elem ent w ith a uniform polarization (see Fig. 1.1), which is a m uch more serious approxim ation. It is also useful to comment th at the PDLD m odel does not represent a static polarization [13] but simply the tim e average polarization of the solvent. T he PDLD model also explicitly includes the effect of the protein electronic polarization by assigning induced dipole moments to all the protein atom s. These dipoles interact w ith the perm anent charge distribution of the system and the Langevin dipoles as well as w ith each other, and are therefore treated by an iterative procedure [9, 10] (although an effective non-iterative approach is also available [9, 10]). 5 Figure 1.1: Illustrating the relationship among the microscopic, simplified m icro scopic and macroscopic models. The microscopic model uses detailed all-atom representation and evaluates the average interaction between the solvent resid ual charges and the solute charges. Such calculations are very expensive. The simplified microscopic model replaces the tim e average dipole of each solvent m olecule by a point dipole, while the macroscopic model considers th e collection of solvent dipoles in a large volume element as a polarization vector. 6 microscopic - explicit water molecules (slow convergence and expensive calculations) simpified microscopic - the average polarization of each solvent molecule is replaced by a dipole * macroscopic (continuum) - polarized volumn elements replace collection of dipoles 7 The present version of th e PDLD model, which is im plem ented in the program POLARIS [14], divides the protein into three regions as depicted in Fig. 1.2. Region 1 contains the “solute” or the groups whose electrostatic energy is of interest. Region 2 contains the charged and electroneutral groups w ithin a radius R 2 from the center of region 1. The protein groups beyond R 2 are tru n cated from the calculations and the corresponding space (region 4a) is taken by the solvent (in th e representation of either region 3 or 4). This spherical truncation makes the electrostatic calculations more stable. The solvent region, referred to as region 3, is modeled by the above m entioned spherical grid of Langevin dipoles of a radius R3. The grid has two spacings: the inner region (up to R in from the center) has a spacing of 2 A whereas the rem aining volume has a spacing of 3 A. The dipole m om ent, fj,L of Eq. 1.2, at any grid point is scaled by the factor (A /3 )3 where A is the grid spacing of the given region. T he rational for a sm aller inner grid spacing for the dipoles surrounding region 1 is to increase the stabilities of the calculations by giving smaller energy differences for different grid configurations. Grid points which are found w ithin the the van der Waals radius of any protein or hapten atom are deleted. The solvent region beyond beyond R 3 (referred to as region 4) is represented by a continuum m odel w ith th e dielectric constant of water (e = ew). The electrostatic energy of the PDLD model is evaluated by considering the different contributions to the energy of the solute in the surrounding environ m ent (regions 2, 3 and 4). These contributions include the following: (i) The interaction between the charges in the solute region (x'egion 1), which is given by V Q Q = 332'£^i (13 ) i<j J o’ (thi'oughout this work, energies are expressed in kcal m ol-1 , charges in units of electron charge, and distances in A), (ii) The interaction betw een the charges of 8 Figure 1.2: An illustration of the four regions of the protein system in the PDLD m ethod. Region 1 contains the charged groups of interest. Region 2 contains the protein atom s found w ithin a radius R 2 from the center. Region 3 is the Langevin grid truncated to a sphere of a radius R3. Region 4a contains the rest of the protein atom s outside region 2. The electrostatic effects of regions 1, 2 and 3 are treated explicitly while those of region 4 (4a and 4b) are treated by a m acroscopic continuum form ulation. 9 (4b) (4a) ( 2 ) 10 region 1 and the residual charges of the protein (the charges of region 2), which is given by ■ (1-4) Here we denote the charges of region 1 and 2 by Q and q, respectively, (iii) The energy of the protein induced dipoles, which is given by [3] 339 ^ « = — 9-5) where is th e field on the zth protein atom from the protein perm anent charges in regions 1 and 2 and n f is the protein induced dipoles associated with the polarization of the electrons of the ith protein atom. This dipole is given by /*f = <*,•&, (1.6) where a; is the atom ic polarizability of the ?'th protein atom and is the local field on the zth atom due to £°, the other protein induced dipoles and option ally the Langevin dipoles. In this model, the solute atom s in region 1 are not polarizable (i.e., their crs are set to zeros). The factor 1/2 in Eq. 1.5 reflects the energy invested in forming the induced dipole [3]. The field is evaluated with a screening function, /^, (th at prevents overpolarization at very close distances) and is given by C = V (i.7 ) 3 ? O fd(rij) = 1 - e x p ( - r j / r ^ ) , where rj, is taken as 3 A. (iv) The solvation energy of the solute and the protein by the surrounding water regions (regions 3 and 4) is given by [3] A G qw — AG lgvn + AGbulk (1-8) AC _ 332 r £±( ~ 'rlgvn — 0 V'k Sk _ 166Q l „ 1 , 166/i^ (2ew — 2) H ,„ G < 1) ' 11 where A Gigvn the energy of Langevin dipoles (in region 3) and A Gbuik is the electrostatic energy of our finite system in its infinite surrounding which is rep resented as a continuum (region 4). fx^ is evaluated by Eq. 1.2 and £/ is the field on the kth Langevin dipole from the protein perm anent charge distribution. T he factor 1/2 reflects th e energy associated w ith the polarization of the solvent dipoles [3]. Qi and ^ are the monopole and dipole m om ents of the solute at the designated center of the system. is the dielectric constant of w ater, which is taken as 80 throughout this work. One of the m ost tim e-consuming steps in the PDLD calculation is the evalua tion of the field £L of Eq. 1.1 at the site of each Langevin dipole. Such calculations require one to consider the contributions to from Langevin dipoles th a t are quite far from the fth dipole. These calculations can be accelerated by using a sm all cutoff radius and truncating the long-range interaction of the Langevin dipoles. This approach, however, leads to disappointing results. T he long-range problem can be partially overcome by the Local Reaction Field (LR F) m ethod [15]. The LRF m ethod calculates explicitly the short-range electrostatic interac tions at the site of each group by a standard truncation procedure (considering all the contributions from within the given cutoff radius), while approxim ating the long-range contributions by an expansion th at is updated in a m uch slower rate than the tim e steps used for the given calculation. In the case of the PDLD m odel where each group is taken as a Langevin dipole, we rew rite Eq. 1.1 as € c « ? + ( « + < l ) - e . a -# ) where £s is the short-range field due to the Langevin dipoles w ithin the cutoff radius and is the long-range field due to the rest of the Langevin dipoles of the system so th at £s T — £L. W hile £s is evaluated at every iteration, is u pdated only once in every 10 iterations (generally 25 iterations are needed for convergence). This rather simple modification not only gives th e corresponding 12 results obtained w ithout any cutoff, but also increases significantly the perfor m ance of the POLARIS program for large systems w ith m any (> 1000) solvent molecules. Earl}? studies [3, 9, 10] considered the corresponding value at the average con figuration (taken usually from the X-ray structure). However, the recent increase in com puter power allows one to consider an actual average over configurations generated by MD simulations at low tem perature [16]. This provides a m ore rig orous expression for the free energy, A G eiec, associated w ith changing the charge configuration of the solute from one charge state (state A) to another charge state (state B), taking into account consistently the reorganization of the system during the change in its charge state (see Ref. [16]). W ith the current averaging over the configurations of state B, we can express A Geiec as A G elec = AG (Q a , I\4 ~* Qb , r f i ) ( 1 - 1 0 ) = ( AVqq + A V q ^ + A V q q - + AG qw)b T- (AAGhyd)t) + A - t - A G 7 S = {^ILc)b + A + A G is , where Q and r designate, respectively, the charge distribution and the coordi nates of the indicated state. ( )a designates an average of th e indicated fxmction over th e equilibrium distribution of the a th state. Early studies [3, 9, 10] con sidered the corresponding value at the average configuration (taken usually from the X-ray structure). However, the recent increase in com puter power allows one to consider an actual average over configurations generated by MD sim ulations at low tem perature [16]. The term A A Ghyd °f Eq. 1.10 is the change in the hydrophobic free energy AGhyd due to the charging process, which can be estim ated by the recently developed approach [17] th at involves a crucial m odification of our early approach 13 [18]. The modified treatm ent estim ates the effect of the field from th e solute on the hydrophobic energy. The field corrected hydrophobic energy is given by where i runs over the sites of the Langevin dipoles on the surface (s) of the solute. F is a function of the electric field, on the ith langevin dipole at the solvation surface. F is 1 when the field is small (< £phobic) and 0 when the field is large (> £phiiic)- A B , C, ffihuic and phobic are scaling constants obtained from studies of the free energies of transfer of different molecules from hydrocarbons to w ater [17]. This approach was found to give excellent results in recent studies of solvation entropies of polar and nonpolar molecules [17]. The term A of Eq. 1.10 represents the “reorganization energy” associated w ith the change of the protein structure between and r e [19], which can be estim ated by the approach of [16]. This approach approxim ates A by tures, i\4 and r b , th at correspond to the configurations obtained w ith the un charged and the charged ligand, respectively. It is im portant to realize th at the reorganization energy reflects the steric effect of the protein. A pparently, one can consider the steric forces as the restoring forces for the electrostatic interac tions in systems th a t follow the linear response approxim ation (see A ppendix 5 of [3] for a related discussion). T he last term A GIS of Eq. 1.10 represents the effect of ionic strength asso ciated w ith the change of the solute charges from Qa to Qb - In general, the F ( 0 = ‘ ( 1.12) where (U^Jc)a and (U^/Jc)b are evaluated with the two m inim ized protein struc 14 effect of ionic strength on rate constants and binding is very small in proteins (i.e., < 1 kcal m ol-1 as compared to overall electrostatic effects of m ore th an 10 kcal m ol-1 ). Furtherm ore, since the effect of the ionic strength involves ions in w ater it can be estim ated quite reliably using macroscopic models includ ing discretized continuum treatm ents of the Poisson-Boltzm ann equation [6] and Debye-Huckel type treatm ents [20]. Thus we take a two-step strategy in our calculations. T he first step involves the regular microscopic PDLD calculation of the electrostatic energy of the solute in the absence of any ionic atm osphere. T he second step involves a macroscopic estim ate of the electrostatic interaction between the solute and protein groups in regions 1 and 2 and the given ionic atm osphere. T he best way to justify this approach is to consider the interaction between an ionizable group of the protein and an ion in water. This interaction can be evaluated by first calculating the self-energy of the given group in w ater at infinite separation from the given ion and then calculating the change in en ergy (A A G W , + A V q q ) upon bringing the ion from infinity to its given location. However, the energy associated w ith the second step can also be estim ated on a macroscopic level using Coulomb’s law and a large dielectric constant. T he same considerations can be applied to many (rather than just one) ions in water. O ur specific macroscopic treatm ent for the above second step is a variant of the grid approach introduced by Pack and coworkers [21]. This treatm en t represents th e ionic atm osphere (within a specific radius from the center of our system ) by a grid of residual charges. The grid spacing is taken here as 3 A and th e net charge density within each cubic elem ent centered at the eth grid point is determ ined by considering the corresponding B oltzm ann distribution and w riting Pt = p t ~ P7 e x p ( - ^ j ) exp(/?$8 ) °l£ * e x p (-/? $ * ) E * e x p ( ^ * ) J ’ j 15 whei'e p f and p~ are, respectively, the charge densities of the cations and anions in th e ith elem ent, d is l/k ^ T , N0 is the num ber of cations or anions per volume elem ent (for systems w ith a net charge of zero, and N~ are equal), and $ > t is the potential at the *th grid point due to the charges of the system and the electrolytes, which is given in our macroscopic model by tl = £ jL + £jL. (1.14) j e* r v k& € Here q j is the residual charge of the jth protein atom , qG is the point charge at th e kth grid point (representing the excess net charge of the k th volume elem ent). Since the effect of the water on the electrostatic interactions between the system and the ions themselves is accounted for macroscopically, we use t = cw (although e = 40 can also be used for comparison). The residual charge at the kill grid point is now given by qk = Tkpk , (1.15) where r/,; is the volume of the kth elem ent. The final set of the grid charges is obtained by solving Eqs. 1.13 and 1.14 iteratively. W ith these qGs, we can now evaluate the effect of the ionic strength, A6r;s, by P„G nGnG A G is = 332 V ) + 332 V , ( 1.16) i,j (■w l'jk where q f is the residual charge of the ith protein atom and qG is th e point charge at the corresponding grid point. The van der Waals param eters of the PDLD model, which are being used in the present version of the POLARIS program , are given in Table 1.1. These param eters were obtained by the same calibration procedure introduced in the early PDLD studies (e.g., [18]). The charges used in the PDLD calculations are the same as those used in the all-atom calculations and are given in Tables 1.2, 1.3 and 1.4. It is im portant to note th at the philosophy of calibrating van der Waals radii using observed solvation energies of representative molecules is now common 16 Atom < 7 Atom a Atom a Atom a H 1.50 0 2.00 N 2.40 C{sp3) 2.40 C(sP2) 2.00 P 3.20 S 3.10 Na 2.10 Ca 2.01 Mg 1.50 A1 0.90 As 1.58 B (sp) 1.35 B(sp 2 ) 1.23 F 2.00 Cl 2.50 Br 2.50 I 3.00 Ga 1.10 Ge 1.60 Zn 1.42 Fe 1.10 In 1.81 Ru 1.67 Sb 1.76 Se 1.60 Si 2.80 Sn 1.80 Tc 2.00 Te 3.10 Ti 1.42 Table 1.1: Van der Waals param eters of the PDLD model. T he van der Waals energy (in kcal mol-1 ) for the PDLD model is given by Vv r jw = (e2c .7)^ [(- - p )1 2 r -- 12— 2 ( g‘L^-)6 r -6], where i and j denote the atom s in the pairwise interaction and rq is the distance between the zth and j t h atom s. tt is taken as 3 kcal m ol-1 for all atom types, a (in A) corresponds to the cutoff radius for th e distance between the given atom type and the closest Langevin dipole, while sp, sp2 and sp3 represent the types of orbital hybridization for the given atom type. to m any F E P approaches. In fact, the m ost reliable treatm ents are based on this approach rather than on ab initio determ ination of the relevant param eters. It is crucial, of course, to keep the van der Waals param eters unchanged when applied to different ligand or protein molecules. 1 .2 .2 T h e im p lem en ta tio n o f th e su rface co n str a in e d a ll a to m m o d el and th e F E P m e th o d in E N Z Y M IX A lthough the simplified solvent representation of the PDLD m odel seems to give reasonable results [4], it is im portant to examine the perform ance of all-atom solvent models. Such treatm ents, which are in principle m ore rigorous, involve m ajor convergence problems and require the inclusion of a very large num ber of solvent molecules. One way of obtaining reasonable results w ith a lim ited num ber of solvent molecules is to use surface constrained approaches. Such approaches 17 Table 1.2: P artial atom ic charges for some common am ino acids. The charges are given in unit of electron charge, (a) P artial atom ic charges of the rest of all the hydrogen atom s are 0.097. 18 Main chain and Gly N H Ca H a C 0 -0.400 0.400 -0.097 0.097 0.550 -0.550 Side chains of polar amino acids Cp Hp Cy Osx 0&2 Asp -0.194 0.097 0.360 -0.680 0.680 c a Ha Cp Hp Oy Hy Ser~ -0.050 0.050 -0.100 0.050 -1.000 0.000 Cp Hp Cy N Sl H Sl C M H s2 H tl N ,2 His+ -0.194 0.097 0.160 -0.160 0.190 0.540 0.070 0.100 0.070 0.030 H a Cfi Hp Oy Hy Ser -0.050 0.050 -0.100 0.050 -0.450 0.450 c P Hp Cy N Sl H Sl H s2 C« H tl His -0.194 0.097 0.035 0.013 0.187 0.058 0.070 0.110 0.077 -0.550 Cd Hp Cy Osx Os2 Asp -0.194 0.097 0.450 -0.450 0.000 Cp Hp Cy Hy Cs H s N t H t c ( N v II r, Arg -0.194 0.097 -0.194 0.097 -0.194 0.097 -0.253 0.253 0.000 -0.834 0.417 Cp Hp Cy Osx Ns 7 h 5 Asn -0.194 0.097 0.500 -0.500 -0.834 0.417 Cp Hp Cy Hy c 6 H s H t N C H< Lys -0.194 0.097 -0.194 0.097 -0.194 0.097 -0.194 0.097 -1.080 0.360 Cp Hp Cy Hy 0 * a £ 2 H t Gin -0.194 0.097 -0.194 0.097 0.500 -0.500 -0.834 0.417 Ca H a Cp Hp C~f i H n n r 7 2 Hy2 Thr -0.050 0.050 0.050 0.050 -0.600 0.450 -0.241 0.097 Cp Hp Cy Hy Cs 0«i 0 t2 h — 1 CO Glu -0.194 0.097 -0.194 0.097 0.400 -0.400 0.000 Table 1.2 (continued). Side chains of nonpolar amino acids Ala A -0.291 A 0.097 Val Cp -0.097 Hp 0.097 A -0.291 At 0.097 A -0.291 As 0.097 Leu Cp -0.194 Hp 0.097 A -0.097 H7 0.097 Gi -0.291 A 0.097 A -0.291 A 0.097 lie A -0.097 Hp 0.097 A -0.194 A 0.097 A -0.291 0.097 A -0.291 A 0.097 Pro Cp -0.194 Hp 0.097 A -0.194 H7 0.097 Cs 0.194 A 0.097 P heW Cp -0.194 A 0 .0 0 0 A -0.097 A -0.097 A -0.097 A -0.097 A -0.097 T yrW Cp -0.194 c7 0 .0 0 0 A -0.097 A -0.097 A 0.150 0„ -0.600 A 0.450 A -0.097 A -0.097 T rpW Cp -0.194 c7 0 .0 0 0 A -.0.097 A -0.450 At 0.045 A 0 .0 0 0 A -0.097 A -0.097 A -0.097 A -0.097 A 0 .0 0 0 Cys A -0.050 Ha 0.050 Cp 0.050 Hp 0.050 S 7 -0.600 H 1 0.450 Met Cp -0.194 Hp 0.097 Gy 0.056 H 1 0.097 -0.500 G -0.041 A 0.097 to o Amine N H ca R a Cp Hp NH 4 c h 3n h £ C2H5N H J (CH3)3NH+ -0.080 0.000 -0.046 0.294 0.270 0.269 0.272 0.220 0.022 0.080 -0.009 0.057 0.062 0.057 -0.100 0.042 Table 1.3: P artial atom ic charges for some protonated amines. T he table gives the atom ic charges for the protonated amines exam ined in this work. T he charge is given in unit of electron charge. (e.g., [3, 18, 22, 23]) use polarization constraints th at force the finite solvent sys tem to have the polarization expected from the corresponding infinite system. The present version of this approach which is im plem ented in the program EN- ZYMIX [14] focuses on obtaining a reliable treatm ent of long-range forces. This is accom plished by dividing the protein into 5 regions as depicted in Fig. 1.3. Region 1 contains the groups whose electrostatic energy is of interest. Region 2 contains the unconstrained protein electroneutral and charged groups selected w ithin a radius R 2 from the designated center. Region 3a contains the uncon strained w ater molecules generated w ithin a radius R$ and region 3b contains the surface w ater molecules (within R 3) which are subjected to a set of angular and radial constraints th a t are designed to keep the average polarization and density of region 3a as close as possible to those expected from the corresponding infi nite system [23]. Regions 2 and 3 are surrounded by region 4a, th a t includes the protein groups between R 2 and i?4a, and by a spherical grid of Langevin dipoles (region 4b) of a radius R^- The protein groups in region 4a are constrained to the corresponding X-ray positions by quadratic constraints. Region 4 (4a and 4b) is then surrounded by a bulk region, region 5, whose electrostatic effects are sim ulated by considering it as a continuum w ith the dielectric constant of bulk w ater. W hile the electrostatic effect of the protein p art of region 5 (region 5a) is represented by a dielectric continuum , the steric and bonding effects of this region are treated explicitly (this helps to m aintain the protein in its correct 21 Table 1.4: P artial atom ic charges for some other organic molecules. Atom ic charges for the other organic molecules exam ined in this work. T he charges are given in unit of electron charge. 22 c h 4 c -0.160 H 0.040 C5H1 2 0 0 Ha 0.037 Cp -0.070 Hp 0.038 C1 -0.070 Ha 0.038 C6 -0.070 Hs 0.038 c £ -0.120 Ht 0.037 c H c 6h 6 -0.050 0.050 H 0 Ca Ha CH30H 0.450 -0.450 -0.114 0.038 H 0 Ca Ha Cp Hp CH3CH2OH 0.450 -0.450 -0.066 0.033 -0.114 0.038 H 0 Ca Cp Hp C7 c £ H < c 6h 5o h 0.450 -0.450 0.030 -0.064 0.050 -0.057 0.053 -0.048 0.053 N ca Ha Cp Hp C7 Ha c 5h 5n -0.511 0.160 0.061 -0.044 0.054 -0.006 0.053 H N Ca Ha Cp Hp C7 Ha C5H5NH+ 0.210 -0.120 0.240 0.060 0.010 0.060 0.110 0.060 H N r Cp Hp c7 Ha C£ Ht c 6h 5n h 2 0.417 -0.834 0.000 -0.050 0.050 -0.050 0.050 -0.050 0.050 folded structure). The introduction of region 4 and 5 is a new feature of our all-atom model. T he inclusion of the Langevin grid allows us to generate system of explicit w ater dipoles far beyond R 3 and to obtain in a relatively inexpensive way m ore realistic long-range solvent effect than those obtained by a sim ple con tinuum treatm en t (see Eq. 1.8). The spherical truncation of th e protein allows us to obtain the continuum contribution in a stable way and to avoid artifacts associated w ith incorrect treatm ents of the long-range effects. It should be noted th a t both the energy and the forces from the Langevin grid are included in the sim ulation (in treating the btdk contribution we only consider the energy and neglect the corresponding forces). T he incorporation of th e Langevin grid in the all-atom model of the F E P approach is sim ilar to th a t of the regular PDLD treatm ent. T hat is, a cubic rigid grid w ith a spacing of 4 A, is built around regions 1, 2, 3 and 4a of the protein system (the use of a 4 A grid spacing and the corresponding use of a larger dipoles give results sim ilar to those obtained w ith a 3 A grid spacing but with fewer dipoles and less com puter tim e). T he grid is then truncated to a sphere of a radius R.ib from th e center. Grid points th at are closer than 3 A to any atoms in those regions are deleted. The local field on the ith the Langevin dipole is expressed in a sim ilar way to th at used in Eq. 1.1, i.e., £, = ( « + £ ) + € f , (i-i7) where is the contribution from the perm anent charges in regions 1, 2 and 3 (which include both protein and water atom s), £' is the contribution from the protein atom s in region 4a (which is evaluated only once in a sim ulation since these atom s are effectively fixed by constraints) and is th e contribution from all other Langevin dipoles (i.e., nearest neighbors are also included). B oth and £' are scaled by the same function D (rq) of Eq. 1.7. For the contribution from protein groups, rij of Eq. 1.7 is taken as the distance betw een the j t h protein atom and the ith Langevin dipole, while for the contribution from w ater 24 Figure 1.3: An illustration of the five regions of the protein system in the surface constrained all-atom model which is used in the F E P calculations. Region 1 con tains the charged groups of interest. Region 2 contains the unconstrained protein groups w ithin a radius R 2 from the center. Region 3a contains th e unconstrained w ater molecules and region 3b contains the surface w ater molecules (which are subjected to polarization constraints). Region 4a contains the harm onically con strained protein atom s between the radii R 2 and i?4a, and region 4b contains th e grid of langevin dipoles th a t complete regions 1, 2, 3 and 4a to a sphere of a radius R ^ . Region 5 (which includes region 5a of the protein) is m odeled by considering the entire region as a dielectric continuum . 25 molecules, it is taken as the distance between the j t h w ater oxygen and th e zth Langevin dipole. T he param eter iy is also taken as 3 A. The average polarization of the ith Langevin dipole, /x f, by its local field is given by Ecp 1.2. The electrostatic interaction between the Langevin dipoles and the charges in regions 1, 2 and 3 is evaluated by A G = - 3 3 2 X > f « . (1.18) i T he van der Waals interaction between the Langevin dipoles and the relevant atom s is determ ined by a simple repulsive potential th a t varies as R~12. Note th at the factor 1/2 of Eq. 1.8 is not included in Eq. 1.18. The reason for this is associated w ith th e fact th at here we include the Langevin contribution as an effective potential in the FE P calculations and th e penalty for polarizing the dipoles is evaluated explicitly in such calculations. The force on the ith protein atom , fi, from the j the Langevin dipole is evaluated by m inus the gradient of the corresponding potential energy, i.e., f. = -f, = - 3 3 2 M j £(V, • £ ,) , (1.19) where fj is the corresponding force on the j th Langevin dipole (which is not needed since the grid is rigid), /xj' is the j t h Langevin dipole from the last iteration, is the field on the j t h Langevin dipole from the ith protein atom , and V , is the gradient operator w ith respect to the coordinates of the ith atom . Since the Langevin dipoles of our model are usually quite far away from the region of interest (region 1), their collective long-range effect can be considered as a slowly varying function. Thus, we adopt some of the LRF features m entioned in the previous section to accelerate the evaluation of these interactions. T h at is, the field of Ecp 1.17 and the fiekl-gradient, which are needed, respectively, for the evaluation of the energy (by Eq. 1.18) and forces (by Eq. 1.19), are updated only once in every 10 tim e steps (i.e., 20 fs). In other words, the energy and forces from the Langevin dipoles are kept constant during this tim e period. However, th e field £L of Eq. 1.17 is updated at every step and is used to u pdate fiL (by 27 Eq. 1.2), thus allowing for self-consistency in the evaluation of th e Langevin clipoles. This rath er simple modification is found to substantially increase the speed of the calculations but w ithout affecting the accuracy and precision of the results (see below). In addition to the forces associated with the effect of the Langevin dipoles on regions 1, 2 and 3, the regular forces in these regions are considered as well. However, the forces due to the induced dipoles of the protein atom s and th e w ater molecules are not considered explicitly and their effects are approxim ated by atten u atin g the coulombic. forces w ith a factor of 0.5 (see Fig. 13 in [3]). Similar results are obtained by th e non-iterative induced dipole treatm en t [9, 22] which is also integrated conveniently in ENZYMIX. Alt the nonbonded interactions in regions 1, 2 and 3 are evaluated w ithout any cutoff (though th e m ost current version of ENZYM IX has been augm ented by the LRF m ethod [15] which uses a cutoff radius and estim ates the missing long-range contributions to th e energies and forces). T he charges and nonbonded param eters of the m odel were calibrated by requiring them to reproduce observed solvation free energies. Some common van der Waals param eters of the all-atom model are given in Table 1.5. The atom ic charges of the all-atom model for some common am ino acids and some other organic molecules are given in Tables 1.2, 1.3 and 1.4. W ith a. reasonable all-atom treatm ent, we can tu rn to evaluate the free en ergies of our system. A particular powerful strategy is provided by the F E P approach. In this approach, one determ ines the free energy difference between two states, A and B, by using an effective potential th at drives th e system from one state to another in small steps, so th at the free energy change in each step can be evaluated by a therm odynam ic perturbation approach [24, 25]. Studies of charge transfer reactions and other solvation processes in solutions and proteins [22, 26-30] have suggested th a t this can be done effectively by a potential of the form: £m = (1 — &m)£,A + @m£B j (1.20) 28 Atom A B H 22.0 1.0 H(in H20 ) 0.4 0.6 0 793.3 25.0 0 (in H20 ) 861.4 26.0 o - 2066.0 42.0 N 1425.0 42.0 C 1956.0 32.0 P 2454.0 47.0 s 1831.0 40.5 F - 800.0 2.5 Cl" 7000.0 2.5 B r- 8000.0 2.5 I- 16000.0 2.5 Li+ 17.3 1.2 Na+ 150.0 5.0 K + 730.0 14.0 R b+ 1260.0 20.0 Cs+ 2250.0 30.0 Ca2+ 345.0 15.0 Table 1.5: Some common van der Waals param eters of th e all-atom model. T he van der Waals energy (in kcal m ol-1 ) for the all-atom m odel is given by Vvdw — A; Ayr-- 1 2 — BiBjr~j6, where i and j denote the atom s in the pairwise interaction and is the distance between the ith and j t h atom s. A t and B t are given in units of A6 kcal m ol-1 and A3 kcal m ol-1 , respectively. where ea and eg are the potential surfaces of state A and state B, respectively. 9m is a m apping param eter th at is used to change em from &a to eg- For small steps in 9, the free energy change associated with changing 9m to 9mi can be treated as a perturbation on-the ensemble belonging to 0m, i.e., SG(9m - 0m.) = —R T ln{exp{ — (em» - em) /R T } )m . (1.21) T he to tal free energy difference between the states A and B is given by the sum of the contributions from each 6 step: A G (90 - On) = E ^ M • (1-22) 771 = 0 1 .2.3 S ca led m icro sco p ic m o d els can e x te n d th e p rec isio n o f e le c tr o sta tic ca lcu la tio n s: th e P D L D /S m e th o d T he m ain problem w ith the above microscopic approaches is related to the fact th a t such models involve the sum of large contributions (e.g., Vq^ and V q q ) which even with a small relative error lead to a significant absolute error. On th e other hand, the macroscopic picture is associated w ith m uch sm aller energy contributions since the relevant interactions are divided by a large dielectric constant. This difference between the microscopic and macroscopic approaches is illustrated in Fig. 1.4 for an ion pair in water. As seen from the figure, the m icroscopic approach gives small energy changes for moving the charges from infinity to R through an almost perfect com pensation of very large num bers (Vqq and A G soi). In the macroscopic picture, the above com pensation is basically assumed by using a large dielectric constant. In other words, the m acroscopic approach integrates the weak effective force (which is the field scaled by the dielectric constant) for moving the charges in solution, while the microscopic m odel evaluates directly the energy of moving the charges to their given site from vacuum. The problem with the macroscopic approach appears, however, 30 Figure 1.4: Illustrating the relationship between microscopic and m acroscopic descriptions of the energetics of an ion pair. T he figure gives the energy of an ion pair as a function of the interatom ic distance, evaluated m acroscopically by Vqq/ c-\- AGsoi(oo) (—I —) and the corresponding microscopic energy evaluated by Vqq + A Gsoi (-A -). The plot dem onstrates the rem arkable com pensation of the changes in the charge-charge interaction Vqq (-□ -) by changes in th e solvation energy A Gsoi ( ~ 0 - )- A G so/(oo) denotes the sum of the solvation energies of the ions at infinite separation. 31 Z£ Free energy (kcal/mol) to o o I — 1 o o o o o to oo o I o 3 c o o D > o 8 - > □ C Q »3 C/> when one deals w ith moving charges between different phases (e.g., from w ater to a protein). Now in addition to intergrating the forces, one m ust also take into account the change in self-energy th at accompanies th e m ovem ent of the charges from one phase to another. U nfortunately, this aspect is needed for the accuracy rath er than the precision of the evaluation of the energies of charges in proteins. Considering the higher precision of the macroscopic approaches, one m ay look for related ways of scaling large energy contributions of the microscopic models, obtaining sim ilar precision and yet retaining the clear energy-based concepts of these models. Such a “scaled microscopic” model, which is referred to here as the P D L D /S model, has been introduced [4, 31] and applied recently [32] and its exam ination is one of the prim ary objectives of the present work. Probably the sim plest way to understand the derivation of the P D L D /S scaling procedure is to follow the therm odynam ic cycle depicted in Fig. 1.5. T he first step of the cycle starts w ith the relevant charged groups of the protein Q s im m ersed in w ater at infinite separation from each other and the protein w ith the sites of th e Q s occupied by neutral groups of the same size. The electrostatic energy of the system at this state is given by AG„ = £ A + A G L,,„(Q = 0) . ( 1.2. 3) i where AG*0; w is th e PDLD solvation free energy of the zth charged group in w ater and A G ^ (Q =0) is the PDLD solvation energy of the protein w ith all the Q s in their uncharged forms (a more accurate treatm ent should involve em pty cavities instead of the uncharged groups [32]). The second step involves changing th e dielectric constant of the entire w ater region from ew to a dielectric constant, whose m eaning will be discussed below. The free energy change associated w ith this step can be approxim ated by A G '„ t ~ - [ £ AG"„,_„ + A G l, J Q = 0 )](4 - - 4 ) . (1.24) i in This expression is based on the fact th a t the change in the m acroscopic solvation energy upon moving a molecule from ew to t°in is scaled by 1 /efn as long as 33 (a AG © O protein AG | I A s t) (d) l © 0 AG c Figure 1.5: T he therm odynam ic cycle used in the PD L D /S m ethod for calculat ing th e free energy of charges in a protein. See text for details. 34 ew > > e?n- (the monopole contribution is scaled by (l/c°n — l/e w)/(l — l/e w) which can be approxim ated by l/e fn for ew > > efn and a sim ilar approxim ation is applied to the dipole term ). Of course, the macroscopic energy depends on an unknow n effective cavity radius but this problem is avoided in our treatm en t since we already know the microscopic A Gsoi and only retain th e macroscopic scaling (A Gsoi can be used to obtained the best effective cavity radius). In th e next step, we move the Qs from infinity to their actual protein sites. T he free energy change of this step is given by A Gb-+C = (Vqq + Vq^)— , (1.25) where V q q is the vacuum interaction between the charges and V q m is the vacuum interaction between the charges and the protein polar groups. Here we exploit th e fact th a t th e energy of bringing charges from infinity to a given distance at a uniform m edium is given by scaling the vacuum interaction energy by the dielectric constant of th a t medium. The last step th a t com pletes our cycle is th e change of the dielectric constant of the m edium surrounding the protein from efn back to ew. The free energy change associated w ith this stage can be approxim ated by A G U j ^ A G ^ ,J .Q = « . ) ( T - h ■ ( 1'26) ein Now the total free energy of bringing the charged groups from w ater to their protein sites is given by A G = AG'a_ r f = A Ga-> b + A Gb-*c + A Gc^d = [ ( £ J + A G « j G - G + (V q , + I W - V (1-27) i ^in ^in where A Gqw = A Gp solw(Q = Qa) - A Gv solw(Q = 0) is th e difference in th e solvation energy of the protein w ith and w ithout the charged groups. All the term s in Eq. 1.27 are obtained by the standard PDLD treatm en t except th a t 35 they are now scaled by a factor of the order of 1 /efn (here we use th e fact th a t l/e w ~ 0). This scaling gives smaller energy contribution and increase the precision of the model. At this point, one m ight wonder what value should be used for efn. In ad dressing this issue [8], it is crucial to rem em ber th a t efn is not the actual protein dielectric constant tin since we take into account explicitly th e protein perm anent dipoles in the Vq^ term . In fact, one should realize the frequently overlooked fact th a t the dielectric constant simply represents all the contribution which are not considered explicitly. Thus for exam ple, if we were to consider explicitly all the polar contributions including the protein perm anent and induced dipoles and th eir relaxations upon charge rearrangem ents (as is done in the PD LD tre a t m ent), we should have used efn = 1. Here we consider implicitly the protein induced dipoles, some bound w ater molecules th at are not included explicitly and the reorganization of the protein perm anent dipoles. T he relevant efn can be now estim ated from microscopic sim ulation [8], but here we will sim ply exam ine the result obtained for efn between 2 to 6. 1 .2 .4 S o m e m a cro sco p ic m o d els A lthough this paper focuses on microscopic and semimicroscopic approaches, it is useful th a t we also consider here some macroscopic models. This is done in order to clarify and illustrate some of th e problems associated w ith th e m acroscopic concepts. T he first of this models involves the commonly used assum ption th at electrostatic energies in proteins can be assessed using e = 2, or in our notation, = = (1.28) 2 ^ rij 2 v ' i < A / This treatm en t (referred to here as the “e = 2” m odel), which m ight seems quite l'easonable, is equivalent to the im plicit assum ption th at both the protein and th e surrounding solvent can be considered as a nonpolar m edium . T he problem s associated w ith this assum ption will be illustrated in our com parative studies. 36 A nother popular macroscopic model involves the use of a distance-dependent dielectric function e(R) — R , or in our notation, A G = 332 Y ] QiQi . (1.29) i<3 7 O T he problem s associated with this type of models (referred to here as the “e = J?” m odels) were considered in Ref. [3] and they will also be illustrated in our com parative studies. Finally, we will exam ine the perform ance of the Tanford-Kirkwood m odel [33]. This m odel considers the protein as a non-polar m edium surrounded by a high dielectric solvent. The problems associated w ith this and related models have been discussed before [3, 7]. However, in order to clarify these problem s, it is useful to im prove the model so th at it correctly accounts for the shape of the protein (in this way one can separate the problems associated w ith the assum ption of a non-polar protein from those associated w ith th e shape of the protein). T he im proved model, which is referred to here as the D iscretized Tanford-Kirkwood (D TK ) model, considers the protein as a continuum w ith a dielectric of e = 2, while describing the solvent as a discretized continuum w ith a dielectric of e = ew. The corresponding calculation can done by using a finite elem ent m ethod, b u t it can also be done by using the P D L D /S approach, om iting th e AVqm term of Eq. 1.27 while setting e“ n = 2 and l/e w ~ 0. This gives AG ~ i[(X; -A G U J + AGQ w + V qq\ . (1.30) i 1.3 Examining the performance of different models To assess the reliability of a given model for calculations of electrostatic energies in m acrom olecules, it is essential to use this model in system atic studies of dif ferent test cases. Such a validation study is in m any respects m ore relevant than the form al justification of the different models. For exam ple, a form ally rigorous 37 all-atom m odel can give entirely unreliable results when practical considerations dictate the use of a small cutoff radius. In this section, we exam ine th e various approaches of th e m ethod section by calculating the electrostatic free energies in different classes of problems, ranging from the simple cases of sm all molecules in w ater to m uch m ore challenging cases such as ion pairs in protein interiors. 1.3.1 S o lv a tio n en erg ies o f sm a ll m o le c u le s in so lu tio n Before exam ining the perform ance of different models for calculations of elec tro static energies in proteins, it is useful to exam ine such models by calculating the solvation energies of molecules and m olecular ions in aqueous solutions. The results of such a test case are summ erized in Table 1.6 and depicted in Fig. 1.6, which includes m onoatom ic ions of different sizes and charges, m olecular ions and polar molecules. As seen from the table, both the PDLD and F E P m ethods can give quite reliable results. A pparently, the reason for this success is th a t any m odel of polar solvents which properly treat the interplay between dipole-dipole and dipole-charge in teractions can reproduce the m ain physics of charges in a uniform homoge neous polar m edium . However, as dem onstrated in different microscopic studies [4, 18, 34], w hat is essential is the calibration of the van der W aals radii of different atom s to reproduce observed solvation energies. The same is tru e for m acroscopic approaches th a t give reasonable results w ith calibrated atom ic cavities [12]. On the other hand, much more rigorous all-atom models can give unreliable results w ith uncalibrated van der Waals param eters, even when such param eters are de duced from quantum mechanical calculations. However, it should be pointed out th a t th e param eters calibrated in solution studies should be left unchanged in studies of charges in proteins. 38 Table 1.6: Solvation energies of several ions and molecules in solution obtained by th e PDLD and F E P m ethods. Calculated solvation energies (in kcal m ol-1 ) of several ions and molecules in solution. Some of the calculations were used to calibrate the PDLD and F E P param eters by adjusting the van der Waals param eters of both m ethods. The table emphasizes the PDLD results and only some F E P results are given. The solvation energies for the am ino acids were obtained w ith th e charges of the m ain chain atom s set to zeros and the corresponding observed values are taken relative to the solvation energy of glycine, (a) T he observed values listed here are average values of the experim ental free energies from the sources indicated. Many of the experm ental results involved an error range of ± 5 kcal m ol-1 , which is in p art due to the different possible in terp reta tions and uncertainties associated with some parts of the therm odynam ic cycles used for obtaining the experim ental values. 39 System PDLD FE P O b s » Ref. Na+ -100 -101 -100 [52, 53] C a2+ -389 -377 -381 [52] F " -108 -109 -106 [52-55] c i - -74 -76 -77 [52-55] B r“ -74 -72 -73 [52-55] I- -60 -64 -63 [52, 53] NH+ -79 - -78 [56, 57] N (CH 3)H+ -77 - -74 [48, 57-59] N (C 2Hs)H+ -71 - -66 [57] N (CH 3)3H+ -62 - -61 [58-60] Asp- -83 -77 -80 [3, 61] Ser- -85 -90 -90 [48] His+ -71 -64 -65 [3] Ser -9 -8 -7 [62] His -12 -10 -12 [62] Asp -10 -9 -9 [62] Arg -10 -14 -13 [62] Asn -14 -9 -12 [62] Lys -5 -5 -6 [62] c h 4 2 - 2 [56, 62] c 5h 1 2 1 - 2 [56] c 6h 6 -1 - -1 [56] c h 3o h -7 - -5 [60, 61] c h 3c h 2o h -5 - -5 [61] c 6h 5o h -8 - -7 [61] c 5h 5n -4 - -5 [60] C5H5NH+ -56 - -54 [57] c 6h 5n h 2 -7 - -5 [63] 40 MOLARIS vs. Experiments 20 W ) 0 cu s -20 £ O - 4 0 m — 60 "d < n £ - 8 0 cu § -1 0 0 -120 -120 -100 -8 0 -6 0 -4 0 -2 0 0 20 Calculated solvation energy Figure 1.6: A plot of the observed and calculated solvation energies given in T a ble 1.6. T he calculations are perform ed w ith the M OLARIS sim ulation package [14] which includes the POLARIS PDLD calculations (□ ) and the ENZYM IX F E P calculations (+ ). A straight line corresponds to a perfect fit. This plot serves as a validation of the accuracy of our two microscopic models. 41 1.3.2 S elf-e n e rg ies o f sin g le ch arges in so lv a te d p r o te in s T he study of solvation energies of small molecules in solution is not th e m ajor challenge of this work particularly since the van der W aals param eters or the atom ic Born radii are adjustable param eters. A much m ore serious challenge, which has been repeatedly addressed in our previous works [2-4], is th e evaluation of the self-energies of single charges in protein interiors. A clear concept of the self-energies of single charges in proteins is crucial for understanding the energetics of m any biological processes including redox problem s, ion binding and ion perm eation in m em brane channels [4]. In this section, we exam ine th e ability of different models to reproduce the self-energies of single charged groups in several representative system s, which include the stringent test of the evaluation of the intrinsic p K as of acidic groups in proteins, the self-energy of the hem e in cytochrom e c, the self-energy of calcium ions in a calcium binding protein and the barrier for the penetration of a sodium ion through the gram icidin A channel. A lthough the self-energies of charges in proteins m ight appear quite abstract, they can be obtained uniquely from standard therm odynam ic or kinetic prop erties (e.g., p K as> redox potentials, conductance of ions, etc). This point can be illustrated, for exam ple, by considering the pK as of ionizable groups in pro teins. In this case, one uses a therm odynam ic cycle of th e type illustrated in Fig. 1.7 (see Ref. [35] for w hat is probably the earliest use of therm odynam ic cycles in microscopic studies of proteins). This cycle and related considerations give [35, 36] pK = V K + ^ ^ { A A G U A H A-) - A A G 'U A H A')} , (1.31) w here the superscripts p and iv denote the corresponding quantities are evaluated in protein and in water, respectively. A A Gsoi is the difference in solvation energy betw een th e ionized (A- ) and unionized (AH) states of the acid. Eq. 1.31 tells us th a t th e difference between the pK as of an ionizable group in w ater and protein is m ainly determ ined by the relative solvation energy (or self-energy) of th e charged 42 water AH AG + H i ° ^ 3 (aq) AG. -AH - A AG + H3°(aq) Figure 1.7: A therm odynam ic cycle used to estim ate the energetic of dissociation of an acidic group of a protein. The AG'; is given by A G i = AG™^*P(AH), A G - 2 — 2.3RT(pI<™ — pH) and A G 3 = AG'l ^ p(A~). The free energy of ionizing the group is given by A G = A G i + A G 2 + AG3. 43 ! form of this group, A Gsoi(A ), in the given media. Thus, one m ay approxim ate : j Eq. 1.31 by | , A p K ° ~ P ~ 2 j R T A A G l 7 r ( A ~) ' (1'32) However, reliable studies should also consider the A A G™ffp(AH ) term . T he j i same type of considerations are applied to redox problem s [16, 37], ion channels j [38, 39] and ion binding by proteins [40]. The m ain problem is to obtain reliable \ results for the absolute solvation energies of the charged groups in the com plicated protein m icroenvironm ents. T he perform ance of the three m acroscopic models l as well as our three microscopic models in these crucial tests is sum m erized in : Table 1.7. i T he bovine pancreatic trypsin inhibitor (B PT I) test case is indeed m uch i j harder th an w hat may be deduced from the fact th a t pK as of surface groups are I ! usually quite sim ilar to each other [41, 42]. T he apparently sim ilar p K as involve 1 J com pensation of larger energy contributions of different n atu re and it is difficult ; to obtain such a com pensation by microscopic approaches. In fact, the F E P m odel starts to give reasonable results only when large cutoff radii (> 18 A) ; are used. These calculations are very expensive and the precision of the results ' are often quite poor. Fortunately, the recently introduced LR F m ethod [15] ; substantially increases the speed of the all-atom F E P calculations, while still i ; I : giving th e results obtained w ithout any cutoff. The high perform ance of the I i LR F m ethod allows us to use our current com puter power to average th e results i j of F E P calculations over different sets of initial conditions. The results obtained J by using this averaging procedure appear to be m ore accurate and precise th an ' those obtained from a prolonged calculation started from a single set of initial I conditions [15]. T he self-energies of the four acidic groups of B P T I evaluated by th e LR F all-atom model are given shown in Table 3.5 of C hapter 3. These results are by far our best microscopic results for the B P T I test case (see Table 1.7 for a com parison). 44 Table 1.7: Electrostatic energies of different ionized groups in proteins evalu ated by different macroscopic and microscopic m ethods. E lectrostatic energies of different ionized groups in proteins evaluated by different m acroscopic and m icroscopic m ethods. The electrostatic energies (in kcal m ol-1 ) for the D TK , P D L D /S , PDLD and F E P m ethods are given relative to the corresponding sol vation energies of the given group in water. T he energies from th e e — 2 and e = R m ethods represent the interaction of the given group w ith other ionized groups of th e protein. Note th at for the e = 2 model, the reference state (or th e solvent around the protein) is a nonpolar continuum w ith e = 2. 45 System model Macroscopic Semimicroscopic Microscopic Obs. Ref. e = 2 e ~ R DTK PDLD/S PDLD FEP C = 2 e? = 4 Sn e? = 6 Bpti(Asp3) - - 6 -0.5 -0.2 -0.1 -3 -1 (-0.9) -1.3 [41, 42] Bpti(Glu7) - - 15 3.5 1.7 0.8 1 2 (-0.2) -0.6 [41, 42] Cytc(Heme) - - 13 12 6 4 10 6 7 [37] Gramicidin A(Na+1) - - 25 5 2 1 2 6 5 [38, 39] Calbindin(Ca+2) -232 -175 -10 0.7 -2.5 -3.6 -6 - -5 [40] 0 5 However, an even greater challenge for the F E P m ethod is to evaluate the self-energies of these acidic groups in the presence of another ionized acidic group. Here, we take this challenge by calculating the self-energy of Asp 3 of B P T I, w ith and w ithout Glu 7 ionized, as well as th at of Glu 7, w ith and w ithout Asp 3 ionized. Recent attem p ts indicated th a t even larger cutoff radii (> 21 A) are needed for stable results. These calculations would be quite expensive when the solvent effect w ithin these cutoff radii is represented exclusively by explicit w ater molecules. A m ore affordable alternative is to use Langevin dipoles to replace the w ater molecules at th e peripheral of the system , while still using th e LRF m ethod. The results such calculations are given in Table 1.8, which show th a t the charge-charge interaction due to the presence of a charged group is com pensated by other energy contributions to give self-energies sim ilar to those w ithout the charged group. The cytochrom e test case is perhaps the least challenging. Here th e oxidized hem e can be viewed as a charged sphere with a large radius. T he “solvation” of such delocalized charges involves smaller forces and smaller energy contributions th an those of sm aller ionized groups and therefore involves sm aller errors when evaluated by microscopic models [37], T he self-energy of an ion at a site in a transm enrbrane channel reflects the activation barrier for ion perm eation through th at site. Thus a correct evalua tion of the self-energy of an ion in a channel is a prerequisite for obtaining the com plete free energy profile of ion perm eation. Here we take the gram icidin A channel as a test case and calculate the self-energy of a sodium ion at th e cen ter of this channel. The result is indeed quite good considering th e error range (20-30 kcal m ol-1 ) of other related studies (see Ref. [38] for discussion), which appears to be due to the neglect of some key electrostatic contributions [38]. T he study of Ca2+ binding by calbindin is quite challenging. Here we find th a t only approaches th a t involve rem inim ization of the apo-protein give reasonable results. Rem inim izations of the protein structures allow for a correct evaluation 47 C ontribution Asp 3 Asp 3 (Glu 7 - ) Glu 7 Glu 7 (Asp 3 - ) &G qq - 40.0 - 32.7 -6.4 -10.1 -9.8 -13-4 -50.8 -62.8 -47.9 -52.9 -2.3 -3.4 -1.5 -4.9 a r AAL-T Igvn -6.6 -15.9 -3.7 -10.4 ^ &bulk -6.8 -20.1 -6.7 -20.7 A G L 72.9T3.2 72.3T1.5 69.6T1.2 69.6T1.0 Table 1.8: T he F E P self-energies of Asp 3 and Glu 7 of B P T I evaluated in the presence of another ionized acid. Average electrostatic energies (in kcal m ol-1 ) of charging the Asp 3, with and w ithout Glu 7 ionized, and Glu 7, w ith and w ithout Asp 3 ionized, in B PT I. The radii i?2, i?3, R^a and R < tb of th e all-atom m odel are taken as 15, 15, 18 and 24 A, respectively, for the protein calculations. U nder these conditions, there are 311 w ater molecules and 629 Langevin dipoles in all the protein cases (since the designated center of each case is th e sam e). All the reported quantities are obtained by averaging the results of 4 independent 22 ps M D /F E P trajectories generated from different initial configurations of th e system s. The table gives the different contributions to the to tal calculated free energies for the protein cases. A G q q is the contribution from charge-charge interaction. A Vq and A Vq^ are the contributions from the interactions betw een th e carboxy group of the relevant acidic residue w ith the protein and w ater atom s, respectively. AVga is the contribution from th e interaction betw een the protein induced dipoles w ith the perm anent charges. A Gigvn is the contribution from th e Langevin dipoles. AGbuik is the contribution from the bulk. 48 of the protein dipolar reorganization energy (the A of Eq. 1.10), which is a significant contribution to the electrostatic energy of m ultivalent ions (charging of m onovalent ion often involves less protein dipolar reorganization) and m ust be included in th e calculations. The results from the P D L D /S m ethod im plies th a t a larger value of efn (efn = 8) is needed to l'eprocluce the observed result in this case, as com pared to other test cases (see other Tables for com parisons). W ith the exception of the calbindin case, the D TK m odel gives very unrea sonable results for single charges in proteins. Furtherm ore, th e e = 2 and e = R models overestim ate in a drastic way the self-energy of calcium ions in calbindin. 1.3.3 Ion p airs in so lv a ted p ro tein s Ion pairs play an im portant role in many biological processes including enzym e catalysis, subunit interactions and photosynthesis. In fact, ion-pair interactions are the m ost commonly used exam ple of electrostatic interaction in proteins. Thus it is im portant to explore the abilities of different electrostatic models to reproduce the energetics of ion pairs. However, in doing so, it is im portant to understand the fundam ental difference between the use of the m acroscopic coulom b-type expression: A G = 3 3 2 = YQo W (1.33) rij6eff eeff and th e corresponding microscopic expression [3]: A G = 3 3 2 ^ + A G so;( r p ) - ^ A G ^ ( o o ) Vij i = V 'M(i?)+AG„,(«)-EAG‘ .»i(«>)- (1-34) i According to Ecp 1.33, changing the interatom ic distance of a pair of sodium chlo ride ions from rq to r 2 give a free energy change A A G = (332/ee/ / ) ( l / r 2 — 1 / n ) - For large values of ee/ / (e.g., ee/ / = e^), A A G ~ 0. However, to obtain the same result microscopically, using Eq. 1.34, is much harder. In th a t case, V qq(R ) 49 and AG>0/(R) m ust com pensate each other and this com pensation involves very detailed balance of two large quantities. A lthough the m acroscopic ee/ / provides a powerful description of the com pensation between Vqq and A Gsoi in hom oge neous polar solvents, it cannot be deduced from macroscopic considerations. In fact, in cases of ion pairs in proteins, one finds th at ee/ / is di'astically different from the small values frequently assigned to protein interiors (see discussion in Refs. [4, 8]). Thus it is im portant to calculate A G via Eq. 1.34 using m icroscopic or semimicroscopic approaches. Here we exam ine such approaches by considering several test cases w ith increasing level of complexity. 1 .3 .3 .1 Ion pairs in w ater Before considering ion pairs in proteins, it is instructive to consider charge-charge interactions in w ater. In doing so, it is im portant to realize th a t such interactions can be described quite accurately by using teff between 40 to 80 in Eq. 1.33 [3], which reflects an alm ost com plete com pensation between V q q and A G soi upon changing the interatom ic distance from infinity to th e given r,-j (see Fig. 1.4). However, obtaining such a com pensation from a microscopic m odel is a significant challenge even for a simple ion pair in water. Here we consider such a test case by using th e PDLD model to evaluate the energetics of N a+ C l- in w ater. In this study, which is sum m erized in Table 1.9, we exam ine the interaction of th e ions, 12 A ap art, by evaluating the energy of charging the sodium w ith and w ithout a C l- ion. The calculations w ith the standard PDLD m odel give a reasonable com pensation (A A G — 0.9 for A V q q ~ — 27) which corresponds to an ee/ / of 31, while th e actual interaction energy should correspond to an ee/ / of ~ 80 at such a large distance. Although this result is quite satisfactory, we could obtain an even b ette r result by a variant of our model th a t include in ^ of Eq. 1.8 (which usually corresponds to the dipole of regions 1 and 2 of Fig. 1.2) the to tal dipole m om ent of the Langevin dipoles in region 3. T he result of this im proved m odel, 50 System PDLD A G qq A G lgvn ~ o e> < AGbuik A G total A A G eeff Na+ 0.0 -94.2 -7.6 - -101.8 - - Na+ Cl" -27.7 -82.6 7.6 - -102.7 -0.9 31 Na+ 0.0 -94.2 - -7.7 -101.9 - - Na+ C l- -27.7 -82.6 - 7.8 -102.5 -0.6 46 Table 1.9: T he effect of a distant chloride ion on the self-energy of a sodium ion in w ater obtained by the PDLD m ethod. The table presents PDLD calculations of the effect of a distant chloride ion (12 A away) on th e self-energy (in kcal m o P 1) of a sodium ion in water. The A G qq, AG igvn, AG^uik> and AGbuik are> respectively, the conti'ibutions from charge-charge interaction, the Langevin dipoles, the bulk energy while neglecting the contribution of the Langevin dipoles to the /i& of Ecp 8, and the bulk. The table dem onstrates the large com pensation of different energy contributions in microscopic approaches, which can lead to large relative errors. To gauge the success of our microscopic Langevin dipole m odel, the effective dielectric constant is estim ated using eeff — — 332/(R • A A G ), where R is the distance between the ion pair (see text). T he results show th a t our simplified solvent model can reproduce a large dielectric effect. sum m erized in th e lower part of Table 1.9, corresponds to an ee/ / of 46, which is quite im pressive for a microscopic model. 1 .3 .3 .2 C h arge-ch arge in tera ctio n s b etw een su rface grou p s in p ro tein s O ur second test case involves charge-charge interaction between surface groups in proteins. T he strength of such interactions can be deduced uniquely by genetic engineering experim ents and a good exam ple is the effect of surface groups on the pKa of His 64 of subtilisin [43]. Here we consider the individual effect of two such surface groups, Asp 99 and Glu 156, which are over 10 A away from His 64. This test case has been the subject of several recent macroscopic studies [44, 45], which reproduced th e observed trend w ith a good precision. However, such a precision is expected from macroscopic models as long as they deal w ith charge-charge interaction between surface groups (rather than self-energies), since such models 51 sim ply scale the electrostatic interactions by a large dielectric constant. In fact, the effective dielectric constant, ee/ / , for an ion pair on a planar surface w ith e — em on one side and e = ew on the other side is simply given by (e,n + ew) / 2, which gives an ee/ / > 40 (such a. surface is a good approxim ation for the case when th e protein radius is larger than the distance betw een th e ions). Here again it is m uch harder to reproduce the experim ental observed sm all interaction by microscopic models th an by macroscopic models. Our PD LD results for the interaction between His 64 and Asp 99 and Glu 156 are sum m erized in Table 1.10. T he calculations are done by evaluating th e self-energy of His 64 in th e native protein and in the D99S and E156S m utants. For the case of the native protein, the unm inim ized X-ra,y structure is used for the calculations, while for the cases of the m utants, the corresponding structures are obtained by a lim ited local relaxations of the native structure. In all three cases, th e residue Asp 32 which is a crucial p art of the catalytic triad is also ionized. R 2 and R3 (see Fig. 1.2) are, respectively, 18 and 22 A. As seen from the table, we obtain quite encouraging results, dem onstrating th at the PDLD model is capable of reproducing large cej j as a result of a microscopic com pensation between large energy contributions. It is im portant to m ention th at smaller R 2 or R3 gives incorrect results (results not shown). 1 .3 .3 .3 C h arge-ch arge in ter a ctio n s in p ro tein in terio rs A fter considering two cases which involve small and rath er trivial effective in teractions, it is im portant to consider ion pairs in protein interiors th a t m ay involve m uch stronger interactions [3, 4]. Here we consider th e test cases of the lie 16-Asp 194 ion pair in trypsin [3, 45, 46] and the His 159-Cys 25 ion pair in papain [3, 47]. The results of this study are sum m erized in Table 1.11. As seen from th e table, we obtain from the DTK m odel unstable ion pairs, since these ions would rath er be in water than be in the incorrectly assum ed nonpo lar protein sites (see discussion in Ref. [7]). Or in other words, th e oppositely 52 Table 1.10: PDLD calculations of the effect of removing surface charged groups in subtilisin on th e p K a of His 64. The PDLD free energies (in kcal m ol-1 ) of charg ing (protonating) His 64 in the native subtilisin and two of its m utants, D99S and E156S. T he A G q A Gind, AG igvn, and A G^uik are, respectively, the contri butions from charge-dipole interaction, induced dipole interaction, the Langevin dipoles, and the bulk. T he p K a shifts of His 64 due to either m utation are ob tained using an expression sim ilar to Eq. 32, i.e., A p K a = — A A (? jv_*m /1-38 at 300 K. T he observed p K a shifts correspond to an ionic strength of 0.005M. T he calculated results reproduce the observed weak interactions between His 64 and either surface charged group. 53 System PDLD Obs. Ref. AG q^ A G q a AG[gV n AGkulk AGt0 tal A A G m-*m A pKa A pKa wild-type -74.6 -10.0 -8.4 35.0 -58.0 - - - D99S -50.6 -14.2 -13.9 21.1 -57.7 0.3 -0.2 -0.4 [43] E156S -47.3 -16.3 -14.9 21.0 -57.5 0.5 -0.3 -0.3 [43] O l Table 1.11: Energetics of ion pairs in proteins obtained by the m acroscopic and microscopic m ethods. The free energies ion pairs in protein interiors are evaluated using the three macroscopic approaches, the P D L D /S m ethod, the PDLD m ethod and th e F E P m ethod. The D TK, PD LD , P D L D /S and F E P energies are given relative to the corresponding energies in w ater. T he e = 2 and e = R results are given relative to the corresponding energies at infinite separation (see also Table 1.4). The results dem onstrate th a t ion pairs can be m ore stable in proteins than in water. Note th at the m acroscopic m odel give entirely incorrect results. 55 System model Macroscopic Semimicroscopic Microscopic Obs. Ref. e - 2 e = R DTK PDLD/S PDLD FEP C = 2 e" = 4 = 6 m w Trypsin(Ilel6,Aspl94) -45 -25 30 2 l 0 -3 -4 -2.8 |45, 46] Papain(Hisl59,Cys25) -40 -19 21 -6 -4 -3 -5 -5 -2.5 [47] Or 0 5 charged ions repel rather th an attra ct each other. On the other hand, th e “e = 2” and “e — R" models overestim ate drastically the stabilization of these ion pairs. These type of problem s do not appear in th e PDLD, P D L D /S and F E P m odels, which describe correctly the energetics of ion pairs. 1 .3 .3 .4 Io n ic str e n g th m o d u la tio n o f ch arge-ch arge in te r a c tio n s in p ro tein s Before leaving this section, it is instructive to exam ine the effect of ionic strength on charge-charge interactions in proteins. This effect has a ttra c te d m ajor a tte n tion for m any years, probably reflecting the fact th a t until the em ergence of genetic engineering it was quite difficult to determ ine experimentally the m ag nitude of other electrostatic factors. Apparently, as was pointed out repeatedly [3, 4], the effect of the ionic strength is much sm aller th an the electrostatic ef fect of the protein polarity. Nevertheless, it is clearly im portant to be able to evaluate the change in charge-charge interaction due to the surrounding ionic atm osphere. Such calculations can be perform ed by discretized continuum ap proaches (see Ref. [6]). Here, however, we use the approach described in the m ethod section. As a first, step in the exam ination of our approach for calculating ionic stren g th effects, we evaluate the energetics of an ion pair in w ater at different ionic strength. T he results of these calculations are sum m erized in Table 1.12. As seen from the table, our simple treatm ent is able to reproduce the trend expected from the ionic environm ent. T hat is, the interaction betw een th e ion- pair decreases as the ionic strength increases and th a t the sam e concentration of electrolytes has a lesser screening effect on the ion-pair interaction at a greater interatom ic distance. N ext we consider the effect of the ionic strength on th e Ap K a of His 64 of subtilisin upon the m utation of either Asp 99 or Glu 156 to a serine residue [43]. T he results are given in Table 1.13, which reproduce the observed effect 57 D istance Ionic strength A A & X P(0.01 M) A A G ^ ( 0 .1 M) A A G ^ ( 0 . 5 M) 3.0 A 0.1 0.5 1.2 5.0 A 0.1 0.4 0.9 10.0 A 0.0 0.2 0.4 Table 1.12: The effect of ionic strength on the energetics of an ion pair in w ater obtained by the PDLD m ethod. The relative free energies (in kcal m ol-1 ) of a N a+ C l- ion pair in w ater, w ith and w ithout the ionic strength effect, calculated using th e PDLD m ethod. As seen from the table, the newly introduced term A Gis of the PDLD m ethod is able to reproduce the observed trend. T h a t is, th e ionic strength m odulation has a lesser effect on ion pair interactions w ith greater separation distance and the shielding effect is greater for higher ionic strength. Ionic strength A p K a Asp99 — ► Ser G lul56 — ► Ser 0.005 M -0.2 (-0.4) -0.3 (-0.3) 0.01 M -0.2 (-0.4) -0.3 (-0.4) 0.1 M -0.2 (-0.3) -0.3 (-0.3) 0.5 M -0.1 (-0.1) -0.2 - 1.0 M -0.1 (-0.1) -0.2 - Table 1.13: T he effect of ionic strength on the p K a shifts of His 64 of subtilisin upon m utating Asp 99 or Glu 156 to a serine estim ated by the PDLD m ethod. T he effect of ionic strength on the p K a shift of His 64 of subtilisin upon the removal of the distant surface charged groups, Asp 99 or Glu 156, is evaluated using th e PD LD m ethod. The p K a shifts are obtained using the expression given in Table 1.10. T he results reproduce the observed small effect of ionic strength on electrostatic interactions in polar solvents. In parentheses are th e observed p K a shifts [43]. 58 reasonably well, despite the simple nature of the present treatm en t. In fact, since ionic strength effects in proteins are quite small, it is not clear if m ore sophisticated models are really justified (although microscopic studies of ionic strength effects present a com putational challenge). 1 .3 .4 E le c tr o sta tic free en erg y in e n z y m e c a ta ly sis an d lig a n d b in d in g E lectrostatic energies play a m ajor role in biological specificity including catalysis and binding [1-6, 11]. Thus the ability to evaluate electrostatic energies in such processes is a prerequisite to any structure and function correlation. Here we exam ine the perform ance of our models in several generic test cases of binding and catalysis. 1 .3 .4 .1 E le c tr o sta tic en erg ies in en zy m a tic rea ctio n s M any enzym atic reactions are characterized by a large change in the charge distribution of the reacting substrate between its ground state and transition state. T he corresponding change in the electrostatic interaction betw een the enzym e and the substrate appears to account for a m ajor p art of th e observed rate enhancem ent (relative to the relevant reference solution reaction). Here we consider two such exam ples by evaluating the electrostatic contributions to the catalytic free energy of lysozyme [35] and trypsin [48, 49]. We also consider the effect of the A spl02-A la m utation in subtilisin [31, 50]. T he perform ance of the different models in these test cases is sum m erized in Table 1.14. As seen from the table, all the macroscopic models give incorrect results for the trypsin and lysozyme cases, though these models seem to give reasonable results for the subtilisin case. T he PD L D /S with e.fn between 4 and 6 accounts qualitatively for the observed effects in all cases, while the PDLD and F E P models appear to give m ore q uantitative results in these specific test cases. 59 Table 1.14: R eaction free energies of enzym atic reactions obtained by th e m acroscopic and microscopic m ethods. The absolute reaction free energies (in kcal m ol-1 ) of the trypsin and lysozyme catalyzed reactions [going from the ground state (gs) to the transition state (ts)] are obtained w ith reference to the sam e reactions in w ater. In addition, the relative free energy of th e reaction catalyzed by subtilisin and the D102A m utant is shown as an application of our m ethods to studies of the effect of m utations on the reaction energy profiles. 60 System model Macroscopic Semimicroscopic Microscopic Obs. Ref. £ = 2 e = R DTK PDLD/S PDLD FEP ein ~ % = 4 '“ in ^ ea = 6 Lysozyme(gs — » ts) -64 -43 -50 -27 -14 -9 -8 -5 -7 [35] Trypsin(gs — > ts) -46 -22 20 -9.6 -5.7 -4.3 -12 -7 -7 [48, 49] Subtilisin(D102A) 10 5 7 7 3 1 5 6 6 [31, 50] 1 .3 .4 .2 C a lcu la tio n s o f rela tiv e and a b so lu te b in d in g en erg ies E lectrostatic interactions play an im portant role in determ ining th e strength and specificity of ligand binding processes. An excellent exam ple is provided by the binding of phosphorylcholine (P-C ol) analogs to the M cPC603 antibody [51], where the positive and negative ends of th e ligands interact specifically w ith the corresponding negative and positive charges of th e antibodies. Here we take as a test case the absolute and relative binding energies of the analogs (CH 3)3N + (CH 2)nCO O - with n — 3 and n = 2 (see Ref. [32] for details). These analogs differ in length and are designated here by C and S for the n — 3 and n = 2 cases, respectively. In trying to reproduce the absolute binding energies one has to consider, in addition to the corresponding change in electrostatic energies, the contributions from hydrophobic energy A A Ghyd of Eq. 1.11 and reorganization energy A of Eq. 1.12 as well as the entropic contribution associated w ith th e configurational space available for the ligand in the binding site and in solution (this contribution is obtained by the rough estim ate of Ref. [32]). T he results of the calculations are sum m erized in Table 1.15. As seen from the table, we obtain reasonable relative binding energies w ith the P D L D /S , the PDLD and the F E P m ethods, although the F E P m ethod gives good results only when very large cutoff radii are considered. T he PD L D /S m ethod is particularly encouraging as it gives good estim ates of the absolute binding energies as well as the relative binding energies, which is a less stringent test. W ith the exception of the D TK model, the macroscopic models drastically overestim ate the absolute binding energies, though the relative binding energies obtained w ith these models are in the right direction. Considering the speed and stability of the PD L D /S m ethod, we exam ine its perform ance on several more test cases th a t involve binding of different ligands to M cPC603. T he calculated binding energies and the corresponding IC 50 values are com pared to the relevant observed values in Table 1.16. As seen from the table, we can reproduce qualitatively the observed trend. This indicates th a t 62 Table 1.15: A bsolute and relative binding free energies of phosphorylcholine analogs by M cPC603 obtained by the macroscopic and m icroscopic m ethods. T he table presents th e absolute and relative binding free energies (in kcal m ol-1 ) of (C H 3)3N + (C H 2)nCOO- by m urine myeloma protein M cPC603. T he analogs w ith n = 3 (jC) and n = 2 (5.3a >5.8a (PDLD/S estim ate with £=4) p h o sp h o ch lin e (P C ) Q CH HO-f-O-C^-OyN-CHj O" CHj -6.7, 0.0 -7.9 , 0.0 0.0a ().0a p -a m in o p h e n y l p h o s p h o c h o lin e (APPC) r ~ \ ? ,? " • NHj-4 y-o-f-o-C H j-cM j-N -C H , 0* CHj -5.9 , 0.9 -6.5, 1.4 1.4a 2.3a P-Col short analog (S) ^H 3 0 -C-CH2-CH2-+N-CH3 u • o ch3 -7.5 , 2.3 -5.4 , 1.7 2.3b 2.8b P-Col long analog (L ) 0"-C-CM2-CH2‘CH2-+H -CM 3 II 1 O ch3 -9.8 , 0.0 -7.1 , 0.0 0.0b 0.0b o > os the P D L D /S m ethod should provide a powerful tool for estim ating binding free energies and for com puter-aided drug design 1.4 Concluding remarks This work presents a system atic study of th e perform ance of several microscopic and semimicroscopic models for calculations of electrostatic energies in solutions and macrom olecules. This includes the evaluation of the solvation energies of ions and molecules in w ater, the self-energies of single charges in proteins, the energetics of ion pairs in solutions and proteins, the interaction betw een surface charges in proteins, th e effect of ionic strength on such surface interactions as well as the electrostatic contributions to binding and catalysis in solvated proteins. Very encouraging results are obtained by the microscopic and semim icroscopic models although a correct treatm en t of the long-range effects and the boundary conditions appear to be crucial for obtaining reliable results. T he present study also illustrates the problems associated w ith some macroscopic m odels by using these models to evaluate the experim ental observed energetics of charges in pro teins. Approaches th a t seem to give reasonable result for charges on the surface of proteins give very unreliable results for m ore stringent test cases. As argued repeatedly in our early studies (e.g. Ref. [3, 4]), the effect of ionic strength is basically a second-order effect as m uch as the energetics of charges in protein interiors are concerned. Nevertheless, th e present work evalu ates th e effect of the ionic strength on a macroscopic level using a grid of residual charges th a t satisfy the B oltzm ann distribution expected from th e corresponding C oulom b’s law w ith the dielectric constant of water. Basically, our philosophy is to first evaluate the relevant electrostatic interaction on a m icroscopic level for zero ionic strength and then to estim ate the ionic strength correction on a m acroscopic level. This approach appears to be quite effective despite its simple nature. 67 C om paring the perform ance of our different microscopic and sem im icroscopic approaches, we find it significant th at the PDLD and PD L D /S m ethods are m uch faster th an the F E P m ethod while still giving reasonable results. In p articular, the speed and sim plicity of the PD L D /S m ethod m ake it a very effective strategy for estim ating electrostatic energies in macromolecules. In fact, the m uch m ore rigorous F E P m ethod can give very unreliable results when the boundary con ditions are treated incorrectly or when sm all cutoff radii are used [32]. However, th e current boundary conditions of the ENZYM IX program and the treatm en t of the long-range effects by the LRF m ethod [15] appear to give reliable results. Thus, considering the rapid increase in the available com puter tim e, we recom m end the use of th e entire PD LD, PD L D /S and F E P m ethods in predicting electrostatic energies in macromolecules. W hen the three approaches give sim i lar results, these results are likely to be reliable. However when the results are different, one has to exam ine the origin of the deviation and to eventually get sim ilar results. Finally, it m ight be useful to com m ent on the conceptual advantage of using several strategies for studies of electrostatic energies in macrom olecules. Basi cally, one deals w ith very com plicated systems where m any traps m ay (and do) exist on the way to correct evaluation of the given energies. As depicted in Fig. 1.8, one m ay consider the PDLD and PD L D /S m ethods as a ruler th a t covers the range from th e fully atom istic F E P m ethod to the discretized contin uum and other continuum approaches. Covering this range is useful not only in providing a fast alternative to the F E P m ethod, but also in protecting one from different pitfalls and in providing error bounds for the given calculations. 6 8 Figure 1.8: Illustrating th e fact th at a simplified microscopic m odel such as the PDLD model provides a ruler th at allows one to explore th e range betw een the micx-oscopic F E P models and the macroscopic models. At one end of our scale, we find th e m icroscopic F E P m ethods which is based on a fully atom istic represen tation. At th e other end, we find the discretized continuum (DC) m ethod which is expected to give reasonable results only if the protein perm anent dipoles are included explicitly [this model is referred to here as discretized continuum /polar (D C /P )]. T he far end of the scale also includes th e D TK m odel th a t cannot give reliable results for charges in proteins. Between these m acroscopic and m icro scopic extrem es lies the PDLD and PD L D /S models. 69 CO D_ \ \ C l U _ J Q Q Cl (N I I h - U ) Q macroscopic microscopic Cl Ll_ simplified microscopic microscopic (fully atom istic) Reference List P erutz, M .F. 1978. Science 201:1187. W arshel, A. 1978. Proc. Natl. Acad. Sci. U.S.A. 75:5250. W arshel, A., Russell, S.T. 1984. Q. Rev. Biophys. 17:283. W arshel, A. and Aqvist, J. 1991. Ann. Rev. Biophys. Biophys. Chem. 20:267. Hoi, W .G .J. 1985. Prog. Biophys. Molec. Biol. 45:149. Sharp, K. A. and Honig, B. 1990. Ann. Rev. Biophys. Biophys. Chem. 19:301. W arshel, A., Russell, S.T., Churg, A.K. 1984. Proc. Natl. Acad. Sci. U.S.A. 81:4785. King, G, Lee, F.S. and W arshel, A. 1991. J. Chem. Phys. 95:4366. W arshel, A., L evitt, M. 1976. J. Mol. Biol. 103:227. Russell, S.T., W arshel, A. 1985. J. Mol. Biol. 185:389. Harvey, S.C. 1989. Proteins 5:78. R ashin, A.A. 1990. J. Phys. Chem. 94:1725. Davis, M .E. and M cCammon, A .J. 1990. Chem. Rev. 90:509. W arshel, A. and Creighton, S. 1989. in Computer Simulation of Biomolec- ular Systems, ed. W .F. van G unsteren, P.K. W einer, p .120, Leiden: ES- COM. Lee, F.S. and W arshel, A. 1992. J. Chem. Phys. (subm itted). Langen, R., Brayer, G.D., Berghuis, A.M., M cLendon, G., Sherm an, F. and W arshel, A. 1992. J. Mol. Biol, (in press). 71 [17] Luzhkov, V. and W arshel, A. 1992. J. Comp. Chem. 13:199. [18] W arshel, A. 1979. J. Phys. Chem. 83:1640. [19] Churg, A.K., Weiss, R.M ., W arshel, A and Takano, T. 1983. J. Phys. Chem. 87:1683. [20] Frecer, V. and M iertus, S. 1991. Int. J. Quant. Chem. (in press). [21] Klein, B .J. and Pack, G.R. 1983. Biopolymers 22:2331. [22] W arshel, A., Sussman, F. and King, G. 1986. Biochemistry 25:8368. [23] King, G. and W arshel, A. 1989. J. Chem. Phys. 91:3647. [24] Zwanzig, R.W . 1954. J. Chem. Phys. 22:1420. [25] Valleau, J.P. and Torrie, G.M. 1977. Modern Theoretical Chemistry, ed. Berne, B .J., Vol. 5, p. 169, Plenum , New York. [26] P ostm a, J.P ., Berendsen, H .J.C . and Haak, J.R . 1982. Faraday Symp. Chem. Soc. 17:55. [27] W arshel, A. 1982. J. Phys. Chem. 86:2218. [28] W arshel, A. 1984. Pontif. Acad. Sci. Scr. Var. 55:59. [29] Wong, C.F. and M cCam m on, J.A . 1986. J. Am. Chem. Soc. 108:3830. [30] Rao, S.N., Singh, U.C., Bash, P A . and Kollm an, P.A. 1987. Nature 328:551. [31] W arshel, A., Naray-Szabo, G., Sussman, F. and Hwang, J-K . 1989. Bio chemistry 28:3629. [32] Lee, F.S., Chu, Z.T., Bolger, M and W arshel, A. 1992. Protein Engineering (in press). [33] Tanford, C., Kirkwood, J.G . 1957. J. Am. Chem. Soc. 79:5333. [34] A qvist, J. 1990. J. Phys. Chem. 94:8021. [35] W arshel, A. 1981. Biochemistry 20:3167. [36] Honig. B.H ., Hubbell, W.L. and Fleweling, R.F. 1986. Annu. Rev. Biophys. Biophys. Chem. 15:163. [37] Churg, A.K., W arshel, A. 1986. Biochemistry 25:1675. 72 [38] A qvist, J. W arshel, A. 1989. Biophys. J. 56:171. [39] Eisenm an, G., Horn, R. 1983. J. Membr. Biol. 76:197. [40] Forsen, S., Linse, S., Thulin, E., Lindegard, B., M artin, S.R., Bayley, P.M ., Brodin, P. and G rundstrom , T. 1988. Eur. J. Biochem. 177:47. [41] Richarz, R. and W uthrich, K. 1978. Biochemistry 17:2263. [42] M arch, K.L., Maskalick, D.G., England, R.D ., Friend, S.H. and G urd, F .R .N . 1982. Biochemistry 21:5241. [43] Russell, A .J., Thom as, P.G., Fersht, A.R. 1987. J. Mol. Biol. 193:803. [44] Gilson, M .K ., Honig, B.H. 1987. Nature 330:84. [45] Fersht, A.R. 1972. J. Mol. Biol. 64:497. [46] H uber, R. and Bode, W. 1978. Acc. Chem. Res. 11:114. [47] Lewis, S.D., Johnson, F.A . and Shafer, J.A . 1981. Biochemistry 20:48. [48] W arshel A. and Russell, S.T. 1986. J. Am. Chem. Soc. 108:6571. [49] W arshel, A., Sussm an, F., Hwang, J.-K . 1988. J. Mol. Biol. 201:139. [50] C arter, P. and Wells, J.A . 1988. Nature 332:564. [51] Young, N.M. and Leon, M.A. 1977. Immunochemistry 14:757. [52] Friedm an, H.L. and K rishann, C.V. 1973. In Water-A Comprehensive Trea tise, ed. Franks, F., Vol.3 p. 54, Plenum Press, New York. [53] Noyes, R.M . 1976. J. Am. Chem. Soc. 84:513. [54] Rossinsky, D.R. 1965. Chem. Rev. 65:467. [55] M iller, W .A. and W atts, D.W . 1967. J. Am. Chem. Soc. 86:6051. [56] Ben-N aim , A. and M arcus Y. 1984. J. Chem. Phys. 81:2016. [57] Taft, R.W . 1983. In Progress in Physical Organic Chemstry ed. Taft, R.M ., Vol. 14, p. 247, W iley, New York. [58] Aue, D.H., W ebb, H.M. and Bowers, M .T. 1984. J. Am. Chem. Soc. 98:318. [59] Klots, G.E. 1984. J. Phys. Chem. 85:3585 . 73 [60] Pearson, R.G. 1986. J. Am. Chem. Soc. 108:6109. [61] C abani, S., Gianni, P., Mollica, V. and Lepori, L. 1981. J. Solution Chem. 10:563. [62] Bash P.A ., Singh, V.C., Langridge, R. and Kollm an, P.A. 1987. Science 236:564. [63] C ram er, C .J. and Truhlar, D.G. 1991. J. Am. Chem. Soc. 113:8305. Chapter 2 Microscopic simulations of macroscopic dielectric constants of solvated proteins 2.1 Introduction E lectrostatic effects are thought to play a m ajor role in determ ining the struc tures and functions of proteins [1-9]. Furtherm ore, calculations of electrostatic I i energies are probably the m ost effective way of correlating protein structures ' w ith their specific functions [2-4, 8, 9]. Thus, one of th e prim ary challenges in I theoretical studies of protein function is the evaluation of th e relevant electro s ta t i c energies. This can be done effectively using microscopic approaches [4], in I which th e use of a dielectric constant (which is a m acroscopic concept) is avoided. ^Nevertheless, significant insight m ay be obtained by classifying and scaling th e i ■ m agnitude of different electrostatic energies using some type of a dielectric con- Istant (or constants). Obviously, no universal dielectric constant exists for protein 1 | interiors [9], but one would like to be able to estim ate th e dielectric constant of i a given protein site. T he definition of the dielectric constant is not as straightforw ard in the case |of proteins as it is in the case of homogeneous bulk solvents; m any trap s and !opportunities for confusion exist (e.g., see Refs. [9-11] for discussions). For ex am ple, the description of a solvated protein as a uniform sphere of low dielectric surrounded by a continuum of high dielectric [5, 12] has been shown to be in- j consistent w ith the existence of ionized groups in the interiors of proteins [10], I since these groups would not rem ain ionized in a low dielectric environm ent. In fact, in evaluating the solvation free energy, A G soi, of a group of radius a and , net charge Q in a protein site with B orn’s formula: = -3 3 2 l f1 - i) ™ ; (throughout this work, energies are expressed in kcal/m ol, charges in units of electron charge, and distances in A), one finds th at the observed solvation energy ! | can only be reproduced when a large value of the dielectric constant t B is used. i l ] A pparently, th e protein sites near ionized groups are very polar [4]. Similarly, \ I j th e effective dielectric constant eeff for the interaction of two ionized groups w ith , charges Q\ and Q 2 separated by the distance 2, as deduced from C oulom b’s Law : A G (r12) = 3 3 2 ^ ^ , (2.2) 1 2 f^fr is found to be quite large in proteins (eeff > 15). This is tru e for the rath er | trivial case of interaction between charges at the surface of the protein and for ] the interaction betw een charges buried w ithin the protein [4, 9]. On the other ; l hand, estim ates of the bulk value of e obtained by applying relatively weak electric fields to protein powders [13] yield values th a t are small (e ~ 2). T he use of dielectric constants obtained in this way would be quite questionable, \ however, if applied to specific charges w ithin a protein interior. A pparently, properties th at can be described by a single dielectric constant in homogeneous solvents m ust be described by several different dielectric constants I l in proteins [4, 9]. Nevertheless, one may still like to ask w hat is the m acroscopic j dielectric constant in a given region of a protein and explore this issue w ith j m icroscopic sim ulations. In doing so, we may agree th a t the m ost basic definition 76 of the m acroscopic dielectric constant e is given by the fundam ental electrostatic equation (e.g., see Refs. [4, 14]): ( e - l ) E = 47rP. (2.3) T he polarization vector P is defined as ^ M 1 , P = Y = y E > (2-4) i where is th e charge of the particle at the point and th e sum m ation includes all particles lying w ithin a specified macroscopic volume V . T he m acroscopic electric field E is uniquely obtained from the microscopic field, by [4] E = = (2.5) where dV designates a volume elem ent and the integral is over the sam e volum e used to define P . Taking Eqs. 2.3, 2.4, and 2.5 as our startin g point, it is possible to exam ine the value (or values) of e in proteins. A lternatively, one could use the K irkw ood-type form ulation [15], which is obtained from Eq. 2.5 under several approxim ations (see below). Previous attem p ts to evaluate e in proteins using K irkw ood-type expressions , have been reported [16, 17] and related approaches have also been introduced I j [18]. However, these treatm ents used a m ajor simplification of th e system and , seem ed to have underestim ated e. T h at is, all the studies reported to d ate involve ! a protein m olecule or ju st a portion of it in vacuum, ignoring th e surrounding ! solvent as paid of the actual s.ystem. The corresponding effect of the solvent, which can be treated as a reaction field (R F), on the average dipole m om ent of the specified region in the protein is apparently missing in these treatm ents. T he neglect of th e R F, which may seem at the first sight to be a reasonable approxi- I m at ion, leads at tim es to drastic reduction of the value of e. For instance, if one i j tries to evaluate e for a small volume elem ent of w ater w ithout including th e RF, J one obtains (see below) e ~ 9 rather th an e ~ 80. T he effect of the RF, which j m ay not be so widely appreciated, is not associated w ith the fact th a t th e w ater I ! 77 around a nonpolar protein provides a large eeff for interacting protein charges, b u t is connected w ith the fundam ental value of the p rotein’s dielectric constant e. T here are also some objections th a t can be raised against th e fact th a t the ' previous studies [16-18] used norm al mode analysis in determ ining th e average i fluctuations of the protein polarization. Such harm onic treatm en ts m ight under- i [ estim ate the average fluctuations. Furtherm ore, if the norm al m ode treatm en t only deals w ith dihedral angles [17], then a significant p art of the polarization ma}' be unaccounted for (e.g., bending of R -O -H angles). A lthough it is not clear w hat are the values of e in proteins, it seems th at I ' these values cannot be deduced reliably from studies th a t do not include th e , R F effect. M ore specifically, in view of the large R F effect in w ater and other i j possible conceptual problem s, we feel th a t any attem p t to calculate e in proteins should be carefully exam ined by first checking w hether th e given m ethod can produce a reasonable value of e for pure w ater. It is im portant to note th a t proper calculations of c in polar solvents are far from being trivial, and have I only em erged fairly recently [19-23]. In fact, controversies still exist, since the • correct treatm en t of the solvent R F in sim ulations using periodic boundary con ditions has not yet been established (see discussion in Ref. [23]). Evaluation of : e is fundam entally m ore difficult in proteins than in a homogeneous solvent like I ; w ater. In this study, we will concentrate on the evaluation of e in proteins using m icroscopic sim ulation approaches. In so doing, we will accept the fact th a t e may depend on th e size of the sample and the m agnitude of any applied field, I since we are dealing w ith a microscopic system of lim ited dim ensions th a t m ay 1 I som etim es be very inhomogeneous. Several independent approaches will be ex- i 1 plored in an effort to assess th e range of validity of the values of e obtained w ith a given definition. Moreover, we will m ake sure th a t any of our approaches for ■ the evaluation of e will be calibrated by using the same approach for calculation of e of bulk water. This will, hopefully, prevent us from applying an approach 78 to the studies of protein dielectrics th at may incorrectly estim ate th e e in w ater (once we know th a t an approach gives an unreasonable e in w ater, we m ay be 1 m ore careful about the predictions of the corresponding e in proteins). 2.2 Theoretical approaches for microscopic calculations of e < i i 1 In this section, we will outline our form ulations and the corresponding sim ulation I approaches. They are based on considering microscopic system s in a spherical | way, since system s of this geom etry seem to give the sim plest consistent connec- t tion betw een th e microscopic and macroscopic dielectric theories. In deriving the form ulations, we will try to rem ain as close as possible to th e fundam ental relation expressed in Eq. 2.3. The assum ptions involved in obtaining the m ore frequently-used expressions will be explained in some details. T he validities of 1 these approaches will be exam ined by evaluating e for the reference w ater sys- ' terns form ed by the Surface Constrained All-Atom Solvent (SCAAS) m odel (see ' below). I I I 2.2.1 T h e K irk w ood -F roh lich e from m icr o sco p ic i sim u la tio n s: M e th o d I Let us take as our system a. macroscopic sphere im m ersed w ithin a m edium whose 1 ' dielectric constant is eRF, and let the entire system be subjected to an externally 1 applied electric field E 0 = eE 0. Under these conditions, one finds th a t th e total ! m acroscopic electric field E w ithin the sphere in the direction of the un it vector i e is | < E' e> = E« - f + ^ T I iT (p' e)' (2'6 ) i T he term - f P is the field produced w ithin the sphere by P itself [14], and th e | term ■ 2 |ba. F _ _.U T ip (the reaction field) is the field from the surrounding dielectric 79 continuum resulting from its interaction w ith P [24]. The notatio n (A) indicates a B oltzm ann average of A. Substituting 47rP/(e — 1) for E in Eq. 2.6 and rearranging yields the expression: ( e - l)(2eR F + 1) = 4t t(P • e) = ( M • e) 3(e + 2eRF) 3E0 R 3E0 ’ 1 ' } w here R is the radius of the sphere. The value of e in this equation depends on [ eR F and in fact there is an inherent m athem atical instability in treatm e n t w ith j crf 7 ^ ^ [19, 21]. Here and in m ost other dielectric calculations, it is assum ed | th a t eR F = e although this m ight not be the eR F used in th e actual force field of j our sim ulations (see below). W hen eR F = e, then Eq. 2.7 becomes (e — l)(2e + 1) _ ( M • e) 9e i?3E0 ' ^ ’ Eqs. 2.7 and 2.8 are valid for systems in which ( M ■ e) = 0 when E0 = 0. W hen this is not the case, a m ore general form of Eq. 2.8 should be used [25] (e - l)(2e + 1) _ 1 d ( M • e) . ! 9e R 3 d E0 ‘ 1 ' J Eq. 2.9 makes it clear th at the dielectric properties of a given system are de- , term ined by the change of ( M ■ e) in response to the applied field E0, rath er I 1 th an { M ■ e) itself. This distinction is significant, because there are system s for p j which ( M ■ e) can be quite large even when E0 = 0. Now our task is to calculate d { M ■ e )/d E0 by microscopic sim ulations. Our notation becomes m ore com pact if we specify th a t E0 is applied in the x direction, so th a t ( M ■ e) = (Mx). T he derivative d(M x)/# E 0 can be w ritten as d(M x)Eo _ ^ {Mx)e 0+ae0 - (Mx)e 0- a e 0 , 2 10\ dE 0 A£o-o 2A E 0 i in which the B oltzm ann averages (Mx)Eo±ae 0 can be expressed as (M x)Eo ± A E a f d rM xe -(3 { -u°-Mx[Eo±^ Eo]) f dre-p(u°-M*iE°±AE°v j dVMxe±l3Mx&Eoe-P(uo-MxEo) / dTe±l3MxAEoe~l3(U o~MxEo' > (Mxe±0MxAEo)Eo (2 .11) 80 I w here U0 is the p art of the potential independent of E0. Eq. 2.11 can be evaluated by p ertu rb atio n treatm ents similar to those used in Free Energy P ertu rb atio n m ethods, but here we take the standard statistical m echanical tre atm en t, sub stitu tin g Eq. 2.11 into Eq. 2.10 and obtaining d(Mt x/Eo dE 0 — lim 1 ae0— * - o 2A E0 (Mx< ,0MxAEo E0 (Mxe-PM*AE°}Eo oPMxAEo) Eo (e -pMxAEc )Eo (2 .12) Using the expansion ex ~ 1 + x, we obtain d(M; x) Eo dEo = lim 1 AEo-^0 2A En X (M ,) e. + /JA E „ (il® s „ ( M J e. - /3AE0(M J)e „ 1 + /JA E0(M *)a which can be simplified to d(Mx)Ef. - 1 1 - /3AE 0 (Mx)Eo (2.13) dE„ = lim ae0-*o 2A En 2/?AE0(M,a)E. - 2i8AE.<AfI ) | . 1 - /32A E P ( M l ) E„ - ( M x ) ) (2.14) Sim ilar expressions can be w ritten for the y and z directions. If th e m edium is isotropic, then we can also write d ( M ■ e)E o & (M " )Eo { M ) l <9E0 3 j ; S ubstituting Eq. 2.15 into Eq. 2.9, we obtain (e — l)(2e + 1) (3 ( M ‘ I En (m YEo >.15) (2.16) 9e 3 R 3 This equation is of course a variant of the well known Ivirkw ood-Frohlich deriva tion [15, 26]. Here, however, we do not require th at E0 —> 0, b u t im pose the weaker restriction th at E0 be small. It is im portant to m ention th a t Eq. 2.16 is only valid when th e force of the RF on the molecules in the explicit p art of the system is included in the sim ulation, since the reaction field term was introduced in Eq. 2.6 and retained throughout the subsequent derivation. T he derivation of Eq. 2.16 from Eq. 2.10 did not consider th e electronic , polarization of M . T h at is, in Eq. 2.11 we should have replaced M by M perm + M i n ct(A E 0, M perm) where Mind is the contribution associated w ith the induced | 1 dipoles of the system , which is a function of A E 0 and th e given perm anent ! I polarization. The corresponding formal correction to Ecp 2.16 m ight be quite j ! involved (see for exam ple a related treatm ent in Ref. [25]). However, num erical 1 i . . . . 1 i sim ulations th a t consider the atom ic induced dipoles by a self-consistent iterative 1 ; approach (see Ref. [4]) allows one to easily obtain the contribution of M i nd to ! e. In particular, a reasonable approxim ation can be obtained by adding to the j ! . . 1 j right-hand side of Eq. 2.16 the contribution (d M in ci/d E 0) /R . This contribution j can be estim ated by {M in(ijE 0)/ R 3 for small values of E a and is of th e order of i ! (t.r < [ — 1 )/(t.- x - , + 2) (m aking it im portant only when e is small). To exam ine the perform ance of the above m ethod, we consider the well- . i i defined test case of the calculation of the dielectric constant of water. T he w ater ! | system is sim ulated by a sphere of explicit w ater molecules of a certain radius as , j described in the SCAAS model [23], which is depicted schem atically in Fig. 2.1. ! . i i In this m odel, the w ater molecules in regions 1, 2 and 3 are represented explicitly ! i i as individual atom s; the molecules of region 4 are represented by the reaction , ' field. T he forces in regions 1 and 2 are specified by a standard force field [23], ^ plus the effects of E0 and ERF, and the forces from the surface region 3. T he I m olectdes in th e surface region are subjected to th e same standard force field j plus a set of radial and angular constraints designed to keep th e average density and polarization of the inner region as close as possible to those expected in an infinite system (see Ref. [23] for m ore details). All the nonbonded interactions are evaluated w ithout any cutoff as this seems to be the m ost consistent way to tre a t the long-range electrostatic effects. In introducing the reaction field, it was found in our early study [23] th a t using eR F = in MD sim ulations leads to a positive feedback betw een P and E RF, and thus to an unrealistically large e. This presents a fundam ental problem for ; dielectric theories which has not been resolved consistently in studies th a t involve i periodic boundary conditions (see Ref. [23]). W ithout resolving the origin of the 82 r ~ -v u 7 7 C c n < C - A r | Figure 2.1: An illustration of the four regions of the SC A AS w ater m odel. Re- i gion 1 is th e subsphere for which the e is calculated. Region 2 contains th e rest of | th e unconstrained w ater molecules. Region 3 is the surface region at which the • molecules are subjected to both the radial and polarization constraints. Region 4 ' is the bulk region. 83 positive feedback, it was found th a t the use eR F < tw can lead to reasonable values of e. In this study, we chose to keep eR F = ew and circum vented the feedback problem by scaling the interaction of E R F w ith all th e electroneutral groups (a group w ith a net charge of zero used to represent a dipole) by th e function S(r): S(r) = 1 when r < r 0 — a | [1 + cos(7 r(r — r 0 T cr)ja)] when r 0 — a < r < rQ , (2-17) 0 when r 0 < r , where r 0 is the radius of the system and r is the distance of the designated center of a group from the center of the sphere (in case of the w ater m olecules, the 0x3 'gen atom is taken as the center of the group). Here we use a = 5. A lthough this scaling does not resolve the origin of the feedback effect, it does reflect the assum ption th a t this problem would disappear in an infinite system . T h a t is, this function sim ulates the fact the interaction of the actual hydrogen bonding betw een region 3 and the molecules which would be in region 4 in a real infinite system would tend to oppose the polarization induced w ithin the inner regions. A nother waj' of saying this is th a t we have chosen constraint forces (including a modified reaction field) in order to keep the polarization of our system sim ilar to th a t expected from an infinite system. In the lim it of small spheres (e.g., a 4 A or 6 A sphere), th e use of Eq. 2.17 would neglect a m ajor p art of the reaction field. This deficiency is corrected here by simply using S(r) = 0.5 for those cases. It is im p o rtan t to recognize th at our actual force field involves eR F = e, while th e derivation of Eq. 2.16 assumes th at eR P = e. In particular, when studying cases w ith a sm all region 1 and a large region 2, it is not clear th a t th e explicit field from region 2 is the one predicted by the microscopic m odel w ith eR F = e. This m eans th a t Eq. 2.16 and our actual force field m ight not be fully consistent l j w ith each other (as is of course the case with other sim ulation studies). Thus, insisting th a t our m odel reproduces a reasonable value for th e e of bulk w ater 84 is at present the best guarantee th a t the same m odel would give reasonable j ! _ . . 1 | inform ation about the value of e in proteins. j T he above m ethod was used to calculate t in five w ater spheres w ith radii of 4, j 6, 8, 10 and 12 A. T he e’s were calculated in each case for several radii of region 1. j I T he effect of E R F was exam ined in a series of parallel MD sim ulations including and excluding the reaction field. All sim ulations involved MD trajectories of 50 ps at 300 K w ith tim e steps of 2 fsec. The results of these w ater sim ulations are sum m arized in Table 2.1 under M ethod I, while those of the 4 and 12 A spheres are depicted schem atically in Fig. 2.2 and 2.3. As shown in Table 2.1 , ! and Fig. 2.2 and 2.3, the inclusion of the reaction field is crucial for obtaining a ! reasonable dielectric constant. This point is particularly clear in cases of small spheres, while for larger spheres we could obtain the effect of th e E R F for the inner region by including the outer region (see below). As far as th e calibration procedure is concerned, we consider the 12 A sphere as our representing system . For this system , we obtain e of about 70 for the inner 8 A subsphere as com pared to th e value of ~ 80 expected for bulk water. The calculated value of e becomes sm aller for outer subspheres of this case (e.g., the 10 or 12 A subregion) reflecting th e fact th a t our boundary conditions are not perfect. Yet, we consider the values in the inner regions as a reasonable representation of t in w ater, j An exam ination of Table 2.1 and Fig. 2.3 indicates th a t it is possible to obtain : the effect of the reaction field on e of region 1 by simply using a sufficiently large region 2. For exam ple, we obtained ~ 60 for e in the inner 6 A region of the j 12 A sphere w ithout including the reaction field. This suggested th a t one could j avoid some of the conceptual problems associated w ith the inclusions of th e E R F by using a large solvent sphere around the region of interest. N evertheless, one m ust avoid calculating the e for the entire system unless the effect of th e E R F is taken into account. For exam ple, the e for the inner 10 A region of th e 12 A : sphere is drastically underestim ated w ithout including the R F effect. J 85 M ethod I M ethod II Size (A) Ri (A) e/,o £/ e11,0 e/z 4.0^0 4.0 9. 45. 9. 47. 6 .0 ^ 4.0 30. 32. 5. 22. 6.0 8. 22. 8. 22. 8 .0 ^ 4.0 45. 69. 7. 38. 6.0 36. 95. 6. 27. 8.0 12. 20. 7. 14. io.o(6 ) 4.0 68. 44. 11. 20. 6.0 59. 66. 10. 22. 8.0 40. 50. 7. 19. 10.0 8. 16. 7. 15. . 12.0<6 > 4.0 65. 40. 15. 28. 6.0 58. 71. 8. 35. 8.0 37. 74. 6. 33. 10.0 21. 48. 6. 23. 12.0 7. 18. 6. 18. Table 2.1: T he dielectric constants obtained for various w ater spheres w ith and w ithout the reaction field. R i designates th e radius of region 1 in a w ater sphere of th e indicated size. M ethod I and II designate, respectively, th e use of Eq. 2.16 and 2.20. e/i0 and e//i C > designate the e’s obtained by the indicated m ethods w ithout the reaction field E RF, while tj and e/j designated th e e’s obtained w ith the reaction field. In the present sim ulations, we calculate e for the three perpendicular directions of and take e as the average of these values. T he indicated size corresponds to the radius of region 3. All calculations involved 50 ps sim ulation tim e, (a) The calculations for the 4 A and 6 A w ater system s involved a scaling of E R F by a constant factor (0.5 in th e present case), (b) For larger spheres, the distance-dependent function of Eq. 2.17 was used for the scaling (this function could not be used for small spheres since it would im ply neglect of m ost effects of the reaction field). 86 Method I: 4A water system 100 80- 60 40 20 ^ with RF □ without RF 4 Rj (A) I Figure 2.2: Illustrating the effect of E R F on e for the 4 A w ater system . T he bars | repi'esent the e calculated from the average polarization of th e entire region 1, 1 which in the present case is taken as the entire w ater system (the radius of J region 1 is designated by Ri). The diagram illustrates th e large effect of E RF. | T he sim ulation tim e is 50 ps. L _ I I 1 Method I: 12A water system 100 80 E 60 40 f t 4 I 0 ; Figure 2.3: A plot of e for different radii of region 1 (represented schem atically by the inner circles) in th e 12 A w ater system (represented by th e outer circles). ! T he bars th a t correspond to the different radii of region 1 represent in each case j th e e’s associated w ith the entire inner sphere. For exam ple, w ithout th e reaction | field, we obtain e = 65 by averaging over an inner sphere of 4 A and e = 58 by j averaging over an inner sphere of 6 A. The fig ure illustrates the dependence of j e on E rf and dem onstrates th a t in sufficiently large system s one can obtain a | reasonable e for inner region even w ithout including E RF. Yet, when region 1 is a significant p art of the entire system , it is crucial to include th e reaction field ' < effect. i with RF □ without RF 8 10 Rj (A) 12 At this point, it m ight be useful to point out th a t the calculations w ithout the reaction field could have been done with a modified form of Eq. 2.16, substituting £rf ~ 1 * n Eq- 2.6. Such a treatm ent, which represents a m ore rigorous study of th e dielectric in region 1 in vacuum, leads to replacem ent of the (e -l)(2 e + l)/9 e term in Eq. 2.16 by (e -l)/(e + 2 ) (see Refs. [19, 25]). T he resulting expression is rath er insensitive to the value of {M 2) and involves a well known m athem atical instability for (AT2) > 1, leading to a negative e. It is also instructive to note th a t th e value of (AT2) for a sphere in vacuum ((AT 2)0) and for a sphere surrounded by a continuum dielectric ((AT 2)r) are related by (e.g., see Ref. [25]) , {M 2 )° = (2e + 1Ke + 2 ) <M 2)- (2' 18) J Thus for e ~ 80, one obtains ( M 2)0 ~ T ( A T 2) £. Using this ( M 2)0 in Eq. 2.16 j gives e ~ 5 , indicating th a t the ( M 2)0 obtained w ithout the R F effect m ight not j be the proper ( A T 2) for this equation. ! Finally, it is useful to note th at the w ater calculations presented above con- 1 verged rath er slowly. At least 30 ps sim ulation tim e was required to achieve j convergence and continuation of the sim ulations to 100 ps gave sim ilar results to ; w ithin 10 %. This sim ulation tim e should be com pared to the >200 ps needed 1 for convergence of G* and the 50 ps needed for convergence of the polarization of the system [22]. 2 .2 .2 E v a lu a tin g e u sin g (E): M e th o d II W hile Eq. 2.15 is a reasonable approxim ation for homogeneous solvents, it m ight not provide the correct form ulation for evaluating e in m acrom olecules, due to the inhom ogeneities of such system s. In particular, one m ay question th e validity of the derivation of Eq. 2.7, which is based on th e assum ption th a t the region under study is a macroscopic sphere im m ersed in a m edium of a uniform dielectric constant. In principle, it should be possible to avoid the approxim ation th a t led 89 to Eq. 2.15 by evaluating e in a direct way using Eq. 2.3. This approach is im plem ented here with the relation: 1 = 1 + ^ , (2.19) where (E) and (P) are evaluated directly for the region under study. In p artic ular, (E) is evaluated w ith the approach explained in A ppendix A of Ref. [27] and (P) is evaluated using Eq. 2.4. MD sim ulations for w ater system s sim ilar to those described above were used to calculate the e with this approach. T he cal culated results converged very slowly, leading som etim es to a negative e for cases w here th e radius of region 1 is sm aller than 5 A (results not shown). T his m ight seem surprising, since the results obtained using Eq. 2.16 are relatively stable. However, the two approaches are quite different. In particular, it is assum ed arbitrarily in deriving Eq. 2.8 from Eq. 2.7 th at eR F = e (which always produces I > 0), while models w ith eR F / e contain m athem atical instabilities [19, 21] th a t can lead to negative e. A lthough it is still unclear w hether the actual value of eR F should correspond to the macroscopic estim ate of e, it is clear th a t treatm en ts th a t use eR F = e give m ore stable and reasonable results th an treatm en ts th a t do not im pose this constraint. Thus we relaxed our requirem ents and replaced the ! electric fields from regions 2 and 3 in Eq. 2.6 w ith a reaction field proportional to the polarization P of region 1. Assuming th at eR F = e, we obtained | (e — l)(2e + 1) 47t(P • e) 3e E0 (2.20) A lthough this equation is form ally identical to Eq. 2.14, it provides an altern a tive to the standard ( M 2) expression. The m ain feature th a t m akes Eq. 2.20 m ore stable th an Eq. 2.19 is the replacem ent of the field from region 2 by the field expected from this region in the corresponding m acroscopic model. This approxim ation m ight be removed in subsequent works, using longer sim ulation tim es and larger system s. However, at present we prefer to obtain a stable and convergent m ethod (th at can be calibrated by studies of e in w ater) to a m ore rigorous yet unstable one. 90 T he inclusion of the effect of the induced dipoles in Eq. 2.20 is m uch m ore straightforw ard th an in the case of Eq. 2.16. T h at is, including the atom ic induced dipoles in the sim ulation and adding the contribution from these induced dipoles (at the given E 0) to P gives autom atically the effect of the electronic polarizabilities on e. i T he calculations done using Eq. 2.20 are referred to here as M ethod II. T he e’s obtained w ith this m ethod for the 5 w ater spheres m entioned above are shown in Table 2.1 under M ethod II. The results follow the trend of M ethod I, indicating th a t E rf should be included in the calculations in order to obtain a large e. I N evertheless, w ith the same scaling function S(r), we were unable to obtain e I ! values as large as those of M ethod I. Once again, the im portance of the reaction field is to be stressed rather than the exact value of e. In fact, w ith M ethod II, it is hard to obtain a large e even for the inner parts of large sphere as long as E rf is not included (e.g., the 12 A sphere under M ethod II in Table 2.1). It ■ should be noted th a t the convergence of this m ethod seems to be sim ilar to th a t i of M ethod I. 2.3 Examination of e in solvated protein A fter verifying th a t the above two approaches give a reasonable dielectric con- , stan t in w ater, we can try to elucidate the m agnitude of e in different sites in i proteins. T he protein model is depicted in Fig. 2.4. In this m odel, the protein | system is divided into 4 regions. Region 1 is the region whose dielectric constant is of interest, while regions 2a, 2b, 3a, 3b and 4 represent the rest of the system . Regions 1, 2a and 2b contain protein and w ater atom s (see Fig. 2.4) which are treated by a standard force field [28] and are not subjected to an}/ additional j constraint except, of course, the forces from the surrounding regions. T he w ater ; molecules in region 3b are subjected to the previously described surface con- i j straints [28] th a t keep their polarization and radial positions close to the values (4) (3a) Figure 2.4: An illustration of the five regions of the protein m odel. Region 1 is a subsphere containing the protein and w ater groups of which th e e is calculated. Regions 2a and 2b contains the rest of the unconstrained protein and w ater atom s. Regions 3a and 3b contains the constrained protein and w ater atom s. Region 4 is the bulk region. See text for more details. 92 expected from the corresponding infinite system . T he protein atom s in region 3a I are kept near the corresponding X-ray positions by quadratic constraints and j 1 th e steric effect of this region is m aintained by considering th e bonding, angle- bending and torsional interactions between this region and region 2a. However, in order to obtain a stable spherical treatm en t, we replace the electrostatic effect 1 of this region as well as th at of the bulk solvent in region 4 by the m acroscopic < R F associated w ith th e polarization of the inner regions. All the nonbonded interactions inside the explicit region are evaluated w ithout any cutoff as in th e w ater case. 1 1 T he calculations were done for the test case of trypsin, whose su b strate was ; represented here by the tripeptide Ala-Lys-Gly. T he carbonyl group of th e pep tide bond connecting the lysine and alanine residues of the su b strate was con- | strained w ith a weak quadratic constraint to the center of th e oxyanion hole.1 We attem p ted to estim ate the value of e in three protein sites: th e oxyanion ! hole (site A), the site of His 57 (site B), which is a p art of the catalytic triad , | and th e site of Ser 54 (site C), a site about 13 A from the oxyanion hole. All j calculations were done for reference spheres of 15 A radii centered at th e corre- ' | sponding sites. Both M ethod I and II were used and the effect of th e reaction i field on the dielectric constant in trypsin was exam ined. T he calculations th a t ' ■ considered the R F effect were perform ed by the procedure described above. T he j calculations th a t corresponded to the omission of th e R F effect were perform ed j in a different way th an in the w ater cases. T h at is, as established in th e w ater test case, th e reaction field represents th e effect of th e solvent around the pro- ; tein. In general, one m ay choose to represent the solvent effect by E RF, or by an explicit solvent m odel, or by a com bination of the two. Since th e protein is not a spherical object, we chose to represent the solvent effect in regions 2b and 3b [ i _______________________________ xT h e active site of serine proteases is com posed of residue Asp 102, His 57, and Ser 195 i (w hich are referred to as the “catalytic tria d ”) and the oxyanion hole which involve tw o pro tein ! hydrogen bonds th a t is designed to provide electrostatic stab lizatio n to the reaction su b strate (see Ref. [31]). 93 by explicit w ater molecules while using E R F to represent regions 3a and 4. Thus, to om it the solvent effect, we “turned off” E R F and replaced the explicit w ater | molecules inside the reference radius w ith nonpolar van der W aals molecules (ob- ' tained here by neutralizing the residual charges of our w ater m olecules). In so J i doing, we m aintained the spherical notion and prevented the collapse of protein groups inside the reference sphere.2 MD trajectories of 50 ps at 300 K w ith tim e steps of 2 fsec were used to | calculate the e values. Once again, e’s for various subspheres were obtained ! ! and the results are sum m arized in Table 2.2, which shows the e values for the ' ■ [ ; three protein sites w ith and w ithout the solvent effect under M ethod I and II. > Fig. 2.5, 2.6 and 2.7 depict the results from M ethod I for th e three sites A, B ! and C, respectively. As shown in Table 2.2 and the figures, w ith the solvent i ; effect and under M ethod I, the e values for the 4 A subspheres appear to be j quite large in a site of catalytic im portance (e ~ 11 for the oxyanion hole) and ' quite small in a non-functional buried site (e ~ 5 for a site C). W ithout the 1 j solvent effect, the e value for the oxyanion hole decreased to ~ 6 as shown in i • Table 2.2 under M ethod I. However, the solvent appeared to have little effect 1 on the dielectrics of site C (see Table 2.2 and Fig. 2.7). H ere again we see I j th a t the calculations w ithout the solvent effect produced sm aller e’s under both , l , j m ethods, which indicate th a t for the protein systems a proper inclusion of solvent is essential for obtaining the actual dielectrics of any given region. In the present study, we only considered the perm anent dipoles. T he contri- | bution of th e induced dipole to M can easily be obtained by considering their | self-consistent interactions w ith the applied field and w ith each other, and fol- i lowing th e prescriptions m entioned after the derivations of Eqs. 2.16 and 2.20, , j i j respectively. However, such calculations requires more com puter tim e and were I | 2S im ilar b u t less stable results were obtained while leaving em p ty space in the solvent | cavities of regions 1, 2b and 3b. i ( 94 M ethod I M ethod II Site Label R i (A) e/,o e/ e//,o e// A oxyanion 4.0 6. 11. 2. 10. 6.0 6. 8. 3. 5. 8.0 3. 12. 2. 8. 10.0 3. 20. 2. 10. 12.0 2. 16. 2. 10. 14.0 2. 8. 2. 8. 15.0 2. 7. 2. 7. B His 57 4.0 3. 6. 4. 2. 6.0 4. 8. 1. 5. 8.0 4. 12. 3. 6. 10.0 3. 16. 2. 8. 12.0 2. 13. 2. 7. 14.0 3. 7. 3. 6. 15.0 3. 6. 3. 6. C Ser 54 4.0 6. 5. 5. 7. 6.0 6. 4. 4. 4. 8.0 3. 3. 4. 3. 10.0 3. 4. 3. 3. 12.0 3. 5. 3. 4. 14.0 3. 5. 2. 5. 15.0 3. 4. 2. 4. Table 2.2: T he dielectric constants evaluated at different sites of th e trypsin ! system w ith and w ithout the solvent effect, i?/ designates the radius of region 1 j of the indicated protein site. e / , a and e n t0 designate the e’s obtained by the indicated m ethods w ithout the solvent effect (see text for m ore details), w hereas i e/ and e/j designate the e’s obtained w ith the solvent effect (which included th e * reaction field E RF). The explicit p art of the trypsin system involves the protein, ; the tripepticle substrate (which is weakly constrained to the oxyanion hole), and th e w ater molecules w ithin a 15 A radius from its designated site. T he e’s were evaluated using th e two m ethods for the various sizes of region 1. P rotein atom s beyond the reference sphere were constrained harm onically to their X- ray positions and their electrostatic effects along w ith th a t of the bulk solvent beyond were represented by E R F associated w ith the m acroscopic polarization of th a t reference sphere. All the nonbonded interactions w ithin th e reference ! sphere were evaluated w ith no cutoff. All calculations involved 50 ps sim ulation I tim e. 95 Figure 2.5: Illustrating the effect of th e solvent on e for a site centered at th e ( oxyanion hole of trypsin. T he upper diagram shows the trypsin system and the corresponding region 1 for a site centered at site A (the oxyanion hole). The ! trypsin molecule is represented by its m ain-chain trace and th e w ater molecules 1 are plotted explicitly. The circle contains the heavy atom s found w ithin 4 A j from site A and is used to represent the case of which region 1 is a 4 A sphere. j T he lower diagram s give e as a function of the radius of region 1 (notation as in j Fig. 2.3, though only two circles representing R j of 6 A and 14 A are shown). This is to illustrate th e effect of the solvent on e for site A under M ethod I. T he ! sim ulation tim e is 50 ps. I I I * 1 I 96 Method I: Site A with RF □ w ithout RF 4 6 8 10 12 14 15 R, (A) Method I: Site B U with RF □ w ithout RF | Figure 2.6: Illustrating the effect of the solvent on e for a site centered at His 57 ; of trypsin. N otation and conditions as in Fig. 2.5. T Method I: Site C 2 0 - H with RF □ w ithout RF 5- 10 12 14 15 4 6 8 R t (A) Figure 2.7: Illustrating the effect of the solvent on e for a site centered at Ser 54 of trypsin. N otation and conditions as in Fig. 2.5. 99 left for subsequent studies (the corresponding correction is only significant for sm all e’s). An interesting issue th a t the present model can help to address is the fre- j I j quency dependence of the dielectric constant or the tim e dependence of th e j autocorrelation function (a.c.f.) (M(O)M (t)). This a.c.f. and the related a.c.f. of ; i i the fields from protein sites play a m ajor role in determ ining dynam ical effects in enzym atic reactions (see Refs. [29, 30]). Fig. 2.8 presents a prelim inary study of th e a.c.f. of C ^ ( t) obtained from the 4 A subspheres of the 12 A w ater system and the three different protein systems (all these calculations included th e effect ! i _ , of the solvent). As seen from the figure, the a.c.f. is different in different protein , sites as is expected from sites w ith different local environm ents. Nevertheless, | 1 the overall tren d is sim ilar to th a t of the surrounding solvent. To gain m ore in sight on the pure contribution of the protein to the Cjw(t), we evaluate this a.c.f. ! w ithout including the solvent effect in the calculations (this is done as before by replacing the w ater molecules w ith nonpolar van cler W aals molecules). T he | i results for the site A case are presented in Fig. 2.9, which dem onstrate th a t the ' protein perm anent polarization (which in the absence of the solvent is the m ajor j contribution to C m (^)) contributes significantly to Cjw(t) and cannot be treated ! as a fixed polarization. I T he analysis of the trend in the variation of the a.c.f. betw een different ; sites and the contribution of the protein perm anent dipoles to this a.c.f. is left j for subsequent studies while the present results are only presented as a further j support to the argum ent th at for proteins e is not a universal param eter and its actual value and tim e dependence depend on the given site. i 1 ! 2.4 Concluding remarks | ( ' t This work used microscopic MD sim ulation approaches to exam ine the m acro scopic dielectric constant, e, in different sites of a solvated protein. T he effect of 100 Figure 2.8: T he autocorrelation functions Cjvr(t) of th e to tal dipole m om ent for different protein sites w ith solvent effect. A com parison of th e autocorrelation functions C o f the total dipole m om ent for the 4 A subspheres of the 12 A w ater system (solid line) and the three trypsin system s (dashed, dotted, and dashdot lines for the site A, B, and C cases, respectively) all w ith solvent effect. I I 101 0.8 0.6 A I 0.4 s V 0.2 0.2 tim e (p s) Figure 2.9: The autocorrelation functions Cm (1) of the to tal dipole m om ent for the trypsin oxyanion hole w ith and w ithout solvent effect. A com parison of the autocorrelation functions C^f(t) of the total dipole m om ent for th e 4 A subspheres of the 12 A w ater system and the oxyanion hole of trypsin. T he w ater sim ulation (solid line) includes the effect of the surrounding solvent and the protein calculations are done both w ith (dashed line) and w ithout (dotted line) the effect of the solvent. The calculations w ithout the solvent effect reflect the contribution of the protein perm anent polarization to 103 0.8 0.6 0.4 0.2 0.2 0 2 4 6 8 ; ^ tim e (p s) j 1 th e reaction field of the surrounding m edium was included in the calculations. It was found th a t the e’s of reference regions in a protein can be significantly underestim ated if the effect of the solvent, or the corresponding reaction field, is neglected. This effect is not associated with the reductions of the effective charge-charge interactions in protein due to the high dielectric of the surround ing solvent, b u t w ith an increase of the t of the protein. This m eans th a t the \ dielectric constant of a protein site should be evaluated only when th e solvent around th e protein site is included either im plicitly or explicitly. In view of the m any conceptual and practical traps involved in the evaluation of the dielectric constants of proteins, we find it im perative to first verify th a t any m odel used for calculations of e in proteins gives a reasonable e in w ater. J A m odel th a t cannot reproduce a large value of e in w ater (e.g., a m odel th a t i j neglect E RF) m ight not be adequate for studies of e in proteins. In this study, we found th a t the problem associated w ith the correct im plem entation of the reaction field can be avoided by simply using a sufficiently large solvated system . U nder this condition, the e of the inner region approaches the value expected j from th e same region when the reaction field is used. I : This work used models th at incorporated the reaction field in a way th a t reproduce correctly the e of water. The same calibrated models indicated th a t the actual value of e in proteins depends on the specific site. In particular, we find th a t e can be as large as 10 in the catalytic active site of trypsin. More work is needed to be done on other protein system s to determ ine th e generality of this observation. T he reader m ay wonder at this point w hat is the m eaning of our e and w hether or not it can be used in calculations of electrostatic effects in proteins. T he answer to this question is far from being obvious. It is clear from th e present • calculations th a t e cannot be represented by a universal constant. Furtherm ore, as was argued extensively in Refs. [4, 9], e depends on the possible definitions of the protein dielectric constant; different definitions would yield different values. i i l I For exam ple, the e# of Eq. 2.1 and the ee/ / of Eq. 2.2 are significantly larger th an e. Perhaps the m ost obvious use of e is in K irkwood-Tanford (TK ) type treatm ents [5, 12] where the protein is viewed as a m edium of a uniform dielec tric constant surrounded by a high dielectric solvent. As discussed at length elsewhere [4, 10], the TK model cannot reproduce the energetics of charges in proteins w ith th e commonly used values of e = 2 or e = 4. Using e = e = 10 would give m uch b e tter results since the energies of charged groups in a uniform m edium follow th e trend of Ecp 2.1. The larger the e, the m ore stable is a given charge. However, th e TK m odel would never reproduce the correct results in cases when charges are more stable in their protein sites than in w ater (as long as one uses the same macroscopic radius for the charge in solution and in the i protein). T he above problem can only be resolved if the protein polarity is taken into account explicitly [9, 11]. This can, of course, be accom plished by using ex plicit microscopic models [4, 9] as well as semi-microscopic m odel introduced in Ref. [31] (see also Ref. [9]), or discretized continuum approaches [8, 32] provided th a t these approaches also take into account the protein perm anent dipoles [9]. T he semi-microscopic model expresses the energy of a charge in a pi'otein by 1 A G „, = ( £ - A G t 7 + AG Sl„.)(— - - L ) + (yog + V g j 2 - , (2.21) 2 tin *out tin I where A G *;^ is the solvation of the ith charged group in w ater at infinite distance \ from all other charged groups, A GsU r (which is usually evaluated by the Langevin j Dipole m odel [4]) is the difference in the solvation energy of th e protein by j the surrounding solvent upon charging the relevant groups in their actual sites, I , V q q represents the interaction between the charged groups, V qa i represents the J interaction between the charged groups and the protein perm anent dipoles, ein is | the dielectric constant assigned to the protein and eout is the dielectric constant I : of the surrounding solvent (see Ref. [9] for m ore details). It is tem pting to use the value of e obtained from microscopic calculations of a given region as the e;n for calculations of electrostatic energies in the same region. U nfortunately, such an approach is inconsistent w ith the derivation of Eq. 2.21. T he reason is th a t the 106 M ethod I M ethod II Site Ri (A) e/ e/z A 4.0 2. (3.) 2. (3.) 6.0 7. (8.) 4. (5.) 8.0 10. 6. 10.0 13. 7. 12.0 11. 7. 14.0 7. 7. 15.0 6. 6. I Table 2.3: T he dielectric constant evaluated at the oxyanion hole of trypsin j w ith inner perm anent dipoles neutralized, e/ and e// designate the e’s obtained 1 w ith the solvent effect by m ethod I and II, respectively. T he conditions for the ! calculations are identical to those described in Table 2.2 w ith th e exception th a t th e protein electroneutral groups w ithin 4 A from site A were neutralized (see tex t). In parentheses are the values of e when the protein induced dipoles are taken into account. J effect of the protein perm anent dipoles would be considered twice - once in the : calculation of e and the other in the evaluation of the charge-dipole interaction I term Vq,,. Of course, if the protein perm anent dipoles were com pletely fixed, we j could have identified e as em though this is never the case near ionized groups. | T he above problem may be overcome by evaluating t for a system w here all I | the protein perm anent dipoles of region 1 are neutralized. In this case e would ; only represent the contributions of the induced protein dipoles and th e internal i I w ater molecules. T he resulting e, which could be identified as e,n in Eq. 2.21, j can then be used in electrostatic calculations th a t take into account explicitly I the perm anent protein dipoles and their relaxations between the different charge j forms of the given solute. This idea was applied in a prelim inary calculation ! of e of site A (the oxyanion hole), in which the residual charges of the protein ; groups w ithin 4A from this site were set to zero. The results are sum m arized i in Table 2.3, which shows an overall decrease in the values of e as com pared to those of site A in Table 2.2. The m ost significant change occurs when region 1 107 I corresponds to th e same volume (4 A sphere) in which the protein perm anent ' I dipoles are neutralized. In th at case, e decreased to about 2 (and to about 3 when th e induced dipoles are considered) as com pared to 10 for the same case in Table 2.2. These results are reasonable and in fact, as was pointed out before [2], a protein can be viewed as a nonpolar solvent as long as its permanent dipoles and the surrounding solvent are considered explicitly and not as a p art of the dielectric ! effect. A pparently, the dielectric constant represents the effects which are not included explicitly in a given model. Thus, if all the perm anent and induced l dipoles (solute and solvent) are included explicitly and their reorganizations are j accounted for in a m odel, the dielectric constant would be reduced to unity. \ It is interesting to note th a t e may depend on the ionization state of th e ' ; protein groups. T h at is, the presence of a charged group in region 1 m ay force a j ! local unfolding, leading to penetration of w ater molecules. This m eans th a t the e value estim ated from sim ulations of the uncharged state may underestim ate the value of e th at should be used in macroscopic calculations of the charged state I (provided th a t th e calculations use the structure of the uncharged state and do j not tre a t the w ater molecules explicitly). { Finally, it should be emphasized th at in view of th e com plicated n atu re of the dielectric constant in proteins and the m any possible traps involved in its definition and im plem entation [9], it is always im portant to try to estim ate elec- I i tro static energies by microscopic approaches th a t tre at the protein and the sol- j vent explicitly, thus avoiding the concept of a dielectric constant altogether [9]. | Nevertheless, considering the error range and the slow convergence of the mi- ; ! croscopic approaches, it is im portant to exploit the im prove precision associated 1 j w ith the use of Eq. 2.21 (see Ref. [9]). Here the possibility of obtaining from consistent microscopic sim ulations m ight prove to be a useful approxim ation. 1 I 108 Reference List [1] P erutz, M .F. 1978. Science 201:1187. [2] W arshel, A. 1978. Proc. Natl. Acad. Sci. 75:5250. [3] W arshel, A. 1981. Acc. Chern. Res. 14:284. [4] W arshel, A. and Russell, S.T. 1984. Quart. Rev. Biophys. 17:283. [5] M atthew , J.B . 1985. Ann. Rev. Biophys. Biophys. Chern. 14:387. [6] Hoi, W .G .J. 1985. Prog. Biophys. Molec. Biol. 45:149. [7] Harvey, S.C. 1989. Proteins 5:78. [8] Sharp, K. A. and Honig, B. 1990. Ann. Rev. Biophys. Biophys. Chem. 19:301. [9] W arshel, A. and Aqvist, J. 1991. Ann. Rev. Biophys. Biophys. Chem. 20:267. [10] W arshel, A., Russell, S.T. and Churg, A. K. 1984. Proc. Natl. Acad. Sci. 81:4785. [11] W arshel, A. 1987. Nature 330:15. [12] Tanford, C. and Kirkwood, J.G . 1972. J. Am. Chem. Soc. 79:5333. [13] Harvey, S.C. and H oekstra, P. 1972. J. Phys. Chem. 76:2994. [14] Purcell, E. M., 1965. Electricity and Magnetism M cGraw-Hill, New York. [15] Kirkwood, J.G . 1939. J. Chem. Phys. 7:911. [16] Gilson, M.K. and Honig, B.H. 1986. Biopolymers 25:2097. [17] N akam ura, H., Sakamoto, T. and Wada., A. 1988. Protein Eng. 2:177. [IS] Simonson, T., Perahia, D. and Bricogne, G. 1992. J. Mol. Bio. in press. 109 [19] N eum ann, M. 1983. Mol. Phys. 50:841. [20] N eum ann, M. 1985. J. Chem. Phys. 82:5663. [21] Steinhauser, O. 1983. Chem. Phys. 79:465. [22] Levy, R„ M. and Alper, H. E. 1989. J. Chem. Phys. 91:1242. [23] King, G. and W arshel, A. 1989. J. Chem. Phys. 91:3647. [24] Onsager, L. 1936. J. Am. Chem. Soc. 58:1486. [25] Buckingham , A. D. 1956. Proc. R. Soc. London Ser. A 238:235. [26] Frohlich, H. 1958. Theory of Dielectrics Oxford University, London. [27] King, G, Lee, F.S. and W arshel, A. 1991. J. Chem. Phys. 95:4366. [28] W arshel, A., Sussm an, F. and King, G. 1986. Biochemistry 25:8368. [29] Hwang, J-K ., Creighton, S., King, G., W hitney, D. and W arshel, A. 1988. J. Chem. Phys. 89:859. [30] W arshel, A., Sussman, F. and Hwang, J-K. 1988. J. Mol. Biol. 201:139. [31] W arshel, A., Naray-Szabo, G., Sussman, F. and Hwang, J-K . 1989. Bio chemistry 28:3629. [32] Warwicker, J. and W atson, H. 1982. J. Mol. Biol. 157:671. 110 Chapter 3 A Local Reaction Field M ethod for fast evaluation of long-range electrostatic interactions in molecular simulations 3.1 Introduction j E lectrostatic interactions play m ajor role in biological and chemical processes [1-3] and the ability to evaluate electrostatic energies accurately is probably the m ost im portant requirem ent for quantitative structure-function correlation of bi- i ological molecules [1, 2]. U nfortunately, electrostatic interactions are long-range in n atu re and their evaluations present a fundam ental problem . One obvious option is to use a very large cutoff radius but the num ber of interaction pairs increases as N 2 (for a system of N atom s) and the calculations can becom e im practical. Thus m ost microscopic calculations of free energies of m acrom olecules i involve small cutoff radii and the corresponding electrostatic contributions m ight 1 be inaccurate. T he long-range problem may be reduced by using the extended Ew ald m ethod [4] or by related N log N m ethods [5, 6]. These m ethods, however, are hard to ( im plem ent in m acrom olecular calculations and their accuracy was not yet es tablished in q uantitative calculations of electrostatic free energies. T he present 111 work reexam ines the extended Ewald m ethod and th en develops quite a differ- j ent strategy which resulted in a very simple yet powerful m ethod for treatin g I long-range electrostatic effects. This m ethod, which is referred to as th e Local ! R eaction Field (LRF) m ethod, gives surprisingly accurate results in evaluating ' free energies of macromolecules. j I I j 3.2 The Local Reaction Field method i . : ! In order to understand the rational behind the LRF m ethod, it is useful to I I . . . . . . | consider first the ideas involved in other effective strategies. In particular, it is ; i instructive to consider the extended Ewald m ethod th a t inspired the developm ent { of the present approach. The extended Ewald m ethod considers a collection of i | N charges and expresses the electrostatic potential at the ith site, r;, as a sum of a short-range potential, $ s(rj), and a long-range potential, $ ((r;), using the identity , $(r* ) = U f ( r*j) ! i j = Y 2 + 1 2 i1 - £(rb)]/(ro) ' i i ’ - < h s( ri) + * /(r,-), (3.1) I where r ij is the distance between the ith and j t h charges, /’(r 1?) is a function | th a t describes th e interaction between the indicated atom pair and 5 (r,j) is a \ scaling function th a t decays fast and contributes only at short distances. Thus 1 the short-range potential can be evaluated by considering its contribution w ithin j a relatively small cutoff range (e.g., 6-7 A) and the problem reduces to the | evaluation of the long-ra.nge potential, which is approxim ated by an expansion i , of th e form I | ^/(*"j) = ^ C 'lm n f l m n (? 'i) V im (^i i H i; ; (3-2) I I inn 112 where the / and y are expansion functions, r,- is the length of the vector r ; and x yi, Z { axe the corresponding C artesian components. The m ain tim e-saving step i of this m ethod is associated w ith the fact th a t the expansion coefficients Cimn I can be eva.lua.ted by running only once over the N charges of the system . In I this way, we can evaluate the long-range force and energy of each charge in the system by considering T /(rz) at the site of th a t charge and this procedure requires N evaluations of th e corresponding potential of Eq. 3.2 (whei’ e the expansion coefficients are already available from a previous loop of the order N). Thus the m ethod is of th e order of N x q where q is the num ber of expansion coefficients (q = I x m x n). U nfortunately, the function th a t represents the long-range potential at a given atom also involves contributions from groups which are located at a short distance from th at atom (since / is a continuous function). Thus a precise description of the long-range potential requires th e evaluation of m any term s in th e expansion potential. Furtherm ore, in the general case one m ay need to use m any expansion centers in order to obtain accurate results. Considering this problem, we introduce a variant of the extended Ew ald ap proach, whose basic ideas are as follows: The N charges of th e system are divided into M groups (typically electroneutral groups) and the electrostatic potential at the a th charge of the zth group is divided into a short and long-range potentials, i.e., $(r“) = 4><(rf) + <MC) • (3-3) | T he short-range potential is simply the sum of th e electrostatic contributions ; from the groups inside the cutoff Rcot, i.e., I * .( r f ) = E E 4 + E a'a when (Rij < Rcut) 5 j (i r q a'^a a where r i s the distance between the a th charge of the ith group and th e /3th charge of the j t h group and is the distance between the centers of the *th j and j t h groups (the center of the *th group is taken as R ; = £ a qf)- j The long-range potential is given by I / ! * .( r f ) = E E - 3 » when (Rij > Reid) • (3-5) 3 P 1 ij With, a large enough R cut, we can approxim ate T/frf) by an expansion potential w ith only a few term s. To obtain accurate results, we allow each of the M groups of th e system to form a local expansion center so th at each group can be con sidered a.s a center for a reaction-field type treatm en t (see below). For exam ple, i in the w ater system depicted in Fig. 3.1, we consider each w ater molecule as a j group. In this case, the interaction between w ater 1 and w ater 2 is included in I while th a t betw een 1 and 3 is included in $/. T he to tal for each w ater i j m olecule represents the effect of all the molecules outside R cut- j T he expansion potential, used here to approxim ate th e of Eq. 3.5 involves the first four term s in a Taylor series about the center of each group and is given lay $/(rt -) ~ $/(*“ ») = w r . + v ^ O R y y + £ e f i v f « > •? t t ' t + E E E h v ‘" v‘V ‘$,)R. M (rf Srf , (3.6) I t" t> t D i where R ; is the center of the ith group, t and t' run over the C artesian com ponents, Srf — (rf — Rf ), and ( ) r ; indicates th at the corresponding term is evaluated at R t (see Fig. -3.2). V* = d/drf where d$i/drf indicates th a t Eq. 3.5 i is differentiated w ith respect to rf (the same results is obtained for any a). T he long-range electrostatic energy of the zth group can now be w ritten as c > i Figure 3.1: A schem atic picture of a system of w ater molecules illustrating the basic idea behind the LRF m ethod. Each group in this case is a w ater molecule and its center forms a local expansion center for the long-range potential in the LRF m ethod. Thus for exam ple, the coulombic potential at the oxygen atom of 1 molecule 1 is a sum of a short-range potential due to the groups inside the shaded ! circle (e.g., molecule 2) and a long-range potential due to the groups outside the shaded circle (e.g., molecule 3). the j th group the / th group -oc I / I / " origin Figure 3.2: A schem atic diagram of the positions of the charges of th e z'th and j t h groups of the system relative to an expansion center. In this case, the center of 2th group is an expansion center for the long-range charges of th e j th group. 116 ! where Q4, /z4, ©i and Oi are, respectively, the monopole, dipole, quadrupole i and octopole m om ents about the center of the zth group (see Refs. [7, 8] for the expressions of these electric m ultipole tensors), while ( £ ;) r ; and ( F /) r . are, respectively, the long-range contributions to the electric field and field-gradient j at R, and (Ff*'*")r . are the tensor com ponents corresponding to th e long-range I contribution to the gradient of the field-gradient at R;. In th e sim ple case when I th e ?'th group is a dipole (i.e., there is no net monopole, quadrupole or octopole ! m om ent about the center of the ith group), Eq. 3.7 reduces to ! u j = - n , (£ ,)r, (3.8) so th a t UJ is given by the product of the dipole of the zth group and the field at th e center of the group. T he total energy of the system is now given by £/<...! = | E I > f [ * ■ « ) + *'( I’j )] T V v d w T U b o n d in g i (3-9) ! where X 5 bonding is the bonding interaction between the different fragm ents of the j system described by a standard force field [9] and Uvdw is the non-electrostatic ' van der W aals interaction evaluated w ithin the given R cut. T he contribution from the induced dipoles of the system can also be considered and tre ated as a p art of th e electrostatic potential (see below), j M olecular sim ulation studies require one to evaluate the forces in addition to j th e energy. The long-range force, f/, acting on a charge q located at r.( (near R,) j is given by m inus the gradient of the corresponding potential energy, i.e., i / / ( r.) = -(dU,/d(6rl))tt ' = + E I f 6 r‘‘ + E E S r f 6rf} j t< t ’ t " ^ = 7^/(u) = r , } , (3.10) 1 1 where J f = (VjVfVf ...V?$/)Rj and t j ( n ) = -(V j$ j)r < is th e correspond- , ing com ponent of the long-range electric field at r,. Note th a t — T /, —Tf' and 117 — T l1 '1 " are, respectively, the corresponding (£f)R,i5 (F™')^ and (Fltt't")jii of Eq. 3.7. The T f 1 '1 " term is not essential for the evaluation of the long-range i potential energy but m ight be quite im portant in the accurate evaluation of the long-range forces (see below). Since Eq. 3.10 corresponds to the first four term s in Eq. 3.7, we refer to it as a third-order expression. I T he T -tensors in Eqs. 3.7 and 3.10 are updated once in every L tim e steps in the sim ulation. Thus these quantities do not reflect the fluctuations of the long-range charges during this tim e period. However, it appears th a t the long- l range field and field-gradient fluctuate rather slowly and follows in some respects a reaction-field type polarization. Thus we may keep the long-range polarization j constant for a rath er large num ber of tim e steps. In fact, one way to rationalize I th e present m ethod is to consider the fluctuations of the field at th e Tth group (see Ref. [10] for the related dispersed polaron approach and then to filter the fourier com ponents of the fast oscillations. ! The order of the present m ethod is I t j order ~ ((lY X M X q) + M X P' X q X L + M X P' X P X L)/ L = (T V x M x q)/ L - f Af x q + TV x P i | - TV x \{M x q)/L + P] , (3.11) where P' is th e average num ber of atom s in each group (so th a t P' x M = N ), P is the average num ber of atom s w ithin the cutoff region while q is related to the I I num ber of term s in the expansion potential. The first term in Eq. 3.11 represents | the tim e needed to evaluate at the center of th e ith group, the second term j represents the tim e needed for using the expansion of Eqs. 3.7 and 3.10 (whei'e the ] expansion coefficient are already available) and the last term represents the tim e needed for the short-range calculations. W hen L is sufficiently large, th e order of the m ethod approaches TV x P which is sim ilar to th a t of th e corresponding 1 cutoff m ethod. The calculations can be accelerated by using larger groups and j evaluating < ! > / using the m ultipole expansion of each group and the distance , |R j — R j| betw een the groups, rather th an sum m ing over all the particles in the j th group. This will reduce the N x M x q term in Ecp 3.11 to M x M x q x q. j I U nfortunately, such a procedure th at resembles the N log N m ethods results in 1 a rath er inconvenient program m ing and is not justified as long as L is large and I the second term of Eq. 3.11 is the leading term . Furtherm ore, as long as one j uses sm all groups this approach is not justified at all. ; It m ight be instructive to state here th a t the idea to deal w ith long-range i interactions in m acrom olecular calculations is not new [4, 6, 11]. However, to ( j the best of our knowledge, none of the previously published procedures has suc- i ceeded in reproducing accurate long-range electrostatic energies and forces in , I ' j m acrom olecular sim ulations. For exam ple, Ref. [11] recom m ends an energy cor- j rection term th a t involves the product of the dipole associated w ith an atom ic | displacem ent and an approximated expression for the field due to distant groups, j This is, in fact, an attem p t to w rite the first term in the interaction energy | of d istan t groups. Our approach, on the other hand, involves the exact field, j field-gradient and the gradient of the field-gradient (due to all the long-range . groups) at th e center of each group. Not only th a t the field-gradient and th e , I higher derivative are essential in order to obtain accurate results (see below), ! bi.it our simple procedure avoids the need to create a com plicated list for the I interaction between groups which are outside the cutoff range. In fact, despite ! { the apparent sim plicity of our approach, we are not aware of any related m odel i ! or any dem onstration of th e perform ance of a long-range model in calculations | I I of solvation energies of charges in macromolecules. ; ! T he present work sim ulates the w ater and protein charges using the SCAAS j m odel [12] which is im plem ented in the program ENZYM IX [13]. T he specific \ im plem entation of the w ater and protein models are described, respectively, in , Refs. [12] and [14]. T he protein boundary conditions are specified by three cutoff radii R 2, R 3 and R4 which correspond, respectively, to the radius of the unconstrained spherical protein region, the radius of the explicit w ater sphere 119 and the radius of the surrounding grid of Langevin dipoles (see Ref. [15]). The calculations reported here involve w ater atom s th a t are not polarizable (i.e., using the param eters of Set A of Ref. [12]). On the other hand, we still attach induced dipoles to the protein atom s. Neglecting the w ater induced dipoles while using som ew hat larger perm anent charges appears to be a reasonable approxim ation for ground state processes, but the same approxim ation is not so reasonable when one deals w ith charges in protein interiors. T he LRF m ethod is incorporated into ENZYM IX by retaining the regular cu t off approach (where each electroneutral group interacts w ith all the groups inside i the specific R cut) and then adding the long-range potential energy of Eq. 3.7 and ! the corresponding forces of Eq. 3.10. This treatm ent involves a m inor m odifica tion in treatin g the protein induced dipoles. T h at is, the interaction betw een the protein induced dipoles w ithin R cut is evaluated by the non-iterative approach of Ref. [16] (see also Ref. [14]) which allows one to obtain the corresponding j forces in a convenient way. This approach approxim ates the interaction betw een i ] the induced dipoles by scaling the field on each induced dipole w ith a screening ! constant and th e same philosophy has been adopted recently by other group [17]. 1 In this way, the energy of the induced dipoles is given by A G ind = - 1 6 6 £ ] T 2 f K (r“) + €?(r“)P , (3.12) i a a where 7 “ is the polarizability of the zth atom , d is the screening constant and ! £° designates the indicated field from the perm anent charge distribution of the system . W hile this expression includes both the short and long-range fields, we find it reasonable to neglect the contribution from the long-range field to the forces experienced by the induced dipoles. Note, however, th a t the short-range forces on the induced dipoles are included in our sim ulations. I 1 I i 1 1 ! 120 3.3 Results and discussions ; ! i Before exam ining the perform ance of the LRF m ethod in MD sim ulations, it is ; crucial to explore the reliability of this m ethod in reproducing the correct electro- : static energies and forces of systems with different sizes, charge distributions and | assum ed configurations. This is done here in two steps. The first step considers both th e long-range potential energy and forces of a simple system th a t contains j two w ater molecules separated by about 10 A. Table 3.1 compares th e results ! j obtained w ith the second-order expression to the corresponding exact energy and I forces. Table 3.2 shows the results of the same test except th a t the third-order ! expression is used. As shown in Table 3.2, the third-order expression gives an extrem ely good approxim ation for the exact coulombic energy and forces of th e \ system . A lthough the second-order expression may not reproduce th e long-range forces as good as the third-order expression does, we feel th a t it can provide the correct energetics of the system . T he second step of the study considers a wa ter sphere and two protein molecules, and compares the third-order long-range j contributions obtained using different R cut to those obtained using no cutoff. i This study is sum m erized in Table 3.3, which indicates th a t the use of a R cut of ; 7 A is sufficient to obtain an error of less than 1% on th e very large long-range ' | contribution (which is neglected in truncation treatm ents th a t use such a cutoff . radius). An even smaller cutoff can be used if more expansion term s are included t b u t such a com plicated treatm ent m ight not be justified. As a first step in the analyses of the perform ance of the LRF m ethod in MD ■ f j sim ulations, we consider the self-energy1 of a sodium ion in w ater. An adia- I batic charging F E P procedure [14, 15] is used for the evaluation of the given ♦ 1 self-energy. This study compares the self-energy obtained w ith an approach th a t uses no cutoff (referred to here as the “no-cutoff m ethod”) to th a t obtained w ith 1T h e self-energy of a charge in an hom ogeneous m edium is defined here as th e electrostatic free energy associated w ith changing the charge from zero to its actu al value in th e given ’ J m edium . 1 121 (A) Potential energy Atom E xact Energy LRF E n erg y 1 -0 .2 5 2 2 3 - 0 .2 5 2 1 9 2 0.09167 0.0 9 1 5 1 3 0.13534 0.13547 4 0.63940 0.63933 5 -0 .3 7 3 5 9 -0 .3 7 3 5 7 6 -0 .2 9 1 0 2 -0 .2 9 0 8 2 T o t a l -0 .0 5 0 4 3 - 0 .0 5 0 2 7 (B) F o rc e : Atom E xac t F o rc e LRF F o r c e 1 - 0 .1 0 2 4 0 -0 .1 0 1 7 3 1 - 0 .0 6 1 7 0 -0 .0 6 1 2 3 1 - 0 .1 0 1 6 0 - 0 .1 0 0 6 6 2 0.03128 0 .0 3 0 3 6 2 0.02722 0.02812 2 0.04414 0.04542 3 0.05581 0 .0 5726 3 0.02338 0 .0 2 3 9 5 3 0.03791 0.0 3 8 7 9 4 -0 .2 5 1 1 7 - 0 .2 5 2 0 8 4 -0 .0 2 5 4 6 - 0 .0 2 5 1 5 4 -0 .0 0 4 5 6 -0 .0 0 4 0 2 5 0.15862 0.15604 5 0.01117 0 .0 1 2 7 6 5 0.01726 0 .0 1 5 3 6 6 0.10787 0.10673 6 0.02539 0.0 2 8 2 4 6 0.00686 0.0 0 6 8 0 Table 3.1: A com parison of the potential energy and force obtained w ith th e 2nd-order LRF m ethod and Coulom b’s law. Energies and forces are in given kcal m ol-1 kcal m ol-1 A \ respectively. T he system involves two wa ter molecules about 10 A apart. The potential energy and force of th e system axe evaluated by the Coulom b’s law and the second-order LRF m ethod. 122 (A) Potential energy Atom E xac t Energy LRF E nergy 1 -0 .2 5 2 2 3 -0 .2 5 2 2 3 2 0.09167 0 .0 9169 3 0.13534 0.13532 4 0.63940 0 .6 3940 5 -0 .3 7 3 5 9 - 0 .3 7 3 6 1 6 -0 .2 9 1 0 2 - 0 .2 9 1 0 6 T o t a l -0 .0 5 0 4 3 - 0 .0 5 0 4 9 (B) F o rc e : Atom E x act F o rc e LRF F o r c e 1 -0 .10240 - 0 .1 0 2 3 6 1 - 0 . 06170 - 0 .0 6 1 6 8 1 - 0 . 10160 -0 .1 0 1 5 4 2 0.03128 0.03112 2 0.02722 0 .0 2 7 1 8 2 0.04414 0.04402 3 0.05581 0.05593 3 0.02338 0.02323 3 0.03791 0.0 3 7 7 7 4 -0 .2 5 1 1 7 - 0 .2 5 1 1 1 4 -0 .0 2 5 4 6 - 0 .0 2 5 5 1 4 -0 .0 0 4 5 6 - 0 .0 0 4 5 4 5 0.15862 0.15859 5 0.01117 0 .0 1 0 9 8 5 0.01726 0.01703 6 0.10787 0.1 0 7 6 9 6 0.02539 0.0 2 5 5 9 6 0.00686 0 .0 0 6 8 5 Table 3.2: A com parison of the potential energy and force obtained w ith the 3rd-order LRF m ethod and Coulom b’s law. T he same study as in Table 3.1 except th a t the third-order LRF m ethod is used. 123 I System Rcut--6 Rcut,— 7 A Rcut— 8 - A - exact approx. exact approx. exact approx. W at 15 -293.96 -293.66 -192.94 -193.00 -119.79 -119.79 B P T I -31.79 -31.54 -15.72 -15.52 -11.17 -11.10 Lyz -268.04 -265.28 -174.01 -173.81 -145.91 -145.96 Table 3.3: T he sensitivity of the long-range potential obtained w ith th e LRF m ethod to the cutoff radius. A comparison of the contribution of the long- ; range energies (in kcal m ol"1) evaluated by the exact expression (Eq. 3.5) to { those obtained by the LRF m ethod. The table considers several cutoff radii | and different charge configurations. W atl5 is a snapshot of a w ater sphere of ' radius of 15 A generated by a MD sim ulation. B P T I and Lyz are, respectively, - the X-ray structures of the bovine trypsin pancreatic inhibitor and lysozyme i protein molecules in vacuum. For the water system, each electroneutral group is ! a. w ater molecule. For the protein systems, each protein residue is divided into j several electroneutral groups (e.g., the CaHNH, CO and CH2OH fragm ents of a : serine residue are treated as three electroneutral groups). An interaction betw een two given groups is considered to be a long-range interaction when the distance j betw een the centers of these groups is larger than or equal to the specified R cut. I Note th a t the long-range contribution is very large even when Rc u t— 8 A. This ; contribution is neglected in regular “truncation m ethods” th a t do not consider ■ the contribution beyond the given R,cut ■ i I l l I 1 I 124 the LRF m ethod w ith a 7 A cutoff for the short-range potential. We also consider 1 x the results obtained w ith a 7 A cutoff but w ithout any long-range contribution (this approach is referred to here as the “truncation m ethod” ). T he accuracy and precision of these m ethods are studied by perform ing a series of M D /F E P sim ulations w ith different initial configurations, where the average and standard I j deviation of the results reflect, respectively, the accuracy and precision of the j calculations. In addition, the average hysteresis’s (the free energy difference i betw een the forward and backward integrations of the corresponding F E P calcu lation) of the different m ethods are also compared. The system studied involves a sodium ion surrounded by a sphere of water molecules sim ulated by SCAAS m odel [12]. The radius of the w ater sphere is 12 A, which included about 250 | w ater molecules. T he free energy calculations involve 11 m apping steps of 2 ps each and tim e steps of 2 fs. Thus each M D /F E P calculation involves a 22 ps 1 trajectory. The tem perature of the system is kept around 300 K. T he simula- ! tions are repeated w ith 3 independent sets of initial configurations so th a t the I accuracy of the calculations can be assessed. The results of the three approaches I are com pared in Table 3.4. Not surprisingly, the truncation m ethod gives very | poor results as reflected by its accuracy and precision. This m ethod also have the , largest hysteresis. This finding illustrates once again the lim itations of conven- 1 j tional truncation procedures in correctly evaluating electrostatic energies. On ! the other hand, the results obtained with LRF m ethod are as accurate as those | obtained w ithout any cutoff but at a a sixth of the tim e when the no-cutoff cal- ] dilations are done in the scalar mode of the same processor. For sm all system s (N < 1000), the LRF m ethod reaches the upper lim it of its speed which is th a t of the truncation m ethod. For larger systems (N > 5000), the LRF m ethod can j reduce the tim e by a factor of 40 or more (when com pared to th a t of th e no-cutoff m ethod) if we update the long-range potential every 160 fs. A lthough we obtain reliable results w ith a 160 fs interval per update, we feel th at an 80 fs interval is 1 ] an optim al choice and is used in the rest of the calculations. It is also instructive I 125 t Truncation LRF No-cutoff Case 1 Case 2 Case 3 -96.6 -82.4 -83.3 -82.1 -82.4 ^C Tbulk -13.7 -13.7 -13.7 -13.7 -13.7 & Getec -110.3T2.4 -96.0T0.2 -97.0T0.3 -95.8T2.6 -96.1T1.4 hysteresis 6.4 1.5 2.7 1.4 5.0 cpu tim eO 1.7 1.8 1.7 1.6 4.8 (10.3) ! Table 3.4: Com paring the results of the truncation m ethod, the LRF m ethod > and the no-cutoff m ethod for the average free energies of charging a sodium ion in I water. Energies are given in kcal m ol-1 . All the reported quantities are obtained by averaging the corresponding F E P results of 3 independent 22 ps trajectories (th a t involve 11 m apping steps of 2 ps each) generated from different initial con- j figurations of the system . T he system involves a sodium ion surrounded by a 12 A w ater sphere th a t includes about 250 water molecules. The truncation m ethod ; uses a cutoff of 7 A for the nonbonded interactions between the w ater molecules. 1 T he LRF m ethod uses a R cut of 7 A and updates th e long-range potential every I 40 (Case 1), 80 (Case 2) and 160 (Case 3) fs w ith tim e steps of 2 fs. T he table t gives the different contributions to the total calculated free energy. A G ™ tec is I th e to tal electrostatic free energy, A Gq^ is the contribution from the interaction j betw een the ion and the explicit w ater molecules and A Gbuik is the contribution ■ from th e bulk. The hyteresis is the absolute value of the free energy difference in , th e forward and backward integrations of the corresponding M D /F E P calcula- j tion. (a) The cpu tim e (in hr) taken by different approaches to generate a 22 ps j trajecto ry for the evaluation of A G ^ ec on a St ardent R3000 processor. Note th at the no-cutoff approach is optim ized by vectorization procedures. In parenthesis is the cpu tim e when obtained using a scalar mode. I I I I 1 i 126 I to notice th at the LRF m ethod appears to be the m ost stable m ethod, giving th e highest precision and smallest hysteresis among the three m ethods. I A fter establishing the LRF m ethod as a fast and reliable approach for the evaluation of electrostatic free energies in water, we exam ine the perform ance of this m ethod in macromolecules. This is done here by evaluating th e self-energies of the four acidic residues (Asp 3, Glu 7, Glu 49 and Asp 50) of B P T I. This test case presents a m ajor challenge for microscopic calculations [2], since the I self-energies of these ionized acids are almost identical while th e microenviron- I m ents around these acids are very different. To correctly reproduce the alm ost exact com pensation between the different electrostatic conti’ ibutions is a m ajor challenge to any approach th a t deals with the energetics of solvated proteins. In the present sim ulations, the radii R 2, R3 and R4 of the protein m odel (see previous section and Ref. [15]) are taken as 13, 18 and 18 A, respectively. U nder ' these conditions, there are between 550 to 650 w ater molecules and no Langevin dipoles in the calculations. In addition, the self-energies of th e aspartic and ; glutam ic acids in w ater are calculated. In each case, 8 independent 22 ps MD I trajectories are used in the statistical analyses. The self-energies of the aspartic and glutam ic acids in w ater and Asp 3 and Glu 7 in solvated B P T I are also eval- I j uated by the no-cutoff m ethod to allow a comparison between th e two m ethods. 1 Since the no-cutoff m ethod is extrem ely tim e-consum ing for these evaluations, j only 3 trajectories each of 22 ps duration are generated for the statistical anal- I yses in each case. T he results of these calculations are shown in Table 3.5 and ! the LRF m ethod is once again able to reproduce the corresponding results of ! the no-cutoff m ethod. The LRF m ethod is estim ated to be about 8 tim es faster from the no-cutoff results obtained in the scalar mode of the same processor ! (note th a t the corresponding protein calculations using the truncation m ethod I j are about 12 tim es faster). As indicated in Table 3.5, the LR F m ethod seems to have a sm aller hysteresis as compared to the no-cutoff m ethod. Though the j stan d ard deviations can be as large as 3 kcal m ol-1 (i.e., individual M D /F E P 127 Table 3.5: Average electrostatic energies (in kcal m ol-1 ) of charging th e aspartic | and glutam ic acids in w ater (A G ™ lec) and the acidic residues of B P T I (A Gv elec). T he table compares the results obtained with th e LRF m ethod and th e no-cutoff m ethod. The w ater calculations consider an aspartic or glutam ic acid surrounded [ by a 12 A w ater sphere th at includes about 245 w ater molecules. T he radii i?2, 1 R s , i?4a and R .U j of the protein model are taken as 13, 18, 18 and 18 A, respec- ! tively. U nder these conditions, there are about 550 to 650 w ater molecules in ; i the different cases and no Langevin dipoles. All the reported quantities are ob tained by averaging the results of 8 (3 for the no-cutoff case) independent 22 ps , M D /F E P trajectories generated from different initial configurations of th e sys tem s. T he LRF m ethod uses a R cut of 7 A and updates the long-range potential ! every 80 fs w ith a tim e step of 2 fs. The table gives the different contributions j to the total calculated free energy. A G qp and AGq^ are the contributions from j the interactions between the carboxy group of the relevant acidic residue w ith : j the protein and w ater atom s, respectively. A Gga is the contribution from the j interaction betw een the protein induced dipoles with the perm anent charges. 1 j A Gbuik is th e contribution from the bulk, (a) The cpu tim e (in hr) taken by the ! two approaches to generate a 22 ps trajectory for the evaluation of A Gv elec on a St ardent R3000 processor. In parenthesis is the cpu tim e obtained using a scalar mode, (b) T he observed value of A A G™ ^cp (in parenthesis) is obtained from the corresponding observed p K a using [2] pK% = pK™ + A A G ^ C P/(2.3R T). 1 129 LRF no-cutoff Asp 3 Glu 7 Glu 49 Asp 50 Asp 3 Glu 7 In water a o :l e c -72.O i l .1 -70.9i2.0 -70.9±2.0 -72.O il.1 7 1 .4 il.7 -71.2i2.7 hysteresis 1.7 1.3 1.3 1.7 3.8 2.5 In protein A < ^ -14.2 -10.0 -33.3 -39.4 -13.6 -5.6 A -48.2 -49.8 -27.8 -25.0 -49.7 -52.2 AGS„ -1.1 -2.3 -0.5 0.8 -0.4 -3.0 A Gbuik -9.4 -9.0 -9.4 i G O C O -9.3 -9.1 A Gp elec -7 2 .9 il.6 -71.1±3.0 -71.O il.6 -72.5i2.0 -73.O il.8 -69.9i2.7 hysteresis 1.9 0.8 1.3 1.7 3.9 3.1 cpu tirne^) 13.4 15.4 15.2 16.9 56.1 (112.7) 59.4 (121.7) AA GZZm -0.9 (-1.3) -0.2 (-0.6) -0.1 (-0.3) -0.5 (-0.3) -1.6 1.3 Asp 3 Glu 7 Glu 49 Asp 50 k-Gbuik -15.0 -45.4 -1.3 -9.5 -6.5 -51.2 -3.1 -9.1 -34.9 -25.4 -1.0 -9.5 -36.8 -31.6 0.8 -8.9 -71.2 -69.9 -70.8 -76.5 hysteresis 2.1 0.2 1.6 0.3 Table 3.6: LRF calculations of the self-energies of the acidic residues of B P T I obtained from single prolonged trajectories. E lectrostatic free energies (in kcal m ol-1 ) of charging the acidic residues of B P T I obtained w ith the LRF m ethod. Each quantity is obtained from a single 110 ps M D /F E P trajectory. O ther sim ulation conditions are identical to those used in Table 3.5. calculations m ay give results th at are differed by up to 6 kcal m ol-1 ), the average self-energies of th e four acids in B PT I are very similar. The results obtained are by far b etter than those obtained in any of our earlier studies [2]. Apparently, th e LR F m ethod has m et the above challenge in a cost-effective way. At this point, the readers may suggest th at single MD trajectories of long enough duration m ight produce results th a t are as accurate and stable as those obtained by the above averaging procedure of shorter sim ulations. To investigate this point, we repeat the LRF calculations done in Table 3.5 under the same conditions except th a t each M D /F E P calculation is of 110 ps duration (i.e., 11 m apping steps of 10 ps each). The results of this study, which are shown in Table 3.6, are less encouraging considering the results for Asp 50. A lthough we repeat this study w ith a different set of initial configurations, the results are once again less satisfactory (results not shown) than those of Table 3.5. A pparently, the averaging procedure is able to give results th at b etter represent th e ensemble averages of a, system. 130 3.4 Concluding remarks T he need for very large cutoff radius in sim ulation of solvated m acrom olecules and ionic solutions is only partially recognized. T hat is, alm ost everyone involved in such sim ulations is aware of the need for a large cutoff but m any practitioners j are not fully aware of the very large errors involved in using truncation m ethod. In fact, some m ight believe th at the rigorous use of free energy pertu rb atio n ( m ethods is sufficient to give reliable electrostatic energies even w ith cutoff radii around 7 A. This problem is associated in part w ith the enormous com puter tim e needed for F E P calculations th at use very large cutoff radii and also w ith the fact th at onl}'' few have attem pted to obtained accurate electrostatic energies of charges in protein interiors [2]. T he present LRF m ethod offers a practical m ethod for treating long-range electrostatic effects in microscopic simulations. This m ethod reproduces th e re sults obtained w ithout a cutoff while reducing the necessary com puter tim e by two orders of m agnitude. In addition to being fast and accurate, it appears to be very stable. Thus we are able to obtained our best results to d ate for the | pKa s of the acidic groups in B PT I (see discussion in Ref. [2]). Sim ilar im proved results are obtained in recent LRF studies of other electrostatic problem s in cluding ligand binding and enzyme catalysis. In view of these results, we expect ■ this m ethod to provide a m ajor im provem ent in quantitative studies of m acro molecules. The LRF m ethod can be incorporated into simplified electrostatic approaches th a t do not involve all-atom solvent models. For example, this m ethod has been j used to accelerate the Protein Dipoles Langevin Dipole (PDLD) m ethod [2] and is now im plem ented in the PDLD version of the program PO LA RIS [13]. In this case, we exploited the fact th at the solvent molecules are represented by dipoles and use Eq. 3.8 for the evaluation of the long-range potential. This rath er sim ple m odification increases significantly th e perform ance of the program for large system s w ith m any solvent molecules. 131 A nother im portant aspect of the present m ethod is its ability to provide re- i liable results for very large systems w ithout using periodic boundary conditions (such boundary conditions are not justified in m any electrostatic problem s [10]). This should allow one to explore some fundam ental electrostatic problem s in cluding the nature of the dielectric effect in macromolecules [18] and help to clarify how to incorporate correctly the microscopic reaction field in microscopic sim ulations [10, 18]. 132 Reference List i i < i [1] W arshel, A. 1981. Acc. Chem. Res. 14:284. > [2] W arshel, A. and Aqvist, J. 1991. Ann. Rev. Biophys. Biophys. Chem. 20:267. I [3] Sharp, K. A. and Honig, B. 1990. Ann. Rev. Biophys. Biophys. Chem. | 19:301. [4] K uw ajim a, S. and W arshel, A. 1988. J. Chem. Phys. 89:3751. [5] Barnes, J. and H ut, P. 1986. Nature 324:446. [6] G reengard, L and Rokhlin V. 1989. Chemica Scripta 29A:139. [7] Buckingham , A.D. 1959. Quarterly Reviews (Chem. Soc., London) 13:189. | [8] Buckingham , A.D. 1978. Perspectives in Quantum Chemistry and Biochem istry, Vol II, ed. Pullm an B., p .1-67, Wiley, New York. [9] W arshel, A. 1991. Computer Modeling of Chemical Reactions in Enzymes \ and Solutions, Wiley, New York. I [10] W arshel, A. and Hwang, J.K . 1986. J. Chem. Phys. 84:4938. [11] Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D .J., Sw am inathan, • S. and K arplus, M. 1983. J. Comp. Chem. 4:187. [12] King, G. and W arshel, A. 1990. J. Chem. Phys. 93:8682. [13] W arshel, A. and Creighton, S. 1989. Computer Simulation of Biomolecular Systems, ed. W .F. van G unsteren, P.K. Weiner, p .120-138, Leiden: ESCOM . j [14] W arshel, A., Sussman, F. and King, G. 1986. Biochemistry 25:8368. ' [15] Lee, F.S., Chu, Z.T., Bolger, M and W arshel, A. 1992. Protein Engineering | in press. : [16] W arshel, A. and Levitt, M. 1976. J. Mol. Biol. 103:227. 133 [17] S traatsm a, T.P. and M cCammon, J.A . 1990. Chem. Phys. Lett. 167:252. [18] King, G, Lee, F.S. and W arshel, A. 1991. J. Chem. Phys. 95:4366. 134 UMI Number: DP22138 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a com plete manuscript and there are missing pages, th ese will be noted. Also, if material had to be removed, a note will indicate the deletion. Disswtaiion Publishing UMI DP22138 Published by ProQ uest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQ uest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United S tates Code ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 4 8 1 0 6 -1 3 4 6
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Computer simulation of polar solvents with the SCAAS model
PDF
Calculations of electrostatic interactions in proteins
PDF
Computer simulation of chemical reactions in aqueous solutions and biological systems
PDF
Consistent evaluation of binding free energies and study of the role of electrostatic effects in the stabilization of protein complexes
PDF
A new approach to the problem of measuring the properties of micelles
PDF
Adrenalectomy and fat absorption
PDF
¹⁹F-NMR studies of trifluoroacetyl insulin derivatives
PDF
Chemical, physicochemical, and biological studies on the mucoproteins of plasma and serum
PDF
A study of the effect of glycogen on the oxidation of butyrate by rat liver slices
PDF
Chiroptical spectroscopy of nitrogenase
PDF
A study of the ketone body metabolism of excised rat tissues with respect to labile phosphate content
PDF
A critical analysis of ideology and discourse in two hotels
PDF
A novel reaction of mismatched cytosine-cytosine pairs associated with Fragile X
PDF
Building a model of organization acculturation: an interpretive study of organizational culture and stories
PDF
Crystallographic stydies on Zn(2), Mn(2) and Pt(2) complexes of phosphonates
PDF
Computer simulation studies of charge transfer through biological and artificial membrane channels
PDF
The role of water in the heat denaturation of egg albumin
PDF
An experimental investigation of amino compounds of boron hydrides
PDF
Changes in the rate of net synthesis of oral tissue collagen with age and wound healing
PDF
Computational investigation of the catalytic aspartic acids of HIV-1 protease
Asset Metadata
Creator
Lee, Frederick Siklun
(author)
Core Title
Computer modeling and free energy calculatons of biological processes
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Chemistry (Chemical Physics)
Degree Conferral Date
1992-05
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
chemistry, biochemistry,chemistry, physical,OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest
(provenance)
Advisor
Warshel, Arieh (
committee chair
), [illegible] (
committee member
)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c17-33605
Unique identifier
UC11348115
Identifier
DP22138.pdf (filename),usctheses-c17-33605 (legacy record id)
Legacy Identifier
DP22138.pdf
Dmrecord
33605
Document Type
Dissertation
Rights
Lee, Frederick Siklun
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Tags
chemistry, biochemistry
chemistry, physical