Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Computer simulation of polar solvents with the SCAAS model
(USC Thesis Other)
Computer simulation of polar solvents with the SCAAS model
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
COMPUTER SIMULATION OF POLAR SOLVENTS WITH THE SCAAS MODEL by Gregory King A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY {Chemistry--Chemical Physics) December 1989 Copyright 1989 Gregory King UNIVERSITY OF SOUTHERN CAUFORNIA THE GRADUATE SCHOOL UNIVERSITY PARK LOS ANGELES, CAUFORNIA 90089 fh.P . c h KSX 3 W 3 D A /a This dissertation, written by G REGORY K IN G under the direction of h...l$..... Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of re quirements for the degree of DOCTOR OF PHILOSOPHY Dean of Graduate Studies Date . SeptemberiJ< A - < 1 - 9 .? .? ........ DISSERTATION COMMITTEE Chairperson To Lois Signature A c k n o w le d g e m e n ts I would like to thank my parents for passing on to me an appreciation for think ing and learning. I would also like to thank Arieh Warshel for teaching me a pragmatic approach to science. 1 1 1 CO NTENTS D edication ii A cknow ledgem ents iii List of Tables v List of Figures vi A bstract vii 1 The SC A A S M odel 1 1.1 In tro d u ctio n ................................................................................................ 1 1.2 The SCAAS M o d e l.................................................. 5 1.2.1 The Polarization C o n s tra in t...................................................... 9 1.2.2 The Radial C o n s tra in t................................................................ 13 1.2.3 Implementation of the Constraint F o rc e s ............................... 16 1.2.4 The Total Potential S u rfa c e ...................................................... 16 1.3 Properties of Bulk W a te r......................................................................... 19 1.4 Results and D iscussion............................................................................. 28 1.4.1 Structural and dynamical properties ..................................... 28 1.4.2 Solvent polarization around charged solutes ........................ 31 1.4.3 Free energy calculations ............................................................ 38 1.4.4 Dielectric properties ................................................................... 43 1.5 Concluding R e m a rk s ................................................................................ 48 1.6 R eferences.................................................................................................... 49 2 C om puter Sim ulation of Solvated P rotein s 53 2.1 In tro d u ctio n ........................................... 53 2.2 The Protein Sub-M odel........................................................ 55 2.3 Adapting the SCAAS Model for Large Protein Solutes ......................59 2.3.1 The Polarization Constraint . . . ......................................... 60 2.3.2 The Radial C o n s tra in t................................................................ 60 iv 2.3.3 Size Dependence ........................................................................ 61 2.4 Calculation of pK^s of the acidic groups of B P T I ................................62 2.4.1 M e t h o d ......................................................................................... 65 2.4.2 D iscussion...................................................................................... 71 2.5 R eferences.................................................................................................. 77 A The Beem an N um erical Integration A lgorithm 79 A .l R e fe re n c e ........................................................................................ 81 B Evaluation of piTa’s 82 B.l R eferences.................................................................................................. 86 C E valuating Free Energy D ifferences w ith M ultistage Sam pling 87 C .l R eferences.................. 90 D O ptim izing Fortran Code for M achines w ith V ector P rocessing C apabilities 91 E Selected Bibliography 97 List of Tables 1.1 W ater parameters used in the MD simulations..................................... 20 1.2 The SCAAS constraint parameters. . ............................................... 25 1.3 Sodium and chloride ion param eters....................................................... 34 1.4 Sodium ion solvation free energy.................................................................. 40 1.5 Dielectric constant as a function of Crf .....................................................46 2.1 Sources of the parameters used in the protein simulations.....................58 2.2 AAG^o i of the Glu-49 acidic group of BPTI as a function of the SCAAS sphere radius................................................................................. 62 2.3 The pica’s of the four acidic groups of the protein B PT I.......................64 2.4 Parameters used for the COOH group............................ 68 2.5 A multistage sampling “charging” calculation...................................... 69 2.6 Results of calculations of A AGp a o l for the acidic groups of BPTI. . 72 2.7 A breakdown of the contributions to A AGp sol from the different dielectric sources in the model................................................................. 76 B .l Free energies of the steps involved in the acid dissociation reaction of a free acid molecule in aqueous solution. . . ; .................................. 83 B.2 Free energies of the steps involved in the acid dissociation reaction of a protein acid group.............................................................................. 85 V I List of Figures 1.1 An illustration of the three regions of the SCAAS model. . . . . . 5 1.2 The constraint angles (8?) as a function of the index i ...........................11 1.3 The constraint distances (R°) as a function of the index j ....................14 1.4 The oxygen-oxygen radial distribution function, ^(r), of water. . . 22 1.5 (A) The center of mass velocity autocorrelation function, C r(£), of water. (B) The power spectrum, Pr (u > ) (Pr (uj) is the cosine transform of Cr(£)), for translational motion of w ater...................... 24 1.6 The effect of the Langevin force on the power spectrum obtained with the SCAAS model............................................................................. 25 1.7 (A) The autocorrelation function of 8, Cg(t). (B) The power spec trum , Pff(u), for angular motion. P$(u>) is the cosine transform of Ce(t)............................................ 26 1.8 Radial distibution function for pure water obtained with parame ter set II, which includes induced dipoles interactions....................... 27 1.9 Radial distribution functions for pure water obtained from SCAAS simulations of two systems with radii of: (A) 8 A; (B) 10 A. . . . 29 1.10 The angular distribution function, p(0), for pure water........................30 1.11 Angular distribution functions for pure water obtained from SCAAS simulations of two systems with radii of: (A) 8 A; (B) 10 A. . . . 32 1.12 Velocity power spectra of pure water obtained from SCAAS sim ulations of two systems with radii of: (A) 8 A; (B) 10 A ......................33 1.13 Angular distribution function for a system containing a central chloride ion (9 A sph ere)........................................................................... 34 1.14 Angular distribution function for a system containing a central chloride ion ((A) = 8 A sphere, (B) = 10 A sphere). . ................. 35 1.15 The average charge-dipole energy for water around a central chlo ride ion as a function of distance. (A) A comparison of results from SCAAS spheres of different sizes. (B) A comparison of SCAAS and FCLD results........................................................................................ 37 vii 1.16 Solvation free energy of a sodium ion in water as a function of the parameter A , comparing results from SCAAS spheres of different sizes................................................................................................................ 41 1.17 Solvation free energy of a sodium ion in water as a function of the parameter A , comparing results from SCAAS and FCLD simulations. 42 1.18 Total dipole moment autocorrelation functions, f°r PBC and SCAAS systems.............................................. 46 2.1 An illustration of the model used in our molecular dynamics sim ulations of protein-water systems............................................................ 56 2.2 The calculated solvation free energy differences A A Gp a o l for the four acidic groups of BPTI as a function of the multistage sam pling parameter A........................ 75 2.3 A breakdown of the contributions to A AGp tol from the different dielectric sources in the model............................................. 76 A b str a c t The reliable computer simulation of ionic or strongly polar solutes in polar solvents presents a major challenge from both fundamental and practical aspects. The frequently used method of Periodic Boundary Conditions (PBC) does not correctly take into account the spherical symmetry of the electric field of the solute. Instead of using PBC, it is natural to model this type of system as a sphere (with the solute at the origin), but the boundary conditions to be used in such a model are not obvious. Early calculations performed with our spheri cal Surface Constrained Soft Sphere Dipoles (SCSSD) model indicated that the solvent molecules near the surface of the sphere tend to display unusual orienta tional preferences. In particular, these molecules will overpolarize in an electric field. This effect is caused by the presence of a solvent-vacuum interface in the model, which in a real system of course does not exist. We thus implemented a constraint in the SCSSD model to correct for this overpolarization effect. In more recent approaches that treated the surface with stochastic dynamics (but did not take into account the surface polarization effects), the molecules near the surface were also found to exhibit this aberrant behavior. In the present work a Surface Constrained All-Atom Solvent (SCAAS) model is developed in order to consistently treat the surface polarization effects in all-atom molecu lar dynamics simulations. The SCAAS model introduces surface constraints as boundary conditions in order to make the necessarily finite system behave as if it was part of an infinite system. The performance of the model with regard to various properties of bulk water is examined by comparing its results to those obtained from PBC simulations. The results obtained from SCAAS models of different sizes are found to be similar to each other and to the corresponding PBC results. The performance of the model in simulations of solvated ions is examined, and it is shown that the model does not possess any significant de pendencies on the size of the sphere used in the simulations. This indicates that the model can be used with a relatively small number of solvent molecules to obtain reliable information on the energetics, structure, and dynamics of polar solutions. A preliminary study of the dielectric properties of the SCAAS model is reported, and the problems associated with the implementation of the solvent reaction field in the potential surface is discussed. The SCAAS model is then incorporated into a more sophisticated model for simulating protein-water systems. The minor alterations needed for adapting the SCAAS model from small solutes such as sodium ions to large solutes such as protein molecules are discussed. The accuracy of this protein-water model is examined by employing it in the calculation of the Pj KVs of the four acidic groups of Bovine Pancreatic Trypsin Inhibitor (BPTI). Although the error range of ~ 5 kcal/mol associated with these calculations is larger than we had hoped for, the results are not totally unsatisfactory considering that the calculations require the evaluation of free energy differences of magnitude 70-80 kcal/mol. x Chapter 1 The SCAAS M odel 1.1 In tr o d u c tio n Many chemical and biological processes occur in polar solvents. Computer sim ulations of such processes provide an extremely useful source of detailed infor mation which is often difficult to obtain directly from experimental studies. It is necessary to assume some sort of model system for these computer simulation studies, and it is therefore desirable to make the model as physically realistic as possible. One of the most im portant considerations is the treatm ent of the boundaries. For simulations of homogeneous systems, Periodic Boundary Con ditions (PBC) is the natural method to employ. There are many systems for which PBC cannot be used, however, because they possess inhomogeneities of one sort or another. As a specific example, systems containing an ionic solute (to be addressed in this dissertation) fall within this category. The inapplicability of PBC to these and other types of systems has stimulated considerable work in the design of new simulation schemes that do not use PBC (Warshel, 1978; Warshel, 1979; Adelman, 1980; Adelman & Brooks III, 1982; Berkowitz & : McCammon, 1982; Warshel, 1982; Adelman, 1983; Brooks III & Karplus, 1983; Brunger et al, 1984; Belch & Berkowitz, 1985; Warshel & King, 1985) 1 Alternatives to the PBC method are faced with the problem of trying to represent a virtually infinite system with a finite system. There are two issues to address in regard to this problem. First, it is essential to include the average contribution to the forces and energy from the part of the system not explicitly included in the model. Second, it is im portant to include the dynamical effects of the missing part of the system, so that the fluctuation properties of the simulated system are similar to those of the real system. The non-dynamic part of the problem was addressed with the Surface Con strained Soft Sphere Dipole (SCSSD) model (Warshel, 1978; Warshel, 1979). This early model represented polar solvents with a limited number of Stockmayer dipoles. The packing and orientation of the solvent dipoles was adjusted from an arbitrary initial configuration through the application of an energy minimization algorithm, and the effect of the missing part of the system was approximated by freezing the dipoles at the surface in random orientations prior to the energy minimization procedure. This constraint on the surface dipoles prevented the over polarization that would otherwise have occurred. The SCSSD model was used in studies of chemical processes in solutions and proteins, and, since it used a very simplified representation of the solvent, overcame some of the inherent convergence problems present in computer simulations. For example, in the cal culation of energy differences associated with the formation of a charged species, a convergence error of about 5 kcal/mol (which corresponds to a relative error of less than 10%) was obtained with a small amount of computer time. Early attem pts to extend the SCSSD model to molecular dynamics (MD) simulations with an all-atom water model used a surface of fixed water molecules (Warshel, 1982). This approach placed its main emphasis on correct solvent polarization and energetics, and gave only secondary attention to dynamical effects (which appear to have little influence on solvation free energies). The correct modeling of therm al fluctuations has been emphasized in the Molecular Timescale Generalized Langevin Equation (MTGLE) approach of Adelman (Adelman, 1980; Adelman & ; Brooks III, 1982; Adelman, 1983). This 2 method simulates reactions in condensed phases by partitioning the system into the few species directly involved in the reaction, coupled to an effective heat , bath using a generalized Langevin formalism. The MTGLE method appears to • be useful for studies of nonpolar solvents, where frictional factors play a larger | i role than solvation effects. Unfortunately, this is not the case for polar solvents, j where solvation effects can change the rate constants of charge transfer reactions ' i . : by many orders of magnitude, and frictional effects are of secondary importance, j In another approach (variations of which are reported in Berkowitz & Me- j j Cammon (1982), Brooks III & Karplus (1983), Brunger et ah (1984), and Belch ] & Berkowitz (1985)) the system of interest is modeled as a sphere of molecules I with a radius of ss 10 A. The outer » 1 A shell of the sphere is designated as a buffer region, and molecules in this region obey a stochastic (Langevin) equation of motion rather than the normal deterministic Newtonian equation of motion. The stochastic buffer region provides a sink and source of thermal energy, and thus acts as a heat bath for the system. In an initial version of 1 | I : the model (Berkowitz & McCammon, 1982), an additional 4 A shell of frozen I i molecules outside of the buffer region was included in order to prevent evapora- t tion of molecules from the system. This implementation of a fixed surface in MD ! simulations was not much different than the approach used in Warshel (1982) for ! i | MD studies of polar solvents (with the exception of the stochastic treatm ent for ! I achieving a constant tem perature instead of the velocity-scaling method used in , Warshel (1982)). The model was subsequently modified (Brooks III & Karplus, j 1983; Brunger et al, 1984) by replacing the potential due to the atoms in the frozen region with an effective boundary potential, which significantly reduced the amount of computational time needed for a given simulation. j It was found, however, that when this model is used for polar solvents, the | molecules near the surface of the sphere exhibit strong orientational preferences j ] (Belch & Berkowitz, 1985). This effect is due to the absence of the electrostatic , | interactions these molecules would normally have with molecules outside of the j < sphere. j In view of the above considerations, it seems crucial for spherical models designed to be used for simulations of polar solvents to include a constraint to correct the physically unrealistic orientational preference shown by the molecules near the system boundary, and also to take into account the correct coupling to the thermal bath which was emphasized by both the MTGLE method (Adelman, 1980; Adelman & Brooks III, 1982; Adelman, 1983) and the subsequent stochas tic boundaries models (Berkowitz & McCammon, 1982; Brooks III & Karplus, 1983; Brunger et al., 1984; Belch & Berkowitz, 1985). This was done in the Sur face Constrained All-Atom Solvent (SCAAS) model (see Warshel & King (1985) for a preliminary report, and Warshel et al. (1986), Warshel & Hwang (1986), and Hwang et al. (1988) for recent applications). In this chapter we give a detailed description of the SCAAS model. The constraint on the surface polarization is emphasized, but the model also includes a constraint on the radial positions of the surface molecules. The main features of the model are delineated in section 1.2, where the constraints are introduced, and the complete potential energy surface of the model is described. In section 1.3 we present some of the im portant properties of a bulk water system obtained with the SCAAS model. In section 1.4 we compare several different types of properties obtained from SCAAS simulations with spheres of different sizes, with some emphasis placed on the performance of the model in simulations of solvated ions. These simulations with different size spheres are done in order to demonstrate that the model does not possess significant size dependencies. We also examine the validity of the much simpler Fixed Center Langevin Dipoles (FCLD) model (Warshel & Russell, 1984) by comparing the solvation energetics and distance- dependent polarization histograms obtained with the FCLD and SCAAS models. A preliminary study of the dielectric properties of the SCAAS model is also presented. 4 Figure 1.1: An illustration of the three regions of the SCAAS model. Region (a) contains the solute and its first few solvation shells. Region (s) is the surface region, and molecules in this region are subject to the constraint forces described in the text. Region (b) is the bulk region, which is represented by a continuum model. 1.2 T h e S C A A S M o d e l In order to clarify our method it is useful to consider Fig. 1.1. The figure shows an ion surrounded by a sphere of polar solvent molecules (water in this case). The radius of the sphere, Rc, is usually chosen to fall within 8 A and 10 A. The system is formally divided into three regions: region (a) contains the solute and its first few solvation shells. This region is a sphere with a radius of Rc — 1.5 A. Region (s) is a spherical shell of solvent molecules surrounding region (a), w;hich 5 extends from Rc — 1.5 A to R c A. Region (b) is the bulk solvent, which extends from R to infinity. The potential energy of the system may be written as V U r « ,r.,r f c ) = V aa(ra) + Va‘(va, r.) + V ab(va, rb) + V™(r.) + V * ( r .,r i) + V“ (r6) (1.1) where Va® denotes the interaction between atoms in region a and region /?, and ra are the cartesian coordinates for atoms in region a. Our model treats the subsystem (a+s) explicitly, representing (l/°a + "F“* + VM) with conventional two- body atom -atom interaction potentials, and in some cases including many-body effects by considering the induced dipoles of the system. The potentials V ab, V*b, and V b h are represented by effective potentials designed to make the subsystem (a+s) behave as if it was actually part of an infinite system (a+ s+ b). V ab can be approximated with one of the equations used to describe the interaction of a charge distribution with a dielectric continuum. For example, when the system possesses a net charge, the Born formula (Born, 1920), V ab = used. Q is the net charge of the solute, Rc is the radius of the “cavity” surrounded by the continuum region (b), and e is the dielectric constant of the continuum region. Onsager’s expression (Onsager, 1936), V ab = — where fi is the total dipole moment of the solute-solvent system, is used for dipolar solutes. Kirkwood’s expression (Kirkwood, 1934) is used for more complicated charge distributions. The following considerations are used to devise the effective potential V sb. We focus on a given set of variables Y° in region (s) of the complete system (a+ s+ b). For example, Y° could be the coordinates of the molecular centers of mass, or the molecular dipole moment vectors, etc. The index j runs from 1 to N , where N is the number of molecules in region (s), and the superscript (°) is used to indicate that we are referring to the complete system (a+ s+ b). A phenomenological approximation for the equation of motion of Y f is: ^ = _ d(V“ + y * ) = _ K . {Y, _ _ ^ . y . _ A i) (1 2) 6 which, is the equation of motion of a Brownian harmonic oscillator (Wang & Uhlenbeck, 1945). Ky is a force constant, /iy is a generalized mass (i.e. ny(Y°)2 has units of energy), and fiyAy is a random force. The first term in Eq. 1.2 rep resents a short-time harmonic motion, where Y°{t) is the current “equilibrium” value of Y° . The value of the effective force constant K y can be approximated by K°y = M K ) 2) (1.3) where the mean square frequency {(u>y)2) is calculated from the power spectrum of the relevant motion: <(“ r )2) = (L4) Jo Pyi^jdu; Py(“ ) = Jo" cos(u} t)dt (1-5) c m = (Y°(0)Y°(t)) (1.6) The effective friction constant 7y is obtained by combining the two relations 7y = (Einstein, 1905) and Dy — ksTf£° Cy(t)dt(fiy{(T^5 )2))-1 (McQuarrie, 1976) to yield . = ((Y°Y) ( 1 7) and the random force Ay is related to 7y through the second fluctuation-dissipa- tion theorem (Kubo, 1966): r dt{Ay(0)Ay(t)) = 2 J y ^ s L ( 1.8) J - 0 0 fiy where (Ay (O)Ay(f)} is the autocorrelation function of A y , and n is the number of degrees of freedom possessed by Y. The random force used in our MD simulations is Markovian (i.e. there is no correlation between the random forces at successive time steps), which means that (Ay(O)Ay(t)) is a delta function of width At, where At is the time step used in the MD simulations. Thus the integral on the l.h.s. of Eq. 1.8 is equal to ((Ay)2)A t . Since ((Y°)2) = n h ^ r , Eq. 1.8 may be expressed equivalently as: {(Ay)2)At — 27y ((y °)2) (1.9) The approximate equation of motion for the same set of variables Yj in the truncated system (a+s) may be written in an analogous way to eq. 1.2 as d(Vaa + V a a 4- V ab) * yYi = = ~ Kv{Yi ~ Yi{t)) ~ ‘ir(-'n Y * ~ Ay) (L10) where Ky and 7Y will in general be different than Ky and 7y, respectively. By performing a long-time average, the distribution function p(Y), which gives the probability of finding a given Yj between Y and Y + dY, can be ob tained. Our aim is to find the effective constraint forces, f f b = — 7^ , that will bring p(Y) as close as possible to the correct distribution function, p°(Y). To accomplish this, we define a set of constraint variables, (Y°) , with the equation r(Y7) 3 = 1 P°(Y)dY (1.11) * Y m » n in which j is an integer between 1 and N (where N is the total number of variables described by the distribution function p°(Y)), and Ymin is the minimum value that Y can assume. The (Yf) are then used to define a constraint potential i '* = £ ^ 0 5 - ( * ? » ! t 1-12) Note that the constraint variables obey the relation (Y°) < {Y°+1) (since p°(Y) is never less than zero), so the actual variables Yj should be sorted at every step of the simulation to satisfy Yj < Yj+\ before Eq. 1.12 is used. This sorting procedure minimizes the constraint potential V ab, and allows the Yj variables to diffuse freely through Y-space. If the distribution functions p(Y) and p°(Y) still do not agree when the constraint potential V sb of Eq. 1.12 is used, correction terms, A Yj, are added to the {Y°). The parameters A Yj are taken as AY; = (1 7 ) - (Yj) (1.13) where (Yj) are long-time averages of the sorted Yj (i.e. (Yj) < (Yj+1)). V ab is now given by = V - 2 3 In addition to satisfying p{Y) = p°{Y), it is im portant to obtain the correct fluctuation properties for the surface region. This can be done by adding a Langevin force to f j b , so that the total /*6 is given by f t = = - * r ( Y i - (1?) - - P rtfrY j - A!Y) (1.15) Eq. 1.7 can be used to get a trial value for the friction constant 7y . This value is then adjusted to bring the power spectrum of Y into agreement with the power spectrum of Y°. The adjusted value for 7y is usually much smaller than 7y, since most of the friction is already supplied by the molecules explicitly included in the simulation. The mean square value of the random force, ((A1 )2), is set using Eq. 1.9. The complete equation of motion for the variables Yj in the surface region of the SCAAS model is: n Y j = _ 8 (y " + V‘‘) - Ky(Yj - (Y;) - A Yj) - M -ty Y j - * , ) (1.16) Next we will consider the implementation of this approach for Yj = Rj}, where & i are the polarization angles of the surface molecules (defined below), and Rj are the radial distances of the surface molecules from the center of the sphere. 1.2.1 T h e P o la riza tio n C o n stra in t We first define the angle 6 of a solvent molecule with the relation p • R = |/i||ii[ cos 0, where p, is the solvent molecule’s dipole moment vector, and R is the displacement vector from a reference point (the origin) to the molecule (see Fig. 1.1). In an infinite system of pure solvent molecules, the probability of finding a molecule with an angle between 8 and 8 + d8 is proportional to sin#. This sin# dependence is a result of the fact that # does not uniquely specify the orientation of the dipole moment vector; the molecule may be rotated about R without altering #, and the circumference of the circle traced out by the dipole 9 moment vector through such a rotation is proportional to sin 8. The angular distribution function, p°(8), for a sample containing N molecules is p ° ( & l = Y sind (L17) N = p°(6)d6 (1.18) Jo For any finite subset of molecules in an infinite system, the distribution of angles will in general not obey Eq. 1.17 exactly at a given time, but (by the ergodic hypothesis) a time average of the angles of these molecules will yield Eq. 1.17. To obtain an average angular distribution in agreement with Eq. 1.17, we follow the procedure outlined in the previous section and apply a quadratic constraint potential of the form vr‘ ( 0 < ) = ^ ( 0( - (*?) - A 0()! (1.19) to each of the N molecules in the surface region (s). The reference angles (8°) are obtained by integrating Eq. 1.17: 1 N /“ < * ? > i = — H / sin8d8 (1.20) 2 2 Jo in which i takes on integral values from 1 through N . (Inclusion of the constant term | in Eq. 1.20 makes the distribution of angles symmetric about |. ) Solving Eq. 1.20 for {8f) yields < 0f) = arccos ( l + 1 — (1.21) A plot of (8°) vs. i is given in Fig. 1.2. When the solute is an ion, the distribution of angles will be shifted somewhat from the sin 8 dependence of Eq. 1.17. We therefore must find a more general set of constraint angles th at reflects the average polarization of the molecules in region (s) of the complete system (a+ s+ b). The exact distribution function p°{6) for the solvent molecules around a charged solute is not known, but it is clearly dependent on the vacuum field, £, of the solute. Computer simulations 10 m c « T 3 a o — CD. 3 2 1 1 0 2 0 3 0 5 0 4 0 Figure 1.2: The constraint angles (0°) as a function of the index i. Here the number of molecules in the surface region is 58, which is typical for a 9 A sphere containing a total of 119 water molecules. 11 and formal studies (Warshel & Russell, 1984) indicate that the average solvent polarization can be approximated by the Langevin function, which is usually used to describe the average polarization of dipolar molecules in the gas phase in the presence of an electric field (Hill, 1960), but which has also found application in condensed phase theory (Chelkowski, 1980). Thus, we describe the polarization of the surface molecules with the relation (cos 8) % L(X) = coth(X ) - ~ (1.22) JL (L23> where fi is the magnitude of the dipole moment of the solvent molecule. The function represents the effective field of the solute, in which the vacuum field £(J?) = where Q is the charge of the solute and R is the distance from the solute to the solvent molecule, and d(R) is a screening function that accounts for the attenuation of the field of the solute by the surrounding solvent molecules. The nature of the screening function d(R) and its correct asymptotic value are of fundamental interest (see Appendix 2 of Warshel & Russell (1984)). For the purpose of obtaining reasonable surface boundary conditions it is sufficient to use the d(R) that produces the best fit to the polarization law obtained from simulation studies. In this work we follow Warshel & Russell (1984) and use d(R) = 1 + R. The new set of constraint angles should satisfy cos(«f) = I ( X ) (1.24) Inserting {8f) = {6°)* + Si into Eq. 1.24, where (8°)* are the zero-field (8°) of Eq. 1.21 and Si are small correction terms, yields = » £ (* ) (1.25) where we have assumed that cos Si fa 1, and sin Si & Si, and made the substitution jf S i l i cos{8°)* = 0. One would expect the correction terms Si to follow the 12 relation Si = Asin(#°)* (where A is a constant), since the torques exerted on the solvent molecules by the electric field are proportional to sin 8. Substituting Si = Asin{0°}* into Eq. 1.25 gives - f S L = (1.26) which means that A = — | L(X). Finally, the expression for the generalization of Eq. 1.21 can be written as: ( 0 = (L27) The A 8i terms are initially set to zero. If after a sufficiently long simulation the averages (9i) do not approach the corresponding (#f), then A& i are set to A8i = (8°) — (8i) in accordance with Eq. 1.13. The force constant Kg of Eq. 1.19 is obtained from the relation Kg = Je//{u>j), in which the effective moment of inertia, Ieff, is taken as (Iilz)1^2, where 7i and J2 are the principal moments of inertia orthogonal, or most nearly orthogonal, to the solvent molecule’s dipole moment. The mean square angular frequency, (u>J), is calculated from the power spectrum of 8 in accordance with Eqs. 1.4-1.6. The complete equation of motion for the variables & i in the surface region of the SCAAS model, including the frictional force, is: d(Va‘ 4- V"! Ieff6i = ------L _ Ks{e. _ (£*) _ A 0.) _ - A!9) (1.28) 1.2.2 T h e R ad ial C on strain t Nonperiodic models such as the SCAAS model will not m aintain the correct solvent density unless a radial constraint potential of some sort is used. Various forms of this potential are used in different models (Adelman, 1980; Adelman & Brooks III, 1982; Adelman, 1983; Berkowitz & McCammon, 1982; Brooks III & Karplus, 1983; Brunger et al., 1984; Belch & Berkowitz, 1985). In our approach the radial distances of the surface molecules are constrained with quadratic po tentials of the form Vf^Rj) = - (it”) - AR6f (1.29) 13 •< j L a. v 9 7 5 3 4 0 8 0 1 2 0 Figure 1.3: The constraint distances (R°) as a function of the index j. The solid curve and dotted curve correspond to using the g(r) of Soper & Phillips (1986), and g(r) = 1, respectively. Note that the two curves are nearly identical for R > 7 A. The reference distances (Rj) are obtained by integrating the radial distribution function, £f(r), of the solvent: r(Rj) j = I 47rpp(r)r2dr (1.30) Jo The integer j represents the average number of molecules contained within a sphere of radius {R°} (excluding the molecule located at the origin), and p is the bulk number density of the solvent. Fig. 1.3 shows the dependence of (R°) on j when the radial distribution function of water at 25°C (Soper & Phillips, 1986) is used in Eq. 1.30. When the substitution g(r) = 1 is made, Eq. 1.30 becomes (Rj) = ( ^ ) ^ • This approximate solution is also shown in Fig. 1.3. Note that, 14 except for small r, the approximate {R°) and the exact (R°) are nearly identical. We therefore use g(r ) = 1 in order to simplify the model. This simplification has only a small effect on the (R°), and allows the model to be extended to solvents whose radial distribution functions are not known. For systems other than pure solvent, Eq. 1.30 must be modified to account for the possible difference in size of solute and solvent molecules. A more general relationship is: j - nsoiute - 1 + ^ 7 Tp(R°)s (1-31) where the param eter ni0iute (which is not necessarily an integer) represents the number of solvent molecules displaced by the solute. Solving Eq. 1.31 for (Rj} gives <*?> = 3 ( i “t" Tlsolute l ) 1 /3 (1.32) 47rp W hen the solute is an ion, the solvent structure is slightly more compressed than it is for an uncharged solute. This compression should also be taken into account in the (Rj). The relevant correction can be estimated by using simulations to evaluate the difference in the calculated g(r) curves for charged and uncharged solutes of the same size, or by using a continuum model and the experimental solvent compressibility. By comparing g(r ) for charged and uncharged solutes of the same size, we estim ate the change in (Rj) at distances of more than 6 A from a central charge to be on the order of 0.01 A, and we therefore neglect compression effects in the present work. The A Rj terms are initially set to zero. If after a sufficiently long simulation the averages (Rj) do not approach the corresponding (Rj), then A Rj are set to A R j = (R°) — (Rj) in accordance with Eq. 1.13. The force constant K r is obtained from the relation K r = m(wjj), where m is the total mass of a solvent molecule, and the mean square angular frequency, (u>x), is calculated from the power spectrum of R through the application of Eqs. 1.4-1.6. 15 The complete equation of motion for Rj in the surface region of the SCAAS model, including the frictional force, is: d(Va‘ -f mRj = - 1 ^ } - K R(Rj - (R°) - A Rj) - m(YRRj - A'R) (1.33) 1.2.3 Im p le m e n ta tio n o f th e C o n stra in t Forces Constraint forces are applied only to the molecules in region (s) (see Fig. 1.1), or in other words, to the outermost N solvent molecules. The value of N is set at the beginning of each MD simulation, and remains constant throughout the simulation. As mentioned previously, the assignment of the actual variables 8{ and Rj to the constraint values (8°) and (R°), respectively, is done by sorting 8j and Rj by magnitude. The result of this sorting procedure is to minimize the total constraint force applied to the system. The sorting of the variables 8j and Rj is done at each step of the simulation, and is carried out in two stages. In the first sorting procedure, the radial distances of all the solvent molecules in the system are calculated, and a set of auxiliary indices j that satisfy Rj < Rj+i for all j are found. These j ’s are the indices used in Eq. 1.29 and Eq. 1.33. This flexible indexing allows molecules to diffuse into and out of the surface region freely. In the second sorting procedure, the N surface molecules are given auxiliary indices i such that 8j < for all i. These i’s are the indices used in Eq. 1.19 and Eq. 1.28. This indexing scheme permits free rotational diffusion. 1.2.4 T h e T o ta l P o te n tia l S urface For the calculations reported here, water was chosen as the solvent. A three- site representation of the water molecule was chosen, and the following set of potential functions was used: yaa yat y t 332 ^ + 4ey ra 12 16 bonds L a ngles 1 + E - U’ + E y(A» - - E 166a ‘^ 7 - (L34) i where energy is expressed in kcal/mol, and distances in A . The parameters < £ are the residual charges of the atoms (in units of electron charge), and < x tJ- and 6ij are the van der Waals parameters (in units of A and kcal/m ol, respectively). Intramolecular interactions are of course excluded from the first summation of Eq. 1.34 The equilibrium OH bond length is b„, and kb o n d 1S the force constant for OH bond stretching; < f > 0 is the equilibrium HOH angle, and k$ is the force constant for HOH angle bending. The last term in Eq. 1.34 is an approximation of the induced dipoles energy, where a, is the polarizability of atom i (in units of A 3), and is the electric field at atom i due to the residual charges of the other atoms: a = E (i-3s) j& Tij where r l? = rt - — Tj. The exact induced dipoles energy is given by (Russell & W arshel,1985): V if^ = - 166 (1.36) i where & is the total field at atom i due to the complete (perm anent plus induced) charge distribution of the system: &=ff + E^r(— - ~fr) (i-37) j& ij \ ij ) Due to the way the fields are interrelated, solving Eq. 1.37 requires either a m atrix inversion or an iterative procedure. The m atrix inversion m ethod is impractical to use in large, many-atom systems such as ours because of the large amount of computer memory needed to store the m atrix, and we there fore use the iterative procedure. A problem arising from the way the fields & are interrelated is that the determination of analytic derivatives of Eq. 1.36 is 17 hopelessly complicated, and thus Eq. 1.36 cannot be used in molecular dynamics simulations, in which analytic derivatives of the potential energy are needed to determine the forces to be used in the equation of motion. Fortunately, the fields do not differ too much from the fields so as a first approximation, £? may be substituted for £ * ■ This approximation is equivalent to completely neglecting induced dipoles-induced dipoles interactions. As Eq. 1.34 indicates, we prefer a slightly better approximation, in which / d is substituted for where d is an effective screening constant: Vind = — 166 ai((f)2/d (1.38) i The screening constant d (not to be confused with a dielectric constant or the d(R) function that appears in Eq. 1.23) is chosen so that the average induced dipoles energy evaluated with Eq. 1.38 is equal to th at obtained with Eq. 1.36. At this level of approximation, the induced dipoles-induced dipoles interactions (which are small, but not negligible) are taken into account in an average way. This approximation has two benefits. First, analytic derivatives of Eq. 1.38 are easily calculated, and thus Eq. 1.38 can be employed in molecular dynam ics simulations. Second, using Eq. 1.38 instead of Eq. 1.36 saves considerable computational time, as the iterative procedure needed to calculate the fields exactly is quite time-consuming. Typical values for d fall between 1.2 and 1.4. It should be mentioned here that although the average energies obtained with Eqs. 1.36 and 1.38 can be made quite similar with the proper choice of the screening constant d, the potential surfaces described by these two equations are qualitatively different; it remains an open question as to whether this difference has any significant effect on the results obtained from our simulations. A third-order numerical integration algorithm (see Appendix A for details) is employed for propagating the molecular dynamics trajectories, with an integra tion time step of 1.96 fsec. Most of the computer simulations described in this dissertation were performed on either Cray or Alliant supercomputers. Some 18 techniques for rewriting Fortran source code to take advantage of the vector processing capabilities of these machines are described in Appendix D. 1.3 P r o p e r tie s o f B u lk W a te r In order to have a standard set of water properties that the SCAAS water proper ties can be compared to, we first performed a MD simulation of bulk water with a reference model using minimum image periodic boundary conditions with a spherical cutoff (which will henceforth be referred to simply as the PBC model). For the calculations reported here, the PBC system used had a unit cell length of 18.04 A (and therefore a cutoff radius of 9.02 A ), contained 196 water molecules, and was maintained at an average tem perature of 300 K. The param eter set used for water-water interactions in both the PBC and SCAAS simulations is similar to the SPC param eter set (Berendsen et al, 1981); the difference between the actual values of our param eters and the SPC param eters reflects in part the fact that the present model uses a flexible water molecule, while the SPC parameters are based on a rigid water molecule. We chose to make the water molecule flexible for the sake of convenience, in order to avoid having to use a more complicated rigid body numerical integration algorithm for the MD simulations. Modeling the inherently quantum-mechanical vibrational modes as classical harmonic oscillators allows for a much larger cou pling of the vibrational modes to other degrees of freedom than would actually occur in a real system. For this reason, contributions to calculated dynamical properties of the system (such as autocorrelation functions) due to intramolec ular vibrations must be filtered out when a flexible water model is used. Since the intramolecular vibrational motions of the water molecules must be filtered out when the MD trajectories are subsequently analyzed, we have some freedom in choosing the vibrational force constants fe & o n d and k These force constants should be large enough to make the vibrational frequencies obtained in the MD simulations high enough to be unambiguously distinguishable from the classical, 19 P a ra m e te r S et I q a e a Oxygen Hydrogen -0.82 0.42 3.21 0.19 0.155 0.155 0.00 0.00 Dipole moment: .473 eA (2.27 Debye) Geometry: b0 = 1.00 A , kf,ond = 400 kcal mol-1 A -2 4 > 0 = 109.5°, kff, = 100 kcal mol-1 P a ra m e te r S et II q cr e a Oxygen Hydrogen - 0.6668 0.3334 3.19 0.21 0.155 0.155 1.10 0.20 Dipole moment: .385 eA (1.85 Debye) Geometry: b0 = 1.00 A , k & cm d = 400 kcal mol-1 A -2 < f> 0 = 109.5°, k$ = 100 kcal mol-1 Screening constant: d = 1.25 Table 1.1: W ater parameters used in the MD simulations. The units are: electron charge (q); A (<r); kcal mol-1 (e), and A 3 (a). Combining rules for van der Waals interactions: cr^- = (cr^ + cr,)/2, low-frequency motions. At the same time, k^ond and k$ should not be so large that the MD integration time step would need to be reduced to an unreasonably small value. The vibrational frequencies obtained with the current choice of force constants k ^ d and k$ are « 1650 cm-1 for the bending mode and as 2250 cm-1 for the two stretching modes. (The actual vibrational frequencies of water in the gas phase are 1595 cm-1 for the bending mode, and 3652 cm-1 and 3756 cm-1 for the two stretching modes.) A AHvap of — 11.2 kcal/m ol (experimental value — 10.5 kcal/mol (Dorsey, 1940)), and diffusion coefficient of .21 A2psec_1 (experimental value .22 A2psec-1 (Krynicki et al, 1978)) were obtained with parameter set I of Table 1.1 in the 20 PBC system. These param eters were obtained by requiring a best fit between the calculated and observed AHvap, diffusion coefficient, and oxygen-oxygen radial distribution function, g(r), of water. Induced dipoles are not included in this param eter set, which was used for the m ajority of the calculations reported here, but are included in a second parameter set described later, which is still in development (see param eter set II in Table 1.1). Fig. 1.4 shows that the radial distribution function of the PBC system is in quite good agreement with the experimental results (Narten et al., 1982; Soper & Phillips, 1986). (It is interesting to note that the g(r) functions derived from experimental data in Narten et al. (1982) (not shown in the figure) and Soper & Phillips (1986) are not in good agreement for r < 4.5 A .) In comparing the more recent experimental result (Soper & Phillips, 1986) to the result of our PBC simulation (Fig. 1.4), it can be seen that the location and height of the first peak are not perfect. It is possible to obtain better agreement between the calculated and experimental g(r) for small distances by doing some param eter tweaking, but this causes less good agreement for large distances. The difficulties of reproducing all portions of the g(r) curve satisfactorily may in part be an inherent limitation of three-site water models. Four or five-site models could give better solvent structure (Jorgensen et al., 1983; Kuwajima & ; Warshel, 1989), but at the present level of approximation we do not feel it is justified to pay the price in terms of longer simulation times. Although the present work considers many-body effects in terms of induced dipoles (see below), it does not consider charge transfer (CT) interactions in a completely consistent way. The two-body part of the CT effect is included implicitly in the parameterization, but the stabilization of the charge transfer states by the surrounding solvent is not taken into account in our potential energy function (or in any other currently existing potential energy function). Quantum-mechanical zero-point expansion effects could also account for some of the discrepancies in g{r). Thus, while we continue our efforts to uncover the causes for the differences between calculated and experimental g(r) curves, we feel that at present the three-site models are sufficient for obtaining 21 c o * ■ * o c 3 U . e o + » 3 J3 < n Q w ■ o C O O C 3 2 1 3 4 5 6 7 Distance (A) Figure 1.4: The oxygen-oxygen radial distribution function, g(r), of water. The dotted curve is the experimental result (Soper & ; Phillips, 1986). The solid curve is the g(r) obtained with the PBC model, and the dashed curve is the result from a 9 A SCAAS sph ere. The PBC and SCAAS results are quite similar, and both agree well with the experimental result. (Fig. 9B shows that a 10 A SCAAS sphere has even better agreement with the experimental g(r).) 22 reliable energetics and reasonable structural information. More relevant for our discussion is the fact that the g(r) obtained with the SCAAS model is so similar to the PBC result (again, see Fig. 1.4). (For calculating the radial distribution function < 7(7 -) for SCAAS systems, we use the method developed by Mason (1968) for application to spherical systems.) One of our major goals was to make the dynamical properties of the solvent in the SCAAS model as similar as possible to those in the PBC model. As can be seen in Fig. 1.5A, where we compare the velocity autocorrelation functions obtained with a 9 A radius SCAAS system and the PBC system, the agreement is excellent. The power spectra Pr (u> ) associated with these two autocorrelation functions are shown in Fig. 1.5B. Fig. 1.6 illustrates th at the proper choice of the friction constant used for the random force is necessary in order to achieve such good agreement between the two models. It is interesting to note that the value of 7^ (the friction constant for radial motion) needed to bring the SCAAS power spectrum into accord with the PBC power spectrum is only 8.7 psec-1, which is only a small fraction of the “full” value of 7^ obtained from the Einstein (1905) relation: 7^ = = 62.5 psec-1, with D r = .22 A 2 psec-1 at 25°C (Krynicki et al., 1978). As can be seen in Fig. 1.7A, in which Cg(t) (the autocorrelation function of &), of the PBC system and a 9 A SCAAS system are shown, the dynamical properties of the angular variables 6, are in excellent agreement in the two systems. The SCAAS result was obtained without any additional frictional force in the equation of motion of 0, ; (i.e. 7^ is zero). The power spectra P $(u> ) associated with the autocorrelation functions of Fig. 1.7A are shown in Fig. 1.7B. The force constants K r and Kg used for the constraints were determined by applying Eq. 1.3 and Eqs. 1.4-1.6, and their values are 20 kcal mol-1 A -2 and 6.9 kcal mol-1, respectively. We also consider another param eter set (set II in Table 1.1) th at includes approximate induced dipoles interactions. While the induced dipoles term is not crucial for simulating ground state properties of polar solutions, it is very im portant to consider its effect in studies of excited electronic states and in 23 40 o « a . -< e o o e 2 0 a u . e o a o o o 3 < 1.6 0.4 1.2 0.8 u a S 3 U • a. ( A a J o a T im e ( p s e c ) 6 .4 .2 600 200 400 B F re q u e n c y (c m ' ) Figure 1.5: (A) The center of mass velocity autocorrelation function, Cj?(£), of water. The solid curve is the result from the PBC reference model, and the dotted curve is the result from a 9 A radius SCAAS model. (B) The power spectrum, Pr{u ) (f^e cosine transform of Cft(t)), for translational motion of water. The solid curve is the result from the PBC reference model, and the dotted curve is the result from the SCAAS model. 24 o o 0 ) E 3 h m o e a to ® s o a . 8 .6 . 4 .2 600 400 200 Frequency (cm’ 1) Figure 1.6: The effect of the Langevin force on the power spectrum obtained with the SCAAS model. The solid curve is from a simulation using the 7 and {A2) given in the text. The dotted curve is from a simulation with no Langevin force. Note that the absence of the frictional force results in an exceedingly large diffusion constant. jK $= 6.9 k c a l m o l 1 7^=0 K r — 20 k c a l m o l-1 A -2 7 ^ = 8 .7 p s e c -1 Table 1.2: The SCAAS constraint param eters. 25 .04 .08 .12 Tim* (psec) .16 u « a . c m T J « £ 3 w U e a. in » 5 o 6 1.5 1.0 0.5 400 800 1200 B Frequency (cm' ) Figure 1.7: (A) The autocorrelation function of 6, C$(t). The solid curve is the result from the PBC model, and the dotted curve is the result obtained with a 9 A radius SCAAS system. These functions possess some additional structure due to the classical OH stretching allowed by our flexible water model. As explained in the text, the contribution to Ce(t) from intramolecular vibrational motions have been filtered out. (B) The power spectrum , Pe(a»), for angular motion. Pe(w) is the cosine transform of C$(t). Again, the solid line is the PBC result, and the dotted line is the SCAAS result. The band due to the intramolecular OH stretching is not shown in the figure. 26 3 4 5 6 7 Distance (A) Figure 1.8: Radial distribution function for pure water obtained with parameter set II, which includes induced dipoles interactions. The solid curve is the SCAAS result, and the dotted curve is the experimental g(r). simulating chemical reactions in solution (Hwang et al., 1988; Hwang h Warshel, 1987). Param eter set II uses the gas phase dipole moment of .385 eA (1.85 Debye), which is considerably smaller than the dipole moment of .473 eA (2.27 Debye) used in param eter set I. The screening constant d (see Eqs. 1.34 and 1.38) was found to be 1.25. The inclusion of the induced dipoles term , together with the reduction of the perm anent dipole moment, has the effect of delocalizing the attractive solvent-solvent interactions. For this reason the diffusion constant of 2.0 A2psec-1 obtained with param eter set II is about ten times too large. However, the AHvap of — 9.2 kcal/m ol and g(r) (see Fig. 1.8) obtained with this param eter set are still reasonable. In the present study we are interested mainly 27 in examining the performance of the SCAAS model in calculating ground state energies of solvated ions, and unless otherwise indicated, the results reported here were obtained with param eter set I. 1.4 R e s u lts a n d D is c u ss io n Since the choice of the size of our model system is somewhat arbitrary, it is im portant to examine the results obtained from systems of varying size. If the results show a dependence on the system size, then there is a flaw in the model. We have therefore performed MD simulations with systems of three different sizes to see if any size dependence exists in the SCAAS model. The three sizes chosen, in terms of sphere radius, are 8 A , 9 A , and 10 A , which contain approximately 80, 120, and 155 water molecules, respectively. All of the results presented in this section are comparisons of these three systems. 1.4.1 S tru ctu ra l an d d y n a m ica l p ro p erties We start by examining the sensitivity of the calculated radial distribution func tion, g(r), to the size of the system. The results are summarized in Fig. 1.9, o o which compares the radial distribution functions for 8 A and 10 A SCAAS sys tems (g(r) for the 9 A system is shown in Fig. 1.4). The close similarity between the three g(r) functions indicates that the size of the system has very little effect on the solvent structure. The performance of the polarization constraint is examined next, with the results summarized in Fig. 1.10 and Fig. 1.11. In Fig. 1.10 we compare the angular distribution functions obtained from a 9 A SCAAS system in both the presence and absence of the polarization constraint. Also shown in the figure is the reference function | sin B that we are attem pting to duplicate. Only the molecules in the surface region (the outermost 1.5 A shell) are included in the calculation of these distribution functions. As can be seen from the figure, when 28 o S c o 5 T J < 8 £E 8A sphere 3 2 1 7 5 4 3 6 Distance (A) e o o c 3 n Q .2 ■ 5 10A sphere 3 2 1 6 3 4 5 7 B Distance (A) Figure 1.9: Radial distribution functions for pure water obtained from SCAAS simulations of two systems with radii of: (A) 8 A; (B) 10 A. The radial distri bution function of a 9 A SCAAS system is shown in Fig. 1.4. As in Fig. 1.4, the dotted curves are the experimentally determined distribution function of Soper & Phillips (1986). 29 . 6 .4 . 2 1 60 8 0 12 0 4 0 6 (degrees) Figure 1.10: The angular distribution function, p(B), for pure water. The dotted curve is the normalized analytical distribution function, p(B) = | sin B. The solid and dashed curves are, respectively, the angular distributions obtained with and without the polarization constraint in the SCAAS model. The system size is 9 A . Only molecules in the outer 1.5 A shell are included in the calculation of these distribution functions. 30 no polarization constraint is used, the resulting angular distribution function is clearly unsatisfactory, whereas the distribution function obtained from the sim ulation that included the polarization constraint matches the reference function quite well. In Fig. 1.11 the angular distribution functions obtained with 8 A and 10 A SCAAS systems are shown. The 8 A system exhibits the largest deviation from the desired sin 6 dependence, but this deviation is small compared to the overall good fit. Thus, we conclude that as far as the solvent structure is con cerned, the SCAAS model does a reasonable job, and shows no significant size dependencies. T he perform ance o f the m odel w ith regard to dynam ical properties was as sessed by exam ining th e center of m ass velocity pow er spectra of th e 8 A , 9 A, and 10 A system s. Fig. 1.12 shows th e power sp ectra o f th e 8 A and 10 A sys tem s, and th e power spectrum o f th e 9 A system is show n in Fig. 1.5B. A gain, th e results do not change significantly w ith th e size o f th e system ; th e calculated diffusion constants are 0.203 A 2psec x, 0.210 A 2psec *, and 0.210 A 2psec 1 for th e 8 A , 9 A , and 10 A system s, respectively. T he experim ental value of the diffusion constant is .22 A 2p sec_1 (K rynicki et al., 1978). 1.4.2 S o lv en t p o la riza tio n arou n d ch arged so lu te s The main motivation for the development of the SCAAS model was prompted by the need to develop an efficient tool for calculations of solvation effects, rather than a model for exploring bulk solvents. W ith this in mind, we tested the per formance of the model by simulating different-sized SCAAS systems containing a central chloride ion surrounded by water. The param eters used for the chloride ion are given in Table 1.3. The calculated angular distribution function for the 9 A system is shown in Fig. 1.13, and those for the 8 A and 10 A systems are shown in Fig. 1.14, In these two figures the solid curves are the angular distribu tion functions obtained from the simulations. Only the molecules in the surface regions were included in the calculation of these functions. As can be seen in the 31 e o a o C L 8A sphere .3 .4 .3 .1 1 60 120 40 80 8 (degrees) a « a o .5 10A sphere .4 .3 .2 .1 8 0 160 1 20 4 0 B 0 (degrees) Figure 1.11: Angular distribution functions for pure water obtained from SCAAS simulations of two systems with radii of: (A) 8 A; (B) 10 A . As in Fig. 1.10, the dotted curves are the target distribution function | sin 9, and only molecules in the outer 1.5 A shells are included in the calculation of these distribution functions. o 0 0 c l E 3 U & u a * o a 8A sphere .6 .2 600 400 200 Frequency (cm ‘ 1 ) u 0 « a E 3 W o 0 a. co 0 * O CL 10A sphere .6 .4 .2 200 400 600 B Frequency (cm' * ) Figure 1.12: Velocity power spectra of pure water obtained from SCAAS sim ulations of two systems with radii of: (A) 8 A; (B) 10 A . The velocity power spectrum of a 9 A SCAAS system is shown in Fig. 1.5B. 33 9 cr e C t Na+ c i- 1.0 - 1.0 2.19 4.60 1.030 0.109 0.00 0.00 Table 1.3: Sodium and chloride ion parameters. The units are the same as in Table 1.1. .a c o XI o .5 .4 .3 .2 .1 4 0 8 0 1 20 1 60 9 (degrees) Figure 1.13: Angular distribution function for a system containing a central chloride ion (9 A sphere). The solid line is the result from the simulation, and the dotted line is the distribution function expected if Eq. 1.27 is obeyed. 34 . 6 - 8A sphere jQ 4 0 8 0 1 20 1 60 0 (degrees) 10A sphere x > 4 0 8 0 1 20 1 60 6 (degrees) Figure 1.14: Angular distribution function for a system containing a central j chloride ion ((A) = 8 A sphere, (B) = 10 A sphere). As in Fig. 1.13, the solid j lines are the results of the simulations, and the dotted lines are the distribution j functions expected if Eq. 1.27 is obeyed. j figures, these distribution functions closely follow the reference functions (shown as dotted curves). These reference functions, which reflect the surface polariza tion expected when Eq. 1.27 is obeyed (and are analogous to the sin# reference functions for electrically neutral solutes shown in Fig. 1.10 and Fig. 1.11), were constructed in the following way. Using Eq. 1.17 we can write the identity: r<°?) 1> h»!) from which the approximation i{0?) is made. Eq. 1.40 is then rearranged to give {jo ^ /* { ^ ^ ) /* { ^ ^ ) 1 = (t + 1) - i = / * + 1 p°{8)d8 ~ I ' p°(8)d8 = f i+ 1 p°(8)d8 (1.39) Jo Jo J(0 ? ) P°(9i) [ {°i+l)d8 (1.40) J(6?) p°(ei) “ («?«> - m (L 4 1 ) Substitution of the constraint angles of Eq. 1.27 into Eq. 1.41 yields the approx im ate values of the “analytical” p°(8i) function. In Fig. 1.15A, the average ion-dipole energy for the three chloride ion-water systems is plotted as a function of ion-dipole distance. The quantity being plotted here, Uqm (t ’), is calculated as U qA t) , -33 2E ^ ‘ - f >S(r‘ - r) (1.42) 0{ri - r) where Pi are the dipole moments of the water molecules, are the electric fields of the ion at the dipoles, and £(r; — r) are delta functions of width 0.25 A . The results from the 8 A , 9 A , and 10 A systems are plotted in the figure as dotted, dashed, and solid curves, respectively. Two things are apparent from this figure: (l) all three results are quite similar, indicating that the choice of the system size may be made arbitrarily with no ill effects (at least with respect to solute- solvent interactions); (2) there are no abnormal peaks in the U q^r) functions near the system boundaries (i.e. between 8 A and 10 A ), which would have been an indication of the undesired overpolarifcation effect the model was designed to overcome. 36 o E u X e ui o a 5 • e o 0 4 • 8 1 2 2 9 3 5 6 7 8 4 1 0 D istance (A) e E e U i o C L C O - 4 - 8 1 2 7 3 4 5 6 8 9 1 0 B Distance (A) Figure 1.15: The average charge-dipole energy for water around a central chlo ride ion as a function of distance. (A) The dotted, dashed, and solid curves were obtained with 8 A , 9 A , and 10 A SCAAS spheres, respectively. (B) The solid curve was obtained with the SCAAS model (10 A sphere), and the dotted curve was obtained using the Langevin dipole approximation for the solvent. 37 It is also interesting to compare the ion-dipole energy obtained with the SCAAS model to th at obtained using the FCLD (Fixed Center Langevin Dipoles) model, as shown in Fig. 1.15B. The FCLD model represents the solvent as a grid of fixed point dipoles that obey the Langevin dipole polarization law (Eq. 1.22). The magnitudes and orientations of the dipoles are adjusted iteratively with a self-consistent energy minimization algorithm, until a convergent result is ob tained. (For more details of the FCLD model, see Warshel & Russell (1984)) There is obviously significant disagreement between the results of the two models for distances less than 5 A , which is caused mainly by the difference in solvent structure in the two models. However, for distances greater than 5 A the agree ment is quite good. This suggests that a hybrid model, utilizing an all-atom representation of the solvent for the inner solvation shells, and FCLD for the outer solvation shells, could be used for studies of solvation energetics such as this. The reason one would want to use such a hybrid model is related to the com putational times required by the two methods: a free energy calculation by MD simulation requires a factor of about 100 more computer time than the corresponding FCLD calculation. 1.4.3 Free e n er g y ca lcu la tio n s The performance of the SCAAS model in evaluating solvation free energies was examined by calculating the solvation free energy of a sodium ion in water. The param eters used for the sodium ion are listed in Table 1.3. The solvation free energy was evaluated with an adiabatic charging procedure (Warshel, 1982; Warshel et al, 1986), using the multistage sampling.method (Valleau & ; Torrie, 1977) (now also known as the free energy perturbation m ethod). The theoretical basis of the multistage sampling method is described in Appendix C. For related solvation studies see Postm a et al. (1982), Singh et al. (1987), and Hermans et al. (1988). The multistage sampling method allows one to calculate the free 38 energy difference, AGba = Gb — Ga, between states b and a (described by potential surfaces 14 and 14, respectively) with the equation exp(— /?AG40) = (exp(— /3A14a))o (1-43) where A14a = 14 — 14? P — and the term {X)a denotes a Boltzmann average of X on potential surface a. In principle, Eq. 1.43 is rigorously correct provided that those regions of configuration space corresponding to both states a and b are heavily sampled during the course of the MD simulation. The fact of the m atter is that in a MD simulation on surface 14, the system will spend most of the time in those regions of configuration space near the minima of 14, so in practice, Eq. 1.43 yields accurate results only when the minima of 14 and 14 are not too distant from each other in configuration space, or in other words when (AT4a)a is small. Thus, in practice Eq. 1.43 may only be used to calculate relatively small changes in free energy. However, since free energy is a state function, a large AGba may be expressed as a sum of smaller free energy differences: A Gba = AGbi + A(?i2 + A(?23 + • ■ • 4- A Gna (1.44) A suitable choice of the set of intermediate states {1,2,3, . . . ,n.} allows each of the AGij to be evaluated accurately with Eq. 1.43. The potential surfaces for the states {1,2,3, . ..,n } are constructed from surfaces a and b as follows (Warshel, 1982; Warshel ef al., 1986): Vi = A ; 14 + (1 - ai)Vb (1.45) where the mapping param eter A ; takes on values between 0 and 1. To calculate the sodium ion solvation free energy, we ran 2 psec trajectories on each of 12 potential energy surfaces in order to gradually transform (“perturb” ) the fully-, charged sodium ion (state a) to a neutral particle (state b). In this case, each of the terms on the r.h.s. of Eq. 1.44 has a m agnitude of about 7-8 kcal/m ol. A completely rigorous treatm ent would also transform the neutral particle (state b) into a null particle (e.g. by reducing the van der Waals radius gradually to 39 Sphere Radius A G,o l 8 A -102.4 ± 2.2 9 A -98.6 ± 1.1 io A -99.8 ± 0.5 Table 1.4: Sodium ion solvation free energy calculated with three different sized SCAAS spheres. The A Gaoi values are given in kcal/mol. The accepted value for the absolute solvation free energy of a sodium ion (determined from experiment and ionic radii considerations (Desnoyers & Jolicoeur, 1969)) is — 98.2 kcal/mol. zero). The change in free energy associated with this transform ation is only 1-2 kcal/mol, so this additional step was not performed. The solvation free energies obtained from the simulations are about — 80 kcal/mol, to which are added the contributions to the free energies from the bulk continuum, A G^ik, to bring the total solvation free energies to about — 100 kcal/m ol (the accepted value is — 98.2 kcal/m ol (Desnoyers & Jolicoeur, 1969)). The AGbuik terms are estim ated with the Born formula (Born, 1920), A G ^ik = - 1 6 6 ^ - ^ ^ , where Q is the charge of the ion in units of electron charge, R c is the “cavity” radius in units of A (taken here as the radius of the SCAAS sphere), and e = 80. Results of the three simulations are shown in Fig. 1.16 and summarized in Table 1.4. An explanation of how the error bars listed in Table 1.4 are determined is given in Appendix C. The inset in Fig. 1.16 shows the 3-4 kcal/m ol spread in AGaoi values obtained with the three different-sized SCAAS systems. Although this spread in values is small compared to the overall free energy change of — 100 kcal/mol, it is still about twice as large as we are comfortable with. The evaluation of solvation free energies with the above approach is very time- consuming, even if one uses the SCAAS model rather than a PBC model. Thus, it is im portant to examine the performance of the much simpler FCLD model. Such an examination is presented in Fig. 1.17, which compares the SCAAS and FCLD results for a sodium ion in water. As seen clearly from the figure, the two models give similar results, indicating that the FCLD and related models 40 .2 .4 .6 .8 1.0 X Figure 1.16: Solvation free energy of a sodium ion in water as a function of the param eter A. A = 1 corresponds to the fully-charged sodium ion, and A = 0 corresponds to a neutral particle. The dotted, dashed, and solid curves were obtained with 8 A, 9 A, and 10 A SCAAS systems. The inset shows an exploded view of the region of the curves between A = .88 and A = 1. 41 Figure 1.17: Solvation free energy of a sodium ion in water as a function of the param eter A . Here the solid curve was obtained with a 10 A SCAAS system, | and the dotted curve was obtained with a 10 A FCLD system. (Warshel & Russell, 1984) can provide a very efficient way to estim ate solvation free energies. 1 .4.4 D ie le c tr ic p ro p erties Considering our interest in using the SCAAS model for simulating chemical pro cesses in solutions and for studies of electrostatic interactions, it is im portant to examine the dielectric properties of the model. Here we have attem pted to eval uate the static dielectric constant of the SCAAS model in term s of the system’s mean square dipole moment with the relation (Kirkwood, 1939; Buckingham, 1956; Frohlich, 1958; Neumann, 1983; Neumann, 1985) (2« + l)(e -l) 47r(M!) „ ;-----------= - v w f (L46) where (M 2) is the mean square dipole moment for a sphere of volume V. This well-known expression was derived by Frohlich (Frohlich, 1958) for a large sphere embedded in an infinite bulk of its own dielectric. This conceptual model is in fact best reproduced by spherical models such as the SCAAS model, since the rigorous implementation of the Frohlich model within PBC models is far from being obvious. The Frohlich model, however, might not provide the proper representation of a real infinite microscopic system (see below). When trying to obtain a good approximation for the true value of (M 2) for a sphere within an infinite system, it is crucial to take into account the long range correlation between the dipoles in the explicit solvent region (i.e. regions (a) and (s) in our model) and the implicit dipoles in the bulk region. A seemingly obvious option is provided by the “reaction field” formulation (Onsager, 1936). The reaction field, F r f , is the field induced within the sphere by the bulk dielectric medium outside of the sphere in response to the total dipole moment of the sphere. The effect of the reaction field can be implemented in MD simulations by adding to the potential of Eq. 1.34 the reaction field potential: Vr f = — ^ • ^R F (1-47) 43 F « - = f e r f i (1-48> in which M = < liri is the total dipole moment of the sphere, and r* are the charges and position vectors, respectively, of the atoms within the sphere, R c is the radius of the sphere, and £ rr is the assumed dielectric constant of the con tinuum region outside of the sphere. The factor of | th at appears in Eq. 1.47 is usual for induced polarization phenomena, and indicates th at half of the energy gained by the interaction between the reaction field F rf and the dipole moment M is invested in polarizing the continuum. The incorporation of the forces asso ciated with Vrf in our MD simulations appears to be an obvious application of well-established electrostatic concepts that are based on macroscopically-small spheres surrounded by infinite bulk (Frohlich, 1958). However, the straight forward incorporation of V rf and the forces due to Vrf in MD simulations of SCAAS systems causes the total dipole moment of the system, M , to become unrealistically large. The reason for this is th at the reaction field forces act to increase M , and the larger M then increases the reaction field forces, etc. In short, a positive feedback is established between M and the reaction field forces, which causes M to achieve unrealistically large values. The feedback effect does not occur in PBC simulations (Neumann, 1983; Neu mann, 1985), because with PBC the system does not have a single M , but as many M^ as there are molecules (since each molecule is considered as the center of its own sphere). These M j vary in m agnitude and orientation, and tend to cancel each other to some extent. It should be pointed out, though, th at the fact that M is well-behaved when the reaction field treatm ent is applied in PBC models does not necessarily mean that the reaction field approximation gives the correct physical representation of the system being modeled. For example, consider a molecule that lies in the continuum region for both of two spheres (i.e. the molecule is beyond the distance R c for both spheres, and is therefore a part of the continuum region). This molecule will simultaneously experience interactions with the dipole moments M i and M 2 of the two spheres. However, 44 the reaction field formalism implicitly assumes that the molecule will react to M i as if M 2 did not exist, and vice versa, and thus the molecule is expected to point in two directions at once. It seems likely to us that the reaction field ap proximation overestimates the polarization of the surrounding medium, and this approximation for the long-range dipole-dipole interactions may not provide a rigorous representation of the corresponding microscopic system. We examined several options for preventing the positive feedback effect caused by the inclusion of V rf in the potential. When a time delay was introduced in the response of the reaction field to M , the effect was slower to appear but was not eliminated. An other alternative was to allow the reaction field forces to act only on the interior molecules (region (a)), and not on the surface molecules (region (s)). However, with this constraint we obtained different dielectric constants for SCAAS spheres of different sizes. We finally chose to scale the reaction field by treating £rf as an adjustable param eter (see W atts (1981) for a related treatm ent): Vrf = ( l- « ) — f e r ^ in which Crf can assume values between 0 and 1. Further studies will be needed to assess the validity of the above treatm ent. We performed several lengthy (50 psec each) simulations with a 10 A SCAAS sphere of pure water, experimenting with different values for Cr f ■ The results are summarized in Table 1.5, where the values of the dielectric constants as determined by Eq. 1.46 are given as a function of the value assumed for Crf in each simulation. The table shows that reasonable results are obtained with C r f < 0.8. These results may be compared to e = 59 obtained with the same water param eters, but with our PBC model and the approach used in Neumann (1985). The total dipole moment autocorrelation functions Cw(£) = ^ or both the PBC system and a 10 A SCAAS sphere with C r f = 0.75 are shown in 45 I I Crf e 1.00 460 0.80 85 0.75 78 0.70 74 0.50 28 Table 1.5: Dielectric constant as a function of Crf- The results are for 50 psec simulations with a 10 A SCAAS sphere, and the dielectric constant e is calculated I using Eq. 1.46. The experimental value of the dielectric constant of water at 298 1 K is 78.3 (Handbook of Chemistry and Physics, 1970). A M s s o 5 v 0.6 s 0.4 0.2 0.0 ■ » 6 9 3 1 2 Time (psec) Figure 1.18: Total dipole moment autocorrelation functions, ^ ( i ) , for the PBC system (solid curve) and a 10 A SCAAS system with ( 7 ^ = 0.75 (dashed curve). Both results were obtained with data from 50 psec trajectories. 46 Fig. 1.18. In both cases C m (0 was calculated with data from 50 psec trajectories. The function Cm {1) converges exceedingly slowly, and thus the results shown in the figure should only be considered as having sem iquantitative status. The dielectric relaxation times of 3.7 psec and 2.6 psec (the tim e C m (0 takes to reach the value 1/e) for the PBC and SCAAS systems, respectively, are similar to the value of 3.84 psec obtained in a related study (Neumann, 1985). The main point of the present study was not to obtain the definitive value of e (that should, incidentally, also involve the effect of induced dipoles), but to explore the means of evaluating e for spherical models. Although spherical mod els provide the conceptual basis of most of the electrostatic models (which treat the system as a finite reference sphere surrounded by infinite bulk (e.g. Frohlich (1958), Neumann (1983), Neumann (1985)), it appears th at the connection be tween the microscopic and macroscopic models has not been solved in a rigorous way. Most im portantly, the concept of the reaction field may have some funda m ental problems. For instance, in allowing the medium surrounding a reference sphere to respond exclusively to the dipole moment M of this sphere, we neglect the fact that the components of the surroundings (e.g. water molecules) should in reality fluctuate in response to other “reference” spheres as well as our refer ence sphere (a related problem was addressed by Buckingham (1956)). Thus, we do not feel that our treatm ent provides a rigorous way of obtaining the dielectric constant of spherical models. Similar conceptual problems appear to be involved in reaction field treatm ents of PBC models. The SCAAS model allows one to explore, with the available computer re sources, much larger systems than those that can be studied with PBC models, and might (with the help of the recently developed generalized Ewald m ethod (Kuwajima & Warshel, 1988)) allow for the study of systems th at are large enough to include long range dipole-dipole correlation effects explicitly. Fur therm ore, we are now exploring the possibility of introducing a grid of Langevin dipoles in place of the continuum around the explicit SCAAS solvent. This might 47 allow one to reduce the uncertainties associated with the continuum reaction field treatm ent. 1.5 C o n c lu d in g R e m a r k s The development of the SCAAS model has been m otivated by the need for an efficient computational method for studies of polar solutions using a relatively small number of solvent molecules, and by the problems associated with PBC simulations of inhomogeneous systems. (For an excellent discussion of the prob lems associated with the PBC approach, see Valleau & W hittington (1977).) In the present work we have provided a detailed description of the SCAAS model, and examined the performance of the model under different conditions. SCAAS simulations of pure water were shown to yield results in accord with the results of the corresponding PBC simulation for both static properties (such as g(r)) and dynamical properties (such as the velocity autocorrelation function). The model was found to be effective in enforcing the sin d angular distribution expected from a system of pure solvent molecules, and showed only minor sensitivity to size when SCAAS systems of 8 A, 9 A, and 10 A radii were compared. The performance of the model in simulations of ionic solutions was exam ined with several test cases. Simulations of a chloride ion in water dem onstrated th at the distance-dependent polarization function (i.e. Eq. 1.42) of the solvent molecules around an ion is almost independent of the system size. An adiabatic charging calculation of the solvation free energy of a sodium ion in water demon strated that the model can provide reliable solvation free energies, and therefore can be used in simulations of more complex systems, including charge transfer chemical reactions (see Warshel & Hwang (1986) and Hwang et al. (1988) for examples). The minor size dependence of the model indicates that the model may be used with a small number of solvent molecules, thus allowing convergent results to be obtained within a reasonable computer time. 48 As far as solvation free energies are concerned, we find that the SCAAS model and the much simpler and much faster FCLD model give similar results. This | justifies the use of the FCLD model for accelerated calculations of solvation free s ; energies (Warshel & Russell, 1984). | The SCAAS model was implemented as an effective way of representing the ' solvent molecules around protein active sites (Warshel et al., 1986) or other spe cific groups of interest. The model has thus provided in some cases a practical j alternative for our earlier SCSSD and PDLD models (Warshel k Russell, 1984) ! I for calculating solvation free energies in proteins. However, it appears that the i r I SCAAS model encounters more serious problems in terms of its sensitivity to the ; | system size when utilized in protein calculations; coping with the inhomogene- , r ity of the protein surface remains as a major challenge in accurate microscopic calculations of electrostatic energies. It might be im portant in such calculations ' to use significantly larger systems than the ones needed for studies of solvation ' effects in purely aqueous solutions. In treating larger systems it is im portant > j to use a large truncation radius for w ater-water interactions to account for the l J J long-range correlation effects associated with electrostatic interactions. This j could increase the calculation time substantially. However, our newly-developed , ' i I extension of the Ewald m ethod to non-periodic systems (Kuwajima k Warshel, j 1988) might allow for the practical implementation of SCAAS systems of a much 1 larger size than the ones currently used. This development may also allow for 1 the fundamental study of dielectric effects in polar solvents, where the correct J I ! incorporation of the solvent reaction field is still an open question. j i i 1.6 R e fe r e n c e s j Adelman, S.A. (1980) J. Chem. Phys. 73, 3145-3158. Adelman, S.A. (1983) Adv. Chem. Phys. 53, 61-223. Adelman, S.A. k Brooks III, C.L. (1982) J. Phys. Chem. 8 6 , 1511-1524. Anderson, J., Ullo, J.J. k Yip, S. (1987) J. Chem. Phys. 87, 1726-1732. 49 Belch, A.C. & Berkowitz, M. (1985) Chem. Phys. Lett. 113, 278-282. Berendsen, H .J.C., Postma, J.P.M ., van Gunsteren, W .F. & Hermans, J. (1981) Intermolecular Forces (ed. B. Pullm an), pp. 331— 342. Dordrecht, Holland: Reidel. Berkowitz, M. & McCammon, J.A. (1982) Chem. Phys. Lett. 90, 215-217. Born, M. (1920) Z. Phys. 1, 45-48. Brooks III, C.L. & ; Karplus, M. (1983) J. Chem. Phys. 79, 6312-6325. Brunger, A., Brooks III, C.L. & Karplus, M. (1984) Chem. Phys. Lett. 105, 495-500. Buckingham, A.D. (1956) Proc. Roy. Soc. A 238, 235-244. Chelkowski, A. (1980) Dielectric Physics, pp. 208-217. New York: Elsevier. Desnoyers, J.E. k Jolicoeur, C. (1969) Modern Aspects of Electrochemistry, Vol. 5 (ed. J.O ’M. Bockris & : B.E. Conway), pp. 1-83. New York: Plenum Press. Dorsey, N.E. (1940) Properties of Ordinary Water-Substance, pp. 612-618. New York: Reinhold. Einstein, A. (1905) Ann. d. Physik 17, 549-560. Frohlich, H. (1958) Theory of Dielectrics, pp. 36-42. London: Oxford Univer sity Press. Handbook of Chemistry and Physics (1970) (ed. R.C. W east), p. E-67. Cleve land: Chemical Rubber Company. Hermans, J., Pathiaseril, A. k Anderson, A. (1988) J. Am. Chem. Soc. 110, 5982-5986. Hill, T.L. (1960) An Introduction to Statistical Thermodynamics, pp. 205-209. Reading, Mass.: Addison-Wesley. Hwang, J.-K ., King, G., Creighton, S. & Warshel, A. (1988) J. Am. Chem. Soc. 110, 5297-5311. Hwang, J.-K . k W arshel, A. (1987) J. Am. Chem. Soc. 109, 715-720. Jorgensen, W .L., Chandrasekhar, J., M adura, J.D ., Impey, R.W. k Klein, M.L. (1983) J. Chem. Phys. 79, 926-935. 50 Kirkwood, J.G . (1934) J. Chem. Phys. 2, 351-361. Kirkwood, J.G . (1939) J. Chem. Phys. 7, 911-919. Krynicki, K., Green, C.D. & Sawyer, D.W. (1978) Faraday Discuss. Chem. Soc. 66, 199-208. Kubo, R. (1966) Repts. Prog. Phys. 29, 255-284. Kuwajima, S. & Warshel, A. (1988) J. Chem. Phys. 89, 3751-3759. Kuwajima, S. Sz Warshel, A. (1989) submitted. McQuarrie, D.A. (1976) Statistical Mechanics, p. 515. New York: Harper & Row. Mason, G. (1968) Nature 217, 733-735. Narten, A.H., Thiessen, W .E. & : Blum, L. (1982) Science 217, 1033-1034. Neumann, M. (1983) Mol. Phys. 50, 841-858. Neumann, M. (1985) J. Chem. Phys. 82, 5663-5672. Onsager, L. (1936) J. Am. Chem. Soc. 58, 1486-1493. Postm a, J.P.M ., Berendsen, H.J.C. & Haak, J.R . (1982) Faraday Symp. Chem Soc. 17, 55-67. Russell, S.T. & Warshel, A. (1985) J. Mol. Biol. 185, 389-404. Singh, U.C., Brown, F.K., Bash, P.A. & Kollman, P.A. (1987) J. Am. Chem. Soc. 109,1607-1614. Soper, A.K. & Phillips, M.G. (1986) Chem. Phys. 107, 47-60. Valleau, J.P. & Torrie, G.M. (1977) Modem Theoretical Chemistry, Vol. 5, Part A (ed. B.J. Berne), pp. 169-194. New York: Plenum Press. Valleau, J.P. & W hittington, S.G. (1977) Modem Theoretical Chemistry, Vol. 5, P art A (ed. B.J. Berne), pp. 137-168. New York: Plenum Press. Wang, M.C. & Uhlenbeck, G.E. (1945) Rev. Mod. Phys. 17, 323-342. Warshel, A. (1978) Chem. Phys. Lett. 55, 454-458. Warshel, A. (1979) J. Phys. Chem. 83, 1640-1652. 51 Warshel, A. (1982) J. Phys. Chem. 8 6 , 2218-2224. Warshel, A. k Hwang, J.-K. (1986) J. Chem. Phys. 84, 4938-4957. I | Warshel, A. k King, G. (1985) Chem. Phys. Lett. 121, 124-129. S Warshel, A. k Russell, S.T. (1984) Q uart. Rev. Biophys. 17, 283-422. ! ■ j Warshel, A., Sussman, F. k King, G. (1986) Biochemistry 25, 8368-8372. W atts, R.O. (1981) Chem. Phys. 57, 185-195. Chapter 2 C om puter Sim ulation o f Solvated P roteins 2.1 I n tr o d u c tio n A complete understanding of the relationship between protein structure and function would have benefits for many scientific disciplines. It is not surprising that the nature of this structure-function relationship is currently the focus of extensive research (Warshel & Levitt, 1976; Allen, 1981; Warshel, 1984; Warshel & Russell, 1984; Mildvan & Fry, 1987; W oodbury et al., 1987; MacKerell et al., 1988; Zauhar & Morgan, 1988; Harvey, 1989). Studies of model protein systems via computer simulation can furnish a great deal of information on the modeled systems, often at a level of detail difficult to obtain directly from experimental studies. In order to make a model system as physically realistic as possible, a good deed of thought m ust go into its design. In constructing a model to be used for the simulation of protein molecules, it is im portant to remember that proteins have evolved within the predominantly aqueous environments of living cells, and it is therefore crucial to explicitly include the w ater surrounding the protein in the model. The ideal protein-w ater model system would consist of a protein molecule completely immersed in a sphere of water molecules, where the 53 radius of the sphere was large enough to make the effects of the water-vacuum interface at the system boundary negligible. A radius of 50 A would probably be sufficient for this purpose, but the large number of atoms (~ 50000) would deter anyone from seriously attem pting to simulate such a system. A rough rule of thum b for molecular dynamics simulations is that the computer time needed to run a trajectory of a given length is proportional to N y, where N is the number of atoms in the system, and the exponent 7 typically has a value somewhere between 1.5 and 2.0. Expressed in another way, the com putation time is propor tional to 637, where b is the radius of the model system, and 37 has a value of about 5. Clearly it is in our interest to create a model system of a more manage able size. A less-than-ideal but still physically realistic model might consist of a protein molecule surrounded by a “coat” of water molecules of some specified thickness (e.g. 10 A ). Unfortunately, this model system would still contain a very large number of degrees of freedom (at least by today’s standards). Several thousand water molecules would be required for this “coat”, and a protein-w ater system of this size would require a very long time to simulate, even on present- day state-of-the-art supercomputers. Instead of waiting for the development of faster computers, we have made an approximation that reduces the number of degrees of freedom included in the model, and thus makes the model practical to use with the computers presently available. The approximation consists of sol vating only that portion of the protein which is directly involved in the process of interest. The idea of what “directly involved” means is usually unambigu ous. For instance, if one is studying enzyme catalysis, the enzyme active site usually consists of portions of only a few amino acid residues in close proximity; this group of residues constitutes the portion of the protein molecule which is “directly involved” in catalyzing the reaction. If one is studying solvation en ergetics, the solvated group is usually even more localized than in the case of catalysis. If careful attention is paid to the boundary conditions used for the wa ter, then the physics of systems such as these is not significantly compromised by solvating the relatively small “interesting” region instead of solvating the entire 54 protein molecule. In most cases, this solvation can be accomplished with well under two hundred water molecules. Fig. 2.1 shows a ball-and-stick illustration of our protein-w ater model system. In the figure, the acidic group Asp-3 of the protein Bovine Pancreatic Trypsin Inhibitor (BPTI) is shown solvated by a sphere of water molecules. It is apparent from the figure that even for a small protein such as BPTI, the number of water molecules used in our approach is much less than would be needed to solvate the entire protein molecule. In this chapter we will describe our protein-w ater model and address its ac curacy by testing it in a series of free energy calculations. A complete description of the protein sub-model can be found in Warshel & Levitt (1976), and we will therefore discuss only those features of the model that are relevant for this work. For the water sub-model we use the SCAAS model (described in Chapter 1). Only minor modifications to the SCAAS model need to be made in adapting it from the small solutes examined in Chapter 1 to the large protein solutes which are the focus of this chapter; these modifications will be discussed. 2.2 T h e P r o te in S u b -M o d e l The potential energy functions used to model the various bonding (i.e. stretch ing, bending, and torsional terms) and nonbonding interactions — as well as the particular param eters used in these functions — are described in W arshel & Levitt (1976). We find it convenient in nearly all of our studies th at involve solvation to partition the system into solute and solvent components, since it is desirable to be able to treat the solute quantum-mechanically when a chemical reaction is involved. The solute, which is chosen as that part of the system that undergoes a chemical change or that is unique for some other reason, is com posed of 1 to 10 atoms. For the work described in this chapter — in which the solvation free energies of the acidic groups of B PTI are calculated — the COOH group of each acid group is taken as the solute. The solvent is the remainder 55 Figure 2.1: An illustration of the model used in our molecular dynamics simula tions of protein-water systems. Here a sphere of SCAAS water is shown solvating the Asp-3 group of a B PTI molecule. 56 of the protein molecule together with the water molecules. The electrostatic in teraction Vqh between the solute charges (Q) and the solvent perm anent dipoles (//,), and the permanent dipole-perm anent dipole interaction are given by V Q ll = 332J2— (2-1) ij Tii UM M = 3 3 2 ^ ^ (2.2) k<j rfei in which Qi and qj are the residual charges of solute atom i and solvent atom j , respectively, in units of electron charge. The distances Tij are in A, and energy is expressed in units of kcal/mol. It is usually not practical to evaluate the full double summation indicated in Eq. 2.2 because of the excessive computational tim e this would require. Instead, only those solvent-solvent interactions for atoms that lie within some specified truncation radius (typically 10-12 A) are included in the summation. A complication introduced by this truncation process — when carried out at the atom -atom level — is that artificial charges are often created when some of the atoms within a dipolar group fall within the truncation radius and the rest of the atoms in the group fall outside of the truncation radius. For this reason we organize the solvent into electrically neutral dipolar groups (made up of two to three atoms each), so th at the truncation can be implemented at the dipole-dipole level rather than the atom -atom level. In other words, none of the atoms in dipole a are perm itted to interact with any of the atoms in dipole b if any of the atoms in dipole a are beyond the truncation radius of any of the atoms in dipole b. This dipole-dipole truncation ensures that the calculated forces and energetics of the system are much more physically realistic than they would be in the case of atom -atom truncation. Unless otherwise indicated, the truncation radius used in all the calculations in this chapter is 12 A. Note that the energy contributions in Eqs. 2.1 and 2.2 are evaluated using the vacuum dielectric constant (e = 1), since the dielectric contributions are included explicitly in the model. References to the sources of the param eters used in the model protein system are given in Table 2.1. 57 P a ra m e te rs S ource Protein Warshel & : Levitt (1976) Protein residual charges and polarizabilities Table 1 of Russell & Warshel (1985) . W ater Table 1.1 Table 2.1: Sources of the parameters used in the protein simulations. Most of the protein param eters (bonding and non-bonding terms) can be found in Warshel & Levitt (1976). The protein residual charges and polarizabilities were refined in Russell & Warshel (1985). The water parameters are the same as those used in Chapter 1 of this work. The contribution to the potential energy from induced dipoles interactions is given by the equation (Russell & Warshel, 1985) Vind = -166 = -1 6 6 £ auk • £? (2.3) i % E in which fii is the dipole induced at the ith atom, is the atomic polarizability ; of atom i (in units of A), £? is the electric field at atom i due to the perm anent charges of the system, and is the total field at atom i due to the complete (perm anent plus induced) charge distribution of the system. The fields and are defined as: (! = E s9 i + -&■) (2-5) ' ij \ ' ij J where — r j. As explained in Chapter 1, the use of Eq. 2.3 in molecular dynamics simulations is not feasible because of the difficulty involved in calculat ing the analytic derivatives of this equation (which are needed to determine the forces to be used in the equation of motion). Instead, we make an approximation by substituting £°fd for where d is an effective screening constant: = (2 .6) 2.4 / The screening constant d (not to be confused with a dielectric constant) is chosen so that the average induced dipoles energy evaluated with Eq. 2.6 is equal to that obtained with Eq. 2.3. Over the course of a long molecular dynamics simulation, the model protein molecule will tend to unfold to some extent uxdess a constraint is added to the potential energy surface to prevent it from doing so. The constraint we use is defined by the equation: V„y,ui = £ - r°)2 (2.7) in which are the coordinates of the protein atoms, r° are the coordinates of the protein atoms when the protein is in its crystal structure, K^y^ai is a force constant, and is the potential energy added to the system by the constraint. The protein crystal structure is obtained experimentally by X -ray diffraction. The effect of the constraint is to keep the protein molecule close to its crystal structure configuration. The constraint is weak enough (with K^y^ai typically set to a value of 0.4 kcal mol-1 A-2) to give the protein molecule sufficient flexibility to relax from the crystal structure to conformations that are more favorable to it when it is in solution. 2.3 A d a p tin g th e S C A A S M o d e l for L a rg e P r o te in S o lu te s As was stated previously, our scheme for including water in the protein-w ater model system is to solvate a relatively localized portion of the protein molecule rather than the entire molecule. We engage the SCAAS model for this purpose. It has been shown that the SCAAS model performs well for studies involving small solutes, such as the chloride and sodium ions examined in Chapter 1, but the model was in fact developed primarily for use in studies of large biomolecules. Only minor modifications to the SCAAS model are necessary in order to convert it for use with the large protein solutes th at are the focus of this chapter. 59 2.3.1 T h e P o la riza tio n C o n stra in t The idea behind the polarization constraint developed for the SCAAS model is to constrain the average angular distribution function of the solvent molecules in the surface region (s) of the model system (see Fig. 1.1) to be the same as the distribution function expected for the actual system. In all of the applications reported in Chapter 1, the water molecules in the surface region (s) always formed a complete spherical shell, which as can be seen in Fig. 2.1 is no longer the case for protein-w ater systems. The shape of the surface region (s), however, has no effect on the implementation of the polarization constraint in protein- water systems; it is not necessary for the shape of the surface region (s) to be a completely spherical shell. Examination of Eqs. 1.17-1.27 reveals that the only requirement placed on the molecules in the surface region (s) in the derivation of the polarization constraint is that they lie at approxim ately the same radial distance from the center of the sphere. The polarization constraint as developed in Chapter 1: VT'tfc) = ^ ( 0 i - W> - AS;)2 (2.8) can therefore be taken exactly as is and used for the water in our protein-w ater system. 2 .3 .2 T h e R ad ial C o n stra in t The same radial constraint th at was developed for the SCAAS model in Chap ter 1: vrd (R i) = ~(Ri - (R “ j) - & Ri)2 (2.9) is used in the protein-w ater model. The constraint distances {Rj), however, must be defined in a more general way than is done in Eq. 1.31, because now water molecules halve been displaced by the protein molecule. The reference distances {Rj) are now defined with the equation 3 = 1 — « white - nproteiniiRj)) + ^ 7 rp(R°j)3 (2.10) 60 where j and naoiute have the same meanings as they had in Eq. 1.31, and nprotein(R) represents the number of water molecules excluded by protein atoms within the radius R, where R is measured from the position of the solute. The function n.protein(-R) is determined by superimposing a sphere of bulk water molecules onto the protein molecule, with the center of the sphere coinciding with the solute. Those water molecules th at penetrate the van der Waals radii of any of the protein atoms are then counted in nprotein(H). The force constant K r and all of the other details of the radial constraint are exactly as they are j ; described in Chapter 1. 2 .3 .3 S ize D e p e n d e n c e In the previous chapter the SCAAS model was shown to possess only slight dependencies on the size of the sphere chosen for simulations involving small solutes. We now address the issue of the size dependence of the SCAAS model when it is incorporated into a large protein-water system. For this purpose we performed a set of four free energy calculations in which the four model systems were identical except for the size of the sphere of water molecules. In these ; simulations, the difference between the solvation free energies of the ionized and I unionized forms of the Glu-49 acid group of BPTI was calculated. This free energy difference — which we designate as AAG^0 ; — has special relevance for the calculation of piTa’s (to be discussed in the next section). The m ethod used j for calculating A A Gp g o l is the same multistage sampling technique (described in detail in Appendix C) used in Section 1.4.3 for calculating the solvation free energy of a sodium ion. For each of the four calculations, trajectories of length 2 psec each were propagated on a total of 12 potential surfaces (see Eq. 1.45), j and then Eqs. 1.44 and 1.43 were applied to obtain the values of A A Gp aot. The ! results of the four calculations are summarized in Table 2.2. Disregarding the result obtained with the 8 A SCAAS sphere, the ~ 4 kcal/m ol spread in values is similar to the spread in values obtained in Section 1.4.3 for the sodium Sphere Radius . 8 A -71.9 ± 1.6 9 A -80.2 ± 1.5 10 A -82.6 ± 1.3 11 A -78.3 ± 0.8 Table 2.2: A A G^o l (in kcal/m ol) of the Glu-49 acidic group of B PTI as a function of the SCAAS sphere radius. Here AA(?j0 i is defined as the difference between the solvation free energies of the ionized and unionized forms of the acid group. How the error bars listed in the table are arrived at is described in Appendix C. ion solvation free energy. A 1-2 kcal/m ol spread would have been excellent, but considering the total free energy difference of % 80 kcal/m ol involved in this calculation, the 4 kcal/m ol spread is not discouraging. The result obtained with the 8 A sphere is markedly different from the other three results, which should probably be interpreted as a warning against using a SCAAS sphere with a radius of less than 10 A in simulations of protein-w ater systems. 2 .4 C a lc u la tio n o f p i ^ ’s o f th e a cid ic g r o u p s o f B P T I Since a com putational riiodel is supposed to provide a close facsimile to physical reality, one of the first concerns of the developer of a computer simulation model is with establishing how the approximations m ade in devising the model affect the precision and accuracy of results obtained from simulations performed with the model. The precision of a quantity “measured” d a a molecular dynamics simulation is directly related to the length of computer time needed to obtain a convergent result for that quantity. This means th at — although the conver gence times for different quantities can differ enormously — any quantity can be obtained with any desired degree of precision, provided that one is willing to invest the necessary computer tim e to do so. On the other hand, the results of 62 a calculation cannot be made more accurate simply by continuing the simula tion for a longer amount of time. The accuracy of a model hinges on how well \ the convergent results of calculations agree with experimental information. The j successful duplication of one particular experimental result does not constitute j enough evidence to corroborate the model’s accuracy, however. It is well known j j that most models (even bad ones) can duplicate a given experimental result when the “correct” set of param eters is chosen. In other words, the physical premises the model is based upon can be incomplete or even incorrect, and it is still pos- ; sible to obtain results th at agree with a given experimental datum . It is much f more difficult to calculate two or more quantities with the same model, using the same set of param eters, in such a way that the results agree with the experimen tal information. In this case, only those models with the correct physics built into them will be able to provide accurate results, while no amount of param eter adjustm ent will reconcile the results of inadequate models with the experimental data. We now turn our attention to the question of how accurate the results of simulations with our protein-w ater model are. An examination of any system at the atomic level will reveal that electro static energies have by far the greatest influence on the total energetics of the system, and thus a particularly appropriate test of our model is suggested by the calculation of p K ^s, which involves the evaluation of electrostatic energies of ionized acidic groups. Evaluation of the self energies of charged species (i.e. the energies of the species in their given dielectric environments relative to their en ergies in vacuum) is a prerequisite for consistent electrostatic calculations. The self energies of charged species - — such as the ionized acid groups we will be con cerned with shortly — are taken into account automatically when we calculate their solvation free energies. In this section we report the results of calculations of the pica’s of the four acidic groups of BPTI. These pica’s have been reliably determined with 13C NMR spectroscopy by Richarz & W uthrich (1978), and their values are listed in Table 2.3. This set of four calculations may not seem at first to qualify as a valid test 63 Acid P K a Asp-3 3.0 Glu-7 3.7 Glu-49 3.8 Asp-50 3.4 acetic 4.75 Table 2.3: The pATa’s of the four acidic groups of the protein B PTI. These p R a values were obtained by Richarz Sz Wuthrich (1978) by monitoring the changes in the 13C NMR spectrum of B PTI as a function of pH. The last entry in the table is for acetic acid in purely aqueous solution, and is relevant for calibration purposes. of the m odel’s accuracy, since in each case we are calculating the same quantity: the pK a of the given acid group. However, the local environments of the four groups are quite different, with the various contributions (protein perm anent dipoles, water perm anent dipoles, induced dipoles) to the solvation free energies differing by as much as 25 kcal/m ol, and the four calculations are therefore sufficiently different to constitute a valid test of the model. In order to obtain quantitatively correct results from all four calculations, the model system must possess the proper balance between the various contributions to the free energy. An additional factor to consider is that the determination of pRVs involves the calculation of free energy differences of magnitude 70-80 kcal/mol. To accurately determine such large free energy differences within a reasonable error range is a demanding test of any model. The first reports of microscopic calculations of the energetics of the acidic groups of B PT I were those of Warshel & Russell (1984), and Russell & Warshel (1985). These studies were done with a conceptually simple, yet quite powerful model th at used static protein coordinates, and represented the solvent molecules w ith Langevin point dipoles. These calculations yielded results th at were accu rate within a 5 kcal/m ol error range. (The error range we are referring to here is for the accuracy of the calculation, rather than its precision.) In a more recent 64 work (W arshel et al., 1986), which used an early version of the present model, the piTa’s of the two groups Asp-3 and Glu-7 were calculated, yielding results th at were apparently accurate to within 2 kcal/mol. This study was done with a limited amount of computer time, and only examined two of the four acid groups. In this study, 80 water molecules were used in both Asp-3 and Glu-7 calculations. However, to be consistent in a comparative study such as this, the number of water molecules should be determined by the radius of the sphere (as we now do), rather than taken as a constant number. A much more extensive study (which follows in this work) has led to the conclusion th at our earlier re port (Warshel et al., 1986) of achieving an error range of only 2 kcal/m ol was too optimistic, and th at we still have a long way to go in order to be able to perform calculations with this degree of accuracy. At present, the relatively un biased set of four pK a calculations that we will report shortly still places the error range at % 5 kcal/m ol. Obtaining a 2 kcal/mol error range in calculations of the energetics of biochemical processes remains as one of the main challenges in this field. 2.4.1 M e th o d The free energy of the acid dissociation reaction: AH(aq) — ► A -(ag) + H+(ag) (2.11) is given by A G = 2.3RT(pK™ - pH) (2.12) where the w superscript attached to the acid dissociation constant K™ indicates th at we are referring to a free acid in a solvent of pure water. In a real sys tem, after the dissociation occurs, the hydrogen ion diffuses away into solution. However, when one is using a finite model system to simulate this reaction, the question of where to put the dissociated hydrogen ion arises. As is shown in Appendix B, we are saved from having to answer this question by the fact that 65 free energy is a state function. The reaction described by Eq. 2.11 can therefore be expressed as a series of separate reactions which sum up to Eq. 2.11, and which can be chosen so as to simplify our task. A decomposition of the acid dissociation reaction that suits our needs yields the identity (see Appendix B): A G ^ (A -) - AG:„,(AH) = 2.3R T pK : - A G ^ /A H ) - I P ( H) + E A (A) - AG;„,(H+) (2.13) where AG^on(i(AH) is the energy needed to break the AH bond, 7P(H ) is the ionization potential of H, E A ( A) is the electron affinity of A, and A G°ol(X) is the standard free energy of solvation for X. For many acids, the values of the quantities on the right-hand side of Eq. 2.13 can be found in the literature. Note th at the way the problem has been cast in Eq. 2.13, the only quantities that actually need to be obtained from the computer simulations are solvation free energies. This saves us from having to simulate the bond-breaking step, or from needing to worry about the problem of the poorly-defined hydrogen ion concentration just mentioned. Our first task is to satisfy the identity expressed in Eq. 2.13 by finding a suitable set of COOH param eters for a small reference acid in pure aqueous solution. Adjustm ent of the acid param eters is completely valid at this stage, but after the COOH param eters have been calibrated with this calculation, they must remain fixed for all of the subsequent calculations involving the protein acidic groups. Note th at this calibration procedure also fixes the param eters of the water molecules that may be used in the subsequent calculations. The reference acid should be small, so th at all of the solvation energy of the COOH group can be attributed to the surrounding water (i.e. the acid should not be large enough to be able to solvate itself). We have chosen acetic acid as the reference acid, although formic acid could also have been used. At a tem perature of 298 K, the pK™ of acetic acid is 4.75 (Handbook of Chemistry and Physics, 1970), the gas phase deprotonation free energy, (A G j^ ^ A H ) + I P ( H) — E A (A)), is 341.5±2 kcal/m ol (Cumming & Kebarle, 1978), and the absolute standard solvation free 66 energy of the proton, A(?®0j(H+), is — 260.5 kcal/m ol (Baxendale, 1964). W ith this information, Eq. 2.13 becomes: A A G ^ = A G ^ (A -) - A G ^(A H ) = -7 4 .5 ± 2 kcal/m ol (2.14) where, for the sake of brevity, we will now use A A G°ol to signify (AG“#I(A‘ ) — A G :oi(AH)). The model system we employed in order to calculate AAG"o / consisted of a single acetic acid molecule surrounded by an 11 A SCAAS sphere of water molecules. To evaluate free energy differences, we use the equation (Valleau & Torrie, 1977): exp{— /?AGb J = (exp{— /3AT4a})a (2.15) in which j3 = (kBT)~J, kB is the Boltzmann constant, A Gba = Gb — Ga, AVt,0 = 14 — Va> and (X )a indicates a Boltzmann average of X on potential surface a. The derivation of Eq. 2.15 (along with other details concerning the calculation of free energy) is given in Appendix C. Here, the potential surfaces Va and Vh represent the unionized and ionized states of the acid group, respectively. It is not feasible to try to evaluate free energy differences of more than as 10 kcal/m ol directly with Eq. 2.15 (see Appendix C). We therefore use a step wise procedure known as multistage sampling (Valleau & Torrie, 1977). In this procedure, trajectories are run not only on Va and V & , but also on a number of interm ediate potential surfaces V), which are constructed from the potentials Va and Vb as follows: 14 = XjVb + (1 - Xj)Va (2.16) where A j are parameters that are given values between 0 and 1. To calculate A A G°ol, we employed a total of 12 potential surfaces (Va, Vb, and ten interm edi ate states V}) in order to “charge” the acid from its unionized form to its ionized form. The trajectories on each potential surface were for a duration of 2 psec each. We performed several multistage sampling calculations with different COOH param eters in the attem pt to achieve a value of — 74.5 kcal/m ol for AAG°oi. 67 Unionized Acid (14) Q c r e C 0.608 2.918 0.151 Oc -0.488 3.442 0.182 Oh -0.541 3.442 0.182 H 0.421 0.982 0.159 Ionized Acid (14) < 1 <7 e C 0.400 2.918 0.151 Oc -0.700 3.442 0.182 Oh -0.700 3.442 0.182 H 0.000 0.982 0.159 Table 2.4: Param eters used for the COOH group. The carbonyl oxygen atom is denoted by Oc, while O h denotes the hydroxyl oxygen atom. The charges of the unionized acid are closely based on the atomic charges derived for formic acid in an ab initio study by Cox & Williams (1981). The units of the various param eters are: electron charge (?); A (<r); kcal mol-1 (e). For simplicity, the polarizabilities of the atoms in the COOH group have been set to zero. Combining rules for van der Waals interactions: < 7;j = (< 7j -f er,)/2, el - 7 - = (e ,^ )1^2. We eventually settled upon the COOH parameters that are listed in Table 2.4. These COOH param eters are the ones used in all of the subsequent protein cal culations. The charges of the unionized acid are based closely on those obtained in an ab initio study by Cox & Williams (1981). The charges of the ionized acid are slightly different than the charges used in Warshel et al. (1986). We have found th at the calculated A A G ^ ’s are relatively insensitive to the specific charge distributions used for the two states (except, of course, th at the charges on the COOH atoms should sum to 0 and — 1 for the unionized and ionized states, respectively). The A AG °o l values are quite sensitive, however, to the van der Waals radii of the COOH atoms, and in particular to the radii of the oxygen atoms. Table 2.5 lists the results of the multistage sampling calculations performed with the COOH param eters of Table 2.4. As Table 2.5 shows, these a a g :0 , for acetic acid A i A; AGji — A G ^ H A Gji - AG y) 0.00 0.15 - 1.6 -1 .7 - 1.6 0.15 0.25 -3 .2 -3 .4 -3 .3 0.25 0.40 -6 .9 -7 .7 -7 .3 0.40 0.50 - 6.6 -7 .2 -6 .9 0.50 0.60 -9 .3 - 8.6 -9 .0 0.60 0.68 - 8.8 -8 .4 - 8.6 0.68 0.75 - 8.1 -8 .4 -8 .3 0.75 0.82 -9 .3 -9 .4 -9 .4 0.82 0.88 -8 .5 - 8.6 - 8.6 0.88 0.94 -9 .2 - 8.8 -9 .0 0.94 1.00 - 10.0 -9 .7 -9 .9 Total -81.5 -82.0 -81.7 i Table 2.5: A m ultistage sampling “charging” calculation. The A ; and A j listed in each row indicate which potential surfaces were involved in obtaining the results for that row of the table. The potential surface of the unionized acid, Va, corre sponds to A = 0, while the potential surface of the ionized acid, corresponds to A = 1. AGji and — A Gij are defined as: A Gji = — kB T\n(exp{— 0AVji})i, and — A G i j = k ^ T \ n ( e x p { —/3 A V ij})j. The average of A G ji and — A G ij is the actual value of the free energy difference th at we report. The sum of all the individ ual steps yields the free energy difference A A G °ol — ( A G ° ol( A ~ ) — AG°oJ(AH)) for acetic acid in aqueous solution. All energies listed in the table are given in kcal/mol. COOH param eters yield a value of — 81.7 ± 0.3 kcal/mol for A A G °o l of acetic acid, when the target was — 74.5 kcal/mol. Although this seems like a large dis crepancy, these parameters put us in the right ballpark. Performing several more calculations would have yielded better COOH param eters, but at this point we were eager to proceed to the calculations of the protein acid group pATa’s. Now th at the COOH param eters have been calibrated fairly satisfactorily with the acetic acid calculation, evaluation of the pA £’s of the protein acid groups can be attem pted. The symbols K™ and K% refer to the acid dissociation constants of the free acid in solution and the protein-bound acid, respectively. A relation similar to Eq. 2.13 for evaluating pK ^'s can be derived for evaluating pATJ’s (see Appendix B): a g ^ ( a - ) - a g ^ ( a h ) = A G £ ,(A -) - AG” ,(AH) + 2.3RT(pK* - pAT” ) (2.17) The notation in Eq. 2.17 is as follows: A G “ ;(X) is the standard solvation free energy of species X in purely aqueous solution, and AG^o2 (X) is the standard solvation free energy of X when it is chemically bound to a protein molecule (where the protein molecule itself is in aqueous solution). This means th at A G ^ (A -) - A G ^(A H ) in Eq. 2.17 is the same as AG°ol(A~) - A G ^(A H ) in Eq. 2.13. Defining A A Gp a o l = (A G ,o/(A~) — A (?^(A H )) and substituting the acetic acid results from above, Eq. 2.17 becomes: A A G p a o l - -88.2 ± 2 kcal/mol + 2.ZRTpK% (2.18) The solvation free energy difference A A Gp sol may be evaluated with the same “charging” process just described for the acetic acid calculation. For these calcu lations, the complete protein-w ater model described in Sections 2.2 and 2.3 was employed, with the center of the SCAAS water sphere situated at the COOH group of the given acid. For the initial protein coordinates we used the 1.5 A resolution crystal structure determined with X -ray diffraction by Deisenhofer & Steigemann (1975). As with the acetic acid “charging” procedure, a total of 12 70 potential energy surfaces was used to effect the transform ation from unionized acid to ionized acid. Trajectories of duration 2 psec were run on each of the 12 potentials. During all of these simulations, the tem perature of the model system was maintained at 300 K. A complete set of 12 trajectories typically requires about 2 hours of CPU time on a Cray X-M P computer, or about 30 hours on an Alliant computer. 2 .4 .2 D isc u ssio n Table 2.3 reveals th at there is a spread of only 0.8 pK units in the experimentally determined pK% values of the four acid groups of B PTI, which means that for the present purposes, Eq. 2.18 can be simplified even further to: A A Gp a o l = — 83.6 ± 0.6 kcal/m ol (2-19) This means that the A A G^ol values for all four of the acid groups should be within « 1 kcal/m ol of one another. Our goal in this project, therefore, was to try to obtain the four A A Gp a o i values to be as similar to one another as possible, and not to worry too much about obtaining the value of — 83.6 kcal/m ol exactly, since there are some sources of experimental error associated with this number. We did not expect to achieve a successful result on the first attem pt, and this turned out to be the case. At this point we began a lengthy examination of the dependence of the four A A Gp a o l values on the various param eters of the model. As we have stated previously, the COOH and the water param eters can no longer be considered as adjustable. There are, however, other param eters that we are free to adjust. Table 2.6 is an abridged list of the various calculations of A A G*o l we have performed, in which the results are given in chronological order. Since these free energy calculations are quite time-consuming, we first concentrated only on the Asp-3 and Glu-7 acid groups. Calculation A was done with an 11 A SCAAS sphere, an 11 A truncation radius for solvent-solvent interactions, and the polarizabilities of the protein atoms set at 0.05 A3 and 0.8 A 3 for hydrogen and non-hydrogen atoms, respectively. The results of — 83.7 Assorted a a g ^ Calculations Label Asp-3 Glu-7 Glu-49 Asp-50 A -83.7 -85.7 B -80.1 -84.9 C -8 0 .6 -78.6 D -82.0 -78.3 E -78.2 -78.1 F -84.6 -84.8 G -83.6 -75.4 -84.4 -8 2 .8 H -8 4 .7 -80.8 -87.2 - 88.1 I -80.5 -7 8 .4 -81.6 -84.5 Table 2.6: Results of calculations of AAG ^ol for the four acidic groups of BPTI. The labels A— I in the left column are to identify the specific conditions under | which the calculations were performed. These conditions are discussed in the j text. ! kcal/m ol and — 85.7 kcal/m ol for the Asp-3 and Glu-7 groups, respectively, | are fairly good, with the difference between the two being only 2.0 kcal/m ol. j However, the magnitude of A A Gp a o l should be greater for Asp-3 than it is for Glu-7. Calculation B was identical to A, except th at a 13 A truncation radius was used. The results of — 80.1 kcal/m ol and — 84.9 kcal/m ol are less good than the results of calculation A. At this point we discovered, by displaying the protein- water system on a graphics term inal, that the sphere of water molecules was becoming badly distorted over the course of the simulations, so in calculation C we strengthened the radial constraint on the water molecules by decreasing the constraint distances by 0.5 A. The results of — 80.6 kcal/m ol and — 78.6 kcal/m ol are still in disagreement by 2.0 kcal/m ol, but now the m agnitude of A A G^ol for Asp-3 is bigger than that of Glu-7, as it should be. However, exam ination of the system with the graphics term inal showed that the water cluster was still not retaining its spherical shape, so we changed the radial constraint on the water molecules to be more like the constraint V^ygtai used on the protein atoms (see Eq. 2.7): . V™d = - r °)2 (2.20) The ry are the positions of the water molecules’ centers of mass, and the v° are the initial center of mass coordinates. The force constants K (R j ) are dependent on the water molecules’ distance from the center of the sphere: on p 2 K (R j) = 7— ■ - i-rz (2.21) v (Rj + 25)2 v ; K (R j ) has units of kcal mol-1 A - 2 , and was constructed in such a way th at Vjad is quite weak for small Rj, but gradually becomes stronger as Rj increases. This new constraint on the water molecules is much more effective in preserving the spherical shape of the water cluster. Calculation D (and all subsequent calculations) were done with the new water constraint described by Eq. 2.20. However, the results of — 82.0 kcal/mol for Asp-3 and — 78.3 kcal/m ol for Glu-7 are now 3.7 kcal/m ol apart. In calculation E, we adjusted the polarizabilities of the protein atoms to 0.2 A3 and 0.9 A3 for hydrogen and non-hydrogen atoms, respectively. The results of — 78.2 kcal/mol and — 78.1 kcal/m ol for Asp-3 and Glu-7 are now different by only 0.1 kcal/mol, and we had apparently reached an even better error range than was obtained in Warshel et al. (1986). We then began calculations involving the other two acid groups, Glu-49 and Asp-50. Calculation F was conducted under the same conditions as calculation E, and the results of — 84.6 kcal/m ol for Glu-49 and — 84.8 kcal/m ol for Asp-50, when compared to the results for Asp-3 and Glu-7, are therefore quite disappointing, and indicate an error range of 6-7 kcal/m ol for this type of calculation, instead of less than 1 kcal/m ol. We thought that this large error range might be due to having a water sphere of insufficient size. The model was therefore modified to include a shell of Langevin dipoles beyond the sphere of water molecules. The possibility of using this type of model was mentioned briefly at the end of Section 1.4.2. The magni tudes and orientations of the Langevin dipoles are adjusted after every ten steps 73 in the molecular dynamics simulations, while the positions of the dipoles remain fixed. Calculation G (and all subsequent calculations) was done with this new Langevin dipoles feature added to the model. In calculation G, a 10 A SCAAS water sphere was used, with the shell of Langevin dipoles extending to a radius j of 20 A. A 12 A truncation radius was used for solvent-solvent interactions. ! i i The results are even worse than before, with a spread in A A Gp a o l values of 9.0 ! kcal/m ol. In calculation H the radius of the SCAAS water sphere was extended j X * • • • • • ° ^ to 16 A, with a spherical shell of Langevin dipoles filling the region between 16 A and 20 A. Calculation H required about twice the amount of computer tim e as ; calculation G, but the spread in A A Gp aol values was only reduced to 7.3 kcal/m ol. Calculation I was performed with the same conditions as calculation H, except ! o _ i that the truncation radius was extended to 14 A. This change resulted in another i two-fold increase in the computer tim e needed. In calculation I the error range ' has been reduced to 6.1 kcal/m ol. We consider this error range to be high. The failure to duplicate the pAT£’s of the four acidic groups of B PT I within the ! i desirable error range of 1-2 kcal/m ol should not be judged too harshly, however. 1 Fig. 2.2 puts this calculation in perspective by displaying the A A Gp sol values for j the four acid groups as a function of the multistage sampling param eter A. The 6 kcal/m ol error range does not loom so large when compared to the overall energies of 80 kcal/m ol involved in this calculation. Although the 6 kcal/m ol ! error range is larger than we had hoped for, the results of these pK% calculations show that our model is still fairly reliable for electrostatic calculations in proteins. Fig. 2.3 shows the contributions to A A G^0i for the four acids from the various dielectric sources of the protein-w ater system. This information is also summa rized in Table 2.7. As Fig. 2.3 dem onstrates, the contributions to A A Gp a o l vary by as much as 25 kcal/m ol from one acid group to the next. This wide variation in the contributions to A A Gp a o l indicates that the local dielectric environments of the four acid groups are quite different, and therefore that this set of four pK* 1 I calculations is a valid measuring stick of our model’s accuracy. i -20 ~ 0 1 -4 0 - ro u © a . o < 3 < -60 - -8 0 - 6 .2 0 8 4 s J Figure 2.2: The calculated solvation free energy differences A A Gp sol for the four acidic groups of B PT I as a function of the m ultistage sampling param eter A .] j These results correspond to row I of Table 2.6. The results for Asp-3, Glu— 7, Glu-49, and Asp-50 are plotted as dotted, solid, dashed, and solid curves, I respectively. 75 P r o te in p e r m a n e n t d ip o l e s P r o te in induced d ip o l e s W a ter ♦ Bulk T otal s o l v a t i o n f r e e en er g y -80 -60 -40 -20 Figure 2.3: A breakdown of the contributions to A A G*o l from the different dielectric sources in the model. The results listed correspond to row I in Table 2.6. The numbers 3, 7, 49, and 50 at the bottom of the figure refer to the acid groups Asp-3, Glu-7, Glu-49, and Asp-50, respectively. Contributions to A A Gp a o l Source of solvation Asp-3 Glu-7 Glu-49 Asp-50 Protein perm anent dipoles Protein induced dipoles W ater perm anent dipoles Bulk -13.4 - 2.8 -56.3 -8 .3 -18.7 -9 .7 -41.9 -8 .3 -32.5 -10.7 -30.0 -8 .3 -3 2 .3 -1 4 .4 -29.6 -8 .3 Total -80.5 -78.4 -81.6 -84.5 Table 2.7: A breakdown of the contributions to A A G jo I from the different di electric sources in the model. These results (which are also displayed graphically in Fig 2.3), correspond to row I in Table 2.6. 49 50 49 50 7 49 50 -d_ 7 49 50 76 2 .5 R e fe r e n c e s Allen, L.C. (1981) Ann. N.Y. Acad. Sci. 367, 383-406. Baxendale, J.H. (1964) Radiation Res. Suppl. 4, 114-138. Beveridge, D.L., Mezei, M., Ravishanker, G. k Jayaram , B. (1985) J. Biosci. 8 , 167-178. Cox, S.R. k Williams, D.E. (1981) J. Comp. Chem. 2 , 304-323. | ' ' ’ I Cumming, J.B. k Kebarle, P. (1978) Can. J. Chem. 56, 1-9. j Deisenhofer, J. k Steigemann, W. (1975) Acta Crystallogr. Sect. B 31, 238- 250. ! i Handbook of Chemistry and Physics (1970) (ed. R.C. W east), p. D-122. Cleve- land: Chemical Rubber Company. ! Harvey, S.C. (1989) Prot. Struct. Funct. Genet. 5, 78-92. | . ! Jorgensen, W.L. k Ravimohan, C. (1985) J. Chem. Phys. 83, 3050-3054. Kirkwood, J.G. (1935) J. Chem. Phys. 3, 300-313. MacKerell, A.D. Jr., Nilsson, L., Rigler, R. k Saenger, W. (1988) Biochemistry 27, 4547-4556. Mildvan, A.S. k Fry, D.C. (1987) Advances in Enzymology and Related Areas of Molecular Biology, Vol. 59 (ed. A. Meister), pp. 241-313. New York: Wiley. Richarz, R. k W uthrich, K. (1978) Biochemistry 17, 2263-2269. Russell, S.T. k Warshel, A. (1985) J. Mol. Biol. 185, 389-404. j ! Valleau, J.P. k Torrie, G.M. (1977) Modem Theoretical Chemistry, Vol. 5, P art J A (ed. B.J. Berne), pp. 169-194. New York: Plenum Press. j Warshel, A. (1978) Chem. Phys. Lett. 55, 454-458. j 1 Warshel, A. (1982) J. Phys. Chem. 8 6 , 2218-2224. j | Warshel, A. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 444-448. \ Warshel, A. k King, G. (1985) Chem. Phys. Lett. 121, 124-129. Warshel, A. k Levitt, M. (1976) J. Mol. Biol. 103, 227-249. 77 Warshel, A. & Russell, S.T., (1984) Q. Rev. Biophys. 17, 283-422. Warshel, A., Sussman, F. & Hwang, J.-K . (1988) J. Mol. Biol. 2 01, 139-159. Woodbury, R.G., Trong, H.L. & Neurath, H. (1987) Acta Histochem. Cy- tochem. 20, 261-269. Zauhar, R.J. & : Morgan, R.S. (1988) J. Comp. Chem. 9, 171-187. 78 A ppendix A T he B eem an N um erical Integration A lgorithm To perform Molecular Dynamics (MD) simulations, one must solve the classical equation of motion for all the particles in the system. The equation of motion for a given particle is: < * 2p F -777 = — = a (A-1) ati m where r, F , and a are the position, force, and acceleration vectors, respectively, and m is the mass. Solving for r requires the two integrations v (t') = v(0) + [ a (t")dt" (A.2) Jo r (t) = r ( 0) + f v(t')dtf (A.3) Jo where v is the velocity vector. In general, Eq. A.2 cannot be solved exactly (since the acceleration a is a complicated function of the coordinates), so Eq. A.2 and Eq. A.3 are solved approximately by expanding them in Taylor series and retaining the first few term s. The Taylor expansions of Eqs. A.2 and A.3 are: where xn is an abbreviation for x(tn), and h = tn+1 — tn. For instance, one of the simplest numerical integration schemes is: b? r n+1 = r„ + hvn + (A.6) v„+i = v n + hsin (A.7) which would be classified as a second-order m ethod, since the Taylor series have been truncated at the second time derivative of r. This algorithm suffers from the fact that a very small h must be used in order to get acceptably accurate trajectories, where the criterion for an acceptable trajectory is that the total energy of the system should not drift or oscillate by more than a few percent during a simulation of typical length. As can be seen from Eqs. A.4 and A.5, retaining more term s of the Taylor expansion involves the calculation of deriva tives of higher than order two. The calculation of these derivatives is undesirable, however, because of the computational time required to do so. In addition, since each successive term in the Taylor expansion is more complex than the one that preceded it, writing the computer code for the higher-order derivatives quickly becomes very difficult. Fortunately, it is possible to obtain higher than second- order accuracy without having to evaluate high-order derivatives, by using an algorithm such as the one developed by Beeman (1976). The Beeman algorithm achieves q*h-order accuracy by using linear combina tions of accelerations from previous times to approximate the derivatives up to order q at the current time. There are three steps involved in this procedure. In the first step (known as the predictor step), a new position vector is calculated with the equation q- 1 r n+i = r„ + hvn + h2J2 (A.8) p=i The second step is to evaluate the new acceleration vector, a n+1, using the new position vector in Eq. A.I. In the third step (known as the corrector step), new position and velocity vectors are calculated with the equations 9 - 1 r „+1 = r„ + k v n + h2 ^2 Cpa^-p+2 (A.9) p=i 80 9 - 1 h vn+l - rn+1 - rn + h2 ]T dpe^_p+2 (A .10) p=i The coefficients 6P, Cp, and dp in Eqs. A.8, A.9, and A.10 are found by expanding all the quantities in these equations in Taylor series about tn, collecting like terms, equating these to the terms in the exact Taylor expansions, and solving the resulting set of linear equations. The specific forms of Eqs. A.8, A.9, and A .10 for a third-order algorithm (9 = 3) are, respectively h2 rn+i = rn + hwn + — (4an - a ^ ) (A .ll) o h2 rn+l = rn + k v n + — (an+1 + 2&n) (A .12) D h2 h v n+1 = hvn + — (3an+1 + 3an) (A.13) 0 Beeman also suggests a slightly simpler procedure in which only Eqs. A.8 and A .10 are used, and Eq. A.9 is discarded. This is the procedure that we follow, not only because it is simpler, but also because we have found that it yields more accurate trajectories. Using this variation of the algorithm (and again with g = 3), the specific form of Eq. A.10 is h2 hvn+1 = h-vn + — (2a n+1 + 5a„ - a ^ i ) (A.14) o To summarize, in our MD simulations we use a third-order Beeman algorithm, which consists of first calculating the new position vector with Eq. A .ll, then evaluating the new acceleration vector a„+1 with Eq. A .l, and then using Eq. A.14 to obtain the new velocity vector. The efficiency of this algorithm is made ap parent by the fact th at trajectories of the same accuracy as those obtained using Eqs. A .6 and A.7 with a time step of h can be obtained using this Beeman algorithm with a time step of 40h. A . l R e fe r e n c e Beeman, D. (1976) J. Comp. Phys. 2 0 , 130-139. 81 A ppendix B E valuation o f pica’s The change in free energy for the acid dissociation reaction: AH(<zg, e) -»• A~(aq, e) + H+(a?, 10~pH) (B .l) (where e and 10~pH are the acid and H+ concentrations, respectively) is given by: AG = 2.ZRT(-pK™ - pH) (B.2) The superscript w attached to the acid dissociation constant K™ is used to indicate that we are referring to a free acid in a solvent of pure water. When attem pting to calculate K “ by computer simulation, one is suddenly faced with the complexity of this seemingly simple reaction: the AH bond is broken, an electron is transferred from H to A, and while this is taking place the solvent is reorganizing itself so as to desolvate AH and to solvate the new species A- and H+. Although it is possible to simulate the complete reaction, it is not necessary to do so. We can take advantage of the fact that free energy is a state function by expressing the reaction as a sum of reactions, as follows: AH(ag, e) -► AH(ag, 1 Af) (B.3) AH(ag, lAf) — ► AH(<7, 1 atm ) (B.4) AH(<7,1 atm ) — > A~( H+(aq, 1M) (B.7) | A“ (aq, 1M) — ► A~(ag, e) (B .8) | K +(aq,lM ) -4 H + (aq,l(T pH) (B.9) | In Eqs. B.3-B.9, e is the concentration of AH in solution. The free energies of \ these reactions are given in Table B .l. The sum of the free energies of Eqs. B.3- I ! B.9 is: ! AG = A G t J A - ) - A G U A H ) + A G U H * ). | +AGL j(AH) + IP(H) - EA(A) - 2.3*rPH (B.10) where A G ^ ^ A H ) is the energy required to break the AH bond, 7P(H ) is the ionization potential of H, E A ( A) is the electron affinity of A, and AG°oi(X) is the standard free energy of solvation for species X. Equating Eq. B.10 to Eq. B.2 j and rearranging some of the terms yields: | A G ^ (A -) - AG IJAH ) = ! 2.3RTpK: - A G L ^ A H ) - IP ( H) + EA(A) - A G ;J H +) (B .ll) Eq. B .ll has been arranged in such a way that in many cases the quantities on the right-hand side of the equation can be found in the literature. The absolute stan dard solvation free energy of the proton, AG°o/(H+), is — 260.5 kcal/m ol (Bax- endale, 1964). Gas phase deprotonation free energies (AG^on(i(AH) + JP (H ) — E A ( A)) of quite a few molecules have been determined by Cumming & Kebarle (1978). The pisT^’s themselves can be found in a handbook such as the Hand book of Chemistry and Physics (1970). The only unknown left in Eq. B .ll is the difference between the solvation free energies of A~ and AH, the calculation of which is a task well-suited for the multistage sampling m ethod (see Appendix C). By using Eq. B .ll we avoid having to simulate the bond-breaking step of the reaction, and we do not have to deal with the issue of what type of constraint should be placed on the position of the H+ ion during and after the reaction. W hen the acid in question is part of a protein molecule, the relevant dissoci ation reaction is: RAH(ag, e) -► RA~(aq, e) + H+(aq, 10“pH) ' (B.12) in which R designates the bulk of the protein molecule, and AH is the acidic group. We choose to think of AH as being an acetic acid molecule which is covalently bonded to R. This choice is somewhat arbitrary, but turns out to be convenient for our purposes, since the side chains of both aspartic and glutamic acids each contain at least an entire acetic acid molecule. The free energy of the above reaction is: A G = 2.3RT(pK% — pH) (B.13) The superscript p on K p is used to indicate that the acid group is covalently bound to the protein molecule, and therefore that the solvent environment con sists of the protein molecule itself as well as the water surrounding the protein. Again, we can use the fact that free energy is a state function to express the above reaction more conveniently as a sum of the reactions: RAH(ag, e) — » RAH(ag, 1M ) (B.14) 84 Reaction A G Eq. B.14 Eq. B.15 Eq. B.16 Eq. B.17 Eq. B.18 Eq. B.19 —R T In e A G t^ R - A H ) + AG?0 ,(AH) - A G ^(A H ) 2.3RTpK? -2.3R T pH - A G ^ ( R - A - ) + A G ^ (A -) - A G ^ (A -) R T In e i Table B.2: Free energies of the steps involved in the acid dissociation reaction of a protein acid group. ! RAH(aq, 1 AT) -> R(aq, 1M ) + AH(aq,.lAf) (B.15) ! AH(aq, 1M) — ► A~(aq, 1M) + H+(aq, 1M) (B.16) [ H+( a ? , l M ) H +(a?,10-pH) (B.17) j R(aq, 1M) + A~(aq, 1M) — ► RA- (ag, IM ) (B.18) . RA _ (aq, 1M) — » • KA~(aq, e) (B.19) 1 The free energies of these reactions are listed in Table B.2. In the table, AG“ ;(X) 1 I designates the standard solvation free energy of the species X in pure aqueous solution, and A G ^ (X ) designates the standard solvation free energy of X when it is bound to a protein molecule (where the protein molecule itself is in aqueous solution). The sum of the free energies of Eqs. B.14-B.19 is: ; ! AG = A G l,(A -)-A G * „ ,(A H )- A G ^ (A -) + a C ^ ( A H ) j + 2.3RT(pK “ - pH) (B.20) I where we have made the approximation th at the energy of the R-AH bond \ | broken in step B.15 is the same as the energy of the R-A~ bond formed in step ; ' B.18. Equating Eq. B.20 to Eq. B.13 and rearranging some of the term s yields: ' i a g ?„,(a -) - a g l ,( a h ) = I A ff^ (A - ) - A G ^(A H ) + 2.3RT(pKZ - pK ?) (B.21) Again, as in the case of a purely aqueous solution, to calculate the p K ^s of protein acidic groups our task has been reduced to calculating differences in solvation free energies. In this case, two solvation free energy calculations must I be performed: the first to obtain (AG™o!(A~) — AG^,j(AH)), and the second to I • j obtain (AG^oi(A~) — AG^oi(AH)). The utility of choosing AH to be acetic acid j . | is now apparent: the acid param eters can be calibrated in the calculation of ! (A G “ ;(A _ ) — A 6r^ (A H )), and we can su b stitu te'4.75 (the pK a of acetic acid ! in aqeous solution) for pK™ in Eq. B.21. j * i I ; ! | ! B . l R e fe r e n c e s j Baxendale, J.H . (1964) Radiation Res. Suppl. 4, 114-138. j | Cumming, J.B. & Kebarle, P. (1978) Can. J. Chem. 56, 1-9. i Handbook of Chemistry and Physics (1970) (ed. R.C. W east), p. D-122. Cleve- | land: Chemical Rubber Company. ) 86 | A ppendix C i s I ! Evaluating Free Energy | ; l I \ ' < | D ifferences w ith M ultistage j ■ I I I I Sam pling | I ! > f | It is often desirable to extract information from molecular dynamics (MD) or j | Monte Carlo (MC) simulations about the values of various therm odynam ic ob- j servables. The value of an observable x that corresponds to the experimentally- j ■ j measured value can be calculated with either of the following two relations: j 5 = W L t dt ^ C'1^ I ‘ { , v ;< iri(r)e x p { -/3 (7 (r)} | < s > = -fdr'iJC -m vr { ] | where T represents the complete set of spatial coordinates of the given system, j By the ergodic hypothesis, the long-time average x and the ensemble average | (x) are equivalent. Eq. C .l lends itself to use in MD simulations, and Eq. C.2 is j ■ appropriate to use in MC simulations. i ■ /-i | The free energy, G, of a system cannot be evaluated with either Eq. C .l or ! I * * j Eq. C.2, because G does not have a single well-defined value either as a function | of the coordinates f or as a function of the time f; G itself is a statistical quantity. It might be thought that G could be evaluated with the equation that defines it (McQuarrie, 1976): G = - k BT In ( J dr exp{-/3U (r)}^ + PV (C.3) but the integral within Eq. C.3 does not converge within a reasonable com puta tional time. The difference in free energy between states a and b, characterized by poten tial surfaces £ 7 a(r) and Ub(T), respectively, can be w ritten as: 'fd T exp {-(3 U ay A G ab = Ga — Gb = —kBT In + p(Va - V b) (C.4) L /d re x p { -/3 t7 f e } In the systems we study (and for condensed phases in general), changes in volume as a result of changing the potential surface are negligible (i.e. Va fts I4)i so we will not include pV terms in subsequent discussion. Valleau & Torrie (1977) showed that Eq. C.4 can be manipulated — by multiplying the integral in the num erator by unity in the form exp{/3£/&} exp{— f3Ub} — to yield the following expression: ’fd T e x .p {-p A U ab} e x p { - m y A G ab = — kBT \n (C.5) / dl? exp{— /?£/& } where we have defined A Uab = Ua — Ub• It can be seen th at the argument of the logarithm function in Eq. C.5 is in the form of Eq. C.2, with x = exp{— /?AUab}. Eq. C.5 can therefore be written equivalently as: A Gab = - k BT\n{exp{-/3AU ab})b (C .6) where (exp{— 0A U ab})b is the Boltzmann average of the quantity exp{— f3AUab} evaluated on potential surface U b- If the labels a and b in Eq. C .6 are inter changed, we get a completely equivalent way of expressing the same relation: - A G ba = kBTln(exp{-/3A U ba})a (C.7) In theory Eqs. C.6 and C.7 are identical, but in practice, the two equations tend to yield different results because different regions of configuration space are 88 traversed in simulations on surfaces Ua and Ub. This fact suggests that we can use the difference between Eqs. G.6 and C.7 to estim ate the error bars for our I free energy calculations: j | ' f SAGab = - ( A G a f e - f A & b a j ( C .8 - ) j Z \ I j \ and we can use the average of Eqs. C.6 and G.7: j I - i I A <jy* = i ( A G „ - AGw) (C.9) i 1 ^ i \ I \ as the free energy difference we actually report, A GT J ^ 1 . j r I | The error SAGab can be quite large if we attem pt to evaluate free energy j I 1 ! differences that are too large. The reason for this is th at the potentials Ua I s ' S ! and Ub are dissimilar, and therefore the regions of configuration space sampled in the simulations on Ua and U b will be quite different. This situation can be j remedied by evaluating large free energy differences as a sum of small free energy I | ■ differences: j | Ga — Gb = ( Ga — Gi) + (C?i — G2) + • • • + — Gn ) + ( Gn — Gb) ( C . 1 0 ) ’ . i For each Gj in Eq. C.10 there is a corresponding potential surface Uj which is j 1 • i j generally defined as a linear combination of Ua and Ub as follows: j i I r ■ i I Uj = XjUb + (1 — \j)U a (C.ll) i I ' * ! where the are assigned values between 0 and 1. The idea is to choose the Xj in j I such a way that each of the parenthesized terms in Eq. C.10 is small enough to i be reliably evaluated with Eqs. C .6 and C.7. How many interm ediate potentials j ! Uj should be used depends on the particular problem being considered. Since j j we now must perform simulations on the n + 2 surfaces Ua, f/i, U2, , Un- i | Un, Ub, instead of just the 2 term inal surfaces Ua and Ub, the evaluation of large f j free energy differences can be quite time-consuming. We usually evaluate large free energy differences in steps of up to but no more than 10 kcal/m ol each. C .l R e fe r e n c e s McQuarrie, D.A. (1976) Statistical Mechanics, p. 55. New York: Harper & Row. Valleau, J.P. & Torrie, G.M. (1977) Modem Theoretical Chemistry, Vol. 5, Part A (ed. B.J. Berne), pp. 169-194. New York: Plenum Press. | A ppendix D ! j i i j O ptim izing Fortran C ode for j j ! i M achines w ith V ector P rocessing C apabilities Fortran programs typically contain constructs of the form do 20 i - l , n a ( i ) = 2 .* b (i)+ c ( i) b ( i ) = c ( i ) i 20 continue which are known as DO loops. On a normal (scalar) computer, the index i is assigned a value of 1 and the instructions up to the end of the loop (the “20 continue” statem ent) are carried out. Then i is incremented to 2 and the loop instructions are again performed. This cycle is continued until i reaches n, after which the loop is exited. On a computer with vector processing capabilities, the instructions within the loop may be carried out for many values of the index i simultaneously, rather than sequentially. The rate at which programs exe cute can be increased dram atically by this simultaneous processing within DO loops. W hether or not a given DO loop will “vectorize” on a vector machine is determined by the Fortran compiler. There are circumstances in which the compiler will decide not to vectorize the loop, usually because of ambiguities in the indexing, but for other reasons as well. Here is a DO loop (actually two nested DO loops) from one of our programs: energy=0. k=0 do 50 i = l , n - l n j = ip p ( i+ l) - ip p ( i) do 40 j t = l ,n j k=k+l . j=jpp(k) d r ( l ) = x ( l , i ) - x ( l , j ) d r (2 )= x (2 , i ) - x ( 2 , j ) I d r ( 3 ) = x ( 3 ,i ) - x ( 3 ,j ) r 2 = d r (l)* d r (l)+ d r ( 2 ) *dr(2)+ d r(3)*d r(3) r 2 = l./r 2 j rl= sq r t(r 2 ) r3=rl*r2 c = 3 3 2 .* c r g ( i)* c r g ( j)* r l energy=energy+c df=-r2*c f ( i , i ) = f ( l , i ) + d f * d r ( l ) f ( 2 ,i) = f ( 2 ,i) + d f * d r ( 2 ) f ( 3 ,i ) = f ( 3 ,i ) + d f * d r ( 3 ) f ( l , j ) = f ( l , j ) - d f * d x ( l ) f ( 2 ,j ) = f ( 2 ,j ) - d f * d r ( 2 ) f ( 3 ,j ) = f C 3 ,j ) - d f * d r ( 3 ) 40 continue 50 continue 92 The DO 40 loop will not vectorize because the variable j does not have a well- defined value (the compiler has no way of knowing what , the array jpp will contain when the program is executed). Performing vector operations on the j J array f near the bottom of the loop could have undesirable results because of this ambiguity, so this loop would be compiled in scalar mode to be on the safe side. It is possible to give the compiler a directive to ignore this ambiguity, and to compile this loop in vector mode. In general, compiler directives should not be used unless the programmer really knows the effect of vector operations on the array f, and is confident that the resulting machine code generated by i I the compiler is faithful to the original intent of the program as w ritten. An j alternative to giving compiler directives is to modify the code in such a way th at j | the loop will vectorize. Here is an example of the previous portion of code, which S j has been modified so as to make the DO 40 loop vectorizable: energy=0. I k=0 j do 50 i = l ,n-1 | j n j = ip p ( i+ l) - ip p ( i) ! do 40 j t = l , n j i i k=k+l I j=jpp(k) d r ( l ) = x ( l , i ) - x ( l , j ) d r ( 2 ) = x ( 2 ,i ) - x ( 2 ,j ) d r ( 3 ) = x ( 3 ,i ) - x ( 3 ,j ) r2 = d r(l)* d r(l)+ d r(2 )* d r(2 )+ d r(3 )* d r(3 ) r 2 = l./r 2 r l= sq r t(r 2 ) r3=rl*r2 c=332. * c r g ( i ) * c r g ( j ) *rl energy=energy+c df=-r2*c 93 f ( 1 , i ) = f ( l , i ) + d f * d r ( l ) f ( 2 ,i ) = f ( 2 ,i ) + d f * d r ( 2 ) f ( 3 ,i ) = f ( 3 ,i ) + d f * d r ( 3 ) f t m p ( l,j t )= d f* d r (l) ftm p (2 ,jt)= d f* d r (2) ftmp( 3 , jt ) = d f * d r (3) 40 continue k=k-nj do 45 j t = l , n j k=k+l j=jpp(k) f (1,j)= f(1,j)-ftm p (l,jt) f (2,j ) =f(2,j ) -ftmp(2,j t ) f (3,j)= f(3,j)-ftmp (3,jt) 45 continue 50 continue The DO 40 loop will now vectorize because vector operations on the array f do not conflict with operations on the new array ftmp, which is used as a tem porary storage area. The new DO 45 loop is not vectorizable, but most of the intensive calculations are done within the DO 40 loop. Other factors th at affect whether a DO loop is vectorizable are flow control statem ents such as GOTO statem ents, calls to subroutines, and IF statem ents. DO loops containing GOTO statem ents will not vectorize. Loops containing calls to most of the intrinsic Fortran subroutines (such as sqrt, cos, sin, exp, log, etc.) will vectorize because the compiler replaces the call to these subroutines with in-line code. Calls to other subroutines (such as user-written subroutines) will in general cause the loop to be un vectorizable. Some IF statem ents are vectorizable, but most are not. For example, in the following code: knt=0 94 do 70 i = l , n - i do 60 j = i+ l,n d r ( l ) = x ( l , i ) - x ( l , j ) d r (2 )= x (2 , i ) - x ( 2 , j ) d r ( 3 ) = x ( 3 ,i) - x ( 3 , j) r2 = d r (l)* d r (i)+ d r (2 )* d r(2 )+ d r (3 )* d r (3) i f ( r 2 .g t .r2min.a n d .r2 . I t .r2max) knt=knt+l 60 continue 70 continue the DO 60 loop is not vectorizable because of the IF statem ent. There is usually a way to replace the IF statem ent with a few purely m athem atical operations, as this slight modification of the above code illustrates: knt=0 do 70 i = l , n - l do 60 j = i+ l,n dr( 1 ) =x( 1 , i ) - x ( 1 ,j ) d r (2 )= x (2 , i ) - x ( 2 ,j) d r (3 )= x (3 , i ) - x ( 3 ,j) r2=dr (1) *dr ( l) +dr (2) *dr (2) +dr (3) *dr (3) kmin=r2/r2min kmin=min0 (kmin, 1) kmax=r2max/r2 kmax=min0 (kmax:, 1) knt=knt+kmin*kmax 60 continue 70 continue This segment of code produces the exact same results as the previous segment, except th at now the DO 60 loop will vectorize. The speed gained by execution 95 of the loop in vector mode rather than scalar mode more than compensates for the few extra m athem atical operations used to construct the “IF ” statem ent. 96 | A ppendix E | i j 1 Selected Bibliography | | Adelman, S.A. & Brooks III, C.L. (1982) J. Phys. Chem. 8 6 , 1511-1524. Chelkowski, A. (1980) Dielectric Physics. New York: Elsevier. Frohlich, H. (1958) Theory of Dielectrics. London: Oxford University Press. Hill, T.L. (1960) An Introduction to Statistical Thermodynamics. Reading, | Mass.: Addison-Wesley. ■ Kubo, R. (1966) Repts. Prog. Phys. 29, 255-284. j McQuarrie, D.A. (1976) Statistical Mechanics. New York: Harper & Row. j Neumann, M. (1985) J. Chem. Phys. 82, 5663-5672. Purcell, E.M. (1965) Electricity and Magnetism. New York: McGraw-Hill. j - s Rahman, A. & Stillinger, F.H. (1971) J. Chem. Phys. 55, 3336-3359. ! j Richarz, R. & W uthrich, K. (1978) Biochemistry 17, 2263-2269. j j Russell, S.T. & Warshel, A. (1985) J. Mol. Biol. 185, 389-404. I i Valleau, J.P. & Torrie, G.M. (1977) Modem Theoretical Chemistry, Vol. 5, P art ! A (ed. B.J. Berne), pp. 169-194. New York: Plenum Press. | Warshel, A. (1979) J. Phys. Chem. 83, 1640-1652. ! Warshel, A. (1982) J. Phys. Chem. 8 6 , 2218-2224. I | Warshel, A. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 444-448. Warshel, A. & King, G. (1985) Chem. Phys. Lett. 121, 124-129. Warshel, A. & Levitt, M. (1976) J. Mol. Biol. 103, 227-249. Warshel, A. & Russell, S.T. (1984) Q uart. Rev. Biophys. 17, 283-422. Warshel, A., Sussman, F. & King, G. (1986) Biochemistry 25, 8368-8372. UMI Number: DP22683 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. Dissertation Publishing UMI DP22683 Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106-1346
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Calculations of electrostatic interactions in proteins
PDF
Computer modeling and free energy calculatons of biological processes
PDF
Computer simulation of chemical reactions in aqueous solutions and biological systems
PDF
Chiroptical spectroscopy of nitrogenase
PDF
Consistent evaluation of binding free energies and study of the role of electrostatic effects in the stabilization of protein complexes
PDF
A critical study of the hydrogenation of talloel
PDF
Changes in the rate of net synthesis of oral tissue collagen with age and wound healing
PDF
The role of water in the heat denaturation of egg albumin
PDF
An experimental investigation of amino compounds of boron hydrides
PDF
Computer simulation studies of charge transfer through biological and artificial membrane channels
PDF
Computer simulations of polymeric systems
PDF
Building a model of organization acculturation: an interpretive study of organizational culture and stories
PDF
A new approach to the problem of measuring the properties of micelles
PDF
Reactions of atomic carbon studied using pulsed molecular beams
PDF
Computer simulation of polymers and membranes
PDF
A study of the effect of glycogen on the oxidation of butyrate by rat liver slices
PDF
Adrenalectomy and fat absorption
PDF
Contributions to the chemistry of the chlorides, simple and complex, of beryllium
PDF
A novel reaction of mismatched cytosine-cytosine pairs associated with Fragile X
PDF
Connections of the dorsomedial hypothalamic nucleus in the rat
Asset Metadata
Creator
King, Gregory
(author)
Core Title
Computer simulation of polar solvents with the SCAAS model
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Chemistry (Chemical Physics)
Degree Conferral Date
1989-12
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
chemistry, organic,OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest
(provenance)
Advisor
Warshel, Arieh (
committee chair
), [illegible] (
committee member
), Petruska, John (
committee member
)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c17-766525
Unique identifier
UC11348182
Identifier
DP22683.pdf (filename),usctheses-c17-766525 (legacy record id)
Legacy Identifier
DP22683
Dmrecord
766525
Document Type
Dissertation
Rights
King, Gregory
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Tags
chemistry, organic