Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Calculations of electrostatic interactions in proteins
(USC Thesis Other)
Calculations of electrostatic interactions in proteins
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
CALCULATIONS OF ELECTROSTATIC INTERACTIONS IN PROTEINS by Stephen T. Russell A Dissertation Presented to the FACULTY OF THE G RADUATE SCH O O L UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree, DO CTO R O F PHILOSOPHY (Chemistry/Chemical Physics) August 1985 UNIVERSITY OF SOUTHERN CALIFORNIA THE GRADUATE SCHOOL UNIVERSITY PARK LOS ANGELES, CALIFORNIA-90089 This dissertation, written by under the direction of h..l?. Dissertation Committee, and approved by all its members,. has been presented to and accepted by The Graduate School, in partial fulfillment of re quirements for the degree of Ph.D. c 0 s r R % 7 Stephen T. Russell DOCTOR OF PHILOSOPHY ‘of Graduate Studies D ate.... DISSERTATION COMMITTEE To Robin i i A C K N O W LED G E M EN TS I would like to thank Professor Arieh Warshel and a ll m y friends for their help, support and friendship during m y many years at USC. Special thanks to m y parents for their contributions to m y early education. And fin a lly , I would like to thank m y wife, Robin, for supporting m e -- both financially and emotionally - through the years. TABLE O F CO NTENTS DEDICATION ............................................................... ACKNOW LEDGEM ENTS ............................................................................................ LIST OF TABLES............................. . LIST OF FIGURES................................................................................................ ABSTRACT . . ..................................................................................................... CHAPTER 1 - DEVELOPMENT OF A M O DEL TO SIMULATE THE EFFECT O F ELECTROSTATIC INTERACTIONS IN PROTEINS ..................... 1.1 Introduction............................................................................... 1.2 Simulating the Effect of the Protein Permanent Dipoles and Charges. . . . . 1.3 Simulating the Induced Dipoles of the Protein. . . 1.4 Contribution of the Surrounding Solvent...................... 1.5 Calculations of Environmental Effects in Proteins. 1.6 References . . . . CHAPTER 2 - CALCULATIONS OF THE pK 'S OF THE FO U R ACID G R O U PS a IN BOVINE PANCREATIC TRYPSIN INHIBITOR . . . . . . 2.1 General Formalism................................................................... 2.2 Calculations of In trin sic pK 's . . a 2.3 S ta b ility and Convergence of the Calculations. . . Page 2.4 Calculations of Charge-Charge Interactions in P ro te in s ............................. ............................. 2.5 Macroscopic Calculations of pK 's in. Proteins, a 2.6 References 85 91 104 3.1 3.2 3.3 3.4 3.5 3.6 3.7 CHAPTER 3 - CALCULATIONS OF THE RATES OF ENZYMATIC REACTIONS: ESTIMATION OF THE ELECTROSTATIC CONTRIBUTION TO CATALYSIS B Y THE SERINE PROTEASES..................... Introduction ...................................... ............................. Computational D etails....................................................... Calculations of Solvation Energies in Solutions. pK 's of Catalytic Groups in Trypsin cl The Energetics of the Proton Transfer Process. . The Stabilization of the Tetrahedral Intermediate References............................. ............................................. 106 106 113 119 121 127 128 131 APPENDIX - MACROSCOPIC TREATMENT O F A N ARBITRARY CH A R G E DISTRIBUTION IN A SPHERICAL CAVITY..................... SELECTED BIBLIOGRAPHY. 133 137 v LIST OF TABLES Table Page 1.1 Calculated and observed solvation energies of some charged groups . . . . . . ....................................... 33 1.2 Calculated solvation energies for an ion in water at 300°K using the ite ra tive form of the Langevin dipoles model........................... .. .. .. .. 35 1.3 Calculated solvation energy for Cl” (r\j = 2.2) using different grid spacings and the "effective" Langevin dipoles model . ................................................... 36 1.4 Parameters used in the PDLD model....................................... 43 2.1 pKQ values of four carboxylic acid groups in BPTI as determined by carbon-13 NMR.......................................... '6 2 2.2 Calculated energy contributions (in kcal/mol) to the ionization of the four acidic groups in BPTI when a ll other ionizable groups are neutral.................. 72 2.3 Calculated energy contributions to the ionization of the four acidic groups in BPTI, when each acids nearest neighbor acid group is ionized ......................... 88 2.4 Calculated energy contributions to the ionization of the four acidic groups in BPTI when the permanent dipoles of the protein are excluded.................................. 102 vi Table Page 3.1 Region I charges used for the active sites of Trypsin and Chymotrypsin , . . . ...................................... 115 3.2 Electrostatic energy contributions to various states of the active sites of serine proteases . . . 117 3.3 Electrostatic energy contributions of Table 3.2 relative to the state His(0)-Ser(0)-Sub(0) . . . . . . 118 3.4 Calculated solvation energies of the fragments involved in the catalytic reaction of serine proteases 120 v ii LIST OF FIGURES Figure 1.1 The polarization of polarizable particles by the fie ld of a positively charged particle . . . . . . . 1.2 The relation between the fie ld £° and the local fie ld £ for the system sim ilar to that presented in Figure 1.1............................................................................ 1.3 Radial, function of the energy stored in the medium surrounding a charge in an (a) non-polar and (b) polar environment................................................................... 1.4 The screening function d(r) for water molecules around an ion. ..................... .... 1.5 Average solute-solvent energy distribution (a) and polarization (b) of water molecules around a negative i o n .............................................. ............................. 1.6 Simulation of the energetics of an ion pair in polar solvents using Langevin dipoles type models. 1.7 The three regions model used to calculate electro static energies in proteins.............................................. 1.8 Scheme used to calculate the effect of the surrounding solvent on electrostatic interactions in proteins................................................................................ Page 14 18 21 29 30 37 41 45 v i i i Figure 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 Page Comparison of the environment influence of an ion in (a) protein and in (b) water. . ......................... 53 Describing the thermodynamic cycle used to estimate the energetics of acid dissociation in p ro te in s .................................................................................... 55 Pictorial representation of the backbone structure of BPTI taken from the coordinates of Deisenhofer and Steigmann (1975) . . . . . . . . . 59 Cycle used to determine the experimentally observed solvation enthalpy of formic acid in water . . . . 64 Calculated protein permanent dipoles and Langevin (water) dipoles around the various acid groups in BPTI: (top) ionized Glu 7 and unionized Asp 3, (bottom) ionized Asp 50 and unionized Glu 49 . . . 67 The contributions to the fie ld inside proteins. . . 70 Calculated contributions to AAG^i of the indicated acids in BPTI....................................................... 73 Comparison of the results obtained for the in trin sic pK ' s of the four acid groups in BPTI c l using two incomplete models: protein permanent dipoles only, and protein permanent and induced dipoles..................................... 76 Figure 2.9 Convergence of the Langevin dipoles models in calculations of pK 's in BPTI.......................................... a 2.10 Sensitivity of A /G ^ to the value of the protein residual point charges ....................................................... 2.11 Calculated contributions to (AGqq + AAGga + AA G qw) of the indicated acids in BPTI including the effect of a nearby ionized group (Asp 3" - Glu 7 '; Glu 49" - Asp 50") . . . . . . . 2.12 The effect of charge-charge interactions on the pKa 's of acidic groups in B P T I................................. . f =2 2.13 The free energy, AG. of moving a charge of ¥ radius a from water to a sphere of low dielectric constant, e . , of a radius b . ................ 2.14 Energetics of ionized groups in nonpolar proteins. 3.1 The catalytic triad in the active site of Trypsin. 3.2 Comparison of the energetics of enzymatic and solution reactions (Warshel, 1981) ............................. 3.3 Experimental estimate of the change in free energy AAG(R) that results from bringing two charges from in fin ity to a distance R from each other . . ... . . 3.4 Experimental and theoretical estimates for an HC00" +NH3CH3 ion p a i r .................................................. .... ABSTRACT The a v a ila b ility of detailed structures of proteins has provided a unique opportunity to study structure-furiction relationships in a quantitative way. In this study we report the details of a microscopic dielectric model which uses the complete protein structure plus a Langevin dipoles model to simulate the effect of the surrounding solvent. The Protein dipoles-Langevin dipoles (PDLD) model was designed to assess the effect of the protein on electrostatic interactions in an accurate, re lia b le and economical way. W e begin by demonstrating the usefulness of the model by calculating electrostatic effects in non-polar and polar solvents. W e then test the v a lid ity of the model in proteins by calculating the in trin sic pK values of the four acid groups in Bovine Pancreatic a Trypsin In hibitor. I t is found that only models which include the contributions from all sources: the protein permanent and induced dipoles and the surrounding solvent can give reasonable results. The error range in our calculations is reported to. be about = 4 kcal/mol. Though this is higher than we would lik e , i t s t i l l allows us to calculate enzymatic rate enhancements where the activation barrier can be lowered by as much as 8 kcal/mol relative to water. W e conclude by reporting preliminary results of the calculation of the proteins influence on the rate of peptide hydrolysis by the serine proteases. I t is found that results in good agreement with experiments are found for the pK, of His 57 and the reduction in the fl energy of the negatively charged tetrahedral intermediate in Trypsin. CHAPTER 1 DEVELOPMENT O F A M O DEL TO SIMULATE THE EFFECT O F ELECTROSTATIC INTERACTIONS IN PROTEIN 1.1 Introduction In this chapter we describe the development of the protein dipoles-Langevin dipoles (PDLD) model for the calculation of electrostatic interactions in proteins. The PDLD model was developed in response to a need to correlate the functions of proteins to their actual atomic structures as determined by X-ray crystallography. The large body of both functional and structural information available has created the need for objective, quantitative approaches to relate the microscopic structures of proteins to their function; in the past structure-function correlations were made on a s tric tly qualitative basis and were often quite speculative. The PDLD model is a microscopic d ielectric approach based on the premise of complete accountability; in order to assess the effect of a d ielectric medium on a particular charge distribution, a complete account of the dielectric medium must be made. In the case of a protein-solvent system this means that all components of the microscopic dielectric must be considered; the protein permanent dipoles and charges, the protein induced dipoles, and the contribution of the surrounding solvent. The concept of complete accountability can be better appreciated i f i t is compared to two very popular approaches in use today: quantum mechnical modelling of enzyme-substrate complexes, and d ielec tric continuum approaches for the stabilization of charges in proteins. Quantum mechanical approaches have been reviewed extensively by Naray-Szabo and Bleha (1982). Because of severe size constraints, these methods must resort to drastic truncations of the enzyme-substrate complex. Because of this restriction their primary strength is in studying the contributions of individual interactions ( i . e . , specific hydrogen bonds or charge-charge interactions). These methods by necessity, neglect the microscopic effect of the protein structure and its interaction with the reacting species in explaining the role the protein plays in catalysis. Continuum approaches have been used for many years to assess the stabilization of charges by the environmental influence of the protein (Tanford & Kirkwood; Tanford & Roxby, 1972; and Matthew et aj^., 1979 to cite a few). These approaches ignore the detailed nature of the protein and treat its structure as a dielectric continuum: typ ically as a sphere of low d ielec tric constant material surrounded by a medium of high d ielec tric constant. Proteins are often described as a collection of polar and non-polar amino acid residues; like a miscelle the non-polar residues collect in the in terio r of the protein. This leaves the surface to the polar residues where they are free to mingle with the surrounding solvent. This has led to the popular theory that proteins can be considered as blobs of low dielectric constant emersed in a sea of high d ielec tric constant. Our hypothesis is that by a proper orientation of the dipoles of the peptide bonds (which have a dipole moment of around 3.5 Debye (Wada, 1976)) that both polar as well as non-polar sites can be expected in the interiors of proteins. Indeed, the protein has a ll the necessary machinery to manufacture a vast array of highly specialized microscopic dielectric environments. Though quite d ifferen t, both approaches described suffer severe setbacks which are a direct result of neglecting important functional features of the protein d ie le c tric . Quantum mechanical approaches suffer from a near complete neglect of the protein structure while treating what remains in a very detailed manner. Continuum approaches neglect the importance of the protein permanent dipoles and their a b ility to influence the polarity of individual protein sites. Several groups have developed and applied models which include the effect of a ll protein atoms (or a large quantity of protein atoms) in th eir electrostatic calculations. Such approaches were used for evaluating electrostatic fields inside and outside of proteins (Johannin & Kellershohn, 1972; Hayes & Kollman, 1976; Sheridan & Allen, 1981) and in exploring the energetics of enzymatic reactions (van Duijnen et al_., 1979; Bolis et a_K, 1978; Allen, 1981). Work based on this approach has been used to estimate the importance of helix dipoles in controlling protein action and folding (Wada, 1976; Hoi, van Duijnen & Berendson, 1978; van Duijnen et a U , 1980; Sheridan | & Allen, 1980). Unfortunately, the approaches used in these studies 3 did not take into account the contributions of the protein induced dipoles or the surrounding solvent. W e hope to make clear, in a quantitative way, the p itfa lls associated with models which do not account for a ll d ielec tric contributions in calculations presented in the next chapter. A fin al point should be made concerning speed and r e lia b ilit y . Many current approaches (such as molecular dynamics) pose serious time constraints. These time constraints lim it, in many ways, the usefulness of a model as a means of asking questions. I f calculations on proteins require huge commitments in both time and money, then many important questions w ill not only be le f t unanswered, but w ill probably never be asked. What we would like is a model that is economical enough to use "in the trenches" where many sc ien tific questions are answered. Our model must not, however, sacrifice r e lia b ilit y in its fru g a lity . The model must be able to attain an acceptable degree of convergence and should be as independent as possible of in itia l conditions. In summary, i f we had to state a preamble to the creation of our model i t would be: to develop a quantitative method whereby the energy of electrostatic interactions could be assessed in an accurate, re lia b le , and economical way. I t is with this hope that the PDLD . I model was conceived. The in itia l use of the PDLD model was reported by Warshel and Levitt (1976) in th eir study of the sugar hydrolysis reaction I i i catalysed by the enzyme lysozyme. These calculations were the f ir s t I I 4 to account for all electrostatic energy contributions in the enzyme-substrate-solvent system using the fu ll X-ray crystal structure of lysozyme-hexasaccharide complex for the protein and the Langevin dipoles model to represent the contributions from the surrounding solvent. This work was important in that i t studied, in a quantitative way, the contributions of various proposed mechanisms to account for the enzymes remarkable a b ility to catalyse reactions. Included was an assessment of quantum effects (orbital sterring) and steric effects (s tra in ). The conclusion, at least in the case of lysozyme, was that electrostatic effects were dominant; that lysozyme was b u ilt to specifically stabilize the carbonium ion intermediate formed during the reaction. Though successful in its treatment of lysozyme, problems arose when transfering the model developed for lysozyme to the serine proteases. I t was decided that a careful rework of the 1976 model was in order. With increased computer capability available, a careful and complete upgrading of the old model was possible starting from fundamental electrostatic concepts. This work, which was reported in a review by Warshel and Russell (1984), w ill be presented in this chapter. Calculations .w ill be presented for model. systems and compared with results obtained from both experimental results as well as classical electrostatic theory. Where approximate methods are shown, comparisons w ill be made to the more detailed methods that they are designed to emulate. 5 The PDLD model treats interactions from three basic sources: (1) the protein permanent dipoles and charges, (2) the induced dipoles of the protein, and (3) the polarization of the surrounding solvent.. W e w ill begin by describing our treatment of these three items. Later, we w ill show how the whole model is assembled into a single unit. 1.2 Simulating the Effect of the Protein Permanent Dipoles and Charges The effect of the protein permanent dipoles is simulated by assigning residual point charges to each atom in the protein. W e are usually interested in calculating interactions between a specific subset of the protein (such as the subset of atoms involved directly in the chemical process under study) and the protein which remains. W e label the charges in these two groups Q and q respectively. With a given lib rary of permanent charges one can evaluate the interaction between the permanent charges which represent the protein dipoles and the charges in the subset Q using the equation, V “ 3 3 2 £ Q iV rij ' - 1 The dipole-dipole interaction within the protein can be evaluated by: V uu' 332 2 W rii l -2 1 , ^ 1 1 J 1J In these equations the energy is given in kcal/mol and the charges in -19 units of atomic charge (1.602 x 10 Coulombs). The distance between - O . charges, r ^ ., is given in A. Note, that these energy contributions are evaluated using the vacuum dielectric constant c = 1, since the d ielec tric contributions are included e x p lic itly in the model. I f isolated charges of the protein are to be considered(such as charges associated with ionized groups within the protein) then these contributions are included using the relation, V = 3 3 2 T Q iq j/rij *-3 where q' represents the set of protein permanent charges under consideration. Though the equations may seem t r iv ia l, th eir application to proteins is not. The major d iffic u lty lies in the development of a consistent and relia b le lib rary of protein residual point charges. In the development of our lib rary we begin by recognizing that the dipolar contribution from the protein molecular structure can be divided into two components: dipoles which arise from the fixed charges on each protein atom, and dipoles induced on each atom by the charges and induced dipoles on all other atoms in the system. Many I schemes are possible for partitioning the total dipole moment of a given group, however. Because of the uncertainty in what constitutes j the permanent and the induced component of the protein dipoles, we I . ■ | must place our fa ith in the fact that slight discrepancies in the I permanent dipoles w ill be compensated for by changes in the polarizations of the induced dipoles and the surrounding solvent. W e w ill show that this compensation effect does occur in calculations presented in the next chapter. Much work is s t ill needed in this area however. Several schemes have been used to create these charge sets. The most popular approach has been to calculate the charges on each amino acid in the gas phase using either ab -in itio (Clementi et a K , 1977; Hayes & Kollman, 1976) or semi-empirical (Momany et ^2.., 1975; Poland & Scheraga, 1967) quantum mechanical methods. A second method involves the use of consistent force fie ld approaches (Warshel & Lifson, 1970) to calibrate the charges to reproduce experimental results. This method was used by Levitt and Lifson (as quoted in Warshel & L e vitt, 1976; see also Hagler, Huler & Lifson, 1974 for additional details) where they fitte d calculated and observed energies and structures of amino acid crystals. I t is this charge set that we use. Table 1.4 lis t the values of the residual partial charges used. Though we feel that most of these charge sets would perform in a satisfactory manner, we feel that our charges give us a comparative advantage that the others lack. Most approaches create th eir charge sets using dipoles defined on the level of the individual amino acid residue. In other words, the complete amino acid residue represents one dipole of the protein (the total charge of each residue is zero but not the total of any subgroup of interest such as an N-H or a C-0 I | group). Our charge set uses dipoles defined at the sub-residue level J treating each atom of the protein chain and its corresponding side chain atoms as individual dipoles ( i. e . N-H, C-0, and CH^ groups are a ll represented as neutral dipolar e n titie s ). Also, the values of these dipoles are the same no matter what amino acid residue they reside in. The picture which results is one of a protein represented as a collection of permanent dipoles each of a magnitude and density approximately comparable to that of the surrounding solvent. This leads to at least a slight advantage when comparing the protein dipoles to those of the surrounding solvent or to the reference reaction in water. 1.3 Simulating the Induced Dipoles of the Protein In calculating electrostatic interactions in proteins i t is important to consider the d ielectric effect resulting from the polarization of the electron clouds of each protein atom. Several approaches have been used in the past but most rely on the use of d iele c tric functions to attenuate the interactions between charges (Gelin & Karplus, 1979; van Duijnen et a K , 1979; Weiner et aT[., 1982 to name a few). What we seek is an approach which treats the microscopic d ielectric of the protein e x p lic itly . Indirect approaches, such as the use of d ielec tric functions, can be very powerful; d iffic u ltie s in defining a proper d ielec tric function occur, however, when dealing with the complex heterogeneity of the protein d ielectric (Warshel & L e v itt, 1976; Rees, 1980; Warshel & Russell, 1984). Our approach (Warshel & L e vitt, 1976; Warshel & Russell, 1984) w ill define a polarization for each atom proportional to the microscopic local fie ld at that point. The local fie ld w ill consist of the fie ld from the protein permanent dipoles and the fie ld from all induced dipoles in the protein excluding the fie ld from the chosen atom on its e lf. An ite ra tiv e approach w ill be used to attain a self-consistent result. W e begin by defining an induced dipole for each protein atom, Vi - “i £ j(r f ) 1.4 where a., is the atomic p o la riza b ility of atom i located at position r -j* £(r -j) is the local electric fie ld on that atom. The local fie ld , and subsequently a ., is a function of the protein partial charges used to calculate the fie ld . To calculate the induced dipoles in equation 1.4 one must f ir s t | evaluate the actual fie ld , £, on each atom. The fie ld on atom i is i ~ 1 o given by the sum of the fie ld , £ , due to the protein permanent charges and dipoles, and the fie ld from the induced dipoles of all other atoms excluding the fie ld from the given dipole on its e lf. P, - « < § % > - O A g (r . .) = 0 r. . R A i j ' 10 - The cutoff function, g, is introduced in order to exclude the classical dipole-dipole interaction between bonded atoms (for induced O dipoles R = 2A). In these calculations we assume that the effect of the bonded atoms is included in the bond energy. Bond energy effects become important only in cases where structural changes in the protein are considered e x p lic itly . Equation 1.5 seems complicated since i t includes the induced dipoles in both its le ft and right side. However, the equation can be rearranged and written in matrix notation as, y = (I - aB)-1 a£° 1.6 where a is the diagnoal matrix of the atomic p o la riz a b ilitie s . The elements of the matrix B are given by, where s and t run on the x, y, z components of each vector. Use of eq. 1.7 is recommended for small molecules. For proteins, however, i t is inconvenient to contract and store the B matrix (small proteins can contain atoms in the thousands). Instead i t is possible to solve equation 1.5 using an ite ra tiv e approach. In the f ir s t iteration the induced dipoles, y . , on each atom are evaluated using the local fields ^ 1 produced from the protein residual point charges. In subsequent steps the local field s are calculated using both the protein residual point 11 charges as well as the induced dipoles determined during the previous step. Thus the ith dipole in the nth interaction is given by, For polarizabi1itie s and intermolecular distances characteristic of non-polar molecules, equation 1.8 converges after only a few iterations and provides the actual microscopic local fie ld for the system under study. The induced dipoles obtained by equation 1.8 can be used to evaluate the corresponding electrostatic energy. I t is important, however, to avoid a com m on error which may occur in evaluating electrostatic energies. That is , in treating a collection of charges and induced dipoles one tends to assume (quite lo gically) that the j ! ■ 1 i : j energy is simply the sum of the dipole-dipole, dipole-charge and j « charge-charge interactions at the given distances. This w ill, j S 2 4 / however, give an energy of - «Q /R (for a charge and an atom separated by a distance r ) . The energy evaluated in this way is twice as large as the corresponding energy obtained by the macroscopic theory that treats the induced dipoles as a continuum (Jackson, 1975). The reason for the discrepancy is not that the macroscopic picture is wrong but that the microscopic picture should include the energy invested in forming the fin a l dipoles. This energy is given by (a/2)§ J 12 | Thus the interaction energy between the permanent charge distribution and the induced dipoles of the protein is given (in kcal/mole) by Vind ' ‘ 332JjK«k + « R ,k ^ k - X' 9 where ££ is the fie ld on the kth dipole from the protein residual charges, K ... l , is the fie ld on the kth induced dipole from a ll other induced dipoles and ^ is the total fie ld on the kth dipole. The factor 1/2 is introduced since this interaction is counted twice while summing over a ll dipoles. The last term is the energy associated with forming the kth dipole. Using the relation y^ = and the fact that Ik = 1 ° + Ip k we o b ta in Vind ' ' 166 I CkHk i - 10 The approach described above is applicable for any collection of j molecules polarized by external fie ld s , and more importantly, by ! internal fie ld s (due to internal charges). An idealized example of polarization of non-polar molecules by a charge is presented in | Figures 1.1 and 1.2. Figure 1.1 shows the calculated polarization of j a system consisting of a positively charged particle placed at the i center of a grid of polarizable particles. | Figure 1.2a shows the relationship between C° and the radial i i i • component of the local fie ld obtained as a result of the self > ^ I consistent treatment described above. The local field s shown are an Figure 1.1. The polarization of polarizable particles by the fie ld of a positively charged p a rtic le . The induced dipoles of each particle are evaluated ite ra tiv e ly by considering the fie ld from the charge and all the other dipoles. The non-bonded interaction between the particles is represented by a van der Waals potential with a minimum * o' °3 r =2A. The p o la riza b ility of each atom is taken at 1A . 14 4 M I \ 8 / '/ / » 1 % f ♦ % average over the actual positions of the particles using several different solvent configurations. W e see that the actual local fie ld is reduced by the influence of the surrounding d ielec tric medium which polarizes to stab ilize the central charge. Self consistent calculations can become very time consuming for systems involving many protein atoms or a large number of solvent molecules. In recent calculations by Sussman (1985) 28 hours of VAX 11/780 computer time was spent to determine one induced energy value when the entire structure of Papain was used. For this reason i t is important to explore more economical alternatives. One alternative which has proven very successful, is the use of ah effective d iele c tric screening function. The relationship between the average values of £ and can be written as: < i(r)> s = s = <C°'('r ‘) /d ( r ) >s = § °(r)/< d (r)> s 1.11 Jwhere d(r) is an effective distance dependent d ielec tric screening !function; < > s designates an average over the actual sites of the I p article at a distance r + A r from the central charge. The function d (r), (which should be a matrix in the general case) expresses the screening of g° for a particle at a distance r by the opposing fie ld from the induced dipoles of a ll the surrounding particles. The average screening function for the system described in Figure 1.1 is given in Figure 1.2b. Figure 1.2b indicates that except for a peak in the f ir s t solvation sh ell, the function d (r) is approximately independent of distance. Therefore, d(r) can be replaced by an average of the screening function such that, f i = ! ° / d lA 2 I t is found that by using the average value, d, for a ll atoms that results in good agreement with those of the fu ll ite ra tiv e treatment are obtained. Thus, one can approximate the energy of the induced dipoles by (in kcal/mol), Vind = “ 166 I |° Hi - - 166 I (|? ) 2 (« i/a ) 1.13 | The constant d should not be confused with the d ielectric I constant of the system. The optimal value of d depends on the system O studied. For hydrocarbons, which are simulated with radii of 3 A and °3 a = 1.60 A (this corresponds to the effective density and p o la riz a b ilitie s of the CH^ groups), the optimal value of d is 1.61. O n the other hand for proteins which are simulated using the a's of Table 1.4 the value of d is 1.1 ± 0.1 (a larger value of d is obtained I with larger values of a ) . The reason for this discrepancy is that the ! | density of a protein is smaller than that of hydrocarbons. I t should j ! j I be noted that the self consistent energy obtained from equations 1.8 i j ! 17 Figure 1.2. (a) The relation between the fie ld £° and the local fie ld £ for the system similar to that presented in Figure 1.1. The fie ld £° is due only to the central charge while the local fie ld r includes both r° and the fie ld r from ~ ~ ~\x the dipoles on each other. The fie ld £ is given in terms of the average projection <£e>s where e = %°/ £° and <>s indicates that the average is done over the actual positions of the particles (rather than the space between the p a rtic le s ). The average is given as a function of the distance r, between the dipoles and the central charge. The distance is given in reduced units of ( r / r ), where •k r is the minimum of the van der Waals interaction used. * .0 03 r and a were taken as 3A and 1.6A respectively. The ★ o solute-solvent r was also taken as 3A. (b) The average value of the screening function d(r) (d(r) =<§ pe>/<£e>). The calculations were done by taking the Boltzmann average of several minimum energy configurations of the system (—).The function d(r) obtained using the average configuration ( . . . ) and using a cubic grid ( — ) are also given. The reduction of d(r) at ( r / r ) 3.3 is due to the j fin ite size of the system. (This was verified using J I larger systems.) j 18 (a) 0.125 0.100 Q 0.075 iZ 0.050 o 0.025 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 (b) 3.0 2.5 2.0 d(r) 1.5 1.0 I 1 I I J L_ I ---------- 1 — 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 r / r * 19 and 1.10 changes quite slowly with the density and polarizabi1ity of the system; when the density or p o la riz a b ility increase, the screening also increases and the self consistent fie ld 5 stays nearly constant. O n the other hand, the results of equation 1.13 depend lin early on the p o la riz a b ility which is inversely proportional to the density. Thus, when one models (using eq. 1.13) a new system of given density and atomic p o la riz a b ilitie s , i t is very important to recalibrate d using the stable results of the se lf consistent calculations. In this work we rely mostly on the self consistent approach of equation 1.8. The approximated results obtained by equation 1.13 w ill be mentioned for comparison (see Table 2.2 ). Figure 1.3 compares the self consistent calculations of the energy distribution for a charge in a non-polar flu id (with a p o la riz a b ility and density that corresponds to saturated hydrocarbons) to the approximated results obtained using eq. 1.13. The total energy obtained in this case by both approaches is about 50 kcal/mol. 1.4 Contribution of the Surrounding Solvent A c ritic a l contribution to the energetics of charges in proteins comes from water. Water contributes to the total solvation in two ways: i t surrounds the protein with a medium of high d ielectric constant, and i t penetrates into the interstices of the protein altering the d ielec tric makeup of the protein its e lf. Indeed, i t can be argued that water constitutes an important and essential structural feature of the protein and that any model which expects to simulate I I 20 Figure 1.3. Radial function of the energy stored in the medium surrounding a charge in an (a) non-polar and (b) polar enviornment. These calculations show a comparison of three approaches: ite ra tive calculations, calculations using effective screening functions and molecular dynamics. 21 (a) O -10 E o o -20 - (b) 0 -10 - o E B - 20 o -30 -40 L r(&) 5 6 = h H - P - -30 L 2 3 8 9 ---ite ra tiv e calculations effective screening calculation 5 6 8 — — iterative Langevin molecular dynamics effective Langevin 22 the energetics of charges in proteins must necessarily trea t the water on an equal footing with the protein amino acid structure. A variety of methods have been employed to simulate the properties of the solvent in and around proteins. The most ambitious of these involves the direct simulation of many water molecules surrounding the protein using molecular dynamics or Monte Carlo approaches. Such approaches were used in recent studies (Hagler & Moult, 1978; van Gunsteren, et a K , 1983) where the main emphasis centered on the structural aspect of the surrounding water. Although solvent structures are important for refining protein X-ray crystal structures, they have not been used successfully in calculations of electrostatic energies. Since many different structures can result in sim ilar energies, finding those classes of water structures that lead to the lowest energies represents a real computations problem. Although direct simulation of water-protein systems can (and eventually w ill) provide relia b le information about electrostatic interactions in proteins, serious convergence problems dictate the need for more practical and e ffic ie n t methods at this time. In order to allevia te speed and convergence problems associated with direct simulations, assumptions must be made to decrease the number of degrees of freedom available to the system. A possible approach is the use of fixed center dipole models which represent each water molecule by a single dipole oriented along its average direction | of polarization. Such methods include the fixed center Langevin j dipoles model (FCLD) and the protein dipoles Langevin dipoles model (PDLD) which w ill be discussed in this section. The f ir s t attempt to treat the energetics of the surrounding solvent by microscopic approaches was reported by Warshel and Levitt (1976) in their study of the catalytic reaction of lysozyme. This work represented the in it ia l use of the protein dipoles-Langevin dipoles model discussed here. Later works, (Warshel, Russell & Weiss, 1981; Warshel & Russell, 1984; and Russell & Warshel, 1985) extended this approach to include the effect of the water molecules which penetrate into the inner spaces of the protein. These studies were concerned with the catalytic reactions of serine proteases and the pK 1s of the acidic groups of Bovine Pancreatic Trypsin Inhibitor (BPTI). With the large number of degrees of freedom available to a solute-solvent system, i t is not surprising to find that many solvent configurations lead to low energy states. In simulations involving a ll solvent atoms (molecular dymamics) each solvent molecule responds to the actual local fie ld generated by the solute and a ll other solvent molecules. These approaches w ill ultim ately lead to detailed structures of solute-solvent systems when convergence can be attained. However, since our primary interest is in energy we need not concern ourselves with actual solvent structures. For our purposes i t is sufficient to know average properties which can be used to calculate energies. What we seek is a scheme where each solvent molecule responds to a local fie ld generated by the solute and the average orientation of the individual dipoles of a ll other solvent molecules. Thus, we present ourselves with the task of formulating the average polarization of the solvent molecules in terms of the fie ld from the solute charges and the average polarization of a ll other solvent molecules. In this section we w ill describe the fixed center Langevin dipoles model and demonstrate its usefulness by calculating various properties of ions and ion pairs in water. These results w ill be compared to all-atoms computer simulation results (molecular dynamics) by Warshel and reported in Warshel and Russell (1984). Later, i t w ill be shown how this model can be implemented in calculations of energetics in proteins. The average polarization of a dipole in an e le ctric fie ld is given by the Langevin equation (H ill, 1960). t I = <Vj> = (coth(xi ) - l / x i )y05i /? i 1.14 X. = yQ £/kbT 1.15 The Langevin equation accounts for both the induced polarization of the solvent molecule as well as the temperature dependent orientational polarization of the permanent dipolar component. What the Langevin equation does not account for is the interaction between neighboring dipoles. I t is here that we use a tric k from classical physical chemistry: we introduce a parameter C which attenuates the {polarization in such a way as to simulate the effect of the \ dipole-dipole interactions ignored. 1.16 This is sim ilar to the approach used when transferring the formalism of the ideal gas law to treatments of real gases where interactions between molecules are simulated using an a c tiv ity coefficient to produce an effective pressure (fugacity). W e can write out v iria l expansions for the real pressure and learn more of the exact nature of these a c tiv ity coefficients but this would not re a lly aid us in solving the problem at hand. Often the best approach is to try i t and see i f i t works. I f i t works well enough then we can consider ourselves lucky to have stumbled upon something useful. For weak fie ld (uCs<k j ) one can expand eq. 1.14 and obtain: To help fa c ilita te convergence we substitute the fu ll local fie ld , £., with a new fie ld , £!, where a ll interactions between solvent molecules closer than a predetermined cutoff distance are ignored. W e replace the parameter C in an equation 1.16, with a new parameter, C', whose purpose is to compensate for the effect of the <yi > = ( u ^ / 3 k bT) § ? / r 1.17 missing nearest neighbor dipoles. Equation 1.16 becomes, 1.18 26 c where 5. is the fie ld from the nearest neighbor dipoles of each ~ 1 solvent molecule. The energy associated with the interaction between the water molecules and the proteins charges is approximated (in kcal/mol) by: A G Q W ” A G QL + AGbulk " 3? I r yi L + AGbulk 1-19 where AGgL is the contribution of the Langevin dipoles and A Gbulk is the contribution of the surrounding bulk solvent (see below). The factor 1/2 reflects the energy associated with the polarization of the solvent dipole in its cavity. This energy is included in part in the interaction between and y. (Warshel & Russell, 1984). The values o fy *- , can be determined using the standard ite ra tiv e approach of equation 1.8 (where y" is given by equations 1.14 and 1.18) or by a non-iterative approach to be described below. The contribution of the bulk solvent (a ll solvent not simulated by Langevin dipoles) is simply obtained by a standard continuum approximation. For the case of a single ionized group we have: A G buik = "166 <e2/ b + (R0)2/b3) 1-20 | where b (in A) is the radius of the sphere included in the Langevin i | dipoles calculation and R is the distance of the ionized group from i | the center of the sphere. This expression, which is obtained from the j Kirkwood formula (Kirwood, 1934) reduces to Born's formula when R<< b (Born, 1920). As in the case of non-polar solvents we can introduce an average screening function, d (r), which when incorporated into equation (1.18) accounts for the screening effect of a ll other dipoles in the system. Equation (1.14) can be w ritten, xi = e ' V § i " fcl 1 kbTd(r) 1*21 Figure 1.4 shows a plot of d(r) versus r (where r is the radial distance of the solvent molecule from the charge) for the case of a single charge surrounded by 120 Langevin dipoles. The locations of the Langevin dipoles were taken from an all-atoms computer simulation where a dipole was placed on the oxygen of each water. An average of six solvent configurations were used. From Figure 1.4 a functional form for d(r) was approximated, d(r) = 2-r r < 6 A - o 1.22 d(r) = 6 r > 6 A Using this for the screening function we arrived at solvation energies in good agreement with those obtained using the fu ll ite ra tiv e approach. Figure 1.5 compares the radial energy distribution using three models: the ite ra tiv e Langevin (FCLD), the effective Lafigevin, and molecular dynamics results (Warshel & Russell, 1984). The non-iterative, "effective", Langevin approach is a strong 28 5.0 4.0 _ 3.0 w -o 2.0 1.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 1 0 .0 r(A) Figure 1.4. The screening function d(r) for water molecules around an ion. The calculations were done using ( i) The molecular dynamics a ll atom model {—• — ) and ( i i ) the ite ra tiv e Langevin model (—0— ). The screening function is obtained in the same way as in Figure 1.2. The error bars represent an estimate of the corresponding errors in convergence. 29 Figure 1.5. Average solute-solvent energy distribution (a) and polarization (b) of water molecules around a negative ion. The polarization is given in terms of the projection (where e is a unit vector in the direction of the fie ld O from the ion) in A- atomic charge units. The figure shows the sim ila rity between the results obtained by the following models: ( i ) The molecular dynamics calculation of the a l 1 atom model used to obtain Figure 1.4. ( i i ) (—0— ) Using the ite ra tiv e Langevin model averagec over several configurations generated by the molecular dynamics calculations of the a ll atom model. ( i i i ) (—A— ) Using the effective Langevin model (equation 1. 22) using the same configurations as in ( i i ) . The indicated polarization was obtained by averaging the water-ion electrostatic interaction (over the solvent molecules in a shell between r and r+Ar, and over differen t solvent configurations); n(r) represents the number of water molecules found in each shell (Figure 1.5a) and then dividing the resulting average by the corresponding £ ° ( r ) . This gives the effective dipole charge interaction since <ye> = -<y£°>/£°. 30 1 4 .0 12.0 *5 1Q0 effective Langevin 8. 0 iterative Langevin 60 © molecular dynamics of ail atoms model 4.0 2. 0 00 ao 9 .0 i o. o 10 40 50 6. 0 ri%) 0.35 0.30 0.25 - 0.20 , - 0.15 0.10 • — • 0.05 0 . 0 0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 1 0 . 0 „ o. r(A) tool in situations where i t can be shown to reproduce the results of the much more time consuming self consistent approach. Here, we use i t as an aid in determining the values of the parameters C' and yQ in an economical way. An in itia l calibration of yQ and C' was performed using the O "effective" Langevin approach with a dense grid of 0.5 A spacing. The dense grid allowed a careful modelling of ions of differen t ra d ii; with large grid spacings a ll ions would appear the same. Good agreement with experiment is obtained when values of 1.0 and 0.35 O charge-A (1.68 debye) are used for C' andyQ respectively. P , which represents the dipole moment of a fu lly polarized water molecule in solution, is chosen to closely resembler that of real water: 0.38 O charge-A (1.82 debye). Our value corresponds to a water molecule with a charge of -0.6 charge units on the oxygen and 0.3 on each hydrogen. Table 1.1 summarizes results obtained for various ions and compares them to th eir experimentally determined solvation free energies and enthalpies. W e chose to calibrate our model to reproduce solvation free energies of ions in water at 300°K. W e began by placing an ion at the O O center of a 12x12x12 A grid of 3A spacing. All grid points beyond a O 12 A lim it from the ion were eliminated from the grid and a Langevin dipole was added to each grid point that remained. The resulting model corresponds to an ion s lig h tly larger than Na+ emersed in a perfect grid of orientable point dipoles. I t is d iffic u lt to ascertain precisely the size of the ion since the ion is similar in 32 Table 1.1. Calculated and observed siolvation energies of some charged groups3 ’^ G R O U P .-calc aGsol SCSSD FCLD 0H“ -112±5 - 110±2 -110±5 -113±5 h3° + -108±5 -106±1 ID + 1 O rH rH 1 -113±5 HCO O ” -82±5 -80±1 -80±5 -83±5 ch3o" -94±5 -93±1 -95±5 -99±5 CH3NH3 — - - -75±5 -79+5 Imidazole+ -62±5 -60±1 -60±10 -65±10 Li + -135±5 -:12: 1±1 -124±5 -133±5 Na+ -110±5 - 101±2 -102±5 -106±5 Cl" -80±5 -79+1 -79±3 -85±3 r Ca . -373+10 -330±5 -380±5 -398±5 a Calculations using the Surface Constrained Soft Spheres Dipoles (SCSSD) model to simulate the solvation free energy of ions in solution (see Warshel, 1979 and section 3 .3 ). k Experimental values taken from results of Frieman & Krishnan (1973), Keberle (1977), Bartmess & Mclver (1979) and Lock & Mclver (1983). 33 size to the grid spacing. For calculations where an accurate ion size is necessary (as in Table 1.1) we can use the "effective" Langevin approach with a denser grid. In the f i r s t iteration the polarization of the grid points was estimated using the "effective" Langevin approach with the function d (r) given in equation 1.22. Subsequent iterations were performed using the self consistent approach of equations 1.5, 1.14 and 1.19. For this problem we used a dipole-dipole cutoff value of Q ^ j ) = 0 O for interactions closer than 3.2 A (s lig h tly larger than the distance separating nearest neighbor dipoles). All calculations used a value of 1 and 0.35 for C' and y respectively. For a single charge in a 12 O A grid about 25 iterations were sufficient for convergence. The solvation energy is calculated using equations 1.19 and 1.20. The cavity radius, b, in 1.20 is taken as the grid size plus one half of the grid spacing. Since the ion is in the center of the grid, R is taken as 0.0. Table 1.2 shows results obtained using several differen t grid sizes for the problem described above.. Included are calculations which used the results of an a ll atoms computer simulation to place the Langevin dipoles around the ion. Table 1.3 shows the effect of changing the density of the grid. Here we use the "effective" Langevin approach since ite ra tiv e applications at high grid densities are d iffic u lt and time consuming. In Figure 1.6 we calculate the energy of separating two charges in a grid of Langevin dipoles. The energy for the ion pair at i S separation R can be w ritten, ! Table 1.2. Calculated solvation energies for an ion in water at 300°K using the ite ra tiv e form of the Langevin dipoles model. Compared are calculations using different grid sizes as well as calculations where the dipoles were placed on points determined by an all-atoms computer simulation (molecular dynamics). Dipoles Langevin energy (kcal/mol) Bulk energy (kcal/mol) Total (kcal/mol) 253 119c 120c 44 -87.1 -84.0 -84 :± 3 -78 •12.3 ■15.8 •16 ■21 -99.4 -99.8 ■ . -100 ± 3 -99 The ion present has an ionic radius slightly smaller than that of Na+; we should therefore expect a solvation energy slightly lower than -102 kcal/mol (Friedman & Krishnan, 1973). The calculations b c d used values of 1 and 0.35 for C' and u' respectively. o Op. Calculation used a 12x12x12 A grid of 3 A spacing. Calculation used a 9x9x9 A grid of 3 I spacing. The position of the Langevin dipoles were taken as the location of the oxygen atoms of 120 water molecules obtained from an all-atoms computer simulation. The result is an average of four such configurations. The position of the Langevin dipoles were taken as the 44 nearest dipoles (cutoff about 7.5 K) found in a single all-atoms computer simulation configuration. Table 1.3. Calculated solvation energy for Cl- (r.. = 2.2) using different grid spacings and the "effective" Langevin dipoles model. Grid Spacing O (A) Longevin Energy9 (kcal/mol) Bulk energy^ (kcal/mol) TotalC (kcal/mol) 0.5 -65.8 -13.5 -79.3 1.0 - 68.1 -13.2 -81.3 1.5 -58.6 -13.0 -71.6 2.0 -65.6 - 12.8 -78.4 3.0 -70.2 -12.3 -82.4 a Calculated using a 12x12x12 A grid and the parameters C'=1 and iiQ = 0.35. b Calculated using equation 1.21 with b = 12+1 (grid spacing) and R=0. c o Energies were corrected to the density of the 3.0 A grid using 3 the conversion factor (3/A) where A is the spacing of the particular grid. Figure 1.6. Simulation of the energetics of an ion pair in polar solvents using Langevin dipoles type models. The calculations used two spherical ions with solute-solvent ra d ii, r , calibrated (in each case) to reproduce the solvation energies of HCC^ and CHgNHg. The figure demonstrates how the a r t if ic ia l barrier for ion recombination is reduced by improving the model used (see discussion in the te x t). The models used are: 0 non-iterative model with a fine grid of 0.5A spacing, □ ite ra tiv e model with a 1.5A grid spacing, • ite ra tiv e O model of 1.5A grid spacing.and surface correction. SCI indicates the self consistent-iterative approach of equations 1.5, 1.14 and 1.19. 37 ENERGY (kcal/mol) -130 0.5 A grid non iterative 1 . 5 A grid (SCI) -140 -150 ' 1 .5 A grid (SCI) ^ + boundry correction -160 -170 -180 -190 200 r(A) AG(R) = -332/R + AAG^(R) + A G ^ R ^ ) 1.23 where -332/R is the Coulombic interaction between unit charges of opposite sign and aG sq- j(R) is the solvation free energy of the charges at separation R. W e use three differen t models in these calculations: O (1) "effective" non-iterative Langevin with a dense 0.5 A grid O spacing, (2) the ite ra tiv e Langevin with a 1.5 A grid spacing and, (3) the results of (2) with a boundary correction added. This boundary correction consisted of trimming the outer layer of Langevin dipoles and replacing them with bulk water. The purpose of this was to compensate for the overpolarization of the surface water resulting from the neglect of the bulk ele ctric fie ld in the calculations. I t was found that both dense grids as well as boundary corrections were needed to properly simulate the small expected barrier of ion separation in water (Warshel & Russell, 1984). Ion separation continues to represent one of the more d iffic u lt challenges in simulations in polar solvents. 1.-5 Calculations of Environmental Effects in Proteins In describing our model we w ill borrow terminology from trad itio n al solvation theory. This constant comparison is not accidental, however, as the primary focus of the PDLD model is in comparing the solvation effect of the special "protein solvent" with | that of the "conventional solvent" water. In our case the solute is i that portion of the protein or protein-substrate complex directly involved in the process under study. An example would be an acid group undergoing ionization or the c ritic a l components of an enzyme active site . Our model begins by dividing the protein into three regions as shown in Figure 1.7. Region ( I ) contains the atoms which can be thought of as the solute (as described above). These are the atoms that we w ill allow to change in either charge or position during the course of the process. The charges of these atoms are frequently calculated quantum mechanically. Region ( I I ) includes the protein permanent and induced dipoles. The charges and positions of these atoms are ty p ica lly held constant (the one exception to this is when calculations are performed on coordinate sets obtained from either calculations of energy minimization or molecular dynamics). Region ( I I I ) includes the surrounding solvent. Typically water molecules up O to about 12 A from the center of reigon ( I ) are simulated by the O Langevin dipoles model while the water from 12 A to in fin ity is simulated using an appropriate continuum model (such as the Born equation). To c la rify the actual implementation of the model we describe below in detail the steps of the calculation: ( i) The protein coordinates are taken from the corresponding X-ray crystal structure. Hydrogen atoms are added in standard geometries and the protein is divided into the three regions described I in Figure 1.7. i j ( i i ) The contribution of the protein residual charges q. (the ! : t 40 protein permanent dipole I - ... protein induced dipoles Langevin dipoles all atom water model Figure 1.7. The three, regions model used to calculate electro static energies in proteins. Region I includes the charged groups of in terest, region I I includes the permanent and induced dipoles of the protein, and region I I I includes the surrounding solvent. charges representing the protein permanent dipoles) is calculated using equation 1.1 and the residual charges of Table 1.4. ( i i i ) The energy of the protein induced dipoles is calculated using equation 1.9 and the a's shown in Table 1.4 while evaluating the induced dipoles using the ite ra tiv e approach of equations 1.5 and 1.8. (iv ) The interaction between the Langevin dipoles and the protein charge distribution is evaluated according to the following O steps (see also Figure 1 .8 ). A large cubic grid (12x12x12) of 3 A spacing is b u ilt (Figure 1.8a). • The grid is rounded to a sphere of 12 A radius and the protein is placed on the grid where the center of region ( I ) is set at the center of the grid. Any grid point which is closer than 2.6 A to any protein atom is deleted (Figure 1.8b) and a Langevin dipole is assigned to each grid point that remains (Figure 1.8c). The average polarization is evaluated using equations 1.14 and 1.18 where the fie ld is evaluated ite ra tiv e ly in the same way as in equation 1.5, while excluding interactions between dipoles which are closer than 3.2 A (nearest neighbor Langevin dipoles). F in ally , the interaction energy between the protein and the Langevin dipoles is evaluated using equation 1.19. In order to obtain a proper simulation of the average penetration of the solvent into the unoccupied volume of the protein (including narrow channels and pockets), i t is important to average over differen t possible grid configurations. This is done by offsetting the origin of the grid by O a randomly selected s h ift (between 0 and ± 0.5 A). A test of the 42 Table 1.4. Parameters used in the PDLD model. qC 0.0 qC H -0.097 qCH2 -0.194 a) protein residual ' qCH3 qc = o -0.291 0.392 atomic charges qC02 0.760 qN 0.0 qN H -0.253 qNH 2 -0.599 q0H -0.427 protein atomic^ aH 0.5 p o la riza b ilitie s ax ( m ) 1.0 non-iterativec^ d ielec tric function d(r) 1.1 for induced dipoles yo 0.35 . C' 1.0 ( r < 3 1.0 Langevin dipoles0 ^ d(r) * 3 < _ ■ ' r _ < 8 r -2 Table 1.4. (continued) a Residual atomic charges (in atom charge units) used to represent the permanent dipoles of the protein (referenced in Warshel and L e vitt, 1976). The values given represent the charge on the main chain atom (the f ir s t atom lis te d ). The remaining charges are divided equally among each side chain atom so that the total charge of the group is zero. b °3 Values of the atomic p o la riza b ilitie s (in A ) of the protein atoms used in the evaluation of the induced dipoles of the protein equation 1.4. c Non-iterative d ielectric function used in calculating the induced energy with equation 1.3. ^ Langevin dipoles parameters. yQ is the upper lim it of the O Langevin dipole (in atomic charge - A) and C1 is a free parameter, p and C' are calibrated to reproduce the solvation free energies of ions in water with the restraint that yQ should be similar to the dipole moment of water. d(r) is the distance dependent d ielectric function used in equation 1. 22. Figure 1.8. Scheme used to calculate the effect of the surrounding solvent on electrostatic interactions in proteins. W e start by considering a cubic block of d ie lec tric material represented here as a grid of polarizable points (1 8 .a). W e superimpose the protein on the grid allowing the protein to dispalce all grid points found too close to any protein atom (1 8 .b). W e also exclude a ll points which are at a distance larger than a given cutoff radius. Finally, we turn on the protein permanent dipoles which polarize the remaining grid points in accordance with the Langevin model described in the text (1 .8c). This sphere of heterogeneous d ielec tric is then dropped into a sea of water represented by an appropriate continuum model. The resulting d ielectric effect is represented as a collection of dipoles coming from both the permanent and induced dipoles of the protein as well as the surrounding solvent. I 45 (a) o o °0 o °o a O <3------------< — ‘ j------------<)------------0 () m < <) < i 6 ..—* ) ■ ■ ........ —< 5 ----------H I------------I (b) (c) convergence w ill be presented in the next chapter. The effect of the fie ld from the Langevin dipoles on the induced dipoles of the protein should be included in the calculations by reevaluating the protein induced dipoles after evaluating the Langevin dipoles. In the present study we examined this effect and found that i t contributed no more than 2 kcal/mol to the calculated solvation energy. In general one has to examine the importance of this "back fie ld effect" in any new system. . After evaluating the fin a l polarization of the dipoles in the system we evaluate the to tal energy of the model by the sum of the contributions listed above. The relevant energies involve, in most cases, differences between the corresponding energies upon change of the solute (region ( I ) ) charges. j 1.24 and Vind change in we do not the A A f iq w is M G sol = A(AVQ m + V + AAVind + a ' agQL + “ bulk* = AAGQy + AAGQa + AAGqw where AVp^ and AVind denote the corresponding change in Vp upon change of Q from zero to its given value. AAGq^ is the Vp^ upon change of the charge Q. AV is zero as long as consider changes in the protein configuration upon change of charge Q. AAGq^ is the change inAVjnd upon change in Q, and the change in aGq^ and AG^-j^. Here we denote the change in calculated electrostatic potential energy as a free energy. 47 1.5 References Allen, L.C. (1981) Ann. N.Y. Acad. S c i., 367, 383-406. Bartmess, J.E. & Mclver, R .T., Jr. (1979) Gas Phase Ion Chemistry, Vol. 2 (ed. M.T. Bowers), pp. 2, 87-121. New York: Academic Press. Bolis, G., Ragazzi, M., Salvaderi, D., Ferro, D.R. & Clementi, E. (1978) Gazz. Chim. I t a l . 108, 425-443. Born, M. (1920) Z. Phys. I , 45-48. Clementi, E., Cavallone, F. & Schordamaglia, R. (1977) J. Am . Chem. Soc. 99, 5531-5544. Friedman, H.L. & Krishnan, C.V. (1973) Thermodynamics of Ion Hydration in Water, A Comprehensive Treatise, Vol. 3 (ed. F. Franks), pp. 1-118. New York: Plenum. Gelin, B.R. & Karplus, M. (1979) Biochemistry, JU S , 1256-1268. Hagler, A .T., Huler, E. & Lifson, S. (1974) J. Am. Chem. Soc. 96 5319-5327. Hayes, D.M. & Kollman, P.A. (1976) J. Am. Chem. Soc. 98, 3335-3345. H ill, T.L. (1962) An Introduction to S tatis tic a l Thermodynamics, pp. 205-209. Reading: Addison-Wesley. Hoi, W.G.J., van Duijnen, P.T. & Berendson, H.J.C. (1978) Nature Lond. 273, 443-446. Jackson, J.D. (1975) Classical Electrodynamics, 2nd ed., pp. 158-162. New York: Wiley. Johannin, G. & Kellershohn, N. (1972) Biochem. Biophys. Res. Commun. 49, 321-327. Keberle, P. (1977) A. Rev. Phys. Chem. 28, 445-476. Kirkwood, J.G. (1934) J. Chem. Phys. 2, 351-361. Locke, M.J. & Mclver, R.T., Jr. (1983) J. Am. Chem. Soc. 105, 4226-4232. Matthew, J.B ., Hanania, G .I.H . & Gurd, F.R.N. (1979) Biochemistry, 18, 1919-1928. Momany, F.A., McGuire, R .F., Burgess, A.W. & Scheraga, H.A. (1975) J. Phys. Chem., 79, 2361-2380. Naray-Szabo, G. & Bleha, T. (1982) Progress in Theoretical Organic Chemistry, Vol. 3 (ed. I.G . Csizmadia), pp. 267-336. Amsterdam: Elsevier. Poland, D. & Scheraga, H.A. (1967) Biochemistry, 6, 3791. Rees, D.C. (1980) J. Molec. Biol. 141, 323-326. Russell, S.T. & Warshel, A. (1985) J. Mol. Biol, in press. Sheridan, R.P. & Allen, L.C. (1981) J. Am. Chem. Soc. 103, 1544-1550. Sussman, F. (1985) private communication. j i Tanford, C. & Kirkwood, J.G. (1957) J. Am. Chem. Soc. 79, I 5333-5339. i | Tanford, C. & Roxby, R. (1972) Biochemistry, 11_, 2192-2198. j van Duijnen, P.Th., Thole, B.Th., Broer, R. & Nieuwpoort, W.C. I I (1980) In t. J. Quantum Chem. 17, 651-671. van Duijnen, P.Th., Thole, B.Th. & Hoi. W.G.J. (1979) Biophys. 49 Chem. 9, 273-280. Wada, A. (1976) Adv. Biophys. jl, 1-63. Warshel, A. (1979) J. Phys. Chem., 83, 1640-1652. Warshel, A. & L e v itt, M. (1976) J. Mole. Biol. 108, 227-249. Warshel, A. & Lifson, S. (1970) J. Chem. Phys. 53, 582-594. Warshel, A., Russell, S.T. (1984) Quarterly Rev. Biophys, 12, 283-422. Warshel, A., Russell, S.T. & Weiss, R.M. (1982) Biomimatic Chemistry and Transitiion State Analogs (eds. B.S. Green, Y. Ashani, and D. Chipman), pp. 267-273. Amsterdam: Elsevier. Weiner, P.K., Langridge, R., Blaney, J.M. Schafer, R. & Kollman, P. (1982) Proc. Natl. Acad. Sci. 79, 3754-3758. CHAPTER 2 CALCULATIONS OF THE pK 'S O F THE FOUR ACID GRO UPS a IN BOVINE PANCREATIC TRYPSIN INHIBITOR 2.1 General Formalism In the previous chapter a general approach was presented for simulating electrostatic interactions in solutions and in proteins. A sim plified model was developed which included the contributions from a ll sources: the protein permanent and induced dipoles and charges, plus, the interaction of the surrounding solvent. In this chapter the fixed dipoles-Langevin dipoles model w ill undergo a rigorous test in an e ffo rt to assess its s ta b ility and r e lia b ilit y . As a goal we stated a desire to develope a model that is both economical as well as re lia b le . W e know that eventually models using molecular dynamics on a ll protein atoms and including the surrounding solvent w ill become practical; a major e ffo rt has been ongoing in this group for some time to develope such models (Warshel (1984)). Until that time, however, theoretical approaches such as the model described here are needed to f i l l the void. As a test case we chose to calculate the pK 's of ionizable a groups in proteins and to compare these to experimentally measured values. This problem is analogous to our e a rlie r calculations on the solvation of ions in solution, but this time the protein plays the i j role of the surrounding solvent. This idea is represented in Figure 2.1 where the solvation of an ionized acid group is shown in water and in its protein s ite . Since the pK values for most acid groups in a proteins are sim ilar to those in water, the protein must make up for the missing water solvation through a proper orientation of its permanent dipoles and the polarization of its atoms. I t is this effect that we wish to demonstrate here. Our model, as presently stated, is b u ilt to assess environmental influences only. I t is useful therefore, to formulate the problem in terms of differences between the effects of d ifferen t environments. Any problem that can be reduced to a question of differences in solvation when moving from one medium to another can be approached in this way. Our current task is to devise a cycle whereby pK values can be a calculated in proteins using information attained from environmental effects as our measuring stick. Figure 2.2 shows the cycle used to relate free energy values to pK 's in proteins. In th is scheme we a I chose to represent the ionization process in terms of the energy of i | the corresponding reaction in water and the difference in the j solvation of the ionized and unionized forms of the acid in protein and in water. From this cycle we can write the free energy, AG , of I • ioinizing an acid in protein at a given pH as, Figure 2.1. Comparison of the environment influence of an ion in (a) protein and in (b) water. The effect of the protein is to exclude water from the vis cin ity of the ion and replace i t with the permanent and induced dipoles of its detailed structure. In order to stab ilize an ion, the protein must make up for the loss in solvation suffered as a result of the displaced solvent. The fact that proteins accomplish th is, contributes to the idea that proteins act as special purpose environments b u ilt to stab ilize specific charge distributions;, where water is free to reorient the protein backbone is fa re ly rigid and does not readily relax to stab ilize any charge distribution. 53 Figure 2.2. Describing the thermodynamic cycle used to estimate the energetics of acid dissociation in proteins. W e choose here to represent the ionization process in terms of the energy of the corresponding reaction in water (which we know experimentally) and the sovlation of the A" and A-H species in protein relative to water (which we can calculate). In the f ir s t step one mole of A-H bound to protein is transferred to solution at a pH corresponding to a hydrogen ion concentration of CQ. The energy for this process is the difference between the solvation energies of A-H in water and in protein ( aG sq- j (A-H)). In the second step, 1 mole of A-H is ionized to A" in solution while maintaining a constant pH. The energy of this step is related to the difference in the pKw of the acidic group in water and the pH of the solution (aG w = w 2.3RT (pKw-pH )). Fin ally, the solvated A” species is moved from water to protein where the energy of this process is once again, equal to the difference in the solvation energies of the ionized species in water and in protein (AGS0- ] (A~)). Therefore, the problem reduces to calculating differences in solvation energies only; the j energy of the reaction its e lf is known from experiments. I i By using this cycle the need to calculate the energy of the process in protein d irec tly is avoided improving the r e lia b ilit y and accuracy of the calculations. 55 II Q _ < o Q . t " o 3 w O <3 < C L c O , ro O O Q. i 3 o CL cc ro c v j 3 o <3 o o X <r 3 f o rv « n o < 1 o <3 t - i i ii Q T X 56 AGp(AHp^ - +H ^ > = AG^(A-)-4G"^(AH) + A G ^A H ^+tT) = A G ”^ ( A " ) - A fi^(A H ) + AG°(AHw -^ ‘ +H+) - RT ln[C(H+)] = A G W '>P(A“) - A G W ^ " P(AH) + 2.3RT(pK^-pH) 2.1 a where aG w ^p is the difference in solvation energy of the indicated species in protein (p) and in water (w). In obtaining this equation we used the relation A G W = AG° + RT ln [C (A ')c/C(AH)b] + RT ln[C(H+)] = -RT In K* + RT In [C(H+)] 2.2 a where the b and c subscripts refer to step b and c in Figure 2.2. W e w ill define the pK in protein, in analogy to the defin ition of the 3 1 pK in water, as the pH where AG_ becomes zero. Using this definition a p equation ( 2. 1) can be rewritten as, 0 = A G W _> P (A") - A G V ^>P(AH) + . 2.3RT(pK* - pKp) a a pKP = pK* + (l/2.3RT)(AAGPol - A A G ^) 2.3 where “ G sol = - AGsol(AH)- 57 With this equation we have accomplished our task of relating the p«a in proteins to a quantity we can measure, the pKfl in water, and a quantity we can calculate, the change in solvation energy when passing from water to protein. I t is important to note that since our calculations are relative to water we need not concern ourselves with the actual energy of the deprotonation process in the gas phase. This w energy is included in the experimentally measured pK&. 2.2 Calculations of In trin s ic pKJs ------------------ q W e decided to test the model by calculating the pK 's of the four < X acid groups in Bovine Pancreatic Trypsin Inhibito r (BPTI). BPTI is a small protein (peptide) consisting of only 58 amino acid residues (898 atoms). Its function is to in hibit the a c tiv ity of various digestive enzymes including chymotrypsin and trypsin. However, our interest stems not from its function but from the wealth of experimental information related to BPTI. This information includes very careful studies of its three dimensional structure and the pKfl values of its ionizable groups. BPTI contains four acid groups: aspartic acids 3 and 50, and glutamic acids 7 and 49. Figure 2.3 shows a p icto ria l representation of the mainchain backbone of BPTI with its four acid groups indicated. The f ir s t X-ray crystal structure of BPTI was reported by Huber and co-workers (Huber et al^., 1970) on crystals grown at pH 10.5 in 2.5 M K, Na phosphate buffer. At this pH a ll the acid groups should be in O | th e ir ionized state. This structure was later refined to 1.5 ( A Figure 2.3. P ictorial representation of the backbone structure of BPTI taken from the coordinates of Deisenhofer and Steigmann (1975). The four acid groups: Asp 3, Glu7, Glu 49 and Asp 50 are indicated on the fig ure. The large circles represent the location of the alpha carbons; the small circles represent the average positions of the side chain atoms. The picture was created using the programs PLUTO and RIBBON. 59 resolution by Deisenhofer and Steigmann (1975). The pK values of the four acid groups have been assigned using a the results of carbon-13 N M R spectroscopy (Richarz & Wuthrich, 1978; and March et 1982). The only significant difference between these two experiments was in the ionic strength of the samples where the la tte r group was careful to maintain a reasonably constant value. The results of these experiments are essentially id entical. Table 2.1 l i s t the values obtained along with pKa values for free aspartic and a glutamic acids in water. Although both studies were able to separate the two aspartic acid resonances from the two glutamic acid resonances quite w ell, neither was able to assign individual resonances to particular acid groups. This was not a major problems, however, since a ll four pKg values are v irtu a lly identical when compared with the error range of our calculations. W e w ill begin our analysis by considering the in trin s ic pKa's of the four acid groups in BPTI. The in trin sic pKa (pK^n t) is the p«a obtained when a ll other ionizable groups are in th eir neutral state. This corresponds to the energy required to form a charge in its protein site: the self energy. Each protein site w ill have its own in trin s ic pK dependent on the microscopic d ielec tric makeup of that d i s ite . Other approaches (March et aJL, 1982; Matthew et a K , 1979) treat this self energy contribution as a free parameter usually setting i t equal to the value of the pKa in water (modified by a j constant determined by the presence of hydrogen bonds). The primary 61 Table 2.1. pK values of four carboxylic acid groups in BPTI as u determined by carbon-13 NMR. Included also are pK 's of the a reference acids in water. Residue Richarz and Wuthricha,c March et_ al .^ ,c Asp 3 3.1 3.18 Asp 50 3.4 3.57 Glu 7 3.7 3.95 Glu 49 3.8 3.97 Free acid in water^ Aspartic acid 3.9 Glutamic acid 4.3 a Richarz and Wuthrich (1978). Measurements were made at 35°C with no ionic strength adjustments made during the titra tio n s . k March et aK (1982). Measurements were made at 41°C and adjusted to 25°C using the relation pK(25°C)=pK(41°C)+C where C=0.04 for glutamic acid and C=0.08 for aspartic acid. The ionic strength was held reasonably constant at 0.1M. The pK values reported are an average of those obtained from the a carboxyl carbon resonance and the resonance from the carbon atom from the CH? group bonded to the carboxyl carbon. d pK values of the free acids in water at 25°C were taken from Cantor and Schimmel (1980). emphasis in these studies is on the effect charge-charge interactions have on the pK 's. A serious drawback to such an approach, and a a reason why in trin s ic pK 's are of such importance, rest with the fact a that several c a ta ly tic a lly important charges exist in in terio r regions of proteins. These regions are often considered non-polar by these approaches. An example is aspartic acid 102 in Chymotrypsin and Trypsin. Calculations which attempt to reproduce the actual experimentally measured pKa values, of course, should consider the contribution from other ionized groups e x p lic ite ly when these contributions are deemed important (such as in the formation of salt bridges). W e w ill study the effect of charge-charge interactions by presenting two sets of calculations: one which includes the effect of these interactions, and one which does not. In order to assess the energy of ionizing an acid in its protein s ite , i t is f ir s t necessary to examine the ionization process in water. The enthalpy involved in such a process is described in Figure 2.4. The figure shows that water molecules stab ilize the ionized acid by about -83 kcal/mol. The cycle for the corresponding free energy gives about -80 kcal/mol (see Table 1.1 ). Note that solvation energies and solvation enthalpies of ions in water are similar to within 5% at room temperature. The solvation free energy of the unionization acid is about -10 kcal/mol. Thus, the change in solvation free energy upon ionizing an acid in water is about -70 | kcal/mol. Since pKa 's of acid groups in proteins are not in general, | much differen t than in water (except for a few very important special Figure 2.4. Cycle used to determine the experimntally observed solvation enthalpy of formic acid in water. I t is shown that water stabilizes the ionized acid by about -83 kcal/mol. A H 0 is the sum of the solvation enthalpies for H20 (-10.6) and formic acid (-11.2) (Arnett, et a l., 1972). AHpy is determined by the difference in the proton a ffin itie s of HC02H (-345) and (-170) (Kebarle, 1977): AHso l(H30+) is estimated from the cycle H^O + H20 - H30+ + pH' using the enthalpy for gas phase proton transfer (AHpy = 220 kcal/m ol), the enthalpy for this reaction in solution (13 kcal/mol), the heat of vaporization of H20 (which is about 20 kcal/mol for two water molecules), and the assumption that AHSQ-|(H30+) = AHsol(°H -). AH0 =+10.6+11.2 LO k O * C \ ) o o X + 1 / 1 o t o X J i o II (/> «/l C L X < 1 w n T (V I O O X « n 6 " C M X cases), we should obtain about -70 kcal/mol from our calculated AGsoi(A-) - AGso-|(AH). Reproducing such a large number (which corresponds to about 51 pK units) might appear unusual when a experiments measure pKfl changes of less than 1 un it. This is because when dealing with differences between the solvation of charges in water and in th e ir protein sites, we must also consider the difference in their respective self energies. Unless stated otherwise, we used the PDLD model with the parameters and charges listed in Table 1.4. Region I contained the atoms of the CO2 group of the acid under consideration. W e used the following charges for the acid groups: 0 .7, -0.35 and -0.35 for the unionized form and 0.7, -0.85 and -0.85 for the ionized form for the carbon atom and the two oxygen atoms respectively. As a computational sim plification, permanent and induced dipoles were added only to protein dipolar groups which possess at least one Q atom within 9 A of a region I atom. All other protein atoms were eliminated from consideration and replaced by either Langevin dipoles or bulk water. This sim plification has been found to produce results comparable to those obtained using the whole protein while resulting in a vast savings in computer time. The Langevin dipoles were calculated in the same way as described in the previous chapter. Figure 2.5 shows the calculated polarization of the protein permanent dipoles and Langevin dipoles (which represent the | surrounding water molecules). Represented are two configurations of i j I the protein: ionized Glu 7 with unionized Asp 3, and ionized Asp 50 I ! | 66 Figure 2.5. Calculated protein permanent dipoles and Langevin (water) dipoles around the various acid groups in BPTI: (top) ionized Glu 7 and unionized Asp 3, (bottom) ionized Asp 50 and unionized Glu 49. The protein dipoles are separated into two contributions: the to tal dipole moment from the peptide bonds (arrows originating from the midpoints of the solid dark lin e s ), and the total dipole moment of the side chain atoms (arrows originating in the small circles of the protein). You can see from the figure how the protein dipoles orient to stabilize the ionized acid, offsetting the loss of the displaced water. The importance the remaining solvent plays in the to tal protein-solvent system becomes quite clear; both the protein and the surrounding solvent are essential contributors to the energetics of charge s ta b iliza tio n . 67 68 with unionized Glu 49. For c la rity , the figure shows just a small fragment of the protein used in the calculations. I t is interesting to note how the protein permanent dipoles act as an extension of the solvent, using th e ir orientation to make up for the effect of the excluded solvent. In Figure 2.6 we use the PDLD model to calculate the magnitude of the ele ctric fie ld along a section of mainchain extending from residue 2 to residue 12. These calculations were performed in the presence of ionized Glu 7 and used the "effective" non-iterative approach to assess the contributions from the induced dipoles and the surrounding solvent. I t is interesting to note how the ele ctric fie ld is dominated by the fie ld from the protein permanent dipoles except for the region near the ionized Glu 7. In this region the effect of a ll components become comparable, emphasizing the need to include contributions from a ll sources when assessing electrostatic interactions in proteins. Results for the in trin sic pKJs of the four acid groups of BPTI a are summarized in Figure 2.7 and Table 2.2. I f one assumes that the in trin s ic pK 's in BPTI are sim ilar to the pK, of an acid group in a a water, then one should obtain about -70 kcal/mol for AAG^-j. W e arrived at values of -75.4, -67.1, -77.2 and -68.7 for Asp 3, Glu 7, Glu 49 and Asp 50 respectively: an average of -72 ± 4 kcal/mol J (corresponding to a relative error of 6%). Though s t i l l higher than j we would lik e , this deviation s t i l l allows us to make respectable k j estimates of protein contributions to enzyme catalysis where the 69 Figure 2.6. The contributions to the fie ld inside proteins. The figure describes the calculated ele ctric fie ld for BPTI with Glu 7 in its charged form (where a ll other ionizable groups are neutral) across a pathway along the alpha-carbons of the indicated residues. £n> £ » 5 andht, r ~Q* .~ii* ~a ~w are respectively, the magnitudes of the fie ld s from the inoized acid, the protein permanent dipoles, the protein induced dipoles and the surrounding solvent. The total fie ld is also indicated in the figure. 70 |£| kcal mol" 1 charge" 1 A-1 R I 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 a*carbon residue number Table 2.2. Calculated energy contributions (in kcal/mol) to the ionization of the four acidic groups in BPTI when a ll other ionizable groups are neutral. Acid AAG n Qu A AG Qa A AG Q w “ G ?ol (AAGP'int) Asp. 3 -10.9 -1.5 (-1.8 ) -63.0 -75.4 (-71.3) Glu 7 -21.5 -14.0 (-15.3) -31.6 -67.1 (-70.6) Glu 49 -33.6 -1.9 (-1.8) -41.7 -77.2 (-70.3) Asp 50 -31.6 0.8 (-1 .3 ) -37.9 -68.7 (-70.3) jThe calculations were done using the PDLD model with the ite ra tiv e j approach of eq. ( 1.8) to calculate the induced dipoles energy, AAGga , j and the ite ra tiv e Langevin dipoles model (with a grid spacing of | 3 A) to estimate the contribution of the water molecules, AAGqw- The j total change in solvationenergies, AAG^-j, are related to the | in trin sic pK ' s of the acid groups in th eir protein sites. The observed values of AAG^^jn are given in parenthesis. These values were obtained from eq. 2.6 and the value of -70 kcal/mol for “ G sol and the estimate of pK^nt using equations 2 .4 -2 .6 . AAGq^ values obtained using the non interative approach of eq. 1.13 with " d = 1.1 are shown in brackets. p Figure 2.7. Calculated contributions to AAG sq- j of the indicated acids in BPTI. The calculations were done by the PDLD model 7 using the ite ra tiv e approach of equation (1.5) to calculate the induced dipole energies AAG^ and the ite ra tiv e Langevin dipoles model (with a grid spacing of O 3 A) to estimate the contribution of the water molecules MGqw (see text for more d e ta ils ). The calculated values of AAGP o1 are compared to the corresponding experimental p,int estimates AAGin t (labeled by horizontal bars). These free energies were determined using the approach of equations 2 .4 -2 .6. I t is shown clearly that use of an incomplete model (such as protein permanent dipoles only) results in a large spread in the calculated pK ' s of the four acids. In fa c t, the figure demonstrates that without the compensating effects of both the induced dipoles as well as the surrounding solvent, then unacceptibly large deviations from the experimentally determined values result (Asp 3 and Glu 7 which have low values for A A G n receive th e ir relative compensation d ifferen t sources; the surrounding solvent and the induced dipoles respectively). 73 Energy (kcal/m ol) Permanent Induced Di poles Dipoles Water + Bulk (Langevin) Total Solvation AAG p, int sel “ G i, 3 r 49 50 3 r 49 50 3 7 49 50 3 7 49 50 Acid Number 74 protein can contribute as much as 7 kcal/mol to the reduction of the activation ba rrier. Furthermore, pK 's can be considered as a "worst a case" when estimating absolute error ranges. This is because the creation of a charge upon ionization leads to large values of » Gsol (when dealing with the creation of ion pairs for instance, the total charge of the system remains zero and AAG^q- j is small). Under these circumstances, small re lativ e errors can lead to large absolute errors. For the sake of comparison we can back calculate the in trin sic pK 's from the experimentally measured pK, values given in Table 2.1. a a The solvation free energy corresponding to the in trin sic p«a can be written as follows, < n t - < o l + 2' 3 RT <PK 1nt - 2 ' 4 The pK of each acid is shifted by an amount related to its C l interaction, with other charged groups in the system. The average fractional charge Q, on each acid is given by the following equation, Q = X/(l+X) X = 10Q(PH-PX) 2.5 where Q is the magnitude of the fu lly charged acid (in this case Q is taken to be -1 ). The pK of acid i , pKL ., in the presence of other a a , 1 charges can be written as, pKa ,i = pKin t + (1/2.3RT) I 332-Q.Qj/R.j e fR .j) 2.6 < 3 For charges near the surface of the protein, the d ie le c tric constant Energy (kcal/m o l) Permanent Di poles Only All Protein Interacti ons Protei n PI us Solvent AAG P. i«* sol -WA6«o | Figure 2.8. 3 7 49 50 3 7 49 50 3 7 49 50 Acid Number Comparison of the results obtained for the in trin s ic pK_ 1 s G of the four acid groups in BPTI using two incomplete models: protein permanent dipoles only, and protein permanent and induced dipoles. Also shown are the results obtained using the PDLD model. These results are a summary of values contained in Figure 2.7. ^ is assumed to be 40. I f you start with experimentally measured pK values you can come a up with a reasonable estimate of the corresponding PK^^'s by solving equations 2 .5 -2 .6 in an ite ra tiv e manner. In the f i r s t step the charges on each group are calculated using the experimentally measured pK^'s and equation 2.5. values are then calculated using equation 2.6. These p K ^ values then become the new pKa 's for the next step. This cycle is repeated until se lf consistence is attained. The in trinsic solvation free energy can then be estimated using equation 2.4. In these calculations (which were for demonstration purposes only) we chose to assign the pK ,'s to the four acids as a follows: 3.1, 3.4, 3.7, and 3.8 for Asp 3, Asp 50, Glu 7, and Glu 49 respectively. These results are given in Table 2.2 and Figure 2.7. Figure 2.8 shows results obtained using three models: (1) a model which contains only the protein permanent dipoles, ( 2) a model i i which contains a ll protein contributions but neglects the contributions from the surrounding solvent and (3) the PDLD model which contains a ll three of these energy contributions. The key point that emerges from Figures 2.7 and 2.8 is that a ll energy contributions should be included in the calculations in order to obtain reasonable results. For example, i f the protein dipoles were included only, then i one obtains a spread of about 22 kcal/mol or 17 pK units between the | a j in trin s ic pK 's of the four acids. Also note that Asp 3 and Glu 7, E a I | which have low values for AAGq^ received th eir re la tiv e compensation j from difference sources: the surrounding solvent and the induced l ! 77 dipoles respectively. I t is clear from this data that a complete model, including induced dipoles and the surrounding solvent molecules is necessary to compensate for the differences in the contributions of the protein permanent dipoles. The model in its present form s t i l l results in errors that might seem very large in terms of pKa units. Instead of obtaining AAG^0- | = -70 ± 1 we find that AAG^-j for Glu 7 and Asp 50 underestimate the protein stabilizatio n and that AAG^-j for Glu 49 is overestimated. The deviation between the calculated and observed results might re fle c t the fact that the protein configuration was kept fixed in the calculations. One should clearly explore the effect of energy minimization as well as averaging over d ifferen t protein configurations. 2.3 S ta b ility and Convergence of the Calculations The previous section presented our calculated electrostatic energies at face value. In examining the relevance of such calculations, i t is important to establish th eir levels of convergence and s ta b ility . That is, in the same way that experimental measurements can give differen t results for differen t measurements, calculations of many dimensional systems give different results for d ifferen t in it ia l conditions (and for d ifferen t sets of parameters). For example, molecular dynamics simulations of ions in aqueous | solution can give electrostatic energies with convergence errors of * j 10-15 kcal/mol after a very long computer time (see discussion in 78 Warshel and Russell, 1984). A longer computer time is required for convergence of the simulation of electrostatic energies of solvated proteins. Thus, i t is important to estimate the error range associated with the actual implementation of a given theoretical model. This is done below by examining the convergence of the calculations and the sen s itiv ity to the assumed protein residual charges. Figure 2.9 describes the convergence of our model. The calculations presented were obtained by generating several solvent configurations by randomly offsetting the origin of the grid of the Langevin dipoles. The relevant energies are taken as the Boltzmann average of the energies obtained with the differen t grids. The calculations converge after about 15 iterations which requires about 3 minutes on an IBM 3081 computer. Figure 2.10 presents the results of our model using the protein residual charges which are reduced by 20% from our regular charge set given in Table 1.4. Comparing Figure 2.10 to Figure 2.7 provides a simple way to assess the se n sitivity of the calculations to the i protein residual charges. As seen from the figures, the changes in AAG 50I are sma^ er than ^ e current difference between the calculated and observed energies. Note that the differences between the calculated AAGq^ values are compensated by the other contributions. A | larger compensation would have been obtained i f the protein | coordinates were minimized separately with the two sets of parameters. I This point is of significant importance since calculations that do not ! include the water molecules and the protein induced dipoles may give Figure 2.9. Convergence of the Langevin dipoles model in calculations of pKa 's in BPTI. The figure describes the fluctuation in the contribution of the Langevin dipoles to the total solvation energy, for d ifferen t randomly selected origins of the Langevin grid. The figure gives &Sqw for both the charged and uncharged forms of the indicated acids. The fin a l contributions to AAG^q- j are evaluated by taking the Boltzmann average of the AG qw (of the charged and uncharged forms) over the differen t iteration s. The calculations were done using the o . Ite ra tiv e Langevin model with a grid spacing of 3A. 80 Asp 3 -20 -40 -60 -80 R u n N u m b e r Glu 7 -20 o E g -40 j L o o -60 R u n N u m b e r Glu 4 9 -20 o -40 a < j J C L O — o - 60 < 3 R u n N u m b e r A s p 50 -20 o 6 a u J C -40 < -60 R u n N u m b e r Figure 2.10. S ensitivity of AAG^q- j to the value of the protein residual point charges. As a test of the sen sitivity of A A G j? ol to the parameterized residual point charges we repeat the calculations of f igure 2.7with a charge set scaled by 20% over those used in that figure. The results indicate that the increased value of (the effect of the protein permanent dipoles) is largely compensated for by changes in the protein induced dipoles and surrounding solvent. 82 E N E R G Y (kcal/mol) 5 AAG„._) S(AAG 3 7 49 50 3 7 49 50 ACID NUMBER L . 83 the impression that only the best parameter set could give the correct energies. Our experience is that only converging models that include the main physical contributions can give reasonable energies, and that upon convergence the energies with differen t sets of parameters are q u a lita tiv e ly sim ilar. I t is interesting to note that the present calculations in protein took the parameters C' and yQ of the Langeviri dipoles from studies of ions in solutions without readjusting the parameters. Such adjustment w ill be ju s tifie d only after studies (which are now in progress) provide an optimal parameter set for the protein residual charges (permanent dipoles). Another point of interest is the s ta b ility of the calculations to the boundary conditions used. The present calculations change by less than 0.5 kcal/mol upon changing the radius of the Langevin grid from O ' O 12 A to 15 A (with the proper change of the radius of the bulk region). In examining the s ta b ility of the calculations one should consider th e ir se n s itiv ity to small changes in the protein coodinates. I A systematic study involving the generation of d ifferen t coordinate sets by molecular dynamics was begun. Preliminary results indicate that geometry changes with an average total root mean square deviation O of 0.2 A from the original X-ray structure resulted in changes of up ' P to 2 kcal/mol in the calculated MGsol • { In considering the effect of geometry changes i t is important to realize that the protein equilibrium geometry changes upon ionization of its acidic groups. For example, the solvation energy of a given ionized group decreases by about 10 kcal/mol when one uses the protein coordinates which were measured while considering this group to be unionized. I f one uses a procedure that takes the protein X-ray structure without energy minimization (as was done in this work) i t is important to take into account the ionization state in which the X-ray structure was obtained. The energy change associated with the relaxation of the protein upon ionization of its acidic groups can be estimated from the corresponding AGq^. More studies are needed, however, to determine the proper correlation factor. 2.4 Calculations of Charge-Charqe Interactions in Proteins The calculation of the effective interaction between charged groups in proteins has attracted significant attention in recent years (see for example March et afL» 1982). Part of this interest stems from the fact that the effect of these interactions can be estimated d irec tly from the results of simple titra tio n experiments (measuring the change in the pK of one group as a result of the ionization of a another). Unfortunately, studies of the seemingly simple charge-charge interaction without analysis of the corresponding self energies of the charges can lead to serious inconsistencies (see Warshel and Russell, 1984; Warshel et al_., 1984). This point w ill be demonstrated in the next section for the specific case of BPTI. j BPTI offers an ideal opportunity to study the influence of | j charge-charge interactions in proteins by examining the pKa shifts i \ j i resulting from the ionization of neighboring acid groups. The four acids of BPTI can be divided into two groups (see Figures 2.3 and 2.5): Asp 3 - Glu 7, and Glu 49 - Asp 50. The carboxyl carbon atoms O of each pair are separated by 9.1 and 7.6 A respectively. The charge-charge interaction can result in the addition of a large energy contribution to the total solvation energy AAG^-j of the system. Despite th is , surface groups (which constitute a majority of charged groups in proteins) maintain pK, values very close to those a found in water (pK^ = pK^ s p«. . ) . An interesting question is: How d a l n X does the protein compensate for the addition of this large charge-charage interaction in order to maintain these constant values. W e can use the PDLD model to look at this question by calculating the pK shifts that result from the previous ionization of a nearby a acid group. W e w ill consider four differen t situations: (1) the ionization of Asp 3 in the presence of ionized Glu 7, (2) the ionization of Glu 7 in the presence of ionized Asp 3, (3) the ionization of Glu 49 in the presence of ionized Asp 50, and fin a lly (4) the ionization of Asp 50 in the presence of ionized Glu 49. These however, are not the only possible interactions available to these acid groups. March and co-workers (March et a K , 1982) point out the j po ssib ility of interactions between Arg 53 and Asp 50, and Arg 20 and J Glu 7. For the purposes of these calculations we w ill consider only the charge-charge interactions outlined in the four cases above. ' The total solvation energy of a protein-solvent system with an | ionized group included within the surrounding protein can be written as, AAGP , = AGnn + AAGn + AAGn + AAGn 2.7 sol Q Q Qu Qa Q w where Gqq is equal to the simple Coulombic interaction energy between the charged groups. AGqq is equal to the change in Gqq upon ionization of the group question ( GQQ(group 1) = GQQ(both groups ionized) - GQQ(group 1 unionized; group 2 ionized)). Remember, that in our calculations we are dealing with AA quantities: the f ir s t A represents the change in energy when going from the gas phase to the protein, the second A represents the change in energy which results from a change in the value of the central (region I) charges. Since Gqq includes interactions between specific protein groups and not the protein and solvent in general, its contribution to the total solvation energy would properly be written as AGqq. Figure 2.11 and Table 2.3 summarize the results obtained for the four cases described above. Observed values for the solvation energies were estimated using the relation: “ G solb S s A A G soint + A G qq/e<R) 2-8 using the in trin sic free energies estimated in Table 2.2 by the approach of equations 2 .4 -2 .6 . A value of 40 was used for the i j | d ie le c tric constant of the protein (see next section). I Table 2.3 shows that when a nearest neighbor charge is ionized, j values of -69.1, - 66. 6, -67.5 and -73.8 kcal/mol are obtained fo r Asp Table 2.3. Calculated energy contributions to the ionization of the four acidic groups in BPTI, when each acids nearest neighbor acid group is ionized. A G qq AA G n Qy AAG Qa AAG Q w. iAGsol (AAG^°bs Asp 3(G lu7(-)) 32.6 - 11.8 -4.4 -85.5 -69.1 (-70.5) Glu 7(Asp 3 ( - ) ) 31.6 -18.9 -16.6 -52.7 ^66.6 (-69.8) Glu 49(Asp50(-)) 41.5 -31.1 - 1.6 -76.3 -67.5 (-69.3) Asp 50(Glu49(-)) 41.0 -36.8 -5.0 -73.0 -73.8 (-69.3) The calculations were done using.the PDLD model in a manner sim ilar to that described in the caption of Table 2.2. The total change in solvation energies, AAG^-j. are given in kcal/mol. The corresponding observed values are given in parenthesis and estimated by AAGP’ ° bs S AAG^int + AGQ Q /e(R) using the AAGP’ ] nt of Table 2.2 and e(R) of 40. This data is also summarized in Figure 2.11 and details of the calculations are given in the te xt. Nearest neighbor acid groups are indicated in parenthesis. Figure 2.11. Calculated contributions to AASsol '^qq + A A G q B + AAG n ) of the indicated acids in BPTI including the Q w 3 effect of a nearby ionized group (Asp 3“ - Glu 7”; Glu 49” - Asp 50” ). The observed values of are obtained by the equation M G ^ nt + AGgg/e(R) using the estimate ofAAG^^n^ from Table 2.2. and e(R) of 40. The calculations were done using the same PDLD approach and the notation is the same as in figure 2.7. AGqq was calculated using Coulomb's law with a diele c tric constant of 1 since the d ie le c tric is included e x p lic itly in the model. The slight asymmetry in AGgq for neighboring groups reflects the fact that the unionized acids were represented by residual charges (thus for example the interaction Asp 3“ — Glu 7° is not equal to Asp 3° - Glu 7” . 89 Charge- Charge © • • ■ © Permanent Dipoles Induced Dipoles Wafer+ Bulk Total (Langevin) Solvation 0 0 0 - 8 0 -60 -40 -20 0 20 40 6 0 AG Q Q AAG Q p MG Q a MG Q w M G P sol 3 7 49 50 3 7 49 50 3 7 49 50 3 7 49 50 3 7 49 50 Acid Number 90 3, Glu 7, Glu 49 and Asp 50 respectively. I f we useMGS0-| = -70 kcal/mol as our estimate in water then our results can be considered as quite close to those expected. Our average value for K o i 15 found to be -69 ± 3 kcal/mol, corresponding to a 4% separation in data. As in the case of in trin sic pK s discussed e a rlie r, i t is a clear that a ll components of the microscopic d ielec tric are necessary to compensate for the large destabilization created by the presence of the neighboring ionized acid groups. This is most clear for the case of the Asp 3 - Glu 7 pair where Asp 3 receives its relative compensation from the surrounding water while Glu 7 receives its re la tiv e compensation from the induced dipoles of the protein (along with a small contribution from the protein permanent dipoles). The large cancellation of the charge-charge interaction is the direct manifestation of a large effective d ie le c tric constant for the medium surrounding the two charges. Figure 2.12 shows how large charge-charge interactions are translated into small pKa changes as a result of the compensating effects of the protein d ie le c tric . These results are a comparison of calculated AAG^0- | values summarized in Tables 2.2 and 2.3. 2.5 Macroscopic Calculations of pKVs in Proteins Our discussion of pK, calculations would not be complete without a comparing our method with that used most commonly today: macroscopic calculations which treat the protein as a low d iele c tric sphere submerged in a high d iele c tric environment. The formulation of this Figure 2.12. The effect of charge-charge interactions of the pKa 's of acidic groups in BPTI. This figure demonstrates how large charge-charge interactions between neighboring acids in BPTI are translated into small changes in th e ir actual pK values. This would be the result a expected i f the charge-charge energy were calculated using a Coulomb's law type formula with a high d iele c tric constant. The observed free energy associated with the pKa change of each acid (estimated by AGqq/e(R) using i(R) = 40 are indicated by horizontal lines. The calculated changes are obtained from the difference between the AAG^-j of Figures 2.7 and 2. 11. 92 4 0 30 20 10 1.33IA p K < Lli 3 7 49 50 3 7 49 50 ACID NUMBER 93 approach has been an evolutionary process which started with the pioneering work of Kirkwood in the 1930's (Kirkwood, 1934). Kirkwood derived a formula to calculate the interaction between charges in a medium consisting of a sphere of one d ielec tric constant surrounded by an environment of another d iele c tric constant. The f ir s t attempt to use this model to assess the factors responsible for titra tio n curves in proteins was by Tanford and Kirkwood (Tanford & Kirkwood, 1957; Tanford, 1957). Their approach, refered to as the Tanford-Kirkwood (TK) model treated the protein as a sphere of d iele c tric constant 2, surrounded by solvent (water) of d ie lec tric constant 80. In these calculations they placed charges at various levels beneath, and at the surface of model proteins and found that only when the charges were about 1 & beneath the surface did they obtain ! titra tio n curves sim ilar to experiment. These results, which were j quite remarkable considering the current knowledge of protein I structures of that time, had one drawback: the in trin s ic pK , the cl quantity which most reflects the d iele c tric influence of the protein environment, was.treated as a parameter; indeed, th e ir primary interest was in reproducing slopes of titra tio n curves rather than the | values of the pKg's themselves. i | Gurd and co-workers modified the TK model to allow for a change j j of the protein d iele c tric effect for regions of high solvent | i accessibility. This approach, refered to as the modified ! iTanford-Kirkwood (MTK) model s t i l l treated the self energy term (the i i i 94 in trin s ic pK ) as a parameter. c i For completeness a synopsis of the derivation of Kirkwood's formula is given in the appendix. For sim plicity, our discussion w ill not concern its e lf with the fu ll Kirkwood equation in its most general form (as given in equation A.10) but with the lim itin g case where the d ie le c tric constant of the protein is much less than that of the surrounding solvent (as assumed in the two models described above). I f one solves the Kirkwood equation for the case of a single charge in the lim it that ep/ew -*o you obtain the following result: AGTK = -166 Q 2/ b e [ l - ( l - ( r / b ) 2)] 2.9 where Q is the magnitude of the charge in electronic charge units, b is the radius of the protein, e is the d ie le c tric constant over the r protein, e w is the d ielec tric constant of water, and r is the displacement of the charge from the center of the sphere. The energy is given in kcal/mol. Without the self-energy included this equation indicates that charges w ill be more stable in non-polar solutions rather than polar. By incorporating the se lf energy in equation 2.9 however, reasonable results are obtained from the TK approach. Equation 2.9 can be rewritten as follows: AGsol s::AGsol " 166 q2 (1 /® + ^ ( ^ p ' 1/e w) " AgTK 2- 10 95 p=r 2______________________________ Figure 2.13. The free energy, aG - , of moving a charge of a radius a from water to a sphere of low d iele c tric constant, TK e. , of a radius b. aG^, is the free energy obtained by the Kirkwood’ s model which does not include the self energy of the charge. 96 I r\) Q c n c r > < £ > Sol.K FREE ENERGY (kcal/m ol) ro o a s a> cn where a" is the radius of the charge. Figure 2.13 shows a graph of the results obtained for these two equations. The point of the figure is to show that for a non-polar protein regardless of which model you use the self energy of the charge is quite sensitive to its location in the protein. C learly, models which treat the proteins as non-polar w ill only give good results for cases where the charges are at or near the surface of the protein where both high and low d ie lec tric models give sim ilar results. For charges in the in terio r of the protein, P / \ sol can chan9e up t0 3° kcal/mol (a change of about 25 pKg units) when moving the charge from water to the center of the protein. The standard M TK approach as summarized above suffers in that i t does not use a ll the available information about the protein structure in its calculations. Despite the fact that we can determine the depth ' of charges in proteins i t continues to use the constant depth as determined by Tanford (Tanford, 1957). The only connection between i the M TK calculations and the protein structure is the distances separating the charges. As an improvement to the M TK approach we can modify the PDLD model to simulate the effect of the M TK approach by turning off the permanent dipoles of the protein. This method which 1 | can be thought of as a discrete version of the continuum approach, can ! tre a t a protein with an actual non-spherical geometry (where the i j ionized groups are at th eir actual protein sites) as a hypothetical | non-polar protein surrounded by water. The corresponding solvation 1 energies for moving a hypothetical ionized group from the surface of the protein (with no permanent dipoles) to its center is described in Figure 2.14. As seen from the figure the stab ilizatio n of ionized acids in hypothetical non-polar proteins is much less than in water. This prediction would have been correct i f proteins could be represented by spheres of d iele c tric 2. However, the actual microenvironment around ionized groups in proteins is in fact very polar and such groups are stable rather than unstable in th eir special sites in the in terio r of proteins. I t is important to mention in this respect, that Kirkwood and Tanford did realize that ionized groups would be extremely unstable in the interiors of proteins (Tanford and Kirkwood, 1957). However, since experimental findings pointed put that the pl^ is sim ilar to the corresponding pK'f they concluded that charges in proteins must be at a a short (constant) distance from the protein surface. The above discussion might s t i l l be considered as an exercise since i t was not applied to the actual positions of the relevant acids. Here we feel that the present study does provide an opportunity to examine what could be the error associated with a consistent model where the protein is represented as a low d ielec tric medium and evaluate the energetics of charged surface groups at th eir actual positions. The examination of this point for an isolated group can be done by evaluating the calculated values of A4Gpw and AAG^ for BPTI where a ll the protein residual charges (permanent dipoles) are set to zero. The results of the calculations are given in Table 2.4. The corresponding results (which provide a microscopic estimate of the j I energy of the charges in the hypothetical nonpolar protein) are -81, 99 Figure 2.14. Energetics of ionized groups in nonpolar proteins. As a test case we calculated the energy of transferring O a 3 A radius charge (e.g. an acid group) from the surface of BPTI to its center assuming, incorrectly, that the protein contains no permanent dipoles. I t is demonstrated that a charge inside a nonpolar protein is destabilized by about 25 kcal/mol re la tiv e to water. This resu lt, which is consistent with electrostatic theory, cannot be obtained by models that neglect the self-energy of the charge group. 100 AGgo, (kcal/m ol) -5 0 -6 0 -7 0 -8 0 distance from center (&) Table 2.4. Calculated energy contributions to the ionization of the four acidic groups in BPTI when the permanent dipoles of the protein are excluded. These calculations .simulate the effect of a non-polar protein on the pK values of the four acids. a AAGn AAG„.. AAG? Qa Q w sol Asp 3 -5.6 -75.6 -81.2 Glu 7 -20.4 -44.8 -65.2 Glu 49 -15.3 -68.9 -84.2 Asp 50 -16.6 - 68.1 -84.7 j The calculations were done using the PDLD model with the permanent J dipoles of the protein (region I I ) set to zero. The total solvation j | energy changes, AAG^-j, are given in kcal/mol. 102 f i -65, -84 and -84 kcal/mol for Asp 3, Glu 7, Glu 49 and Asp 50, p respectively. These values are more negative than the Gso- | o f Figure 2.7. This overestimation is p a rtia lly due to the absence of the interactions between the Langevin dipoles and the permanent dipoles on the surface of the protein; when these interactions are included (e.g. Figure 2.7) they stab ilize the reference state of the uncharged acid. Leaving for future studies the exact reason of this overstabilization of charges near hydrophobic surfaces, we can recalibrate the above results and obtain a useful estimate of the trend in the relevant energies. A reasonable recalibration is P . . obtained by scaling a ll the calculated aaG sq- | by 0.86. This gives -70, -56, -72 and -72 kcal/mol for Asp 3, Glu 7, Glu 40 and Asp 50, respectively. These results indicate that the loss of the s ta iliz a tio n by the protein permanent dipoles is compensated by the reorientation of the surrounding water molecules in a ll cases, except I in the case of Glu 7. Thus a macroscopic model that considers the protein as a nonpolar medium can s t i l l give a reasonable in trin sic dK„ value for qroups which are close to the surface of the protein c l (although a part of the actual stab ilizatio n of the ionized acid by the protein permanent dipoles should be attributed inconsistently to the surrounding water). Problems could occur, however, in cases like that of Glu 7 or (in much more serious ways) in cases of groups which j are ionized in the in te rio r of proteins ( i. e . Asp 102 in j chymotrypsin). In such cases, as indicated in Fig. 2.14, the omission i of the effect of the protein permanent dipoles can lead to a i | destabilization of up to 30 kcal/mol. ! 103 2.6 References Arnett, E.M., Jones, F.M., I I I , Taageppera, M.W.G., Beauchamp, J .L ., Holtz, D. & T a ft, R.W.I. (1972) J. Am. Chem. Soc. 94, 4724-4726. Cantor, C.R. & Schimmel, P.R. (1980) Biophysical Chemistry Part 1: The Conformation of Biological Macromolecules, pp. 49-50. San Francisco: Freeman. Deisenhofer, J. & Steigmemann, W . (1975) Acta Cryst. Sect. B, 31, 238-250. Huber, R., Kukla, D., Ruhlmann, A., Epp, O.S. & Formanek, H. (1970) Naturwissenschaften, 57, 389-392. Kaberle, P. (1970) Ann. Rev. Phys. Chem., 28, 445-476. Kirkwood, J.G. (1934) J. Chem. Phys. j?, 351-361. March, K .L., Maskalick, D.G., England, R.D., Friend, S.H. & Gurd, F.R.N. (1982) Biochemistry, 21, 5241, 5251. Matthew, J.B ., Hanania, G .I.H . & Gurd, F.R.N. (1979) Biochemistry, 18, 1919-1928. Richarz, R., & Wuthrich, K. (1978) Biochemistry, 17, 2263-2269. Russell, 3.T. & Warshel, A. (1985) J. Mol. B io l., in press. Tanford, C. (1957) J. Am. Chem. Soc. 79, 5340-5347. Tanford, C. & Kirkwood, J.G. (1957) J. Am. Chem. Soc. 79, 5333-5339. Warshel, A. (1984) Proc. Natl. Acad. Sci. USA, 81, 444-448. Warshel, A. & Russell, S.T. (1984) Quarterly Rev. Biophysics, 17, 283-422. 104 Warshel, A., Russell, S.T. & Churg, A.K. (1984) Proc. Nat. Acad. Sci. U.S.A., 81, 4785-4789. 105 CHAPTER 3 CALCULATIONS OF THE RATES O F ENZYMATIC REACTIONS: ESTIMATION O F THE ELECTROSTATIC CONTRIBUTION TO CATALYSIS B Y THE SERINE PROTEASES 3.1 Introduction In this chapter preliminary results of calculations of the rate of peptide hydrolysis by the serine proteases are reported. In the previous chapter we demonstrated that a carefully calibrated model could be used to study the energetics of electrostatic interactions in proteins. Here we take the same model without modification and apply i t to the problem of enzyme catalysis. The serine proteases are a general class of enzymes whose primary role is to catalyze the hydrolysis of peptide bonds. They also show significant a c tiv ity for the hydrolysis of amides and esters in the laboratory. Two important members of this class are Trypsin and Chymotrypsin which w ill be the subjects of these calculations. The mode of operation of the serine proteases is one of the most extensively studied problems in enzyme catalysis both th eo retically (for a review see Naray-Szabo & Bleha, 1982) and experimentally (covered thoroughly in the text by Fersht, 1977). The detailed structure of these enzymes have been explored to an extent exceeding that of any other class of enzymes. Despite th is , there is s t i l l much debate over the actual means by which these enzymes catalyze th eir reactions. I t is well established that the serine proteases possess sim ilar active centers; this s im ilia rity exist despite gross differences in the remainder of th e ir three dimensional structures. The invarient active center consist of the catalytic triad : Aspartic acid 102, Histidine 57, and Serine 195 (this numbering system is used for both enzymes; these residue numbers actually correspond to the locations of the residues in the amino acid sequence of Chymotrypsin). Asp 102 is unique in that i t is buried in the interior of the protein: a place where polar groups are normally not found. A diagram of the catalytic tria d is shown in figure 3.1 with some of the important protein dipolar groups indicated. The reaction of the serine proteases is classified as a general base catalysed nucleophilic attack on the carbonyl carbon of the substrate by the hydroxyl oxygen of Ser 195. The reaction can be written as follows: 0 0 J l + - I I Im + R'-OH + R-C-X ImH + R-0 + R-C-X - > ■ 0 “ 0 + I 1 1 Im-H + R'-O-C-R + Im + R'-O-C-R + H X + I ° Im + R'OH + R-t-O.H - H20 j ' | where X is an NHR" group in the peptide hydrolysis reaction; Im is the imidazole of His 57 and R'OH is the hydroxyl group of Ser 195. For 107 Figure 3.1. The catalytic tria d in the active site of Trypsin. Several important peptide and side chain dipoles are also indicated. The peptide dipoles of Ser 195 and Gly 193 make up the so-called oxyanion hole. many years i t was thought that the enzyme functioned through an alleged "charge relay" mechanism (Blow et a U , 1969). Under this scheme two proton transfers would occur: a proton transfer would take place between the hydroxyl group of Ser 195 and His 57, and a proton transfer would take place between His 57 and the buried Asp 102. This double transfer has been studied as both a concerted as well as a step-wise process. The result of this process would be an increase in the nucleophilicity of the serine oxygen. Recent experimental studies (Bachovchin & Roberts, 1978; Markley & Ibanez, 1978; Kossiakoff & Spencer, 1980) and theoretical studies (Kollmann & Hayse, 1981; Warshel et a^. > 1982; work quoted in Naray-Szabo & Bleha, 1982) have combined to disprove this hypothesis. Currently two mechanisms are proposed. In the f ir s t i t is thought that the charged Asp 102 stabilizes the ion pair formed by the transfer of a proton from Ser 195 to His 57 increasing the nucleophilicity of the serine. The second hypothesis states that the charged tetrahedral intermediate of the serine-peptide linkage is stabilized by specific dipole interactions in the oxyanion hole of the protein (Warshel et a l ., 1982; see figure 3 .1 ). Both mechanisms emphasize the importance of the protein permanent dipoles: in the f ir s t mechanism the buried Asp 102 is stabilized by several specific hydrogen bonds surrounding i t , i and in the second the peptide and side chain dipoles of the protein i l I provide the necessary stab ilizatio n for the charged transition state i j d ire c tly . I The reaction process can be divided into three basic steps. A comparison of these steps is shown in Figure 3.2 for the case of a chemical reaction in protein and in water. The f ir s t step involves the formation of a complex between the reacting species. In solution this corresponds to the diffusion of the reactants into the same solvent cage. For proteins this step is represented by the formation of the enzyme-substrate complex (Michaelis complex). The second step involves the formation of the transition state (the acylation step). Using transition state theory we can write the rate of this step as, kcat - <kbT/,Wexp(-AG*at/RT) 3.2 where and h are the Boltzmann constant and Planck's constant respectively. At normal physiological temperatures k^T/h can be 13 approximated by 10 . J I The fin a l step (which is not shown in Figure 3.2) is the deacylation of the enzyme-substrate complex. I t is known that through a careful choice of substrates that either the acylation or the i I deacylation step can be rate lim itin g. However, for peptides and i |amides the acylation step (the formation of the tetrahedral ! |intermediate) is known to control the reaction rate. I j In this study we w ill look closely at the energetics of the |formation of the tetrahedral intermediate. W e w ill begin by checking » ■ the v a lid ity of the calculations by reproducing the pK_ of the ii imidazole ring of His 57 in Trypsin. W e w ill then look at the two j ! i i Jsteps leading to the formation of the tetrahedral intermediate: the Figure 3.2. Comparison of the energetics of enzymatic and solution reactions (Warshel, 1981). E, S, and P are respectively the enzyme, the substrate, and the product. C in the solution reaction is the catalyst (e.g. a base in general base catalysis). ES is the enzyme substrate complex. (SC), (SC*) and (PC) are the indicated components inside a solvent cage. The proteins catalytic contribution is indicated by AA3*, the difference in the activation energy of the reaction in protein and the corresponding reaction in water. I l l ENZYM E SOLUTION (SC*) c a g e c a t m c a g e E+S EP (SC) :r; (PC) ' b i n d 1 ro proton transfer from Ser 195 to His 57, and the subsequent formation of the tetrahedral intermediate. 3.2 Computational Details In the following sections we w ill report the results of calculations performed on the serine proteases Trypsin and Chymotrypsin. In the Trypsin calculations the X-ray crystal structure of the Trypsin-BPTI complex was used (Ruhlmann et al_., 1973 refined to O 1.9 A resolution by Huber et a/h, 1974). The inhibitor was removed and the substrate was simulated by the carbonyl group of the peptide bond connecting lysine 15 and alanine 16 in BPTI to the active site of O Trypsin. This model has a somewhat long 2.6 A distance between the oxygen of Ser 195 and the carbonyl carbon of the substrate. Future studies w ill use energy refinement techniques to create a more re a lis tic substrate model. Calculations on Chymotrypsin used the crystal structure of tosylated a -Chymotrypsin (Matthews et j§J[., 1967; later refined by B irktoft & Blow, 1972). The substrate was simulated by placing a carbonyl group at the location of the S-0 connecting the tosyl group to Ser 195. The ca talytic triad plus the model substrate was placed in region I . The catalytic triad was modelled using the CO2 group of Asp 102, the imidazole ring of His 57 and the CH^ O H group of Ser 195. The remaining portion of these residues were included with the rest of the j protein. Region I I consisted of a ll protein dipolar groups (see { O | section 1.2) which contained at least one atom within 7 A of any region I charge. This resulted in 302 and 408 region I I atoms for Trypsin and Chymotrypsin respectively. The protein that remained was simulated by either Lnagevin dipoles or as bulk solvent. Charges and p o la riza b ilitie s were added to each region I I atom from the parameter l is t given in Table 1.4. The charges used for the region I atoms are given in Table 3.1. The charges on the imidazole group of His 57 were determined quantum mechanically by calculating the potential from the permanent dipoles of the protein on each atom of the imidazole ring and adding these values to the appropriate diagonal matrix elements of the Fock Hamiltonian (Warshel, 1976). New charges were then calculated using the QCFF/PI approach of Warshel (Warshel & Lappicirella, 1981). This process was performed for both the charged and uncharged forms of imidazole. The contribution from the surrounding water was calculated in the same way as i t was for BPTI using the parameters of Table 1.4 and a 12x12x12 A grid with a 3 O A spacing. The energy of the reaction pathway was simulated by calculating energy differences between various states (resonance forms in the formalism of the empirical valence bond approach of Warshel (Warshel & Weiss, 1980; Warshel, 1981) of the active site groups. These states include the following: C02" H-Im H-0 C=0 3 .3 C02‘ H-Im-H+ 0" C=0 3 .4 C02" H-Im-H+ o -c -o " 3 .5 o o o 1 H-Im-H+ H-0 C=0 3 .6 114 Table 3.1. Region I charges used for the active sites of Trypsin and Chymotrypsin. 3 Ionized Unionized Tetrahedral Intermediate Asp 102 His 57b Ser 195 Substrate i i i c 0.70 0.70 0 -0.85 -0.35 0 -0.85 -0.35 c 0.164 0.035 N1 -0.161 0.013 H 0.187 0.187 C 0.545 0.110 H 0.075 0.077 N 2 -0.161 -0.520 H 0.187 — C 0.096 0.028 H 0.068 0.070 0 - 1.000 -0.427 0.000 H — 0.427 0.000 C 0.392 0.200 0 -0.392 - 1.200 a b Charges given in atomic charge units. The numbering system of the imidazole ring begins with the y-carbon & proceeds in a counterclockwise direction as represented in figure 3.1. N1 is the nitrogen on the in te rio r side of the imidazole (next to Asp 102) and N2 is the nitrogen which receives a proton from Ser 195 in the f ir s t step of the catalytic reactions of Trypsin and Chymotrypsin. The model tetrahedral intermediate is formed by modifying the charges of Ser 195 and the Substrate as shown in this column. 115; The top three states represent the f ir s t two steps in the acylation process: the transfer of a proton from Ser 195 to His 57, and the subsequent formation of the tetrahedral intermediate. The f ir s t and last states are used to calculate the pKa of the imidazole ring of His 57. By calculating the energy of each of these states in protein and in water we can arrive at an reasonable estimate for the change in energy resulting in each step. Warshel (Warshel, 1980) took this procedure one step further treating these states as valence bond resonance forms and including the solvation energy of each in the diagonal matrix elements of the valence bond Hamiltonian. This was used to give reaction surfaces for the catalytic reaction of Lysozyme in solution and in protein. I t is important to note that i t is not sufficient to calculate the electrostatic contributions to catalysis in proteins only. Since catalysis is measured re lativ e to the rate of the corresponding reaction in water, questions of the difference in the energies of the reactions in protein and in water must be addressed. The individual contributions to AAG^ - j for the four states described above are given in Table 3.2 for Trypsin. Also included are the results of calculations conducted on Chymotrypsin for the states I given by equations 3.3 and 3.5. Table 3.3 lis t the results of Table 3.2 with the energies given relative to the state given by equation ! 3.3. The estimation of the corresponding energies in water is f j discussed in the following section. Table 3.2. Electrostatic energy contributions to various states of the active sites of serine proteases. State* 3 agqq A G n Q P A G n Qa AG Q w AGP , sol < 0- Trypsin His(0)-Ser(0)-Sub(0) -25.2 -45.7 -794.6 -55.9 -896.2 -71 Hi s(+)-Ser(-)-Sub(0) -155.0 -42.8 -807.2 -55.1 -905.2 -85 H is (+ )-te t(-) -121.4 -64.2 -811.4 -60.8 -936.3 -106 His(+)-Ser(0)-Sub(0) -105.2 -30.2 -802.6 -41.6 -874.4 -49 Chymotrypsin His(0)-Ser(0)-Sub(0) -24.4 -44.0 -912.5 -50.0 -1006.9 -72 H is (+ )-te t(-) -106.1 -83.1 -928.7 -45.7 -1057.5 -121 j a All energies in kcal/mol. k His, Ser, Sub and Tet represent His 57, Ser 195,the substrate and the tetrahedral intermediate respectively. Ionized Asp 102 is present in a ll states. Table 3.3. Electrostatic energy contributions of Table 3.2 relative to the state H is(0)-S er(0)-S ub(0).a State1 3 AAGn Qv AAG n Qa AAG n Q w AAGP , sol A A G Trypsin His(0)-Ser(0)-Sub(0) 0 0 0 0 0 His(+)-Ser(-)-Sub(0) 2.9 - 12.6 0.8 -9.0 -14 H is (+ )-te t(-) -18.5 -16.8 -4.9 -40.1 -35 His(+)-Ser(0)-Sub(0) +5.5 - 8.0 14.3 21.8 22 Chymotrypsin His(0)-Ser(0)-Sub(0) 0 0 0 0 0 Hi s (+ )-te t(-) -39.1 -16.2 4.3 -50.6 -49 a All energies in kcal/mol. b Ionized Asp 102 is present in a ll states. 3.3 Calculations of Solvation Energies in Solutions In chapter 1 we developed a general method for calculating solvation energies in proteins. In the process we showed how the Langevin dipoles model could be used to determine the solvation energies of simple ions in solution (the FCLD model). In these i j calculations however, we rely on a more sophisticated method known as I ! j the Surface Constrained Soft Sphere Dipoles (SCSSD) model (Warshel, j i 1 ! 1979). This model represents each water molecule as a point dipole | I I i i | attached to a soft sphere. The model functions in the following way: ! j the solute molecule is surrounded by a cluster of solvent molecules j ! (point dipoles). The charges on the atoms of the solute should ! ! correspond to those used in the protein calculations. These solvent ! ; i ■ molecules are then surrounded by a surface of solvent molecules j ! I i I ! constrained to fixed centers which correspond to the structure of the ] | i ; bulk solvent. Energy minimization is then performed on an energy ! surface which consist of dipole-dipole and van der Waals interactions j \ between solvent dipoles, and dipole-charge and van der Waals ■ interactions between solute charges and solvent dipoles. The purpose I of the surface constaint is to eliminate the overpolarization that I occurs on the surface as a result of the neglect of the fie ld from the : bulk solvent. This effect is also seen in the FCLD model. The model j i ! ' is calibrated to reproduce solvation free energies at 300°K. Table • i J s 1.1 and Table 3.4 lis t solvation energies calculated by this 1 approach. Rather than calculating the solvation energy of the entire actives Table 3.4. Calculated solvation energies of the fragments involved in the catalytic reaction of serine proteases. 3 Fragment w ^sol ( lonized) A G ^ol (unionized) ^ s o l H C O O H -80 -10 -70 ch3ch2oh -92 -6 -86 Imidazole -62 -4 -55 C \ C-O-C-O -85 -6 -79 C 3 Energies in kcal/mol. All energies were determined using the SCSSD model. site d irec tly (which due to its size would represent a significant challenge) we take advantage of the fact that the to tal energy that results from bringing two charges together in solution remains constant; that is , the change in charge-charge interaction energy that results from bringing the charges closer together is largely compensated for by an opposite change in the total solvation energy. This compensation can be seen by looking at experimentally measured pKa shifts in zwitterions and dicarboxylic acids of various lengths. a These results are shown in Figures 3.3 and 3.4. With this in mind, we can estimate the solvation, energy of a particular active site state in water by the relation: AAGsol(R=P) = \ AGsol(R=” ) - AG QQ(r=p) 3 - 7 where R=P mean that a ll groups are in th eir protein sites and R=°° means that a ll groups are separated by an in fin ite distance. The subscript i represents each group in the enzyme active site (region I ) . Results using this relation for the region I states given in equations 3 .3 -3 .6 are listed in Tables 3.2 and 3.3. 3.4 pK_'s of Catalytic Groups in Trypsin a Before addressing the important question of catalysis i t is f i r s t necessary to examine the r e lia b ilit y of the calculations by comparing calculated and observed pKa 's of the catalytic groups; large deviations between calculated pKa 's and experimental findings would be i ! a clear warning that catalytic results might be suspect. Figure 3.3. Experimental estimate of the change in free energy AAG(R) that results from bringing two charges from in fin ity to a distance R from each other. AAG(R) values were estimated for several zwitterions and dicarboxylic acidsusing the relatio n , AAG(R) = 1.33 (pK (second ionization) - DK(acetic acid )). The a experimental pK 's are taken from Dawson et a l., a — (1974). The error bars correspond to the uncertainty in the length of the molecules. MG(R)(kca l/mol) o — - 1. 0 - - 2.0 - -3.0 - I— 1 ------- i —fl)— i 20 3.0 4.0 50 6.0 R(&) 7.0 8.0 9.0 co2- o / S / -0 2C 0 -02 </N v //N C 02- C02- -o2 c a -0 2C co2- . + h3n co2- ♦ ^ / ° 2' +h3n +H3N C02- Figure 3.4. Experimental and theoretical estimates for an HCOO" ion pair. The experimental estimates are based on the results of Figure 3.3. and on AGg0-j (<») values given in Table 1.1. The theoretical estimates are based on calculations using the SCSSD model. The figure demonstrates the remarkable compensation of the changes in the charge-charge interaction by changes in the solvation energy. 124 FREE ENERGY (Kcal/mol) I a o •P * o NJ O o o I 00 o I o o I O i ro O ro tn O J O Ln O 70 7 0 to -to to O IO O oo < > The assignment of the pKa 's of Asp 102 and His 57 in serine proteases has long been controversial. I t is known that the active sites of serine proteases possess a group whose pKfl is around 6.7 (see 13 Fersht, 1977 for a discussion of e a rlie r work). C N M R studies on ^ l y t i c protease, a serine protease which has only one histidine residue, indicated that the histidine had a pKg of less than 5 (Hunkapiller et a l ., 1973). They assigned the active site group with 15 a pK of 6.7 to Asp 102. However, later studies using N N M R c l (Backovchin & Roberts, 1978; for a review see Kamamori & Roberts, 1983) and neutron d iffraction (Kossiakoff & Spencer, 1980) have shown that the pK of His 57 about 6.9 and the pKa of Asp 102 is probably ct d less than 4. In these calculations we use the PDLD model tested on BPTI without modification. The solvation energy of the model active site w in water is estimated using equation 3.7 and the values of AG sol listed in Table 3.4. In these calculations we include Asp 102 in its ionized form. This gives a value of -80 -62 -6 -6 +105 = -49 kcal/mol for our estimate of the configuration Asp(-)-His(+)-Ser(O)-Sub(O) and _80 -4 -6 -6 +25 = -71 kcal/mol for our estimate of the configuration A sp(-)-H is(0)-S er(0)-S ub(0). The result is a value of 22 kcal/mol for our estimate of in equation 2.3. In Trypsin we find a value of 21.8 kcal/mol for our estimate of AAGP Q-j. This leads to a solvation energy change of 0 kcal/mol for moving the process of histidine i imidazole ionization from water to the site of His 57 in Trypsin. j I This resu lt, which corresponds to a pKa in Trypsin of approximately 6 , • d \ ! 126 is consistent with the most recent experimental interpretation. 3.5 The Energetics of the Proton Transfer Process The f ir s t step in the mechanism of peptide hydrolysis by the serine proteases is thought to be a proton transfer from the hydroxyl oxygen of Ser 195 to the N2 nitrogen of the imidazole ring of His 57. This reaction has been studied extensively in the gas phase using quantum mechanical methods prim arily in an e ffo rt to study the proposed charge relay system and to assess the role of Asp 102 in the cataly tic triad (Kollmann & Hayes, 1982; Umeyama et a ! ., 1981; see also the review by Naray-Szabo & Bleha, 1982). Calculations using the protein structure have not been possible by these methods. W e can however, couple the use of experimentally measured quantities in water with calculated changes in solvation energies when going from water to protein to arrive at a reasonable estimate of the proton transfer energy in proteins. Using as our reference state the configuration A sp (-)-H is(0)- Ser(0)-Sub(0) we can write the relative energy of the proton transfer step as follows: AGpT S 2.3 RT (pK” (Ser) - pK^ (His)) 3.8 w I f we use the well known value of 6 for the pK of histidine and a W replace the pKg of serine with the value of 16 found fo r ethanol (Dawson et aJL, 1974) we get: AG„T = 1 .3 3 (16-6) = 13 kcal/mol 3.9 The energy of the proton transfer in the protein can then be solved by calculating the differences in the solvation energies of the individual fragments in both protein and in water. By using experimental results from solution we avoid the necessity of performing d iffic u lt quantum mechanical calculations in the gas phase. The r e lia b ilit y of our results depend prim arily on our a b ility to approximate solvation energies in different environments. W e have performed the calculation described above for the proton transfer process in Trypsin. The relevant states for the process are the Asp(-)-His(0)-Ser(0)-Sub(0) and A sp(-)-H is(+)-Ser(-)-Sub(0) forms given in Tables 3.2 and 3.3. Note that these calculations are performed in the presence of ionized Asp 102. From these results j AGpT is found to be !3 + (-9 -(-1 4 )) = 18 kcal/mol. This value is 5 kcal/mol higher than in water indicating that the protein does not catalyze this step (even i f the error range of about 5 kcal/mol is taken into consideration the best the protein can do is to produce the same value for this step as found in water). 3.6 The S tabilization of the Tetrahedral Intermediate The fin a l step in the formation of the acyl enzyme passes through | a tetrahedral intermediate. As shown in Figure 3.1, the negatively I charged oxygen of the tetrahedral intermediate is stabilized by 3.10 128 several key dipoles in the oxyanion hole of the enzyme active s ite . In this section we w ill try to assess the role the enzyme plays in stabilizing this charged intermediate. For peptides and amides this represents the rate lim iting step for the hydrolysis process. Studies on synthetic model systems containing the catalytic tria d have shown that the protein lowers the activation barrier by an upper lim it of 8 kcal/mol (Rogers & Bruice, 1974; Zerner et aTi, 1964; Epand & Wilson, 1963). w AAG sqi for the model Trypsin systems in water can be estimated using equation 3.7 and the solvation energies listed in Table 3.2. This gives a value of -35 kcal/mol. W hen compared to aAAG^0- | value of -40 kcal/mol a reduction of 5 kcal/mol in the energy of the tetrahedral intermediate is found. This is in good agreement with the experimental results described above. W hen these calculations are repeated for Chymotrypsin values of D W -51 and -49 kcal/mol are obtained for A A G ;! - j and A^ so- | respectively. This results in a 2 kcal/mol reduction in the energy of the tetrahedral intermediate. I t should be noted however, that the pK c l value of His 57 in Chymotrypsin has not as yet been reproduced correctly. For this reason the results of the Trypsin calculations should be considered more re lia b le . I t is interesting to note that despite large differences in Gqg for the two enzymes (the Ggg values for the two tetrahedral intermediate states d iffe r by 15 kcal/mol) that this difference is almost completely compensated for by changes in the interactions with I j ! 129 the protein and the surrounding solvent. Chymotrypsin receives 20 kcal/mol additional stabilizatio n from the protein permanent dipoles while losing 8 from its interaction with the surrounding water. This i j difference in active site energy has been found in studies of 8 serine ! proteases using quantum mechanical approaches (Angyan & Naray-Szabo, 1983). I t appears certain that the protein permanent and induced dipoles, and the surrounding solvent must be included to resolve the differences found amongst the various enzymes in this class. 130 3.7 References Angyan, J. & Naray-Szabo, G. (1983) J. Theor. Biol. 103, 349-356. Bachovchin, W.W. & Roberts, J.D. (1978) J. Am . Chem. Soc., 100, 8041-8047. Blow, D.M., B irk to ft, J.J. & Hartley, B.S. (1969) Nature, 221, 337-340. Dawson, R.M.C., E llio t, D.C., E llio t, W.H. & Jones, K.M. (eds) (1974) Data fo r Biochemical Research, 2nd ed. Oxford: Clarendon. Epand, R.M. & Wilson, I.B . (1965) J. Biol. Chem., 240, 1104-1107. Fersht, A. (1977) Enzyme Structure and Mechanism. Reading: Freeman. Huber, R., Kukla, D., Bode, W., Schwager, P., Bartles, K., Deisenhofer, J. & Steigemann, W . (1974) J. Mol. Biol. 89, 73-101. Hunkapiller, M.W., Smallcombe, S.H., Whitaker, D.R. & Richards, J.H. (1973) Biochemistry, 12, 4732-4743. Kamamori, K. & Roberts, J.D. (1983) Acc. Chem. Res. 35-37. Kollman, P.A. & Hayes, D.M. (1981) J. Am . Chem. Soc., 103, 2955-2961. Kossiakoff, A.A. & Spencer, S.A. (1981) Biochemistry, 20, 6462-6474. Markley, J.L. & Ibanez, I.B . (1978) Biochemistry, 17, 4627-4640. Naray-Szabo, G.& Bleha, T. (1982) Progress in Theoretical Organic 131 Chemistry, Vol. 3 (ed. T.G. Csizmadia), pp. 267-336. Amsterdam: Elsevier. Rogers, G.A. & Bruice, T.C. (1974) J. Am. Chem. Soc. 96, 2473-2480. Rulmann, A., Kukla, D., Schwager, P., Bartels, K. & Huber, R. (1973) J. Mol. Biol. 77, 417-436. Umeyama, H., Nakagawa, S. & Kudo, T. (1981) J. Mol. Biol. 150, 409-421. Warshel, A. (1979) J. Phys. Chem. 83, 1640-1652. Warshel, A. (1981) Biochemistry, 20, 3167-3177. Warshel, A. & L a p ic ire lla , A. (1981) J. Am. Chem. Soc. 103, 4664-4673. Warshel, A. & L e v itt, M. (1976) J. Molec. Biol. 103, 227-249. Warshel, A., Russell, S.T. & Weiss, R.M. (1982) Biomimatic Chemistry and Transition State Analogs (eds. B.S. Green, V. Ashari, and D. Chipman), pp. 267-273. Amsterdam: Elsevier. Warshel, A. & Weiss, R.M. (1980) J. Am. Chem. Soc. 103, 446-451. Zerner, B., Bond, R.P.M. & Bender, M.L. (1964) J. Am. Chem. Soc. 86, 3674-3679. APPENDIX MACROSCOPIC TREATMENT O F A N ARBITRARY CHARG E DISTRIBUTION IN A SPHERICAL CAVITY This Appendix gives a b rief review of Kirkwood's treatment of the energy of an arbitrary charge distribution inside a spherical cavity (Kirkwood, 1934). The derivation is based on solving the Laplace 7 equation (V 4=0), for the potential of the charges and the "reaction potential" from the surrounding solvent. The general solution of the Laplace equation in polar coordinates is o o n $ ( r , 6,< f> ) = e" 1 I I (E / r n+1+B r n)pJJ(cos 0)eim ^ A1 n=0 m=-n nm nm n where e is the d ielec tric constant at the given position (r , e ,^ ) and P™(cos 8 ) are the associated Legandre polynomials. The f i r s t term represents the potential from the charge distribution inside the cavity. I f this distribution is described in terms of N charges,Qp ^2 *** ^N P°si t i ° n§) r p r 2, *** r^ respectively, we obtain N g °° n I ^ - I I (Enm / - n+1)P:tcos e)eim * A2 k=l ~ ~k n=0 m=-n where Enm is obtained by comparing equation (A2) with the multipolar 133 expansion of Q k/ k - r | J • This gives (n-i m|)! N Enm = E Qkr"P™(cos ek) e -lm i|)k A3 nm (n+|m|) i k=1 The Bnm term in equation (Al) represents the reaction potential from the surrounding d ie le c tric . Inside the cavity, this potential is given by o o n=U m=-n where -e. is the d iele c tric constant inside the cavity. Outside the cavity the potential can be expressed as 0 0 n • = < l l (C„m / r " +1)P” (cos 6)elm* AS 0 “ „ i 0 m=-n nm n where is the d ielectric constant outside the cavity and the 0 . coefficients of the r 1 1 terms in equation (Al) are set to zero to guarantee that the potential is zero at in fin ity . Now the problem reduces to evaluating E , B and C from 3 nm nm nm the boundary conditions of the problem. The boundary conditions used are ( i ) the potential must be continuous across the cavity surface (r=b) ( $ .(r=b)=$0(r= b )) and ( i i ) the normal component of the vector D (D=eE) is continuous across the cavity boundaries (e .. (9&./3 r) ^ .) = 134 S -;0(3$0/3 r ) r _b+5) • This provides the relation (n+1) ( 1-y) 2n+l B = , ( E /b*n L) A6 nm (n+1) ?+n n m where $ = ec / ei* Now the P°tential of equation (A4)can be expressed using equations (A3) and (A6 ). The interaction between and the charges is the free energy of polarizing the d ielec tric around the cavity (the "solvation energy"). This energy is given from equations (A3), (A4), and (A6) as eT1 N eT1 0 0 (n+1) (1-y) F AGsoi = — 1 Qk V r k> = — 1 7— ;— “ (“ ^ + r ) A7 so1 2 k K 2 n=0 (n+1) Y+n b where N N n n +n (n-|m |)! F = I I QkQ£ rPrJ I X A8 n k=l *=1 m=-n (n+lml)! P™(cos 6k)pJJ(cos 9£)exp{-iir(4>k"4>£)> Using the addition theorem of Legendre polynomials we obtain \ - I \ rjrjp „ (cos 9U ) A9 where Pn(cos < f > k^ is a simple Legendre Polynomial. Now the total 135 energy of the system (which includes the energy of interaction between the charges and the interaction between the charges and < f> p ) is given by I NN 1 INN A G - - r I QkQ s / V i a + " E Q r W = ~ ~ E £ Qk V £i r k£ 2 k IfY. K 3 6 1 2 k 2 k £^k eT1 y (n+1) ( 1-y) 1 N N + — ^ 7 “ ^----------- ^ " W T ) I I QkQ£ r kr JPn(cos ek£> A1° 2 n=0 (n+1) y+n b^n 1 k £ K * K 3 6 n K J 6 This equation gives, for n=0, the Born energy of the total Charge of the system. Sim ilarly i t gives, for n=l, the Onsager energy of the total dipole of the system. Reference Kirkwood, J.G. (1934) J. Chem. Phys. Zt 351-361. 136 SELECTED BIBLIOGRAPHY Fersht, A. (1977) Enz.ym e Structure and Mechanism. Reading: Freeman. Jackson, J.D. (1975) Classical Electrodynamics, 2nd ed., New York: Wiley. Naray-Szabo, G. & Bleha, T. (1982) Progress in Theoretical Chemistry, Vol. 3 (ed. I.G . Csizmadia), pp. 267-336. Amsterdam: Elsevier. Warshel, A. (1981) Biochemistry, 20, 3167-3177. Warshel, A. & L e v itt, M. (1976). J. Mol. B io l., 108, 227-249. Warshel, A. & Russell, S.T. (1984) Quarterly Rev. Biophys. 12, 283-422. UMI Number: DP21941 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMI DP21941 Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Dissertation P u blishing Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 4 8 1 0 6 -1 3 4 6
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Computer simulation of polar solvents with the SCAAS model
PDF
Computer modeling and free energy calculatons of biological processes
PDF
Computer simulation of chemical reactions in aqueous solutions and biological systems
PDF
Consistent evaluation of binding free energies and study of the role of electrostatic effects in the stabilization of protein complexes
PDF
Configuration interaction calculations on the resonance states of HCl
-
and the excited states of cyclopropane
PDF
Changes in the rate of net synthesis of oral tissue collagen with age and wound healing
PDF
A critical analysis of ideology and discourse in two hotels
PDF
Protein-protein interactions in blood serum
PDF
Structural Studies Of Selected Boron-Carbon Compounds
PDF
A critical study of the hydrogenation of talloel
PDF
Chiroptical spectroscopy of nitrogenase
PDF
Configuration interaction calculations on the resonance states of HF- the excited states of HF, and the Feschbach states of HF-
PDF
Adrenalectomy and fat absorption
PDF
Contributions to the chemistry of the chlorides, simple and complex, of beryllium
PDF
A new approach to the problem of measuring the properties of micelles
PDF
The role of water in the heat denaturation of egg albumin
PDF
Children's judgments of the similarity of television series and the similarity of gratifications associated with television series
PDF
¹⁹F-NMR studies of trifluoroacetyl insulin derivatives
PDF
Microheterogeneity of fetuin
PDF
Computational investigation of the catalytic aspartic acids of HIV-1 protease
Asset Metadata
Creator
Russell, Stephen T
(author)
Core Title
Calculations of electrostatic interactions in proteins
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Chemistry (Chemical Physics)
Degree Conferral Date
1985-08
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
chemistry, biochemistry,OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest
(provenance)
Advisor
Warshel, Arieh (
committee chair
), Petruska, John (
committee member
), Segal, Gerald (
committee member
)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c17-645324
Unique identifier
UC11354807
Identifier
DP21941.pdf (filename),usctheses-c17-645324 (legacy record id)
Legacy Identifier
DP21941.pdf
Dmrecord
645324
Document Type
Dissertation
Rights
Russell, Stephen T.
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Tags
chemistry, biochemistry