Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Advancing computational methods for free energy calculations in proteins and the applications for studies of catalysis in nucleotide-associated proteins
(USC Thesis Other)
Advancing computational methods for free energy calculations in proteins and the applications for studies of catalysis in nucleotide-associated proteins
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Advancing computational methods for free energy calculations in proteins and the applications for studies of catalysis in nucleotide-associated proteins by Hanwool Yoon A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL University of Southern California In partial Fulfillment of the requirements for the Degree DOCTOR OF PHILOSOPHY (CHEMISTRY) May 2019 Copyright 2019 Hanwool Yoon ` ii Acknowledgement All works done during my PhD program could not have been done without support from people who always guided, helped and encouraged me. It was my great honor to have Professor Arieh Warshel as my academic adviser. I will never forget the day when it was announced that he is the winner of the Nobel prize in Chemistry 2013. Even after he received the prize, he was still passionate to teach and guide me sitting down together almost every day. I would like to acknowledge my colleagues at Prof. Warshe’s group for their help and discussion. Especially, I had countless helps from Dr Zhen T. Chu for learning and running MOLARIS. I also thank Prof. Yuk Sham, my previous research advisor, who has led me into scientific career and he has been still my adviser in life as well as science during my Ph.D program. I am grateful to Prof. Aiichiro Nakano who introduced me to the Computer Science dual degree program at USC and I could have found the path that I really wanted after Ph.D with his considerable helps for career search. I cannot express my gratitude and love enough to my parents, Yong Chul Yoon and Eun Ja Park, who sacrificed a lot for my education and I am indebted to them forever. Michelle Sham, who is now my wife, has always supported and believed in me throughout my undergraduate, Master’s and PhD program and I was fortunate to be with her all this time. Lastly, Kylo, who is currently sitting on my lap while I am writing this, has always kept me happy throughout my Ph.D. ` iii Table of Contents Acknowledgement ............................................................................................................................. ii List of figures ..................................................................................................................................... v List of tables ................................................................................................................................... viii Abstract ............................................................................................................................................. x Chapter 1. Automatic calibration for the Empirical Valence Bond model with more systemic approach 1 1.1 Introduction ........................................................................................................................................ 1 1.2 Background of the Empirical Valence Bond Model ............................................................................ 2 1.3 Example of parameterization and its difficulty ................................................................................... 4 1.4 Introduction of the automatic calibration and the validation .......................................................... 12 1.5 Examples of the application .............................................................................................................. 14 1.6 Computational cost ........................................................................................................................... 15 1.7 Conclusion and discussion ................................................................................................................ 16 Chapter 2. Validating the Water Flooding Approach by Comparing it to Grand Canonical Monte Carlo Simulations ...................................................................................................................................... 19 2.1 Introduction ...................................................................................................................................... 19 2.2 Grand Canonical Monte Carlo method and Water Flooding ............................................................ 20 2.3 Methods ............................................................................................................................................ 21 2.3.1 Grand Canonical Monte Carlo .................................................................................................... 21 2.3.2 The Water Flooding model ........................................................................................................ 22 2.4 Benchmark tests and results ............................................................................................................. 24 2.5 Discussion .......................................................................................................................................... 34 Chapter 3. The control of the discrimination between dNTP and rNTP in DNA and RNA polymerase ... 36 3.1 Introduction ...................................................................................................................................... 36 3.2 Methods ............................................................................................................................................ 39 3.3 Results ............................................................................................................................................... 41 3.3.1 Analyzing of the selectivity of DNA polymerase ........................................................................ 41 3.3.2 Analyzing of the selectivity of DNA polymerase ........................................................................ 48 3.4 Conclusion ......................................................................................................................................... 50 Chapter 4. Simulating the Fidelity And The Three Mg Mechanism of Pol η and clarifying the validity of transition state theory in enzyme catalysis ....................................................................................... 51 4.1 Introduction ...................................................................................................................................... 51 4.2 Methods ............................................................................................................................................ 53 ` iv 4.3 Results ............................................................................................................................................... 54 4.3.1 The Fidelity of Pol η ................................................................................................................... 54 4.3.2 The Three Mg Mechanism Does Not Presents a new transition state theory ........................... 56 4.4 Conclusion ......................................................................................................................................... 61 Chapter 5. Exploring the catalytic mechanism of Cas9 using information inferred from Endonuclease VII ................................................................................................................................................... 63 5.1 Introduction and backgrounds .......................................................................................................... 63 5.2 Inferring the catalytic conformation from other nucleases ............................................................. 66 5.3 EVB calculation setups ...................................................................................................................... 67 5.4 Results and discussion ...................................................................................................................... 69 5.4.1 EVB simulations of the reaction of T4 Endonuclease VII ........................................................... 69 5.4.2 EVB calculations of the reaction of Cas9.................................................................................... 73 5.4.3 EVB calculation for Cas9 with mismatches around the cleavage site ........................................ 77 5.5 Conclusion ......................................................................................................................................... 78 References ....................................................................................................................................... 81 Appendix ......................................................................................................................................... 89 Details of the simulating system used for the chapter 2 ........................................................................ 89 Supporting materials for chapter 3 ......................................................................................................... 89 PMF calculation for the PP i release in chapter 4 .................................................................................... 91 ` v List of figures Figure 1-1. Two diabatic surfaces and an adiabatic surfaces after diagonalization .................................... 4 ...................................................................................................................................................................... 5 Figure 1-2. The reaction used as an example for the EVB simulation. The SN2 attack of polymerization step was simulated with the EVB method. The penta covalent bonds are formed at the final state which is a transition state of the polymerization reaction...................................................................................... 5 Figure 1-3. (a) Root Mean Square Deviation during the EVB simulation without any extra constraint in water and protein. (b) The initial structures used for the EVB simulation after relaxation without any extra constraint. The structures from water and protein are colored as blue and red, respectively. (c) and (d) The calculated EVB energy surface for the unconstrained simulation and the simulation with constrains on the region 2 (non-EVB atoms) and the bond between O3’ and Pα at state 1, respectively. . 6 Figure 1-4. (a) The change of the angle energy over the trajectory with different force constant for dihedral angles. ΔF means the difference in the force constant from the default value generated by Molaris library. (b) The relative catalysis (dg ϯ pro - dg ϯ wat) for each force constant for the dihedral angles. The experimental catalytic effect was ~5kcal/mol. .................................................................................... 10 Figure 1-5. An example case that shows a good convergence for both state 1 and state 2 but different response during over the trajectory. .......................................................................................................... 12 Figure 1-6. A schematic thermodynamic cycle for the automatic calibration. In the upper cycle, the reaction fragments are extracted from the protein along the EVB trajectories and then solvated while the conformations are fixed. (Depicted with the blue color circle) In the lower cycle, the constraints on the reaction fragments are released. ΔG RR is a restraint-release energy, which is the energy of releasing the restraints. .............................................................................................................................................. 16 Figure 1-7. (a) One of the best EVB results calculated with 5 angle constraints and one bond constraint with 50 kcal/mol force constants which were carefully chosen after identifying the problematic intra energy contributions. (b) The result calibrated automatically by taking the upper cycle in Fig 1-6 using the protein trajectory from (a). (c) The result calibrated automatically by taking the upper cycle in Fig 1-6 using the unconstrained protein simulation trajectory .............................................................................. 17 Figure 1-8. The final calculation of the proposed approach ...................................................................... 17 Figure 1-9. The energy surface before/after the correction term with the lower cycle in Fig 1-6. The calculated ΔΔg ϯ wat->pro after the correction is -6.4kcal/mol ......................................................................... 18 Figure 1-10. Convergence of solvating the fixed fragments ...................................................................... 18 Figure 2-1. GCMC titration for water molecules in a bulk water sphere with different radius for GCMC insertion/deletion. ...................................................................................................................................... 25 Figure 2-2. WF titration for water molecules in a bulk water sphere with different radius for WF region. .................................................................................................................................................................... 25 Figure 2-3. GCMC titration for water molecules in the single water cavity in BPTI................................... 26 Figure 2-4. WF titration for water molecules in the single water cavity in BPTI ....................................... 26 Figure 2-5. GCMC titration plot for the 3-water cavity in BPTI, when the protein is constrained (blue) and unconstrained (orange) .............................................................................................................................. 27 ` vi Figure 2-6. WF titration plot for water molecules in the 3-water cavity in BPTI, when the protein is constrained (blue) and not constrained (orange). WF calculation constrained on the structure generated by GCMC at B value=0 is given in a green color plot. ................................................................................. 27 Figure 2-7 Comparing the 3-water cavity sites of BPTI obtained by the WF(blue) and GCMC(pink) approaches, at B value of -8.0. The cavity water molecules are colored same as the protein backbone. 28 Figure 2-8. GCMC titration plot with different initial number of water molecules in the 3-water cavity in BPTI ............................................................................................................................................................. 29 Figure 2-9. WF titration plot with different initial number of water molecules in the 3-water cavity in BPTI ............................................................................................................................................................. 29 Figure 2-10. Plot of minimum free energies obtained by the WF’s MC post processing, for configurations of 3 and 7 water molecules, at the 3 water molecules cavity in BPTI. The difference of the minimum LRA energy between the two configurations is -5.64 kcal/mol at a B value of 8.0 (a), 3.15 kcal/mol at a B value of 13.3 (b) and the free energy of protein deformation from 3 water molecules to 7 water molecules configuration calculated according to eq.6 is 5.56 kcal/mol (c). ............................................... 31 Figure 2-11. GCMC titration plot for runs from 100ps to 1 (and 5) ns for the 1 water and 3 water cavities in BPTI. ........................................................................................................................................................ 32 Figure 2-12. WF titration plot for runs from 10ps to 100ps for the 1 water and 3 water cavities in BPTI 32 Figure 2-13. WF titration plot from 10ps to 100ps simulation for SNase ................................................. 33 Figure 2-14. GCMC titration plot from 100ps to 1ns simulation for SNase .............................................. 33 Figure 3-1. Analogous active sites of DNA polymerase (a) and RNA polymerase (b). These two polymerase catalyze the same reaction with the same key groups; two aspartate acids and magnesium ions. Furthermore, both have a tyrosine right above the 2’ of the ribose sugar. ...................................... 37 Figure 3-2 The EVB free energy profiles for the catalytic reaction with different substrates . .................. 42 Figure 3-3. The thermodynamic cycle for the alchemical transformation of the ribose sugar in NTP. E, S, and S’ represent enzyme, cognate substrate, and non-cognate substrate, respectively. The energetics of the states involved is described schematically in Fig. 3-4 .......................................................................... 42 Figure 3-4. Schematic description of the energetics of the states considered in the thermodynamic cycle of Fig.3-3 ..................................................................................................................................................... 43 Figure 3-5 . The relative catalytic efficiency (log Ω) calculated for DNA polymerase ................................. 45 Figure 3-6. A comparison between normally solvated structures (red) and structures processed by the water flooding approach (blue). All initial trajectories used were superimposed and the water molecules are presented as molecular surface with transparency. The extra water molecules found around the Y416 after the water flooding are highlighted with circles. ....................................................................... 47 Figure 3-7. (a) Hydrogen bond distances between 3’OH and the Y416 backbone during the FEP calculation for structures. (b) The distance between 2’O and Cγ of the side chain of Y416 during the FEP calculation. Three cases were tested with the original structure (red), with the structure processed by water flooding (blue), and the same structure by water flooding but the hydrogen bond 3’O and HN is forced by a 4.0 kcal/mol Å 2 harmonic constraint at 2 Å (green). ............................................................... 48 Figure 4-1. The structure of polymerase η and its active site .................................................................... 53 Figure 4-2. The EVB free energy profiles for the catalytic reaction by Pol η with different base pairs. .... 55 Figure 4-3. The calculated reaction profile of the catalytic reaction by Pol η with and without the 3rd Mg 2+ . Two pol η structures were used: the recent structure reported with the 3rd Mg(PDB:5L9X 84 ) and the structure reported earlier without the 3rd Mg (PDB:4O3N 83 ). The profiles were also calculated on ` vii 5L9X in which the 3rd Mg 2+ is removed and on 4O3N in which a new Mg is inserted at the corresponding position. ...................................................................................................................................................... 57 Figure 4-4. The landscape of the reaction of pol η as a function of the reaction path from the reactant to the product state and the distances between the 3rd Mg 2+ and the O2A of the incoming nucleotide ..... 58 Figure 4-5. Free energy diagrams for the catalytic reaction of pol. The surface 1 (red) and 2 (blue) in (a) depicts the actual surfaces for the catalysis with 2 and 3 metal ions, respectively. ∆g # enz is the barrier that corresponds to k cat/K M , which is given by the difference between the TS and the reacting fragments in water. ∆g # pol is the difference between the TS free energy and the free energy at the bound RS. The surfaces in (b) corresponds to the proposal of Gao. et al 84 , where the reference energy in the reactant state is placed in a correct height, while the dotted line corresponds to the actual figure presented in Gao. et al 84 . ................................................................................................................................................ 59 Figure 5-1. The mechanism of the conformational change of Cas9 upon dsDNA binding and PAM recognition and cleavage. The structures of PAM interacting domain and the guide are adjusted for PAM recognition, followed by DNA base pairs melting and DNA-RNA hybridization. Then, the HNH domain undergoes rearrangement to bring catalytic residues to the cleavage site on the targeted DNA strand (phosphate at position -3). ......................................................................................................................... 65 Figure 5-2. The common positive residues (circled) at the catalytic site in (a) Staphylococcal nuclease, (b) T4 Endonuclease VII and (c) Endonuclease Vvn.......................................................................................... 67 Figure 5-3. One metal ion mechanism, used in the calculation. A water molecule deprotonated by Nδ of histidine attacks the scissile phosphate between two nucleotides............................................................ 70 Figure 5-4. The different position of the Mg 2+ ion in (a) the original crystal structure of Endo VII and (b) after adjusting it to favor the proton transfer step as well as the nucleophilic attack step. ..................... 71 Figure 5-5. Calculated free energy profiles for the reference reaction in solution and the wild type and H43N of Endo VII. ........................................................................................................................................ 72 Figure 5-6. (a) The Cryo-EM structure of Cas9. (PDB: 5Y36 114 ) where the catalytic H840 was mutated to alanine to stop the reaction. The distance between the scissile phosphate and the Mg 2+ is 4.8 Å and the N863 faces away from the catalytic site. (b) Adjusted Cas9 structure that brings HIS840(converted from alanine) and ASN863 close to the corresponding positions in the catalytic site of Endo VII with 3.3 Å distance between the scissile phosphate and the Mg 2+ . ............................................................................ 74 Figure 5-7. Pulling K848 from its initial position (colored blue) to the neighborhood of the scissile phosphate (colored red). ............................................................................................................................ 75 Figure 5-8. The EVB reaction profiles in the cases when K848 is far from the active site, and when it is near with/without being ionized. ............................................................................................................... 76 Figure 5-9. The EVB reaction profiles for Cas9 with and without 3 consecutive mismatch base parings from 4 nucleotides upstream of PAM. The effect of the mutation of K848 to alanine is also calculated. Note that the low barrier case is unlikely to occur, since the unwinding conformational change that leads to the K848 movement is unlikely to be allowed (see text) ....................................................................... 78 Figure S3-1. The atoms of the substrate and the reactive part. The reacting regions are presented as bold. ............................................................................................................................................................ 89 Figure S4-1. The PMF profile of pulling the PP i after the chemical step. The profile is obtained for changing the distance between O3B of the PP i and PA from 3.8 Å to 5.5 Å. The red color profile was calculated from the structures where the 3 rd MG has been taken off.(a) was calculated with the presence of both catalytic and binding Mg while (b) was calculated after the catalytic Mg was removed. ............. 91 ` viii List of tables Table 1-1. Energy contribution in the EVB calculation. The first run was simulated without any constraint while the second run constrained the non-EVB atoms with 100 kcal/mol force constant and a bond constraint between O3’ and Pα at state 1. ................................................................................................... 7 Table 1-2. Mapping energy (kcal/mol) for angles with different force constants for dihedral angles. ΔF means the difference in the force constant from the default value generated by Molaris library. For each case, 5 different initial trajectories were used for protein and water. Out of 5 trajectories, Wat_best shows one of the results whose angle contribution is the closest to the average of protein’s. ................ 11 Table 3-2. Experimental data set from 74 and the catalytic effect of changing from deoxyribose to ribose sugar in T7 RNA polymerase ....................................................................................................................... 38 Table 3-3. The relative catalytic efficiency (log Ω) calculated for DNA polymerase with and without the water flooding postprocessing and 4.0 kcal/ mol Å 2 hydrogen bond constraint between the backbone nitrogen of the Y416 and 3OH’ in the substrate. ........................................................................................ 45 Table 3-4 Breakdown of the calculated free energy contributions to the TS binding in RB69 DNA polymerase. Energies are in kcal/mol . The calculations involve the water flooding approach and no H bond constraint . ......................................................................................................................................... 45 Table 3-5. The relative catalytic efficiency (log Ω) for T7 RNA polymerase structures with and without the constrained hydrogen bond with 15kcal/mol Å 2 distance force constant between O2’ and the hydroxyl group in Y416 at 1.9 Å was imposed in the wild type while a corresponding distance constraint at 2.8 Å between O2’ and Hζ in F416 was imposed in the mutant .......................................................................... 49 Table 3-6. Free energy contributions of the alchemical mutation in T7 RNA polymerase. Energies are in kcal/mol. The results were obtained with a hydrogen bond constraint between O2’ and the hydroxyl group. In Y639F, a corresponding constraint was imposed to position the mutated phenylalanine at the constrained wild type. ................................................................................................................................ 49 Table 4-1. The calculated and observed 83 relative fidelities for the systems studied. The relative fidelity, f, is defined as (k cat/K M) dNTP/(k cat/K M) dCTP where dNTP is either dATP or dCTP. k cat refers to the rate constants of the incorporation of dCTP and dATP opposite dG and 8-oxoG and Δg # is the corresponding free energy. The corresponding calculated fidelity was obtained by using the corresponding Δg # . Note that the calculations of the binding free energies present an overestimate due to the difficulties of capturing sufficiently large dielectric for the large electrostatic effects. The experimental steady state kinetics were measured at 37 °C and pH 7.5. ............................................................................................. 56 Table 5-1 . Calculated free energy profile for the catalytic reaction different systems. Free energies(kcal/mol) of the proton transfer(PT) step, the nucleophilic attack(NA) step and the total barrier in the reference solution, the wild type and the H43N mutant of Endo VII as well as the observed barrier converted from k cat from ref 137 .................................................................................................................. 73 Table 5-2. Calculated free energy profile for the catalytic reaction of Cas9 different systems. Calculated free energies(kcal/mol) of proton transfer(PT) step, nucleophilic attack(NA) step and total barrier in the reference solution and the three cases for Cas9. The observed barrier was converted from the k cat of ref 138 ................................................................................................................................................................. 76 ` ix Table S2-1. The coordinates of centers of simulating systems and the radius of each simulation and the cavity. Note that the center of a simulation is same as that of its cavity. .................................................. 89 Table S3-1 The charges and the vdW parameters used in the FEP calculation that mutate the 2’OH to 2’H. The vdW energy between atom i and j is calculated as E vdw = C iC jexp(-α i α jr ij), where r ij is the distance between atom i and j. ................................................................................................................................. 90 Table S3-2 Breakdown of the calculated free energy contributions to the TS binding in RB69 DNA polymerase Energies in kcal/mol . The calculations involve the water flooding approach and no H bond constraint . .................................................................................................................................................. 90 ` x Abstract Computational methods have been great complementary approaches when experiments often cannot identify key factors in fundamental biological problems. Especially, while experimental mutation studies allow us to infer the mechanisms, the only way to draw the full pictures of the mechanism is by calculating the entire energy surface of the reaction. The empirical valence bond(EVB) model has been successfully applied to elucidate the mechanisms of enzymatic catalysis in various biological systems. However, it is often difficult to produce reliable results without very careful system adjustments and analysis, thus, one might need to have enough experience until achieving consistent calculations. The major part of the difficulties is often encountered during the calibration procedure where the simulation in the reference solvent gets diverged from the simulation in proteins. In the first part of the thesis, an automatized protocol for the EVB model with theoretical assurance is introduced and validated. It significantly lessens the time and efforts that users should take until generating reliable results. Another challenging part, not just confined to the EVB but for the general free energy calculation, is the preparation of the optimal configuration of the simulating system. Among many aspects to be considered for consistent sampling and free energy calculation, the importance of the optimal water configuration has been only relatively recently recognized. Also, beside the quality of the simulation, it is well known that water molecules inside and around biological molecules often play a major role in determining key functions. Attempts to reach a formal rigor in water insertion in protein cavities have focused on grand canonical Monte Carlo(GCMC) methods. However, it requires major computational resources as the acceptance of the insertion/deletion attempts is very low. One of the fast and reliable methods to prepare better water configuration is the Water Flooding devised by Warshel’s group. However, this approach has not been fully validated and needs more benchmark test. The second chapter shows that the Water Flooding converges much faster than the GCMC but still reproduce the same results. ` xi In the rest of the chapters, the mechanisms of various nucleotide associated proteins are explored using the EVB with the help of the protocols introduced in chapter 1 and 2. Although Warshel’s group has successfully applied the EVB to many studies of DNA polymerases and nucleases, there are still countless fundamental questions to be solved to better understand the nature of those enzymatic catalysis. In particular with the rise of potential uses of the CIRSPR-Cas system in human medicine, it is more important than ever to understand the detailed mechanisms that are shared among many nucleotide-associated proteins. The findings and analysis from these chapters can provide powerful ways and insights for developing medicine that act against viruses and cancers and engineering enzymes for gene editing. ` 1 Chapter 1. Automatic calibration for the Empirical Valence Bond model with more systemic approach 1.1 Introduction Enzymatic reactions control most of the biological processes and maintain all forms of life. Reactions that would take years can be shorten to milliseconds or microseconds. It is well known that the enzymatic reactions are accelerated by lowering the activation barrier. However, it is still a trivial question to ask how such activation barrier at the transition state is lowered by biological molecules. Although experimental mutation studies allow us to infer the mechanisms, the only way to draw the full pictures is by calculating the entire energy surface of the reaction. By doing this, one can explicitly evaluate the exact major contributions for the lowered barrier. Exploring free energy surface of a reaction in protein is very challenging because of the complexity of the system and it is almost impossible to calculate the free energy surface with quantum mechanical (QM) treatment. The pioneering development of QM/MM model 1 , which treats the reaction region with QM method while treating the rest with molecular mechanics, has provided a general way to study chemical reaction in large systems. However, despite of successful progress of ab initio QM/MM method 2-11 , the current computational power is not sufficient enough for exhaustive sampling which is essential to obtain reliable free energy profile. For this reason, the empirical valence bond (EVB) approach 12-15 is currently one of the most powerful tools to study enzymatic reactions. In this method, classical potentials parameterized by ab initio calculation is used to represent the diabatic states. The key aspect of the EVB method is that it takes advantage of the well-defined experimental information or ab initio data for unique calibration. Once the experimental measurements of a reaction in solution is reproduced by adjusting parameters, one can confidently use the same Hamiltonian components to the same reaction in enzyme, which will be described in detail later. Therefore, the EVB model provides a much faster and reliable way to evaluate the energy surface of enzymatic reaction without expensive QM treatments. However, using the EVB method requires users to have thorough experience to produce proper results. For example, there has been an attempt to criticize the EVB method after failing to reproduce a previously reported EVB results by using incorrect parameters 16 and led them to ` 2 misrepresent the validity of the EVB model. 17 Not only that someone may easily overlook the theoretical background in this model such as the validity of transferability of the off diagonal term of the Hamiltonian 16 , but also the first time users can often generate incorrect results without careful analysis of the details in the data. Especially, the calibration steps often confuse the users because the reaction geometries are often observed to be very different between the protein and the reference simulations. One of the assumption in the EVB approach is that the reaction geometries in the enzyme are similar with the reference reaction in solution to apply the same gas phase constant and the off diagonal term in the Hamiltonian. (which will be further described in the next sections) Even experienced users sometimes inevitably have to adopt constraints to minimize the undesirably huge difference in the intra energy contribution from the different reaction geometries. Therefore, even though the EVB method itself is significantly faster than most of other approaches for enzymatic reaction studies, the users often have to run a countless number of simulations trying various parameters and constraints until they are confident enough with the calibrations. In this study, the common difficulties during the calibration step will be introduced with an example case in step by step. Then, an automatized calibration strategy will be proposed which can greatly improve the usability and accessibility of the EVB method with more reliable evaluation. 1.2 Background of the Empirical Valence Bond Model The EVB method mixes the diabatic states (resonance states) corresponding to the classical valence-bond (VB) structures. The potential energies of each diabatic states are expressed as (Eq 1-1) , where R and Q represent the atomic coordinate and charges of the solute, r and q represent those of the surrounding environment (protein and solvent), respectively. 𝑈𝑈 𝑆𝑆 𝑆𝑆 is the interaction energies between the solute and the environment. And the 𝛼𝛼 g as 𝑖𝑖 is the gas-phase energy of the valence bond structure at i-th state which is the energy to bring the fragments of each valence 𝜀𝜀 𝑖𝑖 = 𝑈𝑈 i ntr a 𝑖𝑖 � 𝑅𝑅 , 𝑄𝑄 𝑖𝑖 � + 𝑈𝑈 𝑆𝑆 𝑆𝑆 𝑖𝑖 � 𝑅𝑅 , 𝑄𝑄 𝑖𝑖 , 𝑟𝑟 , 𝑞𝑞 � + 𝑈𝑈 𝑆𝑆𝑆𝑆 𝑖𝑖 ( 𝑟𝑟 , 𝑞𝑞 ) + 𝛼𝛼 g as 𝑖𝑖 ` 3 bond structure from infinite separation. Finally, Uss represents the potential energy of the surrounding. With this potential energy definition, the EVB Hamiltonian matrix can be constructed for two diabatic states as following. 𝐻𝐻 EV B = � 𝜀𝜀 1 𝐻𝐻 1 2 𝐻𝐻 1 2 𝜀𝜀 2 � (Eq 1-2) Then, the ground state energy surface can be calculated by diagonalizing the matrix. The key aspect of the EVB approach is that it utilizes the unique calibration with the well-studied experimental data for the reference reaction (usually in water). That is, after simulating the reference reaction, the gas-phase constant, 𝛼𝛼 g as 𝑖𝑖 , and the off diagonal term, H12, are adjusted until the observed Δ 𝐺𝐺 and Δ 𝑔𝑔 ϯ are reproduced. (See Figure 1-1) The 𝛼𝛼 g as 𝑖𝑖 remains the same in different solvents. It becomes clearer when one considers the difference of (Eq 1-1) in between protein and water as below. 𝜀𝜀 𝑖𝑖 𝑝𝑝 𝑝𝑝 𝑝𝑝 − 𝜀𝜀 𝑖𝑖 𝑤𝑤𝑤𝑤𝑤𝑤 = { 𝑈𝑈 𝑆𝑆 𝑆𝑆 𝑖𝑖 , 𝑝𝑝 𝑝𝑝 𝑝𝑝 + 𝑈𝑈 𝑆𝑆𝑆𝑆 𝑖𝑖 , 𝑝𝑝 𝑝𝑝 𝑝𝑝 } − { 𝑈𝑈 𝑆𝑆 𝑆𝑆 𝑖𝑖 , 𝑤𝑤𝑤𝑤𝑤𝑤 + 𝑈𝑈 𝑆𝑆𝑆𝑆 𝑖𝑖 , 𝑤𝑤𝑤𝑤𝑤𝑤 } = Δ 𝑔𝑔 𝑆𝑆𝑝𝑝 𝑠𝑠 , 𝑝𝑝 𝑝𝑝 𝑝𝑝 𝑖𝑖 - Δ 𝑔𝑔 𝑆𝑆𝑝𝑝 𝑠𝑠 , 𝑤𝑤𝑤𝑤𝑤𝑤 𝑖𝑖 = ΔΔ 𝑔𝑔 𝑆𝑆𝑝𝑝 𝑠𝑠 ,( 𝑤𝑤𝑤𝑤𝑤𝑤 → 𝑝𝑝 𝑝𝑝 𝑝𝑝 ) 𝑖𝑖 (Eq 1-3) Therefore, after 𝛼𝛼 g as 𝑖𝑖 is calibrated, it is possible to evaluate the potential energy surface for protein by only considering the change in the solvation energy from water to protein. Also, it has been shown that the off diagonal term, H12, is transferable between different environments for SN2 reactions. 18 In this way, any possible errors that may exist in 𝜀𝜀 𝑖𝑖 does not affect the relative effects of different solvents. Note that the reference simulation in the solvent does not necessarily have to represent the best realistic reaction but it is rather a mathematical trick. In other words, the key points of using the EVB method is to explore the relative effects of the change in the environments of the reaction. An important assumption here is that the geometry of the solute through the reaction in water is very similar to that in the protein. Otherwise, one would have to consider the non-canceled intra energy term and any error residing in 𝜀𝜀 𝑖𝑖 may affect the relative effects of the different solvents. Now, we can construct the energy surface with the ground state potential calculated from (Eq1- 2). To overcome the sampling problem, the free energy perturbation (FEP) procedure is used with the EVB mapping potential defined as ` 4 𝜀𝜀 𝑚𝑚 = (1 − 𝜆𝜆 𝑚𝑚 ) 𝜀𝜀 1 + 𝜆𝜆 𝑚𝑚 𝜀𝜀 2 (Eq 1-4) 𝜆𝜆 is slowly increased from 0 to 1 to effectively sample all the configurational space to avoid getting trapped in a local minima by either 𝜀𝜀 1 or 𝜀𝜀 2. Then, the ground state energy surface can be calculated as following (Eq 1-5) , where εg is the ground state potential obtained from the Hamiltonian, < >m is the averaged values at potential εm, and ΔGi->i+1 is the energy associated with moving the potential from εm to εm+1. Figure 1-1. Two diabatic surfaces and an adiabatic surfaces after diagonalization 1.3 Example of parameterization and its difficulty As mentioned above, a key assumption in the EVB model is that the reaction geometry is similar between water and protein. The validity of this assumption will be discussed later. The main point in this section is to show that it is usually very difficult and takes long time to produce reliable results because it is often inevitable to adjust parameters that force both reactions in water and protein to proceed similarly. ` 5 As an example, a well-studied SN2 polymerization reaction was used. Specifically, a structure of RB69 DNA polymerase with a substrate of dCTP 19 was used with initial parameters used in the previous study. 20 The reaction scheme is depicted in Figure 1-2. Note that the EVB atoms consists of the attacking oxygen and the triphosphate chain. Figure 1-2. The reaction used as an example for the EVB simulation. The SN2 attack of polymerization step was simulated with the EVB method. The penta covalent bonds are formed at the final state which is a transition state of the polymerization reaction. The first trial of the EVB simulation is mostly likely to be inconsistent. As shown in Table 1-1, a huge contribution is made from the intra energy, especially from the angle and bond for the initial run without any extra force or constraint. This is obviously inconsistent result not only because it is evident that the reaction proceeds on a completely different potential energy surface with the relative catalysis beyond the expected range (Figure 1-3-c), but also the catalytic power is expected to be obtained from the electrostatics. 14, 21 The problem is that the simulation of a reaction in water by molecular dynamics has much more degrees of freedom than in protein in which the substrate is more tightly bound. (Figure 1-3-a) This leads the simulation in water to easily change the conformation to compensate the largely unfavorable intra energy by electrostatics. Especially in this example, two magnesium ions quickly repulse each other in water while it is fingered by two aspartate acids in protein. (Fig 1-3-b) In addition, the reaction orientation is significantly different due to the shifted sugar ring. Thus, it is often inevitable to impose constraints on both protein and water simulation. ` 6 (a) (b) (c) (d) Figure 1-3. (a) Root Mean Square Deviation during the EVB simulation without any extra constraint in water and protein. (b) The initial structures used for the EVB simulation after relaxation without any extra constraint. The structures from water and protein are colored as blue and red, respectively. (c) and (d) The calculated EVB energy surface for the unconstrained simulation and the simulation with constrains on the region 2 (non-EVB atoms) and the bond between O3’ and Pα at state 1, respectively. ` 7 No Constrained rg2+bond constrained Protein Water Protein Water Bond 19.6 24.4 -2.2 -1.4 Angle -9.3 49.8 2.0 -2.2 Electro -114.0 -124.4 -103.2 -105.4 VDW -2.02 -1.3 -0.6 -0.3 Torsion 1.21 -0.8 1.5 0.7 Table 1-1. Energy contribution in the EVB calculation. The first run was simulated without any constraint while the second run constrained the non-EVB atoms with 100 kcal/mol force constant and a bond constraint between O3’ and Pα at state 1. To bring the reaction orientation close to the protein’s, positional constraints of 1090 kcal/mol were imposed to the non-EVB atoms in water simulation and a bond constraint between O3’ and Pα defined in the state 1 at the distance observed in the protein simulation. This greatly alleviated the huge contribution of the bond and angle energy. (Table 1-1) Also, the shape of the energy surface of the water and protein simulation are now comparable. This result clearly implies that we do need the similar reaction geometries to expect comparable energy surfaces. However, the intra energies are still significantly deviated from each other (Table 1-1) and there was no catalysis obtained in protein at all. (Fig 1-4-d) A direct way to decrease such differences in the intra energy contributions is to increase the force constant of the harmonic angular potential defined in most of the molecular dynamic programs as 𝑬𝑬 𝒂𝒂𝒂𝒂 𝒂𝒂𝒂𝒂 𝒂𝒂 = 𝑭𝑭 ( 𝜽𝜽 − 𝜽𝜽 𝟎𝟎 ) 2 (Eq 1-6) Here, a various force constants were rigorously tested with 5 different initial trajectories for each case to see its average effect on the convergence of the angle energy between the protein and water (Figure 1-4-a) and the consequences on the calculated catalysis. (Figure 1-4-b) The trajectories were more similar with the bigger force constants as expected. However, a good convergence could be achieved only with very huge force constants. Also, the closer the trajectories were between protein and water, the larger the relative catalysis was obtained. Of course, it does not mean that we should use such extremely large forces which restricts proper sampling and cause an overestimation shown at ΔF=100 kcal/mol. More importantly, the bigger force constants do not necessarily guarantee the similar angle energy contribution. Table 1-2 ` 8 shows that even though the ‘chance’ to generate a trajectory with the similar reaction geometry for the water simulation increases with the bigger force constant, the average angle energies are still far between protein and water simulation. This is because the angle energy in Eq 6 becomes very sensitive with a bigger force constant. Even with a small deviation from 𝜃𝜃 0 , the energy will deviate a lot. Therefore, this approach does not sound consistent. However, this still suggests that the similar geometries do generate better relative catalysis though it needs to be inspected in more details. In practice, the EVB requires more careful analysis on every individual intra energy for each bond/angle types rather than changing the parameters collectively. For example, an automated script was used to record the average angles during the first mapping frame which samples the pure state 1 energies. (Table 1-3) The highlighted SN2 attacking angles including O3 - - Pα bond shows 3~5 kcal/mol difference in the energy even though the difference in the angle is only 2 o ~3 o . The problem here is that even though the state 2 parameters do not affect the dynamics in the beginning when λ is small (eq.4 ), its energy is still counted to calculate the mapping energy. (Eq 5) Therefore, if the orientation is different at either near λ=0 or λ=1, the parameters of the opposite state contribute significantly. This has been observed by many users as well, especially for SN2 reactions. One dilemma here is that if we decrease the force parameter for the state 2 to weaken its effect on the mapping energy in the early stage, now the average behavior in the later part of the simulation diverge between protein and water because of the weaker forces. Therefore, users often have to try many simulations with different parameters to ‘balance’ between the similar trajectories and the mapping energies. Lastly, even if one finds a balanced parameters with which the average behavior is very similar at both state 1 and state2, how they progress over the λ can be still different which still produces a significant difference due to the contribution from the intra energies. (Figure 1-5) It becomes even more complicated when one studies multiple mutants of the same system. Therefore, there is a steep learning curve to run the EVB. Without careful parameter adjustments, one may end up generating inconsistent results. Even for experienced users, it is still time consuming task to adjust the simulation. ` 9 Another important question is the validity of the assumed similar reaction geometries between protein and water. Figure 1-4 shows a direct effect of constraining the system on the relative enzymatic catalysis. This may be due to the narrower sampling of the conformational space or due to the relatively more decreased entropic contribution in water than in protein. But there is not an easy way to identify the source of the possible errors. Therefore, it is necessary to consider more reliable and systemic way to simulate the EVB approach which considers the cost of forcing the similar reaction geometries. ` 10 (a) (b) Figure 1-4. (a) The change of the angle energy over the trajectory with different force constant for dihedral angles. ΔF means the difference in the force constant from the default value generated by Molaris library. (b) The relative catalysis (dg ϯ pro - dg ϯ wat) for each force constant for the dihedral angles. The experimental catalytic effect was ~5kcal/mol. ` 11 ΔF = - 40 ΔF = - 20 ΔF = 0 ΔF = +20 ΔF = +40 ΔF = +70 ΔF = +100 Protein 21.6 0.7 3.3 4.2 7.0 9.0 15.4 Wat_avg 21.9 16.5 13.0 9.8 8.2 16.6 23.5 Wat_best 23.4 8.7 -2.2 3.8 7.2 8.3 15.9 Table 1-2. Mapping energy (kcal/mol) for angles with different force constants for dihedral angles. ΔF means the difference in the force constant from the default value generated by Molaris library. For each case, 5 different initial trajectories were used for protein and water. Out of 5 trajectories, Wat_best shows one of the results whose angle contribution is the closest to the average of protein’s. Table 1-3. An example analysis for every individual angle behavior for the state 2 energy at the first mapping frame. The second part of the table is the state 2 energy (kcal/mol) calculated by the angles from the third part of the table. All numbers shown are average values over the first frame. ` 12 Figure 1-5. An example case that shows a good convergence for both state 1 and state 2 but different response during over the trajectory. 1.4 Introduction of the automatic calibration and the validation One of the main purposes of this work is to automatize the EVB calibration procedure. Running the EVB repeatedly with different parameters and constraints hoping the water simulation to behave like in the protein can be a very frustrating task. In addition, more systemic approach is required to take the effect of constraining geometry on the catalysis into account. A schematic proposal is described in Figure 1-6. The basic idea is that if the entropy is really not a major determinant for the enzymatic catalysis as pointed out in the previous study 22 and if the catalysis can be obtained when the reaction geometries are similar as shown in the test cases above, the calibration with the reaction in water that explores the exactly same conformational space should not change the estimation much. Indeed, the calibration with unconstrained reaction in water was completely incorrect. Therefore, the calibration is done more reliably by taking the upper cycle in Figure 1-6. ΔΔ 𝑔𝑔 𝑤𝑤 , 𝑐𝑐 𝑝𝑝 𝑐𝑐 𝑆𝑆𝑤𝑤 → 𝑐𝑐 𝑤𝑤𝑤𝑤 ϯ = Δ 𝑔𝑔 𝑐𝑐 𝑤𝑤𝑤𝑤 ϯ − Δ 𝑔𝑔 𝑤𝑤 , 𝑐𝑐 𝑝𝑝 𝑐𝑐 𝑆𝑆𝑤𝑤 ϯ (Eq. 6) ` 13 To do this, the simulation of a protein is run while saving the reaction fragments coordinates at every 10 femto seconds. (which is the default frequency of recording the energies in most of the EVB studies) Now, we solvate each of the extracted reaction fragments while their coordinates are completely fixed. Each energy component is calculated by averaging the energies from the point when the numbers starts converging after enough relaxation. In this way, there will be no difference in the intra energies between the reactions in protein and water since both reaction geometries are identical and we will see the pure contribution from the environment by the electrostatics and the van der Waals. This should generate similar result to the one that was produced with constraints which will be shown later in this work. Of course, this significantly restricts the sampling of the reaction in water that may lead to an over/underestimation. Now, the energy of releasing the fixed conformation to a free moving state is considered. However, it is very expensive to calculate the potential of mean force for each extracted reaction fragment. We, thus, consider this process as a correction step to the calculated surface with the upper cycle by taking only the RS and TS mapping frame. It is described in the lower cycle in Figure 1-6 which calculates the restrain releasing energy at reactant state and transition state which will correct the activation barrier of the surface by Δ 𝑔𝑔 𝑤𝑤 ϯ − Δ 𝑔𝑔 𝑤𝑤 , 𝑐𝑐 𝑝𝑝 𝑐𝑐 𝑆𝑆𝑤𝑤 ϯ = ΔG RR, TS − ΔG RR, RS (Eq. 7) The calculation of ΔGRR, the energy associated with releasing the restraints, was formulated and implemented as a part of the previous study. 23 Here, only the main points are described. ΔGRR is evaluated by the free energy perturbation (FEP) approach with a restraint potential, 𝑈𝑈 𝑝𝑝 𝑟𝑟 𝑆𝑆𝑤𝑤 , 𝑗𝑗 𝑁𝑁 = 𝐾𝐾 𝑗𝑗 2 ∑ ( 𝑅𝑅 𝑐𝑐 𝑁𝑁 − 𝑅𝑅 � 𝑐𝑐 𝑁𝑁 ) 2 𝑖𝑖 (Eq. 8) , where Kf is a constraint force constant and 𝑅𝑅 � 𝑐𝑐 𝑁𝑁 is a reference coordinate extracted. The potential is applied to each atom, j, of the solute. Now, ΔGRR can be evaluated by a free energy perturbation (FEP) with a mapping potential 𝑈𝑈 𝑚𝑚 𝑁𝑁 = (1 − 𝜆𝜆 𝑚𝑚 ) 𝑈𝑈 𝑝𝑝 𝑟𝑟 𝑆𝑆 𝑤𝑤 , 1 𝑁𝑁 + 𝜆𝜆 𝑚𝑚 𝑈𝑈 𝑝𝑝 𝑟𝑟 𝑆𝑆 𝑤𝑤 , 2 𝑁𝑁 + ∑ 1 2 𝐾𝐾 𝑐𝑐 𝑤𝑤 𝑐𝑐 𝑟𝑟 ( 𝑅𝑅 𝑐𝑐 𝐹𝐹 − 𝑓𝑓 − 1 𝐹𝐹 = 1 𝑅𝑅 � 𝑐𝑐 𝐹𝐹 ) + E − 𝛽𝛽 − 1 ln ( 𝑣𝑣 0 𝑣𝑣 𝑐𝑐 𝑤𝑤 𝑐𝑐 𝑟𝑟 � ) (Eq. 9) ,where 𝜆𝜆 𝑚𝑚 is changed from 0 to 1 as it does in the EVB and E designates the unconstrained potential surface of the system. The term with Kcage is adopted so that the fragments are still ` 14 confined in the volume, vcage, even when 𝑈𝑈 𝑝𝑝 𝑟𝑟 𝑆𝑆𝑤𝑤 , 2 𝑁𝑁 is set to zero as the free moving state. This is for a practical purpose because it is impractical to sample the large molar volume, v0=1660 Å 3 at the unconstrained state. We consider the energy of confining the fragments in the cage volume from v0=1660 Å 3 which is the last term in Eq 9. Kcage is chosen so that vcage will correspond to 55 M concentration. Now, we add this correction term to Eq. 6 and finally obtain the relative catalysis between the reactions in protein and water which forces the same reaction geometry but still considers the effect of forcing the same reaction geometry in water. 1.5 Examples of the application As a test case, one of the well-defined simulations with very similar intra energies between the protein and water was used where many constraints were carefully chosen after identifying the problematic intra energy contributions. (Figure 1-7-a) Note that the test case is still not a satisfying result due to the underestimated catalysis. Experimentally determined catalysis in the polymerase is 4~8 kcal/mol less than in the solvent. This case was taken as an example because the difference in the intra energies was successfully minimized after a countless number of trials. Also, the transition state is not observed on the surface. In this case, we usually take the inflection point of the surface for both Δg ϯ and ΔG. Fig 1-7-b was ‘automatically’ calculated according to the upper cycle in Fig 1-6 on the same protein trajectory. In principle, these two should produce the similar results because the manually adjusted case is already forced to proceed on the similar reaction geometries. And they indeed shows the similar ~1.5kcal/mol lower barrier than the water. Moreover, Fig 1-7-(b) shows the clear transition state barrier which was missing in the manual calibration. Most importantly, the electrostatic contribution in the manual case was 2kcal/mol better in the water while it was 2kcal/mol better in protein with the proposed automatic calibration which is how enzymes are supposed to lower the barrier. (Table 1-4) Now the same approach with the upper cycle was tested on the unconstrained protein EVB trajectory (Fig 1-7-c) and it is almost identical to the Fig 1-7-b. Therefore, the evaluation of the upper cycle seems to be consistent as long as only the environmental change is considered. ` 15 Now the question is what the cost is for constraining the reaction geometry in water. The final results after taking the lower cycle of the Fig 1-6 into account are shown in the Fig 1-8 and 1-9. The transition state mapping frame was chosen where the average energy gap, Δε, is the closest to zero. The correction term turned out to be +5.3 kcal/mol. And when calibrated with this correction term, -6.4 kcal/mol of the relative enzymatic catalysis was obtained which is much closer to the experimental data. Although this number looks excellent, more studies need to be done to confirm that the evaluation of the lower cycle is reliable and consistent. In fact, the same approach was tested for T7 RNA polymerase which catalyzes the same polymerization reaction and the results for the upper cycle was very similar to the ones reported here while almost zero kcal/mol was calculated for the correction term. In this report, the whole reaction fragments were considered in the calculation of the restrain release energy. It is currently under investigation to see if the results would be different when only the EVB atoms are considered. The dependence on the choice of the mapping frame or conformation also needs to be studied. 1.6 Computational cost One may think that this approach is computationally too expensive. Indeed, the typical EVB is usually run for 200~500 ps and extracting the reaction fragments at every 10 fs means that we have 20k to 50k fragments to solvate. Fortunately, the solvation of each snapshots converges very quickly because of their relatively much smaller size and their coordinates are fixed. The change of the energy gaps over time converges only after 1~2 ps. (Figure 1010) Also, since we already have all the reaction fragments extracted from the proteins, it is possible to make use of as many processors as available to perform an embarrassing parallelization. For this specific test case with 21 mapping frames, 60 octa core processors (AMD 6378) were used and the whole calculation took 20 hours. This is almost three times longer than the manual EVB simulation. However, considering the time that the new users usually have to spend to generate reliable results or the usual number of trials to find the balanced constraints, this approach will provide much easier but more systemic and reliable way to evaluate the EVB. ` 16 1.7 Conclusion and discussion The test cases introduced in this work suggest that this approach can be very helpful to get reliable results more easily. Also, the possible errors by constraining the reaction fragments can be explicitly evaluated. The test case shows that this approach produces very similar results to when it is generated by constraining the reaction geometry manually. Moreover, although more studies need to be done, this approach produced much more satisfying catalysis which the manual EVB calibration could not achieve even after a countless number of trials. Therefore, the proposed approach can be very useful to alleviate the steep learning curve and the catalysis can be evaluated in more systemic way. Figure 1-6. A schematic thermodynamic cycle for the automatic calibration. In the upper cycle, the reaction fragments are extracted from the protein along the EVB trajectories and then solvated while the conformations are fixed. (Depicted with the blue color circle) In the lower cycle, the constraints on the reaction fragments are released. ΔG RR is a restraint-release energy, which is the energy of releasing the restraints. ` 17 Figure 1-7. (a) One of the best EVB results calculated with 5 angle constraints and one bond constraint with 50 kcal/mol force constants which were carefully chosen after identifying the problematic intra energy contributions. (b) The result calibrated automatically by taking the upper cycle in Fig 1-6 using the protein trajectory from (a). (c) The result calibrated automatically by taking the upper cycle in Fig 1-6 using the unconstrained protein simulation trajectory Table 1-4. Energy contributions from the simulations corresponding to the Figure 7. Figure 1-8. The final calculation of the proposed approach pro water_manual water_auto pro water_manual water_auto Bond -1.6 0.0 -1.6 19.6 24.4 19.6 Angle -12.3 -12.7 -12.3 -9.3 49.8 -9.3 Torsion 1.4 1.9 1.4 1.2 -0.8 1.2 Electro -103.5 -105.0 -101.0 -114.0 -124.4 -111.5 VDW -0.7 -0.4 -0.7 -2.0 -1.3 -2.0 constrained trajectory Unconstrained trajectory ` 18 Figure 1-9. The energy surface before/after the correction term with the lower cycle in Fig 1-6. The calculated ΔΔg ϯ wat->pro after the correction is -6.4kcal/mol Figure 1-10. Convergence of solvating the fixed fragments -25 -20 -15 -10 -5 0 5 10 15 0 1 2 3 4 5 Δε (kcal/mol) simulation time (ps) ` 19 Chapter 2. Validating the Water Flooding Approach by Comparing it to Grand Canonical Monte Carlo Simulations 2.1 Introduction It is now widely recognized 24 that water molecules inside and around biological molecules play a major role in determining key functions. It is now appreciated that the effect of internal water molecules must be considered in evaluation of the energetics of biological processes, ranging from enzymatic reactions 25 , ligand binding 26-29 , redox reactions 30 , ion conduction 31 , to proton transport 32-33 and ionization of deeply buried protein residues 34-35 . The protein dipole Langevin dipole (PDLD) model has been arguably the first attempt to represent the water in and around the protein in an explicit yet simplified way in studies of the energetics of biological processes 36-38 . This model allowed one to consider water penetration but focused on the overall effective energetics rather than on reproducing possible exact hydrogen bonding pattern. The earliest attempts to incorporate water molecules in atomistic free energy calculations of ionized groups in proteins were reported in ref. 39 , using the surface constrained all-atom solvent(SCAAS) model 40-41 . Different adaptations of the SCAAS , and its earlier SCSSD version 42 boundary conditions ideas have emerged and been applied by other groups (e.g. ref. 43 ) and eventually have used proper electrostatic boundaries 44-45 . Continuum models were also developed as powerful tool of representing the effect of the external water molecules but attempts to consider in such models a few water molecules implicitly around the protein have not been so successful 46 . Despite the effectiveness of the PDLD, the advances in computer power led to subsequent different atomistic models and to detailed studies of the energetics of many biological processes (e.g.ref. 38 ). However, notable problems emerged in some studies of relevant systems. One of the most serious problems has been the overestimate of the solvation penalty for moving charges to non-polar sites in proteins where, for example, if we consider the important benchmark of ionizable residues provided by Garcia-Moreno and coworkers 34-35, 47 , we can obtain major overestimates of the penalty of moving the given charges from bulk water to the protein sites by ` 20 simple free energy perturbation (FEP) microscopic calculations (see 32, 48 ). This problem could be reduced by using a semimacroscopic model such as the PDLD/S-LRA, with a relatively high dielectric for the charging energy in problematic sites (around 6-8) 48 . A way forward has been offered by the overcharging approach that induced a partial unfolding of the protein by artificially increasing the solute charge and forcing accelerated water penetration 48 , but this method requires major computer time. Another area where the water positioning issue has become a critical problem is the study of ligand binding. Here it has been noted (e.g. refs. 26 , 49- 51 ) that performing free energy perturbation (FEP) or linear response approximation (LRA) may not allow for proper water equilibration. Therefore, before simulating macromolecules with explicit solvent, it is important to equilibrate the system with physically optimal water configurations. 2.2 Grand Canonical Monte Carlo method and Water Flooding With infinite amount of simulation time it is possible to obtain the proper water configurations and attempts to reach a formal rigor in water insertion in proteins cavities have focused on grand canonical Monte Carlo (GCMC) methods 52-53 ) as reported in several studies 54-56 50-51 . However, while such methods seem attractive, they require major computational resources, as the acceptance of the insertion attempt is very low. In other words, the chance of obtaining unoccupied space and the orientation of inserted water molecule without collision with other atoms is extremely low by random action. Interesting studies were reported by Essex and coworkers ( 50-51 ) who compared several methods and in particularly advanced actual GCMC, trying to determine the energetics of water insertion in proteins. These studies gave useful insight and demonstrated that GCMC can be used with sufficient investment of computer time, although special care is needed in attempts to reach convergence. Despite the potential of the above approaches, they were not validated by comparing the calculated energetics of electrostatic calculations (e.g. pKa calculations) with and without the ` 21 corresponding water insertion approach. This is important since in cases of internal charges the effect of water molecules can be very large and this should be useful in assessing the error range. Considering the challenges of developing both effective and reliable methods for water insertion, Warshel and coworkers introduced recently the water flooding (WF) approach 49 . In this approach, the energy of rationally inserted water molecules are determined using the LRA and the linear interaction energy (LIE) approaches and then the energetics for each configuration is calculated by post-processing MC approach. The post-processing strategy makes the method much faster than any GCMC, since it avoids the need to perform any explicit MC simulation during the energy evaluation but simply uses the pre-calculated energies. Despite the impressive results obtained with the WF approach 49 57 , this strategy should be further validated by comparing it to the more rigorous GCMC approach . In this work, a systematic comparison between the results of the WF and GCMC approaches in the evaluation of water penetration to proteins. 2.3 Methods 2.3.1 Grand Canonical Monte Carlo The GCMC model used in this chapter generated the usual SCAAS model for the protein and the surrounding solvent and selected a central point at protein region of the study, using a radial boundary that prevented movement of water in or out a specific radius. The subsequent step started by inserting and deleting water from the inner region in the following way. Any insertion attempt was followed by finding 20 orientations of the inserted water and then choosing the one with the lowest potential (Uin,min) and then accepting or rejecting the move by: Pin=min [1, (1/(N+1))exp(-(Uin,min 𝛽𝛽 − 𝐵𝐵 ))] (Eq 2-1) Where Pin is the probability of acceptance of an insertion move and B =( Δ 𝐺𝐺 𝑏𝑏 𝑏𝑏 𝑠𝑠 𝑏𝑏 𝛽𝛽 +ln<N>), N is the total number of water molecule in the system , <N> is the average N for the given B, Δ 𝐺𝐺 𝑏𝑏 𝑏𝑏 𝑠𝑠 𝑏𝑏 is the assumed free energy (more precisely chemical potential) of a water molecule in the bulk. ` 22 The selection of a single orientation was a simplification of previous approaches that used Boltzmann average 44 58 . The condition of Eq. 2-1 was satisfied by a standard Metropolis approach 59 Our GCMC method was implemented in the MOLARIS-XG program package 60 . The deletion moves were accepted by evaluating the potential Uout of a random water molecule in the inner region, and then applying the criterion Pout=min [ 1, N exp(+(Uout 𝛽𝛽 − 𝐵𝐵 ))] (Eq 2-2) where Uout is the potential energy of a water molecule that is a candidate for removal. Each insertion or deletion attempt was followed by 10 MD steps of the entire system that establishes the equilibration. Following ref. 50 we performed the GCMC steps in a sub region of the complete system( the equivalent of then box in ref. 50 ) , surrounded by a spherical wall of a specified radius. 2.3.2 The Water Flooding model Warshel and coworkers developed a WF approache 49 which drastically accelerates the insertion process by not performing any explicit MC water insertion but rather using a post processing MC strategy. While the details of the method are described in ref. 61 , the key details are summarized below. The WF method starts by generating different configurations with excess number of the internal water molecules, deleting those that collide with the protein and then evaluating the electrostatic energies of these molecules using the linear response approximation (LRA). This approximation estimates the free energy of each configuration of the internal water molecules by: ∆ 𝐺𝐺 ( 𝒎𝒎 ) = ∑ � ∆ 𝐺𝐺 𝑖𝑖 𝑝𝑝 � 𝛿𝛿 𝑖𝑖 ( 𝒎𝒎 ) 𝑖𝑖 ( 𝒎𝒎 ( 𝑝𝑝 )) + ∑ 𝑖𝑖 ( 𝒎𝒎 ( 𝑝𝑝 )) ∑ ∆ 𝐺𝐺 𝑖𝑖 𝑗𝑗 𝑝𝑝 𝛿𝛿 𝑖𝑖 ( 𝒎𝒎 ) 𝛿𝛿 𝑗𝑗 ( 𝒎𝒎 ) 𝑗𝑗 � 𝒎𝒎 ( 𝑝𝑝 ) � < 𝑖𝑖 ( 𝒎𝒎 ( 𝑝𝑝 )) (Eq 2-3) ` 23 where m is a configuration vector that runs over all the water sites, δi(m) is a function that represents the occupancy of the water sites in the current m th configuration. δi(m) is 1 when the i-th site is occupied and zero otherwise. p represents a protein sites. The terms in Eq.2-3 are given by: ΔGi = ½ <U i Q− U i 0>U,Q + β< U i vdw>U,Q = ½ Σk ∉ WAT< U ik Q − U ik 0>U,Q + β< U i vdw>UQ, where i ϵ WAT (Eq 2-4) ΔGij = ½ < U ik Q − U ik 0>U,Q + ΔG ij ins,0ij, where i, j ϵ WAT (Eq 2-5) U i Q and U i 0 denote the total “solvation” energy (self-energy) of the ith water molecule and the subscript denotes the charge distribution of the water molecule. The angular bracket denotes the ensemble average obtained by propagating trajectories over the polar state of the water. It is important to mention here that Eq. 2-4 does not include contribution from the water molecules, which have already been inserted in the protein cavity. The second part of the Eq. 2-4 is calculated using the LIE 62-64 which provides a good approximation to the free energy of creating a cavity for the ith water molecule in the protein. Note in this respect that most of the free energy of the water molecules is electrostatic and thus captured quite reliably by the LRA 46 . Eq. 2-5 denotes the total pair-wise interaction between all ith and jth pairs of water molecules, where U ij Q is the pair-wise interaction of the i-j pair and Q represents the charge on the atoms of the water molecules. The last term in equation 5 denotes the pairwise term of non-polar interaction energy. The calculated free energies of the internal water molecules are used to select different water configurations by a MC procedure and calculate the corresponding free energy using Eq. 2-3. The post processing WF MC approach was modified relative to our previous treatment 49 and follows the formulation of Eq. 2-1 . This modification arranged the ∆ 𝐺𝐺 ( 𝒎𝒎 ) in according to the number of water molecules in each configuration and then performed MC with moves that only add or subtract one water molecule. ` 24 For adding water molecule, we use Pm+1=min {1, (1/(N+1))exp[-(( ∆ 𝐺𝐺 𝑚𝑚 + 1 − ∆ 𝐺𝐺 𝑚𝑚 ) 𝛽𝛽 − 𝐵𝐵 ′ 𝑚𝑚 + 1 )]} (Eq 2-6) Similarly, we used the equivalent of Eq. 2-2 for deleting a water molecule. The parameter B’ is given by B’m+1 =( Δ 𝐺𝐺 𝑏𝑏 𝑏𝑏 𝑠𝑠 𝑏𝑏 𝛽𝛽 +lnNm+1). This parameter can be evaluated by determining the free energy of inserting a single water molecule. Alternatively, the bulk energy that corresponds to the insertion free energy at the half point between no water and full one water molecule can be taken as B’. This post processing MC approach allowed us to select the minimum free energy configurations. The radius and center of the simulating system for each cavity studied was set as stated in Table SI2-1. The simulations used the SCAAS surface constraints and the local reaction field (LRF) long-range treatment (see ref. 65 ). The protein structures were subject to 3000 steps of minimization using steepest decent, followed by 300p MD relaxation, using the polarizable ENZYMIX force field 65 with time steps of 1.0fs and the solute parameters given in refs. 66-67 . These structures were then used for GCMC and WF simulations. During both the GCMC and WF simulations, a spherical hard wall was placed so that the inside water molecules could not escape and the outside water molecules could not come in. In GCMC, the insertion and deletion were attempted only inside this hard wall. 2.4 Benchmark tests and results We started by exploring the relationship between the performance of the WF and GCMC in a bulk water system, modeled as a water sphere. The dependence of the number of inserted water on the B value for the GCMC simulations are summarized in Fig. 2-1 and the corresponding WF results are depicted in Fig 2-2. As is clear from figures both models give very similar results. ` 25 Figure 2-1. GCMC titration for water molecules in a bulk water sphere with different radius for GCMC insertion/deletion. Figure 2-2. WF titration for water molecules in a bulk water sphere with different radius for WF region. Next, two methods are compared with more challenging case, bovine pancreatic trypsin inhibitor (BPTI)(PDB:5PTI 68 ), which has been previously tested by Ross et.al 50 A site that is known to contain a single water molecule was first tested. The corresponding B dependence by the GCMC and the WF models are described in Figs. 2-3 and 2-4. As seen from the figures, the trend in both figures is similar. An additional validation was performed by considering the 3 water site in BPTI 50 . Here again we compared the B dependence of the GCMC (Fig. 2-5) and WF (Fig. 2-6). The studies presented were done with different constraints on the ` 26 protein and the results seem to follow a similar trend. Next, the positions of the internal water molecules in the 3 water cavities were compared from two methods. As seen from Fig. 2-7, both methods produce similar structures and B dependence profile. Figure 2-3. GCMC titration for water molecules in the single water cavity in BPTI Figure 2-4. WF titration for water molecules in the single water cavity in BPTI ` 27 Figure 2-5. GCMC titration plot for the 3-water cavity in BPTI, when the protein is constrained (blue) and unconstrained (orange) Figure 2-6. WF titration plot for water molecules in the 3-water cavity in BPTI, when the protein is constrained (blue) and not constrained (orange). WF calculation constrained on the structure generated by GCMC at B value=0 is given in a green color plot. ` 28 Figure 2-7 Comparing the 3-water cavity sites of BPTI obtained by the WF(blue) and GCMC(pink) approaches, at B value of -8.0. The cavity water molecules are colored same as the protein backbone. A crucial aspect that has to be addressed by both the WF and the GCMC approaches is the dependence of the number of inserted water on the protein structure. To establish this problem, we considered the 3 water site of BPTI and generated open configurations by running short GCMC with large positive B values. The corresponding configurations were used in WF calculations (Fig. 2-8) generating an increase in the number of the inserted water molecules. Similar trend occurs in the GCMC calculations (Fig. 2-9). ` 29 Figure 2-8. GCMC titration plot with different initial number of water molecules in the 3-water cavity in BPTI Figure 2-9. WF titration plot with different initial number of water molecules in the 3-water cavity in BPTI The problem is, of course, to determine which configuration has the lowest free energy. This can be done by specialized LRA type treatment e.g. (see ref. 69 ). ` 30 ∆G( 𝑟𝑟 1 → 𝑟𝑟 2 ) = 1 2 � 〈 ∆ 𝜀𝜀 ( 𝑞𝑞 𝑚𝑚 → 𝑞𝑞 𝑚𝑚 ′) 𝑝𝑝 1 , 𝑚𝑚 〉 + 〈 ∆ 𝜀𝜀 ( 𝑞𝑞 𝑚𝑚 → 𝑞𝑞 𝑚𝑚 ′) 𝑝𝑝 2 , 𝑚𝑚 ′ 〉 � (Eq 2-7) where ∆ ε designates the difference in energy of the two states in the water charging process, 〈 〉 𝑝𝑝 𝑖𝑖 , 𝑚𝑚 indicates an average over the charge distribution, qm, where the system is held by a weak constraint near the position vector of the indicated states (ri). This expression tells us how much it costs to move the protein from one configuration to another. In order to evaluate the energetics of the protein deformation from the geometry obtained with 3 water molecules, r1, to that obtained with 7 water molecules, r2, we performed the calculations while keeping the number of water molecules the same in both states (7 water molecules) but setting to zero the fractional charges on the last four inserted water molecules, when considering state 1. This calculation gave 5.56 kcal/mol penalty for moving to the conformation of the 7 water molecules. This penalty compensates for the gain in the water energy upon moving from 3 to 7 water configurations at B=- 8 (Fig. 2-10). Of course, in the present case with the correct B (-13.8) we already have negative energy in going from 7 to 3 water molecules and the deformation energy make this difference even larger. At any rate, our approach can be used as a general tool in cases of significant conformational change due to additional water molecules. It is also possible to evaluate the protein deformation energy by the far more expensive overcharging approach 48 . ` 31 Figure 2-10. Plot of minimum free energies obtained by the WF’s MC post processing, for configurations of 3 and 7 water molecules, at the 3 water molecules cavity in BPTI. The difference of the minimum LRA energy between the two configurations is -5.64 kcal/mol at a B value of 8.0 (a), 3.15 kcal/mol at a B value of 13.3 (b) and the free energy of protein deformation from 3 water molecules to 7 water molecules configuration calculated according to eq.6 is 5.56 kcal/mol (c). At this stage, it may be useful to compare the computational time of the WF and GCMC methods. This is done in Figs. 2-11 and 2-12 with the WF and GCMC, respectively, for the 3 water site of BPTI. As seen from the figure the WF approach easily converges in less than 10ps while the GCMC needs at least 5 ns (probably much longer) to converge. ` 32 Figure 2-11. GCMC titration plot for runs from 100ps to 1 (and 5) ns for the 1 water and 3 water cavities in BPTI. Figure 2-12. WF titration plot for runs from 10ps to 100ps for the 1 water and 3 water cavities in BPTI Two methods of water insertions were compared for V66D Staphylococcal nuclease (SNase, PDB:2OXP 47 ), which was also studied in ref 48 as a benchmark of the WF approach. In this case residue V66 is located in a hydrophobic site and the mutation to ASP creates a case where an internal acid cannot be ionized until it is stabilized by internal water molecules and/or reorganization of the protein. As seen from Figure 2-13 the WF approach leads to a fast convergence of the internal water molecule to a configuration that was found to reproduce the observed pKa 48 . On the other hand, it is clear in Figure 2-14 that the GCMC has significant convergence difficulties. ` 33 Figure 2-13. WF titration plot from 10ps to 100ps simulation for SNase Figure 2-14. GCMC titration plot from 100ps to 1ns simulation for SNase ` 34 2.5 Discussion Water molecules play a crucial role in determining the energetics of biological processes. Thus, it is crucial to capture, in computational studies of protein functions, the correct effect of water inside and around proteins. The problem is particularly challenging in cases of highly polar or charged environments in protein interiors. Addressing this challenge by regular MD studies is far from simple, since the equilibration time of water penetration processes might be extremely long. One may use specialized approximated approaches such as our WF method 49 or other strategies (e.g. refs. 27, 56 ). However, it is important to assess the validity of the approximation used. An attempt to validate Just Add Water molecules (JAWS) approach of ref. 27 was reported in ref. 50 , but this validation concluded, however, that the MC part of the JAWS has difficulties in convergence and requires long simulation time and that converting the water densities to unique positions is far from simple. Furthermore, no validation in terms of the dependence on the B value was reported. As to the WaterMap. 28, 70 approach; one of the arguments has been that this approach can provide a way of capturing the water entropic effect. Now, while the entropic contributions of ordering of the water molecules were expressed by elegant expansion terms (e.g. ref. 28 ), not only does this treatment have possible formal problems (e.g. ref. 71 ), but also, to the best of my knowledge, there are no reported careful quantitative validations of such entropy estimates such as using restraint release (RR) treatment 72 . A fast evaluation of the water entropic effect is still a non-trivial problem. The WF approach has been used successfully in challenging cases but again has not been validated in terms of its rigorous foundation. In the present work, the WF approach is validated by comparing its results to those of the corresponding GCMC simulations. The comparison yielded very encouraging results, basically recovering the GCMC trend. In doing so the WF reproduced the GCMC for both the configurations and water insertion energy (as is apparent from having a similar B value). Apparently, as is clear from the present paper and from ref. 50 one can use GCMC in studies of water penetration to proteins. Unfortunately, the computer time needed for converging GCMC results is very extensive where the results of Fig. 2-11 show that we need more than 5 ns for the 3 water molecules case. In fact, an estimate of ref. 50 seems to suggest that GCMC should use ` 35 about 200 million MC steps per each B value simulation, which correspond to 2 μs in our MD/GCMC implementation. On the other hand, the WF simulations takes about 10 ps for both 1 and 3 water molecules. Thus, this is the method of choice, when one is interested in practical yet reliable calculations, for example, in case of screening for ligand binding. Moreover, if one likes to generate a titration plot of the average number of water molecules at each B value, many independent GCMC simulations are needed to be run for the entire range of B values, while WF only needs one short MD run since the very fast post-processing MC can generate a titration plot within a few cpu seconds. An interesting issue that was addressed in the present work is the dependence of the configuration of the water molecules on the protein conformations. This fundamental issue is far from trivial and in some cases should be treated by Eq. 2-7. In fact, the same problem exists with GCMC and other approaches. Perhaps, the dependence on the protein configurations should be handled in the same way they are handled in the stage where we start with different initial protein configurations to obtain the average results for FEP calculations (or in the related replica exchange steps). The issue of water insertion is the most critical in studies of the effect of internal water molecules near charged residues, as is the case in cytochrome c oxidase 49 and the V66D in SNase considered as above. Such calculations can be very expensive in particularly when we consider pathways of charge transfer, and thus the WF approach is ideal for such type of problems. Overall, I believe that the WF approach is very efficient and powerful and is arguably the method of choice in cases of calculations that are limited by the computer time (e.g. fast screening of drugs). However, if one is interested in very expensive, and in principle more rigorous GCMC, it can be accelerated by starting from the WF configurations. ` 36 Chapter 3. The control of the discrimination between dNTP and rNTP in DNA and RNA polymerase 3.1 Introduction In this chapter, we calculate the EVB and the free energy perturbation to elucidate the source of the discrimination of dNTP and rNTP by DNA and RNA polymerase. And the automatic calibration and the Water Flooding approach introduced in the previous chapters are used to improve the calculations, though the rest of this thesis including this chapter more focus on the analysis of the calculated results, rather than on the methods. Gene replication and transcription play a key role in the development of biological systems. These processes are catalyzed by DNA/RNA polymerases with high fidelity that maintains very low error rates. Such high fidelity is essential for the stability of reproduction and cellular activity but the occasional errors are also important source of evolution. The mechanism that DNA polymerases adopt to selectively choose the right Watson-Crick base pair between the template strand and the incoming nucleotide substrate has received a great attention in many studies. [1-15] Another important selective behavior of DNA and RNA polymerases is the discrimination against rNTP and dNTP, respectively. It is believed that DNA was evolved from RNA to stabilize earlier genetic system by giving a specific role to DNA as an information carrier in cells [16, 17]. One of the evidence that supports the RNA-to-DNA transition hypothesis is the analogous active site that uses two divalent ions mechanism, as well as the finding that rNTP is reduced to dNTP by reductases. [18, 19] Understanding the rNTP and dNTP discrimination should help in advancing the fundamental description of replication processes and can also provide a powerful way for developing selective drugs that act against viruses and cancers. [20, 21] In trying to understand the above selectivity, it is useful to consider a highly conserved tyrosine (e.g. Y416 and Y639 in RB69 DNA polymerase and T7 RNA polymerase, respectively). This residue is thought to be the key residue that contributes to the sugar selectivity, since it is the only residue near the 2’ position of the ribose sugar (see Figure 3-1)and [10, 22, 23] ).Yang et al. ` 37 [24] has shown that the mutant, Y416A, of RB69 DNA polymerase incorporates rNTP much more efficiently than its wild type. (Table 1) Similarly, but in an opposite way, Sousa et al. [25] mutated the corresponding tyrosine residue of T7 RNA polymerase, Y639, to phenylalanine ( since in RNA polymerase, the extra 2’OH is probably preferred by its hydrogen bonds with the hydroxyl in Y639, and the mutant behaved as a DNA polymerase). (Table 3-2) These experiments were aimed at exploring the assumption [26, 27] that the tyrosine in DNA polymerase acts as a steric gate to the ribose of the incoming rNTP, while the corresponding tyrosine residue in RNA polymerase stabilizes the incoming rNTP by making a hydrogen bond with the hydroxyl group of the ribose. Figure 3-1. Analogous active sites of DNA polymerase (a) and RNA polymerase (b). These two polymerase catalyze the same reaction with the same key groups; two aspartate acids and magnesium ions. Furthermore, both have a tyrosine right above the 2’ of the ribose sugar. Table 3-1. Experimental data set from Yang et al. 73 and the catalytic effect of changing from deoxyribose to ribose sugar in RB69 DNA polymerase ` 38 Table 3-2. Experimental data set from 74 and the catalytic effect of changing from deoxyribose to ribose sugar in T7 RNA polymerase The above hypothesis cannot be established uniquely by experimental studies, since it is hard to distinguish experimentally between electrostatic and steric contributions. Fortunately, computational approaches can help to determine what are the actual energy contributions that lead to the selectivity. Warshel and coworkers previous studies have successfully elucidated sources of the high replication fidelity and the catalytic mechanism of DNA polymerase β [4-6, 13] Thus, the same approach is applied in this paper to explore the origin of the sugar selectivity by RB69 DNA polymerase, using the empirical valence bond / free energy perturbation ( EVB/FEP ) calculations [28-31] and pKa calculations by the semi macroscopic version of the protein dipoles Langavin dipoles (LDLD/S-LRA ) approach. [32-34] The calibrations of the EVB for two different polymerases with several mutants are challenging because each system might have different reaction geometries and orientations. And a reasonable calibration for one system can generate a large intramolecular contribution to the reaction profile of another system. Although various constraints can be imposed to force the reactions in different systems to look similar, it is still tedious to find such balanced constraints that satisfy all systems. The automatic calibration can be very useful in this case and the results reported in this chapters were generated with minimum efforts for the calibration. ` 39 3.2 Methods The task is to evaluate the barriers that correspond to kpol and /or the TS binding energies, which are related to kpol/KD through (See 75 ) ΔG 𝑏𝑏 𝑖𝑖 𝑐𝑐 𝑏𝑏 𝑇𝑇 𝑆𝑆 = Δg 𝑝𝑝 𝑝𝑝 𝑝𝑝 𝑤𝑤 𝑟𝑟 𝑖𝑖𝑐𝑐 ‡ - Δg 𝑤𝑤𝑤𝑤𝑤𝑤 𝑟𝑟 𝑝𝑝 ‡ = -RT ln 𝑏𝑏 𝑝𝑝𝑝𝑝 𝑝𝑝 𝐾𝐾 𝐷𝐷 + RT ln𝑘𝑘 𝑤𝑤𝑤𝑤𝑤𝑤 (Eq 3-1) The evaluation of this relationship should be done with a sufficiently reliable strategy. The previous studies with the EVB have successfully revealed the important factors for the high replication fidelity of DNA pol beta. [4-6, 13] Here, the same methods are applied for exploring the selectivity of RB69 DNA polymerase with the following three different substrates; the natural substrate (PDB: 4FK0) and the mismatched Watson-Crick base pair substrate (PDB: 4FK4), which are reported in Ref. [36]. The incorrect sugar with 2’OH was also explored the incorrect sugar with the 2’OH, which was obtained by substituting the 2’H by a hydroxyl group for the incorrect sugar structure and relaxing the structure by the same procedure used for the wild type. The stepwise associative mechanism was considered, as it was found out in our previous studies [37-40] that it gives similar trend to that obtained by the concerted mechanism, but it makes it simpler to explore the TS energy (by evaluating the energy of the pentacovalent intermediate that is being formed between the nucleophilic attack and the bond breaking steps) . The starting structures were immersed in a 18 Å sphere of water molecules subject to the surface-constraint all-atom solvent (SCAAS) type boundary conditions [33]. Long range effects were treated by the local reaction field (LRF) method (which is equivalent to having no cut-off for the electrostatic interactions) [33]. The center of the simulations sphere was defined by the geometric center of the EVB reacting atoms (defined in figure S3-1). The parameters used in the EVB calculations are described in section SI. The positions of all atoms beyond 18 Å from this center were fixed at their crystallographic positions and their nonbonded interactions with the atoms within the simulation sphere were turned off. Note that the SCAAS allows one to obtain reliable results using smaller explicit systems than those used by most other methods. The protonation states were determined by ` 40 calculating the pKa s with our coarse grained (CG) model[41] and then keeping the ionizable residues within 12 Å from the simulation center in their ionization states. The effect of the more distanced ionizable residues was determined macroscopically while using a large charge–charge dielectric [42]. The immersed structures were equilibrated by increasing the temperature from 10 K to 300 K during 200 ps run , while imposing on the protein atoms 100 kcal/mol Å2 harmonic constraint. The temperature was decreased back to 10K over another 200 ps with the same harmonic constraint and the constraint was then released over 100 ps. The systems were reheated to 310 K over 200 ps with fully released constraint followed by 800 ps relaxation. A further relaxation was done for another200 ps to generate 5 different starting configurations for the EVB and FEP calculations by picking a structure at each 40ps. The resulting structures were used in starting the free energy calculations, and the corresponding final results were taken as the average of the 5 runs. The EVB simulation used 51 frames with each 10 ps for moving the potential from the reactant to the product potential at 300 K with a time step of 1 fs. The automatic calibration introduced in Chapter 1 was applied for the calibration of gas phase shift and off-diagonal parameters. 10 kcal/mol Å2 harmonic constraint was applied to two magnesium ions for both simulations in protein and reference system (water) which otherwise do not keep their positions close to the reactive site in the water simulation, which are used for the calibration. Note that the effect of the Mg ions has been explored in many studies (e.g. [43, 44]) that include examination of the predictively of the binding of the second metal ions and the late binding of the second ion. More recent studies (e.g.[45, 46] also addressed related questions. However, when the Mg is treated classically without our six dummy atom model it is justified to use harmonic constraint for fixing the mg ions near their observed position. The FEP simulation of the alchemical transformation used the intermediate state (see below) structures from the EVB simulations. The alchemical simulations were run with the same number of frames and time step as the EVB simulations. As stated above the FEP and EVB calculations were obtained by averaging over the 5 different starting configurations. Note here that choosing the intermediate state as the model for the TS exploits above mention the fact that the TS for concerted path has similar energetics as the intermediate state (IS) , but it ` 41 provides more stable results for the binding energy calculations. Despite the different studies of the TS structure (e.g. [47, 48]), the QM/MM analysis by Warshel and coworkers [49] is quiet reliable and clearly sufficient for exploring the selectivity towards the hydroxyl of the ribose group which is mainly influenced from the environment of this group. At any rate, in this work the IS is referred as the TS because of the above considerations. The analysis of the reaction mechanism indicated that it is most likely to involve proton transfer (PT) to the bulk and that the PT reaction is not likely to be the rate limiting [50], and thus it is sufficient to calculate ∆GPT instead of ∆g ‡ PT. This is equivalent to calculating the pKa shift of the proton donor (the 3’OH group) . The calculations of the pKa shift were done by PDLD/S-LRA in the MOLARIS package. [33] The PDLD/S-LRA method is a semimacroscopic method that take the Protein Dipole Langevin Dipoles (PDLD) method and scale it with a physically based dielectric constants. This method evaluates the charging energies with MD relaxation runs for the charged and uncharged stated in the framework of the LRA formulation. The details of this method as been described extensively and can be found for example in ref. [51] Lastly, the Water Flooding approach introduced in the Chapter 2 is applied to the FEP calculations and the results are compared to the calculation without the Water Flooding method. 3.3 Results 3.3.1 Analyzing of the selectivity of DNA polymerase In first step of our analysis we performed EVB/FEP calculations of the reaction profiles of the different systems. The results of the calculations are given in Fig. 3-2 As found in our previous studies [4-6, 13], the difference in the activation barrier between the correct and incorrect Watson-Crick base pair substrates was reproduced in this case as well. However, the calculations did not reproduce a significant difference between the calculated barrier of the deoxyribose and ribose sugars systems. Apparently, it is challenging to capture throughout the entire reaction pathway the effect of the small differences in the partial charges of the 2’OH between rNTP and dNTP, because most of the contributions are obtained from the highly charged triphosphate and the base. In fact, Figure 3-2 shows slightly unfavorable barrier due to the extra hydroxyl group of ` 42 the ribose sugar. The difficulties of obtaining converging results that reproduce the observed trend reflects in part the fact that the contribution to selectivity from kpol is significantly smaller than that from kpol/KD. Thus, we should try to use the thermodynamic cycle needed to the evaluation of kpol/KD. With this on mind we considered the cycle in Fig. 3-3 (see below) Figure 3-2 The EVB free energy profiles for the catalytic reaction with different substrates . Figure 3-3. The thermodynamic cycle for the alchemical transformation of the ribose sugar in NTP. E, S, and S’ represent enzyme, cognate substrate, and non-cognate substrate, respectively. The energetics of the states involved is described schematically in Fig. 3-4 ` 43 Figure 3-4. Schematic description of the energetics of the states considered in the thermodynamic cycle of Fig.3-3 In view of the discussion above we turn to the use of alchemical transformations in evaluating the energies of the cycle of Fig 3-3. That is, we evaluated the catalytic efficiency (kpol/Km), for the wt and the Y416 mutants, by FEP simulations of the relevant alchemical transformations based on the thermodynamic cycles of Fig. 3-3 (where the energetics of the relevant states is described in Fig.3-4 ). This type of alchemical cycle was used in previous related works ([53, 54] ). Finally, it is also useful to point out that this alchemical study served also a useful purpose in allowing us to explore the performance of our water flooding approach in an attempt to improve the agreement between the calculated and observed free energies. ` 44 This study explored the effect of mutating the deoxy-ribose sugar of dCTP to the ribose sugar of rCTP using the alchemical pathway, trying to obtain the contribution from the hydroxyl group while leaving other parts of substrate unchanged. More specifically, the relative catalytic efficiency, Ω = (kpol/KD) [OH] /(kpol/KD) [H] (Eq 3-2) can be expressed in terms of the corresponding free energies using the thermodynamic cycle of Fig. 3-3 as log Ω = - β(ΔΔGTS [H]->[OH] - ΔGsol [H]->[OH] )/2.303 (Eq 3-3) where ΔΔGTS [H]->[OH] and ΔGsol [H]->[OH] are the free energies of substituting the 2’H of the dNTP to an hydroxyl group in the TS of the reaction in the protein and in solution, respectively. The actual simulations were done in the opposite direction, mutating rCTP to dCTP, since it is more practical to mutate the hydroxyl group to a hydrogen atom and a dummy atom than to perform the opposite transformation. It is known that free energy calculations associated with mutations to dummy atoms frequently experience an extremely large vdW energy explosion, when the mapping parameter, λ, approaches the limit when we the run on the state were the forces are given almost entirely from the potential with the dummy atom (in this case we can obtain enormous artificial free energy contributions from the van der Waals (vdW) energy of the state with the real atom) . This situation is known as “vdW catastrophe” [55]. To avoid this problem, we used a modified vdW function whose exponential repulsion part is linear below a cutoff set at 1.8 Å for the hydrogen and 3.0 Å for the oxygen. The results of our calculations of the relative catalytic efficiency (log Ω) are given in Table 3-3 and the energy contributions from the FEP procedure are listed in Table 3-4 and Table S3-2. We also depict in Figure 3-5 the relative catalytic efficiency. ` 45 Without H bond const With H bond const exp Original Water Flooding Original Water Flooding WT -4.80 -7.29 -5.78 -8.48 -7.03 Y416A -1.28 -1.78 -2.00 -1.24 -1.52 Y416F -4.19 -6.25 -4.57 -9.32 -6.37 Table 3-3. The relative catalytic efficiency (log Ω) calculated for DNA polymerase with and without the water flooding postprocessing and 4.0 kcal/ mol Å 2 hydrogen bond constraint between the backbone nitrogen of the Y416 and 3OH’ in the substrate. Electrostatic VDW solvent 9.08 -3.05 WT 4.43 -5.76 Y416A 5.58 -2.40 Y416F 7.29 -6.83 Table 3-4 Breakdown of the calculated free energy contributions to the TS binding in RB69 DNA polymerase. Energies are in kcal/mol . The calculations involve the water flooding approach and no H bond constraint . Figure 3-5 . The relative catalytic efficiency (log Ω) calculated for DNA polymerase ` 46 In the first set of calculations we obtained overestimates for both the wild type and Y416F, while the results for Y416A agreed with the experimental trend. It is hard to tell whether such overestimates reflect the electrostatics or vdW contributions, since both terms showed about 5 kcal/mol differences compared to the calculations in solution. Next, we used our water flooding approach, and the corresponding results (Table 3-4 and Table S3-2) did improve the agreement with the observed values for the wild type and Y416F ( the result for Y416A was slightly worse, but the difference is not significant to be meaningful). The detail comparison between the electrostatic and vdW contributions in Table 3-4 shows that the improvement comes mostly from the increase in the vdW contributions. As seen from Figure 3-6, it is hard to see a clear differences between structures generated by the water flooding approach and the structures obtained by the regular protein solvation procedure. However, there are always extra water molecule besides the tyrosine. Thus, it seems that the water flooding treatment has improved the results by better relaxation rather than by direct interactions between the substrate and the optimally selected water molecules. ` 47 Figure 3-6. A comparison between normally solvated structures (red) and structures processed by the water flooding approach (blue). All initial trajectories used were superimposed and the water molecules are presented as molecular surface with transparency. The extra water molecules found around the Y416 after the water flooding are highlighted with circles. To further inspect the source of improvements we calculated the distances between the sugar and the tyrosine (Figure 3-7). The original structural study [36] has suggested that the 3’OH makes an hydrogen bond with the backbone nitrogen of Y416, and thus increase the binding energy of the incoming nucleotide. Indeed, the hydrogen bond during the calculation with the original structure (without the water flooding) was observed for most of times (Figure 3-7-a). Interestingly, however, after the structure was processed by the water flooding approach and then simulated by the FEP approach, it almost always showed a longer H-bonds distance. Note that all results reported in this paper are averaged over 5~10 different initial trajectories. Some trajectories did not even make a hydrogen bonds and still reproduced the observed energetics better than without the water flooding. To see this effect more clearly, we also performed FEP mutation, subjecting the hydrogen bond length to a constraint with an equilibrium length of 2 Å and a force constant of 4.0 kcal/mol Å2 . As shown in Table 3-4, regardless of the use of the water flooding approach, the results were worse with the hydrogen bond constraint except for the alanine mutant. As a consequence of this ‘loosen’ hydrogen bond in the structures generated by ` 48 the water flooding approach , the distance between the 2’OH and the Cγ of Y416 was also slightly further as well. This explains the alleviated vdW contributions in the wild type and the Y416F calculations. Figure 3-7. (a) Hydrogen bond distances between 3’OH and the Y416 backbone during the FEP calculation for structures. (b) The distance between 2’O and Cγ of the side chain of Y416 during the FEP calculation. Three cases were tested with the original structure (red), with the structure processed by water flooding (blue), and the same structure by water flooding but the hydrogen bond 3’O and HN is forced by a 4.0 kcal/mol Å 2 harmonic constraint at 2 Å (green). 3.3.2 Analyzing of the selectivity of DNA polymerase Building on the success in the evaluation the selectivity in DNA polymerase we used the same alchemical procedure for T7 RNA polymerase. In this case we used the α β methylene ATP elongation complex structure (PDB:1S76 76 ), where we modified the substrate to ATP and mutated Y416 to a phenylalanine, in order to simulate the corresponding mutant . The results are summarized in Tables 3-5 and 3-6. Apparently, when we just ran the simulations with our ENZYMIX force field, the proposed ( 20, 77 ) hydrogen bond between the O2’ and the hydroxyl ` 49 group of Y416 was not obtained and the observed effect of the mutation was not reproduced. Thus, we imposed a 15 kcal/ mol Å 2 distance constraint with a 1.9 Å equilibrium value, to force the hydrogen bond in the wild type. To position the mutated phenylalanine in a similar position to that of the tyrosine in the constrained wild type system, a similar distance constraint of 15 kcal/mol Å 2 and equilibrium distance of 2.8 Å were imposed on the distance between O2’ and the substituted hydrogen of the side chain of the phenylalanine . With these modifications, we could reproduce the experimental data (a reasonable trend was also obtained when the force constant was reduced to 6 kcal/mol Å 2 ). As seen from Table 6 the sugar discrimination in T7 RNA polymerase is most probably due to electrostatic effects since the only difference between the wild type and the mutant is obtained from the electrostatic contribution. One may assume that imposing constraint on the H bond leads to our conclusions. However, it is quite likely that the hydrogen bond classical function should be refined by comparing to ab initio calculations and also should be represented by an EVB hydrogen bond potential. Exploring this issue is left for future studies. exp normal H bond forced Wild Type -3.3 1.1 -3.2 Y639F -0.9 -2.3 -1.4 Table 3-5. The relative catalytic efficiency (log Ω) for T7 RNA polymerase structures with and without the constrained hydrogen bond with 15kcal/mol Å 2 distance force constant between O2’ and the hydroxyl group in Y416 at 1.9 Å was imposed in the wild type while a corresponding distance constraint at 2.8 Å between O2’ and Hζ in F416 was imposed in the mutant WT/H-bond Y639F with const Solution Electrostatic 10.1 6.8 9.1 VDW 0.3 0.5 -3.0 Table 3-6. Free energy contributions of the alchemical mutation in T7 RNA polymerase. Energies are in kcal/mol. The results were obtained with a hydrogen bond constraint between O2’ and the hydroxyl group. In Y639F, a corresponding constraint was imposed to position the mutated phenylalanine at the constrained wild type. ` 50 3.4 Conclusion This work explored the origin of discrimination between rNTP and to dNTP by DNA/RNA polymerases, using systematic calculations of the change in the transition state (TS) binding free energy in RB69 DNA polymerase and T7 RNA polymerase. The calculations reproduced the observed trend and supported the idea that the discrimination is due to the steric interaction between of the 2’OH and Tyr416 in DNA polymerase, while the electrostatics is the source of the discrimination in RNA polymerase. The calculations demonstrate the importance of the water contribution by the improvement of the results by using the water flooding approach, and indicate that the determination of the correct water configuration can be crucial for reliable binding calculations. Overall, the elucidation of the source of the discrimination can help significantly in developing selective drugs that act against viruses and cancers. ` 51 Chapter 4. Simulating the Fidelity And The Three Mg Mechanism of Pol η and clarifying the validity of transition state theory in enzyme catalysis 4.1 Introduction The nature of the catalytic reaction of polymerase η is a subject of a major recent interest. This interest is due in part to the important role of Pol η and other members of the Y family of DNA polymerases incurred by a multitude of endogenous and exogenous factors that challenges the replication machinery. 78-81 The Y family can catalyze translesion synthesis (TLS) across sites of damaged DNA. 82 Among eukaryotic translation polymerase, Pol η is considered to play an important role in 8-oxo-7,8-dhyedro-2’-deoxoguanosin(8-oxoG) bypasses (where the active site structure is depicted in Fig. 4-1). This involves a reduced fidelity for 8-oxoG. 78, 80, 83 Further interest in Pol η has grown recently with the realization that this system provides a strong support for the proposal that 3 rd Mg +2 ions are crucial for the catalytic action of DNA polymerase 84 . More specifically, Yang and coworkers 84-85 explored the action of pol η by an in crystallo study, which concluded that the reaction cannot proceed with only 2 Mg +2 ions and that the 3 rd Mg +2 binds at the transition state and leads to catalysis (see more in section 4.4.2) . These works addressed fundamental problems and raised interesting issues, but also provided a problematic perspective of transition state theory (TST) with the proposal that the 3 rd Mg +2 presents a new paradigm in DNA polymerases and in enzyme catalysis. Ref. 86 explored the 3 Mg mechanism in pol β by QM/MM calculations and structural studies, focusing on the pyrophosphorlysis reaction (which is the reversed phosphodiester bond formation reaction). Although the calculated barriers overestimated the corresponding observed barrier and the barrier for the original 2Mg phosphodiester bond formation reaction has not been reproduced (see section V), this insightful study provides hints on the importance of the product releases stage. ` 52 Interestingly, there are other systems where there are evidences that the reaction involves 3 metal ions ( e.g alkaline phosphatase 87-88 ) , and this further emphasize the importance of understanding the role of the 3 rd metal. One of the aims of the present chapter is to explore the effect of the 3 rd Mg +2 and to clarify the nature of the catalytic effect in Pol η. The second aim is to explore the origin of the reduced fidelity of Pol η towards 8-oxoG. It is found that the 3 rd Mg does not provide a major reduction of the chemical barrier and that its action is fully consistent with TST. However, this ion plays a major role in the product release stage. Furthermore, the simulations reproduce the observed trend in fidelity of Pol η. This chapter also shows the usefulness of the automatic calibration of the EVB, especially for the calculation that combines the PMF and the EVB (Figure 4-4) where multiple configurations with different position of the 3 rd metal are considered for each EVB simulation. In addition, although this chapter does not compare the results with and without the Water Flooding, all of the simulated structures were prepared with the Water Flooding considering that it is often challenging to generate reliable configurations with the highly charged active site, especially in this case with the triphosphate and the 2 or 3 metal ions in which. With the help of these two methods, the validity of the transition state theory could be explored and clarified with consistent results. ` 53 Figure 4-1. The structure of polymerase η and its active site 4.2 Methods As the main purpose of this chapter is to prove the validity of the transition state theory with the polymerase η, all calculations were done by the EVB with the same parameters and structure relaxation processes used in the chapter 3 for RB69 DNA polymerase except the longer relaxation for 1 ns after the equilibrating the structures. All the calculations used the MOLARIS- XG package. 89-90 The EVB study considered two structures. The first one is the structure resolved with the 3rd manganese by Gao et al. 84 (PDB : 5L9X) , where we changed the manganese ions to magnesium ions for the calculation. The second structure (PDB: 4O3N) was obtained in an earlier study 83 which was reported with the absence of the 3 rd magnesium. The N3A of the incoming nucleotide substituted from O3A to stall the catalytic reaction in the structural study was replaced with an oxygen. ` 54 The binding free energies of the incoming nucleotides in each structure are also evaluated by the semimacroscopic version of the protein dipole Langevin dipole in the linear response approximation version (the PDLD-S/LRA). 89, 91-92 More careful treatment of the dielectric response is probably required here because the substrate has a highly charged triphosphate group surrounded by metal ions and ionized residue. Thus, we estimated the intrinsic binding energy with neutral protein and dielectric constant of 4, while using an effective dielectric constant of 60 for the charge-charge interaction between the substrate and the side chains of the protonated protein. The need and justification of our treatment concept has been clarified in great detail in Ref 93-94 . 4.3 Results As stated above, our study addressed both the nature of the fidelity in Pol η and the role of the 3rd metal ion. Both investigations are as described below. 4.3.1 The Fidelity of Pol η To explore the nature of the fidelity of Pol η, the reaction profile for the right (R) and wrong (W) base pairs were calculated. Since the structure for the wrong base pair is not available, it was generated by substituting the cytosine base of the incoming nucleotide in the original structure (PDB:4O3N) 83 by an adenine base. This study also examined the fidelity of the 8-Oxo-7,8- dihydro-2′-deoxyguanosine(8-oxoG) bypass, where the structures were taken from the same study (PDB:4O3P and 4O3O). With the above starting structures, the EVB calculations were performed for the corresponding catalytic reactions with the correct and incorrect base pairs, as well as with 8-oxoG. The calculated free energy profiles and the corresponding activation barriers are given in Figure 4-2 and Table 4-1. As seen from the table, the observed trend in the activation barriers were reproduced. Then, the calculated fidelities were considered. Here, as already found in a Warshel and coworker’s previous systematic work 95 , that although the calculated binding free energies are largely overestimated due to the large charge of the ligand, the trend in the relative fidelity is reasonable. Also in the present case, while the difference in the calculated fidelities are exaggerated, the results are very encouraging, showing that we can ` 55 reproduce the trend of the special fidelity of pol η. Note that all calculations were performed with 5 different starting structures and the averaged energies are reported here. Despite the fact that we obtained exaggerated free energies, the average standard deviation of Δg # calc + ΔGbind,calc from all 4 base pair cases was 0.89 kcal/mol with the highest standard deviation of 0.93 kcal/mol for dATP/dG case. The rather large difference between the calculated and observed results is probably due to the large electrostatic effect with the highly charged system on the calculated binding energy, as reported and discussed in our previous study. 20, 95 . Nevertheless, the calculations clearly captured the reduced discrimination for the base pair with 8-oxoG, due to both binding energies and catalysis. Encouraged by the results of the calculations, I performed the same calculation for the same system and reaction but with the extra 3rd magnesium ion. Figure 4-2. The EVB free energy profiles for the catalytic reaction by Pol η with different base pairs. ` 56 Table 4-1. The calculated and observed 83 relative fidelities for the systems studied. The relative fidelity, f, is defined as (kcat/KM)dNTP/(kcat/KM)dCTP where dNTP is either dATP or dCTP. kcat refers to the rate constants of the incorporation of dCTP and dATP opposite dG and 8-oxoG and Δg # is the corresponding free energy. The corresponding calculated fidelity was obtained by using the corresponding Δg # . Note that the calculations of the binding free energies present an overestimate due to the difficulties of capturing sufficiently large dielectric for the large electrostatic effects. The experimental steady state kinetics were measured at 37 °C and pH 7.5. 4.3.2 The Three Mg Mechanism Does Not Presents a new transition state theory To explore the role of the 3 rd Mg, first the reaction profile with and without this ion were evaluated. (Fig.4-3) The calculated barrier with the 3 rd Mg accounts for the observed trend, whereas the observed barrier for the 2Mg system is not known (the experiment with 2Mg reports the appearance of the product, and this may be a product release rate determining step). Comparing Fig. 4-3-(a) and (c), shows the difference between the activation barrier between the cases with and without the 3 rd Mg. The result shows that the role of the 3rd Mg is more likely to help the exothermicity by 5 kcal/mol lower than that of the reaction without the Mg. It is still possible that the barrier is lowered by the 3 rd Mg when Fig 3-(a) is compared to 3-(b), although the simulation of 3-(b) might have needed more simulation time to reach a better convergence after the 3 rd Mg was taken out. k cat,obs (s -1 ) K M,obs (μM) Δg # obs (kcal/mol) ΔG bind obs (kcal/mol) Δg # calc (kcal/mol) ΔG bind calc (kcal/mol) k cat/K M,obs (s -1 μM -1 ) k cat/K M,calc (s -1 μM -1 ) f obs f calc dCTP/dG 1.33±0.05 1.3±0.2 17.39 0.07 23.29 -20.9 1.02 0.70 1 1 dATP/dG 0.10±0.01 92±23 18.94 1.17 25.57 -15.04 1.09E-03 1.0.E-06 0.001 1.09E-06 dCTP/oxoG 1.20±0.03 2.3±0.2 17.45 0.22 23.07 -22.3 0.52 12.54 1 1 dATP/oxoG 0.78±0.03 5.4±0.6 17.71 0.44 24.2 -19.98 0.14 0.02 0.28 0.0029 ` 57 Figure 4-3. The calculated reaction profile of the catalytic reaction by Pol η with and without the 3rd Mg 2+ . Two pol η structures were used: the recent structure reported with the 3rd Mg(PDB:5L9X 84 ) and the structure reported earlier without the 3rd Mg (PDB:4O3N 83 ). The profiles were also calculated on 5L9X in which the 3rd Mg 2+ is removed and on 4O3N in which a new Mg is inserted at the corresponding position. To further analyze the effect of the 3 rd magnesium ion at its different positions, five sets of EVB simulations were evaluated with distance constraints that used a 20 kcal/mol Å 2 force constant with 2.5, 3.0, 3.5, 4.0, and 4.5 Å constraints distances between the 3rd magnesium and the O2A of the incoming nucleotide. Note that the distance between these two atoms were naturally kept around 2.0 Å. The resulting EVB profiles (Fig. 4-3) define the landscape in the chemical direction in the constrained values but to obtain the complete landscape we need to evaluate the energetics along the direction of the displacement of the 3 rd Mg. Thus, we evaluated the potential of mean force (PMF) in that direction. This was done by imposing a 50 kcal/mol·Å 2 distance constraint between the 3 rd Mg and the O2A of the incoming nucleotide, sampling the harmonic energies in 41 10ps windows between the 2.0Å 2 and 6.0Å 2 distances. The PMF was then calculated by the weighted histogram analysis (WHAM) method. 96 ` 58 The complete landscape is presented in Fig. 4-4, where the larger exothermicity obtained with the 3 rd Mg was more visible when the landscape was drawn at different positions of the 3 rd Mg. While the overall barrier is not significantly different, except when the 3 rd Mg is at around 3 Å from the O2A of the incoming nucleotide, there is a larger free energy decrease in the final state. Figure 4-4. The landscape of the reaction of pol η as a function of the reaction path from the reactant to the product state and the distances between the 3rd Mg 2+ and the O2A of the incoming nucleotide As stated above the interesting study of Yang and coworkers 84, 97 identified a 3 rd Mg, which has been assumed to play a key role in the reaction of pol η. That is, the in crystallo experiment seems to show that in the absence of the 3 rd Mg (or other type of metal ions) the two canonic Mg does not appear in the product state for at least 1800s, whereas in the presence of the 3 rd Mg the product appeared in 40 s. This observation was interpreted in Refs. 84, 97 with an energy diagram that, despite being qualitatively unclear, seems to suggest that the first 2 Mg ions do not reduce the TS at all, whereas the 3 rd Mg leads to a major ground state (GS) destabilization (GSD ) where the free energy price of moving this Mg to its site is very large (see Fig. 4-5 ) and that this is the origin of the catalytic effect (now it takes less to move from the unstable GS to the TS) . It is also argued that the late appearance of the 3 rd ion contradicts somehow TST. ` 59 Figure 4-5. Free energy diagrams for the catalytic reaction of pol. The surface 1 (red) and 2 (blue) in (a) depicts the actual surfaces for the catalysis with 2 and 3 metal ions, respectively. ∆g # enz is the barrier that corresponds to kcat/KM , which is given by the difference between the TS and the reacting fragments in water. ∆g # pol is the difference between the TS free energy and the free energy at the bound RS. The surfaces in (b) corresponds to the proposal of Gao. et al 84 , where the reference energy in the reactant state is placed in a correct height, while the dotted line corresponds to the actual figure presented in Gao. et al 84 . The results here (which are given in Figs. 4-2 and 4-3 and then depicted schematically in Fig. 4- 5) correspond to small catalysis of the 3 rd Mg relative to the two Mg case. They also correspond to some catalysis by the TST by the 2 Mg. Thus the 3 rd Mg does not seem to correspond to the figure presented in Ref. 97 ( and its approximated depiction in our Fig. 4-5), where it almost completely reduces the activation barrier by what is actually ground state destabilization (although the ground state energy is undefined in that figure ). It seems that Ref. 97 assumed that TST which requires that the reacting atoms ( e.g. the 3 rd Mg ion or other parts ) will stay in the same region. This is reflected by the argument that “acquisition of a transient metal ion cofactor in the transition state might be a general feature that enables enzyme catalysis”, which seems to imply that the effect of 3 rd Mg should be considered only at the TS (rather than relative to its effect in the initial state). This point is further emphasized by the correct statement 97 that TST assumes the same chemical composition during the reaction, but then ` 60 suggests, incorrectly, that in the case of the 3 rd Mg we have a different chemical composition . However, in any consideration of a chemical reaction we must consider the complete system regardless of the changes in position (we must start with the 3 rd Mg). As to the motion of the 3 rd Mg ; the TST reflects the fact that any motion of the enzyme or a part of it (including metal ions ) can be a part of the general transition state stabilization in enzyme catalysis ( see the motion of the Ca ++ ion in Staphylococcal nuclease 98 ). In this case the correct activation free energy must be defined relative to a well-defined initial state. In fact the motion of the 3 rd Mg is formally similar to other motions of charged residues in the protein. With the above finding one might wonder why the product does not appear in at least 1800 s, with the two Mg system. The calculated results reflect a product release barrier that is reduced due to the 3 rd metal. Elucidating the mechanism of the postchemistry including pyrophosphate( PPi) leaving and its relation to the opening mechanism of DNA pol is an active area of research. 86 99- 103 Although this work does not try to determine the details of the mechanism of the postchemistry and the path for the PPi departure, it seems (considering the structural evidences of Ref. 97 ) that at least in the early steps of the PPi the leaving group moves with the binding Mg and also interacts with the 3 rd Mg . Thus, we evaluated a semiquantitative PMF (see Appendix 7.3) for pulling the PPi from the active site where the binding Mg moves with the leaving group. The PMF seems to indicate that the barrier for the departure of the leaving group is reduced by the interaction with the 3 rd Mg that probably helped in stabilizing the departure of the negatively charged PPi. Of course, establishing the correct energetics for the product release requires the evaluation of a very challenging PMF with possible alternative paths for the product release while considering the open and closed conformations. Here, more structural and biochemical studies will be very useful. However, we believe that the main role of the 3 rd Mg in reducing the barrier for the product release. It is useful to clarify here that we tend to believe that there is a barrier after the initial bond breaking step and thus increasing the exothermic helps in the product release. Of course, if there is a deep trap after the bond breaking step, this will not be the case. Here again, one needs careful and extensive studies in order to establish the actual situation. ` 61 4.4 Conclusion The work in this chapter explored the action of Pol η, examining both the fidelity and the 3 metal ions mechanism, reproducing the trend in the reduced fidelity towards 8-oxoG as well as the corresponding observed catalytic power. The ability to reproduce the correct observed barriers increases the confidence in the analysis of the 3 metal mechanism that is discussed below. The study of the pyrophosphorlysis reaction 86 raised important issues, but has not provided a fully informative free energy surfaces. That is, the profile with the 2Mg (that should have been the reverse of the original polymerization reaction) gave a major overestimate of the pyrophosphorlysis barrier, whereas the barrier for the polymerization reaction, that was reported in ref. 86 , practically disappeared (or perhaps was not captured with a leaving group distance of 1.7 Å). A reasonable barrier was obtained with MG 2+ as the 3 rd metal. It is essential to report the surface in at least the 2D space of the distances to leaving and attacking groups and to do it with careful PMFs calculations. At any rate, the observed behavior might as well reflect a product state features, where the departure of the leaving group is slowed down by the strong interaction between one of the Mg ions and the PPi, and where the presence of an additional metal helps in reducing this interaction and offering a lower barrier path. This study focused in part on the role of the 3 rd Mg and on the implication 97 that the catalytic effect of this ion contradicts TST. While the experimental finding of Ref. 97 is valuable and useful for studies in DNA replication, the results in this chapter indicate that the effect of the 3 rd Mg does not constitutes the presence of different chemical composition at the RS 97 and TS. Chemical compositions are always conserved in chemical reactions. It is always possible that the reaction coordinate is very complex and involves the motion of a part of the system from a large distance, but this element must be considered already in the reference ground state. In other words, the problem with the implications of Refs. 97 and 84 is that they do not include the state with the unbound 3 rd Mg in the initial system and thus do not consider the TS energy relative to that state . The argument that “acquisition of a transient metal ion cofactor in the transition state might be a general feature that enables enzyme catalysis” and that their finding revealed ` 62 “different compositions between the transition state and the reactant state”, overlooks the fact that any motion of the enzyme or a part of it (including metal ions) can be a part of the general transition state stabilization in enzyme catalysis. We also find it useful to comment on the idea of Ref. 84 who attributed to the formation of the C site (for the 3 rd Mg) and the binding of the 3 rd Mg to thermal motions. Of course, all uphill motions are thermal motions that follow the Boltzmann probability of being at different points of the free energy landscape. Thus, there is nothing special in the motions to the C site (relative to other motions) and all of those motions simply reflect the chance of reacting the corresponding points. A major part of this study is dedicated to clarification of fact that the effect of the 3 rd Mg does not violate TST. This clarification does not depend on the exact catalytic contribution of the 3 rd Mg. In this respect, we believe that this ion is important mainly in the product release state but more conclusive study would require one to obtain an ab initio QM/MM free energy surface with a full sampling. Such a study is very challenging and at present, the present approach probably provides a reasonable semiquantitative picture. ` 63 Chapter 5. Exploring the catalytic mechanism of Cas9 using information inferred from Endonuclease VII 5.1 Introduction and backgrounds Elucidating the nature of the gene editing mechanism of CRISPR-Cas9 is an important task in view of the importance of this breakthrough to the advance of human medicine. However, there have not been many computational studies on its catalysis mechanism yet due to the absence of a detailed structural information on the active form of Cas9. This chapter attempts to analyze the catalytic mechanism of Cas9 using the EVB with the information inferred from a similar system, T4 Endonuclease VII. Especially, in the lack of the detailed structures, the EVB is calibrated with Endonuclease VII to reproduce its known catalytic kinetics, then the similar active site configuration is tested on the Cas9 to propose the catalytically active form. The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas (CRISPR- associated genes) system has recently received a major attention for its remarkable applicability in gene editing. 104-105 Cas9, one of the CRISPR associated protein, is an RNA-guided DNA endonuclease enzyme which was originally found in the adaptive immunity system of Streptococcus pyogenes. 106 The editing action of Cas9 ( described schematically in Figure 1) starts where the native Cas9 endonuclease binds to a guide RNA composed of two RNAs, CRISPR RNA(crRNA) ) and a trans-activating RNA(tracrRNA) 107 , where the system has been simplified (relative to the Cas9 that uses in nature the separate crRNA and tracrRNA by replacing these two disparate RNAs with a single guide RNA (sgRNA). 106 After a conformational change that transfers the Cas9 complex with the guide RNA to its activated form, the complex searches for a target DNA with sequences that match conserved protospacer adjacent motif(PAM) sequence, which is responsible for binding of the target DNA to the Cas9- guid RNA complex. 108 The double stranded DNA(dsDNA) is melted when the Cas9 protein finds a target sequence in the PAM region and a RNA-DNA formation is hybridized. 109 At this stage, the well-conserved RuvC and HNH domains cleave the non-target and the target DNA strand, ` 64 respectively. There have been many structural studies that identified the conformational changes in different steps of the DNA binding 110 , the PAM recognition 109, 111-112 , the transformation of the cleavage domains to the precatalytic state 113-115 , and subsequent cleavage 116 . These structural studies suggested that conformational changes of the HNH domain play important roles in the activation process. In particular, after the dsDNA binding and R-loop formation, a significant conformational rearrangement of the HNH domain brings this region close to the cleavage site on the target strand. (Figure 5-1) 117 An early experiment using intramolecular Förster Resonance Energy Transfer(FRET) revealed that the conformational dynamics of the HNH domain plays an important role in controlling the cleavage efficiency. 115 However, this high flexibility of the HNH domain makes it difficult to determine the precise catalytically active structure and the exact mechanism of the catalysis is still elusive. The difference in the conformation and position of the HNH domain in the two available Cas9 structures (with an incomplete non-target DNA strand 109 and with both unwound DNA strands 117 ) demonstrated the plasticity of the HNH domain and its important interplay with the nucleic acids. However, even in the latter structures, in absence of divalent cations, the distance between the catalytic HIS840 and the scissile phosphate is ~10 Å, which is still too far to support the catalytic reaction. ` 65 Figure 5-1. The mechanism of the conformational change of Cas9 upon dsDNA binding and PAM recognition and cleavage. The structures of PAM interacting domain and the guide are adjusted for PAM recognition, followed by DNA base pairs melting and DNA-RNA hybridization. Then, the HNH domain undergoes rearrangement to bring catalytic residues to the cleavage site on the targeted DNA strand (phosphate at position -3). There have been several in sillico attempts of using extensive molecular dynamic simulations in exploring possible active conformation 118-120 . Furthermore, a recent cryo-EM structure, solved at a 5.2 Å resolution, was used in MD simulation that tried to determine the closest position of the HNH domain to the cleavage site. 114 However, the relevance and validity of those structures cannot be established without calculating the free energy of the chemical reaction that would occur in the corresponding structures. At this stage, in absence of high resolution structure of a catalytically active state, it may be still too early to try to examine in a conclusive way the free energy landscape of the catalytic process. However, in view of the importance of the understanding of the Cas9 action, I propose to pursue in this direction by modeling the catalysis with several structural hypotheses. In particular, my ` 66 previous computational studies for polymerases and Warshel and coworker’s studies on endonucleases provide a powerful benchmark as all of these systems share very similar chemistry. 5.2 Inferring the catalytic conformation from other nucleases In view of the uncertainties about the catalytic structure of Cas9, it is important to exploit relevant information from other related systems. In this respect, I noticed that the HNH domain shares structural similarity with other nucleases such as Staphylococcal nuclease 121 , and especially with phage T4 endonuclease VII 122 (Endo VII, rmsd of 2.7Å for 61 equivalent Cα atoms) and Vibrio vulnificus nuclease 123 (rmsd of 2.7Å for 77 Cα atoms). While the catalytic structure at high resolution for Cas9 has not been reported yet, most of other nucleases including the ones mentioned above are known to have one metal ion mechanism(Figure 5-2) and have similar conformations for the catalytic state, which includes a histidine as a general base, aspartic acid or/and asparagine to hold a divalent cation, and a positively charged residues nearby. One major difference between these nucleases and the currently assumed catalytic state of Cas9 (from Cryo-EM structure (PDB:5Y36) 114 and those suggested by molecular dynamic simulations 119-120 ), is a positively charged residue on the other side of the Mg 2+ ion and next to the O3’ of the scissile phosphate. For example, there are ARG87 in Staphylococcal nuclease (Figure5-2-a), His43 in Endo VII(Figure5-2-b), and ARG99 in Endonuclease Vvn (Figure 5-2-c). In the case of Endo VII, the H43T mutant has shown a very low activity. 124 Therefore, it seems that the presence of a positive charge has a significant role for the common catalysis. ` 67 Figure 5-2. The common positive residues (circled) at the catalytic site in (a) Staphylococcal nuclease, (b) T4 Endonuclease VII and (c) Endonuclease Vvn Either the Cryo-EM structure or other FRET experiment for Cas9 has not shown any corresponding positive residue nearby. However, the HNH domain is known to have major conformational changes right before cleavage event 113, 115, 125 . Looking for possible positive residues, I found only two positive residues (LYS848 and LYS855) within 10 Å from the scissile phosphate. Interestingly, these two residues were mutated to alanine to improve the specificity of Cas9 (eSpCas9 1.1 126 ) and are widely used to avoid off-target cleavages. Although this mutant still has cleavage activity unlike Endo VII, it was reported that it has slower cleavage rates. 125 Thus, this chapter explores the possible catalytic role for these two residues. This will be done considering several structural options with the hope to get an insight about the activation process. 5.3 EVB calculation setups The calculations of the activation barriers were performed using the empirical valence-bond (EVB) method 14-15, 127 that has been used in the previous chapters for DNA/RNA polymerases. The polymerases and nucleases share the similar catalytic mechanism except that the many nucleases including the HNH domain of Cas9 require only one metal ion instead of two. All the calculations ` 68 in this study used the MOLARIS-XG package. 89-90 The reacting region and the Cas9-RNA-DNA complex for the EVB calculations were immersed in 30 Å sphere of water molecules using the surface-constraint all-atom solvent (SCAAS) type boundary condition. 89 Unlike the 18 Å sphere used in the previous chapters, a larger sphere needed to be used to consider the plasticity and conformational change of the structure. This system was further surrounded by 2 Å surface of Langevin dipoles and then a bulk continuum. The local reaction field (LRF) method 89 was used to treat the long range effects. All other atoms beyond this sphere were fixed at their initial positions in the Cryo-EM structure and the electrostatic interaction from outside of the sphere were turned off. The center of the simulation sphere was set to the geometric center of the EVB reacting atoms with the same parameters defined in the previous study of Staphylococcal nuclease 121 . The protonation states were determined by calculating the pKas with our coarse grained (CG) model 128 for the residue within 12Å from the center of the simulation with macroscopic large charge dielectric for the effect of ionizable residues beyond this range. The equilibration of the structures was done in the same way as the previous chapters by increasing the temperature from 10K to 300K for the first 200 ps with 100 kcal/mol Å 2 positional constraint on each atom. The constraint was gradually released over the next 100 ps. The system was then relaxed for 10 ns, followed by generating 5 different starting structures with 40 ps of consecutive relaxation from the previous structures. The average of each free energy calculation from the 5 runs with each of these starting structures is reported as the results. The EVB free energy calculations used the Free energy perturbation/Umbrella sampling approach (FEP/US) 127 with 51 frames of 10 ps each and 1fs time step. Note that the starting structures were relaxed with histidine protonated at both δ and ε position) and a hydroxide ion. In the first calculation, the histidine transferred a proton to the hydroxide, then, the opposite sign was applied to get the free energy of the protonation on the histidine. In the second calculation starting from the same relaxed structures as the one used for the proton transfer, the EVB potential was changed from the reactant to the intermediate potential. The automatic calibration was used by extracting the EVB region at each 10 fs of the simulation with the scissile phosphate converted to dimethyl phosphate and then solvated for 2.5 ps while the EVB reacting atoms were fixed. This simulation was used to calibrate the gas phase shift and the off diagonal term of the EVB Hamiltonian. ` 69 The stepwise associative mechanism was considered, as the previous study 129-132 proved that it is more practical to explore the TS free energy with the well-defined penta-covalent intermediate state between the nucleophilic attack and the bond breaking steps while it gives similar result to that of the concerted mechanism. Since the focus of this study is on the overall cleavage barrier, the calibration step for the barrier of the protonation step and the EVB calculation of the bond breaking steps were omitted. 5.4 Results and discussion 5.4.1 EVB simulations of the reaction of T4 Endonuclease VII Before exploring the reaction in Cas9, I explored the same reaction for T4 Endonucleases VII (Endo VII). The study of Endo VII was based on trying to reproduce the observed kinetics, considered the mechanism depicted in Figure 5-3 and using the same EVB parameters from the previous study for Staphylococcal nuclease 98 for the one metal mechanism. Note, however, that unlike Staphylococcal nuclease that uses GLU43 as a base, Endo VII and Cas9 use histidine to abstract proton from a water molecule. The proton was transferred to the δ nitrogen of H41 and the partial charges and vdW parameters of HIP and HIE for the EVB were taken from MOLARIS’ force field. The gas phase shift was calibrated according to the free energy of protonation converted from pKa of histidine and water. ` 70 Figure 5-3. One metal ion mechanism, used in the calculation. A water molecule deprotonated by Nδ of histidine attacks the scissile phosphate between two nucleotides. The crystal structure of H43A (PDB:2QNC), which is catalytically inactive, was used. For the wild type, H43A was mutated back to a histidine. In the original structure, the position of the Mg 2+ ion is coordinated between OP1 and O3’ of the scissile phosphate. (Figure 5-4-a) Although the EVB results for nucleophilic attack by OH - were reasonable, using this coordination of the Mg 2+ , there was a very large free energy cost (~15 kcal/mol) for the proton transfer step, because the distance between OH - and the Mg 2+ was too far to benefit the negative charge on the oxygen as it is deprotonated. This effect of the magnesium on the proton transfer step has been already observed in many of the previous studies of DNA polymerases including the studies in the previous chapter. 129, 133-136 Thus, the structure was equilibrated with a constraint that positioned the Mg 2+ ion in a way that favored the proton transfer step as well as the nucleophilic attack step. (Figure 5-4-b) Interestingly, when Cas9 structure was relaxed with HIS840 brought close to the ` 71 active site, the Mg 2+ ion was also positioned between OP1 of the scissile phosphate and the attacking OH - . Note that although the calculations considered the OH - in the first coordination shell of the Mg 2+ ion, it is also possible that this ion resides in the first or second shell. This is an important challenging open question, but it is assumed that a calibration on Endo VII will provide a reasonable result even without resolving this question. Figure 5-4. The different position of the Mg 2+ ion in (a) the original crystal structure of Endo VII and (b) after adjusting it to favor the proton transfer step as well as the nucleophilic attack step. ` 72 The above generated structure was equilibrated slowly and relaxed for 1ns and the EVB was then calculated. (Figure 5-5 and Table 5-1). The observed kcat was nicely reproduced for the wild type, while H43A showed very high free energy barrier. Figure 5-5. Calculated free energy profiles for the reference reaction in solution and the wild type and H43N of Endo VII. ` 73 ref Wild Type H43N obs dG p t d 𝑔𝑔 ϯ, NA dg ϯ dG p t d 𝑔𝑔 ϯ, NA dg ϯ dG p t d 𝑔𝑔 ϯ, NA dg ϯ dg ϯ 10.93 33 43.93 2.746 20.396 23.024 17.47 26.05 41.12 21.48 Table 5-1 . Calculated free energy profile for the catalytic reaction different systems. Free energies(kcal/mol) of the proton transfer(PT) step, the nucleophilic attack(NA) step and the total barrier in the reference solution, the wild type and the H43N mutant of Endo VII as well as the observed barrier converted from kcat from ref 137 5.4.2 EVB calculations of the reaction of Cas9 After reproducing the kinetics of Endo VII, the same structural preparation and settings were used in calculating the same reaction for Cas9. In choosing the structure, I used the recent Cryo- EM structure (PDB:5Y36), which was recently reported as the closest conformation to the catalytic state. However, this structure was still far from the actual catalytic structure, where the position of the histidine base (which was mutated to alanine to inactivate the enzyme) it is located ~8 Å away from the scissile phosphate. Moreover, ASN863, that corresponds to ASN62 in Endo VII, which is clearly a catalytic residue (since its mutation to alanine also causes inactivity) is ~10 Å away from the catalytic site. Thus I applied distance constraints with 15kcal/mol · Å 2 force constant, at 2.0 Å between OD1 of ASN863 and Mg 2+ ,and at 5.0 Å between CG of HIS840 and the phosphorus of the scissile phosphate, and the structures was relaxed for 10ns. (Figure 5-6) ` 74 Figure 5-6. (a) The Cryo-EM structure of Cas9. (PDB: 5Y36 114 ) where the catalytic H840 was mutated to alanine to stop the reaction. The distance between the scissile phosphate and the Mg 2+ is 4.8 Å and the N863 faces away from the catalytic site. (b) Adjusted Cas9 structure that brings HIS840(converted from alanine) and ASN863 close to the corresponding positions in the catalytic site of Endo VII with 3.3 Å distance between the scissile phosphate and the Mg 2+ . In exploring the possible role of a positive residue near the catalytic site, I identified two potential residues (LYS848 and LYS855) within 10Å from this site. In addition I identified K810 that is, however, in a distance of ~15Å ( the possible involvement of this residue in discussed in the Conclusions section). While the side chain of LYS848 is highly flexible and can be moved towards the catalytic site, LYS855 couldn’t be brought to the site even with a distance constraint with a 50 kcal/mol · Å 2 force constant, due to the presence of other residues on the way. Thus we took the structure prepared above and imposed a distance constraint with 10 kcal/mol · Å 2 force constant at 2.0 Å between NZ of LYS848 and O3’ of the scissile phosphate , and pull the lysine next to the phosphate during 200ps relaxation.(Figure 5-7) ` 75 Figure 5-7. Pulling K848 from its initial position (colored blue) to the neighborhood of the scissile phosphate (colored red). The following three cases were explored (a) LYS848 was kept far from the scissile phosphate, as in its the original structure, (b) LYS848 was brought near to the scissile phosphate, but was not ionized, and (c) the ionized LYS848 was brought near the scissile phosphate. (Figure 5-8 and Table 5-2). The EVB barrier was compared to the barrier obtained from a recent study, where the intrinsic rate of cleavage by the HNH domain was determined to be 0.67s -1 . 138 The calculated barrier was found to be higher when LYS848 was far away (both the PT step and nucleophilic attack step) than when it was near the catalytic site. When we ionized the lysine, the barrier was reduced by about 5 kcal/mol relative to the case when it was not ionized, and most of the reduction was at the nucleophilic attack step. ` 76 Figure 5-8. The EVB reaction profiles in the cases when K848 is far from the active site, and when it is near with/without being ionized. Table 5-2. Calculated free energy profile for the catalytic reaction of Cas9 different systems. Calculated free energies(kcal/mol) of proton transfer(PT) step, nucleophilic attack(NA) step and total barrier in the reference solution and the three cases for Cas9. The observed barrier was converted from the kcat of ref 138 ` 77 5.4.3 EVB calculation for Cas9 with mismatches around the cleavage site In view of the importance of controlling off target cleavage, I tried to explore the effect of mismatches around the cleavage site and the possible coupling to K848. The experiments with the engineered Cas9 and mismatches show both the wild type and eCas9 are sensitive to mismatches around the cleavage site and show almost no cleavage. 126, 138 In the simulations, 5’- GAA-3’ bases at 4,5,6 upstream positions of the PAM region in the target strand was converted to 5’-AGG3’ bases and relaxed the system for 1ns. This was followed by the same EVB calculations as before. Apparently, the mismatches did not change the barrier of the wild type significantly while K848A shows a high barrier. (Figure 5-9) In fact, this result is somewhat similar to the previous section where the positively charged residue determines the difference in the catalytic efficiency. However, note that the catalytic effect obtained here with K848 near the reactive center is highly to occur since the conformation with the mismatches is not likely to allow for K848 to reach the reacting bonds , since the needed unwinding is unlikely to be allowed as implied in ref 138 . It is also possible that in this case we have incompetent DNA binding as the 4 consecutive PAM proximal mismatches showed almost no binding. 138 We look forward to having crystal structures with mismatches that would allow us to explore better the corresponding effect on catalysis, but we suspect that the hindered unwinding might block the conformational change which accompany a positively charged residue near the cleavage site. ` 78 Figure 5-9. The EVB reaction profiles for Cas9 with and without 3 consecutive mismatch base parings from 4 nucleotides upstream of PAM. The effect of the mutation of K848 to alanine is also calculated. Note that the low barrier case is unlikely to occur, since the unwinding conformational change that leads to the K848 movement is unlikely to be allowed (see text) 5.5 Conclusion Understanding the gene editing mechanism of CRISPR is a challenge of a major importance in view of the crucial importance of gene therapy to medical advances. In this respect it is crucial to understand the catalytic mechanism of Cas9 and its role in confirming accurate editing. This work attempted to explore the energetics and mechanism of the chemical step in the activation of Cas9. Considering the absence of direct high resolution structural information of the catalytic configuration of Cas9, the strategy was to study the closely related catalytic reaction of endonuclease VII. The EVB parameter obtained from the Endo VII study were then used in the simulations of Cas9. It was found that the most likely catalytic configuration involves a structure ` 79 where a Lys residue comes close to the scissile phosphate. Obviously, this interesting conclusion needs more direct confirmations. It is useful to further address the possible correlation of our cleavage mechanism with recent experimental observations. 138 This involves :1) the engineered Cas9, namely eCas9, (K848A, K1003A, R1060A) has slower cleavage rate than the WT. 2) the DNA unwinding takes place followed by the movement of HNH domain toward the catalytic site. 3) there is a strong correlation between the DNA unwinding and the cleavage rate as the cleavage reaction proceeds more often in unwound state. 4) the DNA unwinding is hindered by the mutation of positive residues in eCas9 and the PAM-distal DNA-RNA base mismatches. 5) the impact of PAM-distal mismatches is greater for eCas9. Thus, they suggested that the enhancement in the cleavage specificity of eCas9 is largely due to destabilization of the unwound states by mutating the positive residues. While the results of the present work cannot be directly compared to the results obtained with eCas9, since we only mutated K848 and provide a clear mechanism of specificity, our results correspond to the same slower reaction rate in absence of K848, suggesting a possible presence of a positive residue such as K848 near the cleavage site. In fact, Figure 4-d of ref 138 shows a nearly constant difference in the cleavage rate between the WT and eCas9 above 50% of unwound fraction. While different dynamics and DNA unwound fractions between the WT and eCas9 seem to be correlated with the specificity, it also seems that there is a constant factor that makes a difference at the cleavage stage regardless of the mismatches. Therefore, after the DNA unwinding, I hypothesize that the following conformational change of the HNH domain may position one of the positive residues mutated in eCas9 around the catalytic site. Interestingly, a recent study with long timescale MD simulation 118 showed that K810 ( another lysine known to reduce off-target cleavage 126 ), is located around the phosphate -4 after 10μs simulation time. This observation further supports the hypothesis here although it is not clear whether the effect found with K848 will be obtained with K810. In fact, two different versions of eCas9 in the original proposal by Slaymaker et al 126 , one with K848A and the other with K810A, both showed reduced off-target efficiencies. At this point it might be useful to discuss the frequent implications 115, 139-142 that the activation of Cas9 is control by dynamical processes and or conformational control. First, with regards to ` 80 the dynamical idea; as we argued frequently (e.g. ref 143-144 ), overcoming activation barrier obeys in most case the Boltzmann law, and the corresponding rate constants can be described by transition state theory ( sometimes with a transmission factor correction ). With regards to the conformational control; It is possible that the conformational barrier is slightly higher than the chemical barrier ( difference in rate of a factor of around 3 ( ref 139 )), although another recent experiment 138 showed slower cleavage rate. However, as clarified in the discussion of the fidelity of DNA polymerase 145-147 the modulation by conformational changes only occurs after evolution found a way to reduce the very high chemical activation barrier. Thus, it is unlikely that the conformational barrier controls the editing process. This preliminary exploration of the effect of the effect of base changes did not involve the examination of the corresponding effect of the unwinding changes, which are correlated with the catalytic activity. Combining studies of the melting dependence on the base pairing with EVB studies may provide a tool for refining the editing process. Of course, such a study will involve significant convergence problems, but we believe that advances in structural studies will help to make such studies reasonably reliable. ` 81 References 1. Warshel, A.; Levitt, M., Theoretical Studies of Enzymic Reactions - Dielectric, Electrostatic and Steric Stabilization of Carbonium-Ion in Reaction of Lysozyme. Journal of Molecular Biology 1976, 103 (2), 227-249. 2. Cui, Q.; Elstner, M.; Kaxiras, E.; Frauenheim, T.; Karplus, M., A QM/MM implementation of the self-consistent charge density functional tight binding (SCC-DFTB) method. Journal of Physical Chemistry B 2001, 105 (2), 569-585. 3. Field, M. J., Simulating enzyme reactions: Challenges and perspectives. Journal of Computational Chemistry 2002, 23 (1), 48-58. 4. Field, M. J.; Bash, P. A.; Karplus, M., A Combined Quantum-Mechanical and Molecular Mechanical Potential for Molecular-Dynamics Simulations. Journal of Computational Chemistry 1990, 11 (6), 700-733. 5. Friesner, R. A.; Beachy, M. D., Quantum mechanical calculations on biological systems. Current Opinion in Structural Biology 1998, 8 (2), 257-262. 6. Gao, J. L., Hybrid quantum and molecular mechanical simulations: An alternative avenue to solvent effects in organic chemistry. Accounts of Chemical Research 1996, 29 (6), 298-305. 7. Gao, J. L.; Freindorf, M., Hybrid ab initio QM/MM simulation of N-methylacetamide in aqueous solution. Journal of Physical Chemistry A 1997, 101 (17), 3182-3188. 8. Marti, S.; Andres, J.; Moliner, V.; Silla, E.; Tunon, I.; Bertran, J., Transition structure selectivity in enzyme catalysis: a QM/MM study of chorismate mutase. Theoretical Chemistry Accounts 2001, 105 (3), 207-212. 9. Monard, G.; Merz, K. M., Combined quantum mechanical/molecular mechanical methodologies applied to biomolecular systems. Accounts of Chemical Research 1999, 32 (10), 904-911. 10. Muller, R. P.; Warshel, A., Ab-Initio Calculations of Free-Energy Barriers for Chemical-Reactions in Solution. Journal of Physical Chemistry 1995, 99 (49), 17516-17524. 11. Singh, U. C.; Kollman, P. A., A Combined Abinitio Quantum-Mechanical and Molecular Mechanical Method for Carrying out Simulations on Complex Molecular-Systems - Applications to the Ch3cl + Cl- Exchange-Reaction and Gas-Phase Protonation of Polyethers. Journal of Computational Chemistry 1986, 7 (6), 718-730. 12. Aqvist, J.; Fothergill, M., Computer simulation of the triosephosphate isomerase catalyzed reaction. J Biol Chem 1996, 271 (17), 10010-10016. 13. Aqvist, J.; Warshel, A., Computer-Simulation of the Initial Proton-Transfer Step in Human Carbonic Anhydrase-I. J Mol Biol 1992, 224 (1), 7-14. 14. Warshel, A., Computer modeling of chemical reactions in enzymes and solutions. Wiley: New York, 1991; p xiv, 236 p. 15. Warshel, A.; Weiss, R. M., AN EMPIRICAL VALENCE BOND APPROACH FOR COMPARING REACTIONS IN SOLUTIONS AND IN ENZYMES. Journal of the American Chemical Society 1980, 102 (20), 6218-6226. 16. Valero, R.; Song, L.; Gao, J.; Truhlar, D. G., Perspective on Diabatic Models of Chemical Reactivity as Illustrated by the Gas-Phase S(N)2 Reaction of Acetate Ion with 1,2-Dichloroethane. J Chem Theory Comput 2009, 5 (1), 1-22. 17. Kamerlin, S. C. L.; Cao, J.; Rosta, E.; Warshel, A., On Unjustifiably Misrepresenting the EVB Approach While Simultaneously Adopting It. Journal of Physical Chemistry B 2009, 113 (31), 10905- 10915. ` 82 18. Hong, G. Y.; Rosta, E.; Warshel, A., Using the constrained DFT approach in generating diabatic surfaces and off diagonal empirical valence bond terms for modeling reactions in condensed phases. Journal of Physical Chemistry B 2006, 110 (39), 19570-19574. 19. Xia, S. L.; Wang, J. M.; Konigsberg, W. H., DNA Mismatch Synthesis Complexes Provide Insights into Base Selectivity of a B Family DNA Polymerase. J Am Chem Soc 2013, 135 (1), 193-202. 20. Prasad, B. R.; Warshel, A., Prechemistry versus preorganization in DNA replication fidelity. Proteins 2011, 79 (10), 2900-2919. 21. Warshel, A.; Sharma, P. K.; Kato, M.; Xiang, Y.; Liu, H. B.; Olsson, M. H. M., Electrostatic basis for enzyme catalysis. Chemical Reviews 2006, 106 (8), 3210-3235. 22. Villa, J.; Strajbl, M.; Glennon, T. M.; Sham, Y. Y.; Chu, Z. T.; Warshel, A., How important are entropic contributions to enzyme catalysis? Proceedings of the National Academy of Sciences of the United States of America 2000, 97 (22), 11899-11904. 23. Strajbl, M.; Sham, Y. Y.; Villa, J.; Chu, Z. T.; Warshel, A., Calculations of activation entropies of chemical reactions in solution. J. Phys. Chem. 2000, 104, 4578 - 4584. 24. Ball, P., Water as an Active Constituent in Cell Biology. Chemical Reviews 2007, 108 (1), 74-108. 25. Warshel, A., Computer Modeling of Chemical Reactions in Enzymes and Solutions. John Wiley & Sons: New York, 1991. 26. Kato, M.; Braun-Sand, S.; Warshel, A., Chapter 15 Challenges and Progresses in Calculations of Binding Free Energies - What Does it Take to Quantify Electrostatic Contributions to Protein-Ligand Interactions? In Computational and Structural Approaches to Drug Discovery: Ligand-Protein Interactions, The Royal Society of Chemistry: 2008. 27. Michel, J.; Tirado-Rives, J.; Jorgensen, W. L., Prediction of the Water Content in Protein Binding Sites. The Journal of Physical Chemistry B 2009, 113 (40), 13337-13346. 28. Young, T.; Abel, R.; Kim, B.; Berne, B. J.; Friesner, R. A., Motifs for molecular recognition exploiting hydrophobic enclosure in protein–ligand binding. Proceedings of the National Academy of Sciences 2007, 104 (3), 808-813. 29. Helms, V.; Wade, R. C., Thermodynamics of water mediating protein-ligand interactions in cytochrome P450cam: a molecular dynamics study. Biophysical Journal 1995, 69 (3), 810-824. 30. Stephens, P. J.; Jollie, D. R.; Warshel, A., Protein Control of Redox Potentials of Iron-Sulfur Proteins. Chem. Rev. 1996, 96, 2491. 31. Morais-Cabral, J. H.; Zhou, Y.; MacKinnon, R., Energetic optimization of ion conduction rate by the K+ selectivity filter. Nature 2001, 414 (6859), 37-42. 32. Pisliakov, A. V.; Sharma, P. K.; Chu, Z. T.; Haranczyk, M.; Warshel, A., Electrostatic basis for the unidirectionality of the primary proton transfer in cytochrome c oxidase. Proc Natl Acad Sci U S A 2008, 105 (22), 7726-31. 33. Lee, H. J.; Svahn, E.; Swanson, J. M. J.; Lepp, H.; Voth, G. A.; Brzezinski, P.; Gennis, R. B., Intricate Role of Water in Proton Transport through Cytochrome c Oxidase. Journal of the American Chemical Society 2010, 132 (45), 16225-16239. 34. Dwyer, J. J.; Gittis, A. G.; Karp, D. A.; Lattman, E. E.; Spencer, D. S.; Stites, W. E.; García-Moreno E, B., High Apparent Dielectric Constants in the Interior of a Protein Reflect Water Penetration. Biophysical Journal 2000, 79 (3), 1610-1620. 35. Fitch, C. A.; Karp, D. A.; Lee, K. K.; Stites, W. E.; Lattman, E. E.; García-Moreno, E. B., Experimental pKa Values of Buried Residues: Analysis with Continuum Methods and Role of Water Penetration. Biophysical Journal 2002, 82 (6), 3289-3304. 36. Warshel, A.; Levitt, M., Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 1976, 103, 227-249. ` 83 37. Warshel, A.; Russel, S. T., Calculations of electrostatic interactions in biological systems and in solutions. Q. Rev. Biophys. 1984, 17, 283-422. 38. Warshel, A.; Sharma, P. K.; Kato, M.; Xiang, Y.; Liu, H.; Olsson, M. H., Electrostatic basis for enzyme catalysis. Chemical reviews 2006, 106 (8), 3210-3235. 39. Warshel, A.; Sussman, F.; King, G., Free energy of charges in solvated proteins: Microscopic calculations using a reversible charging process. Biochemistry 1986, 25, 8368-8372. 40. Warshel, A.; King, G., Polarization Constraints in Molecular Dynamics Simulation of Aqueous Solutions: The Surface Constraint All Atom Solvent (SCAAS) Model. Chem. Phys. Lett. 1985, 121, 124-129. 41. King, G.; Warshel, A., A surface constrained all-atom solvent model for effective simulations of polar solutions. The Journal of Chemical Physics 1989, 91 (6), 3647-3661. 42. Warshel, A., Calculations of chemical processes in solutions. J. Phys. Chem. 1979, 83, 1640-1650. 43. Brooks III, C. L.; Karplus, M., Deformable stochastic boundaries in molecular dynamics. J. Chem. Phys. 1983, 79, 6312-6325. 44. Im, W.; Berneche, S.; Roux, B., Generalized solvent boundary potential for computer simulations. The Journal of Chemical Physics 2001, 114 (7), 2924-2937. 45. Ghosh, N.; Prat-Resina, X.; Gunner, M. R.; Cui, Q., Microscopic pKa Analysis of Glu286 in Cytochrome c Oxidase (Rhodobacter sphaeroides): Toward a Calibrated Molecular Model. Biochemistry 2009, 48 (11), 2468-2485. 46. Warshel, A.; Sharma, P. K.; Kato, M.; Parson, W. W., Modeling electrostatic effects in proteins. Bba-Proteins Proteom 2006, 1764 (11), 1647-1676. 47. Karp, D. A.; Gittis, A. G.; Stahley, M. R.; Fitch, C. A.; Stites, W. E.; García-Moreno E, B., High Apparent Dielectric Constant Inside a Protein Reflects Structural Reorganization Coupled to the Ionization of an Internal Asp. Biophysical Journal 2007, 92 (6), 2041-2053. 48. Kato, M.; Warshel, A., Using a charging coordinate in studies of ionization induced partial unfolding. J. Phys. Chem. B 2006, 110 (23), 11566-11570. 49. Chakrabarty, S.; Warshel, A., Capturing the energetics of water insertion in biological systems: the water flooding approach. Proteins 2013, 81 (1), 93-106. 50. Ross, G. A.; Bodnarchuk, M. S.; Essex, J. W., Water Sites, Networks, And Free Energies with Grand Canonical Monte Carlo. Journal of the American Chemical Society 2015, 137 (47), 14930-14943. 51. Bodnarchuk, M. S.; Viner, R.; Michel, J.; Essex, J. W., Strategies to Calculate Water Binding Free Energies in Protein-Ligand Complexes. J Chem Inf Model 2014, 54 (6), 1623-1633. 52. Allen, M. P.; Tildesley, D. J., Computer simulation of liquids. Oxford University Press: Oxford, 1989. 53. Frenkel, D.; Smit, B., Understanding molecular simulation: from algorithms to applications. Academic Press: 2002. 54. Resat, H.; Mezei, M., Grand Canonical Monte Carlo Simulation of Water Positions in Crystal Hydrates. Journal of the American Chemical Society 1994, 116 (16), 7451-7452. 55. Lynch, G. C.; Pettitt, B. M., Grand canonical ensemble molecular dynamics simulations: Reformulation of extended system dynamics approaches. The Journal of Chemical Physics 1997, 107 (20), 8594-8610. 56. Woo, H.-J.; Dinner, A. R.; Roux, B., Grand canonical Monte Carlo simulations of water in protein environments. The Journal of Chemical Physics 2004, 121 (13), 6392-6400. 57. Yoon, H.; Warshel, A., The control of the discrimination between dNTP and rNTP in DNA and RNA polymerase. Proteins-Structure Function and Bioinformatics 2016, 84 (11), 1616-1624. 58. Shelley, J. C.; Patey, G. N., A Configuration Bias Monte-Carlo Method for Water. J Chem Phys 1995, 102 (19), 7656-7663. ` 84 59. Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E., Equation of State Calculations by Fast Computing Machines. J Chem Phys 1953, 21 (6), 1087-1092. 60. Warshel, A., Chu Z. T.; Villa, J.; Strajbl, M.; Schutz, C. N.; Shurki, A.; Vicatos, S.; Plotnikov, N. V.; Schopf, P. , Molaris-XG, v 9.15. Los Angeles: University of Southern California 2012. 2012. 61. Chakrabarty, S.; Warshel, A., Capturing the energetics of water insertion in biological systems: The water flooding approach. Proteins: Structure, Function, and Bioinformatics 2013, 81 (1), 93-106. 62. Lamarre, D.; Anderson, P. C.; Bailey, M.; Beaulieu, P.; Bolger, G.; Bonneau, P.; Bös, M.; Cameron, D. R.; Cartier, M.; Cordingley, M. G., An NS3 protease inhibitor with antiviral effects in humans infected with hepatitis C virus. Nature 2003, 426 (6963), 186-189. 63. Åqvist, J.; Medina, C.; Samuelsson, J.-E., A new method for predicting binding affinity in computer-aided drug design. Protein engineering 1994, 7 (3), 385-391. 64. Sham, Y. Y.; Chu, Z. T.; Tao, H.; Warshel, A., Examining methods for calculations of binding free energies: LRA, LIE, PDLD-LRA, and PDLD/S-LRA calculations of ligands binding to an HIV protease. Proteins: Structure, Function, and Bioinformatics 2000, 39 (4), 393-407. 65. Lee, F. S.; Chu, Z. T.; Warshel, A., Microscopic and Semimicroscopic Calculations of Electrostatic Energies in Proteins by the POLARIS and ENZYMIX Programs. J. Comp. Chem. 1993, 14, 161-185. 66. Olsson, M. H. M.; Sharma, P. K.; Warshel, A., Simulating redox coupled proton transfer in cytochrome c oxidase: Looking for the proton bottleneck. Febs Lett 2005, 579 (10), 2026-2034. 67. Kato, M.; Pisliakov, A. V.; Warshel, A., The barrier for proton transport in aquaporins as a challenge for electrostatic models: The role of protein relaxation in mutational calculations. Proteins: Struct. Funct. Bioinf. 2006, 64 (4), 829-844. 68. Wlodawer, A.; Walter, J.; Huber, R.; Sjolin, L., Structure of Bovine Pancreatic Trypsin-Inhibitor - Results of Joint Neutron and X-Ray Refinement of Crystal Form-Ii. J Mol Biol 1984, 180 (2), 301-329. 69. Braun-Sand, S.; Sharma, P. K.; Chu, Z. T.; Pisliakov, A. V.; Warshel, A., The energetics of the primary proton transfer in bacteriorhodopsin revisited: It is a sequential light-induced charge separation after all. Bba-Bioenergetics 2008, 1777 (5), 441-452. 70. Abel, R.; Young, T.; Farid, R.; Berne, B. J.; Friesner, R. A., Role of the Active-Site Solvent in the Thermodynamics of Factor Xa Ligand Binding. Journal of the American Chemical Society 2008, 130 (9), 2817-2831. 71. Nguyen, C. N.; Kurtzman, T.; Gilson, M. K., Spatial Decomposition of Translational Water-Water Correlation Entropy in Binding Pockets. J Chem Theory Comput 2016, 12 (1), 414-429. 72. Singh, N.; Warshel, A., A comprehensive examination of the contributions to the binding entropy of protein–ligand complexes. Proteins: Structure, Function, and Bioinformatics 2010, 78 (7), 1724-1735. 73. Yang, G. W.; Franklin, M.; Li, J.; Lin, T. C.; Konigsberg, W., A conserved Tyr residue is required for sugar selectivity in a pol alpha DNA polymerase. Biochemistry 2002, 41 (32), 10256-10261. 74. Sousa, R.; Padilla, R., Mutant T7 Rna-Polymerase as a DNA-Polymerase. Embo J 1995, 14 (18), 4609-4621. 75. Singh, N.; Frushicheva, M. P.; Warshel, A., Validating the vitality strategy for fighting drug resistance. Proteins 2012, 80 (4), 1110-1122. 76. Yin, Y. W.; Steitz, T. A., The structural mechanism of translocation and helicase activity in T7 RNA polymerase. Cell 2004, 116 (3), 393-404. 77. Steitz, T. A., DNA polymerases: Structural diversity and common mechanisms. J Biol Chem 1999, 274 (25), 17395-17398. 78. Zhang, Y. B.; Yuan, F. H.; Wu, X. H.; Rechkoblit, O.; Taylor, J. S.; Geacintov, N. E.; Wang, Z. G., Error-prone lesion bypass by human DNA polymerase eta. Nucleic Acids Res 2000, 28 (23), 4717-4724. 79. McCulloch, S. D.; Kunkel, T. A., The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res 2008, 18 (1), 148-161. ` 85 80. Cruet-Hennequart, S.; Gallagher, K.; Sokol, A. M.; Villalan, S.; Prendergast, A. M.; Carty, M. P., DNA Polymerase eta, a Key Protein in Translesion Synthesis in Human Cells. Subcell Biochem 2010, 50, 189-209. 81. Yang, W., An Overview of Y-Family DNA Polymerases and a Case Study of Human DNA Polymerase eta. Biochemistry-Us 2014, 53 (17), 2793-2803. 82. Sale, J. E.; Lehmann, A. R.; Woodgate, R., Y-family DNA polymerases and their role in tolerance of cellular DNA damage. Nature Reviews Molecular Cell Biology 2012, 13 (3), 141-152. 83. Patra, A.; Nagy, L. D.; Zhang, Q. Q.; Su, Y.; Muller, L.; Guengerich, F. P.; Egli, M., Kinetics, Structure, and Mechanism of 8-Oxo-7,8-dihydro-2 '-deoxyguanosine Bypass by Human DNA Polymerase eta. J Biol Chem 2014, 289 (24), 16867-16882. 84. Gao, Y.; Yang, W., Capture of a third Mg2+ is essential for catalyzing DNA synthesis. Science 2016, 352 (6291), 1334-1337. 85. Nakamura, T.; Zhao, Y.; Yamagata, Y.; Hua, Y. J.; Yang, W., Watching DNA polymerase eta make a phosphodiester bond. Nature 2012, 487 (7406), 196-U77. 86. Perera, L.; Freudenthal, B. D.; Beard, W. A.; Shock, D. D.; Pedersen, L. G.; Wilson, S. H., Requirement for transient metal ions revealed through computational analysis for DNA polymerase going in reverse. P Natl Acad Sci USA 2015, 112 (38), E5228-E5236. 87. Stec, B.; Holtz, K. M.; Kantrowitz, E. R., A revised mechanism for the alkaline phosphatase reaction involving three metal ions. J Mol Biol 2000, 299 (5), 1303-1311. 88. Duarte, F.; Amrein, B. A.; Kamerlin, S. C. L., Modeling catalytic promiscuity in the alkaline phosphatase superfamily. Phys Chem Chem Phys 2013, 15 (27), 11160-11177. 89. Lee, F. S.; Chu, Z. T.; Warshel, A., MICROSCOPIC AND SEMIMICROSCOPIC CALCULATIONS OF ELECTROSTATIC ENERGIES IN PROTEINS BY THE POLARIS AND ENZYMIX PROGRAMS. Journal of Computational Chemistry 1993, 14 (2), 161-185. 90. Warshel, A.; Chu, Z. T.; Villa, J.; Strajbl, M.; Schutz, C. N.; Shuriki, A.; Vicatos, S.; Chakrabarty, S.; Plotnikov, N. V.; Schopf, P., Molaris-Xg, v9.11. 2012. 91. Warshel, A.; Chu, Z. T.; Tao, H.; Sham, Y. Y., Examining methods for calculations of binding free energies: LRA, LIE and PDLD/S-LRA calculations of ligands binding to proteins. Biophysical Journal 2000, 78 (1), 332A-332A. 92. Singh, N.; Warshel, A., Absolute binding free energy calculations: On the accuracy of computational scoring of protein-ligand interactions. Proteins 2010, 78 (7), 1705-1723. 93. Schutz, C. N.; Warshel, A., What are the dielectric "constants" of proteins and how to validate electrostatic models? Proteins 2001, 44 (4), 400-417. 94. Sham, Y. Y.; Chu, Z. T.; Warshel, A., Consistent calculations of pK(a)'s of ionizable residues in proteins: Semi-microscopic and microscopic approaches. Journal of Physical Chemistry B 1997, 101 (22), 4458-4472. 95. Xiang, Y.; Oelschlaeger, P.; Florian, J.; Goodman, M. F.; Warshel, A., Simulating the effect of DNA polymerase mutations on transition-state energetics and fidelity: Evaluating amino acid group contribution and allosteric coupling for ionized residues in human pol beta. Biochemistry-Us 2006, 45 (23), 7036-7048. 96. Alan, G. WHAM:the weighted histogram analysis method, 2.06. 97. Yang, W.; Weng, P. J.; Gao, Y., A new paradigm of DNA synthesis: three-metal-ion catalysis. Cell Biosci 2016, 6. 98. Aqvist, J.; Warshel, A., CALCULATIONS OF FREE-ENERGY PROFILES FOR THE STAPHYLOCOCCAL NUCLEASE CATALYZED REACTION. Biochemistry-Us 1989, 28 (11), 4680-4689. 99. Golosov, A. A.; Warren, J. J.; Beese, L. S.; Karplus, M., The Mechanism of the Translocation Step in DNA Replication by DNA Polymerase I: A Computer Simulation Analysis. Structure 2010, 18 (1), 83-93. ` 86 100. Miller, B. R.; Parish, C. A.; Wu, E. Y., Molecular Dynamics Study of the Opening Mechanism for DNA Polymerase I. Plos Comput Biol 2014, 10 (12). 101. Reed, A. J.; Suo, Z.; Vyas, R.; Raper, A. T., Structural Insights into the Post-Chemistry Steps of Nucleotide Incorporation Catalyzed by a DNA Polymerase. Journal of the American Chemical Society 2016. 102. Vyas, R.; Reed, A. J.; Tokarsky, E. J.; Suo, Z. C., Viewing Human DNA Polymerase beta Faithfully and Unfaithfully Bypass an Oxidative Lesion by Time-Dependent Crystallography. Journal of the American Chemical Society 2015, 137 (15), 5225-5230. 103. Yang, L. J.; Arora, K.; Beard, W. A.; Wilson, S. H.; Schlick, T., Critical role of magnesium ions in DNA polymerase beta's closing and active site assembly. Journal of the American Chemical Society 2004, 126 (27), 8441-8453. 104. Wright, A. V.; Nunez, J. K.; Doudna, J. A., Biology and Applications of CRISPR Systems: Harnessing Nature's Toolbox for Genome Engineering. Cell 2016, 164 (1-2), 29-44. 105. Doudna, J. A.; Charpentier, E., The new frontier of genome engineering with CRISPR-Cas9. Science 2014, 346 (6213), 1077-+. 106. Jinek, M.; Chylinski, K.; Fonfara, I.; Hauer, M.; Doudna, J. A.; Charpentier, E., A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science 2012, 337 (6096), 816-821. 107. Deltcheva, E.; Chylinski, K.; Sharma, C. M.; Gonzales, K.; Chao, Y. J.; Pirzada, Z. A.; Eckert, M. R.; Vogel, J.; Charpentier, E., CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 2011, 471 (7340), 602-607. 108. Sternberg, S. H.; Redding, S.; Jinek, M.; Greene, E. C.; Doudna, J. A., DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 2014, 507 (7490), 62-67. 109. Anders, C.; Niewoehner, O.; Duerst, A.; Jinek, M., Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 2014, 513 (7519), 569-573. 110. Jinek, M.; Jiang, F. G.; Taylor, D. W.; Sternberg, S. H.; Kaya, E.; Ma, E. B.; Anders, C.; Hauer, M.; Zhou, K. H.; Lin, S.; Kaplan, M.; Iavarone, A. T.; Charpentier, E.; Nogales, E.; Doudna, J. A., Structures of Cas9 Endonucleases Reveal RNA-Mediated Conformational Activation. Science 2014, 343 (6176), 1215-+. 111. Nishimasu, H.; Cong, L.; Yan, W. X.; Ran, F. A.; Zetsche, B.; Li, Y. Q.; Kurabayashi, A.; Ishitani, R.; Zhang, F.; Nureki, O., Crystal Structure of Staphylococcus aureus Cas9. Cell 2015, 162 (5), 1113-1126. 112. Nishimasu, H.; Ran, F. A.; Hsu, P. D.; Konermann, S.; Shehata, S. I.; Dohmae, N.; Ishitani, R.; Zhang, F.; Nureki, O., Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA. Cell 2014, 156 (5), 935-949. 113. Dagdas, Y. S.; Chen, J. S.; Sternberg, S. H.; Doudna, J. A.; Yildiz, A., A conformational checkpoint between DNA binding and cleavage by CRISPR-Cas9. Sci Adv 2017, 3 (8). 114. Huai, C.; Li, G.; Yao, R. J.; Zhang, Y.; Cao, M.; Kong, L. L.; Jia, C. Q.; Yuan, H.; Chen, H. Y.; Lu, D. R.; Huang, Q., Structural insights into DNA cleavage activation of CRISPR-Cas9 system. Nat Commun 2017, 8. 115. Sternberg, S. H.; LaFrance, B.; Kaplan, M.; Doucina, J. A., Conformational control of DNA target cleavage by CRISPR-Cas9. Nature 2015, 527 (7576), 110-113. 116. Tangprasertchai, N. S.; Di Felice, R.; Zhang, X. J.; Slaymaker, I. M.; Reyes, C. V.; Jiang, W.; Rohs, R.; Qin, P. Z., CRISPR-Cas9 Mediated DNA Unwinding Detected Using Site-Directed Spin Labeling. Acs Chem Biol 2017, 12 (6), 1489-1493. 117. Jiang, F. G.; Taylor, D. W.; Chen, J. S.; Kornfeld, J. E.; Zhou, K. H.; Thompson, A. J.; Nogales, E.; Doudna, J. A., Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science 2016, 351 (6275), 867-871. 118. Palermo, G.; Chen, J. S.; Ricci, C. G.; Rivalta, I.; Jinek, M.; Batista, V. S.; Doudna, J. A.; McCammon, J. A., Key role of the REC lobe during CRISPR–Cas9 activation by ‘sensing’, ‘regulating’, and ‘locking’ the catalytic HNH domain. Quarterly Reviews of Biophysics 2018, 51, e9. ` 87 119. Palermo, G.; Miao, Y. L.; Walker, R. C.; Jinek, M.; McCammon, J. A., CRISPR-Cas9 conformational activation as elucidated from enhanced molecular simulations. P Natl Acad Sci USA 2017, 114 (28), 7260- 7265. 120. Zuo, Z. C.; Liu, J., Structure and Dynamics of Cas9 HNH Domain Catalytic State. Sci Rep-Uk 2017, 7. 121. Cotton, F. A.; Hazen, E. E.; Legg, M. J., Staphylococcal Nuclease - Proposed Mechanism of Action Based on Structure of Enzyme-Thymidine 3',5'-Bisphosphate-Calcium Ion Complex at 1.5-a Resolution. P Natl Acad Sci USA 1979, 76 (6), 2551-2555. 122. Biertumpfel, C.; Yang, W.; Suck, D., Crystal structure of T4 endonuclease VII resolving a Holliday junction. Nature 2007, 449 (7162), 616-620. 123. Li, C. L.; Hor, L. I.; Chang, Z. F.; Tsai, L. C.; Yang, W. Z.; Yuan, H. S., DNA binding and cleavage by the periplasmic nuclease Vvn: a novel structure with a known active site. Embo J 2003, 22 (15), 4014- 4025. 124. GiraudPanis, M. J. E.; Lilley, D. M. J., T4 endonuclease VII - Importance of a histidine-aspartate cluster within the zinc-binding domain. J Biol Chem 1996, 271 (51), 33148-33155. 125. Yang, M. Y.; Peng, S. J.; Sun, R. R.; Lin, J. D.; Wang, N.; Chen, C. L., The Conformational Dynamics of Cas9 Governing DNA Cleavage Are Revealed by Single-Molecule FRET. Cell Rep 2018, 22 (2), 372-382. 126. Slaymaker, I. M.; Gao, L. Y.; Zetsche, B.; Scott, D. A.; Yan, W. X.; Zhang, F., Rationally engineered Cas9 nucleases with improved specificity. Science 2016, 351 (6268), 84-88. 127. Kamerlin, S. C.; Warshel, A., The EVB as a quantitative tool for formulating simulations and analyzing biological and chemical reactions. Faraday discussions 2010, 145, 71-106. 128. Vicatos, S.; Rychkova, A.; Mukherjee, S.; Warshel, A., An effective Coarse-grained model for biological simulations: Recent refinements and validations. Proteins 2014, 82 (7), 1168-1185. 129. Kamerlin, S. C. L.; McKenna, C. E.; Goodman, M. F.; Warshel, A., A Computational Study of the Hydrolysis of dGTP Analogues with Halomethylene-Modified Leaving Groups in Solution: Implications for the Mechanism of DNA Polymerases (vol 48, pg 5963, 2009). Biochemistry-Us 2009, 48 (32), 7776-7776. 130. Kamerlin, S. C. L.; Williams, N. H.; Warshel, A., Dineopentyl phosphate hydrolysis: Evidence for stepwise water attack. J Org Chem 2008, 73 (18), 6960-6969. 131. Klahn, M.; Rosta, E.; Warshel, A., On the mechanism of hydrolysis of phosphate monoesters dianions in solutions and proteins. Journal of the American Chemical Society 2006, 128 (47), 15310- 15323. 132. Rosta, E.; Klahn, M.; Warshel, A., Studies of the reaction mechanism of phosphate hydrolysis in the ras-GAP proteins and in aqueous model systems. Abstr Pap Am Chem S 2006, 232, 804-804. 133. Florian, J.; Goodman, M. F.; Warshel, A., Computer simulation studies of the fidelity of DNA polymerases. Biopolymers 2003, 68 (3), 286-299. 134. Matute, R.; Warshel, A., Computational study on the catalytic role of the magnesium ions in the active site of the DNA Polymerase beta. Abstr Pap Am Chem S 2016, 251. 135. Matute, R. A.; Yoon, H.; Warshel, A., Exploring the mechanism of DNA polymerases by analyzing the effect of mutations of active site acidic groups in Polymerase. Proteins 2016, 84 (11), 1644-1657. 136. Yoon, H.; Warshel, A., The control of the discrimination between dNTP and rNTP in DNA and RNA polymerase. Proteins: Structure, Function, and Bioinformatics 2016, 84 (11), 1616-1624. 137. Giraudpanis, M. J. E.; Duckett, D. R.; Lilley, D. M. J., The Modular Character of a DNA Junction- Resolving Enzyme - a Zinc-Binding Motif in Bacteriophage-T4 Endonuclease-Vii. J Mol Biol 1995, 252 (5), 596-610. 138. Singh, D.; Wang, Y. B.; Mallon, J.; Yang, O.; Fei, J. Y.; Poddar, A.; Ceylan, D.; Bailey, S.; Ha, T., Mechanisms of improved specificity of engineered Cas9s revealed by single-molecule FRET analysis. Nat Struct Mol Biol 2018, 25 (4), 347-354. ` 88 139. Gong, S. Z.; Yu, H. H.; Johnson, K. A.; Taylor, D. W., DNA Unwinding Is the Primary Determinant of CRISPR-Cas9 Activity. Cell Rep 2018, 22 (2), 359-371. 140. Raper, A. T.; Stephenson, A. A.; Suo, Z., Sharpening the Scissors: Mechanistic Details of CRISPR/Cas9 Improve Functional Understanding and Inspire Future Research. Journal of the American Chemical Society 2018, 140 (36), 11142-11152. 141. Raper, A. T.; Stephenson, A. A.; Suo, Z. C., Functional Insights Revealed by the Kinetic Mechanism of CRISPR/Cas9. Journal of the American Chemical Society 2018, 140 (8), 2971-2984. 142. Yang, M.; Peng, S.; Sun, R.; Lin, J.; Wang, N.; Chen, C., The Conformational Dynamics of Cas9 Governing DNA Cleavage Are Revealed by Single-Molecule FRET. Cell Rep 2018, 22 (2), 372-382. 143. Kamerlin, S. C. L.; Warshel, A., At the dawn of the 21st century: Is dynamics the missing link for understanding enzyme catalysis? Proteins 2010, 78 (6), 1339-1375. 144. Warshel, A.; Bora, R. P., Perspective: Defining and quantifying the role of dynamics in enzyme catalysis. J Chem Phys 2016, 144 (18). 145. Matute, R. A.; Yoon, H.; Warshel, A., Exploring the mechanism of DNA polymerases by analyzing the effect of mutations of active site acidic groups in Polymerase beta. Proteins 2016, 84 (11), 1644- 1657. 146. Ram Prasad, B.; Kamerlin, S. C. L.; Florián, J.; Warshel, A., Prechemistry barriers and checkpoints do not contribute to fidelity and catalysis as long as they are not rate limiting. Theoretical Chemistry Accounts 2012, 131 (12), 1288. 147. Ram Prasad, B.; Warshel, A., Prechemistry versus preorganization in DNA replication fidelity. Proteins 2011, 79 (10), 2900-19. 148. Florian, J.; Goodman, M. F.; Warshel, A., Computer simulation of the chemical catalysis of DNA polymerases: Discriminating between alternative nucleotide insertion mechanisms for T7 DNA polymerase. Journal of the American Chemical Society 2003, 125 (27), 8163-8177. ` 89 Appendix Details of the simulating system used for the chapter 2 System Coordinate of center of the simulating system Radius of simulation Radius of spherical hard wall for cavity BPTI one water cavity (32.824 4.006 10.705) 10 Å 3.5 Å BPTI three water cavity (31.242 6.845 1.776) 10 Å 4.5 Å SNase (13.13 23.47 8.33) 18 Å 6.0 Å Table S2-1. The coordinates of centers of simulating systems and the radius of each simulation and the cavity. Note that the center of a simulation is same as that of its cavity. Supporting materials for chapter 3 EVB/FEP parameters The EVB atoms (the reacting region) are shown in Figure 1S. The parameters for the EVB and the FEP calculations were taken as the same parameters used in the earlier studies. 20, 148 However, there are two more atoms included in this study which are 2’OH and 2’H on the ribose sugar as our new main focus. The charges and the vdW parameters are provided in Table S3-1. Figure S3-1. The atoms of the substrate and the reactive part. The reacting regions are presented as bold. ` 90 Table S3-1 The charges and the vdW parameters used in the FEP calculation that mutate the 2’OH to 2’H. The vdW energy between atom i and j is calculated as Evdw = CiCjexp(-αi αjrij), where rij is the distance between atom i and j. No hydrogen bond const Hydrogen bond with const No WF With WF No WF With WF Elec vdW Elec vdW Elec vdW Elec vdW water 9.08 -3.05 - - - - - - WT 4.20 -7.54 4.43 -5.76 3.84 -8.39 4.76 -7.64 Y416A 5.51 -2.04 5.58 -2.40 6.47 -2.08 6.55 -2.73 Y416F 8.42 -9.86 7.29 -6.83 5.67 -11.17 4.44 -6.79 Table S3-2 Breakdown of the calculated free energy contributions to the TS binding in RB69 DNA polymerase Energies in kcal/mol . The calculations involve the water flooding approach and no H bond constraint . Atom Charges C α 2’O (rNTP) -0.573 53.0 2.5 2’OH(rNTP) 0.447 5.0 2.5 2’H (dNTP) 0.072 5.0 2.5 ` 91 PMF calculation for the PPi release in chapter 4 The structure after the EVB calculation (after the bond breaking) was used for this PMF calculation. We considered 8 different situations with the presence/absence of each of the three Mg ions. Therefore, the corresponding metal ions were removed from the structures and then the structures were relaxed for 100ps. This calculations used 31 windows of 10 ps each with a distance constraint increasing from 3.8 Å to 5.5 Å between O3B in PP i and PA separated from the PP i. The PMF was then calculated by the weighted histogram analysis (WHAM) method. 96 Figure S4-1. The PMF profile of pulling the PPi after the chemical step. The profile is obtained for changing the distance between O3B of the PPi and PA from 3.8 Å to 5.5 Å. The red color profile was calculated from the structures where the 3 rd MG has been taken off. (a) was calculated with the presence of both catalytic and binding Mg while (b) was calculated after the catalytic Mg was removed.
Abstract (if available)
Abstract
Computational methods have been great complementary approaches when experiments often cannot identify key factors in fundamental biological problems. Especially, while experimental mutation studies allow us to infer the mechanisms, the only way to draw the full pictures of the mechanism is by calculating the entire energy surface of the reaction. The empirical valence bond (EVB) model has been successfully applied to elucidate the mechanisms of enzymatic catalysis in various biological systems. However, it is often difficult to produce reliable results without very careful system adjustments and analysis, thus, one might need to have enough experience until achieving consistent calculations. The major part of the difficulties is often encountered during the calibration procedure where the simulation in the reference solvent gets diverged from the simulation in proteins. In the first part of the thesis, an automatized protocol for the EVB model with theoretical assurance is introduced and validated. It significantly lessens the time and efforts that users should take until generating reliable results. ❧ Another challenging part, not just confined to the EVB but for the general free energy calculation, is the preparation of the optimal configuration of the simulating system. Among many aspects to be considered for consistent sampling and free energy calculation, the importance of the optimal water configuration has been only relatively recently recognized. Also, beside the quality of the simulation, it is well known that water molecules inside and around biological molecules often play a major role in determining key functions. Attempts to reach a formal rigor in water insertion in protein cavities have focused on grand canonical Monte Carlo (GCMC) methods. However, it requires major computational resources as the acceptance of the insertion/deletion attempts is very low. One of the fast and reliable methods to prepare better water configuration is the Water Flooding devised by Warshel’s group. However, this approach has not been fully validated and needs more benchmark test. The second chapter shows that the Water Flooding converges much faster than the GCMC but still reproduce the same results. ❧ In the rest of the chapters, the mechanisms of various nucleotide associated proteins are explored using the EVB with the help of the protocols introduced in chapter 1 and 2. Although Warshel’s group has successfully applied the EVB to many studies of DNA polymerases and nucleases, there are still countless fundamental questions to be solved to better understand the nature of those enzymatic catalysis. In particular with the rise of potential uses of the CRISPR-Cas system in human medicine, it is more important than ever to understand the detailed mechanisms that are shared among many nucleotide-associated proteins. The findings and analysis from these chapters can provide powerful ways and insights for developing medicine that act against viruses and cancers and engineering enzymes for gene editing.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Advancing ab initio QM/MM free energy calculations: refining, validating and quantifying the reference potential approach
PDF
Understanding electrostatic effects in the function of biological systems
PDF
Exploring the nature of the translocon-assisted protein insertion
PDF
Electronically excited and ionized states in condensed phase: theory and applications
PDF
The effect of microhydration on ionization energy and proton transfer in nucleobases: analysis and method development
PDF
Molecular dynamics studies of protein aggregation in unbounded and confined media
PDF
Transient pulsed plasma for pollution remediation and energy conversion applications
PDF
Energy simulation in existing buildings: calibrating the model for retrofit studies
PDF
New electronic structure methods for electronically excited and open-shell species within the equation-of-motion coupled-cluster framework
PDF
Development and application of robust many-body methods for strongly correlated systems: from spin-forbidden chemistry to single-molecule magnets
PDF
Kinetic Monte Carlo simulations for electron transport in redox proteins: from single cytochromes to redox networks
PDF
Step‐wise pulling protocols for non-equilibrium dynamics
PDF
Electrochemical pathways for sustainable energy storage and energy conversion
PDF
Detecting joint interactions between sets of variables in the context of studies with a dichotomous phenotype, with applications to asthma susceptibility involving epigenetics and epistasis
PDF
Investigation of mechanisms of complex catalytic reactions from obtaining and analyzing experimental data to mechanistic modeling
PDF
A diagrammatic analysis of the secondary structural ensemble of CNG trinucleotide repeat
PDF
Study of rotation and phase separation in ³He, ⁴He, and mixed ³He/⁴He droplets by X-ray diffraction
PDF
Improving the efficiency and sustainability of direct arylation polymerization for conjugated polymer synthesis
Asset Metadata
Creator
Yoon, Hanwool
(author)
Core Title
Advancing computational methods for free energy calculations in proteins and the applications for studies of catalysis in nucleotide-associated proteins
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Chemistry
Publication Date
03/07/2019
Defense Date
12/07/2018
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
Cas9,CRISPR,empirical valence bond model,free energy calculation,nucleotide polymerase,OAI-PMH Harvest,water flooding,water insertion
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Warshel, Arieh (
committee chair
), Aiichiro, Nakano (
committee member
), Cherezov, Vadim (
committee member
)
Creator Email
hanwooly@usc.edu,yoonx124@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-129932
Unique identifier
UC11675559
Identifier
etd-YoonHanwoo-7133.pdf (filename),usctheses-c89-129932 (legacy record id)
Legacy Identifier
etd-YoonHanwoo-7133.pdf
Dmrecord
129932
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Yoon, Hanwool
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
Cas9
CRISPR
empirical valence bond model
free energy calculation
nucleotide polymerase
water flooding
water insertion