Simulations across scales: insights into biomolecular mechanisms, magnetic materials, and optical processes
(USC Thesis Other)
SIMULATIONS ACROSS SCALES: INSIGHTS INTO BIOMOLECULAR MECHANISMS, MAGNETIC MATERIALS, AND OPTICAL PROCESSES

by

Goran Giudetti

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(CHEMISTRY)

May 2025

Copyright 2025 Goran Giudetti

Epigraph

“Omnia vanitas”
–Goran Giudetti

Acknowledgements

This thesis, like any achievement I’ve reached as a scientist, is really a mosaic of shared experiences, invaluable support, and guidance from the incredible people I’ve been lucky enough to know. Behind every result here lies a collection of conversations, laughs, challenges, and insights from those who, knowingly or unknowingly, have shaped my path. To some of these exceptional people, I owe a special thank you, and these lines are dedicated to them.

I want to thank Professors Antonelli, Pavisic, and Catalani for nurturing my passion for chemistry, along with all my other high school teachers and headmasters who defined the beginning of my career as a scientist.

My most sincere gratitude extends to my esteemed colleagues and friends from the TheoChem group at RUG and the iOpenShell group at USC. It was an honor to meet such brilliant minds and share so many beautiful moments while working together. I feel truly blessed to have had the chance to be a part of both research teams. Special thanks go to RUG members Luis Suarez, Selim Sami, Sivasudhan Ratnachalam, Carsten Schroer, David Picconi, Elisa Palacino, Albert Thie, Panagiotis Stratidakis, Fernando Gella, Mira Kim, Kors Doedens, Kevin Pérez, Kiana Moghaddam, Edison Salazar, and my paranymphs Ivana Stijepovic and Maximilian Menger.

I am extremely grateful to have conducted my doctoral research under the supervision of Dr. Shirin Faraji and Dr. Anna Krylov.
I had the chance to learn a great deal from two outstanding scientists, and I could not have hoped for better, more compassionate mentors who truly cared about my professional development.

I thank my close friends and companions who supported me in the US and filled my life with joy and love. I owe it all to Paweł Wójcik, Madhubani Mukherjee, Mattia Di Niro, Luana Zagami, Michela Melone, Samuele Cané, and Jessica Walker. Many thanks to the members of the iOpenShell group, including Nayanthara Karippara Jayadev, George Baffour Pipim, Kyle Tanovitz, Jia Hao Soh, Arnab Chakraborty, Tingting Zhao, and Yongbin Kim, for the wonderful time we spent together in the office. I would like to express my gratitude to my friends who have always been there for me throughout the years. Amento forever!

Last but not least, I want to thank my family, who, despite the distance, has always been close to me. These have been tough years, but I always found solace in remembering that I had a home and that you were there waiting for me. None of what I accomplished would have been possible without your support, trust, and love. Thank you for everything. I love you.

Table of Contents

Epigraph
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Applied computational chemistry
1.2 Electronically excited molecules: Overview and examples
1.3 Electronic structure theory methods
1.3.1 Wavefunction methods
1.3.2 Density functional theory
1.4 Treatment of the environment
Chapter 2: How Reproducible are QM/MM Simulations? Lessons from Computational Studies of the Covalent Inhibition of the SARS-CoV-2 Main Protease by Carmofur
2.1 Introduction
2.2 Details of the QM/MM simulations
2.3 Analysis of QM/MM results: Protocol 1
2.4 Effect of different structures: Protocol 2
2.5 Discussion
2.6 Conclusions
Chapter 3: Multiscale Simulations of the Covalent Inhibition of the SARS-CoV-2 Main Protease: Four Compounds and Three Reaction Mechanisms
3.1 Introduction
3.2 Methods: system preparation and computational protocols
3.3 Results and discussion
3.3.1 Reaction of MPro with carmofur
3.3.2 Reaction of MPro with X77A
3.3.3 Reaction of MPro with X77C
3.3.4 Reaction of MPro with nirmatrelvir
3.4 Conclusion
3.4.1 Appendix
Chapter 4: Theoretical insights into the effect of size and substitution patterns of azobenzene derivatives on the DNA G-quadruplex
4.1 Introduction
4.2 Computational protocol
4.3 Results and conclusions
4.4 Photoisomerization of azobenzene derivatives in gas phase
4.5 QM/MM simulations
Chapter 5: Origin of magnetic anisotropy in nickelocene molecular magnet and resilience of its magnetic behavior
5.1 Introduction
5.2 Computational protocol
5.3 Results and conclusions
5.4 Magnetic anisotropy and susceptibility of nickelocene molecular magnets
Chapter 6: Exploring the Global Reaction Coordinate for Retinal Photoisomerization: A Graph Theory-Based Machine Learning Approach
6.1 Introduction
6.2 Theoretical methods
6.3 Computational details
6.4 Results and discussion
6.5 Conclusion
6.6 Data and Software Availability statement
Chapter 7: Optical Properties of New Donor–Acceptor Dyes for RNA Imaging: Insights from Ab Initio and Hückel’s Model Calculations
7.1 Introduction
7.2 Theoretical models and computational details
7.2.1 Quantum chemistry calculations
7.2.2 Hückel’s model
7.3 Results and discussion
7.3.1 Explaining trends using Hückel’s model and making predictions
7.4 Design of the new dye: Preliminary results and outlook
7.5 Conclusion
Chapter 8: A computational study of possible mechanisms of singlet oxygen generation in miniSOG photoactive protein
8.1 Introduction
8.2 Computational details
8.3 Results and discussion
8.4 Conclusion
Chapter 9: Future directions
9.1 Introduction
9.2 Predicting Radiationless Decay Rate for Diverse Chromophore Classes
9.3 Excited-State Decay Calculations using Hybrid Approaches
Bibliography

List of Tables

2.1 Energies for the REAC, TS, and PROD structures computed using the same structures.
2.2 Energies for the REAC and PROD structures.
2.3 Energies for the REAC, INT, and PROD structures.
4.1 Atom names and RESP charges used for AZ1, AZ2, and AZ3 in MD simulations.
4.2 GAFF parameters for bonds of AZ1, AZ2, and AZ3. The force constants (Kb) and equilibrium bond lengths (b0) are given in kJ/(mol·nm²) and nm, respectively.
4.3 GAFF parameters for angles of AZ1, AZ2, and AZ3. The force constants (Kθ) and equilibrium angles (θ0) are given in kJ/(mol·rad²) and degrees, respectively.
4.4 GAFF parameters for dihedral angles of AZ1, AZ2, and AZ3. The force constants (Kϕ) and phase angles (ϕ0) are given in kJ/mol and degrees, respectively; n is the multiplicity.
4.5 Ion Lennard-Jones parameters and water models used in the MD simulations. Ion parameters are abbreviated as follows: AA, Amber-adapted Åqvist; JC, Joung and Cheatham. In AZn, n refers to the structure number of the azobenzene derivatives AZ1, AZ2, and AZ3.
7.1 Values of the α and β integrals (in eV) used for different types of centers within Hückel’s model.
7.2 Photophysical properties of the dyes (lowest rotamer structure) in DMSO computed with different levels of theory. Energies in eV, oscillator strengths in parentheses, dipole moments in Debye.
7.3 Photophysical properties (excitation and emission energies, oscillator strengths, Stokes shifts, ground- and excited-state dipole moments, and change in dipole moment between the ground and excited states) of the dyes in different PCM solvents for the lowest rotamer structures. Energies in eV, dipole moments in Debye.
7.4 Theoretical (ωB97M-V/aug-cc-pVDZ) and experimental absorption energies of MPI and R8.
8.1 Excitation energies (eV) for model system B (no oxygen). Oscillator strengths for the transitions from RF(S0) are given in parentheses.
8.2 Excitation energies (eV) for model system A; XMCQDPT2/cc-pVDZ. Oscillator strengths for the transitions from RF(S0) are given in parentheses.

List of Figures

1.1 Growth of DFT publications in terms of research topics since 1980. Source: Haunschild et al. J Cheminform (2016) 8:52.
1.2 Different relaxation pathways for a photoexcited molecule: internal conversion (IC), intersystem crossing (ISC), fluorescence, and phosphorescence. Source: Angew. Chem. Int. Ed. 2020, 59, 16832–16846.
1.3 Partition of a molecular system into QM and MM regions. Source: Methods Mol. Biol. 2013, 924, 43–66.
2.1 The key step of the reaction between carmofur and MPro: the nucleophilic attack of the thiolate of the catalytic cysteine.
2.2 Various components defining a QM/MM protocol.
2.3 Illustration of the link-atom approach. QM and MM atoms are marked as Qi and Mi, respectively. The covalent bond between Q1 and M1 is cut and saturated by the link atom L. To avoid overpolarization, the MM charges on the boundary (M1 here) are set to zero; in some approaches, the excess charge is redistributed among the neighboring atoms to conserve the total charge. Image adapted with permission from Senn and Thiel, Angew. Chem., Int. Ed. 2009, 48, 1198–1229.
2.4 Definition of the moderate-size QM region (83 atoms). Carbon atoms are colored green, oxygen red, nitrogen blue, hydrogen white, fluorine cyan, and sulfur orange. The hydrogen link atoms are shown in dark grey; they correspond to His41 (LH), Asn142 (LA1, LA2), and Cys145 (LC). The labels of several atoms referenced below (e.g., C7, O7, and N7 of carmofur and SG of Cys145) are indicated.
2.5 Alignment of the PROD and X-ray structures. Distances are in Å. The values in italics refer to the crystal structure. Hydrogen atoms are omitted.
2.6 Structures of REAC, TS, and PROD obtained with Protocol 1 and the moderate-size (83-atom) QM region, and the corresponding total energy profile. Distances are in Å. The TS panel shows the vibrational mode with the imaginary frequency 293i cm−1.
2.7 Alignment of the REAC structures optimized with NWChem using the density espfit (colored balls and sticks, distances in red) and density static (yellow sticks, distances in dark blue) options. Hydrogen atoms are omitted. Distances are in Å.
2.8 Alignment of the REAC structures optimized with Q-Chem (balls and sticks colored by element) and with NWChem (sticks colored yellow). Distances are in Å. Hydrogen atoms are omitted.
2.9 The reaction energy profile showing the relative energies of REAC, INT, and PROD located using Protocol 2.
3.1 Molecular models of the compounds considered here as covalent inhibitors of MPro. Here and in all figures, carbon atoms are colored green, oxygen red, nitrogen blue, sulfur yellow, fluorine cyan, and hydrogen white. Red asterisks mark the target carbon atoms of the nucleophilic attack of the Cys145 thiolate of MPro.
3.2 The chemical formulae of the compounds shown in Fig. 3.1. The carbon atom that forms a covalent bond with the sulfur atom of the Cys145 residue during covalent complex formation is marked in blue.
3.3 Fragments of the MPro active site from the selected PDB structures. Panels (a) and (b): fragments of the active site of MPro relevant to the reactions of the selected compounds with Cys145 as they appear in the PDB structures. We focus on the chain Gly143-Ser144-Cys145 with the reactive Cys145 and the oxyanion hole groups, and on the location of the His41 side chain relative to Cys145. Panel (c): the PDB structure 6W63, a non-covalent complex of MPro with X77. To design covalent inhibitors, we replaced the fragment of X77 (highlighted in yellow) with the reactive warheads (see panels X77A and X77C in Fig. 3.1). Here and below, distances are given in Å.
3.4 The QM/MM-optimized structures of REAC, IP, and PROD for the MPro–carmofur reaction. The bottom-left panel shows the superposition of PROD (colored balls and sticks) and the crystal structure 7BUY (yellow sticks). The side chain of His41 in the bottom-right panel is shown in yellow sticks. The distances in italics correspond to the crystal structure.
3.5 The computed Gibbs free energy profile for the REAC→TS1→IP step of the ion-pair formation in the MPro–carmofur reaction.
3.6 The QM/MM-optimized structures for the MPro–X77A reaction. A large part of the X77A molecule is shown in light yellow sticks. The side chain of His41 in the bottom panels is shown in golden-yellow sticks.
3.7 The computed Gibbs free energy profiles for the MPro–X77A reaction. The upper panel shows the diagram combining the results of the two reaction steps illustrated in the bottom panels. The collective variables are as follows: CV1 = d(NE-HS) - d(SG-HS), CV2 = d(SG-C) - d(C-N5).
3.8 The QM/MM-optimized structures for the MPro–X77C reaction.
3.9 The computed Gibbs free energy profiles for the MPro–X77C reaction. The upper panel shows the diagram combining the results of the two reaction steps illustrated in the bottom panels. The collective variables are as follows: CV1 = d(NE-HS) - d(SG-C), CV2 = d(C-F).
3.10 The QM/MM-optimized structures for the MPro–nirmatrelvir reaction.
3.11 The computed Gibbs free energy profiles for the MPro–nirmatrelvir reaction. The upper panel shows the diagram combining the results of the two reaction steps illustrated in the bottom panels. The collective variables are as follows: CV1 = d(NE-HS) - d(SG-HS) - d(SG-C), CV2 = d(NE-HS) - d(HS-OWat1) + d(OWat1-HWat1) - d(HWat1-OWat2) + d(OWat2-HWat2) - d(N-HWat2).
4.1 (a) The structures of the three azobenzene units in the trans isomer; G refers to the guanosine moieties. (b) Schematic representation of the photoswitchable G-quadruplex structure with azobenzene residues (AZ1) in green. K+ cations are shown as purple spheres.
4.2 Atom labeling scheme of the azobenzene derivatives used in the force field and in the QM region of the QM/MM simulations.
4.3 RMSDs as a function of simulation time. (a, d, and g) All atoms of the G-quadruplexes in the different simulation groups described in the text and Table 4.5. (b, e, and h) G-quartets and azobenzenes in JCKCl simulations. Superimposed structures of (c) the representative structures of the two different ground-state populations of AZ1 and (f and i) two random snapshots of AZ2 and AZ3 in JCKCl MD simulations.
4.4 Schematic representation of the PESs of the AZ1 photoisomerization mechanism as a function of the CNNC dihedral angle. The ground state (S0) and the first (S1) and second (S2) excited states are shown in blue, red, and green, respectively. The S0 curve is a PES scan along the dihedral angle obtained with SF-B5050LYP/cc-pVDZ. The S1 and S2 curves are obtained by connecting the optimized excited-state geometries and MECPs (shown in purple) calculated at the SF-B5050LYP/cc-pVDZ level of theory.
4.5 Absorption spectra of AZ1, AZ2, and AZ3 obtained by Gaussian convolution of the excitation energies of 90 MD simulation snapshots.
5.1 a) Embedded cluster setup used for structure optimization: the all-electron QM region (NiCp2/Mg49O49) is treated with PBE0/6-31G*, while the outermost region contains point charges. The QM region is shown lifted for clarity. Here, NiCp2 is on top of the Mg2+ adsorption site. b) Top and side views of the embedded NiCp2/Mg49O49 PBE0 region. c) Top and side views of a smaller cut-out (NiCp2/Mg25O25) used for the SF-TD-DFT calculations. The bond of the metal with the Cp centroid is shown with blue dashed lines. Color code: Ni purple, Mg green, O red, C gray, and H white.
5.2 Hole and particle NTO pairs of the spinless density matrix giving rise to SOC between states 1 and 2 of nickelocene (EOM-SF-CCSD/cc-pVTZ). Singular values σ are 1.19 and 1.15, respectively. Red, green, and blue axes indicate the x, y, and z coordinate axes, respectively.
5.3 Hole and particle NTO pairs of the spinless density matrix between states 1 and 2 of the NiCp2/Mg25O25 adsorption complex (SF-PBE0/cc-pVTZ). The Ni atom is on top of O2−. Singular values σ are 0.5 and 0.5, respectively. Red, green, and blue axes indicate the x, y, and z coordinate axes, respectively.
5.4 Structures of six ring-substituted nickelocene derivatives. In complex 1, two C-H groups are substituted with two P atoms. In complexes 2, 3, and 6, two H atoms are substituted with methyl, cyano, and aromatic groups, respectively. Complexes 4 and 5 are bent structures taken from refs 33 and 34, respectively. The bond of the metal with the Cp centroid is shown with blue dashed lines. Color code: Ni purple, P orange, N blue, C gray, and H white.
5.5 Calculated temperature dependence of the inverse susceptibility (1/χav) of NiCp2 in the temperature range from 5 to 250 K (top) and from 5 to 80 K (bottom), under an applied field of 1 T. Calculated curves including three and four electronic states are shown in blue and red, respectively. Experimental susceptibility data (black and green curves) are taken from refs 35 and 36, respectively. Experimental magnetization data are not available. This behavior is also preserved for all six derivatives.
6.1 Retinal photoisomerization: transition from the 11-cis retinal conformation to the all-trans retinal conformation. C in cyan (except C11=C12 in blue), H in pink, O in red.
6.2 HOMO and LUMO of trans-retinal calculated at the PBE0/STO-3G level of theory.
6.3 Natural transition orbitals representing transitions from the ground state to the S1 and S2 states in trans-retinal. Holes are shown on the left and particles on the right.
6.4 Flowchart of the computational protocol described in this text.
6.5 Energy (eV) of the HOMO (blue) and LUMO (red) as a function of d31 from NAMD (left) and AIMD (right) simulations.
6.6 Mutual information between internal coordinates and HOMO energy from the AIMD dataset (top), the selected NAMD dataset (middle), and the full NAMD dataset (bottom).
6.7 Correlation matrix obtained from AIMD data (top), selected NAMD data (middle), and full NAMD data (bottom). The color bar denotes the correlation between internal coordinates.
6.8 2D graph representations (blue nodes with black edges) of the data sets produced with NAMD (left) and AIMD (right) simulations. Nodes are displayed as a function of HOMO energy versus the d31 torsional angle. Red nodes show one possible shortest path that can be found with Dijkstra’s algorithm, resembling a potential energy surface for the isomerization of retinal.
6.9 Internal coordinates from the NAMD (left) and AIMD (right) simulations along the reaction path found from graph analysis.
7.1 Structures of the dyes considered in this study. Red and blue denote donor and acceptor moieties, respectively.
7.2 Ground-state dipole moment and electrostatic potential (ESP) maps for dyes (1)-(11) in DMSO; ωB97X-D/aug-cc-pVTZ. Color scheme: blue represents positive charge and red represents negative charge.
7.3 Model dyes ordered by their computed excitation energy (no solvent); ωB97M-V/aug-cc-pVTZ.
7.4 NTOs for the S0→S1 transitions for dyes (1)-(11) and the change in ESP upon electronic excitation in DMSO; ωB97M-V/aug-cc-pVTZ. Blue/violet represents net positive charge due to electron transfer from the donor to the acceptor.
7.5 Correlation plots for the excitation energies of the S0→S1 transition computed with different theoretical methods for dyes (1)-(9) in DMSO. For YL166, YL154, YL158, 5MPI, 6MPI, and 7MPI, excitation energies are computed using the Boltzmann populations of the two lowest rotamers.
7.6 Correlation plots for the theoretical and experimental values of the Stokes shifts, systems (1)-(10).
7.7 Correlation plots for the excitation energies of the S0→S1 transition computed with the ab initio (ωB97M-V) method and with Hückel’s model for dyes (1)-(11).
7.8 Top: NTOs computed with TD-DFT for YL158 (system (9)). Bottom: HOMO and LUMO computed with Hückel’s model.
7.9 HOMO and LUMO from Hückel’s model for dyes (1)-(11).
7.10 Changes in atomic charges between the HOMO and LUMO from Hückel’s model for selected dyes. Positive signs mean that electrons are removed from the respective atoms upon excitation.
7.11 Structure of the R8 compound.
7.12 Hole (left) and electron (right) NTOs computed with ωB97M-V/aug-cc-pVDZ for R8.
7.13 Intrinsic fluorescence lifetimes of the dyes relative to YL146.
8.1 miniSOG protein. Flavin mononucleotide (FMN) and riboflavin (RB) cofactors are shown in the insets.
8.2 QM cluster (model system A) used for QM/MM optimization and excited-state calculations: RF, O2, the side chains of Gln77, Asn72, and Asn82, and seven water molecules. Oxygen is located about 4.1-4.2 Å from RF.
8.3 RAS-2SF reference and target determinants. The singly occupied orbitals are flavin’s π and π* and oxygen’s π*x and π*y. LE and CT denote local excitations and charge-transfer configurations, respectively.
8.4 Electronic configurations of the ³Σg⁻, ¹Δg, and ¹Σg⁺ states of molecular oxygen.
8.5 NTOs for the two lowest transitions in the RF cofactor in miniSOG[RF].
8.6 Energy diagram of the low-lying manifold of singlet and triplet states derived from RF’s S0, S1, S2, T1, T2, and T3 and oxygen’s ³Σg⁻, ¹Δg, and ¹Σg⁺ states. Excitation energies are in eV relative to the ground state, RF(S0) × O2(³Σg⁻).
8.7 Couplings between the relevant states. SOC values (in cm−1) are shown in black and ||γ|| values (dimensionless) in red. For the degenerate ¹Δg states, the combined values (sum of the SOCs/NACs for the two components) are shown.

Abstract

This doctoral thesis summarizes my contributions to computational chemistry, developed over five years of study and training at the University of Groningen (RUG) and the University of Southern California (USC). My research focuses on excited-state processes and biological systems. Chapter 2 describes a benchmark study of the level of transparency needed when reporting the setup of hybrid QM/MM calculations. We used the inhibition of the SARS-CoV-2 main protease (MPro) by the anticancer drug carmofur as a test case. The reaction was modeled using two electronic structure packages: Q-Chem and NWChem. In Chapter 3, the knowledge gained in Chapter 2 is used both to characterize the inhibition of MPro through three reaction mechanisms and to design two novel inhibitors derived from the chemical structure of the compound X77. Chapter 4 focuses on the dynamical properties of the system formed by the azobenzene photoswitch embedded in the DNA G-quadruplex; the dynamics and structural characteristics of the system are explored by means of classical molecular dynamics (MD) simulations. Chapter 5 describes the evaluation of the spectral and spin properties of molecular magnets based on nickelocene adsorbed on a CuO metal layer using the spin-flip variant of time-dependent density functional theory (SF-TD-DFT).
In Chapter 6 I describe a machine learning methodology for the representation of the global reaction coordinate involved in the isomerization of retinal. In Chapter 7 I report a study on the optical properties of novel fluorescent dyes for RNA imaging. Chapter 8 explores the possible mechanisms behind the production of singlet oxygen by action of the mini xvi Singlet Oxygen Generator (miniSOG) photoactive protein. The thesis concludes with Chapter 9 where I discuss future research directions. xvii Chapter 1 Introduction “Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit of chemistry... if mathematical analysis should ever hold a prominent place in chemistry — an aberration which is happily almost impossible — it would occasion a rapid and widespread degeneration of that science." - A. Comte (1830) 1.1 Applied computational chemistry Computational chemistry is a transformative discipline that harnesses the power of computer simulations to address real-world challenges across various scientific and industrial domains. By employing cuttingedge software, this field bridges theory and practice, yielding insights and solutions that impact research, development, and innovation. Many different fields of chemistry can benefit from their computational and theoretical descriptions. Chemoinformatics revolutionized drug discovery by predicting the interactions between drug molecules and target proteins. Through molecular docking and dynamics simulations, pharmaceutical companies gain insights into binding affinities, selectivity, and potential adverse effects. The 1 modeling accelerates the identification of promising drug candidates, optimizing their properties for enhanced therapeutic effects while minimizing unwanted interactions [1]. Industries benefit from predictive simulations of catalyst behavior and reaction mechanisms. 
Computational chemistry enables researchers to explore different catalyst structures and reaction pathways, optimizing catalyst design for higher efficiency and selectivity. This leads to more sustainable and economical industrial processes [2]. Computer simulations also play a critical role in designing novel materials with tailored properties. By modeling molecules at the nanoscale, researchers can predict material properties such as conductivity, strength, and reactivity. These insights are crucial for developing advanced materials used in electronics, energy storage devices, and medical applications [3]. In environmental chemistry, modeling of the terrestrial surface aids in understanding the behavior of chemicals in the environment. Simulations can predict the fate, transport, and transformation of pollutants, informing environmental management strategies and guiding decisions about pollutant remediation and risk assessment [4]. In silico experiments also contribute significantly to energy research: simulations help optimize combustion processes, improve the efficiency of fuel cells, and explore innovative energy conversion technologies, driving the development of cleaner and more sustainable energy solutions [5, 6]. Chemists can predict reaction pathways, transition states, and kinetic parameters by performing quantum mechanical calculations and molecular dynamics simulations. These calculations guide experimental efforts toward desired outcomes, particularly for reactions with intricate mechanisms. Electronic structure calculations provide insights into the electronic states of molecules and allow the characterization of the transitions between them [7]. The impact of computational chemistry on research is reflected in its steadily growing popularity over the past forty years; Fig. 1.1 illustrates this growth for density functional theory (DFT) alone [8].

Figure 1.1: Growth of DFT publications in terms of research topics since 1980. Source: Haunschild et al. J Cheminform (2016) 8:52.

Computational chemistry, as a tool, can provide accurate results only if it is used properly. One should not blindly run simulations and analyze whatever numbers the computer prints on the screen. Before embarking on any computational study, it is essential to have a clear understanding of the scientific question or problem at hand. This clarity guides the entire research process, from choosing the appropriate methodology to interpreting the results correctly. Nowadays, there is a plethora of theoretical methods, each with its own set of approximations and limitations, and choosing the proper method is crucial to ensure reliability. For instance, DFT is often used for its balance between accuracy and computational cost, but it may not be suitable for all types of systems or properties, e.g., systems with multireference character or doubly excited states [9]. In electronic structure calculations, the choice of basis set, which defines the mathematical functions used to describe molecular orbitals, can significantly impact the results. Properly selecting basis sets and ensuring convergence with respect to their size is vital for accuracy; an insufficient basis set can lead to unreliable outcomes. In many real-world applications, molecules exist in solution, and solvent effects play a crucial role. Ignoring these effects or approximating them inadequately can result in inaccurate predictions, and the choice between implicit solvent models and explicit solvent simulations affects the reliability of the results [10, 11]. Molecules often exist in multiple conformations or isomers. Adequate sampling of these conformations is vital to capture the energy landscape of the processes of interest, because experimental observables are averages over the thermally populated conformations and must be compared with computed results on that basis.
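As a minimal illustration of why conformational sampling matters, the sketch below converts relative (free) energies of conformers into Boltzmann populations and uses them to average a computed property. It assumes that the conformers have already been located and that their energies are given in kcal/mol; the function name and the three-conformer example (energies and excitation energies) are hypothetical values chosen only to show the arithmetic.

```python
import math

# Gas constant in kcal/(mol K)
R_KCAL = 1.987204e-3

def boltzmann_average(rel_energies_kcal, values, temperature=298.15):
    """Return Boltzmann weights and the population-weighted average of `values`.

    rel_energies_kcal: relative (free) energies of the conformers in kcal/mol.
    values: the property computed for each conformer, in the same order.
    """
    if len(rel_energies_kcal) != len(values):
        raise ValueError("one property value is needed per conformer")
    kt = R_KCAL * temperature
    e_min = min(rel_energies_kcal)  # shift energies to avoid overflow/underflow
    factors = [math.exp(-(e - e_min) / kt) for e in rel_energies_kcal]
    z = sum(factors)  # partition function over the sampled conformers
    weights = [f / z for f in factors]
    average = sum(w * v for w, v in zip(weights, values))
    return weights, average

# Hypothetical example: three conformers and their excitation energies (eV)
weights, avg = boltzmann_average([0.0, 0.5, 2.0], [3.10, 3.25, 2.90])
print([round(w, 3) for w in weights], round(avg, 3))
```

Even a conformer lying 0.5 kcal/mol above the global minimum carries a substantial population at room temperature, so a property computed for a single structure can differ noticeably from the ensemble average.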
To ensure the reliability of computational results, it is essential to validate them against experimental data whenever possible. Benchmarking against well-established systems or known properties helps assess the accuracy of the chosen computational method and the limitations arising from its approximations.

1.2 Electronically excited molecules: Overview and examples

Electronically excited states form when molecules or atoms absorb enough energy to undergo a transition from their ground (lowest-energy) state to a higher-energy state. Energy can be absorbed in the form of light (photoexcitation), electricity, or heat. Many chemical reactions take place after a molecule is excited, and the fate of the excited molecule is governed by the potential energy surface (PES) of the excited state. Fig. 1.2 depicts the possible outcomes after photoexcitation of a molecule.

Figure 1.2: Different relaxation pathways for a photoexcited molecule: internal conversion (IC), intersystem crossing (ISC), fluorescence, phosphorescence. Source: Angew. Chem. Int. Ed. 2020, 59, 16832–16846.

The excited molecule, being in a metastable state, eventually decays to the ground state. These decay processes can be grouped into spin-allowed and spin-forbidden transitions. The former are transitions between states with the same spin multiplicity (e.g., singlet-singlet, triplet-triplet), whereas the latter are transitions in which the spin multiplicity changes (e.g., singlet-triplet). Both types can be further divided into radiative and non-radiative transitions. Through radiative transitions, the molecule decays to a lower-energy state by emitting a photon, via fluorescence (spin-allowed) or phosphorescence (spin-forbidden). Through non-radiative transitions, the molecule decays without emitting light, by hopping to a different state, often through a conical intersection, i.e., a point where the PESs of two or more states are degenerate.
If the hop occurs between states with the same spin multiplicity, the crossing is called internal conversion (IC); otherwise, it is called intersystem crossing (ISC) [12].

1.3 Electronic structure theory methods

To describe chemical processes involving excited states, one needs to compute the PESs of the molecular system under study accurately. Nowadays, a large number of theoretical methods allow the calculation of molecular properties such as the PES. These methods can be divided into wavefunction-based (WF) and DFT methods, both of which aim at approximating the solution of the electronic Schrödinger equation of the system.

1.3.1 Wavefunction methods

The Hartree-Fock (HF) method is one of the simplest wavefunction-based ab initio methods in quantum chemistry. It approximates the wavefunction and energy of a many-electron molecule. The key approximation in HF theory is the mean-field approximation, i.e., each electron moves in the average field created by the other electrons. The HF wavefunction is represented by a single Slater determinant constructed from a set of molecular orbitals (MOs). The MOs themselves are expressed as linear combinations of basis functions, commonly atomic orbitals centered on each atom, which allows the calculation to be cast in matrix form. The procedure begins with an initial guess of the MO coefficients. Using these MOs, the Fock operator is constructed; it describes the kinetic energy of the electrons plus their potential energy due to the nucleus-electron and averaged electron-electron interactions. The HF equations are then solved iteratively with the self-consistent field (SCF) procedure: the Fock operator generates a new set of MO coefficients, these MOs define a new Fock operator, which generates new MOs, and so on. This cycle continues until the energy and the MO coefficients converge to a self-consistent solution.
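The SCF cycle just described (initial guess, Fock build, diagonalization, filling of the lowest orbitals, convergence check) can be sketched without computing any molecular integrals. The example below deliberately swaps in a toy model: a restricted mean-field (Hartree-Fock-like) treatment of a four-site Hubbard chain in an orthonormal site basis, for which the Fock matrix is F = h + U·diag(P_ii/2) and the energy is E = Σ_ij P_ij h_ij + U Σ_i (P_ii/2)². The model, its parameters, and the damped density update are illustrative assumptions, not the molecular HF equations or any program's implementation.

```python
import numpy as np

def rhf_hubbard_scf(site_energies, hopping, u, n_electrons,
                    mixing=0.3, tol=1e-10, max_iter=1000):
    """Restricted mean-field SCF for an open 1D Hubbard chain (toy model)."""
    if n_electrons % 2:
        raise ValueError("closed-shell (even) electron count required")
    n_sites = len(site_energies)
    n_occ = n_electrons // 2
    # Core Hamiltonian: on-site energies plus nearest-neighbour hopping
    h = np.diag(np.asarray(site_energies, dtype=float))
    for i in range(n_sites - 1):
        h[i, i + 1] = h[i + 1, i] = hopping

    def fock_of(p):
        # Mean-field Fock matrix: each site feels U times the opposite-spin density
        return h + u * np.diag(np.diag(p) / 2.0)

    def density_of(fock):
        _, c = np.linalg.eigh(fock)      # orthonormal basis: F C = C eps
        c_occ = c[:, :n_occ]             # Aufbau: occupy the lowest orbitals
        return 2.0 * c_occ @ c_occ.T     # closed-shell density matrix

    p = density_of(h)                    # initial guess from the core Hamiltonian
    for iteration in range(1, max_iter + 1):
        p_new = density_of(fock_of(p))
        change = np.max(np.abs(p_new - p))
        p = (1.0 - mixing) * p + mixing * p_new   # damped (mixed) update
        if change < tol:
            break
    else:
        raise RuntimeError("SCF did not converge")
    fock = fock_of(p)
    energy = np.sum(p * h) + u * np.sum((np.diag(p) / 2.0) ** 2)
    return energy, p, fock, iteration

energy, p, fock, n_iter = rhf_hubbard_scf([0.0, 0.5, -0.3, 0.2], -1.0, 2.0, 4)
print(f"E = {energy:.6f} after {n_iter} iterations")
```

At convergence, the Fock and density matrices commute (FP = PF in an orthonormal basis), which is a convenient check of self-consistency.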
At each iteration, the HF equations are solved to find the set of orbitals that minimizes the energy. This is done by diagonalizing the Fock matrix (in a non-orthogonal atomic-orbital basis, by solving the generalized eigenvalue problem FC = SCε, where S is the overlap matrix). The lowest-energy orbitals are filled first, according to the Aufbau principle. Once convergence is reached, the final MOs are used to construct the Slater determinant, and the expectation value of the Hamiltonian with this wavefunction gives the HF energy. Though simple, HF provides a crucial starting point for more accurate post-HF methods. Two of the most popular post-HF methods are configuration interaction (CI) and coupled-cluster (CC) theory [13–15]. The key idea of CI is to represent the electronic wavefunction as a linear combination of Slater determinants rather than a single determinant as in HF theory. This provides a systematically improvable approach to recover the electron correlation energy. In CI, the wavefunction is constructed as follows:

$$\Psi_{\mathrm{CI}} = c_0\Psi_0 + c_1\Psi_1 + \dots + c_n\Psi_n, \tag{1.1}$$

where Ψ0 is the HF wavefunction, Ψi are Slater determinants representing excited configurations generated by promoting electrons from occupied to virtual orbitals, and ci are coefficients to be optimized. The excited configurations account for electron correlation through their Hamiltonian couplings with the reference determinant and with each other. The more configurations included, the more correlation energy is recovered. In full CI (FCI), every possible configuration is included. However, the number of configurations in FCI grows factorially with system size, making it infeasible for all but the smallest systems. In most CI calculations, the configuration space is therefore limited by allowing excitations only up to a certain level (e.g., singles and doubles in CISD); the trade-off is between cost and accuracy. A key step is choosing an appropriate orbital basis: canonical HF orbitals often lead to slow convergence of the CI expansion, and transforming to natural orbitals can greatly accelerate it [13].
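Two statements above can be checked numerically: the FCI space grows combinatorially, and restricting the CI expansion to a subset of configurations can only raise the lowest eigenvalue (the linear variational principle). The sketch below counts determinants as C(n_orb, n_α)·C(n_orb, n_β) and uses a random symmetric matrix as a hypothetical stand-in for a Hamiltonian in a determinant basis. It is not a molecular Hamiltonian, and the truncation simply keeps the first k configurations rather than performing a proper excitation-level selection such as CISD.

```python
import math
import numpy as np

def n_fci_determinants(n_orbitals, n_alpha, n_beta):
    """Size of the FCI space: ways to place alpha and beta electrons."""
    return math.comb(n_orbitals, n_alpha) * math.comb(n_orbitals, n_beta)

# n electrons (n/2 alpha + n/2 beta) in 2n spatial orbitals;
# 10 electrons in 20 orbitals already give 240,374,016 determinants
for n in (10, 20, 40):
    print(n, n_fci_determinants(2 * n, n // 2, n // 2))

# Hypothetical CI matrix: reference first, then increasingly excited
# configurations (higher diagonal energies), weakly coupled to each other
rng = np.random.default_rng(0)
dim = 50
coupling = 0.1 * rng.standard_normal((dim, dim))
h_ci = np.diag(np.linspace(0.0, 10.0, dim)) + (coupling + coupling.T) / 2

e_fci = np.linalg.eigvalsh(h_ci)[0]
for k in (1, 5, 20, dim):
    e_k = np.linalg.eigvalsh(h_ci[:k, :k])[0]   # CI truncated to k configurations
    print(k, round(e_k, 6), e_k >= e_fci - 1e-12)
```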
While CI provides a systematic approach to electron correlation, the truncated CI expansion may converge slowly in many cases. Even more importantly, truncated CI methods are not size extensive, which leads to errors that grow with the size of the system. More advanced methods such as CC theory were developed to overcome these issues. The CC approach provides an elegant and systematic way to represent the many-electron wavefunction including electron correlation effects, and it affords high accuracy. The CC wavefunction is written in exponential form:

$$\Psi_{\mathrm{CC}} = e^{T}\Psi_0, \tag{1.2}$$

where Ψ0 is the HF reference wavefunction and T is the cluster operator:

$$T = T_1 + T_2 + T_3 + \dots, \tag{1.3}$$

where T1, T2, T3, etc. generate single, double, triple, etc. excitations from the reference. Acting on Ψ0, T generates excited Slater determinants. Expanding the exponential in a Taylor series (shown here for T = T1 + T2, up to second order) gives:

$$\Psi_{\mathrm{CC}} = \left(1 + T_1 + T_2 + \tfrac{1}{2}T_1^2 + \tfrac{1}{2}T_1T_2 + \tfrac{1}{2}T_2T_1 + \tfrac{1}{2}T_2^2 + \dots\right)\Psi_0. \tag{1.4}$$

Products of cluster operators (e.g., ½T2², which produces quadruple excitations) thus generate higher excitations even when T itself is truncated; excitations are not simply added linearly as in CI. In CCSD, the cluster operator is truncated at double excitations (T = T1 + T2). CCSD(T) additionally includes triple excitations perturbatively, which offers an excellent balance of accuracy and cost. Solving the CC equations yields the amplitudes of the T operators, which define ΨCC. The computational cost of CCSD(T) scales steeply, as O(N^7) with system size N, compared with O(N^6) for CCSD. A major appeal of CC theory is its systematic convergence toward the FCI limit and its size extensivity: coupled cluster with singles, doubles, triples, and beyond provides a hierarchy of methods that achieve very high accuracy in ground-state calculations [14, 15]. For calculating excited states within the CC framework, one can employ the equation-of-motion coupled-cluster method (EOM-CC), which can be seen as a hybrid between CI and CC.
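Two formal points of this section can be illustrated with small matrices. First, for two non-interacting fragments A and B, the exponential ansatz factorizes, e^(T_A + T_B) = e^(T_A) e^(T_B), whereas a linear, CI-like wave operator 1 + T_A + T_B misses the product term T_A T_B; this is the origin of the size extensivity of CC and its absence in truncated CI. Second, the similarity-transformed Hamiltonian H̄ = e^(−T) H e^(T) underlying EOM-CC is not Hermitian but has the same eigenvalues as H. The operators below are hypothetical toy matrices (strictly lower-triangular, hence nilpotent, mimicking excitation operators in an excitation-ordered basis), not amplitudes from an actual calculation.

```python
import numpy as np

def exp_nilpotent(t):
    """exp(T) for a nilpotent matrix T; the Taylor series terminates exactly."""
    n = t.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, n + 1):
        term = term @ t / k              # term is now T^k / k!
        if not term.any():
            break
        result = result + term
    return result

# (i) Two non-interacting fragments, each with basis [reference, excited]
t_a = np.array([[0.0, 0.0], [0.3, 0.0]])    # hypothetical amplitude for A
t_b = np.array([[0.0, 0.0], [-0.2, 0.0]])   # hypothetical amplitude for B
eye2 = np.eye(2)
t_ab = np.kron(t_a, eye2) + np.kron(eye2, t_b)   # cluster operator of A + B

cc_combined = exp_nilpotent(t_ab)
cc_product = np.kron(exp_nilpotent(t_a), exp_nilpotent(t_b))
ci_combined = np.eye(4) + t_ab                    # linear (CI-like) wave operator
ci_product = np.kron(eye2 + t_a, eye2 + t_b)
print("exponential ansatz separable:", np.allclose(cc_combined, cc_product))
print("linear ansatz separable:", np.allclose(ci_combined, ci_product))

# (ii) Similarity transformation H -> exp(-T) H exp(T) keeps the spectrum
rng = np.random.default_rng(1)
dim = 6
m = rng.standard_normal((dim, dim))
h = (m + m.T) / 2                                           # Hermitian toy H
t = np.tril(0.3 * rng.standard_normal((dim, dim)), k=-1)    # nilpotent "T"
h_bar = exp_nilpotent(-t) @ h @ exp_nilpotent(t)
print("H-bar symmetric:", np.allclose(h_bar, h_bar.T))
print("same eigenvalues:", np.allclose(np.linalg.eigvalsh(h),
                                       np.sort(np.linalg.eigvals(h_bar).real)))
```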
Starting from a ground-state CCSD or CCSD(T) calculation, the EOM-CC wavefunction is defined as:

$$\Psi_{\mathrm{EOM}} = R\,\Psi_{\mathrm{CC}}, \tag{1.5}$$

where ΨCC is the ground-state CC wavefunction and R is a linear excitation operator (as in CI) acting on it:

$$R = R_0 + R_1 + R_2 + \dots \tag{1.6}$$

The EOM amplitudes Ri generate excited-state wavefunctions when applied to the ground state. The EOM-CC equations are derived by applying the similarity-transformed Hamiltonian, H̄ = e^(−T) H e^(T), to the EOM-CC wavefunction and requiring the energy to be stationary, which leads to an eigenvalue problem for the non-Hermitian matrix H̄ in the space of excitations spanned by R. Solving the EOM-CC equations yields excitation energies and the Ri amplitudes that define the excited-state CC wavefunctions [16, 17].

1.3.2 Density functional theory

DFT is arguably one of the most popular and versatile methods in quantum chemistry today. Rather than the wavefunction, DFT uses the electron density as the fundamental variable. The Hohenberg-Kohn theorems show that the ground-state properties of a many-electron system are uniquely determined by its electron density ρ(r). Thus, the ground-state energy is a functional of the density:

$$E[\rho] = T[\rho] + V_{ne}[\rho] + V_{ee}[\rho], \tag{1.7}$$

where T[ρ] is the kinetic energy functional, Vne[ρ] is the nuclear-electron interaction energy, and Vee[ρ] is the electron-electron repulsion energy. The exact forms of T[ρ] and Vee[ρ] are unknown. In Kohn-Sham DFT, the kinetic energy is evaluated from a set of auxiliary one-electron orbitals, the classical (Hartree) part of Vee[ρ] is computed exactly, and the remaining exchange and correlation contributions (including the difference between the true and auxiliary kinetic energies) are collected in the exchange-correlation functional Exc[ρ], which must be approximated. The first four rungs of Jacob's ladder of approximations are: 1) the local density approximation (LDA), in which Exc depends only on the density at each point; 2) the generalized gradient approximation (GGA), in which Exc also depends on the gradient of the density; 3) meta-GGAs, which additionally include the kinetic energy density and/or the Laplacian of the density; and 4) hybrid functionals, which mix a fraction of HF exchange with DFT exchange [18, 19]. In Kohn-Sham DFT, the equations that determine the density of a system are solved with a self-consistent field approach, as in HF.
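As an example of the first rung, the LDA (Dirac/Slater) exchange energy of a spin-unpolarized density is E_x^LDA[ρ] = −(3/4)(3/π)^(1/3) ∫ ρ(r)^(4/3) dr. The sketch below evaluates this integral by radial Simpson quadrature for the hydrogen-like 1s density ρ(r) = e^(−2r)/π in atomic units. This density is chosen only because the integral also has the closed form 4π^(−1/3)·2/(8/3)³, which serves as a check; treating a one-electron density as spin-unpolarized is a simplification of this illustration.

```python
import math

C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)   # Dirac/Slater exchange constant

def density_1s(r):
    """Hydrogen-like 1s density (atomic units); integrates to one electron."""
    return math.exp(-2.0 * r) / math.pi

def lda_exchange(density, r_max=30.0, n_points=20000):
    """E_x^LDA = -C_X * integral of rho^(4/3) over space, radial Simpson rule."""
    if n_points % 2:
        n_points += 1                      # Simpson's rule needs an even count
    h = r_max / n_points
    total = 0.0
    for i in range(n_points + 1):
        r = i * h
        weight = 1 if i in (0, n_points) else (4 if i % 2 else 2)
        total += weight * 4.0 * math.pi * r * r * density(r) ** (4.0 / 3.0)
    return -C_X * total * h / 3.0

exact = -C_X * 4.0 * math.pi ** (-1.0 / 3.0) * 2.0 / (8.0 / 3.0) ** 3
print(lda_exchange(density_1s), exact)   # both about -0.213 hartree
```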
DFT offers a good balance between accuracy and computational cost, which scales roughly as O(N^3) with system size. It has become an indispensable tool for calculating ground-state electronic properties of large molecules and solid-state systems, although the results still depend heavily on the chosen functional. For excited-state properties, one needs to extend DFT to its time-dependent variant (TD-DFT). TD-DFT is based on the Runge-Gross theorem, which establishes a one-to-one mapping between time-dependent external potentials and time-dependent densities for a given initial state. This affords the computation of excitation energies and other properties. As for ground-state DFT, the choice of the functional affects the accuracy of the calculations [18, 19].

1.4 Treatment of the environment

Many chemical reactions take place in a liquid, a solid, a protein, or other complex media. Modeling the interactions between molecules and their environment is therefore crucial for simulating real systems in practical applications. Protein-ligand binding, solubility, permeability, and other pharmaceutical properties are dictated by interactions with the solvent and with the active site of the protein. Light absorption and energy transfer processes in solution depend heavily on solvent relaxation effects. Mechanical, electrical, and thermal properties of materials arise from intermolecular interactions, and lattice vibrations, defect formation, and phase transitions require condensed-phase modeling. Although ab initio theories can accurately predict properties of molecular systems from first principles, their computational cost prohibits their application to large environments. To address this issue, one can rely on hybrid quantum mechanics/molecular mechanics (QM/MM) methods.

Figure 1.3: Partition of a molecular system into QM and MM regions. Source: Methods Mol. Biol. 2013, 924, 43–66.

The key idea behind QM/MM methods is to partition the system into two regions [20, 21] (Fig. 1.3):
1) A QM region, where atoms are treated with an accurate ab initio method.
2) An MM region, where the atoms of the environment are treated at a lower and more affordable level of theory, namely molecular mechanics.

Within MM methods, the potential energy of a system is expressed as a sum of bonded contributions, due to bond stretching, angle bending, and dihedral torsion, and non-bonded contributions, due to Coulomb and van der Waals interactions:

$$E_{\text{bonded}} = \sum_{\text{bonds}} k_r (R - R_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} k_\phi \left[1 + \cos(n\phi - \delta)\right], \tag{1.8}$$

$$E_{\text{non-bonded}} = \sum_{i<j}^{\text{atoms}} \frac{q_i q_j}{4\pi\varepsilon_0 R_{ij}} + \sum_{i<j}^{\text{atoms}} 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{R_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{R_{ij}}\right)^{6}\right], \tag{1.9}$$

where R0 and θ0 are the equilibrium bond lengths and angles, n and δ are the periodicity and phase of each torsional term, qi are atomic partial charges, Rij are interatomic distances, and εij and σij are the Lennard-Jones well depth and size parameters. These parameters (force constants and other physical properties of each energy term) are obtained by fitting to experimental data or highly accurate QM calculations and are collected in libraries known as force fields (FFs). Interactions between the QM and MM regions can be modeled within mechanical embedding, where the QM-MM interactions are treated at the MM level (e.g., only van der Waals forces between QM and MM atoms are considered) and the QM electron density is not polarized by the environment. A better treatment is given by electrostatic embedding, where the electrostatic interactions between the QM electron density and the point charges of the MM atoms are explicitly incorporated as one-electron integrals in the Hamiltonian of the QM region:

$$\hat{H}^{\text{QM}}_{\text{E. Embed.}} = \hat{H}^{\text{QM}}_{\text{el}} - \sum_{i}^{\text{QM MOs}} \left\langle \psi_i \left| \sum_{j}^{\text{MM atoms}} \frac{e^2 Q_j}{4\pi\varepsilon_0 |\mathbf{r}_i - \mathbf{R}_j|} \right| \psi_i \right\rangle. \tag{1.10}$$

In Eq. 1.10, Ĥ^QM_el is the electronic Hamiltonian of the isolated QM system, ψi labels the QM molecular orbitals, ri is the position of an electron occupying the i-th MO, Rj is the position of the j-th MM atom, and Qj is its point charge in units of e [20, 21]. The quality of the chosen FF, the treatment of the boundary and of the interactions between the QM and MM regions, and the arbitrary size of the QM region make benchmarking QM/MM methods challenging, as multiple factors affect the overall accuracy.
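The additive form of Eqs. 1.8 and 1.9 maps directly onto code. The sketch below evaluates one term of each type with hypothetical parameters (not taken from AMBER99 or any other force field), in kcal/mol, Å, degrees, and elementary charges, with 1/(4πε0) ≈ 332.0637 kcal·Å/(mol·e²). Real force fields additionally exclude or scale non-bonded interactions between atoms separated by up to three bonds, which this sketch omits.

```python
import math

COULOMB_K = 332.0637  # 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2)

def bond_energy(r, k_r, r0):
    """Harmonic bond stretching, k_r (R - R0)^2."""
    return k_r * (r - r0) ** 2

def angle_energy(theta_deg, k_theta, theta0_deg):
    """Harmonic angle bending; k_theta in kcal/(mol rad^2)."""
    return k_theta * math.radians(theta_deg - theta0_deg) ** 2

def dihedral_energy(phi_deg, k_phi, n, delta_deg):
    """Periodic torsion, k_phi [1 + cos(n phi - delta)]."""
    return k_phi * (1.0 + math.cos(math.radians(n * phi_deg - delta_deg)))

def coulomb_energy(q_i, q_j, r_ij):
    """Coulomb interaction between two point charges."""
    return COULOMB_K * q_i * q_j / r_ij

def lennard_jones_energy(r_ij, eps_ij, sigma_ij):
    """12-6 Lennard-Jones van der Waals term."""
    sr6 = (sigma_ij / r_ij) ** 6
    return 4.0 * eps_ij * (sr6 * sr6 - sr6)

# Hypothetical parameters for illustration only
e_bonded = (bond_energy(1.10, k_r=340.0, r0=1.09)
            + angle_energy(112.0, k_theta=50.0, theta0_deg=109.5)
            + dihedral_energy(60.0, k_phi=0.15, n=3, delta_deg=0.0))
e_nonbonded = (coulomb_energy(0.4, -0.4, 3.0)
               + lennard_jones_energy(3.5, eps_ij=0.1, sigma_ij=3.2))
print(f"E_bonded = {e_bonded:.4f} kcal/mol, E_non-bonded = {e_nonbonded:.4f} kcal/mol")
```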
Defining the QM region is itself subjective, relying on chemical intuition and trial-and-error testing, so its size and the particular atoms selected differ depending on user choices. Larger QM regions reduce boundary artifacts but increase the computational expense. Partitioning schemes and link atom placements also vary, and these inconsistencies are amplified in the subsequent QM and QM/MM calculations. The choices of QM method, MM force field, and treatment of their interactions introduce additional variability. While the QM method may be standardized, MM force fields can differ in their parameter sets. The treatment of electrostatic embedding using point charges is also implementation-dependent: charge redistribution, damping, and cutoff handling at the boundary differ between programs [20, 21]. Furthermore, the algorithms and convergence criteria used for optimizing geometries and locating transition states are not standardized. Hardware architectures and the integration with classical MD software add to the complexity, and the use of microiterations, constraints, reaction coordinates, and other technical aspects of exploring potential energy surfaces also differs. With so many coupled layers of variability, it becomes very challenging to achieve reproducible results between QM/MM software packages, even with the same conceptual partitioning and Hamiltonian model. Therefore, carefully reporting every technical detail involved in the simulations is crucial, and establishing standardized QM/MM protocols will be an important step toward reducing implementation uncertainties.

Chapter 2 How Reproducible are QM/MM Simulations? Lessons from Computational Studies of the Covalent Inhibition of the SARS-CoV-2 Main Protease by Carmofur

2.1 Introduction

Computer simulations of structures and properties of biomolecules are now routinely used to aid biomedical studies, including the characterization of prospective drug candidates and their interactions with pathogens' enzymes.
Among various simulation tools, quantum mechanics/molecular mechanics (QM/MM) approaches [20, 22] play an important role because they are able to describe the making and breaking of chemical bonds. The computational search for efficient covalent inhibitors, which act by binding covalently to the protein, relies on QM/MM as an essential tool. The COVID-19 pandemic prompted massive research efforts to reveal the mechanisms of action of SARS-CoV-2 enzymes at the molecular level, with the ultimate goal of designing drugs to fight the disease [23]. In the past two years, numerous computational papers have been published describing non-covalent and covalent inhibitors that can potentially inactivate these enzymes [24–34]. Given the urgency and the significance of the subject, the reliability and reproducibility of the results of these and future simulations are of utmost importance. Reproducibility of computational modeling of biological systems is not trivial because of the complexity of the underlying theoretical models, of the computational protocols implementing these models, and of the software stacks executing these protocols [35]. The reporting standards developed for electronic structure calculations [36] are simply not sufficient in this context. The question of reproducibility of research results is, of course, much broader than molecular simulations. A recent study [37] investigated the reproducibility of the computational results from a random sample of computational papers published in Science since 2011. The authors were able to reproduce the findings of only 44% of the studies and attributed the difficulties to a variety of problems, ranging from the authors' desire to protect their data or software to the lack of standards and mechanisms for depositing digital artifacts, as well as the complexity of the data and protocols.
Given that biomolecular simulations are much more complex than an average computational study (in terms of the protocols, the sheer size of the data, and the codes), the problems of reproducibility are likely to be more severe and would not be easily addressed by the proposed policies [38]. The rapid pace of publications reporting computational studies related to COVID-19, well justified by the urgency of the situation, calls for a careful assessment of various aspects of QM/MM simulations. We do not imply that the software has not been properly tested or that the algorithms are not reliable; rather, we point out that the results depend on numerous parameters hidden in the simulation setups. The pitfalls of biomolecular QM/MM simulations are well known: QM/MM is not a black-box tool, the computational protocols are not standardized, the software is constantly evolving, and computational workflows are not fully automated [35]. The reproducibility of QM/MM results is further hindered by the omission from published papers of details that authors often perceive as minor, irrelevant, or trivial. In this chapter, we explore what level of transparency in reporting the details is required for practical reproducibility of QM/MM simulations, with the aim of providing a guide for future studies in the spirit of the IUPAC guidelines [36]. We use the reaction of an essential SARS-CoV-2 enzyme, the main protease (Mpro) [39], with a covalent inhibitor, carmofur [40], as a test case of chemical reactions in biomolecules. Mpro is a cysteine protease; in SARS-CoV-2, it catalyzes the cleavage of the viral polyprotein into functional proteins, which is a key step in the replication of the virus in human cells [41]. Carmofur is an approved drug for other diseases [42]. Its presumed mode of inhibition of Mpro is shown in Fig. 2.1.
The deprotonated side chain of the catalytic residue Cys145 of Mpro, formed upon proton transfer to His41, attacks the electrophilic carbon atom of the carmofur tail attached to the fluoro-uracil warhead. This results in a covalently bound adduct, thus blocking the function of the enzyme. Computational characterization of such a reaction entails calculating the energy profile along the reaction coordinate. These calculations can confirm (or dispute) proposed mechanisms and provide insight into the elementary steps involved. Comparing the reaction energy profiles computed for different candidate molecules can then be used to evaluate their relative effectiveness in deactivating the enzyme. We carried out QM/MM calculations to determine the structures and energies of the reactants, the product, and the transition state/intermediate for the reaction of carmofur with Mpro. The reaction of Mpro with another covalent inhibitor (called N3) was recently investigated computationally by two expert groups [29, 32] using QM/MM and molecular dynamics simulations with QM/MM potentials. Although the two papers report reaction energies within 3 kcal/mol of each other, the differences in the reaction barriers are much larger (up to 10 kcal/mol), illustrating the extent to which the results can be affected by different QM/MM-based schemes. We employed two different software packages, NWChem [43] and Q-Chem [44, 45], in order to take advantage of the technical features available in each. Using analogous QM/MM models within the two packages, we constructed segments of the potential energy profile for the reaction in the enzyme's active site.
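Comparing energy profiles computed with different setups, as in the two N3 studies mentioned above, reduces to converting the total energies of the stationary points into relative energies. The helper below does so for a reactant, a transition state (or intermediate), and a product, using 1 hartree = 627.5095 kcal/mol. The function names and the total energies in the example are hypothetical placeholders, not results of this work.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree

def profile_energetics(e_reactant, e_ts, e_product):
    """Barrier and reaction energy (kcal/mol) from total energies in hartree."""
    barrier = (e_ts - e_reactant) * HARTREE_TO_KCAL
    reaction = (e_product - e_reactant) * HARTREE_TO_KCAL
    return barrier, reaction

def compare_profiles(profile_a, profile_b):
    """Absolute differences (kcal/mol) in barrier and reaction energy."""
    barrier_a, reaction_a = profile_energetics(*profile_a)
    barrier_b, reaction_b = profile_energetics(*profile_b)
    return abs(barrier_a - barrier_b), abs(reaction_a - reaction_b)

# Placeholder total energies (hartree) for two hypothetical setups;
# only relative energies within each setup are meaningful
setup_1 = (-1000.000000, -999.980000, -1000.010000)
setup_2 = (-1500.000000, -1499.975000, -1500.012000)
print(profile_energetics(*setup_1))
print(compare_profiles(setup_1, setup_2))
```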
Our main benchmarking goal was to reproduce the key energetics with the two packages. In the ideal case of perfectly reproducible protocols, the structures of the key stationary points along the reaction profile and the respective energetics computed by the two software packages should be identical within small error bars consistent with the numerical thresholds used in the calculations. Our results indicate that such agreement is difficult to achieve.

Figure 2.1: The key step of the reaction between carmofur and Mpro: the nucleophilic attack of the thiolate of the catalytic cysteine.

QM/MM is a versatile approach [20, 22] for multiscale modeling, suitable for simulating chemical processes in complex environments such as solutions, solids, interfaces, or proteins. The key idea is to partition the system into the important part (i.e., the subsystem where the chemical reaction occurs), treated quantum mechanically, and the environment, treated by less demanding methods, e.g., classical forcefields. However, there is no unique recipe for how to divide the system into QM and MM parts, how to treat them, and how to describe their interaction. Even at a high level of generalization, as depicted in Fig. 2.2, QM/MM theory comprises multiple models and techniques. The ensuing computational protocols are far from black box and tend to be system specific.

Figure 2.2: Various components defining a QM/MM protocol.

To complicate matters further, different software implementations of QM/MM models may lead to differences in properties computed with seemingly identical protocols. Once the QM and MM parts are defined, one needs to specify how to treat them (i.e., the level of theory for the QM and MM parts), their boundary (i.e., what to do with the broken bonds), and how to describe their interaction (i.e., the embedding type).
We describe the QM part with density functional theory (DFT) at the PBE0-(D3)/6-31G* level of theory and the MM part with the AMBER99 forcefield. We saturate the broken bonds with hydrogen link atoms and use the electrostatic embedding QM/MM scheme, which can account for changes in the charge distribution during the reaction. Specifying the above details defines the essential features of the QM/MM setup; however, as we show below, this alone is not sufficient for reproducibility of the results, especially between different software packages. Below, we analyze the impact of other parameters on the computed properties and quantify their effect by comparing the results computed with the two software packages. We consider:
• Details of treating the boundary between the QM and MM regions, specifically, charge redistribution schemes;
• The geometry optimization protocols, specifically, whether micro-iterations are used and how they are implemented;
• Versions of the AMBER99 forcefields and processing of the topology files;
• Details of the grid and dispersion correction used in the DFT calculations.
In addition to these details, which may, at least in principle, be specified by a precise description of the implementation and of the relevant input keywords/parameters, there is always the concern of how the workflow is executed. Numerous tasks involved in setting up the calculations are not fully automated and often involve manual inspection of the structure (e.g., to assign proper protonation states of histidines and other titratable residues); in-house scripts are often used to convert topology files and coordinates from one software to another (e.g., to convert the results of the initial equilibration by classical molecular dynamics into inputs for QM/MM models). This creates additional challenges for reproducing the results of QM/MM simulations, even within the same research group.
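One way to make the required level of reporting concrete is to store every setup choice in a machine-readable record and compare the records of two runs. The dataclass below is a hypothetical sketch (the field names are ours, not input keywords of Q-Chem or NWChem), filled in with the settings stated in this chapter; fields the text leaves open, such as the forcefield version and the DFT grid, are left as None to mark that they still have to be reported. Comparing the two records flags the boundary-charge scheme and the use of micro-iterations, both described in Section 2.2, as the differences between the two setups.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

@dataclass(frozen=True)
class QMMMSetup:
    """Hypothetical record of the choices that define a QM/MM protocol."""
    program: str
    qm_method: str = "PBE0-(D3)"
    basis_set: str = "6-31G*"
    forcefield: str = "AMBER99"
    forcefield_version: Optional[str] = None       # to be reported explicitly
    embedding: str = "electrostatic"
    link_atoms: str = "hydrogen"
    boundary_charge_scheme: str = "unspecified"
    electrostatic_cutoff: Optional[float] = None   # None = no cutoff applied
    microiterations: Optional[Tuple[int, int]] = None  # (QM steps, MM steps)
    dft_grid: Optional[str] = None                 # to be reported explicitly

def diff_setups(a, b, ignore=("program",)):
    """Return {field: (value_in_a, value_in_b)} for every differing setting."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if k not in ignore and da[k] != db[k]}

qchem = QMMMSetup(program="Q-Chem", boundary_charge_scheme="shift (HLINK)")
nwchem = QMMMSetup(program="NWChem", boundary_charge_scheme="Z1",
                   microiterations=(10, 300))
for key, (value_a, value_b) in diff_setups(qchem, nwchem).items():
    print(f"{key}: Q-Chem={value_a!r}, NWChem={value_b!r}")
```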
2.2 Details of the QM/MM simulations

Figure 2.3: Illustration of the link atom approach. QM and MM atoms are marked as Qi and Mi, respectively. The covalent bond between Q1 and M1 is cut and saturated by the link atom L. To avoid overpolarization, the MM charges on the boundary (M1 here) are set to zero; in some approaches, the excess charge is redistributed among the neighboring atoms to conserve the total charge. Image adapted with permission from Senn and Thiel, Angew. Chem., Int. Ed. 2009, 48, 1198–1229.

Figure 2.4: Definition of the moderate-size QM region (83 atoms). Carbon atoms are colored green, oxygen red, nitrogen blue, hydrogen white, fluorine cyan, and sulfur orange. The hydrogen link atoms are shown in dark grey; they correspond to His41 (LH), Asn142 (LA1, LA2), and Cys145 (LC). The designations of several atoms in the system that are referenced below (e.g., C7, O7, N7 from carmofur, SG from Cys145, etc.) are specified.

Figure 2.3 illustrates the treatment of covalent bonds at the QM/MM boundary with the link atom approach. Here, we used hydrogens as link atoms to saturate the dangling bonds; however, this alone does not fully define the model. The exact placement of the link atoms and the definition of charges in the boundary region vary among implementations [46–48]. We used two different QM/MM partitioning schemes: one with a moderate-size QM system (83 atoms) and one with a larger QM system (155 atoms). Figure 2.4 shows the moderate-size QM region, which is used in most of our QM/MM calculations. It comprises 83 atoms belonging to the carmofur molecule, side-chain/backbone atoms from His41, Asn142, Gly143, and Cys145, one water molecule, and four hydrogen link atoms. As Figure 2.4 shows, five covalent bonds are cut in this QM/MM partitioning. Cutting the Cys145 chain entails adding the hydrogen link atom (LC) at the bond between CA(Cys145) (i.e., Q1 in Fig. 2.3) and C(backbone) (i.e., M1 in Fig. 2.3).
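Since the placement of the link atom is one of the details that differ between implementations, a sketch of two common conventions may help: putting the hydrogen L on the Q1-M1 bond vector at a fixed distance from Q1, or at a fixed fraction g of the Q1-M1 distance. The 1.09 Å C-H distance and the ratio g = 0.71 used below are illustrative assumptions, and neither function is claimed to reproduce what Q-Chem or NWChem does internally.

```python
import math

def place_link_atom(q1, m1, d_link=1.09):
    """Put the link atom on the Q1->M1 bond at a fixed distance d_link from Q1.

    q1, m1: Cartesian coordinates (Angstrom) of the boundary QM and MM atoms.
    """
    bond = [m - q for q, m in zip(q1, m1)]
    length = math.sqrt(sum(c * c for c in bond))
    if length == 0.0:
        raise ValueError("Q1 and M1 coincide")
    return tuple(q + d_link * c / length for q, c in zip(q1, bond))

def place_link_atom_scaled(q1, m1, g=0.71):
    """Alternative convention: L = Q1 + g (M1 - Q1) with a fixed ratio g."""
    return tuple(q + g * (m - q) for q, m in zip(q1, m1))

# Hypothetical CA (Q1) and backbone C (M1) positions, 1.53 Angstrom apart
ca = (0.0, 0.0, 0.0)
c_backbone = (1.53, 0.0, 0.0)
print(place_link_atom(ca, c_backbone))          # (1.09, 0.0, 0.0)
print(place_link_atom_scaled(ca, c_backbone))
```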
Correspondingly, in Fig. 2.3 the Q2 atoms are CB(Cys145), N(backbone), H; the M2 atoms are O and N; the M3 atoms are C and H. Although cutting the non-polar bonds (such as CA-CB of the His41 side 19 chain) is preferred, in our QM/MM partitioning we choose to cut some polar bonds ([CA-C(O)] or [CAN(H)]) in order to include the backbone atoms forming important hydrogen bonds with the substrate into the QM subsystem. The large QM system (155 atoms) comprises the moderate-size QM system plus the side chain/backbone atoms from Thr25, Thr26, Leu27, Leu141, Asn142, Gly146, His164, Met165, Asp187, four water molecules, and 15 link atoms. The relevant details are given in the Supporting Information (SI). When using electrostatic embedding, the QM region is polarized by the Coulomb potential due to the MM point charges. The charges are determined by the forcefield and the treatment of the boundary. Forcefields are generally well documented, although some variations exist among different software packages that do not share the same topology with the AMBER99 parameters. For example, NWChem and Q-Chem differ in description of the CA atom-types of the C- and N-terminal amino-acids. In addition, the boundary treatments may also vary, which, as we show below, can lead to substantial discrepancies in QM energies; similar observations have been also made by Lin and Truhlar[48]. To avoid overpolarization of the QM region, the charges of the boundary atoms (M1 in Fig. 2.3) are usually set to zero; however, the subsequent treatment varies—some implementations redistribute the charges of the boundary atoms among the neighboring atoms to preserve the total charge whereas others simply ignore it; the redistribution schemes can also vary[46, 48]. The Q-Chem calculations were carried out using the HLINK option implemented in a developer’s version of Q-Chem. The charge of the boundary atom (M1 in Fig. 
2.3) was set to 0 and its original forcefield charge was distributed uniformly among the neighboring MM atoms; e.g., with three M2 atoms, 1/3 of the original charge on M1 was added to each of them. This procedure is automated in the Q-Chem HLINK implementation and was verified by additional single-point calculations in which the QM energies were computed in the field of manually prepared point charges. The NWChem calculations were carried out using the “mm_charges exclude none” option, which includes all MM point charges in the calculation except those located on the covalent QM/MM boundary; i.e., the charge of M1 is set to 0 and is not redistributed over the neighboring MM atoms. The two protocols (Q-Chem and NWChem) are equivalent to the “shift” and “Z1” schemes of Ref. 48, respectively.

To reduce the cost of the MM force evaluation, classical molecular dynamics simulations often use electrostatic cutoffs (10-14 Å). In QM/MM simulations, electrostatic cutoffs also speed up the evaluation of the one-electron contributions to the Hamiltonian. However, such cutoffs can cause problems in geometry optimization, e.g., convergence issues when charges cross the cutoff sphere. Therefore, we did not apply cutoffs to the electrostatic contributions, i.e., the cutoff radius was larger than the system size. This detail is crucial for achieving convergence in the energy minimization procedure.

The results of the calculations can also be affected by the details of the geometry optimization algorithms (full QM/MM optimization versus micro-iterations). The QM energy includes the Coulomb interaction with the MM region. The MM energy includes the forcefield interactions between the MM atoms and the van der Waals interactions between the QM and MM parts. In the standard optimization step, the total energy and gradient include the electrostatic interaction between the MM charges and the polarized electron density of the QM system.
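The two boundary protocols differ only in what happens to the charge removed from M1. A minimal sketch, assuming a toy boundary in which one M1 atom is bonded to three M2 atoms (the charges are hypothetical, not AMBER99 values, and the function names are our own), contrasts uniform redistribution, as described above for Q-Chem HLINK, with simple zeroing, as described for the NWChem “mm_charges exclude none” option.

```python
from typing import Dict, List


def zero_boundary_charge(charges: Dict[str, float], m1: str) -> Dict[str, float]:
    """Z1-like treatment: set the M1 charge to zero, no redistribution."""
    new = dict(charges)
    new[m1] = 0.0
    return new


def redistribute_boundary_charge(charges: Dict[str, float], m1: str,
                                 m2_atoms: List[str]) -> Dict[str, float]:
    """Zero M1 and add q(M1)/n to each of its n M2 neighbors (total charge conserved)."""
    new = dict(charges)
    share = new[m1] / len(m2_atoms)
    new[m1] = 0.0
    for name in m2_atoms:
        new[name] += share
    return new


# Hypothetical boundary: M1 bonded to three M2 atoms (charges in e).
mm = {"M1": 0.60, "M2a": -0.50, "M2b": 0.05, "M2c": 0.05}
z1 = zero_boundary_charge(mm, "M1")
shifted = redistribute_boundary_charge(mm, "M1", ["M2a", "M2b", "M2c"])
print("total MM charge: original, Z1, redistributed")
print(round(sum(mm.values()), 6), round(sum(z1.values()), 6), round(sum(shifted.values()), 6))
```

In this toy case the original and redistributed charge sets both sum to 0.2 e, whereas the zeroed set sums to -0.4 e; the QM region therefore sees a different external potential near the boundary, which is the kind of difference the chapter traces to the “QM + QM/MM” energy term.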
To speed up calculations, NWChem affords a multi-region optimization procedure (called micro-iterations), in which each optimization cycle consists of M steps (10 in our calculations) optimizing the QM region with the MM region frozen, followed by N steps (300 in our case) optimizing the MM region with the QM region frozen. In the MM micro-iterations, the QM/MM electrostatic interaction is described using one of two options: “density espfit” or “density static”. The first option approximates the electron density of the QM region by point charges obtained at the end of the QM optimization cycle, whereas the second option uses the exact frozen electron density of the QM region computed at the end of the QM optimization cycle[43]. As discussed below, the two schemes yielded slightly different structures. The micro-iteration feature is not available in Q-Chem, so the QM and MM regions are optimized together in each cycle. Q-Chem can only carry out full unconstrained optimizations because the current implementation of the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm[49] does not allow for constrained geometry optimizations or saddle point searches. Hence, we could only compare optimized structures of the reactants, products, and intermediates, but not of transition states.

We prepared the model system of the enzyme-carmofur complex starting from the coordinates of the heavy atoms in the Protein Data Bank (PDB) structure 7BUY [40]. Because the PDB structure contains only the aliphatic carmofur tail covalently bound to the catalytic Cys145 residue, the entire carmofur molecule with the fluoro-uracil warhead (see Fig. 2.1) was manually docked into the active site after careful inspection of the structure of the reaction product.
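The micro-iteration scheme described above is a block-wise (alternating) minimization. A toy sketch, assuming a two-coordinate quadratic energy in which x stands in for the QM coordinates and y for the MM coordinates, alternates M steepest-descent steps on x (y frozen) with N steps on y (x frozen). The energy function, step size, and cycle count are hypothetical, and real codes use quasi-Newton optimizers rather than plain steepest descent.

```python
from typing import Tuple


def energy(x: float, y: float) -> float:
    """Toy coupled energy; the x*y term mimics QM/MM coupling. Minimum at x = y = 0."""
    return 2.0 * x * x + y * y + x * y


def grad(x: float, y: float) -> Tuple[float, float]:
    return 4.0 * x + y, 2.0 * y + x


def micro_iterations(x: float, y: float, m_qm: int = 10, n_mm: int = 300,
                     cycles: int = 20, step: float = 0.1) -> Tuple[float, float]:
    """Alternate m_qm steps on the 'QM' block with n_mm steps on the 'MM' block."""
    for _ in range(cycles):
        for _ in range(m_qm):      # 'QM' block moves, 'MM' block frozen
            gx, _ = grad(x, y)
            x -= step * gx
        for _ in range(n_mm):      # 'MM' block moves, 'QM' block frozen
            _, gy = grad(x, y)
            y -= step * gy
    return x, y


x_opt, y_opt = micro_iterations(1.0, -1.0)
print(x_opt, y_opt, energy(x_opt, y_opt))
```

Because MM steps are cheap, many of them per QM step are affordable; the price is that during the MM steps the QM side is represented only approximately (frozen density or ESP-fitted charges), which is consistent with the two NWChem options converging to slightly different structures.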
Protons were added to the amino acid residues according to their conventional protonation states at neutral pH; i.e., all Arg and Lys residues were positively charged, and Glu and Asp were negatively charged. Protons were added to the N-terminal Ser and C-terminal Gln residues so as to yield positive and negative charges, respectively. The histidine residues were protonated according to the hydrogen-bond pattern implied by the heavy-atom positions, i.e., Nϵ-protonated His64, His163, His164, His172, and His246, and Nδ-protonated His41 and His80. The water molecules from the initial crystal structure were kept. The missing protons were added at the model topology generation step with psfgen from the NAMD suite[50]. We fully solvated the protein in a TIP3P water box and then manually removed water molecules that were farther than 3 Å from both carmofur and the protein surface. The final model system contained 1,250 water molecules, including those present in the crystal structure PDB 7BUY. Following QM/MM partitioning, we optimized these model systems with NWChem as described below. To generate input files for the Q-Chem calculations, we employed the Tinker package [51], which can read PDB files, recognize the names of the amino acid residues, and generate topology files suitable for Q-Chem by assigning atom type labels according to the Tinker convention (also used by Q-Chem). Some atom labels used in NWChem were adjusted to match the convention of the AMBER99 forcefield as implemented in Tinker; we show an example (the serine side chain) in the SI. We also provide the inputs for the QM/MM simulations for both packages. We treated the QM part using the PBE0 functional [52] with the 6-31G* basis set and the MM part using the AMBER99 forcefield[53]. We used default grids (SG-1) and the original variant of Grimme’s D3 correction[54].
We note that the Q-Chem implementation uses by default slightly different damping functions than NWChem; however, the same damping functions can be deployed using the appropriate keywords. The details of the grid, the exact variant of the D3 correction, and the relevant input keywords are given in the SI. We also investigated the effect of the functional choice on the reaction energetics and report additional results obtained with ωB97X-D[55, 56].

We consider the segment of the potential energy surface (PES) between the reactant (denoted below as REAC) and the product (denoted as PROD) of the reaction of Mpro with carmofur. PROD corresponds to the covalent complex between Mpro and the aliphatic tail of carmofur (whose structure can be compared to the crystallographic data) with the separated fluoro-uracil warhead in the active site. REAC corresponds to the reactive conformation of the reactants, formed upon proton transfer within the catalytic dyad from the initially neutral side chain of Cys145 to the initially neutral side chain of His41 (see the upper part of Fig. 2.1). We did not model the initial enzyme-substrate (ES) complexes with the neutral Cys145 and His41 species, because the initial proton-transfer step from Cys/His to Cys−/His+ in cysteine proteases is well studied[28, 57, 58] and is not considered to be critical for the reaction mechanism. To investigate the effect of the QM size, we computed the reaction profile using the two QM/MM partitioning schemes described above (83 and 155 QM atoms). We consider two protocols for computing the reaction profile. In Protocol 1, we use the same structures (optimized with NWChem) to carry out single-point energy calculations with NWChem and Q-Chem.
In Protocol 2, we compute the reaction energy profile using structures optimized with the respective packages, i.e., NWChem energetics are computed using NWChem-optimized structures and Q-Chem energetics using Q-Chem-optimized structures (the Q-Chem optimizations were started from the NWChem-optimized structures). As discussed below, the agreement between the NWChem- and Q-Chem-optimized structures is reasonable but not perfect. For the moderate-size QM part, we located the minimum energy structures of REAC and PROD and the respective transition state (TS) by optimizing the geometry with the density espfit option in NWChem. The REAC and PROD structures were re-optimized for the 155-atom QM part with both the density espfit and density static options. Using Protocol 2 (geometry optimization in each software package), we obtained the structures and energies of three minimum energy points: REAC, PROD, and the reaction intermediate (INT). Below we compare the total QM/MM energies and various individual contributions for each calculation. We emphasize that we compare the results obtained with NWChem and Q-Chem using precisely the same QM/MM partitioning, the same MM parameters, and the same QM level of theory. We note that these are often the only computational details reported in QM/MM studies.

2.3 Analysis of QM/MM results: Protocol 1

Figure 2.5: Alignment of the PROD and X-ray structures. Distances are in Å. The values in italics refer to the crystal structure. Hydrogen atoms are omitted.

We begin by analyzing the three key points on the PES (REAC, TS, and PROD) located with NWChem using the (more economical) density espfit option with the moderate-size (83 atoms) QM part. In this calculation we were able to locate the true TS structure as the stationary point with a single imaginary frequency of 293i cm−1 (the frequency calculation was also carried out with NWChem).
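Stationary points are classified by counting imaginary vibrational frequencies: none for a minimum (REAC, INT, PROD) and exactly one for a first-order saddle point (TS). A minimal sketch, assuming the common convention that imaginary frequencies are reported as negative numbers in cm−1 (so 293i cm−1 appears as -293.0) and an arbitrary noise threshold for near-zero modes; apart from the 293i mode, the frequency lists below are hypothetical.

```python
from typing import List


def classify_stationary_point(freqs_cm1: List[float], noise: float = 20.0) -> str:
    """Classify a stationary point from its harmonic frequencies (cm^-1).

    Imaginary modes are assumed to be given as negative values; modes with
    |nu| < noise are treated as numerical noise (e.g., residual rotations).
    """
    n_imag = sum(1 for f in freqs_cm1 if f < -noise)
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "first-order saddle point (TS)"
    return f"higher-order saddle point ({n_imag} imaginary modes)"


print(classify_stationary_point([-293.0, 35.2, 61.8, 1650.4]))
print(classify_stationary_point([12.5, 48.0, 1702.3]))
```

A single imaginary mode is necessary but not sufficient: the mode should also point along the intended reaction coordinate, which is what the displacement vectors in the TS panel of Fig. 2.6 are meant to show.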
It is instructive to compare the results of QM/MM optimization with the only available piece of experimental information: the X-ray structure of the enzyme deactivated by the reaction with carmofur. Fig. 2.5 shows the structure of PROD superimposed over the crystal structure PDB 7BUY, focusing on the moieties that are important for this reaction, together with the most important structural parameters, namely, the length of the newly formed covalent bond SG(Cys145)-C7(carmofur) and the parameters of the formed oxyanion hole (the distances between the O7 atom of carmofur and the backbone nitrogen atoms). The X-ray and computed parameters agree reasonably well, especially taking into account that the PROD model system includes the leaving group (the fluoro-uracil warhead), which is absent in the X-ray structure. The magnitude of the discrepancies is typical for QM/MM simulations.

Figure 2.6: Structures of REAC, TS, and PROD obtained with Protocol 1 and the moderate-size (83 atoms) QM part, and the corresponding total energy profile. Distances are in Å. The TS panel shows the vibrational mode with the imaginary frequency 293i cm−1.

Fig. 2.6 summarizes the results of the calculations. The individual panels show the REAC, TS, and PROD structures and the reaction energy profiles computed with NWChem and Q-Chem. The reaction energetics computed with the two software packages show discrepancies of 3 kcal/mol at the TS and 6 kcal/mol at PROD, which is discouraging, given that the calculations used the same QM/MM partitioning, the same geometries, and the same QM and MM treatments. Table 2.1 shows the various contributions to the total QM/MM energy obtained with NWChem and Q-Chem. The total energy in the last row determines the reaction profile (the energies of TS and PROD relative to REAC, shown in the lower left panel of Fig. 2.6); it is the sum of the QM+QM/MM and MM terms.
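Because the total energy is the sum of the QM+QM/MM and MM terms, the relative energies in Table 2.1 should also be additive up to rounding. A short sketch checks this using the Table 2.1 values; the dictionary layout and function name are our own, and a tolerance of about 0.1 kcal/mol is assumed because every tabulated value is rounded to one decimal.

```python
from typing import Dict, List, Tuple

# Relative energies (kcal/mol, vs REAC) copied from Table 2.1, as (TS, PROD).
table_2_1: Dict[str, Dict[str, Tuple[float, float]]] = {
    "NWChem": {"QM+QM/MM": (14.9, -9.7), "MM": (-6.3, -8.9), "Total": (8.5, -18.6)},
    "Q-Chem": {"QM+QM/MM": (11.5, -17.0), "MM": (-5.9, -8.8), "Total": (5.5, -25.8)},
}


def additivity_residuals(rows: Dict[str, Tuple[float, float]]) -> List[float]:
    """(QM+QM/MM) + MM - Total for each point; should vanish up to table rounding."""
    return [rows["QM+QM/MM"][i] + rows["MM"][i] - rows["Total"][i] for i in range(2)]


for code, rows in table_2_1.items():
    for point, r in zip(("TS", "PROD"), additivity_residuals(rows)):
        # Each tabulated value is rounded to 0.1 kcal/mol, so |r| of ~0.1 is expected.
        print(f"{code:7s} {point:4s} residual = {r:+.2f} kcal/mol")
```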
To understand the sources of the discrepancies between the calculations performed with the two software packages, we compare the different contributions to the total energy. These terms are: “QM in gas phase”, the quantum-mechanical energy of the isolated QM subsystem (no MM point charges); “QM in MM charges”, the energy of the QM subsystem in the presence of the external MM charges, from which the explicit charge-density and charge-nuclei contributions are subtracted[59]; “QM + QM/MM”, the full energy of the QM system in the presence of the MM charges (i.e., “QM in MM charges” plus the explicit charge-density and charge-nuclei contributions); and “MM”, the forcefield energy of the MM region. We attribute the small differences in the absolute QM energies (“QM in gas phase”) to differences in (i) the precise positions of the link atoms and (ii) the parameters of the DFT grids (the grid parameters are given in the SI; they are similar for all elements except sulfur). The differences in the absolute and relative QM energies are 0.0016 hartree and 0.2 kcal/mol, respectively. Thus, the single-point QM energies are consistent between the two packages. The discrepancies in the total energies between the two packages are due to the different ways of redistributing the external charges on the QM boundary. The MM energies follow a similar pattern: relative energies are consistent, whereas the differences in absolute energies are larger. The discrepancies in the MM energies are due to slight differences in topologies between the versions of the AMBER99 forcefield implemented in the two software packages.
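The decomposition can be made explicit: subtracting “QM in MM charges” from “QM + QM/MM” isolates the explicit charge-density and charge-nuclei contribution. The sketch below does this for the relative energies of Table 2.1 and converts the 0.0016 hartree gas-phase gap to kcal/mol with the standard factor of about 627.5095 kcal/mol per hartree; only the arithmetic and the names are ours, the numbers are copied from the table.

```python
from typing import Dict, Tuple

HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree

# Relative energies vs REAC (kcal/mol) from Table 2.1, as (TS, PROD).
qm_in_mm: Dict[str, Tuple[float, float]] = {"NWChem": (12.2, -5.7), "Q-Chem": (12.9, -5.1)}
qm_plus_qmmm: Dict[str, Tuple[float, float]] = {"NWChem": (14.9, -9.7), "Q-Chem": (11.5, -17.0)}


def explicit_term(code: str) -> Tuple[float, ...]:
    """Explicit charge-density + charge-nuclei part = (QM + QM/MM) - (QM in MM charges)."""
    return tuple(round(a - b, 1) for a, b in zip(qm_plus_qmmm[code], qm_in_mm[code]))


gas_gap_kcal = 0.0016 * HARTREE_TO_KCAL  # absolute gas-phase QM gap at REAC
print(f"gas-phase absolute gap: {gas_gap_kcal:.1f} kcal/mol")
for code in ("NWChem", "Q-Chem"):
    print(code, "implicit (QM in MM charges):", qm_in_mm[code], "explicit:", explicit_term(code))
```

The implicit term differs between the packages by less than 1 kcal/mol at both points, whereas the explicit term differs by roughly 4 kcal/mol at TS and 8 kcal/mol at PROD, which is the quantitative basis for attributing the discrepancy to the boundary-charge treatment.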
This analysis of the individual terms allows us to attribute the discrepancies in the total energies to the “QM+QM/MM” term, i.e., the explicit electrostatic contribution (charge-nuclei and charge-density interaction), whereas the implicit electrostatic contribution (the energy of the system polarized by the MM charges), accounted for in the “QM in MM charges” term, is consistent between the two software packages. Our results illustrate that small differences in the treatment of the QM/MM boundary can, and ultimately do, lead to substantial differences in the computed reaction energy profiles. The overall differences in the computed energetics for TS and PROD relative to REAC are around 3-7 kcal/mol. The magnitude of these discrepancies is disappointingly large compared to the desired accuracy of 1 kcal/mol; it also exceeds the errors due to approximations in the quantum-chemistry treatments. This quantifies the uncertainties in the QM/MM calculations, setting the bar for the reproducibility of QM/MM results between different software packages. We note that the effect of different charge redistribution schemes has been investigated before by Lin and Truhlar[48], who reported differences of tens of kcal/mol in proton affinity calculations using QM/MM partitioning with very small QM systems and various treatments of the boundary.

As the next step, we analyze the effect of increasing the size of the QM subsystem (up to 155 atoms, see Section 2.2) and of using different optimization protocols (density espfit versus density static). In these calculations, we only consider the REAC and PROD structures and their relative energies, which are collected in Table 2.2.

Figure 2.7: Alignment of the REAC structures optimized with NWChem using the density espfit (colored balls and sticks, red values for distances) and density static (yellow sticks, dark blue values for distances) options. Hydrogen atoms are omitted. Distances are in Å.

First, we note that the structures computed using these two optimization options differ slightly. Fig. 2.7, which compares the two REAC structures, shows that the discrepancies do not exceed 0.1 Å for the critical distances, i.e., the distance of the nucleophilic attack (SG(Cys145)-C7(carmofur)), the distance between SG(Cys145) and NE(His41), and the distances describing the future oxyanion hole between O7 (carmofur) and the nitrogen atoms in the Gly143-Cys145 chain. However, these slight structural discrepancies lead to discrepancies of up to 2-3 kcal/mol in the relative energies of REAC and PROD, as shown in Table 2.2 (cf. the corresponding rows “NWChem density espfit” and “NWChem density static”). Comparison of the respective rows in the “QM + QM/MM” section shows that the discrepancies between the NWChem and Q-Chem energies increase to up to 6 kcal/mol, giving rise to discrepancies in the total energy (5 kcal/mol) of the same magnitude as in the calculations with the moderate-size QM subsystem. This finding is counterintuitive: we expected that increasing the QM subsystem would reduce the discrepancies due to the slightly different treatments of the boundary, because the boundary moves farther away from the reaction center. However, the larger QM subsystem has a larger QM/MM boundary, which entails cutting more covalent bonds and, consequently, more link atoms and more points where the boundary charges are redistributed. It is indeed disappointing that increasing the QM region does not improve the agreement between the two software packages.

2.4 Effect of different structures: Protocol 2

Figure 2.8: Alignment of the REAC structures optimized by Q-Chem (balls and sticks colored by element) and by NWChem (sticks colored yellow). Distances are in Å. Hydrogen atoms are omitted.
Having established how the structures obtained with the two NWChem optimization protocols affect the reaction profile, we now focus on the differences in the key energetics computed with NWChem and Q-Chem using the structures optimized with each package. Here we use the moderate-size QM part (83 atoms).

Figure 2.9: The reaction energy profile showing the relative energies of REAC, INT, and PROD located using Protocol 2. The structures were first optimized with NWChem (using "density static") and then re-optimized with Q-Chem.

In these calculations, we located one more minimum energy point on the PES between REAC and PROD, which was overlooked in the density espfit calculations. This structure, denoted as INT, corresponds to a tetrahedral intermediate typical of serine or cysteine protease catalysis. The energy of INT is slightly below that of the previously located TS. We could not locate the second transition state, separating INT and PROD. Multiple scans of the PES in the region around the first TS (separating REAC and INT) reveal a very shallow landscape and allow us to estimate the height of the second transition state to be below 1 kcal/mol. Therefore, here we focus on the three stationary points REAC, INT, and PROD. Fig. 2.8 shows the superposition of the moieties assigned to the QM part in the REAC structures obtained with the two packages. The distances along the chemical bonds are practically identical in the two optimized structures, while there are slight discrepancies in the intermolecular distances. For instance, the distance of the nucleophilic attack (i.e., the SG(Cys145)-C7(carmofur) distance) is 3.44 Å and 3.39 Å in the NWChem- and Q-Chem-optimized structures, respectively. There are also slight differences in the hydrogen-bond patterns of the water molecules near the active site. These small differences can affect the relative energies obtained with the two packages. The results are summarized in Fig. 2.9 and in Table 2.3.
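Comparing key interatomic distances, as in Figs. 2.7 and 2.8, does not require superimposing the structures, because distances are invariant to rotation and translation. A minimal sketch, assuming hypothetical SG(Cys145) and C7(carmofur) coordinates chosen only to reproduce the 3.44 Å and 3.39 Å values quoted above, and reusing the 0.1 Å figure from the Fig. 2.7 discussion as an arbitrary flag threshold; the function names are our own.

```python
import math
from typing import Dict, Iterable, Tuple

Coords = Dict[str, Tuple[float, float, float]]


def distance(coords: Coords, a: str, b: str) -> float:
    return math.dist(coords[a], coords[b])


def compare_key_distances(s1: Coords, s2: Coords, pairs: Iterable[Tuple[str, str]],
                          tol: float = 0.1) -> Dict[Tuple[str, str], Tuple[float, float, bool]]:
    """Return {pair: (d1, d2, |d1 - d2| <= tol)} for each atom pair of interest."""
    report = {}
    for a, b in pairs:
        d1, d2 = distance(s1, a, b), distance(s2, a, b)
        report[(a, b)] = (round(d1, 2), round(d2, 2), abs(d1 - d2) <= tol)
    return report


# Hypothetical coordinates (Å); the second structure is translated and reoriented,
# which does not affect the comparison.
nwchem = {"SG": (0.0, 0.0, 0.0), "C7": (3.44, 0.0, 0.0)}
qchem = {"SG": (1.0, 1.0, 1.0), "C7": (1.0, 4.39, 1.0)}
print(compare_key_distances(nwchem, qchem, [("SG", "C7")]))
```

A full superposition (e.g., a Kabsch alignment) is still useful for figures like Fig. 2.8, but for a pass/fail check of a few critical distances the internal-coordinate comparison is simpler and alignment-free.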
We observe discrepancies of 1-3.5 kcal/mol in the relative total energies computed using this protocol, which are slightly smaller than those obtained with Protocol 1.

2.5 Discussion

Our initial research plan was to validate the QM/MM protocols in the two software packages, NWChem and Q-Chem, using the Mpro-carmofur reaction as a test case, and then to proceed with the computational design of other prospective covalent inhibitors of Mpro through the joint efforts of our research groups located in different parts of the globe. However, our benchmarking calculations revealed numerous problems with QM/MM calculations that we did not anticipate encountering in such a mature field. Indeed, QM/MM boundary issues have been extensively discussed in many studies going back to the late 20th century[20, 48, 60–63]. Several software packages are used nowadays, almost routinely, to scan reaction energy profiles for enzyme catalysis. Most users employ the default options of the QM/MM algorithms, confident that these broadly used tools are robust and produce reliable results. However, our study documents that the codes hide some serious pitfalls related to the QM/MM boundary treatments. The results discussed above show that small differences in protocols, such as optimization algorithms, can lead to slightly different stationary points, even when starting from identical initial structures. Moreover, small differences in implementations can lead to discrepancies of up to 5 kcal/mol in relative energies computed with two different software packages, even when using identical structures. Strictly speaking, these problems are not critical for the computational prediction of covalent inhibitors of enzymes.
For the purpose of computational screening of prospective inhibitors, it is sufficient to estimate whether the binding energy of the complex is sufficiently large (e.g., greater than 15 kcal/mol) while the barrier for the rate-limiting step is not too high (e.g., less than 15 kcal/mol). This is clearly the case for the Mpro-carmofur reaction, as we report here. Despite the noted quantitative discrepancies, the overall picture that emerges from the QM/MM simulations is consistent in the sense that (1) the reaction energy (the relative energy between PROD and REAC) is large enough (around 20 kcal/mol) to explain the strong covalent binding of the carmofur tail by the protein, and (2) the energy barrier of the reaction is small enough (less than 10 kcal/mol) to explain the efficient chemical reaction between carmofur and Mpro. The key features of the energy landscape are illustrated in Fig. 2.9 and in Tables 2.1, 2.2, and 2.3. The structure called TS, lying within 8 kcal/mol above REAC, separates REAC from the reaction intermediate INT, whose energy is slightly below the TS level. The shallow energy landscape around the TS-INT region should contain another low saddle point between INT and PROD; however, locating all possible stationary points is not necessary (a more rigorous approach would be to compute free energy profiles by QM/MM-based molecular dynamics simulations[31, 33, 64–67], possibly augmented by machine learning methods[68, 69]). Moreover, given the shallow energy landscape, it would be unrealistic to expect full consistency between the two software packages, because of the small differences in the total energy components (Table 2.3) and in important structural parameters (Fig. 2.8). The results of the present simulations are consistent with the experimental observation that carmofur binds covalently to Mpro and can act as an efficient inhibitor.
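The screening criterion stated above can be written as a simple filter. A sketch, assuming the 15 kcal/mol thresholds quoted in the text, taking the binding energy as -(E_PROD - E_REAC) and the barrier as E_TS - E_REAC; the function name and argument layout are our own, and the input values are the Table 2.1 relative energies.

```python
from typing import Dict, Tuple


def passes_screen(e_ts: float, e_prod: float,
                  min_binding: float = 15.0, max_barrier: float = 15.0) -> bool:
    """Energies in kcal/mol relative to REAC.

    Binding is taken as -(E_PROD - E_REAC); the barrier as E_TS - E_REAC.
    """
    binding = -e_prod
    return binding > min_binding and e_ts < max_barrier


# Relative energies for the Mpro-carmofur reaction from Table 2.1, as (TS, PROD).
carmofur: Dict[str, Tuple[float, float]] = {"NWChem": (8.5, -18.6), "Q-Chem": (5.5, -25.8)}
for code, (ts, prod) in carmofur.items():
    print(code, passes_screen(ts, prod))
```

Both packages give the same verdict despite their quantitative disagreement, which is the sense in which the qualitative conclusion is robust; a candidate whose energies lie within a few kcal/mol of either threshold could, however, be classified differently by the two codes.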
Following this initial study, we have carried out a series of QM/MM calculations aiming to identify novel covalent inhibitors of Mpro. The results are presented in the companion paper, in which we followed the lessons learned here and carefully reported all technical details of the QM/MM calculations, as required for reproducibility. Returning to the benchmarking, we comment here on the choice of the quantum-chemistry method and the size of the QM part. Currently, there is no practical alternative to DFT, owing to the large sizes of typical QM subsystems, which often reach hundreds of atoms, and the need to perform numerous energy and gradient calculations in QM/MM optimizations or free-energy simulations. Hybrid functionals such as B3LYP or PBE0 are commonly used in such calculations[70–77]. However, modern range-separated functionals might offer better accuracy and reliability, especially because the QM subsystem undergoes massive charge redistribution in the course of the reaction, so that the energetics might be spoiled by self-interaction errors[78, 79]. To look into this, we computed single-point QM/MM energies with the ωB97X-D3 functional[55, 56] using Q-Chem. We obtained relative energies of +7.3 kcal/mol for INT and -22.2 kcal/mol for PROD, which are close to the +7.5 and -22.1 kcal/mol values obtained with PBE0-D3 (shown in Table 2.3). These differences due to the choice of the functional are clearly smaller than the discrepancies due to different QM/MM implementations in different software packages. The conventional wisdom is that using large QM subsystems generally improves the accuracy and the robustness of the results. Indeed, the differences between the two treatments of the boundary observed in our calculations are considerably smaller than the discrepancies reported by Lin and Truhlar[48], who used tiny QM systems. The effect of the size of the QM region and other aspects of QM/MM simulations have been investigated by many researchers[80–84].
For example, Ochsenfeld and co-workers have shown[80, 81] that reaction energetics can be reliably computed even with mechanical embedding, provided that the QM system is large enough. When using electrostatic embedding, they reported much faster convergence with respect to the QM system size: about 1,000 atoms for proton transfer in DNA[81] and 150-300 atoms for an isomerization reaction in a peptidic system[80]. We investigated the effect of increasing the QM part by comparing the results obtained with the 83-atom and 155-atom QM subsystems. Contrary to our expectation that a larger QM region would make the results less sensitive to the details of the protocols and, therefore, improve reproducibility, we found that the effect is more nuanced. The larger QM subsystem requires cutting more covalent bonds, giving rise to an extended boundary, which exacerbates the effect of small differences in the treatment of the QM/MM boundary between the two software packages. This finding raises a broader question about multi-scale methods: can one expect convergence of the results towards the exact answer (full QM treatment) as the QM subsystem size increases and, if so, how smooth might this convergence be?

2.6 Conclusions

In this chapter we show that seemingly minor details of QM/MM simulations, such as the treatment of the MM atom types, the placement of link atoms, and the charges in the boundary region, should be reported in order to ensure the reproducibility of the results. Specifically, to accurately reproduce the results obtained with a specific electrostatic-embedding QM/MM approach and a specific software, one would need all details of the calculation, down to each keyword in the input file, and the exact version of the code. The PDB structures of the stationary points are simply not sufficient.
Even with this level of detail, one can expect differences of up to 5 kcal/mol between different software packages, because of differences in implementation and the inability of a user to control every small detail of the algorithm, as many parameters are hard-coded and cannot be changed via the input. The problem of reproducibility exists even within the same package and the same group. Presently, it is not clear what one can do, but the first step is to acknowledge this problem, and this is what we have attempted to do in these pages. The QM treatment (with no embedding) is under control[36], provided that thresholds, cutoffs, damping functions, grids, etc., are specified. For DFT, which is the standard choice for QM/MM calculations, the cutoffs, grid parameters, and exact details of the empirical dispersion corrections should be reported for quantitative reproducibility of the results. We encourage researchers to share more details, including but not limited to the actual grids used in numerical integration. Such details can be provided in the supplementary materials or uploaded to relevant databases (such as MolSSI’s COVID hub). Reproducibility of QM/MM results is challenging because a single keyword can change the conclusions quantitatively. The basic idea of multi-scale modeling is sound: it is well justified by physics and is practical. The basic idea of the electrostatic embedding scheme is also sound and very useful. The scientific community uses QM/MM-based techniques extensively to describe chemical processes occurring in complex environments; presently, there is no alternative for modeling chemical transformations in complex biomolecular systems. The next step toward maturity of the field is a standardization of the protocols and of the ways to store, access, and analyze results obtained by different scientists with different software.
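One way to act on this recommendation is to keep a machine-readable record of every setting that the chapter identifies as affecting the result. A minimal sketch using a Python dataclass: the field names are hypothetical, the values are those stated in this chapter for the NWChem calculations (keyword strings copied from the text), and items the chapter defers to the SI or does not state, such as the code version, are left empty rather than guessed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class QMMMRecord:
    """Hypothetical reproducibility record for one QM/MM setup."""
    code: str
    code_version: Optional[str]   # exact version/commit; None = not yet recorded
    functional: str
    basis: str
    dispersion: str
    dft_grid: str
    forcefield: str
    qm_atoms: int
    link_atom_element: str
    boundary_charge_scheme: str
    electrostatic_cutoff: str
    optimizer: str
    keywords: Dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        return [k for k, v in asdict(self).items() if v is None]


nwchem_run = QMMMRecord(
    code="NWChem", code_version=None, functional="PBE0", basis="6-31G*",
    dispersion="D3, original variant (damping details in SI)", dft_grid="SG-1",
    forcefield="AMBER99", qm_atoms=83, link_atom_element="H",
    boundary_charge_scheme="Z1 (M1 charge zeroed, not redistributed)",
    electrostatic_cutoff="none (radius larger than system size)",
    optimizer="micro-iterations (M=10 QM steps, N=300 MM steps)",
    keywords={"mm_charges": "exclude none", "density": "espfit"},
)
print(json.dumps(asdict(nwchem_run), indent=2))
print("still to record:", nwchem_run.missing_fields())
```

Such a record is small enough to ship with supplementary material, and the `missing_fields` check makes gaps (here, the exact code version) visible before a result is reported.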
Simply stating “QM/MM electrostatic embedding scheme” in a paper is clearly not sufficient, as the details of the scheme can be fine-tuned by several keywords in the software, which can lead to quantitatively different results. It is difficult to find a practical solution to the problem of software changes: the codes are constantly evolving in order to adapt to new hardware, improve efficiency, or expand functionality. Even if no bugs are introduced by the updates, the results produced by different versions of the same software can differ because defaults were changed. Some proposals go as far as suggesting the use of Docker containers with snapshots of the exact software executables and even the operating system used in a research project[37]; however, we consider this to be impractical and burdensome for researchers and the environment (e.g., the infrastructure for keeping such vast amounts of data would have a significant carbon footprint). Equally impractical is providing all output files for a project because of their large sizes: tens of gigabytes for a single reaction profile or terabytes for dynamics and free-energy simulations. Instead, we suggest finding a reasonable compromise in reporting the details; clearly, there is a vast space between not showing any details and providing full containers with the exact software, libraries, and input and output files. Rather than aiming at exact reproducibility, we propose quantifying the anticipated error bars due to software implementations, as we have done here for the energetics of an enzymatic reaction. On a positive note, despite these pitfalls, we emphasize the consistency of the qualitative conclusions based on results obtained in different parts of the globe with different software packages applied to an important and urgent problem.
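A simple way to attach such an error bar is to take, for each stationary point, the spread of the relative total energies obtained with the different codes. The sketch below does this with values copied from Tables 2.1-2.3, and contrasts it with the functional sensitivity quoted in the text (ωB97X-D3 versus PBE0-D3 in Q-Chem); using the maximum absolute difference as the error measure is our assumption, not a prescription from the chapter.

```python
from typing import Dict, Tuple

# Relative total energies (kcal/mol vs REAC) as (NWChem, Q-Chem), copied from the tables.
relative_totals: Dict[str, Tuple[float, float]] = {
    "T2.1 TS (83 QM, same geometry)": (8.5, 5.5),
    "T2.1 PROD (83 QM, same geometry)": (-18.6, -25.8),
    "T2.2 PROD (155 QM, espfit geometry)": (-22.5, -28.0),
    "T2.2 PROD (155 QM, static geometry)": (-20.5, -25.6),
    "T2.3 INT (83 QM, own geometries)": (6.3, 5.5),
    "T2.3 PROD (83 QM, own geometries)": (-18.6, -21.9),
}


def spreads(data: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Absolute difference between the two values for each entry, rounded to 0.1."""
    return {k: round(abs(a - b), 1) for k, (a, b) in data.items()}


impl = spreads(relative_totals)
# Functional sensitivity quoted in the text: wB97X-D3 vs PBE0-D3 for INT and PROD.
functional = spreads({"INT": (7.3, 7.5), "PROD": (-22.2, -22.1)})

for k, v in impl.items():
    print(f"{k:40s} {v:4.1f}")
print("implementation error bar (max):", max(impl.values()))
print("functional error bar (max):", max(functional.values()))
```

The implementation spread ranges from under 1 kcal/mol (INT, Protocol 2) to about 7 kcal/mol (PROD, Table 2.1), whereas the functional spread stays at a few tenths of a kcal/mol, which makes the comparison drawn in the Discussion explicit.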
We conclude that the prediction of prospective covalent inhibitors of troublesome enzymes can be successfully accomplished by properly documented QM/MM modeling.

Table 2.1: Energies for the REAC, TS, and PROD structures computed using the same structures.(a)

| Energy contribution | Software | Energy (a.u.), REAC | Relative to REAC (kcal/mol), TS | Relative to REAC (kcal/mol), PROD |
|---|---|---|---|---|
| QM in gas phase | NWChem | -2474.6967 | 13.9 | -5.4 |
| | Q-Chem | -2474.6983 | 14.1 | -5.2 |
| QM in MM charges | NWChem | -2474.5907 | 12.2 | -5.7 |
| | Q-Chem | -2474.6637 | 12.9 | -5.1 |
| QM + QM/MM | NWChem | -2475.1349 | 14.9 | -9.7 |
| | Q-Chem | -2474.9139 | 11.5 | -17.0 |
| MM | NWChem | -39.1656 | -6.3 | -8.9 |
| | Q-Chem | -39.2451 | -5.9 | -8.8 |
| Total energy (QM + QM/MM + MM) | NWChem | -2514.3005 | 8.5 | -18.6 |
| | Q-Chem | -2514.1591 | 5.5 | -25.8 |

(a) 83-atom QM subsystem (see Fig. 2.4); structures optimized with NWChem using the density espfit option. The D3 correction is included in all terms except MM. Definitions: “QM in gas phase” is the quantum-mechanical energy of the isolated QM subsystem (no MM point charges); “QM in MM charges” is the energy of the QM subsystem in the presence of the external MM charges without the explicit charge-density and charge-nuclei contributions; “QM + QM/MM” is the total QM energy in the presence of the MM charges (i.e., the “QM in MM charges” energy plus the explicit charge-density and charge-nuclei contributions); “MM” is the forcefield energy of the MM region.
Table 2.2: Energies for the REAC and PROD structures.(a)

| Energy contribution | Software | Optimization option in NWChem | Energy (a.u.), REAC | PROD relative to REAC (kcal/mol) |
|---|---|---|---|---|
| QM in gas phase | NWChem | density espfit | -4092.3434 | -3.7 |
| | Q-Chem | density espfit | -4092.3472 | -4.0 |
| | NWChem | density static | -4092.3439 | -5.2 |
| | Q-Chem | density static | -4092.3475 | -5.3 |
| QM in MM charges | NWChem | density espfit | -4091.8895 | -1.0 |
| | Q-Chem | density espfit | -4092.2708 | -1.0 |
| | NWChem | density static | -4091.8897 | -2.5 |
| | Q-Chem | density static | -4092.2712 | -2.5 |
| QM + QM/MM | NWChem | density espfit | -4093.1537 | -28.4 |
| | Q-Chem | density espfit | -4092.8357 | -34.1 |
| | NWChem | density static | -4093.1711 | -26.0 |
| | Q-Chem | density static | -4092.8553 | -31.5 |
| MM | NWChem | density espfit | -38.3278 | +5.9 |
| | Q-Chem | density espfit | -38.5575 | +6.1 |
| | NWChem | density static | -38.3284 | +5.4 |
| | Q-Chem | density static | -38.5543 | +5.9 |
| Total energy (QM + QM/MM + MM) | NWChem | density espfit | -4131.4814 | -22.5 |
| | Q-Chem | density espfit | -4131.3932 | -28.0 |
| | NWChem | density static | -4131.4995 | -20.5 |
| | Q-Chem | density static | -4131.4096 | -25.6 |

(a) 155-atom QM subsystem; structures optimized with NWChem using the density espfit and density static options. See the footnote of Table 2.1 for the definitions of the energy terms.

Table 2.3: Energies for the REAC, INT, and PROD structures.(a)

| Energy contribution | Software | Energy (a.u.), REAC | Relative to REAC (kcal/mol), INT | Relative to REAC (kcal/mol), PROD |
|---|---|---|---|---|
| QM in MM charges | NWChem | -2474.5961 | 13.5 | -3.5 |
| | Q-Chem | -2474.6833 | 10.9 | -5.6 |
| QM + QM/MM | NWChem | -2475.1218 | 7.1 | -16.8 |
| | Q-Chem | -2474.9402 | 11.2 | -16.6 |
| MM | NWChem | -39.1798 | -0.8 | -1.8 |
| | Q-Chem | -39.2606 | -5.8 | -5.3 |
| Total energy (QM + QM/MM + MM) | NWChem | -2514.3017 | 6.3 | -18.6 |
| | Q-Chem | -2514.2008 | 5.5 | -21.9 |

(a) 83-atom QM subsystem, optimized with NWChem (density static) and independently with Q-Chem. See the footnote of Table 2.1 for the definitions of the energy terms.
Chapter 3

Multiscale Simulations of the Covalent Inhibition of the SARS-CoV-2 Main Protease: Four Compounds and Three Reaction Mechanisms

3.1 Introduction

The quantum mechanics/molecular mechanics (QM/MM) methods are indispensable tools for modeling biochemical reactions in complex environments.[85–93] QM/MM-based calculations enable the construction of potential energy and free energy profiles of enzyme-catalyzed reactions, including reactions of the covalent inhibition of enzymes. The latter have attracted particular interest because of the COVID-19 pandemic, which stimulated massive efforts, including computer simulations, aimed at revealing the molecular-level mechanisms of action of SARS-CoV-2 enzymes and at designing efficient non-covalent and covalent inhibitors to inactivate them.[94–97] This work contributes to this effort by modeling the reactions of four compounds with a critical SARS-CoV-2 enzyme, the main protease (Mpro), also known as the 3-chymotrypsin-like protease (3CLpro).[98, 99] This enzyme, encoded by the viral genome, cleaves the viral polyproteins into functional proteins. Inhibiting it[98–105] therefore blocks viral replication, which makes Mpro an attractive drug target.

QM/MM-based computer simulations provide insights into cysteine protease reaction mechanisms and can be used to predict novel compounds as prospective drugs.[106–119] Numerous studies have investigated irreversible (covalent) inhibitors of cysteine proteases, the class to which Mpro belongs.[96, 112–129] The list of prospective inhibitors is growing, but the mechanisms of their interaction with the enzyme are not yet fully elucidated. We consider two compounds, carmofur[105, 130, 131] and nirmatrelvir,[132, 133] which have already been identified as irreversible inhibitors of Mpro. We also introduce two novel compounds, called X77A and X77C.
We designed these molecules computationally, starting from the structure of X77, a potent non-covalent inhibitor[134, 135] of Mpro. X77 forms a tight surface complex with SARS-CoV-2 Mpro, whose structure has been deposited in the Protein Data Bank[136] (PDB ID: 6W63). Fig. 3.1 shows molecular models of the compounds considered, and their chemical formulae are given in Fig. 3.2. As we discuss below, the reactions of all four compounds with the catalytic residue Cys145 of Mpro involve the nucleophilic attack of the Cys145 thiolate on a target carbon atom of the inhibitor (marked by red asterisks in Fig. 3.1); however, the detailed mechanisms differ. In the figures and in the text, we refer to these carbon atoms and to their chemically bound partners (oxygen, nitrogen, or fluorine; see Fig. 3.1) without additional indices (in the files in the Supporting Information (SI), these atoms carry compound-specific indices).

Carmofur, 1-hexylcarbamoyl-5-fluorouracil, is a known drug for the treatment of colorectal cancer.[131] Nirmatrelvir, (1R,2S,5S)-N-[(1S)-1-cyano-2-[(3S)-2-oxopyrrolidin-3-yl]ethyl]-3-[(2S)-3,3-dimethyl-2-[(2,2,2-trifluoroacetyl)amino]butanoyl]-6,6-dimethyl-3-azabicyclo[3.1.0]hexane-2-carboxamide, is also known as PF-07321332, developed by Pfizer. This compound is an active component of the approved oral drug Paxlovid for the treatment of COVID-19.[132] The two new molecules were derived computationally from the structure of X77, N-(4-tert-butylphenyl)-N-[(1R)-2-(cyclohexylamino)-2-oxo-1-(pyridin-3-yl)ethyl]-1H-imidazole-4-carboxamide. Several studies[134, 135] described this compound and its mimetics as promising non-covalent inhibitors of Mpro. According to molecular docking results,[135] X77 binds strongly to Mpro (∆G ∼ -10 kcal/mol), which corresponds to a dissociation constant of 0.057 µM.
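The link between a binding free energy and a dissociation constant is ∆G = RT ln(Kd/1 M). A minimal Python sketch of that conversion follows, assuming T = 298.15 K and a 1 M standard state. Docking scores are only approximate free energies, so the conversion indicates orders of magnitude rather than measured affinities.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def kd_from_dg(dg_kcal, temperature=298.15):
    """Dissociation constant (mol/L) from a binding free energy, via dG = RT ln(Kd / 1 M)."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


def dg_from_kd(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


# The 0.057 uM dissociation constant quoted for X77 maps back to about -9.9 kcal/mol.
print(f"{dg_from_kd(0.057e-6):.2f} kcal/mol")
```

A useful rule of thumb follows from the same relation: at 298 K, every RT ln 10 ≈ 1.36 kcal/mol of additional binding free energy lowers Kd by a factor of ten.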
Here, we follow a different strategy, aiming to develop effective covalent inhibitors. Specifically, we propose to modify X77 by introducing warhead groups capable of efficient chemical reactions with the catalytic cysteine residue in the Mpro active site (see Fig. 3.1). In other words, we propose to turn an efficient non-covalent inhibitor into a covalent inhibitor.

Figure 3.1: Molecular models of the compounds considered here as covalent inhibitors of Mpro. Here and in all figures, carbon atoms are colored green, oxygen red, nitrogen blue, sulfur yellow, fluorine cyan, and hydrogen white. Red asterisks mark the target carbon atoms of the nucleophilic attack of the Cys145 thiolate of Mpro.

Figure 3.2: The chemical formulae of the compounds shown in Fig. 3.1. The carbon atom that forms a covalent bond with the sulfur atom of Cys145 during formation of the covalent complex is marked in blue.

According to current knowledge, the covalent binding of ligands to the catalytic Cys145 of Mpro is initiated by proton transfer from the cysteine to its partner in the catalytic dyad, His41, followed by the nucleophilic attack of the sulfur anion on the target carbon atom of the ligand[107–129, 137–141] (e.g., see the discussion in Ref. 125). The negative charge emerging on the atom chemically bound to the target carbon atom (oxygen in carmofur and X77A, nitrogen in nirmatrelvir, fluorine in X77C) is stabilized by the oxyanion hole formed by the peptide chain Gly143-Ser144-Cys145. Thus, two structural elements are important for modeling the inhibition reaction: the side chains of the catalytic dyad (Cys145/His41) and the oxyanion hole (Gly143-Ser144-Cys145). These elements are common to all four compounds considered in this chapter. The differences are as follows.
In the reactions with carmofur, X77A, and X77C, the formation of the covalent bond between the sulfur and carbon atoms is accompanied by the departure of a leaving group (fluorouracil for carmofur and X77A, a fluoride ion for X77C), whereas in the reaction with nirmatrelvir, a proton-transfer pathway saturates the emerging valence of the nitrile nitrogen. The leaving groups may also be released by different mechanisms. These important details of the reaction mechanisms are the focus of our study.

3.2 Methods: system preparation and computational protocols

We used the following strategy to simulate the mechanisms of the selected reactions. First, we optimized at the QM/MM level the structures of stationary points on the potential energy surfaces (PES) corresponding to the reactants, products, intermediates, and transition states along the hypothesized reaction coordinates. The analysis of these structures informed the selection of collective variables for the subsequent calculations of the Gibbs energy profiles using molecular dynamics simulations with QM/MM potentials (QM/MM MD).

The crystal structure of Mpro with the aliphatic tail of the carmofur molecule attached to Cys145 (PDB ID: 7BUY[105]) served as a template to construct all model systems. The Mpro–carmofur model for QM/MM calculations of the covalent inhibition of Mpro by carmofur with the NWChem[142] and Q-Chem[143, 144] software packages was reported in our previous study.[145] We prepared model systems for the three other compounds by inserting the corresponding substrates into the protein structure using molecular mechanics tools. To validate the structures produced by molecular mechanics, we used the crystal structures of Mpro complexed with the relevant ligands, i.e., PDB ID: 7VH8 for the product of the Mpro–nirmatrelvir reaction and PDB ID: 6W63 for the complex of the non-covalent inhibitor X77 with Mpro. Fig. 3.3 shows fragments of the Mpro active site as they appear in the PDB structures relevant to the present simulations. We pay attention to the positions of the catalytic residues, Cys145 and His41, and of the oxyanion-hole chain, Gly143-Ser144-Cys145, which are directly involved in the chemical reactions of the selected compounds (Figs. 3.1, 3.2) with Cys145 in the protein cavity.

To design prospective covalent inhibitors, we replaced a molecular group of the non-covalent Mpro inhibitor X77 (highlighted in yellow in Fig. 3.3) with reactive warheads (see panels X77A and X77C in Figs. 3.1, 3.2). Importantly, our molecular docking calculations show that X77A and X77C have binding energies with Mpro similar to that of the parent X77: -8.9 kcal/mol for X77A and -9.4 kcal/mol for X77C, compared with our computed value of -9.74 kcal/mol for X77 and the literature value of -10.2 kcal/mol.[135] Therefore, the proposed molecules X77A and X77C are predicted to bind the catalytic site of Mpro with high affinity. The partitioning of the model systems into QM and MM parts for each compound is explained in Results and in the SI.

Figure 3.3: Fragments of the Mpro active site from the selected PDB structures. Panels (a) and (b): fragments of the Mpro active site relevant to the reactions of the selected compounds with Cys145, as they appear in the PDB structures. We focus on the chain Gly143-Ser144-Cys145 with the reactive Cys145 and the oxyanion-hole groups, and on the location of the His41 side chain relative to Cys145. Panel (c): the PDB structure 6W63, a non-covalent complex of Mpro with X77. To design covalent inhibitors, we replaced the fragment of X77 highlighted in yellow with reactive warheads (see panels X77A and X77C in Fig. 3.1). Here and below, distances are given in Å.
As emphasized in our QM/MM study of the Mpro–carmofur model,[145] reporting only the relevant PDB structures as initial coordinates of heavy atoms, together with the partitioning of the system into QM and MM parts, is not sufficient to ensure the reproducibility of QM/MM-based energy profiles of enzymatic reactions. More details must be reported for others to evaluate the results and reproduce the findings. In the SI, we provide details of the preparation of the model systems, including the addition of hydrogen atoms and the protonation states of the amino acid side chains, protein solvation, initial relaxation of the model structures using classical MD, link-atom schemes at the QM/MM boundary, embedding protocols, and optimization algorithms.

We use the following notation for the computed structures: REAC (reactant state), IP (ion-pair state), PROD (product state), and TS (transition state). For each system, REAC refers to the neutral state of the catalytic dyad, Cys145/His41; IP corresponds to the ion-pair state, Cys145−/His41+; and PROD corresponds to the structure with the covalently bound Cys145 and the leaving group kept in the active site. Reaction intermediates other than IP are described in the corresponding subsections; in particular, TI denotes the tetrahedral intermediate and MC the Meisenheimer complex.

The QM/MM optimization of the stationary points was carried out using density functional theory with the PBE0 functional[146] and the D3 dispersion correction[147] for the QM part. The performance of the PBE0 functional has been extensively documented and benchmarked (see, for example, Ref. 145), and it has been shown to perform well for computing reaction profiles of organic molecules.[148] Our groups have also used this functional to simulate other biological systems. In our previous paper (Ref. 145), we compared this functional with the more advanced ωB97X-D functional and found that the energy profile is insensitive to the choice of functional. Energies and forces in the MM part were computed with the AMBER force-field parameters.[149] These QM(PBE0-D3/6-31G*)/MM(AMBER) calculations were performed with the NWChem program[142] using the electrostatic embedding scheme. The QM/MM minimum-energy structures were obtained in a series of unconstrained minimizations. The TS structures were optimized in a series of constrained minimizations along appropriate reaction coordinates and were verified by forward and backward descent from the located saddle points to the minima they separate. Additional details are given in Results and in the SI. The optimized coordinates of all structures were deposited in the COVID-19 hub repository supported by MolSSI (see the SI for the complete list of deposited files).

To compute the Gibbs free energy profiles, which are essential for modeling protein–ligand systems,[150] we employed algorithms based on biased MD trajectories.[151, 152] A recent implementation[153] interfacing the MD program NAMD[154] with quantum chemistry packages allowed us to apply computational protocols with QM/MM potentials, as in our previous studies of enzyme-catalyzed reactions.[155–157] Energies and gradients in the QM part were computed at the PBE0-D3/6-31G** level using the TeraChem program.[158] The CHARMM36 force field[159] was used for the MM subsystems. The QM subsystems in these calculations were somewhat smaller than in the QM/MM optimizations, but considerably larger than, for example, in previous studies[109–111] of the Mpro–nirmatrelvir reaction. We performed umbrella sampling simulations with additional harmonic potentials centered at different values of the collective variables.
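Each umbrella window adds a harmonic restraint on the collective variable, and the biased histograms from all windows are then recombined into one unbiased profile. The weighted histogram analysis method (WHAM) is one of the two unbiasing schemes used here. The following is a minimal one-dimensional WHAM sketch in plain Python, run on synthetic data rather than QM/MM MD output. The ½k convention for the restraint, the kT value, and the binning are assumptions of this sketch, not settings taken from the NAMD/TeraChem runs.

```python
import math
import random

KT = 0.5925  # kcal/mol, approximately k_B*T at 298 K


def harmonic_bias(x, center, k):
    """Umbrella restraint energy; this sketch assumes the 0.5*k*(x - x0)**2 convention."""
    return 0.5 * k * (x - center) ** 2


def wham_1d(window_samples, centers, k, lo, hi, n_bins, max_iter=5000, tol=1e-6):
    """Minimal 1D WHAM: combine biased umbrella windows into one PMF (kcal/mol)."""
    width = (hi - lo) / n_bins
    mids = [lo + (m + 0.5) * width for m in range(n_bins)]
    counts = [0] * n_bins  # histogram pooled over all windows
    n_used = []            # samples of each window that fall inside [lo, hi)
    for samples in window_samples:
        used = 0
        for x in samples:
            m = int((x - lo) // width)
            if 0 <= m < n_bins:
                counts[m] += 1
                used += 1
        n_used.append(used)
    # exp(-w_i(x_m)/kT) for every window i and bin center x_m
    boltz = [[math.exp(-harmonic_bias(x, c, k) / KT) for x in mids] for c in centers]
    f = [0.0] * len(centers)  # free-energy shift of each window
    prob = [0.0] * n_bins
    for _ in range(max_iter):
        weights = [n * math.exp(fi / KT) for n, fi in zip(n_used, f)]
        for m in range(n_bins):
            denom = sum(w * b[m] for w, b in zip(weights, boltz))
            prob[m] = counts[m] / denom if denom > 0.0 else 0.0
        new_f = [-KT * math.log(sum(p * bm for p, bm in zip(prob, b))) for b in boltz]
        new_f = [fi - new_f[0] for fi in new_f]  # remove the arbitrary overall offset
        converged = max(abs(new - old) for new, old in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    pmf = [-KT * math.log(p) if p > 0.0 else None for p in prob]
    floor = min(v for v in pmf if v is not None)
    return mids, [v - floor if v is not None else None for v in pmf]


# Synthetic test case: true PMF 0.5*a*x**2 restrained by harmonic windows. The biased
# distributions are then Gaussian and can be sampled exactly, without any MD.
a, k = 10.0, 20.0
centers = [-1.0 + 0.2 * i for i in range(11)]
rng = random.Random(0)
window_samples = []
for c in centers:
    mean, sd = k * c / (a + k), math.sqrt(KT / (a + k))
    window_samples.append([rng.gauss(mean, sd) for _ in range(3000)])
mids, pmf = wham_1d(window_samples, centers, k, -0.8, 0.8, 32)
```

Because the synthetic input has a known answer, the recovered profile can be compared directly with 0.5·a·x². That comparison is the usual sanity check before applying such a script to real window trajectories.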
Trajectories were 5-10 ps long, and force constants were 10-40 kcal/mol/Å². Umbrella integration and the weighted histogram analysis method (WHAM) were used. Further details, such as the selection and validation of collective variables and the QM-MM partitioning, are described in Results and in the SI.

3.3 Results and discussion

3.3.1 Reaction of Mpro with carmofur

As explained above, we used the Mpro–carmofur model system characterized in Ref. 145 as a template for modeling the covalent inhibition of the enzyme by all compounds considered in the present work. In Ref. 145, we focused only on the reaction step from IP to PROD (using a slightly different QM-MM partitioning). Here, we also consider the formation of the ion pair (Cys145−/His41+). In the present calculations, the QM subsystem consisted of 155 atoms, including the entire carmofur molecule and the molecular groups of His41, Gly143, Ser144, and Cys145; the computational protocol is described in detail in the SI. Fig. 3.4 shows the QM/MM-optimized structures of REAC, IP, and PROD. The computed structures of all stationary points on the PES, including the transition states, are given in the SI.

For the IP formation step, REAC→TS1→IP, a two-dimensional energy map along the distances from the transferring proton HS to the SG atom of Cys145 and to the NE atom of His41, d(SG-HS) and d(NE-HS), allowed us to estimate the TS1 point separating the REAC and IP structures. Along this pathway, the distance of the nucleophilic attack, d(SG-C), decreased gradually from 3.62 Å in REAC to 3.15 Å in IP. At the next step, IP→TS2→PROD, the distance d(SG-C) served as the reaction coordinate in the QM/MM constrained optimization. After TS2, the covalent SG-C bond is formed, and the leaving group, the fluorouracil warhead, separates from the covalent adduct of Mpro with the aliphatic tail of the carmofur molecule.
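One simple way to turn a gridded two-dimensional scan, such as the d(SG-HS)/d(NE-HS) map above, into a first TS estimate is the minimax ("mountain-pass") path. Among all grid paths joining the two minima, this is the path whose highest point is lowest, and that highest point approximates the saddle. The chapter does not state that TS1 was located this way. The sketch below illustrates the technique on a made-up grid, using a Dijkstra-style search with the bottleneck energy as the path cost, and it assumes the goal is reachable from the start. On a real scan, the estimate would still need refinement, for example by the constrained optimizations described in Section 3.2.

```python
import heapq


def mountain_pass(energy, start, goal):
    """Lowest-maximum (minimax) path between two points of a gridded 2D energy scan.

    Returns (barrier, saddle, path): the highest energy along the best path, the grid
    index where it occurs (a crude TS estimate), and the path itself.
    """
    n_rows, n_cols = len(energy), len(energy[0])
    best = {start: energy[start[0]][start[1]]}
    parent = {start: None}
    heap = [(best[start], start)]
    while heap:
        peak, node = heapq.heappop(heap)
        if node == goal:
            break
        if peak > best[node]:
            continue  # stale heap entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                new_peak = max(peak, energy[nr][nc])
                if new_peak < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_peak
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (new_peak, (nr, nc)))
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    saddle = max(path, key=lambda p: energy[p[0]][p[1]])
    return best[goal], saddle, path


# Hypothetical toy scan (kcal/mol); rows and columns stand for two proton-transfer distances.
scan = [
    [0.0, 8.0, 10.0, 8.0, 1.0],
    [2.0, 4.0, 6.0, 4.0, 3.0],
    [5.0, 5.0, 9.0, 5.0, 5.0],
]
barrier, saddle, path = mountain_pass(scan, start=(0, 0), goal=(0, 4))
print(barrier, saddle)  # 6.0 (1, 2)
```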
The adduct is firmly accommodated in the protein cavity, and the C-O group is captured in the oxyanion hole. To validate the computational protocol, we use the only available piece of experimental information: the crystal structure of the reaction products. The bottom left panel in Fig. 3.4 ("Superposition") shows the active site of the computed PROD structure (colored balls and sticks) and compares it with the relevant fragment (yellow sticks) of the crystal structure (PDB ID: 7BUY[105]). The key distances agree well between the computed and crystal structures (see Fig. 3.3). In the adduct of the protein with the carmofur tail, the SG-NE distance in the catalytic dyad and the distances in the oxyanion-hole region between the nitrogen atoms of Cys145 and Gly143 and the oxygen atom (O) of the adduct are close in the experimental and computed structures. This agreement holds even though the leaving group (the fluorouracil warhead) is absent from the crystal structure but kept in the active site of the model system.

Figure 3.4: The QM/MM-optimized structures of REAC, IP, and PROD for the Mpro–carmofur reaction. The bottom left panel shows the superposition of PROD (colored balls and sticks) and the crystal structure 7BUY (yellow sticks). The side chain of His41 in the bottom right panel is shown in yellow sticks. Distances in italics correspond to the crystal structure.

The computed QM(PBE0-D3/6-31G*)/MM(AMBER) energies of all stationary points are as follows: REAC→TS1 (+12)→IP (+9)→TS2 (+16)→PROD (-13). Here and below, the values in parentheses give the energy of the corresponding stationary point in kcal/mol relative to REAC. According to these results, the highest step barrier in the Mpro–carmofur reaction leading to a stable covalent adduct corresponds to the formation of the IP state (12 kcal/mol for REAC→IP, compared with 7 kcal/mol for IP→PROD).
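Statements about the "highest energy barrier" in these diagrams compare step barriers, that is, each TS energy measured from the minimum that precedes it, rather than TS energies measured from REAC. The helper below makes that bookkeeping explicit for the REAC→TS→...→PROD notation used in this chapter. The function name and data layout are illustrative choices, not part of the original workflow.

```python
def step_barriers(diagram):
    """Split an energy diagram [min, TS, min, TS, ..., min] into elementary steps.

    Each entry of `diagram` is (label, energy in kcal/mol relative to REAC). Returns
    (step, ts_label, forward_barrier, step_energy) for every TS.
    """
    if len(diagram) % 2 == 0:
        raise ValueError("diagram must alternate minima and TSs, starting and ending with a minimum")
    steps = []
    for i in range(1, len(diagram), 2):
        (start, e_start), (ts, e_ts), (end, e_end) = diagram[i - 1], diagram[i], diagram[i + 1]
        steps.append((f"{start}->{end}", ts, e_ts - e_start, e_end - e_start))
    return steps


carmofur = [("REAC", 0), ("TS1", 12), ("IP", 9), ("TS2", 16), ("PROD", -13)]
for step, ts, barrier, delta in step_barriers(carmofur):
    print(f"{step}: barrier {barrier:+d} kcal/mol (via {ts}), step energy {delta:+d} kcal/mol")
# REAC->IP: barrier +12 kcal/mol (via TS1), step energy +9 kcal/mol
# IP->PROD: barrier +7 kcal/mol (via TS2), step energy -22 kcal/mol
largest = max(step_barriers(carmofur), key=lambda s: s[2])
```

The same call works for the longer X77A, X77C, and nirmatrelvir diagrams, which include TI, MC, or INT minima between the transition states.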
To estimate the activation energy of this reaction, we calculated the free energy profile with QM(PBE0-D3/6-31G**)/MM(CHARMM36) potentials in MD simulations. The QM part included the carmofur molecule, the Cys145 and His41 side chains, and a nearby water molecule. We defined the reaction coordinate (the collective variable, CV) as a combination of the relevant distances, CV = d(SG-HS) - d(NE-HS); details are given in the SI. The computed profile (Fig. 3.5) shows an activation barrier of 10.4 kcal/mol, with the IP state lying 9.3 kcal/mol above REAC, consistent with the energies of the stationary points optimized in the QM/MM calculations. Ion-pair formation is a common step in the catalytic cycle of cysteine proteases;[120, 138, 139] however, different computational studies of the corresponding free energy surface have produced different free-energy profiles. For example, for ion-pair formation in the covalent inhibition of Mpro by the N3 peptidyl Michael acceptor, two research groups almost simultaneously reported Gibbs free energy barriers of 1.4 kcal/mol[117] and 10.7 kcal/mol.[118]

Figure 3.5: The computed Gibbs free energy profile for the REAC→TS1→IP step of ion-pair formation in the Mpro–carmofur reaction.

To conclude this subsection, the simulations describe the formation of the covalent adduct in the Mpro–carmofur reaction[105] consistently with the experimental observations. However, no attempts to use carmofur as a COVID-19 drug have been reported.

3.3.2 Reaction of Mpro with X77A

We designed the X77A compound by introducing a warhead with a fluorouracil moiety resembling that of carmofur. The target atom for the nucleophilic attack of the Cys145 thiolate is a similar carbonyl carbon atom, marked by the asterisk in Fig. 3.1.
Thus, it is reasonable to expect the mechanism of the Mpro reaction with X77A to resemble that of the Mpro reaction with carmofur, which has the same leaving group. Although the basic features are common, we located an additional reaction intermediate besides IP, a tetrahedral intermediate (TI), on the route from IP to PROD. In the QM/MM optimization, the large QM part included the entire X77A molecule, the molecular groups of His41, Cys145, Ser144, Gly143, Thr25, Thr26, Leu27, Leu141, Asn142, Gly146, His164, Met165, and Asp187, and 7 water molecules (208 atoms in total). The panels in Fig. 3.6 illustrate the minimum-energy points optimized in the QM(PBE0-D3/6-31G*)/MM(AMBER) calculations; the structures of all stationary points, including TSs, are shown in the SI. The first step, REAC→TS1 (+8)→IP (+4), shares similar features with the Mpro–carmofur reaction (cf. the upper panels in Figs. 3.4 and 3.6), but with a lower energy barrier. Scans along the gradually decreasing coordinate d(SG-C) allowed us to locate the stationary points of the subsequent reaction steps: IP (+4)→TS2 (+5)→TI (-15) and TI (-15)→TS3 (-14)→PROD (-23). According to these QM(PBE0-D3/6-31G*)/MM(AMBER) calculations, the highest energy barrier corresponds to IP formation, whereas the barriers of the subsequent steps are low, 1-4 kcal/mol. The structure of the products (bottom left panel in Fig. 3.6) shows that the covalent adduct is firmly trapped in the protein cavity, with the C-O bond captured by the oxyanion hole.

Figure 3.6: The QM/MM-optimized structures for the Mpro–X77A reaction. A large part of the X77A molecule is shown in light yellow sticks. The side chain of His41 in the bottom panels is shown in golden-yellow sticks.

The QM(PBE0-D3/6-31G**)/MM(CHARMM36) MD simulations yielded the Gibbs energy profiles for the Mpro–X77A reaction shown in Fig. 3.7.
The QM part included the substrate fragment, His41, Leu141, Asn142, Gly143, Ser144, Cys145, Gly146, His164, Met165, and a water molecule interacting with His41. The collective variables were CV1 = d(NE-HS) - d(SG-HS) for the ion-pair formation step and CV2 = d(SG-C) - d(C-N5) for the subsequent steps. The upper panel of Fig. 3.7 shows that the activation barriers along the reaction pathway are low, with the highest barrier corresponding to the formation of the ion-pair state. Thus, according to the present simulations, X77A should be an efficient covalent inhibitor of Mpro.

Figure 3.7: The computed Gibbs free energy profiles for the Mpro–X77A reaction. The upper panel shows the diagram combining the results of the two reaction steps illustrated in the bottom panels. The collective variables are CV1 = d(NE-HS) - d(SG-HS) and CV2 = d(SG-C) - d(C-N5).

3.3.3 Reaction of Mpro with X77C

Klein et al.[45] proposed aromatic compounds that react with the catalytic cysteine by the SNAr addition/elimination mechanism as a new class of covalent inhibitors of cysteine proteases. Several such compounds have been tested as prospective inhibitors of the protease rhodesain.[129] Inspired by this idea, we introduced the 5-fluoro-6-nitropyrimidine-2,4(1H,3H)-dione warhead into the X77 template to create compound X77C. Upon deprotonation of Cys145, its sulfur anion attacks the carbon atom C initially bound to fluorine (see the X77C panel in Fig. 3.1). We used the same strategy as for the Mpro–X77A reaction to characterize the energy profiles of the Mpro–X77C reaction and to dissect its mechanism: QM/MM calculations of the structures on the PES, followed by QM/MM MD calculations of the Gibbs free energy profiles. The results of the QM(PBE0-D3/6-31G*)/MM(AMBER) optimization of the minimum-energy structures are shown in Fig. 3.8; the structures of all stationary points, including TSs, are given in the SI. In the QM/MM optimization, the large QM part included the entire X77C molecule, the molecular groups of His41, Cys145, Ser144, Gly143, Thr25, Thr26, Leu27, Leu141, Asn142, Gly146, His164, Met165, and Asp187, and 7 water molecules (203 atoms in total). According to these results, REAC→TS1 (+4)→IP (0)→TS2 (+2)→MC (-15)→TS3 (-14)→PROD (-25), we located two reaction intermediates, the ion-pair state (IP) and the Meisenheimer complex (MC), separated by fairly low energy barriers (not exceeding 4 kcal/mol). The PROD structure confirms the formation of the covalent adduct; the leaving group (F−) is captured by the anion hole.

Figure 3.8: The QM/MM-optimized structures for the Mpro–X77C reaction.

The Gibbs free energy profiles obtained from the QM(PBE0-D3/6-31G**)/MM(CHARMM36) MD simulations are shown in Fig. 3.9. The QM part included the substrate fragment, His41, Leu141, Asn142, Gly143, Ser144, Cys145, Gly146, His164, Met165, and a water molecule interacting with His41. The collective variables, selected after several trials, are CV1 = d(NE-HS) - d(SG-C) up to MC formation and CV2 = d(C-F) for the subsequent step. In this case, the IP intermediate is not clearly visible on the free energy surface; the free energy profile for this step resembles the features reported by Ramos-Guzmán et al. in modeling other reactions of covalent inhibition of Mpro.[108–110, 118] In contrast, MC corresponds to a minimum-energy structure in both the QM/MM and QM/MM MD calculations.

Figure 3.9: The computed Gibbs free energy profiles for the Mpro–X77C reaction. The upper panel shows the diagram combining the results of the two reaction steps illustrated in the bottom panels. The collective variables are CV1 = d(NE-HS) - d(SG-C) and CV2 = d(C-F).
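Every collective variable in this chapter is a signed linear combination of interatomic distances. The range runs from two-term differences, such as CV1 = d(NE-HS) - d(SG-C) for X77C, to the six-term proton-relay coordinate used for nirmatrelvir. The generic evaluator below makes the sign convention explicit. The atom labels follow the figures, but the coordinates are hypothetical, and in production runs the CV and its gradient would be evaluated by the MD engine, not by code like this.

```python
import math


def distance_cv(coords, terms):
    """Collective variable built as a signed sum of interatomic distances.

    coords: mapping atom label -> (x, y, z) in angstrom.
    terms:  sequence of (coefficient, atom_a, atom_b), giving sum(c * d(a, b)).
    """
    return sum(c * math.dist(coords[a], coords[b]) for c, a, b in terms)


# CV1 of the Mpro-X77C simulations: d(NE-HS) - d(SG-C)
CV1_X77C = [(+1, "NE", "HS"), (-1, "SG", "C")]
# CV2 of the Mpro-nirmatrelvir simulations: proton relay His41 -> Wat1 -> Wat2 -> ligand N
CV2_NIRMATRELVIR = [
    (+1, "NE", "HS"), (-1, "HS", "OWat1"),
    (+1, "OWat1", "HWat1"), (-1, "HWat1", "OWat2"),
    (+1, "OWat2", "HWat2"), (-1, "N", "HWat2"),
]

# Hypothetical toy coordinates (angstrom), not taken from any deposited structure.
toy = {"NE": (0.0, 0.0, 0.0), "HS": (1.0, 0.0, 0.0), "SG": (3.0, 0.0, 0.0), "C": (3.0, 4.0, 0.0)}
print(distance_cv(toy, CV1_X77C))  # 1.0 - 4.0 = -3.0
```

Flipping every coefficient only reverses the direction of the coordinate. This explains why the carmofur CV, d(SG-HS) - d(NE-HS), and the X77A CV1, d(NE-HS) - d(SG-HS), describe the same proton transfer with opposite signs.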
The nature of the Meisenheimer complex in SNAr reactions[129, 160, 161] is still debated, in particular, whether it is a reaction intermediate or a transition state. In our case, the results favor a minimum-energy structure separated from the reactants and products by free energy barriers of 13 and 4.4 kcal/mol, respectively. We conclude that X77C can react with Mpro with low activation barriers, leading to covalent binding of the catalytic Cys145.

3.3.4 Reaction of Mpro with nirmatrelvir

The molecular model of nirmatrelvir is shown in the lower left panels of Fig. 3.1. The covalent binding of nirmatrelvir to Mpro is confirmed by several crystal structures in the Protein Data Bank (e.g., PDB IDs: 7VH8, 7MLG, 7MLF). The reaction of Mpro with a nitrile-based ligand such as nirmatrelvir should lead to a covalent thioimidate adduct after deprotonation of Cys145 and the nucleophilic attack of the thiolate forming the SG-C covalent bond.[117, 118, 128] Classical MD simulations of nirmatrelvir interacting with Mpro[133] show tight binding of this compound at the protein surface. Three computational papers[117–119] have reported mechanistic details of the Mpro–nirmatrelvir reaction based on QM/MM calculations. Ramos-Guzmán et al.[109, 110] computed the minimum free energy path for nirmatrelvir covalent binding to Mpro using the adaptive string method with QM/MM potentials. The QM subsystem, composed of fragments of Cys145 and His41, a water molecule, and the warhead of the inhibitor (about 50 atoms in total), was described at the B3LYP-D3/6-31+G* level in Ref. 31 and at the M06-2X-D3/6-31+G* level in Ref. 110. The path was determined through biased MD simulations using 7 collective variables, which included the distances of all bonds broken, formed, or changing their formal order during the process.
The computed profiles show a single TS of 14-16 kcal/mol and a reaction energy of 10-14 kcal/mol.[109, 110] No clear stabilization of the IP state was observed. Ngo et al.[111] used the ONIOM version of QM/MM to evaluate the Mpro–nirmatrelvir reaction energy profile. The authors assumed that Cys145, His41, and the nearby residue Asp187 form a catalytic triad that facilitates covalent binding of the ligand to the protein. The QM part comprised 49 atoms, including small fractions of the ligand and of the amino acid triad, described at the B3LYP-D3/6-31G(d) level for the QM/MM optimization, followed by single-point calculations at the M06-2X/6-311+G(2d,2p) level. The resulting energy diagram is flat, within 3.4 kcal/mol, over the first reaction steps and shows no formation of the ion-pair state. For the final step, which describes proton transfer mediated by a water molecule, an energy barrier of 19 kcal/mol was reported. The computed energy of the reaction products was about 9 kcal/mol below that of the reactants.

The QM part in the QM(PBE0-D3/6-31G*)/MM(AMBER) optimization included the reactive part of the nirmatrelvir molecule, the molecular groups of His41, Gly143, Ser144, Cys145, Thr25, Thr26, Leu27, Gly146, Ser147, Val148, Met162, His163, His64, and Met165, and 10 water molecules. The results of the QM/MM optimization of the minimum-energy points on the PES are shown in Fig. 3.10; the relative energies are REAC→TS1 (+2)→INT (-18)→TS2 (-12)→PROD (-29). We located a reaction intermediate (INT) with a short SG-C distance of 1.92 Å (much shorter than in the IP states of the reactions of Mpro with carmofur, X77A, and X77C), in which the His41 side chain remains protonated (positively charged). To complete the reaction, i.e., to protonate the N atom of the ligand, two water molecules, Wat1 and Wat2 (right part of Fig. 3.10), form a proton wire from the Nε atom of His41, with typical distances of about 2.7 Å between heavy atoms.

Figure 3.10: The QM/MM-optimized structures for the Mpro–nirmatrelvir reaction.

The Gibbs free energy profiles were computed with QM(PBE0-D3/6-31G*)/MM(CHARMM36) potentials describing a 97-atom QM subsystem composed of the nirmatrelvir molecule, the Cys145 and His41 side chains, and water molecules. The resulting profile is shown in Fig. 3.11.

The results of the present simulations agree in part with previous modeling of the Mpro–nirmatrelvir reaction.[109–111] In particular, all approaches disfavor the formation of the IP state as an energy minimum, and all yield a considerable reaction energy, e.g., -14 kcal/mol in the present Gibbs free energy calculations. However, we did not obtain a high energy barrier along the reaction pathway: our values do not exceed 5 kcal/mol (Fig. 3.11), in contrast to the values of 14-18 kcal/mol in Refs. 25-27.

We cannot support the mechanism suggested in Ref. 111, in which the Cys145-His41-Asp187 triad plays the central role. First, all available crystal structures of the product of the nirmatrelvir reaction with Mpro show that the Asp187 side chain is too far from the His41 side chain to form a typical catalytic triad, as found, for example, in serine proteases: the shortest O(Asp)-N(His) distance in these crystal structures exceeds 5 Å. The authors[111] manually prepared a model system in which this distance corresponded to a hydrogen-bonded pair (presumably within 3 Å). Second, the comparison of Scheme 1 and Fig. 3.3 in Ref. 111 shows that the quantum subsystem was modified only for the critical reaction step from INT-3 to INT-4, by adding a water molecule to the small QM subsystem considered in the previous steps. The high reaction barrier of 19 kcal/mol was reported only for this modified model system.

Qualitatively, our reaction mechanism is close to the one described in Refs. 25 and 26, but it involves a reaction intermediate, INT (Figs. 3.10, 3.11), whose features resemble those of TI in the reaction of Mpro with X77A (Fig. 3.6). The reaction profile reported in Ref. 31 is shallow at the steps of ion-pair formation (as in our calculations) and rises sharply to 16 kcal/mol at the step of proton transfer from the doubly protonated His41 to the nitrogen atom of the ligand via a single water molecule. In our model, two water molecules mediate proton transfer from His41 to the ligand (as obtained in large-scale QM/MM MD simulations). Proton transfer via a chain of two water molecules is generally known to have a lower barrier than transfer via a single molecule. In this respect, a careful computational study[162] showed that proton transfer via aligned chains of water molecules is characterized by low energy barriers, within 10 kcal/mol.

Figure 3.11: The computed Gibbs free energy profiles for the Mpro–nirmatrelvir reaction. The upper panel shows the diagram combining the results of the two reaction steps illustrated in the bottom panels. The collective variables are CV1 = d(NE-HS) - d(SG-HS) - d(SG-C) and CV2 = d(NE-HS) - d(HS-OWat1) + d(OWat1-HWat1) - d(HWat1-OWat2) + d(OWat2-HWat2) - d(N-HWat2).

As far as comparison with experiment is concerned, experimental kinetic data (i.e., rate constants of individual steps in the reactions of Mpro with substrates and covalent inhibitors) are unfortunately limited. According to pre-steady-state kinetics studies of Mpro from SARS-CoV-2[163] and SARS-CoV,[164] the acylation step proceeds efficiently and is unlikely to determine the overall steady-state rate constant, kcat. These conclusions are supported by computational studies of peptide bond hydrolysis in the Mpro active site.[115, 117, 165] The only pre-steady-state kinetic data on the formation of a covalent Mpro–inhibitor complex are reported in Ref. 163 for the fluorophore-containing compound PF-00835231.
This study compares the kinetics of the fluorescence intensity upon the reaction of the wild-type Mpro and its Cys145Ala mutant. The latter lacks the catalytic cysteine residue and can form only a non-covalent complex. For both variants of the protein, the non-covalent binding occurs in two steps. The rate constant of the second step is ∼1 s⁻¹; this step was attributed to tighter binding of the inhibitor, leading to the increase in the fluorescence intensity[163]. The subsequent reaction step, which can be attributed either to a chemical reaction or to conformational changes following the covalent adduct formation, was observed only for the wild-type enzyme, and it is 5 times slower than the preceding non-covalent binding step. We also note that the reported kcat values for Mpro reactions are predominantly derived from steady-state experiments and refer to either the deacylation step or product release[163, 164].

3.4 Conclusion

The COVID-19 pandemic motivated numerous studies of SARS-related enzymes, which considerably expanded the understanding of enzyme catalysis. Our study contributes to these efforts. We modeled the reactions of four compounds and showed that these compounds are capable of binding chemically to the catalytic cysteine residue of Mpro and, therefore, can serve as irreversible inhibitors of this enzyme. The simulations revealed three distinct reaction mechanisms. We recognize that these three mechanisms do not exhaust all possible scenarios: other documented examples of Mpro inhibition include the Michael addition to an unsaturated carbon-carbon bond[118, 119] and reactions with ketones[108, 116]. Our simulations contribute to the ongoing efforts to find more effective drugs to fight COVID-19.
We show that the employed computational protocols are sufficiently reliable and yield results consistent with the already known information, i.e., the computed energy profiles for carmofur and nirmatrelvir show that the corresponding reactions with Mpro are efficient with respect to energy barriers and reaction energies. Therefore, we expect that the compounds designed computationally in our work, X77A and X77C, and characterized at the same level of theory, are promising drug candidates for blocking Mpro. In summary, the results of our QM/MM modeling of the chemical reactions of the catalytic Cys145 residue of the SARS-CoV-2 main protease with four compounds (carmofur, nirmatrelvir, X77A, and X77C) show that these species can form stable covalent adducts with Mpro and that the activation barriers are sufficiently low for the reactions to be efficient. The structural results for carmofur and nirmatrelvir are consistent with the experimental findings, and the success of these simulations provides a sound basis for predicting the two novel potential inhibitors, X77A and X77C, proposed in this chapter. From a fundamental perspective, this study illustrates that covalent adducts can form via three distinct mechanisms of irreversible inhibition of cysteine proteases.

3.4.1 Appendix

The appendix covers details about model system preparation, the QM/MM setup and free energy calculations, input parameters, and the list of structures deposited to the MolSSI COVID-19 hub. This material is available free of charge via the Internet at http://pubs.acs.org.

Chapter 4 Theoretical insights into the effect of size and substitution patterns of azobenzene derivatives on the DNA G-quadruplex

4.1 Introduction

G-quadruplexes are important non-canonical DNA structures formed by the stacking of G-quartets, planar arrangements of four guanine bases linked by a Hoogsteen hydrogen-bond network[166–168].
Such structures are stabilized in the presence of K+ or Na+ cations located between the G-quartets. G-quadruplexes have attracted considerable attention because of their potential as therapeutic targets for cancer[169, 170]. For example, stabilization of the G-quadruplex within telomeric DNA and oncogene promoter regions can inhibit telomere elongation in cancer cells and oncogene transcription or translation, respectively[171–174]. Besides their biological applications, G-quadruplex structures can be utilized as building blocks in nanodevices[175] and optomechanical molecular motors[176], as their folding and unfolding can be controlled by external stimuli such as light[177], pH[178], metal cations[179, 180], and small molecules[181–183]. Light is a promising external trigger with multiple advantages, including high precision, eco-friendliness, spatiotemporal control, and non-invasiveness[184, 185]. The introduction of photolabile groups into G-quadruplex structures is one of the most widely used methods to regulate G-quadruplex formation[186, 187].

Figure 4.1: (a) The structures of the three azobenzene units in the trans isomer; G refers to the guanosine moieties. (b) Schematic representation of the photoswitchable G-quadruplex structure with azobenzene residues (AZ1) in green. K+ cations are shown as purple spheres.

Moreover, azobenzene derivatives have been incorporated into G-quadruplexes that can reversibly fold or unfold upon light irradiation[177, 188]. Heckel and co-workers developed the smallest photocontrollable DNA switch reported to date, i.e., a photoswitchable G-quadruplex in which two sets of two guanosines were connected through the photoswitchable azobenzene derivatives AZ1, AZ2, and AZ3 as part of the backbone (Fig. 4.1a)[188].

4.2 Computational protocol

The G-quadruplex structure with AZ1 (PDB code 2N9Q)[188] was used as the starting structure for constructing the model system.
The Parmbsc0[189] force field was selected for the G-quadruplex nucleobases. Recent studies have reported that Parmbsc0 is a valid force field for DNA simulations[190–192], in particular on the nanosecond timescale. Because azobenzene is a non-standard residue, its parameters were taken from the Generalized Amber Force Field (GAFF)[193]. Partial atomic charges of all azobenzene-derivative atoms (Fig. 4.2) were assigned with the restrained electrostatic potential (RESP) method[194] at the HF/6-31G* level of theory. The K+ ions were described using the Amber-adapted Åqvist model[195]. The RESP charges and GAFF parameters are reported in the tables below.

Figure 4.2: Atom labeling scheme of the azobenzene derivatives AZ1, AZ2, and AZ3 used in the force field and in the QM region of the QM/MM simulations.

The G-quadruplex structure was placed in a cubic water box with a 10 Å buffer in each direction. The structure was solvated with water molecules described by the TIP3P model and subjected to 2500 steps of energy minimization using the steepest descent algorithm. The minimized structure was then equilibrated in the NVT ensemble (300 K) for 1 ns, followed by 2 ns of NPT equilibration (1 atm), using a velocity-rescaling thermostat[196, 197] and a Parrinello–Rahman barostat[198, 199] (τT = 0.1 ps, τP = 1 ps). The cutoff for van der Waals and electrostatic interactions was set to 10.0 Å. Long-range electrostatic interactions were calculated using the particle mesh Ewald (PME) method[200], and the LINCS algorithm[201] was used to constrain all bond lengths.
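The equilibration settings above, together with the 50 ns NPT production run described next, map naturally onto GROMACS .mdp parameter files. A minimal sketch that generates such parameter text from Python; the 2 fs time step, the single `System` temperature-coupling group, the isotropic coupling type, the water compressibility of 4.5e-5 bar⁻¹, and the `cutoff-scheme` entry are assumptions not stated in the text, and the helper name `make_mdp` is hypothetical.

```python
def make_mdp(length_ns: float, ensemble: str = "NPT", dt_ps: float = 0.002) -> str:
    """Return GROMACS .mdp text for one MD stage with the settings described above."""
    params = {
        "integrator": "md",
        "dt": dt_ps,                                # assumed 2 fs time step
        "nsteps": round(length_ns * 1000 / dt_ps),  # ns -> ps -> number of steps
        "cutoff-scheme": "Verlet",                  # assumption
        "coulombtype": "PME",                       # particle mesh Ewald
        "rcoulomb": 1.0,                            # 10.0 Å cutoff, in nm
        "rvdw": 1.0,
        "constraints": "all-bonds",                 # all bond lengths constrained
        "constraint-algorithm": "lincs",
        "tcoupl": "V-rescale",                      # velocity-rescaling thermostat
        "tc-grps": "System",                        # single coupling group (assumption)
        "tau-t": 0.1,                               # ps
        "ref-t": 300,                               # K
    }
    if ensemble == "NPT":
        params.update({
            "pcoupl": "Parrinello-Rahman",
            "pcoupltype": "isotropic",              # assumption
            "tau-p": 1.0,                           # ps
            "ref-p": 1.01325,                       # 1 atm expressed in bar
            "compressibility": 4.5e-5,              # bar^-1, typical for water (assumption)
        })
    else:
        params["pcoupl"] = "no"
    return "".join(f"{key:<22} = {value}\n" for key, value in params.items())

# Stages described in the protocol: 1 ns NVT, 2 ns NPT, 50 ns NPT production
stages = {name: make_mdp(ns, ens) for name, ns, ens in
          [("nvt", 1, "NVT"), ("npt", 2, "NPT"), ("md", 50, "NPT")]}
print(stages["md"])
```

Keeping all stages in one function makes it explicit that only the run length and the barostat differ between equilibration and production.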
Finally, the MD production run was performed in the NPT ensemble for 50 ns. The 2N9Q structure was also adapted to build the G-quadruplex structures containing AZ2 and AZ3, and a similar simulation setup was used for their MD production runs. All MD simulations were performed using the GROMACS 2018.2 package[202].

Table 4.1: Atom names, atom types, and RESP charges used for AZ1, AZ2 and AZ3 in MD simulations. Atoms marked "–" are not present in the corresponding derivative.

| Atom name | Atom type | AZ1 charge | AZ2 charge | AZ3 charge |
|---|---|---|---|---|
| C5’ | c3 | 0.4030495 | 0.4030495 | -0.578415 |
| C1A | ca | -0.170015 | -0.170015 | -0.170015 |
| C2A | ca | -0.095668 | -0.095668 | -0.095668 |
| C6A | ca | -0.095668 | -0.095668 | -0.095668 |
| C3A | ca | -0.206585 | -0.206585 | -0.206585 |
| C5A | ca | -0.206585 | -0.206585 | -0.206585 |
| C4A | ca | 0.342622 | 0.342622 | 0.342622 |
| N5A | ne | -0.202255 | -0.202255 | -0.202255 |
| C1B | ca | -0.170015 | -0.170015 | -0.095668 |
| C2B | ca | -0.095668 | -0.095668 | -0.095668 |
| C6B | ca | -0.095668 | -0.095668 | -0.170015 |
| C3B | ca | -0.206585 | -0.206585 | -0.206585 |
| C5B | ca | -0.206585 | -0.206585 | -0.206585 |
| C4B | ca | 0.342622 | 0.342622 | 0.342622 |
| N5B | nf | -0.202255 | -0.202255 | -0.202255 |
| C3’ | c3 | 0.331041 | 0.331041 | -0.578415 |
| H5’1 | h1 | 0.038780 | 0.038780 | 0.015467 |
| H5’2 | h1 | 0.038780 | 0.038780 | 0.015467 |
| H2A | ha | 0.132519 | 0.132519 | 0.132519 |
| H6A | ha | 0.132519 | 0.132519 | 0.132519 |
| H3A | ha | 0.124774 | 0.124774 | 0.124774 |
| H5A | ha | 0.124774 | 0.124774 | 0.124774 |
| H2B | ha | 0.132519 | 0.132519 | 0.132519 |
| H6B | ha | 0.132519 | 0.132519 | 0.132519 |
| H3B | ha | 0.124774 | 0.124774 | 0.124774 |
| H5B | ha | 0.124774 | 0.124774 | 0.124774 |
| H3’1 | h1 | 0.038780 | 0.038780 | 0.015467 |
| H3’2 | h1 | 0.038780 | 0.038780 | 0.015467 |
| C7A | c3 | – | – | 0.4030495 |
| H7’1 | h1 | – | – | 0.038780 |
| H7’2 | h1 | – | – | 0.038780 |
| C8B | c3 | – | – | 0.4030495 |
| H8’1 | h1 | – | – | 0.038780 |
| H8’2 | h1 | – | – | 0.038780 |

Table 4.2: GAFF parameters for bonds of AZ1, AZ2 and AZ3. The force constants (Kb) and equilibrium bond lengths (b0) are given in kJ/(mol·nm²) and nm, respectively.

| Atom types | b0 (nm) | Kb (kJ/(mol·nm²)) | Present in |
|---|---|---|---|
| c3-ca | 1.5156e-01 | 2.6861e+05 | AZ1, AZ2, AZ3 |
| c3-h1 | 1.0969e-01 | 2.7665e+05 | AZ1, AZ2, AZ3 |
| ca-ca | 1.3984e-01 | 3.8585e+05 | AZ1, AZ2, AZ3 |
| ca-ha | 1.0860e-01 | 2.8937e+05 | AZ1, AZ2, AZ3 |
| ne-nf | 1.2632e-01 | 6.0450e+05 | AZ1, AZ2, AZ3 |
| ca-ne | 1.4079e-01 | 3.2577e+05 | AZ1, AZ2, AZ3 |
| ca-nf | 1.4079e-01 | 3.2577e+05 | AZ1, AZ2, AZ3 |
| c3-c3 | 1.5375e-01 | 2.5179e+05 | AZ3 only |

Table 4.3: GAFF parameters for angles of AZ1, AZ2 and AZ3. The force constants (Kθ) and equilibrium angles (θ0) are given in kJ/(mol·rad²) and degrees, respectively.
| Atom types | θ0 (deg) | Kθ (kJ/(mol·rad²)) | Present in |
|---|---|---|---|
| c3-ca-ca | 1.2077e+02 | 5.3162e+02 | AZ1, AZ2, AZ3 |
| ca-c3-h1 | 1.0956e+02 | 3.9321e+02 | AZ1, AZ2, AZ3 |
| ca-ca-ca | 1.2002e+02 | 5.5748e+02 | AZ1, AZ2, AZ3 |
| ca-ca-ha | 1.1988e+02 | 4.0317e+02 | AZ1, AZ2, AZ3 |
| ca-ca-ne | 1.2061e+02 | 5.6777e+02 | AZ1, AZ2, AZ3 |
| ca-ne-nf | 1.1517e+02 | 5.8919e+02 | AZ1, AZ2, AZ3 |
| ca-ca-nf | 1.2061e+02 | 5.6777e+02 | AZ1, AZ2, AZ3 |
| h1-c3-h1 | 1.0846e+02 | 3.2836e+02 | AZ1, AZ2, AZ3 |
| ne-nf-ca | 1.1517e+02 | 5.8919e+02 | AZ1, AZ2, AZ3 |
| c3-c3-h1 | 1.0956e+02 | 3.8819e+02 | AZ3 only |
| ca-c3-c3 | 1.1207e+02 | 5.2844e+02 | AZ3 only |

Table 4.4: GAFF parameters for dihedrals of AZ1, AZ2 and AZ3. The force constants (Kϕ) and phase angles (ϕ0) are given in kJ/mol and degrees, respectively; n is the multiplicity.
| Atom types | ϕ0 (deg) | Kϕ (kJ/mol) | n | Present in |
|---|---|---|---|---|
| c3-ca-ca-ca | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| c3-ca-ca-ha | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ca-ca-ca-ca | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ca-ca-ca-ha | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ca-ca-c3-h1 | 0.00 | 0.00000 | 0 | AZ1, AZ2, AZ3 |
| ca-ca-ca-ne | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ca-ca-ne-nf | 180.00 | 0.00000 | 3 | AZ1, AZ2, AZ3 |
| ca-ne-nf-ca | 180.00 | 86.80000 | 2 | AZ1, AZ2, AZ3 |
| ne-ca-ca-ha | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ne-nf-ca-ca | 180.00 | 0.00000 | 3 | AZ1, AZ2, AZ3 |
| nf-ca-ca-ha | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ha-ca-ca-ha | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ca-ca-ca-nf | 180.00 | 15.16700 | 2 | AZ1, AZ2, AZ3 |
| ca-c3-c3-h1 | 0.00 | 0.65084 | 3 | AZ3 only |
| ca-ca-c3-c3 | 0.00 | 0.00000 | 0 | AZ3 only |
| h1-c3-c3-h1 | 0.00 | 0.65084 | 3 | AZ3 only |

4.3 Results and conclusions

The production run for AZ1 showed that the upper K+ is able to escape from the G-quadruplex and move above the top G-quartet. Furthermore, the K+ located in the middle of the structure can move to the site from which the previous K+ escaped. This observation shows that the chosen model for the K+ ions does not reproduce the experimental conditions, in which all ions remain stably bound inside the G-quadruplex. This finding ultimately led us to seek better force field parameters for describing the cations within the G-quadruplex. We found that, by employing the parameters developed by Joung and Cheatham[203], the cations remained stable during further MD production runs with all types of azobenzene molecules. The time evolution of the RMSD of the atomic positions of azobenzene and the G-quartets is reported in Fig. 4.3 and reveals the existence of two different ground-state populations of AZ1. This study was further extended by Dr. K. G. Moghaddam, who modeled the potential energy surfaces (PESs) of the isomerization of each azobenzene compound in the gas phase and computed their absorption spectra while embedded in the DNA G-quadruplex using hybrid quantum mechanical/molecular mechanical (QM/MM) methods. Although these findings are described in detail in the doctoral thesis of Dr. Moghaddam, I provide a summary of them in the sections below for the sake of completeness.

Table 4.5: Ion Lennard-Jones parameters and water models used in the MD simulations. Ion parameters are abbreviated as follows: AA, Amber-adapted Åqvist; JC, Joung and Cheatham. In AZn, n refers to the structure number of the azobenzene derivatives AZ1, AZ2, and AZ3.

| Simulation group | Ion parameters | Ion type | Water model | σ (nm) | ϵ (kJ/mol) |
|---|---|---|---|---|---|
| AZn-AAK+ | AA | K+ | TIP3P | 4.736 × 10⁻¹ | 1.372 × 10⁻³ |
| AZn-JCK+ | JC | K+ | SPC/E | 2.838 × 10⁻¹ | 1.798 |
| AZn-JCKCl | JC | K+ | SPC/E | 2.838 × 10⁻¹ | 1.798 |
| | | Cl− | | 4.830 × 10⁻¹ | 5.349 × 10⁻² |

Figure 4.3: RMSDs as a function of simulation time. (a, d, and g) All atoms of the G-quadruplexes in the different simulation groups described in the text and Table 4.5. (b, e, and h) G-quartets and azobenzenes in the JCKCl simulations. Superimposed structures of (c) the representative structures of the two different ground-state populations of AZ1 and (f and i) two random snapshots of AZ2 and AZ3 in the JCKCl MD simulations.
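The bonded parameters in Tables 4.2–4.4 enter standard GROMACS functional forms. As a sketch, assuming the usual GROMACS conventions (harmonic bonds and angles with a prefactor of 1/2, and periodic proper dihedrals V = Kϕ[1 + cos(nϕ − ϕ0)]), individual terms can be evaluated as follows; the function names are illustrative.

```python
import math

def bond_energy(b_nm, b0_nm, kb):
    """Harmonic bond: V = 1/2 * Kb * (b - b0)^2 in kJ/mol (Kb in kJ mol^-1 nm^-2)."""
    return 0.5 * kb * (b_nm - b0_nm) ** 2

def angle_energy(theta_deg, theta0_deg, ktheta):
    """Harmonic angle: V = 1/2 * Ktheta * (theta - theta0)^2, Ktheta in kJ mol^-1 rad^-2."""
    dtheta = math.radians(theta_deg - theta0_deg)
    return 0.5 * ktheta * dtheta ** 2

def dihedral_energy(phi_deg, phi0_deg, kphi, n):
    """Periodic proper dihedral: V = Kphi * (1 + cos(n*phi - phi0)) in kJ/mol."""
    return kphi * (1.0 + math.cos(math.radians(n * phi_deg - phi0_deg)))

# CNNC torsion (ca-ne-nf-ca, Table 4.4): phi0 = 180 deg, Kphi = 86.8 kJ/mol, n = 2.
# Both cis (0 deg) and trans (180 deg) sit at the minimum; the maximum is at 90 deg.
for phi in (0, 90, 180):
    print(phi, dihedral_energy(phi, 180.0, 86.8, 2))
```

With n = 2 and ϕ0 = 180°, the CNNC term alone gives a 2Kϕ ≈ 173.6 kJ/mol ground-state torsional barrier at 90°, which is why the force field keeps both isomers locked on the MD timescale.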
4.4 Photoisomerization of azobenzene derivatives in the gas phase

To characterize the photoisomerization of the azobenzene derivatives, it is crucial to locate the critical points on the PESs involved in the reaction. Such critical geometries are minima and crossing points of electronic states, which ultimately determine the dynamics of the isomerization. The PESs of the ground state (S0) and the two lowest excited singlet states (S1 and S2) of AZW1, AZW2, and AZW3 were computed through a series of constrained optimizations at the SF-B5050LYP/cc-pVDZ level of theory, where the constraints enforce a scan of the CNNC dihedral angle over the range 0–180 degrees. Minima of the electronic states were computed with relaxed optimizations, while the two minimum energy crossing points (MECPs), S2/S1 and S1/S0, were located using the branching plane updating algorithm[204]. The energy profile resulting from these calculations is depicted in Fig. 4.4. This result shows that all three derivatives undergo similar photoisomerization reactions, which occur via three consecutive steps: (i) S0 → S2 excitation, (ii) rapid decay from S2 to S1 through the conical intersection CIS2/S1, and (iii) decay to the ground state of the trans or cis isomer via CIS1/S0. The similar photodynamics observed for the AZ1, AZ2, and AZ3 derivatives indicates that the size and the substitution pattern do not significantly affect the ultrafast cis–trans photoisomerization mechanism of the azobenzene unit, as also observed in experiments.

Figure 4.4: Schematic representation of the PESs of the AZ1 photoisomerization mechanism as a function of the CNNC dihedral angle. The ground state (S0), first (S1), and second (S2) excited states are shown in blue, red, and green, respectively. The S0 curve is a PES scan along the dihedral angle obtained at the SF-B5050LYP/cc-pVDZ level. The S1 and S2 curves are obtained by connecting the optimized excited-state geometries and the MECPs (shown in purple), calculated at the SF-B5050LYP/cc-pVDZ level of theory.

4.5 QM/MM simulations

It is clear from the previous section that the photoisomerization of AZW1, AZW2, and AZW3 is a multistep process triggered by the promotion of these molecules to the S2 state after absorption of light of the appropriate wavelength. Therefore, it is important to understand how the absorption band of the azobenzene derivatives is affected by the presence of the G-quartets. The absorption spectra of each compound embedded in the G-quadruplex were computed by sampling 90 snapshots from the previous MD simulations and computing electronic excitations within a QM/MM framework. The MM region treats the atoms of the DNA, cations, and solvent molecules as point charges that add one-electron integrals to the Hamiltonian of the QM region, thereby accounting for the polarization of the QM region by its environment. The QM region is defined by the atoms of the azobenzene compounds plus the hydrogen link atoms introduced to cap the dangling bonds created by partitioning the system into QM and MM regions. The electronic excitations, computed at the ωB97X-D/cc-pVDZ level of theory, were used in a Gaussian convolution to generate the absorption spectra (Fig. 4.5). The spectral signatures obtained from the QM/MM calculations are not only in good agreement with experiment but also show that the polarization effect of the G-quadruplex has a minimal impact on the position of the absorption peaks. The width of the absorption bands can be attributed to the thermal fluctuations sampled along the MD simulations.

Figure 4.5: Absorption spectra of AZ1, AZ2, and AZ3 obtained by a Gaussian convolution of the excitation energies of 90 MD snapshots.
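The Gaussian convolution used to build the absorption spectra from the 90 snapshots can be sketched as follows. This is a minimal version under stated assumptions: each excitation is weighted by its oscillator strength, every snapshot contributes equally, and the broadening width `sigma_ev` (0.1 eV by default) is a hypothetical choice, since the text does not report the width or weighting actually used. The input energies in the example are synthetic.

```python
import numpy as np

def gaussian_spectrum(energies_ev, strengths, grid_ev, sigma_ev=0.1):
    """Broaden stick spectra from MD snapshots into a continuous band.

    energies_ev, strengths: arrays of shape (n_snapshots, n_states)
    grid_ev: energy grid on which the spectrum is evaluated
    sigma_ev: Gaussian standard deviation (hypothetical default)
    """
    e = np.atleast_2d(np.asarray(energies_ev, dtype=float))
    f = np.atleast_2d(np.asarray(strengths, dtype=float))
    x = np.asarray(grid_ev, dtype=float)
    # one normalized Gaussian per excitation, evaluated on the whole grid
    diff = x[None, None, :] - e[:, :, None]
    gauss = np.exp(-0.5 * (diff / sigma_ev) ** 2) / (sigma_ev * np.sqrt(2.0 * np.pi))
    # oscillator-strength weighting, averaged over snapshots
    return (f[:, :, None] * gauss).sum(axis=(0, 1)) / e.shape[0]

# Synthetic example: 90 snapshots with one bright state each
rng = np.random.default_rng(0)
energies = rng.normal(loc=3.9, scale=0.15, size=(90, 1))
strengths = np.ones((90, 1))
grid = np.linspace(3.0, 5.0, 401)
spectrum = gaussian_spectrum(energies, strengths, grid)
print(grid[np.argmax(spectrum)])
```

In this construction the band width is dominated by the spread of excitation energies across snapshots rather than by `sigma_ev`, which mirrors the attribution of the band width to thermal fluctuations along the MD trajectory.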
Chapter 5 Origin of magnetic anisotropy in nickelocene molecular magnet and resilience of its magnetic behavior

5.1 Introduction

Molecular magnets have potential applications as building blocks of spin-based memory devices. Individual molecules can be deposited on a surface or self-assembled into 3D architectures, giving rise to scalable magnetic materials[205–207]. Using molecular magnetic units affords high chemical tunability. To be a good magnet, a molecule should possess magnetic anisotropy, i.e., an orientational dependence of its ability to be magnetized. Microscopically, magnetic anisotropy originates from a large spin-orbit coupling (SOC) that gives rise to zero-field splitting (ZFS) of the magnetic sublevels[208]. Such magnetic anisotropy yields slow magnetic relaxation, providing energy levels that are well defined by their spin S and spin projection MS quantum numbers and well separated in energy. Assuming that coherence times are sufficiently long[209], realizing a molecule-based quantum device also requires that its states be easy to address by inducing transitions with light or microwave fields[210, 211]. Addressability, the ability to control such energy levels and generate superposition states, permits initialization, manipulation, and read-out of the individual molecular magnetic units of a quantum device. To develop individually addressable molecular magnets, one should deposit the molecules on a surface and then investigate their electronic structure and magnetic behavior with a probing technique[205, 206]. Spin-flip transitions between the magnetic sublevels of the system can be induced by microwaves in an electron paramagnetic resonance (EPR) setting[212] or via inelastic electron tunneling (i.e., within the junction of a scanning tunneling microscope (STM))[213–215], which also reveals the energy spacing between the magnetic sublevels (i.e., the spin-orbit splitting or magnetic anisotropy).
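The zero-field splitting of an S = 1 ion such as Ni(II) is conventionally parameterized by the spin Hamiltonian H = D[Sz² − S(S+1)/3] + E(Sx² − Sy²), whose three sublevels lie at −2D/3 (MS = 0) and D/3 ± E. A minimal numerical check of this standard result; the D and E values are arbitrary placeholders, not computed or measured parameters for nickelocene.

```python
import numpy as np

def spin1_operators():
    """Sx, Sy, Sz for S = 1 in the |+1>, |0>, |-1> basis (units of hbar)."""
    s_plus = np.sqrt(2.0) * np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=complex)
    s_minus = s_plus.conj().T
    sx = (s_plus + s_minus) / 2.0
    sy = (s_plus - s_minus) / 2.0j
    sz = np.diag([1.0, 0.0, -1.0]).astype(complex)
    return sx, sy, sz

def zfs_levels(d, e):
    """Eigenvalues of H = D[Sz^2 - S(S+1)/3] + E(Sx^2 - Sy^2) for S = 1, ascending."""
    sx, sy, sz = spin1_operators()
    h = d * (sz @ sz - (2.0 / 3.0) * np.eye(3)) + e * (sx @ sx - sy @ sy)
    return np.linalg.eigvalsh(h)

# Placeholder parameters (e.g. in cm^-1), not values for NiCp2
print(zfs_levels(d=10.0, e=1.0))  # approximately [-6.667, 2.333, 4.333]
```

For D > 0 the nonmagnetic MS = 0 sublevel is lowest and the MS = ±1 pair lies D above it (split by 2E); this spacing is what inelastic tunneling or EPR probes as the magnetic anisotropy.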
STM affords atomic-scale spatial resolution, allowing one to address individual molecules, whereas EPR experiments require larger ensembles of magnetic units (e.g., solutions or molecular crystals). Additionally, since the electronic structure of the molecule, and consequently its response to an applied magnetic field, may be significantly altered by the environment due to, for example, charge transfer or polarization, it is important to verify that the spin states and the related spin dynamics (coherence or magnetic relaxation times) are retained upon adsorption on a surface[216]. Recently, the nickelocene molecular magnet (NiCp2, Cp = cyclopentadienyl) adsorbed on metal surfaces has been investigated by STM experiments[217–221]. Owing to the robustness of NiCp2's magnetic anisotropy upon adsorption and the ability to address its energy levels by STM, NiCp2 has also been used to functionalize the STM's metallic tip, probing double spin-flip excitations as well as exchange interactions and coupled spin-vibration transitions in the NiCp2 dimer (i.e., one NiCp2 is anchored to the tip and positioned above another NiCp2 deposited on a surface)[218–221]. To gain insight into the magnetic and spin properties of NiCp2 deposited on a surface, we investigate the electronic structure and magnetic behavior of the molecule adsorbed on a support, using MgO(001) as a model surface. MgO was chosen because of its insulating character and low phonon density, which suppress magnetic relaxation via spin-phonon coupling[222].

5.2 Computational protocol

To obtain a reliable finite-cluster model of NiCp2 on the MgO(001) surface, we used an embedded cluster approach, which is often employed in computational catalysis to describe isolated point defects or isolated adsorbed molecules on ionic surfaces[223].
This structural model is well suited to describing individual molecular magnets on a surface, whereas periodic boundary conditions would require a large supercell to minimize artificial interactions between the molecule and its periodic images.

Figure 5.1: a) Embedded cluster setup used for structure optimization: the all-electron QM region (NiCp2/Mg49O49) is treated with PBE0/6-31G*, while the outermost region contains point charges. The QM region is shown lifted for clarity. Here, NiCp2 is on top of an Mg2+ adsorption site. b) Top and side views of the embedded NiCp2/Mg49O49 PBE0 region. c) Top and side views of a smaller cut-out (NiCp2/Mg25O25) used for the SF-TD-DFT calculations. The bond of the metal with the Cp centroid is shown with blue dashed lines. Color code: Ni, purple; Mg, green; O, red; C, gray; H, white.

First, we performed a DFT structure optimization of the NiCp2/Mg49O49 model cluster embedded in a sufficiently large array of point charges mimicking the ionic MgO surface. Fig. 5.1 shows our embedded cluster model (NiCp2 plus MgO surface). We increased the number of point charges until the adsorption energy and the equilibrium distance between NiCp2 and the MgO(001) surface were converged. Second, we considered a smaller cut-out (NiCp2/Mg25O25) of the optimized NiCp2/Mg49O49 model cluster and investigated the spin states and magnetic properties of NiCp2 on MgO(001) using the SF-TD-DFT ansatz. For the DFT structure optimizations, we employed the PBE0[146, 224] functional, whereas to compute the spin states, we followed the recommendations in refs [225] and [226] for transition-metal compounds and used SF-PBE0 and SF-LRC-ωPBEh[227] within the noncollinear formulation of SF-TD-DFT[228, 229]. The size and shape of the NiCp2/Mg25O25 quantum mechanical region were chosen to minimize the computational cost while providing a structural model comparable with models already available in the literature for adsorbates on MgO(001)[230, 231].

STM images of NiCp2 on Cu(001)[217–219] and Ag(110)[220, 221] show that NiCp2 binds perpendicularly to the surface through one Cp ring, whereas the other Cp ring is exposed to the vacuum. For these reasons, we take inspiration from previous studies on metal substrates[217–221] and select NiCp2 perpendicularly adsorbed on MgO(001) as a representative structure. Additionally, we investigate two possible adsorption sites (i.e., the Ni(II) ion on top of oxygen and on top of magnesium).

5.3 Results and conclusions

Our calculations reveal that NiCp2 adsorbs perpendicularly to the surface, with O-top being the more favorable adsorption site and an adsorption energy of -7 kcal/mol, suggesting physisorption. The spin density is preserved upon adsorption: the molecule retains S = 1, and there is no significant spin polarization from the surface. Moreover, the spin-state ordering and the orbital character of the electronic states are preserved when compared to calculations for isolated NiCp2, both at the SF-TD-DFT level and with the spin-flip variant of the equation-of-motion coupled-cluster method (EOM-SF-CC). Figs. 5.2 and 5.3 show the natural transition orbitals (NTOs) used to characterize the electronic states. This contribution represents the first theoretical modeling of the magnetic and spin properties of NiCp2 adsorbed on an MgO surface. Furthermore, one of our main findings is that the magnetic anisotropy and susceptibility computed for isolated NiCp2 agree with experimental measurements; this is discussed extensively in the doctoral thesis of Dr. S. Kotaru. All electronic structure calculations were performed with Q-Chem[143, 232], while the magnetic anisotropy, magnetization, and susceptibility were calculated using the ezMagnet module[233] of the ezSpectra suite[234]. As in the previous chapter, I provide below a summary of the findings reported in Dr. Kotaru's doctoral thesis concerning the magnetic properties of nickelocene molecular magnets.
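The adsorption energy of -7 kcal/mol quoted above can be read with the usual supermolecular definition, E_ads = E(NiCp2/MgO) − E(NiCp2) − E(MgO), where negative values indicate favorable binding. A minimal sketch of the arithmetic, assuming total energies in hartree from three calculations at the same level of theory; the energies in the example are invented placeholders chosen only to land near -7 kcal/mol, and no counterpoise (basis-set superposition) correction is applied.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474

def adsorption_energy_kcal(e_complex, e_molecule, e_surface):
    """E_ads = E(complex) - E(molecule) - E(surface), converted from hartree to kcal/mol."""
    return (e_complex - e_molecule - e_surface) * HARTREE_TO_KCAL_PER_MOL

# Placeholder total energies in hartree, not results from this work
print(round(adsorption_energy_kcal(-100.011155, -50.0, -50.0), 2))  # -7.0
```

Because E_ads is a small difference of large totals (about 0.011 hartree here), all three energies must be converged consistently, which is one reason the point-charge array was enlarged until the adsorption energy stopped changing.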
Figure 5.2: Hole and particle NTO pairs of the spinless density matrix giving rise to SOC between states 1 and 2 of nickelocene (EOM-SF-CCSD/cc-pVTZ). Singular values σ are 1.19 and 1.15, respectively. Red, green, and blue axes indicate the x, y, and z coordinate axes, respectively.

Figure 5.3: Hole and particle NTO pairs of the spinless density matrix between states 1 and 2 of the NiCp2/Mg25O25 adsorption complex (SF-PBE0/cc-pVTZ). The Ni atom is on top of O2−. Singular values σ are 0.5 and 0.5, respectively. Red, green, and blue axes indicate the x, y, and z coordinate axes, respectively.

5.4 Magnetic anisotropy and susceptibility of nickelocene molecular magnets

The origin of the magnetic anisotropy of the nickelocene molecular magnet was studied in the gas phase within an EOM-SF-CCSD framework. Six additional derivatives (Fig. 5.4) were included in the investigation to determine the influence of different functional groups on the magnetic and spin properties of the magnets. Molecular geometries of NiCp2 and derivatives 1–3 and 6 were optimized at the ωB97X-D/cc-pVDZ level of theory, while the geometries of derivatives 4[235] and 5[236] were obtained from ref 33 and ref 34, respectively. One triplet ground state and three singlet excited states were computed at the EOM-SF-CCSD/cc-pVTZ level of theory, including the calculation of SOC matrix elements.

Figure 5.4: Structures of six ring-substituted nickelocene derivatives. In complex 1, two C-H groups are substituted with two P atoms. In complexes 2, 3, and 6, two H atoms are substituted with methyl, cyano, and aromatic groups, respectively. Complexes 4 and 5 are bent structures taken from refs 33 and 34, respectively. The bond of the metal with the Cp centroid is shown with blue dashed lines. Color code: Ni, purple; P, orange; N, blue; C, gray; H, white.
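Before turning to the ezMagnet results, it may help to see how a temperature-dependent susceptibility, and its deviation from the Curie law, follows from a set of magnetic sublevels. The sketch below deliberately takes the conventional route that ezMagnet bypasses: an S = 1 spin Hamiltonian with an axial ZFS parameter D, an isotropic g = 2, and a Zeeman term, with Boltzmann-averaged magnetization along x, y, and z (powder average) under a 1 T field. The D value is a hypothetical placeholder, not a parameter from this work; χ is returned in cm³ mol⁻¹ using χ ≈ 0.5585 M/B (M in Bohr magnetons per molecule, B in tesla) and μB/kB ≈ 0.6717 K T⁻¹.

```python
import numpy as np

MUB_OVER_KB = 0.6717138   # Bohr magneton / Boltzmann constant, K per tesla
CHI_FACTOR = 0.55849      # chi (cm^3/mol) = CHI_FACTOR * M (mu_B per molecule) / B (T)

def spin1_operators():
    """Sx, Sy, Sz for S = 1 in the |+1>, |0>, |-1> basis."""
    s_plus = np.sqrt(2.0) * np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=complex)
    s_minus = s_plus.conj().T
    sx = (s_plus + s_minus) / 2.0
    sy = (s_plus - s_minus) / 2.0j
    sz = np.diag([1.0, 0.0, -1.0]).astype(complex)
    return sx, sy, sz

def chi_powder(temperature_k, d_kelvin, field_t=1.0, g=2.0):
    """Powder-averaged molar susceptibility of an S = 1 spin with axial ZFS D (in K)."""
    sx, sy, sz = spin1_operators()
    h_zfs = d_kelvin * (sz @ sz)                       # axial ZFS, energies in kelvin
    chis = []
    for s_n in (sx, sy, sz):                           # field along x, y, z in turn
        h = h_zfs + g * MUB_OVER_KB * field_t * s_n    # Zeeman term g*mu_B*B*S_n
        energies, vecs = np.linalg.eigh(h)
        pops = np.exp(-(energies - energies.min()) / temperature_k)
        pops /= pops.sum()                             # Boltzmann populations
        # moment along the field for each eigenstate, -g <S_n>, in Bohr magnetons
        moments = -g * np.real(np.einsum("ji,jk,ki->i", vecs.conj(), s_n, vecs))
        chis.append(CHI_FACTOR * np.dot(pops, moments) / field_t)
    return float(np.mean(chis))

for t in (5.0, 20.0, 70.0, 250.0):
    chi = chi_powder(t, d_kelvin=40.0)                 # D = 40 K is a placeholder
    print(f"T = {t:5.1f} K   chi*T = {chi * t:.3f}   1/chi = {1.0 / chi:.1f}")
```

With D = 0 the product χT stays at the Curie value of about 1.0 cm³ K mol⁻¹ expected for S = 1 and g = 2; with D > 0 the MS = 0 sublevel is lowest and χT drops at low temperature, which illustrates qualitatively how a zero-field splitting produces departures from Curie behavior in 1/χ.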
The ezMagnet module enables extracting the magnetic anisotropy (from the spin-orbit splitting) and computing macroscopic magnetic properties (magnetization and susceptibility) starting from EOM-CC calculations of the magnetic states and relevant properties (SOCs and angular momentum operators). The calculated macroscopic quantities can then be compared directly with experiments, bypassing the spin-Hamiltonian formalism. A comparison between the computed and measured[237, 238] temperature dependence of the inverse magnetic susceptibility is reported in Fig. 5.5. The computed results agree with experiments and reproduce the deviation from the Curie law below 70 K. Ultimately, it is possible to attribute the magnetic anisotropy and susceptibility of NiCp2-based molecular magnets to the strong spin-orbit coupling between the triplet ground state and the third singlet excited state. Overall, this work represents the first attempt to describe the magnetic properties of NiCp2 molecular magnets within an EOM-SF framework, and it shows excellent agreement with experimental measurements.

Figure 5.5: Calculated temperature dependence of the inverse susceptibility (1/χav) of NiCp2 in the temperature range from 5 to 250 K (top) and from 5 to 80 K (bottom), under an applied field of 1 T. Calculated curves including three and four electronic states are shown in blue and red, respectively. Experimental susceptibility data (black and green curves) are taken from refs 35 and 36, respectively. Experimental magnetization data are not available. This behavior is also preserved for all six derivatives.

Chapter 6 Exploring the Global Reaction Coordinate for Retinal Photoisomerization: A Graph Theory-Based Machine Learning Approach

6.1 Introduction

To develop a clear understanding of a reaction pathway, we need the potential energy surface (PES)[239, 240] of the molecules.
PESs are a central concept in the application of electronic structure methods to the study of the structures, properties, and reactivity of molecular systems. Under the Born-Oppenheimer (BO) approximation[241–243], we can study the dynamics of a molecule on a single potential energy surface with the aid of ab initio molecular dynamics (AIMD)[244, 245]. In AIMD, the potential energy and its derivatives are evaluated “on the fly,” as needed for the integration of the equations of motion of the system. Since the BO approximation breaks down whenever two or more electronic states have a small energy gap, AIMD cannot be used to simulate non-adiabatic processes[246, 247]. Such processes include radiationless decay, intramolecular energy and charge transfer, and most photochemical reactions. One of the most frequently used techniques to simulate the dynamics of complex photochemical reactions is non-adiabatic molecular dynamics (NAMD) using fewest switches surface hopping (FSSH)[248, 249]. In this technique, the forces are computed as gradients of single BO PESs, and the electronic wavefunction is expanded in terms of adiabatic eigenfunctions. The expansion coefficients are evaluated by solving the time-dependent Schrödinger equation with non-adiabatic (NA) coupling terms. The hopping probability between adiabatic electronic states depends on the electronic amplitudes as well as on the non-adiabatic coupling terms. Because of the multistate and multidimensional nature of photoinduced reactions, computing the PESs and the reaction coordinate is highly demanding. Knowledge of the reaction coordinate[250] provides fundamental details of the underlying mechanism of a given chemical transformation. Very often, the definition of a reaction path is based on intuitive considerations; only recently has attention turned to systematic approaches for selecting appropriate variables and mapping them onto multistep kinetics.
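The fewest-switches hopping rule described above can be sketched in the adiabatic representation, where the coefficients obey dc_j/dt = −(i/ħ)E_j c_j − Σ_k c_k (v·d_jk) with d_jk = ⟨φ_j|∇_R φ_k⟩. The population flux into state j from the active state k over a time step Δt gives the hopping probability g_k→j = max[0, −2Δt Re(c_j* c_k v·d_jk) / |c_k|²]. A minimal sketch of this standard rule; the array layout and function names are illustrative, the propagation of the coefficients, velocity rescaling, and frustrated hops are omitted, and the numbers in the example are arbitrary.

```python
import numpy as np

def fssh_hop_probabilities(c, active, v_dot_d, dt):
    """Fewest-switches hopping probabilities out of the active adiabatic state.

    c       : complex electronic coefficients, shape (n_states,)
    active  : index k of the currently occupied state
    v_dot_d : matrix of v . d_jk (nuclear velocity times NAC vectors), shape (n, n)
    dt      : time step, in the inverse units of v_dot_d
    """
    c = np.asarray(c, dtype=complex)
    pop_k = abs(c[active]) ** 2
    probs = np.zeros(len(c))
    for j in range(len(c)):
        if j == active:
            continue
        # rate of population flow from the active state k into state j
        flux_jk = -2.0 * np.real(np.conj(c[j]) * c[active] * v_dot_d[j, active])
        probs[j] = max(0.0, dt * flux_jk / pop_k)
    return probs

def choose_state(probs, active, rng):
    """Hop to state j if a uniform random number falls in its cumulative window."""
    xi = rng.random()
    cumulative = 0.0
    for j, p in enumerate(probs):
        cumulative += p
        if xi < cumulative:
            return j
    return active

rng = np.random.default_rng(0)
c = np.array([np.sqrt(0.9), np.sqrt(0.1)])
v_dot_d = np.array([[0.0, 1.0], [-1.0, 0.0]])   # antisymmetric for real wavefunctions
probs = fssh_hop_probabilities(c, active=0, v_dot_d=v_dot_d, dt=0.1)
print(probs, choose_state(probs, active=0, rng=rng))
```

Only a net flow of population out of the active state produces a nonzero probability, which is what keeps the number of hops to the minimum consistent with the electronic populations.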
The search for reaction coordinates usually involves obtaining and analyzing large amounts of data from molecular dynamics simulations. This approach treats every possible internal coordinate as a candidate for the reaction coordinate. However, including all the internal coordinates in the global reaction coordinate is not feasible except for small molecules. Machine learning-based tools can be very helpful in determining the most important internal coordinates involved in the reaction mechanism. Tavadze et al.[251] used a graph-based technique to search for the reaction coordinate of the cis-trans isomerization of azobenzene and showed that the C-N=N-C dihedral angle corresponds to the reaction coordinate.

Figure 6.1: Retinal photoisomerization: transition from the 11-cis-retinal conformation to the all-trans-retinal conformation. C in cyan (except C11=C12 in blue), H in pink, O in red.

In this work, we model the photoinduced isomerization of retinal, the chromophore present in the human photoreceptor, upon excitation to the lowest and brightest excited state. Photoreceptors consist of a transmembrane protein and a special light-absorbing chromophore that is covalently attached to the associated photosensory domain of the protein[252, 253]. The primary step in visual transduction is initiated when the chromophore in the photoreceptor absorbs light of specific wavelengths[254]. The energy provided by the photon causes the chromophore to undergo photoisomerization, leading to structural changes in the protein through allosteric interactions. The conformational changes in the photoreceptors trigger a chemical signaling cascade that initiates visual phototransduction by the human brain[255]. In the initial stage of vision, retinal is found in the 11-cis-retinal conformation, which isomerizes to the all-trans-retinal isomer upon capturing a photon[252, 253, 256] (see Figure 6.1).
The photoexcitation causes an electronic transition that opens the double bond, creating a temporary single bond that can rotate freely around its axis. Once the excited retinal decays to the ground state, the double bond reforms and locks the molecule into the trans configuration. This photoisomerization is an ultrafast event that occurs on a femtosecond timescale[254, 257, 258], making it one of the fastest reactions in nature. As a consequence of the isomerization, the chromophore activates the opsin protein, which eventually sends a chemical signal to the visual cortex[255]. Understanding the mechanism of retinal photoisomerization is therefore of fundamental importance for the development of therapies for various visual disorders.

In this contribution, we aim to model the pathway of the photoisomerization reaction of retinal with the aid of machine learning concepts such as mutual information (MI) and graph representations. Photoisomerization in retinal was simulated using AIMD and NAMD. We attempt to gain insights into the reaction mechanism through an alternative route to conventional ab initio methods. The goal of the current study is to compare the reaction path obtained with computationally efficient AIMD simulations to that obtained with more compute-intensive NAMD simulations. For this reason, we chose a system for which the reaction path is already known. Application of the proposed method to more general cases is left for future work.

Herein, we report a global reaction coordinate, comprising all internal coordinates (bonds, angles, and dihedral angles) of retinal, to describe the mechanism of its photoisomerization. Density functional theory (DFT) and molecular dynamics provide electronic and dynamic properties of a molecule at the atomic scale.
We obtained the internal coordinates from AIMD and NAMD simulations and calculated the MI between the energy of the highest occupied molecular orbital (HOMO) and the internal coordinates collected from the trajectories. This allows us to rank all these coordinates and to quantify their contribution to the reaction mechanism of retinal photoisomerization. We construct a graph-based network in which each node is characterized by the HOMO energy and the corresponding internal coordinates. Nodes are then connected by edges whose weight is determined by an expression that accounts for the energy difference and the MI of the internal coordinates (eq. 6.2). The global reaction coordinates are determined along the network by following the paths of least action. We show that the most important internal coordinate and the reaction path computed with the graph do not depend on the sampling scheme.

6.2 Theoretical methods

We generated configurations with the AIMD and NAMD methods. We then used them to calculate the mutual information between the internal coordinates and the HOMO energy at each time step of the simulations. Mutual information (MI)[259] measures the dependence between two random variables sampled simultaneously. Unlike a linear correlation coefficient, it also captures non-linear relationships. For a large dataset with many features, MI can help select the subset of the most relevant features and discard the irrelevant ones. It is therefore an important tool for feature selection[260]; for example, it has been used to determine structure-property relationships in nanomaterials[261, 262]. To calculate the MIs, we first built the datasets from the molecular dynamics simulations and defined the internal coordinates as the features of our system. The datasets contain the values of all the internal coordinates, such as bonds, angles, and dihedrals, at each time step.
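Each trajectory frame has to be converted from Cartesian coordinates into the bond, angle, and dihedral features described above. A minimal NumPy sketch of that conversion follows. The function names are hypothetical. Reporting dihedrals in [0°, 360°) is an assumption, chosen so that the transition states appear at 90° and 270° as in this chapter. The atom-index tuples that define each coordinate would come from a labeling such as the one in the supporting information.

```python
import numpy as np


def bond_length(a, b):
    """Distance between atoms a and b."""
    return float(np.linalg.norm(np.asarray(b, float) - np.asarray(a, float)))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees."""
    u = np.asarray(a, float) - np.asarray(b, float)
    v = np.asarray(c, float) - np.asarray(b, float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))


def dihedral(a, b, c, d):
    """Dihedral angle a-b-c-d in degrees, mapped to [0, 360)."""
    a, b, c, d = (np.asarray(p, float) for p in (a, b, c, d))
    axis = (c - b) / np.linalg.norm(c - b)
    # Project the outer bonds onto the plane perpendicular to the b-c axis.
    v = (a - b) - np.dot(a - b, axis) * axis
    w = (d - c) - np.dot(d - c, axis) * axis
    x = np.dot(v, w)
    y = np.dot(np.cross(axis, v), w)
    return float(np.degrees(np.arctan2(y, x)) % 360.0)


def frame_features(xyz, bonds, angles, dihedrals):
    """Feature vector (bonds, then angles, then dihedrals) for one frame.

    xyz is an (n_atoms, 3) array; bonds, angles and dihedrals are
    hypothetical lists of atom-index tuples of length 2, 3 and 4.
    """
    xyz = np.asarray(xyz, float)
    values = [bond_length(*xyz[list(t)]) for t in bonds]
    values += [bond_angle(*xyz[list(t)]) for t in angles]
    values += [dihedral(*xyz[list(t)]) for t in dihedrals]
    return np.array(values)
```

Stacking `frame_features` over all time steps gives the feature matrix (frames × internal coordinates) on which MI and correlations are computed.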
To understand MI[263] mathematically, consider two random variables X and Y, each with its own probability distribution. To evaluate the dependence between them, one measures how similar the joint distribution p(X, Y) is to the factored distribution p(X)p(Y). If X and Y are independent, i.e., p(x, y) = p(x)p(y) for all x ∈ X and y ∈ Y, the MI is zero. The MI between X and Y is given by

$$\mathrm{MI}(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{6.1}$$

To check for correlations between the internal coordinates in our dataset during the course of the reaction, we built the correlation matrix $X^{T}X$, where X is the matrix of all the features. We adopted ideas from graph theory to find the shortest path between nodes, with each configuration along the trajectories represented as a node of the graph. We used the NetworkX Python package[264] to find the shortest path with Dijkstra’s algorithm[265, 266].

6.3 Computational details

We optimized the geometry of trans-retinal at the ωB97X-D/6-31+G* level of theory. We generated two sets of retinal configurations: one extracted from AIMD trajectories and the other from NAMD trajectories obtained with Tully’s FSSH scheme[248]. Figure 6.2 shows the HOMO and LUMO of trans-retinal calculated at the PBE0/STO-3G level of theory. The HOMO is a π orbital, whereas the LUMO has π∗ character. The natural transition orbitals (PBE0/STO-3G) of the S1 and S2 transitions (Figure 6.3) show that the S1 transition is of nπ∗ type and the S2 transition is of ππ∗ type. In the photoisomerization of retinal, the S2 state is the optically active excited state. We therefore performed both the NAMD and AIMD simulations on the S2 surface for the final analysis.
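Equation 6.1 can be estimated from trajectory data by discretizing both variables. The sketch below uses a simple histogram (plug-in) estimator in NumPy. The bin count and the function names are illustrative assumptions, not the settings used in this work. For finite samples, the plug-in estimate carries a small positive bias. Ranking the feature columns by their MI with the HOMO energy gives the ordering from which the top N coordinates are selected.

```python
import numpy as np


def mutual_information(x, y, bins=20):
    """Histogram estimate of MI(X;Y) (eq. 6.1) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x), column vector
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y), row vector
    mask = p_xy > 0  # empty bins contribute 0 (0 log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))


def rank_by_mi(features, target, names, bins=20):
    """Return (name, MI) pairs sorted from most to least informative."""
    scores = [(name, mutual_information(features[:, k], target, bins))
              for k, name in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    torsion = rng.uniform(0.0, 2.0 * np.pi, n)           # toy dihedral (radians)
    unrelated = rng.normal(size=n)                      # toy coordinate, no link to target
    homo = np.cos(torsion) + 0.05 * rng.normal(size=n)  # toy "HOMO energy"
    feats = np.column_stack([unrelated, torsion])
    for name, mi in rank_by_mi(feats, homo, ["unrelated", "torsion"]):
        print(f"{name:10s} {mi:.3f}")
```

The cosine is non-monotonic over a full turn, so the linear correlation between the toy torsion and the target is close to zero. Its MI is nevertheless large, which is the non-linear dependence mentioned above.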
The graph-based approach requires the MI between the internal coordinates and the energy gap between the orbitals involved in the transition that enables the isomerization of retinal. With a minimal basis set, these orbitals correspond to the HOMO and LUMO, as shown by comparing them with the natural transition orbitals. A larger basis with polarization and diffuse functions would change the nature of the virtual orbitals, so that the transition would no longer be between the HOMO and LUMO. This poses an additional challenge: identifying the proper unoccupied orbital at each time step of the trajectories. The choice of basis was also driven by the computational cost of the simulations. Since many geometries need to be sampled, we opted for the smallest standard basis available in Q-Chem. At the same time, the accuracy of the ab initio calculations does not affect the quality of the reaction path. While the minimal basis set was sufficient for this prototypical photoisomerization, we recommend using basis sets of appropriate size after careful inspection of the orbitals associated with the system at hand.

Figure 6.2: HOMO and LUMO of trans-retinal calculated at the PBE0/STO-3G level of theory.

We took the two transition states of the isomerization as starting points for the AIMD simulations. In these transition states, d31 equals 90° and 270°, respectively; d31 is the main dihedral angle along which the photoisomerization occurs (shown in Figure S1 and Table S1 in the supporting information). We ran 125 simulations from each configuration (250 in total). We computed 200 AIMD steps for each trajectory, using a step size of 40 a.u. (∼1 fs), with TDDFT at the PBE0/STO-3G level of theory in Q-Chem[267]. These initial geometries allowed us to sample the TSs of the retinal isomerization while ensuring that half of the trajectories end up in the cis basin and the other half in the trans basin. We observed that such a short simulation time was sufficient for the molecule to reach the cis or trans basin. The system was propagated with the velocity-Verlet algorithm, and the atoms were allowed to exchange energy with a Nosé-Hoover thermostat at 300 K. All AIMD trajectories were computed on the second excited state (S2) of retinal, in which the central double bond associated with d31 opens owing to the ππ∗ transition.

Figure 6.3: Natural transition orbitals representing transitions from the ground state to the S1 and S2 states in trans-retinal. Holes are shown on the left and particles on the right.

The NAMD simulations were run with TDDFT at the PBE0/STO-3G level of theory using the FSSH scheme, without decoherence corrections, as implemented in the PySurf[268] software package. To validate the robustness of the graph analysis, we ran NAMD starting in both the S1 and S2 states, with the non-adiabatic coupling (NAC) vectors computed with Q-Chem. The NAMD results for the S1 state are shown in the supporting information. A total of 50 simulations (25 from cis and 25 from trans) were performed with the maximum initial population in the lowest singlet excited state. We did not observe any photoisomerization during these runs (1000 steps with a time step of 40 a.u.). We therefore sampled the TS in the same way as for AIMD, i.e., by starting 200 additional simulations from TS geometries and propagating them for 30 steps with a time step of 40 a.u. We ended up with 50,000 data points from AIMD and 56,000 from NAMD simulations, which were used to calculate the MI between the HOMO energy and all the internal coordinates.

Figure 6.4: Flowchart of the computational protocol described in this text.

Overall, the NAMD data comprise 25,257 configurations with d31 values within 0°±45° (cis isomer) and 25,460 configurations with d31 values within 180°±45° (trans isomer). We constructed a graph from these data, using the HOMO energy and all the internal coordinates as attributes of each node.
In this way, each node represents a step in the collection of trajectories generated by the dynamics. Mutual information enters the weights of the edges connecting the nodes of the graph. While MI can be used as such, potential redundancies need to be eliminated. To better understand how the internal coordinates are related to one another, we plotted the correlation matrices for three datasets.

We constructed the graphs in two steps. First, we connected the nodes for which the d31 internal coordinate differs by 3° and the HOMO energy increases when moving from one node to the next. With these criteria, we joined two sections of the graph: the nodes between 0° and 90° and those between 180° and 270°. In the next step, we kept the same constraint on the dihedral but followed the nodes for which the HOMO energy decreases when moving from one node to the next. In this way, we constructed the entire graph over the range from 0° to 360°. After the graph was constructed, we used the edge weights of equation 6.2 to find the shortest path from the cis conformer to the trans conformer with Dijkstra’s algorithm. Figure 6.4 shows a flowchart of our workflow. The weight of the edge connecting nodes p and q is

$$w_{pq} = |\Delta E_{pq}| \sum_{i=1}^{N} m_i \, |\Delta f_i^{pq}| \tag{6.2}$$

where $\Delta E_{pq}$ is the difference in HOMO energy between the connected nodes and the summation runs over the top N internal coordinates ranked by their MI value. $\Delta f_i^{pq}$ is the difference in the value of internal coordinate i between the connected nodes, and $m_i$ is the MI value associated with that internal coordinate.

6.4 Results and discussion

Figure 6.5: Energy (in eV) of the HOMO (blue) and LUMO (red) as a function of d31 from NAMD (left) and AIMD (right) simulations.

Figure 6.5 shows the energies of the HOMO and LUMO as a function of d31 from the NAMD simulations. The HOMO-LUMO gap decreases at the transition states, i.e., when the dihedral angle d31 reaches 90° and 270°. This is typical of molecules undergoing photoisomerization, and our NAMD simulations corroborate this pattern.
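The two-step graph construction and the weight of equation 6.2 can be sketched with NetworkX, the package used in this chapter for Dijkstra’s algorithm. The sketch goes beyond the text in several ways, each an assumption:

- Nodes are joined when d31 increases by at most `window` degrees (3° by default).
- The HOMO energy must rise for d31 in [0°, 90°) and [180°, 270°) and fall elsewhere.
- Edges point toward increasing d31.
- The 360°/0° wrap-around is ignored.

The function names are hypothetical. `mi` holds the MI values m_i of the top N coordinates, and `features` holds the corresponding coordinate values for each frame.

```python
import networkx as nx
import numpy as np


def edge_weight(e_p, e_q, f_p, f_q, mi):
    """Equation 6.2: w_pq = |dE_pq| * sum_i m_i |df_i^pq|."""
    return abs(e_q - e_p) * float(np.sum(mi * np.abs(f_q - f_p)))


def build_graph(d31, homo, features, mi, window=3.0):
    """Directed graph over trajectory frames (one node per frame)."""
    graph = nx.DiGraph()
    graph.add_nodes_from(range(len(d31)))
    order = np.argsort(d31)
    for pos, p in enumerate(order):
        for q in order[pos + 1:]:
            step = d31[q] - d31[p]
            if step > window:
                break  # frames are sorted by d31, so no later q qualifies
            if step <= 0.0:
                continue
            d_e = homo[q] - homo[p]
            rising = (d31[p] % 180.0) < 90.0  # assumed uphill segments
            if (rising and d_e > 0) or (not rising and d_e < 0):
                w = edge_weight(homo[p], homo[q], features[p], features[q], mi)
                graph.add_edge(int(p), int(q), weight=w)
    return graph


if __name__ == "__main__":
    # Toy data: a HOMO energy that peaks at the d31 = 90 degree transition state.
    d31 = np.arange(0.0, 181.0, 2.0)
    homo = -5.0 + np.sin(np.radians(d31))
    features = (d31 / 10.0)[:, None]  # one hypothetical top-MI coordinate
    graph = build_graph(d31, homo, features, mi=np.array([1.0]))
    path = nx.dijkstra_path(graph, source=0, target=len(d31) - 1, weight="weight")
    print(len(path), d31[path[0]], d31[path[-1]])
```

All weights in equation 6.2 are non-negative, which is what makes Dijkstra’s algorithm applicable.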
We used the data from the AIMD and NAMD simulations to calculate the mutual information between the internal coordinates and the HOMO energy. For the NAMD data, we compared two cases: (i) the full NAMD dataset and (ii) only the data points close to the transition state. This allowed us to investigate the effect of oversampling the cis and trans basins in the NAMD simulations. Figure 6.6 (top) shows the mutual information between the HOMO energy and the internal coordinates obtained from the AIMD simulation. We chose internal coordinates in proximity to the main dihedral responsible for the photoisomerization. Figure 6.6 (bottom) shows that all the chosen internal coordinates are highly correlated with the HOMO energy at each time step of the NAMD simulations, with d31 and d13 associated with the highest mutual information. The MI plot for the AIMD dataset also shows that the dihedral angles closer to the reaction center play a more important role than the other internal coordinates. d13 and d31 represent the C-C11=C12-C and H-C11=C12-C dihedral angles, respectively, which rotate during the photoisomerization. In the AIMD case (Figure 6.6, top), all the internal coordinates close to d31 contribute similarly, and a similar situation arises for the truncated NAMD dataset. The difference in mutual information among these three datasets can be explained by the oversampling of the cis and trans configurations in the full NAMD simulations. To generate the AIMD dataset and the selected NAMD dataset, we started the simulations from initial configurations close to the transition state. The resemblance between the mutual information plots of the truncated NAMD dataset and the AIMD dataset is therefore expected.
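Oversampling can be controlled by labeling each frame according to d31 before computing MI or correlations. This minimal NumPy sketch reuses the ±45° windows around 0° (cis) and 180° (trans) that Section 6.3 quotes for counting the NAMD configurations. The exact selection criterion for the truncated dataset is not given here, so treating every remaining frame as the transition-state region is an assumption. The function names are hypothetical.

```python
import numpy as np


def angular_distance(a, b):
    """Smallest absolute difference between angles a and b in degrees."""
    return np.abs((np.asarray(a, float) - b + 180.0) % 360.0 - 180.0)


def label_frames(d31, width=45.0):
    """Label frames as 'cis' (0 +/- width), 'trans' (180 +/- width) or 'ts'.

    Treating every frame outside both windows as the TS region is an
    assumption made for this sketch.
    """
    d31 = np.asarray(d31, float)
    labels = np.full(d31.shape, "ts", dtype="<U5")
    labels[angular_distance(d31, 0.0) <= width] = "cis"
    labels[angular_distance(d31, 180.0) <= width] = "trans"
    return labels


if __name__ == "__main__":
    d31 = np.array([0.0, 30.0, 350.0, 90.0, 180.0, 200.0, 270.0, 46.0])
    labels = label_frames(d31)
    near_ts = d31[labels == "ts"]  # frames kept in a truncated dataset
    print(labels, near_ts)
```

The angular distance handles the 0°/360° wrap-around, so a frame at 350° is counted as cis.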
We are interested in identifying which other internal coordinates, apart from d31/d13, may be involved in the isomerization mechanism. For this reason, we built the correlation matrix from our AIMD dataset by normalizing the initial data and calculating the matrix of covariances between all pairs of internal coordinates. Performing this procedure on the full NAMD data would show that all features, including d13 and d31, are uncorrelated. This is because most of the data points correspond to structures of the cis or trans isomer rather than the TS. We found that avoiding the oversampling of the cis and trans isomers in the NAMD data (Figure 6.7, middle) yields a correlation matrix similar to the one obtained from the AIMD simulations (Figure 6.7, top). To compare the correlation matrices obtained from AIMD and NAMD simulations, the NAMD dataset must therefore be restricted so that most of its data points lie close to the transition state. The correlation matrix obtained from the AIMD data (Figure 6.7, top) shows different levels of correlation and anti-correlation between pairs of internal coordinates. d13 and d31 are anti-correlated, as expected since they are alternate angles. Similarly, r12 and r13 are anti-correlated, which is consistent with asynchronous stretching of these bonds. The positive correlation between r30 and r31 suggests a synchronous vibration of the C11-H30 and C12-H31 bonds.

Figure 6.6: Mutual information between internal coordinates and HOMO energy from the AIMD dataset (top), the selected NAMD dataset (middle), and the full NAMD dataset (bottom).

Figure 6.7: Correlation matrix obtained from AIMD data (top), selected NAMD data (middle), and full NAMD data (bottom). The color bar denotes the correlation between internal coordinates.
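The normalization and covariance step can be sketched in NumPy as follows. Standardizing each column and forming XᵀX divided by the number of frames reproduces the Pearson correlation matrix. The function name is hypothetical. The bond lengths in the example are synthetic and only illustrate how out-of-phase stretching, as inferred above for r12 and r13, shows up as a negative off-diagonal entry.

```python
import numpy as np


def correlation_matrix(features):
    """Correlation matrix of the columns of `features` via standardized X^T X.

    Each column (one internal coordinate) is centered and divided by its
    population standard deviation, so Z^T Z / n_frames equals the Pearson
    correlation matrix.
    """
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0)
    if np.any(std == 0.0):
        raise ValueError("constant feature: correlation undefined")
    z = (x - x.mean(axis=0)) / std
    return z.T @ z / x.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 20.0, 2000)
    # Two hypothetical bond lengths stretching out of phase, plus an unrelated coordinate.
    bond_a = 1.40 + 0.03 * np.cos(3.0 * t)
    bond_b = 1.40 - 0.03 * np.cos(3.0 * t) + 0.002 * rng.normal(size=t.size)
    other = rng.normal(size=t.size)
    print(np.round(correlation_matrix(np.column_stack([bond_a, bond_b, other])), 2))
```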
After thoroughly comparing the mutual information and correlation matrices of the AIMD and NAMD datasets, we verified our AIMD results and decided to perform the graph analysis using the mutual information from the AIMD data. The graphs were constructed with both the AIMD and NAMD datasets. The five internal coordinates with the highest MI values in the AIMD dataset were chosen for the construction of the graph. We plotted the HOMO energy as a function of the dihedral angles obtained from the graph with the NAMD data (Figure 6.8). The HOMO energy initially increases, reaches its maximum when d31 is close to 90°, and then decreases, which is consistent with the mechanism of the photoisomerization. Comparing Figures 6.5 and 6.8, the trend in HOMO energy agrees well between the NAMD simulation and the graph analysis. We obtained a similar trend with the AIMD data, i.e., maxima at d31 = 90° and 270°. Comparing NAMD on the S1 and S2 surfaces with AIMD on the second excited (S2) state shows that the graph analysis correctly identifies the most important internal coordinate, independently of the sampling method. Figure 6.9 shows the trend in the internal coordinates as a function of the path parameter, i.e., along the shortest path (red nodes in Figure 6.8). We observe excellent agreement between the AIMD and NAMD simulations. Note that we removed d13, as it is the dihedral angle equivalent to d31. To further assess the effect of the basis set, we performed AIMD simulations at the PBE0/3-21G level of theory. As shown in Figures S6-S9, increasing the basis set does not significantly change the quality of the graph. We still find d31 to be the main internal coordinate contributing to the photoisomerization reaction. The plot of the internal coordinates along the path parameter (Figure S9) resembles the one obtained with the minimal basis set.
We therefore conclude that the choice of basis set does not have a drastic effect on our sampling and, consequently, on the graph. If one wants to compute the reaction path relying only on ab initio methods, both the level of theory and the basis set size will affect the accuracy of the calculations. Our goal is to model the isomerization of retinal through a graph analysis, and the NAMD/AIMD calculations are used only for sampling. We can therefore afford a low-quality basis set to reduce the computational cost. AIMD combined with graph analysis provides the same analysis faster, with accuracy comparable to NAMD combined with graph analysis. We also note that in cases where the non-adiabatic coupling vectors are unavailable, AIMD can still be used to obtain the reaction path when it is combined with graphs. Comparing the computational costs, we found that AIMD trajectories are computed 1.5 times faster (CPU time) than NAMD trajectories, using Q-Chem on an Intel(R) Xeon(R) CPU with a clock speed of 3.07 GHz. AIMD can thus be used as a more affordable alternative to NAMD for simulating the reaction pathways of photoinduced reactions.

Figure 6.8: 2D graph representations (blue nodes with black edges) of the datasets produced with NAMD (left) and AIMD (right) simulations. Nodes are displayed as a function of HOMO energy versus the d31 torsional angle. Red nodes show one possible shortest path found with Dijkstra’s algorithm, resembling a potential energy surface for the isomerization of retinal.

Figure 6.9: Internal coordinates from the NAMD (left) and AIMD (right) simulations along the reaction path found from the graph analysis.

6.5 Conclusion

In summary, we have modeled the photoisomerization of a biologically important system, retinal, with machine learning and graph-theory tools. From the analysis of the correlation matrix, we did not identify any relevant internal coordinate involved in the isomerization apart from the main dihedral angle (d13/d31). The shortest path constructed from the graph is consistent with the expected PES of the reaction, with transition states located around d31 values of 90° and 270°. This computational protocol does not aim to provide quantitative measurements of the reaction. It can, however, provide insights toward a complete description of the reaction coordinate of chemical processes. We have also shown that the reaction paths obtained from AIMD and NAMD simulations agree well with each other, which makes it possible to model reaction paths without data from expensive NAMD simulations. The protocol could be further improved by choosing a better scheme for assigning weights to the edges during graph construction. While we followed Tavadze et al.[251] and calculated the mutual information between the HOMO energy and the internal coordinates, other descriptors may be used in more general cases. For example, the HOMO-LUMO gap obtained during the sampling step can readily serve as a descriptor. Lastly, we believe the protocols described in this work can be applied to more complex processes beyond isomerization reactions.

6.6 Data and Software Availability statement

The following data files are available free of charge at https://zenodo.org/records/4280446: (i) optimized geometries of cis- and trans-retinal; (ii) a figure labelling the internal coordinates of retinal.

Chapter 7 Optical Properties of New Donor–Acceptor Dyes for RNA Imaging: Insights from Ab Initio and Hückel’s Model Calculations

7.1 Introduction

Fluorescent dyes are essential for bioimaging. They are used to visualize structures and processes in cells and in live organisms. Especially important are dyes that can selectively label a particular region of the cell.
One approach to selective imaging of organelles or biomolecules is based on fluorogenic dyes. These are small molecules whose fluorescent properties change in different environments; for example, they enhance fluorescence upon binding to a target or have solvent-dependent Stokes shifts. Here we focus on dyes suitable for live-cell RNA imaging[269]. The localization and movement of RNA in cells are important aspects of cellular function, such as gene expression and protein synthesis, so tools that enable the study of these processes are important. A major challenge in live-cell RNA imaging is a dearth of effective dyes[270]. Motivated by this need, several small molecules based on a push-pull stilbene motif[271], with methyl pyridinium as the electron acceptor and indole or indolizine as the electron donor, were developed and tested[272–274]. Several dyes showed excellent performance: a strong fluorogenic response (a many-fold increase of fluorescence upon binding to RNA), good photostability, low cytotoxicity, etc. The two original dyes from this family, MPI (system (3) in Fig. 7.1) and its non-methylated analog, absorb in the blue-green range (440 and 540 nm, respectively). These dyes showed improved quantum yields relative to the commercially available dye “SYTO RNA Select”[270]. The structures of SYTO dyes are not available, so they cannot be further optimized. Zhang and coworkers[274] discovered that replacing indole with indolizine resulted in the desirable red-shifted emission and generally improved optical properties. These indolizine-based dyes, systems (8) and (9), delivered good contrast for live-cell RNA imaging[274]. However, the relationship between structural modifications and the optical properties of these dyes remained unexplored. Here we use theoretical tools to analyze the electronic structure of the dyes from this family, with the goal of identifying possible strategies for further optimization of their properties.
Given that red-shifted dyes are best suited for live-cell and deep-tissue imaging, it is desirable to identify chemical modifications capable of inducing this advantageous effect. We use a family of molecules reported[274] by Kim et al. to validate our theoretical models, and we then use our insights to predict chemical modifications that can induce a further red shift. We present our results in a series of two papers: a theoretical paper (this contribution, paper 1) and an experimental paper (paper 2) [add citation]. We employ high-level quantum chemistry methods to compute relevant optical properties of the model dyes and analyze the results using charge distributions and molecular orbital theory. To derive insights from many-body simulations, we use reduced quantities such as natural transition orbitals and exciton descriptors. We also use Hückel’s model, which provides an essential description of the electronic structure[275]. Combining high-level quantum chemistry calculations with Hückel’s model allows us to interpret the results in simple terms. It also suggests ways to tune the properties of the dyes by chemical modifications. We validate the theoretical results against experiment. Figure 7.1 shows the structures of the dyes considered in this study. Systems (1)-(3), (8), and (9) have been characterized previously[274], whereas systems (4)-(7), (10), and (11) are new. Molecule (11) has not yet been characterized experimentally.

Figure 7.1: Structures of dyes considered in this study. Red and blue colors denote donor and acceptor moieties, respectively.

After analyzing the trends in the optical properties of (1)-(11), we proposed a new candidate molecule with red-shifted absorption/emission. The theoretical predictions were validated experimentally. The new dye is indeed red-shifted and has good contrast; however, its quantum yield is too small for practical applications, which calls for further optimization.
All molecules are positively charged and consist of donor and acceptor moieties connected through the ethene bridge. In the ground state, the positive charge resides on the acceptor moiety. Upon excitation, electrons move from the donor moiety to the acceptor, which reduces its positive charge. In systems (1)-(9), the positively charged moiety is pyridinium, and in systems (10) and (11) it is isoquinolinium. The donors are indole (systems (1)-(7) and (10)) and indolizine (systems (8), (9), and (11)), with and without methyl substitutions. The pyridinium moiety is always connected to the bridge through the para position, whereas the indole and indolizine moieties are connected in several ways. As we show below, these structural modifications have a noticeable effect on the optical properties of the dyes, such as absorption and emission energies and Stokes and solvatochromic shifts. These variations can be explained by molecular orbital theory and by trends in the charge distributions in the ground and excited states, which suggests ways to tune the properties by further chemical modifications.

The donor–acceptor structural motif of the dyes results in polar charge distributions, which affect their docking properties. Furthermore, electronic excitation in these dyes leads to significant charge rearrangement, namely electron transfer from indole/indolizine to pyridinium/isoquinolinium. Molecules with large charge transfer often show large solvatochromic effects, which can be beneficial for imaging applications. For example, dyes with large solvent-dependent Stokes shifts have been used for selective imaging. This property was exploited in the pyridinium analogs of GFP used for imaging the endoplasmic reticulum[276]. The structure of the paper is as follows. In the next section, we describe the theoretical models and computational details. We then discuss the results of the quantum chemistry and Hückel’s model calculations.
In Section 7.4, we present the new theoretically designed dye and discuss its optical properties as well as directions for further optimization. In this paper, we refer to the experimental measurements, which are fully described in Paper 2 along with the details of the synthesis and characterization of the dyes. Our concluding remarks are given in Section 7.5.

7.2 Theoretical models and computational details

7.2.1 Quantum chemistry calculations

We compute excited states using TD-DFT with selected functionals (ωB97X-D, ωB97M-V, CAM-B3LYP, B5050LYP). We only consider long-range-corrected functionals to mitigate the self-interaction error, which is known to exaggerate charge-transfer effects[277, 278]. We compare the TD-DFT results against the experimental data (see Paper 2 for details) and against EOM-CCSD calculations[279]. We focus on reproducing trends rather than absolute values. Absolute values are difficult to compute with spectroscopic accuracy even with high-level methods. However, a method that reliably reproduces the differences between dyes can be used for the computational design of novel systems and for the interpretation of experimental results. To analyze many-body wavefunctions, we compute reduced quantities such as natural transition orbitals (NTOs) and exciton descriptors[280, 281]. These quantities enable comparison of wavefunctions among different methods and provide physical insights[277, 282]. We describe solvent effects using the polarizable continuum model (PCM). We considered water, dimethylsulfoxide (DMSO), and DNA/RNA. We used the following dielectric constants (at 20 °C) and refractive indexes of solvents modeled with PCM: water (ϵ = 78.54), DMSO (ϵ = 47.00), DNA/RNA (ϵ = 8.00). We note that PCM is likely to yield a relatively poor description of the DNA/RNA environment because it is highly inhomogeneous. We first compute the structures and properties of isolated molecules.
We then optimize ground and excited states in solvents using ωB97X-D/cc-pVDZ. These structures are then used to compute electronic transitions with different functionals and with EOM-CCSD. TD-DFT calculations were carried out with the aug-cc-pVDZ and aug-cc-pVTZ basis sets, and EOM-CCSD calculations were carried out with aug-cc-pVDZ. All calculations were performed using the Q-Chem electronic structure program[44, 45]. Wave-function analysis was carried out using libwfa[281].

7.2.2 Hückel’s model

We also use Hückel’s model to rationalize the trends. Hückel’s model is a semi-empirical theory that provides a simple and intuitive framework for describing conjugated organic molecules. It is a one-electron theory based on a model Hamiltonian, and it captures the essence of the MO-LCAO (molecular orbitals as linear combinations of atomic orbitals) picture of electronic structure. Hückel’s treatment is most commonly applied to π-electron systems. Diagonalization of the Hückel Hamiltonian yields molecular orbitals composed of the 2pz atomic orbitals of the atoms participating in the π-system. The shapes of these orbitals provide a guide to the underlying symmetries and charge distributions in the ground and excited states. As we illustrate below, Hückel’s orbitals are remarkably similar to the molecular orbitals computed by full ab initio calculations. Because of its simplicity, Hückel’s theory is often used to interpret the results of electronic structure calculations. Examples include the optical properties of dyes[283], non-linear optical properties[284, 285], and even electronically metastable states[286].

The workflow of Hückel’s model calculations involves the following steps:
1. Identify the sp2-hybridized atoms that are part of the π-system. Each such atom contributes one 2pz atomic orbital to the basis set.
2. Number these atoms from 1 to n.
3. Construct the Hückel Hamiltonian matrix (H) based on the connectivity of the atoms in the π-framework.
4. Diagonalize H to obtain the molecular orbital energies and coefficients.

We implemented these steps in a Python program (see the SI for details and a link to the GitHub repository). The H matrix has the following form:

H = \begin{pmatrix} \alpha_{11} & \beta_{12} & \cdots & \beta_{1n} \\ \beta_{21} & \alpha_{22} & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{n1} & \cdots & \cdots & \alpha_{nn} \end{pmatrix}    (7.1)

In this matrix, the diagonal elements αii (called Coulomb integrals) are the site energies of the atoms. The off-diagonal elements βij (called resonance integrals) describe the interactions between connected atoms in the network. The meaning of α and β becomes clear for a two-center system such as ethylene. Here α is the energy of the non-interacting sites. The bonding and antibonding orbitals lie at α ± β, so the splitting between them is proportional to β (it equals 2|β|).

A concise description of Hückel theory, its extensions to heteroatomic molecules, and strategies for choosing the parameters was given by Yates[287–289]. Within Hückel’s framework, only connected atoms give rise to non-zero βij, and α values are the same for atoms of the same type. Thus, for carbon atoms, one uses fixed values of α and β. For each new type of atom (in our case, nitrogen), we introduce an additional value of α, which is more negative for more electronegative atoms. We also introduce a βij value for each new type of connectivity between atoms. Further, the value of α for each atom type can be adjusted to reflect different chemical environments (protonation, substituents) or different occupations[287, 288]. For example, the occupation of N in pyridine and in pyrrole is 1 and 2, respectively (Hückel’s occupations refer to the number of electrons that an atom contributes to the π-system).

Table 7.1: Values of α and β integrals (in eV) used for different types of centers within Hückel’s model.
Element | Occupation | αX | βCX | Source
C | 1 | -11.20 | -2.620 | Orig.a
N, pyridine | 1 | -12.51 | -2.096 | Orig.a
N, pyridinium | 1 | -16.44 | -1.834 | Orig.a
N, indole, indolizine | 2 | -15.13 | -2.620 | Orig.a
N, indole, methylated | 2 | -14.13 | -2.000 | This workb
N, indolizine | 2 | -15.13 | -2.000 | This workb
a From Refs. 287, 288. b This work (see text).

Table 7.1 lists the relevant parameters used in this work. The parameters were taken from Yates[287, 288], with the following modifications:
• To account for the electron-donating character of the CH3 group, which destabilizes the site energy of the atom it is attached to, we changed the α value for a doubly occupied N atom connected to a methyl group from -15.13 to -14.13 eV.
• To account for the smaller overlap between the 2pz orbitals of nitrogen and carbon, we changed the β value for a bond between a doubly occupied N atom and a C atom from -2.620 to -2.000 eV.

Figure 7.2: Ground-state dipole moments and electrostatic potential (ESP) maps for dyes (1)-(11) in DMSO; ωB97X-D/aug-cc-pVTZ. Color scheme: blue represents positive charge and red represents negative charge.

7.3 Results and discussion

We begin by analyzing the computed equilibrium structures to determine the relative energies of possible rotamers. The results are summarized in the SI. For some systems, the energy difference between the lowest rotamers is small, so that both can be thermally populated. In these cases, we account for the contributions of the different structures by using Boltzmann-weighted spectra of the thermally accessible rotamers.

We first discuss the pyridinium-based dyes, molecules (1)-(9). The size of the conjugated system is the same in all of them. The structures differ in: (i) where the nitrogen is located in the acceptor moiety (i.e., indole versus indolizine); and (ii) how the indole/indolizine moiety is connected to the bridge. The indole nitrogen is methylated in all structures except (1). Figure 7.2 shows the electrostatic potential (ESP) maps and dipole moments for the selected structures. The molecules are polar.
The dipole moment points from the pyridinium towards the indole/indolizine moiety. It is defined in the standard molecular orientation, in which the molecular center of mass is placed at the origin of the coordinate system. The way the two moieties are connected determines the orientation of the dipole moment relative to the bridge. The ESP maps show that the positive charge is indeed localized on the acceptor moiety.

Figure 7.3: Model dyes ordered by their computed excitation energies (no solvent); ωB97M-V/aug-cc-pVTZ.

Fig. 7.3 shows the range of computed excitation energies. (9) YL158 has the lowest excitation energy and (5) 5MPI has the highest; the difference between them is 0.6 eV. Table 7.3 shows the key electronic properties of the dyes computed with ωB97X-D (experimental values are also given when available; see Paper 2 for details). In all cases, the lowest electronic transition is bright. All chromophores show significant charge-transfer character, manifested by a large change of the dipole moment upon excitation. Hence, it is not surprising that the molecules exhibit strong solvatochromism. Fig. 7.4 shows the NTOs corresponding to the S0→S1 transitions and the changes in ESP upon excitation.

Figure 7.4: NTOs for the S0→S1 transitions of dyes (1)-(11) and the change in ESP upon electronic excitation in DMSO; ωB97M-V/aug-cc-pVTZ. Blue/violet represents net positive charge due to electron transfer from the donor to the acceptor.

Table 7.2: Photophysical properties of the dyes (lowest rotamer structure) in DMSO computed with different levels of theory. Energies in eV, oscillator strengths in parentheses, dipole moments in Debye. ∆E = Eex − Eem (Stokes shift); ∆µ = µex − µgr.
Dye/DMSO | Method | Eex (fL) | Eem (fL) | ∆E | µgr | µex | ∆µ
(1) YL154 | ωB97X-D | 3.27 (1.40) | 2.71 (1.56) | 0.56 | 22.54 | 12.77 | -9.78
(1) YL154 | ωB97M-V | 3.36 (1.42) | 2.71 (1.54) | 0.65 | 22.84 | 14.35 | -8.49
(1) YL154 | B5050LYP | 3.20 (1.40) | 2.70 (1.61) | 0.50 | 22.12 | 10.63 | -11.49
(1) YL154 | CAM-B3LYP | 3.19 (1.36) | 2.67 (1.56) | 0.52 | 22.35 | 11.27 | -11.08
(1) YL154 | EOM-CCSD | 3.64 (1.29) | 2.99 (1.54) | 0.65 | 23.30 | 12.01 | -11.29
(1) YL154 | exp | 2.84 | 2.09 | 0.75 | | |
(2) YL146 | ωB97X-D | 3.10 (1.12) | 2.52 (1.23) | 0.58 | 21.76 | 11.49 | -10.27
(2) YL146 | ωB97M-V | 3.20 (1.14) | 2.53 (1.23) | 0.67 | 22.09 | 13.24 | -8.85
(2) YL146 | B5050LYP | 3.02 (1.12) | 2.50 (1.27) | 0.52 | 21.36 | 9.29 | -12.07
(2) YL146 | CAM-B3LYP | 3.02 (1.08) | 2.49 (1.22) | 0.53 | 21.58 | 9.92 | -11.66
(2) YL146 | EOM-CCSD | 3.44 (1.03) | 2.80 (1.23) | 0.64 | 22.55 | 11.04 | -11.51
(2) YL146 | exp | 2.91 | 2.17 | 0.74 | | |
(3) MPI | ωB97X-D | 3.26 (1.20) | 2.89 (1.34) | 0.37 | 15.15 | 6.92 | -8.23
(3) MPI | ωB97M-V | 3.32 (1.21) | 2.88 (1.33) | 0.44 | 15.42 | 7.92 | -7.50
(3) MPI | B5050LYP | 3.24 (1.20) | 2.94 (1.37) | 0.30 | 14.91 | 5.90 | -9.01
(3) MPI | CAM-B3LYP | 3.16 (1.18) | 2.87 (1.33) | 0.29 | 15.00 | 6.17 | -8.83
(3) MPI | EOM-CCSD | 3.43 (1.11) | 2.94 (1.30) | 0.49 | 15.89 | 6.88 | -9.01
(3) MPI | exp | 2.82 | 2.30 | 0.52 | | |
(4) 4MPI | ωB97X-D | 3.14 (0.87) | 2.50 (1.01) | 0.64 | 15.63 | 5.87 | -9.76
(4) 4MPI | ωB97M-V | 3.25 (0.91) | 2.53 (1.03) | 0.72 | 15.91 | 7.48 | -8.43
(4) 4MPI | B5050LYP | 3.07 (0.85) | 2.50 (1.04) | 0.57 | 15.36 | 4.31 | -11.05
(4) 4MPI | CAM-B3LYP | 3.05 (0.82) | 2.45 (0.99) | 0.60 | 15.51 | 4.63 | -10.88
(4) 4MPI | EOM-CCSD | 3.43 (0.73) | 2.67 (0.92) | 0.76 | 16.38 | 5.37 | -11.01
(4) 4MPI | exp | 2.87 | 2.06 | 0.81 | | |
(5) 5MPI | ωB97X-D | 3.42 (1.31) | 2.83 (1.48) | 0.59 | 17.04 | 7.44 | -9.60
(5) 5MPI | ωB97M-V | 3.51 (1.30) | 2.83 (1.44) | 0.68 | 17.32 | 8.99 | -8.33
(5) 5MPI | B5050LYP | 3.37 (1.30) | 2.85 (1.52) | 0.52 | 16.68 | 5.79 | -10.89
(5) 5MPI | CAM-B3LYP | 3.34 (1.27) | 2.80 (1.47) | 0.54 | 16.88 | 6.08 | -10.80
(5) 5MPI | EOM-CCSD | 3.68 (1.19) | 2.96 (1.44) | 0.72 | 17.77 | 6.15 | -11.62
(5) 5MPI | exp | 3.05 | 2.52 | 0.53 | | |
(6) 6MPI | ωB97X-D | 3.33 (1.32) | 2.70 (1.52) | 0.63 | 18.48 | 8.22 | -10.26
(6) 6MPI | ωB97M-V | 3.42 (1.33) | 2.71 (1.50) | 0.71 | 18.77 | 9.94 | -8.83
(6) 6MPI | B5050LYP | 3.23 (1.30) | 2.70 (1.57) | 0.53 | 18.05 | 6.10 | -11.94
(6) 6MPI | CAM-B3LYP | 3.23 (1.27) | 2.66 (1.51) | 0.57 | 18.29 | 6.69 | -11.61
(6) 6MPI | EOM-CCSD | 3.61 (1.13) | 2.84 (1.43) | 0.77 | 19.26 | 7.66 | -12.60
(6) 6MPI | exp | 2.95 | 2.12 | 0.83 | | |
(7) 7MPI | ωB97X-D | 3.33 (0.79) | 2.65 (1.02) | 0.68 | 19.76 | 9.16 | -10.60
(7) 7MPI | ωB97M-V | 3.47 (0.85) | 2.68 (1.04) | 0.79 | 19.96 | 11.00 | -8.95
(7) 7MPI | B5050LYP | 3.26 (0.74) | 2.67 (1.04) | 0.59 | 19.46 | 6.97 | -12.49
(7) 7MPI | CAM-B3LYP | 3.23 (0.74) | 2.62 (1.01) | 0.61 | 19.64 | 7.66 | -11.98
(7) 7MPI | EOM-CCSD | 3.61 (0.66) | 2.83 (0.97) | 0.83 | 20.39 | 9.25 | -11.14
(7) 7MPI | exp | 2.92 | 2.43 | 0.49 | | |
(8) YL166 | ωB97X-D | 2.92 (1.27) | 2.60 (1.40) | 0.32 | 14.32 | 5.33 | -8.99
(8) YL166 | ωB97M-V | 2.97 (1.27) | 2.58 (1.38) | 0.39 | 14.66 | 6.53 | -8.13
(8) YL166 | B5050LYP | 2.90 (1.30) | 2.65 (1.45) | 0.25 | 13.79 | 3.62 | -10.17
(8) YL166 | CAM-B3LYP | 2.86 (1.25) | 2.59 (1.39) | 0.27 | 14.09 | 4.20 | -9.89
(8) YL166 | EOM-CCSD | 3.07 (1.17) | 2.64 (1.36) | 0.43 | 15.15 | 4.83 | -10.32
(8) YL166 | exp | 2.50 | 2.12 | 0.38 | | |
(9) YL158 | ωB97X-D | 2.75 (1.15) | 2.46 (1.27) | 0.29 | 16.16 | 8.45 | -7.71
(9) YL158 | ωB97M-V | 2.78 (1.12) | 2.41 (1.23) | 0.37 | 16.53 | 9.41 | -7.11
(9) YL158 | B5050LYP | 2.72 (1.17) | 2.49 (1.30) | 0.23 | 15.62 | 7.37 | -8.26
(9) YL158 | CAM-B3LYP | 2.69 (1.13) | 2.44 (1.26) | 0.25 | 15.92 | 7.68 | -8.24
(9) YL158 | EOM-CCSD | 2.90 (1.05) | 2.48 (1.24) | 0.42 | 17.12 | 8.37 | -8.75
(9) YL158 | exp | 2.39 | 2.04 | 0.35 | | |
(10) R7 | ωB97X-D | 3.13 (1.33) | 2.54 (1.57) | 0.58 | 2.96 | 14.88 | 11.92
(10) R7 | ωB97M-V | 3.23 (1.34) | 2.55 (1.54) | 0.67 | 2.68 | 12.36 | 9.68
(10) R7 | B5050LYP | 3.03 (1.29) | 2.54 (1.61) | 0.48 | 3.23 | 18.23 | 15.01
(10) R7 | CAM-B3LYP | 3.02 (1.28) | 2.50 (1.56) | 0.52 | 3.12 | 17.27 | 14.15
(10) R7 | exp | 2.74 | 2.04 | 0.70 | | |
(11) R7-YL158 | ωB97X-D | 2.83 (1.39) | 2.30 (1.66) | 0.53 | 3.76 | 13.89 | 10.13
(11) R7-YL158 | ωB97M-V | 2.92 (1.38) | 2.28 (1.62) | 0.64 | 3.69 | 11.68 | 7.99
(11) R7-YL158 | B5050LYP | 2.72 (1.36) | 2.31 (1.72) | 0.41 | 4.06 | 17.47 | 13.41
(11) R7-YL158 | CAM-B3LYP | 2.73 (1.34) | 2.27 (1.66) | 0.46 | 3.86 | 16.28 | 12.42

Table 7.2 compares the properties computed with different methods for the dyes in DMSO (experimental values are also shown when available). We begin by assessing the performance of the quantum chemistry methods, considering both the absolute values of the excitation energies and the trends. Fig. 7.5 shows the computed excitation energies versus the experimental values for systems (1)-(9), accounting for contributions from different rotamers where appropriate. All methods overestimate the excitation energies relative to experiment by as much as 0.6 eV; however, the overall trend is captured much better. EOM-CCSD gives the largest correlation coefficient (R2 = 0.923), with one notable outlier, YL154 (system (1)), which is the most polar system in the set. This deviation may be due to the limitations of the CPCM treatment: we use the zero-order version of the theory, which does not include state-specific corrections. The results might also improve with a larger basis set. Among the functionals, ωB97M-V performs best (R2 = 0.889), followed by ωB97X-D (R2 = 0.863). For ωB97M-V, the notable outliers are (7)/7MPI (energy overestimated) and (2)/YL146 (energy underestimated). We note that in these three cases, the change of the dipole moment, ∆µ, computed with ωB97M-V differs considerably from the EOM-CCSD values. In the SI (Fig. S1), we show correlation plots for systems (1)-(10) computed with TD-DFT. With this larger set, the correlation coefficients are slightly reduced for all functionals, but the overall trend is unchanged: 0.853 (ωB97M-V), 0.840 (ωB97X-D), 0.843 (CAM-B3LYP), and 0.785 (B5050LYP). Reducing the basis set from aug-cc-pVTZ to aug-cc-pVDZ in the TD-DFT calculations has only a small effect on the results (see Fig. S2 in the SI). For ωB97M-V, the correlation coefficient is actually slightly better with aug-cc-pVDZ (0.906 versus 0.889). All methods show a significant change in the dipole moment (more than 10 Debye) upon excitation for systems (1)-(9), consistent with the ESP maps.
According to EOM-CCSD, the following systems show the largest ∆µ: (1), (2), (5), (6), and (7). This affects the solvatochromic shifts. Table 7.3 shows the optical properties of the dyes computed with ωB97X-D in different PCM solvents, as well as the experimental excitation and emission energies in DMSO and RNA. The calculations show shifts in excitation energies of up to 0.6 eV in water (the most polar solvent) relative to the isolated chromophores. The experimental excitation energies in the DMSO and RNA environments differ by 0.1-0.2 eV. The experimental Stokes shifts of the dyes range from 0.2 to 0.9 eV, consistent with the changes in the bond-alternation pattern in the excited state. The largest value was observed for (5) and the smallest for (9) and (8). Interestingly, the spread of values is larger in RNA than in DMSO. Fig. 7.6 shows a correlation plot of theoretical versus experimental Stokes shifts in polar (DMSO) and non-polar (RNA/DNA) environments. Overall, the performance of theory is disappointing. The correlation coefficient for ωB97X-D is 0.53, and there is no obvious pattern (both the DMSO and RNA results show large scatter). If the biggest outlier (7MPI in DMSO) is removed from the analysis, the correlation coefficient increases to 0.75 (see Fig. S5 in the SI).

Figure 7.5: Correlation plots for the excitation energies of the S0→S1 transition computed with different theoretical methods for dyes (1)-(9) in DMSO. For YL166, YL154, YL158, 5MPI, 6MPI, and 7MPI, excitation energies are computed using the Boltzmann populations of the two lowest rotamers.

Figure 7.6: Correlation plots for the theoretical and experimental values of the Stokes shifts, systems (1)-(10).
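The quantities behind Figs. 7.5 and 7.6 (Stokes shifts, rotamer-averaged excitation energies, and correlation coefficients) can be illustrated with the short Python sketch below. It rests on several assumptions that the text does not confirm. First, the reported correlation coefficients are taken to be the R² of an ordinary least-squares straight-line fit. Second, the rotamer weights follow a Boltzmann distribution of relative ground-state energies at 298.15 K; the temperature actually used is not stated here. Third, excitation energies are averaged directly rather than full spectra, which simplifies the Boltzmann-weighted spectra described above. The function names are hypothetical.

```python
import numpy as np

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_average(rel_energies_ev, values, temperature_k=298.15):
    """Boltzmann-weighted average of a property over rotamers.

    rel_energies_ev: ground-state energies of the rotamers in eV (any common zero).
    values: the property for each rotamer, e.g., the S0->S1 excitation energy.
    temperature_k: assumed temperature (298.15 K is an assumption of this sketch).
    """
    e = np.asarray(rel_energies_ev, dtype=float)
    # Shift by the minimum before exponentiating to avoid underflow/overflow.
    weights = np.exp(-(e - e.min()) / (K_B_EV * temperature_k))
    weights /= weights.sum()
    return float(np.dot(weights, np.asarray(values, dtype=float)))


def stokes_shift(e_abs_ev, e_em_ev):
    """Stokes shift as tabulated in Tables 7.2 and 7.3: Eex - Eem (eV)."""
    return e_abs_ev - e_em_ev


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)  # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Experimental Stokes shift of YL154 in DMSO (Table 7.2): 2.84 - 2.09 eV
print(round(stokes_shift(2.84, 2.09), 2))  # 0.75
# Two rotamers 0.02 eV apart; these energies are illustrative, not results of this work
print(boltzmann_average([0.0, 0.02], [3.30, 3.40]))
```

For a straight-line fit with an intercept, this R² equals the squared Pearson coefficient, `np.corrcoef(x, y)[0, 1] ** 2`. The explicit residual form is kept because it also applies to other fitted models.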
Table 7.3: Photophysical properties (excitation and emission energies, oscillator strengths, Stokes shifts, ground- and excited-state dipole moments, and change in dipole moment between the ground and excited states) of the dyes in different PCM solvents for the lowest rotamer structures. Entries without a solvent label correspond to the isolated molecules; wat = water. Energies in eV, oscillator strengths in parentheses, dipole moments in Debye.
Dye/solvent | Method | Eex (fL) | Eem (fL) | ∆E | µgr | µex | ∆µ
YL146 | ωB97X-D | 2.71 (0.97) | 2.43 (0.71) | 0.28 | 13.21 | 1.80 | -11.41
YL146/DMSO | ωB97X-D | 3.10 (1.12) | 2.52 (1.23) | 0.58 | 21.76 | 11.49 | -10.27
YL146/wat | ωB97X-D | 3.11 (1.12) | 2.53 (1.24) | 0.58 | 21.84 | 11.62 | -10.22
YL146/DNA | ωB97X-D | 3.02 (1.13) | 2.50 (1.21) | 0.52 | 20.74 | 10.06 | -10.68
YL146/DMSO | exp | 2.91 | 2.17 | 0.74 | | |
YL146/DNA | exp | 2.98 | 2.24 | 0.74 | | |
YL166 | ωB97X-D | 2.84 (1.12) | 2.74 (1.08) | 0.10 | 6.86 | 0.60 | -6.26
YL166/DMSO | ωB97X-D | 2.92 (1.15) | 2.61 (1.22) | 0.30 | 15.13 | 7.72 | -7.41
YL166/wat | ωB97X-D | 2.92 (1.14) | 2.61 (1.22) | 0.31 | 15.19 | 7.77 | -7.42
YL166/DNA | ωB97X-D | 2.87 (1.16) | 2.61 (1.22) | 0.26 | 14.42 | 7.19 | -7.23
YL166/DMSO | exp | 2.50 | 2.12 | 0.38 | | |
YL166/DNA | exp | 2.38 | 2.12 | 0.26 | | |
YL154 | ωB97X-D | 2.76 (1.08) | 2.51 (0.87) | 0.25 | 13.09 | 1.74 | -11.35
YL154/DMSO | ωB97X-D | 3.17 (1.19) | 2.57 (1.31) | 0.59 | 21.94 | 11.50 | -10.44
YL154/wat | ωB97X-D | 3.16 (1.19) | 2.57 (1.31) | 0.59 | 22.02 | 11.63 | -10.39
YL154/DNA | ωB97X-D | 3.07 (1.19) | 2.55 (1.29) | 0.51 | 20.96 | 10.10 | -10.86
YL154/DMSO | exp | 2.84 | 2.09 | 0.75 | | |
YL154/DNA | exp | 2.77 | 2.12 | 0.65 | | |
YL158 | ωB97X-D | 2.80 (1.09) | 2.72 (1.04) | 0.08 | 7.85 | 1.44 | -6.42
YL158/DMSO | ωB97X-D | 2.88 (1.13) | 2.52 (1.10) | 0.36 | 15.20 | 6.83 | -8.37
YL158/wat | ωB97X-D | 2.89 (1.13) | 2.52 (1.11) | 0.37 | 15.28 | 6.91 | -8.37
YL158/DNA | ωB97X-D | 2.82 (1.15) | 2.50 (1.09) | 0.32 | 14.22 | 6.02 | -8.20
YL158/DMSO | exp | 2.39 | 2.04 | 0.35 | | |
YL158/DNA | exp | 2.25 | 2.04 | 0.21 | | |
MPI | ωB97X-D | 3.04 (1.13) | 2.94 (1.11) | 0.10 | 9.93 | 2.00 | -7.93
MPI/DMSO | ωB97X-D | 3.26 (1.20) | 2.89 (1.34) | 0.37 | 15.15 | 6.92 | -8.23
MPI/wat | ωB97X-D | 3.25 (1.20) | 2.89 (1.34) | 0.36 | 15.21 | 6.98 | -8.22
MPI/DNA | ωB97X-D | 3.19 (1.21) | 2.88 (1.33) | 0.31 | 14.44 | 6.24 | -8.20
MPI/DMSO | exp | 2.82 | 2.30 | 0.52 | | |
MPI/DNA | exp | 2.71 | 2.30 | 0.41 | | |
4MPI | ωB97X-D | 2.70 (0.75) | 2.45 (0.72) | 0.25 | 10.41 | 4.22 | -6.19
4MPI/DMSO | ωB97X-D | 3.14 (0.87) | 2.50 (1.01) | 0.64 | 15.63 | 5.87 | -9.76
4MPI/wat | ωB97X-D | 3.14 (0.87) | 2.50 (1.01) | 0.64 | 15.68 | 5.96 | -9.72
4MPI/DNA | ωB97X-D | 3.06 (0.86) | 2.49 (1.00) | 0.57 | 14.97 | 4.95 | -10.02
4MPI/DMSO | exp | 2.87 | 2.06 | 0.81 | | |
4MPI/DNA | exp | 2.88 | 2.14 | 0.74 | | |
5MPI | ωB97X-D | 2.99 (0.91) | 2.56 (0.06) | 0.43 | 10.85 | 2.46 | -8.39
5MPI/DMSO | ωB97X-D | 3.42 (1.31) | 2.83 (1.48) | 0.59 | 17.04 | 7.44 | -9.60
5MPI/wat | ωB97X-D | 3.37 (1.14) | 2.78 (1.32) | 0.59 | 12.21 | 6.83 | -5.38
5MPI/DNA | ωB97X-D | 3.29 (1.15) | 2.77 (1.32) | 0.52 | 15.49 | 5.05 | -5.44
5MPI/DMSO | exp | 3.05 | 2.52 | 0.53 | | |
5MPI/DNA | exp | 3.07 | 2.18 | 0.89 | | |
6MPI | ωB97X-D | 2.83 (0.89) | 2.66 (0.89) | 0.17 | 11.99 | 3.03 | -8.96
6MPI/DMSO | ωB97X-D | 3.33 (1.32) | 2.70 (1.52) | 0.63 | 18.48 | 8.22 | -10.26
6MPI/wat | ωB97X-D | 3.26 (1.10) | 2.62 (1.27) | 0.64 | 18.38 | 8.01 | -10.37
6MPI/DNA | ωB97X-D | 3.17 (1.10) | 2.61 (1.26) | 0.56 | 17.48 | 6.57 | -10.91
6MPI/DMSO | exp | 2.95 | 2.12 | 0.83 | | |
6MPI/DNA | exp | 2.96 | 2.35 | 0.61 | | |
7MPI | ωB97X-D | 2.83 (0.69) | 1.84 (0.00) | 0.99 | 12.54 | 2.59 | -9.95
7MPI/DMSO | ωB97X-D | 3.33 (0.79) | 2.65 (1.02) | 0.68 | 19.76 | 9.16 | -10.60
7MPI/wat | ωB97X-D | 3.42 (0.74) | 2.67 (0.98) | 0.75 | 19.53 | 7.68 | -11.85
7MPI/DNA | ωB97X-D | 3.30 (0.74) | 2.67 (0.96) | 0.63 | 18.58 | 6.25 | -12.33
7MPI/DMSO | exp | 2.92 | 2.43 | 0.49 | | |
7MPI/DNA | exp | 2.94 | 2.21 | 0.73 | | |
R7 | ωB97X-D | 2.71 (1.22) | 2.58 (1.22) | 0.13 | 13.71 | 3.77 | -9.94
R7/DMSO | ωB97X-D | 3.13 (1.33) | 2.54 (1.57) | 0.58 | 2.96 | 14.88 | 11.92
R7/wat | ωB97X-D | 3.13 (1.33) | 2.54 (1.57) | 0.59 | 2.92 | 14.76 | 11.84
R7/DNA | ωB97X-D | 3.04 (1.33) | 2.53 (1.55) | 0.51 | 3.57 | 16.29 | 12.72
R7/DMSO | exp | 2.74 | 2.04 | 0.70 | | |
R7/DNA | exp | 2.65 | 2.10 | 0.55 | | |
R7-YL158 | ωB97X-D | 2.48 (1.36) | 2.40 (1.37) | 0.13 | 11.61 | 1.35 | -10.26
R7-YL158/DMSO | ωB97X-D | 2.83 (1.39) | 2.30 (1.66) | 0.53 | 3.76 | 13.89 | 10.13
R7-YL158/wat | ωB97X-D | 2.84 (1.39) | 2.30 (1.66) | 0.54 | 2.75 | 13.76 | 11.01
R7-YL158/DNA | ωB97X-D | 2.74 (1.40) | 2.30 (1.65) | 0.44 | 4.22 | 15.35 | 11.13

7.3.1 Explaining trends using Hückel’s model and making predictions

Figure 7.7: Correlation plots for the excitation energies of the S0→S1 transition computed with an ab initio method (ωB97M-V) and with Hückel’s model for dyes (1)-(11).
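The four-step Hückel workflow of Section 7.2.2 is simple enough to sketch in a few lines of NumPy. The program used in this work is described in the SI and is not reproduced here. The sketch below is an independent illustration that uses the Table 7.1 parameters; the atom-type labels and function names are hypothetical. It builds H (Eq. 7.1) from a list of π-centers and bonds, diagonalizes it, and returns the HOMO-LUMO gap together with the per-atom change in π-density on the HOMO→LUMO transition. The sign convention follows Fig. 7.10 (positive means depleted in the LUMO). The sketch assumes a closed-shell ground state. Because Table 7.1 lists only C–C and C–X resonance integrals, it rejects bonds between two heteroatoms.

```python
import numpy as np

# Table 7.1 parameters: label -> (pi occupation, alpha_X in eV, beta_CX in eV).
# The labels are hypothetical names chosen for this sketch.
TABLE_7_1 = {
    "C": (1, -11.20, -2.620),
    "N_pyridine": (1, -12.51, -2.096),
    "N_pyridinium": (1, -16.44, -1.834),
    "N_indole_indolizine_orig": (2, -15.13, -2.620),
    "N_indole_methylated": (2, -14.13, -2.000),
    "N_indolizine": (2, -15.13, -2.000),
}


def huckel(atom_types, bonds):
    """Steps 1-4: build H (Eq. 7.1) from pi-centers and bonds, then diagonalize.

    atom_types: Table 7.1 label for each pi-center, indexed 0..n-1.
    bonds: (i, j) index pairs of connected pi-centers.
    Returns ascending orbital energies (eV), MO coefficients (one column per MO),
    and the number of pi electrons.
    """
    n = len(atom_types)
    H = np.zeros((n, n))
    for i, label in enumerate(atom_types):
        H[i, i] = TABLE_7_1[label][1]  # Coulomb integral alpha_ii
    for i, j in bonds:
        ti, tj = atom_types[i], atom_types[j]
        if ti != "C" and tj != "C":
            raise ValueError("Table 7.1 lists only C-C and C-X resonance integrals")
        partner = tj if ti == "C" else ti  # "C" for a C-C bond
        H[i, j] = H[j, i] = TABLE_7_1[partner][2]  # resonance integral beta_ij
    energies, coeffs = np.linalg.eigh(H)  # symmetric H: real, ascending eigenvalues
    n_electrons = sum(TABLE_7_1[label][0] for label in atom_types)
    return energies, coeffs, n_electrons


def frontier_analysis(atom_types, bonds):
    """HOMO-LUMO gap (eV) and per-atom charge change on HOMO->LUMO excitation.

    Positive change: density is depleted in the LUMO relative to the HOMO,
    the sign convention of Fig. 7.10.
    """
    energies, coeffs, n_electrons = huckel(atom_types, bonds)
    if n_electrons % 2:
        raise ValueError("this sketch assumes a closed-shell pi-system")
    homo = n_electrons // 2 - 1  # doubly occupy orbitals from the bottom
    lumo = homo + 1
    gap = energies[lumo] - energies[homo]
    dq = coeffs[:, homo] ** 2 - coeffs[:, lumo] ** 2
    return gap, dq


if __name__ == "__main__":
    # Ethylene: orbital energies alpha +/- beta, so the gap is 2|beta|.
    gap, dq = frontier_analysis(["C", "C"], [(0, 1)])
    print(f"ethylene HOMO-LUMO gap: {gap:.2f} eV")  # 5.24 eV
```

Because α and β are negative, `numpy.linalg.eigh` returns the bonding orbitals first, so the occupied orbitals are the lowest n_electrons/2 columns. Feeding a dye’s π-framework into `frontier_analysis` gives the kind of gaps and charge-change patterns compared in Figs. 7.7 and 7.10.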
We begin by comparing Hückel’s HOMO-LUMO gaps with the excitation energies from ab initio calculations. Fig. 7.7 shows a correlation plot of the S0→S1 excitation energies computed with TD-DFT (ωB97M-V) and with Hückel’s model for dyes (1)-(11) (Fig. XX in the SI shows similar plots for other functionals). Despite the simplicity of Hückel’s model, the correlation is quite good (coefficient 0.83), meaning that the model captures the main trends. We note that using the original, unmodified parameters for indolizine results in a poorer correlation (see Fig. XX in the SI). Fig. 7.8 compares the NTOs from TD-DFT calculations with the HOMO and LUMO from Hückel’s model for YL158. As one can see, Hückel’s model reproduces the nodal structure and the variations in the amplitudes of the NTOs remarkably well. Fig. 7.9 shows Hückel’s molecular orbitals for the dyes. Hückel’s model captures the different patterns in the frontier orbitals as well as the varying extent of charge transfer. Hückel’s molecular orbitals also reveal differences between the dyes: in some, most of the change happens on the ethene bridge, whereas in others the rings are also involved.

Figure 7.8: Top: NTOs computed with TD-DFT for YL158 (system (9)). Bottom: HOMO and LUMO computed by Hückel’s model.

Fig. 7.10 shows the changes in atomic charges upon the HOMO-LUMO transition from Hückel’s model. Positive signs mean that electrons are removed from the respective atoms upon excitation, i.e., that the electron density is depleted in the LUMO relative to the HOMO. Conversely, negative signs mean that the density in the LUMO increases upon excitation. These plots show electron transfer from the donor to the acceptor, consistent with the ab initio ESP maps. These changes in atomic charges help to rationalize the trends in excitation energies. They also suggest which sites can be targeted by substituents to tune the excitation energies. Within Hückel’s model, excitation energies are given by the LUMO-HOMO energy gap.
Hence, stabilizing the HOMO or destabilizing the LUMO leads to a blue shift in excitation energy. Conversely, destabilizing the HOMO or stabilizing the LUMO leads to a red shift. In these dyes, nitrogen is the most electronegative atom, with the lowest site energy. Hence, orbitals that have larger electron density on the nitrogen are stabilized more. If the density on nitrogen is depleted upon excitation, the HOMO is stabilized relative to the LUMO.

Figure 7.9: HOMO and LUMO from Hückel’s model for dyes (1)-(11).

Figure 7.10: Changes in atomic charges between the HOMO and LUMO from Hückel’s model for selected dyes. Positive signs mean that electrons are removed from the respective atoms upon excitation.

• Methylation: Compare (1) and (2). They have the same structure, but in (2) the indole nitrogen is methylated, and (2) is red-shifted relative to (1). The patterns of the frontier molecular orbitals are very similar. Methylation raises the nitrogen site energy because of the electron-donating properties of the methyl group, and this destabilizes both the HOMO and the LUMO. However, in (2) the density on the indole nitrogen is depleted, so the LUMO is stabilized relative to the HOMO. This explains the red shift in (2).
• Connectivity:
– Compare (2) and (3). They have the same moieties but different connectivity, and (2) is red-shifted relative to (3). In (3), the density on the indole nitrogen is depleted much more than in (2). Because nitrogen has the lowest site energy, depleting its density in the LUMO destabilizes the LUMO relative to the HOMO, hence the higher excitation energy.
– Compare (2) and (4). (4) is connected through the 6-membered ring, and (2) is slightly red-shifted relative to (4). Again, the density on the indole nitrogen is depleted more in (4) than in (2).
– Compare (4), (6), (7), and (5), listed in order of increasing excitation energy. The depletion on the indole nitrogen is similar in (4) and (6) and larger in (7) and (5), with (5) having the largest value.
• Indole versus indolizine: Compare (2) (indole) with (8) and (9). The energies are red-shifted in (8) and (9), with (9) being the lowest. In (2), the density on nitrogen is depleted, whereas in (8) and (9) it increases, stabilizing the LUMO and producing the red shift. (9) shows a larger shift in excitation energy than (8).
• Compare (10) and (11) to see that this analysis holds for these dyes too.

This analysis suggests that excitation energies in this family of dyes can be tuned by placing electron-donating or electron-withdrawing substituents at the positions where the changes in atomic charges are the largest. For example, according to Fig. 7.10, placing electron-donating substituents on the most red atoms and electron-withdrawing substituents on the most blue atoms should result in red-shifted absorption. We tested some of these sites with TD-DFT calculations on the correspondingly functionalized molecules, and the calculations confirmed the predictions of Hückel’s model. In the next section, we present the results for one new dye designed following these ideas.

7.4 Design of the new dye: Preliminary results and outlook

According to Hückel’s model calculations, upon excitation the electron density increases on the carbon atoms adjacent to the positively charged nitrogen of the pyridinium moiety in all dyes. Hence, functionalizing these carbon atoms with electron-withdrawing substituents should result in red-shifted absorption. For reasons of synthetic feasibility, we converged on an alternative strategy: replacing these carbon atoms directly with more electronegative elements. This can be achieved by using pyridazine as the acceptor moiety. Fig. 7.11 shows the resulting structure. This new dye, denoted R8, is structurally similar to MPI, from which it can be derived by replacing the pyridinium moiety with pyridazine. We use R8 to test the theoretical predictions presented in the previous section.
The full details of the synthesis and characterization are given in Paper 2. We computed the absorption energy of R8 at the ωB97M-V/aug-cc-pVDZ level of theory in DMSO (PCM). Fig. 7.12 shows the NTOs of the lowest excited state, which is also the brightest electronic transition. Table 7.4 compares the computed absorption energy with the experimental measurement and with the corresponding values for MPI (the reference compound).

Figure 7.11: Structure of the R8 compound.

Figure 7.12: Hole (left) and electron (right) NTOs computed with ωB97M-V/aug-cc-pVDZ for R8.

Table 7.4: Theoretical (ωB97M-V/aug-cc-pVDZ) and experimental absorption energies of MPI and R8.
Dye/DMSO | Method | Eex, eV (fL)
MPI | ωB97M-V | 3.32 (1.21)
MPI | exp | 2.82
R8 | ωB97M-V | 2.92 (1.09)
R8 | exp | 2.58

As predicted by Hückel’s model, the modifications incorporated in the new dye (R8) result in an absorption spectrum red-shifted by 0.40 eV (theory) and 0.24 eV (experiment). R8 is thus an improved version of MPI in terms of absorption, but it turned out to have a much reduced fluorescence quantum yield (FQY), which is undesirable. The FQY is determined by the competition between radiative and radiationless decay channels:

FQY = \frac{k_r}{k_r + k_{nr}} = \frac{\tau_{nr}}{\tau_r + \tau_{nr}},    (7.2)

where kr (knr) and τr (τnr) are the radiative (radiationless) rate constants and lifetimes, respectively. Hence, the decreased FQY could result from either an increased radiative lifetime or a decreased radiationless lifetime. The former can be estimated according to the following formula[290]:

\tau_{fl} = \frac{2\pi \left(c/n\right)^{3} \varepsilon}{E_{ex}^{2} f_L} \cdot 2.42 \times 10^{-8}    (7.3)

The quantities in Eq. (7.3) are as follows. c is the speed of light in vacuum (137 a.u.). n is the refractive index of the medium (1.48 for DMSO at 20 °C). ε is the dielectric constant of the medium (47 for DMSO at 20 °C). Eex is the absorption energy of the dye in a.u., and fL is the oscillator strength of the electronic transition. The factor 2.42 × 10^-8 converts lifetimes from a.u. to ns. As per Eq.
(7.3), red-shifted excitation energies lead to a longer τfl. Fig. 7.13 shows the computed intrinsic fluorescence lifetimes of the dyes relative to YL146 (τfl(Dye)/τfl(YL146)). As one can see, R8 has a ∼30% longer τfl than the reference system. This alone would decrease the FQY by a factor of 1.3. However, the observed decrease is much larger, suggesting that R8 also has a shorter radiationless lifetime. Calculations of radiationless relaxation rates are more involved, and we will pursue them in future work. Overall, this example illustrates the importance of including calculations of the FQY in the computational design process.

Figure 7.13: Intrinsic fluorescence lifetimes of the dyes relative to YL146.

7.5 Conclusion

We presented quantum chemical calculations of the electronic properties of 11 dyes developed for live-cell RNA imaging. Ab initio calculations reproduce the main trends in the essential properties of the dyes rather well. EOM-CCSD yields the best correlation with the experimental absorption energies, but TD-DFT simulations also perform well, with the ωB97M-V functional delivering the best results. We also carried out Hückel’s model calculations, which provide insight into the electronic structure of the dyes. Hückel’s model simulations follow the trends of the ab initio simulations. They also give a simple picture of the nodal structure of the frontier orbitals and of the patterns of charge redistribution upon excitation. By analyzing the patterns in the frontier MOs, we were able to explain the trends in electronic excitation energies due to different connectivity and different isomers (e.g., indole versus indolizine moieties). By analyzing the changes in partial charges, we proposed a strategy for tuning the excitation energies in this family of dyes by chemical modifications.
Given the excellent performance of Hückel’s model, it can be used for the initial screening of prospective dyes and in machine-learning workflows for optimizing their properties. On the basis of this analysis, we proposed a new dye with red-shifted absorption/emission. The new dye was synthesized and tested experimentally. The experiment confirmed the red-shifted optical properties, but the new dye unfortunately turned out to be rather dim. In future work, we will explore the reasons for the loss of FQY. Our study illustrates the power of theory to predict new structures with desired spectral shifts. It also emphasizes the need to include modeling of the FQY in the computational design process.

Chapter 8
A computational study of possible mechanisms of singlet oxygen generation in miniSOG photoactive protein

8.1 Introduction

Genetically encodable photoactive proteins are used in a variety of applications[291, 292]. Of particular interest are photoactive systems that can generate reactive oxygen species (ROS) upon exposure to light. Such systems are used in electron microscopy[293], photodynamic therapy[294], and chromophore-assisted laser cell inactivation[295]. One such protein is miniSOG (mini Singlet Oxygen Generator), a small (106 amino acid residues) flavin-containing protein capable of generating ROS when stimulated by blue light[296]. miniSOG is the first flavin-binding protein developed specifically as a genetically encodable light-induced source of singlet oxygen. The chromophore in miniSOG is flavin mononucleotide (FMN); however, variants with a riboflavin (RF) cofactor have also been investigated[297, 298]. Fig. 8.1 shows the miniSOG structure as well as the structures of the FMN and RF chromophores. Interestingly, the quantum yield of singlet oxygen production in miniSOG is much smaller than in free FMN: 0.03 versus 0.51 (see, for example, Ref. 297). This difference has been attributed to effective quenching of FMN’s triplet state by the protein via electron transfer[297, 299, 300]. This quenching by the protein has also been deemed responsible for producing other types of ROS, such as peroxide[299], which are undesirable for applications.

Figure 8.1: miniSOG protein. Flavin mononucleotide (FMN) and riboflavin (RF) cofactors are shown in the inserts.

Several studies reported modifications of miniSOG aimed at increasing the quantum yield of singlet oxygen production[299, 301]. For example, mutating one residue that forms a hydrogen bond with FMN increased the quantum yield of O2(¹Δg) up to ∼0.2 in SOPP (singlet oxygen producing protein)[299, 301]. It was also discovered that prolonged intense irradiation of miniSOG increases singlet oxygen production[302]. The mechanism for this photoactivation involves photodegradation of FMN to lumichrome (LC), which increases the chromophore’s accessibility to oxygen[297]. As a result, quenching of the triplet chromophore by oxygen becomes more effective than quenching by the protein. This mechanistic interpretation of the structural data[297] is consistent with observations that the yield of singlet oxygen increases in both miniSOG and SOPP at elevated temperatures[300]. The increase is attributed to the protein’s breathing motions, which favor oxygen diffusion.

The photodegradation phenomenon has been investigated in dozens of studies, which considered both free flavins[303–314] and flavin-containing proteins[315, 316]; however, the exact details of the mechanism have not yet been fully elucidated. The mechanism of photosensitization in miniSOG is also not fully understood. A detailed molecular-level understanding of these processes is essential for the successful rational design of future miniSOG and SOPP variants with higher quantum yields of singlet oxygen production and improved spectral properties.
Questions about the mechanism include identifying the electronic states involved in photosensitization and photoconversion[317–320]. This requires calculations of singlet and triplet states as well as the relevant electronic couplings. In addition, it is important to characterize the effect of the protein environment on these quantities, as they are known to depend strongly on the polarity of the environment[321–323]. Many computational studies investigated spin–orbit couplings (SOCs) in flavin proteins and flavin-like chromophores. For example, SOC calculations have been used to elucidate the reaction between FMN and a neighboring cysteine in LOV domains[324] and to estimate the influence of the protein environment on the excited states of flavin[325]. They have also been used to describe the reverse cycle FADH2→FAD, connected with the reduction of O2 to H2O2 in glucose oxidase[326], and to design fluorinated flavin derivatives with desired spectral properties[327].

The production of singlet oxygen[328] by photosensitization, i.e., the transfer of electronic excitation from an electronically excited donor to a ground-state acceptor, occurs in many systems. This process is responsible for the ability of oxygen to effectively quench both fluorescence (i.e., singlet excited states) and phosphorescence (i.e., triplet excited states). Förster energy transfer[329] involves dipole-allowed excitations and can occur between distant moieties. In contrast, spin-forbidden electronic excitations (triplet excitons, transferred via Dexter energy transfer) can only be transferred when the donor and acceptor are in close proximity[330]. Hence, the accessibility of the chromophore to dissolved oxygen is the key factor determining the efficiency of singlet oxygen generation. The nature of the electronic couplings responsible for singlet oxygen production, and for the quenching of singlet and triplet excited states by oxygen, has been extensively debated[328, 331–333].
Despite the limited computational power available at the time, earlier theoretical works developed insightful explanations of singlet oxygen production as well as of related phenomena (e.g., ignition of slow fluorescence, singlet–triplet annihilation, and collision-enhanced radiative transitions in oxygen)[332–337], which we can now confirm by high-level calculations. The two main scenarios of singlet oxygen production are intersystem crossing (ISC), facilitated by spin–orbit couplings (SOCs), and internal conversion (IC), facilitated by non-adiabatic couplings (NACs)[332–337]. We note that the latter process is enabled by configuration interaction with charge-transfer configurations[334, 336, 337] and is similar to singlet fission[338, 339], a process that generates two triplet excitons from a single singlet exciton.

In this contribution, we report high-level electronic structure calculations using the QM/MM approach[20, 22]. We consider the protein-bound flavin chromophore, RF, with and without a nearby oxygen molecule (about 4.1–4.2 Å away from RF). Our calculations provide quantitative support for earlier mechanistic proposals[332–334] put forward when computational power was not sufficient to carry out accurate ab initio calculations on realistic systems. Our results provide complementary details to a large body of research on singlet oxygen generation by flavin-based systems.

8.2 Computational details

Our model structure was prepared in an earlier study[340], where it was constructed using the crystal structure of miniSOG with the RF cofactor (PDB ID 7QF4)[298]. Hydrogen atoms were added assuming the conventional protonation states of the polar residues at neutral pH: Arg and Lys were charged positively, Glu and Asp were charged negatively, and His85 was neutral.

Figure 8.2: QM cluster (model system A) used for QM/MM optimization and excited-state calculations: RF, O2, sidechains of Gln77, Asn72, Asn82, and seven water molecules. Oxygen is located about 4.1–4.2 Å from RF.
Following the notation of Ref. 340, we refer to this model as miniSOG[RF]. The initial structure was solvated and neutralized following standard protocols, and ten dioxygen molecules were added at random positions. The structure was then equilibrated using molecular dynamics with the CHARMM36 force field topology and parameters[341], TIP3P water, and RF parameters for the oxidized form of flavin from Ref. 342; for details, see Ref. 340. Selected snapshots from the equilibrium trajectories were optimized using QM/MM, with the PBE0-D3 functional[343, 344] and the 6-31G* basis set for the QM part and the AMBER99 force field[345] for the MM part. Our model structure corresponds to the snapshot with the shortest oxygen–RF distance (about 4.1–4.2 Å). The QM system included RF, O2, the sidechains of Gln77, Asn72, and Asn82, and seven water molecules. This structure, called model A, was also used to compute the electronic states and the relevant couplings. The structure is shown in Fig. 8.2; the Cartesian coordinates were deposited on Zenodo (see the SI for details).

We carried out excited-state calculations using several structures derived from model A: (i) model A, (ii) model A without the oxygen molecule (model B), and (iii) model B with the oxygen molecule placed far away from the chromophore (∼6–8 Å). The excited states were computed using several approaches: TD-DFT (time-dependent density functional theory), RAS-CI (restricted active space configuration interaction)[346], and extended multiconfigurational quasi-degenerate perturbation theory of the second order (XMCQDPT2)[347]. The XMCQDPT2 calculations for model B were based on state-averaged CASSCF(10/8)/cc-pVDZ wavefunctions (14 states were included in the averaging). The XMCQDPT2 calculations for model A were based on state-averaged CASSCF(12/10)/cc-pVDZ wavefunctions (19 states were included in the averaging). We used an intruder state avoidance (ISA) shift of 0.02 hartree and default parameters for the resolvent-fitting approximation.
Active-space orbitals are shown in the SI. Because XMCQDPT2 computes singlets and triplets separately, the relative total energies of the different multiplicity manifolds are not balanced. To correct for this mismatch, we shift the singlet manifold of the combined RF-O2 system so that the excitation energy of its lowest state, which corresponds to RF(S0)×O2(¹Δg), equals the experimental[348] excitation energy of the O2(³Σg⁻)→O2(¹Δg) transition (0.97 eV).

We carried out SOC calculations using RAS-CI and TD-DFT. TD-DFT is suitable for computing singlet and triplet excited states of closed-shell molecules, such as RF and FMN. However, electronic degeneracies in oxygen impart open-shell character[349] to the wavefunctions of the relevant states. Such states can be tackled either by multi-reference methods, such as CASSCF, or by spin-flip approaches[350, 351]. As we explain below, the RF-O2 system can be described by a double spin-flip approach, in the same fashion as was done before in the context of singlet fission[352–354]. The RAS-2SF calculations employed a quintet reference corresponding to the high-spin RF(T1)×O2(³Σg⁻) restricted open-shell Hartree–Fock determinant.

Figure 8.3: RAS-2SF reference and target determinants. Singly occupied orbitals are flavin's π and π* and oxygen's π*x and π*y. LE and CT denote local excitations and charge-transfer configurations, respectively.

The SOCs were computed as matrix elements of the spin–orbit part of the Breit–Pauli Hamiltonian. Two-electron contributions were computed using the mean-field approach[355–358]. TD-DFT and RAS-CI calculations were carried out using the Q-Chem electronic structure package[44, 45]. XMCQDPT2 calculations were carried out using Firefly[359].

8.3 Results and discussion

We begin by reviewing the basic energetics of the RF chromophore and molecular oxygen. The ground state of molecular oxygen is ³Σg⁻.
The next two states are singlets: the doubly degenerate ¹Δg state and the ¹Σg⁺ state, located at 0.97 eV and 1.63 eV, respectively[328, 348]. The electronic configurations of these four states (counting both components of ¹Δg and the M_S = 0 component of ³Σg⁻) can be described by distributing two electrons in two degenerate π* orbitals, as shown in Fig. 8.4. There are four determinants: two of an open-shell type (in which both π* orbitals are singly occupied) and two of a closed-shell type (in which one of the orbitals is doubly occupied and the other is empty). According to El-Sayed's rules[360], one can anticipate small (or zero) SOCs between determinants of the same type and large SOCs between closed-shell and open-shell determinants, since the latter are related by a transition between π*x and π*y and thus involve an orbital flip. To understand the SOCs between the states, recall that each state is described by two determinants, so the combined effect depends on their relative signs (a similar situation was described in Ref. 361). By analyzing the configurations in Fig. 8.4, one can see that the SOC between ³Σg⁻ and ¹Δg is expected to be small (the contributions from the two determinants cancel out), whereas the SOC between ³Σg⁻ and ¹Σg⁺ can be large (the contributions from the open-shell–closed-shell transitions add up). The calculated SOCs confirm this: at the RAS-SF/6-311G(d,p) level of theory, the respective SOCs are 0.00 and 173.36 cm⁻¹.

Figure 8.4: Electronic configurations of the ³Σg⁻, ¹Δg, and ¹Σg⁺ states of molecular oxygen.

Table 8.1: Excitation energies (eV) for model system B (no oxygen). Oscillator strengths for the transitions from RF(S0) are given in parentheses.

State   ωB97M-V/aug-cc-pVTZ   XMCQDPT2/cc-pVDZ(a)
S1      3.26 (0.33)           2.93 (0.42)
S2      3.84 (0.02)           3.60 (0.20)
T1      2.16                  2.48
T2      2.82                  2.96
T3      3.41                  3.19

(a) XMCQDPT2 is based on SA14-CASSCF(10/8) wavefunctions.
The XMCQDPT2 calculations were carried out for a model structure with the oxygen molecule ∼8 Å away from RF (see text for details).

Table 8.1 lists the energies of the RF chromophore in the model miniSOG[RF] system; additional results are given in the SI. The computed energetics is similar to that of other flavin-based systems[327]: at the XMCQDPT2 level, the lowest triplet state is ∼0.45 eV below S1, and the second triplet is slightly above S1. The S2 state is ∼0.7 eV above S1. Relative to XMCQDPT2, TD-DFT overestimates the excitation energy of S1 and underestimates the energies of the two lowest triplets; however, the overall picture is similar. The RAS-SF singlet energies (shown in the SI) are overestimated due to the insufficient treatment of dynamical correlation.

Fig. 8.5 shows the natural transition orbitals (NTOs) for the S0→T1 and S0→S1 transitions in RF (model system B). The shapes of the NTOs are similar, consistent with the π→π* character of both transitions. Because the two states have similar orbital character, the S1–T1 SOC is expected to be small by virtue of El-Sayed's rules[360], as confirmed by the calculations: at the TD-DFT level, the S1–T1 SOC is 0.58 cm⁻¹ (see the SI).

Figure 8.5: NTOs for the two lowest transitions of the RF cofactor in miniSOG[RF].

Such small S1–T1 SOCs in flavin-based systems have been reported in previous studies[324, 327]. They might appear puzzling in view of the high triplet quantum yields of flavins[300, 362], as high as 0.4–0.5. Such efficient ISC in flavins is facilitated by spin-vibronic interactions, which entail contributions from higher triplet states[327, 363, 364] as well as the enhancement of SOC upon bending motions of the chromophore. As we illustrate below, the production of triplet RF can also be enhanced by molecular oxygen via IC, as was observed experimentally[328, 331–333] and explained theoretically[336, 337].
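The sign argument given earlier for the SOCs of O2 (Fig. 8.4) can be checked with a toy second-quantization calculation. The sketch below is a minimal illustration under stated assumptions, not the RAS-SF calculation reported here: it keeps only the two degenerate π* orbitals, replaces the Breit–Pauli operator by an effective one-electron term ζ Σ_i l_z(i) s_z(i) with an arbitrary constant ζ = 1, and uses the standard real-orbital combinations for the M_S = 0 component of ³Σg⁻ and for the singlets. The helper names (`excite`, `soc_element`) are hypothetical.

```python
import math

# Toy model: the two degenerate O2 pi* orbitals as spin-orbitals
# 0 = pi*_x alpha, 1 = pi*_x beta, 2 = pi*_y alpha, 3 = pi*_y beta.
SPATIAL = {0: "x", 1: "x", 2: "y", 3: "y"}
M_S = {0: 0.5, 1: -0.5, 2: 0.5, 3: -0.5}
# <p|l_z|q> for real p-type orbitals: l_z x = i y and l_z y = -i x (hbar = 1).
LZ = {("y", "x"): 1j, ("x", "y"): -1j}


def excite(det, p, q):
    """Apply a_p^dagger a_q to a determinant (sorted tuple of occupied spin-orbitals).

    Returns (sign, new_det), or (0, None) if the result vanishes.
    """
    occ = list(det)
    if q not in occ:
        return 0, None
    sign = (-1) ** occ.index(q)  # a_q passes the creation operators to its left
    occ.remove(q)
    if p in occ:
        return 0, None
    sign *= (-1) ** sum(1 for o in occ if o < p)  # move a_p^dagger into place
    return sign, tuple(sorted(occ + [p]))


def soc_element(bra, ket, zeta=1.0):
    """<bra| zeta * sum_i l_z(i) s_z(i) |ket> for real CI vectors {det: coeff}."""
    value = 0j
    for p in SPATIAL:
        for q in SPATIAL:
            lz = LZ.get((SPATIAL[p], SPATIAL[q]), 0)
            if M_S[p] != M_S[q] or not lz:
                continue  # l_z s_z conserves spin and only couples x with y
            for det, coeff in ket.items():
                sign, new_det = excite(det, p, q)
                if sign:
                    value += zeta * lz * M_S[q] * sign * coeff * bra.get(new_det, 0.0)
    return value


c = 1.0 / math.sqrt(2.0)
TRIPLET = {(0, 3): c, (1, 2): c}  # 3Sigma_g^-, M_S = 0 component
SINGLETS = {
    "1Delta_g (open-shell)": {(0, 3): c, (1, 2): -c},
    "1Delta_g (closed-shell)": {(0, 1): c, (2, 3): -c},
    "1Sigma_g+": {(0, 1): c, (2, 3): c},
}

for name, state in SINGLETS.items():
    print(f"|<{name}|H_SO|3Sigma_g->| = {abs(soc_element(state, TRIPLET)):.3f} zeta")
```

Both ¹Δg components give zero: the open-shell one because it has the same determinant type as the triplet, and the closed-shell one because the x² and y² contributions cancel. In contrast, ¹Σg⁺ couples with magnitude ζ because the two contributions add, which mirrors the pattern of the computed 0.00 and 173.36 cm⁻¹.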
To investigate possible pathways of singlet oxygen production, we consider a model system that comprises the RF chromophore and a nearby oxygen molecule, embedded in the protein (model system A, see Computational details). The low-lying electronic states of the combined RF-O2 system can be described as products Ψ(RF)×Ψ(O2), and their energies can be estimated as sums of the respective energies of the two moieties. We note that the electronic configurations of these composite states are derived by distributing four electrons in four orbitals (the π and π* orbitals of RF and the two π* orbitals of oxygen), a situation suitable for the double spin-flip approach using a high-spin quintet reference (see Fig. 8.3)[351, 365].

Fig. 8.6 shows the energy diagram for the singlet and triplet manifolds obtained by combining the energies of isolated O2 and RF (taken from model system B), using experimental energies for O2 and our best estimates for the RF chromophore (XMCQDPT2 results). We note that a similar energy diagram was invoked by Tsubomura and Mulliken in 1960[332] and by Minaev in 1985[334]. Table 8.2 shows the results of XMCQDPT2 calculations for system A; as one can see, the differences between the idealized estimates and the calculations are very small. Overall, the presence of O2 has a negligible effect on the states' energies, as expected for this weakly interacting complex; thus, the energy diagram in Fig. 8.6 provides a good description of the energy levels. The RAS-2SF results are given in the SI. The RAS-2SF energies are less accurate than the XMCQDPT2 ones due to an insufficient description of dynamical correlation. Inclusion of hp (hole–particle) excitations improves the energies significantly, but the changes in the wavefunctions and, consequently, in the properties are small. As one can see, upon excitation to the S1 state of RF, several pathways for electronic relaxation are energetically possible in the triplet and singlet manifolds.
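The product-state estimate described above amounts to simple bookkeeping: add the fragment excitation energies and couple the fragment spins. The sketch below uses the XMCQDPT2 energies of RF from Table 8.1 and the experimental O2 energies (0.97 and 1.63 eV); it assumes integer fragment spins and ignores all interactions between the fragments. The 4.2 eV cutoff and the function names are illustrative choices. Unlike Fig. 8.6, it also lists the quintet component arising from two triplet fragments, and it lists each degenerate ¹Δg pair only once.

```python
# RF excitation energies (eV): XMCQDPT2/cc-pVDZ values from Table 8.1 (model B).
# O2 excitation energies (eV): experimental values quoted in the text.
# Each entry is (spin quantum number S, excitation energy).
RF_STATES = {"S0": (0, 0.00), "S1": (0, 2.93), "S2": (0, 3.60),
             "T1": (1, 2.48), "T2": (1, 2.96), "T3": (1, 3.19)}
O2_STATES = {"3Sg-": (1, 0.00), "1Dg": (0, 0.97), "1Sg+": (0, 1.63)}
MULTIPLICITY = {0: "singlet", 1: "triplet", 2: "quintet"}


def coupled_spins(s_a, s_b):
    """Total spins S = |s_a - s_b|, ..., s_a + s_b (integer spins only)."""
    return range(abs(s_a - s_b), s_a + s_b + 1)


def product_states(max_energy=4.2):
    """Energies of Psi(RF) x Psi(O2) product states, estimated as sums of the
    fragment excitation energies and labeled by total spin multiplicity."""
    states = []
    for rf_label, (s_rf, e_rf) in RF_STATES.items():
        for o2_label, (s_o2, e_o2) in O2_STATES.items():
            energy = round(e_rf + e_o2, 2)
            if energy > max_energy:
                continue
            for s_tot in coupled_spins(s_rf, s_o2):
                states.append((energy, MULTIPLICITY[s_tot],
                               f"RF({rf_label})xO2({o2_label})"))
    return sorted(states)


for energy, mult, label in product_states():
    print(f"{energy:5.2f} eV  {mult:<8} {label}")
```

Keeping only the entries below the RF(S1)×O2(³Σg⁻) level (2.93 eV) leaves, besides the ground state, RF(S0)×O2(¹Δg), RF(S0)×O2(¹Σg⁺), and the spin components of RF(T1)×O2(³Σg⁻).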
The accessible states are RF(T1)×O2(³Σg⁻), RF(S0)×O2(¹Σg⁺), and RF(S0)×O2(¹Δg).

Table 8.2: Excitation energies (eV) for model system A; XMCQDPT2/cc-pVDZ(a). Oscillator strengths for the transitions from RF(S0) are given in parentheses.

State              Multiplicity   Eex (eV)
RF(S0)×O2(³Σg⁻)    triplet        0.0
RF(S0)×O2(¹Δg)     singlet        0.97
RF(S0)×O2(¹Δg)     singlet        0.97
RF(S0)×O2(¹Σg⁺)    singlet        1.64
RF(T1)×O2(³Σg⁻)    triplet        2.48
RF(S1)×O2(³Σg⁻)    triplet        2.92 (0.422)
RF(T2)×O2(³Σg⁻)    triplet        2.98
RF(T3)×O2(³Σg⁻)    triplet        3.16
RF(S2)×O2(³Σg⁻)    triplet        3.59 (0.204)
RF(T1)×O2(¹Δg)     triplet        3.60
RF(T1)×O2(¹Δg)     triplet        3.60
RF(S1)×O2(¹Δg)     singlet        3.80
RF(S1)×O2(¹Δg)     singlet        3.80

(a) XMCQDPT2 is based on SA19-CASSCF(12/10) wavefunctions (see text for details).

Figure 8.6: Energy diagram of the low-lying manifold of singlet and triplet states derived from RF's S0, S1, S2, T1, T2, and T3 and oxygen's ³Σg⁻, ¹Δg, and ¹Σg⁺. Excitation energies (eV) are given relative to the ground state, RF(S0)×O2(³Σg⁻).

To further analyze these pathways, we consider the relevant electronic couplings: SOCs between states of different multiplicity and NACs between states of the same multiplicity.
As a proxy for NACs, we consider the norm of the one-particle transition density matrix, ||γ||, between the two states[352, 366]; a large ||γ|| signifies a considerable one-electron character of the transition, which can develop due to the admixture of charge-transfer configurations. Fig. 8.7 shows the computed couplings.

First, we consider the initially excited state, RF(S1)×O2(³Σg⁻) (its multiplicity is triplet because of oxygen). The SOC that couples this state to the singlet state RF(T1)×O2(³Σg⁻) is small (0.07 cm⁻¹), as expected from the value of the S1–T1 SOC in RF. The SOC with the RF(S0)×O2(¹Δg) states is larger (0.16 cm⁻¹). Thus, a single-step electronic transition producing O2(¹Δg) is possible, but does not appear to be very effective. We note that SOCs can be significantly enhanced when oxygen is closer to the chromophore, as shown by Minaev and coworkers[336, 337]. Hence, thermal fluctuations or structural relaxation of the excited-state collision complex can increase the yield.

Importantly, the initially excited state shows a substantial NAC with the triplet RF(T1)×O2(³Σg⁻) state, suggesting that this non-adiabatic transition may be fast and effective. This means that, when oxygen is present, triplet RF can be produced both via ISC and via IC. Such an oxygen-assisted pathway for triplet production was put forward by Tsubomura and Mulliken in 1960[332] to explain enhanced ISC (an increased yield of triplet states in the presence of oxygen), first discussed by Kasha in 1950[367]. This enhancement was also documented by Minaev and coworkers, who provided theoretical support using semi-empirical calculations on a model system[334]. Tsubomura and Mulliken also posited that a sufficiently large coupling between these states can develop via configuration-interaction mixing with charge-transfer configurations[332], which was later illustrated by Minaev's calculations[334, 336, 337].
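The point that ||γ|| vanishes for a transition that changes the states of two electrons, but becomes non-zero once charge-transfer configurations are admixed, can be illustrated with a deliberately tiny model. The sketch below is not the RAS-2SF calculation: it uses four hypothetical spatial orbitals (RF π and π*, O2 π*x and π*y), single Slater determinants instead of spin-adapted states, and an arbitrary CT weight w. The final determinant differs from the locally excited one by two electrons, while the CT determinant differs from each of them by one electron.

```python
import math

# Hypothetical toy basis: four spatial orbitals, eight spin-orbitals.
# 0/1 = RF pi (alpha/beta), 2/3 = RF pi* (alpha/beta),
# 4/5 = O2 pi*_x (alpha/beta), 6/7 = O2 pi*_y (alpha/beta).
N_SPIN_ORBITALS = 8

D_FINAL = (0, 1, 4, 5)  # RF pi^2 (ground-state-like), O2 closed-shell (pi*_x)^2
D_LE = (0, 3, 4, 7)     # RF pi->pi* excited, O2 open-shell: two electrons away from D_FINAL
D_CT = (0, 4, 5, 7)     # RF+ / O2- charge-transfer determinant: one electron away from both


def excite(det, p, q):
    """Apply a_p^dagger a_q to a determinant (sorted tuple of occupied spin-orbitals).

    Returns (sign, new_det), or (0, None) if the result vanishes.
    """
    occ = list(det)
    if q not in occ:
        return 0, None
    sign = (-1) ** occ.index(q)  # a_q passes the creation operators to its left
    occ.remove(q)
    if p in occ:
        return 0, None
    sign *= (-1) ** sum(1 for o in occ if o < p)  # move a_p^dagger into place
    return sign, tuple(sorted(occ + [p]))


def transition_density(bra, ket):
    """gamma_pq = <bra| a_p^dagger a_q |ket> for real CI vectors {det: coeff}."""
    gamma = [[0.0] * N_SPIN_ORBITALS for _ in range(N_SPIN_ORBITALS)]
    for p in range(N_SPIN_ORBITALS):
        for q in range(N_SPIN_ORBITALS):
            for det, coeff in ket.items():
                sign, new_det = excite(det, p, q)
                if sign:
                    gamma[p][q] += sign * coeff * bra.get(new_det, 0.0)
    return gamma


def frobenius_norm(matrix):
    return math.sqrt(sum(x * x for row in matrix for x in row))


def gamma_norm(ct_weight):
    """||gamma|| between D_FINAL and sqrt(1 - w^2) * D_LE + w * D_CT."""
    initial = {D_LE: math.sqrt(1.0 - ct_weight ** 2), D_CT: ct_weight}
    return frobenius_norm(transition_density({D_FINAL: 1.0}, initial))


for w in (0.0, 0.1, 0.3):
    print(f"CT weight {w:.1f}: ||gamma|| = {gamma_norm(w):.3f}")
```

In this model ||γ|| equals the CT weight w exactly, because the only surviving element of γ comes from the CT component of the initial state.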
We note that the admixture of charge-transfer (or charge-resonance) configurations is also responsible for the couplings facilitating singlet fission[352] and triplet–triplet annihilation[368, 369].

We now consider possible transitions from the RF(T1)×O2(³Σg⁻) states. The triplet state of this character can be produced either by the non-adiabatic transition described above or by a collision of an oxygen molecule with RF(T1) formed by ISC. According to Fig. 8.7, the singlet state of this character shows substantial NACs with lower states in the singlet manifold and, therefore, can lead to singlet oxygen generation via IC. The triplet state of this character shows small but non-zero SOCs with RF(S0)×O2(¹Σg⁺) and RF(S0)×O2(¹Δg) (0.04 and 0.01 cm⁻¹, respectively). These SOCs are small because the respective transitions would change the states of two electrons, so the only coupling terms can come from the exchange interaction, as in Dexter energy transfer[370], or from the admixture of charge-transfer configurations. Again, both processes can be enhanced when oxygen is closer to the chromophore, so a quantitative description requires sampling of the thermal motions and of the structural relaxation of the initially excited state. We also note that the involvement of higher states can both significantly increase the couplings and expand the available pathways. For example, upon excitation to S2, T2 and T3 become accessible. Admixture of the RF(T1)×O2(¹Σg⁺) state, which has a large SOC (100 cm⁻¹) with RF(T1)×O2(³Σg⁻), can also contribute to the enhancement.

Some pathways lead to the production of O2(¹Σg⁺). This second singlet state of oxygen has been observed experimentally[362] and relaxes to O2(¹Δg) with unit efficiency[362].
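When RF(T1) formed by ISC meets ground-state O2, the collision complex can be formed in singlet, triplet, or quintet spin states. A common simplification is to assume that all nine spin sublevels of the two triplets are formed with equal probability; the sketch below computes the resulting textbook statistical factors. It is an illustration under that assumption (integer fragment spins, equal sublevel populations), not the kinetic model used in this work or in Ref. 300.

```python
from fractions import Fraction


def spin_statistics(s_a, s_b):
    """Statistical weight of each total spin S of an encounter complex formed
    from fragments with integer spins s_a and s_b, assuming that all
    (2 s_a + 1)(2 s_b + 1) spin sublevels are formed with equal probability."""
    n_sublevels = (2 * s_a + 1) * (2 * s_b + 1)
    return {s: Fraction(2 * s + 1, n_sublevels)
            for s in range(abs(s_a - s_b), s_a + s_b + 1)}


NAMES = {0: "singlet", 1: "triplet", 2: "quintet"}

# RF(T1) + O2(3Sigma_g^-): two triplet fragments
for s, weight in spin_statistics(1, 1).items():
    print(f"T1 + O2: {NAMES[s]:<8} {weight}")

# RF(S0) or RF(S1) + O2(3Sigma_g^-): singlet + triplet fragments
for s, weight in spin_statistics(0, 1).items():
    print(f"S + O2:  {NAMES[s]:<8} {weight}")
```

For two triplets the weights are 1/9 (singlet), 1/3 (triplet), and 5/9 (quintet), whereas RF(S0) or RF(S1) with O2(³Σg⁻) can only form a triplet complex. A kinetic scheme in which singlet oxygen is produced through both the singlet and the triplet complexes would need to carry such factors, provided that equal sublevel population is a reasonable approximation.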
The computed NAC for the RF(S0)×O2(¹Σg⁺)→RF(S0)×O2(¹Δg) transition is large, consistent with the experimental observations[362]. This large value suggests very fast internal conversion, which can outcompete ISC to the ground state of the system, RF(S0)×O2(³Σg⁻), even though the corresponding SOC is rather large (173 cm⁻¹).

Figure 8.7: Couplings between the relevant states. SOC values (in cm⁻¹) are shown in black and ||γ|| values (dimensionless) are shown in red. For the degenerate ¹Δg states, the combined values (sums of the SOCs/NACs for the two components) are shown.

Our results are consistent with previous mechanistic discussions of singlet oxygen production and of the enhancement of radiative transitions by collisions with other species[328, 334, 336, 337, 362]. The value of our contribution is that, by providing concrete values of the electronic couplings, it lends ab initio support to previously hypothesized scenarios. We note that singlet oxygen production via a triplet state of the oxygen–RF collision complex implies that the kinetic models used to describe singlet oxygen production in flavin-based systems (such as the one in Ref. 300) need to be amended to account for the different spin statistics.

8.4 Conclusion

We report high-level quantum chemistry calculations of a model system representing the photoactive protein miniSOG with the RF chromophore. Our calculations of the relevant electronic states and of the couplings between them clarify the mechanism of singlet oxygen generation in this system. In particular, our results indicate that singlet oxygen generation can proceed via both the singlet and the triplet RF(T1)×O2(³Σg⁻) states of the RF-O2 complex.
The triplet RF(T1)×O2(³Σg⁻) state can be produced either by IC of the initially excited S1 state of oxygen-bound RF or by the T1 state of RF (produced via ISC) forming a collision complex with O2. The singlet RF(T1)×O2(³Σg⁻) state relaxes via IC or Dexter energy transfer, whereas the triplet RF(T1)×O2(³Σg⁻) state can decay via ISC facilitated by small but non-zero SOCs. The couplings are expected to increase in configurations in which oxygen is closer to the chromophore, which calls for sampling such configurations. Our results provide robust theoretical support for previously hypothesized scenarios. We hope that a better understanding of the function of miniSOG will aid the further development of effective genetically encoded photoactive proteins. Future work will focus on quantitative calculations of the rates of the relevant processes and on the mechanisms of photodegradation and of the production of other types of ROS, such as peroxide, in these systems.

Chapter 9 Future directions

9.1 Introduction

In light of the results reported in the previous chapters, several promising avenues emerge to extend the current understanding of excited-state dynamics, singlet oxygen generation, and nonradiative decay processes, particularly through computational approaches that explore the interactions of chromophores and proteins in biological contexts. This chapter outlines these potential directions, integrating advances from previous studies into a vision for future work.

9.2 Predicting Radiationless Decay Rate for Diverse Chromophore Classes

The methodology developed by Ramos et al.[371] uses nonadiabatic derivative couplings (NDCs) and Franck–Condon factors to compute nonradiative decay rates of near-infrared (NIR) and short-wave infrared (SWIR) dye molecules. The model uses Fermi's golden rule (FGR), which is also discussed by do Casal et al.[372]. Their model was able to predict fluorescence quantum yields (FQYs) for the model systems.
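To make the FGR picture concrete, the sketch below evaluates k = (2π/ħ)|V|²·FCWD for a single displaced harmonic accepting mode at 0 K, with Poisson-distributed Franck–Condon factors and Gaussian line broadening. It is a generic textbook model, not the rate expressions of Refs. 371 or 372: the coupling V, the Huang–Rhys factor S, the mode energy ħω, and the broadening σ are all hypothetical values chosen for illustration, and the function names are likewise illustrative.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant, eV*s


def fcwd(gap_ev, huang_rhys, mode_ev, sigma_ev=0.1, n_max=200):
    """Franck-Condon weighted density of states (1/eV) for one displaced harmonic
    accepting mode at T = 0 K (requires huang_rhys > 0). Each vibronic line
    n -> gap = n * mode_ev is a normalized Gaussian of width sigma_ev."""
    norm = 1.0 / (sigma_ev * math.sqrt(2.0 * math.pi))
    total = 0.0
    for n in range(n_max + 1):
        # Poisson Franck-Condon factor exp(-S) S^n / n!, evaluated in log space
        fc = math.exp(-huang_rhys + n * math.log(huang_rhys) - math.lgamma(n + 1))
        detuning = gap_ev - n * mode_ev
        total += fc * norm * math.exp(-detuning ** 2 / (2.0 * sigma_ev ** 2))
    return total


def fgr_rate(coupling_ev, gap_ev, huang_rhys, mode_ev, sigma_ev=0.1):
    """Fermi golden rule rate (1/s): k = (2 pi / hbar) |V|^2 * FCWD."""
    return (2.0 * math.pi / HBAR_EV_S) * coupling_ev ** 2 * fcwd(
        gap_ev, huang_rhys, mode_ev, sigma_ev)


# Hypothetical parameters: V = 0.01 eV, S = 1.0, hbar*omega = 0.18 eV (C=C stretch region)
for gap in (1.0, 1.5, 2.0):
    print(f"gap {gap:.1f} eV: k = {fgr_rate(0.01, gap, 1.0, 0.18):.2e} 1/s")
```

With these assumed parameters the rate falls by several orders of magnitude as the gap grows from 1.0 to 2.0 eV, because more quanta of the accepting mode are needed and the Poisson factors S^n/n! shrink rapidly; this is the energy gap law in its simplest single-mode form. The model of Ramos et al.[371] goes beyond this single-mode picture.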
Their model also helped clarify the energy gap law behavior, suggesting potential decay pathways that are not limited to the highest-frequency vibrational modes. Future research could extend this approach to a broader class of chromophores, investigating whether the proposed rate expressions hold across different molecular structures. Calculations of decay rates using static methods, such as FGR, could enable the prediction of rates for diverse chromophores. Such models can help improve the design of chromophores for bioimaging, such as the new dyes discussed in Chapter 8.

9.3 Excited-State Decay Calculations using Hybrid Approaches

The first-principles approach for calculating excited-state decay rates presented by do Casal et al. (2023) demonstrates considerable potential for improving the precision of decay constants via FGR applied at critical points on the potential energy surface (PES). This static approach could be coupled with time-dependent methodologies to create a more robust predictive model, especially suited to systems in complex environments. By blending these computational frameworks, future research could offer an advanced toolset for larger, flexible systems. Further, machine learning models could supplement these hybrid methods, extending them to high-throughput computational workflows in excited-state research.

Bibliography

[1] A. V. Sadybekov and V. Katritch. “Computational approaches streamlining drug discovery”. In: Nature 616.7958 (2023), pp. 673–685. issn: 1476-4687.
[2] K. D. Vogiatzis et al. “Computational Approach to Molecular Catalysis by 3d Transition Metals: Challenges and Opportunities”. In: Chemical Reviews 119.4 (2019), pp. 2453–2523.
[3] M. O. Steinhauser and S. Hiermaier. “A review of computational methods in materials science: examples from shock-wave and polymer physics”. In: Int J Mol Sci 10.12 (2009), pp. 5135–5216.
[4] R. J. Allan et al. Computational Chemistry in the Environmental Molecular Sciences Laboratory.
Springer, 2012. isbn: 0-306-46034-3.
[5] A. W. Prentice and M. A. Zwijnenburg. “The Role of Computational Chemistry in Discovering and Understanding Organic Photocatalysts for Renewable Fuel Synthesis”. In: Advanced Energy Materials 11.29 (2021), p. 2100709.
[6] A. V. Vorontsov and P. G. Smirniotis. “Advancements in hydrogen energy research with the assistance of computational chemistry”. In: International Journal of Hydrogen Energy 48.40 (2023), pp. 14978–14999. issn: 0360-3199.
[7] G.-J. Cheng et al. “Computational Organic Chemistry: Bridging Theory and Experiment in Establishing the Mechanisms of Chemical Reactions”. In: J. Am. Chem. Soc. 137.5 (2015), pp. 1706–1725.
[8] R. Haunschild, A. Barth, and W. Marx. “Evolution of DFT studies in view of a scientometric perspective”. In: Journal of Cheminformatics 8.1 (2016), p. 52. issn: 1758-2946.
[9] H. Lischka et al. “Multireference Approaches for Excited States of Molecules”. In: Chemical Reviews 118.15 (2018), pp. 7293–7361.
[10] J. Kleinjung and F. Fraternali. “Design and application of implicit solvent models in biomolecular simulations”. In: Curr Opin Struct Biol 25.100 (2014), pp. 126–134.
[11] U. Morzan et al. “Spectroscopy in Complex Environments from QM-MM Simulations”. In: Chem. Rev. 118 (2018), pp. 4071–4113.
[12] S. Mai and L. González. “Molecular Photochemistry: Recent Developments in Theory”. In: Angewandte Chemie International Edition 59.39 (2020), pp. 16832–16846.
[13] P. W. Atkins and R. S. Friedman. Molecular Quantum Mechanics. New York: Oxford University Press, 2005.
[14] M. Musiał and R. J. Bartlett. “Coupled-cluster theory in quantum chemistry”. In: Rev. Mod. Phys. 79 (2007), p. 291.
[15] T. Kuś and R. J. Bartlett. “Improving upon the accuracy for doubly excited states within the coupled cluster singles and doubles theory”. In: J. Chem. Phys. 131 (2009), p. 124310.
[16] D. C. Comeau and R. J. Bartlett. “The equation-of-motion coupled-cluster method.
Applications to open- and closed-shell reference states”. In: Chem. Phys. Lett. 207 (1993), pp. 414–423.
[17] M. Musiał and R. J. Bartlett. “Multireference Fock-space coupled-cluster and equation-of-motion coupled-cluster theories: The detailed interconnections”. In: J. Chem. Phys. 129 (2008), p. 134105.
[18] R. Parr and W. Yang. “Density-Functional Theory of the Electronic Structure of Molecules”. In: Annu. Rev. Phys. Chem. 46 (1995), pp. 701–728.
[19] S. Leang, F. Zahariev, and M. Gordon. “Benchmarking the performance of time-dependent density functional methods”. In: J. Chem. Phys. 136.10 (2012), p. 104101.
[20] H. M. Senn and W. Thiel. “QM/MM methods for biomolecular systems”. In: Angew. Chem., Int. Ed. 48 (2009), pp. 1198–1229.
[21] G. Groenhof. “Introduction to QM/MM simulations”. In: Methods Mol Biol 924 (2013), pp. 43–66.
[22] A. Warshel and M. Levitt. “Theoretical Studies of Enzymatic Reactions: Dielectric Electrostatic and Steric Stabilization of the Carbonium Ion in the Reaction of Lysozyme”. In: J. Mol. Biol. 103 (1976), p. 227.
[23] Q. Chen, A. Allot, and Z. Lu. “Keep up with the latest coronavirus research”. In: Nature 579 (2020), p. 193.
[24] V. Frecer and S. Miertus. “Antiviral agents against COVID-19: structure-based design of specific peptidomimetic inhibitors of SARS-CoV-2 main protease”. In: RSC Adv. 10 (2020), pp. 40244–40263.
[25] V. Zarezade et al. “The identification of novel inhibitors of human angiotensin-converting enzyme 2 and main protease of SARS-CoV-2: A combination of in silico methods for treatment of COVID-19”. In: J. Molec. Struct. 1237 (2021), p. 130409.
[26] L. Riva et al. “Discovery of SARS-CoV-2 antiviral drugs through large-scale compound repurposing”. In: Nature 586 (2020), pp. 113–119.
[27] D. Mondal and A. Warshel. “Exploring the Mechanism of Covalent Inhibition: Simulating the Binding Free Energy of α-Ketoamide Inhibitors of the Main Protease of SARS-CoV-2”. In: Biochemistry 48 (2020), pp. 4601–4608.
[28] K.
Świderek and V. Moliner. “Revealing the molecular mechanisms of proteolysis of SARS-CoV-2 Mpro by QM/MM computational methods”. In: Chem. Sci. 11 (2020), pp. 10626–10630.
[29] K. Arafet et al. “Mechanism of inhibition of SARS-CoV-2 Mpro by N3 peptidyl Michael acceptor explained by QM/MM simulations and design of new derivatives with tunable chemical reactivity”. In: Chem. Sci. 12 (2021), pp. 1433–1444.
[30] C. A. Ramos-Guzmán et al. “Unraveling the SARS-CoV-2 Main Protease Mechanism Using Multiscale Methods”. In: ACS Cat. 10.21 (2020), pp. 12544–12554.
[31] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and Tuñón. “Computational simulations on the binding and reactivity of a nitrile inhibitor of the SARS-CoV-2 main protease”. In: Chem. Comm. 57.72 (2021), pp. 9096–9099.
[32] C. A. Ramos-Guzmán et al. “A microscopic description of SARS-CoV-2 main protease inhibition with Michael acceptors. Strategies for improving inhibitor design”. In: Chem. Sci. 12.10 (2021), pp. 3489–3496.
[33] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and Tuñón. “Inhibition Mechanism of SARS-CoV-2 Main Protease with Ketone-Based Inhibitors Unveiled by Multiscale Simulations: Insights for Improved Designs”. In: Angew. Chem., Int. Ed. 60.49 (2021), pp. 25933–25941.
[34] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and Tuñón. “Multiscale Simulations of SARS-CoV-2 3CL Protease Inhibition with Aldehyde Derivatives. Role of Protein and Inhibitor Conformational Changes in the Reaction Mechanism”. In: ACS Cat. 11.7 (2021), pp. 4157–4168.
[35] A. I. Krylov et al. “Computational chemistry software and its advancement: Three Grand Challenge cases for computational molecular science”. In: J. Chem. Phys. 149 (2018), p. 180901.
[36] J. E. Boggs. “Guidelines for presentation of methodological choices in the publication of computational results. A. Ab initio electronic structure calculations”. In: Pure & Appl. Chem. 70 (1998), pp. 1015–1018.
[37] V. Stodden, J. Seiler, and Z. Ma.
“An empirical analysis of journal policy effectiveness for computational reproducibility”. In: Proc. Nat. Acad. Sci. 115.11 (2018), pp. 2584–2589.
[38] D. B. Allison, R. M. Shiffrin, and V. Stodden. “Reproducibility of research: Issues and proposed remedies”. In: Proc. Nat. Acad. Sci. 115 (2018), pp. 2561–2562.
[39] S. Ullrich and C. Nitsche. “The SARS-CoV-2 main protease as drug target”. In: Bioorg. & Med. Chem. Lett. 30 (2020), p. 127377.
[40] Z. Jin et al. “Structural basis for the inhibition of SARS-CoV-2 main protease by antineoplastic drug carmofur”. In: Nat. Struct. & Mol. Bio. 27 (2020), pp. 529–532.
[41] W. Dai et al. “Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease”. In: Science 368 (2020), pp. 1331–1335.
[42] J. Sakamoto et al. “An Individual Patient Data Meta-analysis of Adjuvant Therapy with Carmofur in Patients with Curatively Resected Colon Cancer”. In: Jap. J. of Clin. Onc. 35 (2005), pp. 536–544.
[43] E. Aprá et al. “NWChem: Past, present, and future”. In: J. Chem. Phys. 152.18 (2020), p. 184102.
[44] A. I. Krylov and P. M. W. Gill. “Q-Chem: An engine for innovation”. In: WIREs: Comput. Mol. Sci. 3 (2013), pp. 317–326.
[45] E. Epifanovsky et al. “Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package”. In: J. Chem. Phys. 155.8 (2021), p. 084801.
[46] Y. Shao and J. Kong. “YinYang atom: A simple combined ab initio quantum mechanical molecular mechanical model”. In: J. Phys. Chem. A 111 (2007), pp. 3661–3671.
[47] T. Vreven and K. Morokuma. Chapter 3 Hybrid Methods: ONIOM(QM:MM) and QM/MM. Vol. 2. Elsevier, 2006, pp. 35–51.
[48] H. Lin and D. G. Truhlar. “Redistributed Charge and Dipole Schemes for Combined Quantum Mechanical and Molecular Mechanical Calculations”. In: J. Phys. Chem. A 109 (2005), pp. 3991–4004.
[49] D. C. Liu and J. Nocedal. “On the limited memory BFGS method for large scale optimization”. In: Mathematical Programming 45 (1989), pp.
503–528.
[50] J. C. Phillips et al. “Scalable molecular dynamics with NAMD”. In: J. Comput. Chem. 26 (2005), pp. 1781–1802.
[51] J. W. Ponder. TINKER – Software Tools for Molecular Design, URL http://dasher.wustl.edu/tinker/ (accessed on April 23, 2017).
[52] C. Adamo and V. Barone. “Toward reliable density functional methods without adjustable parameters: The PBE0 model”. In: J. Chem. Phys. 110 (1999), pp. 6158–6170.
[53] V. Hornak et al. “Comparison of multiple amber force fields and development of improved protein backbone parameters”. In: Proteins- Struct. Funct. Bioinf. 65 (2006), pp. 712–725.
[54] S. Grimme. “Accurate description of van der Waals complexes by density functional theory including empirical corrections”. In: J. Comput. Chem. 25 (2004), pp. 1463–1473.
[55] J. Chai and M. Head-Gordon. “Systematic optimization of long-range corrected hybrid density functionals”. In: J. Chem. Phys. 128 (2008), p. 084106.
[56] J. Chai and M. Head-Gordon. “Long-range corrected hybrid density functionals with damped atom-atom dispersion interactions”. In: Phys. Chem. Chem. Phys. 10 (2008), pp. 6615–6620.
[57] J. C. Powers et al. “Irreversible Inhibitors of Serine, Cysteine, and Threonine Proteases”. In: Chem. Rev. 102.12 (2002), pp. 4639–4750.
[58] A. Paasche, T. Schirmeister, and B. Engels. “Benchmark Study for the Cysteine–Histidine Proton Transfer Reaction in a Protein Environment: Gas Phase, COSMO, QM/MM Approaches”. In: J. Chem. Theory Comput. 9.3 (2013), pp. 1765–1777.
[59] NWChem prints the “QM in MM charges” energy in the output file whereas Q-Chem does not. To obtain this value, one needs to follow these steps: First, the MM charges and their coordinates need to be extracted from a QM/MM (HLINK) output. Second, a single-point calculation should be executed with these charges specified in the $external_charges section and with the coordinates of the QM atoms specified in the $molecule section.
Third, another single-point job should be executed without the external charges and using the SCF orbitals from the second job. The input for this procedure is given in the SI. [60] M. Liu et al. “QM/MM through the 1990s: The First Twenty Years of Method Development and Applications”. In: Isr. J. of Chem. 54.8-9 (2014), pp. 1250–1263. [61] B. Wang and D. G. Truhlar. “Tuned and Balanced Redistributed Charge Scheme for Combined Quantum Mechanical and Molecular Mechanical (QM/MM) Methods and Fragment Methods: Tuning Based on the CM5 Charge Model”. In: J. Chem. Theory Comput. 9.2 (2013), pp. 1036–1042. [62] A. Monari, J. L. Rivail, and X. Assfeld. “Theoretical Modeling of Large Molecular Systems. Advances in the Local Self Consistent Field Method for Mixed Quantum Mechanics/Molecular Mechanics Calculations”. In: Acc. Chem. Res. 46.2 (2013), pp. 596–603. [63] B. Grigorenko et al. “Modeling of biomolecular systems with the quantum mechanical and molecular mechanical method based on the effective fragment potential technique: Proposal of flexible fragments”. In: J. Phys. Chem. A 106 (2002), pp. 10663–10672. [64] A. Rizzi, P. Carloni, and M. Parrinello. “Targeted Free Energy Perturbation Revisited: Accurate Free Energies from Mapped Reference Potentials”. In: J. Phys. Chem. Lett. 12.39 (2021), pp. 9449–9454. [65] M. G. Khrenova, B. L. Grigorenko, and A. V. Nemukhin. “Molecular Modeling Reveals the Mechanism of Ran-RanGAP-Catalyzed Guanosine Triphosphate Hydrolysis without an Arginine Finger”. In: ACS Cat. 11.15 (2021), pp. 8985–8998. [66] M. G. Khrenova, A. M. Kulakova, and A. V. Nemukhin. “Light-Induced Change of Arginine Conformation Modulates the Rate of Adenosine Triphosphate to Cyclic Adenosine Monophosphate Conversion in the Optogenetic System Containing Photoactivated Adenylyl Cyclase”. In: J. Chem. Inf. and Mod. 61.3 (2021), pp. 1215–1225. [67] C. H. S. da Costa et al. “Evaluating QM/MM Free Energy Surfaces for Ranking Cysteine Protease Covalent Inhibitors”. In: J. 
Chem. Inf. and Mod. 60.2 (2020), pp. 880–889. [68] L. Shen and W. Yang. “Molecular Dynamics Simulations with Quantum Mechanics/Molecular Mechanics and Adaptive Neural Networks”. In: J. Chem. Theory Comput. 14.3 (2018), pp. 1442–1455. [69] X. Pan et al. “Machine-Learning-Assisted Free Energy Simulation of Solution-Phase and Enzyme Reactions”. In: J. Chem. Theory Comput. 17.9 (2021), pp. 5745–5758. [70] K. B. Bravaya et al. “Effect of protein environment on electronically excited and ionized states of the green fluorescent protein chromophore”. In: J. Phys. Chem. B 115 (2011), pp. 8296–8303. [71] E. Epifanovsky et al. “The effect of oxidation on the electronic structure of the green fluorescent protein chromophore”. In: J. Chem. Phys. 132 (2010), p. 115104. [72] M. Khrenova et al. “Quantum chemistry calculations provide support to the mechanism of the light-induced structural changes in the flavin-binding photoreceptor protein”. In: J. Chem. Theory Comput. 6 (2010), p. 2293. [73] A. M. Bogdanov et al. “Turning on and off photoinduced electron transfer in fluorescent proteins by π-stacking, halide binding, and Tyr145 mutations”. In: J. Am. Chem. Soc. 138 (2016), pp. 4807–4817. [74] T. Sen et al. “Influence of the first chromophore-forming residue on photobleaching and oxidative photoconversion of EGFP and EYFP”. In: Int. J. Mol. Sci. 20 (2019), p. 5229. [75] T. Sen et al. “Interplay between locally excited and charge transfer states governs the photoswitching mechanism in fluorescent protein Dreiklang”. In: J. Phys. Chem. B 125 (2021), pp. 757–770. [76] A. Barrozo, M. Y. El-Naggar, and A. I. Krylov. “Distinct Electron Conductance Regimes in Bacterial Decaheme Cytochromes”. In: Angew. Chem., Int. Ed. 57 (2018), pp. 6805–6809. [77] S. Xu et al. “Multiheme Cytochrome Mediated Redox Conduction Through Shewanella oneidensis MR-1 Cells”. In: J. Am. Chem. Soc. 140 (2018), pp. 10085–10089. [78] Y. Zhang and W. Yang.
“A challenge for density functionals: Self-interaction error increases for systems with a noninteger number of electrons”. In: J. Chem. Phys. 109.7 (1998), pp. 2604–2608. [79] M. Lundberg and P. Siegbahn. “Quantifying the effects of the self-interaction error in DFT: When do the delocalized states appear?” In: J. Chem. Phys. 122 (2005), p. 224103. [80] C. V. Sumowski and C. Ochsenfeld. “A Convergence Study of QM/MM Isomerization Energies with the Selected Size of the QM Region for Peptidic Systems”. In: J. Phys. Chem. A 113 (2009), pp. 11734–11741. [81] S. Roßbach and C. Ochsenfeld. “Influence of Coupling and Embedding Schemes on QM Size Convergence in QM/MM Approaches for the Example of a Proton Transfer in DNA”. In: J. Chem. Theory Comput. 13 (2017), pp. 1102–1107. [82] H. J. Kulik et al. “How Large Should the QM Region Be in QM/MM Calculations? The Case of Catechol O-Methyltransferase”. In: J. Phys. Chem. B 120.44 (2016), pp. 11381–11394. [83] Q. Cui, T. Pal, and L. Xie. “Biomolecular QM/MM Simulations: What Are Some of the "Burning Issues"?” In: J. Phys. Chem. B 125.3 (2021), pp. 689–702. [84] L. O. Jones et al. “Embedding Methods for Quantum Chemistry: Applications from Materials to Life Sciences”. In: J. Am. Chem. Soc. 142.7 (2020), pp. 3281–3295. [85] A. Warshel and M. Levitt. “Theoretical Studies of Enzymatic Reactions: Dielectric, Electrostatic and Steric Stabilization of the Carbonium Ion in the Reaction of Lysozyme”. In: J. Mol. Biol. 103 (1976), p. 227. [86] H. M. Senn and W. Thiel. “QM/MM Methods for Biomolecular Systems”. In: Angew. Chem., Int. Ed. 48 (2009), p. 1198. [87] M. Liu et al. “QM/MM through the 1990s: The First Twenty Years of Method Development and Applications”. In: Isr. J. Chem. 54 (2014), p. 1250. [88] M. W. Van der Kamp and A. J. Mulholland. “Combined Quantum Mechanics/Molecular Mechanics (QM/MM) Methods in Computational Enzymology”. In: Biochemistry 52 (2013), p. 2708. [89] K. Świderek, I. Tuñón, and V. Moliner.
“Predicting Enzymatic Reactivity: From Theory to Design”. In: WIREs Comput. Mol. Sci. 4 (2014), p. 407. [90] H. J. Kulik et al. “How Large Should the QM Region Be in QM/MM Calculations? The Case of Catechol O-Methyltransferase”. In: J. Phys. Chem. B 120 (2016), p. 11381. [91] S. Roßbach and C. Ochsenfeld. “Influence of Coupling and Embedding Schemes on QM Size Convergence in QM/MM Approaches for the Example of a Proton Transfer in DNA”. In: J. Chem. Theory Comput. 13 (2017), p. 1102. [92] S. F. Sousa et al. “Application of Quantum Mechanics/Molecular Mechanics Methods in the Study of Enzymatic Reaction Mechanisms”. In: WIREs Comput. Mol. Sci. 7 (2017), e1281. [93] Q. Cui, T. Pal, and L. Xie. “Biomolecular QM/MM Simulations: What Are Some of the "Burning Issues"?” In: J. Phys. Chem. B 125 (2021), p. 689. [94] K. Gao et al. “Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2”. In: Chem. Rev. 122 (2022), p. 11287. [95] C. Bai et al. “Predicting Mutational Effects on Receptor Binding of the Spike Protein of SARS-CoV-2 Variants”. In: J. Am. Chem. Soc. 143 (2021), p. 17646. [96] J. Zhou et al. “Fast and Effective Prediction of the Absolute Binding Free Energies of Covalent Inhibitors of SARS-CoV-2 Main Protease and 20S Proteasome”. In: J. Am. Chem. Soc. 144 (2022), p. 7568. [97] J. Wang et al. “A New Class of α-Ketoamide Derivatives with Potent Anticancer and Anti-SARS-CoV-2 Activities”. In: Eur. J. Med. Chem. 215 (2021), p. 113267. [98] K. Anand et al. “Coronavirus Main Proteinase (3CLpro) Structure: Basis for Design of anti-SARS Drugs”. In: Science 300 (2003), p. 1763. [99] Z. Jin et al. “Structure of Mpro from SARS-CoV-2 and Discovery of Its Inhibitors”. In: Nature 582 (2020), p. 289. [100] L. Zhang et al. “Crystal Structure of SARS-CoV-2 Main Protease Provides a Basis for Design of Improved α-Ketoamide Inhibitors”. In: Science 368 (2020), p. 409. [101] R. Cannalire et al.
“Targeting SARS-CoV-2 Proteases and Polymerase for COVID-19 Treatment: State of the Art and Future Opportunities”. In: J. Med. Chem. 65 (2022), p. 2716. [102] S. Ullrich et al. “Challenges of Short Substrate Analogues as SARS-CoV-2 Main Protease Inhibitors”. In: Bioorg. Med. Chem. Lett. 50 (2021), p. 128333. [103] Q. Li and C. Kang. “Progress in Developing Inhibitors of SARS-CoV-2 3C-Like Protease”. In: Microorganisms 8 (2020), p. 1250. [104] W. Dai et al. “Structure-Based Design of Antiviral Drug Candidates Targeting the SARS-CoV-2 Main Protease”. In: Science 368 (2020), p. 1331. [105] Z. Jin et al. “Structural Basis for the Inhibition of SARS-CoV-2 Main Protease by Antineoplastic Drug Carmofur”. In: Nat. Struct. Mol. Biol. 27 (2020), p. 529. [106] S. Ma, L. S. Devi-Kesavan, and J. Gao. “Molecular Dynamics Simulations of the Catalytic Pathway of a Cysteine Protease: A Combined QM/MM Study of Human Cathepsin K”. In: J. Am. Chem. Soc. 129 (2007), p. 13633. [107] A. G. Taranto, P. Carvalho, and M. A. Avery. “QM/QM Studies for Michael Reaction in Coronavirus Main Protease (3CL Pro)”. In: J. Mol. Graphics Modell. 27 (2008), p. 275. [108] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and I. Tuñón. “Inhibition Mechanism of SARS-CoV-2 Main Protease with Ketone-Based Inhibitors Unveiled by Multiscale Simulations: Insights for Improved Designs”. In: Angew. Chem., Int. Ed. 60 (2021), p. 25933. [109] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and I. Tuñón. “Computational Simulations on the Binding and Reactivity of a Nitrile Inhibitor of the SARS-CoV-2 Main Protease”. In: Chem. Commun. 57 (2021), p. 9096. [110] C. A. Ramos-Guzmán et al. “The Impact of SARS-CoV-2 3CL Protease Mutations on Nirmatrelvir Inhibitory Efficiency. Computational Insights into Potential Resistance Mechanisms”. In: Chem. Sci. 14 (2023), p. 2686. [111] S. T. Ngo et al. “Insights into the Binding and Covalent Inhibition Mechanism of PF-07321332 to SARS-CoV-2 Mpro”. In: RSC Adv. 12 (2022), p. 3729. [112] G. 
Oanca et al. “Exploring the Catalytic Reaction of Cysteine Proteases”. In: J. Phys. Chem. B 124 (2020), p. 11349. [113] D. Wei et al. “Reaction Pathway and Free Energy Profile for Papain-Catalyzed Hydrolysis of N-Acetyl-Phe-Gly 4-Nitroanilide”. In: Biochemistry 52 (2013), p. 5145. [114] K. Świderek and V. Moliner. “Revealing the Molecular Mechanisms of Proteolysis of SARS-CoV-2 Mpro by QM/MM Computational Methods”. In: Chem. Sci. 11 (2020), p. 10626. [115] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and I. Tuñón. “Unraveling the SARS-CoV-2 Main Protease Mechanism Using Multiscale Methods”. In: ACS Catal. 10 (2020), p. 12544. [116] D. Mondal and A. Warshel. “Exploring the Mechanism of Covalent Inhibition: Simulating the Binding Free Energy of α-Ketoamide Inhibitors of the Main Protease of SARS-CoV-2”. In: Biochemistry 59 (2020), p. 4601. [117] K. Arafet et al. “Mechanism of Inhibition of SARS-CoV-2 Mpro by N3 Peptidyl Michael Acceptor Explained by QM/MM Simulations and Design of New Derivatives with Tunable Chemical Reactivity”. In: Chem. Sci. 12 (2021), p. 1433. [118] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and I. Tuñón. “A Microscopic Description of SARS-CoV-2 Main Protease Inhibition with Michael Acceptors. Strategies for Improving Inhibitor Design”. In: Chem. Sci. 12 (2021), p. 3489. [119] C. A. Ramos-Guzmán, J. J. Ruiz-Pernía, and I. Tuñón. “Multiscale Simulations of SARS-CoV-2 3CL Protease Inhibition with Aldehyde Derivatives. Role of Protein and Inhibitor Conformational Changes in the Reaction Mechanism”. In: ACS Catal. 11 (2021), p. 4157. [120] H.-H. Otto and T. Schirmeister. “Cysteine Proteases and Their Inhibitors”. In: Chem. Rev. 97 (1997), p. 133. [121] J. C. Powers et al. “Irreversible Inhibitors of Serine, Cysteine, and Threonine Proteases”. In: Chem. Rev. 102 (2002), p. 4639. [122] K. Arafet, S. Ferrer, and V. Moliner. “First Quantum Mechanics/Molecular Mechanics Studies of the Inhibition Mechanism of Cruzain by Peptidyl Halomethyl Ketones”.
In: Biochemistry 54 (2015), p. 3381. [123] K. Arafet, S. Ferrer, and V. Moliner. “Computational Study of the Catalytic Mechanism of the Cruzain Cysteine Protease”. In: ACS Catal. 7 (2017), p. 1207. [124] K. Arafet et al. “Quantum Mechanics/Molecular Mechanics Studies of the Mechanism of Cysteine Protease Inhibition by Peptidyl-2,3-Epoxyketones”. In: Phys. Chem. Chem. Phys. 19 (2017), p. 12740. [125] K. Arafet, K. Świderek, and V. Moliner. “Computational Study of the Michaelis Complex Formation and the Effect on the Reaction Mechanism of Cruzain Cysteine Protease”. In: ACS Omega 3 (2018), p. 18613. [126] J. R. A. Silva et al. “Assessment of the Cruzain Cysteine Protease Reversible and Irreversible Covalent Inhibition Mechanism”. In: J. Chem. Inf. Model. 60 (2020), p. 880. [127] A. M. Dos Santos et al. “Experimental Study and Computational Modelling of Cruzain Cysteine Protease Inhibition by Dipeptidyl Nitriles”. In: Phys. Chem. Chem. Phys. 20 (2018), p. 24317. [128] C. H. S. Da Costa et al. “Evaluating QM/MM Free Energy Surfaces for Ranking Cysteine Protease Covalent Inhibitors”. In: J. Chem. Inf. Model. 60 (2020), p. 880. [129] P. Klein et al. “New Cysteine Protease Inhibitors: Electrophilic (Het)Arenes and Unexpected Prodrug Identification for the Trypanosoma Protease Rhodesain”. In: Molecules 25 (2020), p. 1451. [130] W. Vuong et al. “Feline Coronavirus Drug Inhibits the Main Protease of SARS-CoV-2 and Blocks Virus Replication”. In: Nat. Commun. 11 (2020), p. 4282. [131] J. Sakamoto et al. “An Individual Patient Data Meta-Analysis of Adjuvant Therapy with Carmofur in Patients with Curatively Resected Colon Cancer”. In: Jpn. J. Clin. Oncol. 35 (2005), p. 536. [132] D. R. Owen et al. “An Oral SARS-CoV-2 Mpro Inhibitor Clinical Candidate for the Treatment of COVID-19”. In: Science 374 (2021), p. 1586. [133] B. Ahmad et al.
“Exploring the Binding Mechanism of PF-07321332 SARS-CoV-2 Protease Inhibitor through Molecular Dynamics and Binding Free Energy Simulations”. In: Int. J. Mol. Sci. 22 (2021), p. 9124. [134] A. M. Andrianov et al. “Computational Discovery of Small Drug-like Compounds as Potential Inhibitors of SARS-CoV-2 Main Protease”. In: J. Biomol. Struct. Dyn. 39 (2021), p. 5779. [135] V. V. Welborn. “Beyond Structural Analysis of Molecular Enzyme-Inhibitor Interactions”. In: Electron. Struct. 4 (2022), p. 014006. [136] H. M. Berman et al. “The Protein Data Bank”. In: Nucleic Acids Res. 28 (2000), p. 235. [137] F. Richter et al. “Computational Design of Catalytic Dyads and Oxyanion Holes for Ester Hydrolysis”. In: J. Am. Chem. Soc. 134 (2012), p. 16197. [138] A. Paasche, T. Schirmeister, and B. Engels. “Benchmark Study for the Cysteine–Histidine Proton Transfer Reaction in a Protein Environment: Gas Phase, COSMO, QM/MM Approaches”. In: J. Chem. Theory Comput. 9 (2013), p. 1765. [139] A. Paasche et al. “Evidence for Substrate Binding-Induced Zwitterion Formation in the Catalytic Cys-His Dyad of the SARS-CoV Main Protease”. In: Biochemistry 53 (2014), p. 5930. [140] B. Elsässer et al. “Distinct Roles of Catalytic Cysteine and Histidine in the Protease and Ligase Mechanisms of Human Legumain As Revealed by DFT-Based QM/MM Simulations”. In: ACS Catal. 7 (2017), p. 5585. [141] L. Zanetti-Polzi et al. “Tuning Proton Transfer Thermodynamics in SARS-CoV-2 Main Protease: Implications for Catalysis and Inhibitor Design”. In: J. Phys. Chem. Lett. 12 (2021), p. 4195. [142] E. Aprà et al. “NWChem: Past, Present, and Future”. In: J. Chem. Phys. 152 (2020), p. 184102. [143] A. I. Krylov and P. M. W. Gill. “Q-Chem: An Engine for Innovation”. In: WIREs Comput. Mol. Sci. 3 (2013), p. 317. [144] E. Epifanovsky et al. “Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package”. In: J. Chem. Phys. 155.8 (2021), p. 084801. [145] G. Giudetti et al.
“How Reproducible Are QM/MM Simulations? Lessons from Computational Studies of the Covalent Inhibition of the SARS-CoV-2 Main Protease by Carmofur”. In: J. Chem. Theory Comput. 18 (2022), p. 5056. [146] C. Adamo and V. Barone. “Toward Reliable Density Functional Methods without Adjustable Parameters: The PBE0 Model”. In: J. Chem. Phys. 110 (1999), p. 6158. [147] S. Grimme, S. Ehrlich, and L. Goerigk. “Effect of the Damping Function in Dispersion Corrected Density Functional Theory”. In: J. Comput. Chem. 32 (2011), p. 1456. [148] N. Mardirossian and M. Head-Gordon. “Thirty Years of Density Functional Theory in Computational Chemistry: An Overview and Extensive Assessment of 200 Density Functionals”. In: Mol. Phys. 115 (2017), p. 2315. [149] J. W. Ponder and D. A. Case. “Force Fields for Protein Simulations”. In: Adv. Protein Chem. 66 (2003), p. 27. [150] T. S. Lee et al. “GPU-Accelerated Molecular Dynamics and Free Energy Methods in Amber18: Performance Enhancements and New Features”. In: J. Chem. Inf. Model. 58 (2018), p. 2043. [151] J. Kästner and W. Thiel. “Bridging the Gap between Thermodynamic Integration and Umbrella Sampling Provides a Novel Analysis Method: “Umbrella Integration””. In: J. Chem. Phys. 123 (2005), p. 144104. [152] G. Fiorin, M. L. Klein, and J. Hénin. “Using Collective Variables to Drive Molecular Dynamics Simulations”. In: Mol. Phys. 111 (2013), p. 3345. [153] M. C. R. Melo et al. “NAMD Goes Quantum: An Integrative Suite for Hybrid Simulations”. In: Nat. Methods 15 (2018), p. 351. [154] J. C. Phillips et al. “Scalable Molecular Dynamics on CPU and GPU Architectures with NAMD”. In: J. Chem. Phys. 153 (2020), p. 044130. [155] M. G. Khrenova, V. G. Tsirelson, and A. V. Nemukhin. “Dynamical Properties of Enzyme–Substrate Complexes Disclose Substrate Specificity of the SARS-CoV-2 Main Protease as Characterized by the Electron Density Descriptors”. In: Phys. Chem. Chem. Phys. 22 (2020), p. 19069. [156] M. G. Khrenova, B. L. Grigorenko, and A. V.
Nemukhin. “Molecular Modeling Reveals the Mechanism of Ran-RanGAP-Catalyzed Guanosine Triphosphate Hydrolysis without an Arginine Finger”. In: ACS Catal. 11 (2021), p. 8985. [157] M. G. Khrenova, A. M. Kulakova, and A. V. Nemukhin. “Light-Induced Change of Arginine Conformation Modulates the Rate of Adenosine Triphosphate to Cyclic Adenosine Monophosphate Conversion in the Optogenetic System Containing Photoactivated Adenylyl Cyclase”. In: J. Chem. Inf. Model. 61 (2021), p. 1215. [158] S. Seritan et al. “TeraChem: A Graphical Processing Unit-Accelerated Electronic Structure Package for Large-Scale Ab Initio Molecular Dynamics”. In: WIREs Comput. Mol. Sci. 11 (2021), e1494. [159] R. B. Best et al. “Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone φ, ψ and Side-Chain χ1 and χ2 Dihedral Angles”. In: J. Chem. Theory Comput. 8 (2012), p. 3257. [160] A. J. J. Lennox. “Meisenheimer Complexes in SNAr Reactions: Intermediates or Transition States?” In: Angew. Chem., Int. Ed. 57 (2018), p. 14686. [161] N. A. Senger et al. “The Element Effect Revisited: Factors Determining Leaving Group Ability in Activated Nucleophilic Aromatic Substitution Reactions”. In: J. Org. Chem. 77 (2012), p. 9535. [162] C. Li and G. A. Voth. “A Quantitative Paradigm for Water-Assisted Proton Transport through Proteins and Other Confined Spaces”. In: Proc. Natl. Acad. Sci. 118 (2021), e2113141118. [163] M. Y. Zakharova et al. “Pre-Steady-State Kinetics of the SARS-CoV-2 Main Protease as a Powerful Tool for Antiviral Drug Discovery”. In: Front. Pharmacol. 12 (2021), p. 773198. [164] J. Solowiej et al. “Steady-State and Pre-Steady-State Kinetic Evaluation of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) 3CL pro Cysteine Protease: Development of an Ion-Pair Model for Catalysis”. In: Biochemistry 47 (2008), p. 2617. [165] H. S. Fernandes, S. F. Sousa, and N. M. F. S. A. Cerqueira.
“New Insights into the Catalytic Mechanism of the SARS-CoV-2 Main Protease: An ONIOM QM/MM Approach”. In: Mol. Diversity 26 (2022), p. 1373. [166] S. Burge et al. “Quadruplex DNA: sequence, topology and structure”. In: Nucleic Acids Research 34.19 (Sept. 2006), pp. 5402–5415. issn: 0305-1048. [167] S. Neidle. “The structures of quadruplex nucleic acids and their drug complexes”. In: Current Opinion in Structural Biology 19.3 (2009), pp. 239–250. issn: 0959-440X. [168] W. A. Harrell Jr, S. Neidle, and S. Balasubramanian. Quadruplex nucleic acids. Vol. 7. 3. The Royal Society of Chemistry, 2006. [169] R. Hänsel-Hertsch, M. Di Antonio, and S. Balasubramanian. “DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential”. In: Nature Reviews Molecular Cell Biology 18.5 (2017), pp. 279–284. issn: 1471-0080. [170] S. Neidle. “Human telomeric G-quadruplex: The current status of telomeric G-quadruplexes as therapeutic targets in human cancer”. In: The FEBS Journal 277.5 (2010), pp. 1118–1125. [171] M. L. Bochman, K. Paeschke, and V. A. Zakian. “DNA secondary structures: stability and function of G-quadruplex structures”. In: Nature Reviews Genetics 13.11 (2012), pp. 770–780. issn: 1471-0064. [172] H. J. Lipps and D. Rhodes. “G-quadruplex structures: in vivo evidence and function”. In: Trends Cell Biol. 19 (2009), pp. 414–422. [173] A. Ou et al. “High resolution crystal structure of a KRAS promoter G-quadruplex reveals a dimer with extensive poly-A π-stacking interactions for small-molecule recognition”. In: Nucleic Acids Research 48.10 (2020), pp. 5766–5776. issn: 0305-1048. [174] K. G. Moghaddam et al. “Binding of quinazolinones to c-KIT G-quadruplex; an interplay between hydrogen bonding and π-π stacking”. In: Biophysical Chemistry 253 (2019), p. 106220. issn: 0301-4622. [175] F. Wang, X. Liu, and I. Willner. “DNA Switches: From Principles to Applications”. In: Angewandte Chemie International Edition 54.4 (2015), pp. 1098–1129. [176] M.
McCullagh et al. “DNA-Based Optomechanical Molecular Motor”. In: Journal of the American Chemical Society 133.10 (2011), pp. 3452–3459. [177] X. Wang et al. “Conformational Switching of G-Quadruplex DNA by Photoregulation”. In: Angewandte Chemie International Edition 49.31 (2010), pp. 5305–5309. [178] D. Miyoshi et al. “Structural Polymorphism of Telomeric DNA Regulated by pH and Divalent Cation”. In: Nucleosides, Nucleotides & Nucleic Acids 22.2 (2003), pp. 203–221. [179] T. Li, S. Dong, and E. Wang. “A Lead(II)-Driven DNA Molecular Device for Turn-On Fluorescence Detection of Lead(II) Ion with High Selectivity and Sensitivity”. In: Journal of the American Chemical Society 132.38 (2010), pp. 13156–13157. [180] D. M. Engelhard, J. Nowack, and G. H. Clever. “Copper-Induced Topology Switching and Thrombin Inhibition with Telomeric DNA G-Quadruplexes”. In: Angewandte Chemie International Edition 56.38 (2017), pp. 11640–11644. [181] D. P. N. Gonçalves et al. “Tetramethylpyridiniumporphyrazines—a new class of G-quadruplex inducing and stabilising ligands”. In: Chem. Commun. (45 2006), pp. 4685–4687. [182] D. P. N. Gonçalves et al. “Synthesis and G-quadruplex binding studies of new 4-N-methylpyridinium porphyrins”. In: Org. Biomol. Chem. 4 (17 2006), pp. 3337–3342. [183] R. Rodriguez et al. “Ligand-Driven G-Quadruplex Conformational Switching By Using an Unusual Mode of Interaction”. In: Angewandte Chemie International Edition 46.28 (2007), pp. 5405–5407. [184] A. S. Lubbe, W. Szymanski, and B. L. Feringa. “Recent developments in reversible photoregulation of oligonucleotide structure and function”. In: Chem. Soc. Rev. 46 (4 2017), pp. 1052–1079. [185] W. Szymański et al. “Reversible Photocontrol of Biological Systems by the Incorporation of Molecular Photoswitches”. In: Chemical Reviews 113.8 (2013), pp. 6114–6178. [186] S. Lena et al. “Triggering of Guanosine Self-Assembly by Light”. In: Angewandte Chemie International Edition 49.21 (2010), pp. 3657–3660. [187] S. 
Ogasawara and M. Maeda. “Reversible Photoswitching of a G-Quadruplex”. In: Angewandte Chemie International Edition 48.36 (2009), pp. 6671–6674. [188] J. Thevarpadam et al. “Photoresponsive Formation of an Intermolecular Minimal G-Quadruplex Motif”. In: Angewandte Chemie International Edition 55.8 (2016), pp. 2738–2742. [189] A. Pérez et al. “Refinement of the AMBER Force Field for Nucleic Acids: Improving the Description of α/γ Conformers”. In: Biophys. J. 92 (2007), pp. 3817–3829. [190] M. P. Long et al. “Molecular dynamics simulations of alkaline earth metal ions binding to DNA reveal ion size and hydration effects”. In: Phys. Chem. Chem. Phys. 22.10 (2020). issn: 1463-9076. [191] H. W. Kim, Y. M. Rhee, and S. K. Shin. “Charge–dipole interactions in G-quadruplex thrombin-binding aptamer”. In: Phys. Chem. Chem. Phys. 20 (32 2018), pp. 21068–21074. [192] A. K. Sahoo, B. Bagchi, and P. K. Maiti. “Understanding enhanced mechanical stability of DNA in the presence of intercalated anticancer drug: Implications for DNA associated processes”. In: The Journal of Chemical Physics 151.16 (2019), p. 164902. issn: 0021-9606. [193] J. Wang et al. “Development and testing of a general amber force field”. In: Journal of Computational Chemistry 25.9 (2004), pp. 1157–1174. [194] T. Fox and P. A. Kollman. “Application of the RESP Methodology in the Parametrization of Organic Solvents”. In: The Journal of Physical Chemistry B 102.41 (1998), pp. 8070–8079. [195] J. Åqvist. “Ion-water interaction potentials derived from free energy perturbation simulations”. In: The Journal of Physical Chemistry 94.21 (1990), pp. 8021–8024. [196] G. Bussi, D. Donadio, and M. Parrinello. “Canonical sampling through velocity rescaling”. In: The Journal of Chemical Physics 126.1 (2007), p. 014101. issn: 0021-9606. [197] H. J. C. Berendsen et al. “Molecular dynamics with coupling to an external bath”. In: The Journal of Chemical Physics 81.8 (1984), pp. 3684–3690. issn: 0021-9606. [198] M. Parrinello and A.
Rahman. “Polymorphic transitions in single crystals: A new molecular dynamics method”. In: Journal of Applied Physics 52.12 (1981), pp. 7182–7190. issn: 0021-8979. [199] S. Nosé and M. Klein. “Constant pressure molecular dynamics for molecular systems”. In: Molecular Physics 50.5 (1983), pp. 1055–1076. [200] U. Essmann et al. “A smooth particle mesh Ewald method”. In: The Journal of Chemical Physics 103.19 (1995), pp. 8577–8593. issn: 0021-9606. [201] B. Hess et al. “LINCS: A linear constraint solver for molecular simulations”. In: Journal of Computational Chemistry 18.12 (1997), pp. 1463–1472. [202] B. Hess et al. “GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation”. In: Journal of Chemical Theory and Computation 4.3 (2008), pp. 435–447. [203] I. S. Joung and T. E. Cheatham III. “Determination of Alkali and Halide Monovalent Ion Parameters for Use in Explicitly Solvated Biomolecular Simulations”. In: The Journal of Physical Chemistry B 112.30 (2008), pp. 9020–9041. [204] S. Maeda, K. Ohno, and K. Morokuma. “Updated Branching Plane for Finding Conical Intersections without Coupling Derivative Vectors”. In: J. Chem. Theory Comput. 6.5 (2010), pp. 1538–1545. [205] M. J. Graham et al. “Forging solid-state qubit design principles in a molecular furnace”. In: Chem. Mater. 29 (2017), p. 1885. [206] M. Atzori and R. Sessoli. “The second quantum revolution: Role and challenges of molecular chemistry”. In: J. Am. Chem. Soc. 141 (2019), p. 11339. [207] M. R. Wasielewski et al. “Exploiting chemistry and molecular systems for quantum information science”. In: Nature Rev. Chem. 4 (2020), p. 490. [208] D. Aravena and E. Ruiz. “Spin dynamics in single-molecule magnets and molecular qubits”. In: Dalton Trans 49 (2020), p. 9916. [209] J. M. Zadrozny et al. “Millisecond coherence time in a tunable molecular electronic spin qubit”. In: ACS Central Sci. 1 (2015), p. 488. [210] D. P. DiVincenzo.
“The physical implementation of quantum computation”. In: Fortschr. Phys. 48 (2000), p. 771. [211] S. L. Bayliss et al. “Optically addressable molecular spins for quantum information processing”. In: Science 370 (2020), p. 1309. [212] J. C. Bardin, D. H. Slichter, and D. J. Reilly. “Microwaves in quantum computing”. In: IEEE Microw. Mag. 1 (2021), p. 403. [213] A. J. Heinrich et al. “Single-atom spin-flip spectroscopy”. In: Science 306 (2004), p. 466. [214] N. Tsukahara et al. “Adsorption-induced switching of magnetic anisotropy in a single iron(II) phthalocyanine molecule on an oxidized Cu(110) surface”. In: Phys. Rev. Lett. 102 (2009), p. 167203. [215] I. G. Rau et al. “Reaching the magnetic anisotropy limit of a 3d metal atom”. In: Science 344 (2014), p. 988. [216] A. Cornia et al. “Chemical strategies and characterization tools for the organization of single molecule magnets on surfaces”. In: Chem. Soc. Rev. 40 (2011), p. 3076. [217] N. Bachellier et al. “Unveiling nickelocene bonding to a noble metal surface”. In: Phys. Rev. B 93 (2016), p. 195403. [218] M. Ormaza et al. “Efficient spin-flip excitation of a nickelocene molecule”. In: Nano Lett. 17 (2017), p. 1877. [219] B. Verlhac et al. “Atomic-scale spin sensing with a single molecule at the apex of a scanning tunneling microscope”. In: Science 366 (2019), p. 623. [220] G. Czap et al. “Probing and imaging spin interactions with a magnetic single-molecule sensor”. In: Science 364 (2019), p. 670. [221] G. Czap et al. “Detection of spin-vibration states in single magnetic molecules”. In: Phys. Rev. Lett. 123 (2019), p. 106803. [222] K. L. Harriman, D. Errulat, and M. Murugesu. “Magnetic axiality: Design principles from molecules to materials”. In: Trends Chem. 1 (2019), p. 425. [223] C. A. Downing, A. A. Sokol, and C. R. A. Catlow. “The reactivity of CO2 on the MgO(100) surface”. In: Phys. Chem. Chem. Phys. 16 (2014), p. 184. [224] J. P. Perdew, K. Burke, and M. Ernzerhof.
“Generalized gradient approximation made simple”. In: Phys. Rev. Lett. 77 (1996), p. 3865. [225] N. Orms and A. I. Krylov. “Singlet-triplet energy gaps and the degree of diradical character in binuclear copper molecular magnets characterized by spin-flip density functional theory”. In: Phys. Chem. Chem. Phys. 20 (2018), p. 13127. [226] S. Kotaru et al. “Magnetic exchange interactions in binuclear and tetranuclear iron(III) complexes described by spin-flip DFT and Heisenberg effective Hamiltonians”. In: J. Comput. Chem. 44 (2023), p. 367. [227] M. A. Rohrdanz, K. M. Martins, and J. M. Herbert. “A long-range-corrected density functional that performs well for both ground-state properties and time-dependent density functional theory excitation energies, including charge-transfer excited states”. In: J. Chem. Phys. 130 (2009), p. 054112. [228] Y. A. Bernard, Y. Shao, and A. I. Krylov. “General formulation of spin-flip time-dependent density functional theory using non-collinear kernels: Theory, implementation, and benchmarks”. In: J. Chem. Phys. 136 (2012), p. 204103. [229] F. Wang and T. Ziegler. “Time-dependent density functional theory based on a noncollinear formulation of the exchange-correlation potential”. In: J. Chem. Phys. 121 (2004), p. 12191. [230] M. Alessio, F. A. Bischoff, and J. Sauer. “Chemically accurate adsorption energies for methane and ethane monolayers on the MgO(001) surface”. In: Phys. Chem. Chem. Phys. 20 (2018), p. 9760. [231] M. Alessio, D. Usvyat, and J. Sauer. “Chemically accurate adsorption energies: CO and H2O on the MgO(001) surface”. In: J. Chem. Theory Comput. 15 (2019), p. 1329. [232] E. Epifanovsky et al. “Software for the Frontiers of Quantum Chemistry: An Overview of Developments in the Q-Chem 5 Package”. In: J. Chem. Phys. 155 (2021), p. 084801. [233] M. Alessio and A. I. Krylov. “Equation-of-motion coupled-cluster protocol for calculating magnetic properties: Theory and applications to single-molecule magnets”. In: J. Chem.
Theory Comput. 17 (2021), p. 4225. [234] S. Gozem and A. I. Krylov. “The ezSpectra suite: An easy-to-use toolkit for spectroscopy modeling”. In: WIREs: Comput. Mol. Sci. 12 (2022), e1546. [235] S. Trtica et al. “Naphthalene-Bridged ansa-Nickelocene: Synthesis, Structure, Electrochemical, and Magnetic Measurements”. In: European Journal of Inorganic Chemistry 2012.28 (2012), pp. 4486–4493. [236] R. A. Musgrave et al. “Role of torsional strain in the ring-opening polymerisation of low strain [n]nickelocenophanes”. In: Chem. Sci. 10 (42 2019), pp. 9841–9852. [237] R. Prins, J. van Voorst, and C. Schinkel. “Zero-field splitting in the triplet ground state of nickelocene”. In: Chemical Physics Letters 1.2 (1967), pp. 54–55. [238] P. Baltzer et al. “Magnetic properties of nickelocene. A reinvestigation using inelastic neutron scattering and magnetic susceptibility”. In: Inorganic Chemistry 27.9 (1988), pp. 1543–1548. [239] H. B. Schlegel. “Exploring potential energy surfaces for chemical reactions: An overview of some practical methods”. In: J. Comput. Chem. 24.12 (2003), pp. 1514–1527. [240] F. Bernardi, M. Olivucci, and M. A. Robb. “Potential energy surface crossings in organic photochemistry”. In: Chem. Soc. Rev. 25 (5 1996), pp. 321–328. [241] J. M. Combes, P. Duclos, and R. Seiler. The Born-Oppenheimer Approximation. Boston, MA: Springer US, 1981, pp. 185–213. isbn: 978-1-4613-3350-0. [242] R. Woolley and B. Sutcliffe. “Molecular structure and the Born–Oppenheimer approximation”. In: Chem. Phys. Lett. 45.2 (1977), pp. 393–398. issn: 0009-2614. [243] H. Essén. “The physics of the Born–Oppenheimer approximation”. In: Int. J. Quant. Chem. 12.4 (1977), pp. 721–735. [244] R. Iftimie, P. Minary, and M. E. Tuckerman. “Ab initio molecular dynamics: Concepts, recent developments, and future trends”. In: Proc. Nat. Acad. Sci. 102.19 (2005), pp. 6654–6659. [245] M. E. Tuckerman et al. “Ab Initio Molecular Dynamics Simulations”. In: J. Phys. Chem. 100.31 (1996), pp.
12878–12887. [246] L. J. Butler. “Chemical reaction dynamics beyond the Born-Oppenheimer approximation”. In: Annu. Rev. Phys. Chem. 49.1 (1998). PMID: 15012427, pp. 125–171. [247] T. R. Nelson, S. Fernandez-Alberti, and S. Tretiak. “Modeling excited-state molecular dynamics beyond the Born–Oppenheimer regime”. In: 2.11 (2022), pp. 689–692. issn: 2662-8457. [248] J. C. Tully. “Molecular dynamics with electronic transitions”. In: J. Chem. Phys. 93.2 (July 1990), pp. 1061–1071. issn: 0021-9606. [249] L. Wang, A. Akimov, and O. V. Prezhdo. “Recent Progress in Surface Hopping: 2011–2015”. In: J. Phys. Chem. Lett. 7.11 (2016). PMID: 27171314, pp. 2100–2112. 147 [250] W. Li and A. Ma. “Recent developments in methods for identifying reaction coordinates”. In: Mol. Simul. 40.10-11 (2014), pp. 784–793. [251] P. Tavadze et al. “A Machine-Driven Hunt for Global Reaction Coordinates of Azobenzene Photoisomerization”. In: J. Am. Chem. Soc. 140.1 (2018). PMID: 29235856, pp. 285–290. [252] S. Filipek et al. “G Protein-Coupled Receptor Rhodopsin: A Prospectus”. In: Annu. Rev. Physiol. 65.1 (2003). PMID: 12471166, pp. 851–879. [253] K. Palczewski. “G Protein–Coupled Receptor Rhodopsin”. In: Annu. Rev. Biochem. 75.1 (2006). PMID: 16756510, pp. 743–767. [254] R. W. Schoenlein et al. “The First Step in Vision: Femtosecond Isomerization of Rhodopsin”. In: Science 254.5030 (1991), pp. 412–415. [255] W. Wang, J. H. Geiger, and B. Borhan. “The photochemical determinants of color vision”. In: BioEssays 36.1 (2014), pp. 65–74. [256] H. Kandori, Y. Shichida, and T. Yoshizawa. “Photoisomerization in Rhodopsin”. In: Biochemistry (Mosc) 66.11 (2001), pp. 1197–1209. issn: 1608-3040. [257] K. Palczewski and P. D. Kiser. “Shedding new light on the generation of the visual chromophore”. In: Proc. Nat. Acad. Sci. 117.33 (2020), pp. 19629–19638. [258] P. J. M. Johnson et al. “Local vibrational coherences drive the primary photochemistry of vision”. In: 7.12 (2015), pp. 980–986. issn: 1755-4349. 
[259] A. Kraskov, H. Stögbauer, and P. Grassberger. “Estimating mutual information”. In: Phys. Rev. E 69 (6 2004), p. 066138.
[260] H. Peng, F. Long, and C. Ding. “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy”. In: IEEE Trans. Pattern Anal. Mach. Intell. 27.8 (2005), pp. 1226–1238.
[261] S. M. Mangan et al. “Dependence between Structural and Electronic Properties of CsPbI3: Unsupervised Machine Learning of Nonadiabatic Molecular Dynamics”. In: J. Phys. Chem. Lett. 12.35 (2021), pp. 8672–8678.
[262] G. Zhou, W. Chu, and O. V. Prezhdo. “Structural Deformation Controls Charge Losses in MAPbI3: Unsupervised Machine Learning of Nonadiabatic Molecular Dynamics”. In: 5.6 (2020), pp. 1930–1938.
[263] T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: Wiley & Sons, 1991.
[264] G. Varoquaux, T. Vaught, and J. Millman, eds. Exploring Network Structure, Dynamics, and Function using NetworkX. Pasadena, CA USA, 2008, pp. 11–15.
[265] E. W. Dijkstra. “A note on two problems in connexion with graphs”. In: Numer. Math. 1.1 (1959), pp. 269–271. issn: 0945-3245.
[266] S. Viswanath et al. “Analyzing milestoning networks for molecular kinetics: Definitions, algorithms, and examples”. In: J. Chem. Phys. 139.17 (Nov. 2013), p. 174105. issn: 0021-9606.
[267] E. Epifanovsky et al. “Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package”. In: J. Chem. Phys. 155.8 (Aug. 2021), p. 084801. issn: 0021-9606.
[268] M. F. S. J. Menger, J. Ehrmaier, and S. Faraji. “PySurf: A Framework for Database Accelerated Direct Dynamics”. In: J. Chem. Theory Comput. 16.12 (2020), pp. 7681–7689. issn: 1549-9618.
[269] L. Mannack, S. Eising, and A. Rentmeister. “Current techniques for visualizing RNA in cells”. In: F1000Research 5.775 (2016).
[270] R. W. Dirks and H. J. Tanke. “Styryl Molecules Light-Up RNAs”. In: Chem. & Biol. 13 (2006), pp. 559–561.
[271] Q. Li et al. “RNA-Selective, Live Cell Imaging Probes for Studying Nuclear Structure and Function”. In: Chem. & Biol. 13 (2006), pp. 615–623.
[272] L. Guo et al. “Indole-based cyanine as a nuclear RNA-selective two-photon fluorescent probe for live cell imaging”. In: ACS Chem. Biol. 10 (2015), pp. 1171–1175.
[273] G. Song et al. “Low molecular weight fluorescent probes with good photostability for imaging RNA-rich nucleolus and RNA in cytoplasm in living cells”. In: Biomaterials 35 (2014), pp. 2103–2112.
[274] M. Jung Kim et al. “Development of Highly Fluorogenic Styrene Probes for Visualizing RNA in Live Cells”. In: ACS Chem. Biol. 18 (2023), pp. 1523–1533.
[275] W. Kutzelnigg. “What I like about Hückel theory”. In: J. Comput. Chem. 28.1 (2007), pp. 25–34.
[276] Y. G. Ermakova et al. “Pyridinium analogues of green fluorescent protein chromophore: Fluorogenic dyes with large solvent-dependent Stokes shifts”. In: J. Phys. Chem. Lett. 8 (2018), pp. 1958–1963.
[277] A. Patra et al. “Performance of density functionals for excited-state properties of isolated chromophores and exciplexes: Emission spectra, solvatochromic shifts, and charge-transfer character”. In: J. Chem. Theory Comput. 20 (2024), pp. 2520–2537.
[278] N. Mardirossian and M. Head-Gordon. “Thirty years of density functional theory in computational chemistry: An overview and extensive assessment of 200 density functionals”. In: Mol. Phys. 115 (2017), pp. 2315–2372.
[279] A. I. Krylov. “Equation-of-motion coupled-cluster methods for open-shell and electronically excited species: The hitchhiker’s guide to Fock space”. In: Annu. Rev. Phys. Chem. 59 (2008), pp. 433–462.
[280] A. I. Krylov. “From orbitals to observables and back”. In: J. Chem. Phys. 153 (2020), p. 080901.
[281] F. Plasser, A. I. Krylov, and A. Dreuw. “libwfa: Wavefunction analysis tools for excited and open-shell electronic states”. In: WIREs: Comput. Mol. Sci. 12 (2022), e1595.
[282] S. A. Mewes et al. “Benchmarking excited-state calculations using exciton properties”. In: J. Chem. Theory Comput. 14 (2018), pp. 710–725.
[283] K. B. Bravaya et al. “Quantum chemistry behind bioimaging: Insights from ab initio studies of fluorescent proteins and their chromophores”. In: Acc. Chem. Res. 45 (2012), pp. 265–275.
[284] S. M. Risser, D. N. Beratan, and S. Marder. “Structure-function relationship for β, the first molecular hyperpolarizability”. In: J. Am. Chem. Soc. 115 (1993), pp. 7719–7728.
[285] E. Kamarchik and A. I. Krylov. “Non-Condon effects in one- and two-photon absorption spectra of the green fluorescent protein”. In: J. Phys. Chem. Lett. 2 (2011), pp. 488–492.
[286] W. Skomorowski, S. Gulania, and A. I. Krylov. “Bound and continuum-embedded states of cyanopolyyne anions”. In: Phys. Chem. Chem. Phys. 20 (2018), pp. 4805–4817.
[287] K. Yates. “VI - THE QUANTITATIVE SIGNIFICANCE OF HMO RESULTS”. In: (1978). Ed. by K. Yates, pp. 206–238.
[288] K. Yates. “V - EXTENSIONS AND IMPROVEMENTS OF THE SIMPLE HÜCKEL METHOD”. In: (1978). Ed. by K. Yates, pp. 156–205.
[289] K. Yates. “II - HÜCKEL MOLECULAR ORBITAL THEORY”. In: (1978). Ed. by K. Yates, pp. 27–87.
[290] T. R. Gosnell. Fundamentals of Spectroscopy and Laser Physics. Cambridge University Press, 2002.
[291] D. Chudakov et al. “Fluorescent Proteins and Their Applications in Imaging Living Cells and Tissues”. In: Physiol. Rev. 90 (2010), pp. 1103–1163.
[292] A. Acharya et al. “Photoinduced Chemistry in Fluorescent Proteins: Curse or Blessing?” In: Chem. Rev. 117 (2017), pp. 758–795.
[293] C. Smith. “Two microscopes are better than one”. In: Nature 492 (2012), pp. 293–297.
[294] D. L. Sai et al. “Tailoring photosensitive ROS for advanced photodynamic therapy”. In: Exp. & Mol. Med. 53 (2021), pp. 495–504.
[295] M. A. McLean et al. “Mechanism of Chromophore Assisted Laser Inactivation Employing Fluorescent Proteins”. In: Anal. Chem. 81 (2001), pp. 1755–1761.
[296] X. Shu et al. “A Genetically Encoded Tag for Correlated Light and Electron Microscopy of Intact Cells, Tissues, and Organisms”. In: PLoS Biol. 9 (2011), e1001041.
[297] J. Torra et al. “Tailing miniSOG: structural bases of the complex photophysics of a flavin-binding singlet oxygen photosensitizing protein”. In: Sci. Rep. 9 (2019), p. 2428.
[298] C. Lafaye et al. “Riboflavin-binding proteins for singlet oxygen production”. In: Photochem. Photobiol. Sci. 21 (2022), pp. 1545–1555.
[299] M. Westberg et al. “Rational Design of an Efficient, Genetically Encodable, Protein-Encased Singlet Oxygen Photosensitizer”. In: J. Am. Chem. Soc. (2015).
[300] M. Westberg et al. “Temperature Sensitive Singlet Oxygen Photosensitization by LOV-Derived Fluorescent Flavoproteins”. In: J. Phys. Chem. B 121 (2017), pp. 2561–2574.
[301] M. Westberg et al. “No Photon Wasted: An Efficient and Selective Singlet Oxygen Photosensitizing Protein”. In: J. Phys. Chem. B 121 (2017), pp. 9366–9371.
[302] R. Ruiz-González et al. “Singlet Oxygen Generation by the Genetically Encoded Tag miniSOG”. In: 135 (2013), pp. 9564–9567.
[303] W. Holzer et al. “Photo-induced degradation of some flavins in aqueous solution”. In: Chem. Phys. 308 (2005), pp. 69–78.
[304] G. Strauss and W. J. Nickerson. “Photochemical Cleavage of Water by Riboflavin. II. Role of Activators”. In: J. Am. Chem. Soc. 83 (1961), pp. 3187–3191.
[305] P. F. Heelis. “The photophysical and photochemical properties of flavins (isoalloxazines)”. In: Chem. Soc. Rev. 11 (1982), pp. 15–39.
[306] M. Insińska-Rak, A. Golczak, and M. Sikorski. “Photochemistry of Riboflavin Derivatives in Methanolic Solutions”. In: J. Phys. Chem. A 116 (2012), pp. 1199–1207.
[307] W. M. Moore et al. “Photochemistry of Riboflavin. I. The Hydrogen Transfer Process in the Anaerobic Photobleaching of Flavins”. In: J. Am. Chem. Soc. 85 (1963), pp. 3367–3372.
[308] M. Halwer. “The Photochemistry of Riboflavin and Related Compounds”. In: J. Am. Chem. Soc. 73 (1951), pp. 4870–4874.
[309] M. Insińska-Rak et al. “New photochemically stable riboflavin analogue – 3-Methyl-riboflavin tetraacetate”. In: J. Photochem. Photobiol. A 186 (2007), pp. 14–23.
[310] D. E. Metzler and W. L. Cairns. “Photochemical degradation of flavines. VI. New photoproduct and its use in studying the photolytic mechanism”. In: J. Am. Chem. Soc. 93 (1971), pp. 2772–2777.
[311] I. Ahmad and H. D. C. Rapson. “Multicomponent spectrophotometric assay of riboflavine and photoproducts”. In: J. Pharm. & Biomed. Anal. 8 (1990), pp. 217–223.
[312] M. Insińska-Rak et al. “Riboflavin degradation products; combined photochemical and mass spectrometry approach”. In: J. Photochem. Photobiol. A 403 (2020), p. 112837.
[313] M. A. Sheraz et al. “Photo, thermal and chemical degradation of riboflavin”. In: Beilstein J. Org. Chem. (2014).
[314] W. M. Moore and C. Baylor Jr. “Photochemistry of riboflavine. IV. Photobleaching of some nitrogen-9 substituted isoalloxazines and flavines”. In: J. Am. Chem. Soc. 91 (1969), pp. 7170–7179.
[315] W. Holzer et al. “Absorption and emission spectroscopic characterisation of the LOV2-domain of phot from Chlamydomonas reinhardtii fused to a maltose binding protein”. In: Chem. Phys. 302 (2004), pp. 105–118.
[316] W. Holzer, A. Penzkofer, and P. Hegemann. “Absorption and emission spectroscopic characterisation of the LOV2-His domain of phot from Chlamydomonas reinhardtii”. In: Chem. Phys. 308 (2004), pp. 79–91.
[317] G. K. Radda and M. Calvin. “Chemical and Photochemical Reductions of Flavin Nucleotides and Analogs”. In: Biochemistry 3 (1964), pp. 384–393.
[318] B. Holmström. “A Metastable Intermediate in the Anaerobic Photolysis of Riboflavin”. In: Bull. Soc. Chim. Belges 71 (1962), pp. 869–876.
[319] P.-S. Song and D. E. Metzler. “PHOTOCHEMICAL DEGRADATION OF FLAVINS. IV. STUDIES OF THE ANAEROBIC PHOTOLYSIS OF RIBOFLAVIN”. In: Photochem. & Photobiol. 6 (1967), pp. 691–709.
[320] I. Ahmad et al. “Photolysis of riboflavin in aqueous solution: a kinetic study”. In: Int. J. Pharm. 280 (2004), pp. 199–208.
[321] S. Salzmann, J. Tatchen, and C. M. Marian. “The photophysics of flavins: What makes the difference between gas phase and aqueous solution?” In: J. Photochem. Photobiol. A 198 (2008), pp. 221–231.
[322] M. Kabir, Y. Orozco-Gonzalez, and S. Gozem. “Electronic spectra of flavin in different redox and protonation states: a computational perspective on the effect of the electrostatic environment”. In: Phys. Chem. Chem. Phys. 21.30 (2019), pp. 16526–16537.
[323] M. Kabir et al. “Alternative Strategy for Spectral Tuning of Flavin-Binding Fluorescent Proteins”. In: J. Phys. Chem. B 127 (2023), pp. 1301–1311.
[324] K. Zenichowski, M. Gothe, and P. Saalfrank. “Exciting flavins: Absorption spectra and spin–orbit coupling in light–oxygen–voltage (LOV) domains”. In: J. Photochem. Photobiol. A 190 (2007), pp. 290–300.
[325] S. Salzmann et al. “Influence of the LOV Domain on Low-Lying Excited States of Flavin: A Combined Quantum-Mechanics/Molecular-Mechanics Investigation”. In: J. Phys. Chem. B 113 (2009), pp. 15610–15618.
[326] B. F. Minaev, H. Ågren, and V. O. Minaeva. Handbook of Computational Chemistry. Springer, 2016. Chap. Spin–Orbit Coupling in Enzymatic Reactions and the Role of Spin in Biochemistry, pp. 1–31.
[327] M. Bracker et al. “Computer-Aided Design of Fluorinated Flavin Derivatives by Modulation of Intersystem Crossing and Fluorescence”. In: ChemPhotoChem 6 (2022), e202200040.
[328] P. R. Ogilby. “Singlet oxygen: there is indeed something new under the sun”. In: Chem. Soc. Rev. 39 (2010), pp. 3181–3209.
[329] T. Förster. “Zwischenmolekulare Energiewanderung und Fluoreszenz”. In: Ann. Phys. 2 (1948), pp. 55–75.
[330] D. L. Dexter. “A Theory of Sensitized Luminescence in Solids”. In: J. Chem. Phys. 21 (1953), pp. 836–850.
[331] H. Kautsky. “QUENCHING OF LUMINESCENCE BY OXYGEN”. In: Trans. Faraday Soc. 35 (1938), p. 216.
[332] H. Tsubomura and R. S. Mulliken. “Molecular Complexes and their Spectra. XII. Ultraviolet Absorption Spectra Caused by the Interaction of Oxygen with Organic Molecules”. In: J. Am. Chem. Soc. 82 (1960), pp. 5966–5974.
[333] K. Kawaoka, A. U. Khan, and D. R. Kearns. “Role of Singlet Excited States of Molecular Oxygen in the Quenching of Organic Triplet States”. In: J. Chem. Phys. 46 (1967), pp. 1842–1853.
[334] B. F. Minaev. “Quantum-chemical investigation of the mechanisms of the photosensitization, luminescence, and quenching of singlet ¹Δg oxygen in solutions”. In: Zhurnal Prikladnoi Spektroskopii 42 (1985). https://link.springer.com/article/10.1007/BF00661398, pp. 766–772.
[335] B. F. Minaev et al. “INTERACTION MECHANISM OF MOLECULAR OXYGEN WITH EXCITED STATES OF LUMINOPHORES IN SOLUTION, IN POLYMERS, AND AT A SURFACE”. In: Zhurnal Prikladnoi Spektroskopii 50 (1989). https://link.springer.com/article/10.1007/BF00659989, pp. 291–297.
[336] B. F. Minaev, S. Lunell, and G. I. Kobzev. “Collision-Induced Intensity of the b¹Σg⁺–a¹Δg transition in Molecular Oxygen: Model Calculations for the Collision Complex O2 + H2”. In: Int. J. Quant. Chem. 50 (1994), pp. 279–292.
[337] B. F. Minaev, V. V. Kukueva, and H. Ågren. “Configuration Interaction Study of the O-C2H4 Exciplex: Collision-induced Probabilities of Spin-forbidden Radiative and Non-radiative Transitions”. In: J. Chem. Soc., Faraday Trans. 90 (1994), pp. 1479–1468.
[338] M. Smith and J. Michl. “Recent advances in singlet fission”. In: Annu. Rev. Phys. Chem. 64 (2013), pp. 361–368.
[339] D. Casanova. “Theoretical modeling of singlet fission”. In: Chem. Rev. 118 (2018), pp. 7164–7207.
[340] I. Polyakov, A. Kulakova, and A. Nemukhin. “Computational Modeling of the Interaction of Molecular Oxygen with the miniSOG Protein – A Light Induced Source of Singlet Oxygen”. In: Biophysica 3 (2023), pp. 252–262.
[341] R. B. Best et al. “Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone ϕ, ψ and Side-Chain χ1 and χ2 Dihedral Angles”. In: J. Chem. Theory Comput. 8.9 (2012), pp. 3257–3273.
[342] A. Alexandrov. “Molecular Mechanics Model for Flavins”. In: J. Comput. Chem. 40 (2019), pp. 2834–2842.
[343] J. Perdew, K. Burke, and M. Ernzerhof. “Generalized Gradient Approximation Made Simple”. In: Phys. Rev. Lett. 77 (1996), pp. 3865–3868.
[344] S. Grimme et al. “A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu”. In: J. Chem. Phys. 132.15 (2010), p. 154104.
[345] J. W. Ponder and D. A. Case. “Force fields for protein simulations”. In: Adv. Prot. Chem. 66 (2003), pp. 27–85.
[346] D. Casanova and M. Head-Gordon. “Restricted active space spin-flip configuration interaction approach: Theory, implementation and examples”. In: Phys. Chem. Chem. Phys. 11 (2009), pp. 9779–9790.
[347] A. Granovsky. “Extended multi-configuration quasi-degenerate perturbation theory: The new approach to multi-state multi-reference perturbation theory”. In: J. Chem. Phys. 134 (2011), p. 214113.
[348] G. Herzberg. “Molecular spectra and molecular structure: I. Spectra of diatomic molecules”. In: I (1950).
[349] A. I. Krylov. The Quantum Chemistry of Open-Shell Species. Ed. by A. L. Parrill and K. B. Lipkowitz. Vol. 30. J. Wiley & Sons, 2017, pp. 151–224.
[350] A. I. Krylov. “Size-consistent wave functions for bond-breaking: The equation-of-motion spin-flip model”. In: Chem. Phys. Lett. 338 (2001), pp. 375–384.
[351] D. Casanova and A. I. Krylov. “Spin-Flip Methods in Quantum Chemistry”. In: Phys. Chem. Chem. Phys. 22 (2020), pp. 4326–4342.
[352] X. Feng, A. V. Luzanov, and A. I. Krylov. “Fission of entangled spins: An electronic structure perspective”. In: J. Phys. Chem. Lett. 4 (2013), pp. 3845–3852.
[353] X. Feng and A. I. Krylov. “On couplings and excimers: Lessons from studies of singlet fission in covalently linked tetracene dimers”. In: Phys. Chem. Chem. Phys. 18 (2016), pp. 7751–7761.
[354] X. Feng, D. Casanova, and A. I. Krylov. “Intra- and inter-molecular singlet fission in covalently linked dimers”. In: J. Phys. Chem. C 120 (2016), pp. 19070–19077.
[355] B. A. Hess et al. “A mean-field spin-orbit method applicable to correlated wavefunctions”. In: Chem. Phys. Lett. 251 (1996), pp. 365–371.
[356] C. M. Marian. “Spin-orbit coupling and intersystem crossing in molecules”. In: WIREs Comput. Mol. Sci. 2 (2012), pp. 187–203.
[357] P. Pokhilko, E. Epifanovsky, and A. I. Krylov. “General framework for calculating spin–orbit couplings using spinless one-particle density matrices: theory and application to the equation-of-motion coupled-cluster wave functions”. In: J. Chem. Phys. 151 (2019), p. 034106.
[358] A. Carreras et al. “Calculation Of Spin-orbit Couplings Using RASCI Spinless One-particle Density Matrices: Theory And Applications”. In: J. Chem. Phys. 153 (2020), p. 214107.
[359] A. A. Granovsky. XMCQDPT2, URL http://classic.chem.msu.su (Accessed Dec. 18, 2009).
[360] M. A. El-Sayed. “Triplet State: Its radiative and non-radiative properties”. In: Acc. Chem. Res. 1 (1968), pp. 8–16.
[361] M. Alessio et al. “Origin of Magnetic Anisotropy in Nickelocene Molecular Magnet and Resilience of its Magnetic Behavior”. In: J. Phys. Chem. C 127.7 (2023), pp. 3647–3659.
[362] M. Weldon et al. “Singlet Sigma: The “Other” Singlet Oxygen in Solution”. In: Photochem. and Photobiol. 70 (1999), pp. 369–379.
[363] T. J. Penfold et al. “Spin-Vibronic Mechanism for Intersystem Crossing”. In: Chem. Rev. 118 (2018), pp. 6975–7025.
[364] C. Marian. “Understanding and Controlling Intersystem Crossing in Molecules”. In: Annu. Rev. Phys. Chem. 72 (2021), pp. 617–640.
[365] D. Casanova et al. “Double spin-flip approach within equation-of-motion coupled cluster and configuration interaction formalisms: Theory, implementation and examples”. In: J. Chem. Phys. 130 (2009), p. 044103.
[366] S. Matsika et al. “What we can learn from the norms of one-particle density matrices, and what we can’t: Some results for interstate properties in model singlet fission systems”. In: J. Phys. Chem. A 118 (2014), pp. 11943–11955.
[367] M. Kasha. “Characterization of electronic transitions in complex molecules”. In: Disc. Faraday Soc. 9 (1950), pp. 14–19.
[368] B. F. Minaev. “THEORETICAL MODEL OF TRIPLET–TRIPLET ANNIHILATION”. In: Izvestiya Vysshikh Uchebnykh Zavedenii, Fizika (1977). English version: Plenum Publishing Corporation, 0038-5697/78/2109-1120, 1979, pp. 12–17.
[369] M. Tayebjee, D. McCamey, and T. Schmidt. “Beyond Shockley-Queisser: Molecular approaches to high-efficiency photovoltaics”. In: J. Phys. Chem. Lett. 6 (2015), pp. 2367–2378.
[370] D. L. Dexter. “Two ideas on energy transfer phenomena: Ion-pair effects involving the OH stretching mode, and sensitization of photovoltaic cells”. In: J. Lumin. 18-19 (1979), pp. 779–784.
[371] P. Ramos et al. “Nonadiabatic Derivative Couplings through Multiple Franck–Condon Modes Dictate the Energy Gap Law for Near and Short-Wave Infrared Dye Molecules”. In: The Journal of Physical Chemistry Letters 15.7 (2024). PMID: 38329913, pp. 1802–1810.
[372] M. T. do Casal et al. “First-Principles Calculations of Excited-State Decay Rate Constants in Organic Fluorophores”. In: The Journal of Physical Chemistry A 127.48 (2023). PMID: 37988002, pp. 10033–10053.
Abstract
This doctoral thesis summarizes my contributions to computational chemistry, developed over five years of study and training at the University of Groningen (RUG) and the University of Southern California (USC). My research focuses on excited-state processes and biological systems. Chapter 2 presents a benchmark study of the level of transparency required when reporting the setup of hybrid QM/MM calculations. As a test case, it uses the inhibition of the SARS-CoV-2 main protease (MPro) by the anticancer drug carmofur, with the reaction modeled in two electronic structure packages, Q-Chem and NWChem. Chapter 3 builds on the knowledge gained in Chapter 2 for two purposes: to characterize three reaction mechanisms for the inhibition of MPro and to design two novel inhibitors derived from the chemical structure of the compound X77. Chapter 4 characterizes the dynamical and structural properties of the system formed by the azobenzene photoswitch embedded in the DNA G-quadruplex, using classical molecular dynamics (MD) simulations. Chapter 5 describes the evaluation of the spectral and spin properties of nickelocene-based molecular magnets adsorbed on a CuO metal layer, using the spin-flip variant of time-dependent density functional theory (SF-TD-DFT). In Chapter 6, I describe a machine learning methodology for representing the global reaction coordinate involved in the isomerization of retinal. In Chapter 7, I report a study of the optical properties of novel fluorescent dyes for RNA imaging. Chapter 8 explores possible mechanisms behind the production of singlet oxygen by the photoactive protein mini Singlet Oxygen Generator (miniSOG). The thesis concludes with Chapter 9, in which I discuss future research directions.
Asset Metadata
Creator
Giudetti, Goran (author)
Core Title
Simulations across scales: insights into biomolecular mechanisms, magnetic materials, and optical processes
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Chemistry
Degree Conferral Date
2025-05
Publication Date
01/22/2025
Defense Date
01/13/2025
Publisher
Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag
catalysis, computational chemistry, drug design, molecular modeling, OAI-PMH Harvest, spectroscopy, theoretical chemistry
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Krylov, Anna (committee chair), Fokin, Valery (committee member), Zhang, Chao (committee member)
Creator Email
giudetti@usc.edu, goran.giudetti@gmail.com
Unique identifier
UC11399FI0B
Identifier
etd-GiudettiGo-13772.pdf (filename)
Legacy Identifier
etd-GiudettiGo-13772
Document Type
Dissertation
Rights
Giudetti, Goran
Internet Media Type
application/pdf
Type
texts
Source
20250127-usctheses-batch-1237 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu