Algorithm development for modeling protein assemblies
ALGORITHM DEVELOPMENT FOR MODELING PROTEIN ASSEMBLIES

By Yiyu Li

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (MOLECULAR PHARMACOLOGY AND TOXICOLOGY)

May 2013

Copyright 2013 Yiyu Li

Acknowledgements

I would like to express my great appreciation to my supervisor Dr. Ian S. Haworth for his support, guidance and great help. Without him, this doctoral thesis would not have been possible. I would like to thank Dr. Ralf Langen for his guidance and his group for providing the EPR data and EM data for hIAPP fibrils. Also, thanks to Dr. Daryl L. Davies and his group for their work on hPEPT1 loop 6-7, and Dr. Huynh-Hoa Bui, Dr. Timothy K. Gallaher and Dr. Brian T. Sutch for their work on WATGEN. I sincerely appreciate Dr. Ralf Langen, Dr. Nouri Neamati, Dr. Enrique Cadenas, Dr. Julio A. Camarero and Dr. Curtis T. Okamoto for serving on my PhD committee. Thanks to the members of Dr. Ian Haworth’s laboratory, Ma’mon M. Hatmal, Brian T. Sutch and Huynh-Hoa Bui, for their help and friendship. I would also like to acknowledge University of Southern California Center for High-performance Computing and Communication for their support of the computation in this thesis work. Finally, I am deeply grateful to my family and my friends for their love, support, encouragement and company in my life.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1 Introduction
1.1. Background
1.1.1. Amyloid Fibrils
1.1.2. Approaches for Solving Amyloid Fibril Structures
1.1.3. The Human Islet Amyloid Polypeptide (hIAPP) Fibrils
1.2. Overview of the Chapters
Chapter 2 Computer Modeling of the Protofilament Structures of Human Islet Amyloid Polypeptide (hIAPP)
2.1. Introduction
2.2. Background
2.2. Methods
2.2.1. Model Building
2.2.2. Simulated Annealing Molecular Dynamics (SAMD)
2.3. Results
2.3.1. Structural Models Obtained without Pitch Constraints
2.3.2. Structural Models with Different Pitches
2.3.3. SAMD Results using Different Starting Modeling
2.3.4. Side-chain Directions of the hIAPP Protofilament Models
2.4. Discussion
Chapter 3 MFIBRIL: A Program for Modeling Amyloid Fibril Structures
3.1. Introduction
3.2. Background
3.3. Methods
3.3.1. The MFIBRIL Algorithm for Building Protofilaments/Fibrils
3.3.2. Geometrical Analysis of Idealized Protofilaments
3.3.2. Definition of Datasets from Simulations
3.4. Results
3.4.1. Construction of Protofilament Models with Varied Stagger, Pitch and Offset
3.4.2. Structures of Idealized Protofilaments
3.4.3. Variation of Stagger and Pitch
3.4.4. Variation of Offset
3.4.5. hIAPP Fibril Models
3.4.6. Other Amyloid Fibrils Built by MFIBRIL
3.4.7. The MFIBRIL Website
3.5. Discussion
Chapter 4 Modeling and Prediction of the Fibril Structure of Human Islet Amyloid Polypeptide (hIAPP)
4.1. Introduction
4.2. Background
4.3. Methods
4.3.1. Fibril Constructions by MFIBRIL
4.3.2. Molecular Dynamics Simulations
4.3.3. Experimental Data Prediction for the Fibril Models
4.4. Results
4.4.1. Simulation Results of the hIAPP Fibril Models
4.4.2. DEER Distance Prediction
4.4.3. Ring-to-ring Distance Prediction
4.4.4. Heterogeneity in Model_22
4.4.5. Models Derived from Model_22
4.5. Discussion
Chapter 5 Modeling of the Water Network at Protein-RNA Interfaces (WATGEN)
5.1. Introduction
5.2. Background
5.3. Methods
5.3.1. Summary of the WATGEN Algorithm
5.3.2. Protein-RNA Complexes
5.3.3. Algorithm Validation
5.3.4. Classification of Interfacial Water Molecules
5.4. Results
5.4.1. Algorithm Validation
5.4.2. Overview of the Water Networks
5.4.3. Classification of Water Molecules in the Network
5.4.4. Images of the Water Network
5.5. Discussion
Chapter 6 Modeling the Transmembrane Domains of the Human Dipeptide Transporter hPEPT1
6.1. Introduction
6.2. Background
6.3. Methods
6.3.1. Model Construction and Substrate Uptake Simulations
6.3.2. Secondary Structure Prediction
6.4. Results
6.4.1. Structure and Functional Role of Loop 6–7
6.4.2. Two Phase Simulations of hPEPT1 Transportation
6.4.3. Continuous Simulations Involved TMD 5 and 10
6.5. Discussion
References

List of Tables

Table 2.1. Constraints used in SAMD calculations
Table 2.2. The experimental and calculated DEER distances
Table 2.3. Structural data from SAMD with no pitch constraint
Table 2.4. Structural data from SAMD with a 250 Å pitch constraint
Table 2.5. Structural data from SAMD with a 500 Å pitch constraint
Table 2.6. Structural data from SAMD with a 1000 Å pitch constraint
Table 2.7. Mean geometrical data from SAMD calculations
Table 2.8. Side-chain directions in different sets of the hIAPP protofilaments
Table 2.9. Side-chain directions in the inter β-strand region
Table 3.1. Side-chain directions in consensus and a representative peptide
Table 3.2. Geometrical analysis of the hIAPP protofilaments built by MFIBRIL
Table 3.3. Minimized energies for the protofilament models
Table 4.1. Average energies of the fibril structures over the last 0.5 ns
Table 5.1. Classification of water molecules at 224 protein-RNA interfaces
Table 5.2. Parameter estimates from the predictive model
Table 5.3. Types of water molecules at 224 protein-RNA interfaces
Table 6.1. The first 30 simulation models constructed for phase 1
Table 6.2. The first 30 simulation models constructed for phase 2

List of Figures

Figure 1.1. A schematic illustration of a single peptide in hIAPP fibril
Figure 2.1. The “facing fragments” starting structure
Figure 2.2. The “broken strand” starting structure
Figure 2.3. A model of hIAPP protofilament structure without pitch constraints
Figure 2.4. Models of hIAPP protofilaments with various pitches
Figure 2.5. Actual pitches of the protofilament structures
Figure 2.6. The sheet-sheet distances in the hIAPP protofilament structures
Figure 2.7. Box plot of the 13-35 distance in SAMD calculations
Figure 2.8. A model of hIAPP protofilament with 500 Å-pitch constraints
Figure 2.9. The sticky end of a hIAPP protofilament model
Figure 3.1. A flowchart showing the three steps in the MFIBRIL program
Figure 3.2. Definitions in the MFIBRIL program
Figure 3.3. Two interaction modes in the fibril modeling of MFIBRIL
Figure 3.4. A representative peptide and the protofilament models
Figure 3.5. Procheck analysis of the protofilament models
Figure 3.6. Procheck analysis of protofilament models after minimization
Figure 3.7. Dependence of the sheet-sheet distance on pitch and stagger
Figure 3.8. Protofilament models built in MFIBRIL with different offsets
Figure 3.9. Geometries of the protofilament models built with different offsets
Figure 3.10. Two hIAPP fibril models built in MFIBRIL
Figure 3.11. Modeling of a ribbon-like hIAPP fibril using MFIBRIL
Figure 3.12. Modeling of a twisted Aβ fibril using MFIBRIL
Figure 3.13. User interface and a demonstration of the MFIBRIL website
Figure 4.1. Starting structures of the four basic models
Figure 4.2. Two peptides in the same layer in starting structures
Figure 4.3. Backbone RMSDs of the four basic fibril models
Figure 4.4. Fibril structures after 10 ns simulations
Figure 4.5. Middle peptide layer in the fibril structures after 10 ns simulations
Figure 4.6. Boxplot of the Diff_pred-exp for the structures obtained every 1 ns
Figure 4.7. Histograms of Diff_pred-exp aggregated over the last 4 ns
Figure 4.8. Ring-to-ring distances of Y37-F15 for the four basic models
Figure 4.9. Ring-to-ring distances of Y37-F23 for the four basic models
Figure 4.10. Typical peptide layers in the 1st and 2nd fibril half in Model_22
Figure 4.11. Experimental and predicted mobility for modified Model_22
Figure 4.12. Ring-to-ring distances of Y37-F15 and Y37-F23 for Model_22
Figure 4.13. Individual peptides in three modified structures of Model_22
Figure 5.1. Classification of the water network at a protein-RNA interface
Figure 5.2. Numbers of experimental water molecules within a specific distance (0.5 to 3.0 Å) from predicted water sites
Figure 5.3. Boxplots of O-O distances between experimental water molecule sites and those in a WATGEN-predicted or a random network
Figure 5.4. Plot of the number of WATGEN-predicted water molecules vs. the predicted value using the regression model for interface size
Figure 5.5. WATGEN solvation of the structure 1ULL_B_A
Figure 6.1. Secondary structure predictions for loop 6-7 in hPEPT1
Figure 6.2. A model of loop 6-7 and its functional role for translocation
Figure 6.3. A hPEPT1 transportation model involved TMD 5 and TMD 10
Figure 6.4. Starting model and final model for the phase 1 simulation
Figure 6.5. Changes of interaction energy E in phase 1 simulation
Figure 6.6. Starting model and final model for the phase 2 simulation
Figure 6.7. Changes of interaction energy E in the phase 2 simulation
Figure 6.8. Changes of interaction energy E in continuous simulations

Abstract

Amyloid fibrils are associated with over 40 human diseases. For example, fibrils of human islet amyloid polypeptide (hIAPP), α-synuclein, and amyloid-β are pathological hallmarks of type 2 diabetes, Parkinson’s disease and Alzheimer’s disease, respectively. Understanding the structure of fibrils and of the protofilaments that constitute them will help to reveal the mechanism of fibril formation in human diseases and to facilitate therapeutic intervention. However, the fibril structures of full-length amyloid proteins/peptides are difficult to determine by direct experimental approaches since they are neither soluble nor crystallizable. The goals of this thesis are to develop algorithms, programs and protocols for the modeling and determination of amyloid fibril structures based on experimental data mainly obtained from electron paramagnetic resonance (EPR) spectroscopy and electron microscopy (EM), and then to use the resulting models for explaining previously unclear aspects of the experimental data and for aiding experimental design to obtain more structural details of the amyloid fibrils. In the thesis work, a simulation protocol for modeling amyloid protofilaments mainly based on EPR data was developed. This protocol was demonstrated to be able to produce structures of hIAPP protofilaments consistent with the experimental data.
Then, to investigate the properties of the protofilament structures obtained and to generate fibril models consisting of more than one protofilament, a new program, MFIBRIL, was developed for flexible construction of fibrils from a monomeric unit. Several potential models of the hIAPP fibril were constructed using MFIBRIL and then refined by equilibration using molecular dynamics simulations in the NAMD software package. The refined models were evaluated by predicting the ring-to-ring distances using the MFIBRIL analysis tool and the inter-spin label distances and residue mobilities using PRONOX (another program developed in our laboratory), and by comparing the predictions to the experimental data. This work led to identification of a favorable hIAPP fibril model consisting of two protofilaments with strong hydrophobic interactions between I26 and V32 surrounding hydrophilic interactions between S28 and T30 in the second β-strand (C-terminal) regions. The programs and protocols developed in this work are applicable to structural determination of other amyloid fibrils.

In addition to the modeling of amyloid fibrils, two other projects are described in the thesis. The first extends the solvation function of WATGEN (a program developed in our laboratory for modeling the water network at protein-peptide interfaces) to protein-RNA interfaces and investigates the features of solvated protein-RNA interfaces. The second project involves modeling of the transport mechanism of the human dipeptide transporter (hPEPT1).

Chapter 1 Introduction

1.1. Background

1.1.1. Amyloid Fibrils

Amyloid is a term used to describe extracellular deposits and intracellular amyloid-like inclusions. Amyloid fibrils are highly organized fibrillar aggregates formed by misfolded proteins (Chiti and Dobson, 2006), and proteins that have the intrinsic ability to form insoluble fibrils are called amyloid proteins (Margittai and Langen, 2008).
Protofilaments are the units that constitute amyloid fibrils (Chiti and Dobson, 2006). Amyloid diseases are diseases characterized by amyloid deposits. Amyloid fibrils have been associated with more than 40 human diseases, including neurodegenerative diseases such as Alzheimer’s disease and Parkinson’s disease, non-neuropathic localized diseases such as type II diabetes, and non-neuropathic systemic conditions such as AL amyloidosis (Chiti and Dobson, 2006). Amyloid fibrils are pathological hallmarks of amyloid diseases (Margittai and Langen, 2008), but the real pathogenic species (amyloid fibrils or prefibrillar aggregates) remains uncertain. Recent findings tend to support the view that prefibrillar aggregates are more cytotoxic than mature amyloid fibrils (Chiti and Dobson, 2006; Meier et al., 2006; Zraika et al., 2010). However, because these aggregates are dynamic and highly polymorphic and little information is available on them, it is difficult to study their structures and mechanism of formation. In contrast, information on amyloid fibrils is relatively rich. Determination of the detailed structure of amyloid protofilaments and fibrils is important for understanding the mechanism of fibril formation, facilitating the study of prefibrillar aggregates, and developing therapeutic approaches to inhibit formation of amyloid protofilaments/fibrils and detect the misfolding of amyloid proteins.

1.1.2. Approaches for Solving Amyloid Fibril Structures

Due to their insolubility, the structure of amyloid fibrils is difficult to solve by approaches such as crystallography and nuclear magnetic resonance (NMR) spectroscopy.
Indirect structural information can be obtained from EM and atomic force microscopy (AFM) to study the morphology and assembly of amyloid fibrils (Goldsbury et al., 1997; Green et al., 2004); Fourier transform infrared (FTIR) spectroscopy and circular dichroism (CD) to investigate the secondary structure (Higham et al., 2000; Kayed et al., 1999); X-ray fiber diffraction to study the arrangement of β-strands in amyloid fibrils (Sumner Makin and Serpell, 2004); and mutagenesis to study critical sites for fibril formation (Fox et al., 2010; Qin et al., 2007). Solid-state NMR (SSNMR) and EPR using double electron-electron resonance (DEER) are two important approaches to study the structural details of amyloid fibrils (Chiti and Dobson, 2006; Margittai and Langen, 2008) through measurement of intramolecular distances. Continuous wave EPR (CW-EPR) also allows measurement of residue mobility. However, these approaches do not give a complete fibril structure, and computational modeling combined with the experimental data is needed to generate a comprehensive structure. A combination of SSNMR and modeling has been applied successfully for structure determination of HET-s(218-289) prion fibrils (Wasmer et al., 2008) and ribbon-like hIAPP fibrils (Luca et al., 2007), while EPR and computational modeling have been applied for structure determination of membrane-bound (non-fibrillar) α-synuclein (Jao et al., 2008). SSNMR or DEER distances can be used as distance constraints in the modeling, with further optimization by including other experimental data such as that obtained from EM and CW-EPR.

1.1.3. The Human Islet Amyloid Polypeptide (hIAPP) Fibrils

hIAPP is a 37-residue peptide that is secreted with insulin by pancreatic β-cells. Amyloid fibril formation by hIAPP is relevant to type II diabetes. Figure 1.1 shows the sequence of hIAPP, the important regions for fibril formation and the generally accepted shape of the single peptide in the hIAPP fibril.
Two major forms of hIAPP fibrils have been observed by EM: ribbon-like fibrils and left-handed twisted fibrils (Goldsbury et al., 1997). The focus of the work in this thesis is on the twisted fibrils. EM shows that a twisted hIAPP fibril consists of 1-4 protofilaments with a 250 or 500 Å pitch (Goldsbury et al., 1997) (i.e. the pitch of a protofilament could range from 250 to 2000 Å). Twisted fibrils formed by two protofilaments are observed most frequently. FTIR and CD show that hIAPP fibrils contain a significant amount of β-sheet structure (Higham et al., 2000; Kayed et al., 1999). X-ray and electron diffraction data indicate that the β-strands in hIAPP are perpendicular to the fibril axis (cross-β structure) and that the hydrogen bonding spacing between successive β-strands is about 4.7 Å. A diffuse 9.5-Å reflection is also observed in the X-ray diffraction data, which may be due to an inter-β-sheet spacing (Sumner Makin and Serpell, 2004). EPR indicates that the β-strands are in a parallel and in-register arrangement (in-register: the same residues in different molecules stack on top of each other) and that the N-terminal region of the fibril is slightly less ordered (Jayasinghe and Langen, 2004).

Figure 1.1. A schematic illustration of a single peptide in hIAPP fibril. Most recent studies show that the hIAPP peptide in the fibril adopts a horseshoe shape with two β-strands connected by a loop region. The two pairs of double arrows show the two β-strand regions identified by EPR data (Bedrood et al., 2012) (red double arrows) and NMR data (Luca et al., 2007) (blue double arrows). The region with green outlines shows the hIAPP amyloidogenic region. The six residue numbers in blue indicate the positions that differ from the nonamyloidogenic rat IAPP. The residues are filled in different colors to show non-aromatic hydrophobic residues (pink), aromatic residues (violet), non-aromatic polar residues (cyan), positive residues (blue) and cysteine (yellow).
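
The pitch values and the 4.7 Å strand spacing quoted above can be related by simple arithmetic. The following is a minimal sketch, not part of the thesis software: it assumes that "pitch" means the axial length of one full 360° turn of a protofilament and that each peptide layer rises 4.7 Å along the fibril axis. The function name `twist_per_layer` is hypothetical.

```python
# Sketch only: relates protofilament pitch to the twist between successive
# peptide layers. Assumes pitch = axial length of one full 360-degree turn
# and a 4.7 Å axial rise per layer (the inter-strand spacing quoted from
# the diffraction data). The function name is illustrative.

RISE_PER_LAYER = 4.7  # Å between successive β-strands along the fibril axis


def twist_per_layer(pitch, rise=RISE_PER_LAYER):
    """Return (layers per turn, twist in degrees per layer) for a pitch in Å."""
    if pitch <= 0 or rise <= 0:
        raise ValueError("pitch and rise must be positive")
    layers_per_turn = pitch / rise
    return layers_per_turn, 360.0 / layers_per_turn


if __name__ == "__main__":
    # Pitches spanning the 250 to 2000 Å range mentioned in the text.
    for pitch in (250.0, 500.0, 1000.0, 2000.0):
        layers, twist = twist_per_layer(pitch)
        print(f"pitch {pitch:6.0f} Å: {layers:6.1f} layers/turn, "
              f"{twist:5.2f} degrees/layer")
```

Under these assumptions, even the shortest pitch corresponds to a twist of only a few degrees between neighboring layers, which is why the strands can remain close to perpendicular to the fibril axis.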
Early models proposed for hIAPP fibrils contained three β-strands in a hIAPP monomer (Jaikaran and Clark, 2001; Kajava et al., 2005), but most recent models have a horseshoe-shaped monomer formed by only two β-strands (Bedrood et al., 2012; Luca et al., 2007; Wiltzius et al., 2008). Luca et al. (2007) determined a model for ribbon-like hIAPP fibrils using scanning transmission EM, SSNMR and modeling, in which each peptide has a horseshoe shape formed by a β-strand (residues 8 to 16), a connection region (17 to 27), and a second β-strand (28 to 37). Molecular dynamics (MD) simulations based on this model indicated potential formation of fibrils containing two or three protofilaments with different organizations and interaction interfaces (Zhao et al., 2011a, b). Further important structural information has been derived from X-ray structures of several hIAPP fragments, including NFLVHSS (residues 14-20), NNFGAIL (21-27) and SSTNVG (28-33) (Wiltzius et al., 2008). Hamiltonian-temperature replica exchange MD simulations indicated β-strands for residues 17-26 and 30-35 in a hIAPP dimer (Laghaei et al., 2011). Another study using NMR, atomic force microscopy and mass spectrometry revealed that IAPP molecules associate into transient low-order oligomers via interactions between H18 and Y37, and simulations based on this finding rendered a 2-fold (2-protofilament) model (Wei et al., 2011) similar to that of Luca et al. (2007). A 3-fold model was proposed from two-dimensional infrared spectroscopy and MD simulations (Wang et al., 2011). hIAPP fragments of residues 20-29 (Westermark et al., 1990), 30-37 (Nilsson and Raleigh, 1999) and 8-20 (Jaikaran et al., 2001) are sufficient to form fibrils in vitro, where 20-29 is the amyloidogenic region, and FLVHS (15-19) and NFLVH (14-18) are the two shortest fragments that have been found to self-assemble (Mazor et al., 2002).
A screening study of hIAPP mutants also revealed the importance of regions 20-29 and 12-17 in fibril formation, with most of the 28 mutations that reduced amyloidogenesis located in these two regions (Fox et al., 2010). Residue 20 is important for fibrillization, since an S20G mutation increased in vitro amyloidogenicity and an S20K mutation greatly delayed fibrillization (Cao et al., 2012). The aromatic residues in hIAPP also play an important role in fibrillization. Substituting one or all the aromatic residues with Leu and Ala decreased the rate of fibril formation (Marek et al., 2007; Tracz et al., 2004), and fluorescence resonance energy transfer has shown that Y37 interacts with at least one Phe residue during fibrillization (Padrick and Miranker, 2001). The effects of the aromatic residues in fibril formation are likely to be due to their hydrophobicity and planarity (Doran et al., 2012). The nonamyloidogenic rat IAPP differs from hIAPP at only six residues at positions 18, 23, 25, 26, 28 and 29, including three key proline residues (P25, P28, P29) that play a major role in preventing fibril formation (Green et al., 2003). The single hIAPP mutant I26P is also nonamyloidogenic (Fox et al., 2010).

1.2. Overview of the Chapters

This thesis describes the development of algorithms, programs and simulation protocols for modeling biomolecular structures, with the main focus on amyloid fibrils. The programs and protocols developed for modeling amyloid fibril structures are described in Chapters 2, 3 and 4, with illustration of their use for hIAPP fibrils. This work includes development of a new program, MFIBRIL, for flexible construction of fibrils from a monomeric unit. Most importantly, these chapters illustrate the potential to use computation to derive structures based on experimental data, and then use these structures with further computation to explain previously unclear aspects of the experimental data.
This circular and integrated approach is a key element in the thesis.

Chapter 2 describes a protocol of simulated annealing molecular dynamics (SAMD) constrained by EPR data for solving the structures of hIAPP protofilaments (the units that constitute a fibril). Details of the resulting protofilament structures and the potential application of this simulation protocol for solving the structures of other amyloid protofilaments are discussed.

Chapter 3 describes the algorithm, functionalities and applications of a newly developed program, MFIBRIL, for modeling amyloid fibril structures based on protofilament structures and for performing geometric analysis. MFIBRIL can select a representative peptide from a set of peptides obtained, for example, from the protofilaments produced by a SAMD protocol. Based on a peptide, MFIBRIL can build idealized structures of protofilaments and fibrils with different geometric properties such as pitch, stagger, spacing, helical axis and interaction interfaces.

Chapter 4 demonstrates the use of MFIBRIL combined with NAMD (Phillips et al., 2005) and another program developed in our laboratory, PRONOX (Hatmal et al., 2012), to investigate hIAPP fibril structures. MFIBRIL was used to build fibril models with different potential interaction interfaces, and then equilibration using NAMD was performed to refine the models. Inter-label (DEER) distances, residue mobilities and ring-to-ring distances in the models were predicted using PRONOX and the MFIBRIL analysis tool and then compared to the experimental results. Based on the model energies and the predicted and experimental DEER distances, favorable hIAPP fibril models were selected.

Chapters 5 and 6 describe other algorithm development and applications not associated with fibrils. Chapter 5 describes work with WATGEN, a program developed in our laboratory for modeling the water network at a protein-peptide interface (Bui et al., 2007).
The chapter describes additions to WATGEN that extend the solvation function to protein-RNA interfaces and investigates the features of solvated protein-RNA interfaces. Chapter 6 describes modeling work on the human dipeptide transporter (hPEPT1). Several preliminary transport models constituted by the core transmembrane domains (TMDs) of hPEPT1 were built and tested using substrate uptake simulations in an in-house program, TMD 2.0, and the structure and functional role of the loop connecting TMD 6 and TMD 7 (loop 6-7) were also investigated.

Chapter 2 Computer Modeling of the Protofilament Structures of Human Islet Amyloid Polypeptide (hIAPP)

2.1. Introduction

This chapter, together with the work published in Bedrood et al. (2012), describes a protocol of simulated annealing molecular dynamics (SAMD) constrained by electron paramagnetic resonance (EPR) data for solving the structures of hIAPP protofilaments, and the details of the resulting structures. The potential application of this simulation protocol for solving the structures of other amyloid protofilaments is also discussed. This work was performed in collaboration with the laboratory of Dr. Ralf Langen, whose group provided the EPR and EM data.

2.2. Background

Alzheimer’s disease, Parkinson’s disease, type 2 diabetes mellitus and spongiform encephalopathies are human diseases that involve misfolding and deposition of proteins (Stefani, 2004). In type 2 diabetes, amyloid deposits found in the pancreata of 95% of patients are primarily composed of the 37-residue human islet amyloid polypeptide (hIAPP) (Clark et al., 1987). hIAPP is co-secreted with insulin by β-cells in the pancreatic islets of Langerhans (Clark et al., 1995; Westermark et al., 1987a; Westermark et al., 1987b), and is thought to play a role in carbohydrate metabolism (Ahren et al., 1998; Gebre-Medhin et al., 1998; Kahn et al., 1990) and satiety (Young, 2005). hIAPP is monomeric in its physiological state, but aggregated in the disease state.
Evidence from cell and animal studies indicates a link between hIAPP misfolding and pancreatic β-cell dysfunction (Butler et al., 2004; Cooper et al., 1987; Janson et al., 1996; Kahn et al., 1999). In disease, hIAPP undergoes a multi-step misfolding process in which the monomer changes into various oligomeric forms and ultimately forms fibrils constituted by 2 or more protofilaments. A detailed molecular and mechanistic understanding of this process could facilitate therapeutic intervention. However, the 3-dimensional structure of hIAPP fibrils or other misfolded forms remains elusive. Fourier transform infrared spectroscopy (FT-IR) and circular dichroism show that the fibrils contain β-sheet structure (Higham et al., 2000; Radovan et al., 2008), and X-ray and electron diffraction indicate that the cross-β strands are 4.7 Å apart and perpendicular to the fibril axis, and that the distance between the β-sheets is 8-11 Å (Sumner Makin and Serpell, 2004). Site-directed spin labeling (SDSL) and electron paramagnetic resonance (EPR) of hIAPP fibrils show that these strands are arranged in a parallel, in-register structure (Jayasinghe and Langen, 2004). According to transmission electron microscopy (EM) and atomic force microscopy, hIAPP fibrils can have striated ribbon and twisted fibril morphologies (Goldsbury et al., 1997). Luca et al. used solid-state NMR and electron microscopy to propose a model with residues 18-27 of hIAPP as a bend between two β-strands (Luca et al., 2007). A related, albeit slightly different, model has been suggested based upon the X-ray crystal structures of hIAPP fragments (Wiltzius et al., 2008). Additionally, a number of theoretical models have been proposed for hIAPP fibrils, with one such model suggesting that hIAPP adopts a planar S-shaped fold with three β-strands (residues 8-16, 20-27 and 30-37) stacked in register (Kajava et al., 2005).
To obtain more detailed structural insight into the architecture of hIAPP fibrils, we adapted an EPR-based computational approach previously developed to determine the structure of membrane-bound α-synuclein (Jao et al., 2008). The following text describes the details of the computational protocol for the determination of the hIAPP protofilament structures and the resulting models.

2.2. Methods

2.2.1. Model Building

Computational refinement of the structure of the hIAPP fibril was performed using distances from DEER experiments (covering residues 12-36 of hIAPP) and data from EM and CW-EPR. The model included 101 copies of residues 12-36 of the 37-amino acid hIAPP peptide. The initial model (“facing fragments” starting model) was constructed as shown in Figure 2.1. Each peptide was broken into two fragments (residues 12-26 and 27-36). The two fragments were each constructed as β-strands perpendicular to the fibril axis, 30 Å apart, with a 5 Å spacing between adjacent strands (Figure 2.1A,B). Seventeen spin labels were added to every fourth peptide at residues 12, 13, 14, 15, 16, 17, 18, 19, 24, 27, 28, 29, 30, 31, 32, 35 and 36, beginning from the first peptide (Figure 2.1C). This starting model was based on CD and FT-IR data showing that hIAPP fibrils consist of β-sheets (Luca et al., 2007; McHaourab et al., 1996); experimental data from fiber diffraction suggesting that the β-strands are perpendicular to the fibril axis and that each β-strand is about 5 Å from the neighboring strand (Sumner Makin and Serpell, 2004); and SDSL data suggesting that the β-strands are parallel and in-register (Jayasinghe and Langen, 2004). Label frequencies from 1 to 10 were applied to the starting models to select the optimal label frequency for the SAMD calculations. Another starting model with a different β-strand arrangement (“broken strand” starting model), shown in Figure 2.2, was also tested.
The starting structures were generated by the in-house programs FIBRIL and PRONOX (Hatmal et al., 2011), which were used to produce input files for AMBER10 (Case et al., 2005).

Figure 2.1. The “facing fragments” starting structure with every fourth peptide labeled. This starting model is used in most of our SAMD calculations for hIAPP protofilaments. (A) A model with 101 peptides constructed as two fragments (residues 12-26 and 27-36 of hIAPP) in β-strand conformations and orthogonal to the fibril axis. Every fourth peptide includes spin labels, starting from peptide 1. (B) A view of peptide 33 (a spin-labeled peptide) in the same orientation as that shown in A. (C) The same peptide following rotation through 90°. The fibril axis is now going into the page. The 17 individual spin labels are at positions 12, 13, 14, 15, 16, 17, 18, 19 and 24 (fragment 1) and 27, 28, 29, 30, 31, 32, 35 and 36 (fragment 2).

Figure 2.2. The “broken strand” starting structure with every second peptide labeled. (A) A model with 101 peptides (residues 12-36 of hIAPP with a break between residues 26 and 27) constructed in β-strand conformations and orthogonal to the fibril axis. Every second peptide includes spin labels, starting from peptide 1. (B) A view of peptide 6 (a spin-labeled peptide) in the same orientation as that shown in A. (C) The same peptide following rotation through 90°. The fibril axis is now going into the page. The 17 individual spin labels are at positions 12, 13, 14, 15, 16, 17, 18, 19, 24, 27, 28, 29, 30, 31, 32, 35 and 36. A break is placed between residues 26 and 27.

2.2.2. Simulated Annealing Molecular Dynamics (SAMD)

SAMD calculations were performed in AMBER10 (Case et al., 2005). After a brief minimization (3000 steps) of the starting structure, 10 cycles of SAMD were performed for 160 ps/cycle.
Each SAMD cycle included a heating phase from 0 K to 1200 K in 4 ps; maintenance of the temperature at 1200 K for a further 6 ps (during this first 10 ps, the force constants for the constraints were increased from 0.1 to 10.0); and then cooling to 0 K over 150 ps, with stepwise adjustments of the TAUTP parameter. SAMD was performed with a time step of 0.001 ps, a distance-dependent dielectric of 4, and a cut-off of 10.0 Å. The two fragments of each peptide were free to move in the SAMD calculations, under the constraints described below. Full details of the parameters and input files used for these calculations will be provided on request.

Table 2.1. Constraints used in SAMD calculations.

Geometry Element | Structural Element in Each Peptide | Value¹ | r1² | r2² | r3² | r4²
Inter-label distance | Label pairs in Table 2.2 for labeled peptides | Exptl. N (Å)³ | N-R-1³ | N-R³ | N+R³ | N+R+1³
β-sheet H-bonds | Residues 12-18, 31-36 | 2.15 Å | 1.3 Å | 1.8 Å | 2.5 Å | 3.0 Å
Peptide bond breaks | Bonds connecting residues 26-27 | 1.5 Å | 1.4 Å | 1.5 Å | 3.0 Å | 5.0 Å
Fibril linearity⁴ | Peptides 1-101 | 475 Å | 468 Å | 469 Å | 480 Å | 481 Å
Pitch constraint⁵ | Peptides with 1,11 relationship | 324° | 308.6° | 317.6° | 328.7° | 332.3°
Omega torsion | All peptide bonds | 180° | 177.0° | 178.0° | 182.0° | 183.0°

¹ The value is an experimental or idealized value for the particular geometry element.
² AMBER10 input values for simulated annealing. In addition, we used constraint strengths of rk2=rk3=10 for pitch constraints ≤ 2000 Å and rk2=rk3=1000 for pitch constraints > 2000 Å.
³ N and R are the experimental DEER distance and the Range shown in column 2 and column 5 of Table 2.2, respectively.
⁴ The distance constraint was applied between the Cα atoms of amino acid (aa) 14 in peptide 1 and aa14 in peptide 101, and between the Cα atoms of aa34 in peptide 1 and aa34 in peptide 101.
⁵ Constraints between peptide 1-peptide 11, peptide 2-peptide 12, … peptide 91-peptide 101 in the fibril.
The numbers shown are those for a pitch constraint of 500 Å. In general, for a pitch constraint of P Å, the constraint value, r1, r2, r3 and r4 are given by [360 - ((10 × 360)/x) × 5]°, where x = P, P*(1-0.3), P*(1-0.15), P*(1+0.15) and P*(1+0.3), respectively. This gives a constraint value of 324° (as entered in AMBER) or -36° (reflecting a left-handed twist) for a pitch constraint of 500 Å. Each torsional angle constraint was defined using four backbone atoms: for β-strand 1, Cα of aa13 and aa17 in peptides i and i+10; and for β-strand 2, Cα of aa31 and aa35 in peptides i and i+10. Input files for all constraints were generated with in-house code.

Table 2.2. DEER distances determined experimentally, inter-label distances measured in the model, and ranges used in SAMD calculations.

DEER No | Label Pairs | DEER Distance (Å)¹ | Model Distance 1, mean ± SD (Å)³ | Model Distance 2, mean ± SD (Å)⁴ | Range (± Å)²
1 | 12-18 | 22 | 21.6 ± 1.3 | 21.6 ± 1.3 | 2
2 | 13-18 | 21 | 20.3 ± 1.2 | 20.2 ± 1.2 | 2
3 | 13-19 | 21 | 20.0 ± 1.2 | 19.8 ± 1.1 | 2
4 | 13-24 | 32 | 32.5 ± 3.8 | 33.0 ± 3.7 | 8
5 | 13-28 | 35 | 32.2 ± 1.5 | 32.4 ± 1.8 | 4
6 | 13-32 | 27 | 22.2 ± 2.5 | 22.1 ± 2.4 | 8
7 | 13-35 | 26 | 26.7 ± 2.1 | 26.0 ± 2.1 | 4
8 | 13-36 | 26 | 24.3 ± 2.2 | 24.4 ± 2.4 | 4
9 | 14-19 | 19 | 19.2 ± 1.3 | 19.3 ± 1.3 | 2
10 | 14-32 | 27 | 26.8 ± 1.3 | 26.5 ± 1.2 | 2
11 | 15-29 | 24 | 20.3 ± 3.1 | 20.7 ± 3.4 | 8
12 | 16-27 | 35 | 30.7 ± 3.6 | 31.2 ± 4.1 | 8
13 | 17-29 | 25 | 18.9 ± 2.1 | 19.9 ± 3.1 | 8
14 | 17-35 | 33 | 29.5 ± 0.9 | 29.5 ± 1.0 | 4
15 | 19-24 | 20 | 18.6 ± 0.9 | 18.5 ± 0.9 | 2
16 | 24-30 | 23 | 21.5 ± 2.3 | 22.1 ± 2.5 | 4
17 | 24-31 | 23 | 22.6 ± 2.0 | 22.6 ± 2.2 | 4
18 | 27-35 | 27 | 26.6 ± 1.3 | 26.7 ± 1.3 | 2

¹ Experimental data obtained by Sahar Bedrood and Balachandra G. Hegde in the laboratory of Dr. Ralf Langen.
² Range used in SAMD calculations.
³ Mean inter-label distances in 10 structures obtained by SAMD calculations with no pitch constraint.
⁴ Mean inter-label distances in 10 structures obtained by SAMD calculations with a 500 Å pitch constraint.

The constraints used in the calculations are shown in Table 2.1.
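As a quick numerical check of the pitch-constraint formula in footnote 5 of Table 2.1, the following Python sketch evaluates the torsional constraint value and its four bounds for a given pitch. The function and argument names are illustrative and are not taken from the in-house code. The upper bound r4 is assumed here to use P × (1 + 0.3), which is the choice that reproduces the tabulated 332.3°.

```python
def pitch_constraint_angles(pitch, spacing=5.0, step=10):
    """Torsional constraint angles (degrees) for a target helical pitch (in Å).

    Assumes the ~5 Å inter-peptide spacing of the starting model and
    constraints between peptides i and i+step (step = 10 in Table 2.1).
    """
    def angle(x):
        # Twist accumulated over `step` peptides for a pitch of x Å,
        # written as 360 minus the twist (the form entered in AMBER).
        return 360.0 - (step * 360.0 / x) * spacing

    return {
        "value": angle(pitch),
        "r1": angle(pitch * (1 - 0.30)),
        "r2": angle(pitch * (1 - 0.15)),
        "r3": angle(pitch * (1 + 0.15)),
        "r4": angle(pitch * (1 + 0.30)),  # assumed upper bound, see lead-in
    }


if __name__ == "__main__":
    for name, degrees in pitch_constraint_angles(500.0).items():
        print(f"{name}: {degrees:.1f}")
```

For a 500 Å pitch this gives 324.0°, 308.6°, 317.6°, 328.7° and 332.3°, which matches the pitch-constraint row of Table 2.1.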
DEER distances were implemented with ranges of 2, 4 or 8 Å (Table 2.2) to reflect the width of the experimental distribution qualitatively. The “missing” peptide bond between residues 26 and 27 in each peptide was constrained to distances < 5 Å. The distance from the first to the last peptide was limited to 468-481 Å to prevent the fibril from bending.

Several sets of calculations were performed with constraints on the helical pitch of the fibril, based on the data from EM, with the goal of producing structures with pitches ranging from 200 to 4000 Å. The pitch constraint was implemented by assuming a ~5 Å inter-peptide distance (as defined in the starting structure) and constraining the torsional angles between the β-strands of peptides i and i+10 in the fibril (Table 2.1). Thus, for example, a pitch of 500 Å is equivalent to a full turn of the helix over 100 peptides; therefore, the required torsional angle is -36° (for a left-handed helix) between the β-strands of peptides i and i+10. The equation used to generate the torsional constraints for a given pitch and details of the implementation of the constraints are given in a footnote to Table 2.1. SAMD calculations were also performed with no constraint on the pitch.

Structures were collected at the end of each SAMD cycle, giving a total of 10 structures. Each of these was energy minimized (500 steps) to connect the “missing” peptide bonds and replace the spin labels with the original amino acids. Structural analysis was performed using AMBER10 (Case et al., 2005), and PROCHECK (Laskowski et al., 1993a) was used to check the integrity of the structure. The chirality and pitch of the fibril helix, the distance between the two β-sheets, and the stagger of each peptide were analyzed using the in-house program FIBRIL. Distances between label pairs in the model structures were calculated and compared to the experimental DEER distances.

2.3. Results

2.3.1.
Structural Models Obtained without Pitch Constraints

Continuous-wave EPR spectroscopy (Bedrood et al., 2012) has shown that the N-terminal region of hIAPP is disordered, while residues 11 to 36 have generally low mobility values, and that residues 14-19 and 31-36 form two β-strand regions, as indicated by the periodicity of the residue mobilities (Figure 2 in Bedrood et al., 2012). The four-pulse DEER distances (Table 2.2, column 2) suggested that residues 12 and 13 should be included in the first β-strand region (Bedrood et al., 2012). These results are consistent with the findings from solid-state NMR (Luca et al., 2007) and fragment crystallography (Wiltzius et al., 2008).

Based on the continuous-wave EPR data (Figure 2 in Bedrood et al., 2012) and the DEER distances (Table 2.2, column 2), we performed constrained SAMD calculations to generate structural models of hIAPP protofilaments. To avoid interference from the disordered residues, we included only residues 12-36 in the models. The calculations included a stack of 101 peptides with every fourth peptide spin labeled (Figure 2.1), and a family of 10 structural models was generated (Table 2.3). Remarkably, all of these models spontaneously exhibited left-handed helical structures that resemble the overall twisted morphology observed by EM (Figure 1 in Bedrood et al., 2012). The pitch of the helix was 237.0 ± 13.5 Å, which corresponds to a complete helical turn over about 50 peptides. This pitch is at the lower end of the range of pitches seen by EM. The distance between the β-sheets (i.e. the sheet-to-sheet distance) is 9.4 ± 0.6 Å, which is consistent with the findings from X-ray diffraction and electron diffraction (Sumner Makin and Serpell, 2004).

Table 2.3.
Structural data from SAMD calculations with no pitch constraint (ID: 4_nc)

Cycle | Pitch (Å) | Radius (Å) | Sheet-Sheet (Å) | Stagger (Peptides)
1 | 240.0 | 22.8 | 10.6 | 4
2 | 239.8 | 25.9 | 9.9 | 4
3 | 249.0 | 23.3 | 8.8 | 4
4 | 245.6 | 25.0 | 9.2 | 4
5 | 233.5 | 23.4 | 9.0 | 4
6 | 226.2 | 28.4 | 8.9 | 4
7 | 251.2 | 24.8 | 9.3 | 4
8 | 221.1 | 26.4 | 8.8 | 4
9 | 251.8 | 25.0 | 9.9 | 4
10 | 212.4 | 25.9 | 9.1 | 4
Average | 237.0 | 25.1 | 9.4 | 4.0
SD | 13.5 | 1.7 | 0.6 | 0.0

One of the 10 structural models is shown in Figure 2.3. The peptide conformation has a horseshoe shape (Figure 2.3 A), but with the important distinction that one β-strand is displaced with respect to the plane of the second β-strand (Figure 2.3 B). This displacement can be seen by observing the location of a single peptide in the fibril (Figure 2.3 C,D), in which the two β-strands within each peptide have a stagger of about 15-18 Å, causing one β-strand in peptide i to be adjacent to the opposite β-strand in peptide i+3 or i+4. The resulting helical twist produced by this arrangement of peptides is shown in Figure 2.3 E. To further investigate the stagger of the 10 models, we performed a geometric analysis by searching for the peptide i+N whose residue 36 is closest to residue 12 of peptide i. The results revealed that all 10 structural models had a consistent peptide stagger of about 4 peptide spacings (i.e. N is about 4) (Table 2.3).

Figure 2.3. A model of hIAPP protofilament structure obtained without pitch constraints. (A) A typical hIAPP peptide incorporated in the fibril, viewed along the fibril axis. (B) The same peptide viewed along an axis orthogonal to the fibril axis, showing the stagger of the two β-strands of the peptide. (C) A section of the structural model showing the stagger of the peptides (shown as blue, green and orange ribbons: the relationship of blue-colored peptides is indicated). (D) Rotation of the structural model with the axes of β-strands orthogonal to the fibril.
(E) A structural model containing 101 peptides showing a turn of the left-handed helix of ~90° between the centers of the red boxes. The helix has an average pitch of 230 Å.

2.3.2. Structural Models with Different Pitches

As shown above, the structural models obtained without pitch constraints consistently exhibit pitches of about 210-250 Å (Table 2.3), which is at the lower end of the range of pitches seen by EM. This could be because the calculations included only a single stack of peptides, while the EM images (Figure 1 in Bedrood et al., 2012) show protofilaments wrapped around each other to form fibrils with variable pitches of 250 to about 1000 Å. Therefore, further SAMD calculations were performed (Table 2.4 - Table 2.7) to determine whether the DEER distances could also be consistent with these longer pitches. Using pitch constraints (Table 2.1) with the same strength (rk2=rk3=10) as the inter-label distance constraints, we were able to generate structures with different pitches up to 2000 Å. With stronger pitch constraints (rk2=rk3=1000), we were able to generate structures with pitches larger than 2000 Å. All of these structures are left-handed helical structures with a peptide arrangement similar to that in the structures obtained without pitch constraints.

Table 2.4. Structural data from SAMD calculations with a 250 Å pitch constraint (ID: 4_250)

Cycle | Pitch (Å) | Radius (Å) | Sheet-Sheet (Å) | Stagger (Peptides)
1 | 253.0 | 29.2 | 8.8 | 4
2 | 242.4 | 25.0 | 8.6 | 4
3 | 241.4 | 26.1 | 8.9 | 4
4 | 241.8 | 25.1 | 9.2 | 4
5 | 236.8 | 26.9 | 8.7 | 4
6 | 239.3 | 22.2 | 8.8 | 4
7 | 239.0 | 23.5 | 8.2 | 4
8 | 244.1 | 23.1 | 8.8 | 4
9 | 244.3 | 25.7 | 8.6 | 4
10 | 245.1 | 22.9 | 8.9 | 4
Average | 242.7 | 24.9 | 8.8 | 4.0
SD | 4.4 | 2.1 | 0.3 | 0.0

Table 2.5.
Structural data from SAMD calculations with a 500 Å pitch constraint (ID: 4_500)

Cycle | Pitch (Å) | Radius (Å) | Sheet-Sheet (Å) | Stagger (Peptides)
1 | 436.9 | 28.0 | 10.8 | 4
2 | 436.4 | 23.2 | 11.0 | 4
3 | 439.1 | 26.9 | 10.8 | 3
4 | 440.4 | 23.0 | 10.7 | 4
5 | 444.7 | 33.4 | 10.7 | 4
6 | 459.6 | 25.3 | 11.0 | 4
7 | 454.2 | 31.0 | 10.5 | 3
8 | 455.4 | 22.7 | 10.7 | 4
9 | 467.6 | 33.1 | 11.1 | 3
10 | 448.3 | 24.9 | 11.1 | 4
Average | 448.3 | 27.1 | 10.8 | 3.7
SD | 10.6 | 4.1 | 0.2 | 0.5

Table 2.6. Structural data from SAMD calculations with a 1000 Å pitch constraint (ID: 4_1000)

Cycle | Pitch (Å) | Radius (Å) | Sheet-Sheet (Å) | Stagger (Peptides)
1 | 817.0 | 23.1 | 11.7 | 3
2 | 810.2 | 19.4 | 12.6 | 3
3 | 897.8 | 25.0 | 12.2 | 3
4 | 829.7 | 22.4 | 13.0 | 3
5 | 799.5 | 24.6 | 12.5 | 3
6 | 820.0 | 26.7 | 12.3 | 3
7 | 862.5 | 31.3 | 12.9 | 3
8 | 884.3 | 19.1 | 13.0 | 3
9 | 756.1 | 21.3 | 12.9 | 3
10 | 853.1 | 24.1 | 12.5 | 3
Average | 833.0 | 23.7 | 12.6 | 3.00
SD | 42.3 | 3.6 | 0.4 | 0.00

Table 2.7. Mean geometrical data from SAMD calculations with various pitch constraints or different label frequencies¹

Dataset ID | Label frequency | Input Pitch² | Pitch (Å) | Radius (Å) | Sheet-Sheet (Å) | Stagger (Peptides)
4_200 | 4 | 200 | 203.5 ± 4.4 | 24.2 ± 1.7 | 8.4 ± 0.2 | 4.1 ± 0.3
4_250 | 4 | 250 | 242.7 ± 4.4 | 24.9 ± 2.1 | 8.8 ± 0.3 | 4.0 ± 0.0
4_500 | 4 | 500 | 448.3 ± 10.6 | 27.1 ± 4.1 | 10.8 ± 0.2 | 3.7 ± 0.5
4_1000 | 4 | 1000 | 833.0 ± 42.3 | 23.7 ± 3.6 | 12.6 ± 0.4 | 3.0 ± 0.0
4_1500 | 4 | 1500 | 1227.5 ± 124.7 | 20.4 ± 2.5 | 18.1 ± 1.2 | 2.5 ± 0.5
4_2000 | 4 | 2000 | 1409.4 ± 131.2 | 19.0 ± 1.5 | 15.6 ± 1.5 | 2.8 ± 0.4
4_2500 | 4 | 2500 | 2115.7 ± 128.5 | 22.8 ± 2.8 | 20.8 ± 1.3 | 1.9 ± 0.3
4_3000 | 4 | 3000 | 2517.0 ± 154.9 | 21.0 ± 2.5 | 20.8 ± 1.1 | 2.0 ± 0.0
4_4000 | 4 | 4000 | 3286.1 ± 152.3 | 19.2 ± 3.8 | 20.0 ± 2.2 | 2.1 ± 0.3
1_nc | 1 | No input | 308.4 ± 14.9 | 44.6 ± 4.8 | 14.2 ± 0.2 | 4.0 ± 0.0
2_nc | 2 | No input | 264.1 ± 22.1 | 29.2 ± 2.0 | 11.1 ± 0.3 | 4.0 ± 0.0
4_nc | 4 | No input | 237.0 ± 13.5 | 25.1 ± 1.7 | 9.4 ± 0.6 | 4.0 ± 0.0

¹ All means are calculated for 10 structures.
² The formal pitch defined by constraints in the calculation (see footnote 5 in Table 2.1). The results of the calculation do not necessarily match this pitch exactly.
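The gap between the formal (input) pitch and the pitch actually obtained can be tabulated directly from the Table 2.7 means. The short Python sketch below does this arithmetic; the variable and function names are illustrative, and the numbers are copied from Table 2.7 rather than recomputed from structures.

```python
# (input pitch, mean obtained pitch) in Å for the label-frequency-4 datasets of Table 2.7
TABLE_2_7 = [
    (200, 203.5), (250, 242.7), (500, 448.3), (1000, 833.0), (1500, 1227.5),
    (2000, 1409.4), (2500, 2115.7), (3000, 2517.0), (4000, 3286.1),
]


def pitch_shortfall(rows):
    """Return (input, obtained, shortfall in Å, shortfall in % of input) per dataset."""
    return [
        (p_in, p_out, p_in - p_out, 100.0 * (p_in - p_out) / p_in)
        for p_in, p_out in rows
    ]


if __name__ == "__main__":
    for p_in, p_out, diff, pct in pitch_shortfall(TABLE_2_7):
        print(f"input {p_in:5d} Å  obtained {p_out:7.1f} Å  shortfall {diff:6.1f} Å ({pct:5.1f}%)")
```

A negative shortfall (as for the 200 Å input) means the obtained pitch slightly exceeds the formal pitch.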
Calculations in which the pitch was nominally constrained to 250 Å (Table 2.4), 500 Å (Table 2.5) and 1000 Å (Table 2.6) gave structures with average pitches of 242.7 ± 4.4, 448.3 ± 10.6 and 833.0 ± 42.3 Å, respectively (Figure 2.4). Calculations using larger pitch constraints were able to generate structures with larger pitches if the pitch constraint was strong enough (Table 2.7). The actual pitches of these structures are usually smaller than the input pitch constraints, and the differences between the actual pitches and the input pitches become larger as the input pitch increases (Figure 2.5). This may be because large pitches are not preferred in calculations that include only a single stack of peptides.

Figure 2.4. Models of hIAPP protofilaments with pitches of (A) 242 Å, (B) 440 Å, and (C) 830 Å. The red arrow indicates a left-handed turn of ~90° in each model. The structures shown in the figure were obtained from cycle 4 of SAMD calculations performed with constraints to give nominal pitches of 250 Å (Table 2.4), 500 Å (Table 2.5), and 1000 Å (Table 2.6), respectively.

Figure 2.5. Actual pitches of the protofilament structures obtained from SAMD with different pitch constraints. The data for constrained pitches larger than 2000 Å were obtained using 100 times stronger pitch constraints.

Stagger between the two β-strands was also observed in these structural models. The structures generated with 250 Å pitch constraints (Table 2.4) have the same stagger (4 peptide spacings) as the structures generated without pitch constraints (Table 2.3), while the structures generated with 500 Å and 1000 Å pitch constraints have slightly smaller staggers (4 or 3 peptide spacings) (Table 2.5, Table 2.6). The structures with pitches larger than 1000 Å have even smaller peptide staggers (Table 2.7).
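The stagger analysis described in Section 2.3.1 (for peptide i, find the peptide i+N whose residue 36 lies closest to residue 12 of peptide i) can be sketched in a few lines of Python. This is not the FIBRIL implementation: the use of one representative coordinate per residue (for example, the Cα atom), the function name and the toy coordinates are all assumptions made for illustration.

```python
import math


def peptide_stagger(res12, res36, i):
    """Return N such that residue 36 of peptide i+N is closest to residue 12 of peptide i.

    res12, res36: lists of (x, y, z) coordinates in Å, one per peptide,
    ordered along the protofilament.
    """
    ref = res12[i]
    closest = min(range(len(res36)), key=lambda j: math.dist(ref, res36[j]))
    return closest - i


# Toy stack (hypothetical): 20 peptides spaced 5 Å apart along z, with the
# second strand displaced by 20 Å along the axis relative to the first.
n_peptides = 20
res12 = [(0.0, 0.0, 5.0 * k) for k in range(n_peptides)]
res36 = [(10.0, 0.0, 5.0 * k - 20.0) for k in range(n_peptides)]
stagger = peptide_stagger(res12, res36, 10)
```

In this toy stack a 20 Å displacement with 5 Å spacing corresponds to a stagger of 4 peptide spacings. The sign of N depends on how the peptides are numbered along the axis, so only its magnitude should be compared with the tabulated staggers.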
One of the best established geometrical parameters for hIAPP fibrils is the distance between the β-sheets, which has been determined to be 8-11 Å by X-ray and electron diffraction (Sumner Makin and Serpell, 2004). Calculation of this "sheet-to-sheet" distance for the derived models gave values of 8.8 ± 0.3, 10.8 ± 0.2 and 12.6 ± 0.4 Å for mean helical pitches of 242.7, 448.3 and 833.0 Å, respectively (Table 2.4 - Table 2.6), but much higher sheet-to-sheet distances for larger pitches (Table 2.7). A scatter plot of the sheet-to-sheet distances vs. the actual pitches of the hIAPP models (Figure 2.6) also showed that the sheet-to-sheet distances were consistent with the experimental data only in the models with pitches ≤ 1000 Å. DEER data were also inconsistent with helical pitches > 1000 Å. For example, the mean 13R1/35R1 distance in models with pitches ≤ 1000 Å was consistent with the DEER distance, but this consistency was lost in models with higher helical pitches (Figure 2.7). The 13R1/35R1 distance spans the two β-sheets and should be sensitive to the sheet-to-sheet distance.

Figure 2.6. The sheet-sheet distances in the hIAPP protofilament structures with different pitches. The data were fitted with a logarithmic curve.

Figure 2.7. A box plot showing the variation of the 13-35 distance in SAMD calculations with various pitch constraints. The Y axis shows the distance differences between the model and the DEER data, normalized by the “range” used in SAMD calculations. The mean values of the distance differences are also shown as diamonds in the plot. The bar is the median value and the bottom and top of the box are the 25th and 75th percentiles, respectively. The X axis shows the formal pitch constraint (upper number) applied in different SAMD runs and the pitches (mean ± standard deviation) obtained in the calculations.
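The normalized distance difference plotted on the Y axis of Figure 2.7 is the model distance minus the DEER distance, divided by the range used for that label pair. A minimal Python sketch of this calculation is given below, using mean values from Table 2.2 (calculations with no pitch constraint) as example input; the names are illustrative, and the box plot itself was built from individual structures rather than from these means.

```python
# (label pair, DEER distance, mean model distance, range), all in Å, from Table 2.2
DEER_ROWS = [
    ("13-35", 26.0, 26.7, 4.0),
    ("13-32", 27.0, 22.2, 8.0),
    ("17-29", 25.0, 18.9, 8.0),
]


def normalized_deviation(deer, model, rng):
    """Model-minus-experiment distance difference in units of the constraint range."""
    return (model - deer) / rng


if __name__ == "__main__":
    for pair, deer, model, rng in DEER_ROWS:
        dev = normalized_deviation(deer, model, rng)
        status = "within range" if abs(dev) <= 1.0 else "outside range"
        print(f"{pair}: {dev:+.3f} ({status})")
```

A value between -1 and +1 means the model distance lies within the ± range applied to that label pair in the SAMD calculations.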
Therefore, we suggest that the above results support an upper limit for the helical pitch of close to 1000 Å, about the same as that seen in the EM images.

Figure 2.8. A model of hIAPP protofilament obtained using 500 Å pitch constraints. (A) A typical hIAPP peptide incorporated in the fibril, viewed along the fibril axis. (B) The same peptide viewed along an axis orthogonal to the fibril axis, showing the stagger of the two β-strands of the peptide. (C) A section of the structural model showing the stagger of the peptides (shown as blue, green and orange ribbons: the relationship of blue-colored peptides is indicated). (D) Rotation of the structural model with the axes of β-strands orthogonal to the fibril. (E) A structural model containing 101 peptides showing a turn of the left-handed helix of ~90° between the centers of the red boxes. The helix has an average pitch of 440 Å. (F) Overlay of the protofilament on a cartoon depiction of a fibril structure determined by EM.

A structural model with a pitch of about 500 Å is shown in Figure 2.8. The peptide conformation (Figure 2.8 A) is similar to the peptide conformation in the structural model obtained without pitch constraint (Figure 2.3 A), and the stagger (about 15 Å) between the two β-strands can also be clearly seen (Figure 2.8 B-D). The arrangement of peptides is also consistent with the model without pitch constraint, except that the resulting helical protofilament in this model is less twisted, i.e. it has a larger pitch (Figure 2.8 E). A protofilament with a pitch of about 500 Å may form a 2-protofilament fibril structure with a pitch of about 250 Å (Figure 2.8 F).

2.3.3. SAMD Results using Different Starting Models

The above models were generated using the “facing fragments” starting model (Figure 2.1) with every fourth peptide labeled. Different starting models were tested to see whether similar structures could be generated (Table 2.7). An important parameter in the starting model is the label frequency.
The EPR experiment labeled only 3% of the peptides with one pair of labels for the measurement of each DEER distance (Bedrood et al., 2012). Thus, on the one hand, a low label frequency in SAMD could better represent the experimental conditions and reduce the effects of the spin labels (spin labels occupy relatively large spaces and their conformations also differ from those of the original amino acids they replace). On the other hand, since we need explicit spin labels to apply the inter-label constraints in SAMD, a higher label frequency could better constrain the structure to match the DEER distances. Therefore, an optimal label frequency is a trade-off between the effects of the labels and the strength of the constraints. Preliminary SAMD calculations using label frequencies from 1 to 10 were performed. The resulting structures revealed that a label frequency of 4 (i.e. labeling every fourth peptide) is the optimal label frequency in our SAMD protocol for hIAPP protofilaments. “Label frequency > 4” failed to generate structures consistent with experimental data, while “label frequency < 4” gave structures similar to those generated by “label frequency = 4”, but with a larger sheet-to-sheet distance than that observed experimentally (Table 2.7), which was caused by the large size of the spin labels.

Another starting model with a different arrangement of the two β-sheets was also tested (“broken strand” starting model, Figure 2.2). In this model, instead of placing the two fragments to face each other, we let the two fragments continue as a single β-sheet, but left a small gap between them. SAMD calculations using this starting model generated protofilament structures similar to those obtained with the “facing fragments” starting model: the peptides were also horseshoe-shaped and the sheet-to-sheet distances were also similar. This result suggests that the hIAPP structures obtained by SAMD are independent of the starting arrangement of the two β-sheets.

2.3.4.
Side-chain Directions of the hIAPP Protofilament Models

Side-chain directions were also analyzed for the hIAPP protofilament models with pitch ≤ 1000 Å, and consistent results were obtained across the different structure sets (Table 2.8). The side-chain directions in the two β-strand regions were quite clear and they were consistent with the residue mobilities determined by EPR spectroscopy (Figure 2 in Bedrood et al., 2012): the residues generally pointing into the protofilament axis were relatively less mobile, while the residues generally pointing out of the protofilament axis were relatively more mobile (Table 2.8). The side-chain directions in the inter β-strand region of the structures with “label frequency = 1 or 2” were more ordered than those of the structures with “label frequency = 4”, due to the stronger constraints imposed by denser labeling.

Table 2.8. Side-chain directions in different sets of the hIAPP protofilaments a,b

Amino Acid | 1_nc c | 2_nc c | 4_nc c | 4_250 c | 4_500 c | 4_1000 c | Consensus e
L12 | OUT | out | out | OUT | out | out | out
A13 | IN | IN | IN | IN | IN | IN | IN
N14 | OUT | OUT | OUT | OUT | OUT | OUT | OUT
F15 | IN | IN | IN | IN | IN | IN | IN
L16 | OUT | OUT | OUT | OUT | OUT | OUT | OUT
V17 | IN | IN | IN | IN | IN | IN | IN
H18 | OUT | OUT | OUT | OUT | OUT | OUT | OUT
S19 | in | in | in | in | in | in | in
S20 | out | out | out | out | out | out | out
N21 | out | i/o | i/o | i/o | i/o | i/o | i/o
N22 | i/o | i/o | i/o | i/o | i/o | i/o | i/o
F23 | out | out | out | out | out | out | out
G24 | OUT | out | out | out | out | out | out
A25 | IN | in | i/o | i/o | i/o | i/o | in
I26 | OUT | out | out | out | out | out | out
L27 | in | i/o | i/o | i/o | i/o | i/o | i/o
S28 | OUT | out | i/o | i/o | i/o | i/o | out
S29 | in | in | in | in | i/o | i/o | in
T30 | OUT | OUT | OUT | OUT | OUT | OUT | OUT
N31 | IN | IN | IN | IN | IN | IN | IN
V32 | OUT | OUT | OUT | OUT | OUT | OUT | OUT
G33 | IN | IN | IN | IN | IN | IN | IN
S34 | OUT | OUT | OUT | OUT | OUT | OUT | OUT
N35 | IN | IN | IN | IN | IN | IN | IN
T36 | OUT | OUT | OUT | OUT | OUT | OUT | OUT

a Regions shaded in gray are β-strands.
b “OUT” and “out” indicate >80% and >60% of side-chains pointing to the outside of the protofilament, respectively, in the 101 peptides in each dataset; “IN” and “in” indicate >80% and >60% of side-chains pointing inward, respectively, in the 101 peptides in each dataset; “i/o” indicates that none of these criteria were met.
c The dataset IDs refer to Table 2.7.
e Consensus side-chain directions based on the results of the different datasets.

Since previous studies have identified residues 20 to 29 (the inter β-strand region) as the key amyloidogenic region, it is of interest to evaluate the consistency of side-chain orientations in this region among the proposed models. In Table 2.9, we compare the side-chain orientations of the "consensus" peptide derived in this work with those in two other models (Luca et al., 2007; Wiltzius et al., 2008). There is excellent agreement between the side-chain orientations in this region in our consensus peptide and those in Luca et al. (Luca et al., 2007). Interestingly, we also observe good agreement with the X-ray structure of NNFGAIL (21-27) (Wiltzius et al., 2008), based on viewing this peptide as two halves separated by a flexible glycine residue. On this basis, the alternating inward and outward patterns observed in Luca et al. (Luca et al., 2007) and in our consensus peptide (with the exception of N22, which we cannot define) are also seen in the NNFGAIL X-ray structure (Wiltzius et al., 2008). The general consistency among the models provides a good basis for further use of these orientations in theoretical studies.

Table 2.9. Side-chain directions in the inter β-strand region of hIAPP protofilaments.

Amino Acid | Consensus a | Luca et al. b | Wiltzius et al. c
S20 | out | O |
N21 | i/o | O | X
N22 | i/o | I | Y
F23 | out | O | X
G24 | out | - |
A25 | in | I | X
I26 | out | O | Y
L27 | i/o | I | X
S28 | out | O |
S29 | in | I |
T30 | OUT | O |

a Consensus side-chain directions in Table 2.8.
“OUT” and “out” indicate >80% and >60% of side chains pointing to the outside of the protofilament, respectively; “IN” and “in” indicate >80% and >60% of side chains pointing inward, respectively; “i/o” indicates that none of these criteria were met.
b Taken from Figure 11 of Luca et al., 2007. O and I indicate side chains pointing to the outside and inside of the peptide hairpin, respectively. "-" indicates that the direction could not be determined from the figure.
c Data from the X-ray structure (Wiltzius et al., 2008) for peptide NNFGAIL (21-27) (PDB ID 3DGJ), with relative side-chain directions shown for two halves of the peptide in upper case letters.

2.4. Discussion

In this study, we investigated the structure of hIAPP protofilaments using computational refinement based on EM and EPR data. The resulting structural models revealed that the two strands in each peptide are staggered by about 15 Å, and this staggered relationship leads to a left-handed twisted morphology within the protofilament. This stagger is not present in the model of striated ribbon fibrils (Luca et al., 2007) and may represent the underlying structural difference between those two morphologies. An interesting consequence of the twisted arrangement with staggered β-strands is the significant exposure of a hydrophobic surface at the fibril ends (Figure 2.9). These “sticky ends” are likely to promote the capture of further monomeric hIAPP, and thus may facilitate fibril elongation.

Figure 2.9. The sticky end of a hIAPP protofilament model caused by the stagger. The first β-strands, second β-strands and the inter-strand regions are shown in green, blue and pink, respectively. The displayed side-chains (without hydrogen) show the major hydrophobic residues exposed at the end: Alanine (yellow), Phenylalanine (orange), and Valine (red).

A detailed analysis of the geometry of the derived models indicated an upper limit for the helical pitch of about 1000 Å.
Comparison of structures with different pitches revealed only small differences in peptide conformation and slight changes in the angle between adjacent β-strands, but these differences resulted in significant differences in the helical pitch. The general consistency between the structures derived at an atomic level and the macroscopic EM images is remarkable, particularly since only a single stack of peptides was used in the computational refinement. Our findings provide a direct connection between the macroscopic EM images and a structural model at the molecular level.

The finding that minor structural changes at the monomer level can lead to significant variations in the long-range twist could explain why fibrils of hIAPP and other amyloid proteins can vary significantly in twist along the fibril axis (Goldsbury et al., 1997; Sumner Makin and Serpell, 2004). Subtle structural changes might occur during fibril growth and such changes could be sufficient to result in a significant alteration in fibril pitch. We note that our structural models only consider a single stack of peptides and that additional restrictions may occur from the wrapping of multiple structures around each other within fibrils, or perhaps also within protofilaments (Luca et al., 2007).

The twisted arrangement of the protofilaments and the out-of-plane stagger of each monomer also suggest a reason for the particular stability of hIAPP fibrils. This arrangement results in the absence of a discrete plane of hydrogen bonds orthogonal to the fibril axis, and thus there is no plane that would be susceptible to fracture.

Although a small stagger has been inferred from mutagenesis data (Luhrs et al., 2005) and solid-state NMR results (Petkova et al., 2006) for Aβ fibrils, the present results differ significantly from all previously proposed fibril structures.
The difference arises from the ability to determine long-range distances, which are essential for defining the overall structure and the stagger between the two β-strands. These distances complement those from solid-state NMR, which is effective for determining distances <10 Å. The refined family of structures represents the first example in which structural investigation at the atomistic level is able to reproduce the overall morphology observed by EM.

In our previous studies, we have combined CW and pulsed EPR with computational refinement to investigate structures of membrane proteins (Jao et al., 2008; Jao et al., 2010). The results of the current study demonstrate that this approach is also applicable to amyloid protein misfolding. The approach should be applicable to studying the structures of fibrils from other amyloid proteins, as well as those of the various oligomeric misfolded forms of these proteins.

Chapter 3
MFIBRIL: A Program for Modeling Amyloid Fibril Structures

3.1. Introduction

In this chapter, MFIBRIL is described as a structure modeling and analysis program for amyloid fibrils. The program has the following functions: (1) given any single building block, build fibril structures with different pitch, stagger, spacing, helical axis, number of protofilaments and interaction interfaces; (2) calculate the side-chain directions of the residues in any given protofilament; (3) select the representative peptides from a set of peptides (e.g. a protofilament) based on their dominant Cα positions (and side-chain directions); (4) calculate the geometric properties of a fibril model, including radius, sheet-sheet distance, pitch and stagger; (5) predict inter-label (DEER) distances, residue mobilities and ring-to-ring distances in a fibril model by interacting with the program PRONOX (Hatmal et al., 2012); and (6) analyze the stability of a fibril model by interacting with PROCHECK (Laskowski et al., 1993b) and NAMD equilibration (Phillips et al., 2005). The work described below (and published in Li et al., 2012) mainly covers the MFIBRIL functionalities for protofilament/fibril modeling and geometric analysis, i.e. functions (1), (3) and (4). Details of functions (5) and (6) will be given in Chapter 4.

3.2. Background

Amyloidogenic peptides and proteins have the intrinsic ability to aggregate and form insoluble fibrils. These aggregates are associated with more than 40 human diseases, including Alzheimer’s disease, Parkinson’s disease, and prion diseases (Margittai and Langen, 2008). Human islet amyloid polypeptide (hIAPP) is a 37-residue peptide hormone co-secreted with insulin by pancreatic beta cells in response to an elevated glucose level. Among several physiological roles, soluble hIAPP facilitates the effects of insulin by stimulating neurons to reduce food intake (Woods et al., 2006). However, misfolding and aggregation of hIAPP is toxic to beta cells (Chiti and Dobson, 2006; Meier et al., 2006; Zraika et al., 2010) and hIAPP fibrils are found in >90% of patients with type II diabetes (Jayasinghe and Langen, 2004). Understanding the structure of these fibrils and earlier aggregates on the fibril pathway may reveal the mechanism of hIAPP aggregation and facilitate the design of therapeutic agents and biomarkers for type II diabetes.

The structure of the hIAPP fibril is difficult to study directly, but structural information has been obtained by indirect methods.
Fourier transform infrared spectroscopy and circular dichroism have shown that the fibril has a significant β-stranded component (Higham et al., 2000; Kayed et al., 1999). X-ray and electron diffraction data indicate that the β-strands are perpendicular to the fibril axis (i.e. a cross-β-structure) and that the spacing between successive β-strands is about 4.7 Å. A diffuse 9.5-Å reflection is also observed in X-ray diffraction data and is thought to be due to spacing between β-sheets (Sumner Makin and Serpell, 2004). Electron paramagnetic resonance (EPR) spectroscopy shows that the β-strands are parallel and in-register; i.e., the same residues in neighboring peptides stack on each other (Jayasinghe and Langen, 2004; Margittai and Langen, 2008).

Electron microscopy (EM) has shown that hIAPP fibrils are polymorphous, with two major morphologies: a ribbon-like form and a left-handed twisted form (Goldsbury et al., 1997). The diameter of the fibrils typically varies from 80 to 150 Å (Goldsbury et al., 1997; Sumner Makin and Serpell, 2004) and the pitch of the twisted fibrils varies from about 250 to 500 Å (Goldsbury et al., 1997; Luca et al., 2007; Sumner Makin and Serpell, 2004). Conventional and scanning transmission EM show that a hIAPP fibril consists of 2-4 or more protofilaments (Goldsbury et al., 1997), where a protofilament is defined as a single peptide stack. EM observations of fibril preparations have also shown some thinner structures of about 50 Å in diameter (Goldsbury et al., 1997; Sumner Makin and Serpell, 2004).

We recently derived a family of models for protofilaments in twisted hIAPP fibrils using EPR spectroscopy and computational simulated annealing (Bedrood et al., 2012). A key element in obtaining these models was computational refinement based on inter-spin label distances obtained using the double electron-electron resonance (DEER) method.
The resulting family of protofilament models is characterized by stacked hIAPP monomers that form opposing β-sheets and exhibit a left-handed helical twist. The two β-strands of each monomer adopt out-of-plane positions and are staggered by about 3 peptide layers (Bedrood et al., 2012).

The EPR-derived models provide many insights into the structure of the hIAPP protofilament. Our goal is to use these models as a basis for construction of potential hIAPP fibril models that may be validated with further EPR experiments. However, the resolution of the protofilament models is limited, particularly in regions that are less well constrained by experimental data. This leads to variability in the conformation of the constituent peptides and in the local helicity, which restricts our ability to understand the details of the protofilament and to build fibril models. Therefore, we developed an algorithm, MFIBRIL, to extract representative peptides from the models and use these peptides to generate idealized protofilaments with defined geometries. Here, we describe MFIBRIL and discuss its potential for generation of experimentally testable fibril models built from symmetrical and geometrically defined protofilaments.

3.3. Methods

3.3.1. The MFIBRIL Algorithm for Building Protofilaments/Fibrils

In describing the MFIBRIL algorithm, we refer to models of hIAPP protofilaments obtained from simulated annealing based on EPR constraints (Bedrood et al., 2012). These calculations were performed on protofilaments comprising 101 peptides, with every fourth peptide carrying spin labels.
Separate calculations were performed with constraints to produce protofilament models with different helical pitches, and each calculation gave a family of ten protofilament models that were consistent with the EPR constraints (Bedrood et al., 2012). In all families, the peptide unit adopted a horseshoe conformation, but with one β-strand displaced with respect to the plane of the second β-strand (Bedrood et al., 2012).

Figure 3.1. A flowchart showing the three steps in the MFIBRIL program.

Peptides with this general out-of-plane horseshoe conformation were used in MFIBRIL to build an idealized protofilament in three steps, as illustrated in Figure 3.1: (1) selection of a representative monomeric unit (peptide) from peptide aggregates, for example, a family of protofilament models obtained from simulated annealing; (2) standardization of the orientation of this peptide; (3) rebuilding of an idealized protofilament or fibril with a defined pitch, stagger and axis offset (defined below) using multiple copies of the peptide. Using these three steps, we constructed idealized protofilaments or fibrils starting with multiple peptides from each of the families of protofilament models. Each step is described in detail in the following sections.

We note that step (1) in this procedure is specifically focused on determination of a representative peptide from a family of protofilament models obtained from simulated annealing calculations for hIAPP. However, MFIBRIL can select representative peptides from any aggregate consisting of identical peptides with similar conformations. Step (1) can also be omitted and MFIBRIL can be used to build a repeating molecular assembly starting from a single unit from any source. The program is not restricted to models obtained from simulation or to molecular assemblies comprising only amyloidogenic peptides. MFIBRIL is available for use at <http://chemsoft.hsc.usc.edu:8080/MFIBRIL/>.
Step 1: Selection of representative peptides

Representative peptides are selected from a simulation dataset of protofilament models (Bedrood et al., 2012) based on two criteria: the position of the Cα of each amino acid and the side-chain direction. Each dataset contains 10 models and each model includes 101 peptides (Bedrood et al., 2012). First, all the peptides of a dataset are aligned in PyMOL (Schrodinger, 2010) based on a user-defined region, for which we used residues 13-18 and 31-35 (the β-strand regions) of peptide #50 in the 6th 101-peptide hIAPP protofilament model. Then, for each residue, a boundary cube is defined that covers all the Cα atoms from the same residue of different peptides. This cube is divided into 64 smaller sub-cubes of equal volume. The score of each sub-cube is given by the number of Cα atoms it contains. The 'Cα score' of each residue in a particular peptide is equal to the score of the sub-cube in which the Cα atom resides, and the total Cα score of the peptide is the sum of the scores of all the residues in that peptide.

Step 2: Standardization of the orientation of the representative peptides

The orientation of each representative peptide is standardized before building a protofilament. To standardize the orientation, a peptide plane is defined perpendicular to the fibril axis. In the hIAPP peptide, this plane is defined by three points (Figure 3.2 A): P1, the midpoint of the Cα atoms of residues 13 and 17; P2, the midpoint of the Cα atoms of residues 13 and 35; and P3, the midpoint of the Cα atoms of residues 31 and 35. The peptide plane is then rotated to x=0 and the peptide is shifted so that point M lies at the origin, where M is the midpoint of P2 and P4, and P4 is the midpoint of the Cα atoms of residues 17 and 31. Alternatively, users could choose the midpoint of P1 and P3 for point M. Finally, the peptide is rotated to project the line M-P2 as the y-axis in the x=0 plane.
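The Step 2 standardization can be illustrated with a short sketch. This is not the MFIBRIL source code; it is a minimal Python illustration that assumes Cα coordinates are available as a dictionary keyed by residue number, and the function names (`standard_frame`, `standardize`) are hypothetical. The sign of the plane normal (i.e. which way the positive x-axis points) is not specified in the text and is chosen arbitrarily here.

```python
import math

def midpoint(a, b):
    return tuple((p + q) / 2.0 for p, q in zip(a, b))

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)

def standard_frame(ca):
    """ca maps residue number -> Calpha (x, y, z). Returns (M, ex, ey, ez)."""
    p1 = midpoint(ca[13], ca[17])
    p2 = midpoint(ca[13], ca[35])
    p3 = midpoint(ca[31], ca[35])
    p4 = midpoint(ca[17], ca[31])
    m = midpoint(p2, p4)                        # new origin (point M)
    ex = unit(cross(sub(p2, p1), sub(p3, p1)))  # normal of the peptide plane
    v = sub(p2, m)
    # Project M->P2 into the plane so that it becomes the y direction.
    v_in_plane = sub(v, tuple(dot(v, ex) * c for c in ex))
    ey = unit(v_in_plane)
    ez = cross(ex, ey)                          # completes a right-handed frame
    return m, ex, ey, ez

def standardize(coords, frame):
    """Express coordinates in the standardized frame (plane at x=0, M at origin)."""
    m, ex, ey, ez = frame
    return [(dot(sub(r, m), ex), dot(sub(r, m), ey), dot(sub(r, m), ez))
            for r in coords]

# Toy coordinates (illustrative only, not taken from any hIAPP model).
toy_ca = {13: (1.0, 6.0, -4.0), 17: (0.5, -3.0, -5.0),
          31: (-0.5, -3.0, 5.0), 35: (-1.0, 6.0, 4.0)}
toy_std = standardize(list(toy_ca.values()), standard_frame(toy_ca))
```

Because P4 = P1 + P3 - P2, the points P4 and M lie in the plane defined by P1, P2 and P3, so the projection step only removes rounding error.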
This standardization procedure gives a hIAPP peptide with two β-strands that are almost in the same plane perpendicular to the x-axis, with the open end of the peptide in the positive y direction (Figure 3.2 B). In this configuration, the x-axis is parallel to the growth axis of the protofilament.

Figure 3.2. Definitions in the MFIBRIL program. A. Orientation of the peptide in the yz plane at x=0 and perpendicular to the fibril (x) axis. B. A different view of the oriented peptide, with the open end of the hairpin in the positive y direction. C. Introduction of stagger into the peptide. D. Different offsets of rotational axes (red dots). The "center" point is the default.

Step 3a: Building an idealized protofilament

hIAPP protofilaments are left-handed helices that contain peptides staggered by about three peptide layers (Bedrood et al., 2012). To reproduce these geometrical properties, peptide stagger and helical pitch (Figure 3.2 C) must be included in the idealized protofilaments. Furthermore, the helical axis of the protofilament may not be positioned centrally. Thus, MFIBRIL permits this axis to be offset to any (y, z) coordinate (Figure 3.2 D).

The stagger of a peptide is defined by the offset of the two β-strands along the x-axis (Figure 3.2 C) and specifically by the difference (in Å) between the x coordinates of the Cα atoms of residues 13 and 35. For convenience, we refer to the stagger in terms of peptide spacing, where 1 "peptide spacing" is the distance between neighboring peptides and is 4.7 Å for hIAPP. To obtain a given stagger, the peptide is rotated around the y-axis to give the appropriate difference between the x coordinates of Cα(13) and Cα(35), without changing the peptide conformation.

The pitch of the protofilament is defined with the assumption that each peptide rotates by the same angle to achieve the final pitch. That is, given a pitch = p Å, every peptide performs a clockwise rotation with angle α = 360° / (p / spacing).
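The stagger and pitch operations described above can be sketched as follows. This is an illustration under stated assumptions rather than the MFIBRIL implementation: the function names are hypothetical, the stagger is taken as x(Cα13) minus x(Cα35), the smaller of the two possible y-axis rotations is chosen, and "clockwise" is interpreted as clockwise when viewed from the positive x direction (which gives a left-handed helix for positive p).

```python
import math

SPACING = 4.7  # Angstrom between neighbouring peptides (value given for hIAPP)

def rotate_y(p, phi):
    x, y, z = p
    return (x * math.cos(phi) + z * math.sin(phi), y,
            -x * math.sin(phi) + z * math.cos(phi))

def rotate_x(p, theta):
    x, y, z = p
    return (x, y * math.cos(theta) - z * math.sin(theta),
            y * math.sin(theta) + z * math.cos(theta))

def apply_stagger(atoms, ca13, ca35, stagger):
    """Rigidly rotate about the y-axis so that x(CA13) - x(CA35) == stagger."""
    dx = ca13[0] - ca35[0]
    dz = ca13[2] - ca35[2]
    r = math.hypot(dx, dz)
    if abs(stagger) > r:
        raise ValueError("stagger exceeds the CA13-CA35 separation in the xz plane")
    s = math.asin(stagger / r)
    delta = math.atan2(dx, dz)
    # Two rotations satisfy the condition; keep the smaller one.
    candidates = [s - delta, math.pi - s - delta]
    wrapped = [math.atan2(math.sin(c), math.cos(c)) for c in candidates]
    phi = min(wrapped, key=abs)
    return [rotate_y(a, phi) for a in atoms]

def twist_per_peptide(pitch, spacing=SPACING):
    """alpha = 360 degrees / (pitch / spacing), returned in radians."""
    return 2.0 * math.pi / (pitch / spacing)

def build_protofilament(atoms, pitch, n_peptides, spacing=SPACING):
    """Stack n copies along x, rotating copy k by -k*alpha about the x-axis."""
    alpha = twist_per_peptide(pitch, spacing)
    filament = []
    for k in range(n_peptides):
        theta = -k * alpha  # negative sign: left-handed under the stated convention
        copy = []
        for a in atoms:
            x, y, z = rotate_x(a, theta)
            copy.append((x + k * spacing, y, z))
        filament.append(copy)
    return filament

# Toy example: three atoms, a 3-peptide stagger (14.1 A) and a 500 A pitch.
toy = [(0.0, 5.0, -9.0), (0.0, 5.0, 9.0), (0.0, -4.0, 0.0)]
staggered = apply_stagger(toy, toy[0], toy[1], 3 * SPACING)
model = build_protofilament(staggered, 500.0, 101)
```

The sketch rotates about the x-axis itself, which corresponds to an axis offset already shifted to the origin.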
To achieve a left-handed pitch, p must be positive. The axis for helical rotation is parallel to the x-axis (Figure 3.2 C).

The offset is defined as the intersection of the rotational axis with the plane x=0, and can be defined by the user. A reference point is the "center" (Figure 3.2 D) of the peptide at point C (0, y0, z0), where y0 is the midpoint between the largest and smallest y coordinates (y1 and y2) of the Cα atoms among all the residues or among the designated residues in a single unit, and z0 is the midpoint between the largest and smallest z coordinates (z1 and z2) of the Cα atoms among all the residues or among the designated residues in a single unit. The offset of the rotational axis is defined from the reference point. Other reference points (identified as red dots with blue text in Figure 3.2 D, where Ry = y1 - y0 and Rz = z1 - z0) can be defined to build potential fibril models based on peptide-peptide interactions via the loop or β-strands. Finally, the single unit is shifted to align the offset to the origin, so that the rotation axis becomes the x-axis.

Figure 3.3. Two interaction modes in the fibril modeling of MFIBRIL. (A) The peptide arrangement in a fibril layer for the mode “12”. (B) The peptide arrangement in a fibril layer for the mode “cc”.

Step 3b: Building an idealized fibril model

There are two approaches to building a fibril using MFIBRIL. The first approach requires a structure consisting of at least one layer of the fibril. Taking this structure as a building block, a fibril model can be built by MFIBRIL using exactly the method for building a protofilament described in step 3a. The limitations of this approach are that (1) it requires a building block consisting of at least one fibril layer, and (2) the stagger of the monomer (or a single unit) cannot be adjusted individually.
This approach is useful for ensembles for which there is a well-established structure consisting of at least one fibril layer.

The second approach is to build the fibril directly from the single unit using MFIBRIL. The fibril is modeled as follows: first, the stagger is applied to a single unit as described before; then, a fibril layer is built based on the interaction defined for the single units in a layer; finally, the fibril is built in the same way as a protofilament, using the constructed layer as the building block and applying the defined pitch to the fibril. Thus, fibril modeling is very similar to protofilament modeling, except that the interaction between the single units in a layer must be defined.

The interaction between the single units in one layer is defined by two parameters: the offset and the interaction mode. The offset is the same as that described in step 3a. The interaction mode defines the arrangement of the single units in a layer around the rotation center given by the offset. Currently, two interaction modes, “12” and “cc”, are available for a 2-protofilament fibril (Figure 3.3). In either mode, the position of the first single unit is modeled based on the offset as in step 3a (for example, the left peptide in Figure 3.3 A and B, in which point C1 (x, y0, z0) shows the peptide “center” defined as in step 3a, and point O shows the offset (e.g. the offset b2L in Figure 3.2 D) shifted to the origin). Then, in mode “12”, the second identical peptide is shifted by -2*z0 from the first peptide, i.e. the “center” of the second peptide is at C2 (x, y0, -z0) (Figure 3.3 A). In mode “cc”, the second identical peptide is symmetric to the first peptide around the x-axis, i.e. the “center” of the second peptide is at C2 (x, -y0, -z0) (Figure 3.3 B). More interaction modes might be developed in MFIBRIL in the future if desired.

3.3.2. Geometrical Analysis of Idealized Protofilaments

Analysis of the geometric properties of idealized protofilaments can also be performed using MFIBRIL. The sheet-sheet distance (the distance between the two β-sheets in the protofilament) and radius were determined for idealized protofilaments with variation of peptide stagger, helical pitch, and axis offset. These data were compared with those obtained in EPR-constrained simulation models (Bedrood et al., 2012) and through direct experiments (Goldsbury et al., 1997; Sumner Makin and Serpell, 2004). The sheet-sheet distance in the protofilament is calculated as the average of two distances: (a) the distance between the Cα of residue 12 of peptide i and the closest Cα of residue 36 of another peptide (peptide j), and (b) the distance between the Cα of residue 13 of peptide i and the Cα of residue 35 of peptide j. The radius is defined as the maximum distance from any Cα atom of a peptide to the rotational axis of the protofilament.

The quality of the constructed protofilaments was examined using Procheck (Laskowski et al., 1993b) and through energy minimization using AMBER10 (Case et al., 2005). In this calculation, the side chains of a protofilament were minimized for 1000 steps and then the whole structure was minimized for 5000 steps.

3.3.3. Definition of Datasets from Simulations

EPR-constrained simulated annealing molecular dynamics (SAMD) calculations (Bedrood et al., 2012) gave protofilament models with helical pitches ranging from 250 to 1000 Å and a stagger of about 3 peptides. Four datasets (each with 10 models of 101-peptide protofilaments) were obtained from these calculations. Here, we refer to these datasets as 4_nc, 4_250, 4_500 and 4_1000 based on the helical pitch constraint: no constraint (nc), 250 Å, 500 Å and 1000 Å, respectively, with 4 referring to placement of a spin label on every 4th peptide. The SAMD dataset IDs used in this chapter are the same as those used in Chapter 2, Table 2.7.
The details of the SAMD calculations have been described in Chapter 2 and in our previous publication (Bedrood et al., 2012).

3.4. Results

3.4.1. Construction of Protofilament Models with Varied Stagger, Pitch and Offset

MFIBRIL was used to build idealized protofilament models with variation of stagger, pitch and offset. Fifty peptides with the highest Cα score were selected from each of the datasets 4_nc, 4_250, 4_500 and 4_1000. Each combination of stagger, pitch and offset was used to reorient the peptide (Figure 3.2) and then the side-chain directions of each peptide were compared with the consensus directions given in Table 3.1. Peptides were excluded if they did not meet the following criteria: an exact match for side chains defined as "IN" or "OUT", except for residues 12 and 36; and no more than 2 differences for side chains defined as "in" and "out". We note that the side-chain directions of a given peptide can alter with variation of stagger. A peptide was also excluded if the protofilament built from the peptide had a negative sheet-sheet distance (i.e., a clash between β-sheets).

Table 3.1. Side-chain directions in consensus and a representative peptide a

Amino Acid   Consensus b   Representative peptide c
L12          out           OUT
A13          IN            IN
N14          OUT           OUT
F15          IN            IN
L16          OUT           OUT
V17          IN            IN
H18          OUT           OUT
S19          in            IN
S20          out           OUT
N21          i/o           OUT
N22          i/o           OUT
F23          out           OUT
G24          out           OUT
A25          in            IN
I26          out           OUT
L27          i/o           IN
S28          out           OUT
S29          in            IN
T30          OUT           OUT
N31          IN            IN
V32          OUT           OUT
G33          IN            IN
S34          OUT           OUT
N35          IN            IN
T36          OUT           OUT

a Regions shaded in gray are β-strands.
b Consensus side-chain directions determined in EPR-constrained simulation datasets. These are the same as those shown in the last column of Table 2.8.
c A representative peptide (Figure 3.4 A) determined in this work.

Table 3.2. Geometrical analysis of hIAPP protofilament models built in MFIBRIL

Model #   Stagger (peptides)   Pitch (Å)   Offset a   4_nc   4_250   4_500   4_1000   Total b   Sheet-sheet (Å)   Radius (Å)
1         3                    250         center     18     19      24      19       80        8.3 ± 1.6         17.5 ± 0.9
2         3                    500         center     18     19      24      19       80        10.6 ± 1.6        17.5 ± 0.9
3         3                    1000        center     18     19      24      19       80        11.7 ± 1.7        17.5 ± 0.9
4         3                    2000        center     18     19      24      19       80        12.3 ± 1.7        17.5 ± 0.9
5         2                    500         center     23     23      25      24       95        14.9 ± 1.3        18.2 ± 0.8
6         4                    500         center     16     13      11      3        43        5.2 ± 1.6         16.6 ± 0.9
7         3                    250         loop       16     16      11      1        44        3.8 ± 0.9         33.0 ± 1.7
8         3                    500         loop       18     19      24      18       79        7.8 ± 1.6         33.6 ± 1.7
9         3                    1000        loop       18     19      24      19       80        10.3 ± 1.6        33.6 ± 1.7
10        3                    250         b1         18     19      24      18       79        10.0 ± 1.4        23.8 ± 1.2
11        3                    500         b1         18     19      24      19       80        11.2 ± 1.6        23.8 ± 1.2
12        3                    1000        b1         18     19      24      19       80        12.0 ± 1.6        23.8 ± 1.2
13        3                    250         b2         18     19      24      19       80        7.9 ± 1.8         22.9 ± 1.2
14        3                    500         b2         18     19      24      19       80        10.3 ± 1.7        22.9 ± 1.2
15        3                    1000        b2         18     19      24      19       80        11.6 ± 1.7        22.9 ± 1.2

a For definitions, see Figure 3.2 D.
b The sum of the previous four columns. Each of these columns indicates the sample size for each dataset (i.e. the number of protofilaments modeled from different representative peptides obtained from each dataset).

The results of this procedure for protofilament models #1 to #15 are shown in Table 3.2. For example, for model #1, there were 18, 19, 24 and 19 peptides from datasets 4_nc, 4_250, 4_500 and 4_1000 (out of the original 50 selected from each dataset) that were not excluded by the above criteria. For model #1, this gave a total of 80 peptides (Table 3.2) that were used to build 80 different idealized protofilaments with stagger = 3 peptides, pitch = 250 Å, and offset = center (Figure 3.2). The sheet-sheet distance and radius were determined for each of these protofilaments and are given as an average ± SD in Table 3.2. A similar procedure was performed for a systematic set of protofilaments with varied stagger, pitch and offset. Some examples are shown as models #2 to #15 in Table 3.2.
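The selection procedure used here (Cα scoring, ranking, and the side-chain direction filter) can be sketched in a few lines. This is a hedged illustration, not the MFIBRIL code: the function names are hypothetical; the bounding cube is assumed to be anchored at the minimum corner of the Cα cloud with an edge equal to its largest extent, divided 4 × 4 × 4 into 64 sub-cubes; and each candidate peptide is assumed to carry an "IN" or "OUT" label per residue. The consensus and representative labels are copied from Table 3.1.

```python
def ca_scores(peptides, bins=4):
    """peptides: aligned peptides, each a list of Calpha (x, y, z), one per residue.
    Returns the total Calpha score of every peptide."""
    totals = [0] * len(peptides)
    for r in range(len(peptides[0])):
        pts = [pep[r] for pep in peptides]
        lo = [min(p[d] for p in pts) for d in range(3)]
        hi = [max(p[d] for p in pts) for d in range(3)]
        edge = max(h - l for h, l in zip(hi, lo)) or 1.0  # cube edge (assumed)

        def cell(p):
            return tuple(min(int((p[d] - lo[d]) / edge * bins), bins - 1)
                         for d in range(3))

        counts = {}
        for p in pts:
            counts[cell(p)] = counts.get(cell(p), 0) + 1
        for i, p in enumerate(pts):
            totals[i] += counts[cell(p)]  # score of the sub-cube holding this atom
    return totals

def top_n(scores, n):
    """Indices of the n peptides with the highest total Calpha score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]

RESIDUES = range(12, 37)
CONSENSUS = dict(zip(RESIDUES, ("out IN OUT IN OUT IN OUT in out i/o i/o out out "
                                "in out i/o out in OUT IN OUT IN OUT IN OUT").split()))
REPRESENTATIVE = dict(zip(RESIDUES, ("OUT IN OUT IN OUT IN OUT IN OUT OUT OUT OUT OUT "
                                     "IN OUT IN OUT IN OUT IN OUT IN OUT IN OUT").split()))

def passes_direction_filter(directions, consensus=CONSENSUS,
                            exempt=(12, 36), max_weak_mismatch=2):
    """Exact match required for "IN"/"OUT" residues (except exempt ones);
    at most max_weak_mismatch differences allowed for "in"/"out" residues."""
    weak = 0
    for res, cons in consensus.items():
        if cons == "i/o":
            continue  # undefined in the consensus, so never a mismatch
        same = directions[res].upper() == cons.upper()
        if cons.isupper():
            if not same and res not in exempt:
                return False
        elif not same:
            weak += 1
    return weak <= max_weak_mismatch

print(passes_direction_filter(REPRESENTATIVE))
```

The exclusion of protofilaments with a negative sheet-sheet distance is not covered by this sketch.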
These results are described in more detail in the following sections.

Figure 3.4. A representative peptide and the protofilament models built based on it. (A) Two views of a representative hIAPP peptide (residues 12-36) obtained from MFIBRIL analysis of simulation models. (B) Stacking of the same peptide in a protofilament built in MFIBRIL. The stagger of 3 peptides is illustrated by the coloring of orange, blue and green. It is apparent that one β-strand of, for example, an orange peptide is in the same plane as the other β-strand of the next orange peptide in the structure. (C) Protofilament models (101 peptides) built from the consensus peptide by MFIBRIL, with staggers of 3 peptides and pitches of 250 Å, 500 Å, and 1000 Å. These are models #1, #2 and #3, respectively, in Table 3.2.

Models #1 to #15 built from the peptide in Figure 3.4 A were also successfully minimized in AMBER10 (Table 3.3), and the structures before and after minimization all had most residues in allowed regions of conformational space based on Procheck analysis (Figure 3.5, Figure 3.6). These results indicate that the models constructed in MFIBRIL have reasonable intermolecular distances and hydrogen bonding.

Table 3.3. Minimized energies (AMBER10) for the protofilament models constructed in MFIBRIL from the peptide shown in Figure 3.4 A.

Model # a   Energy (kcal)
1           -11393
2           -13651
3           -13918
4           -13767
5           -10407
6           -14817
7           -9368
8           -9151
9           -12813
10          -8531
11          -10829
12          -13289
13          -14343
14          -11554
15          -13660

a The MFIBRIL model # refers to Table 3.2.

Figure 3.5. Procheck analysis of the models built from the peptide in Figure 3.4. Since models #1 to #15 were all built from the same representative peptide, the Procheck results are the same for all the models before minimization.

Figure 3.6. Procheck analysis of models #1 to #15 built from the peptide in Figure 3.4 A after minimization in AMBER10.

3.4.2. Structures of Idealized Protofilaments

Geometric analysis of protofilament models #1, #2 and #3 (Table 3.2) gave sheet-sheet distances of 8.3 ± 1.6, 10.6 ± 1.6 and 11.7 ± 1.7 Å, respectively. These values were similar to those of 8.8 ± 0.3, 10.8 ± 0.2, and 12.6 ± 0.4 Å in the respective simulation datasets 4_250, 4_500, and 4_1000 (Bedrood et al., 2012). These results show that MFIBRIL can be used to idealize the simulation model without loss of the key geometric features.

Construction of protofilaments corresponding to models #1, #2 and #3 is shown in Figure 3.4, based on a single peptide selected from dataset 4_500. This peptide (Figure 3.4 A) had the top Cα score among the 1010 peptides (10 models × 101 peptides) in this dataset and had side-chain directions that were largely consistent with the consensus directions (Table 3.1). Multiple copies of the peptide in Figure 3.4 A were used to construct a protofilament with a stagger of 3 peptides and a left-handed pitch of 500 Å (Figure 3.4 B). Simplified images of this protofilament and those with pitches of 250 Å and 1000 Å (built from the same peptide) are shown in Figure 3.4 C. In these images, the peptides are colored blue, orange and green to illustrate the 3-peptide stagger.

3.4.3. Variation of Stagger and Pitch

The general dependence of the sheet-sheet distance on pitch and stagger is shown in Figure 3.7. A decrease in stagger or an increase in pitch causes an increase in the sheet-sheet distance. At large pitches (>1000 Å) the rate of increase in the sheet-sheet distance is slow and this distance approaches the value for a straight protofilament with the same stagger. The experimental value for the sheet-sheet distance (about 10 Å) (Sumner Makin and Serpell, 2004) is not satisfied for a protofilament with a pitch of 2000 Å and a stagger of 3 peptides (model #4, Table 3.2) or with a stagger of 2 or 4 peptides and a pitch of 500 Å (models #5 and #6 in Table 3.2).
Thus, the experimental sheet-sheet distance is only obtained in protofilaments with pitches of about 250 Å to 500 Å (Figure 3.7 A) and a stagger of about 3 peptides (14.1 Å) (Figure 3.7 B).

Figure 3.7. Dependence of the sheet-sheet distance on pitch and stagger. (A) Dependence of the sheet-sheet distance on pitch in a protofilament built in MFIBRIL with a stagger of 3 peptides. (B) Dependence of the sheet-sheet distance on peptide stagger (shown as the distance with the number of peptides in parentheses) in a protofilament built in MFIBRIL as a left-handed helix with a pitch of 500 Å.

3.4.4. Variation of Offset

The results in the previous section assume a helical axis that is positioned centrally with respect to the hIAPP monomers. However, wrapping of multiple protofilaments around each other to give fibrils is likely to require this axis to be offset from the central position. MFIBRIL models built with inclusion of this offset are shown in Figure 3.8 and the corresponding geometric data are shown in Table 3.2 and Figure 3.9. The structures in Figure 3.8 A all have a pitch of 500 Å and a stagger of 3 peptides, with offsets (defined in Figure 3.2) of center, loop, b1 and b2. These structures are examples of those in models #2, #8, #11 and #14, respectively (Table 3.2). All were built using the peptide in Figure 3.4 A.

Figure 3.8. Protofilament models built in MFIBRIL with different offsets. (A) Protofilament models (101 peptides) built in MFIBRIL with a stagger of 3 peptides and a left-handed helical pitch of 500 Å, using axis offset points indicated in blue (refer to Figure 3.2 D). Successive peptides are colored green, orange and blue. (B) Protofilament models (3-peptide stagger, 500 Å pitch) viewed along the helical axis, with construction based on the axis offset points shown in Figure 3.2 D.

Figure 3.9. Geometric data of the protofilament models built with different offsets.
(A) Dependence of the protofilament radius on stagger (shown in Å: 1 peptide stagger = 4.7 Å) in protofilaments built in MFIBRIL with a left-handed helical pitch of 500 Å and using the indicated axis offset points. (B and C) Dependence of the sheet-sheet distance on (B) pitch (with stagger fixed at 3 peptides) and (C) stagger (with pitch fixed at 500 Å) in protofilaments built using the axis offset points indicated in A. The protofilament radius differs significantly with use of different offsets (Figure 3.9 A). Offsets around the loop region give the largest radius (about 35 Å). This feature can be seen in the structures in Figure 3.8 B for the loopL, loop and loopR offsets. The offset of "center" gives the smallest radius (about 17.5 Å), while the beta strand offsets (b1L, b1, b1R, b2L, b2 and b2R) give similar radii of about 23-28 Å with a stagger of 14.1 Å (3 peptides). The images in Figure 3.8 B also show how different offsets result in different potential interaction interfaces 58 for protofilament-protofilament interactions. These interfaces would occur close to the offset axis, which appears as the "hole" in the structures in Figure 3.8 B. Different offsets resulted in a smaller variation in sheet-sheet distances (Figure 3.9 B and C). Thus, the conclusion that the experimental sheet-sheet distance is obtained for protofilament pitches of 250 Å to 500 Å and a stagger of about 3 peptides (14.1 Å) holds almost regardless of the offset used (Table 3.2). Figure 3.10. Two hIAPP fibril models built in MFIBRIL. Each fibril model is consisted of two protofilaments with 500-Å pitch and 101 peptides. (A) A fibril model with two protofilaments interacting through the inter-strand regions. (B) A fibril model with two protofilaments interacting through the second β-strands. (C) The bottom five pairs of peptides in (A) viewed along the fibril axis. (D) The bottom five pairs of peptides in (B) viewed along the fibril axis. 59 3.4.5. 
hIAPP Fibril Models Two hIAPP fibril models (Figure 3.10) based on the protofilament models “loop” and “b2” in Figure 3.8 were constructed using MFIBRIL. The two protofilaments in the first fibril model (Figure 3.10 A, C) interact with each other via the inter-strand regions, while the two protofilament in the second fibril model (Figure 3.10 B, D) interacts with each other via the second β-strands. One thing observed from the fibril models is that the pitch of a 2-protofilament fibril is half of the protofilament pitch. For example, in Figure 3.10, the pitches of both fibrils are about 1000 Å, while the pitches of the constituted protofilaments are about 500 Å. Another observation is that the radii of the fibril models and the groove sizes in the fibrils could be very different when the interaction interfaces between the protofilaments are different. More hIAPP fibril models and more details about the fibrils will be shown in Chapter 4. 3.4.6. Other Amyloid Fibrils Built by MFIBRIL The above results show the details of different hIAPP protofilaments/fibrils (twisted morphology) built by MFIBRIL based on the simulation structures (Bedrood et al., 2012). The current section is included to show that MFIBRIL is also applicable to the modeling of other amyloid fibrils. A hIAPP fibril in striated ribbon morphology was built (Figure 3.11) by MFBRIL based on a model obtained by solid-state NMR (Luca et al., 2007). A one-layer structure extracted from the solid-state NMR model and a fibril model built from this one-layer structure are shown in Figure 3.11 A and C, respectively. 60 The arrangement of the peptides within a protofilament is shown in Figure 3.11 B. In contrast to our twisted hIAPP fibrils shown above, this ribbon-like fibril model is straight, instead of twisted, and the two β-strands of each peptide are almost in the same plane instead of having an obvious stagger. Figure 3.11. 
Modeling of a ribbon-like hIAPP fibril using MFIBRIL based on a structure presented in Luca et al., 2007. (A) One layer of the structure presented in Figure 11 of Luca et al., 2007. (B) The arrangement of the peptides within a protofilament of the ribbon-like hIAPP fibril model shown in C. (C) A ribbon-like hIAPP fibril model built from the structure shown in A.

A fibril model of Aβ, an amyloid protein associated with Alzheimer's disease, was also constructed (Figure 3.12). The modeling is based on a structural model (PDB ID: 2LMP) obtained by simulated annealing with constraints from solid-state NMR data of twisted Aβ fibrils (Paravastu et al., 2008). A one-layer structure extracted from PDB ID 2LMP and the Aβ fibril model built from this one-layer structure are shown in Figure 3.12 A and C, respectively. The fibril model consists of three protofilaments, and a 1000-Å pitch was applied to each protofilament in this model. The arrangement of the peptides within a protofilament is shown in Figure 3.12 B. A stagger with about a 2-peptide spacing is observed between the two β-strands of each peptide.

Figure 3.12. Modeling of a twisted Aβ fibril using MFIBRIL based on a structure presented in Paravastu et al., 2008. (A) One layer of the structure (PDB ID: 2LMP) presented in Paravastu et al., 2008. (B) The arrangement of the peptides within a protofilament of the Aβ fibril model shown in C. (C) A twisted Aβ fibril model built from the structure shown in A.

Figure 3.13. User interface and a demonstration of the MFIBRIL website. (A) The user interface for submitting a modeling request in the website. The request form was filled as an example. (B) The user interface showing the result. (C) The input PDB structure indicated in A. (D) The resulting fibril model produced for the request shown in A.

3.4.7.
The MFIBRIL Website

In order to make the MFIBRIL program publicly available, a website was constructed (current URL: http://chemsoft.hsc.usc.edu:8080/MFIBRIL/). Figure 3.13 shows the user interface of the MFIBRIL website and demonstrates how to use the website to build a protofilament model. The form in the user interface (Figure 3.13 A) contains three sections: section 1 allows upload of a PDB file of a single unit and input of the number of residues in this unit; section 2 sets the parameters for the fibril modeling; and section 3 accepts optional input of an email address to receive the modeling result. When a user clicks a parameter field in section 1 or 2, more detailed instructions about the parameter appear beside the field. For example, the light yellow box in Figure 3.13 A shows the pop-up instruction when a user clicks the first blank of the parameter "Four Residue Nos. to determine the y-z plane and y axis". When a user clicks "Submit", a simple check is performed to detect problems with the user input: for example, a required field that was not filled or an input value that is out of range. If the user inputs pass the check, the request is submitted to the server. Within a few seconds, a link to the resulting fibril model is displayed (Figure 3.13 B) so that the user can download the result by clicking the link. If an email address was input and the structure is not too big to send, a copy of the resulting structure is also sent to the user by email. The user can also click the "Back to the form" link to go back to the form and submit another request.

An example run of the MFIBRIL website is shown in Figure 3.13. Figure 3.13 A shows the input values filled in the request form, for which the input PDB structure is shown in Figure 3.13 C. Figure 3.13 B shows the resulting link and Figure 3.13 D shows the resulting fibril model downloaded from the link.

3.5.
Discussion

The MFIBRIL algorithm permits generation of idealized protofilaments based on models from simulation. The idealized models keep the important features of the simulation models, but are more ordered. These idealized protofilaments are suitable for further model building of hIAPP fibrils, which can also be achieved in MFIBRIL. Beyond the specific application to hIAPP, MFIBRIL can be used for building any fibrillar aggregate of identical peptides. Polymorphism is common for amyloid proteins, including amyloid β and α-synuclein, and recent theories suggest that different conformations of amyloid aggregates may play distinct roles in disease (Reinke and Gestwicki, 2011). Thus, studying the structural details may reveal different aggregation mechanisms that have different roles in disease. Idealized models generated in MFIBRIL will also provide targets for screening for potential biomarkers and therapeutic agents using docking techniques.

The results obtained with variation of stagger and pitch in MFIBRIL indicate that hIAPP protofilaments may be limited to staggers of about 2 to 4 peptides (centered on 3) and pitches between about 250 and 500 Å. These parameters gave a sheet-sheet distance and a protofilament radius consistent with experimental data (Goldsbury et al., 1997; Sumner Makin and Serpell, 2004) and in line with our EPR-constrained simulation models (Bedrood et al., 2012). As stagger approaches 4 peptides or the pitch decreases to <250 Å, sheet-sheet clashes prevent formation of the protofilament. Conversely, as the stagger falls below 2 peptides or the pitch increases to 1000 Å and above, the protofilament structure opens up and hydrophobic surfaces within the horseshoe peptide conformation have greater exposure to water. The balance between these extremes may provide the most stable protofilament. Putative models for fibrils based on the structural and mutagenesis information can be tested through construction of idealized structures in MFIBRIL.
Indeed, our main goal in developing the program is to build experimentally testable models for amyloid fibrils with defined staggers, pitches and offsets. The initial definition of the layer will be based on the proposed peptide-peptide interactions in a particular fibril, with two or more interacting protofilaments. It is likely that only specific combinations of stagger, pitch and offset will be viable for a particular assembly, based on steric considerations only, and this will limit the number of models. For models that are sterically viable, we can then determine label mobilities and pairwise inter-spin label distances for multiple pairs of sites using another algorithm in our laboratory, PRONOX (Hatmal et al., 2012). With this algorithm, we can generate data in a form consistent with that obtained from EPR DEER experiments (Hatmal et al., 2012), which may allow the predicted data to be validated experimentally. This approach is explored further in Chapter 4.

Chapter 4 Modeling and Prediction of the Fibril Structure of Human Islet Amyloid Polypeptide (hIAPP)

4.1. Introduction

Chapter 4 demonstrates the use of MFIBRIL (Li et al., 2012) combined with NAMD (Phillips et al., 2005) and another program developed in our laboratory, PRONOX (Hatmal et al., 2012), to investigate hIAPP fibril structures. MFIBRIL was used to build fibril models with different potential interaction interfaces, and then equilibration using molecular dynamics (MD) simulations in NAMD was performed to refine the models. Energies of the models, including electrostatic and van der Waals interactions, were analyzed by NAMD and VMD (Humphrey et al., 1996). Inter-label (DEER) distances, residue mobilities and ring-to-ring distances in models retrieved from the MD simulations were predicted using PRONOX and the MFIBRIL analysis tool and then compared to the experimental results.
Based on the model energies and the predicted and experimental DEER distances, a favorable hIAPP fibril model was selected. This selected model could be used to aid the design of further DEER experiments to obtain more precise hIAPP fibril structures.

4.2. Background

Electron paramagnetic resonance using site-directed spin labeling can be used as an approach for determination of protein structures that are difficult to solve by other methods. One important aspect of this approach is the measurement of inter-label distances using the double electron–electron resonance (DEER) method. Interpretation of experimental data could be facilitated by a computational approach to calculation of inter-label distances. PRONOX (Hatmal et al., 2012) is a program developed in our laboratory for rapid computation of inter-label distances based on calculation of spin label conformer distributions at any site of a protein. The program incorporates features of the label distribution established experimentally, including weighting of favorable conformers of the label. The use of PRONOX for fitting to DEER data for determination of protein tertiary structure has been demonstrated (Hatmal et al., 2012).

Several models of hIAPP fibrils have recently been proposed. Luca et al. (Luca et al., 2007) determined a model for ribbon-like hIAPP fibrils using scanning transmission EM, solid-state NMR (SSNMR) and modeling, in which each peptide has a horseshoe shape formed by a β-strand (residues 8 to 16), a connection region (17 to 27), and a second β-strand (28 to 37). MD simulations based on this model indicated potential formation of fibrils containing two or three protofilaments with different organizations and interaction interfaces (Zhao et al., 2011a, b). Further important structural information has been derived from X-ray structures of several hIAPP fragments, including NFLVHSS (residues 14-20), NNFGAIL (21-27) and SSTNVG (28-33) (Wiltzius et al., 2008).
A 3-fold model was proposed from two-dimensional infrared spectroscopy and MD simulations (Wang et al., 2011). The exact fibril model and the interaction interface between the protofilaments are still unclear.

4.3. Methods

4.3.1. Fibril Constructions by MFIBRIL

The detailed algorithm for fibril construction in MFIBRIL has been covered in Chapter 3. In the fibril construction described in the current chapter, the monomeric unit used was the representative peptide (residues 12-36) shown in Figure 3.4 (Chapter 3), in conjunction with the fragment of residues 1-11 in the model presented in Figure 11 of Luca et al., 2007. Four basic fibril models (model_11, model_12, model_22 and model_LL) with different interaction interfaces were constructed using MFIBRIL. Each model consists of two protofilaments and each protofilament consists of 35 peptides. The two protofilaments interact with each other through the first (N-terminal) β-strand in each interacting peptide (model_11), the first β-strand in one peptide and the second (C-terminal) β-strand in the second peptide (model_12), the second β-strands (model_22), and the inter-strand (loop) regions (model_LL) (Figure 4.1, Figure 4.2). Twenty-three more models derived from model_22 were also constructed by adjusting the separation distance and the shift between the two protofilaments. A 500-Å pitch was applied to each protofilament for all these models, i.e. the pitch of a fibril model is 250 Å (Figure 4.1).

Figure 4.1. Starting structures of Model_11 (A), Model_12 (B), Model_22 (C) and Model_LL (D). Two views with a 90º rotation are shown for each fibril model. In each fibril structure, one protofilament is colored in green, and the other protofilament is colored in cyan.

Figure 4.2. Two peptides in the same layer in starting structures of fibril model_11 (A), model_12 (B), model_22 (C) and model_LL (D). Residues 12-19 and 31-36 are displayed as β-strands.

4.3.2.
Molecular Dynamics Simulations

The hIAPP fibril models constructed by MFIBRIL were input into the MFIBRIL accessory package to generate files for solvation of the fibril. In this step, residue Y37 of hIAPP (the C-terminal residue) was also added to each peptide using the VMD psfgen plugin (Humphrey et al., 1996). Using the output files from the preparation step, each model was solvated in a water box such that there is a layer of water of 12 Å in each direction from the atom with the largest coordinate in that direction. The solvation was performed in VMD (Humphrey et al., 1996). Simulations with periodic boundary conditions were performed on each solvated structure using NAMD (Phillips et al., 2005) with the CHARMM27 force field. After minimization for 1000 steps, MD trajectories of 10 ns (for the four basic models: Model_11, Model_12, Model_22 and Model_LL) or 1 ns (for the modified Model_22 models) were run at 310 K. In the simulation, the timestep was 1 fs, the non-bonded cutoff was 10 Å, and Langevin dynamics and PME were turned on. Structures from the trajectories were saved every 1 ps (1000 steps). Analyses of RMSD and energies were performed using VMD and the NAMD Energy plugin.

4.3.3. Experimental Data Prediction for the Fibril Models

PRONOX is a program developed in our laboratory to calculate the inter-label distances based on the calculation of spin label conformer distributions (Hatmal et al., 2012). In this chapter, PRONOX was used to predict inter-label (DEER) distances and residue mobilities for the different hIAPP fibril models. Input files and scripts for the PRONOX calculations were generated by the MFIBRIL accessory package. The MFIBRIL analysis tool was also used to calculate ring-to-ring distances between aromatic residues. Calculations of inter-label distances were performed using PRONOX on structures obtained every 1 ns in the simulations.
For each structure, the 50 peptides in the middle 25 "layers" of the 35-layer fibril were used in the calculation. The PRONOX analysis was performed using a clash cutoff of 0.40 and weighting of 0.90 for favored torsion angles and without fine search (Hatmal et al., 2012). Residue mobility was also predicted by calculating the number of spin label conformers at each residue. A larger number of conformers is likely to indicate higher mobility of a residue. The mean or median values were used to represent the overall predicted mobilities of residues 8-36 for each fibril model.

Ring-to-ring distances between two pairs of aromatic residues, Y37 and F15, and Y37 and F23, were calculated by the analysis tool in the MFIBRIL accessory package. For each peptide in the middle 25 "layers" of the 35-layer fibril, the ring-to-ring distances between its Y37 and the closest F15 and the closest F23 within the fibril were measured. These results are presented as ring-to-ring distances of the pairs Y37-F15 and Y37-F23, respectively. Here, the ring-to-ring distance is the distance between the centers of two aromatic rings. Data analysis was performed by SAS 9.2 (SAS Institute Inc., Cary, NC).

4.4. Results

4.4.1. Simulation Results of the hIAPP Fibril Models

Simulations with periodic boundary conditions were performed on the four fibril models (Figure 4.2) for 10 ns at 310 K. All four models were stable in the simulation. Figure 4.3 shows the backbone root-mean-square deviations (RMSDs) of the four fibril models during the simulation. In all four models, the backbone RMSDs first increased quickly and then the increase slowed down. After about 6 ns, the RMSDs started to stabilize. The RMSDs also suggest that a longer simulation might be required to fully equilibrate the system, especially for model_LL and model_12.

Figure 4.3. Backbone RMSDs of the fibril model of model_11 (A), model_12 (B), model_22 (C) and model_LL (D).
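As an illustration of the quantity plotted in Figure 4.3, the following is a minimal Python sketch of a backbone RMSD calculation. It is not the VMD routine used in this work: the function name `rmsd` is hypothetical, the coordinates are toy values, and the sketch assumes that the two coordinate sets contain the same backbone atoms in the same order and have already been superimposed (no fitting is performed).

```python
import math

def rmsd(reference, frame):
    """Root-mean-square deviation between two matched coordinate lists.

    Each argument is a list of (x, y, z) tuples in Å for the same atoms in
    the same order. Assumption: the structures are already superimposed,
    so no translational or rotational fit is applied here.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, frame):
        # Accumulate the squared displacement of each atom.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))

# Toy example: two atoms, each displaced by 5 Å relative to the reference.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
later = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
print(rmsd(start, later))
```

Applying such a function to each saved frame against the starting structure would give an RMSD time course of the kind shown in the figure, provided that a superposition step is carried out first.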
Energies of the entire systems of all four models decrease quickly at the beginning of the simulation and then become stable within the first 0.05 ns. Energies including electrostatic interactions and van der Waals interactions within the fibril and between the fibril and the solvent (water molecules) were also analyzed over the last 0.5 ns of the simulation (Table 4.1). The results show that the electrostatic interactions made a greater contribution to the energies than the van der Waals interactions. Model_11 shows relatively fewer electrostatic interactions within the fibril compared to the other models, but relatively more electrostatic interactions between the fibril and the solvent. Model_22 shows relatively more van der Waals interactions within the fibril, but relatively fewer van der Waals interactions between the fibril and the solvent.

Table 4.1. Average energies of the fibril structures over the last 0.5 ns of the simulations.

        Energies within fibril                    Energies between fibril and water
Model   Electrostatic   Van der Waals   Total     Electrostatic   Van der Waals   Total
_11     -29579          -8780           -38359    -77611          -4834           -82446
_12     -30697          -8894           -39591    -75906          -4852           -80758
_22     -30288          -8967           -39255    -76536          -4790           -81326
_LL     -30640          -8470           -39111    -75902          -5241           -81143

The fibril structures after simulations for 10 ns are shown in Figure 4.4, and the middle layers of these fibril structures are shown in Figure 4.5. Based on observation of contacts between complementary amino acids, the two protofilaments in Model_22 exhibit more favorable interactions compared to the other models.

Figure 4.4. Fibril structures of Model_11 (A), Model_12 (B), Model_22 (C) and Model_LL (D) after simulations for 10 ns. Two views with a 90º rotation are shown for each fibril model. In each fibril structure, one protofilament is colored in green, and the other protofilament is colored in cyan.

Figure 4.5.
The middle peptide layer of the fibril structures of Model_11 (A), Model_12 (B), Model_22 (C) and Model_LL (D) after simulations for 10 ns. The peptide is colored to show the backbone (green), non-aromatic hydrophobic residues (pink), aromatic residues (violet), non-aromatic polar residues (cyan), positive residues (blue) and cysteine (yellow).

4.4.2. DEER Distance Prediction

Eighteen DEER distances were predicted by PRONOX on structures obtained every nanosecond in the simulations. The differences between the predicted and experimental DEER distances (Diffpred-exp) were calculated (for the experimental data, see Chapter 2, Table 2.2). Boxplots of Diffpred-exp over the 10 ns simulations (Figure 4.6) show that the Diffpred-exp values obtained from different structures are consistent within the same model. Figure 4.7 shows the distribution of all 18 DEER distances (the DEER No. refers to Table 2.2) in all four basic models. These results show that almost all the predicted distances are approximately normally distributed. Based on the data shown in Bedrood et al., 2012 and Chapter 2, Table 2.2, the experimental data of DEER No. 1, 2, 3, 9, 10, 15 and 18 have the highest quality. For these high-quality label pairs, the predicted distances match the experimental distribution very well, except for DEER No. 15, for which the predicted distance is 5-8 Å smaller than the experimental value. The predicted distances for other label pairs are also close to the experimental data, except for some pairs with lower experimental quality, such as DEER No. 11, 12 and 13. These results show that although the four basic models have different interfacial interactions and have undergone 10 ns simulations, they maintain the conformational features within each peptide that match the experimental DEER distances.

Figure 4.6.
Boxplot of the differences between the predicted and experimental DEER distances (Diffpred-exp) for structures obtained every 1 ns in the 10 ns simulation of Model_11, _12, _22 and _LL, for the DEER distance between residues 13 and 24 (DEER No. 4). The data points at -1000 ps and 0 ps are for the starting structure and the structure after minimization, respectively. Boxplots for the other 17 DEER distances were calculated in a similar manner.

Figure 4.7. Histograms of Diffpred-exp aggregated over the last 4 ns in MD simulations for Model_11, Model_12, Model_22 and Model_LL. The reference line in each histogram is x=0, and the blue curve shows the fitting of a normal distribution. The DEER No. refers to Table 2.2.

4.4.3. Ring-to-ring Distance Prediction

The ring-to-ring distances of Y37-F15 and Y37-F23 were measured in the structures after 1 ns and 10 ns of the simulations. The ring-to-ring distance is calculated between the centers of two rings from a Y37 and its closest F15 (or F23) in the fibril. For Y37-F15 (Figure 4.8) among the four models, Model_LL shows the smallest ring-to-ring distances (about 17 Å), and Model_22 shows the second smallest ring-to-ring distances (about 20 Å). The closest F15 to Y37 in a peptide usually comes from the peptide 2-4 layers below and within the same protofilament for all four models. This layer difference is consistent with the stagger (about 3-peptide length) between the two β-strands of a peptide. For Y37-F23 (Figure 4.9) among the four models, Model_22 shows the smallest ring-to-ring distances (about 15 Å) and the other models have much larger ring-to-ring distances. In Model_22, the closest F23 to Y37 in a peptide usually comes from the peptide 1 layer above and in the opposite protofilament.
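The ring-to-ring measurement described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MFIBRIL analysis tool: the function names are hypothetical, the ring center is assumed to be the centroid of the six ring carbons (named CG, CD1, CD2, CE1, CE2 and CZ in standard PDB nomenclature for Phe and Tyr), and the coordinates in the example are toy values.

```python
import math

# Six-membered ring carbons of Phe and Tyr in standard PDB atom naming.
RING_ATOMS = ("CG", "CD1", "CD2", "CE1", "CE2", "CZ")

def ring_center(atoms):
    """Centroid of the aromatic ring.

    `atoms` maps atom names to (x, y, z) coordinates in Å for one residue.
    """
    points = [atoms[name] for name in RING_ATOMS]
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))

def closest_ring_distance(query_center, candidate_centers):
    """Distance from one ring center to the nearest candidate ring center."""
    return min(math.dist(query_center, c) for c in candidate_centers)

def toy_ring(cx, cy, cz, radius=1.4):
    """Hypothetical helper: a planar hexagon of ring atoms around a center."""
    return {
        name: (cx + radius * math.cos(math.radians(60 * k)),
               cy + radius * math.sin(math.radians(60 * k)),
               cz)
        for k, name in enumerate(RING_ATOMS)
    }

# One Y37 ring and two candidate F15 rings at toy positions along z.
y37 = ring_center(toy_ring(0.0, 0.0, 0.0))
f15_rings = [ring_center(toy_ring(0.0, 0.0, 17.0)),
             ring_center(toy_ring(0.0, 0.0, 20.0))]
print(closest_ring_distance(y37, f15_rings))
```

In a real fibril, the candidate list would hold the F15 (or F23) ring centers of every peptide in both protofilaments, so that the nearest partner can come from a different layer or from the opposite protofilament.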
Fluorescence resonance energy transfer (FRET) has shown that Y37 is near F15 and F23: the FRET efficiency suggested either both distances were about 10.2 Å or one was very short while the other was about 11 Å (Padrick and Miranker, 2001). Compared to the other three models, Model_22 matches these experimental data best overall, although its ring-to-ring distances are still higher than the values suggested by FRET. Since the diameter of a ring is about 2.8 Å and our prediction measured the distances between the ring centers, the closest distances of the two rings could be 2.8 Å smaller than our predicted distances. Also, a derived model with a smaller pitch could reduce the ring-to-ring distances of Y37-F15 due to the relationship of the distance with stagger, and a slight shift between the two protofilaments may also give smaller ring-to-ring distances for Y37-F23.

Figure 4.8. Ring-to-ring distances of Y37-F15 for the structure at 1 ns (top panel) and 10 ns (bottom panel) in Model_11, Model_12, Model_22 and Model_LL (from left to right).

Figure 4.9. Ring-to-ring distances of Y37-F23 for the structure at 1 ns (top panel) and 10 ns (bottom panel) in Model_11, Model_12, Model_22 and Model_LL (from left to right).

4.4.4. Heterogeneity in Model_22

Layer-by-layer structural analysis revealed that Model_22 changes its inter-protofilament interaction along the fibril during the simulation: the interaction interfaces in the first half of the fibril differ from those in the second half after simulation for a few nanoseconds. Such interfacial change is not observed in the other three models. Figure 4.10 A shows a typical peptide layer in the first half of the fibril, with an interface that mixes hydrophilic and hydrophobic interactions. Figure 4.10 B shows a typical peptide layer in the second half of the fibril, with an interface in which a strong hydrophobic interaction wraps around a hydrophilic interaction.
Analysis of residue mobilities and ring-to-ring distances shows that the second half peptide has slightly better agreement with the experimental data than the first half peptide.

Figure 4.10. A typical peptide layer in the first half (A) and second half (B) of the fibril in Model_22 after simulation for 10 ns. The peptide is colored to show the backbone (green), non-aromatic hydrophobic residues (pink), aromatic residues (violet), non-aromatic polar residues (cyan), positive residues (blue) and cysteine (yellow).

4.4.5. Models Derived from Model_22

The above results show that among the four basic models, Model_22 is the most promising in terms of appropriate interactions at the interface and fit to experimental data (aromatic ring-to-ring distances). Also, the heterogeneity of Model_22 suggests that better models might exist with slight changes from Model_22. Therefore, 23 models derived from model_22 were constructed by adjusting the separation distance (from 6 Å to 12 Å) and the lateral shift (from -4 Å to 16 Å) between two protofilaments. To represent the 23 derived models, we use the id format d[x]s[y], where [x] indicates half of the separation (in Å) between the two protofilaments, and [y] indicates half of the lateral shift (in Å) between the two protofilaments. The original Model_22 is equivalent to d3s0, i.e. the separation distance is about 2*3 = 6 Å, and the lateral shift is 2*0 = 0 Å. Fits to experimental data were performed on these models after 1 ns simulations. The DEER distance prediction shows similar results for all the models. The mobility prediction (Figure 4.11) shows that the periodicities of the first β-strand (residues 12-19) in all the derived models are consistent with the experimental data, as was the case for the original Model_22, while for the second β-strand region (residues 31-36), models with larger shift or larger separation distance are more likely to match the experimental data. All models with lateral shifts of 12 Å (i.e.
s6) or 16 Å (i.e. s8), and those with separation distances of 10-12 Å (i.e. d5-d6) and lateral shifts >4 Å match the experimental mobilities in the two β-strand regions, while the other models do not completely match the experimental data in the second β-strand. For the ring-to-ring distances of Y37-F15 (Figure 4.12, upper panel), all the models show generally consistent results, although a few models, such as d4s4 and d6s6, show lower values. For the ring-to-ring distances of Y37-F23 (Figure 4.12, lower panel), the models with separation distances ≤8 Å (except d4s8) show significantly lower values than the models with larger separation distances. With consideration of both the mobility prediction and ring-to-ring distances, models d3s6, d3s8 and d4s6 match the experimental data best (Figure 4.13). Interestingly, all three of these models have interaction interfaces close to those of the second half peptides in the original Model_22. All their interaction interfaces have a strong hydrophobic interaction between I26 and V32 surrounding a hydrophilic interaction between S28 and T30.

Figure 4.11. Experimental mobility data (the last panel) and the mean (red) and median (blue) of the number of conformers, i.e. predicted mobility, of each labeled residue in the original Model_22 (first panel) and the 23 modified models after 1 ns simulations. The experimental mobility values are the inverse central line widths of EPR spectra (Bedrood et al., 2012).

Figure 4.12. Ring-to-ring distances of Y37-F15 (upper panel) and Y37-F23 (lower panel) for the original Model_22 (left most) and the 23 modified models after 1 ns simulations.

Figure 4.13. Individual peptides from the starting structures (upper figures) and from structures after 1 ns simulations (lower figures) in the d3s6 (A), d3s8 (B) and d4s6 (C) models.
The peptide is colored to show the backbone (green), non-aromatic hydrophobic residues (pink), aromatic residues (violet), non-aromatic polar residues (cyan), positive residues (blue) and cysteine (yellow).

4.5. Discussion

In this study, four hIAPP fibril models (twisted morphology), Model_11, Model_12, Model_22 and Model_LL, with different interaction interfaces were constructed, equilibrated and evaluated. All the models are stable in the simulation with periodic boundary conditions at physiological temperature (310 K). Also, all are consistent with the experimental DEER distances. Among the four basic models investigated, Model_22 exhibited a favorable interfacial interaction and has the best agreement with FRET data, which suggest that Y37 is near both F15 and F23. The models derived from Model_22 with a large lateral shift (about 12-16 Å) between the two protofilaments have good agreement with the experimental mobilities, and their ring-to-ring distances of Y37-F23 are also close to the FRET data when the separation distances are not greater than 8 Å. The interaction interfaces in all the favorable derived models have a strong hydrophobic interaction between I26 and V32 surrounding a hydrophilic interaction between S28 and T30 (Figure 4.13). These results suggest a potential inter-protofilament interface interacting via the second (C-terminal) β-strand regions. This finding is consistent with models proposed for the ribbon-like hIAPP fibril, in which the two protofilaments also interact with each other by the second β-strand regions (Luca et al., 2007). MD simulations based on this model also indicated that the interaction between the two C-terminal β-strands is more favorable than the interaction between the two N-terminal β-strands, which in turn is more favorable than the interaction between one N-terminal and one C-terminal β-strand (Zhao et al., 2011b).
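The geometric criteria discussed above (a lateral shift of about 12-16 Å and a separation of no more than 8 Å) can be read directly from the d[x]s[y] identifiers of the derived models, in which [x] and [y] are half of the separation and half of the lateral shift in Å. The following Python sketch decodes the identifiers and applies this screen. The function names are hypothetical, the accepted format for negative shifts (a minus sign after "s") is an assumption, and the screen reproduces only the geometric filter, not the full comparison with the mobility and ring-to-ring data.

```python
import re

# Assumed identifier format: d<half separation>s<half lateral shift>,
# with an optional minus sign for negative shifts (an assumption).
_MODEL_ID = re.compile(r"^d(\d+)s(-?\d+)$")

def parse_model_id(model_id):
    """Return (separation, lateral_shift) in Å for an id such as 'd3s6'."""
    match = _MODEL_ID.match(model_id)
    if match is None:
        raise ValueError(f"unrecognized model id: {model_id!r}")
    half_separation = int(match.group(1))
    half_shift = int(match.group(2))
    # The id stores half values, so both are doubled.
    return 2 * half_separation, 2 * half_shift

def geometric_screen(model_ids, max_separation=8, min_shift=12, max_shift=16):
    """Keep ids whose separation and lateral shift fall in the given ranges."""
    kept = []
    for model_id in model_ids:
        separation, shift = parse_model_id(model_id)
        if separation <= max_separation and min_shift <= shift <= max_shift:
            kept.append(model_id)
    return kept

# Identifiers mentioned in the text; d3s0 is the original Model_22.
ids = ["d3s0", "d3s6", "d3s8", "d4s4", "d4s6", "d4s8", "d6s6"]
print(parse_model_id("d3s0"))
print(geometric_screen(ids))
```

Models that pass such a screen still need to be compared with the experimental mobilities and ring-to-ring distances, since the geometric ranges alone do not distinguish, for example, d4s8 from the three models selected in Section 4.4.5.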
Although the derived models of Model_22, such as d3s6 and d3s8, have generally good agreement with all the experimental data, some inconsistencies still exist. One is that the mobility predictions in the inter-β-strand region for the current models do not fit the experimental mobility very well. There may be several reasons for this poor fit. First, our current method of mobility prediction is approximate, since it calculates the number of conformers for each labeled residue under two assumptions: (1) the peptide backbone is inflexible, and (2) a larger number of conformers suggests higher mobility. Violations of these assumptions could happen in the following situations: (1) the backbone of a region involved in neither a secondary structure nor a strong interaction could be quite flexible, and (2) experimentally, a residue might stay in a few favorable conformations, even if it has many accessible conformations. These violations might greatly affect the accuracy of our prediction, and a modified prediction method that accounts for them could be developed to reduce such problems. Second, the favorable models we found still need improvement. Slight shifts between the two protofilaments might improve the mobility consistency. Also, the ring-to-ring distances of Y37-F15 and Y37-F23 in the favorable models are still larger than those suggested by the FRET data. Increasing the fibril twist could reduce the ring-to-ring distances of Y37-F15 because of the relation of this distance with stagger, and a slight shift between the two protofilaments may also reduce the ring-to-ring distances of Y37-F23.

Therefore, based on the current favorable models, a series of new models with small changes could be constructed and tested in the same way to obtain improved structures. A longer simulation could be performed on the favorable models to obtain further equilibrated structures.
New EPR experiments might then be designed to test the predicted structure through measurement of new DEER distances.

Chapter 5 Modeling of the Water Network at Protein-RNA Interfaces (WATGEN)

5.1. Introduction

In this chapter, the WATGEN algorithm is used to predict water networks at protein-RNA interfaces. WATGEN has been validated in detail for solvation of protein-peptide interfaces (Bui et al., 2007) and the core of the algorithm used here is unchanged. The major changes made to facilitate the work described below (and published in Li et al., 2011) were extending the solvation function of WATGEN to protein-RNA interfaces and investigating the features of solvated protein-RNA interfaces.

5.2. Background

Water plays an important role in biomolecular association. Water molecules at protein-protein and protein-oligonucleotide interfaces form extensive hydrogen-bonded networks and facilitate the formation and dissociation of these complexes (Levy and Onuchic, 2006; Li and Lazaridis, 2007; Makarov et al., 2002). Water molecules influence molecular recognition, specificity and affinity of protein complexes with other proteins (Hayes et al., 2011; Reichmann et al., 2008; Yokota et al., 2003) and with DNA (Duan and Nilsson, 2002; Luo et al., 2011; Robinson and Sligar, 1998; Sonavane and Chakrabarti, 2009), RNA (Castrignanò et al., 2002; Sonavane and Chakrabarti, 2009), and sugars (Nurisso et al., 2010). Water is also important in the interaction of proteins with small ligands (de Beer et al., 2010) and recognition of the hydration status of a protein binding pocket may guide optimal drug design (Barillari et al., 2007; Homans, 2007).
It is also apparent that not all water molecules at an interface are equivalent: some show rapid exchange with the bulk, while others are relatively tightly bound (Makarov et al., 2002); and some water molecules bound to a protein surface prior to ligand binding are retained in complexes with small molecules (Homans, 2007) and proteins (Reichmann et al., 2008). These observations suggest the need for a method of classification of water molecules in the complex network formed at protein interfaces.

Solvation of a protein surface has been predicted computationally based on X-ray data (Durchschlag and Zipper, 2003; Pitt and Goodfellow, 1991), molecular dynamics simulations (Geroult et al., 2007; Makarov et al., 2002; Virtanen et al., 2010), and grid-based simulation (Michel et al., 2009). Key "wet spots" have been identified as critical features of protein-protein interactions (Teyra and Pisabarro, 2007), and the inclusion of water within docking algorithms is likely to increase the accuracy of predicted structures (Li and Lazaridis, 2007). van Dijk and Bonvin addressed this issue by mimicking the exclusion of water molecules during the formation of an interface (van Dijk and Bonvin, 2006), while the solvated rotamer approach of Jiang et al. (Jiang et al., 2005) permits protein interface design with inclusion of water-mediated hydrogen bonds. We have also described an algorithm, WATGEN, for rapid solvation of a protein-protein interface using optimization of the hydrogen-bonded water-protein network, which we validated against data for water positions from X-ray structures (Bui et al., 2007).

The structural details of protein-RNA interfaces are emerging based on analyses (Bahadur et al., 2008; Ellis et al., 2007; Lewis et al., 2011) of the growing number of solved structures. Treger and Westhof (Treger and Westhof, 2001) first pointed out the key role of water at the protein-RNA interface and Bahadur et al.
(Bahadur et al., 2008) proposed that an average protein-RNA interface contains 32 water molecules, based on X-ray data. The detailed findings for protein-protein complexes and the emerging evidence for protein-RNA interfaces indicate that rational design of these interfaces must take into account the "fit" of the water network at the predicted interface. This requires prediction of this water network and classification of the binding properties of each water molecule in the network.

5.3. Methods

5.3.1. Summary of the WATGEN Algorithm

WATGEN has been described in detail and validated for solvation of protein-peptide interfaces (Bui et al., 2007). Calculation of the water network at a protein-RNA interface was performed using the same method, with modifications only to add geometry and atom information for RNA. Briefly, after addition of hydrogen atoms to the protein and RNA in standard geometries, the water molecules at the interface are calculated in WATGEN in 4 steps (Bui et al., 2007): (1) water sites defined by the oxygen atom of each water molecule are distributed around hydrogen-bonding centers (donors and acceptors) on both sides of the interface; (2) the oxygen sites are classified and may be combined based on their distance to other oxygen sites or hydrogen-bonding sites in the interface; (3) the best sites are selected based on maximization of potential hydrogen-bonding interactions (without hydrogen atoms at this stage) and minimization of van der Waals clashes; and (4) hydrogen atoms are added to each O atom in a geometry that optimizes the number of hydrogen-bonding contacts.

5.3.2. Protein-RNA Complexes

A set of 224 protein-RNA complexes (Table 5.1) was solvated using the WATGEN algorithm.
These complexes were derived from an initial set of 541 PDB files containing protein-RNA complexes, which were downloaded from the RCSB database (Berman et al., 2000) using the advanced search facility with "Macromolecule Type" set to "Contains Protein = Yes, RNA = Yes, DNA = No, DNA/RNA Hybrid = Ignore" and "Text Search" set to "PROTEIN OR RNA NOT RIBOSOME NOT RIBOSOMAL". These PDB files were reduced to 394 unique binary protein-RNA complexes (one protein and one RNA molecule) using the SIMA algorithm, which computes a similarity score for pairwise comparison of two protein-RNA interfaces (Sutch et al., 2009). Binary complexes from a single PDB file were then recombined in two stages to create contiguous interfaces: interacting RNA chains were identified and combined and then all protein chains with an interaction with the combined RNA chain were included. This gave 292 complexes, which were further reduced by elimination of 68 complexes with non-natural RNA nucleotides at the interface (within 20 Å of any amino acid). This left 224 protein-RNA complexes for analysis in WATGEN. The RNA chain was specified as the "ligand" in WATGEN. Solvation was performed using a water box that extended 6 Å from the minimum and maximum x, y and z coordinates of the ligand (Bui et al., 2007).

Table 5.1.
Classification of water molecules at 224 protein-RNA interfaces

Structure ID nWat SWB DWB HPHOB %SWB %DWB %HPHOB
1A1T_A_B 121 44 72 5 36.4% 59.5% 4.1%
1A34_A_BC 78 21 45 12 26.9% 57.7% 15.4%
1A4T_B_A 96 43 52 1 44.8% 54.2% 1.0%
1A9N_AB_Q 176 69 95 12 39.2% 54.0% 6.8%
1AQ3_A_R 89 28 61 0 31.5% 68.5% 0.0%
1AUD_A_B 140 56 77 7 40.0% 55.0% 5.0%
1B7F_A_P 198 77 116 5 38.9% 58.6% 2.5%
1BIV_B_A 124 51 64 9 41.1% 51.6% 7.3%
1BMV_2_M 85 32 51 2 37.6% 60.0% 2.4%
1C9S_LtoV_W 618 241 358 19 39.0% 57.9% 3.1%
1CVJ_BE_N 144 56 79 9 38.9% 54.9% 6.3%
1CX0_A_B 148 55 80 13 37.2% 54.1% 8.8%
1DDL_ABC_D 94 36 55 3 38.3% 58.5% 3.2%
1DI2_AB_CD 89 31 54 4 34.8% 60.7% 4.5%
1DI2_AB_E 71 20 49 2 28.2% 69.0% 2.8%
1DZ5_AB_CD 290 85 184 21 29.3% 63.4% 7.2%
1E7K_A_C 101 39 59 3 38.6% 58.4% 3.0%
1E8O_ABCD_E 220 78 138 4 35.5% 62.7% 1.8%
1EC6_A_D 156 55 99 2 35.3% 63.5% 1.3%
1EIY_AB_C 221 63 153 5 28.5% 69.2% 2.3%
1EKZ_A_B 73 17 46 10 23.3% 63.0% 13.7%
1ETF_B_A 159 60 86 13 37.7% 54.1% 8.2%
1EUQ_A_B 400 162 218 20 40.5% 54.5% 5.0%
1EXY_B_A 145 55 75 15 37.9% 51.7% 10.3%
1F8V_ACE_R 105 26 76 3 24.8% 72.4% 2.9%
1G59_A_B 344 101 227 16 29.4% 66.0% 4.7%
1G70_B_A 130 44 81 5 33.8% 62.3% 3.8%
1GAX_A_C 453 137 298 18 30.2% 65.8% 4.0%
1GTF_LtoV_W 616 253 351 12 41.1% 57.0% 1.9%
1H4Q_AB_T 190 68 114 8 35.8% 60.0% 4.2%
1HJI_B_A 99 41 57 1 41.4% 57.6% 1.0%
1HVU_AB_C 154 37 109 8 24.0% 70.8% 5.2%
1I9F_B_A 122 35 80 7 28.7% 65.6% 5.7%
1J1U_A_B 108 22 80 6 20.4% 74.1% 5.6%
1J2B_AB_C 538 194 334 10 36.1% 62.1% 1.9%
1J2B_AB_D 496 181 293 22 36.5% 59.1% 4.4%
1K1G_A_B 133 49 79 5 36.8% 59.4% 3.8%
1KNZ_AB_W 102 50 51 1 49.0% 50.0% 1.0%
1KOG_ABE_J 271 76 182 13 28.0% 67.2% 4.8%
1KOG_AB_I 240 78 153 9 32.5% 63.8% 3.8%
1KQ2_ABHIKM_R 169 67 97 5 39.6% 57.4% 3.0%
1L1C_AB_C 140 60 73 7 42.9% 52.1% 5.0%
1L9A_A_B 165 59 96 10 35.8% 58.2% 6.1%
1LNG_A_B 163 63 91 9 38.7% 55.8% 5.5%
1M8V_ALMN_O 100 38 58 4 38.0% 58.0% 4.0%
1M8V_ABL_T 88 27 55 6 30.7% 62.5% 6.8%
1M8W_B_D 139 66 71 2 47.5% 51.1% 1.4%
1M8Y_A_C 160 71 86 3 44.4% 53.8% 1.9%
1MFQ_BC_A 214 69 139 6 32.2% 65.0% 2.8%
1N35_A_BC 280 94 181 5 33.6% 64.6% 1.8%
1N38_A_BC 176 65 103 8 36.9% 58.5% 4.5%
1NYB_A_B 90 37 51 2 41.1% 56.7% 2.2%
1OOA_AB_C 221 70 145 6 31.7% 65.6% 2.7%
1P6V_AC_D 154 53 90 11 34.4% 58.4% 7.1%
1P6V_AC_B 196 75 108 13 38.3% 55.1% 6.6%
1PGL_2_3 75 23 49 3 30.7% 65.3% 4.0%
1QZW_A_B 95 25 66 4 26.3% 69.5% 4.2%
1R9F_A_BC 144 43 93 8 29.9% 64.6% 5.6%
1RC7_A_DE 112 37 71 4 33.0% 63.4% 3.6%
1RC7_A_BC 42 15 24 3 35.7% 57.1% 7.1%
1RKJ_A_B 141 52 83 6 36.9% 58.9% 4.3%
1SI3_A_B 113 49 62 2 43.4% 54.9% 1.8%
1T4L_B_A 123 31 84 8 25.2% 68.3% 6.5%
1TFW_AC_GJ 324 110 204 10 34.0% 63.0% 3.1%
1TFW_B_EH 162 56 101 5 34.6% 62.3% 3.1%
1TFY_AC_GJ 273 94 170 9 34.4% 62.3% 3.3%
1U0B_B_A 353 129 214 10 36.5% 60.6% 2.8%
1ULL_B_A 132 66 65 1 50.0% 49.2% 0.8%
1URN_C_R 115 38 68 9 33.0% 59.1% 7.8%
1UTD_LM_0 76 27 48 1 35.5% 63.2% 1.3%
1VFG_A_C 123 25 90 8 20.3% 73.2% 6.5%
1WMQ_A_C 93 23 60 10 24.7% 64.5% 10.8%
1WNE_A_BC 205 74 125 6 36.1% 61.0% 2.9%
1WSU_A_E 87 29 56 2 33.3% 64.4% 2.3%
1WWD_A_B 73 32 39 2 43.8% 53.4% 2.7%
1WWE_A_B 115 33 76 6 28.7% 66.1% 5.2%
1WWF_A_B 109 43 62 4 39.4% 56.9% 3.7%
1WZ2_A_C 465 132 316 17 28.4% 68.0% 3.7%
1YTU_B_EF 117 50 65 2 42.7% 55.6% 1.7%
1YTY_AB_CD 287 107 171 9 37.3% 59.6% 3.1%
1YVP_B_H 145 60 81 4 41.4% 55.9% 2.8%
1YYK_AB_DEF 111 42 69 0 37.8% 62.2% 0.0%
1YYW_C_KL 88 31 57 0 35.2% 64.8% 0.0%
1YYW_A_GH 94 34 53 7 36.2% 56.4% 7.4%
1YYW_A_EF 56 12 43 1 21.4% 76.8% 1.8%
1YZ9_A_CDEF 165 55 107 3 33.3% 64.8% 1.8%
1ZBH_AD_F 136 45 81 10 33.1% 59.6% 7.4%
1ZBH_AD_E 162 54 100 8 33.3% 61.7% 4.9%
1ZDJ_A_R 64 17 47 0 26.6% 73.4% 0.0%
1ZSE_AB_R 114 36 78 0 31.6% 68.4% 0.0%
2A8V_B_E 58 16 40 2 27.6% 69.0% 3.4%
2A9X_1_2 88 29 52 7 33.0% 59.1% 8.0%
2AD9_A_B 94 33 57 4 35.1% 60.6% 4.3%
2ADC_A_C 103 41 58 4 39.8% 56.3% 3.9%
2ADC_A_B 126 50 76 0 39.7% 60.3% 0.0%
2AKE_A_B 109 29 72 8 26.6% 66.1% 7.3%
2AZX_B_D 74 22 44 8 29.7% 59.5% 10.8%
2B2E_AB_R 162 55 105 2 34.0% 64.8% 1.2%
2B6G_A_B 38 10 26 2 26.3% 68.4% 5.3%
2BBV_CF_N 51 15 36 0 29.4% 70.6% 0.0%
2BGG_A_PQ 152 55 94 3 36.2% 61.8% 2.0%
2BTE_AD_B 379 110 253 16 29.0% 66.8% 4.2%
2BX2_L_R 78 21 51 6 26.9% 65.4% 7.7%
2C06_AB_C 97 41 55 1 42.3% 56.7% 1.0%
2C4R_L_R 149 42 100 7 28.2% 67.1% 4.7%
2CSX_A_C 223 64 155 4 28.7% 69.5% 1.8%
2CT8_A_C 196 57 136 3 29.1% 69.4% 1.5%
2D6F_C_E 149 32 111 6 21.5% 74.5% 4.0%
2DEU_AB_D 276 92 172 12 33.3% 62.3% 4.3%
2DR5_A_B 189 78 107 4 41.3% 56.6% 2.1%
2DU3_A_D 144 41 90 13 28.5% 62.5% 9.0%
2DU5_A_D 153 43 103 7 28.1% 67.3% 4.6%
2EC0_A_BC 224 80 141 3 35.7% 62.9% 1.3%
2ERR_A_B 120 49 62 9 40.8% 51.7% 7.5%
2ESE_A_B 88 29 48 11 33.0% 54.5% 12.5%
2EZ6_AB_CD 439 128 295 16 29.2% 67.2% 3.6%
2F8K_A_B 80 26 53 1 32.5% 66.3% 1.3%
2F8S_A_CD 96 34 62 0 35.4% 64.6% 0.0%
2FK6_A_R 158 43 112 3 27.2% 70.9% 1.9%
2FY1_A_B 135 52 77 6 38.5% 57.0% 4.4%
2G4B_A_B 78 30 48 0 38.5% 61.5% 0.0%
2GIC_ABCDE_R 929 310 593 26 33.4% 63.8% 2.8%
2GJE_AD_RS 211 65 143 3 30.8% 67.8% 1.4%
2GTT_LtoV_X 1904 652 1202 50 34.2% 63.1% 2.6%
2GTT_AtoK_W 1923 669 1211 43 34.8% 63.0% 2.2%
2HGH_A_B 175 50 110 15 28.6% 62.9% 8.6%
2HT1_AB_M 122 37 83 2 30.3% 68.0% 1.6%
2I91_AB_CD 273 115 150 8 42.1% 54.9% 2.9%
2IHX_A_B 242 81 151 10 33.5% 62.4% 4.1%
2IPY_A_C 213 83 129 1 39.0% 60.6% 0.5%
2IX1_A_B 299 100 186 13 33.4% 62.2% 4.3%
2IY5_AB_T 259 71 167 21 27.4% 64.5% 8.1%
2IZN_C_S 65 22 42 1 33.8% 64.6% 1.5%
2JEA_AB_C 119 41 73 5 34.5% 61.3% 4.2%
2JPP_AB_D 163 55 101 7 33.7% 62.0% 4.3%
2JPP_AB_C 124 43 76 5 34.7% 61.3% 4.0%
2KDQ_A_B 100 37 61 2 37.0% 61.0% 2.0%
2KFY_A_B 75 35 35 5 46.7% 46.7% 6.7%
2KH9_A_B 80 28 47 5 35.0% 58.8% 6.3%
2KM8_BC_A 238 78 142 18 32.8% 59.7% 7.6%
2KX5_B_A 102 30 59 13 29.4% 57.8% 12.7%
2L2K_B_A 137 44 85 8 32.1% 62.0% 5.8%
2L3C_A_B 106 27 68 11 25.5% 64.2% 10.4%
2L3J_A_B 192 63 117 12 32.8% 60.9% 6.3%
2NR0_ACD_G 317 103 199 15 32.5% 62.8% 4.7%
2NR0_ABC_E 233 56 163 14 24.0% 70.0% 6.0%
2NR0_CD_H 186 48 132 6 25.8% 71.0% 3.2%
2NR0_AB_F 272 83 176 13 30.5% 64.7% 4.8%
2NUE_AB_C 163 54 106 3 33.1% 65.0% 1.8%
2NUG_AB_CDEF 429 144 268 17 33.6% 62.5% 4.0%
2OZB_AB_C 193 73 113 7 37.8% 58.5% 3.6%
2PLY_AB_C 195 61 130 4 31.3% 66.7% 2.1%
2PY9_AC_F 184 61 116 7 33.2% 63.0% 3.8%
2QUX_ABDE_C 184 59 122 3 32.1% 66.3% 1.6%
2R7T_A_X 171 55 109 7 32.2% 63.7% 4.1%
2R7V_A_X 113 36 76 1 31.9% 67.3% 0.9%
2R8S_HL_R 202 62 132 8 30.7% 65.3% 4.0%
2R92_AB_PT 333 103 222 8 30.9% 66.7% 2.4%
2R93_AB_R 228 67 161 0 29.4% 70.6% 0.0%
2RKJ_ABEF_CDG 629 160 436 33 25.4% 69.3% 5.2%
2V3C_BCD_N 339 125 198 16 36.9% 58.4% 4.7%
2V3C_AC_M 474 163 292 19 34.4% 61.6% 4.0%
2W2H_AC_R 142 35 97 10 24.6% 68.3% 7.0%
2W2H_BD_S 139 33 102 4 23.7% 73.4% 2.9%
2WYY_ACDFHM_R 980 279 640 61 28.5% 65.3% 6.2%
2XD0_AE_G 330 119 199 12 36.1% 60.3% 3.6%
2XGJ_A_C 114 42 72 0 36.8% 63.2% 0.0%
2XNR_A_C 65 22 40 3 33.8% 61.5% 4.6%
2ZI0_AB_CD 383 146 231 6 38.1% 60.3% 1.6%
2ZKO_AB_CD 176 61 108 7 34.7% 61.4% 4.0%
2ZM5_AB_C 317 112 194 11 35.3% 61.2% 3.5%
2ZNI_AB_D 296 112 173 11 37.8% 58.4% 3.7%
2ZNI_AB_C 290 105 181 4 36.2% 62.4% 1.4%
2ZUE_A_B 384 134 230 20 34.9% 59.9% 5.2%
2ZZM_A_B 376 126 235 15 33.5% 62.5% 4.0%
2ZZN_AB_C 505 164 323 18 32.5% 64.0% 3.6%
2ZZN_AB_D 388 130 247 11 33.5% 63.7% 2.8%
3A2K_AB_C 402 161 233 8 40.0% 58.0% 2.0%
3A2K_AB_D 408 165 230 13 40.4% 56.4% 3.2%
3A6P_FH_IJ 361 102 246 13 28.3% 68.1% 3.6%
3A6P_AC_DE 406 136 260 10 33.5% 64.0% 2.5%
3ADB_A_C 339 86 238 15 25.4% 70.2% 4.4%
3ADC_B_D 247 68 167 12 27.5% 67.6% 4.9%
3ADC_A_C 332 96 221 15 28.9% 66.6% 4.5%
3ADI_AB_DE 53 17 33 3 32.1% 62.3% 5.7%
3AEV_B_C 170 69 91 10 40.6% 53.5% 5.9%
3AKZ_ABC_E 464 151 302 11 32.5% 65.1% 2.4%
3AL0_BC_E 551 189 346 16 34.3% 62.8% 2.9%
3AM1_A_B 200 61 129 10 30.5% 64.5% 5.0%
3BOY_ABC_D 300 88 182 29 29.3% 60.7% 9.7%
3CIY_AB_CD 362 123 227 12 34.0% 62.7% 3.3%
3CZ3_ABCD_EF 401 155 240 6 38.7% 59.9% 1.5%
3CZ3_ABCD_GH 436 168 254 14 38.5% 58.3% 3.2%
3D2S_B_F 33 12 21 0 36.4% 63.6% 0.0%
3D2S_A_E 37 16 21 0 43.2% 56.8% 0.0%
3EPH_A_E 331 130 195 6 39.3% 58.9% 1.8%
3EQT_AB_CD 184 59 112 13 32.1% 60.9% 7.1%
3EX7_CDI_F 123 43 77 3 35.0% 62.6% 2.4%
3FTE_A_CD 108 30 76 2 27.8% 70.4% 1.9%
3GIB_ABC_H 138 46 80 12 33.3% 58.0% 8.7%
3HL2_ABCD_E 417 105 300 12 25.2% 71.9% 2.9%
3HSB_ABCD_X 132 52 69 11 39.4% 52.3% 8.3%
3HTX_A_BC 489 156 314 19 31.9% 64.2% 3.9%
3IAB_AB_R 392 132 251 9 33.7% 64.0% 2.3%
3ICQ_BT_D 337 96 233 8 28.5% 69.1% 2.4%
3IIN_A_BCD 147 49 89 9 33.3% 60.5% 6.1%
3IVK_AB_C 178 63 111 4 35.4% 62.4% 2.2%
3IWN_CD_A 206 70 126 10 34.0% 61.2% 4.9%
3IWN_D_B 148 48 86 14 32.4% 58.1% 9.5%
3KFU_DEF_M 257 81 159 17 31.5% 61.9% 6.6%
3KS8_ABCD_EF 282 92 184 6 32.6% 65.2% 2.1%
3KTV_BD_C 239 75 143 21 31.4% 59.8% 8.8%
3LOB_CF_R 52 12 37 3 23.1% 71.2% 5.8%
3M7N_DG_Y 115 47 66 2 40.9% 57.4% 1.7%
3MDG_A_C 94 27 59 8 28.7% 62.8% 8.5%
3NNC_A_B 101 30 67 4 29.7% 66.3% 4.0%
3NNH_AC_E 199 66 127 6 33.2% 63.8% 3.0%
3O3I_X_A 84 28 52 4 33.3% 61.9% 4.8%
3OG8_AB_CD 281 84 192 5 29.9% 68.3% 1.8%
3OK7_A_BC 221 77 131 13 34.8% 59.3% 5.9%
3OL6_AE_BCG 502 151 343 8 30.1% 68.3% 1.6%
3OL6_A_BCD 370 114 249 7 30.8% 67.3% 1.9%
484D_A_B 165 71 91 3 43.0% 55.2% 1.8%
6MSF_A_R 78 22 56 0 28.2% 71.8% 0.0%

The CPU time for computation of the 224 water networks was 2 h 3 min 53 s on a dual quad-core Intel Xeon 2.33 GHz (12 GB memory) node on a Linux cluster. The average CPU time was 33.2 s per complex and the average interface had 61.3 amino acids and 22.0 nucleotides. WATGEN can be used at http://rockscluster.hsc.usc.edu/research/software/watgen/watgen.html.

5.3.3. Algorithm Validation

To show that WATGEN solvates protein-RNA interfaces accurately, we compared the predicted water networks with water sites determined by X-ray crystallography.
Of the 224 protein-RNA complexes used in the study, 105 contain experimentally determined water sites, and these complexes were used in the comparison. Experimental water molecules were mapped to the closest predicted water site. The predictive power of the algorithm was assessed by determining the proportion of experimental waters with a proximal predicted water site. As a negative control, a randomized water network was calculated for each interface by placing the same number of water molecules as in the predicted interface network in the same spatial volume at grid points 2.6 Å apart (Bui et al., 2007). Statistical analysis was performed using linear regression analysis and a Wilcoxon rank-sum test. All analyses were performed in SAS 9.2 (SAS Institute Inc., Cary, NC) with P < 0.05 considered to indicate significance.

5.3.4. Classification of Interfacial Water Molecules

An understanding of the complexity of the water network at a biomolecular interface requires classification of the location of each water molecule. To establish this classification, we defined each water molecule based on its interactions with the protein and RNA and its positional relationship with bulk water (Figure 5.1). A water molecule forming hydrogen bonds with both protein and RNA is defined as a single-water bridge, and a water molecule with hydrogen bonds to one molecule (protein or RNA) of the interface and to another water that is in turn hydrogen bonded to the other molecule (RNA or protein) is defined as participating in a double-water bridge. The WATGEN algorithm completes the water network by inserting water molecules into free space in the interface (as defined by the absence of clashes), which results in some water molecules having contacts with hydrophobic surfaces in the interface. Such water molecules are categorized as forming a single or double hydrophobic bridge (that is, at least one contact with the protein or RNA is not made through a hydrogen bond).
Relatively few of these water molecules are present in the interface.

Figure 5.1. Classification of the water network at a protein-RNA interface. The protein is shown in grey and the RNA in violet. Water molecules fill the interface and are represented by spheres of different colors and radii. Numbers represent the depth of the water molecule from the bulk solvent (not depicted). WATGEN calculates the first level of water molecules, which are in direct contact with the bulk solvent. These are colored red for depth 0. Water molecules making contact with red (depth 0) water molecules are colored orange (depth 1) and this process is repeated until all water molecules are classified. Water molecules that do not make contact with the bulk solvent directly or indirectly via other water molecules are considered buried and are colored pink. WATGEN adds water molecules to empty regions of the interface while avoiding clashes and these occasionally result in water contacts with hydrophobic surfaces. These water molecules are colored light blue. The sphere radius depends upon the hydrogen-bonding pattern. Single-water bridges are shown as large spheres, double-water bridges as small spheres, and hydrophobic bridges as intermediate spheres. Dashed lines represent hydrogen bonds between water molecules and RNA or protein. Black solid lines represent hydrogen bonds between water molecules. The original of this figure was made by Dr. Brian Sutch.

The position of a water molecule with respect to the bulk solvent is defined using two criteria (Figure 5.1). First, if the water molecule has a path to the bulk solvent (free space in the calculated system) that can be traced through hydrogen bonds, it is considered to be exchangeable with the bulk. The depth of each exchangeable water molecule is evaluated with respect to the bulk. Second, a water molecule may be trapped in a cavity with no hydrogen-bonded pathway to the bulk, and is defined to be buried.

5.4. Results

5.4.1.
Algorithm Validation

Of the 224 protein-RNA complexes solvated by WATGEN (Table 5.1), a subset of 105 complexes with experimental water molecules was used to validate the algorithm. The predicted water networks for these complexes were compared with the experimental water sites. Of the 1481 experimental water sites, WATGEN predicts 981 (66.2%) and 1332 (89.9%) within 1.5 Å and 2.0 Å, respectively (Figure 5.2A). One difficulty with interpretation of these data is that the number of water molecules in the 105 predicted water networks is much higher than that found experimentally. We have shown (Bui et al., 2007) that WATGEN does not overpredict the number of water molecules, based on computation of the energies of networks containing fewer or more water molecules, both of which were energetically less favorable than networks computed with the WATGEN conditions used in the current and previous work (Bui et al., 2007).

Figure 5.2. Numbers of experimental water molecules within a specific distance (0.5 to 3.0 Å) from predicted water sites in (A) the WATGEN-predicted network and (B) a random network. Data are calculated based on O-O distances between experimental and predicted or random water sites. This figure was made using the program R 2.12.1 (Team, 2009).

The difference from the experimental data is due to the difficulty of identifying electron density for relatively mobile water molecules in X-ray structures. However, the large number of predicted sites could account for the good agreement with the fewer experimental sites simply by chance. To address this issue, we calculated a random water network for each complex comprising the same number of water molecules in the same interface volume. In the random networks, only 441 (29.8%) and 787 (53.1%) of the 1481 experimental sites had a random water molecule within 1.5 Å and 2.0 Å, respectively (Figure 5.2B).
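The coverage measure used in this validation can be illustrated with a minimal Python sketch. It is not the WATGEN or SAS code used in the thesis: the function names and the coordinates are hypothetical, and treating "within" as "less than or equal to the cutoff" is an assumption. The sketch maps each experimental water oxygen to its nearest predicted oxygen and reports the fraction of experimental sites covered at a given O-O cutoff.

```python
import math

def nearest_distance(site, candidates):
    """Smallest O-O distance (in Å) from one site to any candidate site."""
    return min(math.dist(site, c) for c in candidates)

def fraction_within(experimental, predicted, cutoff):
    """Fraction of experimental water oxygens that have a predicted
    (or random) water oxygen within `cutoff` Å (assumed inclusive)."""
    hits = sum(
        1 for site in experimental
        if nearest_distance(site, predicted) <= cutoff
    )
    return hits / len(experimental)

# Hypothetical oxygen coordinates (Å), for illustration only.
experimental = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 8.0, 0.0)]
predicted = [(0.5, 0.5, 0.0), (5.0, 1.8, 0.0), (3.0, 3.0, 3.0)]

for cutoff in (1.5, 2.0):
    print(cutoff, fraction_within(experimental, predicted, cutoff))
```

Running the same function on a random network of equal size gives the negative-control fraction, so the two values can be compared cutoff by cutoff as in Figure 5.2.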
For each experimental water site, we calculated the distances (O to O) to the closest water molecule in the random network and the closest water molecule in the WATGEN-predicted network. The median of these distances over the 1481 water sites was significantly smaller for the WATGEN-predicted network (P < 0.001 by Wilcoxon rank-sum test, Figure 5.3).

Figure 5.3. Boxplots of O-O distances between experimental water molecule sites and those in a WATGEN-predicted (red) or a random (blue) network. The quartiles are defined by the edges of the rectangles, which contain 50% of the data. Black horizontal lines within the boxes show medians, circles represent outliers, and dotted lines show the standard deviation. The median values are significantly different (P < 0.001 by Wilcoxon rank-sum test).

5.4.2. Overview of the Water Networks

Linear regression analysis showed that the number of predicted water molecules (nWat in Table 5.1) in the 224 protein-RNA interfaces is significantly related to the size of the interface (p < 0.0001 and R² = 0.9854) for the model

nWat = -0.553 + 2.448 × nIntfaa + 3.315 × nIntfnt (Equation 1)

where nWat is the number of predicted water molecules in the protein-RNA interface, nIntfaa is the number of amino acids in the protein-RNA interface (the number of amino acids with direct or water-mediated interactions with the RNA), and nIntfnt is the number of nucleotides in the protein-RNA interface (the number of nucleotides with direct or water-mediated interactions with the protein) (Figure 5.4). The estimated slopes of nIntfaa and nIntfnt are both significantly different from zero (p < 0.0001), with 95% confidence limits of (2.376, 2.521) and (3.003, 3.627), respectively, and the estimated intercept is not significantly different from zero (p = 0.8524).
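As a worked example of Equation 1, the short Python sketch below evaluates the fitted model for the average interface size reported in Section 5.3.2 (61.3 amino acids and 22.0 nucleotides). The function name is hypothetical; the coefficients are those given in Equation 1.

```python
def predict_nwat(n_intf_aa, n_intf_nt):
    """Equation 1: predicted number of interface water molecules from the
    numbers of interface amino acids and interface nucleotides."""
    return -0.553 + 2.448 * n_intf_aa + 3.315 * n_intf_nt

# Average interface from Section 5.3.2: 61.3 amino acids, 22.0 nucleotides.
print(round(predict_nwat(61.3, 22.0), 1))
```

For this average interface the model gives roughly 222 water molecules: 2.448 × 61.3 ≈ 150.1 from the amino acids, 3.315 × 22.0 ≈ 72.9 from the nucleotides, and -0.553 from the intercept.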
This model suggests that the mean number of predicted water molecules in the protein-RNA interface increases by about 2.4 for each additional amino acid in the interface, and by about 3.3 for each additional nucleotide in the interface. This regression model was generated from a high quality dataset in which duplicate interfaces were excluded (Sutch et al., 2009). The results of jackknife error analysis of the model are shown in Table 5.2. The parameter estimates from the predictive model are consistent with the jackknife estimates.

Table 5.2. Parameter estimates from the predictive model (Equation 1) and the jackknife estimates.

              Proposed predictive model                          Jackknife results
Predictor     Coeff. (a)  SE (b)  P-Value   95% CL (c)           Coeff. (a)  SE (b)  P-Value
Intercept     -0.553      2.968   0.8524    (-6.401, 5.296)      -0.553      2.460   0.8225
nIntfaa (d)    2.448      0.037   <0.0001   (2.376, 2.521)        2.448      0.069   <0.0001
nIntfnt (e)    3.315      0.158   <0.0001   (3.003, 3.627)        3.315      0.207   <0.0001

(a) Coefficient; (b) Standard Error; (c) Confidence Limits; (d) Number of amino acids in the protein-RNA interface; (e) Number of nucleotides in the protein-RNA interface

Figure 5.4. Upper panel. Plot of the number of WATGEN-predicted water molecules vs. the predicted value using the regression model for interface size (Equation 1). The blue line shows the regression line. Lower panel. Identical plot, with red dots showing the 95% confidence level (CL) of the mean values and green dots showing the 95% CL of the predicted values.

5.4.3. Classification of Water Molecules in the Network

Each water molecule in the predicted water networks was classified (Figure 5.1) based on its bridging location between the protein and RNA molecules as a hydrogen-bonded single-water bridge (SWB), as part of a hydrogen-bonded double-water bridge (DWB), or as a water molecule with a contact with a hydrophobic surface (HPHOB). Water molecules that fulfilled 2 or more of these definitions were classified based on their most direct interaction (i.e. SWB > DWB > HPHOB).
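The priority rule and the percentage columns of Table 5.1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the boolean inputs are hypothetical and do not reflect how WATGEN stores its data. The counts are taken from the 1A1T_A_B row of Table 5.1.

```python
def classify_water(is_swb, is_dwb, is_hphob):
    """Apply the priority SWB > DWB > HPHOB to a water molecule that may
    satisfy more than one definition. Returns None if none applies."""
    if is_swb:
        return "SWB"
    if is_dwb:
        return "DWB"
    if is_hphob:
        return "HPHOB"
    return None

def percentages(counts):
    """Convert per-class counts into percentages of all interface waters."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# A water meeting both the SWB and DWB definitions is reported as SWB.
print(classify_water(True, True, False))

# Counts for 1A1T_A_B from Table 5.1 (nWat = 121).
print(percentages({"SWB": 44, "DWB": 72, "HPHOB": 5}))
```

For 1A1T_A_B the computed values are 36.4%, 59.5% and 4.1%, which match the %SWB, %DWB and %HPHOB entries for that row.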
This classification is shown for the 224 protein-RNA interfaces in Table 5.1. As a percentage of the total number of water molecules in each complex, the interfaces contained 33.7 ± 5.4% SWB, 62.3 ± 5.3% DWB, and 4.1 ± 2.7% HPHOB water molecules.

Each water molecule was further classified with respect to its depth from the bulk solvent. Water molecules with a direct hydrogen-bonded path to the bulk were considered to be solvent accessible (that is, exchangeable with bulk). The depth was then defined by the number of intervening waters between the water molecule and the bulk; therefore, a water molecule in direct hydrogen-bonded contact with the bulk solvent had a depth of zero. Water molecules with no direct hydrogen-bonded path to the bulk were classified as buried. These classifications were overlaid on the SWB, DWB and HPHOB classifications, as shown for the 224 interfaces in Table 5.3. For SWB molecules, 46.5 ± 12.2% and 34.3 ± 7.4% were at depths of zero and 1, respectively, and for DWB molecules, 79.7 ± 8.6% and 11.7 ± 4.8% were at these respective depths. Thus, most interface water molecules are close to the bulk solvent.

Table 5.3.
Types of water molecules at 224 protein-RNA interfaces Structure ID Single-water bridges Double-water bridges Hphob 0 1 2 3 4 ≥5 bur dir 1 2 3 4 ≥5 bur acc Bur 1A1T_A_B 19 17 2 1 1 1 3 56 7 4 0 0 5 0 5 0 1A34_A_BC 15 3 0 0 0 0 3 42 3 0 0 0 0 0 12 0 1A4T_B_A 20 11 4 3 2 1 2 47 3 0 1 0 1 0 0 1 1A9N_AB_Q 28 26 9 4 0 0 2 79 14 1 1 0 0 0 12 0 1AQ3_A_R 13 10 2 0 1 2 0 52 4 2 1 1 1 0 0 0 1AUD_A_B 23 13 11 3 1 0 5 57 12 2 2 1 1 2 6 1 1B7F_A_P 34 24 9 4 2 0 4 87 8 7 7 3 0 4 4 1 1BIV_B_A 18 13 7 3 1 0 9 53 3 5 3 0 0 0 9 0 1BMV_2_M 12 10 5 4 0 0 1 39 5 5 2 0 0 0 2 0 1C9S_LtoV_W 86 99 37 5 1 0 13 295 37 14 3 0 0 9 18 1 1CVJ_BE_N 30 13 4 5 2 0 2 67 7 2 2 0 0 1 8 1 1CX0_A_B 34 16 4 0 0 0 1 72 5 1 1 0 0 1 13 0 1DDL_ABC_D 21 12 0 0 0 0 3 54 1 0 0 0 0 0 3 0 1DI2_AB_CD 19 9 2 0 0 0 1 47 4 3 0 0 0 0 4 0 1DI2_AB_E 15 5 0 0 0 0 0 42 4 3 0 0 0 0 2 0 1DZ5_AB_CD 26 38 15 3 0 0 3 141 30 6 5 0 0 2 21 0 1E7K_A_C 16 14 5 4 0 0 0 49 5 3 2 0 0 0 1 2 1E8O_ABCD_E 31 27 12 5 1 0 2 101 19 13 3 0 0 2 4 0 1EC6_A_D 22 16 10 2 2 2 1 75 14 5 2 1 1 1 2 0 1EIY_AB_C 24 29 8 1 1 0 0 115 23 10 1 0 1 3 5 0 1EKZ_A_B 9 6 1 0 0 0 1 42 2 2 0 0 0 0 10 0 1ETF_B_A 18 17 7 6 6 3 3 70 6 5 1 1 3 0 13 0 1EUQ_A_B 51 57 30 14 2 0 8 160 19 17 9 5 3 5 19 1 1EXY_B_A 21 14 9 4 1 0 6 64 6 1 1 2 0 1 15 0 1F8V_ACE_R 14 11 1 0 0 0 0 66 8 2 0 0 0 0 3 0 1G59_A_B 40 34 7 6 3 4 7 165 31 11 10 4 3 3 13 3 1G70_B_A 11 15 13 2 0 0 3 65 8 5 3 0 0 0 5 0 1GAX_A_C 58 51 12 5 0 0 11 227 50 18 0 1 0 2 18 0 1GTF_LtoV_W 100 111 28 7 0 0 7 294 37 14 3 3 0 0 10 2 1H4Q_AB_T 35 21 8 1 1 0 2 90 20 2 1 1 0 0 8 0 1HJI_B_A 18 16 4 2 0 0 1 48 6 3 0 0 0 0 1 0 1HVU_AB_C 30 7 0 0 0 0 0 98 11 0 0 0 0 0 8 0 1I9F_B_A 18 12 5 0 0 0 0 65 14 1 0 0 0 0 7 0 1J1U_A_B 12 5 1 0 0 0 4 72 5 2 0 0 0 1 6 0 1J2B_AB_C 52 62 34 14 8 5 19 223 46 19 20 6 9 11 8 2 1J2B_AB_D 48 62 29 8 7 8 19 200 39 19 12 6 8 9 22 0 117 Structure ID Single-water bridges Double-water bridges Hphob 0 1 2 3 4 ≥5 bur dir 1 2 3 4 ≥5 bur acc Bur 1K1G_A_B 20 17 8 1 0 0 3 61 8 8 1 0 0 1 5 0 1KNZ_AB_W 12 
12 3 2 2 2 17 26 6 2 1 4 1 11 0 1 1KOG_ABE_J 37 27 8 2 0 0 2 148 27 7 0 0 0 0 12 1 1KOG_AB_I 35 18 13 4 0 0 8 124 18 7 0 0 0 4 8 1 1KQ2_ABHIKM_R 22 18 3 4 4 6 10 70 17 7 0 0 0 3 4 1 1L1C_AB_C 25 17 5 4 2 1 6 62 4 6 1 0 0 0 7 0 1L9A_A_B 21 18 7 6 2 0 5 73 9 11 3 0 0 0 10 0 1LNG_A_B 29 16 5 4 3 1 5 68 6 10 4 1 1 1 9 0 1M8V_ALMN_O 19 10 3 3 0 0 3 51 5 0 0 2 0 0 4 0 1M8V_ABL_T 19 6 1 1 0 0 0 51 3 1 0 0 0 0 5 1 1M8W_B_D 28 20 10 6 2 0 0 56 9 1 2 2 1 0 2 0 1M8Y_A_C 31 17 5 5 4 4 5 64 12 1 1 2 3 3 3 0 1MFQ_BC_A 30 18 8 2 1 3 7 108 17 9 2 1 2 0 5 1 1N35_A_BC 17 38 20 4 3 1 11 98 42 21 10 2 1 7 5 0 1N38_A_BC 20 27 8 4 0 0 6 74 15 8 2 2 0 2 8 0 1NYB_A_B 21 8 3 1 0 0 4 45 2 1 1 1 0 1 2 0 1OOA_AB_C 31 22 9 3 2 1 2 100 25 9 4 1 4 2 6 0 1P6V_AC_D 27 14 2 2 2 1 5 69 5 9 4 0 0 3 9 2 1P6V_AC_B 37 21 9 3 2 0 3 91 7 2 2 1 0 5 12 1 1PGL_2_3 5 13 4 1 0 0 0 36 7 5 1 0 0 0 3 0 1QZW_A_B 15 5 2 0 1 0 2 54 10 1 1 0 0 0 4 0 1R9F_A_BC 19 14 3 3 3 0 1 77 8 3 4 1 0 0 8 0 1RC7_A_DE 24 6 1 4 2 0 0 62 4 3 0 1 1 0 4 0 1RC7_A_BC 10 3 1 0 0 0 1 24 0 0 0 0 0 0 3 0 1RKJ_A_B 17 21 9 2 0 0 3 71 9 2 1 0 0 0 6 0 1SI3_A_B 22 15 7 2 0 2 1 47 5 5 4 1 0 0 1 1 1T4L_B_A 9 14 6 1 0 0 1 70 9 4 1 0 0 0 8 0 1TFW_AC_GJ 43 41 13 5 1 0 7 162 22 11 9 0 0 0 9 1 1TFW_B_EH 20 26 3 3 1 0 3 78 11 6 4 2 0 0 3 2 1TFY_AC_GJ 43 28 9 1 0 0 13 137 18 8 5 0 0 2 9 0 1U0B_B_A 36 45 19 10 4 7 8 149 26 19 4 4 9 3 8 2 1ULL_B_A 27 14 9 2 1 6 7 53 4 4 2 0 1 1 0 1 1URN_C_R 27 7 1 0 0 0 3 61 6 0 0 0 0 1 9 0 1UTD_LM_0 14 9 4 0 0 0 0 40 5 1 2 0 0 0 1 0 1VFG_A_C 10 10 5 0 0 0 0 72 14 3 1 0 0 0 8 0 1WMQ_A_C 9 9 2 0 0 0 3 51 8 1 0 0 0 0 10 0 1WNE_A_BC 23 26 11 4 1 2 7 86 19 11 5 0 2 2 6 0 1WSU_A_E 10 13 5 1 0 0 0 49 3 3 1 0 0 0 2 0 1WWD_A_B 19 8 2 0 0 0 3 34 3 0 0 0 0 2 2 0 1WWE_A_B 19 11 2 0 0 0 1 66 5 5 0 0 0 0 5 1 118 Structure ID Single-water bridges Double-water bridges Hphob 0 1 2 3 4 ≥5 bur dir 1 2 3 4 ≥5 bur acc Bur 1WWF_A_B 23 13 3 2 0 0 2 53 3 4 1 0 0 1 4 0 1WZ2_A_C 44 52 22 10 0 0 4 235 45 28 5 0 0 3 17 0 1YTU_B_EF 14 14 6 4 2 1 9 42 
11 3 2 1 0 6 1 1 1YTY_AB_CD 34 36 16 7 2 4 8 122 27 11 4 0 7 0 9 0 1YVP_B_H 22 18 14 4 2 0 0 51 14 10 4 2 0 0 4 0 1YYK_AB_DEF 27 11 2 0 0 0 2 62 7 0 0 0 0 0 0 0 1YYW_C_KL 16 7 3 3 1 1 0 45 9 2 0 1 0 0 0 0 1YYW_A_GH 14 7 8 4 1 0 0 46 4 2 1 0 0 0 7 0 1YYW_A_EF 7 4 1 0 0 0 0 32 9 2 0 0 0 0 1 0 1YZ9_A_CDEF 23 16 8 3 1 0 4 82 13 7 4 0 0 1 3 0 1ZBH_AD_F 20 14 7 0 0 0 4 59 14 6 2 0 0 0 10 0 1ZBH_AD_E 20 21 13 0 0 0 0 75 13 6 6 0 0 0 7 1 1ZDJ_A_R 7 7 2 0 0 0 1 42 3 1 1 0 0 0 0 0 1ZSE_AB_R 11 14 8 2 0 0 1 56 10 7 4 1 0 0 0 0 2A8V_B_E 9 5 0 0 0 0 2 33 4 1 0 0 0 2 2 0 2A9X_1_2 17 8 3 0 0 0 1 46 6 0 0 0 0 0 6 1 2AD9_A_B 18 9 3 1 1 0 1 50 1 2 3 0 1 0 4 0 2ADC_A_C 21 12 2 0 0 0 6 53 2 0 0 0 0 3 3 1 2ADC_A_B 24 14 8 1 0 0 3 57 9 9 1 0 0 0 0 0 2AKE_A_B 12 13 1 0 0 0 3 65 3 1 3 0 0 0 8 0 2AZX_B_D 8 8 1 1 0 0 4 39 3 0 0 0 0 2 8 0 2B2E_AB_R 12 18 9 9 2 1 4 70 12 10 9 4 0 0 1 1 2B6G_A_B 5 3 0 0 0 0 2 22 2 0 0 0 0 2 2 0 2BBV_CF_N 8 7 0 0 0 0 0 33 2 1 0 0 0 0 0 0 2BGG_A_PQ 21 18 3 5 1 1 6 66 13 7 5 2 0 1 3 0 2BTE_AD_B 58 29 14 3 0 0 6 199 37 8 6 1 0 2 16 0 2BX2_L_R 6 11 2 0 2 0 0 36 11 1 2 1 0 0 6 0 2C06_AB_C 21 13 4 1 0 0 2 48 3 2 1 1 0 0 1 0 2C4R_L_R 13 21 8 0 0 0 0 79 15 3 2 1 0 0 7 0 2CSX_A_C 25 20 5 3 3 1 7 116 21 7 2 2 1 6 2 2 2CT8_A_C 21 19 8 1 0 0 8 108 17 5 3 0 0 3 2 1 2D6F_C_E 13 10 3 2 1 0 3 93 15 0 0 1 1 1 6 0 2DEU_AB_D 34 32 12 5 4 1 4 124 26 12 4 4 1 1 10 2 2DR5_A_B 41 20 7 2 0 0 8 92 7 4 2 0 0 2 4 0 2DU3_A_D 24 8 4 0 0 0 5 75 13 1 0 0 0 1 13 0 2DU5_A_D 25 13 2 2 0 0 1 90 12 0 1 0 0 0 7 0 2EC0_A_BC 25 15 11 3 2 8 16 84 29 13 5 5 2 3 3 0 2ERR_A_B 32 11 1 2 1 0 2 53 4 3 0 2 0 0 8 1 2ESE_A_B 15 8 2 1 1 0 2 40 5 2 0 0 0 1 11 0 2EZ6_AB_CD 59 47 14 4 0 0 4 240 36 13 5 1 0 0 16 0 119 Structure ID Single-water bridges Double-water bridges Hphob 0 1 2 3 4 ≥5 bur dir 1 2 3 4 ≥5 bur acc Bur 2F8K_A_B 14 6 2 2 0 0 2 43 3 4 3 0 0 0 1 0 2F8S_A_CD 16 14 4 0 0 0 0 42 14 5 1 0 0 0 0 0 2FK6_A_R 25 15 2 0 0 0 1 92 16 2 1 0 0 1 3 0 2FY1_A_B 27 15 5 2 0 0 3 67 4 3 1 0 0 2 5 1 2G4B_A_B 19 5 0 0 
0 0 6 44 1 1 0 0 0 2 0 0 2GIC_ABCDE_R 106 111 54 11 6 7 15 415 101 41 20 7 5 4 23 3 2GJE_AD_RS 27 24 6 3 2 2 1 122 13 4 4 0 0 0 3 0 2GTT_LtoV_X 204 232 115 41 15 3 42 852 227 84 21 10 3 5 47 3 2GTT_AtoK_W 213 240 131 40 6 0 39 847 247 77 27 5 1 7 41 2 2HGH_A_B 25 17 6 1 0 0 1 93 16 1 0 0 0 0 11 4 2HT1_AB_M 13 14 7 1 0 0 2 68 10 1 3 1 0 0 2 0 2I91_AB_CD 33 30 14 10 5 20 3 77 23 18 8 9 13 2 8 0 2IHX_A_B 31 30 14 4 2 0 0 118 21 7 5 0 0 0 10 0 2IPY_A_C 35 18 9 8 3 2 8 98 12 3 4 3 3 6 1 0 2IX1_A_B 35 32 16 3 3 3 8 119 44 10 7 1 2 3 13 0 2IY5_AB_T 30 24 6 5 3 0 3 140 15 6 2 1 0 3 20 1 2IZN_C_S 10 7 2 1 1 0 1 36 4 0 1 1 0 0 1 0 2JEA_AB_C 17 12 5 2 1 2 2 52 9 5 2 2 2 1 5 0 2JPP_AB_D 24 16 5 0 0 0 10 80 15 2 2 0 0 2 7 0 2JPP_AB_C 16 12 5 1 0 0 9 69 3 1 1 0 0 2 4 1 2KDQ_A_B 22 5 5 1 1 1 2 52 6 2 0 0 1 0 2 0 2KFY_A_B 19 9 1 0 0 0 6 33 0 1 0 0 0 1 4 1 2KH9_A_B 14 9 2 1 0 0 2 40 3 4 0 0 0 0 5 0 2KM8_BC_A 33 30 4 5 1 0 5 108 14 13 3 2 0 2 16 2 2KX5_B_A 15 9 4 0 1 0 1 52 6 0 1 0 0 0 13 0 2L2K_B_A 16 18 9 1 0 0 0 73 7 3 2 0 0 0 7 1 2L3C_A_B 17 6 1 1 1 0 1 58 5 2 2 0 0 1 11 0 2L3J_A_B 37 16 5 2 2 0 1 99 11 5 2 0 0 0 12 0 2NR0_ACD_G 33 35 18 9 6 1 1 147 24 15 10 2 1 0 15 0 2NR0_ABC_E 25 23 5 2 0 1 0 138 12 9 3 1 0 0 14 0 2NR0_CD_H 19 22 6 0 1 0 0 113 10 7 2 0 0 0 6 0 2NR0_AB_F 33 33 11 3 2 0 1 140 22 6 5 1 2 0 13 0 2NUE_AB_C 33 13 6 1 0 0 1 94 8 1 2 1 0 0 3 0 2NUG_AB_CDEF 67 46 7 5 3 8 8 191 38 18 9 5 5 2 14 3 2OZB_AB_C 30 17 7 4 2 4 9 69 17 11 4 0 6 6 5 2 2PLY_AB_C 28 22 6 2 0 0 3 102 18 6 2 0 0 2 4 0 2PY9_AC_F 24 21 5 4 1 2 4 98 11 4 1 0 0 2 6 1 2QUX_ABDE_C 27 22 4 1 0 2 3 95 19 5 1 1 1 0 2 1 2R7T_A_X 16 24 9 1 1 4 0 73 19 8 3 4 2 0 7 0 2R7V_A_X 11 15 6 2 1 0 1 47 17 6 3 2 1 0 1 0 120 Structure ID Single-water bridges Double-water bridges Hphob 0 1 2 3 4 ≥5 bur dir 1 2 3 4 ≥5 bur acc Bur 2R8S_HL_R 19 20 10 3 2 1 7 90 22 11 3 1 1 4 8 0 2R92_AB_PT 27 46 23 3 0 0 4 122 46 30 20 1 0 3 8 0 2R93_AB_R 24 25 12 3 1 0 2 99 28 16 11 1 1 5 0 0 2RKJ_ABEF_CDG 74 63 17 4 0 0 2 342 63 17 7 4 0 3 
33 0 2V3C_BCD_N 55 34 14 7 3 1 11 155 24 8 7 2 0 2 14 2 2V3C_AC_M 65 62 13 2 4 1 16 229 42 10 7 1 1 2 18 1 2W2H_AC_R 12 13 4 2 2 0 2 71 14 4 1 2 2 3 10 0 2W2H_BD_S 10 10 3 4 3 3 0 79 14 4 0 2 3 0 3 1 2WYY_ACDFHM_R 74 100 42 11 6 17 29 443 117 28 22 11 13 6 60 1 2XD0_AE_G 40 40 17 13 5 0 4 135 22 23 9 9 1 0 11 1 2XGJ_A_C 14 15 6 1 0 0 6 50 6 9 4 1 0 2 0 0 2XNR_A_C 13 7 2 0 0 0 0 35 1 4 0 0 0 0 3 0 2ZI0_AB_CD 80 40 14 3 1 0 8 172 29 25 4 0 0 1 6 0 2ZKO_AB_CD 22 27 5 1 4 1 1 79 5 3 9 10 1 1 7 0 2ZM5_AB_C 42 37 13 6 1 8 5 129 23 15 9 2 6 10 9 2 2ZNI_AB_D 47 32 11 11 3 0 8 130 29 7 3 2 0 2 11 0 2ZNI_AB_C 37 35 10 3 4 3 13 131 28 12 6 0 0 4 4 0 2ZUE_A_B 41 51 21 6 4 2 9 170 26 15 7 3 3 6 19 1 2ZZM_A_B 41 40 19 10 6 8 2 166 28 18 12 2 5 4 15 0 2ZZN_AB_C 66 55 26 10 3 2 2 238 36 24 11 7 3 4 18 0 2ZZN_AB_D 46 47 18 6 2 1 10 178 31 12 14 4 3 5 10 1 3A2K_AB_C 65 50 23 14 0 1 8 159 31 22 9 5 4 3 8 0 3A2K_AB_D 58 53 22 21 2 3 6 149 26 31 8 8 6 2 13 0 3A6P_FH_IJ 36 39 16 6 3 0 2 179 36 18 11 1 1 0 12 1 3A6P_AC_DE 53 56 14 9 2 1 1 198 33 22 4 2 1 0 10 0 3ADB_A_C 29 39 10 7 0 0 1 185 36 12 5 0 0 0 14 1 3ADC_B_D 35 17 4 5 1 1 5 130 16 13 3 1 1 3 8 4 3ADC_A_C 39 34 8 7 5 1 2 165 32 17 3 1 3 0 15 0 3ADI_AB_DE 12 4 1 0 0 0 0 32 1 0 0 0 0 0 3 0 3AEV_B_C 36 18 8 1 0 3 3 77 7 4 2 1 0 0 10 0 3AKZ_ABC_E 59 43 14 10 11 6 8 229 28 13 11 7 7 7 11 0 3AL0_BC_E 58 74 25 16 6 3 7 249 34 22 20 8 7 6 14 2 3AM1_A_B 34 10 5 4 0 3 5 99 19 2 1 1 1 6 8 2 3BOY_ABC_D 31 31 8 0 0 0 18 146 30 3 1 0 0 2 29 0 3CIY_AB_CD 52 39 17 2 3 3 7 189 26 10 1 1 0 0 12 0 3CZ3_ABCD_EF 73 44 25 6 4 0 3 181 44 8 7 0 0 0 5 1 3CZ3_ABCD_GH 66 57 33 8 0 0 4 196 33 19 4 0 0 2 14 0 3D2S_B_F 9 2 0 0 0 0 1 20 1 0 0 0 0 0 0 0 3D2S_A_E 10 4 1 1 0 0 0 19 1 1 0 0 0 0 0 0 3EPH_A_E 38 35 17 11 8 12 9 117 30 11 6 8 18 5 6 0 121 Structure ID Single-water bridges Double-water bridges Hphob 0 1 2 3 4 ≥5 bur dir 1 2 3 4 ≥5 bur acc Bur 3EQT_AB_CD 29 20 2 0 0 0 8 84 19 4 0 0 0 5 10 3 3EX7_CDI_F 17 14 2 2 1 1 6 59 11 3 0 1 1 2 2 1 3FTE_A_CD 14 
13 1 1 1 0 0 55 15 4 1 1 0 0 1 1
3GIB_ABC_H 18 18 4 4 0 0 2 69 6 2 0 0 0 3 8 4
3HL2_ABCD_E 59 36 8 2 0 0 0 248 41 11 0 0 0 0 12 0
3HSB_ABCD_X 29 14 2 0 0 0 7 61 4 1 0 0 0 3 9 2
3HTX_A_BC 80 51 16 5 2 1 1 237 48 21 4 1 3 0 19 0
3IAB_AB_R 45 41 22 8 3 6 7 172 37 23 10 1 5 3 9 0
3ICQ_BT_D 47 45 4 0 0 0 0 191 35 7 0 0 0 0 8 0
3IIN_A_BCD 25 17 4 1 1 0 1 78 9 0 0 0 1 1 9 0
3IVK_AB_C 28 16 3 0 2 7 7 82 12 5 2 0 8 2 4 0
3IWN_CD_A 33 21 12 1 0 0 3 105 14 3 3 0 0 1 10 0
3IWN_D_B 21 16 6 2 1 0 2 67 13 5 0 1 0 0 13 1
3KFU_DEF_M 36 28 8 6 2 1 0 126 18 8 4 2 1 0 14 3
3KS8_ABCD_EF 37 28 10 8 5 2 2 138 21 12 9 3 0 1 6 0
3KTV_BD_C 27 30 7 4 2 1 4 115 18 7 2 0 0 1 21 0
3LOB_CF_R 6 4 2 0 0 0 0 31 4 2 0 0 0 0 3 0
3M7N_DG_Y 7 16 11 5 4 1 3 38 10 4 4 4 4 2 2 0
3MDG_A_C 10 9 5 1 0 0 2 43 13 3 0 0 0 0 8 0
3NNC_A_B 19 9 2 0 0 0 0 57 7 3 0 0 0 0 1 3
3NNH_AC_E 34 22 6 3 0 0 1 104 17 3 1 1 0 1 6 0
3O3I_X_A 13 10 0 0 0 0 5 48 1 0 0 0 0 3 4 0
3OG8_AB_CD 26 41 10 4 2 0 1 126 46 17 2 0 0 1 5 0
3OK7_A_BC 27 22 7 6 3 9 3 108 11 4 1 5 2 0 13 0
3OL6_AE_BCG 49 62 15 6 7 11 1 235 69 18 9 6 4 2 8 0
3OL6_A_BCD 30 44 16 7 6 11 0 161 50 18 10 5 3 2 7 0
484D_A_B 28 19 8 4 2 2 8 71 5 7 3 1 2 2 3 0
6MSF_A_R 14 3 2 2 0 0 1 49 4 1 1 1 0 0 0 0

5.4.4. Images of the Water Network
An example of a classified water network is shown for the 1ULL_B_A (Ye et al., 1996) interface in Figure 5.5. This image shows the complex of the HIV-1 rev peptide bound in the major groove of the stem of an RNA hairpin solvated by WATGEN. The classification procedure identified water molecules forming single-water bridges at a depth of 5 water molecules (6 hydrogen bonds) from bulk water, and also identified several buried water molecules. The detailed three-dimensional nature of the network is better appreciated by viewing electronically.
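The depth classification described above (depth 0 for water molecules in direct contact with the bulk, increasing depth along hydrogen-bonded chains, and 'buried' for water molecules with no hydrogen-bonded path to the bulk) can be illustrated with a breadth-first search over the hydrogen-bond graph. The following is only a minimal sketch and is not the WATGEN implementation: the function name `classify_water_depths`, the dictionary-based input format and the toy network are assumptions made for illustration.

```python
from collections import deque


def classify_water_depths(hbonds, bulk_contacts):
    """Assign a depth to each interface water molecule.

    hbonds: dict mapping a water id to the ids of the waters it is
        hydrogen-bonded to (hypothetical input format).
    bulk_contacts: ids of waters hydrogen-bonded directly to bulk water;
        these are assigned depth 0.
    Returns a dict {water id: depth}; depth is None for 'buried' waters
    that have no hydrogen-bonded path to the bulk.
    """
    depth = {water: None for water in hbonds}
    queue = deque()
    for water in bulk_contacts:
        depth[water] = 0
        queue.append(water)
    # Breadth-first search: the first visit gives the shortest path to bulk.
    while queue:
        water = queue.popleft()
        for neighbor in hbonds.get(water, ()):
            if depth.get(neighbor) is None:
                depth[neighbor] = depth[water] + 1
                queue.append(neighbor)
    return depth


# Toy network (assumed): a chain w0..w5 reaching the bulk through w0,
# plus an isolated hydrogen-bonded pair b1-b2 with no path to the bulk.
toy_hbonds = {
    "w0": ["w1"], "w1": ["w0", "w2"], "w2": ["w1", "w3"],
    "w3": ["w2", "w4"], "w4": ["w3", "w5"], "w5": ["w4"],
    "b1": ["b2"], "b2": ["b1"],
}
toy_depths = classify_water_depths(toy_hbonds, bulk_contacts={"w0"})
```

In this toy network the chain spans depths 0 to 5, corresponding to the red-to-violet pathway in Figure 5.5, and the isolated pair is reported as buried.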
The computed water networks for the 224 interfaces are available for download (PDB file with a PyMOL (Schrodinger, 2010) script) at http://rockscluster.hsc.usc.edu/research/software/watgen/watgen.html.

Figure 5.5. WATGEN solvation of an RNA aptamer complexed with HIV-1 REV peptide (1ULL_B_A). Water molecules are shown as spheres based on the position of the O atom, with the color and size corresponding to the schematic image shown in Figure 5.1. Ellipse A identifies a pathway of hydrogen-bonded water molecules from red (depth 0, direct interaction with the bulk) to orange, yellow, green, blue, and violet (depth 5). Ellipse B identifies two 'buried' water molecules (shown in pink) that are not connected to bulk solvent directly or indirectly via hydrogen bonding to other water molecules. The original of this figure was made by Dr. Brian Sutch.

5.5. Discussion
Water is a critical component of biomolecular interfaces, and inclusion of solvation effects is required in rational design and affinity prediction for these interfaces. Accurate placement of water molecules in experimentally determined interfaces is a first step, and the results shown here demonstrate that this can be achieved by the WATGEN algorithm. The derived parameters for the predicted water networks give an indication of the features that are required for an acceptable network at a designed interface. These parameters are likely to be representative of a typical protein-RNA interface, since they were derived from a dataset of 224 unique complexes. Therefore, these data can make a significant contribution to the design of new biomolecular interfaces for diagnostic and therapeutic purposes. Determination of protein-RNA complexes using docking is currently a major challenge. A recent successful approach in this area has been reported by de Vries et al. (de Vries et al., 2010a) using the HADDOCK algorithm developed by the same authors (de Vries et al., 2010b).
We envisage two potential uses of WATGEN in docking of RNA molecules to proteins (or vice versa). First, the solvated docking approach used by van Dijk and Bonvin (van Dijk and Bonvin, 2006) may be facilitated by analysis of WATGEN-computed water networks to identify highly conserved water positions that can be included in docking protocols, without dependence on X-ray data for the water location. Second, we have described a motif-based approach to deconvolution of a protein-RNA interface (Sutch et al., 2009), and establishment of a database of these motifs may allow knowledge-driven docking of RNA molecules to proteins, initially in the absence of water. Predicted structures can then be solvated using WATGEN, with the goal of obtaining solvated interfaces with similar properties to those found for experimentally determined protein-RNA interfaces. Computation of the number of water molecules for a particular interface can be achieved using Equation 1 above. This equation may reflect a fundamental property of solvation of a protein-RNA interface and is likely to be useful for evaluation of simulation and docking results. The classification of the water network described here provides a further basis for detailed analysis of solvated protein-RNA complexes determined biophysically or by docking. First, it allows a complete description of the water network in qualitative terms, with identification of potentially important water molecules that form key single-water bridges. Second, since each water molecule has a specific classification, a numerical score for the whole network can be assigned to describe the quality of interface solvation. Enthalpically, the score for each molecule should reflect the assumption that a water molecule should maintain at least 4 hydrogen bonds in the complex, based on pure water and also on the finding in Schlessman et al. (Schlessman et al., 2008) of 3-5 hydrogen bonds for internal water molecules in proteins.
Entropically, a penalty can be applied to unfavorable water molecules that are buried in a cavity or positioned deeper from the bulk water. Development of a validated scoring system will require further analysis of the energetics of the water networks. We briefly note a few technical aspects of WATGEN. First, the algorithm requires that one of the molecules of the interface is defined as the “ligand”, since the “water box” within which the calculation is performed is defined with respect to the ligand. In this study, we defined the RNA molecule as the ligand in each complex, but the calculations can equally be performed with the protein defined as the ligand. We performed this set of calculations (data not shown) and found that the algorithm predicts networks that do not differ significantly from the networks obtained with RNA as the ligand, in terms of the numbers of water molecules and the average RMSD of the water sites. Second, in this work we used X-ray structures downloaded from the RCSB, since we assume that this is the most likely source of structures for most users. However, these structures often contain breaks in the protein or RNA chain, and computational 'repair' of these breaks may be appropriate before the algorithm is used for specific complexes. Third, we excluded non-natural nucleotides from the current dataset, but these will be included in a future version of the program. It is clear that water plays an essential role within biomolecular interfaces. Where once it was assumed that the effect of water was secondary to the interaction, it is now apparent that water is part of the interaction itself. The WATGEN algorithm provides a rapid method for accurate solvation of protein-RNA and protein-protein interfaces. Further development of a scorable hydration model will provide the basis for use of the water network as an important component of interface design.
Chapter 6
Modeling the Transmembrane Domains of the Human Dipeptide Transporter hPEPT1

6.1. Introduction
In this chapter, the structure and functional role of the loop connecting transmembrane domains (TMDs) 6 and 7 (loop 6-7) of hPEPT1 were investigated. The work on loop 6-7 described below and in the publication Xu et al., 2010 was performed in collaboration with the laboratory of Dr. Daryl L. Davies. Dr. Davies’s group provided the mutagenesis data and we performed secondary structure prediction and computational modeling to explain the possible role of loop 6-7 in substrate transport. Several preliminary transport models comprising the core TMDs of hPEPT1 were also built and tested using substrate uptake simulations. This work was performed using the program TMD 2.0, which was developed in our laboratory.

6.2. Background
The human dipeptide transporter (hPEPT1) is primarily expressed on the apical membrane of small intestinal epithelial cells (Liang et al., 1995). hPEPT1 has an important physiological role in uptake into the circulation of di- and tripeptides originating from digestion of dietary proteins (Rubio-Aliaga and Daniel, 2008). In addition to its natural substrates, hPEPT1 transports many pharmacologically active peptidomimetics, including β-lactam antibiotics and angiotensin-converting enzyme (ACE) inhibitors, and antiviral and anticancer agents such as valacyclovir (Brandsch et al., 2008; Rubio-Aliaga and Daniel, 2008). The broad substrate specificity and high capacity of hPEPT1 make it an attractive target for oral drug delivery. hPEPT1 is a proton-coupled symporter with 12 α-helical transmembrane domains (TMDs) (Covitz et al., 1998), of which TMDs 3, 5, 7 and 10 have been proposed to form part of the substrate translocation pathway (Kulkarni et al., 2003a; Links et al., 2007; Xu et al., 2009). As might be expected, charged residues and polar residues in the TMDs play crucial roles in substrate transport.
E595 in TMD 10 is essential for function, Y588 in TMD 10 may have an important secondary mechanistic role (Xu et al., 2009), and R282 in TMD 7 also has a key role (Kulkarni et al., 2007; Kulkarni et al., 2003b). In rabbit PEPT1, R282 links transport of the substrate and proton (Meredith, 2004), and findings in the human and rabbit proteins suggest that a salt bridge forms between R282 and D341 in TMD 8 (Kulkarni et al., 2007; Meredith, 2009). Several residues in TMD 5 and TMD 3, such as Y167, N171 and S174 in TMD 5 (Kulkarni et al., 2003a) and Y91 in TMD 3 (Links et al., 2007), have also been identified as important for hPEPT1 transport. Compared to the TMDs, there is little information on the loops of hPEPT1. The longest loop (about 200 amino acids) connects TMDs 9 and 10 extracellularly, but may not be essential for function (Daniel, 2004; Meredith and Price, 2006). YdgR, a related E. coli oligopeptide transport protein, is not as large as hPEPT1 due to the absence of this loop (Daniel, 2004) and rPEPT1 is functional after truncation of the loop (Meredith and Price, 2006). The largest intracellular loop (55 amino acids: K224-K278) in hPEPT1 connects TMDs 6 and 7 (loop 6-7). This loop contains a high number of charged amino acids (16 K and R, 5 D and E), but there is no information on its structure.

6.3. Methods
6.3.1. Model Construction and Substrate Uptake Simulations
The transport model of hPEPT1 was constructed using the core TMDs (3, 5, 7 and 10) to form a translocation channel. In the simulations shown below, only TMDs 5 and 10 are included, but the methodology can be extended to any number of TMDs. The pre-transport state was modeled as a V-shaped channel, while the post-transport state was modeled as a reversed V-shaped channel. During substrate translocation, a gradual transition from the pre-transport state to the post-transport state was simulated by adjusting the tilt angles of the TMDs and the distances between them.
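The gradual transition between channel states can be pictured as a stepwise interpolation of the channel geometry. The following is only an illustrative sketch and does not reproduce TMD 2.0: the `ChannelState` class, the `interpolate_states` function and the use of equal linear increments at every step are assumptions, and the numerical values are example inputs of the kind used for the two-TMD channel (distance in Å, tilt and rotation angles in degrees).

```python
from dataclasses import dataclass, astuple


@dataclass
class ChannelState:
    """Geometry of a two-helix channel (hypothetical container)."""
    distance: float  # distance between TMD 5 and TMD 10, in Å
    tilt5: float     # tilt angle of TMD 5, in degrees
    tilt10: float    # tilt angle of TMD 10, in degrees
    rot5: float      # rotation angle of TMD 5, in degrees
    rot10: float     # rotation angle of TMD 10, in degrees


def interpolate_states(start, end, n_steps):
    """Return n_steps + 1 states from start to end, inclusive.

    Each parameter changes by the same increment at every step
    (assumed linear schedule; no angle wrap-around is handled).
    """
    states = []
    for step in range(n_steps + 1):
        fraction = step / n_steps
        values = [a + fraction * (b - a)
                  for a, b in zip(astuple(start), astuple(end))]
        states.append(ChannelState(*values))
    return states


# Example inputs (assumed for illustration): a tight V-shaped channel
# that opens and rotates over 20 steps without changing the tilt angles.
tight = ChannelState(distance=4, tilt5=100, tilt10=80, rot5=210, rot10=210)
open_v = ChannelState(distance=8, tilt5=100, tilt10=80, rot5=180, rot10=300)
path = interpolate_states(tight, open_v, n_steps=20)
```

At each intermediate state the substrate-residue interactions would then be evaluated; a non-linear schedule could be substituted by changing how `fraction` is computed.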
The substrate uptake simulation was divided into two phases based on the experimental data. Phase 1 features the interaction between the substrate and residue Y588 in TMD 10, and the channel opens up in this phase. Phase 2 features the interaction between the substrate and residue E595 in TMD 10, and the channel flips to the post-transport state in this phase. Each phase was divided into several steps in the simulations. For each step, interactions between the substrate and key residues of hPEPT1 were calculated and represented as an energy parameter E. Plots of the changes of E during transport were generated using R (Team, 2009) or Microsoft Excel. Different types of simulations with different channel models and different simulation procedures were performed to investigate the details of the translocation mechanism. The model construction and simulations were performed using an in-house program, TMD 2.0.

6.3.2. Secondary Structure Prediction
Secondary structure predictions were performed using the on-line servers of six programs: NetSurfP (Petersen et al., 2009), GOR (Sen et al., 2005), Jpred (Cole et al., 2008), PREDATOR (Frishman and Argos, 1996), YASSPP (Karypis, 2006), and PSIPRED (Bryson et al., 2005). The symbols “H”, “E” and “C” (or “-”) were used in the results to denote predicted helix, strand and coil, respectively.

6.4. Results
6.4.1. Structure and Functional Role of Loop 6-7
The largest intracellular loop (loop 6-7), linking TMD 6 and TMD 7, might play an important role in the transport function of hPEPT1 (Xu et al., 2010). Secondary structure predictions were performed to investigate the possible structure of loop 6-7 (Figure 6.1 A). The consensus of predictions by six algorithms indicated two helical regions in the loop: one in the N-terminal half and the other in the C-terminal half.
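A consensus of this kind can be computed by per-residue voting across the six predictions. The sketch below is an illustration only, under the following assumptions: the prediction strings are already aligned to the same sequence, the thresholds are those given in the caption of Figure 6.1 (upper-case 'H' where at least five programs predict a helix, lower-case 'h' where exactly four do), and all other positions are written as '-'. The function name `helix_consensus` and the toy input strings are hypothetical.

```python
def helix_consensus(predictions):
    """Summarize per-residue helix votes from several predictors.

    predictions: list of equal-length strings using 'H' for helix and
        any other symbol (e.g. 'E', 'C', '-') for non-helix.
    Returns a string with 'H' (>= 5 helix votes), 'h' (exactly 4 votes)
    or '-' (fewer than 4 votes; placeholder chosen for this sketch).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    summary = []
    for column in zip(*predictions):
        votes = sum(1 for symbol in column if symbol == "H")
        if votes >= 5:
            summary.append("H")
        elif votes == 4:
            summary.append("h")
        else:
            summary.append("-")
    return "".join(summary)


# Toy input (assumed): six predictions for a four-residue segment.
toy_predictions = ["HHHC", "HHHC", "HHEC", "HHCC", "HCCC", "CECC"]
toy_summary = helix_consensus(toy_predictions)
```

For the toy input, the first position receives five helix votes, the second four, and the remaining positions fewer than four.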
The residues that showed a significant influence on hPEPT1 function after mutagenesis (Xu et al., 2010) are all in or close to the predicted helical regions. The most important negatively charged residue, E267, lies in the middle of the predicted C-terminal helix. To validate these findings, we performed similar predictions for loop 6-7 of the E. coli lac permease (LacY) (Figure 6.1 B), for which the X-ray structure has been determined (Abramson et al., 2003). PEPT1 and LacY are members of the major facilitator superfamily and proteins in this family may share similar structures (Saier et al., 2006). Furthermore, the largest intracellular loop in LacY also occurs between TMDs 6 and 7. The LacY structure was determined in the post-transport state and has loop 6-7 in an extended conformation with two short helical regions in the C- and N-terminal halves of the loop (Abramson et al., 2003). The secondary structure prediction also shows helices in the two halves of LacY loop 6-7 (Figure 6.1 B). This is consistent with the X-ray structure and similar to the prediction for hPEPT1, despite the lack of sequence homology between the loops of the two proteins. Based on the mutagenesis data and the secondary structure prediction, a model of loop 6-7 and its functional role in substrate translocation were proposed (Figure 6.2). In the pre-transport state, the two amphipathic helices in loop 6-7 may interact with each other through hydrophobic contacts. During substrate transport, the hydrophobic contacts may be broken and the loop could undergo a conformational change (post-transport state). After substrate translocation, the positive and negative charges in the two helices of loop 6-7 may provide long-range electrostatic interactions that contribute to reestablishment of a pre-transport conformation (Figure 6.2 C).

Figure 6.1. Secondary structure predictions for loop 6-7 in hPEPT1 and LacY.
The predictions were performed using on-line servers of six programs: NetSurfP (Petersen et al., 2009), GOR (Sen et al., 2005), Jpred (Cole et al., 2008), PREDATOR (Frishman and Argos, 1996), YASSPP (Karypis, 2006), and PSIPRED (Bryson et al., 2005). The summary was generated based on the results of these programs: ‘H’ (helix) indicates that at least five of the programs predicted a helix at a given position, and ‘h’ indicates that four of the programs predicted a helix. In the hPEPT1 sequence, the letters in cyan and magenta indicate the positively and negatively charged residues, respectively, that were mutated in this study, and * indicates that the mutation had a significant influence on hPEPT1 activity. The mutagenesis data were obtained by Liya Xu in the laboratory of Dr. Daryl L. Davies. # In the NetSurfP prediction, H indicates an alpha-helix probability >0.6.

Figure 6.2. A model of loop 6-7 and its functional role in substrate translocation. (A) Lateral view of the loop 6-7 model. (B) The loop 6-7 model viewed from the top of the model. The color of the residues indicates positive residues (blue), negative residues (red), hydrophobic residues (green) and other types of residues (gray). (C) Proposed conformational change of loop 6-7 in the transition between the pre-transport state and the post-transport state.

6.4.2. Two Phase Simulations of hPEPT1 Transportation
An hPEPT1 transport model involving TMD 5 and TMD 10 was proposed based on mutagenesis data (Figure 6.3). In this model there are two major phases of substrate translocation. Phase 1 features the interaction between the substrate and Y588 located at the top of TMD 10: after the substrate interacts with the top of the hPEPT1 channel, TMD 5 and TMD 10 may rotate and the channel opens, and then the substrate binds to Y588.
Phase 2 features the interaction between the substrate and E595 located in the middle of TMD 10: the channel keeps opening and then the substrate moves down to bind to E595, which triggers helix flipping and substrate release. Simulations of Ala-Ala uptake through the channel formed by TMD 5 and TMD 10 were performed to examine the details of the two phases in this model.

Figure 6.3. An hPEPT1 transport model involving TMD 5 and TMD 10.

For phase 1, we constructed 100 different simulation models with varied rotation angles and flipping angles (Table 6.1). Each simulation starts with a tight V-shape model: a distance of 4 Å between TMD 5 and TMD 10, a tilt angle of 100° (or 110°) for TMD 5, a tilt angle of 80° (or 70°) for TMD 10, and rotation angles of 210°-270° for TMD 5 and TMD 10 (Figure 6.4 A). Each simulation ends with an open V-shape model: unchanged tilt angles of TMD 5 and TMD 10, 10 Å between the two TMDs, a rotation angle of 180° for TMD 5, and a rotation angle of 300° for TMD 10 (Figure 6.4 B). The conformational changes of TMD 5 and TMD 10 are divided into 20 steps and the translocation of Ala-Ala through the channel is simulated, starting with its interaction with Y588. Figure 6.5 shows the change of the interaction energies between Ala-Ala and residues T181, T178, S177, S174 and N171 of TMD 5 during the translocation in the different models. In phase 1, Ala-Ala first interacts with S177 (more obvious in the models with an initial TMD 10 rotation angle of 210°-225°), and then interacts with S174 and T178. N171 also has some interaction with Ala-Ala in the late steps (Figure 6.5).

Figure 6.4. Starting model (A) and final model (B) for the phase 1 simulation (Model No. 1) involving TMD 5 (gray), TMD 10 (cyan), and substrate Ala-Ala (yellow). The residues in magenta on TMD 5 from top to bottom are S177, S174, N171 and Y167; the residues in blue from top to bottom on TMD 10 are Y588 and E595.

Table 6.1. The first 30 simulation models constructed for phase 1 a.
No(b)  d0(c)  d1(d)  TM5_t0(e)  TM5_t1(f)  TM10_t0(g)  TM10_t1(h)  TM5_r0(i)  TM5_r1(j)  TM10_r0(k)  TM10_r1(l)
1      4      8      100        100        80          80          210        180        210         300
2      4      8      100        100        80          80          210        180        225         300
3      4      8      100        100        80          80          210        180        240         300
4      4      8      100        100        80          80          210        180        255         300
5      4      8      100        100        80          80          210        180        270         300
6      4      8      100        100        80          80          225        180        210         300
7      4      8      100        100        80          80          225        180        225         300
8      4      8      100        100        80          80          225        180        240         300
9      4      8      100        100        80          80          225        180        255         300
10     4      8      100        100        80          80          225        180        270         300
11     4      8      100        100        80          80          240        180        210         300
12     4      8      100        100        80          80          240        180        225         300
13     4      8      100        100        80          80          240        180        240         300
14     4      8      100        100        80          80          240        180        255         300
15     4      8      100        100        80          80          240        180        270         300
16     4      8      100        100        80          80          255        180        210         300
17     4      8      100        100        80          80          255        180        225         300
18     4      8      100        100        80          80          255        180        240         300
19     4      8      100        100        80          80          255        180        255         300
20     4      8      100        100        80          80          255        180        270         300
21     4      8      100        100        80          80          270        180        210         300
22     4      8      100        100        80          80          270        180        225         300
23     4      8      100        100        80          80          270        180        240         300
24     4      8      100        100        80          80          270        180        255         300
25     4      8      100        100        80          80          270        180        270         300
26     4      8      110        110        70          70          210        180        210         300
27     4      8      110        110        70          70          210        180        225         300
28     4      8      110        110        70          70          210        180        240         300
29     4      8      110        110        70          70          210        180        255         300
30     4      8      110        110        70          70          210        180        270         300

(a) 100 models were constructed for phase 1. The table shows the first 30 models.
(b) Model No.
(c) Distance (Å) between TMD 5 and TMD 10 at step 0.
(d) Distance (Å) between TMD 5 and TMD 10 at the final step.
(e) Tilt angle (°) of TMD 5 at step 0.
(f) Tilt angle (°) of TMD 5 at the final step.
(g) Tilt angle (°) of TMD 10 at step 0.
(h) Tilt angle (°) of TMD 10 at the final step.
(i) Rotation angle (°) of TMD 5 at step 0.
(j) Rotation angle (°) of TMD 5 at the final step.
(k) Rotation angle (°) of TMD 10 at step 0.
(l) Rotation angle (°) of TMD 10 at the final step.

Figure 6.5. Changes of interaction energy E in the phase 1 simulation for the 30 different models in Table 6.1. The file No at the top of each plot corresponds to the model No in Table 6.1.
The lines in each plot show the interaction energy E between the substrate Ala-Ala and N171 atom O (yellow), N171 atom N (green), S174 atom O (orange), S177 atom O (red), T178 atom O (blue) and T181 atom O (black).

For phase 2, we constructed 50 simulation models with varied rotation angles and flipping angles (Table 6.2). Each simulation starts with an open V-shape model similar to the last step of phase 1: 8 Å between TMD 5 and TMD 10, a tilt angle of 100° (or 110°) for TMD 5, a tilt angle of 80° (or 70°) for TMD 10, rotation angles of 150°-180° for TMD 5, and rotation angles of 270°-300° for TMD 10 (Figure 6.6 A). Each simulation ends with a reversed V-shape model: unchanged distance between TMD 5 and TMD 10, unchanged rotation angles of TMD 5 and TMD 10, a tilt angle of 80° (or 70°) for TMD 5, and a tilt angle of 100° (or 110°) for TMD 10 (Figure 6.6 B). The conformational changes of TMD 5 and TMD 10 are divided into 20 steps and the translocation of Ala-Ala through the channel is simulated, starting with its interaction with E595. Figure 6.7 shows the change of the interaction energies between Ala-Ala and residues S177, S174, N171 and Y167 in TMD 5 during translocation in the different models. In phase 2, Ala-Ala first interacts with N171 and S174, and then with Y167 (Figure 6.7).

Figure 6.6. Starting model (A) and final model (B) for the phase 2 simulation (Model No. 1) involving TMD 5 (gray), TMD 10 (cyan), and substrate Ala-Ala (yellow). The residues in magenta on TMD 5 from top to bottom are S177, S174, N171 and Y167; the residues in blue from top to bottom on TMD 10 are Y588 and E595.

Table 6.2. The first 30 simulation models constructed for phase 2 a.
No(b)  d0(c)  d1(d)  TM5_t0(e)  TM5_t1(f)  TM10_t0(g)  TM10_t1(h)  TM5_r0(i)  TM5_r1(j)  TM10_r0(k)  TM10_r1(l)
1      8      13     100        80         80          100         180        180        300         300
2      8      13     100        80         80          100         180        180        285         285
3      8      13     100        80         80          100         180        180        270         270
4      8      13     100        80         80          100         180        180        315         315
5      8      13     100        80         80          100         180        180        330         330
6      8      13     100        80         80          100         165        165        300         300
7      8      13     100        80         80          100         165        165        285         285
8      8      13     100        80         80          100         165        165        270         270
9      8      13     100        80         80          100         165        165        315         315
10     8      13     100        80         80          100         165        165        330         330
11     8      13     100        80         80          100         150        150        300         300
12     8      13     100        80         80          100         150        150        285         285
13     8      13     100        80         80          100         150        150        270         270
14     8      13     100        80         80          100         150        150        315         315
15     8      13     100        80         80          100         150        150        330         330
16     8      13     100        80         80          100         195        195        300         300
17     8      13     100        80         80          100         195        195        285         285
18     8      13     100        80         80          100         195        195        270         270
19     8      13     100        80         80          100         195        195        315         315
20     8      13     100        80         80          100         195        195        330         330
21     8      13     100        80         80          100         210        210        300         300
22     8      13     100        80         80          100         210        210        285         285
23     8      13     100        80         80          100         210        210        270         270
24     8      13     100        80         80          100         210        210        315         315
25     8      13     100        80         80          100         210        210        330         330
26     8      13     110        70         70          110         180        180        300         300
27     8      13     110        70         70          110         180        180        285         285
28     8      13     110        70         70          110         180        180        270         270
29     8      13     110        70         70          110         180        180        315         315
30     8      13     110        70         70          110         180        180        330         330

(a) 50 models were constructed for phase 2. This table shows the first 30 models.
(b) Model No.
(c) Distance (Å) between TMD 5 and TMD 10 at step 0.
(d) Distance (Å) between TMD 5 and TMD 10 at the final step.
(e) Tilt angle (°) of TMD 5 at step 0.
(f) Tilt angle (°) of TMD 5 at the final step.
(g) Tilt angle (°) of TMD 10 at step 0.
(h) Tilt angle (°) of TMD 10 at the final step.
(i) Rotation angle (°) of TMD 5 at step 0.
(j) Rotation angle (°) of TMD 5 at the final step.
(k) Rotation angle (°) of TMD 10 at step 0.
(l) Rotation angle (°) of TMD 10 at the final step.

Figure 6.7. Changes of interaction energy E in the phase 2 simulation for the 30 different models in Table 6.2. The file No at the top of each plot corresponds to the model No in Table 6.2.
The lines in each plot show the interaction energy E between the substrate Ala-Ala and Y167 atom O (purple), N171 atom O (yellow), N171 atom N (green), S174 atom O (orange) and S177 atom O (red).

6.4.3. Continuous Simulations Involving TMD 5 and 10
Continuous simulations of substrate uptake were performed on the channel formed by TMD 5 and TMD 10, from the point after the potential rotation of the TMDs in phase 1 to the end of phase 2. In each simulation, the V-shaped channel opens up from 5 Å to 10 Å during the first 40 steps, and then flips to a reversed V-shape during the next 12 steps. The tilt angles of TMD 5 and TMD 10 start at 100° and 80°, and end at 80° and 100°, respectively. The rotation angles of TMD 5 and TMD 10 are 180° and 300°, respectively. The substrate uptake was simulated by first interacting with Y588 for the first 24 steps (phase 1), and then with E595 (phase 2). Figure 6.8 shows the simulation results using Ala-Ala or Phe-Phe as the substrate. In phase 1, N171 has a strong interaction with the substrate and S174 has a relatively weak interaction. At the beginning of phase 2, the interaction with N171 increases dramatically and the interaction with S174 also increases, while Y167 begins to have a very strong interaction with the substrate. In the later stage of phase 2, Y167 shows a dominant interaction, while the interactions with N171 and S174 decrease. At the end of the simulation, the interaction with Y167 also decreases and the substrate starts to be released (Figure 6.8). Phe-Phe shows a much stronger interaction than Ala-Ala with the key residues of the translocation channel, which is consistent with experimental data showing that Phe-Phe has a stronger binding affinity for hPEPT1 (Vig et al., 2006).

Figure 6.8. Changes of interaction energy E in continuous simulations of the uptake of Ala-Ala (A) and Phe-Phe (B) through the channel formed by TMD 5 and TMD 10.
The lines in each plot show the interaction energy E between the substrate and the Y167 atom O (purple), N171 atom O (yellow), N171 atom N (green) and S174 atom O (orange). E values larger than “-10000” are displayed as “-10000” in the plots.

6.5. Discussion
hPEPT1 has broad substrate specificity, as demonstrated by its transport of many di- and tri-peptides. Interestingly, hPEPT1 shows a varying degree of affinity for di- and tri-peptides, which results in differences in substrate transport (Vig et al., 2006). The transporter also has a high throughput requirement, since hPEPT1 is mainly functional after intake of food. Assuming the basic principle of a two-state model with pre- and post-transport conformations (Bolger et al., 1998), high throughput might be achieved by a specific mechanism to induce a return to the pre-transport state after substrate translocation. It has been proposed that reformation of a salt bridge between R282 and D341 after transport might be part of this mechanism (Kulkarni et al., 2007). The current study indicates that positive and negative residues in loop 6-7 also play an important role in substrate uptake. The structure of loop 6-7 of hPEPT1 is unknown and there are no structures of proteins with strong sequence homology to hPEPT1. The functional similarity between LacY and hPEPT1 (they are both 12-transmembrane proton symporters that translocate small hydrophilic molecules) has been used as a basis for homology modeling (Meredith and Price, 2006). Our secondary structure predictions indicated two helices in loop 6-7 in both proteins. In hPEPT1, the charged residues in the predicted helix in the N-terminal half of the loop are all positive, whereas all the negatively charged residues in the loop fall in the predicted C-terminal helix. These findings suggest a relationship among the
The two putative helices in loop 6-7 of hPEPT1 are amphipathic, and in the closed (pre-transport) conformation these helices may interact through hydrophobic contacts, with outward orientations of the charged residues. Upon substrate transport, the hydrophobic contacts may be broken and the loop could undergo a conformational change. The studies on LacY provide some support for this model. Both biophysical and biochemical studies on LacY suggest that the conformational change needed for completion of transport is accomplished via a rocker-switch movement (Guan et al., 2007; Zhou et al., 2008). The LacY structure was determined in the post-transport state and has loop 6-7 in an extended conformation, but with helices in the C- and N-terminal halves of the loop. The positive and negative charges within the similar predicted helices in the respective halves of loop 6-7 in hPEPT1 may provide long-range electrostatic interactions that contribute to reestablishment of a pre-transport conformation after substrate translocation. Intracellular loops containing secondary structure and loops that influence substrate transport have been found in other proteins. Mutagenesis and cysteine modification identified a putative alpha-helix in an intracellular loop of the serotonin transporter (Zhang and Rudnick, 2005), and mutations of an intracellular loop in the dopamine transporter significantly influence the kinetics of substrate uptake, with changes of Km and Vmax relative to the wild type protein (Chen et al., 2000). Our results for hPEPT1 provide a further example of the importance of intracellular loops in protein transporters and add another element to the emerging picture of the hPEPT1 translocation mechanism. Our substrate uptake simulations on the preliminary channel models also suggest a potential transport mechanism for hPEPT1 and potential functional roles for the key residues around the translocation channel.
The simulation results suggest that residues S174 and N171 in TMD 5 have major interactions with the substrate in phase 1, while residue Y161 in TMD 5 has a strong interaction with the substrate in phase 2. S177 in TMD 5 interacts with the substrate at the beginning of substrate uptake only if rotation of TMD 5 occurs. Since the cysteine-scanning mutagenesis results (Kulkarni et al., 2003a) suggest the potential importance of S177, the simulation results might indicate that rotation of TMD 5 does occur. In the simulations, Phe-Phe shows stronger interactions with the translocation channel than Ala-Ala, which is consistent with the results obtained from the affinity and activity assays (Vig et al., 2006).

These preliminary simulation models provide a starting point for further simulations on a more accurate hPEPT1 model. A crystal structure of the post-transport state of PepT(So), a functionally similar prokaryotic homologue of hPEPT1 from Shewanella oneidensis, has been solved (Newstead et al., 2011). The relative positions of TMD 5 and TMD 10 in this structure are similar to the positions used in the simulations. Generation of a more comprehensive model of hPEPT1 is possible using homology modeling based on the structure of PepT(So). Similar substrate uptake simulations could then be performed with this new model to better understand the hPEPT1 translocation mechanism. Accurate simulations of substrate uptake would aid the design of drugs that target hPEPT1 for efficient oral delivery.

References

Abramson, J., Smirnova, I., Kasho, V., Verner, G., Kaback, H.R., and Iwata, S. (2003). Structure and mechanism of the lactose permease of Escherichia coli. Science 301, 610-615. Ahren, B., Oosterwijk, C., Lips, C.J., and Hoppener, J.W. (1998). Transgenic overexpression of human islet amyloid polypeptide inhibits insulin secretion and glucose elimination after gastric glucose gavage in mice. Diabetologia 41, 1374-1380.
Bahadur, R.P., Zacharias, M., and Janin, J. (2008). Dissecting protein-RNA recognition sites. Nucleic acids research 36, 2705-2716. Barillari, C., Taylor, J., Viner, R., and Essex, J.W. (2007). Classification of water molecules in protein binding sites. Journal of the American Chemical Society 129, 2577-2587. Bedrood, S., Li, Y., Isas, J.M., Hegde, B.G., Baxa, U., Haworth, I.S., and Langen, R. (2012). Fibril structure of human islet amyloid polypeptide. J Biol Chem 287, 5235-5241. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. (2000). The Protein Data Bank. Nucleic acids research 28, 235-242. Bolger, M.B., Haworth, I.S., Yeung, A.K., Ann, D., von Grafenstein, H., Hamm-Alvarez, S., Okamoto, C.T., Kim, K.J., Basu, S.K., Wu, S., et al. (1998). Structure, function, and molecular modeling approaches to the study of the intestinal dipeptide transporter PepT1. J Pharm Sci 87, 1286-1291. Brandsch, M., Knutter, I., and Bosse-Doenecke, E. (2008). Pharmaceutical and pharmacological importance of peptide transporters. J Pharm Pharmacol 60, 543-585. Bryson, K., McGuffin, L.J., Marsden, R.L., Ward, J.J., Sodhi, J.S., and Jones, D.T. (2005). Protein structure prediction servers at University College London. Nucl Acids Res 33, W36-W38. Bui, H.-H., Schiewe, A.J., and Haworth, I.S. (2007). WATGEN: an algorithm for modeling water networks at protein-protein interfaces. Journal of computational chemistry 28, 2241-2251. Butler, A.E., Jang, J., Gurlo, T., Carty, M.D., Soeller, W.C., and Butler, P.C. (2004). Diabetes due to a progressive defect in beta-cell mass in rats transgenic for human islet amyloid polypeptide (HIP Rat): a new model for type 2 diabetes. Diabetes 53, 1509-1516. Cao, P., Tu, L.H., Abedini, A., Levsh, O., Akter, R., Patsalo, V., Schmidt, A.M., and Raleigh, D.P. (2012). Sensitivity of amyloid formation by human islet amyloid polypeptide to mutations at residue 20. J Mol Biol 421, 282-295.
Case, D.A., Cheatham, T.E., 3rd, Darden, T., Gohlke, H., Luo, R., Merz, K.M., Jr., Onufriev, A., Simmerling, C., Wang, B., and Woods, R.J. (2005). The Amber biomolecular simulation programs. J Comput Chem 26, 1668-1688. Castrignanò, T., Chillemi, G., Varani, G., and Desideri, A. (2002). Molecular dynamics simulation of the RNA complex of a double-stranded RNA-binding domain reveals dynamic features of the intermolecular interface and its hydration. Biophysical journal 83, 3542-3552. Chen, N., Ferrer, J.V., Javitch, J.A., and Justice, J.B.J. (2000). Transport-dependent accessibility of a cytoplasmic loop cysteine in the human dopamine transporter. J Biol Chem 275, 1608-1614. Chiti, F., and Dobson, C.M. (2006). Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem 75, 333-366. Clark, A., Cooper, G.J., Lewis, C.E., Morris, J.F., Willis, A.C., Reid, K.B., and Turner, R.C. (1987). Islet amyloid formed from diabetes-associated peptide may be pathogenic in type-2 diabetes. Lancet 2, 231-234. Clark, A., de Koning, E.J., Hattersley, A.T., Hansen, B.C., Yajnik, C.S., and Poulton, J. (1995). Pancreatic pathology in non-insulin dependent diabetes (NIDDM). Diabetes Res Clin Pract 28 Suppl, S39-47. Cole, C., Barber, J.D., and Barton, G.J. (2008). The Jpred 3 secondary structure prediction server. Nucl Acids Res 36, W197-W201. Cooper, G.J., Willis, A.C., Clark, A., Turner, R.C., Sim, R.B., and Reid, K.B. (1987). Purification and characterization of a peptide from amyloid-rich pancreases of type 2 diabetic patients. Proc Natl Acad Sci U S A 84, 8628-8632. Covitz, K.M., Amidon, G.L., and Sadee, S. (1998). Membrane topology of the human dipeptide transporter, hPEPT1, determined by epitope insertions. Biochem J 37, 15214-15221. Daniel, H. (2004). Molecular and Integrative Physiology of Intestinal Peptide Transport. Annual Review of Physiology 66, 361-384. de Beer, S.B.A., Vermeulen, N.P.E., and Oostenbrink, C. (2010).
The role of water molecules in computational drug design. Current topics in medicinal chemistry 10, 55-66. de Vries, S.J., Melquiond, A.S.J., Kastritis, P.L., Karaca, E., Bordogna, A., van Dijk, M., Rodrigues, J.P.G.L.M., and Bonvin, A.M.J.J. (2010a). Strengths and weaknesses of data-driven docking in critical assessment of prediction of interactions. Proteins 78, 3242-3249. de Vries, S.J., van Dijk, M., and Bonvin, A.M.J.J. (2010b). The HADDOCK web server for data-driven biomolecular docking. Nature protocols 5, 883-897. Doran, T.M., Kamens, A.J., Byrnes, N.K., and Nilsson, B.L. (2012). Role of amino acid hydrophobicity, aromaticity, and molecular volume on IAPP(20-29) amyloid self-assembly. Proteins 80, 1053-1065. Duan, J., and Nilsson, L. (2002). The role of residue 50 and hydration water molecules in homeodomain DNA recognition. European biophysics journal : EBJ 31, 306-316. Durchschlag, H., and Zipper, P. (2003). Modeling the hydration of proteins: prediction of structural and hydrodynamic parameters from X-ray diffraction and scattering data. European biophysics journal : EBJ 32, 487-502. Ellis, J.J., Broom, M., and Jones, S. (2007). Protein-RNA interactions: structural analysis and functional classes. Proteins 66, 903-911. Fox, A., Snollaerts, T., Errecart Casanova, C., Calciano, A., Nogaj, L.A., and Moffet, D.A. (2010). Selection for nonamyloidogenic mutants of islet amyloid polypeptide (IAPP) identifies an extended region for amyloidogenicity. Biochemistry 49, 7783-7789. Frishman, D., and Argos, P. (1996). Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Engineering, Design and Selection 9, 133-142. Gebre-Medhin, S., Mulder, H., Pekny, M., Westermark, G., Tornell, J., Westermark, P., Sundler, F., Ahren, B., and Betsholtz, C. (1998). Increased insulin secretion and glucose tolerance in mice lacking islet amyloid polypeptide (amylin). Biochem Biophys Res Commun 250, 271-277. 
Geroult, S., Hooda, M., Virdee, S., and Waksman, G. (2007). Prediction of solvation sites at the interface of Src SH2 domain complexes using molecular dynamics simulations. Chemical biology & drug design 70, 87-99. Goldsbury, C.S., Cooper, G.J., Goldie, K.N., Muller, S.A., Saafi, E.L., Gruijters, W.T., Misur, M.P., Engel, A., Aebi, U., and Kistler, J. (1997). Polymorphic fibrillar assembly of human amylin. J Struct Biol 119, 17-27. Green, J., Goldsbury, C., Mini, T., Sunderji, S., Frey, P., Kistler, J., Cooper, G., and Aebi, U. (2003). Full-length rat amylin forms fibrils following substitution of single residues from human amylin. J Mol Biol 326, 1147-1156. Green, J.D., Goldsbury, C., Kistler, J., Cooper, G.J., and Aebi, U. (2004). Human amylin oligomer growth and fibril elongation define two distinct phases in amyloid formation. J Biol Chem 279, 12206-12212. Guan, L., Mirza, O., Verner, G., Iwata, S., and Kaback, H.R. (2007). Structural determination of wild-type lactose permease. Proceedings of the National Academy of Sciences 104, 15294-15298. Hatmal, M.M., Lee, Y., Hegde, B.G., Hegde, P.B., Jao, C.C., Langen, R., and Haworth, I.S. (2011). Computer modeling of nitroxide spin labels on proteins. Biopolymers, in press. Hatmal, M.M., Li, Y., Hegde, B.G., Hegde, P.B., Jao, C.C., Langen, R., and Haworth, I.S. (2012). Computer modeling of nitroxide spin labels on proteins. Biopolymers 97, 35-44. Hayes, J.M., Skamnaki, V.T., Archontis, G., Lamprakis, C., Sarrou, J., Bischler, N., Skaltsounis, A.-L., Zographos, S.E., and Oikonomakos, N.G. (2011). Kinetics, in silico docking, molecular dynamics, and MM-GBSA binding studies on prototype indirubins, KT5720, and staurosporine as phosphorylase kinase ATP-binding site inhibitors: the role of water molecules examined. Proteins 79, 703-719. Higham, C.E., Jaikaran, E.T., Fraser, P.E., Gross, M., and Clark, A. (2000).
Preparation of synthetic human islet amyloid polypeptide (IAPP) in a stable conformation to enable study of conversion to amyloid-like fibrils. FEBS Lett 470, 55-60. Homans, S.W. (2007). Water, water everywhere--except where it matters? Drug discovery today 12, 534-539. Humphrey, W., Dalke, A., and Schulten, K. (1996). VMD: visual molecular dynamics. Journal of molecular graphics 14, 33-38, 27-38. Jaikaran, E.T., and Clark, A. (2001). Islet amyloid and type 2 diabetes: from molecular misfolding to islet pathophysiology. Biochim Biophys Acta 1537, 179-203. Jaikaran, E.T., Higham, C.E., Serpell, L.C., Zurdo, J., Gross, M., Clark, A., and Fraser, P.E. (2001). Identification of a novel human islet amyloid polypeptide beta-sheet domain and factors influencing fibrillogenesis. J Mol Biol 308, 515-525. Janson, J., Soeller, W.C., Roche, P.C., Nelson, R.T., Torchia, A.J., Kreutter, D.K., and Butler, P.C. (1996). Spontaneous diabetes mellitus in transgenic mice expressing human islet amyloid polypeptide. Proc Natl Acad Sci U S A 93, 7283-7288. Jao, C.C., Hegde, B.G., Chen, J., Haworth, I.S., and Langen, R. (2008). Structure of membrane-bound alpha-synuclein from site-directed spin labeling and computational refinement. Proc Natl Acad Sci U S A 105, 19666-19671. Jao, C.C., Hegde, B.G., Gallop, J.L., Hegde, P.B., McMahon, H.T., Haworth, I.S., and Langen, R. (2010). Roles of amphipathic helices and the bin/amphiphysin/rvs (BAR) domain of endophilin in membrane curvature generation. J Biol Chem 285, 20164-20170. Jayasinghe, S.A., and Langen, R. (2004). Identifying structural features of fibrillar islet amyloid polypeptide using site-directed spin labeling. J Biol Chem 279, 48420-48425. Jiang, L., Kuhlman, B., Kortemme, T., and Baker, D. (2005). A "solvated rotamer" approach to modeling water-mediated hydrogen bonds at protein-protein interfaces. Proteins 58, 893-904. Kahn, S.E., Andrikopoulos, S., and Verchere, C.B. (1999).
Islet amyloid: a long-recognized but underappreciated pathological feature of type 2 diabetes. Diabetes 48, 241-253. Kahn, S.E., D'Alessio, D.A., Schwartz, M.W., Fujimoto, W.Y., Ensinck, J.W., Taborsky, G.J., Jr., and Porte, D., Jr. (1990). Evidence of cosecretion of islet amyloid polypeptide and insulin by beta-cells. Diabetes 39, 634-638. Kajava, A.V., Aebi, U., and Steven, A.C. (2005). The parallel superpleated beta-structure as a model for amyloid fibrils of human amylin. J Mol Biol 348, 247-252. Karypis, G. (2006). YASSPP: better kernels and coding schemes lead to improvements in protein secondary structure prediction. Proteins 64, 575-586. Kayed, R., Bernhagen, J., Greenfield, N., Sweimeh, K., Brunner, H., Voelter, W., and Kapurniotu, A. (1999). Conformational transitions of islet amyloid polypeptide (IAPP) in amyloid formation in vitro. J Mol Biol 287, 781-796. Kulkarni, A.A., Davies, D.L., Links, J.S., Patel, L.N., Lee, V.H.L., and Haworth, I.S. (2007). A charge pair interaction between Arg282 in transmembrane segment 7 and Asp341 in transmembrane segment 8 of hPepT1. Pharm Res 24, 66-72. Kulkarni, A.A., Haworth, I.S., and Lee, V.H.L. (2003a). Transmembrane segment 5 of the dipeptide transporter hPepT1 forms a part of the substrate translocation pathway. Biochemical and Biophysical Research Communications 306, 177-185. Kulkarni, A.A., Haworth, I.S., Uchiyama, T., and Lee, V.H.L. (2003b). Analysis of Transmembrane Segment 7 of the Dipeptide Transporter hPepT1 by Cysteine-scanning Mutagenesis. J Biol Chem 278, 51833-51840. Laghaei, R., Mousseau, N., and Wei, G. (2011). Structure and thermodynamics of amylin dimer studied by Hamiltonian-temperature replica exchange molecular dynamics simulations. J Phys Chem B 115, 3146-3154. Laskowski, R.A., MacArthur, M.A., Moss, D.S., and Thornton, J.M. (1993a). PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 26, 283-291.
Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. (1993b). PROCHECK: a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography 26, 283-291. Levy, Y., and Onuchic, J.N. (2006). Water mediation in protein folding and molecular recognition. Annual review of biophysics and biomolecular structure 35, 389-415. Lewis, B.A., Walia, R.R., Terribilini, M., Ferguson, J., Zheng, C., Honavar, V., and Dobbs, D. (2011). PRIDB: a Protein-RNA Interface Database. Nucleic acids research 39, D277-282. Li, Y., Hatmal, M.M., Langen, R., and Haworth, I.S. (2012). Idealized models of protofilaments of human islet amyloid polypeptide. J Chem Inf Model 52, 2983-2991. Li, Z., and Lazaridis, T. (2007). Water at biomolecular binding interfaces. Physical chemistry chemical physics : PCCP 9, 573-581. Liang, R., Fei, Y.J., Prasad, P.D., Ramamoorthy, S., Han, H., Yang-Feng, T.L., Hediger, M.A., Ganapathy, V., and Leibach, F.H. (1995). Human Intestinal H+/Peptide Cotransporter. J Biol Chem 270, 6456-6463. Links, J.L.S., Kulkarni, A.A., Davies, D.L., Lee, V.H.L., and Haworth, I.S. (2007). Cysteine scanning of transmembrane domain three of the human dipeptide transporter: Implications for substrate transport. Journal of Drug Targeting 15, 218-225. Luca, S., Yau, W.M., Leapman, R., and Tycko, R. (2007). Peptide conformation and supramolecular organization in amylin fibrils: constraints from solid-state NMR. Biochemistry 46, 13505-13522. Luhrs, T., Ritter, C., Adrian, M., Riek-Loher, D., Bohrmann, B., Dobeli, H., Schubert, D., and Riek, R. (2005). 3D structure of Alzheimer's amyloid-beta(1-42) fibrils. Proc Natl Acad Sci U S A 102, 17342-17347. Luo, X., Lv, F., Pan, Y., Kong, X., Li, Y., and Yang, Q. (2011). Structure-based prediction of the mobility and disorder of water molecules at protein-DNA interface. Protein and peptide letters 18, 203-209. Makarov, V., Pettitt, B.M., and Feig, M. (2002).
Solvation and hydration of proteins and nucleic acids: a theoretical view of simulation and experiment. Accounts of chemical research 35, 376-384. Marek, P., Abedini, A., Song, B., Kanungo, M., Johnson, M.E., Gupta, R., Zaman, W., Wong, S.S., and Raleigh, D.P. (2007). Aromatic interactions are not required for amyloid fibril formation by islet amyloid polypeptide but do influence the rate of fibril formation and fibril morphology. Biochemistry 46, 3255-3261. Margittai, M., and Langen, R. (2008). Fibrils with parallel in-register structure constitute a major class of amyloid fibrils: molecular insights from electron paramagnetic resonance spectroscopy. Q Rev Biophys 41, 265-297. Mazor, Y., Gilead, S., Benhar, I., and Gazit, E. (2002). Identification and characterization of a novel molecular-recognition and self-assembly domain within the islet amyloid polypeptide. J Mol Biol 322, 1013-1024. McHaourab, H.S., Lietzow, M.A., Hideg, K., and Hubbell, W.L. (1996). Motion of spin-labeled side chains in T4 lysozyme. Correlation with protein structure and dynamics. Biochemistry 35, 7692-7704. Meier, J.J., Kayed, R., Lin, C.Y., Gurlo, T., Haataja, L., Jayasinghe, S., Langen, R., Glabe, C.G., and Butler, P.C. (2006). Inhibition of human IAPP fibril formation does not prevent beta-cell death: evidence for distinct actions of oligomers and fibrils of human IAPP. Am J Physiol Endocrinol Metab 291, E1317-1324. Meredith, D. (2004). Site-directed Mutation of Arginine 282 to Glutamate Uncouples the Movement of Peptides and Protons by the Rabbit Proton-peptide Cotransporter PepT1. J Biol Chem 279, 15795-15798. Meredith, D. (2009). The mammalian proton-coupled peptide cotransporter PepT1: sitting on the transporter-channel fence? Phil Trans R Soc Lond 364, 203-207. Meredith, D., and Price, R.A. (2006). Molecular Modeling of PepT1 -- Towards a Structure. Journal of Membrane Biology 213, 79-88. Michel, J., Tirado-Rives, J., and Jorgensen, W.L. (2009).
Prediction of the water content in protein binding sites. The journal of physical chemistry B 113, 13337-13346. Newstead, S., Drew, D., Cameron, A.D., Postis, V.L., Xia, X., Fowler, P.W., Ingram, J.C., Carpenter, E.P., Sansom, M.S., McPherson, M.J., et al. (2011). Crystal structure of a prokaryotic homologue of the mammalian oligopeptide-proton symporters, PepT1 and PepT2. The EMBO journal 30, 417-426. Nilsson, M.R., and Raleigh, D.P. (1999). Analysis of amylin cleavage products provides new insights into the amyloidogenic region of human amylin. J Mol Biol 294, 1375-1385. Nurisso, A., Blanchard, B., Audfray, A., Rydner, L., Oscarson, S., Varrot, A., and Imberty, A. (2010). Role of water molecules in structure and energetics of Pseudomonas aeruginosa lectin I interacting with disaccharides. The Journal of biological chemistry 285, 20316-20327. Padrick, S.B., and Miranker, A.D. (2001). Islet amyloid polypeptide: identification of long-range contacts and local order on the fibrillogenesis pathway. J Mol Biol 308, 783-794. Paravastu, A.K., Leapman, R.D., Yau, W.M., and Tycko, R. (2008). Molecular structural basis for polymorphism in Alzheimer's beta-amyloid fibrils. Proc Natl Acad Sci U S A 105, 18349-18354. Petersen, B., Petersen, T., Andersen, P., Nielsen, M., and Lundegaard, C. (2009). A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Structural Biology 9, 51. Petkova, A.T., Yau, W.M., and Tycko, R. (2006). Experimental constraints on quaternary structure in Alzheimer's beta-amyloid fibrils. Biochemistry 45, 498-512. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L., and Schulten, K. (2005). Scalable molecular dynamics with NAMD. J Comput Chem 26, 1781-1802. Pitt, W.R., and Goodfellow, J.M. (1991). Modelling of solvent positions around polar groups in proteins. Protein engineering 4, 531-537. Qin, Z., Hu, D., Han, S., Hong, D.P., and Fink, A.L.
(2007). Role of different regions of alpha-synuclein in the assembly of fibrils. Biochemistry 46, 13322-13330. Radovan, D., Smirnovas, V., and Winter, R. (2008). Effect of pressure on islet amyloid polypeptide aggregation: revealing the polymorphic nature of the fibrillation process. Biochemistry 47, 6352-6360. Reichmann, D., Phillip, Y., Carmi, A., and Schreiber, G. (2008). On the contribution of water-mediated interactions to protein-complex stability. Biochemistry 47, 1051-1060. Reinke, A.A., and Gestwicki, J.E. (2011). Insight into amyloid structure using chemical probes. Chem Biol Drug Des 77, 399-411. Robinson, C.R., and Sligar, S.G. (1998). Changes in solvation during DNA binding and cleavage are critical to altered specificity of the EcoRI endonuclease. Proceedings of the National Academy of Sciences of the United States of America 95, 2186-2191. Rubio-Aliaga, I., and Daniel, H. (2008). Peptide transporters and their roles in physiological processes and drug disposition. Xenobiotica 38, 1022-1042. Saier, M.H.J., Tran, C.V., and Barabote, R.D. (2006). TCDB: the transporter classification database for membrane transport protein analysis and information. Nucleic Acids Res 34, D181-D186. Schlessman, J.L., Abe, C., Gittis, A., Karp, D.A., Dolan, M.A., and García-Moreno E, B. (2008). Crystallographic study of hydration of an internal cavity in engineered proteins with buried polar or ionizable groups. Biophysical journal 94, 3208-3216. Schrodinger, LLC (2010). The PyMOL Molecular Graphics System, Version 1.3r1. Sen, T.Z., Jernigan, R.L., Garnier, J., and Kloczkowski, A. (2005). GOR V server for protein secondary structure prediction. Bioinformatics 21, 2787-2788. Sonavane, S., and Chakrabarti, P. (2009). Cavities in protein-DNA and protein-RNA interfaces. Nucleic acids research 37, 4613-4620. Stefani, M. (2004). Protein misfolding and aggregation: new examples in medicine and biology of the dark side of the protein world. Biochim Biophys Acta 1, 5-25.
Sumner Makin, O., and Serpell, L.C. (2004). Structural characterisation of islet amyloid polypeptide fibrils. J Mol Biol 335, 1279-1288. Sutch, B.T., Chambers, E.J., Bayramyan, M.Z., Gallaher, T.K., and Haworth, I.S. (2009). Similarity of protein-RNA interfaces based on motif analysis. Journal of chemical information and modeling 49, 2139-2146. Team, R.D.C. (2009). R: A language and environment for statistical computing (Vienna, Austria: R Foundation for Statistical Computing). Teyra, J., and Pisabarro, M.T. (2007). Characterization of interfacial solvent in protein complexes and contribution of wet spots to the interface description. Proteins 67, 1087-1095. Tracz, S.M., Abedini, A., Driscoll, M., and Raleigh, D.P. (2004). Role of aromatic interactions in amyloid formation by peptides derived from human Amylin. Biochemistry 43, 15901-15908. Treger, M., and Westhof, E. (2001). Statistical analysis of atomic contacts at RNA-protein interfaces. Journal of molecular recognition : JMR 14, 199-214. van Dijk, A.D.J., and Bonvin, A.M.J.J. (2006). Solvated docking: introducing water into the modelling of biomolecular complexes. Bioinformatics (Oxford, England) 22, 2340-2347. Vig, B.S., Stouch, T.R., Timoszyk, J.K., Quan, Y., Wall, D.A., Smith, R.L., and Faria, T.N. (2006). Human PEPT1 Pharmacophore Distinguishes between Dipeptide Transport and Binding. Journal of Medicinal Chemistry 49, 3636-3644. Virtanen, J.J., Makowski, L., Sosnick, T.R., and Freed, K.F. (2010). Modeling the hydration layer around proteins: HyPred. Biophysical journal 99, 1611-1619. Wang, L., Middleton, C.T., Singh, S., Reddy, A.S., Woys, A.M., Strasfeld, D.B., Marek, P., Raleigh, D.P., de Pablo, J.J., Zanni, M.T., et al. (2011). 2DIR spectroscopy of human amylin fibrils reflects stable beta-sheet structure. J Am Chem Soc 133, 16062-16071. Wasmer, C., Lange, A., Van Melckebeke, H., Siemer, A.B., Riek, R., and Meier, B.H. (2008).
Amyloid fibrils of the HET-s(218-289) prion form a beta solenoid with a triangular hydrophobic core. Science 319, 1523-1526. Wei, L., Jiang, P., Xu, W., Li, H., Zhang, H., Yan, L., Chan-Park, M.B., Liu, X.W., Tang, K., Mu, Y., et al. (2011). The molecular basis of distinct aggregation pathways of islet amyloid polypeptide. J Biol Chem 286, 6291-6300. Westermark, P., Engstrom, U., Johnson, K.H., Westermark, G.T., and Betsholtz, C. (1990). Islet amyloid polypeptide: pinpointing amino acid residues linked to amyloid fibril formation. Proc Natl Acad Sci U S A 87, 5036-5040. Westermark, P., Wernstedt, C., Wilander, E., Hayden, D.W., O'Brien, T.D., and Johnson, K.H. (1987a). Amyloid fibrils in human insulinoma and islets of Langerhans of the diabetic cat are derived from a neuropeptide-like protein also present in normal islet cells. Proc Natl Acad Sci U S A 84, 3881-3885. Westermark, P., Wilander, E., and Johnson, K.H. (1987b). Islet amyloid polypeptide. Lancet 2, 623. Wiltzius, J.J., Sievers, S.A., Sawaya, M.R., Cascio, D., Popov, D., Riekel, C., and Eisenberg, D. (2008). Atomic structure of the cross-beta spine of islet amyloid polypeptide (amylin). Protein Sci 17, 1467-1474. Woods, S.C., Lutz, T.A., Geary, N., and Langhans, W. (2006). Pancreatic signals controlling food intake; insulin, glucagon and amylin. Philos Trans R Soc Lond B Biol Sci 361, 1219-1235. Xu, L., Haworth, I.S., Kulkarni, A.A., Bolger, M.B., and Davies, D.L. (2009). Mutagenesis and Cysteine Scanning of Transmembrane Domain 10 of the Human Dipeptide Transporter. Pharm Res 26, 2358-2366. Xu, L., Li, Y., Haworth, I.S., and Davies, D.L. (2010). Functional role of the intracellular loop linking transmembrane domains 6 and 7 of the human dipeptide transporter hPEPT1. The Journal of membrane biology 238, 43-49. Ye, X., Gorin, A., Ellington, A.D., and Patel, D.J. (1996). Deep penetration of an alpha-helix into a widened RNA major groove in the HIV-1 rev peptide-RNA aptamer complex. 
Nature structural biology 3, 1026-1033. Yokota, A., Tsumoto, K., Shiroishi, M., Kondo, H., and Kumagai, I. (2003). The role of hydrogen bonding via interfacial water molecules in antigen-antibody complexation. The HyHEL-10-HEL interaction. The Journal of biological chemistry 278, 5410-5418. Young, A. (2005). Inhibition of food intake. Adv Pharmacol 52, 79-98. Zhang, Y.W., and Rudnick, G. (2005). Cysteine-scanning mutagenesis of serotonin transporter intracellular loop 2 suggests an alpha-helical conformation. J Biol Chem 280, 30807-30813. Zhao, J., Yu, X., Liang, G., and Zheng, J. (2011a). Heterogeneous triangular structures of human islet amyloid polypeptide (amylin) with internal hydrophobic cavity and external wrapping morphology reveal the polymorphic nature of amyloid fibrils. Biomacromolecules 12, 1781-1794. Zhao, J., Yu, X., Liang, G., and Zheng, J. (2011b). Structural polymorphism of human islet amyloid polypeptide (hIAPP) oligomers highlights the importance of interfacial residue interactions. Biomacromolecules 12, 210-220. Zhou, Y., Guan, L., Freites, J.A., and Kaback, H.R. (2008). Opening and closing of the periplasmic gate in lactose permease. Proceedings of the National Academy of Sciences 105, 3774-3778. Zraika, S., Hull, R.L., Verchere, C.B., Clark, A., Potter, K.J., Fraser, P.E., Raleigh, D.P., and Kahn, S.E. (2010). Toxic oligomers and islet beta cell death: guilty by association or convicted by circumstantial evidence? Diabetologia 53, 1046-1056.
Abstract
Amyloid fibrils are associated with over 40 human diseases. For example, fibrils of human islet amyloid polypeptide (hIAPP), α-synuclein, and amyloid-β are pathological hallmarks of type 2 diabetes, Parkinson’s disease and Alzheimer’s disease, respectively. Understanding the structure of fibrils and of the protofilaments that constitute them will help to reveal the mechanism of fibril formation in human diseases and to facilitate therapeutic intervention. However, the fibril structures of full-length amyloid proteins/peptides are difficult to determine by direct experimental approaches since they are neither soluble nor crystallizable.

The goals of this thesis are to develop algorithms, programs and protocols for the modeling and determination of amyloid fibril structures based on experimental data obtained mainly from electron paramagnetic resonance (EPR) spectroscopy and electron microscopy (EM), and then to use the resulting models to explain previously unclear aspects of the experimental data and to aid experimental design for obtaining more structural details of the amyloid fibrils.

In the thesis work, a simulation protocol for modeling amyloid protofilaments based mainly on EPR data was developed. This protocol was shown to produce structures of hIAPP protofilaments consistent with the experimental data. Then, to investigate the properties of the protofilament structures obtained and to generate fibril models consisting of more than one protofilament, a new program, MFIBRIL, was developed for flexible construction of fibrils from a monomeric unit. Several potential models of the hIAPP fibril were constructed using MFIBRIL and then refined by equilibration using molecular dynamics simulations in the NAMD software package.
The refined models were evaluated by predicting the ring-to-ring distances using the MFIBRIL analysis tool, and the inter-spin label distances and residue mobilities using PRONOX (another program developed in our laboratory), and then comparing the predictions to the experimental data. This work led to identification of a favorable hIAPP fibril model consisting of two protofilaments with strong hydrophobic interactions between I26 and V32 surrounding hydrophilic interactions between S28 and T30 in the second β-strand (C-terminal) regions. The programs and protocols developed in this work are applicable to structural determination of other amyloid fibrils.

In addition to the modeling of amyloid fibrils, two other projects are described in the thesis. The first is the extension of the solvation function of WATGEN (a program developed in our laboratory for modeling the water network at protein-peptide interfaces) to protein-RNA interfaces, together with an investigation of the features of solvated protein-RNA interfaces. The second project involves modeling of the transport mechanism of the human dipeptide transporter (hPEPT1).
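The idea of constructing a protofilament from a monomeric unit, and then measuring distances between equivalent sites in neighboring layers, can be illustrated with a short geometric sketch. This is not the MFIBRIL algorithm, whose details are not given in this abstract: the function names are invented for illustration, and the 4.8 Å rise per layer and the optional twist about the fibril axis are generic cross-β assumptions rather than values taken from the thesis.

```python
import math

# Illustrative sketch only, NOT the MFIBRIL implementation.
# Assumptions: the fibril axis is z, each layer is a rigid copy of the monomer,
# rise=4.8 (angstroms) is a generic cross-beta spacing, twist is optional.


def stack_monomer(coords, n_layers, rise=4.8, twist_deg=0.0):
    """Return n_layers copies of coords; layer k is rotated by k*twist_deg
    about z and translated by k*rise along z."""
    layers = []
    for k in range(n_layers):
        theta = math.radians(k * twist_deg)
        c, s = math.cos(theta), math.sin(theta)
        layer = [(c * x - s * y, s * x + c * y, z + k * rise) for x, y, z in coords]
        layers.append(layer)
    return layers


def site_distance(layers, site, layer_a, layer_b):
    """Distance between the same site (index into coords) in two layers."""
    return math.dist(layers[layer_a][site], layers[layer_b][site])


# Hypothetical one-site "monomer", e.g. the centroid of an aromatic ring.
monomer = [(10.0, 0.0, 0.0)]
flat = stack_monomer(monomer, n_layers=3)
twisted = stack_monomer(monomer, n_layers=3, twist_deg=2.0)
d_flat = site_distance(flat, 0, 0, 1)
d_twisted = site_distance(twisted, 0, 0, 1)
print(round(d_flat, 3), round(d_twisted, 3))
```

Distances computed this way for labeled or aromatic sites could then be compared with experimentally derived distances; in the thesis that comparison is made with the MFIBRIL analysis tool and PRONOX after molecular dynamics refinement.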
Asset Metadata
Creator
Li, Yiyu
(author)
Core Title
Algorithm development for modeling protein assemblies
School
School of Pharmacy
Degree
Doctor of Philosophy
Degree Program
Molecular Pharmacology and Toxicology
Publication Date
04/08/2013
Defense Date
01/10/2013
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
algorithm,human islet amyloid polypeptide (hIAPP),MFIBRIL,molecular modeling,OAI-PMH Harvest,protein assemblies
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Haworth, Ian S. (committee chair), Langen, Ralf (committee member), Neamati, Nouri (committee member)
Creator Email
yiyuli@usc.edu,yyseashell@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-234407
Unique identifier
UC11293183
Identifier
usctheses-c3-234407 (legacy record id)
Legacy Identifier
etd-LiYiyu-1526.pdf
Dmrecord
234407
Document Type
Dissertation
Rights
Li, Yiyu
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA