Structure-based computational analysis and prediction of TCR CDR3 loops in the TCR-peptide-MHC
complex using solvation parameters and peptide molecular dynamics.
By
Vini Patel
A Thesis Presented to the
FACULTY OF THE USC ALFRED E. MANN SCHOOL OF PHARMACY AND
PHARMACEUTICAL SCIENCES
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(MOLECULAR PHARMACOLOGY AND TOXICOLOGY)
August 2023
Acknowledgements
I want to express my profound gratitude to the people who have supported me during my research
endeavor and my master’s studies. Their contributions have made a huge impact on my academic
pursuits, and I extend my heartfelt gratitude to them.
First of all, I would like to extend my deepest gratitude to my mentor, Dr. Ian Haworth for his
guidance and mentorship. Dr. Haworth not only provided me with an opportunity to explore a novel
area of research but also meticulously guided me through the entire project. His insightful
suggestions and his support to develop and pursue my own ideas played a pivotal role in shaping
my research work. He provided me with consistent support during the difficult part of the research
and provided unwavering support during the crafting of my thesis. It was a great pleasure to know
him and be a part of his research team. I would also like to extend my sincere gratitude to my lab
colleague, Nairuti Mehta. I am thankful for her collaboration, and the valuable insights she brought
to our shared endeavors.
I am deeply appreciative of the invaluable feedback and suggestions provided by my thesis
committee members, Dr. Serghei Mangul and Dr. Houda Alachkar. Their expertise and thoughtful
guidance have been instrumental in refining the quality of my thesis.
Lastly, my journey would not have been possible without the steadfast support of my parents and
brother. Their unwavering belief in my abilities, coupled with their emotional and financial
support, has been a driving force behind my accomplishments. Their encouragement to relentlessly
pursue my goals has been a constant source of inspiration.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Background
1.2 Structure of MHC Class-I molecules and binding of antigenic peptides
1.3 TCR structure and binding of antigenic peptides
1.4 Recent structure-based studies for TCR binding to the peptide
Chapter 2: Methods for Development of CDR3 Loop Conformations and Molecular Interactions
2.1 Introduction
2.2 Methods
Chapter 3: Application of solvation and Molecular Analysis of CDR3 loops
3.1 Introduction
3.2 Structures with CDR3 loop variation obtained from AddProt
3.3 Molecular interactions between peptide and CDR3 loops
3.4 Inter-Loop and Intra-Loop molecular analysis of structures with CDR3α/CDR3β variations using python scripts
Chapter 4: Molecular Dynamics Simulations of peptides from MHC-peptide-TCR complexes
4.1 Introduction
4.2 Methods
4.3 Results
Chapter 5: Discussion
References
Appendix
List of Tables
Table 1: List of TCRs from the TCR3d database with peptide sequences and sequences of CDR3 loops.
Table 2: Excel sheet used to read PDB name, chain letter, and start and end residue number.
Table 3: Ten PDB structures with CDR3α loop sequence, angle increment, total conformers, and 10 selected conformers
Table 4: Ten PDB structures with CDR3β loop sequence, angle increment, total conformers, and 10 selected conformers
Table 5: Snippet from the AddProt solutions output for CDR3α of 1AO7 PDB structures
Table 6: Example of the 1AO7 complex with the Solvate result for CDR3α and CDR3β variation.
Table 7: Hydrophobic interactions and oxygen clashes between CDR3 loops for 1AO7 X-ray structure and CDR3 loop varied structures.
Table 8: Average hydrophobic interactions and oxygen clash distances between CDR3α and CDR3β for 15 PDB structures comparing X-ray structures and CDR3 loop varied structures.
Table 9: Comparison of all peptides based on conformations and binding
Table 10: Characteristics of conformers in molecular dynamics simulations of GILGFVFTL
Table 11: Characteristics of conformers in molecular dynamics simulations of GILGKVFTL
Table 12: Characteristics of conformers in molecular dynamics simulations of GILGAVFTL.
Table 13: Characteristics of conformers in molecular dynamics simulations of GILGFVKTL.
Table 14: Characteristics of conformers in molecular dynamics simulations of GILGFVFVL.
Table 15: Characteristics of conformers in molecular dynamics simulations of GLFGGGFGV
Table 16: Characteristics of conformers in molecular dynamics simulations of GILGGGFTL
List of Figures
Figure 1: MHC-peptide-TCR complex. The CDR3 α and β loops are shown to bind to the peptide that is present in the binding cleft of the MHC heavy chain
Figure 2: Flow chart explaining the process of development of CDR3 conformations.
Figure 3: Python Code used for the extraction of the CDR3 loops
Figure 4: Example of an in_peptide.txt file
Figure 5: Example of run.bat file that runs the AddProt
Figure 6: Python Code that extracts out the conformer from the AddProt output and replaces the X-ray CDR3 loop with the varied AddProt conformer
Figure 7: Incorrect C-O-N-H torsion angle of 1AO7 CDR3β loop
Figure 8: Python Code for correction of Torsion angle (written by Nairuti Mehta). The code calculates the dihedral angles and sets them to 180º
Figure 9: Python Code that reads and extracts values from the Excel sheet for automation.
Figure 10: Solvate GUI with the MHC TCR and peptide file input parameter
Figure 11: Clash distance code example
Figure 12: Code for calculation of intermolecular interactions. A similar code was used for the calculation of intra-molecular interaction and the oxygen clashes
Figure 13: 416 conformers of the 1AO7 CDR3α loop obtained from AddProt.
Figure 14: Box plots comparing four categories of complexes: Binding (X-ray structure), non-binding conformations with variation in peptide, non-binding conformation with variation in CDR3α loop, and non-binding conformation with variation in CDR3α loop and peptide orientation
Figure 15: Box plots comparing four categories of complexes: Binding (X-ray structure), non-binding conformations with variation in peptide, non-binding conformation with variation in CDR3β loop, and non-binding conformation with variation in CDR3β loop and peptide orientation
Figure 16: Box plots comparing four categories of complexes: Binding (X-ray structure), non-binding conformations with variation in peptide, non-binding conformation with variation in CDR3α loop, and non-binding conformation with variation in CDR3α loop and peptide orientation
Figure 17: Box plots comparing four categories of complexes: Binding (X-ray structure), non-binding conformations with variation in peptide, non-binding conformation with variation in CDR3β loop, and non-binding conformation with variation in CDR3β loop and peptide orientation
Figure 18: Box plots comparing the hydrophobic interaction between the CDR3 loops and Peptide (Binding - X-ray Structure and Non-Binding - Structures with variation in CDR3α/CDR3β)
Figure 19: Box plots comparing the hydrophobic interaction between the CDR3 loops (Binding - X-ray Structure and Non-Binding - Structures with variation in CDR3α/CDR3β)
Figure 20: Box plots comparing the hydrophobic interaction of the CDR3 loops (Binding - X-ray Structure and Non-Binding - Structures with variation in CDR3α and CDR3β)
Figure 21: Box plots comparing the internal hydrophobic interaction of the CDR3 loops (Binding - X-ray Structure and Non-Binding - Structures with variation in CDR3α/CDR3β)
Figure 22: Box plots comparing the oxygen clash between the CDR3 loops (Binding - X-ray Structure and Non-Binding - Structures with variation in CDR3α/CDR3β)
Figure 23: Box plots comparing the internal oxygen clash of the CDR3 loops (Binding - X-ray Structure and Non-Binding - Structures with variation in CDR3α/CDR3β)
Figure 24: YANACONDA Code to calculate the N-C terminal and plot a distance vs. time plot for all the trajectories.
Figure 25: YANACONDA Code written for the calculation of the center of mass of a residue in the peptide GILGFVKTL and the code to plot graphs for the distance between the center of mass of the selected residues.
Figure 26: Molecular Structure of GILGFVFTL
Figure 27: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGFVFTL
Figure 28: The N-C terminal distance graph for GILGFVFTL
Figure 29: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Phe5 (B) Phe5-Phe7 (C) Leu3-Phe7 (D) Ile2-Val6
Figure 30: Molecular Structure of GILGKVFTL
Figure 31: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGKVFTL.
Figure 32: The N-C terminal distance graph for GILGKVFTL
Figure 33: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Lys5 (B) Lys5-Phe7 (C) Leu3-Phe7 (D) Ile2-Val6
Figure 34: Molecular structure of GILGAVFTL
Figure 35: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGAVFTL
Figure 36: The N-C terminal distance graph for GILGAVFTL
Figure 37: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Ala5 (B) Ala5-Phe7 (C) Leu3-Phe7 (D) Ile2-Val6
Figure 38: Molecular structure of GILGFVKTL
Figure 39: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGFVKTL
Figure 40: The N-C terminal distance graph for GILGFVKTL
Figure 41: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Phe5 (B) Phe5-Lys7 (C) Leu3-Lys7 (D) Gly2-Gly6.
Figure 42: Molecular Structure of GILGFVFVL
Figure 43: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGFVFVL
Figure 44: The N-C terminal distance graph for GILGFVFVL.
Figure 45: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Phe5 (B) Phe5-Lys7 (C) Leu3-Lys7 (D) Gly2-Gly6.
Figure 46: Molecular Structure of GLFGGGFGV
Figure 47: Quantitative analysis of the N-C terminal distance to capture stable conformers of GLFGGGFGV.
Figure 48: The N-C terminal distance graph for GLFGGGFGV.
Figure 49: The distances between the center of mass of the different residues in the peptide. (A) Phe3-Gly5 (B) Phe5-Phe7 (C) Leu3-Lys7 (D) Leu2-Gly6.
Figure 50: Molecular Structure of GILGGGFTL
Figure 51: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGGGFTL
Figure 52: The N-C terminal distance graph for GILGGGFTL
Figure 53: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Gly5 (B) Gly3-Phe7 (C) Leu3-Phe7 (D) Ile2-Gly6.
Figure 54: The N-C terminal distance for the peptide GGGGGGGGG.
Abstract
T-cells play an important role in the activation of the immune response, especially through adaptive
immunity. The T-cell receptor scans the cells for MHC molecules containing an antigen peptide.
The peptide molecule first binds to the MHC molecule and then the T-cell receptor attaches to this
complex by changing the conformations of the complementarity-determining regions (CDRs) of
the T-cell receptor. The CDRs bind to both MHC molecules and peptides. The CDR3 region is
chiefly responsible for binding to the peptide, and understanding the mechanism of this binding
can improve understanding of the immune response and help in the development of cancer
immunotherapy. This thesis discusses a computational method developed for structural analysis of
the CDR3 region of the T-cell receptor and the factors that play an important role in interactions
of the peptide and CDR3 regions. The results showed that CDR3 loops and peptides try to
maximize internal hydrophobic contacts with only a few hydrophobic interactions between the
CDR3 loops and the peptide. Also, CDR3 β loops played a major role in interaction with the
peptide. Both the loops adjusted relative to each other to fit with the peptide-MHC complex. In
addition, the structural stability of the peptide in the physiological solution was studied using
molecular dynamics. The molecular dynamics results suggest that the stability of the peptide, in
addition to anchor residues, plays an important role in binding with the MHC molecule.
Chapter 1: Introduction
1.1 Background:
Adaptive immunity is a second-line defense against viral, bacterial, and other pathogenic infections. The
adaptive immune system usually involves a combination of an antibody-dependent immune
response and a cell-based immune response. These processes are carried out by two important
types of lymphocytes: B-cells and T-cells, respectively. Cell-mediated immunity occurs when an
infected cell presents its antigen peptide to the T-cell, which activates the T-cell. This can lead to
the destruction of the host cell, or the T-cells can produce molecules that activate macrophages
to destroy the infecting organism [1].
Antigen peptides are presented to the T-cell on the surface of cells that have been infected by
pathogens, or of dendritic cells that have ingested them, with the help of a complex known as the
major histocompatibility complex (MHC). There are two classes of MHC molecules, MHC class-I and
MHC class-II. The two classes activate different T-cell types, leading to different downstream
immune responses. MHC class-I is presented on cells with intracellular infection and activates
CD8+ T-cells, which are cytotoxic and kill the infected cells. MHC class-II molecules are generally
presented when an extracellular pathogen needs to be eradicated, and are presented by specialized
cells such as dendritic cells, macrophages, and B-cells. MHC class-II molecules activate CD4+ T-cells,
which in turn activate other cells such as macrophages and B-cells, leading to the formation of
antibodies [2,3]. This thesis deals only with the MHC class-I peptide complex, so only this complex
and its interaction with the TCR are discussed here.
When a cell is infected by a virus or other pathogen, the pathogen produces proteins within the
cell. These proteins are broken down into small fragments in the cell cytoplasm by the
proteasome. The fragments that originate from the pathogen serve as antigens for binding to the
MHC class-I molecule. The MHC class-I molecule binds to specific antigen peptides that fit its
binding groove. After proteasomal degradation, the peptides are trimmed by cytosolic
aminopeptidases, transported into the ER lumen by TAP proteins, further trimmed by ER
aminopeptidases, and then presented to MHC class-I molecules. The MHC class-I molecule can bind
to a large repertoire of antigenic peptides but selects only the optimal candidates that are
suitable for its binding groove. The peptide-loaded MHC class-I molecule is then transported to
the cell surface, where it is presented to cytotoxic T-cells for activation of the immune
response. The antigen-presenting cells also present two other types of molecules on their
surface: co-stimulatory molecules, which bind to complementary protein molecules on the surface
of T-cells, and adhesion molecules, which allow the T-cell to attach to the antigen-presenting
cell so that it can bind to the MHC-peptide complex. The TCR on the T-cell can then recognize the
MHC class-I molecule and the peptide it presents, which activates the T-cell [4].
Figure 1: MHC-peptide-TCR complex. The CDR3 α and β loops are shown to bind to the peptide
that is present in the binding cleft of the MHC heavy chain [5].
1.2 Structure of MHC class-I molecules and binding of the antigenic peptide:
The class-I MHC molecule consists of a polymorphic heavy chain, a conserved β2-microglobulin
(β2m) chain, and an endogenous antigen epitope. Thus, the MHC class-I molecule forms a
ternary structure. The heavy chain is composed of three components: an extracellular region, a
transmembrane region, and a cytosolic region. The extracellular portion includes the α1 and α2
domains, which pair to form elongated α helices and a β-strand floor that together make up the
peptide binding cleft. Thus, the binding cleft consists of a β-sheet surrounded by α helices. The
heavy chain is highly polymorphic, with most of the polymorphism located in the peptide
binding cleft to accommodate the large repertoire of antigenic peptides that bind to it. There is
also an α3 domain, which resembles a microglobulin molecule and is not required for peptide
binding; its position relative to the binding cleft depends on genetic variation and the allele.
β2m is another microglobulin that forms many contacts with the α1 and α2 domains. It helps
maintain the peptide binding cleft and thus supports the binding of the peptide [6].
[Figure 1 labels: TCR α loop, TCR β loop, CDR3 α loop, CDR3 β loop, peptide, MHC heavy chain, MHC β2-microglobulin]
The binding cleft in the MHC class-I molecule has conserved regions at either end, which
accommodate the carboxyl and amino termini of the peptide. Hydrogen bonds hold the peptide
termini in these conserved regions. This allows the MHC to bind diverse groups of peptides
with good affinity. This kind of binding works irrespective of the length of the antigenic
peptide, as a longer peptide can still anchor its carboxyl and amino termini by bulging or
adopting a zig-zag conformation [6].
The MHC class-I molecule can bind a specific set of peptides that depends on the allele. The
polymorphic residues make up certain pockets in the binding cleft, and only certain peptide
residues can fit in these pockets. There are typically two binding pockets, which bind the
second and the ninth residues of the peptide. The peptide residues that occupy these pockets
are termed 'anchor' residues, as they largely determine whether the peptide will bind to the
binding cleft or not. Anchoring orients the peptide properly in the binding cleft and causes
the center of the peptide chain to bulge. Many water molecules are present in this region,
which allows the bulging residues of the peptide backbone to bind to the binding cleft. The
bulging residues are typically residues 4-7, which are generally responsible for binding to the
T-cell receptor. Thus, the different preferences for specific amino acids across the binding
cleft allow prediction of the peptide sequences that will be able to bind to a specific type of
MHC class-I molecule. Some secondary pockets bind the first and third residues of the peptide
chain. The residues that bind in these pockets can either increase or decrease the binding
affinity. The total binding energy is the sum of the contributions of all the residues binding
in the cleft pockets [6].
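The additive picture described above (anchor pockets at positions 2 and 9, secondary pockets at positions 1 and 3, and a total binding energy taken as a sum over pockets) can be illustrated with a minimal Python sketch. The weights below are hypothetical placeholders in arbitrary units, not measured binding energies, and the function names are introduced here only for illustration; the peptide GILGFVFTL is one of the peptides listed in this thesis.

```python
# Sketch only: pocket positions follow the text (anchors at P2 and P9,
# secondary pockets at P1 and P3, TCR-facing bulge at P4-P7).
ANCHOR_POSITIONS = (2, 9)
SECONDARY_POSITIONS = (1, 3)
TCR_FACING_POSITIONS = (4, 5, 6, 7)

# HYPOTHETICAL per-pocket preference weights (arbitrary units, not measured
# binding energies); a real table would have to be derived per MHC allele.
HYPOTHETICAL_POCKET_WEIGHTS = {
    2: {"L": 1.0, "I": 0.8, "M": 0.8},
    9: {"L": 1.0, "V": 1.0},
    1: {"G": 0.2},
    3: {"L": 0.3, "F": 0.3},
}


def residues_at(peptide, positions):
    """Return {"P<n>": residue} for the given 1-based peptide positions."""
    return {f"P{pos}": peptide[pos - 1] for pos in positions}


def additive_score(peptide, pocket_weights):
    """Sum the per-pocket contributions; unlisted residues contribute 0."""
    return sum(
        weights.get(peptide[pos - 1], 0.0)
        for pos, weights in pocket_weights.items()
    )


if __name__ == "__main__":
    peptide = "GILGFVFTL"
    print("anchors:", residues_at(peptide, ANCHOR_POSITIONS))
    print("secondary:", residues_at(peptide, SECONDARY_POSITIONS))
    print("TCR-facing:", residues_at(peptide, TCR_FACING_POSITIONS))
    print("score:", additive_score(peptide, HYPOTHETICAL_POCKET_WEIGHTS))
```

The score is only as meaningful as the weight table: with real allele-specific preferences the same sum would rank candidate peptides, whereas here it merely shows how anchor and secondary pockets each add a term.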
1.3 T-cell receptor structure and binding to the antigen peptide:
The T-cell receptor (TCR) is one of the most studied receptor structures for its importance in
immunological therapy. These receptors are found on the cell membrane of the T-cells and are
responsible for the recognition of the antigen, which leads to activation of a signaling pathway that
in turn activates the immune responses.
The T-cell receptor consists of two polypeptide chains, α and β, each comprising an
extracellular region that binds to the antigen, a transmembrane region, and a short cytoplasmic
region. There is also a membrane-proximal connecting peptide. The α and β chains are connected
by disulfide bonds and contain variable immunoglobulin regions that bind to the antigen;
however, unlike an antibody, the TCR has only one binding site. A minority of T-cell receptors
instead have γ and δ chains, which bind to the antigen. TCRs develop in the thymus, where
different sets of gene segments determine the main polypeptide chains of the TCR. The TCR α
gene is produced by DNA recombination of the Variable (V) and Joining (J) gene segments (Vα and
Jα segments). The β chain, on the other hand, is produced by DNA recombination of the Variable
segment, the Joining segment, and an additional Diversity (D) gene segment (Vβ, D, and Jβ
segments) [7]. These gene segments are followed by a constant region. V(D)J recombination,
along with the addition and deletion of nucleotides during TCR development, allows a wide range
of T-cell receptors to be generated. The diversity generated could lead to the production of
10^15 different TCRs [7]. The genetic recombination produces highly variable loops in the TCRs,
which bind to the p-MHC complexes. This binding region of the TCR is composed of six
complementarity-determining regions (CDR1, CDR2, and CDR3 in each of the α and β chains) [7].
The CDR1 and CDR2 regions generally have a constant genetic "germline" arrangement that remains
conserved for all TCRs binding to the same kind of MHC molecule. These regions bind to the
conserved regions of the MHC allele. This is based on the Jerne hypothesis, which states that
the α and β chains of the TCR coevolved with MHC molecules; an extension of the hypothesis is
that this coevolution biases the TCR toward binding the MHC molecule so that the TCR can then
probe the antigenic peptide for activation [8]. The CDR3 regions are highly variable, as they
are mainly responsible for interacting with the peptide portion of the p-MHC complex. This
hypervariability allows the CDR3 loops to recognize a large repertoire of peptide residues,
thereby initiating the immune response. The CDR3 loops mainly interact with the middle amino
acids of the peptide. The TCR binds diagonally to the MHC-peptide complex, which allows the
CDR1 and CDR2 loops to interact with specific amino acids of the MHC while the CDR3 loops
interact with the peptide [9]. However, recent studies have shown that the diagonal binding
angle of the TCR on the peptide-MHC complex can vary by ±45 degrees, which may change which
conserved residues of the two proteins interact [10]. Studies have also shown that TCRs with
the same Vα and Vβ segments can make different amino acid contacts with the peptide-MHC
complex, which makes it hard to predict the nature of the interaction between these two
proteins [11]. Thus, studying how the CDR3 loop binds to the peptide and changes its
conformation, and which amino acid residues the CDR3 loop favors, will help to identify trends
in TCR-p-MHC complex binding and allow prediction of peptide binding to the TCR and its
activation [11].
1.4 Recent Structure-Based studies for TCR Binding to the Peptide:
A study carried out by Zhu et al. identified specific amino acids in the peptide and on the CDR3
loops that are important for the interactions and characterized the types of interactions that
take place. The interactions of TCR amino acid residues inside and outside of the disulfide bond
with the peptide bound to the MHC molecule were compared using molecular dynamics simulations
in GROMACS. Alanine and glycine scanning was also conducted on important amino acid residues
that have been proposed to take part in antigen recognition and binding. It was found that
antigenic sites P2, P4, P6, and P8 are generally responsible for hydrogen bonding and pi stacking
with TCR residues both inside and outside of the disulfide bond in the variable region of the
TCR. The CDR3 loop of the α chain is mainly made up of non-charged polar amino acids and
non-aliphatic non-polar amino acids. These amino acids form hydrogen bonds and pi stacks with
the antigen residues. The amino acids ranged from the 94th to the 103rd position. All residues
except tyrosine formed hydrogen bonds; tyrosine, which usually occurred at the 103rd position,
instead formed hydrophobic interactions in the form of pi stacks. The CDR3 β chain showed
similar properties. It was found that most of the amino acids that do not form disulfide bonds
are hydrophobic non-aliphatic amino acids that interact with the peptide through hydrogen
bonding and pi stacks [12].
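The residue classes mentioned above (non-charged polar, aliphatic versus non-aliphatic non-polar) can be tallied for any sequence with a short Python sketch. The grouping used here is one common textbook classification and is an assumption of this sketch; Zhu et al. may have drawn the class boundaries differently. The function names are hypothetical, and the input is a peptide sequence listed in this thesis rather than a CDR3 loop from that study.

```python
from collections import Counter

# One common textbook grouping of the 20 standard amino acids. This is an
# ASSUMPTION of the sketch, not the classification used in the cited study.
RESIDUE_CLASSES = {
    "nonpolar aliphatic": "GAVLIPM",
    "aromatic": "FWY",
    "polar uncharged": "STCNQ",
    "positively charged": "KRH",
    "negatively charged": "DE",
}

# Invert the table: one-letter code -> class name.
CLASS_OF = {
    residue: name
    for name, letters in RESIDUE_CLASSES.items()
    for residue in letters
}


def class_composition(sequence):
    """Count residues per class; raises KeyError on non-standard letters."""
    return Counter(CLASS_OF[residue] for residue in sequence.upper())


def aromatic_positions(sequence):
    """1-based positions of aromatic residues (candidates for pi stacking)."""
    return [
        (index + 1, residue)
        for index, residue in enumerate(sequence.upper())
        if CLASS_OF[residue] == "aromatic"
    ]


if __name__ == "__main__":
    sequence = "GILGFVFTL"
    print(dict(class_composition(sequence)))
    print(aromatic_positions(sequence))
```

Listing aromatic positions separately reflects the point made in the text that pi stacking is attributed to specific ring-bearing residues, while hydrogen bonding is attributed to the remaining positions.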
A two-step mechanism has been proposed for the binding of the TCR to the pMHC complex. It
has been hypothesized that the CDR1 and CDR2 loops first bind to the conserved regions or
specific regions of the MHC molecule. The bound and unbound crystallographic structures suggest
that after this initial binding, the CDR3 loops undergo a conformational shift that allows them
to fit onto the peptide presented by the MHC molecule. The CDR3 loops assume an induced-fit
conformation with the peptide to activate the downstream pathway. This theory was laid out by
Wu et al. [13]. Another theory suggests that the CDR3 loops undergo conformer selection: from a
set of equilibrium conformers, the conformer most compatible with a given peptide is selected.
These mechanisms help explain the cross-reactivity of TCRs, as conformer selection and the
induced-fit nature of the CDR3 loops allow a single TCR to bind a large number of peptide-MHC
complexes. This theory is backed by the study conducted by Scott et al., in which molecular
dynamics and X-ray crystallography were used to study the CDR3 α and CDR3 β loops of the A6 TCR
and their variations in the bound and unbound states. It was found that CDR3 β has a highly
variable energy landscape compared to CDR3 α, which makes it the more flexible of the two
loops. This flexibility allows it to follow the second theory of conformer selection from a
large set of conformations. CDR3 α, on the other hand, shows rigidity in its conformations; this
rigidity allows it to follow the two-step induced-fit theory and allows for the specificity of
the TCR to be activated by certain peptides [14].
Various tools have also been developed for predicting peptide binding to the TCR using
bioinformatics and machine learning models. For example, NetTCR is a sequence-based
prediction model based on convolutional neural networks. Binding is predicted from the peptide
sequence and the CDR3β sequence of the TCR¹⁵. Another tool for predicting the binding of the
TCR to the peptide is epiTCR, which uses a Random Forest based method with the TCR CDR3β
and peptide sequences¹⁶. TCRex also uses the same algorithm to predict binding based on the
CDR3 sequences¹⁷. TCR-Pred collects data from various databases and predicts the specificity of
the peptide for the TCR CDR3 from structural sequences using the Prediction of Activity Spectra
for Substances algorithm¹⁸. To properly understand the mechanisms underlying the binding of the
peptide to the TCR CDR3 loops, their interactions should be studied in a solvated environment,
which approximates the environment where binding occurs and allows trends and patterns to be
developed for prediction.
This thesis deals with the development of a structure-based prediction model based on the
sequences of the peptide and the conformational changes of the TCR CDR3 loops. The process
involves extracting the CDR3 loops and changing the conformation of the middle three to four
amino acids that have been proposed to interact with the peptide. The loops are then studied for
intermolecular and intramolecular hydrophobic interactions and oxygen clashes when these loops
are varied. In addition, the entire protein structures with the varied regions in the loops are
solvated, and the interactions between the peptide and the protein are studied. This method allows
us to study the structural interactions using the algorithms of WATGEN5. The results obtained
from this method can then be used to develop a machine-learning model that will allow us to
predict the binding of the peptide for the activation of the TCR based on the sequence.
Chapter 2: Method for Development of CDR3 Loop Conformations
and Molecular Interactions
2.1 Introduction:
As discussed in the previous chapter, the peptide bound to the MHC molecule has a bulge in the
center, exposing its middle three amino acids to the TCR for recognition. The structures of
T-cell receptor-peptide-MHC (TCR-pMHC) complexes suggest that there is close contact between
the middle three amino acids of the peptide and the middle three amino acid residues of the CDR3α
and CDR3β loops, which are highly conformationally flexible in order to bind to the peptide. By
changing the conformation of the middle three amino acids of the CDR3α and CDR3β loops,
changes in the interaction between the peptide and these loops can be studied. These conformations
are “non-binding conformations”. They are compared against the X-ray conformations, which are
considered the “binding conformations”¹⁹. In addition, trends in these interactions and their
relationship to T-cell activation can be developed and studied for the prediction of TCR
activation. Moreover, inserting different conformations of the peptide sequences can also indicate
which interactions between the TCR and the peptide are important. Based on the concept of
conformational selection in protein-ligand binding, TCR loops select the best-fitting conformation
from the set of conformations in equilibrium. Using this principle, generating different
conformations of the CDR3 loops will provide information on why only certain conformations are
the best binders and others are not. This result will be used for model building, and the model
will then be utilized to predict the binding and activation of various TCR sequences for a specific
known binder. Thus, a user-specific model is generated in this chapter which can then be used for
the development of various conformations of TCR CDR3 loops for studying TCR binding¹⁹.
In this chapter, the methodology for the generation of different conformations of the CDR3α and
CDR3β loops is discussed. The conformations of the CDR3α loops, the CDR3β loops, and the
peptides were generated with in-house software called AddProt. AddProt takes in the sequence of
the protein and generates its conformations by changing the phi and psi angles of the protein
backbone. The desired increment in the torsion angle can be specified for the generation of
different conformations of the selected amino acid residue backbone. The software then changes
the structure based on the increment of the torsion angles and checks for internal clashes within
the protein sequence. In the end, it produces the different conformations and generates a list of
the conformations that failed and the conformations that survived the clash distance analysis. A
clash distance threshold can also be set in the software, which allows it to scan the whole
sequence, find the distances between the amino acid side chains, and discard the structures in
which the amino acid side chains clash.
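The clash distance analysis described above can be illustrated with a minimal Python sketch. It is not AddProt's implementation, which is in-house and not shown here. The function names and the input layout (a list of residue number and coordinate pairs for side-chain atoms) are assumptions made for illustration, and the default of 1.2 Å is the threshold value reported later in this chapter.

```python
import math
from itertools import combinations


def min_interresidue_distance(atoms):
    """Return the shortest distance between atoms of different residues.

    atoms: list of (residue_number, (x, y, z)) tuples (hypothetical layout).
    Pairs inside the same residue are skipped, because bonded atoms are
    expected to be close to each other.
    """
    best = math.inf
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(atoms, 2):
        if res_a == res_b:
            continue
        best = min(best, math.dist(xyz_a, xyz_b))
    return best


def survives_clash_check(atoms, threshold=1.2):
    """A conformation survives if no inter-residue pair is closer than threshold (in angstroms)."""
    return min_interresidue_distance(atoms) >= threshold
```

A conformation whose closest inter-residue atom pair is 1.0 Å apart would be discarded under this sketch, whereas one whose closest pair is 2.5 Å apart would be kept.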
As seen in Table 1, 35 different human T cell receptors were studied for the binding of their
CDR3 loops to 59 class I peptide-MHC complexes. Out of these 35 T cell receptors, binding data
for T cell receptor clone A6 were found with 8 different peptides with certain mutations in their
sequences. Similarly, binding data for other TCRs were also found with more than one pMHC
complex. These data show the cross-reactivity of the TCR for different peptides; however, by
studying the similarity of the sequences of the peptides to which the TCR binds, the specificity
for the peptides can also be noticed. The TCR chains are labeled with the letter ‘D’ for the α
chain and ‘E’ for the β chain. The CDR3α loop ranges from 11 to 19 amino acid residues in length,
starting with cysteine (CYS, residues 89-104) and ending with phenylalanine (PHE, residues
100-118).
Table 1: List of TCRs from the TCR3d database with peptide sequences and sequences of CDR3
loops.
PDB ID  Epitope  TCR name  MHC name  Species  CDR3alpha RES NUM  CDR3alpha  CDR3beta RES NUM  CDR3beta
1AO7  LLFGYPVYV  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
1OGA  GILGFVFTL  JM22  HLA-A*02:01  Human  90-101  CAGAGSQGNLIF  93-105  CASSSRSSYEQYF
1QRN  LLFGYAVYV  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
1QSE  LLFGYPRYV  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
1QSF  LLFGYPVAV  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
2AK4  LPEPLPQGQLTAY  SB27  HLA-B*35:05  Human  90-106  CALSGFYNTDKLIF  92-108  CASPGLAGEYEQYF
2BNQ  SLLMWITQV  1G4  HLA-A*02:01  Human  90-104  CAVRPTSGGSYIPTF  90-103  CASSYVGNTGELFF
2BNR  SLLMWITQC  1G4  HLA-A*02:01  Human  90-104  CAVRPTSGGSYIPTYF  90-103  CASSYVGNTGELFF
2ESV  VMAPRTLIL  KK50.4  HLA-E  Human  89-105  CIVVRSSNTGKLIF  92-108  CASSQDRDTQYF
2GJ6  LLFGKPVYV  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
2NX5  EPLPQGQLTAY  ELS4  HLA-B*35:01  Human  90-106  CAVQASGGSYIPTF  92-108  CATGTGDSNQPQHF
2YPL  KAFSPEVIPMF  AGA  HLA-B*57:01  Human  89-100  CAVSGGYQKVTF  91-101  CASTGSYGYTF
3DXA  EENLLDFVRF  DM1  HLA-B*44:05  Human  104-115  CIVWGGYQKVTF  104-118  CASRYRDDSYNEQFF
3GSN  NLVPMVATV  RA14  HLA-A*02:01  Human  90-100  CARNTGNQFYF  91-105  CASSPVTGGIYGYTF
3H9S  MLWGYLQYV  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
3HG1  ELAGIGILTV  MEL5  HLA-A*02:01  Human  89-99  CAVNVAGKSTF  91-105  CAWSETGLGTGELFF
3KPR  EEYLKAWTF  LC13  HLA-B*44:05  Human  90-106  CILPLAGGTSYGKLTF  92-108  CASSLGQAYEQYF
3KPS  EEYLQAFTY  LC13  HLA-B*44:05  Human  90-106  CILPLAGGTSYGKLTF  92-108  CASSLGQAYEQYF
3MV7  HPVGEADYFEY  TK3  HLA-B*35:01  Human  104-118  CAVQDLGTSGSRLTF  104-115  CASSARSGELFF
3O4L  GLCTLVAML  AS01  HLA-A*02:01  Human  90-100  CAEDNNARLMF  95-107  CSARDGTGNGYTF
3PWP  LGYGFVNYI  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
3QDJ  AAGIGILTV  DMF5  HLA-A*02:01  Human  88-99  CAVNFGGGKLIF  94-106  CASSLSFGTEAFF
3QFJ  LLFGFPVYV  A6  HLA-A*02:01  Human  90-106  CAVTTDSWGKLQF  92-108  CASRPGLAGGRPEQYF
3SJV  FLRGRAYGL  RL42  HLA-B*08:01  Human  104-113  CVVRAGKLIF  104-116  CASGQGNFDIQYF
3UTS  ALWGPDPAAA  1E6  HLA-A*02:01  Human  89-101  CAMRGDSSYKLIF  92-107  CASSLWEKLAKNIQYF
3VXM  RFPLTFGWCF  C1-28  HLA-A*24:02  Human  91-106  CAVGAPSGAGSYQLTF  91-104  CASSPTSGIYEQYF
3VXR  RYPLTFGWCF  H27-14  HLA-A*24:02  Human  90-101  CAVRMDSSYKLIF  92-104  CASSSWDTGELFF
3VXS  RYPLTLGWCF  H27-14  HLA-A*24:02  Human  90-101  CAVRMDSSYKLIF  92-104  CASSSWDTGELFF
4EUP  ALGIGILTV  JKF6  HLA-A*02:01  Human  91-104  CASSFLGTGVEQYF  90-102  CAVSGGGADGLTF
4G9F  KRWIIMGLNK  C12C  HLA-B*27:05  Human  104-118  CAMRDLRDNFNKFYF  104-117  CASREGLGGTEAFF
4G8G  KRWIILGLNK  C12C  HLA-B*27:05  Human  92-106  CAMRDLRDNFNKFYF  104-117  CASREGLGGTEAFF
4JFD  ELAAIGILTV  a24b17  HLA-A*02:01  Human  89-99  CAVNDGGRLTF  91-105  CAWSETGLGMGGWQF
4JFE  ELAGIGALTV  a24b17  HLA-A*02:01  Human  89-99  CAVNDGGRLTF  91-105  CAWSETGLGMGGWQF
4MJI  TAFTIPSI  3B  HLA-B*51:01  Human  89-101  CATDDDSARQLTF  95-107  CASSLTGGGELFF
4MNQ  ILAKFLHWL  ILA1a1b1  HLA-A*02:01  Human  87-101  CAVDSATALPYGYIF  91-102  CASSYQGTEAFF
4PRH  HPVGDADYFEY  TK3  HLA-B*35:08  Human  104-118  CAVQDLGTSGSRLTF  104-115  CASSARSGELFF
4PRP  HPVGQADYFEY  TK3  HLA-B*35:01  Human  104-118  CAVQDLGTSGSRLTF  104-115  CASSARSGELFF
4QOK  EAAGIGILTV  MEL5  HLA-A*02:01  Human  89-99  CAVNVAGKSTF  91-105  CAWSETGLGTGELFF
4QRP  HSKKKCDEL  DD31  HLA-B*08:01  Human  104-115  CALSDPVNDMRF  104-118  CASSLRGRGDQPQHF
4QRR  IPSINVHHY  clone12  HLA-B*35:01  Human  104-122  CALGELAGAGGTSYGKLTF  104-118  CASSLEGGYYNEQFF
5BRZ  EVDPIGHLY  MAG-IC3  HLA-A*01:01  Human  91-105  CAVRPGGAGPFFVVF  92-104  CASSFNMATGQYF
5C0C  RQFGPDWIVA  1E6  HLA-A*02:01  Human  89-101  CAMRGDSSYKLIF  92-107  CASSLWEKLAKNIQYF
5EU6  YLEPGPVTV  PMEL17  HLA-A*02:01  Human  91-105  CAVLSSGGSNYKLTF  93-105  CASSFIGGTDTQYF
5HHM  GILGLVFTL  JM22  HLA-A*02:01  Human  90-101  CAGAGSQGNLIF  93-105  CASSSRSSYEQYF
5HHO  GILEFVFTL  JM22  HLA-A*02:01  Human  90-101  CAGAGSQGNLIF  93-105  CASSSRSSYEQYF
5HYJ  AQWGPDPAAA  1E6  HLA-A*02:01  Human  89-101  CAMRGDSSYKLIF  92-107  CASSLWEKLAKNIQYF
5JZI  KLVALGINAV  1406  HLA-A*02:01  Human  97-107  CAYGEDDKIIF  91-102  CASRRGPYEQYF
5NME  SLYNTVATL  868  HLA-A*02:01  Human  89-101  CAVRTNSGYALNF  91-103  CASSDTVSYEQYF
5NMF  SLYNTIATL  868  HLA-A*02:01  Human  89-101  CAVRTNSGYALNF  91-103  CASSDTVSYEQYF
5NMG  SLFNTIAVL  868  HLA-A*02:01  Human  89-101  CAVRTNSGYALNF  91-103  CASSDTVSYEQYF
5W1W  VMAPRTLVL  GF4  HLA-E  Human  104-118  CAGQPLGGSNYKLTF  103-119  CASSANPGDSSNEKLFF
5WKF  GTSGSPIVNR  D30  HLA-A*11:01  Human  100-112  CGLGDAGNMLTF  104-121  CASSLGQGLLYGYTF
5WKH  GTSGSPIINR  D30  HLA-A*11:01  Human  100-112  CGLGDAGNMLTF  104-121  CASSLGQGLLYGYTF
6AM5  SMLGIGIVPV  DMF5  HLA-A*02:01  Human  88-99  CAVNFGGGKLIF  94-106  CASSLSFGTEAFF
6AMU  MMWDRGLGMM  DMF5  HLA-A*02:01  Human  88-99  CAVNFGGGKLIF  94-106  CASSLSFGTEAFF
6AVF  APRGPHGGAASGL  KFJ5  HLA-B*07:02  Human  91-104  CLVGEILDNFNKFYF  91-104  CASSQRQEGDTQYF
6BJ3  IPLTEEAEL  55  HLA-B*35:01  Human  91-103  CALGEGGAQKLVF  91-105  CASRTRGGTLIEQYF
6BJ8  VPLTEDAEL  55  HLA-B*35:01  Human  91-103  CALGEGGAQKLVF  91-105  CASRTRGGTLIEQYF
6MTM  FEDLRVLSF  EM2  HLA-B*37:01  Human  89-102  CGTERSGGYQKVTF  89-102  CASSMSAMGTEAFF
The PDB files were obtained from the RCSB data bank. The CDR3α and CDR3β loops were
extracted from these PDB files, and the conformations of the middle three amino acids were
changed by changing the backbone torsion angles. These changed conformations of the CDR3α
and CDR3β loops were fitted back in the PDB files to create structures with varied middle three
amino acid orientations. These varied protein structures were then utilized to study interactions
between the CDR3 loops as well as study the interactions between the loops and the peptide. The
entire process is summarized in Figure 2.
Figure 2: Flow chart explaining the process of development of CDR3 conformations. The steps in
the flow chart are:
1. Extract the file name, chain letter, residue numbers, and other information from the Excel
sheet.
2. Copy all the input files (the CDR3 loops) from their respective folders to the folder where the
code will run.
3. Create in_peptide.txt from the torsion angles, increments, and errors given in the Excel sheet.
The in_peptide.txt file is made specifically for each PDB complex structure.
4. Convert in_peptide.txt to the in.txt file, the format that AddProt takes in.
5. Call AddProt to generate the conformations and rename the output file based on the PDB ID
(pdbid_CDR3_vary.pdb).
6. Copy the files to their respective directories, generate the run.bat file in each individual
directory, and move in_peptide.txt there as well, so that AddProt can be run again if there is an
error.
7. Take the 10 conformers and fit each of them into the PDB file at its respective place, depending
on whether it is a CDR3 alpha or beta loop.
8. Check the torsion angles and change the torsion angle for the oxygen of the residue before the
varied residue using the code.
9. Move the improved files to their respective directories depending on the PDB ID and the
conformer number.
10. Generate a conformer list containing the 10 random conformations selected.
After the development of the structures with CDR3 loop variations, these structures are solvated
using WATGEN5. WATGEN is an algorithm that calculates the correct positions for the water
molecules based on the X-ray crystal structure of the protein and ligand and adds layers of water
molecules in the binding site. WATGEN5 then finds the water molecules that have been displaced
due to the binding of solvated ligands. The positions of water molecules are predicted first for the
binding cleft without the ligand. Various molecular properties of the protein, such as torsion
angles and hydrogen bond donors and acceptors, need to be calculated by WATGEN to find the
correct positions for the oxygen atoms of the water molecules, as their positions are based on the
length of the hydrogen bonds made by these hydrogen bond donors and acceptors. The hydrogen
atoms of the water molecules are then added based on the length of the hydrogen bond between
the hydrogen bond donors and hydrogen atoms of the water molecules²⁰.
The water layers are first added to the binding areas of the empty protein, i.e., without the
ligand. The ligands, on the other hand, are protonated before solvation, and the water layers are
added to these ligands. Usually, 10 layers of water molecules are added, which are required to
complete the water networks in the binding site. WATGEN adds the layers one by one, with the
previous layer acting as a ligand for the upcoming layer; thus, each new layer is added based on
the positions of the oxygen and hydrogen atoms of the previous layer to form the hydrogen bonds
and complete the water network²⁰.
After the addition of the water layers, the displacement of water molecules upon addition of the
ligand to the protein is calculated. This calculation is done using a method that finds the
matching water molecules in the empty protein and the bound protein and categorizes the water
molecules based on the impact of ligand binding to the protein. This allows us to differentiate
the water molecules that merely shifted position without being displaced from the ones that were
displaced²⁰.
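The idea of matching water molecules between the empty and the bound protein can be sketched in Python as a nearest-neighbor comparison of water oxygen positions. This is only an illustration and not the WATGEN5 algorithm: the function name, the three category labels, and both distance tolerances are hypothetical values chosen for the example.

```python
import math


def classify_waters(empty_waters, bound_waters, same_tol=0.5, moved_tol=1.5):
    """Label each water oxygen of the empty protein by its fate on ligand binding.

    empty_waters, bound_waters: lists of (x, y, z) oxygen coordinates.
    same_tol, moved_tol: hypothetical tolerances in angstroms (assumptions,
    not WATGEN5 parameters).
    """
    labels = []
    for water in empty_waters:
        # Distance to the closest water in the bound structure (inf if none).
        nearest = min((math.dist(water, other) for other in bound_waters),
                      default=math.inf)
        if nearest <= same_tol:
            labels.append("conserved")   # same position in both structures
        elif nearest <= moved_tol:
            labels.append("moved")       # shifted but still present
        else:
            labels.append("displaced")   # no matching water after binding
    return labels
```

In this sketch, a water that has a partner within 0.5 Å is treated as unchanged, one with a partner within 1.5 Å as shifted, and any other water as displaced by the ligand.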
In addition, the clash distance, hydrophobic interactions, and oxygen-oxygen interactions were
calculated using Python scripts. The methods section explains the entire process, from the
development of structures with CDR3 loop variations to the solvation and molecular interaction
analysis.
2.2 Methods:
2.2.1 Development of different conformations of CDR3 loops:
The CDR3 alpha and beta chain data were extracted from the T cell receptor structural repertoire
database for 59 pMHC complexes. The PDB structures were cleaned by removing unwanted side
chains and water molecules. These structures were used for the extraction of the CDR3 chains.
The CDR3 chains were extracted by deleting chains “A”, “B”, “C”, and the X-ray water molecules.
The chains were then stored in a separate file. A Python script was written for the extraction of
the CDR3 loops. An Excel sheet was generated with the PDB names, the start and end residue
numbers of the CDR3 loops, and the chain letters. The code read through this Excel file one row
at a time (one PDB file at a time) and extracted the CDR3 loop based on the starting and ending
residue numbers of the loop. These CDR3 loops were then stored in their specific files.
2.2.1.1 CDR3 Loop Extraction Code:
Code was developed for the extraction of the CDR3 loops from the PDB files. This code works by
the following steps:
1. It opens the file based on the file name given in the Excel sheet and reads the lines.
2. From the lines, it detects the chain letter, residue number, atom name, and residue
name based on the character positions in the line.
3. It selects the first line of the loop and the line after the last residue of the loop,
extracts the lines containing the loop, and adds a delimiter at the end.
4. Once the loop is extracted, the atoms are renumbered from 1 and the file is saved as a
PDB file with a proper name.
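The steps above can be sketched in Python as follows. This is a minimal illustration and not the code shown in Figure 3: the function name is hypothetical, the loop is selected by filtering on the residue range rather than by line indexes, and the use of ‘TER’ as the delimiter is an assumption. The column positions are those of the standard fixed-width PDB ATOM record (chain identifier in column 22, residue number in columns 23-26, atom serial number in columns 7-11).

```python
def extract_loop(pdb_lines, chain, start_res, end_res):
    """Return the ATOM lines of one loop, renumbered from 1, plus a delimiter.

    pdb_lines: iterable of PDB text lines.
    chain: one-letter chain identifier, e.g. "D" or "E".
    start_res, end_res: first and last residue numbers of the loop (inclusive).
    """
    loop = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue                      # skip HETATM, TER, remarks, etc.
        if line[21] != chain:
            continue                      # column 22: chain identifier
        res_num = int(line[22:26])        # columns 23-26: residue number
        if start_res <= res_num <= end_res:
            loop.append(line.rstrip("\n"))

    # Renumber the atom serial numbers (columns 7-11) starting from 1.
    renumbered = [f"{line[:6]}{serial:5d}{line[11:]}"
                  for serial, line in enumerate(loop, start=1)]
    renumbered.append("TER")              # assumed delimiter
    return renumbered
```

For example, `extract_loop(lines, "D", 90, 106)` would correspond to the CDR3α entry for 1AO7 in the Excel sheet, with `lines` read from the cleaned PDB file.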
Figure 3: Python code used for the extraction of the CDR3 loops²¹.
Table 2: Excel sheet used to read PDB name, chain letter, and start and end residue number.
PDB_ID  Chain  AA res  AA end res  CDR3alpha  Chain  AA res  AA end res  CDR3beta
1AO7  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
1OGA  D  90  101  CAGAGSQGNLIF  E  93  105  CASSSRSSYEQYF
1QRN  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
1QSE  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
1QSF  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
2AK4  D  90  106  CALSGFYNTDKLIF  E  92  108  CASPGLAGEYEQYF
2BNQ  D  90  104  CAVRPTSGGSYIPTF  E  90  103  CASSYVGNTGELFF
2BNR  D  90  104  CAVRPTSGGSYIPTYF  E  90  103  CASSYVGNTGELFF
2ESV  D  89  105  CIVVRSSNTGKLIF  E  92  108  CASSQDRDTQYF
2GJ6  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
2NX5  D  90  106  CAVQASGGSYIPTF  E  92  108  CATGTGDSNQPQHF
2YPL  D  89  100  CAVSGGYQKVTF  E  91  101  CASTGSYGYTF
3DXA  D  104  115  CIVWGGYQKVTF  E  104  118  CASRYRDDSYNEQFF
3GSN  A  90  100  CARNTGNQFYF  B  91  105  CASSPVTGGIYGYTF
3H9S  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
3HG1  D  89  99  CAVNVAGKSTF  E  91  105  CAWSETGLGTGELFF
3KPR  D  90  106  CILPLAGGTSYGKLTF  E  92  108  CASSLGQAYEQYF
3KPS  D  90  106  CILPLAGGTSYGKLTF  E  92  108  CASSLGQAYEQYF
3MV7  D  104  118  CAVQDLGTSGSRLTF  E  104  115  CASSARSGELFF
3O4L  D  90  100  CAEDNNARLMF  E  95  107  CSARDGTGNGYTF
3PWP  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
3QDJ  D  88  99  CAVNFGGGKLIF  E  94  106  CASSLSFGTEAFF
3QFJ  D  90  106  CAVTTDSWGKLQF  E  92  108  CASRPGLAGGRPEQYF
3SJV  D  104  113  CVVRAGKLIF  E  104  116  CASGQGNFDIQYF
3UTS  D  89  101  CAMRGDSSYKLIF  E  92  107  CASSLWEKLAKNIQYF
3VXM  D  91  106  CAVGAPSGAGSYQLTF  E  91  104  CASSPTSGIYEQYF
3VXR  D  90  101  CAVRMDSSYKLIF  E  92  104  CASSSWDTGELFF
3VXS  D  90  101  CAVRMDSSYKLIF  E  92  104  CASSSWDTGELFF
4EUP  H  91  104  CASSFLGTGVEQYF  G  90  102  CAVSGGGADGLTF
4G9F  D  104  118  CAMRDLRDNFNKFYF  E  104  117  CASREGLGGTEAFF
4G8G  D  92  106  CAMRDLRDNFNKFYF  E  104  117  CASREGLGGTEAFF
4JFD  D  89  99  CAVNDGGRLTF  E  91  105  CAWSETGLGMGGWQF
4JFE  D  89  99  CAVNDGGRLTF  E  91  105  CAWSETGLGMGGWQF
4MJI  D  89  101  CATDDDSARQLTF  E  95  107  CASSLTGGGELFF
4MNQ  D  87  101  CAVDSATALPYGYIF  E  91  102  CASSYQGTEAFF
4PRH  D  104  118  CAVQDLGTSGSRLTF  E  104  115  CASSARSGELFF
4PRP  D  104  118  CAVQDLGTSGSRLTF  E  104  115  CASSARSGELFF
4QOK  D  89  99  CAVNVAGKSTF  E  91  105  CAWSETGLGTGELFF
4QRP  D  104  115  CALSDPVNDMRF  E  104  118  CASSLRGRGDQPQHF
4QRR  D  104  122  CALGELAGAGGTSYGKLTF  E  104  118  CASSLEGGYYNEQFF
5BRZ  D  91  105  CAVRPGGAGPFFVVF  E  92  104  CASSFNMATGQYF
5C0C  I  89  101  CAMRGDSSYKLIF  J  92  107  CASSLWEKLAKNIQYF
5EU6  D  91  105  CAVLSSGGSNYKLTF  E  93  105  CASSFIGGTDTQYF
5HHM  D  90  101  CAGAGSQGNLIF  E  93  105  CASSSRSSYEQYF
5HHO  D  90  101  CAGAGSQGNLIF  E  93  105  CASSSRSSYEQYF
5HYJ  D  89  101  CAMRGDSSYKLIF  E  92  107  CASSLWEKLAKNIQYF
5JZI  D  97  107  CAYGEDDKIIF  E  91  102  CASRRGPYEQYF
5NME  D  89  101  CAVRTNSGYALNF  E  91  103  CASSDTVSYEQYF
5NMF  D  89  101  CAVRTNSGYALNF  E  91  103  CASSDTVSYEQYF
5NMG  D  89  101  CAVRTNSGYALNF  E  91  103  CASSDTVSYEQYF
5W1W  D  104  118  CAGQPLGGSNYKLTF  E  103  119  CASSANPGDSSNEKLFF
5WKF  D  100  112  CGLGDAGNMLTF  E  104  121  CASSLGQGLLYGYTF
5WKH  D  100  112  CGLGDAGNMLTF  E  104  121  CASSLGQGLLYGYTF
6AM5  D  88  99  CAVNFGGGKLIF  E  94  106  CASSLSFGTEAFF
6AMU  D  88  99  CAVNFGGGKLIF  E  94  106  CASSLSFGTEAFF
6AVF  B  91  104  CLVGEILDNFNKFYF  E  91  104  CASSQRQEGDTQYF
6BJ3  D  91  103  CALGEGGAQKLVF  H  91  105  CASRTRGGTLIEQYF
6BJ8  D  91  103  CALGEGGAQKLVF  H  91  105  CASRTRGGTLIEQYF
6MTM  D  89  102  CGTERSGGYQKVTF  E  89  102  CASSMSAMGTEAFF
2.2.2 Conformation generation using AddProt:
After the extraction of the loops, depending on the length of the CDR3 loops, the middle three or
four amino acids were selected to change their conformation. In the respective CDR3 loop folder
for each PDB name, an in_peptide.txt file was generated.
Figure 4: Example of an in_peptide.txt file. The figure annotations label the following parts of
the file: file with CDR3 loop; CDR3 loop sequence; varied region; A.A. position number from the N
side of the varied region; number of residues in the varied region; A.A. position number from the
C side of the varied region; phi and psi torsion angles; clash distance threshold.
As shown in Figure 4, the in_peptide.txt file contains different sets of lines, which are explained
as follows:
1. The first line is a set of variables that are read by AddProt to perform certain functions.
2. The second line is the file name of the CDR3 alpha/beta loop.
3. The third line is the sequence of the CDR3 alpha/beta loop, with the middle three amino acids
whose conformation is to be changed replaced with dashes.
4. The fourth line consists of a set of numbers: the first number is the position of the amino
acid from the varied region towards the N terminal, the second number is the number of
amino acids in the varied region, the fourth number “0.0 0” can be ignored, and the fifth
number is the position number of the amino acid towards the C terminal from the varied
region.
5. The fifth line consists of the amino acid sequence of the varied region.
6. Depending on the length of the varied region, the next few lines are the torsion angles,
starting from the psi angle of the amino acid before the varied region, followed by the phi
angle, and so on until the phi angle of the last amino acid of the varied region. The last
column of these lines is the increment for the angle. Here, a 30° increment gives fast
results; better results can be obtained with 10° or 15°, but these take a longer time to
generate. Increments of 10° and 20° were used when the peptide gave fewer than 10
conformers; these increments were usually used for peptides with prolines. An increment of
40° was used when there were more conformers than AddProt could handle, in cases where 4
residues were varied. AddProt can usually handle only 9999 conformers, so 40° was used when
a structure had a greater number of conformers.
When the middle residues of the peptide included a proline, the phi torsion angles were not
changed. This was because, when the torsion angle was changed, the carbonyl oxygen just
before the proline would bend towards the proline, hence not providing proper
conformations. In addition, the proline ring was not properly formed in certain cases.
Hence, the phi angles were not changed.
7. The last line consists of errors (±) for generating the correct peptide bonds, which are
assumed to be 1.3 Å in length with joining angles of 120° and omega angles of 180°. The
last value in the line is the clash distance threshold; 1.2 Å was found to give good results.
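The trade-off between the angle increment and the run time can be made concrete with a short Python calculation. It rests on an assumption that is not confirmed by the description of AddProt above: that every varied torsion angle is sampled over a full 360° sweep at the chosen increment. The function names are hypothetical, and the counts are numbers of candidate combinations before any clash filtering, not numbers of surviving conformers.

```python
def n_varied_torsions(n_residues, n_fixed_phi=0):
    """Torsions listed for a varied region of n_residues.

    The list runs from the psi of the residue before the region to the phi of
    the last varied residue, i.e. 2 * n_residues angles. n_fixed_phi is the
    number of phi angles held constant (for example, at prolines).
    """
    return 2 * n_residues - n_fixed_phi


def grid_size(n_residues, increment_deg, n_fixed_phi=0):
    """Candidate torsion combinations, assuming a full 360-degree sweep per angle."""
    steps_per_angle = 360 // increment_deg
    return steps_per_angle ** n_varied_torsions(n_residues, n_fixed_phi)


if __name__ == "__main__":
    for increment in (10, 15, 30, 40):
        print(increment, grid_size(3, increment), grid_size(4, increment))
```

Under this assumption, halving the increment multiplies the number of candidates for a three-residue region by 2⁶ = 64, which is consistent with finer increments taking much longer to run.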
After the generation of the in_peptide.txt file, it was used as input for the run.bat program file.
The run.bat file took in the in_peptide.txt file, created a copy of the file, and named it in.txt,
which is the file name that AddProt expects. In addition, it runs the commands for copying all
the CDR3 loop files into the current folder, so that AddProt can find the file that is named in
in.txt as the input file. It then calls the AddProt executable from its location to generate the
conformations. After the completion of the run, AddProt generates two output files: a PDB file
with all the conformations created, and a text file with data on the conformations that survived
the clash distance threshold, the conformations that were discarded, and their torsion angle
changes. After the generation of the AddProt output files, run.bat renames them into the format
that we specify. The run.bat file then deletes all the input CDR3 loop PDB files and deletes
in.txt (the copy of the in_peptide.txt file). An example of the run.bat file is given in Figure 5.
Figure 5: Example of run.bat file that runs the AddProt. The figure annotations label the
following actions: copies all the files from a CDR3 alpha/beta loops folder; creates a copy of
in_peptide.txt as in.txt; deletes the previous output files, if any; calls AddProt from its folder;
intermediate before the final naming of the output file; saves the output PDB file with varied
conformations with a proper name; saves the output txt file with a proper name; deletes the in.txt
file; deletes the CDR3 loop PDB files.
2.2.3 Generation of different TCR-peptide MHC PDB files that includes the conformations:
The different conformations of the middle 3-4 amino acids were added to the PDB file with the
whole loop in it. Out of the conformations that were generated, 10 different conformations of
CDR3 α and β loops were selected randomly and extracted out of the AddProt output PDB file.
These conformations were fitted back in the D and E chains of the TCR-pMHC complex PDB,
generating 10 different TCR-pMHC protein PDB files.
2.2.3.1 PDB file with varied CDR3 loop generation code:
Python code was used for the generation of the different TCR-peptide-MHC complexes. The
example consists of the code used for the conformations of the CDR3α loop of the 1AO7
TCR-peptide-MHC complex. The same code can be used for other PDB files and for CDR3β loops.
The code was based on code developed by Nairuti Mehta, a lab colleague, for the fitting of
different peptide conformations into the peptide chain²². However, major modifications were made
to the code, as a part of the protein chain needs to be replaced in the PDB file.
The code does the following things:
1. The code reads in the AddProt-generated output file and counts the number of
conformations made. From this number, it randomly selects 10 conformations and creates
a list of the random conformation numbers. Each conformation in this list is fitted by
the following method.
2. The code reads the lines in the AddProt output PDB file and identifies one conformation
out of all the conformations made, based on the random list created by the code. The
single conformer is identified using the delimiter; the number added after the delimiter
signifies the conformation number.
3. The conformation is then extracted, and its chain letter and residue numbers are changed,
as the AddProt output changes all the chain letters to “Z” and changes the residue numbers
as well. So, the chain letter and residue numbers of the conformation are made to match
the original loop in the TCR chains.
4. The protein PDB file containing the TCR and pMHC complex is read in, the lines between
which the conformation will be fitted are identified using the indexes of the lines, and
the conformation is fitted between the fixed regions of the loops and the TCR chain.
5. Once the conformation is fitted into the TCR chain, the atoms of the entire PDB file are
renumbered. In the AddProt output file the numbering starts from 1, but to fit the
conformation back in the chain, the numbering should match the fixed portion of the TCR
chain. AddProt also adds hydrogens, so the entire file is renumbered to accommodate them.
6. The new protein file with varied CDR3 loops is saved as a separate PDB file.
These files were then solvated using WATGEN5 for analysis of intermolecular interactions
between the TCR and peptide-MHC complex.
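Steps 1 to 3 of this procedure can be sketched in Python as follows. This is an illustration only and not the code shown in Figure 6. The function names are hypothetical, and the layout of the AddProt output is assumed: each conformer is taken to be a block of ATOM lines closed by a line that starts with the delimiter, with ‘TER’ used here as a placeholder for the actual delimiter.

```python
import random


def split_conformers(lines, delimiter="TER"):
    """Split a multi-conformer PDB file into lists of ATOM lines (assumed layout)."""
    conformers, current = [], []
    for line in lines:
        if line.startswith(delimiter):
            if current:
                conformers.append(current)
            current = []
        elif line.startswith("ATOM"):
            current.append(line.rstrip("\n"))
    if current:
        conformers.append(current)
    return conformers


def pick_random(n_conformers, k=10, seed=None):
    """Return up to k distinct conformer indexes chosen at random, sorted."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_conformers), min(k, n_conformers)))


def relabel(conformer, chain, first_res):
    """Restore the chain letter and consecutive residue numbers of a loop.

    Residue boundaries are detected from changes in the residue number field
    (columns 23-26) of the input; numbering restarts at first_res.
    """
    relabeled, previous, res_num = [], None, first_res - 1
    for line in conformer:
        if line[22:26] != previous:
            previous = line[22:26]
            res_num += 1
        relabeled.append(f"{line[:21]}{chain}{res_num:4d}{line[26:]}")
    return relabeled
```

The relabeled lines would then replace the corresponding residues of the ‘D’ or ‘E’ chain in the complex PDB file, after which the atom serial numbers of the whole file are renumbered as described in step 5.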
Figure 6: Python code that extracts the conformer from the AddProt output and replaces the X-ray
CDR3 loop with the varied AddProt conformer²¹.
2.2.4 Torsion Angle Corrections:
When the conformations of the peptide sequences were checked after their generation by AddProt,
it was found that the peptide bond between the first varied peptide residue and the amino acid of
the fixed part of the peptide sequence before the varied region had a torsion angle that was less
than 180°. The ideal C-O-N-H torsion angle is 180°. A similar problem was identified at the
other end of the varied peptide region, with the C-O-N-H torsion of the last varied residue and
the residue of the fixed part after the varied region. Thus, a Python script using the RDKit
package was written by Nairuti Mehta for the calculation of the torsion angles and for setting the
torsion angle to 180° for the peptide conformations²².
In the case of the CDR3 loops, hydrogens are not generally present in the TCR α and β chains.
However, when AddProt changes the torsion angles, it adds the hydrogens. Thus, a C-O-N-H
torsion can be defined between the first residue of the varied region and the preceding residue of
the fixed region. Due to the decrease in the torsion angle, the oxygen was clashing internally, and
the torsion angle was corrected by changing the coordinates of the oxygen atom.
Figure 7: Incorrect C-O-N-H torsion angle of 1AO7 CDR3β loop.
2.2.4.1 Torsion Angle Correction Code:
A code written by Nairuti Mehta for correction of the torsion angle was modified to fit the
requirements for the changed torsion angle in the CDR3 loops²². In the case of the CDR3 loops,
as the loop is fitted in the protein chain, the entire chain is extracted and the four atoms are
identified. Their torsion angle is calculated and changed, and since the fault is in the
coordinates of the carbonyl oxygen, the changed coordinates of the oxygen atom are fitted back
into the protein file with varied conformations.
The code works in the following way:
1. The code takes in the complex PDB file containing the fitted conformation from AddProt.
2. The ‘D’ or ‘E’ chain is extracted, depending on which CDR3 loop was changed, and stored
in a separate file called ‘in_orig’. In this file the delimiter is changed from TER to
‘MODEL’ and ‘ENDMDL’, because RDKit only reads the PDB file with these delimiters.
3. The ‘in_orig’ file is then used to extract the coordinates of the C, O, N, and H atoms by
selecting them through atom name and residue number; the entire line is selected for each
atom. Because many H atoms are present in the varied region, an array of all the H atoms
is created and the first one, which is bonded to the N atom of the residue, is selected.
In addition, the O atom of the preceding non-varied residue is renamed to HX, because
RDKit was not able to establish the bonds between all four atoms correctly. With the O
atom renamed to HX, RDKit identified all the bonds and could calculate the torsion angle.
4. The coordinates of all four atoms are stored in a list, and the extracted lines for these
four atoms are written to another PDB file called ‘in_torsion.pdb’.
5. The ‘in_torsion.pdb’ file is read by the RDKit package. To access the atoms, a conformer is
first obtained from the molecule; the atoms are then selected, and the torsion angle is
calculated using the GetDihedral() function and changed using SetDihedral() from the
rdkit.Chem.rdMolTransforms module. The chemical structure also needs to be sanitized
before the torsion angle is changed, because in some cases RDKit considers the atoms to be
part of a ring and is then unable to calculate the torsion angle [23].
6. Once the torsion angle is changed to 180º and the bond angle is set to 120º, the changed
coordinates of the O atom are stored in another PDB file called ‘in_mod2.pdb’. Here,
the O atom is still named HX. The name is changed back to O and the file is saved in PDB
format as ‘in_mod.pdb’.
7. The coordinates of the O atom are extracted from the ‘in_mod.pdb’ file and replaced in
the main file (complex PDB file) with the varied conformation.
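The geometric core of this correction can be sketched without RDKit. The following is a minimal illustration, not the thesis code: it swaps in plain vector math for the rdkit.Chem.rdMolTransforms calls, the function names are hypothetical, and the coordinates are made-up values. It measures the torsion defined by the four atoms in the order O, C, N, H and then rotates only the O atom about the C-N axis until the torsion is 180°. The bond-angle adjustment to 120° mentioned above is not included.

```python
import math

def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _add(a, b): return tuple(x + y for x, y in zip(a, b))
def _scale(a, s): return tuple(x * s for x in a)
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def _unit(a): return _scale(a, 1.0 / math.sqrt(_dot(a, a)))

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    axis = _unit(_sub(p2, p1))
    b0 = _sub(p0, p1)
    b2 = _sub(p3, p2)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(axis, _dot(b0, axis)))
    w = _sub(b2, _scale(axis, _dot(b2, axis)))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))

def set_dihedral_by_moving_first(p0, p1, p2, p3, target_deg=180.0):
    """Return new coordinates for p0 so that the torsion equals target_deg.

    Only p0 is moved; it is rotated about the p1-p2 axis (Rodrigues formula),
    so its distance to p1 and the p0-p1-p2 angle are unchanged.
    """
    axis = _unit(_sub(p2, p1))
    theta = math.radians(dihedral_deg(p0, p1, p2, p3) - target_deg)
    r = _sub(p0, p1)
    rotated = _add(
        _add(_scale(r, math.cos(theta)), _scale(_cross(axis, r), math.sin(theta))),
        _scale(axis, _dot(axis, r) * (1.0 - math.cos(theta))),
    )
    return _add(p1, rotated)

# Hypothetical coordinates (in Å) for a distorted peptide bond.
o_atom = (-0.6, 0.9, 0.5)
c_atom = (0.0, 0.0, 0.0)
n_atom = (1.33, 0.0, 0.0)
h_atom = (1.9, -0.8, 0.0)

torsion_before = dihedral_deg(o_atom, c_atom, n_atom, h_atom)
new_o = set_dihedral_by_moving_first(o_atom, c_atom, n_atom, h_atom, 180.0)
torsion_after = dihedral_deg(new_o, c_atom, n_atom, h_atom)
```

In RDKit itself, the corresponding degree-based calls in rdMolTransforms are GetDihedralDeg and SetDihedralDeg, which take a conformer and four atom indices.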
Figure 8: Python code for correction of the torsion angle (written by Nairuti Mehta). The code
calculates the dihedral angles and sets them to 180º [22, 21].
2.2.5 Automation of the process for CDR3 loops:
The entire process discussed above was automated using a Python script so that the same
procedure could be repeated for all 59 structures and both loops. The Python script reads an
Excel sheet created specifically for this purpose, which contains the names of all the PDB
complexes and the values of the variables that go into the code for each complex.
Figure 9: Python code that reads and extracts values from the Excel sheet for automation [21].
2.2.6. Automation of the process for the peptide chain:
Automation was also applied to the generation of the peptide conformations, the torsion angle
check, and the creation of the input files for the Solvate program. These conformations were used
to study the interactions with the CDR3 loops.
The process of generating the peptide conformations was similar to that used for the CDR3 loops.
Different codes written by Nairuti Mehta for the peptide conformations were strung together into
a single code that automates the entire process [22].
An Excel sheet was made with all the PDB IDs and their file names, their sequences, the varied
region, the position of the varied region, the phi and psi torsion angles, the increments, the
errors, and the clash distance threshold. The peptide chain was extracted from the PDB file of
each TCR-pMHC complex and protonated (code written by Nairuti Mehta) [22]. The peptide files were
then stored in a separate directory for each of the 59 pMHC complexes. Depending on the length of
the peptide chain, the middle 3-4 amino acid residues were selected for changing their
conformations using AddProt and the process mentioned above. Proline residues were kept fixed
because of their clash with the carbonyl oxygen of the preceding amino acid residue.
2.2.6.1 Peptide vary automation code:
The code read all the variable values from the Excel sheet and, for each PDB ID, ran the
following steps:
1. The code copied all the input peptide files that were extracted from the PDB structures,
as well as the AddProt executable, and moved them to the folder in which the code
would work.
2. The code generated the in_peptide.txt file for each PDB complex and converted it to
the in_txt file that would be read by AddProt.
3. AddProt was started by the code and produced the different conformations in the form
of out.pdb and data.ap.out. These files were renamed according to their PDB names
and copied to the respective folders along with the in_peptide text file.
4. A Run.bat file was also created for each PDB ID and moved to the folder of the
corresponding PDB complex.
5. Once the output file with the various conformations was formed, a code written by
Nairuti Mehta was added to the above code to randomly select 15 conformers from
the output file and fit them into the respective peptide chain of the PDB structure,
generating a file with 15 peptide chains, each with the varied region fitted in [22].
The peptide chains were separated by the delimiter TER.
6. The file was then converted to the ‘in_orig.pdb’ file, in which the delimiter was
changed from TER to ‘MODEL’ at the beginning of each peptide chain and
‘ENDMDL’ at the end of each peptide chain.
7. The ‘in_orig.pdb’ file was used to extract the C-O-N-H atoms at the junctions between
the fixed regions and the varied region of the peptide chain, at the beginning and the
end of the varied region, using the code developed by Nairuti Mehta for checking the
torsion angle of the peptide [22]. The extracted 8 atoms (two each of C, O, N, and H,
one set from each end of the varied region) were stored in another PDB file named
‘in_torsion.pdb’ after renaming the O atoms to HX, because RDKit was not able to
identify the correct oxygen atoms and could not form a bond with the C atom.
8. The ‘in_torsion.pdb’ file was then sanitized using the RDKit function to make sure that
RDKit did not mistake these bonds for part of a ring. RDKit was then used to check the
torsion angle and change it to 180º, and to set the bond angle to 120º.
9. For the first four atoms, at the peptide bond between the fixed residue and the 1st
varied residue, the coordinates of the oxygen atom were changed. For the last four
atoms, between the last varied residue and the fixed residue after it, the hydrogen of
the varied residue was moved to fit the torsion angle.
10. The changed coordinates were saved in a file called ‘in_mod2.pdb’. The HX atoms
were renamed back to O and the file was saved again as ‘in_mod.pdb’. A Fortran
executable was used to fit these coordinates from ‘in_mod.pdb’ into the
‘in_orig.pdb’ file.
11. The delimiters in the ‘in_orig.pdb’ file were changed back to TER.
12. The files were then edited for Solvate using a code developed by Nairuti Mehta, in
which the chain letters were changed from ‘C’ to ‘LIG 1’ and the file was renamed to
‘PDB_ID_vary_peptide15.pdb’ and saved in the respective folder [22]. These files
were ready to be solvated.
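The delimiter changes in steps 6 and 11 are a plain text transformation on the multi-chain PDB file. A minimal sketch follows, assuming every peptide copy ends with a line starting with TER; the function names are hypothetical and are not taken from the thesis code.

```python
def ter_to_models(lines):
    """Wrap every TER-terminated block of lines in MODEL/ENDMDL records."""
    out, block, model_no = [], [], 0
    for line in lines:
        if line.startswith("TER"):
            if block:
                model_no += 1
                # MODEL record: serial number right-justified in columns 11-14.
                out.append(f"MODEL     {model_no:>4}")
                out.extend(block)
                out.append("ENDMDL")
                block = []
        else:
            block.append(line)
    out.extend(block)  # keep any trailing lines that had no closing TER
    return out

def models_to_ter(lines):
    """Reverse the conversion: drop MODEL records, turn ENDMDL back into TER."""
    out = []
    for line in lines:
        if line.startswith("MODEL"):
            continue
        out.append("TER" if line.startswith("ENDMDL") else line)
    return out
```

Keeping the two directions as separate small functions mirrors the workflow above, where the MODEL/ENDMDL form exists only while RDKit is reading the file.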
2.2.7. Solvation:
The coordinates of the varied CDR3 loops were inserted into the original X-ray structure, and the
file was then ready to be solvated. The original X-ray structures were also solvated to check for
hydrophobic interactions and hydrogen bonding. The peptide files were prepared for solvation
using the code developed by Nairuti Mehta [22].
The binding cleft of the MHC and the TCR, along with the peptide, was solvated using the
following method:
1. A folder was created for each complex containing the protein with the varied CDR3 loops
and the peptide file prepared for Solvate.
2. Running solvate.exe opened two windows, the Solvate GUI and a terminal window. The
Solvate GUI was used to enter all the inputs specific to the folder and the file, and
WATGEN5 then ran in the terminal window.
3. The solvate.exe was stored in a directory together with other folders: “Inputs”,
where the files to be solvated were placed; “exe”, where all the executables needed to
run Solvate were stored; and “common”, which contains the atom and PDB file formats
read by the executables.
4. In the Solvate GUI, the path to “Inputs” was added by default in the first input box.
The Run name was the name of the folder in which the CDR3 loops were present, the
protein name was set to “Prot”, and the protein file was the file name of the X-ray
structure. To differentiate the protein from the ligand, only the chains of the MHC and
TCR were entered in the Chains input box.
5. The Cofactor Residue Names, Cofactor Residue Numbers, Cation Residue Names, and
Anion Residue Names were left empty.
6. The Ligand Poses filename was the name of the peptide file prepared for solvation.
This file contained LIG L 1 in place of the residue name, chain letter, and residue
number, and this was the same for all the peptide files. In this way, Solvate treated
the peptide as the ligand and not as part of the protein.
7. The Delimiter was set to “TER”, which separates the different chains in the
protein file.
8. The SMILES strings, Rotatable bonds, Molecular weight, Log P, and Exp. Water
Residue Name were left unchanged, and the number of layers was set to 10.
9. The X-ray structures were solvated, as were the structures with varied CDR3 loops
and the X-ray peptide. In addition, the structures with varied CDR3 loops were
solvated with the varied peptide files.
Figure 10: Solvate GUI with the MHC TCR and peptide file input parameter.
2.2.8. Clash calculations:
The clash distance calculations were done to check whether the conformers that were fitted
into the protein PDB file clashed with the peptide, the MHC chains, or the other, unchanged CDR3
loop. The code was written in collaboration with Nairuti Mehta [22]. The clash distance threshold
was kept at 1.2 Å. The code to calculate the clash distance did the following:
1. It categorizes each distance as less than 0.5 Å or less than 1.2 Å.
2. It reads lines from the input file (the complex PDB file) and extracts the atom names,
chain names, residue names, residue numbers, x coordinates, y coordinates, and z
coordinates, creating a list for each item.
3. The code calculates the distance between the coordinates of a single atom and those of
all the other atoms. In this case, the distances of the CDR3 loop atoms to the atoms of
all the other chains are calculated.
4. The distance is calculated by subtracting the x, y, and z coordinates of the two atoms,
squaring the differences, and taking the square root of their sum.
5. The distances are calculated for all the chains named in the code, but only the
distances of less than 1.2 Å are written out, as an Excel sheet for each
conformer-containing protein PDB file.
6. This process is also automated for the different protein PDB files with the varied region
for each PDB ID, with the code reading its inputs from the Excel sheet.
7. A text file listing the number of clashes in each conformer-containing protein PDB file
is also generated. One such file is written for all the conformer files under the same
PDB complex name. An example of the code is shown in Figure 11.
Figure 11: Clash distance code example [21].
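The steps listed for the clash calculation can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis code: it reads standard fixed-column PDB ATOM records, treats every atom outside the loop's own chain as a potential clash partner, and returns a list instead of writing an Excel sheet. The function names and the `loop_residues` argument are hypothetical.

```python
import math

def parse_atoms(pdb_lines):
    """Extract name, residue, chain, residue number and coordinates per atom."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms

def find_clashes(atoms, loop_chain, loop_residues, threshold=1.2):
    """Return (loop_atom, other_atom, distance, category) for each clash.

    Distances are Euclidean: sqrt(dx**2 + dy**2 + dz**2).
    """
    loop = [a for a in atoms
            if a["chain"] == loop_chain and a["resseq"] in loop_residues]
    others = [a for a in atoms if a["chain"] != loop_chain]
    clashes = []
    for a in loop:
        for b in others:
            d = math.dist(a["xyz"], b["xyz"])
            if d < threshold:
                category = "<0.5 Å" if d < 0.5 else f"<{threshold} Å"
                clashes.append((a, b, d, category))
    return clashes
```

A call such as `find_clashes(parse_atoms(lines), "D", set(range(93, 97)))` would check a four-residue stretch of chain D; the residue range here is only a placeholder.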
2.2.9. Hydrophobic Interactions:
The code was written in collaboration with Nairuti Mehta [22]. The intermolecular hydrophobic
interactions between the CDR3α and CDR3β loops of the TCR were calculated to check how the
interactions changed when the middle three amino acids of either CDR3 loop were varied. For
these interactions, the coordinates of the backbone and side chain carbon atoms, as well as the
sulfur atoms of cysteine and methionine residues, were used to calculate the distances to the
corresponding atoms of the other CDR3 loop. Cut-offs of 5 Å and 4 Å were used for the
hydrophobic interactions. Initially, the distances were calculated for all the carbon and sulfur
atoms of the α and β chains of the TCR; later, during the analysis, only the CDR3 loops were
extracted and the hydrophobic interactions between them were calculated. The Python script used
to calculate the clash distance was reused here, with the slight change of restricting the atoms
to carbons and sulfurs.
The code takes in the protein files with the varied CDR3 loops and the X-ray structures, extracts
the coordinates of the atoms mentioned above, and calculates the distances between them with
cut-offs of 5 Å and 4 Å. The data were then stored in an Excel file. The number of hydrophobic
interactions was calculated, and certain side chain carbons of asparagine, aspartic acid,
glutamine, and glutamic acid were removed from the analysis because they do not participate in
hydrophobic interactions. The entire process was automated for all the PDB structures. Similarly,
intermolecular hydrophobic interactions were calculated for the PDB structures with variation of
both the CDR3α and CDR3β loops. Intramolecular hydrophobic interactions were also calculated
for the CDR3α and CDR3β loops individually when the middle three or four amino acids were
varied. The code used for this purpose was the same as that used for the intermolecular
interactions: the two chains were set to be the same, and the CDR3 loops were then extracted for
the analysis. The intermolecular distances between each of CDR3α and CDR3β and the peptide were
also calculated for all the varied PDB structures and the X-ray structures. These scripts provide
a fast way of computing the hydrophobic interactions.
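The counting rule described in this section can be sketched as below. This is an illustration under assumptions, not the thesis code: atoms are passed as (residue name, atom name, coordinates) tuples, carbon atoms are recognized by standard PDB atom names starting with "C", and the excluded polar carbons (CG of Asn/Asp, CD of Gln/Glu) are a guess at the "certain side chain carbons", which the text does not list.

```python
import math

# Assumed exclusion list; the thesis does not name the removed carbons.
EXCLUDED_CARBONS = {("ASN", "CG"), ("ASP", "CG"), ("GLN", "CD"), ("GLU", "CD")}
SULFURS = {("CYS", "SG"), ("MET", "SD")}

def is_hydrophobic_atom(resname, atom_name):
    """Carbons (minus the excluded polar ones) and Cys/Met sulfurs."""
    if (resname, atom_name) in EXCLUDED_CARBONS:
        return False
    return atom_name.startswith("C") or (resname, atom_name) in SULFURS

def count_hydrophobic_contacts(group_a, group_b, cutoff):
    """Count atom pairs (one from each group) closer than cutoff, in Å."""
    count = 0
    for res_a, name_a, xyz_a in group_a:
        if not is_hydrophobic_atom(res_a, name_a):
            continue
        for res_b, name_b, xyz_b in group_b:
            if is_hydrophobic_atom(res_b, name_b) and math.dist(xyz_a, xyz_b) < cutoff:
                count += 1
    return count

# Made-up coordinates standing in for atoms of the two CDR3 loops.
loop_alpha = [("LEU", "CD1", (0.0, 0.0, 0.0)),
              ("ASP", "CG", (0.0, 0.0, 1.0)),
              ("SER", "OG", (0.0, 0.0, 2.0))]
loop_beta = [("MET", "SD", (0.0, 0.0, 4.5)),
             ("VAL", "CB", (0.0, 3.0, 0.0))]

contacts_5A = count_hydrophobic_contacts(loop_alpha, loop_beta, 5.0)
contacts_4A = count_hydrophobic_contacts(loop_alpha, loop_beta, 4.0)
```

Passing the same loop as both groups would approximate the intramolecular case, although each pair would then be counted twice and self-pairs would need to be skipped.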
Figure 12: Code for calculation of intermolecular interactions. A similar code was used for the
calculation of intra-molecular interaction and the oxygen clashes.
2.2.10. Oxygen Clashes:
The oxygen clashes between the two loops were calculated for all the PDB structures when CDR3α was
varied, when CDR3β was varied, and when both the CDR3α and CDR3β loops were varied. In addition,
intramolecular oxygen clashes were calculated for the varied and the X-ray CDR3α and CDR3β loops.
The code used for the calculation of the oxygen clashes was the same as that used for the
hydrophobic interactions, except that only oxygen atoms, instead of carbon atoms, were selected
for the distance calculation. The CDR3 loops were extracted, and cut-off distances of 4 Å and 5 Å
were used.
All the codes and the data used for this thesis are available in the GitHub repository
(https://github.com/vbpatel12/Codes-And-Data-For-TCR-Peptide-Binding-Analysis.git) [21].
Chapter 3: Application of Solvation and Molecular Analysis of
CDR3 loops
3.1. Introduction:
The molecular interactions between the CDR3 loops and the peptide bound to the MHC molecule
were studied by solvating the protein structure and the peptide separately. The solvated peptide
and protein were then allowed to bind, and the resulting hydrophobic interactions, hydrogen
bonding interactions, and contact single-water bridges were studied. This allowed us to further
understand the factors affecting the binding of the TCR to the peptide. Studying the displaced
water molecules and the hydrogen-bonded water bridges formed between the TCR and the peptide
suggests that, in addition to the usual non-covalent forces, these water bridges may orient the
peptide and the TCR in a way that allows the TCR to bind to the peptide, thus activating the
TCR. The interactions between the TCR and the peptide were compared for the X-ray structures,
considered as binding conformations, and the varied conformations, considered as non-binding
conformations. The orientation of the CDR3 loops relative to one another enables them to make
specific contacts with the MHC-bound peptide that result in activation of the TCR. Thus, based
on the coordinates of the carbon atoms of the protein and the ligands, intermolecular
hydrophobic interactions between the CDR3 loops were also calculated to study how the two CDR3
loops need to interact for them to bind to the peptide and initiate the immune response.
The intramolecular hydrophobic interactions help to describe the internal conformation of the
loops, and thereby to identify which orientation of the CDR3 loop is more favorable for binding
to the peptide.
The interaction analyses were done using Python scripts. The structures with variations were
categorized as non-binding, and the X-ray structures were defined as binding. The Shapiro-Wilk
normality test was conducted using SciPy to check whether the interactions were normally
distributed. The Mann-Whitney U test was used to obtain the p values (alpha = 0.05). Box plots
were plotted using the Matplotlib and Seaborn packages in Python.
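To make the arithmetic behind the Mann-Whitney U test visible, the sketch below computes the U statistics and a two-sided p value in pure Python. It is an illustration only: it uses the normal approximation with tie correction and no continuity correction, so its p value can differ slightly from what a library routine such as scipy.stats.mannwhitneyu reports, and the function name is hypothetical. The two samples are the C-C interaction counts listed in Table 7 for the ten CDR3α-varied and ten CDR3β-varied 1AO7 structures, used here only as convenient input, not as a reproduction of any comparison made in the thesis.

```python
import math

def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using average ranks and a normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                             # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u_y = n1 * n2 - u_x
    n = n1 + n2
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_x, u_y, 1.0
    z = (u_x - n1 * n2 / 2.0) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, u_y, p_two_sided

# C-C interaction counts from Table 7 (1AO7).
cc_alpha_varied = [42, 39, 36, 36, 28, 25, 27, 39, 36, 32]
cc_beta_varied = [27, 27, 28, 25, 26, 26, 25, 25, 25, 28]

u_alpha, u_beta, p_value = mann_whitney_u(cc_alpha_varied, cc_beta_varied)
significant = p_value < 0.05  # alpha = 0.05, as in the text
```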
This chapter discusses the outputs obtained from AddProt, which are used for the solvation of the
structures, as well as the molecular interaction analysis between the CDR3 loops and between the CDR3
loops and the peptide using Solvate and Python scripts.
3.2 Structures with CDR3 loop variation obtained from AddProt:
Different conformers of the CDR3α loop and the CDR3β loop were generated for 59 PDB structures
using AddProt. Out of the many conformers, 10 were randomly selected for further analysis of
the interactions between the CDR3 loops and the interactions with the peptide. The total number
of conformers and the conformers selected for further analysis are shown in Tables 3 and 4, and
the structural output of AddProt is shown in Figure 13.
Figure 13: 416 conformers of the 1AO7 CDR3α loop obtained from AddProt.
Table 3: Ten PDB structures with CDR3α loop sequence, angle increment, total conformers, and
10 selected conformers. (Amino acids whose torsion angles are varied are mentioned in red).
PDB ID  CDR3α loop       No. of varied residues  Angle increment  Total conformers  Selected conformers
1AO7    CAVTTDSWGKLQF    3                       30               416               74, 406, 313, 81, 330, 7, 267, 80, 174, 407
1OGA    CAGAGSQGNLIF     3                       30               177               56, 46, 42, 95, 101, 120, 111, 139, 31, 2
2BNQ    CAVRPTSGGSYIPTF  4                       40               1125              632, 341, 962, 327, 245, 732, 1026, 1055, 898, 991
2NX5    CAVQASGGSYIPTF   3                       10               1236              625, 542, 1093, 317, 393, 1094, 1028, 546, 555, 1132
3MV7    CAVQDLGTSGSRLTF  4                       40               524               87, 349, 170, 287, 54, 467, 406, 176, 156, 497
3VXS    CAVRMDSSYKLIF    3                       30               101               53, 59, 90, 9, 7, 87, 79, 37, 42, 50
4QRP    CALSDPVNDMRF     3                       30               28                19, 1, 10, 15, 22, 27, 9, 14, 25, 24
5JZI    CAYGEDDKIIF      3                       30               192               79, 32, 177, 44, 46, 63, 180, 183, 104, 119
6AVF    CLVGEILDNFNKFYF  4                       30               185               65, 26, 42, 77, 153, 119, 149, 131, 99, 95
6MTM    CGTERSGGYQKVTF   3                       30               102               90, 35, 58, 19, 98, 41, 100, 34, 101, 25
Table 4: Ten PDB structures with CDR3β loop sequence, angle increment, total conformers, and
10 selected conformers. (Amino acids whose torsion angles are varied are mentioned in red).
PDB ID  CDR3β loop        No. of varied residues  Angle increment  Total conformers  Selected conformers
1AO7    CASRPGLAGGRPEQYF  4                       40               817               26, 564, 15, 639, 546, 536, 566, 491, 387, 270
2AK4    CASPGLAGEYEQYF    3                       30               504               322, 210, 417, 462, 263, 61, 175, 336, 33, 116
2NX5    CATGTGDSNQPQHF    3                       20               3647              1042, 3069, 1874, 3554, 2066, 3533, 1915, 2520, 2903, 769
3MV7    CASSARSGELFF      3                       30               381               104, 336, 197, 107, 184, 105, 89, 170, 201, 278
3VXM    CASSPTSGIYEQYF    3                       20               263               112, 47, 38, 183, 262, 179, 250, 241, 92, 174
4G8G    CASREGLGGTEAFF    3                       30               385               205, 122, 96, 104, 5, 234, 254, 157, 63, 214
4MJI    CASSYQGTEAFF      3                       20               140               75, 21, 56, 121, 132, 53, 130, 122, 106, 27
5HHM    CASSSRSSYEQYF     3                       30               412               220, 207, 362, 395, 319, 56, 99, 401, 215, 302
5NMG    CASSDTVSYEQYF     3                       10               4162              2728, 2620, 3808, 2680, 2031, 2307, 2237, 1051, 1345, 405
6BJ3    CASRTRGGTLIEQYF   4                       40               708               351, 465, 637, 168, 318, 189, 496, 5, 589, 456
Once the conformers are generated, AddProt writes an output file listing the conformers that
clashed and those that did not clash and were therefore kept in the final output.
Table 5: Snippet from the AddProt solutions output for CDR3α of the 1AO7 PDB structure.
Solution Dist Ang1 Ang2 Tors Score Psi(-1) Phi(1) Psi(1) Phi(2) Psi(2) Phi(3)
1 1.23 110.6 114.2 184.1 30.9 0 90 150 60 60 90
2 1.24 105.4 124 170.8 39 0 90 180 60 330 150
3 1.21 117.9 113.7 183.4 25.5 0 90 240 330 90 60
4 1.45 107.4 104.8 184.7 42.7 0 120 90 60 240 330
5 1.42 113.5 108.8 182.5 27.4 0 120 120 30 270 300
6 1.27 109.5 111.6 185.1 31.9 0 120 150 0 120 90
7 1.49 133.2 128.3 188.7 44.6 0 120 150 30 210 330
8 1.4 104.5 116.8 160.4 43.5 0 120 150 30 240 270
9 1.23 128.7 120.6 179.1 22.2 0 120 180 0 240 300
10 1.22 117.7 117.1 185.8 23.9 0 120 180 330 120 90
Table 5 shows the output obtained from AddProt. ‘Dist’ is the distance between the residues,
for which the threshold is set to 1.2 Å. ‘Ang1’ is the angle between Cα (Trp8), C (Trp8), and
N (Gly9). ‘Ang2’ is the angle between C (Trp8), N (Gly9), and Cα (Gly9). ‘Tors’ is the omega
angle between the last varied residue and the first fixed residue (between Trp8 and Gly9). The
score indicates the amount of error; lower scores mean less error and better structures. The Psi
and Phi angles are the torsion angles of the varied region. Ten conformers are randomly selected
from these solutions, fitted into the CDR3 loop and the TCR structure, and used for the solvation
analysis as well as the intermolecular and intramolecular analyses.
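The random selection step can be sketched as follows. This is a minimal illustration, assuming the AddProt solutions are numbered from 1 to the total number of conformers; the function name and the optional seed are hypothetical additions, since the text does not say how the selection was made reproducible.

```python
import random

def select_conformers(total, k=10, seed=None):
    """Pick k distinct solution numbers from 1..total without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(1, total + 1), min(k, total))

# 416 conformers were obtained for the 1AO7 CDR3α loop (Table 3).
selected = select_conformers(416, k=10, seed=0)
```

Fixing the seed makes a selection repeatable, which would allow the same ten conformers to be regenerated later.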
3.3 Intermolecular analysis between peptide and CDR3 loops using Solvate:
Solvation of the 59 MHC-peptide-TCR complexes was done using WATGEN5. The binding cleft of the
MHC and TCR was solvated individually for each complex, and each peptide was solvated separately.
After solvation of each component, the peptide was fitted back into the complex and the changes in
the various parameters were calculated. Solvate computes parameters that describe the interactions,
such as hydrophobic interactions and hydrogen bonding interactions. In addition, water-related
parameters were calculated: displaced waters (the water molecules displaced by the peptide) and
single water bridges (water-mediated interactions between the peptide and the protein). Solvation of
the complexes was carried out with different types of variation. The complexes with variation of only
the CDR3α or CDR3β loop, and the complexes with variation of the CDR3α loop and peptide or of the
CDR3β loop and peptide, were solvated. Solvation of the X-ray complexes and of the complexes with
variation of the middle three amino acids of the peptide was conducted by Nairuti Mehta [22].
Solvation parameters of the 1AO7 complex with variation of the CDR3α and CDR3β loops are shown in
Table 6.
Table 6: Example of the 1AO7 complex with the Solvate result for CDR3α and CDR3β
variation.
ID                Prot-Lig HB  Prot-Lig HF  Actual SWB  Absolute displaced waters  Contact SWB
1AO7              20           1            33          12                         38
1AO7_CDR3α_74     24           18           36          11                         38
1AO7_CDR3α_406    22           5            36          18                         32
1AO7_CDR3α_313    22           7            34          15                         30
1AO7_CDR3α_81     24           15           33          12                         34
1AO7_CDR3α_330    22           9            33          12                         34
1AO7_CDR3α_7      22           8            33          18                         30
1AO7_CDR3α_267    23           17           33          15                         33
1AO7_CDR3α_80     24           15           35          15                         35
1AO7_CDR3α_174    25           12           34          8                          33
1AO7_CDR3α_407    22           5            36          13                         33
1AO7_CDR3β_26     19           17           31          12                         30
1AO7_CDR3β_564    20           22           35          13                         31
1AO7_CDR3β_15     22           41           29          13                         37
1AO7_CDR3β_639    22           29           32          13                         33
1AO7_CDR3β_546    23           34           33          10                         34
1AO7_CDR3β_536    24           36           30          9                          33
1AO7_CDR3β_566    20           22           33          15                         29
1AO7_CDR3β_491    21           28           34          11                         34
1AO7_CDR3β_387    22           35           34          10                         32
1AO7_CDR3β_270    23           43           30          10                         36
These solvation parameters were compared with those of the X-ray structures. In addition, the
parameters for complexes with variation of both CDR3α and peptide, and for complexes with variation
of both CDR3β and peptide, were compared with those of the X-ray structures and of the complexes with
variation of the peptide only. The parameters were compared by taking averages over four categories:
the binding conformations, the non-binding conformations with peptide variation, the non-binding
conformations with CDR3α/CDR3β variation, and the non-binding conformations with both CDR3α/CDR3β
variation and peptide variation. Box plots were used to analyze the differences between these four
categories.
The box plots show no significant difference in the hydrogen bonding interactions among the four
categories for either CDR3α (Figure 14 A) or CDR3β (Figure 15 A) variation. On the other hand, there
are more hydrophobic interactions when the CDR3 loops are varied and when both the CDR3 loops and the
peptide are varied (Figure 14 B and Figure 15 B). When the peptide is varied, there are more
hydrophobic interactions whether or not the CDR3 loops are changed. Also, variation of the CDR3α loop
alone produced fewer hydrophobic interactions than variation of the CDR3β loop alone.
Figure 14: Box plots comparing four categories of complexes: binding (X-ray structure), non-
binding conformations with variation in peptide, non-binding conformation with variation in
CDR3α loop, and non-binding conformation with variation in CDR3α loop and peptide
orientation. (A) Box plots for hydrogen bonding interactions. (B) Box plots for hydrophobic
interactions. (NB: non-binding; α = 0.05 using the Mann-Whitney U test)
Figure 15: Box plots comparing four categories of complexes: binding (X-ray structure), non-
binding conformations with variation in peptide, non-binding conformation with variation in
CDR3β loop, and non-binding conformation with variation in CDR3β loop and peptide
orientation. (A) Box plots for hydrogen bonding interactions. (B) Box plots for hydrophobic
interactions. (NB: non-binding; α = 0.05 using the Mann-Whitney U test)
In addition to the hydrophobic interactions, other solvation parameters were compared: the displaced
water molecules, the actual single water bridges, and the contact single water bridges. The displaced
water molecules and the actual single water bridges do not change significantly with variation of the
peptide, of the CDR3 loop, or of both the CDR3 loop and the peptide (Figure 16 and Figure 17).
Figure 16: Box plots comparing four categories of complexes: binding (X-ray structure), non-
binding conformations with variation in peptide, non-binding conformation with variation in
CDR3α loop, and non-binding conformation with variation in CDR3α loop and peptide
orientation. (A) Box plots for displaced water molecules. (B) Box plots for actual single water
bridges. (C) Box plots for contact single water bridges. (NB: non-binding; α = 0.05 using the
Mann-Whitney U test)
Figure 17: Box plots comparing four categories of complexes: binding (X-ray structure), non-
binding conformations with variation in peptide, non-binding conformation with variation in
CDR3β loop, and non-binding conformation with variation in CDR3β loop and peptide
orientation. (A) Box plots for displaced water molecules. (B) Box plots for actual single water
bridges. (C) Box plots for contact single water bridges. (NB: non-binding; α = 0.05 using the
Mann-Whitney U test)
The hydrophobic interactions between the peptides and the CDR3α and CDR3β loops in the PDB structures
with variations in the middle three amino acids of the CDR3α and CDR3β loops were also calculated
using the Python scripts. This approach provides a more flexible method for analyzing the hydrophobic
interactions between the peptides and the CDR3α and CDR3β loops.
Figure 18 (A & B) shows that when the CDR3α loops are varied, there is a significant increase in the
hydrophobic interactions. There is also an increase in the hydrophobic interactions when the CDR3β
loop is varied, at both the 5 Å and the 4 Å threshold distances (Figure 18 C & D). All the structures
with variations in the CDR3α and CDR3β loops had more hydrophobic interactions with the peptide than
the X-ray structures. The number of these interactions was also found to be far lower than the number
of interactions between the CDR3 loops.
Figure 18: Box plots comparing the hydrophobic interactions between the CDR3 loops and the
peptide (Binding: X-ray structure; Non-Binding: structures with variation in
CDR3α/CDR3β). (A) Hydrophobic interactions with CDR3α variation, 5 Å threshold. (B)
Hydrophobic interactions with CDR3α variation, 4 Å threshold. (C) Hydrophobic interactions with
CDR3β variation, 5 Å threshold. (D) Hydrophobic interactions with CDR3β variation, 4 Å
threshold. (α = 0.05 using the Mann-Whitney U test)
These results agree with the Solvate results. There is a significant increase in hydrophobic
interactions between structures with variations in the CDR3α and CDR3β loops and the unchanged peptide
molecules. Thus, both the Python script results and the Solvate results showed that variation in the
CDR3 loops increases the hydrophobic interactions with the peptide.
3.4 Inter-Loop and Intra-Loop molecular analysis of structures with CDR3α/CDR3β
variation using python script:
To compute the hydrophobic interactions between the CDR3α and CDR3β loops, as well as within each
loop, quickly and at any desired threshold distance, Python scripts were used for the X-ray structures
as well as the structures with variation in the CDR3α/CDR3β loop. This section discusses the two types
of interactions that were studied to understand the different CDR3α/CDR3β conformations: hydrophobic
interactions and oxygen clashes. These analyses were performed to further explore the structural
interactions between and within the CDR3 loops.
Table 7: Hydrophobic interactions and oxygen clashes between CDR3 loops for the 1AO7 X-ray
structure and the CDR3 loop varied structures.
PDB_ID            C-C interactions  O-O clash  NB - X-ray (C-C)  X-ray - NB (O-O)
1AO7              36                1          --                --
1AO7_CDR3α_174    42                0          6                 1
1AO7_CDR3α_267    39                1          3                 0
1AO7_CDR3α_313    36                0          0                 1
1AO7_CDR3α_330    36                1          0                 0
1AO7_CDR3α_406    28                0          -8                1
1AO7_CDR3α_407    25                0          -11               1
1AO7_CDR3α_7      27                0          -9                1
1AO7_CDR3α_74     39                1          3                 0
1AO7_CDR3α_80     36                0          0                 1
1AO7_CDR3α_81     32                0          -4                1
1AO7_CDR3β_15     27                3          -9                -2
1AO7_CDR3β_26     27                0          -9                1
1AO7_CDR3β_270    28                2          -8                -1
1AO7_CDR3β_387    25                3          -11               -2
1AO7_CDR3β_491    26                2          -10               -1
1AO7_CDR3β_536    26                3          -10               -2
1AO7_CDR3β_546    25                3          -11               -2
1AO7_CDR3β_564    25                1          -11               0
1AO7_CDR3β_566    25                1          -11               0
1AO7_CDR3β_639    28                3          -8                -2
For each of the 59 PDB structures, the hydrophobic interactions and oxygen clashes of the varied
structures were averaged and compared with those of the X-ray structure.
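As a worked example of this averaging, the sketch below takes the ten CDR3β-varied 1AO7 rows of Table 7 and computes their mean C-C and O-O counts. The variable names are hypothetical; the numbers are copied from Table 7.

```python
from statistics import mean

# conformer number: (C-C interactions, O-O clashes), CDR3β-varied rows of Table 7
cdr3b_varied_1ao7 = {
    15: (27, 3), 26: (27, 0), 270: (28, 2), 387: (25, 3), 491: (26, 2),
    536: (26, 3), 546: (25, 3), 564: (25, 1), 566: (25, 1), 639: (28, 3),
}
xray_1ao7 = (36, 1)  # C-C and O-O for the X-ray structure

avg_cc = mean(cc for cc, _ in cdr3b_varied_1ao7.values())
avg_oo = mean(oo for _, oo in cdr3b_varied_1ao7.values())

# Difference of the non-binding average from the X-ray value.
delta_cc = avg_cc - xray_1ao7[0]
```

The two averages come out as 26.2 and 2.1, which are the values listed for 1ao7 under the non-binding CDR3β variation in Table 8.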
Table 8: Average hydrophobic interactions and oxygen clash distances between CDR3α and
CDR3β for 15 PDB structures comparing X-ray structures and CDR3 loop varied structures.
        X-ray (binding)   Non-binding, CDR3α variation   Non-binding, CDR3β variation
PDB_ID  C-C   O-O         C-C (NB)   O-O (NB)            C-C (NB)   O-O (NB)
1ao7    36    1           34         1                   26.2       2.1
1oga    26    5           26         7.8                 20.4       3.7
2ak4    14    0           26.9       3                   14.3       0
3h9s    31    2           36.4       1.5                 19.3       2.2
3mv7    27    4           27         4                   27         4.3
3qfj    32    2           30.4       1                   22.1       1.71
4jfd    23    0           22.6       0                   23.1       0
4qok    11    1           10.6       1                   10.2       1
4qrr    31    4           31         4                   48.1       5.9
5hhm    27    5           26         7.5                 19.9       3.7
5jzi    26    2           23.5       1.3                 22.2       0
5nmg    19    1           19         1                   19         1
6amu    22    0           19         0                   14.9       0.4
6bj3    18    2           18         2.3                 18.5       2.2
6mtm    29    5           31.1       5                   26.8       6.7
3.4.1 Hydrophobic interactions:
The intermolecular hydrophobic interactions between the CDR3α and CDR3β loops were calculated for the X-ray structures and for the structures in which the CDR3α loop was varied. Varying the CDR3α loop produced little difference in the inter-loop hydrophobic interactions relative to the X-ray structures (Figure 19 A and B). The interactions were calculated with cut-off distances of 5Å and 4Å. There are fewer interactions at 4Å, and these also show little difference when CDR3α is varied. On the other hand, a significant difference is present when the CDR3β loops are varied, with fewer hydrophobic interactions at the 5Å cut-off. The average number of hydrophobic interactions is also lower at the 4Å cut-off. This shows that changes in the CDR3β loop affect its interaction with the CDR3α loop.
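A distance cut-off count of this kind can be sketched as follows. This is a minimal illustration that assumes a contact is any pair of carbon atoms, one from each loop, separated by no more than the cut-off. The function name, the inclusive comparison, and the coordinates are assumptions made for the example and are not taken from the scripts used in this work.

```python
import math


def count_contacts(atoms_a, atoms_b, cutoff):
    """Count atom pairs (one atom from each loop) within `cutoff` angstroms."""
    return sum(
        1
        for a in atoms_a
        for b in atoms_b
        if math.dist(a, b) <= cutoff
    )


# Hypothetical carbon-atom coordinates (angstroms) for two short loop fragments.
loop_alpha = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
loop_beta = [(4.5, 0.0, 0.0), (0.0, 6.0, 0.0)]

contacts_5A = count_contacts(loop_alpha, loop_beta, 5.0)
contacts_4A = count_contacts(loop_alpha, loop_beta, 4.0)
print(contacts_5A, contacts_4A)
```

Because every pair counted at 4Å is also counted at 5Å, the 4Å count can never exceed the 5Å count, which is why fewer interactions are expected at the tighter threshold.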
Figure 19: Box plots comparing the hydrophobic interaction between the CDR3 loops (Binding -
X-ray Structure and Non-Binding – Structures with variation in CDR3⍺/CDR3β) (A) Inter-Loop
Hydrophobic interactions with CDR3⍺ variation 5Å threshold. (B) Inter-Loop Hydrophobic
interactions with CDR3⍺ variation 4Å threshold (C) Inter-Loop Hydrophobic interactions with
CDR3β variation 5Å threshold. (D) Inter-Loop Hydrophobic interactions with CDR3β variation
4Å threshold. (α = 0.05- using Mann-Whitney U test)
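For readers unfamiliar with the test named in the caption, the Mann-Whitney U statistic can be computed by direct pair counting, as sketched below. The text does not state which software performed the test, so this is a hypothetical pure-Python illustration rather than the procedure used here. The sample values are the first five C-C rows of Table 8, used only as example input, and the conversion of U to a p-value (compared against α = 0.05) is omitted.

```python
def mann_whitney_u(sample_x, sample_y):
    """Return (U_x, U_y) by pair counting; tied pairs contribute 0.5 to each."""
    u_x = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    u_y = len(sample_x) * len(sample_y) - u_x
    return u_x, u_y


# Example input: X-ray and CDR3-beta varied C-C values, first five rows of Table 8.
xray = [36, 26, 14, 31, 27]
beta_varied = [26.2, 20.4, 14.3, 19.3, 27]

u_xray, u_varied = mann_whitney_u(xray, beta_varied)
print(u_xray, u_varied)
```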
In addition to the above analyses, structures containing variation in both CDR3 loops were developed, with each PDB structure having 100 variations (every combination of 10 CDR3α loop variations with 10 CDR3β loop variations per PDB structure). Compared with the X-ray structures and with the structures having variation in only one of the loops, these structures had fewer intermolecular hydrophobic interactions between the loops.
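The pairing of loop variants can be sketched with a Cartesian product. The identifiers below are hypothetical placeholders; only the counts (10 variants of each loop, 100 combinations per PDB structure) come from the text.

```python
from itertools import product

# Hypothetical identifiers for the 10 CDR3-alpha and 10 CDR3-beta loop variants
# of a single PDB entry.
alpha_variants = [f"CDR3a_{i}" for i in range(1, 11)]
beta_variants = [f"CDR3b_{j}" for j in range(1, 11)]

# Every CDR3-alpha variant is paired with every CDR3-beta variant.
combined = list(product(alpha_variants, beta_variants))
print(len(combined))
```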
Figure 20: Box plots comparing the hydrophobic interaction of the CDR3 loops (Binding - X-ray Structure and Non-Binding – Structures with variation in CDR3⍺ and CDR3β). (A) Inter-Loop Hydrophobic interactions with CDR3⍺ and CDR3β variation 5Å threshold. (B) Inter-Loop Hydrophobic interactions with CDR3⍺ and CDR3β variation 4Å threshold. (α = 0.05- using Mann-Whitney U test)
The hydrophobic interactions within the CDR3 loops were also calculated for the varied loops and the X-ray loops of the PDB structures. Even with the variation in the orientation of the middle three to four amino acids of both loops, there is not much difference in the hydrophobic interactions within the loops (Figure 21 A and Figure 21 B). In the case of the CDR3β 4Å distance (Figure 21 C), there are more hydrophobic interactions within the loops for structures with CDR3β loop variation. In addition to this, it is also seen that the number of hydrophobic interactions within the CDR3 loops is much greater than the number of intermolecular interactions between the CDR3α and CDR3β loops. Thus, this large number of internal hydrophobic interactions stabilizes the loops, allowing them to form proper interactions with the peptide.
Figure 21: Box plots comparing the internal hydrophobic interaction of the CDR3 loops (Binding - X-ray Structure and Non-Binding – Structures with variation in CDR3⍺/CDR3β). (A) Intra-Loop Hydrophobic interactions with CDR3⍺ variation 5Å threshold. (B) Intra-Loop Hydrophobic interactions with CDR3⍺ variation 4Å threshold. (C) Intra-Loop Hydrophobic interactions with CDR3β variation 5Å threshold. (D) Intra-Loop Hydrophobic interactions with CDR3β variation 4Å threshold. (α = 0.05- using Mann-Whitney U test)
3.4.2 Oxygen Clashes:
To get a better understanding of how the CDR3 loops interact, oxygen clashes were also taken into account. The intermolecular oxygen clashes were calculated between the CDR3α and CDR3β loops. When only the CDR3α loop was varied in the PDB structures, the average number of clashes between the CDR3α and CDR3β loops was lower than in the X-ray structures when the cut-off distance was 5Å (Figure 22 (A) and (B)). On the other hand, there is no difference in the oxygen clashes between the CDR3α and CDR3β loops when the CDR3β loop was varied (Figure 22 (C) and (D)).
Figure 22: Box plots comparing the oxygen clash between the CDR3 loops (Binding - X-ray
Structure and Non-Binding – Structures with variation in CDR3⍺/ CDR3β). (A) Oxygen
interactions with CDR3⍺ variation 5Å threshold. (B) Oxygen interactions with CDR3⍺ variation
4Å threshold (C) Oxygen interactions with CDR3β variation 5Å threshold. (D) Oxygen
interactions with CDR3β variation 4Å threshold. ( α = 0.05- using Mann-Whitney U test)
The oxygen clashes within the CDR3α and CDR3β loops were also calculated. Figure 23 A and B show that there is no significant difference between the oxygen clashes within the CDR3α loops of the X-ray structures and those of the structures in which the CDR3α loop was varied. Figure 23 C and D show similar results for the oxygen clashes within the CDR3β loops of the X-ray structures and of the structures in which the CDR3β loop was varied.
Figure 23: Box plots comparing the internal oxygen clash of the CDR3 loops (Binding - X-ray
Structure and Non-Binding – Structures with variation in CDR3⍺/CDR3β). (A) Intra-Oxygen
interactions with CDR3⍺ variation 5Å threshold. (B) Intra-Oxygen interactions with CDR3⍺
variation 4Å threshold (C) Intra-Oxygen interactions with CDR3β variation 5Å threshold. (D)
Intra-Oxygen interactions with CDR3β variation 4Å threshold. ( α = 0.05- using Mann-Whitney
U test)
These analyses were conducted for the exploration of the interactions between and within the
CDR3 loops for the X-ray structures and structures with variation in CDR3 loops. The results,
however, failed to provide any trends for the hydrophobic interactions and oxygen clashes between
and within the loops. It seemed that the CDR3 loops minimized the hydrophobic interactions with
the peptides and maximized interactions within the CDR3 α and CDR3β loops.
Chapter 4: Molecular Dynamic Simulations of Peptides from MHC-P-TCR Complexes
4.1 Introduction:
This chapter focuses on understanding the structural characteristics of the peptides that bind to the
MHC molecule before binding to the TCR. The peptide is found to form a hydrophobic cluster
gaining stability within itself before binding to the MHC molecule. As a result, it is hypothesized
that the peptide would form stable conformations in solution before forming any contact with the
MHC molecule. The molecular dynamics simulations of the peptide only without the presence of
the MHC molecule, allowed us to study the various conformations that a peptide acquires in the
solution. As discussed in the introduction, the N and C terminal positions of the peptide are very
important for binding to the MHC molecules, as the MHC molecules have special pockets to
accommodate the N-C terminal of the peptide. Thus, the stable conformation of the peptide should
be such that the length between the N and C terminal should be optimum in order to bind to the
MHC molecule. The molecular dynamic simulations were carried out for the GILGFVFTL
peptide, a highly studied peptide for its binding to MHC and TCR, along with its variants, to find
characteristics important in a peptide that lead to proper binding. The N-C terminal distance of the X-ray conformation of the peptide was taken as the standard for binding, and the N-C distances of all the other conformers were compared with it. In addition to this, the distances between the centers of mass of the hydrophobic cluster amino acids were calculated to study their characteristics in the non-binding state. This work was carried out in collaboration with Nairuti Mehta [22].
4.2 Methods:
The molecular dynamic simulations were carried out using YASARA [24]. The X-ray structure of GILGFVFTL was kept as a positive control as it is found to bind to the MHC molecule. A poly-glycine (GGGGGGGGG) with the same conformation as GILGFVFTL was used as a negative control to check the behavior of the peptide in the absence of the side chains. The molecular dynamics simulations were set up by creating specific folders for each GILGFVFTL variation's PDB structure. The PDB structures were then loaded in YASARA and set as the target.
YASARA conducts various functions using specialized codes written in YANACONDA, a Python-based language developed for YASARA [25]. These codes are known as macros. The macros can be played to conduct different tasks such as conducting simulations, analysis of simulations, molecular docking, etc. There are various macros for carrying out the simulations, including md_run and md_runfast. These two macros differ in the speed at which they conduct the simulations. The md_run macro conducts the simulation at a conservative speed, or at a slower speed when the structures contain severe errors, taking a snapshot every 100 picoseconds. On the other hand, the md_runfast macro increases the speed of the simulations by constraining the bonds and angles of the hydrogens, increasing the time step to 5 femtoseconds (each time step being 2*2.5 femtoseconds) and taking a snapshot every 250 picoseconds. Once the macro is selected for the simulation, the following tasks are carried out: the structures are cleaned, the hydrogen bonds within the peptide and with the solvent are optimized, the simulation parameters are set, the simulation cell is created, energy minimization takes place, and the initial atom velocities are set. The simulation is conducted under physiological conditions using 0.9% NaCl solution at pH 7.4 and at physiological temperature and pressure. The AMBER14 forcefield was used as the default in YASARA [24].
The structures were created by taking the GILGFVFTL X-ray conformation as a template and the
new sequences were imposed on this template using AddProt. The simulation was allowed to run
for around 175 nanoseconds.
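The snapshot intervals and run length quoted above imply the following counts. This is simple arithmetic under the stated values (175 ns, snapshots every 100 ps or 250 ps, 5 fs per step); the function and variable names are hypothetical, and the starting frame at time zero is not counted. The conformer times reported later in this chapter fall on multiples of 0.25 ns, which would be consistent with a 250 ps snapshot spacing, although the text does not state which macro was used.

```python
SIMULATION_NS = 175          # run length stated in the text
PS_PER_NS = 1000
FS_PER_PS = 1000


def snapshot_count(total_ns, interval_ps):
    """Number of saved snapshots, excluding the starting frame at t = 0."""
    return (total_ns * PS_PER_NS) // interval_ps


md_run_snapshots = snapshot_count(SIMULATION_NS, 100)      # 100 ps spacing
md_runfast_snapshots = snapshot_count(SIMULATION_NS, 250)  # 250 ps spacing

# Integration steps if every step advances the simulation by 5 fs.
steps_at_5fs = (SIMULATION_NS * PS_PER_NS * FS_PER_PS) // 5

print(md_run_snapshots, md_runfast_snapshots, steps_at_5fs)
```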
The trajectory was analyzed using the md_analysis macro. This macro converts all the snapshots
into PDB files and conducts default analyses and creates a default report with tables and figures.
We wrote codes in the md_analysis to obtain the N-C terminal distance as well as the distance
between the center of masses of the different residues.
Figure 24: YANACONDA Code to calculate the N-C terminal and plot a distance vs. time plot
for all the trajectories.
YASARA conducts two analyses for the trajectories: it analyses the parameters inside the simulation cell and outside the simulation cell. When the analyses are conducted inside the simulation cell, the periodic boundaries play an important role in the calculations. When the N-C terminal distance of the peptide was first calculated inside the simulation cell, YASARA, because of the periodic boundaries, calculated the shortest possible distance between the N and C termini, producing incorrect results. As a result, the above code was written to calculate the distance outside the simulation cell. The code shown in Figure 25 was written to calculate and plot the distance between the centers of mass of the selected residues of the peptide. The distances between residues 3-5, 5-7, and 3-7 were calculated, as we hypothesized that these residues play an important role in the stability of the peptide conformation and binding with the MHC molecule. In addition to this, the distance between the centers of mass of the 2nd and 6th residues was also calculated. Lastly, the distance between the centers of mass of the 1st and 9th residues was calculated in order to compare it with the N-C terminal distance.
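The effect of the periodic boundaries on an end-to-end distance can be illustrated with a small Python sketch. It is not YASARA's implementation: it assumes an orthorhombic cell and uses hypothetical coordinates and cell lengths. It only shows why a shortest-image distance can be much smaller than the direct distance between the same two atoms of an extended peptide.

```python
import math


def minimum_image_distance(p, q, cell):
    """Shortest distance between p and q over periodic images of an orthorhombic cell."""
    total = 0.0
    for a, b, length in zip(p, q, cell):
        d = a - b
        d -= length * round(d / length)  # wrap the component into [-L/2, L/2]
        total += d * d
    return math.sqrt(total)


def direct_distance(p, q):
    """Plain Euclidean distance, ignoring periodic images."""
    return math.dist(p, q)


# Hypothetical cell edge lengths and terminal-atom coordinates (angstroms).
cell = (30.0, 30.0, 30.0)
n_terminus = (2.0, 15.0, 15.0)
c_terminus = (25.0, 15.0, 15.0)

d_direct = direct_distance(n_terminus, c_terminus)
d_periodic = minimum_image_distance(n_terminus, c_terminus, cell)
print(d_direct, d_periodic)
```

In this example the direct distance is 23 Å, while the shortest-image distance is 7 Å, because the image of one terminus in the neighboring cell lies closer than the real atom.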
Figure 25: YANACONDA Code written for the calculation of the center of mass of a residue in
the peptide GILGFVKTL and the code to plot graphs for the distance between the center of mass
of the selected residues.
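The quantity computed by the macro in Figure 25 can be sketched in Python as a mass-weighted mean position for each residue followed by a distance between two such points. This is an illustrative stand-in, not a translation of the YANACONDA code; the atom masses and coordinates are hypothetical and each "residue" is reduced to two atoms for brevity.

```python
import math


def center_of_mass(atoms):
    """atoms: list of (mass, (x, y, z)). Returns the mass-weighted mean position."""
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * pos[i] for mass, pos in atoms) / total_mass
        for i in range(3)
    )


# Hypothetical two-atom "residues" (mass in atomic mass units, coordinates in angstroms).
residue_3 = [(12.011, (0.0, 0.0, 0.0)), (12.011, (2.0, 0.0, 0.0))]
residue_5 = [(12.011, (1.0, 4.0, 0.0)), (14.007, (1.0, 4.0, 0.0))]

com_3 = center_of_mass(residue_3)
com_5 = center_of_mass(residue_5)
com_distance = math.dist(com_3, com_5)
print(com_3, com_5, com_distance)
```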
4.3 Results:
Molecular dynamics simulations were conducted for 12 different structures, all of them being
variants of GILGFVFTL, to understand the structural properties and characteristics of the peptide
before it binds to the MHC molecule. This section discusses the results obtained after the
simulations of some of the important peptides after 175ns. The main analysis discussed in this
section is the distance between the N-C terminal and the center of mass between different residues
of the peptides. The X-ray conformation of GILGFVFTL was used as a positive control as it is
known to have good binding with the MHC molecule, and the GGGGGGGGG was used as a
negative control.
Table 9 lists all the molecular dynamics simulations conducted, along with a summary of the mutations made, the number of conformers in the simulation, and the experimental log t1/2. However, because the software used in the Parker et al. study to predict the binding of peptides to HLA was unavailable, the binding log t1/2 of some of the peptides used in the simulations is not available [26]. This software predicts the binding based on the dissociation of β2-microglobulin from the peptide-HLA complex [26,27]. Instead, scores produced by SYFPEITHI, a tool that predicts binding to the HLA molecule based on the sequence and the presence of the anchor residues, were used to provide information on binding. The anchor residues are considered based on the HLA molecule selected, in this case HLA-A*02:01 [28].
The higher the score, the better the peptide binds. However, compared to the experimental log t1/2, these scores are not very accurate, as they only consider the anchor residues (P2, P5/P6, P9) [29].
The molecular dynamics simulations of GILGFVFTL, GLLGFVFTL, and GALGFVFTL were conducted by Nairuti Mehta [22]. Further details regarding the simulations of GLLGFVFTL and GALGFVFTL can be found in her thesis. The molecular dynamics simulations of these peptides showed that a strong hydrophobic residue at the second position leads to better and stronger binding as compared to a weaker hydrophobic residue. In addition, it was hypothesized that for a peptide to be a good binder, it needs to form many stable conformers and remain stable for a longer period, allowing it enough time to bind to the MHC molecule. This hypothesis is supported by the simulations conducted for GILGFVFTL and GLLGFVFTL. In the case of the peptide GILGGGFTL, in which the middle three amino acids were changed to glycine, many stable conformers were produced, lasting for a total of 65.75ns. It also had a high binding score of 27. However, certain exceptions were also found. GLFGGGFGV shows better binding to the MHC molecule than GILGFVFTL even with only 5 stable conformers that last for 44.75ns. Similarly, GILGLVFTL, from the PDB structure 5HHM, is known to bind to the MHC molecule, yet it produces only 6 conformers that last for 57.25ns. One of the surprising results was for linear GILGFVFTL, which attained an N-C distance similar to that of the X-ray structure within a couple of nanoseconds but produced only 3 stable conformers that lasted for just 20.5ns.
In addition to this, the research conducted by Nairuti Mehta on the binding of the peptide with the MHC molecule found that a hydrophobic cluster exists in the center [22]. However, when the hydrophobic phenylalanine at the 5th position was replaced with a charged residue, lysine (GILGKVFTL), or with a weakly hydrophobic residue, alanine (GILGAVFTL), the binding of GILGKVFTL was found to be significantly better (2.64), which is also better than that of the peptide with alanine at the second position. This may be explained by our finding that GILGKVFTL spends a large amount of its time at an N-C terminal distance near that of the X-ray conformation, which may allow it to remain stable enough to bind. However, similar behavior is found for GALGFVFTL, which has a lower binding half-life; thus it cannot be firmly concluded that maintenance of the N-C terminal distance is important for binding. Mutations at the 7th and 8th positions, in GILGFVKTL and GILGFVFVL respectively, gave a lower number of stable conformers as well as a low binding half-life, especially in the case of the phenylalanine to lysine mutation at the 7th position, showing that the hydrophobic group at this position is important for stability. However, the binding scores are higher in this case as the anchor residues are not mutated.
The peptide with glycine at all positions except the 3rd, 5th, and 7th showed only 2 stable conformers and a very low binding score. The scrambled peptide surprisingly remained stable for quite a long time, 92.5ns, with 6 conformers; however, it has a very low binding score.
Table 9: Comparison of all peptides based on conformations and binding [26,28].
Peptide | Variation | Simulation time (ns) | No. of stable conformers | Total time spent in stable conformers (ns) | Experimental log t1/2 | SYFPEITHI Score [28]
GILGFVFTL | X-ray Structure | 175 | 13 | 93 | 3 | 30
GLLGFVFTL [22] | I2L | 175 | 10 | 103 | 4.18 | 32
GALGFVFTL [22] | I2A | 175 | 9 | 67.25 | 1.95 | 26
GILGKVFTL | F5K | 175 | 6 | 45.25 | 2.64 | 30
GILGAVFTL | F5A | 175 | 7 | 64.75 | 2.34 | 29
GILGLVFTL | F5L | 175 | 6 | 57.25 | -- | 30
GGLGFGFGG | I2G, V6G, T8G, L9G | 175 | 2 | 12.25 | -- | 7
GILGFVKTL | F7K | 175 | 4 | 57.5 | 1.83 | 29
GILGFVFVL | T8V | 175 | 4 | 23 | 2.32 | 28
GLFGGGFGV | I2L, L3F, F5G, V6G, T8G, L9V | 175 | 5 | 44.75 | 3.30 | 27
GILGGGFTL | F5G, V6G | 175 | 9 | 65.75 | -- | 27
FGILVFGLT | Scrambled | 175 | 6 | 92.5 | -- | 7
GILGFVFTL_180 | Linear peptide | 175 | 3 | 20.5 | -- | --
GGGGGGGGG | PolyG | 175 | 0 | 0 | -- | --
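One way to examine the hypothesis that longer-lived stable conformers accompany better binding is to correlate the two columns of Table 9 for the peptides that have an experimental log t1/2. The sketch below is an illustration added here, not an analysis performed in this work. It uses a hand-written Pearson correlation, and with only eight peptides the resulting coefficient should be read with caution.

```python
import math

# (total time in stable conformers in ns, experimental log t1/2), copied from
# Table 9 for the peptides where both values are listed.
table9 = {
    "GILGFVFTL": (93.0, 3.0),
    "GLLGFVFTL": (103.0, 4.18),
    "GALGFVFTL": (67.25, 1.95),
    "GILGKVFTL": (45.25, 2.64),
    "GILGAVFTL": (64.75, 2.34),
    "GILGFVKTL": (57.5, 1.83),
    "GILGFVFVL": (23.0, 2.32),
    "GLFGGGFGV": (44.75, 3.30),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


stable_times = [t for t, _ in table9.values()]
log_half_lives = [h for _, h in table9.values()]
r = pearson(stable_times, log_half_lives)
print(f"Pearson r = {r:.2f}")
```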
The molecular dynamic simulations for selected peptides are discussed in depth below. The graphs are obtained from the analysis report of YASARA. More information regarding the simulations of the rest of the peptides can be found in the GitHub repository [21].
GILGFVFTL: The molecular dynamics simulation of this structure was conducted by Nairuti Mehta [22]. The analysis included here was used as a control, and based on this analysis, we decided to use other structures to learn more about the structural behavior of the peptides. The peptide was extracted from the X-ray structure and the initial distance between the N and C termini was around 23.59 Å. Figure 28 (N to C dist.) shows that the peptide takes quite a while to stabilize, having a large variation in the N-C terminal distance. After 40ns the peptide stabilizes, and some conformations can be identified that stay at a given distance for more than 5-10ns. The list of these conformations and their start and end times is given in Table 10. The peptide spends around 93ns in these conformers. Also, it is seen that most of these conformers have an N-C distance below the X-ray distance. We also tried to quantify the conformations by taking the average distance over 5ns using Excel, which produced Figure 27, showing the conformers and their times. It was hypothesized that the interaction between the side chains may be responsible for the formation of the periodic stable conformers. As a result, the distances between the centers of mass of various residues were also calculated. We hypothesized that the interactions between residues 3, 5, 7, 2, and 6 can be responsible for the formation of the conformers.
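The 5 ns averaging step was carried out in Excel; an equivalent calculation can be sketched in Python as below. The sketch assumes non-overlapping 5 ns blocks and a 0.25 ns spacing between distance values (20 values per block). Both are assumptions made for this illustration, as is the synthetic trace.

```python
def block_averages(distances, samples_per_block):
    """Average consecutive, non-overlapping blocks; a trailing partial block is dropped."""
    n_blocks = len(distances) // samples_per_block
    return [
        sum(distances[i * samples_per_block:(i + 1) * samples_per_block])
        / samples_per_block
        for i in range(n_blocks)
    ]


# Hypothetical N-C distance trace (angstroms): 10 ns sampled every 0.25 ns,
# extended for the first 5 ns and more compact for the next 5 ns.
trace = [24.0] * 20 + [14.0] * 20

averages = block_averages(trace, samples_per_block=20)
print(averages)
```

Consecutive blocks with similar averages would then be grouped into one conformer, which is how the plateaus in a plot such as Figure 27 can be read.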
Figure 29 shows the graphs obtained by calculating the distance between the centers of mass of different residues of the peptide. As seen from the graphs, the conformers spend a lot of their time at a distance that is similar to the distance in the X-ray conformation. For example, in the case of the distance between Leu3-Phe5, the side chains initially move randomly; however, after 60ns the peptide forms a stable interaction at roughly the same distance as in the X-ray structure. The peptide, even though it shows some movement, comes back to that same distance range. In the case of Leu3-Phe7, the peptide is seen to exist as two different conformers, and in the case of Ile2-Val6, the peptide moves randomly at first and then comes to a stable conformation that has a distance almost similar to that of the X-ray structure.
Figure 26: Molecular Structure of GILGFVFTL
Table 10: Characteristics of conformers in molecular dynamics simulations of GILGFVFTL [22]
Conformation | Average Distance (Å) | Minimum Distance (Å) | Maximum Distance (Å) | Start Time (ns) | End Time (ns) | Net Time (ns) (End Time - Start Time)
1 24.18 18.57 28.30 5.00 9.75 4.75
2 22.13 17.13 25.84 32.25 45.75 13.50
3 14.91 10.09 18.35 51.00 58.25 7.25
4 14.17 10.38 18.87 67.00 72.00 5.00
5 3.89 2.94 7.51 73.50 80.75 7.25
6 4.30 3.22 8.83 81.00 89.25 8.25
7 13.56 11.53 18.51 90.75 96.00 5.25
8 15.11 11.75 19.46 97.25 102.00 4.75
9 12.81 8.02 17.46 110.25 116.00 5.75
10 13.73 10.08 18.44 122.00 127.00 5.00
11 8.63 5.94 12.77 131.50 147.00 15.50
12 14.55 12.00 17.75 150.00 155.75 5.75
13 11.07 7.17 16.74 156.00 161.00 5.00
Total Time (ns) 93
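The Net Time column and the total can be reproduced directly from the start and end times in Table 10. The short check below copies those times and uses hypothetical variable names; the same calculation applies to the corresponding tables for the other peptides.

```python
# (start, end) times in ns for the 13 GILGFVFTL conformers, copied from Table 10.
intervals = [
    (5.00, 9.75), (32.25, 45.75), (51.00, 58.25), (67.00, 72.00),
    (73.50, 80.75), (81.00, 89.25), (90.75, 96.00), (97.25, 102.00),
    (110.25, 116.00), (122.00, 127.00), (131.50, 147.00),
    (150.00, 155.75), (156.00, 161.00),
]

# Net time per conformer is End Time - Start Time; the total is their sum.
net_times = [end - start for start, end in intervals]
total_time = sum(net_times)

print(net_times)
print(f"Total time in stable conformers: {total_time} ns")
```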
Figure 27: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGFVFTL [22].
[Plot: "Average Distance between Gly 1 (N-ter) and Leu 9 (C-ter)"; x-axis: Simulation time (ns), 0-170; y-axis: Average Distance of conformations, 0-30; conformers labeled 1-13.]
Figure 28: The N-C terminal distance graph for GILGFVFTL [22].
[Plot annotation: 23.59 Å.]
Figure 29: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Phe5 (B) Phe5-Phe7 (C) Leu3-Phe7 (D) Ile2-Val6 [22].
GILGKVFTL: From the work and the thesis of Nairuti Mehta, it has been hypothesized that the peptide forms a hydrophobic cluster which stabilizes the conformation of the peptide, allowing it to have an appropriate N-C distance for binding to the MHC molecule. However, a study conducted by Parker et al. showed that the experimental log half-life of GILGKVFTL was almost similar to that of GILGFVFTL even with the presence of lysine in the middle [26]. As a result, the molecular dynamics simulation was conducted for this peptide to study what makes the peptide susceptible to binding. The initial N-C terminal distance was 23.7Å. The peptide settles into 6 conformers (Table 11), most of which have an N-C terminal distance similar to that of the X-ray structure, except between 110-120ns, where the N-C distance is very low and the peptide is folded. The peptide stays in these conformations for 45.25ns.
As seen from Figure 33, not many conformers persist: the distance between the centers of mass, particularly for Leu3-Lys5 and Lys5-Phe7, is random, and no conformation is found to persist. Leu3-Phe7 does show some conformations that persist for longer than 5ns. However, for Ile2-Val6 there is a large variation in the conformers.
Figure 30: Molecular Structure of GILGKVFTL
Table 11: Characteristics of conformers in molecular dynamics simulations of GILGKVFTL.
Conformation | Average Distance (Å) | Minimum Distance (Å) | Maximum Distance (Å) | Start Time (ns) | End Time (ns) | Net Time (ns) (End Time - Start Time)
1 10.99 8.19 15.38 22.25 29.25 7.00
2 19.01 14.36 23.16 35.00 42.50 7.50
3 19.88 15.31 24.07 57.25 67.50 10.25
4 22.98 14.67 27.18 102.75 108.00 5.25
5 5.54 3.17 9.23 111.25 121.25 10.00
6 20.50 14.70 26.51 169.75 175.00 5.25
Total Time (ns) 45.25
Figure 31: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGKVFTL.
[Plot: "Average distance between the Gly1 (N Terminal) and Leu9 (C Terminal)"; x-axis: Simulation Time (ns), 0-170; y-axis: Average Distance of conformers, 0-30; conformers labeled 1-6.]
Figure 32: The N-C terminal distance graph for GILGKVFTL.
Figure 33: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Lys5 (B) Lys5-Phe7 (C) Leu3-Phe7 (D) Ile2-Val6.
GILGAVFTL: This variation has alanine, a weakly hydrophobic amino acid, at the fifth position. The experimental log half-life of this peptide was lower than when lysine is present at the fifth position, making it a weaker binder than GILGKVFTL even though there is a hydrophobic group at this position [26]. There are 7 conformations that were stable for more than 5ns, for a total time of 64.75 ns, as shown in Figure 35. The average distance of most of the conformers was below the initial distance of 23.94 Å.
Figure 34: Molecular structure of GILGAVFTL
Table 12: Characteristics of conformers in the molecular dynamic simulations of GILGAVFTL.
Conformation | Average Distance (Å) | Minimum Distance (Å) | Maximum Distance (Å) | Start Time (ns) | End Time (ns) | Net Time (ns) (End Time - Start Time)
1 16.18 11.90 20.12 16.75 22.75 6.00
2 4.39 3.23 12.13 69.00 78.25 9.25
3 17.76 10.74 21.80 90.50 101.50 11.00
4 11.36 8.23 16.34 116.75 121.50 4.75
5 15.22 9.44 18.63 121.75 126.50 4.75
6 5.06 3.16 7.63 138.25 159.25 21.00
7 16.57 12.37 20.17 167.00 175.00 8.00
Total Time (ns) 64.75
Figure 35: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGAVFTL.
[Plot: "Average distance between Gly1 (N-terminal) and Leu9 (C-Terminal)"; x-axis: Simulation Time (ns), 0-170; y-axis: Average distance of conformers, 0-28; conformers labeled 1-7.]
Figure 36: The N-C terminal distance graph for GILGAVFTL.
Figure 37: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Ala5 (B) Ala5-Phe7 (C) Leu3-Phe7 (D) Ile2-Val6.
GILGFVKTL: This peptide was chosen from the study conducted by Parker et al. on the binding of different variants of GILGFVFTL. The sequence of this peptide was superimposed using AddProt on the binding X-ray conformation template of GILGFVFTL. In this variant, lysine replaces phenylalanine at the 7th position. This peptide is a very weak binder, with an experimental log half-life of 1.83, more than 1.5 times lower than that of GILGFVFTL. It had 4 stable and very distinct conformers that lasted for a total time of 57.50 ns, with an average N-C terminal distance of around 15-16Å.
In addition, distinct conformers are present when the distances between the centers of mass of the residues are plotted. In the case of Leu3-Phe5, the residues move farther away from each other. In the case of Phe5-Lys7, the residues are stable for 50-70ns, which corresponds to the stability of the peptide during that time. In addition to this, Leu3-Lys7 shows 4 distinct stable regions, which also correspond to the N-C distance, and similar stability is observed with Ile2-Val6.
Figure 38: Molecular structure of GILGFVKTL
Table 13: Characteristics of conformers in the molecular dynamic simulations of GILGFVKTL.
Conformation | Average Distance (Å) | Minimum Distance (Å) | Maximum Distance (Å) | Start Time (ns) | End Time (ns) | Net Time (ns) (End Time - Start Time)
1 15.47 10.48 20.29 23.00 49.25 26.25
2 15.24 11.25 20.03 60.25 68.75 8.50
3 17.38 12.12 20.76 116.50 123.75 7.25
4 14.62 6.96 19.59 143.00 158.50 15.50
Total Time(ns) 57.50
Figure 39: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGFVKTL.
[Plot: "Average distance between Gly1 (N-terminal) and Leu9 (C-Terminal)"; x-axis: Simulation Time (ns), 0-170; y-axis: Average Distance, 0-30; conformers labeled 1-4.]
Figure 40: The N-C terminal distance graph for GILGFVKTL.
Figure 41: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Phe5 (B) Phe5-Lys7 (C) Leu3-Lys7 (D) Ile2-Val6.
GILGFVFVL: In this peptide, valine replaces threonine, and the sequence was imposed on the X-ray binding conformation using AddProt. The experimental binding log half-life of this peptide is 2.32. This peptide was chosen from the Parker et al. study to examine the importance of the 8th position in the peptide, which may help the peptide to properly align its C terminus in the MHC molecule. Here, a polar amino acid residue was changed to a hydrophobic residue. There were four stable conformers that lasted for a total of 23 ns. Even though the stable conformers were found to have an N-C distance below the initial distance, the peptide spends a large amount of time at almost 18-24 Å.
Figure 45, which shows the distances between the centers of mass of Leu3, Phe5, and Phe7, indicates that the interactions are stable from 50-95ns for both Leu3-Phe5 and Phe5-Phe7. In addition to this, stability in interaction is also seen for Ile2-Val6 during this time. The interaction between Phe5-Phe7 is found to be maintained throughout the simulation.
Figure 42: Molecular Structure of GILGFVFVL
Table 14: Characteristics of conformers in the molecular dynamic simulations of GILGFVFVL.
Conformation | Average Distance (Å) | Minimum Distance (Å) | Maximum Distance (Å) | Start Time (ns) | End Time (ns) | Net Time (ns) (End Time - Start Time)
1 3.99 3.18 6.55 7.00 12.75 5.75
2 6.61 3.04 10.81 82.75 89.25 6.50
3 7.32 3.25 11.77 131.50 136.25 4.75
4 4.42 3.09 8.60 167.50 173.50 6.00
Total Time(ns) 23.00
Figure 43: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGFVFVL.
[Plot: "Average distance between Gly1 (N-terminal) and Leu9 (C-Terminal)"; x-axis: 0-170 (simulation time, ns); y-axis: 0-38; conformers labeled 1-4.]
Figure 44: The N-C terminal distance graph for GILGFVFVL.
Figure 45: The distances between the center of mass of the different residues in the peptide. (A) Leu3-Phe5 (B) Phe5-Phe7 (C) Leu3-Phe7 (D) Ile2-Val6.
GLFGGGFGV: This peptide has variations at the 2nd, 3rd, 5th, 6th, 8th, and 9th positions (I2L, L3F, F5G, V6G, T8G, and L9V; Table 9). The experimental log t1/2 of this peptide is 3.3, which is more than that of GILGFVFTL (3.0) [26]. There are five stable conformations with N-C terminal distances between 4 and 15Å. However, the graphs of the distances between the centers of mass of the peptide residues show hardly any stable conformations.
Figure 46: Molecular Structure of GLFGGGFGV
Table 15: Characteristics of conformers in the molecular dynamic simulations of GLFGGGFGV
Conformation | Average Distance (Å) | Minimum Distance (Å) | Maximum Distance (Å) | Start Time (ns) | End Time (ns) | Net Time (ns) (End Time - Start Time)
1 4.73 3.00 8.90 60.00 69.50 9.50
2 14.18 8.60 18.91 71.75 79.75 8.00
3 12.81 8.31 16.91 98.50 105.00 6.50
4 4.39 3.05 10.89 113.25 125.25 12.00
5 5.10 3.23 11.49 146.50 155.25 8.75
Total Time(ns) 44.75
Figure 47: Quantitative analysis of the N-C terminal distance to capture stable conformers of GLFGGGFGV.
[Plot: "Average distance between Gly1 (N-terminal) and Val9 (C-Terminal)"; x-axis: Simulation Time (ns), 0-170; y-axis: Average distance of conformers, 0-26; conformers labeled 1-5.]
Figure 48: The N-C terminal distance graph for GLFGGGFGV.
Figure 49: The distances between the center of mass of the different residues in the peptide. (A) Phe3-Gly5 (B) Gly5-Phe7 (C) Phe3-Phe7 (D) Leu2-Gly6.
GILGGGFTL: In this peptide, the middle three amino acids were varied to glycine to observe the characteristics of the peptide and see if these amino acids are important for stability. There were 9 conformers found, which lasted for a total of 65.75ns; however, most conformers persisted for only roughly 5-8 ns. A distinct conformer can be seen at 132-147 ns that lasted for 14.5 ns; however, it is almost in a folded state. From the N-C terminal distance graph, it can be seen that the conformers were found at an average distance of 15Å.
The graphs of the distances between the centers of mass of the 3rd, 5th, 7th, 2nd, and 6th residues show that there is no stable interaction between these residues and that the peptide is highly flexible, with a lot of random motion.
Figure 50: Molecular Structure of GILGGGFTL
Table 16: Characteristics of conformers in the molecular dynamics simulations of GILGGGFTL
Conformation | Average Distance (Å) | Minimum Distance (Å) | Maximum Distance (Å) | Start Time (ns) | End Time (ns) | Net Time (ns) (End Time - Start Time)
1 14.32 9.75 19.69 9.00 14.00 5.00
2 15.70 11.66 20.27 39.00 44.50 5.50
3 9.68 6.51 12.80 46.00 50.75 4.75
4 11.25 7.99 14.78 53.75 61.75 8.00
5 8.77 6.29 14.66 65.00 72.75 7.75
6 13.68 11.00 18.87 73.50 78.75 5.25
7 13.78 9.27 19.06 99.25 108.25 9.00
8 4.12 3.15 6.66 132.25 146.75 14.50
9 12.05 6.50 18.80 158.50 164.50 6.00
Total Time(ns) 65.75
Figure 51: Quantitative analysis of the N-C terminal distance to capture stable conformers of GILGGGFTL.
[Plot: "Average distance between Gly1 (N-terminal) and Leu9 (C-Terminal)"; x-axis: Simulation Time (ns), 0-170; y-axis: Average Distance of conformers, 0-26; conformers labeled 1-9.]
Figure 52: The N-C terminal distance graph for GILGGGFTL.
Figure 53: The distances between the centers of mass of different residues in the peptide. (A) Leu3-Gly5 (B) Gly5-Phe7 (C) Leu3-Phe7 (D) Ile2-Gly6.
GGGGGGGGG: As there are no side chains, the peptide moved freely and formed no conformers that persisted for more than 5 ns.
Figure 54: The N-C terminal distance for the peptide GGGGGGGGG.
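The idea of a conformer that persists for at least 5 ns can be illustrated with a simple scan over an N-C terminal distance trace. The criterion used in the thesis to identify conformers is not stated in this section, so the sketch below rests on an assumed one: a window counts as stable if the distance varies by no more than `max_spread` over at least `min_duration`. The function name, both parameters, and the toy traces are hypothetical.

```python
def stable_windows(times, distances, max_spread=2.0, min_duration=5.0):
    """Return (start, end) times of stretches where the distance stays
    within max_spread (Å) for at least min_duration (ns).

    Greedy scan: grow a window from each start point while the spread
    (max - min) stays within the limit, then keep it if it is long enough.
    """
    windows = []
    i, n = 0, len(times)
    while i < n:
        lo = hi = distances[i]
        j = i
        while j + 1 < n:
            new_lo = min(lo, distances[j + 1])
            new_hi = max(hi, distances[j + 1])
            if new_hi - new_lo > max_spread:
                break
            lo, hi = new_lo, new_hi
            j += 1
        if times[j] - times[i] >= min_duration:
            windows.append((times[i], times[j]))
            i = j + 1  # continue after the accepted window
        else:
            i += 1     # window too short; try the next start point
    return windows


# Toy traces sampled every 1 ns (hypothetical values, not simulation data).
times = [float(t) for t in range(20)]
structured = [10.0] * 8 + [2.0, 20.0, 2.0, 20.0, 2.0, 20.0] + [15.0] * 6
random_like = [2.0 if t % 2 == 0 else 20.0 for t in range(20)]

print(stable_windows(times, structured))   # [(0.0, 7.0), (14.0, 19.0)]
print(stable_windows(times, random_like))  # []
```

Under these assumptions, a trace that fluctuates continuously, as described for GGGGGGGGG, yields no windows at all, whereas a trace with plateaus yields one window per plateau.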
Chapter 5: Final Discussion
This thesis studied the molecular interactions between the CDR3 loops of the TCR and the peptide by generating various conformers of the CDR3 loops using computational tools and Python scripts. Chapter 3 discussed the results obtained from this method. The interactions calculated with the Python scripts showed that the CDR3 loops make many hydrophobic interactions with each other and internally within themselves, whereas they make far fewer hydrophobic interactions with the peptide. The calculation of the hydrophobic interactions within the CDR3 loops also showed that the loops tend to maximize their internal interactions.
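The distinction between inter- and intra-molecular hydrophobic interactions can be sketched as a distance-based contact count. This is only an illustration under stated assumptions, not the scripts listed in the appendix: the 4.5 Å cutoff, the function names, the data layout (residue number plus coordinates of a hydrophobic atom), and the toy coordinates are all hypothetical.

```python
import math
from itertools import combinations, product

# Each atom is (residue_number, (x, y, z)); only atoms already selected as
# hydrophobic are expected in the input lists (an assumption of this sketch).


def inter_contacts(group_a, group_b, cutoff=4.5):
    """Count atom pairs between two groups (e.g. CDR3 loop and peptide)
    that lie within the cutoff distance in Å."""
    return sum(
        1 for a, b in product(group_a, group_b)
        if math.dist(a[1], b[1]) <= cutoff
    )


def intra_contacts(group, cutoff=4.5):
    """Count atom pairs within one group that lie within the cutoff,
    skipping pairs that belong to the same residue."""
    return sum(
        1 for a, b in combinations(group, 2)
        if a[0] != b[0] and math.dist(a[1], b[1]) <= cutoff
    )


# Hypothetical toy coordinates in Å, not taken from any PDB structure.
loop = [
    (95, (0.0, 0.0, 0.0)),
    (95, (1.0, 0.0, 0.0)),
    (97, (3.0, 0.0, 0.0)),
    (99, (10.0, 0.0, 0.0)),
]
peptide = [(5, (3.0, 4.0, 0.0)), (6, (20.0, 0.0, 0.0))]

print(intra_contacts(loop))           # 2
print(inter_contacts(loop, peptide))  # 2
```

Comparing the two counts for each generated loop conformer is one way to express the observation above, that internal contacts within the loops dominate over contacts with the peptide.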
The solvation of the X-ray MHC-peptide-TCR complex, which allowed us to simulate the binding of the peptide to the TCR and MHC molecule in the presence of water, showed that the peptide mostly contacted the MHC molecule, with little to no interaction with the CDR3 loops. When the conformation of only the CDR3 loops was changed, the number of hydrophobic interactions increased. The calculation of the solvation parameters and the intermolecular interactions showed that the conformation and orientation of the CDR3β loop were very important for the interactions with the peptide, and any change in the CDR3β loop affected these interactions. The effect of the CDR3α loop was relatively small. Thus, these loops work in concert to produce conformers that fit the MHC molecule as well as the peptide attached to it.
So, rather than trying to bind to the peptide, the TCR effectively checks whether the CDR3 loops fit in the complex without any clash with the peptide. According to the studies conducted by Nairuti Mehta, the peptides also try to maximize internal hydrophobic interactions, with a hydrophobic cluster in the middle of the peptide. The peptide binds to the MHC molecule via its N and C termini, and certain anchor residues bind to the binding pockets of the MHC molecule. Thus, it can be said that both the peptide and the CDR3 loops try to minimize their interactions with each other and to fit within the complex. Biologically, this also helps the TCR to quickly scan a large number of MHC molecules to find the foreign antigen and activate the immune response.
The developed model can be used to further analyze the interactions between the CDR3 loops and the peptide. With more data, a machine learning model could also be built to predict the binding of a known peptide sequence to the TCR. This method could then be used to develop T-cell receptors against specific tumor antigens for cancer immunotherapy.
The molecular dynamics simulations provided insight into the behavior of the peptide alone in physiological solution. The results suggest that, for a peptide to bind to an MHC molecule, it needs sufficient internal stability to maintain a suitable N-C terminal distance, as well as the correct anchor residues. Even peptides that remain stable at the desired N-C terminal distance for long periods, with many conformations, do not bind well if the desired anchor residues are absent. Further investigation of other PDB structures and their variants will allow a better understanding of the characteristics that peptides need to bind to different MHC molecules, and may also reveal trends and patterns that help predict the binding of peptides to MHC molecules.
References
1. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. The Adaptive Immune System.
In: Molecular Biology of the Cell. 4th Edition. Garland Science; 2002. Accessed July 15,
2023. https://www.ncbi.nlm.nih.gov/books/NBK21070/
2. Mariuzza RA, Agnihotri P, Orban J. The structural basis of T-cell receptor (TCR) activation: An enduring enigma. J Biol Chem. 2020;295(4):914-925. doi:10.1016/S0021-9258(17)49904-2
3. Szeto C, Lobos CA, Nguyen AT, Gras S. TCR Recognition of Peptide–MHC-I: Rule Makers
and Breakers. Int J Mol Sci. 2020;22(1):68. doi:10.3390/ijms22010068
4. Garstka MA, Fish A, Celie PHN, et al. The first step of peptide selection in antigen
presentation by MHC class I molecules. Proc Natl Acad Sci. 2015;112(5):1505-1510.
doi:10.1073/pnas.1416543112
5. Bank RPD. RCSB PDB - 1AO7: COMPLEX BETWEEN HUMAN T-CELL RECEPTOR,
VIRAL PEPTIDE (TAX), AND HLA-A 0201. Accessed July 16, 2023.
https://www.rcsb.org/structure/1AO7
6. Batalia MA, Collins EJ. Peptide binding by class I and class II MHC molecules. Biopolymers. 1997;43(4):281-302. doi:10.1002/(SICI)1097-0282(1997)43:4<281::AID-BIP3>3.0.CO;2-R
7. Nikolich-Zugich J, Slifka MK, Messaoudi I. The many important facets of T-cell repertoire
diversity. Nat Rev Immunol. 2004;4(2):123-132. doi:10.1038/nri1292
8. Jerne NK. The somatic generation of immune recognition. Eur J Immunol. 2004;34(5):1234-1242. doi:10.1002/eji.200425132
9. Housset D, Malissen B. What do TCR–pMHC crystal structures teach us about MHC restriction and alloreactivity? Trends Immunol. 2003;24(8):429-437. doi:10.1016/S1471-4906(03)00180-7
10. Rudolph MG, Luz JG, Wilson IA. Structural and Thermodynamic Correlates of T Cell
Signaling. Annu Rev Biophys Biomol Struct. 2002;31(1):121-149.
doi:10.1146/annurev.biophys.31.082901.134423
11. Garcia KC, Adams EJ. How the T Cell Receptor Sees Antigen—A Structural View. Cell.
2005;122(3):333-336. doi:10.1016/j.cell.2005.07.015
12. Zhu Y, Huang C, Su M, et al. Characterization of amino acid residues of T-cell receptors
interacting with HLA-A*02-restricted antigen peptides. Ann Transl Med. 2021;9(6):495.
doi:10.21037/atm-21-835
13. Wu LC, Tuot DS, Lyons DS, Garcia KC, Davis MM. Two-step binding mechanism for T-cell
receptor recognition of peptide–MHC. Nature. 2002;418(6897):552-556.
doi:10.1038/nature00920
14. Scott DR, Borbulevych OY, Piepenbrink KH, Corcelli SA, Baker BM. Disparate Degrees of
Hypervariable Loop Flexibility Control T-Cell Receptor Cross-Reactivity, Specificity, and
Binding Mechanism. J Mol Biol. 2011;414(3):385-400. doi:10.1016/j.jmb.2011.10.006
15. Jurtz VI, Jessen LE, Bentzen AK, et al. NetTCR: Sequence-Based Prediction of TCR Binding
to Peptide-MHC Complexes Using Convolutional Neural Networks. Bioinformatics; 2018.
doi:10.1101/433706
16. Pham MDN, Nguyen TN, Tran LS, et al. epiTCR: a highly sensitive predictor for TCR–
peptide binding. Birol I, ed. Bioinformatics. 2023;39(5):btad284.
doi:10.1093/bioinformatics/btad284
17. Gielis S, Moris P, Bittremieux W, et al. Detection of Enriched T Cell Epitope Specificity in
Full T Cell Receptor Sequence Repertoires. Front Immunol. 2019;10:2820.
doi:10.3389/fimmu.2019.02820
18. As S, Av R, Da F, Aa L. TCR-Pred: A new web-application for prediction of epitope and
MHC specificity for CDR3 TCR sequences using molecular fragment descriptors.
Immunology. Published online March 16, 2023. doi:10.1111/imm.13641
19. Nm M, Y L, V P, et al. Prediction of Peptide and TCR CDR3 Loops in Formation of Class I
MHC-Peptide-TCR Complexes Using Molecular Models with Solvation. Methods Mol Biol
Clifton NJ. 2023;2673. doi:10.1007/978-1-0716-3239-0_19
20. Morningstar-Kywi N, Wang K, Asbell TR, et al. Prediction of Water Distributions and
Displacement at Protein–Ligand Interfaces. J Chem Inf Model. 2022;62(6):1489-1497.
doi:10.1021/acs.jcim.1c01266
21. Patel V. TCR-Peptide Code and Data. Published online July 30, 2023. Accessed August 2,
2023. https://github.com/vbpatel12/Codes-And-Data-For-TCR-Peptide-Binding-Analysis.git
22. Mehta, Nairuti. MS Thesis, University of Southern California. Published online 2023.
23. RDKit. Accessed July 16, 2023. https://www.rdkit.org/
24. YASARA - Yet Another Scientific Artificial Reality Application. Accessed July 16, 2023.
http://www.yasara.org/
25. The YANACONDA macro language. Accessed July 16, 2023.
http://www.yasara.org/yanaconda.htm
26. Parker KC, Bednarek MA, Coligan JE. Scheme for ranking potential HLA-A2 binding
peptides based on independent binding of individual peptide side-chains. J Immunol.
1994;152(1):163-175. doi:10.4049/jimmunol.152.1.163
27. Nakamura S, Ohmura R, Nakanishi I. An Interaction-based Approach for Affinity Prediction between Antigen Peptide and Human Leukocyte Antigen Using COMBINE Analysis. Chem-Bio Inform J. 2017;17(0):93-102. doi:10.1273/cbij.17.93
28. SYFPEITHI: Epitope prediction. Accessed July 16, 2023.
http://www.syfpeithi.de/bin/MHCServer.dll/EpitopePrediction.htm
29. Zhang C, Anderson A, DeLisi C. Structural principles that govern the peptide-binding motifs
of class I MHC molecules. J Mol Biol. 1998;281(5):929-947. doi:10.1006/jmbi.1998.1982
Appendix
The code and dataset used for this thesis, as well as the results, are available in a GitHub repository:
https://github.com/vbpatel12/Codes-And-Data-For-TCR-Peptide-Binding-Analysis.git
Appendix 1: Code for Extraction of CDR3 loops:
Appendix 2: Code for insertion of the Varied CDR3 loops into the PDB file and correction of
torsion:
Appendix 3: Code for Clash Distance:
Appendix 4: Code for Inter-Hydrophobic Interactions:
Appendix 5: Code for Intra-Hydrophobic Interactions:
Appendix 6: Code for Inter-Oxygen Clashes:
Appendix 7: Code for Intra-Oxygen Clashes:
Abstract
T-cells play an important role in the activation of the immune response, especially through adaptive immunity. The T-cell receptor scans cells for MHC molecules containing an antigen peptide. The peptide first binds to the MHC molecule, and the T-cell receptor then attaches to this complex by changing the conformations of its complementarity-determining regions (CDRs). The CDRs bind to both the MHC molecule and the peptide. The CDR3 region is mainly important for binding to the peptide, and understanding the mechanism of its binding can improve understanding of the immune response and help in the development of cancer immunotherapy. This thesis discusses a computational method developed for structural analysis of the CDR3 region of the T-cell receptor and the factors that play an important role in interactions between the peptide and the CDR3 regions. The results showed that CDR3 loops and peptides try to maximize internal hydrophobic contacts, with only a few hydrophobic interactions between the CDR3 loops and the peptide. Also, the CDR3β loops played a major role in interaction with the peptide. Both loops adjusted relative to each other to fit with the peptide-MHC complex. In addition, the structural stability of the peptide in physiological solution was studied using molecular dynamics. The molecular dynamics results suggest that the stability of the peptide, in addition to anchor residues, plays an important role in binding to the MHC molecule.
Asset Metadata
Creator
Patel, Vini (author)
Core Title
Structure-based computational analysis and prediction of TCR CDR3 loops in the TCR-peptide-MHC complex using solvation parameters and peptide molecular dynamics.
School
School of Pharmacy
Degree
Master of Science
Degree Program
Molecular Pharmacology and Toxicology
Degree Conferral Date
2023-08
Publication Date
08/09/2023
Defense Date
08/08/2023
Publisher
University of Southern California. Libraries (digital)
Tag
CDR3 loops,computational chemistry,data analysis,molecular dynamics simulations,molecular interactions,OAI-PMH Harvest,python,solvation,structural analysis,T-cell receptors,TCR-peptide-MHC complexes
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Haworth, Ian S. (committee chair), Alachkar, Houda (committee member), Mangul, Serghei (committee member)
Creator Email
vbpatel@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113296706
Unique identifier
UC113296706
Identifier
etd-PatelVini-12220.pdf (filename)
Legacy Identifier
etd-PatelVini-12220
Document Type
Thesis
Rights
Patel, Vini
Type
texts
Source
20230809-usctheses-batch-1081 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Repository Email
cisadmin@lib.usc.edu