Integration of KNIME and Molecular Docking for Evaluation of Tau Fibril Inhibitors
by
Zipeng Zheng
A Thesis Presented to the
FACULTY OF THE USC ALFRED E. MANN SCHOOL OF PHARMACY AND
PHARMACEUTICAL SCIENCE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(MOLECULAR PHARMACOLOGY AND TOXICOLOGY)
August 2024
ACKNOWLEDGEMENTS
First and foremost, I am profoundly grateful to my advisor, Dr. Ian S. Haworth for the invaluable
guidance, unwavering support, and insightful feedback. His expertise and encouragement have
been instrumental in shaping my research and bringing this thesis to fruition.
I also extend my heartfelt thanks to the members of my thesis committee, Dr. Paul Seidler and
Dr. Clay Wang, for their thoughtful suggestions and constructive critiques, which greatly
enhanced the quality of my work.
I am deeply appreciative of my colleague Ruchira Joshi who provided stimulating discussions
and collaboration on the project “KNIME workflows for applications in medicinal and
computational chemistry.”
I would like to thank Simulations Plus for providing a license for ADMETPredictor® through
the University+ program.
TABLE OF CONTENTS
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Alzheimer’s Disease and Tau Protein
1.2 Tau Fibrils and Stacked Ligand Characteristics
1.3 KNIME as a Computational Tool for Drug Discovery and Design
1.4 Molecular Docking Principles
1.5 Purpose of the Thesis
Chapter 2: AI with KNIME
2.1 Background
2.2 Tau Inhibition Data
2.3 Calculation of Molecular Descriptors
2.4 Development of KNIME Workflows
2.4.1 Data Cleaning
2.4.2 Decision Tree Analysis and Simple Machine Learning Model
2.4.3 Genetic Algorithm and Linear Correlation
2.4.4 Variable Threshold and Prediction of Activity
2.5 Results
2.6 Discussion
Chapter 3: Molecular Docking Methodology
3.1 Background
3.2 Ligand Preparation with KNIME
3.3 Molecular Docking using TMD
3.4 Evaluation of Ligand-Ligand Stacking Interaction with TMD
Chapter 4: Molecular Docking Results
4.1 Results from Docking with TMD
4.2 Results from Ligand Stacking Calculations
4.3 Discussion
Chapter 5: Improved AI Algorithm
5.1 Background
5.2 Modification of the AI Model with KNIME
5.3 Results
5.4 Discussion
References
Appendices
Appendix A: Python Code for the Orchestration of TMD
Appendix B: Python Code for Extracting Parameters from Stacking Calculation Output
Appendix C: Parameters Generated by TMD for all 295 inhibitors
LIST OF TABLES
Table 1. Identification of key features and feature value for enrichment of inhibitors in the
decision tree analysis for different puncta
Table 2. Impact of increasing puncta count on classification and prediction values using
linear sampling
Table 3. Ranking of molecular features' importance in the machine learning model,
comparing selection using a genetic algorithm with and without prior linear correlation
analysis for elimination
Table 4. Ranking of the importance of molecular features in the machine learning model,
comparing selection with a genetic algorithm both with and without prior elimination
through linear correlation analysis
Table 5. Configuration of the “Table Creator” Node in the Acid and Base Charge Adjustment
Branch
Table 6. 3D parameters extracted from the stacking calculation for selected inhibitors
generated by TMD
LIST OF FIGURES
Figure 1. Primary structure and functional domains of the full-length tau
Figure 2. Cryo-EM density models of paired helical filament (a) and straight filament (b)
conformations of tau fibrils
Figure 3. Sequence alignment of the four microtubule-binding repeats (R1–R4)
with the eight observed β-strand regions (a) and the visual representation of the
β-strand regions (b)
Figure 4. Cryo-EM density model of PET ligand GTP-1 bound to tau fibrils
Figure 5. Cryo-EM density model of EGCG bound to tau fibrils
Figure 6. General overview of the thesis
Figure 7. KNIME workflows for data cleaning, decision tree analysis, and a preliminary
machine learning model
Figure 8. KNIME workflows for genetic algorithm and linear correlation
Figure 9. KNIME workflow with input for variation of threshold and development
of machine learning model
Figure 10. Selection of the most acidic pKa among compounds with multiple acidic groups
Figure 11. Results from the decision tree analysis of 928 compounds as potential tau fibril
inhibitors, using puncta count threshold of 20,000
Figure 12. Counts of molecular features chosen by a genetic algorithm
Figure 13. Scatter plot of predicted versus experimental puncta counts illustrating tau fibril
formation in the presence of 278 potential inhibitors
Figure 14. Prediction of tau fibril inhibition using a machine learning model with
varying puncta thresholds
Figure 15. Overview of the molecular docking process with TMD
Figure 16. KNIME workflows for calculation of 3D coordinates using SMILES
Figure 17. Configuration of data_orig.in file used in the molecular docking process
Figure 18. Output generated from make_atoms.exe listing the three atoms of contact
between the ligand and tau fibril
Figure 19. Z-matrix output with renumbered atoms generated by makezmat.exe
Figure 20. Modified output consisting of default setting as shown in Figure 17 and
rotatable bonds identified by find_rot_bond.exe
Figure 21. Stacking calculation output of EGCG computed by stack_calc.exe of TMD
Figure 22. Stackable and docked poses of EGCG
Figure 23. KNIME workflow for introducing the generated parameters in the KNIME
AI model discussed in Chapter 2
Figure 24. Prediction of tau fibril inhibition using a machine learning model built with
general descriptors with varying average Z-score
Figure 25. Prediction of tau fibril inhibition using a machine learning model built with
specific descriptors generated using TMD with varying average Z-score
ABSTRACT
Alzheimer's disease is a progressive neurodegenerative disorder associated with the aggregation
of tau into intracellular neurofibrillary tangles. The structure and function of tau fibrils have been
extensively studied using modern techniques like Cryo-EM. Cryo-EM has revealed that inhibitors
of tau fibrils such as EGCG form highly stable stacks as they bind to tau fibrils. Other inhibitors of
tau fibril formation may exhibit similar stacking behaviors. In this work, a predictive approach is
presented using a machine learning model built with KNIME. Initially, general descriptors from a
polyphenol database are utilized to construct the AI model. To improve the model’s accuracy,
molecular modeling studies with TMD, a tool currently under development in our laboratory, were
employed to generate more specific descriptors such as stacking score for incorporation into the
KNIME AI model. This method may offer a novel approach to understanding the binding and
inhibitory nature of inhibitors of tau fibrils.
CHAPTER 1: INTRODUCTION
1.1 Alzheimer's Disease and Tau Protein
Alzheimer’s disease (AD) is a progressive neurodegenerative disease that affects cognitive
processes, such as memory, language, and thinking.1 It was first characterized by the German
psychiatrist Alois Alzheimer in the early 1900s.2 As the most common consequence of AD,
dementia affects around 6.7 million Americans age 65 and older, and this number is projected to
reach 13.8 million by 2060 unless new medicine is developed.1 While AD has complex pathophysiology,
the two most common hallmark features of AD are extracellular plaques of amyloid-beta (Aβ)
peptides and intracellular neurofibrillary tangles (NFTs) of tubulin associated unit (Tau) protein
aggregates.3,4 Despite the significance of both Aβ and tau as markers of Alzheimer's disease, this
work centers on tau. Discovered by Marc Kirschner and his team in 1975, tau is a
microtubule-associated protein (MAP) crucial for microtubule stability and is highly abundant in
the axons of neurons.4–6 In humans, the tau gene is located on chromosome 17 at band position
17q21.7 Tau has four functional domains: the N-terminal projection domain, proline-rich regions,
microtubule-binding region, and C-terminal domain (Figure 1).8 Alternative splicing of Tau can
generate six isoforms, ranging from 352 to 441 residues.9 Isoforms are formed by alternative
inserts such as 0N, 1N, or 2N in the N-terminal projection domain and binding repeats such as R1,
R2, R3, and R4 in the C-terminal domain. As a model to exemplify tau primary structure, the
full-length tau or htau40 contains the 2N inserts (N1 and N2) in the N-terminal projection domain
and all 4 binding repeats (R1 to R4) in the C-terminal domain, hence 2N4R (Figure 1).10 Among the
441 residues of 2N4R, there are many serine and threonine residues (80) but, compared to other
proteins, relatively few hydrophobic amino acids: alanine (34), valine (25), isoleucine (15),
leucine (20), methionine (5), phenylalanine (3), tyrosine (5), and no tryptophan (0).8,9 Tau's
amino acid composition is unusually hydrophilic, which prevents it from forming the compact folded
structure seen in most cytosolic proteins. Studies using circular dichroism, NMR, and small angle
X-ray scattering (SAXS) have shown that Tau is "natively unfolded" or "intrinsically
disordered".9–11 In other words, tau is highly flexible and mobile, with only a small amount of
transient secondary structures like alpha-helix, beta-strand, and polyproline II helix. The
N-terminal region has an isoelectric point (pI) of 3.8, the proline-rich domain has a pI of 11.4,
and the C-terminal region has a pI of 10.8.9 This means tau is a dipole with two oppositely charged
domains, and this asymmetry of charges is crucial for interactions with microtubules and other
partners, as well as for internal folding and Tau aggregation.9
Figure 1. Primary structure and functional domains of the full-length tau. Figure is from ref 10.
1.2 Tau Fibrils and Stacked Ligand Characteristics
Modern techniques like cryo-electron microscopy (cryo-EM) have revolutionized our understanding
of the structure of tau fibrils. Tau fibrils in AD can adopt two shapes: paired helical filament (PHF)
and straight filament (SF) (Figure 2).12 The core of PHFs and SFs consists of eight β-sheets (β1-8)
that form a C-shaped structure along the length of the protofilament. This structure includes a
β-helix region with three β-sheets arranged in a triangle and two cross-β regions where pairs of
β-sheets pack anti-parallel to each other, as shown in Figure 3.12 The ultrastructural differences
between PHFs and SFs arise from variations in how the two protofilaments interact laterally. In
PHFs, the protofilaments are identical and related by helical symmetry, while in SFs, the
protofilaments pack asymmetrically.
In addition to structure elucidation of tau fibrils, cryo-EM studies have provided insights
into how inhibitors bind to their target structures at the molecular level. Two examples of such
inhibitors discussed here are tau-selective positron-emission tomography (PET) ligands and
epigallocatechin gallate (EGCG). GTP-1 (Genentech Tau Probe 1) is a second-generation tau PET
tracer with a Kd of 11 nM (Figure 4a).13 Its binding as a stack to tau fibrils is illustrated in Figures
4b to 4e. The structure shows extra density where GTP-1 binds to a solvent-exposed C-shaped cleft
formed by β6 and β7 strands (Figure 4b and c).14 This density is the same in both protofilaments,
indicating that GTP-1 binds equally to both.14 Each GTP-1 molecule interacts with three β-strands:
it contacts Gln351 in strand 1, Gln351 and Lys353 in strand 2, and Ile360 in strand 3, as well as
the backbone between Gln351 and Lys353 in strands 1 and 2 (Figure 4d and e).14 The piperidine
ring and fluoroethyl tail of GTP-1 are aligned with the filament, interacting with the sidechain and
backbone of Gln351 in both strands. Additionally, the GTP-1 heterocycles are positioned at an
ideal distance for π-π stacking and can form an extended assembly supported by the tau filament.14
Specifically, the primary contribution of the aromatic-aromatic interaction in GTP-1 comes from
the rigid heteroaromatic region (pyrimido[1,2-a]benzimidazole).14 This unique stable π-π stacking
is also observed in tau fibrils incubated with (–)-epigallocatechin-3-gallate (EGCG), a green tea
polyphenol known to disaggregate amyloid filaments in vitro (Figure 5a).15,16 As shown in Figure
5b and c, EGCG is bound to sites different from GTP-1.16 Site 1 is in a cleft between two tau
protofilaments and surrounded by polar residues Asn327, His329, Glu338, and Lys340 and shows
the strongest density, as the three-lobe shape observed matches EGCG’s structure.16 Sites 2 and 3,
near the β-helix region and close to Lys321 and Lys317, have smaller densities and lack the
three-lobe pattern, suggesting they are not EGCG.16 Both studies highlight a crucial aspect: ligand
stacking through π-π interactions appears to play a significant role in inhibition.
Figure 2. Cryo-EM density models of paired helical filament (a) and straight filament (b)
conformations of tau fibrils. Figure is from ref 12.
Figure 3. Sequence alignment of the four microtubule-binding repeats (R1–R4) with the
eight observed β-strand regions (a) and the visual representation of the β-strand regions
(b). Figure is from ref 12.
Figure 4. Cryo-EM density model of PET ligand GTP-1 bound to tau fibrils. (a) molecular
structure of GTP-1; (b) cryo-EM map of tau fibrils incubated with GTP-1, showing extra density
for GTP-1 indicated by white triangles; (c) refined atomic model of tau fibrils bound to GTP-1;
(d) close view of GTP-1 interactions with tau fibrils; (e) side view of the tau fibril bound to
GTP-1. Figure is from ref 13.
Figure 5. Cryo-EM density model of EGCG bound to tau fibrils. (a) molecular structure of EGCG,
which includes a benzenediol ring (A) attached to a tetrahydropyran moiety (C), which connects to
a galloyl ring (D) and a pyrogallol ring (B); (b) tau fibril structure after 3 hours of incubation
with EGCG, showing three new density regions (Sites 1–3) with EGCG addition; (c) detailed top and
side views of the interactions formed between EGCG and Site 1 of tau fibril. Figure is from ref 16.
1.3 KNIME as a Computational Tool for Drug Discovery and Design
Artificial intelligence (AI) is increasingly recognized for its potential to transform drug
discovery in the pharmaceutical industry.17 Traditionally, this process heavily relies on methods
such as trial-and-error experimentation and high-throughput screening, and it can be slow, costly,
and complex.17 AI techniques like machine learning (ML) and natural language processing offer
the opportunity to accelerate and improve drug discovery by enabling more efficient and accurate
analysis of large amounts of data.18 AI offers a powerful tool for evaluating Quantitative
Structure-Activity Relationship (QSAR), which involves using molecular descriptors to construct
computational models that quantify and predict various properties associated with ligand-protein
binding.19 However, this task is inherently challenging as it requires both the expertise of
laboratory medicinal chemists and advanced computational methodologies. One viable solution
lies in the use of analytics software such as the KNIME platform (University of Konstanz,
Konstanz Information Miner). KNIME has been successfully employed to develop workflows and
pipelines for tasks such as identifying drug scaffolds, retrieving PDB files, conducting molecular
filtering, virtual drug screening, and designing chemical libraries.20–23 Moreover, KNIME has been
utilized to standardize chemical structures, ensuring they are optimized for QSAR modeling.24
Another significant advantage of KNIME is its minimal requirement for coding knowledge,
making it accessible to a wide range of users. Its flexibility allows for the creation of customized
workflows tailored to specific needs in drug discovery and QSAR modeling. Additionally, KNIME
Hub provides a convenient platform where users can easily access and download workflows
developed by others.
1.4 Molecular Docking Principles
Computer-aided drug design (CADD) encompasses various computational strategies
aimed at discovering, designing, and developing new therapeutic agents.25 It plays a critical role
in enhancing active ligands, identifying new drugs, and gaining insights into biological processes
at the molecular level.25 Furthermore, the application areas of CADD methods are expanding due
to the growth in biological and chemical data, increased data storage capacity, the identification of
more drug targets, and advancements in data processing capabilities.26 CADD methods can be
categorized into target-based and ligand-based approaches, depending on the type of data
available.27 In target-based drug design, the goal is to create potential active compounds using
target structures.25 One example of such a method is molecular docking, which is a structure-based
computational method that predicts the binding mode and affinity between small molecules and
macromolecules by simulating their interactions.25 Examples of software that perform molecular
docking include AutoDock, AutoDock Vina, GOLD, Glide, MOE, ICM, and FlexX.25
The docking process consists of two main steps: first, determining the conformation,
position, and orientation of the ligand within the binding site (known as the pose), and second,
evaluating the binding affinity. The binding site is typically identified before ligand poses are
generated. Three-dimensional structures of protein-ligand complexes are commonly used for this
purpose, and they are publicly available in databases like the Protein Data Bank (PDB).25 During
the pose generation stage, the structural parameters of ligands, including torsional (dihedral),
translational, and rotational degrees of freedom, are systematically adjusted.28 Conformational
search algorithms accomplish this by utilizing systematic and stochastic search methods.
Systematic search methods methodically change the structural parameters, gradually altering the
ligand's shape.28 As a result, all possible combinations of the structural parameters will be
explored. In contrast, stochastic methods randomly adjust these parameters to explore different
conformations.28 The scoring function's role is to distinguish correct poses from incorrect ones,
or active binders from inactive compounds, within a reasonable computation time.29 However,
scoring functions involve estimation rather than direct calculation of the binding affinity between
the protein and ligand, relying on various assumptions and simplifications.29 These functions are
typically categorized as force-field-based, empirical, or knowledge-based scoring functions.29
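The contrast between systematic and stochastic conformational search can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: `toy_score` is an invented stand-in for a scoring function (it is not a force field and is not taken from TMD or any docking package), and the ligand is reduced to a tuple of torsion angles in degrees.

```python
import itertools
import math
import random


def toy_score(torsions):
    """Hypothetical score (lower is better) with its minimum at 60 degrees per torsion."""
    return sum(1.0 - math.cos(math.radians(angle - 60.0)) for angle in torsions)


def systematic_search(n_torsions, step_deg):
    """Enumerate every combination of torsion angles on a regular grid."""
    grid = range(0, 360, step_deg)
    return min(itertools.product(grid, repeat=n_torsions), key=toy_score)


def stochastic_search(n_torsions, n_trials, seed=0):
    """Draw random torsion combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        candidate = tuple(rng.uniform(0.0, 360.0) for _ in range(n_torsions))
        if best is None or toy_score(candidate) < toy_score(best):
            best = candidate
    return best


if __name__ == "__main__":
    print("systematic:", systematic_search(n_torsions=2, step_deg=30))
    print("stochastic:", stochastic_search(n_torsions=2, n_trials=200))
```

The systematic search evaluates (360 / step) ** n_torsions poses, so its cost grows exponentially with the number of rotatable bonds, whereas the stochastic search evaluates a fixed number of random poses and is not guaranteed to find the best one.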
1.5 Purpose of the Thesis
Figure 6 illustrates the overall structure of the thesis. Chapter 2 highlights the use of
KNIME to create a machine learning model using general molecular descriptors for predicting
the bioactivity of a polyphenol library against tau fibrils. While general molecular descriptors
can be used to build machine learning models, the resulting model lacks specificity. Therefore,
molecular docking studies of tau fibrils with ligands were conducted. Additionally, more specific
descriptors relevant to the ligand stacking observed in this class of inhibitors were generated. The
methodology and findings from these docking studies are discussed in Chapters 3 and 4.
Finally, descriptors generated using molecular docking techniques were used to improve the
machine learning model, as discussed in Chapter 5.
Figure 6. General overview of the thesis. The red dotted box highlights the development of the
AI model in KNIME, covered in Chapter 2. The green dotted box outlines the creation of PDB
files for the molecular docking process, detailed in Chapter 3. The blue dotted box describes the
molecular docking process and parameter generation using TMD, also discussed in Chapter 3.
The curved arrow shows the transfer of generated parameters into the AI model, which is
explored in Chapter 5.
CHAPTER 2: AI WITH KNIME
2.1 Background
The purpose of this chapter is to develop a decision tree model and machine learning
algorithm based on molecular descriptors and physicochemical properties of 930 potential
inhibitors of Tau fibrillation. To achieve this, physicochemical properties and molecular
descriptors of these inhibitors were first calculated using ADMET Predictor. Subsequently, the
calculation output was used as input for data analysis, machine learning, and computational
chemistry in KNIME. The work in this chapter has been published and was performed in
collaboration with Ruchira Joshi, as noted in several sections below.30
2.2 Tau Inhibition Data
Tau inhibition data of the library of 930 polyphenol inhibitors were provided by Dr. Paul
Seidler’s research group. A collection of 930 molecules was obtained from MedChemExpress
(Cat.No. HY-L057) along with their SMILES strings.
2.3 Calculation of Molecular Descriptors
The canonical SMILES strings of the 930 compounds from the dataset were analyzed using
ADMET Predictor 10.0 (Simulations Plus, Lancaster, CA). This software was used to compute
physicochemical properties and molecular descriptors, collectively referred to as molecular
features. However, one compound, tannic acid, was excluded from analysis because it contains 25
ionizable groups, exceeding the software's limit of 20 groups. The molecular features calculated
for the 929 compounds were exported from ADMET Predictor as an Excel spreadsheet.
Subsequently, this spreadsheet served as input for further data analysis, machine learning, and
computational chemistry tasks in KNIME Analytics version 4.7.2.
2.4 Development of KNIME Workflows
Three distinct KNIME workflows were developed for specific purposes. The first workflow,
described in section 2.4.1, curated an Excel sheet containing molecular features. It then
processed the cleaned data through a decision tree analysis and a simple machine learning
model, as described in section 2.4.2. Sections 2.4.1 and 2.4.2 make up the workflow shown in
Figure 7. The second workflow, described in section 2.4.3, used genetic algorithms and linear
correlation techniques (Figure 8) and is detailed in Joshi's thesis. Lastly, the third workflow,
described in section 2.4.4, is an enhanced version of the first workflow. It incorporates improved
features identified through genetic algorithms and linear correlation to predict inhibition (Figure 9).
2.4.1 Data Cleaning
The data cleaning process, illustrated in Figure 7B (part of Figure 7A), involved a
systematic approach to refine the dataset for analysis. Initially, two "Column Filter" nodes were
used to eliminate unnecessary columns from the dataset. The first node removed columns
identified as not essential, while the second node further pruned columns after an initial assessment
of their usefulness. Subsequently, three "Row Filter" nodes were applied to exclude specific rows:
those containing descriptions of molecular features, entries related to lysipressin due to missing
puncta data, and the entry for tannic acid, which exceeded ionizable group limits and was excluded
earlier. To enhance data quality, the multiple pKa values in the S+acidic_pKa and S+basic_pKa
columns were consolidated to retain only the most acidic and most basic values using a series of
"String Manipulation" nodes (Figure 7C). For example, a substring extraction method extracted
the value after the first semicolon in the pKa string, followed by formatting to remove any spaces
and renaming to "most acidic pKa" or "most basic pKa." A "Rule Engine" node was utilized to
convert any "None" values in these columns to a standardized value (e.g., 14.0). The workflows
handling acidic and basic pKa values were integrated using a "Joiner Node" based on matching
row IDs. After integration, original pKa columns were removed using another "Column Filter"
node to streamline the dataset. Subsequently, the refined dataset from pKa processing was joined
with the filtered dataset using a second "Joiner Node." A "Constant Value Column Filter" ensured
columns with only zero values were excluded. To finalize data preparation for subsequent analysis,
"Column Auto Type Cast" nodes were employed to ensure data types were correctly interpreted,
followed by resetting of row IDs using a "RowID" node. This meticulous cleaning and preparation
process resulted in a high-quality dataset containing 65 molecular features derived from ADMET
Predictor, optimized for subsequent processes.
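The pKa consolidation step can be sketched in Python as follows. This is a sketch under stated assumptions, not the KNIME implementation: it assumes each cell holds semicolon-separated numeric pKa values or the text "None", and it replaces the substring extraction used in the "String Manipulation" nodes with a minimum/maximum over the parsed values. The function names are hypothetical, and the default of 14.0 mirrors the example value given above.

```python
from typing import List, Optional

NO_PKA_DEFAULT = 14.0  # example placeholder from the text for compounds with no reported pKa


def parse_pka_values(raw: Optional[str]) -> List[float]:
    """Split an assumed semicolon-separated pKa string into floats; 'None' or empty gives []."""
    if raw is None:
        return []
    values = []
    for token in raw.split(";"):
        token = token.strip()  # drop the spaces around each value
        if not token or token.lower() == "none":
            continue
        values.append(float(token))
    return values


def most_acidic_pka(raw: Optional[str], default: float = NO_PKA_DEFAULT) -> float:
    """Return the lowest pKa in the cell, or the default when none is reported."""
    values = parse_pka_values(raw)
    return min(values) if values else default


def most_basic_pka(raw: Optional[str], default: float = NO_PKA_DEFAULT) -> float:
    """Return the highest pKa in the cell, or the default when none is reported."""
    values = parse_pka_values(raw)
    return max(values) if values else default


if __name__ == "__main__":
    print(most_acidic_pka("9.8; 7.4; 11.2"))
    print(most_basic_pka("None"))
```

Taking the minimum or maximum avoids depending on the order in which the values are listed, at the cost of departing from the position-based extraction described for the workflow.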
2.4.2 Decision Tree Analysis and Simple Machine Learning Model
Following the data cleaning process, a simple decision tree analysis was integrated into the
workflow using three additional nodes (Figure 7A). Specifically, a "Rule Engine" node was
introduced with the following expression: "$puncta$ < n => 'Inhibitor', $puncta$ >= n => 'Non-inhibitor'," where 'n' represents the threshold number of puncta. This generated a new column
labeled "puncta_classification," which categorized each compound as either an "Inhibitor" or
"Non-Inhibitor" based on its puncta count. Subsequently, a "Column Filter" node was employed
to remove the original puncta column from the dataset. The newly created "puncta_classification"
column was designated as the class column for classification purposes, with all other settings left
unchanged. Following the data curating process, a machine learning model was constructed using
three key nodes: "Partitioning," "Gradient Boosted Tree Learner," and "Gradient Boosted Trees
Predictor" (Figure 7A). Initially, the "Partitioning" node divided the dataset into subsets using a
linear selection method, allocating 70% of the data to the training set. Subsequently, the "Gradient
Boosted Tree Learner" node utilized the "puncta_classification" column as the target variable to
train a gradient boosted tree model. This model was designed to predict outcomes based on the
characteristics learned from the training data. The trained model was then applied to the remaining
30% of the dataset using the "Gradient Boosted Trees Predictor" node, which generated predictions
for each instance. To evaluate the model's performance, the "Scorer" node was employed to
categorize the predictions into a confusion matrix. This matrix provides a clear overview of the
model's accuracy in classifying compounds as either "Inhibitor" or "Non-Inhibitor" based on their
puncta classification.
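The labeling, partitioning, and scoring steps around the model can be sketched in plain Python. The gradient boosted tree learner itself is not reproduced here. The function names are hypothetical, and `linear_partition` only approximates linear sampling by picking evenly spaced row indices, so the exact rows selected by the KNIME "Partitioning" node may differ.

```python
from collections import Counter


def classify(puncta, threshold):
    """Mirror the Rule Engine expression: a count below the threshold is an 'Inhibitor'."""
    return "Inhibitor" if puncta < threshold else "Non-inhibitor"


def linear_partition(rows, train_fraction=0.7):
    """Send evenly spaced rows to the training set and the remaining rows to the test set."""
    n = len(rows)
    k = int(round(n * train_fraction))
    if k <= 0:
        return [], list(rows)
    if k == 1:
        picked = {0}
    else:
        # Integer arithmetic keeps the k picked indices distinct and spread over 0..n-1.
        picked = {(i * (n - 1)) // (k - 1) for i in range(k)}
    train = [row for i, row in enumerate(rows) if i in picked]
    test = [row for i, row in enumerate(rows) if i not in picked]
    return train, test


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs, as a confusion matrix tabulates them."""
    return Counter(zip(actual, predicted))


if __name__ == "__main__":
    puncta_counts = [1200, 25000, 18000, 40000, 500, 21000, 9000, 30000, 15000, 60000]
    labels = [classify(p, threshold=20000) for p in puncta_counts]
    train, test = linear_partition(labels)
    print(len(train), "training rows;", len(test), "test rows")
```

The puncta counts in the usage example are made-up values; only the threshold of 20,000 corresponds to a value used in the thesis.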
Figure 7. KNIME workflows for data cleaning, decision tree analysis, and a preliminary
machine learning model. (a) Overview of the main workflow; (b) workflow for data cleaning
procedure, including manipulation of pKa values; (c) workflow for extracting the most acidic pKa
from compounds with multiple acidic groups. Output of the decision tree is shown in Figure 11.
2.4.3 Genetic Algorithm and Linear Correlation
A significant limitation of the machine learning approach described in section 2.4.2 is its
inability to clearly identify the key molecular features that contribute to the model's predictions.
To overcome this challenge and pinpoint the crucial features essential for predictive modeling, a
genetic algorithm workflow was implemented (Figure 8). In addition, to mitigate potential
negative influences from correlations among molecular features in the results of the genetic
algorithm, a linear correlation routine was implemented (Figure 8). This approach aims to integrate
multiple correlated features into a single representative feature, thereby reducing the overall
number of molecular descriptors used in the analysis. Both workflows utilized the curated data
described in section 2.4.1 as input, and details of the development of these two workflows are
described in Joshi’s thesis.
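The idea of reducing correlated descriptors to a single representative can be sketched in a few lines of Python. This is not the routine used in the workflow (its details are in Joshi's thesis); it is a generic greedy Pearson filter, and the cutoff of 0.9, the rule of keeping the first feature encountered, and the toy feature names are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return sxy / sqrt(sxx * syy)


def select_representatives(features, cutoff=0.9):
    """Keep a feature only if it is not strongly correlated with one already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor columns; feat_b is an exact multiple of feat_a.
toy = {
    "feat_a": [1, 2, 3, 4, 5],
    "feat_b": [2, 4, 6, 8, 10],
    "feat_c": [5, 1, 4, 2, 3],
}
print(select_representatives(toy))
```

Which member of a correlated group is retained depends on the order of the columns in this sketch, so the choice of representative would need to be made deliberately in practice.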
Figure 8. KNIME workflows for genetic algorithm and linear correlation. Overall process of
the KNIME workflows that perform genetic algorithm and linear correlation (a), feature list
generation (b), application of a genetic algorithm (c), feature counting (d), and linear correlation
routine (e) prior to the genetic algorithm. Output is shown in Figure 12.
2.4.4 Variable Threshold and Prediction of Activity
The molecular features identified in section 2.4.3 were subsequently utilized to construct a
new machine learning model within the workflow depicted in Figure 9. This workflow was based
upon the simple machine learning model workflow described in section 2.4.2 but also integrated a method
for adjusting the puncta threshold, allowing for the assessment of predicted thresholds for each
molecule in the test set. This approach facilitated a more nuanced evaluation of how varying puncta
thresholds impacted the model's predictions.
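The threshold variation described above amounts to relabelling each compound at a series of puncta thresholds. A minimal sketch of this scan for the experimental labels is given below; the scan range of 10,000 to 30,000 is an assumption, the function name is hypothetical, and the model retraining that the workflow performs at each threshold is not reproduced.

```python
def first_inhibitor_threshold(puncta_count, thresholds):
    """Return the lowest threshold at which a compound counts as an inhibitor.

    A compound is an inhibitor at threshold t when its puncta count is below t.
    Returns None if the compound is a non-inhibitor at every threshold scanned.
    """
    for t in sorted(thresholds):
        if puncta_count < t:
            return t
    return None


# Assumed scan range, in steps of 1,000 puncta.
thresholds = range(10_000, 30_001, 1_000)

print(first_inhibitor_threshold(12_500, thresholds))
print(first_inhibitor_threshold(35_000, thresholds))
```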
Figure 9. KNIME workflow with input for variation of threshold and development of
machine learning model. “Figure 2 outputs” here refers to Figure 8 outputs of the thesis. Data
outputs are shown in Figure 14.
2.5 Results
In the data cleaning process illustrated in Figure 7B, a key objective was to transform
multiple pKa values, initially stored in a single column from ADMET Predictor, into the most
acidic and/or basic pKa value for each molecule, as detailed in Figure 7C. This conversion of pKa
data was demonstrated for 10 compounds in Figure 10. Following this transformation, columns
(molecular features) containing only zero values across all compounds were removed. Subsequent
data formatting steps in Figure 7B led to the creation of a refined dataset comprising 928
compounds.
Figure 10. Selection of the most acidic pKa among compounds with multiple acidic groups.
This result is obtained from the workflow in Figure 7C.
In the decision tree analysis presented in Figure 7A, the goal was to identify molecular
features that distinguish compounds inhibiting tau fibril formation from those that do not, using a
puncta threshold of 20,000 (where inhibitors had a puncta count <20,000). The distribution of the
928 compounds showed 262 (28.2%) as inhibitors and 666 (71.8%) as non-inhibitors (Figure 11,
top panel). Among the molecular features examined, the aromatic hydroxyl group count
(ArHdrxl_-OH) emerged as significantly associated with inhibition of tau fibril formation (Figure
11, bottom panel). Specifically, compounds containing at least 5 aromatic hydroxyl groups
exhibited a notably higher proportion of inhibitors—78.6% compared to the overall 28.2%
inhibitor ratio across all 928 compounds. This finding was consistent across various puncta count
thresholds used to define inhibitors (Table 1), underscoring the potential importance of ArHdrxl_-
OH as a predictive feature in models targeting tau fibril inhibition.
Figure 11. Results from the decision tree analysis of 928 compounds as potential tau fibril
inhibitors, using puncta count threshold of 20,000.
Table 1. Identification of key features and feature value for enrichment of inhibitors in the
decision tree analysis for different puncta counts.
Puncta count   Key feature    Feature value   Inhibitors in whole    Inhibitors among molecules
                                              population, n (%)      with feature, n (%)
10,000         ArHdrxl_-OH    >4.5            28 (3.0%)              21 (37.5%)
12,500         ArHdrxl_-OH    >4.5            62 (6.7%)              27 (48.2%)
15,000         ArHdrxl_-OH    >4.5            116 (12.5%)            31 (55.4%)
17,500         ArHdrxl_-OH    >4.5            188 (20.3%)            39 (69.6%)
20,000         ArHdrxl_-OH    >4.5            262 (28.2%)            44 (78.6%)
22,500         ArHdrxl_-OH    >4.5            372 (40.1%)            49 (87.5%)
25,000         N_IoAcAt       >3.5            465 (50.1%)            121 (73.8%)
27,500         ArHdrxl_-OH    >3.5            567 (61.1%)            126 (86.3%)
30,000         ArHdrxl_-OH    >3.5            673 (72.5%)            134 (91.8%)
In the simple machine learning model developed according to Figure 7A, where the dataset
of 928 compounds was partitioned into a training set (649 compounds, 70%) and a test set (279
compounds, 30%) using linear sampling, the threshold puncta count defining an inhibitor was
manually specified within the workflow (Table 2). At a threshold of 20,000 puncta, the model
achieved an accuracy of 70%, demonstrating good specificity but lower sensitivity.
Table 2. Impact of increasing puncta count on classification and prediction values using
linear sampling.
Puncta count   TN    FP   FN   TP    Accuracy   Sensitivity (a)   Specificity (b)
10,000         265   7    4    3     0.961      0.429             0.974
15,000         231   15   25   8     0.857      0.242             0.939
20,000         179   23   59   18    0.706      0.234             0.886
25,000         84    57   64   74    0.566      0.536             0.596
30,000         8     71   22   178   0.667      0.890             0.101
TN: true negative; FP: false positive; FN: false negative; TP: true positive.
(a) Sensitivity = TP/(TP+FN)
(b) Specificity = TN/(TN+FP)
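The accuracy, sensitivity, and specificity values in Table 2 follow directly from the confusion matrix counts. The short Python sketch below recomputes them for the 20,000-puncta row; the function name is illustrative and the code is not part of the KNIME workflow.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute accuracy, sensitivity, and specificity from confusion matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of true inhibitors recovered
        "specificity": tn / (tn + fp),   # fraction of non-inhibitors rejected
    }


# Counts from the 20,000-puncta row of Table 2.
row_20000 = classification_metrics(tn=179, fp=23, fn=59, tp=18)
for name, value in row_20000.items():
    print(f"{name}: {value:.3f}")
```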
In the output generated by the genetic algorithm, 40 features were selected more than 218
times, while 25 features appeared less frequently (Figure 12, blue bars). Notably, HBD (number
of hydrogen bond donors) emerged as the most frequently chosen feature, selected 983 times,
indicating its strong relevance in the model. Conversely, ArHdrxl_-OH (number of aromatic
hydroxyl groups) was identified as the 10th most selected feature, appearing 377 times (Table 3).
While this frequency indicates that ArHdrxl_-OH was prioritized more than randomly expected
(218 times), it contrasts with its portrayal as the most significant feature in the decision tree
analysis (Figure 11). However, after implementing the linear correlation method, the key finding
was the prominent re-emergence of ArHdrxl_-OH (number of aromatic hydroxyl groups) as the
most influential molecular feature. This feature appeared in 794 out of 889 models, demonstrating
its dominant role in predicting tau fibril inhibition (Figure 12). This outcome was consistent with
the results obtained from the decision tree analysis, confirming that the correlation routine
effectively resolved multicollinearity issues among the features. In the refined analysis post-linear
correlation, all top 10 features appeared more than 300 times in the genetic algorithm (Figure 12),
indicating a substantive change in feature importance compared to the initial analysis without
linear correlation (Table 4).
Figure 12. Counts of molecular features chosen by a genetic algorithm. The counts are shown
with (red) and without (blue) preliminary linear correlation analysis to reduce multicollinearity.
The features are ordered from left to right based on their significance following the linear
correlation analysis.
Table 3. Ranking of molecular features' importance in the machine learning model,
comparing selection using a genetic algorithm with and without prior linear correlation
analysis for elimination.
                 Without linear correlation   With linear correlation
Features         Rank    Count                Rank          Count
HBD              1       983                  Eliminated    Eliminated
EqualChi         2       642                  2             406
F_DbleB          3       620                  8             334
HBA              4       573                  Eliminated    Eliminated
N_IoAcAt         5       507                  Eliminated    Eliminated
HBDo             6       462                  Eliminated    Eliminated
EqualEta         7       443                  Eliminated    Eliminated
PriAmAli_-NH2    8       441                  Eliminated    Eliminated
HBDH             9       405                  Eliminated    Eliminated
ArHdrxl_-OH      10      377                  1             794
Table 4. Ranking of the importance of molecular features in the machine learning model,
comparing selection with a genetic algorithm both with and without prior elimination
through linear correlation analysis.
                 With linear correlation   Without linear correlation
Features         Rank    Count             Rank    Count
ArHdrxl_-OH      1       794               10      377
EqualChi         2       406               2       642
N_Carbon         3       398               24      287
N_Pisyms         4       370               19      324
F_SgleB          5       344               22      293
IHB              6       342               59      144
FormalQ          7       335               44      215
F_DbleB          8       334               3       620
N_Kekule         9       321               15      341
Carbonyl_C=O     10      311               16      339
The 41 non-correlated features were integrated into a new machine learning model as part
of a workflow that enabled automated adjustment of the puncta threshold (Figure 9). This model
was evaluated using puncta intervals of 1,000 to predict inhibition of tau fibril formation among
279 compounds in the test set, representing 30% of the total 928 compounds. The results compared
against experimental data are illustrated in Figure 13. At a puncta threshold of 20,000, the model
correctly identified 19 true positives and incorrectly flagged 21 compounds as false positives. In
contrast, a random selection of 40 compounds typically yields around 11 active inhibitors. This
demonstrates the model's capability to identify 19 potential inhibitors more effectively than
random chance. When the puncta threshold was lowered to 15,000, the model identified 9 true
positives and 15 false positives. In comparison, random selection of 24 compounds would typically
result in choosing 7 active inhibitors.
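The comparison with random selection at the 20,000-puncta threshold can be reproduced with a short calculation. The sketch below assumes that the background inhibitor rate is the whole-population value of 262 out of 928 compounds; the function names are illustrative.

```python
def expected_random_hits(n_selected, n_inhibitors, n_total):
    """Expected number of inhibitors among n_selected compounds chosen at random."""
    return n_selected * n_inhibitors / n_total


def precision(true_positives, false_positives):
    """Fraction of compounds flagged by the model that are true inhibitors."""
    return true_positives / (true_positives + false_positives)


# 20,000-puncta threshold: 19 true positives and 21 false positives (40 flagged).
random_hits = expected_random_hits(40, 262, 928)
model_precision = precision(19, 21)
print(f"expected by chance: {random_hits:.1f} of 40")
print(f"found by the model: 19 of 40 (precision {model_precision:.3f})")
```

Under this assumption the model recovers 19 inhibitors where about 11 would be expected by chance, corresponding to a precision of 0.475 against a background rate of 0.282.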
Figure 13. Scatter plot of predicted versus experimental puncta counts illustrating tau fibril
formation in the presence of 278 potential inhibitors. The 278 compounds make up the test
set. TP: true positives; FP: false positives; TN: true negatives; FN: false negatives.
Figure 14 presents a detailed analysis of the prediction outcomes for different categories
of compounds based on their experimental puncta counts in inhibiting tau fibril formation. Notably,
the top section of the figure focuses on compounds identified as strong inhibitors, characterized
by experimental puncta count of less than 15,000. Among these, EGCG exhibits a "blue" region
in the prediction graph starting below 15,000, aligning with its experimental efficacy. These
instances are labeled as true positives, indicating the model's accuracy in predicting their inhibitory
activity. Conversely, compounds closest to the median puncta count of approximately 24,958, with
a range from 24,621 to 25,675, are predicted as non-inhibitors (true negatives), consistent with
their experimental classification. This section illustrates the model's ability to correctly identify
compounds that do not inhibit tau fibril formation based on their puncta counts. In contrast,
compounds with the least inhibitory activity, characterized by experimental puncta counts
exceeding 40,000, are accurately predicted as inactive by the model (true negatives). This
alignment with experimental data reinforces the model's capability to discern compounds that lack
efficacy in inhibiting tau fibril formation. However, the figure also highlights areas where the
model exhibits shortcomings. For instance, following EGCG, there are four compounds identified
as false negatives because their "blue" prediction region starts only above 15,000, despite their
experimental puncta counts falling below this threshold. This discrepancy indicates instances
where the model fails to correctly identify these compounds as inhibitors. Moreover, 7-hydroxy-4H-chromen-4-one
is noted as a false positive because although it initially shows a "blue" region
indicating inhibition below 15,000, it also displays subsequent "pink" regions above this threshold.
This inconsistency suggests a misprediction where the model incorrectly identifies it as an inhibitor.
Overall, Figure 14 provides valuable insights into the model's performance in predicting
compounds' efficacy in inhibiting tau fibril formation. It underscores both the strengths of the
model in correctly classifying active and inactive compounds and areas for improvement to
enhance its accuracy, particularly in addressing false negatives and false positives. These findings
are pivotal for refining the model further and advancing its utility in drug discovery efforts
targeting neurodegenerative diseases.
Figure 14. Prediction of tau fibril inhibition using a machine learning model with varying
puncta thresholds. Calculations were done at 1,000-puncta intervals, as shown at the top of the
figure. Compounds are categorized as follows: the 10 strongest inhibitors (puncta count < 15,000),
10 compounds near the median puncta count (24,621–25,675), and the 10 least active compounds
(puncta count > 40,000). The blue and pink regions indicate predictions of inhibition and
non-inhibition, respectively.
2.6 Discussion
It is important to note that the molecular features utilized in the current model are general
and not specifically tailored to compounds expected to inhibit tau fibril formation. As such, the
model lacks the specificity of a pharmacophore that characterizes a class of inhibitors.
Incorporating molecular features derived from computational chemistry analyses of molecules
that bind to tau fibrils would likely enhance the model's predictive accuracy, providing a more
focused and effective approach for identifying potential inhibitors. Chapters 3 and 4 discuss the
generation of more specific molecular features based on inhibitor-fibril interaction using
molecular docking methods, while Chapter 5 incorporates these newly made features into the
algorithm discussed in section 2.4.4.
CHAPTER 3: MOLECULAR DOCKING METHODOLOGY
3.1 Background
The aim of this chapter is to explore the molecular docking process using TMD, a tool
developed in our laboratory. We chose to use TMD instead of commercially available docking
programs because it allows us greater control over the entire docking procedure. For instance, we
can account for the presence of aromatic hydroxyl groups, which were determined to be the
primary feature in the machine learning algorithm. Moreover, TMD incorporates the characteristic
stacking of ligands along tau filaments, aligning with the rotational and translational symmetry of
the fibril, which is essential for our docking simulations.14,16 Instead of conventional scoring
methods used by typical docking software, our approach aims to generate a π−π interaction score
(which we refer to as stacking score) and other molecular features as input for the KNIME machine
learning algorithm. This approach enables us to analyze similarities and differences based on
specific molecular features without relying solely on general descriptors. A basic outline of the
whole process is shown in Figure 15, and the Python code that controls this process is in Appendix
1. In brief, ligand PDB files are initially produced from their corresponding SMILES using the
KNIME workflow depicted in Figure 16, as discussed in section 3.2. These ligand PDB files will
serve as input for the docking program, along with the tau fibril-EGCG complex (PDB ID:
7UPG), which will generate various intermediate files for subsequent stages, as detailed in section
3.3. Ultimately, the stacking score will be computed and extracted as an input file for the machine
learning model in KNIME, as discussed in section 3.4.
Figure 15. Overview of the molecular docking process with TMD.
Figure 16. KNIME workflows for calculation of 3D coordinates using SMILES. Yellow box:
workflow for reading input, simple cleaning, and desalting; red box: KNIME workflow for
adjustment of carboxylic acids; blue box: KNIME workflow for adjustment of amines; green
box: KNIME workflow for converting SMILES strings to protein data bank (PDB) files.
3.2 Ligand Preparation with KNIME
As discussed in section 3.1, molecular docking requires a ligand preparation step. For this purpose,
KNIME was employed to generate ligand PDB files from their SMILES strings, and the workflow
is illustrated in figure 16. This workflow comprises four main segments: basic cleaning of input
data (yellow), acidic charge adjustment (red), basic charge adjustment (blue), and conversion of
charge-adjusted SMILES string into PDB files (green). First, for the workflow to operate with
minimal modifications, the input Excel sheet should have column 1 as the ligand name (“Identifier”
as column header) and column 2 as the SMILES string (“SMILES” as column header). These two
columns will serve as the main input for the workflow. If the Excel sheet contains additional input
columns, as well as the description row, they will be filtered out. For the final output to be correctly
written out (as discussed in the later segment), the ligand names must be in the proper format.
Using a series of “String Manipulation” nodes, semicolons were removed, “(E/Z)” was changed
to “(E-Z)”, some quotation marks were removed, and some question marks (denoting α or β) were
replaced with “$”. Once the ligand names were in the proper format, counterions had to be
addressed, since certain ligands contain counterions (such as a chloride anion) in their SMILES
strings. Counterions can disrupt the molecular docking calculation, as was apparent in this work.
Therefore, they were removed using the “Molecule Type Cast” and subsequently the “Speedy
SMILES De-salt” nodes. Optional nodes (9a – 9c) were included to show the SMILES strings
before and after the de-salting step in an output Excel file. After all these initial data treatments,
the updated “Identifier” and SMILES strings were subjected to the acidic charge manipulation
segment of the workflow. Specifically, carboxylic acids were converted to carboxylates to reflect
their ionization state at physiological pH. The idea is to locate and replace carboxylic acids in the
ligand SMILES string with their charged counterparts. There are three metanodes (collections of
nodes) in this segment, as well as additional
nodes. Because carboxylic acids have multiple SMILES representations, each metanode was
designed to handle one type of representation. Metanode 1 was used to locate and convert simple
SMILES representations of carboxylic acid into carboxylate, as shown in Table 5 (top table). If
other simple SMILES representations of carboxylic acids are obtained from other databases, they
can be added to the “Table Creator” node (node 12). However, carboxylic acids are occasionally
not represented as shown in Table 5. For example, the SMILES representation of Xanthurenic
acid in the input file is “O=C(C1=NC2=C(O)C=CC=C2C(O)=C1)O”, and the carboxylic acid here
is not a discrete unit. Rather, the “O=C” at the beginning and the “O” at the end, without the
parenthetical elements, make up the carboxylic acid group. This method of representing carboxylic
acid is seen in many ligand SMILES strings in the input file, so metanode 2 was built to specifically
recognize this pattern. However, Fidaxomicin and Isoliquiritin apioside were recognized even
though they did not contain carboxylic acid groups, and close inspection revealed that they
contained esters instead. These two outliers were filtered out using the “RDKit Functional Group
Filter” node. Finally, the ending O was converted into [O-]. Metanode 3 was built to deal with a
similar issue as seen in metanode 2 but with “O” at the beginning and “C=O” at the end. The overall
setup is like that of metanode 2, but the configurations of the “String Manipulation” and “Row
Filter” nodes are different. The remaining nodes outside the metanodes were used
to tidy up the data so that the column “Identifier” and the updated SMILES strings were prepared
for the basic charge conversion segment. In this segment, ionizable amines were converted to
ammonium. Unlike carboxylic acids, amines were rather complex to convert since not all amines
are ionizable. In this work, aliphatic amines were considered ionizable, and others were left
unchanged. Tertiary amines were converted manually because the “RDKit Functional Group Filter”
node does not have the option to filter in tertiary amines. Metanodes 4 and 5 were built to convert
primary and secondary amines into their respective ammonium counterparts. While these two
metanodes are almost identical in setup, they differ in the two “Table Creator” nodes and the
“String Replacer” node. Two separate metanodes were used because aliphatic amines are
represented as N in SMILES, but primary and secondary ammonium are written as [NH3+] and
[NH2+], respectively. The “RDKit Functional Group Filter” nodes were first used to filter in ligands
containing 1 or 2 primary amines, and these ligands were subjected to the workflow in metanode
4. The “Table Creator” nodes in metanode 4 are configured as shown in Table 5 (bottom). The first
“Table Creator” contains some nitrogen-containing functional groups written in SMILES that
cannot be ionized. These were searched for in the ligand SMILES and temporarily
substituted with placeholders ([X1], [X2], and so on) to avoid being converted. [X1] to [X14] are
common to the “Table Creator” nodes in both metanodes, while the remaining entries are
situational and depend on the ligand. Nitrogen-containing ligand SMILES that were not processed
in metanodes 4 and 5 (such as tertiary amines)
were written out in an Excel sheet and manually checked for ionization. After both acidic and basic
charge adjustments, these curated ligand SMILES strings were subjected to the final segment of
the workflow. They were first converted into SMILES format (recognizable by KNIME) and then
subjected to coordinate generation and hydrogen addition. These 3D structures, now in RDKit
format, were converted to PDB files using the “OpenBabel” node and written out as the
corresponding ligand name with a PDB extension in a specified directory.
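The carboxylic acid conversion performed by metanodes 1 and 2 can be approximated with plain string operations. The sketch below uses the two search and replace pairs listed in Table 5 (top) and adds a check for the split "O=C(...)O" form seen in Xanthurenic acid. The function names and the parenthesis-matching check are assumptions; the ester filtering done with the "RDKit Functional Group Filter" node and the metanode 3 pattern are not reproduced.

```python
# Search/replace pairs from Table 5 (top): simple carboxylic acid representations.
SIMPLE_PATTERNS = [("O=C(O)", "O=C([O-])"), ("C(O)=O", "C(=O)[O-]")]


def _matching_close(smiles, open_idx):
    """Return the index of the ')' that closes the '(' at open_idx, or -1."""
    depth = 0
    for i in range(open_idx, len(smiles)):
        if smiles[i] == "(":
            depth += 1
        elif smiles[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def to_carboxylate(smiles):
    """Convert carboxylic acids in a SMILES string to carboxylates (string-based sketch)."""
    for search, replace in SIMPLE_PATTERNS:
        smiles = smiles.replace(search, replace)
    # Split form: "O=C(" at the start and "O" at the end, with the branch that
    # opens after the carbonyl carbon closing immediately before the final O.
    if smiles.startswith("O=C(") and smiles.endswith(")O"):
        if _matching_close(smiles, 3) == len(smiles) - 2:
            smiles = smiles[:-1] + "[O-]"
    return smiles


print(to_carboxylate("CC(O)=O"))
print(to_carboxylate("O=C(C1=NC2=C(O)C=CC=C2C(O)=C1)O"))
```

A purely string-based approach of this kind cannot distinguish every functional group, which is why the workflow relies on an RDKit-based filter to remove false matches.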
Table 5. Configuration of the “Table Creator” Node in the Acid and Base Charge
Adjustment Branch.

"Table Creator" in Metanode 1
Search     Replace
O=C(O)     O=C([O-])
C(O)=O     C(=O)[O-]

Node 49 in Metanode 4   Node 56 in Metanode 4   Node 74 in Metanode 5   Node 74 in Metanode 5
Search     Replace      Search     Replace      Search     Replace      Search     Replace
=N         [X1]         [X1]       =N           =N         [X1]         [X1]       =N
N=         [X2]         [X2]       N=           N=         [X2]         [X2]       N=
#N         [X3]         [X3]       #N           #N         [X3]         [X3]       #N
N#         [X4]         [X4]       N#           N#         [X4]         [X4]       N#
N+         [X5]         [X5]       N+           N+         [X5]         [X5]       N+
=CN        [X6]         [X6]       =CN          =CN        [X6]         [X6]       =CN
NC=        [X7]         [X7]       NC=          NC=        [X7]         [X7]       NC=
N@+        [X8]         [X8]       N@+          N@+        [X8]         [X8]       N@+
N@@+       [X9]         [X9]       N@@+         N@@+       [X9]         [X9]       N@@+
NC(=O)     [X10]        [X10]      NC(=O)       NC(=O)     [X10]        [X10]      NC(=O)
C(=O)N     [X11]        [X11]      C(=O)N       C(=O)N     [X11]        [X11]      C(=O)N
O=C(N      [X12]        [X12]      O=C(N        O=C(N      [X12]        [X12]      O=C(N
C(N[       [X13]        [X13]      C(N[         C(N[       [X13]        [X13]      C(N[
NC([       [X14]        [X14]      NC([         NC(        [X14]        [X14]      NC(
CCCNC      [X15]        [X15]      CCCNC        NH3+       [X15]        [X15]      NH3+
CCNCC      [X16]        [X16]      CCNCC        CN1CC2     [X16]        [X16]      CN1CC2
CNC(       [X17]        [X17]      CNC(
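The mask, convert, and unmask sequence encoded in the bottom part of Table 5 can be sketched in Python as follows. The list of masked patterns is [X1] to [X14] from the Metanode 4 columns; the function name is hypothetical, and the sketch assumes the input ligand has already been confirmed to contain only primary aliphatic amines (secondary amines would require [NH2+] instead).

```python
# Non-ionizable nitrogen patterns [X1] to [X14] from Table 5 (Metanode 4 columns).
MASKS = ["=N", "N=", "#N", "N#", "N+", "=CN", "NC=", "N@+", "N@@+",
         "NC(=O)", "C(=O)N", "O=C(N", "C(N[", "NC(["]


def protonate_primary_amines(smiles):
    """Replace unmasked aliphatic N with [NH3+] (assumes primary amines only)."""
    # Step 1: hide nitrogen patterns that must not be converted.
    for i, pattern in enumerate(MASKS, start=1):
        smiles = smiles.replace(pattern, f"[X{i}]")
    # Step 2: convert every nitrogen that is still visible.
    smiles = smiles.replace("N", "[NH3+]")
    # Step 3: restore the hidden patterns.
    for i, pattern in enumerate(MASKS, start=1):
        smiles = smiles.replace(f"[X{i}]", pattern)
    return smiles


print(protonate_primary_amines("NCCc1ccccc1"))   # primary amine is converted
print(protonate_primary_amines("CC(=O)N"))       # amide nitrogen is left unchanged
```

Because the placeholders contain no "N", the conversion in step 2 cannot touch a masked group, which is the reason for the temporary substitution.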
3.3 Molecular Docking Analysis using TMD
This section details the various components or executables comprising TMD, a docking
program written in Fortran (Figure 15). The orchestration of these components is managed
by Python code, as shown in Appendix 1. Ligand PDB files generated in Section 3.2 serve as input
for Make_atoms.exe. This executable computes the contact point of the ligand with a specific
residue on the fibril (refer to Figure 16). The selection of this residue is based on observed
hydrogen bond interactions between EGCG and the tau fibril. Specifically, the program identifies the
aromatic hydroxyl group as the primary contact point, along with two other connecting atoms. If
aromatic hydroxyl groups were absent, the carbon of a benzene ring was used by default. This
information was then stored in an intermediate file named "atoms.in". Moreover, Make_atoms.exe
renamed the ligand PDB to “in.pdb.” Both “atoms.in” and “in.pdb” files were treated as input files
for Makezmat.exe, which was used to compute the z-matrix of the ligand. In addition, the atoms were
renumbered so that the first atom was numbered as the fourth atom. However, this numbering
was changed at the end of the run such that the fourth atom became the first atom (atom number – 3).
The output file was a z-matrix file called “mol001.zmat”. Subsequently, the z-matrix file, along
with the protein-ligand complex PDB file and an input file called “Data_makepdb.in”, were read
into TMD.exe. The purpose of this TMD run was to generate a ligand PDB file that reflects the
renumbering in the z-matrix file. The resulting PDB file was temporarily called
“out0001.pdb”. This renumbered ligand PDB file and another input called “data_orig.in”
(configuration shown in Figure 17) were read into find_rot_bond.exe, which locates torsional
bonds in the ligands. After locating the torsional bonds, we also approximated the torsions for
conjugated systems, double bonds, single bonds, and single-to-double bonds to vary from 60 to
300 degrees in increments of 120 degrees. These torsional degrees were based on the lowest energy
or most stable configuration found in the literature.31,32 This information was included in the
“Data_makepdb.in” file, which was renamed “data.in” for subsequent docking. A second TMD run was
conducted using the updated "data.in," the protein-ligand complex PDB, and the "mol001.zmat"
files to generate the "makeonepdb.in" and "fixvar.in" files. The "makeonepdb.in" file records the
number of solutions at specific contact points, while the "fixvar.in" file contains the z-matrices for
each solution. A third TMD run was then performed using these two files to produce all conformers,
which were stored in the "all1.pdb" file. Additionally, the contact points in the "makeonepdb.in"
file were updated using numbers_in.exe and renamed "makeonepdb2.in." This step aimed to
translate the same poses to a neighboring fibril. Finally, a fourth TMD run was executed using
"fixvar.in" and "makeonepdb2.in" to calculate all the poses, with the results saved in the "all2.pdb"
file. The files “all1.pdb” and “all2.pdb” were used for subsequent stacking calculations.
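The torsion sampling described above (60 to 300 degrees in increments of 120 degrees) gives three values per rotatable bond. The Python sketch below enumerates the resulting combinations; whether TMD explores the full Cartesian product is an assumption, the function names are illustrative, and the number of solutions TMD reports also depends on the contact geometry, so it need not equal this count.

```python
from itertools import product


def torsion_values(start=60, stop=300, step=120):
    """Torsion angles sampled for one rotatable bond, in degrees."""
    return list(range(start, stop + 1, step))


def torsion_grid(n_rotatable_bonds):
    """All combinations of torsion angles for the given number of rotatable bonds."""
    return product(torsion_values(), repeat=n_rotatable_bonds)


print(torsion_values())
# A ligand with four rotatable bonds gives 3**4 combinations in this sketch.
print(sum(1 for _ in torsion_grid(4)))
```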
Figure 17. Configuration of data_orig.in file used in the molecular docking process.
3.4 Evaluation of ligand-ligand stacking interaction using TMD
The final component of TMD, called stack_calc.exe, has multiple functions (Figure 15). It
begins by loading the poses stored in "all1.pdb" and "all2.pdb" to determine if corresponding poses
from these files collide. Here, "corresponding poses" refers to poses with similar geometries but
different translations. The criteria for collision are straightforward: if any atoms from the poses overlap,
they are considered to have collided and are eliminated. Poses that do not collide are then evaluated
for stacking. Stacking is assessed specifically for molecules with aromatic rings, as these rings can
engage in π-π stacking interactions. The program measures molecular stacking by calculating the
distance between atom "a" in ring "x" and the same atom in the aligned poses, focusing on the
aromatic carbons. The stacking strength is then classified into three categories: strong (6 or less),
moderate (between 6 and 6.5), and weak (greater than 6.5). An additional "none" category is
assigned to poses that clash. Prior to this categorization, the program also computes the mean and
standard deviation of the stacking distances and counts the number of atoms that are within
5 Angstroms. In addition to stacking, ligands are categorized based on the types of interactions
they form with the protein, including hydrogen bonds, hydrophobic contacts, and unfavorable
contacts. Interactions are considered favorable if the score is positive and unfavorable if negative.
The scoring formula assigns +1 for each hydrogen bond, +0.2 for each hydrophobic contact, and
-1 for each unfavorable contact. This information is saved in a file named "data.out." Finally, the
information generated by TMD and stored in the "data.out" file for each ligand was extracted into an
Excel sheet using a Python script provided in Appendix 2, preparing it for KNIME analysis.
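The scoring formula and the stacking categories described above can be expressed compactly in Python. This sketch is not the Fortran implementation in stack_calc.exe: the function names are hypothetical, the quantity compared against the 6 and 6.5 cutoffs is passed in as a generic value because it is not fully specified here, and the handling of a score of exactly zero is an assumption since the text defines only positive and negative scores.

```python
def interaction_score(h_bonds, hydrophobic, unfavorable):
    """+1 per hydrogen bond, +0.2 per hydrophobic contact, -1 per unfavorable contact."""
    return 1.0 * h_bonds + 0.2 * hydrophobic - 1.0 * unfavorable


def interaction_class(score):
    """Favorable if positive, unfavorable if negative (zero is left unspecified)."""
    if score > 0:
        return "favorable"
    if score < 0:
        return "unfavorable"
    return "unspecified"


def stacking_class(stacking_value, clashes=False):
    """Classify stacking strength using the cutoffs of 6 and 6.5."""
    if clashes:
        return "none"
    if stacking_value <= 6.0:
        return "strong"
    if stacking_value <= 6.5:
        return "moderate"
    return "weak"


# Three hydrogen bonds, one hydrophobic contact, no unfavorable contacts.
score = interaction_score(3, 1, 0)
print(round(score, 1), interaction_class(score))
print(stacking_class(6.2), stacking_class(7.0), stacking_class(5.0, clashes=True))
```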
CHAPTER 4: MOLECULAR DOCKING RESULTS
4.1 Results from TMD Analysis
The outputs of the individual TMD components are illustrated here using (-)-Epigallocatechin
gallate (EGCG) as an example. Make_atoms.exe identified the three ligand atoms
involved in forming hydrogen bond interactions with the tau fibril (Figure 18). The initial point of
contact is an aromatic hydroxyl group, followed by two adjacent carbon atoms. Makezmat.exe
renumbered the ligand's atoms and generated the z-matrix entries for each atom (Figure 19). The
aromatic hydroxyl is renumbered as atom 4. Find_rot_bond.exe identified rotatable bonds in the ligand
and determined the torsional angles based on the type of bond. For instance, for conjugated bonds,
the torsion angle ranged from 60 degrees to 300 degrees in increments of 120 degrees. Four
rotatable bonds were identified for this ligand (Figure 20). Using the file with all updated bond
angles, TMD.exe generated poses, resulting
in 1,000 solutions for (-)-Epigallocatechin gallate. These solutions were used for subsequent
stacking calculations with stack_calc.exe.
Figure 18. Output generated from make_atoms.exe listing the three atoms of contact between
the ligand and tau fibril. This output file is based on the chemistry of (-)-Epigallocatechin gallate.
Figure 19. Z-matrix output with renumbered atoms generated by makezmat.exe. The atoms
of primary contact are now renumbered as 4, 5, and 6, with 4 being the aromatic hydroxyl group.
This output file is based on the chemistry of (-)-Epigallocatechin gallate.
Figure 20. Modified output consisting of the default settings shown in Figure 17 and the rotatable
bonds identified by find_rot_bond.exe. This output file is based on the chemistry of
(-)-Epigallocatechin gallate.
4.2 Results from Ligand Stacking Calculations
Figure 21 shows the output file generated by stack_calc.exe for EGCG. Out of the 1,000
solutions from TMD.exe, 973 poses clashed and were discarded because they could not stack
with the corresponding translated pose; these are classified as "None." These poses can still be
categorized as favorable or unfavorable based on the scoring system outlined in section 3.4.
Despite the high number of clashing poses, EGCG had many favorable interactions with the tau
fibril: 890 out of 973 poses showed favorable interactions. The remaining 27 poses were
considered stackable, with 19 classified as weak stacking and 8 as moderate stacking. These data
support the concept of conformational sampling, as they indicate that a smaller number of
stackable poses are more likely to yield the correct conformation, with the majority being
eliminated. Figure 22 illustrates the docked poses of EGCG on one layer of the fibril in panel A
and the translated poses on an adjacent layer in panel B, with the best stackable pose shown in panel C.
Parameters for 9 inhibitors were extracted using the Python code provided in Appendix 2 and are
detailed in Tables 6a-6c (with additional data in Appendix 3).
Figure 21. Stacking calculation output of EGCG computed by stack_calc.exe of TMD.
Figure 22. Stackable and docked poses of EGCG. (a) Stackable poses of EGCG on one layer of
the fibril; (b) translated copies of these poses on an adjacent layer of the fibril; (c) best stackable
pose of EGCG as computed by TMD. The parameters of the best stackable pose are summarized
in Figure 21.
Table 6. 3D parameters extracted from the stacking calculation for selected inhibitors
generated by TMD. (The rest of the data is in Appendix 3.)

(a)
Ligands                        HBond   HPhob   BDCon   Score
(+)-Gallocatechin              0       0       0       0
(-)-(S)-Equol                  0       1       0       0.2
(-)-Catechin gallate           3       1       0       3.2
(-)-Epiafzelechin              0       1       0       0.2
(-)-Epicatechin gallate        3       0       0       3
(-)-Epigallocatechin Gallate   3       1       0       3.2
(-)-Epigallocatechin           0       0       0       0
(-)-Gallocatechin gallate      2       1       0       2.2
(-)-Gallocatechin              0       1       0       0.2

(b)
Ligands                        S + F   S + U   M + F   M + U   W + F   W + U   N + F   N + U
(+)-Gallocatechin              0       0       1       1       1       1       48      14
(-)-(S)-Equol                  0       0       2       1       13      6       48      38
(-)-Catechin gallate           0       0       4       0       23      3       1020    109
(-)-Epiafzelechin              0       0       2       0       4       0       53      25
(-)-Epicatechin gallate        2       0       15      1       24      2       739     113
(-)-Epigallocatechin Gallate   0       0       8       0       16      3       890     83
(-)-Epigallocatechin           0       0       0       0       0       0       9       2
(-)-Gallocatechin gallate      10      0       35      2       20      1       740     87
(-)-Gallocatechin              0       0       3       0       1       0       57      22
HBond: count of hydrogen bonds; HPhob: count of hydrophobic contacts; BDCon: bad or
unfavorable contacts; S: strong stacking; M: medium stacking; W: weak stacking; N: no
stacking; F: favorable interactions; U: unfavorable interactions

(c)
Ligands                        Stackable      Ring    Best stackable   Average    Distance   Stdev
                               ligand count   count   ligand #         distance   < 5 Å
(+)-Gallocatechin              4              2       2                5.18       14         6.41
(-)-(S)-Equol                  22             2       12               5.19       12.5       6.27
(-)-Catechin gallate           30             3       12               5.19       12.7       6.29
(-)-Epiafzelechin              6              2       2                5.18       14.5       6.34
(-)-Epicatechin gallate        44             2       7                5.23       12         5.97
(-)-Epigallocatechin Gallate   27             3       1                5.19       12.3       6.31
(-)-Epigallocatechin           0              2       0                0          0          0
(-)-Gallocatechin gallate      68             3       27               5.26       9.7        5.7
(-)-Gallocatechin              4              2       1                5.19       13.5       6.3
4.3 Discussion
When the thesis was initially drafted, only about 295 molecules had undergone testing, leaving
room for further refinement. One key area of improvement involves optimizing the code by more
precisely defining the permissible angles for rotatable bonds, particularly for bonds that exhibit
restricted rotation, such as double bonds. By narrowing these angle ranges, the program can reduce
computational complexity and improve accuracy in predicting molecular behavior. Furthermore,
the stacking calculations, which are currently limited, could be expanded to include all potential
layers, providing a more detailed understanding of intermolecular interactions. The docking
process, essential for evaluating how well molecules fit into a target site, could also be significantly
accelerated. By focusing on a narrower range of rotatable bond angles, specifically those identified
as optimal from the best stackable ligands, the process could become more efficient, reducing
computational time and resources. This targeted approach would enhance the model's ability to
accurately predict the binding affinity and inhibitory potential of a wider range of molecules as
more data becomes available. These refinements are critical for the continued evolution and
effectiveness of the AI model discussed in Chapter 5.
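
The effect of narrowing the angle ranges on the size of the conformational search can be illustrated with a short sketch. It assumes, purely for illustration, that each rotatable bond is sampled on a discrete grid of torsion angles, so that the number of conformers is the product of the grid sizes. The function name, the 30-degree step, and the restricted ranges are hypothetical and are not taken from TMD.

```python
from math import prod


def conformer_count(angle_ranges, step=30):
    """Number of torsion combinations when each bond is sampled every `step` degrees.

    angle_ranges: list of (start, stop) tuples in degrees, with stop exclusive.
    """
    return prod(len(range(start, stop, step)) for start, stop in angle_ranges)


# Four rotatable bonds, each sampled over the full circle (hypothetical ligand).
full = conformer_count([(0, 360)] * 4)

# Same ligand, with two bonds restricted to a narrow window around 180 degrees.
restricted = conformer_count([(0, 360), (0, 360), (150, 211), (150, 211)])

print(full, restricted)  # prints: 20736 1296
```

Under these assumptions, restricting two of the four bonds to three angles each reduces the search space sixteen-fold, which is the kind of saving the proposed refinement aims for.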
CHAPTER 5: IMPROVED AI ALGORITHM
5.1 Background
The goal of this chapter is to enhance the AI algorithm discussed in Chapter 2 by integrating
more detailed descriptors introduced in Chapter 3. The AI workflow was updated to include these
new descriptors. Additionally, a new scoring system, called the average Z-score, was developed
during the writing of the thesis to evaluate the inhibition activity of polyphenol inhibitors. In this
system, a negative value (around -1.6) signifies high inhibitory activity, values near 0 suggest weak
inhibition, and positive values indicate aggregation inducers. In this chapter, the average Z-score was
used instead of puncta count to evaluate inhibition activity. Ultimately, the previous AI model,
which was based on the top 10 features (including the count of aromatic hydroxyl groups) selected
through genetic algorithms and linear correlation, was compared with a new AI model that
incorporated more specific descriptors generated by the method described in Chapter 3, along
with the number of aromatic hydroxyl groups.
5.2 Modification of the AI model using KNIME
The updated AI model is illustrated in Figure 23 (with only the newly added components
displayed; the elements not shown remain the same as in Chapter 2). The CSV reader first imported
the molecular features of the 928 inhibitors, and then ten features were selected for analysis based
on genetic algorithms and linear correlation. The subsequent metanode, "Identifier Modification,"
consists of a series of string manipulation nodes used in Chapter 3 to alter the identifiers of the
ligands. This was done to ensure consistency with the naming conventions used in TMD analysis.
The average Z-score data for 545 inhibitors were imported, with the identifiers adjusted using the
same "Identifier Modification" metanode. Additionally, the specific descriptors of 289 inhibitors
generated using TMD were imported into KNIME. The "Classification" metanode, consisting of a
series of math nodes, was used to calculate the total stacking strength by combining favorable and
unfavorable interactions into a percentage. For instance, the counts for "none and favorable
interactions" and "none and unfavorable interactions" were summed and then divided by the total
number of inhibitors. Since some molecules lacked stackable counts, any missing values were
replaced with 0. The average Z-score and the new descriptors were combined and prepared for use
in the AI algorithm. Although not depicted in the figure, there was an additional minor adjustment
to the table columns generator. Instead of increments of 1000 from 0 to 40,000 as shown in Chapter
2, the increments used in the updated AI model were -0.1, ranging from -2 to 2.
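
As a rough illustration of the calculation performed by the "Classification" metanode, the sketch below combines the favorable and unfavorable counts for each stacking class and expresses them as percentages. It is written in Python rather than as KNIME math nodes, the function and key names are hypothetical, and it assumes that the denominator is the total number of classified poses for the ligand. The counts are those listed for (-)-epigallocatechin gallate in Table 6.

```python
def stacking_percentages(counts):
    """Combine favorable (F) and unfavorable (U) counts per stacking class.

    counts: dict mapping keys such as "S+F" to pose counts. Missing keys are
    treated as 0, mirroring the replacement of missing values with 0.
    """
    classes = {"strong": "S", "moderate": "M", "weak": "W", "none": "N"}
    totals = {
        name: counts.get(f"{code}+F", 0) + counts.get(f"{code}+U", 0)
        for name, code in classes.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Ligand with no classified poses: report 0 for every class.
        return {name: 0.0 for name in totals}
    return {name: 100.0 * value / grand_total for name, value in totals.items()}


# Counts for (-)-epigallocatechin gallate from Table 6.
egcg = {"S+F": 0, "S+U": 0, "M+F": 8, "M+U": 0,
        "W+F": 16, "W+U": 3, "N+F": 890, "N+U": 83}
percentages = stacking_percentages(egcg)
print({name: round(value, 1) for name, value in percentages.items()})
```

For this ligand the moderate and weak classes together account for 27 of 1000 poses, so almost all poses fall in the "none" class.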
Figure 23: KNIME workflow for introducing the generated parameters in the KNIME AI
model discussed in Chapter 2.
5.3 Results
The analysis of prediction outcomes for various compound categories based on their
experimental average Z-score in inhibiting tau fibril formation is shown in Figure 24 for the AI
model based on general descriptors and Figure 25 for the one based on specific descriptors and
aromatic hydroxyl count. The top portion of the figures highlights compounds classified as strong
inhibitors, defined by an experimental average Z-score of less than -0.851. In both models, strong
inhibitors like (+)-gallocatechin and pyrogallol were correctly predicted to be inhibitors at -0.8
(green region), closely matching their experimental average Z-scores. With an average Z-score
cutoff of -0.4, the AI model using general descriptors correctly identified 6 out of 10 strong
inhibitors as true positives, whereas the model incorporating more specific descriptors identified
8 out of 10. The middle section of both figures illustrates weak inhibitors with an average Z-score
around 0. When using a cutoff of -0.4, 8 out of 10 of these inhibitors fall within the red region,
indicating they are not inhibitors and therefore considered true negatives. In the bottom section, at
higher average Z-scores, all these molecules are positioned in the red region at the -0.4 cutoff,
indicating that they are not inhibitors. This is particularly evident in the model utilizing more
specific descriptors.
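
The true-positive and true-negative counts quoted above can be tallied as in the following sketch. It assumes that a compound counts as an inhibitor when its experimental average Z-score lies below the cutoff, and that the model supplies one inhibitor/non-inhibitor prediction per compound at that cutoff. The function name and the five example values are hypothetical and are not data from this thesis.

```python
def confusion_counts(experimental_z, predicted_inhibitor, cutoff=-0.4):
    """Compare model predictions with labels derived from experimental Z-scores."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for z, predicted in zip(experimental_z, predicted_inhibitor):
        actual = z < cutoff  # assumed definition of an inhibitor at this cutoff
        if actual and predicted:
            counts["TP"] += 1
        elif actual and not predicted:
            counts["FN"] += 1
        elif predicted:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical experimental average Z-scores and model predictions.
z_scores = [-1.6, -0.9, -0.1, 0.05, 0.9]
predictions = [True, False, False, True, False]
counts = confusion_counts(z_scores, predictions)
print(counts)
```

Repeating the tally for each cutoff between -2 and 2 would give one such summary per column of Figures 24 and 25.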
5.4 Discussion
Stacking interactions may be critical in inhibiting tau fibril formation, with parameters for
these interactions obtainable through molecular docking studies. Inclusion of general descriptors
is a strong starting point for building the model, as they offer valuable guidance. For instance, the
identification of the aromatic hydroxyl group was crucial in directing our investigation of ligand-tau
fibril interactions. Incorporating system-specific docking parameters can significantly enhance the
accuracy of an AI model, as illustrated by the models shown in Figures 24 and 25. However, the
current models are not fully accurate due to suboptimal 3D parameters generated by TMD and the
limited dataset of only 295 molecules used in model development. To improve the model’s
predictive accuracy, future studies should focus on optimizing TMD and expanding the dataset to
include additional molecules.
Figure 24. Prediction of tau fibril inhibition using a machine learning model built with
general descriptors with varying average Z-score. Calculations were done at intervals of -0.1
from -2 to 2, as shown at the top of the figure. Compounds are categorized as follows: the 10
strongest inhibitors (Avg Z-score < -0.8), 10 compounds near the avg Z-score of 0, and the 10
least active compounds (avg Z-score > 0.78). The green and pink regions indicate predictions of
inhibition and non-inhibition, respectively.
Figure 25. Prediction of tau fibril inhibition using a machine learning model built with
specific descriptors generated using TMD with varying average Z-score. Calculations were
done at intervals of -0.1 from -2 to 2, as shown at the top of the figure. Compounds are
categorized as follows: the 10 strongest inhibitors (Avg Z-score < -0.8), 10 compounds near the
avg Z-score of 0, and the 10 least active compounds (avg Z-score > 0.78). The green and pink
regions indicate predictions of inhibition and non-inhibition, respectively.
APPENDICES
Appendix A: Python Code for the Orchestration of TMD. The Fortran code for each
component of TMD is not shown below.
import os
import shutil

# Folder layout of the TMD installation (Windows paths, written as raw strings).
pdb_path = r"D:\TMD_7upg\PDB"
input_path = r"D:\TMD_7upg\input"
main = r"D:\TMD_7upg"
atoms_path = r"D:\TMD_7upg\atoms"
os.makedirs(atoms_path, exist_ok=True)
output_parent_folder = r"D:\TMD_7upg\outputs1"
output_parent_folder2 = r"D:\TMD_7upg\outputs2"
os.makedirs(output_parent_folder, exist_ok=True)
os.makedirs(output_parent_folder2, exist_ok=True)

# Every step below is repeated for each ligand PDB file.
for file in os.listdir(pdb_path):
    if file.endswith(".pdb"):
        # Part 1: Create a folder where the atoms.in, atoms.out, and pdb are stored.
        file_path = os.path.join(pdb_path, file)
        shutil.copy(file_path, atoms_path)
        in_file = os.path.join(main, "in.pdb")
        shutil.copy(file_path, in_file)
        new_working_directory = r"D:\TMD_7upg"
        os.chdir(new_working_directory)
        os.system(r'cmd /c "exe\Make_Atoms_In3.exe"')
        shutil.copy("atoms.in", atoms_path)
        shutil.move("data.out", atoms_path)
        original_path1 = os.path.join(atoms_path, "atoms.in")
        extension1 = '.in'
        new_name1 = file.replace('.pdb', extension1)
        new_path1 = os.path.join(atoms_path, new_name1)
        os.rename(original_path1, new_path1)
        original_path2 = os.path.join(atoms_path, "data.out")
        extension2 = '.out'
        new_name2 = file.replace('.pdb', extension2)
        new_path2 = os.path.join(atoms_path, new_name2)
        os.rename(original_path2, new_path2)

        # Part 2: Copy input files to main directory and run makezmat, TMD, and find_rot_bond.
        shutil.copy(os.path.join(input_path, "7upg.pdb"), main)
        shutil.copy(os.path.join(input_path, "data_makepdb.in"), main)
        shutil.copy(os.path.join(input_path, "data_orig.in"), main)
        shutil.copy(os.path.join(input_path, "numbers.in"), main)
        shutil.copy(os.path.join(input_path, "run1.bat"), main)
        shutil.copy(os.path.join(input_path, "run2.bat"), main)
        shutil.copy(os.path.join(input_path, "Stack_Calc3.bat"), main)
        new_working_directory = r"D:\TMD_7upg"
        os.chdir(new_working_directory)
        os.system(r'cmd /c "exe\Makezmat_new.exe"')
        os.remove("atoms.in")
        shutil.copy("data_makepdb.in", "data.in")
        os.system(r'cmd /c "exe\TMD3.exe"')
        shutil.move("data.add", "data_makepdb.add")
        find_rot_bond_exe = r'D:\TMD_7upg\exe\find_rot_bond3.exe'
        os.system(f'cmd /c "{find_rot_bond_exe}"')
        shutil.move("data_rot_bond.out", "data_find_rot_bond.out")
        shutil.copy("data_find_rot_bond.out", atoms_path)
        shutil.copy("data_new.in", "data.in")
        shutil.copy("data.in", atoms_path)
        os.remove("out.pdb")
        os.system(r'cmd /c "exe\TMD3.exe"')

        # Part 3: Create output 1 and output 2 folder and store output files there.
        subfolder = os.path.join(output_parent_folder, os.path.splitext(file)[0])
        os.makedirs(subfolder, exist_ok=True)
        subfolder2 = os.path.join(output_parent_folder2, os.path.splitext(file)[0])
        os.makedirs(subfolder2, exist_ok=True)
        poses1_folder = os.path.join(subfolder2, "poses1")
        os.makedirs(poses1_folder, exist_ok=True)
        poses2_folder = os.path.join(subfolder2, "poses2")
        os.makedirs(poses2_folder, exist_ok=True)
        out = os.path.join(main, "mol001.zmat")
        filer = file.replace("pdb", "zmat")
        shutil.move(out, os.path.join(main, filer))
        shutil.move(os.path.join(main, filer), subfolder)
        out1 = os.path.join(main, "fixvar.out")
        filer1 = file.replace("pdb", "fixvar")
        shutil.move(out1, os.path.join(main, filer1))
        shutil.move(os.path.join(main, filer1), subfolder)
        out2 = os.path.join(main, "data.add")
        filer2 = file.replace("pdb", "add")
        shutil.move(out2, os.path.join(main, filer2))
        shutil.move(os.path.join(main, filer2), subfolder)
        shutil.move(os.path.join(main, "makepdb.in"), subfolder)
        shutil.move(os.path.join(main, "makeonepdb.in"), subfolder)
        shutil.copy(os.path.join(main, "run1.bat"), subfolder)
        shutil.copy(os.path.join(main, "run2.bat"), subfolder)
        shutil.copy(os.path.join(main, "Stack_Calc3.bat"), subfolder)
        shutil.copy(os.path.join(main, "numbers.in"), subfolder)
        files_to_delete = [
            "in.pdb", "data_makepdb.in", "data_orig.in",
            "run1.bat", "run2.bat", "Stack_Calc3.bat",
            "atoms.in", "data_find_rot_bond.out",
            "out.pdb", "out.json", "mkz.add",
            "mkz_next_in.out", "dock.in", "go_dock.bat",
            "go_intra.bat", "go_makeonepdb.bat",
            "go_makepdb.bat", "intra.in", "sample_go_makepdb.txt",
            "data_makepdb.add", "data_new.in",
            "out000001.pdb",
        ]
        for name in files_to_delete:
            file_path = os.path.join(main, name)
            if os.path.exists(file_path):
                os.remove(file_path)

        # Part 4a: Rename data.in in the atoms folder after the compound.
        def rename_data_in_files(pdb_path, atoms_path):
            pdb_base_name = os.path.basename(subfolder)
            data_in_file_path = os.path.join(atoms_path, "data.in")
            new_data_in_name = os.path.join(atoms_path, pdb_base_name + "_data" + ".in")
            os.rename(data_in_file_path, new_data_in_name)

        rename_data_in_files(pdb_path, atoms_path)

        # Part 4b: Rename data_find_rot_bond.out in the atoms folder after the compound.
        def rename_rot_bond_files(pdb_path, atoms_path):
            pdb_base_name = os.path.basename(subfolder)
            rot_bond_file_path = os.path.join(atoms_path, "data_find_rot_bond.out")
            new_rot_bond_name = os.path.join(atoms_path, pdb_base_name + "_rot_bond" + ".out")
            os.rename(rot_bond_file_path, new_rot_bond_name)

        rename_rot_bond_files(pdb_path, atoms_path)

        # Part 5a: Run TMD using makepdb.in file and move poses output to "poses1" folder.
        for name in os.listdir(subfolder):
            if name.endswith(".fixvar"):
                shutil.copy(os.path.join(subfolder, name), os.path.join(main, "fixvar.in"))
            if name.endswith(".zmat"):
                shutil.copy(os.path.join(subfolder, name), os.path.join(main, "mol001.zmat"))
        for name in os.listdir(subfolder):
            if name == "makepdb.in":
                shutil.copy(os.path.join(subfolder, name), main)
                shutil.copy(os.path.join(subfolder, name), os.path.join(main, "data.in"))
                os.system(r'cmd /c "exe\TMD3.exe"')
                shutil.move(os.path.join(main, "data.add"), os.path.join(main, "makepdb.add"))
        for name in os.listdir(main):
            if "out0" in name:
                source_path = os.path.join(main, name)
                destination_path = os.path.join(poses1_folder, name)
                shutil.move(source_path, destination_path)
            if name == "out.pdb":
                shutil.move(os.path.join(main, name), poses1_folder)

        # Part 5b: Run TMD using makeonepdb.in file and generate all1.pdb file.
        # The "keep" folder is created up front so that it exists even if no "all" file is found.
        keep_folder = os.path.join(subfolder2, "keep")
        os.makedirs(keep_folder, exist_ok=True)
        for name in os.listdir(subfolder):
            if name == "makeonepdb.in":
                shutil.copy(os.path.join(subfolder, name), main)
                shutil.copy(os.path.join(subfolder, name), os.path.join(main, "data.in"))
                os.system(r'cmd /c "exe\TMD3.exe"')
                shutil.move(os.path.join(main, "data.add"), os.path.join(main, "makeonepdb.add"))
        for name in os.listdir(main):
            if "all" in name:
                source_path = os.path.join(main, name)
                destination_path = os.path.join(keep_folder, name)
                shutil.move(source_path, destination_path)
        os.rename(os.path.join(keep_folder, "all.pdb"), os.path.join(keep_folder, "all1.pdb"))
        files_to_delete2 = [
            "out.pdb", "torsions.txt", "go_order.bat", "go_torsions.bat",
            "go_makeonepdb.bat", "go_makepdb.bat", "fixvar.out",
            "compare.in",
            "names.in", "go_watgenorder.bat",
        ]
        for name in files_to_delete2:
            file_path = os.path.join(main, name)
            if os.path.exists(file_path):
                os.remove(file_path)

        # Part 6a: Run change_line.exe, generate makepdb2.in, and run TMD.
        for name in os.listdir(subfolder):
            if name == "numbers.in":
                shutil.copy(os.path.join(subfolder, name), main)
                os.system(r'cmd /c "exe\change_line.exe"')
        for name in os.listdir(main):
            if name == "makepdb2.in":
                shutil.copy(os.path.join(main, name), os.path.join(main, "data.in"))
                os.system(r'cmd /c "exe\TMD3.exe"')
                shutil.move("data.add", "makepdb2.add")
        for name in os.listdir(main):
            if "out0" in name:
                source_path = os.path.join(main, name)
                destination_path = os.path.join(poses2_folder, name)
                shutil.move(source_path, destination_path)
            if name == "out.pdb":
                shutil.move(os.path.join(main, name), poses2_folder)

        # Part 6b: Use makeonepdb2.in (written by change_line.exe), run TMD, and
        # generate all2.pdb file.
        for name in os.listdir(main):
            if name == "makeonepdb2.in":
                shutil.copy(os.path.join(main, name), os.path.join(main, "data.in"))
                os.system(r'cmd /c "exe\TMD3.exe"')
                shutil.move("data.add", "makeonepdb2.add")
        for name in os.listdir(main):
            if "all" in name:
                source_path = os.path.join(main, name)
                destination_path = os.path.join(keep_folder, name)
                shutil.move(source_path, destination_path)
        os.rename(os.path.join(keep_folder, "all.pdb"), os.path.join(keep_folder, "all2.pdb"))
        files_to_delete3 = [
            "out.pdb", "torsions.txt", "go_order.bat", "go_torsions.bat",
            "go_makeonepdb.bat", "go_makepdb.bat", "fixvar.out",
            "compare.in",
            "names.in", "go_watgenorder.bat", "fixvar.in", "mol001.zmat",
            "data.in",
        ]
        for name in files_to_delete3:
            file_path = os.path.join(main, name)
            if os.path.exists(file_path):
                os.remove(file_path)
        for filename in os.listdir(main):
            if "make" in filename:
                os.remove(filename)

        # Part 7: Perform stack_calculation
        shutil.copy("7upg.pdb", "prot.pdb")
        shutil.copy(os.path.join(keep_folder, "all1.pdb"), os.path.join(main, "in1.pdb"))
        shutil.copy(os.path.join(keep_folder, "all2.pdb"), os.path.join(main, "in2.pdb"))
        os.system(r'cmd /c "exe\Stack_Calc_final3.exe"')
        shutil.move("out1.pdb", subfolder2)
        shutil.move("out2.pdb", subfolder2)
        shutil.move("data.out", subfolder2)
        data_out_folder = r"D:\TMD_7upg\data_out"
        os.makedirs(data_out_folder, exist_ok=True)
        shutil.copy(os.path.join(subfolder2, "data.out"), data_out_folder)
        shutil.copy("best_ligand_stack.pdb", data_out_folder)

        # Part 8a: Rename data.out files based on the compound name and move them to
        # another folder.
        def rename_data_out_files(pdb_path, data_out_folder):
            pdb_base_name = os.path.basename(subfolder)
            data_out_file_path = os.path.join(data_out_folder, "data.out")
            new_data_out_name = os.path.join(data_out_folder, pdb_base_name + ".out")
            os.rename(data_out_file_path, new_data_out_name)

        rename_data_out_files(pdb_path, data_out_folder)
        os.remove("atoms.in")

        # Part 8b: Rename best_ligand_stack.pdb based on the compound name and move
        # them to data_out.
        def rename_best_ligand_files(pdb_path, data_out_folder):
            pdb_base_name = os.path.basename(subfolder)
            best_ligand_file_path = os.path.join(data_out_folder, "best_ligand_stack.pdb")
            new_best_ligand_name = os.path.join(data_out_folder, pdb_base_name + "_best" + ".pdb")
            os.rename(best_ligand_file_path, new_best_ligand_name)

        rename_best_ligand_files(pdb_path, data_out_folder)

        # Clean the main directory before the next ligand is processed.
        for name in os.listdir(main):
            if name.endswith((".pdb", ".bat", ".in")):
                os.remove(os.path.join(main, name))
        os.remove("data_temp.out")
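
The script above launches each TMD executable with os.system and does not check the exit status. One possible hardening, shown here only as a sketch, is to wrap the calls with subprocess.run so that a non-zero exit code stops the batch. The helper name run_tool is hypothetical, and the demonstration command uses the Python interpreter itself so that the example does not depend on the TMD executables.

```python
import subprocess
import sys
from pathlib import Path


def run_tool(command, workdir):
    """Run an external program in `workdir` and raise if it exits with an error."""
    result = subprocess.run(
        command, cwd=workdir, capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in for a call such as
# run_tool([r"D:\TMD_7upg\exe\TMD3.exe"], r"D:\TMD_7upg").
output = run_tool([sys.executable, "-c", "print('ok')"], Path.cwd())
print(output.strip())
```

With check=True, a failing executable raises subprocess.CalledProcessError, so a ligand that cannot be processed is reported immediately rather than producing missing files in a later step.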
Appendix B: Python Code for Extracting Parameters from Stacking Calculation Output.
import os
import pandas as pd
# Step1: Extract best ligand info
def extract_ligand_values(data):
lines = data.split('\n')
for i, line in enumerate(lines):
if "Summary of stacked ligands" in line:
values_line = lines[i + 2].split() # Skip the header line
if len(values_line) >= 8:
return (values_line[0], float(values_line[1]),
float(values_line[2]), float(values_line[3]),
float(values_line[4]), float(values_line[5]),
float(values_line[6]), float(values_line[7]))
return (None, None, None, None, None, None, None, None)
# Step2: Extract the total number of stackable ligand info
def count_ligand_rows(data):
lines = data.split('\n')
start_index = end_index = None
for i, line in enumerate(lines):
if "Summary of stacked ligands" in line:
start_index = i + 2
elif "Details of the best ligand" in line:
end_index = i - 1
break
return max(0, end_index - start_index) if start_index and end_index else
0
# Step3: Extract ring count
def count_ring_rows(data):
lines = data.split('\n')
start_index = end_index = None
for i, line in enumerate(lines):
if "Rings in ligand" in line:
start_index = i + 1
elif "nring avdis dis<5A stdev hbond hphob badcn" in line:
end_index = i - 1
break
return max(0, end_index - start_index) if start_index and end_index else
0
# Step4: Extract details of the best ligand
def extract_best_ligand_details(data):
lines = data.split('\n')
ring1_value = ring2_value = ring3_value = 0.0
for i, line in enumerate(lines):
if "Details of the best ligand" in line:
for j in range(i + 1, len(lines)):
64
if "ring 1" in lines[j]:
try:
ring1_value = float(lines[j].split()[-1])
except ValueError:
ring1_value = 0.0
elif "ring 2" in lines[j]:
try:
ring2_value = float(lines[j].split()[-1])
except ValueError:
ring2_value = 0.0
elif "ring 3" in lines[j]:
try:
ring3_value = float(lines[j].split()[-1])
except ValueError:
ring3_value = 0.0
# Stop searching once all three rings are found
if ring1_value != 0.0 and ring2_value != 0.0 and
ring3_value != 0.0:
break
break # Exit the loop once the relevant section is processed
return ring1_value, ring2_value, ring3_value
# Step5: Extract categories of ligands
def extract_ligand_categories(data):
lines = data.split('\n')
categories = {}
for i, line in enumerate(lines):
if "Categories of ligands" in line:
for j in range(i + 2, len(lines)):
parts = lines[j].split(':')
if len(parts) == 2:
category = parts[0].strip()
value = int(parts[1].strip())
categories[category] = value
else:
break
return categories
# Step6a: Determine if pyrogallol is present
def pyrogallol(data):
lines = data.split('\n')
ring_substitution_index = None
for i, line in enumerate(lines):
if "Ring substitution pattern" in line:
ring_substitution_index = i
break
if ring_substitution_index is not None:
pyrogallol_count = sum(1 for line in lines[ring_substitution_index +
1:] if "ring pyrogallol" in line)
return pyrogallol_count
else:
return 0 # Return 0 if "ring substitution pattern" is not found
65
# Step6b: Determine if catechol is present
def catechol(data):
lines = data.split('\n')
ring_substitution_index = None
for i, line in enumerate(lines):
if "Ring substitution pattern" in line:
ring_substitution_index = i
break
if ring_substitution_index is not None:
catechol_count = sum(1 for line in lines[ring_substitution_index +
1:] if "ring catechol" in line)
return catechol_count
else:
return 0 # Return 0 if "ring substitution pattern" is not found
# Step6c: Determine if dihydroxy_not_catechol is present
def dihydroxy_not_catechol(data):
lines = data.split('\n')
ring_substitution_index = None
for i, line in enumerate(lines):
if "Ring substitution pattern" in line:
ring_substitution_index = i
break
if ring_substitution_index is not None:
dydroxy_not_catechol_count = sum(
1 for line in lines[ring_substitution_index + 1:] if "ring
dihydroxy (not catechol)" in line)
return dydroxy_not_catechol_count
else:
return 0 # Return 0 if "ring substitution pattern" is not found
# Step6d: Determine if phenol is present
def phenol(data):
lines = data.split('\n')
ring_substitution_index = None
for i, line in enumerate(lines):
if "Ring substitution pattern" in line:
ring_substitution_index = i
break
if ring_substitution_index is not None:
phenol_count = sum(1 for line in lines[ring_substitution_index + 1:]
if "ring phenol" in line)
return phenol_count
else:
return 0 # Return 0 if "ring substitution pattern" is not found
# Directory containing the files
directory = 'D:/TMD_7upg/stack_calc_out'
output_data = []
66
# Step: Loop through each data.out file and compile everything into an Excel file
for filename in os.listdir(directory):
    if filename.endswith(".out"):
        file_path = os.path.join(directory, filename)
        with open(file_path, 'r') as file:
            data = file.read()
        ligand_rows = count_ligand_rows(data)
        ring_rows = count_ring_rows(data)
        ring_pyrogallol = pyrogallol(data)
        ring_catechol = catechol(data)
        ring_dihydroxy = dihydroxy_not_catechol(data)
        ring_phenol = phenol(data)
        (ligand, avdis, dis_less_than_5A, stdev, hbond, hphob, bdcon,
         score) = extract_ligand_values(data)
        ring1_value, ring2_value, ring3_value = extract_best_ligand_details(data)
        categories = extract_ligand_categories(data)
        filename_without_extension = os.path.splitext(filename)[0]
        output_row = [filename_without_extension, ligand_rows, ring_rows,
                      ligand, avdis, dis_less_than_5A, stdev,
                      hbond, hphob, bdcon, score, ring_pyrogallol,
                      ring_catechol,
                      ring_dihydroxy, ring_phenol, ring1_value,
                      ring2_value, ring3_value]
        # Append the eight interaction-category counts (0 if a category is absent)
        for key in ['Strong + Favorable', 'Strong + Unfavorable',
                    'Moderate + Favorable', 'Moderate + Unfavorable',
                    'Weak + Favorable', 'Weak + Unfavorable',
                    'None + Favorable', 'None + Unfavorable']:
            output_row.append(categories.get(key, 0))
        output_data.append(output_row)

# Create a DataFrame from the output data
columns = ['Ligands', 'Stackable ligand count', 'Ring count',
           'Best stackable ligand #', 'Average distance', 'Distance < 5A',
           'Stdev',
           'HBond', 'HPhob', 'BDCon', 'Score', 'Ring_Pyrogallol',
           'Ring_Catechol', 'Ring_Dihydroxy (Not Catechol)',
           'Ring_Phenol', 'Ring 1 Value', 'Ring 2 Value', 'Ring 3 Value',
           'Strong + Favorable', 'Strong + Unfavorable',
           'Moderate + Favorable', 'Moderate + Unfavorable',
           'Weak + Favorable', 'Weak + Unfavorable', 'None + Favorable',
           'None + Unfavorable']
df = pd.DataFrame(output_data, columns=columns)

# Save the DataFrame to an Excel file
excel_file_path = 'D:/TMD_7upg/Data.xlsx'
df.to_excel(excel_file_path, index=False)
print("Success!")
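
The four ring-substitution counters in Step 6 differ only in the substring they search for after the "Ring substitution pattern" line. As a possible simplification, they could share one helper. The sketch below is illustrative only: the name `count_ring_pattern` and the sample text are hypothetical, and the real layout of a TMD .out file may differ from this excerpt.

```python
def count_ring_pattern(data, label, header="Ring substitution pattern"):
    """Count lines containing `label` that come after the first `header` line."""
    lines = data.split('\n')
    for i, line in enumerate(lines):
        if header in line:
            # Only lines after the header are searched, as in the Step 6 functions
            return sum(1 for later in lines[i + 1:] if label in later)
    return 0  # Header not found


# Hypothetical excerpt of an output file, for illustration only
sample = "\n".join([
    "Ligand summary",
    "Ring substitution pattern",
    "1 ring catechol",
    "2 ring phenol",
    "3 ring catechol",
])

labels = ("ring pyrogallol", "ring catechol",
          "ring dihydroxy (not catechol)", "ring phenol")
counts = {label: count_ring_pattern(sample, label) for label in labels}
print(counts)
```

With such a helper, each Step 6 function would reduce to a single call, for example `count_ring_pattern(data, "ring catechol")`, while returning the same counts.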
Appendix C: Parameters Generated by TMD for all 295 inhibitors.
Ligands | Stackable ligand count | Ring count | Best stackable ligand # | Average distance | Distance < 5A | Stdev
(+)-Gallocatechin 4 2 2 5.18 14 6.41
(+)-Usnic acid 0 1
(-)-(S)-Equol 22 2 12 5.19 12.5 6.27
(-)-Catechin gallate 30 3 12 5.19 12.7 6.29
(-)-Epiafzelechin 6 2 2 5.18 14.5 6.34
(-)-Epicatechin gallate 44 2 7 5.23 12 5.97
(-)-Epigallocatechin Gallate 27 3 1 5.19 12.3 6.31
(-)-Epigallocatechin 0 2
(-)-Gallocatechin gallate 68 3 27 5.26 9.7 5.7
(-)-Gallocatechin 4 2 1 5.19 13.5 6.3
(2S)-6-Prenylnaringenin 522 2 354 5.26 11 5.71
(E)-Methyl 4-coumarate 1714 1 568 5.26 11 5.68
(S)-10-Hydroxycamptothecin 25 1 20 5.25 11 5.68
(±)-10-Hydroxycamptothecin 26 1 20 5.25 11 5.68
1,2,4-Trihydroxybenzene 92 1 41 5.26 11 5.68
1-Naphthol 22 2 9 5.25 11 5.68
2'-Hydroxy-4'-methylacetophenone 26 1 18 5.26 11 5.69
2,3,4-Trihydroxybenzoic acid 50 1 23 5.25 11 5.68
2,3-Dihydroxy-4-methoxyacetophenone 60 1 19 5.26 11 5.69
2,4-Dihydroxybenzoic acid 169 1 72 5.25 11 5.68
2,5-Dihydroxyacetophenone 87 1 43 5.26 11 5.69
2,6-Dibromophenol 26 1 15 5.26 11 5.68
2,6-Dihydroxybenzoic acid 65 1 31 5.25 11 5.68
2,6-Dimethylhydroquinone 27 1 18 5.25 9 5.69
2-Ethyl-6-methylphenol 8 1 5 5.25 11 5.68
2-Hydroxyphenylacetic acid 134 1 116 5.26 11 5.68
2-Methoxyestradiol 15 1 10 5.26 9 5.69
3',4'-Dihydroxyacetophenone 99 1 38 5.26 11 5.69
3'-Hydroxypterostilbene 1137 2 458 5.26 8 5.67
3'-Hydroxypuerarin 0 2
3,5-Dimethoxyphenol 147 1 72 5.25 11 5.68
3-(3-Hydroxyphenyl)propionic acid 766 1 585 5.26 11 5.68
3-Chloro-L-tyrosine 606 1 471 5.25 11 5.68
3-Hydroxybenzaldehyde 207 1 86 5.25 11 5.67
3-Hydroxybenzoic acid 159 1 69 5.26 11 5.68
3-Hydroxyhippuric acid 689 1 284 5.26 11 5.69
3-Hydroxyphenylacetic acid 325 1 256 5.25 11 5.67
3-Methoxytyramine 388 1 151 5.25 11 5.68
3-Methylcatechol 34 1 17 5.26 11 5.69
3-Nitro-L-tyrosine 784 1 420 5.26 11 5.69
3-O-Methylgallic acid 110 1 73 5.25 9 5.68
4'-Hydroxy-2'-methylacetophenone 112 1 39 5.26 11 5.68
4',5-Dihydroxyflavone 31 2 16 5.26 11 5.7
4'-Hydroxy-3'-methylacetophenone 90 1 36 5.26 11 5.69
4'-Hydroxychalcone 923 2 663 5.26 10.5 5.68
4'-Methoxyresveratrol 422 2 338 5.26 10 5.69
4-Hydroxyphenylpyruvic acid 1214 1 481 5.25 11 5.68
4-(1,2-Dihydroxyethyl)benzene-1,2-diol 210 1 72 5.26 11 5.69
4-Allylcatechol 388 1 173 5.26 11 5.68
4-Ethylresorcinol 41 1 16 5.26 11 5.69
4-Hydroxy-1H-indole-3-carbaldehyde 34 1 4 5.17 13 6.4
4-Hydroxybenzoic acid 268 1 90 5.26 11 5.68
4-Hydroxybenzyl cyanide 744 1 559 5.25 11 5.67
4-Hydroxybenzylamine 251 1 98 5.26 11 5.68
4-Hydroxychalcone 1130 2 738 5.26 11 5.69
4-Hydroxymephenytoin 33 1 2 5.2 12 6.13
4-Methylcatechol 66 1 29 5.26 11 5.68
4-Methyldaphnetin 17 1 9 5.25 11 5.67
4-Methylesculetin 42 1 24 5.26 11 5.68
4-Methylumbelliferone 43 1 32 5.26 11 5.69
5,6-Dihydroxyindole 59 1 24 5.26 11 5.68
5,7-Dihydroxy-4-methylcoumarin 40 1 32 5.25 11 5.68
5,7-Dimethoxyluteolin 185 2 93 5.26 8.5 5.69
5-Aminosalicylic Acid 74 1 40 5.26 11 5.68
5-Hydroxyferulic acid 198 1 139 5.26 9 5.69
5-Hydroxyflavone 35 2 18 5.26 11 5.69
5-Hydroxyindole-3-acetic acid 163 1 126 5.25 11 5.68
5-Hydroxyindole 75 1 28 5.26 11 5.69
5-Methoxysalicylic acid 181 1 88 5.26 11 5.68
6-Hydroxycoumarin 71 1 28 5.25 11 5.68
6-Hydroxymelatonin 299 1 143 5.26 11 5.69
7,4'-Di-O-methylapigenin 93 2 48 5.2 9.5 6.17
7,4'-Dihydroxyflavone 67 2 56 5.26 11 5.68
7-Hydroxy-4H-chromen-4-one 53 1 40 5.25 11 5.68
8-Deoxygartanin 7 2 1 5.13 14.5 6.69
8-Hydroxybergapten 25 1 23 5.16 14 6.51
Acacetin 66 2 25 5.2 10 6.17
Acetylshikonin 9 1 1 5.13 15 6.7
Alizarin 7 2 3 5.26 11 5.69
Alloimperatorin 35 1 12 5.16 14 6.48
Aloesin 0 1
Alpinetin 69 2 61 5.26 9 5.74
Apigenin 7-glucoside 6 2 1 5.14 14 6.66
Apocynin 123 1 40 5.25 11 5.68
Arbutin 148 1 114 5.27 9 5.69
Armillarisin A 164 1 114 5.26 11 5.69
Astragalin 4 2 2 5.23 12 5.96
Ayanin 65 2 24 5.18 13 6.34
Bavachin 3 2 2 5.19 13.5 6.3
Bavachinin 1289 2 855 5.25 9 5.74
Berberrubine (chloride) 8 2 1 5.17 14 6.41
Bergenin 0 1
Bisphenol A 147 2 114 5.26 11 5.68
Blumeatin 3 2 2 5.18 13.5 6.31
Brazilin 4 2 4 5.19 13 6.26
Brevifolincarboxylic acid 22 1 13 5.26 9 5.69
Butein 452 2 184 5.26 9.5 5.69
Butin 41 2 36 5.26 9.5 5.68
Calycosin 281 2 204 5.26 10.5 5.69
Cardamonin 329 2 114 5.26 11 5.69
Carvacrol 46 1 17 5.25 11 5.68
Catechin 3 2 2 5.17 14 6.47
Chrysin-7-O-glucuronide 6 2 4 5.1 14.5 6.95
Chrysoeriol 40 2 5 5.2 11 6.17
Columbamine 149 2 105 5.26 9.5 5.68
Colutehydroquinone 0 2
Coniferyl alcohol 386 1 191 5.25 11 5.68
Corylin 95 2 66 5.26 11 5.7
Corypalmine 64 2 11 5.24 10.5 5.88
Coumestrol 42 2 29 5.26 10 5.7
Creosol 56 1 27 5.25 11 5.68
Cryptochlorogenic acid 622 1 274 5.25 11 5.67
Cyanidin (Chloride) 38 2 22 5.26 8.5 5.69
Daidzein 126 2 46 5.26 10.5 5.69
Danshensu 712 1 281 5.25 11 5.68
Danthron 7 2 5 5.15 14 6.57
Daphnoretin 209 2 68 5.23 11.5 5.95
Dehydrodiisoeugenol 372 2 277 5.25 12 5.82
Delphinidin (chloride) 42 2 9 5.23 12 5.98
Demethyleneberberine 140 2 83 5.26 10 5.7
Deoxyarbutin 380 1 139 5.26 11 5.69
Dichotomitin 17 2 1 5.17 13 6.45
Dihydrocaffeic acid 559 1 189 5.25 11 5.67
Dihydrodaidzein 50 2 28 5.23 12 5.97
Dihydrokaempferol 71 2 48 5.25 9 5.76
Dihydromyricetin 2 2 1 5.14 13.5 6.65
Dimethylacrylalkannin 14 1 13 5.26 11 5.7
Diphyllin 11 3 3 5.14 14 6.71
DL-m-Tyrosine 544 1 415 5.25 11 5.68
DL-Norepinephrine (hydrochloride) 315 1 140 5.25 11 5.68
Ellagic acid 24 2 15 5.26 11 5.7
Emodin-8-glucoside 4 2 1 5.1 15 6.98
Emodin 5 2 4 5.15 14 6.57
Eriodictyol 3 2 2 5.17 14 6.41
Esculetin 56 1 23 5.25 11 5.68
Estriol 8 1 7 5.23 12 5.87
Ethyl gallate 438 1 200 5.25 11 5.68
Ethynyl Estradiol 18 1 13 5.23 12 5.87
Eugenol 356 1 154 5.25 11 5.67
Eupatilin 92 2 15 5.2 11 6.16
Eupatorin 72 2 27 5.26 11 5.7
Farrerol 7 2 3 5.19 13.5 6.26
Ferulic acid 430 1 179 5.25 11 5.67
Flavokawain C 1849 2 1174 5.26 9.5 5.68
Fraxetin 17 1 8 5.25 11 5.68
Fraxidin 22 1 9 5.16 14 6.52
Fraxin 0 1
Galangin 18 2 3 5.2 11.5 6.19
Gallic acid (hydrate) 90 1 40 5.26 11 5.68
Gallic acid 90 1 40 5.26 11 5.68
Genistein 119 2 32 5.26 11 5.69
Genistin 0 2
Genkwanin 32 2 16 5.2 10 6.17
Glabridin 1 2 1 5.16 14 6.48
Glycitein 103 2 44 5.26 10 5.69
Glycycoumarin 21 2 7 5.26 11 5.69
Gnetol 86 2 76 5.26 10 5.69
Groenlandicine 26 2 7 5.22 12 6
Guaiacol 68 1 35 5.26 11 5.68
Guaijaverin 4 2 1 5.2 13 6.21
Hematoxylin 6 2 1 5.21 12.5 6.15
Herbacetin 48 2 35 5.26 10.5 5.69
Higenamine (hydrochloride) 102 2 44 5.26 9 5.68
Hispidin 170 1 70 5.25 11 5.67
Homogentisic acid 261 1 203 5.25 11 5.68
Homovanillic acid 405 1 175 5.26 11 5.68
Homovanillyl alcohol 391 1 160 5.26 11 5.68
Hydroxygenkwanin 32 2 7 5.2 11 6.18
Hydroxyphenyllactic acid 1216 1 473 5.26 11 5.69
Hydroxytyrosol acetate 1688 1 720 5.26 11 5.68
Hydroxytyrosol 400 1 170 5.25 11 5.68
Icaritin 62 2 19 5.18 11.5 6.35
Irisolidone 0 2
Isocorydine 0 1
Isoformononetin 418 2 123 5.26 11 5.69
Isofraxidin 24 1 7 5.26 11 5.68
Isoliquiritigenin 789 2 534 5.26 9 5.68
Isomangiferin 8 2 5 5.26 10 5.7
Isoorientin 75 2 37 5.26 9 5.7
Isorhamnetin 50 2 21 5.2 9.5 6.17
Isorhapontigenin 248 2 106 5.26 8.5 5.68
Isovanillic acid 139 1 72 5.25 11 5.68
Isovanillin 163 1 78 5.25 11 5.68
Isovitexin 227 2 160 5.26 10.5 5.7
Jaceosidin 46 2 14 5.2 10 6.18
Jatrorrhizine (chloride) 179 2 114 5.26 9.5 5.7
Kaempferide 64 2 25 5.2 10 6.19
Kaempferol 107 2 78 5.26 11 5.7
L-5-Hydroxytryptophan 166 1 136 5.25 11 5.67
Laetanine 17 2 7 5.26 11 5.69
Leucocyanidin 1 2 1 5.16 14.5 6.52
Licoflavone A 17 2 6 5.26 10.5 5.69
Lucidin 5 2 1 5.14 14 6.68
Luteolin 7-O-glucuronide 187 2 101 5.25 10.5 5.74
Luteolinidin (chloride) 36 2 18 5.26 8.5 5.69
Maackiain 43 2 28 5.23 12 5.92
Medicarpin 40 2 10 5.2 14 6.15
Mequinol 282 1 109 5.25 11 5.68
Methyl 4-hydroxyphenylacetate 1319 1 482 5.26 11 5.69
Methyl caffeate 687 1 247 5.26 11 5.69
Methyl gallate 192 1 75 5.25 11 5.68
Methyl Salicylate 74 1 37 5.25 11 5.68
Methylnissolin 151 2 32 5.21 12 6.09
Miquelianin 33 2 26 5.26 9 5.69
Mollugin 3 2 1 5.1 15 6.96
Moracin M 108 2 36 5.26 9.5 5.69
Morin 13 2 2 5.2 12.5 6.2
Mosloflavone 38 2 19 5.26 11 5.7
N,N,O-Tridesmethylvenlafaxine 296 1 133 5.25 11 5.67
N-Acetyl-5-hydroxytryptamine 199 1 96 5.26 11 5.69
N-Acetyl-L-tyrosine 1424 1 542 5.26 11 5.69
Naphthazarin 17 1 7 5.25 11 5.67
Naringenin chalcone 176 2 131 5.26 10 5.69
Negletein 22 2 7 5.2 9.5 6.18
Neobavaisoflavone 398 2 128 5.26 8.5 5.69
Nepetin 10 2 8 5.17 14 6.41
Nevadensin 62 2 13 5.2 11 6.18
Norbergenin 12 1 5 5.26 11 5.69
Noreugenin 27 1 20 5.25 11 5.68
Noricaritin 38 2 33 5.14 13.5 6.67
Norswertianolin 0 2
O-Desmethylangolensin 342 2 197 5.26 7 5.66
Obtusifolin 11 2 5 5.26 11 5.69
Ochromycinone 3 2 3 5.25 11 5.68
Octopamine (hydrochloride) 505 1 172 5.25 11 5.68
Okanin 366 2 137 5.26 9.5 5.69
Ononetin 1238 2 930 5.26 8.5 5.65
Orcinol glucoside 94 1 38 5.25 11 5.68
Orcinol 36 1 30 5.25 11 5.69
Orientin 7 2 1 5.21 12.5 6.11
Oroxylin A 21 2 5 5.2 10 6.17
Oxyresveratrol 94 2 81 5.26 11 5.69
p-Coumaric acid 744 1 272 5.25 11 5.68
p-Hydroxymandelic acid 511 1 182 5.25 11 5.67
Paeonol 67 1 48 5.25 11 5.68
Pectolinarigenin 38 2 13 5.18 11.5 6.34
Phellodendrine (chloride) 48 2 37 5.24 11.5 5.85
Phloretin 1034 2 756 5.26 8 5.67
Piceatannol 86 2 78 5.26 10.5 5.68
Pinobanksin 3-acetate 0 2
Pinostilbene 136 2 100 5.26 8.5 5.68
Pinostrobin 3 2 1 5.19 13 6.27
Plumbagin 17 1 8 5.25 11 5.68
Propyl gallate 1085 1 474 5.26 11 5.68
Protocatechualdehyde 160 1 66 5.26 11 5.69
Protocatechuic acid 130 1 55 5.25 11 5.68
Prunetin 8 2 3 5.17 14 6.35
Pterostilbene 2648 2 1660 5.26 9 5.67
Purpurin 21 2 13 5.26 11 5.69
Pyrocatechuic acid 80 1 59 5.25 11 5.68
Pyrogallol 48 1 26 5.26 11 5.68
Quercetagetin 42 2 10 5.26 10 5.69
Quercetin (dihydrate) 21 2 9 5.2 11.5 6.19
Quercitrin 10 2 7 5.2 12 6.24
Quinizarin 8 2 5 5.26 11 5.69
Raspberry ketone 1385 1 503 5.26 11 5.68
Resveratrol 106 2 68 5.26 9 5.68
Retusin 55 2 38 5.26 11 5.7
Reynoutrin 5 2 1 5.19 13 6.24
Rhamnetin 15 2 7 5.2 11 6.19
Rhamnocitrin 28 2 13 5.2 10 6.17
Rhein 8-Glucoside 0 2
Rhodionin 0 2
Robinetin 44 2 33 5.26 8.5 5.69
Robustine 37 1 13 5.25 11 5.68
Sakuranetin 149 2 110 5.23 11.5 5.95
Salicylic acid 96 1 48 5.25 11 5.68
Scutellarin 0 2
Sesamol 84 1 33 5.26 11 5.68
Sinapaldehyde 239 1 73 5.26 11 5.69
Swertianolin 18 2 1 5.13 14 6.68
Syringaldehyde 100 1 26 5.26 11 5.69
Syringic acid 62 1 33 5.26 11 5.68
Tamarixetin 51 2 7 5.2 11.5 6.19
Tectochrysin 32 2 16 5.2 10.5 6.19
Tectorigenin 1 2 1 5.11 14 6.91
Tetrac 235 2 64 5.26 11 5.69
Triptophenolide 2 1 2 5.25 11 5.67
Tyramine 583 1 440 5.25 11 5.68
Tyrosol 661 1 233 5.25 11 5.68
Umbelliferone 74 1 58 5.25 11 5.68
Urolithin C 42 2 29 5.26 11 5.69
Usnic acid 0 1
Vanillyl alcohol 167 1 70 5.26 11 5.68
Vanilpyruvic acid 835 1 359 5.26 11 5.69
Wedelolactone 3 2 1 5.15 14 6.6
Wogonin 25 2 5 5.2 10.5 6.17
Xanthoxylin 65 1 7 5.2 12 6.11
Xanthurenic acid 20 1 4 5.17 13 6.38
Ligands HBond HPhob BDCon Score
(+)-Gallocatechin 0 0 0 0
(+)-Usnic acid
(-)-(S)-Equol 0 1 0 0.2
(-)-Catechin gallate 3 1 0 3.2
(-)-Epiafzelechin 0 1 0 0.2
(-)-Epicatechin gallate 3 0 0 3
(-)-Epigallocatechin Gallate 3 1 0 3.2
(-)-Epigallocatechin
(-)-Gallocatechin gallate 2 1 0 2.2
(-)-Gallocatechin 0 1 0 0.2
(2S)-6-Prenylnaringenin 0 1 0 0.2
(E)-Methyl 4-coumarate 0 1 0 0.2
(S)-10-Hydroxycamptothecin 0 1 0 0.2
(±)-10-Hydroxycamptothecin 0 1 0 0.2
1,2,4-Trihydroxybenzene 1 1 1 0.7
1-Naphthol 0 1 3 -1.3
2'-Hydroxy-4'-methylacetophenone 0 5 1 0.5
2,3,4-Trihydroxybenzoic acid 2 1 0 2.2
2,3-Dihydroxy-4-methoxyacetophenone 0 1 0 0.2
2,4-Dihydroxybenzoic acid 1 1 1 0.7
2,5-Dihydroxyacetophenone 0 1 0 0.2
2,6-Dibromophenol 0 1 0 0.2
2,6-Dihydroxybenzoic acid 3 1 3 1.7
2,6-Dimethylhydroquinone 0 1 1 -0.3
2-Ethyl-6-methylphenol 0 2 2 -0.6
2-Hydroxyphenylacetic acid 2 2 3 0.9
2-Methoxyestradiol 0 1 1 -0.3
3',4'-Dihydroxyacetophenone 0 1 0 0.2
3'-Hydroxypterostilbene 0 1 0 0.2
3'-Hydroxypuerarin
3,5-Dimethoxyphenol 1 4 2 0.8
3-(3-Hydroxyphenyl)propionic acid 0 1 0 0.2
3-Chloro-L-tyrosine 0 3 1 0.1
3-Hydroxybenzaldehyde 0 1 0 0.2
3-Hydroxybenzoic acid 0 1 0 0.2
3-Hydroxyhippuric acid 0 1 0 0.2
3-Hydroxyphenylacetic acid 1 1 1 0.7
3-Methoxytyramine 0 1 0 0.2
3-Methylcatechol 2 1 0 2.2
3-Nitro-L-tyrosine 3 4 6 0.8
3-O-Methylgallic acid 0 0 1 -0.5
4'-Hydroxy-2'-methylacetophenone 0 1 0 0.2
4',5-Dihydroxyflavone 2 1 4 0.2
4'-Hydroxy-3'-methylacetophenone 0 1 2 -0.8
4'-Hydroxychalcone 0 1 0 0.2
4'-Methoxyresveratrol 2 1 0 2.2
4-Hydroxyphenylpyruvic acid 0 1 0 0.2
4-(1,2-Dihydroxyethyl)benzene-1,2-diol 0 1 0 0.2
4-Allylcatechol 0 1 0 0.2
4-Ethylresorcinol 0 2 0 0.4
4-Hydroxy-1H-indole-3-carbaldehyde 2 3 4 0.6
4-Hydroxybenzoic acid 0 1 0 0.2
4-Hydroxybenzyl cyanide 0 1 0 0.2
4-Hydroxybenzylamine 0 1 0 0.2
4-Hydroxychalcone 0 1 0 0.2
4-Hydroxymephenytoin 1 1 0 1.2
4-Methylcatechol 1 1 0 1.2
4-Methyldaphnetin 2 1 1 1.7
4-Methylesculetin 6 5 5 4.5
4-Methylumbelliferone 0 1 0 0.2
5,6-Dihydroxyindole 0 1 0 0.2
5,7-Dihydroxy-4-methylcoumarin 4 2 3 2.9
5,7-Dimethoxyluteolin 0 0 2 -1
5-Aminosalicylic Acid 3 1 3 1.7
5-Hydroxyferulic acid 0 0 2 -1
5-Hydroxyflavone 2 1 3 0.7
5-Hydroxyindole-3-acetic acid 2 3 4 0.6
5-Hydroxyindole 0 1 0 0.2
5-Methoxysalicylic acid 2 1 2 1.2
6-Hydroxycoumarin 0 1 0 0.2
6-Hydroxymelatonin 0 2 0 0.4
7,4'-Di-O-methylapigenin 1 1 1 0.7
7,4'-Dihydroxyflavone 0 2 1 -0.1
7-Hydroxy-4H-chromen-4-one 0 1 0 0.2
8-Deoxygartanin 0 1 0 0.2
8-Hydroxybergapten 4 6 2 4.2
Acacetin 1 0 1 0.5
Acetylshikonin 1 1 1 0.7
Alizarin 2 1 5 -0.3
Alloimperatorin 2 2 1 1.9
Aloesin
Alpinetin 0 0 1 -0.5
Apigenin 7-glucoside 0 2 0 0.4
Apocynin 0 1 1 -0.3
Arbutin 0 1 0 0.2
Armillarisin A 4 1 2 3.2
Astragalin 4 1 3 2.7
Ayanin 0 1 0 0.2
Bavachin 0 3 4 -1.4
Bavachinin 0 3 1 0.1
Berberrubine (chloride) 1 5 3 0.5
Bergenin
Bisphenol A 0 1 0 0.2
Blumeatin 0 5 1 0.5
Brazilin 1 3 1 1.1
Brevifolincarboxylic acid 1 0 1 0.5
Butein 3 1 0 3.2
Butin 0 1 0 0.2
Calycosin 0 1 0 0.2
Cardamonin 1 4 2 0.8
Carvacrol 0 2 0 0.4
Catechin 1 1 0 1.2
Chrysin-7-O-glucuronide 0 0 0 0
Chrysoeriol 1 0 0 1
Columbamine 0 1 1 -0.3
Colutehydroquinone
Coniferyl alcohol 0 1 0 0.2
Corylin 0 1 0 0.2
Corypalmine 0 1 1 -0.3
Coumestrol 0 1 0 0.2
Creosol 0 1 1 -0.3
Cryptochlorogenic acid 1 3 3 0.1
Cyanidin (Chloride) 0 0 1 -0.5
Daidzein 0 1 0 0.2
Danshensu 1 1 0 1.2
Danthron 1 0 3 -0.5
Daphnoretin 0 1 0 0.2
Dehydrodiisoeugenol 1 5 3 0.5
Delphinidin (chloride) 2 0 1 1.5
Demethyleneberberine 0 0 2 -1
Deoxyarbutin 0 1 0 0.2
Dichotomitin 2 2 1 1.9
Dihydrocaffeic acid 0 1 0 0.2
Dihydrodaidzein 0 0 0 0
Dihydrokaempferol 0 3 1 0.1
Dihydromyricetin 2 0 3 0.5
Dimethylacrylalkannin 2 0 4 0
Diphyllin 1 1 2 0.2
DL-m-Tyrosine 1 1 1 0.7
DL-Norepinephrine (hydrochloride) 1 1 0 1.2
Ellagic acid 5 1 2 4.2
Emodin-8-glucoside 0 5 0 1
Emodin 2 0 5 -0.5
Eriodictyol 3 0 5 0.5
Esculetin 0 1 0 0.2
Estriol 0 4 1 0.3
Ethyl gallate 2 1 0 2.2
Ethynyl Estradiol 0 4 1 0.3
Eugenol 2 1 0 2.2
Eupatilin 1 0 1 0.5
Eupatorin 3 5 6 1
Farrerol 1 3 2 0.6
Ferulic acid 5 2 3 3.9
Flavokawain C 0 1 0 0.2
Fraxetin 5 4 3 4.3
Fraxidin 2 1 1 1.7
Fraxin
Galangin 1 1 0 1.2
Gallic acid (hydrate) 0 1 0 0.2
Gallic acid 0 1 0 0.2
Genistein 2 2 1 1.9
Genistin
Genkwanin 1 1 1 0.7
Glabridin 1 5 5 -0.5
Glycitein 0 1 0 0.2
Glycycoumarin 1 7 3 0.9
Gnetol 4 1 1 3.7
Groenlandicine 2 2 4 0.4
Guaiacol 2 1 1 1.7
Guaijaverin 2 1 0 2.2
Hematoxylin 2 0 2 1
Herbacetin 3 1 1 2.7
Higenamine (hydrochloride) 1 0 1 0.5
Hispidin 0 1 0 0.2
Homogentisic acid 1 1 1 0.7
Homovanillic acid 3 1 1 2.7
Homovanillyl alcohol 2 1 1 1.7
Hydroxygenkwanin 1 4 1 1.3
Hydroxyphenyllactic acid 1 1 0 1.2
Hydroxytyrosol acetate 0 1 0 0.2
Hydroxytyrosol 0 1 0 0.2
Icaritin 0 1 0 0.2
Irisolidone
Isocorydine
Isoformononetin 2 2 1 1.9
Isofraxidin 0 1 1 -0.3
Isoliquiritigenin 0 1 0 0.2
Isomangiferin 2 0 2 1
Isoorientin 1 0 1 0.5
Isorhamnetin 1 0 1 0.5
Isorhapontigenin 0 1 1 -0.3
Isovanillic acid 0 1 0 0.2
Isovanillin 0 1 0 0.2
Isovitexin 0 1 0 0.2
Jaceosidin 1 0 1 0.5
Jatrorrhizine (chloride) 0 1 2 -0.8
Kaempferide 2 0 1 1.5
Kaempferol 2 1 0 2.2
L-5-Hydroxytryptophan 0 1 0 0.2
Laetanine 0 1 0 0.2
Leucocyanidin 4 0 5 1.5
Licoflavone A 0 4 0 0.8
Lucidin 2 2 2 1.4
Luteolin 7-O-glucuronide 0 0 1 -0.5
Luteolinidin (chloride) 1 0 1 0.5
Maackiain 0 3 1 0.1
Medicarpin 0 1 1 -0.3
Mequinol 0 1 0 0.2
Methyl 4-hydroxyphenylacetate 0 1 0 0.2
Methyl caffeate 0 1 0 0.2
Methyl gallate 0 1 0 0.2
Methyl Salicylate 3 1 3 1.7
Methylnissolin 1 1 1 0.7
Miquelianin 0 0 1 -0.5
Mollugin 2 0 1 1.5
Moracin M 0 1 0 0.2
Morin 1 1 0 1.2
Mosloflavone 3 5 6 1
N,N,O-Tridesmethylvenlafaxine 2 1 1 1.7
N-Acetyl-5-hydroxytryptamine 1 1 0 1.2
N-Acetyl-L-tyrosine 4 4 6 1.8
Naphthazarin 2 1 4 0.2
Naringenin chalcone 2 1 0 2.2
Negletein 1 1 1 0.7
Neobavaisoflavone 0 0 1 -0.5
Nepetin 2 0 4 0
Nevadensin 1 0 1 0.5
Norbergenin 2 1 0 2.2
Noreugenin 2 1 0 2.2
Noricaritin 2 5 6 0
Norswertianolin
O-Desmethylangolensin 0 0 1 -0.5
Obtusifolin 0 2 0 0.4
Ochromycinone 2 1 5 -0.3
Octopamine (hydrochloride) 0 1 0 0.2
Okanin 0 1 0 0.2
Ononetin 2 1 0 2.2
Orcinol glucoside 1 1 2 0.2
Orcinol 2 1 0 2.2
Orientin 0 1 1 -0.3
Oroxylin A 1 0 1 0.5
Oxyresveratrol 2 3 2 1.6
p-Coumaric acid 1 1 0 1.2
p-Hydroxymandelic acid 2 1 1 1.7
Paeonol 0 6 1 0.7
Pectolinarigenin 1 0 0 1
Phellodendrine (chloride) 0 1 0 0.2
Phloretin 0 1 0 0.2
Piceatannol 2 3 3 1.1
Pinobanksin 3-acetate
Pinostilbene 0 1 2 -0.8
Pinostrobin 1 4 1 1.3
Plumbagin 2 1 4 0.2
Propyl gallate 0 1 0 0.2
Protocatechualdehyde 0 1 0 0.2
Protocatechuic acid 0 1 0 0.2
Prunetin 1 8 7 -0.9
Pterostilbene 0 1 0 0.2
Purpurin 3 1 1 2.7
Pyrocatechuic acid 1 1 1 0.7
Pyrogallol 0 1 0 0.2
Quercetagetin 2 1 0 2.2
Quercetin (dihydrate) 2 0 1 1.5
Quercitrin 2 0 0 2
Quinizarin 2 1 5 -0.3
Raspberry ketone 0 1 0 0.2
Resveratrol 0 0 1 -0.5
Retusin 4 4 6 1.8
Reynoutrin 2 1 0 2.2
Rhamnetin 2 1 1 1.7
Rhamnocitrin 1 1 1 0.7
Rhein 8-Glucoside
Rhodionin
Robinetin 0 0 1 -0.5
Robustine 2 1 1 1.7
Sakuranetin 1 1 0 1.2
Salicylic acid 3 1 3 1.7
Scutellarin
Sesamol 1 3 1 1.1
Sinapaldehyde 3 3 3 2.1
Swertianolin 2 11 5 1.7
Syringaldehyde 0 1 0 0.2
Syringic acid 2 2 1 1.9
Tamarixetin 1 1 0 1.2
Tectochrysin 1 1 1 0.7
Tectorigenin 3 5 7 0.5
Tetrac 0 1 0 0.2
Triptophenolide 0 3 0 0.6
Tyramine 0 1 0 0.2
Tyrosol 0 2 0 0.4
Umbelliferone 4 1 2 3.2
Urolithin C 0 1 0 0.2
Usnic acid
Vanillyl alcohol 0 1 0 0.2
Vanilpyruvic acid 0 1 0 0.2
Wedelolactone 4 9 10 0.8
Wogonin 1 0 1 0.5
Xanthoxylin 4 5 5 2.5
Xanthurenic acid 1 4 1 1.3
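
The Score values in the HBond / HPhob / BDCon table appear consistent with a simple weighted sum of the three counts, Score = HBond + 0.2 × HPhob − 0.5 × BDCon. This relation is inferred here from the tabulated rows and is an assumption, not a definition given in this appendix; the helper name `tmd_score` is hypothetical. A minimal check against a few rows copied from the table:

```python
import math


def tmd_score(hbond, hphob, bdcon):
    # Assumed weights, inferred from the tabulated rows (not stated in the source)
    return hbond + 0.2 * hphob - 0.5 * bdcon


# (ligand, HBond, HPhob, BDCon, Score) copied from the table
rows = [
    ("(-)-Catechin gallate", 3, 1, 0, 3.2),
    ("1-Naphthol", 0, 1, 3, -1.3),
    ("4-Methylesculetin", 6, 5, 5, 4.5),
    ("Prunetin", 1, 8, 7, -0.9),
]

# Collect any ligand whose tabulated Score disagrees with the assumed relation
mismatches = [name for name, hb, hp, bd, score in rows
              if not math.isclose(tmd_score(hb, hp, bd), score, abs_tol=1e-9)]
print(mismatches)
```

An empty list means the sampled rows agree with the assumed weights; rows with blank entries (ligands with no stackable pose) are not covered by this check.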
Ligands S + F S + U M + F M + U W + F W + U N + F N + U
(S = Strong, M = Moderate, W = Weak, N = None; F = Favorable, U = Unfavorable)
(+)-Gallocatechin 0 0 1 1 1 1 48 14
(+)-Usnic acid 0 0 0 0 0 0 31 9
(-)-(S)-Equol 0 0 2 1 13 6 48 38
(-)-Catechin gallate 0 0 4 0 23 3 1020 109
(-)-Epiafzelechin 0 0 2 0 4 0 53 25
(-)-Epicatechin gallate 2 0 15 1 24 2 739 113
(-)-Epigallocatechin Gallate 0 0 8 0 16 3 890 83
(-)-Epigallocatechin 0 0 0 0 0 0 9 2
(-)-Gallocatechin gallate 10 0 35 2 20 1 740 87
(-)-Gallocatechin 0 0 3 0 1 0 57 22
(2S)-6-Prenylnaringenin 23 0 187 55 156 101 678 442
(E)-Methyl 4-coumarate 192 48 279 83 586 526 560 355
(S)-10-Hydroxycamptothecin 6 4 4 3 4 4 102 63
(±)-10-Hydroxycamptothecin 9 3 8 4 0 2 108 73
1,2,4-Trihydroxybenzene 10 2 12 11 28 29 54 26
1-Naphthol 0 1 2 2 7 10 31 33
2'-Hydroxy-4'-methylacetophenone 2 0 4 0 8 12 69 48
2,3,4-Trihydroxybenzoic acid 5 0 16 4 23 2 173 37
2,3-Dihydroxy-4-methoxyacetophenone 6 0 11 0 37 6 180 43
2,4-Dihydroxybenzoic acid 30 6 42 20 36 35 183 80
2,5-Dihydroxyacetophenone 13 4 16 16 16 22 108 73
2,6-Dibromophenol 2 0 4 1 6 13 45 56
2,6-Dihydroxybenzoic acid 6 0 23 4 28 4 158 75
2,6-Dimethylhydroquinone 2 3 3 3 5 11 24 24
2-Ethyl-6-methylphenol 0 2 2 1 0 3 45 38
2-Hydroxyphenylacetic acid 13 3 29 6 57 26 293 125
2-Methoxyestradiol 2 2 3 1 2 5 29 17
3',4'-Dihydroxyacetophenone 6 3 18 4 46 22 92 45
3'-Hydroxypterostilbene 63 27 209 246 310 282 1300 840
3'-Hydroxypuerarin 0 0 0 0 0 0 83 17
3,5-Dimethoxyphenol 15 6 28 15 54 29 277 151
3-(3-Hydroxyphenyl)propionic acid 60 18 94 54 306 234 626 347
3-Chloro-L-tyrosine 102 37 141 38 142 146 803 432
3-Hydroxybenzaldehyde 21 6 30 26 63 61 113 100
3-Hydroxybenzoic acid 19 3 22 22 50 43 153 78
3-Hydroxyhippuric acid 79 15 91 74 225 205 713 360
3-Hydroxyphenylacetic acid 43 7 42 31 113 89 260 167
3-Methoxytyramine 57 18 77 56 85 95 506 260
3-Methylcatechol 4 1 7 1 13 8 43 28
3-Nitro-L-tyrosine 90 41 249 15 298 91 2476 601
3-O-Methylgallic acid 17 1 31 6 50 5 369 109
4'-Hydroxy-2'-methylacetophenone 17 10 19 15 22 29 70 67
4',5-Dihydroxyflavone 1 0 7 2 12 9 63 45
4'-Hydroxy-3'-methylacetophenone 12 6 14 6 18 34 68 44
4'-Hydroxychalcone 22 12 166 109 294 320 481 365
4'-Methoxyresveratrol 12 9 90 60 141 110 605 390
4-Hydroxyphenylpyruvic acid 172 33 175 65 462 307 839 364
4-(1,2-Dihydroxyethyl)benzene-1,2-diol 14 5 45 7 75 64 235 75
4-Allylcatechol 50 9 81 24 132 92 266 138
4-Ethylresorcinol 2 0 3 1 9 26 59 37
4-Hydroxy-1H-indole-3-carbaldehyde 0 0 4 1 22 7 118 58
4-Hydroxybenzoic acid 30 6 37 26 88 81 116 59
4-Hydroxybenzyl cyanide 93 15 96 42 279 219 224 140
4-Hydroxybenzylamine 30 6 32 27 65 91 57 41
4-Hydroxychalcone 32 11 217 108 434 328 561 424
4-Hydroxymephenytoin 0 0 16 8 5 4 289 125
4-Methylcatechol 6 1 13 5 23 18 36 20
4-Methyldaphnetin 1 0 4 1 9 2 40 15
4-Methylesculetin 5 1 10 3 16 7 39 20
4-Methylumbelliferone 6 2 11 1 11 12 38 23
5,6-Dihydroxyindole 4 1 10 4 24 16 49 17
5,7-Dihydroxy-4-methylcoumarin 7 1 7 3 13 9 41 22
5,7-Dimethoxyluteolin 6 6 56 16 63 38 271 170
5-Aminosalicylic Acid 11 0 21 2 33 7 136 39
5-Hydroxyferulic acid 9 9 57 4 70 49 683 215
5-Hydroxyflavone 1 0 9 2 15 8 68 38
5-Hydroxyindole-3-acetic acid 31 4 17 17 41 53 286 139
5-Hydroxyindole 7 3 10 7 23 25 36 34
5-Methoxysalicylic acid 36 0 61 4 65 15 401 119
6-Hydroxycoumarin 7 3 9 5 24 23 34 23
6-Hydroxymelatonin 21 9 79 5 113 72 1042 491
7,4'-Di-O-methylapigenin 0 0 18 3 46 26 419 133
7,4'-Dihydroxyflavone 3 2 13 11 18 20 76 45
7-Hydroxy-4H-chromen-4-one 9 2 12 4 15 11 45 31
8-Deoxygartanin 0 0 0 0 7 0 221 125
8-Hydroxybergapten 0 0 0 0 17 8 97 29
Acacetin 0 0 9 6 28 23 202 68
Acetylshikonin 0 0 0 0 9 0 138 61
Alizarin 0 1 1 0 5 0 29 14
Alloimperatorin 0 0 7 0 14 14 162 57
Aloesin 0 0 0 0 0 0 59 43
Alpinetin 4 2 16 5 29 13 159 103
Apigenin 7-glucoside 0 0 0 0 6 0 258 81
Apocynin 15 9 21 12 35 31 228 99
Arbutin 15 1 34 1 81 16 737 448
Armillarisin A 28 6 52 3 42 33 277 93
Astragalin 1 0 0 1 2 0 328 125
Ayanin 0 0 9 6 38 12 1055 190
Bavachin 0 0 0 2 0 1 57 52
Bavachinin 108 30 407 226 344 174 1663 957
Berberrubine (chloride) 0 0 1 0 4 3 26 8
Bergenin 0 0 0 0 0 0 15 0
Bisphenol A 6 2 39 24 37 39 165 138
Blumeatin 0 0 1 0 2 0 119 27
Brazilin 0 0 2 0 1 1 16 8
Brevifolincarboxylic acid 4 0 2 0 16 0 112 14
Butein 15 9 143 44 156 85 847 295
Butin 3 2 13 7 4 12 81 41
Calycosin 27 6 71 55 62 60 348 166
Cardamonin 30 8 106 36 101 48 637 349
Carvacrol 4 1 6 4 15 16 30 41
Catechin 0 0 1 0 2 0 30 21
Chrysin-7-O-glucuronide 0 0 0 0 3 3 451 128
Chrysoeriol 0 0 3 6 12 19 177 75
Columbamine 18 18 16 18 15 64 258 121
Colutehydroquinone 0 0 0 0 0 0 89 12
Coniferyl alcohol 49 16 80 19 132 90 579 304
Corylin 7 2 25 20 20 21 107 55
Corypalmine 0 18 6 0 22 18 223 183
Coumestrol 8 1 8 2 11 12 35 24
Creosol 6 3 11 7 12 17 69 53
Cryptochlorogenic acid 58 13 174 19 244 114 2418 791
Cyanidin (Chloride) 1 1 10 4 12 10 62 27
Daidzein 10 2 26 21 37 30 105 50
Danshensu 111 19 133 32 287 130 812 270
Danthron 0 0 0 0 5 2 31 18
Daphnoretin 4 5 52 27 63 58 457 218
Dehydrodiisoeugenol 48 21 51 102 45 105 975 578
Delphinidin (chloride) 2 0 17 1 20 2 87 11
Demethyleneberberine 15 9 23 17 38 38 163 80
Deoxyarbutin 44 12 45 29 123 127 148 105
Dichotomitin 0 0 9 0 8 0 214 31
Dihydrocaffeic acid 50 13 84 24 271 117 670 214
Dihydrodaidzein 2 1 19 7 15 6 62 57
Dihydrokaempferol 6 2 23 11 20 9 77 38
Dihydromyricetin 0 0 0 0 2 0 52 10
Dimethylacrylalkannin 1 1 0 0 12 0 159 80
Diphyllin 0 0 0 0 9 2 257 86
DL-m-Tyrosine 63 11 58 50 208 154 604 332
DL-Norepinephrine (hydrochloride) 45 8 82 22 92 66 296 101
Ellagic acid 6 0 2 0 14 2 43 10
Emodin-8-glucoside 0 0 0 0 4 0 244 83
Emodin 0 0 0 0 3 2 25 10
Eriodictyol 0 0 1 0 0 2 15 8
Esculetin 5 1 11 2 22 15 39 13
Estriol 2 0 3 1 1 1 15 19
Ethyl gallate 58 0 111 0 226 43 926 146
Ethynyl Estradiol 6 0 6 0 0 6 54 39
Eugenol 57 19 64 37 80 99 504 321
Eupatilin 0 0 33 0 53 6 781 236
Eupatorin 6 0 19 0 35 12 613 161
Farrerol 0 0 1 0 4 2 33 11
Ferulic acid 55 14 100 25 132 104 649 221
Flavokawain C 171 39 464 243 536 396 2437 1083
Fraxetin 1 0 3 0 11 2 86 33
Fraxidin 0 0 0 0 21 1 168 48
Fraxin 0 0 0 0 0 0 147 20
Galangin 0 0 2 2 7 7 74 23
Gallic acid (hydrate) 12 0 26 1 45 6 168 35
Gallic acid 12 0 26 1 45 6 168 35
Genistein 9 1 27 15 34 33 75 43
Genistin 0 0 0 0 0 0 100 31
Genkwanin 0 0 6 1 19 6 151 53
Glabridin 0 0 0 1 0 0 11 8
Glycitein 8 2 21 15 30 27 174 75
Glycycoumarin 2 0 7 3 4 5 223 127
Gnetol 2 2 12 18 21 31 200 104
Groenlandicine 2 0 3 3 8 10 40 19
Guaiacol 7 2 8 8 20 23 106 70
Guaijaverin 0 0 1 0 3 0 108 30
Hematoxylin 0 0 2 3 1 0 9 6
Herbacetin 3 0 15 1 25 4 83 22
Higenamine (hydrochloride) 7 1 21 12 37 24 146 92
Hispidin 17 6 12 12 83 40 224 82
Homogentisic acid 40 13 32 24 75 77 298 139
Homovanillic acid 62 18 101 38 98 88 647 301
Homovanillyl alcohol 59 18 68 55 85 106 509 285
Hydroxygenkwanin 0 0 6 1 19 6 127 54
Hydroxyphenyllactic acid 161 34 176 39 495 311 766 382
Hydroxytyrosol acetate 157 1 348 58 716 408 1901 649
Hydroxytyrosol 52 11 68 20 150 99 299 124
Icaritin 0 0 3 0 47 12 710 138
Irisolidone 0 0 0 0 0 0 146 37
Isocorydine 0 0 0 0 0 0 5 0
Isoformononetin 39 6 87 62 120 104 245 149
Isofraxidin 0 2 5 0 16 1 143 37
Isoliquiritigenin 31 7 180 90 275 206 739 409
Isomangiferin 4 0 2 0 2 0 233 62
Isoorientin 10 0 22 7 28 8 1278 300
Isorhamnetin 0 0 9 6 24 11 213 41
Isorhapontigenin 13 4 59 39 60 73 463 205
Isovanillic acid 18 6 26 12 46 31 265 87
Isovanillin 21 6 29 17 48 42 251 126
Isovitexin 32 2 60 22 72 39 1321 644
Jaceosidin 0 0 11 1 22 12 283 43
Jatrorrhizine (chloride) 18 18 35 18 34 56 276 229
Kaempferide 0 0 9 6 32 17 221 54
Kaempferol 8 1 24 15 36 23 79 33
L-5-Hydroxytryptophan 21 7 31 12 47 48 589 288
Laetanine 2 0 3 1 3 8 66 20
Leucocyanidin 0 0 0 0 1 0 14 7
Licoflavone A 1 0 3 0 1 12 78 72
Lucidin 0 0 0 0 4 1 50 19
Luteolin 7-O-glucuronide 10 9 62 33 59 14 1761 884
Luteolinidin (chloride) 2 0 8 5 14 7 59 35
Maackiain 3 0 6 10 16 8 23 11
Medicarpin 0 0 15 5 7 13 49 33
Mequinol 30 6 35 23 82 106 75 53
Methyl 4-hydroxyphenylacetate 175 44 156 96 426 422 596 364
Methyl caffeate 44 24 96 52 237 234 588 232
Methyl gallate 22 0 64 0 91 15 365 62
Methyl Salicylate 10 1 15 3 27 18 264 133
Methylnissolin 0 0 43 63 24 21 169 116
Miquelianin 3 3 5 4 11 7 460 173
Mollugin 0 0 0 0 2 1 78 10
Moracin M 8 1 22 16 32 29 81 52
Morin 0 0 3 1 5 4 71 25
Mosloflavone 2 0 11 0 20 5 247 54
N,N,O-Tridesmethylvenlafaxine 43 15 54 17 90 77 466 325
N-Acetyl-5-hydroxytryptamine 17 9 28 21 36 88 437 273
N-Acetyl-L-tyrosine 180 52 181 87 512 412 1777 912
Naphthazarin 1 0 2 0 8 6 40 21
Naringenin chalcone 15 2 50 25 50 34 249 138
Negletein 0 0 4 0 14 4 138 29
Neobavaisoflavone 37 7 114 69 80 91 696 374
Nepetin 0 0 0 1 3 6 86 25
Nevadensin 0 0 12 0 44 6 782 166
Norbergenin 4 0 3 0 4 1 63 4
Noreugenin 3 0 4 3 9 8 37 15
Noricaritin 0 0 0 0 32 6 688 150
Norswertianolin 0 0 0 0 0 0 83 10
O-Desmethylangolensin 26 6 82 62 86 80 642 379
Obtusifolin 1 0 0 0 6 4 41 16
Ochromycinone 0 1 1 0 1 0 22 9
Octopamine (hydrochloride) 72 18 78 43 152 142 200 129
Okanin 19 3 69 76 103 96 522 245
Ononetin 61 19 325 190 386 257 1670 764
Orcinol glucoside 15 6 14 0 30 29 370 196
Orcinol 4 1 3 6 10 12 37 29
Orientin 0 0 4 3 0 0 240 144
Oroxylin A 0 0 4 0 12 5 96 20
Oxyresveratrol 1 0 9 20 22 42 198 101
p-Coumaric acid 83 18 109 42 290 202 207 111
p-Hydroxymandelic acid 78 13 90 40 164 126 321 151
Paeonol 10 0 18 5 20 14 198 87
Pectolinarigenin 0 0 3 3 11 21 238 87
Phellodendrine (chloride) 12 0 12 15 3 6 63 71
Phloretin 28 19 255 148 331 253 1488 919
Piceatannol 2 2 8 24 15 35 200 114
Pinobanksin 3-acetate 0 0 0 0 0 0 122 111
Pinostilbene 1 3 24 7 65 36 430 295
Pinostrobin 0 0 1 0 2 0 47 28
Plumbagin 1 0 3 0 8 5 28 21
Propyl gallate 125 0 330 13 461 156 2471 416
Protocatechualdehyde 12 3 25 5 73 42 129 67
Protocatechuic acid 13 1 24 8 58 26 156 44
Prunetin 0 0 0 2 2 4 105 38
Pterostilbene 213 60 484 438 824 629 1822 1373
Purpurin 1 0 0 0 13 7 30 15
Pyrocatechuic acid 11 1 23 6 26 13 175 68
Pyrogallol 5 0 12 1 23 7 78 26
Quercetagetin 1 0 13 1 21 6 102 10
Quercetin (dihydrate) 0 0 3 2 11 5 74 16
Quercitrin 0 0 2 4 3 1 229 78
Quinizarin 0 1 0 0 7 0 29 18
Raspberry ketone 170 43 176 108 435 453 488 368
Resveratrol 1 3 7 27 20 48 191 115
Retusin 3 0 13 6 24 9 991 216
Reynoutrin 0 0 1 0 4 0 95 28
Rhamnetin 0 0 5 0 9 1 124 33
Rhamnocitrin 0 0 5 0 23 0 174 27
Rhein 8-Glucoside 0 0 0 0 0 0 30 3
Rhodionin 0 0 0 0 0 0 141 15
Robinetin 3 1 10 8 11 11 93 39
Robustine 3 0 6 0 17 11 84 39
Sakuranetin 6 3 57 18 41 24 148 141
Salicylic acid 11 0 29 2 39 15 174 85
Scutellarin 0 0 0 0 0 0 424 77
Sesamol 10 2 12 9 27 24 37 25
Sinapaldehyde 24 2 51 0 120 42 1042 190
Swertianolin 0 0 0 0 18 0 347 41
Syringaldehyde 18 0 30 0 38 14 406 130
Syringic acid 12 0 18 3 27 2 427 83
Tamarixetin 0 0 6 6 26 13 217 45
Tectochrysin 0 0 6 1 19 6 126 68
Tectorigenin 0 0 0 0 1 0 65 17
Tetrac 9 8 41 70 17 90 1321 1202
Triptophenolide 1 0 0 0 0 1 8 4
Tyramine 80 19 72 50 165 197 154 118
Tyrosol 86 18 86 55 206 210 175 124
Umbelliferone 10 1 10 5 27 21 34 26
Urolithin C 8 1 9 2 10 12 40 22
Usnic acid 0 0 0 0 0 0 16 3
Vanillyl alcohol 24 6 27 24 38 48 233 121
Vanilpyruvic acid 132 31 182 64 229 197 1921 683
Wedelolactone 0 0 0 0 3 0 31 22
Wogonin 0 0 2 0 16 7 172 89
Xanthoxylin 0 0 17 6 21 21 494 187
Xanthurenic acid 0 0 4 0 12 4 112 48
Abstract
Alzheimer's disease is a progressive neurodegenerative disorder associated with the aggregation of tau into intracellular neurofibrillary tangles. The structure and function of tau fibrils have been studied extensively using modern techniques such as cryo-EM. Cryo-EM has revealed that inhibitors of tau fibrils such as EGCG form highly stable stacks as they bind to the fibrils. Other inhibitors of tau fibril formation may exhibit similar stacking behavior. In this work, a predictive approach is presented using a machine learning model built with KNIME. Initially, general descriptors from a polyphenol database are used to construct the AI model. To improve the model's accuracy, molecular modeling studies with TMD, a tool currently under development in our laboratory, were employed to generate more specific descriptors, such as a stacking score, for incorporation into the KNIME AI model. This method may offer a novel approach to understanding the binding and inhibitory nature of tau fibril inhibitors.
Asset Metadata
Creator
Zheng, Zipeng (author)
Core Title
Integration of KNIME and molecular docking for evaluation of tau fibril inhibitors
School
School of Pharmacy
Degree
Master of Science
Degree Program
Molecular Pharmacology and Toxicology
Degree Conferral Date
2024-08
Publication Date
03/03/2025
Defense Date
08/28/2024
Publisher
Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag
KNIME,machine learning,molecular descriptors,molecular docking,OAI-PMH Harvest,predictions
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Haworth, Ian (committee chair), Seidler, Paul (committee member), Wang, Clay (committee member)
Creator Email
zipeng.zheng.104@gmail.com,zipengz@usc.edu
Unique identifier
UC11399A77B
Identifier
etd-ZhengZipen-13478.pdf (filename)
Legacy Identifier
etd-ZhengZipen-13478.pdf
Document Type
Thesis
Rights
Zheng, Zipeng
Internet Media Type
application/pdf
Type
texts
Source
20240904-usctheses-batch-1207 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu