Seminar Report on AI-Driven Drug Discovery
ACKNOWLEDGEMENT
With due respect and gratitude, I take this opportunity to thank those who have helped me directly and indirectly. I convey my sincere thanks to Prof. M.P. Wankhade, Head of the Computer Department, and Prof. L.B. Pawar for their help in selecting the seminar topic and for their support.
I thank my seminar guide, Prof. L.B. Pawar, for her guidance, timely help, and valuable suggestions, without which this seminar would not have been possible. Her direction has always been encouraging and inspiring for me. Attempts have been made to minimize the errors in the report.
I would also like to express my appreciation and thanks to all my friends who, knowingly or unknowingly, have assisted and encouraged me throughout my hard work.
CONTENTS
ABSTRACT
1. Introduction
2. Method
2.1 FragAdd Framework
2.2 Data Representation
2.3 Graph Neural Networks
2.4 Training Details
2.5 Virtual Screening Pipeline
3. Results
3.1 Molecular Property Prediction Benchmark
3.2 Adding Strategy Exploration
3.3 Visualization of Molecular Representation
3.4 Application in Virtual Screening
3.5 Combination of FragAdd with Other Methods
4. Conclusion
5. References
1 Introduction
Drug discovery is becoming increasingly costly [1, 2]. Research and development of a new medicine can cost from hundreds of millions to billions of US dollars, a figure that has increased exponentially in the last decade [3]. With the growing availability of big data, deep learning is a promising approach to accelerate drug discovery in areas such as compound synthesis, virtual screening, and de novo drug design [4–7]. However, the effectiveness of deep learning depends on the availability of labeled data, which is expensive, time-consuming, and sometimes impractical to obtain [8]. Pretraining can help address this issue by learning background knowledge from a large amount of unlabeled data [9], and this knowledge has been shown to significantly improve the performance of downstream tasks [10]. Recently, the masked language model approach has been widely utilized for pretraining small molecules [11–13]. Infomax
[14] was one of the first graph pretraining methods to promote mutual information
between local and global representations. Hu et al. [12] then implemented Mask on a molecular graph and discussed the advantages of using local- and global-level tasks simultaneously. Grover [11] further advanced the Mask concept by proposing a 1-hop Mask augmentation, which asks the model to predict artificial labels at the local and graph levels. MolCLR [13] then brought contrastive learning from computer vision and developed two deletion augmentations, bond deletion and subgraph removal, which can corrupt the molecule further. Although Mask-based pretraining methods have shown some success in small molecule deep learning, they are not ideal for small molecules due to two intrinsic properties: a limited vocabulary size and a non-sequential molecular structure. For example, molecules have a much smaller
vocabulary size of less than 20, while university-level English speakers know
approximately 10 000 word families on average [15]. If all the masked atoms in the
molecules are predicted as carbon, an accuracy of approximately 74% (counted for 1
million molecules) can be achieved. This task is too easy, which prevents the
pretraining method from learning useful information. Furthermore, unlike human
language, where words are arranged sequentially, molecules have chemical structures
that are essential to their properties [5, 16]. Applying a mask to chemical bonds does not change the structure of a molecule, whereas deleting bonds significantly modifies its properties. Consequently, this obstacle
prevents the pretraining method from gaining valuable knowledge. In contrast to the
existing pretraining strategies that involve reducing or eliminating information through
the use of masks, we introduce a novel approach called FragAdd, which involves the
addition of a chemically implausible molecular fragment to the input molecule. This
strategy is intended to provide structural variation and prevent the collapse of the
molecular structure. To learn rich local information while producing a meaningful
molecular representation, we designed a series of experiments to explore how the
adding strategy can be implemented. The fragments used in the strategy were taken
from a fragment database created using pretraining data.
2 Method
2.1 FragAdd framework
We created FragAdd to pretrain small molecules and use the pretrained model for
downstream objectives such as property prediction and virtual screening. Pretraining
provides Artificial Intelligence (AI) systems with a basic understanding of the data by
learning the patterns in small molecule data [9]. As a small molecule pretraining
framework, FragAdd introduces novel augmentation and training objectives to process
molecule graphs and update parameters. After pretraining with unlabeled data, the model
is further refined on supervised tasks, for example, predicting the toxicity of molecules.
Inspired by the modular nature of small molecules, FragAdd changes the molecular
structure to provide diversity and avoids predicting molecular vocabulary to increase the
difficulty of pretraining tasks. Diversity describes the number of chemical forms
generated from the augmentation, and difficulty indicates how challenging the task is for
an intelligent system to complete. Focusing on these two aspects, we corrupt the molecular structure to increase diversity and adjust the difficulty level through multiple operations. Molecules have a modular nature: a molecule can be regarded as a collection of molecular fragments joined by addition reactions. Pharmacists use this idea to optimize the quality of drug candidates by adding or deleting parts of molecules. Based on this idea, FragAdd attaches a fragment to the outside of the input molecule to imitate a natural addition reaction. During the augmentation process, FragAdd generates a chemically invalid fragment and adds it to the input molecule, as shown in
Fig. 1. We generated a fragment database from all molecules in the pretraining dataset.
To sample a fragment from the database, we designed a two-step approach: first choosing a subgroup based on size (the number of atoms in a fragment [17]), and then randomly sampling one fragment from the chosen group (fragments larger than 20 atoms are placed into one group). Further, we corrupted the sampled fragment by atom mutation and ring breaking, so that the fragment can be distinguished from the original molecule. Atom mutation replaces some atoms with a different atom type, and ring breaking deletes a bond in a ring, if a ring exists. The ratios of mutation and breaking can be adjusted to reach a suitable difficulty level. To attach the damaged fragment to the input molecule, we connected two randomly sampled carbons from the two pieces. If no carbon is available for connection, the atom indexed zero in the molecular graph is chosen. The FragAdd augmentation thus corrupts a dynamic region whose size depends on the added fragment, instead of the fixed local region corrupted by Mask-like methods.
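To make the augmentation concrete, the following is a minimal sketch in Python with RDKit. The fragment database layout (frag_db_by_size, a dict mapping fragment size to a list of RDKit molecules), the mutation probability, and the mutated atom types are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the FragAdd augmentation with RDKit; parameters are illustrative.
import random
from rdkit import Chem

def corrupt_fragment(frag, mutate_prob=0.3):
    """Make the fragment chemically implausible: mutate some atom types
    and break one ring bond if the fragment contains a ring."""
    rw = Chem.RWMol(frag)
    for atom in rw.GetAtoms():
        if random.random() < mutate_prob:
            atom.SetAtomicNum(random.choice([6, 7, 8, 9, 16]))  # C, N, O, F, S
    ring_bonds = [b for b in rw.GetBonds() if b.IsInRing()]
    if ring_bonds:
        b = random.choice(ring_bonds)
        rw.RemoveBond(b.GetBeginAtomIdx(), b.GetEndAtomIdx())
    return rw.GetMol()

def frag_add(mol, frag_db_by_size):
    """Two-step sampling: pick a size group uniformly, then a fragment; corrupt
    it and attach it through two carbons (falling back to atom index zero)."""
    size = random.choice(list(frag_db_by_size))
    frag = corrupt_fragment(random.choice(frag_db_by_size[size]))
    n = mol.GetNumAtoms()
    combined = Chem.RWMol(Chem.CombineMols(mol, frag))
    mol_carbons = [a.GetIdx() for a in mol.GetAtoms() if a.GetAtomicNum() == 6] or [0]
    frag_carbons = [a.GetIdx() + n for a in frag.GetAtoms() if a.GetAtomicNum() == 6] or [n]
    combined.AddBond(random.choice(mol_carbons), random.choice(frag_carbons),
                     Chem.BondType.SINGLE)
    # Per-atom labels for the local objective: 0 = original atom, 1 = added atom.
    labels = [0] * n + [1] * frag.GetNumAtoms()
    return combined.GetMol(), labels
```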
For the pretraining objectives, FragAdd locally classifies whether each atom belongs to the added fragment while globally summing up the number of added atoms. Previous work has proved the effectiveness of pretraining small molecules at both the local and global levels [11, 12]. Locally, FragAdd predicts a binary label for each atom, so that the model learns to decompose molecules into fragments and to determine which fragment is chemically unreasonable. Globally, FragAdd predicts the number of added atoms to summarize the chemical knowledge into the molecular representation obtained by pooling. Both levels of training objectives are vital for effective pretraining.
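A minimal PyTorch sketch of the two objectives follows, assuming node_repr and graph_repr come from the GNN encoder of Section 2.3; casting the global objective as a regression and the head names are assumptions of this sketch (the 0.1 weight is taken from Section 2.4).

```python
# Sketch of the two FragAdd pretraining heads in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

local_head = nn.Linear(300, 1)   # per atom: does it belong to the added fragment?
global_head = nn.Linear(300, 1)  # per graph: how many atoms were added?

def fragadd_loss(node_repr, graph_repr, atom_labels, added_counts, global_weight=0.1):
    # Local objective: binary classification for every atom.
    local_logits = local_head(node_repr).squeeze(-1)
    local_loss = F.binary_cross_entropy_with_logits(local_logits, atom_labels.float())
    # Global objective: predict the number of added atoms from the pooled vector.
    global_pred = global_head(graph_repr).squeeze(-1)
    global_loss = F.mse_loss(global_pred, added_counts.float())
    # Section 2.4 reports weighting the global term by 0.1.
    return local_loss + global_weight * global_loss
```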
2.2 Data representation
A molecule graph is represented as $G = (V, E)$, with a node feature vector $x_v$ for each node $v \in V$.
2.3 Graph neural networks
Graph Neural Networks (GNNs) [19] use a message-passing approach, in which the representations of the neighbors of a node are aggregated to iteratively update the representation of that node. After $k$ rounds of aggregation, the representation of a node captures the structural information within its $k$-hop neighborhood. Formally, the $k$-th layer of a GNN is expressed as
$$h_v^{(k)} = \mathrm{COMBINE}^{(k)}\left(h_v^{(k-1)},\ \mathrm{AGGREGATE}^{(k)}\left(\left\{h_u^{(k-1)} : u \in \mathcal{N}(v)\right\}\right)\right),$$
where $h_v^{(k)}$ is the feature vector of node $v$ at the $k$-th layer and $\mathcal{N}(v)$ is the set of neighbors of node $v$. We implemented the Graph Isomorphism Network (GIN) [20] as our model. GIN is the
most expressive of the GNNs for the representation learning of graphs. Moreover, GIN uses Multi-Layer Perceptrons (MLPs) as the aggregation function, which has been shown to satisfy the conditions for a maximally powerful GNN. For the pretraining of molecular graphs, GIN is the most recognized architecture. When setting up the GIN model, all hyperparameters were kept the same as in previous work to exclude the model's influence during comparison. Five GIN layers were used to process molecule graphs. Nodes were embedded into 300-dimensional vectors, and no dropout was used. Only the node features of the last layer were used for model outputs, and mean pooling was used to read out global representations. We used a linear layer to predict the training objective for all the pretraining tasks.
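The following is a minimal sketch of such an encoder with PyTorch Geometric; featurizing atoms by atomic number alone and the class name are simplifications, not the paper's exact setup.

```python
# Sketch of the five-layer, 300-dimensional GIN encoder with mean pooling.
import torch.nn as nn
from torch_geometric.nn import GINConv, global_mean_pool

class GINEncoder(nn.Module):
    def __init__(self, dim=300, num_layers=5, num_atom_types=119):
        super().__init__()
        self.atom_emb = nn.Embedding(num_atom_types, dim)
        self.layers = nn.ModuleList(
            GINConv(nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)))
            for _ in range(num_layers))

    def forward(self, z, edge_index, batch):
        h = self.atom_emb(z)                     # initial node features
        for conv in self.layers:
            h = conv(h, edge_index)              # one round of message passing
        return h, global_mean_pool(h, batch)     # node and pooled graph vectors
```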
2.4 Training details
We pretrained on two million small molecules from the ZINC database for 100 epochs, and approximately 13 400 fragments were obtained using the BRICS algorithm. Instead of increasing the pretraining data size to achieve the best benchmark result, we kept the data size at two million molecules and conducted more rounds of exploration on adding strategies. We set the random seed to zero and the batch size to 256. The Adam optimizer was used with a learning rate of 0.001; no weight decay or learning rate schedule was applied, to keep the setup minimal. When combining the local and global losses, we weighted the global training objective by a ratio of 0.1. The pretrained model was fine-tuned on eight classification datasets from MoleculeNet, with the batch size reduced to 32. The MoleculeNet classification datasets are the most widely accepted benchmarks for small molecule property prediction, comprising three biophysics and five physiology datasets [21]. For small downstream datasets such as SIDER, we added a dropout rate of 0.5. Further, a linear layer was used to predict the final binary label, and the accuracy was averaged across all tasks for each dataset.
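As a rough sketch of this optimization setup, reusing the GINEncoder sketch above and assuming PyTorch Geometric batches with atomic numbers in batch.z; the fine-tuning learning rate is an assumption, since the text fixes only the pretraining rate.

```python
# Sketch of the pretraining/fine-tuning optimizer setup described above.
import torch
import torch.nn.functional as F

encoder = GINEncoder()
task_head = torch.nn.Linear(300, 1)        # binary label per downstream task
dropout = torch.nn.Dropout(p=0.5)          # used for small datasets such as SIDER

# Pretraining: Adam, lr 0.001, batch size 256, no weight decay or schedule.
pretrain_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Fine-tuning on a MoleculeNet dataset with batch size 32.
finetune_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)

def finetune_step(batch, labels):
    _, graph_repr = encoder(batch.z, batch.edge_index, batch.batch)
    logits = task_head(dropout(graph_repr)).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, labels.float())
    finetune_opt.zero_grad()
    loss.backward()
    finetune_opt.step()
    return loss.item()
```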
2.5 Virtual screening pipeline
We took Estrogen Receptor alpha (ERα) binding data from the Nuclear Receptor Activity (NURA) dataset and divided it into reference and search data. The search data were then combined with two million molecules to form the final virtual screening dataset. The NURA dataset contains information on small molecules that act as nuclear receptor modulators [22]. From the 11 nuclear receptors in NURA, we obtained 1287 ERα binding-active and 4861 inactive molecules. We sampled 20% of the ERα data as reference data, which were used as templates for the similarity search and for fine-tuning. The other 80% of the ERα data were merged with two million small molecules from the ZINC database for screening.
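A small sketch of this split, assuming hypothetical file and column names:

```python
# Sketch of the ERα data split; file and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

er = pd.read_csv("nura_er_alpha.csv")        # 1287 actives + 4861 inactives
er = er[er["activity"] != "weak"]            # weak binders are dropped (see below)

# 20% reference (fine-tuning and search templates), 80% into the search pool.
reference, search = train_test_split(
    er, test_size=0.8, stratify=er["label"], random_state=0)

zinc = pd.read_csv("zinc_2m.csv")            # two million ZINC molecules
screening_db = pd.concat([search, zinc], ignore_index=True)
```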
All weakly active ERα binders were eliminated for simplicity. We fine-tuned FragAdd on the ERα reference data for the purpose of generating molecular representations. We set the batch size to 32, which is suitable for a small dataset, and fine-tuned the pretrained model for 30 epochs. To make the fine-tuning process easier, we excluded weakly active data and took into account only clearly active or inactive data. During training, a linear layer was used to classify binding activity, and mean pooling created the molecular representations for the similarity search. We employed the Python library FAISS to carry out the molecular similarity search on embeddings from the GIN model, and the Tanimoto coefficient to search over fingerprints. FAISS is a Python library for similarity searching and clustering of large-scale vectors [23]. The distance between molecular representations was calculated as the minimum Euclidean (L2) distance (the maximum inner product
search could also be used). In this study, we chose the RDKit fingerprint and set the fingerprint size to 300, the same as the pretrained embedding size. Additionally, the k-nearest fingerprints were defined by the Tanimoto coefficient, which is the ratio of the intersection of two fingerprint vectors to their union.
We used AutoDock Vina (version 1.2.3), a widely used docking program for protein-ligand interactions [24], to investigate
interaction between unknown screening retrievals and ER protein. To begin our analysis,
we first created three-dimensional molecular structures with Open Babel [25]. We then
carefully determined the center of the grid box, using the mean value of atoms
coordinates within the binding pocket of ER. This approach helped us to accurately
define our docking search space, which was set to a dimension of 30 angstroms. Apart
from these specific settings, we adhered to the default parameters provided by AutoDock
Vina. Finally, we visualized the docking pose with Pymol and Discovery Studio [26, 27]
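A sketch of the corresponding Vina invocation, assuming hypothetical file names and a pocket_coords array (n x 3) of binding-pocket atom coordinates; all other options are left at Vina's defaults, as in the text.

```python
# Sketch of the docking setup: box centered on the mean pocket coordinates,
# 30 Å edges, otherwise AutoDock Vina defaults.
import subprocess
import numpy as np

center = np.asarray(pocket_coords).mean(axis=0)

subprocess.run([
    "vina",
    "--receptor", "er_alpha.pdbqt",
    "--ligand", "retrieval.pdbqt",
    "--center_x", f"{center[0]:.3f}",
    "--center_y", f"{center[1]:.3f}",
    "--center_z", f"{center[2]:.3f}",
    "--size_x", "30", "--size_y", "30", "--size_z", "30",
    "--out", "docked_pose.pdbqt",
], check=True)
```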
3 Results
3.2 Adding strategy exploration
To determine how to implement the adding strategy described in Section 2.1, we explored four components that influence the diversity and difficulty of augmentation on small molecules (Fig. 2a). At the beginning of the FragAdd augmentation process, a fragment must be sampled from the fragment database. However, the generated fragment database is unbalanced with respect to fragment size (the number of atoms in a fragment), resulting in a decrease in diversity when sampling from the database directly (one-step sampling). For example, fragments with fewer than 3 or more than 20 atoms have nearly no chance of being selected. Therefore, a better sampling method that tackles this imbalance in fragment size can contribute to the diversity of corruption. For fragment corruption, how the fragment should be damaged to adjust the difficulty to a reasonable level needs to be explored. Additionally, it is crucial to choose the connection bond in the fragment addition step. If most connection bonds are obviously wrong, the model only needs to break the bond to separate the molecule into two parts, which makes it too easy for the model to learn valuable molecular information. Finally, the training objectives directly affect the difficulty of the pretraining tasks locally and globally. Based on the benchmark, we found the best solution for each of the four chosen components, as shown in
Fig. 2b. Compared with one-step sampling, first choosing the fragment size substantially improves the accuracy, showing the importance of keeping the fragment size distribution normalized. For fragment corruption, atom mutation and ring breaking contribute independently to the invalid chemical information. Additionally, the carbon-carbon (C-C) bond proved to be a more effective choice for connecting fragments than a random bond. This superiority can be attributed to the high prevalence of C-C bonds in our pretraining dataset, where they constitute approximately 59% of all bonds in small molecules. Furthermore, carbon atoms in these molecules are connected to 1.05 hydrogen atoms on average, a higher connectivity than other atoms (e.g., O: 0.06, N: 0.32). This statistical prevalence of C-C bonds and the connectivity pattern of carbon atoms make C-C bond attachment more chemically reasonable and effective for maintaining molecular integrity. The results also show that the local and global training objectives are both essential to pretraining performance, as they learn rich local information while producing a high-quality graph representation.
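Statistics like these can be reproduced with a few lines of RDKit; the sketch below, with illustrative helper names, counts the C-C bond fraction and the average hydrogen count per heavy-atom type over a list of SMILES.

```python
# Sketch reproducing the quoted bond and hydrogen statistics.
from collections import defaultdict
from rdkit import Chem

def bond_and_hydrogen_stats(smiles_list):
    cc, total = 0, 0
    h_sum, atom_count = defaultdict(int), defaultdict(int)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        for bond in mol.GetBonds():
            total += 1
            if {bond.GetBeginAtom().GetSymbol(), bond.GetEndAtom().GetSymbol()} == {"C"}:
                cc += 1
        for atom in mol.GetAtoms():
            h_sum[atom.GetSymbol()] += atom.GetTotalNumHs()
            atom_count[atom.GetSymbol()] += 1
    cc_ratio = cc / total if total else 0.0             # ≈ 0.59 on the pretraining set
    avg_h = {s: h_sum[s] / atom_count[s] for s in atom_count}  # e.g., C ≈ 1.05
    return cc_ratio, avg_h
```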
3.3 Visualization of molecular representation
The comparison shows that FragAdd learns structural details about the presence of fragments in molecules. We also noticed that FragAdd generates subgroups within the same color, especially for the scaffolds colored blue and red, which have subgroups that lie far apart in the t-SNE space. We further found that the subgroups differ significantly in their side chains, showing that FragAdd can learn structural information deeper than the algorithm used to compute the scaffold.
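A sketch of this kind of visualization, assuming embeddings from the encoder and scikit-learn's t-SNE; coloring by Bemis-Murcko scaffolds from RDKit is an assumption about how the scaffolds were computed.

```python
# Sketch: project encoder embeddings to 2-D with t-SNE and color by scaffold.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from rdkit.Chem.Scaffolds import MurckoScaffold

def plot_tsne(embeddings, smiles):
    scaffolds = [MurckoScaffold.MurckoScaffoldSmiles(s) for s in smiles]
    coords = TSNE(n_components=2, random_state=0).fit_transform(np.asarray(embeddings))
    # Points sharing a color share a scaffold; separated subgroups of one color
    # hint at side-chain differences, as discussed above.
    _, color_ids = np.unique(scaffolds, return_inverse=True)
    plt.scatter(coords[:, 0], coords[:, 1], c=color_ids, s=4, cmap="tab10")
    plt.savefig("tsne_scaffolds.png", dpi=300)
```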
3.4 Application in virtual screening
We replaced the fingerprint method used in virtual screening with FragAdd and investigated whether it could help retrieve more of the desired molecules from the screening database. Virtual screening is a common technique for the in-silico development of new medicines [32–34]; it searches molecule libraries for the molecules with the highest probability of a particular property or activity. To generate molecular representations with abundant chemical information, pretraining methods have been employed [35]. This approach is advantageous over the traditional fingerprint method, as it does not require artificial rules to extract chemical information. Nevertheless, the application of pretraining in virtual screening has not been extensively studied. We created a scenario to find molecules that bind to the estrogen receptor (ER) among the top outputs of a molecular similarity search. ER is a crucial therapeutic target, especially considering that approximately 70% of breast cancer patients exhibit ER-positive status [36, 37]. Given this prevalence and the critical role of ER in the disease's progression, our study focuses on this receptor to better understand its interactions and potential avenues for therapeutic intervention. The ERα dataset, comprising 6148 molecules, was split into reference and search subsets in a 1:4 ratio, and the search subset was combined with two million molecules to form the final search dataset. The reference subset was employed to fine-tune the model and served as the source of reference molecules during the search process. We used a k-nearest neighbor search for each reference molecule, calculating the distance between molecular representations and setting k to 200. As most molecules in the
search data do not have ER binding activity labels, we used different methods to analyze
known and unknown retrievals (known retrievals include molecules that have a binding
label). The analysis of known ERα ligands suggests that pretraining and fine-tuning are
beneficial for virtual screening, as demonstrated in Fig. 4. FragAdd achieved the highest
true binder rate for known binders and retrieved more than half of the true binders in the
top 200 outputs for each reference molecule. The traditional fingerprint method was not
successful in retrieving enough true binders, which highlights the advantages of deep
learning compared to the fingerprint method. We further explored the roles of pretraining
and fine-tuning in virtual screening. Combining the true binder rate and inactive number
results, we found that fine-tuning improves performance by decreasing the number of
inactive binding molecules. To gain an intuitive understanding of the function of
pretraining and fine-tuning, we visualized the ER data using tmap [38]. Comparing before
and after fine-tuning reveals that fine-tuning helps classify active and inactive binding to
reduce the inactive number. Without pretraining, many molecules mix with other ones
instead of forming a tree structure, which indicates that pretraining assists in learning the
chemical features of each molecule.
In contrast to known ER ligands, the lack of binding activity labels for unknown
retrievals makes it difficult to analyze them. To address this, we conducted a docking
study to assess their binding to the ER protein (Fig. 5). Docking is a computational
technique used to predict protein-ligand interactions and binding affinity. We used the
affinity gap to evaluate the binding of the unknown retrievals. FragAdd achieved the
closest affinity gap to zero, indicating that it retrieves better unknown binders than the
traditional fingerprint method. This confirms that both pretraining and fine-tuning are
essential for unknown retrievals. To further understand the affinity gap result, we visualized the docking pose of a high-affinity unknown retrieval, ZINC1627292. The molecule interacts with the protein target through two hydrogen bonds on either side of the molecule and a T-shaped stacking between benzene rings. Of the three interactions, the
hydrogen bond with His524 and the Pi-Pi interaction with Phe404 are conserved in the
natural binders for ER. For both known and unknown retrievals, FragAdd increases the
number of potential binders in the top 200 outputs.
3.5 Combination of FragAdd with other methods
FragAdd preserves the original molecule, thus allowing the integration of other augmentation techniques. As an adding approach, FragAdd only adds a bond to one carbon atom of the original molecule; this means that FragAdd is compatible with Mask and its derivatives, raising the question of whether FragAdd can be combined with other methods. If it can, FragAdd will offer a new choice for other pretraining frameworks. FragAdd improves the average performance when added to other methods, indicating that the adding and deleting strategies can be used simultaneously. To implement this idea, we conducted a Mask-like augmentation on the input molecule, then attached a fragment to the masked molecule, and summed the two loss items. We tested this operation for Infomax, Atom Mask, and Bond Mask (Bond Mask hides the bond types of some bonds inside the molecular graph). For Infomax and Atom Mask, accuracy improves by more than 1% after combination with FragAdd. For Bond Mask, the accuracy stays the same, suggesting that the ratio of the loss items should be adjusted for the best combination performance.
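A sketch of this combination for Atom Mask, reusing the frag_add helper sketched in Section 2.1; the mask token (atomic number 0, RDKit's dummy atom) and the masking probability are illustrative assumptions.

```python
# Sketch: mask some atom types first, then attach a corrupted fragment,
# and train with the Mask loss and the FragAdd loss summed.
import random
from rdkit import Chem

def atom_mask(mol, mask_prob=0.15):
    """Replace a fraction of atom types with a dummy token; remember targets."""
    rw = Chem.RWMol(mol)
    targets = {}
    for atom in rw.GetAtoms():
        if random.random() < mask_prob:
            targets[atom.GetIdx()] = atom.GetAtomicNum()  # true type to predict
            atom.SetAtomicNum(0)
    return rw.GetMol(), targets

def combined_augment(mol, frag_db_by_size):
    masked, mask_targets = atom_mask(mol)                       # deleting strategy
    augmented, frag_labels = frag_add(masked, frag_db_by_size)  # adding strategy
    # The two resulting loss items are summed during training.
    return augmented, mask_targets, frag_labels
```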
4 Conclusion
We propose a pretraining framework, FragAdd, which uses fragments obtained from decomposition as the added part of an adding strategy, as an alternative to the Mask-based strategy in small molecule pretraining. Our results show that FragAdd outperforms previous baselines on molecular property prediction and virtual screening tasks. It achieved the best average accuracy on eight classification datasets and excelled in two datasets related to drug discovery. This performance is attributed to the extraction of molecular representations that capture structural details. We also found that both pretraining and fine-tuning are essential for virtual screening, and that FragAdd can be used in conjunction with other self-supervised methods. A pretrained-model-based molecule search engine has the potential to greatly accelerate the drug discovery process.
However, we have noticed that FragAdd occasionally incorporates excessive structural
variations, resulting in a bias during subsequent virtual screening. Additionally, the
training of FragAdd has utilized the same model and dataset as previous studies, which
might not be adequate for achieving optimal performance. Currently, we are focusing on
developing a dependable molecule search engine that can cater to the specific
requirements of biomedical research.
5 References
[1] H. F. Lynch and C. T. Robertson, Challenges in confirming drug effectiveness after early approval, Science, vol. 374, no. 6572, pp. 1205–1207, 2021.
[3] S. Simoens and I. Huys, R&D costs of new medicines: A landscape analysis, Front. Med., vol. 8, p. 760762, 2021.
[4] H. Beck, M. Härter, B. Haß, C. Schmeck, and L. Baerfacker, Small molecules and their impact in drug discovery: A perspective on the occasion of the 125th anniversary of the Bayer Chemical Research Laboratory, Drug Discov. Today, vol. 27, no. 6, pp. 1560–1574, 2022.
[5] Y. Ye, Unleashing the power of big data to guide precision medicine in China, Nature, vol. 606, no. 7916, pp. 49–51, 2022.
[6] Y. Wang, Z. Qiu, Q. Jiao, C. Chen, Z. Meng, and X. Cui, Structure-based protein drug affinity prediction with spatial attention mechanisms, in Proc. IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 2021, pp. 92–97.
[7] Q. Jiao, Z. Qiu, Y. Wang, C. Chen, Z. Yang, and X. Cui, Edge-gated graph neural network for predicting protein ligand binding affinities, in Proc. IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 2021, pp. 334–339.
[8] Y. LeCun and I. Misra, Self-supervised learning: The dark matter of intelligence, https://ai.meta.com/blog/selfsupervised-learning-the-dark-matter-of-intelligence/, 2021.
[9] C. Cai, S. Wang, Y. Xu, W. Zhang, K. Tang, Q. Ouyang, L. Lai, and J. Pei, Transfer learning for drug discovery, J. Med. Chem., vol. 63, no. 16, pp. 8683–8694, 2020.
[10] Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang, and J. Huang, Self-supervised graph transformer on large-scale molecular data, in Proc. 34th Int. Conf. Neural Information Processing Systems, Virtual Event, 2020, pp. 12559–12571.
[11] W. H. Hu, B. W. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec, Strategies for pre-training graph neural networks, presented at Int. Conf. Learning Representations (ICLR), Virtual Event, 2020.
[12] Y. Wang, J. Wang, Z. Cao, and A. Barati Farimani, Molecular contrastive learning of representations via graph neural networks, Nat. Mach. Intell., vol. 4, no. 3, pp. 279–287, 2022.
[13] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, Deep graph Infomax, presented at Int. Conf. Learning Representations (ICLR), Vancouver, Canada, 2018.
[14] J. Milton and J. Treffers-Daller, Vocabulary size revisited: The link between vocabulary size and academic achievement, Appl. Linguist. Rev., vol. 4, no. 1, pp. 151–172, 2013.
[15] X. Zhang, C. Chen, Z. Meng, Z. Yang, H. Jiang, and X. Cui, CoAtGIN: Marrying convolution and attention for graph-based molecule property prediction, in Proc. IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 2022, pp. 374–379.
[16] G. Landrum, RDKit: Open-source cheminformatics, https://www.rdkit.org, 2023.
[17] J. Degen, C. Wegscheid-Gerlach, A. Zaliani, and M. Rarey, On the art of compiling and using ‘drug-like’ chemical fragment spaces, ChemMedChem, vol. 3, no. 10, pp. 1503–1507, 2008.
[18] Y. Li, R. Zemel, M. Brockschmidt, and D. Tarlow, Gated graph sequence neural networks, presented at Int. Conf. Learning Representations (ICLR), San Juan, Puerto Rico, 2016.
[19] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, How powerful are graph neural networks? presented at Int. Conf. Learning Representations (ICLR), Vancouver, Canada, 2018.
[20] Z. Wu, B. Ramsundar, E. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande, MoleculeNet: A benchmark for molecular machine learning, Chem. Sci., vol. 9, no. 2, pp. 513–530, 2018.
[21] C. Valsecchi, F. Grisoni, S. Motta, L. Bonati, and D. Ballabio, NURA: A curated dataset of nuclear receptor modulators, Toxicol. Appl. Pharmacol., vol. 407, p. 115244, 2020.
[22] J. Johnson, M. Douze, and H. Jégou, Billion-scale similarity search with GPUs, IEEE Trans. Big Data, vol. 7, no. 3, pp. 535–547, 2021.
[23] O. Trott and A. J. Olson, AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading, J. Comput. Chem., vol. 31, no. 2, pp. 455–461, 2010.
[24] N. M. O’Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch, and G. R. Hutchison, Open Babel: An open chemical toolbox, J. Cheminf., vol. 3, no. 1, p. 33, 2011.
[25] W. L. DeLano, PyMOL: An open-source molecular graphics tool, CCP4 Newsletter On Protein Crystallography, vol. 40, no. 1, pp. 82–92, 2002.
[26] Dassault Systèmes, BIOVIA Discovery Studio Visualizer, https://www.3ds.com, 2023.
[27] W. Hamilton, Z. T. Ying, and J. Leskovec, Inductive representation learning on large graphs, in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 1025–1035.
[28] G. Subramanian, B. Ramsundar, V. Pande, and R. A. Denny, Computational modeling of β-secretase 1 (BACE1) inhibitors using ligand based approaches, J. Chem. Inf. Model., vol. 56, no. 10, pp. 1936–1949, 2016.
[29] K. M. Gayvert, N. S. Madhukar, and O. Elemento, A data driven approach to predicting successes and failures of clinical trials, Cell Chem. Biol., vol. 23, no. 10, pp. 1294–1301, 2016.
[30] G. Hinton and S. Roweis, Stochastic neighbor embedding, in Proc. 15th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2002, pp. 857–864.
[31] A. A. Sadybekov, A. V. Sadybekov, Y. Liu, C. Iliopoulos-Tsoutsouvas, X. P. Huang, J. Pickett, B. Houser, N. Patel, N. K. Tran, F. Tong, et al., Synthon-based ligand discovery in virtual libraries of over 11 billion compounds, Nature, vol. 601, no. 7893, pp. 452–459, 2022.
[32] F. Gentile, J. C. Yaacoub, J. Gleave, M. Fernandez, A. T. Ton, F. Ban, A. Stern, and A. Cherkasov, Artificial intelligence–enabled virtual screening of ultra-large chemical libraries with deep docking, Nat. Protoc., vol. 17, no. 3, pp. 672–697, 2022.
[33] J. Wang, Z. Qiu, X. Zhang, Z. Yang, W. Zhao, and X. Cui, Boosting deep learning based docking with cross-attention and centrality embedding, in Proc. IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 2022, pp. 360–365.
[34] K. Atz, F. Grisoni, and G. Schneider, Geometric deep learning on molecular representations, Nat. Mach. Intell., vol. 3, no. 12, pp. 1023–1032, 2021.
[35] D. Bafna, F. Ban, P. S. Rennie, K. Singh, and A. Cherkasov, Computer-aided ligand discovery for estrogen receptor alpha, Int. J. Mol. Sci., vol. 21, no. 12, p. 4193, 2020.
[36] M. Kriegel, H. J. Wiederanders, S. Alkhashrom, J. Eichler, and Y. A. Muller, A PROSS-designed extensively mutated estrogen receptor α variant displays enhanced thermal stability while retaining native allosteric regulation and structure, Sci. Rep., vol. 11, no. 1, p. 10509, 2021.