
ABlooper: Fast Accurate Antibody CDR Loop Structure

ABlooper is a deep learning-based tool designed for the rapid and accurate prediction of antibody CDR loop structures, particularly the challenging CDR-H3 loop. It utilizes E(n)-Equivariant Graph Neural Networks to provide high accuracy predictions with confidence estimates, achieving an average CDR-H3 RMSD of 2.49 Å, which improves to 2.05 Å for its most confident predictions. The tool is available for public use and demonstrates significant improvements over existing methods in both speed and accuracy.


Bioinformatics, 38(7), 2022, 1877–1880

https://doi.org/10.1093/bioinformatics/btac016
Advance Access Publication Date: 31 January 2022
Original Paper

Structural bioinformatics
ABlooper: fast accurate antibody CDR loop structure
prediction with accuracy estimation

Downloaded from https://academic.oup.com/bioinformatics/article/38/7/1877/6517780 by CHANG GOO KIM user on 14 March 2024


Brennan Abanades1,*, Guy Georges2, Alexander Bujotzek2 and Charlotte M. Deane1,*
1Department of Statistics, University of Oxford, Oxford, UK and 2Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, Germany
*To whom correspondence should be addressed.
Associate Editor: Jinbo Xu

Received on August 16, 2021; revised on November 26, 2021; editorial decision on January 3, 2022

Abstract
Motivation: Antibodies are a key component of the immune system and have been extensively used as biotherapeutics. Accurate knowledge of their structure is central to understanding their antigen-binding function. The key area for antigen binding and the main area of structural variation in antibodies are concentrated in the six complementarity determining regions (CDRs), with the most important for binding and most variable being the CDR-H3 loop. The sequence and structural variability of CDR-H3 make it particularly challenging to model. Recently deep learning methods have offered a step change in our ability to predict protein structures.
Results: In this work, we present ABlooper, an end-to-end equivariant deep learning-based CDR loop structure prediction tool. ABlooper rapidly predicts the structure of CDR loops with high accuracy and provides a confidence estimate for each of its predictions. On the models of the Rosetta Antibody Benchmark, ABlooper makes predictions with an average CDR-H3 RMSD of 2.49 Å, which drops to 2.05 Å when considering only its 75% most confident predictions.
Availability and implementation: https://github.com/oxpig/ABlooper.
Contact: opig@stats.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

1.1 Antibody structure
Antibodies are a class of protein produced by B cells during an immune response. Their ability to bind with high affinity and specificity to almost any antigen makes them attractive for use as therapeutics (Carter and Lazar, 2018).
Knowledge of the structure of antibodies is becoming increasingly important in biotherapeutic development (Chiu et al., 2019). However, experimental structure determination is time-consuming and expensive so it is not always practical or even possible to use routinely. Computational modelling tools have allowed researchers to bridge this gap by predicting large numbers of antibody structures to a high level of accuracy (Leem et al., 2016; Ruffolo et al., 2021). For example, models of antibody structures have recently been used for virtual screening (Schneider et al., 2021) and to identify coronavirus-binding antibodies that bind the same epitope with very different sequences (Robinson et al., 2021).
The overall structure of all antibodies is similar and therefore can be accurately predicted using current methods (e.g. Leem et al., 2016). The area of antibodies that is hardest to model is the sequence-variable regions that provide the structural diversity necessary to bind a wide range of antigens. This diversity is largely focussed on six loops known as the complementarity determining regions (CDRs). The most diverse of these CDRs and therefore the hardest to model is the third CDR loop of the heavy chain (CDR-H3) (Teplyakov et al., 2014).

1.2 Deep learning for protein structure prediction
At CASP14 (Kryshtafovych et al., 2021), DeepMind showcased AlphaFold2 (Jumper et al., 2021), a neural network capable of accurately predicting many protein structures. The method relies on the use of equivariant neural networks and an attention mechanism. More recently, RoseTTAFold, a novel neural network based on equivariance and attention, was shown to obtain results comparable to those of AlphaFold2 (Baek et al., 2021).
These methods both rely on the use of equivariant networks. For a network to be equivariant with respect to a group, it must be able to commute with the group action. For rotations, this means that rotating the input before feeding it into the network will have the same result as rotating the output. In the case of proteins, using a network equivariant to both translations and rotations in 3D space allows us to learn directly from atom coordinates. This is in contrast to previous methods like TrRosetta (Yang et al., 2020) or the original version of AlphaFold (Senior et al., 2020) that predicted invariant features, such as inter-residue distances and orientations, which are then used to reconstruct the protein. A number of approaches for building equivariant networks have recently been developed (e.g. Finzi et al., 2021).
In this article, we explore the use of an equivariant approach to CDR structure prediction. We chose to use E(n)-Equivariant Graph Neural Networks (E(n)-EGNNs; Satorras et al., 2021) as our equivariant approach due to their speed and simplicity.

1.3 Deep learning for antibody structure prediction
Deep learning-based approaches have also been shown to improve structure prediction in antibodies, e.g. DeepH3 (Ruffolo et al., 2020), an antibody-specific version of TrRosetta. Recently, DeepAb (Ruffolo et al., 2021), an improved version of DeepH3, was shown to outperform all currently available antibody structure prediction methods. DeepAb and DeepH3 are similar to TrRosetta and the original version of AlphaFold in that deep learning is used to obtain inter-residue geometries that are then fed into an energy minimization method to produce the final structure.
In this work, we present ABlooper, a fast and accurate tool for antibody CDR loop structure prediction. By leveraging E(n)-EGNNs, ABlooper directly predicts the structure of CDR loops. By simultaneously predicting multiple structures for each loop and comparing them amongst themselves, ABlooper is capable of estimating a confidence measure for each predicted loop.

2 Materials and Methods

2.1 Data
The data used to train, test and validate ABlooper were extracted from SAbDab (Dunbar et al., 2014), a database of all antibody structures contained in the PDB (Berman et al., 2000). Structures with a resolution better than 3.0 Å and no missing backbone atoms within any of the CDRs were selected. The CDRs were defined using the Chothia numbering scheme (Chothia et al., 1989).
For easy comparison with different pipelines, we used the 49 antibodies from the Rosetta Antibody Benchmark as our test set. For validation, 100 structures were selected at random. It was ensured that there were no structures with the same CDR sequences in the training, testing and validation sets. Sequence redundancy was allowed within the training set to expose the network to the existence of antibodies with identical sequences but different structural conformations. This resulted in a total of 3438 training structures.
Additionally, we use a secondary test set composed of 114 antibodies (SAbDab Latest Structures) with a resolution of under 2.3 Å and a maximum CDR-H3 loop length of 20, which were added to SAbDab after the initial test, train and validation sets were extracted (November 8, 2020 to May 24, 2021). A list containing the PDB IDs of all the structures used in the train, test, and validation sets is given in the Supplementary Material.
ABodyBuilder was used to build models of all the structures. Structural models were generated using the singularity version of ABodyBuilder (Leem et al., 2016) (fragment database from July 8, 2021) excluding all templates with a 99% or higher sequence identity. ABlooper CDR models for the test sets were obtained by remodelling the CDR loops on ABodyBuilder models.

2.2 Deep learning
ABlooper is composed of five E(n)-EGNNs, each one with four layers, all trained in parallel. The model is trained on the position of the Cα-N-C-Cβ backbone atoms for all six CDR loops plus two anchor residues at either end. E(n)-EGNNs require a starting geometry, so a non-descriptive input geometry is generated by evenly spacing each CDR loop residue on a straight line between its anchor residues (Fig. 1). The model is given four different types of features per node resulting in a 41-dimensional vector. These include a one-hot encoded vector describing the amino acid type, the atom type and which loop the residue belongs to. Additionally, sinusoidal positional embeddings are given to each residue describing how close it is to the anchors. An outline of how E(n)-EGNNs are used within ABlooper is shown in Figure 1.
Two different losses were used during training. To quantify the structural similarity between the predicted and true structures, RMSD was used. To encourage the conservation of distances between neighbouring atoms in the backbone chain, an L1-loss between the true and predicted inter-atom distances was used. This was composed of five terms between the following pairs of atoms: $C_\alpha^{i}$-$C_\alpha^{i+1}$; $C_\alpha^{i}$-$C_\beta^{i}$; $C_\alpha^{i}$-$N^{i}$; $C_\alpha^{i}$-$C^{i}$; $C^{i}$-$N^{i+1}$.
Each of the five E(n)-EGNNs was trained to make predictions independently by minimizing the RMSD between its prediction and the crystal structure. The output from the five networks is then averaged to obtain a final prediction. To ensure that the final combined prediction of all E(n)-EGNNs was physically plausible, the L1-loss was used on the final averaged structure.
The model was trained in two phases. First, it was trained until convergence without the L1-loss term using the RAdam (Liu et al., 2020) optimizer with a learning rate of $10^{-3}$ and a weight decay of $10^{-3}$. In the second stage, the L1-loss term was added with a weighting of 1.0. For this stage, the model was trained using the Adam (Kingma and Ba, 2014) optimizer with a learning rate of $10^{-4}$ and early stopping. More details on the implementation of ABlooper can be found in the Supplementary Material.

2.3 Loop relaxation
During training, ABlooper is encouraged to predict physically plausible CDR loops via the intra-residue atom distance loss term. However, ABlooper occasionally produces loops with incorrect backbone geometries. To enforce correct backbone geometries we relax the predicted loops using a restrained energy minimization procedure. As our energy function, we use the AMBER14 (Maier et al., 2015) protein force field with an additional harmonic potential term keeping the positions of backbone atoms close to their
original predicted positions. The spring constant of the harmonic potential is set to 10 kcal/mol Å². Energy minimization is done using the Langevin Integrator in the OpenMM Python package (Eastman et al., 2017). This relaxation step typically results in a small loss in accuracy, but ensures that predicted loops are physically plausible.

2.4 DeepAb and AlphaFold2
DeepAb structural models were generated using the open-source version of the code (available at https://github.com/RosettaCommons/DeepAb). As suggested in their paper (Ruffolo et al., 2021), we generated five decoys per structure. This took around 10 min per antibody on an 8-core Intel i7-10700 CPU.
Antibody structures were generated using the open-source version of AlphaFold2 (available at https://github.com/deepmind/alphafold). We used the ‘full_dbs’ preset and allowed it to use templates from before May 14, 2020. As AlphaFold2 is intended to predict single chains (Jumper et al., 2021), we predicted and aligned the heavy and light chain independently before comparing to other methods. On a 20-core Intel 6230 CPU this took around 3 h per antibody modelled.

3 Results

3.1 Using ABlooper to predict CDR loops on modelled antibody structures
We used ABlooper to predict the CDRs on ABodyBuilder models of the Rosetta Antibody Benchmark (RAB) and the SAbDab Latest Structures (SLS) sets. The RMSD between the Cα-N-C-Cβ atoms in the backbone of the crystal structure and the predicted CDRs for both test sets is shown in Table 1.
ABlooper achieves lower mean RMSDs than ABodyBuilder for most CDRs (Table 1). By far, the largest improvement is for the CDR-H3 loop, where due to the large structural diversity, homology modelling performs worst (Leem et al., 2016). ABlooper predicts loops of a similar accuracy to AlphaFold2 and DeepAb for all CDRs except CDR-H3, where ABlooper and DeepAb outperform AlphaFold2.
One potential source of error for ABlooper is the model frameworks generated by ABodyBuilder, so we examined its resilience to the small deviations seen in these models and found little to no correlation between framework error and CDR prediction error (see Supplementary Material).

3.2 Prediction diversity as a measure of prediction quality
ABlooper predicts five structures for each loop. We found that the average RMSD between predictions can be used as a measure of certainty of the final averaged prediction. If all five models agree on the same conformation, then it is more likely that it will be the correct conformation; if they do not, then the final prediction is likely to be less accurate (Fig. 2). This allows ABlooper to give a confidence score for each predicted loop. As shown in Figure 2D, this score can be used as a filter, removing structures which are expected to be incorrectly modelled by ABlooper. For example, by setting a 1.5 Å inter-prediction RMSD cut-off on structures from the Rosetta Antibody Benchmark, the average CDR-H3 RMSD for the set can be reduced from 2.49 to 2.05 Å while keeping around three quarters of the predictions. As expected, accuracy filtering has a tendency to remove longer CDR-H3 predictions but it is not exclusively correlated to length (see Supplementary Material).

4 Discussion
We present ABlooper, a fast and accurate tool for predicting the structures of the CDR loops in antibodies. It builds on recent advances in EGNNs to improve CDR loop structure prediction.
On an NVIDIA Tesla V100 GPU, the unrelaxed version of ABlooper can predict the CDR backbone atoms for 100 structures in under 5 s. Loop relaxation and side-chain prediction are the most computationally expensive parts of the pipeline, taking around 10 s per structure. ABlooper outperforms ABodyBuilder (a state-of-the-art homology method) and produces antibody models of similar accuracy to both AlphaFold2 and DeepAb, but on a far faster timescale.
By predicting each loop multiple times, ABlooper is capable of producing an accuracy estimate for each generated loop structure. It is not clear whether a high prediction diversity score is indicative of loops with multiple conformations or underrepresentation of the given loop sequence in SAbDab (Dunbar et al., 2014). However, due to how ABlooper is trained (with the averaged prediction encouraged to be physically plausible), we would expect individual decoys from ABlooper to be unphysical for divergent predictions.
With the arrival of B-cell receptor repertoire sequencing, the amount of publicly available paired antibody sequence data is rapidly increasing (Kovaltsuk et al., 2018; Olsen et al., 2022). Fast accurate tools such as ABlooper provide the opportunity for structural studies (such as Robinson et al., 2021) at previously infeasible scales. The model used for ABlooper is available at: https://github.com/oxpig/ABlooper.

Funding
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) with grant number (EP/S024093/1).

Conflict of Interest: none declared.

[Figure 1: flowchart with three panels. INPUT DATA: input geometry and CDR sequences (example shown: H1: GFNIKEY, H2: DPEQGN, H3: DTAAYFDY, L1: RASRDIKSYLN, L2: YATSLAE, L3: LQHGESPWT). E(n)-EGNN: iteratively update coordinates. OUTPUT LOOPS: CDR loop structures.]
Fig. 1. Flowchart showing how E(n)-EGNN is used to predict CDR loops in ABlooper. The input geometry for each CDR loop is generated by aligning its residues between their anchors, while the node features are extracted from the loop sequence. Atom coordinates are then iteratively updated using a four-layer E(n)-EGNN resulting in a predicted set of conformations for each CDR.
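
The starting geometry described in Section 2.2 and Fig. 1 (loop residues evenly spaced on a straight line between the anchor residues) can be illustrated with a short NumPy sketch. It assumes a single representative point per residue and treats the two innermost anchor positions as the line endpoints; the function name `straight_line_init` and these simplifications are illustrative assumptions, not taken from the ABlooper code base.

```python
import numpy as np


def straight_line_init(anchor_start, anchor_end, n_residues):
    """Place n_residues points evenly on the segment between two anchors.

    Hypothetical helper: the anchor positions themselves are excluded, so the
    loop residues sit strictly between the two endpoints.
    """
    anchor_start = np.asarray(anchor_start, dtype=float)
    anchor_end = np.asarray(anchor_end, dtype=float)
    # Fractions 1/(n+1), 2/(n+1), ..., n/(n+1) along the anchor-to-anchor vector.
    fractions = np.arange(1, n_residues + 1) / (n_residues + 1)
    return anchor_start + fractions[:, None] * (anchor_end - anchor_start)


# Example: an 8-residue loop between anchors 9 Å apart along the x axis.
coords = straight_line_init([0.0, 0.0, 0.0], [9.0, 0.0, 0.0], 8)
```

Such an initialisation carries no structural information of its own, which matches the description of the input as non-descriptive: everything the network learns about the loop shape has to come from the node features and the coordinate updates.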

Table 1. Performance comparison between AlphaFold2, ABodyBuilder, DeepAb and ABlooper for both test sets

Method               CDR-H1  CDR-H2  CDR-H3  CDR-L1  CDR-L2  CDR-L3

Rosetta Antibody Benchmark
AlphaFold2^a         0.84    0.99    2.87    0.53    0.49    0.95
ABodyBuilder         1.08    0.99    2.77    0.69    0.50    1.12
DeepAb               0.83    0.93    2.44    0.50    0.44    0.85
ABlooper             0.92    1.01    2.49    0.62    0.52    0.97
ABlooper unrelaxed   0.90    1.03    2.45    0.61    0.51    0.93

SAbDab Latest Structures
ABodyBuilder         1.24    1.07    3.25    0.88    0.57    1.03
DeepAb^a             1.00    0.82    2.49    0.59    0.45    0.90
ABlooper             1.14    0.97    2.72    0.74    0.55    1.04
ABlooper unrelaxed   1.14    0.99    2.66    0.73    0.54    1.01

The mean RMSD to the crystal structure across each test set for the six CDRs is shown. RMSDs for each CDR are calculated after superimposing their corresponding chain to the crystal structure. RMSDs are given in Angstroms (Å).
^a It is likely that AlphaFold2 used at least some of the structures in the benchmark set during training. Similarly, structures in the SAbDab Latest Structures set may have been used for training DeepAb.
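
The values in Table 1 are root-mean-square deviations over backbone atoms, evaluated once the chain has already been superimposed on the crystal structure. A minimal sketch of that final step is given below; it assumes both coordinate arrays are already in the same frame and list the same atoms in the same order. The function name `rmsd` and the array layout are illustrative assumptions.

```python
import numpy as np


def rmsd(pred, true):
    """RMSD in the units of the inputs, with no re-fitting of pred onto true."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    if pred.shape != true.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Squared displacement per atom, averaged over atoms, then square-rooted.
    return float(np.sqrt(np.mean(np.sum((pred - true) ** 2, axis=-1))))


# Example: four atoms, each displaced by the vector (3, 4, 0), i.e. by 5 Å.
true_coords = np.zeros((4, 3))
pred_coords = true_coords + np.array([3.0, 4.0, 0.0])
value = rmsd(pred_coords, true_coords)
```

Because the loop is not re-fitted on its own, this kind of RMSD penalises both an incorrect loop shape and an incorrect placement of the loop relative to the rest of the chain.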


Fig. 2. (A) CDR-H3 loop RMSD between final averaged prediction and crystal structure compared with average RMSD between the five ABlooper predictions for both the
Rosetta Antibody Benchmark and the SAbDab Latest Structures set. (B) An example of a poorly predicted CDR-H3 loop. All five predictions are given in grey, with the final
averaged prediction in blue and the crystal structure in green. The predictions from the five networks are very different, indicating an incorrect final prediction. (C) Example of
correctly predicted CDR loops. All five predictions are similar, indicating a high confidence prediction. Colours are the same as in (B). (D) Effect of removing structures with a
high CDR-H3 inter-prediction RMSD on the averaged RMSD for the set. The number of structures remaining after each quality cut-off is shown as a percentage. Data shown
for the RAB and the SLS sets
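
The inter-prediction RMSD plotted in Fig. 2 can be sketched as the mean pairwise RMSD between the five predicted loops, with a cut-off acting as the quality filter. The sketch below assumes all predictions share one coordinate frame; the function names, the array shape and the exact averaging scheme (a mean over all pairs) are assumptions made for illustration and may differ from ABlooper's implementation.

```python
from itertools import combinations

import numpy as np


def inter_prediction_rmsd(predictions):
    """Mean pairwise RMSD for an array of shape (n_models, n_atoms, 3)."""
    predictions = np.asarray(predictions, dtype=float)
    pair_rmsds = [
        np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1)))
        for a, b in combinations(predictions, 2)
    ]
    return float(np.mean(pair_rmsds))


def passes_filter(predictions, cutoff=1.5):
    """Keep a loop only if its models agree to within the cut-off (in Å)."""
    return inter_prediction_rmsd(predictions) <= cutoff


# Toy data: five identical models versus five models shifted apart.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 3))
agreeing = np.stack([base] * 5)
diverging = np.stack([base + 3.0 * k for k in range(5)])
```

In this toy example the identical models pass the 1.5 Å cut-off quoted in Section 3.2 and the shifted models do not, mirroring how low agreement between the five networks flags a loop as likely to be poorly modelled.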

References

Baek,M. et al. (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373, 871–876.
Berman,H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Carter,P.J. and Lazar,G.A. (2018) Next generation antibody drugs: pursuit of the ‘high-hanging fruit’. Nat. Rev. Drug Discov., 17, 197–223.
Chiu,M.L. et al. (2019) Antibody structure and function: the basis for engineering therapeutics. Antibodies, 8, 55.
Chothia,C. et al. (1989) Conformations of immunoglobulin hypervariable regions. Nature, 342, 877–883.
Dunbar,J. et al. (2014) SAbDab: the structural antibody database. Nucleic Acids Res., 42, D1140–D1146.
Eastman,P. et al. (2017) OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol., 13, e1005659.
Finzi,M. et al. (2021) A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In: Proceedings of the 38th International Conference on Machine Learning, PMLR, Vol. 139, pp. 3318–3328.
Jumper,J. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.
Kingma,D.P. and Ba,J. (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kovaltsuk,A. et al. (2018) Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J. Immunol., 201, 2502–2509.
Kryshtafovych,A. et al. (2021) Critical assessment of methods of protein structure prediction (CASP) - round XIV. Proteins, 89, 1607–1617.
Leem,J. et al. (2016) ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation. MAbs, 8, 1259–1268.
Liu,L. et al. (2020) On the variance of the adaptive learning rate and beyond. In: Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020).
Maier,J.A. et al. (2015) ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput., 11, 3696–3713.
Olsen,T.H. et al. (2022) Observed antibody space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci., 31, 141–146.
Robinson,S.A. et al. (2021) Epitope profiling using computational structural modelling demonstrated on coronavirus-binding antibodies. PLoS Comput. Biol., 17, e1009675.
Ruffolo,J.A. et al. (2020) Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics, 36, i268–i275.
Ruffolo,J.A. et al. (2021) Antibody structure prediction using interpretable deep learning. Patterns, 100406.
Satorras,V.G. et al. (2021) E(n) equivariant graph neural networks. arXiv preprint arXiv:2102.09844.
Schneider,C. et al. (2021) DLAB: deep learning methods for structure-based virtual screening of antibodies. Bioinformatics, 38(2), 377–383.
Senior,A.W. et al. (2020) Improved protein structure prediction using potentials from deep learning. Nature, 577, 706–710.
Teplyakov,A. et al. (2014) Antibody modeling assessment II. Structures and models. Proteins, 82, 1563–1582.
Yang,J. et al. (2020) Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA, 117, 1496–1503.
