ABlooper: Fast Accurate Antibody CDR Loop Structure
https://doi.org/10.1093/bioinformatics/btac016
Advance Access Publication Date: 31 January 2022
Original Paper
Structural bioinformatics
ABlooper: fast accurate antibody CDR loop structure
prediction with accuracy estimation
Received on August 16, 2021; revised on November 26, 2021; editorial decision on January 3, 2022
Abstract
Motivation: Antibodies are a key component of the immune system and have been extensively used as biotherapeutics.
Accurate knowledge of their structure is central to understanding their antigen-binding function. The key area
for antigen binding and the main area of structural variation in antibodies are concentrated in the six
complementarity determining regions (CDRs), with the most important for binding and most variable being the CDR-H3 loop. The
sequence and structural variability of CDR-H3 make it particularly challenging to model. Recently deep learning
methods have offered a step change in our ability to predict protein structures.
Results: In this work, we present ABlooper, an end-to-end equivariant deep learning-based CDR loop structure
prediction tool. ABlooper rapidly predicts the structure of CDR loops with high accuracy and provides a confidence
estimate for each of its predictions. On the models of the Rosetta Antibody Benchmark, ABlooper makes predictions
with an average CDR-H3 RMSD of 2.49 Å, which drops to 2.05 Å when considering only its 75% most confident
predictions.
Availability and implementation: https://github.com/oxpig/ABlooper.
Contact: opig@stats.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
rotating the input before feeding it into the network will have the same result as rotating the output. In the case of
proteins, using a network equivariant to both translations and rotations in 3D space allows us to learn directly from
atom coordinates. This is in contrast to previous methods like TrRosetta (Yang et al., 2020) or the original version of
AlphaFold (Senior et al., 2020) that predicted invariant features, such as inter-residue distances and orientations
which are then used to reconstruct the protein. A number of approaches for developing equivariant networks have been
recently developed (e.g. Finzi et al., 2021).
In this article, we explore the use of an equivariant approach to

IDs of all the structures used in the train, test, and validation sets is given in the Supplementary Material.
ABodyBuilder was used to build models of all the structures. Structural models were generated using the singularity
version of ABodyBuilder (Leem et al., 2016) (fragment database from July 8, 2021) excluding all templates with a 99% or
higher sequence identity. ABlooper CDR models for the test sets were obtained by remodelling the CDR loops on
ABodyBuilder models.
2.2 Deep learning
ABlooper is composed of five E(n)-EGNNs, each one with four
Fig. 1. Flowchart showing how E(n)-EGNN is used to predict CDR loops in ABlooper. The input geometry for each CDR loop is generated by aligning its residues between
their anchors, while the node features are extracted from the loop sequence. Atom coordinates are then iteratively updated using a four-layer E(n)-EGNN resulting in a
predicted set of conformations for each CDR.
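The equivariance property that motivates this architecture can be checked numerically. The sketch below is not ABlooper's network: it is a minimal, hypothetical coordinate update in the spirit of an E(n)-equivariant layer, in which each atom moves along the difference vectors to the other atoms, weighted by a function of invariant quantities only (node features and squared distances). The names `edge_weight`, `update_coords` and `predict`, and the hand-written weighting function standing in for a learned one, are assumptions made for illustration.

```python
import math


def edge_weight(h_i, h_j, dist_sq):
    # Hypothetical stand-in for a learned function. It only sees invariant
    # inputs (scalar node features and a squared distance).
    return math.tanh(h_i * h_j) / (1.0 + dist_sq)


def update_coords(coords, feats):
    # One equivariant update: move each atom along weighted difference vectors.
    n = len(coords)
    new_coords = []
    for i in range(n):
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            dist_sq = sum(d * d for d in diff)
            w = edge_weight(feats[i], feats[j], dist_sq)
            for k in range(3):
                shift[k] += w * diff[k] / (n - 1)
        new_coords.append([coords[i][k] + shift[k] for k in range(3)])
    return new_coords


def predict(coords, feats, n_layers=4):
    # Apply the update iteratively, mirroring the four-layer setup in the text.
    for _ in range(n_layers):
        coords = update_coords(coords, feats)
    return coords


def rotate_z_and_shift(coords, angle, offset):
    # Rigid transform: rotation about the z axis followed by a translation.
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y + offset[0], s * x + c * y + offset[1], z + offset[2]]
            for x, y, z in coords]


def max_abs_diff(a, b):
    return max(abs(p - q) for u, v in zip(a, b) for p, q in zip(u, v))


# Toy input: four atoms with made-up coordinates and scalar features.
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.3, 0.4], [3.1, 1.9, -0.6]]
feats = [0.2, -0.5, 0.9, 0.4]
angle, offset = 0.7, [4.0, -2.0, 1.0]

transform_then_predict = predict(rotate_z_and_shift(coords, angle, offset), feats)
predict_then_transform = rotate_z_and_shift(predict(coords, feats), angle, offset)
equivariance_gap = max_abs_diff(transform_then_predict, predict_then_transform)
print(f"max deviation between the two orders: {equivariance_gap:.2e}")
```

Because the weights depend only on invariant quantities, transforming the input and then predicting gives the same coordinates as predicting and then transforming, up to floating-point error.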
original predicted positions. The spring constant of the harmonic potential is set to 10 kcal/(mol·Å²). Energy
minimization is done using the Langevin Integrator in the OpenMM Python package (Eastman et al., 2017). This relaxation
step typically results in a small loss in accuracy, but ensures that predicted loops are physically plausible.
2.4 DeepAb and AlphaFold2
DeepAb structural models were generated using the open-source version of the code (available at
https://github.com/RosettaCommons/DeepAb). As suggested in their paper (Ruffolo et al., 2021), we generated five decoys
per structure. This took around

certainty of the final averaged prediction. If all five models agree on the same conformation, then it is more likely
that it will be the correct conformation; if they do not, then the final prediction is likely to be less accurate
(Fig. 2). This allows ABlooper to give a confidence score for each predicted loop. As shown in Figure 2D, this score
can be used as a filter, removing structures which are expected to be incorrectly modelled by ABlooper. For example, by
setting a 1.5 Å inter-prediction RMSD cut-off on structures from the Rosetta Antibody Benchmark, the average CDR-H3
RMSD for the set can be reduced from 2.49 to 2.05 Å while keeping around three quarters of the predictions. As
expected, accuracy filtering has a tendency to
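The agreement-based confidence filter can be illustrated with a short sketch. It assumes that agreement is measured as the mean pairwise RMSD between the five predictions of a loop, computed without superposition because all predictions share one coordinate frame; the exact definition used by ABlooper may differ. The function names and the toy coordinates are hypothetical, and only the 1.5 Å cut-off is taken from the text.

```python
import math
from itertools import combinations


def rmsd(a, b):
    # Root-mean-square deviation between two equally sized coordinate lists.
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def average_prediction(preds):
    # Final prediction: per-atom mean over all models.
    n_atoms = len(preds[0])
    return [[sum(p[i][k] for p in preds) / len(preds) for k in range(3)]
            for i in range(n_atoms)]


def inter_prediction_rmsd(preds):
    # Assumed agreement measure: mean RMSD over all pairs of predictions.
    pairs = list(combinations(preds, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


def keep_confident(loops, cutoff=1.5):
    # Keep the averaged prediction only for loops whose models agree.
    return {name: average_prediction(preds)
            for name, preds in loops.items()
            if inter_prediction_rmsd(preds) <= cutoff}


def shifted(base, dx):
    return [[x + dx, y, z] for x, y, z in base]


# Toy data: five made-up predictions per loop, differing by a shift along x.
base = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.5, 0.0]]
loops = {
    "loop_in_agreement": [shifted(base, 0.1 * i) for i in range(5)],
    "loop_in_disagreement": [shifted(base, 2.0 * i) for i in range(5)],
}

kept = keep_confident(loops)
for name, preds in loops.items():
    status = "kept" if name in kept else "filtered out"
    print(f"{name}: {inter_prediction_rmsd(preds):.2f} A ({status})")
```

In this toy case the first loop has a mean pairwise RMSD well below the cut-off and is retained, while the second is discarded as a likely inaccurate prediction.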
Table 1. Performance comparison between AlphaFold2, ABodyBuilder, DeepAb and ABlooper for both test sets
The mean RMSD to the crystal structure across each test set for the six CDRs is shown. RMSDs for each CDR are calculated after superimposing their
corresponding chain to the crystal structure. RMSDs are given in Angstroms (Å).
a It is likely that AlphaFold2 used at least some of the structures in the benchmark set during training. Similarly, structures in the SAbDab Latest Structures set
may have been used for training DeepAb.
1880 B.Abanades et al.