De Novo Design
Article
pubs.acs.org/jcim
Journal of Chemical Information and Modeling
Received: July 17, 2013
Published: December 28, 2013
© 2013 American Chemical Society | dx.doi.org/10.1021/ci400418c | J. Chem. Inf. Model. 2014, 54, 49−56
■ INTRODUCTION
In the drug discovery process, various characteristics, including biological activities and ADMET properties, must be considered simultaneously. Medicinal chemists synthesize congeneric series of compounds to clarify the structure−activity relationship (SAR) of their own hits or lead series, and they use the SAR knowledge to optimize the compounds in further synthesis. In addition, drug discovery entails complicated optimization tasks with respect to ADMET properties. Medicinal chemists are often required to change scaffolds further by so-called core hopping to address scaffold-dependent issues.1−3 A good example is GSK’s B-Raf inhibitor program,4,5 in which the molecular frameworks (scaffolds) were changed during both the lead generation and lead optimization stages to discover a compound for clinical trial.
Together with medicinal chemistry, computational chemistry plays an important part in the discovery of new drugs. Computational molecular design has been an active research area over the past decades. Many computerized structural design approaches have been developed, which utilize protein structures and/or ligand structures.6−27 For example, LEGEND,6 LUDI,7 SPROUT,8 LEA3D,9 LigBuilder,10,11 and SYNOPSIS12 use protein structures, whereas TOPAS,13 CoG,14 and Flux15,16 use the structures of known ligands. The former methods are referred to as structure-based design and the latter as ligand-based design. The general advantage of ligand-based approaches is their wide range of applicability, because they can be used even when the three-dimensional (3D) structure of the target is not available.
Evolutionary algorithms are actively used for computerized molecular design. They are based on concepts derived from biological evolution, including reproduction, mutation, crossover, and selection. The algorithms are widely used to solve various drug discovery problems, such as parameter optimization of QSAR/QSPR models28 and 3D-ligand alignment29 as well as the compound design described above. New molecules are designed by repeatedly applying the evolutionary operations to existing molecules. Among these operations, mutation and crossover are vital for generating new chemical structures. Mutation methods are roughly classified into fragment-based mutation and atom-based mutation. In our previous study, the atom-based method was used for mutation, in which an atom is changed into another atom to explore the chemical space. The method often produced many unfavorable structures that contained invalid hetero−hetero atom bonds such as O−O and N−F.20 One approach to avoiding this problem is to use the substructure exclusion rules reported by Huang et al.18 An alternative is to use fragment-based mutation instead of atom-based mutation. We also considered that the use of fragments is a good way to generate chemically feasible structures when those fragments are derived from known molecules.
Evolutionary algorithms use a fitness score to select the surviving structures. For example, Molecule Evoluator17 uses medicinal chemists’ knowledge, and Flux uses a similarity index (Tanimoto coefficient or Euclidean distance) as the fitness function. In addition, building chemically feasible structures is an important point for de novo design. Fragment-based approaches have been favored for this purpose. For example, NovoFLAP19 uses fragments with thirty-two chemical transformation operators to generate structures obeying valence rules. Flux builds molecules by connecting fragments according to a RECAP-based rule (11 reaction schemes). These approaches are reasonable for generating feasible structures. However, we considered that a simpler approach with fewer connection rules would be easier to use.
In this paper, we propose a similarity-driven fragment-based evolutionary approach to producing drug-like molecules for drug discovery. The aim of the method is to explore candidates that are similar to a reference molecule and yet somewhat different not only in the side chains but also in their scaffolds. Chemical feasibility of the candidates is also considered.
■ METHODS
Outline of Evolutionary Algorithm. The basic idea of the present method is a simple similarity-driven evolutionary approach (Figure 1). The method employs an existing active molecule of interest. It is used as a reference molecule to navigate the chemical space to be explored. The reference molecule is also used to obtain seed fragments for making the initial population. The initial set of individual structures is prepared using the seed fragments and additional fragments from a fragment library. The procedure of the present evolutionary approach is summarized as follows:
(1) Input a reference structure.
(2) Generate seed fragments.
(3) Make an initial set of individual structures.
(4) Generate offspring for the next generation by mutation and crossover.
(5) Evaluate the fitness of the structures and select some of them to survive.
(6) Repeat steps 4 and 5 until the specified number of generations is reached.
In the following, we will use the term “fragment” to mean a building block. In our approach, a molecule can be built by connecting the fragments at their connection points specified in advance. The connection points of a fragment are identified from the original molecule as shown in Figure 2.
Figure 2. Example of generation of seed fragments and making an initial
structure. A connectable point is indicated by an asterisk.
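The six-step procedure summarized under Outline of Evolutionary Algorithm could be sketched roughly as below. This is only an illustrative skeleton, not the authors' implementation: the function `evolve`, its parameters, the 50% crossover rate, and the truncation selection that ranks parents and offspring together are all assumptions made for the example, and the integer "molecules" are a toy stand-in for real structures built from fragments.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Mol = TypeVar("Mol")  # placeholder for whatever object represents a molecule


def evolve(
    initial_population: Sequence[Mol],
    fitness: Callable[[Mol], float],
    mutate: Callable[[Mol, random.Random], Mol],
    crossover: Callable[[Mol, Mol, random.Random], Mol],
    n_generations: int,
    n_survivors: int,
    n_offspring: int,
    crossover_rate: float = 0.5,  # assumed value, not taken from the paper
    seed: int = 0,
) -> List[Mol]:
    """Run steps 3-6: start from a population, breed, score, select, repeat."""
    rng = random.Random(seed)
    population = list(initial_population)  # step 3 (built by the caller)
    for _ in range(n_generations):  # step 6
        offspring = []
        for _ in range(n_offspring):  # step 4
            if len(population) >= 2 and rng.random() < crossover_rate:
                parent_a, parent_b = rng.sample(population, 2)
                offspring.append(crossover(parent_a, parent_b, rng))
            else:
                offspring.append(mutate(rng.choice(population), rng))
        # Step 5: rank parents and offspring together and keep the fittest
        # (truncation selection is an assumption of this sketch).
        ranked = sorted(population + offspring, key=fitness, reverse=True)
        population = ranked[:n_survivors]
    return population


# Toy stand-in: "molecules" are integers and the "reference" is 42.
def toy_fitness(x: int) -> float:
    return -abs(x - 42)


def toy_mutate(x: int, rng: random.Random) -> int:
    return x + rng.choice([-3, -1, 1, 3])


def toy_crossover(a: int, b: int, rng: random.Random) -> int:
    return (a + b) // 2


final = evolve([0, 5, 10, 90], toy_fitness, toy_mutate, toy_crossover,
               n_generations=50, n_survivors=4, n_offspring=8)
```

In a real setting, `mutate` and `crossover` would operate on fragment-built structures and respect the connection rules, and `fitness` would be the similarity to the reference molecule.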
Figure 4. Fragment connection rules for generating molecules with chemical validity. Allowed connections (a−c) and prohibited connections (d−f).
crossed over. To achieve crossover, both the fragment type and bond order at the two crossover points must be the same (Figure 5d).
Fitness Evaluation. The aim of the current work is to explore candidate structures that are similar to a reference molecule and yet somewhat different in their scaffolds. For that reason, the Tanimoto coefficient was used as the fitness function to evaluate molecular similarity to the reference molecule. Surviving compounds were then selected according to their fitness scores (details are described in the Selection subsection). These evolutionary processes were repeated for the number of generations specified in advance to yield the designed structures as output.
In this study, a particular descriptor (referred to as the topological-fragment-descriptor; TFD) was employed to profile chemical structures. The TFD was calculated in a manner similar to that of the topological-fragment-spectra method.34,35 First, we enumerated all possible structural fragments that have the specified number (six in this work) or fewer connected atoms, excluding hydrogen atoms. Each fragment was characterized by its constituent atoms based on atomic type, hybridization, and whether the atom is contained in at least one (aromatic or aliphatic) ring. Then, each characterized fragment was hashed into a single integer. The occurrences of individual fragments with the same characteristic value were then counted to generate a numerical vector. In this way, every chemical structure was described as a multidimensional numerical pattern vector by means of the TFD method.
■ RESULTS AND DISCUSSION
Change of the Mean Fitness. First, a computational trial of the molecular evolution was carried out to explore candidate structures for hAA2A with reference molecule 1. To examine the performance of the current approach, the change of the mean fitness was measured.
The mean fitness of each population was calculated for the 100 surviving molecules. As mentioned above, every experi-
Figure 6. Change of the mean fitness for reference molecule 1. The red line shows the total average of the ten trials. The blue line shows that for the best molecule, and the green line for the worst molecule.
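To make the Fitness Evaluation description more concrete, the following is a minimal sketch of a TFD-like count vector and a Tanimoto score computed on it. It is not the authors' code. The adjacency-dictionary graph format, the atom label strings, the use of CRC32 as the hash, and the continuous form of the Tanimoto coefficient, T = A·B / (|A|² + |B|² − A·B), are assumptions chosen for illustration; the function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Dict, FrozenSet, Set

Graph = Dict[int, Set[int]]  # heavy-atom adjacency: atom index -> neighbours


def connected_subgraphs(adj: Graph, max_atoms: int) -> Set[FrozenSet[int]]:
    """Enumerate every connected atom subset with at most max_atoms atoms."""
    found: Set[FrozenSet[int]] = set()
    frontier = [frozenset([atom]) for atom in adj]
    while frontier:
        sub = frontier.pop()
        if sub in found:
            continue
        found.add(sub)
        if len(sub) < max_atoms:
            # Grow the fragment by one neighbouring atom at a time.
            neighbours = set().union(*(adj[a] for a in sub)) - sub
            frontier.extend(sub | {n} for n in neighbours)
    return found


def fragment_descriptor(adj: Graph, labels: Dict[int, str],
                        max_atoms: int = 6) -> Counter:
    """Count fragments keyed by a hash of their constituent atom labels.

    Each label is assumed to encode atom type, hybridization and ring
    membership, e.g. "C.sp3.chain".
    """
    counts: Counter = Counter()
    for sub in connected_subgraphs(adj, max_atoms):
        key = "|".join(sorted(labels[a] for a in sub))
        counts[zlib.crc32(key.encode("utf-8"))] += 1
    return counts


def tanimoto(a: Counter, b: Counter) -> float:
    """Continuous Tanimoto coefficient between two count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    denom = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / denom if denom else 1.0


# Toy heavy-atom graphs: ethanol (C-C-O) and propane (C-C-C).
chain = {0: {1}, 1: {0, 2}, 2: {1}}
ethanol = fragment_descriptor(
    chain, {0: "C.sp3.chain", 1: "C.sp3.chain", 2: "O.sp3.chain"})
propane = fragment_descriptor(
    chain, {0: "C.sp3.chain", 1: "C.sp3.chain", 2: "C.sp3.chain"})
similarity = tanimoto(ethanol, propane)
```

With such a function as the fitness, the mean fitness of a generation would simply be the average of `tanimoto(reference, candidate)` over the surviving molecules.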
Figure 8. Result of hAA2A ligand design. All of the designed molecules are taken from the 5th trial. The NTG (NTG/CG) of a molecule is specified
by bold atoms and bold bonds.
Figure 9. Successful examples of r5HT1A ligand design. Structures with known NTG (NTG/CG) were successfully designed from seed fragments.
The NTG moiety is specified by bold atoms and bold bonds.
4) shared the same NTG of reference 1, demonstrating the evolutionary direction of the similarity-based approach. The notable points of the design are the following: (1) The NTG scaffolds of the designed molecule 5 and a known hAA2A ligand 7 are the same. (2) The NTG scaffolds of 5 and the reference molecule 1 are different. The structural difference between 5 and 7 is only three methyl groups on the furan rings and the amide linker. As mentioned, a new molecule with a similar but different scaffold could be successfully designed from simple seed fragments. It should also be noted that molecules with lower fitness are worthy of remark. For example, the fitness of molecule 5 is 0.73.
Ligand Design for r5HT1A. Figure 9 shows part of the results of the r5HT1A ligand design. The designed molecules for r5HT1A were also compared with the active molecules of GPCR SARfari. Some successful examples are shown in Figure 9. When 2 was used as a reference, molecule 8 was obtained as one of the designed molecules. The NTG scaffolds of 2 and 8 are different from each other. Compound 9 was identified from GPCR SARfari as a molecule similar to 8. The structural difference between 8 and 9 was only a methoxy group on the phenyl ring. Compound 10 was also designed from reference 2. Compound 11, which has the same NTG as 10, was identified from the database. Compound 13 was designed from reference 12, and we were able to find 14 that perfectly matched a molecule in the database. Compound 16 was designed from reference 15, and compound 17, which shares the same NTG, was found as a known active molecule. These design examples show the applicability of our proposed method.
Chemical Feasibility of Designed Molecules. In Figure 8, molecules 2−5 are the designed molecules with the highest fitness, the lowest fitness, and in-between molecules collected from the fifth trial (run 5). The reference molecule used in this case was 1. Chemical feasibility (or chemical validity) of the designed molecules was examined because the candidate structures should not include unfavorable structures such as the invalid hetero−hetero atom bonds that often appeared in our previous work. In this work, we introduced a fragment library for the mutation operation to avoid the problem. The connection rules for the fragments defined in the present method may also play an important role in improving the performance. The designed molecules are highly similar to the reference molecule. This is also obvious from visual inspection of Figure 8. In particular, the scaffolds of compounds 3 and 4 (shown by bold atoms and bold bonds) are the same as the scaffold of the reference molecule.
Scaffold Variation of Designed Molecules. We investigated scaffold variation of the designed molecules obtained from the molecular evolution experiment. Again, the chemical graph (CG) representation of the nonterminal vertex graph (NTG) was used to define the scaffold. As shown in Figure 8, the scaffold of the designed molecule 3 is the same as that of the reference molecule 1, but the scaffold of molecule 5 is different from that of 1. The numbers of unique molecules and unique scaffolds are summarized in Table 1. The results clearly show that a large number of unique molecules with a variety of scaffolds were produced by the current molecular evolutions. The ratios of the number of unique molecules to the number of unique scaffolds were 4.88 for hAA2A and 4.70 for r5HT1A. This means that the designed molecules that shared the same scaffold are less
Table 1. Result of Scaffold Analysis Using NTG (NTG/CG) [a]
          GPCR SARfari                     designed
target    active molecules [b]  NTGs [b]   molecules [b]  NTGs [b]   NTG-shared [c]
hAA2A     2214                  811        2975           610        6
r5HT1A    3195                  1260       4063           865        10
[a] Five reference molecules were used for the design. [b] Duplicates were excluded. [c] Number of NTG/CGs that are shared in both of the entries of GPCR SARfari and the designed molecules.
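The molecule-to-scaffold ratios and the counts of new scaffolds quoted in the text follow directly from the Table 1 entries. The short check below reproduces that arithmetic; the dictionary layout and the helper name `scaffold_summary` are illustrative only.

```python
# Table 1 values for the designed molecules (duplicates excluded).
table1 = {
    "hAA2A": {"designed_molecules": 2975, "designed_ntgs": 610, "shared_ntgs": 6},
    "r5HT1A": {"designed_molecules": 4063, "designed_ntgs": 865, "shared_ntgs": 10},
}


def scaffold_summary(row: dict) -> tuple:
    """Return (molecules per scaffold, number of scaffolds not in GPCR SARfari)."""
    ratio = row["designed_molecules"] / row["designed_ntgs"]
    new_scaffolds = row["designed_ntgs"] - row["shared_ntgs"]
    return round(ratio, 2), new_scaffolds


summary = {target: scaffold_summary(row) for target, row in table1.items()}
# hAA2A: 2975 / 610 rounds to 4.88, with 610 - 6 = 604 new scaffolds.
# r5HT1A: 4063 / 865 rounds to 4.70, with 865 - 10 = 855 new scaffolds.
```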
successfully designed by our approach without any special consideration. In other words, six known (validated) scaffolds and 604 new scaffolds were produced during the current molecular evolution for the ligand design. For the case of r5HT1A, 10 known scaffolds and 855 new scaffolds were produced during the molecular evolution with the reference molecules.
Comparison with Other Methods. The performance of the current approach was compared with that of other methods. We compared with two recent works, NovoFLAP19 and Flux,15 because they reported the chemical structures of both the reference molecules and the designed molecules. Here, we focused on molecular similarity and the medicinal chemistry viewpoint. First, the study was performed using the reference molecules CP99994 (s1) and ICI (s6).19 Then, another study was performed using the reference molecules Gleevec (s11) and a Factor Xa inhibitor (s14).15 The designed molecules with the highest fitness are shown in Supporting Information Figures S1 and S2. When CP99994 (s1) was used as a reference, s2 was designed as the best molecule and s3 as the second best molecule. The designed molecule s2 is very similar to s1; the only difference is the substitution position of the methoxy group. The difference between s1 and s3 was the size of the central heteroring; such a design is not shown in the literature. This type of designed molecule is medicinally relevant because of the empirical knowledge that reducing the ring size may improve metabolic stability.37 In the case of ICI (s6), candidate molecules that have new molecular frameworks (s7, s8) were produced by connecting the known fragments in novel ways. Although it is difficult to strictly compare the performance or the quality of different methods, the results show at least that similar and medicinally relevant analogues were successfully designed by our method.
■ CONCLUSIONS
We reported a simple similarity-driven evolutionary approach to producing candidate molecules for drug design and discovery. The method makes it possible to produce candidate molecules that are similar to the reference molecule and yet somewhat different not only in side chains but also in their scaffolds. It is also expected that those candidate structures are chemically feasible. The method was implemented in a software tool and validated with computer experiments for GPCR-related ligand design using our own fragment library prepared from GPCR SARfari.
■ ASSOCIATED CONTENT
Supporting Information
Results of comparison study (Figures S1 and S2). This material is available free of charge via the Internet at http://pubs.acs.org.
■ AUTHOR INFORMATION
Corresponding Author
*Tel.: +81-75-594-0787. Fax: +81-75-594-0790. E-mail: kawai_kentaro@kaken.co.jp.
■ ABBREVIATIONS
QSAR, quantitative structure−activity relationship; QSPR, quantitative structure−property relationship; GPCR, G protein-coupled receptor; r5HT1A, rat 5-hydroxytryptamine receptor 1A; hAA2A, human adenosine receptor A2a
■ REFERENCES
(1) Jenkins, J. L.; Glick, M.; Davies, J. W. A 3D similarity method for scaffold hopping from known drugs or natural ligands to new chemotypes. J. Med. Chem. 2004, 47, 6144−6159.
(2) Oyarzabal, J.; Howe, T.; Alcazar, J.; Andrés, J. I.; Alvarez, R. M.; Dautzenberg, F.; Iturrino, L.; Martínez, S.; Van der Linden, I. Novel approach for chemotype hopping based on annotated databases of chemically feasible fragments and a prospective case study: new melanin concentrating hormone antagonists. J. Med. Chem. 2009, 52, 2076−2089.
(3) Bahl, A.; Barton, P.; Bowers, K.; Caffrey, M. V.; Denton, R.; Gilmour, P.; Hawley, S.; Linannen, T.; Luckhurst, C. A.; Mochel, T.; Perry, M. W.; Riley, R. J.; Roe, E.; Springthorpe, B.; Stein, L.; Webborn, P. Scaffold-hopping with zwitterionic CCR3 antagonists: identification and optimization of a series with good potency and pharmacokinetics leading to the discovery of AZ12436092. Bioorg. Med. Chem. Lett. 2012, 22, 6694−6699.
(4) Stellwagen, J. C.; Adjabeng, G. M.; Arnone, M. R.; Dickerson, S. H.; Han, C.; Hornberger, K. R.; King, A. J.; Mook, R. A., Jr.; Petrov, K. G.; Rheault, T. R.; Rominger, C. M.; Rossanese, O. W.; Smitheman, K. N.; Waterson, A. G.; Uehling, D. E. Development of potent B-RafV600E inhibitors containing an arylsulfonamide headgroup. Bioorg. Med. Chem. Lett. 2011, 21, 4436−4440.
(5) Rheault, T. R.; Stellwagen, J. C.; Adjabeng, G. M.; Hornberger, K. R.; Petrov, K. G.; Waterson, A. G.; Dickerson, S. H.; Mook, R. A.; Laquerre, S. G.; King, A. J.; Rossanese, O. W.; Arnone, M. R.; Smitherman, K. N.; Kane-Carson, L. S.; Han, C.; Moorthy, G. S.; Moss, K. G.; Uehling, D. E. Discovery of dabrafenib: a selective inhibitor of Raf kinases with antitumor activity against B-Raf-driven tumors. ACS Med. Chem. Lett. 2013, 4, 358−362.
(6) Nishibata, Y.; Itai, A. Automatic creation of drug candidate structures based on receptor structure. Starting point for artificial lead generation. Tetrahedron 1991, 47, 8885−8990.
(7) Bohm, H. J. The computer program LUDI: A new method for the de novo design of enzyme inhibitors. J. Comput.-Aided Mol. Des. 1992, 6, 61−78.
(8) Gillet, V.; Johnson, A. P.; Mata, P.; Sike, S.; Williams, P. SPROUT: A program for structure generation. J. Comput.-Aided Mol. Des. 1993, 7, 127−153.
(9) Douguet, D.; Munier-Lehmann, H.; Labesse, G.; Pochet, S. LEA3D: a computer-aided ligand design for structure-based drug design. J. Med. Chem. 2005, 48, 2457−2468.
(10) Wang, R.; Gao, Y.; Lai, L. A multi-purpose program for structure-based drug design. J. Mol. Model. 2000, 6, 498−516.
(11) Yuan, Y.; Pei, J.; Lai, L. LigBuilder 2: A practical de novo drug design approach. J. Chem. Inf. Model. 2011, 51, 1083−1091.
(12) Vinkers, H. M.; de Jonge, M. R.; Daeyaert, F. F.; Heeres, J.; Koymans, L. M.; van Lenthe, J. H.; Lewi, P. J.; Timmerman, H.; Van Aken, K.; Janssen, P. A. SYNOPSIS: synthesize and optimize system in silico. J. Med. Chem. 2003, 46, 2765−2773.
(13) Schneider, G.; Lee, M. L.; Stahl, M.; Schneider, P. De novo design of molecular architectures by evolutionary assembly of drug-derived building blocks. J. Comput.-Aided Mol. Des. 2000, 14, 487−494.
(14) Brown, N.; McKay, B.; Gilardoni, F.; Gasteiger, J. A graph-based genetic algorithm and its application to the multiobjective evolution of median molecules. J. Chem. Inf. Comput. Sci. 2004, 44, 1079−1087.
(15) Fechner, U.; Schneider, G. Flux (1): A virtual synthesis scheme for fragment-based de novo design. J. Chem. Inf. Model. 2006, 46, 699−707.
(16) Fechner, U.; Schneider, G. Flux (2): Comparison of molecular mutation and crossover operators for ligand-based de novo design. J. Chem. Inf. Model. 2007, 47, 656−667.
(17) Lameijer, E. W.; Kok, J. N.; Bäck, T.; Ijzerman, A. P. The molecule evoluator. An interactive evolutionary algorithm for the design of drug-like molecules. J. Chem. Inf. Model. 2006, 46, 545−552.
(18) Huang, Q.; Li, L. L.; Yang, S. Y. PhDD: a new pharmacophore-based de novo design method of drug-like molecules combined with assessment of synthetic accessibility. J. Mol. Graph. Model. 2010, 28, 775−787.
(19) Damewood, J. R., Jr.; Lerman, C. L.; Masek, B. B. NovoFLAP: A ligand-based de novo design approach for the generation of medicinally relevant ideas. J. Chem. Inf. Model. 2010, 50, 1296−1303.
(20) Kawai, K.; Yoshimaru, K.; Takahashi, Y. Generation of target-selective drug candidate structures using molecular evolutionary algorithm with SVM classifiers. J. Comput. Chem. Jpn. 2011, 10, 79−87.
(21) Gillet, V.; Johnson, A. P.; Mata, P.; Sike, S.; Williams, P. SPROUT: A program for structure generation. J. Comput.-Aided Mol. Des. 1993, 7, 127−153.
(22) Eisen, M. B.; Wiley, D. C.; Karplus, M.; Hubbard, R. E. HOOK: A program for finding novel molecular architectures that satisfy the chemical and steric requirements of a macromolecule binding site. Proteins 1994, 19, 199−221.
(23) Bohacek, R. S.; McMartin, C. Multiple highly diverse structures complementary to enzyme binding sites: Results of extensive application of de novo design method incorporating combinatorial growth. J. Am. Chem. Soc. 1994, 116, 5560−5571.
(24) Pearlman, D. A.; Murcko, M. A. CONCERTS: Dynamic connection of fragments as an approach to de novo ligand design. J. Med. Chem. 1996, 39, 1651−1663.
(25) Beccari, A. R.; Cavazzoni, C.; Beato, C.; Costantino, G. LiGen: A High Performance Workflow for Chemistry Driven de Novo Design. J. Chem. Inf. Model. 2013, 53, 1518−1527.
(26) Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discov. 2005, 4, 649−663.
(27) Dey, F.; Caflisch, A. Fragment-based de novo ligand design by multiobjective evolutionary optimization. J. Chem. Inf. Model. 2008, 48, 679−690.
(28) Wang, J.; Krudy, G.; Xie, X. Q.; Wu, C.; Holland, G. Genetic Algorithm-Optimized QSPR Models for Bioavailability, Protein Binding, and Urinary Excretion. J. Chem. Inf. Model. 2006, 46, 2674−2683.
(29) Jones, G.; Gao, Y.; Sage, C. R. Elucidating molecular overlays from pairwise alignments using a genetic algorithm. J. Chem. Inf. Model. 2009, 49, 1847−1855.
(30) Walters, W. P.; Green, J.; Weiss, J. R.; Murcko, M. A. What do medicinal chemists actually make? A 50-year retrospective. J. Med. Chem. 2011, 54, 6405−6416.
(31) GPCR SARfari. https://www.ebi.ac.uk/chembl/sarfari/gpcrsarfari (accessed Mar 8, 2013).
(32) Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100−1107.
(33) O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An open chemical toolbox. J. Cheminf. 2011, 3, 33.
(34) Takahashi, Y.; Ohoka, H.; Ishiyama, Y. Structural Similarity Analysis Based on Topological Fragment Spectra. In Advances in Molecular Similarity; Carbo, R., Mezey, P., Eds.; JAI Press: Greenwich, CT, 1998; Vol. 2, pp 93−104.
(35) Kawai, K.; Fujishima, S.; Takahashi, Y. Predictive activity profiling of drugs by topological-fragment-spectra-based support vector machines. J. Chem. Inf. Model. 2008, 48, 1152−1160.
(36) Takahashi, Y. Chemical data mining based on non-terminal vertex graph. In 2004 IEEE International Conference on Systems, Man and Cybernetics, The Hague, Oct 10−13; IEEE: Piscataway, NJ, 2004; Vol. 5, pp 4583−4587.
(37) St. Jean, D. J., Jr.; Fotsch, C. Mitigating heterocycle metabolism in drug discovery. J. Med. Chem. 2012, 55, 6002−6020.