Ijms 22 03848
Molecular Sciences
Article
Structure Driven Prediction of Chromatographic Retention
Times: Applications to Pharmaceutical Analysis
Roman Szucs 1,2,*, Roland Brown 1, Claudio Brunelli 1, James C. Heaton 1 and Jasna Hradski 2
1 Pfizer R&D UK Limited, Ramsgate Road, Sandwich CT13 9NJ, UK; roland.brown@pfizer.com (R.B.);
claudio.brunelli@pfizer.com (C.B.); james.heaton@pfizer.com (J.C.H.)
2 Department of Analytical Chemistry, Faculty of Natural Sciences, Comenius University in Bratislava,
Mlynská Dolina CH2, Ilkovičova 6, SK-84215 Bratislava, Slovakia; hradski1@uniba.sk
* Correspondence: roman.szucs@pfizer.com
Abstract: Pharmaceutical drug development relies heavily on the use of Reversed-Phase Liquid Chromatography methods. These methods are used to characterize active pharmaceutical ingredients and drug products by separating the main component from related substances such as process related impurities or main component degradation products. The results presented here indicate that retention models based on Quantitative Structure Retention Relationships can be used for de-risking methods used in pharmaceutical analysis and for the identification of optimal conditions for separation of known sample constituents from postulated/hypothetical components. The prediction of retention times for hypothetical components in established methods is highly valuable as these compounds are not usually readily available for analysis. Here we discuss the development and optimization of retention models, selection of the most relevant structural molecular descriptors, regression model building and validation. We also present a practical example applied to chromatographic method development and discuss the accuracy of these models on selection of optimal separation parameters.
Keywords: Quantitative Structure Retention Relationships; chromatographic method development; pharmaceutical analysis
Citation: Szucs, R.; Brown, R.; Brunelli, C.; Heaton, J.C.; Hradski, J. Structure Driven Prediction of Chromatographic Retention Times: Applications to Pharmaceutical Analysis. Int. J. Mol. Sci. 2021, 22, 3848. https://doi.org/10.3390/ijms22083848
Academic Editor: Josef Jampilek
Received: 26 March 2021
Accepted: 6 April 2021
Published: 8 April 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Pharmaceutical analysis is an important area of chemical analysis used to support diverse and excessively complex activities associated with drug development. The application of Reversed-Phase Liquid Chromatography (RP-LC) is ubiquitous in the support of process chemistry optimisation, formulation development as well as key quality control assessment for the release of materials designated for all stages of pre-clinical and clinical trials.
In process chemistry development, RP-LC is commonly used to assess the assay/purity of starting materials, isolated synthetic intermediates and Active Pharmaceutical Ingredients (APIs). This usually requires baseline separation of all known components of complex mixtures, their identification and subsequent quantitation. This is performed in accordance with the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use guidelines as applied to product specification, impurities management and method validation [1–3]. In addition, purging of process related impurities, synthetic by-products and key degradants requires their chromatographic monitoring at all relevant interventions (e.g., isolation steps). Process chemistry understanding relies heavily on the application of RP-LC. Chemists are required to understand the impact of synthetic parameters on the quality of their processes which make important starting materials, intermediates and final API. This is an essential requirement of commercial synthetic route development. Lastly, the understanding of degradation also requires chromatographic separation of key degradation products from the main component and their subsequent identification and quantitation [4–6].
Figure 1. Pairwise structural similarities expressed as Tanimoto index. See text for details.
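As a brief illustration of the similarity measure named in the caption of Figure 1, the sketch below computes pairwise Tanimoto indices for binary fingerprints represented as sets of "on" bit positions. The fingerprint type used by the authors is not stated in this section, so the three fingerprints and the compound labels below are purely hypothetical, as is the convention chosen for two empty fingerprints.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


# Hypothetical fingerprints (not taken from the paper).
fingerprints = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8, 9},
    "cmpd_C": {2, 3, 5},
}

# One Tanimoto value per unordered pair of compounds.
pairwise = {
    (m, n): tanimoto(fingerprints[m], fingerprints[n])
    for m, n in combinations(fingerprints, 2)
}

for pair, value in pairwise.items():
    print(pair, round(value, 3))
```

A value of 1 indicates identical fingerprints and a value of 0 indicates no shared bits; a matrix of such values is what a pairwise similarity plot of this kind displays.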
2.1.5. Model Validation
In order to assess the ability of QSRR models to predict retention times of compounds that were not used in their development or optimisation, retention times for eight test sets, created as described in Section 2.1.2, were predicted. This was repeated for all six screening conditions as described in Section 2.2. QSRR predicted retention times are shown in Table S1 and Figure 3 demonstrates the match between QSRR predicted and experimentally determined retention times. Finally, the corresponding RMSE and R values are provided in Table 3.
Figure 3. Predicted vs experimental retention times (tR) for 6 screening conditions. See Table 4 for the details of experiments.
Table 3. Root mean square error (RMSE) and correlation coefficient (R) values for test sets at 6 screening conditions. See Table 4 for the details of experiments.
        Experiment #1   Experiment #2   Experiment #3   Experiment #4   Experiment #5   Experiment #6
RMSE    0.4262          0.9981          0.3472          1.0133          0.4091          0.8401
R       0.9769          0.9763          0.9851          0.9792          0.9799          0.9874
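The two figures of merit reported in Table 3 can be computed from paired predicted and experimental retention times as sketched below. This is a minimal illustration, assuming that R denotes the Pearson correlation coefficient (the section does not state which coefficient was used); the four retention times are made-up values, not data from Table S1.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson_r(predicted, observed):
    """Pearson correlation coefficient (assumed meaning of R in Table 3)."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Made-up retention times in minutes, for illustration only.
t_exp = [2.0, 4.0, 6.0, 8.0]
t_pred = [2.5, 3.5, 6.5, 7.5]

print("RMSE =", rmse(t_pred, t_exp))
print("R    =", pearson_r(t_pred, t_exp))
```

RMSE is expressed in the same units as the retention times, so it can be read directly against the peak spacing in a chromatogram, whereas R only describes how well the predicted values track the measured ones.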
2.2. Application to Method Development
As described in the introduction, optimisation is performed once a suitable stationary and mobile phase, buffer and pH [20] is selected. At this stage, it is typically column temperature and the content of organic modifier in the mobile phase (Gradient time = tG [min]) that are optimised. The details of the initial six experiments are presented in Table 4. Experimental retention times for KPSS for these experiments are shown in Table S2. These measured retention times were extrapolated using the ACD/Labs LC Simulator software.
Int. J. Mol. Sci. 2021, 22, 3848
Figure 4. Resolution heat map for key predictive sample set (KPSS). Intensity represents overall chromatogram resolution. High resolution is depicted by red color, low resolution is depicted by blue color. (a) constructed from experimental retention times. (b) constructed from Quantitative Structure Retention Relationship (QSRR) predicted retention times. The diamond indicates the center point selected from the model created from experimental retention times.
In order to assess the suitability of the QSRR, we have essentially replicated the
process described except that in this case, instead of measured retention times, we used
Figure 5. Predicted chromatogram for KPSS components from the retention model built from experimentally determined retention times (RtModelEXP) (solid line) and the retention model built from QSRR predicted retention times (RtModelQSRR) (dashed line). Column temperature 40 °C. Gradient profile: Time = 0 min, %B = 15%; Time = 12 min, %B = 45%; Time = 17 min, %B = 95%. See Materials and Methods for other details.
In order to compare retention times predicted from RtModelEXP and those predicted from RtModelQSRR we used all 24 compounds. We then created all possible combinations of two to ten components from this compound set. For each of these combinations we calculated a resolution coefficient (RC) according to Equation (1):
$$RC = \prod_{i,j} \frac{1}{e^{\left(\frac{Rs_{limit}}{Rs_{i,j}} - 1\right)}} \qquad (1)$$
where Rslimit = 1.25 is minimal satisfactory resolution between two components and Rsi,j is the actual chromatographic resolution between two components in the mixture. If the Rsi,j is equal to or exceeds Rslimit then it is set to Rslimit. The RC indicates that if the resolution between two components is equal to or exceeds Rslimit then the RC has a value of one. Whereas, if the resolution between two components is zero then the RC value will also be zero (i.e., 1/e^∞ ≈ 0). Therefore, all other values will fall between values of zero and one. Note that for the calculation of the resolution between two components we used average peak width of 0.1 min. The black line in Figure 6 shows the portion of all combinations for which both models (RtModelEXP and RtModelQSRR) predicted baseline separation of all components in the mixture (RC = 1).
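The calculation of the resolution coefficient in Equation (1) can be sketched as follows. Rslimit = 1.25 and the average peak width of 0.1 min come from the text. Two points are assumptions made only for this sketch: resolution is taken as the retention time difference divided by the peak width (i.e., the stated width is treated as a common baseline width), and the product is taken over all unordered pairs of components. The last lines count how many mixtures of two to ten components can be drawn from 24 compounds.

```python
import math
from itertools import combinations

RS_LIMIT = 1.25    # minimal satisfactory resolution (from the text)
PEAK_WIDTH = 0.1   # average peak width in minutes (from the text)


def resolution(t1, t2, width=PEAK_WIDTH):
    """Resolution of two peaks; assumes Rs = |t2 - t1| / width."""
    return abs(t2 - t1) / width


def resolution_coefficient(retention_times, rs_limit=RS_LIMIT, width=PEAK_WIDTH):
    """Product over all component pairs of 1 / e^(Rs_limit / Rs_ij - 1)."""
    rc = 1.0
    for t1, t2 in combinations(retention_times, 2):
        # Resolutions at or above the limit are capped at the limit.
        rs = min(resolution(t1, t2, width), rs_limit)
        if rs == 0.0:
            # Co-eluting pair: the exponent tends to infinity, so RC is zero.
            return 0.0
        rc *= 1.0 / math.exp(rs_limit / rs - 1.0)
    return rc


# Number of possible mixtures of k components drawn from 24 compounds.
n_mixtures = {k: math.comb(24, k) for k in range(2, 11)}

print(resolution_coefficient([1.0, 2.0, 3.0]))   # every pair baseline separated
print(resolution_coefficient([1.0, 1.0625]))     # one partially resolved pair
print(n_mixtures)
```

Because every factor lies between zero and one, a single poorly resolved pair is enough to pull RC below one, which is why RC = 1 can be used as the criterion for baseline separation of the whole mixture.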
Figure 6. Portion (%) of all combinations of compounds containing two to ten components for which RtModelEXP and RtModelQSRR predicted baseline separation (Resolution Coefficient (RC) = 1). The total number of combinations evaluated is in parentheses. Black line corresponds to model built from predicted data and red line corresponds to model built from mixture of predicted and experimental data. See text for details.
This data demonstrates that of all theoretical mixtures containing up to seven components which were separated with a resolution of at least 1.25, more than 80% were identified with both models. Even for the most complex mixtures containing ten components, nearly 65% of all combinations were identified with both models. It can be concluded that once QSRR derived retention times are established, they can be used to identify conditions in which all components are fully separated. However, the observation described in Figure 6 (black line) represents an extreme case since we are comparing a model built from entirely experimental data with one built from entirely QSRR predicted data. Practically, this scenario will almost always be applied to a mixture of components, for some of which the measured data will be available. We simulated this scenario by randomly replacing approximately 20% (five out of 24) of retention times obtained from RtModelQSRR with retention times obtained from RtModelEXP. As shown in Figure 6 (red line), there were noticeable increases in the proportion of mixtures identified as baseline separated in both models. In practical terms, we usually have many experimentally determined retention times available and few QSRR determined data. We would typically be looking at 2–5 components with which to estimate successful separation. These components are likely to be subtle molecular modifications within the acceptable structural similarity properties of the model.
Lastly, pairwise resolutions were calculated for all 24 compounds using both QSRR and experimentally determined retention times. The same assumptions regarding the peak widths as in previous calculations were made. All pairs that exhibited resolution higher than 20 were excluded as these components would always be separated even if the error of prediction was excessive. RC values for all remaining pairs were calculated for retention times predicted from RtModelEXP and RtModelQSRR. RC values for these models were compared. Figure 7 shows what proportion of pairwise RC values calculated from RtModelQSRR fell within specified intervals of RC values calculated from RtModelEXP. This figure demonstrates that in excess of 60% of pairwise RC values obtained from RtModelQSRR fall within ±0.1 of RC values obtained from RtModelEXP. This again indicates that the likelihood of making a correct decision with regard to the selection of optimal separation conditions based on QSRR derived models is high.
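The pairwise comparison described above can be sketched as follows. The limit of 1.25, the peak width of 0.1 min, the exclusion threshold of 20 and the ±0.1 tolerance are taken from the text; the resolution formula (retention time difference divided by peak width) and the rule that a pair is excluded only when both models give a resolution above 20 are assumptions of this sketch, since the text does not specify them. The retention times are made-up values.

```python
import math
from itertools import combinations

RS_LIMIT = 1.25     # minimal satisfactory resolution (from the text)
PEAK_WIDTH = 0.1    # average peak width in minutes (from the text)
RS_EXCLUDE = 20.0   # pairs resolved better than this are excluded (from the text)


def pair_rs(t1, t2):
    """Resolution of one pair; assumes Rs = |t2 - t1| / PEAK_WIDTH."""
    return abs(t2 - t1) / PEAK_WIDTH


def pair_rc(rs):
    """Single-pair factor of Equation (1), with Rs capped at RS_LIMIT."""
    rs = min(rs, RS_LIMIT)
    return 0.0 if rs == 0.0 else 1.0 / math.exp(RS_LIMIT / rs - 1.0)


def fraction_within(t_exp, t_qsrr, tolerance=0.1):
    """Fraction of retained pairs whose QSRR-based RC lies within
    +/- tolerance of the experiment-based RC."""
    kept = agree = 0
    for i, j in combinations(range(len(t_exp)), 2):
        rs_exp = pair_rs(t_exp[i], t_exp[j])
        rs_qsrr = pair_rs(t_qsrr[i], t_qsrr[j])
        # Assumption: drop a pair only if both models resolve it trivially.
        if rs_exp > RS_EXCLUDE and rs_qsrr > RS_EXCLUDE:
            continue
        kept += 1
        if abs(pair_rc(rs_qsrr) - pair_rc(rs_exp)) <= tolerance:
            agree += 1
    return agree / kept if kept else float("nan")


# Made-up retention times (min) for four components, same order in both lists.
t_exp = [1.00, 1.05, 1.50, 5.00]
t_qsrr = [1.00, 1.10, 1.45, 5.20]

print(fraction_within(t_exp, t_qsrr))
```

Binning the absolute RC differences into several intervals instead of a single tolerance would give the kind of distribution summarised in Figure 7.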
Figure 7. Portion (%) of pairwise RC values calculated from RtModelQSRR falling within a certain interval of RC values calculated from RtModelEXP. See text for details.
was used to calculate VolSurf+3D descriptors [14]. Prior to descriptor calculation, 3D conformers were generated using Corina (Molecular Networks GmbH, Nürnberg, Germany and Altamira LLC, Columbus, OH, USA) followed by energy minimization using MMFF94 force field, embedded in MOE software.
WEKA [39] (version 3.8, Waikato, New Zealand) platform was used for feature selection and for the development and optimization of regression algorithms.
ACD/Labs LC Simulator (ACD/Labs, Toronto, ON, Canada) version 2019 was used to carry out two-dimensional resolution optimisation.
4. Conclusions
Chromatographic QSRR models were demonstrated to be useful for the prediction
of retention times for hypothetical components with favourable accuracy. Likewise, the
optimum resolution space was shown to be accurately represented when calculated using
this approach. This was achieved by using a combination of Dragon, MOE and VolSurf+3D
descriptors with a Support Vector Machine regression algorithm which outperformed all
other tested conditions. An Evolutionary Search algorithm was used to reduce the number of
considered molecular descriptors from which the retention models were built. The retention
times predicted from these models were used to build two-dimensional (gradient time
versus temperature) resolution maps in order to identify optimal separation conditions.
We found excellent agreement between the resolution of sample components obtained
from a model built using experimental retention times and that obtained from QSRR predicted
retention times. These results indicate the usefulness of QSRR for the identification of
optimal chromatographic conditions as well as for de-risking of existing methods for
new/hypothetical components. It thus raises the prospect of an alternative approach to
separation optimisation and de-risking that would not inherently rely on the availability of
physical samples.
Abbreviations
RP-LC Reversed-Phase Liquid Chromatography
API Active Pharmaceutical Ingredient
KPSS Key Predictive Sample Set
QSRR Quantitative Structure Retention Relationship
R Correlation Coefficient
ES Evolutionary Search
MLR Multiple Linear Regression
RMSE Root Mean Square Error
SVM Support Vector Machine
GPR Gaussian Processes Regression
RF Random Forest
PLS Partial Least Squares
RtModelEXP Retention model built from experimental retention times
RtModelQSRR Retention model built from QSRR predicted retention times
RC Resolution Coefficient
Rslimit Minimal satisfactory resolution between two components
Rsi,j Actual chromatographic resolution between two components in the mixture
References
1. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH
Harmonised Tripartite Guideline: Specifications: Test Procedures and Acceptance Criteria for New Drug Substances and New
Drug Products: Chemical Substances Q6A. Available online: https://database.ich.org/sites/default/files/Q6A%20Guideline.pdf
(accessed on 14 November 2020).
2. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH
Harmonised Tripartite Guideline: Impurities in New Drug Substances Q3A(R2). Available online: https://database.ich.org/
sites/default/files/Q3A%28R2%29%20Guideline.pdf (accessed on 31 July 2020).
3. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use.
ICH Harmonised Tripartite Guideline: Validation of Analytical Procedures: Text and Methodology Q2(R1). Available online:
https://database.ich.org/sites/default/files/Q2%28R1%29%20Guideline.pdf (accessed on 31 July 2020).
4. Olsen, B.A.; Sreedhara, A.; Baertschi, S.W. Impurity investigations by phases of drug and product development. TrAC, Trends
Anal. Chem. 2018, 101, 17–23. [CrossRef]
5. Baertschi, S.W.; Alsante, K.M.; Reed, R.A. (Eds.) Pharmaceutical Stress Testing: Predicting Drug Degradation, 2nd ed.; CRC Press:
Boca Raton, FL, USA, 2011. [CrossRef]
6. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use.
ICH Harmonised Tripartite Guideline: Stability Testing of New Drug Substances and Products Q1A(R2). Available online:
https://database.ich.org/sites/default/files/Q1A%28R2%29%20Guideline.pdf (accessed on 22 February 2021).
7. Fekete, S.; Fekete, J.; Molnár, I.; Ganzler, K. Rapid high performance liquid chromatography method development with high
prediction accuracy, using 5 cm long narrow bore columns packed with sub-2 µm particles and Design Space computer modeling.
J. Chromatogr. A 2009, 1216, 7816–7823. [CrossRef]
8. Szucs, R.; Brunelli, C.; Lestremau, F.; Hanna-Brown, M. Liquid chromatography in the pharmaceutical industry. In Liquid
Chromatography: Applications, 2nd ed.; Fanali, S., Haddad, P.R., Poole, C.F., Riekkola, M.-L., Eds.; Elsevier: Amsterdam,
The Netherlands, 2017; pp. 515–537. [CrossRef]
9. Witting, M.; Böcker, S. Current status of retention time prediction in metabolite identification. J. Sep. Sci. 2020, 43, 1746–1754.
[CrossRef]
10. Taraji, M.; Haddad, P.R.; Amos, R.I.J.; Talebi, M.; Szucs, R.; Dolan, J.W.; Pohl, C.A. Chemometric-assisted method development in
hydrophilic interaction liquid chromatography: A review. Anal. Chim. Acta 2018, 1000, 20–40. [CrossRef]
11. Kaliszan, R. Quantitative structure property (retention) relationships in liquid chromatography. In Liquid Chromatography:
Fundamentals and Instrumentation, 2nd ed.; Fanali, S., Haddad, P.R., Poole, C.F., Riekkola, M.-L., Eds.; Elsevier: Amsterdam,
The Netherlands, 2017; pp. 553–572. [CrossRef]
12. Bouwmeester, R.; Martens, L.; Degroeve, S. Comprehensive and Empirical Evaluation of Machine Learning Algorithms for Small
Molecule LC Retention Time Prediction. Anal. Chem. 2019, 91, 3694–3703. [CrossRef] [PubMed]
13. Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. DRAGON software: An easy approach to molecular descriptor calculations.
MATCH Commun. Math. Comput. Chem. 2006, 56, 237–248.
14. Cruciani, G.; Crivori, P.; Carrupt, P.A.; Testa, B. Molecular fields in quantitative structure-permeation relationships: The VolSurf
approach. J. Mol. Struct. THEOCHEM 2000, 503, 17–30. [CrossRef]
15. Valdés-Martiní, J.R.; Marrero-Ponce, Y.; García-Jacas, C.R.; Martinez-Mayorga, K.; Barigye, S.J.; Vaz D'Almeida, Y.S.; Pham-The,
H.; Pérez-Giménez, F.; Morell, C.A. QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological
(2D) and chiral (2.5D) algebraic molecular descriptors computations. J. Cheminformatics 2017, 9, 35. [CrossRef]
16. Yap, C.W. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 2011,
32, 1466–1474. [CrossRef] [PubMed]
17. Cao, D.-S.; Xu, Q.-S.; Hu, Q.-N.; Liang, Y.-Z. ChemoPy: Freely available python package for computational biology and
chemoinformatics. Bioinformatics 2013, 29, 1092–1094. [CrossRef] [PubMed]
18. Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E.L. Recent Developments of the Chemistry Development
Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr. Pharm. Des. 2006, 12, 2111–2120. [CrossRef]
19. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques, 4th ed.; Morgan Kaufmann:
Cambridge, MA, USA, 2016.
20. Haddad, P.R.; Taraji, M.; Szücs, R. Prediction of Analyte Retention Time in Liquid Chromatography. Anal. Chem. 2021, 93, 228–256.
[CrossRef]
21. Henneman, A.; Palmblad, M. Retention Time Prediction and Protein Identification. In Mass Spectrometry Data Analysis in
Proteomics; Matthiesen, R., Ed.; Humana: New York, NY, USA, 2020; pp. 115–132. [CrossRef]
22. Moruz, L.; Käll, L. Peptide retention time prediction. Mass Spectrom. Rev. 2017, 36, 615–623. [CrossRef]
23. Krokhin, O.V.; Spicer, V. Predicting Peptide Retention Times for Proteomics. Curr. Protoc. Bioinformatics 2010, 13.14.11–13.14.15.
[CrossRef]
24. Tarasova, I.A.; Masselon, C.D.; Gorshkov, A.V.; Gorshkov, M.V. Predictive chromatography of peptides and proteins as a
complementary tool for proteomics. Analyst 2016, 141, 4816–4832. [CrossRef]
25. Krokhin, O. Peptide retention prediction in reversed-phase chromatography: Proteomic applications. Expert Rev. Proteomics 2012,
9, 1–4. [CrossRef] [PubMed]
26. Wen, Y.; Talebi, M.; Amos, R.I.J.; Szucs, R.; Dolan, J.W.; Pohl, C.A.; Haddad, P.R. Retention prediction in reversed phase high
performance liquid chromatography using quantitative structure-retention relationships applied to the Hydrophobic Subtraction
Model. J. Chromatogr. A 2018, 1541, 1–11. [CrossRef] [PubMed]
27. Wen, Y.; Amos, R.I.J.; Talebi, M.; Szucs, R.; Dolan, J.W.; Pohl, C.A.; Haddad, P.R. Retention Index Prediction Using Quantitative
Structure-Retention Relationships for Improving Structure Identification in Nontargeted Metabolomics. Anal. Chem. 2018, 90,
9434–9440. [CrossRef]
28. Taraji, M.; Haddad, P.R.; Amos, R.I.J.; Talebi, M.; Szucs, R.; Dolan, J.W.; Pohl, C.A. Rapid Method Development in Hydrophilic
Interaction Liquid Chromatography for Pharmaceutical Analysis Using a Combination of Quantitative Structure-Retention
Relationships and Design of Experiments. Anal. Chem. 2017, 89, 1870–1878. [CrossRef] [PubMed]
29. Mauri, A.; Consonni, V.; Todeschini, R. Molecular descriptors. In Handbook of Computational Chemistry, 2nd ed.; Leszczynski, J.,
Kaczmarek-Kedziera, A., Puzyn, T., Papadopoulos, M.G., Reis, H., Shukla, M.K., Eds.; Springer: Cham, Switzerland, 2017; pp.
2065–2093. [CrossRef]
30. Leardi, R. Genetic algorithms in chemistry. J. Chromatogr. A 2007, 1158, 226–233. [CrossRef] [PubMed]
31. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining Software: An Update.
SIGKDD Explor. 2009, 11, 10–18. [CrossRef]
32. Kotthoff, L.; Thornton, C.; Hoos, H.H.; Hutter, F.; Leyton-Brown, K. Auto-WEKA 2.0: Automatic model selection and
hyperparameter optimization in WEKA. J. Mach. Learn. Res. 2017, 18, 826–830.
33. Willett, P.; Barnard, J.M.; Downs, G.M. Chemical Similarity Searching. J. Chem. Inf. Comput. Sci. 1998, 38, 983–996. [CrossRef]
34. Aalizadeh, R.; Thomaidis, N.S.; Bletsou, A.A.; Gago-Ferrero, P. Quantitative Structure-Retention Relationship Models to Support
Nontarget High-Resolution Mass Spectrometric Screening of Emerging Contaminants in Environmental Samples. J. Chem. Inf.
Model. 2016, 56, 1384–1398. [CrossRef] [PubMed]
35. Passarin, P.B.S.; Lourenço, F.R. Modeling an in silico platform to predict chromatographic profiles of UV filters using
ChromSimulator. Microchem. J. 2020, 157, 105002. [CrossRef]
36. Shevade, S.K.; Keerthi, S.S.; Bhattacharyya, C.; Murthy, K.R.K. Improvements to the SMO Algorithm for SVM Regression. IEEE
Trans. Neural Netw. 2000, 11, 1188–1193. [CrossRef]
37. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [CrossRef]
38. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
39. Frank, E.; Hall, M.A.; Witten, I.H. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and
Techniques”, 4th ed.; Morgan Kaufmann, 2016. Available online: https://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016
_appendix.pdf (accessed on 14 November 2020).