Detection of Malicious Urls Using Machine Learning: Nuria Reyes Dorta Pino Caballero Gil Carlos Rosa Remedios
https://doi.org/10.1007/s11276-024-03700-w
Abstract
The detection of fraudulent URLs that lead to malicious websites using addresses similar to those of legitimate websites is
a key form of defense against phishing attacks. This is currently especially relevant in the case of Internet of Things devices,
because they usually have access to the Internet, although in many cases they are vulnerable to these phishing attacks. This
paper offers an overview of the most relevant techniques for the accurate detection of fraudulent URLs, from the most widely
used machine learning and deep learning algorithms, to the application, as a proof of concept, of classification models
based on quantum machine learning. Starting from an essential data preparation phase, special attention is paid to the initial
comparison of several traditional machine learning models, evaluating them with different datasets and obtaining interesting
results that achieve true positive rates greater than 90%. After that first approach, the study moves on to the application
of quantum machine learning, analysing the specificities of this recent field and assessing the possibilities it offers for the
detection of malicious URLs. Given the limited available literature specifically on the detection of malicious URLs and
other cybersecurity issues through quantum machine learning, the research presented here represents a relevant novelty on
the combination of both concepts in the form of quantum machine learning algorithms for cybersecurity. Indeed, after the
analysis of several algorithms, encouraging results have been obtained that open the door to further research on the application
of quantum computing in the field of cybersecurity.
Keywords Malicious URL · Machine learning · Confusion matrix · ROC curve · Support vector machine · Decision tree ·
Logistic regression · Neural network · Quantum computing
Wireless Networks
use of Quantum Machine Learning (QML) to address this problem. In order to expand the set of tools to combat the phishing threat, this work analyses the possible application of QML for the detection of fraudulent URLs, and compares the obtained results with those produced using classical machine learning/deep learning methods. As this is a fairly new field, one of the first steps is to identify the most suitable combination of algorithms to apply a QML model, depending on the quantum conditions, and taking into account both its advantages and disadvantages in order to assess whether this approach can be useful in the context of cybersecurity in general, and in the detection of malicious URLs in particular.
In recent years, several studies have addressed the issue of applying ML techniques for the early detection of fraudulent URLs from different points of view.
In the paper [3], the authors work with a dataset consisting of 121 sets of URLs collected over different days. In total, this public dataset comprises over 2.4 million URLs, each with over 3.2 million features, that are analysed with various ML algorithms.
The authors of [4] propose, in addition to using a blacklist of URLs, to leverage other features such as lexical characteristics, length of the URL and length of the primary domain. Host-based features also include information such as creation date, Whois server, and name servers.
In the work [5], the authors apply Convolutional Neural Networks (CNN) to both characters and words of the URL string to capture several types of semantic information.
The paper [6] provides an extensive literature review highlighting the main techniques used to detect malicious URLs that are based on ML models.
The authors of [7] apply logistic regression, decision trees and SVM combined with a majority voting technique for malicious URL detection.
The work [8] uses decision trees, random forest, SVM, Naive Bayes and CNN algorithms, with a dataset of legitimate website URLs collected from the site lists of the top 5000 websites in the world.
The paper [9] applies decision trees, K-NN and random forest algorithms on a dataset taken from a specific repository for ML.
The work [10] uses random forest, K-NN, J48 decision tree and BayesNet algorithms on a dataset taken from malicious and benign websites and ML classifiers, using features like URL length or number of special characters.
The authors of [11] use decision tree, random forest, K-NN, Naive Bayes, SVM and logistic regression algorithms with a dataset from Kaggle, using features like URL labels and text tokenization.
The paper [12] uses J48 decision tree, logistic regression, Naive Bayes and SVM algorithms with a dataset from Open-Phish, Phishtank, Zone-H, and WEBSPAM-UK2007, and features like having IP address, URL length, Shortening Service, httpSecure, Digit count or Abnormal URL.
In [13], J48 decision tree, logistic regression, Naive Bayes and SVM algorithms are applied with a dataset from Machine Learning Lab and features like ContentLength, compromissionType, serverType, poweredBy or contentType.
The work [14] considers several generic attributes such as length of URL, use of an IP address in URL, hexadecimal character codes in the URL, @ symbol in URL, number of dots in URL, number of sensitive words in URL, etc.
The paper [15] uses a dataset that contains real-world legitimate and malicious Android applications, converting each application into a grayscale image. Besides, they also employ a hybrid quantum CNN, a quantum Neural Network, and other CNN models.
The authors of [16] also apply QML to analyze an intrusion dataset and compare the results obtained with conventional Support Vector Machine (SVM) and quantum SVM, as well as with conventional CNN and quantum CNN.
The work [17] is another of the few papers that deal with a quantum-based neural network classifier to detect malicious web requests.
Table 1 shows a schematic comparison between the main aspects of this work in relation to some of the aforementioned publications.
Table 1 Comparative analysis
References   ML fund   Multiple ML     Different datasets   ML/QML   QML parameterizations
[3]          Yes       Yes             No                   No       No
[5]          No        No (Only one)   No                   No       No
[7]          Yes       Yes             No                   No       No
[8]          Yes       Yes             No                   No       No
[9]          No        Yes             No                   No       No
[10]         Yes       Yes             Yes                  No       No
[11]         Yes       Yes             No                   No       No
[12]         Yes       Yes             No                   No       No
[13]         No        Yes             No                   No       No
[14]         Yes       No              No                   No       No
[16]         No        Yes             No                   Yes      Yes
[17]         No        No              No                   No       Yes
This paper   Yes       Yes             Yes                  Yes      Yes
As can be seen, none of the above-mentioned works includes one of the main novelties of the present work, which consists of studying the potential of the application of QML for the early detection of fraudulent URLs, and comparing the obtained results with those produced with different classic ML techniques.
layers there are several neurons with the following structure. The neurons of the same layer are not connected to each other and they all share the same activation function. On the other hand, when there are two consecutive layers, all the neurons in one layer connect with all the neurons in the next layer, which makes the network a dense network.
4 Metrics
To assess different ML algorithms, either the confusion matrix or the ROC curve can be used.
4.1 Confusion matrix
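As a minimal sketch of how the usual metrics are derived from the four cells of a binary confusion matrix, the following Python snippet computes accuracy, precision, recall (true positive rate) and F1-score. The class name `ConfusionMatrix` and the example counts are illustrative assumptions that do not come from the experiments reported here, and the malicious class is assumed to be the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of a binary classifier; 'positive' is assumed to mean malicious."""
    tp: int  # malicious URLs correctly flagged
    fp: int  # benign URLs wrongly flagged
    tn: int  # benign URLs correctly accepted
    fn: int  # malicious URLs wrongly accepted

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        malicious = self.tp + self.fn
        return self.tp / malicious if malicious else 0.0

    def f1(self) -> float:
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Hypothetical counts, chosen only to illustrate the formulas.
cm = ConfusionMatrix(tp=90, fp=20, tn=880, fn=10)
print(f"accuracy={cm.accuracy():.3f} precision={cm.precision():.3f} "
      f"recall={cm.recall():.3f} f1={cm.f1():.3f}")
```

With a rare malicious class, accuracy can stay high while recall drops, which is why recall and F1-score are reported separately in this sketch.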
the Curve (AUC). A classifier is said to be perfect when it has an area under the ROC curve equal to 1. An example of a ROC curve can be seen in Fig. 2.
Fig. 2 ROC curve
5 Issues with machine learning algorithms
Two typical problems that can be found when using Machine Learning algorithms are overfitting and underfitting [28].
Overfitting occurs when a Machine Learning model learns the training data too well, capturing not only the underlying patterns but also the noise or random fluctuations in the data. As a result, the model performs exceptionally well on the training data but poorly on unseen or new data.
Underfitting occurs when a Machine Learning model is too simple to capture the underlying patterns in the training data. It fails to fit the training data and, as a result, performs poorly on both the training and test data.
In order to resolve the overfitting issue, the following solutions can be applied:
• Use simpler models with fewer parameters or less complicated Machine Learning algorithms.
• Select relevant features and discard irrelevant or redundant ones, and apply dimensionality reduction techniques such as PCA [29].
• Use cross-validation techniques like k-fold cross-validation to assess model performance more accurately, and adjust hyperparameters based on cross-validation results [28].
• Apply regularization techniques like L1 (Lasso) and L2 (Ridge) regularization to penalize large parameter values, which prevents the model from fitting noise in the data [30].
• In deep learning, use early stopping to halt training when validation performance starts to degrade [31].
In order to resolve the underfitting issue, the following solutions can be applied:
• Choose more complex models with greater capacity to capture the underlying patterns, and use deep neural networks or more sophisticated Machine Learning algorithms.
• Create additional informative features that better describe the data, and experiment with transformations of existing features.
• Adjust hyperparameters to fine-tune the model, and try different learning rates, depths, regularization strengths, etc. [32].
6 Used dataset
For the research on ML applied to the detection of fraudulent URLs, the dataset obtained from https://machinelearning.inginf.units.it/data-and-tools/hidden-fraudulent-urls-dataset was used. This dataset was chosen because it is available and labelled, and contains a rich set of data that provides good performance in the study of different ML algorithms, including QML. In addition, it has been used in other studies, which allows results to be compared. Below, a comparison between the present work and another work that used the same dataset is included.
In particular, this dataset contains the following information:
• url is the current URL.
• compromissionType is the variable that indicates if the website is compromised by phishing, defacement, or normal.
• isHiddenFraudulent is a dependent variable indicating whether the URL is fraudulent or not.
• contentLength is a variable that takes integer values and was obtained by sending an HTTP HEAD request to the URL. It also indicates the size of the message body, in bytes, sent to the recipient.
• serverType is a string indicating the server, such as Apache, Microsoft IIS.
• poweredBy is a string that indicates the application platform underlying the web server, that is, it is used to specify with which software the response has been generated by the server.
• contentType contains charset information that is of the encoding type.
• lastModified is a variable indicating when its last modified date was.
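Several of the fields listed above are described as coming from the response to an HTTP HEAD request. The following Python sketch shows one way such a record could be assembled with the standard library. The mapping between dataset fields and header names (`HEADER_FOR_FIELD`) and the function names are assumptions made for illustration; the dataset description only states explicitly that contentLength was obtained through a HEAD request.

```python
import urllib.request
from typing import Mapping

# Assumed mapping from dataset field to HTTP response header.
HEADER_FOR_FIELD = {
    "contentLength": "Content-Length",
    "serverType": "Server",
    "poweredBy": "X-Powered-By",
    "contentType": "Content-Type",
    "lastModified": "Last-Modified",
}


def headers_to_fields(headers: Mapping[str, str]) -> dict:
    """Build one record from response headers; missing headers become None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    record = {field: lowered.get(header.lower())
              for field, header in HEADER_FOR_FIELD.items()}
    # contentLength is described as an integer number of bytes.
    length = record["contentLength"]
    record["contentLength"] = int(length) if length and length.isdigit() else None
    return record


def fetch_fields(url: str, timeout: float = 5.0) -> dict:
    """Send an HTTP HEAD request and extract the fields (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {"url": url, **headers_to_fields(dict(response.headers.items()))}


# Offline example with hand-written headers (illustrative values only).
example = headers_to_fields({
    "Server": "Apache/2.4.41",
    "Content-Type": "text/html; charset=UTF-8",
    "Content-Length": "5120",
})
print(example)
```

Only `headers_to_fields` is exercised here, so the example runs without network access. Calling `fetch_fields` on suspicious URLs would be better done from an isolated environment.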
7 Data processing
Both the poweredBy and serverType variables were preprocessed to hold the framework name and the major and minor version number. Besides, several approaches were developed for the treatment of the data and in all of them the lastModified column was deleted since that data was found not relevant to the study. In Fig. 3 the first five rows of the used dataset are shown.
Before implementing any model, the NaN value was replaced with a 0 in the PoweredBy column. In the same way, the rows in which some data was missing in any column were deleted. Doing so, a dataset of 181,916 rows and 7 columns was obtained, including the dependent variable. Among them, there were 8,618 fraudulent URLs in total. Therefore, it was concluded that this is an unbalanced set of data, so this imbalance was taken into account by indicating class_weight="balanced" in the models.
Besides, in order to study the correlation, the Pearson method is applied to the independent variables that do not take classes (see Fig. 4).
As can be seen in Fig. 5, there is a high correlation among some variables. Therefore, it was decided to remove the count_http and count_hyphen variables. However, after doing so, the dataset was saved with all variables, including
8.1 Results
Table 5 Area under the ROC curve (columns: ML algorithm, Area (%))
Table 7 Training with the original set (columns: ML algorithm, Precision (%), Recall (%), Accuracy (%))
present work, when processing the data, focuses more on the information provided by the URL itself. In addition, that work compares according to accuracy while here the analysis is based on recall. Thus, to compare data, only their second table can be used, as a different approach to the data has been followed here. Since only two methods coincide between both works, those two methods are compared below.
Table 6 shows the accuracies obtained by both programs, where Accuracy (1) denotes the accuracy obtained in the program described in this paper, while Accuracy (2) denotes the accuracy obtained in [13]. It can be seen that the accuracy obtained there in the logistic regression is better than the one obtained here, probably due to the data processing they used.
8.2 Evaluation
To complete the evaluation, the trained models were tested against a different dataset in order to find out how good they are and whether they generalize. This new dataset was obtained from https://github.com/ESDAUNG/PhishDataset/blob/main/data_bal%20-%2020000.xlsx.
This new dataset only contains phishing URLs and the compromissionType. Therefore, it was decided to delete the rest of the columns of the dataset: serverType, contentLength, etc. By retraining three of the models with the original dataset, Table 7 was obtained.
Table 7 shows the results obtained from the prediction of the new dataset with some of the models mentioned in Tables 3 and 4, being trained with the original dataset and eliminating the columns mentioned above.
Table 8 shows the results when the models try to predict the fraudulent URLs from the new dataset.
Table 8 shows that, when predicting on the new dataset, the program identifies at most 87% of the fraudulent URLs. This percentage is probably due to the fact that the program found new fraudulent URLs that it did not know were fraudulent because there are no similar ones in its database.
9 Application of quantum machine learning
In order to study the possible practical usefulness of ML algorithms linked to quantum computing, hereinafter called Quantum Machine Learning, an analysis of a quantum approach to the solution of the analyzed problem has been carried out to measure the degree of efficiency that this new paradigm can provide through the use of quantum neural networks.
9.1 QML algorithms
In order to apply QML algorithms, four possible approaches can be distinguished depending on how the type of data to be used and the hardware on which the algorithms are executed are combined.
• CC: Classical data with ML algorithms running on Classical hardware
• CQ: Classical data with ML algorithms running on Quantum hardware
• QC: Quantum data with ML algorithms running on Classical hardware
• QQ: Quantum data with ML algorithms running on Quantum hardware
In this work, CQ is mainly used, starting from classical data, encoded by the corresponding algorithm in quantum information, to subsequently perform the simulations on classical hardware. All the work has been developed in Python, using the IBM quantum computing framework called Qiskit. Specifically, a Variational Quantum Classifier has been used, requiring the following steps prior to training [35]:
• Data coding, a process that consists of transferring the original data to qubits and is done through feature mapping, for which different algorithms were chosen:
  – ZZFeatureMap
  – ZFeatureMap
  – PauliFeatureMap
• Application of a parameterized quantum circuit or Ansatz, a quantum circuit whose main characteristic is that it has a set of adjustable weights that must minimize an objective function. The chosen Ansatz have been:
  – RealAmplitudes
  – EfficientSU2
  – ExcitationPreserving
• Choice of optimization algorithm, with a function equivalent to that of a classic Deep Learning model, selecting three local optimizers (a function that tries to locate an optimal value within the neighboring set of a candidate solution):
  – COBYLA (Constrained Optimization By Linear Approximation optimizer)
  – GradientDescent (Gradient Descent minimization routine)
  – SLSQP (Sequential Least Squares Programming optimizer)
9.2 Adaptation of the dataset
Since the application of QML models requires datasets where all their characteristics are numeric, it has been necessary to codify the categorical variables. For this, several alternatives were considered, such as:
• Ordinal encoding: suitable for establishing a hierarchical order between the values of the variable. However, in the analyzed case, the values of the categorical variables do not have such an order.
• One-hot encoding: a vector of numerical characteristics is linked to each category in such a way that the vector will have as many components as possible categories the variable has, all of them being 0 except for the position that corresponds to the category of that observation, which will contain a 1. The drawback in this case is that some of the analyzed features have more than 100 possible categories, which would significantly increase the size of the dataset.
• Binary coding: hybrid method combining the two previous ones so that first, ordinal encoding is applied and then each integer is converted into binary and as many columns are generated as there are digits in the resulting binary. This method is more compact than one-hot encoding, but it still complicates the dataset.
• Hashing: a hashing function is required that transforms each category of the variable into an integer value within a certain range. However, collisions (different inputs generating the same output) must be controlled because they can affect the quality of the dataset.
After evaluating the impact on the dataset and on the model (SVM/Q-VQC), ordinal encoding was chosen. In particular, to choose the algorithm for converting text variables to numerical variables, different advantages and disadvantages of each algorithm were analysed, opting for ordinal encoding mainly due to its simplicity, efficiency and ease of implementation, which is fundamental in quantum processes. Besides, unlike other methods such as one-hot encoding, ordinal encoding does not increase the dimensionality of the dataset, which is crucial for current quantum models that are very limited in the number of usable qubits. Moreover, its use generally leads to less information loss (e.g. by preserving order) compared to algorithms such as random coding. Therefore, although it also has some disadvantages (possible artificial numerical relationship between categories that might not be inherently ordered), the advantages identified and the preliminary tests we were able to do tipped the balance towards this option.
However, one of the future lines of work involves comparing different encoding methods and evaluating their impact on the final results. In particular, it is considered especially interesting to analyse the use of one-hot in combination with dimensionality reduction algorithms such as PCA (Principal Component Analysis), in cases where the application of one-hot increases the number of variables to be handled excessively.
The selected fields and the processes carried out on the original dataset are detailed below to allow the application of QML algorithms considering that all the characteristics must be numeric.
• url: stores the total number of characters in the URL.
• compromissionType: variable that indicates if the website is compromised by phishing, defacement, or is normal.
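To make the trade-off between the four encoding alternatives of this section concrete, the following Python sketch applies each of them to a small list of categories and shows how many columns each one produces. The function names, the example server strings and the choice of numbering categories in order of first appearance are assumptions made for illustration and are not taken from the implementation used in this work.

```python
import hashlib


def ordinal_encode(values):
    """One integer per category, numbered in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values]


def one_hot_encode(values):
    """One column per category; exactly one 1 in each row."""
    codes = ordinal_encode(values)
    width = max(codes) + 1
    return [[1 if column == code else 0 for column in range(width)] for code in codes]


def binary_encode(values):
    """Ordinal code written in base 2, one column per binary digit."""
    codes = ordinal_encode(values)
    width = max(1, max(codes).bit_length())
    return [[(code >> bit) & 1 for bit in range(width - 1, -1, -1)] for code in codes]


def hash_encode(values, n_buckets):
    """Stable hash of each category reduced to range(n_buckets); collisions possible."""
    return [int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16) % n_buckets
            for value in values]


# Illustrative category values, not taken from the dataset.
servers = ["Apache", "nginx", "Microsoft-IIS", "Apache", "LiteSpeed"]
print(ordinal_encode(servers))            # [0, 1, 2, 0, 3]
print(len(one_hot_encode(servers)[0]))    # 4 columns
print(binary_encode(servers))             # [[0, 0], [0, 1], [1, 0], [0, 0], [1, 1]]
print(hash_encode(servers, n_buckets=8))
```

For a feature with 128 categories, this sketch would produce 128 columns with one-hot encoding, 7 with binary coding and a single column with ordinal encoding, which is consistent with the dimensionality argument given above for preferring ordinal encoding when the number of usable qubits is limited.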
9.3 Application of VQC
Table 9 ZZFeatureMap (columns: Ansatz, Opt, Train (s), Test (s), TPC (s), TMAC (s))
Table 10 ZFeatureMap (columns: Ansatz, Opt, Train (s), Test (s), TPC (s), TMAC (s))
Table 11 PauliFeatureMap (columns: Ansatz, Opt, TrainS, TestS, TPC (s), TMAC (s))
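Tables 9 to 11 are organised per feature map, with one row per Ansatz and optimizer pair. Assuming that every combination of the three feature maps, three Ansatz and three optimizers named in Section 9.1 is evaluated, the experiment grid can be enumerated with the short Python sketch below. The names are used here as plain strings and no quantum framework is imported; the helper `experiment_grid` is hypothetical.

```python
from itertools import product

FEATURE_MAPS = ["ZZFeatureMap", "ZFeatureMap", "PauliFeatureMap"]
ANSATZES = ["RealAmplitudes", "EfficientSU2", "ExcitationPreserving"]
OPTIMIZERS = ["COBYLA", "GradientDescent", "SLSQP"]


def experiment_grid():
    """Yield one dict per (feature map, Ansatz, optimizer) combination."""
    for feature_map, ansatz, optimizer in product(FEATURE_MAPS, ANSATZES, OPTIMIZERS):
        yield {"feature_map": feature_map, "ansatz": ansatz, "optimizer": optimizer}


grid = list(experiment_grid())
print(len(grid))  # 27 combinations, 9 per feature map
for config in grid[:3]:
    print(config)
```

Each dictionary could then be handed to whatever routine builds and trains the classifier, recording training and test scores and times per configuration.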
• Regarding the algorithm for mapping features, the ZFeatureMap stands out, with which the best results have been obtained globally.
• In the execution on classic hardware, the performance of the M2 PRO processor stands out, which obtains a clear advantage in training times in almost all cases.
10 Conclusions and future work
In this work, the cybersecurity problem of detecting fraudulent URLs has been analysed from different perspectives, using machine learning in both its traditional version and its quantum version. The main goal has been to explore the possibilities of quantum computing applied to machine learning in the context of cybersecurity.
Several conclusions have been drawn from the analysis, both about the dataset itself and about the usefulness of QML in cybersecurity.
On the one hand, regarding the dataset, during the study it was concluded that the first used dataset was unbalanced, which helped to identify the most suitable algorithms and processes to optimise the results. In this way, this work has highlighted the importance of this prior analysis of the data, before starting to apply different algorithms in a generalised way.
On the other hand, the study concluded that the typology of the analysed problem also conditions the focus on which results are the most interesting in practice. For example, this is the case of the confusion matrix, where working on a cybersecurity problem involves paying special attention to False Negatives because they mean that malicious URLs are being taken as valid and can generate significant service
or economic losses. That is why the objective in this case should be to minimise False Negatives as much as possible, even if Accuracy decreases while Recall increases. Specifically, since the main goal is to achieve the lowest number of False Negatives, but without greatly increasing the number of False Positives, the final conclusion is that the best measure to compare the models is the F1-score.
Another conclusion in the field of classical machine learning models is that one of the three neural networks proposed in this work was clearly identified as the best for the analysed dataset. The next best model was the Support Vector Machine with RBF kernel, and the third best model was the Support Vector Machine with a Poly kernel.
An important objective of this work has been to evaluate the usefulness of QML models in cybersecurity, in an attempt to identify the most appropriate algorithm combinations used in this context. In this sense, several interesting conclusions were obtained about the relevance of the optimisers and the feature mapping algorithm, identifying certain combinations that produce results similar to classical models with a simplified dataset.
When comparing the results obtained with ML and QML, it is clear that classic models produce better results, especially considering that they do so by working on the entire dataset. This conclusion is natural considering that these models have been studied and optimised exhaustively for years, and so the amount of academic literature on the topic is extensive. However, it is also clear that with QML, despite being poorly optimised and relatively recent in its application, very promising results have been achieved in this work, for example with the combination of ZFeatureMap, RealAmp and SLSQP.
From the research carried out on the existing literature, it is clear that the application of QML is a very recent field and consequently its results are still very theoretical. In particular, its use in the field of cybersecurity is even more restricted, especially due to the current limitations of quantum hardware. Thus, in this work numerous situations have emerged that pave the way for future studies, such as:
• Shortage of up-to-date cybersecurity datasets suitable for quantum computing work.
• Encoding alphanumeric variables to purely numeric values, striking a balance between information loss and limiting the excessive growth of variables to be processed in QML algorithms.
• Optimal parameterisations of QML for cybersecurity.
• Application of quantum hardware.
• Frameworks to be used in the context of QML, different from the Qiskit libraries, such as QML PennyLane, the Cirq libraries (Google), or Microsoft Quantum Development Kit (QDK).
The aforementioned lines point to several problems detected during this study, as well as new research focuses that give continuity to this work.
As a final conclusion, this work opens the door to numerous future studies on the optimal parameters for the use of QML and how to integrate these algorithms in the analysis of different cybersecurity problems, thus incorporating new possibilities for the early detection of fraudulent actions.
Acknowledgements This research has been partially supported by the Cybersecurity Chair of the University of La Laguna and the project PID2022-138933OB-I00: ATQUE funded by MCIN/AEI/10.13039/501100011033/FEDER, EU.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. ENISA: ENISA threat landscape 2023. https://www.enisa.europa.eu/publications/enisa-threat-landscape-2023
2. Fortinet: What is URL phishing? (2023). https://www.fortinet.com/resources/cyberglossary/url-phishing
3. Vanhoenshoven, F., Nápoles, G., Falcon, R., Vanhoof, K., & Köppen, M. (2016). Detecting malicious urls using machine learning techniques. In: IEEE Symposium series on computational intelligence (SSCI), pp. 1–8
4. Sahoo, D., Liu, C., & Hoi, S.C. (2017). Malicious url detection using machine learning: A survey. arXiv preprint arXiv:1701.07179
5. Le, H., Pham, Q., Sahoo, D., & Hoi, S.C. (2018). Urlnet: learning a url representation with deep learning for malicious url detection. arXiv preprint arXiv:1802.03162.
6. Aljabri, M., Altamimi, H.S., Albelali, S.A., Maimunah, A.-H., Alhuraib, H.T., Alotaibi, N.K., Alahmadi, A.A., Alhaidari, F., Mohammad, R.M.A., & Salah, K. (2022). Detecting malicious urls using machine learning techniques: review and research directions. IEEE Access.
7. Patil, D. R., & Patil, J. B. (2018). Malicious URLs detection using decision tree classifiers and majority voting technique. Cybernetics and Information Technologies, 18(1), 11–29.
8. Hieu Nguyen, H., & Thai Nguyen, D. (2016). Machine learning based phishing web sites detection. In: AETA 2015: Recent advances in electrical engineering and related sciences, pp. 123–131.
9. Yahya, F., Isaac W., Mahibol, R., Kim Ying, C., Bin Anai, M., Frankie, A., Sidney, Ling Nin Wei, E., & Guntur Utomo, R. (2021). Detection of phising websites using machine learning
approaches. In 2021 International conference on data science and its applications (ICoDSA).
10. Alkhudair, F., Alassaf, M., Khan, U. R., & Alfarraj, S. (2020). Detecting malicious url. In 2020 International conference on computing and information technology 1, 97–101.
11. A. Waheed, M., Gadgay, B., DC, S., P., V., & Ul Ain, Q. (2022). A machine learning approach for detecting malicious url using different algorithms and NLP techniques. In: 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon).
12. Ha, M., Shichkina, Y., Nguyen, N., Phan, T.-S. (2023). Classification of malicious websites using machine learning based on url characteristics. In Computational Science and Its Applications - ICCSA 2023 Workshops, pp. 317–327
13. Urcuqui, C., Navarro, A., Osorio, J., & García, M. (2017). Machine learning classifiers to detect malicious websites. Proceedings of the Spring School of Networks, 1950, 14–17.
14. Chiramdasu, R., Srivastava, G., Bhattacharya, S., Reddy, P.K., & Gadekallu, T.R. (2021). Malicious url detection using logistic regression. In: 2021 IEEE International conference on omni-layer intelligent systems (COINS), pp. 1–6.
15. Mercaldo, F., Ciaramella, G., Iadarola, G., Storto, M., Martinelli, F., & Santone, A. (2022). Towards explainable quantum machine learning for mobile malware detection and classification. Applied Sciences, 12(23), 12025.
16. Kalinin, M., & Krundyshev, V. (2023). Security intrusion detection using quantum machine learning techniques. Journal of Computer Virology and Hacking Techniques, 9, 125–136.
17. Patel, O., Tiwari, A., Patel, V., & Gupta, O. (2015). Quantum based neural network classifier and its application for firewall to detect malicious web request. In 2015 IEEE Symposium Series on Computational Intelligence. IEEE, pp. 67–74
18. Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and other stories. Cambridge University Press.
19. Hosmer, D. W., Jr., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. Wiley.
20. Quinlan, J. R. (2014). C4.5: programs for machine learning. Elsevier.
21. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
22. Cristianini, N., & Ricci, E. (2008). Support vector machines. Springer.
23. McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115–133.
24. Moldwin, T., & Segev, I. (2020). Perceptron learning and classification in a modeled cortical pyramidal cell. Frontiers in Computational Neuroscience, 14, 33.
25. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep feedforward network. In: Deep Learning, pp. 164–223.
26. Nielsen, M. A., & Chuang, I. L. (2010). Quantum computation and quantum information. Cambridge University Press.
27. Raschka, S., & Mirjalili, V. (2019). Python machine learning: Machine learning and deep learning with python, scikit-learn, and tensorflow 2. Packt Publishing Ltd.
28. Osval Antonio Montesinos López, J.C. & Abelardo Montesinos López. (2022). Overfitting, model tuning, and evaluation of prediction performance. In Multivariate statistical machine learning methods for genomic prediction, pp. 109–139.
29. Haozhe Xie, H.X. & Jie Li. (2017). A survey of dimensionality reduction techniques based on random projection. arXiv:1706.04371.
30. Cerulli, G. (2023). Model selection and regularization. In: Fundamentals of supervised machine learning, pp. 61–64.
31. Pothuganti, S. (2018). Review on over-fitting and under-fitting problems in machine learning and solutions. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, 7(9), 3692–3695.
32. Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, vol. 25.
33. Abu-Nimeh, S., Nappa, D., Wang, X., & Nair, S. (2007). A comparison of machine learning techniques for phishing detection. In: Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit, pp. 60–69.
34. Li, T., Kou, G., & Peng, Y. (2020). Improving malicious URLs detection via feature engineering: Linear and nonlinear space transformation methods. Information Systems, 91, 101494.
35. Qiskit.org: Quantum machine learning course. https://learn.qiskit.org/course/machine-learning
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Nuria Reyes-Dorta received her Bachelor's degree in Mathematics from the University of La Laguna, Spain, where she is currently finishing the Master's degree in Cybersecurity and Data Intelligence. She is focusing her main research work on the applications of Machine Learning and Quantum Machine Learning in the field of cybersecurity.
Pino Caballero-Gil completed her B.Sc. and Ph.D. in Mathematics at the University of La Laguna in Spain, where she currently holds the position of full professor of Computer Science and Artificial Intelligence at the Department of Computer Engineering and Systems. Her area of expertise includes stream ciphers, cryptographic protocols, security of wireless networks and mobile applications, and quantum-resistant cryptography. She is the leader of the CryptULL research group on Cryptology, which is dedicated to the development of cutting-edge projects in the field. She has made significant contributions to the academic community through numerous refereed conference and journal papers, as well as books.
Carlos Rosa-Remedios received his B.Sc. in Mathematics from the University of La Laguna and is accredited as Director of Security by the Ministry of the Interior. He currently combines his work as head of technology at 112 in the Canary Islands with his Ph.D. studies at the University of La Laguna and teaching at the Faculty of Computer Engineering and in the Master's Degree in Cybersecurity and Data Intelligence at the same university. He is a member of the CryptULL research group, a research group in Cryptology, focusing his work on the study of the applications of Machine Learning algorithms and Quantum Computing in the field of Cybersecurity and Critical Infrastructures.