Deep Learning Based Detection for Cyber Attacks in IoT Networks: A Distributed Attack Detection Framework
https://doi.org/10.1007/s10922-023-09722-7
Abstract
The widespread use of smart devices and the numerous security weaknesses of networks have dramatically increased the number of cyber-attacks in the Internet of Things (IoT). Detecting and classifying malicious traffic is key to ensuring the security of those systems. This paper implements a distributed framework based on deep learning (DL) to prevent many different sources of vulnerability at once, all under the same protection system. Two different DL models are evaluated: feed forward neural network and long short-term memory. The models are evaluated with two different datasets (i.e., NSL-KDD and BoT-IoT) in terms of performance and identification of different kinds of attacks. The results demonstrate that the proposed distributed framework is effective in the detection of several types of cyber-attacks, achieving an accuracy of up to 99.95% across the different setups.
* Beatriz Otero
beatriz.otero@upc.edu
Olivia Jullian
oliviajullianparra@gmail.com
Eva Rodriguez
eva.rodriguez@upc.edu
Norma Gutierrez
normaescg@gmail.com
Héctor Antona
hector.antona@gmail.com
Ramon Canal
ramon.canal@upc.edu
1 Department of Computer Architecture, Universitat Politecnica de Catalunya, Jordi Girona, 31, Barcelona 08034, Spain
1 Introduction
the new attacks are small mutants of existing ones, in the same way that changes in images can be identified by small changes in their pixels. In IoT-Fog networks, they are used for detecting network threats and attacks [8, 9]. However, IoT network features (i.e., the distributed nature of the network and the limited computing capabilities of the end-devices) require novel solutions for IDS [10].
This work proposes a novel, reliable system which applies DL approaches to target several kinds of attacks in IoT networks. The contributions of this work are as follows:
2 Related Work
This section discusses the related works that use DL for intrusion detection in IoT.
The emergence of Fog computing ecosystems requires new DL models to be resil-
ient, adaptive, and closer to the edge. Edge nodes already provide computing, stor-
age, communication and control services. Similarly, they can host security services.
Attacks in IoT systems range from gaining access to the local system to DoS. Initial
works in IoT environments consider centralized solutions for DL based intrusion
detection systems. Alom et al. [11] develop the first IDS based on DBN. They test
the solution using the NSL-KDD dataset normalized through a numerical encoding
procedure. For the classification phase, the system achieves a 97.5% ACC while using only 40% of the dataset's training data. Kim et al. [12] focus on the detection of DDoS
attacks. They choose LSTM to detect network intrusions. They succeed in detecting
and predicting complex patterns in a time series by detecting anomalies from a tem-
poral pattern of less reliable symptoms. This work uses the KDD Cup’99 dataset
and it achieves an accuracy of 98.8%. Shone et al. [13] propose a solution based
on stacked NDAEs and Random Forest as the classification algorithm. They use
the KDD Cup’99 and NSL-KDD datasets and they obtain similar results as previ-
ous approaches (98% ACC). Kwon et al. [3] choose the MLP model for evaluation.
They use the NSL-KDD dataset preprocessed by normalizing numerical values and
encoding categorical values as numerical ones. They train the network with different
hyper-parameter configurations (i.e. units, hidden layers, epochs and learning rate),
and the Softmax layer produces the outputs. The authors obtain promising results with an ACC of 90%. Finally, Ferrag et al. [14] conduct a comparative study for differ-
ent DL models in intrusion detection systems. They consider Recurrent Neural Net-
work (RNN), Deep Neural Network (DNN), Restricted Boltzmann Machine (RBM),
Convolutional Neural Network (CNN), Deep Boltzmann Machine (DBM) and Deep
Autoencoder (DAE), using two datasets: CSE-CIC-IDS2018 [15] and the Koroniotis version [16] of BoT-IoT, which considers 5% of the entire dataset for training and testing. The study concludes that the CNN and DAE methods achieve the best accuracies, 97% and 98% respectively, obtaining the best performance on both datasets.
Subsequent works conceive distributed solutions to prevent network attacks. The first one is conducted by Yadav and Subramanian [17], who propose a solution for detecting DDoS attacks in the application layer through traffic classification using
SAEs. They construct their dataset from features extracted from their web server log
(request flooding, session flooding, asymmetric attack). The logs are pre-processed
and the features are transformed to a numeric form. The system first learns the fea-
tures through the SAE, then the DDoS dataset is built based on these features. At
the end, they are classified with a logistic regression classifier. The experimental
results demonstrate that the proposed method learns features from the SAE, which
are beneficial for classifications. It improves the DR to 98.99% with an average FPR
of 1.27%. Lopez-Martin et al. [18] propose a system based on CVAEs, which uses
a detection method that also performs feature reconstruction, recovering missing
features from incomplete datasets. They use a deviation-based approach, but with a
discriminative framework for traffic samples and classification. Traffic samples are labeled with the intrusion that achieves the lowest reconstruction error, instead of using a threshold for intrusion definition. Then, the intrusion labels are included inside the
CVAE decoded layers. The method is tested using the NSL-KDD dataset, and it can
recover missing categorical features with 3, 11 and 70 values, achieving an ACC of
99%, 92% and 71% respectively. Luo and Nagarajan [19] base their solution on AEs
for wireless sensor networks (WSN), overcoming the high computation resource
consumption of DL in WSN. The authors design a two-part algorithm which resides
on sensors detecting anomalies in a fully-distributed manner. The training model is
offloaded to the cloud. They create their dataset by collecting data over 4 consecu-
tive months in a real WSN indoor test-bed. The results show high accuracy and a
low false positive rate (AUC > 0.8 in most cases). In this work, the sensors perform
distributed anomaly detection without communicating to each other. They evaluate
the TPR and FPR. The TPR is better (by 18%) for a random scheme, because the majority of the training data is historical, while the prioritized scheme has a much lower FPR (up to 60% lower), because it updates the weights and biases more responsively. Diro
and Chilamkurti [8] propose a distributed architecture for IoT networks where the
fog nodes are responsible for training models and hosting attack detection systems,
while the master nodes conduct collaborative parameter sharing and optimization.
In this architecture the DL system uses three layers of neurons (150, 120 and 50),
and the last Softmax layer with neurons equal to the number of classes. The model
is tested using the NSL-KDD dataset and the results demonstrate that the distrib-
uted model performs better than the centralized one improving overall accuracy of
detection from 96 to 99%. Finally, Roopak et al. [9] analyze four different classification DL models (MLP, CNN, LSTM, and CNN+LSTM), obtaining the best accuracy results for the last one (97.16%). The system uses the CICIDS [20, 21] dataset but
balances the DDoS attack dataset by duplicating the data, which improves the train-
ing of the DL methods.
Table 1 summarises the surveyed DL models used for intrusion detection in IoT networks, considering both centralized and distributed architectures. Our work differs from them in (1) the distributed nature of the attack detection scheme, as well as (2) the pre-processing of the datasets. This improves accuracy in the detection of attacks: our proposal achieves an accuracy of up to 99.95% using the FFNN model. Moreover, our work is not restricted to a specific environment (e.g. wireless sensor networks), it covers a wide range of cyber-attacks, and it offers flexibility in changing the IoT fog node and environment without risking the device security.

Table 1 Key contributions using DL models for intrusion attack detection in IoT networks (columns: related work and year, DL models, datasets, metrics)
3 Distributed DL-Based Attack Detection Framework

Nowadays, IoT networks need a distributed solution at the edge for cyber-attack detection. In fact, the distributed nature of IoT environments requires re-engineering of intrusion detection services. Traditional centralized IDSs have shown to be ineffective in the prevention of novel (zero-day) attacks. This section presents a novel distributed DL-based attack detection framework. It consists of four main stages: data treatment and pre-processing, DL model training and testing, distributed framework deployment, and attack detection and classification (see Fig. 1).
This section first presents the datasets considered, as well as the pre-processing conducted. In order to achieve a broad solution, two different datasets are considered: the BoT-IoT dataset, which addresses attacks specific to IoT environments, and the NSL-KDD dataset to broaden the types of cyber-attacks. Both datasets are pre-processed to be later used in the NNs. The second part of this section addresses a key issue for the proposed framework: the selection of the appropriate DL model. The two models considered are the FFNN and the LSTM, because they are used for supervised learning and they perform well with highly correlated features. Both networks are evaluated in a centralized model. Then, the best performing DL model is used in the distributed framework. Finally, the proposed distributed framework is presented, and its performance is assessed in a realistic, dispersed environment where data is limited.
3.1 Datasets
This paper aims to provide a robust framework that is effective in the detection of IoT cyber-attacks. Thus, we use both the BoT-IoT (a dataset specific to IoT cyber-attacks) and the NSL-KDD (a general cyber-attack dataset).
3.1.1 BoT‑IoT Dataset
The BoT-IoT dataset [16] consists of over 73 million records of network activity in a simulated IoT environment, including both normal traffic and several cyber-attack traffic flows. The training and test datasets have 5 output classes reflecting either normal traffic or 4 different types of attacks: DDoS, Denial of Service (DoS), Keylogging and Data Theft. DoS and DDoS attacks are malicious attempts to disrupt the normal traffic of a server, service or network by overloading them with a flood of Internet traffic. Keylogging represents attacks that gather information. Finally, data theft represents the leakage of private user information. The BoT-IoT dataset is based on the activity of a network composed of 62 hosts (based on the network mask 192.168.100.0/26). Beyond the scenario presented in the BoT-IoT dataset, in this paper the network traffic is considered for any network size, since the growing number of IoT devices and the appearance of new network types (e.g. 5G) generate many possible network sizes and therefore scenarios. All the different attacks of the dataset are based on flooding.
Therefore, the most relevant features are those which provide information about the rates, the number of packets and the types of protocol used by the IoT device.
3.1.2 NSL‑KDD Dataset
3.2 Data Pre‑processing
This section presents the first stage of the framework. Data pre-processing is performed before the DL models are trained. The appropriate pre-processing of network traffic allows DL models to efficiently predict unbiased results, while correct pre-processing avoids overfitting problems. The pre-processing is divided into two steps: feature selection and data processing.
3.2.1 Features Selection
The features selected for the BoT-IoT dataset are those which provide information about the rates, the number of packets and the types of protocol used by the IoT device. The reason for this selection lies in the characteristics of the attacks present in these devices (explained in the previous section).
In the case of the NSL-KDD dataset, features are divided into different categories: basic, content, time and host traffic. In this way, all the characteristics considered for the BoT-IoT dataset from the content, time and host traffic categories (such as the number of packets) are also shared by the NSL-KDD dataset, given the common nature of most of the attacks present in both datasets. Examples of shared features are those referring to duration, protocol type, rates and the number of bytes received and sent.
Because of the similarity of the attacks (for instance DoS), most of the characteristics are shared between both datasets. On the other hand, the NSL-KDD dataset exploits network cyber-attacks with different characteristics. This is the case of probing attacks. These attacks are based on network vulnerabilities, not only on the device itself. Thus, their targets are broader than those of the attacks considered for the BoT-IoT dataset (mainly focused on the device). Therefore, the framework uses additional features related to the basics of the network, such as urgent packets, service and logged-in.
3.2.2 Data Processing
Regarding the content of the dataset records, both datasets are unbalanced, since the number of records is far higher for normal traffic than for attacks. Different techniques based on oversampling and undersampling can be used to balance the data. The large number of records leads us to consider undersampling techniques. The datasets are balanced by reducing the number of normal traffic records (randomly picked), while maintaining the number of attack records. Hence, we synthesize a subset of the dataset holding a parity of 50% attack and 50% non-attack records.
Different techniques are used before training the models to prevent overfitting problems. After balancing both datasets, all the features are transformed into floating point values. This means that, in the BoT-IoT dataset, all the IP addresses are treated as different numbers, thus effectively considering all network sizes. A similar transformation is applied when one-hot encoding the categorical values. Feature selection is not required due to the reduced number of features of the dataset. Finally, a standard normalization of the dataset is done in both scenarios to prevent DL model overfitting and possible biased results.
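To make these steps concrete, the following is a minimal sketch in Python, assuming pandas and scikit-learn; the column names ("attack", "saddr", "daddr", "proto") are hypothetical placeholders rather than the exact BoT-IoT field names.

```python
# Minimal pre-processing sketch: balance by undersampling, treat IP addresses
# as floats, one-hot encode categorical values, and apply standard normalization.
import ipaddress

import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Undersample normal traffic to reach a 50/50 attack/normal parity.
    attacks = df[df["attack"] == 1]
    normal = df[df["attack"] == 0].sample(n=len(attacks), random_state=42)
    df = pd.concat([attacks, normal]).sample(frac=1.0, random_state=42)

    # 2. Treat IP addresses as plain numbers (floats) so any network size fits.
    for col in ("saddr", "daddr"):
        df[col] = df[col].map(lambda ip: float(int(ipaddress.ip_address(ip))))

    # 3. One-hot encode categorical values (e.g. the protocol field).
    df = pd.get_dummies(df, columns=["proto"], dtype=float)

    # 4. Standard normalization of every feature except the label.
    features = df.columns.drop("attack")
    df[features] = StandardScaler().fit_transform(df[features])
    return df
```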
3.3 DL Model Training and Testing

Once the data is pre-processed, DL model training and testing, the second stage of the framework, starts. DL enables computational models to learn at different levels of representation and abstraction to effectively model complex relationships among input data. DL algorithms work by transforming the representation at one level into a more abstract one, always starting from the pre-processed raw data. The combination of multiple levels allows learning highly complex models, which are often used for classification tasks, considering the input features relevant for discrimination and suppressing minor variations. Hence, two DL models, FFNN and LSTM, are considered in the proposed framework. They have been chosen for two main reasons: (i) their extensive use in supervised learning; and (ii) their good performance with highly correlated features. Furthermore, the FFNN has been demonstrated to be a simple and efficient solution for intrusion detection, with excellent results in previous studies, achieving 98% accuracy.
On the other hand, LSTMs have evolved from Recurrent Neural Networks (RNNs) to efficiently learn patterns in long sequences [8] (e.g. network data).
3.3.1 FFNN
The FFNN [24] is a widely used Artificial Neural Network (ANN) model in which information moves from the input to the output layer through the hidden layer, only in the forward direction. The number of neurons in the input and output layers is determined by the number of input and output variables, respectively, while in the hidden layer it is determined by a trial-and-error method. The three layers of the FFNN are fully connected, because each neuron of one layer is connected to all the neurons of the next layer. In real-world problems, the network is established by a model that consists of many neurons.
In cyber-attack detection, the output of the FFNN model determines whether
there is an attack or not. Figure 2 shows the structure for the FFNN model applied to
the attack detection in IoT environments using the BoT-IoT dataset.
3.3.2 LSTM
LSTMs [25] are a particular type of Recurrent Neural Network (RNN) that, contrary to FFNNs, gather information across time steps. The RNN is the preferred model for training on sequential data. This type of NN is an extension of a conventional FFNN with cyclic connections. RNNs are more powerful in modeling sequences, but they suffer from vanishing and exploding gradient problems, which make them particularly hard to train. LSTMs overcome these problems by introducing the LSTM cell, formed by a set of gates that control the information flow. LSTMs use the connections between hidden units to generate their output and model their weights. Three gates control the information flow: the input, the forget, and the output gates. The input gate determines the ratio of the input that affects the calculation of the cell state. The forget gate determines whether the information from a previous step is kept or discarded, and the output gate controls which part of the cell state is passed on as the output. Figure 3 depicts the structure of the LSTM model applied to attack detection in IoT environments using the BoT-IoT dataset. The structure is the same as in Fig. 2, but with an LSTM distribution.
3.3.3 Overfitting Prevention
The use of DL techniques, specifically LSTM and FFNN models, has proven extremely accurate in the past years (i.e. 97%-98% accuracy for the BoT-IoT dataset [14]). DAEs and AEs (a specific case of FFNN) have also performed well, with more than 95% accuracy for datasets with similar characteristics [5].
Due to the tendency of DL models to overfit, we include specific measures in the development of our framework to ensure realistic results, as follows.
Section 3.2 uses undersampling techniques to correctly balance the data. These techniques allow the models to face realistic environments and equiprobability for each of the attacks present in IoT devices. In addition to balancing the data, we treat the IP addresses as floats, to represent all types of networks.
Dropout is used in some of the DL models to prevent unrealistically high performance. This step is described in Sect. 4.1.
However, the most relevant measure against overfitting presented in this paper is the use of Hyperband [26], a hyperparameter optimization algorithm available for TensorFlow models, which allows the correct fine tuning
of the model. This mechanism searches for the best hyperparameter combination to achieve the highest performance of the model. During the search, the training and validation losses are the main metrics considered in the fine-tuning process; thus, overfitting can be avoided. A more detailed explanation can be found in Sect. 4.
Finally, removing IP addresses during the pre-processing of the datasets is another possibility considered to avoid overfitting. However, the same attacker IP addresses are not used in the training and validation sets. This means the model cannot memorize the float numbers corresponding to the attackers' IP addresses, since there are no repeated addresses. In addition, we find that understanding the nature of the attackers' IP addresses is crucial, since preventive methods based on tracking can be deployed in the future. As a possible future line of research, using explainable AI could let us know whether there is effectively any relationship between the IP addresses and the way an attack is done, especially for those related to a DoS.
3.4 Distributed Framework Deployment

3.4.1 Framework Description
Fig. 4 DL based distributed attack detection architecture for Fog/IoT networks: a Distributed architecture; b Model update per layers
response to critical actions. Finally, the Cloud layer is the core layer which com-
prises high-performance servers and storage devices.
Likewise, the DL-based distributed attack detection architecture has three layers matching the IoT architecture. The Cloud layer contains the global model, constituted by the latest values of the gradient weights, which are updated by the fog nodes. The fog nodes host the attack detection system and are responsible for training the DL models in a distributed manner. They feed the global parameters managed in the Cloud by the coordinating master. The master manages the parameter validation, optimization and exchange. The optimized parameters are then propagated to the fog nodes to train the distributed DL model. Meanwhile, IoT devices provide their pre-processed raw data to the fog nodes. Figure 4 shows the distributed attack detection architecture and Table 3 summarizes the notations and terminology used in the distributed attack detection algorithm.
Initially, the model weights are initialized to random values, as in most Deep Learning approaches. The fog nodes download the latest version of the global model from the Cloud server (Mw) and train it with the pre-processed raw data from the IoT devices. Each IoT device in the Edge layer pre-processes its raw data (RDx) and sends the pre-processed data (PDx) to the corresponding node in the Fog layer. Then, as stated before, the latest version of the global model is downloaded from the Cloud to the fog node. Consequently, the fog node trains the downloaded model locally with the received dataset and sends the selected gradient weights to the server (ΔWx). Finally, the global model is validated and updated in the Cloud layer (Mw). Overall, this process updates the DL model at each iteration by combining the data from all the fog nodes, which in turn gather the information from all the IoT devices. Algorithm 1 describes the steps for pre-processing, updating the model and gradient weights, and validating the model in the proposed distributed attack detection mechanism.
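To make the update loop concrete, the following is a minimal in-process sketch of one iteration, assuming Keras models and simple averaging of the weight updates; build_model and the per-fog-node datasets are placeholders, and in the real framework the ΔWx exchange happens over the network between the Fog and Cloud layers.

```python
# Sketch of one iteration of the distributed update loop (in-process simulation).
import numpy as np
import tensorflow as tf

def fog_round(global_weights, fog_datasets, build_model):
    """Each fog node trains the latest global model on its local data and
    returns a weight update; the cloud averages the updates into Mw."""
    updates = []
    for x_fog, y_fog in fog_datasets:                  # one entry per fog node
        model = build_model()
        model.set_weights(global_weights)              # download Mw from the Cloud
        model.fit(x_fog, y_fog, epochs=1, verbose=0)   # local training on PDx
        # Delta Wx: difference between locally trained and global weights
        updates.append([w_new - w_old
                        for w_new, w_old in zip(model.get_weights(), global_weights)])
    # Cloud layer: update the global model with the mean of the received updates
    mean_update = [np.mean(layer, axis=0) for layer in zip(*updates)]
    return [w + dw for w, dw in zip(global_weights, mean_update)]
```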
The two best performing DL models, FFNN and LSTM, for the BoT-IoT and NSL-KDD datasets have been integrated into the distributed framework. Their performance is presented and analysed in the next section.
environment is produced by the change of the fog node responsible for the device. The proposed framework adapts centralized environments to protect distributed systems without decreasing the flexibility and freedom of their devices. Thus, every training performed in a fog node is shared among the others, and a common model is available on the cloud. Thanks to the fog nodes' contributions to the final model's weights, all the possible changes in the environment are covered for a device connected to the cloud.
As a first use case, we consider a general system in the cloud, using the common cloud attacks found in the NSL-KDD dataset. To also represent a more realistic distributed system, we present a second use case that is more specific and based on IoT environments. The attacks considered are those targeting IoT devices, provided by the BoT-IoT dataset. In both cases, the dataset is first considered in a centralized environment, training the FFNN and LSTM models and comparing them. Then, the dataset is partitioned across as many fog nodes as the cloud comprises.
Considering the nature of the presented framework (explained in Sect. 3.4.1), using a centralized model for distributed environments cannot be contemplated. To bring intrusion detection capabilities to Fog computing architectures in IoT environments, the model must be trained in the fog nodes and updated in the cloud layer. This makes the propagation of errors from a fog node's training to its neighbors inevitable.
When training the fog nodes in the distributed framework, each of them has a partition of the dataset. Due to the similar network characteristics present in each of the nodes during their training, the possible different models (in this case an FFNN and an LSTM) face the same challenge as in a centralized environment: training a model for attack detection. If a model has lower accuracy in the detection of attacks in a centralized environment, these differences will also propagate to its updated weights in the cloud layer and affect the subsequent training of the fog nodes. Thus, the worse the performance of a fog node, the worse its updated model in the cloud will be and, therefore, the more impact this error will have on the rest of the training nodes, which share the same distributed model at inference.
Overall, because the cloud layer updates the distributed model with similarly trained environments coming from the fog nodes, the possibility of a better performance in this distributed framework when deploying a less accurate model is very improbable and is thus discarded.
In addition, the computational cost of deploying the presented models must also be considered. Similar results are obtained with both models, so similar results are expected in the distributed environment. Thus, the marginal gain that an LSTM model could offer (in case the fog nodes were trained in different environments) is not comparable to the computational cost that its deployment represents. This is due to its complexity (with a higher number of neurons and layers) as well as its training time (which will always be longer than that of an FFNN, even with the same complexity, due to the memory component of LSTM models).
Therefore, a common model, selected for its best performance in the centralized environment, is deployed in the cloud layer. Each fog node is then trained with a partition of the dataset and shares the characteristics of its network environment and attacks through the update of the cloud model weights. Finally, each device connected to the cloud is able to change its responsible fog
node as many times as desired, with total protection against both general attacks in the cloud environment and IoT attacks in the IoT distributed environment.
4 Experiments
This section presents stage 4 of the distributed DL-based attack detection framework, which includes (i) the design of the neural network parameters and (ii) the design of the distributed framework in terms of node scaling, data partitioning, and distributed algorithms. The design of the neural network parameters has been done using the Hyperband and Learning Curve [27] procedures. Hyperband is essential for building the model. FFNN and LSTM models trained on datasets with a small number of features may end up with overfitting problems. Hyperband is used to avoid these problems and to obtain a solid model (by tuning the different parameters of the network). Hyperband must decide between different hyper-parameters to achieve a solid model. In this case, due to the limited number of features, Hyperband not only determines the number of units per hidden layer, but also considers the activation function and the dropout rate to achieve the best accuracy. To this end, the model is tested with the most common and efficient activation functions: Sigmoid, ReLU and Tanh. Moreover, a maximum dropout threshold of 75% is set to also guarantee robustness in the final model.
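As an illustration, this kind of Hyperband search could be expressed with the Keras Tuner library, as sketched below; the specific library and the exact search space are assumptions, since the text only states that Hyperband explores the units per hidden layer, the activation function (Sigmoid, ReLU, Tanh) and the dropout rate capped at 75%.

```python
# Hyperband search sketch (keras_tuner is an assumed implementation choice).
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential()
    for i in range(hp.Int("layers", 2, 4)):
        model.add(tf.keras.layers.Dense(
            units=hp.Int(f"units_{i}", 32, 200, step=16),
            activation=hp.Choice(f"act_{i}", ["relu", "sigmoid", "tanh"])))
        # Dropout capped at 75% to keep the final model robust
        model.add(tf.keras.layers.Dropout(hp.Float(f"drop_{i}", 0.0, 0.75, step=0.05)))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.Hyperband(build_model, objective="val_accuracy", max_epochs=15)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))
```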
The design of the neural network depends on the DL model (FFNN or LSTM) considered, its purpose (detection) and the dataset (BoT-IoT or NSL-KDD) used. The use of a different model for each dataset is due to the differences in the attacks present in both datasets.
As explained in Sect. 3.1, due to the similarity of the attacks (for instance DoS), most of the characteristics are shared between both datasets. The main difference is that the NSL-KDD dataset exploits other network cyber-attacks that have different characteristics. Thus, its targets are broader than the attacks considered for the BoT-IoT dataset and, therefore, the inputs to the detection model must be different.
Therefore, the FFNN designed for attack detection in IoT environments (BoT-IoT dataset) consists of 3 hidden layers with 100, 50 and 100 neurons respectively, all with ReLU activation functions, and an output layer with the Sigmoid activation function. ReLU is used to avoid saturation between weights, hence keeping continuous weight-value modelling during the training phase. On the other hand, binary cross-entropy is used for binary classification problems such as the target system. Thus, Sigmoid is the activation function best suited to binary cross-entropy, since the loss requires probability values between 0 and 1, which is not provided by the other activation functions considered. Furthermore, the model is compiled with an Adam optimizer, a binary cross-entropy loss and 15 epochs. All of these settings are determined empirically, along with the tuning of the hyper-parameters with Hyperband.
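As a reference, a minimal Keras sketch of this FFNN follows; the number of input features is a placeholder parameter, since it depends on the features selected in Sect. 3.2.1, and the compilation mirrors the settings described above.

```python
# Minimal sketch of the described BoT-IoT FFNN (100-50-100 ReLU hidden layers,
# Sigmoid output, Adam optimizer, binary cross-entropy). n_features is a placeholder.
import tensorflow as tf

def build_botiot_ffnn(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # attack / normal probability
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy",
                           tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model

# Hypothetical usage: model = build_botiot_ffnn(n_features=16)
#                     model.fit(x_train, y_train, epochs=15, validation_split=0.2)
```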
For the NSL-KDD dataset, the model consists of four hidden layers, an input
layer and an output layer. The hidden layers have 48, 128, 88 and 188 neurons,
respectively. Furthermore, the activation function for all hidden layers is ReLU and
Sigmoid for the output. After each hidden layer, a dropout is set by Hyperband (i.e.
25% for the first and third layer, 45% for the second, and 30% for the last one). The
network is trained with an Adam optimizer, binary cross-entropy loss and 10 epochs.
The LSTM is the second model considered for the intrusion detection framework. As for the FFNN model, the LSTM is designed for detection using the BoT-IoT and NSL-KDD datasets. For attack detection in IoT networks (BoT-IoT dataset), the model consists of 3 hidden layers with 100, 64 and 32 neurons, respectively. All the hidden layers use ReLU activation functions and the output layer uses the Sigmoid activation function. The model is compiled with an Adam optimizer, binary cross-entropy loss and 30 epochs. As for the FFNN model, all the settings are determined empirically, jointly with Hyperband for tuning the hyper-parameters. Finally, the model is also defined to generalize the intrusion detection framework using the NSL-KDD dataset. It consists of five hidden layers, an input layer and an output layer. The hidden layers have 50, 100, 200, 100 and 50 neurons, respectively. The activation function for all hidden layers is ReLU and Sigmoid for the output layer. For the compilation parameters, the network is trained with an Adam optimizer, binary cross-entropy loss and 20 epochs.
Table 4 summarizes the design of the FFNN and LSTM DL models used to detect and classify cyber-attacks in IoT networks.
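For comparison, the following is a minimal Keras sketch of the BoT-IoT LSTM described above; how flows are windowed into sequences is not detailed in the text, so the time-step dimension and the use of stacked LSTM layers with return_sequences are assumptions.

```python
# Minimal sketch of the described BoT-IoT LSTM (100-64-32 hidden units, ReLU cell
# activations, Sigmoid output, Adam optimizer, binary cross-entropy).
import tensorflow as tf

def build_botiot_lstm(timesteps: int, n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(timesteps, n_features)),
        tf.keras.layers.LSTM(100, activation="relu", return_sequences=True),
        tf.keras.layers.LSTM(64, activation="relu", return_sequences=True),
        tf.keras.layers.LSTM(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```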
Once the networks are tested in a centralized and theoretical environment, the best performing detection networks are tested in the distributed framework to simulate real environments. In our system, user information can be gathered and processed in a local environment, and then the crucial training information is passed to the Cloud environment.
The distributed environment emanates from a subset of pre-processed data. The subsets are generated by dividing the selected dataset into equal parts, one for each end device the system has. Several experiments are performed. First, the distributed environment performs a non-distributed test where the training resides in the Cloud. Then, the system is tested with 3, 5, 7 and 9 fog nodes. Each fog node trains with the information residing in 5 different devices, meaning that the number of devices used is 15, 25, 35
and 45 for 3, 5, 7, and 9 fog nodes, respectively. Hence, the total training samples are equally divided into 15, 25, 35 and 45 parts, with non-repeated information.
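A simple way to generate such non-overlapping partitions is sketched below, assuming NumPy arrays; the 5 devices per fog node match the setup described above, while the shuffling seed is arbitrary.

```python
# Split the balanced training set into equal, non-overlapping parts,
# one per end device (5 devices per fog node in the described setup).
import numpy as np

def split_for_devices(x, y, n_fog_nodes, devices_per_node=5, seed=0):
    n_parts = n_fog_nodes * devices_per_node           # e.g. 3 fog nodes -> 15 parts
    idx = np.random.default_rng(seed).permutation(len(x))
    return [(x[p], y[p]) for p in np.array_split(idx, n_parts)]
```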
The different numbers of fog nodes used allow us to ensure the proper operation of the system for different groups of devices, since we target a system that maintains a similar performance level in all the different setups.
Overall, the centralized training contains all the dataset information from all the users. In contrast, the distributed experiments train individually in each fog node and only pass the essential information to the Cloud system. This essential information is the gradient of the weights of the model. With this information, the global model, residing in the Cloud environment, can be updated by averaging all the received results.
All the DL models used in our experimental framework are implemented using Keras on TensorFlow. They are evaluated using different metrics that ensure and test their efficiency. These metrics are the Accuracy (ACC), Loss, Precision and Recall. They can be computed from a confusion matrix, i.e., a matrix representation of the classification results (see Table 5), where True Positive (TP) and True Negative (TN) denote the number of attack and normal records correctly classified, while False Positive (FP) and False Negative (FN) denote the number of normal and attack records incorrectly classified.
ACC (see Eq. 1) is the ratio of correctly classified predictions over the total number of instances evaluated:

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (1)

Precision (see Eq. 2) is the ratio of items correctly classified over the total number of items predicted as positive:

Precision = TP / (TP + FP)  (2)

Recall (see Eq. 3) represents the ratio of items correctly classified (attack or normal) as class C to all the items that belong to class C:

Recall = TP / (TP + FN)  (3)
The most common loss is the binary cross-entropy loss (see Eq. 4). Note that the y value represents the real output, whereas ŷ represents the output estimation:

Loss = −[y · log(ŷ) + (1 − y) · log(1 − ŷ)]  (4)
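For reference, these four metrics can be computed directly from the confusion-matrix counts and the model outputs, as in the following sketch; the small eps term is an assumption added for numerical stability and is not part of Eq. 4.

```python
# Minimal sketch of Eqs. (1)-(4).
import numpy as np

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)           # Eq. (1)

def precision(tp, fp):
    return tp / (tp + fp)                            # Eq. (2)

def recall(tp, fn):
    return tp / (tp + fn)                            # Eq. (3)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(-np.mean(y * np.log(y_hat + eps)    # Eq. (4), averaged over samples
                          + (1 - y) * np.log(1 - y_hat + eps)))
```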
All NN inferences are repeated 20 times in the distributed environment to ensure their stability and consistency, without overloading the cloud server with redundant calculations.
In order to provide the best DL model to the framework, first a centralized environment is created to train and test the best models for intrusion detection in IoT networks. Hence, a comparison between the FFNN and LSTM models is done based on their accuracy, precision and recall. Furthermore, a generalisation of the framework is also achieved in this section, providing the DL models with the ability to detect intrusions in more general networks.
Table 6 summarizes the results of the FFNN and LSTM DL models to detect
cyber-attacks in IoT networks.
The results show a different performance for the DL models depending on the dataset considered, achieving almost the same results for the BoT-IoT dataset, but showing significant variations for the NSL-KDD dataset. The difference between FFNNs and LSTMs in general cyber-attack detection is significant in terms of accuracy, where LSTM models drop more than 2% compared to the FFNNs. Precision and recall show a similar behaviour for these two models. On the other hand, there is no substantial difference in the performance of the FFNN and LSTM models for IoT attacks (less than 1% of ACC is dropped when using LSTM models). However, these differences become noticeable when the models require abstraction (dropping more than 1% in LSTM models). Thus, the FFNN model is the best choice in terms of abstraction, but if only BoT-IoT is considered, both models are similar.
In conclusion, considering past inferences (i.e. LSTM models) has a negative impact on the final model performance when abstraction is required. In this case, the use of a forget gate forces the model to use redundant information. The inclusion of this redundant information reduces the abstraction of the model, degrading its inference capabilities and hence leading to worse performance than FFNNs. Still, LSTM models perform well in IoT intrusion detection, with an accuracy above 99% for IoT attacks and 96% for general cyber-attacks.
Finally, taking into account its simplicity, the training method of the fog nodes (described in Sects. 3.4.1 and 3.4.2), and the performance shown by FFNNs, this DL model is the final choice for the intrusion detection framework and is implemented in a distributed environment hereafter.
FFNNs are demonstrated, in Sect. 5.1, to be the best DL models for cyber-attack detection in centralized IoT environments.
Due to the nature of the distributed framework, only the FFNN is integrated in the distributed framework and tested using the BoT-IoT and NSL-KDD datasets. Two different environments are set up with the corresponding DL model and data. Figures 5 and 6 show the accuracy and loss results for the BoT-IoT dataset. Likewise, Figs. 7 and 8 show the results for the NSL-KDD dataset.
As the datasets are balanced, meaning that there is a very similar percentage of benign and malign samples, the accuracy, precision and recall are representative performance metrics, while the loss function detects possible over-fitting in the model.
The results show the performance of the network in the validation phase, which takes place in the Cloud environment after the model is individually trained in each fog node, as described in Sect. 3.4.1.
In the accuracy graph of Fig. 5, the accuracy reaches almost 100% for both the centralized model and the distributed model with 3 nodes. The reason resides in the minimal data distribution, since the distributed framework can train with 1/3 of the samples. When using 9 nodes, the accuracy diminishes due to the smaller number of samples used for training. If the 9-node system could be trained with a similar
number of samples to that of the 3-node system, the performance would be almost the same as in the centralized architecture.

Fig. 5 BoT-IoT dataset FFNN ACC results in the distributed framework for 1, 3, 5, 7 and 9 fog nodes

Fig. 6 BoT-IoT dataset FFNN Loss results in the distributed framework for 1, 3, 5, 7 and 9 fog nodes

Fig. 7 NSL-KDD dataset FFNN ACC results in the distributed framework for 1, 3, 5, 7 and 9 fog nodes
Nevertheless, the overall performance of the network is minimally affected by the distributed framework. The performance loss is below 1% for all metrics and the loss remains significantly below 1. Hence, the final performance of the distributed system remains excellent, achieving high results.
Figures 7 and 8 show the results obtained for the NSL-KDD dataset for the centralized and distributed (3, 5, 7 and 9 nodes) frameworks.
Fig. 8 NSL-KDD dataset FFNN Loss results in the distributed framework for 1, 3, 5, 7 and 9 fog nodes
methodology and project design, experiments validation, investigation process, management and coordination of the research activities. ER contributed to manuscript writing, investigation and project administration. NG contributed to algorithm design and code implementation. HA contributed to software development. RC contributed to writing the original draft and to the acquisition of the project's financial support. All authors read and approved the final manuscript.
Funding declaration Open Access funding provided thanks to the CRUE-CSIC agreement with Springer
Nature. This work is partially supported by the Spanish Ministry of Science and Innovation under contract
PID2021-124463OB-IOO, by the Generalitat de Catalunya under grants 2017SGR962, 2021SGR00326,
and by the DRAC (IU16-011591), the HORIZON Vitamin-V (101093062) and the HORIZON-AG
PHOENI2X (101070586) projects.
Declarations
Conflicts of Interest The authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as
you give appropriate credit to the original author(s) and the source, provide a link to the Creative Com-
mons licence, and indicate if changes were made. The images or other third party material in this article
are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is
not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission
directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licen
ses/by/4.0/.
References
1. Lab, W.T.: Internet security report: WatchGuard’s threat lab analyzes the latest malware and
internet attacks. Technical report (Q3 2020). https://www.watchguard.com/wgrd-resource-center/
security-report-q3-2020. Accessed 17 Mar 2021
2. Cherdantseva, Y., Hilton, J.: A reference model of information assurance security. In: 2013 Inter-
national Conference on Availability, Reliability and Security, pp. 546–555 (2013). https://doi.
org/10.1109/ARES.2013.72
3. Kwon, D., Kim, H., Kim, J., Suh, S., Kim, I., Kim, J.: A survey of deep learning-based network
anomaly detection. Clust. Comput. 22(5), 949–961 (2019)
4. Anderson, J., Carbonell, J., Mitchell, T., Michalski, R., Amarel, S., Tecuci, T., Kodratoff, Y.:
Machine Learning: An Artificial Intelligence Approach. M. Kaufmann, Los Altos, CA (1983)
5. Kilincer, I., Ertam, F., Sengur, A.: Machine learning methods for cyber security intrusion detec-
tion: datasets and comparative study. Comput. Netw. 188, 107840 (2021). https://doi.org/10.
1016/j.comnet.2021.107840
6. Fadlullah, Z.M., Tang, F., Mao, B., Kato, N., Akashi, O., Inoue, T., Mizutani, K.: State-of-the-
art deep learning: evolving machine intelligence toward tomorrow’s intelligent network traffic
control systems. IEEE Commun. Surv. Tutor. 19(4), 2432–2455 (2017). https://doi.org/10.1109/
COMST.2017.2707140
7. Tsimenidis, S., Lagkas, T., Rantos, K.: Deep learning in IoT intrusion detection. J. Netw. Syst.
Manag. (2022). https://doi.org/10.1007/s10922-021-09621-9
8. Diro, A.A., Chilamkurti, N.: Distributed attack detection scheme using deep learning approach
for Internet of Things. Futur. Gener. Comput. Syst. 82, 761–768 (2018). https://doi.org/10.
1016/j.future.2017.08.043
9. Roopak, M., Yun Tian, G., Chambers, J.: Deep learning models for cyber security in IoT net-
works. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference
(CCWC), pp. 452–457 (2019). https://doi.org/10.1109/CCWC.2019.8666588
10. Shalaka, M., Pawar, P.M., Muthalagu, R.: Efficient Intelligent Intrusion Detection System for
Heterogeneous Internet of Things (HetIoT). J. Netw. Syst. Manag. (2023). https://doi.org/10.
1007/s10922-022-09697-x
11. Alom, M.Z., Bontupalli, V., Taha, T.M.: Intrusion detection using deep belief networks. In: 2015
National Aerospace and Electronics Conference (NAECON), pp. 339–344 (2015). https://doi.
org/10.1109/NAECON.2015.7443094
12. Kim, J., Kim, J., Thi Thu, H.L., Kim, H.: Long Short Term Memory Recurrent Neural Network
classifier for intrusion detection. In: 2016 International Conference on Platform Technology and
Service (PlatCon), pp. 1–5 (2016). https://doi.org/10.1109/PlatCon.2016.7456805
13. Shone, N., Ngoc, T.N., Phai, V.D., Shi, Q.: A deep learning approach to network intrusion detec-
tion. IEEE Trans. Emerg. Topics Comput. Intell. 2(1), 41–50 (2018). https://doi.org/10.1109/
TETCI.2017.2772792
14. Ferrag, M.A., Maglaras, L., Moschoyiannis, S., Janicke, H.: Deep learning for cyber security intru-
sion detection: approaches, datasets, and comparative study. J. Inform. Secur. Appl. 50, 102419
(2020). https://doi.org/10.1016/j.jisa.2019.102419
15. Cibersecurity, C.I.: Dataset IDS 2018: CSE-CIC-IDS2018 on AWS. https://www.unb.ca/cic/datas
ets/ids-2018.html Accessed 17 Mar 2021
16. Koroniotis, N., Moustafa, N., Sitnikova, E., Turnbull, B.: Towards the development of realistic bot-
net dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Futur. Gener.
Comput. Syst. 100, 779–796 (2019). https://doi.org/10.1016/j.future.2019.05.041
17. Yadav, S., Subramanian, S.: Detection of application layer DDoS attack by feature learning using
Stacked AutoEncoder. In: 2016 International Conference on Computational Techniques in Infor-
mation and Communication Technologies (ICCTICT), pp. 361–366 (2016). https://doi.org/10.1109/
ICCTICT.2016.7514608
18. Lopez-Martin, M., Carro, B., Sanchez-Esguevillas, A., Lloret, J.: Conditional variational Autoen-
coder for prediction and feature recovery applied to intrusion detection in IoT. Sensors (Basel)
17(1967), 1–17 (2017). https://doi.org/10.3390/s17091967
19. Luo, T., Nagarajan, S.G.: Distributed anomaly detection using autoencoder neural networks in WSN
for IoT. In: 2018 IEEE International Conference on Communications (ICC), pp. 1–6 (2018). https://
doi.org/10.1109/ICC.2018.8422402
20. Sharafaldin., I., Habibi Lashkari., A., Ghorbani., A.A.: Toward generating a new intrusion detection
dataset and intrusion traffic characterization. In: Proceedings of the 4th International Conference on
Information Systems Security and Privacy (ICISSP), vol. 1, pp. 108–116 (2018)
21. Vijayanand, R., Devaraj, D., Kannapiran, B.: Intrusion detection system for wireless mesh network
using multiple support vector machine classifiers with genetic-algorithm-based feature selection.
Comput. Secur. 77, 304–314 (2018). https://doi.org/10.1016/j.cose.2018.04.010
22. University of California, I.: KDD Cup’99. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.
html. Accessed 17 Mar 2021
23. Cibersecurity, C.I.: NSL-KDD. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Accessed
17 Mar 2021
24. Ramchoun, H., Idrissi, M.J., Ghanou, Y., Ettaouil, M.: Multilayer perceptron: architecture optimiza-
tion and training. Int. J. Interact. Multimed. Artif. Intell. 4(1), 26–30 (2016). https://doi.org/10.1145/
3090354.3090427
25. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).
https://doi.org/10.1162/neco.1997.9.8.1735
26. Li, L., Jamieson, K.G., DeSalvo, G., Rostamizadeh, A., Talwalkar, A.: Efficient hyperparameter
optimization and infinitely many armed bandits. CoRR (2016) . https://doi.org/10.48550/arXiv.
1603.06560
27. Borgianini, F.: Using the learning curve to design effective training. PM Netw. 12(7), 50–52 (1998)
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.