TELEMATIQUE Volume 21 Issue 1, 2022

ISSN: 1856-4194 6982 – 7012

Identification of Internet of Things (IoT) Attacks Using Gradient Boosting: A Cross Dataset Approach
Shruti Garg and Vivek Kumar¹, Srinivasa Rao Payyavula²
¹ Department of Computer Science and Engineering, Birla Institute of Technology, Ranchi, 835215, INDIA
² Head of Part, Samsung R&D Institute India, Bangalore

E-mail: ¹mtity10005.20@bitmesra.ac.in, ²p.srinivasa@samsung.com

Received 08/11/2022; Accepted 30/11/2022

Abstract
IoT attacks have become very common in recent years, especially during the pandemic, when most activity takes place online. These attacks involve theft of data and complete or partial blocking of access to devices, creating emergencies at various locations. Attacks and attackers take many forms on the internet. The aim of this study is to identify ‘IoT attacks’ and ‘DDoS attacks’ using three datasets: BoT-IoT, IoT-23, and the Canadian Institute for Cybersecurity Distributed Denial of Service 2019 dataset (CIC-DDoS2019). The BoT-IoT and IoT-23 datasets are used in Experiments I and II to identify IoT attacks. In Experiment I, BoT-IoT is used for training and IoT-23 for testing; Experiment II reverses the roles of the two datasets. Experiment III identifies DDoS attacks in the CIC-DDoS2019 dataset using data from two different days. In all experiments, training and testing are done with two gradient boosting techniques, Extreme Gradient Boosting (XGB) and Light Gradient Boosting Method (LGBM), and their performance is compared with that of the Cascaded Deep Forest (CDF). Feature extraction and selection (FES) is done using two established methods: principal component analysis (PCA) and analysis of variance (ANOVA). The accuracy achieved with the boosting methods is at least 16% higher than that achieved with CDF, and the boosting algorithms are at least 240 times faster than CDF. Of the two boosting algorithms, LGBM has the lowest execution time; it runs in 54 seconds or less and reaches an accuracy of up to 94.79%.

Index Terms: IoT Attacks, Cross dataset, Machine Learning (ML), LGBM, XGB, CDF

1. INTRODUCTION

The increase in online activity during the pandemic period has opened up enormous opportunities for attackers [1]. Hackers capitalise on the vulnerabilities and security flaws in IoT devices [2]. According to a report by the International Data Corporation (IDC), there will be 55.7 billion connected devices, and 75% of them will be connected to IoT devices [3]. The number of cyberattacks will increase to 15.4 million by 2023, roughly double the 7.9 million attacks in 2018 [4].
Artificial intelligence (AI) is a buzzword nowadays and a widely accepted solution for detecting attacks on the internet [5–6]. Machine learning (ML) classification algorithms [7–9] have evolved into autonomous analytical tools that produce accurate scores from extracted features. Approaches to cyber-attack identification include non-AI-based methods, ML and deep learning (DL).

This section reviews the literature and background on ML approaches to cyber-attack intrusion and anomaly detection. In [10], various ML algorithms were applied to the BoT-IoT dataset to identify the traffic associated with attacks and anomalies in an IoT network based on 44 selected features. Five ML algorithms, namely Naive Bayes (NB), Bayes net (BN), C4.5 decision tree (DT), Random Tree and Random Forest (RF), were used for accurate identification of malicious BoT-IoT traffic. The C4.5, Random Tree and RF algorithms achieved 99.99% accuracy, while NB and BN achieved 99.79% and 99.77% accuracy, respectively. To improve efficiency, the authors used the bijective soft set approach, which has been shown to be very effective in decision-making and selection. They concluded that the NB algorithm is the most effective in detecting intrusions and anomalies in IoT networks.

A deep learning (DL) model [11] has been proposed to detect DDoS attacks in network traffic. The proposed architecture was able to detect changes quickly and accurately, even with a smaller sample size, owing to its classification process, feature extraction technique and layers that are updated during training. The authors used the CIC-DDoS2019 dataset and converted it into two different formats to make it more effective for classifying and detecting DDoS attacks with a deep neural network (DNN). The first dataset was labelled with two classes, for the presence and absence of DDoS attacks. The second dataset covers the entire spectrum of DDoS attack types. Attacks were detected with 99.97% accuracy and 99.99% precision, and the attack types were classified with 80.49% precision and 94.57% accuracy.

Pokhrel et al. [12] proposed a technique using an ML approach to detect and mitigate botnet DDoS attacks on IoT networks and so address the security problems caused by bots. They used the BoT-IoT dataset, which has 999,610 records; of these, 994,828 are botnet traffic and the rest are normal. As the dataset was not balanced, a balanced dataset with an equal amount of normal and botnet traffic was created using the SMOTE technique. Several ML models were trained on the BoT-IoT dataset, namely K-nearest neighbour (KNN), NB and artificial neural network (ANN). The KNN algorithm achieved 92.1% accuracy and a ROC_AUC of 92.2% on the dataset created using SMOTE, and 99.6% accuracy and 99.2% ROC_AUC on the real-time dataset. KNN was found to be the best algorithm for detecting cyberattacks in the BoT-IoT dataset.

Hasan et al. [13] designed a deep convolutional neural network (DCNN) architecture for detecting DDoS attacks on optical burst systems (OBS). The DCNN approach proves to be very promising when the dataset is minuscule, as general ML algorithms cannot effectively perform traffic analysis in that setting. The model was compared with the ML algorithms support vector machine (SVM), NB and KNN. The DCNN model achieved 99% accuracy, while NB, SVM and KNN performed worse, with 79%, 88% and 93% accuracy, respectively. The study therefore concluded that the DCNN model is more promising than the traditional ML algorithms.

Priyadarshini et al. [14] proposed a long short-term memory (LSTM) model to detect the anomalous characteristics of DDoS attacks at the transport/network level. The model secures cloud computing and fog computing environments. LSTM is most efficient on time-based sequential data and is therefore suitable for training on samples of network traffic packets recorded at specific time intervals. LSTM can retain past and future knowledge to influence the current packet. The authors used the IDS CTU-13 botnet and ISCX 2012 datasets. Experiments were conducted with different numbers of hidden layers, units and dropouts for the LSTM model. The model with 128 units and three hidden layers performed best, with 98.88% accuracy.

1.1 Cross dataset attack identification

Although there is substantial research on identifying IoT attacks, cross-dataset research is still unexplored. The different types of IoT attacks and the possible locations of their occurrence are shown in Fig. 1, which also shows that IoT devices can be accessed from five different locations and that 10 types of attacks are possible through these devices.

Fig. 1. IoT Environment and possible attacks [15]

The behaviour of attacks is variable by nature, and new attacks may differ from previous ones. Therefore, it is not enough to train and test on the same attack data. To address this problem, the trained models are tested across datasets in the current work. Two gradient boosting algorithms are used to identify attacks; they are trained and tested with different datasets. The entire workflow of the current work is shown in Fig. 2.

Fig. 2. Workflow diagram of attack identification using boosting algorithms

The authors' contributions, following the workflow in Fig. 2, are as follows:

1. Cross-dataset training and testing of IoT attacks and DDoS attacks is performed using two boosting algorithms, XGB [16-18] and LGBM [19, 20]. Gradient boosting algorithms are selected for attack identification because the attack datasets consist of a large amount of data (more than 72,000,000 records [21]) and gradient boosting executes 52 times faster than other algorithms with high accuracy [22].
2. The performance of the gradient boosting algorithms is compared with a fairly new algorithm, CDF [23–25]. CDF was proposed in 2017 by combining deep learning and random forest and performed better than both [26].
3. Two established datasets, BoT-IoT [12, 27] and IoT-23 [28–29], are used alternately as training and test sets in the first two experiments. The output of the first two experiments is binary: malicious or non-malicious. The third experiment uses the CIC-DDoS-2019 dataset [30–32] to identify five DDoS attacks: LDAP, MSSQL, NetBIOS, SYN-Flood and UDP-Flood. The training and test sets in this experiment were created using data from two different days of the same dataset. There was no correlation between the experiments on the two days.
4. Because the training and test data come from different datasets, feature extraction and selection (FES) is conducted using two statistical methods: Principal Component Analysis (PCA) and Analysis of Variance (ANOVA).
5. To create similar attributes in the BoT-IoT and IoT-23 datasets, the pcap files of the IoT-23 dataset are extracted and converted to csv using the CICFLOWMETER [38-39] software.

The rest of the paper is structured as follows: Section 2 briefly describes the different methods of feature selection and classification. Section 3 presents the experimental results and their analysis. Section 4 contains the discussion, followed by a conclusion and an outlook on the future scope of the paper in Section 5.

2. EXPERIMENTS
2.1 Data description
In this paper, three datasets, BoT-IoT [10, 27, 40, 41], IoT-23 [27-28, 60] and CIC-DDoS-2019 [30, 31, 42], are used for attack detection. The first two datasets are based on attacks on IoT devices, and the third dataset consists of 12 different DDoS attacks triggered on two different days. The three datasets are described below.
2.1.1 BoT-IoT
A realistic network environment was created by designing a cyber range lab at UNSW Canberra for the preparation of the BoT-IoT dataset [43]. The reliability of the BoT-IoT dataset was evaluated in [10, 27, 40, 41] using different machine learning and statistical methods for various applications and was also compared with existing datasets. The dataset provides a combination of normal and botnet traffic across an IoT-specific network. The raw pcap files are 69.3 GB in size and contain more than 72,000,000 records, while the extracted flow traffic in csv format is 16.7 GB. The dataset includes attacks such as DDoS, DoS, OS and Service Scan, Keylogging and Data exfiltration. To handle such a large volume of data, 5% of the original data was extracted through MySQL queries, yielding 3 million records with a size of 1.07 GB.
2.1.2 IoT-23
The IoT-23 dataset was captured in the Stratosphere Laboratory by the AIC Group, FEL, CTU University in the Czech Republic. It is a large dataset [28, 44] with real and labelled network traffic from IoT devices. The aim of preparing this dataset is to provide a new repository of IoT malware for the application of ML algorithms. The IoT-23 dataset consists of 20 malware captures running on IoT devices and three captures of benign IoT device traffic. Many researchers [45-47] have used the dataset in their experiments on identifying IoT attacks.
2.1.3 CIC-DDoS-2019
CIC-DDoS-2019 [30-32, 42] is a new real-world dataset consisting of various types of attacks with 50,063,112 records [48]. Of these, 56,863 records represent benign traffic and 50,006,249 records represent DDoS attacks. Each record contains 86 features to identify the respective attacks. In preparing the dataset, realistic background traffic was generated using the B-profile system [49] to mimic the abstract behaviour of human interactions in the proposed testbed. The attacks included in the CIC-DDoS-2019 dataset were carried out on two days. The first day is called the ‘test day’, and the second day is called the ‘training day’. There was no connection between the experiments on the two days. The attacks on the two days are shown in Fig. 3.

[Pie chart: LDAP (1905191), MSSQL (5763061), NetBIOS (3454578), PortMap (186960), SYN Flood (4284751), UDP-Flood (3754680), UDP-Lag (1873)]

(a) Attack types present on ‘test day’ with their number of instances in each CSV

[Pie chart: SYN Flood (1582681), UDP Flood (1470742), UDP-Lag (370607), MSSQL (1844905), SSDP (2611374), LDAP (2181542), DNS (5074413), SNMP (5161377), NetBIOS (4094986), NTP (1217007), TFTP (1048574)]

(b) Attack types present on ‘training day’ with their number of instances in each CSV

Fig. 3. Attacks present in CIC-DDoS-2019 dataset in two days

2.2 Feature Extraction and Selection (FES)

The FES [33-35] process plays an important role in ML and has a significant impact on the performance of the model. When building a predictive model, feature selection consists of minimising the number of input variables. The data attributes used to train ML models have a significant impact on the results [50], and irrelevant or only partially relevant features can degrade model performance. Feature selection and data cleansing should therefore be the first and most critical phases of model design.
Conducting feature selection before training has the following advantages:
• Reduces overfitting: With less redundant data, conclusions based on noise are less likely to be drawn.
• Improves accuracy: Modelling accuracy increases as a result of less misleading data.
• Reduces training time: The complexity of algorithms is reduced when there is less data, so algorithms train more quickly.
In ML, the purpose of feature selection is to discover the optimal set of features for building useful models of the phenomena being examined. FES strategies in ML can be grouped into the following categories:
• Supervised approaches: These techniques are used with labelled data to identify important features for supervised models, such as classification and regression.
• Unsupervised approaches: These techniques can be applied to unlabelled data.
FES in the current research is conducted using two established feature selection methods: PCA and ANOVA.

2.2.1 PCA
PCA [18, 51-53] is an unsupervised linear transformation approach generally used for feature extraction and dimensionality reduction. Its goal is to find the directions of highest variance in high-dimensional data and project the data onto a new subspace that has the same or fewer dimensions than the original one.
Fig. 4 shows the directions of highest variance in the data. PCA1 and PCA2 represent the directions of the first and second maximum variance, respectively. The direction of the greatest variance in the data helps identify an object. PCA differs from other feature selection approaches such as random forest, regularisation and forward/backward selection because it does not require class labels (hence it is called unsupervised).

Fig. 4. PCA – Directions of maximum variance
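
The projection onto the directions of maximum variance can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data in place of the flow features; the function name pca_project and the data are illustrative and are not taken from the experiments in this paper.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance

rng = np.random.default_rng(0)
# Synthetic stand-in for flow features: 200 samples, 6 correlated columns.
latent = rng.normal(size=(200, 2))
X_demo = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))

Z, components, explained_variance = pca_project(X_demo, n_components=2)
print(Z.shape)             # (200, 2)
print(explained_variance)  # variance along PCA1 and PCA2, in that order
```

No class labels enter the computation, which is why PCA counts as an unsupervised method.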

2.2.2 ANOVA
Analysis of variance (ANOVA) refers to the process of comparing two or more groups [54-55]. As the name suggests, it compares several independent groups using variance as the metric. There are two types of ANOVA: one-way and two-way. A one-way ANOVA is used when there are three or more independent groups of a variable [36-37]. Fig. 5 shows two distributions and their behaviour.

Fig. 5. Behavior of distributions

From Fig. 5 it can be inferred that if the distributions are close to each other or overlap, the overall mean and the individual group means are comparable. However, when the distributions are far apart, the overall mean and the individual means differ by a greater distance, which indicates differences between the groups. Therefore, ANOVA was used in the current study to examine variability between different groups and within the same group. The significance of the difference between groups is measured by the F-ratio, which is close to one if there is no significant difference between the groups and all variances are identical.
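
As an illustration of how the F-ratio can rank features, the following sketch computes a one-way ANOVA F-ratio per feature and keeps the k highest-scoring columns. It assumes NumPy and synthetic data; the helper names anova_f_ratio and select_top_k are hypothetical, and the sketch is not the exact selection code used in the experiments.

```python
import numpy as np

def anova_f_ratio(feature, labels):
    """One-way ANOVA F-ratio of one feature across the classes in labels."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    grand_mean = feature.mean()
    ss_between = 0.0
    ss_within = 0.0
    for c in classes:
        group = feature[labels == c]
        ss_between += group.size * (group.mean() - grand_mean) ** 2
        ss_within += ((group - group.mean()) ** 2).sum()
    df_between = classes.size - 1
    df_within = feature.size - classes.size
    # Assumes the feature is not constant within every class (ss_within > 0).
    return (ss_between / df_between) / (ss_within / df_within)

def select_top_k(X, y, k):
    """Indices of the k features with the largest F-ratio, plus all scores."""
    scores = np.array([anova_f_ratio(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k], scores

rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 100)
X_demo = rng.normal(size=(200, 3))
X_demo[:, 1] += 3.0 * y_demo  # only column 1 separates the two classes

top, scores = select_top_k(X_demo, y_demo, k=1)
print(top)     # [1]
print(scores)  # column 1 has a far larger F-ratio than the noise columns
```

Unlike PCA, this ranking uses the class labels, so it belongs to the supervised approaches described in Section 2.2.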

2.3 Classification
Manual work is being redefined in a world where virtually all manual processes have been automated. ML algorithms can make computers play chess, perform operations and become more intelligent and more personable. Many security techniques have been used to prevent cyberattacks [56] and to identify intruders through intrusion detection systems (IDS). In this study, two boosting algorithms, XGB and LGBM, and one non-boosting technique, the CDF algorithm, are used to identify intrusions in cyberattacks [20, 57, 58]. Boosting is an ensemble strategy used to improve the predictive performance of any learning algorithm. The goal of boosting is to train weak learners sequentially, with each one correcting the errors of its predecessor. Extensions aimed at computational efficiency have recently made boosting approaches fast enough for widespread use. Boosting methods have become the preferred and often the best-performing strategy in ML contests for classification and regression on tabular data.

2.3.1 Extreme Gradient Boosting (XGB)

XGB [16-18] is a decision-tree-based gradient boosting ML ensemble technique. XGB has been reported to outperform artificial neural networks (ANN), which are otherwise found to be superior to other algorithms in the prediction of unstructured data (pictures, text, etc.). The approach generates new models that predict the errors and residuals of all earlier models; these models are then combined to produce a final prediction.
DT-based algorithms are now regarded as best-in-class for small-to-medium structured/tabular data [59]. Data scientists must test all feasible algorithms on the data at hand to determine the best one. Furthermore, selecting the appropriate algorithm is not sufficient: hyperparameters must be tuned to obtain the best algorithm configuration for a dataset. There are also various additional factors to consider when selecting the winning method, such as computational complexity, explainability and implementation simplicity.
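
The idea of fitting each new model to the residuals of the earlier ones can be sketched as follows. This is a deliberately simplified illustration that assumes squared-error loss, one-dimensional input and decision stumps as weak learners; it is not the regularised, second-order objective that XGB itself optimises, and all function names are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # keep the right side non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Additive model: each stump is fitted to the current residuals."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of the squared loss
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 6.0, 200)
y_demo = np.sin(x_demo) + 0.1 * rng.normal(size=200)

base, stumps, fitted = boost(x_demo, y_demo)
mse_start = np.mean((y_demo - base) ** 2)
mse_end = np.mean((y_demo - fitted) ** 2)
print(mse_start, mse_end)  # the training error shrinks as stumps are added
```

The learning rate and the number of rounds are two of the hyperparameters that would need tuning in practice.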

2.3.2 Light gradient boosting method (Light GBM)

Light GBM, a tree-based gradient boosting system described and applied in [60-61], is a fast algorithm that builds trees vertically, whereas trees in other ML algorithms grow horizontally. Light GBM thus builds the tree leaf-by-leaf rather than level-by-level, choosing the leaf with the largest delta loss to grow. When growing the same leaf, leaf-wise algorithms can reduce more loss than level-wise algorithms. As data grow at an exponential rate, this approach makes Light GBM suitable for providing faster results in standard data science tasks. Light GBM [19-20] can handle large amounts of data while requiring less memory. It also supports GPU learning, so it is particularly well suited to research where there is a huge amount of data in each dataset. Light GBM does not work well on small datasets, which it can easily overfit. Although there is no limit on the number of rows, previous research suggests that it should only be used for data with 10,000 rows or more [79]. Light GBM uses histogram-based techniques that divide continuous feature values (attributes) into discrete bins [61-62]. This reduces memory requirements and speeds up training.
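
The effect of histogram binning on split finding can be sketched as follows. The example assumes NumPy, quantile-based bin edges and a squared-error style gain computed from summed gradients; Light GBM's actual implementation also uses second-order statistics and further optimisations, so this is an illustration of the idea only, with hypothetical function names.

```python
import numpy as np

def build_bins(values, max_bins=16):
    """Quantile-based bin edges for one continuous feature."""
    quantiles = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]
    return np.unique(np.quantile(values, quantiles))

def best_binned_split(values, gradients, max_bins=16):
    """Scan bin boundaries instead of every distinct feature value."""
    edges = build_bins(values, max_bins)
    # Bin b holds the values v with edges[b-1] < v <= edges[b].
    bin_index = np.searchsorted(edges, values, side="left")
    n_bins = edges.size + 1
    grad_sum = np.bincount(bin_index, weights=gradients, minlength=n_bins)
    count = np.bincount(bin_index, minlength=n_bins)

    total_grad, total_count = grad_sum.sum(), count.sum()
    best_gain, best_edge = 0.0, None
    left_grad, left_count = 0.0, 0
    for b in range(n_bins - 1):
        left_grad += grad_sum[b]
        left_count += count[b]
        right_grad = total_grad - left_grad
        right_count = total_count - left_count
        if left_count == 0 or right_count == 0:
            continue
        gain = (left_grad ** 2 / left_count
                + right_grad ** 2 / right_count
                - total_grad ** 2 / total_count)
        if gain > best_gain:
            best_gain, best_edge = gain, edges[b]
    return best_edge, best_gain

rng = np.random.default_rng(0)
values = rng.uniform(0.0, 1.0, size=1000)
gradients = np.where(values < 0.5, -1.0, 1.0)  # the true boundary is at 0.5

edge, gain = best_binned_split(values, gradients)
print(edge, gain)  # the chosen edge lies close to 0.5
```

With 16 bins, the scan visits at most 15 candidate thresholds per feature regardless of the number of rows, which is where the memory and time savings come from.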

2.3.3 Cascaded Deep Forest (CDF)

Cascaded Deep Forest is a cascading ensemble approach used in various applications [23-25] to improve performance. Deep Forest stacks several layers of Random Forests; it is similar in architecture to deep neural networks but more user-friendly and easier to train, as evidenced by some recent studies [63]. Deep neural networks usually fail on small datasets, whereas deep forest works well on them. Moreover, deep forest models are much easier to train because they have fewer hyperparameters than deep neural networks [64-65]. Supervised ML has several limitations [66], such as the need for labelled data, data size, overfitting and class imbalance, which has led researchers to focus on the CDF model.
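
The layered structure can be sketched as a two-layer cascade in which each layer's class probabilities are appended to the original features before the next layer is trained. The sketch assumes scikit-learn, synthetic data and class labels 0..K-1; it omits the multi-grained scanning and adaptive depth of the original deep forest proposal, and the function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

def fit_cascade(X, y, n_layers=2, seed=0):
    """Train a cascade: each layer sees X plus the previous layer's probabilities."""
    layers, augmented = [], X
    for layer in range(n_layers):
        forests = [
            RandomForestClassifier(n_estimators=30, random_state=seed + layer),
            ExtraTreesClassifier(n_estimators=30, random_state=seed + layer),
        ]
        # Out-of-fold probabilities limit label leakage into the next layer.
        probas = [cross_val_predict(f, augmented, y, cv=3, method="predict_proba")
                  for f in forests]
        for forest in forests:
            forest.fit(augmented, y)
        layers.append(forests)
        augmented = np.hstack([X] + probas)
    return layers

def predict_cascade(layers, X):
    """Pass X through every layer and average the last layer's probabilities."""
    augmented = X
    for forests in layers:
        probas = [forest.predict_proba(augmented) for forest in forests]
        augmented = np.hstack([X] + probas)
    return np.mean(probas, axis=0).argmax(axis=1)

X_all, y_all = make_classification(n_samples=400, n_features=10,
                                   n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.25,
                                          random_state=0)

cascade = fit_cascade(X_tr, y_tr)
predictions = predict_cascade(cascade, X_te)
print((predictions == y_te).mean())  # test accuracy of the two-layer cascade
```

Every layer trains several complete forests, with extra cross-validated fits, which is consistent with the long CDF execution times reported in Section 3.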

3. EXPERIMENTS AND RESULTS

The experiments carried out in the present study involve the identification of two types of attacks:
1. IoT attacks
2. DDoS attacks
These attacks are identified using the two boosting algorithms, XGB and LGBM, after FES with 5, 10, 15, ..., 35 features. The boosting algorithms are compared with CDF in terms of accuracy and time for identifying attacks.
To process the high volume of data, a high-end computer with an i7-10750 CPU @ 2.60 GHz, 16 GB RAM and the Windows 10 operating system is used. All programs are written in Python 3.7 and run in Jupyter Notebook.

3.1 Pre-processing
All attack files in the various datasets are pre-processed in the following steps.
1. Removal of attributes with object instances (e.g. pkSeqID, daddr, subcategory, etc.) (see supplementary material Tables S1–S3 for details).
2. Encoding of the labels, i.e., normal as 0 and all attacks as 1, in Experiments I and II. In Experiment III, the five attacks LDAP, MSSQL, NetBIOS, SYN-Flood and UDP-Flood are encoded as 1–5, respectively, and benign is encoded as 0 (see supplementary material Tables S4–S6 for details).
3. Removal of columns with standard deviation = 0 and correlation = 0.
4. Normalisation of the independent variables x using equation 1:
$x' = \frac{x - \mu(x)}{\sigma(x)}$ (1)
5. Verification of standard deviation and correlation.
6. Creation of the train and test sets. The train and test instances are taken from different datasets or, in the case of CIC-DDoS-2019, from data of different days (see supplementary material Tables S4–S6 for details).
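
Steps 1 to 4 can be sketched with pandas as follows. The column names pkts, bytes, constant and category, the label value Normal and the tiny data frame are illustrative assumptions; the correlation-based removal in step 3 is not reproduced because its exact criterion is not specified here.

```python
import numpy as np
import pandas as pd

def preprocess(frame, label_column, normal_label):
    """Binary label encoding, numeric-only features, zero-variance removal, z-score."""
    # Step 2: normal traffic -> 0, every attack -> 1 (binary setting).
    labels = (frame[label_column] != normal_label).astype(int)
    # Step 1: keep numeric attributes only, dropping object/string columns.
    features = frame.drop(columns=[label_column]).select_dtypes(include="number")
    # Step 3 (partly): drop columns whose standard deviation is zero.
    std = features.std(ddof=0)
    features = features.loc[:, std > 0]
    # Step 4: z-score normalisation, x' = (x - mean) / std, as in equation 1.
    features = (features - features.mean()) / features.std(ddof=0)
    return features, labels

raw = pd.DataFrame({
    "daddr": ["192.168.100.3", "192.168.100.5", "192.168.100.3", "192.168.100.7"],
    "pkts": [10, 250, 12, 300],
    "bytes": [840, 15000, 900, 21000],
    "constant": [1, 1, 1, 1],
    "category": ["Normal", "DDoS", "Normal", "DoS"],
})

features, labels = preprocess(raw, label_column="category", normal_label="Normal")
print(list(features.columns))  # ['pkts', 'bytes']
print(labels.tolist())         # [0, 1, 0, 1]
```

For Experiment III the binary encoding would be replaced by a mapping of the five attack names to 1–5 and benign to 0.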
The results obtained in the three experiments are evaluated in terms of time, accuracy and the number of features required to achieve the highest accuracy. The formula used for accuracy calculation is given in equation 2.

$\text{Accuracy} = \frac{TP + TN}{\text{Total instances}}$ (2)

where TP = true positives and TN = true negatives.
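
Equation 2 can be checked on a small example. The label vectors below are made up for illustration, with 1 denoting malicious and 0 non-malicious.

```python
def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / total instances for binary labels 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return (tp + tn) / len(y_true)

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.8  (TP = 2, TN = 2, total = 5)
```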

The experiments are described in the following subsections:

3.2 Experiment I: BoT-IoT vs IoT-23

In this experiment, the BoT-IoT dataset is used as training data with 718045 instances and the IoT-23 dataset is used as the test set with 147662 instances. The BoT-IoT instances were retrieved from All features - Files - CloudStor (aarnet.edu.au), which contains 5% of the entire dataset labelled as normal or by type of attack. All types of attacks are considered malicious data, and all normal instances are considered non-malicious data. Four traffic classes, Benign, Okiru, PartOfAHorizontalPortScan and DDoS, are taken from the IoT-23 dataset in pcap format as test samples. The pcap files were converted to csv using the CICFLOWMETER software to make the attributes compatible across both datasets. In the converted csv files, benign instances are treated as non-malicious samples, and all attack instances are treated as malicious. Hence, binary classification is performed. The results obtained in this experiment are shown in Table 1.
Table 1 shows that the highest accuracy is achieved with five features, in the shortest time, for both boosting and non-boosting algorithms. PCA feature selection works better for the boosting algorithms, while ANOVA achieves higher accuracy for the non-boosting algorithm. In terms of time, the non-boosting algorithm takes the longest. The same holds for accuracy: the non-boosting algorithm is less accurate than the boosting algorithms.
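
The cross-dataset protocol, in which a model is trained on one dataset and tested on another for several feature counts, can be sketched as follows. The sketch assumes scikit-learn, uses synthetic data with added noise as a stand-in for the dataset shift, and swaps in GradientBoostingClassifier for XGB and LGBM; fitting the scaler and PCA on the training set only is an assumption, not a documented detail of the experiments.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins: one "training dataset" and a shifted "test dataset".
X_all, y_all = make_classification(n_samples=1200, n_features=20,
                                   n_informative=8, random_state=0)
rng = np.random.default_rng(0)
X_train, y_train = X_all[:800], y_all[:800]
X_test = X_all[800:] + 0.3 * rng.normal(size=(400, 20))
y_test = y_all[800:]

def cross_dataset_run(n_features):
    """Train on one dataset, test on the other, with n_features PCA components."""
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=n_features).fit(scaler.transform(X_train))
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    start = time.perf_counter()
    model.fit(pca.transform(scaler.transform(X_train)), y_train)
    predictions = model.predict(pca.transform(scaler.transform(X_test)))
    elapsed = time.perf_counter() - start
    return accuracy_score(y_test, predictions), elapsed

results = {k: cross_dataset_run(k) for k in (5, 10, 15)}
for k, (acc, seconds) in results.items():
    print(k, round(acc, 4), round(seconds, 2))
```

Repeating the loop for each classifier and each FES method would produce one accuracy and one timing per table cell.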

Table 1: Results of boosting and non-boosting models for BoT-IoT vs IoT-23 in terms of accuracy and time

                             XGB (boosting)              LGBM (boosting)             CDF (non-boosting)
FES method  No. of features  Accuracy (%)  Time (min)    Accuracy (%)  Time (min)    Accuracy (%)  Time (min)
PCA         5                88.54         2.39          90.54         0.05          71.96         54.47
            10               83.74         2.51          83.89         0.08          66.87         55.45
            15               83.14         3.14          82.98         0.11          66.93         55.39
            20               81.51         3.27          82.71         0.14          66.87         55.58
            25               81.64         3.39          82.96         0.15          66.12         56.07
            30               82.14         3.47          83.11         0.18          66.21         56.37
            35               81.64         3.59          83.37         0.21          65.74         56.49
ANOVA       5                84.67         1.42          86.87         0.09          79.84         56.37
            10               77.44         1.55          78.74         0.10          70.77         58.59
            15               77.87         1.57          78.76         0.11          70.58         57.57
            20               77.69         2.01          78.64         0.12          71.27         57.32
            25               77.74         2.27          78.77         0.12          71.34         57.49
            30               77.78         2.55          78.88         0.13          71.38         58.36
            35               76.98         2.58          77.49         0.16          70.64         59.12

3.3 Experiment II: IoT-23 vs BoT-IoT

Mirroring Experiment I, the IoT-23 dataset with 165872 instances is used as the training data and the BoT-IoT dataset with 33174 instances is used as the test set. The instances are labelled malicious and non-malicious. The results obtained in Experiment II are listed in Table 2.

Table 2: Results of boosting and non-boosting models for IoT-23 vs BoT-IoT in terms of accuracy and time

                             XGB (boosting)              LGBM (boosting)             CDF (non-boosting)
FES method  No. of features  Accuracy (%)  Time (min)    Accuracy (%)  Time (min)    Accuracy (%)  Time (min)
PCA         5                85.25         2.39          87.79         0.05          70.37         56.28
            10               79.88         1.24          81.09         0.07          62.69         57.42
            15               79.74         1.39          80.96         0.08          62.43         58.04
            20               79.63         1.42          80.71         0.11          62.17         58.47
            25               79.47         1.51          80.53         0.12          62.09         58.52
            30               80.03         1.57          81.24         0.15          63.66         59.31
            35               79.21         2.06          80.46         0.18          62.35         59.77
ANOVA       5                81.96         1.42          83.45         0.03          79.84         56.37
            10               74.67         1.44          75.24         0.04          69.45         56.57
            15               74.43         1.52          75.18         0.05          69.38         57.12
            20               74.37         2.03          75.49         0.08          68.23         57.44
            25               74.19         2.11          75.32         0.11          68.18         58.29
            30               75.02         2.15          76.01         0.12          69.17         58.32
            35               74.18         2.23          75.67         0.15          68.32         59.01

Table 2 shows that the results obtained in Experiment II are consistent with those of Experiment I. PCA feature selection works better with the boosting algorithms, while ANOVA achieves higher accuracy with the non-boosting algorithm. In terms of time, the non-boosting algorithm takes the longest, and its accuracy is lower than that of the boosting algorithms.

3.4 Experiment III: CIC-DDoS-2019 Train vs Test

The CIC-DDoS-2019 dataset consists of different attacks carried out on two different days (see Section 2.1.3). Six attacks occur on both days: LDAP, MSSQL, NetBIOS, SYN-Flood, UDP-Flood and UDP-Lag. The percentage of records in the six attack files is shown in Table 3.
Table 3 shows that UDP-Lag accounts for only 3.21% of the records on the training day and 0.01% on the test day. Therefore, UDP-Lag is removed from the experiment.
The boosting algorithms are applied to the five common attacks present on the ‘training day’ and the ‘test day’ with 1063244 and 312706 samples, respectively. The accuracy obtained with the two boosting algorithms and CDF, as well as the execution time, are shown in Table 4.

Table 4 shows that the highest accuracy was obtained with 35 features. PCA feature selection gives higher accuracy with the boosting algorithms, while ANOVA gives higher accuracy with the non-boosting method. The accuracy obtained with the non-boosting method was at least 9% lower than that obtained with the boosting algorithms. Additionally, the execution times of the boosting algorithms were at least 46 minutes lower than those of the non-boosting method.

Table 3: Percentage of records in six common attack files

              Training day                      Test day
Attack type   No. of instances  Percentage (%)  No. of instances  Percentage (%)
LDAP          2181542           18.90           1905191           9.94
MSSQL         1844905           15.98           5763061           30.07
NetBIOS       4094986           35.47           3454578           18.03
SYN Flood     1582681           13.71           4284751           22.36
UDP-Flood     1470742           12.74           3754680           19.59
UDP-Lag       370607            3.21            1873              0.01

Table 4: Results of boosting and non-boosting models for day 2 vs day 1 in terms of accuracy and time

                             XGB (boosting)              LGBM (boosting)             CDF (non-boosting)
FES method  No. of features  Accuracy (%)  Time (min)    Accuracy (%)  Time (min)    Accuracy (%)  Time (min)
PCA         5                90.86         2.37          91.79         0.29          80.92         48.27
            10               91.67         2.54          92.45         0.36          81.77         48.44
            15               92.09         3.10          92.84         0.39          82.16         49.07
            20               92.74         3.24          93.55         0.41          82.89         49.36
            25               93.13         3.36          93.97         0.45          83.03         49.41
            30               93.78         3.49          94.33         0.47          83.14         49.45
            35               94.49         4.05          94.79         0.51          83.76         50.02
ANOVA       5                90.61         2.30          91.84         0.34          80.11         48.27
            10               91.47         2.45          92.47         0.37          81.26         48.41
            15               92.22         3.07          92.51         0.41          82.71         49.09
            20               92.82         3.34          93.13         0.45          83.84         49.32
            25               93.07         3.46          93.42         0.47          84.05         49.46
            30               93.18         3.51          93.97         0.49          84.48         50.07
            35               93.86         4.08          94.29         0.54          85.41         50.39

4. DISCUSSION AND CONCLUSIONS

Several researchers have applied different ML algorithms to identify different types of attacks [4, 67]. Attacks on the internet can take different forms depending on the attack target, severity of the attack, network type and legal aspects [68]. Of the different types of attacks, ‘IoT attacks’ and ‘DDoS attacks’ are identified in this paper. Two gradient boosting methods and one non-boosting ensemble ML method were used to identify both types of attacks. The results of the three experiments show that the gradient boosting methods perform better than the non-boosting method. The boosting methods are also faster than the non-boosting method in both binary and multiclass classification. Figs. 6 and 7 show the average accuracies and average execution times achieved in the three experiments conducted here.

[Figure: average accuracies (%), axis from 0 to 100, for CDF, XGB and LGBM with 5, 10, ..., 35
features selected by PCA and by ANOVA]

Fig. 6. Average accuracies obtained by different algorithms

[Figure: average execution time (in minutes), axis from 0 to 60, for LGBM, XGB and CDF with
5, 10, ..., 35 features selected by PCA and by ANOVA]

Fig. 7. Average execution time taken by different algorithms

Fig. 6 shows that the average accuracy of the boosting algorithms is between 84% and 90%, while
it is less than 80% for the non-boosting algorithm. However, the average accuracy of both
methods is higher for five features, which is evident from the peaks at the beginning and middle
of the graph. Comparing the average accuracy of the two boosting methods, the values obtained
by LGBM were higher than those of XGB.
Fig. 7 shows that the average execution time of the two boosting algorithms is significantly
lower (by about 50 minutes) than that of the non-boosting algorithm. The maximum execution
time of CDF is about 53–56 minutes for 5, 10, 15, ..., 35 features. The average
execution time of LGBM is between 0.13 and 0.30 minutes for the different numbers of features
selected. The average execution time of XGB lies between those of CDF and LGBM.
As in the current work, various ML algorithms have been used to identify attacks; these were
summarised by Haji and Ameen in 2021 [69]. ML has also been applied in a collaborative and
decentralised manner called federated learning, as described in [70]. IoT botnet attacks were
identified in [71–72] using ML and rule-based fuzzy learning. Shafiq et al. (2020) proposed a
wrapper-based feature selection method for malicious IoT traffic [73].
Traditional ML was applied to the IoT-23 dataset in [74–75]. In [76], 1D, 2D, and 3D
deep learning methods were applied to the BoT-IoT and IoT-23 datasets. A hybrid approach,
CyDDoS [77], was proposed for intrusion detection systems; it combines an ensemble of
feature engineering with DL. CyDDoS was applied to the CIC-DDoS-2019 dataset
and its performance was tested in CPU and GPU environments.
XGB [78] was used as a feature selection tool by Poornima et al. (2022) along with LSTM.
Of the two boosting methods used in the present work, LGBM has been found to be an efficient and
suitable method for intrusion detection [79–83]. However, none of these studies used LGBM
for cross-dataset intrusion detection. A cross-dataset attack identification study was
presented in [84] for the IoTID20 and BoT-IoT datasets, but it used only the basic ML algorithms
DT, NB, kNN, logistic regression (LR) and RF for binary classification. It included no time
comparison analysis, which is important in the present study, as boosting techniques
are 240 times faster than the non-boosting technique (from Fig. 7).
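
A time comparison of this kind reduces to a ratio of execution times. The sketch below computes that ratio for the PCA rows of Table 4 only (one experiment), so the resulting factors need not equal the averaged value read from Fig. 7; the names `pca_times` and `speedup` are illustrative.

```python
# (number of features, XGB, LGBM, CDF) execution times in minutes,
# copied from the PCA rows of Table 4.
pca_times = [
    (5, 2.37, 0.29, 48.27),
    (10, 2.54, 0.36, 48.44),
    (15, 3.10, 0.39, 49.07),
    (20, 3.24, 0.41, 49.36),
    (25, 3.36, 0.45, 49.41),
    (30, 3.49, 0.47, 49.45),
    (35, 4.05, 0.51, 50.02),
]


def speedup(slow_minutes, fast_minutes):
    """How many times faster the second method is than the first."""
    return slow_minutes / fast_minutes


# Hypothetical summary: speed-up of each boosting method over CDF.
speedups = {
    k: {"XGB": speedup(cdf, xgb), "LGBM": speedup(cdf, lgbm)}
    for k, xgb, lgbm, cdf in pca_times
}

for k, ratios in speedups.items():
    print(f"{k:2d} features: XGB x{ratios['XGB']:.1f}, LGBM x{ratios['LGBM']:.1f}")
```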
As an advance in research on intrusion detection, ML was used in [85], which confirms the
validity of the current research in the present scenario. Like every research study, the current
work has limitations: it has not been validated on real data because the required infrastructure
is not available. Designing and testing adversarial cases could be a future extension of the
current research.
The above discussion leads to the conclusion that identifying IoT attacks with boosting
methods is an important line of research. To this end, two types of attacks
were identified: IoT attacks and DDoS attacks, as binary-class and multiclass outputs, respectively.
Three trending datasets were used to identify these attacks, namely BoT-IoT, IoT-23 and
CIC-DDoS-2019. Both boosting and non-boosting approaches were used to identify the attacks, and
the boosting approach was found to be suitable. Of the two boosting methods, LGBM is the most
efficient, with an accuracy of 94.79% in 0.51 minutes (Table 4), along with PCA
as the FES method for DDoS attacks.

3. REFERENCES

[1] A. Arampatzis, L. O'Hagan, Cybersecurity and Privacy in the Age of the Pandemic, In
Handbook of Research on Cyberchondria, Health Literacy, and the Role of Media in So-
ciety’s Perception of Medical Information (2022) :pp. 35-53, IGI Global.
[2] A. E. Omolara, A. Alabdulatif, O. I. Abiodun, M. Alawida, A. Alabdulatif, and H. Ar-
shad, The internet of things security: A survey encompassing unexplored areas and new
insights, Computers & Security 112 (2022): 102494.

[3]https://www.cisco.com/c/en/us/solutions/collateral/executive perspectives/annual-internet-
report/white-paper-c11-741490.html
[4] Y. Miao, C. Chen, L. Pan, Q.-L. Han, J. Zhang, and Y. Xiang, Machine Learning–based
Cyber Attacks Targeting on Controlled Information: A Survey, ACM Computing Sur-
veys (CSUR) 54, no. 7 (2021): 1-36.
[5] C. Iwendi, et al. "Sustainable security for the internet of things using artificial intelli-
gence architectures." ACM Transactions on Internet Technology (TOIT) 21.3 (2021): 1-
22.
[6] S. Dilek, H. Çakır, and M. Aydın, Applications of artificial intelligence techniques to
combating cyber crimes: A review, arXiv preprint arXiv:1502.03552 (2015).
[7] A. Delplace, S. Hermoso, and K. Anandita, Cyber Attack Detection thanks to
Machine Learning Algorithms, arXiv preprint arXiv:2001.06309 (2020).
[8] A. A. AlZubi, M. Al-Maitah, and A. Alarifi, Cyber-attack detection in healthcare using
cyber-physical system and machine learning techniques, Soft Computing 25, no. 18
(2021): 12319-12332.
[9] A. Handa, A. Sharma, and S. K. Shukla, Machine learning in cybersecurity: A review,
Data Mining and Knowledge Discovery 9, no. 4 (2019): e1306.
[10] M. Shafiq, Z. Tian, Y. Sun, X. Du, and M. Guizani, Selection of effective machine learning
algorithm and Bot-IoT attacks traffic identification for internet of things in smart city,
Future Generation Computer Systems 107 (2020): 433-442.
[11] A. E. Cil, K. Yildiz, and A. Buldu, Detection of DDoS attacks with feed forward based
deep neural network model, Expert Systems with Applications 169 (2021): 114520.
[12] S. Pokhrel, R. Abbas, and B. Aryal, IoT Security: Botnet detection in IoT using Machine
learning, arXiv preprint arXiv:2104.02231 (2021).
[13] M. Z. Hasan, K. Z. Hasan, and A. Sattar, Burst header packet flood detection in optical
burst switching network using deep learning model, Procedia computer science 143
(2018): 970-977.
[14] R. Priyadarshini and R. K. Barik, A deep learning based intelligent framework
to mitigate DDoS attack in fog environment, Journal of King Saud University-Computer
and Information Sciences (2019).
[15]N. Islam et al., "Towards machine learning based intrusion detection in IoT networks," vol.
69, pp. 1801-1821, 2021.
[16] E. Papageorgiou, A Predictive Model for Customer Satisfaction, (2021).
[17] M. H. L. Louk, and B. A. Tama, Exploring Ensemble-Based Class Imbalance Learners for
Intrusion Detection in Industrial Control Networks, Big Data and Cognitive Computing 5,
no. 4 (2021): 72.
[18] S. Das, S. Bose, G. K. Nayak, S. C. Satapathy, and S. Saxena, Brain tumor segmentation
and overall survival period prediction in glioblastoma multiforme using radiomic features,
Concurrency and Computation: Practice and Experience (2021): e6501.
[19] F. Alzamzami, M. Hoda, and A. Saddik, Light gradient boosting machine for general sen-
timent classification on short texts: a comparative evaluation, IEEE Access 8 (2020):
101840-101858.
[20] M. Massaoudi, H. Abu-Rub, S. S. Refaat, I. Chihi, and F. S. Oueslati, An effective ensem-
ble learning approach-based grid stability assessment and classification, in 2021 IEEE
Kansas Power and Energy Conference (KPEC), 2021, pp. 1-6: IEEE.
[21] N. Koroniotis, N. Moustafa, E. Sitnikova, and B. Turnbull, Towards the development of
realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot
dataset, Future Generation Computer Systems 100 (2019): 779-796

[22] C. Bentéjac, A. Csörgő, and G. Martínez-Muñoz, A comparative analysis of gradient
boosting algorithms, Artificial Intelligence Review 54, no. 3 (2021): 1937-1967.
[23] S. Ray, Disease classification within dermascopic images using features extracted by res-
net50 and classification through deep forest, arXiv preprint arXiv:1807.05711 (2018).
[24] Y.-L. Zhang, J. Zhou, W. Zheng, J. Feng, L. Li, Z. Liu, M. Li, Z. Zhang, C. Chen, X. Li,
and Y. Qi, Distributed deep forest and its application to automatic detection of cash-out
fraud, ACM Transactions on Intelligent Systems and Technology (TIST) 10, no. 5
(2019): 1-19.
[25] W. Zhang and M. Wang, An improved deep forest model for prediction of e-commerce
consumers’ repurchase behavior, Plos one 16, no. 9 (2021): e0255906.
[26] Z.-H. Zhou and J. Feng, Deep forest, National Science Review 6, no. 1 (2019): 74-86.
[27] J. M. Peterson, J. L. Leevy, and T. M. Khoshgoftaar, A review and analysis of the bot-iot
dataset, in 2021 IEEE International Conference on Service-Oriented System Engineering
(SOSE), 2021, pp. 20-27: IEEE.
[28] J. S. Bains, H. V. Kopanati, R. Goyal, B. K. Savaram, and S. Butakov, Using Machine
Learning for malware traffic prediction in IoT networks, in 2021 Second International
Conference on Intelligent Data Science Technologies and Applications (IDSTA), 2021,
pp. 146-149: IEEE.
[29] S. Garcia, A. Parmisano, & M. Jose Erquiaga. (2020). IoT-23: A labeled dataset with ma-
licious and benign IoT network traffic (Version 1.0.0) [Data set]. Zenodo.
http://doi.org/10.5281/zenodo.4743746
[30] H. Kousar, M. M. Mulla, P. Shettar, and D. Narayan, Detection of DDoS Attacks in Soft-
ware Defined Network using Decision Tree, in 2021 10th IEEE International Conference
on Communication Systems and Network Technologies (CSNT), 2021, pp. 783-788:
IEEE.
[31] D.-C. Can, H.-Q. Le, and Q.-T. Ha, Detection of distributed denial of service attacks using
automatic feature selection with enhancement for imbalance dataset, in Asian Conference
on Intelligent Information and Database Systems, 2021, pp. 386-398: Springer.
[32] I. Sharafaldin, A. H. Lashkari, S. Hakak, and A. A. Ghorbani, Developing realistic distrib-
uted denial of service (DDoS) attack dataset and taxonomy, in 2019 International Carna-
han Conference on Security Technology (ICCST), 2019, pp. 1-8: IEEE.
[33] J. Shlens, A tutorial on principal component analysis, arXiv preprint arXiv:1404.1100
(2014).
[34] G. Chandrashekar and F. Sahin, A survey on feature selection meth-
ods, Computers & Electrical Engineering 40, no. 1 (2014): 16-28.
[35] B. Xue, M. Zhang, W. N. Browne, and X. Yao, A survey on evolutionary computation
approaches to feature selection, IEEE Transactions on Evolutionary Computation 20, no.
4 (2015): 606-626.
[36] U. Moorthy and U. D. Gandhi, A novel optimal feature selection technique
for medical data classification using ANOVA based whale optimization, Journal of Am-
bient Intelligence and Humanized Computing 12, no. 3 (2021): 3527-3538.
[37] J. Ramírez, J. M. Górriz, A. Ortiz, F. J. Martínez-Murcia, F. Segovia, D. Salas-Gonzalez,
D. Castillo-Barnes, I. A. Illán, C. G. Puntonet, and Alzheimer's Disease Neuroimaging
Initiative, Ensemble of random forests One vs. Rest classifiers for MCI and AD predic-
tion using ANOVA cortical and subcortical feature selection and partial least squares,
Journal of neuroscience methods 302 (2018): 47-57.

[38] M. K. A. Abuthawabeh and K. W. Mahmoud, Android malware detection and categoriza-
tion based on conversation-level network traffic features, in 2019 International Arab Con-
ference on Information Technology (ACIT), 2019, pp. 42-47: IEEE.
[39]W.-H. Lin, P. Wang, B.-H. Wu, M.-S. Jhou, K.-M. Chao, and C.-C. Lo, "Behaviorial-based
network flow analyses for anomaly detection in sequential data using temporal convolu-
tional networks," in International Conference on e-Business Engineering, 2019, pp. 173-
183: Springer.
[40] P. Kumar, G. P. Gupta, and R. Tripathi, Toward design of an intelligent cyber
attack detection system using hybrid feature reduced approach for iot networks, Arabian
Journal for Science and Engineering 46, no. 4 (2021): 3749-3778.
[41] M. Shafiq, Z. Tian, A. K. Bashir, X. Du, and M. Guizani, CorrAUC: a malicious bot-IoT
traffic detection method in IoT network using machine-learning techniques, IEEE Inter-
net of Things Journal 8, no. 5 (2020): 3242-3254.
[42] S. Ullah, Z. Tian, A. K. Bashir, X. Du, & M. Guizani, HDL-IDS: a hybrid deep learning
architecture for intrusion detection in the Internet of Vehicles, IEEE Internet of Things
Journal 8, no. 5 (2020): 3242-3254.
[43] Koroniotis, N., N. Moustafa, E. Sitnikova, & B. Turnbull, Towards the development of
realistic botnet dataset in the internet of things for network forensic analytics: Bot-iot
dataset, Future Generation Computer Systems 100 (2019): 779-796.
[44] S. Garcia, A. Parmisano, & M. J. Erquiaga, IoT-23: A labeled dataset with malicious and
benign IoT network traffic, Stratosphere Lab., Praha, Czech Republic, Tech. Rep (2020),
http://doi.org/10.5281/zenodo.4743746
[45] D. Nanthiya, P. Keerthika, S. Gopal, S. Kayalvizhi, T. Raja, and R. S. Priya, SVM Based
DDoS Attack Detection in IoT Using Iot-23 Botnet Dataset, in 2021 Innovations in Power
and Advanced Computing Technologies (i-PACT), 2021, pp. 1-7: IEEE.
[46] R. Maciel, J. Araujo, C. Melo, P. Pereira, J. Dantas, & P. Maciel, Impact evaluation of
DDoS and Malware attack using IoT devices, Distributed Denial of Service Attacks: Con-
cepts, Mathematical and Cryptographic Solutions 6 (2021): 1.
[47] P. Zyblewski, M. Pawlicki, R. Kozik, and M. Choraś, Cyber-Attack Detection from IoT
Benchmark Considered as Data Streams, in Progress in Image Processing, Pattern Recog-
nition and Communication Systems: Springer, 2021, pp. 230-239.
[48] M. A. Ferrag, L. Shu, H. Djallel, and K.-K. R. Choo, Deep learning-based intrusion detec-
tion for distributed denial of service attack in Agriculture 4.0, Electronics 10, no. 11
(2021): 1257.
[49] A. Gharib, I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, An evaluation framework
for intrusion detection dataset, in 2016 International Conference on Information Science
and Security (ICISS), 2016, pp. 1-6: IEEE.
[50] W. Liu and J. Wang, A brief survey on nature-inspired metaheuristics for feature selection
in classification in this decade, in 2019 IEEE 16th International Conference on Network-
ing, Sensing and Control (ICNSC), 2019, pp. 424-429: IEEE.
[51] F. Song, Z. Guo, and D. Mei, Feature selection using principal component analysis, in
2010 international conference on system science, engineering design and manufacturing
informatization, 2010, vol. 1, pp. 27-30: IEEE.
[52] A. K. Gárate-Escamila, A. H. El Hassani, and E. Andrès, Classification models for heart
disease prediction using feature selection and PCA, Informatics in Medicine Unlocked 19
(2020): 100330.
[53] X. Fan, H. Feng, and M. Yuan, PCA based on mutual information for fea-
ture selection, Control and Decision 28, no. 6 (2013): 915-919.

[54] N. O. F. Elssied, O. Ibrahim, and A. H. Osman, A novel feature
selection based on one-way anova f-test for e-mail spam classification, Engineering and
Technology 7, no. 3 (2014): 625-638.
[55] B. Vrigazova and I. Ivanov, Optimization of the ANOVA procedure for sup-
port vector machines, International Journal of Recent Technology and Engineering 8, no.
4 (2019): 5160-5165.
[56] R. Chourasiya, V. Patel, and A. Shrivastava, Classification of cyber attack using machine
learning technique at microsoft azure cloud, Int. Res. J. Eng. Appl. Sci (2018).
[57] M. Rai and H. L. Mandoria, Network Intrusion Detection: A comparative study using state-
of-the-art machine learning methods, in 2019 International Conference on Issues and
Challenges in Intelligent Computing Techniques (ICICT), 2019, vol. 1, pp. 1-5: IEEE.
[58] M. AlJame, A. Imtiaz, I. Ahmad, and A. Mohammed, Deep forest model for diag-
nosing COVID-19 from routine blood tests, Scientific Reports 11, no. 1 (2021): 1-12.
[59] K. J. Lone, L. Hussain, S. Saeed, A. Aslam, A. Maqbool, and F. Mehmood Butt, Detecting
basic human activities and postural transition using robust machine learning techniques
by applying dimensionality reduction methods, Waves in Random and Complex Me-
dia (2021): 1-26.
[60] L. Tutica, K. Vineel, S. Mishra, M. K. Mishra, and S. Suman, Invoice deduction classifi-
cation using LGBM prediction model, in Advances in Electronics, Communication and
Computing: Springer, 2021, pp. 127-137.
[61] M. Marvi, A. Arfeen, and R. Uddin, A generalized machine learning‐based model for the
detection of DDoS attacks, International Journal of Network Management 31, no. 6
(2021): e2152.
[62] P. Li, Q. Wu, and C. Burges, Mcrank: Learning to rank using multiple classification and
gradient boosting, Advances in neural information processing systems 20 (2007).
[63] L. Zhong, Q. Meng, Y. Chen, L. Du, and P. Wu, A laminar augmented cascading
flexible neural forest model for classification of cancer subtypes based on gene expres-
sion data, BMC bioinformatics 22, no. 1 (2021): 1-17.
[64] L. N. Paolucci, J. H. Schoereder, P. M. Brando, and A. N. Andersen, Fire-induced forest
transition to derived savannas: cascading effects on ant communities, Biological Conser-
vation 214 (2017): 295-302.
[65] A. Ghodrati, A. Diba, M. Pedersoli, T. Tuytelaars, and L. Van Gool, Deepproposal: Hunt-
ing objects by cascading deep convolutional layers, in Proceedings of the IEEE interna-
tional conference on computer vision, 2015, pp. 2578-2586.
[66] A. El-Nabawy, N. A. Belal, and N. El-Bendary, A Cascade Deep Forest Model for Breast
Cancer Subtype Classification Using Multi-Omics Data, Mathematics 9, no. 13 (2021):
1574.
[67] H. Alqahtani, I. H. Sarker, A. Kalim, M. Hossain, S. Md, S. Ikhlaq, and S. Hossain, Cyber
intrusion detection using machine learning classification techniques, in International
Conference on Computing Science, Communication and Security, 2020, pp. 121-131:
Springer.
[68] M. Uma and G. Padmavathi, A Survey on Various Cyber Attacks and their Classification,
Int. J. Netw. Secur. 15, no. 5 (2013): 390-396.
[69]S. H. Haji and S. Y. Ameen, Attack and anomaly detection in iot networks using machine
learning techniques: A review, Asian journal of research in computer science 9, no. 2
(2021): 30-46.

[70] E. M. Campos, P. F. Saura, A. González-Vidal, J. L. Hernández-Ramos, J. B. Bernabe, G.
Baldini, and A. Skarmeta, Evaluating Federated Learning for intrusion detection in Inter-
net of Things: Review and challenges, Computer Networks (2021): 108661.
[71]Z. Alothman, M. Alkasassbeh, and S. Al-Haj Baddar, An efficient approach to detect IoT
botnet attacks using machine learning, Journal of High Speed Networks 26, no. 3 (2020):
241-254.
[72] M. Al-Kasassbeh, M. Almseidin, K. Alrfou, and S. Kovacs, Detection of IoT-botnet at-
tacks using fuzzy rule interpolation, Journal of Intelligent & Fuzzy Systems 39, no. 1
(2020): 421-431.
[73]M. Shafiq, Z. Tian, A. K. Bashir, X. Du, and M. Guizani, IoT malicious traffic identifica-
tion using wrapper-based feature selection mechanisms, Computers & Security 94
(2020): 101863.
[74]J. Alsamiri and K. Alsubhi, Internet of things cyber attacks detection using machine learn-
ing, Int. J. Adv. Comput. Sci. Appl 10, no. 12 (2019): 627-634.
[75]S. Strecker, R. Dave, N. Siddiqui, and N. Seliya, A Modern Analysis of Aging Machine
Learning Based IoT Cybersecurity Methods, arXiv preprint arXiv:2110.07832 (2021).
[76]I. Ullah and Q. H. Mahmoud, Design and development of a deep learning-based model for
anomaly detection in IoT networks, IEEE Access 9 (2021): 103906-103926.
[77]I. Ortet Lopes, D. Zou, F. A. Ruambo, S. Akbar, and B. Yuan, Towards Effective Detection
of Recent DDoS Attacks: A Deep Learning Approach, Security and Communication Net-
works 2021 (2021).
[78]R. Poornima, M. Elangovan, G. Nagarajan, Network attack classification using LSTM with
XGBoost feature selection, Journal of Intelligent & Fuzzy Systems Preprint: 1-14.
[79]Y. Meidan, V. Sachidananda, H. Peng, R. Sagron, Y. Elovici, and A. Shabtai, A novel
approach for detecting vulnerable IoT devices connected behind a home NAT, Comput-
ers & Security 97 (2020): 101968.
[80]R. N. Chowdhury, M. M. Chowdhury, S. Chowdhury, M. R. Islam, M. A. Ayub, A. Chow-
dhury, and K. A. Kalpoma, Parameter Optimization and Performance Analysis of State-
of-the-Art Machine Learning Techniques for Intrusion Detection System (IDS), in 2020
23rd International Conference on Computer and Information Technology (ICCIT), 2020,
pp. 1-6: IEEE.
[81]S. Islam, M. A. Rouf, A. Shahariar Parvez, and P. Podder, "Machine Learning-Driven Al-
gorithms for Network Anomaly Detection," in Inventive Computation and Information
Technologies: Springer, 2022, pp. 493-507.
[82]S. Seth, G. Singh, and K. Kaur Chahal, A novel time efficient learning-based approach for
smart intrusion detection system, Journal of Big Data 8, no. 1 (2021): 1-28.
[83]H. Yao, P. Gao, P. Zhang, J. Wang, C. Jiang, and L. Lu, Hybrid intrusion detection system
for edge-based IIoT relying on machine-learning-aided detection, IEEE Network 33, no.
5 (2019): 75-81.
[84]A. Farah, Cross Dataset Evaluation for IoT Network Intrusion Detection, PhD diss., The
University of Wisconsin-Milwaukee, 2020.
[85]O. Ibitoye, O. Shafiq, and A. Matrawy, Analyzing adversarial attacks against deep learning
for intrusion detection in IoT networks, in 2019 IEEE global communications conference
(GLOBECOM), 2019, pp. 1-6: IEEE.

Supplementary Material
Correlation between attributes in different datasets pre-processing step 5:

Fig. S1. Correlation in attributes of BoT IoT dataset

Fig. S2. Correlation in attributes of IoT23 dataset

Fig. S3. Correlation in train attributes of CIC-DDoS-2019 dataset

Fig. S4. Correlation in test attributes of CIC-DDoS-2019 dataset

Supplementary Material Tables

S1. Attributes present/selected/deleted in different datasets in pre-processing step 1
Table S1 : Bot IoT
S.No. Attributes present in dataset Attributes selected after pre-processing Attributes deleted
1. stime stime
2. flgs_number flgs_number
3. proto_number proto_number
4. pkts pkts
5. bytes bytes
6. state_number state_number
7. ltime ltime
8. seq seq
9. dur dur
10. mean mean
11. stddev stddev
12. sum sum
13. min min
14. max max
15. spkts spkts
16. dpkts dpkts
17. sbytes sbytes
18. dbytes dbytes
19. rate rate
20. srate srate
21. drate drate
22. TnBPSrcIP TnBPSrcIP
23. TnBPDstIP TnBPDstIP
24. TnP_PSrcIP TnP_PSrcIP
25. TnP_PDstIP TnP_PDstIP
26. TnP_PerProto TnP_PerProto
27. TnP_Per_Dport TnP_Per_Dport
28. AR_P_Proto_P_SrcIP AR_P_Proto_P_SrcIP
29. AR_P_Proto_P_DstIP AR_P_Proto_P_DstIP
30. N_IN_Conn_P_DstIP N_IN_Conn_P_DstIP
31. N_IN_Conn_P_SrcIP N_IN_Conn_P_SrcIP
32. AR_P_Proto_P_Sport AR_P_Proto_P_Sport
33. AR_P_Proto_P_Dport AR_P_Proto_P_Dport
34. Pkts_P_State_P_Protocol_P_DestIP Pkts_P_State_P_Protocol_P_DestIP
35. Pkts_P_State_P_Protocol_P_SrcIP Pkts_P_State_P_Protocol_P_SrcIP
36. attack attack
37. pkSeqID pkSeqID
38. proto proto
39. saddr saddr
40. sport sport
41. daddr daddr
42. dport dport
43. state state
44. subcategory subcategory
45. category category
46. flgs flgs

Table S2 : IoT-23

S.No. Attributes present in dataset Attributes selected after pre-processing Attributes deleted
1. Flow duration Flow Duration
2. total Fwd Packet Total Fwd Packet
3. total Bwd packets Total Bwd packets
4. total Length of Fwd Total Length of Fwd Packet
Packet
5. total Length of Bwd Total Length of Bwd Packet
Packet
6. Fwd Packet Length Fwd Packet Length Max
Min
7. Fwd Packet Length Fwd Packet Length Min
Max
8. Fwd Packet Length Fwd Packet Length Mean
Mean
9. Fwd Packet Length Std Fwd Packet Length Std
10. Bwd Packet Length Bwd Packet Length Max
Min
11. Bwd Packet Length Bwd Packet Length Min
Max
12. Bwd Packet Length Bwd Packet Length Mean
Mean
13. Bwd Packet Length Std Bwd Packet Length Std
14. Flow Bytes/s Flow IAT Mean
15. Flow Packets/s Flow IAT Std
16. Flow IAT Mean Flow IAT Max
17. Flow IAT Std Flow IAT Min
18. Flow IAT Max Fwd IAT Total
19. Flow IAT Min Fwd IAT Mean
20. Fwd IAT Min Fwd IAT Std
21. Fwd IAT Max Fwd IAT Max
22. Fwd IAT Mean Fwd IAT Min
23. Fwd IAT Std Bwd IAT Total
24. Fwd IAT Total Bwd IAT Mean
25. Bwd IAT Min Bwd IAT Std
26. Bwd IAT Max Bwd IAT Max
27. Bwd IAT Mean Bwd IAT Min
28. Bwd IAT Std Fwd Header Length
29. Bwd IAT Total Bwd Header Length
30. Fwd PSH flags Fwd Packets/s
31. Bwd PSH Flags Bwd Packets/s
32. Fwd URG Flags Packet Length Min
33. Bwd URG Flags Packet Length Max
34. Fwd Header Length Packet Length Mean
35. Bwd Header Length Packet Length Std
36. FWD Packets/s Packet Length Variance
37. Bwd Packets/s SYN Flag Count
38. Packet Length Min RST Flag Count

39. Packet Length Max ACK Flag Count
40. Packet Length Mean URG Flag Count
41. Packet Length Std Average Packet Size
42. Packet Length Vari- Fwd Segment Size Avg
ance
43. FIN Flag Count Bwd Segment Size Avg
44. SYN Flag Count Subflow Fwd Packets
45. RST Flag Count Subflow Fwd Bytes
46. PSH Flag Count FWD Init Win Bytes
47. ACK Flag Count Bwd Init Win Bytes
48. URG Flag Count Fwd Seg Size Min
49. CWR Flag Count Active Mean
50. ECE Flag Count Active Std
51. down/Up Ratio Active Max
52. Average Packet Size Active Min
53. Fwd Segment Size Avg Idle Mean
54. Bwd Segment Size Idle Std
Avg
55. Fwd Bytes/Bulk Avg Idle Max
56. Fwd Packet/Bulk Avg Idle Min
57. Fwd Bulk Rate Avg Label
58. Bwd Bytes/Bulk Avg Bwd Bytes/Bulk
Avg
59. Bwd Packet/Bulk Avg Bwd
Packet/Bulk Avg
60. Bwd Bulk Rate Avg Bwd Bulk Rate
Avg
61. Subflow Fwd Packets Subflow Fwd
Packets
62. Subflow Fwd Bytes Subflow Fwd
Bytes
63. Subflow Bwd Packets Subflow Bwd
Packets
64. Subflow Bwd Bytes Subflow Bwd
Bytes
65. Fwd Init Win bytes Fwd Init Win
bytes
66. Bwd Init Win bytes Bwd Init Win
bytes
67. Fwd Act Data Pkts Fwd Act Data
Pkts
68. Fwd Seg Size Min Fwd Seg Size
Min
69. Active Min Active Min
70. Active Mean Active Mean
71. Active Max Active Max
72. Active Std Active Std

73. Idle Min Idle Min
74. Idle Mean Idle Mean
75. Idle Max Idle Max
76. Idle Std Idle Std
77. Flow duration Flow duration
78. total Fwd Packet total Fwd Packet
79. total Bwd packets total Bwd pack-
ets
80. total Length of Fwd total Length of
Packet Fwd Packet

Table S3 : CIC-DDoS-2019
S.No. Attributes present in dataset Attributes selected after pre-processing Attributes deleted
1. Flow ID Flow Duration Flow Packets/s
2. Source IP Total Fwd Packets Flow Bytes/s
3. Source Port Total Backward Pack- Flow ID
ets
4. Destination IP Total Length of Fwd Source IP
Packets
5. Destination Port Total Length of Bwd Destination Port
Packets
6. Protocol Fwd Packet Length Source Port
Max
7. Timestamp Fwd Packet Length Destination IP
Min
8. Flow Duration Fwd Packet Length Timestamp
Mean
9. Total Fwd Packets Fwd Packet Length Protocol
Std
10. Total Backward Bwd Packet Length Bwd PSH Flags
Packets Max
11. Total Length of Bwd Packet Length Fwd URG Flags
Fwd Packets Min
12. Total Length of Bwd Packet Length Bwd URG Flags
Bwd Packets Mean
13. Fwd Packet Length Bwd Packet Length FIN Flag Count
Max Std
14. Fwd Packet Length Flow IAT Mean PSH Flag Count
Min
15. Fwd Packet Length Flow IAT Std ECE Flag Count
Mean
16. Fwd Packet Length Flow IAT Max Fwd Avg
Std Bytes/Bulk
17. Bwd Packet Length Flow IAT Min Fwd Avg Pack-
Max ets/Bulk

18. Bwd Packet Length Fwd IAT Total Fwd Avg Bulk Rate
Min
19. Bwd Packet Length Fwd IAT Mean Bwd Avg
Mean Bytes/Bulk
20. Bwd Packet Length Fwd IAT Std Bwd Avg Pack-
Std ets/Bulk
21. Flow Bytes/s Fwd IAT Max Bwd Avg Bulk Rate
22. Flow Packets/s Fwd IAT Min SimillarHTTP
23. Flow IAT Mean Bwd IAT Total
24. Flow IAT Std Bwd IAT Mean
25. Flow IAT Max Bwd IAT Std
26. Flow IAT Min Bwd IAT Max
27. Fwd IAT Total Bwd IAT Min
28. Fwd IAT Mean Fwd PSH Flags
29. Fwd IAT Std Fwd Header Length
30. Fwd IAT Max Bwd Header Length
31. Fwd IAT Min Fwd Packets/s
32. Bwd IAT Total Bwd Packets/s
33. Bwd IAT Mean Min Packet Length
34. Bwd IAT Std Max Packet Length
35. Bwd IAT Max Packet Length Mean
36. Bwd IAT Min Packet Length Std
37. Fwd PSH Flags Packet Length Vari-
ance
38. Bwd PSH Flags SYN Flag Count
39. Fwd URG Flags RST Flag Count
40. Bwd URG Flags ACK Flag Count
41. Fwd Header Length URG Flag Count
42. Bwd Header CWE Flag Count
Length
43. Fwd Packets/s Down/Up Ratio
44. Bwd Packets/s Average Packet Size
45. Min Packet Length Avg Fwd Segment
Size
46. Max Packet Length Avg Bwd Segment
Size
47. Packet Length Fwd Header Length.1
Mean
48. Packet Length Std Subflow Fwd Packets
49. Packet Length Var- Subflow Fwd Bytes
iance
50. FIN Flag Count Subflow Bwd Packets
51. SYN Flag Count Subflow Bwd Bytes
52. RST Flag Count Init_Win_bytes_for-
ward
53. PSH Flag Count
Init_Win_bytes_backward
54. ACK Flag Count act_data_pkt_fwd

55. URG Flag Count min_seg_size_for-
ward
56. CWE Flag Count Active Mean
57. ECE Flag Count Active Std
58. Down/Up Ratio Active Max
59. Average Packet Active Min
Size
60. Avg Fwd Segment Idle Mean
Size
61. Avg Bwd Segment Idle Std
Size
62. Fwd Header Idle Max
Length.1
63. Fwd Avg Idle Min
Bytes/Bulk
64. Fwd Avg Pack- Inbound
ets/Bulk
65. Fwd Avg Bulk Rate Label
66. Bwd Avg
Bytes/Bulk
67. Bwd Avg Pack-
ets/Bulk
68. Bwd Avg Bulk
Rate
69. Subflow Fwd Pack-
ets
70. Subflow Fwd Bytes
71. Subflow Bwd Pack-
ets
72. Subflow Bwd Bytes
73.
Init_Win_bytes_forward
74.
Init_Win_bytes_back-
ward
75. act_data_pkt_fwd
76. min_seg_size_for-
ward
77. Active Mean
78. Active Std
79. Active Max
80. Active Min
81. Idle Mean
82. Idle Std
83. Idle Max
84. Idle Min

85. SimillarHTTP
86. Inbound
87. Label
S2. Pre-processing Step 6: Division into train and test sets

Table S4 Experiment I : BoT IoT vs IoT 23(Train test ratio 80:20)

                 Instances before resampling  Instances after resampling  Method of resampling
Train (BoT IoT)  668045 – attack(1),          668045 – attack(1),         ‘Resample()’ function from Python
                 1477 – Normal(0)             50000 – Normal(0)
Test (IoT23)     116281 – attack(1),          116281 – attack(1),         ‘Resample()’ function from Python
                 49591 – Normal(0)            31381 – Normal(0)

Table S5 Experiment II : IoT 23 vs BoT IoT(Train test ratio 80:20)

                 Instances before resampling  Instances after resampling  Method of resampling
Train (IoT23)    668045 – attack(1),          116281 – attack(1),         ‘Resample()’ function from Python
                 1477 – Normal(0)             49591 – Normal(0)
Test (BoT IoT)   116281 – attack(1),          30953 – attack(1),          ‘Resample()’ function from Python
                 49591 – Normal(0)            2221 – Normal(0)
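
Tables S4 and S5 name only a ‘Resample()’ function from Python. The sketch below assumes this refers to `sklearn.utils.resample`, which the source does not confirm, and shows both directions of resampling on hypothetical toy arrays: upsampling a small class with replacement and downsampling a large class without replacement.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Hypothetical toy data: 1000 attack rows (label 1) and 20 normal rows (label 0).
X_attack = rng.normal(size=(1000, 4))
X_normal = rng.normal(size=(20, 4))

# Upsample the minority class by drawing rows with replacement.
X_normal_up = resample(X_normal, replace=True, n_samples=200, random_state=0)

# Downsample the majority class by drawing rows without replacement.
X_attack_down = resample(X_attack, replace=False, n_samples=400, random_state=0)

# Rebuild a resampled training set with matching labels.
X_balanced = np.vstack([X_attack_down, X_normal_up])
y_balanced = np.concatenate([
    np.ones(len(X_attack_down), dtype=int),
    np.zeros(len(X_normal_up), dtype=int),
])

print(X_balanced.shape, int(y_balanced.sum()), int((y_balanced == 0).sum()))
```

The target sizes (200 and 400) are arbitrary illustration values, not the sizes used in the experiments.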

Table S6 Experiment III: CIC-DDoS 2019 Train vs Test (Train test ratio 70:30)

           Instances before resampling  Instances after resampling  Method of resampling
Train day  LDAP(1)-2181542              LDAP(1)-217993              Every 10th instance of LDAP was taken
           MSSQL(2)-1844905             MSSQL(2)-205568             Every 11th instance of MSSQL was taken
           NetBIOS(3)-4094986           NetBIOS(3)-204664           Every 20th instance of NetBIOS was taken
           SYN Flood(4)-1582681         SYN Flood(4)-226042         Every 14th instance of SYN Flood was taken
           UDP Flood(5)-1470742         UDP Flood(5)-208977         Every 14th instance of UDP Flood was taken
Test day   LDAP(1)-1905191              LDAP(1)-63507               Every 30th instance of LDAP was taken
           MSSQL(2)-5763061             MSSQL(2)-57631              Every 100th instance of MSSQL was taken
           NetBIOS(3)-3454578           NetBIOS(3)-57577            Every 60th instance of NetBIOS was taken
           SYN Flood(4)-4284751         SYN Flood(4)-71413          Every 60th instance of SYN Flood was taken
           UDP Flood(5)-3754680         UDP Flood(5)-62578          Every 60th instance of UDP Flood was taken
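
Taking every k-th instance, as described in Table S6, can be sketched as a plain slice with step k. The sketch assumes the selection starts at the first record of each attack file, which the table does not state; under that assumption a file of n records yields ceil(n / k) rows. The helper name `every_kth` is hypothetical, and `range(n)` stands in for the real records. Only the test-day rows of the table are checked here.

```python
def every_kth(rows, k):
    """Keep the rows at positions 0, k, 2k, ... (systematic sampling)."""
    return rows[::k]

# (instances before resampling, step k) for the test day, copied from Table S6.
test_day = {
    "LDAP": (1905191, 30),
    "MSSQL": (5763061, 100),
    "NetBIOS": (3454578, 60),
    "SYN Flood": (4284751, 60),
    "UDP Flood": (3754680, 60),
}

# Slicing a range is lazy, so no records are materialised in memory.
kept = {name: len(every_kth(range(n), k)) for name, (n, k) in test_day.items()}

for name, count in kept.items():
    print(f"{name}: {count}")
```

The resulting counts (63507, 57631, 57577, 71413 and 62578) agree with the "Instances after resampling" column for the test day.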
