
Anomaly Detection with

Machine Learning in IoT Cellular


Networks
Master Thesis
Imran Qayyum Khan

Thesis submitted for the degree of


Master in Informatics: Network and System
Administration
60 credits

Department of Informatics
Faculty of Mathematics and Natural Sciences

UNIVERSITY OF OSLO

Autumn 2020
Anomaly Detection with
Machine Learning in IoT
Cellular Networks

Master Thesis

Imran Qayyum Khan

Supervisor:
Prof. Dr. Thanh van Do
Telenor Group, Telenor Research
Oslo Metropolitan University, Oslo, Norway

Co-Supervisor:
Prof. Boning Feng
Oslo Metropolitan University, Oslo, Norway
© 2020 Imran Qayyum Khan

Anomaly Detection with Machine Learning in IoT Cellular Networks

http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo


Abstract
The number of Internet of Things (IoT) devices is increasing day by day.
This growth poses a major security challenge for network and telecom
operators, IoT service providers and users alike. Implementing security on
IoT devices is itself a considerable challenge. Attackers launch many
attacks against IoT devices, such as Distributed Denial of Service. To
detect and prevent these types of attack on IoT devices that use the mobile
network, a proper overview of the existing threats and vulnerabilities is
needed.

The main objective of this thesis is to present and compare different
machine learning algorithms. Supervised machine learning classification
methods are used in this study, where five machine learning algorithms are
tested and evaluated by their performance. Both datasets are analyzed
using these algorithms, namely k-NN, SVM, naïve Bayes, decision tree and
logistic regression. Four algorithms (k-NN, SVM, decision tree and logistic
regression) behave similarly, while naïve Bayes shows some inconsistencies
in the experiments. Overall, the accuracy and precision of the models
average above 90%.

Acknowledgements
First of all, I would like to thank Almighty God, who helped me to complete
this thesis work. Secondly, I would like to thank my supervisor, Prof.
Dr. Thanh van Do, for his support and guidance provided throughout
my master thesis. I would also like to thank my co-supervisor, Prof. Boning
Feng, who gave me the opportunity to write about such an interesting topic.
With the help of both of you, I overcame the challenges faced throughout
my thesis.

I am grateful to Prof. Hårek Haugerud as my teacher, and to Kyrre Begnum,
a knowledgeable professor and great person, for motivation, advice
and knowledge, especially in the fields of cloud computing and automation. I
would also like to thank Prof. Anis Yazidi, a teacher who guided us in
how to write a thesis.

Thanks to the University of Oslo and Oslo Metropolitan University (OsloMet)
for granting me admission, and to the Norwegian Government for providing
me the opportunity to fulfill my dream of doing my Master in Network and
System Administration.

I would like to thank everyone involved in this master program. My special
thanks and appreciation go to Bernardo Santos, who always had time
for me and my questions. With his advice, proof-reading and comments I
was able to improve my thesis document.

I would like to thank all my friends and colleagues, especially Tariq
Mahmood as a study partner, and Syed Zyyad Ali Shah, who gave tremendous
suggestions and advice throughout this degree. All credit for this
degree goes to my best friend Hammad Raza; without his motivational
speeches I would never have come to Norway and started this program.

Finally, I would like to thank my beloved parents, brothers, sisters,
my wife and my lovely daughters. Without their encouragement and support
none of this would have happened.

Author

Imran Qayyum Khan

Contents
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 IoT in Industry . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Healthcare . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Research Methodology . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Background and Related Work 8


2.1 Cellular Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Cellular Networks Architecture Concepts . . . . . . . . 8
2.1.2 Evolution of Mobile Technologies . . . . . . . . . . . . . 9
2.2 Vulnerability, Security Threats and Attacks . . . . . . . . . . 11
2.2.1 Vulnerability . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Threat . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.3 Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Security Challenges . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 UMTS (Universal Mobile Telecommunication System) 12
2.3.2 LTE (Long Term Evolution) . . . . . . . . . . . . . . . . 12
2.3.3 5G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 IoT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.1 NB-IoT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.2 Protocol Vulnerabilities in IoT . . . . . . . . . . . . . . . 18
2.5 What is DDoS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5.1 DDoS attacks, an overview . . . . . . . . . . . . . . . . . 24
2.5.2 DDoS Direct and Indirect attacks . . . . . . . . . . . . . 26
2.5.3 How attackers launch a DDoS attack . . . . . . . . . . 27
2.5.4 DDoS attack types . . . . . . . . . . . . . . . . . . . . . 28
2.5.5 UDP attack . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.6 Detection of DDoS Traffic . . . . . . . . . . . . . . . . . 29
2.6 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.6.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . 30
2.6.2 Unsupervised Learning . . . . . . . . . . . . . . . . . . 34
2.6.3 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6.4 Reinforcement Learning . . . . . . . . . . . . . . . . . . 35
2.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3 Approach 38
3.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Design Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3 Implementation and Experiment phase . . . . . . . . . . . . . 41
3.3.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.2 Used Tools and Software . . . . . . . . . . . . . . . . . . 42
3.3.3 Collection of Normal and DDoS attack traffic . . . . . . 43
3.3.4 Feature Extraction . . . . . . . . . . . . . . . . . . . . . 46
3.4 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.1 Cleaning and Transformation . . . . . . . . . . . . . . . 47

3.4.2 Splitting of Dataset . . . . . . . . . . . . . . . . . . . . . 49
3.4.3 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4 Results 57
4.1 First Threshold - (Length below 100 bytes) . . . . . . . . . . . 58
4.1.1 Normal Scenario . . . . . . . . . . . . . . . . . . . . . . 58
4.1.2 DDoS Scenario . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Second Threshold - (Length between 50 and 70 bytes & be-
tween 160 and 180 bytes) . . . . . . . . . . . . . . . . . . . . . . 68
4.2.1 Normal Scenario . . . . . . . . . . . . . . . . . . . . . . 68
4.2.2 DDoS Scenario . . . . . . . . . . . . . . . . . . . . . . . . 73

5 Evaluation / Discussion 78

6 Conclusion and Future Work 82


6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

A Modeling Source Code 84

B Dataset Samples 87

List of Tables
1 Specification of dataset CICDDoS2019 [124]
2 Labeling of binary classification
3 scikit-learn Python Library [130]
4 Classifier statistics (Normal)
5 Performance Metrics
6 Classifier statistics (DDoS)
7 Performance Metrics (DDoS Dataset)
8 Classifier statistics (Normal)
9 Performance Metrics (Normal Dataset)
10 Classifier statistics (DDoS)
11 Performance Metrics (DDoS Dataset)

List of Figures
1 Internet of Things [4]
2 IoT environment
3 Cellular IoT connections by segment and technology (billion)
[5]
4 Massive vs. Critical IoT [12]
5 The three-tier Architecture of the H-IoT systems [22]
6 Mobile subscriptions by technology [5]
7 Different Generations in Telecom [30]
8 ITU X.805 Framework [37]
9 Security threats in 5G [31]
10 NB-IoT deployment [44]
11 Partial Deployment of NB-IoT [44]
12 IoT Protocol Stack
13 MQTT in IoT [50]
14 Some Applications in ZigBee [59]
15 DDoS Attack [65]
16 Attack Life Cycle [18]
17 Direct and Indirect Attacks [74]
18 Complex Reflection Attack [74]
19 DDoS Attack Types [75]
20 Machine Learning [86]
21 An example of KNN classification [89]
22 Structure of Decision Tree [90]
23 An example of separable problem in 2 dimensional space [92]
24 An example - Naı̈ve Bayes model [95]
25 An example - Regression in Gaussian distribution [98]
26 Research Methodology
27 End to End Communication in Network
28 Proposed Method for DDoS Detection
29 Test Lab Devices
30 Test Lab Network
31 Testbed Architecture [124]
32 Wireshark capturing [123]
33 Parametric/non-parametric models
34 k-NN classifier - an example [133]
35 Pseudo-code for k-NN Algorithm [133]
36 SVM - an example [133]
37 Pseudo-code for SVM Algorithm [133]
38 Pseudo-code for Naı̈ve Bayes Algorithm [133]
39 Pseudo-code for Decision Tree Algorithm [133]
40 Pseudo-code for Logistic Regression Algorithm [133]
41 K-Nearest Neighbors
42 Error Rate vs K
43 Other Classifiers
44 K-Nearest Neighbors
45 Error Rate vs K
46 Other Classifiers
47 K-Nearest Neighbors

48 Error Rate vs K
49 Other Classifiers
50 K-Nearest Neighbors
51 Error Rate vs K
52 Other Classifiers
53 Overview of Performance Metrics (Normal - 1st Threshold)
54 Overview of Performance Metrics (DDoS - 1st Threshold)
55 Overview of Performance Metrics (Normal - 2nd Threshold)
56 Overview of Performance Metrics (DDoS - 2nd Threshold)
57 First Dataset Sample (Before Transform)
58 First Dataset Sample (After Transform)
59 Second Dataset (Before Transform)
60 Second Dataset (After Transform)

Listings
1 Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2 Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3 Splitting of data . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4 Classifiers used in this work . . . . . . . . . . . . . . . . . . . . 54
5 Evaluation model . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6 find K value - Normal (First Threshold) . . . . . . . . . . . . . 60
7 find K value - DDoS (First Threshold) . . . . . . . . . . . . . . 65
8 find K value - Normal (Second Threshold) . . . . . . . . . . . . 70
9 find K value - DDoS (Second Threshold) . . . . . . . . . . . . . 75
10 Source code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

Acronyms
6LoWPAN IPv6 over Low-Power Wireless Personal Area Network. 22

AMQP Advanced Message Queueing Protocol. 20

BSN Body Sensor Network. 5

C-IoT Cellular IoT. 2


CALLER ID Caller identification. 5
CoAP Constrained Application Protocol. 5, 18
COVID-19 Coronavirus Disease 2019. 4
CRISP-DM Cross-industry standard process for data mining. 47

DDoS Distributed Denial of Service. 4, 5


DNS Domain Name System. 5
DoNAS Data over Non-Access stratum. 8
DoS Denial of Service. 3, 4

eNodeB evolved Node B. 8


EPC Evolved Packet Core. 2

GPS Global Positioning System. 5


GSM Global System for Mobile Communication. 2

HSS Home Subscriber Server. 2

ICMP Internet Control Message Protocol. 26


ICRC International Committee of the Red Cross. 4
IMSI International Mobile Subscriber Identity. 9
IoT Internet of Things. 1

k-fold k-Fold Cross-Validation. 49

LTE Long Term Evolution. 2

M2M Machine to Machine. 3


MME Mobility Management Entity. 2
MQTT Message Queuing Telemetry Transport. 19

NAS Non-Access stratum. 8
NB-IoT Narrow Band Internet of Things. 2

PDN Packet Data Network. 9

PGW Packet Gateway. 2

QCI QoS Class Identifier. 9

RAN Radio Access Network. 14

RFID Radio Frequency Identification. 5

SGW Serving Gateway. 2

TCP Transmission Control Protocol. 5

UDP User Datagram Protocol. 5


UE User Equipment. 8

UMTS Universal Mobile Telecommunications System. 2

XMPP Extensible Messaging and Presence Protocol. 20

1 Introduction
In this chapter, we establish the context of this thesis, explaining its
motivation, aim, research questions, delimitations and contributions.
In addition, the last section outlines the overall structure of this thesis.

1.1 Background
Internet of Things (IoT) is described as a “network to connect anything
with the Internet based on stipulated protocols through information sensing
equipments to conduct information exchange and communications in order
to achieve smart recognitions, positioning, tracing, monitoring, and admin-
istration [1].”

By providing IoT-based applications and services such as smart healthcare,
energy control, process monitoring, environmental observation and
fleet management [2] [3], IoT offers new capabilities and opportunities
for end-users. As per 2020 forecasts by Cisco Systems [4], 50 billion
things, including cardiac monitors, thermostats, smartphones,
surveillance cameras, kitchen appliances, cars and televisions, are
connected via the internet. Figure 1 illustrates this IoT connectivity,
with devices such as refrigerators, washing machines, pets, healthcare
equipment, energy grids and transport systems all connected to the
internet [4].

Figure 1: Internet of Things [4]

The IoT environment, shown in figure 2, consists of three groups: the
manufacturers of gadgets, the IoT applications running on application
servers, and the Evolved Packet Core (EPC), which is the domain of
telecommunications operators. In this context, each of these parties should
ensure the protection, security and availability of services to the
consumer [5] [6].

Figure 2: IoT environment

Inside the personal area network, IoT devices transmit packets via Z-Wave
and Zigbee, while in the wide area network packets are transmitted through
GSM, UMTS or LTE. From the eNodeB, packets are forwarded to the EPC, which
comprises four nodes: the Mobility Management Entity (MME), Home Subscriber
Server (HSS), Serving Gateway (SGW) and Packet Data Network Gateway (PGW).

Ericsson's forecast report on C-IoT connections by segment and technology


states that “The Massive IoT technologies NB-IoT and Cat-M1 continue
to be rolled out around the world, but at a slightly slower pace in 2020
than previously forecasted due to the impact of COVID-19. 2G and 3G
connectivity still enables the majority of IoT applications, but during 2019,
the number of Massive IoT connections increased by a factor of 3, reaching
close to 100 million connections at the end of the year.” [7].

In figure 3, Ericsson predicts that some 29 billion connected devices will
be in use by 2025 [5]. Of these, around 18 billion are IoT devices, such as
motion and door sensors and other smart devices [5], while the remaining
11 billion are smartphones and other user devices.

In the current advanced stage, IoT is leading the charge, offering powerful
benefits such as new market opportunities, growth in trade revenues, better
decision-making, cost reduction, security and safety, enhanced citizen
participation and a system of measures [2]. In any case, 70% of IoT devices
contain vulnerabilities in areas such as encryption and password security,
among others, which give attackers open access for extreme attacks such as
Denial of Service (DoS) [8].

Figure 3: Cellular IoT connections by segment and technology (billion) [5]

Attackers are trying new techniques to break security, steal intellectual
property and disrupt sensitive information. Regular security threats are
becoming more dangerous and more complex to defeat. We therefore need to be
aware of what kind of security monitoring is needed [9]. The way to defend
IoT against attackers is to learn how to predict attacks [2] [3].

In the era of machine-to-machine (M2M), machine-to-person and
person-to-person interfaces, IoT is categorized into two forms based on its
specifications and characteristics: massive IoT and critical IoT [10] [11],
as shown in figure 4.

Massive IoT uses a large range of sensor and actuator gadgets that are
relatively inexpensive and designed to maintain long battery life. A few
examples of massive IoT are found in smart cities, agriculture, transport
and logistics. Critical IoT applications include autonomous vehicles,
remote surgery, remote manufacturing, etc., requiring high availability,
ultra-reliability and low latency [5].

Any security issue arising from delay or inaccessibility of service in
either of these groups will have a possible impact on business and society
[13]. Because sufficient computational power, capacity and battery life are
needed to perform authentication, encryption and other security
calculations [8] [6], enormous quantities of resource-restricted IoT
devices are connected in an arrangement that generates a vast amount of
information, posing security challenges to the entire IoT network. In this
way, IoT devices are very vulnerable to attacks such as Denial of Service
(DoS) and Distributed Denial of Service (DDoS) [14] [13].

Figure 4: Massive vs. Critical IoT [12]

Attackers keep developing new methods to create security threats, causing
everything from intellectual property theft to loss of confidential data.
These days security attacks are growing more precarious and more
sophisticated, so we should recognize what kinds of security controls are
mandatory [9]. Discovering how to predict attacks is the only way to
protect IoT against attackers [3].

During the COVID-19 pandemic, cyber criminals have posed massive
challenges, especially in the health field. They took advantage of the
health crisis to direct their attacks at healthcare providers, hospitals,
medical research centers and international public health organizations. In
response to these threats, the International Committee of the Red Cross
(ICRC) and other parties published a letter calling on various governments
to do more to secure and protect these medical organizations from
cyber-attacks [15].

Any security concern leading to interruption or inaccessibility of services
in any of these categories affects business and communities alike [13].

1.2 Motivation
IoT coordinates millions of web-based devices to make our everyday lives
easier; however, it faces security challenges that need consideration, as
IoT objects are ineffectively controlled, maintained and protected [16].

IoT DDoS attacks were the dominant attack type in 2017: in line with the
Arbor Security report [17], 65 percent of the attacks carried out in 2016
were significant DDoS attacks. The Mirai DDoS attack [18], triggered by the
infection of vulnerable IoT devices, was the biggest such attack ever.
Consequently, DDoS attacks should be detected and mitigated. Transmission
Control Protocol (TCP), User Datagram Protocol (UDP) and DNS flooding are
the most common DDoS attacks. Protective measures are challenging to
enforce due to the memory limitations, power constraints and heterogeneous
nature of IoT devices.

1.3 IoT in Industry


The following example illustrates how IoT and its vulnerabilities can
affect our lives. Due to urbanization and population growth, the need for a
green environment is expanding. "Green rooftops" are a way to preserve a
green, high-quality environment: they can improve water quality and reduce
carbon dioxide, urban heat islands, noise, energy consumption, air
pollution, etc. [19]. However, it is difficult for a human to cut grass
growing on rooftops, while grass-cutting robots can do it effectively. It
would be costly and impractical for each owner of a green roof to acquire
a robot.

Consider instead a company that owns a few robots and contracts with green
roof owners to cut their grass on a schedule based on its growth. IoT can
automate this task with a drone that uses geofencing technology to carry
robots to the rooftops, a service that relies on GPS, caller ID and RFID.
Geofencing is also used by the robots themselves to avoid falling off or
moving outside the roof barrier. In this case, IoT brings ecological and
financial benefits by enabling devices to communicate. However, if an
intruder launches a DoS attack or misdirects the drones or robots, the
consequences can be quite harmful.

IoT security is therefore imperative, and the results of IoT attacks can be
more dangerous than those of web attacks. As EY states, “Our Cyber Risk
Management services help organizations tackle the many security challenges
they face on a daily basis — supporting risk-based decisions and improved
cybersecurity, reducing costs related to managing security risk, and
improving their overall cybersecurity posture” [20].

1.3.1 Healthcare
Healthcare is one of the most important areas in which IoT has provided
comfort to both physicians and patients, with important capabilities such
as real-time observation, health care and patient data management. The
Body Sensor Network (BSN) is another innovation that makes it possible for
a physician to collect data from patients and additionally monitor them via
highly constrained devices that use lightweight protocols, such as CoAP,
for data transmission [21].

As shown in figure 5, these devices collect and transmit sensitive
information to another node, such as a gateway. The protection and security
of these sensor devices is extremely important because they hold the
patient's critical data. Any unauthorized access to, leakage from or
capture of these devices can cause serious harm to patients.

Figure 5: The three-tier Architecture of the H-IoT systems [22]

The information in a segment can be altered by an attack. Through changes
to fragments and manipulation of packets, the data is modified, which can
be dangerous and life-critical [23] [21]. If an intruder inflicts DoS on a
device, or changes the value reported for a patient's high heart rate, the
device will not be triggered, which can cause real problems and in some
cases death.

1.4 Problem Statement


This thesis aims to analyze different machine learning techniques that
can help in detecting or even predicting an exploit targeting IoT devices
connected to cellular network. The hypothesis of this thesis is as follows:

If we are able to obtain information from the control and data plane in
a cellular network, coming from IoT devices, we can use machine learning
and anomaly detection algorithms on these data to see if it allows us to
detect or even predict an upcoming attack.
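In its simplest form, the hypothesis above amounts to learning what "normal" traffic looks like and flagging deviations. A minimal sketch of that idea, using a plain z-score test over a single hypothetical traffic feature (packet length); the feature values and the 3-sigma cut-off are illustrative assumptions, not taken from the thesis datasets:

```python
import statistics

def fit(values):
    """Learn the mean and standard deviation of one traffic feature
    (e.g. packet length in bytes) from normal traffic only."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomalous(value, mean, stdev, z=3.0):
    """Flag observations more than z standard deviations from the
    normal mean."""
    return abs(value - mean) > z * stdev

# Hypothetical packet lengths (bytes) observed during normal operation.
normal_lengths = [198, 203, 195, 200, 207, 199, 202, 196, 201, 204]
mean, stdev = fit(normal_lengths)
print(is_anomalous(60, mean, stdev))   # True: flood-style short packet
print(is_anomalous(201, mean, stdev))  # False: ordinary packet
```

The classifiers evaluated later generalize this single-feature test to many features at once and learn the decision boundary from labeled data instead of a fixed z value.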

1.5 Research Methodology
The aim of this thesis is to detect anomalies in IoT devices that are
connected to a cellular network. There are three key stages: information
collection (normal and DDoS traffic), feature extraction and selection, and
machine learning classification. To this end, packets were generated by IoT
devices, and normal and DDoS events were collected separately during the
information collection process. The information was pre-processed and, in
the final stage, classified with the Scikit-learn tool [24]. Using the most
widely known classifiers (k-NN, SVM, naïve Bayes, decision tree and
logistic regression), a performance evaluation was made considering the
task at hand.
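The classification stage described above can be sketched as follows with the five named scikit-learn classifiers. The feature matrix here is synthetic stand-in data from `make_classification`, not the thesis datasets, and the hyperparameters are illustrative defaults:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic binary-labeled "flows": 6 numeric features per sample.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

classifiers = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

# Train each model and report held-out accuracy.
results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {results[name]:.2f}")
```

The same fit/predict/score loop applies unchanged to the real feature matrices extracted in Chapter 3; only the data loading differs.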

1.6 Thesis Structure


The thesis is organized as follows.

Chapter 1 Introduction: A short overview of the security problems in IoT,
machine learning and DoS attacks, and the purpose of this thesis.

Chapter 2 Background and Related Work: The technologies and concepts
involved in the thesis, together with related work.

Chapter 3 Approach: The planned steps needed to address the problem
statement, along with descriptions of the experiment design, methodology
and results evaluation approach.

Chapter 4 Results: Presents the outcome of what was done and explains
how the project was implemented and how the experiment was conducted.

Chapter 5 Discussion / Evaluation: Given the obtained results, discusses
how to approach the problem selected in this thesis and acknowledges
shortcomings.

Chapter 6 Conclusion: Summarizes the thesis, highlighting the obtained
results, and provides guidelines for continuing this research topic.

2 Background and Related Work
This chapter presents the evolution of mobile technologies and the Internet
of Things, and gives an overview of the available security mechanisms in
this sector. It starts by describing mobile technologies and the difference
between an attack, a threat and a vulnerability, and continues with what a
DDoS attack is and how it can be inflicted. It showcases how other
researchers have used machine learning to detect DDoS attacks and what
techniques they have used so far. In addition, this chapter describes the
life cycle of an attack and clarifies, with a short summary, how an attack
can be detected within network traffic.

2.1 Cellular Networks


A communication network whose last link is wireless is called a cellular
network or mobile network. Coverage is spread out over land in areas called
"cells", each normally served by three cell sites or base transceiver
stations. With these base stations, the cells provide the network coverage
used for the transmission of data, voice and other types of content [25].

2.1.1 Cellular Networks Architecture Concepts


To understand end-to-end communication, we first look at the mobile network
architecture and how packets travel between user equipment and IoT servers.
The core network components are described below:
1. User Equipment: Devices, such as smartphones and IoT devices, that
   interact with the network and core services. When communicating with the
   network, each user device holds a unique identity for connecting.
2. eNodeB: evolved Node B, abbreviated eNodeB or eNB, is part of the LTE
   network. User equipment connects to the EPC through the air interface
   with the help of the eNodeB; the established link between eNodeB and UE
   is called the radio interface. In accordance with 3GPP Release 8, the
   eNodeB is responsible for user-data encryption, radio resource
   management, IP header compression, user-plane routing to the SGW,
   preparation and transmission of paging messages and broadcast
   information, and MME selection at UE attachment when the UE provides no
   routing information [26].

3. MME: The Mobility Management Entity (MME) deals with the control plane
   and manages eNodeB security- and mobility-related signaling. In
   addition, the MME may interact with the user plane in the case of DoNAS
   (Data over Non-Access Stratum). The MME is also responsible for paging
   user equipment in idle mode, tracking area list management, roaming,
   NAS signaling and NAS signaling security [27] [28].

4. HSS: The Home Subscriber Server (HSS) is a standardized function that
   stores the individual authorization and service profiles of all
   subscribers. The HSS acts as a database containing the subscriber's
   public and private identities, credentials, the IMSI and the data used
   to indicate the type of service each user's mobile subscription allows.
   The HSS is consulted when a device asks for radio resources, to check
   the status of the device's IMSI. Other features of the HSS include
   location support functions, mobile roaming, home location registers,
   subscriber preference settings and the mobile authentication server
   [29].

5. SGW: The Serving Gateway (SGW) is the local mobility anchor point for
   the eNodeB. The SGW is responsible for packet forwarding, inspection and
   routing between the PDN and UE, and for per-QCI uplink and downlink
   charging [27].
6. PGW: The Packet Data Network Gateway (PGW) is the gateway interfacing
   the EPC to outside IP networks. PGW features include deep packet
   inspection, IP address assignment to the UE, lawful interception, user
   packet filtering, uplink and downlink service-level charging and policy
   control [27].
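The user-plane path through the nodes above can be summarized with a toy model. The node names match this section; the hop logic is purely illustrative, not a protocol implementation:

```python
# Greatly simplified user-plane path through the LTE core described
# above: UE -> eNodeB -> SGW -> PGW -> external packet data network.
USER_PLANE_PATH = ["UE", "eNodeB", "SGW", "PGW", "PDN"]

def forward(packet, path=USER_PLANE_PATH):
    """Record each hop in the packet's trace, mimicking forwarding."""
    for node in path:
        packet.setdefault("trace", []).append(node)
    return packet

pkt = forward({"payload": "sensor-reading"})
print(" -> ".join(pkt["trace"]))  # UE -> eNodeB -> SGW -> PGW -> PDN
```

Control-plane signaling (UE to eNodeB to MME, with HSS lookups) follows a separate path and is deliberately left out of this sketch.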

2.1.2 Evolution of Mobile Technologies


Currently there are around 16 billion cellular subscriptions from 2G to 4G,
and the number is gradually increasing [18] towards the 5G generation, with
approximately 28 billion expected including IoT devices. Figure 6 clearly
shows the increase in subscriptions as customers move between generations,
such as from 2G to 3G and 4G. The boom in subscriptions (for users and
devices) has now arrived, and we need faster data rates and a low-latency
network to accommodate this increase [18].

Figure 6: Mobile subscriptions by technology [5]

1. 2G - GSM/CDMA: GSM stands for Global System for Mobile Communication,
   a technology used for the transmission of data and voice services. The
   concept of GSM emerged at Bell Laboratories in the 1970s. GSM is the
   most widely implemented and standardized technology around the world. It
   is a circuit-switched system that divides each 200 kHz channel into
   eight 25 kHz time slots. In most parts of the world, GSM mobile
   communication uses the 900 MHz and 1800 MHz bands; in the US, GSM
   operates on 850 MHz and 1900 MHz [31].

Figure 7: Different Generations in Telecom [30]

After almost 10 years, in 1991, 2G was launched with more features
and services, such as improved coverage, and of course voice quality was
much higher than in 1G. The speed increased to 64 kbps in 2G, and the first
digital solutions were used, such as GSM (Global System for Mobile),
CDMA (Code Division Multiple Access) and TDMA (Time Division Multiple
Access). 2G was used for voice and, later, packet-switched data.

Due to low-cost base stations and mobile sets, GSM technology became
popular around the world. On the basis of the GSM architecture,
technologies grew more advanced in next-generation (3G) systems and LTE.
The Base Station Subsystem consists of the Base Transceiver Station (BTS)
unit, which is connected to the mobile station (MS) over the air interface,
and the Base Station Controller (BSC). The BSC manages the traffic from the
BTSs to the exchange center, and mobility across BTSs is also monitored.
The Network Switching Subsystem is another subcomponent; it contains the
Mobile Switching Center (MSC) and the subscriber databases. The MSC
performs the switching and carries the call from the connected to the
calling party, and is associated with the Public Switched Telephone Network
(PSTN). The Home Location Register (HLR) and Visitor Location Register
(VLR) are used by the MSC to verify the claimed identity of the subscriber
[31].
2. Third Generation (3G) - UMTS: A new era of technology began in 2003
   when 3G was launched. 3G triggered the boom in mobile devices and
   smartphones powered by 3G services. 3G was based on digital voice and
   also included digital IP, web, mail and SMS information. The
   technologies used in 3G were W-CDMA and UMTS. The speed was expanded to
   2,000 kbps in 3G, and the first portable broadband was also introduced
   [32].

3. Fourth Generation (4G) - LTE: To obtain higher web speeds on mobile
   devices, 4G/LTE was introduced. Networks and apps eventually required
   higher speeds, so 4G was launched in 2011 because of this massive need
   for faster internet access; it is still used in conjunction with 2G and
   3G. 4G was explicitly designed for data carried over an IP-based
   protocol (LTE) for interactive media, the key of the 4G design. The
   switching technique used in 4G is packet switching, and the speed
   increased to 100,000 kbps [32].
4. Fifth Generation (5G):- Due to the higher usage of devices in recent
years, a new concept of high-speed internet was introduced, and this
high-speed internet requires a next generation of transport network.
5G is the answer to these needs and became widely useful around the
year 2020. The 5G breakthrough adds digital voice and data capacity
and special features for IoT (Internet of Things), AR (Augmented
Reality) and VR (Virtual Reality). Anything from smart cars to city
grids, etc., and other items communicate with each other in the smart
IoT by using different protocols such as CISCO CCN and MQTT. Packet
switching is used in the 5G network, and the latency in a 5G network
is only 1 ms [32].

2.2 Vulnerability, Security Threats and Attacks


Before moving on to IoT security, we should understand the difference
between a vulnerability, a threat and an attack.

2.2.1 Vulnerability
A vulnerability is a weak point in an information system through which
an attacker can gain internal control and launch an attack or
unauthorized activity. A vulnerability involves three elements: a flaw
in the system, an attacker's access to the flaw, and the attacker's
capability to exploit it. The attacker must connect to the system with
some kind of tool in order to exploit the vulnerability. In any case, a
vulnerability should not be confused with a risk: the risk is the
potential impact on the system that results from the misuse of the
vulnerability [33].

We see a lot of flaws in IoT devices. Consumer-available products
contain flaws that are extremely easy to spot, but due to the lack of
memory, processing power and energy in IoT devices, they are difficult
to fix. In addition, there are software errors in IoT coming from
device management, operating systems and communication protocols.

2.2.2 Threat
A threat is a potential for harm, realized by an intruder exploiting
the system's vulnerabilities, and it has a negative effect on the
operation of the system. In addition, a threat can be triggered by
humans but also by natural causes such as earthquakes, floods and other
natural disasters that damage computer structures and IoT systems.

There are also man-made threats created by experienced individuals who
find vulnerabilities and produce system-damaging code and scripts, or
who participate in criminal activity, e.g. against trade or government
data. These kinds of threats are called structured threats, while
unstructured threats are caused by inexperienced humans who introduce a
malicious device into their equipment without enough knowledge of the
harm this adaptation can cause. Both structured and unstructured
threats [34] [35] apply to IoT devices. Threats must be defined through
the collective participation of application designers, security
specialists, analysts and system administrators. Attack trees and
attack patterns can be used to identify threats.

2.2.3 Attack
Attacks are the dangerous and alarming consequences of exploiting a
device's vulnerabilities through a variety of tools and methods.
Attacks follow different processes: in a passive attack, for instance,
the attacker reads sensitive information from unencrypted network
traffic or monitors weakly encrypted traffic to find authentication
information, while an active attack attempts to alter the system or its
data. The most typical attacks are access attacks, physical attacks,
distributed denial of service attacks, privacy attacks such as password
stealing, and cyber security attacks [34].

2.3 Security Challenges


In this section we discuss security issues in UMTS, LTE and 5G
technologies, as well as security challenges and vulnerabilities in IoT.

2.3.1 UMTS (Universal Mobile Telecommunication System)
UMTS stands for Universal Mobile Telecommunication System, also known
as the 3G mobile system. Despite the new security architecture of UMTS
(3G), which enabled new services and was a key component of network
design, the identity and location of mobile users is one of the 3G
concerns: every user of the cellular network is uniquely identified by
the IMSI. Another issue is the firewalls, which already had security
problems. UMTS firewalls protect plain-text information from external
attacks; attacks, however, can also emerge from another mobile
subscriber, i.e. from someone inside the UMTS core network. Current
VPNs provide limited flexibility for setting up secure mobile user
associations. Data privacy is a concern for UMTS, as the information
transmitted inside the gateway is not encrypted in the WAP
architecture [36].

2.3.2 LTE (Long Term Evolution)

The next generation of the mobile telecommunication system, known as
4G, is accepted for its enhanced protection and efficient networking.
Since it uses the TCP/IP architecture, the 4G network is totally IP
oriented. All the components make up the Evolved Packet Core (EPC),
which is the core of the EPS (Evolved Packet System). The 4G
architecture consists of two parts, the EPC and the eNodeB. The authors
of [37] [38] classified LTE threats into the following categories:
• Identity and Privacy of User: Unauthorized access to and use of a
user's or hardware identity to access the network, or impersonation of
users to commit malicious actions.

• Tracking of User Equipment: IP-based tracking that can be linked to
an IMSI through the UE / UEIM.
• DoS Threat: The possibility of launching DoS attacks against other
customer devices.

• Unauthorized Access to the Network: Unauthorized access to the EPC
section can lead to a number of attacks and security exploits.
• eNodeB Physical Access and Credentials: Unauthorized access to an
eNodeB can result in attacks being driven towards any EPC node. Faked
or cloned credentials, false configuration, and leaked algorithm-related
information may cause significant security issues.
• Protocol Attack: Exploiting the security vulnerabilities of any
protocol on the interface of any node can easily cause DDoS attacks and
other security issues.

• Jamming: An attack in which a jammer transmits over the RF medium in
order to disrupt reception or cause a denial of service. Both the
uplink and the downlink of LTE can be jammed: uplink jamming targets
the base station, while downlink jamming targets the user equipment,
where the signal travels from the base station to the user hardware.

Another form of jamming attack in LTE is protocol-aware jamming, which
is enabled by protocol openness. In addition, messages sent by the base
station are not encrypted, which can lead to hijacking and sniffing
attacks [38]. The X.805 network security architecture illustrated in
Figure 8 is also recommended for 4G wireless deployments, as described
by Anastasios et al [37]. The X.805 security architecture consists of
three layers (infrastructure, services and applications) and eight
security dimensions: availability, data confidentiality, communication
security, privacy, authentication, data integrity, access control and
non-repudiation.

The analysis of Yongsuk et al [39] suggests that, due to the
heterogeneous and IP-based architecture of 4G, modern threats could
lead to unintended interference with data and service disclosure. In
addition, the conceivable threats in 4G include consumer ID and service
theft, DoS, IP spoofing and the massive range of connected
heterogeneous devices, which are potential security holes for the
framework.

Figure 8: ITU X.805 Framework [37]

These 4G-connected devices can overpower any 4G mobile node in a
variety of ways, such as DDoS. There is no effortless resolution of
security issues, and security research and development must continue on
an ongoing basis because security threats always remain open.

2.3.3 5G
Compared to previous standards, the different characteristics of 5G
applications and services, such as eMBB, URLLC and mMTC, are vulnerable
to attack, and there is a variety of ways to target 5G services. There
are many possible attack paths in 5G that originate from client
hardware such as mobile phones, robots, IoT devices, automated
industrial machines, autonomous vehicles, etc. The 5G security threats
are shown in Figure 9, which covers the Radio Access Network (RAN), the
core and the internet: Man-in-the-Middle (MiTM) attacks on Cloud RAN
(C-RAN) domains, an IP core network that is vulnerable to DDoS, and
client devices that are exploited by malware and bots [31].

The security challenges associated with the core network are critical
for the mobile network to recognize, as the full structure of 5G mobile
systems depends on the network. The entire system will fail if any of
its technologies is vulnerable to the attacker. 4G and earlier mobile
networks were not designed to address security concerns related to NFV
and SDN. Security measures are required for both signaling and data
traffic at the different attack focal points (shown in Figure 9) from
the user equipment (UE) to the RAN and the core network [31].

Figure 9: Security threats in 5G [31]

2.4 IoT
As described in [40], IoT faces many security threats and challenges.
According to the researchers, we need to understand the new features of
IoT in order to understand the security threats to IoT devices. Below
we describe some characteristics of IoT that enable attacks on IoT
devices.
1. Ubiquitous: IoT is involved in our daily life and uses all our
resources. Individuals who have no idea about the security of these
devices still use them, and manufacturers do not pay much attention to
their security. Producers provide no safety advice and no information
about the sensitive data the device collects. The unsafe default
configuration of these devices is one of the triggers of recent
attacks. Given the ubiquity of these devices, their abnormal behavior
should be viewed and controlled by the network operators.

2. Diversity: IoT comprises a large number of devices involved in
different use cases and applications. IoT devices reach different cloud
networks through distinctive security elements and protocols.
Differences in device capabilities and requirements make it difficult
to create a global defense network, and attackers benefit from these
distinct qualities to launch DDoS attacks. Intrusion Detection Systems
(IDS) and Intrusion Prevention Systems (IPS) can help in preventing
intrusion attacks.
3. Privacy: A close relationship exists between IoT devices and their
users. A few sensors jointly gather important environmental information
to track our surroundings, so it is an easy task for a hacker to obtain
sensitive information and identities, for example by inferring smart
home activity from home network traffic [41].

4. Unattended: Some IoT devices are special-purpose devices, such as
Implantable Medical Devices (IMDs). These devices operate in the
physical world for long periods without human intervention. It is
extremely difficult both to apply security computation on them and to
detect whether they have been remotely hacked. In [42] the authors
proposed a lightweight, stable execution environment for these types of
devices.
5. Mobile: Several IoT devices are mobile and switch from network to
network, for example a smart vehicle that collects street data while
driving from one place to another. If an attacker injects code into a
mobile device, the device configuration or activity is changed.
However, changing the device configuration is very difficult,
especially when network portability is configured on the device, so
mobility decreases the chances of attack.

2.4.1 NB-IoT
Narrowband IoT is an LPWAN radio technology that enables a wide range
of devices and services to be connected over the cellular transmission
band. It is specifically designed for IoT and standardized by 3GPP
[43], as discussed in the delimitation section of Chapter 1. NB-IoT is
designed to transmit small batches of data from low-power, less verbose
IoT devices that transmit several bytes of data per day. NB-IoT
operates at 880-960 MHz and 791-832 MHz [44], respectively. In any
case, there are many NB-IoT constraints on which an attacker can focus.
As introduced in 3GPP Release 13, the goals of NB-IoT are to improve
indoor coverage by 20 dB compared to legacy GPRS devices, to support at
least 52547 low-throughput devices per cell, to reduce complexity, and
to achieve efficient power consumption with a battery life of more than
10 years on a 5 Wh battery at 164 dB MCL. As shown in Figure 10 [44],
NB-IoT can be deployed in stand-alone mode, in in-band mode, or in the
guard band of an existing LTE carrier.

Given a current LTE system in which NB-IoT is supported by a subset of
LTE cells, a problem arises when the devices must be served by an
NB-IoT cell that is distant from the device while the strongest LTE
cell is near the device, as shown in Figure 10. In this case, coverage
is complicated and there is a critical path loss from the serving cell,
as shown in Figure 11. Due to low SINR and the range problem, LTE cell
interference is observed in NB-IoT; NB-IoT suffers from co-channel
interference. Scheduling on separate Physical Resource Blocks (PRB) is,
however, a method to avoid co-channel interference when the cells are
synchronized at the subframe level.

There are a few open issues and vulnerabilities that need to be
resolved. The NB-IoT plan is to support massive numbers of
heterogeneous devices
Figure 10: NB-IoT deployment [44]

Figure 11: Partial Deployment of NB-IoT [44]

and different applications with changing requirements such as latency,
reliability and bandwidth. The heterogeneous nature of the IoT makes it
a challenge to establish a common structure. Security and privacy are
another problem for IoT users. The same issue applies to NB-IoT because
of its susceptibility to eavesdropping attacks over the open radio
channel. In addition, the constrained design and limited transmission
capabilities of NB-IoT devices make it difficult to run a feasible
security system based on complex calculations, as most techniques
require sufficient battery and processing power for message exchange.
Legacy energy management and control technology over the complicated
NB-IoT channel is also a major issue for energy efficiency [45] [46].
Another issue with NB-IoT is that, with larger packet sizes, the S11-U
interface could crash.

2.4.2 Protocol Vulnerabilities in IoT


IoT uses a variety of network communication protocols to work with
heterogeneous services. Wi-Fi, Bluetooth, Z-Wave, IEEE 802.15.4 and LTE
are examples of IoT protocols. There are also some basic communication
technologies in IoT, such as Ultra-Wide Bandwidth (UWB), Near Field
Communication (NFC) and RFID. Before these protocols are used, the
potential threats in IoT protocols should be known. The basic IoT
protocols are presented in Figure 12, and their vulnerabilities are
explained below.

Figure 12: IoT Protocol Stack

1. CoAP (Constrained Application Protocol)

The lightweight CoAP is an application layer protocol defined by the
IETF for constrained devices. In the client-server architecture, this
protocol maps to HTTP semantics, including GET, POST, PUT and DELETE.
CoAP runs over UDP with low overhead, and multicast communication is
also supported. It has two main message types, CON (Confirmable) and
NON (Non-Confirmable), with a total length of 1400 bytes and a header
length of 32 bits [47].

Datagram Transport Layer Security (DTLS), adopted by CoAP, is necessary
for the protection of messages. Although DTLS adds a layer of security
for the application layer, there are ongoing discussions about the
limitations of DTLS. Issues of DTLS in CoAP include a large DTLS header
that does not fit into the IEEE 802.15.4 MTU, an expensive handshake,
incompatibility with the CoAP intermediary mode, and costly computation.

The test performed in [48] shows that DoS attacks can be launched by
repeatedly sending CoAP requests to a border router in a smart home. As
a result, 75% of the legitimate packets are lost when malicious
requests are sent every 500 ms, and further packets are destroyed by
CoAP flooding. When the protected mode of the transceivers is enabled,
however, no effect on communication is observed under the DoS attack.
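To make the 32-bit fixed header concrete, the following Python sketch packs and unpacks the CoAP header fields defined in RFC 7252 (version, type, token length, code, message ID). The helper names are illustrative, not part of any CoAP library.

```python
import struct

# CoAP message types and request codes from RFC 7252 (illustrative subset)
CON, NON, ACK, RST = 0, 1, 2, 3
GET, POST, PUT, DELETE = 1, 2, 3, 4

def encode_coap_header(msg_type, code, message_id, token=b""):
    """Pack the 32-bit CoAP fixed header: Ver(2) | Type(2) | TKL(4) | Code(8) | Message ID(16)."""
    version = 1
    tkl = len(token)
    if tkl > 8:
        raise ValueError("CoAP token length must be 0-8 bytes")
    first_byte = (version << 6) | (msg_type << 4) | tkl
    # '!' = network byte order; B = 8-bit field, H = 16-bit message ID
    return struct.pack("!BBH", first_byte, code, message_id) + token

def decode_coap_header(data):
    """Unpack the fixed header back into its fields."""
    first_byte, code, message_id = struct.unpack("!BBH", data[:4])
    return {
        "version": first_byte >> 6,
        "type": (first_byte >> 4) & 0x3,
        "tkl": first_byte & 0xF,
        "code": code,
        "message_id": message_id,
    }

# A CON GET request with message ID 0x1234 and no token:
header = encode_coap_header(CON, GET, 0x1234)
fields = decode_coap_header(header)
```

Because the whole fixed header fits in four bytes, request floods like the one described above are very cheap for an attacker to generate.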

2. MQTT (Message Queuing Telemetry Transport)

MQTT is a messaging protocol that connects devices to middleware and
applications using a broker-based publish-subscribe model. In the
publish-subscribe process, messages are transmitted from a publisher to
a subscriber based on the message topic. MQTT operates over TCP and
transfers messages at three QoS levels. The publisher, the subscriber
and the broker are the three MQTT elements. For IoT and M2M (Machine to
Machine) communication, MQTT is an appropriate messaging protocol,
requiring little processing capacity, memory and transfer speed [49].

Figure 13: MQTT in IoT [50]

In the MQTT system, Ahmad et al [51] categorized IoT threat agents into
four groups:
• Malicious Internal User: A client who has legal access to the device
and uses it for malicious purposes. A malicious user who gains access
to the MQTT broker may also open the door to further attacks.

• Curious User: A client or analyst in the IoT environment who wants to
find holes and vulnerabilities.

• Bad Manufacturer: A maker who leaves an open portal through which
aggressors can obtain information about the devices or users, or access
the device remotely. To launch an attack or collect sensitive
information, the adversary can then inject malicious code into the MQTT
client or broker.

• External Attacker: A skilled programmer who performs malicious
activity on any part of the MQTT-based framework.

Attackers in the MQTT-based IoT environment can launch DoS, spoof
identities, disclose information, elevate privileges and tamper with
data. Disturbing the broker service can cause DoS within the MQTT
system, since the main task of the broker is to deliver messages from
the publishers to the subscribers. Attackers can also trigger DoS by
exhausting the MQTT client and broker, sending messages larger than
256 MB, which is MQTT's maximum payload size. In addition, MQTT runs on
top of TCP, so TCP attacks such as bandwidth consumption and SYN floods
contribute to DoS attacks. An unsecured MQTT broker can give rise to a
variety of IoT vulnerabilities, for example the disclosure of all data
or confidential information to the public, the modification of the data
stored in the broker, or the launch of a DoS [52], giving the aggressor
a chance to exploit a compromised broker. Despite the fact that MQTT
relies on SSL/TLS for its security component, it is costly to enforce
this on constrained devices [53].
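As a minimal sketch of a countermeasure to the payload-exhaustion DoS described above, a broker can reject PUBLISH payloads well below the protocol maximum of 268,435,455 bytes (~256 MB). The `accept_publish` function and the 1 MB broker limit are illustrative assumptions, not part of any real broker.

```python
# MQTT encodes the remaining length in at most 4 variable-length bytes,
# giving a maximum of 268,435,455 bytes (~256 MB).
MQTT_MAX_PAYLOAD = 268_435_455

def accept_publish(payload: bytes, broker_limit: int = 1_048_576) -> bool:
    """Reject PUBLISH payloads that exceed either the protocol maximum or a
    stricter broker-configured limit (1 MB here), a simple DoS mitigation."""
    if len(payload) > MQTT_MAX_PAYLOAD:
        return False  # malformed by the MQTT specification itself
    return len(payload) <= broker_limit

# A small telemetry message passes, an oversized blob is dropped:
assert accept_publish(b'{"temp": 21.5}')
assert not accept_publish(bytes(2 * 1_048_576))
```

Real brokers expose this idea as a configurable maximum message size, which bounds the memory an attacker can force the broker to allocate per message.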
4. AMQP (Advanced Message Queueing Protocol)

A lightweight M2M communication protocol that supports the
publish-subscribe architecture as well as request-response. The AMQP
system provides an "exchange" through which publishers and subscribers
find each other. The subscriber creates a "queue" and attaches it to
the "exchange"; through this "binding", exchanged messages are routed
to the queue. AMQP, like MQTT, runs over TCP and uses SSL/TLS and SASL
for security. It is connection-oriented and is known as a strong and
stable protocol [54]. Although AMQP uses SSL/TLS-based transmission
encryption over TCP, there are still vulnerabilities that an attacker
can use to intercept IoT communication. Because TCP/IP is the
underlying protocol of AMQP, attackers have already misused TCP
vulnerabilities in many ways, and AMQP-based IoT frameworks are
therefore also susceptible [55].
5. XMPP (Extensible Messaging and Presence Protocol)

This protocol is based on XML (Extensible Markup Language) and provides
real-time communication. XMPP can be configured as client-server and
runs on the TCP/IP stack. Since XMPP is based on XML, it can be used in
a number of customized applications, such as real-time reporting,
notification, and communication between devices, objects, actuators and
sensors. XMPP uses SASL and TLS [56] for secure authentication and
encryption purposes.

In [53], the authors state that XMPP fails to provide end-to-end
secured communication for the deployment and implementation of IoT.
Unprotected XMPP is defenseless against attacks such as password
sniffing, unauthorized access to servers, injection, deletion, replay
and further attacks.

6. ZigBee

ZigBee is a set of communication protocols for transferring data at a
low rate in short-range wireless networks. The ZigBee standard was
developed by the hundreds of member companies of the ZigBee Alliance
[57]. ZigBee adopts the Physical Layer and Medium Access Control layer
protocols of IEEE 802.15.4. ZigBee devices mostly operate in the
868 MHz, 915 MHz and 2.4 GHz frequency bands, and the maximum data rate
of ZigBee devices is 250 kbit/s.

ZigBee devices mainly run on battery power and combine low power
consumption, a low data rate and low cost, with battery life as the
main requirement. The total time that ZigBee applications spend
transmitting over the wireless devices is very limited; most of the
time the devices are in power-saving mode, also called sleep mode. Due
to this feature, ZigBee devices can retain their battery life for
several years [58].

Figure 14: Some Applications in ZigBee [59]

An example of a ZigBee application is a home-based patient monitoring
system, in which the patient's heart rate and blood pressure are
monitored with wearable devices. The wearable devices connect to
different sensors via ZigBee. All patient data is transmitted to a
local server, i.e. a local personal computer, where the data is
initially analyzed inside the patient's home. For the final decision,
the data is transferred to the patient's physician for further
analysis [58].

A transmission in a wireless network can be received by any device,
whether Bluetooth-enabled or otherwise. If an intruder's device is in
the network, it will overhear all the sensitive information in the
transmitted messages. The confidentiality problem is solved by applying
an encryption algorithm to the messages; IEEE 802.15.4 encrypts the
ongoing messages with the Advanced Encryption Standard (AES) [58].
7. 6loWPAN

6LoWPAN (IPv6 over Low-Power Wireless Personal Area Network) is
designed by the IETF for low-power and lossy networks on top of the
physical and MAC layers of the IEEE 802.15.4 standard. 6LoWPAN devices
are characterized by a small bit rate, low-power computation and
low-cost memory. The authors in [60] investigated the discovery of
vulnerabilities in 6LoWPAN with a fuzzing methodology using Scapy.
Fuzzing is a highly automated method that is widely used to detect
unexpected errors and flaws in network protocols that could be misused
by an attacker.

The authors of [61] show that an attacker can misuse the 6LoWPAN
fragmentation and routing mechanism in order to prevent the correct
processing of the legitimate fragments of a packet. They considered
that constrained devices with tens of kilobytes of RAM, a few MHz of
computational power and low-power wireless communication over 6LoWPAN
are defenseless against the following attacks. In a fragment
duplication attack on the 6LoWPAN layer, the receiver is unable to
distinguish a legitimate fragment from a spoofed one and has to deal
with all the fragments it receives for the same IPv6 packet, identified
by the 6LoWPAN datagram tag and the MAC address of the receiver. For
example, during the handshake of the DTLS protocol the attacker can
inject spoofed FRAGN fragments into a legitimate 6LoWPAN packet: the
attacker crafts a spoofed fragment with a random payload and adds this
fragment to the original packet, thereby blocking any packet into which
such a fragment is injected.

Another type of attack in 6LoWPAN is the buffer reservation attack, in
which the attacker targets the memory of IoT devices. In contrast to
the previous attack, the target does not need to discriminate between
real and attacking fragments. The attacker sends a single FRAG1 with an
arbitrary payload to launch a buffer reservation attack and directs it
at the target node. If the target node's buffer is currently
unoccupied, the target node accepts the FRAG1 and reserves the buffer
to reassemble the attacker's fragmented packet. The attacker then
either does not send the remaining FRAGNs at all, or keeps the buffer
occupied by intermittently sending FRAGNs just within the reassembly
timeout of the target node. As a result, no other fragmented packet can
be processed. For both the buffer reservation and the fragment
duplication attack, the attackers identify their target node via the
addressing information used in 6LoWPAN [61].
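A minimal sketch of the receiver-side view of the fragment duplication attack: if two fragments of the same datagram (identified here simply by source address and datagram tag) claim the same offset with different payloads, one of them must be spoofed and the whole datagram can be dropped. The class and field names are illustrative, not taken from a real 6LoWPAN stack.

```python
class ReassemblyBuffer:
    """Simplified 6LoWPAN-style reassembly buffer with conflict detection."""

    def __init__(self):
        # (src, tag) -> {offset: payload}
        self.datagrams = {}

    def add_fragment(self, src, tag, offset, payload):
        frags = self.datagrams.setdefault((src, tag), {})
        if offset in frags and frags[offset] != payload:
            # Conflicting duplicate at the same offset: fragment duplication
            # attack suspected, so discard the whole datagram.
            del self.datagrams[(src, tag)]
            return "discarded"
        frags[offset] = payload
        return "accepted"

buf = ReassemblyBuffer()
assert buf.add_fragment("node-A", 7, 0, b"FRAG1-data") == "accepted"
assert buf.add_fragment("node-A", 7, 8, b"FRAGN-data") == "accepted"
# A spoofed fragment at an already-seen offset with different content:
assert buf.add_fragment("node-A", 7, 8, b"evil-data!") == "discarded"
```

Discarding on conflict bounds the attacker's gain to a denial of that one datagram; it does not defend against the buffer reservation attack, which fills the buffer with a single well-formed FRAG1.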

Another recent study [62] classifies the security risks of 6LoWPAN into
hop-to-hop and end-to-end attacks. Hop-to-hop attacks on 6LoWPAN
systems are triggered by compromised internal malicious nodes; this
form of attack targets radio hops, physical links and the route
discovery process. Tampering, battery exhaustion, wormhole, jamming,
spoofing and selective forwarding attacks are enabled by unprotected
equipment and the attacker's ability to control the 6LoWPAN layer.
End-to-end attacks on IPv6-based WSN systems are caused by unauthorized
external hardware, and attacking an end-to-end link is harmful to the
whole network. End-to-end security is necessary because the hardware
performs fragmentation and reassembly of IPv6 packets and must prevent
packet modification while the fragments are reassembled. This class of
attack takes place between the IPv6 end host and the 6LoWPAN border
router, for example by overwhelming the edge router with large amounts
of traffic or by impeding communication through the injection of
incorrect messages into the border router.
8. 802.15.4 Standard

The IEEE 802.15.4 physical layer standard is also becoming popular in
IoT development due to its low power use. However, reliable data
communication is a major challenge in a low-power protocol. Various
approaches have been implemented to provide reliable communication over
the different layers of the protocol stack. These approaches include
physical layer and upper layer encryption as well as
information-theoretic security, which can be achieved through physical
layer security strategies. Physical layer encryption strategies rely on
the information modulation. Upper-layer security schemes, such as
end-to-end encryption, are shared across different protocols, but they
do not prevent risks and attacks such as flood attacks, DoS and traffic
inspection [63]. The 802.15.4 MAC layer offers security services such
as confidentiality and integrity; however, these services come at the
cost of power consumption, which is not easy for 802.15.4. For the
transmission of secure data, a steganography method has been proposed
[64]; the low data rate over the covert channel is a big drawback of
this method.

2.5 What is DDoS


DDoS stands for 'Distributed Denial-of-Service'; it is a kind of DoS
(Denial-of-Service) in which the intruder performs the attack from
several locations and different sources simultaneously. DoS attacks
mostly aim at degrading or shutting down a specific resource. One
method of operation is to exploit a system deficiency and cause a
processing failure or the saturation of system resources; another is to
flood and monopolize the targeted system, thereby denying its use to
anyone else [18]. The denial of access to, or control over, the
infected device is what characterizes and categorizes an attack as DoS.
It is important to remember that the attacker has to install agent code
on any resource or device that supports it, in order to have an
infiltration point into the targeted system, regardless of whether
these are IoT devices, servers, network components or mobile
devices [18].

Figure 15: DDoS Attack [65]

DDoS is carried out by simultaneously sending a critical number of
requests via botnets and compromised IoT devices to overwhelm the
computing resources of the target (transmission capability and
bandwidth). Bots can be either seemingly legitimate but malicious
clients or compromised clients of the attack [66].

2.5.1 DDoS attacks, an overview


The authors of [67] note that battery, computation, memory and radio
transmission capabilities are limited in IoT devices. It is therefore
not easy to enforce security measures that involve a heavy
communication stack and substantial computing resources. IoT devices,
services and supporting networks cannot withstand attacks such as DDoS,
spoofing, jamming, man-in-the-middle and privacy attacks. The authors
also suggest the use of machine learning techniques, which are
important for finding vulnerabilities and security threats in IoT.

As most of the threats to IoT come from insecure IoT devices, a
network-based technique for detecting infected IoT devices has been
suggested in [18]. The proposed method was developed by studying the
two malware families Mirai and BASHLITE. With the help of tools
accessible to an ISP, such as NetFlow, DNS capture analyzers and packet
capturing, it is possible to detect and analyze the malware. The main
objective of the authors is to discover the common properties,
techniques and malware phases that reveal the weaknesses of IoT. The
four stages that every IoT malware follows in its life cycle are given
in Figure 16.
• Scanning: Scanning is carried out by scan engines to detect
vulnerable hosts. Random IPv4 subnets are checked, and most of the time
port 23, running the Telnet daemon, is the target; often port 2323 and
other ports running different services are scanned as well.
• Attacking: This is a very common property of IoT malware. Most of the
time the attacker tries default usernames and passwords such as
"admin/admin" or "root/root" to attack the IoT devices. The attacks
also exploit these devices through the TR-064 and TR-069 services.
• Infection: This is conducted in a number of ways, such as over HTTP
(wget), TFTP [68] and Telnet. The malware binary, built with a C
compiler, is downloaded in this way and infects the scanned device.
• Abuse: DDoS attacks are carried out by the IoT botnets. SYN and
SYN/ACK floods, TCP and UDP floods, and HTTP attacks make up the bulk
of the DDoS attacks.

Figure 16: Attack Life Cycle [18]
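The "Attacking" stage above can be illustrated defensively: Mirai-style malware tries a short dictionary of factory credentials, so checking a device's configured login against such a list flags trivially guessable accounts. The credential list below is a small illustrative sample, not the actual Mirai dictionary.

```python
# A small sample of factory default logins of the kind tried by IoT
# malware; real attack dictionaries hold dozens of such pairs.
DEFAULT_CREDENTIALS = {
    ("admin", "admin"),
    ("root", "root"),
    ("admin", "1234"),
    ("root", ""),
}

def uses_default_credentials(username: str, password: str) -> bool:
    """Return True if the account matches a known factory default."""
    return (username, password) in DEFAULT_CREDENTIALS

assert uses_default_credentials("admin", "admin")
assert not uses_default_credentials("admin", "S7rong!pass")
```

An operator or device vendor can run such a check at provisioning time to force a password change before the device is exposed to the network.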

According to the authors, attacks on compromised IoT devices follow the
same life cycle and can be mitigated at the ISP level by the proposed
system. Attackers can also issue DDoS attacks from the cloud by leasing
virtual machines, as these have higher processing capacity and cannot
easily be traced back. DDoS protection can be applied in both
directions, i.e. at the source or at the destination end. A drawback of
destination-side defense is that attacks are only recognized once they
reach the target. Source-side defense frameworks such as D-WARD [69],
MULTOPS [70] and MANAnet [71], in contrast, compare incoming and
outgoing traffic in order to identify DDoS attacks before they reach
the destination.

The authors in [72] proposed a machine learning DDoS detection system
that includes one pre-trained module to detect suspicious activities inside
virtual machines and another online learning module to revise the pre-
trained module. The system is tested against TCP SYN, ICMP, DNS re-
flection and SSH brute-force attacks with nine separate machine learning
algorithms and hand-selected machine learning features. Supervised algo-
rithms such as Naı̈ve Bayes, SVM and Decision Tree reach 93 percent ac-
curacy, although unsupervised machine learning methods are not examined.

Automated machine learning-based defense against DDoS attacks is im-
plemented in [73]. Instead of relying on source- or destination-side detec-
tion, the authors propose an automated defense system that monitors re-
source utilization and uses Neural Networks for anomaly detection, report-
ing on resource-usage properties to detect the anomalies.

As there is no consensus on what traffic should be considered anomalous,
typical features of DDoS attacks, such as the layer 3 IP header, the layer 4
TCP header and layer 7 HTTP fields, are extracted as machine learning
features. The machine learning techniques can then distinguish attacks on
the basis of these three feature groups under both normal and abnormal
conditions. The expectation is that the trained algorithm will detect and
drop harmful packets of the kind that previously damaged the system. The
NCTUNS network simulator was used to test this method.

2.5.2 DDoS Direct and Indirect attacks


As shown in Figure 17, a DDoS attack can be carried out in one of two
ways: directly or indirectly. In a direct attack, the attacker explicitly sends
a bundle of packets straight to the targeted machine. In an indirect attack,
the intruder instead resorts to a reflector server and attacks the victim
through it by spoofing the source IP. The attacker sends packets carrying
the victim's spoofed IP to the reflector server, and the reflector server sends
its responses to the target. Also, in a direct attack the victim receives pack-
ets with the same payload the attacker sends, while in an indirect attack
the reflecting server processes a request from the attacker and sends an
answer to the victim. As an example, if the intruder sends 1 Mb/s, the
attacker can use an
Figure 17: Direct and Indirect Attacks [74]

increase in the number of packets and/or the transfer payload, and the re-
flector can deliver far more packets to the target victim than the attacker
originally sent [74]. In a complex reflective attack, the attacker can addi-
tionally use an expert center, a bot called a handler, that controls 100 zom-
bies in a botnet, as can be seen in Figure 18.

Figure 18: Complex Reflection Attack [74]
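The reflection scenario above reduces to simple arithmetic. The sketch below computes a bandwidth amplification factor (BAF); the request and response sizes are illustrative assumptions, not measurements from the thesis.

```python
# Bandwidth amplification factor (BAF) = reflected response size / request size.
# The request/response sizes below are illustrative, not measured values.
request_bytes = 64       # small spoofed query sent to the reflector
response_bytes = 3000    # larger response the reflector sends to the victim

baf = response_bytes / request_bytes
attacker_rate_mbps = 1.0                       # the 1 Mb/s example from the text
victim_rate_mbps = attacker_rate_mbps * baf    # traffic arriving at the victim
print(f"BAF = {baf:.1f}x, victim receives ~{victim_rate_mbps:.1f} Mb/s")
```

The same calculation explains why reflectors with large responses (e.g. some DNS or NTP deployments) are attractive to attackers.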

2.5.3 How attackers launch a DDoS attack


At first, attackers discover vulnerabilities in one or more IoT devices and
execute a malicious program there in order to launch a DDoS attack. The
attacker then forms a huge collection of geographically dispersed zombies,
called a botnet. Each zombie set requires a handler, which can be one of a
huge number of software packages distributed over the internet. Since the
handlers hold the data on the compromised zombies, they mediate directly
between attackers and zombies. When the attack is launched, the attackers
direct it to the zombie handlers, who transmit the attack command to all
the zombies. The zombies then attack the target system. Because of the
nature of these attacks, the management and filtering of DDoS attacks
triggered with IP spoofing remains difficult to handle [75] [76] to this day.

2.5.4 DDoS attack types
As shown in Figure 19, DDoS attacks are primarily divided into three
groups, since an attack can be deployed on different layers. At each layer,
the attacker exploits the weakness of individual ports or protocols. For ex-
ample, in a UDP flood the attacker overruns random ports of the target
host with UDP packets. The host checks whether any application is lis-
tening on each port and, finding none, replies with an ICMP 'Destination
Unreachable' error. The consumption of these host resources eventually
makes the host unavailable to its legitimate users. Protocol attacks such
as Ping of Death (PoD) and Smurf rely on sending harmful pings to a com-
puter by means of the Internet Protocol [77].

Figure 19: DDoS Attack Types [75]

The attacker can also use ping scanning to find possible victims; TCP
SYN or ACK, UDP and ICMP probes are the most common ping scans.
ICMP scans succeed when the firewall and ACL rules are less restrictive
towards LANs or internal IP addresses. UDP scans are useful when unso-
licited UDP services and outbound ICMP traffic are not blocked by the fire-
wall. In the TCP case, scans succeed against a stateless firewall that does
not reject random ACK packets [78].

2.5.5 UDP attack


The attacker sends huge numbers of UDP packets to the target victim in
a UDP attack. The host system attempts to locate an application listening
on each targeted port. If no service or program is running on that port, an
ICMP 'Destination Unreachable' message is returned to the attacker's
source address. Since the attacker continually sends UDP packets and the
victim has to keep responding with ICMP unreachable messages, the net-
work connection becomes overloaded and, in the long run, the victim ma-
chine is unable to reply to its legitimate users. Due to the stateless nature
of the UDP protocol, attackers can easily launch UDP flood attacks while
spoofing their addresses. However, a few operating systems have the abil-
ity to mitigate UDP floods by limiting the number of responses [79].

2.5.6 Detection of DDoS Traffic


Various techniques have been used to detect DDoS attacks by classify-
ing network traffic, such as [80] [81] [82] [83]. Improving QoS, network
management and network security are some of the motivations for traffic
classification. The classification can operate on either unidirectional or
bidirectional flows of activity. A unidirectional flow is the ordered packet
stream from one host to another identified by a five-tuple: the source IP
and port, the destination IP and port, and the transport layer protocol.
A bidirectional flow instead considers the traffic sent and received in both
directions between the hosts.
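The difference between the two flow definitions can be sketched in a few lines of Python; the packet records and field layout below are hypothetical stand-ins for captured traffic.

```python
from collections import defaultdict

# Hypothetical packet records: (src_ip, src_port, dst_ip, dst_port, proto, size)
packets = [
    ("10.0.0.1", 5050, "10.0.0.9", 80, "TCP", 120),
    ("10.0.0.1", 5050, "10.0.0.9", 80, "TCP", 1400),
    ("10.0.0.9", 80, "10.0.0.1", 5050, "TCP", 60),   # reply direction
]

# Unidirectional flows: key on the exact five-tuple.
uni = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, sport, dst, dport, proto, size in packets:
    key = (src, sport, dst, dport, proto)
    uni[key]["packets"] += 1
    uni[key]["bytes"] += size

# Bidirectional flows: sort the endpoints so both directions share one key.
bi = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, sport, dst, dport, proto, size in packets:
    key = (proto,) + tuple(sorted([(src, sport), (dst, dport)]))
    bi[key]["packets"] += 1
    bi[key]["bytes"] += size

print(len(uni), len(bi))  # 2 unidirectional flows, 1 bidirectional flow
```

Tools such as CICFlowMeter, used for the dataset later in this thesis, compute many per-flow statistics on top of exactly this kind of grouping.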

Pattern matching is an instrument that identifies attacks by recognizing
the signatures of known attacks; pattern-matching systems are often used
for virus detection. Snort, which detects attacks by their signatures, is one
of the better-known detection systems [84]. Beyond signatures, payload in-
spection and machine learning-based behavior detection are the two feasi-
ble approaches for DDoS detection.

2.6 Machine Learning


Several machine learning techniques have been used to detect DDoS at-
tacks. Each approach distinguishes between the distinctive DDoS attacks
with different results, depending on the data properties and the algorithm.
For example, Manjula et al. [85] show that Fuzzy C-Means unsupervised
clustering performs much better than other algorithms on the basis of
twenty-three extracted attributes.

A one-of-a-kind solution with a range of features that recognizes all kinds
of DDoS attacks is still not available. Due to the massive amount of net-
work data, it is difficult to recognize whether the data is generated by le-
gitimate users or by a real-time attack. The tests of Peter et al. [75] show
that the Long Short-Term Memory Recurrent Neural Network (LSTM RNN)
deep learning approach gives impressive results for detecting a DDoS at-
tack in a network. The choice between supervised and unsupervised ma-
chine learning algorithms depends on specific parameters, such as the vol-
ume and structure of the data and the form of DDoS. Five machine learn-
ing approaches for detecting DDoS attacks in IoT are described below.

Figure 20: Machine Learning [86]

2.6.1 Supervised Learning


Supervised machine learning is a technique in which we tell the algorithm
what conclusions it can draw: the possible outcomes are already known
and the training data is already labeled. K-Nearest Neighbors (KNN),
Support Vector Machine (SVM), Naı̈ve Bayes, decision tree and logistic
regression algorithms are used for IoT network intrusion detection and the
classification of spoofing and DDoS attacks [67].
1. K-Nearest Neighbors (KNN): KNN [87] [88] can be an effective
and robust classification algorithm. KNN is also known as an
'instance-based learner', which means that the algorithm memorizes
the training examples rather than learning a model. As an example,
when we provide data as input, after splitting the dataset into a
testing and a training part it gives us an output. KNN is a machine
learning paradigm that builds on a labeled dataset of samples (x, y)
and predicts the relationship between x and y. The main purpose is
to learn a function h : x → y that predicts, for an unseen observation
x, the target h(x).

The KNN classifier operates with a given positive integer k, a dis-
tance metric d and the unseen observation x:

(a) It goes through the whole dataset and computes the distance d
between x and each of the training observations. The k points in
the training data closest to x form the set A. The value of k is
usually chosen as an odd number to avoid tie situations.
(b) It then estimates the probability of each class from the class la-
bels of the observations in A.

Figure 21: An example of KNN classification [89]

The decision boundary of the KNN classifier depends on the vari-
able k. The value of k must be chosen as a hyperparameter by the
designer or data scientist performing the machine learning task, and
it should be the integer that best fits the dataset. A small k makes
the classifier follow the training data closely, while a higher k leads
to smoother decision boundaries and more tolerance to anomalies,
since more voters contribute to each prediction [88].
2. Decision Tree: The Decision Tree is a well-known machine learning
algorithm used to classify unknown data from trained data. A decision
tree may be either a binary or non-binary tree that includes a root
node, internal nodes and leaf nodes. All observations enter at the root
node, and each of the internal nodes holds a test on a feature. The
classification is made using a top-down recursive approach and, as
shown in figure 22, the category of the reached leaf node is returned
as the result. The C4.5 decision tree algorithm selects the splitting
attributes based on the information gain ratio and extends ID3 (Iter-
ative Dichotomiser 3), which can be used to build the decision tree
from the dataset [90].

By navigating the tree, the decision tree algorithm classifies new
or unseen cases, and it has been used to detect DDoS attacks. In [91],
the authors used a decision tree algorithm to identify the attack and
trace the location of the attacker.
3. Support Vector Machines (SVM): In machine learning, support
vector machines [92] are supervised learning models with associated
learning algorithms that analyze data used for classification and re-
gression analysis. Given a set of training examples,
Figure 22: Structure of Decision Tree [90]

each marked as belonging to one of two classes, an SVM training
algorithm builds a model that assigns new examples to one class or
the other, making it a non-probabilistic binary linear classifier (al-
though methods such as Platt scaling exist to use SVM in a proba-
bilistic classification setting). An SVM model is a representation of
the examples as points in space, mapped so that the examples of the
different classes are separated by a clear gap that is as wide as possi-
ble. New examples are then mapped into that same space and pre-
dicted to belong to a class depending on which side of the gap they
fall.

Figure 23: An example of a separable problem in 2-dimensional space [92]

In addition to performing linear classification, SVMs can effi-
ciently perform a non-linear classification using what is known as the
kernel trick, implicitly mapping their inputs into high-dimensional
feature spaces [93].
4. Naı̈ve Bayes Classifier: Based on Bayes' theorem, Naı̈ve Bayes is
a simple probabilistic classifier that scales well to large datasets [94].
When the features within the dataset are independent of each other,
the Naı̈ve Bayes model is easy to build and provides speedy classifi-
cation. For example, where the purpose of classification is to decide
whether incoming packets are DDoS or normal, Naı̈ve Bayes performs
very well in such binary cases.

Figure 24: An example of a Naı̈ve Bayes model [95]

With the equation below, Bayes' theorem recomputes the class prob-
ability from the features, and the classifier assumes that the value of
a given attribute is independent of every other attribute given the
target class [94][96]. Naı̈ve Bayes learns by calculating the likeli-
hoods from the training data.

P (B|A) × P (A)
P (A|B) = (1)
P (B)
Equation 1: Bayes’ theorem

There are two assumptions behind the usage of Naı̈ve Bayes: that
the features are conditionally independent and that they follow a
given distribution; categorical values can also lead to over-sensitivity.
In practice the features often depend on each other, but the results of
Naı̈ve Bayes are still satisfactory.
5. Logistic Regression: The logistic model [97] is a widely used sta-
tistical model that, in its basic form, uses a logistic function to model
a binary dependent variable. In regression analysis, logistic regres-
sion estimates the parameters of a logistic model; it is a form of bi-
nomial regression.
Mathematically, a binary logistic model has a dependent variable with
Figure 25: An example of regression in a Gaussian distribution [98]

two possible values, such as pass/fail, win/lose, alive/dead or

healthy/sick; these are represented by an indicator variable, where
the two values are labeled "0" and "1". In this model, the log-odds
for the value labeled "1" are a linear combination of one or more in-
dependent variables ("predictors"); the independent variables can
each be a binary variable (two classes, coded by an indicator vari-
able) or a continuous variable (any real value).

The corresponding probability of the value labeled "1" can vary

between 0 and 1, hence the naming; the function that converts log-
odds to probability is the logistic function. The unit of measurement
for the log-odds scale is called a logit. Analogous models with a dif-
ferent sigmoid function instead of the logistic function can also be
used, for example the probit model; the defining characteristic of the
logistic model is that increasing one of the independent variables
multiplicatively scales the odds of the given outcome at a constant
rate, with each independent variable having its own parameter; for a
binary independent variable this generalizes the odds ratio [97].
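The five supervised algorithms above can be sketched side by side with Scikit-learn (the tool used later in this thesis). The synthetic dataset below is only a stand-in for extracted flow features, not the thesis dataset, so the accuracies are illustrative.

```python
# Sketch: the five supervised classifiers named above on a synthetic
# two-class dataset (a stand-in for normal vs. DDoS flow features).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),   # odd k avoids ties
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                    # train on labeled data
    scores[name] = model.score(X_test, y_test)     # accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

Accuracy on a synthetic set says little about real DDoS data; the point is only that all five classifiers share the same fit/score interface, which is what makes the comparison in later chapters straightforward.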

2.6.2 Unsupervised Learning


The idea behind unsupervised learning is that the machine learns from
the raw data itself and discovers groupings in the data. Some of the unsu-
pervised methods used to detect DDoS attacks are Fuzzy C-Means and
multivariate correlation analysis, among others.

1. Fuzzy C-Means: The fuzzy c-means method is used for pattern
recognition and takes the form of clustering. Cluster analysis places
data points that are similar in the same cluster, while items that
differ are placed in different clusters. Clusters are recognized by
means of similarity measures, which include distance, connectivity
and intensity. Different similarity measures may be chosen based on
the data or the application [99].
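A minimal NumPy sketch of the fuzzy c-means iteration (on synthetic blobs, not the setup of [85]): instead of a hard label, every point receives a membership degree in each cluster, and the membership and center updates alternate until the clusters stabilize.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs as stand-in data.
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

c, m, n = 2, 2.0, len(X)              # clusters, fuzzifier, samples
U = rng.random((c, n))
U /= U.sum(axis=0)                    # memberships per point sum to 1

for _ in range(50):
    Um = U ** m
    # Cluster centers: membership-weighted means of the points.
    centers = Um @ X / Um.sum(axis=1, keepdims=True)
    dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-10
    # Membership update: u_ij proportional to 1 / d_ij^(2/(m-1)).
    U = 1.0 / dist ** (2 / (m - 1))
    U /= U.sum(axis=0)                # renormalize per point

labels = U.argmax(axis=0)             # hard labels, for inspection only
print(np.bincount(labels))            # two clusters of 20 points each
```

Unlike hard k-means, the membership matrix U keeps a graded assignment, which is what makes fuzzy clustering attractive for ambiguous traffic records.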

2.6.3 Deep Learning


A deep learning method automatically discovers features at different lev-
els, where each level of representation is built from the previous level. Re-
search indicates that deep learning has a high potential for anomaly detec-
tion within SDN [100].

2.6.4 Reinforcement Learning


Reinforcement learning is another form of machine learning that needs
no prior training experience; the system learns the optimal behavior through
trial and error. Algorithms used to improve malware detection, IoT au-
thentication learning and anti-jamming transmission [101] [67] include
Markov Decision Processes (MDPs), Q-learning, Dyna-Q and Post-Decision
State.

2.7 Related Work


Various studies have surveyed the validity and accuracy of the available
IDS datasets. Malowidzki et al. [102] assess the current situation of pub-
licly available IDS datasets and discuss which types of dataset can serve
as the basis for a comprehensive dataset. The authors propose changes in
the arrangement of the data and highlight the aspects that result in a
high-quality, reliable dataset [102]. In addition, Koch et al. [103] provide
an evaluation of IDS datasets drawn from 13 distinct sources, analyzing
them along 8 dataset attributes. The authors also analyze the correspond-
ing security systems and investigate their shortcomings.

The study by Thomas et al. [104] dissects a single dataset, DARPA,
takes a rather distinctive approach and explores its usage throughout in-
trusion detection research. The authors conclude that the DARPA dataset
captures attacks that are commonly seen in network traffic and can thus
be viewed as "the baseline of any research".

Subsequently, the work of Dhanabal and Shantharajah [105] discusses
the use of the NSL-KDD dataset to detect intrusions. The authors consider
the adequacy of the NSL-KDD dataset for identifying traffic irregularities
using different classification algorithms. The work applies the J48, SVM
and Naı̈ve Bayes algorithms to survey the dataset and concludes that J48
provides the best accuracy.

In comparison to other surveys, Sharafaldin et al. [106] provide a more
comprehensive review of the IDS datasets, which focuses more on provid-
ing a high-level overview. 11 IDS datasets are evaluated by the authors
and compared along 11 properties. In addition, Bhuyan et al. [107] briefly
describe and compare a broad range of approaches and frameworks for
network anomaly detection. The authors also examine methods for net-
work defense and the datasets that researchers may use for network
anomaly detection [107]. Similarly, Nisioti et al. [108] discuss 12 IDS
datasets and provide a fundamental assessment of unsupervised tech-
niques for intrusion detection.

The most popular datasets for machine learning and artificial intelli-
gence [109] are described by Yavanoglu and Aydos [110]. In addition, Ring
et al. [111] evaluated a variety of datasets. In order to dissect the impor-
tance of individual datasets for particular assessment situations, this work
identifies 15 distinctive properties. On the basis of these characteristics,
the authors provide an overview of existing datasets [111].

Based on the survey [112], DDoS attacks such as ICMP, UDP and SYN
flooding are detected by Probabilistic Packet Marking (PPM) and Deter-
ministic Packet Marking (DPM) techniques. The solutions are based on
traceback methods, entropy variation and Intrusion Detection and Preven-
tion Systems (IDS/IPS). Intrusion Detection and Prevention Systems use
signature- and anomaly-based detection techniques, and each of these
methods has a distinctive outcome. Another DDoS recognition technique
is suggested by the authors in [113] to resolve the limitations of statistical
and classification-based techniques in Socially Aware Networking (SAN)
using the Multi-Protocol Fusion Feature (MPFF). The MPFF strategy is
based on an autoregressive moving-average model, which effectively rec-
ognizes normal and DDoS behavior. On the other hand, some analysts try
to distinguish DDoS by various machine learning procedures and algo-
rithms. Summaries of some research papers that help in this study are
given below.

In [73], the attack types are HTTP and TCP/IP. The technique used is
traffic filtering based on resource-utilization monitoring and anomaly de-
tection; an artificial neural network is used and the attack is detected at
the destination side. In [114] the authors use machine learning techniques
for signature- and anomaly-based detection. The attack type is flooding on
OSI layers 3, 4 and 7; the algorithms used are C4.5, Naı̈ve Bayes and
K-Means, reaching an accuracy of 98.8 %, with detection taking place at
the source devices. Another work [115] uses supervised classification with
Naı̈ve Bayes and K-NN classifiers; the attack targets HTTP and the re-
ported accuracy is 90 %.

Most of the past work, with a few clear exceptions [105], takes a qualita-
tive approach to surveying the IDS datasets. In this regard, a critical part
of the background chapter focuses on assessing the consistency of the data
from a more analytical point of view using different criteria, some of which
are shared with Nehinbe's work [116].

This study also evaluates and analyzes a few new concepts in this field
that build on previous work. The DDoS attack is the main focus of this
study. Most of the past work concerns IDS dataset analysis in general, of
which DDoS is basically one type of attack, and although there have been
several DDoS instances in the IDS datasets that deserve investigation,
where DDoS attacks have been highlighted they have never been treated
as the most significant point. In addition, by taking a more explanatory
approach to the assessment, this research builds on the work of Dhanabal
and Shantharajah [105], which uses multiple machine learning algorithms
to survey datasets. There are two aspects that distinguish this work from
that of Dhanabal and Shantharajah [105], even though a similar method-
ology is used: here a single dataset is used, and it ends with a DDoS at-
tack.

In general, the purpose of this work is to give recommendations on the

given dataset by applying different supervised machine learning algo-
rithms. This work is a potential guideline for machine learning-based
characterization of DDoS behavior.

3 Approach
In Chapter 2, we discussed IoT security challenges in depth, where DDoS
attacks are among the most considerable attacks on IoT devices. In this
chapter we explain the machine learning-based methods that we propose
for DDoS detection in the cellular network, as well as the technologies and
tools that we use in our experimental setup for generating DDoS attack
traffic. We also describe the machine learning classifier implementations.

3.1 Objectives
The aim of this thesis is to detect DDoS attacks through machine learning
in the cellular network via the packets generated by IoT devices. We divided
our experiment into three phases. In the first phase we collect the data from
IoT devices to select and extract the features for machine learning classifi-
cation. In the second phase, given the collected data, we organize it in two
datasets, one for the normal situation and another with a DDoS attack. In
the third phase, the data is prepared with the Scikit-learn tool and labeled
in the correct format.

3.2 Design Phase


We follow the experimental research methodology [117] in our thesis. This
methodology consists of three phases, shown in figure 26; the figure shows
how we adapt this methodology for DDoS attack detection. The data col-
lection phase consists of traffic generation and subsequent capture; the
feature selection phase produces a segregation into datasets, one of them
featuring a DDoS attack. In the machine learning classifier phase, the
dataset feeds the classification model for DDoS and normal traffic. To fur-
ther explain how the proposed model detects DDoS attacks, we need to
describe how the packets travel from IoT devices to IoT application servers.

In a cellular core network the user packet generally transfers from the
eNodeB to the Serving Gateway (SGW). The packets are then forwarded
from the SGW via the S5/S8 interface towards the Packet Gateway (PGW).
The PGW forwards these packets towards the application server.

When an IP packet is generated by the device, to reach its destination
the packet is first forwarded to an eNodeB. The eNodeB wraps this packet
in another IP packet carrying a GTP header, then encapsulates it with an
IP/UDP header so it can be transmitted as an Ethernet frame to the Serv-
ing Gateway. Due to this packet encapsulation and changing of IP ad-
dresses, we must inspect inside the packet for DDoS detection.

Figure 26: Research Methodology

With the above packet inspection methodology, the example shown in

figure 27 illustrates how the packet travels in the core network. Let us
assume the following scenario: Bob wants to communicate with Alice
through the cellular network. The eNodeB network 172.16.10.0/16 connects
the eNodeB and the MME. On the other side, the 172.16.20.0/16 network
lies between the SGW and the PGW. The PGW assigns the globally routable
address 84.50.84.156 to Bob, which is translated to 172.16.30.12. When Bob
forwards the packet towards Alice, the packet reaches the eNodeB, which
forwards it toward the SGW. The SGW forwards it to the PGW, and finally
it reaches Alice [118]. If there is any exploit in this payload, we would be
able to detect it before it reaches the PGW or the final destination, which
is Alice in this example.

Figure 27: End to End Communication in Network

An interface called S11-U is used for small data transmission between the
SGW and the MME in CIoT (Cellular IoT) [119]. The GPRS tunneling pro-
tocol user plane (GTP-U) is used on S11-U. In GSM, UMTS, LTE and 5G
core networks, GTP-U is used in the user plane for transmitting user data
traffic. In the core network, when traffic is encapsulated and forwarded, the
GTP-based protocol runs over IP/UDP. GTP also provides mobility for the
mobile user by creating a tunnel between the eNodeB, SGW, MME and PGW.
The primary job of GTP-U is the maintenance of per-client tunnel paths for
IP packets, echo requests and error reporting. GTP has two main variants:
GTPv1-U, which is used for transporting user plane data, and GTPv2-C,
which is used for control plane signaling.
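The mandatory part of the GTPv1-U header is only 8 bytes (flags, message type, length, TEID) placed in front of the tunnelled inner IP packet. The sketch below packs such a header; the TEID and placeholder inner packet are illustrative values, not traces from the testbed.

```python
import struct

def gtpu_header(teid: int, payload: bytes) -> bytes:
    """Build the mandatory 8-byte GTPv1-U header for a tunnelled packet."""
    flags = 0x30      # version 1, protocol type GTP, no optional fields
    msg_type = 0xFF   # 0xFF = G-PDU, i.e. tunnelled user data
    # Length counts the payload (plus optional fields, absent here),
    # not the mandatory 8-byte header itself.
    return struct.pack("!BBHI", flags, msg_type, len(payload), teid)

inner_ip_packet = b"\x45\x00" + b"\x00" * 18   # placeholder inner IPv4 bytes
hdr = gtpu_header(teid=0x1A2B3C4D, payload=inner_ip_packet)
print(hdr.hex())  # 30ff00141a2b3c4d
```

A detector placed at the SGW must strip exactly this kind of outer IP/UDP/GTP-U wrapping before the inner IP packet can be inspected for DDoS features.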

When we look at GTP tunneling in the core network, both normal and
malicious packets look similar, and these packets are not inspected because
they are carried inside the GTP tunnel. If we want to detect a DDoS at-
tack, we must look into the GTP user plane (GTPv1-U) to inspect the packet.
Figure 28 shows the cellular GTP tunneling performed between eNodeB,
SGW and PGW, and the packet traces from the user end. Each node is
assigned a unique IP address, and the packet with its actual source and
destination IP is encapsulated for the purposes of security and mobility.
In NB-IoT, GTP tunneling is performed from the MME to the SGW and
PGW, as shown in figure 28 (blue arrow). Our proposed method captures
packets at the SGW and performs packet inspection there to recognize
malicious packets, extracting the features that can indicate a DDoS at-
tack. After that, machine learning classification algorithms can segregate
normal traffic from abnormal DDoS packets. If the packet is classified as
normal traffic, it is forwarded from the SGW through the PGW to reach
the IoT application server. If it is considered abnormal and further veri-
fied as an attack, the device's info is forwarded to the Identity Management
System (IDMS), which is responsible for temporarily or permanently block-
ing the device.

Figure 28: Proposed Method for DDoS Detection

3.3 Implementation and Experiment phase


We use different tools and packages for generating normal and DDoS
traffic and for analyzing patterns using machine learning. The tools and
packages that we use are described below:

3.3.1 Data Collection


We always require data to train our machine learning algorithms. There
are several databases used for dataset classification, such as the CAIDA1
[120] dataset recorded in 2007. In CAIDA, it is not guaranteed that the
non-malicious data has been removed from the DDoS dataset; hence, when
we use this dataset it can include normal packets labeled as DDoS packets,
and due to this we were not getting the desired results. Another dataset
widely used for research is NSL-KDD2 [121]. This dataset contains various
attacks, including six types of DDoS attack, and the data are labeled as
normal or by attack type. However, the authors have not mentioned
whether the data were generated by IoT-constrained devices. Therefore,
we chose to generate data based on our own requirements.

To detect DDoS attacks using machine learning, we need both normal

and DDoS attack traffic. First, we generate normal and DDoS traffic sepa-
rately, and after that we combine these datasets together.
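The combining step can be sketched as follows: label each capture, concatenate, and shuffle. The feature columns and values below are hypothetical placeholders, not the actual CICFlowMeter features used in this thesis.

```python
import pandas as pd

# Hypothetical per-flow feature tables from the two captures.
normal = pd.DataFrame({"pkt_rate": [12.0, 9.5], "avg_size": [480, 520]})
ddos = pd.DataFrame({"pkt_rate": [900.0, 1200.0], "avg_size": [64, 64]})

normal["label"] = 0   # 0 = normal traffic
ddos["label"] = 1     # 1 = DDoS traffic

dataset = (pd.concat([normal, ddos], ignore_index=True)
             .sample(frac=1, random_state=42)   # shuffle the rows
             .reset_index(drop=True))
print(sorted(dataset["label"].value_counts().to_dict().items()))
```

Shuffling before the train/test split avoids the classifier seeing all normal rows first, and the integer labels are what Scikit-learn classifiers expect as the target vector.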

3.3.2 Used Tools and Software


1. Kali Linux3 :- Kali Linux [122] is an open source tool, free of cost.
It is one of the Debian-based Linux distributions and is used for
advanced penetration testing and forensics. This open source distri-
bution can be customized to the user's preferences and supports more
than 600 penetration testing tools in different languages.
(a) Attack Parameters:- We use the parameters below to generate
DDoS attacks with hping3.
--count (-c): sets the number of packets to send
--data (-d): sets the packet payload size
--flood: sends packets as fast as possible
--rand-source: spoofs a random source IP address for the attack
--win (-w): specifies the TCP window size, default 64
(b) Scapy:- Scapy is a packet manipulation program that allows the
user to sniff network packets. It is also used for traceroute, scan-
ning, probing, attacking, testing and network discovery. Using
Scapy, a user can send invalid frames and attach their own pack-
ets to original packets.
(c) Apache Server:- We use this server as the victim device/server for
the TCP SYN flood and UDP packets.
2. Wireshark4 :- Wireshark [123] is an open source and free packet an-
alyzer used for analysis, network troubleshooting, communication and
software protocol development. Wireshark implements its interface
with the Qt widget toolkit and uses pcap to capture the packets. It
runs on all of the common operating systems: Linux, Solaris, macOS
and Microsoft Windows. Wireshark has a graphical front-end but is
otherwise similar to tcpdump. We use Wireshark in our thesis for
capturing the packets coming from IoT devices.
1 https://www.caida.org/home/
2 https://www.unb.ca/cic/datasets/nsl.html
3 https://www.kali.org
4 https://www.wireshark.org

3. Python5 :- Python is a general-purpose, high-level open source pro-
gramming language. Ease of learning, efficient code and easy commu-
nication are some of its features, and many researchers in this field
use it. Python has many great environments and libraries such as
Spyder6 and the Jupyter notebook7 . With the help of the Matplotlib
library, scientists draw powerful 2D graphs for their machine learning
studies. Due to these powerful features of Python, we use this lan-
guage for our machine learning experiments.

4. Scikit-learn8 :- Scikit-learn is an open-source machine learning library
for the Python programming language. It is a simple tool for data
analysis and data mining, and implements many algorithms for
supervised and unsupervised learning.

Scikit-learn is used by researchers and large companies such as
Booking.com, Spotify and IBM Watson to integrate machine learning
features, thanks to its ease of use, rich library, open API and
open-source licence.

3.3.3 Collection of Normal and DDoS attack traffic


1. Normal Dataset:- To generate normal traffic, we connected mobile de-
vices (e.g. smartphones and a Raspberry Pi with IoT boards) through
Wi-Fi to a mobile gateway. The gateway has a programmable SIM
card that allows it to connect to our test cellular network using con-
sumer-grade hardware and open-source software.

Figure 29: Test Lab Devices

5 https://www.python.org/
6 https://www.spyder-ide.org/
7 https://jupyter.org/
8 https://scikit-learn.org/stable

Figure 30: Test Lab Network

To capture the traffic generated by these devices, we used Wireshark,
which gave us an overview of what was happening on the network.

2. DDoS Dataset:- For DDoS traffic we use the CICDDoS20199 dataset,
which is publicly available for machine learning research. The dataset
is based on simulation and was collected between 2016 and 2019. We
selected it for this study because it provides a comprehensive analysis
of various types of DDoS attacks. In addition, due to the COVID-19
pandemic, we were unable to set up the necessary scenarios in our
Secure 5G4IoT lab at OsloMet10 in time to complete this work.

Figure 31: Testbed Architecture [124]

9 https://www.unb.ca/cic/datasets/ddos-2019.html
10 https://www.oslomet.no/

CICDDoS2019 contains labeled flows produced with CICFlowMeter-V3
[125]. The B-Profile system [124] was used to model the basic behaviour
of human users and to generate naturalistic background traffic. The
abstract behaviour of 25 users in this dataset is based on the HTTP,
HTTPS, FTP, SSH and e-mail protocols [124]. The collection includes
various modern DDoS attacks, such as PortMap, NetBIOS, LDAP,
MSSQL, UDP, UDP-Lag, SYN, NTP, DNS and SNMP. Capturing on
the training day, January 12, ran from 10:30 a.m. to 5:15 p.m., and on
the test day, March 11, from 9:40 a.m. to 5:35 p.m.; the attacks were
executed during these periods.

Table 1: Specification of dataset CICDDoS2019 [124]

Machine              OS             IPs

Server (Web Server)  Ubuntu 16.04   192.168.50.1 (first day)
                                    192.168.50.4 (second day)

Firewall             Fortinet       205.174.165.81

PCs (first day)      Win 7          192.168.50.8
                     Win Vista      192.168.50.5
                     Win 8.1        192.168.50.6
                     Win 10         192.168.50.7

PCs (second day)     Win 7          192.168.50.9
                     Win Vista      192.168.50.6
                     Win 8.1        192.168.50.7
                     Win 10         192.168.50.8

3.3.4 Feature Extraction
To distinguish between DDoS and normal IoT traffic, we need to select
the packet features used for machine learning classification. Protocol
type, port, source and destination IP, and packet length are the features
used in most DDoS detection work.

For the classification, the key features indicating a DDoS must be
chosen so that standard IoT operation can be distinguished from DDoS
IoT operation. In the machine learning literature, DDoS IoT traffic has
been classified by source IP, destination IP, port, protocol type and
packet length. According to Pariya et al. [126], in wireless sensor
networks, throughput and the number of collisions are quantitative
metrics that can also be used to evaluate DDoS mitigation procedures.
We have chosen the characteristics below to differentiate between
ordinary traffic and DDoS [127] [24].

1. Packet Size:- Within a small time window, a DDoS attack disperses
a large number of packets, and these packets are smaller and of
roughly constant size, while normal packets vary in size. Rohan et
al. [24] maintain that DDoS packets are below 100 bytes, while normal
traffic packets are between 100 and 1200 bytes. However, for the TCP
SYN attack, the DDoS packet sizes were 58, 60 and 174 bytes, based
on the data we collected for both normal and DDoS operation. The
dataset captured by Markus et al. [128] of approximately 31 separate
IoT devices shows IoT packet sizes varying from 42 to 1434 bytes.
Thus, a sudden increase in traffic with a constant packet size, whether
smaller or larger than 100 bytes, indicates a DDoS attack.

2. Packet Time Interval:- Normal IoT services send their streams at
regular intervals. In a DDoS attack, however, the time between packets
is close to zero because the attacker sends packets extremely fast [24].

3. Packet Size Variance:- Attack packets mostly have the same size,
while regular traffic has varying packet sizes; even packets from the
same flow show distinct sizes [24]. To illustrate, all TCP attack packets
in our dataset are 90 bytes. Variance in packet sizes can therefore be
combined with conditions 1 and 2.

4. Protocol Type:- Attack traffic uses only a few protocols, whereas
regular traffic uses many. Two protocols (TCP and UDP) were used
for the attack operations, while many other protocols appear in the
normal traffic captures.

5. Destination IP:- IoT devices communicate with a limited, expected
number of targets and seldom change their destination IP over time.
This feature also reveals DDoS attacks: a single device contacting a
range of distinct targets within a short time window indicates an
attack.

Distinct targets can be counted within a 10-second window to identify
an attack [24].

6. TCP SYN:- In a TCP SYN denial-of-service attack, as described in the
attack section, the server never receives the client's ACK response,
since the attacker's goal is not to establish a connection but to tie up
resources and render the server unresponsive. Therefore, SYN and
ACK flags are viewed as indicators of a TCP flood.
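The heuristics above can be sketched as a small Python check over a window
of captured packet records. This is an illustrative sketch only, not the
detection code used in our experiments: the record layout (timestamp,
length, destination IP) and the threshold values are assumptions chosen
for the example.

```python
from statistics import pvariance

def looks_like_ddos(packets, max_size=100, max_interval=0.001, max_variance=1.0):
    """Flag a window of (timestamp, length, dst_ip) records as DDoS-like,
    using the features above: small packets (1), near-zero inter-arrival
    time (2) and near-constant packet size (3). Thresholds are illustrative."""
    if len(packets) < 2:
        return False
    times = [t for t, _, _ in packets]
    sizes = [s for _, s, _ in packets]
    intervals = [b - a for a, b in zip(times, times[1:])]
    small = all(s <= max_size for s in sizes)        # feature 1: packet size
    fast = max(intervals) <= max_interval            # feature 2: time interval
    constant = pvariance(sizes) <= max_variance      # feature 3: size variance
    return small and fast and constant

def distinct_destinations(packets):
    """Feature 5: distinct destination IPs in the window; a sudden jump
    suggests one device contacting many targets."""
    return len({dst for _, _, dst in packets})

# A burst of identical 60-byte packets arriving almost simultaneously
flood = [(i * 0.0001, 60, "10.0.0.5") for i in range(100)]
# Normal traffic: varied packet sizes, spaced out in time
normal = [(i * 0.5, 100 + 13 * i % 900, "10.0.0.5") for i in range(20)]
```

On the synthetic flood window the check fires, while the spaced-out,
variable-size window passes as normal.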

3.4 Methodology
The generated data was captured with Wireshark. The raw data was
recorded in pcap format and then converted to comma-separated values
(CSV). Normal and malicious data features were extracted as discussed in
Chapter 2; normal traffic was put in one file and the DDoS (malicious)
data in another.

Figure 32: Wireshark capturing [123]

3.4.1 Cleaning and Transformation


1. Missing Values:- Missing values are difficult to handle in machine
learning because they can lead to incorrect predictions in any model.
Null values and their respective entries were therefore removed.
2. Transformation:- The format of the collected data might not be
appropriate for modeling. As described by the CRISP-DM method
[129], in such cases the data types should be changed so that the
information can be fed into the models. Accordingly, several data
features were converted to numeric or float types, since the models
do not handle strings well.

# transformation

scaled_features = scaler.transform(df.drop('Length', axis=1))
df_feat = pd.DataFrame(scaled_features, columns=df.drop('Length', axis=1).columns)
df_feat.head()
Listing 1: Transformation

3. Labels of Classes:- Table 2 illustrates the binary class labels. Our
dataset uses two labeling schemes. In the first, packets with a length
below 100 bytes are labeled 1 and all other packets 0. In the second,
a packet with a length between 50 and 70 bytes or between 160 and
180 bytes is labeled 1, meaning it could indicate an attack; packets
outside those intervals are labeled 0, i.e. normal.

Table 2: Labeling of binary classification

Traffic class  Label

normal         0

malicious      1

# labeling

i = 0
while i < len(df['Arrival Time']):
    # Coding threshold [IoT]: SYN can occur in packets below 100 bytes
    if (df.at[i, 'Length'] <= 100):
        # Coding threshold [IoT]: SYN can occur in packets between 42 and 1434 bytes
        # if (df.at[i, 'Length'] >= 42 and df.at[i, 'Length'] <= 1434):
        # Coding threshold: SYN can occur in packets between 50 and 70 bytes
        # or between 160 and 180 bytes
        # if (df.at[i, 'Length'] >= 50 and df.at[i, 'Length'] <= 70 or
        #         df.at[i, 'Length'] >= 160 and df.at[i, 'Length'] <= 180):
        df.at[i, 'Length'] = 1
    else:
        df.at[i, 'Length'] = 0
    df.at[i, 'Arrival Time'] = time.mktime(datetime.datetime.strptime(
        (df['Arrival Time'][i])[:-7], '%b %d, %Y %H:%M:%S.%f').timetuple())
    df.at[i, 'Source'] = df.at[i, 'Source'].replace(".", "")
    df.at[i, 'Destination'] = df.at[i, 'Destination'].replace(".", "")
    i += 1
Listing 2: Labeling

3.4.2 Splitting of Dataset
The ability to generalize to new or unseen data is a key feature of a good
learning model. A model that fits a specific dataset too closely is described
as over-fit.

Evaluating a model requires different combinations of input. Models
essentially require two sets of data: training and testing. The training
data is the set of instances the model is trained on, while the test data is
used to determine the ability of the model, i.e. its actual performance. A
train/test split can give good results; however, the technique has a few
drawbacks. Since the split is random, it can produce an imbalance between
the train and the test set, where a large number of instances from one
class end up in the training set. In that case the model will fail to
generalize and will overfit.

# Splitting of data with a 0.30 ratio

X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, df['Length'], test_size=0.30, random_state=100)
Listing 3: Splitting of data

Datasets are divided into two subsets: training and testing. The data is
split in a 70/30 ratio using the train_test_split helper method from the
scikit-learn library [130].

With this approach, the training data is further divided into two parts,
training and validation. The training set is used to train the model first,
then the validation set is used to estimate its performance. The k-fold
approach [131] is used for validation in this study.
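The 70/30 hold-out described above can be sketched in plain Python; the
experiments themselves use scikit-learn's train_test_split, so the function
below is only an illustration of the idea.

```python
import random

def train_test_split_simple(rows, test_size=0.30, seed=100):
    """Shuffle the dataset and hold out `test_size` of it for testing,
    mirroring the 70/30 split used in this study (sketch only)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]  # train, test

train, test = train_test_split_simple(range(100))
```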

3.4.3 Modeling
The classification is divided into two phases.
• Generation of learning model
• Construction of the predicted labels
We use scikit-learn Python library for implementation of our task. This
library is used in data analysis, data mining and machine learning.
1. Selection of model:- This study trains and tests several classification
methods, namely k-nearest neighbors, SVM and others.
(a) Parametric/Non-parametric Algorithms:- A parametric algorithm
can be loosely described as one that makes predefined assumptions
about the data and has a fixed number of parameters [132]. Para-
metric algorithms are ideal if the assumptions are correct; however,
if the assumptions are wrong, these algorithms perform badly.

Non-parametric algorithms are more flexible. They perform slower
computations, but make fewer assumptions about the dataset [132].
In this study, non-parametric algorithms are used.

Figure 33: Parametric/non-parametric models

(b) Instance-based:- Instance-based learning methods are “concep-
tually straightforward approaches to approximating real-valued
or discrete-valued target functions” [133]. These algorithms store
the training data and, when a new instance is presented, compare
it with the stored instances and classify it accordingly.
2. Models used in this study:- We use several models on our dataset;
each is presented below in pseudo-code.
(a) K-nearest neighbors:
The first model is k-nearest neighbors, an instance-based classifier.
k-NN places the dataset in a dimensional space, where a new in-
stance is classified based on its most similar instances, known as
neighbors. Figure 34 shows a new instance labeled x; x receives the
class most common among the observed neighbors. A distance
function is applied between instances to determine similarity [133].

Figure 34: k-NN classifier - an example [133]

Figure 35: Pseudo-code for k-NN Algorithm [133]
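To make the neighbor voting concrete, below is a minimal pure-Python
sketch of a k-NN classifier with Euclidean distance. The training points
are made-up examples; the experiments use scikit-learn's
KNeighborsClassifier, not this code.

```python
import math
from collections import Counter

def knn_predict(train_points, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points; `train_points` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train_points, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors: normal traffic near the origin,
# DDoS-like traffic clustered elsewhere
samples = [((0.0, 0.0), "normal"), ((0.1, 0.2), "normal"),
           ((5.0, 5.0), "ddos"), ((5.1, 4.9), "ddos"), ((4.8, 5.2), "ddos")]
```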

(b) Support Vector Machine (SVM): SVM learning is carried out in
two steps. First, the inputs are plotted in an n-dimensional space,
where n is the number of features; the data points closest to the
class boundary are called support vectors. Second, the instances
are divided by a hyperplane, a line that separates the data points
into two different classes. The SVM separates the dataset with
this hyperplane.

Computational problems are highly likely when mapping complex
nonlinear functions; in fact, the larger the dimensional space, the
harder the separation problem becomes [134]. The kernel trick is
used to ease this problem: the kernel transforms the complex
functions into a higher-dimensional space in which the labeled
classes of the user's input become separable [133].

Figure 36: SVM - an example [133]

Figure 37: Pseudo-code for SVM Algorithm [133]

Figure 36 represents classification with the SVM algorithm.
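Once a linear SVM has been trained, prediction reduces to checking which
side of the hyperplane a point falls on. The sketch below evaluates
sign(w . x + b); the weights are made-up values for illustration, not a
model fitted to our dataset.

```python
def svm_predict(w, b, x):
    """Side of the separating hyperplane: 1 if w . x + b >= 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

# Hypothetical hyperplane separating attack (1) from normal (0) points
w, b = [0.8, -0.5], -1.0
```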
(c) Naı̈ve Bayes: The naı̈ve Bayes classification algorithm is based on
Bayes' theorem, with an assumption of independence between
features. In statistics, two events are said to be independent when
the probability of one does not affect the other [133]. The Bayesian
classification equation for the probability calculation is shown in
equation 1.

Figure 38: Pseudo-code for Naı̈ve Bayes Algorithm [133]

(d) Decision Tree: Decision tree classification begins at the root node
and classifies instances based on the values of their individual
attributes. Each node of the tree refers to a single feature, and its
branches reflect the values that the node can take [133].

Figure 39: Pseudo-code for Decision Tree Algorithm [133]

The algorithm works its way down from the root node by iteratively
measuring, for each feature in the training set, how much informa-
tion that feature provides about the target classes. The higher the
information gain, the more valuable the feature is for classifying
each observation [133] [134]. The feature that provides the highest
information gain becomes the root node, and the algorithm parti-
tions the data on the selected feature to produce the subsets. This
approach is shown in figure 39.
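The information-gain computation described above can be sketched as
follows; the label lists are toy examples, not drawn from our dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Entropy reduction from splitting `labels` into `partitions` on some
    attribute; the attribute with the highest gain becomes the root node,
    as described above."""
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(labels) - remainder

labels = ["ddos"] * 4 + ["normal"] * 4
# A perfect split on some attribute separates the classes entirely
perfect = [["ddos"] * 4, ["normal"] * 4]
```

A 50/50 label mix carries 1 bit of entropy, and a perfectly separating
attribute recovers the full bit as information gain.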

(e) Logistic Regression: Logistic regression is another predictive
analysis, best suited when the dependent variable is binary. It
describes the data and clarifies the relationship between a depen-
dent binary variable and other independent variables [134]. Figure
40 describes the calculation for the estimated regression classifier.

Figure 40: Pseudo-code for Logistic Regression Algorithm [133]
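As an illustration of the logistic model, the sketch below turns the linear
score into a probability with the sigmoid function and thresholds it at 0.5.
The weight values are made up, not coefficients fitted to our data.

```python
import math

def logistic_predict(w, b, x, threshold=0.5):
    """Sigmoid of the linear score gives the probability of the positive
    class; thresholding it yields the binary label."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return (1 if p >= threshold else 0), p
```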

3. Training:- During the training process, the selected algorithm learns
from the training data and creates a machine learning model. The data
provided for this training is described in the previous section. In the
training process, the training set is searched for patterns that map the
input to the target attributes; based on these attribute classes, the
model is produced. In this study the DDoS dataset is used as the input
source, where the target attributes are normal and attack packets in
the network. Initially we were focusing on five different algorithms
with different kinds of datasets, but due to COVID-19 we were unable
to produce all the data in time; for that reason we focus on one dataset
only. Training is conducted using several methods, and Table 3 shows
the different classifiers used in this study. Appendix A contains the
source code of the model we built to identify a DDoS attack.
# kNN, Decision Tree, SVM, Naive Bayes, Logistic Regression classifiers

clf = neighbors.KNeighborsClassifier(n_neighbors=k, n_jobs=-1)
clf.fit(X_train, y_train)
clf1 = svm.SVC(kernel='linear', gamma="auto")  # Linear kernel
clf2 = linear_model.LogisticRegression(random_state=1, solver='lbfgs', n_jobs=-1)
clf3 = ensemble.RandomForestClassifier(n_estimators=100, random_state=1, n_jobs=-1)
clf4 = naive_bayes.GaussianNB()
Listing 4: Classifiers used in this work

Table 3: scikit-learn Python Library [130]

Model                 Scikit-Learn Method & Classifier

KNN                   neighbors.KNeighborsClassifier

Decision Tree         tree.DecisionTreeClassifier

SVM                   sklearn.svm.LinearSVC

Naïve Bayes           sklearn.naive_bayes.GaussianNB

Logistic Regression   sklearn.linear_model.LogisticRegression

4. Testing:- The last stage of modeling is testing on unseen data. The
data is divided into training and testing sets, and various performance
metrics are generated to analyze the performance of the chosen
classifiers: accuracy, precision, recall and F-measure.

3.4.4 Evaluation
Evaluation is a crucial part of understanding the performance of a chosen
model. Below we describe the various metrics used in this study.

# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Model Precision: what percentage of predicted positive tuples are truly positive?
print("Precision:", metrics.precision_score(y_test, y_pred, average='weighted'))

# Model Recall: what percentage of actual positive tuples are detected?
print("Recall:", metrics.recall_score(y_test, y_pred, average='weighted'))
Listing 5: Evaluation model

1. Accuracy:- Accuracy describes model performance as the share of
correctly classified elements. A tabular visualization of the performance
of the supervised algorithms is shown in the next chapter.

With the evaluation part we can determine the performance of a
classifier [135]. Accuracy is formulated as follows:

Accuracy = (Correctly classified instances / Total instances) × 100%   (2)

Equation 2: Accuracy
2. Precision:- Accuracy alone is not enough to assess a learning model.
It indicates that the model is trained correctly, but gives no detailed
information about the specific application. For that reason we use other
performance measurements, such as precision.

Precision is the rate of predicted positives that are truly positive.
Depending on the dataset and modeling scenario, false positives and
false negatives may not carry the same cost; in many studies, a high
false positive rate means traffic is flagged as malicious although in
reality it is not. We compute precision with the following formula:

Precision = True positives / (True positives + False positives)   (3)

Equation 3: Precision
3. Recall:- Recall measures how many of the actual positives are de-
tected. It is an important metric when missed detections (false nega-
tives) have serious consequences; for a DDoS attack, increased traffic
does not necessarily mean the traffic is malicious, so false negatives
must be tracked. Recall is measured by the formula below:

Recall = True positives / (True positives + False negatives)   (4)

Equation 4: Recall
4. F-measure:- The F-measure provides an overall score by combining
precision and recall. The model identifies threats correctly when there
are minimal false alarms; the F-measure is good when both false
positives and false negatives are low. The F-measure formula is given
below:

F-measure = 2 × (Precision × Recall) / (Precision + Recall)   (5)

Equation 5: F-measure
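The four metrics above can be computed directly from the confusion-matrix
counts of a binary classifier; the counts in the example below are
hypothetical.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-measure (equations 2-5) from
    true/false positive and negative counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts: 40 attacks caught, 10 false alarms,
# 45 normal flows passed, 5 attacks missed
acc, prec, rec, f1 = evaluation_metrics(40, 10, 45, 5)
```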

4 Results
In Chapter 3 we elaborated on how the datasets were generated and
collected. The normal dataset was generated in the OsloMet lab and the
DDoS dataset was collected from CIC [124]. Both datasets are labeled and
the required features are extracted. The data is then transformed and
converted into a format acceptable to the scikit-learn machine learning
algorithms. A few experiments were carried out to verify the performance
and accuracy of the classifiers for various combinations and sizes of data.
Due to time constraints, our DDoS detection test is based on the TCP SYN
attack, as the point of the study is to demonstrate a DDoS detection
method. The data is split into training and testing datasets.

Below is the procedure we followed throughout our machine learning DDoS
experiment:

1. Review the datasets
2. Feature selection and targeting
3. Splitting of the dataset into training and testing sets
4. Model training with the classifier
5. Data prediction from the test set
6. Checking prediction and accuracy of the classifier

Cross-validation is a method used to assess the performance of a predic-
tive model. Scikit-learn performs cross-validation in supervised machine
learning by splitting the available data into training and testing sets, and
holding out the test set (X_test and y_test). In the k-fold cross-validation
technique the dataset is divided into k folds; for each of the k rounds, one
of the k sub-samples is used as the test set, while the other k-1 sub-samples
form the training set. Each data fold is tested exactly once, and the
accuracy of the machine learning model is calculated by averaging the
accuracy over all k folds.
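The fold construction and the averaging can be sketched as follows; this
illustrates the procedure, not scikit-learn's actual implementation.

```python
def k_fold_indices(n, k):
    """Split n sample indices into k contiguous folds; each fold serves
    once as the test set while the remaining k-1 folds train the model."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(fold_accuracies):
    """Overall cross-validated accuracy: the mean over all k folds."""
    return sum(fold_accuracies) / len(fold_accuracies)

folds = k_fold_indices(10, 5)
```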

In Section 4.1 we describe the results of our classification algorithms.
Given our scenario and overall performance goals, we chose several
algorithms for our analysis. The collected data is used to train the chosen
models to detect and eventually predict DDoS attacks. The performance of
each algorithm is evaluated separately, and the results are displayed in
the tables below.

We use two different threshold scenarios for the datasets described in
Chapter 3. First, we set the threshold at packets below 100 bytes. The
second threshold is set between 50 and 70 bytes or between 160 and 180
bytes. The two thresholds produce different outcomes on the given
datasets. Below we present the outcomes of each classifier under each
threshold:

4.1 First Threshold - (Length below 100 bytes)
Our dataset is based on roughly 1,000,000 observations, assembled as
described in Chapter 3. Below we illustrate the normal and DDoS
scenarios with their results.

4.1.1 Normal Scenario


Table 4 shows how accurately the classifiers performed on normal traffic.
SVM performed fairly well in this experiment: on the normal dataset SVM
reports no anomaly, while the rest of the algorithms show possible
anomalies. The “Classification Accuracy” column denotes the result of
each classifier.

Table 4: Classifier statistics (Normal)

Classifier name Classification Accuracy (%)

K-Nearest Neighbors 83.70

Support Vector Machine 82.55

Naïve Bayes 75.50

Decision Tree 83.70

Logistic Regression 82.55

1. K-Nearest Neighbors

Figure 41: K-Nearest Neighbors

(a) Error Rate and K-Value: The performance of the k-nearest
neighbors classifier depends on three factors: the value of k (the
number of neighbors), the distance metric and the decision rule.

Choosing a small or a large value of k produces different out-
comes on our dataset. With the help of cross-validation, we can
select a good value of k. Below is the code used to find the best
value of k for our dataset.

# find k value

error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
Listing 6: find K value - Normal (First Threshold)

Because of the different thresholds, we use different values of k in
our study. We experimented with k values from k=1 to k=40 on our
training dataset; the graph below shows the error rate per k value
for the normal traffic.

Figure 42: Error Rate vs K

2. Other Classifiers

Figure 43: Other Classifiers

3. Performance Metrics
Table 5 shows the overall performance for the first threshold with normal
traffic, listing the accuracy, precision and recall results. All classifiers
were trained and tested on a 70:30 split of the dataset.

Table 5: Performance Metrics

Classifier name       Accuracy (%)  Precision (%)  Recall (%)

K-NN                  83.70         86.72          83.70

SVM                   82.55         86.15          82.55

Naïve Bayes           75.51         82.70          75.51

Decision Tree         83.70         86.72          83.70

Logistic Regression   82.55         86.15          82.55

4.1.2 DDoS Scenario
Table 6 shows how accurately the classifiers performed on DDoS traffic.
SVM shows no anomaly in this experiment, while the other classifiers
show anomalies; SVM has a good accuracy result, but it does not detect
any anomaly. Naïve Bayes performs well at detecting the anomaly, but
with lower accuracy. The “Classification Accuracy” column denotes the
result of each classifier.

Table 6: Classifier statistics (DDoS)

Classifier name Classification Accuracy (%)

K-Nearest Neighbors 98.12

Support Vector Machine 98.19

Naïve Bayes 97.90

Decision Tree 98.12

Logistic Regression 98.18

1. K-Nearest Neighbors

Figure 44: K-Nearest Neighbors

(a) Error Rate and K-Value: The performance of the k-nearest
neighbors classifier depends on three factors: the value of k (the
number of neighbors), the distance metric and the decision rule.

Choosing a small or a large value of k produces different out-
comes on our dataset. With the help of cross-validation, we can
select a good value of k. Below is the code used to find the best
value of k for our dataset.

# find k value

error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
Listing 7: find K value - DDoS (First Threshold)

Because of the different thresholds, we use different values of k in
our study. We experimented with k values from k=1 to k=40 on our
training dataset; the graph below shows the error rate per k value
for the DDoS scenario.

Figure 45: Error Rate vs K

2. Other Classifiers

Figure 46: Other Classifiers

3. Performance Metrics
Table 7 shows the overall performance for the first threshold with DDoS
traffic, listing the accuracy, precision and recall results. All classifiers
were trained and tested on a 70:30 split of the dataset.

Table 7: Performance Metrics (DDoS Dataset)

Classifier name       Accuracy (%)  Precision (%)  Recall (%)

K-NN                  98.21         97.54          98.21

SVM                   98.19         96.42          98.19

Naïve Bayes           97.98         97.07          97.98

Decision Tree         98.21         97.57          98.21

Logistic Regression   98.18         97.22          98.18

4.2 Second Threshold - (Length between 50 and 70 bytes
& between 160 and 180 bytes)
In this experiment we choose a stricter threshold to study behavior changes
in the chosen classifiers. We observed that classifier performance is
similar to the previous threshold. Below we illustrate the normal and
DDoS scenarios with their results for the second threshold, where the
length is set between 50 and 70 bytes and between 160 and 180 bytes.

4.2.1 Normal Scenario


Table 8 shows how accurately the classifiers performed on normal traffic.
KNN performed fairly well in this experiment. On the normal dataset
SVM, decision tree and logistic regression give no anomaly, while the
KNN and Naïve Bayes classifiers show possible anomalies. The
“Classification Accuracy” column denotes the result of each classifier.

Table 8: Classifier statistics (Normal)

Classifier name Classification Accuracy (%)

K-Nearest Neighbors 84.52

Support Vector Machine 84.25

Naïve Bayes 82.61

Decision Tree 84.51

Logistic Regression 84.22

1. K-Nearest Neighbors

Figure 47: K-Nearest Neighbors

(a) Error Rate and K-Value: The performance of the k-nearest
neighbors classifier depends on three factors: the value of k (the
number of neighbors), the distance metric and the decision rule.

Choosing a small or a large value of k produces different out-
comes on our dataset. With the help of cross-validation, we can
select a good value of k. Below is the code used to find the best
value of k for our dataset.

# find k value

error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
Listing 8: find K value - Normal (Second Threshold)

Because of the different thresholds, we use different values of k in
our study. We experimented with k values from k=1 to k=40 on our
training dataset; the graph below shows the error rate per k value
for the normal traffic.

Figure 48: Error Rate vs K

2. Other Classifiers

Figure 49: Other Classifiers

3. Performance Metrics
Table 9 shows the overall performance for the second threshold with
normal traffic, listing the accuracy, precision and recall results. All
classifiers were trained and tested on a 70:30 split of the dataset.

Table 9: Performance Metrics (Normal Dataset)

Classifier name       Accuracy (%)  Precision (%)  Recall (%)

K-NN                  84.52         80.85          84.52

SVM                   84.25         70.98          84.25

Naïve Bayes           82.61         78.82          82.61

Decision Tree         84.51         80.70          84.51

Logistic Regression   84.22         78.33          84.22

4.2.2 DDoS Scenario
Table 10 shows how accurately the classifiers performed on DDoS traffic.
On this dataset K-NN and SVM show good results with this threshold, and
Table 10 shows that the anomaly is detected by all five classifiers. The
“Classification Accuracy” column denotes the result of each classifier.

Table 10: Classifier statistics (DDoS)

Classifier name Classification Accuracy (%)

K-Nearest Neighbors 99.19

Support Vector Machine 99.19

Naïve Bayes 98.92

Decision Tree 99.20

Logistic Regression 99.18

1. K-Nearest Neighbors

Figure 50: K-Nearest Neighbors

(a) Error Rate and K-Value: The performance of the k-nearest
neighbors classifier depends on three factors: the value of k (the
number of neighbors), the distance metric and the decision rule.

Choosing a small or a large value of k produces different out-
comes on our dataset. With the help of cross-validation, we can
select a good value of k. Below is the code used to find the best
value of k for our dataset.

# find k value

error_rate = []
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
Listing 9: find K value - DDoS (Second Threshold)

Because of the different thresholds, we use different values of k in
our study. We experimented with k values from k=1 to k=40 on our
training dataset; the graph below shows the error rate per k value
for the DDoS scenario.

Figure 51: Error Rate vs K

2. Other Classifiers

Figure 52: Other Classifiers

3. Performance Metrics
Table 11 shows the overall performance for the second threshold with
DDoS traffic, listing the accuracy, precision and recall results. All
classifiers were trained and tested on a 70:30 split of the dataset.

Table 11: Performance Metrics (DDoS Dataset)

Classifier name         Accuracy (%)   Precision (%)   Recall (%)
K-NN                    99.19          98.85           99.19
SVM                     99.19          98.39           99.19
Naïve Bayes             98.92          98.69           98.92
Decision Tree           99.20          98.90           99.20
Logistic Regression     99.18          98.77           99.18
5 Evaluation / Discussion
This study was conducted to analyze the behavior, performance and
utilization of machine learning algorithms in the context of intrusion
detection systems. With the growing research on vulnerabilities, the
detection of anomalies is a major topic these days. In the last few
years, with thousands of computer-based applications being developed
every day, the Web has grown exponentially and has quickly become a
fundamental component of today's era, and secure organizational
environments are becoming fundamental to its continued growth.

Among the different types of attacks, DDoS attacks are one of the
greatest threats to web destinations and pose a devastating risk to
computer system protection, especially due to their potential impact.
Research has therefore expanded in this area, with analysts focusing
on new ways of identifying and mitigating DDoS attacks. Researchers
and industry are working to find good solutions in the fields of
machine learning and artificial intelligence for intrusion detection
and prevention. However, business partners and researchers often find
it difficult to obtain high-quality datasets for testing and
evaluating their machine learning models for threat detection. This
problem is the main motivation of this study, and the basis for its
research questions.

Various works have been studied in order to explore the dynamics of
different datasets and how their legitimacy is affected by machine
learning techniques. Numerous problems have been found with regard to
current datasets, including security concerns, the accessibility of
information, availability, and alignment with the target
investigation. This was followed by a review of previous work related
to the interpretation and comparison of datasets.

This study presented an arrangement for determining the adequacy of
established DDoS datasets for intrusion detection, using CRISP-DM as
the key methodology. The basic phases of CRISP-DM were mapped
efficiently and effectively to the research questions and were
carefully followed through to the conclusion of this study.

During the experiments, two datasets were used:


1. Normal dataset collected from OsloMet’s Secure 5G4IoT Lab
2. DDoS dataset collected from CIC [124]

Due to the COVID-19 pandemic, we faced access restrictions to our lab
and were therefore unable to set up all experiments as originally
intended; for the DDoS traffic we thus had to use externally sourced
data.

The datasets were individually divided into a 70:30 ratio for model
training and testing. To ensure that the experiments were carried out
in an appropriate manner, all classifiers were chosen based on the
literature review. We found that K-nearest neighbors had an overall
better performance compared to the other classifiers. The results were
evaluated using a set of performance metrics, including precision,
accuracy and recall. Below are the findings of this study according to
the research questions.
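The evaluation pipeline described above can be sketched in a few lines: a 70:30 split followed by accuracy, precision and recall. Synthetic data stands in here for the lab and CIC captures, so the numbers printed are illustrative only.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the preprocessed traffic features
X, y = make_classification(n_samples=1000, n_features=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=100)   # 70:30 split

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy: ", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred, average="weighted"))
print("Recall:   ", metrics.recall_score(y_test, y_pred, average="weighted"))
```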

With the first threshold, the K-NN performance metrics are good: it
achieved a 98.21% accuracy with 97.54% precision and 98.21% recall.
With the second threshold, K-NN achieved a 99.19% result, which is a
very good classifier result.

When trained with the CICDDoS2019 [124] dataset, K-nearest neighbors
displays good precision and accuracy scores. With the first threshold,
SVM gives a 98.19% result but does not find any anomaly in this
dataset. Naïve Bayes gives 97.98%, logistic regression 98.18% and
random forest 98.21% accuracy on this dataset. Comparing the accuracy
and precision values of these classifiers with K-NN, they do not give
results as good as K-NN, whose accuracy is 98.85% and precision 97.54%
with this threshold.

Figure 53: Overview of Performance Metrics (Normal - 1st Threshold)

For the second threshold on the same CICDDoS2019 [124] dataset,
K-nearest neighbors also gives good precision and accuracy scores. SVM
also gives 99.19% accuracy, but it marks the whole dataset as
anomalous. Naïve Bayes gives 98.92%, logistic regression 99.18% and
random forest 99.20% with this threshold. In this scenario K-NN is
still the best, as it gives a good accuracy of 99.17% and a precision
of 98.85%.

Figure 54: Overview of Performance Metrics (DDoS - 1st Threshold)

A reason for these discrepancies is most likely the thresholds chosen.
Throughout this work we were able to conclude that some classifiers
are more sensitive, hence producing results that were not the expected
ones. Establishing more robust thresholds, better suited to our
studied scenario, is needed to provide more reliable results.
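One way to make the thresholds less brittle is to derive them from the observed traffic instead of hardcoding byte ranges. The percentile-based rule below is a hypothetical sketch: the function name, the band limits and the generated sizes are our assumptions, not part of the thesis pipeline.

```python
import numpy as np

def percentile_threshold(packet_sizes, low=1.0, high=99.0):
    """Label sizes outside the [low, high] percentile band as anomalous (1)."""
    lo, hi = np.percentile(packet_sizes, [low, high])
    sizes = np.asarray(packet_sizes)
    return ((sizes < lo) | (sizes > hi)).astype(int), (lo, hi)

# Mostly typical packet sizes plus two oversized outliers
rng = np.random.default_rng(0)
sizes = np.concatenate([rng.normal(60, 5, 1000), [1400.0, 1434.0]])
labels, band = percentile_threshold(sizes)
print("band:", band, "anomalies:", int(labels.sum()))
```

Because the band follows the data, the same rule adapts when retrained on traffic with a different packet-size profile.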

Figure 55: Overview of Performance Metrics (Normal - 2nd Threshold)

In this study we were able to detect the attack given the supervised,
labeled dataset, with differences in performance depending on the
classifier. In a real-life context, and given the early stage of the
implementation, the resulting data would be sent to the corresponding
security expert team at a telecom operator for further validation.

Figure 56: Overview of Performance Metrics (DDoS - 2nd Threshold)

In this first stage of analysis and validation by the security
experts, it is up to them to decide what to do with these devices or
the network. One possible solution is to blacklist the devices from
the network, impacting their usage. As a more drastic measure, the
security experts could also shut down the specific cell.

6 Conclusion and Future Work
6.1 Conclusion
This thesis addresses the two main questions defined in Chapter 1.
The mission was to look at the issues of IoT protection from the
point of view of the cellular network in terms of security
challenges. A detailed study of the security threats within the
cellular network, including GSM, UMTS, LTE and 5G, was conducted to
achieve this goal.

IoT protocol vulnerabilities have also been discussed point by point,
focusing mainly on attacks that exploit security weaknesses. Machine
learning techniques have been used to identify DDoS attacks from
insecure IoT devices in order to achieve the objectives of this study.
Recognizing attacks within the cellular network is not the same as
recognizing attacks in an IP network. However, a sudden increase in
the packets accepted at a single node from a number of distinct MME
nodes could suggest an attack in the IoT case, as IoT devices do not
transmit packets at a very high frequency.
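The volume-based intuition above, that IoT devices rarely send at a high frequency, can be sketched as a simple per-source rate check. Everything here, including the function name and the rate limits, is a hypothetical illustration rather than a deployed MME-side mechanism.

```python
from collections import Counter

def flag_high_rate_sources(packets, window_s, max_pkts_per_s):
    """packets: (source_id, timestamp) pairs observed within one window.
    Returns the sources whose packet count exceeds the allowed rate."""
    counts = Counter(src for src, _ in packets)
    limit = max_pkts_per_s * window_s
    return {src for src, n in counts.items() if n > limit}

# One 10-second window: "sensor-a" floods while "sensor-b" behaves normally
pkts = [("sensor-a", t / 50) for t in range(500)] + [("sensor-b", 0.0), ("sensor-b", 5.0)]
flagged = flag_high_rate_sources(pkts, window_s=10, max_pkts_per_s=5)
print(flagged)
```

A real deployment would aggregate these counters per cell or per MME and feed the flagged sources to the classifiers discussed earlier.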

This work began with a literature review of previous work in this
field. First, the study presents an overview of how other researchers
approach intrusion detection with the use of machine learning. This
provided a much better understanding of how different algorithms work
and can help in understanding how to mitigate the propagation of DDoS
attacks. In addition, it gives an understanding of which algorithms
are commonly used to deal with problems in this area.

Based on our proposal, the normal and DDoS data were generated
according to the cellular network setup discussed in Chapter 2. We
suggested performing packet analysis to detect DDoS attacks, as
described in Chapter 4, using machine learning classifiers. Using five
classification methods (K-NN, Decision Tree, Naïve Bayes, SVM and
logistic regression), we analyzed their performance in detecting
possible attacks, with tests against different dataset sizes and
k-fold cross-validation.
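The k-fold comparison described above can be sketched as follows, with synthetic features standing in for the captured traffic; the hyperparameters are library defaults, not the tuned values used in the experiments.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the labeled traffic features
X, y = make_classification(n_samples=600, n_features=4, random_state=2)

classifiers = {
    "K-NN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=2),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validated accuracy for each of the five methods
results = {name: cross_val_score(clf, X, y, cv=5).mean()
           for name, clf in classifiers.items()}
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```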

This proposal focuses only on TCP attacks, as this protocol is
commonly used to launch attacks, and due to time constraints we
concentrate solely on the SYN flood attack in this study.

Identifying IoT-based DDoS attacks in the cellular network using a
machine learning approach has become a much-discussed topic these
days.

6.2 Future Work


The aim of this proposed work was to identify DDoS attacks within the
context of the cellular network, and to propose an arrangement that
could lead to a specific use in the future. Accordingly, the strategy
recommends a full-scale DDoS detection technique within the cellular
network, and offline data has been used for training and testing of
the model. We recommend that this methodology be tested in a live
setting as future work. In addition, this strategy focused only on the
TCP SYN flood; in order to secure IoT services in the future, we would
like to incorporate all potential DDoS attacks. Finally, a few other
algorithms, such as a Recurrent Neural Network in the Google
TensorFlow framework, could be explored in future work.

We hope that this study serves as a basis for a helping tool for
telecom operators that could be used in the future to detect DDoS and
other types of attacks in a more automated fashion.

A Modeling Source Code
def knn_comparison(X_train, X_test, y_train, y_test, k):
    # x = data[['Source Port', 'Destination Port', 'Length']].values
    # y = data['Arrival Time'].astype(int).values
    X_train = X_train.astype(int)
    X_test = X_test.astype(int)
    y_train = y_train.astype(int).values
    y_test = y_test.astype(int).values

    clf = neighbors.KNeighborsClassifier(n_neighbors=k, n_jobs=-1)
    clf.fit(X_train, y_train)

    print('For k=:', k)
    # Predict the response for the test dataset
    y_pred = clf.predict(X_test)

    # Model Accuracy: how often is the classifier correct?
    print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

    # Model Precision: what percentage of positive tuples are labelled as such?
    print("Precision:", metrics.precision_score(y_test, y_pred, average='weighted'))

    # Model Recall: what percentage of positive tuples are labelled as such?
    print("Recall:", metrics.recall_score(y_test, y_pred, average='weighted'))

    # Plot the decision region; features 2 and 3 are fixed to
    # value = 1.5 and shown within a band of +/- 0.75
    value = 1.5
    width = 0.75
    plot_decision_regions(X_train, y_train, clf=clf,
                          filler_feature_values={2: value, 3: value},
                          filler_feature_ranges={2: width, 3: width},
                          legend=2)

    # Adding axes annotations
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Knn with K=' + str(k))
    plt.show()
    # from:
    # http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/
    # https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python


def other_classifiers_comparison(X_train, X_test, y_train, y_test):
    X_train = X_train.astype(int)
    X_test = X_test.astype(int)
    y_train = y_train.astype(int).values
    y_test = y_test.astype(int).values

    # Create the remaining classifiers
    clf1 = svm.SVC(kernel='linear', gamma="auto")  # linear kernel
    clf2 = linear_model.LogisticRegression(random_state=1, solver='lbfgs', n_jobs=-1)
    clf3 = ensemble.RandomForestClassifier(n_estimators=100, random_state=1, n_jobs=-1)
    clf4 = naive_bayes.GaussianNB()

    gs = gridspec.GridSpec(2, 2)
    fig = plt.figure(figsize=(10, 8))

    labels = ['SVM', 'Logistic Regression', 'Random Forest', 'Naive Bayes']
    for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels,
                             itertools.product([0, 1], repeat=2)):
        # Train the model using the training sets
        clf.fit(X_train, y_train)

        # Predict the response for the test dataset
        y_pred = clf.predict(X_test)

        print("Results for: ", lab)
        # Model Accuracy: how often is the classifier correct?
        print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

        # Model Precision: what percentage of positive tuples are labelled as such?
        print("Precision:", metrics.precision_score(y_test, y_pred, average='weighted'))

        # Model Recall: what percentage of positive tuples are labelled as such?
        print("Recall:", metrics.recall_score(y_test, y_pred, average='weighted'))

        # Plot the decision region; features 2 and 3 are fixed to
        # value = 1.5 and shown within a band of +/- 0.75
        value = 1.5
        width = 0.75
        ax = plt.subplot(gs[grd[0], grd[1]])
        fig = plot_decision_regions(X=X_train, y=y_train, clf=clf,
                                    filler_feature_values={2: value, 3: value},
                                    filler_feature_ranges={2: width, 3: width},
                                    legend=2)
        plt.title(lab)
    plt.show()


i = 0
while i < len(df['Timestamp']):
    # df.at[i, 'Timestamp'] = time.mktime(datetime.datetime.strptime(
    #     df['Timestamp'][i], '%M:%S.%f').timetuple())
    # First coding threshold: SYN can occur in packets between 42 and 1434 bytes
    # if df.at[i, 'Average Packet Size'] >= 42 and df.at[i, 'Average Packet Size'] <= 1434:
    # Second coding threshold: SYN can occur in packets between 50 and 70 bytes
    # or between 160 and 180 bytes
    if (50 <= df.at[i, 'Average Packet Size'] <= 70
            or 160 <= df.at[i, 'Average Packet Size'] <= 180):
        df.at[i, 'Average Packet Size'] = 1
    else:
        df.at[i, 'Average Packet Size'] = 0
    i += 1

# Visualizing the updated field
df.head()

X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, df['Average Packet Size'], test_size=0.30, random_state=100)
other_classifiers_comparison(X_train, X_test, y_train, y_test)
Listing 10: Source code

B Dataset Samples

Figure 57: First Dataset Sample (Before Transform)

Figure 58: First Dataset Sample (After Transform)

Figure 59: Second Dataset (Before Transform)

Figure 60: Second Dataset (After Transform)

References
[1] Keyur K Patel, Sunil M Patel, et al. “Internet of things-IOT: defini-
tion, characteristics, architecture, enabling technologies, application
& future challenges”. In: International journal of engineering science
and computing 6.5 (2016).
[2] Shanzhi Chen et al. “A vision of IoT: Applications, challenges, and
opportunities with china perspective”. In: IEEE Internet of Things
journal 1.4 (2014), pp. 349–359.
[3] In Lee and Kyoochun Lee. “The Internet of Things (IoT): Appli-
cations, investments, and challenges for enterprises”. In: Business
Horizons 58.4 (2015), pp. 431–440.
[4] Priyanka Rawat, Kamal Deep Singh, and Jean Marie Bonnin. “Cog-
nitive radio for M2M and Internet of Things: A survey”. In: Com-
puter Communications 94 (2016), pp. 1–29.
[5] Fredrik Jejdling (ericsson). Ericsson Mobility Report. URL: https:
//www.ericsson.com/en/mobility-report/reports.
[6] Yong Ho Hwang. “Iot security & privacy: threats and challenges”.
In: Proceedings of the 1st ACM workshop on IoT privacy, trust, and
security. 2015, pp. 1–1.
[7] ericsson. IoT connections outlook. URL: https://www.ericsson.
com/en/mobility-report/reports/june-2020/iot-connections-
outlook.
[8] Rwan Mahmoud et al. “Internet of things (IoT) security: Current
status, challenges and prospective measures”. In: 2015 10th Interna-
tional Conference for Internet Technology and Secured Transactions
(ICITST). IEEE. 2015, pp. 336–341.
[9] Mung Chiang and Tao Zhang. “Fog and IoT: An overview of research
opportunities”. In: IEEE Internet of Things Journal 3.6 (2016), pp. 854–
864.
[10] Philipp Schulz et al. “Latency critical IoT applications in 5G: Per-
spective on the design of radio interface and network architecture”.
In: IEEE Communications Magazine 55.2 (2017), pp. 70–78.
[11] Wenjie Yang et al. “Narrowband wireless access for low-power mas-
sive internet of things: A bandwidth perspective”. In: IEEE wireless
communications 24.3 (2017), pp. 138–145.
[12] Michael A Andersson et al. “Feasibility of ambient RF energy har-
vesting for self-sustainable M2M communications using transpar-
ent and flexible graphene antennas”. In: IEEE Access 4 (2016), pp. 5850–
5857.
[13] Marcin Bajer. “IoT for smart buildings-long awaited revolution or
lean evolution”. In: 2018 IEEE 6th International Conference on Fu-
ture Internet of Things and Cloud (FiCloud). IEEE. 2018, pp. 149–
154.

[14] Jianli Pan and Zhicheng Yang. “Cybersecurity Challenges and Op-
portunities in the New” Edge Computing+ IoT” World”. In: Proceed-
ings of the 2018 ACM International Workshop on Security in Soft-
ware Defined Networks & Network Function Virtualization. 2018,
pp. 29–32.
[15] Beth Stackpole. Symantec Security Summary - June 2020, COVID-
19 attacks continue and new threats on the rise. URL: https : / /
symantec-enterprise-blogs.security.com/blogs/feature-
stories/symantec-security-summary-june-2020.
[16] Tobias Heer et al. “Security Challenges in the IP-based Internet of
Things”. In: Wireless Personal Communications 61.3 (2011), pp. 527–
542.
[17] NETSCOUT. Cloud in the crosshairs. URL: https://www.netscout.
com/report/.
[18] Ivo Van der Elzen and Jeroen van Heugten. “Techniques for detect-
ing compromised IoT devices”. In: University of Amsterdam (2017).
[19] Kubra Saeedi. Machine learning for ddos detection in packet core
network for iot. 2019.
[20] EY Building a better working world. Cyber risk management. URL:
https://www.ey.com/no_no/consulting/cybersecurity-
risk-management.
[21] Ngo Manh Khoi et al. “IReHMo: An efficient IoT-based remote health
monitoring system for smart regions”. In: 2015 17th International
Conference on E-health Networking, Application & Services (Health-
Com). IEEE. 2015, pp. 563–568.
[22] Y. A. Qadri et al. “The Future of Healthcare Internet of Things: A
Survey of Emerging Technologies”. In: IEEE Communications Sur-
veys Tutorials 22.2 (2020), pp. 1121–1167.
[23] Prosanta Gope and Tzonelih Hwang. “BSN-Care: A secure IoT-based
modern healthcare system using body sensor network”. In: IEEE
sensors journal 16.5 (2015), pp. 1368–1376.
[24] Rohan Doshi, Noah Apthorpe, and Nick Feamster. “Machine learn-
ing ddos detection for consumer internet of things devices”. In: 2018
IEEE Security and Privacy Workshops (SPW). IEEE. 2018, pp. 29–
35.
[25] Wikipedia. Cellular network. URL: https://en.wikipedia.org/
wiki/Cellular_network.
[26] Artiza Networks. LTE Tutorials. URL: https://www.artizanetworks.
com/resources/tutorials/fuc.html.
[27] Qualcomm Inc. RAN WG3 Chairman Dino Flore. LTE RAN architec-
ture aspects. URL: ftp://www.3gpp.org/workshop/2009- 12-
17_ITU-R_IMT-Adv_eval/docs/pdf/REV-09000%5C%20LTE%
5C%20RAN%5C%20Architecture%5C%20aspects.pdf.
[28] 3GPP MCC Frédéric Firmin. The Evolved Packet Core. URL: https:
//www.3gpp.org/technologies/keywords- acronyms/100-
the-evolved-packet-core.

[29] Alepo Technologies Inc. Home Subscriber Server (HSS). URL: https:
//medium.com/@AlepoTech/home-subscriber-server-hss-
82470d3f332.
[30] Q. Pham et al. “A Survey of Multi-Access Edge Computing in 5G
and Beyond: Fundamentals, Technology Integration, and State-of-
the-Art”. In: IEEE Access 8 (2020), pp. 116974–117017.
[31] Madhusanka Liyanage et al. Comprehensive Guide to 5G Security.
Wiley Online Library, 2018.
[32] Jahangir Saqlain. “IoT and 5G: History evolution and its architec-
ture their compatibility and future.” In: (2018).
[33] wikipedia. Vulnerability (computing). URL: https://en.wikipedia.
org/wiki/Vulnerability_(computing).
[34] Mohamed Abomhara et al. “Cyber security and the internet of things:
vulnerabilities, threats, intruders and attacks”. In: Journal of Cyber
Security and Mobility 4.1 (2015), pp. 65–88.
[35] Sachin Babar et al. “Proposed security model and threat taxonomy
for the Internet of Things (IoT)”. In: International Conference on
Network Security and Applications. Springer. 2010, pp. 420–429.
[36] Christos Xenakis and Lazaros Merakos. “Security in third genera-
tion mobile networks”. In: Computer communications 27.7 (2004),
pp. 638–650.
[37] Anastasios N Bikos and Nicolas Sklavos. “LTE/SAE security issues
on 4G wireless networks”. In: IEEE Security & Privacy 11.2 (2012),
pp. 55–62.
[38] Marc Lichtman et al. “LTE/LTE-A jamming, spoofing, and sniffing:
threat assessment and mitigation”. In: IEEE Communications Mag-
azine 54.4 (2016), pp. 54–61.
[39] Yongsuk Park and Taejoon Park. “A survey of security threats on 4G
networks”. In: 2007 IEEE Globecom workshops. IEEE. 2007, pp. 1–
6.
[40] Wei Zhou et al. “The effect of iot new features on security and pri-
vacy: New threats, existing solutions, and challenges yet to be solved”.
In: IEEE Internet of Things Journal 6.2 (2018), pp. 1606–1616.
[41] Bogdan Copos et al. “Is anybody home? Inferring activity from smart
home network traffic”. In: 2016 IEEE Security and Privacy Work-
shops (SPW). IEEE. 2016, pp. 245–251.
[42] Job Noorman et al. “Sancus: Low-cost trustworthy extensible net-
worked devices with a zero-software trusted computing base”. In:
22nd {USENIX} Security Symposium ({USENIX} Security 13). 2013,
pp. 479–498.
[43] Wikipedia. Narrowband IoT. URL: https://en.wikipedia.org/
wiki/Narrowband_IoT.
[44] Yihenew Dagne Beyene et al. “On the performance of narrow-band
Internet of Things (NB-IoT)”. In: 2017 ieee wireless communications
and networking conference (wcnc). IEEE. 2017, pp. 1–6.

[45] Nitin Mangalvedhe, Rapeepat Ratasuk, and Amitava Ghosh. “NB-
IoT deployment study for low power wide area cellular IoT”. In: 2016
ieee 27th annual international symposium on personal, indoor, and
mobile radio communications (pimrc). IEEE. 2016, pp. 1–6.
[46] Changsheng Yu et al. “Uplink scheduling and link adaptation for
narrowband Internet of Things systems”. In: IEEE Access 5 (2017),
pp. 1724–1734.
[47] Mohsin B Tamboli and Dayanand Dambawade. “Secure and efficient
CoAP based authentication and access control for Internet of Things
(IoT)”. In: 2016 IEEE International Conference on Recent Trends in
Electronics, Information & Communication Technology (RTEICT).
IEEE. 2016, pp. 1245–1250.
[48] Rafael de Jesus Martins et al. “Performance Analysis of 6LoWPAN
and CoAP for Secure Communications in Smart Homes”. In: 2016
IEEE 30th International Conference on Advanced Information Net-
working and Applications (AINA). IEEE. 2016, pp. 1027–1034.
[49] Ala Al-Fuqaha et al. “Internet of things: A survey on enabling tech-
nologies, protocols, and applications”. In: IEEE communications sur-
veys & tutorials 17.4 (2015), pp. 2347–2376.
[50] ABHISHEK GHOSH. Message Queuing Telemetry Transport (MQTT)
Protocol. URL: https://thecustomizewindows.com/2014/07/
message-queuing-telemetry-transport-mqtt-protocol/.
[51] Ahmad W Atamli and Andrew Martin. “Threat-based security anal-
ysis for the internet of things”. In: 2014 International Workshop on
Secure Internet of Things. IEEE. 2014, pp. 35–43.
[52] Syed Naeem Firdous et al. “Modelling and evaluation of malicious
attacks against the iot mqtt protocol”. In: 2017 IEEE International
Conference on Internet of Things (iThings) and IEEE Green Com-
puting and Communications (GreenCom) and IEEE Cyber, Physical
and Social Computing (CPSCom) and IEEE Smart Data (Smart-
Data). IEEE. 2017, pp. 748–755.
[53] Lavinia Nastase. “Security in the internet of things: A survey on ap-
plication layer protocols”. In: 2017 21st International Conference on
Control Systems and Computer Science (CSCS). IEEE. 2017, pp. 659–
666.
[54] Nitin Naik. “Choice of effective messaging protocols for IoT systems:
MQTT, CoAP, AMQP and HTTP”. In: 2017 IEEE international sys-
tems engineering symposium (ISSE). IEEE. 2017, pp. 1–7.
[55] Ian Noel McAteer et al. “Security vulnerabilities and cyber threat
analysis of the AMQP protocol for the internet of things”. In: (2017).
[56] Jesús A Romualdo Ramírez, Enrique Mendéz Franco, and David
Tinoco Varela. “Fuzzification of facial movements to generate human-
machine interfaces in order to control robots by XMPP internet pro-
tocol”. In: MATEC Web of Conferences. Vol. 125. EDP Sciences. 2017,
p. 04020.

[57] ZigBee Alliance. “Zigbee alliance”. In: WPAN industry group, http://www.
zigbee. org/. The industry group responsible for the ZigBee standard
and certification (2010).
[58] Shahin Farahani. ZigBee wireless networks and transceivers. Newnes,
2011.
[59] Sharly Joana Halder, Joon-Goo Park, and Wooju Kim. “Adaptive fil-
tering for indoor localization using ZIGBEE RSSI and LQI measure-
ment”. In: Adaptive Filtering Applications (2011), pp. 305–324.
[60] Abdelkader Lahmadi, Cesar Brandin, and Olivier Festor. “A testing
framework for discovering vulnerabilities in 6LoWPAN networks”.
In: 2012 IEEE 8th International Conference on Distributed Comput-
ing in Sensor Systems. IEEE. 2012, pp. 335–340.
[61] René Hummen et al. “6LoWPAN fragmentation attacks and mitiga-
tion mechanisms”. In: Proceedings of the sixth ACM conference on
Security and privacy in wireless and mobile networks. 2013, pp. 55–
66.
[62] Ghada Glissa and Aref Meddeb. “6LoWPAN multi-layered security
protocol based on IEEE 802.15. 4 security features”. In: 2017 13th
International Wireless Communications and Mobile Computing Con-
ference (IWCMC). IEEE. 2017, pp. 264–269.
[63] Ajay Kumar Nain et al. “A secure phase-encrypted IEEE 802.15.
4 transceiver design”. In: IEEE Transactions on Computers 66.8
(2017), pp. 1421–1427.
[64] Ajay Kumar Nain and Pachamuthu Rajalakshmi. “A reliable covert
channel over IEEE 802.15. 4 using steganography”. In: 2016 IEEE
3rd World Forum on Internet of Things (WF-IoT). IEEE. 2016, pp. 711–
716.
[65] keycdn. DDoS Attack. URL: https://www.keycdn.com/support/
ddos-attack.
[66] Vincenzo Matta, Mario Di Mauro, and Maurizio Longo. “DDoS at-
tacks with randomized traffic innovation: Botnet identification chal-
lenges and strategies”. In: IEEE Transactions on Information Foren-
sics and Security 12.8 (2017), pp. 1844–1859.
[67] Liang Xiao et al. “IoT security techniques based on machine learn-
ing: How do IoT devices use AI to enhance security?” In: IEEE Sig-
nal Processing Magazine 35.5 (2018), pp. 41–49.
[68] Sam Edwards and Ioannis Profetis. “Hajime: Analysis of a decen-
tralized internet worm for IoT devices”. In: Rapidity Networks 16
(2016).
[69] Jelena Mirkovic, Gregory Prier, and Peter Reiher. “Attacking DDoS
at the source”. In: 10th IEEE International Conference on Network
Protocols, 2002. Proceedings. IEEE. 2002, pp. 312–321.
[70] Thomer M Gil and Massimiliano Poletto. “MULTOPS: A Data-Structure
for Bandwidth Attack Detection.” In: USENIX Security Symposium.
2001, pp. 23–38.

[71] MANANET CS3. Comprehensive DDoS Defense Solutions. URL: http:
//cs3-inc.com/MANAnet.html.
[72] Zecheng He, Tianwei Zhang, and Ruby B Lee. “Machine learning
based DDoS attack detection from source side in cloud”. In: 2017
IEEE 4th International Conference on Cyber Security and Cloud
Computing (CSCloud). IEEE. 2017, pp. 114–120.
[73] Stefan Seufert and Darragh O’Brien. “Machine learning for auto-
matic defence against distributed denial of service attacks”. In: 2007
IEEE International Conference on Communications. IEEE. 2007,
pp. 1217–1222.
[74] Todd Booth and Karl Andersson. “Network security of internet ser-
vices: eliminate DDoS reflection amplification attacks”. In: Journal
of Internet Services and Information Security (JISIS) 5.3 (2015),
pp. 58–79.
[75] Peter Ken Bediako. Long Short-Term Memory Recurrent Neural Net-
work for detecting DDoS flooding attacks within TensorFlow Imple-
mentation framework. 2017.
[76] Esraa Alomari et al. “Botnet-based distributed denial of service (DDoS)
attacks on web servers: classification and art”. In: arXiv preprint
arXiv:1208.0403 (2012).
[77] Imperva. DDoS Attacks. URL: https://www.imperva.com/learn/
ddos/ddos-attacks/.
[78] CAPEC. View the List of Attack Patterns. URL: https://capec.
mitre.org/.
[79] FuiFui Wong and Cheng Xiang Tan. “A survey of trends in massive
DDoS attacks and cloud-based mitigations”. In: International Jour-
nal of Network Security & Its Applications 6.3 (2014), p. 57.
[80] Seyed Mohammad Mousavi and Marc St-Hilaire. “Early detection of
DDoS attacks against SDN controllers”. In: 2015 International Con-
ference on Computing, Networking and Communications (ICNC). IEEE.
2015, pp. 77–81.
[81] Monowar H Bhuyan, DK Bhattacharyya, and Jugal K Kalita. “An
empirical evaluation of information metrics for low-rate and high-
rate DDoS attack detection”. In: Pattern Recognition Letters 51 (2015),
pp. 1–7.
[82] Rodrigo Braga, Edjard Mota, and Alexandre Passito. “Lightweight
DDoS flooding attack detection using NOX/OpenFlow”. In: IEEE Lo-
cal Computer Network Conference. IEEE. 2010, pp. 408–415.
[83] Laura Feinstein et al. “Statistical approaches to DDoS attack detec-
tion and response”. In: Proceedings DARPA information survivabil-
ity conference and exposition. Vol. 1. IEEE. 2003, pp. 303–314.
[84] Jarrod Bakker. “Intelligent traffic classification for detecting DDoS
attacks using SDN/OpenFlow”. In: (2017).
[85] Manjula Suresh and R Anitha. “Evaluating machine learning algo-
rithms for detecting DDoS attacks”. In: International Conference on
Network Security and Applications. Springer. 2011, pp. 441–452.

[86] Rory P Bunker and Fadi Thabtah. “A machine learning framework
for sport result prediction”. In: Applied computing and informatics
15.1 (2019), pp. 27–33.
[87] David Adedayo Adeniyi, Zhaoqiang Wei, and Y Yongquan. “Auto-
mated web usage data mining and recommendation system using
K-Nearest Neighbor (KNN) classification method”. In: Applied Com-
puting and Informatics 12.1 (2016), pp. 90–108.
[88] Kevin Zakka’s Blog. A Complete Guide to K-Nearest-Neighbors with
Applications in Python and R. URL: https://kevinzakka.github.
io/2016/07/13/k-nearest-neighbor/.
[89] VS Prasatha et al. “Effects of distance measure choice on knn clas-
sifier performance-a review”. In: arXiv preprint arXiv:1708.04321
(2017).
[90] Feng Tian et al. “Research on flight phase division based on deci-
sion tree classifier”. In: 2017 2nd IEEE International Conference on
Computational Intelligence and Applications (ICCIA). IEEE. 2017,
pp. 372–375.
[91] Yi-Chi Wu et al. “DDoS detection and traceback with decision tree
and grey relational analysis”. In: International Journal of Ad Hoc
and Ubiquitous Computing 7.2 (2011), pp. 121–136.
[92] Corinna Cortes and Vladimir Vapnik. “Support-vector networks”.
In: Machine learning 20.3 (1995), pp. 273–297.
[93] Asa Ben-Hur et al. “Support vector clustering”. In: Journal of ma-
chine learning research 2.Dec (2001), pp. 125–137.
[94] Tina R Patil. “MSSS Performance analysis of naive bayes and J48
classification algorithm for data classification. Intl”. In: Journal of
Computer Science and Applications 6.2 (2013).
[95] Alex Smola and SVN Vishwanathan. “Introduction to machine learn-
ing”. In: Cambridge University, UK 32.34 (2008), p. 2008.
[96] Tejaswinee A Shinde and Jayashree R Prasad. “IoT based animal
health monitoring with naive Bayes classification”. In: IJETT 1.2
(2017).
[97] the free encyclopedia Wikipedia. Logistic regression. URL: https:
//en.wikipedia.org/wiki/Logistic_regression.
[98] the free encyclopedia Wikipedia. Regression analysis. URL: https:
//en.wikipedia.org/wiki/Regression_analysis.
[99] keycdn. DDoS Attack. URL: https://reference.wolfram.com/
legacy/applications/fuzzylogic/Manual/12.html.
[100] Tuan A Tang et al. “Deep learning approach for network intrusion
detection in software defined networking”. In: 2016 International
Conference on Wireless Networks and Mobile Communications (WIN-
COM). IEEE. 2016, pp. 258–263.
[101] Vishal Maini. Machine Learning for Humans, Part 5: Reinforcement
Learning. URL: https://medium.com/machine-learning-for-humans/
reinforcement-learning-6eacf258b265.
[102] Marek Małowidzki, P Berezinski, and Michał Mazur. “Network in-
trusion detection: Half a kingdom for a good dataset”. In: Proceed-
ings of NATO STO SAS-139 Workshop, Portugal. 2015.
[103] Robert Koch, Björn Stelte, and Mario Golling. “Attack trends in
present computer networks”. In: 2012 4th International Conference
on Cyber Conflict (CYCON 2012). IEEE. 2012, pp. 1–12.
[104] Ciza Thomas, Vishwas Sharma, and N Balakrishnan. “Usefulness
of DARPA dataset for intrusion detection system evaluation”. In:
Data Mining, Intrusion Detection, Information Assurance, and Data
Networks Security 2008. Vol. 6973. International Society for Optics
and Photonics. 2008, 69730G.
[105] L Dhanabal and SP Shantharajah. “A study on NSL-KDD dataset
for intrusion detection system based on classification algorithms”.
In: International Journal of Advanced Research in Computer and
Communication Engineering 4.6 (2015), pp. 446–452.
[106] Amirhossein Gharib et al. “An evaluation framework for intrusion
detection dataset”. In: 2016 International Conference on Information
Science and Security (ICISS). IEEE. 2016, pp. 1–6.
[107] Monowar H Bhuyan, Dhruba Kumar Bhattacharyya, and Jugal K
Kalita. “Network anomaly detection: methods, systems and tools”.
In: Ieee communications surveys & tutorials 16.1 (2013), pp. 303–
336.
[108] Antonia Nisioti et al. “From intrusion detection to attacker attribu-
tion: A comprehensive survey of unsupervised methods”. In: IEEE
Communications Surveys & Tutorials 20.4 (2018), pp. 3369–3388.
[109] Thomas H Morris, Zach Thornton, and Ian Turnipseed. “Industrial
control system simulation and data logging for intrusion detection
system research”. In: 7th annual southeastern cyber security summit
(2015), pp. 3–4.
[110] Ozlem Yavanoglu and Murat Aydos. “A review on cyber security
datasets for machine learning algorithms”. In: 2017 IEEE Interna-
tional Conference on Big Data (Big Data). IEEE. 2017, pp. 2186–
2193.
[111] Markus Ring et al. “A survey of network-based intrusion detection
data sets”. In: Computers & Security 86 (2019), pp. 147–167.
[112] Priyanka Kamboj et al. “Detection techniques of DDoS attacks: A
survey”. In: 2017 4th IEEE Uttar Pradesh Section International Con-
ference on Electrical, Computer and Electronics (UPCON). IEEE.
2017, pp. 675–679.
[113] Jieren Cheng et al. “A DDoS detection method for socially aware
networking based on forecasting fusion feature sequence”. In: The
computer journal 61.7 (2018), pp. 959–970.
[114] Umer Ahmed Butt et al. “A Review of Machine Learning Algorithms
for Cloud Computing Security”. In: Electronics 9.9 (2020), p. 1379.
[115] S Umarani and D Sharmila. “Predicting application layer DDoS at-
tacks using machine learning algorithms”. In: International Journal
of Computer and Systems Engineering 8.10 (2015), pp. 1912–1917.
[116] Joshua Ojo Nehinbe. “A critical evaluation of datasets for investi-
gating IDSs and IPSs researches”. In: 2011 IEEE 10th International
Conference on Cybernetic Intelligent Systems (CIS). IEEE. 2011, pp. 92–
97.
[117] Ali Feizollah et al. “A study of machine learning classifiers for anomaly-
based mobile botnet detection”. In: Malaysian Journal of Computer
Science 26.4 (2013), pp. 251–265.
[118] Irfan Ali. Technology, Presentation, Infographic, Documents, 26 April
2017. [Online]. URL: https://www.slideshare.net/aliirfan04/
gtp-overview.
[119] Cisco. Ultra IoT C-SGN Administration Guide, StarOS Release 21.20,
23 August 2020. [Online]. URL: https://www.cisco.com/c/en/us/td/
docs/wireless/asr_5000/21-20_6-14/Ultra-IoT-CSGN-Admin-Guide/
21-20-ultra-iot-csgn-admin/21-17-Ultra-IoT-CSGN-Admin_chapter_01.
html?dtid=osscdc000283.
[120] Center for Applied Internet Data Analysis. The CAIDA DDoS Attack
2007, Dataset. URL: https://www.caida.org/data/passive/
ddos-20070804_dataset.xml.
[121] Canadian Institute for Cybersecurity. NSL-KDD dataset. URL: https:
//www.unb.ca/cic/datasets/nsl.html.
[122] g0tmi1k. What is Kali Linux? URL: https://www.kali.org/docs/
introduction/what-is-kali-linux/.
[123] wireshark. Wireshark User’s Guide. URL: https://www.wireshark.
org/docs/wsug_html_chunked/ChapterIntroduction.html.
[124] Canadian Institute for Cybersecurity. DDoS Evaluation Dataset (CIC-
DDoS2019). URL: https://www.unb.ca/cic/datasets/ddos-
2019.html.
[125] Fabio Cesar Schuartz, Anelise Munaretto, and Mauro Fonseca. “Uma
Comparação entre os Sistemas de Detecção de Ameaças Distribuídas
de Rede Baseado no Processamento de Dados em Fluxo e em Lotes”.
In: Anais do XXIV Workshop de Gerência e Operação de Redes e
Serviços. SBC. 2019, pp. 29–42.
[126] Priya Pandey, Maneela Jain, and Rajneesh Pachouri. “DDoS attack
on wireless sensor network: A review”. In: International Journal of
Advanced Research in Computer Science 8.9 (2017).
[127] Thwe Thwe Oo and Thandar Phyu. “Analysis of DDoS Detection
System based on Anomaly Detection System”. In: International Con-
ference on Advances in Engineering and Technology (ICAET’2014).
Singapore. 2014.
[128] Markus Miettinen et al. “IoT Sentinel: Automated device-type iden-
tification for security enforcement in IoT”. In: 2017 IEEE 37th In-
ternational Conference on Distributed Computing Systems (ICDCS).
IEEE. 2017, pp. 2177–2184.
[129] Wikipedia, the free encyclopedia. Cross-industry standard process
for data mining. URL: https://en.wikipedia.org/wiki/
Cross-industry_standard_process_for_data_mining.
[130] scikit-learn. Classification. URL: https://scikit- learn.org/
stable/.
[131] Jason Brownlee. A Gentle Introduction to k-fold Cross-Validation.
URL: https://machinelearningmastery.com/k-fold-cross-
validation/.
[132] Sebastian Raschka. Machine Learning FAQ. URL: https://
sebastianraschka.com/faq/docs/parametric_vs_nonparametric.html.
[133] Tom M Mitchell. Machine Learning. McGraw-Hill, 1997.
[134] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foun-
dations of machine learning. MIT press, 2018.
[135] David Martin Powers. “Evaluation: from precision, recall and F-
measure to ROC, informedness, markedness and correlation”. In:
Journal of Machine Learning Technologies 2.1 (2011), pp. 37–63.
