ADML IoT 1-0-1
Department of Informatics
Faculty of Mathematics and Natural Sciences
UNIVERSITY OF OSLO
Autumn 2020
Anomaly Detection with
Machine Learning in IoT
Cellular Networks
Master's Thesis
Supervisor: Co-Supervisor:
http://www.duo.uio.no/
Acknowledgements
First of all, I would like to thank Almighty God, who helped me complete this thesis work. Secondly, I would like to thank my supervisor, Prof. Dr. Thanh van Do, for his support and the guidance he provided throughout my master's thesis. I would also like to thank my co-supervisor, Prof. Boning Feng, who gave me the opportunity to write about such an interesting topic. With the help of both of you, I overcame the challenges faced throughout my thesis.
Author
Contents
1 Introduction
1.1 Background
1.2 Motivation
1.3 IoT in Industry
1.3.1 Healthcare
1.4 Problem Statement
1.5 Research Methodology
1.6 Thesis Structure
3 Approach
3.1 Objectives
3.2 Design Phase
3.3 Implementation and Experiment Phase
3.3.1 Data Collection
3.3.2 Used Tools and Software
3.3.3 Collection of Normal and DDoS Attack Traffic
3.3.4 Feature Extraction
3.4 Methodology
3.4.1 Cleaning and Transformation
3.4.2 Splitting of Dataset
3.4.3 Modeling
3.4.4 Evaluation
4 Results
4.1 First Threshold (Length below 100 bytes)
4.1.1 Normal Scenario
4.1.2 DDoS Scenario
4.2 Second Threshold (Length between 50 and 70 bytes and between 160 and 180 bytes)
4.2.1 Normal Scenario
4.2.2 DDoS Scenario
5 Evaluation / Discussion
B Dataset Samples
List of Tables
1 Specification of dataset CICDDoS2019 [124]
2 Labeling of binary classification
3 scikit-learn Python Library [130]
4 Classifier statistics (Normal)
5 Performance Metrics
6 Classifier statistics (DDoS)
7 Performance Metrics (DDoS Dataset)
8 Classifier statistics (Normal)
9 Performance Metrics (Normal Dataset)
10 Classifier statistics (DDoS)
11 Performance Metrics (DDoS Dataset)
List of Figures
1 Internet of Things [4]
2 IoT environment
3 Cellular IoT connections by segment and technology (billion) [5]
4 Massive vs. Critical IoT [12]
5 The three-tier Architecture of the H-IoT systems [22]
6 Mobile subscriptions by technology [5]
7 Different Generations in Telecom [30]
8 ITU X.805 Framework [37]
9 Security threats in 5G [31]
10 NB-IoT deployment [44]
11 Partial Deployment of NB-IoT [44]
12 IoT Protocol Stack
13 MQTT in IoT [50]
14 Some Applications in ZigBee [59]
15 DDoS Attack [65]
16 Attack Life Cycle [18]
17 Direct and Indirect Attacks [74]
18 Complex Reflection Attack [74]
19 DDoS Attack Types [75]
20 Machine Learning [86]
21 An example of KNN classification [89]
22 Structure of Decision Tree [90]
23 An example of separable problem in 2 dimensional space [92]
24 An example - Naïve Bayes model [95]
25 An example - Regression in Gaussian distribution [98]
26 Research Methodology
27 End to End Communication in Network
28 Proposed Method for DDoS Detection
29 Test Lab Devices
30 Test Lab Network
31 Testbed Architecture [124]
32 Wireshark capturing [123]
33 Parametric/non-parametric models
34 k-NN classifier - an example [133]
35 Pseudo-code for k-NN Algorithm [133]
36 SVM - an example [133]
37 Pseudo-code for SVM Algorithm [133]
38 Pseudo-code for Naïve Bayes Algorithm [133]
39 Pseudo-code for Decision Tree Algorithm [133]
40 Pseudo-code for Logistic Regression Algorithm [133]
41 K-Nearest Neighbors
42 Error Rate vs K
43 Other Classifiers
44 K-Nearest Neighbors
45 Error Rate vs K
46 Other Classifiers
47 K-Nearest Neighbors
48 Error Rate vs K
49 Other Classifiers
50 K-Nearest Neighbors
51 Error Rate vs K
52 Other Classifiers
53 Overview of Performance Metrics (Normal - 1st Threshold)
54 Overview of Performance Metrics (DDoS - 1st Threshold)
55 Overview of Performance Metrics (Normal - 2nd Threshold)
56 Overview of Performance Metrics (DDoS - 2nd Threshold)
57 First Dataset Sample (Before Transform)
58 First Dataset Sample (After Transform)
59 Second Dataset (Before Transform)
60 Second Dataset (After Transform)
Listings
1 Transformation
2 Labeling
3 Splitting of data
4 Classifiers used in this work
5 Evaluation model
6 Find K value - Normal (First Threshold)
7 Find K value - DDoS (First Threshold)
8 Find K value - Normal (Second Threshold)
9 Find K value - DDoS (Second Threshold)
10 Source code
Acronyms
6LoWPAN IPv6 over Low-Power Wireless Personal Area Network
NAS Non-Access Stratum
NB-IoT Narrowband Internet of Things
1 Introduction
This chapter establishes the context of this thesis, explaining its motivation, aim, research questions, delimitations and contributions. The last section outlines the structure of the thesis.
1.1 Background
Internet of Things (IoT) is described as a “network to connect anything
with the Internet based on stipulated protocols through information sensing
equipments to conduct information exchange and communications in order
to achieve smart recognitions, positioning, tracing, monitoring, and admin-
istration [1].”
The IoT environment, shown in figure 2, consists of three groups: the device manufacturers, the IoT applications running on application servers, and the Evolved Packet Core (EPC) operated by telecommunications operators. Each of these parties should ensure the protection, security and availability of services to the consumer [5] [6].
Inside the personal area network, IoT devices transmit packets via Z-Wave and Zigbee, while in the wide area network packets are carried over GSM, UMTS or LTE. Packets from the eNodeB are forwarded to the four nodes of the EPC: the Mobility Management Entity (MME), the Home Subscriber Server (HSS), the Serving Gateway (SGW) and the Packet Data Network Gateway (PGW).
In figure 3, Ericsson predicts that some 29 billion connected devices will be in use by 2025 [5]. Of these, around 18 billion are IoT devices, such as motion and door sensors and other smart devices, while the remaining 11 billion are devices such as smartphones [5].
Figure 3: Cellular IoT connections by segment and technology (billion) [5]
IoT currently leads the charge, offering powerful benefits such as new market opportunities, growth in trade revenues, better decision-making, cost reduction, security and safety, and enhanced citizen participation [2]. However, 70% of IoT devices contain vulnerabilities, in areas such as encryption and password security among others, that give attackers open access for severe attacks such as Denial of Service (DoS) [8].
Attackers keep trying new techniques to break security, steal intellectual property and compromise sensitive information. Everyday security threats are becoming more complex and more difficult to defeat. We therefore need to be aware of what kind of security monitoring is required [9]. The way to defend IoT against attackers is to learn how to predict attacks [2] [3].
Massive IoT uses a large range of sensor and actuator devices that are relatively inexpensive and designed to maintain long battery life. A few examples of massive IoT are smart cities, agriculture, transport and logistics. Critical IoT applications include autonomous vehicles, remote surgery, remote manufacturing, etc., requiring high availability, ultra-reliability and low latency [5].
Any security issue arising from delay or unavailability of service to any of these groups can have a considerable impact on business and society [13]. Enormous quantities of resource-constrained IoT devices lack sufficient computational power, capacity and battery life to perform authentication, encryption and other security computations [8] [6].
Figure 4: Massive vs. Critical IoT [12]
1.2 Motivation
IoT coordinates millions of web-connected devices to make our everyday lives easier; however, it faces security challenges that need consideration, as IoT objects are poorly controlled, maintained and protected [16].
According to the Arbor Security report [17], IoT DDoS attacks were the dominant attack type in 2017, and 65 percent of the attacks carried out in 2016 were significant DDoS attacks. The Mirai DDoS attack [18], the biggest attack recorded at the time, was triggered by the infection of vulnerable IoT devices. Consequently, DDoS attacks must be detected and mitigated. Transmission Control Protocol (TCP), User Datagram Protocol (UDP) and DNS flooding are the most common DDoS attacks. Protective measures are challenging to enforce due to the memory, power and processing limitations and the heterogeneous nature of IoT devices.
Consider, for example, a company that owns a few lawn-mowing robots and contracts with green-roof owners to cut their grass on a schedule based on its growth. IoT can automate this task with drones that carry the robots to the rooftops using geofencing, a capability built on GPS, caller ID and RFID. Geofencing is also used by the robots themselves to avoid falling off or wandering outside the roof boundary. In this case, IoT brings ecological and financial benefits by enabling devices to communicate. In reality, though, if an intruder launches a DoS attack or misdirects the drones or robots, the consequences can be quite harmful.
IoT security is therefore imperative, and the consequences of IoT attacks can be more dangerous than those of web attacks. As EY states, "Our Cyber Risk Management services help organizations tackle the many security challenges they face on a daily basis — supporting risk-based decisions and improved cybersecurity, reducing costs related to managing security risk, and improving their overall cybersecurity posture" [20].
1.3.1 Healthcare
Healthcare is one of the most important areas in which IoT has provided comfort to both physicians and patients, with capabilities such as real-time observation, health monitoring and patient data management. Body Sensor Network (BSN) technology is another breakthrough: it allows a physician to collect data from patients and monitor them remotely via small wearable devices that use lightweight protocols such as CoAP for data transmission [21].
As shown in figure 5, these devices collect and transmit sensitive information to a central node, such as a gateway. The protection and security of these sensor devices is extremely important because they hold patients' critical data. Any unauthorized access to, leakage from or capture of these devices can cause serious harm to patients.
If we can obtain information from the control and data planes of a cellular network, coming from IoT devices, we can apply machine learning and anomaly-detection algorithms to this data to see whether it allows us to detect, or even predict, an upcoming attack.
1.5 Research Methodology
The aim of this thesis is to detect anomalies in IoT devices that are connected to a cellular network. There are three key stages: data collection (normal and DDoS traffic), feature extraction and selection, and machine-learning classification. To this end, packets were generated by IoT devices, and normal and DDoS events were collected separately during the data-collection process. The data was pre-processed and, in the final stage, classified with the Scikit-learn tool [24]. Using the best-known classifiers (k-NN, SVM, naïve Bayes, decision tree and logistic regression), a performance evaluation was made for the task at hand.
Chapter 4 Results: Presents the outcome of what was done and explains
how the project was implemented and how the experiment was conducted.
2 Background and Related Work
This chapter presents the evolution of mobile technologies and the Internet of Things, and gives an overview of the available security mechanisms for this sector. It starts by describing the mobile technologies and the difference between an attack, a threat and a vulnerability, and continues with what a DDoS attack is and how it can be inflicted. It showcases how other researchers have used machine learning to detect DDoS attacks and what techniques they have used so far. In addition, this chapter describes the life cycle of an attack and clarifies, with a short summary, how an attack can be detected within network traffic.
of all subscribers. The HSS acts as a database containing the subscriber's public and private identities, credentials, the IMSI, and the data describing the services each user is entitled to under their mobile subscription. The HSS is consulted when a device asks for radio resources, to check the status of that device's IMSI. Other functions of the HSS include location support, mobile roaming, home location registers, subscriber preference settings and mobile authentication [29].
5. SGW: The Serving Gateway (SGW) is the local mobility anchor point for the eNodeB. The SGW is responsible for packet forwarding, inspection and routing, and for uplink and downlink charging per UE, PDN and QCI [27].
6. PGW: The Packet Data Network Gateway (PGW) interfaces the EPC to external IP networks. PGW functions include Deep Packet Inspection, IP address assignment to the UE, lawful interception, user packet filtering, uplink and downlink service-level charging, and policy control [27].
Figure 7: Different Generations in Telecom [30]
communication uses the 900 MHz and 1800 MHz bands. In the US, GSM operates on 850 MHz and 1900 MHz [31].
Owing to its low-cost base stations and handsets, GSM technology is popular around the world. Next-generation (3G) systems and LTE have evolved on the basis of the GSM architecture. The Base Station Subsystem consists of the Base Transceiver Station (BTS), which is connected to the mobile station (MS) over the air interface, and the Base Station Controller (BSC). The BSC manages the connection from the BTSs to the switching center and also handles mobility across BTSs. The Network Switching Subsystem is another subcomponent; it contains the Mobile Switching Center (MSC) and the subscriber databases. The MSC performs the switching and routes calls from the connected to the calling party. The MSC is associated with the Public Switched Telephone Network (PSTN). The Home Location Register (HLR) and Visitor Location Register (VLR) are used to verify the claimed identity of the MSC's subscribers [31].
2. Third Generation (3G) - UMTS:- A new era of technology began in 2003 when 3G was launched. 3G triggered the boom in mobile devices and smartphones powered by 3G services. 3G was based on digital voice and also included IP data, web, mail and SMS. The technologies used in 3G were W-CDMA and UMTS. Speeds increased to 2,000 kbps in 3G, and the first mobile broadband was also introduced [32].
3. Fourth Generation (4G) - LTE:- 4G/LTE was introduced to provide higher internet speeds on mobile devices. Networks and apps eventually required a higher data rate, so 4G was launched in 2011 to meet this massive need for faster internet access; it is still used alongside 2G and 3G. 4G was explicitly designed for data carried over an IP-based protocol (LTE), with interactive media as a key design goal. The switching technique used in 4G is packet switching, and speeds increased to 100,000 kbps [32].
4. Fifth Generation (5G):- Due to the ever-higher number of connected devices, a new concept of high-speed internet was introduced, requiring a next generation of transport network. 5G is the answer, with rollouts beginning around 2020. The 5G breakthrough offers digital voice, high data capacity and special features for IoT (Internet of Things), AR (Augmented Reality) and VR (Virtual Reality). Everything from smart cars to city grids can communicate over IoT using different protocols, such as the Cisco CCN and MQTT protocols. Packet switching is used in the 5G network, and the latency in 5G is only 1 ms [32].
2.2.1 Vulnerability
A vulnerability is a weak point in an information system through which an attacker can gain internal control and launch an attack or unauthorized activity. A vulnerability involves three elements: a flaw in the system, attacker access to the flaw, and the attacker's capability to exploit it. The attacker must be able to connect to the system with some kind of tool and exploit the vulnerability. A vulnerability should not be confused with a risk: the risk is the potential impact on the system that results from the misuse of the vulnerability [33].
2.2.2 Threat
A threat is a potential for harm to the system: an intruder realizes it by using the system's vulnerability, with a negative effect on the system's operation. A threat can be triggered by humans but also by natural causes such as earthquakes, floods and other natural disasters that damage computer and IoT systems.
There are also man-made threats, created by experienced individuals who find vulnerabilities and produce system-damaging code and scripts or participate in criminal activity, e.g. against trade or government data. These are called structured threats, while unstructured threats are caused by inexperienced humans who introduce a malicious device into their equipment without enough knowledge of the damage it can cause. Both structured and unstructured threats [34] [35] apply to IoT devices. Threats must be defined through the collective participation of application designers, security specialists, analysts, developers and system administrators. Attack trees and attack patterns can be used to identify threats.
2.2.3 Attack
Attacks are the dangerous and alarming consequences of certain behavior against a device, enabled by vulnerabilities and carried out through a variety of tools and methods. Attacks follow different processes: in a passive attack, the attacker monitors unencrypted network traffic for sensitive information or decrypts weakly encrypted traffic to capture authentication information, while in an active attack the attacker actively attempts to break or bypass the system's protections. The most typical attacks are access attacks, physical attacks, distributed denial-of-service attacks, privacy attacks such as password stealing, and other cyber-security attacks [34].
it uses the TCP/IP architecture; the 4G network is entirely IP-oriented. All the components make up the Evolved Packet Core (EPC), which is also known as the EPS (Evolved Packet System). The 4G architecture consists of two parts, the EPS and the eNodeB. The authors of [37] [38] classified LTE threats into the following categories:
• Identity and Privacy of User: Unauthorized access to and use of a user's identity or hardware identity to access the network, or impersonation of a user to commit malicious actions.
Yongsuk et al. [39] suggest that, due to the heterogeneous and IP-based architecture of 4G, modern threats could lead to unintended interference with data and service availability. The conceivable threats in 4G also include user-identity and service theft, DoS, IP spoofing, and the massive range of connected heterogeneous devices, which present possible security holes in the framework.
Figure 8: ITU X.805 Framework [37]
2.3.3 5G
Compared to previous standards, the distinct characteristics of 5G applications and services, such as eMBB, URLLC and mMTC, are vulnerable to attack and offer a variety of avenues for targeting 5G services. There are many possible attack paths in 5G originating from client hardware such as mobile phones, robots, IoT devices, automated industrial equipment, autonomous vehicles, etc. The 5G security threat landscape, shown in figure 9, spans the Radio Access Network (RAN), the core and the internet: Man-in-the-Middle (MiTM) attacks on Cloud RAN (C-RAN) domains, an IP core network vulnerable to DDoS, and client devices exploited by malware and bots [31].
The security challenges associated with the core network are critical for mobile networks to recognize, as the full structure of a 5G mobile system depends on it. The entire system will fail if any of its elements is vulnerable to the attacker. 4G and earlier mobile networks were not designed to address security concerns related to NFV and SDN. Security measures are required for both signaling and data traffic at the various attack points (shown in figure 9) from the user equipment (UE) to the RAN and the core network [31].
Figure 9: Security threats in 5G [31]
2.4 IoT
As described in [40], IoT faces many security threats and challenges. According to the researchers, we need to understand the new characteristics of IoT in order to understand the security threats to IoT devices. Below we describe some characteristics of IoT that give rise to attacks on IoT devices.
1. Ubiquitous: IoT is involved in our daily life and uses all our resources. Individuals with no idea of device security still use these devices, and manufacturers do not pay much attention to securing them. Producers provide neither safety advice nor any notice that the device collects sensitive data. The unsafe default configuration of these devices is one of the main triggers of recent attacks. Given how pervasive these devices are, their abnormal behavior must be monitored and controlled by the network operators.
2. Diversity: IoT comprises a large number of devices involved in many use cases and applications, connecting to different cloud networks through distinct security elements and protocols. Differences in device capabilities and requirements make it difficult to create a global defense network, and attackers exploit this diversity to launch DDoS attacks. Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) can help prevent intrusion attacks.
3. Privacy: A close relationship exists between IoT devices and their users. Several sensors jointly gather important environmental data to track our surroundings, and it is an easy task for a hacker to extract sensitive information and identities, for example by inferring smart-home activity from home network traffic [41].
2.4.1 NB-IoT
Narrowband IoT is an LPWAN radio technology that enables a wide range of devices and services to be connected over cellular transmission bands; it is specifically designed for IoT and standardized by 3GPP [43], as discussed in the delimitation section of Chapter 1. NB-IoT is designed to carry small batches of data from low-power, low-verbosity IoT devices that transmit only a few bytes of data per day. NB-IoT operates at 880-960 MHz and 791-832 MHz [44]. In any case, there are many NB-IoT constraints on which an attacker can focus. As introduced in 3GPP Release 13, the goals of NB-IoT are to improve indoor coverage by 20 dB compared with legacy GPRS devices (a 164 dB maximum coupling loss target), to support at least 52547 low-throughput devices per cell, and to reduce complexity and power consumption, targeting a battery life of about 10 years on a 5 Wh battery. As shown in figure 10 [44], NB-IoT can be deployed in stand-alone mode, in-band mode, or in the guard band of an existing LTE carrier.
There are a few open issues and vulnerabilities that need to be resolved.
The NB-IoT plan is to significantly enhance massive heterogeneous devices
Figure 10: NB-IoT deployment [44]
S11-U interface could crash.
The test performed in [48] shows that DoS attacks can be repeatedly triggered by sending CoAP requests to a border router in a smart home. By sending malicious requests every 500 ms, 75% of the legitimate packets are lost, effectively destroyed by the CoAP flooding. However, when the protected mode of the transceivers is enabled, no effect on communication is observed under the DoS attack.
2. MQTT (Message Queuing Telemetry Transport)
MQTT system, where the main task of the broker is to deliver messages from the publisher to the subscribers. Attackers can trigger DoS by exhausting the MQTT client and broker, for example by sending messages larger than 256 MB, which is MQTT's maximum payload size. In addition, MQTT runs over TCP, so TCP attacks such as bandwidth consumption and SYN floods can be used in DoS attacks. An unsecured MQTT broker can give rise to a variety of IoT vulnerabilities: for example, exposure of all data or confidential information to the public, modification of the data stored in the broker, or the launching of a DoS attack [52], any of which gives an attacker the chance to hit a compromised broker. Although MQTT relies on SSL/TLS for its security, enforcing it on constrained devices is costly [53].
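The payload limit mentioned above can be sketched as a simple broker-side guard. This is a hypothetical illustration, not code from any real MQTT broker: the 1 MB broker limit and the function name are assumptions, while 268,435,455 bytes is the largest payload the protocol's remaining-length encoding can express:

```python
# Illustrative broker-side guard against the oversize-payload DoS described
# above. MQTT caps payloads at roughly 256 MB; a broker can drop anything
# larger before buffering it. Hypothetical sketch, not a real broker's code.
MQTT_MAX_PAYLOAD = 268_435_455  # largest remaining-length value MQTT can encode

def accept_publish(payload_length: int, broker_limit: int = 1_048_576) -> bool:
    """Reject PUBLISH packets whose declared payload exceeds either the
    protocol maximum or a stricter, broker-configured limit (1 MB here)."""
    if payload_length > MQTT_MAX_PAYLOAD:
        return False  # malformed: exceeds what the protocol can encode
    return payload_length <= broker_limit

print(accept_publish(512))          # small message: accepted
print(accept_publish(300_000_000))  # larger than the protocol max: rejected
```

Checking the declared length before allocating buffers is what prevents the memory-exhaustion variant of this attack.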
4. AMQP (Advanced Message Queueing Protocol)
In [53], the authors state that XMPP has failed to provide end-to-end secured communication for IoT deployments and implementations. Unsecured XMPP is defenseless against attacks such as password sniffing, unauthorized access to servers, injection, deletion, replay, and more.
6. ZigBee
device is in the network, it can listen to all the sensitive information in the transmitted messages. The confidentiality problem is solved by applying an encryption algorithm to the messages; IEEE 802.15.4 encrypts ongoing messages with the Advanced Encryption Standard (AES) [58].
7. 6loWPAN
Another recent work [62] classifies the security risks of 6loWPAN into end-to-end and hop-to-hop attacks. Hop-to-hop attacks on 6loWPAN systems are triggered by compromised internal malicious nodes; this form of attack targets the radio hops, the physical links and the route-discovery process. Tampering, battery-exhaustion, wormhole, jamming, spoofing and selective-forwarding attacks are enabled by unprotected equipment and by the attacker's ability to control the 6loWPAN layer. End-to-end attacks on IPv6-based WSN systems are caused by unauthorized external hardware, and attacking an end-to-end link harms the whole network. End-to-end security is necessary because the hardware performs IPv6 packet fragmentation and reassembly, and must guard against packet modification during reassembly. Attacks in this class take place between the IPv6 end point and the 6loWPAN border router: for example, overwhelming the edge router by generating large amounts of traffic, or impeding communication by injecting incorrect messages into the border router.
8. 802.15.4 Standard
device is what characterizes and categorizes the attack as DoS. It is important to remember that the attacker has to install agent code on any resource or device that supports it in order to have an infiltration point in the targeted system, be it an IoT device, a server, a network component or a mobile device [18].
As most of the threats in IoT come from insecure IoT devices, a network-based technique for detecting infected IoT devices has been suggested in [18]. The proposed method was developed by studying the Mirai and BASHLITE malware families. With the help of ISP-accessible tools such as NetFlow, DNS capture analyzers and packet capture, it is possible to detect and analyze the malware. The authors' main objective is to discover the common properties, techniques and malware phases that expose the weaknesses of IoT. The four stages that every IoT malware follows in its life cycle are given in figure 16.
• Scanning: Scanning is carried out by filter engines to detect vulnerable hosts. Random IPv4 subnets are probed; most of the time port 23, running the Telnet daemon, is the target, often together with port 2323 and other ports running different services.
• Attacking: This is a very common property of IoT malware. Most of the time the attacker tries default usernames and passwords such as "admin/admin" or "root/root" to attack the IoT devices. The attack also exploits these devices through the TR-064 and TR-069 services.
• Infection: This is conducted in a number of ways, such as over HTTP (wget), TFTP [68] or Telnet. The malware binary, compiled from C code, is downloaded in this way and infects the scanned device.
• Abuse: DDoS attacks are carried out by the IoT botnets. SYN and SYN/ACK floods, TCP and UDP floods, and HTTP attacks make up the bulk of these DDoS attacks.
According to the authors, attacks from compromised IoT devices follow this same life cycle and can be mitigated at the ISP level by the proposed system. Attackers also issue DDoS attacks from the cloud by leasing virtual machines, since these have higher processing capacity and cannot easily be traced back. DDoS protection can be applied in both directions, i.e. at the source or the destination end. One disadvantage of destination-side defense is that the attack is only recognized once it reaches the target. Source-side security frameworks such as D-WARD [69], MULTOPS [70] and MANAnet [71], by contrast, compare incoming and outgoing traffic to identify DDoS attacks before they reach the destination.
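The default-credential stage of the malware life cycle above can be illustrated with a small defensive check. The credential pairs and device names here are hypothetical examples for illustration, not taken from any malware source:

```python
# Defensive sketch: flag devices that still use well-known default
# credentials of the kind Mirai-style scanners try first. The credential
# pairs and device list are illustrative, not from any real botnet dump.
KNOWN_DEFAULTS = {("admin", "admin"), ("root", "root"), ("admin", "1234")}

def uses_default_credentials(username: str, password: str) -> bool:
    """Return True if the (username, password) pair is a known default."""
    return (username, password) in KNOWN_DEFAULTS

devices = [
    ("sensor-1", "admin", "admin"),   # still on factory defaults
    ("camera-2", "ops", "Xk9!fq2w"),  # credentials were changed
]
for name, user, pw in devices:
    if uses_default_credentials(user, pw):
        print(f"{name}: change the default credentials!")
```

An operator running such a check across a fleet removes exactly the entry point the Attacking stage relies on.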
Figure 17: Direct and Indirect Attacks [74]
increase in the number of packets and/or the transferred payload, so the reflector can send far more packets to the target victim than it received [74]. In a complex reflection attack, the attacker can also use an expert center, a bot called a handler, that controls 100 zombies in a botnet, as can be seen in Figure 18.
2.5.4 DDoS attack types
As shown in Figure 19, DDoS attacks are primarily divided into three groups, as an attack can be deployed at different layers. At several layers, the attacker exploits the weaknesses of individual ports. For example, in a UDP flood the attacker overwhelms random ports on the target host with UDP packets. The host checks whether any application is listening on each port and, finding none, replies with an ICMP 'Destination Unreachable' error. The consumption of these host resources leaves the host unable to serve its legitimate users. Protocol attacks such as Ping of Death (PoD) and Smurf work by sending harmful pings to a computer over the Internet Protocol [77].
The attacker can also use ping scanning to find possible victims; TCP SYN
or ACK, UDP and ICMP scans are the most common. ICMP scans succeed
when the firewall and ACL rules are less restrictive towards LANs or
internal IP addresses. UDP scans are useful when unsolicited UDP services
and outgoing ICMP traffic are not blocked by the firewall. TCP scans
succeed against a stateless firewall that does not reject random ACK
packets [78].
overloading; in the long run, the victim machine will not be able to reply
to its legitimate users. Due to the stateless nature of the UDP protocol,
attackers can effectively launch UDP flood attacks with spoofed source
addresses. However, some operating systems have the ability to mitigate
UDP floods by limiting the number of responses [79].
Figure 20: Machine Learning [86]
The KNN classifier operates with a given positive integer k, a dis-
tance metric d and an unseen observation x:
(a) It goes through the whole training set and computes the distance
d between x and each training observation. The k points in the
training data closest to x form the set A. K is usually chosen as
an odd number to avoid tie conditions.
(b) It then estimates the class probability for x from the labels of the
observations in A.
The decision boundary of the KNN classifier depends on the K vari-
able. The value of K must be chosen as a hyperparameter by the
designer or data scientist performing the machine learning task, and
should be the integer that best fits the dataset. A small k-value makes
the classifier follow the training data closely when producing its pre-
dicted value. A higher k-value, however, leads to smoother decision
boundaries and more tolerance to anomalies, since more voters con-
tribute to each prediction [88].
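The two steps above can be sketched in plain Python (a minimal illustration of the idea only, not the classifier used in our experiments; the toy points and labels are invented for the example):

```python
import math
from collections import Counter

def knn_predict(train, labels, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Step (a): distance from x to every training observation
    dists = sorted((math.dist(p, x), y) for p, y in zip(train, labels))
    # Step (b): majority vote among the k closest labels (k odd avoids ties)
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

train = [(1, 1), (2, 1), (8, 9), (9, 8)]          # invented 2-D feature points
labels = ["normal", "normal", "ddos", "ddos"]
print(knn_predict(train, labels, (2, 2), k=3))    # → normal
```

In the real experiment the equivalent work is done by scikit-learn's KNeighborsClassifier, as shown in the later listings.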
2. Decision Tree: The Decision Tree is a well-known machine learning
algorithm used to classify unknown data from trained data. A decision
tree may be either a binary or non-binary tree that includes a root,
internal and leaf nodes. All observations are placed in the root node,
and each of the internal nodes holds a test on the features. The classi-
fication is made using a top-down recursive approach and, as shown in
Figure 22, the class of the leaf node is returned as the result. The C4.5
decision tree algorithm selects attributes based on the information
gain ratio and extends ID3 (Iterative Dichotomiser 3), which can be
used to build the decision tree that best fits the dataset [90].
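The attribute-selection idea behind ID3/C4.5 can be illustrated with a small entropy and information-gain computation (a toy sketch; the class counts below are invented for the example):

```python
import math

def entropy(counts):
    """Shannon entropy of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(parent_counts, child_splits):
    """Entropy reduction achieved by splitting the parent node."""
    total = sum(parent_counts)
    weighted = sum(sum(c) / total * entropy(c) for c in child_splits)
    return entropy(parent_counts) - weighted

# 10 observations: 5 normal, 5 DDoS; a candidate feature splits them 4/1 and 1/4
gain = information_gain([5, 5], [[4, 1], [1, 4]])
print(round(gain, 3))  # → 0.278
```

ID3 picks the feature with the highest such gain for each node; C4.5 normalizes it into a gain ratio to penalize features with many values.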
Figure 22: Structure of Decision Tree [90]
each marked as belonging to one of two classes, an SVM
training algorithm builds a model that assigns new examples
to one class or the other, making it a non-probabilistic binary
linear classifier (although methods such as Platt
scaling exist to use the SVM in a probabilistic classification set-
ting). An SVM model is a representation of the examples as points in
space, mapped so that the examples of the separate classifi-
cations are divided by a clear gap that is as wide as
possible. New examples are then mapped
into that same space and predicted to belong to a class
based on which side of the gap they fall.
In addition to performing linear classification, SVMs can effi-
ciently perform a non-linear classification using what is known
as the kernel trick, implicitly mapping their inputs into high-
dimensional feature spaces [93].
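The "which side of the gap" decision rule can be sketched as follows, with a hand-picked separating hyperplane (the weights w and bias b are invented for illustration; in practice the SVM training algorithm finds the maximum-margin values):

```python
def svm_predict(w, b, x):
    """Assign x to a class by the side of the hyperplane w·x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "ddos" if score >= 0 else "normal"

w, b = (1.0, 1.0), -10.0            # hand-picked hyperplane for the toy example
print(svm_predict(w, b, (2, 3)))    # → normal (2 + 3 - 10 < 0)
print(svm_predict(w, b, (8, 9)))    # → ddos   (8 + 9 - 10 > 0)
```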
4. Naı̈ve Bayes Classifier: Based on Bayes' theorem, Naı̈ve
Bayes is a simple probabilistic classifier that scales well to large
datasets [94]. When the features within the dataset are independent
of each other, the Naı̈ve Bayes model is easy to build, and the classifier
provides speedy performance. Naı̈ve Bayes performs very well in
binary cases, for example when the purpose of classification is to
decide whether incoming packets are DDoS or normal.
Through the equation below, Bayes' theorem recalculates the prob-
abilities, and the classifier assumes that the value of each feature
is independent of any other feature given the target class [94][96].
Naı̈ve Bayes learns by calculating these likelihoods from the training
data.
P(A|B) = (P(B|A) × P(A)) / P(B)    (1)

Equation 1: Bayes' theorem
Two assumptions underlie the use of Naı̈ve Bayes: categorical feature
values can lead to over-sensitivity, and the features are assumed to
be independently distributed. In practice the features often depend
on each other, but the outcome of Naı̈ve Bayes is usually still
satisfactory.
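As a worked instance of Equation 1 (all probabilities are invented for illustration): if 20 % of packets are DDoS (P(A) = 0.2), 90 % of DDoS packets are small (P(B|A) = 0.9), and 30 % of all packets are small (P(B) = 0.3), then:

```python
p_a = 0.2           # prior: fraction of DDoS packets (invented)
p_b_given_a = 0.9   # likelihood: small packet given DDoS (invented)
p_b = 0.3           # evidence: fraction of small packets overall (invented)

# Bayes' theorem, Equation 1: posterior probability of DDoS given a small packet
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # → 0.6
```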
5. Logistic Regression: The logistic model [97] is a widely used
statistical model that, in its basic form, uses a logistic function
to model a binary dependent variable. In regression analysis,
logistic regression estimates the parameters of a logistic model;
it is a form of binomial regression.
Mathematically, a binary logistic model has a dependent variable with
two possible values.
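The logistic function that maps a linear combination of features to a probability for the binary dependent variable can be sketched as follows (the coefficients and the packet-length feature are invented for illustration):

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real value into (0, 1)."""
    return 1 / (1 + math.exp(-z))

# invented coefficients for a one-feature logistic model
beta0, beta1 = -4.0, 0.05
packet_length = 120
p_ddos = sigmoid(beta0 + beta1 * packet_length)  # P(DDoS | length)
print(round(p_ddos, 3))
```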
1. Fuzzy C-Means: The fuzzy c-means method is used for pattern
recognition, in the form of clustering. Cluster analysis groups the
data points so that similar items are placed in the same cluster,
while items that differ are placed in different clusters. Clusters are
recognized by means of similarity measures, which include distance,
connectivity, and intensity. Different similarity measures may be
chosen based on the data or the application [99].
Thomas et al. [104] take a rather distinctive approach, dissecting a single
dataset, DARPA, and exploring its usage throughout the analysis of intru-
sion detection. The authors conclude that the DARPA dataset is able to
reflect attacks that are usually seen in network traffic and can thus be
viewed as "the baseline of any research".
In comparison to other surveys, Sharafaldin et al. [106] provide a more
comprehensive review of the IDS datasets, focusing on a high-level overview.
Eleven IDS datasets are evaluated by the authors and compared against
eleven properties. In addition, Bhuyan et al. [107] briefly describe and
compare a broad range of approaches and frameworks for network anomaly
detection. The authors also examine methods for network defense and
datasets that researchers may use for network anomaly detection [107].
Similarly, Nisioti et al. [108] discuss 12 IDS datasets and provide a
fundamental assessment of unsupervised techniques for intrusion detection.
The most popular datasets for machine learning and artificial intelli-
gence [109] are described by Yavanoglu and Aydos [110]. In addition, Ring
et al. [111] evaluated a variety of datasets. In order to assess the impor-
tance of individual datasets for particular evaluation scenarios, their work
identifies 15 distinctive properties. On the basis of these characteristics,
the authors provide an overview of existing datasets [111].
Based on the survey [112], DDoS attacks such as ICMP, UDP and SYN
flooding are detected with Probabilistic Packet Marking (PPM) and Deter-
ministic Packet Marking (DPM) techniques. The solutions are based on
traceback methods and entropy variation, and Intrusion Detection and
Prevention Systems (IDS/IPS) are used. Intrusion Detection and Preven-
tion Systems use signature- and anomaly-based detection techniques, and
each of these methods has a distinctive outcome. Another DDoS recognition
technique is suggested by the authors in [113] to resolve the limitations of
statistical and classification-based techniques in the Socially Aware Net-
work (SAN) using the Multi-Protocol-Fusion Feature (MPFF). The MPFF
strategy is based on an autoregressive moving-average model, which dis-
tinguishes normal and DDoS behavior effectively. On the other hand,
some analysts try to distinguish DDoS attacks with various machine learn-
ing procedures and algorithms. Below we summarize some research papers
that inform this study.
In [73], the attack types are HTTP and TCP/IP. The technique, applied to
traffic filtering, uses resource monitoring and anomaly detection. An arti-
ficial neural network is used and the attack targets the destination side.
In [114] the authors apply machine learning techniques to signature- and
anomaly-based detection. The attack type is flooding on OSI layers 3, 4
and 7. The algorithms used are C4.5, Naı̈ve Bayes and K-Means, with an
accuracy of 98.8 %. The attack takes place on the source devices. Another
study [115] uses supervised classification; the classifiers are Naı̈ve Bayes
and K-NN, the attack targets HTTP, and the result is 90 %.
Most of the past work, with a few clear exceptions [105], takes a qualitative
approach to surveying the IDS datasets. In this regard, a critical part of
the background chapter focuses on assessing the consistency of the infor-
mation from a more analytical point of view using different criteria, some
of which are related to Nehinbe's work [116].
This study also evaluates and analyzes a few new concepts in this field
based on previous work, with the DDoS attack as its main focus. Most of
the past work is related to IDS dataset analysis, which basically covers
one type of security attack, although there are several aspects of IDS
datasets that still need to be investigated. Where DDoS attacks have been
highlighted, they have never been treated as the most significant point. In
addition, by taking a more explanatory approach to the assessment, this re-
search builds on the work of Dhanabal and Shantharajah [105], which
uses multiple machine learning algorithms to survey datasets. Two aspects
distinguish this work from that of Dhanabal and Shantharajah [105], even
though a similar methodology is used. First, a single dataset is used, and
it ends with a DDoS attack.
3 Approach
In Chapter 2, we discussed IoT security challenges in depth, where DDoS
attacks are among the most significant attacks involving IoT devices. In this
chapter we explain the machine learning based methods that we propose for
DDoS detection in the cellular network, as well as the technologies and tools
that we use in our experimental setup to generate DDoS attack traffic.
We also describe the machine learning classifier implementations.
3.1 Objectives
The aim of this thesis is to detect DDoS attacks through machine learning
in the cellular network via the packets generated by IoT devices. We divided
our experiment into three phases. In the first phase we collect data from IoT
devices in order to select and extract the features for machine learning
classification. In the second phase, given the collected data, we organize it
into two datasets, one for the normal situation and another with a DDoS
attack. In the third phase, the data is prepared with the Scikit-learn tool
and labeled in the correct format.
In a core cellular network, user packets generally travel from the
eNodeB to the Serving Gateway (SGW). The packets are then forwarded
from the SGW via the S5/S8 interface towards the Packet Gateway (PGW).
The PGW forwards these packets towards the application server.
When an IP packet is generated by the device, it is first forwarded to an
eNodeB on its way to the destination. The eNodeB wraps this packet in an-
other IP packet that carries a GTP header. After that, it encapsulates it in
an IP/UDP header so that it can be transmitted as an Ethernet frame to the
Serving Gateway. Because of this packet encapsulation and the change of
IP addressing, we must inspect the packet itself for DDoS detection.
Figure 26: Research Methodology
wards toward the SGW. The SGW forwards it to PGW, and finally it reaches
Alice [118]. If there is any exploit in this payload, we would be able to de-
tect it before it reaches the PGW or the final destination, which is Alice in
this example.
The S11-U interface is used for small data transmission between the SGW
and the MME in CIoT (Cellular IoT) [119]. The GPRS tunneling protocol
user plane (GTP-U) is used for S11-U. In the GSM, UMTS, LTE and 5G core
networks, GTP-U is used in the user plane for transmitting user data traf-
fic. When traffic passes through the core network, it is encapsulated with
the GTP-based protocol over IP/UDP. GTP also provides mobility for the
mobile user by creating a tunnel between the eNodeB, SGW, MME and PGW.
The primary job of GTP-U is the maintenance of per-client tunneling paths
for IP packets, echo requests and error reporting. GTP has two versions:
GTPv1-U, which is used for transporting user plane data, and GTPv2-C,
which is used for control plane signaling.
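The GTPv1-U encapsulation described above can be illustrated by assembling the minimal 8-byte GTP-U header by hand (a sketch of the wire format only; the TEID and payload bytes are invented, and real traffic additionally carries the outer IP/UDP headers on UDP port 2152):

```python
import struct

def gtpu_header(teid: int, payload: bytes) -> bytes:
    """Minimal GTPv1-U header: flags, message type, length, TEID.

    Flags 0x30 = version 1, protocol type GTP; message type 0xFF = G-PDU,
    i.e. an encapsulated user IP packet. The length field counts only the
    bytes that follow this 8-byte header.
    """
    return struct.pack("!BBHI", 0x30, 0xFF, len(payload), teid)

inner_ip_packet = b"\x45\x00..."  # placeholder bytes standing in for the user's IP packet
frame = gtpu_header(teid=0x1234, payload=inner_ip_packet) + inner_ip_packet
print(frame.hex())
```

It is this extra header, sitting between the transport headers and the user's IP packet, that hides the inner traffic from naive inspection and motivates inspecting inside the tunnel.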
When we look at GTP tunneling in the core network, both normal and
malicious packets look similar, and these packets are not inspected because
they are carried inside the GTP tunnel. If we want to detect a DDoS attack,
we must therefore consider the GTP user plane (GTPv1-U) when inspecting
the packet. Figure 28 shows the cellular GTP tunneling performed between
the eNodeB, SGW and PGW, and packet traces from the user end. Each node
is assigned a unique IP address, and the packet is encapsulated with the
actual source and destination IPs for the purpose of security and mobility.
In NB-IoT, GTP tunneling is performed from the MME to the SGW and PGW,
as shown in Figure 28 (blue arrow). Our proposed method captures packets
at the SGW and performs packet inspection there to recognize malicious
packets, extracting the features that can indicate a DDoS attack. Af-
ter that, machine learning classification algorithms segregate normal from
abnormal DDoS packets. If the packet is classified as normal traffic, it is
forwarded through the PGW and reaches the IoT application server. If it
is considered abnormal and further verified as an attack, the device's in-
formation is forwarded to the Identity Management System (IDMS), which
is responsible for the temporary or permanent blocking of devices.
datasets such as the CAIDA [120] dataset, which was recorded in 2007. In
CAIDA, it is not guaranteed that the non-malicious data has been removed
from the DDoS dataset. Hence, when we use this dataset, normal packets
can be included as DDoS packets, and because of this we were not getting
the desired results. Another dataset widely used for research is NSL-KDD
[121]. This dataset contains various attacks, including six types of DDoS
attack, and the data is labeled with the normal and attack types. However,
the authors have not mentioned whether the data is generated by IoT de-
vices. Therefore, we chose to generate data based on our own requirements.
3. Python5 :- Python is a general-purpose, high-level, open-source pro-
gramming language. Ease of learning, efficient code and easy commu-
nication are some of the features that lead many researchers to use
this programming language in this field. Python has many great en-
vironments and libraries, such as Spyder6 and the Jupyter notebook7 .
With the help of the Matplotlib library, scientists draw powerful 2D
graphs for their machine learning studies. Due to these powerful
features of Python, we use this language for our machine learning
experiments.
5 https://www.python.org/
6 https://www.spyder-ide.org/
7 https://jupyter.org/
8 https://scikit-learn.org/stable
Figure 30: Test Lab Network
9 https://www.unb.ca/cic/datasets/ddos-2019.html
10 https://www.oslomet.no/
CICDDoS2019 contains organized traffic analysis using CICFlow-
Meter-V3 [125], with labeled flows. The B-Profile system [124] was
used to profile the abstract behavior of human users and to generate
naturalistic benign traffic. The abstract behavior of 25 users for this
dataset was based on the HTTP, HTTPS, FTP, SSH and e-mail pro-
tocols [124]. Various advanced DDoS attacks, such as Port Map,
NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, SYN, NTP, DNS and SNMP,
are included in this data collection. Capturing for the training day on
January 12 began at 10:30 a.m. and finished at 5:15 p.m., and the test
day on March 11 started at 9:40 a.m. and finished at 5:35 p.m.; the
attacks were executed during these periods.
Machine OS IPs
192.168.50.4
(second day)
Win 7 192.168.50.8
Win 10 192.168.50.7
Win 7 192.168.50.9
Win 10 192.168.50.8
3.3.4 Feature Extraction
To distinguish between DDoS and normal IoT traffic, we need to specify
the packet features that are selected for machine learning classification.
Protocol type, port, source and destination IP, and packet length are the
features used in most DDoS detection work.
3. Packet Size Variance:- For the most part, packets of attack traf-
fic have the same size, while normal traffic has varying packet
sizes; even the traffic of the same flow has distinct sizes
[24]. To illustrate, all TCP attack packets in our dataset are 90 bytes.
Conditions 1 and 2 can also be combined with differences in packet
sizes.
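This feature can be computed directly: a flow of constant-size attack packets (such as the 90-byte TCP attack packets in our capture) has zero length variance, while normal flows do not (the normal sample lengths below are invented for the example):

```python
from statistics import pvariance

attack_lengths = [90, 90, 90, 90, 90]        # uniform attack packets
normal_lengths = [66, 1434, 120, 74, 590]    # invented mixed application traffic

print(pvariance(attack_lengths) == 0)   # → True
print(pvariance(normal_lengths) > 0)    # → True
```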
A distinct target can be checked within 10 seconds to identify
an attack [24].
6. TCP SYN:- In a TCP SYN denial-of-service attack, the server does
not receive the client's ACK response, since the aim of the attacker
is not to establish a connection but to tie up the server's resources
and leave it unresponsive. In this way, SYN and ACK packets are
viewed as part of the TCP flood.
3.4 Methodology
The generated data was captured with Wireshark. The raw data was
recorded in pcap format and then converted to comma-separated values
(CSV). Normal and malicious data features were extracted as discussed in
Chapter 2. The normal traffic was put in one file and the DDoS attack
(malicious) data in another.
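The pcap-to-CSV step can be sketched with a minimal pure-Python reader for the classic libpcap format (an illustration only; in the experiment the conversion was done from Wireshark, and this sketch ignores pcapng and link-layer parsing):

```python
import struct

def pcap_records(data: bytes):
    """Yield (timestamp, original frame length) from classic libpcap bytes."""
    magic = struct.unpack("<I", data[:4])[0]
    endian = "<" if magic == 0xA1B2C3D4 else ">"   # byte order from magic number
    offset = 24                                    # skip the 24-byte global header
    while offset + 16 <= len(data):
        sec, usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16 + incl_len                    # skip the captured packet bytes
        yield sec + usec / 1e6, orig_len

def to_csv_lines(data: bytes):
    """Flatten the records into CSV rows with the column names used later."""
    return ["Arrival Time,Length"] + [
        f"{ts},{length}" for ts, length in pcap_records(data)]
```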
#transformation

scaled_features = scaler.transform(df.drop('Length', axis=1))
df_feat = pd.DataFrame(scaled_features, columns=df.drop('Length', axis=1).columns)
df_feat.head()

Listing 1: Transformation
Length Label
normal 0
malicious 1
#labeling

i = 0
while i < len(df['Arrival Time']):
    # Coding Threshold: [IoT]: SYN can occur in packets below 100 bytes
    if (df.at[i,'Length'] <= 100):
        # Coding Threshold: [IoT]: SYN can occur in packets between 42 and 1434 bytes
        # if(df.at[i,'Length'] >= 42 and df.at[i,'Length'] <= 1434):
        # Coding Threshold: SYN can occur in packets between 50 and 70 bytes or between 160 and 180 bytes
        # if(df.at[i,'Length'] >= 50 and df.at[i,'Length'] <= 70 or df.at[i,'Length'] >= 160 and df.at[i,'Length'] <= 180):
        df.at[i,'Length'] = 1
    else:
        df.at[i,'Length'] = 0
    df.at[i,'Arrival Time'] = time.mktime(datetime.datetime.strptime((df['Arrival Time'][i])[:-7], '%b %d, %Y %H:%M:%S.%f').timetuple())
    df.at[i,'Source'] = df.at[i,'Source'].replace(".", "")
    df.at[i,'Destination'] = df.at[i,'Destination'].replace(".", "")
    i += 1

Listing 2: Labeling
3.4.2 Splitting of Dataset
The ability to generalize to new or unseen data is a key feature of a good
learning model. A model that fits a specific dataset too closely is described
as over-fit.
The dataset is divided into two subsets: training and testing. The data
is split in a 70/30 ratio using the train_test_split helper method from
the scikit-learn library [130].
With this approach, the training data is further divided into two parts,
training and validation. The training set is used to train the model first,
then the validation set is used to estimate its performance. The k-fold
approach [131] is used for validation in this study.
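Conceptually, the 70/30 split and the k-fold validation used here (via scikit-learn's train_test_split and KFold helpers) amount to the following index bookkeeping (a standard-library-only sketch of the idea, not the library code itself):

```python
import random

def split_70_30(n, seed=42):
    """Shuffle indices and cut them into 70 % train / 30 % test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * 0.7)
    return idx[:cut], idx[cut:]

def k_folds(n, k=5):
    """Partition indices into k validation folds; the rest trains each round."""
    idx = list(range(n))
    return [(idx[:i * n // k] + idx[(i + 1) * n // k:],   # training part
             idx[i * n // k:(i + 1) * n // k])            # validation part
            for i in range(k)]

train_idx, test_idx = split_70_30(10)
print(len(train_idx), len(test_idx))  # → 7 3
```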
3.4.3 Modeling
The classification is divided into two phases.
• Generation of learning model
• Construction of the predicted labels
We use the scikit-learn Python library to implement our task. This
library is used for data analysis, data mining and machine learning.
1. Selection of model:- This study features the testing and training of
classification methods, namely k-nearest neighbors, SVM and others.
(a) Parametric/Non-parametric Algorithms:- The difference be-
tween the two is that a parametric algorithm can be loosely de-
scribed as one with a predefined structure for the data: it possesses
a fixed number of parameters [132]. Parametric algorithms
are ideal if their assumptions are correct. However, if the assump-
tions are incorrect, these algorithms perform badly.
Non-parametric algorithms are more flexible. These
algorithms perform slower computations, but make fewer
assumptions about the dataset [132]. In this study, non-parametric
algorithms are used.
Figure 34: k-NN classifier - an example [133]
Figure 36: SVM - an example [133]
Figure 38: Pseudo-code for Naı̈ve Bayes Algorithm [133]
The calculation works its way down from the root node, iter-
atively measuring the information gain of each feature over the
training set. The information gain determines how strongly a
feature discriminates between the target classes: the higher the
information gain, the more valuable the feature is in the classifi-
cation of each observation [133] [134]. The root node is assigned
the feature with the highest information gain, and the algorithm
then partitions the dataset by the selected feature to produce the
subsets. The description of this approach is shown in Figure 39.
(e) Logistic Regression: Logistic regression is another type of pre-
dictive analysis, best suited where the dependent variable is bi-
nary. It describes the data by the binary logistic model and clar-
ifies the relationship between a dependent binary variable and
other independent variables [134]. Figure 40 describes the calcu-
lation for the logistic regression classifier.
Table 3: scikit-learn Python Library [130]
KNN            neighbors.KNeighborsClassifier
Decision Tree  tree.DecisionTreeClassifier
SVM            sklearn.svm.LinearSVC
3.4.4 Evaluation
Evaluation is a very crucial part of understanding the performance of a
chosen model. This part defines the performance metrics of the models.
Below we describe the various metrics used in this study.
# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Model Precision: what percentage of positive predictions are correct?
print("Precision:", metrics.precision_score(y_test, y_pred, average='weighted'))

# Model Recall: what percentage of actual positives are labelled as such?
print("Recall:", metrics.recall_score(y_test, y_pred, average='weighted'))

Listing 5: Evaluation model
incorrect values are reflected in the accuracy values. A tabular
visualization of the performance of the supervised algorithms is
shown in the next chapter.
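Such a tabular visualization is typically a confusion matrix; its cells and the derived metrics can be computed directly (a sketch with invented predictions, not our experimental results):

```python
def confusion(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]   # invented labels: 1 = malicious, 0 = normal
y_pred = [1, 0, 0, 0, 1, 1]
tp, fp, fn, tn = confusion(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```

These are the same quantities that scikit-learn's accuracy_score, precision_score and recall_score report in Listing 5.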
4 Results
In Chapter 3 we elaborated on how to generate and collect the datasets.
The normal dataset is generated in the OsloMet lab and the DDoS datasets
are collected from CIC [124]. Both datasets are labeled and we extract the
required features. After that the data is transformed and converted into the
format that is acceptable to the Scikit-learn machine learning algorithms. A
few experiments were carried out to verify the performance and accuracy of
the classifiers for various combinations and sizes of data. Due to time con-
straints, our DDoS detection test is based on the TCP SYN attack, as the
point of the study is to provide a DDoS attack detection method. The data
is split into training and testing datasets.
4.1 First Threshold - (Length below 100 bytes)
Our dataset is based on roughly 1,000,000 observations, assembled in
the ways described in Chapter 3. Below we illustrate the normal and DDoS
scenarios with their results.
1. K-Nearest Neighbors
(a) Error Rate and K-Value The performance of the K nearest
neighbors classifier depends on three factors: the k value (number
of neighbors), the distance metric and the decision rule.
# find k value

error_rate = []

for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

Listing 6: find K value - Normal (First Threshold)
2. Other Classifiers
3. Performance Metrics
Table 5 shows the overall performance for the first threshold with
normal traffic, showcasing the accuracy, precision, and recall
results. All classifiers were trained and tested on the 70:30 split
dataset.
4.1.2 DDoS Scenario
Table 6 shows how accurately the classifiers performed with DDoS traffic.
The SVM shows no anomaly in this experiment, while the other classifiers
do. The SVM has a good accuracy result, but it does not detect any anomaly.
Naı̈ve Bayes performs well at detecting the anomaly but with lower accuracy.
The "Classifier Accuracy" column denotes the result of our classifiers.
1. K-Nearest Neighbors
(a) Error Rate and K-Value The performance of the K nearest
neighbors classifier depends on three factors: the k value (number
of neighbors), the distance metric and the decision rule.
# find k value

error_rate = []

for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

Listing 7: find K value - DDoS (First Threshold)
2. Other Classifiers
3. Performance Metrics
Table 7 shows the overall performance for the first threshold with
DDoS traffic, showcasing the accuracy, precision, and recall
results. All classifiers were trained and tested on the 70:30 split
dataset.
4.2 Second Threshold - (Length between 50 and 70 bytes
& between 160 and 180 bytes)
In this experiment, we chose a stricter threshold to study behavior changes
in the chosen classifiers. We observed that the classifier performance is
similar compared with the previous threshold. Below we illustrate the
normal and DDoS scenarios with their results for the second threshold,
which we set to a length between 50 and 70 bytes and between 160 and 180 bytes.
1. K-Nearest Neighbors
(a) Error Rate and K-Value The performance of the K nearest
neighbors classifier depends on three factors: the k value (number
of neighbors), the distance metric and the decision rule.
# find k value

error_rate = []

for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

Listing 8: find K value - Normal (Second Threshold)
2. Other Classifiers
3. Performance Metrics
Table 9 shows the overall performance for the second threshold with
normal traffic, showcasing the accuracy, precision, and recall
results. All classifiers were trained and tested on the 70:30 split
dataset.
4.2.2 DDoS Scenario
Table 10 shows how accurately the classifiers performed with DDoS traffic.
In this dataset, K-NN and SVM show good results with this threshold, and
Table 10 shows that the anomaly is detected by all five classifiers. The
"Classifier Accuracy" column denotes the result of our classifiers.
1. K-Nearest Neighbors
(a) Error Rate and K-Value The performance of the K nearest
neighbors classifier depends on three factors: the k value (number
of neighbors), the distance metric and the decision rule.
# find k value

error_rate = []

for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

Listing 9: find K value - DDoS (Second Threshold)
2. Other Classifiers
3. Performance Metrics
Table 11 shows the overall performance for the second threshold with
DDoS traffic, showcasing the accuracy, precision, and recall
results. All classifiers were trained and tested on the 70:30 split
dataset.
5 Evaluation / Discussion
This study was conducted to analyze the behavior, performance and utiliza-
tion of machine learning algorithms in the context of intrusion detec-
tion systems. With the growing research on vulnerabilities, the detection
of anomalies is a major topic these days. In the last few years, with
thousands of computer-based applications being developed every day, the
Web has grown exponentially and has quickly become a fundamental com-
ponent of today's era, and secure organizational environments are becoming
fundamental to its continued development.
Among the different types of attacks, DDoS attacks are one of the greatest
threats to web destinations and pose a devastating risk to computer system
protection, especially due to their potential impact. Therefore, research in
this area has expanded, with analysts focusing on new ways of identi-
fying and avoiding DDoS attacks. Researchers and industry are working to
find good solutions in the fields of machine learning and artificial intel-
ligence for intrusion detection and prevention. However, business partners
and researchers often find it difficult to obtain high-quality datasets to
test and evaluate their machine learning models for the detection of
threats. This problem is the main motivation of this study, and the basis
for its research questions.
Various works have been studied in order to explore the dynamics of dif-
ferent datasets and how their validity is affected by machine learning
techniques. Numerous problems have been found with regard to the cur-
rent datasets, including security concerns, the accessibility of the informa-
tion, its availability and its alignment with the objective of the investigation.
This was followed by a review of previous work related to the interpretation
and comparison of datasets.
The datasets were individually divided in a 70:30 ratio for model
training and testing. In order to ensure that the experiment was carried out
in an appropriate manner, all classifiers were chosen based on the literature
review. We found that K-nearest neighbors had an overall better per-
formance compared to the other classifiers. The results were evaluated using
a set of performance metrics, including precision, accuracy and recall. Below
are the findings of this study according to the research questions.
Figure 54: Overview of Performance Metrics (DDoS - 1st Threshold)
A likely reason for these discrepancies is the thresholds chosen. Through-
out this work we were able to conclude that some classifiers are more
sensitive, hence producing results that were not the expected ones.
Establishing more robust thresholds, better suited to our studied
scenario, is needed to provide more reliable results.
In this study we are able to detect the attack given the supervised and
labeled dataset, with differences in performance depending on the classifier.
In a real-life context, and given the early stage of the implementation, the
resulting data would be sent to the corresponding security expert team
in a telecom operator for further validation.
Figure 56: Overview of Performance Metrics (DDoS - 2nd Threshold)
In this first stage of analysis and validation by the security experts,
it is up to them to decide what to do with these devices or the
network. One possible solution is to blacklist the devices from the network,
stopping their usage. As a more drastic operation, the security experts
could also shut down the specific cell.
6 Conclusion and Future Work
6.1 Conclusion
This thesis addresses the two main questions defined in Chapter 1.
The mission was to look at the issues of IoT protection from the point
of view of the cellular network in terms of its security challenges. A de-
tailed study of the security threats within the cellular network, including
GSM, UMTS, LTE and 5G, was conducted to achieve this goal.
IoT protocol vulnerabilities have also been discussed point by point, focusing
mainly on attacks that exploit these vulnerabilities. Machine learning
techniques have been used to identify DDoS attacks from insecure IoT devices
to achieve the objectives of this study. Recognizing attacks within the
cellular network is not the same as recognizing attacks in an IP network.
However, a sudden increase in packets received at a single node from a number
of distinct MME nodes could suggest an attack in the IoT case, as IoT devices
do not normally transmit packets at a very high frequency.
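As a minimal illustration of this idea (not the method actually used in this thesis), a simple per-source packet count over a time window can flag any source exceeding an assumed rate; the node names and the threshold below are hypothetical:

```python
from collections import Counter

# Hypothetical source identifiers of packets arriving at a single node
# within one time window; IoT traffic is normally low-frequency, so a
# surge from one source (or many distinct sources) is suspicious.
packet_sources = ["mme1"] * 3 + ["mme2"] * 2 + ["mme3"] * 40

RATE_THRESHOLD = 20  # assumed max packets per source per window (illustrative)

counts = Counter(packet_sources)
suspicious = [src for src, n in counts.items() if n > RATE_THRESHOLD]
print(suspicious)  # only "mme3" exceeds the assumed rate here
```

A real deployment would of course combine such rate checks with the trained classifiers rather than rely on a fixed count alone.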
This work began with a review of the literature and previous work in this
field. First, the study presents an overview of how other researchers approach
intrusion detection with the use of machine learning. This provided a much
better understanding of how different algorithms work and helped us understand
how to mitigate the propagation of DDoS attacks. In addition, it gave an
understanding of which algorithms are commonly used to deal with problems in
this area.
Based on our proposal, the normal and DDoS data was generated according to the
cellular network setup discussed in Chapter 2. We then suggested performing
packet analysis to detect DDoS attacks, as described in Chapter 4, using
machine learning classifiers. At that point, using five classification
methods (KNN, Decision Tree, Naïve Bayes, SVM and Logistic Regression), we
analyzed their performance in detecting possible attacks, running tests
against different dataset sizes with k-fold cross-validation.
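The comparison described above can be sketched with scikit-learn's `cross_val_score`; the synthetic dataset and parameters below are illustrative stand-ins for our labelled traffic data, not the actual experiment:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the labelled normal/DDoS feature set
X, y = make_classification(n_samples=300, n_features=4, random_state=42)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Mean accuracy over 5 folds for each classifier
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
for name, acc in scores.items():
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
```

Five-fold cross-validation averages the accuracy over held-out folds, which gives a fairer ranking of the classifiers than a single train/test split.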
This has become a much-discussed topic these days: identifying IoT-based DDoS
attacks in the cellular network using a machine learning approach that could
lead to practical use in the future. Subsequently, the strategy recommends a
full-scale DDoS detection technique within the cellular network, where offline
data has been used for training and testing of the model. For future work, we
recommend that this methodology be tested in a real research setting. In
addition, this strategy focused only on the TCP SYN flood; in order to secure
IoT services in the future, we would like to incorporate all potential DDoS
attacks. Finally, we would like to explore a few other algorithms, such as
Recurrent Neural Networks in the Google TensorFlow framework.
We hope that this study serves as a basis for a helping tool for telecom
operators that could be used in the future to detect DDoS and other types of
attacks in a more automated fashion.
A Modeling Source Code
import itertools

import matplotlib.pyplot as plt
from matplotlib import gridspec
from mlxtend.plotting import plot_decision_regions
from sklearn import ensemble, linear_model, metrics, naive_bayes, neighbors, svm
from sklearn.model_selection import train_test_split


def knn_comparison(X_train, X_test, y_train, y_test, k):
    X_train = X_train.astype(int)
    X_test = X_test.astype(int)
    y_train = y_train.astype(int).values
    y_test = y_test.astype(int).values

    clf = neighbors.KNeighborsClassifier(n_neighbors=k, n_jobs=-1)
    clf.fit(X_train, y_train)

    print('For k =', k)
    # Predict the response for the test dataset
    y_pred = clf.predict(X_test)

    # Model accuracy: how often is the classifier correct?
    print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
    # Model precision: what percentage of predicted positives are correct?
    print("Precision:", metrics.precision_score(y_test, y_pred, average='weighted'))
    # Model recall: what percentage of actual positives are recognized?
    print("Recall:", metrics.recall_score(y_test, y_pred, average='weighted'))

    # Plot the decision regions; the two extra features are fixed
    # at value = 1.5 with a range of +/- 0.75
    value = 1.5
    width = 0.75
    plot_decision_regions(X_train, y_train, clf=clf,
                          filler_feature_values={2: value, 3: value},
                          filler_feature_ranges={2: width, 3: width},
                          legend=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('KNN with k=' + str(k))
    plt.show()
    # Adapted from:
    # http://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/
    # https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python


def other_classifiers_comparison(X_train, X_test, y_train, y_test):
    X_train = X_train.astype(int)
    X_test = X_test.astype(int)
    y_train = y_train.astype(int).values
    y_test = y_test.astype(int).values

    clf1 = svm.SVC(kernel='linear', gamma='auto')  # linear kernel
    clf2 = linear_model.LogisticRegression(random_state=1, solver='lbfgs', n_jobs=-1)
    clf3 = ensemble.RandomForestClassifier(n_estimators=100, random_state=1, n_jobs=-1)
    clf4 = naive_bayes.GaussianNB()

    gs = gridspec.GridSpec(2, 2)
    fig = plt.figure(figsize=(10, 8))

    labels = ['SVM', 'Logistic Regression', 'Random Forest', 'Naive Bayes']
    for clf, lab, grd in zip([clf1, clf2, clf3, clf4], labels,
                             itertools.product([0, 1], repeat=2)):
        # Train the model using the training set
        clf.fit(X_train, y_train)
        # Predict the response for the test dataset
        y_pred = clf.predict(X_test)

        print("Results for:", lab)
        print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
        print("Precision:", metrics.precision_score(y_test, y_pred, average='weighted'))
        print("Recall:", metrics.recall_score(y_test, y_pred, average='weighted'))

        # Plot the decision region in one quadrant of the 2x2 grid;
        # the two extra features are fixed at 1.5 +/- 0.75
        value = 1.5
        width = 0.75
        ax = plt.subplot(gs[grd[0], grd[1]])
        plot_decision_regions(X=X_train, y=y_train, clf=clf,
                              filler_feature_values={2: value, 3: value},
                              filler_feature_ranges={2: width, 3: width},
                              legend=2)
        plt.title(lab)
    plt.show()


# Coding threshold: a SYN can occur in packets between 50 and 70 bytes
# or between 160 and 180 bytes (an earlier threshold used 42-1434 bytes).
i = 0
while i < len(df['Timestamp']):
    if (50 <= df.at[i, 'Average Packet Size'] <= 70
            or 160 <= df.at[i, 'Average Packet Size'] <= 180):
        df.at[i, 'Average Packet Size'] = 1
    else:
        df.at[i, 'Average Packet Size'] = 0
    i += 1

# Visualize the updated field
df.head()

X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, df['Average Packet Size'], test_size=0.30, random_state=100)
other_classifiers_comparison(X_train, X_test, y_train, y_test)
Listing 10: Source code
B Dataset Samples
Figure 59: Second Dataset (Before Transform)