A Deep Learning Approach for IDS Thesis
A thesis submitted in partial fulfilment of the requirements for the degree of Master of Research
By
Bipraneel Roy
December, 2018
DEDICATION
I owe the most to my mother, Bakul Rani Roy, for her genuine optimism and continuous inspiration. Without her encouragement and support, this work would not have been possible. I express my sincere gratitude to her for supporting me in every way one can be supported, and I dedicate my MRes thesis to my mother.
ACKNOWLEDGEMENT
The work presented in this thesis is, to the best of my knowledge and belief, original, except as acknowledged in the text. I hereby declare that I have not submitted this material, either in full or in part, for a degree at this or any other institution.
Signature
TABLE OF CONTENTS
2.7.1. Types of Learning in ML
2.7.2. Algorithms of Machine Learning
2.8. Machine Learning versus Deep Learning
2.9. A Popular ML Algorithm - Deep Learning
2.9.1. DL Architecture
2.9.2. Salient Aspects of Deep Learning
2.9.2.1. Representation Learning
2.9.2.2. Distributed Representations
2.9.2.3. Learning Multiple Levels of Representations
2.9.3. Recent Advances
2.10. Implementation of DL in IoT Applications
2.10.1. Smart Homes
2.10.2. Smart City
2.10.3. Energy
2.10.4. Intelligent Transportation System
2.10.5. Healthcare and Wellbeing
2.10.6. Agriculture
2.10.7. Education
2.10.8. Industry
2.10.9. Government
2.10.10. Sport and Entertainment
2.11. Related Works
2.12. Improvement to Existing Identified Research Gaps
Chapter 3 - NEURAL NETWORKS
3.1. Artificial Neurons
3.2. Feed Forward Neural Networks
3.3. Recurrent Neural Networks
3.4. Long Short-Term Memory RNN
3.5. Bi-directional Long Short-Term Memory RNN
3.6. Training Neural Networks
3.7. Activation Functions
3.7.1. Step Function
3.7.1.1. Rectified Linear Unit (ReLU)
3.8. Deep Learning
3.9. Dropout Regularization
3.10. Deep Learning Loss Function
3.10.1. Mean Squared Error
3.10.2. Cross Entropy Loss
3.11. Deep Neural Network Implementation Frameworks
3.11.1. H2O
3.11.2. Torch
3.11.3. TensorFlow
3.11.4. Caffe
3.11.5. Theano
3.11.6. Neon
3.12. Training Dataset and Feature Identification: UNSW-NB15
3.13. Dataset Format Conversion
Chapter 4 - RESEARCH METHODOLOGY
4.1. Introduction
4.2. Model Design Methodology
4.3. Implementation Methodology
4.3.1. Keras Library
4.4. Evaluation Methodology
4.5. Hardware and Software Used
Chapter 5 - ARCHITECTURE & IMPLEMENTATION
5.1. System Architecture
5.2. Technical Knowhow – Programming Language, Development Environment, Backend Framework and Libraries
5.3. Code Structure
5.3.1. IDS Class
5.3.2. Data Class
5.3.2.1. preprocess Method
5.3.2.1.1. importDataset Method
5.3.2.1.2. normalizeData Method
5.3.2.1.3. dataStructure Method
5.3.2.1.4. reshape Method
5.3.3. Classifier Class
5.3.4. FitModel Class
5.3.5. Detection Class
5.4. Conclusions
Chapter 6 - SIMULATION RESULTS & EVALUATION
6.1. Metric Definition and Clarification
6.2. Performance over Different Hyper-Parameters
6.2.1. Performance over Different Time-Steps
6.2.2. Performance over Different Batch-Size
6.2.3. Performance over Different Dropout Rates
6.3. Performance on Reduced Test-set
6.4. Performance on Full UNSW-NB15 Test-set
Chapter 7 - Conclusion & Future Work
References
Appendix - A
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
AE Auto Encoder
AI Artificial Intelligence
BP Back Propagation
DL Deep Learning
EM Expectation Maximization
FN False Negative
FP False Positive
IP Internet Protocol
ML Machine Learning
ReLU Rectified Linear Unit
TN True Negative
TP True Positive
ABSTRACT
The Internet-of-Things (IoT) connects every ‘thing’ to the Internet and allows these ‘things’ to communicate with each other. IoT comprises innumerable interconnected devices of diverse complexities and trends. This fundamental nature of the IoT structure increases the number of attack targets, which might affect the sustainable growth of IoT. Thus, security issues become a crucial factor to be addressed. A novel deep learning approach has been proposed in this thesis for performing real-time detection of security threats in IoT systems using the Bi-directional Long Short-Term Memory Recurrent Neural Network (BLSTM RNN). The proposed approach has been implemented using the Google TensorFlow framework and the Python programming language. To train and test the proposed approach, the UNSW-NB15 dataset has been employed, which is the most up-to-date benchmark dataset with sequential samples and contemporary attack patterns. This thesis work employs binary classification of attack and normal patterns. The experimental results demonstrate the proficiency of the introduced model with respect to recall, precision, FAR, and F1-score. The model attains over 97% detection accuracy. The test results demonstrate that the BLSTM RNN is profoundly effective for building a highly efficient model for intrusion detection and offers a novel research methodology.
Chapter 1
INTRODUCTION
The Internet, since the 1960s, has played an important role in connecting individuals and bringing organizations and businesses together. It has collapsed the geographical barriers that previously existed between people and has provided an efficient and cost-effective means of communication.

These days, things are changing, and a completely new dimension of communication is opening up due to the emergence of smart objects that possess the capability of creating and sharing data through the Internet in a much smarter way. The Internet of Things (IoT) is a cutting-edge innovation and framework that can potentially change the way in which we live. IoT can be viewed as an innovation built upon two fundamental components: “Internet” and “Things”. The “Things” simply refers to any kind of device or object that has the capability of perceiving or collecting information about itself or the surrounding environment. These smart devices or things have the capability of analyzing information and acting in coordination with other devices, using the “Internet” as the backbone network for communication.

IoT communication systems can reach far beyond the traditional Internet and have the potential to improve the human life condition. For instance, through IoT, human health can be remotely monitored, removing the need to visit a hospital physically. For example, the University of Edinburgh, Scotland, has created tiny computing devices that can be attached to the human chest to monitor and gather respiratory information and then transmit it wirelessly to the respective specialists, who can follow their cases remotely [114]. IoT is being utilized by government organizations around the globe for gathering information from various regions and making enhanced facilities accessible in security, health, development, and transportation. IoT is employed by enterprises to provide enhanced customer services and to augment the security and safety of employees. IoT can also enhance the way day-to-day life is managed. For instance, the Amazon Echo is a family of smart IoT devices with a linguistic capability: people can interact with the devices and ask for advice regarding the weather, schedule alarms, or obtain news feeds from the Internet.
The Internet of Things (IoT), a term originally coined by Kevin Ashton in 1999 [2], stands for a system of globally recognizable physical devices (or things) that can sense the environment around them and behave intelligently. To construct the IoT network, a varied assortment of technologies is required. These technologies help shape a virtual world of objects or things over the physically connected networks, where things can communicate with each other in an intelligent way, providing information to people or taking actions based on individual inputs. IoT is rising at an accelerating pace, interconnecting billions of devices or ‘Things’. As per Gartner [1], about 25 billion distinctively recognizable objects or things are predicted to be part of the worldwide computing system by 2020. These interconnected devices augment regular activities and shape smart solutions. However, the immense prospects and conveniences brought by IoT lead to security concerns.
1.2. Motivation
RQ1. Why is deep learning more efficient in intrusion detection accuracy over IoT networks than prevailing machine learning techniques?
RQ3. What are the parameters essential for a BLSTM RNN to generate a low False Alarm Rate (FAR) and a high detection accuracy?
RQ4. What are the efficient ways to implement the BLSTM RNN approach?

Determine the key factors that give Deep Learning (DL) advantages over prevailing Machine Learning (ML) techniques in detecting intrusions over an IoT network.
Implement the BLSTM RNN approach using the TensorFlow framework, developing the code for an AI model proficient at detecting intrusions in an IoT network.
Determine the probable optimal hyper-parameters required by the proposed model to attain the highest detection accuracy and lowest FAR in the least time.
Evaluate the reported performance of the introduced model.
1.6. Delimitation
This thesis is limited to the detection of intrusions at the IoT network layer only. Also, the proposed model is restricted to detecting intrusions and provides no prevention mechanism whatsoever.
1.8. Outline
Chapter 2
LITERATURE REVIEW
2.1. Introduction
With the progressively deep integration of human society with the Internet, the way people live, work, and study is changing, and along with it, numerous security concerns are growing more serious. Identifying various network attacks remains an inevitable technical concern. An Intrusion Detection System (IDS) can recognize attacks that are ongoing or an invasion that has already happened. In fact, the mechanism of detecting intrusions is equivalent to a classification task, whether multiclass or binary classification. Precisely, the key motivation of detecting intrusions is to improve the classifier’s detection accuracy in efficiently identifying abnormal data patterns.
entrenched things which intercommunicate to provide smart Information Technology (IT) facilities.
The European Research Cluster on the Internet-of-Things (IERC) specifies IoT
as: “a dynamic global network infrastructure with self-configuring capabilities based
on standard and interoperable communication protocols where physical and virtual
‘things’ have identities, physical attributes, and virtual personalities and use intelligent
interfaces, and are seamlessly integrated into the information network” [107].
2.4. Classification of “Things” in IoT
According to [105], the mapping among “things” over the cyber and physical worlds is an inevitable part of the IoT infrastructure, where “things” can be categorized into two sorts: physical things and cyber things. The grouping of “things” in IoT is illustrated in Figure 2.2.
a) Objects: These are tangible things with measurable bodies, like persons, vehicles, tablets, etc.
b) Behaviors: These refer to the movements of the objects, for instance, running, driving, monitoring, and so on.
c) Tendency: This refers to the trends in physical things, like the tendency of a vehicle in a parking lot to be stationary. A trend may also occur due to external factors like congested traffic or the weather becoming cloudy.
d) Physical events: These are an assortment of all the above-named properties, integrating to define the events caused by certain situations in the physical world.
2.4.2. Cyber Things
Despite the massive potential of the IoT in numerous spheres, the entire communication setup of the IoT network is flawed as far as the security standpoint is concerned. The rising usage of IoT devices requires robust security against probable vulnerabilities or attacks. Therefore, security is essential at every layer of the IoT infrastructure, primarily because there is no network boundary or perimeter. Security constraints that need to be considered in IoT applications can be characterized into four key categories [109]:
Since IoT is the incorporation of multiple diverse networks, it is consequently challenging to accomplish a reliable association between specific nodes because of the constantly varying characteristics of the nodes. The IoT architecture can be broadly arranged into three layers: the sensing layer, the transportation layer, and the application layer. Figure 2.3 below illustrates the security architecture of IoT.
2.5.2. Security Issues at Sensor Network
Summoning Malicious Code: Harmful programs like worms can infect the sensor network very easily, since worms do not require any dependent files to execute, which makes them very hard to identify and act against.
Tag defects: Due to the lack of adequate security, it is easy for an intruder to make illegal use of a legitimate reader. An invader could effortlessly obtain the tag information and access Radio Frequency Identification (RFID) devices through forging, without any kind of prior authentication.
2.5.5. Network Capacity Limitation
The convergence of devices arising from the IoT system kindles greater demand for a certain grade of Quality of Service (QoS) from the connected network infrastructure. Applications that deliver certain services might demand more frequent transfers of small data blocks (sessions) essential for updating and synchronization. The frequency of these sessions has a significant effect on the delay and penetrability of the network. This fragment of the infrastructure must necessarily be securely provisioned to ensure secure information flow [111].
the classical example of a spam filter puts it into the proper context: the task ‘T’ represents the prediction of whether an email is spam or not; the ‘E’ represents the experience, that is, the training data set; and the performance ‘P’ is measured as the ratio of appropriately classified emails.
1) Regression Algorithm
a) Stepwise regression
b) Logistic regression
c) Ordinary least squares regression (OLSR)
d) Linear regression
2) Clustering Algorithm
3) Bayesian Algorithm
a) Bayesian Network
b) Gaussian Naïve Bayes
c) Naïve Bayes
d) Multinomial Naïve Bayes
4) Decision Tree Algorithm
a) Decision Stump
b) Classification and Regression Tree (CART)
c) M5
d) C4.5 and C5.0
Artificial Neural Network (ANN) algorithms mimic the way biological neurons work to address classification and regression problems [121]. Some examples of ANN algorithms are as follows:
2.8. Machine Learning versus Deep Learning
2.9. A Popular ML Algorithm - Deep Learning
Figure 2.5: Google Trends showing more inclination toward DL in recent times [39]
2.9.1. DL Architecture
A deep neural network (DNN) comprises three major parts, namely: the input layer, multiple hidden layers, and the output layer. The layers are composed of multiple neurons or units. A single neuron is the computational unit which accepts an input vector, computes a weighted summation of the inputs, then passes the resultant sum through an activation function to generate the output. Figure 2.6 represents the structure of a single neuron, where {X1, X2, …, Xn} represents the set of inputs, {W1, W2, …, Wn} represents the weight vector, and the bias is represented by b. These weights and biases are optimized through the training course. The summation of all the inputs, their respective weights, and the bias is fed into the activation function to generate the output. The purpose of the activation function is to help the neuron learn complex patterns and introduce non-linear properties into the network.
Figure 2.6: A neuron with multiple inputs and weights and bias [39]
In a typical DL input layer, random weights are allotted to the input data and forwarded to the subsequent layer. Every succeeding layer similarly allots weights to its respective inputs and generates outputs. The output of the former layer serves as the input of the subsequent layer. The model’s output layer represents the prediction outcome. The accuracy of the model is determined by a loss function that computes the error rate between the actual output (i.e., the output generated by the model) and the expected output. The loss or error rate represents the divergence between the actual and expected output. The error rate is then transmitted over the network back to the input layer. This technique of error-rate transmission across the network is termed Back-propagation (BP). BP is utilized for updating network weights and biases. The DNN iterates this cycle and optimizes the weights of individual neurons in every iteration, until the error rate falls under an anticipated threshold value. Once this is attained, the DNN is trained and equipped for operation. Figure 2.7 illustrates the high-level working of the training phase of a typical DL algorithm.
Four main reasons for the resurgence of deep learning are discussed in the following sections.
making a machine capable of both feature learning and using those features for
performing an explicit task.
Neural networks (NNs) have been around for several decades [32]. Nevertheless, until 2006, deep NNs were overtaken by shallow architectures. In that year, though, Hinton and Salakhutdinov [33] proposed a unique technique for pre-training DNNs. The concept was based on employing restricted Boltzmann machines for initializing the weights of a single layer at a time. This greedy technique initialized the weights of the fully connected NN, which led to enhanced local optima [34]. Vincent et al. [35] exhibited that similar effects could be achieved by utilizing auto-encoders. An auto-encoder is an ANN employed for unsupervised learning.
Additional factors have lately enabled deep learning networks to attain state-of-the-art performance, for instance, the accessibility of big datasets and faster computing devices like multi-core CPU and GPU computing architectures. A deep learning architecture excludes manually feature-engineered training data and hence requires an enormous amount of data. In this era of ‘big data’, various institutions and researchers can inexpensively and easily accumulate huge datasets that can be utilized for training DL models with many parameters.
conjointly with Dell Technologies has lately developed a DL testbed and used it in a Smart Community Center in Kawasaki, Japan, for evaluating and analyzing the gathered information [91]. The big data which fuels the testbed was collected from construction management, building security, and air conditioning. Another significant matter for the smart city is the prediction of crowd movement patterns. Song et al. [58] established a mechanism built upon DNNs for resolving the issue at a city level. Their proposed model is based on a 4-layered LSTM RNN employed to learn from human mobility (GPS information) joined with modes of transportation (train, car, bicycle, and walking). The authors insist that their deep LSTM RNN approach attains better efficiency than shallow LSTMs. Waste supervision and classification of trash is one more correlated job that smart cities should handle. Vision-based classification utilizing deep CNNs might be a way to address the job [59]. Amato et al. [60] established a decentralized structure for identifying the occupied and vacant spots in parking lots by means of smart cameras and deep CNNs. Valipour et al. [61] also came up with a detection system for vacant parking areas by employing CNNs and exhibited better results than an SVM network.
2.10.3. Energy
2.10.4. Intelligent Transportation System

accuracy of 88%. Tian et al. [65] also conveyed a study on short-term traffic flow forecasting by means of an LSTM RNN model that showed enhanced prediction accuracy in comparison with other models like the support vector machine (SVM), stacked Auto-Encoders (AEs), and the traditional feed-forward NN. In [66], ITS data are fed to an IDS built using a DNN in order to facilitate in-vehicle network communications security. Moreover, ITS inspires the progress of methodologies used for traffic sign recognition. For example, technologies like autonomous driving, mobile mapping, and driver assistance systems require such mechanisms for providing consistent services. Cireşan et al. [67] introduced a DNN-based system for traffic sign recognition and stated increased accuracy with the methodology. Additionally, self-driving vehicles utilize DNNs to execute various jobs, like detecting pedestrians, obstacles, traffic signs, etc.
2.10.6. Agriculture
been applied in remote sensing for crop and land recognition and gradation [74] [75] [76]. As another instance, DL has been used for prediction and detection in the area of automatic farming. Steen et al. [77] introduced a DL-based model of deep CNNs to detect obstacles in agricultural land. This approach helps autonomous machinery operate safe and sound over the field. Furthermore, in automated harvesting, detecting the stage of fruit (ripe or raw) is crucial. Sa et al. [78] employed a variant of deep CNNs called the Region-based CNN for studying fruit images.
2.10.7. Education
2.10.8. Industry
In industry, cyber-physical systems (CPS) and IoT form the central essentials of advanced manufacturing technologies delivering high-accuracy and intelligent systems [39]. Luckow et al. [83] investigate visual inspection by using CNN networks with AlexNet and GoogLeNet. In this study, various images of vehicles along with their annotations are fed to a DL model. The system uses the TensorFlow framework, and the best efficiency acquired is an accuracy of 94%. Shao et al. [120] employed DNNs in a fault identification system aimed at extracting important features by utilizing a denoising auto-encoder (DAE) and a contractive auto-encoder (CAE). In another study, Lee [46] proposed a model combining IoT and a cloud platform to sustain error recognition of defect categories in car headlights in automobile manufacturing, and the outcome established the better efficiency of the DBN model over SVM and RBF (Radial Basis Function). In [11], the authors proposed stacked denoising auto-encoders (SdA) for two purposes: first, reduction of sensory data noise that occurred due to electrical and mechanical disturbances; second, classification of faults. They experimented with their proposed approach on wafer samples of a photolithography process, and the reported outcome revealed that in noisy situations the proposed system generates 14% higher accuracy with respect to other methods, including SVM and K-Nearest Neighbors.
2.10.9. Government
implemented in the Neon framework with an accuracy of 89%–99%. Additionally, [86] addresses the issue of road damage detection by utilizing DNN architectures which gather data from crowd-sourcing empowered by IoT devices. The study is performed through a deep CNN, and evaluations show a damage identification accuracy of 81.4%.
benchmark dataset is chosen for evaluating our proposed BLSTM RNN model for
detecting intrusions in the IoT network.
Dataset Accuracy
UNSW-NB15 94.04%
CIDDS-001 99.99%
GPRS-WEP 82.74%
GPRS-WPA2 92.48%
In recent years, deep learning has developed progressively and has become functional for detecting intrusions, outperforming conventional methods. Studies reveal that DL comprehensively outperforms conventional shallow learning methods. In [12], a deep learning method has been used, employing a DNN for flow-based anomaly recognition. The outcome reveals that the proposed technique could be used for detecting anomalies in software-defined systems. In [13], a deep learning technique has been proposed where the authors use a self-taught-learning (STL) algorithm over the NSL-KDD dataset. When relating the performance with former studies, the approach proved to be more efficient. However, their studies emphasize only the feature-reduction capability of DL techniques. Fu et al. [5] introduce a novel technique for intrusion detection intended for IoT systems, established upon anomaly extraction. In their study, the authors assert that anomalies are detectable by analyzing the patterns of the data at the IoT sensor layer, like the temperature, humidity, or anything else that an IoT object sensor could collect and report. The study uses an unsupervised data-mining algorithm for identifying normal patterns. For performance evaluation, the Intel Lab Project dataset was employed, but no detection accuracy was reported for the designed system. Another study, conducted by M. Sheikhan et al. [20], claims that RNNs can be viewed as reduced-size neural networks (NNs). The paper introduces a 3-layer RNN architecture having 41 input features and 4 intrusion classes as outputs for a misuse-based intrusion detection system. Nevertheless, the RNN units of the layers remain only partly connected. As a result, the proposed RNN does not exhibit the capability of DL to produce high-dimensional features. Moreover, the performance evaluation of the proposed approach in terms of binary classification has not been reported.
With the consistent growth of big data along with the increase in computational power, the deep learning technique has rapidly become popular and is increasingly utilized in numerous fields. This thesis work introduces a unique DL approach to detect intrusions over an IoT network by employing a bidirectional LSTM (BLSTM) recurrent neural network (RNN). Compared with former works, we have used the BLSTM-based model aimed at binary classification and excluding pre-training. In addition, we have used two distinct data sets for training and testing purposes (namely, UNSW-NB15_training-set.csv and UNSW-NB15_test-set.csv) for evaluating the performance of the proposed model.
There exist several research gaps within the prior related works. Foremost, no studies have been conducted using both a BLSTM RNN and the TensorFlow implementation framework in order to detect intrusions in the IoT network. Second, most of the previous work has used the traditional RNN, which suffers from the exploding and vanishing gradient problem [15]; this gets resolved by the LSTM RNN. But the LSTM network has a major limitation: it cannot be trained in both positive and negative time directions [28]. As a result, during the training phase, the LSTM network needs to search for the “optimal delay” (another extra parameter needed for training) of the network. Eventually, when the delay becomes so large that nearly none of the vital data can be saved, the NN converges to the probable optimal resolution depending only on the prior information [28]. The bidirectional LSTM (BLSTM) RNN resolves the problem of optimal delay, since the BLSTM architecture propagates the existing data in both the forward and backward directions in time [28]. We have addressed this research gap in our work by proposing a novel Bidirectional LSTM RNN architecture for intrusion detection. Third, most of the prior works used benchmark datasets like KDD’99, NSL-KDD, etc., which remain highly criticized. In [20] the authors express that the KDD dataset is obsolete and endures data redundancy which may prompt partial detection accuracy. In [27] the authors insist that the NSL-KDD dataset comprises redundant occurrences and is not appropriate for the accurate training of NN models. Fourth, a very limited amount of work has been done to detect intrusions in the IoT network using deep learning techniques. This piece of work contributes to the literature on IoT network intrusion detection mechanisms. This research work uses the UNSW-NB15 dataset, which according to the literature is the most recent and effective dataset published for intrusion detection research purposes.
Chapter 3
NEURAL NETWORKS
3.1. Artificial Neurons

The net input of an artificial neuron is calculated by summing up all the weighted inputs. Moreover, another input called the bias (b_k) is also fed into the network. The computation of the net input V_k is shown in (3.1):

V_k = Σ_{i=0}^{n} W_ki X_i        (3.1)

where X_0 = 1 and W_k0 = b_k. The output of the neuron, Y_k, is calculated by (3.2); to perform the computation, an activation function φ(·) is applied to the net input V_k:

Y_k = φ(V_k)        (3.2)
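As a minimal illustration of Equations (3.1) and (3.2), the following NumPy sketch computes a neuron's output; the input values, weights, bias, and the choice of tanh as the activation function φ are illustrative assumptions, not values taken from this thesis.

import numpy as np

def neuron_output(x, w, b, activation=np.tanh):
    # Net input V_k = sum_i W_ki * X_i + b_k (Equation 3.1, bias folded in)
    v = np.dot(w, x) + b
    # Output Y_k = phi(V_k) (Equation 3.2)
    return activation(v)

x = np.array([0.5, -1.2, 3.0])   # inputs X_1..X_3 (arbitrary)
w = np.array([0.4, 0.1, -0.6])   # weights W_k1..W_k3 (arbitrary)
print(neuron_output(x, w, b=0.2))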
3.3. Recurrent Neural Networks
RNN hidden layers act as memory units. Precisely, the RNN output at time t−1 affects the output at time t. RNN neurons are armed with feedback loops which yield the present output as the input for the subsequent step. The neurons of an RNN can be described as an internal memory which preserves the computational information from the input of the previous step. For training an RNN, a variant of the back-propagation algorithm, termed Back-Propagation-Through-Time (BPTT), is employed. A fundamental component of the BPTT algorithm is a procedure called unrolling. Figure 3.4 illustrates the assembly of an RNN and the idea of unrolling. An RNN can be unfolded into a graph without any cycles, as presented in Figure 3.4, where (X(t), X(t+1), …) represents multiple input time steps, (u(t), u(t+1), …) multiple internal state time steps, and (y(t), y(t+1), …) multiple time steps of outputs. When unrolling the RNN structure, the internal state (u(t)) and the output (y(t)) of the prior time step are delivered as inputs to the subsequent time step.
Figure 3.4: Unrolling of RNN architecture
RNNs form a class of powerful DNNs that use their internal memory along with loops for dealing with sequence data [48]. The unfolded architecture of RNNs in Figure 3.5 represents the calculation procedure of an RNN unfolded (or unrolled) in time.

In the above figure, during each iteration at time T, the hidden state of the hidden layer, h_T, is updated depending on the current input x_T and the prior hidden state, h_{T−1}, through the following equation:

h_T = σ_h(W_xh x_T + W_hh h_{T−1} + b_h)        (3.3)

where W_xh represents the input-to-hidden-layer weight matrix, W_hh denotes the weight matrix between two consecutive hidden states (h_{t−1} and h_t), b_h is the bias vector of the hidden layer, and σ_h denotes the activation function that generates the hidden state. The network output can be calculated as:

y_T = σ_y(W_hy h_T + b_y)        (3.4)

where W_hy denotes the hidden-to-output-layer weight matrix, b_y denotes the bias vector of the output layer, and σ_y represents the output layer activation function.
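A minimal NumPy sketch of these two recurrences follows; the layer sizes, random weights, and the use of tanh for σ_h (with an identity σ_y) are assumptions made purely for illustration.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_hy, b_y):
    # Hidden state update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), Eq. (3.3)
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    # Output: y_t = W_hy h_t + b_y, Eq. (3.4), with an identity output activation
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
W_xh = rng.normal(size=(n_hid, n_in))
W_hh = rng.normal(size=(n_hid, n_hid))
W_hy = rng.normal(size=(n_out, n_hid))
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

h = np.zeros(n_hid)                    # initial hidden state
for x in rng.normal(size=(3, n_in)):   # a three-step input sequence
    h, y = rnn_step(x, h, W_xh, W_hh, b_h, W_hy, b_y)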
3.4. Long Short-Term Memory RNN

The LSTM architecture replaces the simple RNN neuron with a gated memory cell. In Figure 3.6, at each time iteration t, the LSTM cell has the input x_t and the output h_t. During the training phase, the LSTM cell also considers the cell input state, C̃_t, the cell output state, C_t, and the previous cell output state, C_{t−1}. The gated structure allows the LSTM to deal with the aforementioned long-distance dependencies [48]. The LSTM cell comprises three gates, namely: the input gate, the output gate, and the forget gate, denoted in Figure 3.6 as i_t, o_t, and f_t respectively at time t. All three gates and the input cell state, denoted by colored boxes in Figure 3.6, are calculated by the following equations, iterated from t = 1 to T:
f_t = σ_g(W_f x_t + U_f h_{t−1} + b_f)        (3.5)
i_t = σ_g(W_i x_t + U_i h_{t−1} + b_i)        (3.6)
o_t = σ_g(W_o x_t + U_o h_{t−1} + b_o)        (3.7)
C̃_t = tanh(W_C x_t + U_C h_{t−1} + b_C)        (3.8)
where W_f, W_i, W_o, and W_C denote the weight matrices mapping the hidden layer input to the three gates and the input cell state, whereas U_f, U_i, U_o, and U_C represent the weight matrices connecting the previous cell output state to the three gates and the input cell state. b_f, b_i, b_o, and b_C are the bias vectors. σ_g denotes the activation function of the gates, and tanh denotes the hyperbolic tangent function. Based on the results of the four equations above, at each time iteration t, the cell output state, C_t, and the layer output, h_t, can be calculated as follows:
C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t        (3.9)

h_t = o_t ∗ tanh(C_t)        (3.10)

The final output of an LSTM layer is a vector of all the outputs: Y_T = [h_{T−n}, …, h_{T−1}].
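To make Equations (3.5)–(3.10) concrete, here is a minimal NumPy sketch of a single LSTM cell step; the dictionary-based parameter layout is an illustrative assumption, not a layout used by this thesis's implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b are dicts keyed by gate: 'f' (forget), 'i' (input), 'o' (output),
    # and 'c' (cell input state); sigma_g is taken to be the sigmoid function.
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])        # (3.5)
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])        # (3.6)
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])        # (3.7)
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # (3.8)
    c_t = f * c_prev + i * c_tilde                              # (3.9)
    h_t = o * np.tanh(c_t)                                      # (3.10)
    return h_t, c_t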
3.5. Bi-directional Long Short-Term Memory RNN
In a bidirectional RNN, a forward layer processes the input sequence from t = 1 to T while a backward layer processes it from t = T to 1. Denoting the forward and backward hidden states by h_t^f and h_t^b:

h_t^f = H(W_xh^f x_t + W_hh^f h_{t−1}^f + b_h^f)        (3.11)
h_t^b = H(W_xh^b x_t + W_hh^b h_{t+1}^b + b_h^b)        (3.12)
y_t = W_hy^f h_t^f + W_hy^b h_t^b + b_y        (3.13)

Both the outputs of the forward and backward layers are calculated by means of the standard LSTM equations, Equations (3.5)–(3.10). The BLSTM layer produces an output vector, Y_T, whose elements are calculated by the equation:

y_t = σ(h_t^f, h_t^b)        (3.14)
where the σ function combines the two output sequences. The σ function can be one of four kinds: concatenation, summation, average, or multiplication. Incorporating BRNNs with LSTM neurons results in a bidirectional LSTM recurrent neural network (BLSTM RNN) [45]. The BLSTM RNN is capable of accessing long-term context data in both the backward and forward directions. The combination of the forward and backward LSTM layers is considered a single BLSTM layer. It has been shown that bidirectional models are considerably better than regular unidirectional models in various domains like phoneme classification and speech recognition [48]. Figure 3.7 illustrates a bidirectional LSTM structure with three consecutive time steps.
Figure 3.7: Unfolded BLSTM RNN structure with three consecutive time steps [48]
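As a hedged sketch of how such a BLSTM layer is declared in practice, the following Keras fragment wraps an LSTM layer in the Bidirectional wrapper (the Keras classes this thesis employs in Chapter 5); the layer width, time steps, feature count, dropout rate, and merge behaviour shown here are illustrative assumptions, not the final configuration of the proposed model.

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Dense

model = Sequential()
# Forward and backward LSTM layers over sequences of 10 time steps with
# 5 features each (assumed shape); the two directions' outputs are merged.
model.add(Bidirectional(LSTM(64), input_shape=(10, 5)))
model.add(Dropout(0.2))                     # dropout regularization (Section 3.9)
model.add(Dense(1, activation='sigmoid'))   # binary attack/normal output
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])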
3.6. Training Neural Networks

The implementation of a neural network goes through two major stages: training and testing. During training, the NN is fed with knowledge (data) and the network is required to learn from its input data. The learning procedure is performed through an optimization (error minimization) process. Optimization algorithms are mathematical functions which help reduce the loss function by fine-tuning the neural network parameters. The loss function computes the difference between the expected output and the actual output. Hence, minimizing the loss makes the network model generate optimal output. The optimization algorithm which is used for training NNs is termed Gradient Descent. The gradient descent algorithm calculates the gradients, or slopes, of the loss function with regard to the NN parameters (biases and weights). The technique used to calculate the gradients is termed Back-Propagation (BP) [116]. The gradient is the amount of alteration that occurs in the loss function due to the variation in the network parameters. Depending on the gradient, the network parameters are updated by means of a scalar value called the learning rate.
This mechanism is performed via iterations by allowing several repetitions over the training data. One pass over the training data is known as an epoch. With each epoch, the parameter values move nearer to the optimal values, resulting in loss function convergence. For a large dataset, computing the loss and gradient over the full dataset might be computationally infeasible. Hence, a variant of gradient descent known as Stochastic Gradient Descent (SGD) is widely in use. In the SGD algorithm, the total input is divided into smaller subsets of input termed batches. NN parameters are then updated by computing the loss function of a single batch at a time. There are several other popular variants, namely: RMSprop, AdaGrad, and Adam [117].
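The following toy NumPy sketch illustrates the SGD update just described on a simple quadratic loss; the data, batch count, and learning rate are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, size=100)   # toy 'training data'
w = 5.0                                # initial parameter
learning_rate = 0.1

for epoch in range(20):                       # repeated passes (epochs)
    for batch in np.array_split(data, 10):    # mini-batches
        grad = np.mean(2.0 * (w - batch))     # gradient of mean((w - x)^2)
        w -= learning_rate * grad             # SGD parameter update
# w converges toward the data mean (about 2.0), minimizing the loss.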
Training the NN is often associated with the problem of overfitting, which is when the network is characterized by high accuracy over the training set but generates poor accuracy when evaluated on new test data. Several counter-measures can be applied to prevent overfitting. One is early stopping, where the loss function of a validation set (a small sub-set of the training set) is calculated after each epoch. If the value of the loss function over the validation set starts increasing, despite the decreasing loss on the training set, it could be a sign of overfitting; in that case the training should be stopped. Another technique is dropout regularization, frequently used in deep learning, where a certain ratio of neural network connections are eliminated randomly in each epoch. The network weights and biases get updated by a training algorithm like Back-Propagation (BP), Back-Propagation-Through-Time (BPTT), etc. Parameters like dropout, decay, batch size, learning rate, etc. are the optimization algorithm parameters which are generally determined by the researcher through trial and error. All these parameters are collectively called hyper-parameters.
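A hedged Keras sketch of the early-stopping counter-measure follows; the monitored quantity, patience value, and the commented fit() arguments (X_train, y_train, the validation split, epochs, and batch size) are assumptions for illustration, not this thesis's settings.

from keras.callbacks import EarlyStopping

# Stop training when the validation loss stops improving for 5 epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=5)

# model.fit(X_train, y_train,
#           validation_split=0.1,   # small validation sub-set of training data
#           epochs=100, batch_size=64,
#           callbacks=[early_stop])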
3.7. Activation Functions

Neurons are the building blocks of an ANN. Neurons take inputs from the preceding neurons, multiply the input values with weights, generate a sum of products (3.15), and pass the summation through an activation function to generate the final output (3.16). The mathematical illustration of the neuron is presented in (3.15):

Y = Σ_{i=1}^{n} in_i · w_i        (3.15)

Output = f(Y)        (3.16)
Firing a neuron actually means activating it. The similarity between biological neurons and artificial neurons is illustrated in Figure 3.8. In Figure 3.8, the dendrites carry the electrical signals to the neuron body and act as the neuron inputs. Similarly, in an artificial neuron, the inputs in1, in2, …, inn resemble the dendrites. The activation function resembles the cell body, and the propagated output resembles the biological axon. The artificial neuron imitates a functioning logic similar to that of a biological neuron.
Figure 3.9: Step function [123]
There are several such functions that are widely used in machine learning, for instance: the Sigmoid function, the Tanh (hyperbolic tangent) function, and the Rectified Linear Unit (ReLU) function. Among these, ReLU is the most popular in the area of RNNs. A brief discussion of the ReLU function is provided in the following section.
3.7.1.1. Rectified Linear Unit (ReLU)

This function is the most broadly employed remedy for the vanishing gradient issue of the LSTM RNN. Its mathematical representation is shown in (3.17):

Y = f(x) = max(0, x)        (3.17)

When the input is smaller than 0, the output remains 0. When the input is greater than 0, the output equals the input. The ReLU function is more efficient for a binary classification problem, and we employ it as the hidden layer activation function in our proposed model.
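Equation (3.17) reduces to a one-line NumPy function:

import numpy as np

def relu(x):
    return np.maximum(0, x)   # Y = f(x) = max(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))   # -> [0. 0. 0. 0.5 2.]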
3.8. Deep Learning

Neural networks with lots of hidden layers are termed Deep Neural Networks (DNNs), and Deep Learning (DL) represents the learning algorithm of the DNN. A salient feature of DL is that it can learn features by itself; thus, there is no need for hand-crafting the features. This DL property facilitates the learning procedure and makes the DNN more efficient and robust in comparison to shallow learning [53].
3.9. Dropout Regularization

Deep neural network (DNN) models have numerous parameters and the ability to model highly complex functions. This capacity is both a boon and a bane: such models frequently overfit on the training set and lose accuracy and generalizability over the test set [56]. Regularization, in ANN terminology, refers to the procedure of regulating neural network layers to prevent over-fitting. Dropout (also known as the dropout probability or dropout rate) is the most widely utilized regularization technique in DL. During the learning process, hidden layer neurons are selected randomly and discarded according to the dropout rate. Precisely, randomly selected neurons are dropped out, i.e., dropped-out neurons can no longer update their weights, thus helping the learning process evade the problem of overfitting [55].
3.10. Deep Learning Loss Function

The network weights W are updated in the direction that reduces the loss; with learning rate η and the MSE loss, the weight update is:

W_t = W_{t−1} − η · (∂MSE/∂W_{t−1})        (3.18)
3.10.1. Mean Squared Error
MSE, or Mean Squared Error, remains one of the most widely used loss functions in the area of DL. The mathematical definition of MSE is presented in (3.19):

MSE = (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)²        (3.19)

where y_i is the output of the learning procedure, ŷ_i is the expected output, and n is the number of output classes.
3.10.2. Cross Entropy Loss

Cross entropy loss, also termed CEL, is one more widely used loss function, frequently chosen for regression or classification problems. The mathematical representation of CEL is presented in (3.20):

cross_entropy = −Σ_{i=1}^{k} y_i log(ŷ_i)        (3.20)

where k denotes the number of training instances, ŷ_i is the expected outcome, and y_i is the learning output [56]. CEL and MSE are extensively employed in classification problems.
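Minimal NumPy versions of Equations (3.19) and (3.20) are sketched below; the clipping constant is an implementation detail added here to avoid log(0), not part of the thesis's formulation.

import numpy as np

def mse(y_hat, y):
    # Equation (3.19): mean of squared differences
    return np.mean((y_hat - y) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    # Equation (3.20): -sum(y_i * log(y_hat_i)); clip to avoid log(0)
    return -np.sum(y * np.log(np.clip(y_hat, eps, 1.0)))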
3.11. Deep Neural Network Implementation Frameworks

The use of deep learning architectures has grown fast, and this growth has been sustained by various deep learning frameworks in recent years. Every DL framework has its own strengths and weaknesses depending on the supported optimization algorithms and DL architectures, and the convenience of deployment and development [92]. Many of these DL frameworks have been widely used in research work for proficient implementation of DNNs. In the subsequent sections, some of the frameworks are reviewed.
3.11.1. H2O
3.11.2. Torch
3.11.3. TensorFlow
3.11.4. Caffe
3.11.5. Theano
3.11.6. Neon
… comprehensive error debugging

Framework      | Written in | APIs                   | Strengths                                                                                                                         | Weaknesses
Theano         | Python     | Python                 | Several models are supported; rapid training of LSTMs over GPUs                                                                  | Various low-level APIs
Neon           | Python     | Python                 | Quick training time; platform switching is easy; provisions modern architectures like GANs                                       | No support for CPU multi-threading
Deeplearning4j | Java       | Python, Scala, Clojure | Models can be imported from leading frameworks (such as Theano, Caffe, Torch, and TensorFlow); provides a visualization interface | Extended training time relative to other tools
Caffe          | C++        | Python, Matlab         | Offers a bunch of reference models; elementary platform swapping; excellent for CNNs                                              | Not very good for RNNs
3.12. Training Dataset and Feature Identification: UNSW-NB15
Machine learning (ML) and data mining techniques have been extensively utilized for advanced intrusion detection in recent years, which makes it possible to automate intrusion detection in IoT networks. One of the major difficulties that IoT intrusion detection research faces is the inaccessibility of a comprehensive network-based dataset which mirrors the modern network traffic environment [23]. Several recent studies exhibited that, for the present network threat scenario, older benchmark datasets do not conclusively reflect modern network traffic and contemporary low-footprint attacks. In [20], Bajaj and Arora express that the KDD dataset is obsolete, recommending that the NSL-KDD dataset is more appropriate for analyzing current networks. They also stress that the KDD’99 dataset endures redundant information that leads to biased intrusion detection outcomes, resulting in inappropriate feature classification. They further claim that intrusion detection studies would likely generate results that do not characterize real network scenarios due to the use of the KDD’99 datasets [20]; however, the absence of alternatives is the reason the dataset is still being used. Answering the unobtainability of a network benchmark dataset, in the year 2015, Moustafa and Slay [23] produced the UNSW-NB15 dataset. The authors claim that the introduced dataset comprises a fusion of contemporary real network traffic and synthesized threat actions. The authors criticized the NSL-KDD and KDD’99 datasets for not characterizing up-to-date intrusions, and presented a comprehensive and all-inclusive dataset called UNSW-NB15, which encompasses several features from the KDD’99 dataset [24]. They further inspected the features of the UNSW-NB15 and KDD’99 datasets, and the results attained exhibited that the actual KDD’99 dataset attributes are less effective compared to the UNSW-NB15 features. This newly generated UNSW-NB15 intrusion dataset, which includes diverse attributes or features including those in the KDD’99 dataset, forms the most up-to-date dataset, published in 2015 to facilitate intrusion detection research works.
Another challenge that this area of research faces is obtaining a labeled input dataset for the purpose of intrusion detection in IoT [5]. Fu et al. [5] recognize this difficulty, which the majority of independent researchers face in the field of IoT security. To overcome the challenge, the UNSW-NB15 dataset has been employed in this research work. As noted above, Moustafa and Slay [23] presented UNSW-NB15 to address the shortcomings of the NSL-KDD and KDD‘99 datasets; in [9], they further analyzed the features of the UNSW-NB15 and KDD‘99 datasets, and the results demonstrated that the actual KDD‘99 dataset features were less representative compared to the features of the UNSW-NB15 intrusion dataset. The UNSW-NB15 dataset contains 45 features [23] and is further split into separate training and testing sets containing all the current attack types. T. Janarthanan and S. Zargari [26] performed an extensive study on the UNSW-NB15 dataset to extract the most competent features and thus proposed a feature subset which dramatically increased the intrusion detection efficiency. In this thesis, the dataset subset in the file UNSW-NB15_training-set.csv will be used for training the proposed model, while UNSW_NB15_test-set.csv will be utilized for testing the proposed model. Both dataset files can be obtained from: https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/
The dataset file UNSW-NB15_training-set.csv contains 175,341 records for training, while the test set file UNSW-NB15_testing-set.csv contains 82,332 records. The UNSW-NB15 dataset has 9 attack types in total, out of which 5 types are often present in IoT attacks (Analysis, Backdoor, Denial-of-Service, Worms, and Reconnaissance) [118][119]. Hence, we extracted only these 5 types of attack samples along with the 'normal' samples to prepare our training set. In [4], the authors used the UNSW-NB15 dataset for conducting IoT research because, unlike previous benchmark datasets, UNSW-NB15 exhibits contemporary attack patterns and modern normal traffic patterns. Moreover, since UNSW-NB15 has a separate training set and testing set, the data distributions remain different [4]. Again, in [31], the authors point out that: “It encompasses realistic normal traffic behavior and combines it with the synthesized up to date attack instances”. Also, [35] points out that previous benchmark data sets like KDD‘99 and NSL-KDD could not meet current network security research needs, as they do not comprehend present-day network security circumstances and the latest attack features.
The UNSW-NB15 dataset has been chosen for this research purpose as it covers modern attack patterns, consists of modern normal traffic patterns, and contains only two classes (“attack” and “normal”). Since we are performing a binary classification task, this class distribution facilitates the proposed approach. Secondly, UNSW-NB15 forms a comprehensive dataset that presents 5 types of IoT attacks. Thirdly, UNSW-NB15 is a sequential dataset, which is very appropriate for training recurrent neural networks.
3.13. Dataset Format Conversion
Table 3.2 shows five rows of the dataset after format conversion. The first five columns represent the five selected features proposed by [9] and their corresponding values. Column six (attack_cat) shows the category of attack, including normal samples, and the last column shows the dataset label, where the value 0 corresponds to normal samples and the value 1 represents attack samples.
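A hedged pandas sketch of this preparation step follows: it loads the training CSV, keeps the normal samples plus the five IoT-relevant attack categories named in Section 3.12, and separates the binary label. The column names and category strings follow the published UNSW-NB15 files (where Denial-of-Service appears as 'DoS'); the exact five-feature selection of [9] is not reproduced here.

import pandas as pd

# Five attack categories relevant to IoT, plus the normal class.
iot_attacks = ['Analysis', 'Backdoor', 'DoS', 'Worms', 'Reconnaissance']

df = pd.read_csv('UNSW-NB15_training-set.csv')
df = df[df['attack_cat'].isin(iot_attacks + ['Normal'])]

X = df.drop(columns=['attack_cat', 'label'])   # feature columns
y = df['label']                                # 0 = normal, 1 = attack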
Chapter 4
RESEARCH METHODOLOGY
4.1. Introduction
The proposed model aims to detect intrusions at the transport layer of the IoT architecture. In order to detect intrusions in the data at the IoT transport layer, a process design is required which accepts IoT data as input, performs the processing, and generates a two-fold classification: “attack” or “normal”. A high-level model design view is shown in Figure 4.1.
4.2. Model Design Methodology
1. Preprocessing Input Data: This phase deals with the conversion of input data into an acceptable data structure permitted by the simulation framework.
2. Training: This phase includes fitting the NN model to the training data for classification.
3. Detection: In this stage the trained NN model performs the detection of intrusions.
In Figure 4.2 above, during the preprocessing stage, the training-set data samples are encoded, normalized, and fit into a data structure compatible with the TensorFlow framework. After preprocessing, the whole dataset is divided into two subsets: a training subset and a validation subset. The former is used for training the BLSTM RNN network, and the validation subset is used for validating the already-trained network. After the completion of the validation process, a separate testing dataset is used for testing the model’s classification accuracy and other performance measures.
4.3. Implementation Methodology

4.3.1. Keras Library

In this thesis, the Python programming language and the Keras library will be used for the implementation. Keras is an open-source neural network library which offers a high-level Application Programmer Interface (API) for implementing DNNs. Keras executes atop several other DL frameworks such as TensorFlow¹ and Theano². In this research work, we will be using Keras atop TensorFlow. Keras is chosen because its API enables rapid prototyping of neural network models in research. Keras also permits modular configuration of NNs, i.e., it allows combining components like activation functions, loss functions, and optimizers. Moreover, the Keras API is easy to learn and use, and has the added advantage of easily porting models between frameworks. Since Keras is self-contained, it can be used without having to interact with the back-end NN framework, which is TensorFlow in our case. This approach minimizes the complication of, and the need for, programming the back-end framework and enables fast experimentation. We have chosen Keras primarily for the following advantages:
Better user experience for deep learning algorithms: The Keras API is user-friendly. The API is well designed, object-oriented, and flexible. Researchers can define new deep learning models without needing to work with potentially complex back ends, resulting in simpler and leaner code [40].
Persistent Python integration: Keras is a native Python package, which allows easy access to the entire Python data science ecosystem. For example, the Python Scikit-learn API can also use Keras models [40].
¹ https://www.tensorflow.org/
² http://deeplearning.net/software/theano/
Portability: Keras allows researchers to port models from the TensorFlow back-end
to other back-ends such as Theano. In addition, Keras makes many learning
resources, documentation, and code samples freely available [40].
We have used the Keras library as it enables simple and quick prototyping
through modularity, extensibility and user friendliness [40].
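As a brief illustration of this modular configuration, the following is a minimal sketch (not the thesis model; the layer sizes here are illustrative) showing how an activation function, a loss function and an optimizer are combined as interchangeable parameters in Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units = 8, activation = 'relu', input_dim = 5))  # illustrative hidden layer
model.add(Dense(units = 1, activation = 'sigmoid'))              # binary output neuron
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])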
Accuracy measures how often the model is correct and is calculated by the formula in (4.1), where X denotes the total number of samples fed as input:

$$ \text{Accuracy} = \frac{TP + TN}{X} \qquad (4.1) $$
The model's misclassification rate is defined as how frequently the
model is wrong. The misclassification rate is the percentage of wrong detections and can
be calculated using the formula in (4.2):

$$ \text{misclassification\_rate} = \frac{FP + FN}{X} \qquad (4.2) $$
The False Positive Rate (FPR), calculated by (4.3), is the percentage at which the
system incorrectly classifies normal samples as anomalies:

$$ \text{FPR} = \frac{FP}{X_{Normal}} \qquad (4.3) $$
Other parameters for evaluating the proposed model include recall, precision,
and f1-score values. Precision is calculated as the ratio of correct positive detections to
the total actual positive detections, as shown in (4.4):

$$ \text{precision} = \frac{TP}{TP + FP} \qquad (4.4) $$

Recall is the ratio of correct positive detections to the number of actual attack samples, as shown in (4.5):

$$ \text{recall} = \frac{TP}{TP + FN} \qquad (4.5) $$
The F1-Score, shown in (4.6), denotes the harmonic mean of recall and precision. It is
the weighted average of precision and recall, taking both the FP and
FN into account:

$$ \text{f1\_score} = \frac{2(\text{recall} \times \text{precision})}{\text{recall} + \text{precision}} \qquad (4.6) $$
These metrics are employed to evaluate the proposed model in the testing phase
of the model simulations.
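As a worked illustration of (4.1)-(4.6), the following sketch computes the metrics from raw confusion-matrix counts; the function and variable names are illustrative and are not part of the thesis code:

def evaluate(TP, TN, FP, FN):
    X = TP + TN + FP + FN                               # total number of input samples
    X_normal = TN + FP                                  # actual normal samples
    accuracy = (TP + TN) / X                            # (4.1)
    misclassification_rate = (FP + FN) / X              # (4.2)
    fpr = FP / X_normal                                 # (4.3)
    precision = TP / (TP + FP)                          # (4.4)
    recall = TP / (TP + FN)                             # (4.5)
    f1 = 2 * recall * precision / (recall + precision)  # (4.6)
    return accuracy, misclassification_rate, fpr, precision, recall, f1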
Chapter 5
Figure 5.1: System Architecture
Table 5.1: Model parameters

Parameter             Value
Epochs                100
Hidden layers         3
Activation function   Sigmoid
Optimizer             Adam
Loss function         binary_crossentropy
Learning rate         0.001
Class size            2
One-hot encoder       Yes
Weights               Random
Biases                Random
The Spyder IDE offers many advanced Graphical User Interface (GUI) functionalities that assist RNN
implementation. For instance, the “Variable Explorer” GUI helps to visualize the
variables (data and values) used in the implementation. We have used this functionality
to visualize and analyze the dataset values, confusion matrix threshold values,
prediction metrics, etc. Moreover, it facilitates the running and debugging of the
Python code through syntax coloring and breakpoints. The Spyder IDE also supports
parallel runs, i.e., multiple neural networks can be trained and/or tested simultaneously.
While conducting our experiments, this feature has been used to train and test several
RNNs simultaneously. The Spyder IDE integrates the essential libraries for developing
RNNs, such as NumPy, SciPy and Matplotlib.
For implementing the BLSTM RNN, the high-level neural network library
Keras [116] has been employed. The Keras Sequential class is utilized for
instantiating the RNN object. Other Keras classes, such as LSTM and Bidirectional, are
used for implementing the proposed BLSTM model. Dropout is another Keras class,
which has been used for preventing network overfitting. In
our implementation, we use Google TensorFlow as the back-end neural network
framework. For manipulating the matrices effectively, the NumPy and Pandas [117]
libraries are employed. NumPy is primarily used to create the TensorFlow data
structure. NumPy also uses much less memory for matrices than the default
Python lists, and it makes matrix operations much more efficient. Pandas is
built on top of NumPy and provides a higher-level interface for manipulating datasets
with named rows and columns (the input datasets are required to be Pandas
dataframes). We have used the Pandas library to import the values from the training-set and
test-set (which are .csv files) and store them as data frames. For measuring the
performance of the detection system, two helper functions from the
sklearn library [115] have been used, namely confusion_matrix and classification_report.
arrows suggest. The ‘param’ signifies the function parameter. X_train and Y_train are
the training-set matrix. X_test is the test-set matrix.
As in Figure 5.2, five different classes have been developed, i.e., IDS, Data,
Classifier, FitModel, and Detection, in order to implement our system architecture.
Table 5.2 summarizes the functions of all the classes used. In the following sections, the
classes and their respective methods are described in detail.
Table 5.2: Classes and their respective functions

Class Name   Description
IDS          The parent class which instantiates and encapsulates the other four classes and their respective methods.
Data         Deals with the data preprocessing mechanism.
Classifier   Builds the code for the BLSTM RNN architecture.
FitModel     Responsible for two tasks: the first is to compile our model, and the second is to train the model with the preprocessed data produced by the Data class.
Detection    Performs the intrusion detection task and generates the evaluation metrics.
The IDS class is the parent class which encapsulates the other four functional
classes and their respective methods. It has two methods: build() and execute(). The build()
method is responsible for instantiating all the other four classes with their respective
methods. The execute() method reuses the previously instantiated classes with a
separate set of arguments. The two green branches in Figure 5.2 (above) represent the
reuse functionality, and param represents the arguments. In the first green line,
param=test-set, which means the Data class is re-used with the test-set as a new
argument. Similarly, the second green line (where param=X_test) signifies that the
Detection class is re-used with X_test as its argument. X_test is discussed in detail
in Section 5.3.2.1.3.
The Data class deals with the data preprocessing mechanism and consists of
only one method, called preprocess(). The preprocess() method deals with data
preprocessing and encapsulates four sub-functions: importDataset(),
normalizeData(), dataStructure() and reshape(). Details of the preprocess() method
and its sub-functions are described in the following sections, and a structural sketch is given below.
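The sketch below outlines the structure just described; the method signatures are assumptions based on the text, and the sub-function bodies are deferred to the following sections:

class Data:
    # Sketch of the Data class; preprocess() chains the four sub-functions.
    def preprocess(self, csv_file):
        dataset = self.importDataset(csv_file)   # read the .csv into a data-frame
        scaled = self.normalizeData(dataset)     # rescale values into [0, 1]
        X, Y = self.dataStructure(scaled)        # build NumPy feature/label arrays
        return self.reshape(X), Y                # 3-D tensor compatible with the RNN

    # The four sub-functions are detailed in Sections 5.3.2.1.1 to 5.3.2.1.4.
    def importDataset(self, csv_file): ...
    def normalizeData(self, dataset): ...
    def dataStructure(self, scaled): ...
    def reshape(self, X): ...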
5.3.2.1. preprocess Method
The preprocess() method takes the dataset (as a .csv file) as its input parameter and
reconstructs the data samples into a TensorFlow-compatible neural network structure
that can be used for training and testing purposes. The preprocess() method is
called twice: the first time by the build() method in the IDS class for
processing the training-set, and the second time by the execute() method in the IDS
class for processing the test-set. In the first instance, the parameter passed to the
preprocess() method is the 'UNSW_NB15_training-set_5000.csv' file, which contains
the training dataset, and in the second instance the 'UNSW_NB15_testing-set.csv' file,
which contains the testing dataset. The preprocess() method encapsulates 4 sub-functions
for performing the data pre-processing task. The processed files are returned
to the main IDS class and are used in the training and testing phases, respectively.
The details of the sub-functions of the preprocess() method are presented in the
following sections.
5.3.2.1.1. importDataset Sub-function
This sub-function imports the training-set for training the neural network. The
training-set is imported as a Pandas data-frame (a 2-dimensional labeled data structure).
This is because training a neural network requires the NumPy array format, and to
generate a NumPy array out of the .csv file format, a Pandas data-frame is essential. The
read_csv method from the Pandas library is used to import the training-set in
data-frame format. Figure 5.3 below is a snapshot of the Spyder IDE 'Variable
Explorer' showing the structure of the data-frame format. While importing data, it is
necessary not only to select the exact columns (which are
service, sbytes, sttl, smean, ct_dst_sport_ltm), but also to convert them into an array
of numbers, because only numbers can be the input of neural networks. The data-frame
is named dataset_train. Below are the Python statements which import the
training set from a .csv file and convert it to a data-frame.
import pandas as pd
dataset_train = pd.read_csv('UNSW_NB15_training-set_5000.csv')    I
In (I), the Pandas library is first imported under the alias pd. Then
pd.read_csv (where read_csv is a built-in method of the Pandas library)
is employed to read the training set, import the values
as a data frame, and store them in dataset_train.
Figure 5.3: A view of the training-set as data-frame format taken from Spyder IDE
Next, the columns from the training-set are selected and stored as a NumPy
array by (II):
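Statement (II) itself appears to have been lost in extraction; a plausible reconstruction, based on the column indices discussed below (.values converts the selection to a NumPy array):

training_set = dataset_train.iloc[:, [3, 7, 10, 27, 35]].values    II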
In (II), the five feature columns need to be selected from the dataset. To do that, we have used the iloc() method
on the data-frame (dataset_train) to specify the indices of the columns we want; the columns
pointed to by these indices contain the features that we intend to select. The
UNSW_NB15_training-set_5451.csv file (i.e. the training set) consists of 45 features
in total (i.e. columns 0 to 44, since the index of a Pandas dataframe starts from
zero). For this research
purpose, we have selected the feature sub-set proposed by [9] (discussed in Section
3.13). These 5 features are located in the 4th, 8th, 11th, 28th and 36th columns of the
original dataset. The indices in the iloc() call (3, 7, 10, 27, 35) are the
dataframe column indices of those features: since the dataframe index starts from zero,
column 1 in the original dataset is index 0, column 4 is index 3, column 8 is index 7, and so on.
Since we have 5 features and 5000 training samples, the training_set consists
of 5 columns and 5000 rows, where each row corresponds to a sample.
5.3.2.1.2. normalizeData Sub-function
This sub-function deals with data normalization. The input of this sub-function
is the output of the importDataset() method (training_set). Normalization
refers to rescaling real numbers using the formula in (5.1):

$$ X_{nor} = \frac{X - \min(X)}{\max(X) - \min(X)} \qquad (5.1) $$

In (5.1), X refers to the value of each data sample, X_nor is the normalized value,
min(X) is the minimum of all the values in the training-set, and max(X) is the maximum
of all the values in the training-set. To perform the data normalization, we introduce a
new variable called training_set_scaled which stores the new normalized values,
because it is recommended to keep the original non-normalized training-set separate
from the normalized one. The normalizeData() sub-function returns a normalized 2D
array (called training_set_scaled) of real numbers in the range of 0 to 1. The
normalization process is carried out by the following Python statements in (III).
from sklearn.preprocessing import MinMaxScaler
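Only the import line of (III) survived extraction; a plausible completion using sklearn's MinMaxScaler, with the variable names taken from the surrounding text:

sc = MinMaxScaler(feature_range = (0, 1))              # rescale into [0, 1]    III
training_set_scaled = sc.fit_transform(training_set)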
Table 5.3: Example of categorical data (the service column) and other un-normalized data

service   sbytes   sttl   smean   ct_dst_sport_ltm
-         530      254    53      1
-         816      62     82      1
Table 5.4: Categorical-to-numeric conversion (the service column) and other normalized data

service   sbytes        sttl   smean      ct_dst_sport_ltm
0         4.97482e-05   1      0.144928   0
0         0.000229947   1      0.707384   0
5.3.2.1.3. dataStructure Sub-function
In order to feed data into an RNN, a TensorFlow data structure is required for
storing the features and labels. To transform the dataset into a TensorFlow data
structure, two separate entities are created. The first entity is X_train, which is
the input of the RNN, and the second entity is Y_train, which contains
the expected output of the RNN. Technically, X_train contains the prior
observations (from time t-1 till time t), and Y_train contains the expected
observation at time t+1. It is important to note that the RNN neuron takes
X_train, learns the correlations between the observations in the data samples, and
generates a prediction out of the learning. This generated prediction is the actual output
of the RNN neuron at time t. To calculate the efficiency of the neurons, this actual
output is compared with Y_train, which contains the expected output. The data structure is
created by (IV).
import numpy as np

X_train = []
Y_train = []
for i in range(1, 5451):    IV
    X_train.append(training_set_scaled[i-1:i, 0])
    Y_train.append(training_set_scaled[i, 0])
X_train, Y_train = np.array(X_train), np.array(Y_train)
In (IV), the two newly introduced variables X_train and Y_train are initialized as
empty lists. Then these two entities are populated with the traffic
observations from our training-set by using a for-loop (where i represents the time t)
ranging from 1 to the last index of our training-set, i.e. 5451. X_train is appended
with observations ranging from time i-1 to time i by using the append() function.
Y_train is similarly appended with the observation at the next time step. Since both X_train and
Y_train are lists, converting them to NumPy arrays is crucial so that they can be
accepted by our BLSTM RNN model. This conversion is implemented using the
np.array() function (where np is the NumPy alias).
X_test = []
Y_test = []
for i in range(1, 4205):    V
    X_test.append(test_set_scaled[i-1:i, 0])
    Y_test.append(test_set_scaled[i, 0])
X_test, Y_test = np.array(X_test), np.array(Y_test)
X_test and Y_test are similar to X_train and Y_train, except that they hold the
test-set observations instead of the training-set observations. The dataStructure() method
takes the normalized 2D array (i.e., the output of the normalizeData() method) as input and
returns the NumPy arrays X_train and Y_train (for the training-set) and X_test and Y_test
(for the test-set).
5.3.2.1.4. reshape Sub-function
To make the data structure compatible with the input format of our RNN, the
reshape() method is used, as shown below in (VI).
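Statement (VI) is not visible in the extracted text; a plausible reconstruction from the parameter values described below (batch size 5450, time-step 5, input dimension 1):

X_train = np.reshape(X_train, (5450, 5, 1))    VI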
In (VI), the np.reshape() method from NumPy is used to implement the
reshaping of X_train. As per the Keras Recurrent Layer documentation (available:
https://keras.io/layers/recurrent/), the input
shape of an RNN should be a 3-D tensor with the following dimensions: batch-size,
time-step, and input-dimension. The np.reshape() method generates a 3D
tensor which is compatible with RNNs. The argument structure of the np.reshape()
method is: np.reshape(name of the array to be reshaped, (batch_size,
time_step, input_dim)). The first argument in (VI) is X_train because X_train is the
array that needs to be reshaped. The batch size is 5450, followed by a time-step of
5; the time-step usually equals the number of columns of the input array. The input
dimension is 1. The reusability feature of object-oriented programming has been
exploited to implement the reshape() method: in order to reshape the training-set into
an RNN-compatible input format, X_train is passed as an argument to this method
along with the other corresponding parameter values, and when X_test is passed as an
argument, the method reshapes X_test into a compatible format fit
for testing the model.
Table 5.5 summarizes the input parameters and the final output of the
preprocess() method. Data pre-processing is implemented through the preprocess()
method. Since we have to perform the pre-processing for both the training-set and the test-set,
we have used the reusability feature of OOP to implement data pre-processing. To
implement the pre-processing of the training-set, the preprocess() method is called
(from inside the build() method) with the training-set as an argument. For
the test-set pre-processing, the same preprocess() method is called again
(from inside the execute() method) with the test-set as an argument. This is why Tables
5.5 and 5.6 are segmented into two parts: blue corresponds to the training-set and green
to the test-set. The respective sub-functions of the preprocess() method, i.e.,
importDataset(), normalizeData(), dataStructure() and reshape(), also work according
to the input parameters of the preprocess() method. For example, when the input is the
training-set, the output of the importDataset() sub-function is training_set, and when
the input is the test-set, the output of the same sub-function is test_set, and so on.
Precisely, the same sub-function yields different outputs in different implementation
steps. Table 5.5 below lists all the sub-functions of the preprocess() method.
Table 5.5: Input and Output of the preprocess() method
5.3.3. Classifier Class
This class implements the architecture of our proposed BLSTM RNN model.
In order to make the model more robust, dropout regularization has been utilized, which is a
mechanism to prevent overfitting. The model architecture is implemented
through several steps. Table 5.7 lists all the required steps and their respective actions.
Table 5.7: List of steps and their corresponding actions for building the RNN
Step 1: Importing the Keras Library
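The code for this step was lost in extraction; a plausible reconstruction (presumably statement (VII)), assuming the Keras classes named earlier in this chapter:

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, Bidirectional    VII

Step 2: Initializing the RNN

classifier = Sequential()    VIII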
In (VIII), the Sequential class from Keras initializes the neural network object
called classifier. Here we introduce a new name, 'classifier'. This classifier is
an object of the Sequential class, which represents a sequence of RNN layers.
Executing this line initializes the RNN. At the implementation level, this identifier
('classifier') represents the RNN model that we are going to build; in other words,
'classifier' is the name of our proposed BLSTM RNN model.
Step 3: Adding the input layer and Dropout regularization
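The first line of statement (IX) appears to have been lost in extraction; a plausible reconstruction based on the description below, with the unit count and input_shape given here as assumptions:

classifier.add(Bidirectional(LSTM(units = 200, return_sequences = True),
                             input_shape = (X_train.shape[1], 1)))    IX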
classifier.add(Dropout(0.2))
In (IX), the add() method (a built-in method of the Sequential class)
generates the input layer. Inside the add() method, the LSTM class is used to add the
LSTM units or neurons. The Bidirectional class is then used as a wrapper class which
wraps the LSTM units and provides a BLSTM unit altogether.
In the last line of (IX), the add() method of the Sequential class is used to
apply dropout regularization to our model with a dropout rate of 0.2.
Step 4: Add hidden layers with corresponding neurons and add Dropout
regularization
# hidden layer 1
classifier.add(Bidirectional(LSTM(units = 220, return_sequences = True)))
classifier.add(Dropout(0.2))
# hidden layer 2
classifier.add(Bidirectional(LSTM(units = 240, return_sequences = True)))    X
classifier.add(Dropout(0.2))
# hidden layer 3
classifier.add(Bidirectional(LSTM(units = 260, return_sequences = True)))
classifier.add(Dropout(0.2))

(See https://keras.io/layers/recurrent/#lstm for the Keras LSTM layer documentation.)
In (X), the 1st, 2nd and 3rd hidden layers comprise 220, 240 and 260
neurons, respectively. The return_sequences parameter and the dropout regularization work the same
way as explained in Step 3.
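Step 5: Adding the output layer

The statement labeled (XI) is not visible in the extracted text; a plausible reconstruction from the description below:

classifier.add(Dense(units = 1, activation = 'sigmoid'))    XI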
In (XI), the add() method from the Sequential class implements the output layer
generation. Since the output layer is fully connected to the previous BLSTM layer,
the Dense class of the Keras library is used to implement the full connectivity.
As the network performs a binary classification, the units parameter is set to 1,
which means there is only 1 neuron in the output layer. The second parameter
(activation = 'sigmoid') implements the output layer activation function.
5.3.4. FitModel Class
This class implements the compilation and the training of the NN. In Keras,
compiling configures the model for training by specifying the optimizer and the loss function.
The compilation of the model is implemented through (XII):
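The statement labeled (XII) is not visible in the extracted text; a plausible reconstruction using the optimizer and loss function listed in Table 5.1 (the metrics argument is an assumption):

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy',
                   metrics = ['accuracy'])    XII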
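The statement labeled (XIII) is likewise reconstructed here from the parameter values described below:

classifier.fit(X_train, Y_train, epochs = 100, batch_size = 132)    XIII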
In (XIII), the RNN is trained with the training set. The fit() method is used to
implement the training procedure: it connects the neural network to the
training-set and performs iterations based on the given parameters. The fit() method takes
the following 4 arguments:
X_train: the input of the training set, containing the features.
Y_train: the ground truth (i.e. the expected output) of the training set.
epochs: the number of epochs, i.e. the number of iterations for which our neural network is
trained. In other words, it is the number of times the whole dataset is propagated
through the model. The implemented model is trained on the same dataset
100 times.
batch_size: the number of samples processed by the
neural network at a time. The batch size is 132, which means that the weights and
biases of our network are updated every 132 data samples.
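5.3.5. Detection Class
The Detection class generates the model's predictions on the test-set. The prediction statement, presumably (XIV), is not visible in the extracted text; a plausible reconstruction, with the 0.5 decision threshold as an assumption:

y_pred = classifier.predict(X_test)    XIV
y_pred = (y_pred > 0.5)    # assumed threshold converting probabilities to attack/normal labels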
Once y_pred is produced, the confusion matrix (cm) and the
classification report (report) are generated through (XV). The
classification report provides the evaluation metrics, including precision, recall and f1-score,
which are discussed further in Section 4.4 of Chapter 4.
from sklearn.metrics import classification_report
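Only the import line of (XV) survived extraction; a plausible completion, using the helper functions and variable names given earlier:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(Y_test, y_pred)                 XV
report = classification_report(Y_test, y_pred)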
5.4. Conclusions
The Keras library has been used to implement the neural network and perform
the simulations. Keras executes on top of Google TensorFlow, which forms the
back-end framework of the implementation. The Pandas library is used to convert the
.csv-format datasets into RNN-compatible data-frames. NumPy is a Python package
used for scientific computing; in order to manipulate arrays of different shapes (such as
2D and 3D arrays), the NumPy package is employed. The inbuilt scientific Python
library sklearn is used to implement the generation of the evaluation parameters,
namely the confusion matrix and the classification report.
Chapter 6
Recall is defined in (6.1):

$$ \text{recall} = \frac{TP}{TP + FN} \qquad (6.1) $$
where TP denotes the count of true positive samples (i.e. instances that are
intrusions and are labeled by the model as intrusions) and FN denotes the number of
false negatives (i.e. instances that are intrusions but are labeled by the model as non-intrusions).
Recall states the model's capability of detecting all the “attack” samples
within the dataset: it gives a sense of how good our model is at detecting “attack”
samples within the dataset. Precision is defined in (6.2):
$$ \text{precision} = \frac{TP}{TP + FP} \qquad (6.2) $$
where FP denotes the false positives (i.e. instances that are non-intrusions but
are labeled by the model as intrusions). Precision states the ability of the model to
identify only the relevant instances: it gives a sense of how likely the model is to be
correct when it labels a sample as positive.
The F1-score, defined in (6.3), is the harmonic mean of precision and recall,
taking both metrics into account:
$$ \text{f1\_score} = \frac{2(\text{recall} \times \text{precision})}{\text{recall} + \text{precision}} \qquad (6.3) $$
Accuracy is defined in (6.4):

$$ \text{Accuracy} = \frac{TP + TN}{X} \qquad (6.4) $$
where X denotes the total number of samples fed as input, TP is the number of true positives,
and TN denotes the number of true negatives (that is, instances that are normal and are labeled by
the model as normal). Accuracy tells how often the model is correct.
The misclassification rate of the model, defined in (6.5), signifies how often the
classifier is wrong. The misclassification rate is the percentage of wrong detections and
can be calculated using the formula in (6.5):
$$ \text{misclassification\_rate} = \frac{FP + FN}{X} \qquad (6.5) $$
The False Positive Rate (FPR), also called the False Alarm Rate (FAR), is calculated by
(6.6):
$$ \text{FPR} = \frac{FP}{X_{Normal}} \qquad (6.6) $$
where X_Normal is the number of actual normal samples in X. The FPR or FAR is the
percentage at which the model incorrectly classifies normal samples as intrusions.
6.2. Performance over Different Hyper-parameters
This section provides the investigation details of the model performance over
different hyper-parameters. As described in Chapter 5, the proposed model consists
of one input layer with 5 neurons, three hidden layers with 220, 240, and 260 neurons
respectively, and one output layer with one neuron. All the layers are densely
interconnected. Sigmoid was used as the activation function. As an initial
experiment, 5450 samples were taken from the UNSW-NB15 training dataset
and the outcomes for 100 iterations were recorded. It is important to note that, in order
to investigate the model performance in relation to different hyper-parameters, only
one hyper-parameter is changed at a time. For instance, while studying the
model performance with respect to time-steps, the rest of the hyper-parameters (batch
size, dropouts, learning rate) remain constant, as sketched below. In the following sections, the model
performance over different hyper-parameters is discussed.
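As a sketch of this one-at-a-time protocol (illustrative, not the exact experiment code; build_model() is a hypothetical helper, and the data preparation for each time-step is omitted):

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, Bidirectional

def build_model(time_step):
    # Architecture, dropout, optimizer and loss stay fixed across trials.
    model = Sequential()
    model.add(Bidirectional(LSTM(units = 220), input_shape = (time_step, 1)))
    model.add(Dropout(0.2))
    model.add(Dense(units = 1, activation = 'sigmoid'))
    model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return model

for ts in [1, 5, 15, 30, 45, 60]:    # only the time-step varies (cf. Table 6.2)
    model = build_model(ts)
    # model.fit(X_ts, Y_ts, epochs = 100, batch_size = 132)   # constant training settings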
6.2.1. Performance over Different Time-Steps
The model performance was studied with respect to varying time-steps. The time-step
refers to the number of steps the RNN is unrolled in time; in other words, the time-step
defines the memory capacity of an RNN cell. Table 6.1 shows the hyper-parameters
which remain constant across the different time-steps. Table 6.2 tabulates the
results for different time-steps ranging from 1 to 60.
Table 6.2 shows that the model is at its best with a time-step value of
60. A second interesting trend is the relatively poor performance of the model at 15,
30 and 45 time-steps. Although there are no major fluctuations in the precision,
recall and f1 values, the accuracy drops sharply. This may have arisen due to
the decreased number of examples presented to the model, causing it to require more than
the allotted number of epochs to converge.
6.2.2. Performance over Different Batch-Size
The time-step value was chosen as 60, as it performed best with respect to all
the evaluation metrics. The batch size dictates the number of samples fed to the network
at a point in time. Determining the optimal batch size requires cross-validation, so we started
with a very large batch size of 1090 (keeping in mind that the total sample size should
be divisible by the batch size; this is a convenient convention, though not mandatory).
Table 6.3 shows the hyper-parameters which remain constant. Table 6.4 tabulates the
results for different batch sizes.
It can be observed that when using a larger batch there is a degradation in the
quality of the model. This is probably because large batch sizes have a tendency to
converge to sharp minima, which leads to degraded generalization [120]. In contrast,
a small batch size shows a promising performance for our model, with 100% f1 and recall
values.
6.2.3. Performance over Different Dropout Rates
Table 6.5 shows the hyper-parameters which remain constant. Table 6.6 tabulates the results for different dropout
rates.
The ideal value for dropout ranges from 0.2 to 0.8, depending on the model
architecture and dataset size [121]. Too large a dropout value may cause the network
to underfit, and too small a dropout value might result in overfitting. Determining the optimal
value, or the “sweet spot”, is a trial-and-error process. Table 6.6 shows that dropout
values of 0.2 and 0.8 function most appropriately, yielding a robust performance.
Table 6.7: Architecture of our model with all the optimum parameter values
6.3. Performance on Reduced Test-set
After training, the training-set becomes known data to the NN model. For
observing the model's performance over an unknown set of data, we fed our model
with a test-set. For initial testing, a reduced test-set with a considerably smaller number of
data samples was prepared. The idea of preparing a reduced test-set was to see the
model's performance over unknown data relatively quickly: if the model
yields unsatisfactory performance during testing, it is easier and faster to retest the
model with the reduced test-set than with the full test-set.
Another purpose of creating a reduced test-set is that the full UNSW-NB15 test-set
comprises 9 types of attacks in total, of which 5 types are often present in IoT
attacks (Analysis, Backdoor, Denial-of-Service, Worms, and Reconnaissance). Hence,
only these 5 types of attack samples along with the 'normal' samples were extracted to
prepare the reduced test-set. The reduced test-set samples were extracted from the
UNSW_NB15_testing-set.csv file and contain 4206 test samples. Table 6.8 shows
the number of anomalies and normal samples used in the test-set. Table 6.9
summarizes the four parameter values in the confusion matrix: TP, TN, FP and FN
(the confusion matrix parameters are discussed in Chapter 4). Table 6.10 shows
the experimental outcomes.
Table 6.9: Confusion matrix values (reduced test-set)

Parameter   Number of Samples
TP          4027
TN          1
FP          10
FN          166

Table 6.10: Experimental outcomes (reduced test-set)

Performance Measure      Value
Accuracy                 0.9571
Precision                0.99
Recall                   1
f1-score                 1
Misclassification rate   0.041
FAR                      0
Detection Time (sec)     2.19
The model is capable of detecting attacks over the reduced test-set with more
than 95% accuracy, i.e. the model successfully classifies more than 95% of the
samples correctly. The model generates a precision value of 99%: whenever
the model labels a sample as “attack”, it is correct 99% of the time. Our model
generates a recall value of 100%, which indicates that the model is capable of detecting
100% of all the attack instances present in the dataset. The model generates a zero
false alarm rate and a very low misclassification rate of 4.1%. The proposed model
was capable of classifying 4205 samples of data in 2.19 seconds on an Intel Core i7
2.4GHz Central Processing Unit (CPU) without a Graphics Processing Unit (GPU),
which is impressively fast.
6.4. Performance on Full UNSW-NB15 Test-set
Table 6.11: Confusion matrix values (full test-set)

Parameter   Number of Samples
TP          79966
TN          21
FP          2102
FN          242

Table 6.12: Experimental outcomes (full test-set)

Performance Measure      Value
Accuracy                 0.9715
Precision                0.99
Recall                   0.97
f1-score                 0.98
Misclassification rate   0.028
FAR                      0
Detection Time (sec)     36.2
From Table 6.12, it can be observed that our proposed model is capable of
detecting attacks in the full UNSW-NB15 test-set with more than 97% accuracy. That
is, the model successfully classifies more than 97% of the attack and normal
samples correctly. Interestingly, as the full test-set comprises 4 new attack types,
the accuracy score shows that our model can also classify completely new attack types.
An impressive precision value of 0.99 shows that whenever our model
classifies a data sample as “attack” or “normal”, the model is 99% correct in its
classification. A satisfactory recall value of 0.97 indicates that the model is capable of
detecting 97% of all the attack instances present in the full UNSW-NB15 test-set,
including the new attack types. The model generates a false alarm rate of 0.02,
indicating that the proposed model very seldom fires a false alarm to the user. The
misclassification rate, or wrong detection rate, signifies how often the model classification
is wrong; a very low rate of 0.028 signifies that our proposed model
exhibits a very negligible detection error. The model exhibits impressive speed and
was capable of classifying 82332 samples of data in only 36.2 seconds on an Intel Core
i7 2.4GHz Central Processing Unit (CPU) without a Graphics Processing Unit (GPU).
Chapter 7
The prime purpose of this research work was to detect intrusions in IoT networks.
To accomplish this objective, an Artificial Neural Network (ANN), specifically a Bi-directional
Long Short-Term Memory Recurrent Neural Network (BLSTM RNN), a
deep learning approach, together with Google's ML framework TensorFlow, was adopted
and utilized through the Python programming language. This research work shows that DL
can effectively address security in IoT networks. The proposed model can detect
five types of attacks that occur in IoT networks, namely: Analysis, Backdoors, DoS,
Reconnaissance and Worms.
IDSs are evaluated by the attained accuracy of intrusion detection, including the
false alarm rate (FAR) of the model. The proposed IoT intrusion detection model
demonstrated over 97% accuracy in effectively identifying attack and normal samples.
The proposed model reported a FAR of 2.5%, a value equivalent to the model's
general misclassification rate. The proposed
model's sensitivity was found to be 100% (shown in Section 6.3), which implies an
impressive 0% false negative rate.
The main contributions of this thesis are as follows. First, it investigates and explains the
efficiency of DL algorithms in addressing intrusion detection in IoT systems.
Secondly, it shows the efficiency of the BLSTM RNN in detecting IoT attacks through
simulation results and identifies the parameter values essential for the BLSTM RNN to
generate high detection accuracy. This research work also contributes an efficient
way of implementing the BLSTM RNN approach using the Python programming language
and the Google TensorFlow implementation framework.
This research work employs one of the most recent benchmark intrusion
datasets, UNSW-NB15, which is a synthetic dataset restricted to only 5 types of
attacks that occur in IoT networks: Backdoor, DoS, Reconnaissance, Analysis and
Worms. In future work, we plan to enrich the IoT attack dataset by adding
more IoT attack types with real IoT network traffic. The data pre-processing stage of
the thesis was done manually, which took many working hours, because the
source dataset was not in an acceptable TensorFlow format. The recommendation for this
drawback is that further work should be done to automate this process. The thesis work
was analyzed only on a CPU. The future recommendation is that the model should be
analyzed on different computing resources such as GPUs, and ported to
different platforms such as iOS, Android, Google Cloud, CUDA, etc. to test the
performance of the proposed BLSTM RNN model. The best results generated by the
algorithm depended on parameters such as batch size, epochs, learning rate and time-steps.
The values of these parameters were set manually at every iteration until the best
results were achieved. For future development we will work on automating the
assignment of values for these parameters. This will ease the guesswork and
trial-and-error approach of finding the values that produce the best detection accuracy.
References
[3] X. Yuan, C. Li and X. Li, “DeepDefense: Identifying DDoS Attack via Deep
Learning,” in 2017 IEEE International Conference on Smart Computing
(SMARTCOMP), Hong Kong, 2017, pp. 1-8
system using machine learning classification.”, in International Conference on
Electrical Engineering and Informatics (ICEEI), 2015.
DOI:10.1109/ICEEI.2015.7352512.
[11] Jung, E., Cho, I., & Kang, S. M., “An Agent Modeling for Overcoming the
Heterogeneity in the IoT with Design Patterns”, in Park, J., Adeli, H., Park, N. and
Woungang, I. (Eds.) Mobile, Ubiquitous, and Intelligent Computing, Vol. 274, pp. 69-74.
[12] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for
network intrusion detection system,” in 9th EAI Int. Conf. Bio-inspired Inf. Commun.
Technol. (BIONETICS), New York, NY, USA, May 2016, pp. 21-26.
[15] X. Yuan, C. Li and X. Li, “DeepDefense: Identifying DDoS Attack via Deep
Learning," in 2017 IEEE International Conference on Smart Computing
(SMARTCOMP), Hong Kong, 2017, pp.1-8.
[17] Sabhnani M., and Serpen G., “Application of machine learning algorithms to
KDD intrusion detection dataset within misuse detection context”, in International
Conference on Machine Learning, Models, Technologies and Applications, pp. 209-
215, 2003.
[18] Ghorbani A., Lu W., and Tavallaee M., 2010, “Network Intrusion Detection
and Prevention: Concepts and Techniques”, Springer Science, LLC.
[19] Kumar, G. Sunil, and C. V. K. Sirisha, “Robust Preprocessing and Random
Forests Technique for Network Probe Anomaly Detection.,” International Journal of
Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, Volume-1, Issue-6,
January 2012. available:
http://www.academia.edu/9521473/Robust_Preprocessing_and_Random
_Forests_Technique_for_Network_Probe_Anomaly_Detection
[20] Bajaj and Arora, “Improving the Intrusion Detection using Discriminative
Machine Learning Approach and Improve the Time Complexity by Data Mining
Feature Selection Methods”, International Journal of Computer Applications (0975-
8887), Volume 76-No.1, August 2013. available:
http://research.ijcaonline.org/volume76/number1/pxc3890587.pdf
[21] Pervez M. S. and Farid D. M., "Feature selection and intrusion classification in
NSL-KDD cup 99 dataset employing SVMs," in The 8th International Conference on
Software, Knowledge, Information Management and Applications (SKIMA 2014),
Dhaka, 2014, pp. 1-6.
[22] Ingre B. and Yadav A., "Performance analysis of NSL-KDD dataset using
ANN," in International Conference on Signal Processing and Communication
Engineering Systems, Guntur, 2015, pp. 92-96.
[23] Moustafa N. and Slay J., 2015, “Unsw-nb15: A comprehensive data set for
network intrusion detection,” in MilCIS-IEEE Stream, Military Communications and
Information Systems Conference, Canberra, Australia, IEEE publication, 2015.
[24] Moustafa N. and Slay J., “The significant features of the UNSW-NB15 and the
KDD99 sets for Network Intrusion Detection Systems”, in the 4th International
Workshop on Building Analysis Datasets and Gathering Experience Returns for
Security (BADGERS 2015), collocated with RAID 2015, 2016. Available:
http://handle.unsw.edu.au/1959.4/unsworks_41254
[27] Z. Dewa and L. A. Maglaras, “Data Mining and Intrusion Detection Systems,”
vol. 7, no. 1, pp. 62–71, 2016
[29] C. Yin, Y. Zhu, J. Fei, and X. He, “A Deep Learning Approach for Intrusion
Detection Using Recurrent Neural Networks,” vol. 5, 2017.
[30] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no.
7553, pp. 436-444, May 2015.
[35] G. Li and Z. Yan, “Data Fusion for Network Intrusion Detection: A Review”,
2018.
[38] T. M. Mitchell, “Machine Learning”, McGraw-Hill, Inc., New York, NY,
USA, 1 edition, 1997.
[40] A. Gulli and S. Pal, Deep Learning with Keras, April 2017. Birmingham: Packt
Publishing Ltd.
[41] K. Panetta. (2016) Gartner’s top 10 strategic technology trends for 2017.
[Online]. Available: http://www.gartner.com/smarterwithgartner/ gartners-top-10-
technology-trends-2017/
[46] H. Lee, “Framework and development of fault detection classification using iot
device and cloud environment,” Journal of Manufacturing Systems, 2017.
[49] A. Candel, V. Parmar, E. LeDell, and A. Arora, “Deep learning with h2o,”
2015.
[53] S. Raschka and V. Mirjalili, Python Machine Learning, 2nd ed. Birmingham,
UK: Packt Publishing, 2017.
[58] X. Song, H. Kanasugi, and R. Shibasaki, “Deeptransport: Prediction and
simulation of human mobility and transportation mode at a citywide level.” IJCAI,
2016.
[62] A. Gensler, J. Henze, B. Sick, and N. Raabe, “Deep learning for solar power
forecasting—an approach using autoencoder and lstm neural networks,” in Systems,
Man, and Cybernetics (SMC), 2016 IEEE International Conference on. IEEE, 2016,
pp. 2858–2865.
[65] Y. Tian and L. Pan, “Predicting short-term traffic flow by long short-term
memory recurrent neural network,” in Smart City/SocialCom/SustainCom
(SmartCity), 2015 IEEE International Conference on. IEEE, 2015, pp. 153–158.
[66] M.-J. Kang and J.-W. Kang, “Intrusion detection system using deep neural
network for in-vehicle network security,” PloS one, vol. 11, no. 6, p. e0155781, 2016.
[68] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, and Y. Ma, “Deepfood: Deep
learning-based food image recognition for computer-aided dietary assessment,” in
International Conference on Smart Homes and Health Telematics. Springer, 2016, pp.
37–48.
[75] K. Kuwata and R. Shibasaki, “Estimating crop yields with deep learning and
remotely sensed data,” in Geoscience and Remote Sensing Symposium (IGARSS), 2015
IEEE International. IEEE, 2015, pp. 858– 861.
[79] H. Wang, N. Wang, and D.-Y. Yeung, “Collaborative deep learning for
recommender systems,” in Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1235–1244.
[83] A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, and B. Vorster,
“Deep learning in the automotive industry: Applications and tools,” in Big Data (Big
Data), 2016 IEEE International Conference on. IEEE, 2016, pp. 3759–3768.
[84] Q. Wang, Y. Guo, L. Yu, and P. Li, “Earthquake prediction based on spatio-
temporal data mining: An lstm network approach,” IEEE Transactions on Emerging
Topics in Computing, 2017.
[87] W. Liu, J. Liu, X. Gu, K. Liu, X. Dai, and H. Ma, “Deep learning based
intelligent basketball arena with energy image,” in International Conference on
Multimedia Modeling. Springer, 2017, pp. 601–613.
[89] K.-C. Wang and R. Zemel, “classifying nba offensive plays using neural
networks,” in Proc. MIT SLOAN Sports Analytics Conf, 2016.
[92] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron,
N. Bouchard, D. Warde-Farley, and Y. Bengio, “Theano: new features and speed
improvements,” arXiv preprint arXiv:1211.5590v1 [cs.SC], 2012.
[94] S. Raschka and V. Mirjalili, Python Machine Learning, 2nd ed. Birmingham,
UK: Packt Publishing, 2017.
[101] Z. Cui, S. Member, R. Ke, S. Member, and Y. Wang, “Deep Stacked
Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide
Traffic Speed Prediction,” 2016, pp. 1–12.
[105] H Ning, “Unit and Ubiquitous Internet of Things”, CRC Press Inc.,2013.
[106] E. Jung et al., “An Agent Modeling for Overcoming the Heterogeneity in the
IoT with Design Patterns.” in Park, J., Adeli, H., Park, N. and Woungang, I. (Eds.)
Mobile, Ubiquitous, and Intelligent Computing, Vol. 274, pp. 69-74, 2014.
[111] M. Weber, “Security challenges of the Internet of Things,” pp. 638–643, 2016.
[113] K. Lackner, “Composing a melody with long-short term memory (LSTM)
Recurrent Neural Networks,” 2016.
[114] Nerney, C. (2012) The tiny (yet powerful) world of speckled computing.
Available: http://www.itworld.com/article/2721483/consumer-tech-science/the-tiny--
yetpowerful--world-of-speckled-computing.html.
[119] M. Abomhara and G. M. Køien, “Cyber Security and the Internet of Things :
Vulnerabilities , Threats , Intruders,” vol. 4, pp. 65–88, 2015.
[122] N. Richárd. (2018, Sep 5). The Big Difference between Artificial and
Biological Neural Networks [Online]. Available: https://towardsdatascience.com/the-
differences-between-artificial-and-biological-neural-networks-a8b46db828b7
[123] A. Jonathan. (2016, Feb 21). What is the unit Step Function in Artificial Neural
Network? [Blog]. Available: https://www.quora.com/What-is-the-unit-step-Function-
in-Artificial-Neural-Network
Appendix - A
Conference
A Deep Learning Approach for Intrusion Detection
in Internet of Things using Bi-Directional Long
Short-Term Memory Recurrent Neural Network
Bipraneel Roy Dr. Hon Cheung
School of Computing, Engineering and Mathematics School of Computing, Engineering and Mathematics
Western Sydney University Western Sydney University
Sydney, Australia Sydney, Australia
B.Roy2@westernsydney.edu.au H.Cheung@westernsydney.edu.au
$$ \overrightarrow{h}_t = H(W_{x\overrightarrow{h}} X_t + W_{\overrightarrow{h}\overrightarrow{h}} \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}) \qquad (1) $$

$$ \overleftarrow{h}_t = H(W_{x\overleftarrow{h}} X_t + W_{\overleftarrow{h}\overleftarrow{h}} \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}) \qquad (2) $$

$$ Y_t = W_{\overrightarrow{h}y} \overrightarrow{h}_t + W_{\overleftarrow{h}y} \overleftarrow{h}_t + b_y \qquad (3) $$

The final output vector, Y_T, is calculated by the equation:

$$ Y_t = \sigma(\overrightarrow{h}_t, \overleftarrow{h}_t) \qquad (4) $$

The function σ combines both output sequences from the neurons in the hidden layers and can be one of four functions: concatenation, summation, averaging and multiplication.

Incorporating BRNNs with LSTM neurons results in a bidirectional LSTM recurrent neural network (BLSTM RNN) [12]. The BLSTM RNN is capable of accessing long-term context data in both the backward and forward directions. The combination of both the forward and backward LSTM layers is considered as a single BLSTM layer. It has been shown that bidirectional models are considerably better than regular unidirectional models in various domains like phoneme classification and speech recognition [13].

D. Intrusion Dataset
Moustafa and Slay (2015) [14] suggested that the NSL-KDD and KDD'99 datasets did not characterize the up-to-date features for intrusion detection, and presented a comprehensive and all-inclusive dataset called UNSW-NB15. This dataset encompassed several features from the KDD'99 dataset [15]. They further analyzed the features of the KDD'99 dataset and the UNSW-NB15 dataset. The results demonstrated that the actual KDD'99 dataset features were less representative compared to the features of the UNSW-NB15 dataset [15].

The UNSW-NB15 dataset contains 45 features [16]. The dataset is further divided into training and testing datasets that contain all the current attack types. T. Janarthanan and S. Zargari [15] performed an extensive study on the UNSW-NB15 dataset for the purpose of extracting the most competent features and thus proposed a subset of features which dramatically increased the intrusion detection efficiency. The UNSW-NB15 dataset is the most recent and effective dataset published for intrusion detection research purposes. In this paper, the dataset subset in the file 'UNSW-NB15_training-set.csv' is used for training the proposed IDS model, while that in 'UNSW_NB15_test-set.csv' is used for testing the model. Both dataset files can be obtained from: https://www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/

The data set file 'UNSW-NB15_training-set.csv' contains 175,341 records for training, while the test set file 'UNSW-NB15_testing-set.csv' contains 82,332 records. The UNSW-NB15 dataset has 9 attack types in total, of which 5 types are often present in IoT attacks (Analysis, Backdoor, Denial-of-Service, Worms, and Reconnaissance) [28][29][30]. Hence, we extracted only these 5 types of attack samples along with the 'normal' samples to prepare our test-set.

In [22], the authors have used the UNSW-NB15 dataset for conducting IoT research because, unlike previous benchmark datasets, UNSW-NB15 exhibits contemporary attack patterns and modern normal traffic patterns. Moreover, since UNSW-NB15 has separate training and testing sets, the data distribution remains different [22]. Again, in [26], the authors point out that: "It encompasses realistic normal traffic behavior and combines it with the synthesized up to date attack instances". [27] also points out that previous benchmark data sets like KDD'99 and NSL-KDD could not meet current network security research needs, as they do not comprehend present-day network security circumstances and the latest attack features.

We choose the UNSW-NB15 data set for our research as it covers modern attack patterns, consists of modern normal traffic patterns, and contains only two classes ('attack' and 'normal'). Since we are performing a binary classification task, this class distribution facilitates our proposed approach. Secondly, UNSW-NB15 forms a comprehensive data set that presents 5 types of IoT attacks. The categories of attack classes are discussed below:

1) Analysis
These types of attacks are targeted at IoT system networks. The attacker first acquires related network information through packet sniffing or port scanning and then launches attacks on the targeted network [28].

2) Backdoor
With the advancement of IoT, several proposed IoT operating systems such as Contiki and RTOS might encompass backdoors, where it is possible to reprogram them for getting access to confidential data stored or transmitted on the IoT networks [29].

3) Denial-of-Service
Over the application layer, an IoT network can be compromised by Denial of Service (DoS) attacks or Distributed Denial of Service (DDoS) attacks, where the service becomes unavailable to authentic users because the system is overwhelmed by requests, resulting in resource and capacity overload [28].

4) Worms
Worms are malicious software that can be executed on the IoT application layer and could harm IoT system devices. For instance, Stuxnet and Mirai have been developed to attack IoT objects [29].

5) Reconnaissance
This is an umbrella term for any illegitimate mapping and discovery of vulnerabilities in systems and services; for example, packet sniffing, port scanning and traffic analysis [30].

E. Data Preprocessing
As an initial experiment, reduced dataset samples are randomly selected from the whole training set and placed in a new .csv Microsoft Excel file titled "UNSW_NB15_training-set_5451.csv". In addition, we only consider the attributes proposed in [15], namely, service, sbytes, sttl, smean, and ct_dst_sport_ltm. The training dataset
is manually manipulated using the approach of Fu et al. [8], where the authors followed the approach of manually adding some abnormal samples to the dataset in order to make the dataset fit for their research purpose. The benefit of using this approach is that the input dataset becomes competent for intrusion detection, which fits the goal of the research. Moreover, the approach helps in dealing with the problem of procuring labeled intrusion detection IoT datasets at a high cost. Here, we have followed the approach of manual manipulation and have extracted the features and attack types manually. Thus, our resulting dataset consists of 5 features and two class labels: "Attack" and "Normal". Table I shows the dataset structure. The first 5 columns represent the extracted features, the 6th column represents the attack category and the last column is the binary labeling: value 0 resembles 'normal' and value 1 'attack'.

F. Evaluation Matrix
The confusion matrix is applied to characterize the accuracy of our proposed BLSTM RNN model during testing. The confusion matrix is a 2-dimensional matrix representing the correlation amongst the detected and actual values, as shown in Fig. 2. True Positive (TP) specifies the count of anomalous or unusual samples that are accurately detected by the system. True Negative (TN) signifies the number of normal samples which are detected as normal by the system. False Positive (FP) represents the count of normal samples which are recognized as anomalies. False Negative (FN) refers to the number of attack samples which have been classified as normal.

False Positive Rate (FPR), calculated by (7), is the percentage at which the system incorrectly classifies normal samples as anomaly:

$$ \text{FPR} = \frac{FP}{X_{Normal}} \qquad (7) $$

where X_Normal is the number of actual normal samples in X.

Other parameters for evaluating the proposed model include precision, recall and f1-score values. Precision is calculated as the ratio of correct positive detections to the total actual positive detections, as shown in (8):

$$ \text{precision} = \frac{TP}{TP + FP} \qquad (8) $$

Recall is the ratio of correct positive detections to the number of actual abnormal samples, as presented in (9):

$$ \text{recall} = \frac{TP}{TP + FN} \qquad (9) $$

In (10), F1-Score denotes the harmonic mean of recall and precision:

$$ \text{f1\_score} = \frac{2(\text{recall} \times \text{precision})}{\text{recall} + \text{precision}} \qquad (10) $$
environments. It employs dataflow graphs and maps the nodes or vertices of the graph across multiple machines incorporating graphics processing units (GPUs), multicore central processing units (CPUs) and Tensor processing units (TPUs). The architectural design provides a receptive and flexible platform for application developers by allowing the developers to research with novel training algorithms and optimizations. TensorFlow supports several applications, with an emphasis on training and inference on deep learning neural networks, and it is being widely used for ML research [23]. Its supple dataflow representation aids power users to accomplish excellent performance.

B. Data Preprocessing phase
Data pre-processing forms the first phase of the implementation stage. In this phase, the whole training dataset is read and stored in the computer memory. After that, feature extraction is employed. Since heterogeneous data types are not supported by Python, the non-numeric data entries (also called categorical data, such as the feature service shown in Table I) are then converted to numeric values. Dependent variables are encoded, followed by data normalization (feature scaling). In order to process the features, we need to create a TensorFlow data structure for storing the features and labels. Since we are employing an RNN, reshaping the data to respective time-steps is required. Reshaping forms the last step of the data pre-processing phase.

C. Training phase
Training is the second phase of the implementation. First we build the BLSTM RNN model by using the Keras library. The model is then compiled, followed by model training. It is here that the UNSW_NB15_training-set_5451.csv (reduced training-set) file is further divided into two subsets: a training set and a validation set, with a split ratio of 0.33, i.e., 67% of UNSW_NB15_training-set_5451.csv is used for training, while 33% is used for validating. The training subset is used by the compiler to train the model, while the validation subset is used for evaluating the model performance after each epoch.

After training the model, we analyse the model's performance and repeat the training after tuning the model's parameters, until satisfactory performance is attained.

D. Testing phase
In this phase of our system, we load the test dataset and feed it into our trained model for the testing purpose. We then record the evaluation matrix for analysing our system.

V. RESULTS AND DISCUSSION
The Keras deep learning framework [17] and the Google TensorFlow library are used to simulate the proposed BLSTM RNN model. In the simulations, the proposed model performs binary classification, where it classifies each input test sample as "normal" or "attack" in the testing phase. The evaluation metrics defined in the previous section, i.e., accuracy, error rate, precision, false positive rate, true positive rate, recall and F1-score, are used to evaluate the model performance in detecting intrusions. In the experiment, the simulated model was trained with a total of 5451 samples. The training samples were deduced from the UNSW_NB15_training-set.csv file. The model was then tested with 4206 test samples. These test samples were extracted from the UNSW_NB15_testing-set.csv file. Table II shows the number of anomalies and normal samples used in the test-set.

Table III summarizes the four performance values in the confusion matrix: TP, TN, FP and FN. Table IV shows the experimental outcomes. The proposed model is able to detect attacks using the reduced UNSW-NB15 dataset with more than 95% accuracy and 100% precision. The model generates a zero false alarm rate and a very low wrong detection rate of 0.04, with impressive recall and f1-score values. The proposed model was capable of classifying 4205 samples of data in 2.19 seconds on an Intel Core i7 2.4GHz Central Processing Unit (CPU) without a Graphics Processing Unit (GPU).

TABLE II: NUMBER OF SAMPLES USED FOR CLASSIFICATION

Class    Sample size
Attack   4094
Normal   112

TABLE III: SIMULATED RESULTS OF THE FOUR PERFORMANCE VALUES IN THE CONFUSION MATRIX

Parameter   Number of Samples
TP          4027
TN          1
FP          10
FN          166

TABLE IV: REPORTED ACCURACY, PRECISION, RECALL AND F1-SCORE OF THE PROPOSED CLASSIFIER, INCLUDING MISCLASSIFICATION RATE AND FAR

Performance Measure      Value
Accuracy                 0.9571
Precision                1
Recall                   0.96
f1-score                 0.98
Misclassification rate   0.041
FAR                      0
Detection Time (sec)     2.19

VI. CONCLUSION AND FUTURE WORK
In this paper, we have presented a new IDS model based on the BLSTM RNN for anomaly intrusion detection. The BLSTM RNN is able to perform deep learning effectively and to learn detailed features from the dataset in the training phase. This ability is important in learning the characteristics of network traffic involved in anomaly intrusions, to distinguish abnormal traffic from normal traffic.

We use the Keras deep learning framework and the Google TensorFlow library to implement the proposed new model. The implemented BLSTM was applied to a reduced subset of the UNSW-NB15 dataset, which has been used in several published works on IDS in IoT networks. The detection was based on binary classification, thus identifying normal and threat patterns. The developed model was able to achieve high accuracy in detecting attack traffic in the used dataset.

For future developments, more experiments will be performed to further analyse the proposed BLSTM RNN model using large data sets from published data sets, especially data sets containing dedicated IoT traffic. In addition, the developed model will be improved to increase
its detection accuracy further and the trade-offs between detection parameters.

VII. REFERENCES
M. U. Farooq, M. Waseem, A. Khairi, and S. Mazhar, "A critical analysis on the security concerns of the internet of things (IoT)", International Journal of Computer Applications, 111, 2015.
Kevin Ashton, "That Internet of things thing". Available: http://www.rfidjournal.com/articles/view?4986.
C. Yin, Y. Zhu, J. Fei, and X. He, "A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks," vol. 5, 2017.
Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, May 2015.
B. A. Tama and K. H. Rhee, "Attack Classification Analysis of IoT Network via Deep Learning Approach," Research Briefs on Information & Communication Technology Evolution (ReBICTE), 2018. DOI: 10.22667/ReBiCTE.2017.11.15.015.
A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," presented at the 9th EAI Int. Conf. Bio-inspired Inf. Commun. Technol. (BIONETICS), New York, NY, USA, May 2016, pp. 21-26.
T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in Proc. Int. Conf. Wireless Netw. Mobile Commun. (WINCOM), Oct. 2016, pp. 258-263.
Fu et al., "An Intrusion Detection Scheme Based on Anomaly Mining in Internet of Things", in IEEE International Conference on Wireless, Mobile & Multimedia Networks (ICWMMN 2011), Beijing, 2011, pp. 315-320. DOI: 10.1049/cp.2011.1014.
Bajaj and Arora, "Improving the Intrusion Detection using Discriminative Machine Learning Approach and Improve the Time Complexity by Data Mining Feature Selection Methods", International Journal of Computer Applications (0975-8887), vol. 76, no. 1, August 2013.
M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.
M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," Signal Processing, IEEE Transactions on, vol. 45, no. 11, pp. 2673-2681, 1997.
A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5, pp. 602-610, 2005.
Z. Cui, R. Ke, and Y. Wang, "Deep Stacked Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction," pp. 1-12, 2018.
K. Peffers, T. Tuunanen, M. Rothenberger, and S. Chatterjee, "A Design Science Research Methodology for Information Systems Research," Journal of Management Information Systems, vol. 24, no. 3, pp. 45-77, 2007.
T. Janarthanan and S. Zargari, "Feature selection in UNSW-NB15 and KDDCUP'99 datasets," 2017 IEEE 26th International Symposium on Industrial Electronics (ISIE), Edinburgh, 2017, pp. 1881-1886. DOI: 10.1109/ISIE.2017.8001537.
A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," presented at the 9th EAI Int. Conf. Bio-inspired Inf. Commun. Technol. (BIONETICS), New York, NY, USA, May 2016, pp. 21-26.
Z. Dewa and L. A. Maglaras, "Data Mining and Intrusion Detection Systems," vol. 7, no. 1, pp. 62-71, 2016.
M. Elkhodr, S. Shahrestani, and H. Cheung, "The Internet of Things: New Interoperability, Management and Security Challenges," vol. 8, no. 2, pp. 85-102, 2016.
M. Elkhodr, S. Shahrestani, and H. Cheung, "The Internet of Things: New Interoperability, Management and Security Challenges," vol. 8, no. 2, pp. 85-102, 2016.
Z. Cui, R. Ke, and Y. Wang, "Deep Stacked Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction," pp. 1-12, 2018.
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. Le Roux, and K. Takeda, "Bidirectional LSTM-HMM hybrid system for polyphonic sound event detection," Mitsubishi Electric Research Laboratories (MERL), 201 Broadway, Cambridge, MA 02139, USA, September 2016, pp. 2-6.
B. A. Tama and K. H. Rhee, "Attack Classification Analysis of IoT Network via Deep Learning Approach," Research Briefs on Information & Communication Technology Evolution (ReBICTE), 2018. DOI: 10.22667/ReBiCTE.2017.11.15.015.
P. Barham et al., "TensorFlow: A system for large-scale machine learning," pp. 265-284.
N. Buduma, TensorFlow for Deep Learning: Implementing Neural Networks. USA: O'Reilly Media, Inc., 2016.
V. Timčenko and S. Gajin, "Machine Learning based Network Anomaly Detection for IoT environments."
G. Li and Z. Yan, "Data Fusion for Network Intrusion Detection: A Review," 2018.
H. A. Abdul-ghani, D. Konstantas, and M. Mahyoub, "A Comprehensive IoT Attacks Survey based on a Building-blocked Reference Model," vol. 9, no. 3, 2018.
M. Abomhara and G. M. Køien, "Cyber Security and the Internet of Things: Vulnerabilities, Threats, Intruders," vol. 4, pp. 65-88, 2015.
106