
Indian Journal of Science and Technology, Vol 11(48), DOI: 10.17485/ijst/2018/v11i48/139802, December 2018
ISSN (Print): 0974-6846
ISSN (Online): 0974-5645

Network Intrusion Detection System Using Machine Learning
Riyazahmed A. Jamadar*
Department of Information Technology, AISSMS Institute of Information Technology /
Savitribai Phule Pune University (SPPU), Sangamvadi, Pune − 411001, Maharashtra, India; riyaz.jamadar@gmail.com

Abstract
Objective: This study proposes a model for building a network intrusion detection system using a machine learning
algorithm called decision tree. The system primarily detects anomaly based intrusions. Methods: In this model, the
categorical features of the CICIDS 2017 dataset are encoded using a label encoder. A subset of the best features is
selected using Recursive-Feature-Elimination (RFE). The data is then divided into training and testing data. The
training data is used to form a Decision-Tree-Model wherein each leaf signifies a possible outcome. Findings:
Classification models are developed using the training data to classify the test data as malicious or benign.
Measuring the accuracy of the classifier on future data rather than on past data is of paramount importance. The
observed accuracy of the classifier on test data is 99%. The results indicate that the True-Positive-Rate (TPR) is
99.9% and the False-Positive-Rate (FPR) is 0.1%. The proposed model uses the latest data set for training and test
data, whereas traditional systems have been modeled using the KDD-CUP-99 data set. Moreover, unlike other systems,
it does not use any data-mining tool such as Weka. This work provides a basis for any new algorithm using the
CICIDS 2017 dataset. Improvements: The work can be extended to exploit the big data available on attacks and
intrusions using big data analytics.

Keywords: Accuracy, Detection, Decision Tree, Intrusion, Machine Learning

*Author for correspondence

1. Introduction

Network security and information security are paramount concerns of a growing economy. At the personal level,
network security usually amounts to installing an antivirus and a firewall on the system. For an organization,
however, handling network security is not that simple: it requires not only up-to-date knowledge of attack types
but also the capability to deal with enormous amounts of data. An Intrusion Detection System (IDS) detects
intrusions and alerts the administrator.

Two types of attacks are possible:

1. Signature based attack, and
2. Profile based attack (Anomaly).

Signature based detection detects all the predefined attacks. The signature files are mapped against the attacks,
and if a match is found the attack type is returned. However, it is not able to check anomaly based intrusions.
Because signature files are available, the false positive rate is low.

Profile based attacks, also termed anomaly based attacks, are those attacks which do not use any predefined path.
The IDS employed to detect such attacks should be flexible enough to handle such unknown scenarios. It has a high
false-positive-rate [1].

The datasets available for training are KDD-CUP-99 and DARPA 98/99, but these datasets have become outdated. This
is a hurdle for researchers aiming to build anomaly based intrusion detection systems. Many recent attacks, such as
SSH attacks, are not covered in these datasets. Thus a dataset is required which is free of redundancy and contains
real-world data. The openly available datasets DARPA98/99 and KDD-CUP-99 have limitations with respect to the
updating of new attacks. CICIDS 2017 is one of the latest real-world datasets and overcomes these limitations. It
primarily includes labeled flows based upon the time stamp, protocols, ports, source IPs and destination IPs, and
the outcomes of the network traffic analysis by CICFlowMeter. Moreover, it provides 84 features with 4 categorical
columns.

2. Literature Survey

The work proposed in [2] analyzes various supervised machine learning classifiers on data sets containing labeled
instances of network traffic features obtained from genuine and malicious applications. Its main focus was to build
a mobile based Network Intrusion Detection System (NIDS). It employs the ISCX Android Botnet Dataset, which contains
1929 samples from botnet families spanning four years. First, the data from the ISCX Android Botnet Dataset is
passed through the genuine and malicious application, after which features are filtered and selected, resulting in
labeled data. The labeled data is divided into test and training data. The training data is used to develop a model
using a machine learning classifier, which in turn is used to evaluate the test data. The system has a high
false-positive-rate. A Random Forest classifier is used to classify the data because of its high true-positive-rate
compared to other machine learning classifiers; however, its false-positive-rate is slightly high. The authors also
used Weka as the data mining tool, which adds overhead.

The work proposed in [3] is a two-tier architecture to detect network level intrusions. The authors used the Weka
data mining tool with the NSL-KDD dataset. First, they processed the data by building an autonomous model on the
training set using hierarchical agglomerative clustering; the data is then classified using KNN classification;
finally, misuse detection and anomaly detection are done using a multilayer perceptron and a reinforcement
algorithm. The use of unsupervised learning (hierarchical clustering) to develop the data warehouse iteratively
makes their system self-autonomous. The use of Weka as a data mining tool increases the overhead. The
false-positive-rate is high.

The work proposed in [4] implements a classification method which combines a machine learning based decision tree
algorithm and a multilayer perceptron. The authors use Artificial Neural Networks (ANNs) primarily to counteract
dataset limitations such as nonlinear, limited and incomplete data. They used KDD-CUP99 as their source dataset.
First, the KDD-CUP99 data is fed to both the decision tree based approach and the multilayer perceptron, which
classify and label the data as attack or benign. This label, along with the data, becomes the new data set, which
is again fed to the trained multilayer perceptron to evaluate the test data. The major shortcoming of this approach
is that it does not account for handling big data.

The work in [5] develops a learning model for a fast learning network based on Particle-Swarm-Optimization, named
PSO-FLN; the KDD-CUP99 dataset was used. Here the data is considered as particles: first weights are assigned to
the particles, and then a fast learning network is built for each particle. For each particle the accuracy is
calculated, and if the fitness value is better than the best local fitness value, the current fitness value is set
as the best local fitness value. The particle position is updated accordingly. The major shortcomings of this
approach are that it does not account for handling big data and that it gives a high false-positive-rate. By
increasing the number of nodes it is possible to get improved accuracy.

The work proposed in [6] uses KDD-CUP99 as its data set. The data is first fed to Principal-Component-Analysis
(PCA), which reduces the higher dimensional dataset to a lower dimensional one. This new dataset is then fed into
various machine learning algorithms such as support vector machine, k-nearest neighbor, decision tree, random
forest, AdaBoost and naïve Bayes. Experimental results are analyzed and compared among the algorithms with regard
to detection rate and detection time, in which the tree algorithm achieved superior results. The authors use the
Weka interface as their machine learning tool, which in turn increases the overhead.

The work proposed in [7] makes use of the NSL-KDD dataset. In this study, the dataset is normalized and discretized
by the k-means method, and features are selected using the information gain algorithm and passed to the naïve Bayes
machine learning algorithm. The authors found that the k-means clustering method provides better results relative
to the discretization technique of mean and standard deviation. The data, after being labeled by the k-means
method, is fed to the information gain step, which uses scoring methods for nominal attributes or weighting of
continuous attributes that are discretized using maximum entropy. The major shortcoming of this approach is that
the k-means method cannot handle nonlinear and incomplete data. The accuracy and false-positive-rate of the system
can be further improved.

3. Dataset CICIDS 2017

A network intrusion detection system requires up-to-date data to train the model so that it works effectively
against unknown intrusions. The openly accessible datasets KDD-CUP99 and DARPA98/99 have limitations with respect
to the updating of new attacks. The CICIDS 2017 dataset contains genuine traffic as well as the most common
attacks, resembling true real-world data (CSVs). Moreover, it has labeled flows based upon the time stamp,
protocols, ports, source IPs and destination IPs, and the outcome of the network traffic analysis by CICFlowMeter.
It also covers complete network traffic, complete capture of the data, attack diversity such as web based, brute
force, DoS, DDoS, infiltration, Bot and Scan, and heterogeneity of the captured data, which are not covered by
earlier datasets. Moreover, it provides 84 features with 4 categorical columns. The 11 criteria important for
developing a reliable dataset are: Complete Network Configuration, Complete Traffic, Labeled Dataset, Complete
Interaction, Complete Capture, Available Protocols, Attack Diversity, Heterogeneity, Feature Set and Metadata.

4. Proposed Architecture

In the proposed architecture, the data is collected from the openly accessible dataset, i.e. CICIDS 2017. The
categorical features of the data are then encoded using a label encoder. The label encoder converts string data
into numerical format, since machine learning algorithms cannot accept string data.

Not all the features of the data are needed for developing the model; therefore, some of the best features are
selected using RFE. A total of 13 features were selected from 83 features. Features such as source ip, destination
ip, flow bytes etc. were selected. This data is then divided into training and testing data. The training data is
then used to build a Decision-Tree-Model. Decision tree is a supervised learning method which needs training data
for building a model; the testing data is then tested against this trained model.

A decision tree uses a tree-like structure where each leaf signifies a possible outcome; in this case each leaf
represents the type of attack or normal behavior (benign). Test data is passed through the trained model to
determine whether it is benign or an attack, and if it resembles an attack the type of attack is returned.

Decision-tree-like algorithms work through recursive partitioning of the training set in order to obtain subsets
that are as pure as possible with respect to a given target class. Each node of the tree is associated with a
particular set of records T that is split by a specific test on a feature. For instance, a split on a continuous
attribute A can be induced by the test A ≤ x. The set of records T is then divided into two subsets that lead to
the left branch of the tree and the right one [8-10]:

T_l = { t ∈ T : t(A) ≤ x }

and

T_r = { t ∈ T : t(A) > x }

Similarly, a categorical feature B can be used to induce splits according to its values. For example, if
B = { b_1, …, b_k }, each branch i can be induced by the test B = b_i.

The divide step of the recursive algorithm to induce a decision tree takes into account all possible splits for
each feature and tries to find the best one according to a selected quality measure.

Figure 1 shows that each leaf represents the type of attack or normal behavior (benign), with test data passed
through the trained model as described above.

5. Experimental Results

For classification problems, the TPR (success rate of detecting malicious activity) and the FPR are two important
factors. Classification models are developed using the training data to classify the test data as malicious or
benign. Therefore, it is important to measure the accuracy of the classifier on future data rather than on past
data. The observed accuracy of the classifier on test data is 99%.
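The workflow described in Section 4 (label encoding, RFE, train/test split, decision tree) and the held-out
accuracy measurement discussed above can be sketched with scikit-learn as follows. This is a minimal sketch on
synthetic data, not the author's code: the use of scikit-learn, the column layout, the labeling rule, the 70/30
split, the choice of 3 selected features, and fitting RFE on the training portion only are all assumptions made for
illustration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_flows = 2000

# Hypothetical stand-in for flow records: one categorical column and five
# numeric columns. These are NOT the real CICIDS 2017 columns.
protocol = rng.choice(np.array(["tcp", "udp", "icmp"]), size=n_flows)
numeric = rng.normal(size=(n_flows, 5))

# Synthetic labeling rule (assumption): large values of numeric column 0
# mark a flow as an attack.
labels = np.where(numeric[:, 0] > 0.5, "attack", "benign")

# Step 1: label-encode string data into integers.
protocol_codes = LabelEncoder().fit_transform(protocol)
X = np.column_stack([protocol_codes, numeric])  # X column 1 = numeric column 0
y = LabelEncoder().fit_transform(labels)        # attack -> 0, benign -> 1

# Step 2: hold out test data so accuracy is measured on unseen records.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 3: recursive feature elimination, ranking features by the
# importances of a decision tree and dropping the weakest one per round.
selector = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=3)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 4: train the decision tree on the selected features and evaluate.
clf = DecisionTreeClassifier(random_state=0).fit(X_train_sel, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test_sel))

# The root node applies a test of the form A <= x; the training records
# that satisfy it form T_l and the remaining records form T_r.
root_feature = clf.tree_.feature[0]
root_threshold = clf.tree_.threshold[0]
left_count = int(np.sum(X_train_sel[:, root_feature] <= root_threshold))
right_count = int(np.sum(X_train_sel[:, root_feature] > root_threshold))

print("selected feature mask:", selector.support_)
print("test accuracy:", test_accuracy)
print("root test: feature", root_feature, "<=", root_threshold)
print("|T_l| =", left_count, " |T_r| =", right_count)
```

Fitting the selector on the training portion keeps the test records out of feature selection; the text above
describes selecting features before the split, so this ordering is a design choice of the sketch rather than a
description of the reported experiment.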


Figure 1. Working of the proposed Network Intrusion Detection system (NIDS).
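Per-class precision, recall and f1-score of the kind reported in Table 1, together with overall accuracy, TPR and
FPR, can be derived from a confusion matrix such as Table 2. The sketch below uses the counts printed in Table 2 and
rests on two assumptions that the paper does not state: that columns correspond to true classes and rows to
predicted classes, and that class 0.0 is benign traffic while classes 1.0 to 11.0 are attack types.

```python
import numpy as np

# Counts as printed in Table 2 (12 classes, labeled 0.0 to 11.0 in Table 1).
table2 = np.array([
    [265563, 2, 0, 3, 0, 0, 7, 0, 0, 0, 0, 0],
    [2, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 2479, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 192, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 5938, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0],
    [5, 0, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 7624, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 260, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 740, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 323],
])

# Assumption: rows = predicted class, columns = true class.
correct = np.diag(table2)
true_totals = table2.sum(axis=0)       # samples per true class
predicted_totals = table2.sum(axis=1)  # samples per predicted class

precision = correct / predicted_totals
recall = correct / true_totals
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
accuracy = correct.sum() / table2.sum()

# Assumption: class 0 is benign, classes 1..11 are attacks. Collapse to a
# binary attack-vs-benign view to obtain TPR and FPR.
attack_detected = table2[1:, 1:].sum()  # true attack, predicted attack
attack_missed = table2[0, 1:].sum()     # true attack, predicted benign
false_alarms = table2[1:, 0].sum()      # true benign, predicted attack
benign_correct = table2[0, 0]           # true benign, predicted benign

tpr = attack_detected / (attack_detected + attack_missed)
fpr = false_alarms / (false_alarms + benign_correct)

for cls in range(table2.shape[0]):
    print(f"{cls:>2}  precision={precision[cls]:.2f}  "
          f"recall={recall[cls]:.2f}  f1={f1[cls]:.2f}")
print(f"accuracy={accuracy:.4f}  TPR={tpr:.4f}  FPR={fpr:.6f}")
```

Under the assumed orientation the per-class values round to those in Table 1; for class 6.0, for instance, precision
is 11/16 ≈ 0.69 and recall is 11/18 ≈ 0.61, whereas the opposite orientation would swap the two.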

The CICIDS 2017 dataset provides 84 features, of which 4 are categorical.

Table 1 shows the classification report. The f1-score is the harmonic mean of precision and recall. The scores
corresponding to every class represent the accuracy of the classifier in classifying the data points of that
particular class compared to all other classes. The support indicates the number of samples of the true response
that lie in that class. The training and test models have been developed using Python libraries on a system with a
Core i3 7th Gen processor, using the SQLite3 database.

Table 1. Classification report

Attribute        Precision  Recall  f1-score  Support
0.0              1.00       1.00    1.00      265572
1.0              0.95       0.95    0.95      41
2.0              1.00       1.00    1.00      2479
3.0              1.00       0.98    0.99      195
4.0              1.00       1.00    1.00      5938
5.0              1.00       1.00    1.00      4
6.0              0.69       0.61    0.65      18
7.0              1.00       1.00    1.00      7625
8.0              1.00       1.00    1.00      260
9.0              1.00       1.00    1.00      740
10.0             1.00       0.86    0.92      14
11.0             1.00       1.00    1.00      323
Average / total  1.00       1.00    1.00      283209

Table 2 shows the confusion matrix developed by analyzing the test data. This matrix helps to calculate the
accuracy of the proposed system. The results indicate that the TPR is 99.9% and the FPR is 0.1%.

Table 2. Confusion matrix

265563      2      0      3      0      0      7      0      0      0      0      0
     2     39      0      0      0      0      0      0      0      0      0      0
     0      0   2479      0      0      0      0      0      0      0      0      0
     0      0      0    192      0      0      0      0      0      0      0      0
     2      0      0      0   5938      0      0      0      0      0      0      0
     0      0      0      0      0      4      0      0      0      0      0      0
     5      0      0      0      0      0     11      0      0      0      0      0
     0      0      0      0      0      0      0   7624      0      0      2      0
     0      0      0      0      0      0      0      0    260      0      0      0
     0      0      0      0      0      0      0      0      0    740      0      0
     0      0      0      0      0      0      0      0      0      0     12      0
     0      0      0      0      0      0      0      0      0      0      0    323

6. Conclusion and Future Work

Previously, the KDD-Cup99 dataset was considered the benchmark dataset for intrusion detection, but nowadays the
network and the attack methods have changed drastically, so the CICIDS 2017 dataset is used. Thus, it can be used
to detect attacks based on the current network scenario.

The approach based on Decision Tree is presented and discussed to develop an efficient intrusion detection model.
The experimental results demonstrate that the proposed approach can be used to develop an
Intrusion-Detection-Model having high detection rate, high accuracy (99.9%) and low False-Positive-Rate.

Future work would be collecting real time packets from the network and testing them against the already classified
training dataset. Based on the results achieved, this work can be extended to host based IDS or to analysis at the
application level.

7. Acknowledgement

The Author would like to thank the UNB Canadian Institute of Cyber security for providing the CICIDS 2017 dataset
for this work.

8. References

1. Jamadar RA, Himani Gupta, Ankit Baghel & Rituraj. Study and analysis of hadoop based Network Intrusion Detection
   System, International Journal of Engineering and Science Invention. Dec 2017; 6(12):1-4.
   http://www.ijesi.org/papers/Vol(6)12/Version-4/A0612040104.pdf.
2. Kumar S. Viinikainen A. & Hamalainen T. Machine learning classification model for network based intrusion
   detection system, 2016 11th International Conference for Internet Technology and Secured Transactions (ICITST),
   5-7 Dec. 2016. IEEE, Barcelona, Spain; Dec 2016. p. 242-49.
   https://ieeexplore.ieee.org/document/7856705/citations#citations.
3. Divyatmika & Manasa Sreekesh. A Two-tier Network based Intrusion Detection System Architecture using Machine
   Learning Approach. 2016 International Conference on Electrical, Electronics, and Optimization Techniques
   (ICEEOT), 3-5 March 2016. IEEE, Chennai, India; Mar. 2016. p. 42-47.
   https://ieeexplore.ieee.org/document/7755404.
4. Jamal Esmaily, Reza Moradinezhad & Jamal Ghasemi. Intrusion Detection System Based on Multi-Layer Perceptron
   Neural Networks and Decision Tree. 2015 7th Conference on Information and Knowledge Technology (IKT),
   05 October 2015. IEEE, Urmia: Iran; 2015. https://doi.org/10.1109/IKT.2015.7288736.


5. Ali MH, Bahaa Abbas Dawood Al Mohammed, Alyani Ismail & Mohamad Fadli Zolkipli. A new intrusion detection system
   based on Fast learning network and swarm optimization. IEEE Access, 2018; 6:20255-61.
   https://doi.org/10.1109/ACCESS.2018.2820092.
6. Chabathula KJ, Jaidhar CD & Ajay Kumara MA. Comparative study of principal component analysis based intrusion
   detection approach using machine learning Algorithms. 3rd International Conference on Signal Processing,
   Communication and Networking (ICSCN); 2015. p. 1-6. https://doi.org/10.1109/ICSCN.2015.7219853.
7. Effendy DA, Kusrini Kusrini & Sudarmawan Sudarmawan. Classification of intrusion Detection System (IDS) based
   on computer network. 2017 2nd International conferences on Information Technology, Information Systems and
   Electrical Engineering (ICITISEE). IEEE, Yogyakarta: Indonesia; 2017. p. 90-94.
   https://ieeexplore.ieee.org/document/8285566.
8. Mathematics Behind Classification and Regression. Date accessed: 26.11.2012.
   https://stats.stackexchange.com/questions/44382/mathematics-behind-classification-and-regression-trees.
9. Sharafaldin I, Arash Habibi Lashkari & Ali A. Ghorbani. Toward Generating a New Intrusion Detection Dataset and
   Intrusion Traffic Characterization. 4th International Conference on Information Systems Security and Privacy
   (ICISSP); Jan 2018. p. 108-16. http://www.scitepress.org/Papers/2018/66398/66398.pdf.
10. Gharib A, Iman Sharafaldin, Arash Habibi Lashkari & Ali A. Ghor. An evaluation framework for intrusion
    detection data set. International Conference on Information Science and Security (ICISS), 19-22 Dec. 2016.
    IEEE, Pattaya: Thailand; 2016. p. 1-6. https://doi.org/10.1109/ICISSEC.2016.7885840. PMid: 27047644,
    PMCid: PMC4818783.

Vol 11 (48) | December 2018 | www.indjst.org | Indian Journal of Science and Technology
