Abstract—This survey presents a comprehensive overview of Machine Learning (ML) methods for cybersecurity intrusion detection systems, with a specific focus on recent approaches based on Deep Learning (DL). The review analyzes recent methods with respect to their intrusion detection mechanisms, performance results, and limitations, as well as whether they use benchmark databases to ensure a fair evaluation. In addition, a detailed investigation of benchmark datasets for cybersecurity is presented. This paper is intended to provide a road map for readers who would like to understand the potential of DL methods for cybersecurity and intrusion detection systems, along with a detailed analysis of the benchmark datasets used in the literature to train DL models.

Index Terms—Cybersecurity, IDS, Deep Learning.

I. INTRODUCTION

Different IDSs can employ diverse algorithms for detecting attacks. These algorithms can be classified into three categories [10]: i) rule-based algorithms, which use prior knowledge of attacks, such as the corresponding data distributions, to create a rule system and perform detection; ii) statistics-based algorithms, which detect anomalies by building a statistical distribution of intrusion patterns; and iii) Machine Learning (ML)-based approaches, in which learning algorithms are adopted to train classifiers that can distinguish among different types of attacks.

Rule-based methods, while simple and fast to execute, cannot compensate for incomplete or noisy data and are difficult to update. To overcome these problems, statistics-based approaches have been proposed to enable the processing of imprecise information; however, such methods entail a

The survey presented in [16], in addition to describing the various ML-based methods of network intrusion detection, focuses on the characteristics of the types of intrusion. Therefore, this review presents how available statistical features can be used and modified for distributed attack detection and the importance of the threshold used to process these types of features.

In contrast to [11], [16], the survey presented in [17] focuses on the use of ML and Data Mining (DM) concepts in IDSs. This review includes a clear explanation of ML and DM algorithms introduced in highly cited papers published before 2016, as well as their usage in IDSs. Notably, this review does not include the newest DL methods, such as Convolutional Neural Networks (CNNs); the newest datasets, such as AWID2018 and CICIDS2017; or practical details such as attack frequency and sample size for the benchmark datasets. Nevertheless, this review does consider fuzzy logic, neural networks, genetic algorithms, and rule-based algorithms.

Similar to the survey presented in [17], the work reported in [18] provides a review of ML methods for IDSs, associating different types of attacks with the features that can be used to detect them. In particular, the associated features can provide insight into how similar features of different types of intrusion can support similar approaches to attack detection. For example, the duration and service features from the KDD99 dataset are the most highly contributing features for detecting both User-to-Root (U2R) and Remote-to-Local (R2L) attacks, often causing these two attack types to be misclassified as one another. Although this paper fails to investigate the newest DL algorithms and attack types and their most related features, it provides an extensive survey of feature selection methods.

The review published in [19] surveys ML-based intrusion detection methods alongside newer DL-based methods. Although this survey focuses on certain specific ML and DL methods, such as Deep Belief Networks (DBNs) and Recurrent Neural Networks (RNNs), as well as known benchmark datasets, it does not cover other DL algorithms, such as CNNs, or benchmark datasets such as CICIDS2017. The reviews presented in [14], [20], [21] also consider DL-based methods. However, they focus on only a subset of these methods, do not discuss benchmark datasets, or do not provide detailed descriptions of the accuracies achieved using DL methods.

In contrast to the previously mentioned surveys, the work presented in [22] focuses on the different types of attacks rather than algorithms for IDSs, without providing details on accuracy. Furthermore, this paper presents an attack taxonomy to provide detailed definitions of various attack types, including how and in which layers they occur. Attack tools are also explained in great detail for readers who wish to build IDSs for protection against specific attack types. Although this paper does not provide detailed information about new benchmark datasets or DL algorithms, a brief review on industrial IDSs, such as programmable logic controller systems, is presented. Similarly, the review published in [23] addresses only application-layer Distributed DoS (DDoS) attacks, describing how they are hidden behind low traffic and the features used to detect DDoS attacks occurring in the application layer. Furthermore, this review discusses defense mechanisms for protecting against these attacks, such as user puzzles; the limitations of attempts to detect these attacks; and attack generation scenarios.

Finally, there are several surveys that address specific aspects or applications of IDSs. For example, the work reported in [24] focuses on IDSs for IoT systems, describing their taxonomy and placement strategies. In a similar manner, the review presented in [25] discusses DM concepts with IoT applications. Another example is the survey in [26], which covers only unsupervised methods used in IDSs. Although this review is limited to unsupervised methods, it is a good reference for learning about a variety of feature selection methods. Additionally, datasets and EU standards (e.g., the General Data Protection Regulation – GDPR) for data collection and protection are addressed in this review. Other reviews considering specific aspects of this field include the work described in [27], which focuses on hardware techniques for IDS implementation; the paper presented in [28], which considers only immunity-based approaches; and the survey published in [29], which describes network security techniques for supervisory control and data acquisition systems.

B. Contributions

This work is intended to serve as an extensive survey of databases and methods based on ML and DL that have been introduced thus far in the literature on cybersecurity and intrusion detection. This survey focuses on papers published after 2013, with some exceptions being trendsetter algorithms or highly cited papers.

Compared to the other surveys on intrusion detection discussed in Section I-A, this survey makes three main contributions: i) it summarizes previous surveys with regard to their level of detail in describing methods for cybersecurity, with the purpose of encouraging further reading based on the readers' interests; ii) it focuses on a practical perspective when describing the relevant datasets, specifically addressing the number of features, the feature types, and attack distributions rather than describing general details, feature selection methods, and algorithms, which are analyzed in other surveys; and iii) it presents a comprehensive investigation of the newest DL methods for intrusion detection, analyzing their detection capability, performance, and limitations as well as the databases used. This review does not consider previous types of ML methods since they have been thoroughly addressed in other survey papers [11], [17], [18].

The remainder of this paper is organized as follows. Section II presents a review of cybersecurity datasets, including the data collection steps, feature and attack types, benchmark datasets, and reliability criteria. Section III reviews and analyzes DL-based intrusion detection methods, considering DBNs, Autoencoders (AEs), CNNs, Long Short-Term Memory (LSTM) networks, and Generative Adversarial Networks (GANs). Section IV provides a discussion of and insights into the limitations and current research trends regarding public datasets and IDSs. Finally, Section V concludes this work. Table I summarizes the acronyms and notations used in this paper.
TABLE I
LIST OF ACRONYMS AND NOTATIONS USED IN THIS PAPER

Notation | Description
ML | Machine Learning
DL | Deep Learning
DM | Data Mining
IDS | Intrusion Detection System
IoT | Internet of Things
IP | Internet Protocol
TCP | Transmission Control Protocol
UDP | User Datagram Protocol
GDPR | General Data Protection Regulation
PCAP | Packet CAPture
SSH | Secure Shell
FTP | File Transfer Protocol
SQL | Structured Query Language
SYN | TCP packet used to request a connection
DoS | Denial of Service
DDoS | Distributed Denial of Service
U2R | User-to-Root
R2L | Remote-to-Local
XSS | Cross-Site Scripting
k-NN | k-Nearest Neighbor
ANN | Artificial Neural Network
SVM | Support Vector Machine
RBM | Restricted Boltzmann Machine
DBN | Deep Belief Network
AE | Autoencoder
CNN | Convolutional Neural Network
RNN | Recurrent Neural Network
LSTM | Long Short-Term Memory
GAN | Generative Adversarial Network
PCA | Principal Component Analysis

TABLE II
PROGRAMS USED TO CAPTURE AND PREPROCESS NETWORK TRAFFIC

Method | Step | Program (Ref.)
PCAP | Capture | libPCAP [32], winPCAP [33], SNORT [34]
PCAP | Preprocessing | Wireshark [35], tshark [36], tcpdump [37], networkminer [38], rapidminer [39], scapy [40]
NetFlow | Capture/Preprocessing | Cisco NetFlow [41], nfdump [42]

II. CYBERSECURITY DATASETS

This section presents a review of cybersecurity datasets, outlining the data collection steps, feature and attack types, available benchmark databases, and reliability criteria.

A. Data Collection

This section presents the methods of data collection used in cybersecurity applications. Specifically, data collection can be performed in two different ways. The first is based on processing system calls (system logs) from host-based operating systems. The second is based on packet headers and payloads extracted from network traffic packages and from applications using the Transmission Control Protocol (TCP)/IP communication stack [30].

The two main methodologies used to collect network traffic in the second way are full Packet CAPture (PCAP) and the NetFlow protocol:

- PCAP enables the collection of the most detailed data from a network because it involves the extraction of whole network packets (including packet headers) for all information being transmitted. In particular, the data collected from such packets include the packet size, protocol types, headers of flows, flags, source and destination IP addresses, and source and destination port numbers [31]. However, the information contained in the payload of a packet may be deleted or anonymized due to privacy issues. In fact, a packet payload may contain sensitive data such as private information, instant messaging conversations, or a history of visited websites. In most cases, a trade-off must be established between anonymizing payloads to protect user privacy and using all collected data to achieve accurate attack detection. This trade-off is especially important to consider in the case of nonflooding attack types such as R2L and U2R attacks, which are performed using packet payloads.

- NetFlow enables the collection of summary information or certain predefined attributes related to the flow of packets in a network. Examples of the features that can be extracted include the number of packets in a given time period or the size of data transmitted over the network. Although data collection via NetFlow is more memory efficient than data collection via PCAP, only summary data are considered, and it is not possible to extract new types of features to address new needs.

The most commonly used programs for performing PCAP are libPCAP, winPCAP and SNORT. In addition, several programs allow the preprocessing of PCAP files to extract different types of features. For example, such preprocessing programs include Wireshark, tshark, tcpdump, networkminer, rapidminer and scapy. The most commonly used programs for capturing and preprocessing NetFlow data are Cisco NetFlow and nfdump. Table II summarizes the various programs used to capture and preprocess network traffic using the PCAP and NetFlow methodologies.
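To illustrate the preprocessing step, the following minimal Python sketch uses scapy [40], one of the programs listed in Table II, to read a capture file and extract a few of the header-based attributes discussed above; the file name traffic.pcap and the chosen attributes are illustrative assumptions, not part of a specific dataset pipeline.

    # Minimal sketch: extracting basic header features from a PCAP file with scapy.
    # Assumption: a capture file named "traffic.pcap" exists; only TCP/IP headers
    # are read, so anonymized or stripped payloads do not affect these features.
    from scapy.all import rdpcap, IP, TCP

    packets = rdpcap("traffic.pcap")  # load all packets into memory

    records = []
    for pkt in packets:
        if IP in pkt and TCP in pkt:
            records.append({
                "src_ip": pkt[IP].src,         # source IP address
                "dst_ip": pkt[IP].dst,         # destination IP address
                "src_port": pkt[TCP].sport,    # source port
                "dst_port": pkt[TCP].dport,    # destination port
                "size": len(pkt),              # total packet size in bytes
                "flags": str(pkt[TCP].flags),  # TCP flags (e.g., "S" for SYN)
                "time": float(pkt.time),       # capture timestamp
            })

    print(f"Extracted {len(records)} TCP/IP header records")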
B. Feature Types

This section examines the types of features extracted from the available datasets. Although new features are added when novel attack patterns are discovered, there are several recurring feature types in the literature.

First, a distinction can be drawn between host-based and network-based data based on the procedure used to collect the data, as described in Section II-A. In most cases, host-based data are composed of system/operation logs, which consist of attributes such as system calls. Feature extraction from system calls is generally performed using methods based on natural language processing, such as n-grams [43].

On the other hand, network-based data are obtained by collecting network traffic data. However, network traffic is composed of many individual packets/frames, and feature extraction must be performed for each traffic session, known as flow-level traffic data, to reduce the dimensionality of the data and detect intrusions. Such feature extraction is conducted based on three different types of features: basic, traffic-based, and content-based features.

- Basic features are extracted from TCP/IP connections and can be classified as header-based, flow-based, connection-based, or packet-based features. Header-based features are related to the packet header and include the source and destination IP addresses, the TCP and User Datagram
Protocol (UDP) source and destination ports, the IP protocol, the service, and the IP header length. Flow-based features include attributes computed through analysis of the flow. In particular, a flow is defined as a set of packets having a common set of properties (flow keys), which may include IP addresses, port numbers, or meta-information [44]. Examples of flow-based features are statistical aggregations (e.g., average, maximum, minimum) on the size, time of arrival, and number of inbound/outbound packets in a given time period, the duration of that period, and the type of packets; a sketch of such aggregations is given after this list. Connection-based features are related to a particular connection, which is defined as a stream of packets between two specific IP addresses. Such features include the interval between packets, the timestamp, and the time to live. Finally, packet-based features are related to the transmitted data and include the payload and mean number of bytes of a packet. The main advantage of basic features is that they are general and can be used to detect several kinds of attacks [45], [46].

- Traffic-based features are associated with either a specific time interval (e.g., 2 seconds) or a specific number of connections (e.g., 100 connections). These features can be extracted by considering either the same host or the same service. In the first case, the extracted features include statistical sums of connections with the same destination host, whereas in the second case, the extracted features comprise statistical sums of connections to the same service for a fixed amount of time or number of connections [45]. One drawback of traffic-based features is that some attack types span time intervals longer than 2 seconds or a number of connections greater than 100. Examples of such attack types include low-frequency attack types such as U2R, R2L, and low-rate DoS attacks, in which the frequency of the transmitted information is similar to that of legitimate traffic, in contrast to high-frequency attack types, which exhibit a higher frequency than normal traffic. Although some newly proposed connection-based features span time intervals longer than 2 seconds, these features are not fully adequate for identifying such attack patterns [45].

- Content-based features are extracted from information embedded in different data portions of packets and include the number of requests, the request type, and the number of failed login attempts. Content-based features are especially useful for detecting low-frequency attack types, which do not exhibit sequential patterns as high-frequency attacks do. In fact, while traffic-based features can be used to detect high-frequency attacks, low-frequency attacks are difficult to detect using only basic and traffic-based features, and in most cases, content-based features are also required [45].
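The flow-level aggregation described above can be sketched as follows, grouping the per-packet records of the previous example into flows via their common keys and computing illustrative statistics, including a simple traffic-based count over 2-second windows; the exact aggregations are assumptions for illustration.

    # Sketch: aggregating per-packet records into flow-based features with pandas.
    # Assumes the "records" list from the previous sketch (one dict per packet).
    import pandas as pd

    df = pd.DataFrame(records)

    # Flow keys: the classic tuple of addresses and ports (protocol omitted,
    # since only TCP packets were kept above).
    flow_keys = ["src_ip", "dst_ip", "src_port", "dst_port"]

    flows = df.groupby(flow_keys).agg(
        n_packets=("size", "count"),     # number of packets in the flow
        total_bytes=("size", "sum"),     # total bytes transmitted
        mean_size=("size", "mean"),      # average packet size
        max_size=("size", "max"),        # maximum packet size
        duration=("time", lambda t: t.max() - t.min()),  # flow duration
    ).reset_index()

    # Traffic-based feature example: connections per source IP in 2-second windows.
    df["window"] = (df["time"] // 2).astype(int)
    conn_rate = df.groupby(["src_ip", "window"]).size().rename("conn_per_2s")

    print(flows.head())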
C. Attack Types

This section outlines the various attack types considered in IDSs. In particular, we present the following attack types, since they are the ones considered in the most frequently used benchmark datasets [48]:

- Denial of Service (DoS) [45] attacks are based on temporarily blocking the normal use of network utilities by flooding the network with traffic. Examples of DoS attacks include botnet, Slowloris, smurf, and SYN flood attacks; a naive indicator for SYN flooding is sketched after this list.

- Distributed DoS (DDoS) [46] attacks are based on flooding the server and making it unable to respond by overloading it with service requests. Unlike in DoS attacks, the flooding is performed via many sources. Examples of DDoS attacks include local area network denial (LAND), ping-of-death, RUDY, and teardrop attacks.

- User-to-Root (U2R) [45] attacks involve behaving as a normal user with the aim of detecting system vulnerabilities and gaining root access. Examples of U2R attacks include buffer overflow, rootkit, Perl, and loadmodule attacks.

- Remote-to-Local (R2L) [45] attacks attempt to use a remote system to gain unauthorized access to and damage the target system. R2L attacks may be combined with U2R attacks, making these types of attacks difficult to differentiate. Examples of R2L attacks include Secure Shell (SSH) brute force, warezmaster, multihop, imap, and spy attacks.

- Probe [45] attacks are based on searching for vulnerabilities throughout the whole network by sending scan packets and gaining information about the system. Examples of probe attacks include Satan, IP sweep, and port sweep attacks.

- Password [18] attacks attempt to gain unauthorized access to the system by using guessing techniques to steal passwords. Examples of password attacks include brute force FTP-Patator and brute force SSH-Patator attacks.

- Injection [47] attacks use scripts that inject commands/queries with the purpose of gaining unauthorized access and stealing information. Examples of injection attacks include SQL injection and Cross-Site Scripting (XSS).
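As referenced in the DoS item above, the following sketch implements a naive indicator for one DoS symptom, SYN flooding: sources that send many TCP connection requests while completing disproportionately few handshakes are flagged. The thresholds and the reuse of the per-packet records from Section II-A are illustrative assumptions, not a production detector.

    # Naive SYN-flood indicator: sources that send many SYNs but for which few
    # ACKs are observed are flagged. Thresholds are illustrative only.
    from collections import Counter

    syn_counts = Counter()
    ack_counts = Counter()

    for r in records:                      # per-packet records from Section II-A
        if "S" in r["flags"] and "A" not in r["flags"]:
            syn_counts[r["src_ip"]] += 1   # pure SYN: connection request
        elif "A" in r["flags"]:
            ack_counts[r["src_ip"]] += 1   # ACK seen from the same source

    THRESHOLD = 100  # illustrative; a real IDS would tune this per time window
    for src, n_syn in syn_counts.items():
        if n_syn > THRESHOLD and n_syn > 3 * ack_counts[src]:
            print(f"possible SYN flood from {src}: {n_syn} SYNs")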
TABLE III
ATTACK TYPES REPRESENTED IN THE MOST FREQUENTLY USED CYBERSECURITY BENCHMARK DATASETS

Attack name | Examples | Description
Denial of Service (DoS) [45] | Botnet, Slowloris, smurf, SYN flood | Temporarily blocks the normal use of network utilities by flooding the network with traffic.
Distributed DoS (DDoS) [46] | LAND, ping of death, RUDY, teardrop | Floods the server and makes it nonresponsive to users by overloading it with service requests. Unlike in DoS attacks, the flooding originates from many sources.
User-to-Root (U2R) [45] | Buffer overflow, rootkit, Perl, loadmodule | Behaves as a normal user with the aim of detecting system vulnerabilities and gaining root access.
Remote-to-Local (R2L) [45] | SSH brute force, warezmaster, multihop, imap, spy | Gains local access via a remote system and damages the system. May be combined with U2R attacks, thus making these attacks difficult to differentiate.
Probe [45] | Satan, IP sweep, port sweep | Searches for vulnerabilities throughout the whole network via IP addresses by sending scan packets and gaining information about the system.
Password [18] | Brute force FTP-Patator, brute force SSH-Patator | Gains access to the system after stealing passwords by guessing.
Injection [47] | SQL injection, Cross-Site Scripting (XSS) | Uses a script to inject commands/queries to gain unauthorized access and steal information.
Table III lists the attack types considered in the most frequently used benchmark datasets, along with their definitions. Although the definitions provided in Table III can be used to distinguish the different attacks, three additional factors must be considered when designing an IDS. First, an attack of one type may be the beginning of another attack of a different type. In this case, the characteristics of the true attack will be a combination of the characteristics of both attacks. Second, some attack characteristics may evolve over time. For instance, DDoS attacks are mostly understood to be high-frequency attacks that flood the bandwidth of a network; however, DDoS attacks in the application layer are low-frequency attacks that flood the server instead of flooding the network. Third, some attack types may show similar patterns. For example, both DoS and probe attacks, in most cases, exhibit sequential patterns and involve a large number of connections to the same host, whereas R2L and U2R attacks are both embedded in packets. Therefore, although DoS and probe attacks are easy to differentiate from R2L and U2R attacks, it may not be as easy to differentiate DoS attacks from probe attacks or U2R attacks from R2L attacks due to their similar embedding patterns.

To increase the effectiveness of differentiating among attack types, several studies have investigated which types of features are effective for detecting particular attack types. For example, the authors of [18] report that on the basis of the features contained in the KDDCUP99 dataset, even though DoS attacks can be differentiated using basic and traffic-based features, considering some sparse features, such as flags, destination IP addresses, percentages of connections to the same service, and percentages of connections to the same port, can result in more effective detection. Similarly, duration, service, destination host same service rate, and flag features are vital for detecting probe (scanning) attacks. The most important features for detecting U2R attacks are the number of failed logins, number of shells, number of roots, duration, and service. For R2L attacks, the most important features are the duration, service, service bytes, destination bytes, number of failed logins, count, destination host count, and destination host service count. As seen above, the features used to detect attacks of the probe, U2R, and R2L types show a high degree of similarity, which explains why these three attack types are often misclassified among each other.

D. Benchmark Datasets

This section introduces and analyzes benchmark datasets for intrusion detection, considering both the extent to which they reflect novel attack types due to the evolving nature of intrusion patterns over time and their shortcomings. For the benchmark datasets considered in this section, Table IV lists the most frequently used datasets, while Table V summarizes the distribution of the samples in each dataset across the different attack types considered.

1) AWID2018: Also known as CSE-CIC-IDS2018, this dataset includes databases for training and testing collected using two different capture procedures. The data collected using the first procedure consist of full-packet network traffic with system logs, while the data collected using the second procedure consist of reduced packet traffic. The dataset includes two different labels for attacks: a main attack label and a subattack label. This dataset has the advantages of including the newest attack types, such as password attacks based on the SSH/FTP brute force approach, injection attacks based on SQL injection, and flooding attacks based on DoS. However, the data exhibit some limitations, such as noisy, misleading features and uncategorized samples. The dataset consists of 155 features extracted using Wireshark [49].

2) CICIDS2017: This dataset was created from realistic traffic data at the Canadian Institute for Cybersecurity of the University of New Brunswick (UNB) in 2017 and includes a full-packet dataset with 152 features and raw PCAP files [50]. The dataset considers attacks and subattacks such as injection attacks based on SQL injection and XSS, password attacks based on brute force FTP-Patator and brute force SSH-Patator, and flooding attacks based on DoS, Goldeneye DDoS, HULK DDoS, slow HTTP DDoS, Slowloris DDoS, and Heartbleed. Although the criteria for a reliable dataset proposed by [54] are satisfied, one feature among the attributes is duplicated.

3) KDD99: Also known as KDDCup99, the KDD99 dataset was created using DARPA 1998 PCAP files and includes full-packet data, divided into subsets for training and testing [51]. This dataset considers DoS-based subattacks such as back, LAND, ping of death, teardrop, Neptune, and smurf attacks; U2R subattacks such as buffer overflow, loadmodule, Perl, and rootkit attacks; R2L subattacks such as ftp-write, guess-password, imap, multihop, PHF, spy, warezclient, and warezmaster attacks; and probe-based subattacks such as port sweep, IP sweep, NMAP, and Satan attacks. As of 2019, this dataset remains the most widely used benchmark dataset in the field of network intrusion detection. However, this dataset suffers from several limitations, including duplicated samples, different probability distributions between the training and test data, unbalanced classes, and a lack of coverage of the newest attack types.

4) NSL-KDD: This dataset was created by erasing all duplicate records from the KDD99 dataset and using sampling techniques to balance the number of data samples in each class [45]. This dataset includes separate databases for training and testing, where the test database consists of fourteen subattack types that are not present in the training database. NSL-KDD is not subject to most of the limitations of the KDD99 dataset; however, this dataset still lacks newer attack types.
procedure consist of reduced packet traffic. The dataset in- 5) Kyoto: This dataset was created from honeypots at
cludes two different labels for attacks: a main attack label and Kyoto University and consists of traffic data collected daily
a subattack label. This dataset has the advantages of including between 2006 and 2015 [52]. The dataset includes 24 features,
the newest attack types, such as password attacks based on fourteen of which are in common with the KDD99 dataset, and
the SSH/FTP brute force approach, injection attacks based on labels indicating normal data, known attacks, and unknown
SQL injection, and flooding attacks based on DoS. However, attacks. The dataset is missing data from some days and
the data exhibit some limitations, such as noisy, misleading months during the time of its collection, and the average
TABLE IV
OVERVIEW OF THE MOST FREQUENTLY USED CYBERSECURITY BENCHMARK DATASETS

Ref. | Name | Year | Num. of features | Num. of samples | Attack types | Separate train-test sets
[49] | AWID2018 | 2018 | 155 | 210900113 (full), 2326218 (reduced) | Flooding, impersonation, injection | Yes
[50] | CICIDS2017 | 2017 | 152 | 2830743 | DoS/DDoS, port scan, FTP-Patator, SSH-Patator, bot, web attacks, infiltration, Heartbleed | No
[51] | KDD99 | 1999 | 42 | 4900000 (full), 494021 (subset), 311029 (testing) | DoS, probe, U2R, R2L | Yes
[45] | NSL-KDD | 2009 | 42 | 125973 (training), 22544 (testing), 25192 (training), 11850 (testing) | DoS, probe, U2R, R2L | Yes
[52] | Kyoto | 2006-2015 | 24 | Various | Known, unknown | No
[53] | UNSW-NB15 | 2015 | 49 | 2540047 (full), 175341 (training), 82332 (testing) | Fuzzers, worms, shellcode, analysis, backdoors, DoS, exploits, generic, reconnaissance | Yes

Notes: * = including 1 feature as a label; ** = including 2 features as labels; *** = at the time of this survey.
TABLE V
DISTRIBUTIONS OF ATTACK TYPES IN THE MOST FREQUENTLY USED BENCHMARK DATASETS

AWID2018
Attack | Normal | Flooding | Impersonation | Injection
N. samples | 205074514 | 1409392 | 2361892 | 2054315
(Perc.) | (97.24%) | (0.67%) | (1.12%) | (0.97%)

CICIDS2017
Attack | Benign | DoS | DDoS | Port scan | FTP-P. | SSH-P. | Bot | Web att. | Infiltr. | Heartb.
N. samples | 2273097 | 252661 | 128027 | 158930 | 7938 | 5897 | 1966 | 2180 | 36 | 11
(Perc.) | (80.3004%) | (8.9257%) | (4.5228%) | (5.6144%) | (0.2804%) | (0.2083%) | (0.0695%) | (0.077%) | (0.0012%) | (0.0003%)

KDD99
Attack | Normal | DoS | Probe | U2R | R2L
N. samples | 972781 | 3683370 | 41102 | 52 | 1126
(Perc.) | (20.71%) | (78.4%) | (0.8897%) | (0.0001%) | (0.0002%)

NSL-KDD
Attack | Normal | DoS | Probe | U2R | R2L
N. samples | 77054 | 53385 | 14077 | 252 | 3649
(Perc.) | (51.9%) | (35.9%) | (9.5%) | (0.2%) | (2.5%)

Kyoto
Attack | Normal | Known attacks | Unknown attacks
N. samples | 1186780 | 11218206 | 563
(Perc.) | (9.5706%) | (90.429%) | (0.0004%)

UNSW-NB15
Attack | Fuzzers | Worms | Shellcode | Analysis | Backdoors | DoS | Exploits | Generic | Rec.
N. samples | 24246 | 174 | 1511 | 2677 | 2329 | 16353 | 44525 | 215481 | 13987
(Perc.) | (7.572%) | (0.054%) | (0.47%) | (0.83%) | (0.72%) | (5.112%) | (13.8%) | (67.092%) | (4.35%)

Notes: N. samples = Number of samples; Perc. = Percentage; FTP-P. = FTP-Patator; SSH-P. = SSH-Patator; Web att. = Web attacks; Infiltr. = Infiltration; Heartb. = Heartbleed; Rec. = Reconnaissance. The largest remainder method was used when computing the percentages to ensure a total of 100%.
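For completeness, the largest remainder method mentioned in the notes of Table V can be sketched as follows: exact percentages are rounded down at the desired precision, and the leftover units are assigned to the entries with the largest fractional remainders so that each row sums exactly to 100%. The implementation below is a sketch; applying it to the AWID2018 counts reproduces the corresponding row of Table V.

    # Sketch of the largest remainder method used for the percentages in Table V.
    import math

    def largest_remainder(counts, decimals=2):
        total = sum(counts)
        scale = 10 ** decimals
        exact = [c / total * 100 * scale for c in counts]   # exact shares
        floored = [math.floor(x) for x in exact]            # round down
        shortfall = 100 * scale - sum(floored)              # units left over
        # Give one extra unit to the entries with the largest remainders.
        order = sorted(range(len(counts)),
                       key=lambda i: exact[i] - floored[i], reverse=True)
        for i in order[:shortfall]:
            floored[i] += 1
        return [f / scale for f in floored]

    # Example with the AWID2018 counts from Table V.
    print(largest_remainder([205074514, 1409392, 2361892, 2054315]))
    # -> [97.24, 0.67, 1.12, 0.97], matching the AWID2018 row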
The dataset is missing data from some days and months during the time of its collection, and the average number of samples per month is approximately twelve million. Since the traffic was captured from honeypots, which are designed to protect against less advanced attackers, most of the monitored attacks did not originate from advanced attackers. Therefore, the dataset may not be representative of realistic attacks.

6) UNSW-NB15: This dataset was synthetically created at the Cyber Range Lab of the Australian Centre for Cybersecurity and includes full, training, and test datasets as well as raw PCAP files. The dataset includes 49 features and two label attributes: the first label describes the attack, and the second label is binary. The dataset considers attacks such as fuzzers, backdoors, shellcode, DoS attacks, worms, generic attacks, reconnaissance attacks, exploits, and analysis attacks [53]. One of the limitations of this dataset is the existence of several missing samples.

7) DARPA: This dataset was created at the MIT Lincoln Laboratory in 1998 and includes full, training, and test sets of raw PCAP files [55]. The newer versions of the DARPA dataset, DARPA 1999 and DARPA 2000, are based on the 1998 version. This dataset is one of the most commonly used intrusion detection datasets; however, it is commonly considered to be outdated and to contain irregularities [56].

8) ISCX IDS 2012: Also known as UNB or UNB ISCX 2012, this dataset was created at UNB in 2012 and includes full-packet network data [57]. The dataset includes normal traffic data and attack data for attack types such as infiltration, DoS, DDoS, and brute force SSH attacks. Although this dataset includes some of the newest attack types, it is criticized as being unrealistic for not containing sufficient internet background noise, as it consists of pure network traffic rather than data received by any real device [58].

9) CIC DoS: This dataset was created at the Canadian Institute for Cybersecurity of UNB in 2017 [59]. It considers the application layer and incorporates data that describe high-volume (traditional) DoS attacks, data corresponding to low-volume DoS attacks, and normal data from the ISCX IDS 2012 dataset.

10) Gure-KddCup: This dataset was created using the PCAP data from the DARPA 1998 dataset [60]. It includes features similar to those of the KDD99 dataset, with the addition of payload information and other new features, such
as IP addresses and port numbers, to make U2R and R2L attacks more visible/distinguishable [61].

11) CDX: The Cyber Defence Exercises (CDX) dataset [62] was collected from the United States Military Academy network in 2009 and consists of PCAP data extracted from system logs, divided into intrusion traffic and normal traffic [56].

12) ASNM-CDX: This dataset was created from the CDX network traffic data in 2009. The dataset includes 5772 samples, each with 875+1+1 features. It includes distributed features often used in detecting low-frequency attacks, such as the number of packets and the total bytes in/out from four seconds to fifty-four seconds. In some cases, the features have been converted with the fast Fourier transform to increase their discriminative ability. This dataset has two attack label attributes: the first label discriminates between legitimate and malicious traffic, and the second label indicates whether the attack is based on buffer overflow. However, this dataset lacks traffic diversity since it consists only of buffer overflow attacks [63].
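The Fourier-transformed features used in ASNM-CDX can be illustrated with the following sketch, which converts a per-connection packet-count time series into the magnitudes of its low-frequency spectrum; the series length and the number of retained coefficients are illustrative assumptions.

    # Sketch: turning a packet-count time series into spectral features via FFT,
    # in the spirit of the FFT-transformed features of ASNM-CDX.
    import numpy as np

    def fft_features(packet_counts, n_coeffs=8):
        """Magnitudes of the first n_coeffs FFT components of a time series."""
        spectrum = np.fft.rfft(np.asarray(packet_counts, dtype=float))
        return np.abs(spectrum)[:n_coeffs]

    # Example: packets per second over one minute for a single connection.
    rng = np.random.default_rng(0)
    counts = rng.poisson(lam=5, size=60)
    print(fft_features(counts))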
13) LBNL: This dataset was created at the Lawrence Berkeley National Laboratory (LBNL) between 2004 and 2005. Although the dataset includes packet headers, the payloads are anonymized due to privacy issues, which limits its informativeness [64].

14) ISOT: This dataset was created in 2010 by combining Storm, Waledac, and Zeus botnet attack data from the French Chapter of the Honeynet Project and normal traffic data from the Traffic Lab at Ericsson Research and LBNL [65].

15) MAWI: This dataset was collected by the MAWI Working Group in Japan and includes continuously updated traffic data from 2001 to 2019. A graph-based methodology has been used to label the raw data as either abnormal or normal [66]. One of the limitations of this dataset is duplicated packets.

16) CTU-13: This dataset is a combination of botnet traffic data, normal data, and background data collected at Czech Technical University in Prague (CTU) in 2011. Although the data consist of a variety of botnet scenarios and extended truncated versions of PCAP files with complete TCP, UDP and Internet Control Message Protocol (ICMP) headers, the dataset is specifically designed only for botnet detection. Therefore, it is considered unrealistic to mix these data with normal and background traffic [67].

17) UMass: This dataset was collected between 2004 and 2018 and contains traffic data such as Tor traffic data, Gateway Link 3 Trace data, web requests, and response data. However, most of the data were collected under similar network traffic conditions and lack a broad variety of attacks [68].

18) Twente: This dataset was created from honeypots at the University of Twente in 2009 and consists of more than fourteen million flows and more than seven million alerts. In this dataset, some samples are left unlabeled, and informative data from the packet headers and payloads are anonymized [69]. This dataset has the limitation that traffic originating from honeypots does not represent realistic attacks since honeypots are designed to protect against less advanced attackers.

19) CAIDA: The CAIDA dataset consists of a variety of different databases that are specific to particular events, such as network telescope and DDoS databases [58], [70]. Although there are a few up-to-date databases, such as CAIDA DDoS, most do not accurately represent the different possible types of attacks. For instance, the DoS attack databases consist only of spoofed-source DoS attacks and exclude other versions of DoS attacks.

20) DEFCON: The DEFCON datasets are created for intrusion modeling competitions held every year. Although these datasets are continuously created, they focus only on intrusions and attacks and lack normal background traffic [58]. Therefore, they are not frequently used for network intrusion detection.

21) Others: In addition to the most commonly used benchmark datasets, a variety of publicly available raw traffic datasets exist. These datasets include Metrosec, UNIBS 2009, TUIDS, the University of Napoli traffic dataset, payload datasets such as the CSIC 2010 HTTP Dataset, the UNM system call dataset, and an enormous variety of network traffic from the Capture the Flag Competitions (CTF) and CDX. Moreover, several host-based datasets also exist, including the ADFA Linux Dataset (ADFA-LD), the ADFA Windows Dataset (ADFA-WD) and the ADFA Windows Dataset Stealth Attacks Addendum (ADFA-WD:SAA) [71].

III. DL-BASED INTRUSION DETECTION METHODS

Traditional ML-based methods for cybersecurity include approaches based on the k-Nearest Neighbor (k-NN) algorithm, k-means clustering, Artificial Neural Networks (ANNs), fuzzy logic, Bayesian networks, hidden Markov models, self-organizing maps, decision trees, evolutionary classifiers, Support Vector Machines (SVMs), and rule-based systems [17], [18], [22], [26]. In this survey, we focus on the more recent DL-based approaches, which have not been covered in detail in previous surveys.

To provide up-to-date descriptions of the recent methods developed for cybersecurity, this section describes DL-based methods for intrusion detection. For each algorithm, we consider evaluation criteria such as a fast run/convergence time, a high detection ability with a low false positive rate, adaptability to novel intrusions, computational efficiency, and scalability [16]. In the remainder of this section, we consider DL methods based on DBNs, AEs, CNNs, LSTM networks, and GANs [15]. A summary of the presented DL methods in the IDS context is presented in Table VI.

A. Deep Belief Networks (DBNs)

DBNs are a type of ANN obtained by stacking together several Restricted Boltzmann Machines (RBMs [77]), which act as the layers of the DBN, and introducing connections between the layers but not within each layer. The RBMs used to construct a DBN consist of two main layers, one visible and one hidden, constituted by a variable number of neurons. Additionally, within each RBM, the neurons of different layers are fully connected, whereas the connections within the same layer are restricted [72]. Fig. 1 shows an example of a DBN. Because of their layered structure, DBNs have the advantage that fast learning procedures can be used, which can be applied in a greedy fashion, layer by layer, in an unsupervised way [78].
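A minimal sketch of this greedy layer-wise procedure is given below; each RBM is trained with a simplified, bias-free step of contrastive divergence (CD-1) [77] on the previous layer's activations. Layer sizes, learning rate, and epoch count are illustrative assumptions, and a full DBN would add bias terms and a supervised fine-tuning stage.

    # Minimal sketch: greedy layer-wise pretraining of a DBN as a stack of RBMs,
    # each trained with one simplified (bias-free) contrastive divergence step.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm(data, n_hidden, lr=0.05, epochs=10):
        n_visible = data.shape[1]
        W = rng.normal(0, 0.01, (n_visible, n_hidden))
        for _ in range(epochs):
            # Positive phase: sample hidden units from the data.
            h_prob = sigmoid(data @ W)
            h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
            # Negative phase: one reconstruction step (CD-1).
            v_recon = sigmoid(h_sample @ W.T)
            h_recon = sigmoid(v_recon @ W)
            # Approximate gradient of the log-likelihood.
            W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        return W

    # Greedy stacking: each layer is trained on the previous layer's activations.
    X = rng.random((256, 41))            # e.g., 41 normalized NSL-KDD features
    weights, layer_input = [], X
    for n_hidden in (32, 16, 8):         # illustrative layer sizes
        W = train_rbm(layer_input, n_hidden)
        weights.append(W)
        layer_input = sigmoid(layer_input @ W)
    print([W.shape for W in weights])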
As a consequence of this advantage, methods based
TABLE VI
SUMMARY OF DL-BASED METHODS FOR INTRUSION DETECTION

Method | Description | Pros | Cons
Deep Belief Networks (DBNs) [72] | Stacks of Restricted Boltzmann Machines (RBMs) with connections between the layers but not within each layer. | Fast and unsupervised layer-by-layer learning in a greedy fashion. Unsupervised dimensionality reduction. | Training uses an approximation of the gradient.
Autoencoders (AEs) [73] | Encoder-decoder structure that maps input data to a hidden space and then reconstructs them. | Can be trained in an end-to-end manner using learning algorithms based on gradient descent. Unsupervised dimensionality reduction. | Requires an additional ML model to perform classification.
Convolutional Neural Networks (CNNs) [74] | Sequences of convolutional layers trained via gradient descent. | Performs classification while automatically learning data representations. Learns discriminant spatial patterns invariant to translation and shifting. | Computationally expensive to train. Not naturally suited to processing data in time-series form.
Long Short-Term Memory (LSTM) [75] | Neurons arranged in a temporal sequence, able to maintain memory for arbitrary intervals of time. | Can natively process time-series data. | The research community is increasingly focusing on CNNs rather than LSTM networks.
Generative Adversarial Networks (GANs) [76] | Combination of a generator, which generates data starting from a random distribution, and a discriminator, which distinguishes real data from synthetic data. | Learns data distributions in an unsupervised manner. | Often requires visual inspection of the results.
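To make the autoencoder row of Table VI concrete, the sketch below trains a model to reconstruct normal traffic only and flags test samples whose reconstruction error is large; scikit-learn's MLPRegressor stands in for a deep AE here, and the hidden sizes and 95th-percentile threshold are illustrative assumptions rather than any method discussed in this survey.

    # Sketch: autoencoder-style anomaly detection via reconstruction error.
    # An MLPRegressor trained to reproduce its input stands in for a deep AE.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X_normal = rng.random((500, 20))     # stand-in for normal traffic features
    X_test = rng.random((100, 20))

    ae = MLPRegressor(hidden_layer_sizes=(16, 4, 16), max_iter=2000,
                      random_state=0)
    ae.fit(X_normal, X_normal)           # learn to reconstruct normal data

    def reconstruction_error(model, X):
        return np.mean((model.predict(X) - X) ** 2, axis=1)

    threshold = np.percentile(reconstruction_error(ae, X_normal), 95)
    alerts = reconstruction_error(ae, X_test) > threshold
    print(f"{alerts.sum()} of {len(X_test)} test samples flagged as anomalous")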
evaluation of dataset bias for IDSs may contribute to a fairer assessment of the various algorithms that have been proposed in the field of cybersecurity.

In addition to dataset bias, a few works in the literature address other issues related to public benchmark datasets, such as repeated data, missing values, incorrect labeling [137], or an optimistic number of false alarms due to considering specific situations in a nonrealistic way [140].

B. Novel Features

As the number of methodologies that are able to achieve high accuracy on known datasets increases, attack patterns tend to evolve to better evade the existing IDSs. This evolution, which can arise in nonstationary environments, is known as concept shift and occurs as the definitions of attacks change over time [141].

For instance, the work presented in [142] shows that some low-frequency DDoS attacks that appear in newer datasets exhibit a higher degree of similarity to normal data traffic than do similar attacks in older datasets. As a consequence, in recent cases, some features are less effective in detecting such attacks than they are in detecting older attack patterns.

Therefore, it remains an open research issue to investigate whether the available features in known benchmark datasets are sufficient to achieve high detection rates even in the presence of changing attack patterns or whether it will be necessary to add new features to maintain a high level of detection accuracy.
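One simple way to quantify such shifts, sketched below, is to compare the distribution of a given feature between an older and a newer dataset with a two-sample Kolmogorov-Smirnov test; the synthetic data stand in for real feature values, and the 0.05 significance level is an illustrative assumption.

    # Sketch: testing a single feature for distribution shift between an older
    # and a newer dataset using a two-sample Kolmogorov-Smirnov test.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    feature_old = rng.exponential(scale=1.0, size=2000)   # e.g., flow duration
    feature_new = rng.exponential(scale=1.4, size=2000)   # shifted behavior

    stat, p_value = ks_2samp(feature_old, feature_new)
    if p_value < 0.05:
        print(f"distribution shift detected (KS={stat:.3f}, p={p_value:.2e})")
    else:
        print("no significant shift for this feature")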
V. CONCLUSION

In this review, we have analyzed Machine Learning (ML)-based approaches to cybersecurity and intrusion detection systems, with a specific focus on the most recent methods based on Deep Learning (DL), which represent the current state of the art for intrusion detection in network traffic. Specifically, we have considered methods based on deep belief networks, autoencoders, convolutional neural networks, long short-term memory networks, and generative adversarial networks. In contrast to previous surveys, this review considers studies that use common benchmark datasets to ensure a fair evaluation and comparison of the proposed algorithms.

To provide a reference for how recent cybersecurity methods use benchmark datasets for intrusion detection, in this survey, we have also reviewed the main datasets used for this purpose by highlighting their potential for training effective ML-based algorithms. In particular, we have considered the data collection procedures, the distributions of feature and attack types, and dataset reliability criteria.

By providing a survey of ML and DL approaches, along with descriptions of the benchmark datasets considered when developing recent methods, this review aims to provide a practical road map for researchers in academia and industry working in the field of ML and DL for cybersecurity applications.

REFERENCES

[1] S. Muggleton, "Alan Turing and the development of artificial intelligence," AI Commun., vol. 27, no. 1, pp. 3–10, 2014.
[2] "WannaCry ransomware attack," https://en.wikipedia.org/wiki/WannaCry_ransomware_attack.
[3] "Hacked consumers don't forgive companies who lose their data. Bad news for Yahoo," https://secludit.com/en/blog/consumer-hacking-confidence.
[4] McAfee, "McAfee Labs threats report," https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-aug-2019.pdf, 2019.
[5] R. Bhadoria, "Security architecture for cloud computing," in Cyber Security and Threats: Concepts, Methodologies, Tools, and Applications, 2018, pp. 729–755.
[6] M. Swarnkar and R. Bhadoria, "Security aspects in utility computing," in Emerging Research Surrounding Power Consumption and Performance Issues in Utility Computing, 2016, pp. 262–275.
[7] S. Dorbala and R. Bhadoria, "Analysis for security attacks in cyber-physical systems," in Cyber-Physical Systems: A Computational Perspective, 2015, pp. 395–414.
[8] S. K. Khaitan and J. D. McCalley, "Design techniques and applications of cyberphysical systems: A survey," IEEE Syst. J., vol. 9, no. 2, pp. 350–365, 2015.
[9] R. Sandhu and P. Samarati, "Authentication, access control and intrusion detection," in CRC Handbook of Computer Science and Engineering. CRC Press Inc., 1997, pp. 1929–1948.
[10] S. Han, M. Xie, H. Chen, and Y. Ling, "Intrusion detection in cyber-physical systems: Techniques and challenges," IEEE Syst. J., vol. 8, no. 4, pp. 1052–1062, 2014.
[11] T. T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Commun. Surveys Tuts., vol. 10, no. 4, pp. 56–76, 2008.
[12] R. Donida Labati, A. Genovese, V. Piuri, F. Scotti, and S. Vishwakarma, "Computational intelligence in cloud computing," in Recent Advances in Intelligent Engineering: Volume Dedicated to Imre J. Rudas' Seventieth Birthday. Springer, 2020, pp. 111–127.
[13] Y. Cai, A. Genovese, V. Piuri, F. Scotti, and M. Siegel, "IoT-based architectures for sensing and local data processing in ambient intelligence: Research and industrial trends," in Proc. of I2MTC, 2019.
[14] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, "State-of-the-art Deep Learning: Evolving machine intelligence toward tomorrow's intelligent network traffic control systems," IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2432–2455, 2017.
[15] S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M. P. Reyes, M.-L. Shyu, S.-C. Chen, and S. S. Iyengar, "A survey on deep learning: Algorithms, techniques, and applications," ACM Comput. Surv., vol. 51, no. 5, 2018.
[16] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: Methods, systems and tools," IEEE Commun. Surveys Tuts., vol. 16, no. 1, pp. 303–336, 2014.
[17] A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2016.
[18] P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, "A detailed investigation and analysis of using machine learning techniques for intrusion detection," IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 686–728, 2019.
[19] D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim, "A survey of Deep Learning-based network anomaly detection," Cluster Comput., vol. 22, pp. 949–961, 2017.
[20] E. Hodo, X. J. A. Bellekens, A. W. Hamilton, C. Tachtatzis, and R. C. Atkinson, "Shallow and Deep networks intrusion detection system: A taxonomy and survey," ArXiv, vol. abs/1701.02145, 2017.
[21] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, and C. Wang, "Machine learning and Deep Learning methods for cybersecurity," IEEE Access, vol. 6, pp. 35365–35381, 2018.
[22] H. Hindy, D. Brosset, E. Bayne, A. Seeam, C. Tachtatzis, R. C. Atkinson, and X. J. A. Bellekens, "A taxonomy and survey of intrusion detection system design techniques, network threats and datasets," CoRR, vol. abs/1806.03517, 2018.
[23] A. Praseed and P. S. Thilagam, "DDoS attacks at the application layer: Challenges and research perspectives for safeguarding web applications," IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 661–685, 2019.
[24] B. B. Zarpelo, R. S. Miani, C. T. Kawakani, and S. C. de Alvarenga, "A survey of intrusion detection in Internet of Things," J. Netw. Comput. Appl., vol. 84, no. C, pp. 25–37, 2017.
[25] C. Tsai, C. Lai, M. Chiang, and L. T. Yang, "Data mining for internet of things: A survey," IEEE Commun. Surveys Tuts., vol. 16, no. 1, pp. 77–97, 2014.
[26] A. Nisioti, A. Mylonas, P. D. Yoo, and V. Katos, "From intrusion detection to attacker attribution: A comprehensive survey of unsupervised methods," IEEE Commun. Surveys Tuts., vol. 20, no. 4, 2018.
[27] R. Abdulhammed, M. Faezipour, and K. M. Elleithy, "Network intrusion detection using hardware techniques: A review," in Proc. of LISAT, 2016, pp. 1–7.
[28] J. Kim, P. J. Bentley, U. Aickelin, J. Greensmith, G. Tedesco, and J. Twycross, "Immune system approaches to intrusion detection – a review," Nat. Comput., vol. 6, no. 4, pp. 413–466, 2007.
[29] A. Volkova, M. Niedermeier, R. Basmadjian, and H. de Meer, "Security challenges in control network protocols: A survey," IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 619–639, 2019.
[30] O. Savas and J. Deng, Big Data Analytics in Cybersecurity. Auerbach Publications, 2017.
[31] F. Pacheco, E. Exposito, M. Gineste, C. Baudoin, and J. Aguilar, "Towards the deployment of machine learning solutions in network traffic classification: A systematic survey," IEEE Commun. Surveys Tuts., vol. 21, no. 2, pp. 1988–2014, 2019.
[32] "LibPCAP," https://www.tcpdump.org.
[33] "WinPCAP," https://www.winpcap.org.
[34] "Snort," https://www.snort.org.
[35] "Wireshark," https://www.wireshark.org.
[36] "tshark," https://www.wireshark.org/docs/man-pages/tshark.html.
[37] "TCPDump," https://www.tcpdump.org.
[38] "Networkminer," https://www.netresec.com/?page=NetworkMiner.
[39] "Rapidminer," https://rapidminer.com.
[40] "Scapy," https://scapy.net.
[41] "Cisco Netflow," https://www.cisco.com/c/en/us/products/ios-nx-os-software/ios-netflow/index.html.
[42] "Nfdump," https://github.com/phaag/nfdump.
[43] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 1st ed. Prentice Hall PTR, 2000.
[44] R. Hofstede, P. Čeleda, B. Trammell, I. Drago, R. Sadre, A. Sperotto, and A. Pras, "Flow monitoring explained: From packet capture to data analysis with NetFlow and IPFIX," IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 2037–2064, 2014.
[45] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. of CISDA, 2009.
[46] X. Jing, Z. Yan, and W. Pedrycz, "Security data collection and data analytics in the internet: A survey," IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 586–618, 2019.
[47] J. Fonseca, M. Vieira, and H. Madeira, "Testing and comparing web vulnerability scanning tools for SQL Injection and XSS attacks," in Proc. of PRDC, 2007, pp. 365–372.
[48] A. Lazarevic, V. Kumar, and J. Srivastava, "Intrusion detection: A survey," Managing Cyber Threats, vol. 5, pp. 19–78, 2005.
[49] University of the Aegean, "AWID2018 dataset," http://icsdweb.aegean.gr/awid/features.html, 2018.
[50] Canadian Institute for Cybersecurity, "Intrusion Detection Evaluation Dataset (CICIDS2017)," https://www.unb.ca/cic/datasets/ids-2017.html, 2017.
[51] University of California, Irvine (UCI), "KDD Cup 1999," http://www.kdd.org/kdd-cup/view/kdd-cup-1999, 1999.
[52] Kyoto University, "Traffic Data from Kyoto University's Honeypots," http://www.takakura.com/Kyoto_data, 2015.
[53] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems," in Proc. of MilCIS, 2015.
[54] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in Proc. of ICISSP, 2018.
[55] Massachusetts Institute of Technology, "1998 DARPA Intrusion Detection Evaluation Dataset," https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset, 1998.
[56] B. Sangster, T. J. O'Connor, T. Cook, R. Fanelli, E. Dean, W. J. Adams, C. Morrell, and G. Conti, "Toward instrumenting network warfare competitions to generate labeled datasets," in Proc. of CSET, 2009.
[57] Canadian Institute for Cybersecurity, "Intrusion Detection Evaluation Dataset (ISCXIDS2012)," https://www.unb.ca/cic/datasets/ids.html, 2012.
[58] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Comput. Secur., vol. 31, no. 3, pp. 357–374, 2012.
[59] Canadian Institute for Cybersecurity, "DoS dataset (CIC DoS dataset 2017)," https://www.unb.ca/cic/datasets/dos-dataset.html, 2017.
[60] ALDAPA, "Gure-Kddcup dataset," http://www.sc.ehu.es/acwaldap/gureKddcup, 2008.
[61] I. Perona, I. Gurrutxaga, O. Arbelaitz, J. I. Martín, J. Muguerza, and J. M. Pérez, "Service-independent payload analysis to improve intrusion detection in network traffic," in Proc. of AusDM, 2008.
[62] National Security Agency, "Cyber Defense Exercise (CDX)," https://apps.nsa.gov/iaarchive/programs/cyber-defense-exercise/index.cfm, 2001.
[63] I. Homoliak, M. Barabas, P. Chmelar, M. Drozd, and P. Hanacek, "ASNM: Advanced security network metrics for attack vector description," in Proc. of SAM, 2013.
[64] R. Pang, M. Allman, V. Paxson, and J. Lee, "The devil and packet trace anonymization," SIGCOMM Comput. Commun. Rev., vol. 36, no. 1, pp. 29–38, 2006.
[65] S. Saad, I. Traore, A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian, "Detecting P2P botnets through network behavior analysis and machine learning," in Proc. of PST, 2011, pp. 174–180.
[66] R. Fontugne, P. Borgnat, P. Abry, and K. Fukuda, "MAWILab: Combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking," in Proc. of CoNEXT, 2010.
[67] S. García, M. Grill, J. Stiborek, and A. Zunino, "An empirical comparison of botnet detection methods," Comput. Secur., vol. 45, 2014.
[68] University of Massachusetts Amherst - Laboratory for Advanced Software Systems, "UMassTraceRepository," http://traces.cs.umass.edu/index.php/Network/Network, 2018.
[69] A. Sperotto, R. Sadre, F. van Vliet, and A. Pras, "A labeled data set for flow-based intrusion detection," in IP Operations and Management, ser. Lect. Notes in Comput. Sc. Springer, 2009, pp. 39–50.
[70] Center for Applied Internet Data Analysis, "Data Collection, Curation and Sharing," https://www.caida.org/data/, 2018.
[71] G. Creech and J. Hu, "Generation of a new IDS test dataset: Time to retire the KDD collection," in Proc. of WCNC, 2013, pp. 4487–4492.
[72] R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," in Proc. of AISTATS, 2009, pp. 448–455.
[73] I. Goodfellow, Y. Bengio, and A. Courville, "Autoencoders," in Deep Learning. MIT Press, 2016.
[74] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[75] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[76] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. of NIPS, 2014, pp. 2672–2680.
[77] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, no. 8, pp. 1771–1800, 2002.
[78] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, 2006.
[79] M. A. Salama, H. Eid, R. Ramadan, A. Darwish, and A. E. Hassanien, "Hybrid intelligent intrusion detection scheme," Adv. Intell. Soft Comput., vol. 96, pp. 295–302, 2011.
[80] G. Zhao, C. Zhang, and L. Zheng, "Intrusion detection using Deep Belief Network and probabilistic neural network," in Proc. of CSE, vol. 1, 2017, pp. 639–642.
[81] N. Gao, L. Gao, Q. Gao, and H. Wang, "An intrusion detection model based on Deep Belief Networks," in Proc. of CBD, 2014, pp. 247–252.
[82] M. Z. Alom, V. Bontupalli, and T. M. Taha, "Intrusion detection using Deep Belief Networks," in Proc. of NAECON, 2015, pp. 339–344.
[83] K. Alrawashdeh and C. Purdy, "Toward an online anomaly intrusion detection system based on Deep Learning," in Proc. of ICMLA, 2016, pp. 195–200.
[84] E. R. Merino, F. M. Castrillejo, J. D. Pin, and D. B. Prats, "Weighted contrastive divergence," CoRR, vol. abs/1801.02567, 2018.
[85] NVIDIA, "CUDA," https://developer.nvidia.com/cuda-zone, 2020.
[86] B. Abolhasanzadeh, "Nonlinear dimensionality reduction for intrusion detection using Auto-Encoder bottleneck features," in Proc. of IKT, 2015, pp. 1–5.
[87] M. Yousefi-Azar, V. Varadharajan, L. Hamey, and U. Tupakula, "Autoencoder-based feature learning for cyber security applications," in Proc. of IJCNN, 2017, pp. 3854–3861.
[88] V. L. Cao, M. Nicolau, and J. McDermott, "A Hybrid Autoencoder and density estimation model for anomaly detection," in Proc. of PPSN, 2016, pp. 717–726.
[89] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. ki Cho, and H. Chen, "Deep Autoencoding Gaussian Mixture Model for unsupervised anomaly detection," in Proc. of ICLR, 2018.
[90] A. Y. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A Deep Learning approach for network intrusion detection system," in Proc. of BICT, 2015.
[91] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, pp. 3371–3408, 2010.
[92] Y. Yu, J. Long, and Z. Cai, "Network intrusion detection through stacking dilated Convolutional Autoencoders," Secur. Commun. Netw., vol. 2017, pp. 1–10, 2017.
[93] Q. Niyaz, W. Sun, and A. Y. Javaid, "A Deep Learning based DDoS detection system in software-defined networking (SDN)," EAI Endorsed Trans. on Security and Safety, vol. 4, no. 12, 2017.
[94] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A Deep Learning approach to network intrusion detection," IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, 2018.
[95] F. Farahnakian and J. Heikkonen, "A Deep Auto-Encoder based approach for intrusion detection system," in Proc. of ICACT, 2018.
[96] L. R. Parker, P. D. Yoo, T. A. Asyhari, L. Chermak, Y. Jhi, and K. Taha, "DEMISe: Interpretable Deep extraction and mutual information selection techniques for IoT intrusion detection," in Proc. of ARES, 2019, pp. 98:1–98:10.
[97] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," in Proc. of ICLR, 2014.
[98] Q. P. Nguyen, K. W. Lim, D. M. Divakaran, K. H. Low, and M. C. Chan, "GEE: A gradient-based explainable Variational Autoencoder for network anomaly detection," in Proc. of CNS, 2019, pp. 91–99.
[99] G. Maciá-Fernández, J. Camacho, R. Magán-Carrión, P. García-Teodoro, and R. Therón, "UGR'16: A new dataset for the evaluation of cyclostationarity-based network IDSs," Computers & Security, vol. 73, pp. 411–424, 2018.
[100] L. Vu, V. L. Cao, Q. U. Nguyen, D. N. Nguyen, D. T. Hoang, and E. Dutkiewicz, "Learning latent distribution for distinguishing network traffic in intrusion detection system," in Proc. of ICC, 2019, pp. 1–6.
[101] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with Deep Convolutional Neural Networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017.
[102] A. Genovese, V. Piuri, K. N. Plataniotis, and F. Scotti, "PalmNet: Gabor-PCA Convolutional Networks for touchless palmprint recognition," IEEE Trans. Inf. Forensics Security, vol. 14, no. 2, 2019.
[103] R. Donida Labati, A. Genovese, E. Muñoz, V. Piuri, and F. Scotti, "A novel pore extraction method for heterogeneous fingerprint images
[117] S. Z. Lin, Y. Shi, and Z. Xue, "Character-level intrusion detection based on Convolutional Neural Networks," in Proc. of IJCNN, 2018, pp. 1–8.
[118] Y. Xiao, C. Xing, T. Zhang, and Z. Zhao, "An intrusion detection model based on feature reduction and Convolutional Neural Networks," IEEE Access, vol. 7, pp. 42210–42219, 2019.
[119] G. Feng, B. Li, M. Yang, and Z. Yan, "V-CNN: Data visualizing based Convolutional Neural Network," in Proc. of ICSPCC, 2018, pp. 1–6.
[120] R. Ronen, M. Radu, C. Feuerstein, E. Yom-Tov, and M. Ahmadi, "Microsoft malware classification challenge," CoRR, vol. abs/1802.10135, 2018.
[121] S.-N. Nguyen, V.-Q. Nguyen, J. Choi, and K. Kim, "Design and implementation of intrusion detection system using Convolutional Neural Network for DoS detection," in Proc. of ICMLSC, 2018.
[122] S. Park, M. Kim, and S. Lee, "Anomaly detection for HTTP using Convolutional Autoencoders," IEEE Access, vol. 6, 2018.
[123] R. Kruse, C. Borgelt, C. Braune, S. Mostaghim, M. Steinbrecher, F. Klawonn, and C. Moewes, Computational Intelligence: A Methodological Introduction, 2nd ed. Springer, 2016.
[124] A. Brown, A. Tuor, B. Hutchinson, and N. Nichols, "Recurrent neural network attention mechanisms for interpretable system log anomaly detection," in Proc. of MLCS, 2018, pp. 1–8.
[125] G. Kim, H. Yi, J. Lee, Y. Paek, and S. Yoon, "LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems," ArXiv, vol. abs/1611.01726, 2016.
[126] F. Jiang, Y. Fu, B. B. Gupta, F. Lou, S. Rho, F. Meng, and Z. Tian, "Deep Learning based multi-channel intelligent attack detection for data security," IEEE Trans. Sustain. Comput., pp. 1–1, 2018.
[127] R. Vinayakumar, K. P. Soman, and P. Poornachandran, "Applying Convolutional Neural Network for network intrusion detection," in Proc. of ICACCI, 2017, pp. 1222–1228.
[128] W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang, and M. Zhu, "HAST-IDS: Learning hierarchical spatial-temporal features using Deep Neural Networks to improve intrusion detection," IEEE Access, vol. 6, pp. 1792–1806, 2018.
[129] Y. Zhang, X. Chen, L. Jin, X. Wang, and D. Guo, "Network intrusion detection: Based on Deep Hierarchical Network and original flow data," IEEE Access, vol. 7, pp. 37004–37016, 2019.
[130] M. Elbayad, L. Besacier, and J. Verbeek, "Pervasive attention: 2D Convolutional Neural Networks for sequence-to-sequence prediction,"
using Convolutional Neural Networks,” Pattern Recognit. Lett., vol. CoRR, vol. abs/1808.03867, 2018.
113, no. 1, pp. 58–66, 2018. [131] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang,
[104] A. Genovese, V. Piuri, F. Scotti, and S. Vishwakarma, “Touchless T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient
palmprint and finger texture recognition: A Deep Learning fusion Convolutional Neural Networks for mobile vision applications,” CoRR,
approach,” in Proc. of CIVEMSA, 2019. vol. abs/1704.04861, 2017.
[105] R. Donida Labati, A. Genovese, E. Muñoz, V. Piuri, and F. Scotti, “Ap- [132] S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic
plications of computational intelligence in industrial and environmental convolutional and recurrent networks for sequence modeling,” CoRR,
scenarios,” in Learning Systems: from Theory to Practice. Springer, vol. abs/1803.01271, 2018.
2018, vol. 756, pp. 29–46. [133] Y. Hong, U. Hwang, J. Yoo, and S. Yoon, “How generative adversarial
[106] Z. Li, Z. Qin, K. Huang, X. Yang, and S. Ye, “Intrusion detection using networks and their variants work: An overview,” ACM Comput. Surv.,
Convolutional Neural Networks for representation learning,” in Neural vol. 52, no. 1, pp. 10:1–10:43, 2019.
Information Processing. Springer, 2017, pp. 858–866. [134] A. Genovese, V. Piuri, and F. Scotti, “Towards explainable face aging
[107] “One-hot encoding,” https://www.sciencedirect.com/topics/ with Generative Adversarial Networks,” in Proc. of ICIP, 2019.
computer-science/one-hot-encoding, 2020. [135] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and
[108] M. Kalash, M. Rochan, N. Mohammed, N. D. B. Bruce, Y. Wang, G. Langs, “Unsupervised anomaly detection with Generative Adversar-
and F. Iqbal, “Malware classification with Deep Convolutional Neural ial Networks to guide marker discovery,” CoRR, vol. abs/1703.05921,
Networks,” in Proc. of NTMS, 2018, pp. 1–5. 2017.
[109] T. Kim, S. C. Suh, H. Kim, J. Kim, and J. Kim, “An encoding technique [136] H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar,
for CNN-based network anomaly detection,” in Proc. of Big Data, “Efficient GAN-based anomaly detection,” ArXiv, vol. abs/1802.06222,
2018, pp. 2960–2965. 2018.
[110] R. Blanco, P. Malagón, J. J. Cilla, and J. M. Moya, “Multiclass network [137] A. Gharib, I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “An
attack classifier using CNN tuned with genetic algorithms,” in Proc. of evaluation framework for intrusion detection dataset,” in Proc. of
PATMOS, 2018, pp. 177–182. ICISS), 2016, pp. 1–6.
[111] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image [138] A. Torralba and A. A. Efros, “Unbiased look at dataset bias,” in Proc.
recognition,” in Proc. of CVPR, 2016, pp. 770–778. of CVPR, 2011, pp. 1521–1528.
[112] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, [139] T. Tommasi, N. Patricia, B. Caputo, and T. Tuytelaars, “A deeper
D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with look at dataset bias,” in Domain Adaptation in Computer Vision
convolutions,” in Proc. of CVPR, 2015. Applications. Springer, 2017, pp. 37–55.
[113] K. Simonyan and A. Zisserman, “Very deep convolutional networks [140] J. McHugh, “Testing intrusion detection systems: a critique of the 1998
for large-scale image recognition,” in Proc. of ICLR, 2015. and 1999 DARPA intrusion detection system evaluations as performed
[114] K. Wu, Z. Chen, and W. Li, “A novel intrusion detection model for a by lincoln laboratory,” ACM Trans. Inf. Syst. Secur., vol. 3, pp. 262–
massive network using Convolutional Neural Networks,” IEEE Access, 294, 2000.
vol. 6, pp. 50 850–50 859, 2018. [141] J. G. Moreno-Torres, T. Raeder, R. Alaíz-Rodríguez, N. V. Chawla, and
[115] U. Çekmez, Z. Erdem, A. G. Yavuz, O. K. Sahingoz, and A. Buldu, F. Herrera, “A unifying view on dataset shift in classification,” Pattern
“Network anomaly detection with Deep Learning,” in Proc. of SIU, Recognit., vol. 45, pp. 521–530, 2012.
2018, pp. 1–4. [142] R. F. Fouladi, T. Seifpoor, and E. Anarim, “Frequency characteristics
[116] M. Ito and H. Iyatomi, “Web application firewall using character-level of DoS and DDoS attacks,” in Proc. of SIU, 2013, pp. 1–4.
Convolutional Neural Network,” in Proc. of CSPA, 2018, pp. 103–106.