8478457
Research Article
Improved Grey Wolf Optimization- (IGWO-) Based Feature Selection on Multiview Features and Enhanced Multimodal-Sequential Network Intrusion Detection Approach
1 Department of ECE, P.A. College of Engineering and Technology, Pollachi, India
2 Department of EEE, Nehru Institute of Engineering and Technology, Coimbatore 641105, India
3 Department of EEE, Coimbatore Institute of Technology, Coimbatore, India
4 FESAC, Pentecost University, Accra, Ghana
Received 27 July 2022; Revised 12 January 2023; Accepted 13 January 2023; Published 1 February 2023
Copyright © 2023 M. Yuvaraja et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The goal of a network intrusion detection system (NIDS) is to spot malicious activity in a network by examining the behavior of network traffic. To find abnormalities, NIDSs rely heavily on machine learning (ML) and data mining techniques. Feature selection significantly affects NIDS performance: anomaly identification uses numerous characteristics, which makes analysis time-consuming, so the feature selection strategy influences both the time required to analyze traffic behavior and the achievable accuracy. The goal of the current work is to provide a feature selection model for NIDSs. Improved grey wolf optimization (IGWO) for feature selection (FS) is proposed to address these difficulties. The three primary processes in the proposed study are preprocessing and feature extraction, FS and classification, and evaluation of results. IGWO chooses a subset of input variables by reducing the number of features, using accuracy to evaluate candidate subsets in the search space and discover the best solution. A hierarchical progressive network (HPN) structure built from multimodal deep autoencoders (MDAEs) and attention-based long short-term memories (AB-LSTMs) forms the enhanced multimodal-sequential IDS, EMS-DHPN. The EMS-DHPN technique automatically learns the relationships between neighboring network connections while efficiently integrating information from multiple levels of characteristics within a network connection. The proposed hybrid IDS, called IGWO-EMS-DHPN, was evaluated on two intrusion datasets, UNSW-NB15 and CICIDS-2017, and compared with existing classifiers in terms of accuracy, precision, recall, and F1-score. Among the compared classifiers, the proposed IGWO-EMS-DHPN obtains the highest accuracy.
DDOS, cross-site scripting, and probe, are becoming more and more prevalent, making information security a growing concern. NIDSs are critical security countermeasures for detecting and preventing malicious intrusions in the cybersecurity industry. Network intrusion detection is a standard classification problem: its goal is to monitor network behavior continuously and decide whether to issue an alarm message to the network [3].

IDSs are critical scientific breakthroughs in information security, as they can detect intrusions that are ongoing or have already occurred. An IDS must determine whether traffic is normal or an attack such as DoS (denial of service), U2R (user to root), probe, or R2L (remote to local) [4]. In short, IDSs aim to improve classifier performance in identifying intrusive behavior. Traditional IDSs monitor traffic using detailed descriptions such as rules or signatures, for which false positive and false negative detections were common, resulting in false alarms. IDSs (intrusion detection systems) can be host-based (HIDSs), network-based (NIDSs), signature- or anomaly-based (AIDSs, anomaly IDSs), or hybrid [5]. Hybrid IDSs combine features of NIDSs and host-based IDSs and are highly reliable security frameworks [6].

MLTs (machine learning technologies) have been extensively employed to detect various sorts of threats and help network administrators prevent intrusions. However, most typical MLTs are shallow and focus on feature engineering and selection, making them ineffective for classifying huge volumes of intrusion data. Shallow learning is also unsuitable for high-dimensional learning with huge data. These challenges led researchers to seek better approaches [7].

On the other hand, DLTs (deep learning techniques) learn better representations from data, producing significantly improved approaches. DLTs have made significant strides in AI (artificial intelligence) during the last decade and have outperformed shallow MLTs in areas such as finance, automatic machine translation, speech recognition, and computer vision [8]. Single DLTs usually perform admirably in computer vision when large amounts of data are available. NIDSs can benefit from DLTs because of their ability to generalize to new environments while handling voluminous data, so they can be applied to identifying new threats [9]. However, in existing intrusion detection methods, the complex feature information within a network connection and the temporal information between network connections are either ignored entirely or treated simply. These models therefore inevitably lose some of the information in the traffic data and can only classify using incomplete feature information.

This research work proposes the EMS-DHPN framework to exploit multiple kinds of information in traffic data and thereby enhance the effectiveness of AIDSs. MDAEs are designed to learn the distributions of subfeature vectors, and AB-LSTM is used to capture the temporal information between connections. Experimental results validate the developed model. The significant contributions of this work are as follows:

(1) A novel multiview strategy for reducing feature complexity and investigating advanced multimodal DLTs to develop practical feature fusion units for traffic data. In the information security industry, multimodal DLTs are being used to address the challenges of IDSs
(2) Constructing smart AIDSs with HPN structures that fully utilize structured traffic information to enhance the accuracy of intrusion recognition
(3) The proposed approach employs IGWO for FS, which removes irrelevant and redundant data; important characteristics are selected from the search space on the basis of accuracy and the optimal solution. In addition, the channels of the MDAE and the interpreters of each access channel can be altered to detect intrusions in new environments
(4) The performance of the proposed EMS-DHPN approach in detecting attacks in current networks is benchmarked on two datasets from 2015 to 2017, testing accuracy and robustness in binary and multiclass classification

The relevant work in the field of intrusion detection is covered in Section 2. Section 3 introduces the proposed multimodal-sequential intrusion detection system. Section 4 assesses the efficiency of the classification algorithm on two datasets and reviews the experimental investigation. Finally, Section 5 presents the paper's conclusion and future work.

2. Literature Review

Many researchers concentrated on classic MLTs in the early stages of research, where predominantly shallow ANNs (artificial neural networks) were used: FFNNs (feedforward neural networks) built the classifiers, and BP (backpropagation) trained them. IDSs have also been proposed based on MLTs including SVMs (support vector machines), RFs (random forests), and NB (naive Bayes).

Mohammad [10] suggested grey wolf optimization (GWO) and particle swarm optimization (PSO) for IDS. They developed two novel approaches (PSO-GWO-NB and PSO-GWO-ANN) for FS and IDS. In addition, this study evaluated the most frequently repeated features (MRF) of PSO and GWO. Intrusion datasets were used for the assessments, and two classifiers, NB and ANN, were used in the evaluations, whose trials showed that MRF features produce good precision and recall. Their findings revealed that PSO-GWO-NB classifiers outperformed PSO-GWO-ANN classifiers in FS and IDS.

Al-Safi et al. [11] used IG (information gain), SVMs, ABC (artificial bee colony), and CS (cuckoo search) for identifying anomalies in networks. Their main steps were FS and classification, where FS was based on IG and the best features from the NSL-KDD dataset were chosen. Their proposed technique performed well on modern intrusion datasets (NSL-KDD). The study used the UCI dataset extensively as a baseline for the evaluation of
Wireless Communications and Mobile Computing 3
IDSs. The classification methods were measured by their rates of ACC (accuracy of correct classification), precision, and recall. The proposed method outperformed other modern techniques on the NSL-KDD dataset in terms of speed and accuracy.

Aghdam and Kabiri [12] suggested ACO (ant colony optimization) and nearest neighbors for intrusion identification. The study followed FS and classification. Initially, FS converted TCP dump data into feature sets or vectors. Subsequently, ACO searched and identified all features. The resulting FS was evaluated by using smaller feature spaces and assessing the classification results. This was followed by untrained ACOs and neighborhood classifiers for identifying fresh attacks. Finally, precision, recall (or) false positive, F-measure, and accuracy were used to evaluate the classification methods. The suggested scheme surpassed prior approaches, improving accuracy in identifying intrusion attempts and lowering false alarms with fewer features.

Ali et al. [13] presented a fast learning network (FLN) based on PSO, called PSO-FLN, for their proposed IDS. Their scheme consisted of three major steps. First, the exploration-exploitation trade-off described the ability to evaluate different regions of the problem space and find optima. Second, particle-based FLNs created by PSO were used to train IDS classifiers. Creating particles that represent weighted solutions was the first step in PSO-based optimization of the FLN. To improve accuracy, both the weights and the number of neurons in the hidden layers had to be chosen. Their PSO-FLN model was compared with many metaheuristic techniques for training extreme learning machines and FLNs, and PSO-FLN outperformed the other learning algorithms in learning accuracy.

Almomani [14] suggested PSO, GWO, firefly optimization (FFA), genetic algorithm (GA), and SVM for NIDS. The recommended FS reduced investigation time, increased reliability, and relied on PSO, GWO, FFA, and GA to improve the effectiveness of the NIDS. The first preprocessing stage involved removing labels and features, label encoding, and data binarization. Using Anaconda Python Open Source, GA, PSO, GWO, and FFA were used to generate 13 sets of MI (mutual information) rules. Finally, the selected features were classified using MLTs, namely SVM and J48, and tested on the UNSW-NB15 dataset. Their proposed IDS with limited parameters was found to be more reliable.

Kuang et al. [15] developed an SVM-based approach for IDS that integrated KPCA (kernel principal component analysis) and GA. The proposed model determined whether an activity was an attack by employing multilayered SVMs for classification. The dimensionality of the feature vectors was reduced using KPCA for quicker SVM training. Furthermore, noise caused by feature differences was reduced while performance was enhanced. Finally, GA was used to optimize the tube diameter, the kernel parameters, and the punishment factor C. The research findings demonstrated that the suggested technique beat existing detection algorithms on the KDD CUP99 dataset in terms of predictive accuracy, convergence speed, and generalization.

Zhong et al. [16] designed a big data-based hierarchical DL system (BDHDLS) for IDS. The proposed BDHDLS analyzed network traffic and payloads using behavioral and content features. It worked in three steps: (1) utilizing Apache Spark big data techniques for FS and clustering, (2) combining behavioral and content-based features in parallel for improved recognition rates, and (3) utilizing multiple DLTs in a hierarchical tree framework to learn independent traffic patterns for all intrusive attacks. Multimodel approaches, rather than single-model approaches, can improve intrusion detection rates. The effectiveness of the IDS was measured using three metrics: true positive rate, false positive rate, and accuracy. Their tests on the CICIDS2017 dataset showed that the time taken to build BDHDLS was significantly reduced, because the big data techniques use many machines in a parallel training strategy.

Haggag et al. [17] devised min-max normalization, SMOTE (synthetic minority oversampling technique), and FFN (fast fusion networks) for intrusion identification. Their suggested DLS-IDS approach has four major blocks: dataset selection, dataset preprocessing, class imbalance solutions, and Apache Spark model training. Preprocessing consists of two steps: feature preparation and feature scaling. All features were normalized using min-max scaling, and SMOTE handled class imbalance. Spark had three primary components: driver, cluster manager, and worker. FFNs were trained using multilayer perceptron (MLP), RNN (recurrent neural network), and LSTM models. The NSL-KDD dataset was used to compare the computation delays of the Apache Spark implementation with those of a conventional implementation. IDS performance was measured in terms of accuracy and precision, and their modified model also improved attack recognition rates.

Yin et al. [18] proposed an RNN-based DLT, called RNN-IDS, for intrusion detection. Their study examined how the number of neurons and the learning rate affect the model's binary and multiclass classification performance. The main steps of the proposed study were preprocessing and classification. The first stage used numericalization and normalization techniques. The proposed RNN-IDS model had two parts: forward propagation, which computed the outputs, and BP, which propagated residuals back to update the weights, as in standard NN training. The work compared the proposed scheme with J48, ANNs, RFs, SVMs, and other MLTs, and the results showed that the RNN-IDS model enhanced IDS recognition rates.

Mighan and Kahani [19] proposed the use of MLP, SAE (stacked autoencoder), DT, and SVM for their IDS. Their hybrid scheme combined DLTs with MLTs. The suggested approach had four processes: (1) data preprocessing, (2) latent feature extraction, (3) threat classification, and (4) decision. In the first step, data were normalized by min-max normalization, which removed the dimensional effects of each attribute. Second, the SAE extracted latent features, i.e., features inferred from other variables instead of observed directly. Third, attack classification used the latent features extracted by DLTs in a big data framework. Fourth, DTs were trained on the datasets to reduce false-positive attack detections before SVMs were trained for classification. Following SAE feature extraction, classification-based
intrusion detection algorithms such as SVMs, RFs, DTs, and NBs were utilized to detect intrusions in huge volumes of network traffic data quickly. The performance of the proposed SAE-SVM was satisfactory.

Lopez-Martin et al. [20] conducted intrusion classification in an IoT context using a conditional VAE (CVAE). This was reported to be the first IDS to perform feature reconstruction using a CVAE. The NSL-KDD dataset was used for the studies, and the authors reported that the model outperforms well-known methods such as linear SVM, random forest, and multilayer perceptron in terms of classification accuracy.

Wu et al. [21] proposed an edge computing and multitask-driven architecture with which fast object recognition and image enhancement tasks can be completed. To encrypt medical images and protect patient privacy and the healthcare environment, Wu et al. [22] presented a content-aware deoxyribonucleic acid (DNA) computing system. A two-stage DL model for effective NIDS was proposed by Khan et al. [23] using a stacked autoencoder with softmax for classification. It was demonstrated that the model could extract useful feature representations from huge amounts of data.

Keserwani et al. [24] introduced an anomaly-based cloud intrusion detection system (IDS) to discover intrusions in a cloud network. The method employs a deep learning strategy for classification and a hybrid metaheuristic algorithm for feature selection. A combination of the crow search algorithm (CSA) and grey wolf optimization (GWO) extracts pertinent characteristics from the cloud network connection so that the deep learning classifier can process them more efficiently. For classification, a deep sparse autoencoder (DSAE) is used. Accuracy, precision, recall or detection rate (DR), and F1-score are the parameters considered in the performance comparison.

Sharing sensitive information online has expanded with the development of network-based services, putting network security at risk. The number of attacks and intruders is always increasing, making detection increasingly difficult. Manual labeling of audit data is time-consuming, expensive, and laborious. It is therefore crucial to identify the significant characteristics of network traffic so that an intrusion detection classifier achieves better performance, since the ability to identify important inputs can reduce model size and training time and increase accuracy. To determine the feature subset that best increases classification accuracy, this work searched the feature space using the grey wolf optimizer, a swarm-based optimization approach.

3. Proposed Methodology

The proposed work includes preprocessing and feature extraction, feature selection, classification, and results evaluation. EMS-DHPN is developed with a unique hierarchical progressive network structure for detecting current attacks. EMS-DHPN has three layers. The first layer integrates the complicated features within each traffic flow using a multimodal fusion method based on the MDAE. The second layer uses AB-LSTM to acquire temporal information between traffic flows. The preprocessing and feature extraction module, which splits complicated characteristics from traffic data, is described first, followed by the feature selection step used to measure accuracy in the search space and locate the ideal solution. Figure 1 displays the proposed EMS-DHPN detection approach.

3.1. Preprocessing and Feature Extraction. The preprocessing and feature extraction module extracts features from traffic data. First, the traffic database containing previous network behavior is acquired. A connection record describes a succession of TCP packets from a source to a destination and is defined as $F = (f_1, f_2, \ldots, f_n)$, in which $f$ denotes a feature and $n$ is the number of elements in each connection record.

Figure 2 depicts the categorization of each record's features into packet, traffic, and general features. Segmentation ensured that the sequential relationships between records were preserved and that the record count did not change. Then, for each record, several feature groups are obtained, each one a vector: $F_{\text{groups}} = \{F_1, F_2, \ldots, F_m\}$, where $m$ is the number of feature groups.

The data process divides the large feature vector into smaller feature vectors to reduce feature complexity, instead of concatenating them all together as in the simple method. The dividing rule is also adaptable to other monitoring tools and feature views, because data features vary between network data monitoring technologies. The UNSW-NB15 and CICIDS 2017 evaluation datasets had 2 and 3 categories of characteristics, respectively.

3.2. Feature Selection. FS is the procedure of reducing the inputs to the most significant ones for processing and analysis. FS approaches are used to choose salient characteristics on the basis of accuracy in the search space and to locate the ideal solution. Optimization using GWO and IGWO is detailed below.

3.2.1. GWOs. GWO replicates the hunting style of a grey wolf pack. Grey wolves live in packs of 5 to 12 with a strict four-level social hierarchy, and GWO simulates this leadership structure using four levels: alpha, beta, delta, and omega. The primary responsibility of the alpha is to make decisions (e.g., about hunting, sleeping place, and wake-up time). The beta helps the alpha make decisions and provides input. Delta wolves act as scouts, sentinels, or hunters. Omega wolves are controlled by alpha and beta wolves and must obey all wolves.

The dominant wolves in the group are termed alpha (α). The alpha wolves' subordinates are the beta wolves (β). Omega (ω) wolves are the lowest ranking: they must submit to all other wolves and eat last. The remaining wolves are termed delta (δ); they submit to alpha and beta wolves but dominate omega wolves [25]. In GWO, α, β, and δ direct the hunting procedure, and ω wolves follow them.

GWO's encircling behavior is modeled as follows:

$$\vec{X}(t+1) = \vec{X}_p(t) + \vec{A}\cdot\vec{D}. \tag{1}$$
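To make Equation (1) concrete, together with the coefficients of Equations (2) to (4) and the leader distances of Equation (8) that follow it, the sketch below runs a binary GWO feature search in NumPy. It is a minimal illustration under stated assumptions, not the authors' implementation. The averaging of the three leader-guided positions is the standard GWO rule (those equations are not reproduced in this extract). The nonlinear control parameter and the sigmoid binarization with rand fixed at 0.5 borrow Equations (12) to (14) from Section 3.2.2, while the IGWO master-wolf learning step is omitted. The fitness function `toy_fitness` is hypothetical and stands in for classifier accuracy. Because $\vec{A}$ is drawn uniformly from $[-a, a]$, the usual GWO form $\vec{X}_p - \vec{A}\cdot\vec{D}$ used here and the $+$ form of Equation (1) produce the same distribution of moves.

```python
import numpy as np

def sigmoid_transfer(x):
    # Equation (13): steep sigmoid centred at 0.5
    return 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))

def toy_fitness(mask, relevant):
    # Hypothetical fitness (assumption): reward picking the "relevant"
    # features and penalize subset size. A real NIDS would use the
    # classifier's accuracy on a validation split instead.
    if mask.sum() == 0:
        return 0.0
    hits = np.sum(mask & relevant)
    return hits / relevant.sum() - 0.01 * mask.sum()

def binary_gwo(n_features, fitness, n_wolves=12, max_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Continuous positions in [0, 1]; binarized to feature masks.
    X = rng.random((n_wolves, n_features))
    masks = X > 0.5
    best_mask, best_fit = None, -np.inf
    for t in range(max_iter):
        fits = np.array([fitness(m) for m in masks])
        order = np.argsort(-fits)  # rank wolves by descending fitness
        if fits[order[0]] > best_fit:
            best_fit, best_mask = fits[order[0]], masks[order[0]].copy()
        leaders = X[order[:3]]  # alpha, beta, delta positions
        # Equation (14): nonlinear control parameter decreasing from 2
        a = 2.0 * (1.0 - t**2 / max_iter**2)
        new_X = np.empty_like(X)
        for i in range(n_wolves):
            guided = []
            for leader in leaders:
                A = 2.0 * a * rng.random(n_features) - a  # Eq. (3)
                C = 2.0 * rng.random(n_features)          # Eq. (4)
                D = np.abs(C * leader - X[i])             # Eq. (8)
                guided.append(leader - A * D)
            new_X[i] = np.mean(guided, axis=0)  # standard GWO averaging
        X = np.clip(new_X, 0.0, 1.0)
        # Equation (12) with rand fixed at 0.5
        masks = sigmoid_transfer(X) > 0.5
    return best_mask, best_fit
```

For example, with 20 candidate features of which the first five are (hypothetically) informative, `binary_gwo(20, lambda m: toy_fitness(m, relevant))` returns a boolean feature mask and its fitness. Swapping `toy_fitness` for a cross-validated classifier score turns the sketch into a wrapper-style FS loop.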
Figure 1: Proposed EMS-DHPN framework (stages: multimode deep autoencoder (MDAE); attention-based long short-term memory (AB-LSTM); results evaluation).

Here, $\vec{A}$ and $\vec{C}$ denote coefficient vectors, $\vec{X}_p$ denotes the prey's location vector, $\vec{X}$ is the location of a wolf in a $d$-dimensional space, where $d$ is the number of variables, $t$ is the iteration number, and $\vec{D}$ is defined as

$$\vec{D} = \left|\vec{C}\cdot\vec{X}_p(t) - \vec{X}(t)\right|. \tag{2}$$

Here, $\vec{A}$ and $\vec{C}$ are defined as

$$\vec{A} = 2\vec{a}\cdot\vec{r}_1 - \vec{a}, \tag{3}$$

$$\vec{C} = 2\cdot\vec{r}_2, \tag{4}$$

where $\vec{r}_1$ and $\vec{r}_2$ are random vectors in $[0, 1]$.

Figure 2: Categorization of the extracted features into packet, traffic, and general features (multi-features).

Here, $\vec{x}_1$, $\vec{x}_2$, and $\vec{x}_3$ denote the three best wolves in the population at a given iteration $t$, and $\vec{A}_1$, $\vec{A}_2$, and $\vec{A}_3$ are evaluated as in Equation (3). $\vec{D}_\alpha$, $\vec{D}_\beta$, and $\vec{D}_\delta$ are computed as follows:

$$\vec{D}_\alpha = \left|\vec{C}_1\cdot\vec{X}_\alpha - \vec{X}\right|, \quad \vec{D}_\beta = \left|\vec{C}_2\cdot\vec{X}_\beta - \vec{X}\right|, \quad \vec{D}_\delta = \left|\vec{C}_3\cdot\vec{X}_\delta - \vec{X}\right|, \tag{8}$$

in which $\vec{C}_1$, $\vec{C}_2$, and $\vec{C}_3$ are evaluated as in Equation (4).

3.2.2. IGWOs. GWO uses the best three solutions to update each wolf's location. Omega wolves make up most of the population and are less fit than alpha, beta, and delta wolves, so realigning these weaker wolves can increase GWO's diversification capacity and find better results. In the proposed IGWO, the wolves of each generation are ranked by fitness and divided into two groups: enhanced grey wolves and grey wolves. Each enhanced grey wolf has a master wolf from which it learns using the equations given below:
Here, $x_n$ is the continuous solution and $A_4$ is identified in Equation (3).

$$x^{I}_{d}(t+1) = \begin{cases} 1, & \text{if } I(x_n) > \text{rand},\\ 0, & \text{otherwise}, \end{cases} \tag{12}$$

in which $x^{I}_{d}$ is the new binary solution for an improved wolf in dimension $d$, $\text{rand} \in [0, 1]$, and $I$ is the sigmoid function

$$I(x) = \frac{1}{1 + \exp\left(-10(x - 0.5)\right)}. \tag{13}$$

Equation (12) is used to update the locations of all wolves; however, rand is now fixed at 0.5. Exploration uses the nonlinear control parameter

$$a = 2\left(1 - \frac{t^2}{T^2}\right), \tag{14}$$

where $T$ is the maximum number of iterations. The flow diagram of IGWO is depicted in Figure 3.

3.3. Classification. Classification is the intrusion detection procedure applied to a particular feature subset. The classes are also known as targets or labels. Classification is supervised learning with labels and input data (the selected features). EMS-DHPN stands for enhanced multimodal sequential intrusion detection. This section has three subsections:

(1) Multimodal fusion model
(2) Sequential learning model
(3) Multimodal real-time model

3.3.1. Multimodal Fusion Model. MDAEs can be used in IDSs through multimodal learning technology [27]. MDAEs assume that the associations between traffic flow features are varied and complementary. The architecture of MDAEs is depicted in Figure 4. The number of input channels in the input layer is determined by the specified feature groups. Gaussian restricted Boltzmann machines were used as intermediate layers for estimating the distributions of the input units. The goal is to learn the final consensus representation $F' = \{F_{\text{joint}}\}$ given $m$ feature groups $F_{\text{groups}} = \{F_1, F_2, \ldots, F_m\}$.

As demonstrated in Figure 4, MDAE model training consists of forward encoding and backward decoding. Forward encoding computes the initial joint representation values and fuses the multiple features. Backward decoding adjusts the weight matrices based on reconstruction errors. RBMs (restricted Boltzmann machines) are undirected graphical models with two layers, input (visible) and hidden, where the number of hidden neurons varies between 10 and 120 depending on the inputs. The joint distribution $P(v, h)$ is defined via an energy function:

$$P(v, h) = \frac{\exp\left(-E(v, h)\right)}{Z}, \qquad E(v, h) = \frac{1}{2\sigma^2} v^\top v - \frac{1}{\sigma^2}\left(c^\top v + b^\top h + h^\top W v\right), \tag{15}$$

where $Z$ is the normalization constant, $E(v, h)$ is the energy function, $\sigma$ is a hyperparameter, $W$ is the weight matrix between the visible and hidden layers, and $c$ and $b$ are the biases of the visible and hidden layers, respectively. The conditional probability distributions of the Gaussian RBM are

$$P(h_i = 1 \mid v) = \operatorname{sigmoid}\left(\frac{(Wv + b)_i}{\sigma^2}\right), \qquad P(v_i \mid h) = \mathcal{N}\left((W^\top h + c)_i,\ \sigma^2\right). \tag{16}$$

The contrastive divergence algorithm trains Gaussian RBMs, and the RBM parameters $\theta = (W, b, c)$ can be obtained using the learning rule

$$\Delta W = E_{\text{data}}\left[h v^\top\right] - E_{\text{model}}\left[h v^\top\right], \tag{17}$$

where $E_{\text{data}}$ is the expectation over the training data and $E_{\text{model}}$ is the expectation over the data
Figure 3: IGWO flow diagram (start; place the wolves in descending order of fitness and make the lower half learn from the top half; update Xα, Xβ, Xδ; repeat until the end of iterations).

Figure 4: MDAE architecture: (a) individual Gaussian RBM; (b) upper RBM; (c) encoder unfolded into the MDAE (multi-input channels F1, F2, ..., Fm; intermediate layer of Gaussian RBMs; joint layer with upper RBM interpreter and joint representation F_joint; forward encoder, back decoder, and reconstructed outputs; weights W and W^T).
generated by the RBM model. Backward decoding involves unfolding the forward stacked RBMs into a deeper autoencoder with multiple inputs and outputs. Algorithm 1 describes forward encoding and its reverse decoding.

Table 2: Confusion matrix.

                     Predicted positive   Predicted negative
  Actual positive    Tp                   Fn
  Actual negative    Fp                   Tn

3.3.2. Sequential Learning Model. The AB-LSTM model has two layers, an LSTM layer and an attention layer, which together learn to handle more complex sequences and relationships. LSTM RNNs have three nonlinear gating units, namely, the forget, input, and output gates [28]. LSTM memory cells decide which new data to store and which old data to discard. This work adds an attention layer to the LSTM outputs so that the model focuses on significant data; in AB-LSTM, this attention layer also models long-term dependencies, which helps in detecting intruders. Equation (18) is the input gate, which controls how much new memory is written. Equation (19) is the forget gate, which controls how much memory is retained. Last but not least, Equation (22) governs the memory output by the output gate, while the LSTM computes the control (hidden) states $h_i$ and cell states $c_i$:

$$i_i = \sigma\left(W_i \cdot [h_{i-1}, x_i] + b_i\right), \tag{18}$$

$$f_i = \sigma\left(W_f \cdot [h_{i-1}, x_i] + b_f\right), \tag{19}$$

where $\sigma$ denotes the sigmoid function.

Table 3: Comparative experimental results of the classifiers on the intrusion detection datasets.

  Dataset       Method          Precision (%)  Recall (%)  F-measure (%)  Accuracy (%)  FAR (%)
  UNSW-NB15     PSO-FLN         79.9           80.1        80.8           81.8          0.9
                MS-DHPN         84.4           86.2        85.3           86.2          0.7
                IGWO-EMS-DHPN   86.5           87.5        88.6           92.6          0.4
  CICIDS 2017   PSO-FLN         96.8           97.1        97.3           97.5          0.6
                MS-DHPN         98.2           98.1        98.1           98.6          0.3
                IGWO-EMS-DHPN   98.9           98.9        99.3           99.7          0.2

mance. They can only express part of the traffic data. To address this issue, EMS-DHPN was developed, and flexible MDAEs were proposed to make EMS-DHPN more usable. To classify network traffic, two-layered MDAEs are built in advance, and a softmax function is applied at the end of the AB-LSTM. The loss function computes the difference between the actual and predicted labels $y_t$. The cross-entropy loss function is used for binary classification:
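The binary cross-entropy loss itself, together with Equations (20) to (22) for the remaining LSTM terms, is not reproduced in this extract. The NumPy sketch below therefore illustrates the sequential stage under stated assumptions rather than the authors' exact model. It uses the input and forget gates of Equations (18) and (19), the standard LSTM output gate and cell update, and a simple learned-vector attention pooling over time steps, which is one common attention variant and may differ from the paper's. The classifier ends in a single sigmoid unit, which is equivalent to a two-class softmax, trained with the standard binary cross-entropy. All function and parameter names (`lstm_step`, `attention_pool`, `w_att`, `w_out`, and so on) are hypothetical, and the weights are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    # Concatenate [h_{i-1}, x_i] as in Equations (18) and (19)
    hx = np.concatenate([h_prev, x])
    i = sigmoid(p["W_i"] @ hx + p["b_i"])   # input gate, Eq. (18)
    f = sigmoid(p["W_f"] @ hx + p["b_f"])   # forget gate, Eq. (19)
    o = sigmoid(p["W_o"] @ hx + p["b_o"])   # output gate (standard form)
    g = np.tanh(p["W_c"] @ hx + p["b_c"])   # candidate memory (standard form)
    c = f * c_prev + i * g                  # cell state c_i
    h = o * np.tanh(c)                      # control (hidden) state h_i
    return h, c

def attention_pool(H, w):
    # H: (T, hidden). Score each step, softmax over time, weighted sum.
    scores = H @ w
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha

def binary_cross_entropy(y, p_hat, eps=1e-12):
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat)))

def init_params(n_in, n_hidden, rng):
    p = {}
    for gate in ("i", "f", "o", "c"):
        p[f"W_{gate}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden + n_in))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    p["w_att"] = rng.normal(0.0, 0.1, n_hidden)
    p["w_out"] = rng.normal(0.0, 0.1, n_hidden)
    p["b_out"] = 0.0
    return p

def predict_attack_prob(seq, p):
    # seq: (T, n_in) sequence of fused per-connection representations
    n_hidden = p["b_i"].shape[0]
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    states = []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    context, alpha = attention_pool(np.stack(states), p["w_att"])
    return sigmoid(p["w_out"] @ context + p["b_out"]), alpha
```

In use, each sequence would hold the MDAE joint representations of consecutive connections. For instance, `predict_attack_prob(seq, init_params(8, 16, np.random.default_rng(0)))` on a `(5, 8)` array returns an attack probability and five attention weights that sum to one. The weights would then be fitted by minimizing `binary_cross_entropy` over labeled sequences.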
Figure 5: Precision (%) of PSO-FLN, MS-DHPN, and IGWO-EMS-DHPN on UNSW-NB15 and CICIDS 2017.

Figure 6: Recall performance comparison in various classification methods.

Figure 7: F-measure (%) of PSO-FLN, MS-DHPN, and IGWO-EMS-DHPN on UNSW-NB15 and CICIDS 2017.

Figure 8: Accuracy performance comparison in various classification methods.
The confusion matrix in Table 2 determines the metrics used for two-class classification. Its two rows and two columns give the numbers of Tp, Fn, Fp, and Tn. TP is the number of attack records correctly classified as attacks, and TN is the number of normal records classified correctly. FP is the number of normal records incorrectly classified as attacks, and FN is the number of attack records incorrectly classified as normal. Using these four counts, the following metrics were computed to examine the efficiency of the classifiers.

4.1.1. Accuracy. Accuracy is the ratio of correctly classified records to all records.

$$\text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}. \tag{27}$$

4.1.2. Precision. Precision is the ratio of correctly recognized threat records to all records detected as threats.

$$\text{Precision} = \frac{T_p}{T_p + F_p}. \tag{28}$$
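As a quick check of how the confusion-matrix counts feed the metrics of Sections 4.1.1 to 4.1.5 (Equations (27) to (31)), the sketch below tallies Tp, Tn, Fp, and Fn from label lists and computes accuracy, precision, recall, F1-score, and FAR. It is a minimal plain-Python illustration, not the evaluation code used for Table 3. It assumes label 1 marks an attack (positive) and label 0 a normal record, and it computes FAR with the fall-out definition given in Section 4.1.5 (false positives divided by all actual negatives, $F_p/(F_p + T_n)$). Returning 0.0 when a denominator is zero is a design choice of this sketch.

```python
def confusion_counts(y_true, y_pred):
    # 1 = attack (positive), 0 = normal (negative), as in Table 2
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def nids_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)        # Eq. (27)
    precision = tp / (tp + fp) if tp + fp else 0.0    # Eq. (28)
    recall = tp / (tp + fn) if tp + fn else 0.0       # Eq. (29)
    f1 = (2 * recall * precision / (recall + precision)
          if recall + precision else 0.0)             # Eq. (30)
    far = fp / (fp + tn) if fp + tn else 0.0          # Eq. (31), fall-out
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "far": far}

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)  # (3, 5, 1, 1)
print(nids_metrics(*counts))
```

On this toy example, three of four attacks are caught and one of six normal records is flagged, so accuracy is 0.8, precision, recall, and F1 are all 0.75, and FAR is 1/6.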
Figure 9: FAR (%) of PSO-FLN, MS-DHPN, and IGWO-EMS-DHPN (UNSW-NB15: 0.9, 0.7, 0.4; CICIDS 2017: 0.6, 0.3, 0.2).
4.1.3. Recall. Recall is the ratio of correctly recognized threat records to all threat records; it is also known as the true positive rate (TPR).

$$\text{Recall} = \frac{T_p}{T_p + F_n}. \tag{29}$$

4.1.4. F1-Score or F-Measure. The F1-score or F-measure is the harmonic mean of precision and recall.

$$\text{F1-score (or F-measure)} = \frac{2\,(\text{Recall} \times \text{Precision})}{\text{Recall} + \text{Precision}}. \tag{30}$$

4.1.5. False Alarm Rate. The FAR, also known as the false positive ratio, fall-out ratio, or false alarm ratio, measures the probability of incorrectly rejecting the null hypothesis for a given test, i.e., of flagging a normal record as an attack.

$$\text{FAR} = \frac{F_p}{F_p + T_n}. \tag{31}$$

4.2. Result Comparison. Table 3 lists the comparative experimental results obtained by the classifiers on the intrusion detection datasets.

4.2.1. Precision Result Comparison. As shown in Figure 5, the proposed IGWO-EMS-DHPN is compared with the existing methods PSO-FLN and MS-DHPN in terms of precision. The proposed approach achieves higher precision, 86.5% and 98.9% on the two datasets, whereas PSO-FLN (79.9% and 96.8%) and MS-DHPN (84.4% and 98.2%) achieve lower precision. Thus, the proposed approach is valuable and practical for recognizing short-term threats.

4.2.2. Recall Result Comparison. Figure 6 depicts the recall of the proposed IGWO-EMS-DHPN and of other classification methods, namely PSO-FLN and MS-DHPN. IGWO-EMS-DHPN achieves recall rates of 87.5% and 98.9% on the two datasets, indicating a high detection rate, whereas the traditional techniques PSO-FLN (80.1% and 97.1%) and MS-DHPN (86.2% and 98.1%) provide lower recall rates.

4.2.3. F-Measure Result Comparison. Figure 7 shows F-measure comparisons between the proposed IGWO-EMS-DHPN and traditional approaches, namely PSO-FLN and MS-DHPN. IGWO-EMS-DHPN achieves the highest F-measure, 88.6% and 99.3%, indicating excellent attack detection. PSO-FLN and MS-DHPN provide F-measure rates of 80.8% and 97.3% and 85.3% and 98.1%, respectively, so the proposed work provides better attack detection results than the previous techniques on both datasets.

4.2.4. Accuracy Result Comparison. Figure 8 compares the accuracy of various classification techniques, including the proposed IGWO-EMS-DHPN and the current methods PSO-FLN and MS-DHPN. According to the graph, the proposed method has higher accuracy than the previous techniques: IGWO-EMS-DHPN detects attacks accurately, with accuracy rates of 92.6% and 99.7% on the two datasets, whereas PSO-FLN and MS-DHPN provide lower accuracies of 81.8% and 97.5% and 86.2% and 98.6%, respectively. The experiments demonstrated that the proposed system outperforms the conventional techniques.

4.2.5. FAR Result Comparison. Figure 9 compares the FAR of various classification techniques, including the proposed IGWO-EMS-DHPN and the current methods PSO-FLN and MS-DHPN. According to the graph,
the proposed method has a lower FAR than the previous techniques. The suggested IGWO-EMS-DHPN is therefore an effective method of accurately detecting attacks, with lower FARs on both datasets.
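The metrics defined in Section 4.1 can be computed directly from confusion-matrix counts. The following Python sketch is illustrative rather than the authors' evaluation code: the names `Confusion` and `evaluation_metrics` and the example counts are hypothetical, accuracy uses the standard (TP + TN)/(TP + TN + FP + FN) definition, and FAR is computed with the conventional false-positive-rate (fall-out) definition FP/(FP + TN), which is an assumption here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for binary confusion-matrix counts (attack = positive)."""
    tp: int  # attack records correctly flagged as attacks
    tn: int  # normal records correctly passed as normal
    fp: int  # normal records wrongly flagged as attacks (false alarms)
    fn: int  # attack records the detector missed


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluation_metrics(c: Confusion) -> Dict[str, float]:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)  # Eq. (29): true positive rate
    f1 = safe_div(2 * recall * precision, recall + precision)  # Eq. (30): harmonic mean
    far = safe_div(c.fp, c.fp + c.tn)  # assumed fall-out definition FP / (FP + TN)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    return {"precision": precision, "recall": recall, "f1": f1,
            "far": far, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not results from the paper.
    metrics = evaluation_metrics(Confusion(tp=90, tn=880, fp=20, fn=10))
    for name, value in metrics.items():
        print(f"{name:>9}: {value:.4f}")
```

For these hypothetical counts, recall is 0.9 and the F1-score is 6/7 (about 0.857), which agrees with the equivalent closed form 2TP/(2TP + FP + FN). Returning 0.0 on a zero denominator is a design choice for degenerate cases, such as a test split in which one class never appears.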
5. Conclusion and Future Work

The proposed work uses MDAE and AB-LSTM learning to classify attacks (or intrusions) on the UNSW-NB15 and CICIDS 2017 datasets. The four steps of the proposed IGWO-EMS-DHPN framework are as follows. The preprocessing and feature extraction stages can be processed separately. In the subsequent feature selection step, IGWO narrows the search field and locates the best solution. Finally, MDAE and AB-LSTM enable EMS-DHPN, with which multiple levels of the selected features of a network connection can be integrated efficiently while temporal information is learned across nearby network connections. The IGWO-EMS-DHPN technique was developed for attack detection and tested on two intrusion datasets, and the results of several experiments were compared with those of other methodologies in terms of accuracy, precision, recall, and F-measure. Compared with the existing PSO-FLN and MS-DHPN, the suggested IGWO-EMS-DHPN achieves high accuracy rates of 92.6% and 99.7% on the two datasets. Future work will build a dedicated traffic-gathering system, collect new attacks with which to validate the suggested model, and study data multimodality in information security from a more fundamental perspective to increase the intrusion recognition rate.
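The feature selection step is summarized here only at a high level, and the modifications that distinguish IGWO from the standard algorithm are not restated in this section. As an illustration of the general idea only, the following Python sketch applies the standard grey wolf optimizer update rules (alpha, beta, and delta leaders; control parameter a decreasing linearly from 2 to 0) to binary feature selection through a sigmoid transfer function. It is not the authors' IGWO: the names `gwo_select` and `to_mask`, the fitness weighting of 0.99 on error and 0.01 on subset size, the clamp range, and the toy error function are all hypothetical choices. In practice, `error_fn` would train and validate a classifier on the selected columns.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]
LIMIT = 6.0  # hypothetical clamp on continuous positions, keeps the sigmoid away from 0 and 1


def clamp(x: float) -> float:
    return max(-LIMIT, min(LIMIT, x))


def to_mask(position: Sequence[float], rng: random.Random) -> Mask:
    """Sigmoid transfer: feature j is selected with probability sigmoid(position[j])."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]


def gwo_select(error_fn: Callable[[Mask], float], n_features: int, n_wolves: int = 10,
               n_iter: int = 50, weight: float = 0.99, seed: int = 0) -> Tuple[Mask, float]:
    """Binary feature selection with the standard GWO (not the paper's improved variant)."""
    rng = random.Random(seed)

    def fitness(mask: Mask) -> float:
        # Weighted sum of error and subset size; an empty subset is treated as invalid.
        if not any(mask):
            return float("inf")
        return weight * error_fn(mask) + (1.0 - weight) * sum(mask) / n_features

    positions = [[rng.uniform(-LIMIT, LIMIT) for _ in range(n_features)] for _ in range(n_wolves)]
    masks = [to_mask(p, rng) for p in positions]
    scores = [fitness(m) for m in masks]
    first = min(range(n_wolves), key=lambda i: scores[i])
    best_mask, best_score = masks[first][:], scores[first]

    for t in range(n_iter):
        # Alpha, beta, and delta: the three fittest wolves of the current pack.
        ranked = sorted(range(n_wolves), key=lambda i: scores[i])
        leaders = [positions[i][:] for i in ranked[:3]]
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0

        for i in range(n_wolves):
            new_position = []
            for j in range(n_features):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - positions[i][j])
                    total += leader[j] - A * D  # X_k = X_leader - A * D
                new_position.append(clamp(total / 3.0))  # X(t+1) = (X1 + X2 + X3) / 3
            positions[i] = new_position
            masks[i] = to_mask(new_position, rng)
            scores[i] = fitness(masks[i])
            if scores[i] < best_score:
                best_mask, best_score = masks[i][:], scores[i]

    return best_mask, best_score


if __name__ == "__main__":
    RELEVANT = {0, 3, 5}  # hypothetical "useful" features for a toy error function

    def toy_error(mask: Mask) -> float:
        # Stand-in for validation error: fraction of the relevant features that were dropped.
        return sum(1 for j in RELEVANT if not mask[j]) / len(RELEVANT)

    mask, score = gwo_select(toy_error, n_features=10)
    print("selected:", [j for j, bit in enumerate(mask) if bit], "fitness:", round(score, 4))
```

Weighting the error far above the subset-size term means a smaller subset is preferred only when it does not cost accuracy, which mirrors the goal of minimizing features while preserving detection performance.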
Data Availability

The CICIDS2017 dataset used to support the study's conclusions has been deposited in the Kaggle repository (10.1109/ACCESS.2020.3009843/https://www.kaggle.com/datasets/cicdataset/cicids2017). The UNSW-NB15 dataset used to support the findings of this study has been deposited in the UNSW-NB15 dataset repository (10.1109/MilCIS.2015.7348942).
Conflicts of Interest

The authors declare that they have no conflicts of interest.
menting a deep learning model for intrusion detection on
References apache spark platform,” IEEE Access, vol. 8, pp. 163660–
163672, 2020.
[1] L. Li, Y. Yu, S. Bai, Y. Hou, and X. Chen, “An effective two-step [18] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for
intrusion detection approach based on binary classification intrusion detection using recurrent neural networks,” Ieee
and k-NN,” IEEE Access, vol. 6, pp. 12060–12073, 2018. Access, vol. 5, pp. 21954–21961, 2017.
[2] G. Kim, S. Lee, and S. Kim, “A novel hybrid intrusion detection [19] S. N. Mighan and M. Kahani, “A novel scalable intrusion
method integrating anomaly detection with misuse detection,” detection system based on deep learning,” International
Expert Systems with Applications, vol. 41, no. 4, pp. 1690–1700, Journal of Information Security, vol. 20, no. 3, pp. 387–403,
2014. 2021.
[3] S. Aljawarneh, M. Aldwairi, and M. B. Yassein, “Anomaly- [20] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and
based intrusion detection system through feature selection J. Lloret, “Conditional variational autoencoder for prediction
analysis and building hybrid efficient model,” Journal of Com- and feature recovery applied to intrusion detection in IoT,”
putational Science, vol. 25, pp. 152–160, 2018. Sensors, vol. 17, no. 9, p. 1967, 2017.
Wireless Communications and Mobile Computing 13