QT Proposal

The document proposes a multi-stage malware detection model using deep learning and a compiler. It discusses the increasing problem of malware attacks and limitations of existing detection techniques. The proposed model has two stages: (1) a deep learning stage that uses a CNN model to detect malware from grayscale images of malware samples, and (2) a compiler-based stage that validates the first stage's detections and removes any remaining malware using symbol conversion tools. The model is expected to provide more accurate malware detection compared to current methods by leveraging both deep learning and compiler analysis in a multi-stage approach.

Uploaded by

sajadul.cse

A proposal submitted in partial fulfilment of the requirements

of Qualifying Defence (QT)

For the degree of Doctor of Philosophy in Science and Technology

Title: Multi-stage Malware Detection Model using Deep Learning Algorithm and Compiler

Prepared By

Mohammad Sarwar Hossain Mollah

Metric: 4191418

Prepared for

DR. MOHD FADZLI BIN MARHUSIN


DR. SYARIL NIZAM OMAR

10th November 2023

Table of Contents
List of Figures
List of Tables
Abstract
Chapter 1
1.1 Introduction
1.2 Background
1.3 Problem Statement
1.4 Objective(s) of the Research
1.5 Research Questions
Chapter 2
2.1 Systematic Literature Review
2.2 Literature Review
2.3 Further Direction
2.4 Theoretical Framework
Chapter 3
3.1 Methodology
3.1.1 Dataset
3.1.2 Deep Learning base stage (stage-1)
3.1.3 Compiler base stage (stage-2)
3.1.4 Proposed multi-stage malware detection model
Chapter 4
4.1 Expected Outcome
Chapter 5
5.1 Conclusion
References

List of Figures
Fig. 1.1 A typical Convolutional Neural Network
Fig. 2.1 Process of Literature Searching
Fig. 2.2 The proposed Theoretical Framework
Fig. 3.1 Transformation of malware binary to grayscale image
Fig. 3.2 Variants of 6 malware families (Dontovo.A, Autorun.K, Lolyda.AT, Adialer.C, Swizzor.gen!I, Agent.FYI) and benign samples
Fig. 3.3 Different sections of a Trojan malware: Dontovo.A
Fig. 3.4 1st stage of the proposed model
Fig. 3.5 2nd stage of the proposed model
Fig. 3.6 (a) Training and validation accuracy of 1st stage of the proposed model
Fig. 3.6 (b) Training and validation loss of 1st stage of the proposed model
Fig. 3.7 Proposed Multistage Malware Detection Model
Fig. 3.8 Flow chart for implementing the proposed malware detection model

List of Tables
Table 2.1: The research paper for the systematic review studies
Table 2.2: Related studies were found using online sources.
Table 2.3: The inclusion and exclusion criteria
Table 2.4: Journal and conference names, as well as the volume of research published.
Table 2.5: An overview of the evaluation techniques.
Table 3.1: Malware Dataset of 25 Families and Benign Samples.
Table 4.1: Overall expected outcome of the proposed malware detection technique

Abstract
The growing volume of malware and advanced cyber-attacks is becoming a severe problem for financial and defence systems. From individual user devices to national critical infrastructure, systems are being compromised regularly, causing enormous economic losses and other damage. The use of open-source programs, Android apps and free software, together with user ignorance, are sources of cyber-attacks. At present, anti-malware software alone is not enough to detect all of these attacks, owing to the complexity of malware behaviour and the reliance on traditional detection techniques. More datasets (symbol-based and image-based) are needed for multistage analysis in malware detection. The main gap identified in the existing research is the lack of studies on multistage analysis that combines machine learning (with various types of datasets) and a compiler (symbol-based), although such analysis is essential for detection and execution protection. There is also a lack of performance evaluation between existing and proposed malware detection models. To fill this gap, the researcher proposes a novel multistage malware detection model using deep learning and a compiler. The proposed technique has two stages: the deep learning base stage (stage-1) and the compiler base stage (stage-2). In stage-1, malware represented as greyscale images from an available dataset (the Malimg dataset) will be fed into a Convolutional Neural Network (CNN) to train the model. Next, malware greyscale images will be provided to the trained CNN model to test its performance. After malware detection in the first stage, the same greyscale image will be converted into symbols using an image-to-symbol tool (IDA Pro and radare2) and fed to the second stage of the model. The second stage (compiler) will then detect the malware and remove it from the system. In this way, the second stage will validate the accuracy of the first stage and remove the malware from the system, which the first stage cannot do. Finally, the proposed model will be evaluated against existing malware detection models, and it is expected that the proposed model will provide more accurate results than current ones.

Chapter 1

1.1 Introduction
IoT and cloud computing are taking hold in every aspect of life, from business to health, education to manufacturing, and banking to government, including critical government infrastructure such as central banks, airports and utility services (power, gas, water, etc.). Unfortunately, the number of reported security breaches has grown considerably in recent years [1], and reports of attacks caused by malware are making headlines more than ever. The rapid pace of malware creation and the widespread use of complex techniques such as obfuscation, polymorphism and morphing make malware difficult to detect. Almost every week a new security vulnerability is reported, which may be seen as a failure of the security community to control and detect malicious content. According to the McAfee Labs report [2], the average number of malware attacks per minute was 588 in the third quarter of 2020 and grew to 648 in the fourth quarter of 2020. Meanwhile, the AV-TEST Institute [3] reported that the amount of malware increased significantly, from 719.15 million in 2017 to 1324.25 million in 2022. According to the Symantec report [4], digital security threats can emerge from new and unexpected sources. The number of malware variants has also grown considerably; one statistic shows that 78 percent of new variants were attributable to the Kotver Trojan (Trojan.Kotver).

Among other malware, ransomware and its variants have become the most dangerous form of attack nowadays and frequently appear in newspaper headlines. The Colonial Pipeline was the victim of a ransomware attack in May 2021. The attack infected some of the pipeline's digital systems, shutting it down for a week. The shutdown affected consumers and airlines along the East Coast. This caused the US government to declare a state of emergency, and the incident was covered by international newspapers [5]. The number of attacks on U.S. hospitals each year doubled between 2016 and 2021. "These are direct threats to patient safety," said a representative of the American Hospital Association, as reported by ABC News [6]. LockBit has emerged as the most prolific name in ransomware attacks and has been blamed for an incident that hit Royal Mail's international operations, as reported in the Guardian newspaper [7]. These statistics show that the pace of malware development is very high and that the complexity of malware is increasing day by day. Thus, the detection of malware using traditional approaches, namely rule-based, graph-based, entropy-based, etc., has become insufficient.

Toldinas, J. [8] proposes a novel approach for network intrusion detection using multistage deep learning image recognition. X. Li [9] introduced a detection model for multi-stage and low-frequency attacks using a multi-layer bidirectional long short-term memory (B-MLSTM) network. M. Injadat [10] proposes a novel multi-stage optimized ML-based NIDS framework that reduces computational complexity while maintaining its detection performance. To tackle this problem, machine learning combined with other technologies such as compilers [11], data mining [12-13] and a few more can provide good results in detecting malware [14]. Furthermore, deep learning techniques have become more dominant in recent times and have been shown to provide promising results [15-16], [14, 17], especially when paired with other technologies such as compilers [18-20], graph theory [21-24] and cryptography [25], to name a few, when employed in detecting malware. In the systematic literature review chapter, a total of 144 papers have been collected from reputed publishers and digital libraries such as IEEE Xplore, Elsevier and Springer. The systematic review of these research papers reveals that deep learning combined with other technologies provides strong performance in detecting malware. The literature review also provides direction for developing a new malware detection model. Moreover, the details of the many malware detection models employed in various fields will be highlighted in the systematic literature review chapter.

On the other hand, researchers have been continuously proposing new malware detection techniques. At present, it has become a race between malware writers and anti-malware creators. Therefore, continued research in this field will always stay relevant. In order to propose a new malware detection technique, it is necessary to undertake a systematic review of the existing research, which is provided in Chapter 2. This research proposal is organised as follows:
I. Chapter 1 (Introduction, Background, Problem Statement, Objectives and Research Questions)
II. Chapter 2 (Systematic Literature Review)
III. Chapter 3 (Methodology)
IV. Chapter 4 (Expected Outcome)
V. Chapter 5 (Conclusion)

1.2 Background
In this section, the researcher studies the background of the available malware detection techniques and their associated concepts. Before discussing the different malware detection techniques, it is necessary to discuss malware analysis techniques.

Malware Analysis: Malware analysis is a prerequisite for developing any malware detection model, including machine learning based models. There are two types of malware analysis, namely static malware analysis and dynamic malware analysis. In static malware analysis, various malware features are extracted, such as hash values, N-grams, opcodes, strings and PE header information. Tools such as PEview, PEiD, CFF Explorer, PsFile and disassemblers have been used to extract these features [26]. In dynamic malware analysis, features such as file system operations, API calls, system calls, registry key changes, process execution and network activities are extracted using tools such as Process Explorer, ProcMon, TCPdump, Regshot and sandboxes [26]. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have also been used for malware feature extraction and subsequently for malware detection [17]. The extracted static and dynamic malware features are then used for malware classification by machine learning algorithms and neural networks.
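As an illustration of the static features named above, the following is a minimal sketch in Python that computes a hash value, byte N-gram counts and printable strings from raw file bytes. The function names and the sample bytes are hypothetical and are used only for illustration; they are not taken from any of the tools cited above.

```python
import hashlib
import re
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the raw file bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping n-byte window in the file."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Extract runs of printable ASCII characters of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# Hypothetical sample bytes standing in for a real executable.
sample = b"MZ\x90\x00\x03\x00kernel32.dll\x00\x00CreateFileA\x00MZ"
features = {
    "sha256": sha256_hex(sample),
    "top_bigrams": byte_ngrams(sample, 2).most_common(3),
    "strings": printable_strings(sample),
}
```

A feature dictionary of this kind could then be turned into a numeric vector for a classifier; opcodes and PE header fields would need a disassembler or a PE parser and are not covered by this sketch.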

Malware Detection Techniques: Broadly, the following malware detection techniques have been followed: signature-based, behaviour-based, heuristic-based, etc.

Signature-based malware detection technique: This is a conventional static malware analysis technique that is very fast at detecting known malicious files compared with other techniques such as rule-based, graph matching and statistical entropy-based methods. In this technique, static malware analysis is used to collect the particular signature or pattern of a file. The signatures of known malware files are stored in a database and used to verify the signature of an unknown file. If the signature of the file matches, the file is declared malicious; otherwise it is declared benign. This means the technique cannot detect unknown malware, because the signature must be available in advance. Another problem is that it can easily be evaded by obfuscated malware [16]. Most antivirus software is implemented using signature-based techniques [27].
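The signature lookup described above can be sketched as follows. This is a simplified illustration that assumes a file's signature is simply its SHA-256 digest; real antivirus signatures are usually richer byte patterns, and the database contents here are hypothetical.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known malicious files.
KNOWN_MALWARE_SIGNATURES = {
    hashlib.sha256(b"known malicious payload").hexdigest(),
}


def classify_by_signature(data: bytes, signatures: set) -> str:
    """Label a file 'malicious' if its digest is in the database, else 'benign'."""
    digest = hashlib.sha256(data).hexdigest()
    return "malicious" if digest in signatures else "benign"


known = classify_by_signature(b"known malicious payload", KNOWN_MALWARE_SIGNATURES)
# A single changed byte yields a different digest, so the variant is missed.
variant = classify_by_signature(b"known malicious payload!", KNOWN_MALWARE_SIGNATURES)
```

The second call shows the weakness discussed in the text: a trivially modified or obfuscated sample no longer matches the stored signature and is reported as benign.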

Behaviour-based malware detection technique: In the behaviour-based approach, malware is detected based on the malicious activities it performs during or after execution. In this technique, dynamic malware analysis is used to collect various features such as API calls, browser events, system events and network events. Hooking the relevant APIs is critical for observing and logging meaningful API call sequences. The technique is very effective because it does not need the executable to be decrypted. Still, it has some drawbacks: it is time-intensive, resource-consuming and less scalable, and some malicious activity may go unobserved.

Heuristic-based malware detection technique: Heuristic techniques are an extended version of behaviour-based malware detection techniques [28]. Instead of looking for a perfect solution, heuristic strategies look for a quick solution that falls within an acceptable range of accuracy. Heuristics are used in machine learning (ML) and artificial intelligence (AI) when it is deemed impractical to solve a particular problem with a step-by-step algorithm. Because a heuristic approach emphasizes speed over accuracy, it is often combined with optimization algorithms to improve results. Similar behaviour is therefore used to train the malware detection system to detect new malware or variants of known malware. Additionally, the behaviour-based approach provides a way to handle obfuscated malware.

Machine Learning based Malware Detection Techniques: Due to the exponential growth of malware, it is very difficult for anti-virus companies and malware researchers to classify malware manually. Therefore, image-based automatic classification has become a suitable way to classify large volumes of malware. Machine learning, including deep learning used for image-based malware classification, helps discover patterns automatically from a large amount of data in order to predict the outcome of unknown observations based on previously identified patterns. In recent times, one of the major applications of Machine Learning (ML) has been malware detection and classification [29], [30], [26]. The following machine learning algorithms have been successfully applied in the field of malware detection: Hidden Markov Models, Support Vector Machines, k-nearest neighbours, random forests and convolutional neural networks [31]. A Hidden Markov Model (HMM) is a statistical model that is trained on a specific malware family and later used to distinguish samples of that family from non-infected files. Support Vector Machines (SVMs) are a class of supervised learning techniques used for binary classification. Feature reduction steps are performed before training and testing SVM models to obtain better results [14, 17, 28]. The k-Nearest Neighbour (k-NN) algorithm is a supervised learning technique that works on labelled training data. An important advantage of k-NN is that it involves no explicit training phase, because it classifies directly from the labelled training data. This algorithm is mostly useful when there is an imbalance in training samples. Random Forest (RF) is a collection, or ensemble, of decision tree classifiers that are run with randomisation a specified number of times [1]. Each run votes for a result given the training set passed through the classifier, and the final result is the one that receives the most votes.
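To make the k-NN description concrete, the following is a minimal sketch of classification by majority vote among the k nearest labelled samples. The two features (file entropy and fraction of suspicious API calls) and all values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled neighbours.

    `train` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples: (entropy, fraction of suspicious API calls).
training_set = [
    ((7.8, 0.9), "malware"),
    ((7.5, 0.8), "malware"),
    ((7.9, 0.7), "malware"),
    ((4.2, 0.1), "benign"),
    ((3.9, 0.2), "benign"),
    ((4.5, 0.0), "benign"),
]

label_a = knn_predict(training_set, (7.6, 0.85))
label_b = knn_predict(training_set, (4.0, 0.15))
```

Note that no model is fitted before prediction: the labelled data itself is consulted at query time, which is the "no training" property mentioned above. The same majority-vote step is also how a random forest combines the outputs of its individual trees.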

Deep Learning based Malware Detection Technique: A Convolutional Neural Network (CNN) is a type of deep neural network in which the connectivity pattern between neurons is inspired by the anatomy of brain cells and how they operate [31]. CNN algorithms are becoming increasingly popular because they are capable of automatically learning distinctive features of malware; for this reason, anti-virus companies are also using CNN algorithms [1], [2-3]. CNNs also have other applications, such as image recognition and language processing. A CNN architecture has three types of layers: convolution layers, pooling layers and fully connected layers. When these layers are stacked, a CNN architecture is formed [15], [14, 17]. The details of the CNN architecture are described below and depicted in Figure 1.1.

Fig. 1.1 A typical Convolutional Neural Network

Convolution Layer: The majority of the computation takes place in this foundational layer. It is an arrangement of feature maps with neurons. A collection of learnable filters, or kernels, makes up the layer's parameters, as shown in Figure 1.1. In order to create a distinct 2-dimensional activation map, each filter is convolved with the input feature maps. These maps are then stacked along the depth dimension to create the output volume. By sharing parameters among neurons in the same feature map, the complexity of the network is reduced because the number of parameters is lowered. The depth (number of filters per layer), stride (for filter movement) and zero padding (to regulate the spatial size of the output) are the hyperparameters that determine the size of the output volume.
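The effect of stride and zero padding on the output size can be illustrated with a small sketch. For an input of width W, a filter of width F, padding P and stride S, the output width is floor((W - F + 2P) / S) + 1. The code below is a plain-Python illustration with a single square filter and a hypothetical 4x4 patch; as in most CNN libraries, the filter is applied without flipping (cross-correlation). It is not the implementation used in the proposed model.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a convolution output: floor((W - F + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Slide one square kernel over one 2-D feature map and return the activation map."""
    k = len(kernel)
    # Zero padding: surround the image with `padding` rows and columns of zeros.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]

    out_h = conv_output_size(len(image), k, stride, padding)
    out_w = conv_output_size(len(image[0]), k, stride, padding)
    return [
        [
            sum(
                padded[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 greyscale patch and a 2x2 filter.
patch = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
activation = conv2d(patch, kernel, stride=1, padding=0)
```

With a 4x4 input, a 2x2 filter, stride 1 and no padding, the activation map is 3x3. Using several filters would produce several such maps, which are stacked along the depth dimension.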

Pooling Layer: This layer is applied after the convolution layer, as shown in Figure 1.1. Its function is to reduce the size of the feature maps for faster computation. To do so, it reduces the spatial dimensions of the activation maps and the number of parameters in the network. Commonly used pooling operations include max pooling, average pooling, stochastic pooling, spectral pooling and spatial pyramid pooling. In max pooling, the largest element in each predefined section of the feature map is taken. Average pooling calculates the average of the elements in a predefined section of the feature map. In sum pooling, the total sum of the elements in the predefined section is computed.
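The three pooling operations described above differ only in how each window is reduced to a single value. The following minimal sketch assumes non-overlapping square windows; the feature map values are hypothetical.

```python
def pool2d(feature_map, size=2, reducer=max):
    """Apply `reducer` to each non-overlapping size x size window of the map."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            window = [
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(reducer(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [4, 2, 0, 1],
    [3, 1, 2, 5],
]
max_pooled = pool2d(feature_map, 2, max)    # [[7, 8], [4, 5]]
avg_pooled = pool2d(feature_map, 2, mean)   # [[4.0, 5.0], [2.5, 2.0]]
sum_pooled = pool2d(feature_map, 2, sum)    # [[16, 20], [10, 8]]
```

In each case the 4x4 map is reduced to 2x2, which is the reduction in spatial dimension that the pooling layer is intended to provide.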

Fully Connected Layer: This layer operates on a flattened input in which each input is connected to all neurons. High-level reasoning is carried out here. Because the neurons are not spatially arranged (they are one-dimensional), this layer is used to optimize objectives such as class scores. FC layers are usually found towards the end of CNN architectures. This is how a CNN works, and it has been used to provide promising results in detecting malware [31], [32].
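A fully connected layer that produces class scores can be sketched as follows. The weights, biases and the two class names are hand-picked, hypothetical values; in a trained CNN they would be learned. The softmax step is a common way to turn class scores into probabilities and is added here as an assumption, since the text does not name a specific output function.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one 1-D input vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every input feeds every output neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pooled output (one 2x2 map) and hand-picked weights for two classes.
pooled = [[[7, 8], [4, 5]]]
inputs = flatten(pooled)                      # [7, 8, 4, 5]
weights = [[0.1, 0.1, 0.1, 0.1],              # neuron for class "benign"
           [0.2, 0.2, 0.0, 0.0]]              # neuron for class "malware"
biases = [0.0, 0.5]
scores = dense(inputs, weights, biases)
probabilities = softmax(scores)
```

The class with the highest probability would be taken as the prediction for the input image.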

Challenges of ML-based Malware Detection: Different studies have shown that thousands of new malware samples appear every day and that malware can change its behaviour in less than 24 hours. Thus, machine learning based technology also faces some challenges when applied to malware detection. The first challenge is keeping the machine learning classifier consistently updated to detect new and mutated malware. This increases the computation cost of ML-based techniques and makes them more complicated than other techniques. The second challenge is adversarial machine learning, in which malware developers use tactics to bypass the malware detector [33]. The researcher nevertheless considers machine learning indispensable for detecting present-day malware, which is highly complex. The question is how to tackle these challenges when implementing ML in malware detection. For the first challenge, feature selection and dimensionality reduction algorithms can be applied to select only the most useful and discriminative malware features, which could lower the training cost. The second challenge, adversarial ML, can be addressed by building a hybrid malware classifier that utilises both static and dynamic features. Ensemble malware detectors can be more resilient to adversarial machine learning, whereas a malware detector based on a single ML algorithm could be bypassed easily [34, 35, 16].
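The resilience argument for ensemble detectors can be illustrated with a small majority-vote sketch. The three detectors, their feature names and their thresholds are hypothetical stand-ins for trained models that use static and dynamic features.

```python
from collections import Counter


# Three hypothetical detectors; each looks at different evidence about a sample.
def static_detector(sample):
    return "malware" if sample["entropy"] > 7.0 else "benign"


def dynamic_detector(sample):
    return "malware" if sample["registry_writes"] > 20 else "benign"


def network_detector(sample):
    return "malware" if sample["outbound_hosts"] > 10 else "benign"


def ensemble_predict(sample, detectors):
    """Return the label chosen by the majority of detectors."""
    votes = Counter(detector(sample) for detector in detectors)
    return votes.most_common(1)[0][0]


detectors = [static_detector, dynamic_detector, network_detector]

# A sample tuned to look benign to the static detector alone.
evasive_sample = {"entropy": 6.5, "registry_writes": 35, "outbound_hosts": 14}
single_verdict = static_detector(evasive_sample)
ensemble_verdict = ensemble_predict(evasive_sample, detectors)
```

Here the sample evades the single static detector but is still flagged by the ensemble, because an attacker would have to defeat a majority of the detectors at the same time.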

In the discussion above, the researcher analysed different malware analysis and malware detection techniques and identified their weaknesses and challenges, together with the available solutions to them. The researcher also observed a new trend: machine learning algorithms coupled with other technologies such as graph theory, data mining, Hidden Markov models, cryptography and compilers provide good results in detecting malware. Analysing this new trend is very important for developing a new malware detection technique. Moreover, an exhaustive systematic literature review will be conducted in Chapter 2 to expand on the details of the current trend in malware detection techniques as well as their weaknesses and challenges. After studying the background of the existing malware detection approaches, it is evident that the current malware detection techniques are not enough to detect incoming attacks, so there is a need for a new malware detection technique. Toldinas, J. [8] proposes a novel approach for network intrusion detection using multistage deep learning image recognition. X. Li [9] introduced a detection model for multi-stage and low-frequency attacks using a multi-layer bidirectional long short-term memory (B-MLSTM) network. M. Injadat [10] proposes a novel multi-stage optimized ML-based NIDS framework that reduces computational complexity while maintaining its detection performance.

Therefore, the researcher proposes a novel multi-stage model composed of two stages: a deep learning based stage (stage 1) and a compiler based stage (stage 2). In stage 1, a Convolutional Neural Network (CNN) will be used for malware feature extraction and malware detection. The details of stage 1 are discussed in the methodology part of the proposal.

In stage 2, the researcher proposes using a compiler as a malware detector while keeping the basic function of the compiler unchanged. The function of a compiler is to compile a program if it finds no error in the source code, and otherwise to report an error. A compiler has six phases: lexical analyser, syntax analyser, semantic analyser, intermediate code generator, code optimiser and target code generator. Each phase has a distinct role in compiling a program. When the source code contains an error, the program does not compile and the compiler reports the error. This principle will be employed to detect malware. The details of the compiler based stage are discussed in the methodology part of the proposal.
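
The compile-or-report-error principle described above can be illustrated with a minimal sketch. It uses Python's built-in `compile()` as a stand-in for a compiler front end, which is an assumption made only for illustration; the helper name `compiles_cleanly` is hypothetical and is not part of the proposed model.

```python
from typing import Tuple


def compiles_cleanly(source: str) -> Tuple[bool, str]:
    """Return (True, "") if the source compiles, else (False, error message)."""
    try:
        # compile() runs the lexical and syntax analysis of Python's own compiler
        compile(source, "<sample>", "exec")
        return True, ""
    except SyntaxError as err:
        # The compiler refuses the input and reports where it failed
        return False, f"line {err.lineno}: {err.msg}"


ok, _ = compiles_cleanly("x = 1 + 2\n")
bad, message = compiles_cleanly("x = = 1\n")
```

Here the well-formed input is accepted and the malformed one is rejected with an error message. A detector built on this idea would reject an input at the phase where it fails instead of letting it run.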

Stage 1, the deep learning based sub-model, uses a Convolutional Neural Network (CNN) for malware detection. First, malware samples from the available dataset are fed into the model as grey-scale images for training. The trained CNN based model is then tested on its ability to detect malware. The details of the model are discussed in the methodology part of the proposal.
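
As a rough illustration of how a sample can be presented to a CNN as a grey-scale image, the sketch below maps each byte of a file to one pixel intensity (0 to 255) and wraps the pixels into rows. The function name `bytes_to_grayscale`, the default row width and the zero padding are assumptions for illustration only. The proposal does not fix these choices here, and a real pipeline would also resize and normalise the image.

```python
def bytes_to_grayscale(data: bytes, width: int = 16) -> list:
    """Map each byte (0-255) to one pixel and wrap rows at a fixed width."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)
    # Pad the last row with zeros (black) so every row has the same length
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Ten bytes wrapped into rows of four pixels; the last row is zero padded
image = bytes_to_grayscale(bytes(range(10)), width=4)
```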

In stage 2, the compiler based sub-model is designed so that malware cannot execute on, or harm, the real user system. A compiler compiles a program before it runs on the computer system. Source code containing malware is therefore compiled first, and the tokens held by a detector inside the compiler are matched against parts of the file as it passes through the six compiler phases (lexical analyser, syntax analyser, semantic analyser, intermediate code generator, code optimiser and target code generator). When a flagged symbol is passed to the GCC compiler, an error is raised in the lexical phase, and the symbol is then deleted in the code optimiser phase. In this way the malware file does not infect the computer system. The details of the model are discussed in the methodology part of the proposal.
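
The token matching idea can be sketched as a small lexical scan. This is only an illustration under stated assumptions: the blocklist `SUSPICIOUS_TOKENS`, the identifier pattern and the function `scan_tokens` are hypothetical, and the sketch does not modify or call GCC.

```python
import re

# Hypothetical blocklist of symbols; the real detector's token set is not given here
SUSPICIOUS_TOKENS = {"system", "execve", "WriteProcessMemory"}

# A simple identifier pattern, standing in for the lexical analyser
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def scan_tokens(source: str) -> list:
    """Return (line number, token) pairs for identifiers found in the blocklist."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for token in TOKEN_PATTERN.findall(line):
            if token in SUSPICIOUS_TOKENS:
                hits.append((line_no, token))
    return hits


sample = 'int main(void) {\n    system("rm -rf /tmp/x");\n    return 0;\n}\n'
findings = scan_tokens(sample)
```

A non-empty result would correspond to the error raised in the lexical phase. Removing the flagged symbol, as described for the optimiser phase, is not shown.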

1.3. Problem Statement


Owing to the exponential growth of malware and the rapid adoption of new techniques, anti-virus companies and security researchers face challenges in classifying, detecting and removing malware. The proposed research identifies the increase in malware and advanced cyber-attacks as a serious problem. Systems ranging from individual user devices to national critical infrastructure are hacked regularly, incurring huge financial losses and other effects.

As a basis for the research study, the researcher has identified the following three problems.

I. There is a lack of datasets (image-based & symbol-based) available for multistage analysis to detect malware.
II. Insufficient work has been done on image based (deep learning) and compiler based multistage malware detection and prevention models.
III. There is a lack of performance evaluation of existing malware detection models against the proposed model.

1.4. Objective (s) of the Research


I. To create a novel dataset (image-based & symbol-based) using current malware and to investigate the ideal features from the dataset for recognizing malware.
II. To develop a malware detection model using a deep learning algorithm (CNN) and a compiler.
III. To evaluate the performance of the proposed malware detection model against existing malware detection models.

1.5. Research Questions


The research questions to be addressed in carrying out the study are as follows:

RQ1: How can the researcher develop a dataset and select the ideal features from it for the multistage analysis of the proposed malware detection and prevention model?

RQ2: How can the researcher develop the multistage malware detection and prevention model?

RQ3: How can the researcher validate the accuracy of the proposed model against existing malware detection models?

Chapter 2

2.1 Systematic Literature Review


“Malware, also known as malicious code and malicious software, refers to a program that is inserted into a system, usually covertly, with the intent of compromising the confidentiality, integrity, or availability of the victim's data, applications, or operating system (OS) or of otherwise annoying or disrupting the victim.” [2]. Examples are a virus, worm, Trojan horse, spyware, ransomware or other code-based entity that infects a host, alongside related attacks such as phishing, DoS, DDoS and SQL injection. Some forms of adware are also examples of malware. The first malware was created for fun, but it is now a profit-driven industry, and in the last couple of years advanced malware has used complex obfuscation methods. Nowadays malware such as spyware, ransomware and their variants has emerged as a cyber security threat that continuously evolves to victimise computer systems, smart devices, bank servers and critical infrastructure including utility systems. Malware detection has therefore always been a major concern, owing to shortcomings in performance accuracy, analysis type and detection techniques that fail to detect malware attacks and mutated malware attacks. It has become a battle between malware creators and anti-malware writers, so research in this field remains relevant. Proposing any malware detection technique requires a systematic review of the existing research. A thorough literature review has therefore been conducted by searching reputed journals, conference papers and other accepted online research work to investigate their detection techniques. Furthermore, an empirical study is conducted to evaluate the performance of the malware detection techniques and to address related difficulties that might motivate future research. The objective of the literature review is to uncover new malware detection techniques and avoid redundant research work. In total, 144 papers have been selected for the systematic literature review.

2.1 Literature Review


In this section, various proposed malware detection techniques are discussed and analysed. N. Z. Gorment et al. [1] presented a chronology of machine learning based malware detection techniques from 1950 to 2022. Sahay et al. [2] reported that malware detection emerged in 1950. From that time to now, detection techniques have moved from signature based methods to current machine learning based methods, especially deep learning based techniques. The recent trend of malware detection techniques is summarised in Table 2.1.

While there has not been much work on malware detection using a compiler with deep learning, D. Pizzolotto et al. [3] developed a compiler based deep learning network capable of automatically detecting malware with 99% accuracy. To do so, the optimiser phase of the compiler is trained with a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. When source code is compiled to binary code, several flags are passed to the compiler, and these flags are learned by the CNN and LSTM. They analysed the O0 and O2 flags on Linux x86-64, with binaries compiled by gcc or clang. These flags can also be used to optimise towards faster executables. Therefore, whenever source code containing malware is compiled, the proposed system will fix the problem, and in this way it acts as a malware detector within a compiler system. Xiaolei Ren et al. [4] were the first to systematically study the effect of compiler optimisation on binary code differences for malware detection using non-default optimisation settings. They found that non-default settings are likely to improve the efficiency of malware detection. Z. Tian et al. [5] proposed to reveal fine-grained compiler details for individual functions by designing a lightweight function abstraction strategy and leveraging typical sequence-oriented neural networks (CNN and RNN) to solve the compiler identification problem. These are implemented in a tool called NeuralCI (Neural modelling-based Compiler Identification). They constructed a large dataset of 854,858 unique functions from a set of diverse real-world projects, and systematically evaluated and compared the proposed methods with respect to revealing compiler family, optimisation level, compiler version and compiler setting combination. The experimental evaluation shows that NeuralCI achieves promising performance in revealing these fine-grained compiler details and outperforms existing function-level compiler identification methods in both detection performance and comprehensiveness. Chen, Y. et al. [6] present an end-to-end system called HIMALIA, which employs deep learning to recover compiler optimisation levels from disassembled binary code. They propose a new function representation method and employ an embedding layer in the network architecture that is trained together with recurrent layers. These methods help retain and capture as much of the semantic information of the instructions as possible. They built a dataset of 378,695 different functions from 5,828 binaries and performed comprehensive experiments. The results show that HIMALIA exhibits high accuracy and that its results are explicable. Z. Wang et al. [7] describe the relationship between machine learning and compiler optimisation, and propose a model for optimising the compiler using machine learning. L. Jones et al. [8] proposed a static analysis model that addresses the fact that anti-malware defences can be bypassed simply by compiling the source code with different flags or with a different compiler, rather than by using a very sophisticated obfuscation technique.

Sun, Q. et al. [12] propose an efficient method for detecting malware using the spectral heat and wave signatures of graphs. They extracted 250 and 1,000 heat and wave representations and trained and tested them on eight machine learning classifiers. They used a dataset of 37,537 unpacked Windows malware executables and extracted the control flow graph (CFG) of each sample to obtain the spectral representations. The experimental results showed that, using heat and wave spectral graph theory, the best malware analysis accuracy reached 95.9%. Li, S. [13] designed a malware classifier based on a graph convolutional network to adapt to differences in malware characteristics. The method first extracts the API call sequence from the malware code and generates a directed cycle graph, then uses a Markov chain and principal component analysis to extract the feature map of the graph, designs a classifier based on a graph convolutional network, and finally analyses and compares the performance of the method. A. Hellal et al. [14] present a survey of graph-based approaches for malware detection.
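
The step from an API call sequence to a directed graph with Markov chain weights can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `transition_probabilities` and the example API names are assumptions, and the later principal component analysis and graph convolutional network stages are omitted.

```python
from collections import Counter, defaultdict


def transition_probabilities(calls: list) -> dict:
    """Estimate first-order Markov transition probabilities between API calls."""
    counts = defaultdict(Counter)
    # Each consecutive pair (a, b) is one directed edge a -> b in the call graph
    for current, nxt in zip(calls, calls[1:]):
        counts[current][nxt] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalise outgoing edge counts so they sum to 1 for each node
        probabilities[current] = {nxt: n / total for nxt, n in followers.items()}
    return probabilities


trace = ["OpenFile", "ReadFile", "ReadFile", "WriteFile", "OpenFile", "ReadFile"]
matrix = transition_probabilities(trace)
```

Each key of `matrix` is a graph node and each inner dictionary holds the weights of its outgoing edges. These weights could then serve as edge features for a graph based classifier.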

Souri, A. et al. [16] present a systematic and detailed survey of malware detection mechanisms using data mining techniques. The main contributions of the paper are: (1) a summary of the current challenges related to malware detection approaches in data mining, (2) a systematic and categorised overview of the current approaches to machine learning mechanisms, (3) an exploration of the structure of the significant methods in the malware detection approach and (4) a discussion of the important factors of malware classification approaches in data mining.

B. Sharma et al. [17] proposed an intelligent system combining AI and cyber security to face the challenges of malicious code obfuscation, polymorphic and morphing techniques. Scenarios of AI-based cyber security were discussed and the shortcomings of AI in cyber security were highlighted. Kumar et al. [18] used waterfall plots based on Shapley values to detect trends in the features responsible for misclassification. The trends in the five topmost features were used to derive inductive rules, which were applied to overcome misclassification and enhance the performance of bagging algorithms. The inductive rules could also be applied to detect unknown future malware, known as zero-day malware, preventing attacks on security systems effectively. The accuracy of the Extra Trees bagging algorithm was 98.1% for future unknown malware. Roseline et al. [19] proposed a hybrid stacked multi-layered ensemble approach that is more robust and efficient than deep learning models. The proposed model outperformed machine learning and deep learning models with an accuracy of 98.91%. Vasan et al. [20] proposed a novel architecture based on an ensemble of convolutional neural networks (CNNs) for effective detection of both packed and unpacked malware. The main assumption behind their deeper architectures was that different CNNs provide different semantic representations of the image; a set of CNN architectures therefore makes it possible to extract higher-quality features than traditional methods. Results demonstrated more than 99% accuracy for unpacked malware and over 98% accuracy for packed malware. Akandwanaho et al. [21] proposed a neural network ensemble for malware detection. The approach is based on a hybrid search mechanism in which the individual networks are optimised by an adaptive memetic algorithm with Tabu Search, which is also used to improve the hidden neurons and weights of the neural networks. The results of the empirical study indicate that the proposed method is more adaptive and efficient at detecting a range of cyber threats, as it generates better results than the existing methods.
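
One common way to combine the outputs of several networks in an ensemble is soft voting, which averages the class probabilities predicted by each model. The sketch below illustrates that generic technique only. It is not claimed to be the fusion strategy used by any of the cited works, and the function names and example probabilities are assumptions.

```python
def soft_vote(probabilities: list) -> list:
    """Average per-class probabilities from several models for one sample."""
    if not probabilities:
        raise ValueError("need at least one model output")
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    return [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]


def predict(probabilities: list) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(probabilities)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Three hypothetical models scoring one sample as [benign, malware]
outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
label = predict(outputs)
```

In this example one model favours the benign class, but the averaged probabilities favour the malware class (index 1).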

K. Shaukat et al. [24] propose a novel deep learning-based approach for malware detection. It delivers better performance than conventional approaches by combining the advantages of static and dynamic analysis. First, it visualises a portable executable (PE) file as a coloured image. Second, it extracts deep features from the colour image using a fine-tuned deep learning model. Third, it detects malware based on the deep features using support vector machines (SVM).

Mustafa Majid et al. [25] review the literature on deep learning techniques used for malware detection; the methods covered include CNN, RNN, LSTM and autoencoders. Huijuan Zhu et al. [28] propose an automated feature extraction method for Android that needs no manual expert intervention. Specifically, they characterise the vital parts of the Dalvik executable (Dex) as an RGB (Red/Green/Blue) image. They further propose a novel convolutional neural network (CNN) variant with diverse receptive fields that uses max pooling and average pooling simultaneously (MADRF), named MADRF-CNN, which can capture the dependencies between different parts of the image (transferred from the Dex file) by capitalising on multi-scale context information. Alnajim et al. [29] offer a method for classifying malware from images using dual attention and convolutional neural networks. The model achieved an accuracy of 98.14% on the Malimg benchmark dataset. Djenna, A. et al. [30] propose a new systematic approach to identifying modern malware using dynamic deep learning-based methods combined with heuristic approaches to classify and detect five modern malware families: adware, Radware, rootkit, SMS malware and ransomware. The symmetry investigation in artificial intelligence and cybersecurity analytics is intended to enhance malware detection, analysis and mitigation abilities to provide resilient cyber systems against cyber threats. They validated the approach on a dataset that specifically contains recent malicious software to demonstrate that the model achieves its goals and responds to real-world requirements in terms of effectiveness and efficiency. H. Malani et al. [33] train and validate various deep learning neural networks, such as convolutional and recurrent networks, while visualising the malware for better analysis. They also perform a comparative analysis based on various performance metrics. They achieved over 98% validation accuracy using a CNN, whereas the RNN yielded poor results. Wen, Q. et al. [4110633] propose to detect malware from very small binary fragments of PE files using a CNN-based model. Datasets, and test sets in particular, are among the most difficult problems in zero-day malware detection, because zero-day means that the virus has never appeared before. Shahidi et al. [34] used a hierarchical semantic approach to convert numerical and string data to meaningful values, the Subgraph Semantic Homomorphism Coefficient (SSHC) to select optimal features, and the Group Method of Data Handling (GMDH) deep neural network (DNN) algorithm to detect malware via a cloud-computing infrastructure. The model was evaluated on the Android Trojan Dataset, where accuracy reached 99.91%, an improvement of about 5.25%. Aslan et al. [35] propose a novel deep-learning-based architecture that classifies malware variants with a hybrid model. The main contribution of the study is a new hybrid architecture that integrates two wide-ranging pre-trained network models in an optimised manner. Imtiaz et al. [37] developed DeepAMD, a novel approach to defend against real-world Android malware using a deep Artificial Neural Network (ANN), and compared the efficiency of DeepAMD with conventional machine learning classifiers and state-of-the-art studies on performance measures such as accuracy, recall, F-score and precision. Ashik et al. [38] investigate the relevance of features of unpacked malicious and benign executables, such as mnemonics, instruction opcodes and API calls, to identify features that classify an executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF) and XGBoost. Hemalatha et al. [45], G. Sun et al. [46], Damaševičius et al. [47], Gurumayum Akash Sharma et al. [52], Huan Zhou et al. [61], Halim et al. [62], Bozkir et al. [64], Wei Zhong et al. [66], R. Vinayakumar et al. [67] and Rad, B.B. et al. [72] proposed different malware detection techniques using the deep learning techniques CNN and LSTM and, in very few cases, RNN. In some cases, researchers combine machine learning techniques with deep learning techniques, along with other techniques, to obtain good results.
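
Several of the studies above are compared on accuracy, precision, recall and F-score. As a reminder of how these measures relate to the counts of a binary confusion matrix, a minimal sketch follows. The function name and the example counts are illustrative assumptions, not results from any cited work.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary metrics, treating malware as the positive class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score (F1) is the harmonic mean of precision and recall
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Hypothetical counts: 90 malware detected, 15 missed, 10 false alarms, 85 clean
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

Accuracy alone can be misleading when malicious and benign samples are imbalanced, which is why precision and recall are usually reported alongside it.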

Kamboj et al. [26] proposed using various machine learning techniques to detect whether a file downloaded from the internet contains malware. The research aims to use different machine learning algorithms to differentiate successfully between malicious and benign files. The main idea is to study features of the downloaded file such as the MD5 hash, the size of the Optional Header and the Load Configuration Size. Based on the analysis of these features, files are classified as malicious or non-malicious. Akhtar et al. [27] focused mainly on dynamic malware detection, motivated by the way malware changes progressively. Each day brings a new influx of malicious software programmes that threaten online safety by exploiting vulnerabilities in the Internet. Threats were automatically evaluated based on their behaviour in a simulated environment, and reports were created. After reviewing the test and experimental data for all five classifiers, the authors found that the RF, SGD, extra trees and Gaussian NB classifiers all achieved 100% accuracy in the test. Huijuan Zhu et al. [28] conducted extensive experiments to evaluate their MADRF-CNN method, which converts the vital parts of the Dalvik executable (Dex) to an RGB image, and reported an accuracy of 96.9%. Akhtar, M.S. et al. [32] demonstrated that harmful traffic on computer systems can be detected, thereby improving the security of computer networks, by applying the findings of malware analysis and detection with machine learning algorithms (Naive Bayes, SVM, J48, RF and the proposed approach) to compute the difference in correlation symmetry integrals. Wolsey et al. [34] outline the state-of-the-art AI techniques used in malware detection and prevention, providing an in-depth analysis of the latest studies in this field. The algorithms investigated comprise Shallow Learning, Deep Learning and Bio-Inspired Computing, applied to a variety of platforms such as PC, cloud, Android and IoT. The survey also touches on the rapid adoption of AI by cybercriminals as a means to create ever more advanced malware and to exploit the AI algorithms designed to defend against them. Alycia N et al. [42], Kujanpää et al. [43], Bawazeer et al. [44], Kwon et al. [48], Gibert, Sarker, I.H. et al. [51], Chen, H. et al. [70] and N. Udayakumar et al. [74] focus on malware detection using various machine learning techniques.
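
Static file features of the kind mentioned above, such as the MD5 hash and the size of the Optional Header, can be read directly from the bytes of a PE file. The sketch below is a minimal illustration using only the Python standard library. The function name `pe_static_features` is hypothetical, the input is a synthetic header built for the example, and the Load Configuration Size is omitted because it requires walking the data directories.

```python
import hashlib
import struct


def pe_static_features(data: bytes) -> dict:
    """Extract the MD5 hash, file size and SizeOfOptionalHeader from PE bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # SizeOfOptionalHeader is the 2-byte field at offset 16 of the 20-byte COFF header
    (optional_header_size,) = struct.unpack_from("<H", data, pe_offset + 4 + 16)
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "file_size": len(data),
        "optional_header_size": optional_header_size,
    }


# Build a tiny synthetic header (not a runnable executable) for demonstration
dos_header = bytearray(0x40)
dos_header[:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 0x40)
coff_header = bytearray(20)
struct.pack_into("<H", coff_header, 16, 0xE0)
blob = bytes(dos_header) + b"PE\x00\x00" + bytes(coff_header)
features = pe_static_features(blob)
```

Feature dictionaries of this form, collected over many files, could be passed to a conventional classifier such as a random forest.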

Table 2.1: The research papers for the systematic review studies.
SL Author Title of the Paper Year Journal/ Research related
Name Conference Techniques
1 N. Z. Machine Learning 14 IEEE Machine Learning
Gorment et Algorithm for March Xplore Algorithms
al. Malware Detection: 2023 Journal
Taxonomy, Current
Challenges and
Future Directions

20
2 Sanjay K. Evolution of 2020 Springer Machine Learning with
Sahay et al Malware and Its other techniques
Detection Technique
s
3 Xiaolei Ren Unleashing the 2021 ACM Compiler and Deep
et al. Hidden Power of Digital Learning
Compiler Library
Optimization on
Binary Code
Difference: An
Empirical Study
Identifying Compiler
and Optimization
Options from Binary
Code using Deep
Learning
Approaches
4 D. Identifying Compiler 2021 IEEE Compiler and Deep
Pizzolotto et and Optimization Explore Learning
al. Options from Binary
Code using Deep
Learning
Approaches
5 Z. Tian, Y. Fine-Grained 29 IEEE "Neural modeling-based
Huang et al. Compiler March Xplore Compiler Identification
Identification with 2021 Journal (NeuralCI)" with a focus
Sequence-Oriented on sequence-oriented
Neural Modeling neural networks for
analyzing compiler details
in binary code.
6 Chen, Y. et HIMALIA: Springer Compiler + Deep Learning
al. Recovering
Compiler
Optimization Levels
from Binaries by
Deep Learning
7 Z. Wang et Machine Learning in IEEE Complier + Machine
al. Compiler Xplore learning
Optimization

21
8 L. Jones et al. CARDINAL: IEEE Complier
similarity analysis to Xplore
defeat malware
compiler variations
9 McAfee McAfee Labs https:// N/AMachine learning and
Threats report April www.mcafe other technology
2021 e.com/

10 Avast AV-TEST Institute: https:// Machine learning and other


Statistic of malware www.av- technologyN/A
test.org/
11 Amira, A survey of malware ACM Graph theory techniques
Abdelouahab analysis using digital and particularly
community detection library community detection
algorithms algorithms (eg. Louvain,
Modularity etc.)
12 Leveraging Spectral ACM Spectral graph theory
Qirui Sun et Representations of digital
al. Control Flow Graphs library
for Efficient
Analysis of
Windows Malware
Efficient Analysis of
Windows Malware
13 S. Li et al. Intelligent malware The Journal "Graph convolutional
detection based on of network (GCN)"
graph convolutional Supercomp
network. uting
14 Aya Hellal et A survey on graph- 2020 IEEE Graph-based approaches
al. based methods for Xplore
malware detection Journal
15 Sahil Malware and 2022 Internationa Data mining and Machine
Sehrawat, Dr. Malware Detection l Journal for learning techniques
Dinesh Singh Techniques: A Research in
Survey Applied
Science &
Engineering
Technology

22
(IJRASET)
16 Huijuan Zhu A state-of-the-art 2018 ELSEVIER "Data mining techniques,"
et al. survey of malware specifically in the context
detection approaches of malware detection.
using data mining
techniques
17 B. Sharma et Advances and 23 May IEEE Artificial Intelligence and
al. Challenges in 2023 Xplore Cryptography
Cryptography using Journal
Artificial
Intelligence
18 Rajesh Explainable Machine August IC3-2022: "Machine Learning" and
Kumar and S. Learning for 2022 Proceedings specifically Ensemble
Geetha Malware Detection of the 2022 Bagging Algorithms.
Using Ensemble Fourteenth
Bagging Algorithms Internationa
l
Conference
on
Contempora
ry
Computing
19 Roseline, Towards Efficient 2019 Internationa "MultilayeredMulti-
S.A. Malware Detection l Carnahan layered random forest
and Classification Conference ensemble technique"
using on Security
Multilayered Technology
Random Forest (ICCST)
Ensemble Technique
20 Vasan, D. Image-based 2020 Computers "Convolutional Neural
malware & Security Networks (CNNs)" as part
classification using of the "Image-based
ensemble of CNN Malware Classification
architectures using Ensemble of CNNs
(IMCEC)" approach.
(IMCEC)
21 Akandwanah Intelligent Malware 2019 The African "Neural networks" and
o and Detection Using a Journal of "adaptive memetic
Kooblal et al. Neural Network Information algorithm with tabu
Ensemble Based on and
23
a Hybrid Search Communica search."
Mechanism tion (AJIC)
22 T. Nghi Phu An Efficient 2019 The "Dynamic programming
et al. Algorithm to Extract Computer algorithm" for control
Control Flow-Based Journal flow-based feature
Features for IoT extraction, which is used
Malware Detection for IoT malware detection.
23 Roseline, Intelligent Vision- 2020 IEEE machine learning-based
S.A. Based Malware Access anti-malware solution" and
Detection and it refers to a "layered
Classification Using ensemble approach" that
Deep Random Forest mimics the key
Paradigm characteristics of deep
learning techniques.
24 Kamran A novel deep 2023 Engineering Machine Learning (SVM)
Shaukat learning-based Application + Deep Learning (CNN)
approach for of Artificial
malware detection Intelligence,
ELSEVIER
25 Al-Ani Musta A review of artificial 2023 Materials Deep Learning (CNN,
fa Majid et intelligence-based Today RNN, LSTM and auto
al. malware detection proceedings encoders) Model
using deep learning ,
ScienceDire
ct Journal.
26 Akshit Kamb Detection of March ScienceDire Machine Learning
oj et al. malware in 2023 ct Journal. Algorithm including RF
downloaded files
using various
machine learning
models
27 Muhammad Evaluation of January MDPI kNN, DT, RF, AdaBoost,
Shoaib Machine Learning 2023 SGD, extra trees and the
Akhtar Algorithms for Gaussian NB classifier.
Malware Detection
28 H. Zhu et al. An effective end-to- 28 Expert Deep Learning (CNN)
end android malware January Systems
detection method 2023 with
Application

24
s,
ScienceDire
ct
29 Alnajim, Mitigating the Risks 21 July Electronics, Deep Learning (CNN)
A.M. of Malware Attacks 2023 MDPI.
with Deep Learning
Techniques
30 Amir Djenna Artificial March Symmetry, Behavior-based deep
Intelligence-Based 2023 MDPI learning and Heuristic-
Malware Detection, based approaches
Analysis, and
Mitigation
31 Raman Analysis of Malware 19 Mathematic Adaptive Neuro Fuzzy
Dugyala Detection and January al Problems Interference System
Signature Generation 2022 in (ANFIS) and
Using a Novel Engineering
Salp Swarm Optimization
Hybrid Approach ,
(SSA).
Hindawi.
32 Muhammad Malware Analysis 2022 Symmetry, "Machine learning
Shoaib and Detection Using MDPI algorithms," including
Akhtar and Machine Learning Naive Bayes, SVM
Tao Feng Algorithms (Support Vector Machine),
J48, RF (Random Forest),
and CNN (Convolutional
Neural Network)
33 Harsh Malani A Unique Approach 2022 2022 4th Deep Learning and
et al. to Malware Internationa specifically Convolutional
Detection Using l Neural Networks (CNN)
Deep Convolutional Conference
Neural Networks on
Electrical,
Control and
Instrumenta
tion
Engineering
(ICECIE)
34 Adam The State-of-the-Art October Cryptograp "Artificial Intelligence
Wolsey et al. in AI-Based 2022 hy and techniques," including
Malware Detection Security, "Shallow Learning," "Deep

25
Techniques: A Cornell Learning," and "Bio-
Review University Inspired Computing."
35 MinSu Kim Research on 29 2022 "Artificial Intelligence
et al. Malware Detection Septemb IEEE/ACIS (AI)."
System Using er 2022 7th
Artificial Internationa
Intelligence l
Conference
on Big
Data, Cloud
Computing,
and Data
Science
(BCD)
36 Qiaokun Wen CNN based zero-day 2021 Forensic "CNN-based
et al. malware detection Science (Convolutional Neural
using small binary Internationa Network-based)" for zero-
segments l: Digital day malware detection
Investigatio using small binary
n segments.
37 S.M. Shahidi A semantic malware 2021 Computers "Machine learning" and
et al. detection model & Electrical "deep neural network
based on the GMDH Engineering (DNN)" as it discusses the
neural networks use of a Group Method of
Data Handling (GMDH)
deep neural network
algorithm for malware
detection.
38 Ö. Aslan, A. A New Malware 2021 IEEE "Deep Learning (DL)"
A. Yilmaz et Classification Access algorithms.
al. Framework Based
on Deep Learning
Algorithms
39 Jagsir Singh, A survey on machine 2021 Elsevier Machine Learning
Jaswinder learning-based Algorithms- SVM, RF, and
Singh malware detection in KNN
executable files
40 S.I. Imtiaz, DeepAMD: 2021 Future "Deep Artificial Neural
S.u. Rehman, Detection and Generation Network (ANN)"
A.R. Javed et identification of Computer

26
al. Android malware Systems
using high-efficient
Deep Artificial
Neural Network
41 | Ashik, M.; Jyothish, A et al. | Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms | 2021 | Electronics | "Machine Learning" and "Deep Learning" algorithms, specifically including approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), XGBoost, Deep Dense network, One-Dimensional Convolutional Neural Network (1D-CNN), and CNN-LSTM
42 | Alycia N [16] | Adversarial attacks against image-based malware detection using autoencoders | 2021 | Pattern Recognition and Tracking | Deep learning
43 | Kujanpää et al. | Automating Privilege Escalation with Deep Reinforcement Learning | 2021 | 14th ACM Workshop on Artificial Intelligence and Security (AISec ’21) | "Deep reinforcement learning."
44 | Omar Bawazeer, Tarek Helmy et al. | Malware Detection Using Machine Learning Algorithms Based on Hardware Performance Counters: Analysis and Simulation | 2021 | 1st International Conference on Engineering and Technology (ICoEngTech) 2021 | "Machine Learning algorithms" and "Hardware Performance Counters (HPCs)"
45 | Hemalatha, J. et al. | An Efficient DenseNet-Based Deep Learning Model for Malware Detection | 2021 | Entropy | "Deep learning," specifically using a "DenseNet-based deep learning model"
46 | G. Sun and Q. Qian et al. | Deep Learning and Visualization for Identifying Malware Families | 2021 | IEEE Transactions on Dependable and Secure Computing | "Recurrent Neural Networks (RNN)," "Convolutional Neural Networks (CNN)," and "minhash."
47 | Damaševičius, R. et al. | Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection | 2021 | Electronics | "Neural Networks" (specifically dense and convolutional neural networks or CNN) and "Machine Learning Models"
48 | Soonhong Kwon et al. | Machine Learning based Malware Detection with the 2019 KISA Data Challenge Dataset | December 2020 | ACM Digital Library | "Machine learning" and "deep learning"
49 | Daniel Gibert et al. | The rise of machine learning for detection and classification of malware: Research developments, trends and challenges | 2020 | Journal of Network and Computer Applications, Elsevier | Presents various adversarial learning techniques to fool machine learning detectors
50 | D. Pizzolotto et al. | Identifying Compiler and Optimization Options from Binary Code using Deep Learning Approaches | 02 November 2020 | IEEE Xplore Journal | Deep learning
51 | Sarker et al. | Cybersecurity data science: an overview from machine learning perspective | 2020 | Journal of Big Data | Machine learning
52 | Gurumayum Akash Sharma | A Deep Learning Approach to Image-Based Malware Analysis | 2020 | Progress in Computing, Analytics and Networking | "Deep Learning," specifically using a "Convolutional Neural Network (CNN)" and "Support Vector Machine (SVM)"
53 | Tran Nghi Phu et al. | An Efficient Algorithm to Extract Control Flow-Based Features for IoT Malware Detection | | The Computer Journal, Oxford University Press | Dynamic Programming Algorithm for Control flow-based feature extraction method
54 | Onur Barut et al. | Machine Learning Based Malware Detection on Encrypted Traffic: A Comprehensive Performance Study | 2020 | 7th International Conference on Networking, Systems and Security (NSysS) | "Machine learning" and "deep learning."
55 | Muhammad Ali et al. | MALGRA: Machine Learning and N-Gram Malware Feature Extraction and Detection System | 2020 | Electronics | "Machine Learning" for malware detection, specifically using techniques related to N-grams and supervised machine-learning algorithms.
56 | S. Choudhary and A. Sharma [35] | Malware Detection & Classification using Machine Learning | 2020 | International Conference on Emerging Trends in Communication, Control and Computing (ICONC3) | Machine Learning
57 | Demetrio et al. | Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection | 2020 | ACM TO EDIT conference | Machine learning-based malware detection.
58 | Wadkar, M. et al. | Detecting malware evolution using support vector machines | 2020 | Expert Systems with Applications | "Linear support vector machine (SVM)" as a machine learning-based technique for detecting evolutionary changes within malware families.
59 | Naeem, H. et al. | Malware detection in industrial internet of things based on hybrid image visualization and deep learning model | 2020 | Ad Hoc Networks | Deep convolutional neural network (deep learning).
60 | I. M. M. Matin et al. | Malware Detection Using Honeypot and Machine Learning | 2019 | IEEE Xplore Journal | "Machine learning," specifically the use of "Decision Tree" and "Support Vector Machine (SVM)" as classification algorithms for detecting malware in conjunction with honeypots.
61 | Sanjeev et al. | Evolution of Malware and its Detection Techniques | 2019 | Information and Communication Technology for Sustainable Development, Springer. | "Machine learning" and "deep learning" techniques, specifically in the context of malware detection.
62 | H. Zhou et al. | Malware Detection with Neural Network Using Combined Features | 2019 | Cyber Security | Deep Learning.
63 | Mudzfirah Abdul Halim et al. | Recurrent Neural Network for Malware Detection | 2019 | International Journal of Advance Soft Computing | "Long Short-Term Memory (LSTM)" and "Convolutional Neural Network (CNN)" used for malware detection.
64 | Maleki, N. et al. | An Improved Method for Packed Malware Detection using PE Header and Section Table Information | 2019 | International Journal of Computer Network and Information Security | "Machine learning algorithms" for detecting malware based on features extracted from the PE header and section table of PE files.
65 | Bozkir, A.S. et al. | Utilization and Comparison of Convolutional Neural Networks in Malware Recognition | 2019 | 27th Signal Processing and Communications Applications Conference (SIU) | "Convolutional Neural Networks (CNNs)" and "machine learning methods."
66 | Roseline, S.A. et al. | Vision-Based Malware Detection and Classification Using Lightweight Deep Learning Paradigm | 2019 | International Conference on Computer Vision and Image Processing. | "Convolutional Neural Networks (CNN)," specifically referred to as a "Lightweight Convolutional Neural Networks deep learning model"
67 | Wei Zhong, Feng Gu et al. | A multi-level deep learning system for malware detection | 2019 | Expert Systems with Applications | Deep learning
68 | Vinayakumar, R. et al. | Robust Intelligent Malware Detection Using Deep Learning | 2019 | IEEE Access | Deep learning
69 | Chen, Y., Shi et al. | HIMALIA: Recovering Compiler Optimization Levels from Binaries by Deep Learning | 7 November 2018 | Springer Journal | Deep learning
70 | Z. Wang et al. | Machine Learning in Compiler Optimisation | 10 May 2018 | IEEE Xplore Journal | Machine learning.
71 | Hongyi Chen and Jinshu Su et al. | Malware Collusion Attack against SVM: Issues and Countermeasures. | 2018 | Applied Sciences | "Machine learning," specifically "Support Vector Machines (SVM)," which is a machine learning method used for Android malware detection.
72 | Ban Khammas et al. | Malware Detection using Sub-Signatures and Machine Learning Technique | 2018 | Journal of Information Security Research | Machine Learning
73 | B. B. Rad et al. | Malware classification and detection using artificial neural network | 2018 | Journal of Engineering Science and Technology | "Artificial Neural Networks (ANN)."
74 | W. Han et al. | MalInsight: A systematic Profiling based malware detection framework | 2018 | Journal of Network and Computer Applications | "Machine learning techniques" for malware detection.
75 | J. Su, and V. D. Vasconcellos [28] | Lightweight classification of IoT malware based on image recognition | 2018 | IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) | "Convolutional neural network (CNN)" for image recognition in the context of IoT malware classification.
76 | N. Udayakumar, V. J. Saglani et al. | Malware Classification Using Machine Learning Algorithms | 2018 | 2nd International Conference on Trends in Electronics and Informatics (ICOEI) | Machine Learning
77 | C. C. San et al. [38] | Malicious Software Family Classification using Machine Learning Multi-class Classifiers | 2018 | Computational Science and Technology, Springer. | "Machine learning," specifically the use of machine learning multi-class classifiers such as Random Forest (RF), K-Nearest Neighbor (KNN), and Decision Table (DT) for malware family classification.
78 | Z. Cui et al. | Detection of Malicious Code Variants Based on Deep Learning | 2018 | IEEE Transactions on Industrial Informatics | "Deep learning," specifically the use of a "convolutional neural network (CNN)" for detecting malware variants.
79 | Shafiee, Ev et al. | Detection of Spoofing Attack using Machine Learning based on Multi-Layer Neural Network in Single-Frequency GPS Receivers | 2018 | The Journal of Navigation, Cambridge University Press | Machine learning and Neural Network
80 | Agarap, A.F. et al. | Towards building an intelligent anti-malware system: a deep learning approach using support vector machine (SVM) for malware classification | 2017 | arXiv | "Deep Learning" and "Support Vector Machine (SVM)" for malware classification.
81 | Luo, J.S. et al. | Binary malware image classification using machine learning with local binary pattern | 2017 | IEEE International Conference on Big Data (Big Data) | "Machine learning," specifically the use of "TensorFlow" for malware classification.
82 | L. Jones et al. | CARDINAL: similarity analysis to defeat malware compiler variations | 30 March 2017 | IEEE Xplore Journal | "Static analysis" and "Bloom filter."
83 | N. Udayakumar, S. Anandaselvi et al. | Dynamic Malware Analysis Using Machine Learning Algorithm | 2017 | International Conference on Intelligent Sustainable Systems (ICISS) | "Machine learning" as it focuses on machine learning-based malware classification and detection.
84 | B. Kolosnjaji et al. | Deep learning for classification of malware system call sequences | 2016 | Australasian Joint Conference on Artificial Intelligence | "Deep Learning." The paper discusses the use of deep learning, specifically neural networks with convolutional and recurrent layers, for the classification of malware system call sequences.
85 | H. William et al. | DL4MD: A deep learning framework for intelligent malware detection | 2016 | International Conference on Data Mining (DMIN) | "Deep learning" with a focus on the "stacked AutoEncoders (SAEs) model" for intelligent malware detection.
86 | S. Tobiyama, and Y. Yamaguchi et al. | Malware Detection with Deep Neural Network Using Process Behavior | 2016 | IEEE 40th Annual Computer Software and Applications Conference (COMPSAC) | "Deep Neural Networks," specifically "Recurrent Neural Network (RNN)" and "Convolutional Neural Network (CNN)."
87 | Huang, W., Stokes, J.W et al. | MtNet: a multi-task neural network for dynamic malware classification | 2016 | The African Journal of Information and Communication (AJIC) | "Deep learning" as the paper discusses the use of a deep neural network architecture for malware classification.
88 | Dalla Preda, M. et al. | Infections as Abstract Symbolic Finite Automata: Formal Model and Applications | 2015 | ACM 1st International Workshop on Software Protection | "Machine learning."
89 | Abah, J et al. | A MACHINE LEARNING APPROACH TO ANOMALY-BASED DETECTION ON ANDROID PLATFORMS | 2015 | International Journal of Network Security & Its Applications (IJNSA) | "Machine Learning," specifically the use of a "K-Nearest Neighbour classifier" for malware detection on Android platforms.
90 | MICHAL CHORAS et al. | Machine learning techniques applied to detect cyber-attacks on web applications | 2014 | Logic Journal of IGPL | Machine Learning

Research Methodology
This section explains the process followed to examine previous work in the field of malware
detection techniques, and how the studies were chosen using a set of inclusion and exclusion
criteria. Diverse databases were searched to find relevant material on malware attacks, detection
techniques and classification strategies, and the publications were examined using primary study
identification and other methodologies. To gather publications for this study, the researcher
searched online databases. Many academics performing systematic literature reviews in cyber
security have used these databases, which are known to contain published work on our topic of
interest. We retrieved a large number of items from each database depending on their relevance to
our search parameters. The academic databases searched include Elsevier, Springer, IEEE Xplore,
SPIE Digital Library, Google Scholar, Scopus, AIP, and Wiley Online Library, as shown in Table 2.2.

Table 2.2: Related studies were found using online sources.

S/N | Database source | Database link | Number of articles
1 | IEEE Xplore | https://ieeexplore.ieee.org/ | 52
2 | ScienceDirect | https://www.sciencedirect.com/ | 29
3 | Springer | https://www.springer.com/ | 20
4 | MDPI | https://www.mdpi.com/ | 12
5 | ACM Digital Library | https://dl.acm.org/ | 12
6 | arXiv | https://arxiv.org/ | 2
7 | Oxford University Press | https://global.oup.com/?cc=gb | 2
8 | Symantec Corporation | https://www.broadcom.com/products/cybersecurity | 1
9 | McAfee | https://www.mcafee.com/ | 1
10 | IJRASET | https://www.ijraset.com/ | 1
11 | AJIC | https://academic.oup.com/ | 1
12 | Hindawi | https://www.hindawi.com/ | 1
13 | SPIE Digital Library | https://www.spiedigitallibrary.org/ | 1
14 | IOP Science | https://iopscience.iop.org/ | 1
15 | MECS Press | https://www.mecs-press.org/ | 1
16 | Dline | https://www.dline.info/journals.php | 1
Total | | | 144

In paper searching, Boolean AND/OR operators were used to combine the search phrases. In
addition, the researcher checked the references in some of the downloaded papers and looked for
publications with titles connected to our area of interest, in the hope of finding further papers
not found in the online databases we examined. To find relevant material in various respectable
academic archives, the following keywords were used: ‘Malware’, ‘Malware detection + Machine
Learning’, ‘Malware detection + Deep Learning’, ‘Malware detection + AI’, ‘Malware detection +
compiler’, ‘Malware + detection’, ‘Malware detection + Graph Theory’, ‘Malware detection + Data
Mining’, ‘Malware detection + Encryption’, ‘Malware detection + Markov Model’. The following
research questions were explored in this study:

RQ1: How much research has been carried out on malware detection techniques?
RQ2: What are the probable issues to be addressed in failing to detect the malware attack?
RQ3: What are the proposed techniques or solutions to address the issue of malware detection
and classification?
RQ4: On which area(s) has this research on malware detection mostly focused? Does it motivate
any future research?
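
The keyword combinations used in the search can be written as Boolean query strings. The sketch below is only an illustration: the proposal does not state the exact query syntax entered into each database, so the quoting style and the helper names (`and_query`, `or_group`, `build_combined_query`) are assumptions.

```python
# Illustrative sketch only: the exact query syntax accepted by each
# database is an assumption, not something stated in the proposal.

BASE_TERM = "Malware detection"

# Topic keywords paired with the base term in the keyword list.
TOPICS = [
    "Machine Learning",
    "Deep Learning",
    "AI",
    "compiler",
    "Graph Theory",
    "Data Mining",
    "Encryption",
    "Markov Model",
]


def and_query(base, topic):
    """Combine the base term and one topic with Boolean AND."""
    return f'"{base}" AND "{topic}"'


def or_group(terms):
    """Join several alternative terms with Boolean OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_queries(base=BASE_TERM, topics=TOPICS):
    """One AND query per topic, mirroring the 'X + Y' keyword pairs."""
    return [and_query(base, topic) for topic in topics]


def build_combined_query(base=BASE_TERM, topics=TOPICS):
    """A single query: the base term AND any one of the topics."""
    return f'"{base}" AND {or_group(topics)}'


if __name__ == "__main__":
    for query in build_queries():
        print(query)
    print(build_combined_query())
```

Running one query per topic keeps the result sets separate, so the number of hits per keyword pair can be recorded; the combined form returns the union in a single search.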

Table 2.3: The inclusion and exclusion criteria

S/N | Inclusion criteria | Exclusion criteria
1 | The research examines malware attack, protection, and detection techniques. | The research did not look at malware-related activities.
2 | The topic was peer-reviewed or published in scholarly publications or conference papers. | The topic has not been peer reviewed or published in any academic journals or conference papers.
3 | The research papers are written in English. | The research papers are not written in English.
4 | The documents are journal, conference paper, technical report, magazine, and surveys etc. | The pieces are not surveys or research studies, but rather news flashes or magazine articles.
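
The criteria in Table 2.3 amount to a yes/no test applied to each candidate publication. The following is a minimal sketch of such a test, assuming each publication has already been annotated by hand; the `Study` record, its field names and the document-type labels are hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass

# Document types taken from the inclusion column of Table 2.3.
INCLUDED_TYPES = {"journal", "conference paper", "technical report", "magazine", "survey"}


@dataclass
class Study:
    """Hypothetical record for one candidate publication (hand-annotated)."""
    title: str
    malware_related: bool   # criterion 1
    peer_reviewed: bool     # criterion 2
    language: str           # criterion 3
    doc_type: str           # criterion 4


def is_included(study):
    """Return True only if all four inclusion criteria hold."""
    return (
        study.malware_related
        and study.peer_reviewed
        and study.language.strip().lower() == "english"
        and study.doc_type.strip().lower() in INCLUDED_TYPES
    )


if __name__ == "__main__":
    candidates = [
        Study("CNN-based malware detection", True, True, "English", "journal"),
        Study("Weekly security news flash", True, False, "English", "news flash"),
    ]
    for study in candidates:
        print(study.title, "->", "include" if is_included(study) else "exclude")
```

A study failing any single criterion is excluded, which matches the paired layout of the table: each exclusion criterion is the negation of the inclusion criterion in the same row.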

Data collection and Analysis


The approach for reviewing articles begins with the definition of common terminology linked to
cybersecurity or information security concerns, using the above-mentioned search keywords. This
aids in the elimination of articles that are not relevant to the topic of the intended research
project. The focus was on research and survey publications with malware-related subject matter
that were written in English. Figure 2.1 depicts the selection and elimination procedure.
Publications were filtered out in three steps: by irrelevant title, from 144 to 120; by abstract,
from 120 to 102; and by a final evaluation of the contents and objectives of the complete text,
from 102 to 91.

Fig. 2.1 Process of Literature Searching
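
The screening counts reported for the literature search (144, 120, 102 and 91) can be checked with a few lines of arithmetic. The sketch below only restates those reported figures; the stage labels are paraphrased for illustration.

```python
# Reported number of publications remaining after each screening step.
STAGES = [
    ("identified", 144),
    ("after title screening", 120),
    ("after abstract screening", 102),
    ("after full-text evaluation", 91),
]


def removed_per_stage(stages):
    """Number of publications removed at each step (previous minus current)."""
    return [
        (later_name, earlier_count - later_count)
        for (_, earlier_count), (later_name, later_count) in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    removed = removed_per_stage(STAGES)
    for stage_name, count in removed:
        print(f"{stage_name}: {count} removed")
    total_removed = sum(count for _, count in removed)
    print(f"total removed: {total_removed}")
    print(f"retained: {STAGES[-1][1]} of {STAGES[0][1]}")
```

The three steps remove 24, 18 and 11 publications respectively, 53 in total, leaving 91 of the original 144.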

Result
The findings of our research are summarised in this section. Table 2.1 summarises the data from
the studies and the research trend techniques that will assist us in answering research question
RQ4. It gives the methodology or solution presented in each study, as stated by its authors,
together with the study's main emphasis and research method. The area of focus indicates whether
a study's solution is targeted at malware classification, detection, or deployment. Table 2.1
lists the 85 papers from the review that include the information needed to address our research
topic of interest, giving each study's author(s), title, year of publication, and source of
publication, whether a journal or a conference. Table 2.4 lists the names of the journals and
conferences where the studies were published, as well as the number of studies published by each.
The solutions or techniques offered by the research are summarised in Table 2.5: the techniques
are categorised in the first column under Adopted technique, and the studies that fit under each
category are listed in the second column.

Discussion
The findings for each of our research questions are discussed in this section.

RQ1: How much research has been carried out on malware detection techniques?

Table 2.2 shows the 144 research articles found in total from the sources examined. Table 2.4
shows the diversity of publications across several journals and conferences. 91 of the studies
have been published in different journals and conferences, the majority of which are highly
regarded, such as IEEE Xplore, Elsevier and Springer publications. They cover a wide range of
research focus with different malware detection techniques, which are presented under RQ3 in this
discussion section.

RQ2: What are the probable issues to be addressed in failing to detect the malware attack?
The capability of classic detection approaches based on static and dynamic analysis has been
hampered by the rapid development of techniques for evading malware detection. The application of
AI technology in malware detection has become a research hotspot, thanks to the outstanding
prediction performance of neural networks. For malware categorisation and detection, training
neural network designs on millions of samples provides the greatest effectiveness. From time to
time, anti-malware tools have been created to guard against malware attacks. However, malware
creators tend to be one step ahead of anti-malware researchers, employing advanced obfuscation,
polymorphic, morphing and novel approaches to evade anti-malware detection. It is therefore
necessary to develop a more robust solution that can detect malware more effectively. The proposed
multistage technique using machine learning and a compiler is expected to detect malware more
effectively.

RQ3: What are the proposed techniques to address the issue of malware detection?
The techniques offered in the reviewed studies are numerous and diverse: Machine Learning based
techniques, Deep Learning based techniques, Machine Learning with Compiler based techniques,
Machine Learning with Graph Theory, Machine Learning with Data Mining techniques, Machine
Learning with encryption techniques, Machine Learning with Markov Model techniques, and Machine
Learning with fuzzy techniques. They are discussed under the topics below.

Machine Learning based Technique: As shown in Table 2.5, machine learning based malware
detection techniques were recommended as remedies for malware detection in the following
papers:
[9,11,14,19,20,22,23,29,34,38,41,44,45,48,50,51,52,53,54,56,57,60,61,66,67,68,70,72,73,76,78,
83,84,85].
Deep Learning based Technique: Deep learning has been used to analyse malware in the last
several years. Different types of deep learning algorithms, such as convolutional neural networks
(CNN) and recurrent neural networks (RNN), have been applied to byte sequences, grey-scale
images, structural entropy, API call sequences, HTTP traffic, and network behaviour across a wide
range of malware analysis use cases. In the review, the researcher found 31 papers that used deep
learning approaches:
[5,6,8,19,20,21,25,26,27,30,31,35,36,38,39,42,47,49,50,55,57,58,62,63,64,65,74,75,79,80,82,
90].
Machine Learning with Compiler based Technique: A detailed review of papers [3-8] at the
beginning of the chapter found that machine learning, and mostly deep learning, has been used
together with compilers for malware detection, with promising results.
Machine Learning with Graph Theory Technique: Papers [12-15], reviewed at the beginning of the
chapter, show that graph theory with machine learning has been used for malware detection, with
good results.
Machine Learning with Data Mining Technique: Papers [15-16], reviewed at the beginning of the
chapter, show that data mining with machine learning has been used for malware detection, with
good results.
Machine Learning with Encryption Technique: Papers [17-18], reviewed at the beginning of the
chapter, show that encryption with machine learning has been used for malware detection, with
good results.

Other Techniques: The remaining 17 studies were placed in the 'Others' category since their
suggested methodologies did not fit into any of the previously mentioned categories.
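
The grey-scale image representation mentioned under the deep learning based technique treats each byte of a binary as one pixel intensity between 0 and 255. The following is a minimal sketch of that conversion, assuming a fixed row width and zero padding of the last row; the reviewed papers use various widths and padding schemes, so both choices, and the function name `bytes_to_grayscale_rows`, are illustrative assumptions.

```python
def bytes_to_grayscale_rows(data, width=16):
    """Arrange raw bytes into rows of `width` pixels (values 0-255).

    The last row is padded with zeros so every row has the same length.
    Both the width and the zero padding are illustrative assumptions.
    """
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # Round the length up to the next multiple of `width`.
    padded_length = -(-len(data) // width) * width
    padded = bytes(data).ljust(padded_length, b"\x00")
    return [list(padded[start:start + width]) for start in range(0, padded_length, width)]


if __name__ == "__main__":
    sample = bytes([0, 255, 16, 32, 7])  # stand-in for the bytes of a binary file
    for row in bytes_to_grayscale_rows(sample, width=2):
        print(row)
```

The resulting rows form a two-dimensional array that can be saved as an image or passed to a CNN; in practice the array is usually resized to a fixed shape before training.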

RQ4: On which area(s) has this research on malware detection mostly focused? Does it motivate
any future research?

The statistics in Table 2.5 show that deep learning combined with other technologies has become
the dominant field for malware detection in recent times. The details are discussed under RQ3.
Therefore, it is justified to propose deep learning with a compiler as a new technique for
malware detection.

Table 2.4: Journal and conference names, as well as the volume of research published in each.

S/N | Title of Journal/Conference | Number of studies

Conference Paper
1 | 6th International Conference for Convergence in Technology (I2CT) | 1
2 | 14th ACM Workshop on Artificial Intelligence and Security (AISec ’21) | 1
3 | Australasian Joint Conference on Artificial Intelligence | 1
4 | International Conference on Data Mining (DMIN) | 1
5 | IEEE 40th Annual Computer Software and Applications Conference (COMPSAC) | 1
6 | IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) | 1
7 | 1st International Conference on Engineering and Technology (ICoEngTech) 2021 | 1
8 | 7th International Conference on Networking, Systems and Security (NSysS) | 1
9 | International Conference on Emerging Trends in Communication, Control and Computing (ICONC3) | 1
10 | 2nd International Conference on Trends in Electronics and Informatics (ICOEI) | 1
11 | International Conference on Intelligent Sustainable Systems (ICISS) | 1
12 | ACM 1st International Workshop on Software Protection | 1
13 | ACM TO EDIT conference | 1
14 | 27th Signal Processing and Communications Applications Conference (SIU) | 1
15 | International Conference on Computer Vision and Image Processing | 1
16 | International Carnahan Conference on Security Technology (ICCST) | 1
17 | IEEE Transactions on Dependable and Secure Computing | 1
18 | IEEE International Conference on Big Data (Big Data) | 1
19 | Intelligence and Security Informatics Conference (EISIC) | 1
Total | | 19

Journal Paper
20 | Forensic Science International: Digital Investigation | 1
21 | Applied Sciences | 1
22 | Journal of Information Security Research | 1
23 | Computers & Electrical Engineering | 1
24 | Journal of Network and Computer Applications | 2
25 | The Journal of Supercomputing | 1
26 | IEEE Access | 6
27 | Future Generation Computer Systems | 1
28 | Journal of Big Data | 1
29 | Electronics | 5
30 | Journal of Engineering Science and Technology | 1
31 | Pattern Recognition and Tracking. | 1
33 | Information and Communication Technology for Sustainable Development, Springer. | 1
34 | Progress in Computing, Analytics and Networking | 1
35 | Cyber Security | 1
36 | The African Journal of Information and Communication (AJIC) | 3
37 | Logic Journal of IGPL | 1
38 | The Computer Journal | 1
39 | Computational Science and Technology, Springer. | 1
40 | International Journal of Network Security & Its Applications (IJNSA) | 1
41 | International Journal of Advance Soft Computing | 1
42 | Mathematical Biosciences and Engineering | 1
43 | Entropy | 1
44 | International Journal of Computer Network and Information Security | 1
45 | Expert Systems with Applications | 2
46 | Computers & Security | 1
47 | Ad Hoc Networks | 1
48 | arXiv | 1
49 | Journal of Navigation, Cambridge University Press | 1
50 | Logic Journal of the IGPL, Computer Journal, Oxford University Press | 2
Total | | 44

Table 2.5: An overview of the evaluation techniques.

SL | Adopted technique | Paper references | Number of papers
1 | Machine Learning based techniques | 22,23,29,34,38,41,44,45,48,50,51,52,53,54,56,57,60,61,66,67,68,70,72,73,76,78,83,84,85 | 32
2 | Deep Learning based techniques | 19,20,21,25,26,27,30,31,35,36,38,39,42,45,47,49,50,55,57,58,62,63,64,65,74,75,79,80,82 | 29
3 | ML + DL with Compiler based techniques | 3-7 | 5
4 | Machine Learning with Graph Theory | 12-15 | 4
5 | Machine Learning with Data Mining techniques | 16 | 2
6 | Machine Learning with encryption techniques | 17-18 | 2
8 | Other techniques | 1, 2, 8, 9, 10, 11, 86-90 | 17

2.3 Further Direction


An exhaustive systematic literature review on malware detection techniques has been conducted,
covering different journals and conference papers from reputed digital libraries. After reviewing
the research papers, a new trend was seen in which researchers build new malware detection
techniques mostly using machine learning and deep learning with other technologies such as
compilers, graph theory, data mining and Markov models to combat malware variants. The systematic
review of all the listed papers found that deep learning techniques (e.g. CNN) combined with
other technologies have become the dominant malware detection techniques in recent times.
Therefore, the systematic literature review, and RQ4 in particular, suggests a deep learning with
compiler based technique.

2.4 Theoretical framework


This section explains the proposed conceptual framework, which incorporates frequently applied
theories (Self-Nonself Theory and the Danger Theory, Machine Learning Theory, Behavioral
Psychology, Information Theory, Network Theory, Cryptography Principles, Epidemiological Models,
Complex Systems Theory and Evolutionary Biology), together with their relevance and
interrelationship for malware detection.

Fig. 2.2 The proposed Theoretical Framework

Self-Nonself Theory and the Danger Theory


The Self-Nonself Theory and the Danger Theory are employed in different areas of research,
including pattern recognition, anomaly detection and negative selection. The Artificial Immune
System (AIS), which is based on these theories, has become a common system in the cyber security
field through years of study [6392-6696-97], and has shown significant success in pattern
recognition, anomaly detection and negative selection. Researchers have shown great interest in
developing models and algorithms using artificial immune systems and machine learning, including
deep learning, to improve the accuracy of malware detection [63-64], [67-6892-95]. Besides
accuracy, another task of this system is to address mutated malware. Therefore, the Self-Nonself
Theory and the Danger Theory are used in the proposed multistage malware detection model.
Specifically, these theories are applied in the deep learning part of the multistage model for
malware feature selection.

Machine Learning Theory


Machine Learning Theory is widely recognized as a prominent research model for malware
detection [361-2]. The systematic literature review conducted in Chapter 2 of this proposal
found that the application of machine learning theory to malware detection has increased in
recent times. Machine learning theory has also been employed by the anti-virus company McAfee
[29] for malware detection.
Behavioral Psychology

Behavioral psychology principles have been widely used to study user behavior and to detect
deviations that may signal malware activity. User behavior analytics can help identify abnormal
patterns that suggest a security breach [6984].

Evolutionary Biology
Detecting malware variants is one of the biggest challenges for anti-malware developers.
Researchers are therefore continuously developing techniques to keep up with malware variants,
and some of these techniques adopt concepts from evolutionary biology [6794].
Cryptography Principles
An understanding of cryptography principles is essential for developing any malware detection
model. The work in [2517] combines artificial intelligence with cryptography to build a robust
system for malware detection. The underlying concept is used in the proposed multistage model.
Network theory and graph theory

Network theory and graph theory form a prominent research model that has received widespread
recognition for its usefulness in malware detection. In particular, an understanding of the
network architecture, communication patterns and network traffic is required to develop a model
for malware detection [21-2411-14].

Information Theory
This theory is applied to the security of real communication and computing systems, including
coding techniques for secure communication system design [7098]. Furthermore, it is used to
analyze data and identify anomalies that may indicate the presence of malware.
Epidemiological Models
Epidemiological models, such as the SIR (Susceptible-Infectious-Recovered) model, can be
adapted to analyze the spread of malware in a network. These models help in understanding how
malware propagates and can inform strategies for containment and mitigation. The proposed model
uses them to understand how malware propagates and which countermeasures apply.
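
As an illustration of how the SIR model can be adapted to malware spread, the sketch below steps a discrete-time SIR system over a population of hosts. The function name `simulate_sir`, the infection rate `beta`, the recovery (clean-up) rate `gamma` and all numeric values are hypothetical choices made for this example; they are not parameters taken from the proposed model.

```python
from typing import List, Tuple


def simulate_sir(n_hosts: int, infected0: int, beta: float, gamma: float,
                 steps: int, dt: float = 1.0) -> List[Tuple[float, float, float]]:
    """Discrete-time (Euler) SIR simulation over a fixed host population.

    S = hosts that can still be infected, I = infected hosts,
    R = hosts that were cleaned or patched. All values are illustrative.
    """
    s = float(n_hosts - infected0)
    i = float(infected0)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n_hosts * dt  # contacts between S and I
        new_recoveries = gamma * i * dt               # infected hosts cleaned up
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical network of 1000 hosts with a single initially infected host.
    for step, (s, i, r) in enumerate(simulate_sir(1000, 1, beta=0.5, gamma=0.1, steps=50)):
        if step % 10 == 0:
            print(f"t={step:2d}  S={s:7.1f}  I={i:7.1f}  R={r:7.1f}")
```

A larger `gamma` relative to `beta` corresponds to faster containment, which is the kind of mitigation question such models are used to explore.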
Proposed Theoretical framework
The framework is depicted in Figure 2.2, which illustrates the aforementioned theories: Machine
Learning Theory, the Self-Nonself Theory and the Danger Theory, Behavioral Psychology,
Evolutionary Biology, Cryptography Principles, Network Theory and Graph Theory, Information
Theory and Epidemiological Models. These theories jointly encompass essential aspects of malware
detection (encrypted malware, polymorphic malware, pattern recognition, anomaly detection,
negative selection, normal and abnormal behavior analysis, origin analysis and malware
propagation). They are therefore used to build the theoretical framework that inspires the
development of the proposed multistage malware detection model. The framework explores the
underlying mechanisms of pattern recognition, anomaly detection, negative selection, normal and
abnormal behavior analysis, origin analysis and malware propagation, which will become the
driving factors of the proposed malware detection model. Decisions concerning these aspects were
guided by the proposed theoretical framework. For example, the Self-Nonself Theory might inspire
the creation of machine learning models that differentiate between normal and abnormal system
behaviors, similar to how the immune system distinguishes between self and non-self.

Chapter 3

3.1 Methodology
This chapter describes the methodologies applied in the proposed multi-stage malware detection
model. It also describes the dataset that will be used in this research. The following steps
will be followed:

• Develop the architectural model
• Select the features to be extracted by the CNN
• Apply the deep learning algorithm
• Construct the malware analysis
• Convert the selected malware to symbols
• Identify symbols as malware
• Construct the results
• Set up the lab environment according to the lab architecture
• Collect PE datasets to train the model
• Clean the dataset to remove ambiguity
• Label the images and symbols, both to train the DL model and for use in the compiler
• Convert the images and symbols into binary arrays and shuffle them, so that the model has
access to a variety of data points, each of which belongs to a different class
• Examine existing studies
• Compare results
• Demonstrate results
The research methodology covers how the data are collected and analysed and how the proposed
multi-stage malware detection model will detect malware. The results will then be evaluated
against existing malware detection techniques. To this end, the proposed multistage malware
detection model has two stages: (I) the deep learning base sub model and (II) the compiler base
sub model.

3.1.1 Dataset


The proposed multistage malware detection model has two stages: (I) the deep learning base
stage (stage-1) and (II) the compiler base stage (stage-2). The deep learning base stage
requires an image-based dataset; therefore the widely accepted image-based malimg dataset,
created by Nataraj et al. [71107], was chosen for this stage. The compiler base stage requires a
symbol-based dataset; since symbol datasets are not widely available, the researcher converts
the same malimg image dataset into a symbol dataset.
The PE file format is a data structure that encapsulates the information necessary for the
Windows OS loader to manage the wrapped executable code. The PE format is native to Windows;
macOS and Android use their own executable and package formats. The proposed research work is
limited to these three operating systems, whose file types include .EXE (executable), .DLL
(Dynamic Link Libraries), .SYS (Windows drivers), .APP (executable), .OSX (executable), .BIN
(binary executable), .APK (Android Package), etc. The PE file format was not designed to resist
code modification, so an attacker can easily inject malware code into a PE file. Nataraj et al.
[71107] show that an infected PE file has a different visual texture from an uninfected one;
analysing the visual texture of a PE file is therefore relevant for judging whether the file is
infected. The malimg dataset was created with the same philosophy.

The malimg dataset was chosen for the deep learning base stage because it is widely accepted
in the malware research field. The dataset is taken from a public Kaggle repository
(https://www.kaggle.com/datasets/ikrambenabd/malimg-original). In its raw form, the dataset
consists of byte code extracted from the PE file header using IDA Pro and hex editor tools; the
bytes are then processed as a vector of 8-bit unsigned integers. Each PE file is then
represented as a grayscale image with pixel values ranging from 0 to 255, where 0 means black
and 255 means white. The image width is constant, but the height may change depending on the
file size.

Fig. 3.1 Transformation of malware binary to grayscale image
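
The byte-to-image transformation described above can be sketched as follows. This is a minimal illustration, not the tooling used to build the malimg dataset: the function name `bytes_to_grayscale`, the default width of 256 pixels and the zero padding of the last row are all assumptions made for the example.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Reshape a byte sequence into rows of 8-bit grayscale pixels.

    Each byte (0..255) becomes one pixel (0 = black, 255 = white).
    The width is fixed; the height grows with the file size.
    The final row is padded with zeros (an assumption of this sketch).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the last, incomplete row
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = bytes(range(10))  # stand-in for the bytes of an executable
    for row in bytes_to_grayscale(sample, width=4):
        print(row)
```

The resulting rows can be handed to any image library to save or display the picture; that step is left out to keep the sketch free of third-party dependencies.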

After conversion, 10227 grayscale images were generated, representing 25 malware families such
as Dontovo.A, Allaple.L, Yiner.A, Lolyda.AA 1, C2Lop.P, Instantaccess, Autorun.K and
Obfuscator.AD. In addition, one benign family containing 49489 samples was collected from
another source (https://www.kaggle.com/datasets/walt30/malware-images), as the experiment
requires both malicious and benign samples. The 25 malware families and the benign samples are
listed in Table 3.1.

Table 3.1 Malware Dataset of 25 Families and Benign Samples

Sl.  Malware Class  Malware Family & Benign  Number of Images
1    Worm           Allaple.L                1591
2    Worm           Allaple.A                2949
3    Worm           Yiner.A                  800
4    PWS            Lolyda.AA 1              213
5    PWS            Lolyda.AA 2              184
6    PWS            Lolyda.AA 3              123
7    Trojan         C2Lop.P                  146
8    Trojan         C2Lop.gen!g              200
9    Dialer         Instantaccess            431
10   TDownloader    Swizzor.gen!I            132
11   TDownloader    Swizzor.gen!E            128
12   Worm           VB.AT                    408
13   Rogue          Fakerean                 381
14   Trojan         Alueron.gen!J            198
15   Trojan         Malex.gen!J              136
16   PWS            Lolyda.AT                159
17   Dialer         Adialer.C                125
18   TDownloader    Wintrim.BX               97
19   Dialer         Dialplatform.B           177
20   TDownloader    Dontovo.A                162
21   TDownloader    Obfuscator.AD            142
22   Backdoor       Agent.FYI                116
23   Worm: AutoIT   Autorun.K                106
24   Backdoor       Rbot!gen                 158
25   Trojan         Skintrim.N               80
26   Benign         Non-malware              494

Fig. 3.2 Variants of 6 malware families (Dontovo.A, Autorun.K, Lolyda.AT, Adialer.C,
Swizzor.gen!I, Agent.FYI) and benign samples.

Fig. 3.2 shows examples of malware from 6 different families together with benign samples. An
empirical observation is that images of malware belonging to the same family appear visually
similar to each other and distinct from those of other families. This visual similarity inspired
the researcher to approach malware classification with techniques from computer vision, where
image-based classification has been well studied. The Dontovo.A malware family is examined in
more depth as an example.

Fig. 3.3 Different sections of a Trojan malware: Dontovo.A

Fig. 3.3 shows the different sections of a Trojan malware as an example for further
discussion. The .text section contains the wrapped executable code. The first part of the .text
section contains code whose texture is fine grained; the rest is filled with zeros (black),
indicating zero padding at the end of the .text section. The .data section carries both
uninitialized data (black region) and initialized data (fine-grained texture). The .rdata
section contains the read-only data of the executable file. The last section, .rsrc, stores all
of the module's resources, which may include icons used by other applications.

The researcher ran an experiment on the deep learning base stage with the malimg dataset to
test its performance, as described in the next section. The objective is to train the proposed
deep learning base model as a pilot. Furthermore, the researcher plans to create a new dataset
from recent malware for the real implementation of the model. This newly created dataset will
be converted into a symbol dataset using available image-to-symbol tools (IDA Pro and radare2)
for the 2nd stage of the model.

3.1.2 Deep Learning base stage (stage-1)

This section describes how the Convolutional Neural Network (CNN) algorithm will be used for
malware detection. First, grayscale images from an available dataset (e.g. the malimg dataset)
are fed to the CNN model illustrated in Fig. 3.4, which extracts features from the images using
convolutional layers, pooling layers and fully connected layers, as discussed in the previous
section. Next, the CNN model detects malware, with a certain percentage of accuracy, based on
the extracted features. To obtain better results, the CNN algorithm has been customized as given
in Algorithm 1. The detected malware is then fed to the 2nd stage of the model to validate the
result, as described in detail in the compiler base stage.

Fig. 3.4 1st stage of the proposed model

The researcher ran an experiment on the first stage of the model (the Convolutional Neural
Network, CNN) using the malimg dataset and obtained 95% accuracy. The model takes grayscale
images with a 224×224 input shape. The learnable convolution in the hidden layers then produces
an output of shape (224×224×16), while the non-learnable layer (MaxPool) divides the output by 2
to create a new output of shape (112×112×16). The same procedure is repeated in the subsequent
layers, with each convolution and MaxPooling pair reducing the output size by a further 50%. The
output size is calculated with the following rule:

$$ m_h = n_w = \frac{m + 2p - f}{s} + 1 $$

In this formula, $m_h$ is the image height and $n_w$ is the image width, both of which are
predefined for the model, and $m$ is the size of the input. Padding ($p$) is the number of
pixels added around the border of an input frame in a CNN; in the proposed model the padding is
0, meaning that the intensity of the extra pixels is also 0. Here, the batch size is set to 3
and $f$ denotes the kernel shape. Stride ($s$) is the number of pixels by which the filter
shifts across the input; since the model sets the stride to 1, the filter moves one pixel at a
time. Lastly, the final output layer, known as SoftMax, calculates the probability that the
input image belongs to each of the 25 classes. After training, the model was tested on benign
and malware files and achieved a 95% detection rate and 5% loss, as depicted in Fig. 3.5.

Fig. 3.5 (a) Training and validation accuracy of the 1st stage of the model. (b) Training and
validation loss of the 1st stage of the model.
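
The output-size rule quoted in this section can be checked with a few lines of code. The sketch below is only a worked example of that rule: the helper name `conv_output_size` and the kernel size of 3 are assumptions, and the example shows that a 3×3 kernel preserves a 224-pixel side only when one pixel of padding is used, whereas zero padding yields 222.

```python
def conv_output_size(m: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a convolution or pooling layer.

    m: input size, f: kernel size, p: padding, s: stride.
    Implements floor((m + 2p - f) / s) + 1.
    """
    if s <= 0:
        raise ValueError("stride must be positive")
    return (m + 2 * p - f) // s + 1


if __name__ == "__main__":
    # Assumed 3x3 convolution kernel; values are illustrative only.
    print(conv_output_size(224, f=3, p=1, s=1))  # size-preserving convolution
    print(conv_output_size(224, f=3, p=0, s=1))  # no padding shrinks the map
    print(conv_output_size(224, f=2, p=0, s=2))  # 2x2 max pooling halves it
```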

Further, the researcher plans to create a new dataset from recent malware for the real
implementation of the proposed model.

Algorithm 1: CNN
Input: Cleaned grayscale images of equal size
Output: Predicted class

START
1. Receive features from the training input.
2. Choose a deep-learning framework to build the CNN, such as PyTorch, TensorFlow or Keras.
3. Prepare the input data.
4. Build the architecture of the CNN, including the number of layers, the size of the filters and the number of
filters per layer. Then add layers to customise the model.
5. Divide the model into input, output, Conv, Dropout, pooling, hidden and fully connected layers.
6. Pass the input units x1, x2, x3, ..., xn to the hidden layer with weights attached. All the inputs are
multiplied by their weights.
7. Each hidden layer consists of neurons. All the inputs are connected to each neuron.
8. Compute the weighted sum plus the bias, a constant that helps the model fit in the best possible way:
M1 = W1*X1 + W2*X2 + W3*X3 + W4*X4 + W5*X5 + b, where W1..Wn are the weights assigned to the inputs
X1..Xn and b is the bias.
9. Initialise the kernel and apply the activation function (Rectified Linear Unit, ReLU) to the linear equation
M1 to introduce non-linearity into the model.
10. Repeat steps 7 and 8 in each hidden layer and move on to the last layer; this process is known as forward
propagation.
11. Apply the dense layer with the hidden layers, kernels and activation function.
12. In the output layer, calculate the error, which is the difference between the actual and the predicted output.
13. If the error is large, apply the following:
i. Starting from the randomly initialised weights and intercepts, propagate backwards; the errors are
calculated after all the computation.
ii. Compute the gradient, i.e. the derivative of the error w.r.t. the current weights.
iii. Compute the new weights, where the learning rate a controls the speed (step size) of back propagation
along the curve towards the global minimum, using the following formula:
$W_n = W_n - a \, \frac{\partial\, \mathrm{err}}{\partial W_n}$

14. Continue calculating the errors from the new weights and updating the weights until the global minimum is
reached and the loss is minimised.
15. Evaluate the CNN: visualise the filters or the feature maps and calculate the loss function together with
AUC, precision and recall using the confusion matrix.
16. If the model fits the training dataset, use an optimiser to minimise the error by fine-tuning the parameters.
Predict the result with the trained model and return the predicted result.

END
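
Steps 8, 9 and 13 of Algorithm 1 can be illustrated on a single neuron. The sketch below is not the customised CNN itself: it only shows the weighted sum with bias, the ReLU activation and one gradient-descent weight update. The squared-error loss, the helper names and all numeric values are assumptions chosen for the example.

```python
from typing import List, Tuple


def relu(x: float) -> float:
    """Rectified Linear Unit activation."""
    return max(0.0, x)


def forward(weights: List[float], bias: float, inputs: List[float]) -> Tuple[float, float]:
    """Step 8 and 9: M1 = sum(W_k * X_k) + b, followed by ReLU."""
    m1 = sum(w * x for w, x in zip(weights, inputs)) + bias
    return m1, relu(m1)


def train_step(weights: List[float], bias: float, inputs: List[float],
               target: float, lr: float) -> Tuple[List[float], float, float]:
    """Step 13: one update W_n = W_n - lr * d(err)/d(W_n).

    Uses err = 0.5 * (output - target)^2, an assumption of this sketch.
    """
    m1, out = forward(weights, bias, inputs)
    err = 0.5 * (out - target) ** 2
    grad_m1 = (out - target) if m1 > 0 else 0.0  # derivative through ReLU
    new_weights = [w - lr * grad_m1 * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * grad_m1
    return new_weights, new_bias, err


if __name__ == "__main__":
    w, b = [0.1, 0.2, 0.3], 0.1          # illustrative starting weights
    x, y = [1.0, 2.0, 3.0], 2.0          # illustrative input and target
    for step in range(20):
        w, b, err = train_step(w, b, x, y, lr=0.01)
        if step % 5 == 0:
            print(f"step {step:2d}  error {err:.5f}")
```

Repeating the update shrinks the error step by step, which is the behaviour step 14 of the algorithm relies on.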


The data structures of a PE file are the DOS Header, DOS Stub, PE File Header, Image Optional
Header, Section Table, Data Directories and Sections. Among these, the Sections are the most
relevant for this research because the .text section holds the executable code. An individual
sample from the malimg dataset, the Dontovo.A image discussed above, contains different types
of information.

The Dontovo.A image depicts a popular Trojan downloader that downloads and executes arbitrary
files. It is worth noting that the various parts (binary fragments) of the malware have diverse
visual textures in many situations, and there is a full taxonomy of fundamental binary fragments
and their representations as grayscale images. The executable code is found in the .text
section. The code with a fine-grained pattern occupies the initial part of the .text section, as
seen in the image; the rest is filled with zeros (black), indicating zero padding at the end of
the section. The .data section that follows carries both uninitialized data (black region) and
initialized data (fine-grained texture). The last component is the .rsrc section, which stores
all of the module's resources; these may also include icons used by a program.
The following features are retrieved from the PE header and sections:
a. Total number of sections: the number of entries in the section table.
b. Compile time: the time and date at which the file was created.
c. Indicator of compile time: 1 if TimeDateStamp > current date, otherwise 0.
d. Total number of symbols: the number of symbols in the symbol table.
The derived feature "Indicator of compile time" is based on the TimeDateStamp. The compiler can
record the compile time and date in the file when creating binaries, and an attacker may modify
this value. The indicator is set to 1 when a binary claims to have been compiled in the future,
which marks the file as suspicious. Total number of symbols was chosen as a feature because some
malware writers remove the symbols from the binary to make analysis more difficult. The
categories of malware that will be used in this research are shown in Table 3.1.
Link: Malimg dataset: https://www.kaggle.com/datasets/keerthicheepurupalli/malimg-dataset9010
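
The derived feature "Indicator of compile time" can be sketched as below. The function name `compile_time_indicator` and the explicit `now` parameter are assumptions made for this example; the sketch relies only on the fact that the PE TimeDateStamp field stores seconds since 1 January 1970 (UTC), and it does not parse a real PE file.

```python
from datetime import datetime, timezone
from typing import Optional


def compile_time_indicator(time_date_stamp: int,
                           now: Optional[datetime] = None) -> int:
    """Return 1 if the recorded compile time lies in the future, else 0.

    time_date_stamp: value of the PE TimeDateStamp field
                     (seconds since 1970-01-01 00:00:00 UTC).
    now:             reference time; defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    compiled_at = datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)
    return 1 if compiled_at > now else 0


if __name__ == "__main__":
    reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
    past = int(datetime(2020, 6, 1, tzinfo=timezone.utc).timestamp())
    future = int(datetime(2030, 6, 1, tzinfo=timezone.utc).timestamp())
    print(compile_time_indicator(past, reference))    # plausible compile time
    print(compile_time_indicator(future, reference))  # suspicious compile time
```

Passing the reference time explicitly keeps the feature reproducible when a dataset is rebuilt at a later date.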

To insert symbols into the compiler base stage, a symbol dataset must be created through the
following steps:

I) Obtain a set of PE files: A collection of PE image files is required that will be converted
into symbols. These files can be obtained from various sources, such as software packages
or publicly available malware.

II) Understand the PE file format: The researcher needs to be familiar with the structure
and layout of the PE file format. This helps to extract relevant information and symbols
from the file.

III) Extract symbol information: Symbols provide names and other metadata for
functions, variables and other program elements within the PE file. Tools such as debuggers
or symbol extraction libraries help in extracting symbol information from the PE image
files. Popular tools include IDA Pro and radare2.

IV) Define the dataset structure: Decide on the structure and format of the dataset. This
depends on the specific information that needs to be extracted from the PE files. For
example, function names, addresses, sizes and other relevant attributes might be included
in the dataset.

V) Write a script or program: Use a programming language (such as Python) to write a
script or program that processes the PE files and extracts the desired symbol information.
The researcher can leverage existing libraries or develop custom parsing logic based on the
PE file format specifications.

VI) Iterate over the PE files: Iterate over the collection of PE files and apply the script or
program to extract the symbol information from each file. Store the extracted information
in the desired dataset format, such as a jpg file or a database, as required.

VII) Validate and clean the dataset: After extracting the symbols, perform any necessary
validation and cleaning steps on the dataset. This may involve removing duplicates,
handling missing or inconsistent data, and ensuring the overall quality of the dataset.

VIII) Add additional features or labels: Depending on the malware detection case, the
dataset can be enhanced with additional features or labels. For example, information
about the file's origin, its malware classification and other relevant attributes can be
included.

IX) Split the dataset: The dataset can be split for training and testing the model. This
ensures that the models are evaluated properly and helps prevent over-fitting. The
proposed model, however, will use the symbols as identifiers or tokens in the compiler.

X) Document the dataset: Provide documentation that describes the structure, content
and any pre-processing technique applied to the dataset. This documentation will be
helpful for future researchers to understand and utilise the dataset.
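
Steps IV, VII and IX can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `SymbolRecord`, the helper names `clean_records` and `split_records`, and the sample values are hypothetical, and the actual symbol extraction with IDA Pro or radare2 (step III) is outside the scope of the sketch.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SymbolRecord:
    """One row of a hypothetical symbol dataset (step IV)."""
    file_name: str
    symbol: str
    address: int
    size: int
    label: str  # e.g. malware family or "benign"


def clean_records(records: List[SymbolRecord]) -> List[SymbolRecord]:
    """Step VII: drop records with missing names or invalid sizes, and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if not rec.symbol or rec.size < 0:
            continue
        key = (rec.file_name, rec.symbol, rec.address)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def split_records(records: List[SymbolRecord], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[SymbolRecord], List[SymbolRecord]]:
    """Step IX: shuffle reproducibly and split into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    rows = [SymbolRecord("a.exe", f"func_{k}", 0x1000 + 16 * k, 16, "Dontovo.A")
            for k in range(10)]
    rows.append(rows[0])                                       # duplicate entry
    rows.append(SymbolRecord("a.exe", "", 0x2000, 8, "Dontovo.A"))  # missing name
    train, test = split_records(clean_records(rows))
    print(len(train), len(test))
```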


3.1.3 Compiler base stage (stage-2)


This section describes how the proposed compiler base stage will be used to validate the
accuracy of the deep learning base stage (the CNN model) and how the malware is removed, as
depicted in Fig. 3.6. The 2nd stage of the model is built on the six phases of a compiler
(lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code
optimisation and code generation), while ensuring that the functions of the compiler remain the
same. The GCC (GNU Compiler Collection) compiler has been used to build this stage. First, the
output of the first stage of the model, which is a malware sample (grayscale image), is
converted to symbols using available image-to-symbol tools (IDA Pro and radare2). Next, this
malware, now in symbol form, is fed to the 2nd stage of the model. Each symbol then receives a
distinct Unicode number through a symbolic link. For example, if 10000 symbols are loaded into
the compiler base stage, all 10000 symbols receive distinct Unicode numbers, in the same way as
keywords and standard identifiers (int, printf, scanf, etc.) in the compiler. As a test, if intt
is written instead of int, the compiler shows an error and the program does not run, for the
simple reason that one is treated as a variable and the other is a keyword. Similarly, if any
symbol or Unicode number other than the 10000 previously loaded ones is run through the
compiler, the compiler removes it at the code optimisation phase instead of raising an error.

Fig. 3.6 2nd stage of the proposed model
In this way, the compiler base stage is applied after the malware has been identified, with a
certain percentage of accuracy, at the first stage of the model. Therefore, if any misjudgement
is made at the 1st stage, there is still a mechanism for validating the result a second time and
removing the malware. Once detected, the malware gets no opportunity to run on the victim
system, because it is removed by the 2nd stage of the model. To insert symbols into the compiler
base stage, a symbol dataset is required; it is created by following steps I) to X) listed
above.

During compilation, compilers create and maintain symbol tables, data structures that store
information about the occurrence of various entities such as variable names, function names,
objects, classes and interfaces. Depending on the language (e.g. C, C++, Java or Python), the
dataset entry algorithm may use the symbol table for the following purposes:
I. To store the names of all entities in a structured form in one place.
II. To verify whether a variable has been declared.
III. To implement type checking, by verifying that assignments and expressions in the source
code are semantically correct.
IV. To determine the scope of a name (scope resolution).

A symbol table dataset is simply a table, which can be either a linear list or a hash table. It
maintains an entry for each name in the following format:
<symbol name, type, attribute>
For example, if a symbol table has to store data about the following variable declaration:
static int malware;
then it must store an entry such as:
<malware, int, static>
The attribute field contains the entries related to the name.
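
As a minimal sketch of the entry format above, the following Python code keeps one <symbol name, type, attribute> entry per name in a hash table (a Python dict). The names SymbolEntry, SymbolTable, insert and lookup are illustrative assumptions and are not part of the proposed model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SymbolEntry:
    """One symbol table entry: <symbol name, type, attribute>."""
    name: str
    type: str
    attribute: str


class SymbolTable:
    """Hash-table symbol table keyed by symbol name (illustrative only)."""

    def __init__(self) -> None:
        self._entries: Dict[str, SymbolEntry] = {}

    def insert(self, name: str, type_: str, attribute: str) -> SymbolEntry:
        # Reject a second declaration of the same name in this table.
        if name in self._entries:
            raise KeyError(f"symbol already declared: {name}")
        entry = SymbolEntry(name, type_, attribute)
        self._entries[name] = entry
        return entry

    def lookup(self, name: str) -> Optional[SymbolEntry]:
        # Returns None when the name has not been declared.
        return self._entries.get(name)


table = SymbolTable()
# static int malware;  ->  <malware, int, static>
table.insert("malware", "int", "static")
```

A lookup that returns None corresponds to purpose II above (the variable has not been declared).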
Once the symbol table has been generated, the proposed algorithm implements the 2nd stage of
the model.

A1: Proposed Algorithm
0. Start checking by the compiler.
1. If (white-space || tab): /* in lexical analyser (LA) */
then remove the white-spaces and tabs.
2. If (comment): /* in lexical analyser (LA) */
then remove the comment.
3. If (unrecognised_symbol): /* in lexical analyser (LA) */
then remove the unrecognised symbol.
4. If (image-symbol): /* in lexical analyser (LA) */
then the lexical analyser generates an image-symbol token for the symbol.
4.1 If (lexeme is an image-symbol):
then make an entry in the symbol table.
4.2 If (image-symbol comes from a different image dataset than the one used in the machine learning base model):
then check it.
4.3 If (symbol is suspicious or malicious):
then check it.
4.3.1 If (check result is malicious):
then give a warning to the user.
5. If (symbol is not suspicious/malicious): /* in lexical analyser (LA) */
then read the next input symbol.
6. If (symbol is meaningful): /* in semantic analyser (SA) */
then make an entry in the symbol table.
If (symbol is not meaningful):
then remove it.
7. If (symbol is dead code): /* in code optimiser (CO) */
then remove it in the code optimisation phase.
8. Repeat steps 4 to 7.
9. If (whole code or symbol check is OK):
then compile, assemble and run.
10. Finally, load to memory and store to SSD.
11. Finish.
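
The lexical part of the algorithm above (steps 1 to 6) can be sketched in Python as follows. This is only a sketch under stated assumptions: the token pattern TOKEN_RE, the KEYWORDS set and the `suspicious` set (which stands in for the check against the first-stage dataset) are hypothetical, and dropping a suspicious symbol in addition to warning about it is an assumption. Semantic analysis and dead-code removal (step 7) are not modelled.

```python
import re
from typing import List, Set, Tuple

# Hypothetical token pattern: comments, white-space, identifiers, numbers, punctuation.
TOKEN_RE = re.compile(
    r"(?P<comment>/\*.*?\*/|//[^\n]*)"
    r"|(?P<space>\s+)"
    r"|(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<number>\d+)"
    r"|(?P<punct>[=+\-*/;(),{}])"
    r"|(?P<unknown>.)",
    re.DOTALL,
)

# Hypothetical keyword list; keywords are kept as tokens but are not symbols.
KEYWORDS = {"static", "int", "return"}


def scan(source: str, suspicious: Set[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (kept tokens, symbol table names, warnings) for a source string."""
    tokens: List[str] = []
    symbols: List[str] = []
    warnings: List[str] = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("space", "comment", "unknown"):
            continue  # steps 1-3: drop white-space, comments, unrecognised symbols
        if kind == "ident":
            if text in suspicious:
                # step 4.3.1: warn the user (dropping the token is an assumption)
                warnings.append(f"suspicious symbol: {text}")
                continue
            if text not in KEYWORDS and text not in symbols:
                symbols.append(text)  # step 6: enter the symbol in the symbol table
        tokens.append(text)  # step 5: keep the token and read the next input
    return tokens, symbols, warnings


tokens, symbols, warnings = scan(
    "static int x; /* c */ evil(); @ y = x;", suspicious={"evil"}
)
```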

3.1.4 Proposed multi-stage malware detection model


The proposed multi-stage malware detection model has two stages, depicted below in Figure 3.7;
the function of each stage has already been discussed in the previous section. In this section,
the proposed model is analysed further.

Fig. 3.7 Proposed Multi-stage Malware Detection Model


The grey-scale images from the Malimg dataset are fed to the first stage of the model, which
detects malware with a certain percentage of accuracy. Next, the 2nd stage of the model validates
the result of the first stage and removes the malware. If the first stage of the model correctly
detects a sample m as malware, what decision will the compiler render? In that case, the compiler
will validate the result and remove the malware. If the first stage of the model misclassifies m
as benign, then the 2nd stage of the model will detect the malware. This is intended to leave no
chance of a missed detection, because a compiler detects even a single wrongly typed comma (,).
To check the validity of the first stage's detection, the dataset used by the first stage of the
model is also used by the compiler for detection in the lexical phase. The malware is removed in
the code optimisation phase of the compiler, as a CNN cannot remove malware. The final decision
is made by the 2nd stage of the model.
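
The decision flow described above can be sketched as follows. This is a minimal illustration, not the implementation of the model: the two stages are passed in as plain functions, and the names Verdict, detect, cnn_stage and compiler_stage are assumptions made for the example.

```python
from typing import Callable, NamedTuple


class Verdict(NamedTuple):
    stage1_malware: bool    # label predicted by the CNN stage
    stage2_malware: bool    # label produced by the compiler stage
    final_malware: bool     # final decision, taken from the 2nd stage
    stage1_validated: bool  # True when the 2nd stage confirms the 1st stage


def detect(sample: bytes,
           cnn_stage: Callable[[bytes], bool],
           compiler_stage: Callable[[bytes], bool]) -> Verdict:
    """Run both stages on a sample; the 2nd stage makes the final decision."""
    s1 = cnn_stage(sample)
    s2 = compiler_stage(sample)
    return Verdict(s1, s2, final_malware=s2, stage1_validated=(s1 == s2))


# Stub stages for illustration: the CNN misclassifies the sample as benign,
# while the compiler stage flags it.
verdict = detect(b"sample m", cnn_stage=lambda s: False, compiler_stage=lambda s: True)
```

In this stubbed case the final decision is malware even though the first stage reported benign, which is the misclassification scenario discussed in the paragraph above.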

The following is a flow chart for implementing the proposed multistage malware detection
model to detect malware.

Fig. 3.8 Flow chart for implementing the proposed malware detection technique.

To demonstrate the effectiveness and efficiency of the proposed multi-stage malware detection
model, the following tests will be conducted: (1) performance comparisons with various malware
image and symbol sizes, (2) performance of the proposed algorithm, and (3) performance
comparisons of the proposed multi-stage malware detection model with previously evaluated
malware detection techniques. In the evaluation, the Positive/Negative label refers to the
predicted outcome, whereas the True/False label indicates whether that prediction matches the
real outcome. Depending on the statistical tools, the researcher will use various equations to
obtain statistical results of the analysis based on the above detection techniques.
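
As an illustration of the kind of equations involved, the sketch below computes the standard accuracy, true positive rate (TPR), false positive rate (FPR) and precision from confusion-matrix counts. The function name confusion_metrics and the example counts are assumptions made for this sketch; they are not results of the proposed model.

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (TP, FP, TN, FN)."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "tpr": _ratio(tp, tp + fn),        # true positive rate (recall)
        "fpr": _ratio(fp, fp + tn),        # false positive rate
        "precision": _ratio(tp, tp + fp),
    }


# Illustrative counts only.
metrics = confusion_metrics(tp=90, fp=5, tn=95, fn=10)
```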

Chapter 4

4.1 Expected Outcome


This chapter provides the expected outcome of the methods applied to build the malware
detection model specified in Section 3, as well as the overall expected outcome of this
research.

Expected Outcome from Experimental and Testing of Model

The expected outcomes are the expected results or observations from the experimental and
testing activities specified in the previous sections. To achieve them, the techniques need to
be fitted as follows:

(i) To improve the accuracy of the Technique

1. It is always a good idea first to make sure that the output (dependent) variable (target or
label) actually depends on the input variables (features). Otherwise, the researcher may be
chasing a dependency that does not exist. There is a way to check this, but step 2 needs to be
carried out first.

2. Start by using the z-scores to normalise the input variables. Any normalising would do but
there is a reason for using z-scores. It has to do with the next step.

3. The researcher can then perform a Principal Component Analysis (PCA). It shows how much of
the total variation in the input data each new variable (obtained after the transformation)
accounts for. Relating the leading components to the output variable helps answer the question
raised at the outset about the existence of a dependency. Before performing PCA, the variables
have to be normalised using z-scores.

4. After PCA, use the new (transformed) variables as the inputs to the neural network. The
original variables could also be used, but the new variables have the following advantage: some
of them can be omitted if their contribution to the variation is negligible.

5. If there is a dependency between the output and the inputs, the researcher can tune the
neural network in many ways until the best possible accuracy is obtained.

6. Create another set of training symbol data that duplicates the data in the under-represented
areas. Then train with the modified training set.

7. The researcher's dataset will be evaluated by an existing model.

8. The researcher's validated model will be added at the fully connected layer of a transfer
learning model to speed up the calculation with high accuracy (it may also be necessary to
fine-tune the parameters of the researcher's model and to train a few layers of the transfer
learning model).

9. Finally, everything described in the methodology will be applied, as a compiler, to detect
malware in real time.
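
Steps 2 and 6 above can be sketched with the Python standard library as follows. This is a minimal sketch under assumptions: the function names z_scores and oversample, the row layout (feature list, label) and the example data are hypothetical, and random duplication is only one possible way to "duplicate the data in the under-represented areas".

```python
import random
import statistics
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (features, label)


def z_scores(values: Sequence[float]) -> List[float]:
    """Normalise one feature column to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # a constant column carries no variation
    return [(v - mean) / std for v in values]


def oversample(rows: List[Row], seed: int = 0) -> List[Row]:
    """Duplicate rows of under-represented classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    balanced = list(rows)
    for label, count in counts.items():
        pool = [row for row in rows if row[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced


normalised = z_scores([1.0, 2.0, 3.0])
rows = [([0.1], "benign"), ([0.2], "benign"), ([0.3], "benign"), ([0.9], "malware")]
balanced = oversample(rows)
```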

4.2 Overall Expected Outcome


The overall expected outcome for this research is mapped to the research questions and research
objectives. There are three research outcomes, which also define the project's research
contribution. They address the three research questions of this thesis and are presented in
Table 4.1.

Table 4.1: Overall expected outcome of the proposed malware detection technique

Row 1
Problem statement: There is a lack of datasets (symbol-based and image-based) available for
multi-stage analysis to detect malware.
Research objective: To create a novel dataset using current malware.
Methods: Collect current malware files and benign files from various sites to create malware
and benign datasets.
Outcome: A novel enhanced dataset and symbol set created to provide a better accuracy rate.

Row 2
Problem statement: Deep learning (CNN) and compiler-based multi-stage analysis has not been done
enough, but it is essential for detection and execution protection.
Research objective: To develop a multi-stage malware detection model using deep learning (CNN)
and a compiler.
Methods:
(a) Set up the lab environment according to the lab architecture.
(b) Collect PE datasets to train the model.
(c) Clean the dataset to remove ambiguity.
(d) Label the images and symbols to train both the DL model and the compiler.
(e) Convert the images and symbols into binary arrays and shuffle them so that the model has
access to a variety of data points, each of which belongs to a different class.
Outcome: Developed a multi-stage malware detection model.

Row 3
Problem statement: Lack of performance evaluation of the proposed model against existing malware
detection models.
Research objective: To evaluate the performance of the proposed malware detection model against
existing malware detection models.
Methods:
(a) Require a comparison with existing research.
(b) Compare the proposed results with existing research results.
(c) Statistical analysis and evaluation techniques such as CM, TP, FP, TPR, FPR, ACU, and RAC
are to be used to check each point of the research development.
Outcome: Validate the performance by using evaluation techniques (CM, TP, FP, TPR, FPR, ACU, and
RAC) of the proposed multi-stage framework, which makes computations faster and detects
essential malware features while using CNNs and compilers to determine the final structure of a
malware image or symbol-based shape.

Chapter 5

5.1 Conclusion

The increase in malware and advanced cyber-attacks is becoming a serious problem. Unknown
malware that has not yet been identified by security vendors is often used for attacks, and it
is becoming difficult to detect. Researchers are continuously developing new techniques to
detect malware. At present, it has become a race between malware creators and anti-malware
developers.

The purpose of this research proposal is to establish that the performance of the proposed
multi-stage malware detection model is higher than that of existing malware detection models.
In order to build the model, a systematic literature review was conducted to identify the gaps,
weaknesses and challenges of existing models. The gap in current models is the lack of a
performance validation mechanism within the model and of a malware removal system.

The proposed solution is a multi-stage malware detection model using a Convolutional Neural
Network (CNN) algorithm and a compiler, which is expected to fill the gap. The proposed model
has two stages: a CNN-based sub-model and a compiler-based sub-model, respectively. The first
stage of the model will detect malware with a percentage of accuracy, and the 2nd stage of the
model will validate the result and remove the malware.

To build the proposed model, the challenges faced are the lack of datasets and of previous work
combining a CNN with a compiler. Finally, the proposed model is expected to be more robust and
resilient than existing models when properly implemented.

Reference

[1] N. Z. Gorment, A. Selamat, L. K. Cheng and O. Krejcar, "Machine Learning Algorithm for Malware
Detection: Taxonomy, Current Challenges and Future Directions," in IEEE Access, doi:
10.1109/ACCESS.2023.3256979.

[29]McAfee:McAfee Labs Threats report April 2021. Available online: https://


https://www.mcafee.com/blogs/other-blogs/mcafee-labs/2021-threat-predictions-report/(accessed on 30 August
2023).

[310] AV-TEST Institute: Statistic of malware. Available online: https://www.av-test.org/en/statistics/malware/


(accessed on 30 August 2023).

[4102] Internet Security Threat Report (ISTR), Symantec Cor-portion, 2018 (accessed on 30th September-2023).

[5103] Pipeline Attack Yields Urgent Lessons About U.S. Cybersecurity, The Gurdian newspaper, available online
https://www.nytimes.com/2021/05/14/us/politics/pipeline-hack.html (accessed on 30 August 2023).

[6104] Cyberattacks on hospitals are growing threats to patient safety, available online
https://www.nytimes.com/2021/05/14/us/politics/pipeline-hack.html, (accessed on 30 August 2023).

[7105] What is LockBit ransomware and how does it operate?


https://www.theguardian.com/business/2023/jan/13/what-is-lockbit-ransomware-and-how-does-it-operate-malware-
royal-mail (accessed on 30 August 2023).

[899] Toldinas, Jevgenijus, Algimantas Venčkauskas, Robertas Damaševičius, Šarūnas Grigaliūnas, Nerijus
Morkevičius, and Edgaras Baranauskas. 2021. "A Novel Approach for Network Intrusion Detection Using
Multistage Deep Learning ImageRecognition"Electronics10,no. 15: 1854.
https://doi.org/10.3390/electronics10151854

[9100] X. Li, M. Xu, P. Vijayakumar, N. Kumar and X. Liu, "Detection of Low-Frequency and Multi-Stage
Attacks in Industrial Internet of Things," in IEEE Transactions on Vehicular Technology, vol. 69, no. 8, pp.
8820-8831, Aug. 2020, doi: 10.1109/TVT.2020.2995133.

[10101] M. Injadat, A. Moubayed, A. B. Nassif and A. Shami, "Multi-Stage Optimized Machine Learning
Framework for Network Intrusion Detection," in IEEE Transactions on Network and Service Management, vol.
18, no. 2, pp. 1803-1816, June 2021, doi: 10.1109/TNSM.2020.3014929.

[117] Z. Wang and M. O’Boyle, "Machine Learning in Compiler Optimization," in Proceedings of the IEEE,
vol. 106, no. 11, pp. 1879-1901, Nov. 2018, doi: 10.1109/JPROC.2018.2817118.

[1215] S. Sehrawat and Dr. D. Singh, “Malware and Malware Detection Techniques: A Survey,” International
Journal for Research in Applied Science and Engineering Technology, vol. 10, no. 5, pp. 3947–3953, May 2022,
doi: https://doi.org/10.22214/ijraset.2022.43287.

[1316] Souri, A., Hosseini, R. A state-of-the-art survey of malware detection approaches using data mining
techniques. Hum. Cent. Comput. Inf. Sci. 8, 3 (2018). https://doi.org/10.1186/s13673-018-0125-x

[1424] K. Shaukat, S. Luo, and V. Varadharajan, “A novel deep learning-based approach for malware
detection,” Engineering Applications of Artificial Intelligence, vol. 122, p. 106030, Jun. 2023, doi:
https://doi.org/10.1016/j.engappai.2023.106030.

[1520] Vasan, D.; Alazab, M.; Wassan, S.; Safaei, B.; Zheng, Q. Image-based malware classification using
ensemble of CNN architectures (IMCEC). Computers & Security. 2020, 92.

62
[1621] AKANDWANAHO, Stephen M. and KOOBLAL, Muni. Intelligent Malware Detection Using a Neural
Network Ensemble Based on a Hybrid Search Mechanism. The African Journal of Information and
Communication (AJIC). 2019, vol.24, pp.1-21. ISSN 2077-7213. http://dx.doi.org/10.23962/10539/28660.

[1725] Mustfa Majid, A.-A. et al. (2023) ‘A review of artificial intelligence based malware detection using Deep
Learning’, Materials Today: Proceedings, 80, pp. 2678–2683. doi: 10.1016/j.matpr.2021.07.012.

[184] Ren, X. (2021, March 23). Unleashing the hidden power of compiler optimization on binary code
Difference: an Empirical study. arXiv.org. https://arxiv.org/abs/2103.12357

[195] Z. Tian, Y. Huang, B. Xie, Y. Chen, L. Chen and D. Wu, "Fine-Grained Compiler Identification With
Sequence-Oriented Neural Modeling," in IEEE Access, vol. 9, pp. 49160-49175, 2021, doi:
10.1109/ACCESS.2021.3069227.

[206] Chen, Y., Shi, Z., Li, H., Zhao, W., Liu, Y., Qiao, Y. (2019). HIMALIA: Recovering Compiler
Optimization Levels from Binaries by Deep Learning. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent
Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer,
Cham. https://doi.org/10.1007/978-3-030-01054-6_3

[2111] Amira, A. et al. (2023) ‘A survey of malware analysis using community detection algorithms’, ACM
Computing Surveys [Preprint]. doi:10.1145/3610223.

[2212] Sun, Q. et al. (2022) ‘Leveraging spectral representations of control flow graphs for efficient analysis of
Windows Malware’, Proceedings of the 2022 ACM on Asia Conference on Computer and Communications
Security [Preprint]. doi:10.1145/3488932.3527294.

[2313] Li, S., Zhou, Q., Zhou, R. et al. Intelligent malware detection based on graph convolutional network.
Journal of Supercomputing (2021). https://doi.org/10.1007/s11227-021-04020-y.

[2414] A. Hellal, F. Mallouli, A. Hidri and R. K. Aljamaeen, "A survey on graph-based methods for malware
detection," 2020 4th International Conference on Advanced Systems and Emergent Technologies (IC_ASET),
Hammamet, Tunisia, 2020, pp. 130-134, doi: 10.1109/IC_ASET49463.2020.9318301.

[25] B. Sharma, P. Goel and J. K. Grewal, "Advances and Challenges in Cryptography using Artificial
Intelligence," 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), Lonavla, India,
2023, pp. 1-5, doi: 10.1109/I2CT57861.2023.10126338.

[26] Jagsir Singh, Jaswinder Singh, A survey on machine learning-based malware detection in executable files,
Journal of Systems Architecture,volume 112, 2021, 101861, ISSN 1383-7621,
https://doi.org/10.1016/j.sysarc.2020.101861.

[27] Dugyala, R. et al. (2022) ‘Analysis of malware detection and signature generation using a novel hybrid
approach’, Mathematical Problems in Engineering, 2022, pp. 1–13. doi:10.1155/2022/5852412.

[28] Kamboj, A. et al. (2023) ‘Detection of malware in downloaded files using various machine learning
models’, Egyptian Informatics Journal, 24(1), pp. 81–94. doi:10.1016/j.eij.2022.12.002.

[29] Akhtar, M.S.; Feng, T. Evaluation of Machine Learning Algorithms for Malware Detection. Sensors 2023,
23, 946. https://doi.org/10.3390/s23020946

[30] Akhtar, M.S.; Feng, T. Malware Analysis and Detection Using Machine Learning Algorithms. Symmetry
2022, 14, 2304. https:// doi.org/10.3390/sym14112304

[31] Alnajim, A.M.; Habib, S.; Islam, M.; Albelaihi, R.; Alabdulatif, A. Mitigating the Risks of Malware Attacks
with Deep Learning Techniques. Electronics 2023, 12, 3166. https://doi.org/10.3390/electronics12143166

[32] H. Malani, A. Bhat, S. Palriwala, J. Aditya and A. Chaturvedi, "A Unique Approach to Malware Detection

63
Using Deep Convolutional Neural Networks," 2022 4th International Conference on Electrical, Control and
Instrumentation Engineering (ICECIE), KualaLumpur, Malaysia, 2022, pp. 1-6, doi:
10.1109/ICECIE55199.2022.10000344.

[33] Gibert, D., Mateu, C. and Planes, J., 2020. The rise of machine learning for detection and classification of
malware: Research developments, trends and challenges. Journal of Network and Computer Applications, 153,
p.102526.

[34]Kumar, R. and Subbiah, G. (2022) ‘Explainable machine learning for malware detection using ensemble
bagging algorithms’, Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing
[Preprint]. doi:10.1145/3549206.3549284.

[35] Roseline, S.A.; Sasisri, A.D.; Geetha, S.; Balasubramanian, C. Towards Efficient Malware Detection and
Classification using Multilayered Random Forest Ensemble Technique. In Proceedings of the 2019 International
Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; pp. 1–6.

[36] Sahay, S.K., Sharma, A., Rathore, H. (2020). Evolution of Malware and Its Detection Techniques. In: Tuba, M.,
Akashe, S., Joshi, A. (eds) Information and Communication Technology for Sustainable Development. Advances in
Intelligent Systems and Computing, vol 933. Springer, Singapore. https://doi.org/10.1007/978-981-13-7166-0_14

[37] D. Pizzolotto and K. Inoue, "Identifying Compiler and Optimization Options from Binary Code using Deep
Learning Approaches," 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME),
Adelaide, SA, Australia, 2020, pp. 232-242, doi: 10.1109/ICSME46990.2020.00031.

[38] L. Jones, A. Sellers and M. Carlisle, "CARDINAL: similarity analysis to defeat malware compiler
variations," 2016 11th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo,
PR, USA, 2016, pp. 1-8, doi: 10.1109/MALWARE.2016.7888728.

[39] Huijuan Zhu, Huahui Wei, Liangmin Wang, Zhicheng Xu, Victor S. Sheng, An effective end-to-end
android malware detection method, Expert Systems with Applications, Volume 218, 2023,119593,ISSN 0957-
4174, https://doi.org/10.1016/j.eswa.2023.119593.

[40] Djenna, A.; Bouridane, A.; Rubab, S.; Marou, I.M. Artificial Intelligence-Based Malware Detection,
Analysis, and Mitigation. Symmetry 2023, 15, 677. https://doi.org/10.3390/sym15030677

[41] Qiaokun Wen, K.P. Chow, CNN based zero-day malware detection using small binary segments, Forensic
Science International: Digital Investigation, Volume 38, Supplement, 2021, 301128, ISSN 2666-2817,
https://doi.org/10.1016/j.fsidi.2021.301128.

[42] Wolsey, Adam. (2022). The State-of-the-Art in AI-Based Malware Detection Techniques: A Review.
10.48550/arXiv.2210.11239.

[43] M. Kim, "Research on Malware Detection System Using Artificial Intelligence," 2022 IEEE/ACIS 7th
International Conference on Big Data, Cloud Computing, and Data Science (BCD), Danang, Vietnam, 2022, pp.
211-213, doi: 10.1109/BCD54882.2022.9900792.

[44] Shahidi, S., Shakeri, H. and Jalali, M., 2021. A semantic malware detection model based on the GMDH
neural networks. Computers & Electrical Engineering, 91, p.107099.

[45] Aslan, O. and Yilmaz, A., 2021. A New Malware Classification Framework Based on Deep Learning
Algorithms. IEEE Access, 9, pp.87936-87951.

[46] Hemalatha, J., Roseline, S., Geetha, S., Kadry, S. and Damaševičius, R., 2021. An Efficient DenseNet-
Based Deep Learning Model for Malware Detection. Entropy, 23(3), p.344.

[47] G. Sun and Q. Qian, "Deep Learning and Visualization for Identifying Malware Families," in IEEE
Transactions on Dependable and Secure Computing, vol. 18, no. 1, pp. 283-295, 1 Jan.-Feb. 2021, doi:

64
10.1109/TDSC.2018.2884928.

[48] Damaševiˇcius, R.; Venˇckauskas, A.; Toldinas, J.; Grigali ¯ unas, Š. Ensemble-Based Classification Using
Neural Networks and Machine Learning Models for Windows PE Malware Detection. Electronics 2021, 10, 485.

[49] Gurumayum Akash Sharma, Khundrakpam Johnson Singh, Maisnam Debabrata Singh, A Deep Learning
Approach to Image-Based Malware Analysis, Progress in Computing, Analytics and Networking, 2020, Volume
1119 ISBN: 978-981-15-2413-4

[50] Huan Zhou, Malware Detection with Neural Network Using Combined Features, Cyber Security, 2019,
Volume 970, ISBN: 978-981-13-6620-8

[51] Halim, Mudzfirah & Abdullah, Azizi & Zainol Ariffin, Khairul Akram. (2019). Recurrent Neural Network
for Malware Detection, International Journal of Advance Soft Computing. Appl, Vol.11, No. 1, March 2019
ISSN 2074-8523.

[52] Bozkir, A.S.; Cankaya, A.O.; Aydos, M. Utilization and Comparison of Convolutional Neural Networks in
Malware Recognition. In Proceedings of the 27th Signal Processing and Communications Applications Conference
(SIU), Sivas, Turkey, 24–26 April 2019; pp. 1–4.

[53] Wei Zhong, Feng Gu, A multi-level deep learning system for malware detection, Expert Systems with
Applications, Volume 133, 2019, Pages 151-162, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2019.04.064.

[54] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran and S. Venkatraman, "Robust Intelligent


Malware Detection Using Deep Learning," in IEEE Access, vol. 7, pp. 46717-46738, 2019, doi:
10.1109/ACCESS.2019.2906934.

[55] Rad, B.B., Nejad, M.K.H. and Shahpasand, M.A.R.Y.A.M., 2018. Malware classification and detection
using artificial neural network. Journal of Engineering Science and Technology, 13, pp.14-23.

[56] Alycia N. Carey, Huy Mai, Justin Zhan, Asif Mehmood, "Adversarial attacks against image-based malware
detection using autoencoders," Proceedings Volume 11735, Pattern Recognition and Tracking XXXII; 117350A
(2021); doi:10.1117/12.2587923

[57] Kujanpää, K., Victor, W. and Ilin, A., 2021. Automating Privilege Escalation with Deep Reinforcement
Learning. arXiv preprint arXiv:2110.01362.

[58] Bawazeer, O., Helmy, T. and Al-hadhrami, S., 2021. Malware Detection Using Machine Learning
Algorithms Based on Hardware Performance Counters: Analysis and Simulation. Journal of Physics: Conference
Series, 1962(1), p.012010.

[59] Kwon, S. et al. (2020) ‘Machine learning based malware detection with the 2019 Kisa Data Challenge
Dataset’, Proceedings of the 2020 ACM International Conference on Intelligent Computing and its Emerging
Applications [Preprint]. doi:10.1145/3440943.3444745.

[60] Sarker, I.H., Kayes, A.S.M., Badsha, S. et al. Cybersecurity data science: an overview from machine
learning perspective. Journal of Big Data 7, 41 (2020). https://doi.org/10.1186/s40537-020-00318-5

[61] Chen, H., Su, J., Qiao, L. and Xin, Q., 2018. Malware Collusion Attack against SVM: Issues and
Countermeasures. Applied Sciences, 8(10), p.1718.

[62] J. Su, V. D. Vasconcellos, S. Prasad, S. Daniele, Y. Feng, and K. Sakurai, “Lightweight classification of IoT
malware based on image recognition,” in Proceedings of the 2018 IEEE 42nd Annual Computer Software and
Applications Conference (COMPSAC), pp. 664–669, Tokyo, Japan, July 2018. View at Publisher • View at
Google Scholar

[63] S. Choudhary and A. Sharma, "Malware Detection & Classification using Machine Learning," 2020

65
International Conference on Emerging Trends in Communication, Control and Computing (ICONC3), 2020, pp.
1-4, doi: 10.1109/ICONC345789.2020.9117547.

[64] N. Udayakumar, V. J. Saglani, A. V. Cupta and T. Subbulakshmi, "Malware Classification Using Machine
Learning Algorithms," 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI),
2018, pp. 1-9, doi: 10.1109/ICOEI.2018.8553780.

[63] Chliah, H., Battou, A., Baz, O. (2022). Artificial Immune System and Artificial Neural Network in Intrusion
Detection System. In: Elhoseny, M., Yuan, X., Krit, Sd. (eds) Distributed Sensing and Intelligent Systems.
Studies in Distributed Intelligence . Springer, Cham. https://doi.org/10.1007/978-3-030-64258-7_67

[64] Skobtsov, Y. (2022). Artificial Immune Systems—Models and Applications. In: Kravets, A.G., Bolshakov,
A.A., Shcherbakov, M. (eds) Cyber-Physical Systems: Intelligent Models and Algorithms. Studies in Systems,
Decision and Control, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-95116-0_3

[65] Singh, K., Kaur, L. & Maini, R. A survey of intrusion detection techniques based on negative selection
algorithm. Int J Syst Assur Eng Manag 13 (Suppl 1), 175–185 (2022). https://doi.org/10.1007/s13198-021-
01357-8

[66] Praneet Saurabh, Bhupendra Verma, Negative selection in anomaly detection—A survey, Computer Science
Review,Volume 48, 2023, 100557, ISSN 1574-0137,https://doi.org/10.1016/j.cosrev.2023.100557.

[67] C. Duru, J. Ladeji–Osias, K. Wandji, T. Otily and R. Kone, "A Review of Human Immune Inspired
Algorithms for Intrusion Detection Systems," 2022 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA,
2022, pp. 364-371, doi: 10.1109/AIIoT54504.2022.9817213.

[68] I. Dutt, S. Borah and I. K. Maitra, "Immune System Based Intrusion Detection System (IS-IDS): A
Proposed Model," in IEEE Access, vol. 8, pp. 34929-34941, 2020, doi: 10.1109/ACCESS.2020.2973608.

[69] S. Tobiyama, Y. Yamaguchi, H. Shimada, T. Ikuse and T. Yagi, "Malware Detection with Deep Neural
Network Using Process Behavior," 2016 IEEE 40th Annual Computer Software and Applications Conference
(COMPSAC), 2016, pp. 577-582, doi: 10.1109/COMPSAC.2016.151

[70] M. Bloch et al., "An Overview of Information-Theoretic Security and Privacy: Metrics, Limits and
Applications," in IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 1, pp. 5-22, March 2021,
doi: 10.1109/JSAIT.2021.3062755.

[71] Nataraj, L., Karthikeyan, S., Jacobs, G. and Manjunath, B.S. “Malware Images: Visualization and Automatic
Classification” VizSec '11: Proceedings of the 8th International Symposium on Visualization for Cyber Security,
July 2011Article No.: 4 Pages 17 7https://doi.org/10.1145/2016904.2016908

[72] S. Choudhary and A. Sharma, "Malware Detection & Classification using Machine Learning," 2020
International Conference on Emerging Trends in Communication, Control and Computing (ICONC3), 2020, pp.
1-4, doi: 10.1109/ICONC345789.2020.9117547.

[73]N.Udayakumar, V. J. Saglani, A. V. Cupta and T. Subbulakshmi, "Malware Classification Using


MachineLearning Algorithms," 2018 2nd International Conference on Trends in Electronics and Informatics
(ICOEI), 2018, pp. 1-9, doi: 10.1109/ICOEI.2018.8553780.

[74] Nghi Phu, T., Dai Tho, N., Huy Hoang, L., Ngoc Toan, N. and Ngoc Binh, N., 2020. An Efficient
Algorithm to Extract Control Flow-Based Features for IoT Malware Detection. The Computer Journal, 64(4),
pp.599-609.

[75]Roseline, S.A.; Geetha, S.; Kadry, S.; Nam, Y. Intelligent Vision-based Malware Detection and
Classification using Deep Random Forest Paradigm. IEEE Access, 8, 206303–206324, 2020, doi:
10.1109/ACCESS.2020.3036491[24] K. Shaukat, S. Luo, and V. Varadharajan, “A novel deep learning-based

66
approach for malware detection,” Engineering Applications of Artificial Intelligence, vol. 122, p. 106030, Jun.
2023, doi: https://doi.org/10.1016/j.engappai.2023.10603[25] Mustafa Majid, A.-A. et al. (2023) ‘A review of
artificial intelligence based malware detection using Deep Learning’, Materials Today: Proceedings, 80, pp.
2678–2683. doi: 10.1016/j.matpr.2021.07.012.

[26] Kamboj, A. et al. (2023) ‘Detection of malware in downloaded files using various machine learning
models’, Egyptian Informatics Journal, 24(1), pp. 81–94. doi:10.1016/j.eij.2022.12.002.

[27] Akhtar, M.S.; Feng, T. Evaluation of Machine Learning Algorithms for Malware Detection. Sensors
2023, 23, 946. https://doi.org/10.3390/s23020946

[28] Huijuan Zhu, Huahui Wei, Liangmin Wang, Zhicheng Xu, Victor S. Sheng, An effective end-to-end
android malware detection method, Expert Systems with Applications, Volume 218, 2023,119593,ISSN
0957-4174, https://doi.org/10.1016/j.eswa.2023.119593.

[29] Alnajim, A.M.; Habib, S.; Islam, M.; Albelaihi, R.; Alabdulatif, A. Mitigating the Risks of Malware
Attacks with Deep Learning Techniques. Electronics 2023, 12, 3166.
https://doi.org/10.3390/electronics12143166

[30] Djenna, A.; Bouridane, A.; Rubab, S.; Marou, I.M. Artificial Intelligence-Based Malware Detection,
Analysis, and Mitigation. Symmetry 2023, 15, 677. https://doi.org/10.3390/sym15030677

[31] Dugyala, R. et al. (2022) ‘Analysis of malware detection and signature generation using a novel hybrid
approach’, Mathematical Problems in Engineering, 2022, pp. 1–13. doi:10.1155/2022/5852412

[32] Akhtar, M.S.; Feng, T. Malware Analysis and Detection Using Machine Learning Algorithms. Symmetry 2022,
14, 2304. https://doi.org/10.3390/sym1411230

[33] H. Malani, A. Bhat, S. Palriwala, J. Aditya and A. Chaturvedi, "A Unique Approach to Malware Detection
Using Deep Convolutional Neural Networks," 2022 4th International Conference on Electrical, Control and
Instrumentation Engineering (ICECIE), Kuala Lumpur, Malaysia, 2022, pp. 1-6, doi:
10.1109/ICECIE55199.2022.10000344.

[34] Wolsey, Adam. (2022). The State-of-the-Art in AI-Based Malware Detection Techniques: A Review.
10.48550/arXiv.2210.11239.

[35] M. Kim, "Research on Malware Detection System Using Artificial Intelligence," 2022 IEEE/ACIS 7th
International Conference on Big Data, Cloud Computing, and Data Science (BCD), Danang, Vietnam, 2022, pp.
211-213, doi: 10.1109/BCD54882.2022.9900792.

[36] Jagsir Singh, Jaswinder Singh, A survey on machine learning-based malware detection in executable
files, Journal of Systems Architecture, volume 112, 2021, 101861, ISSN 1383-7621,
https://doi.org/10.1016/j.sysarc.2020.1018

[37] Shahidi, S., Shakeri, H. and Jalali, M., 2021. A semantic malware detection model based on the GMDH
neural networks. Computers & Electrical Engineering, 91, p.107099.

[38] Aslan, O. and Yilmaz, A., 2021. A New Malware Classification Framework Based on Deep Learning
Algorithms. IEEE Access, 9, pp.87936-87951.

[76] P. Kotian and R. Sonkusare, "Detection of Malware in Cloud Environment using Deep Neural Network,"
2021 6th International Conference for Convergence in Technology (I2CT), 2021, pp. 1-5, doi:
10.1109/I2CT51068.2021.9417901.

[77] Imtiaz, S., Rehman, S., Javed, A., Jalil, Z., Liu, X. and Alnumay, W., 2021. DeepAMD: Detection and
identification of Android malware using high-efficient Deep Artificial Neural Network. Future Generation
Computer Systems, 115, pp.844-856.

[78] Khammas, B., 2018. Malware Detection using Sub-Signatures and Machine Learning Technique. Journal of
Information Security Research, 9(3), p.96.

[50] D. Pizzolotto and K. Inoue, "Identifying Compiler and Optimization Options from Binary Code using Deep
Learning Approaches," 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME),
Adelaide, SA, Australia, 2020, pp. 232-242, doi: 10.1109/ICSME46990.2020.00031.

[51] Sarker, I.H., Kayes, A.S.M., Badsha, S. et al. Cybersecurity data science: an overview from machine
learning perspective. Journal of Big Data 7, 41 (2020). https://doi.org/10.1186/s40537-020-00318-5

[52] Gurumayum Akash Sharma, Khundrakpam Johnson Singh, Maisnam Debabrata Singh, A Deep Learning
Approach to Image-Based Malware Analysis, Progress in Computing, Analytics and Networking, 2020, Volume
1119 ISBN: 978-981-15-2413-4

[79] Barut, O., Grohotolski, M., DiLeo, C., Luo, Y., Li, P. and Zhang, T., 2020. Machine Learning Based
Malware Detection on Encrypted Traffic: A Comprehensive Performance Study. 7th International Conference on
Networking, Systems and Security.

[80] Ali, M., Shiaeles, S., Bendiab, G. and Ghita, B., 2020. MALGRA: Machine Learning and N-Gram Malware
Feature Extraction and Detection System. Electronics, 9(11), p.1777.

[81] Luca Demetrio, Scott E. Coull, Battista Biggio, Giovanni Lagorio, Alessandro Armando, and Fabio Roli.
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for
Windows Malware Detection. In Proceedings of ACM TO EDIT conference. ACM, New York, NY, USA, 31
pages. https://doi.org/TOEDIT

[82] Wadkar, M.; Di Troia, F.; Stamp, M. Detecting malware evolution using support vector machines. Expert
Systems with Applications. 2020,143, 113022.

[83] Naeem, H.; Ullah, F.; Naeem, M.R.; Khalid, S.; Vasan, D.; Jabbar, S.; Saeed, S. Malware detection in
industrial internet of things based on hybrid image visualization and deep learning model. Ad Hoc Netw. 2020,
105.

[84] I. M. M. Matin and B. Rahardjo, "Malware Detection Using Honeypot and Machine Learning," 2019 7th
International Conference on Cyber and IT Service Management (CITSM), Jakarta, Indonesia, 2019, pp. 1-4, doi:
10.1109/CITSM47753.2019.8965419.

[85] Sanjay K. Sahay, Ashu Sharma and Hemant Rathore, Evolution of Malware and its Detection
Techniques, Springer, Information and Communication Technology for Sustainable Development, pp 139-150,
2019.

[61] Huan Zhou, Malware Detection with Neural Network Using Combined Features, Cyber
Security, 2019, Volume 970, ISBN: 978-981-13-6620-8

[62] Halim, Mudzfirah & Abdullah, Azizi & Zainol Ariffin, Khairul Akram. (2019). Recurrent Neural
Network for Malware Detection, International Journal of Advance Soft Computing. Appl, Vol.11, No. 1, March
2019 ISSN 2074-8523.

[86] Maleki, N., Bateni, M. and Rastegari, H., 2019. An Improved Method for Packed Malware Detection using
PE Header and Section Table Information. International Journal of Computer Network and Information Security,
11(9), pp.9-17.

[64] Bozkir, A.S.; Cankaya, A.O.; Aydos, M. Utilization and Comparison of
Convolutional Neural Networks in Malware Recognition. In Proceedings of the 27th Signal Processing and
Communications Applications Conference (SIU), Sivas, Turkey, 24–26 April 2019; pp. 1–4.

[87] Roseline, S.A.; Hari, G.; Geetha, S.; Krishnamurthy, R. Vision-Based Malware Detection and Classification
Using Lightweight Deep Learning Paradigm. In Proceedings of the International Conference on Computer
Vision and Image Processing, Jaipur, India, 27–29 September 2019; pp. 62–73.

[88] Z. Cui, F. Xue, X. Cai, Y. Cao, G. -g. Wang and J. Chen, "Detection of Malicious Code Variants Based on
Deep Learning," in IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3187-3196, July 2018, doi:
10.1109/TII.2018.2822680.

[89] Agarap, A.F.; Pepito, F.J.H. Towards building an intelligent anti-malware system: a deep learning approach
using support vector machine (SVM) for malware classification. arXiv 2017, arXiv:1801.00318.

[70] Chen, H., Su, J., Qiao, L. and Xin, Q., 2018. Malware Collusion Attack against SVM: Issues and
Countermeasures. Applied Sciences, 8(10), p.1718.

[90] Ashik, M.; Jyothish, A.; Anandaram, S.; Vinod, P.; Mercaldo, F.; Martinelli, F.; Santone, A. Detection of
Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms.
Electronics 2021, 10, 1694. https://doi.org/10.3390/electronics10141694.

[2] Sahay, S.K., Sharma, A., Rathore, H. (2020). Evolution of Malware and Its Detection Techniques. In: Tuba, M.,
Akashe, S., Joshi, A. (eds) Information and Communication Technology for Sustainable Development. Advances in
Intelligent Systems and Computing, vol 933. Springer, Singapore. https://doi.org/10.1007/978-981-13-7166-0_14

[3] D. Pizzolotto and K. Inoue, "Identifying Compiler and Optimization Options from Binary Code using Deep
Learning Approaches," 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME),
Adelaide, SA, Australia, 2020, pp. 232-242, doi: 10.1109/ICSME46990.2020.00031.

[4] Ren, X. (2021, March 23). Unleashing the hidden power of compiler optimization on binary code Difference:
an Empirical study. arXiv.org. https://arxiv.org/abs/2103.12357

[5] Z. Tian, Y. Huang, B. Xie, Y. Chen, L. Chen and D. Wu, "Fine-Grained Compiler Identification With Sequence-
Oriented Neural Modeling," in IEEE Access, vol. 9, pp. 49160-49175, 2021, doi: 10.1109/ACCESS.2021.3069227.

[6] Chen, Y., Shi, Z., Li, H., Zhao, W., Liu, Y., Qiao, Y. (2019). HIMALIA: Recovering Compiler Optimization Levels
from Binaries by Deep Learning. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications.
IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham.
https://doi.org/10.1007/978-3-030-01054-6_3

[7] Z. Wang and M. O’Boyle, "Machine Learning in Compiler Optimization," in Proceedings of the IEEE, vol. 106,
no. 11, pp. 1879-1901, Nov. 2018, doi: 10.1109/JPROC.2

[8] L. Jones, A. Sellers and M. Carlisle, "CARDINAL: similarity analysis to defeat malware compiler
variations," 2016 11th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, PR,
USA, 2016, pp. 1-8, doi: 10.1109/MALWARE.2016.7888728.

[9] McAfee: McAfee Labs Threats report April 2021. Available online:
https://www.mcafee.com/blogs/other-blogs/mcafee-labs/2021-threat-predictions-report/ (accessed on 30
August 2023).

[10] AV-TEST Institute: Statistic of malware. Available online: https://www.av-test.org/en/statistics/malware/
(accessed on 30 August 2023).

[11] Amira, A. et al. (2023) ‘A survey of malware analysis using community detection algorithms’, ACM
Computing Surveys [Preprint]. doi:10.1145/3610223.

[12] Sun, Q. et al. (2022) ‘Leveraging spectral representations of control flow graphs for efficient analysis of
Windows Malware’, Proceedings of the 2022 ACM on Asia Conference on Computer and Communications
Security [Preprint]. doi:10.1145/3488932.3527294.

[13] Li, S., Zhou, Q., Zhou, R. et al. Intelligent malware detection based on graph convolutional network. Journal
of Supercomputing (2021). https://doi.org/10.1007/s11227-021-04020-y.

[14] A. Hellal, F. Mallouli, A. Hidri and R. K. Aljamaeen, "A survey on graph-based methods for malware
detection," 2020 4th International Conference on Advanced Systems and Emergent Technologies (IC_ASET),
Hammamet, Tunisia, 2020, pp. 130-134, doi: 10.1109/IC_ASET49463.2020.9318301.

[15] S. Sehrawat and Dr. D. Singh, “Malware and Malware Detection Techniques: A Survey,” International
Journal for Research in Applied Science and Engineering Technology, vol. 10, no. 5, pp. 3947–3953, May 2022,
doi: https://doi.org/10.22214/ijraset.2022.43287.

[16] Souri, A., Hosseini, R. A state-of-the-art survey of malware detection approaches using data mining
techniques. Hum. Cent. Comput. Inf. Sci. 8, 3 (2018). https://doi.org/10.1186/s13673-018-0125-x

[17] B. Sharma, P. Goel and J. K. Grewal, "Advances and Challenges in Cryptography using Artificial
Intelligence," 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), Lonavla, India,
2023, pp. 1-5, doi: 10.1109/I2CT57861.2023.10126338.

[18] Kumar, R. and Subbiah, G. (2022) ‘Explainable machine learning for malware detection using
ensemble bagging algorithms’, Proceedings of the 2022 Fourteenth International Conference on
Contemporary Computing [Preprint]. doi:10.1145/3549206.3549284.

[19] Roseline, S.A.; Sasisri, A.D.; Geetha, S.; Balasubramanian, C. Towards Efficient Malware Detection and
Classification using Multilayered Random Forest Ensemble Technique. In Proceedings of the 2019
International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019;
pp. 1–6.

[20] Vasan, D.; Alazab, M.; Wassan, S.; Safaei, B.; Zheng, Q. Image-based malware classification using
ensemble of CNN architectures (IMCEC). Computers & Security. 2020, 92.

[21] AKANDWANAHO, Stephen M. and KOOBLAL, Muni. Intelligent Malware Detection Using a Neural
Network Ensemble Based on a Hybrid Search Mechanism. The African Journal of Information and
Communication (AJIC). 2019, vol.24, pp.1-21. ISSN 2077-7213. http://dx.doi.org/10.23962/10539/28660.
