Received November 22, 2019, accepted December 22, 2019, date of publication January 3, 2020, date of current version January 10, 2020.

Digital Object Identifier 10.1109/ACCESS.2019.2963724

A Comprehensive Review on Malware Detection Approaches
ÖMER ASLAN 1,2 AND REFIK SAMET 1, (Member, IEEE)
1 Computer Engineering Department, Ankara University, 06830 Ankara, Turkey
2 Computer Engineering Department, Siirt University, 56100 Siirt, Turkey

Corresponding author: Ömer Aslan (omer.aslan@siirt.edu.tr)

ABSTRACT According to recent studies, malicious software (malware) is increasing at an alarming
rate, and some malware can hide in the system by using different obfuscation techniques. In order to protect
computer systems and the Internet from malware, the malware needs to be detected before it affects a
large number of systems. Recently, several studies have been conducted on malware detection approaches.
However, the detection of malware still remains problematic. Signature-based and heuristic-based detection
approaches are fast and efficient at detecting known malware, but the signature-based approach in particular
fails to detect unknown malware. On the other hand, behavior-based, model checking-based, and
cloud-based approaches perform well for unknown and complicated malware, and deep learning-based,
mobile devices-based, and IoT-based approaches are also emerging to detect some portion of known and unknown
malware. However, no approach can detect all malware in the wild. This shows that building an effective
method to detect malware is a very challenging task, and there is a huge gap for new studies and methods.
This paper presents a detailed review of malware detection approaches and of recent detection methods which
use these approaches. The goal of the paper is to give researchers a general idea of the malware detection
approaches, the pros and cons of each detection approach, and the methods that are used in these approaches.

INDEX TERMS Cyber security, malware classification, malware detection approaches, malware features.

The associate editor coordinating the review of this manuscript and approving it for publication was Ali Kashif Bashir.

I. INTRODUCTION
In recent years, almost every member of society has come to use the Internet in daily life. It is now almost impossible to do anything without the Internet, including social interaction, online banking, health-related transactions, and marketing. As the Internet has grown rapidly, criminals have started to commit crimes on the Internet rather than in the real world. Criminals generally use malicious software to launch cyber-attacks against victim machines. Any software which intentionally executes malicious payloads on victim machines (computers, smart phones, computer networks, etc.) is considered malware. There are different types of malware, including viruses, worms, Trojan horses, rootkits, and ransomware. Each malware type and family is designed to affect the victim machine in a different way, such as damaging the targeted system, allowing remote code execution, or stealing confidential data. These days, the classification of malware is getting harder because some malware instances can present the characteristics of multiple classes at the same time.

In the early days, malware was written for simple purposes and was therefore easier to detect. This kind of malware can be defined as traditional (simple) malware. These days, however, malware which can run in kernel mode and is more destructive and harder to detect than traditional malware can be defined as new generation (next-generation) malware. This kind of malware can easily bypass protection software that runs in kernel mode, such as firewalls and antivirus software. Generally, traditional malware consists of one process and does not use complicated techniques to hide itself. New generation malware, on the other hand, uses multiple existing or new processes at the same time and uses obfuscation techniques to hide itself and become persistent in the system. New generation malware can launch more destructive attacks, such as targeted and persistent attacks which have never been seen before, and more than one type of malware is used during the attacks. The comparison of traditional versus new generation malware can be seen in Table 1.

These days, the number, sophistication, and cost of malware inflicted on the world economy have been increasing

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 6249
Ö. Aslan, R. Samet: Comprehensive Review on Malware Detection Approaches

steadily. According to scientific and business reports, approximately 1 million malware files are created every day, and cybercrime will cost the world economy approximately $6 trillion annually by 2021 [1]. Recent studies show that mobile malware is on the rise. According to the McAfee mobile threat report, there is a huge increase in backdoors, fake applications, and banking Trojans for mobile devices [2]. Malware attacks related to social media, the healthcare industry, cloud computing, the Internet of Things (IoT), and cryptocurrencies are also on the rise. According to Cybersecurity Ventures, ransomware will cost around $11.5 billion globally by the end of 2019 [1].

TABLE 1. Traditional versus new generation malware.

To protect legitimate users and companies from malware, malware needs to be detected. Malware detection is the process of determining whether a given program has malicious intent or not. In the early days, the signature-based detection approach was widely used to detect malware. However, this approach has limitations; for example, it cannot detect unknown and new generation malware. Over time, researchers proposed new approaches, including behavior-, heuristic-, and model checking-based detection. Along with these approaches, data mining and machine learning (ML) algorithms have also come to be used widely in malware detection. Recently, new approaches have been proposed, such as deep learning-, cloud-, mobile devices-, and IoT-based detection. For known malware and some unknown malware, the heuristic detection approach performs well. For unknown and complicated malware, on the other hand, behavior-, model checking-, and cloud-based approaches perform better. Deep learning-, mobile devices-, and IoT-based approaches are also emerging to detect some portion of known and unknown malware. It has not been proven that one detection approach is more effective than the others. This is because each method has its own advantages and disadvantages, and in different situations one method can detect better than another. Even though several new methods have been proposed by using different malware detection approaches, no method can detect all new generation and sophisticated malware. This shows that building an effective method to detect malware is a very challenging task, and there is a huge demand for new studies and methods.

This paper presents a literature review in order to investigate the current state of malware detection approaches. The paper makes the following contributions:
• Explains new technological trends for malware creation and new approaches to detect malware.
• Investigates the probability of detecting malware.
• Presents a summary of the current studies on malware detection.
• Explains important approaches and methods for malware detection.
• Discusses current challenges and proposes new assumptions for malware detection approaches.
• Provides a systematic overview of malware detection approaches and methods for further studies.

The rest of the paper is organized as follows: Section II defines the problem. Malware detection techniques and algorithms are explained in Section III, and malware detection approaches are explained in Section IV. An evaluation of malware detection approaches is presented in Section V. Finally, the conclusion and future work are given in Section VI.

II. PROBLEM DEFINITION
This section investigates the problem of malware and the possibility of detecting it. It can be said that it is impossible to design an algorithm which can detect all malware, because the problem of detecting malware has been shown to be NP-complete in many studies. This is important because, before starting to build an effective detection system, it is good practice for a researcher to understand the scope, limitations, and capabilities of a malware detector. Detecting malware remains problematic because theoretically it is a hard problem, and practically malware creators use complicated techniques such as obfuscation to make the detection process very challenging.

A. DIFFICULTY OF PROBLEM IN THEORY
Since the first malware that appeared in the wild was a virus, most of the theoretical studies were based on the detection of viruses. According to early studies, the detection of viruses is impossible in the general case [3]–[5] and NP-complete [6]–[9]. According to F. Cohen, the detection of a computer virus is undecidable because the detection process itself contains a contradiction [3], [5], [6]. If the detection problem is seen as a decision-making problem, D (the decision-maker) decides whether a program P is a virus or not. According to Cohen, it cannot be decided whether P is a virus: if D marks P as a virus, P will not be able to make changes to other programs and therefore will not act as a virus. If D does not identify P as a virus, P will interact with other programs to spread and infect them.
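The contradiction in Cohen's argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `make_contrary_program`, `detector_is_wrong`, and the two stand-in detectors are hypothetical and do not come from the cited works, and the "programs" merely return text labels without performing any action.

```python
from typing import Callable

Program = Callable[[], str]
Detector = Callable[[Program], bool]


def make_contrary_program(detector: Detector) -> Program:
    """Build a program P that asks D about itself and then does the opposite."""
    def program() -> str:
        if detector(program):
            return "stays benign"    # D says "virus", so P does not spread
        return "infects others"      # D says "not a virus", so P spreads
    return program


def detector_is_wrong(detector: Detector) -> bool:
    """True when D's verdict on the contrary program contradicts its behavior."""
    program = make_contrary_program(detector)
    verdict_is_virus = detector(program)
    behaves_like_virus = program() == "infects others"
    return verdict_is_virus != behaves_like_virus


# Two trivial stand-in detectors (hypothetical): flag everything, flag nothing.
flag_everything: Detector = lambda program: True
flag_nothing: Detector = lambda program: False

for name, detector in [("flag_everything", flag_everything),
                       ("flag_nothing", flag_nothing)]:
    print(name, "is wrong about P:", detector_is_wrong(detector))
```

Whatever deterministic verdict a detector returns for this program, the program's behavior is the opposite of the verdict, so the detector is wrong in both cases.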


This decision process involves a contradiction, and therefore it is not possible to decide whether P is a virus. According to Chess and White, there is no program that detects all viruses without false positives (FPs), because viruses are polymorphic and can exist in different forms [5]. According to Adleman, detecting a virus is quite intractable and almost impossible [7]: based on Gödel numberings of the partial recursive functions, it is not possible to create a general detection mechanism. Reliably identifying a bounded-length mutating virus is shown to be NP-complete in [8]. According to the author, a virus detector for a certain virus strain can be used to solve the satisfiability problem. Since the satisfiability problem is known to be NP-complete, the detection of such malware is NP-complete as well. Zuo et al. claim that there exist computer viruses whose detecting procedures have sufficiently large time complexity, and that there are undecidable viruses which have no minimal detecting procedure [9].

B. DIFFICULTY OF PROBLEM IN PRACTICE
New generation malware uses common obfuscation techniques such as encryption, oligomorphic, polymorphic, metamorphic, stealth, and packing methods to make the detection process more difficult. This kind of malware can easily bypass protection software that runs in kernel mode, such as firewalls and antivirus software, and some malware instances can also present the characteristics of multiple classes at the same time. This makes it practically almost impossible to detect all malware with a single detection approach. The common obfuscation techniques are defined as follows:

• Encryption: The malware uses encryption to hide the malicious code block within its entire code [10]. Hence, the malware becomes invisible in the host.
• Oligomorphic: In the oligomorphic method, a different key is used when encrypting and decrypting the malware payload [11]. Thus, malware which uses the oligomorphic method is more difficult to detect than malware which uses plain encryption.
• Polymorphic: In the polymorphic method, the malware uses a different key to encrypt and decrypt [12], as in the oligomorphic method. However, the encrypted payload portion contains several copies of the decoder and can be encrypted in layers [13]. Thus, polymorphic malware is more difficult to detect than oligomorphic malware.
• Metamorphic: The metamorphic method does not use encryption. Instead, it uses dynamic code hiding, in which the opcode changes on each iteration when the malicious process is executed [14]. It is very difficult to detect such malware because each new copy has a completely different signature.
• Stealth: The stealth method, also called code protection, implements a number of counter-techniques to prevent the malware from being analyzed correctly [11]. For instance, it can make changes on the system and keep them hidden from detection systems.
• Packing: Packing is an obfuscation technique which compresses malware to prevent detection, or hides the actual code by using encryption [15], [16]. With this technique, malware can easily bypass firewalls and antivirus software. Packed malware needs to be unpacked before being analyzed. Packers can be divided into four groups: compressors, crypters, protectors, and bundlers.

In this section, the limitations of malware detection systems have been summarized. Current studies demonstrate that it is almost impossible to write an algorithm that detects all malware. This is because the computational complexity of malware detection is not clear in general, and the malware detection problem has been shown to be NP-complete. Besides, the use of new techniques (obfuscation and packing) during malware creation also makes the detection process more challenging.

III. MALWARE DETECTION TECHNIQUES AND ALGORITHMS
In recent years, data mining and ML algorithms have been used extensively for malware detection. Malware detection is the process of investigating the content of a program and deciding whether the analyzed program is malware or benign. The malware detection process includes three stages: malware analysis, feature extraction, and classification.

A. MALWARE ANALYSIS
In order to understand the content and behavior of malware, it needs to be analyzed. Malware analysis is the process of determining the functionality of malware and answering the following questions [17], [18]: how the malware works, which machines and programs are affected, which data is being damaged or stolen, etc. There are mainly two techniques to analyze malware: static and dynamic [17]. Static analysis examines the malware without running the actual code [19]. Dynamic analysis, on the other hand, examines the behavior of the malware while its code is running. Malware analysis starts with basic static analysis and finishes with advanced dynamic analysis. The malware is analyzed by using reverse engineering [20] and other malware analysis tools to represent the malware in a different format. The reverse engineering process can be seen in Figure 1.

FIGURE 1. A flow chart of reverse engineering process.
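As a small illustration of static analysis, the sketch below inspects a byte buffer without executing it: it computes a SHA-256 hash, the Shannon entropy of the bytes, and whether a fixed byte sequence is present. It then shows that a toy single-byte XOR encoding, standing in for the encryption-based obfuscation described in Section II-B, changes the hash and hides the byte sequence. The function names and the sample buffer are hypothetical, the byte sequence is used purely as sample data, and the entropy measure is a common heuristic that is not taken from the paper.

```python
import hashlib
import math
from collections import Counter

# Sample byte sequence (hexadecimal), used here only as illustrative data.
SIGNATURE = bytes.fromhex("90FF1683EE0483EB0175F6")


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def static_report(data: bytes) -> dict:
    """Collect simple static properties without executing anything."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "entropy": shannon_entropy(data),
        "has_signature": SIGNATURE in data,
    }


def xor_encode(data: bytes, key: int) -> bytes:
    """Toy single-byte XOR encoding (hypothetical stand-in for encryption)."""
    return bytes(b ^ key for b in data)


# Hypothetical sample: the byte sequence embedded in zero padding.
plain = b"\x00" * 32 + SIGNATURE + b"\x00" * 32
encoded = xor_encode(plain, 0x5A)

for label, sample in [("plain", plain), ("encoded", encoded)]:
    report = static_report(sample)
    print(label, report["sha256"][:16], round(report["entropy"], 3),
          report["has_signature"])
```

The same content yields a different hash and no byte-sequence match once it is encoded, which is why obfuscated samples usually have to be unpacked or decrypted before static features become meaningful.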


B. MALWARE FEATURE EXTRACTION
Malware features are extracted by using data mining techniques. Data mining is the process of extracting new, meaningful, previously unknown information from large datasets or databases. In recent years, new models and datasets have been created by using data mining [21]. Different models, such as the n-gram model and the graph model, are used to create malware datasets and features.

1) THE n-gram MODEL
The n-gram is a feature extraction technique which has been used widely in many areas, including malware detection. The n-gram model can use both static and dynamic attributes to create features. To create features from behaviors, the n-gram model groups system calls or application programming interface (API) calls in consecutive order for a specified value of n (n = 2, n = 3, n = 4, n = 6, etc.). Although the n-gram model has been used widely in malware detection, it has some drawbacks when determining features. Not all consecutive static and dynamic attributes are related to one another, which makes classification and clustering more challenging in later stages. Besides, the n-gram model generates an enormous feature space, which increases the analysis time and decreases model performance. For these reasons, there is a strong demand for new models which achieve better performance than n-gram.

2) GRAPH-BASED MODEL
The graph-based model is another commonly used technique to generate features. In this method, system calls are converted into a graph G(V, E), where V represents the nodes, which identify system calls, and E represents the edges, which identify the relationships among the system calls. Since the size of the graph increases over time, subgraphs can be used to describe the graph. Finding subgraphs is reported in many studies to be NP-complete, which means that it takes a lot of time to identify each subgraph. After the whole graph is expressed with fewer nodes and edges, the programs are identified as malicious or benign.

3) MALWARE DATASET
As in other research areas, there are not many previously published datasets which are accepted and widely used for malware detection. In addition, most of the existing datasets are not accessible for research, and in most cases the datasets that can be accessed are not in appropriate formats for data mining processes and ML algorithms. The datasets used in malware analysis can be listed as follows: NSL-KDD, Drebin, Microsoft malware classification challenge, ClaMP (Classification of Malware with PE headers), AAGM, and EMBER.
• NSL-KDD dataset (2009): It is an updated version of the KDD'99 dataset and consists of approximately 125,000 records and 41 features [22]. It covers network-related attacks and is used for intrusion detection systems.
• Drebin dataset (2014): This dataset was created for smart phones to examine the effectiveness of existing antivirus software [23]. It consists of 5560 malware samples across 20 families and 123,453 benign samples.
• Microsoft malware classification challenge dataset (2015): It was published by Microsoft and consists of 20,000 malware samples [24]. The malware has been analyzed using the IDA disassembler, and the output should be processed using data mining prior to ML.
• ClaMP (Classification of Malware with PE headers) dataset (2016): It consists of 5184 records and has 55 properties [25]. The dataset uses API arrays and contains examples of malicious and benign software with their features.
• AAGM dataset (2017): It is a network-based dataset for Android malware [26]. It consists of 400 malware and 1500 benign samples from 12 families [26].
• EMBER dataset (2018): It consists of 1 million records and holds malware and benign features [27].

These datasets can be used by researchers who want to gain some experience before proposing a new malware detection approach.

C. MALWARE CLASSIFICATION
Machine learning (ML) is a set of algorithms which estimate the outcomes of applications without being explicitly programmed. The purpose of ML is to convert the input data into acceptable value intervals by using statistical analysis. By using ML, many operations can be performed on related data, such as classification, regression, and clustering. ML algorithms have been used in malware detection for many years [28]. Well-known ML algorithms are Bayesian network (BN), naive Bayes (NB), C4.5 decision tree variant (J48), logistic model trees (LMT), random forest tree (RF), k-nearest neighbor (KNN), multilayer perceptron (MLP), simple logistic regression (SLR), support vector machine (SVM), and sequential minimal optimization (SMO). These algorithms are used especially in behavior-based detection and in some of the other detection approaches. Although each algorithm has its own advantages and disadvantages, it cannot be concluded that one algorithm is more efficient than another. However, one algorithm can perform better than others depending on the distribution of the data, the number of features, and the dependencies between properties.

IV. MALWARE DETECTION APPROACHES
In recent years, there has been a rapid increase in the number of academic studies on malware detection. In the early days, the signature-based detection method was widely used. This method works fast and efficiently against known malware, but does not perform well against zero-day malware [21], [29]. Over time, researchers have started to use techniques such as behavior-, heuristic-, and


model checking-based detection, and new techniques such as deep learning-, cloud-, mobile devices-, and IoT-based detection. An overview of malware detection approaches, features, and the techniques used can be seen in Figure 2.

FIGURE 2. A flow chart of malware detection approaches and features.

In each approach, the feature extraction method differs from the others. It has not been proven that one detection method works better than another, because each method has its own advantages and disadvantages. By using behavior-, heuristic-, and model checking-based detection approaches, a huge number of malware samples can be detected with a few behaviors and specifications. In addition, new malware can be detected by using these approaches as well. However, they cannot detect all malware. There is a great need for a method which effectively detects more complex and unknown malware. Before explaining each detection approach in detail, some well-known methods in each detection approach and their related works are summarized in Table 2. Then, a detailed literature review is presented, and the pros and cons of each study are explained.

TABLE 2. Summary of related works on malware detection approaches.

A. SIGNATURE-BASED MALWARE DETECTION
A signature is a malware feature which encapsulates the program structure and identifies each malware sample uniquely. The signature-based detection approach is widely used in commercial antivirus products. This approach is fast and efficient at detecting known malware, but insufficient for detecting unknown malware. In addition, malware belonging to the same family can easily escape signature-based detection by using obfuscation techniques. A general view of the signature-based detection schema can be seen in Figure 3.

FIGURE 3. Signature-based malware detection schema.

1) SIGNATURE GENERATION PROCESS
During signature generation, features are first extracted from executables (Figure 3). Then, the signature generation engine generates signatures and stores them in the signature database. When a sample program needs to be marked as malware or benign, the signature of the sample is extracted in the same way and compared with the signatures in the database. Based on the comparison, the sample program is marked as malware or benign. There are many different techniques to create a signature, such as string scanning, top-and-tail scanning, entry point scanning, and integrity checking.
• String Scanning: Compares the byte sequence in the analyzed file with the byte sequences previously saved in the database. Byte signatures have been used extensively by antivirus scanners for many years. They are often used to detect malware which belongs to the same family with different signatures [61]. Table 3 shows the ClamAV byte signature [62]. "90FF1683EE0483EB0175F6" is the hexadecimal representation of the relevant code section, and it is shown in assembly language in Table 4. The same byte signature is shown in Yara format in Table 5 [62].
• Top-and-Tail Scanning: Instead of the whole file, only the top and end points of the file are taken, and signatures are created from them [11]. It is a very convenient signature method to detect viruses which attach themselves to the beginning and end of files.
• Entry Point Scanning: The entry point of a file indicates where execution starts when that file is run. Malware usually changes the entry point of a program so that the malicious code is executed before the actual code [11]. Therefore, certain malware can be detected by extracting the signature from the sequences at the program entry points.
• Integrity Checking (Hash Signatures): An integrity check generates a cryptographic checksum, such as MD5 or SHA-256, for each file in a system at regular intervals, and it is used to identify possible changes which may be caused by malware.

TABLE 3. Example ClamAV byte signature.

TABLE 4. Assembly byte sequence for "90FF1683EE0483EB0175F6".

TABLE 5. Display of byte signatures in Yara format.

Different signature generation techniques have been summarized above. Even though these techniques are quite fast and efficient at generating a signature, they are not resistant to malware obfuscation techniques. For example, malware can easily change the strings and the program entry point in its instruction set. As a result, the generated signature may mislead the detection schema. To extract more powerful and general signatures, different techniques and features can be used. A detailed review of the signature-based malware detection approach and its methods follows.

2) RELATED WORKS FOR SIGNATURE-BASED DETECTION
Zolkipli and Jantan proposed a new malware detection framework which is based on signature-based detection, a genetic algorithm (GA), and a signature generator [63]. Even though the authors claim that this method can detect unknown malware, the paper does not give enough information about the proposed framework, such as test results, the number of malware samples analyzed, and a comparison of the proposed method with other existing studies. Tang et al. proposed a bioinformatics technique to generate accurate exploit-based signatures for polymorphic worms [64]. The technique involves three steps: multiple sequence alignment to reward consecutive substring extractions, noise elimination to remove noise effects, and signature transformation to make the simplified regular expression signature compatible with current IDSs.

The authors claim that the suggested schema is noise-tolerant, and that its signatures are more accurate and precise than those generated by some other exploit-based signature generation schemas. This is because it extracts more polymorphic worm characteristics, such as one-byte invariants and distance restrictions between invariant bytes. However, the proposed schema is limited to polymorphic worms and cannot be generalized to other malware types.

Borojerdi and Abadi proposed the MalHunter detection system, which is a new method based on sequence clustering and alignment [65]. It generates signatures automatically based on malware behaviors for polymorphic malware. The method works as follows: First, behavior sequences are generated from different malware samples. Then, based on similar behavior sequences, different groups are generated and stored in the database. To detect a malware sample, its behavior sequences are gathered and compared with the sequences which have been generated earlier and stored in the database. Based on the comparison, the sample is marked as malware or benign. The test results showed that by choosing the cluster radius 0.4 and similarity threshold 0.05, they


achieved detection rate of 90.83% with a FPR of 0.80%. not been compared with other studies in the literature, and
The authors claim that proposed schema is resistant to the evaluation metrics are not very high and are not explained
obfuscation techniques, and it can be used for the generic in detail.
detection of all types of polymorphic malware rather than
being limited to a specific malware type. The authors also 3) EVALUATION OF SIGNATURE-BASED DETECTION
claim that the suggested system outperformed state-of-the- In the literature review, signature-based detection methods
art signature generation methods including Tang et al. [64], have been summarized. Signature-based detection schema
Newsome et al. [66], and Perdisci et al. [67] previously has been used for antivirus vendors for many years and it
reported in the literature. The proposed method is limited to is quite fast and effective to detect known malware. This
polymorphic malware and it has been tested on only hundreds approach is generally used to detect malware which belongs
of malware which is not enough to determine the performance to the same family. However, it fails to detect new gen-
of proposed method. eration malware which uses obfuscation and polymorphic
Automatic string signatures generation (Hancock) is techniques. Besides, it is prone to many FPs and extracting
explained in [41]. According to the paper, proposed schema signature takes a lot of man-power.
can automatically generate high-quality string signatures Although previous signature-based methods have achieved
with minimal FPs and maximal malware coverage. The pro- some success, they are not enough to detect new generation
posed method uses a set of library code identification tech- malware. To build an effective signature, the following key
niques, and diversity-based heuristics techniques to ensure points are taken into consideration:
the contexts in which a signature is embedded in containing malware files similar to one another [41]. Although the authors claim that Hancock can automatically generate string signatures with an FPR below 0.1%, this FPR will change depending on the benign samples that are analyzed. This is because the benign set is constantly growing, and a satisfactory result on one part of the benign set cannot be generalized to the whole set. Thus, these problems need to be solved in further studies. Santos et al. proposed n-grams-based file signatures to detect malware [68]. First, for known files, n-grams are extracted for every file and used as a file signature. Then, for any unknown instance, n-grams are generated, and by using a measuring function and the k-nearest neighbor algorithm [69], the file is marked as malware or benign. The paper demonstrated that n-grams-based signatures can detect unknown malware to a certain degree.

Efficient signature-based malware detection on mobile devices is proposed in [70]. First, the signature is created. Second, a hash table is used to store the hash values of signatures to increase scanning speed. Finally, a signature matching algorithm is used to compare the signatures. To eliminate mismatches, the probability of occurrence of signature bytes in non-malicious content is used. According to the authors, the results show that the suggested schema performs well when compared to the Clam-AV scanner, and provides huge memory savings while maintaining fast scanning speed. The proposed system was only compared with the Clam-AV scanner, which is not enough for an overall evaluation. Zheng et al. presented DroidAnalytics, an Android malware analytic system which can automatically collect malware, generate signatures for applications, identify malicious code segments, and associate the malware under study with various malware in the database [71]. In the proposed system, a three-level signature generation schema is used to identify each application. The authors assert that the proposed signature methodology provides significant advantages over traditional cryptographic hashes such as MD5-based signatures, and is resistant to packing and mutations. The proposed system has

• Signature should be as short as possible and can represent many malware with a single signature,
• Effective automatic signature generation mechanism must be built,
• During the signature generation, data mining and ML techniques need to be used more,
• Signature should be resistant to packing and obfuscation techniques.

B. BEHAVIOR-BASED MALWARE DETECTION
Behavior-based malware detection approach observes the program behaviors with monitoring tools and determines whether the program is malware or benign. Although the program code changes, the behavior of the program remains similar; thus, the majority of new malware can be detected with this method [29]. On the other hand, some malware binaries do not run properly under a protected environment (virtual machine, sandbox environment). Hence, malware samples may be incorrectly marked as benign.

1) BEHAVIOR DETECTION PROCESS
When establishing a behavior-based detection system, behaviors are obtained by using one of the following procedures:
• Automatic analysis by using sandbox [18];
• Monitoring of system calls [36], [38];
• Monitoring of file changes [18];
• Comparison of registry snapshots [29];
• Monitoring network activities [17];
• Process monitoring [18].
In behavior-based detection, first, behaviors are determined by using one of the techniques listed above, and the dataset is created by extracting the features using data mining. Then, specific features from the dataset are obtained and classification is done by using ML algorithms. A general view of the behavior-based schema can be seen in Figure 4.
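As a minimal illustration of the process described above (behaviors, then features, then classification), the following Python sketch turns system-call traces into n-gram count features and labels an unknown trace with a k-nearest neighbor vote. It is not the implementation of any surveyed work: the trace contents, the training set `TRAINING`, the helper names, and the choice of cosine similarity with k = 3 are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(trace, n=2):
    """Return the n-grams (as tuples) found in a system-call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def features(trace, n=2):
    """Count how often each n-gram occurs in the trace (sparse feature vector)."""
    return Counter(ngrams(trace, n))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(trace, labelled, k=3, n=2):
    """Label a trace by majority vote among its k most similar labelled traces."""
    query = features(trace, n)
    ranked = sorted(
        labelled,
        key=lambda item: cosine(query, features(item[0], n)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-made traces; real traces would come from a sandbox
# or a system-call monitor.
TRAINING = [
    (["open", "read", "close"], "benign"),
    (["open", "read", "write", "close"], "benign"),
    (["open", "write", "close"], "benign"),
    (["regset", "connect", "send", "regset"], "malware"),
    (["connect", "send", "delete", "regset"], "malware"),
    (["regset", "connect", "send", "delete"], "malware"),
]

if __name__ == "__main__":
    print(classify(["regset", "connect", "send", "delete"], TRAINING))
```

In practice the number of distinct n-grams grows quickly with n and with the number of samples, which is one reason the surveyed works apply feature selection before classification.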

6256 VOLUME 8, 2020

Ö. Aslan, R. Samet: Comprehensive Review on Malware Detection Approaches

FIGURE 4. Behavior-based malware detection schema.

2) RELATED WORKS FOR BEHAVIOR-BASED DETECTION
The program similarities using system calls were described in [36]. Wagener et al. proposed a flexible and automated technique to extract malware behaviors from system calls. An alignment technique is used to identify similarities, and the Hellinger distance is calculated to compute the associated distances. According to the paper, obfuscated malware variants that show similar behaviors can be detected. The authors assert that the classification process can be improved using a phylogenetic tree that represents the common functionalities of malware. The missing aspects of the article can be listed as follows:
• Little information about the malware dataset is given,
• Statistical evaluation of performance is not provided,
• Comparison of the proposed method against other methods is not given.
Besides, it is not clear how a phylogenetic tree can improve the performance.

The behavior-based detection approach is proposed by Fukushima et al. in [72]. The proposed method can detect both unknown and encrypted malware on Windows OS. The proposed framework checks not only specific behaviors that malware performs, but also normal behaviors that malware usually does not perform. According to the authors, the DR was approximately 60% to 67% without any FP. This DR is very low; to increase it, more malicious behaviors can be identified, and to prove the effectiveness of the new method, the test set needs to be extended. Semantics-aware malware detection is proposed in [73]. The authors determined that certain malicious behaviors, such as a decryption loop in a polymorphic virus, appear in all variants of a certain malware. According to the authors, experimental evaluation demonstrated that the algorithm can detect all variants of certain malware with no FPs, and is resilient to obfuscation transformations. However, the algorithm has some limitations for obfuscation transformations. For instance, it cannot handle instruction replacement very well, and fails to detect malware which uses this technique. Handling the instruction replacement problem and different ordering of memory updates can improve the performance.

The behavior detection methods which limit the number of features are presented in [74], [75], [38]. Lanzi et al. [74] proposed a system-centric behavior model. In the proposed model, the interaction of malware programs with system resources (directory, file, registry, etc.) is different from that of benign programs. The behavior sequences of the program to be marked were compared with the behavior sequences of the two groups (malware, benign). The authors claim that the proposed system detected a significant fraction of malware with a few FPs. The proposed method could not detect all malicious activities, such as malware which does not attempt to hide its presence or to gain control of the OS, and which uses only the computer network for transmission. Including network-related policies, and rules for malware programs that ignore other applications and the OS, can improve the performance. M. Chandramohan et al. suggested Bounded Feature Space Behavior Modeling (BOFM), which limits the number of features to detect malware [75]. In this model, system calls were transformed into high-level behaviors and features were created using the behaviors. The feature vector was created and ML algorithms were applied to the feature vector to determine whether the given program is malware or benign. BOFM has a fixed dimension, which means it does not grow in proportion to the number of malware samples. This makes BOFM efficient and scalable in practice. Also, by using BOFM, better detection accuracy, lower computation times, and lower memory usage were obtained. This method ignored the frequency of system calls. Executing the same system call repeatedly can cause DoS attacks. Considering the frequency of system calls can improve DR and accuracy. A hardware-enhanced architecture which uses a processor and a field-programmable gate array (FPGA) is proposed in [38]. The authors presented a frequency-centralized model (FCM) to extract the system calls and construct the features from the behaviors. Features obtained from the benign and malware samples were used for training the ML classifier to detect the malware. The paper claims that the suggested system achieved a high classification accuracy, fast DR, low power consumption, and can detect new malware samples. Besides, the proposed method supports early prediction, which can detect malware while the malware is still running. However, malware can perform various behaviors, and there is no uniform policy to specify the number of behaviors and features to be extracted before triggering the early prediction. Furthermore, the performance of the proposed method has only been compared with BOFM and n-gram, which is not enough to determine the efficiency of the proposed algorithm.
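The two ideas discussed above, a feature vector of fixed dimension built from high-level behaviors and a frequency-aware comparison of system-call profiles, can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the mapping `BEHAVIOR_OF`, the four behavior categories, and the call names are hypothetical and are not taken from BOFM [75], FCM [38], or [36]. Only the Hellinger distance formula itself, H(p, q) = (1/√2) · sqrt(Σ (√p_i − √q_i)²), is standard.

```python
from collections import Counter
from math import sqrt

# Hypothetical mapping from raw system calls to a small, fixed set of
# high-level behaviors. The vector length depends only on BEHAVIORS,
# not on how many samples or distinct calls are observed.
BEHAVIORS = ("file", "network", "registry", "process")
BEHAVIOR_OF = {
    "open": "file", "read": "file", "write": "file", "delete": "file",
    "connect": "network", "send": "network", "recv": "network",
    "regset": "registry", "regget": "registry",
    "fork": "process", "exec": "process",
}


def behavior_vector(trace):
    """Relative frequency of each high-level behavior in a trace."""
    counts = Counter(BEHAVIOR_OF[call] for call in trace if call in BEHAVIOR_OF)
    total = sum(counts.values())
    return [counts[b] / total if total else 0.0 for b in BEHAVIORS]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)


if __name__ == "__main__":
    benign = behavior_vector(["open", "read", "read", "write"])
    suspect = behavior_vector(["regset", "connect", "send", "send", "send"])
    print(len(benign), round(hellinger(benign, suspect), 3))
```

Because the vector stores frequencies rather than mere presence, a trace that repeats the same call many times yields a different profile from one that issues it once, which is the information the text notes is lost when call frequency is ignored.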


Liu et al. used MapReduce to group malware behaviors and detect malware [76]. According to the authors, most of the studies done so far were process-oriented, and determined a process to be malware only by the system calls it invokes. However, most malware today, which is defined as complex malware, consists of several processes and is transmitted to the system by a driver or by a DLL [77]. In such cases, malware performs actions on the victim machine by using more than one process instead of its own processes. When only one process is analyzed, malware can be marked as benign. The paper emphasized persistent behaviors by using Auto-Start Extensibility Points (ASEP), and based on these behaviors it differentiated malware from benign. The experimental results showed that the DR improved on previous research by 28%. However, the proposed method has some limitations, which can be listed as follows:
• Some malware binaries do not require persistent behavior ASEP,
• Persistent malware behaviors can be completed without using system calls,
• The cost of data transmission has not been measured.
Besides, the results of the proposed method have not been compared with other studies in the literature. Eliminating the above limitations can improve the method performance.

A supervised ML model is proposed in [78]. The model applied a kernel-based SVM that used weighting measures, which calculate the frequency of each library call, to detect Mac OS X malware. The DR was calculated as 91% with a 3.9% FP rate. Test results indicated that increasing the sample size increased the detection accuracy and decreased the FPR. Combining static and dynamic features, and using other techniques such as fuzzy classification and deep learning, can increase the performance.

A graph-based detection schema was defined in [79], [37]. Kolbitsch et al. [79] proposed a graph-based detection, in which the system calls are converted into a behavior graph, where nodes represent system calls and edges indicate transitions among system calls that show the data dependency. The graph of the program to be marked is extracted and compared with the existing graph to determine whether the given program is malware. Even though the proposed model performed well for known malware, it has difficulties in detecting unknown malware. A graph-based method which specifies the common behaviors of malware and benign samples is presented in [37]. In the proposed system, kernel objects were determined by system calls and behaviors were determined according to these objects. According to the authors, the proposed method is scalable and can detect unknown malware with high DR and low FP rates. In addition, the proposed model is highly scalable regardless of new instances added, and robust against system call attacks. However, the proposed method can observe only partial behavior of an executable. Exploring more possible execution paths would improve the accuracy of this method.

Graph-based malware detection using dynamic analysis is proposed in [42]. The proposed schema works on graphs which are constructed from dynamically collected instruction traces of the target executable. Markov chains have been used, in which the vertices are the instructions and the transition probabilities are estimated from the data contained in the trace. The authors constructed a similarity matrix which is a combination of graph kernels between the instruction trace graphs. They performed classification by using SVM on the similarity matrix. The results showed that there is a significant improvement over signature-based and other machine learning-based detection techniques. For the test case, a modified version of the Ether framework has been used. There are some limitations of the Ether system, including:
• Ether is not completely invisible, which means that some intelligent malware can detect it and does not show its real behaviors,
• Ethernet card can be emulated by the underlying Xen system and string settings can be changed by malware,
• Ether is quite slow for malware analysis.
Using a different framework can increase the performance.

Mojtaba and Hashemi proposed a graph mining method for detecting unknown malware binaries [80]. First, the paper extracted the control flow graph (CFG) from programs and combined it with extracted API calls to have more information about executable files. This new representation model was called API-CFG. Then, the CFGs were converted to a set of feature vectors. Finally, the classification was performed by ML algorithms. According to the authors, the proposed method classified unseen benign and malicious code with high accuracy, and outperformed the n-grams-based detection method. However, the paper did not evaluate the performance for obfuscated malware, and also did not compare the results with known methods. Comparing the performance with other graph mining approaches may generate more trustworthy results.

3) EVALUATION OF BEHAVIOR-BASED DETECTION
In the literature review, the behavior-based detection approach and related methods have been summarized. A detection schema based on behaviors consists of 3 steps:
• Determine behaviors (data mining can be used),
• Extract features from behaviors (data mining is used),
• Apply classification (machine learning is used).
Data mining techniques such as n-gram, n-tuple, bag, graph model, etc. have been used to determine the features from behaviors; Hellinger distance, cosine coefficient, chi-square, etc. (probability and statistical method) distance algorithms are used to specify similarities among features. The difficulties in defining a behavior, the large number of extracted features (when using n-grams, etc.), and the difficulties in identifying the similarities and differences among the extracted properties have prevented the creation of an effective detection system. Besides, some malware does not run properly within the virtual machines/sandboxes, and


advanced code obfuscating techniques prevent malware from being analyzed correctly. The use of new methods and techniques along with the use of ML and data mining algorithms in malware detection has begun to play a major role in generating meaningful features. There is a huge demand for more scientific studies to cover the shortcomings of existing methods. This study has summarized the existing research and makes suggestions to fill the gap.

C. HEURISTIC-BASED MALWARE DETECTION
In recent years, the heuristic-based detection approach has been used frequently [81]. It is a complex detection method which uses experiences and different techniques such as rules and ML techniques [10]. Although it has a high accuracy rate to detect zero-day malware to a certain degree, it cannot detect complicated malware. The heuristic-based detection schema can be seen in Figure 5.

FIGURE 5. Heuristic-based malware detection schema.

1) RELATED WORKS FOR HEURISTIC-BASED DETECTION
Arnold and Tesauro proposed an automatically generated Win32 heuristic virus detection in [82]. They automatically construct multiple neural network classifiers which can detect unknown Win32 viruses. Generally, a heuristic schema has a high FP rate, but the authors claim that by combining the individual classifier outputs using a voting procedure, the risk of FP is reduced to an arbitrarily low level. The study is limited to Win32 viruses, and can be extended to other malware. More malware needs to be examined for this method. Expert-designed heuristic features can improve the performance. Yanfang et al. proposed post-processing techniques of associative classification for malware detection [83]. The proposed system greatly reduced the number of generated rules by using rule pruning, rule ranking, and rule selection. This way, the technique did not need to deal with a large database of rules, which also improves the detection time and accuracy rate. According to the paper, the proposed system outperformed popular antivirus software tools such as McAfee VirusScan and Norton AntiVirus; and outperformed data-mining-based detection systems including naive Bayes, support vector machine (SVM), and decision tree techniques. Collecting more API calls, which can provide more information about malware, and identifying complex relationships among the API calls may improve the performance.

Since traditional signature-based anti-virus systems fail to detect polymorphic, metamorphic, and previously unknown malicious executables, heuristic-based malware detection is explained in [84], [85]. Yanfang et al. proposed the intelligent malware detection system (IMDS) [84]. The IMDS used objective-oriented association (OOA) mining that works based on Windows API calls. The method consists of 3 parts: PE (portable executable) parser, OOA rule generator, and rule-based classifier. The PE parser extracted Windows API execution calls from the PE. The OOA Fast_FP-Growth algorithm used the API calls and generated association rules. Finally, based on the association rules, the OOA mining algorithm was performed and executables were marked malicious or benign. The paper claims that the proposed system performed better than other techniques including anti-virus software such as Norton AntiVirus, McAfee VirusScan and KAV, as well as the systems using data mining techniques such as naive Bayes, SVM and decision tree. To overcome the disadvantages of signature-based and behavior-based malware detection approaches, B. Zahra et al. proposed a heuristic type of method which can detect malware that cannot be detected by the previous two approaches [85]. The authors applied a learning algorithm to generate a pattern which was similar to a signature. Based on the signature, new suspicious programs were marked malware or benign. The paper mentioned API system calls, operational code (Opcode), n-grams, control flow graph (CFG), and hybrid features that are used extensively in the heuristic approach [85].

A statistical analysis of opcode frequency distributions to identify and differentiate modern (polymorphic and metamorphic) malware is explained in [86]. A total of 67 malware executables were sampled and statically disassembled, and their statistical opcode frequency distributions were compared with the aggregate statistics of 20 non-malicious samples. Test results showed that there is a statistically significant difference in opcode distribution between malware and benign. To get more reliable results, more samples need to be analyzed and the results of the suggested method need to be compared with other well-known heuristic methods. A detection system that combines static and dynamic features has been suggested in [43]. According to the paper, combining static and dynamic features improves the method performance. By combining these features, the feature vector was constructed and classified using ML classifiers. The paper claims that the detection rate of the proposed system is satisfactory and increased when compared to their first study. However, the probability of


detecting unknown malware is still low and the FPR is high. Using more distinctive features and training the model with more malware may improve the method performance for unknown malware.

Naval et al. [44] suggested a dynamic malware detection system, which collects system calls and constructs a graph that finds the semantically relevant paths among them. Finding all semantically-relevant paths in a graph is an NP-complete problem. Thus, to reduce the time complexity, the authors measured the most relevant paths, which specify malware behaviors that cannot be found in benign samples. The authors claim that the proposed method outperforms its counterparts because it can detect malware even using system-call injection attacks at a high percentage, which the similar methods cannot detect. The paper has some limitations such as performance overhead during path computation, it is vulnerable to call-injection attacks, and cannot identify all semantically-relevant paths efficiently. Eliminating these limitations may improve the performance.

2) EVALUATION OF HEURISTIC-BASED DETECTION
The literature review of heuristic-based malware detection has been explained. A heuristic-based schema can use both strings and some behaviors to generate rules, and based on those rules it generates a signature. It uses API calls, CFG, n-grams, Opcode, and hybrid features when generating a signature [85]. Although heuristic-based detection can detect various forms of known and unknown malware, it is insufficient to detect all new generation of malware. In addition, heuristic-based approaches are prone to high FPR.

D. MODEL CHECKING-BASED MALWARE DETECTION
Although model checking was originally developed to verify the correctness of a system against specifications, it has been used to detect malware as well. In this detection approach, malware behaviors are manually extracted and behavior groups are coded using linear temporal logic (LTL) to display a specific feature [10]. Program behaviors are created by looking at the flow relationship of one or more system calls, and behaviors are defined by using properties such as hiding, spreading, and injecting. By comparing these behaviors, it is determined whether the program is malware or benign. Model checking-based detection can detect some new malware to a certain degree, but cannot detect all new generation of malware. The model checking-based detection schema can be seen in Figure 6.

FIGURE 6. Model checking-based malware detection schema.

1) RELATED WORKS FOR MODEL CHECKING-BASED DETECTION
Kinder et al. proposed a flexible method to detect malicious code patterns in executables by model checking [46]. They introduced the specification language CTPL (computation tree predicate logic), which extends the well-known logic CTL (computation tree logic), and described an efficient model checking algorithm. According to the authors, test results demonstrated that the proposed method can detect many worm variants with a single specification. The proposed method has some limitations as follows:
• Can only detect worm variants,
• Some part of the process has to be done manually,
• Performance of the proposed method is low.
To get better results, CTPL can be extended to detect other malware. In addition, more accurate data integrity constructions and efficient data structures can be used to improve the method performance.

Holzer et al. presented verification technology to specify and detect malware [87]. They explained a malware detection tool chain which integrates the process of specification development, and enables future automated malware analysis with specification extraction. In this method, malicious behavior is formalized using the expressive specification language CTPL based on classic CTL, and a finite state model is extracted from the disassembled executable. The authors claim that a model checking-based approach can capture the semantics of malware more accurately than traditional methods, and consequently achieve a higher DR. There is not enough information about the proposed method and its test results. To get more reliable results, more malware and benign samples need to be analyzed, behavioral dependencies should be clear to extract more accurate specifications, and the whole process can be automated.

Kinder et al. proposed a proactive malware detector which works based on model checking and can detect worm variants without signature updates [88]. They described a tool that extracts an annotated control flow graph from the binary and automatically verifies it against a formal malware specification. For this, they introduced the new specification language CTPL, which balances the high expressive power needed for malware signatures with efficient model checking algorithms. Test results showed that the suggested method can recognize variants of existing malware with a low risk of FP. The suggested approach is at an early stage and subject to some limitations:
• Model extraction process is syntactic and does not include data flow analysis,


• Malicious behaviors are split across several procedures and cannot be identified unless procedures are inlined, which decreases the method performance,
• The macro does not cover all instruction sets of the x86 architecture.
Eliminating or decreasing these deficiencies will surely improve the performance.

Beaucamps et al. presented rewriting and model checking which capture high-level malware behaviors when detecting malware [89]. The proposed method uses a rewriting-based abstraction mechanism which produces abstracted forms of program traces, independent of the program implementation. It can handle similar behaviors in a generic way and thus is robust with respect to its variants. The authors claim that this method can be useful for both static and dynamic analysis. This approach is at an early stage and in the study only theoretical results are presented. To see the method efficiency, the proposed method needs to be tested.

Song and Touili proposed a pushdown model-checking method for malware detection [90]. The proposed schema works as follows:
• Binary code is translated to pushdown systems (PDS),
• The paper introduced a stack computation tree predicate logic (SCTPL) to represent the malicious behaviors,
• It provides an algorithm to model-check pushdown systems against SCTPL specifications.
The proposed method reduced the model-checking problem to checking the emptiness of Symbolic Alternating Büchi Pushdown Systems. The authors claim that they obtained encouraging experimental results. However, the suggested method works only if the data in the stack cannot be changed by direct memory access. Identification of Android malware families with model checking is presented in [91]. To show the effectiveness of the suggested system, the most common malware families in the Android environment, the DroidKungFu and the Opfake families, have been analyzed. The suggested algorithm can analyze and verify the Java bytecode that is produced when the source code is compiled. A preliminary investigation has also been conducted to assess the validity of the proposed method. The authors mentioned that test results are promising, and they can identify malicious payloads with a very high accuracy in a reasonable time. The paper has analyzed only a few malware families; extending the analysis and evaluating more malware families will produce more reliable results. Also, investigating the payload family tree can give clues about phylogenies of malware, which will result in better classification.

2) EVALUATION OF MODEL CHECKING-BASED DETECTION
The literature review of the model checking-based detection schema has been summarized. This approach is generally used for program verification and not used sufficiently for malware detection. Although it is effective to detect some new malware variants, it is still insufficient to detect all complex malware. Besides, it is a complex and resource-intensive approach which can provide only a limited view of the malware. Identifying behavioral dependencies more accurately, extracting more accurate specifications, and using effective LTL, CTL, and CTPL formulas can improve the performance. The model checking-based detection approach is still at an early stage, so, to see the effectiveness of the approach, more studies need to be done.

E. DEEP LEARNING-BASED MALWARE DETECTION
Deep learning is a subfield of ML derived from artificial neural networks (ANN), which learn from examples. It is a new approach and widely used for image processing, driverless cars, and voice control; but it is not used sufficiently in malware detection. Although it is quite effective and reduces the feature space drastically, it is not resistant to evasion attacks. The deep learning-based schema can be seen in Figure 7.

FIGURE 7. Deep learning-based malware detection schema.

1) RELATED WORKS FOR DEEP LEARNING-BASED DETECTION
Large-scale malware classification using random projections and neural networks is presented in [92]. In the suggested system, the dimensionality of the original input space was reduced by a factor of 45 (179K/4K). Using the suggested architecture, several very large-scale neural network systems with over 2.6 million labeled samples were trained and achieved classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks. The authors emphasized that using more hidden layers could not improve the accuracy. For example, a one-layer neural network performed better than two- and three-layer neural networks. Droid-Sec, which uses deep learning-based detection, is proposed in [93]. It used both static and dynamic analysis and extracted more than 200 features. The authors used an unsupervised pre-training phase and a supervised back-propagation phase. In the pre-training phase, they adopted


the deep belief network (DBN) [94] that utilizes the built fooled by evasion attacks. Grosse et al. investigated the
restricted Boltzmann machines (RBM) which is beneficial for viability of adversarial crafting against deep neural net-
better characterizing Android apps. In the back-propagation works [95]. The authors mentioned that crafted inputs lead
phase, the pre-trained neural network fine-tuned with labeled to deceive ML models which results misclassifications. For
value in a supervised manner. This way, the whole deep evaluation, DREBIN dataset has been used. They achieved
learning model is built completely. According to test results, misclassification rates of up to 80% against neural network,
96% accuracy has been measured which outperformed SVM, which shows that adversarial crafting is indeed a real threat
C4.5, LR, and naïve Bayes. To analyze more apps and to in security critical domains. Kolosnjaji et al. investigated the
automate the analysis processes can be useful to build more vulnerabilities of malware detection methods that use deep
reliable detector. networks to learn from raw bytes [96]. They proposed a
Deep neural network based malware detection using two gradient-based attack that is capable of evading a recently-
dimensional binary program features explained is in [50]. proposed deep network by only changing few specific bytes
Proposed framework consists of 3 main parts: at the end of each malware sample, while preserving its
• In the first part, 4 different types of complementary intrusive functionality. According to their test results, adver-
features from the benign and malicious binaries are sarial malware binaries evade the targeted network with
extracted, high probability, even though less than 1% of their bytes
• In the second part, deep neural network which consists are modified.
of an input layer, two hidden layers and an output layer has been used,
• In the third part, a score calibrator, which translates the outputs of the neural network, is used and the probability of the file being malware is measured.

According to the authors, the suggested system achieves a 95% DR at 0.1% FPR over an experimental dataset of over 400,000 software binaries. Even though the proposed approach achieved a high accuracy rate under standard cross-validation, the performance decreased sharply when split validation was used. This can be eliminated by deobfuscating the binary before feature extraction. Besides, the number of benign samples is too small compared with the number of malware samples analyzed. To get an accurate estimate, more benign samples need to be analyzed.

Huang and W. Stokes proposed a new multi-task deep learning architecture (multi-task neural network, MtNet) for malware classification [51]. The proposed model is trained with data extracted from dynamic analysis of malicious and benign files. The system is trained on 4.5 million files and tested on a holdout test set of 2 million files. The paper claims that MtNet is a big improvement over a shallow neural architecture. Multi-task learning encourages the hidden layers to learn a more generalized representation at lower levels in the neural architecture. Besides, the MtNet architecture employs rectified linear unit (ReLU) activation functions and dropout for the hidden layers. ReLU activation functions cut the number of epochs needed for training a binary malware classifier in half, while dropout leads to significant reductions in the test error rate. The main challenge of this study is that it is almost impossible to increase the model performance by adding extra layers. Besides, MtNet is susceptible to attacks and can be evaded. Overcoming these challenges may improve the model performance.

2) EVALUATION OF DEEP LEARNING-BASED DETECTION

Although deep learning-based malware detection methods seem new and powerful to detect malware, it can be also

The literature review of deep learning-based malware detection has been summarized. Even though it is powerful, effective and reduces the feature space drastically, it is not resistant to evasion attacks. Besides, building a hidden layer takes time, and adding extra hidden layers rarely increases the model performance. The deep learning-based malware detection approach is still in its early stages, so more studies need to be done to characterize this approach more accurately.

F. CLOUD-BASED MALWARE DETECTION

Cloud computing has been developing rapidly because it provides many advantages, including easy accessibility, on-request storage, and decreasing costs. Since the cloud has become so popular, it has also been used to detect malware. Cloud-based malware detection enhances the detection performance for PCs and mobile devices with much bigger malware databases and intensive computational resources. Cloud-based detection uses different types of detection agents over the cloud servers and offers security as a service. A user can upload any type of file and receive a report on whether the uploaded file is malware or not. The cloud-based detection schema can be seen in Figure 8.

FIGURE 8. Cloud-based malware detection schema.
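As a rough illustration of the upload-and-report flow of the cloud-based schema (Figure 8), the sketch below has a client compute a SHA-256 digest of a file and ask a stand-in cloud service for a verdict. The `CloudScanner` class, its in-memory digest set, the sample bytes, and the verdict strings are all hypothetical names invented for this example. A real service would sit behind a network API and run far richer detection agents than a digest lookup.

```python
import hashlib


class CloudScanner:
    """Hypothetical stand-in for a cloud detection service."""

    def __init__(self, known_malware_digests):
        # Server-side database of digests of known malicious files.
        self._known = set(known_malware_digests)

    def report(self, digest):
        # Return a verdict for a digest submitted by a client.
        return "malware" if digest in self._known else "unknown"


def file_digest(data: bytes) -> str:
    # The client sends only a digest, not the file contents.
    return hashlib.sha256(data).hexdigest()


# Made-up sample contents, used only to exercise the lookup.
malicious_sample = b"MZ fake malicious payload"
benign_sample = b"hello, world"

scanner = CloudScanner([file_digest(malicious_sample)])

verdict_bad = scanner.report(file_digest(malicious_sample))
verdict_ok = scanner.report(file_digest(benign_sample))
print(verdict_bad, verdict_ok)
```

Sending a digest rather than the file itself limits what is disclosed to the server, but it can only recognise files the server has already seen.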

6262 VOLUME 8, 2020

Ö. Aslan, R. Samet: Comprehensive Review on Malware Detection Approaches

Even though the cloud-based detection approach has many advantages, there are some issues with this detection schema. Some of the disadvantages are the following:
(1) The user needs to upload file contents to the cloud, which can disclose sensitive data such as location, password, and credit card information,
(2) The cloud detection mechanism has some overhead compared with other detection mechanisms, so communication between the client and server must be optimized, especially for IoT and mobile devices.
(3) The lack of real-time monitoring for all files within all locations.

1) RELATED WORKS FOR CLOUD-BASED DETECTION
Cha et al. proposed a design and implementation of a novel anti-malware system called SplitScreen [32]. It is a distributed malware detection schema which uses an additional screening step prior to the signature matching phase found in existing approaches. SplitScreen’s two-phase scanning enables fast and memory-efficient malware detection that can be decomposed into a client/server process that reduces the amount of storage. The proposed method is implemented as an extension of ClamAV and improves scanning throughput using today’s signature sets by over 2x while using half the memory. According to the authors, the speedup and memory savings of SplitScreen improve further as the number of signatures increases. The proposed method is scalable on a wide range of low-end consumer and handheld devices. Since a single server is used in the cloud, it would be better to optimize the server performance and move some work to the client side.

Yanfang et al. presented a cloud-based schema which combines file content and file relations to improve malware detection results and develops a file verdict system [97]. The system was incorporated into Comodo’s Anti-malware products, and empirical studies were conducted on large daily datasets collected by the Comodo cloud security center. The authors claim that their experimental results demonstrated that the accuracy and efficiency of the Valkyrie system outperform other popular anti-malware software tools such as Kaspersky AntiVirus and McAfee VirusScan, as well as other alternative data mining-based detection systems. However, since file relations and file content have different properties, combining these 2 features directly can decrease the quality of information through correlation and consistency issues. Using different approaches, such as the Joint-Embedding approach, can help to solve the correlation and consistency problem.

Martignoni et al. presented a framework that enhances the capabilities of existing dynamic behavior-based detectors. The proposed framework enables sophisticated behavior-based analysis of suspicious programs in multiple realistic and heterogeneous environments in the cloud [54]. The suggested schema forces sample programs to execute in a distributed environment including the security lab and potential victim machines. The evaluation results demonstrated that the analysis of multiple execution traces of the same malware sample in multiple end-users’ environments can improve the results of the analysis with very small overhead. On the other hand, the suggested framework raises privacy and security issues, and is prone to various forms of detection and evasion attacks. Solving the security-related issues and implementing a framework that is resistant to evasion attacks will increase the framework performance.

A cloud-based anti-malware system called CloudEyes, which provides efficient and trusted security services for resource-constrained IoT devices, is presented in [98]. For the client side, CloudEyes implemented a lightweight scanning agent that utilizes the digest of signature fragments to dramatically reduce the range of accurate matching. For the cloud server side, CloudEyes presented suspicious bucket cross filtering, a novel signature detection mechanism based on the reversible sketch structure, which provides retrospective and accurate orientations of malicious signature fragments. Furthermore, by transmitting sketch coordinates and the modular hashing, CloudEyes guarantees both data privacy and low-cost communications. The authors claim that the mechanisms in CloudEyes are effective and practical, and can outperform other existing systems with less time and communication consumption. On the other hand, the detection rate and accuracy can be further improved. Also, methods such as Winnowing Block Shingling and Winnowing Multi-Hashing can be used to reduce the size of the data in order to optimize the storage and matching performance during signature initialization.

Xiao et al. investigated the cloud-based malware detection game, in which mobile devices offload their application traces to security servers via base stations or access points in dynamic networks [56]. They designed a malware detection scheme with Q-learning for a mobile device to derive the optimal offloading rate without knowing the trace generation and the radio bandwidth model of other mobile devices. The Dyna architecture is used to improve performance, and a post-decision state learning-based scheme is used to accelerate the reinforcement learning process. According to the authors, test results showed that the proposed schemes improve the detection accuracy, reduce the detection delay, and increase the utility of a mobile device in the dynamic malware detection game when compared with the benchmark strategy. Since many different parties communicate with each other during the detection process, some overheads can degrade the performance, including the network transmission delay, the detection delay for the mobile device, the cloud processing time, and the local detection delay. Reducing these delays will improve the performance.

Yadav R. Mahesh presented a malware detection system for the cloud environment [99]. The proposed work consists of 2 modules, clustering and classification. In the clustering module, the input dataset is gathered into clusters using the Weighted Fuzzy C-means clustering (MFCM) algorithm. In the classification module, the centroid from the clusters is given to the intermittent Auto Associative Neural Network


which is used to characterize whether the information is intruded or not. The authors claim that the proposed classifier successfully identifies the malware with high detection precision, thereby outperforming existing classifiers. The performance of the proposed system needs to be improved further in the future.
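The clustering module described above can be pictured with a minimal sketch of fuzzy C-means. It makes some assumptions: it is the standard, unweighted algorithm rather than the weighted variant used in [99], it works on one-dimensional data, and the feature values and starting centers are made up. The function name and parameters are illustrative only.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Plain fuzzy C-means on 1-D data (not the weighted variant)."""
    centers = list(centers)
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Step 1: soft membership of every point in every cluster.
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            zeros = dists.count(0.0)
            if zeros:
                # Point sits exactly on a center: assign it fully there.
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [
                    1.0 / sum((d_i / d_k) ** power for d_k in dists)
                    for d_i in dists
                ]
            memberships.append(row)
        # Step 2: each center becomes a membership-weighted mean.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total > 0.0:
                centers[j] = sum(w * x for w, x in zip(weights, points)) / total
    return centers, memberships


# Made-up 1-D feature values forming two loose groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, memberships = fuzzy_c_means(points, centers=[0.0, 6.0])
print(centers)
```

Unlike hard clustering, every sample keeps a degree of membership in each cluster. The resulting centroids are what the classification module would then receive.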

2) EVALUATION OF CLOUD-BASED DETECTION

The literature review of cloud-based malware detection has been summarized. Since cloud computing has recently become very popular, cloud-based malware detection has become popular as well. Cloud-based malware detection enhances the detection performance for PCs and mobile devices with much bigger malware databases and intensive computational resources. Other advantages of cloud-based detection are easy installation, configuration, and regular updating. However, cloud-based detection has some disadvantages. To work properly it needs an Internet connection that is always fast, but in some cases the connection is slow. Furthermore, real-time monitoring of all files is not possible in the cloud.

FIGURE 9. Mobile devices-based malware detection schema.

G. MOBILE DEVICES-BASED MALWARE DETECTION
In the mobile device world, the Android platform has become the market leader. According to recent studies, a new malicious app for Android is introduced every 10 seconds. Because of that, researchers have focused on the Android platform rather than other platforms for malware detection. Numerous malware detection methods have been proposed for smartphones, especially for the Android platform. Generally, these methods use datamining and ML algorithms to detect malware. A number of different features such as system calls, security-sensitive APIs, information flows, and control flow structures are used. Even though current studies have made improvements in detecting traditional and new-generation malware for mobile devices, detecting complex malware and scaling the detection techniques to a large bundle of apps still remain challenging tasks. The mobile devices-based detection schema can be seen in Figure 9.

1) RELATED WORKS FOR MOBILE DEVICES-BASED MALWARE DETECTION
Isohara et al. proposed a kernel-based behavior analysis for Android malware inspection [57]. The system consists of a log collector and a log analysis application. The log collector records all system calls and filters events with the target application, and the log analyzer matches activities with signatures described by regular expressions to detect malicious activity. They evaluated 230 applications in total. According to the authors, the system can effectively detect malicious behaviors of unknown applications. 230 apps are not enough to measure the efficiency of the suggested system, so more apps need to be analyzed. Besides, there is not enough information about DR, accuracy, and FP.

A new framework to obtain and analyze smartphone application activity is presented in [100]. Two types of datasets have been used: those from artificial malware created for test purposes, and those from real malware found in the wild. A simple 2-means clustering algorithm is chosen to distinguish benign applications from their corresponding malware versions. The authors specified that API call analysis, information flow tracking, and network monitoring techniques contribute to a deeper analysis of the malware, and provide malware behaviors and more accurate results. The authors identified that open(), read(), access(), chmod(), and chown() are the system calls most used by malware. The authors mentioned that the proposed method has been shown to be an effective means of isolating the malware and alerting users to downloaded malware. However, the tests were mostly done on self-written malware and only a few real malware samples, which is not enough for a real evaluation. Thus, more real malware needs to be analyzed. Moreover, there is not enough information about metrics which represent the framework performance, such as DR, accuracy, and FP. In addition, the authors did not mention how they handle zero-day malware.

Host-based malware detection systems for Android are presented in [58], [101]. Andromaly, a behavioral malware detection framework for Android devices, is presented in [58]. The proposed framework uses a host-based malware detection system that continuously monitors various features and events obtained from the mobile device and then applies ML anomaly detectors to classify the collected data as normal or malicious. They evaluated several combinations of anomaly detection algorithms, feature selection techniques and the number of top features to find the combination that yields the best performance when detecting new malware on Android. The authors claim that the proposed framework is effective for mobile devices in general and for Android in particular. However, the experiments were done on artificially created malware rather than real malware. Saracino et al. proposed MADAM, a novel multi-level host-based malware detection system for Android devices that


simultaneously analyzes and correlates features at 4 levels: kernel, application, user, and package to detect malicious behaviors [101]. In this study, the actions of each malware are examined and misbehavior classes are generated from malware behaviors, which encompass most of the known malware behaviors. According to the authors, MADAM detects and effectively blocks more than 96% of malicious apps among the 2800 apps. MADAM is subject to mimicry attacks, which insert malicious code into benign apps to mislead the detection system. Besides, the paper did not mention how they handled unknown malware.

Li et al. introduced the significant permission identification (SigPID) method to detect Android malware [102]. Instead of extracting and analyzing all Android permissions, three levels of pruning by mining the permission data have been developed, which identify the most significant permissions for distinguishing malware from benign apps. SigPID then utilizes ML classification algorithms to classify different families of malware and benign apps. According to the authors’ findings, only 22 permissions out of 135 are significant when over 2000 malware samples are analyzed. The test results indicated that when an SVM is used as the classifier, they could achieve over 90% precision, recall, accuracy, and f-measure, which are about the same as those produced by the baseline technique. When the proposed schema is compared with other state-of-the-art methods, SigPID is more effective, detecting 93.62% of malware in the dataset and 91.4% of new malware samples. Using SigPID features together with static features can improve the performance.

A review on feature selection in mobile malware detection is presented in [103]. In the paper, 100 studies were examined based on feature selection techniques. They categorized features into 4 groups: static, dynamic, hybrid features and application metadata. The authors identified that the most common and distinctive static features are Android permission, network address, strings, and hardware components; dynamic features are system calls, network traffic, system components, and user interaction; hybrid features are permissions and Java code, system calls, and AndroidManifest.xml; metadata features are category, description, permissions, contact email, number of screenshots, and version. The authors emphasized that some of the examined papers introduced novel methods; however, due to a lack of malware samples, their authors could not test their systems thoroughly.

Malware detection using a graph kernel for Android is presented in [59], [104]. Narayanan et al. proposed CASANDRA, a context-aware, adaptive and scalable Android malware detector through online learning [59]. The authors proposed a novel graph kernel, which facilitates capturing apps’ security-sensitive behaviors along with their context information from dependency graphs. The authors mentioned that CASANDRA has specific advantages: it is adaptive to the evolution in malware features over time, and it explains the significant features that led to an app’s classification as being malware or benign. According to the authors, CASANDRA outperforms two state-of-the-art methods on a benchmark dataset, achieving 99.23% f-measure. Furthermore, when evaluated with more than 87,000 apps collected in the wild, CASANDRA achieves 89.92% accuracy, which outperforms existing methods by more than 25% in their typical batch learning setting and by more than 7% when they are continuously retrained. The authors did not mention how they handle unknown malware and malware which uses obfuscation techniques. To improve the model performance, different graph kernels and API dependencies such as information flows and permission dependencies can be used.

Narayanan et al. proposed MKLDROID, a unified framework for Android that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localization [104]. MKLDROID uses a graph kernel to capture structural and contextual information from apps’ dependency graphs when it identifies malicious code patterns. Then, it employs multiple kernel learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Through large-scale experiments on several datasets of wild apps, the authors claim that MKLDROID consistently outperforms three state-of-the-art methods in terms of accuracy. In addition, in malicious code localization experiments on a dataset of repackaged malware, MKLDROID was able to identify all the malware classes with 94% average recall. On the other hand, MKLDROID cannot detect all sorts of malicious behaviors and is not resistant to obfuscation techniques. Furthermore, MKLDROID can be fooled by adversarial attacks. MKLDROID used only user-awareness contextual information to separate malware from benign apps. However, other types of contextual information such as probing and device-specific privileges could be used.

2) EVALUATION OF MOBILE DEVICES-BASED MALWARE DETECTION
The literature review of the mobile devices-based detection approach has been summarized. It can use both static and dynamic features. Although the proposed methods seem effective when detecting traditional malware, they need to be improved to detect up-to-date malware. Besides, the approach is not scalable to a large bundle of apps. In the mobile area, malware detection is still in its early stages, and more studies are needed in this area to fill the gaps.

H. IoT-BASED MALWARE DETECTION
The Internet of Things (IoT) architecture generally consists of a wide range of Internet-connected smart devices such as home appliances, network cameras, and sensors. IoT and mobile devices have started to dominate the Internet more than PCs. Since mobile and IoT devices are becoming more popular among users day by day, they are also becoming more attractive targets for attackers. Because of that, the malware detection landscape is shifting from computers to IoT and mobile devices. The IoT-based detection schema can be seen in Figure 10.


FIGURE 10. IoT-based malware detection schema.

1) RELATED WORKS FOR IoT-BASED MALWARE DETECTION
Malware detection approaches for IoT devices are presented in [105], [60]. A novel lightweight technique for detecting DDoS malware in IoT environments is explained in [105]. They extracted malware images, such as a one-channel gray-scale image, from a malware binary, then utilized a lightweight convolutional neural network for classifying their families. According to the paper, experimental results showed that the proposed system can achieve 94.0% accuracy for the classification of benign and DDoS malware, and 81.8% accuracy for the classification of benign and two main malware families. Even though the proposed method is fast and lightweight, it is vulnerable to complex code obfuscation techniques. The author mentioned that this problem can be partially reduced by using more complex static features, such as Opcode sequences and API calls. Detecting crypto-ransomware in IoT networks based on the energy consumption footprint for Android devices is presented in [60]. The proposed system uses ML algorithms and specifically monitors the energy consumption patterns of different processes to classify ransomware from malware applications. According to the authors, the proposed technique outperformed KNN, neural networks, SVM and RF in terms of accuracy rate, recall rate, precision rate and f-measure. The description of the proposed method is not clear. Besides, there is no information about which ransomware family was analyzed and how they handled unknown ransomware. Also, the paper did not mention any limitations or challenging tasks.

2) EVALUATION OF IoT-BASED MALWARE DETECTION
The literature review of the IoT-based detection approach has been summarized. Although the proposed methods seem effective when detecting traditional malware, they need to be improved to detect up-to-date malware. Besides, malware detection for IoT is still in its early stages, and more studies are needed in this area to fill the gaps.

V. EVALUATION ON MALWARE DETECTION APPROACHES
In the previous section, malware detection approaches were analyzed based on the main idea, algorithm types, feature extraction methods, etc. This section summarizes the detection approaches and their methods, provides the advantages and disadvantages of each detection approach, and provides some suggestions for building a more effective detection schema. The comparison of malware detection approaches and the advantages and disadvantages of each malware detection approach can be seen in Table 6 and Table 7, respectively.

TABLE 6. Comparison of malware detection approaches.

Signature-, behavior-, heuristic-, and model checking-based approaches are well-known and have been used for malware detection for more than a decade. These approaches use reverse engineering, datamining, and ML techniques to detect malware.

The signature-based detection approach is fast and effective at detecting known malware. During signature generation, static features such as byte sequences, assembly instructions, strings, Opcode, and list of DLLs are used. The signature detection schema has been used for many years and decreases overhead and execution time. However, it cannot detect new-generation malware (Table 6), it is vulnerable to obfuscation and polymorphic techniques, and it omits feature selection. To build an effective signature-based detection schema: some dynamic features can be used to avoid obfuscation; a feature selection phase can be added; and new technologies such as deep learning, active learning, and ML can be used to increase the detection rate.

The behavior-based detection approach is used to determine the functionality of malware. Even if the malware instruction sequence and signature change, the functionality of the malware will be more or less the same. So, it can detect new malware and different variants of the same malware [106]. It is also effective against obfuscation and polymorphic techniques (Table 7). However, it produces high FPs. Besides, some behaviors are similar in malware and benign samples, so grouping these behaviors is difficult, and some malware does not run in a protected environment and is mistakenly marked as benign. To specify all behaviors correctly, multiple


execution paths can be gathered using different machines on clouds. This can decrease the number of malware samples mistakenly marked as benign.

TABLE 7. Pros and cons of each malware detection approach.

The heuristic-based detection approach can use both static and dynamic features such as API calls, Opcode, CFG, n-gram, list of DLLs, and hybrid features. It can detect some previously unknown malware, but it is vulnerable to metamorphic techniques, and numerous rules and training phases [107] make this detection approach complicated (Table 7). Decreasing the number of rules and building a more efficient learning phase can improve the method performance.

The model checking-based approach is powerful, can detect unknown malware, and is resistant to obfuscation and polymorphic techniques (Table 7). However, it obtains only a limited view of the malware, is not resistant to evasion attacks, and cannot detect all new-generation malware. Identifying more accurate formulas and using an effective model checker may improve performance.

Recently, deep learning-, cloud-, mobile devices-, and IoT-based approaches have started to be used in malware detection (VI). Deep learning-based detection approaches are effective at detecting new malware and reduce the feature space sharply [108], [109], but they are not resistant to some evasion attacks. On the other hand, cloud-based detection approaches increase DR, decrease FPs, and provide bigger malware databases and powerful computational resources [110]. The overhead between client and server and the lack of real-time monitoring are still challenging tasks in the cloud environment. Mobile devices- and IoT-based detection approaches can use both static and dynamic features, and improve detection rates on traditional and new-generation malware [111]. However, they have difficulty detecting complex malware, and are not scalable to a large bundle of apps. Integrating the mobile and IoT-based approaches with the cloud-based approach can improve the DR and scale better for a large bundle of apps.

Even though each detection method has its own advantages and works better for different datasets, no detection method could detect all malware. Malware detection rate versus complexity of malware can be seen in Figure 11. When the complexity of malware (unknown malware, new-generation malware, obfuscated malware) increases, the detection rate decreases for all detection approaches. It can be seen that signature-type detection approaches such as signature-, heuristic-, and most of the time mobile devices- and IoT-based schemas show lower performance than other approaches such as behavior-, model checking-, cloud-, and deep learning-based approaches (Figure 11).

This is because the latter approaches are more effective at detecting unknown and obfuscated malware. The behavior-based detection approach performs well, while the signature-based detection approach shows the lowest performance (Figure 11). Model checking- and cloud-based detection approaches perform slightly better than deep learning-, heuristic-, mobile devices-, and IoT-based detection approaches. Combining malware detection approaches can provide a better detection mechanism. For example, combining behavior-based with model checking-based approaches, and using deep learning and cloud at the same time will surely provide a better detection mechanism. Besides, using new technologies such as blockchain and big data may give more opportunity to build a more effective detector.
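To make the idea of combining approaches concrete, here is a minimal sketch, assuming two already-built detectors whose verdicts are merged with a simple OR rule, together with the DR and FPR of each result. The ground-truth labels, the two verdict lists, and the function names are invented for illustration. The reviewed works do not prescribe this particular combination rule.

```python
def detection_rate(labels, verdicts):
    # DR = malware samples flagged / all malware samples.
    on_malware = [v for is_mal, v in zip(labels, verdicts) if is_mal]
    return sum(on_malware) / len(on_malware)


def false_positive_rate(labels, verdicts):
    # FPR = benign samples flagged / all benign samples.
    on_benign = [v for is_mal, v in zip(labels, verdicts) if not is_mal]
    return sum(on_benign) / len(on_benign)


def combine_or(first, second):
    # Flag a sample if either detector flags it.
    return [a or b for a, b in zip(first, second)]


# Hypothetical ground truth: True = malware, False = benign.
labels = [True, True, True, True, False, False, False, False]
# Hypothetical verdicts of a signature-based and a behavior-based detector.
signature = [True, True, False, False, False, False, False, False]
behavior = [True, False, True, True, True, False, False, False]

combined = combine_or(signature, behavior)
for name, verdicts in [("signature", signature),
                       ("behavior", behavior),
                       ("combined", combined)]:
    print(name,
          detection_rate(labels, verdicts),
          false_positive_rate(labels, verdicts))
```

With an OR rule the combined DR can never be lower than that of either detector, but the combined FPR cannot be lower either. A practical combination therefore has to weigh the gain in DR against the extra FPs.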


FIGURE 11. Malware detection rate versus complexity of malware based on previous studies.

Although malware detectors are being improved every day, the following research challenges still remain open issues in malware detection approaches:
• New-generation malware uses obfuscation and packing techniques to hide itself. By using these techniques malware can prevent itself from being correctly analyzed and avoid detection. The signature-based detection approach is not resistant to malware obfuscation. Even though behavior- and model checking-based approaches are effective against most obfuscation techniques, they are not resistant to all of them.
• Real-time monitoring and detection are challenging tasks. Most of the studies so far detect malware by using a dataset and are not appropriate for real-time monitoring.
• Most of the malware detection approaches are prone to FPs and FNs. Some features and signatures can be very close in malware and benign samples, which raises FPs and FNs.
• No detection method can effectively detect all unknown malware.
• Generally, learning algorithms are prone to bias and overfitting. This decreases DRs and increases FPs.
• There is no well-known and accepted dataset which can be used to evaluate the performance of malware detection approaches. This is because each malware detection method uses different malware and a different dataset.

VI. CONCLUSION
Even though several new methods have been proposed by using these different malware detection approaches, no method could detect all new-generation and sophisticated malware. For known malware, signature- and heuristic-based detection approaches perform well. On the other hand, for unknown and complicated malware, behavior-, model checking-, and cloud-based approaches perform better. Deep learning-, mobile-, and IoT-based approaches have also emerged to detect some portion of known and unknown malware. However, some portion of malware could not be detected by using these approaches. This shows that building an effective method to detect malware is a very challenging task, and there is a huge gap to fill with new studies and methods. Even though the trends in malware creation and detection approaches are changing rapidly, this study can still be considered a key reference for the computer scientists and developers who work in this field. As future work, new approaches and methods need to be proposed. Combining malware detection approaches can be one of the solutions among many. For instance, combining behavior-based with model checking-based approaches, and using deep learning and cloud at the same time will surely provide a better detection mechanism.

Recently, the number, severity, and sophistication of malware attacks, and the cost that malware inflicts on the world economy, have been increasing exponentially. Attacks with these kinds of software have a disastrous effect and cause considerable material damage to the assets of individuals, private companies, and governments. Thus, malware should be detected before it damages the important assets of a company. However, there are large gaps in the research area of malware detection and prevention. The aim of this study is to contribute to malware research. In this context, the paper has presented a detailed review of the state-of-the-art studies on malware detection approaches, and of the techniques and algorithms that are used for malware detection. The advantages and disadvantages of each malware detection approach have been explained. As well as datamining and ML, new technologies such as deep learning-, cloud-, mobile devices-, and IoT-based detection schemas have also become popular.

REFERENCES

[1] S. Morgan. 2019 cybersecurity almanac: 100 facts, figures, predictions and statistics. Cisco and Cybersecurity Ventures. Accessed: Nov. 10, 2019. [Online]. Available: https://cybersecurityventures.com/cybersecurity-almanac-2019
[2] R. Samani and G. Davis. (2019). McAfee Mobile Threat Report Q1. [Online]. Available: https://www.mcafee.com/enterprise/en-us/assets/reports/rp-mobile-threat-report-2019.pdf
[3] F. Cohen, “Computer viruses,” Ph.D. dissertation, Univ. Southern California, Los Angeles, CA, USA, 1986.
[4] F. Cohen, “A formal definition of computer worms and some related results,” Comput. Secur., vol. 11, no. 7, pp. 641–652, Nov. 1992.
[5] D. M. Chess and S. R. White, “An undetectable computer virus,” in Proc. Virus Bull. Conf., vol. 5, 2000.
[6] F. Cohen, “Computer viruses: Theory and experiments,” Comput. Secur., vol. 6, no. 1, pp. 22–35, 1987.
[7] L. M. Adleman, “An abstract theory of computer viruses,” in Advances in Cryptology—CRYPTO. New York, NY, USA: Springer-Verlag, 1990.
[8] D. Spinellis, “Reliable identification of bounded-length viruses is NP-complete,” IEEE Trans. Inf. Theory, vol. 49, no. 1, pp. 280–284, Jan. 2003.
[9] Z. Zuo, Q. Zhu, and M. Zhou, “On the time complexity of computer viruses,” IEEE Trans. Inf. Theory, vol. 51, no. 8, pp. 2962–2966, Aug. 2005.
[10] K. Alzarooni, “Malware variant detection,” Ph.D. dissertation, Dept. Comput. Sci., Univ. College London, London, U.K., 2012.
[11] P. Szor, The Art of Computer Virus Research and Defense. Upper Saddle River, NJ, USA: Pearson Education, 2005.
[12] W. Stallings and L. Brown, Computer Security: Principles and Practice. Upper Saddle River, NJ, USA: Pearson Education, 2012.
[13] P. Szor and P. Ferrie, “Hunting for metamorphic,” in Proc. Virus Bull. Conf., 2001.
[14] S. Alam, R. Horspool, I. Traore, and I. Sogukpinar, “A framework for metamorphic malware analysis and real-time detection,” Comput. Secur., vol. 48, pp. 212–233, Feb. 2015.
[15] M. D. Preda. Code Obfuscation and Malware Detection by Abstract Interpretation. Accessed: Nov. 12, 2019. [Online]. Available: https://iris.univr.it/retrieve/handle/11562/337972/3306/main.pdf
[16] W. Yan, Z. Zhang, and N. Ansari, “Revealing packed malware,” IEEE Secur. Privacy Mag., vol. 6, no. 5, pp. 65–69, Sep. 2008.
[17] Y. Alosefer, “Analysing Web-based malware behaviour through client honeypots,” Ph.D. dissertation, School Comput. Sci. Inform., Cardiff Univ., Cardiff, U.K., 2012.
[18] M. Sikorski and A. Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. San Francisco, CA, USA: No Starch Press, 2012.
[19] N. Idika and P. Mathur, “A survey of malware detection techniques,” Purdue Univ., West Lafayette, IN, USA, Tech. Rep., vol. 48, 2007.
[20] E. Eilam, Reversing: Secrets of Reverse Engineering. Hoboken, NJ, USA: Wiley, 2011.
[21] A. Souri and R. Hosseini, “A state-of-the-art survey of malware detection approaches using data mining techniques,” Hum.-Centric Comput. Inf. Sci., vol. 8, no. 1, p. 3, 2018.
[22] M. Tavallaee, “A detailed analysis of the KDD CUP 99 data set,” in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., 2009, pp. 1–6.
[23] D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck, “Drebin: Effective and explainable detection of Android malware in your pocket,” in Proc. Netw. Distrib. Syst. Secur. Symp., vol. 14, 2014, pp. 23–26.
[24] R. Ronen, M. Radu, C. Feuerstein, E. Yom-Tov, and M. Ahmadi, “Microsoft malware classification challenge,” 2018, arXiv:1802.10135. [Online]. Available: https://arxiv.org/abs/1802.10135
[25] Classification of Malware PE Headers. Accessed: Nov. 14, 2019. [Online]. Available: https://github.com/urwithajit9/ClaMP
[26] A. H. Lashkari, A. F. A. Kadir, H. Gonzalez, K. F. Mbah, and A. A. Ghorbani, “Towards a network-based framework for Android malware detection and characterization,” in Proc. 15th Annu. Conf. Privacy, Secur. Trust (PST), Aug. 2017.
[27] S. Anderson and P. Roth, “EMBER: An open dataset for training static PE malware machine learning models,” 2018, arXiv:1804.04637. [Online]. Available: https://arxiv.org/abs/1804.04637
[28] E. Gandotra, D. Bansal, and S. Sofat, “Malware analysis and classification: A survey,” J. Inf. Secur., vol. 5, no. 2, pp. 56–64, 2014.
[29] O. Aslan and R. Samet, “Investigation of possibilities to detect malware using existing tools,” in Proc. IEEE/ACS 14th Int. Conf. Comput. Syst. Appl. (AICCSA), Oct. 2017.
[30] M. G. Schultz, E. Eskin, F. Zadok, and S. J. Stolfo, “Data mining methods for detection of new malicious executables,” in Proc. IEEE Symp. Secur. Privacy, May 2001.
[31] A. Karnik, S. Goswami, and R. Guha, “Detecting obfuscated viruses using cosine similarity analysis,” in Proc. 1st Asia Int. Conf. Modeling Simulation (AMS), Mar. 2007.
[32] S. K. Cha, I. Moraru, J. Jang, J. Truelove, D. Brumley, and D. G. Andersen, “SplitScreen: Enabling efficient, distributed malware detection,” J. Commun. Netw., vol. 13, no. 2, pp. 187–200, Apr. 2011.
[33] U. Baldangombo, N. Jambaljav, and S.-J. Horng, “A static malware detection system using data mining methods,” 2013, arXiv:1308.2831. [Online]. Available: https://arxiv.org/abs/1308.2831
[34] A. Malhotra and K. Bajaj, “A hybrid pattern based text mining approach for malware detection using DBScan,” CSI Trans., vol. 4, nos. 2–4, pp. 141–149, Dec. 2016.
[35] A. Moser, C. Kruegel, and E. Kirda, “Exploring multiple execution paths for malware analysis,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2007.
[36] G. Wagener, R. State, and A. Dulaunoy, “Malware behaviour analysis,” J. Comput. Virol., vol. 4, no. 4, pp. 279–287, Nov. 2008.
[37] Y. Park, D. S. Reeves, and M. Stamp, “Deriving common malware behavior through graph clustering,” Comput. Secur., vol. 39, pp. 419–430, Nov. 2013.
[38] S. Das, Y. Liu, W. Zhang, and M. Chandramohan, “Semantics-based online malware detection: Towards efficient real-time protection against malware,” IEEE Trans. Inf. Forensics Security, vol. 11, no. 2, pp. 289–302, Feb. 2016.
[39] M. Norouzi, A. Souri, and S. M. Zamini, “A data mining classification approach for behavioral malware detection,” J. Comput. Netw. Commun., vol. 2016, p. 1, Mar. 2016.
[40] B. Zhang, J. Yin, J. Hao, D. Zhang, and S. Wang, “Malicious codes detection based on ensemble learning,” in Autonomic and Trusted Computing (Lecture Notes in Computer Science), vol. 4610. Berlin, Germany: Springer, 2007, pp. 468–477.
[41] K. Griffin, S. Schneider, X. Hu, and T.-C. Chiueh, “Automatic generation of string signatures for malware detection,” in Proc. Int. Workshop Recent Adv. Intrusion Detection. Berlin, Germany: Springer, 2009.
[42] B. Anderson, D. Quist, J. Neil, C. Storlie, and T. Lane, “Graph-based malware detection using dynamic analysis,” J. Comput. Virol., vol. 7, no. 4, pp. 247–258, Nov. 2011.
[43] R. Islam, R. Tian, L. M. Batten, and S. Versteeg, “Classification of malware based on integrated static and dynamic features,” J. Netw. Comput. Appl., vol. 36, no. 2, pp. 646–656, Mar. 2013.
[44] S. Naval, V. Laxmi, M. Rajarajan, M. S. Gaur, and M. Conti, “Employing program semantics for malware detection,” IEEE Trans. Inf. Forensics Security, vol. 10, no. 12, pp. 2591–2604, Dec. 2015.
[45] P. Singh and A. Lakhotia, “Static verification of worm and virus behavior in binary executables using model checking,” in Proc. IEEE Syst., Man Soc. Inf. Assurance Workshop, Mar. 2003.
[46] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Detecting malicious code by model checking,” in Proc. Int. Conf. Detection Intrusions Malware, Vulnerability Assessment. Berlin, Germany: Springer, 2005.
[47] P. Beaucamps and J. Marion, “On behavioral detection,” in Proc. EICAR, vol. 9, 2009.
[48] F. Song and T. Touili, “Efficient malware detection using model-checking,” in Proc. Int. Symp. Formal Techn. Berlin, Germany: Springer, 2012.
[49] A. Cimitile, F. Martinelli, F. Mercaldo, V. Nardone, A. Santone, and G. Vaglini, “Model checking for mobile android malware evolution,” in Proc. IEEE/ACM 5th Int. FME Workshop Formal Methods Softw. Eng. (FormaliSE), May 2017.
[50] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in Proc. 10th Int. Conf. Malicious Unwanted Softw. (MALWARE), Oct. 2015.
[51] W. Huang and J. W. Stokes, “MtNet: A multi-task neural network for dynamic malware classification,” in Proc. Int. Conf. Detection Intrusions Malware, Vulnerability Assessment. Cham, Switzerland: Springer, 2016.
[52] D. Zhu, H. Jin, Y. Yang, D. Wu, and W. Chen, “DeepFlow: Deep learning-based malware detection by mining Android application for abnormal usage of sensitive data,” in Proc. IEEE Symp. Comput. Commun. (ISCC), Jul. 2017.


[53] Y. Ye, L. Chen, S. Hou, W. Hardy, and X. Li, “DeepAM: A heterogeneous deep learning framework for intelligent malware detection,” Knowl. Inf. Syst., vol. 54, no. 2, pp. 265–285, Feb. 2018.
[54] L. Martignoni, R. Paleari, and D. Bruschi, “A framework for behavior-based malware analysis in the cloud,” in Proc. Int. Conf. Inf. Syst. Secur. Berlin, Germany: Springer, 2009.
[55] H. Sun, X. Wang, J. Su, and P. Chen, “RScam: Cloud-based anti-malware via reversible sketch,” in Proc. Int. Conf. Secur. Privacy Commun. Syst. Cham, Switzerland: Springer, 2015.
[56] L. Xiao, Y. Li, X. Huang, and X. Du, “Cloud-based malware detection game for mobile devices with offloading,” IEEE Trans. Mobile Comput., vol. 16, no. 10, pp. 2742–2750, Oct. 2017.
[57] T. Isohara, K. Takemori, and A. Kubota, “Kernel-based behavior analysis for Android malware detection,” in Proc. 7th Int. Conf. Comput. Intell. Secur., Dec. 2011.
[58] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, and Y. Weiss, “Andromaly: A behavioral malware detection framework for Android devices,” J. Intell. Inf. Syst., vol. 38, no. 1, pp. 161–190, Feb. 2012.
[59] A. Narayanan, M. Chandramohan, L. Chen, and Y. Liu, “Context-aware, adaptive, and scalable Android malware detection through online learning,” IEEE Trans. Emerg. Topics Comput., vol. 1, no. 3, pp. 157–175, Jun. 2017.
[60] A. Azmoodeh, A. Dehghantanha, M. Conti, and K.-K.-R. Choo, “Detecting crypto-ransomware in IoT networks based on energy consumption footprint,” J. Ambient Intell. Hum. Comput., vol. 9, no. 4, pp. 1141–1152, Aug. 2018.
[61] K. Hahn, “Robust static analysis of portable executable malware,” in Proc. HTWK Leipzig, 2014.
[62] Hooked on Mnemonics Worked for Me. Accessed: Nov. 15, 2019. [Online]. Available: http://hooked-on-mnemonics.blogspot.com/2011/01/intro-to-creating-anti-virus-signatures.html
[63] M. F. Zolkipli and A. Jantan, “A framework for malware detection using combination technique and signature generation,” in Proc. 2nd Int. Conf. Comput. Res. Develop., May 2010.
[64] Y. Tang, B. Xiao, and X. Lu, “Using a bioinformatics approach to generate accurate exploit-based signatures for polymorphic worms,” Comput. Secur., vol. 28, no. 8, pp. 827–842, Nov. 2009.
[65] H. Razeghi Borojerdi and M. Abadi, “MalHunter: Automatic generation of multiple behavioral signatures for polymorphic malware detection,” in Proc. ICCKE. Mashhad, Iran: Ferdowsi Univ. Mashhad, vol. 1, Oct. 2013.
[66] J. Newsome, B. Karp, and D. Song, “Polygraph: Automatically generating signatures for polymorphic worms,” in Proc. IEEE Symp. Secur. Privacy (S&P), Oakland, CA, USA, May 2005, pp. 226–241.
[67] R. Perdisci, W. Lee, and N. Feamster, “Behavioral clustering of HTTP-based malware and signature generation using malicious network traces,” in Proc. 7th USENIX Conf. Netw. Syst. Design Implement., San Jose, CA, USA, 2010, pp. 391–404.
[68] I. Santos, Y. K. Penya, J. Devesa, and P. G. Bringas, “N-grams-based file signatures for malware detection,” in Proc. 11th Int. Conf. Enterprise Inf., vol. 9, 2009, pp. 317–320.
[69] E. Fix and J. L. Hodges, “Discriminatory analysis: Nonparametric discrimination: Small sample performance,” Univ. California, Berkeley, Berkeley, CA, USA, Tech. Rep. 11, 1952.
[70] D. Venugopal and G. Hu, “Efficient signature based malware detection on mobile devices,” Mobile Inf. Syst., vol. 4, no. 1, pp. 33–49, 2008.
[71] M. Zheng, M. Sun, and J. C. Lui, “Droid analytics: A signature based analytic system to collect, extract, analyze and associate android malware,” in Proc. 12th IEEE Int. Conf. Trust, Secur. Privacy Comput. Commun., Jul. 2013.
[72] Y. Fukushima, A. Sakai, Y. Hori, and K. Sakurai, “A behavior based malware detection scheme for avoiding false positive,” in Proc. 6th IEEE Workshop Secure Netw. Protocols, Oct. 2010, pp. 79–84.
[73] M. Christodorescu, S. Jha, S. Seshia, D. Song, and R. Bryant, “Semantics-aware malware detection,” in Proc. IEEE Symp. Secur. Privacy (S&P), May 2005.
[74] A. Lanzi, D. Balzarotti, C. Kruegel, M. Christodorescu, and E. Kirda, “AccessMiner: Using system-centric models for malware protection,” in Proc. 17th ACM Conf. Comput. Commun. Secur. (CCS), 2010, pp. 399–412.
[75] M. Chandramohan, H. B. K. Tan, L. C. Briand, L. K. Shar, and B. M. Padmanabhuni, “A scalable approach for malware detection through bounded feature space behavior modeling,” in Proc. 28th IEEE/ACM Int. Conf. Autom. Softw. Eng. (ASE), Nov. 2013, pp. 312–322.
[76] S.-T. Liu, H.-C. Huang, and Y.-M. Chen, “A system call analysis method with mapreduce for malware detection,” in Proc. IEEE 17th Int. Conf. Parallel Distrib. Syst., Dec. 2011, pp. 631–637.
[77] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel, “A view on current malware behaviors,” in Proc. USENIX Workshop, 2009.
[78] H. H. Pajouh, A. Dehghantanha, R. Khayami, and K.-K.-R. Choo, “Intelligent OS X malware threat detection with code inspection,” J. Comput. Virol. Hack Tech., vol. 14, no. 3, pp. 213–223, Aug. 2018.
[79] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X.-Y. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proc. USENIX Secur. Symp., Aug. 2009, vol. 4, no. 1, pp. 351–366.
[80] M. Eskandari and S. Hashemi, “A graph mining approach for detecting unknown malwares,” J. Vis. Lang. Comput., vol. 23, no. 3, pp. 154–162, Jun. 2012.
[81] F. Adkins, L. Jones, M. Carlisle, and J. Upchurch, “Heuristic malware detection via basic block comparison,” in Proc. 8th Int. Conf. Malicious Unwanted Softw., Amer. (MALWARE), Oct. 2013.
[82] W. Arnold and G. Tesauro, “Automatically generated Win32 heuristic virus detection,” in Proc. Int. Virus Bull. Conf., vol. 200, 2000.
[83] Y. Ye, T. Li, Q. Jiang, and Y. Wang, “CIMDS: Adapting postprocessing techniques of associative classification for malware detection,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 3, pp. 298–307, May 2010.
[84] Y. Ye, D. Wang, T. Li, D. Ye, and Q. Jiang, “An intelligent PE-malware detection system based on association mining,” J. Comput. Virol., vol. 4, no. 4, pp. 323–334, Nov. 2008.
[85] Z. Bazrafshan, H. Hashemi, S. M. H. Fard, and A. Hamzeh, “A survey on heuristic malware detection techniques,” in Proc. 5th Conf. Inf. Knowl. Technol., May 2013.
[86] D. Bilar, “Opcodes as predictor for malware,” Int. J. Electron. Secur. Digit. Forensics, vol. 1, no. 2, p. 156, 2007.
[87] A. Holzer, J. Kinder, and H. Veith, “Using verification technology to specify and detect malware,” in Proc. Int. Conf. Comput. Aided Syst. Theory. Berlin, Germany: Springer, 2007.
[88] J. Kinder, S. Katzenbeisser, C. Schallhart, and H. Veith, “Proactive detection of computer worms using model checking,” IEEE Trans. Depend. Sec. Comput., vol. 7, no. 4, pp. 424–438, Oct. 2010.
[89] P. Beaucamps, I. Gnaedig, and J. Marion, “Abstraction-based malware analysis using rewriting and model checking,” in Proc. Eur. Symp. Res. Comput. Secur. Berlin, Germany: Springer, 2012.
[90] F. Song and T. Touili, “Pushdown model checking for malware detection,” Int. J. Softw. Tools Technol. Transf., vol. 16, no. 2, pp. 147–173, Apr. 2014.
[91] P. Battista, F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio, “Identification of Android malware families with model checking,” in Proc. 2nd Int. Conf. Inf. Syst. Secur. Privacy, 2016.
[92] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., May 2013.
[93] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, “Droid-Sec: Deep learning in Android malware detection,” ACM SIGCOMM Comput. Commun. Rev., vol. 44, no. 4, pp. 371–372, 2014.
[94] Y. Bengio, “Learning deep architectures for AI,” Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
[95] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” 2016, arXiv:1606.04435. [Online]. Available: https://arxiv.org/abs/1606.04435
[96] B. Kolosnjaji, A. Demontis, B. Biggio, D. Maiorca, G. Giacinto, C. Eckert, and F. Roli, “Adversarial malware binaries: Evading deep learning for malware detection in executables,” in Proc. 26th Eur. Signal Process. Conf. (EUSIPCO), Sep. 2018.
[97] Y. Ye, T. Li, S. Zhu, W. Zhuang, E. Tas, U. Gupta, and M. Abdulhayoglu, “Combining file content and file relations for cloud based malware detection,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2011.
[98] H. Sun, X. Wang, R. Buyya, and J. Su, “CloudEyes: Cloud-based malware detection with reversible sketch for resource-constrained Internet of Things (IoT) devices,” Softw. Pract. Exper., vol. 47, no. 3, pp. 421–441, Mar. 2017.
[99] R. M. Yadav, “Effective analysis of malware detection in cloud computing,” Comput. Secur., vol. 83, pp. 14–21, Jun. 2019.


[100] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani, “Crowdroid: Behavior-based malware detection system for Android,” in Proc. 1st ACM Workshop Secur. Privacy Smartphones Mobile Devices (SPSM), 2011.
[101] A. Saracino, D. Sgandurra, G. Dini, and F. Martinelli, “MADAM: Effective and efficient behavior-based Android malware detection and prevention,” IEEE Trans. Depend. Sec. Comput., vol. 15, no. 1, pp. 83–97, Jan. 2018.
[102] J. Li, L. Sun, Q. Yan, Z. Li, W. Srisa-an, and H. Ye, “Significant permission identification for machine-learning-based Android malware detection,” IEEE Trans. Ind. Informat., vol. 14, no. 7, pp. 3216–3225, Jul. 2018.
[103] A. Feizollah, N. B. Anuar, R. Salleh, and A. W. A. Wahab, “A review on feature selection in mobile malware detection,” Digit. Invest., vol. 13, pp. 22–37, Jun. 2015.
[104] A. Narayanan, M. Chandramohan, L. Chen, and Y. Liu, “A multi-view context-aware approach to Android malware detection and malicious code localization,” Empirical Softw. Eng., vol. 23, no. 3, pp. 1222–1274, Jun. 2018.
[105] J. Su, V. Danilo Vasconcellos, S. Prasad, S. Daniele, Y. Feng, and K. Sakurai, “Lightweight classification of IoT malware based on image recognition,” in Proc. IEEE 42nd Annu. Comput. Softw. Appl. Conf. (COMPSAC), vol. 2, Jul. 2018.
[106] E. B. Karbab and M. Debbabi, “MalDy: Portable, data-driven malware detection using natural language processing and machine learning techniques on behavioral analysis reports,” Digit. Invest., vol. 28, pp. S77–S87, Apr. 2019.
[107] C. Choi, C. Esposito, M. Lee, and J. Choi, “Metamorphic malicious code behavior detection using probabilistic inference methods,” Cognit. Syst. Res., vol. 56, pp. 142–150, Aug. 2019.
[108] W. Wang, M. Zhao, and J. Wang, “Effective Android malware detection with a hybrid model based on deep autoencoder and convolutional neural network,” J. Ambient Intell. Hum. Comput., vol. 10, no. 8, pp. 3035–3043, Aug. 2019.
[109] H. Zhang, W. Zhang, Z. Lv, A. K. Sangaiah, T. Huang, and N. Chilamkurti, “MALDC: A depth detection method for malware based on behavior chains,” in Proc. World Wide Web, 2019, pp. 1–20.
[110] Q. K. Ali Mirza, I. Awan, and M. Younas, “CloudIntell: An intelligent malware detection system,” Future Gener. Comput. Syst., vol. 86, pp. 1042–1053, Sep. 2018.
[111] Z. Ma, H. Ge, Y. Liu, M. Zhao, and J. Ma, “A combination method for Android malware detection based on control flow graphs and machine learning algorithms,” IEEE Access, vol. 7, pp. 21235–21245, 2019.

ÖMER ASLAN received the B.Sc. degree in computer engineering from the University of Trakya, Turkey, in 2009, and the M.Sc. degree in information security from the University of Texas at San Antonio, USA, in 2014. He is currently pursuing the Ph.D. degree in computer engineering with Ankara University, Turkey. He is a Research Assistant in cyber security with the University of Siirt, Turkey. He has published seven conference papers and one book chapter.

REFIK SAMET (Member, IEEE) is currently a Professor with the Computer Engineering Department, Ankara University, Turkey. He has worked on and managed projects at TUBITAK, NATO, the European Union, and University Scientific Research Units. He is working on reliability and fault-tolerance of real-time computer systems, information security, cyber security, malware analysis, and computer forensics. He has six patents, four books, and four book chapters. He has over 60 articles published in national and international journals and more than 60 conference papers. He is a member of the scientific committees of many national and international scientific conferences and journals. He is a member of the IEEE Computer Society and the IEEE Turkey Section.


