14th ICCCNT 2023 Paper 943
Abstract— Malware is an extensively researched topic, and its study is an important domain of cybersecurity. As the creativity and number of hackers and malicious users increase, it becomes increasingly important for cybersecurity professionals to be wary of existing malware while also being able to recognize suspicious activity in files.

Static and dynamic analysis of a particular file form an approach that can help identify the underlying fault in a particular exe file while, at the same time, observing it over a period of time to see what actions it takes. To do so, the file needs to be suspended in a virtual environment where it cannot harm the system, which this study implements through Docker. Not only does this prevent the file from causing issues, but it also increases the level of inspection while protecting the security of the system.

For the crux of this study, the proposed architecture deals with two primary types of analysis, namely static analysis and dynamic analysis. The former covers most of the existing techniques used for static analysis, such as virus checksums and PE analysis. The latter covers utilities that observe the behavior of the inspected file in an isolated environment, ranging from its memory information to its TCP dump. Finally, the two are integrated into a full scan in a virtual environment so that the virtual machine can use this integrated sandbox to its full potential.

Keywords— Malware Analysis; Static Analysis; Dynamic Analysis; Virtual Sandbox; Cybersecurity Tool

I. INTRODUCTION

Malware has become increasingly advanced, and cybersecurity measures constantly adapt to its innovations in an effort to stay one step ahead. Methods of analysis should therefore be tuned towards detecting unusual behavior while retaining all the existing methods of static analysis. This ensures that cybersecurity defenses are ready both for tried-and-tested malware and for any new malware that appears.

Malware sandboxes usually follow a systematic workflow in which an exe file is suspended in a virtual environment that simulates all the features of a regular operating system. In that virtual environment, the file executes as the malicious user intended. The cybersecurity professional can then observe the behavior of the suspicious file and, from its network and memory usage, determine whether it is safe to use. Virtual environments also keep an original snapshot of the virtual machine, which allows multiple files to be checked without restarting the virtual machine.

Furthermore, the study of malware behavior is an important practice, since it can reveal many kinds of malicious activity and is where the expertise of a cybersecurity professional really shines. At the same time, it would be irresponsible not to also search for existing malware when examining a file. Many new samples are built on the base of known viruses and are therefore caught by similar detection techniques. This study and its application in a virtual environment therefore serve as a solid base for implementing both of these checks. The main reason for using the virtual environment, however, is that it ensures the safety of the user's environment while the file is viewed in isolation as it carries out its executable functions. This serves as an alternative to other malware detection methods that rely more on prediction and guesswork, which, while useful, tend to follow existing trends.

The objective of this paper is to present the results of a virtual sandbox that integrates both static and dynamic analysis, implemented through Docker. A variety of its results are shown, along with how the various tools of both types of analysis work together to perform a comprehensive check on a suspicious executable file.

II. LITERATURE SURVEY

As the work and ingenuity of hackers and their malware increase, it is not sufficient to check suspicious files only against existing checksums. Most malware today works under the radar: it accesses websites that host malware, or it duplicates itself and creates further malware after bypassing the checks performed by static analysis. At the same time, most of these checks need to be performed in a risk-free environment, and significant research has therefore been done in these domains.

Du et al. [1] addressed improper and inaccurate evaluation processes, which dilute the importance of accurate security metrics, by suggesting five malware detection metrics that can be estimated statistically in the absence of ground truth. These metrics were then tested with synthetic data and a large dataset collected from VirusTotal. The study characterized the adjustments needed to such metrics to improve their statistical accuracy.

Rezaei et al. [2] proposed a new way of detecting malware based on the PE file structure that uses only nine features taken from the PE header. Their proposed machine learning models identified malware programs with an accuracy of 95.59%. The feature vectors of malware and benign files also differed significantly, which shows how the approach can be applied to real-time malware detection systems. Conversely, Raff et al. [3] worked
on forms of malware detection that need minimal domain knowledge, such as byte n-grams and strings, and on the feasibility of applying neural networks to feature learning and malware detection. By restricting themselves to a small part of the PE header, the authors showed that neural networks can learn from raw bytes without explicit feature construction. These networks also perform better than approaches that parse the PE header into explicit features.

Carpenter et al. [4] set out to improve the current state of anti-malware systems through a new dataset for host-based intrusion detection systems. The dataset provides API call sequences for multiple malware samples executed in virtual machines. Each sample was tagged with a long sequence of API calls. For deployment on resource-constrained devices, three feature methods were tested such that one sequence of APIs can be tagged with multiple labels.

Raff et al. [5] introduced a malware detection approach that learns from raw byte sequences through a neural network. Such a network mainly has to deal with an extremely large number of time steps and serious issues with batch normalization. The authors addressed these issues with an architecture whose cost depends linearly on the sequence length and which allows interpretable sub-regions of the binary to be identified. This, in turn, allowed the neural network to process such unusual data.

Croce et al. [6] proposed Sparse-RS, a versatile framework based on random search for score-based sparse targeted and untargeted attacks in the black-box setting. It achieves high query efficiency for multiple sparse attack models: l0-bounded perturbations, adversarial patches, and adversarial frames. Its l0 version outperforms all black-box and even all white-box attacks for different models on MNIST, CIFAR-10, and ImageNet. Its untargeted version achieves very high success rates even for the challenging settings of 20 x 20 adversarial patches and 2-pixel-wide adversarial frames for 224 x 224 images.

Vasilescu et al. [7] proposed a distributed firewall solution named Distfw and its integration with a sandbox for malware analysis and detection, showing both the effectiveness and the shortcomings of such a solution. The authors used Cuckoo to perform automated analysis of malware samples and compared the results with those of manual analysis. Similarly, Balazs [8] created a sandbox tool that collects extensive information from the execution of a file in a sandbox to build statistics on how the file works. Even when the file's actions are not flagged as malicious, the author explains how such information can help identify suspicious behaviour.

Miramirkhani et al. [9] proposed a new analysis environment in which well-known properties of such environments are replaced by realistic values and instrumentation artifacts remain hidden. For virtual machines that host files under analysis, this includes scrubbing VM-revealing indicators while emulating real-world execution. The authors identified a new class of sandbox evasion techniques based on the wear-and-tear artifacts produced by the normal use of real systems. Using this type of information, malware can detect that it is being tested in an artificial environment. The authors therefore created statistical models that can help sandbox operators create system snapshots that exhibit realistic system behaviour.

Bermejo Higuera et al. [10] presented a systematic workflow for the methodological analysis of a particular malware sample, in particular ransomware, to gain sufficient knowledge of it. The authors suggested a four-step implementation to enhance analyst effectiveness in dealing with the malware in a virtual environment after initial classification, followed by a feedback loop for regular checks.

Sikdar et al. [11] developed a model of malware protection based on game theory and sandbox theory to characterize and compute the optimal defence tactics for malware detection. The authors model the interaction between malware and anti-malware as a two-player game in which one player builds a sandbox environment and the other tries to escape detection or attack it. This game-theoretic model is then used to create a QCQP-based (quadratically constrained quadratic programming) framework that computes the optimal strategy for the sandbox-creating player.

Andrade et al. [12] proposed an architecture that identifies the behaviour of malware in a sandbox environment using Cuckoo, whose output was taken as a report. This report was converted into a CSV file, on which feature engineering was performed to build a machine learning model that classified the final results. Most of the features were based on comparing the operating system state before and after execution of the file. The remaining attributes were derived from monitoring runtime actions.

Karagiannis et al. [13] showcased the deployment of appropriate digital environments for vulnerable web services. They configured these environments using containerization techniques and solutions that reduce the overhead caused by replicating existing systems. These containerization techniques provide scalable solutions and require less effort and fewer system resources for detecting the activity of suspicious malware. On a similar note, Lee et al. [14] studied Docker communication between multiple containers, which eases the configuration of various terminal services and tools by logically connecting them to each other. However, these logical networks are also vulnerable to attacks such as DDoS, cryptocurrency-related attacks, and ARP spoofing. The authors therefore examined these attacks in a Docker container environment in order to prevent lateral movement attacks between different Docker containers.

III. THEORY

Malware analysis involves various methods and modules that need to be executed. For the purpose of this paper, the authors have used various analysis modules that fall under two divisions: static analysis and dynamic analysis.

Static analysis mainly deals with examining a potentially malicious file without executing it. The static code analyzer checks the content of the file against predefined rules or existing malware file hashes. To implement static analysis, the following modules have been added to the sandbox:

1) Basic Image analysis: Basic Image analysis deals with identifying various properties of a file. These include the file size, last modified date, creation date, and file architecture (one of x86, x64, or amd64). These properties can later be used to classify the file as potentially malicious or not.

2) PE Analysis: Portable Executable files are a standard file format for executables, object code, and DLLs (dynamic
link libraries) used in the Windows operating system. During PE Analysis, the sandbox extracts various parameters of the file. These include the image base; the entry point address of the file; the RVA (Relative Virtual Address), which is the address of an item after it is loaded into memory, relative to the image base address; the number of sections in the PE file; all the sections; and finally, the DOS header of the file. If a non-PE file is uploaded to the sandbox, the file is checked for a DOS header; if the header is absent, PE analysis for that file is skipped.

the sandbox to provide logs and a report on all the network activities while the uploaded file is being executed.

3) TShark: TShark is a terminal-based version of Wireshark, a packet analyzer program. This tool is used to log the DNS summary, the TCP dump, and HTTP requests. Analyzing all network interactions during program execution provides clear insight into any malicious behavior exhibited by the file.
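As an illustration only, the TShark module could be driven from Python roughly as follows. This is a minimal sketch, not the authors' implementation. The function names (capture_command, report_commands, parse_fields, run), the network interface and the chosen display filters are assumptions. The sketch makes a timed capture with `-a duration:N` and then derives the three logs named above from the saved capture file: outgoing DNS queries, a TCP conversation table (`-z conv,tcp`), and HTTP requests.

```python
import subprocess
from typing import Dict, List


def capture_command(interface: str, seconds: int, pcap_path: str) -> List[str]:
    """Capture on `interface`, stop automatically after `seconds`, write a pcap file."""
    return ["tshark", "-i", interface, "-a", f"duration:{seconds}", "-w", pcap_path]


def report_commands(pcap_path: str) -> Dict[str, List[str]]:
    """Hypothetical helper: commands that turn a saved capture into DNS, TCP and HTTP logs."""
    read = ["tshark", "-r", pcap_path]
    return {
        # Outgoing DNS queries only (responses have dns.flags.response == 1).
        "dns": read + ["-Y", "dns.flags.response == 0",
                       "-T", "fields", "-e", "dns.qry.name"],
        # TCP conversation table; -q suppresses the per-packet lines.
        "tcp": read + ["-q", "-z", "conv,tcp"],
        # One tab-separated line per HTTP request: host, method, URI.
        "http": read + ["-Y", "http.request", "-T", "fields",
                        "-e", "http.host", "-e", "http.request.method",
                        "-e", "http.request.uri"],
    }


def parse_fields(output: str) -> List[List[str]]:
    """Split `-T fields` output (tab-separated by default) into rows, skipping blank lines."""
    return [line.split("\t") for line in output.splitlines() if line.strip()]


def run(cmd: List[str]) -> str:
    """Run one tshark command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

In a Docker-based setup, the capture would be started on the container's interface (for example, a placeholder such as "eth0") before the sample is executed. Each entry of `report_commands(...)` would be passed to `run` once the capture file is closed. Keeping capture and reporting separate means the raw pcap is preserved for later manual inspection.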
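The header fields listed for the PE Analysis module can be read with nothing more than Python's struct module. The sketch below is hypothetical: analyze_pe and PEInfo are illustrative names, and a library such as pefile could be used instead. It follows the documented PE layout: the "MZ" DOS header, the e_lfanew pointer at offset 0x3C, the "PE\0\0" signature, the COFF file header, and the optional header. The ImageBase field in the optional header is 4 bytes in PE32 and 8 bytes in PE32+. The sketch also applies the skip rule described above by returning None when no DOS or PE header is found.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional

# Subset of COFF "Machine" values (an assumed mapping to the labels used above).
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64 (AMD64)", 0xAA64: "ARM64"}


@dataclass
class PEInfo:
    architecture: str
    image_base: int
    entry_point_rva: int
    number_of_sections: int
    section_names: List[str] = field(default_factory=list)


def analyze_pe(data: bytes) -> Optional[PEInfo]:
    """Return basic PE header fields, or None when PE analysis should be skipped."""
    # DOS header: "MZ" magic, with e_lfanew (offset of the PE header) at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # "PE\0\0" signature followed by the 20-byte COFF file header.
    if len(data) < e_lfanew + 24 or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 24  # start of the optional header
    if len(data) < opt + 32:
        return None
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)  # AddressOfEntryPoint (an RVA)
    if magic == 0x20B:    # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    elif magic == 0x10B:  # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    else:
        return None
    # Section table: 40-byte entries right after the optional header; name = first 8 bytes.
    names = []
    table = opt + opt_size
    for i in range(n_sections):
        start = table + 40 * i
        if start + 40 > len(data):
            break  # truncated file: stop instead of reading past the end
        names.append(data[start:start + 8].rstrip(b"\0").decode("ascii", "replace"))
    return PEInfo(MACHINE_NAMES.get(machine, hex(machine)), image_base,
                  entry_rva, n_sections, names)
```

The architecture string comes from the COFF Machine field (0x14C for x86, 0x8664 for x64/AMD64). That field can therefore also supply the file-architecture property used by the Basic Image analysis module.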
V. RESULTS
TShark, and Strace, suggests that the focus is on dynamic analysis of malware behavior in a controlled environment. By monitoring network traffic and system changes caused by malware, the sandbox can identify malicious behavior that may not be apparent through static analysis alone. This allows for a more accurate understanding of the malware's behavior and potential impact.

In addition to dynamic analysis, the inclusion of static analysis techniques such as VirusTotal, PE file analysis, and basic image analysis indicates a holistic approach to malware analysis. These techniques can help identify potential indicators of malware, such as known signatures or suspicious code.

Overall, this malware analysis sandbox is a valuable tool for malware analysts and cybersecurity professionals seeking to detect and prevent cyber attacks. Its combination of static and dynamic analysis techniques allows for a more thorough analysis of malware behavior, enabling organizations to better protect themselves against potential threats.

VII. FUTURE SCOPE

This paper presents an integrated malware analysis sandbox that paves the way for future research and enhancements. Some potential future research areas include:

1) Enhanced Static Analysis: The static analysis module can be enhanced by incorporating advanced techniques such as classification algorithms based on machine learning. This would allow the sandbox to identify and classify malware more accurately based on its static properties and known behavioural patterns.

2) Behavioral Analysis: While the dynamic analysis module captures and analyses network and memory behaviour, additional research can enhance its behavioural analysis capabilities. This may involve developing machine learning models to detect and categorise malicious behaviour demonstrated by malware samples during execution.

3) Advanced Sandbox Evasion Detection: Malware authors are constantly developing new methods to evade detection within sandboxes. Future research can concentrate on developing sophisticated evasion detection mechanisms to identify and counter such techniques, thereby ensuring that the sandbox remains effective against advanced malware.

4) Scalability and Performance Optimisation: As the number of malware samples continues to grow, ensuring scalability and optimising the sandbox's performance become increasingly important. Future research can concentrate on developing techniques to efficiently manage large-scale malware analysis, such as parallel processing and distributed sandbox architectures.

REFERENCES

[1] P. Du, Z. Sun, H. Chen, J.-H. Cho, and S. Xu, "Statistical Estimation of Malware Detection Metrics in the Absence of Ground Truth," IEEE Transactions on Information Forensics and Security, 2018, doi: 10.1109/TIFS.2018.2833292.
[2] T. Rezaei and A. Hamze, "An Efficient Approach For Malware Detection Using PE Header Specifications," in 2020 6th International Conference on Web Research (ICWR), Tehran, Iran, 2020, pp. 234-239, doi: 10.1109/ICWR49608.2020.9122312.
[3] E. Raff, J. Sylvester, and C. Nicholas, "Learning the PE Header, Malware Detection with Minimal Domain Knowledge," 2017.
[4] M. Carpenter and C. Luo, "Behavioural Reports of Multi-Stage Malware," arXiv preprint arXiv:2301.12800, 2023.
[5] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. Nicholas, "Malware Detection by Eating a Whole EXE," 2017.
[6] F. Croce, M. Andriushchenko, N. Singh, N. Flammarion, and M. Hein, "Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks," 2020.
[7] M. Vasilescu, L. Gheorghe, and N. Tapus, "Practical malware analysis based on sandboxing," in 2014 RoEduNet Conference 13th Edition: Networking in Education and Research Joint Event RENAM 8th Conference, Chisinau, Moldova, 2014, pp. 1-6, doi: 10.1109/RoEduNet-RENAM.2014.6955304.
[8] Z. Balazs, "Malware Analysis Sandbox Testing Methodology," Le Journal de la Cybercriminalité & des Investigations Numériques, vol. 1, 2016, doi: 10.18464/cybin.v1i1.3.
[9] N. Miramirkhani, M. P. Appini, N. Nikiforakis, and M. Polychronakis, "Spotless Sandboxes: Evading Malware Analysis Systems Using Wear-and-Tear Artifacts," in 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 2017, pp. 1009-1024, doi: 10.1109/SP.2017.42.
[10] J. Bermejo Higuera, C. Abad Aramburu, J.-R. Bermejo Higuera, M. A. Sicilia Urban, and J. A. Sicilia Montalvo, "Systematic Approach to Malware Analysis (SAMA)," Applied Sciences, vol. 10, no. 4, p. 1360, 2020, doi: 10.3390/app10041360.
[11] S. Sikdar, S. Ruan, Q. Han, P. Pitimanaaree, J. Blackthorne, B. Yener, and L. Xia, "Anti-Malware Sandbox Games," arXiv preprint arXiv:2202.13520, 2022.
[12] C. A. B. d. Andrade, C. G. d. Mello, and J. C. Duarte, "Malware Automatic Analysis," in 2013 BRICS Congress on Computational Intelligence and 11th Brazilian Congress on Computational Intelligence, Ipojuca, Brazil, 2013, pp. 681-686, doi: 10.1109/BRICS-CCI-CBIC.2013.119.
[13] S. Karagiannis, E. Magkos, C. Ntantogian, and L. Ribeiro, "Sandboxing the Cyberspace for Cybersecurity Education and Learning," 2020, doi: 10.1007/978-3-030-66504-3_11.
[14] H. Lee, S. Kwon, and J.-H. Lee, "Experimental Analysis of Security Attacks for Docker Container Communications," Electronics, vol. 12, no. 4, p. 940, 2023, doi: 10.3390/electronics12040940.