
Mobile-Sandbox:

Having a Deeper Look into Android Applications

Michael Spreitzenbarth, Florian Echtler, Felix Freiling
Friedrich-Alexander-University
Erlangen, Germany
michael.spreitzenbarth, florian.echtler, felix.freiling@cs.fau.de

Thomas Schreck
Siemens CERT
Munich, Germany
t.schreck@siemens.com

Johannes Hoffmann
Ruhr-University Bochum
Bochum, Germany
johannes.hoffmann@rub.de

ABSTRACT

Smartphones in general and Android in particular are increasingly shifting into the focus of cybercriminals. To understand the threat to security and privacy, it is important for security researchers to analyze malicious software written for these systems. The exploding number of Android malware samples calls for automation in the analysis. In this paper, we present Mobile-Sandbox, a system designed to automatically analyze Android applications in two novel ways: (1) it combines static and dynamic analysis, i.e., results of static analysis are used to guide dynamic analysis and extend coverage of executed code, and (2) it uses specific techniques to log calls to native (i.e., "non-Java") APIs. We evaluated the system on more than 36,000 applications from Asian third-party mobile markets and found that 24% of all applications actually use native calls in their code.

Categories and Subject Descriptors
D.4.6 [Operating Systems]: Security and Protection

Keywords
Android, Malware, Application analysis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC'13 March 18-22, 2013, Coimbra, Portugal.
Copyright 2013 ACM 978-1-4503-1656-9/13/03 ...$10.00.

1. INTRODUCTION

1.1 Android (Malware) on the Rise

In recent years smartphone sales have increased tremendously. This explosive growth has drawn the attention of criminals who try to entice users to install malicious software on their devices. Google's smartphone platform Android is the most popular operating system and recently overtook Symbian- and iOS-based installations. Most probably, this growth stems from the openness of the platform, which allows a user to install arbitrary software.

But attackers are misusing this openness to spread malicious applications through common Android application markets. In previous work [22] we analyzed about 6,100 malicious applications and clustered them into 53 malware families with the help of the VirusTotal API [15]. Nearly 57% of the analyzed malware families tried to steal personal information from the smartphone, such as address book entries, the IMEI or GPS coordinates. Additionally, about 45% of the families sent SMS messages. Most commonly, these messages were sent to premium-rate numbers to make money immediately. The last main feature, implemented in nearly 20% of the malware families, is the ability to connect to a remote server in order to receive and execute commands. Another detailed and readable overview of these malware families is provided by Zhou et al. [27].

1.2 The Need for Automated Analysis

Given the enormous growth of Android malware, security researchers and vendors must analyze more and more applications (apps) in a given period of time to understand the purpose of the software and to develop countermeasures. Until recently, analysis was done manually using tools like decompilers and debuggers. This process is very time consuming and error-prone, depending on the skill set of the analyst. Therefore, tools for automatic analysis of apps were developed.

The classical approach to automated analysis is static analysis. Static analysis investigates software properties that can be determined by inspecting the downloaded app and its source code only. Signature-based detection of apps, the common approach of anti-virus technologies, is an example of static analysis. In practice, malware uses obfuscation techniques to make static analysis harder. A particular form of obfuscation used by Android apps is to hide system activities by calling functions outside the Dalvik/Java runtime library, i.e., in native libraries written in C/C++ or other programming languages.

In contrast to static analysis, dynamic analysis does not inspect the source code but rather executes it within a controlled environment, often called a sandbox. By monitoring and logging every relevant operation of the execution, a report is automatically generated for each analysis. Dynamic analysis can combat obfuscation techniques rather well but can be circumvented by runtime detection methods. Therefore, it usually makes sense to combine static and dynamic analysis, which can be done in many different ways.
1.3 Existing Android Analysis Systems

Similar to the development in the desktop PC world, the early systems for analysis of Android apps used a static approach. A typical system of this kind was proposed by Schmidt et al. [20]. They extract the function calls from an Android application (using the readelf utility) and compare the resulting list with the data of known malware. Another example of the static approach is Androguard by Desnos et al. [5, 6], which decompiles the application and applies signature-based malware detection. This system is completely open source.

In response to static analysis systems in the desktop PC world, malware authors developed various obfuscation techniques that have proven their effectiveness against static analysis [17, 24]. This is also an emerging trend in Android applications, and it is clear that static analysis alone cannot ensure complete analysis coverage anymore. Therefore, researchers have begun to develop systems for dynamic analysis of Android apps.

One of the first such systems is TaintDroid by Enck et al. [7]. It is an efficient dynamic taint tracking system which provides real-time analysis by leveraging Android's execution environment. This system was complemented with a fully automated user emulation and reporting system by Lantz [16] and is available under the name DroidBox. Both TaintDroid and DroidBox are open source and publicly available. DroidBox is an effective tool to analyze Android apps; however, it lacks support for tracking native API calls. In fact, we are unaware of any system that supports native API call tracking during dynamic analysis to date.

Another interesting system using dynamic analysis is pBMDS by Xie et al. [26]. It uses machine learning to create user and system profiles for a specified behavior. Afterwards, it tries to correlate user inputs with system calls by comparing their behavior profiles to detect anomalous application activities. This system was built for Symbian OS and tested with a very small data set. Crowdroid by Burguera [3] uses a similar approach but with a much wider set of behavior data and with a more advanced monitoring system. Crowdroid uses strace, a debugging utility for Linux and some other Unix-like systems, to monitor every system call and the signals it receives. Crowdroid, however, does not consider information from Android's Dalvik VM. The system AASandbox of Bläsing et al. [2] was the first system combining static and dynamic analysis, in a very basic way, for the Android platform. Unfortunately, AASandbox does not seem to be maintained anymore. Another system combining static and dynamic analysis is DroidRanger by Zhou et al. [28]. DroidRanger implements a combination of permission-based behavioral footprinting to detect samples of already known malware families and a heuristic-based filtering scheme to detect unknown malicious families. With this approach they were able to detect 32 malicious samples inside the official Android Market in June 2011. Within their dynamic part they use a kernel module to log only system calls used by known Android exploits or malware.

The system which is most similar to our approach is Andrubis from the Vienna University of Technology [18]. In their approach they also use DroidBox and TaintDroid for automated analysis, but they are limited to applications beneath API level 8 (Android 2.2). In contrast, we are able to analyze applications beneath API level 11. This difference can be very important when you compare the market share of API level 7 and below (17 percent) to the share of API level 10 and below (75 percent). Additionally, they are not able to track native code.

1.4 Contribution: Mobile-Sandbox

Overall, there are only few analysis systems that combine static and dynamic analysis and none that dynamically monitor both actions within the Dalvik VM and outside it in native libraries. Moreover, many of these systems are not readily available for research or are not maintained anymore. In this paper, we seek to fill this gap by introducing Mobile-Sandbox, a system that

1. uses a novel combination of static and dynamic analysis techniques,
2. can track native API calls, and
3. is easily accessible for everyone through a web interface [4].

Within the static analysis part we analyze the application with various modules to get an overview of the application. First, we perform several anti-virus scans using the VirusTotal service [15]; second, we parse the manifest file; and finally, we decompile the application to better identify suspicious code.

Within the dynamic analysis, we execute the application in an emulator and log every operation of the application, i.e., we log both the actions executed in the Dalvik Virtual Machine and actions executed in native libraries which may be bundled with the application. To the best of our knowledge, Mobile-Sandbox is the first analysis framework for the Android platform which has this capability.

For evaluating our system we collected over 136,000 freely available apps from the most important Asian markets and the Google Play-Market. We also collected about 7,500 malicious samples from different malware families. We then used Mobile-Sandbox to automatically analyze 40,000 randomly chosen apps from both sample sets. Within these 40,000 samples our system detected 4,641 malicious applications and additionally 5 suspicious samples which try to hide their malicious action inside native code. This insight clearly indicates that current analysis systems are overlooking important potential threats.

1.5 Roadmap

The remainder of this paper is organized as follows: Section 2 characterizes the current threat landscape for mobile devices, especially for malware on the Android platform, and gives some background on the Android platform. In Section 3 we illustrate our framework and explain the main ideas behind our static and dynamic analysis. In Section 4 we present the results of our evaluation. We conclude in Section 5.

2. BACKGROUND

2.1 The Android Threat Landscape

Mobile threats can be categorized into two classes: web-based and application-based threats. Web-based threats on mobile devices are a growing attack vector used by criminals. These threats rely on the enormous usage of mobile browsers and their feature-rich implementations. Modern
web browsers support features like embedded video players or support for video calls. Due to the nature of these features, e.g., parsing huge amounts of external data, the likelihood of exploitable vulnerabilities is high. Attackers are able to trick the user into following a web link, sent to them via email or social media, and infect the smartphone by exploiting a browser vulnerability.

The other type of mobile threats are application-based threats posed by third-party applications in the mobile markets. To install applications on smartphones, the vendors created so-called mobile markets like Apple's "App Store" and Google's "Google Play". On iOS-based devices, software can be obtained from the App Store only. Furthermore, Apple evaluates every piece of software uploaded to the App Store and only adds the app if it passes certain (unknown) security checks. On Android devices the end user is also allowed to install apps from third-party markets. Especially in Asia, a lot of these markets have emerged. Typically, these third-party markets pose a high risk of installing malicious applications because the market owners do not evaluate the applications.

According to Felt et al. [8], mobile applications pose the following three types of threats: Malware, Personal Spyware and Grayware. This illustrates that the threat landscape for Android is real and relevant.

2.2 Android System Basics

In this section we briefly introduce Android and the parts of it that are relevant for this paper. For a thorough introduction we refer to Six [21].

Android is based on Linux and therefore consists of the same core components as usual Linux distributions do. The core components are a (patched) Linux kernel, the Bionic libc and libraries like WebKit, SQLite and OpenGL. The Android runtime environment consists of core libraries which provide most of the functionality of the core Java libraries. It additionally consists of the Dalvik Virtual Machine, which is responsible for running Android applications in the operating system. Applications are written in the Java language and each application is executed in its own Dalvik VM. This VM runs Dalvik dex code which is translated from Java bytecode. Dex code is an optimized bytecode suited for mobile devices; the biggest difference is that dex code is register based instead of stack based, as is "traditional" Java bytecode.

One feature of the Dalvik VM that is relevant for this paper is that applications written in the Java language can additionally access native libraries through the Native Development Kit (NDK), which makes use of the Java Native Interface (JNI). Developers may move performance-critical operations to native libraries (shared objects in the ELF format) which are then directly called from running dex code. The native code contained in such libraries runs outside the Dalvik VM, directly on the processor of the smartphone or emulator.

3. MOBILE-SANDBOX: ARCHITECTURE AND IMPLEMENTATION

In order to determine whether an app is malicious or not, it needs to be analyzed with great effort. Its attributes as well as its range of functions need to be documented. Within this section we describe the process of our automated analysis system. The analysis process has been divided into two parts. First, we discuss the static analysis in Section 3.1. The results of the static analysis are used to guide the subsequent dynamic analysis, which is described in Section 3.2. The dynamic analysis automatically executes the apps on a modified Android system with the help of the Android emulator.

3.1 Static analysis

Our static analysis consists of several modules. In order to gain a first impression of the application to be analyzed, its hash value is matched against the VirusTotal database. The received detection rate is stored for the report; however, it does not play a vital role in further processing. The result delivered by Kaspersky (one part of the VirusTotal scanning engine) is used for the classification into existing malware families.

Afterwards, the application is extracted in order to get access to its components; these are required for further analysis. As a following step, we analyze the Android manifest to get a listing of all required permissions. For this purpose we use the tool aapt delivered with the Android SDK [11]. While parsing the manifest we also filter the intents as well as the services and receivers for further analysis. We also read out the SDK version; this is another important detail to assure that only applications compatible with the provided Android system are passed on to dynamic analysis.

Now, the Dalvik bytecode that is stored in the classes.dex file is converted to smali [10]. We can determine and filter the embedded advertising networks from the resulting files and their directory structure so as not to dilute the results of the analysis. Afterwards, the complete smali code is searched for potentially dangerous functions and methods. Here, we look for calls that are found frequently and in particular within malware. This includes, for example:

• sendTextMessage(): This call is responsible for the sending of SMS messages
• getPackageInfo(): With the help of this method, malware often searches for installed AV products
• getSimCountryIso(): This call is used to find out in which country the user currently resides. This is important in case of malware to contact the right premium services
• Ljava/lang/Runtime;->exec(): Executes the specified command in a separate process. In case of malware, the commands are often 'su' or 'chmod'

Moreover, we look for calls to the available encryption libraries. With this step we try to gain deeper knowledge of the use of encryption and obfuscation within the applications. During the code review we try to recognize the functions and methods that normally need a permission for their error-free execution. For this purpose we refer to the data of Felt et al. [9], which we translated from Java to smali. With the help of the gathered list we can now determine whether the app is over- or underprivileged. As a further step, the code is searched for statically coded URLs with the help of a regular expression.

We filter all implemented timers and broadcasts the app is waiting for, as a preparation for the dynamic analysis. Timers and broadcasts are event triggers for certain code to
be executed in Android. We analyze these mechanisms to either trigger the corresponding events or wait for a specified time period in the ongoing dynamic analysis to improve the coverage of executed code. By this we assure, for example, that the analysis is not stopped before a timer has expired. This is a common problem in Windows-based dynamic analysis [25].

At the end of the static analysis an XML file is created, containing all data generated during the previous steps.

3.2 Dynamic analysis

While certain types of malicious behavior can already be recognized through static analysis, many kinds of malware can only be reliably detected by looking at their runtime behavior.

3.2.1 Building Blocks

To perform such a dynamic analysis of the app in question, we rely on the Android emulator provided by Google [1]. This software simulates a full ARMv7 device with key peripherals such as GSM module and touchscreen, thereby allowing unknown apps to be run in a safe environment on a host computer. As an added benefit, the emulator can be reset to its previous state after an app has been tested by simply replacing the system image files with the original ones. Figure 1 gives an overview of the integration of the emulator into the entire Mobile-Sandbox framework.

Figure 1: Dynamic Analyzer Component Overview. [Diagram not reproduced. Components shown: Host Computer, control script, Android Emulator, MonkeyRunner, Dalvik VM instance with DroidBox patch (Java code, native code), ltrace. Outputs: PCAP net log, DroidBox logfile, ltrace logfile. Legend: data flow, control flow.]

Since the "stock" emulator offers only limited logging capabilities, we have chosen the well-known TaintDroid/DroidBox [7, 16] system as the basis for our dynamic analyzer. TaintDroid focuses on providing real-time privacy information to a user on a private device, while DroidBox builds on this work by logging all data accessed by the app to the system log, thereby creating a comprehensive picture of the app's runtime activities. This includes data read from and written to files, data sent and received over the network, SMS messages sent, and more.

While TaintDroid supports Android up to version 2.3.4, thereby covering at least 75% of the device market as of Oct. 2012, DroidBox only supports Android up to version 2.1, which can be considered severely outdated. Therefore, we updated the DroidBox patchset to work with TaintDroid 2.3.4, including some additional enhancements such as support for UDP traffic logging.

3.2.2 Tracking Native Code

However, this setup still has a "blind spot": since both TaintDroid and DroidBox are built on the Dalvik virtual machine used by Android to execute dex bytecode, only dex bytecode (in general translated from Java bytecode) can be traced by those tools. Native code executed using the JNI will not be visible. Since the introduction of the Android NDK, it is easily possible to call native code in external libraries. Although many higher-level functions of Android are available only in Java, some lower-level functions such as socket(), connect(), read() or write() can be easily called from the standard C library and could be used by malware for communication purposes. While the app would still require the corresponding permissions such as android.permission.INTERNET to be allowed to open network sockets, a purely Java-based call tracer would not be able to detect any communication conducted using native calls.

Consequently, the dynamic analyzer should have the ability to trace code included in native shared objects, both those included with the app through the NDK and those shipped with Android. For this purpose, we have included a modified version of the ltrace [23] command, a common Linux debugging utility that intercepts library calls of a monitored application. After the app to be examined has been started, an ltrace instance is launched and attached to the Dalvik process running the app in question. All native calls made into dynamically loaded shared objects are then logged to a separate file. To reduce the amount of log data and increase performance, internal functions commonly introduced by the NDK compiler, for example _Unwind_Backtrace, are excluded from logging.

3.2.3 Network Traffic

A third logging component, which is already supported natively by the emulator, is the capturing of network traffic to a PCAP file. This common format can later be analyzed using tools such as Wireshark. In summary, our setup produces three separate log files detailing the app's behavior (see also Figure 1):

• DroidBox logfile containing important Java method calls and data from the Dalvik VM;
• ltrace logfile containing native method calls in shared objects using JNI;
• network PCAP file containing all data sent over the simulated 3G network.
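As a rough illustration of how the ltrace logfile listed above might be post-processed, the following Python sketch counts native calls per function, skips internal functions such as _Unwind_Backtrace, and picks out the low-level calls named in Section 3.2.2. The assumed line format (an optional "[pid N]" prefix followed by "name(args) = result"), the exclusion prefix list, and all function names in the sketch are assumptions made for illustration; the paper does not specify the log format of its modified ltrace.

```python
import re
from collections import Counter

# Assumed line shape: optional "[pid N] " prefix, then "name(args) = result".
CALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(?P<name>[A-Za-z_]\w*)\(")

# Hypothetical exclusion list; the paper names only _Unwind_Backtrace as an example.
EXCLUDED_PREFIXES = ("_Unwind_",)

# Low-level libc calls that Section 3.2.2 mentions as usable for communication.
NETWORK_CALLS = {"socket", "connect", "read", "write"}


def summarize_native_calls(lines):
    """Count logged native calls per function name, skipping excluded internals."""
    counts = Counter()
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match is None:
            continue  # not a call line (e.g. a signal or exit marker)
        name = match.group("name")
        if name.startswith(EXCLUDED_PREFIXES):
            continue
        counts[name] += 1
    return counts


def network_related(counts):
    """Return only the entries that belong to NETWORK_CALLS."""
    return {name: n for name, n in counts.items() if name in NETWORK_CALLS}
```

Applied to the lines of a log file, `network_related(summarize_native_calls(lines))` gives a quick indication of whether an app communicates through native code that a purely Java-based tracer would miss.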
3.2.4 User Interaction

Another issue which has to be considered for dynamic analysis is that of code coverage. For most types of malware, simply launching the app will not trigger any malicious payload; it is necessary for the user to interact with the app and perhaps even confirm some malicious actions.

In order to test a significant fraction of the code paths present in the examined app, we use the MonkeyRunner toolkit provided by the Android SDK. Using this utility, it is possible to automatically send simulated interaction events, such as touchscreen contacts or key presses, to the tested app. Since MonkeyRunner does not take any UI elements into account, but rather produces random events, a sufficient number of events should be generated to make sure that most interaction elements have been triggered at least once.

In addition to MonkeyRunner, we also use functionality built into the emulator to simulate external events such as incoming phone calls or SMS messages. These events are sometimes also used by malware to trigger malicious behavior.

3.2.5 Summary

In summary, the following steps are executed in order to analyze the runtime behavior of an app:

1. Reset emulator to the initial state.
2. Launch emulator and wait until startup is completed.
3. Install app to be analyzed using adb.
4. Launch app in a new Dalvik VM.
5. Attach ltrace to the VM process running the app.
6. Launch MonkeyRunner to generate simulated UI events.
7. Simulate additional user events like phone calls.
8. Launch a second run of MonkeyRunner.
9. Collect the Dalvik and ltrace log and the PCAP file.

The resulting log files of this process are inserted into our database. This database can be used to perform multiple other analyses.

3.3 Examples

To demonstrate the range of functions of Mobile-Sandbox and the format of the log files, we now show some excerpts from the log files of applications we analyzed. We start with some information from our dynamic analysis which can be very useful when looking at encrypted data inside the application. In this case, the application has encrypted IMEI and IMSI numbers with the help of the DES algorithm before sending these data to a remote server. If you take a look at the network traffic, you would only see a packet with encrypted data, but looking at the results from Mobile-Sandbox you get the decrypted data and the DES key that was used for encryption (see Listing 1).

    Data -- 357242043237517|310005123456789
    Algorithm used -- DES
    Key -- 77,19,68,124,24,90,10,19,65,16,71,23

Listing 1: Decrypted IMEI and IMSI and used encryption key.

While executing the applications inside the emulator we monitored several outgoing SMS messages (see Listing 2 for two examples).

    Number: 84242 -- Message: QUIZ
    Number: 7132 -- Message: 844858

Listing 2: Two examples for outgoing SMS messages.

In addition we found several kinds of data leakage. Listing 3 displays the content of a file that was generated by the application.

    File: /mnt/sdcard/Tencent/v1.log
    Operation: write
    Data: DeviceInfo[imei=357242043237517,
          telNum=, phModel=generic,
          sysSdk=10, RELEASE=2.3.4]

Listing 3: Data leakage to a file located on the unprotected SDcard.

Within our network traffic analysis we found a lot of privacy-related data like IMSI, IMEI or smartphone model, which was uploaded to remote servers or web services. One example of such an information leakage via HTTP POST can be seen in Listing 4.

    <?xml version="1.0" encoding="utf-8"?>
    <request>
    <version>1.15</version>
    <platform>2</platform>
    <platformVersion>2.3</platformVersion>
    <IMEI>357242043237517</IMEI>
    <simID>89014103211118510720</simID>
    </request>

Listing 4: Information leakage found inside recorded network traffic.

4. EVALUATION

We now present an evaluation of our sandbox system. We analyzed the following aspects: correctness, performance, detectability and scalability. At the end of this section we also present a short case study of a malicious app using native calls.

4.1 Correctness

By correctness we mean that an entry appears in the Mobile-Sandbox log file if and only if the corresponding action was performed by the analyzed app. To check correctness, we chose 20 samples from a set of malicious applications that we collected from different sources. These samples represent different families of Android malware as shown in Table 1, meant to assure a wide coverage of malicious actions and of different stages in the evolution of malware. More specifically, we chose samples that hit the markets from mid 2010 until the beginning of 2012. The LeNa and RootSmart families use exploits and native calls; FakeInst and Adsms send premium SMS messages. The Moghava family acts only on the smartphone itself and modifies locally stored pictures, i.e., there is no malicious action observable that "leaves" the smartphone. The TapSnake
family sends location information from the smartphone to a remote server.

    Malware Family    Number of Samples    Primary Usage
    Adsms             2                    S
    BaseBrid          5                    I, B
    SerBG             2                    R, I, B
    RootSmart         1                    R, I, B, A
    LeNa              1                    R, I, B, A
    Moghava           1                    C
    FakeInst          7                    S
    TapSnake          1                    L

Table 1: Overview of Mobile Malware used for Evaluation (R = gains root access, C = compromises local storage, S = sends SMS messages, I = steals privacy related information, B = botnet characteristics, L = steals location data, A = installs additional apps).

Within this sample set, we consider RootSmart to be the most sophisticated malware sample, which is, amongst other capabilities, also able to exploit the Android OS, while TapSnake is the simplest sample when looking at the techniques used for malicious behavior.

We manually inspected samples from all these families and consulted all available analysis reports by anti-virus companies and from other sources on the Internet. The resulting action sequences yielded the ground truth to which we compared the behavior that was output by Mobile-Sandbox. Overall, Mobile-Sandbox only detected actions that were part of the ground truth. However, after initial analysis we failed to see certain behaviors that were described in the analysis reports on the Web. Later we realized that the missing behaviors were due to missing external stimuli, i.e., remote servers of a botnet not being active anymore. These insights gave us additional confidence that Mobile-Sandbox is working correctly.

4.2 Performance

The performance of some parts of our system is still rather weak. During the evaluation for this paper we measured runtimes between 9 and 14 minutes for the analysis of one single application. The system is running on an Ubuntu server with an Intel Xeon 2.4 GHz CPU and 48 GB of RAM. On average, Mobile-Sandbox finishes the virus check within 3 seconds and the subsequent static analysis within an additional 8 to 15 seconds. Afterwards the system needs about 2 minutes to reset and reboot a clean version of the emulator. After successfully booting the emulator it takes another 2 to 6 minutes to install the application. This step depends on the file size of the application. The execution of the application and the MonkeyRunner scripts lasts another 6 to 10 minutes, depending on the number of user events and timers we want to trigger. After shutting down the emulator the system needs an additional 10 seconds for the analysis of all log files and network traffic.

The performance can be enhanced tremendously by running multiple instances of the analysis framework simultaneously.

4.3 Detectability

As we know from malware targeting the Windows environment, there are mechanisms to detect virtualized or sandboxed environments in order to make the analysis of the malicious application more difficult or to act differently in these environments. Even if this behavior is not prevalent on Android at the moment, we think that this will change in the future. So, an important security aspect is the detectability of our analysis platform.

One mechanism to detect the Android emulator relies on the specific builds of the operating system that are used for it. An application querying this information can easily detect whether it is running inside an emulator or on a real device. Table 2 shows some system values that can be used for identification as they are sufficiently different from real smartphones. To prevent this detection mechanism a custom build of the Android system is required. In this build we changed the first five variables from Table 2 to the values of a real Samsung Galaxy S2. Unfortunately, modifying the last two values can cause system crashes while running the emulator because some Android system services rely on these values being set correctly. Another problem in hiding the emulator is the fact that Android launches the qemud and qemu-props daemons that offer emulation assistance to Qemu when running inside the emulator. Removing these two daemons is not feasible as they are needed to emulate the radio equipment.

    Build Information    Emulator     Galaxy S2
    Build.BOARD          unknown      smdk4210
    Build.DEVICE         generic      GT-I9100
    Build.MODEL          sdk          GT-I9100
    Build.PRODUCT        sdk          GT-I9100
    Build.TAGS           test-keys    release-keys
    ro.kernel.qemu       1            0
    ro.hardware          goldfish     smdk4210

Table 2: Differences in build information between the emulator and a Samsung Galaxy S2.

Another weak point is Qemu itself. As emulated hardware always behaves differently from native hardware, an application is able to detect whether it is running inside an emulator by observing the behavior of certain performance aspects of the CPU. Raffetseder et al. [19] show multiple ways to detect the x86 version of Qemu. Similar techniques could also be applied to the ARM architecture to detect the corresponding Qemu version.

Additionally, we changed the default IMSI and IMEI of the emulator (originally both "0") to random values that are consistent with regular IMEI and IMSI numbers. We made this modification to avoid emulator detection mechanisms that check for non-standard or empty values in these device identifiers. We have seen this detection technique employed in various samples, but we are not aware of any other detection mechanisms used in other samples yet.

4.4 Scalability

Our last evaluation criterion refers to the scalability of Mobile-Sandbox, i.e., the question whether it can be used in large-scale analysis projects with several hundreds or thousands of apps.

4.4.1 Malware in Third-Party Apps

To evaluate the scalability aspect, we collected about 80,000 apps between December 2011 and August 2012 from the most important Asian markets by downloading them using the Android emulator in an automated fashion. We call this set of apps the "Asian set". We also received about 7,500
Android malware samples from different families through anonymous uploads to our webservice [4] and through the VirusTotal Malware Intelligence Services (VTMIS) [14]. We call this set of samples the “malware set”.
We then used Mobile-Sandbox to automatically analyze 36,000 randomly chosen apps from the Asian set and 4,000 randomly chosen samples from the malware set. The analysis results were stored in a database on which we performed statistical analysis. These analyses were performed using a single installation of Mobile-Sandbox within a time span of 14 days.

Figure 2: Top 5 Detected Malware Families and Number of Corresponding Samples in the Union of Asian and Malware Set. (Bar chart; y-axis: Number of Samples per Family; x-axis: FakeInst, Opfake, KungFu, Plangton, BaseBrid.)

Besides showing that Mobile-Sandbox can scale to several thousand apps, the statistical analysis provides some very interesting results. Considering the union of the Asian set and the malware set, we found 4,641 malicious samples according to the VirusTotal API. Because the malware set consisted of only 4,000 samples, at least 641 samples from the Asian set must have been classified as malicious by VirusTotal. Of these 641 samples, 213 belong to FakeInst, 185 are infected with Opfake, 177 are infected with Plankton, 64 belong to KungFu and 2 belong to the Jifake malware family.
Taking a deeper look at the distribution of the malicious applications, we noticed that about 54 percent of the 4,641 samples belong to only four malware families (see Figure 2 for a comprehensive overview). These families are FakeInst, Opfake, KungFu and Plangton. Their main functionality is sending premium SMS messages; in the case of KungFu, information stealing is another main functionality.

4.4.2 Native Calls
Another point of interest is the use of native interface calls inside Android applications. According to our analysis, about 24 percent of the samples from the Asian set use native API calls. When one considers that the author of an app can potentially “hide” many malicious actions inside the native part of the application, and that common tools are not able to trace or log this part of the application, the share of one out of four shows why it is so important to develop a system which is able to log these events. Figure 3 also depicts the share of native calling apps in the malware set. Interestingly, this share is about 13 percent. This means that the existence of native calls does not necessarily imply a higher probability that the app is malicious.

Figure 3: Share of Samples Using JNI. (Bar chart; y-axis: Number of Samples; x-axis: Third-Party Market Apps, Malicious Apps; series: Samples in Sample Set, Samples using JNI.)

Among the samples from the Asian set that use native code, we found five samples that were hiding their malicious actions inside the native part of their code. When uploading these samples to VirusTotal we got a detection rate of 0%. This again makes clear how important it is to monitor and analyze native code.

4.4.3 Statistical Implications
In summary, we found 1.78% (641 out of 36,000) of the samples from the Asian set to be malicious. Recall that 5 out of these 641 were not detected by VirusTotal. But how representative are the results from our measurements? Since we have a rather large measurement base (number of measurements) and we have taken a random sample, we can apply quality assurance techniques from surveys performed in the area of social sciences by Groves [12, 13].
In general, there are two quality criteria for statistical value estimations. The first is the probability of error, i.e., the probability that a statement is not true. In empirical research, the probability of error is acceptable if it is below 5%. The second criterion is the margin of error, i.e., the range in percentage points by which the measured value may differ from the real value. Acceptable values are 5% or below. Measuring a value of 75% with 5% error probability and 5% error margin, for example, means that with 95% probability, the “real” value is between 70 and 80%.
Following Groves [12, 13], it is sufficient to analyze at least 664 samples to guarantee 1% error probability and 5% error margin. The measurements given above about the use of native code calls in the Asian set can therefore be generalized with high probability. The percentage of malware within the entire Asian set (1.78%), however, is so small that a 5% error margin is too large to generalize this value. To reach an error margin below 1% we would need to analyze almost 100,000 samples [12, 13], so we cannot generalize our findings in this aspect.
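The sample-size figure in Section 4.4.3 can be checked with a short calculation. The sketch below is an illustration only: it assumes the standard textbook formula for estimating a proportion, n = z² · p(1 − p) / e², with the worst case p = 0.5. The text does not state which formula from Groves [12, 13] was used. The function name `required_sample_size` is hypothetical. Under these assumptions the formula yields the 664 samples quoted for 1% error probability and 5% error margin. The sketch makes no attempt to reproduce the figure of almost 100,000 samples quoted for a 1% margin, which may rest on different assumptions.

```python
import math
from statistics import NormalDist


def required_sample_size(error_probability: float,
                         error_margin: float,
                         p: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion.

    Assumption (not taken from the paper): textbook formula
    n = z^2 * p * (1 - p) / e^2 for a large population, where z is the
    two-sided normal quantile for the given error probability.
    """
    z = NormalDist().inv_cdf(1.0 - error_probability / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / error_margin ** 2)


# Figures reported in Section 4.4.3.
malicious_in_asian_set = 641
analyzed_asian_samples = 36_000
share = malicious_in_asian_set / analyzed_asian_samples

print(required_sample_size(0.01, 0.05))  # 1% error probability, 5% margin
print(f"{share:.2%}")                    # malware share in the Asian set
```

With p = 0.5 the result is the most conservative one; a smaller assumed proportion would lower the required sample size for the same absolute margin.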
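The per-sample runtime from Section 4.2 and the 14-day campaign from Section 4.4.1 can be related with a back-of-the-envelope calculation. The sketch below is not part of the evaluation. It assumes that the measured 9 to 14 minutes per sample held for the whole campaign and that the single installation ran several emulator instances in parallel. The text does not state the number of instances, so the result is only an estimate of the implied parallelism. The names `instances_needed` and `sequential_days` are hypothetical.

```python
import math

# Figures reported in Sections 4.2 and 4.4.1.
SAMPLES = 36_000 + 4_000          # Asian set + malware set samples analyzed
CAMPAIGN_MINUTES = 14 * 24 * 60   # 14 days expressed in minutes
RUNTIME_MIN, RUNTIME_MAX = 9, 14  # measured minutes per sample


def sequential_days(minutes_per_sample: float) -> float:
    """Days needed if the samples were analyzed strictly one after another."""
    return SAMPLES * minutes_per_sample / (24 * 60)


def instances_needed(minutes_per_sample: float) -> int:
    """Parallel instances needed to finish within the campaign (assumed model)."""
    return math.ceil(SAMPLES * minutes_per_sample / CAMPAIGN_MINUTES)


for runtime in (RUNTIME_MIN, RUNTIME_MAX):
    print(runtime, round(sequential_days(runtime), 1), instances_needed(runtime))
```

Under these assumptions a strictly sequential run would take several hundred days, which is consistent with the remark in Section 4.2 that running multiple instances simultaneously improves performance.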
5. CONCLUSIONS
In this paper, we proposed Mobile-Sandbox, a static and dynamic analyzer for Android applications whose purpose is to support malware analysts in detecting malicious behavior. In the static analysis we parse the application’s Manifest file and decompile the application. In a further step we determine whether the application uses suspicious-looking permissions or intents. The second part of our sandbox performs the dynamic analysis, in which we execute the application in order to log all performed actions, including those stemming from native API calls.
There are still many ways to improve Mobile-Sandbox, especially regarding performance. To achieve better performance, and thus to analyze more samples in an appropriate time period, one idea is to parallelize the analysis process and to fix the crashes of MonkeyRunner. These improvements will probably also lead to a more reliable system. As additional future work, we want to implement a malware detection system that is no longer based on anti-virus systems: the idea is to use our results from Section 4 and combine them with machine learning techniques based on our 1 TB sample set. With the help of these techniques, we hope to find new heuristics and patterns for efficient malware detection and clustering.
For mobile users we will provide an application for the Android platform that is able to check the status of already installed applications and to upload them directly into our sandbox for analysis if no report is available.

Acknowledgments: This work has been supported by the Federal Ministry of Education and Research (grant 01BY1021 – MobWorm).

6. REFERENCES
[1] Android Developers. Using the Android Emulator, January 2012.
[2] T. Bläsing, L. Batyuk, A.-D. Schmidt, S. Camtepe, and S. Albayrak. An android application sandbox system for suspicious software detection. In Proc. of the 5th International Conference on Malicious and Unwanted Software (MALWARE), 2010.
[3] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani. Crowdroid: behavior-based malware detection system for android. In Proc. of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, 2011.
[4] Department of Computer Science Friedrich-Alexander-University Erlangen-Nuremberg. Mobile-Sandbox. http://www.mobile-sandbox.com, January 2012.
[5] A. Desnos. Androguard, January 2011.
[6] A. Desnos and G. Gueguen. Android: From reversing to decompilation. In Proc. of Black Hat Abu Dhabi, 2011.
[7] W. Enck, P. Gilbert, B. gon Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones. In Proc. of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), October 2010.
[8] A. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner. A survey of mobile malware in the wild. In Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, pages 3–14. ACM, 2011.
[9] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner. Android permissions demystified. In Proc. of the 18th ACM conference on Computer and communications security, 2011.
[10] J. Freke. smali - an disassembler for android’s dex format, September 2009.
[11] Google Inc. Android SDK, October 2009.
[12] R. M. Groves. Research on survey data quality. Public Opinion Quarterly, 51(2):157–172, 1987.
[13] R. M. Groves. Survey Errors and Survey Costs. Wiley, 1989.
[14] Hispasec Sistemas S.L. VirusTotal Malware Intelligence Services.
[15] Hispasec Sistemas S.L. VirusTotal Public API.
[16] P. Lantz. droidbox - android application sandbox, February 2011.
[17] A. Moser, C. Kruegel, and E. Kirda. Limits of static analysis for malware detection. In Proc. of the 23rd Annual Computer Security Applications Conference, 2007.
[18] V. U. of Technology. Andrubis - analysis of android apks. http://anubis.iseclab.org, May 2012.
[19] T. Raffetseder, C. Kruegel, and E. Kirda. Detecting system emulators. In ISC, pages 1–18, 2007.
[20] A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K. Yüksel, S. Camtepe, and A. Sahin. Static analysis of executables for collaborative malware detection on android. In Proc. of the ICC Communication and Information Systems Security Symposium, 2009.
[21] J. Six. Application Security for the Android Platform: Processes, Permissions, and Other Safeguards. Oreilly & Assoc Inc, 2011.
[22] M. Spreitzenbarth. The Evil Inside a Droid - Android Malware: Past, Present and Future. In Proceedings of the BALTIC CONFERENCE Network Security and Forensics, 2012.
[23] The Debian Project. ltrace, January 2012.
[24] C. Willems and F. C. Freiling. Reverse code engineering - state of the art and countermeasures. it - Information Technology, pages 53–63, 2011.
[25] C. Willems, T. Holz, and F. C. Freiling. Toward automated dynamic malware analysis using CWSandbox. IEEE Security & Privacy, 5(2):32–39, 2007.
[26] L. Xie, X. Zhang, J.-P. Seifert, and S. Zhu. pbmds: a behavior-based malware detection system for cellphone devices. In Proc. of the third ACM conference on Wireless network security, 2010.
[27] Y. Zhou and X. Jiang. Dissecting android malware: Characterization and evolution. In Proc. of the 33rd IEEE Symposium on Security and Privacy (Oakland 2012), May 2012.
[28] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang. Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets. In Proc. of the 19th Annual Symposium on Network and Distributed System Security, 2012.