Code Obfuscation Against Static and Dynamic
Reverse Engineering
1 Introduction
Today, software is usually distributed in binary form which is, from an attacker’s
perspective, substantially harder to understand than source code. However, var-
ious techniques can be applied for analyzing binary code. The process of reverse
engineering aims at restoring a higher-level representation (e.g. assembly code)
of software in order to analyze its structure and behavior. In some applications
there is a need for software developers to protect their software against reverse
engineering. The protection of intellectual property (e.g. proprietary algorithms)
contained in software, confidentiality reasons, and copy protection mechanisms
are the most important examples. Another important aspect concerns cryptographic
algorithms such as AES. They are designed for scenarios with trusted end-points
where encryption and decryption are performed in secure environments and with-
stand attacks in a black-box context, where an attacker does not have knowledge
of the internal state of the algorithm (such as round keys derived from the sym-
metric key). In contrast to traditional end-to-end encryption in communications
security, where the attacker resides between the trusted end-points, many types
of software (e.g. DRM clients), have to withstand attacks in a white-box context
where an attacker is able to analyze the software during its execution. This is
particularly difficult for software that runs on an untrusted host.
Software obfuscation is a technique to obscure the control flow of software as
well as data structures that contain sensitive information and is used to miti-
gate the threat of reverse engineering. Collberg et al. [8] define an obfuscating
transformation τ as a transformation of a program P into a program P′ such that P and P′ have the same observable behavior. The original program P and the obfuscated program P′ must not differ in their functionality as visible to the user (aside from performance losses caused by the obfuscating transformation); however, non-visible side effects, such as the creation of temporary files, are allowed in this
loose definition. Another formal concept of software obfuscation was defined by
Barak et al. [3]. Although this work shows that a universal obfuscator for any
type of software does not exist and perfectly secure software obfuscation is not
possible, software obfuscation is still used in commercial systems to “raise the
bar” for attackers. In the context of Digital Rights Management systems it is
the prime candidate for the protection against attackers who have full access
to the client software. While the research community developed a vast num-
ber of obfuscation schemes (see e.g. [5] and [16]) targeted against static reverse
engineering, where the structure of the software is analyzed without actually
executing it, they are still insecure against dynamic analysis techniques, which
execute the program in a debugger or virtual machine and inspect its behavior.
In this work we introduce a novel code obfuscation technique that effectively
prevents static reverse engineering and limits the impact of dynamic analysis.
Technically, we apply the concept of code diversification to enhance the complex-
ity of the software to be analyzed. Diversification was used in the past to prevent
“class breaks”, so that a crack developed for one instance of a program will most
likely not run on another instance and thus each copy of the software needs to be
attacked independently. In this work we use diversification for the first time for
a different purpose, namely increasing the resistance against dynamic analysis.
The main contribution of the paper is a novel code obfuscation scheme that
provides strong protection against automated static reverse engineering and
which uses the concept of software diversification in order to enhance the com-
plexity of dynamic analysis. Note that we do not intend to construct a perfectly
secure obfuscation scheme, as dynamic analysis can not be prevented. However,
our aim is to make attacks significantly more difficult so that knowledge derived
from one run of the software in a virtual machine does not necessarily help in
understanding the behavior of the software in runs on other inputs.
The remainder of the paper proceeds as follows. After a short overview of
related work (Section 2) we introduce our approach in Section 3. In Section 4
we explain how performance is influenced by our method and evaluate security
aspects. Finally, a conclusion is given in Section 5.
2 Related Work
There are a number of publications on software obfuscation and their imple-
mentation. A comprehensive taxonomy of obfuscating transformations was in-
troduced in 1997 by Collberg et al. [8]. To measure the effect of an obfuscating
transformation, Collberg defined three metrics: potency, resilience and cost. Po-
tency describes how much more difficult the obfuscated program P′ is for humans to understand. Software complexity metrics (e.g. [6,12,22,11,13,21,19]), which were developed to measure the complexity of software, can be used to evaluate this rather subjective metric. In contrast to potency, which evaluates the strength of the obfuscating transformation against human analysts, resilience defines how well it withstands an attack by an automatic deobfuscator. This metric evaluates both
the programmer effort (how much effort is required to develop a deobfuscator)
and the deobfuscator effort (the effort of space and time required for the de-
obfuscator to run). A perfect obfuscating transformation has high potency and
resilience values, but low costs in terms of additional memory usage and in-
creased execution time. In practice, a trade-off between resilience/potency and
costs (computational overhead) has to be made. However, the main problem with measuring an obfuscation technique's strength is that a well-defined notion of security does not exist, even though obfuscation can make the process of reverse engineering significantly harder and more time-consuming. Several other theoretical works
on software obfuscation can be found in [17] and [23].
As preventing disassembling is nearly impossible in scenarios where attackers
have full control over the host on which the software is running, the common so-
lution is to make the result of disassembling worthless for further static analysis
by preventing the reconstruction of the control flow graph. To this end, [16] and
[5] use so-called branching functions to obfuscate the targets of CALL instruc-
tions: The described methods replace CALL instructions with jumps (JMP) to a
generic function (branching function), which decides at runtime which function
to call. Under the assumption that for a static analyzer the branching function
is a black box, the call target is not revealed until the actual execution of the
code. This effectively prevents reconstruction of the control flow graph using
static analysis. However, the concept of a branching function does not protect
against dynamic analysis. An attacker can still run the software on various in-
puts and observe its behavior. Madou et al. [18] argue that recently proposed
software protection models would not withstand attacks that combine static and
dynamic analysis techniques. Still, code obfuscation can make dynamic analysis
considerably harder.
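To illustrate the principle behind a branching function, the following Python sketch models the replaced call sites as a dispatch through a runtime table; the table layout, the opaque identifier and the check_license routine are illustrative assumptions, not part of the cited schemes.

    def make_branching_function(call_table):
        # call_table maps an opaque call-site identifier to the routine that
        # should actually be invoked there; a static analyzer only sees this
        # generic dispatcher instead of the concrete call targets
        def branching_function(site_id, *args):
            target = call_table[site_id]   # target resolved only at runtime
            return target(*args)
        return branching_function

    def check_license():                   # hypothetical protected routine
        return True

    # the obfuscated code no longer calls check_license() directly, but
    # enters the dispatcher with an opaque identifier
    branch = make_branching_function({0x17: check_license})
    branch(0x17)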
An attack is called a class break if it was developed for a single entity, but
can easily be extended to break any similar entity. In software, for example, we
would speak of a class break if an attacker can not only remove a copy protec-
tion mechanism on the software purchased, but also can write a generic patch
that removes it from every copy of the software. For software publishers, class
breaks are dreaded, because they allow mass distribution of software cracks (e.g.
on the Internet) to people who would otherwise not be able to develop cracks
themselves. The concept of diversification for preventing class breaks of software
was put forth by Anckaert [1]. An algorithm for automated software diversifica-
tion was introduced by De Sutter et al. [9]. Their approach uses optimization
techniques to generate different, but semantically equivalent, assembly instruc-
tions from code sequences. While software diversification is an effective solution
(see e.g. [2]), it raises major difficulties in software distribution, because each
copy has to be different. There is no efficient way for the distribution of diversi-
fied copies via physical media (e.g. DVD), and software updates for diversified
software are difficult to distribute as well. Franz [10] proposes a model for the
distribution of diversified software on a large scale. The author argues that the
increasing popularity of online software delivery makes it feasible to send each
user a different version of the software. However, a specific algorithm for the
diversification process is not given.
Another approach to protect cryptographic keys embedded in software is the
use of White-Box Cryptography (WBC), which attempts to construct a de-
cryption routine that is resistant against a “white-box” attacker, who is able
to observe every step of the decryption process. In WBC, the cipher is imple-
mented as a randomized network of key dependent lookup tables. A white-box
DES implementation was introduced by Chow et al. [7]. Based on this approach,
other white-box implementations of DES and AES have been proposed, but all
of them have been broken so far (see e.g. Jacob et al. [14], Wyseur et al. [24]
and Billet et al. [4]). Michiels and Gorissen [20] introduce a technique that uses
white-box cryptography to make software tamper-resistant. In their approach,
the executable code of the software is used in a white-box lookup table for the
cryptographic key. Changing the code would result in an invalid key. However,
due to the lack of secure WBC implementations, the security of this construction
is unclear.
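As a toy illustration of the lookup-table idea (not a secure white-box construction), a key byte can be folded into a precomputed table at protection time so that it never appears as an explicit constant at runtime; the S-box and key value below are placeholders.

    SBOX = list(range(256))        # stand-in for a real substitution box
    KEY = 0x5A                     # secret key byte (placeholder)

    # built once at protection time: the key is merged into the table
    TABLE = [SBOX[x ^ KEY] for x in range(256)]

    def encrypt_byte(x):
        # at runtime only the merged table is consulted; the key byte is
        # no longer present as an explicit constant in the code
        return TABLE[x]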
Hardware-based approaches would make it possible to completely shield the actual execution of code from the attacker. However, this only shifts attacks to the tamper resistance of the hardware, while raising new challenges such as difficult support for
legacy systems and high costs. Therefore, hardware-based software protection is
out of scope of this work.
3 Approach
Our approach combines obfuscation techniques against static and dynamic re-
verse engineering. Within this paper, the term static analysis refers to the process
of automated reverse engineering of software without actually executing it. Using
a disassembler, an attacker can translate machine code into assembly language,
a process that makes machine instructions visible, including ones that modify
the control flow such as jumps and calls. This way, the control flow graph of
the software can be reconstructed without executing even a single line of code.
By inserting indirect jumps that do not reveal their jump target until runtime
and utilizing the concept of a branching function we make static control flow
reconstruction more difficult.
Employing code obfuscation to prevent static analysis is a first step towards
running code securely, even in the presence of attackers who have full access to the software.
    _branch:
        save flags on stack
        save registers on stack
        EAX <= [sig]
        ADD lookupTable to EAX
        target <= [EAX]
        restore registers
        restore flags
        jump to [target]

Fig. 1. Overall architecture of the obfuscated program: small code blocks (gadgets) are
connected by a branching function
Our graph construction algorithm takes the original program code as well as
a minimum and maximum gadget size and a minimum and maximum branching
size as input parameters and is based on a depth-first search. Starting at the root
node, the algorithm adds a random number of child nodes (within the bounds
of the branching size) and assigns a gadget to each connecting edge. All edges to child nodes are filled with the same code from the original program, each with a random number of instructions (within the given bounds on the gadget size); only the gadget size, and therefore the number of instructions, differs at this stage. Gadgets are not diversified at graph construction time. We
define the absolute number of instructions executed until reaching a node of
the graph as node level. Before adding a new node to the graph, the algorithm
calculates the node level of the new node and checks whether a node with this level already exists anywhere in the graph. In that case, instead of creating a new node, the algorithm links to the existing one. This method prevents the width of the graph from growing continually.
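A minimal Python sketch of this construction, assuming the original program is available as a flat list of instruction strings and a minimum gadget size of at least one instruction; function and parameter names are illustrative, not taken from the prototype.

    import random

    def build_gadget_graph(instructions, min_gadget, max_gadget,
                           min_branch, max_branch):
        # a node is identified by its node level, i.e. the absolute number of
        # original instructions executed before reaching it
        nodes = {0: []}              # node level -> list of child levels
        edges = []                   # (src_level, dst_level, gadget)
        worklist = [0]               # LIFO worklist: depth-first construction
        total = len(instructions)
        while worklist:
            level = worklist.pop()
            if level >= total:
                continue             # end of the original program reached
            for _ in range(random.randint(min_branch, max_branch)):
                size = min(random.randint(min_gadget, max_gadget), total - level)
                gadget = instructions[level:level + size]  # same code, random length
                child = level + size
                if child not in nodes:   # reuse a node with an identical level
                    nodes[child] = []
                    worklist.append(child)
                nodes[level].append(child)
                edges.append((level, child, gadget))
        return nodes, edges

At this stage two edges leaving the same node differ only in the number of instructions they carry; the per-copy diversification of the gadgets happens in a later step.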
During gadget graph construction, we calculate and store a path signature in
each node. We make it unique (see below) so that it clearly identifies the node
and all its predecessors. The signature is based on simple ADD and SUB assem-
bly instructions on a fixed memory location. Each gadget adds (or subtracts) a
random value to (or from) the value stored in memory. When traversing through
the graph, the value stored at the memory location identifies the currently ex-
ecuted gadget and the path that was taken through the graph to reach this
gadget. A node can have more than one signature, as more than one path of the
graph could reach this node. In that case, each node signature uniquely identifies
one of the possible paths from the root to the node. During signature assignment
we prevent collisions (two nodes sharing the same signature) by comparing the current signature to all previously calculated signatures and choosing a different value for the ADD or SUB instruction if needed. We decided to implement a trial-and-error approach instead of an algorithm that generates provably distinct signatures in order to avoid performance bottlenecks at runtime. Figure 4 shows
the path signature for a small graph.
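The signature assignment can be modelled in Python as follows; this is a simplified sketch operating on the edge list produced above, whereas the real implementation emits ADD/SUB instructions on a fixed memory location instead of updating Python sets.

    import random

    def assign_signatures(edges):
        signatures = {0: {0}}        # node level -> set of path signatures
        used = {0}                   # every signature assigned so far
        labelled = []                # (src, dst, delta) per edge
        # process edges source-first so a node's incoming paths are known
        for src, dst, _gadget in sorted(edges, key=lambda e: (e[0], e[1])):
            # trial and error: retry until the new signatures collide with
            # none of the previously assigned ones
            while True:
                delta = random.choice([-1, 1]) * random.randint(1, 10**6)
                candidate = {sig + delta for sig in signatures[src]}
                if candidate.isdisjoint(used):
                    break
            labelled.append((src, dst, delta))   # delta becomes the ADD/SUB operand
            signatures.setdefault(dst, set()).update(candidate)
            used.update(candidate)
        return labelled, signatures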
We further add a second input parameter to the branching function described
in the static part of our approach. Now, both the program’s input and the
path signature are input parameters for a lookup table that determines the next
gadget to be called. To eliminate any information leakage from the branching
function’s input value, only a hash value of the program’s input and the path
signature is stored in the lookup table.
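A sketch of this lookup in Python; the use of SHA-256 and the string encoding of the key are assumptions for illustration, as the construction only requires that some hash of the program input and the path signature indexes the table.

    import hashlib

    def table_key(program_input, signature):
        # only a hash of the branching function's inputs is used as index,
        # so the table itself leaks neither the input nor the signature
        data = f"{program_input}|{signature}".encode()
        return hashlib.sha256(data).hexdigest()

    def build_lookup_table(transitions):
        # transitions: (program_input, path_signature) -> next gadget address
        return {table_key(inp, sig): target
                for (inp, sig), target in transitions.items()}

    def branch(table, program_input, signature):
        # runtime counterpart: hash the current input and path signature
        # and fetch the address of the next gadget to execute
        return table[table_key(program_input, signature)]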
Simple single-instruction substitutions alone do not provide strong diversification, because a very simple matching algorithm can easily identify them as equivalent. However, analogous to the instruction splitting method, multi-instruction patterns can be combined with dummy code insertions to enable strong diversification. To provide an example, consider the instructions push ebp; mov ebp,
esp. A semantically equivalent expression would be push ebp; push esp; pop
ebp. A simple substitution transformation of one version for the other would
most likely not withstand an automated attack. However, if the transformation
is combined with dummy code insertion (e.g. push ebp; push esp; add esp,
[0x0040EA00]; pop ebp, where the memory location 0x0040EA00 holds the value 0), an attacker with only local knowledge of the gadget cannot identify the dummy code instructions and hence cannot decide gadget equivalence locally.
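A sketch of such a combined transformation on textual x86 instructions; the substitution pattern, the dummy instructions and the zero-valued memory location 0x0040EA00 are taken from the example above, everything else is illustrative.

    import random

    # semantically equivalent multi-instruction replacement (from the example)
    SUBSTITUTIONS = {
        ("push ebp", "mov ebp, esp"): ["push ebp", "push esp", "pop ebp"],
    }

    # dummy instructions with no effect, assuming [0x0040EA00] holds 0
    DUMMIES = ["add esp, [0x0040EA00]", "sub ebp, [0x0040EA00]"]

    def diversify(instructions):
        out, i = [], 0
        while i < len(instructions):
            pattern = tuple(instructions[i:i + 2])
            if pattern in SUBSTITUTIONS:
                replacement = list(SUBSTITUTIONS[pattern])
                # hide the substitution by inserting dummy code at a random
                # position inside the replaced pattern
                replacement.insert(random.randint(1, len(replacement) - 1),
                                   random.choice(DUMMIES))
                out.extend(replacement)
                i += 2
            else:
                out.append(instructions[i])
                i += 1
        return out

For instance, diversify(["push ebp", "mov ebp, esp"]) may yield push ebp; push esp; add esp, [0x0040EA00]; pop ebp, i.e. the diversified variant discussed above.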
Figure 5 shows the transformation of a small code block. The transformation
function τ adds dummy code (lines 4 and 6) and modifies the instruction add
ebp, 4 so that it only provides the correct functionality if the corresponding
input 8 is loaded into register eax. This modification prevents an attacker from
extracting this specific (and fully functional) trace and using it with other inputs.
To be able to generalize a trace, all input-dependent operand modifications would have to be removed; thus, the entire code would have to be analyzed instruction
by instruction.
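The effect of such an input-dependent operand modification can be illustrated with a small helper; the concrete rewriting rule below is an assumption for illustration and is not the exact transformation of Figure 5.

    def bind_immediate_to_input(register, immediate, expected_input):
        # rewrite "add <register>, <immediate>" so that the constant is only
        # reconstructed correctly when eax holds the expected program input
        offset = immediate - expected_input
        return [f"add {register}, eax",       # eax is assumed to hold the input
                f"add {register}, {offset}"]  # immediate - expected_input

    # example: for the input 8, "add ebp, 4" becomes
    #   add ebp, eax
    #   add ebp, -4
    # which adds 4 to ebp only if eax == 8
    print(bind_immediate_to_input("ebp", 4, 8))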
4 Discussion
The following section discusses the impact of our obfuscation scheme on perfor-
mance and size of the resulting program and evaluates security aspects.
Performance and Size. To demonstrate the effectiveness of our approach, we
implemented a prototype that reads assembly source code and generates an
obfuscated version of it. We measured the performance losses of a simple bench-
marking tool as well as a standard AES implementation using 8 different gadget
sizes. While the dynamic part of our approach accounts for an increase in re-
quired memory space because of diversified copies of gadgets, execution time
heavily depends on the size and implementation of the branching function, as
it inserts additional instructions. The performance decreases with the number
of gadgets, due to calls to the branching function, which are required to switch
between gadgets. In contrast, the strength of the obfuscation is directly propor-
tional to the number of gadgets, so a trade-off between obfuscation strength and
performance has to be made. We compared different gadget sizes from 1 to 50
with the execution times of the non-obfuscated programs (see Figure 6). While
very small gadgets result in significant performance decreases, the execution time for gadget sizes of 10 and above approaches the execution time of the original program.
[Figure 6: overhead of the obfuscated AES implementation and the MOV benchmark, plotted against the number of instructions per gadget]
We further used the disassembler IDA Pro to compare the control flow reconstruction of the original programs with that of the obfuscated ones. The values in the table are the percentage of successfully reconstructed code areas. While IDA Pro was able to reconstruct nearly 38% of the original AES code, the percentage for the obfuscated version declined to about 10%. For
the MOV benchmark, the difference was even larger. The results show that for
both the AES algorithm and the MOV benchmark, the obfuscated version was
much more difficult to reconstruct for IDA Pro. The large difference between the two examples was caused by the different amounts of obfuscated code. While for the
MOV benchmark the entire code was obfuscated, in the AES example only the
algorithm itself was obfuscated. IDA Pro was able to reconstruct non-obfuscated
parts of the code correctly, but failed at reconstructing obfuscated code. The dis-
assembler is not able to determine the jump targets of the branching function
without actually executing it.
The second tool we used for evaluation is Jakstab [15] which aims at recovering
control flow graphs. Jakstab was not able to resolve the indirect jump at the
end of the branching function of our sample program. Although it successfully
extracted some of the jump targets from the lookup table, the correct order of
the jumps still remained unknown to Jakstab.
Although both tools implement methods for disassembling software and re-
constructing control flow graphs, it is not surprising to see them fail at breaking
our proposed obfuscation technique as they are not tailored to our particular
implementation. Hence, for a more realistic evaluation, we also discuss what a possible deobfuscator for our approach would look like.
One of the main strengths of our approach is that obfuscated software does
not contain an explicit representation of the graph structure. It is hidden inside
the lookup table, which only reveals the direct successor of a gadget within a
single trace during runtime. If an attacker wants to manipulate the software
(e.g. remove a copy protection mechanism), he could pursue two strategies: he could either identify and patch the mechanism in every diversified trace in which it appears, or he could try to generalize a single recorded trace so that it works for arbitrary inputs. Neither strategy is likely to succeed without human interaction. In the first
one, a large number of variants of the same copy protection mechanism would
have to be identified and removed manually from the individual traces. In the
second strategy, a human deobfuscator would have to analyze an entire trace
to be able to identify the inserted modifications that make the trace specific to
a single input. We believe that this high amount of manual effort significantly
raises the bar for reverse engineering attacks.
5 Conclusion
References
5. Cappaert, J., Preneel, B.: A general model for hiding control flow. In: Proceedings
of the Tenth Annual ACM Workshop on Digital Rights Management. ACM, New
York (2010)
6. Chidamber, S., Kemerer, C.: A metrics suite for object oriented design. IEEE
Transactions on Software Engineering 20(6) (1994)
7. Chow, S., Eisen, P., Johnson, H., van Oorschot, P.: A white-box DES implemen-
tation for DRM applications. In: Digital Rights Management, pp. 1–15 (2003)
8. Collberg, C., Thomborson, C., Low, D.: A taxonomy of obfuscating transformations.
Technical Report 148, Department of Computer Science, The University of Auckland (1997)
9. De Sutter, B., Anckaert, B., Geiregat, J., Chanet, D., De Bosschere, K.: Instruction
set limitation in support of software diversity. In: Lee, P.J., Cheon, J.H. (eds.)
ICISC 2008. LNCS, vol. 5461, pp. 152–165. Springer, Heidelberg (2009)
10. Franz, M.: E unibus pluram: massive-scale software diversity as a defense mecha-
nism. In: Proceedings of the 2010 Workshop on New Security Paradigms. ACM,
New York (2010)
11. Halstead, M.: Elements of software science. Elsevier, New York (1977)
12. Harrison, W., Magel, K.: A complexity measure based on nesting level. ACM Sig-
plan Notices 16(3) (1981)
13. Henry, S., Kafura, D.: Software Structure Metrics Based on Information Flow.
IEEE Transactions on Software Engineering 7(5), 510–518 (1981)
14. Jacob, M., Boneh, D., Felten, E.: Attacking an obfuscated cipher by injecting faults.
In: Digital Rights Management, pp. 16–31 (2003)
15. Kinder, J., Veith, H.: Jakstab: A static analysis platform for binaries. In: Gupta,
A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 423–427. Springer, Heidelberg
(2008)
16. Linn, C., Debray, S.: Obfuscation of executable code to improve resistance to static
disassembly. In: Proceedings of the 10th ACM Conference on Computer and Com-
munications Security. ACM, New York (2003)
17. Lynn, B., Prabhakaran, M., Sahai, A.: Positive results and techniques for obfusca-
tion. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027,
pp. 20–39. Springer, Heidelberg (2004)
18. Madou, M., Anckaert, B., De Sutter, B., De Bosschere, K.: Hybrid static-dynamic
attacks against software protection mechanisms. In: Proceedings of the 5th ACM
Workshop on Digital Rights Management, pp. 75–82. ACM, New York (2005)
19. McCabe, T.: A complexity measure. IEEE Transactions on Software Engineering
(1976)
20. Michiels, W., Gorissen, P.: Mechanism for software tamper resistance: an applica-
tion of white-box cryptography. In: Proceedings of the 2007 ACM Workshop on
Digital Rights Management, pp. 82–89. ACM, New York (2007)
21. Munson, J.C., Khoshgoftaar, T.M.: Measurement of data structure complexity. Journal
of Systems and Software 20(3), 217–225 (1993)
22. Oviedo, E.: Control flow, data flow and program complexity. McGraw-Hill, Inc.,
New York (1993)
23. Wee, H.: On obfuscating point functions. In: Proceedings of the Thirty-Seventh
Annual ACM Symposium on Theory of Computing. ACM, New York (2005)
24. Wyseur, B., Michiels, W., Gorissen, P., Preneel, B.: Cryptanalysis of white-box
DES implementations with arbitrary external encodings. In: Adams, C., Miri, A.,
Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 264–277. Springer, Heidelberg
(2007)