Code Obfuscation Against Static and Dynamic
Reverse Engineering
1 Introduction
Today, software is usually distributed in binary form which is, from an attacker’s
perspective, substantially harder to understand than source code. However, var-
ious techniques can be applied for analyzing binary code. The process of reverse
engineering aims at restoring a higher-level representation (e.g. assembly code)
of software in order to analyze its structure and behavior. In some applications
there is a need for software developers to protect their software against reverse
engineering. The protection of intellectual property (e.g. proprietary algorithms)
contained in software, confidentiality reasons, and copy protection mechanisms
are the most important examples. Another important aspect concerns cryptographic
algorithms such as AES. They are designed for scenarios with trusted end-points
where encryption and decryption are performed in secure environments and with-
stand attacks in a black-box context, where an attacker does not have knowledge
of the internal state of the algorithm (such as round keys derived from the sym-
metric key). In contrast to traditional end-to-end encryption in communications
security, where the attacker resides between the trusted end-points, many types
of software (e.g. DRM clients), have to withstand attacks in a white-box context
where an attacker is able to analyze the software during its execution. This is
particularly difficult for software that runs on an untrusted host.
Software obfuscation is a technique to obscure the control flow of software as
well as data structures that contain sensitive information and is used to miti-
gate the threat of reverse engineering. Collberg et al. [8] define an obfuscating
transformation τ as a transformation of a program P into a program P′ such that P and P′ have the same observable behavior. The original program P and the obfuscated program P′ must not differ in their functionality as visible to the user (aside from performance losses caused by the obfuscating transformation); however, non-visible side effects, such as the creation of temporary files, are allowed in this
loose definition. Another formal concept of software obfuscation was defined by
Barak et al. [3]. Although this work shows that a universal obfuscator for any
type of software does not exist and perfectly secure software obfuscation is not
possible, software obfuscation is still used in commercial systems to “raise the
bar” for attackers. In the context of Digital Rights Management systems it is
the prime candidate for the protection against attackers who have full access
to the client software. While the research community developed a vast num-
ber of obfuscation schemes (see e.g. [5] and [16]) targeted against static reverse
engineering, where the structure of the software is analyzed without actually
executing it, they are still insecure against dynamic analysis techniques, which
execute the program in a debugger or virtual machine and inspect its behavior.
In this work we introduce a novel code obfuscation technique that effectively
prevents static reverse engineering and limits the impact of dynamic analysis.
Technically, we apply the concept of code diversification to enhance the complex-
ity of the software to be analyzed. Diversification was used in the past to prevent
“class breaks”, so that a crack developed for one instance of a program will most
likely not run on another instance and thus each copy of the software needs to be
attacked independently. In this work we use diversification for the first time for
a different purpose, namely increasing the resistance against dynamic analysis.
The main contribution of the paper is a novel code obfuscation scheme that
provides strong protection against automated static reverse engineering and
which uses the concept of software diversification in order to enhance the com-
plexity of dynamic analysis. Note that we do not intend to construct a perfectly
secure obfuscation scheme, as dynamic analysis can not be prevented. However,
our aim is to make attacks significantly more difficult so that knowledge derived
from one run of the software in a virtual machine does not necessarily help in
understanding the behavior of the software in runs on other inputs.
The remainder of the paper proceeds as follows. After a short overview of
related work (Section 2) we introduce our approach in Section 3. In Section 4
we explain how performance is influenced by our method and evaluate security
aspects. Finally, a conclusion is given in Section 5.
2 Related Work
There are a number of publications on software obfuscation and their imple-
mentation. A comprehensive taxonomy of obfuscating transformations was in-
troduced in 1997 by Collberg et al. [8]. To measure the effect of an obfuscating
transformation, Collberg defined three metrics: potency, resilience and cost. Po-
tency describes how much more difficult the obfuscated program P′ is for humans to understand. Software complexity metrics (e.g. [6,12,22,11,13,21,19]), which were developed to measure the complexity of software, can be used to evaluate this rather subjective metric. In contrast to potency, which evaluates the strength of the obfuscating transformation against human analysts, resilience defines how well it withstands an attack by an automatic deobfuscator. This metric evaluates both
the programmer effort (how much effort is required to develop a deobfuscator)
and the deobfuscator effort (the effort of space and time required for the de-
obfuscator to run). A perfect obfuscating transformation has high potency and
resilience values, but low costs in terms of additional memory usage and in-
creased execution time. In practice, a trade-off between resilience/potency and
costs (computational overhead) has to be made. However, the main problem with measuring an obfuscation technique's strength is that a well-defined notion of security does not exist, even though obfuscation can make the process of reverse engineering significantly harder and more time-consuming. Several other theoretical works
on software obfuscation can be found in [17] and [23].
As preventing disassembling is nearly impossible in scenarios where attackers
have full control over the host on which the software is running, the common so-
lution is to make the result of disassembling worthless for further static analysis
by preventing the reconstruction of the control flow graph. To this end, [16] and
[5] use so-called branching functions to obfuscate the targets of CALL instruc-
tions: The described methods replace CALL instructions with jumps (JMP) to a
generic function (branching function), which decides at runtime which function
to call. Under the assumption that for a static analyzer the branching function
is a black box, the call target is not revealed until the actual execution of the
code. This effectively prevents reconstruction of the control flow graph using
static analysis. However, the concept of a branching function does not protect
against dynamic analysis. An attacker can still run the software on various in-
puts and observe its behavior. Madou et al. [18] argue that recently proposed
software protection models would not withstand attacks that combine static and
dynamic analysis techniques. Still, code obfuscation can make dynamic analysis
considerably harder.
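To illustrate the principle behind a branching function, the following Python sketch models the replaced call sites as a dispatch through a runtime table; the table layout, the opaque identifier and the check_license routine are illustrative assumptions, not part of the cited schemes.

    def make_branching_function(call_table):
        # call_table maps an opaque call-site identifier to the routine that
        # should actually be invoked there; a static analyzer only sees this
        # generic dispatcher instead of the concrete call targets
        def branching_function(site_id, *args):
            target = call_table[site_id]   # target resolved only at runtime
            return target(*args)
        return branching_function

    def check_license():                   # hypothetical protected routine
        return True

    # the obfuscated code no longer calls check_license() directly, but
    # enters the dispatcher with an opaque identifier
    branch = make_branching_function({0x17: check_license})
    branch(0x17)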
An attack is called a class break if it was developed for a single entity, but
can easily be extended to break any similar entity. In software, for example, we
would speak of a class break if an attacker can not only remove a copy protec-
tion mechanism on the software purchased, but also can write a generic patch
that removes it from every copy of the software. For software publishers, class
breaks are dreaded, because they allow mass distribution of software cracks (e.g.
on the Internet) to people who would otherwise not be able to develop cracks
themselves. The concept of diversification for preventing class breaks of software
was put forth by Anckaert [1]. An algorithm for automated software diversifica-
tion was introduced by De Sutter et al. [9]. Their approach uses optimization
techniques to generate different, but semantically equivalent, assembly instruc-
tions from code sequences. While software diversification is an effective solution
(see e.g. [2]), it raises major difficulties in software distribution, because each
copy has to be different. There is no efficient way for the distribution of diversi-
fied copies via physical media (e.g. DVD), and software updates for diversified
software are difficult to distribute as well. Franz [10] proposes a model for the
distribution of diversified software on a large scale. The author argues that the
increasing popularity of online software delivery makes it feasible to send each
user a different version of the software. However, a specific algorithm for the
diversification process is not given.
Another approach to protect cryptographic keys embedded in software is the
use of White-Box Cryptography (WBC), which attempts to construct a de-
cryption routine that is resistant against a “white-box” attacker, who is able
to observe every step of the decryption process. In WBC, the cipher is imple-
mented as a randomized network of key dependent lookup tables. A white-box
DES implementation was introduced by Chow et al. [7]. Based on this approach,
other white-box implementations of DES and AES have been proposed, but all
of them have been broken so far (see e.g. Jacob et al. [14], Wyseur et al. [24]
and Billet et al. [4]). Michiels and Gorissen [20] introduce a technique that uses
white-box cryptography to make software tamper-resistant. In their approach,
the executable code of the software is used in a white-box lookup table for the
cryptographic key. Changing the code would result in an invalid key. However,
due to the lack of secure WBC implementations, the security of this construction
is unclear.
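As a toy illustration of the lookup-table idea (not a secure white-box construction), a key byte can be folded into a precomputed table at protection time so that it never appears as an explicit constant at runtime; the S-box and key value below are placeholders.

    SBOX = list(range(256))        # stand-in for a real substitution box
    KEY = 0x5A                     # secret key byte (placeholder)

    # built once at protection time: the key is merged into the table
    TABLE = [SBOX[x ^ KEY] for x in range(256)]

    def encrypt_byte(x):
        # at runtime only the merged table is consulted; the key byte is
        # no longer present as an explicit constant in the code
        return TABLE[x]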
Hardware-based approaches would make it possible to completely shield the actual execution of code from the attacker. However, this only shifts attacks to the tamper resistance of the hardware, while raising new challenges such as difficult support for
legacy systems and high costs. Therefore, hardware-based software protection is
out of scope of this work.
3 Approach
Our approach combines obfuscation techniques against static and dynamic re-
verse engineering. Within this paper, the term static analysis refers to the process
of automated reverse engineering of software without actually executing it. Using
a disassembler, an attacker can translate machine code into assembly language,
a process that makes machine instructions visible, including ones that modify
the control flow such as jumps and calls. This way, the control flow graph of
the software can be reconstructed without executing even a single line of code.
By inserting indirect jumps that do not reveal their jump target until runtime
and utilizing the concept of a branching function we make static control flow
reconstruction more difficult.
Employing code obfuscation to prevent static analysis is a first step towards
running code securely, even in the presence of attackers who have full access to the software.
    _branch:
        save flags on stack
        save registers on stack
        EAX <= [sig]
        ADD lookupTable to EAX
        target <= [EAX]
        restore registers
        restore flags
        jump to [target]

Fig. 1. Overall architecture of the obfuscated program: small code blocks (gadgets) are
connected by a branching function
Our graph construction algorithm takes the original program code as well as
a minimum and maximum gadget size and a minimum and maximum branching
size as input parameters and is based on a depth-first search. Starting at the root
node, the algorithm adds a random number of child nodes (within the bounds
of the branching size) and assigns a gadget to each connecting edge. All edges to child nodes are filled with the same code from the original program, each with a random number of instructions (within the given bounds on the gadget size); only the gadget size, and therefore the number of instructions, differs at this stage. Gadgets are not diversified at graph construction time. We
define the absolute number of instructions executed until reaching a node of
the graph as node level. Before adding a new node to the graph, the algorithm
calculates the node level of the new node and checks whether a node with this level already exists anywhere in the graph. In that case, instead of creating a new node, the algorithm links to the existing one. This method prevents the width of the graph from growing continually.
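A minimal Python sketch of this construction, assuming the original program is available as a flat list of instruction strings and a minimum gadget size of at least one instruction; function and parameter names are illustrative, not taken from the prototype.

    import random

    def build_gadget_graph(instructions, min_gadget, max_gadget,
                           min_branch, max_branch):
        # a node is identified by its node level, i.e. the absolute number of
        # original instructions executed before reaching it
        nodes = {0: []}              # node level -> list of child levels
        edges = []                   # (src_level, dst_level, gadget)
        worklist = [0]               # LIFO worklist: depth-first construction
        total = len(instructions)
        while worklist:
            level = worklist.pop()
            if level >= total:
                continue             # end of the original program reached
            for _ in range(random.randint(min_branch, max_branch)):
                size = min(random.randint(min_gadget, max_gadget), total - level)
                gadget = instructions[level:level + size]  # same code, random length
                child = level + size
                if child not in nodes:   # reuse a node with an identical level
                    nodes[child] = []
                    worklist.append(child)
                nodes[level].append(child)
                edges.append((level, child, gadget))
        return nodes, edges

At this stage two edges leaving the same node differ only in the number of instructions they carry; the per-copy diversification of the gadgets happens in a later step.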
During gadget graph construction, we calculate and store a path signature in
each node. We make it unique (see below) so that it clearly identifies the node
and all its predecessors. The signature is based on simple ADD and SUB assem-
bly instructions on a fixed memory location. Each gadget adds (or subtracts) a
random value to (or from) the value stored in memory. When traversing through
the graph, the value stored at the memory location identifies the currently ex-
ecuted gadget and the path that was taken through the graph to reach this
gadget. A node can have more than one signature, as more than one path of the
graph could reach this node. In that case, each node signature uniquely identifies
one of the possible paths from the root to the node. During signature assignment
we prevent collisions (two nodes sharing the same signature) by comparing the current signature to all previously calculated signatures and choosing a different value for the ADD or SUB instruction if needed. We decided to implement a trial-and-error approach instead of an algorithm that generates provably distinct signatures in order to avoid performance bottlenecks at runtime. Figure 4 shows
the path signature for a small graph.
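The signature assignment can be modelled in Python as follows; this is a simplified sketch operating on the edge list produced above, whereas the real implementation emits ADD/SUB instructions on a fixed memory location instead of updating Python sets.

    import random

    def assign_signatures(edges):
        signatures = {0: {0}}        # node level -> set of path signatures
        used = {0}                   # every signature assigned so far
        labelled = []                # (src, dst, delta) per edge
        # process edges source-first so a node's incoming paths are known
        for src, dst, _gadget in sorted(edges, key=lambda e: (e[0], e[1])):
            # trial and error: retry until the new signatures collide with
            # none of the previously assigned ones
            while True:
                delta = random.choice([-1, 1]) * random.randint(1, 10**6)
                candidate = {sig + delta for sig in signatures[src]}
                if candidate.isdisjoint(used):
                    break
            labelled.append((src, dst, delta))   # delta becomes the ADD/SUB operand
            signatures.setdefault(dst, set()).update(candidate)
            used.update(candidate)
        return labelled, signatures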
We further add a second input parameter to the branching function described
in the static part of our approach. Now, both the program’s input and the
path signature are input parameters for a lookup table that determines the next
gadget to be called. To eliminate any information leakage from the branching
function’s input value, only a hash value of the program’s input and the path
signature is stored in the lookup table.
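A sketch of this lookup in Python; the use of SHA-256 and the string encoding of the key are assumptions for illustration, as the construction only requires that some hash of the program input and the path signature indexes the table.

    import hashlib

    def table_key(program_input, signature):
        # only a hash of the branching function's inputs is used as index,
        # so the table itself leaks neither the input nor the signature
        data = f"{program_input}|{signature}".encode()
        return hashlib.sha256(data).hexdigest()

    def build_lookup_table(transitions):
        # transitions: (program_input, path_signature) -> next gadget address
        return {table_key(inp, sig): target
                for (inp, sig), target in transitions.items()}

    def branch(table, program_input, signature):
        # runtime counterpart: hash the current input and path signature
        # and fetch the address of the next gadget to execute
        return table[table_key(program_input, signature)]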
Simple single-instruction substitutions alone do not provide strong diversification, because a very simple matching algorithm can easily identify them as equivalent. However, analogous to the instruction splitting method, multi-instruction patterns can be combined with dummy code insertions to enable strong diversification. To provide an example, consider the instructions push ebp; mov ebp,
esp. A semantically equivalent expression would be push ebp; push esp; pop
ebp. A simple substitution transformation of one version for the other would
most likely not withstand an automated attack. However, if the transformation
is combined with dummy code insertion (e.g. push ebp; push esp; add esp,
[0x0040EA00]; pop ebp, where the memory location 0x0040EA00 holds the value 0), an attacker with only local knowledge of the gadget cannot identify the dummy code instructions and hence cannot decide gadget equivalence locally.
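A sketch of such a combined transformation on textual x86 instructions; the substitution pattern, the dummy instructions and the zero-valued memory location 0x0040EA00 are taken from the example above, everything else is illustrative.

    import random

    # semantically equivalent multi-instruction replacement (from the example)
    SUBSTITUTIONS = {
        ("push ebp", "mov ebp, esp"): ["push ebp", "push esp", "pop ebp"],
    }

    # dummy instructions with no effect, assuming [0x0040EA00] holds 0
    DUMMIES = ["add esp, [0x0040EA00]", "sub ebp, [0x0040EA00]"]

    def diversify(instructions):
        out, i = [], 0
        while i < len(instructions):
            pattern = tuple(instructions[i:i + 2])
            if pattern in SUBSTITUTIONS:
                replacement = list(SUBSTITUTIONS[pattern])
                # hide the substitution by inserting dummy code at a random
                # position inside the replaced pattern
                replacement.insert(random.randint(1, len(replacement) - 1),
                                   random.choice(DUMMIES))
                out.extend(replacement)
                i += 2
            else:
                out.append(instructions[i])
                i += 1
        return out

For instance, diversify(["push ebp", "mov ebp, esp"]) may yield push ebp; push esp; add esp, [0x0040EA00]; pop ebp, i.e. the diversified variant discussed above.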
Figure 5 shows the transformation of a small code block. The transformation
function τ adds dummy code (lines 4 and 6) and modifies the instruction add
ebp, 4 so that it only provides the correct functionality if the corresponding
input 8 is loaded into register eax. This modification prevents an attacker from
extracting this specific (and fully functional) trace and using it with other inputs.
To be able to generalize a trace, all input-dependent operand modifications would have to be removed; thus, the entire code would have to be analyzed instruction
by instruction.
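The effect of such an input-dependent operand modification can be illustrated with a small helper; the concrete rewriting rule below is an assumption for illustration and is not the exact transformation of Figure 5.

    def bind_immediate_to_input(register, immediate, expected_input):
        # rewrite "add <register>, <immediate>" so that the constant is only
        # reconstructed correctly when eax holds the expected program input
        offset = immediate - expected_input
        return [f"add {register}, eax",       # eax is assumed to hold the input
                f"add {register}, {offset}"]  # immediate - expected_input

    # example: for the input 8, "add ebp, 4" becomes
    #   add ebp, eax
    #   add ebp, -4
    # which adds 4 to ebp only if eax == 8
    print(bind_immediate_to_input("ebp", 4, 8))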
4 Discussion
The following section discusses the impact of our obfuscation scheme on perfor-
mance and size of the resulting program and evaluates security aspects.
Performance and Size. To demonstrate the effectiveness of our approach, we
implemented a prototype that reads assembly source code and generates an
obfuscated version of it. We measured the performance losses of a simple bench-
marking tool as well as a standard AES implementation using 8 different gadget
sizes. While the dynamic part of our approach accounts for an increase in re-
quired memory space because of diversified copies of gadgets, execution time
heavily depends on the size and implementation of the branching function, as
it inserts additional instructions. The performance decreases with the number
of gadgets, due to calls to the branching function, which are required to switch
between gadgets. In contrast, the strength of the obfuscation is directly propor-
tional to the number of gadgets, so a trade-off between obfuscation strength and
performance has to be made. We compared different gadget sizes from 1 to 50
with the execution times of the non-obfuscated programs (see Figure 6). While
very small gadgets result in significant performance decreases, the execution time for gadget sizes of 10 and above approaches the execution time of the original program.
[Figure 6: overhead of the obfuscated AES implementation and the MOV benchmark, plotted against the number of instructions per gadget]
We further used the disassembler IDA Pro to compare the control flow reconstruction of the original programs with that of the obfuscated ones. The values in the table are the percentage of successfully reconstructed code areas. While IDA Pro was able to reconstruct nearly 38% of the original AES code, the percentage for the obfuscated version declined to about 10%. For
the MOV benchmark, the difference was even larger. The results show that for
both the AES algorithm and the MOV benchmark, the obfuscated version was
much more difficult to reconstruct for IDA Pro. The large difference between the two examples was caused by the different amounts of obfuscated code. While for the
MOV benchmark the entire code was obfuscated, in the AES example only the
algorithm itself was obfuscated. IDA Pro was able to reconstruct non-obfuscated
parts of the code correctly, but failed at reconstructing obfuscated code. The dis-
assembler is not able to determine the jump targets of the branching function
without actually executing it.
The second tool we used for evaluation is Jakstab [15] which aims at recovering
control flow graphs. Jakstab was not able to resolve the indirect jump at the
end of the branching function of our sample program. Although it successfully
extracted some of the jump targets from the lookup table, the correct order of
the jumps still remained unknown to Jakstab.
Although both tools implement methods for disassembling software and re-
constructing control flow graphs, it is not surprising to see them fail at breaking
our proposed obfuscation technique as they are not tailored to our particular
implementation. Hence, for a more realistic evaluation, we also discuss what a possible deobfuscator for our approach would look like.
One of the main strengths of our approach is that obfuscated software does
not contain an explicit representation of the graph structure. It is hidden inside
the lookup table, which only reveals the direct successor of a gadget within a
single trace during runtime. If an attacker wants to manipulate the software
(e.g. remove a copy protection mechanism), he could pursue two strategies: he could either identify and patch the mechanism in every diversified trace in which it appears, or he could try to generalize a single recorded trace so that it works for arbitrary inputs. Neither strategy is likely to succeed without human interaction. In the first
one, a large number of variants of the same copy protection mechanism would
have to be identified and removed manually from the individual traces. In the
second strategy, a human deobfuscator would have to analyze an entire trace
to be able to identify the inserted modifications that make the trace specific to
a single input. We believe that this high amount of manual effort significantly
raises the bar for reverse engineering attacks.
5 Conclusion
References
5. Cappaert, J., Preneel, B.: A general model for hiding control flow. In: Proceedings
of the Tenth Annual ACM Workshop on Digital Rights Management. ACM, New
York (2010)
6. Chidamber, S., Kemerer, C.: A metrics suite for object oriented design. IEEE
Transactions on Software Engineering 20(6) (1994)
7. Chow, S., Eisen, P., Johnson, H., van Oorschot, P.: A white-box DES implemen-
tation for DRM applications. In: Digital Rights Management, pp. 1–15 (2003)
8. Collberg, C., Thomborson, C., Low, D.: A taxonomy of obfuscating transformations.
Technical Report 148, Department of Computer Science, The University of Auckland (1997)
9. De Sutter, B., Anckaert, B., Geiregat, J., Chanet, D., De Bosschere, K.: Instruction
set limitation in support of software diversity. In: Lee, P.J., Cheon, J.H. (eds.)
ICISC 2008. LNCS, vol. 5461, pp. 152–165. Springer, Heidelberg (2009)
10. Franz, M.: E unibus pluram: massive-scale software diversity as a defense mecha-
nism. In: Proceedings of the 2010 Workshop on New Security Paradigms. ACM,
New York (2010)
11. Halstead, M.: Elements of software science. Elsevier, New York (1977)
12. Harrison, W., Magel, K.: A complexity measure based on nesting level. ACM Sig-
plan Notices 16(3) (1981)
13. Henry, S., Kafura, D.: Software Structure Metrics Based on Information Flow.
IEEE Transactions on Software Engineering 7(5), 510–518 (1981)
14. Jacob, M., Boneh, D., Felten, E.: Attacking an obfuscated cipher by injecting faults.
In: Digital Rights Management, pp. 16–31 (2003)
15. Kinder, J., Veith, H.: Jakstab: A static analysis platform for binaries. In: Gupta,
A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 423–427. Springer, Heidelberg
(2008)
16. Linn, C., Debray, S.: Obfuscation of executable code to improve resistance to static
disassembly. In: Proceedings of the 10th ACM Conference on Computer and Com-
munications Security. ACM, New York (2003)
17. Lynn, B., Prabhakaran, M., Sahai, A.: Positive results and techniques for obfusca-
tion. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027,
pp. 20–39. Springer, Heidelberg (2004)
18. Madou, M., Anckaert, B., De Sutter, B., De Bosschere, K.: Hybrid static-dynamic
attacks against software protection mechanisms. In: Proceedings of the 5th ACM
Workshop on Digital Rights Management, pp. 75–82. ACM, New York (2005)
19. McCabe, T.: A complexity measure. IEEE Transactions on Software Engineering
(1976)
20. Michiels, W., Gorissen, P.: Mechanism for software tamper resistance: an applica-
tion of white-box cryptography. In: Proceedings of the 2007 ACM Workshop on
Digital Rights Management, pp. 82–89. ACM, New York (2007)
21. Munson, J.C., Khoshgoftaar, T.M.: Measurement of data structure complexity. Journal
of Systems and Software 20(3), 217–225 (1993)
22. Oviedo, E.: Control flow, data flow and program complexity. McGraw-Hill, Inc.,
New York (1993)
23. Wee, H.: On obfuscating point functions. In: Proceedings of the Thirty-Seventh
Annual ACM Symposium on Theory of Computing. ACM, New York (2005)
24. Wyseur, B., Michiels, W., Gorissen, P., Preneel, B.: Cryptanalysis of white-box
DES implementations with arbitrary external encodings. In: Adams, C., Miri, A.,
Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 264–277. Springer, Heidelberg
(2007)