Detecting Kernel-Level Rootkits Through Binary Analysis
cuted (using the state previously saved at node 4). Then, the
execution process is rewound to the first checkpoint and
continues via the right path (i.e., via node 3). Again, the
machine state needs to be saved at node 4, and both
alternatives are followed a second time. Thus, a total of four
paths have to be explored as a result of only two branch
instructions.
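The forking strategy just described can be modeled as a depth-first search that copies the machine state at every conditional branch and follows both continuations. The sketch below is a minimal Python model, not the prototype's code. The CFG shape (nodes 1 to 7, with branch(x) ending node 1 and branch(y) ending node 4) is an assumption reconstructed from the node numbers in the example, and the "machine state" is reduced to a hypothetical dictionary of branch outcomes.

```python
from typing import Dict, List, Tuple

# Hypothetical CFG for the two-branch example: node 1 ends in branch(x),
# node 4 ends in branch(y). The first successor is the "left" path.
CFG: Dict[int, List[int]] = {
    1: [2, 3],  # branch(x)
    2: [4],
    3: [4],
    4: [5, 6],  # branch(y)
    5: [7],
    6: [7],
    7: [],      # exit node
}

Path = Tuple[List[int], Dict[int, bool]]


def explore(cfg: Dict[int, List[int]], node: int, state: Dict[int, bool],
            path: List[int], paths: List[Path]) -> None:
    """Depth-first exploration that forks the machine state at every
    conditional branch and follows both continuations."""
    path = path + [node]
    succs = cfg[node]
    if not succs:                    # exit reached: record one complete path
        paths.append((path, state))
        return
    if len(succs) == 1:              # straight-line code: nothing to save
        explore(cfg, succs[0], state, path, paths)
        return
    for taken, succ in zip((True, False), succs):
        saved = dict(state)          # copy of the state saved at this branch
        saved[node] = taken          # record which way this branch went
        explore(cfg, succ, saved, path, paths)


paths: List[Path] = []
explore(CFG, 1, {}, [], paths)
for p, _ in paths:
    print(p)
print(len(paths), "paths for 2 branches")

# If branch(x) and branch(y) always evaluate to the same value, paths whose
# recorded outcomes differ (via nodes 2 and 6, or 3 and 5) are impossible.
feasible = [p for p, s in paths if s[1] == s[4]]
print(feasible)
```

The final filter only illustrates what an impossible path is under the stated assumption; the prototype itself simply explores every path.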
Also, the system may follow impossible paths. If, in our
example, both the branch(x) and the branch(y) instructions
evaluated to the same boolean value, execution could never
flow through nodes 2 and 6, or through nodes 3 and 5. For our
prototype, the path explosion problem and impossible paths
have not caused any difficulties (refer to Section 4 for the
evaluation of our system), owing to the limited size of the
kernel modules. Therefore, we save the machine state at every
conditional branch instruction and explore both alternative
continuations.

Another problem is the presence of loops. Because the machine
state is saved at every branch instruction and both
alternatives are explored one after another, the existence of
a loop [...] Instead, a more sophisticated algorithm based on
the control flow graph of the binary is necessary. In [1], a
suitable algorithm is presented that is based on dominator
trees. This algorithm operates on the control flow graph and
can detect (and remove) the back-edges of loops. Simply put, a
back-edge is the jump from the end of the loop body back to the
loop header, and it is usually the edge that a human looking at
the control flow graph would identify as the
“loop-defining edge”. For example, Figure 2 shows a control
flow graph with a loop and the corresponding back-edge.

Figure 2. Control flow graph with loop.
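The back-edge test can be sketched in a few lines. This is a minimal Python illustration, not the prototype's implementation: it computes dominator sets with the simple iterative data-flow method instead of building an explicit dominator tree, and then applies the classic rule that an edge u -> v is a back-edge when v dominates u. The loop CFG and its node names are hypothetical, chosen so that the conditional branch at the end of the loop body jumps back to the header.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical loop CFG: the conditional branch at the end of "body" either
# jumps back to "header" or falls through to "exit".
CFG: Dict[str, List[str]] = {
    "entry": ["header"],
    "header": ["body"],
    "body": ["header", "exit"],
    "exit": [],
}


def dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iterative data-flow computation: dom(n) = {n} | intersection of dom(p)
    over all predecessors p of n, starting from 'every node dominates n'."""
    nodes = set(cfg)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for u, succs in cfg.items():
        for v in succs:
            preds[v].add(u)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def back_edges(cfg: Dict[str, List[str]], entry: str) -> List[Tuple[str, str]]:
    """An edge u -> v is a back-edge when its target v dominates its source u."""
    dom = dominators(cfg, entry)
    return sorted((u, v) for u, succs in cfg.items() for v in succs if v in dom[u])


edges = back_edges(CFG, "entry")
# Conditional branches (more than one successor) with a back-edge continuation:
# these are the instructions at which no machine state would be saved.
tagged = {u for u, _ in edges if len(CFG[u]) > 1}
print(edges)   # [('body', 'header')]
print(tagged)  # {'body'}
```

The iterative method is quadratic in the worst case but is short and easy to verify; a dominator-tree construction such as the one referenced in [1] yields the same back-edges more efficiently on large graphs.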
For our system, we first create a control flow graph of the
kernel module code after it has been preprocessed. Then, a
loop detection algorithm is run and the back-edges are
detected. Each conditional branch instruction that has a
back-edge as a possible continuation is tagged appropriately.
During symbolic execution, no machine state is saved at these
instructions, and processing continues only at the
non-back-edge alternative. This means that our system executes
a loop at most once. For future work, we intend to replace this
simple approach with more advanced algorithms for the symbolic
execution of loops. Note, however, that more sophisticated
algorithms that attempt to execute a loop multiple times will
eventually hit the limits defined by the halting problem. Thus,
every approach has to accept a certain degree of
incompleteness that could lead to incorrect results.

A final problem is posed by indirect jumps that are based on
unknown values. In such cases, it might be possible to
heuristically choose possible targets and speculatively
continue the execution process there. In our current
prototype, however, we simply terminate control flow at these
points, because indirect jumps based on unknown values almost
never occurred in our experiments.

4. Evaluation

The proposed rootkit detection algorithm was implemented as a
user space prototype that simulated the object parsing and
symbol resolution performed by the existing kernel module
loader before disassembling the module and analyzing the code
for the presence of malicious writes to kernel memory. The
prototype implementation was evaluated with respect to its
detection capabilities and its performance impact on
production systems. To this end, an experiment was devised in
which the prototype was run on several sets of kernel modules.
Detection capability for each set was evaluated in terms of
false positive rates for legitimate modules and false negative
rates for rootkit modules. Detection performance was evaluated
in terms of the total execution time of the prototype for each
module analyzed. The evaluation itself was conducted on a
testbed consisting of a single default Fedora Core 1 Linux
installation on a Pentium IV 2.0 GHz workstation with 1 GB of
RAM.

4.1. Detection Results

For the detection evaluation, three sets of kernel modules
were created. The first set comprised the knark and [...]
handlers, while adore-ng patches itself into the VFS layer of
the kernel to intercept accesses to the /proc file system.
Since each rootkit was extensively analyzed during the
prototype development phase, it was expected that all
malicious kernel accesses would be discovered.

The second set consisted of seven additional popular rootkits
downloaded from the Internet, described in Table 1. Since these
rootkits were not analyzed during the prototype development
phase, the detection rate for this group can be considered a
measure of the generality of the detection technique as applied
against previously unknown rootkits that subvert the kernel by
means similar to those used by knark and adore-ng.

The final set consisted of a control group of legitimate kernel
modules, namely the entire default set of kernel modules for
the Fedora Core 1 Linux x86 distribution. This set includes 985
modules implementing various components of the Linux kernel,
including networking protocols (e.g., IPv6), bus protocols
(e.g., USB), file systems (e.g., EXT3), and device drivers
(e.g., network interfaces, video cards). It was assumed that no
modules incorporating rootkit functionality were present in
this set.

Table 2 presents the results of the detection evaluation for
each of the three sets of modules. As expected, all malicious
writes to kernel memory by both knark and adore-ng were
detected, resulting in a false negative rate of 0% for both
rootkits. All malicious writes by each evaluation rootkit were
detected as well, resulting in a false negative rate of 0% for
this set. We interpret this result as an indication that the
detection technique generalizes well to previously unseen
rootkits. Finally, the prototype reported no malicious writes
for the control group, resulting in a false positive rate of
0%. We thus conclude that, on the evaluated module sets, the
detection algorithm is completely successful in distinguishing
rootkits exhibiting the specified malicious behavior from
legitimate kernel modules, as no misclassifications occurred
during the entire detection evaluation.

    scan: initializing scan for rootkits/all-root.o
    scan: loading kernel symbol table from boot/System.map
    scan: kernel memory configured [c0100000-c041eaf8]
    scan: resolving external symbols in section .text
    scan: disassembling section .text
    scan: performing scan from [.text+40]
    scan: WRITE TO KERNEL MEMORY [c0347df0] at [.text+50]
    scan: 1 malicious write detected, denying module load

Figure 5. all-root initialization function.

For the performance evaluation, the elapsed execution
time of the analysis phase of the prototype was recorded
for all modules, legitimate and malicious. Time spent pars-
ing the object file and patching relocation table entries into the module was excluded, as these functions are already per-
formed as part of the normal operation of the existing mod-
ule loader. The goal of the evaluation was to provide some
indication about the performance overhead introduced by the
detection process in the loading of a module in a production
kernel. Note that, as mentioned previously, our technique
generates no runtime overhead after the module has been loaded.

1 Note that this disassembly was generated prior to kernel
symbol resolution; thus, the displayed read and write accesses
are performed on placeholder addresses. At runtime and for the
symbolic execution, the proper memory address would be patched
into the code.

Figure 6. Detection overhead on module load (number of modules,
log scale, versus execution time in ms).

Figure 6 shows the elapsed execution time of all evaluated
modules, grouped into buckets with a width of 10 ms, with the
number of modules per bucket plotted on a log scale. As we can
see, the vast majority of modules would experience a delay of
10 ms or less during module load. Several modules with more
complex initialization procedures (and thus complex control
flow graphs) required more time to analyze fully, but, as Table
3 shows, the detection algorithm never spent more than 420 ms
to classify a module as malicious or legitimate. Thus, we
conclude that the impact of the detection algorithm on the
module load operation is acceptable for a production system.

    Minimum    Maximum     Median     Std. Deviation
    0.00 ms    420.00 ms   0.00 ms    39.83 ms

Table 3. Detection overhead statistics.

5. Discussion

Our prototype is a user-space program that statically analyzes
Linux loadable kernel modules for the presence of rootkit
functionality. These modules have to be ELF object files that
are compiled for the Intel x86 architecture.

The limitation on the classes of modules that can be analyzed
stems from the fact that a kernel module needs to be parsed and
its code sections disassembled before the actual analysis can
start. Therefore, additional parsing and disassembly routines
would be necessary to process different object file formats or
instruction sets. Because the vast majority of Linux systems
run on Intel x86 machines, and because Linux kernel modules
have to be provided as ELF object files, we developed our
prototype for this combination. The analysis technique itself,
however, can be readily extended to other systems.

Our tool is currently available as a user program only. In
order to provide automatic protection from rootkits, it would
be necessary to integrate our analyzer into the kernel's module
loading infrastructure. As an additional requirement, it must
not be possible to bypass the analyzer when a process with root
permissions attempts to load a module. The reason is that
kernel modules can only be inserted by the root user. Thus, the
threat model has to assume that the attacker has superuser
privileges when attempting to load a kernel module.

Up to Linux 2.4, most of the work of the module loading process
was done in user space by the insmod program. In this case,
adding our checker to insmod would not be useful, because an
attacker can simply supply a customized version without checks.
The solution is to move the analyzer code into kernel space.
Interestingly, starting from Linux 2.5, most of the module
loading code has been moved into kernel space, providing an
optimal place to add our checks.

Unfortunately, mechanisms have been proposed to inject code
directly into the kernel without using the module loading
interface. These ideas were motivated by the fact that some
system administrators disabled the module loading functionality
as a defense against kernel-level rootkits. These mechanisms
operate by writing the code directly into kernel space via the
/dev/kmem device, completely bypassing the module loading code.

In our opinion, a sensible and secure solution would disallow
modifications of kernel memory via /dev/kmem, a feature that is
already offered by Linux security solutions such as
grsecurity [5]. In addition, our kernel-level rootkit analysis
system would operate in kernel context behind the module
loading interface, thus having the opportunity to statically
scan each module before it gets to run as part of the kernel.

A possible way for rootkits to evade the behavioral
specification that is based on forbidden kernel symbols (see
Section 3 for details) is to stop using these symbols. However,
to perform the necessary modifications of the kernel data
structures or function pointers, their addresses are needed.
Therefore, alternative approaches to resolving these addresses
are required. One option is to use a brute force guessing
technique that works by scanning the kernel memory for the
occurrence of “known content” that is stored at the target
location. This is particularly effective for the system call
table, whose content is known because its entries are pointers
to handler functions whose symbols are exported.

Although a brute force guessing approach might not always be
suitable, we propose the addition of a specification that
considers the scanning of kernel memory as another indication
of the presence of a rootkit. This specification checks for
loops that, starting from any kernel symbol, sequentially read
data and compare this data to constant values. Also, note that
the specification that checks for illegitimate memory accesses
based on actual destination addresses works independently of
the kernel symbols referenced by the module.

6. Conclusions

Rootkits are powerful attack tools that are used by intruders to
hide their presence from system administrators. Kernel-level
rootkits, in particular, directly modify the kernel and,
therefore, can intercept and prevent any attempt by an
administrator to determine whether the security of the system
has been violated. Because of this, it is important to devise
mechanisms that can protect the integrity of the kernel even in
the aftermath of the compromise of the administrator account.

This paper presents a technique that is based on static
analysis to identify instruction sequences that are an
indication of rootkits. Informal behavioral specifications
define such characteristic instruction sequences as data
transfer operations that write to certain illegitimate kernel
memory areas. Symbolic execution is then used to simulate the
execution of the kernel module to detect instructions that
fulfill these specifications. Through this method, it is
possible to detect malicious behavior before a module is loaded
into the kernel and, in addition, to operate on closed-source
components, such as proprietary drivers.

We implemented our technique in a prototype tool and evaluated
both the effectiveness and the performance of the tool with
respect to nine real-world rootkits as well as the complete set
of 985 legitimate kernel modules that are included with the
Fedora Core 1 Linux distribution. The results show that all
tested rootkits were successfully identified, and no false
positives were raised on legitimate modules. We thus conclude
that the technique can reliably detect malicious kernel modules
and, therefore, represents a useful tool to harden the operating
system kernel. In addition, we show that detection can be done
efficiently, despite the application of a potentially expensive
static analysis technique.

Future work will center on devising a more formal description
of the aspects that characterize rootkit-like behavior. In
addition, we plan to study how attacks that attempt to bypass
our detection procedures can be prevented. Finally, we intend
to integrate the detection component into the kernel module
loader infrastructure as a step towards preparing the system
for general usage.

Acknowledgments

This research was supported by the Army Research Office, under
agreement DAAD19-01-1-0484, and by the National Science
Foundation under grants CCR-0209065 and CCR-0238492.

References

[1] A. Aho, R. Sethi, and J. Ullman. Compilers – Principles,
Techniques, and Tools. World Student Series of Computer
Science. Addison Wesley, 1986.
[2] S. Aubert. rkscan: Rootkit Scanner.
http://www.hsc.fr/ressources/outils/rkscan/index.html.en, 2004.
[3] Black Tie Affair. Hiding Out Under UNIX. Phrack Magazine,
3(25), 1989.
[4] FuSyS. Kstat v. 1.1-2. http://s0ftpj.org/, November 2002.
[5] grsecurity. An innovative approach to security utilizing a
multi-layered detection, prevention, and containment model.
http://www.grsecurity.net/, 2004.
[6] Halflife. Abuse of the Linux Kernel for Fun and Profit.
Phrack Magazine, 7(50), April 1997.
[7] G. Kim and E. Spafford. The Design and Implementation of
Tripwire: A File System Integrity Checker. Technical report,
Purdue University, November 1993.
[8] T. Lawless. St. Michael and St. Jude.
http://sourceforge.net/projects/stjude/, 2004.
[9] T. Miller. T0rn rootkit analysis.
http://www.ossec.net/rootkits/studies/t0rn.txt.
[10] T. Miller. Analysis of the KNARK Rootkit.
http://www.ossec.net/rootkits/studies/knark.txt, 2004.
[11] N. Murilo and K. Steding-Jessen. Chkrootkit v. 0.43.
http://www.chkrootkit.org/.
[12] D. Safford. The Need for TCPA. IBM White Paper, October 2002.
[13] sd and devik. Linux on-the-fly kernel patching without LKM.
Phrack Magazine, 11(58), 2001.
[14] Stealth. adore. http://spider.scorpions.net/~stealth, 2001.
[15] Stealth. Kernel Rootkit Experiences and the Future. Phrack
Magazine, 11(61), August 2003.
[16] Stealth. adore-ng. http://stealth.7350.org/rootkits/, 2004.
[17] TCG. Trusted Computing Group Home.
https://www.trustedcomputinggroup.org/home, 2004.