Detecting Kernel-Level Rootkits Through Binary Analysis

Christopher Kruegel
Technical University Vienna
chris@auto.tuwien.ac.at

William Robertson and Giovanni Vigna
Reliable Software Group, University of California, Santa Barbara
{wkr,vigna}@cs.ucsb.edu
Abstract

A rootkit is a collection of tools used by intruders to keep the legitimate users and administrators of a compromised machine unaware of their presence. Originally, rootkits mainly included modified versions of system auditing programs (e.g., ps or netstat on a Unix system). However, for operating systems that support loadable kernel modules (e.g., Linux and Solaris), a new type of rootkit has recently emerged. These rootkits are implemented as kernel modules, and they do not require modification of user-space binaries to conceal malicious activity. Instead, these rootkits operate within the kernel, modifying critical data structures such as the system call table or the list of currently-loaded kernel modules.

This paper presents a technique that exploits binary analysis to ascertain, at load time, if a module's behavior resembles the behavior of a rootkit. Through this method, it is possible to provide additional protection against this type of malicious modification of the kernel. Our technique relies on an abstract model of module behavior that is not affected by small changes in the binary image of the module. Therefore, the technique is resistant to attempts to conceal the malicious nature of a kernel module.

1. Introduction

Most intrusions and computer security incidents follow a common pattern where a remote user scans a target system for vulnerable services, launches an attack to gain some type of access to the system, and, eventually, escalates her privileges. These privileges are then used to create backdoors that will allow the attacker to return to the system at a later time. In addition, actions are taken to hide the evidence that the system has been compromised in order to prevent the system administrator from noticing the security breach and implementing countermeasures (e.g., reinstalling the system).

The tools used by an attacker after gaining administrative privileges include tools to hide the presence of the attacker (e.g., log editors), utilities to gather information about the system and its environment (e.g., network sniffers), tools to ensure that the attacker can regain access at a later time (e.g., backdoored servers), and means of attacking other systems. Common tools have been bundled by the hacker community into "easy-to-use" kits, called rootkits [3].

Even though the idea of a rootkit is to provide all the tools that may be needed after a system has been compromised, rootkits focus in particular on backdoored programs and tools to hide the attacker from the system administrator. Originally, rootkits mainly included modified versions of system auditing programs (e.g., ps or netstat for Unix systems) [9]. These modified programs do not return any information to the administrator that involves specific files and processes used by the intruder. Such tools, however, are easily detected using file integrity checkers such as Tripwire [7].

Recently, a new type of rootkit has emerged. These rootkits are implemented as loadable kernel modules (LKMs). A loadable kernel module is an extension to the operating system (e.g., a device driver) that can be loaded into and unloaded from the kernel at runtime. Solaris and Linux are two popular operating systems that support this type of runtime kernel extension.

By implementing a rootkit as a kernel module, it is possible to modify critical kernel data structures (such as the system call table, the list of active processes, or the list of kernel modules) or intercept requests to the kernel regarding files and processes that are created by an intruder [10, 14, 15]. Once the kernel is infected, it is very hard to determine if a system has been compromised without the help of hardware extensions such as the Trusted Platform Module (TPM) [17, 12]. Therefore, it is important that mechanisms are in place to detect kernel rootkits and prevent their insertion into the kernel.
In this paper, we present a technique for the detection of kernel-level rootkits in the Linux operating system. The technique is based on static analysis of loadable kernel module binaries. More precisely, the use of behavioral specifications and symbolic execution allows one to determine if the module being loaded includes evidence of malicious intent.

The contribution of this approach is twofold. First, by using static analysis, our technique is able to determine if a kernel module is malicious before the kernel module is actually loaded into the kernel and executed. This is a major advantage, because once the kernel image has been modified it may become infeasible to perform dynamic analysis of the module's actions in a reliable way. Second, the technique is applied to the binary image of a module and does not require access to the module's source code. Because of this, the technique is widely applicable and it is possible to analyze the behavior of device drivers and other closed-source kernel components that are distributed in binary form only.

The rest of the paper is structured as follows. Section 2 discusses related work on rootkits and rootkit detection. Section 3 presents our approach to the detection of kernel-level rootkits. Then, Section 4 provides an experimental evaluation of the effectiveness and efficiency of our technique. Finally, Section 5 discusses possible limitations of the current prototype, while Section 6 briefly concludes.

2. Related Work

Kernel-level rootkits have been circulating in the underground hacker community for some time and in different forms [6]. In general, there are different means that can be used to modify kernel behavior.

The most common way of modifying the kernel is by inserting a loadable kernel module. The module has access to the symbols exported by the kernel and can modify any data structure or function pointer that is accessible. Typically, these kernel-level rootkits "hijack" entries in the system call table and provide modified implementations of the corresponding system call functions [10, 14]. These modified system calls often perform checks on the data passed back to a user process and can thus efficiently hide information about files and processes. An interesting variation is implemented by the adore-ng rootkit [15, 16]. In this case, the rootkit does not touch the system call table but hijacks the routines used by the Virtual File System (VFS), and, therefore, it is able to intercept (and modify) calls that access files in both the /proc file system and the root file system.

A related technique injects malicious code directly into existing kernel modules instead of providing a complete rootkit module. While this solution is in principle similar to the insertion of a rootkit kernel module, it has the advantage that the modification will survive a kernel reboot procedure if the modified module is automatically loaded in the kernel standard configuration. On the other hand, this technique requires the modification of a binary that is stored on the file system, and, therefore, it may be detected using integrity checkers.

Another way to modify the behavior of the kernel is to access kernel memory directly from user space through the /dev/kmem file. This technique (used, for example, by SucKIT [13]) requires the identification of data structures that need to be modified within the kernel image. However, this is not impossible; in particular, well-known data structures such as the system call table are relatively easy to locate.

Kernel-level rootkits can be detected in a number of different ways. The most basic techniques include searching for modified kernel modules on disk, searching for known strings in existing binaries, or searching for configuration files associated with specific rootkits. The problem is that when a system has been compromised at the kernel level, there is no guarantee that these detection tools will return reliable results. This is also true for signature-based rootkit detection tools such as chkrootkit [11] that rely on operating system services to scan a machine for indications of known rootkits.

To circumvent the problem of a possibly untrusted operating system, rootkit scanners such as kstat [4], rkscan [2], or St. Michael [8] follow a different approach. These tools are either implemented as kernel modules with direct access to kernel memory, or they analyze the contents of the kernel memory via /dev/kmem. Both techniques allow the programs to monitor the integrity of important kernel data structures without the use of system calls. For example, by comparing the system call addresses in the system call table with known good values (taken from the /boot/System.map file), it is possible to identify hijacked system call entries.

This approach is less prone to being foiled by a kernel-level rootkit because kernel memory is accessed directly. Nevertheless, changes can only be detected after a rootkit has been installed. In this case, the rootkit had the chance to execute arbitrary code in the context of the kernel. Thus, it is possible that actions have been performed to thwart or disable rootkit scanners. Also, rootkits can carry out changes at locations that are not monitored (e.g., task structures).
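To make this comparison concrete, the following user-space C sketch shows one way the known-good address of a symbol such as sys_call_table could be read from /boot/System.map; the command-line handling and the minimal parsing are our own assumptions, not code from the tools cited above.

    #include <stdio.h>
    #include <string.h>

    /*
     * Look up a symbol in a System.map-style file. Each line has the form
     * "<hex address> <type> <name>". Returns the address, or 0 if the
     * symbol is not found.
     */
    static unsigned long lookup_symbol(const char *map_path, const char *name)
    {
        FILE *map = fopen(map_path, "r");
        char line[256];
        unsigned long addr = 0;

        if (map == NULL)
            return 0;

        while (fgets(line, sizeof(line), map) != NULL) {
            unsigned long a;
            char type;
            char sym[128];

            if (sscanf(line, "%lx %c %127s", &a, &type, sym) == 3 &&
                strcmp(sym, name) == 0) {
                addr = a;
                break;
            }
        }
        fclose(map);
        return addr;
    }

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "/boot/System.map";
        const char *name = (argc > 2) ? argv[2] : "sys_call_table";
        unsigned long addr = lookup_symbol(path, name);

        if (addr == 0) {
            fprintf(stderr, "symbol %s not found in %s\n", name, path);
            return 1;
        }
        /* A scanner would compare the live system call table entries,
         * read via /dev/kmem, against values derived from this address. */
        printf("%s = 0x%lx\n", name, addr);
        return 0;
    }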
3. Rootkit Detection

The idea for our detection approach is based on the observation that the runtime behavior of regular kernel modules (e.g., device drivers) differs significantly from the behavior of kernel-level rootkits. We note that regular modules have different goals than rootkits, and thus implement different functionality.

The main contribution of this paper is that we show that it is possible to distinguish between regular modules and rootkits by statically analyzing kernel module binaries. The analysis is performed in two steps. First, we have to specify undesirable behavior. Second, each kernel module binary is statically analyzed for the presence of instruction sequences that implement these specifications.

Currently, our specifications are given informally, and the analysis step has to be adjusted appropriately to deal with new specifications. Although it might be possible to introduce a formal mechanism to model behavioral specifications, it is not necessary for our detection prototype. The reason is that a few general specifications are sufficient to accurately capture the malicious behavior of all LKM-based rootkits. Nevertheless, the analysis technique is powerful enough that it can be easily extended. This may become necessary when rootkit authors actively attempt to evade detection by changing the code such that it does not adhere to any of our specifications.

3.1. Specification of Behavior

A specification of malicious behavior has to model a sequence of instructions that is characteristic for rootkits but that does not appear in regular modules (at least, with a high probability). That is, we have to analyze the behavior of rootkits to derive appropriate specifications that can be used during the analysis step.

In general, kernel modules (e.g., device drivers) initialize their internal data structures during startup and then interact with the kernel via function calls, using both system calls and functions internal to the kernel. In particular, it is not often necessary that a module directly writes to kernel memory. Some exceptions include device drivers that read from and write to memory areas that are associated with a managed device and that are mapped into the kernel address space to provide more efficient access, or modules that overwrite function pointers to register themselves for event callbacks.

Kernel-level rootkits, on the other hand, usually write directly to kernel memory to alter important system management data structures. The purpose is to intercept the regular control flow of the kernel when system services are requested by a user process. This is done in order to monitor or change the results that are returned by these services to the user process. Because system calls are the most obvious entry point for requesting kernel services, the earliest kernel-level rootkits modified the system call table accordingly. For example, one of the first actions of the knark [10] rootkit is to replace entries in the system call table with customized functions to hide files and processes.

In newer kernel releases, the system call table is no longer exported by the kernel, and thus it cannot be directly accessed by kernel modules. Therefore, alternative approaches to influence the results of operating system services have been proposed. One such solution is to monitor accesses to the /proc file system. This is accomplished by changing the function addresses in the /proc file system root node that point to the corresponding read and write functions. Because the /proc file system is used by many auditing tools to gather information about the system (e.g., about running processes, or open network connections), a rootkit can easily hide important information by filtering the output that is passed back to the user process. An example of this approach is the adore-ng rootkit [16], which replaces functions of the virtual file system (VFS) node of the /proc file system.

As a general observation, we note that rootkits perform writes to a number of locations in the kernel address space that are usually not touched by regular modules. These writes are necessary either to obtain control over system services (e.g., by changing the system call table, file system functions, or the list of active processes) or to hide the presence of the kernel rootkit itself (e.g., by modifying the list of installed modules). Because write operations to operating system management structures are required to implement the needed functionality, and because these writes are unique to kernel rootkits, they present a salient opportunity to specify malicious behavior.

To be more precise, we identify a loadable kernel module as a rootkit based on the following two behavioral specifications:

1. The module contains a data transfer instruction that performs a write operation to an illegal memory area, or

2. the module contains an instruction sequence that i) uses a forbidden kernel symbol reference to calculate an address in the kernel's address space and ii) performs a write operation using this address.

Whenever the destination address of a data transfer can be determined statically during the analysis step, it is possible to check whether this address is within a legitimate area. The notion of legitimate areas is defined by a white-list that specifies the kernel addresses that can be safely written to. For our current system, these areas include function pointers used as event call-back hooks (e.g., br_ioctl_hook()) or exported arrays (e.g., blk_dev).

One drawback of the first specification is the fact that the destination address must be derivable during the static analysis process. Therefore, a complementary specification is introduced that checks for writes to any memory address that is calculated using a forbidden kernel symbol.
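The following minimal C sketch illustrates how the first specification could be checked once a destination address is known statically: the write is accepted only if it falls into a white-listed region. The region boundaries and the flagged address are invented placeholders, not values from a real kernel, and the code is a sketch of the idea rather than the prototype's implementation.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /*
     * Sketch of the first specification: a write whose destination address
     * is known statically is accepted only if it falls into a white-listed
     * region. The region boundaries below are invented placeholders.
     */
    struct region {
        unsigned long start;
        unsigned long end;   /* exclusive */
        const char   *what;
    };

    static const struct region white_list[] = {
        { 0xc03f1000, 0xc03f1004, "br_ioctl_hook (call-back pointer)" },
        { 0xc0400000, 0xc0400400, "blk_dev (exported array)" },
    };

    static bool write_is_legitimate(unsigned long dest)
    {
        size_t i;

        for (i = 0; i < sizeof(white_list) / sizeof(white_list[0]); i++) {
            if (dest >= white_list[i].start && dest < white_list[i].end)
                return true;
        }
        return false;
    }

    int main(void)
    {
        /* e.g., a statically derived destination inside the system call table */
        unsigned long dest = 0xc0347df0;

        if (!write_is_legitimate(dest))
            printf("write to kernel memory [%08lx]: flag module as rootkit\n", dest);
        return 0;
    }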
A kernel symbol refers to a kernel variable with its corresponding address that is exported by the kernel (e.g., via /proc/ksyms). These symbols are needed by the module loader, which loads and inserts modules into the kernel address space. When a kernel module is loaded, all references to external variables that are declared in this module but defined in the kernel (or in other modules) have to be patched appropriately. This patching process is performed by substituting the place holder addresses of the declared variables in the module with the actual addresses of the corresponding symbols in the kernel.

The notion of forbidden kernel symbols can be based on black-lists or white-lists. A black-list approach enumerates all forbidden symbols that are likely to be misused by rootkits, for example, the system call table, the root of the /proc file system, the list of modules, or the task structure list. A white-list, on the other hand, explicitly defines acceptable kernel symbols that can legitimately be accessed by modules. As usual, a white-list approach is more restrictive, but may lead to false positives when a module references a legitimate but infrequently used kernel symbol that has not been allowed previously. However, following the principle of fail-safe defaults, a white-list also provides greater assurance that the detection process cannot be circumvented.

Note that it is not necessarily malicious when a forbidden kernel symbol is declared by a module. When such a symbol is not used for a write access, it is not problematic. Therefore, we cannot reject a module as a rootkit by checking the declared symbols only.

Also, it is not sufficient to check for writes that target a forbidden symbol directly. Often, kernel rootkits use such symbols as a starting point for more complex address calculations. For example, to access an entry in the system call table, the sys_call_table symbol is used as a base address that is increased by a fixed offset. Another example is the module list pointer that is used to traverse a linked list of module elements until the one that should be removed is reached. Therefore, a more extensive analysis has to be performed to also track indirect uses of forbidden kernel symbols for write accesses.

A clever intruder could use an attack in which two modules cooperate to evade detection. In this attack, a first module only reads the sensitive address (e.g., the address of the system call table) and then it exports a function to access the address. A second module then reads the sensitive address indirectly from the first module and uses it for an illegal write access. To thwart this evasion, all symbols and return values of functions declared by other kernel modules are also marked as forbidden. Thus, when the second module accesses the function exported by the first module, the return value is tagged as forbidden, and subsequent write operations based on this value would result in an alarm.

Naturally, there is an arms race between rootkits that use more sophisticated methods to obtain kernel addresses and our detection system that relies on specifications of malicious behavior. For current rootkits, our basic specifications allow for reliable detection with no false positives (see Section 4 for details). However, it might be possible to circumvent these specifications. In that case, it is necessary to provide more elaborate descriptions of malicious behavior.

Note that our behavioral specifications have the advantage that they provide a general model of undesirable behavior. That is, these specifications characterize an entire class of malicious actions. This is different from fine-grained specifications that need to be tailored to individual kernel modules. In addition, behavioral specifications have the potential to detect previously unknown rootkits. In contrast to approaches that rely on anti-virus-like pattern matching techniques, our tool can detect any kernel-level rootkit that satisfies our assumptions.

3.2. Symbolic Execution

Based on the specifications introduced in the previous section, the task of the analysis step is to statically check the module binary for instructions that correspond to these specifications. When such instructions are found, the module is labeled as a rootkit.

We perform analysis on binaries using symbolic execution. Symbolic execution is a static analysis technique in which program execution is simulated using symbols, such as variable names, rather than actual values for input data. The program state and outputs are then expressed as mathematical (or logical) expressions involving these symbols. When performing symbolic execution, the program is basically executed with all possible input values simultaneously, thus allowing one to make statements about the program behavior.

One problem with symbolic execution is the fact that it is impossible to make statements about arbitrary programs in general, due to the halting problem. However, when the completeness requirement is relaxed, it is often possible to obtain useful results in practice. Relaxing the completeness requirement implies that the analysis is not guaranteed to detect malicious instruction sequences in all cases. However, this can be tolerated when most relevant instances are found.

In order to simulate the execution of a program, or, in our case, the execution of a loadable kernel module, it is necessary to perform two preprocessing steps.

First, the code sections of the binary have to be disassembled. In this step, the machine instructions have to be extracted and converted into a format that is suitable for symbolic execution. That is, it is not sufficient to simply print out the syntax of instructions, as done by programs such as objdump. Instead, the type of the operation and its operands have to be parsed into an internal representation. The disassembly step is complicated by the complexity of the Intel x86 instruction set, which uses a large number of variable-length instructions and many different addressing modes for backwards compatibility reasons.
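As an illustration of what such an internal representation might look like, the following C sketch models an instruction as an operation type plus typed operands; the type and field names are ours and do not reflect the prototype's actual data structures.

    #include <stdio.h>

    /*
     * Sketch of an internal representation for disassembled instructions.
     * Only what the analysis needs is modeled: the kind of operation and
     * its operands, where an operand is a register, an immediate, or a
     * memory reference. All names are illustrative.
     */
    enum op_kind      { OP_MOV, OP_ADD, OP_PUSH, OP_POP, OP_CALL, OP_JMP, OP_JCC, OP_RET };
    enum operand_kind { OPND_NONE, OPND_REG, OPND_IMM, OPND_MEM };

    struct operand {
        enum operand_kind kind;
        int               reg;    /* register number (or base register), if any */
        long              value;  /* immediate value or memory displacement */
    };

    struct insn {
        unsigned long  offset;    /* offset within the .text section */
        enum op_kind   op;
        struct operand dst;
        struct operand src;
    };

    int main(void)
    {
        /* movl $0x0, 0x60 -- a data transfer that writes through an absolute address */
        struct insn write = {
            .offset = 0x50,
            .op     = OP_MOV,
            .dst    = { .kind = OPND_MEM, .reg = -1, .value = 0x60 },
            .src    = { .kind = OPND_IMM, .reg = -1, .value = 0x0 },
        };

        printf("insn at .text+%lx writes to memory: %s\n",
               write.offset, write.dst.kind == OPND_MEM ? "yes" : "no");
        return 0;
    }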
In the second preprocessing step, it is necessary to adjust address operands in all code sections present. The reason is that a Linux loadable kernel module is merely a standard ELF relocatable object file. Therefore, many memory address operands have not been assigned their final values yet. These memory address operands include targets of jump and call instructions but also source and destination locations of load, store, and move instructions.

For a regular relocatable object file, the addresses are adjusted by the linker. To enable the necessary link operations, a relocatable object also contains, besides regular code and data sections, a set of relocation entries. Note, however, that kernel modules are not linked to the kernel code by a regular linker. Instead, the necessary adjustment (i.e., patching) of addresses is performed during module load time by a special module loader. For Linux kernels up to version 2.4, most of the module loader ran in user space; for kernels from version 2.5 and up, much of this functionality was moved into the kernel. To be able to simulate execution, we perform a process similar to linking and substitute place holders in instruction operands and data locations with the real addresses. This has the convenient side effect that we can mark operands that represent forbidden kernel symbols so that the symbolic execution step can later trace their use in write operations.

When the loadable kernel module has been disassembled and the necessary address modifications have occurred, the symbolic execution process can commence. To this end, an initial machine state is created and execution starts with the module's initialization routine, called init_module().

Handling Machine State. The machine state represents a snapshot of the system during symbolic execution. That is, the machine state contains all possible values that could be present in the processor registers and the memory address space of the running process at a certain point during the execution process. Given the notion of a machine state, an instruction can then be defined as a function that maps one machine state into another one. This mapping will reflect the effect of the instruction itself (e.g., a data value is moved from one register to another), but also implicit effects such as incrementing the instruction pointer.

When complete knowledge about the processor and memory state is available, and given the absence of any input and external modifications of the machine state, it would be possible to deterministically simulate the execution of a module. However, in our case, the complexity of such a complete simulation would be tremendous. Therefore, we introduce a number of simplifications that improve the efficiency of the symbolic execution process, while retaining the ability to detect most malicious instruction sequences.

A main simplification is the fact that we consider the initial configuration of the memory content as unknown. This means that whenever a value is taken from memory, a special unknown token is returned. However, it does not imply that all loads from memory are automatically transformed into unknown tokens. When known values are stored at certain memory locations, these values are remembered and can subsequently be loaded. This is particularly common for the stack area when return addresses are pushed on the stack by a call operation and later loaded by the corresponding return instruction.

During symbolic execution, we can simulate the effect of arithmetic, logic, and data transfer instructions. To this end, the values of the operands are calculated and the required operation is performed. When at least one of the operands is an unknown token, the result is also unknown.

Another feature is a tainting mechanism that tags values that are related to the use of forbidden kernel symbols. Whenever a forbidden symbol is used as an operand, even when its value is unknown, the result of the operation is marked as tainted. Whenever a tainted value is later used by another instruction, its result becomes tainted as well. This allows us to detect writes to kernel memory that are based on the use of forbidden symbols.

For the initial machine state, we prepare the processor state such that the instruction pointer register is pointing to the first instruction of the module's initialization routine, while the stack pointer and the base (i.e., frame) pointer register refer to valid addresses on the kernel stack. All other registers and the entire memory are marked as unknown.

Then, instructions are sequentially processed and the machine state is updated accordingly. For each data transfer, it is checked whether data is written to kernel memory areas that are not explicitly permitted by the white-list, or whether data is written to addresses that are tainted because of the use of forbidden symbols.

The execution of instructions continues until execution terminates with the final return instruction of the initialization function, or until a control flow instruction is reached.
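The value domain used during this simulation can be pictured with the following C sketch: a value is unknown or a known constant and may carry a taint flag, unknownness and taint propagate through operations, and every memory write is checked against the taint flag and the white-list. The concrete functions and the example operands are illustrative assumptions, not the prototype's implementation.

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Minimal model of the value domain: a value is either unknown or a
     * known constant, and it may carry a taint flag when it was derived
     * from a forbidden kernel symbol. Unknownness and taint propagate
     * through operations; every simulated memory write is then checked.
     */
    struct value {
        bool          known;
        bool          tainted;
        unsigned long v;       /* meaningful only if known */
    };

    static struct value value_add(struct value a, struct value b)
    {
        struct value r;

        r.known   = a.known && b.known;      /* unknown operands make the result unknown */
        r.tainted = a.tainted || b.tainted;  /* taint is contagious */
        r.v       = r.known ? a.v + b.v : 0;
        return r;
    }

    /* Stand-in for the white-list lookup sketched earlier. */
    static bool white_listed(unsigned long addr)
    {
        (void)addr;
        return false;
    }

    static bool malicious_write(struct value dest)
    {
        if (dest.tainted)
            return true;   /* address derived from a forbidden kernel symbol */
        if (dest.known && !white_listed(dest.v))
            return true;   /* statically known address outside the permitted areas */
        return false;      /* unknown and untainted: cannot be decided */
    }

    int main(void)
    {
        /* a forbidden symbol (unknown, tainted) plus a constant offset */
        struct value base   = { .known = false, .tainted = true,  .v = 0 };
        struct value offset = { .known = true,  .tainted = false, .v = 4 * 24 };
        struct value dest   = value_add(base, offset);

        printf("write flagged: %s\n", malicious_write(dest) ? "yes" : "no");
        return 0;
    }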
Handling Control Flow. Control flow instructions present problems for our analysis when they have two possible successor instructions (i.e., continuations). In this case, the symbolic execution process must either select a continuation to continue at, or a mechanism must be introduced to save the current machine state at the control flow instruction and explore both paths one after the other. In this case, the execution first continues with one path until it terminates and then backs up to the saved machine state and continues with the other alternative.

The only problematic type of control flow instructions are conditional branches. This is because it is not always possible to determine the real target of such a branch operation statically. The most common reason is that the branch condition is based on an unknown value, and thus, both continuations are possible. Neither unconditional jumps nor call instructions are a difficulty because both only have a single target instruction where the execution continues. Also, calls and the corresponding return operations are not problematic because they are handled correctly by the stack, which is part of the machine state.

Because malicious writes can occur on either path after a conditional branch, we chose to save the machine state at these instructions and then consecutively explore both alternative continuations. Unfortunately, this has a number of problems that have to be addressed.

Figure 1. Example control flow graph.

One problem is caused by the exponential explosion of possible paths that need to be followed. Consider the case of multiple branch instructions that are the result of a series of if-else constructs in the corresponding source code (see Figure 1). After each if-else block, the control flow joins. In this example, the machine state needs to be saved at node 1, at the branch(x) instruction. Then, the first path is taken via node 2. The machine state is saved a second time at node 4 and both the left and the right path are subsequently executed (using the state previously saved at node 4). Then, the execution process is rewound to the first check point, and continues via the right path (i.e., via node 3). Again, the machine state needs to be saved at node 4, and both alternatives are followed a second time. Thus, a total of four paths have to be explored as a result of only two branch instructions.

Also, it is possible that impossible paths are being followed. If, in our example, both the branch(x) and the branch(y) instructions evaluated to the same boolean value, it would be impossible that execution flows through nodes 2 and 6, or through nodes 3 and 5. For our prototype, the path explosion problem and impossible paths have not caused any difficulties (refer to Section 4 for the evaluation of our system). This is due to the limited size of the kernel modules. Therefore, we save the machine state at every conditional branch instruction and explore both alternative continuations.

Another problem is the presence of loops. Because the machine state is saved at every branch instruction and both alternatives are explored one after another, the existence of a loop would prevent the execution process from terminating. The reason is that both continuations of the branch that corresponds to the loop termination condition are explored (i.e., the loop body and the code path after the loop). When the path that follows the loop body eventually reaches the loop termination condition again, the state is saved a second time. Then, as usual, both alternative continuations are explored. One of these continuations is, of course, the loop body that leads back to the loop termination condition, where the process repeats.

To force termination of our symbolic execution process, it is necessary to remove control flow loops. Note that it is not sufficient to simply mark nodes in the control flow that have been previously processed. The reason is that nodes can be legitimately processed multiple times without the existence of loops. In the example shown in Figure 1, the symbolic execution processes node 4 twice because of the joining control flows from node 2 and node 3. However, no loop is present, and the analysis should not terminate prematurely when reaching node 4 for the second time.

Figure 2. Control flow graph with loop.

Instead, a more sophisticated algorithm based on the control flow graph of the binary is necessary. In [1], a suitable algorithm is presented that is based on dominator trees. This algorithm operates on the control flow graph and can detect (and remove) the back-edges of loops. Simply speaking, a back-edge is the jump from the end of the loop body back to the loop header, and it is usually the edge that would be identified as the "loop-defining edge" by a human looking at the control flow graph. For example, Figure 2 shows a control flow graph with a loop and the corresponding back-edge.
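The prototype relies on the dominator-tree algorithm from [1]; as a simpler illustration of what a back-edge is, the following C sketch runs a depth-first search over a small hard-coded control flow graph and reports edges whose target is still on the DFS stack, which identifies the back-edge of the loop in a graph shaped like Figure 2. The graph and its node numbering are our own example.

    #include <stdio.h>

    /*
     * Back-edge detection on a small, hard-coded control flow graph using
     * a depth-first search: an edge is a back-edge if its target is still
     * on the DFS stack (i.e., an ancestor of the current node).
     */
    #define N 5

    static const int edge[N][N] = {
        /* 0->1, 1->2, 2->3, 3->1 (back-edge), 3->4 */
        [0] = { [1] = 1 },
        [1] = { [2] = 1 },
        [2] = { [3] = 1 },
        [3] = { [1] = 1, [4] = 1 },
    };

    static int visited[N];
    static int on_stack[N];

    static void dfs(int u)
    {
        int v;

        visited[u] = 1;
        on_stack[u] = 1;
        for (v = 0; v < N; v++) {
            if (!edge[u][v])
                continue;
            if (on_stack[v])
                printf("back-edge %d -> %d (loop header: node %d)\n", u, v, v);
            else if (!visited[v])
                dfs(v);
        }
        on_stack[u] = 0;
    }

    int main(void)
    {
        dfs(0);   /* start at the entry node */
        return 0;
    }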
For our system, we first create a control flow graph of the kernel module code after it has been preprocessed. Then, a loop detection algorithm is run and the back-edges are detected. Each conditional branch instruction that has a back-edge as a possible continuation is tagged appropriately. During symbolic execution, no machine state is saved at these instructions and processing continues only at the non-back-edge alternative. This basically means that a loop is executed at most once by our system. For future work, we intend to replace this simple approach by more advanced algorithms for symbolic execution of loops. Note, however, that more sophisticated algorithms that attempt to execute a loop multiple times will eventually hit the limits defined by the halting problem. Thus, every approach has to accept a certain degree of incompleteness that could lead to incorrect results.

A final problem is posed by indirect jumps that are based on unknown values. In such cases, it might be possible to heuristically choose possible targets and speculatively continue with the execution process there. In our current prototype, however, we simply terminate control flow at these points. The reason is that indirect jumps based on unknown values almost never occurred in our experiments.

4. Evaluation

The proposed rootkit detection algorithm was implemented as a user-space prototype that simulated the object parsing and symbol resolution performed by the existing kernel module loader before disassembling the module and analyzing the code for the presence of malicious writes to kernel memory. The prototype implementation was evaluated with respect to its detection capabilities and performance impact on production systems. To this end, an experiment was devised in which the prototype was run on several sets of kernel modules. Detection capability for each set was evaluated in terms of false positive rates for legitimate modules, and false negative rates for rootkit modules. Detection performance was evaluated in terms of the total execution time of the prototype for each module analyzed. The evaluation itself was conducted on a testbed consisting of a single default Fedora Core 1 Linux installation on a Pentium IV 2.0 GHz workstation with 1 GB of RAM.

4.1. Detection Results

For the detection evaluation, three sets of kernel modules were created. The first set comprised the knark and adore-ng rootkits, both of which were used during development of the prototype. As mentioned previously, both rootkits implement different methods of subverting the control flow of the kernel: knark overwrites entries in the system call table to redirect various system calls to its own handlers, while adore-ng patches itself into the VFS layer of the kernel to intercept accesses to the /proc file system. Since each rootkit was extensively analyzed during the prototype development phase, it was expected that all malicious kernel accesses would be discovered.

The second set consisted of seven additional popular rootkits downloaded from the Internet, described in Table 1. Since these rootkits were not analyzed during the prototype development phase, the detection rate for this group can be considered a measure of the generality of the detection technique as applied against previously unknown rootkits that utilize similar means to subvert the kernel as knark and adore-ng.

The final set consisted of a control group of legitimate kernel modules, namely the entire default set of kernel modules for the Fedora Core 1 Linux x86 distribution. This set includes 985 modules implementing various components of the Linux kernel, including networking protocols (e.g., IPv6), bus protocols (e.g., USB), file systems (e.g., EXT3), and device drivers (e.g., network interfaces, video cards). It was assumed that no modules incorporating rootkit functionality were present in this set.

Table 2 presents the results of the detection evaluation for each of the three sets of modules. As expected, all malicious writes to kernel memory by both knark and adore-ng were detected, resulting in a false negative rate of 0% for both rootkits. All malicious writes by each evaluation rootkit were detected as well, resulting in a false negative rate of 0% for this set. We interpret this result as an indication that the detection technique generalizes well to previously unseen rootkits. Finally, no malicious writes were reported by the prototype for the control group, resulting in a false positive rate of 0%. We thus conclude that the detection algorithm is completely successful in distinguishing rootkits exhibiting specified malicious behavior from legitimate kernel modules, as no misclassifications occurred during the entire detection evaluation.

    scan: initializing scan for rootkits/all-root.o
    scan: loading kernel symbol table from boot/System.map
    scan: kernel memory configured [c0100000-c041eaf8]
    scan: resolving external symbols in section .text
    scan: disassembling section .text
    scan: performing scan from [.text+40]
    scan: WRITE TO KERNEL MEMORY [c0347df0] at [.text+50]
    scan: 1 malicious write detected, denying module load

Figure 3. all-root rootkit analysis.
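The phases visible in this trace suggest the overall structure of the analysis, which the following C skeleton sketches; all function names, messages, and return values are our own stand-ins, since the prototype's internals are not specified at this level of detail.

    #include <stdio.h>

    /*
     * Structural sketch of the analysis pipeline whose phases appear in the
     * Figure 3 trace. Every name and message here is a stand-in.
     */
    static void load_symbol_table(const char *map) { printf("scan: loading kernel symbol table from %s\n", map); }
    static void resolve_externals(void)            { printf("scan: resolving external symbols in section .text\n"); }
    static void disassemble_text(void)             { printf("scan: disassembling section .text\n"); }
    static int  symbolic_scan(void)                { printf("scan: performing scan from [.text+40]\n"); return 1; }

    int main(void)
    {
        printf("scan: initializing scan for rootkits/all-root.o\n");
        load_symbol_table("boot/System.map");
        resolve_externals();
        disassemble_text();

        if (symbolic_scan() > 0) {   /* number of malicious writes found */
            printf("scan: malicious write detected, denying module load\n");
            return 1;
        }
        return 0;
    }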
Rootkit       Technique    Description
adore         syscalls     File, directory, process, and socket hiding;
                           rootshell backdoor
all-root      syscalls     Gives all processes UID 0
kbdv3         syscalls     Gives special user UID 0
kkeylogger    syscalls     Logs keystrokes from local and network logins
rkit          syscalls     Gives special user UID 0
shtroj2       syscalls     Execute arbitrary programs as UID 0
synapsys      syscalls     File, directory, process, socket, and module hiding;
                           gives special user UID 0

Table 1. Evaluation rootkits.

Module Set               Modules Analyzed    Detections    Misclassification Rate
Development rootkits     2                   2             0 (0%)
Evaluation rootkits      6                   6             0 (0%)
Fedora Core 1 modules    985                 0             0 (0%)

Table 2. Detection results.

To verify that the detection algorithm performed correctly on the evaluation rootkits, traces of the analysis performed by the prototype on each rootkit were examined with respect to the corresponding module code. As a simple example, consider the case of the all-root rootkit, the analysis trace of which is shown in Figure 3. From the trace, we can see that one malicious kernel memory write was detected at .text+50 (i.e., at an offset of 50 bytes into the .text section). By examining the disassembly of the all-root module, the relevant portion of which is shown in Figure 4, we can see that the overwrite occurs in the module's initialization function, init_module().(1) Specifically, the movl instruction at .text+50 is flagged as a malicious write to kernel memory. Correlating the disassembly with the corresponding rootkit source code, shown in Figure 5, we can see that this instruction corresponds to the write to the sys_call_table array to replace the getuid() system call handler with the module's malicious version at line 4. Thus, we conclude that the rootkit's attempt to redirect a system call was properly detected.

    00000040 <init_module>:
      40: a1 60 00 00 00          mov    0x60,%eax
      45: 55                      push   %ebp
      46: 89 e5                   mov    %esp,%ebp
      48: a3 00 00 00 00          mov    %eax,0x0
      4d: 5d                      pop    %ebp
      4e: 31 c0                   xor    %eax,%eax
      50: c7 05 60 00 00 00 00    movl   $0x0,0x60
      57: 00 00 00
      5a: c3                      ret

Figure 4. all-root module disassembly.

    1 int init_module(void)
    2 {
    3     orig_getuid = sys_call_table[__NR_getuid];
    4     sys_call_table[__NR_getuid] = give_root;
    5
    6     return 0;
    7 }

Figure 5. all-root initialization function.

(1) Note that this disassembly was generated prior to kernel symbol resolution, thus the displayed read and write accesses are performed on place holder addresses. At runtime and for the symbolic execution, the proper memory address would be patched into the code.
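The patching mentioned in the footnote can be pictured with the following C sketch, which applies an R_386_32-style relocation to the address operand of the mov at offset 0x48 of Figure 4; the target address used here is a made-up placeholder rather than a real kernel address.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Sketch of the address patching described in the footnote: for an
     * R_386_32-style relocation, the 32-bit field at the given offset
     * holds the addend and is replaced by symbol address + addend.
     * The buffer contents and the target address are illustrative.
     */
    static void apply_reloc(uint8_t *text, uint32_t offset, uint32_t sym_addr)
    {
        uint32_t field;

        memcpy(&field, text + offset, sizeof(field));  /* current value = addend */
        field += sym_addr;                             /* S + A */
        memcpy(text + offset, &field, sizeof(field));
    }

    int main(void)
    {
        /* "mov %eax,0x0" at .text+0x48 is the byte 0xa3 followed by a
         * 32-bit address field; the zero placeholder starts at 0x49. */
        uint8_t text[0x60] = { [0x48] = 0xa3 };
        uint32_t patched;

        apply_reloc(text, 0x49, 0xc0347df0u);  /* hypothetical sys_call_table slot */
        memcpy(&patched, text + 0x49, sizeof(patched));
        printf("patched operand at .text+0x49: 0x%08x\n", (unsigned int)patched);
        return 0;
    }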
4.2. Performance Results

For the performance evaluation, the elapsed execution time of the analysis phase of the prototype was recorded for all modules, legitimate and malicious. Time spent parsing the object file and patching relocation table entries into the module was excluded, as these functions are already performed as part of the normal operation of the existing module loader. The goal of the evaluation was to provide some indication about the performance overhead introduced by the detection process in the loading of a module in a production kernel. Note that, as mentioned previously, no runtime overhead is generated by our technique after the module has been loaded.

Figure 6. Detection overhead on module load. (Histogram of the number of modules, log scale from 1 to 1000, per bucket of analysis execution time in milliseconds, 0 to 500.)

Figure 6 shows the elapsed execution time of all evaluated modules, discretized into log-scale buckets with a width of 10 ms. As we can see, the vast majority of modules would experience a delay of 10 ms or less during module load. Several modules with more complex initialization procedures (and thus complex control flow graphs) required more time to fully analyze, but as can be seen in Table 3, the detection algorithm never spent more than 420 ms to classify a module as malicious or legitimate. Thus, we conclude that the impact of the detection algorithm on the module load operation is acceptable for a production system.

Minimum    Maximum      Median     Std. Deviation
0.00 ms    420.00 ms    0.00 ms    39.83 ms

Table 3. Detection overhead statistics.

5. Discussion

Our prototype is a user-space program that statically analyzes Linux loadable kernel modules for the presence of rootkit functionality. These modules have to be ELF object files that are compiled for the Intel x86 architecture.

The limitation on the classes of modules that can be analyzed stems from the fact that a kernel module needs to be parsed and its code sections disassembled before the actual analysis can start. Therefore, additional parsing and disassembly routines would be necessary to process different object file formats or instruction sets. Because a vast majority of Linux systems run on Intel x86 machines, and because Linux kernel modules have to be provided as ELF object files, we developed our prototype for this combination. The analysis technique itself, however, can be readily extended to other systems.

Our tool is currently available as a user program only. In order to provide automatic protection from rootkits, it would be necessary to integrate our analyzer into the kernel's module loading infrastructure. As an additional requirement, the analyzer must not be bypassable when a process with root permissions attempts to load a module. The reason is that kernel modules can only be inserted by the root user. Thus, the threat model has to assume that the attacker has superuser privileges when attempting to load a kernel module.

Up until Linux 2.4, most work of the module loading process was done in user space, using the insmod program. In this case, adding our checker to insmod would not be useful because an attacker can simply supply a customized version without checks. The solution is to move the analyzer code into kernel space. Interestingly, starting from Linux 2.5, most of the module loading code has been moved into the kernel space, providing an optimal place to add our checks.

Unfortunately, mechanisms have been proposed to inject code directly into the kernel without using the module loading interface. These ideas originated from the fact that some system administrators disabled the module loading functionality as a defense against kernel-level rootkits. These mechanisms operate by writing the code directly into kernel space via the /dev/kmem device, completely bypassing the module loading code.

In our opinion, a sensible and secure solution would disallow modifications of kernel memory via /dev/kmem, a feature that is already offered by Linux security solutions such as grsecurity [5]. In addition, our kernel-level rootkit analysis system would operate in kernel context behind the module loading interface, thus having the opportunity to statically scan each module before it gets to run as part of the kernel.

A possible way for rootkits to evade the behavioral specification that is based on forbidden kernel symbols (see Section 3 for details) is to stop using these symbols. However, to perform the necessary modifications of the kernel data structures or function pointers, their addresses are needed. Therefore, alternative approaches to resolving these addresses are required. One option is to use a brute-force guessing technique that works by scanning the kernel memory for the occurrence of "known content" that is stored at the target location. This is particularly effective for the system call table. The reason is that its content is known because system call table entries are pointers to handler functions whose symbols are exported.
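The following user-space C sketch models this kind of scan on a flat array standing in for a kernel image: it searches for a run of known handler addresses and returns the offset at which the table would start. All addresses are invented for illustration.

    #include <stddef.h>
    #include <stdio.h>

    /*
     * Model of the "known content" scan: locate the system call table in
     * a flat memory image by searching for a run of known handler
     * addresses. All values are invented for illustration.
     */
    static long find_table(const unsigned long *mem, size_t n,
                           const unsigned long *known, size_t k)
    {
        size_t i, j;

        for (i = 0; i + k <= n; i++) {
            for (j = 0; j < k; j++) {
                if (mem[i + j] != known[j])
                    break;
            }
            if (j == k)
                return (long)i;   /* word offset at which the table starts */
        }
        return -1;
    }

    int main(void)
    {
        /* a pretend kernel image: some noise, then three "handler addresses" */
        unsigned long image[]    = { 0x0, 0xdeadbeef, 0xc0106f10, 0xc0108a40, 0xc010b3c0, 0x0 };
        unsigned long handlers[] = { 0xc0106f10, 0xc0108a40, 0xc010b3c0 };

        long off = find_table(image, sizeof(image) / sizeof(image[0]),
                              handlers, sizeof(handlers) / sizeof(handlers[0]));
        printf("table found at word offset %ld\n", off);
        return 0;
    }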
Although a brute-force guessing approach might not always be suitable, we propose the addition of a specification that considers the scanning of kernel memory as another indication of the presence of a rootkit. This specification checks for loops that, starting from any kernel symbol, sequentially read data and compare this data to constant values. Also, note that the specification that checks for illegitimate memory accesses based on actual destination addresses works independently of kernel symbols referenced by the module.

6. Conclusions

Rootkits are powerful attack tools that are used by intruders to hide their presence from system administrators. Kernel-level rootkits, in particular, directly modify the kernel, and, therefore, can intercept and prevent any attempt of an administrator to determine if the security of the system has been violated. Because of this, it is important to devise mechanisms that can protect the integrity of the kernel even in the aftermath of the compromise of the administrator account.

This paper presents a technique that is based on static analysis to identify instruction sequences that are an indication of rootkits. Informal behavioral specifications define such characteristic instruction sequences as data transfer operations that write to certain illegitimate kernel memory areas. Symbolic execution is then used to simulate the execution of the kernel module to detect instructions that fulfill these specifications. Through this method, it is possible to detect malicious behavior before a module is loaded into the kernel, and, in addition, it is possible to operate on closed-source components, such as proprietary drivers.

We implemented our technique in a prototype tool and we evaluated both the effectiveness and the performance of the tool with respect to nine real-world rootkits as well as the complete set of 985 legitimate kernel modules that are included with the Fedora Core 1 Linux distribution. The results show that all tested rootkits were successfully identified, and no false positives were raised on legitimate modules. We thus conclude that the technique can reliably detect malicious kernel modules and, therefore, it represents a useful tool to harden the operating system kernel. In addition, we show that detection can be done efficiently, despite the application of a potentially expensive static analysis technique.

Future work will be centered on devising a more formal description of the aspects that characterize rootkit-like behavior. In addition, we plan to study how attacks that attempt to bypass our detection procedures can be prevented. Finally, we intend to integrate the detection component into the kernel module loader infrastructure as a step towards preparing the system for general usage.

Acknowledgments

This research was supported by the Army Research Office, under agreement DAAD19-01-1-0484 and by the National Science Foundation under grants CCR-0209065 and CCR-0238492.

References

[1] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques, and Tools. World Student Series of Computer Science. Addison Wesley, 1986.
[2] S. Aubert. rkscan: Rootkit Scanner. http://www.hsc.fr/ressources/outils/rkscan/index.html.en, 2004.
[3] Black Tie Affair. Hiding Out Under UNIX. Phrack Magazine, 3(25), 1989.
[4] FuSyS. Kstat v. 1.1-2. http://s0ftpj.org/, November 2002.
[5] grsecurity. An innovative approach to security utilizing a multi-layered detection, prevention, and containment model. http://www.grsecurity.net/, 2004.
[6] Halflife. Abuse of the Linux Kernel for Fun and Profit. Phrack Magazine, 7(50), April 1997.
[7] G. Kim and E. Spafford. The Design and Implementation of Tripwire: A File System Integrity Checker. Technical report, Purdue University, Nov. 1993.
[8] T. Lawless. St. Michael and St. Jude. http://sourceforge.net/projects/stjude/, 2004.
[9] T. Miller. T0rn rootkit analysis. http://www.ossec.net/rootkits/studies/t0rn.txt.
[10] T. Miller. Analysis of the KNARK Rootkit. http://www.ossec.net/rootkits/studies/knark.txt, 2004.
[11] N. Murilo and K. Steding-Jessen. Chkrootkit v. 0.43. http://www.chkrootkit.org/.
[12] D. Safford. The Need for TCPA. IBM White Paper, October 2002.
[13] sd and devik. Linux on-the-fly kernel patching without LKM. Phrack Magazine, 11(58), 2001.
[14] Stealth. adore. http://spider.scorpions.net/~stealth, 2001.
[15] Stealth. Kernel Rootkit Experiences and the Future. Phrack Magazine, 11(61), August 2003.
[16] Stealth. adore-ng. http://stealth.7350.org/rootkits/, 2004.
[17] TCG. Trusted Computing Group Home. https://www.trustedcomputinggroup.org/home, 2004.
