Sane2000 Jail PDF
ABSTRACT
1. Introduction
The UNIX access control mechanism is designed for an environment with two types
of users: those with, and without administrative privilege. Within this framework, every
attempt is made to provide an open system, allowing easy sharing of files and inter-pro-
cess communication. As a member of the UNIX family, FreeBSD inherits these secu-
rity properties. Users of FreeBSD in non-traditional UNIX environments must balance
their need for strong application support, high network performance and functionality,
and low total cost of ownership with the need for alternative security models that are
difficult or impossible to implement with the UNIX security mechanisms.
This work was sponsored by http://www.servetheweb.com/ and donated to
the FreeBSD Project for inclusion in the FreeBSD OS. FreeBSD 4.0-RELEASE was
the first release including this code. Follow-on work was sponsored by Safeport
Network Services, http://www.safeport.com/
One such consideration is the desire to delegate some (but not all) administrative
functions to untrusted or less trusted parties, and simultaneously impose system-wide
mandatory policies on process interaction and sharing. Attempting to create such an
environment in the current-day FreeBSD security environment is both difficult and
costly: in many cases, the burden of implementing these policies falls on user applica-
tions, which means an increase in the size and complexity of the code base, in turn
translating to higher development and maintenance cost, as well as less overall
flexibility.
This abstract risk becomes more clear when applied to a practical, real-world exam-
ple: many web service providers turn to the FreeBSD operating system to host customer
web sites, as it provides a high-performance, network-centric server environment. How-
ever, these providers have a number of concerns on their plate, both in terms of protect-
ing the integrity and confidentiality of their own files and services from their customers,
as well as protecting the files and services of one customer from (accidental or inten-
tional) access by any other customer. At the same time, a provider would like to provide
substantial autonomy to customers, allowing them to install and maintain their own soft-
ware, and to manage their own services, such as web servers and other content-related
daemon programs.
This problem space points strongly in the direction of a partitioning solution, in which
customer processes and storage are isolated from those of other customers, both in
terms of accidental disclosure of data or process information and in terms of the
ability to modify files or processes outside of a compartment. Delegation of management
functions within the system must be possible, but not at the cost of system-wide
requirements, including integrity and privacy protection between partitions.
However, UNIX-style access control makes it notoriously difficult to compartmentalise
functionality. While mechanisms such as chroot(2) provide a modest level of
compartmentalisation, it is well known that these mechanisms have serious shortcomings,
both in the scope of their functionality and in their effectiveness at what they provide
[CHROOT].
In the case of the chroot(2) call, a process’s visibility of the file system name-space is
limited to a single subtree. However, the compartmentalisation does not extend to the
process or networking spaces, and therefore both observation of and interference with
processes outside the compartment are possible.
To this end, we describe the new FreeBSD ‘‘Jail’’ facility, which provides a strong
partitioning solution, leveraging existing mechanisms such as chroot(2) to create what
effectively amounts to a virtual machine environment. Processes in a jail are provided full
access to the files that they may manipulate, processes they may influence, and network
services they can make use of, and neither access to nor visibility of files, processes or
network services outside their partition.
Unlike other fine-grained security solutions, Jail does not substantially increase the
policy management requirements for the system administrator, as each Jail is a virtual
FreeBSD environment permitting local policy to be independently managed, with much
the same properties as the main system itself, making Jail easy to use for the administra-
tor, and far more compatible with applications.
... no matter how patently stupid it may be.
3. Other Solutions to the Root Problem
Many operating systems attempt to address these limitations by providing fine-grained
access controls for system resources [BIBA]. These efforts vary in degrees of success,
but almost all suffer from at least three serious limitations:
First, increasing the granularity of security controls increases the complexity of the
administration process, in turn increasing both the opportunity for incorrect configuration
and the demand on administrator time and resources. In many cases, the
increased complexity results in significant frustration for the administrator, which may
result in two disastrous types of policy: ‘‘all doors open as it’s too much trouble’’, and
‘‘trust that the system is secure, when in fact it isn’t’’.
The extent of the trouble is best illustrated by the fact that an entire niche industry has
emerged providing tools to manage fine grained security controls [UAS].
Second, usefully segregating capabilities and assigning them to running code and
users is very difficult. Many privileged operations in UNIX seem independent, but are
in fact closely related, and handing out one privilege may, in effect, hand out
many others. For example, in some trusted operating systems, a system capability
may be assigned to a running process to allow it to read any file, for the purposes of
backup. However, this capability is, in effect, equivalent to the ability to switch to any
other account, as the ability to access any file provides access to system keying material,
which in turn provides the ability to authenticate as any user. Similarly, many operating
systems attempt to segregate management capabilities from auditing capabilities. In a
number of these operating systems, however, ‘‘management capabilities’’ permit the
administrator to assign ‘‘auditing capabilities’’ to itself, or another account, circumvent-
ing the segregation of capability.
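The transitivity argument can be made concrete with a toy model. The sketch below is illustrative only: the privilege names and the GRANTS table are hypothetical and are not taken from any real operating system. It computes everything a process can effectively do once a single privilege has been handed out.

```python
# Toy model: each privilege may indirectly yield further privileges.
# All names below are hypothetical and exist only for illustration.
GRANTS = {
    "read_any_file": {"read_keying_material"},
    "read_keying_material": {"authenticate_as_any_user"},
    "authenticate_as_any_user": {"switch_to_any_account"},
    "manage_accounts": {"assign_audit_capability"},
    "assign_audit_capability": {"audit"},
}


def effective_privileges(granted):
    """Return the transitive closure of the privileges in `granted`."""
    reached = set()
    pending = list(granted)
    while pending:
        privilege = pending.pop()
        if privilege in reached:
            continue
        reached.add(privilege)
        # Follow every privilege that this one indirectly provides.
        pending.extend(GRANTS.get(privilege, ()))
    return reached


# A backup process that was only meant to read files.
print(sorted(effective_privileges({"read_any_file"})))
```

In this model the backup capability reaches "switch_to_any_account", and the management capability reaches "audit", which mirrors the two examples given above.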
Finally, introducing new security features often involves introducing new security
management APIs. When fine-grained capabilities are introduced to replace the setuid
mechanism in UNIX-like operating systems, applications that previously did an ‘‘appro-
priateness check’’ to see if they were running as root before executing must now be
changed to know that they need not run as root. In the case of applications running with
privilege and executing other programs, there is now a new set of privileges that must be
voluntarily given up before executing another program. These changes can introduce
significant incompatibility for existing applications, and make life more difficult for
application developers who may not be aware of differing security semantics on differ-
ent systems [POSIX1e].
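[Figure: schematic of a host with jails. The host file system / holds dev/, etc/, usr/,
var/, home/ and the jail roots jail_1/, jail_2/ and jail_3/; the jail trees shown contain
their own dev/, etc/, usr/, var/ and home/. Interface lo0 carries 127.0.0.1; interface
fxp0 carries 10.0.0.1 through 10.0.0.5.]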
5. Jail Implementation
Processes running with root privileges in the jail find that there are serious restrictions
on what they are capable of doing, in particular on activities that would extend outside
of the jail:
• Modifying the running kernel by direct access and loading kernel modules is
prohibited.
• Modifying any of the network configuration, interfaces, addresses, and routing
table is prohibited.
• Mounting and unmounting file systems is prohibited.
• Creating device nodes is prohibited.
• Accessing raw, divert, or routing sockets is prohibited.
• Modifying kernel runtime parameters, such as most sysctl settings, is prohibited.
• Changing securelevel-related file flags is prohibited.
• Accessing network resources not associated with the jail is prohibited.
Other privileged activities are permitted as long as they are limited to the scope of the
jail:
• Signalling any process within the jail is permitted.
• Changing the ownership and mode of any file within the jail is permitted, as
long as the file flags permit this.
• Deleting any file within the jail is permitted, as long as the file flags permit this.
• Binding reserved TCP and UDP port numbers on the jail’s IP address is permitted.
(Attempts to bind TCP and UDP ports using INADDR_ANY will be redirected
to the jail’s IP address.)
• Functions which operate on the uid/gid space are all permitted since they act as
labels for filesystem objects or processes which are partitioned off by other
mechanisms.
These restrictions on root access limit the scope of root processes, enabling most
applications to run un-hindered, but preventing calls that might allow an application to
reach beyond the jail and influence other processes or system-wide configuration.
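The address handling described in the lists above can be sketched as a small decision function. This is a simplified model rather than the kernel code: the function name jail_bind_address is hypothetical, the example addresses are arbitrary, and the choice of PermissionError for a refused address is an assumption made for illustration.

```python
import ipaddress

INADDR_ANY = ipaddress.IPv4Address("0.0.0.0")


def jail_bind_address(requested, jail_ip):
    """Return the address a jailed bind() would end up using.

    Simplified model: the wildcard address is redirected to the jail's
    address, the jail's own address is accepted, anything else is refused.
    """
    requested = ipaddress.IPv4Address(requested)
    jail_ip = ipaddress.IPv4Address(jail_ip)
    if requested == INADDR_ANY:
        return jail_ip          # wildcard binds are redirected
    if requested == jail_ip:
        return jail_ip          # the jail's own address is fine
    # Any other address is a network resource outside the jail.
    raise PermissionError(f"{requested} is not associated with this jail")


print(jail_bind_address("0.0.0.0", "10.0.0.2"))
```

A daemon that binds to the wildcard address therefore needs no modification to run in a jail; it simply ends up listening on the jail's single address.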
6.1. The jail(2) system call, allocation, refcounting and deallocation of struct
prison.
The jail(2) system call is implemented as a non-optional system call in FreeBSD.
Other system calls are controlled by compile time options in the kernel configuration
file, but due to the minute footprint of the jail implementation, it was decided to make it
a standard facility in FreeBSD.
The implementation of the system call is straightforward: a data structure is allocated
and populated with the arguments provided. The data structure is attached to the current
process’ struct proc, its reference count set to one and a call to the chroot(2)
syscall implementation completes the task.
Hooks in the code implementing process creation and destruction maintain the reference
count on the data structure and free it when the last reference is lost. Any new
process created by a process in a jail will inherit a reference to the jail, which effectively
puts the new process in the same jail.
There is no way to modify the contents of the data structure describing the jail after its
creation, and no way to attach a process to an existing jail if it was not created from
inside that jail.
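The life cycle described in this section can be sketched as a user-space toy model. It is not the kernel implementation: the class and field names (JailParams, Prison, Proc, root) are hypothetical, the example path, hostname and address are made up, and the chroot step is reduced to recording a new root directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JailParams:
    """Immutable description of a jail (field names are illustrative)."""
    path: str
    hostname: str
    ip: str


class Prison:
    """Stand-in for the per-jail data structure."""
    def __init__(self, params):
        self.params = params
        self.refcount = 1       # the creating process holds the first reference
        self.freed = False


class Proc:
    """Stand-in for a process; only the jail-related state is modelled."""
    def __init__(self, prison=None, root="/"):
        self.prison = prison
        self.root = root

    def jail(self, params):
        # Allocate and populate the structure, attach it, then "chroot".
        self.prison = Prison(params)
        self.root = params.path

    def fork(self):
        # Process-creation hook: the child inherits a reference.
        if self.prison is not None:
            self.prison.refcount += 1
        return Proc(self.prison, self.root)

    def exit(self):
        # Process-destruction hook: drop the reference, free on the last one.
        prison, self.prison = self.prison, None
        if prison is not None:
            prison.refcount -= 1
            if prison.refcount == 0:
                prison.freed = True


shell = Proc()
shell.jail(JailParams("/jail_1", "jail1.example.org", "10.0.0.2"))
child = shell.fork()
prison = shell.prison
shell.exit()
child.exit()
print(prison.refcount, prison.freed)
```

The frozen dataclass stands in for the rule that a jail's description cannot be modified after creation, and the model offers no operation for attaching an unrelated process to an existing Prison, matching the second restriction.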
After running the jail command, the shell is now within the jail environment, and all
further commands will be limited to the scope of the jail until the shell exits. If the net-
work alias has not yet been configured, then the jail will be unable to access the net-
work.
The startup configuration of the jail environment may be adjusted so as to quell
warnings from services that cannot run in the jail. In addition, any per-system configuration
required for a normal FreeBSD system is required for each jailbox. Typically, this
includes:
• Create empty /etc/fstab
• Disable portmapper
• Run newaliases
• Disable interface configuration
• Configure the resolver
• Set root password
• Set timezone
• Add any local accounts
• Install any packages
A few warnings are generated for sysctls that are not permitted to be set within the
jail, but the end result is a set of processes in an isolated process environment, bound to
a single IP address. Normal procedures for accessing a FreeBSD machine apply: telneting
in through the network reveals a telnet prompt, login, and shell.
% ps ax
PID TT STAT TIME COMMAND
228 ?? SsJ 0:18.73 syslogd
247 ?? IsJ 0:00.05 inetd -wW
249 ?? IsJ 0:28.43 cron
252 ?? SsJ 0:30.46 sendmail: accepting connections on port 25
291 ?? IsJ 0:38.53 /usr/local/sbin/sshd
93694 ?? SJ 0:01.01 sshd: rwatson@ttyp0 (sshd)
93695 p0 SsJ 0:00.06 -csh (csh)
93700 p0 R+J 0:00.00 ps ax
It is immediately obvious that the environment is within a jailbox: there is no init pro-
cess, no kernel daemons, and a J flag is present beside all processes indicating the pres-
ence of a jail.
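The same observation can be checked mechanically. The sketch below parses a listing in the five-column layout shown above and reports any process whose STAT field lacks the J flag. The function name unjailed_processes is hypothetical, and the parser assumes exactly this column layout.

```python
# The listing from the text, reproduced as input data.
PS_OUTPUT = """\
PID TT STAT TIME COMMAND
228 ?? SsJ 0:18.73 syslogd
247 ?? IsJ 0:00.05 inetd -wW
249 ?? IsJ 0:28.43 cron
252 ?? SsJ 0:30.46 sendmail: accepting connections on port 25
291 ?? IsJ 0:38.53 /usr/local/sbin/sshd
93694 ?? SJ 0:01.01 sshd: rwatson@ttyp0 (sshd)
93695 p0 SsJ 0:00.06 -csh (csh)
93700 p0 R+J 0:00.00 ps ax
"""


def unjailed_processes(ps_output):
    """Return (pid, command) for listed processes whose STAT lacks 'J'."""
    result = []
    for line in ps_output.splitlines()[1:]:     # skip the header line
        fields = line.split(None, 4)            # COMMAND may contain spaces
        if len(fields) < 5:
            continue
        pid, _tty, stat, _time, command = fields
        if "J" not in stat:
            result.append((int(pid), command))
    return result


print(unjailed_processes(PS_OUTPUT))
```

For the listing above the result is an empty list, since every process carries the J flag.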
As with any FreeBSD system, accounts may be created and deleted, mail is delivered,
logs are generated, packages may be added, and the system may be hacked into if con-
figured incorrectly, or running a buggy version of a piece of software. However, all of
this happens strictly within the scope of the jail.
7.3. Jail Management
Jail management is an interesting prospect, as there are two perspectives from which a
jail environment may be administered: from within the jail, and from the host environ-
ment. From within the jail, as described above, the process is remarkably similar to any
regular FreeBSD install, although certain actions are prohibited, such as mounting file
systems, modifying system kernel properties, etc. The only area that really differs is
that of shutting the system down: the processes within the jail may deliver signals
between them, allowing all processes to be killed, but bringing the system back up
requires intervention from outside of the jailbox.
From outside of the jail, there is a range of capabilities, as well as limitations. The
jail environment is, in effect, a subset of the host environment: the jail file system
appears as part of the host file system, and may be directly modified by processes in the
host environment. Processes within the jail appear in the process listing of the host, and
may likewise be signalled or debugged. The host process file system makes the host-
name of the jail environment accessible in /proc/procnum/status, allowing utilities in the
host environment to manage processes based on jailname. However, the default config-
uration allows privileged processes within jails to set the hostname of the jail, which
makes the status file less useful from a management perspective if the contents of the
jail are malicious. To prevent a jail from changing its hostname, the
"jail.set_hostname_allowed" sysctl may be set to 0 prior to starting any jails.
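As an illustration of managing processes by jail hostname from the host, the sketch below groups process IDs by the hostname found in each status file's contents. It rests on assumptions that the text does not confirm: that the hostname is the last whitespace-separated field of the status line, and that a "-" marks a process outside any jail. The sample lines are made up and heavily abbreviated, and the function name group_by_jail is hypothetical.

```python
from collections import defaultdict


def group_by_jail(status_by_pid):
    """Group pids by the hostname assumed to end each status line."""
    groups = defaultdict(list)
    for pid, status in status_by_pid.items():
        fields = status.split()
        # Assumption: the last field is the jail hostname, "-" if unjailed.
        hostname = fields[-1] if fields else "-"
        groups[hostname].append(pid)
    return {host: sorted(pids) for host, pids in groups.items()}


# Made-up, abbreviated status lines; real files carry more fields.
sample = {
    1: "init 1 -",
    228: "syslogd 228 jail1.example.org",
    291: "sshd 291 jail1.example.org",
    412: "cron 412 jail2.example.org",
}
print(group_by_jail(sample))
```

Such grouping is only trustworthy when jails are prevented from changing their own hostname, as described above.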
One aspect immediately observable in an environment with multiple jails is that uids
and gids are local to each jail environment: the uid associated with a process in one jail
may be for a different user than in another jail. This collision of identifiers is only visi-
ble in the host environment, as normally processes from one jail are never visible in an
environment with another scope for user/uid and group/gid mapping. Managers in the
host environment should understand these scoping issues, or confusion and unintended
consequences may result.
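A minimal sketch of the scoping issue follows, with made-up account data: the table PASSWD_BY_SCOPE and the function describe_owner are hypothetical. The point is that a uid seen from the host only identifies a user once the jail it belongs to is known.

```python
# Hypothetical per-jail account databases: uid -> user name.
PASSWD_BY_SCOPE = {
    "jail_1": {0: "root", 1001: "alice"},
    "jail_2": {0: "root", 1001: "bob"},
}


def describe_owner(scope, uid):
    """Resolve a uid within the scope (jail) that the process runs in."""
    name = PASSWD_BY_SCOPE.get(scope, {}).get(uid)
    if name is None:
        return f"uid {uid}@{scope}"
    return f"{name}@{scope}"


# The same numeric uid names two different users.
print(describe_owner("jail_1", 1001))
print(describe_owner("jail_2", 1001))
```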
Jailed processes are subject to the normal restrictions present for any processes,
including resource limits, and limits placed by the network code, including firewall
rules. By specifying firewall rules for the IP address bound to a jail, it is possible to
place connectivity and bandwidth limitations on individual jails, restricting services that
may be consumed or offered.
Management of jails is an area that will see further improvement in future versions of
FreeBSD. Some of these potential improvements are discussed later in this paper.
8. Future Directions
The jail facility has already been deployed in numerous capacities and a few opportu-
nities for improvement have manifested themselves.
9. Conclusion
The jail facility provides FreeBSD with a conceptually simple security partitioning
mechanism, allowing the delegation of administrative rights within virtual machine par-
titions.
The implementation relies on restricting access within the jail environment to a well-
defined subset of the overall host environment. This includes limiting interaction
between processes, and to files, network resources, and privileged operations. Adminis-
trative overhead is reduced through avoiding fine-grained access control mechanisms,
and maintaining a consistent administrative interface across partitions and the host envi-
ronment.
The jail facility has already seen widespread deployment, in particular as a vehicle for
delivering "virtual private server" services.
The jail code is included in the base system as part of FreeBSD 4.0-RELEASE, and
fully documented in the jail(2) and jail(8) man-pages.
Notes & References
[BIBA]
K. J. Biba, Integrity Considerations for Secure Computer Systems, USAF Elec-
tronic Systems Division, 1977
[CHROOT]
Dr. Marshall Kirk McKusick, private communication: ‘‘According to the SCCS
logs, the chroot call was added by Bill Joy on March 18, 1982 approximately 1.5
years before 4.2BSD was released. That was well before we had ftp servers of
any sort (ftp did not show up in the source tree until January 1983). My best
guess as to its purpose was to allow Bill to chroot into the /4.2BSD build direc-
tory and build a system using only the files, include files, etc contained in that
tree. That was the only use of chroot that I remember from the early days.’’
[LOTTERY1]
David Petrou and John Milford. Proportional-Share Scheduling: Implementation
and Evaluation in a Widely-Deployed Operating System, December 1997.
http://www.cs.cmu.edu/~dpetrou/papers/freebsd_lottery_writeup98.ps
http://www.cs.cmu.edu/~dpetrou/code/freebsd_lottery_code.tar.gz
[LOTTERY2]
Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible Pro-
portional-Share Resource Management, Proceedings of the First Symposium on
Operating Systems Design and Implementation (OSDI ’94), pages 1-11, Mon-
terey, California, November 1994.
http://www.research.digital.com/SRC/personal/caw/papers.html
[POSIX1e]
Draft Standard for Information Technology — Portable Operating System Inter-
face (POSIX) — Part 1: System Application Program Interface (API) —
Amendment: Protection, Audit and Control Interfaces [C Language] IEEE Std
1003.1e Draft 17 Editor Casey Schaufler
[ROOT]
Historically other names have been used at times, Zilog for instance called the
super-user account ‘‘zeus’’.
[UAS]
One such niche product is the ‘‘UAS’’ system to maintain and audit RACF
configurations on MVS systems.
http://www.entactinfo.com/products/uas/
[UF]
Quote from the User-Friendly cartoon by Illiad.
http://www.userfriendly.org/cartoons/archives/98nov/19981111.html