Isolation: The Confinement Principle
Dan Boneh
Running untrusted code
We often need to run buggy/untrusted code:
– programs from untrusted Internet sites:
• desktop and mobile apps, JavaScript, browser extensions
– honeypots
[Figure: app 1 and app 2 run separately] ⇒ difficult to manage
Approach: confinement
Confinement: ensure misbehaving app cannot harm rest of system
[Figure: confinement at two levels: app1 and app2 each on their own OS (OS1, OS2); process 1 and process 2 on one Operating System]
Implementing confinement
Key component: reference monitor
– Mediates requests from applications
• Enforces confinement
• Implements a specified protection policy
– Must always be invoked:
• Every application request must be mediated
– Tamperproof:
• Reference monitor cannot be killed
… or if killed, then monitored process is killed too
– Small enough to be analyzed and validated
An old example: chroot
To use (must be root):
    chroot /tmp/guest      # root dir "/" is now "/tmp/guest"
    su guest               # EUID set to "guest"
• Reboot system
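The chroot-then-drop-privileges sequence can be sketched in Python. This is a minimal sketch, not a hardened jail: `jailed_path` and `enter_jail` are hypothetical helper names, `enter_jail` only works when run as root, and the path mapping ignores symlinks. The `os` calls themselves are standard library functions.

```python
import os
import posixpath

def jailed_path(jail_dir, path):
    """Host path that `path` names once "/" has been moved to jail_dir.

    Pure illustration of the mapping: ".." at the jail root stays at the root.
    """
    inside = posixpath.normpath(posixpath.join("/", path))
    return posixpath.join(jail_dir, inside.lstrip("/"))

def enter_jail(jail_dir, uid, gid):
    """Confine the calling process (requires root; not called in this sketch)."""
    os.chroot(jail_dir)   # root dir "/" is now jail_dir
    os.chdir("/")         # do not keep a working directory outside the jail
    os.setgid(gid)        # drop the group first, while still privileged
    os.setuid(uid)        # then drop root: EUID is now the guest user
```

With the jail at /tmp/guest, a request for /etc/passwd inside the jail names /tmp/guest/etc/passwd on the host. The order matters: the user ID is dropped last because chroot itself needs root.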
FreeBSD jail
Stronger mechanism than simple chroot
System call interposition
Observation: to damage host system (e.g. persistent changes)
app must make system calls:
– To delete/overwrite files: unlink, open, write
– To do network attacks: socket, bind, connect, send
Implementation options:
– Completely kernel space (e.g., Linux seccomp)
– Completely user space (e.g., program shepherding)
– Hybrid (e.g., Systrace)
Early implementation (Janus) [GWTB’96]
[Figure: monitored application calls fopen("/etc/passwd", "r"), which enters the OS Kernel]
Monitor kills application if request is disallowed
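The check a Janus-style monitor performs on each request can be sketched as follows. This is a simulation under stated assumptions, not the Janus code: the policy entries, the `Killed` exception, and `check_open` are all hypothetical names, and a real monitor must also deal with symlinks and races between check and use.

```python
import fnmatch
import posixpath

# Hypothetical policy: (operation, path glob) pairs the monitored app may use.
POLICY = [
    ("read", "/usr/share/fonts/*"),
    ("read", "/tmp/sandbox/*"),
    ("write", "/tmp/sandbox/*"),
]

class Killed(Exception):
    """Stands in for the monitor killing the monitored application."""

def check_open(path, mode, policy=POLICY):
    """Mediate one open request; raise Killed if the policy disallows it."""
    op = "read" if mode == "r" else "write"   # anything but plain "r" counts as a write
    clean = posixpath.normpath(path)          # collapse ".." before matching
    for allowed_op, pattern in policy:
        if op == allowed_op and fnmatch.fnmatchcase(clean, pattern):
            return True
    raise Killed(f"disallowed: open({path!r}, {mode!r})")
```

The default is deny: a request that matches no policy entry ends the application, so check_open("/etc/passwd", "r") raises Killed under this policy.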
Example policy
Sample policy file (e.g., for PDF reader)
[Figure: Chrome renderer process starts … Renderer process renders site (user space)]
BPF filter input: syscall number, syscall args., arch. (x86 or ARM)
Filter returns one of:
– SECCOMP_RET_KILL: kill process
– SECCOMP_RET_ERRNO: return specified error to caller
– SECCOMP_RET_ALLOW: allow syscall
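The decision a filter makes can be simulated in plain Python. This is a sketch of the logic only, not real BPF bytecode and not installed in the kernel: `simulated_filter` and the allow list are hypothetical, while the syscall numbers and arch constant are the x86-64 Linux values.

```python
import errno

AUDIT_ARCH_X86_64 = 0xC000003E   # arch value the filter sees on x86-64

# x86-64 syscall numbers used in this sketch
SYS_READ, SYS_WRITE, SYS_CLOSE, SYS_PTRACE, SYS_EXIT_GROUP = 0, 1, 3, 101, 231

ALWAYS_ALLOWED = {SYS_READ, SYS_CLOSE, SYS_EXIT_GROUP}

def simulated_filter(nr, args, arch):
    """Return (action, errno or None) for one syscall, like a seccomp filter."""
    if arch != AUDIT_ARCH_X86_64:
        return ("SECCOMP_RET_KILL", None)        # numbers differ per arch: refuse
    if nr == SYS_PTRACE:
        return ("SECCOMP_RET_KILL", None)
    if nr == SYS_WRITE:
        if args and args[0] in (1, 2):           # only stdout and stderr
            return ("SECCOMP_RET_ALLOW", None)
        return ("SECCOMP_RET_ERRNO", errno.EPERM)
    if nr in ALWAYS_ALLOWED:
        return ("SECCOMP_RET_ALLOW", None)
    return ("SECCOMP_RET_ERRNO", errno.EPERM)    # deny everything else
```

The arch check comes first because the same syscall number means different calls on different architectures.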
Installing a BPF filter
• prctl(PR_SET_NO_NEW_PRIVS, 1) must be called before setting the BPF filter.
• Ensures set-UID, set-GID are ignored on subsequent execve()
⇒ attacker cannot elevate privilege
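On Linux, the call with this effect is prctl(PR_SET_NO_NEW_PRIVS, 1). A minimal sketch of making it from Python through ctypes follows. It assumes Linux with a libc that exports prctl; `set_no_new_privs` is a hypothetical helper name, and the two constants come from <linux/prctl.h>. The bit cannot be cleared once set, and this sketch does not install a filter.

```python
import ctypes
import sys

PR_SET_NO_NEW_PRIVS = 38   # from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39

def set_no_new_privs():
    """Set no_new_privs for the calling thread and return the value read back."""
    if not sys.platform.startswith("linux"):
        raise OSError("prctl(PR_SET_NO_NEW_PRIVS) is Linux-specific")
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    # The kernel requires the unused arguments to be 0.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "prctl(PR_SET_NO_NEW_PRIVS) failed")
    return libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero)
```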
[Figure: containers App 1, App 2, App 3 on the Docker engine, host OS, hardware]
• Containers making sys calls are filtered by seccomp-BPF
• Whoever starts the container can specify the BPF policy
– default policy blocks many syscalls, including ptrace
Docker sys call filtering
Run nginx container with a specific filter called filter.json:
$ docker run --security-opt seccomp=filter.json nginx
Example filter:
{
  "defaultAction": "SCMP_ACT_ERRNO",      // deny by default
  "syscalls": [
    { "names": ["accept"],                // sys-call name
      "action": "SCMP_ACT_ALLOW",         // allow (whitelist)
      "args": [ ] },                      // what args to allow
    …
  ]
}
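JSON has no comment syntax, so the annotations in the example filter have to be dropped in a real file. One way to get a well-formed profile is to generate it. The sketch below assumes the same three keys as the example; `make_profile` is a hypothetical helper, and a real nginx container needs far more syscalls than "accept".

```python
import json

def make_profile(allowed_names):
    """Build a deny-by-default seccomp profile that allows only the given syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",          # deny by default
        "syscalls": [
            {
                "names": sorted(allowed_names),     # sys-call names
                "action": "SCMP_ACT_ALLOW",         # allow (whitelist)
                "args": [],                         # no argument constraints
            }
        ],
    }

profile_text = json.dumps(make_profile({"accept"}), indent=2)
```

Writing profile_text to filter.json gives a file that can be passed to the docker run command shown earlier.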
Ostia: SCI with minimal kernel support
Monitored app disallowed from making monitored sys calls
– Minimal kernel change (… but app can call close() itself)
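In Ostia's delegating design, the monitored call is carried out on the app's behalf by an agent that decides whether it is allowed. A minimal single-process sketch of that split follows. It rests on assumptions: `agent_open` and `app_read` are hypothetical names, and the real agent is a separate process that hands the descriptor back to the app rather than a function call.

```python
import os
import tempfile

def agent_open(path, allowed_dir):
    """Agent side: check the request, then open the file and return the descriptor."""
    real = os.path.realpath(path)            # resolve symlinks and ".."
    root = os.path.realpath(allowed_dir)
    if os.path.commonpath([real, root]) != root:
        raise PermissionError(f"agent refused open({path!r})")
    return os.open(real, os.O_RDONLY)

def app_read(path, allowed_dir):
    """App side: open() is delegated to the agent; read and close are done directly."""
    fd = agent_open(path, allowed_dir)       # delegated sys call
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)                         # app can call close() itself
```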
Isolation via Virtual Machines
[Figure: VM1 and VM2 each run Apps on their own Guest OS (Guest OS 1, Guest OS 2), on top of the Virtual Machine Monitor (VMM, hypervisor), the Host OS, and the Hardware]
Example: NSA NetTop
single HW platform used for both classified and unclassified data
Why so popular now?
VMs in the 1960s:
– Few computers, lots of users
– VMs allow many users to share a single computer
[Figure: malware in the Classified VM sends a secret doc over a covert channel to a listener in the Public VM; both VMs run on the same hypervisor]
An example covert channel
Both VMs use the same underlying hardware
At 1:00am the listener does a CPU-intensive calculation and measures its completion time
b = 1 ⇒ completion time > threshold
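The listener's side of this channel can be sketched as follows. It assumes the sender signals b = 1 by loading the shared CPU at the agreed time, so that the listener's calculation runs slower. The function names, the loop size, and any threshold are hypothetical; a real threshold would have to be calibrated on the target machine.

```python
import time

def timed_calc(n=200_000):
    """Listener's CPU-intensive calculation; returns its completion time in seconds."""
    start = time.perf_counter()
    total = 0
    for i in range(n):
        total += i * i
    return time.perf_counter() - start

def decode_bit(completion_time, threshold):
    """b = 1 if the calculation took longer than the threshold, else 0."""
    return 1 if completion_time > threshold else 0

def decode_message(completion_times, threshold):
    """Decode one bit per agreed time slot."""
    return [decode_bit(t, threshold) for t in completion_times]
```

For example, with a threshold of 0.5 seconds, measured times of 0.9, 0.2 and 1.4 seconds decode to the bits 1, 0, 1.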
[Figure: two Guest OSes on the Xen hypervisor, directly on the Hardware]
Type 1 hypervisor: no host OS
• VMs from different customers may run on the same machine
• Hypervisor must isolate VMs … but some info leaks
VM isolation in practice: end-user
Qubes OS: a desktop/laptop OS where everything is a VM
• Runs on top of the Xen hypervisor
• Access to peripherals (mic, camera, usb, …) controlled by VMs
Applications:
Hypervisor detection (red pill techniques)
• VM platforms often emulate simple hardware
– VMware emulates an ancient i440bx chipset
… but reports 8GB RAM, dual CPUs, etc.
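Two simple checks in this spirit can be sketched as follows. Both are illustrative heuristics, not a reliable detector. `looks_inconsistent` encodes the mismatch described above, with thresholds that are assumptions taken from the example numbers. `has_hypervisor_flag` is a separate technique not covered in the slides: x86 Linux guests usually show a "hypervisor" CPU flag in /proc/cpuinfo.

```python
def looks_inconsistent(chipset, ram_gb, cpus):
    """Flag an ancient emulated chipset paired with modern RAM or CPU counts."""
    return "i440bx" in chipset.lower() and (ram_gb >= 8 or cpus >= 2)

def has_hypervisor_flag(cpuinfo_text):
    """Return True if the first 'flags' line of cpuinfo text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "hypervisor" in value.split()
    return False
```

On a Linux machine, has_hypervisor_flag could be fed the contents of /proc/cpuinfo. A hypervisor that wants to hide can mask the flag, so a negative result proves nothing.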
Software Fault Isolation [Wahbe et al., 1993]
SFI approach: Partition process memory into segments
[Figure: process memory partitioned into a segment for app #1 and a segment for app #2]
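Two ways of keeping each app's memory accesses inside its own segment, both described by Wahbe et al., are segment matching (check the address and trap) and address sandboxing (overwrite the upper bits). The sketch below shows the arithmetic only; the 12-bit offset width and all names are hypothetical, and real SFI inserts these operations as machine instructions before each unsafe load, store, or jump.

```python
SEGMENT_BITS = 12                      # hypothetical: low 12 bits are the offset
OFFSET_MASK = (1 << SEGMENT_BITS) - 1

class SegmentFault(Exception):
    """Stands in for the trap taken when a check fails."""

def segment_of(addr):
    """Segment ID = upper bits of the address."""
    return addr >> SEGMENT_BITS

def check_address(addr, segment_id):
    """Segment matching: trap if addr lies outside the domain's segment."""
    if segment_of(addr) != segment_id:
        raise SegmentFault(hex(addr))
    return addr

def sandbox_address(addr, segment_id):
    """Address sandboxing: force the upper bits to the domain's segment."""
    return (segment_id << SEGMENT_BITS) | (addr & OFFSET_MASK)
```

Sandboxing is cheaper because it needs no branch, but a bad address is silently redirected into the app's own segment instead of being reported.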
Solution:
Cross domain calls
[Figure: in the caller domain, "call draw" goes through a call stub to draw: in the callee domain, and "return" goes through a ret stub; each domain has a jump table of "br addr" entries]
• Only stubs allowed to make cross-domain jumps
• Jump table contains allowed exit points
– Addresses are hard coded, read-only segment
SFI Summary
• Performance
– Usually good: mpeg_play, 4% slowdown
Isolation: summary
• Many sandboxing techniques:
– Physical air gap, Virtual air gap (hypervisor)
– System call interposition (SCI), Software Fault Isolation (SFI)
– Application specific (e.g. JavaScript in browser)