Anatomy of The Linux Kernel
Given that the goal of this article is to introduce you to the Linux kernel and explore its architecture and
major components, let's start with a short tour of Linux kernel history, then look at the Linux kernel
architecture from 30,000 feet, and, finally, examine its major subsystems. The Linux kernel is over six
million lines of code, so this introduction is not exhaustive. Use the pointers to more content to dig in
further.
Linux quickly evolved from a single-person project to a worldwide development project involving thousands of developers. One of the most
important decisions for Linux was its adoption of the GNU General Public License (GPL). Under the GPL, the Linux kernel was protected from
commercial exploitation, and it also benefited from the user-space development of the GNU project (of Richard Stallman, whose source dwarfs
that of the Linux kernel). This gave Linux useful applications such as the GNU Compiler Collection (GCC) and a variety of shells.
Introduction to the Linux kernel
Now on to a high-altitude look at the GNU/Linux operating system architecture. You can think about an operating system from two levels, as
shown in Figure 2.
Figure 2. The fundamental architecture of the GNU/Linux operating system
At the top is the user, or application, space. This is where the user applications are executed. Below the user space is the kernel space. Here, the
Linux kernel exists.
There is also the GNU C Library (glibc). This provides the system call interface that connects to the kernel and provides the mechanism to
transition between the user-space application and the kernel. This is important because the kernel and user application occupy different protected
address spaces. And while each user-space process occupies its own virtual address space, the kernel occupies a single address space. For
more information, see the links in the Resources section.
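As a rough illustration of the role glibc plays, the following Python sketch calls the C library's getpid wrapper directly through ctypes and compares the result with Python's own os.getpid(). It assumes a Linux system on which ctypes.CDLL(None) exposes the C library's symbols; the helper name pid_via_libc is made up for this example.

```python
import ctypes
import os


def pid_via_libc() -> int:
    """Ask the kernel for our process ID through the C library wrapper."""
    # CDLL(None) opens the symbols already loaded into this process,
    # which on a GNU/Linux system includes the C library.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.getpid.restype = ctypes.c_int
    return libc.getpid()


if __name__ == "__main__":
    # Both values come from the kernel and should be identical.
    print(pid_via_libc(), os.getpid())
```

Either way, the user-space code never touches kernel memory; it only receives the value that the kernel hands back across the boundary.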
The Linux kernel can be further divided into three gross levels. At the top is the system call interface, which implements the basic functions such
as read and write. Below the system call interface is the kernel code, which can be more accurately defined as the architecture-independent
kernel code. This code is common to all of the processor architectures supported by Linux. Below this is the architecture-dependent code, which
forms what is more commonly called a BSP (Board Support Package). This code serves as the processor and platform-specific code for the
given architecture.
System call interface
The SCI is a thin layer that provides the means to perform function calls from user space into the kernel. As discussed previously, this interface
can be architecture dependent, even within the same processor family. The SCI is actually an interesting function-call multiplexing and
demultiplexing service. You can find the SCI implementation in ./linux/kernel, as well as architecture-dependent portions in ./linux/arch. More
details for this component are available in the Resources section.
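The multiplexing and demultiplexing idea can be pictured with a toy model. This is only a sketch of the concept in Python, not the kernel's implementation: the call numbers, the table, and the function names (trap, sys_read, sys_write) are all hypothetical.

```python
# Hypothetical call numbers for this toy; real numbers differ per architecture.
SYS_READ, SYS_WRITE = 0, 1


def sys_read(fd, count):
    """Stand-in for a kernel-side read handler."""
    return f"read {count} bytes from fd {fd}"


def sys_write(fd, data):
    """Stand-in for a kernel-side write handler."""
    return f"wrote {len(data)} bytes to fd {fd}"


# The table that turns a call number back into a handler.
SYSCALL_TABLE = {SYS_READ: sys_read, SYS_WRITE: sys_write}


def trap(number, *args):
    """Single entry point: demultiplex a call number to its handler."""
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return "ENOSYS"  # Unknown call number.
    return handler(*args)


def write(fd, data):
    """User-side stub: multiplex a named call onto the single entry point."""
    return trap(SYS_WRITE, fd, data)
```

Many named calls on the user side funnel through one entry point, and a table on the other side fans them back out to the right handler.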
What is a kernel?
As shown in Figure 3, a kernel is really nothing more than a resource manager. Whether the resource being managed is a process, memory, or
hardware device, the kernel manages and arbitrates access to the resource between multiple competing users (both in the kernel and in user space).
Process management
Process management is focused on the execution of processes. In the kernel, these are called threads and represent an individual virtualization of
the processor (thread code, data, stack, and CPU registers). In user space, the term process is typically used, though the Linux implementation does
not separate the two concepts (processes and threads). The kernel provides an application program interface (API) through the SCI to create a new
process (fork, exec, or Portable Operating System Interface [POSIX] functions), stop a process (kill, exit), and communicate and synchronize
between them (signal, or POSIX mechanisms).
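The calls named above are reachable from most languages. Here is a minimal Python sketch, assuming a Linux or other POSIX system, that creates a process with fork, lets it leave with exit, and stops another one with kill. The helper names run_child and kill_child are invented for the example.

```python
import os
import signal


def run_child(exit_code: int) -> int:
    """Fork a child that exits immediately; return the code the parent sees."""
    pid = os.fork()
    if pid == 0:
        # Child: leave right away without running the parent's cleanup code.
        os._exit(exit_code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def kill_child() -> int:
    """Fork a child that waits, kill it, and return the terminating signal."""
    pid = os.fork()
    if pid == 0:
        while True:
            signal.pause()  # Child: sleep until a signal arrives.
    os.kill(pid, signal.SIGKILL)  # SIGKILL cannot be caught or ignored.
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status)
```

Calling run_child(7) should return 7, and kill_child() should return the number of SIGKILL, because the parent learns how each child ended from the status that waitpid reports.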
Also in process management is the need to share the CPU between the active threads. The Linux 2.6 kernel implements a scheduling algorithm that
operates in constant time, regardless of the number of threads vying for the CPU. This is called the O(1) scheduler, denoting that the same
amount of time is taken to schedule one thread as it is to schedule many. The O(1) scheduler also supports multiple processors (called
Symmetric MultiProcessing, or SMP). You can find the process management sources in ./linux/kernel and architecture-dependent sources in
./linux/arch. You can learn more about this algorithm in the Resources section.
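One way to get constant-time selection is to keep a separate queue for each priority level plus a bitmap that records which queues are non-empty. The Python sketch below is a toy model of that idea, not the kernel's code: the class ToyRunQueue, its methods, and the choice of 140 levels are assumptions made for illustration.

```python
from collections import deque

NUM_PRIORITIES = 140  # Assumed number of levels; 0 is the most urgent.


class ToyRunQueue:
    """One queue per priority plus a bitmap of non-empty queues."""

    def __init__(self) -> None:
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # Bit p is set while queues[p] holds a thread.

    def enqueue(self, thread: str, priority: int) -> None:
        self.queues[priority].append(thread)
        self.bitmap |= 1 << priority

    def pick_next(self):
        """Return the most urgent runnable thread, or None when idle."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit to find the most urgent non-empty queue.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[priority]
        thread = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)  # Queue drained: clear its bit.
        return thread
```

The work done in pick_next depends on the fixed number of priority levels, not on how many threads are waiting, which is the property the O(1) name refers to.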
Memory management
Another important resource that's managed by the kernel is memory. For efficiency, given the way that the hardware manages virtual memory,
memory is managed in what are called pages (4KB in size for most architectures). Linux includes the means to manage the available memory,
as well as the hardware mechanisms for physical and virtual mappings.
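Because memory is handed out in whole pages, a request is effectively rounded up to the next page boundary. This small Python sketch shows the arithmetic; it reads the page size of the running system from mmap.PAGESIZE, and the helper name pages_needed is made up for the example.

```python
import mmap


def pages_needed(nbytes: int, page_size: int = mmap.PAGESIZE) -> int:
    """Number of whole pages required to hold nbytes."""
    return -(-nbytes // page_size)  # Ceiling division.


if __name__ == "__main__":
    print("page size on this system:", mmap.PAGESIZE)
    print("pages for 10,000 bytes:", pages_needed(10_000))
```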
But memory management is much more than managing 4KB buffers. Linux provides abstractions over 4KB buffers, such as the slab allocator.
This memory management scheme uses 4KB buffers as its base, but then allocates structures from within, keeping track of which pages are full,
partially used, and empty. This allows the scheme to dynamically grow and shrink based on the needs of the greater system.
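The bookkeeping described here can be sketched with a toy cache that carves fixed-size slots out of page-sized slabs. This is a simplified model in Python, not the kernel's slab allocator; the class ToySlabCache and its methods are hypothetical, and a set of slot numbers stands in for real memory.

```python
from collections import Counter


class ToySlabCache:
    """Carves fixed-size object slots out of page-sized slabs."""

    def __init__(self, object_size: int, page_size: int = 4096) -> None:
        self.per_slab = page_size // object_size  # Slots that fit in one slab.
        self.slabs = {}  # Slab id -> set of slot numbers currently in use.
        self.next_id = 0

    def state(self, slab_id: int) -> str:
        used = len(self.slabs[slab_id])
        if used == 0:
            return "empty"
        return "full" if used == self.per_slab else "partial"

    def alloc(self):
        """Return a (slab_id, slot) handle, adding a slab when all are full."""
        for slab_id, used in self.slabs.items():
            if len(used) < self.per_slab:
                slot = min(set(range(self.per_slab)) - used)
                used.add(slot)
                return slab_id, slot
        slab_id, self.next_id = self.next_id, self.next_id + 1
        self.slabs[slab_id] = {0}  # Grow: a fresh slab with slot 0 taken.
        return slab_id, 0

    def free(self, handle) -> None:
        slab_id, slot = handle
        self.slabs[slab_id].remove(slot)

    def shrink(self) -> int:
        """Release empty slabs and return how many were dropped."""
        empty = [sid for sid in self.slabs if self.state(sid) == "empty"]
        for sid in empty:
            del self.slabs[sid]
        return len(empty)

    def summary(self) -> Counter:
        """Count slabs by state: full, partial, or empty."""
        return Counter(self.state(sid) for sid in self.slabs)
```

Allocating grows the cache one slab at a time, and shrink gives empty slabs back, which mirrors the grow-and-shrink behavior described above.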
Because memory is shared among many users, there are times when the available memory can be exhausted. For this reason, pages can be moved out of
memory and onto the disk. This process is called swapping because the pages are swapped from memory onto the hard disk. You can find the
memory management sources in ./linux/mm.
Virtual file system
The virtual file system (VFS) is an interesting aspect of the Linux kernel because it provides a common interface abstraction for file systems. The
VFS provides a switching layer between the SCI and the file systems supported by the kernel (see Figure 4).
Figure 4. The VFS provides a switching fabric between users and file systems
At the top of the VFS is a common API abstraction of functions such as open, close, read, and write. At the bottom of the VFS are the file system
abstractions that define how the upper-layer functions are implemented. These are plug-ins for the given file system (of which over 50 exist). You
can find the file system sources in ./linux/fs.
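From user space, the effect of this switching layer is that the same calls work no matter what sits behind a file descriptor. The Python sketch below, assuming a POSIX system, uses identical os.write and os.read calls on a regular file and on a pipe. The helper names are invented for the example, and it is only meant for small payloads.

```python
import os
import tempfile


def roundtrip_file(data: bytes) -> bytes:
    """Write data to a regular file, then read it back with os.read."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)  # Rewind before reading.
        return os.read(fd, len(data))
    finally:
        os.close(fd)
        os.unlink(path)


def roundtrip_pipe(data: bytes) -> bytes:
    """Push data through a pipe using the same write and read calls."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)
        return os.read(read_fd, len(data))
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Neither function needs to know how the bytes are stored or moved; the kernel routes each call to whatever implementation backs that descriptor.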
Below the file system layer is the buffer cache, which provides a common set of functions to the file system layer (independent of any particular
file system). This caching layer optimizes access to the physical devices by keeping data around for a short time (or by speculatively reading ahead
so that the data is available when needed). Below the buffer cache are the device drivers, which implement the interface for the particular physical
device.
Network stack
The network stack, by design, follows a layered architecture modeled after the protocols themselves. Recall that the Internet Protocol (IP) is the
core network layer protocol that sits below the transport protocol (most commonly the Transmission Control Protocol, or TCP). Above TCP is the
sockets layer, which is invoked through the SCI.
The sockets layer is the standard API to the networking subsystem and provides a user interface to a variety of networking protocols. From raw
frame access to IP protocol data units (PDUs) and up to TCP and the User Datagram Protocol (UDP), the sockets layer provides a standardized
way to manage connections and move data between endpoints. You can find the networking sources in the kernel at ./linux/net.
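To make the sockets layer concrete, here is a minimal Python sketch that moves a message over TCP and then over UDP using the same socket API. It assumes the loopback interface (127.0.0.1) is available, handles only small messages, and uses helper names made up for the example.

```python
import socket


def tcp_roundtrip(message: bytes) -> bytes:
    """Send message over a loopback TCP connection and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # Port 0 lets the kernel pick a port.
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)  # Signal end of data.
                chunks = []
                while chunk := conn.recv(4096):
                    chunks.append(chunk)
                return b"".join(chunks)


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram over loopback UDP and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(5)  # Do not wait forever if the datagram is lost.
        sender.sendto(message, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
        return data
```

Only the socket type changes between the two functions (SOCK_STREAM versus SOCK_DGRAM); the protocol work underneath is left to the kernel.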
Device drivers
The vast majority of the source code in the Linux kernel exists in device drivers that make a particular hardware device usable. The Linux source
tree provides a drivers subdirectory that is further divided by the various devices that are supported, such as Bluetooth, I2C, serial, and so on. You
can find the device driver sources in ./linux/drivers.
Architecture-dependent code
While much of Linux is independent of the architecture on which it runs, there are elements that must consider the architecture for normal
operation and for efficiency. The ./linux/arch subdirectory defines the architecture-dependent portion of the kernel source contained in a number of
subdirectories that are specific to the architecture (collectively forming the BSP). For a typical desktop, the i386 directory is used. Each
architecture subdirectory contains a number of other subdirectories that focus on a particular aspect of the kernel, such as boot, kernel, memory
management, and others. You can find the architecture-dependent code in ./linux/arch.
Linux is also a dynamic kernel, supporting the addition and removal of software components on the fly. These are called dynamically loadable
kernel modules, and they can be inserted at boot when they're needed (when a particular device is found requiring the module) or at any time by
the user.
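On a running system, the modules currently loaded are listed in /proc/modules, one per line, with the module name first and its size second. The Python sketch below parses that text; the helper names are invented for the example, and it returns an empty result if the file cannot be read.

```python
from pathlib import Path


def parse_modules(text: str) -> dict:
    """Map module name -> size in bytes from /proc/modules style text."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            name, size = fields[:2]
            modules[name] = int(size)
    return modules


def loaded_modules(path: str = "/proc/modules") -> dict:
    """Return the currently loaded modules, or {} if the file is unavailable."""
    try:
        return parse_modules(Path(path).read_text())
    except OSError:
        return {}


if __name__ == "__main__":
    for name, size in sorted(loaded_modules().items()):
        print(f"{name:24} {size:>10}")
```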
A recent advancement of Linux is its use as an operating system for other operating systems (called a hypervisor). A recent modification to the
kernel, the Kernel-based Virtual Machine (KVM), added a new interface to user space that allows other
operating systems to run above the KVM-enabled kernel. In addition to running another instance of Linux, Microsoft Windows can also be
virtualized. The only constraint is that the underlying processor must support the new virtualization instructions. See the Resources section for
more information.
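On x86 systems, one rough way to check for that processor support is to look for the vmx (Intel) or svm (AMD) flag in /proc/cpuinfo. The Python sketch below does this; it is specific to x86, other architectures report the capability differently, and the helper names are made up for the example.

```python
from pathlib import Path


def has_hw_virtualization(cpuinfo_text: str) -> bool:
    """True if any CPU lists the vmx (Intel) or svm (AMD) flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and {"vmx", "svm"} & set(value.split()):
            return True
    return False


def cpu_supports_virtualization(path: str = "/proc/cpuinfo") -> bool:
    """Check the running system; False if the file cannot be read."""
    try:
        return has_hw_virtualization(Path(path).read_text())
    except OSError:
        return False
```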
Going further
This article just scratched the surface of the Linux kernel architecture and its features and capabilities. You can check out the Documentation
directory that's provided with the Linux kernel source for detailed information about the contents of the kernel. Be sure to check out the Resources
section at the end of this article for more detailed information about many of the topics discussed here.
Resources
UNIX, MINIX and Linux are covered in Wikipedia, along with a detailed family tree of the
operating systems.
The GNU C Library, or glibc, is the implementation of the standard C library. It's used in the
GNU/Linux operating system, as well as the GNU/Hurd microkernel operating system.
uClinux is a port of the Linux kernel that can execute on systems that lack an MMU. This
allows the Linux kernel to run on very small embedded platforms, such as the Motorola
DragonBall processor used in the PalmPilot Personal Digital Assistants (PDAs).
"Kernel command using Linux system calls" (developerWorks, March 2007) covers the SCI,
which is an important layer in the Linux kernel, with user-space support from glibc that enables
function calls between user space and the kernel.
"Inside the Linux scheduler" (developerWorks, June 2006) explores the new O(1) scheduler
introduced in Linux 2.6 that is efficient, scales with a large number of processes (threads), and
takes advantage of SMP systems.
"Access the Linux kernel using the /proc filesystem" (developerWorks, March 2006) looks at
the /proc file system, which is a virtual file system that provides a novel way for user-space
applications to communicate with the kernel. This article demonstrates /proc, as well as
loadable kernel modules.
"Server clinic: Put virtual filesystems to work" (developerWorks, April 2003) delves into the
VFS layer that allows Linux to support a variety of different file systems through a common
interface. This same interface is also used for other types of devices, such as sockets.
"Inside the Linux boot process" (developerWorks, May 2006) examines the Linux boot
process, which takes care of bringing up a Linux system and is the same basic process
whether you're booting from a hard disk, floppy, USB memory stick, or over the network.
"Linux initial RAM disk (initrd) overview" (developerWorks, July 2006) inspects the initial RAM
disk, which isolates the boot process from the physical medium from which it's booting.
"Better networking with SCTP" (developerWorks, February 2006) covers one of the most
interesting networking protocols, Stream Control Transmission Protocol, which operates like
TCP but adds a number of useful features such as messaging, multi-homing, and multi-
streaming. Linux, like BSD, is a great operating system if you're interested in networking
protocols.
"Anatomy of the Linux slab allocator" (developerWorks, May 2007) covers one of the most
interesting aspects of memory management in Linux, the slab allocator. This mechanism
originated in SunOS, but it's found a friendly home inside the Linux kernel.
"Virtual Linux" (developerWorks, December 2006) shows how Linux can take advantage of
processors with virtualization capabilities.
"Linux and symmetric multiprocessing" (developerWorks, March 2007) discusses how Linux
can also take advantage of processors that offer chip-level multiprocessing.
"Discover the Linux Kernel Virtual Machine" (developerWorks, April 2007) covers the recent
introduction of virtualization into the kernel, which turns the Linux kernel into a hypervisor for
other virtualized operating systems.
Check out Tim's book GNU/Linux Application Programming for more information on
programming Linux in user space.
In the developerWorks Linux zone, find more resources for Linux developers, including Linux
tutorials, as well as our readers' favorite Linux articles and tutorials over the last month.
Stay current with developerWorks technical events and Webcasts.