Operating Systems From 0 To 1 PT1
OPERATING SYSTEMS: FROM 0 TO 1
Contents
Preface
I Preliminary
1 Domain documents
2.4 Abstraction
3 Computer Architecture
4.1 objdump
II Groundwork
7 Bootloader
10 Process
11 Interrupt
Index
Bibliography
Preface
Greetings!
If that is the case, this book is for you. By going through this book,
you will be able to find the missing pieces that are essential and enable
you to implement your own operating system from scratch! Yes, from
scratch, without going through any existing operating system layer to
prove to yourself that you are an operating system developer. You may
ask, “Isn’t it more practical to learn the internals of Linux?”
Yes...
and no.
Learning Linux can help your workflow at your day job. However, if
you follow that route, you still won’t achieve the ultimate goal of writ-
ing an actual operating system. By writing your own operating system,
you will gain knowledge that you will not be able to glean just from learn-
ii tu, do hoang
ing Linux.
✄ You will learn how a computer works at the hardware level, and you
will learn to write software to manage that hardware directly.
There are many books and courses on this topic made by famous profes-
sors and experts out there already. Who am I to write a book on such
an advanced topic? While it’s true that many quality resources exist, I
find them lacking. Do any of them show you how to compile your C code
and the C runtime library independent of an existing operating system?
Most books on operating system design and implementation only dis-
cuss the software side; how the operating system communicates with the
hardware is skipped. Important hardware details are skipped, and it’s
difficult for a self-learner to find relevant resources on the Internet. The
aim of this book is to bridge that gap: not only will you learn how to pro-
gram hardware directly, but also how to read official documents from hard-
ware vendors to program it. You no longer have to seek out resources to
help yourself interpret hardware manuals and documentation: you can
do it yourself. Lastly, I wrote this book from an autodidact’s perspec-
tive. I made this book as self-contained as possible so you can spend more
time learning and less time guessing or seeking out information on the
Internet.
One of the core focuses of this book is to guide you through the pro-
cess of reading official documentation from vendors to implement your
software. Official documents from hardware vendors like Intel are criti-
cal for implementing an operating system or any other software that di-
rectly controls the hardware. At a minimum, an operating system devel-
oper needs to be able to comprehend these documents and implement
software based on a set of hardware requirements. Thus, the first chap-
ter is dedicated to discussing relevant documents and their importance.
Let’s dive in. With this book, I hope to provide enough foundational
knowledge that will open doors for you to make sense of other resources.
This book will greatly benefit students who’ve just finished their first
C/C++ course. Imagine how cool it would be to show prospective
employers that you’ve already built an operating system.
Prerequisites
– Ohm’s law
If you are unfamiliar with these concepts, you can quickly learn them
here: http://www.allaboutcircuits.com/textbook/, by reading chap-
ter 1 and chapter 2.
✄ C programming. In particular:
✄ Linux basics:
✄ Touch typing. Since we are going to use Linux, touch typing helps. I
know typing speed does not relate to problem-solving, but at least your
typing speed should be fast enough not to let it get in the way and de-
grade the learning experience.
✄ Write code independently. It’s pointless to copy and paste code. Real
learning happens when you solve problems on your own. Some exam-
ples are provided to help kick start your work, but most problems are
yours to conquer. However, solutions are available online for you to check
after giving each problem a good try.
Acknowledgments
Preliminary
1
Domain documents
A problem domain is the part of the world where the computer is to produce
effects, together with the means available to produce them, directly
or indirectly. (Kovitz, 1999)
Requirements are the effects that the machine is to exert in the prob-
lem domain by virtue of its programming.
[Figure: Application Domain, Software Domain, and Non-software Domains]
One thing to note is that software is its own problem domain. A prob-
lem domain does not necessarily divide between software and itself. Compilers,
3D graphics, games, cryptography, artificial intelligence, etc., are parts of
software engineering domains (actually it is more of a computer science
domain than a software engineering domain). In general, a software-exclusive
domain creates software to be used by other software. An operating system
is also a domain, but it overlaps with other domains such as electrical
engineering. To implement an operating system effectively, one must
learn enough of the external domain. How much learning is enough
for a software engineer? At the minimum, a software engineer should be
knowledgeable enough to understand the documents prepared by hard-
ware engineers for using (i.e. programming) their devices.
Learning a programming language, even C or Assembly, does not mean
a software engineer can automatically be good at hardware programming
or any related low-level programming domains. One can spend 10 years,
20 years or his entire life writing C/C++ code, and he still cannot write
an operating system, simply because of the ignorance of relevant domain
knowledge. Just like learning English does not mean a person automat-
ically becomes good at reading Math books written in English. Much
A software requirement document includes both a list of requirements and
a description of the problem domain (Kovitz, 1999).
previous section, the tricky part is not programming alone but program-
ming according to a problem domain. The bulk of software design and
implementation depends upon the knowledge of the problem domain. The
better the domain is understood, the higher the quality of the software can be. For
example, house building has been practiced for thousands of years and is well
understood, so it is easy to build a high-quality house; software is no
different. Code that is difficult to understand is usually due to the au-
thor’s ignorance of a problem domain. In the context of this book, we
seek to understand the low-level working of various hardware devices.
What vs How “what” and “how” are vague terms. What is the “what”?
Is it nouns only? If so, what if a customer requires his software to per-
form specific steps of operations, such as purchasing procedure for a
customer on a website. Does it include “verbs” now? However, isn’t
the “how” supposed to be step by step operations? Anything can be
the “what” and anything can be the “how”.
✄ etc
In the future, instead of a drop-down menu, all books are listed directly
on a page in thumbnails. Books might be reimplemented as a graph,
and each node is a book for finding related books, as a recommender
is going to be added in the next version. The requirement document
needs updating again to remove all the outdated implementation details,
which requires additional effort to maintain the requirement document.
When the effort of syncing with the implementation becomes too
much, the developers give up on documentation, and everyone starts ranting
about how useless documentation is.
A software specification document states rules relating the desired behavior of
the output devices to all possible behavior of the input devices, as well
as any rules that other parts of the problem domain must obey (Kovitz, 1999).
Aside from Intel’s official website, the website of this book also hosts
the documents for convenience.²
Intel documents divide the requirement and specification sections clearly,
but call the sections by different names. The section corresponding to the
requirement document is called “Functional Description”, which
consists mostly of domain description; for the specification, the “Register Description”
section describes all programming interfaces. Both documents carry no
unnecessary implementation details.³ Intel documents are also great
examples of how to write requirements and specifications well, as explained in
this chapter.
² Intel may change the links to the documents as they update their website,
so this book doesn’t contain any link to the documents to avoid confusion
for readers.
³ As it should be, those details are trade secrets.
Other than the Intel documents, other documents will be introduced
in the relevant chapters.
2
From hardware to software:
Layers of abstraction
At the core, a transistor is just a resistor whose value can vary based
on an input voltage value.
Figure 2.1.2: Modern transistor
With this property, a transistor can be used as a current amplifier (more
voltage, less resistance) or to switch electrical signals off and on (block and
unblock an electron flow) based on a voltage level. At 0 v, no current can
pass through a transistor, thus it acts like a circuit with an open switch
(light bulb off) because the resistor value is enough to block the electrical
flow. Similarly, at +3.5 v, current can flow through a transistor because
the resistor value is lessened, effectively enabling electron flow, thus
it acts like a circuit with a closed switch.
(Margin note: If you want a deeper explanation of how electrons move, you
should look at the video “How semiconductors work” on Youtube, by Ben Eater.)
A bit has two states: 0 and 1, which is the building block of all digital
systems and software. Similar to a light bulb that can be turned on
and off, bits are made out of this electrical stream from the power source:
bit 0 is represented with 0 v (no electron flow), and bit 1 is +3.5 v to
+5 v (electron flow). A transistor implements a bit correctly, as it can
regulate the electron flow based on voltage level.
The invention of the classic transistor opened a whole new world of micro
digital devices. Prior to the invention, vacuum tubes, which are just fancier
light bulbs, were used to represent 0 and 1, and required a human to turn
them on and off. The MOSFET, or Metal–Oxide–Semiconductor Field-Effect
Transistor, invented in 1959 by Dawon Kahng and Martin M. (John) Atalla
at Bell Labs, is an improved version of the classic transistor that is more
suitable for digital devices, as it requires a shorter switching time between
the two states 0 and 1, is more stable, consumes less power and is easier to
produce.
There are also two types of MOSFETs analogous to two types of tran-
sistors: n-MOSFET and p-MOSFET. n-MOSFET and p-MOSFET are
also called NMOS and PMOS transistors for short.
All digital devices are designed with logic gates. A logic gate is a device
that implements a boolean function. Each logic gate includes a number
of inputs and an output. All computer operations are built from
combinations of logic gates, which are just combinations of boolean functions.
Figure 2.2.1: Example: NAND gate
We should realize and appreciate how powerful boolean functions are
(Margin note: For the why and how of building from NAND, I suggest the course
Build a Modern Computer from First Principles: From Nand to Tetris, available
on Coursera: https://www.coursera.org/learn/build-a-computer. To go even
further, after the course, you should take the series Computational Structures
on Edx.)
2.2.2 Logic Gate implementation: CMOS circuit
Underlying every logic gate is a circuit called CMOS (Complementary
Metal–Oxide–Semiconductor). CMOS consists of two complementary transistors,
NMOS and PMOS. The simplest CMOS circuit is an inverter or a NOT gate:
Example 2.2.1. 74HC00 is a chip with four 2-input NAND gates. The
chip comes with 8 input pins and 4 output pins, 1 pin for connecting to
a voltage source and 1 pin for connecting to the ground. This device is
the physical implementation of NAND gates that we can physically touch
and use. But instead of just a single gate, the chip comes with 4 gates
that can be combined. Each combination enables a different logic function,
effectively creating other logic gates. This feature is what makes the
chip popular.
Each of the gates above is just a simple NAND circuit with electron
flows, as demonstrated earlier. Yet many of these NAND-gate chips
combined can build a simple computer. Software, at the physical level,
is just electron flows.
[Figure: (a) Logic diagram of 74HC00; (b) Logic diagram of one NAND gate]
How can the above gates be created with 74HC00? It is simple: as every
gate has 2 input pins and 1 output pin, we can wire the output of
one NAND gate to an input of another NAND gate, thus chaining NAND
gates together to produce the diagrams above.
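The chaining idea can be sketched in C by treating one NAND gate as a function and wiring outputs into inputs. This is only a software model under our own naming (nand_gate, not_gate, and_gate, xor_gate are hypothetical helpers, not part of any library); each function body corresponds to one way of connecting the pins.

```c
#include <stdbool.h>

/* One 2-input NAND gate: the output is LOW only when both inputs are HIGH. */
bool nand_gate(bool a, bool b) { return !(a && b); }

/* NOT: feed the same signal to both inputs of a single NAND gate. */
bool not_gate(bool a) { return nand_gate(a, a); }

/* AND: a NAND gate followed by a NAND-based NOT (2 gates in total). */
bool and_gate(bool a, bool b) { return not_gate(nand_gate(a, b)); }

/* XOR: a 4-NAND arrangement, so it fits in the four gates of one chip. */
bool xor_gate(bool a, bool b) {
    bool n = nand_gate(a, b);
    return nand_gate(nand_gate(a, n), nand_gate(b, n));
}
```

Each nested call stands for a wire from one gate's output pin to another gate's input pin, so counting the calls to nand_gate gives the number of physical gates a circuit would need.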
Being built upon gates, and as gates only accept a series of 0 and 1, a hardware
device only understands 0 and 1. However, a device only takes 0
and 1 in a systematic way. Machine language is a collection of unique
bit patterns that a device can identify and respond to with a corresponding action.
A machine instruction is a unique bit pattern that a device can identify.
In a computer system, a device with its language is called a CPU
(Central Processing Unit), which controls all activities going on inside a computer.
For example, in the x86 architecture, the pattern 00000100 (0x04) begins an
instruction that adds a number to the AL register, and 11110100 (0xf4) halts the CPU. In
the early days of computers, people had to write completely in binary.
Why does such a bit pattern cause a device to do something? The rea-
son is that underlying each instruction is a small circuit that implements
the instruction. Similar to how a function/subroutine in a computer program
is called by its name, a bit pattern is the name of a little function inside
a CPU that gets executed when the CPU encounters it.
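The analogy between bit patterns and function names can be made concrete with a toy model in C. The machine below is entirely made up for illustration (its 2-bit opcodes, the ToyCpu type and the toy_run function are assumptions, not x86): a table of function pointers plays the role of the small circuits, and the bit pattern selects which one runs.

```c
#include <stdint.h>
#include <stddef.h>

/* State of a hypothetical toy machine: one accumulator and a halt flag. */
typedef struct {
    uint8_t acc;
    int halted;
} ToyCpu;

/* Each handler plays the role of the small circuit behind one bit pattern. */
typedef void (*Handler)(ToyCpu *cpu);

void op_nop(ToyCpu *cpu)  { (void)cpu; }
void op_inc(ToyCpu *cpu)  { cpu->acc++; }
void op_dec(ToyCpu *cpu)  { cpu->acc--; }
void op_halt(ToyCpu *cpu) { cpu->halted = 1; }

/* Made-up 2-bit opcodes: 00 = nop, 01 = inc, 10 = dec, 11 = halt. */
static const Handler handlers[4] = { op_nop, op_inc, op_dec, op_halt };

/* Fetch each byte, use its low 2 bits as the "name", and run the handler.
   Returns the accumulator when the machine halts or the program ends. */
uint8_t toy_run(const uint8_t *program, size_t length) {
    ToyCpu cpu = { 0, 0 };
    for (size_t pc = 0; pc < length && !cpu.halted; pc++) {
        handlers[program[pc] & 0x3](&cpu);
    }
    return cpu.acc;
}
```

A real CPU decodes patterns with gates rather than a table lookup, but the relationship is the same: the pattern itself carries no meaning until some circuit is wired to respond to it.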
Note that CPU is not the only device with its language. CPU is just
a name to indicate a hardware device that controls a computer system.
A hardware device may not be a CPU but still has its language. A de-
vice with its own machine language is a programmable device, since a user
can use the language to command the device to perform different actions.
For example, a printer has its set of commands for instructing it how to
print a page.
Example 2.3.1. A user can use a 74HC00 chip without knowing its internals,
knowing only the interface for using the device. First, we need to know
its layout:
Pin layout (pin name, pin number | pin number, pin name):
1A   1   14  Vcc
1B   2   13  4B
1Y   3   12  4A
2A   4   11  4Y
2B   5   10  3B
2Y   6    9  3A
GND  7    8  3Y
Table 2.3.2: Functional Description
Input       Output
nA    nB    nY
L     L     H
L     X     H
X     L     H
H     H     L
✄ n is a number, either 1, 2, 3, or 4
✄ H = HIGH voltage level; L = LOW voltage level; X = don’t care.
The functional description provides a truth table with all possible pin
inputs and outputs, which also describes the usage of all pins in the device.
A user need not know the implementation, only such a table, to use the device.
We can say that the truth table above is the machine language of the device.
Since the device is digital, its language is a collection of binary strings:
✄ The device has 8 input pins, and this means it accepts binary strings
of 8 bits.
✄ The device has 4 output pins, and this means it produces binary strings
of 4 bits from the 8-bit inputs.
The input strings are what the device understands, and the output
strings are what the device can speak. Together, they make
the language of the device. Even though this device is simple, the language
it can accept contains quite a few binary strings: $2^8 + 2^4 = 272$.
However, this number is a tiny fraction of that of a complex device like a CPU,
with hundreds of pins.
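To make the idea of the chip's "language" tangible, here is a small C model of the 74HC00 as a function from 8 input bits to 4 output bits. The bit packing is our own assumption for the sketch (it is not defined by the datasheet), and hc00_outputs and string_count are hypothetical names.

```c
#include <stdint.h>

/* Assumed packing (ours, not the datasheet's):
   input  bits 7..0 = 1A 1B 2A 2B 3A 3B 4A 4B
   output bits 3..0 = 1Y 2Y 3Y 4Y */
uint8_t hc00_outputs(uint8_t inputs) {
    uint8_t outputs = 0;
    for (int gate = 0; gate < 4; gate++) {
        int a = (inputs >> (7 - 2 * gate)) & 1;   /* nA */
        int b = (inputs >> (6 - 2 * gate)) & 1;   /* nB */
        int y = !(a && b);                        /* nY = NAND(nA, nB) */
        outputs |= (uint8_t)(y << (3 - gate));
    }
    return outputs;
}

/* Number of distinct binary strings over a given number of pins: 2^pins. */
unsigned string_count(unsigned pins) { return 1u << pins; }
```

With this packing, string_count(8) + string_count(4) reproduces the 272 strings counted above, and every one of the 256 possible input strings maps to exactly one output string through the truth table.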
When left as is, the 74HC00 is simply a NAND device with two 4-bit
inputs.³
³ Or simply a 4-bit NAND gate, as it can only accept 4 bits of input at the maximum.
        Input                      Output
Pin     1A 1B 2A 2B 3A 3B 4A 4B    1Y 2Y 3Y 4Y
Value   1  1  0  0  1  1  0  0     0  1  0  1
[Figure: the 74HC00 pin layout annotated with the pin values from the table above]
Figure 2.3.3: 2-bit OR gate implementation
(a) 2-bit OR gate logic diagram, built from 3 NAND gates with 4 pins just for 2 bits of input.
(b) Pin 3A and 3B take the values from 1Y and 2Y.
Table 2.3.3: Truth table of OR logic diagram.
A B C D Y
0 0 1 1 0
0 1 1 0 1
1 0 0 1 1
1 1 0 0 1
To implement a 4-bit OR gate, we need a total of four 74HC00 chips
configured as OR gates, packaged as a single chip as in figure 2.3.4.
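The columns of Table 2.3.3 can be traced in C. The sketch below assumes that C and D are the outputs of the first two NAND gates (each with both inputs tied together) and that Y is the output of the third; the names OrTrace, or_from_nand and or4 are ours. The 4-bit version simply repeats the 2-bit circuit once per bit position, which is how we read the four-chip arrangement.

```c
#include <stdint.h>

/* Signals of the 2-bit OR circuit: C and D are the intermediate outputs of
   NAND1 and NAND2, Y is the output of NAND3. */
typedef struct { int c, d, y; } OrTrace;

int nand2(int a, int b) { return !(a && b); }

OrTrace or_from_nand(int a, int b) {
    OrTrace t;
    t.c = nand2(a, a);      /* NAND1: both inputs tied to A, so C = NOT A */
    t.d = nand2(b, b);      /* NAND2: both inputs tied to B, so D = NOT B */
    t.y = nand2(t.c, t.d);  /* NAND3: Y = NOT(C AND D) = A OR B */
    return t;
}

/* 4-bit OR: one OR circuit per bit position, like four chips side by side. */
uint8_t or4(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    for (int bit = 0; bit < 4; bit++) {
        int y = or_from_nand((a >> bit) & 1, (b >> bit) & 1).y;
        result |= (uint8_t)(y << bit);
    }
    return result;
}
```

Calling or_from_nand for the four input pairs yields the same C, D and Y values as the rows of the truth table, which is a quick way to check a wiring diagram before touching real hardware.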
Figure 2.3.4: 4-bit OR chip made from four 74HC00 devices
[Diagram: four 74HC00 chips, each wired as an OR gate with inputs An and Bn, intermediate signals Cn and Dn, and output Yn, for n = 1 to 4]
or <op1>, <op2>
nand <op1>, <op2>
A decoder is built out of logic gates similar to other digital devices. However,
a storage device can be anything that can store 0 and 1 and is retriev-
able. A storage device can be a magnetized device that uses magnetism
to store information, or it can be made out of electrical circuits that can
change and remember states when a voltage is applied. Regardless of
the technology used, as long as the device can store data and is accessi-
ble to retrieve data, it suffices. Indeed, the modern devices are so com-
plex that it is impossible and unnecessary to understand every implemen-
tation detail. Instead, we only need to learn the interfaces, e.g. the pins,
that the devices expose.
Figure 2.3.6: Repeated assembly patterns are generalized into a new language.
[Diagram: recurring patterns in source1.asm, source2.asm, ..., source<n>.asm are generalized into a construct such as if (...) { ... } else { ... }]
In reality, even though all languages are equivalent in power, not all
of them are capable of expressing each other’s programs. Programming languages
vary between two ends of a spectrum: high level and low level.
The higher level a programming language is, the more distant it be-
comes from the hardware. In some high-level programming languages,
such as Python, a programmer cannot manipulate underlying hardware,
despite being able to deliver the same computations as low-level program-
ming languages. The reason is that high-level languages want to hide hard-
ware details to free programmers from dealing with irrelevant details not
related to current problem domains. Such convenience, however, is not
free: it requires software to carry extra code for managing hardware
details (e.g. memory), thus making the code run slower, and it makes hardware
programming difficult or impossible. The more abstractions a programming
language imposes, the more difficult it is to write low-level
software, such as hardware drivers or an operating system. This is the
reason why C is usually a language of choice for writing an operating sys-
tem, since C is just a thin wrapper of the underlying hardware, making
2.4 Abstraction
✄ The recurring details are given a new and simpler language than the
languages of the lower layers.
✄ CMOS layer has a recurring pattern that makes sure logic gates are
reliably translated to CMOS circuits: a k-input gate uses k PMOS
and k NMOS transistors (Wakerly, 1999). Since digital devices use
CMOS exclusively, a language arose to describe higher level ideas while
hiding CMOS circuits: Logic Gates.
✄ Logic Gates hides the language of circuits and focuses on how to im-
plement primitive Boolean functions and combine them to create new
functions. All logic gates receive input and generate output as binary
numbers. Thanks to this recurring pattern, logic gates are hidden away
for the new language: Assembly, which is a set of predefined binary
patterns that cause the underlying gates to perform an action.
✄ Soon, people realized that many recurring patterns arose from within
Assembly language. Repeated blocks of Assembly code appear in Assembly
source files that express the same or similar idea. There were many
such ideas that could be reliably translated into Assembly code. Thus,
the ideas were extracted and built into the high-level programming
languages that every programmer learns today.
[Figure: layers of abstraction, from top to bottom: Programming Language, Assembly Language, Logic Gates, Circuit]
digraph {
    a -> b;
    b -> c;
    a -> c;
    d -> c;
}
draw_line(a, b);
a -> b;
computers.
3.1.1 Server
A desktop computer is a general-purpose computer with an input and output
system designed for a human user, with moderate resources enough
for regular use. The input system usually includes a mouse and a keyboard,
while the output system usually consists of a monitor that can
display a large number of pixels. The computer is enclosed in a chassis
large enough to hold various computer components such as a processor,
a motherboard, a power supply, a hard drive, etc.
A mobile computer is similar to a desktop computer with fewer resources
but can be carried around.
Game consoles are similar to desktop computers but are optimized for
gaming. Instead of a keyboard and a mouse, the input system of a game
console is a game controller, which is a device with a few buttons for
controlling on-screen objects; the output system is a television. The chassis
is similar to that of a desktop computer but is smaller. Game consoles use
custom processors and graphics processors, but these are similar to the ones in
desktop computers. For example, the first Xbox uses a custom Intel Pentium
III processor.
both the input and output systems along with the computer in a single
package.
only able to perform one or a few specialized tasks. These computers are
used for a single purpose, but they are still general-purpose since it is possible
to program them to perform different tasks, depending on the requirements,
without changing the underlying hardware.
phones.
Figure 3.1.8: Apple A5 SoC
Be it a microcontroller or a system-on-chip, there must be an environment
where these devices can connect to other devices. This environment
is a circuit board called a PCB (Printed Circuit Board). A printed circuit
board is a physical board that contains lines and pads to enable electron
flows between electrical and electronic components. Without a PCB,
devices cannot be combined to create a larger device. As long as these
devices are hidden inside a larger device and contribute to a larger device
that operates at a higher-level layer for a higher-level purpose, they
are embedded devices. Writing a program for an embedded device is therefore
called embedded programming. Embedded computers are used in automatically
controlled devices including power tools, toys, implantable
medical devices, office machines, engine control systems, appliances, remote
controls and other types of embedded systems.
[Figure: Raspberry Pi Model B+ V1.2 board layout. Labeled parts include the Broadcom BCM2835 CPU/GPU with 512MB SDRAM, the LAN9514 4x USB + Ethernet controller, the microSD slot on the bottom side, Display DSI, Camera CSI, HDMI out, composite video and audio out (3.5mm 4-pole jack), Ethernet (RJ45), 2x USB 2.0 ports, and Micro USB power in.]
A Field Programmable Gate Array (FPGA) is a hardware array of reconfigurable
gates that makes circuit structure programmable after it
is shipped from the factory.¹ Recall that in the previous chapter,
each 74HC00 chip can be configured as a gate, and a more sophisticated
device can be built by combining multiple 74HC00 chips. In a similar
manner, each FPGA device contains thousands of units called logic blocks,
each of which is more complicated than a 74HC00 chip and can be configured
to implement a Boolean logic function. These logic blocks can
be chained together to create a high-level hardware feature. This high-level
feature is usually a dedicated algorithm that needs high-speed processing.
¹ This is why it is called Field Programmable Gate Array. It is changeable
“in the field” where it is applied.
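One common way to picture a configurable logic block is as a small lookup table whose contents decide which Boolean function it computes. The C sketch below models a 2-input block this way; it is a simplification under our own assumptions (real logic blocks are larger and vendor-specific, and the LogicBlock type, logic_block_eval function and CONFIG_ constants are hypothetical names).

```c
#include <stdint.h>

/* A hypothetical 2-input logic block modeled as a lookup table (LUT):
   bit i of `config` is the output for input combination i = (a << 1) | b. */
typedef struct { uint8_t config; } LogicBlock;

int logic_block_eval(LogicBlock block, int a, int b) {
    int index = ((a & 1) << 1) | (b & 1);
    return (block.config >> index) & 1;
}

/* Configuration words: outputs for inputs 11, 10, 01, 00 from high bit to low. */
enum {
    CONFIG_AND  = 0x8,  /* 1000: HIGH only for 11 */
    CONFIG_OR   = 0xE,  /* 1110: LOW only for 00 */
    CONFIG_NAND = 0x7,  /* 0111: LOW only for 11 */
    CONFIG_XOR  = 0x6   /* 0110: HIGH for 01 and 10 */
};
```

Reprogramming the block means nothing more than storing a different 4-bit configuration word, which is the sense in which the circuit structure stays changeable after manufacturing.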
FPGA is applied in the cases where the specialized operations are un-
suitable and costly to run on a regular computer such as real-time medi-
cal image processing, cruise control system, circuit prototyping, video en-
Computer organization is the functional view of the design of a computer.
In this view, hardware components of a computer are presented as boxes
with inputs and outputs that connect to each other and form the design
of a computer. Two computers may have the same ISA, but different or-
ganizations. For example, both AMD and Intel processors implement x86
ISA, but the hardware components of each processor that make up the
environments for the ISA are not the same.
Computer organizations may vary depending on a manufacturer’s design,
but they all originate from the Von Neumann architecture²:
² John von Neumann was a mathematician and physicist who invented a
computer architecture.
Figure 3.2.1: Von-Neumann Architecture
[Diagram: CPU, Memory, and Input and Output connected by the System bus, which consists of the Control bus, Address bus, and Data bus]
Buses are electrical wires for sending raw bits between the above components.
I/O Devices are devices that give input to a computer, e.g. keyboard, mouse,
sensor, etc., and take the output from a computer, e.g. a monitor takes
information sent from the CPU to display it, an LED turns on/off according
to a pattern computed by the CPU, etc.
Registers are a hardware component for high-speed data access and communication
with other hardware devices. Registers allow software to
control hardware directly by writing to registers of a device, or receive
information from a hardware device when reading from registers of a
device.
Not all registers are used for communication with other devices. In
a CPU, most registers are used as high-speed storage for temporary
data. Other devices that a CPU can communicate with always have a set
of registers for interfacing with the CPU.
These two interfaces are extremely important, as they are the only inter-
faces for controlling hardware with software. Writing device drivers is es-
sentially learning the functionality of each register and how to use them
properly to control the device.
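As a rough illustration of what "learning the functionality of each register" turns into in code, consider the C sketch below. The device, its register layout and its bit meanings are all invented for this example (DeviceRegs, CTRL_ENABLE, STATUS_READY, device_enable and device_send are hypothetical); a real driver would take them from the device's "Register Description" section.

```c
#include <stdint.h>

/* Register block of a hypothetical device (layout invented for illustration).
   `volatile` tells the compiler that every read and write must really happen,
   since hardware may change or observe these values at any time. */
typedef struct {
    volatile uint32_t control;  /* bit 0: enable the device */
    volatile uint32_t status;   /* bit 0: device reports ready */
    volatile uint32_t data;     /* value to send to the device */
} DeviceRegs;

#define CTRL_ENABLE  (1u << 0)
#define STATUS_READY (1u << 0)

/* Enable the device by setting one bit and leaving the others untouched. */
void device_enable(DeviceRegs *regs) {
    regs->control |= CTRL_ENABLE;
}

/* Write a value only if the device says it is ready; returns 1 on success. */
int device_send(DeviceRegs *regs, uint32_t value) {
    if ((regs->status & STATUS_READY) == 0) {
        return 0;
    }
    regs->data = value;
    return 1;
}
```

On real hardware the DeviceRegs pointer would refer to an address given in the vendor's documentation; for experimenting on a regular computer, an ordinary DeviceRegs variable can stand in for the device so the logic can be exercised without any hardware attached.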
ever, this device was located in a chip also known as MCH or Memory
Controller Hub. In this case, the CPU does not communicate directly
to the RAM, but to the MCH chip, and this chip then accesses the mem-
ory to read or write data. The first option provides better performance
since there is no middleman in the communications between the CPU
and the memory.
[Figure: CPU, MCH, and memory connected by the system bus (control, address, data): (a) Old CPU; (b) Modern CPU]
3.2.3 Hardware
✄ a chipset of two chips which are the Northbridge and Southbridge chips
✄ generic slots for other devices, e.g. network card, sound card.
[Figure: motherboard chipset layout. The CPU, with a clock generator, connects through the front-side bus to the Northbridge (memory controller hub), which connects to the memory slots over the memory bus and to the graphics card slot over the high-speed graphics bus (AGP or PCI Express). An internal bus links the Northbridge to the Southbridge (I/O controller hub), which serves the PCI bus and PCI slots, IDE, SATA, USB, Ethernet, audio codec, CMOS memory, and, over the LPC bus, the Flash ROM (BIOS) and the Super I/O (serial port, parallel port, floppy disk, keyboard, mouse).]
✄ etc.
For the remainder of this chapter, please carry on reading with chapter 3
in Intel Manual Volume 1, “Basic Execution Environment”.
4
x86 Assembly and C
Not quite. Surely, the compiler at its current state of the art is trust-
worthy, and we do not need to write code in assembly, most of the time.
A compiler can generate code, but as mentioned previously, a high-level
language is a collection of patterns of a lower-level language. It does not
cover everything that a hardware platform provides. As a consequence,
not every assembly instruction can be generated by a compiler, so we still
need to write assembly code for these circumstances to access hardware-
specific features. Since hardware-specific features require writing assem-
bly code, debugging requires reading it. We might spend even more time
reading than writing. When working with low-level code that interacts directly
with hardware, assembly code is unavoidable. Also, understanding how a
compiler generates assembly code could improve a programmer’s productivity.
For example, if a job or school assignment requires us to write assembly
code, we can simply write it in C, then let gcc do the hard work
of writing the assembly code for us. We merely collect the generated
assembly code, modify it as needed and are done with the assignment.
We will learn objdump extensively, along with how to use Intel docu-
ments to aid in understanding x86 assembly code.
4.1 objdump
$ objdump -d hello
$ objdump -D hello
The output overruns the terminal screen. To make it easier to read,
send all the output to less:
$ objdump -d hello | less
The start of the output displays the file format of the object file:
00000000004003e0 <_start>:
4003e0: 31 ed xor ebp,ebp
4003e2: 49 89 d1 mov r9,rdx
4003e5: 5e pop rsi
...more assembly code....
0000000000400410 <deregister_tm_clones>:
400410: b8 3f 10 60 00 mov eax,0x60103f
400415: 55 push rbp
400416: 48 2d 38 10 60 00 sub rax,0x601038
...more assembly code....
✄ Chapter 1 provides brief information about the manual, and the com-
ment notations used in the book.
Exercise 4.3.1. Read section 1.3 in volume 2, exclude sections 1.3.5 and
1.3.7.
jmp eax
Then, we use an editor, e.g. Emacs, to create a new file, write the code
and save it in a file, e.g. test.asm. Then, in the terminal, run the command:
$ nasm -f bin test.asm -o test
-f option specifies the file format, e.g. ELF, of the final output file. But
in this case, the format is bin, which means this file is just a flat binary
output without any extra information. That is, the written assembly
code is translated to machine code as is, without the overhead of the
metadata from file format like ELF. Indeed, after compiling, we can
examine the output using this command:
$ hd test
The file only consists of 3 bytes: 66 ff e0, which is equivalent to the in-
struction jmp eax.
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 40 00 00 00 00 00 00 00 34 00 00 00 00 00 28 00 |@.......4.....(.|
00000030 05 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 |................|
00000070 06 00 00 00 00 00 00 00 10 01 00 00 02 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 |................|
00000090 07 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000000a0 20 01 00 00 21 00 00 00 00 00 00 00 00 00 00 00 | ...!...........|
000000b0 01 00 00 00 00 00 00 00 11 00 00 00 02 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 50 01 00 00 30 00 00 00 |........P...0...|
000000d0 04 00 00 00 03 00 00 00 04 00 00 00 10 00 00 00 |................|
000000e0 19 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 80 01 00 00 0d 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000110 ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 00 2e 74 65 78 74 00 2e 73 68 73 74 72 74 61 62 |..text..shstrtab|
00000130 00 2e 73 79 6d 74 61 62 00 2e 73 74 72 74 61 62 |..symtab..strtab|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000160 01 00 00 00 00 00 00 00 00 00 00 00 04 00 f1 ff |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 |................|
00000180 00 74 65 73 74 2e 61 73 6d 00 00 00 00 00 00 00 |.test.asm.......|
00000190
Thus, it is better just to use flat binary format in this case, to experiment
instruction by instruction.
Note: Using the bin format puts nasm by default into 16-bit mode.
To enable 32-bit code to be generated, we must add this line at the be-
ginning of an nasm source file:
bits 32
Instruction Prefixes | Opcode | ModR/M | SIB | Displacement | Immediate
ModR/M byte: Mod (bits 7-6), Reg/Opcode (bits 5-3), R/M (bits 2-0)
SIB byte: Scale (bits 7-6), Index (bits 5-3), Base (bits 2-0)
1. The REX prefix is optional, but if used must be immediately before the opcode; see Section
2.2.1, “REX Prefixes” in the manual for additional information.
2. For VEX encoding information, see Section 2.3, “Intel® Advanced Vector Extensions (Intel®
AVX)” in the manual.
3. Some rare instructions can take an 8B immediate or 8B displacement.
jmp [0x1234]
ff 26 34 12
The very first byte, 0xff, is the opcode. This opcode byte is shared by
several instructions; the reg/opcode field of the ModR/M byte that
follows (here /4) selects jmp.
✄ mod field, or modifier field, is combined with r/m field for a total of
5 bits of information to encode 32 possible values: 8 registers and
24 addressing modes.
The tables 4.5.1 and 4.5.2 list all possible 256 values of ModR/M byte
and how each value maps to an addressing mode and a register, in 16-
bit and 32-bit modes.
r8(/r) AL CL DL BL AH CH DH BH
r16(/r) AX CX DX BX SP BP¹ SI DI
r32(/r) EAX ECX EDX EBX ESP EBP ESI EDI
mm(/r) MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
xmm(/r) XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7
(In decimal) /digit (Opcode) 0 1 2 3 4 5 6 7
(In binary) REG = 000 001 010 011 100 101 110 111
Effective Address Mod R/M Values of ModR/M Byte (In Hexadecimal)
[BX + SI] 00 000 00 08 10 18 20 28 30 38
[BX + DI] 001 01 09 11 19 21 29 31 39
[BP + SI] 010 02 0A 12 1A 22 2A 32 3A
[BP + DI] 011 03 0B 13 1B 23 2B 33 3B
[SI] 100 04 0C 14 1C 24 2C 34 3C
[DI] 101 05 0D 15 1D 25 2D 35 3D
disp16² 110 06 0E 16 1E 26 2E 36 3E
[BX] 111 07 0F 17 1F 27 2F 37 3F
[BX + SI] + disp8³ 01 000 40 48 50 58 60 68 70 78
[BX + DI] + disp8 001 41 49 51 59 61 69 71 79
[BP + SI] + disp8 010 42 4A 52 5A 62 6A 72 7A
[BP + DI] + disp8 011 43 4B 53 5B 63 6B 73 7B
[SI] + disp8 100 44 4C 54 5C 64 6C 74 7C
[DI] + disp8 101 45 4D 55 5D 65 6D 75 7D
[BP] + disp8 110 46 4E 56 5E 66 6E 76 7E
[BX] + disp8 111 47 4F 57 5F 67 6F 77 7F
[BX + SI] + disp16 10 000 80 88 90 98 A0 A8 B0 B8
[BX + DI] + disp16 001 81 89 91 99 A1 A9 B1 B9
[BP + SI] + disp16 010 82 8A 92 9A A2 AA B2 BA
[BP + DI] + disp16 011 83 8B 93 9B A3 AB B3 BB
[SI] + disp16 100 84 8C 94 9C A4 AC B4 BC
[DI] + disp16 101 85 8D 95 9D A5 AD B5 BD
[BP] + disp16 110 86 8E 96 9E A6 AE B6 BE
[BX] + disp16 111 87 8F 97 9F A7 AF B7 BF
EAX/AX/AL/MM0/XMM0 11 000 C0 C8 D0 D8 E0 E8 F0 F8
ECX/CX/CL/MM1/XMM1 001 C1 C9 D1 D9 E1 E9 F1 F9
EDX/DX/DL/MM2/XMM2 010 C2 CA D2 DA E2 EA F2 FA
EBX/BX/BL/MM3/XMM3 011 C3 CB D3 DB E3 EB F3 FB
ESP/SP/AH/MM4/XMM4 100 C4 CC D4 DC E4 EC F4 FC
EBP/BP/CH/MM5/XMM5 101 C5 CD D5 DD E5 ED F5 FD
ESI/SI/DH/MM6/XMM6 110 C6 CE D6 DE E6 EE F6 FE
EDI/DI/BH/MM7/XMM7 111 C7 CF D7 DF E7 EF F7 FF
1. The default segment register is SS for the effective addresses containing a BP index, DS for other effective
addresses.
2. The disp16 nomenclature denotes a 16-bit displacement that follows the ModR/M byte and that is added to the
index.
3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte and that is sign-extended
and added to the index.
r8(/r) AL CL DL BL AH CH DH BH
r16(/r) AX CX DX BX SP BP SI DI
r32(/r) EAX ECX EDX EBX ESP EBP ESI EDI
mm(/r) MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
xmm(/r) XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7
(In decimal) /digit (Opcode) 0 1 2 3 4 5 6 7
(In binary) REG = 000 001 010 011 100 101 110 111
Effective Address Mod R/M Values of ModR/M Byte (In Hexadecimal)
[EAX] 00 000 00 08 10 18 20 28 30 38
[ECX] 001 01 09 11 19 21 29 31 39
[EDX] 010 02 0A 12 1A 22 2A 32 3A
[EBX] 011 03 0B 13 1B 23 2B 33 3B
[--][--]¹ 100 04 0C 14 1C 24 2C 34 3C
disp32² 101 05 0D 15 1D 25 2D 35 3D
[ESI] 110 06 0E 16 1E 26 2E 36 3E
[EDI] 111 07 0F 17 1F 27 2F 37 3F
[EAX] + disp8³ 01 000 40 48 50 58 60 68 70 78
[ECX] + disp8 001 41 49 51 59 61 69 71 79
[EDX] + disp8 010 42 4A 52 5A 62 6A 72 7A
[EBX] + disp8 011 43 4B 53 5B 63 6B 73 7B
[--][--] + disp8 100 44 4C 54 5C 64 6C 74 7C
[EBP] + disp8 101 45 4D 55 5D 65 6D 75 7D
[ESI] + disp8 110 46 4E 56 5E 66 6E 76 7E
[EDI] + disp8 111 47 4F 57 5F 67 6F 77 7F
[EAX] + disp32 10 000 80 88 90 98 A0 A8 B0 B8
[ECX] + disp32 001 81 89 91 99 A1 A9 B1 B9
[EDX] + disp32 010 82 8A 92 9A A2 AA B2 BA
[EBX] + disp32 011 83 8B 93 9B A3 AB B3 BB
[--][--] + disp32 100 84 8C 94 9C A4 AC B4 BC
[EBP] + disp32 101 85 8D 95 9D A5 AD B5 BD
[ESI] + disp32 110 86 8E 96 9E A6 AE B6 BE
[EDI] + disp32 111 87 8F 97 9F A7 AF B7 BF
EAX/AX/AL/MM0/XMM0 11 000 C0 C8 D0 D8 E0 E8 F0 F8
ECX/CX/CL/MM1/XMM1 001 C1 C9 D1 D9 E1 E9 F1 F9
EDX/DX/DL/MM2/XMM2 010 C2 CA D2 DA E2 EA F2 FA
EBX/BX/BL/MM3/XMM3 011 C3 CB D3 DB E3 EB F3 FB
ESP/SP/AH/MM4/XMM4 100 C4 CC D4 DC E4 EC F4 FC
EBP/BP/CH/MM5/XMM5 101 C5 CD D5 DD E5 ED F5 FD
ESI/SI/DH/MM6/XMM6 110 C6 CE D6 DE E6 EE F6 FE
EDI/DI/BH/MM7/XMM7 111 C7 CF D7 DF E7 EF F7 FF
3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte (or the SIB byte if one is
present) and that is sign-extended and added to the index.
jmp [0x1234]
ff 26 34 12
0xff is the opcode. Next to it, 0x26 is the ModR/M byte. Looking it up in
the 16-bit table (remember, using the bin format generates 16-bit code by
default), the first operand is in the row equivalent to disp16, which
means a 16-bit offset. Since the instruction does not have a second
operand, the column can be ignored when looking for operands.
66 01 c8
The interesting feature of this instruction is that 0x66 is not the
opcode. 0x01 is the opcode. So then, what is 0x66? Recall that every
assembly instruction can start with an optional instruction prefix,
and that is what 0x66 is. According to the Intel manual, vol 1:
Why is the first operand in the row and the second in a column? Let’s
break down the ModR/M byte, with an example value c8, into bits:
1 1 | 0 0 1 | 0 0 0 (mod | reg/opcode | r/m)
The mod field divides addressing modes into 4 different categories.
Combined with the r/m field, it selects exactly one addressing mode from
one of the 24 rows. If an instruction only requires one operand, then
the column can be ignored. Otherwise, the reg/opcode field provides
an extra register or a different variant, if an instruction requires one.
SIB is the Scale-Index-Base byte. This byte encodes ways to calculate the
memory position of an element of an array. The name SIB is based on this
formula for calculating an effective address:
Effective address = scale * index + base
Below is the table listing all 256 values of SIB byte, with the lookup
rule similar to ModR/M tables:
00000000 67 ff 24 43
First of all, the first byte, 0x67, is not an opcode but a prefix. This
value is the predefined address-size override prefix. After the prefix
come the opcode 0xff and the ModR/M byte 0x24. The value of the ModR/M
byte suggests that a SIB byte follows. The SIB byte is 0x43.
Looking it up in the SIB table, the row tells that eax is scaled by 2,
and the column tells that the base to be added is in ebx.
jmp [0x1234]
1. The [*] nomenclature means a disp32 with no base if the MOD is 00B. Otherwise, [*] means disp8 or disp32 +
[EBP]. This provides the following address modes:
ff 26 34 12
67 ff 24 85 34 12 00 00
✄ 0x24 is the ModR/M byte. According to table 4.5.2, the value suggests
that a SIB byte follows.
✄ 0x85 is the SIB byte. According to table 4.5.3, the byte 0x85 can
be destructured into bits as follows:
SS R/M REG
1 0 0 0 0 1 0 1
The above values are obtained through the columns SS, R/M and
finally the 8 columns of REG respectively. The bits combine into
the value 10000101, which is 0x85 in hex. By default, if a register
after the displacement is not specified, it is set to the EBP
register, and thus the 6th column (bit pattern 101) is always chosen.
If the example uses another register:
the SIB byte becomes 0x86 instead of 0x85, which is in the 7th column.
Try to verify with the table 4.5.3 again.
66 b8 34 12 00 00
Exercise 4.5.1. Read section 2.1 in Volume 2 for even more details.
Each table contains the following fields, and can have one or more rows:
Feature flag
(r), it means the first operand is encoded in r/m field of ModR/M byte,
and is only readable.
Flags affected lists the possible changes to system flags in EFLAGS reg-
ister.
Exceptions list the possible errors that can occur when an instruction
cannot run correctly. This section is valuable for OS debugging. Exceptions
fall into one of the following categories:
✄ Floating-Point Exception
For our OS, we only use Protected Mode Exceptions and Real-Address Mode
Exceptions. The details are in section 3.1.1.13 and 3.1.1.14, volume 2.
Let’s look at our good old jmp instruction. First, the opcode table:
main:
jmp main
jmp main2
jmp main
main2:
jmp 0x1234
Table 4.7.2: Memory address of each opcode
(main starts at address 00, main2 at address 06)
Address 00 01 02 03 04 05 06 07 08 09
Opcode eb fe eb 02 eb fa e9 2b 12 00
The same rule can be applied to rel16 and rel32 encoding. In the
example code, jmp 0x1234 uses rel16 (which means a 2-byte offset) and
is generated into e9 2b 12. As table 4.7.1 shows, the e9 opcode takes a
cw operand, which is a 2-byte offset (section 3.1.1.1, volume 2). Notice
one strange issue here: the offset value is 2b 12, while it is supposed to
be 34 12. There is nothing wrong. Remember, rel8/rel16/rel32 is an
offset, not an address. An offset is a distance from a point. Since no
label is given but a number, the target is counted from the start of the
program. In this case, the start of the program is the address 00 and the
end of jmp 0x1234 is the address 09 (which means 9 bytes were consumed,
starting from address 0), so the offset is calculated as 0x1234 - 0x9
= 0x122b. That solved the mystery!
jmp [0x1234]
is generated into:
ff 26 34 12
Since this is 16-bit code, we use table 4.5.1. Looking up the table,
ModR/M value 26 means disp16, which means a 16-bit offset from the
start of the current index (look at the note under the table), which is
the base address stored in the DS register.
is generated into:
67 ff 28
Since 28 is the value in the 5th column of the table 4.5.2 that refers
to [eax], we successfully generate an instruction for a far jump.
(Remember the prefix 67 indicates the instruction is used as 32-bit. The
prefix is only added if the default environment is assumed to be 16-bit
when the assembler generates code.) After the CPU runs the instruction,
the program counter eip and the code segment register cs are set to the
memory address stored in the memory location that eax points to, and the
CPU starts fetching code from the new address in cs and eip. To make it
more concrete, here is an example:
Address 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0x1000 34 12 00 00 78 56
eax = 0x00001000
cs = 0x00005678
eip = 0x00001234
As can be seen from the figure above, the blue part is a segment address,
loaded into the cs register with the value 0x5678; the red part is the
memory address within that segment, loaded into the eip register with the
value 0x1234, and execution starts from there.
jmp 0x5678:0x1234
is generated into:
ea 34 12 78 56
In this section, we will examine how data definition in C maps to its
assembly form. The generated code is extracted from the .bss section.
That means the assembly code displayed has no meaning as instructions,
aside from showing that such a value has an equivalent assembly opcode
that represents an instruction. (Actually, code is just a type of data,
and is often used for hijacking into a running program to execute such
code. However, we have no use for it in this book.)
The code-assembly listing is not random, but is based on Chapter 4
of Volume 1, “Data Type”. The chapter lists fundamental data types that
x86 hardware operates on, and through learning the generated assembly
code, it can be understood how close C maps its syntax to hardware, and
then a programmer can see why C is appropriate for OS programming.
The specific objdump command used in this section will be:
Note: zero bytes are hidden with three dot symbols: ... To show all
the zero bytes, we add -z option.
The most basic types that x86 architecture works with are based on sizes,
each is twice as large as the previous one: 1 byte (8 bits), 2 bytes (16 bits),
4 bytes (32 bits), 8 bytes (64 bits) and 16 bytes (128 bits).
These types are the simplest: they are just chunks of memory of different
sizes that enable the CPU to access memory efficiently. From the manual,
section 4.1.1, volume 1:
the output is (the colors mark which values belong to which variables):
mer (the colors correspond to the variables). Intel is a little-endian
machine, which means smaller addresses hold the less significant bytes
of a value and larger addresses hold the more significant bytes. For
example, 0x1234 is displayed as 34 12; that is, 34 appears first at
address 0x601032, then 12 at 0x601033. The digits within a byte are
unchanged, so we see 34 12 instead of 43 21. This is quite confusing at
first, but you will get used to it soon.
Also, isn’t it redundant to add int8_t when the char type is always 1 byte
already? The truth is, the char type is not guaranteed to be 8 bits in
size, but only a minimum of 8 bits. In C, a byte is defined to be the size
of a char, and a char is defined to be the smallest addressable unit of
the underlying hardware platform. There are hardware devices whose
smallest addressable unit is 16 bits or even bigger, which means a char
is 16 bits in size and a “byte” on such platforms is actually 2 units of
8-bit bytes.
Not all architectures support the double quadword type. Still, gcc does
provide support for 128-bit numbers and generates code for them when a
CPU supports it (that is, the CPU must be 64-bit). By specifying a
variable of type __int128 or unsigned __int128, we get a 128-bit
variable. If the CPU does not support 64-bit mode, gcc reports an error.
In all the examples above, when the value of a variable with a smaller
size is assigned to a variable with a larger size, the value easily fits
in the larger variable. Conversely, when the value of a variable with a
larger size is assigned to a variable with a smaller size, two scenarios
occur:
✄ The value is greater than the maximum value of the variable with the
smaller layout, so it is truncated to the size of that variable, which
produces an incorrect value.
✄ The value is smaller than the maximum value of the variable with the
smaller layout, so it fits the variable.
However, the value might be unknown until runtime and can be any value,
so it is best not to let such implicit conversions be handled by the
compiler, but to have them explicitly controlled by the programmer.
Otherwise they will cause subtle bugs that are hard to catch, as the
erroneous values might occur too rarely to reproduce the bugs.
Pointers are variables that hold memory addresses. x86 works with 2 types
of pointers:
Far pointer is also an offset like a near pointer, but with an explicit seg-
ment selector.
Figure 4.8.2: Numeric Data Types
[Figure: pointer layouts. Near pointer: Offset (bits 31-0). Far pointer:
segment selector (bits 47-32) and Offset (bits 31-0).]
C only provides support for near pointers, since far pointers are plat-
form dependent, such as x86. In application code, you can assume that
the address of current segment starts at 0, so the offset is actually any
memory address from 0 to the maximum address.
Source
#include <stdint.h>
int8_t i = 0;
int8_t *p1 = (int8_t *) 0x1234;
int8_t *p2 = &i;
The pointer p1 holds a direct address with the value 0x1234. The pointer
p2 holds the address of the variable i. Note that both the pointers are
8 bytes in size (or 4-byte, if 32-bit).
A bit field is a contiguous sequence of bits. Bit fields allow data
structuring at the bit level. For example, a 32-bit value can hold
multiple bit fields that represent multiple different pieces of
information, such as bits 0-4 specifying the size of a data structure,
bits 5-6 specifying permissions and so on. Data structures at the bit
level are common in low-level programming.
Source
struct bit_field {
int data1:8;
int data2:8;
int data3:8;
int data4:8;
};
struct bit_field2 {
int data1:8;
int data2:8;
int data3:8;
int data4:8;
char data5:4;
};
struct normal_struct {
int data1;
int data2;
int data3;
int data4;
};
struct normal_struct ns = {
.data1 = 0x12345678,
.data2 = 0x9abcdef0,
.data3 = 0x12345678,
.data4 = 0x9abcdef0,
};
int i = 0x12345678;
struct bit_field bf = {
.data1 = 0x12,
.data2 = 0x34,
.data3 = 0x56,
.data4 = 0x78
};
Assembly Each variable and its value are given a unique color in the as-
sembly listing below:
0804a018 <ns>:
804a018: 78 56 js 804a070 <_end+0x34>
804a01a: 34 12 xor al,0x12
804a01c: f0 de bc 9a 78 56 34 lock fidivr WORD PTR [edx+ebx*4+0x12345678]
804a023: 12
804a024: f0 de bc 9a 78 56 34 lock fidivr WORD PTR [edx+ebx*4+0x12345678]
804a02b: 12
0804a028 <i>:
804a028: 78 56 js 804a080 <_end+0x44>
804a02a: 34 12 xor al,0x12
0804a02c <bf>:
804a02c: 12 34 56 adc dh,BYTE PTR [esi+edx*2]
The sample code creates 4 variables: ns, i, bf, bf2. The definitions of
the normal_struct and bit_field structs both specify 4 integers.
bit_field specifies additional information next to its member name,
separated by a colon, e.g. .data1 : 8. This extra information is the bit
width of each bit group. It means that, even though defined as an int,
.data1 only consumes 8 bits of information. If additional data members
are specified after .data1, two scenarios happen:
✄ If the new data members fit within the remaining bits after .data1,
which are 24 bits (since .data1 is declared as an int, 32 bits are still
allocated, but .data1 can only access 8 bits of information), then the
total size of the bit_field struct is still 4 bytes, or 32 bits.
✄ If the new data members don’t fit, then the remaining 24 bits (3 bytes)
are still allocated. However, the new data members are allocated brand
new storage, without using the previous 24 bits.
In the example, the 4 data members .data1, .data2, .data3 and .data4 can
each access 8 bits of information, and together can access all 4 bytes
of the integer first declared by .data1. As can be seen from the
generated assembly code, the values of bf follow the natural order as
written in the C code: 12 34 56 78, since each value is a separate
member. In contrast, the value of i is a number as a whole, so it is
subject to the rule of little endianness and thus contains the value
78 56 34 12. Note that 804a02f is the address of the final byte in bf,
but next to it is the number 12, even though 78 is the last number in bf.
This extra number 12 does not belong to the value of bf. objdump is just
confused into treating 78 as an opcode; 78 corresponds to the js
instruction, which requires an operand. For that reason, objdump grabs
whatever byte comes after 78 and puts it there. objdump is a tool to
display assembly code, after all. A better tool to use is gdb, which we
will learn in the next chapter. But for this chapter, objdump suffices.
Finally, the struct of bf2 (bit_field2) is the same as that of bf
(bit_field), except it contains one more data member: .data5, which is
defined as a char. For this reason, another 4 bytes are allocated just
for .data5, even though it can only access 4 bits of information, and the
final value of bf2 is: 12 34 56 78 0f 00 00 00. The remaining 3 bytes
must be accessed by means of a pointer, or by casting to another data
type that can fully access all 4 bytes.
struct bit_field {
int data1:8;
};
struct bit_field bf = {
.data1 = 0x1234,
};
struct bit_field2 {
int data1:8;
int data5:32;
};
Although they share the same name, a string as defined by x86 is
different from a string in C. x86 defines strings as “continuous
sequences of bits, bytes, words, or doublewords”. On the other hand, C
defines a string as an array of 1-byte characters with a zero as the last
element of the array to make a null-terminated string. This implies that
strings in x86 are arrays, not C strings. A programmer can define an
array of bytes, words or doublewords with char or uint8_t, short or
uint16_t and int or uint32_t, but not an array of bits. However, such a
feature can be easily implemented, as an array of bits is essentially any
array of bytes, words or doublewords that is operated on at the bit level.
The following code demonstrates how to define array (string) data types:
Source
#include <stdint.h>
804a037: 12
Then comes a16 with 2 elements, each 2 bytes long. Since 2 elements are
4 bytes in total, which is in the natural alignment, gcc pads no bytes.
The value of a16 is 34 12 78 56, with a16[0] equal to 34 12 and a16[1]
equal to 78 56. Note that objdump is confused again, as de is the opcode
for the instruction fidivr (short for reverse divide) that requires
another operand, so objdump grabs whatever next bytes make sense to it
for creating “an operand”. Only the highlighted values belong to a32.
Finally comes a64, also with 2 elements, but 8 bytes each. The total size
of a64 is 16 bytes, which is in the natural alignment, therefore no
padding bytes are added. The values of both a64[0] and a64[1] are the
same: f0 de bc 9a 78 56 34 12, which gets misinterpreted as a fidivr
instruction.
uint8_t a2[2][2] = {
{0x12, 0x34},
{0x56, 0x78}
};
uint8_t a3[2][2][2] = {
{{0x12, 0x34},
{0x56, 0x78}},
{{0x9a, 0xbc},
{0xde, 0xff}},
};
char names[2][10] = {
"John Doe",
"Jane Doe"
};
This section explores how a compiler transforms high level code into
assembly code that the CPU can execute, and shows how common assembly
patterns help to create higher level syntax. The -S option is added to
objdump to better demonstrate the connection between high and low level
code. In this section, the option --no-show-raw-insn is added to the
objdump command to omit the opcodes for clarity:
The previous section explored how various types of data are created, and
how they are laid out in memory. Once memory storage is allocated for
variables, it must be readable and writable. Data transfer instructions
move data (bytes, words, doublewords or quadwords) between memory and
registers, and between registers, effectively reading from one storage
location and writing to another.
Source
#include <stdint.h>
int32_t i = 0x12345678;
return 0;
}
80483f6: ret
80483f7: xchg ax,ax
80483f9: xchg ax,ax
80483fb: xchg ax,ax
80483fd: xchg ax,ax
80483ff: nop
General data movement is performed with the mov instruction. Note that
despite the instruction being called mov, it actually copies data from
a source to a destination.
The red instruction copies data from the register esp to the register
ebp. This mov instruction moves data between registers and is assigned
the opcode 89.
The blue instructions copy data from one memory location (the i
variable) to another (the j variable). There is no mov form that moves
data from memory to memory; it requires two mov instructions, one for
copying the data from a memory location to a register, and one for
copying the data from the register to the destination memory location.
4.9.2 Expressions
Source
int expr(int i, int j)
{
int add = i + j;
int sub = i - j;
int mul = i * j;
int div = i / j;
int mod = i % j;
int neg = -i;
int and = i & j;
int or = i | j;
int xor = i ^ j;
int not = ~i;
int shl = i << 8;
return 0;
}
Assembly The full assembly listing is really long. For that reason, we ex-
amine expression by expression.
Similar to imul, idiv performs signed division. But, unlike imul
above, idiv only takes one operand:
The same idiv instruction also performs the modulo operation, since
it also calculates a remainder, which is stored in the variable mod, at
location [ebp-0x2c].
Expression: int or = i | j;
shl (shift logical left) shifts the bits in the destination operand to
the left by the number of bits specified in the source operand. In
this case, eax stores i and shl shifts eax by 8 bits to the left. A
different name for shl is sal (shift arithmetic left). The two names
are synonymous. Finally, the result is stored in the variable shl at
[ebp-0x14].
Here is a visual demonstration of shl/sal and shr instructions:
After shifting to the left, the last bit shifted out (the leftmost bit)
is stored in the Carry Flag of the EFLAGS register.
sar is similar to shl/sal, but shifts bits to the right and extends
the sign bit. For right shifts, shr and sar are two different
instructions. shr differs from sar in that it does not extend the sign
bit. Finally, the result is stored in the variable shr at [ebp-0x10].
[Figure: operand bit patterns and Carry Flag before and after shifting.
(a) SHL/SAL (Source: Figure 7-6, Volume 1); (b) SHR (Source: Figure 7-7,
Volume 1)]
With sar, the sign bit (the most significant bit) is preserved. That
is, if the sign bit is 0, the new bits always get the value 0; if the sign
bit is 1, the new bits always get the value 1.
cmp and the variants of the set instruction make up all the logical
comparisons. In this expression, cmp compares the variables i and j;
then sete stores the value 1 in the al register if the comparison from
cmp earlier is equal, or stores 0 otherwise. The general name for the
variants of the set instruction is SETcc. The suffix cc denotes the
condition being tested for in the EFLAGS register. Appendix B in
volume 1, “EFLAGS Condition Codes”, lists the conditions that can be
tested with this instruction. Finally, the result is stored in the
variable equal1 at [ebp-0x41].
Logical AND operator && is one of the syntaxes that is made entirely
in software (that is, there is no equivalent assembly instruction
implemented in hardware) with simpler instructions. The algorithm from
the assembly code is simple:
First, i is copied into eax at 80484d9. Then, the value of eax + 0x1
is copied into edx as an effective address at 80484dc. The lea (load
effective address) instruction copies a memory address into a register.
According to Volume 2, the source operand is a memory address specified
with one of the processor's addressing modes. This means the source
operand must be specified by the addressing modes defined in the
16-bit/32-bit ModR/M Byte tables, 4.5.1 and 4.5.2.
After loading the incremented value into edx, the value of i is in-
creased by 1 at 80484df. Finally, the previous i value is stored back
to i1 at [ebp-0x8] by the instruction at 80484e2.
The primary differences between this increment syntax and the pre-
vious one are:
4.9.3 Stack
✄ The push instruction and its variants add a new element on top of the
stack.
✄ The pop instruction and its variants remove the top-most element from
the stack.
Local variables are variables that exist within a scope. A scope is
delimited by a pair of braces: {..}. The most common scope in which to
define local variables is function scope. However, a scope can be
unnamed, and variables created inside an unnamed scope exist only within
that scope and its inner scopes.
void foo() {
int a;
int b;
}
int foo() {
int i;
{
int a = 1;
int b = 2;
{
return i = a + b;
}
}
}
a and b are local to the scope where they are defined and to its inner
child scope, which contains return i = a + b. However, they do not exist
at the function scope that creates i.
the stack. The local variables and arguments are automatically allocated
upon entering a function and destroyed after exiting it; that is why
they are called automatic variables.
A base frame pointer points to the start of the current function frame,
and is kept in the ebp register. Whenever a function is called, it is
allocated its own dedicated storage on the stack, called a stack frame.
A stack frame is where all local variables and arguments of a function
are placed on the stack (data and only data are allocated on the stack
for every stack frame; no code resides here).
When a function needs a local variable or an argument, it uses ebp
to access a variable:
✄ All local variables are allocated after the ebp pointer. Thus, to access
a local variable, a number is subtracted from ebp to reach the loca-
tion of the variable.
✄ The ebp pointer itself points to the saved ebp of the caller; the
return address sits right above it, at [ebp+0x4].
L = Local Variable
return i;
}
✄ [ebp+0x8] accesses a.
✄ [ebp+0xc] accesses b.
For accessing arguments, the rule is that the closer a variable on stack
to ebp, the closer it is to a function name.
Figure 4.9.6: Function arguments and local variables in memory
[Figure: the memory row at 0xffe0 holds N and i; arrows mark ebp+0x8 and
ebp+0x4. N = Next local variable starts here.]
From the figure, we can see that a and b are laid out in memory with
the exact order as written in C, relative to the return address.
Source
#include <stdio.h>
return a + b;
}
return 0;
}
Assembly For every function call, gcc pushes arguments on the stack in
reversed order with push instructions. That is, the arguments are pushed
on the stack in the reverse of the order in which they are written in the
high level C code, to preserve the relative order between arguments, as
seen in the previous section on how function arguments and local
variables are laid out. Then, gcc generates a call instruction, which
implicitly pushes a return address before transferring control to the
add function:
080483f2 <main>:
int main(int argc, char *argv[]) {
80483f2: push ebp
80483f3: mov ebp,esp
add(1,2);
80483f5: push 0x2
80483f7: push 0x1
80483f9: call 80483db <add>
80483fe: add esp,0x8
return 0;
8048401: mov eax,0x0
}
8048406: leave
8048407: ret
Upon finishing the call to the add function, the stack is restored by
adding 0x8 to the stack pointer esp (which is equivalent to 2 pop
instructions). Finally, a leave instruction is executed and main returns
with a ret instruction. A ret instruction transfers the program execution
back to the caller, to the instruction right after the call instruction,
which here is the add instruction. The reason ret can return to such a
location is that the return address is implicitly pushed by the call
instruction, and it is the address right after the call instruction;
whenever the CPU executes a ret instruction, it retrieves the return
address that sits right after all the arguments on the stack:
080483db <add>:
#include <stdio.h>
int add(int a, int b) {
80483db: push ebp
80483dc: mov ebp,esp
80483de: sub esp,0x10
int local = 0x12345;
80483e1: mov DWORD PTR [ebp-0x4],0x12345
return a + b;
80483e8: mov edx,DWORD PTR [ebp+0x8]
80483eb: mov eax,DWORD PTR [ebp+0xc]
80483ee: add eax,edx
}
80483f0: leave
80483f1: ret
Exercise 4.9.3. The above code that gcc generated for function calling
is actually the standard method x86 defines. Read chapter 6, “Procedure
Calls, Interrupts, and Exceptions”, Intel manual volume 1.
4.9.6 Loop
return 0;
}
80483ff: 90 nop
Exercise 4.9.4. Why does the increment instruction (the blue instruction)
appear before the compare instructions (the green instructions)?
Exercise 4.9.5. What assembly code can be generated for while and
do...while?
4.9.7 Conditional
if (argc) {
i = 1;
} else {
i = 0;
}
return 0;
}
The generated assembly code follows the same order as the correspond-
ing high level syntax:
Every program consists of code and data, and only those two components
make up a program. However, if a program consisted purely of its own code
and data, an operating system (or a human) could not tell which block of
the binary is code and which is just raw data, where in the program to
start execution, or which regions of memory should be protected and which
are free to modify. For that reason, each program carries extra metadata
that tells the operating system how to handle the program.
formation.
✄ An ELF header: the very first section of an executable that describes
the file’s organization.
✄ A program header table: an array of fixed-size structures that describes
the segments of an executable.
✄ A section header table: an array of fixed-size structures that describes
the sections of an executable.
✄ Segments and sections: the main content of an ELF binary, which are the
code and data, divided into chunks of different purposes.
– metadata about other sections used only in the linking process, and
disappear from the final executable.
{ .text
.rodata
{ ...
.data
Later we will compile our kernel as an ELF executable with GCC, and
explicitly specify how segments are created and where they are loaded
in memory through the use of a linker script, a text file that instructs a
linker how to generate a binary. For now, we will examine the anatomy
of an ELF executable in detail.
$ man elf
$ readelf -h hello
The output:
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400430
Start of program headers: 64 (bytes into file)
Start of section headers: 6648 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 28
Magic Displays the raw bytes that uniquely identify a file as an ELF
executable binary. Each byte carries a brief piece of information.
Output Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Byte Description
Possible values:
Value Description
0 Invalid class
1 32-bit objects
2 64-bit objects
Data A byte in Magic field. It specifies the data encoding of the processor-
specific data in the object file.
Possible values:
Value Description
Possible values:
Value Description
0 Invalid version
1 Current version
0 No file type
1 Relocatable file
2 Executable file
3 Shared object file
4 Core file
0xff00 Processor specific, lower bound
0xffff Processor specific, upper bound
The values from 0xff00 to 0xffff are reserved for a processor to de-
fine additional file types meaningful to it.
Machine Specifies the required architecture value for an ELF file e.g.
x86_64, MIPS, SPARC, etc. In the example, the machine is of x86_64
architecture.
Version Specifies the version number of the current object file (not the
version of the ELF header, as the above Version field specified).
Entry point address Specifies the memory address of the very first
code to be executed. In a normal application program, the default is the
address of the _start routine supplied by the C runtime, which
eventually calls main, but it can be any function, by explicitly
specifying the function name to gcc. For the operating system
we are going to write, this is the single most important field that we
need to retrieve to bootstrap our kernel, and everything else can be
ignored.
Start of section headers The offset of the section header table in bytes,
similar to the start of program headers. In the example, it is 6648 bytes
into file.
Flags Holds processor-specific flags associated with the file. The x86
architecture defines no such flags, so on an x86 machine this field is 0;
it is not related to the EFLAGS register. In the example, the value is
0x0.
Size of this header Specifies the total size of the ELF header in bytes.
In the example, it is 64 bytes, which equals Start of program
headers. Note that these two numbers are not necessarily equal,
as the program header table might be placed far away from the ELF header.
The only fixed component in an ELF executable binary is the ELF
header, which appears at the very beginning of the file.
Section header string table index Specifies the index of the header
in the section header table that points to the section that holds all
✄ Every section in an object file has exactly one section header describing
it. However, a section header may exist without a corresponding section.
✄ An object file may have inactive space. The various headers and the
sections might not “cover” every byte in an object file. The contents
of the inactive data are unspecified.
To get all the headers from an executable binary e.g. hello, use the fol-
lowing command:
$ readelf -S hello
Here is a sample output (do not worry if you don’t understand the
output. Just skim to get your eyes familiar with it. We will dissect it
soon enough):
summarizes the total number of sections in the file and where the
section headers start. Then comes the listing, section by section, under
the following header, which is also the format of each section entry:
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
Type This field (in a section header) identifies the type of each section.
Types are used to classify sections.
Address The starting virtual address of each section. Note that the
addresses are virtual only when a program runs in an OS with support
for virtual memory enabled. In our OS, which runs on bare metal, the
addresses will all be physical.
Offset is a distance in bytes, from the first byte of a file to the start of
an object, such as a section or a segment in the context of an ELF bi-
nary file.
Flag Descriptions
E Link editor is to exclude this section from executable and shared library that it builds when
those objects are not to be further relocated.
x Unknown flag to readelf. It happens because the linking process can be done manually
with a linker like GNU ld (as we will see later). That is, section flags can be specified
manually, and some flags are for a customized ELF that the open-source readelf doesn’t
know of.
O This section requires special OS-specific processing (beyond the standard linking rules) to
avoid incorrect behavior. If a link editor encounters such a section and its header contains
OS-specific Type or Flags values that the link editor does not recognize, the link editor
should reject the object file with an error.
o All bits included in this flag are reserved for operating system-specific semantics.
p All bits included in this flag are reserved for processor-specific semantics. If meanings are
specified, the processor supplement explains them.
Link and Info are numbers that reference the indexes of sections, symbol
table entries, or hash table entries. The Link field only holds the index
of a section, while the Info field holds an index of a section, a symbol
table entry or a hash table entry, depending on the type of the section.
Later when writing our OS, we will handcraft the kernel image by ex-
plicitly linking the object files (produced by gcc) through a linker script.
We will specify the memory layout of sections by specifying at what
addresses they will appear in the final image. But we will not assign
any section flag and let the linker take care of it. Nevertheless, know-
ing which flag does what is useful.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
Nr is 1.
EntSize is 0, which means this section does not have any fixed-size
entry.
Info and Link are 0 and 0, which means this section links to no sec-
tion or entry in any table.
Output
[14] .text PROGBITS 00000000004003e0 000003e0
0000000000000192 0000000000000000 AX 0 0 16
Nr is 14.
EntSize is 0, which means this section does not have any fixed-size
entry.
Info and Link are 0 and 0, which means this section links to no sec-
tion or entry in any table.
Align is 16, which means the starting address of the section should
be divisible by 16, or 0x10. Indeed, it is: 0x3e0/0x10 = 0x3e.
In this section, we will learn different details of section types and the pur-
poses of special sections e.g. .bss, .text, .data, etc, by looking at each
section one by one. We will also examine the content of each section as
a hexdump with the commands:
For example, if you want to examine the content of the section with index
25 (the .data section in the sample output) in the file hello:
$ readelf -x 25 hello
If a section contains strings e.g. string symbol table, the flag -x can
be replaced with -p.
NULL marks a section header as inactive; it does not have an associated
section. The NULL section is always the first entry of the section header
table. This means that any useful section starts from index 1.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
NOTE marks a section with special information that other programs will
check for conformance, compatibility, etc, by a vendor or a system builder.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
0000000000000024 0000000000000000 A 0 0 4
$ readelf -x 2 hello
we have:
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
...
[11] .init PROGBITS 0000000000400390 00000390
000000000000001a 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 00000000004003b0 000003b0
0000000000000020 0000000000000010 AX 0 0 16
[13] .plt.got PROGBITS 00000000004003d0 000003d0
0000000000000008 0000000000000000 AX 0 0 8
[14] .text PROGBITS 00000000004003e0 000003e0
0000000000000192 0000000000000000 AX 0 0 16
[15] .fini PROGBITS 0000000000400574 00000574
0000000000000009 0000000000000000 AX 0 0 4
[16] .rodata PROGBITS 0000000000400580 00000580
0000000000000004 0000000000000004 AM 0 0 4
[17] .eh_frame_hdr PROGBITS 0000000000400584 00000584
000000000000003c 0000000000000000 A 0 0 4
[18] .eh_frame PROGBITS 00000000004005c0 000005c0
0000000000000114 0000000000000000 A 0 0 8
...
[23] .got PROGBITS 0000000000600ff8 00000ff8
0000000000000008 0000000000000008 WA 0 0 8
[24] .got.plt PROGBITS 0000000000601000 00001000
0000000000000020 0000000000000008 WA 0 0 8
[25] .data PROGBITS 0000000000601020 00001020
0000000000000010 0000000000000000 WA 0 0 8
[27] .comment PROGBITS 0000000000000000 00001030
0000000000000034 0000000000000001 MS 0 0 1
.data This section holds the initialized data of a program. Since the
data are initialized with actual values, gcc allocates the section with
actual bytes in the executable binary.
.bss This section, short for Block Started by Symbol, holds uninitialized
data of a program. Unlike other sections, no space is allocated
for this section in the image of the executable binary on disk.
The section is allocated only when the program is loaded into main
memory.
Other sections are mainly needed for dynamic linking, that is, linking
code at runtime so that it can be shared between many programs. To enable
such a feature, an OS must be present as a runtime environment.
Since we run our OS on bare metal, we are effectively creating such an
environment. For simplicity, we won’t add dynamic linking to our
OS.
SYMTAB and DYNSYM These sections hold a symbol table. A symbol table
is an array of entries that describe the symbols in a program. A symbol
is a name assigned to an entity in a program. The types of these entities
are also the types of symbols, and these are the possible types
of an entity:
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 5] .dynsym DYNSYM 00000000004002b8 000002b8
0000000000000048 0000000000000018 A 6 1 8
...
[29] .symtab SYMTAB 0000000000000000 00001068
0000000000000648 0000000000000018 30 47 8
$ readelf -s hello
LOCAL are symbols that are only visible in the object files that
defined them. In C, the static modifier marks a symbol (e.g.
a variable/function) as local to only the file that defines it.
hello.c
return 0;
}
GLOBAL are symbols that are accessible by other object files when
linked together. These symbols are primarily non-static functions
and non-static global data. The extern modifier marks
a symbol as defined elsewhere but accessible in the
final executable binary, so an extern variable is also considered
GLOBAL.
hello.c
#include <stdio.h>
$ ./hello
warning: function is not implemented.
add(1,2) is 0
math.c
Value Description
HIDDEN A symbol is hidden when the name is not visible to any other program outside of its
running program.
PROTECTED A symbol is protected when it is shared outside of its running program or shared library
and cannot be overridden. That is, there can only be one definition for this symbol
across running programs that use it. No program can define its own definition of the
same symbol.
INTERNAL Visibility is processor-specific and is defined by processor-specific ABI.
Ndx is the index of a section that the symbol is in. Aside from fixed
index numbers that represent section indexes, index has these spe-
cial values:
Value Description
Others Sometimes, values such as ANSI_COM, LARGE_COM, SCOM, SUND appear. This means that the
index is processor-specific.
✄ main is a function.
✄ main is inside the 14th section, which is .text. This is logical, since
.text holds all program code.
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[28] .shstrtab STRTAB 0000000000000000 000018b6
000000000000010c 0000000000000000 0 0 1
[30] .strtab STRTAB 0000000000000000 000016b0
0000000000000206 0000000000000000 0 0 1
.strtab holds the names of symbols, e.g. variable names, function names,
struct names, etc., in a C program, but not the null-terminated C strings
of the program itself; the C strings are kept in the .rodata section.
$ readelf -p 28 hello
The output shows all the section names, with the offset (also the string
index) into the .shstrtab table to the left:
[ 8e] .init
[ 94] .plt.got
[ 9d] .text
[ a3] .fini
[ a9] .rodata
[ b1] .eh_frame_hdr
[ bf] .eh_frame
[ c9] .init_array
[ d5] .fini_array
[ e1] .jcr
[ e6] .dynamic
[ ef] .got.plt
[ f8] .data
[ fe] .bss
[ 103] .comment
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
00000000 \0 . s y m t a b \0 . s t r t a b
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
00000010 \0 . s h s t r t a b \0 . i n t e
.... and so on ....
Figure 5.4.1: String table in memory of .shstrtab. A red number is the
starting index of a string.
Similarly, the output of .strtab:
[ 2e] __do_global_dtors_aux
[ 44] completed.7585
[ 53] __do_global_dtors_aux_fini_array_entry
[ 7a] frame_dummy
[ 86] __frame_dummy_init_array_entry
[ a5] hello.c
[ ad] __FRAME_END__
[ bb] __JCR_END__
[ c7] __init_array_end
[ d8] _DYNAMIC
[ e1] __init_array_start
[ f4] __GNU_EH_FRAME_HDR
[ 107] _GLOBAL_OFFSET_TABLE_
[ 11d] __libc_csu_fini
[ 12d] _ITM_deregisterTMCloneTable
[ 149] j
[ 14b] _edata
[ 152] __libc_start_main@@GLIBC_2.2.5
[ 171] __data_start
[ 17e] __gmon_start__
[ 18d] __dso_handle
[ 19a] _IO_stdin_used
[ 1a9] __libc_csu_init
[ 1b9] __bss_start
[ 1c5] main
[ 1ca] _Jv_RegisterClasses
[ 1de] __TMC_END__
[ 1ea] _ITM_registerTMCloneTable
HASH holds a symbol hash table, which supports symbol table access.
the bytes in the section can have any value. Until an operating system
actually loads the section into main memory, there is no need to allocate
space for it in the binary image on disk, which reduces the size of the
binary file. Here are the details of .bss from the example output:
Output
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[26] .bss NOBITS 0000000000601038 00001038
0000000000000008 0000000000000000 WA 0 0 1
[27] .comment PROGBITS 0000000000000000 00001038
0000000000000034 0000000000000001 MS 0 0 1
In the above output, the size of .bss is 8 bytes, yet the offsets of
.bss and .comment are the same, which means .bss consumes no
bytes of the executable binary on disk.
Notice that the .comment section has no starting address. This means
that this section is discarded when the executable binary is loaded
into memory.
REL holds relocation entries without explicit addends. This type will be
explained in detail in 8.1.
RELA holds relocation entries with explicit addends. This type will be
explained in detail in 8.1.
However, we will not use any .init and INIT_ARRAY sections in our
operating system, for simplicity, as initializing an environment is part
of the operating-system domain.
hello.c
#include <stdio.h>
return 0;
}
for gcc. If we want init2 to run before init1, we give it a higher pri-
ority:
hello.c
#include <stdio.h>
return 0;
}
hello.c
#include <stdio.h>
void init1() {
printf("%s\n", __FUNCTION__);
}
void init2() {
printf("%s\n", __FUNCTION__);
}
return 0;
}
hello.c
#include <stdio.h>
return 0;
}
hello.c
#include <stdio.h>
void preinit1() {
printf("%s\n", __FUNCTION__);
}
void preinit2() {
printf("%s\n", __FUNCTION__);
}
void init1() {
printf("%s\n", __FUNCTION__);
}
void init2() {
printf("%s\n", __FUNCTION__);
}
__attribute__((section(".preinit_array"))) preinit preinit_arr[2] = {preinit1, preinit2};
__attribute__((section(".init_array"))) init init_arr[2] = {init1, init2};
return 0;
}
GROUP defines a section group, which is the same section that appears
in different object files; when they are merged into the final executable
binary file, only one copy is kept and the rest in other object files are
discarded. This section is only relevant in C++ object files, so we will
not examine it further.
Exercise 5.4.1. Verify that the value of the Link field of a SYMTAB sec-
tion is the index of a STRTAB section.
Exercise 5.4.2. Verify that the value of the Info field of a SYMTAB section
is the index of the last local symbol + 1. This means that in the symbol
table, from the index listed by the Info field onward, no local symbol appears.
Exercise 5.4.3. Verify that the value of the Link field of a REL section
is the index of the SYMTAB section.
Exercise 5.4.4. Verify that the value of the Info field of a REL section
is the index of the section where relocation is applied. For example, if
the section is .rel.text, then the relocating section should be .text.
PHDR specifies the location and size of the program header table itself,
both in the file and in the memory image of the program
LOAD specifies a loadable segment. That is, this segment is loaded into
main memory.
✄ Write (W)
✄ Execute (E)
$ readelf -l hello
Output:
Output
Elf file type is EXEC (Executable file)
Entry point 0x400430
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000070c 0x000000000000070c R E 200000
✄ the upper LOAD has Read and Execute permission. This is a text seg-
✄ the lower LOAD has Read and Write permission. This is a data segment.
It means that this segment can be read and written to, but is not allowed
to be used as executable code, for security reasons.
in a LOAD segment are always loaded by the operating system; all sec-
tions have the same permission, either a RE (Read + Execute) for ex-
ecutable sections, or RW (Read + Write) for data sections.
To see the last point more clearly, consider an example of linking two
object files. Suppose we have two source files:
hello.c
#include <stdio.h>
and:
math.c
$ readelf -S math.o
$ readelf -l math.o
There are no program headers in this file.
$ readelf -l hello.o
There are no program headers in this file.
Only when object files are combined into a final executable binary are
sections fully realized:
✄ 1st section address = starting segment address + section offset = 0x8048000 + 0x154 = 0x08048154
✄ 2nd section address = starting segment address + section offset = 0x8048000 + 0x168 = 0x08048168
Indeed, the end address of a segment is also the end address of the final
section. We can see this by listing all the segments:
$ readelf -l hello
Then check, for example, the first LOAD segment, which starts at 0x08048000
and ends at 0x08048000 + 0x005fc = 0x080485fc:
Output
Elf file type is EXEC (Executable file)
Entry point 0x8048310
There are 9 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
INTERP 0x000154 0x08048154 0x08048154 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x005fc 0x005fc R E 0x1000
LOAD 0x000f08 0x08049f08 0x08049f08 0x00114 0x00118 RW 0x1000
DYNAMIC 0x000f14 0x08049f14 0x08049f14 0x000e8 0x000e8 RW 0x4
NOTE 0x000168 0x08048168 0x08048168 0x00044 0x00044 R 0x4
GNU_EH_FRAME 0x0004dc 0x080484dc 0x080484dc 0x00034 0x00034 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
The last section in the first LOAD segment is .eh_frame. The .eh_frame
section starts at 0x08048510 because the start address of the segment is
0x08048000 and the offset of the section into the file is 0x510. The end
address of .eh_frame should be: 0x08048000 + 0x510 + 0xec = 0x080485fc
because the section size is 0xec. This is exactly the same as the end
address of the first LOAD segment above: 0x08048000 + 0x5fc = 0x080485fc.
We will be using GDB, the GNU Debugger, for debugging our kernel.
gdb is the program name. gdb can do four main kinds of things:
✄ Start your program, specifying anything that might affect its behav-
ior.
There must be an existing program for debugging. The good old “Hello
World” program suffices for the educational purpose in this chapter:
hello.c
#include <stdio.h>
$ gdb hello
Output
Symbols from "/tmp/hello".
Local exec file:
‘/tmp/hello’, file type elf32-i386.
Entry point: 0x8048310
0x08048154 - 0x08048167 is .interp
0x08048168 - 0x08048188 is .note.ABI-tag
0x08048188 - 0x080481ac is .note.gnu.build-id
0x080481ac - 0x080481cc is .gnu.hash
0x080481cc - 0x0804821c is .dynsym
0x0804821c - 0x08048266 is .dynstr
0x08048266 - 0x08048270 is .gnu.version
0x08048270 - 0x08048290 is .gnu.version_r
0x08048290 - 0x08048298 is .rel.dyn
0x08048298 - 0x080482a8 is .rel.plt
0x080482a8 - 0x080482cb is .init
0x080482d0 - 0x08048300 is .plt
0x08048300 - 0x08048308 is .plt.got
0x08048310 - 0x080484a2 is .text
0x080484a4 - 0x080484b8 is .fini
0x080484b8 - 0x080484cd is .rodata
0x080484d0 - 0x080484fc is .eh_frame_hdr
0x080484fc - 0x080485c8 is .eh_frame
0x08049f08 - 0x08049f0c is .init_array
0x08049f0c - 0x08049f10 is .fini_array
0x08049f10 - 0x08049f14 is .jcr
0x08049f14 - 0x08049ffc is .dynamic
0x08049ffc - 0x0804a000 is .got
0x0804a000 - 0x0804a014 is .got.plt
0x0804a014 - 0x0804a01c is .data
0x0804a01c - 0x0804a020 is .bss
✄ Path of a symbol file. A symbol file is the file that contains the debug-
ging information. Usually, this is the same file as the binary, but it is
✄ The path of the program being debugged and its file type. In the example,
it is this line:
✄ The entry point of the program being debugged. That is, the very first
code the program runs. In the example, it is this line:
✄ A list of sections with their starting and ending addresses. In the
example, it is the remaining output.
This command is similar to info target but gives extra information about
program sections, specifically the file offset and the flags of each section.
Example 6.2.3. Here is the output when running against hello pro-
gram:
The output is similar to info target, but with more details. Next to
the section names are the section flags, which are attributes of a section.
Here, we can see that the sections with the LOAD flag are from a LOAD segment.
The command can be combined with the section flags for filtered output:
ALLOBJ displays sections for all loaded object files, including shared
libraries. Shared libraries are only displayed when the program is al-
ready running.
section-flags displays only sections with the specified section flags. Note
that these section flags are specific to gdb, though they are based on the
section attributes defined previously. Currently, gdb understands the
following flags:
ALLOC Section will have space allocated in the process when loaded.
Set for all sections except those containing debug information.
LOAD Section will be loaded from the file into the child process mem-
ory. Set for pre-initialized code and data, clear for .bss sections.
The output:
This command lists all function names and their loaded addresses. The
names can be filtered with a regular expression.
This command lists all global and static variable names, or filtered with
a regular expression.
Example 6.2.7. If we add a global variable int i into the sample source
program and recompile then run the command, we get the following out-
put:
5 printf("Hello World!\n");
0x0804841c <+17>: sub esp,0xc
0x0804841f <+20>: push 0x80484c0
0x08048424 <+25>: call 0x80482e0 <puts@plt>
0x08048429 <+30>: add esp,0x10
6 return 0;
0x0804842c <+33>: mov eax,0x0
7 }
0x08048431 <+38>: mov ecx,DWORD PTR [ebp-0x4]
0x08048434 <+41>: leave
0x08048435 <+42>: lea esp,[ecx-0x4]
0x08048438 <+45>: ret
End of assembler dump.
Now the high level source (in green text) is included as part of the as-
sembly dump. Each line is backed by the corresponding assembly code
below it.
The filename must be enclosed in single quotes, and the function must
be prefixed by double colons e.g. ’hello.c’::main to specify disassembling
of the function main in the file hello.c.
6.2.6 Command: x
(gdb) x main
By default, without any argument, the command only prints the con-
tent of a single memory address. In this case, that is the starting mem-
ory address in main.
Output 0x804840b <main>: 0x8d 0x4c 0x24 0x04 0x83 0xe4 0xf0 0xff
0x8048413 <main+8>: 0x71 0xfc 0x55 0x89 0xe5 0x51 0x83 0xec
0x804841b <main+16>: 0x04 0x83 0xec 0x0c
The /20b main argument means that the command prints 20 bytes, starting
from where main starts in memory.
The general form for format argument is: /<repeated count><format
letter>
If the repeated count is not supplied, by default gdb supplies the count
as 1. The format letter is one of the following values:
Letter Description
(gdb) r
Output
Starting program: /tmp/hello
Hello World!
[Inferior 1 (process 1002) exited normally]
The program runs successfully and prints the message “Hello World”.
However, it would not be useful if all gdb can do is run a program.
hello.c
1 #include <stdio.h>
2
3 int main(int argc, char *argv[])
4 {
5 printf("Hello World!\n");
6 return 0;
7 }
(gdb) b 3
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0x7fffffffdfb8) at hello.c:5
5 printf("Hello World!\n");
The breakpoint is at line 3, but gdb stopped at line 5. The reason is that
line 3 does not contain code, but a function signature; gdb only stops where
it can execute code. The code in the function starts at line 5, the call to
printf, so gdb stops there.
Example 6.3.3. A line number is not always a reliable way to specify
a breakpoint, as the source code can change. What if gdb should always
stop at the main function? In this case, a better method is to use the
function name directly:
b main
Then, regardless of how the source code changes, gdb always stops at
the main function.
Output
$3 = {int (int, char **)} 0x400526 <main>
b *0x400526
Example 6.3.5. gdb can also set a breakpoint in any source file. Suppose
that the hello program is composed of not just one file but many files e.g.
hello1.c, hello2.c, hello3.c... In that case, simply add the filename before
either a line number or a function name:
b hello.c:3
b hello.c:main
This command executes the current line and stops at the next line. When
the current line is a function call, the command steps over it.
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0x7fffffffdfb8) at hello.c:5
5 printf("Hello World!\n");
(gdb) n
In the output, the first line shows the output produced after execut-
ing line 5; then, the next line shows where gdb stops currently, which is
line 6.
This command executes the current line and stops at the next line. When
the current line is a function call, the command steps into it and stops
at the first line of the called function.
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0xffffd154) at hello.c:11
11 add(1, 2);
(gdb) s
Output
add (a=1, b=2) at hello.c:6
6 return a + b;
After executing the command s, gdb stepped into the add function where
the first statement is a return.
6.3.5 Command: ni
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, main (argc=1, argv=0xffffd154) at hello.c:5
5 printf("Hello World!\n");
(gdb) ni
Output
0x0804841f 5 printf("Hello World!\n");
(gdb) ni
Output
0x08048424 5 printf("Hello World!\n");
(gdb) ni
(gdb)
Output 6 return 0;
Upon entering ni, gdb executes the current instruction and displays the
next instruction. That’s why, in the output, gdb only displays 3 addresses:
0x0804841f, 0x08048424 and 0x08048429. The instruction at
0x0804841c, which is the first instruction of the printf call, is not
displayed because it is the instruction that gdb stopped at. Assuming that
gdb stopped at the first instruction of the printf call at 0x0804841c, the
current instruction can be displayed using the x command:
6.3.6 Command: si
Example 6.3.10. Recall that the assembly code generated from printf
contains a call instruction:
(gdb) si
Output
0x0804841f 5 printf("Hello World!\n");
(gdb) si
Output
0x08048424 5 printf("Hello World!\n");
(gdb) si
Output
0x080482e0 in puts@plt ()
This command executes until a line whose number is greater than that of
the current line is reached.
hello.c
#include <stdio.h>

int add1000() {
    int total = 0;
    for (int i = 0; i < 1000; ++i){
        total += i;
    }
    printf("Done adding!\n");
    return total;
}
Using the next command, we would need to enter it 1000 times to finish
the loop. Instead, a faster way is to use until:
(gdb) b add1000
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, add1000 () at hello.c:4
4 int total = 0;
(gdb) until
Output
5 for (int i = 0; i < 1000; ++i){
(gdb) until
Output 6 total += i;
(gdb) until
Output
5 for (int i = 0; i < 1000; ++i){
(gdb) until
Output
8 printf("Done adding!\n");
Executing the first until, gdb stopped at line 5 since line 5 is greater
than line 4.
Executing the second until, gdb stopped at line 6 since line 6 is greater
than line 5.
Executing the third until, gdb stopped at line 5 since the loop still
continues. Because line 5 is less than line 6, with the fourth until, gdb
kept executing until it no longer went back to line 5 and stopped
at line 8. This is a great way to skip over a loop in the middle, instead of
setting an unneeded breakpoint.
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, add1000 () at hello.c:4
4 int total = 0;
(gdb) until 8
Output
add1000 () at hello.c:8
8 printf("Done adding!\n");
6.3.8 Command: finish
This command executes until the end of the current function and displays
the return value. finish is actually just a more convenient version of
until.
Example 6.3.13. Using the add1000 function from the previous example, we
run finish instead of until:
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, add1000 () at hello.c:4
4 int total = 0;
(gdb) finish
Output
Run till exit from #0 add1000 () at hello.c:4
Done adding!
0x08048466 in main (argc=1, argv=0xffffd154) at hello.c:15
15 add1000(1, 2);
Value returned is $1 = 499500
6.3.9 Command: bt
This command prints the backtrace of all stack frames. A backtrace is a
list of currently active functions:
hello.c
void d(int d) { };
void c(int c) { d(0); }
void b(int b) { c(1); }
void a(int a) { b(2); }
int main(int argc, char *argv[])
{
    a(3);
    return 0;
}
(gdb) b a
(gdb) r
Output
Starting program: /tmp/hello
Breakpoint 1, a (a=3) at hello.c:9
9 void a(int a) { b(2); }
(gdb) s
Output
b (b=2) at hello.c:7
7 void b(int b) { c(1); }
(gdb) s
Output
c (c=1) at hello.c:5
5 void c(int c) { d(0); }
(gdb) s
Output
d (d=0) at hello.c:3
3 void d(int d) { };
(gdb) bt
Output
#0 d (d=0) at hello.c:3
#1 0x080483eb in c (c=1) at hello.c:5
#2 0x080483fb in b (b=2) at hello.c:7
#3 0x0804840b in a (a=3) at hello.c:9
#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13
The most recent calls are placed at the top and the least recent calls
near the bottom. In this case, d is the most recently called active
function, so it has index 0. Next is c, the second active function, with
index 1, and so on through function b and function a to function main at
the bottom, the least recent function. That is how we read a backtrace.
6.3.10 Command: up
This command moves the current frame up one level, to the caller of the
current frame.
(gdb) bt
Output
#0 d (d=0) at hello.c:3
#1 0x080483eb in c (c=1) at hello.c:5
#2 0x080483fb in b (b=2) at hello.c:7
#3 0x0804840b in a (a=3) at hello.c:9
#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13
(gdb) up
Output
#1 0x080483eb in c (c=1) at hello.c:5
5 void c(int c) { d(0); }
The output shows that the current frame has moved up to c, and displays
the line in c where the call to d was made, which is line 5.
6.3.11 Command: down
Similar to up, this command moves the current frame down one level, to
the frame called by the current frame.
(gdb) bt
Output
#0 d (d=0) at hello.c:3
#1 0x080483eb in c (c=1) at hello.c:5
#2 0x080483fb in b (b=2) at hello.c:7
#3 0x0804840b in a (a=3) at hello.c:9
#4 0x0804841b in main (argc=1, argv=0xffffd154) at hello.c:13
(gdb) up
Output
#1 0x080483eb in c (c=1) at hello.c:5
5 void c(int c) { d(0); }
(gdb) down
Output
#0 d (d=0) at hello.c:3
3 void d(int d) { };
6.3.12 Command: info registers
This command lists the current values of the commonly used registers. It
is useful when debugging assembly and operating system code, as we can
inspect the current state of the machine.
The above registers suffice for writing our operating system in the
later parts.
83 ec 0c     →  cc ec 0c
sub esp,0xc     int 3
Figure 6.4.1: Opcode replacement, with int 3
int 3 only costs a single byte, making it efficient for debugging. When
the int 3 instruction is executed, the operating system calls its
breakpoint interrupt handler. The handler then checks which process
reached a breakpoint, pauses it and notifies the debugger that a
debugged process has been paused. The debugged process is only paused,
which means the debugger is free to inspect its internal state, like a
surgeon operating on an anesthetized patient. Then, the debugger
replaces the int 3 opcode with the original opcode and executes the
original instruction normally.
cc ec 0c  →  83 ec 0c
int 3        sub esp,0xc
Figure 6.4.2: Restore the original opcode, after int 3 was executed
Example 6.4.1. It is simple to see int 3 in action. First, we add an
int 3 instruction where we need gdb to stop:
hello.c
#include <stdio.h>
$ gdb hello
(gdb) r
Output
Starting program: /tmp/hello
Program received signal SIGTRAP, Trace/breakpoint trap.
main (argc=1, argv=0xffffd154) at hello.c:6
6 printf("Hello World\n");
The line reporting SIGTRAP indicates that gdb encountered a breakpoint,
and indeed it stopped at the right place: the printf call, which int 3
precedes.
[Figure: lines of hello.c mapped to DIEs. The entry for line 3, int
main(int argc, char *argv[]), states that main in hello.c is at
0x804840b in hello.]
not be able to build an operating system that can be debugged with gdb.
hello.c
#include <stdio.h>
return 0;
}
With the binary ready, we can look at the line number table with the
command:
hello.c 10 0x8048431
Line number is the number of a line in the source file that is not
empty. In the example, line 8 is an empty line, so it does not appear.
Starting address is the memory address where the code for the line
starts in the executable binary.
With such clear information, gdb is able to set a breakpoint on a line
easily. For placing breakpoints on variables and functions, it is time
to look at the DIEs. To get the DIE information from an executable
binary, run the command:
The -wi option lists all the DIE entries. This is one typical DIE entry:
Red This left-most number indicates the current nesting level of a DIE
entry. 0 is the outermost level, a DIE whose entity is the compilation
unit. This means subsequent DIE entries with higher nesting levels are
all children of this DIE, the compilation unit. It makes sense, as
all the entities must originate from a source file.
Blue These numbers in hex format indicate offsets into the .debug_info
section. Each piece of information is displayed along with its offset.
When an attribute refers to another attribute, the offset is used to
precisely identify the referenced attribute.
Green These names with the DW_AT_ prefix are the attributes attached to
a DIE that describe an entity. Notable attributes:
DW_AT_name
DW_AT_low_pc
DW_AT_high_pc The start and end of the current entity, which is the
compilation unit, in the executable binary. The value in DW_AT_low_pc
is the starting address. DW_AT_high_pc is the size of the compilation
unit; adding it to DW_AT_low_pc gives the end address of the entity.
In this example, code compiled from hello.c starts at 0x804840b and
ends at 0x804840b + 0x2e = 0x8048439.
To really make sure, we verify with objdump:
Output
int main(int argc, char *argv[])
{
804840b: 8d 4c 24 04 lea ecx,[esp+0x4]
804840f: 83 e4 f0 and esp,0xfffffff0
8048412: ff 71 fc push DWORD PTR [ecx-0x4]
8048415: 55 push ebp
8048416: 89 e5 mov ebp,esp
8048418: 51 push ecx
8048419: 83 ec 04 sub esp,0x4
printf("Hello World\n");
804841c: 83 ec 0c sub esp,0xc
804841f: 68 c0 84 04 08 push 0x80484c0