x86 Architecture
ICS312
Machine-Level and
Systems Programming
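[Figure: the 8086 registers, each 16 bits wide: AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL); SI, DI, BP, SP, IP; FLAGS; the segment registers CS, DS, SS, ES. Also shown: the Control Unit and the ALU.]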
Addresses in Memory
We mentioned several registers that are used for
holding addresses of memory locations
Segments:
CS, DS, SS, ES
Pointers:
SI, DI: indices (typically used for pointers)
SP: Stack pointer
BP: (Stack) Base pointer
IP: pointer to the next instruction
Let’s look at the structure of the address space
Address Space
In the 8086 processor, a program is limited to referencing an
address space of size 1MiB, that is 2^20 bytes
Therefore, addresses are 20 bits long!
A d-bit address can reference 2^d different “things”
Example:
2-bit addresses
00, 01, 10, 11
4 “things”
3-bit addresses
000, 001, 010, 011, 100, 101, 110, 111
8 “things”
In our case, these things are “bytes”
One does not address anything smaller than a byte
Therefore, a 20-bit address makes it possible to address 2^20
individual bytes, or 1MiB
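The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the language choice and the function names `addressable_units` and `all_addresses` are made up for this example and are not part of the course material.

```python
def addressable_units(d):
    """Number of distinct addresses that fit in d bits."""
    return 2 ** d


def all_addresses(d):
    """Every d-bit address, written in binary with leading zeros."""
    return [format(a, f"0{d}b") for a in range(addressable_units(d))]


# The 2-bit and 3-bit examples from the slide.
print(all_addresses(2))
print(all_addresses(3))

# A 20-bit address reaches 2^20 bytes, which is 1 MiB (1024 * 1024 bytes).
print(addressable_units(20), addressable_units(20) == 1024 * 1024)
```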
Address Space
One says that a running program has a 1MiB
address space
And the program needs to use 20-bit
addresses to reference memory content
Instructions, data, etc.
Problem: registers are only 16 bits long! How
can they hold a 20-bit address?
The solution: split addresses in two pieces:
The selector
The offset
Simple Selector and Offset
Let us assume that we have an address
space of size 2^4 = 16 bytes
Yes, that would not be a useful computer
Addresses are 4 bits long
Let’s assume we have a 2-bit selector and a
2-bit offset
As if our computer had only 2-bit registers
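[Figure: an address split into a 4-bit selector and a 16-bit offset; memory shown as regions whose addresses start with 1011…, 1100…, 1101…, 1110…, 1111…]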
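For the toy machine with 4-bit addresses, the split into a 2-bit selector and a 2-bit offset can be sketched as below. This assumes the simple scheme in which the selector is just the high bits of the address and the offset is the low bits; the names `split` and `join` are hypothetical helpers for this illustration.

```python
OFFSET_BITS = 2  # toy machine: 2-bit selector followed by 2-bit offset


def split(address):
    """Split a 4-bit address into (selector, offset)."""
    selector = address >> OFFSET_BITS              # high 2 bits
    offset = address & ((1 << OFFSET_BITS) - 1)    # low 2 bits
    return selector, offset


def join(selector, offset):
    """Rebuild the 4-bit address from its two 2-bit pieces."""
    return (selector << OFFSET_BITS) | offset


# Each selector value names one 4-byte segment of the 16-byte address space.
for address in range(16):
    selector, offset = split(address)
    print(f"{address:04b} -> selector {selector:02b}, offset {offset:02b}")
```

With this scheme the four segments do not overlap: every address has exactly one (selector, offset) pair.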
The 8086 Selector Scheme
So far we’ve talked about the selector as a 4-bit quantity, for
simplicity
And we can clearly store a 4-bit quantity in a 16-bit register
This leads to 16 non-overlapping segments
The designers of the 8086 wanted more flexibility
E.g., if you know that you need only an 8K segment, why use
64K for it? Just have the “next” segment start 8K after the
previous segment
We’ll see why segments are needed in a little bit
So, for the 8086, the selector is NOT a 4-bit field, but rather
the address of the beginning of the segment
But now we’re back to our initial problem: addresses are 20-bit,
so how are we to store an address in a 16-bit register?
The 8086 Selector Scheme
What the designers of the 8086 did is pretty simple
Enforce that the beginning address of a segment can
only be a multiple of 16
Therefore, its representation in binary always has its four
lowest bits set to 0
Or, in hexadecimal, its last digit is always 0
So the address of the beginning of a segment is a 20-bit
quantity that looks like this in hex: XXXX0
Since we know the last digit is always 0, no need to store
it at all
Therefore, we need to store only 4 hex digits
Which, lo and behold, fit in a 16-bit register!
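The "drop the last hex digit" trick can be sketched in Python as follows. The helper names `selector_for` and `segment_start_of` are invented for this illustration, and the sample address 0x12340 is arbitrary.

```python
def selector_for(segment_start):
    """Return the 16-bit value stored for a 20-bit segment start address."""
    if not 0 <= segment_start < 2 ** 20:
        raise ValueError("segment start must be a 20-bit address")
    if segment_start % 16 != 0:
        raise ValueError("segment start must be a multiple of 16")
    return segment_start >> 4   # drop the last hex digit, which is always 0


def segment_start_of(selector):
    """Recover the 20-bit segment start by putting the 0 digit back."""
    return selector << 4


start = 0x12340
selector = selector_for(start)
print(hex(selector))                     # the four stored hex digits
print(selector < 2 ** 16)                # fits in a 16-bit register
print(hex(segment_start_of(selector)))   # back to the 20-bit address
```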
The 8086 Selector Scheme
So now we have two 16-bit quantities
The 16-bit selector
The 16-bit offset
The 20-bit address is computed as: selector * 16 + offset
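Here is a minimal Python sketch of how a 16-bit selector and a 16-bit offset combine into a 20-bit address on the 8086: the selector is multiplied by 16 (shifted left by one hex digit) and the offset is added. The function name `physical_address` is made up for this example. The final mask reflects that the 8086 has only 20 address lines, so a sum that overflows 20 bits wraps around.

```python
def physical_address(selector, offset):
    """Combine a 16-bit selector and a 16-bit offset into a 20-bit address."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    # selector * 16 is the segment start; keep only the low 20 bits of the sum.
    return ((selector << 4) + offset) & 0xFFFFF


# Segment starting at 0x12340, byte number 0x10 inside it.
print(hex(physical_address(0x1234, 0x0010)))

# Largest selector and offset: the sum overflows 20 bits and wraps around.
print(hex(physical_address(0xFFFF, 0xFFFF)))
```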
[Figure: the address space of a running program, with code, data, and stack regions]
Therefore, the program constantly references
bytes in three different segments
For now let’s assume that each region is fully
contained in a single segment, which is in fact
not always the case
CS: points to the beginning of the code
segment
DS: points to the beginning of the data
segment
SS: points to the beginning of the stack
segment
The trouble with segments
It is well-known that programming with segmented
architectures is really a pain
In the 8086 you constantly have to make sure segment
registers are set up correctly
What happens if you have data/code that’s more than 64KiB?
You must then switch back and forth between selector values,
which can be really awkward
Another source of complexity is that two different
(selector, offset) pairs can reference the same address
Example: (a,b) and (a-1, b+16)
There is an interesting on-line article on the topic:
http://world.std.com/~swmcd/steven/rants/pc.html
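The aliasing example (a,b) and (a-1, b+16) can be checked with a short Python sketch. It assumes the 8086 rule that an address is selector * 16 + offset and ignores wraparound past 1MiB; the names `linear` and `aliases` and the sample address 0x12345 are hypothetical choices for this illustration.

```python
def linear(selector, offset):
    """Address referenced by a (selector, offset) pair, ignoring wraparound."""
    return selector * 16 + offset


def aliases(address):
    """All pairs of 16-bit (selector, offset) values that reach address."""
    pairs = []
    for selector in range(0x10000):
        offset = address - selector * 16
        if 0 <= offset <= 0xFFFF:
            pairs.append((selector, offset))
    return pairs


# (a, b) and (a-1, b+16) name the same byte.
a, b = 0x1234, 0x0005
print(linear(a, b) == linear(a - 1, b + 16))

# One byte in the middle of memory has many different names.
print(len(aliases(0x12345)))
```

Comparing two pointers for equality therefore means comparing the addresses they compute, not the raw (selector, offset) pairs.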
How come it ever survived?
If your code and your data are <64KiB, segments are great
Otherwise, they are a pain
Given the horror of segmented programming, one may wonder how
it ever stuck
From the linked article: “Under normal circumstances, a design so
twisted and flawed as the 8086 would have simply been ignored by the
market and faded away.”
But in 1980, Intel was lucky that IBM picked it for the PC!
Not to criticize IBM or anything, but they were also the reason
why we got stuck with FORTRAN for so many years :/
Big companies making “wrong” decisions has an impact
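[Figure: the 32-bit registers; each 16-bit register is the low half of a 32-bit register]
AX (AH, AL) = low 16 bits of EAX
BX (BH, BL) = low 16 bits of EBX
CX (CH, CL) = low 16 bits of ECX
DX (DH, DL) = low 16 bits of EDX
SI = low 16 bits of ESI
DI = low 16 bits of EDI
BP = low 16 bits of EBP
SP = low 16 bits of ESP
FLAGS = low 16 bits of EFLAGS
IP = low 16 bits of EIP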
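How the 8-bit, 16-bit, and 32-bit register names overlap can be sketched in Python by treating EAX as a plain 32-bit number: AX is its low 16 bits, AH the high byte of AX, and AL the low byte of AX. The function name `sub_registers` and the sample value 0x12345678 are made up for this illustration.

```python
def sub_registers(eax):
    """Return the values seen through AX, AH, and AL for a given EAX value."""
    if not 0 <= eax <= 0xFFFFFFFF:
        raise ValueError("EAX holds a 32-bit value")
    ax = eax & 0xFFFF        # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # high byte of AX
    al = ax & 0xFF           # low byte of AX
    return {"AX": ax, "AH": ah, "AL": al}


for name, value in sub_registers(0x12345678).items():
    print(name, hex(value))
```

The same relationship holds for EBX, ECX, and EDX with their BX/BH/BL, CX/CH/CL, and DX/DH/DL parts.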
Conclusion
From now on we’ll keep referring to the
register names, so make sure you absolutely
know them
The registers are, in some sense, the variables
that we can use
But they have no “type” and you can do
absolutely whatever you want with them, meaning
that you can make horrible mistakes
We’re ready to move on to writing assembly
code for the 32-bit x86 architecture