Chapter N°2 Main Components of Computers

The document outlines the course content for Computer Architecture, covering topics such as buses, CPU registers, the Arithmetic and Logic Unit (ALU), and internal memory types. It explains the functions and structures of various components, including data, address, and control buses, as well as the operation of the ALU and registers. The document serves as a comprehensive guide for understanding the architecture and functioning of computer systems.

Course Document

Computer Architecture
Mr. A. LAHMISSI - January 2025
1

1 Introduction - Computer Architecture

2 Buses

3 CPU Registers

4 CPU Arithmetic and Logic Unit - ALU

5 Internal Memories: RAM (SRAM & DRAM), ROM

6 Internal Memories: Memory organization

7 Internal Memories: Memory levels and useful concepts

8 Internal Memories: Cache memory: usefulness and mapping

9 Summary and Conclusions

2
Communication Bus

[Diagram: a shared bus line with tri-state drivers: state 0, state 1, and a third high-impedance state (HZ). D flip-flop registers (D, Q, CK) connect to the bus, with an R/W signal selecting the transfer direction; a single bi-directional bus replaces two uni-directional buses, a 50% reduction of wires.]

3

Communication Bus - Exercise: identify the components and buses below.

[Diagram: components (registers, Mux/D-Mux) and buses, with an R/W signal.]

4
Communication Bus

A communication bus is a set of wires that ensures the transmission of signals and the exchange of information between the different units of the computer.

5
Bus wires

Each wire carries a signal or it does not: the signal is present (1) or absent (0).

[Diagram: 8 parallel wires carrying the bits 1, 0, 1, 1, 0, 1, 0, 0 - together the value 10110100. 8 wires => a bus of 8 bits.]

6
Type of Buses

Depending on the nature of the information to be transported, there are three types of bus:

 Data bus (bi-directional)
 Address bus (uni-directional)
 Command (or control) bus (primarily uni-directional)

7
Type of Buses

[Diagram: the CPU, the main memory and the peripheral I/O unit linked by three buses: the data bus (bi-directional), the address bus (uni-directional) and the control bus (primarily uni-directional).]

8
Address Bus for Memory Addressing (Addressing capabilities)

 The memory address bus is a unidirectional bus which transports the address code generated by the CPU for addressing the main memory.
 A memory address bus of n bits can address 2^n cells (2^n data registers).
 One data register can be a byte, a word, or a multiple of them.
DATA register of one byte = 8 bits; 1 kByte is 2^10 bytes (1 024 B)
DATA register of one word = 16 bits; 1 MByte is 2^20 bytes (1 048 576 B)
DATA register of a double word = 32 bits; 1 GByte is 2^30 bytes
DATA register of a quadruple word = 64 bits; 1 TByte is 2^40 bytes

Size of the address bus | Addressing capacity (memory space)
4 bits  | 16 bytes
8 bits  | 256 bytes
10 bits | 1 kbyte (1024 bytes)
16 bits | 64 kbytes (65536 bytes)
32 bits | 4 Gbytes (2^32 bytes)

9
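As a quick check of this table, here is a minimal Python sketch (not part of the original slides) computing the addressing capacity of an n-bit address bus:

```python
# Addressing capacity of an n-bit address bus: 2^n addressable bytes.
def addressing_capacity(n_bits):
    return 2 ** n_bits

def pretty(n_bytes):
    # Walk down the binary units used in this course (kB = 2^10, MB = 2^20, ...).
    for unit, power in (("T", 40), ("G", 30), ("M", 20), ("k", 10)):
        if n_bytes >= 2 ** power:
            return f"{n_bytes // 2 ** power} {unit}B"
    return f"{n_bytes} B"

for n in (4, 8, 10, 16, 32):
    print(f"{n:2d}-bit address bus -> {pretty(addressing_capacity(n))}")
# 4 -> 16 B, 8 -> 256 B, 10 -> 1 kB, 16 -> 64 kB, 32 -> 4 GB (as in the table)
```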
Data Bus

 Data bus: a bidirectional bus that ensures the transfer of data between the microprocessor and its environment, and vice versa. The size of this bus determines the range of values that can be carried on it.
 If the size of the data bus is n bits, then:
o The possible unsigned DATA values that can be carried by the bus range from ( 0 ) to ( +2^n - 1 ).
o The possible signed DATA values that can be carried by the bus range from ( -2^(n-1) ) to ( +2^(n-1) - 1 ).

DATA Bus Size | Unsigned mode DATA range | Signed mode DATA range
4 bits  | 0 ~ 15 (0H - FH) | -8 ~ +7 (8H ~ 7H)
8 bits  | 0 ~ 255 (00H - FFH) | -128 ~ +127 (80H ~ 7FH)
16 bits | 0 ~ 65535 (0000H - FFFFH) | -32768 ~ +32767 (8000H ~ 7FFFH)
32 bits | 0 ~ 2^32-1 (0000 0000H - FFFF FFFFH) | -2^31 ~ +2^31-1 (8000 0000H ~ 7FFF FFFFH)
64 bits | 0 ~ 2^64-1 (0000 0000 0000 0000H - FFFF FFFF FFFF FFFFH) | -2^63 ~ +2^63-1 (8000 0000 0000 0000H ~ 7FFF FFFF FFFF FFFFH)

10
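The two range formulas can be checked with a minimal Python sketch (an illustration, not part of the slides):

```python
# Unsigned and signed (two's-complement) value ranges for an n-bit data bus.
def unsigned_range(n):
    return 0, 2 ** n - 1

def signed_range(n):
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1

for n in (4, 8, 16, 32, 64):
    lo_u, hi_u = unsigned_range(n)
    lo_s, hi_s = signed_range(n)
    print(f"{n:2d} bits: unsigned {lo_u}..{hi_u}, signed {lo_s}..{hi_s}")
# 8 bits: unsigned 0..255, signed -128..127  (as in the table)
```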
Control Bus
The control bus is typically unidirectional but can sometimes be bidirectional. It carries control signals between the microprocessor and other components such as memory and I/O devices.

 M/IO = 1: memory access; = 0: peripheral access
 R/W = 1: read operation; = 0: write operation
 DEN = 1: invalid data; = 0: valid data

[Diagram: CPU with outgoing M/IO, R/W and DEN control lines.]

Program               R/W  M/IO  DEN
MOV BL, X              1    1     0
IN AL, keyboard_port   1    0     0
OUT screen_port, BL    0    0     0
MOV X, AL              0    1     0

It is worth noting that while the control bus is primarily unidirectional, some control signals need a response from the receiving devices; this is generally not considered bidirectional communication in the same sense as data buses, which are truly bidirectional.

11
The ALU

The ALU is the component of the processor that carries out the arithmetic and logic operations required by the instructions.

INSTRUCTION = Operator & Operands

Arithmetic operators (+, -, %, *, ...) and logical operators (AND, OR, XOR, ...)

Example
Instruction 1: ADD A, B  // (ADD): binary operator; (A, B): operands
Instruction 2: NOT A     // (NOT): unary operator; (A): operand

12
The ALU

 It is composed of several combinational logic circuits.
 It has 2 inputs (of N bits each) as operands, and one output as the result.
 It also has a command input that selects the operation to be performed (implemented with a multiplexer).
 It has a table of operation codes for the command input associated with the ALU.
 It has flags grouped in the PSW register (PSW = Program Status Word).

A combinational circuit combines a series of logic gates, including AND, OR and NOT gates.

13
The ALU
[Diagram: C (command input) selects the operation; A (Op1) and B (Op2) feed the ALU; S is the result output; the PSW register collects the flags.]

PSW = Program Status Word (Register)

14
ALU Design Example

[Diagram: operands A and B feed four combinational circuits - C.C.ADD, C.C.SUB, C.C.AND, C.C.CMP - whose outputs pass through a multiplexer driven by the command code (C0C1) to produce the result S and the PSW flags.]

C.C: a Combinational Circuit which combines a series of logic gates, including AND, OR and NOT gates.

15
ALU Design Example

[Diagram: a second example where A and B feed C.C.OR, C.C.SUB, C.C.NAND and C.C.ADD, multiplexed by the command code (C0C1) onto the output S.]

C.C: a Combinational Circuit which combines a series of logic gates, including AND, OR and NOT gates.

16
Multiplexer (4 to 1) and Multiplexer (8 to 1)

C1 C0 | S
0  0  | E0
0  1  | E1
1  0  | E2
1  1  | E3

17
Multiplexer (8 to 1)
Table of codes assigned to the command lines

C2 C1 C0 | Operation
0  0  0  | Add
0  0  1  | Sub
0  1  0  | Multiply
0  1  1  | Div
1  0  0  | Shift Right
1  0  1  | Shift Left
1  1  0  | (A or B) xor A
1  1  1  | Not

Example: C = 011 => the MUX selects S = Div.

18
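The command-code table acts like the multiplexer in the previous diagrams: the code selects which combinational circuit drives the output. A minimal Python sketch of that dispatch (the operation set follows the table above; everything else is an illustrative assumption):

```python
# The command code C2C1C0 selects which "combinational circuit" drives S,
# exactly as the 8-to-1 multiplexer does in the slides.
OPERATIONS = {
    0b000: ("Add",            lambda a, b: a + b),
    0b001: ("Sub",            lambda a, b: a - b),
    0b010: ("Multiply",       lambda a, b: a * b),
    0b011: ("Div",            lambda a, b: a // b),
    0b100: ("Shift Right",    lambda a, b: a >> 1),
    0b101: ("Shift Left",     lambda a, b: a << 1),
    0b110: ("(A or B) xor A", lambda a, b: (a | b) ^ a),
    0b111: ("Not",            lambda a, b: ~a),
}

def alu(command, a, b):
    name, circuit = OPERATIONS[command]   # the MUX "selects" one circuit
    return circuit(a, b)

print(alu(0b011, 12, 4))  # C = 011 -> S = Div -> 3
```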
Logic gate designs

[Diagram: building composite gates from basic OR, NOR and NAND gates.]
XNOR = 2 x NAND + OR
XNOR = (A NOR B) OR (A AND B)
XOR = (A OR B) AND (A NAND B)   (3 gates)

19
Half-Adder (H.A) - 1 bit

It is a circuit that adds two one-bit binary numbers, A and B, and returns two one-bit outputs, S and R.
S: the sum of the addition
R: the carry out of the addition

A0 B0 | S0 R0
0  0  | 0  0
0  1  | 1  0
1  0  | 1  0
1  1  | 0  1

20
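Since S = A XOR B and R = A AND B, the half adder is just two gates; a minimal Python sketch reproducing the truth table above:

```python
# Half adder: sum S = A xor B, carry R = A and B.
def half_adder(a, b):
    return a ^ b, a & b

for a in (0, 1):
    for b in (0, 1):
        s, r = half_adder(a, b)
        print(a, b, "->", s, r)   # matches the A0 B0 | S0 R0 table
```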
Full-Adder (F.A) - 1 bit
It is a circuit that adds three one-bit numbers, An, Bn and Rn-1, and returns two one-bit outputs, Sn and Rn.
S: the sum of the addition.
R: the carry out of the addition.

[Diagram: a full adder built from two half adders (H.A).]

An Bn Rn-1 | Sn Rn
0  0  0    | 0  0
0  0  1    | 1  0
0  1  0    | 1  0
0  1  1    | 0  1
1  0  0    | 1  0
1  0  1    | 0  1
1  1  0    | 0  1
1  1  1    | 1  1

21
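As the diagram suggests, a full adder can be built from two half adders plus an OR gate for the carries, and chaining full adders gives an N-bit ripple-carry adder; a minimal Python sketch (an illustration, not from the slides):

```python
from itertools import product

def half_adder(a, b):
    return a ^ b, a & b

# Full adder = two half adders plus an OR for the carries.
def full_adder(a, b, c_in):
    s1, r1 = half_adder(a, b)
    s, r2 = half_adder(s1, c_in)
    return s, r1 | r2

for a, b, c in product((0, 1), repeat=3):
    s, r = full_adder(a, b, c)
    print(a, b, c, "->", s, r)     # matches the An Bn Rn-1 | Sn Rn table

# Chaining full adders gives an N-bit ripple-carry adder (bits LSB first).
def ripple_add(a_bits, b_bits):
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry
```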
Full-Adder (F.A) - 4 bits

[Diagram: four 1-bit full adders chained through their carries to form a 4-bit ripple-carry adder.]

22

Full-Adder (F.A) - 4 bits

A 32-bit processor => a 32-bit adder (a chain of 32 F.A.)
A 64-bit processor => a 64-bit adder (a chain of 64 F.A.)

23
The PSW (Register) - the FLAGS

[Diagram: A and B feed C.C.ADD, C.C.SUB, C.C.AND, C.C.CMP and C.C.XOR; a MUX driven by C selects the result S; the flags ZF, SF, CF, OF, ... are stored in the PSW register.]

PSW = Program Status Word (Register)
C.C: a Combinational Circuit which combines a series of logic gates, including AND, OR and NOT gates.

24
The PSW (Register) - the FLAGS
PSW = Program Status Word (Register)

 CF (Carry): set to 1 in case of overflow in an unsigned operation.
 OF (Overflow): set to 1 in case of overflow in a signed operation.
 ZF (Zero): set to 1 if the result of the operation is 0.
 PF (Parity): 1 if the low-order byte of the result contains an even number of 1s, 0 otherwise.
 AF (Auxiliary): set to 1 if there is a carry out of bit 3 (the low nibble, bits 0-3), 0 otherwise.
 SF (Sign): set to 1 if the MSB of the result is 1, 0 otherwise.

25
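A minimal Python sketch (an illustration, not the CPU's actual circuitry) showing how these flags can be derived after an 8-bit addition:

```python
# Compute CF, OF, ZF, SF and PF after an 8-bit addition a + b.
def add8_flags(a, b):
    raw = a + b                        # unbounded sum
    result = raw & 0xFF                # what fits in 8 bits
    cf = int(raw > 0xFF)               # unsigned overflow
    sf = (result >> 7) & 1             # MSB of the result
    zf = int(result == 0)
    # Signed overflow: both operands share a sign that differs from the result's.
    of = int(((a ^ result) & (b ^ result) & 0x80) != 0)
    pf = int(bin(result).count("1") % 2 == 0)  # even number of 1s in the low byte
    return {"result": result, "CF": cf, "OF": of, "ZF": zf, "SF": sf, "PF": pf}

print(add8_flags(0x7F, 0x01))  # 127 + 1 -> result 0x80: OF=1, SF=1, CF=0
```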
Signed values and unsigned values

In signed mode, a byte is represented in two's complement (C2).
Example: (-1) in sign-and-magnitude binary code is 1000 0001; in two's complement (C2) it becomes 1111 1111.
So in signed mode the byte 1111 1111 represents the decimal value (-1), while in unsigned mode the same byte 1111 1111 represents (+255).

 If the size of the DATA is n bits, then:
o The possible values that can be carried by the bus for unsigned DATA range from ( 0 ) to ( +2^n - 1 ).
o The possible values that can be carried by the bus for signed DATA range from ( -2^(n-1) ) to ( +2^(n-1) - 1 ).

26
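A minimal Python sketch of the two interpretations of the same byte (unsigned vs two's complement):

```python
# Interpret the same 8-bit pattern as unsigned or as signed (two's complement).
def as_unsigned(byte):
    return byte & 0xFF

def as_signed(byte):
    value = byte & 0xFF
    return value - 256 if value >= 128 else value

pattern = 0b11111111
print(as_unsigned(pattern))  # 255  (unsigned mode)
print(as_signed(pattern))    # -1   (signed mode, two's complement)
```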
ALU

[Diagram: a generic N-bit ALU. Inputs A and B (N bits each) feed combinational circuits C.C.OP1 ... C.C.OPi; a MUX driven by the command C selects the result S; the flags Z, S, C, O are produced.]

27
ALU

[Diagram: an ALU implementation example.]

28
CPU Registers - Type of Registers

 When the processor executes instructions, the data is temporarily stored in small fast memories (8, 16, 32 or 64 bits) called REGISTERS. Depending on the type of processor, the total number of registers can vary from a dozen to several hundred (e.g., 32-bit Intel processors have 16 registers).
 Registers can be of the address type (they then contain the address of a memory word) or of the data type (they then contain the content of a memory word). They can be specific, with a very precise function (for example the program counter, the "ordinal counter" register), or general, serving mainly for intermediate calculations (for example the "AX" accumulator register).

29
CPU Registers - Type of Registers

[Diagram: overview of the register categories detailed on the next slide.]

30
Accumulator (AX)
Base (BX)
Counter (CX)
Data (DX)

Code Segment (CS)


Data Segment (DS)
Stack Segment (SS)
Extra Segment (ES)

Source Index (SI)


Destination Index (DI)
Base Pointer (BP)
Stack Pointer (SP)
Instruction Pointer (IP)

Carry flag (CF)


Parity flag (PF)
Auxiliary flag (AF)
Zero flag (ZF)
Sign flag (SF)
Overflow flag (OF)

Interrupt flag (IF)


Direction flag (DF)
Trap flag (TF)

31

EU (Execution Unit) and BIU (Bus Interface Unit) work simultaneously: the BIU accesses memory and fetches instructions, while the EU executes the instructions fetched by the BIU.

MOV DS:[SI], EFh == MOV [0001], 11101111b == MOV [SI], EFh

32
EU (Execution Unit) Register Explanation:

General-purpose registers: 4 registers, primarily used for arithmetic and data movement:
Accumulator (AX), Base (BX), Counter (CX), Data (DX)

Special-purpose registers: two index registers and three pointer registers:
Source Index (SI), Destination Index (DI), Base Pointer (BP), Stack Pointer (SP), Instruction Pointer (IP)

Segment registers:
Code Segment (CS), Data Segment (DS), Stack Segment (SS), Extra Segment (ES)
(six in total on 32-bit processors, where FS and GS complete the list)

plus a processor status flags register (EFLAGS) and an instruction pointer (EIP).

33

Some general-purpose registers:

EAX = Extended Accumulator
EBX = Extended Base
ECX = Extended Counter
EDX = Extended Data register

34

 EAX (Extended Accumulator register) is automatically used by multiplication and division instructions. For example, in a multiplication operation, one operand is stored in the EAX, AX or AL register according to the size of the operand.

 EBX (Extended Base register) is used for addressing, particularly when dealing with arrays and strings; it can also be used as a data register when not used for addressing.

 ECX (Extended Counter): ECX or CX are used as a loop counter.

 EDX (Extended data register) : is used for In and Out instructions (input, output), also
used to store partial results of Mul and Div operations and in other cases, can be used
as a data register.

35

 ESI and EDI (Extended Source Index and Extended Destination Index registers) are used by high-speed memory transfer instructions.

 EBP (Extended Base Pointer, used as the frame pointer) is used by high-level languages to reference function parameters and local variables on the stack. It should not be used for ordinary arithmetic or data transfer except at an advanced level of programming.

 ESP (Extended Stack Pointer) addresses data on the stack (a memory segment). It is rarely used for ordinary arithmetic or data transfer.

36
AX = Accumulator
BX = Base
CX = Counter (CX can be used by some instructions as a counter)
DX = Data register

(SI) Source Index
(DI) Destination Index
(BP) Base Pointer (BP is generally used to address stack data: SS:[BP])
(SP) Stack Pointer
(IP) Instruction Pointer

Code Segment (CS)
Data Segment (DS)
Stack Segment (SS)
Extra Segment (ES)

DS = 00 (by default)
SI EQU 01
MOV DS:[SI], EFh == MOV [0001], 11101111b == MOV [SI], EFh

37
CPU Registers - Type of Registers

16-bit registers (bits 15..0); AX-DX split into two 8-bit halves (H: high part, L: low part):

AX Accumulator Register = AH | AL
BX Base Register        = BH | BL
CX Counter Register     = CH | CL
DX Data Register        = DH | DL
SP Stack Pointer
BP Base Pointer
DI Destination Index
SI Source Index

(AX-DX: general registers; SP, BP, DI, SI: specific registers)

38
CPU Registers - The PSW Register (the FLAGS)

15                                          0
 X X X X OF DF IF TF SF ZF X AF X PF X CF    PSW

Condition bits (red):
These bits are read-only: set by the processor and interpreted by the programmer.

 CF (Carry): set to 1 in case of overflow in an unsigned operation.
 OF (Overflow): set to 1 in case of overflow in a signed operation.
 ZF (Zero): set to 1 if the result of the operation is 0.
 PF (Parity): 1 if the low-order byte of the result contains an even number of 1s, 0 otherwise.
 AF (Auxiliary): set to 1 if there is a carry out of bit 3, 0 otherwise.
 SF (Sign): set to 1 if the MSB of the result is 1, 0 otherwise.

39
Signed Mode of Operations

[Diagram: worked examples of signed operations and the flags they set.]

40
CPU Registers - The PSW Register (the FLAGS)

15                                          0
 X X X X OF DF IF TF SF ZF X AF X PF X CF    PSW

DF: Direction Flag, IF: Interrupt Flag, TF: Trap Flag
Control bits (blue):
These are writeable bits, configured by the programmer.
DF=0: SI, DI will be incremented by the concerned instructions (instruction: CLD // DF=0)
DF=1: SI, DI will be decremented by the concerned instructions (instruction: STD // DF=1)
IF=0: interrupts ignored (instruction: CLI // IF=0)
IF=1: interrupts allowed (instruction: STI // IF=1)
TF=1: step-by-step (trap) execution
TF=0: normal execution

To modify the bit TF:

PUSHF   // push the PSW register onto the stack
POP AX  // pop the top of the stack into AX, so AX = PSW
OR AX, 0000000100000000b (for TF=1)  // or AND AX, 1111111011111111b (for TF=0)
PUSH AX
POPF    // pop the top of the stack into the PSW

41
CPU Registers - Segment Registers

Segment Registers
DS (Data Segment Register): contains the Data Segment address; points to the data memory area.
SS (Stack Segment Register): contains the Stack Segment address.
CS (Code Segment Register): contains the Code Segment address of the program instructions.
ES (Extra Segment Register): contains the address of auxiliary or additional data.

 Any memory address consists of a segment address plus an offset address.
 The segment address defines the beginning address of a 64-kbyte memory segment.
 The offset address selects the desired location within the 64-kbyte memory segment.

Stack Segment Register (SS): used for addressing the stack segment of the memory, i.e. the memory segment used to store stack data.
Stack data contains return addresses for interrupts and subroutine calls; it can also be used for storing temporary data. The Stack Pointer (SP) register in the CPU always holds the offset of the top of the stack segment.

42
Internal Memories

We call "memory" any device capable of acquiring, recording and storing information (data + programs), and restoring it on demand.

[Diagram: memory hierarchy. Registers in the CPU (UCT); cache memory (SRAM); main/primary memory (internal: DRAM, ROM, etc.); secondary memory (auxiliary, external mass storage: hard disk, flash disk, CD/DVD, etc.). The CPU has no direct access to secondary memory.]
The Central Memories

Main memory: the central memory is RAM-dominated.

RAM: volatile; read/write; temporarily stores the information in use.
ROM: non-volatile; read only; stores immutable information on the computer (e.g., the BIOS).

44
The Central Memories - RAM (Random Access Memory)

 RAM is a structure composed of several memory cells of the same length.
 It is formally defined as an array of cells indexed by addresses.
 It is characterized by the cell length (byte, word, ...) and the total number of cells, which together define the memory size (capacity).

 The memory size unit is the byte (8 bits).

1 byte x 1024 = 2^10 bytes = 1 kilobyte = 1 kB
1 kB x 1024 = 2^20 bytes = 1 megabyte = 1 MB
1 MB x 1024 = 2^30 bytes = 1 gigabyte = 1 GB
1 GB x 1024 = 2^40 bytes = 1 terabyte = 1 TB

45
The Central Memories - Central Memory RAM

[Diagram: the RAM as a column of cells at addresses 0 to N-1; the cell at address 0 holds 0 0 0 1 1 1 1 0.]

Cell length = number of bits (here, 8 bits).
Memory size = cell length x Nb_cells (unit: bytes).

Example
If Nb_cells = 16 and cell length = 8 bits:
Memory size = 16 x (1 byte) = 16 bytes

46
Communication between CPU & RAM - Reading

[Diagram: the CPU sends a read order on the control bus and the address 0010 on the address bus; the cell at address 0010 contains 10100110, which is returned to the CPU on the data bus.]

MOV AL, [0010]  ; AL = 10100110

47
Communication between CPU & RAM - Writing

[Diagram: the CPU sends a write order on the control bus, the address 0011 on the address bus and the value 10111110 on the data bus; the cell at address 0011 is overwritten with 10111110.]

MOV AL, 10111110b
MOV [0011], AL

48
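The two bus dialogues above can be mimicked with a toy model; a minimal Python sketch (the RAM class and its access signature are illustrative assumptions; addresses and values are those of the diagrams):

```python
# Toy model of the CPU <-> RAM dialogue: address bus, control bus (R/W), data bus.
class RAM:
    def __init__(self, n_cells):
        self.cells = [0] * n_cells

    def access(self, rw, address, data=None):
        if rw == 1:                       # R/W = 1: read
            return self.cells[address]    # value placed on the data bus
        self.cells[address] = data        # R/W = 0: write
        return None

ram = RAM(16)
ram.access(0, 0b0010, 0b10100110)         # preload cell 0010 (write)
al = ram.access(1, 0b0010)                # MOV AL, [0010] -> AL = 10100110
ram.access(0, 0b0011, 0b10111110)         # MOV [0011], ... -> write 10111110
print(bin(al), bin(ram.cells[0b0011]))    # 0b10100110 0b10111110
```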
Communication between CPU & RAM - Intuitive computation!

 If the address is on n bits, then it is possible to reference 2^n memory cells (in the previous example, 16 memory cells can be referenced using only 4 bits).

 In reality, the memory size is much bigger than this, and therefore segmentation of the memory is needed.

1 kbyte = 2^10 bytes
1 Mbyte = 2^20 bytes
1 Gbyte = 2^30 bytes
1 Tbyte = 2^40 bytes

2 kbytes = 2^11 bytes
16 Mbytes = 2^4 x 2^20 bytes = 2^24 bytes

49
Memory Segmentation

Segmentation is the subdivision (logical, not physical) of the RAM into multiple equal-sized areas (typically 64-kbyte segments). It is done by partitioning the bits of the address bus.
 Any memory address consists of a segment address plus an offset address.
 The segment address defines the beginning address of a 64-kbyte memory segment.
 The offset address selects the desired location within the 64-kbyte memory segment.

Address syntax: [Segment : Offset]

[Diagram: a 4-bit address space 0000-1111 divided into four segments (00, 01, 10, 11) of four cells each; examples: [00:10], [01:11], [11:01].]

50
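With 2 segment bits and 2 offset bits as in the diagram, the physical address is simply the concatenation [Segment : Offset]; a minimal Python sketch (note that real 8086 segmentation instead computes segment x 16 + offset, with overlapping segments):

```python
# [Segment : Offset] -> physical address, for the 4-bit example above
# (2 segment bits, 2 offset bits; the address is their concatenation).
OFFSET_BITS = 2

def physical_address(segment, offset):
    return (segment << OFFSET_BITS) | offset

print(bin(physical_address(0b00, 0b10)))  # [00:10] -> 0b10
print(bin(physical_address(0b01, 0b11)))  # [01:11] -> 0b111
print(bin(physical_address(0b11, 0b01)))  # [11:01] -> 0b1101
```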
Memory Segmentation

[Diagram: the same segmented memory map, highlighting the @Segment and @Offset parts of the address syntax [Segment : Offset].]

51
Memory Segmentation

The memory words are numbered 0, 1, 2, ... The processor has an instruction set, and each instruction in the instruction set fits into one memory word.
The processor also has a number of general-purpose registers. There are three special registers:
 Stack Pointer (SP): generally used to point to the last record of the stack, and normally initialized immediately below the global data of the program. When data is pushed onto the stack (using the PUSH instruction) the SP gets automatically decremented; thus, the stack grows towards lower memory locations.
 Instruction Pointer (IP): carries the address of the current instruction under execution and is automatically incremented to point to the next instruction to be executed; the IP register thus always holds the address of the next instruction to be executed.
 Base Pointer (BP): generally used to store the base address of an activation record for procedure invocations. Although any other register can act as the base pointer, the availability of an explicit base pointer gives better structure and clarity to program compilation.

MOV AX, [BP]        // indirect addressing
MOV AX, [BX+0002H]  // data accessed via a base register with displacement

52
Memory Segmentation - Stack Operations

Stack Segment Register (SS): used for addressing the Stack Segment in the memory.
The Stack Segment is the memory segment used to store stack data.
One of the most ingenious temporary uses of the stack is to call sub-procedures: the stack data then contains the return addresses for interrupts and subroutine calls. It can also store temporary data, so a sub-procedure can use the stack for its own local variables. Initially, the SP value points to the top of the stack segment.

 The stack operations follow the LIFO principle (Last In, First Out).
 The stack operations can be performed by:
 PUSH and POP instructions
 interrupts or subroutine calls.

[Diagram: a 16-cell memory; the stack segment starts at SS=10, with the offset SP=01 pointing into it.]

53
Memory Segmentation - Stack Operations - Using PUSH and POP instructions

[Diagram: PUSH AX copies AX to the top of the stack; after some operations AX changes; POP AX restores the saved value.]

Note: POP doesn't physically erase the stacked data; it only moves SP.

To modify the bit TF:
PUSHF    // push the PSW onto the stack
POP AX   // pop the stack into AX (AX = PSW)
OR AX, 0000000100000000b (for TF=1)  // AND AX, 1111111011111111b (for TF=0)
PUSH AX  // push AX back onto the stack
POPF     // pop the stack into the PSW (PSW = AX)

54
Memory Segmentation - Stack Operations - Using subroutine calls

call and ret use the stack (involving the SP and IP registers).

The call instruction pushes the current IP contents onto the stack (involving SP) and then puts the new address specified in the call instruction into the IP. Since that is the end of the call instruction, the processor then executes the subprogram until the ret is reached at the end of the subprogram. ret returns the contents of the last item pushed on the stack back into the IP, and the SP is incremented to point to the next most recently stored value. At this stage, what is loaded into the IP is the address of the instruction just after the call, so the execution flow resumes from that point. A near call has no effect on the CS.

For an intrasegment (near) return, the address on the stack is a segment offset that is popped into the IP.
For an intersegment (far) return, the address on the stack is a long pointer: the offset is popped first, followed by the segment selector.

55
AX = Accumulator
BX = Base
CX = Counter (CX can be used by some instructions as a counter)
DX = Data register

(SI) Source Index
(DI) Destination Index
(BP) Base Pointer (BP is generally used to address stack data: SS:[BP])
(SP) Stack Pointer
(IP) Instruction Pointer

Code Segment (CS)
Data Segment (DS)
Stack Segment (SS)
Extra Segment (ES)

DS = 00 (by default)
SI EQU 01
MOV DS:[SI], EFh == MOV [0001], 11101111b == MOV [SI], EFh

56
Memory Segmentation - Stack Operations - Using subroutine calls

[Diagram: segment IDs and offsets. The code segment (CS=01) contains: offset1: Ins 1; offset2: Ins 2; offset3: CALL PROC; offset4: Ins 4; and the procedure PROC (offset31: Inst 1; offset32: Inst 2; RET). DS=00, SS=10 with SP=01, ES=11.]

Near Procedure (Near Call)
CALL: store IP_Return (from IP) on the stack => (SP = SP - 2), then load the IP with the Call target.
RET: reload IP_Return from the stack into the IP and free the stack location => (SP = SP + 2).

57
Memory Segmentation - Stack Operations - Using subroutine calls

[Diagram: the same program, but PROC now lives in another segment (ES=CS=11), so the CALL must save and restore both CS and IP.]

Near Procedure (Near Call)
CALL: store IP_Return (from IP) on the stack => (SP = SP - 2), then load the IP with the Call target.
RET: reload IP_Return from the stack into the IP and free the stack location => (SP = SP + 2).
Far Procedure (Far Call)
CALL: store CS_Return => SP = SP - 2, store IP_Return => SP = SP - 2, then load CS and IP with the Call target.
RET: reload IP_Return into the IP => SP = SP + 2, then reload CS_Return into the CS => SP = SP + 2, freeing the stack locations.

58
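A minimal Python sketch of the near CALL/RET bookkeeping described above (a toy word-addressed machine; all concrete values are illustrative assumptions):

```python
# Toy near CALL/RET: the stack holds the return IP; SP moves by 2 per 16-bit word.
class CPU:
    def __init__(self):
        self.ip = 0x0004          # IP already points past the CALL (offset4)
        self.sp = 0x0010          # top of the (empty) stack
        self.stack = {}           # offset -> 16-bit word in the stack segment

    def call(self, target):
        self.sp -= 2                     # CALL: SP = SP - 2
        self.stack[self.sp] = self.ip    # store IP_Return
        self.ip = target                 # load IP with the Call target

    def ret(self):
        self.ip = self.stack.pop(self.sp)  # reload IP_Return into IP
        self.sp += 2                       # free the stack slot: SP = SP + 2

cpu = CPU()
cpu.call(0x0031)                  # 01:offset3: CALL PROC
cpu.ret()                         # RET at the end of PROC
print(hex(cpu.ip), hex(cpu.sp))   # 0x4 0x10 - execution resumes after the CALL
```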
Cache Memory - Random Access, Read/Write

SRAM (used in cache memory) vs DRAM (used in central memory):

 SRAM does not need to be refreshed; DRAM needs to be refreshed (every 55 ms).
 SRAM: high cost; DRAM: low cost.
 SRAM: small size; DRAM: larger size.
 SRAM: short access time (short latency); DRAM: long access time (long latency).
 SRAM consumes less energy; DRAM consumes more energy.

59

SRAM | DRAM

Stores information as long as the power is supplied. | Stores information as long as the power is supplied (with refreshing), or for a few milliseconds once the power is switched off.
Transistors are used to store information. | Capacitors are used to store data.
No capacitors, hence no refreshing is required. | The contents of the capacitors must be refreshed periodically to store information for a longer time.
Faster (high data transfer rate). | Slower access speed (lower data transfer rate).
Has no refreshing unit. | Has a refreshing unit.
Expensive. | Cheaper.
Low-density devices. | High-density devices.
Bits are stored in voltage form. | Bits are stored in the form of electrical charges.
Used in cache memories. | Used in main memories.
Consumes less power and generates less heat. | Uses more power and generates more heat.
Lower latency. | More latency than SRAM.
Used in high-speed cache memory. | Used in lower-speed main memory.
Used in high-performance applications. | Used in general-purpose applications.

60
Cache Memory - Mapping functions - Important Concepts

Cache memory operates 10 to 100 times faster than RAM, requiring only a few nanoseconds to respond to a CPU request.

[Diagram: the CPU with its input and output units; the cache memory (SRAM) - very fast but small, because it is expensive - sits between the CPU and the central memory; the secondary memory lies behind the central memory.]

61
Cache Memory - Mapping functions - Important Concepts

Cache memory operates 10 to 100 times faster than RAM, requiring only a few nanoseconds to respond to a CPU request.

[Diagram: the CPU, its input/output units, the cache memory (SRAM), the central memory and the secondary memory. The frequency and speed of operations require a management solution shared between the cache and the central memory: Mapping.]

For example, 4 operations via the central memory can be done in just 2 operations via the cache memory, at a much faster speed: what would take 30 seconds of operations via the central memory could be done in just 1 second via the cache memory.

62
Cache Memory Mapping functions Important Concepts

Cache Memory in Computer Organization

Cache memory is an extremely fast memory type that acts as a buffer between RAM and the CPU. It is used to speed up and synchronize with the high-speed CPU.
There are various independent caches in a CPU, which store instructions and data. The most important use of cache memory is to reduce the average time needed to access data from the main memory.
By storing this information closer to the CPU, cache memory helps speed up the overall processing time: cache memory is much faster than the main memory (RAM).
When the CPU needs data, it first checks the cache. If the data is there, the CPU can access it quickly. If not, it must fetch the data from the slower main memory.

63
Cache Memory - Mapping functions - Important Concepts

Levels of Memory
Level 1 - Registers: memory locations inside the CPU itself, holding the data being processed immediately. The most commonly used registers are the Accumulator, the Program Counter, the Address Register, etc.
Level 2 - Cache memory: the fastest memory outside the CPU, with the shortest access time, where data is temporarily stored for faster access.
Level 3 - Main Memory: the memory on which the computer currently works. It is limited in size, and once power is off the data no longer stays in this memory.
Level 4 - Secondary Memory: external memory that is not as fast as the main memory, but where data stays permanently.

64
Cache Memory - Mapping functions - Important Concepts - Principle of locality

Principle of locality
The locality principle states that the information the processor will access has a high probability of being located within a spatial window and a temporal window.

Temporal locality: a program that has manipulated some information in the recent past has a very high chance of manipulating it again shortly after.
Spatial locality: a program that has manipulated some information in the recent past has a very high chance of manipulating information close to it shortly after.

65
Cache Memory - Mapping functions - Important Concepts - Principle of locality

Temporal locality example: a for loop repeating Instruction 1 ... Instruction N keeps reusing the same instructions and variables (x, y).
Spatial locality example: a table T whose entries T[1], T[2], T[3], ..., T[i] sit in the same memory block (1 block = k cells); accessing one entry makes accesses to its neighbours likely.

66
Cache Memory - Mapping functions - Important Concepts

When the processor needs to read or write a location in the main memory, it first checks for a corresponding entry in the cache.
A cache hit occurs when the required word is found in the cache memory; the required word is then delivered to the CPU from the cache memory.
A cache miss occurs when the required word is not present in the cache memory: the block containing the required word has to be mapped from the main memory. This mapping is performed using cache mapping techniques.
For a cache miss, the cache allocates a new entry and copies the requested data from the main memory; the request is then fulfilled from the contents of the cache.

Cache Mapping defines how the contents of the main memory are brought into the cache memory, i.e. how a block of the main memory is mapped to the cache memory in case of a cache miss.

67
Cache Memory - Mapping functions - Important Concepts

Cache Mapping Performance

When the processor needs to read or write a location in the main memory, it first checks for a corresponding entry in the cache.
If the processor finds the memory location in the cache, a cache hit has occurred and the data is read from the cache.
If the processor does not find the memory location in the cache, a cache miss has occurred. For a cache miss, the cache allocates a new entry and copies in the data from the main memory; the request is then fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called the hit ratio; a cache mapping's performance is directly proportional to the hit ratio.

Hit Ratio (HR) = (number of hits) / (number of hits + number of misses) = (number of hits) / (total number of accesses)

Miss Ratio (MR) = (number of misses) / (number of hits + number of misses) = (number of misses) / (total number of accesses) = 1 - HR

We can improve cache performance by using a larger cache line size and higher associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the cache.

68
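A minimal Python sketch of the two ratios and of the average access time they yield (the simultaneous-access model used here is one common textbook assumption):

```python
# Hit ratio, miss ratio and average access time from hit/miss counts.
def hit_ratio(hits, misses):
    return hits / (hits + misses)

def average_access_time(hr, t_cache, t_main):
    # Simultaneous-access model: a miss costs only the main-memory time.
    return hr * t_cache + (1 - hr) * t_main

hr = hit_ratio(hits=90, misses=10)             # HR = 0.9, MR = 1 - HR = 0.1
print(hr, average_access_time(hr, 100, 1000))  # 0.9 -> 190 ns
```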
Cache Memory - Mapping functions - Important Concepts

Exercise
The access time of the cache memory is 100 ns and that of the main memory is 1 µs.
80% of the memory operations are read requests and the rest are write requests.
The hit ratio for reading is 0.9.
Calculate the total average access time of the system for both read and write requests.

[Answers]

69
Cache Memory - Mapping functions - Important Concepts

Exercise
A cache memory needs an access time of 30 ns and the main memory 150 ns.
What is the average access time of the CPU to read from the memories (assume a hit ratio of 80%)?

[Answers]

70
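A hedged worked answer (the slide leaves the timing model open): under the simultaneous-access model, T = 0.8 x 30 + 0.2 x 150 = 24 + 30 = 54 ns; under the hierarchical model, where a miss pays the cache lookup before going to the main memory, T = 0.8 x 30 + 0.2 x (30 + 150) = 24 + 36 = 60 ns.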
Cache Memory - Mapping functions - Important Concepts

 The main memory is divided into equal-size partitions called "blocks" (or frames).
 The cache memory is divided into partitions having the same size as those blocks; these cache partitions are called "lines".
 During cache mapping, a certain number of blocks are copied into the cache and are read directly from the cache when they are needed for processing. Only the needed blocks are copied into the cache, and a mapping technique is needed to do that.

[Diagram: the processor reads lines (Line1, Line2, Line3, ...) from the cache; blocks (Block1, Block2, ..., BlockM) are mapped from the main memory into cache lines; pages (Page1, Page2, ...) move between secondary storage and the main memory.]

71
Cache Memory - Mapping functions - Important Concepts

[Diagram: the cache as a column of lines 0, 1, 2, ..., J, each holding one block of k cells plus its Tag; the central memory as a sequence of cells grouped into block frames of k cells each (Block 1 ... Block M). One line size = k cells.]

Tag: the Tag consists of the higher-order bits of the memory address.

72
Cache Memory - Mapping functions - Important Concepts

[Diagram: an address decoder selects a memory row; a multiplexer selects the requested data within it.]

73
Cache Memory - Mapping functions - Cache Mapping Forms

The cache size is much smaller than the memory size, so a strategy for copying data blocks into the cache must be defined. This strategy is called mapping.

Cache Mapping Techniques: Direct Mapping | Fully Associative Mapping | K-way Set Associative Mapping

74
Cache Memory - Mapping functions - Direct Mapping

Direct Mapping Pattern

In direct mapping, a main memory block can map only to one particular line of the cache, following a mapping pattern that repeats for each group of blocks. Thus, any new incoming block from any group always replaces the existing block in that particular line, following the same pattern.

Example (cache of 5 lines): block 4 -> 4 modulo 5 = line 4; block 5 -> 5 modulo 5 = line 0; block 6 -> 6 modulo 5 = line 1; the mapping pattern repeats for blocks 0-4, 5-9, 10-14, ...

The total number of blocks = (main memory size) divided by (block size).

75
Cache Memory - Mapping functions - Direct Mapping

In the presentation below, the cache memory is divided into 'n' lines. A block 'j' of the main memory can map only to line number (j mod n) of the cache:
Cache line number = (j) modulo (n)
Modulo is the remainder of an integer division. Ex: 12 modulo 10 = 2.

Example (n = 5 lines): 4 modulo 5 = 4; 5 modulo 5 = 0; 6 modulo 5 = 1; the same mapping pattern repeats for each group of 5 blocks.

76
Cache Memory - Mapping functions - Direct Mapping

The memory blocks are mapped to cache lines using a specified mechanism. When a memory address is requested, it is divided into three partitions: the tag, the index, and the offset.
The number of bits of each partition depends on the size of the cache, the size of the main memory, and the number of blocks (which is a multiple of the number of cache lines).
 The tag bits are the higher-order bits of the memory address.
 The index bits indicate the cache line number to which the memory block is mapped; the memory block number is identified by the tag bits and the index bits together.
 The offset bits specify the position of the cell (data word) within the line or memory block.

77
Cache Memory - Mapping functions - Direct Mapping

[Diagram: cache size = number of cells in the cache x cell size; the cache is a matrix of (line number x line offset) cells. The memory address is split into Tag | Block number | Block offset; memory size = number of cells in memory x cell size = 2^n x cell size.]

78
Cache Memory - Mapping functions - Direct Mapping

In direct mapping, the index bits select exactly one candidate line, so a single comparator is enough to compare that line's tag with the requested tag.

The size of the comparator = the total number of bits present in the tag.

79
Cache Memory - Mapping functions - Direct Mapping

Direct mapping procedure: the cache memory is a matrix of cells (here an 8x8 matrix).

 The index bits select the line; the offset bits select the column.
 The tag is the remaining (high-order) part of the address.
 In direct mapping, the number of blocks is a multiple of the number of cache lines.

Total number of blocks = (main memory size) divided by (block size).

80
Cache Memory - Mapping functions - Direct Mapping

Searching procedure in Direct Mapping

The cache controller uses the line number field (index bits) of the requested address to access a particular line of the cache. It then compares the tag bits of the requested address with the tag bits of that line. If the two tags match, a cache hit occurs, and the desired word is found in that cache line (following the offset).
If the two tags do not match, a cache miss occurs; the cache controller then addresses the required block (block number) in the main memory and stores its data words in the data field of the concerned cache line. This operation is sometimes called a "linefill": when a cache miss occurs, a linefill takes place, overwriting the existing block words in that cache line and replacing them with the new block words.
Once the data words are read [directly from the cache (cache hit) or indirectly from the main memory (cache miss)], they are delivered to the requesting processor.
It should be clear that all main memory addresses with the same value of the index bits [in our previous example, 001] map their data words to the same line in the cache. Only one of those memory blocks can be in the cache at any given time, which means a problem called thrashing can easily occur.

81
Cache Memory - Mapping functions - Direct Mapping

Division of the Memory Address in Direct Mapping
In direct mapping, the physical address is divided into the Tag, the Line Number and the Offset, all derived from the requested memory address.

[Diagram: memory address = Tag | Line Number | Cell Offset, matched against the cache matrix (cache size) on one side and the main memory (Block Number | Block Cell Offset, main memory size) on the other.]

Total number of blocks = (main memory size) divided by (block size).

82
Cache Memory - Mapping functions - Direct Mapping

Division of the Memory Address in Direct Mapping

In direct mapping, the physical address is divided into the Tag, the Line Number and the Offset, all derived from the requested memory address.
Relationships in Direct Mapping:

Number of comparators = 1
Physical Address (PA) = Tag | Line Number | Line Offset
Cache size = LineSize x TNoL
TNoB = MemorySize / BlockSize
LN = BN mod TNoL
Tag bits number = TNoB bits number - TNoL bits number
BlockSize = LineSize

LN: Line Number
BN: Block Number
TNoB: Total Number of Blocks
TNoL: Total Number of Lines

83
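A minimal Python sketch of the Tag | Line Number | Offset split (sized, as an assumption, for the exercise on the next slide: an 8-bit address, 8 lines, 8-byte blocks):

```python
# Split a physical address into tag | line index | offset for a direct-mapped cache.
INDEX_BITS, OFFSET_BITS = 3, 3            # 8 lines, 8-byte blocks (2^3 each)

def split_address(pa):
    offset = pa & ((1 << OFFSET_BITS) - 1)
    line = (pa >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = pa >> (OFFSET_BITS + INDEX_BITS)
    return tag, line, offset

tag, line, offset = split_address(0b01000011)
print(tag, line, offset)   # tag=0b01, line=0 (block 01000 mod 8), offset=0b011
```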
Cache Memory - Mapping functions - Direct Mapping

Exercise
If there are N (= 8) lines in the cache, find the number of bits of the line index.

[Answers]
If there are N lines in the cache, the number of bits of the line index in direct mapping is:
line index = log2(N) bits; that is, N = 2^(index bits) = 2^3, so the line index needs 3 bits.

Exercise
Consider an 8-bit memory address (physical address) [01000011] and a cache with 8 lines, the blocks containing 8 cells of one byte each. Find:
- The total number of blocks in the main memory.
- The block number of memory address 01000011, its mapped line and its tag.

[Answers]
We use 3 bits for the index (since the cache holds 8 = 2^3 lines)
and 3 bits for the offset (since blocks contain 8 = 2^3 bytes).
The remaining two bits are for the tag.
The total number of blocks = total memory size / block size = 2^8 / 8 = 32 blocks.
Block number = 01000; line number = (01000)2 mod 8 = 8 mod 8 = 0; tag = 01.

84
Cache Memory - Mapping functions - Direct Mapping

Important Results
Here are a few crucial results for a direct-mapped cache:
 Block j of the main memory maps to only one specific line of the cache (j mod number of lines in the cache).
 The size of every multiplexer = the total number of lines present in the cache.
 Total number of required comparators = 1.
 The size of the comparator = the total number of bits present in the tag.
 Hit latency = comparator latency + multiplexer latency.
 There is no need for a replacement algorithm (the mapping of each block is fixed).

85
Cache Memory - Mapping functions - Direct Mapping

Advantages of direct mapping
 Simple method of implementation.
 Low hardware complexity (only one comparator).
 Short access time.
 No replacement algorithm is required.
Disadvantages of direct mapping
 Unpredictable cache performance.
 Poor handling of spatial locality.
 Inefficient use of the cache space.
 High rate of conflict misses.

86
Cache Memory - Mapping functions - Fully Associative Mapping

Cache Mapping Techniques: Direct Mapping | Fully Associative Mapping | K-way Set Associative Mapping

Fully associative mapping allows a memory block to be loaded into any cache line, which makes it more flexible than direct mapping; it is considered the fastest and most flexible mapping form.
Fully associative caches tend to have the fewest cache misses for a given cache capacity, but they require more hardware, since additional tag comparisons are needed to compare every line's tag. They are best suited to relatively small caches because of the large number of comparators.
A fully associative cache contains a single set with K ways, where K is the number of lines in the set. A memory address can map its data block to any of these ways.

87
Cache Memory - Mapping functions - Fully Associative Mapping

In fully associative mapping, a block of the main memory can map to any line of the cache that is freely available at that moment; fully associative mapping is therefore more flexible than direct mapping.

 If any lines of the cache are freely available, any main memory block can map to one of those free lines.
 If the cache lines are all occupied, one of the lines has to be replaced: a replacement algorithm such as FCFS or LRU is introduced to select which line's data has to be replaced.

[Diagram: blocks Block1 ... BlockM of the main memory mapping to any of the cache lines Line1 ... Line6.]

88
Cache Memory - Mapping functions - Fully Associative Mapping

Division of the Memory Address in Fully Associative Mapping
In fully associative mapping, the memory address is divided into a Tag field and a block/line offset.
 The Tag uniquely identifies the block number; it is compared with the tags of all cache lines to find a tag match (a hit).
 The offset identifies the required word within the tagged block/line.
 In fully associative mapping there are no index bits: the line number is not indicated in the address.

[Diagram: memory address = Tag | Line Offset; main memory address = Block Number | Block Offset; cache size and main memory size shown alongside.]

Number of comparators = number of lines
Physical Address (PA) = Tag | Line Offset

89
Cache Memory - Mapping functions - Fully Associative Mapping

Division of the Memory Address in Fully Associative Mapping

In fully associative mapping, the memory address is divided into a Tag field and a block/line offset.

Relationships in Fully Associative Mapping:

Number of comparators = number of lines (one line is one way)
Physical Address (PA) = Tag | Line Offset
Cache size = LineSize x TNoL
TNoB = MemorySize / BlockSize
LineSize = BlockSize
Tag bits number = TNoB bits number

TNoB: Total Number of Blocks
TNoL: Total Number of Lines

90
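A minimal Python sketch of a fully associative lookup, in which the requested tag is compared against every line's tag (in hardware those comparisons run in parallel, one comparator per line; the data layout here is an illustrative assumption):

```python
# Fully associative lookup: the tag is compared against every line's tag.
LINE_SIZE = 8                              # bytes per line = bytes per block

def lookup(cache, pa):
    tag, offset = pa // LINE_SIZE, pa % LINE_SIZE
    for line in cache:                     # hardware does this in parallel
        if line is not None and line["tag"] == tag:
            return line["data"][offset]    # cache hit
    return None                            # cache miss -> linefill needed

cache = [None, {"tag": 5, "data": bytes(range(8))}, None]
print(lookup(cache, 5 * LINE_SIZE + 3))    # hit -> 3
print(lookup(cache, 7 * LINE_SIZE))        # miss -> None
```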
Cache Memory - Mapping functions - Fully Associative Mapping

 Instead of a MUX/counter, one comparator is employed per line (number of comparators = number of lines) to avoid sequential checking and the time it wastes.

The lines do not necessarily contain the same tags at the same time. With a single comparator, a counter and a MUX, the lines would have to be checked one by one, which consumes time; using as many comparators as there are lines lets all the lines' tags be compared with the requested tag at once, saving considerable time.

91

Number of comparators equal Number of Comparator = Total number of lines


number of lines in cache Physical Address (PA) = Tag + Line Offset

Requested Physical Address (PA)

92
R H
Cache Memory - Mapping functions - Fully Associative Mapping

[Diagram: parallel tag comparison of the requested physical address (PA), continued. Number of comparators = total number of lines; Physical Address (PA) = Tag | Line Offset.]

93
Cache Memory - Mapping functions - Fully Associative Mapping

[Diagram: parallel tag comparison example. Number of comparators = total number of lines; Physical Address (PA) = Tag | Line Offset.]

94
Cache Memory - Mapping functions - Fully Associative Mapping - Mapping Performance

Cache Mapping Performance

When the processor needs to read or write a location in the main memory, it first checks for a corresponding entry in the cache, generating either a cache hit or a cache miss.

The performance of cache memory is frequently measured in terms of a quantity called the hit ratio (HR); a cache mapping's performance is directly proportional to the hit ratio.

Fully associative mapping is more flexible than direct mapping and has a higher possible hit ratio (HR) for a given cache capacity, but it requires more hardware and a longer access time. It is considered the most flexible mapping form, at the cost of extra hardware, and is best suited to relatively small caches because of the large number of hardware comparators.

We can improve cache performance by using a larger cache line size and higher associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the cache.

95
Cache Memory - Mapping functions - Fully Associative Mapping

Advantages of the associative mapping technique
 It is the fastest and most flexible technique.
 More flexibility to reduce the cache-miss rate via the replacement algorithm.
 It has the highest possible hit ratio (HR) for a given cache capacity.
Disadvantages of the associative mapping technique
 The comparison time to search for a block is comparably high.
 It is more expensive because of the added hardware.
 It is suitable only for relatively small caches, because of the added hardware.

96
Cache Memory - Mapping functions - K-way Set Associative Mapping

Cache Mapping Techniques: Direct Mapping | Fully Associative Mapping | K-way Set Associative Mapping

In k-way set associative mapping, the cache memory is divided into many sets of lines, fully associative within each set. This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct mapping are removed: it addresses the problem of possible thrashing in the direct mapping method. Instead of having exactly one line that a block can map to in the cache, a few lines are grouped together to create a set, and a block in memory can map to any one of the lines of a specific set.
Set associative cache mapping thus combines the best of the direct and associative cache mapping techniques.

97
Cache Memory - Mapping functions - K-way Set Associative Mapping

The rules of k-way set associative mapping are as follows:
 In k-way set associative mapping, the cache lines are grouped into sets, where each set contains k lines (k ways per set).
 A particular main memory block can map to only one particular set of the cache; within that set, the memory block can map to any cache line that is freely available (k possibilities within the set).
 The cache set number to which a particular memory block can map is given by:

Associated cache set number = (block number) modulo (number of sets)

Special cases
- If k = 1, k-way set associative mapping becomes direct mapping.
- If k = total number of lines in the cache, k-way set associative mapping becomes fully associative mapping.

98
Cache Memory - Mapping functions - K-way Set Associative Mapping

Division of the Memory Address in K-way Set Associative Mapping
In k-way set associative mapping, the memory address is divided into a Tag field, a Set Number field and a block/line offset.
 The Tag together with the Set Number uniquely identifies the block number; the Tag is compared with the tags of the lines of the selected set to find a tag match (a hit).
 The offset identifies the required cell within the tagged block/line.
 In set associative mapping there is no line-number index in the requested address, so all the lines of the selected set have to be checked on reading/writing.

[Diagram: memory address = Tag | Set Number | Line Cell Offset; main memory address = Block Number | Block Cell Offset; cache size = number of sets x set size; set size = number of ways x line size.]

In set-associative mapping, each set of the cache serves all the main memory blocks that share its set number.

99
Cache Memory - Mapping functions - K-way Set Associative Mapping

Associated cache set number = (block number) modulo (number of sets)

[Diagram: a 2-way cache with 4 sets (8 lines); within the same set the tags are different. Set 0: Line0 (way0, TAG=0000), Line1 (way1, TAG=0001); Set 1: Line2 (way0), Line3 (way1); Set 2: Line4 (way0, TAG=0000), Line5 (way1, TAG=0001); Set 3: Line6 (way0), Line7 (way1). Mapping: Block0 -> 0 modulo 4 = set 0; Block1 -> 1 modulo 4 = set 1; Block2 -> 2 modulo 4 = set 2; Block3 -> 3 modulo 4 = set 3; Block4 -> 4 modulo 4 = set 0; Block5 -> 5 modulo 4 = set 1; ... up to BlockM.]

100
Cache Memory - Mapping functions - K-way Set Associative Mapping

Division of the Memory Address in K-way Set Associative Mapping

Number of comparators = number of ways
Physical Address (PA) = Tag | Set Number | Line Offset

Relationships in K-way Set Associative Mapping:

Cache size = LineSize x NoW x TNoS
Memory size = BlockSize x TNoB
SN = BN mod TNoS
Tag bits number = PA bits number - Set bits number - Block offset bits number
Number of comparators = NoW

SN: Set Number
BN: Block Number
TNoB: Total Number of Blocks
TNoS: Total Number of Sets
NoW: Number of Ways in one Set

101
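A minimal Python sketch combining the two previous ideas: the set number is computed as in direct mapping (SN = BN mod TNoS), then the k ways of that set are searched associatively (the geometry is an illustrative assumption):

```python
# k-way set associative lookup: SN = BN mod TNoS, then search the k ways of set SN.
BLOCK_SIZE, NUM_SETS, NUM_WAYS = 8, 4, 2

def lookup(cache, pa):
    block_number, offset = pa // BLOCK_SIZE, pa % BLOCK_SIZE
    set_number = block_number % NUM_SETS          # SN = BN mod TNoS
    tag = block_number // NUM_SETS
    for way in cache.get(set_number, []):         # k comparators, one per way
        if way["tag"] == tag:
            return way["data"][offset]            # hit
    return None                                   # miss

cache = {2: [{"tag": 1, "data": bytes(range(8))}]}  # block 6 -> set 2, tag 1
print(lookup(cache, 6 * BLOCK_SIZE + 4))            # hit -> 4
```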
Cache Memory - Mapping functions - K-way Set Associative Mapping

 Instead of a MUX/counter, one comparator is employed per way (number of comparators = number of ways).

102
Cache Memory - Mapping functions - K-way Set Associative Mapping

 The number of comparators equals the number of k ways in one set (k comparators).

103
Cache Memory - Mapping functions - K-way Set Associative Mapping

 The number of comparators equals the number of k ways in one set (k comparators).

(Most computers use a 2-way or 4-way cache.)

104
Cache Memory - Mapping functions - K-way Set Associative Mapping - Example

Six-way set associative 24-KB data cache (24K = 6 x 64 x 64).
The figure shows the logical structure of the 24-KB Intel Atom L1 data cache.

The Intel Atom platform has the following caches, all with a cache line size of 64 bytes:

 32-KB eight-way set associative L1 instruction cache.
 24-KB six-way set associative L1 data cache.
 512-KB eight-way set associative unified instruction and data L2 cache.

105
Cache Memory - Mapping functions - K-way Set Associative Mapping

Exercise
The figure shows the logical structure of a k-way associative mapping of the Intel Atom instruction cache. Find:
 The size of the cache.
 The total size of the main memory.
 The total number of memory blocks.
 The number of cells in one block (cells per block).
 The total number of comparators used.

[Answers]
Cache size = 24 kB
Memory size = 2^32 = 4 GB
Number of blocks = 2^26
Cells per block = 2^6 = 64
Number of comparators = 6

106
Cache Memory - Mapping functions - K-way Set Associative Mapping

Exercise
From the presented cache mapping, find:
 The total size of the main memory
 The total number of sets
 The total number of ways
 The total number of blocks
 The total number of bytes
 Redo the same questions with 8 ways instead of two, and explain the advantage.

[Answers] Memory size = 16 GB; 2 ways; number of sets = 4; number of blocks = 2^30; number of bytes = 4 x 2^32 = 16 GB. With 8 ways, nothing changes except that the cache performance will be better (the miss ratio will be reduced).

107
Cache Memory - Mapping functions - K-way Set Associative Mapping

Exercise

If the total number of sets in a cache is 4 and the total number of ways (lines) within one set is 2, find the set number for the blocks 00, 01, 06, 11, 54.

SN = BN mod TNoS
TNoL = TNoS x TNoLi1S

SN: Set Number
BN: Block Number
TNoS: Total Number of Sets
TNoL: Total Number of Lines
TNoLi1S: Total Number of Lines in a Set

[Answers]
The set number for block 00 is 00 mod 4 = set 0
The set number for block 01 is 01 mod 4 = set 1
The set number for block 06 is 06 mod 4 = set 2
The set number for block 11 is 11 mod 4 = set 3
The set number for block 54 is 54 mod 4 = set 2

108
Cache Memory - Mapping functions - K-way Set Associative Mapping

Searching procedure in K-way Set Associative Mapping

The cache memory is divided into many sets of lines, fully associative within each set.
After the CPU generates a memory request, the set number field of the address is used to access the particular set of the cache. The Tag field of the requested address is then compared with the tags of all k ways (lines) within that set.
 If the Tag matches the tag of any way within the set, a cache hit occurs.
 If the Tag does not match any way's tag within the set, a cache miss occurs.
 In case of a cache miss, the required memory cell has to be brought from the main memory.
 If the set is full, a replacement is made following the employed replacement policy.

109
Cache Memory - Mapping functions - K-way Set Associative Mapping

Need for a Replacement Algorithm

Set associative mapping is a combination of direct mapping and fully associative mapping: it uses fully associative mapping within each set. Thus, set associative mapping needs a replacement algorithm.

Division of the Physical Address

In set associative mapping, the physical address is divided as: Tag | Set Number | Line Offset.

110
Cache Memory - Mapping functions - K-way Set Associative Mapping

Advantages of Set-Associative Mapping
 It has the highest hit rate for instruction caches.
 Conflict misses are very few.

Disadvantages of Set-Associative Mapping
 It is the most expensive.
 It can be slower than direct mapping.

111
Summary

Levels of Memory
Level 1 - Registers: memory locations inside the CPU itself, holding the data being processed immediately. The most commonly used registers are the Accumulator, the Program Counter, the Address Register, etc.
Level 2 - Cache memory: the fastest memory outside the CPU, with the shortest access time, where data is temporarily stored for faster access.
Level 3 - Main Memory: the memory on which the computer currently works. It is limited in size, and once power is off the data no longer stays in this memory.
Level 4 - Secondary Memory: external memory that is not as fast as the main memory, but where data stays permanently.

Cache memory operates 10 to 100 times faster than RAM, requiring only a few nanoseconds to respond to a CPU request. The actual hardware used for cache memory is high-speed static random access memory (SRAM).

112
Summary

Cache mapping is a technique that defines how the contents of the main memory are brought into the cache. The cache mapping techniques are:

 Direct mapping is simple to implement, with low hardware complexity (only one tag comparator), and has a short access time.

 Fully associative mapping is more flexible than direct mapping and has a higher possible hit ratio (HR) for a given cache capacity, but it requires more hardware. It is considered the fastest and most flexible mapping form, at the cost of extra hardware, and is best suited to relatively small caches because of the large number of hardware comparators.

 Set associative mapping is a combination of direct mapping and fully associative mapping: it uses fully associative mapping within each set. It has the highest hit rate for instruction caches.

We can improve cache performance by using a larger cache line size and higher associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the cache.

113
Summary

Direct-mapped caches are commonly used in microcontrollers and simple embedded systems with limited hardware resources.
The direct-mapped cache is simple and fast, but it may not be suitable for all memory access patterns and workloads because of its higher conflict and miss rates and its limited associativity. With a direct-mapped cache, there is no requirement for a replacement algorithm.

Use Cases and Applications

Processors often use direct-mapped caches as instruction caches. Instructions tend to exhibit good locality of reference, and the deterministic mapping of direct-mapped caches exploits this property, resulting in fast and predictable instruction fetches.

One of the major use cases is in embedded systems, where power consumption, simplicity and determinism are critical considerations. These applications often operate with limited resources and require efficient memory access with predictable timing.

Direct-mapped caches offer constant access times and predictable cache behavior, making them suitable for real-time systems like aerospace, automotive or industrial control applications.

We also use them when low latency is a critical requirement: due to their simple and straightforward design, direct-mapped caches can offer low access times for specific memory access patterns.

114
Summary

Parameter: Number of comparators
 Direct mapping: 1
 Fully associative: number of lines
 k-way set associative: k (number of ways, i.e. lines per set)

Parameter: Cache size
 Direct mapping: number of lines x number of cells per line x number of bytes per cell
 Fully associative: number of lines x number of cells per line x number of bytes per cell
 k-way set associative: number of sets x number of lines (ways) per set x number of cells per line x number of bytes per cell

Parameter: Total number of blocks
 All three: memory size / number of cells per block

115
Summary

Temporal and spatial locality ensure that nearly all references can be found in the smaller memories, while at the same time giving the processor the illusion of a large, fast memory.

The cache locality concept (both temporal and spatial) helps you manage your program so that it runs faster, by making sure that frequently accessed data is kept close by in fast-access memory. By understanding and applying this concept, you can write more efficient code that takes full advantage of the computer's caching system.

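The code example referred to below is missing from this copy of the document; the following minimal Python sketch reconstructs the intended loop and a toy direct-mapped line computation (the 4-line, 16-byte-line geometry is an assumption chosen so that 0x00, 0x40 and 0x80 collide):

```python
# Sketch of the loop the text refers to: result[i] = data1[i] + data2[i],
# with the three arrays at addresses 0x00, 0x40 and 0x80.
LINE_SIZE, NUM_LINES = 16, 4          # assumed basic direct-mapped cache

def cache_line(address):
    return (address // LINE_SIZE) % NUM_LINES

for base in (0x00, 0x40, 0x80):       # result, data1, data2
    print(hex(base), "-> line", cache_line(base))
# All three map to line 0: every iteration of
#   for i in range(n): result[i] = data1[i] + data2[i]
# evicts the line that the previous access just loaded (thrashing).
```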
In the code sketch above, result, data1, and data2 point to 0x00, 0x40 and 0x80 respectively, so the loop causes repeated accesses to memory locations that all map to the same line in the basic cache. The same thing happens on each iteration of the loop, and the software performs poorly. Direct-mapped caches are therefore not typically used as the main caches.

116
117
