COA - Chapter # 9

The document discusses the structure and function of the CPU. It describes how the CPU must fetch instructions from memory, interpret and process the instructions, fetch and process any required data, and write results back to memory. It discusses the internal structure of the CPU including registers for temporary storage. There are different types of registers including general purpose registers, data registers, address registers, and condition code registers. The document provides examples of register organizations for Intel x86-64 and ARM processors.


PROCESSOR STRUCTURE & FUNCTION
Chapter # 9, Computer Organization & Architecture
Sheheryar Malik


CPU Function

- The CPU must
  - Fetch instruction
    - The processor reads an instruction from memory (register, cache, main memory)
  - Interpret instruction
    - The instruction is decoded to determine what action is required
  - Fetch data
    - The execution of an instruction may require reading data from memory or an I/O module
  - Process data
    - The execution of an instruction may require performing some arithmetic or logical operation on data
  - Write data
    - The results of an execution may require writing data to memory or an I/O module


CPU With System Bus


CPU Internal Structure


Registers

- The CPU must have some working space (temporary storage)
  - Called registers
- Number and function vary between processor designs
- One of the major design decisions
- Top level of the memory hierarchy


Types of Registers

- User-visible registers
  - Enable the machine-language or assembly-language programmer to minimize main memory references by optimizing use of registers
- Control and status registers
  - Used by the control unit to control the operation of the processor, and by privileged operating system programs to control the execution of programs


User Visible Registers

- General purpose
- Data
- Address
- Condition codes


General Purpose Registers

- May be true general purpose
- May be restricted
- May be used for data or addressing
- Data
  - Accumulator
- Addressing
  - Segment


General Purpose Registers

- Make them general purpose
  - Increases flexibility and programmer options
  - Increases instruction size and complexity
- Make them specialized
  - Smaller (faster) instructions
  - Less flexibility


How Many GP Registers?

- Typically between 8 and 32
- Fewer registers mean more memory references
- More registers do not noticeably reduce memory references and take up processor real estate


How big?

- Large enough to hold a full address
- Large enough to hold a full word
- Often possible to combine two data registers
  - C programming: double-length integer types, e.g. long long a;


Condition Code Registers

- Sets of individual bits
  - e.g. result of last operation was zero
- Can be read (implicitly) by programs
  - e.g. jump if zero
- Cannot (usually) be set directly by programs


Control & Status Registers

- Program counter (PC)
  - Contains the address of an instruction to be fetched
- Instruction register (IR)
  - Contains the instruction most recently fetched
- Memory address register (MAR)
  - Contains the address of a location in memory
- Memory buffer register (MBR)
  - Contains a word of data to be written to memory or the word most recently read


Program Status Word

A set of bits that includes the condition codes

- Sign
  - Contains the sign bit of the result of the last arithmetic operation
- Zero
  - Set when the result is 0
- Carry
  - Set if an operation resulted in a carry
- Equal
  - Set if a logical compare result is equality
- Overflow
  - Used to indicate arithmetic overflow
- Interrupt enable/disable
  - Used to enable or disable interrupts
- Supervisor
  - Indicates whether the processor is executing in supervisor or user mode
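
The arithmetic condition codes above can be illustrated with a short sketch. It assumes an 8-bit two's-complement adder. The function name `add8_flags` and the dictionary layout are hypothetical, chosen only for illustration, and the Equal, Interrupt and Supervisor bits are not modelled.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and derive PSW-style condition codes (illustrative only)."""
    raw = (a & 0xFF) + (b & 0xFF)      # unbounded sum of the two 8-bit operands
    result = raw & 0xFF                # value that fits back into the 8-bit register
    sign_a = (a >> 7) & 1
    sign_b = (b >> 7) & 1
    sign_r = (result >> 7) & 1
    return {
        "result": result,
        "sign": sign_r,                                   # sign bit of the result
        "zero": int(result == 0),                         # set when the result is 0
        "carry": int(raw > 0xFF),                         # carry out of bit 7
        # signed overflow: operands share a sign, result has the other sign
        "overflow": int(sign_a == sign_b and sign_r != sign_a),
    }


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))  # largest positive value plus one
    print(add8_flags(0xFF, 0x01))  # wraps around to zero
```

A conditional jump such as "jump if zero" would then simply test the `zero` entry rather than the result itself.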


Supervisor Mode

- Intel ring zero
- Kernel mode
- Allows privileged instructions to execute
- Used by the operating system
- Not available to user programs


Other Registers

- May have registers pointing to
  - Process control blocks
  - Interrupt vectors
  - Page table
- CPU design and operating system design are closely linked


Example Register Organizations


EFLAGS Register


Intel x86-64 Registers

- 16 integer general-purpose registers
  - 64-bit
  - RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP, R8 - R15
- 8 x87 FPU floating-point registers
  - 80-bit
  - ST0 - ST7
- 8 MMX (Multimedia Extensions) registers
  - 64-bit
  - MM0 - MM7
  - They share space with the registers ST0 - ST7
- 16 SSE (Streaming SIMD Extensions) registers
  - 128-bit
  - XMM0 - XMM15
- 64-bit instruction pointer RIP
- 64-bit flags register RFLAGS

Intel x86-64 Registers

- 64-bit x86 adds 8 more general-purpose registers, named R8 - R15
- It also introduces a new naming convention that can also be applied to the first eight registers, except that AH, CH, DH and BH have no equivalents
  - R0 is RAX
  - R1 is RCX
  - R2 is RDX
  - R3 is RBX
  - R4 is RSP
  - R5 is RBP
  - R6 is RSI
  - R7 is RDI

Intel x86-64 Registers

- R8, R9, R10, R11, R12, R13, R14, R15 are the new registers and have no other names
- R0D - R15D are the lowermost 32 bits of each register
  - For example, R0D is EAX
- R0W - R15W are the lowermost 16 bits of each register
  - For example, R0W is AX
- R0L - R15L are the lowermost 8 bits of each register
  - For example, R0L is AL
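
The relation between a 64-bit register and its D, W and L views is just a bit mask, which the following sketch illustrates. It models reads only. The names `SUBREG_BITS` and `read_subregister` are hypothetical, and the sample value in `rax` is arbitrary.

```python
# Width in bits of each view; the empty suffix stands for the full 64-bit register.
SUBREG_BITS = {"": 64, "D": 32, "W": 16, "L": 8}


def read_subregister(value64: int, suffix: str) -> int:
    """Return the lowermost bits of a 64-bit register value for the given view."""
    width = SUBREG_BITS[suffix]
    return value64 & ((1 << width) - 1)


if __name__ == "__main__":
    rax = 0x1122334455667788                    # arbitrary contents of R0 (RAX)
    print(hex(read_subregister(rax, "D")))      # R0D / EAX view
    print(hex(read_subregister(rax, "W")))      # R0W / AX view
    print(hex(read_subregister(rax, "L")))      # R0L / AL view
```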


Intel x86-64 Registers


Simplified ARM Organization


ARM (32-bit) Processor Modes

- The ARM architecture supports seven execution modes
- User mode
  - Most application programs execute in user mode
  - A program executing in user mode is unable to access protected system resources or to change mode
- Privileged modes
  - These modes are used to run system software and are divided into two categories
  - System mode
    - This mode is not entered by any exception and uses the same registers available in User mode
    - The System mode is used for running certain privileged operating system tasks
    - System mode tasks may be interrupted by any of the five exception categories
  - Exception modes
    - The exception modes are entered when specific exceptions occur
    - Each of these modes has some dedicated registers that substitute for some of the user mode registers


ARM Exception Modes

- The exception modes are entered when specific exceptions occur
- Each of these modes has some dedicated registers that substitute for some of the user mode registers, and which are used to avoid corrupting User mode state information when the exception occurs
- The exception modes are as follows
  - Supervisor mode
    - Usually what the OS runs in; it is entered when the processor encounters a software interrupt instruction
  - Abort mode
    - Entered in response to memory faults
  - Undefined mode
    - Entered when the processor attempts to execute an instruction that is supported neither by the main integer core nor by one of the coprocessors
  - Interrupt mode
    - Entered whenever the processor receives an interrupt signal from any interrupt source other than the designated fast interrupt source
  - Fast interrupt mode
    - Entered whenever the processor receives an interrupt signal from the designated fast interrupt source
    - A fast interrupt cannot be interrupted, but a fast interrupt may interrupt a normal interrupt


ARM Register Organization

- The ARM processor has a total of thirty-seven 32-bit registers, classified as follows
  - 31 registers referred to in the ARM manual as general-purpose registers
    - In fact, some of these, such as the program counters, have special purposes
  - 6 program status registers
- Registers are arranged in partially overlapping banks, with the current processor mode determining which bank is available
- At any time, sixteen numbered registers and one or two program status registers are visible, for a total of 17 or 18 software-visible registers
  - Registers R0 through R7, register R15 (the program counter) and the current program status register (CPSR) are visible in and shared by all modes
  - Registers R8 through R12 are shared by all modes except fast interrupt, which has its own dedicated registers R8_fiq through R12_fiq
  - All the exception modes have their own versions of registers R13 and R14
  - All the exception modes have a dedicated saved program status register (SPSR)
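
As a quick check that the banking rules above add up to 31 + 6 = 37 registers, the sketch below enumerates them. Only the `_fiq` suffix appears on the slide; the other mode suffixes (`svc`, `abt`, `und`, `irq`) and the function name are assumptions that follow the same naming pattern.

```python
# Assumed short names for the five exception modes (only "fiq" is given on the slide).
EXCEPTION_MODES = ["svc", "abt", "und", "irq", "fiq"]


def arm_register_names():
    """Enumerate the banked 32-bit ARM registers according to the rules listed above."""
    general = [f"R{i}" for i in range(16)]              # R0-R15 as seen in User/System mode
    for mode in EXCEPTION_MODES:
        general += [f"R13_{mode}", f"R14_{mode}"]       # each exception mode banks R13 and R14
    general += [f"R{i}_fiq" for i in range(8, 13)]      # fast interrupt also banks R8-R12
    status = ["CPSR"] + [f"SPSR_{mode}" for mode in EXCEPTION_MODES]
    return general, status


if __name__ == "__main__":
    general, status = arm_register_names()
    print(len(general), len(status), len(general) + len(status))
```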


ARM Register Organization


ARM Register Organization


ARM AArch64 Registers


Instruction Cycle


Indirect Cycle

- May require memory access to fetch operands
- Indirect addressing requires more memory accesses
- Can be thought of as an additional instruction sub-cycle


Instruction Cycle with Indirect


Instruction Cycle State Diagram


Data Flow (Instruction Fetch)

- Depends on CPU design
- Fetch
  - PC contains address of next instruction
  - Address moved to MAR
  - Address placed on address bus
  - Control unit requests memory read
  - Result placed on data bus, copied to MBR, then to IR
  - Meanwhile PC incremented by 1
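
The fetch steps above can be sketched as register transfers on a toy machine. This is a minimal illustration, not a model of any real CPU: the dictionary of registers, the word-addressed `memory` list and its contents are all assumptions.

```python
def fetch(cpu: dict, memory: list) -> None:
    """One instruction fetch expressed as the register transfers listed above."""
    cpu["MAR"] = cpu["PC"]             # address of next instruction moved to MAR
    cpu["MBR"] = memory[cpu["MAR"]]    # memory read: word arrives on the data bus into MBR
    cpu["PC"] += 1                     # meanwhile PC is incremented by 1
    cpu["IR"] = cpu["MBR"]             # instruction copied from MBR to IR


if __name__ == "__main__":
    memory = [0x1005, 0x2006, 0x0000]  # hypothetical instruction words
    cpu = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
    fetch(cpu, memory)
    print(cpu)
```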


Data Flow (Data Fetch)

- IR is examined
- If indirect addressing, indirect cycle is performed
  - Rightmost N bits of MBR transferred to MAR
  - Control unit requests memory read
  - Result (address of operand) moved to MBR
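
A sketch of the indirect cycle, under the assumption that the address field is the rightmost 12 bits of the word (the slide leaves N unspecified). The register dictionary, the sparse `memory` dictionary and the sample values are hypothetical.

```python
ADDRESS_BITS = 12  # assumption: N, the width of the address field, is 12 bits


def indirect_cycle(cpu: dict, memory: dict) -> None:
    """Replace the indirect reference held in MBR by the operand address it points to."""
    cpu["MAR"] = cpu["MBR"] & ((1 << ADDRESS_BITS) - 1)  # rightmost N bits of MBR to MAR
    cpu["MBR"] = memory[cpu["MAR"]]                      # memory read: operand address to MBR


if __name__ == "__main__":
    memory = {0x005: 0x00A, 0x00A: 42}   # location 0x005 holds the operand address 0x00A
    cpu = {"MAR": 0, "MBR": 0x1005}      # fetched word: opcode 0x1, address field 0x005
    indirect_cycle(cpu, memory)
    print(cpu)
```

The extra memory read is the cost of indirect addressing: the operand itself still has to be fetched from the address now held in MBR.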


Data Flow (Fetch Diagram)


Data Flow (Indirect Diagram)


Data Flow (Execute)

- May take many forms
- Depends on instruction being executed
- May include
  - Memory read/write
  - Input/output
  - Register transfers
  - ALU operations


Data Flow (Interrupt)

- Simple
- Predictable
- Current PC saved to allow resumption after interrupt
  - Contents of PC copied to MBR
  - Special memory location (e.g. stack pointer) loaded to MAR
  - MBR written to memory
- PC loaded with address of interrupt handling routine
- Next instruction (first of interrupt handler) can be fetched
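
The interrupt data flow can be sketched in the same toy style. The assumptions here are that the return address is saved on a stack that grows downward and that the handler address is supplied by the caller; the register dictionary and the addresses are hypothetical.

```python
def interrupt_cycle(cpu: dict, memory: dict, handler_address: int) -> None:
    """Save the current PC and redirect execution to the interrupt handler."""
    cpu["MBR"] = cpu["PC"]             # contents of PC copied to MBR
    cpu["SP"] -= 1                     # assumption: the stack grows toward lower addresses
    cpu["MAR"] = cpu["SP"]             # special memory location (stack pointer) loaded to MAR
    memory[cpu["MAR"]] = cpu["MBR"]    # MBR written to memory
    cpu["PC"] = handler_address        # PC loaded with address of the handling routine


if __name__ == "__main__":
    cpu = {"PC": 0x120, "SP": 0xFF, "MAR": 0, "MBR": 0}
    memory = {}
    interrupt_cycle(cpu, memory, handler_address=0x400)
    print(cpu, memory)
```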


Data Flow (Interrupt Diagram)


Prefetch

- Fetch accesses main memory
- Execution usually does not access main memory
- Can fetch next instruction during execution of current instruction
- Called instruction prefetch


Improved Performance

- But not doubled
  - Fetch is usually shorter than execution
    - Prefetch more than one instruction?
  - Any jump or branch means that prefetched instructions are not the required instructions
- Divide into more activities/stages to improve performance
- Solution is processor pipelining


Pipelining is Natural

- Laundry example
- Nazim, Botir, Babar and Temur each have one load of clothes (A, B, C, D) to wash, dry, fold, and stash
  - "Washing" takes 30 minutes
  - "Drying" takes 30 minutes
  - "Folding" takes 30 minutes
  - "Stashing" (putting clothes into drawers) takes 30 minutes


Sequential Laundry

[Figure: timeline from 6 PM to 2 AM in 30-minute slots; loads A, B, C, D in task order, each load starting only after the previous load has finished all four steps]

- Sequential laundry takes 8 hours for 4 loads


Pipelined Laundry: Start work ASAP

[Figure: timeline starting at 6 PM with seven 30-minute slots; loads A, B, C, D in task order, each load starting one slot after the previous one]

- Pipelined laundry takes 3.5 hours for 4 loads!
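
The 8-hour and 3.5-hour figures follow from simple counting, as the sketch below shows. It assumes four equal 30-minute steps per load and that each step can serve only one load at a time; the function names are hypothetical.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; every further load finishes one slot later."""
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    seq = sequential_minutes(loads=4, stages=4, stage_minutes=30)
    pipe = pipelined_minutes(loads=4, stages=4, stage_minutes=30)
    print(seq / 60, "hours sequential")
    print(pipe / 60, "hours pipelined")
    print(round(seq / pipe, 2), "x speedup")
```

Note that the time for a single load is unchanged (2 hours); pipelining improves throughput, not the latency of one task.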


Processor Pipelining

- Fetch instruction
- Decode instruction
- Calculate operand addresses
- Fetch operands
- Execute instruction
- Write result

- Overlap these operations


Two Stage Instruction Pipeline


Timing Diagram for Instruction Pipeline Operation


Why Pipeline?

- Suppose we execute 100 instructions
- Single-cycle machine
  - 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
- Multicycle machine
  - 10 ns/cycle x 4.6 CPI (due to instruction mix) x 100 inst = 4600 ns
- Ideal pipelined machine
  - 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
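
The three totals above can be reproduced with a few lines of arithmetic. The function names are hypothetical; the cycle times, CPI values and the 4-cycle drain are the ones given on the slide.

```python
def single_cycle_ns(cycle_ns: float, instructions: int) -> float:
    """Every instruction takes exactly one (long) cycle."""
    return cycle_ns * 1 * instructions


def multicycle_ns(cycle_ns: float, cpi: float, instructions: int) -> float:
    """Short cycles, but each instruction needs several of them on average."""
    return cycle_ns * cpi * instructions


def ideal_pipeline_ns(cycle_ns: float, instructions: int, drain_cycles: int) -> float:
    """One instruction completes per cycle, plus the cycles needed to drain the pipeline."""
    return cycle_ns * (1 * instructions + drain_cycles)


if __name__ == "__main__":
    print(single_cycle_ns(45, 100))        # single-cycle machine
    print(multicycle_ns(10, 4.6, 100))     # multicycle machine
    print(ideal_pipeline_ns(10, 100, 4))   # ideal pipelined machine
```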


The Effect of a Conditional Branch on Instruction Pipeline Operation


Six Stage Instruction Pipeline


Alternative Pipeline Depiction


Speedup Factors with Instruction Pipelining


Pipeline Hazards

- Occur when the pipeline, or some portion of the pipeline, must stall
- Such a stall is also called a pipeline bubble
- Types of hazards
  - Resource
  - Data
  - Control


Resource Hazards

- Two (or more) instructions in the pipeline need the same resource
- Executed in serial rather than parallel for part of the pipeline
- Also called structural hazard
- E.g. assume a simplified five-stage pipeline
  - Each stage takes one clock cycle
  - Ideal case is a new instruction entering the pipeline each clock cycle
  - Assume main memory has a single port
  - Assume instruction fetches and data reads and writes are performed one at a time
  - Ignore the cache
  - Operand read or write cannot be performed in parallel with instruction fetch
  - Fetch instruction stage must idle for one cycle before fetching I3
- E.g. multiple instructions ready to enter the execute instruction phase
  - Single ALU
- One solution: increase available resources
  - Multiple main memory ports
  - Multiple ALUs


Resource Hazard Diagram


Data Hazards

- Conflict in access of an operand location
- Two instructions to be executed in sequence
- Both access a particular memory or register operand
- If in strict sequence, no problem occurs
- If in a pipeline, operand value could be updated so as to produce a different result from strict sequential execution
- E.g. x86 machine instruction sequence:
  - ADD EAX, EBX /* EAX = EAX + EBX */
  - SUB ECX, EAX /* ECX = ECX - EAX */
- ADD instruction does not update EAX until end of stage 5, at clock cycle 5
- SUB instruction needs value at beginning of its stage 2, at clock cycle 4
- Pipeline must stall for two clock cycles
- Without special hardware and specific avoidance algorithms, results in inefficient pipeline usage

Data Hazard Diagram


Types of Data Hazard

- Read after write (RAW), or true dependency
  - An instruction modifies a register or memory location
  - Succeeding instruction reads data in that location
  - Hazard if read takes place before write is complete
- Write after read (WAR), or antidependency
  - An instruction reads a register or memory location
  - Succeeding instruction writes to that location
  - Hazard if write completes before read takes place
- Write after write (WAW), or output dependency
  - Two instructions both write to the same location
  - Hazard if writes take place in the reverse of the intended order
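
The three dependency types can be detected by comparing the read and write sets of two instructions in program order, as in the sketch below. The representation of an instruction as a `(reads, writes)` pair and the function name are assumptions for illustration. The sketch only reports potential hazards; whether a stall actually occurs depends on pipeline timing.

```python
def classify_hazards(first: tuple, second: tuple) -> set:
    """Classify dependencies between two instructions given as (reads, writes) sets.

    `first` precedes `second` in program order.
    """
    first_reads, first_writes = first
    second_reads, second_writes = second
    hazards = set()
    if first_writes & second_reads:
        hazards.add("RAW")   # second reads what first writes (true dependency)
    if first_reads & second_writes:
        hazards.add("WAR")   # second writes what first reads (antidependency)
    if first_writes & second_writes:
        hazards.add("WAW")   # both write the same location (output dependency)
    return hazards


if __name__ == "__main__":
    add = ({"EAX", "EBX"}, {"EAX"})   # ADD EAX, EBX
    sub = ({"ECX", "EAX"}, {"ECX"})   # SUB ECX, EAX
    print(classify_hazards(add, sub))
```

Applied to the ADD/SUB sequence from the data hazard example, only the read-after-write dependency on EAX is reported.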


Control Hazard

- Also known as branch hazard
- Pipeline makes the wrong decision on a branch prediction
- Brings instructions into the pipeline that must subsequently be discarded


Dealing with Branches

- Multiple streams
- Prefetch branch target
- Loop buffer
- Branch prediction
- Delayed branching


Multiple Streams

- Have two pipelines
- Prefetch each branch into a separate pipeline
- Use appropriate pipeline
- Leads to bus & register contention
- Multiple branches lead to further pipelines being needed


Prefetch Branch Target

- Target of branch is prefetched in addition to instructions following branch
- Keep target until branch is executed
- Used by IBM 360/91


Loop Buffer

- Very fast memory
- Maintained by fetch stage of pipeline
- Check buffer before fetching from memory
- Very good for small loops or jumps
- cf. cache
- Used by CRAY-1


Loop Buffer Diagram


Branch Prediction

- Predict never taken
  - Assume that jump will not happen
  - Always fetch next instruction
  - 68020 & VAX 11/780
  - VAX will not prefetch after branch if a page fault would result (O/S v CPU design)
- Predict always taken
  - Assume that jump will happen
  - Always fetch target instruction


Branch Prediction

- Predict by opcode
  - Some instructions are more likely to result in a jump than others
  - Can get up to 75% success
- Taken/not taken switch
  - Based on previous history
  - Good for loops
  - Refined by two-level or correlation-based branch history
- Correlation-based
  - In more complex structures, branch direction correlates with that of related branches
  - Use recent branch history as well
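
One common way to realize a history-based taken/not taken switch is a two-bit saturating counter, sketched below. This is an illustrative variant and may differ in detail from the state diagram used in the lecture; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self) -> None:
        self.counter = 0  # start in the strongest "not taken" state

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes: list) -> int:
    """Run the predictor over a sequence of branch outcomes and count wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop-closing branch: taken nine times, then falls through; the loop runs twice.
    loop = ([True] * 9 + [False]) * 2
    print(count_mispredictions(loop))
```

Because a single loop exit only moves the counter one step, the predictor still guesses "taken" when the loop is entered again, which is why this kind of switch is good for loops.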


Branch Prediction

- Delayed branch
  - Do not take jump until you have to
  - Rearrange instructions


Branch Prediction State Diagram


Branch Prediction Flowchart


Dealing With Branches


Intel 80486 Pipelining

- Fetch
  - From cache or external memory
  - Put in one of two 16-byte prefetch buffers
  - Fill buffer with new data as soon as old data consumed
  - On average, 5 instructions fetched per load
  - Independent of other stages to keep buffers full
- Decode stage 1 (D1)
  - Opcode & address-mode info
  - At most first 3 bytes of instruction
  - Can direct D2 stage to get rest of instruction
- Decode stage 2 (D2)
  - Expand opcode into control signals
  - Computation of complex address modes
- Execute
  - ALU operations, cache access, register update
- Writeback
  - Update registers & flags
  - Results sent to cache & bus interface write buffers


80486 Instruction Pipeline Examples


Q.1

- Given the following memory and register values
  - Word 700 contains 740
  - Word 710 contains 750
  - Word 720 contains 710
  - Word 730 contains 740
  - Word 740 contains 700
  - Word 750 contains 700
  - AX Register contains 720
  - BX Register contains 740
  - CX Register contains 710
  - DX Register contains 750
  - Base Register contains 200
- What would be the result in the following cases?
  - ADD DX, [BX]
  - SUB [CX], BX
  - MOV DX, 30
  - ADD [AX], [700]

Q.2

- Consider different systems with and without pipelining. Each system has to execute 1400 instructions.
- Calculate the total execution time for 1400 instructions in each of the following cases.
  - Single-cycle machine
    - It takes 40 ns for each cycle
  - Multi-cycle machine (without pipelining)
    - It takes 6 ns for each cycle
    - It has 7 stages
    - It consumes 7 clocks per instruction on average
  - Ideal pipelined machine
    - It takes 6 ns for each cycle
    - It has 7 stages
