Presentation 35191 Content Document 20250423021246PM

The document discusses pipelining in computer architecture, emphasizing its role in overlapping instruction execution to enhance performance. It details the five stages of a pipelined processor: Instruction Fetch, Instruction Decode, Execution, Memory Access, and Write Back, along with the data path components involved. Additionally, it addresses challenges such as structural, data, and control hazards, and solutions like forwarding and branch prediction to optimize pipelining efficiency.

Uploaded by

p32474429
Copyright: © All Rights Reserved

Module - 4

Pipelining and Single/Multi-cycle Processor

Computer Organization and Architecture

Dr Shashikiran V , Associate Professor, Dept of CSE, Dayananda Sagar University and Team
1
2
Why pipeline? How much speedup in the instruction execution rate?

3
4
Pipelining – Basic concepts

1) Pipelining is an implementation technique whereby multiple instructions are overlapped in execution; it takes advantage of parallelism that exists among the actions needed to execute an instruction.
2) The time required to move an instruction one step down the pipeline is a processor cycle.
3) Because all stages proceed at the same time, the length of a processor cycle is determined by the time required for the slowest pipe stage.
4) Pipelining yields a reduction in the average execution time per instruction.
5) The reduction can be viewed as decreasing the number of clock cycles per instruction (CPI), as decreasing the clock cycle time, or as a combination.
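A minimal sketch of how the slowest stage sets the cycle time and limits the speedup. The stage latencies are hypothetical values chosen for illustration (they are not from the slides), and pipeline-register overhead and hazards are ignored.

```python
# Hypothetical stage latencies in picoseconds (illustrative values only).
stage_latency_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

def unpipelined_time(n_instructions, latencies):
    """Each instruction runs all stages back to back before the next starts."""
    return n_instructions * sum(latencies.values())

def pipelined_time(n_instructions, latencies):
    """The cycle time is set by the slowest stage; once the pipeline is
    full, one instruction completes per cycle."""
    cycle = max(latencies.values())
    n_stages = len(latencies)
    return (n_stages + n_instructions - 1) * cycle

n = 1000
t_seq = unpipelined_time(n, stage_latency_ps)
t_pipe = pipelined_time(n, stage_latency_ps)
speedup = t_seq / t_pipe
print(f"cycle time = {max(stage_latency_ps.values())} ps, speedup = {speedup:.2f}")
```

With these assumed latencies the stages are unbalanced, so the speedup stays just below 4 rather than reaching the stage count of 5.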

5
6
Pipelining – Basic concepts contd.

5 stage – pipelined processor : A simple RISC processor


1) On each clock cycle, another instruction is fetched and begins its five-cycle execution.
2) If an instruction is started every clock cycle, the performance will be up to five times that of a processor that is not pipelined.
3) The names for the stages in the pipeline are the same as those used for the cycles in the unpipelined implementation: IF = instruction fetch, ID = instruction decode, EX = execution, MEM = memory access, and WB = write-back.
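The overlap described above can be sketched as a small timing table. This is an illustrative helper (the function name `pipeline_diagram` is made up here) that assumes an ideal pipeline with no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Return one row per instruction; row i lists the stage occupied in
    each clock cycle (None when the instruction is not in the pipeline)."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [None] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name   # instruction i enters IF in cycle i (0-based)
        rows.append(row)
    return rows

diagram = pipeline_diagram(5)
for i, row in enumerate(diagram):
    print(f"i+{i}: " + " ".join(f"{(c or '.'):>3}" for c in row))
```

Reading a column of the printed table top to bottom shows which instruction occupies which stage in that clock cycle; from the fifth cycle on, all five stages are busy at once.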

7
5 stage – pipelined processor

8
5 stages:
Stage 1: Instruction Fetch (IF):
• In this stage, the processor fetches the instruction from memory using the program counter (PC) and increments the PC to point to the next instruction.
• The fetched instruction is then placed into an instruction register.

9
5 stages:
Stage 2: Instruction Decode (ID):
• In this stage, the fetched instruction is decoded to determine what operation it specifies and what operands are involved.
• Any necessary register values are also read in this stage.

10
5 stages:
Stage 3: Execution (EX):
• This stage involves executing the operation specified by the instruction.
• This may involve arithmetic or logic operations, effective-address calculation for memory accesses, or other operations depending on the instruction.

11
5 stages:
Stage 4: Memory Access (MEM):
• If the instruction involves a memory operation (such as a load or store), this stage is used to access memory to read or write data.
• Otherwise, this stage might be a pass-through stage with no operation.

12
5 stages:
Stage 5: Write Back (WB):
• In this stage, the results of the execution stage are written back to the appropriate registers in the register file.
• This stage completes the processing of the instruction.

13
Example:
Executing a single instruction, "Add R1, R2, R3," in a five-stage pipelined processor.

• IF: Fetches the instruction "Add R1, R2, R3" from memory.

• ID: Decodes the instruction and reads operands from the register file (R2 and R3).

• EX: Executes the addition operation (R2 + R3).

• MEM: No memory access needed in this instruction, so no operation is performed.

• WB: Writes the result of the addition to register R1.
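The five steps of this example can be traced in a few lines. This is only a sketch: the register contents and the tuple encoding of the instruction are assumptions made for illustration.

```python
# Illustrative register contents (assumed values, not from the slides).
regs = {"R1": 0, "R2": 7, "R3": 5}
# Assumed encoding: (opcode, destination, source 1, source 2).
instruction_memory = {0: ("ADD", "R1", "R2", "R3")}

pc = 0
# IF: fetch the instruction addressed by the PC.
instr = instruction_memory[pc]
# ID: decode and read the source operands from the register file.
op, rd, rs, rt = instr
a, b = regs[rs], regs[rt]
# EX: the ALU performs the addition.
alu_output = a + b
# MEM: nothing to do for a register-to-register instruction.
# WB: write the result into the destination register.
regs[rd] = alu_output
print(regs)
```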

14
Register Files
(Required to understand Data Path)

15
Signals for handling read and write operations within the
register file:

Write Enable (we):
1. Determines whether a write operation to the register file should occur.
2. When "we" = 1, it indicates that the CPU wants to write data into the register file.
3. When "we" = 0, no write operation takes place, regardless of the state of other control signals.
Write Select (ws):
1. Specifies which register within the register file should receive the data being written.
2. Selects the destination register for a write operation.
3. For example, if "ws" is set to 0011, it might indicate that the data should be written into the register at index 3 (assuming a 0-based indexing system).
Read Select 1 (rs1):
1. Determines which register's value should be read from the register file as the first operand.
2. Selects the source register from which data should be fetched for an operation.
3. For example, if "rs1" is set to 0101, it might indicate that the data from the register at index 5 should be read as the first operand.
16
Signals for handling read and write operations within the
register file:

Read Select 2 (rs2):
1. Similar to "rs1," determines which register's value should be read from the register file as the second operand.
2. Selects the source register from which data should be fetched for an operation.
3. For example, if "rs2" is set to 1001, it might indicate that the data from the register at index 9 should be read as the second operand.
Read Data 1 (rd1):
1. Represents the output of the register file corresponding to the first read select signal (rs1).
2. Contains the value of the register specified by "rs1," which serves as the first operand for subsequent operations.
Read Data 2 (rd2):
1. Represents the output of the register file corresponding to the second read select signal (rs2).
2. Contains the value of the register specified by "rs2," which serves as the second operand for subsequent operations.
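The six signals can be modelled with a small class. This is a behavioural sketch, not the hardware from the slides: the class name, the `wd` (write data) parameter, and the size of 16 registers (so that 4-bit select values fit) are assumptions.

```python
class RegisterFile:
    """Sketch of a register file with one write port and two read ports."""

    def __init__(self, n_regs=16):
        self.regs = [0] * n_regs

    def read(self, rs1, rs2):
        """Return (rd1, rd2): the values selected by rs1 and rs2."""
        return self.regs[rs1], self.regs[rs2]

    def write(self, we, ws, wd):
        """Write wd into register ws only when write enable (we) is 1."""
        if we == 1:
            self.regs[ws] = wd

rf = RegisterFile()
rf.write(we=1, ws=0b0011, wd=42)   # we = 1: register 3 receives 42
rf.write(we=0, ws=0b0101, wd=99)   # we = 0: ignored, register 5 unchanged
rd1, rd2 = rf.read(rs1=0b0011, rs2=0b0101)
print(rd1, rd2)
```

The second write has no effect because "we" is 0, so reading register 5 still returns its initial value.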
17
Data Path
The data path consists of the hardware elements within the CPU that are
responsible for performing arithmetic, logic, and data transfer operations.
Here are the key components of the data path:
1. Registers: Registers are small, fast storage locations within the CPU used to hold data temporarily during instruction execution. This includes general-purpose registers (GPRs), special-purpose registers (e.g., program counter, stack pointer), and other registers used for specific purposes (e.g., instruction register, memory address register).
2. ALU (Arithmetic Logic Unit): The ALU performs arithmetic and logic operations on data fetched from registers or memory. It can handle operations such as addition, subtraction, AND, OR, and more.
3. Memory Units: These units facilitate the transfer of data between the CPU and memory. This includes the instruction memory (where program instructions are stored) and the data memory (where program data is stored).
4. Multiplexers and Buffers: These components are used to route and select data and control signals between different parts of the data path.

18
Datapath for MIPS processor – Single cycle

19
How a five-stage pipelined datapath executes a register-to-register
ALU instruction (such as ADD, SUB, AND, OR) in a CPU architecture:
1. Fetch (IF):
   1. The Fetch stage retrieves the instruction from memory based on the current value of the program counter (PC).
   2. The fetched instruction is then stored in the instruction register (IR).
2. Decode (ID):
   1. In the Decode stage, the instruction in the IR is decoded.
   2. The control unit interprets the instruction and generates control signals for subsequent stages based on the instruction's operation code (opcode).
   3. The source and destination registers (Rs, Rt, and Rd) are identified during this stage, and the source register values are read.
3. Execute (EX):
   1. The Execute stage performs the actual ALU operation specified by the instruction.
   2. It takes the values from the source registers (Rs and Rt), performs the ALU operation, and produces the result.
   3. For example, if the instruction is "ADD R1, R2, R3," the ALU would add the values stored in registers R2 and R3.
4. Memory Access (MEM):
   1. For a register-to-register ALU instruction, there is no memory access involved. Therefore, the instruction simply passes through this stage without performing any operation.
5. Write Back (WB):
   1. In the Write Back stage, the result of the ALU operation is written back to the destination register (Rd).
   2. If the instruction is "ADD R1, R2, R3," the result of the addition operation would be written back to register R1.

20
Datapath for MIPS processor – Single cycle

"EXT sel" in register-immediate ALU instructions controls how immediate values are extended (sign extension) to match the width of the register file before being used in ALU operations.

It ensures consistency in operand sizes and proper execution of instructions.
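A minimal sketch of what the extension step computes, assuming a 16-bit immediate field and 32-bit registers. The function names are hypothetical, and the zero-extension variant is an assumption about what the other setting of the select signal could choose.

```python
def sign_extend(value, from_bits=16, to_bits=32):
    """Sign-extend a from_bits-wide field to to_bits, returned as an
    unsigned bit pattern of width to_bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):       # sign bit set: negative number
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

def zero_extend(value, from_bits=16):
    """Zero-extend: the upper bits are simply filled with zeros."""
    return value & ((1 << from_bits) - 1)

print(hex(sign_extend(0x000A)))   # positive immediate: upper bits are 0
print(hex(sign_extend(0xFFF6)))   # negative immediate (-10): upper bits are 1
print(hex(zero_extend(0xFFF6)))
```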

21
How a five-stage pipelined datapath executes a register-immediate
ALU instruction (such as ADDI, SUBI, ANDI, ORI) in a CPU architecture:
1. Fetch (IF):
   1. The Fetch stage retrieves the instruction from memory based on the current value of the program counter (PC).
   2. The fetched instruction is then stored in the instruction register (IR).
2. Decode (ID):
   1. In the Decode stage, the instruction in the IR is decoded.
   2. The control unit interprets the instruction and generates control signals for subsequent stages based on the instruction's operation code (opcode).
   3. The source register (Rs), destination register (Rt), and immediate value (imm) are identified during this stage.
3. Execute (EX):
   1. The Execute stage performs the actual ALU operation specified by the instruction, using the immediate value as the second operand.
   2. It takes the value from the source register (Rs) and the immediate value (imm), performs the ALU operation, and produces the result.
   3. For example, if the instruction is "ADDI R1, R2, 10," the ALU would add the value stored in register R2 to the immediate value 10.
4. Memory Access (MEM):
   1. For a register-immediate ALU instruction, there is no memory access involved. Therefore, the instruction simply passes through this stage without performing any operation.
5. Write Back (WB):
   1. In the Write Back stage, the result of the ALU operation is written back to the destination register (Rt).
   2. If the instruction is "ADDI R1, R2, 10," the result of the addition operation would be written back to register R1.

22
Datapath for MIPS processor – Single cycle

23
Datapath for MIPS processor – Single cycle
Harvard style: separate read-only program memory & read/write data memory

24
The datapath for load and store instructions in a five-stage pipelined CPU architecture:
1. Fetch (IF):
   1. Fetches the instruction from memory using the program counter (PC).
   2. The fetched instruction is stored in the instruction register (IR).
2. Decode (ID):
   1. Decodes the instruction in the IR.
   2. Identifies the operation (load or store), the base register and offset that form the memory address, and the data register involved.
   3. Generates control signals based on the instruction's opcode.
3. Execute (EX):
   1. In the context of load and store instructions, this stage is used for calculating the effective address.
   2. If the instruction is a load, the effective address is calculated by adding the offset specified in the instruction to the base address held in a register.
   3. If the instruction is a store, the effective address is calculated in the same way.
4. Memory Access (MEM):
   1. Performs the memory access operation.
   2. For load instructions, the data is read from memory at the calculated effective address.
   3. For store instructions, the data from the source register is written to memory at the calculated effective address.
5. Write Back (WB):
   1. For load instructions, the data read from memory is written back to the destination register.
   2. For store instructions, there is typically no operation in the Write Back stage, as the operation is already completed during the Memory Access stage.
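The EX, MEM, and WB steps for loads and stores can be sketched as follows. The register and memory contents, the mnemonics "LW" and "SW", and the function name are assumptions made for illustration.

```python
regs = {"R1": 0, "R2": 100, "R3": 55}     # assumed register contents
data_memory = {108: 1234}                  # assumed data memory contents

def execute_load_store(op, rt, base, offset):
    """Run the EX, MEM and WB steps of one load or store."""
    # EX: effective address = base register + offset.
    effective_address = regs[base] + offset
    if op == "LW":
        # MEM: read memory; WB: write the loaded value to the register.
        loaded = data_memory[effective_address]
        regs[rt] = loaded
    elif op == "SW":
        # MEM: write the register value to memory; WB: nothing to do.
        data_memory[effective_address] = regs[rt]
    return effective_address

execute_load_store("LW", "R1", "R2", 8)    # R1 <- Mem[R2 + 8]
execute_load_store("SW", "R3", "R2", 12)   # Mem[R2 + 12] <- R3
print(regs["R1"], data_memory[112])
```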

25
Pipelined Datapath for MIPS processor – Multiple cycle

Every pipe stage becomes a cycle in the pipeline

26
Pipelining in Multi-cycle MIPS processor
The pipeline can be thought of as a series of data paths shifted in time

27
Major hurdles in Pipelining
Structural hazards arise from resource conflicts when the hardware
cannot support all possible combinations of instructions simultaneously
in overlapped execution.
Data hazards arise when an instruction depends on the results of a
previous instruction in a way that is exposed by the overlapping of
instructions in the pipeline.
Control hazards arise from the pipelining of branches and other
instructions that change the PC.

28
Pipelining – Structural Hazards

1) Overlapped execution of instructions requires pipelining of functional units and duplication of resources to allow all possible combinations of instructions in the pipeline.
2) If some combination of instructions cannot be accommodated because of resource conflicts, the processor is said to have a structural hazard.
For example:
A processor may have only one register-file write port, but under certain circumstances the pipeline might want to perform two writes in a clock cycle. This will generate a structural hazard.
A structural hazard is illustrated in the figure on the next slide.
29
Pipelining – Structural Hazards Illustrated using figure

1) A processor with only one memory port will generate a conflict whenever a memory reference occurs.
2) In this example the load instruction uses the memory for a data access at the same time instruction 3 wants to fetch an instruction from memory.
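The conflict can be located mechanically. The sketch below assumes an ideal five-stage pipeline with a single shared memory port and one instruction fetched per cycle; the function name and the opcode strings are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_port_conflicts(program):
    """program[i] is the opcode of the instruction fetched in cycle i + 1.
    Returns (cycle, data_instr_index, fetch_instr_index) for every cycle
    in which a load/store is in MEM while another instruction is in IF."""
    conflicts = []
    for i, op in enumerate(program):
        if op in ("LOAD", "STORE"):
            mem_cycle = i + 1 + STAGES.index("MEM")   # cycle of its MEM stage
            fetch_index = mem_cycle - 1               # instruction in IF then
            if fetch_index < len(program):
                conflicts.append((mem_cycle, i, fetch_index))
    return conflicts

program = ["LOAD", "ADD", "SUB", "AND"]   # the load, then instructions 1 to 3
conflicts = memory_port_conflicts(program)
print(conflicts)
```

For this program the only conflict is in clock cycle 4, where the load (index 0) accesses data memory while instruction 3 is being fetched.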

30
Pipelining – Data Hazards
• Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on an unpipelined processor.
Consider the example:
1) DADD R1,R2,R3
2) DSUB R4,R1,R5
3) AND R6,R1,R7
4) OR R8,R1,R9
5) XOR R10,R1,R11
• All the instructions after the DADD use the result of the DADD instruction.
• The DADD instruction writes the value of R1 in the WB pipe stage, but the DSUB instruction reads the value during its ID stage. This problem is called a data hazard.
• Forwarding or interlock control logic solves the problem.
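A small sketch of how interlock logic might detect these read-after-write dependences. It assumes the register file is written in the first half of a cycle and read in the second half, so only the two instructions immediately following the producer are exposed (`window=2`); without that assumption the window would be 3. The function name is hypothetical.

```python
# (opcode, destination, source1, source2), as in the example above.
program = [
    ("DADD", "R1", "R2", "R3"),
    ("DSUB", "R4", "R1", "R5"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]

def raw_hazards(program, window=2):
    """List (producer, consumer, register) triples where the consumer reads a
    register written by an instruction at most `window` slots earlier."""
    hazards = []
    for j, (_, _, *sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            if dest in sources:
                hazards.append((program[i][0], program[j][0], dest))
    return hazards

hazards = raw_hazards(program)
print(hazards)
```

Under the stated assumption only DSUB and AND would read a stale R1; OR and XOR read the register file after DADD has written it.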

31
Pipelining – Example for Data Hazards

32
Pipelining with forwarding resolves the data hazard.
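The idea behind forwarding can be sketched as an operand selection: if a newer value of the source register is still sitting in a pipeline register, use it instead of the stale register-file value. The function name and the `(dest_reg, value)` representation of the EX/MEM and MEM/WB pipeline registers are assumptions made for illustration.

```python
def forward_operand(src_reg, reg_file_value, ex_mem, mem_wb):
    """Pick the most recent value of src_reg. ex_mem and mem_wb are
    (dest_reg, value) pairs held in the pipeline registers, or None."""
    if ex_mem is not None and ex_mem[0] == src_reg:
        return ex_mem[1]          # result of the immediately preceding instruction
    if mem_wb is not None and mem_wb[0] == src_reg:
        return mem_wb[1]          # result from two instructions back
    return reg_file_value         # no hazard: use the register-file value

# An instruction writing R1 has just left EX with result 12, while the
# register file still holds the stale value 0 for R1.
operand = forward_operand("R1", reg_file_value=0, ex_mem=("R1", 12), mem_wb=None)
print(operand)
```

The EX/MEM entry is checked first so that, when both pipeline registers hold a value for the same register, the newer one wins.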

33
Pipelining : - Control Hazard

The instruction after the branch is fetched, but the instruction is ignored, and the fetch is restarted once the branch target is known. It is probably obvious that if the branch is not taken, the second IF for the branch successor is redundant. This is shown below.

34
Pipelining : - Branch Prediction schemes
Branch prediction : predicted-not-taken scheme
• In the simple five-stage pipeline, this predicted-not-taken or predicted-untaken scheme
is implemented by continuing to fetch instructions as if the branch were a normal
instruction. The pipeline looks as if nothing out of the ordinary is happening. If the
branch is taken, however, we need to turn the fetched instruction into a no-op and
restart the fetch at the target address. Figure below illustrates the same.
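A rough cycle count for the predicted-not-taken scheme, assuming a one-cycle penalty for each taken branch (the single fetched instruction that is turned into a no-op) and no other stalls. The function name and the branch trace are hypothetical.

```python
def cycles_with_predict_not_taken(branch_outcomes, other_instructions,
                                  taken_penalty=1, n_stages=5):
    """branch_outcomes: list of booleans (True = taken).
    A not-taken branch costs nothing extra; a taken branch turns the
    already fetched instruction into a no-op and costs taken_penalty cycles."""
    n_instructions = other_instructions + len(branch_outcomes)
    base = n_stages + n_instructions - 1      # ideal pipeline, no stalls
    stalls = taken_penalty * sum(branch_outcomes)
    return base + stalls

outcomes = [True, False, False, True]         # hypothetical branch trace
total = cycles_with_predict_not_taken(outcomes, other_instructions=16)
print(total)
```

Here 20 instructions need 24 cycles in an ideal pipeline, and the two taken branches add one cycle each.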

35
How Pipelining is implemented : Instruction Fetch

• Instruction fetch cycle (IF):
IR ← Mem[PC];
NPC ← PC + 4;
• Operation: Send out the PC and fetch the instruction from memory into the instruction register (IR); increment the PC by 4 to address the next sequential instruction.
• The IR is used to hold the instruction that will be needed on subsequent clock cycles; likewise, the register NPC is used to hold the next sequential PC.
36
How Pipelining is implemented : Instruction decode

• Instruction decode/register fetch cycle (ID):
A ← Regs[rs];
B ← Regs[rt];
Imm ← sign-extended immediate field of IR;
• Operation: Decode the instruction and access the register file to read the registers (rs and rt are the register specifiers). The outputs of the general-purpose registers are read into two temporary registers (A and B) for use in later clock cycles. The lower 16 bits of the IR are also sign extended and stored into the temporary register Imm, for use in the next cycle.
37
How Pipelining is implemented : Instruction execution

Execution/effective address cycle (EX):
The ALU operates on the operands prepared in the prior cycle, performing one of four functions depending on the MIPS instruction type:
Memory reference:
ALUOutput ← A + Imm;
Operation: The ALU adds the operands to form the effective address and places the result into the register ALUOutput.
Register-register ALU instruction:
ALUOutput ← A func B;
Operation: The ALU performs the operation specified by the function code on the value in register A and on the value in register B. The result is placed in the temporary register ALUOutput.

38
How Pipelining is implemented : Instruction execution

Register-immediate ALU instruction:
ALUOutput ← A op Imm;
Operation: The ALU performs the operation specified by the opcode on the value in register A and on the value in register Imm. The result is placed in the temporary register ALUOutput.
Branch:
ALUOutput ← NPC + (Imm << 2);
Cond ← (A == 0);
Operation: The ALU adds the NPC to the sign-extended immediate value in Imm, which is shifted left by 2 bits to create a word offset, to compute the address of the branch target. Register A, which has been read in the prior cycle, is checked to determine whether the branch is taken. Since we are considering only one form of branch (BEQZ), the comparison is against 0. Note that BEQZ is actually a pseudo-instruction that translates to a BEQ with R0 as an operand. For simplicity, this is the only form of branch we consider.
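The branch computation can be checked with concrete numbers. The sketch below assumes the immediate has already been sign-extended and uses made-up addresses; the function name is hypothetical.

```python
def branch_execute(npc, imm, a):
    """EX cycle for BEQZ: compute the target and evaluate the condition."""
    alu_output = npc + (imm << 2)   # Imm counts words; << 2 converts to bytes
    cond = (a == 0)
    return alu_output, cond

# Branch at PC = 100 (so NPC = 104), offset of +3 instructions, A = 0.
target, taken = branch_execute(npc=104, imm=3, a=0)
print(target, taken)

# A negative offset branches backwards; A != 0 means the branch is not taken.
back_target, back_taken = branch_execute(npc=104, imm=-2, a=5)
print(back_target, back_taken)
```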

39
How Pipelining is implemented : Memory access

• Memory access/branch completion cycle (MEM):
The PC is updated for all instructions: PC ← NPC;
• Memory reference:
LMD ← Mem[ALUOutput] or
Mem[ALUOutput] ← B;
Operation: Access memory if needed. If the instruction is a load, data return from memory and are placed in the LMD (load memory data) register; if it is a store, then the data from the B register are written into memory. In either case, the address used is the one computed during the prior cycle and stored in the register ALUOutput.
• Branch:
if (Cond) PC ← ALUOutput;
Operation: If the instruction branches, the PC is replaced with the branch destination address in the register ALUOutput.

40
How Pipelining is implemented : write back

Write-back cycle (WB):
■ Register-register ALU instruction:
Regs[rd] ← ALUOutput;
■ Register-immediate ALU instruction:
Regs[rt] ← ALUOutput;
■ Load instruction:
Regs[rt] ← LMD;
Operation: Write the result into the register file, whether it comes from the memory system (which is in LMD) or from the ALU (which is in ALUOutput); the register destination field is also in one of two positions (rd or rt) depending on the effective opcode.
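The five cycles can be put together in one unpipelined sketch that follows the register-transfer statements step by step. The dictionary encoding of instructions, the `kind` labels, and the memory and register sizes are assumptions made for illustration; the immediate is taken as already sign-extended.

```python
import operator

def run_instruction(state, instr):
    """Apply the five cycles to one instruction (unpipelined, for clarity).
    state holds PC, Regs (list) and Mem (dict); instr is a decoded dict."""
    regs, mem = state["Regs"], state["Mem"]
    # IF: IR <- Mem[PC] (the instruction is supplied directly); NPC <- PC + 4
    ir = instr
    npc = state["PC"] + 4
    # ID: A <- Regs[rs]; B <- Regs[rt]; Imm <- immediate field
    a = regs[ir.get("rs", 0)]
    b = regs[ir.get("rt", 0)]
    imm = ir.get("imm", 0)
    # EX: one of four functions depending on the instruction type
    kind = ir["kind"]
    cond = False
    if kind in ("load", "store"):
        alu_output = a + imm
    elif kind == "alu_rr":
        alu_output = ir["func"](a, b)
    elif kind == "alu_imm":
        alu_output = ir["func"](a, imm)
    elif kind == "beqz":
        alu_output = npc + (imm << 2)
        cond = (a == 0)
    # MEM: PC <- NPC for all instructions, then memory access or branch
    state["PC"] = npc
    lmd = None
    if kind == "load":
        lmd = mem[alu_output]
    elif kind == "store":
        mem[alu_output] = b
    elif kind == "beqz" and cond:
        state["PC"] = alu_output
    # WB: destination is rd (register-register) or rt (immediate, load)
    if kind == "alu_rr":
        regs[ir["rd"]] = alu_output
    elif kind == "alu_imm":
        regs[ir["rt"]] = alu_output
    elif kind == "load":
        regs[ir["rt"]] = lmd

state = {"PC": 0, "Regs": [0] * 8, "Mem": {16: 5}}
program = [
    {"kind": "alu_imm", "func": operator.add, "rs": 0, "rt": 1, "imm": 16},  # R1 = R0 + 16
    {"kind": "load", "rs": 1, "rt": 2, "imm": 0},                            # R2 = Mem[R1]
    {"kind": "alu_rr", "func": operator.add, "rs": 2, "rt": 2, "rd": 3},     # R3 = R2 + R2
    {"kind": "store", "rs": 1, "rt": 3, "imm": 4},                           # Mem[R1 + 4] = R3
    {"kind": "beqz", "rs": 0, "imm": 2},                                     # R0 == 0: taken
]
for instr in program:
    run_instruction(state, instr)
print(state["Regs"][1:4], state["Mem"][20], state["PC"])
```

The sketch executes one instruction at a time; in the pipelined datapath the same five steps run for five different instructions in the same clock cycle, with the temporaries (IR, NPC, A, B, Imm, ALUOutput, LMD) held in pipeline registers between stages.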

41
Data path for 5-stage pipelined processor

• Combination of the diagrams in slides 17, 18, 19, 20, and 21

42
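What Makes Pipelining Hard to Implement?
Exceptional situations, in which the normal order of instruction execution is changed, are harder to handle in a pipelined processor. Examples of such situations: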
1) I/O device request
2) Invoking an operating system service from a user program
3) Tracing instruction execution
4) Breakpoint (programmer-requested interrupt)
5) Integer arithmetic overflow
6) FP arithmetic anomaly
7) Page fault (not in main memory)
8) Misaligned memory accesses (if alignment is required)
9) Memory protection violation
10) Using an undefined or unimplemented instruction
11) Hardware malfunctions
12) Power failure
43
MIPS Pipeline to Handle Multicycle Operations

The MIPS pipeline with three additional unpipelined floating-point functional units.
Because only one instruction issues on every clock cycle, all instructions go through the standard pipeline for integer operations. The FP operations simply loop for multiple cycles when they reach the EX stage. After they have finished the EX stage, they proceed to MEM and WB to complete execution.
44
MIPS R4000 Pipeline

The function of each stage is as follows:
IF: First half of instruction fetch; PC selection actually happens here, together with initiation of instruction cache access.
IS: Second half of instruction fetch, complete instruction cache access.
RF: Instruction decode and register fetch, hazard checking, and instruction cache hit detection.
EX: Execution, which includes effective address calculation, ALU operation, and branch-target computation and condition evaluation.
DF: Data fetch, first half of data cache access.
DS: Second half of data fetch, completion of data cache access.
TC: Tag check, to determine whether the data cache access hit.
WB: Write-back for loads and register-register operations.
45
Crosscutting issues

Static scheduling
The compiler can attempt to schedule instructions to avoid hazards (structural, data, and control); this approach is called compiler or static scheduling.
Dynamic scheduling
Several early processors used another approach, called dynamic scheduling, whereby the hardware rearranges the instruction execution to reduce the stalls.
Dynamic scheduling with a scoreboard was adopted in the CDC 6600.

46
