EC483_Fall2024_W7
Chapter 3: Instruction-Level Parallelism and Its Exploitation
[Figure: jobs vs. time; each job passes through Fetch, Decode, Execute sequentially]
Pipelined Architecture
Break the job into smaller stages
[Figure: jobs vs. time; the F, D, X stages of successive jobs overlap in the pipeline]
5-Stage Pipeline
[Figure: pipeline stages separated by latches (L), all driven by the clock (Clk)]
A 5-Stage Pipeline
[Figure: 5-stage pipeline datapath; the PC is incremented to PC+4 each cycle]
A 5-Stage Pipeline
Read registers, compare registers, and compute the branch target; for now, assume
branches take 2 cycles (there is enough work that branches can easily take more)
Introduction
Instruction-Level Parallelism
Instruction Dependences
Name Dependences
Register Renaming
Control Dependence
Structural Hazards
Enabling and Optimizing ILP
Compiler Techniques for Exposing ILP
• Pipeline Scheduling
– Separate a dependent instruction from the source instruction by
the pipeline latency of the source instruction
• Example
➢ C code:
for (i = 999; i >= 0; i = i - 1)
    x[i] = x[i] + s;
Compiler Techniques for Exposing ILP
• Loop unrolling
– Replicate the loop body multiple times and adjust the loop
termination code
– Unroll by a factor of 4 (assume the number of elements is divisible by 4)
– Eliminate unnecessary instructions
Loop: fld    f0,0(x1)
      fadd.d f4,f0,f2
      fsd    f4,0(x1)    // drop addi & bne
      fld    f6,-8(x1)
      fadd.d f8,f6,f2
      fsd    f8,-8(x1)   // drop addi & bne
      fld    f10,-16(x1)
      fadd.d f12,f10,f2
      fsd    f12,-16(x1) // drop addi & bne
      fld    f14,-24(x1)
      fadd.d f16,f14,f2
      fsd    f16,-24(x1)
      addi   x1,x1,-32
      bne    x1,x2,Loop
• Eliminates three branches and three decrements of x1
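The unrolled loop above corresponds to C along these lines (a sketch; the function name is illustrative, and the element count is assumed divisible by 4 as stated on the slide):

```c
/* Original loop: for (i = 999; i >= 0; i = i - 1) x[i] = x[i] + s;     */
/* Unrolled by 4: one decrement of i and one branch test cover four     */
/* elements, mirroring the single addi/bne per four fld/fadd.d/fsd.     */
void add_scalar_unrolled(double *x, int n, double s) {
    for (int i = n - 1; i >= 0; i -= 4) {   /* n assumed divisible by 4 */
        x[i]     += s;   /* offset 0 in the assembly   */
        x[i - 1] += s;   /* offset -8                  */
        x[i - 2] += s;   /* offset -16                 */
        x[i - 3] += s;   /* offset -24                 */
    }
}
```

As on the slide, the loop-maintenance overhead is now amortized over four elements per iteration.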
Compiler Techniques for Exposing ILP
❖ Determine that unrolling the loop would be useful by finding that the
loop iterations were independent, except for the loop maintenance
code
❖ Use different registers for different computations to avoid name
dependence.
❖ Eliminate the extra test and branch instructions and adjust the loop
termination and iteration code.
❖ Determine that the loads and stores in the unrolled loop can be
interchanged by observing that they are independent: they do not refer
to the same address.
❖ Schedule the code, preserving any dependences needed to yield
the same result as the original code.
Compiler Techniques Limitations
❖ Loop overhead
❖ The amount of overhead that can be reduced decreases with each
additional unroll
❖ Code size limitations
❖ Increase in code size → possible increase in cache miss rate
❖ Compiler limitations
❖ Potential shortfall in registers → register pressure
Branch Prediction
[Figure: 1-bit predictor FSM; states 0 (predict not taken) and 1 (predict taken), with taken (T) / not-taken (N) transitions]
Basic 1-bit predictor
Resources
▪ Memory Timing
▪ https://www.hardwaresecrets.com/understanding-ram-timings/
▪ Memory Architecture
▪ https://en.wikipedia.org/wiki/Multi-channel_memory_architecture
▪ CS6810 Computer Architecture 87 Lectures by Rajeev
Balasubramonian
▪ https://www.youtube.com/playlist?list=PL8EC1756A7B1764F6
Basic 1-bit predictor
[Figure: 10 bits of the branch PC index a table of 1K entries; each entry is a single bit]
The table keeps track of what the branch did last time
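The single-bit entry described above can be sketched as a small simulation (illustrative C, not from the slides): the bit simply records the last outcome, so a loop branch that is taken several times and then falls through mispredicts twice per loop visit.

```c
/* One 1-bit predictor entry: predict whatever the branch did last time. */
typedef struct {
    int last;   /* 1 = taken, 0 = not taken */
} OneBit;

/* Returns 1 if the prediction matched the actual outcome, then updates the bit. */
int onebit_predict_update(OneBit *p, int actual) {
    int correct = (p->last == actual);
    p->last = actual;   /* remember what the branch did this time */
    return correct;
}
```

For the pattern T,T,T,N repeated (a short loop), the entry mispredicts both the final not-taken exit and the first taken branch of the next visit: 2 mispredictions per 4 branches.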
Basic 2-bit Branch Prediction
[Figure: 2-bit saturating-counter FSM; states 00, 01, 10, 11 with taken (T) / not-taken (N) transitions]
▪ Check the following case, assuming we start from the 11 state:
TNTNTNTNTNTNTNTNTNTNTNTNTNTNTN…
▪ We get 50% correct prediction!
Basic 2-bit Branch Prediction
• For each branch, maintain a 2-bit saturating counter:
if the branch is taken: counter = min(3, counter+1)
if the branch is not taken: counter = max(0, counter-1)
• If (counter >= 2), predict taken, else predict not taken
• Advantage: a few atypical branches will not influence the
prediction (a better measure of “the common case”)
• Especially useful when multiple branches share the same
counter (some bits of the branch PC are used to index
into the branch predictor)
• Can be easily extended to N-bits (in most processors,
N=2)
• Prediction performance depends on both the prediction
accuracy and the branch frequency
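The counter rules above can be checked with a short simulation (illustrative C; the function name is an assumption):

```c
/* 2-bit saturating counter: value 0..3; predict taken when counter >= 2. */
typedef struct {
    int counter;
} TwoBit;

/* Returns 1 if the prediction was correct, then applies the update rules. */
int twobit_predict_update(TwoBit *p, int taken) {
    int predict_taken = (p->counter >= 2);
    int correct = (predict_taken == taken);
    if (taken)
        p->counter = p->counter < 3 ? p->counter + 1 : 3;  /* min(3, c+1) */
    else
        p->counter = p->counter > 0 ? p->counter - 1 : 0;  /* max(0, c-1) */
    return correct;
}
```

Starting from the 11 state (counter = 3) on the alternating pattern T,N,T,N,… the counter oscillates between 3 and 2, always predicts taken, and is right exactly half the time, matching the 50% figure on the earlier slide.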
Basic 2-bit Branch Prediction
[Figure: 10 bits of the branch PC index a table of 1K entries; each entry is a 2-bit saturating counter]
The table keeps track of the common-case outcome for the branch
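Indexing the 1K-entry table with 10 bits of the branch PC can be sketched as follows (illustrative C; dropping the low two bits assumes 4-byte instruction alignment, which the slide does not state):

```c
#include <stdint.h>

/* Map a branch PC to an index into a 1K-entry (2^10) predictor table.  */
/* The low 2 bits are dropped assuming 4-byte instruction alignment.    */
unsigned predictor_index(uint32_t pc) {
    return (pc >> 2) & 0x3FF;   /* keep 10 bits -> index in 0..1023 */
}
```

Branches whose PCs share these 10 bits alias to the same counter; this is the "multiple branches share the same counter" situation where a 2-bit counter's tolerance of occasional disagreement helps.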