Compiler Techniques for Exposing ILP

Keep the pipeline full: we need sequences of unrelated instructions.

Related instructions must be separated by a number of cycles that depends on the pipeline depth.

This section concentrates on using loop unrolling.


Consider the following code, where x and s are floating point and i is an int:

for (i = 999; i >= 0; i--)
    x[i] = x[i] + s;

The following is a MIPS implementation, assuming that s is in F2, R1 holds the address of the last element of the array (the highest address), and 8(R2) is the address of the first element (the last one the loop processes).
loop:   L.D     F0, 0(R1)
        ADD.D   F4, F0, F2
        S.D     F4, 0(R1)
        DADDUI  R1, R1, #-8
        BNE     R1, R2, loop

What happens when we execute this on a simple pipeline?


We have not discussed how floating point operations work, but we will assume the
following latencies:
Instruction producing result    Instruction using result    Latency in cycles
FP ALU op                       FP ALU op                   3
FP ALU op                       Store double                2
Load double                     FP ALU op                   1
Load double                     Store double                0

Here is the timing of these instructions.


                                        clock cycle issued
loop:   L.D     F0, 0(R1)               1
        stall                           2
        ADD.D   F4, F0, F2              3
        stall                           4
        stall                           5
        S.D     F4, 0(R1)               6
        DADDUI  R1, R1, #-8             7
        stall                           8
        BNE     R1, R2, loop            9

We assume the latencies from the table above.


We assume a latency of 1 cycle from an integer ALU operation to a dependent branch, since the branch address is calculated in ID, which occurs in the same cycle as the EX of the previous instruction.

We ignore other delays due to branches.
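As a sanity check on this timing, here is a minimal C sketch (not part of the original notes; the instruction names and stall counts are simply taken from the latency table and timing above) that reproduces the issue cycles:

#include <stdio.h>

/* Minimal sketch: reproduce the issue cycles of the original loop body.
   stalls[i] is the number of stall cycles inserted before instruction i,
   from the latency table: L.D -> ADD.D needs 1, ADD.D -> S.D needs 2,
   and DADDUI -> BNE needs 1. */
int main(void)
{
    const char *insn[]   = { "L.D", "ADD.D", "S.D", "DADDUI", "BNE" };
    const int   stalls[] = {   0,      1,      2,       0,      1  };

    int cycle = 0;
    for (int i = 0; i < 5; i++) {
        cycle += stalls[i] + 1;           /* stall cycles, then one issue cycle */
        printf("%-7s issues in cycle %d\n", insn[i], cycle);
    }
    /* Prints 1, 3, 6, 7, 9 -- the same issue cycles as the table above. */
    return 0;
}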
We can remove half of the stalls by moving the DADDUI up after the L.D; the S.D offset then becomes 8(R1) because R1 has already been decremented.
                                        clock cycle issued
loop:   L.D     F0, 0(R1)               1
        DADDUI  R1, R1, #-8             2
        ADD.D   F4, F0, F2              3
        stall                           4
        stall                           5
        S.D     F4, 8(R1)               6
        BNE     R1, R2, loop            7

The body of the loop takes 7 cycles.


Now we unroll the loop 4 times, assuming the number of iterations is divisible by 4:
                                        clock cycle issued
loop:   L.D     F0, 0(R1)               1
        ADD.D   F4, F0, F2              3
        S.D     F4, 0(R1)               6
        L.D     F6, -8(R1)              7
        ADD.D   F8, F6, F2              9
        S.D     F8, -8(R1)              12
        L.D     F10, -16(R1)            13
        ADD.D   F12, F10, F2            15
        S.D     F12, -16(R1)            18
        L.D     F14, -24(R1)            19
        ADD.D   F16, F14, F2            21
        S.D     F16, -24(R1)            24
        DADDUI  R1, R1, #-32            25
        BNE     R1, R2, loop            27

Each L.D has 1 stall, each ADD.D has 2, and the DADDUI has 1, for a total of 13 stall cycles and 27 clock cycles for the loop.
Without unrolling, the original would take 36 cycles for 4 iterations and the
rescheduled code would take 28 cycles.
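At the source level, this unrolling corresponds to something like the following C sketch (the array size of 1000 and the divisibility-by-4 assumption come from the example above; the surrounding function is just for illustration):

/* Source-level sketch of the 4-way unrolled loop.
   Assumes the trip count (1000 here) is divisible by 4,
   as the MIPS code above does. */
void add_scalar_unrolled4(double x[1000], double s)
{
    /* Four copies of the body per trip; the compiler gives each copy
       its own registers (F4/F8/F12/F16 in the MIPS code above). */
    for (int i = 999; i >= 0; i -= 4) {
        x[i]     = x[i]     + s;
        x[i - 1] = x[i - 1] + s;
        x[i - 2] = x[i - 2] + s;
        x[i - 3] = x[i - 3] + s;
    }
}

The loop overhead (decrement and branch) is now paid once per four elements; the remaining cost is the stalls inside the unrolled body.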
We can do better by changing the order of the instructions:
                                        clock cycle issued
loop:   L.D     F0, 0(R1)               1
        L.D     F6, -8(R1)              2
        L.D     F10, -16(R1)            3
        L.D     F14, -24(R1)            4
        ADD.D   F4, F0, F2              5
        ADD.D   F8, F6, F2              6
        ADD.D   F12, F10, F2            7
        ADD.D   F16, F14, F2            8
        S.D     F4, 0(R1)               9
        S.D     F8, -8(R1)              10
        DADDUI  R1, R1, #-32            11
        S.D     F12, 16(R1)             12
        S.D     F16, 8(R1)              13
        BNE     R1, R2, loop            14

There are now no stalls at all: each L.D is at least one instruction ahead of its dependent ADD.D, each ADD.D is at least two ahead of its S.D, and the DADDUI is more than one ahead of the BNE. Note that the last two stores use offsets 16(R1) and 8(R1) because the DADDUI has already subtracted 32 from R1.


Summary of the 4 examples:

Description                 Cycles per iteration
ideal                       5
original                    9
scheduled                   7
unrolled                    6.75
unrolled and scheduled      3.5

The unrolled figures are per original iteration: 27/4 = 6.75 and 14/4 = 3.5. The ideal is the 5 instructions of the original body with no stalls.

Limitations of loop unrolling:

Diminishing savings as we unroll more:
    With an unroll factor of 4, 2 cycles out of 14 (14.3%) are loop overhead.
    With an unroll factor of 8, 2 cycles out of 26 (7.7%) are loop overhead.
    With an unroll factor of 16, 2 cycles out of 50 (4%) are loop overhead.

Increase in code size.

Limited number of registers.
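The examples above also assume the iteration count is a multiple of the unroll factor. When it is not, the usual approach is to pair the unrolled loop with a scalar remainder loop; a minimal C sketch under that assumption (the function and names are illustrative, not from the notes):

/* Sketch: handle a trip count n that need not be divisible by the
   unroll factor by peeling the leftovers into a remainder loop. */
void add_scalar(double *x, int n, double s)
{
    int i = n - 1;

    /* Unrolled-by-4 main loop: runs while at least 4 elements remain. */
    for (; i >= 3; i -= 4) {
        x[i]     = x[i]     + s;
        x[i - 1] = x[i - 1] + s;
        x[i - 2] = x[i - 2] + s;
        x[i - 3] = x[i - 3] + s;
    }

    /* Remainder loop: handles the at most 3 leftover elements. */
    for (; i >= 0; i--)
        x[i] = x[i] + s;
}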

http://vip.cs.utsa.edu/classes/cs3853f2012/notes/ch3-5.html
