Lecture 10

Control flow graphs and loop optimizations

Friday, October 26, 12

Agenda
• Building control flow graphs
• Low level loop optimizations
  • Code motion
  • Strength reduction
  • Unrolling
• High level loop optimizations
  • Loop fusion
  • Loop interchange
  • Loop tiling

Moving beyond basic blocks
• Up until now, we have focused on single basic blocks
• What do we do if we want to consider larger units of computation?
  • Whole procedures?
  • Whole program?
• Idea: capture control flow of a program
  • How control transfers between basic blocks due to:
    • Conditionals
    • Loops

Representation
• Use standard three-address code
• Jump targets are labeled
  • Also label beginning/end of functions
• Want to keep track of targets of jump statements
  • Any statement whose execution may immediately follow execution of a jump statement
  • Explicit targets: targets mentioned in the jump statement
  • Implicit targets: statements that follow conditional jump statements
    • The statement that gets executed if the branch is not taken

Running example

A = 4
t1 = A * B
repeat {
  t2 = t1 / C
  if (t2 ≥ W) {
    M = t1 * k
    t3 = M + I
  }
  H = I
  M = t3 - H
} until (t3 ≥ 0)

Running example

1 A = 4
2 t1 = A * B
3 L1: t2 = t1 / C
4 if t2 < W goto L2
5 M = t1 * k
6 t3 = M + I
7 L2: H = I
8 M = t3 - H
9 if t3 ≥ 0 goto L3
10 goto L1
11 L3: halt


Control flow graphs
• Divides statements into basic blocks
• Basic block: a maximal sequence of statements I0, I1, I2, ..., In such that if Ij and Ij+1 are two adjacent statements in this sequence, then
  • The execution of Ij is always immediately followed by the execution of Ij+1
  • The execution of Ij+1 is always immediately preceded by the execution of Ij
• Edges between basic blocks represent potential flow of control

CFG for running example

Block 1:  A = 4
          t1 = A * B

Block 2:  L1: t2 = t1 / C
          if t2 < W goto L2

Block 3:  M = t1 * k
          t3 = M + I

Block 4:  L2: H = I
          M = t3 - H
          if t3 ≥ 0 goto L3

Block 5:  goto L1

Block 6:  L3: halt

Edges: 1 → 2, 2 → 3, 2 → 4, 3 → 4, 4 → 5, 4 → 6, 5 → 2

How do we build this automatically?

Constructing a CFG
• To construct a CFG where each node is a basic block
  • Identify leaders: first statement of a basic block
  • In program order, construct a block by appending subsequent statements up to, but not including, the next leader
• Identifying leaders
  • First statement in the program
  • Explicit target of any conditional or unconditional branch
  • Implicit target of any branch

Partitioning algorithm
• Input: set of statements, stat(i) = ith statement in input
• Output: set of leaders, set of basic blocks where block(x) is the set of statements in the block with leader x
• Algorithm

  leaders = {1}                      // leaders always includes first statement
  for i = 1 to |n|                   // |n| = number of statements
    if stat(i) is a branch, then
      leaders = leaders ∪ all potential targets of stat(i)
    end if
  end for
  worklist = leaders
  while worklist not empty do
    x = remove earliest statement in worklist
    block(x) = {x}
    for (i = x + 1; i ≤ |n| and i ∉ leaders; i++)
      block(x) = block(x) ∪ {i}
    end for
  end while

Running example

1 A = 4
2 t1 = A * B
3 L1: t2 = t1 / C
4 if t2 < W goto L2
5 M = t1 * k
6 t3 = M + I
7 L2: H = I
8 M = t3 - H
9 if t3 ≥ 0 goto L3
10 goto L1
11 L3: halt

Leaders = {1, 3, 5, 7, 10, 11}
Basic blocks = { {1, 2}, {3, 4}, {5, 6}, {7, 8, 9}, {10}, {11} }

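The partitioning algorithm can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: the statement encoding (`STMTS` as `(kind, target)` pairs with labels already resolved to statement numbers) and the function names are assumptions made for illustration. As a conservative assumption, the statement after any branch (conditional or not) is treated as a leader.

```python
# Running example, statements 1..11. kind is "if", "goto" or "other";
# target is the 1-based number of the labeled statement (L1=3, L2=7, L3=11).
STMTS = [
    ("other", None),  # 1  A = 4
    ("other", None),  # 2  t1 = A * B
    ("other", None),  # 3  L1: t2 = t1 / C
    ("if", 7),        # 4  if t2 < W goto L2
    ("other", None),  # 5  M = t1 * k
    ("other", None),  # 6  t3 = M + I
    ("other", None),  # 7  L2: H = I
    ("other", None),  # 8  M = t3 - H
    ("if", 11),       # 9  if t3 >= 0 goto L3
    ("goto", 3),      # 10 goto L1
    ("other", None),  # 11 L3: halt
]


def find_leaders(stmts):
    """Return the sorted list of leader statement numbers."""
    n = len(stmts)
    leaders = {1}  # the first statement is always a leader
    for i, (kind, target) in enumerate(stmts, start=1):
        if kind in ("if", "goto"):
            leaders.add(target)          # explicit target
            if i + 1 <= n:
                leaders.add(i + 1)       # statement following the branch
    return sorted(leaders)


def build_blocks(stmts, leaders):
    """Extend each leader with following statements up to the next leader."""
    n = len(stmts)
    leader_set = set(leaders)
    blocks = []
    for x in leaders:
        block = [x]
        i = x + 1
        while i <= n and i not in leader_set:
            block.append(i)
            i += 1
        blocks.append(block)
    return blocks


LEADERS = find_leaders(STMTS)
BLOCKS = build_blocks(STMTS, LEADERS)
print(LEADERS)
print(BLOCKS)
```

On the running example this reproduces the leaders and basic blocks listed on the slide.
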
Putting edges in CFG
• There is a directed edge from B1 to B2 if
  • There is a branch from the last statement of B1 to the first statement (leader) of B2
  • B2 immediately follows B1 in program order and B1 does not end with an unconditional branch
• Input: block, a sequence of basic blocks
• Output: The CFG

  for i = 1 to |block|
    x = last statement of block(i)
    if stat(x) is a branch, then
      for each explicit target y of stat(x)
        create edge from block i to the block whose leader is y
      end for
    end if
    if stat(x) is not an unconditional branch and i < |block|, then
      create edge from block i to block i+1
    end if
  end for
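The edge rules above can be sketched in Python. This is an illustrative sketch under assumptions: the `(kind, target)` statement encoding, the extra `"halt"` kind, and the names `STMTS`, `BLOCKS` and `build_edges` are not from the slides. Blocks are numbered from 1 in program order.

```python
# Running example: (kind, target) per statement, labels resolved to
# statement numbers (L1=3, L2=7, L3=11).
STMTS = [
    ("other", None), ("other", None), ("other", None), ("if", 7),
    ("other", None), ("other", None), ("other", None), ("other", None),
    ("if", 11), ("goto", 3), ("halt", None),
]
# Basic blocks as lists of statement numbers, in program order.
BLOCKS = [[1, 2], [3, 4], [5, 6], [7, 8, 9], [10], [11]]


def build_edges(stmts, blocks):
    """Return the set of CFG edges as (from_block, to_block) pairs."""
    block_of_leader = {b[0]: idx for idx, b in enumerate(blocks, start=1)}
    edges = set()
    for idx, block in enumerate(blocks, start=1):
        kind, target = stmts[block[-1] - 1]  # last statement of the block
        if kind in ("if", "goto"):
            # Branch edge to the block that starts at the explicit target.
            edges.add((idx, block_of_leader[target]))
        if kind not in ("goto", "halt") and idx < len(blocks):
            # Fall-through edge: control may continue into the next block.
            edges.add((idx, idx + 1))
    return edges


EDGES = build_edges(STMTS, BLOCKS)
print(sorted(EDGES))
```

Note that the fall-through edge is added for blocks ending in a conditional branch and also for blocks ending in an ordinary statement, such as block 3 in the running example.
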

Result
• Nodes: the six basic blocks {1, 2}, {3, 4}, {5, 6}, {7, 8, 9}, {10}, {11}
• Edges: {1, 2} → {3, 4}; {3, 4} → {5, 6}; {3, 4} → {7, 8, 9}; {5, 6} → {7, 8, 9}; {7, 8, 9} → {10}; {7, 8, 9} → {11}; {10} → {3, 4}

Discussion
• Sometimes we will also consider the statement-level CFG, where each node is a statement rather than a basic block
  • Either kind of graph is referred to as a CFG
• In statement-level CFG, we often use a node to explicitly represent merging of control
  • Control merges when two different CFG nodes point to the same node
• Note: if input language is structured, front-end can generate basic blocks directly
  • “GOTO considered harmful”

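Statement level CFG

One node per statement:

 1: A = 4
 2: t1 = A * B
 3: L1: t2 = t1 / C
 4: if t2 < W goto L2
 5: M = t1 * k
 6: t3 = M + I
 7: L2: H = I
 8: M = t3 - H
 9: if t3 ≥ 0 goto L3
10: goto L1
11: L3: halt

Edges: 1 → 2, 2 → 3, 3 → 4, 4 → 5, 4 → 7, 5 → 6, 6 → 7, 7 → 8, 8 → 9, 9 → 10, 9 → 11, 10 → 3
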
Loop optimization
• Low level optimization
  • Moving code around in a single loop
  • Examples: loop invariant code motion, strength reduction, loop unrolling
• High level optimization
  • Restructuring loops, often affects multiple loops
  • Examples: loop fusion, loop interchange, loop tiling

Low level loop optimizations
• Affect a single loop
• Usually performed at three-address code stage or later in compiler
• First problem: identifying loops
  • Low level representation doesn’t have loop statements!

Identifying loops
• First, we must identify dominators
  • Node a dominates node b if every possible execution path that gets to b must pass through a
  • Many different algorithms to calculate dominators; we will not cover how this is calculated
• A back edge is an edge from b to a when a dominates b
• The target of a back edge is a loop header
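The slides do not cover how dominators are calculated. For concreteness, here is one simple (not the most efficient) approach: an iterative fixed-point computation over the CFG, followed by back-edge detection. The graph encoding (`CFG` as a successor map over block numbers 1..6 of the running example) and the function names are assumptions made for this sketch.

```python
# Successor map for the running example's basic-block CFG (blocks 1..6).
CFG = {1: [2], 2: [3, 4], 3: [4], 4: [5, 6], 5: [2], 6: []}


def dominators(succ, entry):
    """dom[n] = set of nodes that dominate n (every node dominates itself)."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[n]]
            common = set.intersection(*pred_doms) if pred_doms else set()
            new = {n} | common
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def back_edges(succ, dom):
    """An edge b -> a is a back edge when a dominates b."""
    return sorted((b, a) for b in succ for a in succ[b] if a in dom[b])


DOM = dominators(CFG, entry=1)
BACK = back_edges(CFG, DOM)
print(DOM[4])   # nodes dominating block 4
print(BACK)     # back edges; each target is a loop header
```

In the running example the only back edge is the one out of the `goto L1` block, so the block labeled `L1` is the loop header.
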

Natural loops
• Will focus on natural loops: loops that arise in structured programs
• For a node n to be in a loop with header h
  • n must be dominated by h
  • There must be a path in the CFG from n to h through a back-edge to h
• What are the back edges in the example to the right? The loop headers? The natural loops?

[Figure: example CFG with nodes B1, B2, B3, B4; edges not recoverable]
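Given a back edge, the body of its natural loop can be collected by walking predecessors backwards from the source of the back edge until the header is reached. The sketch below uses the running example's CFG rather than the slide's figure (whose edges are not available here); the `CFG` encoding and the name `natural_loop` are assumptions for illustration.

```python
# Successor map for the running example's basic-block CFG (blocks 1..6).
CFG = {1: [2], 2: [3, 4], 3: [4], 4: [5, 6], 5: [2], 6: []}


def natural_loop(succ, tail, header):
    """Nodes of the natural loop of the back edge tail -> header."""
    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    loop = {header}      # the header bounds the backwards walk
    stack = [tail]
    while stack:
        n = stack.pop()
        if n not in loop:
            loop.add(n)
            stack.extend(pred[n])  # everything that can reach the tail
    return loop


# The running example has one back edge, from block 5 to block 2.
LOOP = natural_loop(CFG, tail=5, header=2)
print(sorted(LOOP))
```

Blocks 1 and 6 are outside the loop: block 1 runs once before it, and block 6 is only reached after the loop exits.
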

Loop invariant code motion
• Idea: some expressions evaluated in a loop never change; they are loop invariant
• Can move loop invariant expressions outside the loop, store result in temporary and just use the temporary in each iteration
• Why is this useful?

Identifying loop invariant code
• To determine if a statement
    s: a = b op c
  is loop invariant, find all definitions of b and c that reach s
  • A statement t defining b reaches s if there is a path from t to s where b is not re-defined
• s is loop invariant if both b and c satisfy one of the following
  • it is constant
  • all definitions that reach it are from outside the loop
  • only one definition reaches it and that definition is also loop invariant
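The three rules above can be illustrated with a deliberately simplified sketch. It assumes the loop body is a single straight-line block, so reaching definitions reduce to "is there a definition in the body, and does it come before the use?"; a real compiler would run a reaching-definitions analysis over the CFG instead. The statement encoding `(dest, left, right)` and all names here are hypothetical.

```python
def operand_is_invariant(x, use_idx, defs_in_loop, invariant):
    """Check one operand against the three rules (straight-line body only)."""
    if isinstance(x, int):
        return True                       # rule 1: constant
    defs = defs_in_loop.get(x, [])
    if not defs:
        return True                       # rule 2: only defined outside the loop
    # rule 3: exactly one definition in the loop, placed before the use
    # (so it is the only one reaching it), and itself loop invariant
    return len(defs) == 1 and defs[0] < use_idx and defs[0] in invariant


def invariant_statements(body):
    """Return indices of loop-invariant statements in a straight-line body."""
    defs_in_loop = {}
    for idx, (dest, _, _) in enumerate(body):
        defs_in_loop.setdefault(dest, []).append(idx)
    invariant = set()
    changed = True
    while changed:                        # iterate until no new statement is marked
        changed = False
        for idx, (_, left, right) in enumerate(body):
            if idx in invariant:
                continue
            if all(operand_is_invariant(x, idx, defs_in_loop, invariant)
                   for x in (left, right)):
                invariant.add(idx)
                changed = True
    return sorted(invariant)


# Hypothetical loop body; b and c are defined only outside the loop.
BODY = [
    ("t1", "b", "c"),   # 0: t1 = b op c   -> invariant
    ("t2", "t1", 4),    # 1: t2 = t1 op 4  -> invariant (t1 is)
    ("i", "i", 1),      # 2: i  = i op 1   -> not invariant
    ("t3", "t2", "i"),  # 3: t3 = t2 op i  -> not invariant (uses i)
]
RESULT = invariant_statements(BODY)
print(RESULT)
```

Being marked invariant only makes a statement a candidate for hoisting; whether it can actually be moved is a separate question.
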

Moving loop invariant code
• Just because code is loop invariant doesn’t mean we can move it!

  for (...)
    a = b + c
  c = a;

  for (...)
    if (*)
      a = 5

  for (...)
    if (*)
      a = 5
    else
      a = 6

  a = 5;
  for (...)
    if (*)
      a = 4 + c
    b = a

• We can move a loop invariant statement a = b op c if
  • The statement dominates all loop exits where a is live
  • There is only one definition of a in the loop
  • a is not live before the loop
• Move instruction to a preheader, a new block put right before loop header

Strength reduction
• Like strength reduction peephole optimization
  • Peephole: replace expensive instruction like a * 2 with a << 1
• Replace expensive instruction, multiply, with a cheap one, addition
  • Applies to uses of an induction variable
  • Opportunity: array indexing

Source loop:

  for (i = 0; i < 100; i++)
    A[i] = 0;

Three-address code:

      i = 0;
  L2: if (i >= 100) goto L1
      j = 4 * i + &A
      *j = 0;
      i = i + 1;
      goto L2
  L1:

Strength reduction

After strength reduction, the multiply 4 * i is replaced by an addition on a new variable k:

      i = 0; k = &A;
  L2: if (i >= 100) goto L1
      j = k;
      *j = 0;
      i = i + 1; k = k + 4;
      goto L2
  L1:

Induction variables
• A basic induction variable is a variable j
  • whose only definition within the loop is an assignment of the form j = j ± c, where c is loop invariant
  • Intuition: the variable which determines number of iterations is usually an induction variable
• A mutual induction variable i is
  • defined once within the loop, and its value is a linear function of some other induction variable j such that
      i = c1 * j ± c2 or i = j/c1 ± c2
    where c1, c2 are loop invariant
• A family of induction variables includes a basic induction variable and any related mutual induction variables

Strength reduction algorithm
• Let i be an induction variable in the family of the basic induction variable j, such that i = c1 * j + c2
  • Create a new variable i’
  • Initialize in preheader
      i’ = c1 * j + c2
  • Track value of j. After j = j + c3, perform
      i’ = i’ + (c1 * c3)
  • Replace definition of i with
      i = i’
• Key: c1, c2, c3 are all loop invariant (or constant), so computations like (c1 * c3) can be moved outside loop
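To see that the rewritten loop computes the same values, the steps above can be applied by hand to a small loop and the two versions compared. This is only an illustration in Python (a compiler would perform the rewrite on three-address code); the function names and the choice to record each value of i in a list are assumptions of the sketch, and c3 is assumed positive so the loop terminates.

```python
def original(n, c1, c2, c3):
    """Loop with a multiply in every iteration: i = c1 * j + c2."""
    seen = []
    j = 0
    while j < n:
        i = c1 * j + c2          # mutual induction variable of j
        seen.append(i)
        j = j + c3               # basic induction variable update
    return seen


def strength_reduced(n, c1, c2, c3):
    """Same loop after applying the strength reduction steps."""
    seen = []
    j = 0
    i_prime = c1 * j + c2        # preheader: initialize i'
    step = c1 * c3               # loop invariant, computed once outside the loop
    while j < n:
        i = i_prime              # definition of i replaced by a copy
        seen.append(i)
        j = j + c3
        i_prime = i_prime + step # after j = j + c3, bump i' by c1 * c3
    return seen


# With c1 = 4 and c2 standing in for the base address &A, this mirrors the
# array-indexing example (j = 4 * i + &A) from the earlier slide.
print(original(5, 4, 1000, 1))
print(strength_reduced(5, 4, 1000, 1))
```

The multiply now happens only in the preheader; inside the loop there is one addition per iteration.
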

Linear test replacement
• After strength reduction, the loop test may be the only use of the basic induction variable
• Can now eliminate induction variable altogether
• Algorithm
  • If only use of an induction variable is the loop test and its increment, and if the test is always computed
  • Can replace the test with an equivalent one using one of the mutual induction variables

Original loop:

  i = 2
  for (; i < k; i++)
    j = 50*i
    ... = j

After strength reduction:

  i = 2; j’ = 50 * i
  for (; i < k; i++, j’ += 50)
    ... = j’

After linear test replacement:

  i = 2; j’ = 50 * i
  for (; j’ < 50*k; j’ += 50)
    ... = j’
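The replacement test is equivalent because j’ = 50 * i throughout the loop and the multiplier 50 is positive, so i < k holds exactly when j’ < 50 * k. A small Python sketch of the slide's example makes this checkable; the function names and the list used to record each use of j’ are assumptions of the sketch.

```python
def after_strength_reduction(k):
    """Loop still driven by the basic induction variable i."""
    uses = []
    i = 2
    j_prime = 50 * i
    while i < k:
        uses.append(j_prime)     # ... = j'
        i += 1
        j_prime += 50
    return uses


def after_linear_test_replacement(k):
    """i is gone: the test is rewritten in terms of j'."""
    uses = []
    j_prime = 50 * 2
    limit = 50 * k               # loop invariant bound, computed once
    while j_prime < limit:
        uses.append(j_prime)
        j_prime += 50
    return uses


print(after_strength_reduction(5))
print(after_linear_test_replacement(5))
```

If the multiplier were negative, the comparison would have to be flipped, which is one reason the rewritten test must be derived carefully.
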

Loop unrolling
• Modifying induction variable in each iteration can be expensive
• Can instead unroll loops and perform multiple iterations for each increment of the induction variable
• What are the advantages and disadvantages?

Original loop:

  for (i = 0; i < N; i++)
    A[i] = ...

Unroll by factor of 4:

  for (i = 0; i < N; i += 4)
    A[i] = ...
    A[i+1] = ...
    A[i+2] = ...
    A[i+3] = ...
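The unrolled loop on the slide is only correct as written when N is a multiple of 4. A common way to handle other values of N is a cleanup loop for the leftover iterations, sketched below in Python. The body `2 * i` is a placeholder for the elided right-hand side, and the function names are assumptions.

```python
def fill(n):
    """Original loop: one element per iteration."""
    a = [0] * n
    for i in range(n):
        a[i] = 2 * i             # placeholder for the elided loop body
    return a


def fill_unrolled(n):
    """Unrolled by a factor of 4, with a cleanup loop for the remainder."""
    a = [0] * n
    main = n - n % 4             # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        a[i] = 2 * i
        a[i + 1] = 2 * (i + 1)
        a[i + 2] = 2 * (i + 2)
        a[i + 3] = 2 * (i + 3)
    for i in range(main, n):     # at most 3 leftover iterations
        a[i] = 2 * i
    return a


print(fill_unrolled(10))
```

The unrolled version performs a quarter of the induction variable updates and loop tests in the main loop, at the cost of larger code.
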

High level loop optimizations
• Many useful compiler optimizations require restructuring loops or sets of loops
  • Combining two loops together (loop fusion)
  • Switching the order of a nested loop (loop interchange)
  • Completely changing the traversal order of a loop (loop tiling)
• These sorts of high level loop optimizations usually take place at the AST level (where loop structure is obvious)

Cache behavior
• Most loop transformations target cache performance
  • Attempt to increase spatial or temporal locality
  • Locality can be exploited when there is reuse of data (for temporal locality) or recent access of nearby data (for spatial locality)
• Loops are a good opportunity for this: many loops iterate through matrices or arrays
• Consider matrix-vector multiply example
  • Multiple traversals of vector: opportunity for spatial and temporal locality
  • Regular access to array: opportunity for spatial locality

  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      y[i] += A[i][j] * x[j]

[Figure: y = Ax, with rows of A indexed by i and columns of A (and elements of x) indexed by j]

Loop fusion
• Combine two loops together into a single loop
• Why is this useful?
• Is this always legal?

Before fusion:

  do i = 1, n
    c[i] = a[i]
  end do
  do i = 1, n
    b[i] = a[i]
  end do

After fusion:

  do i = 1, n
    c[i] = a[i]
    b[i] = a[i]
  end do

[Figure: arrays traversed. Before fusion: c[1:n], a[1:n], b[1:n], a[1:n]. After fusion: c[1:n], a[1:n], b[1:n].]
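As a sketch of both questions on the slide, the Python below first fuses the two copy loops (legal, and `a` is traversed once instead of twice), then shows a hypothetical pair of loops where naive fusion changes the result because the second loop reads an element the first loop has not written yet. The second pair of loops is not from the slides; it is an assumed counterexample for illustration.

```python
def separate(a):
    """The slide's two loops, run one after the other."""
    n = len(a)
    c, b = [0] * n, [0] * n
    for i in range(n):
        c[i] = a[i]
    for i in range(n):
        b[i] = a[i]
    return b, c


def fused(a):
    """The fused loop: same results, a[i] is read while it is still hot."""
    n = len(a)
    c, b = [0] * n, [0] * n
    for i in range(n):
        c[i] = a[i]
        b[i] = a[i]
    return b, c


def separate_with_dependence(a):
    """Hypothetical variant: the second loop reads c[i + 1]."""
    n = len(a)
    c, b = [0] * n, [0] * (n - 1)
    for i in range(n):
        c[i] = a[i]
    for i in range(n - 1):
        b[i] = c[i + 1]
    return b


def fused_with_dependence(a):
    """Naive fusion of the variant: c[i + 1] is read before it is written."""
    n = len(a)
    c, b = [0] * n, [0] * (n - 1)
    for i in range(n):
        c[i] = a[i]
        if i < n - 1:
            b[i] = c[i + 1]      # still holds the initial 0 at this point
    return b


print(separate([1, 2, 3]) == fused([1, 2, 3]))
print(separate_with_dependence([1, 2, 3]), fused_with_dependence([1, 2, 3]))
```
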

Loop interchange
• Change the order of a nested loop
• This is not always legal: it changes the order that elements are accessed!
• Why is this useful?
  • Consider matrix-vector multiply when A is stored in column-major order (i.e., each column is stored in contiguous memory)

Original loop order:

  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      y[i] += A[i][j] * x[j]

[Figure: y = Ax, with rows of A indexed by i and columns indexed by j]

Loop interchange

After interchange, the inner loop walks down a column of A, which is contiguous in column-major order:

  for (j = 0; j < N; j++)
    for (i = 0; i < N; i++)
      y[i] += A[i][j] * x[j]
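For this particular loop nest the interchange is legal: each y[i] still accumulates its terms in increasing j order, so both orders produce the same result. The Python sketch below checks that on a small example. It demonstrates equivalence only; Python lists of lists will not reproduce the cache effect the slide is about. The function names are assumptions.

```python
def matvec_ij(A, x):
    """Original order: i outer, j inner (walks along rows of A)."""
    n = len(x)
    y = [0] * n
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]
    return y


def matvec_ji(A, x):
    """Interchanged order: j outer, i inner (walks down columns of A)."""
    n = len(x)
    y = [0] * n
    for j in range(n):
        for i in range(n):
            y[i] += A[i][j] * x[j]
    return y


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 0, 2]
print(matvec_ij(A, x))
print(matvec_ji(A, x))
```
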

Loop tiling
• Also called “loop blocking”
• One of the more complex loop transformations
• Goal: break loop up into smaller pieces to get spatial and temporal locality
  • Create new inner loops so that data accessed in inner loops fits in cache
• Also changes iteration order, so may not be legal

Original loop:

  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      y[i] += A[i][j] * x[j]

Tiled with block size B:

  for (ii = 0; ii < N; ii += B)
    for (jj = 0; jj < N; jj += B)
      for (i = ii; i < ii+B; i++)
        for (j = jj; j < jj+B; j++)
          y[i] += A[i][j] * x[j]

[Figure: y = Ax, with A divided into B × B tiles]

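The tiled loop nest on the slide assumes that B divides N evenly; otherwise the inner loops run past the end of the arrays. A minimal Python sketch that clamps the inner bounds with `min` is shown below, together with the untiled version for comparison. It checks equivalence only and does not demonstrate cache behavior; the function names and the clamping choice are assumptions.

```python
def matvec(A, x):
    """Untiled matrix-vector multiply."""
    n = len(x)
    y = [0] * n
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]
    return y


def matvec_tiled(A, x, B):
    """Tiled version: visit A one B x B tile at a time."""
    n = len(x)
    y = [0] * n
    for ii in range(0, n, B):
        for jj in range(0, n, B):
            # min(...) clamps the last, possibly partial, tile to the array.
            for i in range(ii, min(ii + B, n)):
                for j in range(jj, min(jj + B, n)):
                    y[i] += A[i][j] * x[j]
    return y


N = 5
A = [[i * N + j for j in range(N)] for i in range(N)]
x = [1, 2, 3, 4, 5]
print(matvec(A, x))
print(matvec_tiled(A, x, B=2))
```

Within one tile, only B elements of x and B elements of y are touched, which is what lets them stay in cache while the tile of A streams through.
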
In a real (Itanium) compiler

[Figure: bar chart of GFLOPS relative to -O2 (bigger is better). Y-axis: factor faster than -O2, from 0 to 30.0. The best configuration is annotated "92% of Peak Performance". Bar labels were garbled in extraction.]

Loop transformations
• Loop transformations can have dramatic effects on performance
• Doing this legally and automatically is very difficult!
• Researchers have developed techniques to determine legality of loop transformations and automatically transform the loop
  • Techniques like unimodular transform framework and polyhedral framework
  • These approaches will get covered in more detail in advanced compilers course
