Lecture 10


Control flow graphs and loop optimizations

Friday, October 26, 12


Agenda
• Building control flow graphs
• Low level loop optimizations
    • Code motion
    • Strength reduction
    • Unrolling
• High level loop optimizations
    • Loop fusion
    • Loop interchange
    • Loop tiling

Moving beyond basic blocks
• Up until now, we have focused on single basic blocks
• What do we do if we want to consider larger units of computation?
    • Whole procedures?
    • Whole program?
• Idea: capture the control flow of a program
    • How control transfers between basic blocks due to:
        • Conditionals
        • Loops

Representation
• Use standard three-address code
    • Jump targets are labeled
    • Also label the beginning and end of functions
• Want to keep track of targets of jump statements
    • Any statement whose execution may immediately follow execution of a jump statement
    • Explicit targets: targets mentioned in the jump statement
    • Implicit targets: statements that follow conditional jump statements
        • The statement that gets executed if the branch is not taken

Running example

A = 4
t1 = A * B
repeat {
    t2 = t1 / C
    if (t2 ≥ W) {
        M = t1 * k
        t3 = M + I
    }
    H = I
    M = t3 - H
} until (t3 ≥ 0)

Running example

1 A = 4
2 t1 = A * B
3 L1: t2 = t1 / C
4 if t2 < W goto L2
5 M = t1 * k
6 t3 = M + I
7 L2: H = I
8 M = t3 - H
9 if t3 ≥ 0 goto L3
10 goto L1
11 L3: halt

Control flow graphs
• Divides statements into basic blocks
• Basic block: a maximal sequence of statements I_0, I_1, I_2, ..., I_n such that if I_j and I_{j+1} are two adjacent statements in this sequence, then
    • The execution of I_j is always immediately followed by the execution of I_{j+1}
    • The execution of I_{j+1} is always immediately preceded by the execution of I_j
• Edges between basic blocks represent potential flow of control

CFG for running example
(figure: the running example partitioned into basic blocks)

    A = 4
    t1 = A * B

    L1: t2 = t1 / C
    if t2 < W goto L2

    M = t1 * k
    t3 = M + I

    L2: H = I
    M = t3 - H
    if t3 ≥ 0 goto L3

    goto L1

    L3: halt

How do we build this automatically?

Constructing a CFG
• To construct a CFG where each node is a basic block
    • Identify leaders: the first statement of each basic block
    • In program order, construct a block by appending subsequent statements up to, but not including, the next leader
• Identifying leaders
    • First statement in the program
    • Explicit target of any conditional or unconditional branch
    • Implicit target of any branch

Partitioning algorithm
• Input: set of statements, stat(i) = i-th statement in input
• Output: set of leaders, set of basic blocks where block(x) is the set of statements in the block with leader x
• Algorithm

leaders = {1}                    // leaders always includes the first statement
for i = 1 to |n|                 // |n| = number of statements
    if stat(i) is a branch, then
        leaders = leaders ∪ all potential targets
    end if
end for
worklist = leaders
while worklist not empty do
    x = remove earliest statement in worklist
    block(x) = {x}
    for (i = x + 1; i ≤ |n| and i ∉ leaders; i++)
        block(x) = block(x) ∪ {i}
    end for
end while
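The partitioning algorithm above can be sketched in Python as follows. The encoding of the running example (`PROGRAM`, with a branch kind and target label per statement) is a hypothetical representation chosen for illustration. As is common, the sketch treats the statement right after any branch, conditional or unconditional, as an implicit-target leader. For the running example this gives the same answer as the slides.

```python
# Hypothetical encoding of the running example.
# Each entry: (label, text, branch kind, target label); kind is None, "cond", or "uncond".
PROGRAM = [
    (None, "A = 4", None, None),
    (None, "t1 = A * B", None, None),
    ("L1", "t2 = t1 / C", None, None),
    (None, "if t2 < W goto L2", "cond", "L2"),
    (None, "M = t1 * k", None, None),
    (None, "t3 = M + I", None, None),
    ("L2", "H = I", None, None),
    (None, "M = t3 - H", None, None),
    (None, "if t3 >= 0 goto L3", "cond", "L3"),
    (None, "goto L1", "uncond", "L1"),
    ("L3", "halt", None, None),
]


def find_leaders(prog):
    """Return the 1-based statement numbers that start a basic block, in order."""
    n = len(prog)
    stmt_of_label = {lab: i for i, (lab, _, _, _) in enumerate(prog, start=1) if lab}
    leaders = {1}  # the first statement is always a leader
    for i, (_, _, kind, target) in enumerate(prog, start=1):
        if kind is not None:
            leaders.add(stmt_of_label[target])  # explicit target of the branch
            if i < n:
                leaders.add(i + 1)  # implicit target: the next statement
    return sorted(leaders)


def partition(prog):
    """Map each leader x to block(x): x plus the statements up to the next leader."""
    n = len(prog)
    leaders = find_leaders(prog)
    leader_set = set(leaders)
    blocks = {}
    for x in leaders:  # worklist processed earliest-first
        block = [x]
        i = x + 1
        while i <= n and i not in leader_set:
            block.append(i)
            i += 1
        blocks[x] = block
    return leaders, blocks


leaders, blocks = partition(PROGRAM)
print(leaders)                # [1, 3, 5, 7, 10, 11]
print(list(blocks.values()))  # [[1, 2], [3, 4], [5, 6], [7, 8, 9], [10], [11]]
```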

Running example

1 A = 4
2 t1 = A * B
3 L1: t2 = t1 / C
4 if t2 < W goto L2
5 M = t1 * k
6 t3 = M + I
7 L2: H = I
8 M = t3 - H
9 if t3 ≥ 0 goto L3
10 goto L1
11 L3: halt

Leaders = {1, 3, 5, 7, 10, 11}
Basic blocks = { {1, 2}, {3, 4}, {5, 6}, {7, 8, 9}, {10}, {11} }
Putting edges in CFG
• There is a directed edge from B1 to B2 if
    • There is a branch from the last statement of B1 to the first statement (leader) of B2, or
    • B2 immediately follows B1 in program order and B1 does not end with an unconditional branch
• Input: block, a sequence of basic blocks
• Output: the CFG

for i = 1 to |block|
    x = last statement of block(i)
    if stat(x) is a branch, then
        for each explicit target y of stat(x)
            create edge from block i to the block whose leader is y
        end for
    end if
    if stat(x) is not an unconditional branch and i < |block| then
        create edge from block i to block i+1
    end if
end for
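Here is a minimal Python sketch of the edge-construction loop. It assumes the blocks have already been computed. The `BLOCKS` table is a hypothetical hand encoding of the running example's six blocks and records only what the algorithm inspects: each block's leader, plus the kind and explicit target of its last statement. The `i < len(blocks)` guard reflects that the last block has no successor in program order.

```python
# Basic blocks of the running example, numbered 1..6 in program order.
# Each entry: (leader, branch kind of the block's last statement, explicit target statement).
BLOCKS = [
    (1, None, None),    # {1, 2}     ends with t1 = A * B
    (3, "cond", 7),     # {3, 4}     ends with if t2 < W goto L2   (L2 is statement 7)
    (5, None, None),    # {5, 6}     ends with t3 = M + I
    (7, "cond", 11),    # {7, 8, 9}  ends with if t3 >= 0 goto L3  (L3 is statement 11)
    (10, "uncond", 3),  # {10}       goto L1                       (L1 is statement 3)
    (11, None, None),   # {11}       halt
]


def build_edges(blocks):
    """Return the CFG edges as a set of (from_block, to_block) pairs."""
    block_of_leader = {leader: i for i, (leader, _, _) in enumerate(blocks, start=1)}
    edges = set()
    for i, (_, kind, target) in enumerate(blocks, start=1):
        if kind is not None:
            edges.add((i, block_of_leader[target]))  # branch to the target's block
        if kind != "uncond" and i < len(blocks):
            edges.add((i, i + 1))  # fall through to the next block in program order
    return edges


print(sorted(build_edges(BLOCKS)))
# [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 2)]
```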

Result
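(figure: the finished CFG for the running example)

    B1: A = 4
        t1 = A * B
    B2: L1: t2 = t1 / C
        if t2 < W goto L2
    B3: M = t1 * k
        t3 = M + I
    B4: L2: H = I
        M = t3 - H
        if t3 ≥ 0 goto L3
    B5: goto L1
    B6: L3: halt

    Edges: B1→B2, B2→B3, B2→B4, B3→B4, B4→B5, B4→B6, B5→B2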

Discussion
• Sometimes we will also consider the statement-level CFG, where each node is a statement rather than a basic block
    • Either kind of graph is referred to as a CFG
• In a statement-level CFG, we often use a node to explicitly represent merging of control
    • Control merges when two different CFG nodes point to the same node
• Note: if the input language is structured, the front end can generate basic blocks directly
    • “GOTO considered harmful”

Statement level CFG
(figure: statement-level CFG for the running example, one node per statement)

    A = 4
    t1 = A * B
    L1: t2 = t1 / C
    if t2 < W goto L2
    M = t1 * k
    t3 = M + I
    L2: H = I
    M = t3 - H
    if t3 ≥ 0 goto L3
    goto L1
    L3: halt
Loop optimization

• Low level optimization
    • Moving code around in a single loop
    • Examples: loop invariant code motion, strength reduction, loop unrolling
• High level optimization
    • Restructuring loops, often affects multiple loops
    • Examples: loop fusion, loop interchange, loop tiling

Low level loop optimizations

• Affect a single loop
• Usually performed at the three-address code stage or later in the compiler
• First problem: identifying loops
    • Low level representation doesn’t have loop statements!

Identifying loops

• First, we must identify dominators
    • Node a dominates node b if every possible execution path that gets to b must pass through a
    • Many different algorithms exist to calculate dominators – we will not cover how this is calculated
• A back edge is an edge from b to a when a dominates b
• The target of a back edge is a loop header
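The lecture does not cover how dominators are computed. For concreteness, here is a sketch using one standard approach, the iterative data-flow formulation Dom(n) = {n} ∪ ⋂ Dom(p) over all predecessors p. It is applied to the running example's CFG; the `SUCCS` successor map is a hypothetical encoding of blocks B1 to B6 and their edges. The sketch then reports every edge b → a where a dominates b.

```python
# Running example CFG: block -> successor blocks (B1 is the entry).
SUCCS = {1: [2], 2: [3, 4], 3: [4], 4: [5, 6], 5: [2], 6: []}


def dominators(succs, entry):
    """Iterate Dom(n) = {n} | intersection of Dom(p) over predecessors p until stable."""
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "every node dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new |= {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def back_edges(succs, dom):
    """An edge b -> a is a back edge when a dominates b."""
    return {(b, a) for b, ss in succs.items() for a in ss if a in dom[b]}


dom = dominators(SUCCS, 1)
print(sorted(dom[5]))          # [1, 2, 4, 5]
print(back_edges(SUCCS, dom))  # {(5, 2)}: B2 is the loop header
```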

Natural loops
• Will focus on natural loops – loops that arise in structured programs
• For a node n to be in a loop with header h
    • n must be dominated by h
    • There must be a path in the CFG from n to h through a back edge to h
• What are the back edges in the example to the right? The loop headers? The natural loops?

(figure: example CFG with nodes B1, B2, B3, B4)
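The edges of the B1 to B4 figure are not reproduced here, so this sketch uses the running example's CFG instead, where B5 → B2 is the back edge (the `SUCCS` map is a hypothetical encoding). It collects the natural loop in the usual way: start from the back edge's tail and walk predecessor edges backward, never going past the header.

```python
# Running example CFG: block -> successor blocks; B5 -> B2 is its back edge.
SUCCS = {1: [2], 2: [3, 4], 3: [4], 4: [5, 6], 5: [2], 6: []}


def natural_loop(succs, tail, header):
    """Nodes of the natural loop of back edge tail -> header.

    The loop is the header plus every node that can reach the tail
    without passing through the header.
    """
    preds = {n: [] for n in succs}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)
    loop = {header, tail}
    stack = [tail] if tail != header else []
    while stack:
        m = stack.pop()
        for p in preds[m]:
            if p not in loop:  # the header is already in loop, so the walk stops there
                loop.add(p)
                stack.append(p)
    return loop


print(sorted(natural_loop(SUCCS, tail=5, header=2)))  # [2, 3, 4, 5]
```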

Loop invariant code motion

• Idea: some expressions evaluated in a loop never change; they are loop invariant
• Can move loop invariant expressions outside the loop, store the result in a temporary, and just use the temporary in each iteration
• Why is this useful?
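One way to see why hoisting helps is this small Python sketch; the functions and their arguments are purely illustrative. Here `a * b` never changes inside the loop, so moving it to the preheader computes it once instead of once per iteration. Each function also counts the multiplications it performs.

```python
def licm_before(xs, a, b):
    """a * b is recomputed in every iteration."""
    out, mults = [], 0
    for x in xs:
        t = a * b  # loop invariant: a and b are never assigned in the loop
        mults += 1
        out.append(x + t)
    return out, mults


def licm_after(xs, a, b):
    """a * b is hoisted into the preheader and kept in the temporary t."""
    t = a * b
    mults = 1
    out = []
    for x in xs:
        out.append(x + t)
    return out, mults


print(licm_before([1, 2, 3], 4, 5))  # ([21, 22, 23], 3)
print(licm_after([1, 2, 3], 4, 5))   # ([21, 22, 23], 1)
```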

Identifying loop invariant code
• To determine if a statement
      s: a = b op c
  is loop invariant, find all definitions of b and c that reach s
    • A statement t defining b reaches s if there is a path from t to s where b is not redefined
• s is loop invariant if both b and c satisfy one of the following
    • it is constant
    • all definitions that reach it are from outside the loop
    • only one definition reaches it and that definition is also loop invariant
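The three rules above can be applied repeatedly until nothing new is marked. In this sketch, the loop body (`LOOP`) and the reaching-definitions table (`REACHING`) are hypothetical inputs written by hand; a real compiler would obtain reaching definitions from a data-flow analysis. Definition ids that are not keys of `LOOP` stand for definitions made outside the loop.

```python
# Hypothetical loop body: statement id -> (destination, operand, operand).
# Operands are int constants or variable names; the operator itself does not matter here.
LOOP = {
    "s1": ("a", "b", "c"),  # a = b + c
    "s2": ("d", "a", 2),    # d = a * 2
    "s3": ("e", "d", "i"),  # e = d + i
    "s4": ("i", "i", 1),    # i = i + 1
}
# Assumed reaching definitions: (statement, operand) -> ids of definitions reaching it.
# "b0", "c0", and "i0" are definitions made before the loop.
REACHING = {
    ("s1", "b"): {"b0"}, ("s1", "c"): {"c0"},
    ("s2", "a"): {"s1"},
    ("s3", "d"): {"s2"}, ("s3", "i"): {"i0", "s4"},
    ("s4", "i"): {"i0", "s4"},
}


def operand_is_invariant(s, op, loop, reaching, invariant):
    if isinstance(op, int):
        return True  # rule 1: it is a constant
    defs = reaching[(s, op)]
    if not defs & set(loop):
        return True  # rule 2: every reaching definition is outside the loop
    # rule 3: exactly one definition reaches it, and that definition is loop invariant
    return len(defs) == 1 and next(iter(defs)) in invariant


def loop_invariant(loop, reaching):
    invariant = set()
    changed = True
    while changed:  # repeat until no new statement is marked
        changed = False
        for s, (_, *operands) in loop.items():
            if s not in invariant and all(
                operand_is_invariant(s, op, loop, reaching, invariant) for op in operands
            ):
                invariant.add(s)
                changed = True
    return invariant


print(sorted(loop_invariant(LOOP, REACHING)))  # ['s1', 's2']
```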

Moving loop invariant code
• Just because code is loop invariant doesn’t mean we can move it!

for (...)
    a = b + c
c = a;

for (...)
    if (*)
        a = 5

for (...)
    if (*)
        a = 5
    else
        a = 6

a = 5;
for (...)
    if (*)
        a = 4 + c
    b = a

• We can move a loop invariant statement a = b op c if
    • The statement dominates all loop exits where a is live
    • There is only one definition of a in the loop
    • a is not live before the loop
• Move the instruction to a preheader, a new block put right before the loop header
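Why the first condition matters can be seen by simulating a loop whose body computes `a = b + c` and whose exit is followed by `c = a`. The simulation runs it once as written and once with the statement hoisted into a preheader. The Python functions below are illustrative only: `n` is the trip count and `a` is the value held before the loop. When the loop runs zero times, the body never executes, so the statement does not dominate the exit, and hoisting changes the result.

```python
def as_written(b, c, a, n):
    for _ in range(n):
        a = b + c  # loop invariant, but only runs if the loop body runs
    c = a          # a is live at the loop exit
    return c


def hoisted(b, c, a, n):
    a = b + c      # moved to the preheader: now runs even when n == 0
    for _ in range(n):
        pass       # the loop body is now empty
    c = a
    return c


print(as_written(1, 2, 7, 0), hoisted(1, 2, 7, 0))  # 7 3  (different!)
print(as_written(1, 2, 7, 3), hoisted(1, 2, 7, 3))  # 3 3
```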

Strength reduction
• Like strength reduction peephole optimization
    • Peephole: replace an expensive instruction like a * 2 with a << 1
• Replace an expensive instruction, multiply, with a cheap one, addition
• Applies to uses of an induction variable
• Opportunity: array indexing

for (i = 0; i < 100; i++)
    A[i] = 0;

i = 0;
L2: if (i >= 100) goto L1
    j = 4 * i + &A
    *j = 0;
    i = i + 1;
    goto L2
L1:

Strength reduction
• The same loop after strength reduction: the multiply 4 * i is replaced by adding 4 to k each iteration

i = 0; k = &A;
L2: if (i >= 100) goto L1
    j = k;
    *j = 0;
    i = i + 1; k = k + 4;
    goto L2
L1:
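The two versions of this loop can be compared directly by recording the address `j` that each iteration writes to. In this sketch, the element size of 4 bytes follows the `4 * i` in the slide, while `BASE`, a stand-in for `&A`, is a hypothetical number.

```python
ELEM_SIZE = 4  # bytes per element, as in j = 4 * i + &A
BASE = 1000    # hypothetical stand-in for &A


def addresses_multiply(n):
    """Original loop: one multiply per iteration."""
    out, i = [], 0
    while i < n:
        j = ELEM_SIZE * i + BASE
        out.append(j)  # *j = 0
        i = i + 1
    return out


def addresses_add(n):
    """Strength-reduced loop: k tracks 4 * i + &A using only additions."""
    out, i, k = [], 0, BASE  # k = &A, set up before the loop
    while i < n:
        j = k
        out.append(j)  # *j = 0
        i = i + 1
        k = k + ELEM_SIZE
    return out


print(addresses_add(3))                               # [1000, 1004, 1008]
print(addresses_multiply(100) == addresses_add(100))  # True
```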

Induction variables
• A basic induction variable is a variable j
    • whose only definition within the loop is an assignment of the form j = j ± c, where c is loop invariant
    • Intuition: the variable that determines the number of iterations is usually an induction variable
• A mutual induction variable i is
    • defined once within the loop, and its value is a linear function of some other induction variable j such that
          i = c1 * j ± c2   or   i = j / c1 ± c2
      where c1, c2 are loop invariant
• A family of induction variables includes a basic induction variable and any related mutual induction variables
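Here is a small Python sketch of the first definition. It scans a loop body for variables whose only definition is `j = j ± c`, where `c` is constant or loop invariant. The three-address tuples in `BODY` and the `invariant` set passed in are hypothetical inputs. In the example, `t` is a mutual induction variable of `i`, and `s` is not an induction variable at all.

```python
# Hypothetical loop body in three-address form: (dest, op, left, right).
BODY = [
    ("i", "+", "i", 1),    # i = i + 1
    ("t", "*", 4, "i"),    # t = 4 * i   (mutual induction variable of i)
    ("s", "+", "s", "t"),  # s = s + t   (t changes every iteration)
]


def basic_induction_vars(body, invariant):
    """Variables whose only definition in the loop is var = var +/- c, c invariant."""
    defs = {}
    for stmt in body:
        defs.setdefault(stmt[0], []).append(stmt)
    result = set()
    for var, var_defs in defs.items():
        if len(var_defs) != 1:
            continue  # more than one definition in the loop
        _, op, left, right = var_defs[0]
        step_ok = isinstance(right, int) or right in invariant
        if op in ("+", "-") and left == var and step_ok:
            result.add(var)
    return result


print(basic_induction_vars(BODY, invariant=set()))  # {'i'}
```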

Strength reduction algorithm
• Let i be an induction variable in the family of the basic induction variable j, such that i = c1 * j + c2
    • Create a new variable i’
    • Initialize in the preheader
          i’ = c1 * j + c2
    • Track the value of j. After j = j + c3, perform
          i’ = i’ + (c1 * c3)
    • Replace the definition of i with
          i = i’
• Key: c1, c2, c3 are all loop invariant (or constant), so computations like (c1 * c3) can be moved outside the loop
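Here is the algorithm above sketched in Python for a single derived variable `i = c1 * j + c2`, whose basic induction variable `j` steps by `c3`. The function names and the fixed trip count `n` are assumptions for illustration; `step` is the hoisted `c1 * c3`.

```python
def derived_original(j0, c1, c2, c3, n):
    """i = c1 * j + c2 is recomputed with a multiply every iteration."""
    j, out = j0, []
    for _ in range(n):
        i = c1 * j + c2
        out.append(i)
        j = j + c3
    return out


def derived_reduced(j0, c1, c2, c3, n):
    """Strength-reduced: i' is initialized once, then updated by addition."""
    j, out = j0, []
    i_prime = c1 * j + c2  # initialize i' in the preheader
    step = c1 * c3         # loop invariant, computed outside the loop
    for _ in range(n):
        i = i_prime        # replaces the definition i = c1 * j + c2
        out.append(i)
        j = j + c3
        i_prime = i_prime + step  # after j = j + c3, i' = i' + (c1 * c3)
    return out


print(derived_reduced(5, 3, 7, 2, 3))  # [22, 28, 34]
print(derived_original(5, 3, 7, 2, 10) == derived_reduced(5, 3, 7, 2, 10))  # True
```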

Linear test replacement
• After strength reduction, the loop test may be the only use of the basic induction variable
• Can now eliminate the induction variable altogether
• Algorithm
    • If the only use of an induction variable is the loop test and its increment, and if the test is always computed
    • Can replace the test with an equivalent one using one of the mutual induction variables

i = 2
for (; i < k; i++)
    j = 50*i
    ... = j

Strength reduction

i = 2; j’ = 50 * i
for (; i < k; i++, j’ += 50)
    ... = j’

Linear test replacement

i = 2; j’ = 50 * i
for (; j’ < 50*k; j’ += 50)
    ... = j’
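The three versions above can be checked against one another by collecting the values assigned to `... = j` (or `j’`). This Python sketch uses `jp` for `j’`, and the function names are illustrative. The replaced test relies on two facts: `i < k` is equivalent to `50 * i < 50 * k` because 50 is positive, and `i` is not used after the loop.

```python
def ltr_original(k):
    out, i = [], 2
    while i < k:
        j = 50 * i
        out.append(j)  # ... = j
        i += 1
    return out


def ltr_strength_reduced(k):
    out, i = [], 2
    jp = 50 * i         # j' = 50 * i
    while i < k:
        out.append(jp)  # ... = j'
        i += 1
        jp += 50
    return out


def ltr_test_replaced(k):
    out, i = [], 2
    jp = 50 * i
    bound = 50 * k      # loop invariant, so computed once
    while jp < bound:   # i no longer appears in the loop
        out.append(jp)
        jp += 50
    return out


print(ltr_test_replaced(5))  # [100, 150, 200]
print(all(ltr_original(k) == ltr_strength_reduced(k) == ltr_test_replaced(k)
          for k in range(-3, 12)))  # True
```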

Loop unrolling

• Modifying the induction variable in each iteration can be expensive
• Can instead unroll loops and perform multiple iterations for each increment of the induction variable
• What are the advantages and disadvantages?

for (i = 0; i < N; i++)
    A[i] = ...

Unroll by a factor of 4 (assuming N is a multiple of 4):

for (i = 0; i < N; i += 4)
    A[i] = ...
    A[i+1] = ...
    A[i+2] = ...
    A[i+3] = ...
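The unrolled loop above only covers every element when `N` is a multiple of 4. A common way to handle any `N`, sketched here in Python, is to follow the unrolled loop with a short remainder loop. The helper `f` is a hypothetical stand-in for the `...` being stored.

```python
def fill_rolled(n, f):
    a = [None] * n
    for i in range(n):
        a[i] = f(i)
    return a


def fill_unrolled4(n, f):
    a = [None] * n
    i = 0
    while i + 3 < n:  # unrolled body: four elements per update of i
        a[i] = f(i)
        a[i + 1] = f(i + 1)
        a[i + 2] = f(i + 2)
        a[i + 3] = f(i + 3)
        i += 4
    while i < n:      # remainder loop: the last n % 4 elements
        a[i] = f(i)
        i += 1
    return a


def square(i):
    return i * i


print(fill_unrolled4(6, square))  # [0, 1, 4, 9, 16, 25]
print(all(fill_rolled(n, square) == fill_unrolled4(n, square) for n in range(12)))  # True
```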

High level loop optimizations

• Many useful compiler optimizations require restructuring loops or sets of loops
    • Combining two loops together (loop fusion)
    • Switching the order of a nested loop (loop interchange)
    • Completely changing the traversal order of a loop (loop tiling)
• These sorts of high level loop optimizations usually take place at the AST level (where loop structure is obvious)

Cache behavior
• Most loop transformations target cache performance
    • Attempt to increase spatial or temporal locality
    • Locality can be exploited when there is reuse of data (for temporal locality) or recent access of nearby data (for spatial locality)
• Loops are a good opportunity for this: many loops iterate through matrices or arrays
• Consider the matrix-vector multiply example
    • Multiple traversals of vector: opportunity for spatial and temporal locality
    • Regular access to array: opportunity for spatial locality

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        y[i] += A[i][j] * x[j]

(figure: y = Ax, with matrix A indexed by row i and column j, input vector x, and output vector y)

Loop fusion
• Combine two loops together into a single loop
• Why is this useful?
• Is this always legal?

do i = 1, n
    c[i] = a[i]
end do
do i = 1, n
    b[i] = a[i]
end do

Fused:

do i = 1, n
    c[i] = a[i]
    b[i] = a[i]
end do

(figure: data traversed; before fusion: c[1:n], a[1:n], then b[1:n], a[1:n]; after fusion: c[1:n], a[1:n], b[1:n] in a single pass)
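Here is a Python sketch of the fused and unfused loops, using 0-based indices and illustrative function names. It also includes one hypothetical example where fusion is not legal. There, the second loop reads `a[i + 1]`: with separate loops the first loop has already updated that element, but after fusion it has not been updated yet.

```python
def unfused(a):
    n = len(a)
    b, c = [0] * n, [0] * n
    for i in range(n):
        c[i] = a[i]
    for i in range(n):  # a second full pass over a
        b[i] = a[i]
    return b, c


def fused(a):
    n = len(a)
    b, c = [0] * n, [0] * n
    for i in range(n):  # one pass: a[i] is reused right after it is loaded
        c[i] = a[i]
        b[i] = a[i]
    return b, c


def unfused_dep(a):
    a, n = list(a), len(a)
    b = [0] * (n - 1)
    for i in range(n):
        a[i] = 2 * a[i]
    for i in range(n - 1):
        b[i] = a[i + 1]  # reads a value the first loop already doubled
    return b


def fused_dep(a):  # an illegal fusion of the two loops above
    a, n = list(a), len(a)
    b = [0] * (n - 1)
    for i in range(n):
        a[i] = 2 * a[i]
        if i + 1 < n:
            b[i] = a[i + 1]  # a[i + 1] has not been doubled yet
    return b


print(unfused([1, 2, 3]) == fused([1, 2, 3]))        # True
print(unfused_dep([1, 2, 3]), fused_dep([1, 2, 3]))  # [4, 6] [2, 3]
```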

Loop interchange
• Change the order of a nested loop
• This is not always legal – it changes the order that elements are accessed!
• Why is this useful?
    • Consider matrix-vector multiply when A is stored in column-major order (i.e., each column is stored in contiguous memory)

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        y[i] += A[i][j] * x[j]

(figure: y = Ax, with A indexed by row i and column j)

Loop interchange
• After interchange, the inner loop walks down a column of A, which is contiguous in column-major order:

for (j = 0; j < N; j++)
    for (i = 0; i < N; i++)
        y[i] += A[i][j] * x[j]
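To see why interchange helps with a column-major `A`, this Python sketch records the memory offset of each `A[i][j]` access. It assumes element `A[i][j]` lives at offset `i + j * N` (column-major, 0-based). With `i` outermost, consecutive accesses jump by `N`; after interchange they are contiguous. The sketch also checks that both loop orders compute the same `y`.

```python
def a_offsets(n, j_outer):
    """Offsets of successive A[i][j] accesses; column-major: A[i][j] is at i + j * n."""
    if j_outer:
        order = [(i, j) for j in range(n) for i in range(n)]  # interchanged loops
    else:
        order = [(i, j) for i in range(n) for j in range(n)]  # original loops
    return [i + j * n for i, j in order]


def strides(offsets):
    """Distinct gaps between consecutive accesses."""
    return sorted({b - a for a, b in zip(offsets, offsets[1:])})


def matvec(A, x, j_outer):
    n = len(x)
    y = [0] * n
    if j_outer:
        for j in range(n):
            for i in range(n):
                y[i] += A[i][j] * x[j]
    else:
        for i in range(n):
            for j in range(n):
                y[i] += A[i][j] * x[j]
    return y


print(strides(a_offsets(4, j_outer=False)))  # [-11, 4]
print(strides(a_offsets(4, j_outer=True)))   # [1]
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matvec(A, [1, 0, 2], False) == matvec(A, [1, 0, 2], True))  # True
```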

Loop tiling
• Also called “loop blocking”
• One of the more complex loop transformations
• Goal: break the loop up into smaller pieces to get spatial and temporal locality
    • Create new inner loops so that data accessed in inner loops fit in cache
• Also changes iteration order, so may not be legal

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        y[i] += A[i][j] * x[j]

Tiled with tile size B (assuming N is a multiple of B):

for (ii = 0; ii < N; ii += B)
    for (jj = 0; jj < N; jj += B)
        for (i = ii; i < ii+B; i++)
            for (j = jj; j < jj+B; j++)
                y[i] += A[i][j] * x[j]

(figure: y = Ax, with A indexed by row i and column j and divided into B × B tiles)
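Here is a Python sketch of the tiled loop nest, checked against the untiled one. The slide's version assumes `N` is a multiple of `B`; this sketch instead uses `min()` so that the last tile in each dimension may be partial. The sizes used are arbitrary.

```python
def matvec(A, x):
    n = len(x)
    y = [0] * n
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]
    return y


def matvec_tiled(A, x, B):
    """Same computation, visiting A in B x B tiles."""
    n = len(x)
    y = [0] * n
    for ii in range(0, n, B):
        for jj in range(0, n, B):
            for i in range(ii, min(ii + B, n)):      # rows of this tile
                for j in range(jj, min(jj + B, n)):  # columns of this tile
                    y[i] += A[i][j] * x[j]
    return y


n = 7
A = [[i * n + j for j in range(n)] for i in range(n)]
x = list(range(1, n + 1))
print(all(matvec_tiled(A, x, B) == matvec(A, x) for B in range(1, 9)))  # True
```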
In a real (Itanium) compiler
(figure: bar chart, “GFLOPS relative to -O2; bigger is better”; y-axis “factor faster than -O2” from 0 to 30; bars labeled with optimization settings including gcc, several -O levels, +prefetch, +interchange, +unroll-jam, and +blocking; one bar is annotated “92% of Peak Performance”)

Loop transformations

• Loop transformations can have dramatic effects on performance
• Doing this legally and automatically is very difficult!
• Researchers have developed techniques to determine the legality of loop transformations and to transform loops automatically
    • Techniques like the unimodular transformation framework and the polyhedral framework
    • These approaches are covered in more detail in an advanced compilers course
