Lecture 10
loop optimizations
• Code motion
• Strength reduction
• Unrolling
• Loop fusion
• Loop interchange
• Loop tiling
• Whole procedures?
• Whole program?
• Conditionals
• Loops
A = 4
t1 = A * B
repeat {
  t2 = t1/C
  if (t2 ≥ W) {
    M = t1 * k
    t3 = M + I
  }
  H = I
  M = t3 - H
} until (t3 ≥ 0)
1 A = 4
2 t1 = A * B
3 L1: t2 = t1 / C
4 if t2 < W goto L2
5 M = t1 * k
6 t3 = M + I
7 L2: H = I
8 M = t3 - H
9 if t3 ≥ 0 goto L3
10 goto L1
11 L3: halt
Control-flow graph of the example (figure); each line is one basic block:
  L1: t2 = t1 / C; if t2 < W goto L2
  M = t1 * k; t3 = M + I
  L2: H = I; M = t3 - H; if t3 ≥ 0 goto L3
  goto L1
  L3: halt
How do we build this automatically?
• Identifying leaders: a statement is a leader if it is the first statement, the target of a branch, or the statement immediately after a branch
• Algorithm
leaders = {1}
// leaders always include the first statement
for i = 1 to |n|
  // |n| = number of statements
  if stat(i) is a branch, then
    leaders = leaders ∪ all potential targets
    if i < |n| then leaders = leaders ∪ {i + 1}   // statement after the branch
end for
worklist = leaders
while worklist not empty do
  x = remove earliest statement in worklist
  block(x) = {x}
  for (i = x + 1; i ≤ |n| and i ∉ leaders; i++)
    block(x) = block(x) ∪ {i}
  end for
end while
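Below is a minimal C sketch of the leader rule and block formation. The `Stmt` encoding (a branch flag plus one target per statement) is a hypothetical simplification, and blocks are formed by a single linear scan instead of the worklist above; the result is the same because each block runs from its leader up to, but not including, the next leader.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STMTS 64

/* Hypothetical encoding: just what the leader algorithm needs.
   Statements are numbered 1..n; target is a statement number, 0 if none. */
typedef struct {
    bool is_branch;   /* conditional or unconditional jump */
    int target;       /* statement the branch may jump to */
} Stmt;

/* Sets leader[i] = true for every leader (stmts[0] is unused). */
void find_leaders(const Stmt *stmts, int n, bool *leader) {
    assert(n < MAX_STMTS);
    for (int i = 1; i <= n; i++) leader[i] = false;
    leader[1] = true;                          /* first statement */
    for (int i = 1; i <= n; i++) {
        if (!stmts[i].is_branch) continue;
        assert(stmts[i].target >= 1 && stmts[i].target <= n);
        leader[stmts[i].target] = true;        /* branch target */
        if (i + 1 <= n) leader[i + 1] = true;  /* statement after a branch */
    }
}

/* block_of[i] = the leader that heads statement i's basic block. */
void build_blocks(int n, const bool *leader, int *block_of) {
    int current = 1;
    for (int i = 1; i <= n; i++) {
        if (leader[i]) current = i;
        block_of[i] = current;
    }
}

/* The running example, statements 1..11. */
static const Stmt example[] = {
    {false, 0},  /* slot 0 unused */
    {false, 0},  /* 1  A = 4 */
    {false, 0},  /* 2  t1 = A * B */
    {false, 0},  /* 3  L1: t2 = t1 / C */
    {true, 7},   /* 4  if t2 < W goto L2 */
    {false, 0},  /* 5  M = t1 * k */
    {false, 0},  /* 6  t3 = M + I */
    {false, 0},  /* 7  L2: H = I */
    {false, 0},  /* 8  M = t3 - H */
    {true, 11},  /* 9  if t3 >= 0 goto L3 */
    {true, 3},   /* 10 goto L1 */
    {false, 0},  /* 11 L3: halt */
};
```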
Friday, October 26, 12
Running example
1 A = 4
2 t1 = A * B
3 L1: t2 = t1 / C
4 if t2 < W goto L2
5 M = t1 * k
6 t3 = M + I
7 L2: H = I
8 M = t3 - H
9 if t3 ≥ 0 goto L3
10 goto L1
11 L3: halt
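Leaders = {1, 3, 5, 7, 10, 11}
(1: first statement; 3, 7, 11: targets of the branches to L1, L2, L3; 5, 10: statements immediately after a branch)
Basic blocks = { {1, 2}, {3, 4}, {5, 6}, {7, 8, 9}, {10}, {11} }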
Putting edges in CFG
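• There is a directed edge from B1 to B2 if control can pass from the last statement of B1 to the first statement of B2:
  • the last statement of B1 is a (conditional or unconditional) branch to the first statement of B2, or
  • B2 immediately follows B1 in program order and B1 does not end in an unconditional branch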
(Figure: the basic blocks of the running example, including the entry block "t1 = A * B", connected by CFG edges.)
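A sketch of the edge rule applied to the running example, assuming the leaders {1, 3, 5, 7, 10, 11} and a hypothetical statement encoding (`PLAIN`, `COND_BRANCH`, `GOTO` plus a target). It adds an edge when a block's last statement branches to another block's leader, and a fall-through edge to the next block unless the last statement is an unconditional goto.

```c
#include <assert.h>
#include <stdbool.h>

#define N_STMTS 11
#define N_BLOCKS 6

typedef enum { PLAIN, COND_BRANCH, GOTO } Kind;

typedef struct {
    Kind kind;
    int target;   /* branch target statement, 0 if none */
} Stmt;

/* Running example, statements 1..11 (slot 0 unused). */
static const Stmt prog[N_STMTS + 1] = {
    {PLAIN, 0},
    {PLAIN, 0}, {PLAIN, 0}, {PLAIN, 0},               /* 1-3 */
    {COND_BRANCH, 7},                                 /* 4  if t2 < W goto L2 */
    {PLAIN, 0}, {PLAIN, 0}, {PLAIN, 0}, {PLAIN, 0},   /* 5-8 */
    {COND_BRANCH, 11},                                /* 9  if t3 >= 0 goto L3 */
    {GOTO, 3},                                        /* 10 goto L1 */
    {PLAIN, 0},                                       /* 11 L3: halt */
};

/* Leader of each block, in program order. */
static const int leaders[N_BLOCKS] = {1, 3, 5, 7, 10, 11};

/* edge[a][b] is true when the CFG has an edge from block a to block b. */
void build_edges(bool edge[N_BLOCKS][N_BLOCKS]) {
    assert(sizeof leaders / sizeof leaders[0] == N_BLOCKS);
    for (int a = 0; a < N_BLOCKS; a++)
        for (int b = 0; b < N_BLOCKS; b++)
            edge[a][b] = false;

    for (int a = 0; a < N_BLOCKS; a++) {
        /* Last statement of block a: just before the next leader. */
        int last = (a + 1 < N_BLOCKS) ? leaders[a + 1] - 1 : N_STMTS;
        const Stmt *s = &prog[last];
        /* Branch edge: the last statement jumps to the head of block b. */
        if (s->kind != PLAIN)
            for (int b = 0; b < N_BLOCKS; b++)
                if (leaders[b] == s->target) edge[a][b] = true;
        /* Fall-through edge, unless the block ends in an unconditional goto. */
        if (s->kind != GOTO && a + 1 < N_BLOCKS)
            edge[a][a + 1] = true;
    }
}
```

Block indices follow program order, so block 4 is `{10}` (the `goto L1`) and its only edge goes back to block 1, the loop header.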
Loop optimization
• it is constant
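A computation whose operands are constant, or otherwise never change inside the loop, yields the same value on every iteration; code motion hoists it into the preheader so it runs once. A minimal C sketch with hypothetical function names (the operand names echo `t1 * k` from the running example):

```c
#include <assert.h>

/* Before: t1 * k is recomputed on every iteration although neither
   operand changes inside the loop. */
void scale_before(int *out, const int *in, int n, int t1, int k) {
    for (int i = 0; i < n; i++)
        out[i] = in[i] + t1 * k;
}

/* After code motion: the invariant product is computed once, in the
   preheader, before the loop starts. */
void scale_after(int *out, const int *in, int n, int t1, int k) {
    assert(n >= 0);
    int m = t1 * k;   /* hoisted loop-invariant computation */
    for (int i = 0; i < n; i++)
        out[i] = in[i] + m;
}
```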
• Replace an expensive instruction (multiply) with a cheap one (addition)
• Applies to uses of an induction variable
• Opportunity: array indexing
Before:
  i = 0;
  L2: if (i >= 100) goto L1
      j = 4 * i + &A
      *j = 0;
      i = i + 1;
      goto L2
  L1:
After:
  i = 0; k = &A;
  L2: if (i >= 100) goto L1
      j = k;
      *j = 0;
      i = i + 1; k = k + 4;
      goto L2
  L1:
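A C sketch of the two loops above, assuming the bound 100 becomes a parameter `n` and the element size 4 becomes `sizeof(int)`; byte-pointer arithmetic mirrors `j = 4 * i + &A`. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the address of A[i] is recomputed with a multiply each iteration. */
void clear_before(int *A, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        char *j = (char *)A + sizeof(int) * (size_t)i;  /* j = 4 * i + &A */
        *(int *)j = 0;
    }
}

/* After strength reduction: k tracks the address and only needs an add. */
void clear_after(int *A, int n) {
    assert(n >= 0);
    char *k = (char *)A;          /* k = &A, set before the loop */
    for (int i = 0; i < n; i++) {
        char *j = k;              /* j = k: no multiply */
        *(int *)j = 0;
        k += sizeof(int);         /* k = k + 4 on the slide */
    }
}
```

Once `j` no longer depends on `i`, `i` is used only by the loop test, which is what makes further induction-variable cleanup possible.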
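• A variable i is a (derived) induction variable if it is defined once within the loop and its value is a linear function of some other induction variable j such that
  i = c1 * j ± c2   or   i = j/c1 ± c2
  where c1, c2 are loop invariant
• Strength reduction of i, where j is updated as j = j + c3:
  • Create a new variable i' and initialize it in the preheader: i' = c1 * j + c2
  • After each update of j, add i' = i' + c1 * c3, and replace the definition of i with i = i'
• Key: c1, c2, c3 are all loop invariant (or constant), so computations like (c1 * c3) can be moved outside the loop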
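A minimal C sketch of this transformation on a hypothetical loop where j is a basic induction variable (j = j + c3) and i = c1 * j + c2 is derived from it. The "after" version keeps i' in `ip`, initializes it before the loop, and steps it by the hoisted constant c1 * c3.

```c
#include <assert.h>

/* Before: i = c1 * j + c2 costs a multiply every iteration. */
long sum_before(int iters, long j0, long c1, long c2, long c3) {
    long sum = 0, j = j0;
    for (int t = 0; t < iters; t++) {
        long i = c1 * j + c2;
        sum += i;
        j = j + c3;
    }
    return sum;
}

/* After: i' (ip) replaces the multiply with an add. */
long sum_after(int iters, long j0, long c1, long c2, long c3) {
    assert(iters >= 0);
    long sum = 0, j = j0;
    long ip = c1 * j + c2;   /* preheader: i' = c1 * j + c2 */
    long step = c1 * c3;     /* loop invariant, computed once */
    for (int t = 0; t < iters; t++) {
        long i = ip;         /* definition of i becomes i = i' */
        sum += i;
        j = j + c3;
        ip = ip + step;      /* i' = i' + c1 * c3 after j's update */
    }
    return sum;
}
```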
• Modifying the induction variable in each iteration can be expensive
  for (i = 0; i < N; i++)
    A[i] = ...
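A sketch of unrolling by 4 in C. The slide elides the loop body, so storing a value `v` is a hypothetical stand-in; the unroll factor of 4 is also an arbitrary choice. The main loop updates and tests `i` once per four elements, and a remainder loop handles `n` not divisible by 4.

```c
#include <assert.h>

/* Rolled loop: one test and one increment of i per element. */
void fill_rolled(int *A, int n, int v) {
    for (int i = 0; i < n; i++)
        A[i] = v;
}

/* Unrolled by 4, with a remainder loop for the last n % 4 elements. */
void fill_unrolled4(int *A, int n, int v) {
    assert(n >= 0);
    int i = 0;
    for (; i + 3 < n; i += 4) {
        A[i] = v;
        A[i + 1] = v;
        A[i + 2] = v;
        A[i + 3] = v;
    }
    for (; i < n; i++)   /* at most 3 iterations */
        A[i] = v;
}
```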
(Figure: loop fusion of a loop over a[1:n] and a loop over b[1:n] into a single loop.)
• Is this always legal?
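On "Is this always legal?": no. Fusion is safe only if it does not reverse a dependence between the two loop bodies. The hypothetical C sketch below shows one legal case and one case where the second loop reads `a[i + 1]`, which the fused loop has not written yet.

```c
#include <assert.h>

#define N 8

/* Legal: loop 2 reads a[i], written by loop 1 in the same iteration. */
void separate_ok(int *a, int *b) {
    for (int i = 0; i < N; i++) a[i] = i * 2;
    for (int i = 0; i < N; i++) b[i] = a[i] + 1;
}
void fused_ok(int *a, int *b) {
    assert(a != b);   /* the reasoning assumes a and b do not alias */
    for (int i = 0; i < N; i++) {
        a[i] = i * 2;
        b[i] = a[i] + 1;
    }
}

/* Illegal: loop 2 reads a[i + 1], a value from a later iteration of loop 1. */
void separate_bad(int *a, int *b) {
    for (int i = 0; i < N; i++) a[i] = i * 2;
    for (int i = 0; i < N - 1; i++) b[i] = a[i + 1] + 1;
}
void fused_bad(int *a, int *b) {
    assert(a != b);
    for (int i = 0; i < N; i++) {
        a[i] = i * 2;
        if (i + 1 < N)
            b[i] = a[i + 1] + 1;   /* a[i + 1] not written yet: stale value */
    }
}
```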
Loop tiling
• Also called “loop blocking”
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
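The slide leaves the loop body out, so the sketch below uses a matrix transpose `B[j][i] = A[i][j]` as a hypothetical body, with tile size `T` as a parameter. The two outer loops pick a T x T tile and the two inner loops stay inside it, so the portions of A and B being touched are small enough to stay in cache; edge tiles are clipped to N.

```c
#include <assert.h>

#define N 10

/* Untiled: B is written column by column, a poor access pattern for large N. */
void transpose(double B[N][N], double A[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            B[j][i] = A[i][j];
}

/* Tiled (blocked) version of the same loop nest. */
void transpose_tiled(double B[N][N], double A[N][N], int T) {
    assert(T > 0);
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int i = ii; i < ii + T && i < N; i++)
                for (int j = jj; j < jj + T && j < N; j++)
                    B[j][i] = A[i][j];
}
```

Tiling only reorders the iterations; each (i, j) pair still executes exactly once, which the test checks for every tile size from 1 to N + 1.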
In a real (Itanium) compiler
(Figure: GFLOPS relative to -O2; bigger is better. The y-axis, "factor faster than -O2", runs from 0 to 30. Bars compare several -O settings with configurations marked "+"; the x-axis labels are otherwise not recoverable. One bar is annotated "92% of Peak Performance".)