
Basic Blocks

The document discusses the concept of basic blocks in programming, which are sequences of three-address instructions with specific entry and exit rules. It outlines methods for constructing basic blocks, identifying leaders, and optimizing code through techniques such as dead code elimination and common subexpression elimination. Additionally, it covers the representation of arrays in Directed Acyclic Graphs (DAGs) and the importance of efficient register usage in relation to variable liveness and next use information.

Uploaded by codehacker026
Copyright © All Rights Reserved

Basic Blocks:

A basic block is a maximal sequence of consecutive three-address instructions with the following characteristics:

• Flow of control can enter the basic block only through the first instruction in the block.
• There are no jumps into the middle of the block.
• Control leaves the block without halting or branching, except possibly at the last instruction in the block.

Basic blocks act as the nodes of flow graphs, whose edges indicate the flow of execution. An edge from block B1 to block B2 means that the instructions of B2 may be executed immediately after those of B1.

Steps to construct Basic Blocks:

Partitioning three-address instructions into basic blocks:

Input: a sequence of three-address instructions

Output: a list of basic blocks

Method: first identify the leaders.

A leader is the first instruction of a basic block. The block of each leader consists of the leader and all following instructions up to, but not including, the next leader or the end of the intermediate code.

Rules for identifying leaders:

1. The first three-address instruction in the intermediate code (IC) is a leader.

2. Any instruction that is the target of a conditional or unconditional jump is a leader.

3. Any instruction that immediately follows a conditional or unconditional jump is a leader.
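The three rules translate almost directly into code. The C sketch below is a minimal illustration under assumed names (`Instr`, `find_leaders` and `partition_blocks` are hypothetical): each instruction is reduced to whether it is a jump and, if so, its 1-based target, which is all the leader rules look at.

```c
#include <stdbool.h>

/* Hypothetical minimal view of a three-address instruction: the leader
   rules only need to know whether it is a jump and where it goes. */
typedef struct {
    bool is_jump;   /* conditional or unconditional goto */
    int  target;    /* 1-based target instruction when is_jump */
} Instr;

/* Sets leader[k] (k = 1..n) for every leader; leader[0] is unused, so the
   array needs n + 1 entries. */
void find_leaders(const Instr *code, int n, bool *leader) {
    for (int k = 0; k <= n; k++)
        leader[k] = false;
    if (n == 0)
        return;
    leader[1] = true;                                  /* rule 1 */
    for (int k = 1; k <= n; k++) {
        if (!code[k - 1].is_jump)
            continue;
        int tgt = code[k - 1].target;
        if (tgt >= 1 && tgt <= n)
            leader[tgt] = true;                        /* rule 2 */
        if (k < n)
            leader[k + 1] = true;                      /* rule 3 */
    }
}

/* Stores the leader of each block in block_start (in program order) and
   returns the number of blocks; block b runs from block_start[b] up to,
   but not including, the next leader. */
int partition_blocks(const bool *leader, int n, int *block_start) {
    int count = 0;
    for (int k = 1; k <= n; k++)
        if (leader[k])
            block_start[count++] = k;
    return count;
}
```

For the 17-instruction matrix example (jumps at 9, 11 and 17 with targets 3, 2 and 13), this yields leaders 1, 2, 3, 10, 12 and 13, that is, six blocks.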

Ex: the following source code sets a 10 x 10 matrix a to the identity matrix.

for i=1,10 do
    for j=1,10 do
        a[i,j]=0.0
for i=1,10 do
    a[i,i]=1.0

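Here a is a 10 x 10 matrix; each element takes 8 bytes and the matrix is stored in row-major order. With lower index l = 1 for both subscripts, n_c = 10 columns and element width w = 8 bytes, the address of a[i,j] is

$$\text{addr}(a[i,j]) = \text{base} + (i-l)\,n_c\,w + (j-l)\,w$$
$$= \text{base} + (i\,n_c + j)\,w - l\,n_c\,w - l\,w$$
$$= \text{base} + 8(10i + j) - 1\cdot 10\cdot 8 - 1\cdot 8$$
$$= \text{base} + 8(10i + j) - 88$$

In the three-address code below, a[t] denotes the location t bytes past base, so t4 = 8(10i + j) - 88 is the offset of a[i,j]. For a diagonal element a[i,i] the offset is 8(11i) - 88 = 88(i - 1), which instructions 13 and 14 compute.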
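The folded offset can be checked numerically. The two functions below are a hypothetical sketch (`offset_general` and `offset_folded` are illustrative names): the first evaluates the general row-major formula, the second the folded form 8(10i + j) - 88 used in the three-address code.

```c
/* Byte offset of a[i][j] from base for a row-major array whose subscripts
   both start at `low`, with `ncols` columns of `width` bytes each. */
long offset_general(long i, long j, long low, long ncols, long width) {
    return ((i - low) * ncols + (j - low)) * width;
}

/* The constant-folded form for low = 1, ncols = 10, width = 8. */
long offset_folded(long i, long j) {
    return 8 * (10 * i + j) - 88;
}
```

Calling `offset_general(i, j, 1, 10, 8)` and `offset_folded(i, j)` for every i, j in 1..10 gives identical results, from 0 for a[1,1] up to 792 for a[10,10].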

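Three-address code for this example (block boundaries marked in comments):

1)  i=1                   //B1
2)  j=1                   //B2
3)  t1=10*i               //B3
4)  t2=t1+j
5)  t3=8*t2
6)  t4=t3-88
7)  a[t4]=0.0
8)  j=j+1
9)  if j<=10 goto (3)
10) i=i+1                 //B4
11) if i<=10 goto (2)
12) i=1                   //B5
13) t5=i-1                //B6
14) t6=88*t5
15) a[t6]=1.0
16) i=i+1
17) if i<=10 goto (13)

The leaders are instruction 1 (rule 1); instructions 3, 2 and 13, the targets of the jumps at 9, 11 and 17 (rule 2); and instructions 10 and 12, which immediately follow jumps (rule 3). These six leaders give the six blocks B1 to B6.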

Block no  Instructions
B1        i=1
B2        j=1
B3        t1=10*i
          t2=t1+j
          t3=8*t2
          t4=t3-88
          a[t4]=0.0
          j=j+1
          if j<=10 goto B3
B4        i=i+1
          if i<=10 goto B2
B5        i=1
B6        t5=i-1
          t6=88*t5
          a[t6]=1.0
          i=i+1
          if i<=10 goto B6

Next Use information:

Machine instructions that operate on registers are faster and shorter than instructions that operate on operands in memory, so registers must be used efficiently.

If the value of a variable that is currently in a register will never be used again, that register can be assigned to some other variable. Knowing when the value of a variable is used is therefore essential:

We want to keep a variable in a register as long as its value will be needed, to avoid reloading it whenever it is used.

When a variable's value is no longer needed, we free its register and reuse it for other variables.

So we must know whether a particular value will be used later in the basic block. If, after computing a value x, we will soon use x again, we keep it in a register; if x has no further use in the block, its register can be reused for another variable.

Applications of next use:

Next-use information is used to assign space for temporaries and to assign registers to variables.

Suppose instruction i assigns a value to x and instruction j uses that value as an operand:

i: x=10
j: y=x+5

If control can flow from i to j along a path on which x is not reassigned, then we say that:

j uses the value of x computed at i;
x is live at statement i (and at every point on that path up to j);
the next use of x after i is j.

This liveness and next-use information has to be computed for every three-address statement in the block.

Algorithm:

Input: a basic block B

Assumption: initially (at the exit of B) the symbol table (ST) marks all non-temporary (program) variables as live on exit and all temporaries as dead.

Output: at each statement i: x=y+z in B, we attach to i the liveness and next-use information of x, y and z.

Method: we start with the last statement in B and scan backwards to the beginning of B. At each statement i: x=y+z in B we do the following:

i. Attach to statement i the information currently in the ST regarding the next use and liveness of x, y and z.
ii. In the ST, set x to "not live" and "no next use".
iii. In the ST, set y and z to "live" and their next use to i.

Here + stands for any operator; statements of the form x=op y or x=y are handled the same way, ignoring z. Steps ii and iii must be done in this order, because x may also be one of the operands y or z.

Example block (x, y and z are live on exit; t is not):

1) x=y+z
2) t=x+y
3) z=x+t
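The backward scan can be sketched in C as follows. This is an illustration under assumptions, not a definitive implementation: variables are single lowercase letters, `Stmt`, `Info`, `Attached` and `next_use` are hypothetical names, and a `next_use` of 0 means "no next use in the block", matching the L(0) notation used in the tables of these notes.

```c
#include <stdbool.h>

#define NVARS 26   /* variables are single lowercase letters 'a'..'z' */

/* One statement x = y op z; y or z is '\0' when absent (e.g. x = y). */
typedef struct { char x, y, z; } Stmt;

/* Liveness and next use of one variable; next_use == 0 means no next use
   inside the block (L(0) when live, D when dead). */
typedef struct { bool live; int next_use; } Info;

/* Information attached to one statement for its three names. */
typedef struct { Info x, y, z; } Attached;

static int idx(char v) { return v - 'a'; }

/* Statements are numbered 1..n; live_on_exit gives the initial ST state. */
void next_use(const Stmt *b, int n, const bool live_on_exit[NVARS],
              Attached *out) {
    Info table[NVARS];
    for (int v = 0; v < NVARS; v++)
        table[v] = (Info){ live_on_exit[v], 0 };

    for (int i = n; i >= 1; i--) {
        const Stmt *s = &b[i - 1];
        Info none = { false, 0 };
        /* step i: attach the current ST information to statement i */
        out[i - 1].x = table[idx(s->x)];
        out[i - 1].y = s->y ? table[idx(s->y)] : none;
        out[i - 1].z = s->z ? table[idx(s->z)] : none;
        /* step ii: x becomes dead with no next use (before step iii) */
        table[idx(s->x)] = (Info){ false, 0 };
        /* step iii: y and z become live with next use i */
        if (s->y) table[idx(s->y)] = (Info){ true, i };
        if (s->z) table[idx(s->z)] = (Info){ true, i };
    }
}
```

Run on the example block with x, y and z live on exit, statement 1 receives x = L(2), y = L(2), z = D, and statement 2 receives t = L(3), x = L(3), y = L(0).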

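First approach:

We start at the end of the block (just after statement 3) and record the symbol-table state after processing each statement (L = live, D = dead, - = no next use in the block):

Point            Liveness      Next use
                 x  y  z  t    x  y  z  t
Initial (exit)   L  L  L  D    -  -  -  -
After stmt 3     L  L  D  L    3  -  -  3
After stmt 2     L  L  D  D    2  2  -  -
After stmt 1     D  L  L  D    -  1  1  -

Next-use information attached to each statement (the state just before the statement is processed):

Stmt           x  y  z  t
1) x=y+z       2  2  -  -
2) t=x+y       3  -  -  3
3) z=x+t       -  -  -  -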

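Second approach (L(n) = live with next use at line n, L(0) = live but no next use in the block, D = dead):

Variable  Initial  After line 3  After line 2  After line 1
                   z=x+t         t=x+y         x=y+z
x         L(0)     L(3)          L(2)          D
y         L(0)     L(0)          L(2)          L(1)
z         L(0)     D             D             L(1)
t         D        L(3)          D             D

Information attached to each line:

Line  Statement  Attached information
1     x=y+z      x=L(2), y=L(2), z=D; t=D
2     t=x+y      t=L(3), x=L(3), y=L(0); z=D
3     z=x+t      z=L(0), x=L(0), t=D; y=L(0)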

Point           Status                            Remarks
Before stmt 1   x: D, y: L(1), z: L(1), t: D      The next use says y and z will be used in stmt 1, so do not empty the registers assigned to y and z
1) x=y+z
Before stmt 2   t: D, x: L(2), y: L(2), z: D      The next use says x and y will be used in stmt 2, so do not empty the registers assigned to x and y
2) t=x+y
Before stmt 3   z: D, x: L(3), t: L(3), y: L(0)   The next use says x and t will be used in stmt 3, so do not empty the registers assigned to x and t
3) z=x+t
Exit (0)        z: L(0), x: L(0), t: D, y: L(0)

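Flow Graphs: used for data-flow analysis

Once the intermediate code is partitioned into basic blocks, we represent the flow of control between them by a flow graph.

Nodes are the basic blocks.

Edges denote the flow of control: there is an edge from B1 to B2 if B1 ends with a (conditional or unconditional) jump to the leader of B2, or if B2 immediately follows B1 in the original order and B1 does not end in an unconditional jump.

If there is an edge from B1 to B2, then B2 is a successor of B1 and B1 is a predecessor of B2.

Two extra nodes, entry and exit, are added: entry has an edge to the first block, and every block whose last instruction can be the last one executed has an edge to exit.

In the flow graph, jumps to specific instructions are replaced by jumps to block numbers (e.g. "goto (3)" becomes "goto B3"). After the flow graph is constructed, the instructions inside blocks may change substantially; if jumps referred to instruction numbers, their targets would have to be fixed every time a target instruction changed.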
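The edge rule can be sketched in C once each block has been summarized by how its last instruction ends. The `BlockEnd` struct, `build_edges` and `EXIT_NODE` are hypothetical names chosen for this illustration; blocks are indexed from 0, so B1 is 0, and the entry edge (entry to block 0) is left implicit.

```c
#include <stdbool.h>

#define MAX_SUCC 2      /* a block ends in at most a jump plus a fall-through */
#define EXIT_NODE (-1)  /* stands for the extra exit node */

/* Hypothetical summary of a block's last instruction. */
typedef struct {
    bool ends_in_jump;  /* last instruction is a goto */
    bool conditional;   /* ...and it is an "if ... goto" */
    int  target;        /* 0-based index of the block the jump goes to */
} BlockEnd;

/* Fills succ[b][0 .. nsucc[b]-1] with the successors of every block b. */
void build_edges(const BlockEnd *blk, int nblocks,
                 int succ[][MAX_SUCC], int *nsucc) {
    for (int b = 0; b < nblocks; b++) {
        nsucc[b] = 0;
        if (blk[b].ends_in_jump)                       /* jump edge */
            succ[b][nsucc[b]++] = blk[b].target;
        if (!blk[b].ends_in_jump || blk[b].conditional) /* fall-through */
            succ[b][nsucc[b]++] = (b + 1 < nblocks) ? b + 1 : EXIT_NODE;
    }
}
```

For the matrix example this gives B3 -> {B3, B4}, B4 -> {B2, B5} and B6 -> {B6, exit}, with B1 -> B2, B2 -> B3 and B5 -> B6 as plain fall-throughs.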

Loops:

Since most of the execution time of a program is spent in loops, it is important to generate efficient code for them.

A set of nodes L in a flow graph is a loop if L contains a node e, called the loop entry, such that:

1. e is not the entry node of the flow graph;
2. no node in L other than e has a predecessor outside L;
3. every node in L has a nonempty path, completely within L, to e.

In the example flow graph the loops are: B3 by itself, B6 by itself, and {B2, B3, B4} (with loop entry B2).

Optimization of Basic Blocks:


We can improve the running time of the code by performing local optimization within each basic block by itself.

Advantages:

Executes faster

Efficient memory usage

Optimization Techniques:

Dead code elimination

Common subexpression elimination

Compile time evaluation: constant folding, constant propagation

Strength reduction

Code movement

DAG representation of Basic Blocks:

Basic block:

a=b+c
b=a-d
c=b+c
d=a-d

[Figure: DAG for this basic block]

We can apply many local optimization techniques by constructing a DAG (directed acyclic graph) from a basic block.

Procedure:

• There is a node in the DAG for each of the initial values of the variables appearing in the basic block (BB).
• There is a node N associated with each statement s within the block. The children of N are the nodes corresponding to the statements that are the last definitions, prior to s, of the operands used by s.
• Node N is labeled by the operator applied at s, and attached to N is the list of variables for which it is the last definition within the block.
• Certain nodes are designated output nodes. These are the nodes whose variables are live on exit from the block, that is, their values may be used later in another block of the flow graph.
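The procedure can be sketched in C with a node table and a "current node" for each variable, reusing an existing node when one with the same operator and children already exists. This is a minimal sketch under assumptions: `Node`, `Dag`, `dag_init` and `dag_assign` are hypothetical names, variables are single letters, only binary statements x=y op z are handled, operand order is not normalized (so b+c and c+b get different nodes), and there is no bounds check on `MAX_NODES`.

```c
#include <stdbool.h>

#define NVARS 26
#define MAX_NODES 64

/* Leaf for a variable's initial value (op == 0) or interior op(left, right). */
typedef struct { char op; int left, right; char leaf_var; } Node;

typedef struct {
    Node nodes[MAX_NODES];
    int  count;
    int  cur[NVARS];   /* node currently labeled by each variable, -1 if none */
} Dag;

void dag_init(Dag *d) {
    d->count = 0;
    for (int v = 0; v < NVARS; v++) d->cur[v] = -1;
}

/* Node holding v's current value; creates the leaf v0 on first use. */
static int value_of(Dag *d, char v) {
    int k = v - 'a';
    if (d->cur[k] < 0) {
        d->nodes[d->count] = (Node){ 0, -1, -1, v };
        d->cur[k] = d->count++;
    }
    return d->cur[k];
}

/* Processes x = y op z: reuses a node with the same operator and children
   (a local common subexpression) or creates one, then moves label x to it. */
int dag_assign(Dag *d, char x, char y, char op, char z) {
    int l = value_of(d, y), r = value_of(d, z);
    int n = -1;
    for (int k = 0; k < d->count; k++)
        if (d->nodes[k].op == op && d->nodes[k].left == l && d->nodes[k].right == r) {
            n = k;
            break;
        }
    if (n < 0) {
        d->nodes[d->count] = (Node){ op, l, r, 0 };
        n = d->count++;
    }
    d->cur[x - 'a'] = n;   /* x's last definition is now node n */
    return n;
}
```

For the block a=b+c, b=a-d, c=b+c, d=a-d, the fourth statement finds the node created for b=a-d, so b and d end up labeling the same node, while c=b+c gets a new node because b was redefined in between.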

We can perform the following on the DAG representation of a BB:

• We can eliminate local common subexpressions.
• We can eliminate dead code.
• We can reorder statements that do not depend on one another; such reordering may reduce the time a temporary value needs to be preserved in a register (e.g. a=b+c, e=d+f, b=b+d).
• We can apply algebraic laws to reorder the operands of three-address instructions and thereby simplify the computation.

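Finding local common subexpressions:

TAC

1) a=b+c
2) b=a-d
3) c=b+c
4) d=a-d

Statements 2 and 4 both compute a-d, and neither a nor d changes between them, so they compute the same value and one of them can be removed. (Statements 1 and 3 are not common subexpressions, because b is redefined in statement 2.) Since d must hold the value anyway, we keep the computation into d and, if b is not live on exit (not used later), eliminate b. The new TAC is:

a=b+c
d=a-d
c=d+c

If b is live on exit, an extra copy statement is added:

b=d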

ii) When looking for common subexpressions, we are really looking for expressions that are guaranteed to compute the same value, no matter how that value is computed.

In the block below, statements 1 and 4 compute the same value, b0+c0, even though on the surface they look different (statements 2 and 3 modify b and c):

a=b+c
b=b-d
c=c+d
e=b+c

[Figure: DAG without optimization]

Consider e=b+c. From the DAG,

b = b0-d0,  c = c0+d0

Replacing b and c in e by these values:

$$e = (b_0 - d_0) + (c_0 + d_0) = b_0 + c_0 = a$$

so a and e can be represented by the same node. This is detected only when algebraic identities are applied to the DAG.

Dead Code Elimination:

Source program with dead code:

i=0;
while(i>0)
{
a++;b++;
}

After dead code elimination:

i=0;

Since i is 0 when the loop condition is tested, i>0 is false, the loop body can never execute, and the whole loop is dead code.

On the DAG, we delete any root (a node with no ancestors) that has no live variables attached. Applied repeatedly, this removes all nodes of the DAG that correspond to dead code.

For example, in the DAG for a=b+c, b=b-d, c=c+d, e=b+c above, assume a and b are live on exit but c and e are not. We can immediately remove the root labeled e. Then the node labeled c becomes a root and can be removed. The roots labeled a and b remain, since they have live variables attached.

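Ex (assume a and d are live on exit):

a=a+d
a=d
d=d+c
d=a+d
d=c

After removing the dead code:

a=d
d=c

a=a+d is dead because a is reassigned by a=d before being used; d=a+d is dead because d is reassigned by d=c before being used; and then d=d+c is dead because its only use was in d=a+d.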
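The same result can be obtained without building the DAG, by a backward scan that reuses the liveness reasoning of the next-use algorithm. The C sketch below is that swapped-in technique (a backward liveness scan over the statement list, not the DAG root deletion described above). `Stmt` and `eliminate_dead` are hypothetical names, variables are single letters, and statements are assumed to have no effect other than assigning x, so array stores or calls would need extra handling.

```c
#include <stdbool.h>

#define NVARS 26

/* x = y op z, with y or z set to '\0' when absent (e.g. x = y). */
typedef struct { char x, y, z; } Stmt;

/* Marks keep[i] for statements whose result may still be needed, scanning
   backward from the block's exit. A statement whose target is dead at that
   point is dead code and does not make its operands live. Returns the
   number of statements kept. */
int eliminate_dead(const Stmt *b, int n, const bool live_on_exit[NVARS],
                   bool *keep) {
    bool live[NVARS];
    for (int v = 0; v < NVARS; v++)
        live[v] = live_on_exit[v];
    int kept = 0;
    for (int i = n - 1; i >= 0; i--) {
        keep[i] = live[b[i].x - 'a'];
        if (!keep[i])
            continue;                     /* dead: operands stay as they are */
        kept++;
        live[b[i].x - 'a'] = false;       /* x is (re)defined here */
        if (b[i].y) live[b[i].y - 'a'] = true;
        if (b[i].z) live[b[i].z - 'a'] = true;
    }
    return kept;
}
```

On the five-statement example with a, c and d live on exit, only statements 2 (a=d) and 5 (d=c) are kept.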

Use of Algebraic identities:

x+0 = 0+x = x        x-0 = x
x*1 = 1*x = x        x/1 = x

Local reduction in strength:

Expensive    Cheaper
x^2          x*x
2*x          x+x
x/2          x*0.5
Constant folding

Here we evaluate constant expressions at compile time and replace the constant expressions by their values.

TAC          Modified TAC
2*3.142*r    6.284*r
22/7*r*r     3.142857*r*r   (22/7 evaluated in real arithmetic, rounded here)

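Constant Propagation

If a variable is known to hold a constant value, its later uses are replaced by that constant.

Ex
x = 3;
y = x + y;

becomes

x = 3;
y = 3 + y;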

int x = 14;
int y = 7 - x / 2;
return y * (28 / x + 2);

Propagating x yields:

int x = 14;
int y = 7 - 14 / 2;
return y * (28 / 14 + 2);
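Constant folding applied after this propagation gives y = 7 - 7 = 0 and a return value of 0 * (2 + 2) = 0. The C sketch below combines both steps in one forward pass over a basic block. It is an illustration under assumptions: `Operand`, `Stmt`, `var`, `con` and `propagate_and_fold` are hypothetical names, values are `long` integers with C's truncating division, and the temporaries p, q, s, r are illustrative names for the intermediate results of the example.

```c
#include <stdbool.h>

#define NVARS 26   /* variables are single letters 'a'..'z' */

/* An operand is an integer constant or a variable. */
typedef struct { bool is_const; long value; char var; } Operand;

/* x = y op z with op one of + - * / ; op '=' means the copy x = y. */
typedef struct { char x; Operand y; char op; Operand z; } Stmt;

Operand var(char c) { return (Operand){ false, 0, c }; }
Operand con(long v) { return (Operand){ true, v, 0 }; }

void propagate_and_fold(Stmt *b, int n) {
    bool known[NVARS] = { false };
    long val[NVARS] = { 0 };
    for (int i = 0; i < n; i++) {
        Stmt *s = &b[i];
        Operand *ops[2] = { &s->y, &s->z };
        for (int k = 0; k < 2; k++)                 /* propagation */
            if (!ops[k]->is_const && ops[k]->var && known[ops[k]->var - 'a'])
                *ops[k] = con(val[ops[k]->var - 'a']);
        int xi = s->x - 'a';
        known[xi] = false;          /* x is not a known constant ... */
        bool all_const = s->y.is_const && (s->op == '=' || s->z.is_const);
        if (!all_const || (s->op == '/' && s->z.value == 0))
            continue;               /* ... unless the statement folds */
        long l = s->y.value, r = s->z.value, result;
        switch (s->op) {            /* folding */
        case '=': result = l;     break;
        case '+': result = l + r; break;
        case '-': result = l - r; break;
        case '*': result = l * r; break;
        case '/': result = l / r; break;
        default:  continue;         /* unknown operator: leave unchanged */
        }
        s->op = '=';                /* statement becomes x = constant */
        s->y = con(result);
        s->z = con(0);
        known[xi] = true;
        val[xi] = result;
    }
}
```

Written as TAC, the example is x=14, p=x/2, y=7-p, q=28/x, s=q+2, r=y*s; after the pass every statement has become a copy of a constant, with y and r both equal to 0. For x=3, y=x+y only propagation applies, giving y=3+y.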

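Loop collapsing

int a[100][300];

for (i = 0; i < 300; i++)
    for (j = 0; j < 100; j++)
        a[j][i] = 0;

Here is the code fragment after the loop has been collapsed:

int a[100][300];
int *p = &a[0][0];

for (i = 0; i < 30000; i++)
    *p++ = 0;

Both versions set all 100 x 300 = 30000 elements to 0; the collapsed version uses a single loop and visits the elements in the order in which they are stored in memory.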

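Loop invariant (code movement)

while(i<10)
{
a=10;
i=i+1;
}

After modification (a=10 computes the same value in every iteration, so it is moved out of the loop):

a=10;
while(i<10)
{
i=i+1;
}

This move is safe only if the loop body executes at least once or a is not used after the loop; otherwise, when i>=10 initially, the original code leaves a unchanged.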

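Associative laws might expose common subexpressions:

Source program    TAC
a=b+c             a=b+c
e=c+d+b           t=c+d
                  e=t+b

If t is not needed outside the block, we can use the associativity and commutativity of + (c+d+b = (b+c)+d = a+d) to change the sequence to

a=b+c
e=a+d

The compiler writer has to examine the language reference manual to determine which rearrangements are permitted. For example, Fortran permits x*y-x*z to be evaluated as x*(y-z), but it does not permit a+(b-c) to be evaluated as (a+b)-c, because the compiler must respect the order of evaluation specified by parentheses.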

Representation of array references:

Array indexing cannot be treated like other operators:

TAC       Looks like (incorrect)   Remarks
x=a[i]    x=a[i]                   j could be equal to i, in which case the middle
a[j]=y    a[j]=y                   statement in fact changes a[i]; replacing the
z=a[i]    z=x                      last statement by z=x is therefore not correct

The proper way of representing array accesses in a DAG is as follows:

For x=a[i], create a node with operator =[] and two children representing the initial value of array a (a0) and the index i. Variable x becomes a label of this new node.

For a[j]=y, create a new node with operator []= and three children representing a0, j and y. No variable labels this node.

A []= node kills all currently constructed nodes whose value depends on a0.

A node that has been killed cannot receive any more labels, that is, it cannot become a common subexpression.

Sometimes a node must be killed even though none of its children has an array such as a0 attached. Likewise, a []= node can kill if it has a descendant that is an array, even though none of its children are array nodes. Example:

b=12+a
x=b[i]
b[j]=y
z=b[i]

For efficiency, b represents a position inside array a: if elements are 4 bytes long (offsets 0, 4, 8, 12, ...), b = a+12 is the address of the 4th element of a. If i and j represent the same value, b[i] and b[j] are the same location. Therefore the node for the third statement, b[j]=y, must kill the node to which x is attached, even though the children of these nodes are b, i, j and y rather than a0; a0 is a descendant through b.
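The danger in treating a[i] as a common subexpression across a[j]=y is easy to observe in running code. The two C functions below are hypothetical helpers (`z_naive`, `z_reload` are illustrative names) that both compute z for the sequence x=a[i]; a[j]=y; z=a[i]: one with the incorrect shortcut z=x, one re-reading a[i] as the kill rule requires.

```c
/* Incorrect translation: treats the second a[i] as a common
   subexpression of the first and returns z = x. */
int z_naive(int *a, int i, int j, int y) {
    int x = a[i];
    a[j] = y;
    return x;                 /* z = x */
}

/* Correct translation: the store a[j] = y kills the node for x = a[i],
   so a[i] must be read again. */
int z_reload(int *a, int i, int j, int y) {
    int x = a[i];
    (void)x;                  /* x is computed but not needed for z */
    a[j] = y;
    return a[i];              /* z = a[i] */
}
```

With a = {10, 20, 30, 40}, i = j = 2 and y = 99, `z_naive` returns the stale 30 while `z_reload` returns 99; with i != j both return 30, which is why the shortcut looks plausible but cannot be applied without knowing i != j.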

Reassembling BBs from DAGs

After optimizing the DAG, we may reconstitute the TAC for the BB from which we built the DAG.

For each node that has one or more attached variables, we construct a TAC that computes the value of one of those variables.

We prefer to compute the result into a variable that is live on exit from the block.

However, if global live-variable information is not available, we assume that every program variable is live on exit, except temporaries.

If more than one live variable is attached to a node, we have to introduce copy statements to give the correct value to each of these live variables (global optimization may later remove the redundancy).

Original TAC    Optimized TAC    Remarks
a=b+c           a=b+c            If b is not live on exit from the block, this optimized
b=a-d           d=a-d            TAC is valid. d is used in place of b, since the
c=b+c           c=d+c            optimized block never computes b.
d=a-d

                a=b+c            If both b and d are live on exit, or we are not sure
                d=a-d            about liveness, we need to compute b as well as d. This
                b=d              is still more efficient, because a copy replaces a
                c=d+c            subtraction (global data-flow analysis may later remove
                                 the copy).

While reconstructing a BB from a DAG we need to worry not only about which variables will hold the values but also about the order of computation.

The rules are:

• The order of instructions must respect the order of nodes in the DAG; that is, we cannot compute a node's value until we have computed a value for each of its children.
• Assignments to an array must follow all previous assignments to, or evaluations from, the same array, according to the order of these instructions in the original BB.
• Evaluations of array elements must follow any previous assignments to the same array, according to the order in the original BB.

Original BB     BB reconstructed from the DAG
a=b+c           a=b+c
b=b-d           b=b-d
c=c+d           c=c+d
e=b+c           e=a

A simple code generator:

It generates code for a single basic block.

It keeps track of which values are in which registers, so that it can avoid unnecessary loads and stores.

Its primary objective is efficient use of registers.

Uses of registers

• Some or all of the operands of an operation must be in registers.
• Registers make good temporaries (to hold the result of a subexpression or the value of a variable).
• Registers are used to help with run-time storage management, such as managing the run-time stack.
• Registers are used to hold global values.

But registers are limited.

Assumptions:

Some set of registers is available to hold values used within the block.

Not all registers are available (some are reserved for global variables and stack management).


Assumptions:

Each IC operator has exactly one corresponding machine instruction, which takes its operands in registers, performs the operation and leaves the result in a register.

Machine instructions are of the form

LD reg,mem            /* loads the contents of mem into reg */

ST mem,reg            /* stores the contents of reg into mem */

OP reg1,reg2,reg3     /* reg1 is the destination register holding the result; reg2 and reg3 hold the two operands */

Register & Address descriptors:

Our code generation algorithm considers each three-address instruction (TAI) in turn and

determines what loads are necessary to get the needed operands into registers (loads);

after generating the loads, generates the machine instruction for the operation itself (operation);

if the result must be stored into a memory location, also generates the store (stores).

In order to make these decisions, we need data structures that tell us

which variable values are in which registers, and

whether the memory location of a variable holds its latest value (the latest value may be in a register and not yet stored back to memory).

Two descriptors store this information:

Register Descriptors (RD): for each available register, the RD keeps track of the variable names whose current value is in that register.

Initially all register descriptors are empty; as code generation progresses, each register holds the value of zero or more names.

Address Descriptors (AD): for each program variable, the AD keeps track of the location(s) where the current value of that variable can be found.

A location may be a register, a memory address, a stack location, or a combination of these. This information can be stored in the ST against the variable name.

Code Generation Algorithm:

The algorithm uses a function getReg(I).

It selects registers for each memory location associated with the three-address instruction I.

The function has access to the register and address descriptors of all the variables of the BB.

It may also have access to useful data-flow information, such as which variables are live on exit from the block.

Machine Instructions for operations:

For a TAI of the form x=y+z, do the following:
1. Use getReg(x=y+z) to select registers for x, y and z; call these Rx, Ry and Rz.
2. If y is not in Ry (according to the RD for Ry), issue LD Ry,y', where y' is one of the memory locations for y (according to the AD of y).
3. Similarly for z: if z is not in Rz, issue LD Rz,z'.
4. Issue the instruction ADD Rx,Ry,Rz.

Machine Instructions for copy statements: for a TAI of the form x=y, we assume that getReg always chooses the same register for both x and y.

If y is not already in Ry, issue LD Ry,y'.

If y is already in Ry, we do nothing, except adjust the register descriptor for Ry so that it includes x as one of the values found there.

Ending the BB:

At the end of the block, variables used by the block may wind up with their only location being a register.

If the variable is a temporary used only within the block, that is fine: when the block ends, we can ignore the temporary and assume its register is empty.

If the variable is live on exit from the block, or if we do not know which variables are live on exit, we must assume that its value is needed later.

In that case, for each variable x whose address descriptor does not say that its value is in the memory location for x, we must generate

ST x,R

where R is a register holding the value of x at the end of the block.

Managing Register & Address Descriptors:

As the code generation algorithm issues load, store and other machine instructions, it needs to update the register and address descriptors.

The rules are as follows:

1. For the instruction LD R,x:
a) change the RD for register R so that it holds only x;
b) change the AD for x by adding register R as an additional location.

2. For ST x,R: change the AD for x to include its own memory location.

3. For ADD Rx,Ry,Rz implementing x=y+z (e.g. d=v+u):
a) change the RD for register Rx so that it holds only x;
b) change the AD for x so that its only location is Rx (not even the memory location of x);
c) remove Rx from the AD of any variable other than x.

4. For a copy statement x=y: after generating the load of y into register Ry, if needed, and updating the descriptors as for a load (the RD for Ry holds only y, the AD for y includes Ry, and Ry is removed from the AD of any variable other than y):
a) add x to the RD for Ry;
b) change the AD for x so that its only location is Ry (not even the memory location of x).
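The four rules can be sketched in C with bit sets: each register descriptor is a set of variables, and each address descriptor is a set of registers plus an "in memory" flag. This is a minimal sketch under assumptions: `Desc`, `desc_init`, `on_load`, `on_store`, `on_op` and `on_copy` are hypothetical names, there are three registers (R1..R3 as indices 0..2), and the helper `forget`, which also removes x from the RD of every other register when x gets a new value, is an assumption added to keep both descriptors consistent with rules 3b and 4b.

```c
#include <stdbool.h>

#define NREGS 3    /* R1..R3 are indices 0..2 */
#define NVARS 26   /* variables 'a'..'z' are indices 0..25 */

typedef struct {
    unsigned rd[NREGS];     /* RD: bit v set if the register holds v */
    unsigned in_reg[NVARS]; /* AD: bit r set if v is in register r */
    bool     in_mem[NVARS]; /* AD: v's own memory location is current */
} Desc;

static unsigned bit(int k) { return 1u << k; }

/* At block entry all registers are empty and every variable is in memory. */
void desc_init(Desc *d) {
    for (int r = 0; r < NREGS; r++) d->rd[r] = 0;
    for (int v = 0; v < NVARS; v++) { d->in_reg[v] = 0; d->in_mem[v] = true; }
}

/* Register r now holds only v: drop r from every other variable's AD. */
static void reg_holds_only(Desc *d, int r, int v) {
    for (int w = 0; w < NVARS; w++)
        if (w != v) d->in_reg[w] &= ~bit(r);
    d->rd[r] = bit(v);
}

/* x receives a new value: none of its old locations stays valid. */
static void forget(Desc *d, int x) {
    for (int r = 0; r < NREGS; r++) d->rd[r] &= ~bit(x);
    d->in_reg[x] = 0;
    d->in_mem[x] = false;
}

void on_load(Desc *d, int r, int x) {      /* rule 1: LD R,x */
    reg_holds_only(d, r, x);
    d->in_reg[x] |= bit(r);
}

void on_store(Desc *d, int x) {            /* rule 2: ST x,R */
    d->in_mem[x] = true;
}

void on_op(Desc *d, int rx, int x) {       /* rule 3: OP Rx,Ry,Rz for x=y op z */
    forget(d, x);
    reg_holds_only(d, rx, x);
    d->in_reg[x] = bit(rx);
}

void on_copy(Desc *d, int ry, int x) {     /* rule 4: x=y with y already in Ry */
    forget(d, x);
    d->rd[ry] |= bit(x);
    d->in_reg[x] = bit(ry);
}
```

Replaying the machine code of the worked example below (t=a-b, u=a-c, v=t+u, a=d, d=v+u, then the two stores) ends with R1 holding d, R2 holding a and R3 holding v, with a and d back in memory, matching the last row of its table.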
TAC (t, u, v are temporaries local to the block; a, b, c, d are variables live on exit from the block):

1) t=a-b
2) u=a-c
3) v=t+u
4) a=d
5) d=v+u

Liveness & Next Use Computation (scanning backward from line 5):

Variable  Initial  After line 5  After line 4  After line 3  After line 2  After line 1
                   d=v+u         a=d           v=t+u         u=a-c         t=a-b
a         L(0)     L(0)          D             D             L(2)          L(1)
b         L(0)     L(0)          L(0)          L(0)          L(0)          L(1)
c         L(0)     L(0)          L(0)          L(0)          L(2)          L(2)
d         L(0)     D             L(4)          L(4)          L(4)          L(4)
t         D        D             D             L(3)          L(3)          D
u         D        L(5)          L(5)          L(3)          D             D
v         D        L(5)          L(5)          D             D             D

Next use & liveness attached to each line:

Line  Statement  Attached information
1     t=a-b      t=L(3), a=L(2), b=L(0)
2     u=a-c      u=L(3), a=D, c=L(0)
3     v=t+u      v=L(5), t=D, u=L(5)
4     a=d        a=L(0), d=D
5     d=v+u      d=L(0), v=D, u=D

                              Register descriptor  | Address descriptor
Instruction  Target code      R1   R2    R3        | a     b  c     d     t   u   v
(start)                       -    -     -         | a     b  c     d     -   -   -
t=a-b        LD R1,a
             LD R2,b
             SUB R2,R1,R2     a    t     -         | a,R1  b  c     d     R2  -   -
u=a-c        LD R3,c
             SUB R1,R1,R3     u    t     c         | a     b  c,R3  d     R2  R1  -
v=t+u        ADD R3,R2,R1     u    t     v         | a     b  c     d     R2  R1  R3
a=d          LD R2,d          u    a,d   v         | R2    b  c     d,R2  -   R1  R3
d=v+u        ADD R1,R3,R1     d    a     v         | R2    b  c     R1    -   -   R3
exit         ST a,R2
             ST d,R1          d    a     v         | a,R2  b  c     d,R1  -   -   R3

Remarks:

t=a-b: R2, which held b, is used to store the result t, since b has no further use in the block. R1 (a) is not chosen because a is still needed in line 2.
u=a-c: R1 is reused for u because a has no further use in the block and its value exists in location a. R3 (c) could also have been chosen; R2 (t) has a next use.
v=t+u: R3 is reused for v since c is no longer needed in the block. R2 (t) could also have been chosen; R1 (u) is not available.
a=d: d is not in a register, so it is loaded into R2, which held t (no longer needed). R1 (u) and R3 (v) are not available.
d=v+u: R1 is reused for the new d since u is a temporary and no longer needed. R3 (v) could also have been chosen; R2 (a) is not available.
exit: a and d are live on exit but their values are not in memory, so they are stored back.
Design Of Function getReg:

Consider x=y+z.

We have to pick registers for y and z; the issues are the same for both, so consider the selection of Ry:

i. If y is currently in a register, pick a register already containing y as Ry. No LD instruction is needed (Ry may contain y because of a previous computation).
ii. If y is not in a register, but there is a register that is currently empty, pick one such register as Ry.
iii. The difficult case occurs when y is not in a register and no register is empty. We need to pick one of the allowable registers anyway.
Let R be a candidate register and v one of the variables that, according to the RD for R, R currently holds.
We need to make sure that either v's value is not really needed, or there is somewhere else we can go to get the value of v. The possibilities are:
a) If the AD for v says that v is somewhere besides R, then we are OK.
b) If v is x, the value being computed by the instruction, and x is not also the other operand z, then we are OK, because this value of x is never going to be used again. (If x were also z, as in x=y+x, the old value of x would still be needed.)
c) If v is not used later (there are no further uses of v after the instruction, and if v is live on exit from the block, v is recomputed within the block), then we are OK.
d) If none of the first three cases applies, we must generate the store instruction ST v,R to place a copy of v in its own memory location. This operation is called a spill.
Since R may hold several variables at the moment, we repeat this step for each such variable v. At the end, R's score is the number of store instructions we need to generate. Pick a register with the lowest score.

Now consider the selection of Rx. The issues are almost the same; the differences are:

a) Since a new value of x is being computed, a register that holds only x is always an acceptable choice for Rx. This holds even if x is one of y and z, because the operand value of x is read before the new value is stored into Rx, and our machine instructions allow two registers in one instruction to be the same (e.g. ADD R1,R1,R2 for x=x+z).

b) If y is not used after the instruction (in the sense described for v in case iii c), and Ry holds only y after being loaded if necessary, then Ry can also be used as Rx. A similar option holds for z and Rz.
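The selection of Ry can be sketched in C as follows. This is an illustration under assumptions, not the notes' exact function: `State`, `spill_score` and `choose_ry` are hypothetical names, descriptors are bit sets as in a register/address descriptor representation, there are three registers, and the bit set `used_later` is assumed to be computed beforehand from the next-use information (bit v set when v is used after the current instruction, counting "live on exit and not recomputed" as a use). Ties go to the lowest-numbered register.

```c
#include <stdbool.h>

#define NREGS 3
#define NVARS 26

typedef struct {
    unsigned rd[NREGS];      /* RD: bit v set if the register holds v */
    unsigned in_reg[NVARS];  /* AD: bit r set if v is in register r */
    bool     in_mem[NVARS];  /* AD: v's own memory location is current */
    unsigned used_later;     /* bit v set if v is needed after this instruction */
} State;

/* Number of spills (ST v,R) needed before register r can be reused for an
   operand of x = y op z. */
static int spill_score(const State *s, int r, int x, int z) {
    int score = 0;
    for (int v = 0; v < NVARS; v++) {
        if (!(s->rd[r] & (1u << v)))
            continue;
        bool elsewhere = s->in_mem[v] || (s->in_reg[v] & ~(1u << r)); /* case a */
        bool is_target = (v == x && v != z);                           /* case b */
        bool dead = !(s->used_later & (1u << v));                      /* case c */
        if (!elsewhere && !is_target && !dead)
            score++;                                                   /* case d */
    }
    return score;
}

/* Picks Ry for operand y of x = y op z. */
int choose_ry(const State *s, int x, int y, int z) {
    for (int r = 0; r < NREGS; r++)          /* i: y is already in r */
        if (s->rd[r] & (1u << y))
            return r;
    for (int r = 0; r < NREGS; r++)          /* ii: r is empty */
        if (s->rd[r] == 0)
            return r;
    int best = 0, best_score = spill_score(s, 0, x, z);
    for (int r = 1; r < NREGS; r++) {        /* iii: lowest spill score */
        int sc = spill_score(s, r, x, z);
        if (sc < best_score) {
            best = r;
            best_score = sc;
        }
    }
    return best;
}
```

For example, if R1, R2 and R3 hold only u, t and v, none of them in memory, and only u and v are used later, the function picks R2, which needs no spill; if t were also used later, every register would score 1 and R1 would be chosen.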
