BCS 324 Topic 6
1. Code Generation
2. Code Optimization
Dr. D. K. Muyobo
Discussion topics
1. Code generation
1.1 Issues in the design of a code generator
1.2 The target machine
1.3 Run-time storage management
1.4 Basic blocks and flow graphs
2. Code optimization
2.1 The principal sources of optimization
2.2 Optimization of basic blocks
2.3 Loops in flow graphs
1. Code generation
Code generation
◦ Is the final phase of compilation
◦ Takes as input the intermediate representation of the
source program
◦ Outputs an equivalent target program, as shown in the
figure below
◦ Can take place whether or not optimization was
done
Requirements for a code generator
◦ Output must be correct
◦ Output must be of high quality
Figure: source program → front end → intermediate code →
code optimizer → intermediate code → code generator →
target program (all phases consult the symbol table)
1.1 Issues in the design of a code
generator
1. Input to the code generator
◦ Consists of the intermediate representation of the
source program produced by the front end
◦ Also consists of information in the symbol table, used
to determine the run-time addresses of the data objects
denoted by the names in the intermediate
representation.
◦ This stage can proceed on the assumption that its
input is free of errors, since type checking and error
detection have been done by earlier phases
◦ In some compilers, this kind of semantic checking is
done together with code generation.
2. Target programs
◦ Output of the code generator
◦ This output may take several forms e.g.
Relocatable machine code
Assembly language
Absolute machine code
◦ Output as absolute machine code has the advantage
of very fast execution
◦ A set of relocatable object modules can be linked
together and loaded for execution by a linking loader.
3. Memory management
◦ Mapping names in the source program to addresses
of data objects in run-time memory is done
cooperatively by the front-end and the code
generator
◦ If machine code is being generated, labels in 3-address
statements have to be converted to addresses of
instructions.
4. Instruction selection
◦ The nature of the instruction set of the target
machine determines the difficulty of instruction
selection
◦ Instruction speeds and machine idioms are other
important factors
◦ If we don’t care about the efficiency of the target
program, instruction selection is straightforward
◦ For each type of 3-address statement, we can design
a code skeleton that outlines the target code to be
generated for that construct
◦ E.g. every 3-address statement of the form
X = Y + Z, where X, Y and Z are statically
allocated, can be translated into the code sequence:
MOV Y,R0
ADD Z,R0
MOV R0,X
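◦ As a rough illustration, skeleton-based selection can be
sketched in Python. The statement fields and the string-based
emitter below are hypothetical simplifications, not part of the
course material:

    def select_add(x, y, z):
        """Code skeleton for a 3-address statement x = y + z,
        assuming x, y and z are statically allocated names."""
        return [f"MOV {y},R0",   # load the first operand into R0
                f"ADD {z},R0",   # add the second operand
                f"MOV R0,{x}"]   # store the result into x

    print("\n".join(select_add("X", "Y", "Z")))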
5. Register allocation
◦ Instructions involving register operands usually are
shorter and faster than those for memory operands
◦ Efficient utilization of registers is thus important for
good code
◦ The use of registers is subdivided into 2 sub-
problems:
1. During register allocation, we select the set of variables that
will reside in registers at a point in the program
2. During a subsequent register assignment phase we pick the
specific register that a variable will reside in.
◦ Finding an optimal assignment of registers is difficult,
especially because the target machine’s hardware or OS
may require different register-usage conventions.
6. Choice of evaluation order
◦ The order in which computations are performed can
affect the efficiency of the target code
◦ Some computation orders require fewer registers to
hold the intermediate results than others.
◦ Picking the best order is, in general, a difficult problem.
7. Approaches to code generation
◦ The main criterion for a code generator is to produce
correct code
◦ To achieve this, we need to design a code generator
that can be easily implemented, tested and maintained.
1.2 The target machine
Familiarity with the target machine and its
instruction set is a prerequisite for designing a
good code generator
Our target computer is a byte-addressed
machine with four bytes to a word and n
general-purpose registers,
R0, R1, …, Rn-1.
It has 2-address instructions of the form:
◦ op source, destination e.g.
MOV X,R0.
The source and destination fields are not long
enough to hold memory addresses, so certain
bit patterns in these fields specify that words
following an instruction contain operands
and/or addresses.
Instruction costs
◦ Cost of an instruction = 1 + the costs
associated with the source and destination
address modes
◦ This cost corresponds to the length (in
words) of the instruction.
◦ Address modes involving registers have cost
zero, while those involving memory locations
or literals in them have cost one, because
such operands are stored with the
instruction.
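◦ As a worked sketch of this cost model, the Python below
classifies addressing modes in a deliberately simplified way
(register operands cost 0; memory locations and literals cost 1
extra word stored with the instruction); the helper names are
hypothetical:

    def mode_cost(operand):
        """0 for a register operand, 1 for a memory location or
        literal, which occupies an extra word after the instruction."""
        return 0 if operand.startswith("R") else 1

    def instruction_cost(src, dst):
        return 1 + mode_cost(src) + mode_cost(dst)

    assert instruction_cost("R0", "R1") == 1   # MOV R0,R1 costs 1
    assert instruction_cost("R5", "M") == 2    # MOV R5,M costs 2
    assert instruction_cost("#1", "R3") == 2   # ADD #1,R3 costs 2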
1.3 Run-time storage management
Information needed during an execution of a procedure
is kept in a block of storage called activation record
Storage for names local to the procedure also appears
in the activation record
The compiler must generate code to manage these
activation records at run time
Two standard storage allocation strategies discussed
previously are:
1. Static allocation, and
2. Stack allocation
In static allocation, the position of an activation
record in memory is fixed at compile time
In stack allocation, a new activation record is
pushed onto the stack for each execution of a
procedure.
Static allocation
◦ Consider code needed to implement static allocation
◦ A call statement in the intermediate code is
implemented by a sequence of two target machine
instructions
◦ A MOV instruction saves the return address, and a
GOTO transfers control to the target code for the
called procedure e.g.
MOV #here+20, callee.staticArea
GOTO callee.codeArea
◦ The attributes callee.staticArea and
callee.codeArea are constants referring to the
address of the activation record and the first
instruction of the called procedure, respectively
◦ The code for a procedure ends with a return to the
calling procedure, except that the first procedure has
no caller, so its final instruction is HALT, which
presumably returns control to the Operating System.
◦ A return from procedure callee is implemented by
GOTO *callee.staticArea
which transfers control to the address saved at the beginning
of the activation record.
Stack allocation
◦ Static allocation can become stack allocation by using
relative addresses for storage in activation records
◦ In stack allocation, the position of the record for an
activation of a procedure is usually stored in a
register, so words in the activation record can be
accessed as offsets from the value in this register
◦ Relative addresses in an activation record can be
taken as offsets from any known position in the
activation record
◦ When a procedure call occurs, the calling procedure
increments the stack pointer and transfers control to
the called procedure
◦ After control returns to the caller, it decrements the
stack pointer, thereby de-allocating the activation
record of the called procedure
◦ The code for the first procedure initializes the stack
by setting the stack pointer to the start of the stack
area in memory:
MOV #stackstart, SP
code for first procedure
HALT
◦ A procedure call sequence increments stack pointer,
saves the return address, and transfers control to the
called procedure:
ADD #caller.recordSize, SP
MOV #here+16, *SP
GOTO callee.codeArea
◦ The attribute caller.recordSize represents the
size of an activation record, so ADD instruction
leaves SP pointing to the beginning of the next
activation record
◦ The constant #here+16 is the address of the
instruction following the GOTO; the MOV saves it
in the location pointed to by SP.
Run-time addresses for names
◦ The storage allocation strategy and the layout
of local data in an activation record for a
procedure determine how the storage for
names is accessed
◦ Advantages of this approach are:
1. It makes a compiler more portable
2. It optimizes the compiler
◦ We look at the 3-address copy statement x=0
◦ Suppose that after the declarations in a procedure are
processed, the symbol table entry for x contains
relative address 12 for x. Taking x first to be in a
statically allocated area, the actual run-time
address of x is static+12.
◦ Since the position of the static area is not known
when intermediate code to access the name is
generated, it makes sense to generate 3-address code
to “compute” static+12, with the understanding
that this computation will be carried out during the
code generation phase
◦ The assignment statement x=0 then translates into
static[12] = 0
◦ If the static area starts at address 100, the target
code for this statement is:
MOV #0, 112
1.4 Basic blocks and flow graphs
A graph representation of 3-address statements,
called a flow graph, is useful for understanding
code-generation algorithms
Nodes in the flow graph represent
computations, and edges represent the flow of
control
Some register-assignment algorithms use flow
graphs to find the inner loops where a program
is expected to spend most of its time.
Basic blocks
◦ A basic block is a sequence of consecutive statements
in which flow of control enters at the beginning and
leaves at the end without halt or possibility of
branching except at the end e.g.
The following sequence of 3-address statements forms a basic
block
t1 = a*a
t2 = a*b
t3 = 2*t2
t4 = t1+t3
t5 = b*b
t6 = t4+t5
◦ Algorithm (partition into basic blocks)
Input. A sequence of 3-address statements e.g. x=y+z
Output. A list of basic blocks with each 3-address statement
in exactly one block
Method.
1. We first determine the set of leaders, the first statements of
basic blocks. Rules are as follows:
i. The first statement is a leader
ii. Any statement that is the target of a conditional or
unconditional GOTO is a leader
iii. Any statement that immediately follows a GOTO or
conditional GOTO statement is a leader
2. For each leader, its basic block consists of the leader and all
statements up to but not including the next leader or the end
of the program.
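◦ A minimal Python sketch of this algorithm follows; the
statement representation (a list of dicts in which an optional
'goto' key holds the index of a jump target) is a hypothetical
simplification:

    def find_leaders(stmts):
        """Apply the three leader rules to a list of 3-address statements."""
        leaders = {0}                      # rule (i): first statement is a leader
        for i, s in enumerate(stmts):
            if 'goto' in s:
                leaders.add(s['goto'])     # rule (ii): a jump target is a leader
                if i + 1 < len(stmts):
                    leaders.add(i + 1)     # rule (iii): statement after a jump
        return sorted(leaders)

    def partition(stmts):
        """Each basic block runs from its leader up to the next leader."""
        leaders = find_leaders(stmts)
        blocks = []
        for j, start in enumerate(leaders):
            end = leaders[j + 1] if j + 1 < len(leaders) else len(stmts)
            blocks.append(stmts[start:end])
        return blocks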
◦ Transformations on basic blocks
A basic block computes a set of expressions,
namely the values of the names live on exit
from the block
Two blocks are equivalent if they compute the
same set of expressions
A number of transformations can be applied to a
basic block without changing the set of expressions
computed by the block
These transformations improve quality of code and
include:
a) Structure-preserving transformations
b) Algebraic transformations
a) Structure preserving transformations
The primary structure-preserving
transformations on basic blocks are:
i. Common sub-expression elimination
ii. Dead-code elimination
iii. Renaming of temporary variables
iv. Interchange of two independent adjacent statements
i. Common sub-expression elimination
Consider the block:
a = b + c
b = a - d
c = b + c
d = a - d
The 2nd and 4th statements compute the same
expression, namely b+c-d, so this block may be
transformed into the equivalent block:
a = b + c
b = a - d
c = b + c
d = b
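A rough Python sketch of this transformation over a single
block of (dest, op, arg1, arg2) tuples; the simplified pass
assumes operands are plain names with no aliasing:

    def local_cse(block):
        """Reuse earlier results for expressions recomputed in the block."""
        avail = {}                # (op, arg1, arg2) -> name holding that value
        out = []
        for dest, op, a1, a2 in block:
            reuse = avail.get((op, a1, a2))
            # redefining dest kills expressions that read dest or live in dest
            avail = {k: v for k, v in avail.items()
                     if dest not in (k[1], k[2]) and v != dest}
            if reuse is not None:
                out.append((dest, '=', reuse, None))   # copy, don't recompute
            else:
                out.append((dest, op, a1, a2))
                avail[(op, a1, a2)] = dest
        return out

    # The block above: the 4th statement becomes d = b
    block = [('a', '+', 'b', 'c'), ('b', '-', 'a', 'd'),
             ('c', '+', 'b', 'c'), ('d', '-', 'a', 'd')]
    assert local_cse(block)[3] == ('d', '=', 'b', None)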
ii. Dead-code elimination
Suppose x is dead i.e. never subsequently used, at the
point where the statement x=y+z appears in a basic
block
Then this statement may be safely removed without
changing the value of the block.
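A minimal sketch of this idea for one basic block, assuming
statements are (dest, op, arg1, arg2) tuples with no side effects
and that the set of names live on exit from the block is known:

    def eliminate_dead_code(block, live_out):
        """Walk the block backwards, dropping statements whose result is dead."""
        live = set(live_out)
        kept = []
        for dest, op, a1, a2 in reversed(block):
            if dest in live:
                kept.append((dest, op, a1, a2))
                live.discard(dest)                 # killed by this definition
                live.update(a for a in (a1, a2) if a is not None)
            # otherwise dest is never used again: the statement is removed
        kept.reverse()
        return kept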
◦ b) Algebraic transformations
Countless algebraic transformations can be used to
change the set of expressions computed by a basic
block into an algebraically equivalent set e.g.
statements like
x=x+0 or x=x*1
can be eliminated from a basic block without
changing the set of expressions it computes. The
exponentiation operator in the statement
x=y**2
requires a function call to implement.
◦ Using an algebraic transformation, this
statement can be replaced by the cheaper, but
equivalent statement
x=y*y
Flow graphs
◦ We can add flow-of-control information to the set of
basic blocks that make up a program by constructing
a directed graph called a flow graph
◦ The nodes of the flow graph are the basic blocks
◦ One node is distinguished as initial i.e. the block
whose leader is the 1st statement
◦ There is a directed edge from block B1 to block B2 if
B2 can immediately follow B1 in some execution
sequence.
◦ We say that B1 is a predecessor of B2, and B2 is a
successor of B1.
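◦ Continuing the hypothetical statement representation used in
the partitioning sketch earlier, flow-graph edges can be built
roughly as follows (an 'unconditional' flag marks plain GOTOs,
which have no fall-through edge):

    def build_flow_graph(stmts):
        """succ[b] holds the blocks that can immediately follow block b."""
        leaders = find_leaders(stmts)            # from the sketch above
        block_of = {start: b for b, start in enumerate(leaders)}
        succ = {b: set() for b in range(len(leaders))}
        for b, start in enumerate(leaders):
            end = leaders[b + 1] if b + 1 < len(leaders) else len(stmts)
            last = stmts[end - 1]
            if 'goto' in last:
                succ[b].add(block_of[last['goto']])   # edge to the jump target
            if end < len(stmts) and not last.get('unconditional'):
                succ[b].add(block_of[end])            # fall-through edge
        return succ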
Representation of basic blocks
◦ Basic blocks can be represented by a variety
of data structures e.g.
After partitioning the 3-address statements by the
algorithm above
Each basic block can be represented by a record
consisting of a count of the number of quadruples
in the block, followed by a pointer to the leader
(first quadruple) of the block, and by the lists of
predecessors and successors of the block.
Loops
◦ A collection of nodes in a flow graph such that
i. All nodes in the collection are strongly connected i.e. from
any node in the loop to any other, there is a connecting
path
ii. The collection of nodes has a unique entry i.e. a node in
the loop such that the only way to reach a node outside
the loop is to first go through the entry
A loop that contains no other loops is called an inner
loop.
2. Code optimization
Ideally, the compiler should produce target code
that is as good as can be written by hand, but in
practice this goal is not easy to achieve.
However, some compilers can improve code so
that it runs faster or takes less space. Compilers
that apply code-improving transformations are
called optimizing compilers.
The most payoff for the least effort is obtained
if we can identify the frequently executed
parts of a program and make them as efficient
as possible.
Introduction
◦ To create efficient target code, one needs
more than an optimizing compiler.
◦ We consider the code-improving
transformations that a programmer and a
compiler can use to improve target code.
Criteria for code-improving transformation
◦ Best program transformations are those that yield
most benefits for least effort
i. A transformation must preserve the meaning of programs
i.e. an optimization must not change the meaning of the
output or cause errors
ii. A transformation must, on average, speed up programs by
a measurable amount, or reduce the space taken by the
compiled code
iii. A transformation must be worth the effort. The time and
effort spent implementing it in the compiler must be
repaid by the resulting improvements in the programs it
compiles.
Getting better performance
◦ Dramatic improvements in the running time
of a program are usually obtained by improving
the program at all levels i.e. from the source
level to the target level
◦ At each level, the available options fall between
finding a better algorithm and implementing a
given algorithm so that fewer operations are
performed, as shown in the diagram below.
Figure: places for potential improvements by the user and the
compiler (source program → front end → intermediate code →
code generator → target program)
An organization for an optimizing compiler
◦ Although we have said that improvements occur at all
levels, this topic concentrates on the transformation
of intermediate code as shown in the figure below
◦ The code improvement phase consists of control-
flow and data-flow analysis followed by the application
of transformations
◦ The code generator produces the target program
from the transformed intermediate code
◦ Advantages of organization in figure below:
i. Operations needed to implement high-level
constructs are made explicit in the intermediate
code, so it is possible to optimize them
ii. The intermediate code can be relatively
independent of the target machine, so the
optimizer does not have to change much if the
code generator is replaced by one for a different
machine.
Figure: organization of the code optimizer — front end →
code optimizer → code generator, where the optimizer performs
control-flow analysis, then data-flow analysis, then applies
transformations
2.1 The principal sources of
optimization
Techniques for implementing code-
improving transformations include:
i. Function-preserving transformations
ii. Common sub-expressions
iii. Copy propagation
iv. Dead code elimination
v. Loop optimization
vi. Code motion
vii. Induction variables and reduction in strength
i. Function-preserving transformations
◦ The compiler improves programs without changing
the function they compute
◦ Methods include the above-mentioned common sub-
expression elimination, copy propagation, dead-code
elimination, loop optimization, etc.
ii. Common sub-expressions
◦ An occurrence of an expression that was previously
computed and whose value has not changed
since
◦ We simply avoid recomputing it, to save time (to
increase speed).
iii. Copy propagation
◦ After a copy statement f = g, f has the same value
as g
◦ Later uses of f can therefore use g instead (see the
sketch after this list).
iv. Dead-code elimination
◦ A variable whose value is not used subsequently is
dead; dead (or useless) code consists of statements
that compute values that never get used
◦ May appear as a result of programmer errors, or of
previous transformations
◦ Simply eliminate the dead code.
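A rough single-block sketch of copy propagation in Python,
using the same hypothetical tuple format as the earlier sketches
('=' marks a copy statement):

    def copy_propagate(block):
        """Substitute g for f in uses that follow a copy f = g."""
        copies = {}                            # f -> g for active copies
        out = []
        for dest, op, a1, a2 in block:
            a1 = copies.get(a1, a1)            # propagate into operands
            a2 = copies.get(a2, a2)
            # a new definition of dest invalidates copies involving dest
            copies = {f: g for f, g in copies.items() if dest not in (f, g)}
            if op == '=':
                copies[dest] = a1              # record the new copy
            out.append((dest, op, a1, a2))
        return out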
v. Loop optimizations
◦ Decrease the number of instructions in
inner loops
◦ Techniques that help a lot in loop
optimizations are:
a) Code motion – moves loop-invariant code outside a loop
b) Induction variable elimination – eliminates redundant
induction variables (e.g. parallel loop counters) from
inner loops
c) Reduction in strength – replaces an expensive
operation by a cheaper one e.g. a MUL can be
replaced by the cheaper ADD instruction.
2.2 Optimization of basic blocks
Many of the structure-preserving
transformations can be implemented by
constructing a dag for a basic block
This is because there is a node in the dag for
each of the initial values of the variables
appearing in the basic block, and there is a node
n associated with each statement s within the
block
The children of n are those nodes
corresponding to statements that are the last
definitions prior to s of the operands used by s.
Example
◦ A basic block and its dag
a = b + c
b = a - d
c = b + c
d = a - d
Figure: the dag for this block — leaf nodes b0, c0, d0 hold the
initial values; node + (labeled a) has children b0 and c0; node -
(labeled b and d) has children + (a) and d0; node + (labeled c)
has children - (b, d) and c0.
Use of algebraic identities
◦ Algebraic identities represent another
important class of optimizations on basic
blocks e.g.
x + 0 = 0 + x = x
x - 0 = x
x * 1 = 1 * x = x
x / 1 = x
Another class of algebraic optimization includes
reduction in strength i.e.
◦ Replacing a more expensive operation with a cheaper
one e.g.
x ** 2 = x * x
2.0 * x = x + x
x / 2 = x * 0.5
A third class of related optimization is constant
folding
◦ We evaluate constant expressions at compile time
and replace the constant expressions by their values
e.g. 2 * 3.14 would be replaced by 6.28.
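The identities, strength reductions and constant folding above
can be combined into one small rewriting step. The Python
sketch below works on (op, left, right) triples; the
representation is a hypothetical simplification:

    def simplify(op, x, y):
        """One rewriting step: constant folding, identities, strength reduction."""
        if (isinstance(x, (int, float)) and isinstance(y, (int, float))
                and op in ('+', '-', '*')):
            return {'+': x + y, '-': x - y, '*': x * y}[op]   # constant folding
        if op == '+' and y == 0: return x                 # x + 0 = x
        if op == '+' and x == 0: return y                 # 0 + x = x
        if op == '-' and y == 0: return x                 # x - 0 = x
        if op == '*' and y == 1: return x                 # x * 1 = x
        if op == '*' and x == 1: return y                 # 1 * x = x
        if op == '**' and y == 2: return ('*', x, x)      # x ** 2 = x * x
        if op == '*' and x == 2.0: return ('+', y, y)     # 2.0 * x = x + x
        if op == '/' and y == 2: return ('*', x, 0.5)     # x / 2 = x * 0.5
        return (op, x, y)

    assert simplify('*', 2, 3.14) == 6.28                 # folded at compile time
    assert simplify('**', 'y', 2) == ('*', 'y', 'y')      # strength reduction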
2.3 Loops in flow graphs
We use the notion of a node “dominating”
another to define “natural loops” and the
important special class of “reducible” flow
graphs
Dominators
◦ A node d of a flow graph dominates node n, written as
d dom n, if every path from the initial node of the
flow graph to n goes through d.
◦ Under this definition, every node dominates itself,
and the entry of a loop dominates all nodes in the
loop.
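◦ The dominator relation can be computed with the classic
iterative algorithm. A minimal Python sketch, assuming the
flow graph is a dict mapping every node to its set of
successors (as built by the flow-graph sketch earlier):

    def dominators(succ, entry):
        """dom[n] is the set of nodes that dominate n."""
        nodes = set(succ)
        pred = {n: set() for n in nodes}
        for n, ss in succ.items():
            for s in ss:
                pred[s].add(n)
        dom = {n: set(nodes) for n in nodes}   # start from "everything"
        dom[entry] = {entry}                   # the entry dominates itself only
        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:
                new = {n}
                if pred[n]:                    # intersect over all predecessors
                    new |= set.intersection(*(dom[p] for p in pred[n]))
                if new != dom[n]:
                    dom[n] = new
                    changed = True
        return dom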
Natural loops
◦ An important application of dominator
information is in determining the loops of a
flow graph suitable for improvement
◦ There are two properties for such loops:
i. A loop must have a single entry point, called
“header”. This point dominates all other nodes in
the loop, or it would not be the sole entry to the
loop
ii. There must be at least one way to iterate the
loop i.e. at least one path back to the header.
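◦ A back edge is an edge whose head dominates its tail; the
natural loop of a back edge consists of its header plus every
node that can reach the tail without passing through the
header. A short sketch, assuming a predecessor map (the
inverse of the successor map above):

    def natural_loop(pred, header, tail):
        """Collect the nodes of the natural loop of back edge tail -> header."""
        loop = {header, tail}
        stack = [tail]
        while stack:
            n = stack.pop()
            for p in pred.get(n, ()):      # walk predecessors, stop at header
                if p not in loop:
                    loop.add(p)
                    stack.append(p)
        return loop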
Inner loops
◦ If we use natural loops as “the loops”, then we
have the useful property that unless two
loops have the same header, they are either
disjoint or one is nested in the other
◦ So we focus on loops that contain no other
loops.