Intermediate Code Generation
Intermediate Code Generation
INTRODUCTION
The front end translates a source program into an intermediate representation from which
the back end generates target code.
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by
attaching a back end for the new machine to an existing front end.
INTERMEDIATE LANGUAGES
Syntax tree
Postfix notation
The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.
Graphical Representations:
Syntax tree:
A syntax tree depicts the natural hierarchical structure of a source program. A dag
(Directed Acyclic Graph) gives the same information but in a more compact way because
common subexpressions are identified. A syntax tree and dag for the assignment statement a : =
b * - c + b * - c are as follows:
assign assign
a + a +
* * *
c c c
Postfix notation:
Syntax-directed definition:
Syntax trees for assignment statements are produced by the syntax-directed definition.
Non-terminal S generates an assignment statement. The two binary operators + and * are
examples of the full operator set in a typical language. Operator associativities and precedences
are the usual ones, even though they have not been put into the grammar. This definition
constructs the tree from the input a : = b * - c + b* - c.
E ( E1 ) E.nptr : = E1.nptr
Two representations of the syntax tree are as follows. In (a) each node is represented as a
record with a field for its operator and additional fields for pointers to its children. In (b), nodes
are allocated from an array of records and the index or position of the node serves as the pointer
to the node. All the nodes in the syntax tree can be visited by following pointers, starting from
the root at position 10.
aaaaaaaaaaaaa
assign 0 id b
1 id c
id a
2 uminus
2 1
3 * 0 2
+
4 id b
5 id c
* *
6 uminus 5
id b id b
7 * 4 6
uminus uminus 8 + 3 7
9 id a
id c id c
10 assign 9 8
(a) (b)
Three-Address Code:
x : = y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on boolean-
valued data. Thus a source language expression like x+ y*z might be translated into a sequence
t1 : = y * z
t2 : = x + t 1
The use of names for the intermediate values computed by a program allows three-
address code to be easily rearranged – unlike postfix notation.
Three-address code corresponding to the syntax tree and dag given above
t1 : = - c t 1 : = -c
t 2 : = b * t1 t2 : = b * t1
t3 : = - c t 5 : = t2 + t2
t 4 : = b * t3 a : = t5
t5 : = t2 + t4
a : = t5
(a) Code for the syntax tree (b) Code for the dag
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (
<, =, >=, etc. ) to x and y, and executes the statement with label L next if x stands in relation
relop to y. If not, the three-address statement following if x relop y goto L is executed next,
as in the usual sequence.
6. param x and call p, n for procedure calls and return y, where y representing a returned value
is optional. For example,
param x1
param x2
...
param xn
call p,n
generated as part of a call of the procedure p(x 1, x2, …. ,xn ).
When three-address code is generated, temporary names are made up for the interior
nodes of a syntax tree. For example, id : = E consists of code to evaluate E into some temporary
t, followed by the assignment id.place : = t.
E E1 + E2 E.place := newtemp;
E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)
E E1 * E2 E.place := newtemp;
E.code := E1.code || E2.code || gen(E.place ‘:=’ E 1.place ‘*’ E2.place)
E - E1 E.place := newtemp;
E.code := E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)
E ( E1 ) E.place : = E1.place;
E.code : = E1.code
E id E.place : = id.place;
E.code : = ‘ ‘
Semantic rules generating code for a while statement
S.begin:
E.code
S1.code
goto S.begin
S.after: ...
Triples
Indirect triples
Quadruples:
A quadruple is a record structure with four fields, which are, op, arg1, arg2 and result.
The op field contains an internal code for the operator. The three-address statement x : =
y op z is represented by placing y in arg1, z in arg2 and x in result.
The contents of fields arg1, arg2 and result are normally pointers to the symbol-table
entries for the names represented by these fields. If so, temporary names must be entered
into the symbol table as they are created.
Triples:
To avoid entering temporary names into the symbol table, we might refer to a temporary
value by the position of the statement that computes it.
If we do so, three-address statements can be represented by records with only three fields:
op, arg1 and arg2.
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table
or pointers into the triple structure ( for temporary values ).
Since three fields are used, this intermediate code format is known as triples.
Indirect Triples:
For example, let us use an array statement to list pointers to triples in the desired order.
Then the triples shown above might be represented as follows:
DECLARATIONS