UNIT 3 Compiler Design
o In syntax directed translation, every non-terminal can have zero, one, or more
attributes, depending on the type of the attribute. The values of these attributes are
evaluated by the semantic rules associated with the production rule.
o In the semantic rules, the attribute is written as VAL, and an attribute may hold
anything, such as a string, a number, a memory location, or a complex record.
Example
o A syntax directed translation scheme is used to specify the order in which semantic
rules are evaluated.
o In a translation scheme, the semantic rules are embedded within the right side of the
productions.
Example
S → E$ { print E.VAL }
SDT is implemented by parsing the input and producing a parse tree as a result.
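As a minimal sketch of how synthesized VAL attributes are evaluated during parsing, the Python below parses sums and products of integers by recursive descent, computes each VAL from the VALs of its children, and then performs the action of S → E$ { print E.VAL }. The expression grammar in the comments (E → E + T, T → T * F, F → (E) | num) and all function names are assumptions chosen for illustration.

```python
import re

def tokenize(text):
    # Integers and single-character symbols; '$' marks the end of input.
    return re.findall(r"\d+|[+*()$]", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok=None):
        cur = self.peek()
        if tok is not None and cur != tok:
            raise SyntaxError(f"expected {tok!r}, got {cur!r}")
        self.pos += 1
        return cur

    def S(self):
        # S -> E $   { print E.VAL }
        val = self.E()
        self.eat("$")
        print(val)
        return val

    def E(self):
        # E -> E + T  { E.VAL = E1.VAL + T.VAL }  (left recursion written as a loop)
        val = self.T()
        while self.peek() == "+":
            self.eat("+")
            val = val + self.T()
        return val

    def T(self):
        # T -> T * F  { T.VAL = T1.VAL * F.VAL }
        val = self.F()
        while self.peek() == "*":
            self.eat("*")
            val = val * self.F()
        return val

    def F(self):
        # F -> ( E )  { F.VAL = E.VAL }   |   num  { F.VAL = num.lexval }
        if self.peek() == "(":
            self.eat("(")
            val = self.E()
            self.eat(")")
            return val
        return int(self.eat())

Parser("2 + 3 * 4$").S()   # prints 14
```

Each VAL is computed only from values returned by the children, which is what makes these attributes synthesized.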
Intermediate code
Intermediate code is used to translate the source code into the machine code. Intermediate
code lies between the high-level language and the machine language.
o If the compiler translates source code directly into machine code without generating
intermediate code, then a full native compiler is required for each new machine.
o Intermediate code keeps the analysis portion the same for all compilers, which is why a
full compiler is not needed for every unique machine.
o The intermediate code generator receives input from its predecessor phase, the semantic
analyzer, in the form of an annotated syntax tree.
o Using intermediate code, only the second phase of the compiler, the synthesis phase,
needs to be changed according to the target machine.
Intermediate representation
Intermediate code can be represented in two ways:
Postfix Notation
o Postfix notation is a useful form of intermediate code if the given language consists of
expressions.
o The ordinary (infix) way of writing the sum of x and y is with the operator in the middle:
x + y. In postfix notation, the operator is placed at the right end: x y +.
Example
Production            Semantic Rule                          Program fragment
1. E → E1 op E2       E.code = E1.code || E2.code || op      print op
2. E → (E1)           E.code = E1.code
3. E → id             E.code = id                            print id
Here || denotes concatenation of the code strings.
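The semantic rules in this table translate almost directly into a recursive function: each node returns its code, and E → E1 op E2 concatenates E1.code, E2.code and op. In this sketch the tuple-based tree encoding and the name postfix are assumptions made for the example.

```python
def postfix(node):
    # node is an identifier string (E -> id), ("paren", inner) for E -> (E1),
    # or (op, left, right) for E -> E1 op E2.
    if isinstance(node, str):
        return [node]                                 # E.code = id
    if node[0] == "paren":
        return postfix(node[1])                       # E.code = E1.code
    op, left, right = node
    return postfix(left) + postfix(right) + [op]      # E.code = E1.code || E2.code || op

# (x + y) * z
tree = ("*", ("paren", ("+", "x", "y")), "z")
print(" ".join(postfix(tree)))   # x y + z *
```

The parentheses disappear in the output because postfix order alone fixes the evaluation order.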
o In a parse tree, most leaf nodes are the single child of their parent nodes.
o A syntax tree is a variant of a parse tree. In a syntax tree, interior nodes are operators
and leaves are operands.
Abstract syntax trees are important data structures in a compiler. They contain the least
unnecessary information.
Abstract syntax trees are more compact than parse trees and can be easily used by a
compiler.
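A minimal sketch of syntax-tree construction, assuming two hypothetical constructors named mkleaf and mknode (the names follow a common textbook convention and are not defined in these notes). Interior nodes hold operators and leaves hold operands, so the tree for a - 4 + c has no nodes for single-child chains or parentheses.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: Union[str, int]   # operand: an identifier name or a constant

@dataclass
class Node:
    op: str                  # interior nodes hold operators
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def mkleaf(value):
    return Leaf(value)

def mknode(op, left, right):
    return Node(op, left, right)

def show(t):
    # Render the tree in fully parenthesised infix form.
    if isinstance(t, Leaf):
        return str(t.value)
    return f"({show(t.left)} {t.op} {show(t.right)})"

# a - 4 + c groups as (a - 4) + c because - and + are left-associative
tree = mknode("+", mknode("-", mkleaf("a"), mkleaf(4)), mkleaf("c"))
print(show(tree))   # ((a - 4) + c)
```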
Three address code
o Three-address code is an intermediate code used by optimizing compilers.
o In three-address code, the given expression is broken down into several separate
instructions. These instructions can easily be translated into assembly language.
o Each three-address code instruction has at most three operands. It is typically a
combination of an assignment and a binary operator.
Three-address code is a type of intermediate code that is easy to generate and can be
easily converted to machine code. It uses at most three addresses and one operator to
represent an expression, and the value computed at each instruction is stored in a
temporary variable generated by the compiler. The compiler decides the order of operations
given by the three-address code.
General representation –
a = b op c
Given expression:
a := (-c * b) + (-c * d)
t1 := -c
t2 := b * t1
t3 := -c
t4 := d * t3
t5 := t2 + t4
a := t5
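The breakdown above can be produced mechanically by a post-order walk of the expression tree: operands are translated first, then one instruction is emitted for the operator, with its value stored in a fresh temporary. This is a minimal sketch; the class name TACGen and the tuple tree encoding are assumptions, and the generated code matches the listing above up to operand order in t2 and t4.

```python
class TACGen:
    def __init__(self):
        self.code = []
        self.count = 0

    def newtemp(self):
        # Generate a fresh compiler temporary: t1, t2, ...
        self.count += 1
        return f"t{self.count}"

    def gen(self, node):
        # Post-order walk: translate operands, then emit one instruction for this operator.
        if isinstance(node, str):
            return node                      # an identifier needs no instruction
        if node[0] == "neg":                 # unary minus: ("neg", operand)
            arg = self.gen(node[1])
            t = self.newtemp()
            self.code.append(f"{t} := -{arg}")
            return t
        op, left, right = node               # binary operator: (op, left, right)
        l, r = self.gen(left), self.gen(right)
        t = self.newtemp()
        self.code.append(f"{t} := {l} {op} {r}")
        return t

    def assign(self, target, node):
        self.code.append(f"{target} := {self.gen(node)}")
        return self.code

# a := (-c * b) + (-c * d)
expr = ("+", ("*", ("neg", "c"), "b"), ("*", ("neg", "c"), "d"))
for line in TACGen().assign("a", expr):
    print(line)
```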
Three-address code can be represented in three forms: quadruples, triples, and indirect triples.
Quadruple
Triples
Indirect Triples
1. Quadruple –
It is a structure consisting of four fields: op, arg1, arg2, and result. op denotes the
operator, arg1 and arg2 denote the two operands, and result is used to store the result
of the expression.
Advantage –
Disadvantage –
Example
a := -b * (c + d)
t1 := -b
t2 := c + d
t3 := t1 * t2
a := t3
These statements are represented by quadruples as follows:
      op      arg1  arg2  result
(0)   uminus  b     -     t1
(1)   +       c     d     t2
(2)   *       t1    t2    t3
(3)   :=      t3    -     a
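To make the four fields concrete, here is a minimal sketch that stores the quadruples for a := -b * (c + d) as records and runs them with a tiny interpreter. The Quad type, the use of None for an unused field ("-" in the table), and the run function are illustrative assumptions, not part of any compiler described in these notes.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: Optional[str]
    arg2: Optional[str]
    result: str

# None marks an unused field ("-" in the table)
quads = [
    Quad("uminus", "b", None, "t1"),
    Quad("+", "c", "d", "t2"),
    Quad("*", "t1", "t2", "t3"),
    Quad(":=", "t3", None, "a"),
]

def run(quads, env):
    # Execute each quadruple; the result field names where the value is stored.
    for q in quads:
        if q.op == "uminus":
            env[q.result] = -env[q.arg1]
        elif q.op == ":=":
            env[q.result] = env[q.arg1]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
    return env

print(run(quads, {"b": 2, "c": 3, "d": 4})["a"])   # -14
```

Because every result is named explicitly, the quadruples could be reordered by an optimizer without changing how they refer to each other.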
Triples
Triples have three fields to implement three-address code. The fields of a triple contain
the name of the operator, the first source operand, and the second source operand.
Example:
a := -b * (c + d)
t1 := -b
t2 := c + d
t3 := t1 * t2
a := t3
These statements are represented by triples as follows:
      op      arg1  arg2
(0)   uminus  b     -
(1)   +       c     d
(2)   *       (0)   (1)
(3)   :=      a     (2)
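A minimal sketch of the same statements as triples: there is no result field, and an operand of the form (i) is a pointer to the value computed by triple i. The Ref type, the assumption that arg1 of the := triple names the target variable a, and the run interpreter are all illustrative choices.

```python
from typing import NamedTuple, Optional, Union

class Ref(NamedTuple):
    index: int                 # pointer to an earlier triple, written (i) in the table

Operand = Optional[Union[str, Ref]]

class Triple(NamedTuple):
    op: str
    arg1: Operand
    arg2: Operand

triples = [
    Triple("uminus", "b", None),       # (0)
    Triple("+", "c", "d"),             # (1)
    Triple("*", Ref(0), Ref(1)),       # (2)
    Triple(":=", "a", Ref(2)),         # (3) assumed: arg1 is the target variable
]

def run(triples, env):
    values = []                        # values[i] is the value computed by triple (i)

    def val(x):
        return values[x.index] if isinstance(x, Ref) else env[x]

    for t in triples:
        if t.op == "uminus":
            values.append(-val(t.arg1))
        elif t.op == "+":
            values.append(val(t.arg1) + val(t.arg2))
        elif t.op == "*":
            values.append(val(t.arg1) * val(t.arg2))
        elif t.op == ":=":
            env[t.arg1] = val(t.arg2)
            values.append(env[t.arg1])
    return env

print(run(triples, {"b": 2, "c": 3, "d": 4})["a"])   # -14
```

Since results are referenced by position, moving a triple would invalidate every (i) that points to it; this is the drawback indirect triples address.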
Triples
This representation does not use an extra temporary variable to represent a single
operation. Instead, when a reference to another triple's value is needed, a pointer to that
triple is used. So it consists of only three fields: op, arg1, and arg2.
Disadvantage –
1. S → id := E
2. E → E1 + E2
3. E → E1 * E2
4. E → (E1)
5. E → id
S → id :=E {p = look_up(id.name);
If p ≠ nil then
Emit (p = E.place)
Else
Error;
}
E → E1 + E2 {E.place = newtemp();
Emit (E.place = E1.place '+' E2.place)
}
E → E1 * E2 {E.place = newtemp();
Emit (E.place = E1.place '*' E2.place)
}
E → (E1) {E.place = E1.place}
E → id {p = look_up(id.name);
If p ≠ nil then
E.place = p
Else
Error;
}
o The Emit function is used to append three-address code to the output file. If look_up
does not find the name in the symbol table, an error is reported.
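The translation scheme for assignments can be sketched in Python with the same three helpers: look_up, newtemp and emit. In this minimal sketch the symbol table is a plain dictionary, None stands for nil, Error is modelled as a NameError, and expressions are nested tuples; all of these are assumptions for illustration.

```python
class Translator:
    def __init__(self, symtab):
        self.symtab = symtab       # hypothetical symbol table: name -> place
        self.code = []
        self.ntemps = 0

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def emit(self, instr):
        self.code.append(instr)    # append one three-address instruction to the output

    def look_up(self, name):
        return self.symtab.get(name)   # None plays the role of nil

    def expr(self, node):
        """Return E.place; node is an id name, ('()', e), ('+', l, r) or ('*', l, r)."""
        if isinstance(node, str):                  # E -> id
            p = self.look_up(node)
            if p is None:
                raise NameError(f"undeclared identifier {node}")
            return p
        if node[0] == "()":                        # E -> (E1)
            return self.expr(node[1])
        op, left, right = node                     # E -> E1 + E2 | E1 * E2
        l, r = self.expr(left), self.expr(right)
        place = self.newtemp()
        self.emit(f"{place} := {l} {op} {r}")
        return place

    def assign(self, name, node):
        # S -> id := E
        p = self.look_up(name)
        if p is None:
            raise NameError(f"undeclared identifier {name}")
        self.emit(f"{p} := {self.expr(node)}")
        return self.code

tr = Translator({"x": "x", "a": "a", "b": "b", "c": "c"})
print(tr.assign("x", ("*", ("()", ("+", "a", "b")), "c")))
# ['t1 := a + b', 't2 := t1 * c', 'x := t2']
```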
1. E → E OR E
2. E → E AND E
3. E → NOT E
4. E → (E)
5. E → id relop id
6. E → TRUE
7. E → FALSE
AND and OR are left-associative. NOT has the highest precedence, followed by AND, and lastly OR.
E → E1 OR E2 {E.place = newtemp();
Emit (E.place ':=' E1.place 'OR' E2.place)
}
E → E1 AND E2 {E.place = newtemp();
Emit (E.place ':=' E1.place 'AND' E2.place)
}
The EMIT function is used to generate the three address code and the newtemp( ) function is
used to generate the temporary variables.
The rule for E → id1 relop id2 uses next_state, which gives the index of the next
three-address statement in the output sequence.
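Only the OR and AND rules are written out above. The sketch below fills in the remaining productions using the common numerical representation (1 for true, 0 for false), in which E → id1 relop id2 uses next_state to jump over the statements that set the result. The exact actions for relop, NOT, TRUE and FALSE, the tuple encoding, and the class name are assumptions based on that standard approach.

```python
class BoolGen:
    """Numerical (0/1) translation of boolean expressions into three-address code."""

    def __init__(self):
        self.code = []       # statement number i is self.code[i]
        self.ntemps = 0

    @property
    def next_state(self):
        # index of the next three-address statement in the output sequence
        return len(self.code)

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def emit(self, stmt):
        self.code.append(stmt)

    def gen(self, node):
        """node: ('relop', id1, op, id2), ('TRUE',), ('FALSE',), ('NOT', e),
        ('AND', e1, e2) or ('OR', e1, e2). Returns E.place."""
        kind = node[0]
        if kind == "relop":                        # E -> id1 relop id2
            _, a, op, b = node
            place = self.newtemp()
            self.emit(f"if {a} {op} {b} goto {self.next_state + 3}")
            self.emit(f"{place} := 0")
            self.emit(f"goto {self.next_state + 2}")
            self.emit(f"{place} := 1")
            return place
        if kind in ("TRUE", "FALSE"):              # E -> TRUE | FALSE
            place = self.newtemp()
            self.emit(f"{place} := {1 if kind == 'TRUE' else 0}")
            return place
        if kind == "NOT":                          # E -> NOT E1
            inner = self.gen(node[1])
            place = self.newtemp()
            self.emit(f"{place} := not {inner}")
            return place
        _, left, right = node                      # E -> E1 OR E2 | E1 AND E2
        l, r = self.gen(left), self.gen(right)
        place = self.newtemp()
        self.emit(f"{place} := {l} {kind.lower()} {r}")
        return place

# a < b OR c < d AND e < f   (AND binds tighter than OR)
g = BoolGen()
g.gen(("OR", ("relop", "a", "<", "b"),
             ("AND", ("relop", "c", "<", "d"), ("relop", "e", "<", "f"))))
for i, stmt in enumerate(g.code):
    print(i, stmt)
```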
Here is the example which generates the three address code using the above translation
scheme:
1. S→ LABEL : S
2. LABEL → id
In this production system, a semantic action is attached to record the LABEL and its value in
the symbol table.
1. S → if E then S
2. S → if E then S else S
3. S→ while E do S
4. S→ begin L end
5. S→ A
6. L→ L;S
7. L→ S
o A marker non-terminal M, which records the index of the next statement, is put before the
statement in both if-then and if-then-else. In the case of while-do, we need to put M before
E, as we need to come back to it after executing S.
o We must also ensure that, after the first S is executed, control skips the second S and
continues with the code after the if-then-else. For this, we place another marker
non-terminal N after the first S.
The grammar is as follows:
1. S → if E then M S
2. S → if E then M S N else M S
3. S → while M E do M S
4. S → begin L end
5. S → A
6. L → L ; M S
7. L → S
8. M → ε
9. N → ε
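The markers M and N are used with backpatching: jumps are emitted with their targets left open and filled in once the targets are known. The sketch below translates if a < b then x := 1 else x := 2 this way. The helper names makelist, merge, backpatch and nextquad follow common textbook usage and, like the class itself, are assumptions; the notes above do not spell out these semantic actions.

```python
class Backpatcher:
    def __init__(self):
        self.quads = []               # "goto _" marks a jump whose target is not known yet

    @property
    def nextquad(self):
        return len(self.quads)        # index the next emitted statement will get

    def emit(self, stmt):
        self.quads.append(stmt)

    @staticmethod
    def makelist(i):
        return [i]

    @staticmethod
    def merge(*lists):
        return [i for lst in lists for i in lst]

    def backpatch(self, lst, target):
        # Fill in the trailing "_" of every unfinished jump listed in lst.
        for i in lst:
            assert self.quads[i].endswith("goto _")
            self.quads[i] = self.quads[i][:-1] + str(target)

    def cond(self, a, op, b):
        # E -> id1 relop id2: both jumps are left open
        truelist = self.makelist(self.nextquad)
        falselist = self.makelist(self.nextquad + 1)
        self.emit(f"if {a} {op} {b} goto _")
        self.emit("goto _")
        return truelist, falselist

    def M(self):
        # M -> ε: record the index of the statement that follows
        return self.nextquad

    def N(self):
        # N -> ε: an open jump that skips the else part
        lst = self.makelist(self.nextquad)
        self.emit("goto _")
        return lst

    def assign(self, stmt):
        # S -> A: a plain assignment falls through, so its nextlist is empty
        self.emit(stmt)
        return []

# if a < b then x := 1 else x := 2
bp = Backpatcher()
truelist, falselist = bp.cond("a", "<", "b")
m1 = bp.M()
s1_next = bp.assign("x := 1")
n_next = bp.N()
m2 = bp.M()
s2_next = bp.assign("x := 2")
# S -> if E then M S N else M S
bp.backpatch(truelist, m1)
bp.backpatch(falselist, m2)
s_next = bp.merge(s1_next, n_next, s2_next)
bp.backpatch(s_next, bp.nextquad)   # here the code after the if-then-else simply starts next
for i, q in enumerate(bp.quads):
    print(i, q)
```

In a full translator S.nextlist would be patched by the enclosing production (for example L → L ; M S) rather than immediately.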
Postfix Translation
In a production A → α, the translation rule of A.CODE consists of the concatenation of the
CODE translations of the non-terminals in α in the same order as the non-terminals appear
in α.
1. S → while M1 E do M2 S1
Can be factored as
1. S → C S1
2. C → W E do
3. W → while
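The factored productions suit postfix translation because each action runs only when its production is reduced. With W → while reduced first, the label for the loop head exists before E's code is produced. This minimal sketch assumes label-based output, a hypothetical newlabel helper, and that E leaves 0 in its place when false.

```python
import itertools

_label_ids = itertools.count(1)

def newlabel():
    # hypothetical helper: fresh labels L1, L2, ...
    return f"L{next(_label_ids)}"

def reduce_W():
    # W -> while : runs as soon as 'while' is reduced, before E's code exists
    return {"begin": newlabel()}

def reduce_C(w, e_code, e_place):
    # C -> W E do : place the loop label, evaluate E, leave the loop if E is 0
    after = newlabel()
    code = [f"{w['begin']}:"] + e_code + [f"if {e_place} = 0 goto {after}"]
    return {"begin": w["begin"], "after": after, "code": code}

def reduce_S(c, s1_code):
    # S -> C S1 : body, jump back to the test, then the loop exit label
    return c["code"] + s1_code + [f"goto {c['begin']}", f"{c['after']}:"]

# while i < n do i := i + 1
w = reduce_W()
c = reduce_C(w, ["t1 := i < n"], "t1")
s = reduce_S(c, ["i := i + 1"])
print("\n".join(s))
```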
1. S → for L = E1 step E2 to E3 do S1
Can be factored as
1. F → for L
2. T → F = E1 step E2 to E3 do
3. S → T S1
Multi-dimensional arrays:
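Row-major or column-major forms
o In row-major form, the address of element A[i1, i2] of a two-dimensional array is
Base + ((i1 - low1) * (high2 - low2 + 1) + i2 - low2) * width,
where lowj and highj are the bounds of dimension j and width is the size of one element.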
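The row-major formula generalizes to any number of dimensions by accumulating the offset one dimension at a time: multiply the offset so far by the size of the next dimension, then add the next adjusted subscript. For two dimensions this gives exactly Base + ((i1 - low1) * (high2 - low2 + 1) + i2 - low2) * width. The function name and the (low, high) bounds format are assumptions for this sketch.

```python
def row_major_address(base, width, indices, bounds):
    """Address of A[i1, ..., ik] in row-major order; bounds holds one (low, high) per dimension."""
    if len(indices) != len(bounds):
        raise ValueError("one subscript is needed per dimension")
    offset = 0
    for i, (low, high) in zip(indices, bounds):
        if not low <= i <= high:
            raise IndexError(f"index {i} outside {low}..{high}")
        # shift by the size of this dimension, then add the adjusted subscript
        offset = offset * (high - low + 1) + (i - low)
    return base + offset * width

# A[1..10, 1..20] of 4-byte elements at base 100: A[3, 5]
print(row_major_address(100, 4, [3, 5], [(1, 10), (1, 20)]))   # 276
```

Check by hand: ((3 - 1) * 20 + 5 - 1) * 4 = 44 * 4 = 176, and 100 + 176 = 276.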
The production:
1. S → L := E
2. E → E+E
3. E → (E)
4. E → L
5. L → Elist ]
6. L → id
7. Elist → Elist, E
8. Elist → id[E
L → id {L.place = lookup(id.name);
L.offset = null;
}
Elist → Elist1, E {t := newtemp;
m := Elist1.ndim + 1;
EMIT (t ':=' Elist1.place '*' limit(Elist1.array, m));
EMIT (t ':=' t '+' E.place);
Elist.array = Elist1.array;
Elist.place := t;
Elist.ndim := m;
}
Where:
ndim denotes the number of dimensions (subscripts) processed so far.
limit(array, i) returns the upper limit along the i-th dimension of array.
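The Elist rule can be simulated directly: each additional subscript multiplies the partial result by limit(array, m) and adds the new subscript, which is the same accumulation as the row-major formula. This is a minimal sketch; the base case for Elist → id[E, the dictionary of dimension sizes standing in for limit, and the class name are assumptions. Lower bounds are not subtracted here, since textbook treatments usually fold them into a compile-time constant.

```python
class ArrayTranslator:
    def __init__(self, limits):
        self.limits = limits      # hypothetical: array name -> number of elements per dimension
        self.code = []
        self.ntemps = 0

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def emit(self, stmt):
        self.code.append(stmt)

    def limit(self, array, m):
        # size of dimension m (dimensions are numbered from 1)
        return self.limits[array][m - 1]

    def elist_first(self, array, e_place):
        # Elist -> id [ E : the first subscript needs no code yet
        return {"array": array, "place": e_place, "ndim": 1}

    def elist_next(self, elist1, e_place):
        # Elist -> Elist1 , E
        t = self.newtemp()
        m = elist1["ndim"] + 1
        self.emit(f"{t} := {elist1['place']} * {self.limit(elist1['array'], m)}")
        self.emit(f"{t} := {t} + {e_place}")
        return {"array": elist1["array"], "place": t, "ndim": m}

# A[i, j, k] where A has 10 x 20 x 30 elements
tr = ArrayTranslator({"A": [10, 20, 30]})
el = tr.elist_first("A", "i")
el = tr.elist_next(el, "j")
el = tr.elist_next(el, "k")
print("\n".join(tr.code))
```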
Procedures call
The procedure is an important and frequently used programming construct, so it is
important for a compiler to generate good code for procedure calls and returns.
Calling sequence:
The translation for a call includes a sequence of actions taken on entry and exit from each
procedure. Following actions take place in a calling sequence:
o When a procedure call occurs, space is allocated for the activation record.
o Establish the environment pointers to enable the called procedure to access data in
enclosing blocks.
o Save the state of the calling procedure so that it can resume execution after the call.
o Also save the return address. It is the address of the location to which the called
routine must transfer after it is finished.
o Finally generate a jump to the beginning of the code for the called procedure.
1. S→ call id(Elist)
2. Elist → Elist, E
3. Elist → E
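A minimal sketch of translating S → call id(Elist): the Elist productions collect the places of the evaluated arguments in a queue, and the call production then emits one param statement per argument followed by the call. The "call f, n" form with an argument count and all function names are assumptions for illustration.

```python
def elist_single(e_place):
    # Elist -> E : start the queue of argument places
    return [e_place]

def elist_append(queue, e_place):
    # Elist -> Elist , E : add the next argument, keeping source order
    return queue + [e_place]

def translate_call(name, queue):
    # S -> call id(Elist): one 'param' per argument, then the call itself
    code = [f"param {p}" for p in queue]
    code.append(f"call {name}, {len(queue)}")
    return code

# call f(a + b, c), assuming a + b was already evaluated into t1
args = elist_append(elist_single("t1"), "c")
print(translate_call("f", args))   # ['param t1', 'param c', 'call f, 2']
```

The arguments are evaluated before any param statement is emitted, so the param statements end up grouped together just before the call.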
Declarations
When we encounter declarations, we need to lay out storage for the declared variables.
For every local name in a procedure, we create a symbol table (ST) entry containing:
The production:
1. D→ integer, id
2. D → real, id
3. D → D1, id
ENTER is used to make an entry in the symbol table, and ATTR is used to track the data type.
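A minimal sketch of these productions: the type keyword sets D.ATTR, each further D → D1, id reuses D1.ATTR, and ENTER records the name with its type and a relative address. The storage widths (4 for integer, 8 for real), the offset bookkeeping, and the names SymbolTable and declare are assumptions.

```python
class SymbolTable:
    WIDTH = {"integer": 4, "real": 8}   # assumed sizes in bytes

    def __init__(self):
        self.entries = {}    # name -> (type, relative address)
        self.offset = 0      # next free relative address

    def enter(self, name, typ):
        # ENTER: add the name with its type and reserve storage for it
        if name in self.entries:
            raise ValueError(f"{name} is declared twice")
        self.entries[name] = (typ, self.offset)
        self.offset += self.WIDTH[typ]

def declare(table, typ, names):
    # D -> integer, id | real, id sets D.ATTR; each D -> D1, id reuses D1.ATTR
    attr = typ
    for name in names:
        table.enter(name, attr)
    return attr

st = SymbolTable()
declare(st, "integer", ["a", "b"])   # integer, a, b
declare(st, "real", ["x"])           # real, x
print(st.entries)   # {'a': ('integer', 0), 'b': ('integer', 4), 'x': ('real', 8)}
```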
Case Statements
Switch and case statements are available in a variety of languages. The syntax of the case
statement is as follows:
switch E
begin
    case V1: S1
    case V2: S2
    ...
    case Vn-1: Sn-1
    default: Sn
end
o When the switch keyword is seen, a new temporary T and two new labels, test and next, are
generated.
o For each case keyword, a new label Li is created and entered into the symbol table. The
value Vi of each case constant and a pointer to this symbol-table entry are placed on a
stack.
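A minimal sketch of the code layout these steps lead to: E is evaluated into T with a jump to test, each case body gets its own label and ends with goto next, and the tests are generated at the end from the (Vi, Li) pairs. Storing label names directly instead of symbol-table pointers, and the function name translate_switch, are simplifying assumptions.

```python
def translate_switch(e_code, e_place, cases, default):
    """cases: (Vi, code for Si) pairs in source order; default: code for Sn."""
    labels = (f"L{i}" for i in range(1, len(cases) + 2))
    code = e_code + [f"T := {e_place}", "goto test"]   # T is the new temporary
    stack = []                                          # (Vi, Li) pairs
    for value, stmt in cases:
        label = next(labels)
        stack.append((value, label))
        code += [f"{label}:"] + stmt + ["goto next"]
    default_label = next(labels)
    code += [f"{default_label}:"] + default + ["goto next"]
    code.append("test:")
    for value, label in stack:       # read bottom to top so the tests follow source order
        code.append(f"if T = {value} goto {label}")
    code += [f"goto {default_label}", "next:"]
    return code

# switch x begin case 1: y := 10  case 2: y := 20  default: y := 0 end
for stmt in translate_switch([], "x", [(1, ["y := 10"]), (2, ["y := 20"])], ["y := 0"]):
    print(stmt)
```

Placing the tests after the case bodies lets a single pass emit each body as soon as it is parsed, because the label of every body is already known when the tests are generated.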