CD Unit 3

UNIT-3
PART-A

Semantic Analysis
Syntax Analysis
Example: a := b + c * 100
❖ The seven tokens are grouped into a parse tree:

                assignment stmt
               /       |        \
        identifier    :=      expression
            |                /     |     \
            a        expression    +    expression
                         |                  |
                     identifier          c * 100
                         |
                         b
Example of a Parse Tree
Given the grammar:
    list  → list + digit                            (2.2)
    list  → list - digit                            (2.3)
    list  → digit                                   (2.4)
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9   (2.5)

What is the parse tree for 9-5+2?

              list
            /  |   \
         list  +   digit
        /  |  \       |
     list  -  digit   2
       |        |
     digit      5
       |
       9
Abstract Syntax Tree (AST)
The AST is a condensed/simplified/abstract form of the parse tree in which:
1. Operators are directly associated with interior nodes (non-terminals).
2. Chains of single productions are collapsed.
3. Terminals that have no attributes are ignored, i.e., the corresponding leaf nodes are discarded.
Abstract and Concrete Trees

              list                          +
            /  |   \                       / \
         list  +   digit                  -   2
        /  |  \       |                  / \
     list  -  digit   2                 9   5
       |        |
     digit      5
       |
       9

   Parse (concrete) tree          Abstract syntax tree
Advantages of the AST Representation
• A convenient representation for semantic analysis and intermediate-language (IL) generation
• Useful for building other programming-language tools, e.g., a syntax-directed editor
Syntax Directed Translation (SDT)
Syntax-directed translation is a method of translating a string into a sequence of actions by attaching one such action to each rule of a grammar.

A syntax-directed translation is defined by augmenting the CFG: a translation rule is defined for each production. A translation rule defines the translation of the left-hand-side non-terminal.
Syntax-Directed Definitions and Translation Schemes
A. Syntax-Directed Definitions:
• give high-level specifications for translations
• hide many implementation details, such as the order of evaluation of semantic actions.
• We associate a production rule with a set of semantic actions, and we do not say when they will be evaluated.

B. Translation Schemes:
• indicate the order of evaluation of the semantic actions associated with a production rule.
• In other words, translation schemes expose some implementation details.
Example Syntax-Directed Definition

term ::= ID
    { term.place := ID.place ; term.code := "" }

term1 ::= term2 * ID
    { term1.place := newtemp( );
      term1.code := term2.code || ID.code ||
                    gen(term1.place ':=' term2.place '*' ID.place) }

expr ::= term
    { expr.place := term.place ; expr.code := term.code }

expr1 ::= expr2 + term
    { expr1.place := newtemp( );
      expr1.code := expr2.code || term.code ||
                    gen(expr1.place ':=' expr2.place '+' term.place) }
YACC – Yet Another Compiler-Compiler
• A bottom-up parser generator.
• Provides semantic-stack manipulation and supports the specification of semantic routines.
• Developed by Steve Johnson and others at AT&T Bell Labs.
• Can use a scanner generated by Lex or a hand-coded scanner in C.
• Used by many compilers and tools, including production compilers.
Syntax-Directed Translation
• Grammar symbols are associated with attributes to associate information with the programming-language constructs they represent.
• The values of these attributes are computed by the semantic rules associated with the production rules.
• Evaluation of these semantic rules:
  – may generate intermediate code
  – may put information into the symbol table
  – may perform type checking
  – may issue error messages
  – may perform some other activities
  – in fact, they may perform almost any activity.
• An attribute may hold almost anything:
  – a string, a number, a memory location, a complex record.
Syntax-Directed Definitions and Translation Schemes
• When we associate semantic rules with productions, we use two notations:
  – Syntax-Directed Definitions
  – Translation Schemes
• Syntax-Directed Definitions:
  – give high-level specifications for translations
  – hide many implementation details, such as the order of evaluation of semantic actions.
  – We associate a production rule with a set of semantic actions, and we do not say when they will be evaluated.
• Translation Schemes:
  – indicate the order of evaluation of the semantic actions associated with a production rule.
  – In other words, translation schemes expose some implementation details.
Syntax-Directed Definitions
A syntax-directed definition is a generalization of a context-free grammar in which:
– Each grammar symbol is associated with a set of attributes.
– The set of attributes of a grammar symbol is partitioned into two subsets, called the synthesized and inherited attributes of that grammar symbol.
– Each production rule is associated with a set of semantic rules.
• Semantic rules set up dependencies between attributes, which can be represented by a dependency graph.
• This dependency graph determines the evaluation order of the semantic rules.
• Evaluating a semantic rule defines the value of an attribute; a semantic rule may also have side effects, such as printing a value.
Annotated Parse Tree
• A parse tree showing the values of the attributes at each node is called an annotated parse tree.
• The process of computing the attribute values at the nodes is called annotating (or decorating) the parse tree.
• Of course, the order of these computations depends on the dependency graph induced by the semantic rules.
Syntax-Directed Definition
• In a syntax-directed definition, each production A → α is associated with a set of semantic rules of the form b = f(c1,c2,…,cn), where f is a function and b can be one of the following:
  ➔ b is a synthesized attribute of A, and c1,c2,…,cn are attributes of the grammar symbols in the production A → α;
  OR
  ➔ b is an inherited attribute of one of the grammar symbols in α (on the right side of the production), and c1,c2,…,cn are attributes of the grammar symbols in the production A → α.
Attribute Grammar
• A semantic rule b = f(c1,c2,…,cn) indicates that the attribute b depends on the attributes c1,c2,…,cn.
• In a syntax-directed definition, a semantic rule may just compute the value of an attribute, or it may have side effects such as printing values.
• An attribute grammar is a syntax-directed definition in which the functions in the semantic rules cannot have side effects (they can only compute attribute values).
Syntax-Directed Definition – Example
Production      Semantic Rules
L → E return    print(E.val)
E → E1 + T      E.val = E1.val + T.val
E → T           E.val = T.val
T → T1 * F      T.val = T1.val * F.val
T → F           T.val = F.val
F → ( E )       F.val = E.val
F → digit       F.val = digit.lexval

• The symbols E, T, and F are associated with a synthesized attribute val.
• The token digit has a synthesized attribute lexval (it is assumed that it is computed by the lexical analyzer).
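The synthesized attribute val above can be computed directly during parsing. Below is a minimal sketch (not from the slides; function names such as evaluate and parse_expr are illustrative) that evaluates val with a recursive-descent evaluator for the same expression grammar, with iteration standing in for the left-recursive productions:

```python
# Evaluate the synthesized attribute `val` for the grammar
# E -> E+T | T, T -> T*F | F, F -> (E) | digit.
def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_expr():                  # E -> T { + T }
        nonlocal pos
        val = parse_term()
        while peek() == '+':
            pos += 1
            val = val + parse_term()   # E.val = E1.val + T.val
        return val

    def parse_term():                  # T -> F { * F }
        nonlocal pos
        val = parse_factor()
        while peek() == '*':
            pos += 1
            val = val * parse_factor() # T.val = T1.val * F.val
        return val

    def parse_factor():                # F -> (E) | digit
        nonlocal pos
        tok = peek()
        if tok == '(':
            pos += 1
            val = parse_expr()
            pos += 1                   # consume ')'
            return val
        pos += 1
        return int(tok)                # F.val = digit.lexval

    return parse_expr()

print(evaluate(list("5+3*4")))   # prints 17
```

Running it on the input 5+3*4 yields 17, matching the annotated parse tree on the next slide.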
Annotated Parse Tree – Example
Input: 5+3*4

                      L
                   /     \
              E.val=17   return
             /    |    \
        E.val=5   +    T.val=12
           |          /    |    \
        T.val=5  T.val=3   *   F.val=4
           |         |             |
        F.val=5  F.val=3   digit.lexval=4
           |         |
  digit.lexval=5  digit.lexval=3
Dependency Graph
Input: 5+3*4

              E.val=17
             /        \
        E.val=5      T.val=12
           |        /         \
        T.val=5  T.val=3    F.val=4
           |         |          |
        F.val=5  F.val=3  digit.lexval=4
           |         |
  digit.lexval=5  digit.lexval=3
Syntax-Directed Definition – Example 2
Production     Semantic Rules
E → E1 + T     E.loc = newtemp(), E.code = E1.code || T.code || add E1.loc,T.loc,E.loc
E → T          E.loc = T.loc, E.code = T.code
T → T1 * F     T.loc = newtemp(), T.code = T1.code || F.code || mult T1.loc,F.loc,T.loc
T → F          T.loc = F.loc, T.code = F.code
F → ( E )      F.loc = E.loc, F.code = E.code
F → id         F.loc = id.name, F.code = ""
Syntax-Directed Definition – Example 2 (cont.)
• The symbols E, T, and F are associated with the synthesized attributes loc and code.
• The token id has a synthesized attribute name (it is assumed that it is computed by the lexical analyzer).
• || denotes the string-concatenation operator.
Syntax-Directed Definition – Inherited Attributes
Production     Semantic Rules
D → T L        L.in = T.type
T → int        T.type = integer
T → real       T.type = real
L → L1 id      L1.in = L.in, addtype(id.entry, L.in)
L → id         addtype(id.entry, L.in)

• The symbol T is associated with a synthesized attribute type.
• The symbol L is associated with an inherited attribute in.
A Dependency Graph – Inherited Attributes
Input: real p q

Parse tree:                 Dependency graph (edges point from the
                            attribute used to the attribute defined):
        D
       / \                  T.type=real  →  L.in=real
      T   L                 L.in=real    →  addtype(q,real)
      |  / \                L.in=real    →  L1.in=real
   real L1  id (q)          L1.in=real   →  addtype(p,real)
      |                     (each addtype also uses the id.entry
     id (p)                  of the corresponding identifier)
S-Attributed Definitions
• Syntax-directed definitions are used to specify syntax-directed translations.
• Creating a translator for an arbitrary syntax-directed definition can be difficult.
• We would like to evaluate the semantic rules during parsing (i.e., in a single pass we parse and also evaluate the semantic rules during parsing).
• We will look at two sub-classes of syntax-directed definitions:
  – S-Attributed Definitions: only synthesized attributes are used in the syntax-directed definitions.
  – L-Attributed Definitions: in addition to synthesized attributes, we may also use inherited attributes in a restricted fashion.
S-Attributed Definitions (cont.)
• S-Attributed Definitions and L-Attributed Definitions are easy to implement (we can evaluate the semantic rules in a single pass during parsing).
• Implementations of S-Attributed Definitions are a little easier than implementations of L-Attributed Definitions.
Bottom-Up Evaluation of S-Attributed Definitions
• We put the values of the synthesized attributes of the grammar symbols into a parallel stack.
  – When an entry of the parser stack holds a grammar symbol X (terminal or non-terminal), the corresponding entry in the parallel stack holds the synthesized attribute(s) of the symbol X.
• We evaluate the values of the attributes during reductions.
Bottom-Up Evaluation of S-Attributed Definitions (cont.)
A → XYZ   A.a = f(X.x, Y.y, Z.z)   where all attributes are synthesized.

   stack    parallel stack              stack    parallel stack
  top → Z        Z.z
        Y        Y.y           ➔      top → A        A.a
        X        X.x                        …         …
        …         …
Bottom-Up Eval. of S-Attributed Definitions (cont.)
Production     Semantic Rules
L → E return   print(val[top-1])
E → E1 + T     val[ntop] = val[top-2] + val[top]
E → T
T → T1 * F     val[ntop] = val[top-2] * val[top]
T → F
F → ( E )      val[ntop] = val[top-1]
F → digit

• At each shift of digit, we also push digit.lexval onto the val-stack.
• At all other shifts, we do not put anything onto the val-stack, because the other terminals do not have attributes (but we still increment the stack pointer for the val-stack).
Canonical LR(0) Collection for the Grammar
(L' → L, L → E r, E → E+T | T, T → T*F | F, F → (E) | d)

I0:  L'→.L  L→.Er  E→.E+T  E→.T  T→.T*F  T→.F  F→.(E)  F→.d
     goto(I0,L)=I1  goto(I0,E)=I2  goto(I0,T)=I3  goto(I0,F)=I4
     goto(I0,()=I5  goto(I0,d)=I6
I1:  L'→L.
I2:  L→E.r  E→E.+T          goto(I2,r)=I7  goto(I2,+)=I8
I3:  E→T.  T→T.*F           goto(I3,*)=I9
I4:  T→F.
I5:  F→(.E)  E→.E+T  E→.T  T→.T*F  T→.F  F→.(E)  F→.d
     goto(I5,E)=I10  goto(I5,T)=I3  goto(I5,F)=I4
     goto(I5,()=I5  goto(I5,d)=I6
I6:  F→d.
I7:  L→Er.
I8:  E→E+.T  T→.T*F  T→.F  F→.(E)  F→.d
     goto(I8,T)=I11  goto(I8,F)=I4  goto(I8,()=I5  goto(I8,d)=I6
I9:  T→T*.F  F→.(E)  F→.d   goto(I9,F)=I12  goto(I9,()=I5  goto(I9,d)=I6
I10: F→(E.)  E→E.+T         goto(I10,))=I13  goto(I10,+)=I8
I11: E→E+T.  T→T.*F         goto(I11,*)=I9
I12: T→T*F.
I13: F→(E).
Bottom-Up Evaluation – Example
(parsing 5+3*4r with the LR states above; the middle column is the val-stack, '-' marks an empty slot)

stack            val-stack   input   action
0E2+8F4          5-3         *4r     T → F     (T.val = F.val – do nothing)
0E2+8T11         5-3         *4r     s9        (push an empty slot onto the val-stack)
0E2+8T11*9       5-3-        4r      s6        (push d.lexval = 4 onto the val-stack)
0E2+8T11*9d6     5-3-4       r       F → d     (F.val = d.lexval – do nothing)
0E2+8T11*9F12    5-3-4       r       T → T*F   (T.val = T1.val * F.val)
0E2+8T11         5-12        r       E → E+T   (E.val = E1.val + T.val)
0E2              17          r       s7        (push an empty slot onto the val-stack)
Top-Down Evaluation (of S-Attributed Definitions)
Productions    Semantic Rules
A → B          print(B.n0), print(B.n1)
B → 0 B1       B.n0 = B1.n0 + 1, B.n1 = B1.n1
B → 1 B1       B.n0 = B1.n0, B.n1 = B1.n1 + 1
B → ε          B.n0 = 0, B.n1 = 0

where B has two synthesized attributes (n0 and n1).
Top-Down Evaluation (of S-Attributed Definitions)
• Remember: in a recursive predictive parser, each non-terminal corresponds to a procedure.

procedure A() {
    call B();                                          /* A → B   */
}
procedure B() {
    if (currtoken=0) { consume 0; call B(); }          /* B → 0 B */
    else if (currtoken=1) { consume 1; call B(); }     /* B → 1 B */
    else if (currtoken=$) {}   // $ is the end-marker  /* B → ε   */
    else error("unexpected token");
}
Top-Down Evaluation (of S-Attributed Definitions)
procedure A() {
    int n0, n1;          /* the synthesized attributes of non-terminal B   */
    call B(&n0, &n1);    /* are the output parameters of procedure B       */
    print(n0); print(n1);
}
/* All the semantic rules can be evaluated at the end of parsing of the
   production rules. */
procedure B(int *n0, int *n1) {
    if (currtoken=0)
        { int a,b; consume 0; call B(&a,&b); *n0=a+1; *n1=b; }
    else if (currtoken=1)
        { int a,b; consume 1; call B(&a,&b); *n0=a; *n1=b+1; }
    else if (currtoken=$) { *n0=0; *n1=0; }   // $ is the end-marker
    else error("unexpected token");
}
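The same procedures can be written in Python, where the synthesized attributes are simply the return value. This is an illustrative sketch (names parse_A/parse_B are not from the slides):

```python
# Recursive predictive parser for B -> 0 B | 1 B | epsilon, returning
# the synthesized attributes (n0, n1) = (number of 0s, number of 1s).
def parse_B(tokens, pos=0):
    if pos < len(tokens) and tokens[pos] == '0':      # B -> 0 B1
        n0, n1 = parse_B(tokens, pos + 1)
        return n0 + 1, n1                             # B.n0 = B1.n0 + 1
    if pos < len(tokens) and tokens[pos] == '1':      # B -> 1 B1
        n0, n1 = parse_B(tokens, pos + 1)
        return n0, n1 + 1                             # B.n1 = B1.n1 + 1
    return 0, 0                                       # B -> epsilon

def parse_A(tokens):                                  # A -> B
    n0, n1 = parse_B(tokens)
    print(n0, n1)                                     # print(B.n0), print(B.n1)

parse_A("00110")   # prints: 3 2
```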
L-Attributed Definitions
• S-Attributed Definitions can be implemented efficiently.
• We are looking for a larger subset of syntax-directed definitions (larger than S-Attributed Definitions) which can still be evaluated efficiently:
  ➔ L-Attributed Definitions
• L-Attributed Definitions can always be evaluated by a depth-first visit of the parse tree.
• This means that they can also be evaluated during parsing.
L-Attributed Definitions (cont.)
• A syntax-directed definition is L-attributed if each inherited attribute of Xj (1 ≤ j ≤ n) on the right side of A → X1 X2 ... Xn depends only on:
  1. the attributes of the symbols X1, ..., Xj-1 to the left of Xj in the production, and
  2. the inherited attributes of A.
• Every S-attributed definition is L-attributed; the restrictions apply only to inherited attributes (not to synthesized attributes).
A Definition which is NOT L-Attributed
Productions    Semantic Rules
A → L M        L.in = l(A.i), M.in = m(L.s), A.s = f(M.s)
A → Q R        R.in = r(A.in), Q.in = q(R.s), A.s = f(Q.s)
• This syntax-directed definition is not L-attributed, because the semantic rule Q.in = q(R.s) violates the restrictions of L-attributed definitions.
• Q.in must be evaluated before we enter Q, because it is an inherited attribute.
• But the value of Q.in depends on R.s, which becomes available only after we return from R. So we are not able to evaluate the value of Q.in before we enter Q.
Translation Schemes
• In a syntax-directed definition, we do not say anything about the evaluation times of the semantic rules (when should the semantic rules associated with a production be evaluated?).
• A translation scheme is a context-free grammar in which:
  – attributes are associated with the grammar symbols, and
  – semantic actions, enclosed between braces {}, are inserted within the right sides of productions.
• Ex: A → { ... } X { ... } Y { ... }
      (the { ... } are semantic actions)
Translation Schemes (cont.)
• When designing a translation scheme, some restrictions should be observed to ensure that an attribute value is available when a semantic action refers to it.
• These restrictions (motivated by L-attributed definitions) ensure that a semantic action does not refer to an attribute that has not yet been computed.
• In translation schemes, we use the term semantic action instead of the term semantic rule used in syntax-directed definitions.
• The position of a semantic action on the right side indicates when that semantic action will be evaluated.
Translation Schemes for S-attributed Definitions
• If our syntax-directed definition is S-attributed, the construction of the corresponding translation scheme is simple.
• Each semantic rule of an S-attributed syntax-directed definition is inserted as a semantic action at the end of the right side of the associated production.

Production                             Semantic Rule
E → E1 + T                             E.val = E1.val + T.val
  ➔ a production of a syntax-directed definition

E → E1 + T { E.val = E1.val + T.val }
  ➔ the production of the corresponding translation scheme
A Translation Scheme Example
• A simple translation scheme that converts infix expressions to the corresponding postfix expressions:

E → T R
R → + T { print("+") } R1
R → ε
T → id { print(id.name) }

a+b+c  ➔  ab+c+
(infix expression ➔ postfix expression)
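The scheme above can be run directly as a recursive predictive parser; this is an illustrative Python sketch (the name to_postfix and the output list are assumptions, standing in for the print actions):

```python
# Translation scheme E -> T R, R -> + T {print "+"} R | eps,
# T -> id {print id.name}; output collected in a list instead of printed.
def to_postfix(tokens):
    out = []
    pos = 0

    def T():                       # T -> id { print(id.name) }
        nonlocal pos
        out.append(tokens[pos])
        pos += 1

    def R():                       # R -> + T { print("+") } R1 | epsilon
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == '+':
            pos += 1
            T()
            out.append('+')        # the semantic action sits after T
            R()

    T(); R()                       # E -> T R
    return ''.join(out)

print(to_postfix(list("a+b+c")))   # prints ab+c+
```

Note how placing the action after T but before R1 is exactly what makes the output come out in postfix order.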
A Translation Scheme Example (cont.)

E
├── T
│   └── id {print("a")}
└── R
    ├── +
    ├── T
    │   └── id {print("b")}
    ├── {print("+")}
    └── R
        ├── +
        ├── T
        │   └── id {print("c")}
        ├── {print("+")}
        └── R
            └── ε

The depth-first traversal of the parse tree (executing the semantic actions in that order) produces the postfix representation of the infix expression.
Inherited Attributes in Translation Schemes
• If a translation scheme contains both synthesized and inherited attributes, we have to observe the following rules:
  1. An inherited attribute of a symbol on the right side of a production must be computed in a semantic action before that symbol.
  2. A semantic action must not refer to a synthesized attribute of a symbol to the right of that semantic action.
  3. A synthesized attribute of the non-terminal on the left can only be computed after all the attributes it references have been computed (we normally put this semantic action at the end of the right side of the production).
• For an L-attributed syntax-directed definition, it is always possible to construct a corresponding translation scheme which satisfies these three conditions (this may not be possible for a general syntax-directed translation).
Top-Down Translation
• We will look at the implementation of L-attributed definitions during predictive parsing.
• Instead of syntax-directed definitions, we will work with translation schemes.
• We will see how to evaluate inherited attributes (in L-attributed definitions) during recursive predictive parsing.
• We will also look at what happens to attributes during left-recursion elimination in left-recursive grammars.
A Translation Scheme with Inherited Attributes
D → T id { addtype(id.entry, T.type); L.in = T.type } L
T → int  { T.type = integer }
T → real { T.type = real }
L → id   { addtype(id.entry, L.in); L1.in = L.in } L1
L → ε

• This is a translation scheme for an L-attributed definition.
Predictive Parsing (of Inherited Attributes)
procedure D() {
    int Ttype, Lin, identry;
    call T(&Ttype); consume(id, &identry);
    addtype(identry, Ttype); Lin = Ttype;
    call L(Lin);               /* Lin: an inherited attribute (an input parameter)     */
}
procedure T(int *Ttype) {      /* Ttype: a synthesized attribute (an output parameter) */
    if (currtoken is int)  { consume(int);  *Ttype = TYPEINT; }
    else if (currtoken is real) { consume(real); *Ttype = TYPEREAL; }
    else { error("unexpected type"); }
}
Predictive Parsing (of Inherited Attributes)
procedure L(int Lin) {
    if (currtoken is id) {
        int L1in, identry;
        consume(id, &identry);
        addtype(identry, Lin); L1in = Lin; call L(L1in);
    }
    else if (currtoken is endmarker) { }
    else { error("unexpected token"); }
}
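A runnable sketch of the D/T/L procedures above, mirroring how the inherited attribute L.in is passed down as an argument while T.type is returned. The dict standing in for the symbol table and the function names are illustrative assumptions:

```python
# Predictive parsing of declarations such as "real p q":
# D -> T id L, T -> int | real, L -> id L | epsilon.
symtab = {}

def addtype(entry, t):
    symtab[entry] = t          # simulated symbol-table update

def parse_T(tokens):           # T.type is synthesized: returned
    tok = tokens.pop(0)
    if tok in ('int', 'real'):
        return tok
    raise SyntaxError('unexpected type')

def parse_L(tokens, lin):      # L.in is inherited: passed as argument
    if tokens:                 # L -> id L1
        ident = tokens.pop(0)
        addtype(ident, lin)
        parse_L(tokens, lin)   # L1.in = L.in
                               # else: L -> epsilon

def parse_D(tokens):           # D -> T id { addtype; L.in = T.type } L
    ttype = parse_T(tokens)
    ident = tokens.pop(0)
    addtype(ident, ttype)
    parse_L(tokens, ttype)

parse_D(['real', 'p', 'q'])
print(symtab)   # prints: {'p': 'real', 'q': 'real'}
```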
Translation Scheme – Intermediate Code Generation
E → T { A.in = T.loc } A { E.loc = A.loc }
A → + T { A1.in = newtemp(); emit('add', A.in, T.loc, A1.in) } A1 { A.loc = A1.loc }
A → ε { A.loc = A.in }
T → F { B.in = F.loc } B { T.loc = B.loc }
B → * F { B1.in = newtemp(); emit('mult', B.in, F.loc, B1.in) } B1 { B.loc = B1.loc }
B → ε { B.loc = B.in }
F → ( E ) { F.loc = E.loc }
F → id { F.loc = id.name }
Predictive Parsing – Intermediate Code Generation
procedure E(char **Eloc) {
    char *Ain, *Tloc, *Aloc;
    call T(&Tloc); Ain = Tloc;
    call A(Ain, &Aloc); *Eloc = Aloc;
}
procedure A(char *Ain, char **Aloc) {
    if (currtok is +) {
        char *A1in, *Tloc, *A1loc;
        consume(+); call T(&Tloc); A1in = newtemp();
        emit("add", Ain, Tloc, A1in);
        call A(A1in, &A1loc); *Aloc = A1loc;
    }
    else { *Aloc = Ain; }
}
Predictive Parsing (cont.)
procedure T(char **Tloc) {
    char *Bin, *Floc, *Bloc;
    call F(&Floc); Bin = Floc;
    call B(Bin, &Bloc); *Tloc = Bloc;
}
procedure B(char *Bin, char **Bloc) {
    if (currtok is *) {
        char *B1in, *Floc, *B1loc;
        consume(*); call F(&Floc); B1in = newtemp();
        emit("mult", Bin, Floc, B1in);
        call B(B1in, &B1loc); *Bloc = B1loc;
    }
Predictive Parsing (cont.)
    else { *Bloc = Bin; }
}
procedure F(char **Floc) {
    if (currtok is "(") { char *Eloc; consume("(");
        call E(&Eloc); consume(")"); *Floc = Eloc; }
    else { char *idname; consume(id, &idname);
        *Floc = idname; }
}
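An illustrative Python transcription of the E, A, T, B and F procedures above, emitting three-address code into a list. The temporary-name scheme and token handling are simplifying assumptions:

```python
# Predictive parser emitting quadruples for expressions over ids with + and *.
code = []
tempcount = 0

def newtemp():
    global tempcount
    tempcount += 1
    return 't%d' % tempcount

def emit(op, a, b, dst):
    code.append('%s %s,%s,%s' % (op, a, b, dst))

def parse_E(tokens):                       # E -> T A
    return parse_A(tokens, parse_T(tokens))

def parse_A(tokens, ain):                  # A -> + T A | epsilon
    if tokens and tokens[0] == '+':
        tokens.pop(0)
        tloc = parse_T(tokens)
        a1in = newtemp()
        emit('add', ain, tloc, a1in)       # emit after seeing + T
        return parse_A(tokens, a1in)
    return ain                             # A.loc = A.in

def parse_T(tokens):                       # T -> F B
    return parse_B(tokens, parse_F(tokens))

def parse_B(tokens, bin_):                 # B -> * F B | epsilon
    if tokens and tokens[0] == '*':
        tokens.pop(0)
        floc = parse_F(tokens)
        b1in = newtemp()
        emit('mult', bin_, floc, b1in)
        return parse_B(tokens, b1in)
    return bin_                            # B.loc = B.in

def parse_F(tokens):                       # F -> ( E ) | id
    if tokens[0] == '(':
        tokens.pop(0)
        loc = parse_E(tokens)
        tokens.pop(0)                      # consume ')'
        return loc
    return tokens.pop(0)                   # F.loc = id.name

result = parse_E(['a', '+', 'b', '*', 'c'])
print(code, result)   # ['mult b,c,t1', 'add a,t1,t2'] t2
```

Note that * binds tighter than + simply because B runs inside T before A sees the +.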
Bottom-Up Evaluation of Inherited Attributes
• Using a top-down translation scheme, we can implement any L-attributed definition based on an LL(1) grammar.
• Using a bottom-up translation scheme, we can also implement any L-attributed definition based on an LL(1) grammar (each LL(1) grammar is also an LR(1) grammar).
• In addition to the L-attributed definitions based on LL(1) grammars, we can implement some (but not all) L-attributed definitions based on LR(1) grammars using a bottom-up translation scheme.
Removing Embedding Semantic Actions
• In a bottom-up evaluation scheme, the semantic actions are evaluated during reductions.
• During the bottom-up evaluation of S-attributed definitions, we have a parallel stack to hold the synthesized attributes.
• Problem: where are we going to hold the inherited attributes?
• A solution:
Removing Embedding Semantic Actions (cont.)
– We will convert our grammar into an equivalent grammar that guarantees the following:
– All embedded semantic actions in our translation scheme will be moved to the end of the production rules.
– All inherited attributes will be copied into synthesized attributes (most of the time, synthesized attributes of new non-terminals).
– Thus we will evaluate all semantic actions during reductions, and we find a place to store the inherited attributes.
Removing Embedding Semantic Actions (cont.)
• To transform our translation scheme into an equivalent translation scheme:
  1. Remove each embedded semantic action Si and put a new non-terminal Mi in its place.
  2. Put that semantic action Si at the end of a new production rule Mi → ε for that non-terminal Mi.
  3. The semantic action Si will be evaluated when this new production rule is reduced.
  4. The evaluation order of the semantic rules is not changed by this transformation.
Removing Embedding Semantic Actions (cont.)
A → {S1} X1 {S2} X2 ... {Sn} Xn

  ⇓ remove embedded semantic actions

A → M1 X1 M2 X2 ... Mn Xn
M1 → ε {S1}
M2 → ε {S2}
...
Mn → ε {Sn}
Removing Embedding Semantic Actions – Example
E → T R
R → + T { print("+") } R1
R → ε
T → id { print(id.name) }

  ⇓ remove embedded semantic actions

E → T R
R → + T M R1
R → ε
T → id { print(id.name) }
M → ε { print("+") }
Translation with Inherited Attributes
• Let us assume that every non-terminal A has an inherited attribute A.i, and every symbol X has a synthesized attribute X.s in our grammar.
• For every production rule A → X1 X2 ... Xn:
  – introduce new marker non-terminals M1, M2, ..., Mn, and
  – replace this production rule with A → M1 X1 M2 X2 ... Mn Xn;
  – the synthesized attribute of Xi is not changed;
  – the inherited attribute of Xi is copied into the synthesized attribute of Mi by the new semantic action added at the end of the new production rule Mi → ε;
  – now the inherited attribute of Xi can be found in the synthesized attribute of Mi (which is immediately available in the stack).
Translation with Inherited Attributes (cont.)
S → {A.i=1} A {S.s=k(A.i,A.s)}
A → {B.i=f(A.i)} B {C.i=g(A.i,B.i,B.s)} C {A.s=h(A.i,B.i,B.s,C.i,C.s)}
B → b {B.s=m(B.i,b.s)}
C → c {C.s=n(C.i,c.s)}

  ⇓

S → {M1.i=1} M1 {A.i=M1.s} A {S.s=k(M1.s,A.s)}
A → {M2.i=f(A.i)} M2 {B.i=M2.s} B {M3.i=g(A.i,M2.s,B.s)} M3 {C.i=M3.s} C
    {A.s=h(A.i,M2.s,B.s,M3.s,C.s)}
B → b {B.s=m(B.i,b.s)}
C → c {C.s=n(C.i,c.s)}
M1 → ε {M1.s=M1.i}
M2 → ε {M2.s=M2.i}
M3 → ε {M3.s=M3.i}
Actual Translation Scheme
S → {M1.i=1} M1 {A.i=M1.s} A {S.s=k(M1.s,A.s)}
A → {M2.i=f(A.i)} M2 {B.i=M2.s} B {M3.i=g(A.i,M2.s,B.s)} M3 {C.i=M3.s} C
    {A.s=h(A.i,M2.s,B.s,M3.s,C.s)}
B → b {B.s=m(B.i,b.s)}
C → c {C.s=n(C.i,c.s)}
M1 → ε {M1.s=M1.i}
M2 → ε {M2.s=M2.i}
M3 → ε {M3.s=M3.i}
Actual Translation Scheme (cont.)
S → M1 A       { s[ntop]=k(s[top-1],s[top]) }
M1 → ε         { s[ntop]=1 }
A → M2 B M3 C  { s[ntop]=h(s[top-4],s[top-3],s[top-2],s[top-1],s[top]) }
M2 → ε         { s[ntop]=f(s[top]) }
M3 → ε         { s[ntop]=g(s[top-2],s[top-1],s[top]) }
B → b          { s[ntop]=m(s[top-1],s[top]) }
C → c          { s[ntop]=n(s[top-1],s[top]) }
Evaluation of the Attributes

                       S    S.s=k(1,h(..))
                       |
            A.i=1      A    A.s=h(1,f(1),m(..),g(..),n(..))
                     /   \
        B.i=f(1)    B     C    C.i=g(1,f(1),m(..))
  B.s=m(f(1),b.s)   |     |    C.s=n(g(..),c.s)
                    b     c
Evaluation of the Attributes (cont.)
stack            input   s-attribute stack
                 bc$
M1               bc$     1
M1 M2            bc$     1 f(1)
M1 M2 b          c$      1 f(1) b.s
M1 M2 B          c$      1 f(1) m(f(1),b.s)
M1 M2 B M3       c$      1 f(1) m(f(1),b.s) g(1,f(1),m(f(1),b.s))
M1 M2 B M3 c     $       1 f(1) m(f(1),b.s) g(1,f(1),m(f(1),b.s)) c.s
M1 M2 B M3 C     $       1 f(1) m(f(1),b.s) g(1,f(1),m(f(1),b.s)) n(g(..),c.s)
M1 A             $       1 h(1,f(1),m(..),g(..),n(..))
S                $       k(1,h(..))
Problems
• Not all L-attributed definitions based on LR grammars can be evaluated during bottom-up parsing.

S → { L.i=0 } L               ➔ this translation scheme cannot be implemented
L → { L1.i=L.i+1 } L1 1          during bottom-up parsing
L → ε { print(L.i) }

S → M1 L                      ➔ But since L → ε will be reduced first by the
L → M2 L1 1                      bottom-up parser, the translator cannot know
L → ε { print(s[top]) }          the number of 1s.
M1 → ε { s[ntop]=0 }
M2 → ε { s[ntop]=s[top]+1 }
Problems (cont.)
• The modified grammar may no longer be an LR grammar.

L → L b          L → M L b
L → a       ➔    L → a          NOT an LR grammar
                 M → ε

S' → .L, $
L → .M L b, $
L → .a, $
M → ., a         ➔ shift/reduce conflict on a
Intermediate Code Generation
• Intermediate codes are machine-independent codes, but they are close to machine instructions.
• The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.
• The intermediate language can be one of many different languages; the designer of the compiler decides on this intermediate language.
  – Syntax trees can be used as an intermediate language.
Intermediate Code Generation (cont.)
  – Postfix notation can be used as an intermediate language.
  – Three-address code (quadruples) can be used as an intermediate language.
    • We will use quadruples to discuss intermediate code generation.
    • Quadruples are close to machine instructions, but they are not actual machine instructions.
  – Some programming languages have well-defined intermediate languages:
    • Java – Java Virtual Machine
    • Prolog – Warren Abstract Machine
    • In fact, there are byte-code emulators to execute instructions in these intermediate languages.
Three-Address Code (Quadruples)
• A quadruple is:
      x := y op z
  where x, y and z are names, constants or compiler-generated temporaries, and op is any operator.
• But we may also use the following notation for quadruples (a much better notation, because it looks like a machine-code instruction):
      op y,z,x
  apply operator op to y and z, and store the result in x.
• We use the term "three-address code" because each statement usually contains three addresses (two for the operands, one for the result).
Three-Address Statements
Binary Operator: op y,z,result  or  result := y op z
  where op is a binary arithmetic or logical operator. This binary operator is applied to y and z, and the result of the operation is stored in result.
  Ex: add a,b,c
      gt a,b,c
      addr a,b,c
      addi a,b,c

Unary Operator: op y,,result  or  result := op y
  where op is a unary arithmetic or logical operator. This unary operator is applied to y, and the result of the operation is stored in result.
  Ex: uminus a,,c
      not a,,c
      inttoreal a,,c
Three-Address Statements (cont.)
Move Operator: mov y,,result  or  result := y
  where the content of y is copied into result.
  Ex: mov a,,c
      movi a,,c
      movr a,,c

Unconditional Jumps: jmp ,,L  or  goto L
  We jump to the three-address statement with label L, and execution continues from that statement.
  Ex: jmp ,,L1   // jump to L1
      jmp ,,7    // jump to statement 7
Three-Address Statements (cont.)
Conditional Jumps: jmprelop y,z,L  or  if y relop z goto L
  We jump to the three-address statement with label L if the result of y relop z is true, and execution continues from that statement. If the result is false, execution continues from the statement following this conditional jump.

  Ex: jmpgt  y,z,L1   // jump to L1 if y>z
      jmpgte y,z,L1   // jump to L1 if y>=z
      jmpe   y,z,L1   // jump to L1 if y==z
      jmpne  y,z,L1   // jump to L1 if y!=z
Three-Address Statements (cont.)
The relational operator can also be a unary operator:
  jmpnz y,,L1   // jump to L1 if y is not zero
  jmpz  y,,L1   // jump to L1 if y is zero
  jmpt  y,,L1   // jump to L1 if y is true
  jmpf  y,,L1   // jump to L1 if y is false
Three-Address Statements (cont.)
Procedure Parameters: param x,,  or  param x
Procedure Calls:      call p,n,  or  call p,n
  where x is an actual parameter; we invoke the procedure p with n parameters.
  Ex: param x1,,
      param x2,,
      ...                ➔ p(x1,...,xn)
      param xn,,
      call p,n,

  f(x+1,y)  ➔  add x,1,t1
               param t1,,
               param y,,
               call f,2,
Three-Address Statements (cont.)
Indexed Assignments:
  move y[i],,x   or  x := y[i]
  move x,,y[i]   or  y[i] := x

Address and Pointer Assignments:
  moveaddr y,,x  or  x := &y
  movecont y,,x  or  x := *y
Syntax-Directed Translation into Three-Address Code
S → id := E   S.code = E.code || gen('mov' E.place ',,' id.place)

E → E1 + E2   E.place = newtemp();
              E.code = E1.code || E2.code || gen('add' E1.place ',' E2.place ',' E.place)

E → E1 * E2   E.place = newtemp();
              E.code = E1.code || E2.code || gen('mult' E1.place ',' E2.place ',' E.place)

E → - E1      E.place = newtemp();
              E.code = E1.code || gen('uminus' E1.place ',,' E.place)

E → ( E1 )    E.place = E1.place;
              E.code = E1.code

E → id        E.place = id.place;
              E.code = null
Syntax-Directed Translation (cont.)
S → while E do S1          S.begin = newlabel();
                           S.after = newlabel();
                           S.code = gen(S.begin ':') || E.code ||
                                    gen('jmpf' E.place ',,' S.after) || S1.code ||
                                    gen('jmp' ',,' S.begin) ||
                                    gen(S.after ':')

S → if E then S1 else S2   S.else = newlabel();
                           S.after = newlabel();
                           S.code = E.code ||
                                    gen('jmpf' E.place ',,' S.else) || S1.code ||
                                    gen('jmp' ',,' S.after) ||
                                    gen(S.else ':') || S2.code ||
                                    gen(S.after ':')
Translation Scheme to Produce Three-Address Code
S → id := E   { p = lookup(id.name);
                if (p is not nil) then emit('mov' E.place ',,' p)
                else error("undefined variable") }
E → E1 + E2   { E.place = newtemp();
                emit('add' E1.place ',' E2.place ',' E.place) }
E → E1 * E2   { E.place = newtemp();
                emit('mult' E1.place ',' E2.place ',' E.place) }
E → - E1      { E.place = newtemp();
                emit('uminus' E1.place ',,' E.place) }
E → ( E1 )    { E.place = E1.place; }
E → id        { p = lookup(id.name);
                if (p is not nil) then E.place = id.place
                else error("undefined variable") }
Translation Scheme with Locations
S → id := { E.inloc = S.inloc } E
     { p = lookup(id.name);
       if (p is not nil) then { emit(E.outloc 'mov' E.place ',,' p);
                                S.outloc = E.outloc + 1 }
       else { error("undefined variable"); S.outloc = E.outloc } }

E → { E1.inloc = E.inloc } E1 + { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc 'add' E1.place ',' E2.place ',' E.place);
       E.outloc = E2.outloc + 1 }

E → { E1.inloc = E.inloc } E1 * { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc 'mult' E1.place ',' E2.place ',' E.place);
       E.outloc = E2.outloc + 1 }
Translation Scheme with Locations (cont.)
E → - { E1.inloc = E.inloc } E1
     { E.place = newtemp(); emit(E1.outloc 'uminus' E1.place ',,' E.place);
       E.outloc = E1.outloc + 1 }

E → ( { E1.inloc = E.inloc } E1 )
     { E.place = E1.place; E.outloc = E1.outloc }

E → id
     { E.outloc = E.inloc; p = lookup(id.name);
       if (p is not nil) then E.place = id.place
       else error("undefined variable") }
Boolean Expressions
E → { E1.inloc = E.inloc } E1 and { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc 'and' E1.place ',' E2.place ',' E.place);
       E.outloc = E2.outloc + 1 }
E → { E1.inloc = E.inloc } E1 or { E2.inloc = E1.outloc } E2
     { E.place = newtemp(); emit(E2.outloc 'or' E1.place ',' E2.place ',' E.place);
       E.outloc = E2.outloc + 1 }
E → not { E1.inloc = E.inloc } E1
     { E.place = newtemp(); emit(E1.outloc 'not' E1.place ',,' E.place);
       E.outloc = E1.outloc + 1 }
E → { E1.inloc = E.inloc } E1 relop { E2.inloc = E1.outloc } E2
     { E.place = newtemp();
       emit(E2.outloc relop.code E1.place ',' E2.place ',' E.place);
       E.outloc = E2.outloc + 1 }
Translation
Scheme(cont.)
S → while { E.inloc = S.inloc } E do
{ emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
S1.inloc=E.outloc+1; } S1
{ emit(S1.outloc ‘jmp’ ‘,,’S.inloc);
S.outloc=S1.outloc+1;
backpatch(E.outloc,S.outloc); }

356
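The while-loop scheme above emits a conditional jump whose target is not yet known and later fills it in with backpatch. A minimal Python sketch of this idea (the instruction strings and locations are illustrative, not the document's exact emitter):

```python
# Sketch of emit/backpatch for located code; instruction text is illustrative.
code = {}

def emit(loc, instr):
    code[loc] = instr

def backpatch(loc, target):
    # replace the placeholder target of the jump emitted at loc
    code[loc] = code[loc].replace("NOTKNOWN", str(target))

# while E do S1: the jmpf emitted after E cannot know S.outloc yet
emit(2, "jmpf t1,,NOTKNOWN")   # forward jump out of the loop
emit(5, "jmp ,,1")             # back edge to the loop test
backpatch(2, 6)                # once S.outloc is known, patch it in

assert code[2] == "jmpf t1,,6"
```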
Translation
Scheme(cont.)
S → if { E.inloc = S.inloc } E then
{ emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
S1.inloc=E.outloc+1; } S1 else
{ emit(S1.outloc ‘jmp’ ‘,,’ ‘NOTKNOWN’);
S2.inloc=S1.outloc+1;
backpatch(E.outloc,S2.inloc); } S2
{ S.outloc=S2.outloc;
backpatch(S1.outloc,S.outloc); }

357
Three Address
Codes - Example
x:=1; 01: mov 1,,x
y:=x+10; 02: add x,10,t1
while (x<y) { ➔ 03: mov t1,,y
x:=x+1; 04: lt x,y,t2
if (x%2==1) then y:=y+1; 05: jmpf t2,,17
else y:=y-2; 06: add x,1,t3
} 07: mov t3,,x

358
Three Address
Codes - Example
08: mod x,2,t4
09: eq t4,1,t5
10: jmpf t5,,14
11: add y,1,t6
12: mov t6,,y
13: jmp ,,16
14: sub y,2,t7
15: mov t7,,y
16: jmp ,,4
17:

359
Arrays
• Elements of arrays can be accessed quickly if the elements
are stored in a block of consecutive locations.
A one-dimensional array A:
(figure: a block of consecutive cells starting at address baseA;
the element with index i lies (i-low)*width bytes past baseA)

baseA is the address of the first location of the array A,
width is the width of each array element.
low is the index of the first array element

location of A[i] ➔ baseA+(i-low)*width


360
Arrays
(cont.)
baseA+(i-low)*width
can be re-written as i*width + (baseA-low*width)
should be computed at run-time can be computed at
compile-time
• So, the location of A[i] can be computed at the run-time by
evaluating the formula i*width+c where c is (baseA-
low*width) which is evaluated at compile-time.

• Intermediate code generator should produce the code to


evaluate this formula i*width+c (one multiplication and
one addition operation).
361
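The split into a run-time part and a compile-time constant can be sketched directly (the base address and bounds below are hypothetical):

```python
# Sketch: splitting the 1-D address formula into a compile-time constant
# c = baseA - low*width and a run-time part i*width + c.
def element_address(base_a, low, width, i):
    c = base_a - low * width        # computable at compile time
    return i * width + c            # one mult + one add at run time

# hypothetical array A : 5..100 of 8-byte elements at base address 1000
assert element_address(1000, 5, 8, 5) == 1000   # first element sits at baseA
assert element_address(1000, 5, 8, 7) == 1016
```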
Two-Dimensional Arrays
• A two-dimensional array can be stored in
– either row-major (row-by-row) or
– column-major (column-by-column).
• Most of the programming languages use row-major
method.

• Row-major representation of a two-dimensional array:

row1 row2 rown

362
Two-Dimensional Arrays (cont.)
The location of A[i1,i2] is
baseA + ((i1-low1)*n2 + i2-low2)*width
baseA is the location of the array A.
low1 is the index of the first row
low2 is the index of the first column
n2 is the number of elements in each row
width is the width of each array element
• Again, this formula can be re-written as

((i1*n2)+i2)*width + (baseA-((low1*n2)+low2)*width)

where ((i1*n2)+i2)*width must be computed at run-time and
(baseA-((low1*n2)+low2)*width) can be computed at
compile-time.
363
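The row-major formula can be sketched the same way, with the constant part folded at compile time (the base address below is hypothetical; bounds match Example 2 later in the section):

```python
# Sketch: row-major address of A[i1,i2], split into a run-time part and
# a compile-time constant, as in the formula above.
def address_2d(base_a, low1, low2, n2, width, i1, i2):
    c = base_a - ((low1 * n2) + low2) * width   # compile-time constant
    return ((i1 * n2) + i2) * width + c         # run-time part

# hypothetical int array A : 1..10 x 1..20, width 4, base address 500
assert address_2d(500, 1, 1, 20, 4, 1, 1) == 500   # A[1,1] sits at baseA
assert address_2d(500, 1, 1, 20, 4, 1, 2) == 504   # next element in the row
```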
Multi-Dimensional Arrays
• In general, the location of A[i1,i2,...,ik] is
(( ... ((i1*n2)+i2) ...)*nk+ik)*width + (baseA-
((...((low1*n2)+low2)...)*nk+lowk)*width)

• So, the intermediate code generator should produce the codes


to evaluate the following formula (to find the location of
A[i1,i2,...,ik]) :
(( ... ((i1*n2)+i2) ...)*nk+ik)*width + c
• To evaluate the (( ... ((i1*n2)+i2) ...)*nk+ik) portion of this
formula, we can use the recurrence equation:
e1 = i1
em = em-1 * nm + im
364
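The recurrence e1 = i1, em = em-1 * nm + im can be sketched as a simple loop (the index and dimension values below are hypothetical):

```python
# Sketch of the recurrence e1 = i1, em = e(m-1)*nm + im used to build
# the run-time part of the multi-dimensional address formula.
def index_part(indices, dims):
    # dims = [n1, n2, ..., nk]; n1 never appears in the formula
    e = indices[0]
    for i, n in zip(indices[1:], dims[1:]):
        e = e * n + i
    return e

# hypothetical array with dimensions 10 x 20 x 30, element A[1,2,3]
assert index_part([1, 2, 3], [10, 20, 30]) == (1 * 20 + 2) * 30 + 3  # 663
```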
Translation Scheme
for Arrays
• If we use the following grammar to calculate addresses of
array elements, we need inherited attributes.

L → id | id [ Elist ]
Elist → Elist , E | E

• Instead of this grammar, we will use the following


grammar to calculate addresses of array elements so that
we do not need inherited attributes (we will use only
synthesized attributes).

L → id | Elist ]
Elist → Elist , E | id [ E
365
Translation Scheme for
Arrays (cont.)
S → L := E { if (L.offset is null) emit(‘mov’ E.place ‘,,’L.place)
else emit(‘mov’ E.place ‘,,’ L.place ‘[‘ L.offset ‘]’)}

E → E1 + E2 { E.place = newtemp();
emit(‘add’E1.place ‘,’ E2.place ‘,’ E.place) }

E → ( E1 ) { E.place = E1.place; }

E→L { if (L.offset is null) E.place = L.place)


else { E.place = newtemp();
emit(‘mov’ L.place ‘[‘ L.offset ‘]’‘,,’ E.place)
}}

366
Translation Scheme for Arrays (cont.)
L → id { L.place = id.place; L.offset = null; }
L → Elist ]
{ L.place = newtemp(); L.offset = newtemp();
emit(‘mov’ c(Elist.array) ‘,,’L.place);
emit(‘mult’Elist.place ‘,’width(Elist.array) ‘,’L.offset)
}
Elist → Elist1 , E
{ Elist.array = Elist1.array ; Elist.place = newtemp();
Elist.ndim = Elist1.ndim + 1;
emit(‘mult’ Elist1.place ‘,’limit(Elist.array,Elist.ndim)
‘,’Elist.place);
emit(‘add’Elist.place ‘,’E.place ‘,’Elist.place); }
Elist → id [ E
{Elist.array = id.place ; Elist.place = E.place; Elist.ndim
= 1; }
367
Translation Scheme for
Arrays – Example1
• A one-dimensional double array A : 5..100
➔ n1=95 width=8 (double) low1=5

• Intermediate codes corresponding to x := A[y]

mov c,,t1 // where c=baseA-(5)*8


mult y,8,t2
mov t1[t2],,t3
mov t3,,x

368
Translation Scheme for
Arrays – Example2
• A two-dimensional int array A : 1..10x1..20
➔ n1=10 n2=20 width=4 (integers) low1=1 low2=1

• Intermediate codes corresponding to x := A[y,z]

mult y,20,t1
add t1,z,t1
mov c,,t2 // where c=baseA-
(1*20+1)*4
mult t1,4,t3
mov t2[t3],,t4
mov t4,,x

369
Translation Scheme for
Arrays – Example3
• A three-dimensional int array A : 0..9x0..19x0..29
➔ n1=10 n2=20 n3=30 width=4 (integers) low1=0 low2=0
low3=0

• Intermediate codes corresponding to x := A[w,y,z]


mult w,20,t1
add t1,y,t1
mult t1,30,t2
add t2,z,t2
mov c,,t3 // where c=baseA-((0*20+0)*30+0)*4
mult t2,4,t4
mov t3[t4],,t5
mov t5,,x

370
Declarations
P→ MD
M → ε { offset=0 }
D→ D;D
D → id : T { enter(id.name,T.type,offset);
offset=offset+T.width }
T → int { T.type=int; T.width=4 }
T → real { T.type=real; T.width=8 }
T → array[num] of T1 { T.type=array(num.val,T1.type);
T.width=num.val*T1.width }
T → ↑ T1 { T.type=pointer(T1.type); T.width=4 }

where enter creates a symbol table entry with the given values.


371
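The declaration scheme above can be sketched in Python: enter() records each name with its type and the current offset, then advances the offset by T.width (the names and the dict encoding are illustrative):

```python
# Sketch of the declaration scheme: enter() records each name with its
# type and the running offset, then bumps offset by T.width.
symtab = {}
offset = 0

def enter(name, type_, width):
    global offset
    symtab[name] = (type_, offset)
    offset += width

enter("x", "int", 4)     # int: width 4
enter("y", "real", 8)    # real: width 8
enter("z", "int", 4)

assert symtab["y"] == ("real", 4)   # y starts right after x
assert symtab["z"] == ("int", 12)   # z starts after x and y
assert offset == 16                 # total width of all declarations
```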
Nested Procedure
Declarations
• For each procedure we should create a symbol table.

mktable(previous) – create a new symbol table where


previous is the parent symbol table of this new symbol
table

enter(symtable,name,type,offset) – create a new entry for a


variable in the given symbol table.

enterproc(symtable,name,newsymbtable) – create a new


entry for the procedure in the symbol table of its parent.

372
Nested Procedure
Declarations
addwidth(symtable,width) – puts the total width of all entries
in the symbol table into the header of that table.

• We will have two stacks:


– tblptr – to hold the pointers to the symbol tables
– offset – to hold the current offsets in the symbol tables
in tblptr stack.

373
Nested Procedure Declarations
P→MD { addwidth(top(tblptr),top(offset));
pop(tblptr); pop(offset) }

M → ε { t=mktable(nil); push(t,tblptr); push(0,offset) }


D→D;D
D → proc id N D ; S
{ t=top(tblptr); addwidth(t,top(offset));
pop(tblptr); pop(offset);
enterproc(top(tblptr),id.name,t) }
D → id : T { enter(top(tblptr),id.name,T.type,top(offset));
top(offset)=top(offset)+T.width }

N → ε { t=mktable(top(tblptr)); push(t,tblptr);
push(0,offset) }
374
Intermediate Code
Generation

375
Intermediate Code Generation
• Translating the source program into an “intermediate
language.”
– Simple
– CPU Independent,
– …yet, close in spirit to machine language.
• Or, depending on the application other intermediate
languages may be used, but in general, we opt for simple,
well structured intermediate forms.
• (and this completes the “Front-End” of Compilation).
Benefits
1. Retargeting is facilitated
2. Machine independent Code Optimization can be
applied. 376
Intermediate Code
❖ Intermediate codesGeneration (II)
are machine independent codes, but they are close to
machine instructions.
❖ The given program in a source language is converted to an equivalent program
in an intermediate language by the intermediate code generator.
❖ Intermediate language can be many different languages, and the designer of
the compiler decides this intermediate language.

❑ syntax trees can be used as an intermediate language.


❑ postfix notation can be used as an intermediate language.
❑ three-address code (Quadraples) can be used as an intermediate language
➢ we will use quadraples to discuss intermediate code generation
➢ quadraples are close to machine instructions, but they are not actual
machine instructions.

377
Types of Intermediate Languages
• Graphical Representations.
– Consider the assignment a:=b*-c+b*-c:
(figure: the syntax tree and the DAG for the assignment; both have
assign at the root with children a and +; in the DAG the common
subexpression b * (uminus c) is shared between the two operands of +)
378
Syntax Dir. Definition for
Assignment Statements
PRODUCTION Semantic Rule
S → id := E { S.nptr = mknode (‘assign’, mkleaf(id, id.entry),
E.nptr) }

E → E1 + E2 {E.nptr = mknode(‘+’, E1.nptr,E2.nptr) }


E → E1 * E2 {E.nptr = mknode(‘*’, E1.nptr,E2.nptr) }
E → - E1 {E.nptr = mknode(‘uminus’,E1.nptr) }
E → ( E1 ) {E.nptr = E1.nptr }
E → id {E.nptr = mkleaf(id, id.entry) }

379
Three Address Code
Statements of general form x:=y op z

• No built-up arithmetic expressions are allowed.

• As a result, x:=y + z * w
should be represented as
t1:=z * w
t2:=y + t1
x:=t2

380
Three Address Code
Observe that given the syntax-tree or the dag of the
graphical representation we can easily derive a three
address code for assignments as above.

• In fact three-address code is a linearization of the tree.

• Three-address code is useful: related to machine-language/


simple/ optimizable.

381
Example of 3-address code
For a := b*-c + b*-c:

From the syntax tree:    From the DAG:
t1 := -c                 t1 := -c
t2 := b * t1             t2 := b * t1
t3 := -c                 t5 := t2 + t2
t4 := b * t3             a := t5
t5 := t2 + t4
a := t5

382
Types of Three-Address
Statements.
Assignment Statement: x:=y op z
Assignment Statement: x:=op z
Copy Statement: x:=z
Unconditional Jump: goto L
Conditional Jump: if x relop y goto L
Stack Operations: Push/pop

More Advanced:
Procedure:
param x1
param x2

param xn
call p,n

383
Types of Three-Address
Statements.
Index Assignments:
x:=y[i]
x[i]:=y
Address and Pointer Assignments:
x:=&y
x:=*y
*x:=y

384
Syntax-Directed Translation into 3-address code
• First deal with assignments.
• Use attributes
– E.place: the name that will hold the value of E
• Identifier will be assumed to already have the place
attribute defined.
– E.code:hold the three address code statements that
evaluate E (this is the `translation’ attribute).
• Use function newtemp that returns a new temporary
variable that we can use.
• Use function gen to generate a single three address
statement given the necessary information (variable names
and operations).
385
Syntax-Dir. Definition for
3-address code
PRODUCTION Semantic Rule
S → id := E { S.code = E.code||gen(id.place ‘=’ E.place ‘;’) }
E → E1 + E2 {E.place= newtemp ;
E.code = E1.code || E2.code ||
|| gen(E.place‘:=’E1.place‘+’E2.place) }
E → E 1 * E2 {E.place= newtemp ;
E.code = E1.code || E2.code ||
|| gen(E.place‘=’E1.place‘*’E2.place) }
E → - E1 {E.place= newtemp ;
E.code = E1.code ||
|| gen(E.place ‘=’ ‘uminus’ E1.place) }
E → ( E1 ) {E.place= E1.place ; E.code = E1.code}
E → id {E.place = id.entry ; E.code = ‘’ }

e.g. a := b * - (c+d) 386


What about things that are
not assignments?
• E.g. while statements of the form “while E do S1”
(interpreted as: while the value of E is not 0 do S1)
Extension to the previous syntax-dir. Def.
PRODUCTION
S → while E do S1
Semantic Rule
S.begin = newlabel;
S.after = newlabel ;

387
What about things that are not
assignments?(cont)
S.code= gen(S.begin ‘:’)
|| E.code
|| gen(‘if’ E.place ‘=’‘0’‘goto’ S.after)
|| S1.code
|| gen(‘goto’S.begin)
|| gen(S.after ‘:’)

388
Implementations of 3-address statements

•Quadruples
t1 := -c            op       arg1   arg2   result
t2 := b * t1        (0)  uminus   c            t1
t3 := -c            (1)  *        b      t1    t2
t4 := b * t3        (2)  uminus   c            t3
t5 := t2 + t4       (3)  *        b      t3    t4
a := t5             (4)  +        t2     t4    t5
                    (5)  :=       t5           a
Temporary names must be entered into the symbol table as they are created.

389
Implementations of 3-address statements, II
• Triples
t1 := -c            op       arg1   arg2
t2 := b * t1        (0)  uminus   c
t3 := -c            (1)  *        b      (0)
t4 := b * t3        (2)  uminus   c
t5 := t2 + t4       (3)  *        b      (2)
a := t5             (4)  +        (1)    (3)
                    (5)  assign   a      (4)
Temporary names are not entered into the symbol table.

390
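The two representations can be sketched as Python tuples for the same statement a := b*-c + b*-c; the quadruples carry explicit temporary names in the result field, while the triples refer to earlier statements by index:

```python
# Sketch: quadruples (explicit result field) vs triples (results
# referred to by statement index, so no temporary names are needed).
quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    ("assign", "t5", None, "a"),
]
triples = [
    ("uminus", "c", None),
    ("*",      "b", 0),      # 0 refers to statement (0)
    ("uminus", "c", None),
    ("*",      "b", 2),
    ("+",      1,   3),
    ("assign", "a", 4),
]
# every quadruple names its result; triples carry no temporaries at all
assert all(isinstance(q[3], str) for q in quads)
assert not any(isinstance(x, str) and x.startswith("t")
               for row in triples for x in row)
```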
Other types of 3-address statements
• e.g. ternary operations like x[i]:=y and x:=y[i]
• require two or more entries. e.g.
For x[i]:=y:    op       arg1   arg2
                (0)  []=     x      i
                (1)  assign  (0)    y

For x:=y[i]:    op       arg1   arg2
                (0)  []=     y      i
                (1)  assign  x      (0)
391
Implementations of 3-address statements, III
• Indirect Triples
op op arg1 arg2

(0) (14) (14) uminus c

(1) (15) (15) * b (14)

(2) (16) (16) uminus c

(3) (17) (17) * b (16)

(4) (18) (18) + (15) (17)

(5) (19) (19) assign a (18)

392
Dealing with
Procedures
P → procedure id ‘;’ block ‘;’
Semantic Rule
begin = newlabel;
Enter into symbol-table in the entry of the procedure name
the begin label.
P.code = gen(begin ‘:’) || block.code ||
gen(‘pop’ return_address) || gen(“goto return_address”)
S → call id
Semantic Rule
Look up symbol table to find procedure name. Find its begin
label called proc_begin
return = newlabel;
S.code = gen(‘push’return); gen(goto proc_begin) || 393
Declarations
Using a global variable offset

PRODUCTION Semantic Rule


P→ MD {}
M → ε {offset:=0 }
D → id : T { addtype(id.entry, T.type, offset)
offset:=offset + T.width }
T → char {T.type = char; T.width = 4; }
T → integer {T.type = integer ; T.width = 4; }
T → array [ num ] of T1
{T.type=array(1..num.val,T1.type)
T.width = num.val * T1.width}
T → ^T1 {T.type = pointer(T1.type);
T1.width = 4}

394
Nested Procedure
Declarations
•For each procedure we should create a symbol table.
mktable(previous) – create a new symbol table where
previous is the parent symbol table of this new symbol
table
enter(symtable,name,type,offset) – create a new entry for a
variable in the given symbol table.
enterproc(symtable,name,newsymbtable) – create a new
entry for the procedure in the symbol table of its parent.
addwidth(symtable,width) – puts the total width of all entries
in the symbol table into the header of that table.
• We will have two stacks:
– tblptr – to hold the pointers to the symbol tables
– offset – to hold the current offsets in the symbol tables
in tblptr stack. 395
Keeping Track of Scope
Information
Consider the grammar fraction:

P→D
D → D ; D | id : T | proc id ; D ; S

Each procedure should be allowed to use independent names.


Nested procedures are allowed.

396
Keeping Track of Scope
Information
(a translation scheme)
P→ MD { addwidth(top(tblptr), top(offset));
pop(tblptr); pop(offset) }
M → ε { t:=mktable(null); push(t, tblptr);
push(0, offset)}
D → D1 ; D2 ...
D → proc id ; N D ; S { t:=top(tblpr);
addwidth(t,top(offset));
pop(tblptr); pop(offset);
enterproc(top(tblptr), id.name, t)}

N → ε {t:=mktable(top(tblptr)); push(t,tblptr); push(0, offset)}
397
Keeping Track of Scope
Information
D → id : T {enter(top(tblptr), id.name, T.type, top(offset));
top(offset):=top(offset) + T.width }

Example: proc func1; D; proc func2 D; S; S

398
Type Checking

. 399
Static
Checking
Token Stream → Parser → Abstract Syntax Tree → Static Checker
→ Decorated Abstract Syntax Tree → Intermediate Code Generator
→ Intermediate Code
• Static (Semantic) Checks


– Type checks: operator applied to incompatible operands?
– Flow of control checks: break (outside while?)
– Uniqueness checks: labels in case statements
– Name related checks: same name?

400
Type
Checking
• Problem: Verify that a type of a construct matches that
expected by its context.

• Examples:
– mod requires integer operands (PASCAL)
– * (dereferencing) – applied to a pointer
– a[i] – indexing applied to an array
– f(a1, a2, …, an) – function applied to correct arguments.
• Information gathered by a type checker:
– Needed during code generation.

401
Type
Systems
• A collection of rules for assigning type expressions to the
various parts of a program.

• Based on: Syntactic constructs, notion of a type.

• Example: If both operators of “+”, “-”, “*” are of type


integer then so is the result.

• Type Checker: An implementation of a type system.


– SyntaxDirected.

• Sound Type System: eliminates the need for checking type


errors during run time.
402
Type Expressions
• Implicit Assumptions:
– Each program has a type
– Types have a structure
• Type expressions are assigned to Expressions and Statements.
• Basic Types: Boolean, Character, Real, Integer,
Enumerations, Sub-ranges, Void, Error, Variables, Names
• Type Constructors: Arrays, Records, Sets, Pointers,
Functions
403
Representation of Type
Expressions
The type expression (char x char) -> pointer(integer) can be
represented as a tree or as a DAG; in the DAG the two char
leaves are shared.

(figure: tree and DAG for (char x char) -> pointer(integer))

struct cell {
int info;
struct cell * next;
};
404
Type Expressions
Grammar
Type -> int | float | char | …        (basic types)
| void
| error
| name
| variable
| array( size, Type)                  (structured types)
| record( (name, Type)*)
| pointer( Type)
| tuple((Type)*)
| arrow(Type, Type)

405
A Simple Typed
Language
Program -> Declaration; Statement
Declaration -> Declaration; Declaration
| id: Type
Statement -> Statement; Statement
| id := Expression
| if Expression then Statement
| while Expression do Statement
Expression -> literal | num | id
| Expression mod Expression
| E[E] | E ↑ | E (E)

406
Type Checking
Expressions
E -> int_const { E.type = int }
E -> float_const { E.type = float }
E -> id { E.type = sym_lookup(id.entry, type) }
E -> E1 + E2 {E.type = if E1.type ∉ {int, float} ∨ E2.type ∉
{int, float}
then error
else if E1.type == E2.type == int
then int
else float }

407
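The E -> E1 + E2 rule above can be sketched as a small function over type names (the string encoding of types is an assumption for illustration):

```python
# Sketch of the E -> E1 + E2 typing rule: error unless both operands
# are numeric; the result is int only when both operands are int.
def plus_type(t1, t2):
    if t1 not in ("int", "float") or t2 not in ("int", "float"):
        return "error"
    return "int" if t1 == t2 == "int" else "float"

assert plus_type("int", "int") == "int"
assert plus_type("int", "float") == "float"   # mixed operands widen
assert plus_type("char", "int") == "error"
```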
Type Checking
Expressions
E -> E1 [E2] {E.type = if E1.type = array(S, T) &
E2.type = int then T else error}
E -> *E1 {E.type = if E1.type = pointer(T) then T else error}
E -> &E1 {E.type = pointer(E1.type)}
E -> E1 (E2) {E.type = if (E1.type = arrow(S, T) &
E2.type = S, then T else err}
E -> (E1, E2) {E.type = tuple(E1.type, E2.type)}

408
Type Checking
Statements
S -> id := E {S.type := if id.type = E.type then void else error}

S -> if E then S1 {S.type := if E.type = boolean then S1.type


else error}

S -> while E do S1 {S.type := if E.type = boolean then
S1.type else error}

S -> S1; S2 {S.type := if S1.type = void ∧ S2.type = void then


void else error}

409
Equivalence of Type
Expressions
Problem: When in E1.type = E2.type?
– We need a precise definition for type equivalence
– Interaction between type equivalence and type
representation

Example: type vector = array [1..10] of real


type weight = array [1..10] of real
var x, y: vector; z: weight

Name Equivalence: When they have the same name.


– x, y have the same type; z has a different type.

Structural Equivalence: When they have the same structure.


– x, y, z have the same type.
410
Structural
Equivalence
• Definition: by Induction
– Same basic type(basis)
– Same constructor applied to SE Type(induction step)
– Same DAG Representation

• In Practice: modifications are needed


– Do not include array bounds – when they are passed as
parameters
– Other applied representations (More compact)

• Can be applied to: Tree/ DAG


– Does not check for cycles
– Later improve it.

411
Algorithm Testing
Structural Equivalence
function stequiv(s, t): boolean
{
if (s & t are of the same basic type) return true;
if (s = array(s1, s2) & t = array(t1, t2))
return equal(s1, t1) & stequiv(s2, t2);
if (s = tuple(s1, s2) & t = tuple(t1, t2))
return stequiv(s1, t1) & stequiv(s2, t2);
if (s = arrow(s1, s2) & t = arrow(t1, t2))
return stequiv(s1, t1) & stequiv(s2, t2);
if (s = pointer(s1) & t = pointer(t1))
return stequiv(s1, t1);
}

412
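The stequiv algorithm above can be sketched over tuple-encoded type expressions (basic types are strings, constructed types are tuples headed by the constructor name; this encoding is an assumption for illustration):

```python
# Sketch of stequiv: same basic type, or same constructor applied to
# structurally equivalent components.
def stequiv(s, t):
    if not (isinstance(s, tuple) and isinstance(t, tuple)):
        return s == t                    # basic types, sizes, names
    if s[0] != t[0] or len(s) != len(t):
        return False                     # different constructor
    return all(stequiv(a, b) for a, b in zip(s[1:], t[1:]))

# vector and weight from the earlier example: same structure
vector = ("array", 10, "real")
weight = ("array", 10, "real")
assert stequiv(vector, weight)
assert not stequiv(("pointer", "int"), ("pointer", "real"))
assert stequiv(("arrow", ("tuple", "char", "char"), ("pointer", "int")),
               ("arrow", ("tuple", "char", "char"), ("pointer", "int")))
```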
Recursive
Types
Where: Linked Lists, Trees, etc.
How: records containing pointers to similar records
Example: type link = ↑cell;
cell = record info: int; next: link end
Representation:

(figure: two representations of the type of cell — a DAG that keeps
the name cell as a leaf, and the graph obtained by substituting
names out, which contains a cycle through the next pointer)

413
Recursive Types
• C Policy: avoid cycles in type graphs by:
– Using structural equivalence for all types
– Except for records -> name equivalence

• Example:
– struct cell {int info; struct cell * next;}

• Name use: name cell becomes part of the type of the


record.
– Use the acyclic representation
– Names declared before use – except for pointers to
records.
– Cycles – potential due to pointers in records
– Testing for structural equivalence stops when a
record constructor is reached ~ same named record
type?

414
Overloading Functions
& Operators
• Overloaded Symbol: one that has different meanings
depending on its context

• Example: Addition operator +

• Resolving (operator identification): overloading is resolved


when a unique meaning is determined.

• Context: it is not always possible to resolve overloading by


looking only the arguments of a function
– Set of possible types
– Context (inherited attribute) necessary

415
Overloading
Example
function “*” (i, j: integer) return complex;
function “*” (x, y: complex) return complex;
* Has the following types:
arrow(tuple(integer, integer), integer)
arrow(tuple(integer, integer), complex)
arrow(tuple(complex, complex), complex)
int i, j;
k = i * j;

416
Narrowing
Down Types
E’ -> E {E’.types = E.types
E.unique = if E’.types = {t} then t else error}
E -> id {E.types = lookup(id.entry)}
E -> E1(E2) {E.types = {s’ | ∃s ∈ E2.types and s->s’ ∈ E1.types}
t = E.unique
S = {s | s ∈ E2.types and s->t ∈ E1.types}
E2.unique = if S = {s} then s else error
E1.unique = if S = {s} then s->t else error}

417
Polymorphic
Functions
• Defn: a piece of code (functions, operators) that can be
executed with arguments of different types.

• Examples: Built in Operator indexing arrays, pointer


manipulation

• Why use them: facilitate manipulation of data structures


regardless of types.

• Example HL:
fun length(lptr) = if null (lptr) then 0
else length(tl(lptr)) + 1

418
A Language for
Polymorphic Functions
P -> D ; E
D -> D ; D | id : Q
Q -> ∀α. Q | T
T -> arrow (T, T) | tuple (T, T)
| unary (T) | (T)
| basic

E -> E (E) | E, E | id

419
Type
Variables
• Why: variables representing type expressions allow us to
talk about unknown types.
– Use Greek letters α, β, γ, …
• Application: check consistent usage of identifiers in a
language that does not require identifiers to be declared
before usage.
– A type variable represents the type of an undeclared
identifier.
• Type Inference Problem: Determine the type of a language
constant from the way it is used.
– We have to deal with expressions containing variables.

420
Examples of Type Inference
type link = ↑cell;
Procedure mlist (lptr: link; procedure p);
{ while lptr <> null { p(lptr); lptr := lptr↑.next} }
Hence: p: link -> void
Function deref (p){ return p↑; }
p: β, β = pointer(α)
Hence deref: ∀α. pointer(α) -> α

421
Program in
Polymorphic Language
deref: ∀α. pointer(α) -> α
q: pointer (pointer (integer))
deref (deref (q))
derefo: pointer (αo) -> αo
derefi: pointer (αi) -> αi


Notation:
-> arrow
q: pointer (pointer (integer))
x tuple

Subscripts i and o distinguish between the inner and outer occurrences of
deref, respectively.

422
Type Checking
Polymorphic Functions
• Distinct occurrences of a p.f. in the same expression need
not have arguments of the same type.
– deref ( deref (q))
– Replace α with a fresh variable and remove ∀ (αi, αo)

• The notion of type equivalence changes in the presence of


variables.
– Use unification: check if s and t can be made
structurally equivalent by replacing type vars by the
type expression.

• We need a mechanism for recording the effect of unifying


two expressions.
– A type variable may occur in several type expressions.

423
Substitutions and

Unification
Substitution: a mapping from type variables to type expressions.
Function subst (t: TypeExpr): TypeExpr {
if (t is a basic type) return t;
if (t is a type variable) return S(t); -- identity if t ∉ S
if (t is t1 -> t2) return subst(t1) -> subst (t2); }

• Instance: S(t) is an instance of t written S(t) < t.


– Examples: pointer (integer) < pointer (α); int -> real is not an instance of α -> α

• Unify: t1 ≈ t2 if ∃S. S(t1) = S(t2)

• Most General Unifier S: A substitution S:


– S (t1) = S (t2)
– ∀S’. S’(t1) = S’(t2) ⇒ ∀t. S’(t) < S(t).

424
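A minimal sketch of unification with a substitution dict, under an assumed naming convention (lowercase strings are type variables, capitalized strings are basic types, tuples are constructed types):

```python
# Sketch: unification building up a substitution S as a dict.
def resolve(t, S):
    while isinstance(t, str) and t in S:
        t = S[t]                            # follow the substitution
    return t

def unify(s, t, S):
    s, t = resolve(s, S), resolve(t, S)
    if s == t:
        return True
    if isinstance(s, str) and s.islower():  # type variable: bind it
        S[s] = t; return True
    if isinstance(t, str) and t.islower():
        S[t] = s; return True
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        return all(unify(a, b, S) for a, b in zip(s[1:], t[1:]))
    return False                            # fail to unify

S = {}
assert unify(("arrow", "a1", "a1"), ("arrow", "Int", "a2"), S)
assert resolve("a2", S) == "Int"            # the unifier maps a2 to Int
assert not unify(("arrow", "Int", "Real"), ("arrow", "b", "b"), {})
```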
Polymorphic Type checking
Translation Scheme
E -> E1 (E2) { p := mkleaf(newtypevar);
unify(E1.type, mknode(‘->’, E2.type, p));
E.type := p}
E -> E1, E2 {E.type := mknode(‘x’, E1.type, E2.type); }
E -> id { E.type := fresh (id.type) }

fresh (t): replaces bound vars in t by fresh vars. Returns pointer


to a node representing result.type.
fresh( α.pointer(α) -> α) = pointer(α1) -> α1.

unify (m, n): unifies expressions represented by m and n.


– Side-effect: keep track of substitution
– Fail-to-unify: abort type checking.

425
Polymorphic Type Checking Example
Given: derefo (derefi (q)) with q : pointer (pointer (int))
Bottom Up: each occurrence of deref gets its own fresh instance
of ∀α. pointer(α) -> α:
derefo : pointer(αo) -> αo
derefi : pointer(αi) -> αi
Unifying pointer(αi) with q’s type pointer(pointer(integer))
gives αi = pointer(integer); unifying pointer(αo) with that
result type gives αo = integer, so the whole expression has
type integer.
(figure: the numbered type graph built bottom-up during
unification)
426
UNIT-3

PART-B

SYMBOL TABLE
Symbol tables

Def : Symbol table is a data structure used by compiler to keep


track of semantics of variable .
L-value and r-value : the l and r prefixes come from the left and right
sides of an assignment .
Ex:
a := I + 1

l-value r-value
Symbol table entries

• Variable names
• Constants
• Procedure names
• Function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
Compiler uses following types of information from symbol table
1. Data type
2. Name
3.Declaring procedures
4.Offset in storage
Symbol table entries

5. If structure are record then pointer to structure variable .


6. For parameters, whether parameter passing is
by value or reference .
7. Number and type of arguments passed to the function .
8. Base address .
How to store names in symbol table

• There are two types of name representation .


• Fixed length names :
• A fixed space for each name is allocated in symbol table .

name attribute
C A L C U L A T E
S U M
b
Variable length name

• Variable length

name
Starting length attribute
index
0 10
10 4
14 2
16 2
Symbol table management
• Data structure for symbol table :
1 . Linear list
2 . Arrays
The pointer variable is maintained at the end of all stored records .

Name 1 Info 1

Name2 Info 2

Name 3 Info 3

. .
. .
. .

Available Name n Info n


(start of empty slot)
Symbol table management
• Self organization list :
• This symbol table implementation is using linked list .
• We search the records in the order pointed by the link of link
field .
Name 1 Info 1

Name 2 Info 2
first
Name 3 Info 3

Name 4 Info 4

A pointer first is maintained to point to first record of symbol table


The reference to these names can be name 3,name 1,name 4,name2 .
Symbol table management

• Self organization list :


• When the name is referenced or created it is moved to the front of
the list .
• The most frequently referred names will tend to be front of the list .
Hence access time to most frequently referred names will be the
least .
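The move-to-front behaviour described above can be sketched in Python (a list of pairs stands in for the linked records; names follow the slide's name1/name2/name3 example):

```python
# Sketch of a self-organizing symbol list: each lookup moves the found
# entry to the front, so frequently referenced names are found fastest.
table = []                                  # (name, info) pairs

def insert(name, info):
    table.insert(0, (name, info))           # new names go to the front

def lookup(name):
    for i, (n, info) in enumerate(table):
        if n == name:
            table.insert(0, table.pop(i))   # move to front on reference
            return info
    return None

insert("name1", "info1"); insert("name2", "info2"); insert("name3", "info3")
lookup("name1")
assert table[0][0] == "name1"               # most recently referenced
assert lookup("missing") is None
```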
Symbol table management
• Binary trees

Left node Symbols Information Right child

• Ex:
• Int m,n,p;
• Int compute(int a,int b,intc)
• {
• T=a+b*c;
• Return(t);
• }
• Main()
• {
• Int k;
• K=compute(10,20,30)
• }
Symbol table management

• Binary tree structure organization


k int

a int m int

n int
b int

c int p int

compute int t int


Symbol table management

• Binary tree structure organization


• Advantages :
• Insertion of any symbol is efficient.
• Any symbol can be searched efficiently using binary searching
method.
• Disadvantages :
• This structure consumes lot of space in storing left pointer,right
pointer and null pointers.
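The binary-tree organization above can be sketched as follows, with nodes carrying the symbol, its information, and the left/right pointer fields the slide mentions (names come from the compute example):

```python
# Sketch of the binary-tree symbol table: insertion and binary search
# keyed on the symbol name.
class Node:
    def __init__(self, name, info):
        self.name, self.info = name, info
        self.left = self.right = None       # the extra pointer fields

def insert(root, name, info):
    if root is None:
        return Node(name, info)
    if name < root.name:
        root.left = insert(root.left, name, info)
    else:
        root.right = insert(root.right, name, info)
    return root

def search(root, name):
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root.info if root else None

root = None
for n in ("m", "n", "p", "compute", "t", "k"):   # names from the example
    root = insert(root, n, "int")
assert search(root, "compute") == "int"
assert search(root, "absent") is None
```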
Symbol table management

• Hash tables :
• It is used to search the records of symbol table .
• In hashing two tables are maintained a hash table and symbol
table .
• Hash table contains of k entries from 0,1 to k-1 . These entries
are basically pointers to symbol table pointing to the names of
symbol table .
• To determine where the name is in symbol table ,we use a hash
function ‘h’ such that h(name) will result any integer between 0
to k-1 . We can search any name by
position = h(name)
Using this position we can obtain the exact locations of name in
symbol table .
Symbol table management

• Hash tables :
The hash function should result in uniform distribution of names
in symbol table .

The hash function should be such that there will be minimum


number of collision . Collision is such a situation where hash
function results in same location for storing the names .
Various collision resolution techniques are open addressing,
chaining , rehashing .
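The chaining scheme described above can be sketched in Python; h(name) picks a bucket in 0..K-1 and colliding names share that bucket's chain (the hash function here is illustrative, not one a real compiler would necessarily use):

```python
# Sketch of the hash-table organization with chaining.
K = 8
buckets = [[] for _ in range(K)]

def h(name):
    return sum(ord(c) for c in name) % K    # position = h(name)

def insert(name, info):
    buckets[h(name)].append((name, info))

def lookup(name):
    for n, info in buckets[h(name)]:        # walk the chain
        if n == name:
            return info
    return None

insert("sum", "int"); insert("avg", "real"); insert("j", "int")
assert lookup("avg") == "real"
assert lookup("sum") == "int"
assert lookup("missing") is None
```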
Symbol table management

• Hash tables :
Hash table name info Hash link
sum sum

j
avg
avg

. .
. .
Symbol table management

• Hash tables :
• The advantages of hash table is quick search is possible .
• The disadvantage is that hashing is complicated to implement .
Some extra space is required . Obtaining scope of variables is
very difficult .
Symbol table management

• HEAP ALLOCATION:

• The stack allocation strategy cannot be used if either


of the following is possible:
• 1. The values of local names must be retained when
activation ends.
• 2. A called activation outlives the caller.

• In each of the above cases, the deallocation of
activation records need not occur in a last-in first-out fashion, so
storage cannot be organized as a stack.
• Heap allocation parcels out pieces of contiguous
storage, as needed for activation records or other objects. Pieces
may be deallocated in any order, so over time the heap will consist of
alternate areas that are free and in use.
Storage allocation strategies

• 1.code area
• 2.Static data area
• 3.Stack area
• 4.heap area
• There are 3 different storage allocation strategies based on this
division of run-time storage . The strategies are
• 1.Static allocation : done at compile time
• 2.Stack allocation : a stack is used to manage the run-time
storage .
• 3.Heap allocation : a heap is used to manage the dynamic
memory allocation .
Static Allocation

• Done at compile time

– Literals (and constants) bound to values


– Variables bound to addresses

• Compiler notes undefined symbols


– Library functions
– Global Variables and System Constants

• Linker (and loader if DLLs used) resolve undefined


references.
Stack Based Allocation

• Stack Layout determined at compile time

– Variables bound to offsets from top of stack.


• Layout called stack frame or activation record
– Compilers use registers

• Function parameters and results need consistent treatment across


modules
– C/C++ use prototypes
– Eiffel/Java/Oberon use single definition
Heap Allocation

• Heap provides dynamic memory management.


– Not to be confused with binary heap or binomial
heap data structures.
– Under the hood, may periodically need to request
additional memory from the O/S.
• Requested large regions (requests are
expensive).
– Done using a library (e.g. C)
– Or as part of the language (C++, Java, Lisp).
Activation Record

model of Activation Record

Return value
Actual parameters
Control link(dynamic link)
Access link (static link)
Saved machine status
Local variables
Temporaries
Activation Record
• Temporary values : These values are needed during the evaluation of
expressions .
• Local variables : the local data is a data that is local to the execution of
procedure is stored in this field of activation record .
• Saved machine registers : the status of machine just before the
procedure is called .this field contains the machine registers and
program counter .
• Control link : this field is optional .it points the activation record of the
calling procedure . This link is also called dynamic link .
• Access link : this field is optional . It refers the non local data in other
activation record . This field is also called static link field .
• Actual parameters : this contains the information about the actual
parameters .
• Return values : this field is used to store the result of a function call .
Storage for variable-length data

Control link Act record for A


Pointer to x .
Pointer to y

Array x
Array y
Array of A

Control link
Act record for B
Top_sp

Place variable length of B


top
Block structured and non-block structured storage allocation

• The storage allocation can be done for 2 types of data variables


• 1. Local data
• 2. Non local data .
• Local data can be accessed with the help of activation record .
• Non local data can be accessed using scope information .
• The block structured storage allocation can be done using static
scope or lexical scope .
• The non block structured can be done by dynamic scope .
Local data

• Reference to any variable x in a procedure
      = base pointer pointing to the start of the procedure's record
      + offset of variable x from the base pointer.
• Ex: consider the following program:

  procedure A
      int a;
      procedure B
          int b;
      body of B;
  body of A;
Local data

• The contents of the stack, along with the base pointer and offset, are
shown below.

Activation record for procedure A:
    Return value        <- base_ptr
    Dynamic link
    Saved registers       } offset
    Parameters
    Locals: a           <- access to local data
                        <- top
Local data

Activation record for procedure A:
    Return value
    Dynamic link
    Saved registers
    Parameters
    Locals: a

Activation record for procedure B:
    Return value        <- base_ptr
    Dynamic link
    Saved registers       } offset
    Parameters
    Locals: b           <- access to local data
                        <- top
Access to non-local names

Handling non-local data:

• Block-structured languages use static scope (lexical scope):
    – access links
    – displays
• Non-block-structured languages use dynamic scope:
    – deep access
    – shallow access
Scope rules

Static scope rule: also called lexical scope.

• In this type the scope is determined by examining the program
text. PASCAL, C and ADA are languages that use static
scope.
Dynamic scope: for non-block-structured languages, dynamic
scope rules are used.
• Ex: LISP and SNOBOL
Static scope or lexical scope

Access links
• Each activation record keeps a pointer to the activation record of
the most recent activation of its textually enclosing procedure.
• These pointers are called access links.
• Stack snapshots as the calls proceed (each record carries its
access link and its locals):

  (1) test [a:]
  (2) test [a:], B(1) [i,b:]
  (3) test [a:], B(1) [i,b:], B(0) [i,b:]
  (4) test [a:], B(1) [i,b:], B(0) [i,b:], c [k:]
Static scope or lexical scope

Stack of activation records with access links:

  test
      access link
      a:
  B(1)
      access link
      i,b:
  B(0)
      access link
      i,b:
  C
      access link
      k:
  A
      access link
      d:
Static scope or lexical scope

• Displays:
• It is expensive to traverse the access links every time a
particular non-local variable is accessed. Access to
non-locals can be sped up by maintaining an array of pointers
called a display.
• In a display:
• An array of pointers to activation records is maintained.
• The array is indexed by nesting level.
• Each pointer points to the currently accessible activation record
at that level.
• The display changes when a new activation occurs, and it must
be reset when control returns from the new activation.
Storage allocation for non-block-structured languages

• Dynamic scope:
• 1. Deep access: the idea is to keep a stack of active records, use
control links instead of access links, and, when a variable is
wanted, search the stack from top to bottom for the
most recent activation record that contains space for the desired
variable. This method of accessing non-local variables is called
deep access.
• In this method a symbol table needs to be used at run time.
• 2. Shallow access: the idea is to keep a central storage area with one
slot for every variable name. If names are not created at run
time, the storage layout can be fixed at compile time;
otherwise, when a new activation of a procedure occurs, that
procedure saves and restores the storage entries for its locals at entry
and exit.
Comparison of deep and shallow access

• Deep access takes longer to reach non-locals,

while shallow access allows fast access.
• Shallow access has the overhead of handling procedure
entry and exit.
• Deep access needs a symbol table at run time.
