CD Unit 3
T-3
PART-A
Semantic analysis
276
Syntax Analysis
Example
a := b + c* 100
❖ The seven tokens are grouped into a parse tree:
assignment-stmt
    identifier: a
    :=
    expression
        expression: identifier b
        +
        expression: c * 100
277
Example of Parse Tree
Given the grammar:
list → list + digit (2.2)
list → list - digit (2.3)
list → digit (2.4)
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (2.5)
Parse tree for 9 - 5 + 2:
list
    list
        list
            digit: 9
        -
        digit: 5
    +
    digit: 2
278
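As a quick illustration (the function name is my own, not from the slides), the left-recursive list grammar above can be evaluated by folding left-to-right, which mirrors the shape of its parse tree:

```python
# Sketch: evaluate a token string derived from
#   list -> list + digit | list - digit | digit
# by folding left-to-right, mirroring the left-recursive parse tree.
def eval_list(s: str) -> int:
    tokens = s.split()
    value = int(tokens[0])                  # innermost list -> digit
    for i in range(1, len(tokens), 2):
        op, digit = tokens[i], int(tokens[i + 1])
        value = value + digit if op == '+' else value - digit
    return value

print(eval_list("9 - 5 + 2"))  # (9 - 5) + 2 = 6
```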
Abstract Syntax
Tree (AST)
The AST is a condensed/simplified/abstract form of
the parse tree in which:
1. Operators are directly associated with interior nodes
(non-terminals)
279
Abstract and
Concrete Trees
Concrete (parse) tree for 9 - 5 + 2:
list
    list
        list
            digit: 9
        -
        digit: 5
    +
    digit: 2
AST for 9 - 5 + 2:
+
    -
        9
        5
    2
280
Advantages of the AST
Representation
• Convenient representation for semantic analysis and
intermediate-language (IL) generation
• Useful for building other programming language tools, e.g., a
syntax-directed editor
281
Syntax Directed
Translation (SDT)
Syntax-directed translation is a method of translating a string into a
sequence of actions by attaching one such action to each rule of a grammar.
282
Syntax-Directed Definitions
and Translation Schemes
A. Syntax-Directed Definitions:
• give high-level specifications for translations
• hide many implementation details such as order of evaluation of
semantic actions.
• We associate a production rule with a set of semantic actions, and we do
not say when they will be evaluated.
B. Translation Schemes:
• Indicate the order of evaluation of semantic actions associated with
a production rule.
• In other words, translation schemes give a little bit information about
implementation details.
283
Example Syntax-Directed Definition
term ::= ID
{ term.place := ID.place ; term.code := "" }
284
YACC – Yet Another
Compiler-Compiler
A bottom-up parser generator
It provides semantic stack manipulation and supports
specification of semantic routines.
Developed by Steve Johnson and others at AT&T Bell Labs.
Can use scanner generated by Lex or hand-coded scanner in C
Used by many compilers and tools, including production
compilers.
285
Syntax-Directed Translation
• Grammar symbols are associated with attributes to associate
information with the programming language constructs that
they represent.
• Values of these attributes are evaluated by the semantic rules
associated with the production rules.
• Evaluation of these semantic rules:
– may generate intermediate codes
– may put information into the symbol table
– may perform type checking
– may issue error messages
– may perform some other activities
– in fact, they may perform almost any activities.
• An attribute may hold almost anything.
– a string, a number, a memory location, a complex record.
286
Syntax-Directed Definitions and
Translation Schemes
• When we associate semantic rules with productions, we use
two notations:
– Syntax-Directed Definitions
– Translation Schemes
• Syntax-Directed Definitions:
– give high-level specifications for translations
– hide many implementation details such as order of
evaluation of semantic actions.
– We associate a production rule with a set of semantic
actions, and we do not say when they will be evaluated.
287
Syntax-Directed Definitions
and Translation
Schemes
• Translation Schemes:
– indicate the order of evaluation of semantic actions
associated with a production rule.
– In other words, translation schemes give a little bit
information about implementation details.
288
Syntax-Directed Definitions
• A syntax-directed definition is a generalization of a context-
free grammar in which:
– Each grammar symbol is associated with a set of attributes.
– This set of attributes for a grammar symbol is partitioned
into two subsets called synthesized and inherited attributes
of that grammar symbol.
– Each production rule is associated with a set of semantic
rules.
• Semantic rules set up dependencies between attributes which
can be represented by a dependency graph.
• This dependency graph determines the evaluation order of
these semantic rules.
• Evaluation of a semantic rule defines the value of an attribute.
But a semantic rule may also have some side effects such as printing a value. 289
Annotated
Parse Tree
• A parse tree showing the values of attributes at each node
is called an annotated parse tree.
• The process of computing the attributes values at the nodes
is called annotating (or decorating) of the parse tree.
• Of course, the order of these computations depends on the
dependency graph induced by the semantic rules.
290
Syntax-Directed
Definition
• In a syntax-directed definition, each production A → α is
associated with a set of semantic rules of the form:
b=f(c1,c2,…,cn) where f is a function,
and b can be one of the following:
➔ b is a synthesized attribute of A and c1,c2,…,cn are
attributes of the grammar symbols in the production
( A → α ).
OR
➔ b is an inherited attribute of one of the grammar symbols
in α (on the right side of the production), and c1,c2,…,cn
are attributes of the grammar symbols in the production
( A → α ).
291
Attribute
Grammar
• So, a semantic rule b=f(c1,c2,…,cn) indicates that the
attribute b depends on attributes c1,c2,…,cn.
• In a syntax-directed definition, a semantic rule may just
evaluate a value of an attribute or it may have some
side effects such as printing values.
292
Syntax-Directed
Definition -- Example
Production        Semantic Rules
L → E return      print(E.val)
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval
(annotated parse tree for input 5+3*4: E.val=17 at the root above "return";
E.val=5 and T.val=12 below it; leaves digit.lexval = 5, 3, 4)
294
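The semantic rules above can be sketched in code (a minimal sketch with my own node encoding, not the slides' machinery): each node's val is computed bottom-up from its children's val attributes:

```python
# Sketch: evaluate the synthesized attribute `val` bottom-up.
# Each node is a tuple; leaves carry digit.lexval.
def val(node):
    kind = node[0]
    if kind == 'digit':                 # F -> digit : F.val = digit.lexval
        return node[1]
    if kind == '+':                     # E -> E1 + T : E.val = E1.val + T.val
        return val(node[1]) + val(node[2])
    if kind == '*':                     # T -> T1 * F : T.val = T1.val * F.val
        return val(node[1]) * val(node[2])
    raise ValueError(kind)

# Annotated parse tree for 5+3*4:
tree = ('+', ('digit', 5), ('*', ('digit', 3), ('digit', 4)))
print(val(tree))  # E.val = 17
```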
Dependency Graph
Input: 5+3*4
(dependency graph: the root E.val=17 depends on E.val=5 and T.val=12,
which in turn depend on the leaf attributes digit.lexval = 5, 3 and 4)
295
Syntax-Directed
Definition – Example2
Production Semantic Rules
E → E1 + T    E.loc = newtemp(), E.code = E1.code || T.code
              || “add E1.loc,T.loc,E.loc”
296
Syntax-Directed
Definition – Example2
• Symbols E, T, and F are associated with synthesized
attributes loc and code.
• The token id has a synthesized attribute name (it is
assumed that it is evaluated by the lexical analyzer).
• It is assumed that || is the string concatenation operator.
297
Syntax-Directed Definition –
Inherited Attributes
Production Semantic Rules
D → T L       L.in = T.type
T → int       T.type = integer
T → real      T.type = real
L → L1 id     L1.in = L.in, addtype(id.entry,L.in)
L → id        addtype(id.entry,L.in)
298
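A minimal sketch of how the inherited attribute L.in flows (the helper names addtype and declare are assumptions for illustration): T.type is computed once and passed down to every id in the declaration list:

```python
# Sketch: D -> T L, L -> L1 id | id.
# L.in = T.type is inherited; each L production calls addtype(id, L.in).
symbol_table = {}

def addtype(entry, typ):
    # records the type of one identifier, as in the slides' addtype()
    symbol_table[entry] = typ

def declare(t_type, ids):
    # t_type plays the role of T.type, passed down as L.in to each id
    for name in ids:
        addtype(name, t_type)

declare('real', ['p', 'q'])   # models the input: real p q
print(symbol_table)
```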
A Dependency Graph –
Inherited Attributes
Input: real p q
(dependency graph: T.type=real flows into L.in=real and L1.in=real;
addtype(p,real) and addtype(q,real) are called at the id nodes with
id.entry=p and id.entry=q)
300
S-Attributed
Definitions
• S-Attributed Definitions and L-Attributed Definitions are
easy to implement (we can evaluate semantic rules in a
single pass during the parsing).
• Implementations of S-Attributed Definitions are a little bit
easier than implementations of L-Attributed Definitions.
301
Bottom-Up Evaluation of S-
Attributed Definitions
• We put the values of the synthesized attributes of the
grammar symbols into a parallel stack.
– When an entry of the parser stack holds a grammar
symbol X (terminal or non-terminal), the
corresponding entry in the parallel stack will hold the
synthesized attribute(s) of the symbol X.
302
Bottom-Up Evaluation of S-
Attributed Definitions
A → XYZ    A.a=f(X.x,Y.y,Z.z)    where all attributes are synthesized.

      stack        parallel-stack
top → Z            Z.z
      Y            Y.y
      X            X.x
      . .          . .
After the reduction, the top entries are replaced:
top → A            A.a
303
Bottom-Up Eval. of S-Attributed
Definitions (cont.)
Production        Semantic Rules
L → E return      print(val[top-1])
E → E1 + T        val[ntop] = val[top-2] + val[top]
E → T
T → T1 * F        val[ntop] = val[top-2] * val[top]
T → F
F → ( E )         val[ntop] = val[top-1]
F → digit
304
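As a rough sketch (not the actual LR machinery; it uses an operator-precedence shortcut), the parallel value stack behaves like this: each reduction pops the operand values and pushes the result, exactly as the val[ntop] rules above describe:

```python
# Sketch: a value stack for L -> E return; E -> E+T | T;
# T -> T*F | F; F -> digit. Each reduction pops two operand
# values and pushes one result (val[ntop] = val[top-2] op val[top]).
def evaluate(tokens):
    vals, ops = [], []
    prec = {'+': 1, '*': 2}
    def reduce_top():
        op = ops.pop()
        right, left = vals.pop(), vals.pop()
        vals.append(left + right if op == '+' else left * right)
    for tok in tokens:
        if tok.isdigit():
            vals.append(int(tok))        # F -> digit: push digit.lexval
        else:
            while ops and prec[ops[-1]] >= prec[tok]:
                reduce_top()             # reduce before shifting a weaker op
            ops.append(tok)
    while ops:
        reduce_top()
    return vals[0]                       # what print(val[top-1]) would see

print(evaluate(['5', '+', '3', '*', '4']))  # 17
```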
Canonical LR(0) Collection for The Grammar
(figure: the LR(0) item sets I0–I13 and their goto/shift transitions for the
grammar L → E return, E → E+T | T, T → T*F | F, F → (E) | d, starting from
I0: L’ → .L, with transitions on E, T, F, (, d, +, * and ))
305
Bottom-Up Evaluation -- Example
(figure: trace of the parser stack and the parallel attribute stack for an
input such as 5+3*4; for example, at the reduction T → F the rule is
T.val=F.val, so nothing needs to be done on the parallel stack)
307
Top-Down Evaluation (of S-
Attributed Definitions)
• Remember that: In a recursive predictive parser, each non-
terminal corresponds to a procedure.
procedure A() {
  call B();                                        // A → B
}
procedure B() {
  if (currtoken=0) { consume 0; call B(); }        // B → 0 B
  else if (currtoken=1) { consume 1; call B(); }   // B → 1 B
  else if (currtoken=$) {}                         // B → ε ($ is end-marker)
  else error(“unexpected token”);
}
308
Top-Down Evaluation (of S-
Attributed Definitions)
procedure A() {
  int n0,n1;                // synthesized attributes of non-terminal B
  call B(&n0,&n1);          // are the output parameters of procedure B
  print(n0); print(n1);
}
// All the semantic rules can be evaluated at the end of parsing of
// production rules.
procedure B(int *n0, int *n1) {
  if (currtoken=0)
    { int a,b; consume 0; call B(&a,&b); *n0=a+1; *n1=b; }
  else if (currtoken=1)
    { int a,b; consume 1; call B(&a,&b); *n0=a; *n1=b+1; }
  else if (currtoken=$) { *n0=0; *n1=0; }   // $ is end-marker
  else error(“unexpected token”);
}
309
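The same idea in Python (a sketch; synthesized attributes become return values instead of output parameters):

```python
# Sketch: recursive predictive parser for B -> 0 B | 1 B | epsilon.
# The synthesized attributes (n0, n1) count 0s and 1s and are returned
# instead of being passed through output parameters.
def parse_B(tokens, i=0):
    if i < len(tokens) and tokens[i] == '0':      # B -> 0 B
        n0, n1 = parse_B(tokens, i + 1)
        return n0 + 1, n1
    if i < len(tokens) and tokens[i] == '1':      # B -> 1 B
        n0, n1 = parse_B(tokens, i + 1)
        return n0, n1 + 1
    return 0, 0                                   # B -> epsilon (at $)

def parse_A(tokens):                              # A -> B
    return parse_B(tokens)

print(parse_A(list("00101")))  # (3, 2)
```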
L-Attributed
Definitions
• S-Attributed Definitions can be efficiently implemented.
• We are looking for a larger (larger than S-Attributed
Definitions) subset of syntax-directed definitions which can
be efficiently evaluated.
➔ L-Attributed Definitions
310
L-Attributed
Definitions
• A syntax-directed definition is L-attributed if each
inherited attribute of Xj, where 1 ≤ j ≤ n, on the right side of
A → X1X2...Xn depends only on:
1. the attributes of the symbols X1,...,Xj-1 to the left of Xj
in the production and
2. the inherited attributes of A.
311
A Definition which is NOT
L-Attributed
Productions Semantic Rules
A → L M    L.in=l(A.i), M.in=m(L.s), A.s=f(M.s)
A → Q R    R.in=r(A.in), Q.in=q(R.s), A.s=f(Q.s)
• This syntax-directed definition is not L-attributed because
the semantic rule Q.in=q(R.s) violates the restrictions of
L-attributed definitions.
• Q.in must be evaluated before we enter Q, because it is an
inherited attribute.
• But the value of Q.in depends on R.s, which will only be
available after we return from R. So, we are not able to
evaluate the value of Q.in before we enter Q.
312
Translation
Schemes
• In a syntax-directed definition, we do not say anything
about the evaluation times of the semantic rules (when the
semantic rules associated with a production should be
evaluated?).
314
Translation Schemes for S-attributed Definitions
• If our syntax-directed definition is S-attributed, the
construction of the corresponding translation scheme will be
simple.
• Each associated semantic rule in an S-attributed syntax-
directed definition will be inserted as a semantic action at
the end of the right side of the associated production.
E → T R
R → + T { print(“+”) } R1
R → ε
T → id { print(id.name) }
a+b+c ➔ ab+c+
316
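The print actions of this translation scheme can be sketched as follows (the function name is mine): emitting id.name at T → id and “+” after each R → + T step yields the postfix form:

```python
# Sketch: E -> T R; R -> + T {print "+"} R1 | epsilon; T -> id {print id}.
# Collecting the printed symbols in order gives the postfix translation.
def to_postfix(tokens):
    out = []
    it = iter(tokens)
    out.append(next(it))          # T -> id { print(id.name) }
    for tok in it:
        if tok == '+':
            out.append(next(it))  # the T of R -> + T ... R1
            out.append('+')       # { print("+") } fires after that T
    return ''.join(out)

print(to_postfix(list("a+b+c")))  # ab+c+
```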
A Translation Scheme Example (cont.)
Parse tree for a+b+c with the embedded actions:
E
  T: id {print(“a”)}
  R: + T {print(“+”)} R
       id {print(“b”)}
       + T {print(“+”)} R
            id {print(“c”)}
319
A Translation Scheme with
Inherited Attributes
D → T id { addtype(id.entry,T.type), L.in = T.type } L
320
Predictive Parsing (of
Inherited Attributes)
procedure D() {
  int Ttype,Lin,identry;
  call T(&Ttype); consume(id,&identry);   // &Ttype: a synthesized attribute
  addtype(identry,Ttype); Lin=Ttype;      //         (an output parameter)
  call L(Lin);                            // Lin: an inherited attribute
}                                         //      (an input parameter)
procedure T(int *Ttype) {
  if (currtoken is int) { consume(int); *Ttype=TYPEINT; }
  else if (currtoken is real) { consume(real); *Ttype=TYPEREAL; }
  else { error(“unexpected type”); }
}
321
Predictive Parsing (of
Inherited Attributes)
procedure L(int Lin) {
  if (currtoken is id) {
    int L1in,identry;
    consume(id,&identry);
    addtype(identry,Lin); L1in=Lin; call L(L1in);
  }
  else if (currtoken is endmarker) { }
  else { error(“unexpected token”); }
}
322
Translation Scheme - Intermediate
Code Generation
E → T { A.in=T.loc } A { E.loc=A.loc }
A → + T { A1.in=newtemp(); emit(add,A.in,T.loc,A1.in) } A1 { A.loc=A1.loc }
A → ε { A.loc = A.in }
T → F { B.in=F.loc } B { T.loc=B.loc }
B → * F { B1.in=newtemp(); emit(mult,B.in,F.loc,B1.in) } B1 { B.loc=B1.loc }
B → ε { B.loc = B.in }
F → ( E ) { F.loc = E.loc }
F → id { F.loc = id.name }
323
Predictive Parsing – Intermediate
Code Generation
procedure E(char **Eloc) {
char *Ain, *Tloc, *Aloc;
call T(&Tloc); Ain=Tloc;
call A(Ain,&Aloc); *Eloc=Aloc;
}
procedure A(char *Ain, char **Aloc){
if (currtok is +) {
char *A1in, *Tloc, *A1loc;
consume(+); call T(&Tloc); A1in=newtemp();
emit(“add”,Ain,Tloc,A1in);
call A(A1in,&A1loc); *Aloc=A1loc;
}
  else { *Aloc = Ain; }
}
324
Predictive
Parsing (cont.)
procedure T(char **Tloc) {
char *Bin, *Floc, *Bloc;
call F(&Floc); Bin=Floc;
call B(Bin,&Bloc); *Tloc=Bloc;
}
procedure B(char *Bin, char **Bloc) {
  if (currtok is *) {
    char *B1in, *Floc, *B1loc;
    consume(*); call F(&Floc); B1in=newtemp();
    emit(“mult”,Bin,Floc,B1in);
    call B(B1in,&B1loc); *Bloc=B1loc;
  }
  else { *Bloc = Bin; }   // B → ε : B.loc = B.in
}
325
Predictive Parsing (cont.)
327
Removing Embedding
Semantic Actions
• In bottom-up evaluation scheme, the
semantic actions are evaluated during the
reductions.
• During the bottom-up evaluation of S-
attributed definitions, we have a parallel
stack to hold synthesized attributes.
• Problem: where are we going to hold
inherited attributes?
• A Solution:
328
Removing Embedding
Semantic Actions
– We will convert our grammar to an equivalent
grammar that guarantees the following:
– All embedded semantic actions in our translation
scheme will be moved to the end of the production
rules.
– All inherited attributes will be copied into
synthesized attributes (most of the time synthesized
attributes of new non-terminals).
– Thus we will evaluate all semantic actions during
reductions, and we will find a place to store each inherited
attribute.
329
Removing Embedding
Semantic Actions
• To transform our translation scheme into an equivalent
translation scheme:
1. Remove an embedded semantic action Si, and put a new
non-terminal Mi in place of that semantic action.
2. Put that semantic action Si at the end of a new
production rule Mi → ε for that non-terminal Mi.
3. That semantic action Si will be evaluated when this new
production rule is reduced.
4. The evaluation order of the semantic rules is not
changed by this transformation.
330
Removing Embedding
Semantic Actions
A→ {S1} X1 {S2} X2 ... {Sn} Xn
A→ M1 X1 M2 X2 ... Mn Xn
M1→ {S1}
M2→ {S2}
.
.
Mn→ {Sn}
331
Removing Embedding
Semantic Actions
E → T R
R → + T { print(“+”) } R1
R → ε
T → id { print(id.name) }
➔
E → T R
R → + T M R1
R → ε
T → id { print(id.name) }
M → ε { print(“+”) }
332
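A small sketch of the effect of this transformation (the reduce_* helpers are hypothetical stand-ins for parser reductions): reductions of the marker M fire exactly where the embedded action used to sit, so the output is unchanged:

```python
# Sketch: moving the embedded action of R -> + T {print "+"} R1 into a
# marker non-terminal M -> epsilon {print "+"}. Reducing M fires the
# action at the same point in the bottom-up parse.
output = []

def reduce_M():          # M -> epsilon { print("+") }
    output.append('+')

def reduce_T(name):      # T -> id { print(id.name) }
    output.append(name)

# Bottom-up reduction order for a+b+c with E -> T R, R -> + T M R1 | eps:
reduce_T('a'); reduce_T('b'); reduce_M(); reduce_T('c'); reduce_M()
print(''.join(output))  # ab+c+, same as the original scheme
```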
Translation with
Inherited Attributes
• Let us assume that every non-terminal A has an inherited
attribute A.i, and every symbol X has a synthesized
attribute X.s in our grammar.
• For every production rule A→ X1 X2 ... Xn ,
– introduce new marker non-terminals M1,M2,...,Mn and
– replace this production rule with A→ M1 X1 M2 X2 ...
Mn Xn
– the synthesized attribute of Xi will not be changed.
– the inherited attribute of Xi will be copied into the
synthesized attribute of Mi by the new semantic action
added at the end of the new production rule Mi → ε.
– Now, the inherited attribute of Xi can be found in the
synthesized attribute of Mi (which is immediately
available in the stack).
333
Translation with Inherited Attributes
S → {A.i=1} A {S.s=k(A.i,A.s)}
A → {B.i=f(A.i)} B {C.i=g(A.i,B.i,B.s)} C {A.s=h(A.i,B.i,B.s,C.i,C.s)}
B → b {B.s=m(B.i,b.s)}
C → c {C.s=n(C.i,c.s)}
S → {M1.i=1} M1 {A.i=M1.s} A {S.s=k(M1.s,A.s)}
A → {M2.i=f(A.i)} M2 {B.i=M2.s}B
{M3.i=g(A.i,M2.s,B.s)} M3 {C.i=M3.s} C {A.s= h(A.i,
M2.s,B.s, M3.s,C.s)}
B → b {B.s=m(B.i,b.s)}
C → c {C.s=n(C.i,c.s)}
M1→ {M1.s=M1.i}
M2→ {M2.s=M2.i}
M3→ {M3.s=M3.i}
334
Actual Translation Scheme
S → {M1.i=1} M1 {A.i=M1.s} A {S.s=k(M1.s,A.s)}
A → {M2.i=f(A.i)} M2 {B.i=M2.s} B {M3.i=g(A.i,M2.s,B.s)} M3
{C.i=M3.s} C {A.s=h(A.i,M2.s,B.s,M3.s,C.s)}
B → b {B.s=m(B.i,b.s)}
C → c {C.s=n(C.i,c.s)}
M1 → {M1.s=M1.i}
M2 → {M2.s=M2.i}
M3 → {M3.s=M3.i}
335
335
Actual Translation Scheme
S → M1 A       { s[ntop]=k(s[top-1],s[top]) }
M1 →           { s[ntop]=1 }
A → M2 B M3 C  { s[ntop]=h(s[top-4],s[top-3],s[top-2],s[top-1],s[top]) }
M2 →           { s[ntop]=f(s[top]) }
M3 →           { s[ntop]=g(s[top-2],s[top-1],s[top]) }
B → b          { s[ntop]=m(s[top-1],s[top]) }
C → c          { s[ntop]=n(s[top-1],s[top]) }
336
Evaluation of Attributes
(attribute values on the parse tree for input bc:)
S:  S.s=k(1,h(..))
A:  A.i=1, A.s=h(1,f(1),m(..),g(..),n(..))
B:  B.i=f(1), B.s=m(f(1),b.s)   (leaf b)
C:  C.i=g(1,f(1),m(..)), C.s=n(g(..),c.s)   (leaf c)
337
Evaluation of Attributes
stack           input  s-attribute stack
                bc$
M1              bc$    1
M1 M2           bc$    1 f(1)
M1 M2 b         c$     1 f(1) b.s
M1 M2 B         c$     1 f(1) m(f(1),b.s)
M1 M2 B M3      c$     1 f(1) m(f(1),b.s) g(1,f(1),m(f(1),b.s))
M1 M2 B M3 c    $      1 f(1) m(f(1),b.s) g(1,f(1),m(f(1),b.s)) c.s
M1 M2 B M3 C    $      1 f(1) m(f(1),b.s) g(1,f(1),m(f(1),b.s)) n(g(..),c.s)
M1 A            $      1 h(f(1),m(..),g(..),n(..))
S               $      k(1,h(..))
338
Problems
• Not all L-attributed definitions based on LR grammars can be
evaluated during bottom-up parsing.
S → M1 L
L → M2 L1 1
L → ε { print(s[top]) }
M1 → { s[ntop]=0 }
M2 → { s[ntop]=s[top]+1 }
➔ But since L → ε will be reduced first by the bottom-up
parser, the translator cannot know the number of 1s.
339
Problems
• The modified grammar is not an LR grammar anymore.
L → L b         L → M L b
L → a     ➔    L → a          NOT an LR grammar
                M → ε
S’ → .L, $
L → .MLb, $
L → .a, $
M → . , a   ➔ shift/reduce conflict
340
Intermediate Code
Generation
• Intermediate codes are machine independent codes, but
they are close to machine instructions.
341
Intermediate Code
Generation
– postfix notation can be used as an intermediate language.
– three-address code (quadruples) can be used as an
intermediate language
• we will use quadruples to discuss intermediate code
generation
• quadruples are close to machine instructions, but they are
not actual machine instructions.
– some programming languages have well defined intermediate
languages.
• java – java virtual machine
• prolog – warren abstract machine
• In fact, there are byte-code emulators to execute
instructions in these intermediate languages.
342
Three-Address Code
(Quadruples)
• A quadruple is:
x := y op z
where x, y and z are names, constants or compiler-generated
temporaries; op is any operator.
343
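A minimal sketch of quadruples as data (newtemp and emit mirror the slides' helpers, but this encoding is my own):

```python
# Sketch: a quadruple as a 4-tuple (op, arg1, arg2, result),
# with a tiny emitter and a newtemp() temporary generator.
quads = []
_tmp = 0

def newtemp():
    global _tmp
    _tmp += 1
    return f"t{_tmp}"

def emit(op, a1, a2, result):
    quads.append((op, a1, a2, result))

# x := y + z * w becomes:
t1 = newtemp(); emit('mult', 'z', 'w', t1)
t2 = newtemp(); emit('add', 'y', t1, t2)
emit('mov', t2, '', 'x')
for q in quads:
    print(q)
```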
Three-Address
Statements
Binary Operator: op y,z,result or result := y op z
where op is a binary arithmetic or logical operator. This binary
operator is applied to y and z, and the result of the operation is
stored in result.
Ex: add a,b,c
gt a,b,c
addr a,b,c
addi a,b,c
344
Three-Address
Statements (cont.)
Move Operator: mov y,,result or result := y
where the content of y is copied into result.
Ex: mov a,,c
movi a,,c
movr a,,c
345
Three-Address Statements (cont.)
Conditional Jumps: jmprelop y,z,L or if y
relop z goto L
We will jump to the three-address code with the label L if the result of y
relop z is true, and the execution continues from that statement. If the
result is false, the execution continues from the statement following this
conditional jump statement.
346
Three-Address
Statements (cont.)
Our relational operator can also be a unary operator.
jmpnz y,,L1 // jump to L1 if y is not zero
jmpz y,,L1 // jump to L1 if y is zero
jmpt y,,L1 // jump to L1 if y is true
jmpf y,,L1 // jump to L1 if y is false
347
Three-Address Statements (cont.)
Procedure Parameters: param x,, or param x
Procedure Calls: call p,n, or call p,n
where x is an actual parameter, we invoke the procedure p
with n parameters.
Ex: param x1,,
param x2,,
...
param xn,,
call p,n,          ➔ p(x1,...,xn)

f(x+1,y) ➔ add x,1,t1
param t1,,
param y,,
call f,2,
348
Three-Address
Statements (cont.)
Indexed Assignments:
move y[i],,x or x := y[i]
move x,,y[i] or y[i] := x
349
Syntax-Directed Translation into
Three-Address Code
S → id := E    S.code = E.code || gen(‘mov’ E.place ‘,,’ id.place)
350
Syntax-Directed
Translation (cont.)
S → while E do S1    S.begin = newlabel();
                     S.after = newlabel();
                     S.code = gen(S.begin ‘:’) || E.code ||
                     gen(‘jmpf’ E.place ‘,,’ S.after) || S1.code ||
                     gen(‘jmp’ ‘,,’ S.begin) ||
                     gen(S.after ‘:’)
S → if E then S1 else S2    S.else = newlabel();
                            S.after = newlabel();
                            S.code = E.code ||
                            gen(‘jmpf’ E.place ‘,,’ S.else) || S1.code ||
                            gen(‘jmp’ ‘,,’ S.after) ||
                            gen(S.else ‘:’) || S2.code ||
                            gen(S.after ‘:’)
351
Translation Scheme to Produce Three-Address Code
S → id := E { p = lookup(id.name);
if (p is not nil) then emit(‘mov’ E.place ‘,,’ p)
else error(“undefined-variable”) }
E → E1 +E2 { E.place = newtemp();
emit(‘add’ E1.place ‘,’ E2.place ‘,’ E.place) }
E → E1 *E2 { E.place = newtemp();
emit(‘mult’ E1.place ‘,’ E2.place ‘,’ E.place) }
E → - E1 { E.place = newtemp();
emit(‘uminus’ E1.place ‘,,’ E.place) }
E → ( E1 ) { E.place = E1.place; }
E → id { p= lookup(id.name);
if (p is not nil) then E.place = id.place
else error(“undefined-variable”) }
352
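The translation scheme above can be sketched over a small expression tree (symbol-table lookup is omitted; the node encoding is my own):

```python
# Sketch: E -> E1+E2 | E1*E2 | -E1 | (E) | id, emitting three-address code.
code = []
_tmp = 0

def newtemp():
    global _tmp
    _tmp += 1
    return f"t{_tmp}"

def emit(s):
    code.append(s)

def gen(node):
    """Return E.place for a node ('id', name) or (op, operand[, operand])."""
    if node[0] == 'id':
        return node[1]                      # E -> id : E.place = id.place
    if node[0] == 'uminus':
        p = gen(node[1]); t = newtemp()
        emit(f"uminus {p},,{t}")            # E -> - E1
        return t
    op = {'+': 'add', '*': 'mult'}[node[0]]
    p1, p2 = gen(node[1]), gen(node[2])
    t = newtemp()
    emit(f"{op} {p1},{p2},{t}")             # E -> E1 + E2 / E1 * E2
    return t

# a := b * -c
place = gen(('*', ('id', 'b'), ('uminus', ('id', 'c'))))
emit(f"mov {place},,a")
print('\n'.join(code))
```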
Translation Scheme with Locations
S → id := { E.inloc = S.inloc } E
{ p = lookup(id.name);
if (p is not nil) then { emit(E.outloc ‘mov’ E.place ‘,,’ p);
S.outloc=E.outloc+1 }
else { error(“undefined-variable”); S.outloc=E.outloc } }
354
354
Boolean
Expressions
E → { E1.inloc = E.inloc } E1 and { E2.inloc = E1.outloc }E2
{ E.place = newtemp(); emit(E2.outloc ‘and’ E1.place ‘,’
E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }
E → { E1.inloc = E.inloc } E1 or { E2.inloc = E1.outloc } E2
{ E.place = newtemp(); emit(E2.outloc ‘or’ E1.place ‘,’
E2.place ‘,’ E.place); E.outloc=E2.outloc+1 }
E → not { E1.inloc = E.inloc } E1
{ E.place = newtemp(); emit(E1.outloc ‘not’ E1.place ‘,,’
E.place); E.outloc=E1.outloc+1 }
E → { E1.inloc = E.inloc } E1 relop { E2.inloc = E1.outloc } E2
{ E.place = newtemp();
emit(E2.outloc relop.code E1.place ‘,’ E2.place ‘,’
E.place); E.outloc=E2.outloc+1 }
355
Translation
Scheme(cont.)
S → while { E.inloc = S.inloc } E do
{ emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
S1.inloc=E.outloc+1; } S1
{ emit(S1.outloc ‘jmp’ ‘,,’S.inloc);
S.outloc=S1.outloc+1;
backpatch(E.outloc,S.outloc); }
356
Translation
Scheme(cont.)
S → if { E.inloc = S.inloc } E then
{ emit(E.outloc ‘jmpf’ E.place ‘,,’ ‘NOTKNOWN’);
S1.inloc=E.outloc+1; } S1 else
{ emit(S1.outloc ‘jmp’ ‘,,’ ‘NOTKNOWN’);
S2.inloc=S1.outloc+1;
backpatch(E.outloc,S2.inloc); } S2
{ S.outloc=S2.outloc;
backpatch(S1.outloc,S.outloc); }
357
Three Address
Codes - Example
x:=1;                        01: mov 1,,x
y:=x+10;                     02: add x,10,t1
while (x<y) {                03: mov t1,,y
  x:=x+1;              ➔    04: lt x,y,t2
  if (x%2==1) then y:=y+1;   05: jmpf t2,,17
  else y:=y-2;               06: add x,1,t3
}                            07: mov t3,,x
358
Three Address
Codes - Example
08: mod x,2,t4
09: eq t4,1,t5
10: jmpf t5,,14
11: add y,1,t6
12: mov t6,,y
13: jmp ,,16
14: sub y,2,t7
15: mov t7,,y
16: jmp ,,4
17:
359
Arrays
• Elements of arrays can be accessed quickly if the elements
are stored in a block of consecutive locations.
(figure: a one-dimensional array A laid out in consecutive memory locations)
362
Two-Dimensional Arrays (cont.)
• The location of A[i1,i2] is
baseA + ((i1-low1)*n2 + i2-low2)*width
baseA is the location of the array A.
low1 is the index of the first row
low2 is the index of the first column
n2 is the number of elements in each row
width is the width of each array element
• Again, this formula can be re-written as
((i1*n2)+i2)*width + (baseA - ((low1*n2)+low2)*width)
L → id | id [ Elist ]
Elist → Elist , E | E
is rewritten as
L → id | Elist ]
Elist → Elist , E | id [ E
365
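A quick check of the two address formulas (base, bounds and width here are made-up example values): both forms compute the same address:

```python
# Sketch: address of A[i1,i2] for a row-major two-dimensional array.
def addr2(base, i1, i2, low1, low2, n2, width):
    # direct form: baseA + ((i1-low1)*n2 + i2-low2)*width
    return base + ((i1 - low1) * n2 + (i2 - low2)) * width

def addr2_rewritten(base, i1, i2, low1, low2, n2, width):
    # rewritten form: the constant part can be precomputed at compile time
    return ((i1 * n2) + i2) * width + (base - ((low1 * n2) + low2) * width)

# Example: A : 1..10 x 1..20 of int, base = 1000, width = 4
print(addr2(1000, 3, 5, 1, 1, 20, 4))  # 1176, same in both forms
```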
Translation Scheme for
Arrays (cont.)
S → L := E { if (L.offset is null) emit(‘mov’ E.place ‘,,’L.place)
else emit(‘mov’ E.place ‘,,’ L.place ‘[‘ L.offset ‘]’)}
E → E1 + E2 { E.place = newtemp();
emit(‘add’E1.place ‘,’ E2.place ‘,’ E.place) }
E → ( E1 ) { E.place = E1.place; }
366
Translation Scheme for Arrays (cont.)
L → id { L.place = id.place; L.offset = null; }
L → Elist ]
{ L.place = newtemp(); L.offset = newtemp();
emit(‘mov’ c(Elist.array) ‘,,’L.place);
emit(‘mult’Elist.place ‘,’width(Elist.array) ‘,’L.offset)
}
Elist → Elist1 , E
{ Elist.array = Elist1.array ; Elist.place = newtemp();
Elist.ndim = Elist1.ndim + 1;
emit(‘mult’ Elist1.place ‘,’limit(Elist.array,Elist.ndim)
‘,’Elist.place);
emit(‘add’Elist.place ‘,’E.place ‘,’Elist.place); }
Elist → id [ E
{Elist.array = id.place ; Elist.place = E.place; Elist.ndim
= 1; }
367
Translation Scheme for
Arrays – Example1
• A one-dimensional double array A : 5..100
➔ n1=96 width=8 (double) low1=5
368
Translation Scheme for
Arrays – Example2
• A two-dimensional int array A : 1..10x1..20
➔ n1=10 n2=20 width=4 (integers) low1=1 low2=1
mult y,20,t1
add t1,z,t1
mov c,,t2       // where c=baseA-(1*20+1)*4
mult t1,4,t3
mov t2[t3],,t4
mov t4,,x
369
Translation Scheme for
Arrays – Example3
• A three-dimensional int array A : 0..9x0..19x0..29
➔ n1=10 n2=20 n3=30 width=4 (integers) low1=0 low2=0
low3=0
370
Declarations
P → M D
M → ε { offset=0 }
D→ D;D
D → id : T { enter(id.name,T.type,offset);
offset=offset+T.width }
T → int { T.type=int; T.width=4 }
T → real { T.type=real; T.width=8 }
T → array[num] of T1 { T.type=array(num.val,T1.type);
T.width=num.val*T1.width }
T → ↑ T1 { T.type=pointer(T1.type); T.width=4 }
372
Nested Procedure
Declarations
addwidth(symtable,width) – puts the total width of all entries
in the symbol table into the header of that table.
373
Nested Procedure Declarations
P → M D { addwidth(top(tblptr),top(offset));
pop(tblptr); pop(offset) }
N → ε { t=mktable(top(tblptr)); push(t,tblptr);
push(0,offset) }
374
Intermediate Code
Generation
375
Intermediate Code Generation
• Translating the source program into an “intermediate
language.”
– Simple
– CPU Independent,
– …yet, close in spirit to machine language.
• Or, depending on the application other intermediate
languages may be used, but in general, we opt for simple,
well structured intermediate forms.
• (and this completes the “Front-End” of Compilation).
Benefits
1. Retargeting is facilitated
2. Machine independent Code Optimization can be applied.
376
Intermediate Code Generation (II)
❖ Intermediate codes are machine independent codes, but they are close to
machine instructions.
❖ The given program in a source language is converted to an equivalent program
in an intermediate language by the intermediate code generator.
❖ Intermediate language can be many different languages, and the designer of
the compiler decides this intermediate language.
377
Types of Intermediate Languages
• Graphical Representations.
– Consider the assignment a:=b*-c+b*-c:
(figure: the syntax tree and the DAG for the assignment; the tree
duplicates the two b * (uminus c) subtrees, while the DAG shares them)
378
Syntax Dir. Definition for
Assignment Statements
PRODUCTION Semantic Rule
S → id := E { S.nptr = mknode (‘assign’, mkleaf(id, id.entry),
E.nptr) }
379
Three Address Code
• Statements of general form x:=y op z
• As a result, x:=y + z * w
should be represented as
t1:=z * w
t2:=y + t1
x:=t2
380
Three Address Code
• Observe that given the syntax-tree or the dag of the
graphical representation we can easily derive a three
address code for assignments as above.
381
Example of 3-
address code
t1:= - c t1:=- c
t2:= b * t1 t2:= b *
t3:= - c t1
t4:= b * t3 t5:= t2 +
t5:= 2 + t4 t2
a:= t5 a:= t5
382
Types of Three-Address
Statements.
Assignment Statement: x:=y op z
Assignment Statement: x:=op z
Copy Statement: x:=z
Unconditional Jump: goto L
Conditional Jump: if x relop y goto L
Stack Operations: Push/pop
More Advanced:
Procedure:
param x1
param x2
…
param xn
call p,n
383
Types of Three-Address
Statements.
Index Assignments:
x:=y[i]
x[i]:=y
Address and Pointer Assignments:
x:=&y
x:=*y
*x:=y
384
Syntax-Directed Translation into 3-address code
• First deal with assignments.
• Use attributes
– E.place: the name that will hold the value of E
• Identifier will be assumed to already have the place
attribute defined.
– E.code: holds the three address code statements that
evaluate E (this is the ‘translation’ attribute).
• Use function newtemp that returns a new temporary
variable that we can use.
• Use function gen to generate a single three address
statement given the necessary information (variable names
and operations).
385
Syntax-Dir. Definition for
3-address code
PRODUCTION Semantic Rule
S → id := E { S.code = E.code||gen(id.place ‘=’ E.place ‘;’) }
E → E1 + E2 {E.place= newtemp ;
E.code = E1.code || E2.code ||
|| gen(E.place‘:=’E1.place‘+’E2.place) }
E → E1 * E2 {E.place= newtemp ;
E.code = E1.code || E2.code ||
|| gen(E.place‘=’E1.place‘*’E2.place) }
E → - E1 {E.place= newtemp ;
E.code = E1.code ||
|| gen(E.place ‘=’ ‘uminus’ E1.place) }
E → ( E1 ) {E.place= E1.place ; E.code = E1.code}
E → id {E.place = id.entry ; E.code = ‘’ }
387
What about things that are not
assignments? (cont.)
S.code= gen(S.begin ‘:’)
|| E.code
|| gen(‘if’ E.place ‘=’‘0’‘goto’ S.after)
|| S1.code
|| gen(‘goto’S.begin)
|| gen(S.after ‘:’)
388
Implementations of 3-address statements
• Quadruples
t1:=- c
t2:=b * t1
t3:=- c
t4:=b * t3
t5:=t2 + t4
a:=t5

      op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   :=       t5            a

Temporary names must be entered into the symbol table as they are created.
389
Implementations of 3-address statements, II
• Triples
t1:=- c
t2:=b * t1
t3:=- c
t4:=b * t3
t5:=t2 + t4
a:=t5

      op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   assign   a      (4)

Temporary names are not entered into the symbol table.
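Side by side as data (this tuple encoding is my own): quadruples name their results explicitly, while triples refer back to statement numbers:

```python
# Sketch: a := b * -c + b * -c in both representations.
quads = [
    ('uminus', 'c',  None, 't1'),
    ('*',      'b',  't1', 't2'),
    ('uminus', 'c',  None, 't3'),
    ('*',      'b',  't3', 't4'),
    ('+',      't2', 't4', 't5'),
    (':=',     't5', None, 'a'),
]
triples = [
    ('uminus', 'c',  None),
    ('*',      'b',  0),     # (0) refers to the result of statement 0
    ('uminus', 'c',  None),
    ('*',      'b',  2),
    ('+',      1,    3),
    ('assign', 'a',  4),
]
# Quadruple temporaries (t1..t5) would go into the symbol table;
# triple references (0)..(4) are just positions, so they would not.
print(len(quads), len(triples))
```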
Other types of 3-address statements
• e.g. ternary operations like x[i]:=y and x:=y[i]
• require two or more entries. e.g.

x[i]:=y:
      op       arg1   arg2
(0)   []=      x      i
(1)   assign   (0)    y

x:=y[i]:
      op       arg1   arg2
(0)   []=      y      i
(1)   assign   x      (0)
391
Implementations of 3-address statements, III
• Indirect Triples
(figure: an op table of pointers into a separate (op, arg1, arg2) triple table)
392
Dealing with
Procedures
P → procedure id ‘;’ block ‘;’
Semantic Rule
begin = newlabel;
Enter into symbol-table in the entry of the procedure name
the begin label.
P.code = gen(begin ‘:’) || block.code ||
gen(‘pop’ return_address) || gen(‘goto’ return_address)
S → call id
Semantic Rule
Look up symbol table to find procedure name. Find its begin
label called proc_begin
return = newlabel;
S.code = gen(‘push’ return) || gen(‘goto’ proc_begin)
393
Declarations
Using a global variable offset
394
Nested Procedure
Declarations
•For each procedure we should create a symbol table.
mktable(previous) – create a new symbol table where
previous is the parent symbol table of this new symbol
table
enter(symtable,name,type,offset) – create a new entry for a
variable in the given symbol table.
enterproc(symtable,name,newsymbtable) – create a new
entry for the procedure in the symbol table of its parent.
addwidth(symtable,width) – puts the total width of all entries
in the symbol table into the header of that table.
• We will have two stacks:
– tblptr – to hold the pointers to the symbol tables
– offset – to hold the current offsets in the symbol tables
in tblptr stack. 395
Keeping Track of Scope
Information
Consider the grammar fraction:
P→D
D → D ; D | id : T | proc id ; D ; S
396
Keeping Track of Scope
Information
(a translation scheme)
P→ MD { addwidth(top(tblptr), top(offset));
pop(tblptr); pop(offset) }
M → ε { t:=mktable(null); push(t, tblptr);
push(0, offset) }
D → D1 ; D2 ...
D → proc id ; N D ; S { t:=top(tblpr);
addwidth(t,top(offset));
pop(tblptr); pop(offset);
enterproc(top(tblptr), id.name, t)}
N → ε { t:=mktable(top(tblptr)); push(t,tblptr);
push(0, offset) }
397
Keeping Track of Scope
Information
D → id : T { enter(top(tblptr), id.name, T.type, top(offset));
top(offset):=top(offset) + T.width }
398
Type Checking
399
Static
Checking
(figure: the static checker maps an Abstract Syntax Tree to a
Decorated Abstract Syntax Tree)
400
Type
Checking
• Problem: Verify that a type of a construct matches that
expected by its context.
• Examples:
– mod requires integer operands (PASCAL)
– * (dereferencing) – applied to a pointer
– a[i] – indexing applied to an array
– f(a1, a2, …, an) – function applied to correct arguments.
• Information gathered by a type checker:
– Needed during code generation.
401
Type
Systems
• A collection of rules for assigning type expressions to the
various parts of a program.
Basic Types: Boolean, Character, Real, Integer,
Enumerations, Sub-ranges, Void, Error
Type Constructors: Arrays, Records, Sets, Pointers, Functions
Variables, Names
403
Representation of Type Expressions
(figure: Tree and DAG representations of the type expression
(char x char) -> pointer(integer), and of the record type cell)
struct cell {
    int info;
    struct cell * next;
};
404
Type Expressions Grammar
Type -> int | float | char | …
      | void
      | error
      | name
      | variable                 Basic Types
      | array( size, Type)
      | record( (name, Type)*)
      | pointer( Type)
      | tuple((Type)*)
      | arrow(Type, Type)        Structured Types
405
A Simple Typed
Language
Program -> Declaration; Statement
Declaration -> Declaration; Declaration
| id: Type
Statement -> Statement; Statement
| id := Expression
| if Expression then Statement
| while Expression do Statement
Expression -> literal | num | id
| Expression mod Expression
| E[E] | E ↑ | E (E)
406
Type Checking
Expressions
E -> int_const   { E.type = int }
E -> float_const { E.type = float }
E -> id          { E.type = sym_lookup(id.entry, type) }
E -> E1 + E2     { E.type = if E1.type ∉ {int, float} or
                            E2.type ∉ {int, float}
                   then error
                   else if E1.type == E2.type == int
                   then int
                   else float }
407
Type Checking
Expressions
E -> E1 [E2]   { E.type = if E1.type = array(S, T) and
                 E2.type = int then T else error }
E -> *E1       { E.type = if E1.type = pointer(T) then T else error }
E -> &E1       { E.type = pointer(E1.type) }
E -> E1 (E2)   { E.type = if E1.type = arrow(S, T) and
                 E2.type = S then T else error }
E -> (E1, E2)  { E.type = tuple(E1.type, E2.type) }
408
Type Checking
Statements
S -> id := E {S.type := if id.type = E.type then void else error}
409
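The expression and statement rules above translate into a small recursive checker. This Python sketch is illustrative: the tuple-based AST and the type encodings (`("array", size, T)`, `("pointer", T)`, `("arrow", S, T)`) are assumptions, not the book's notation.

```python
# Recursive type checker over a tiny tuple-encoded AST, following the
# E -> ... and S -> id := E rules on the slides.

ERROR = "error"

def check(node, env):
    kind = node[0]
    if kind == "int_const":   return "int"
    if kind == "float_const": return "float"
    if kind == "id":          return env.get(node[1], ERROR)
    if kind == "+":
        t1, t2 = check(node[1], env), check(node[2], env)
        if t1 not in ("int", "float") or t2 not in ("int", "float"):
            return ERROR
        return "int" if t1 == t2 == "int" else "float"
    if kind == "index":                      # E1[E2]
        t1, t2 = check(node[1], env), check(node[2], env)
        return t1[2] if t1[0] == "array" and t2 == "int" else ERROR
    if kind == "deref":                      # *E1
        t1 = check(node[1], env)
        return t1[1] if t1[0] == "pointer" else ERROR
    if kind == "call":                       # E1(E2)
        t1, t2 = check(node[1], env), check(node[2], env)
        return t1[2] if t1[0] == "arrow" and t1[1] == t2 else ERROR
    if kind == "assign":                     # id := E
        return "void" if env.get(node[1]) == check(node[2], env) else ERROR
    return ERROR

env = {"a": ("array", 10, "int"), "i": "int", "x": "float"}
print(check(("index", ("id", "a"), ("id", "i")), env))   # int
print(check(("+", ("id", "i"), ("id", "x")), env))       # float
print(check(("assign", "i", ("id", "x")), env))          # error
```

As in the rules, the error type propagates: any subexpression that fails produces a type that matches no context, so enclosing constructs also yield error.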
Equivalence of Type
Expressions
Problem: When is E1.type = E2.type?
– We need a precise definition for type equivalence.
– There is interaction between type equivalence and type
representation.
411
Algorithm Testing
Structural Equivalence
function stequiv(s, t): boolean
{
  if (s & t are of the same basic type) return true;
  if (s = array(s1, s2) & t = array(t1, t2))
    return equal(s1, t1) & stequiv(s2, t2);
  if (s = tuple(s1, s2) & t = tuple(t1, t2))
    return stequiv(s1, t1) & stequiv(s2, t2);
  if (s = arrow(s1, s2) & t = arrow(t1, t2))
    return stequiv(s1, t1) & stequiv(s2, t2);
  if (s = pointer(s1) & t = pointer(t1))
    return stequiv(s1, t1);
  return false;
}
412
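The stequiv algorithm translates directly into runnable code. In this Python sketch (the nested-tuple encoding of type expressions is an assumption), basic types are strings and constructed types are tagged tuples:

```python
# Structural equivalence of type expressions, mirroring stequiv:
# e.g. ("array", 10, "int"), ("arrow", "int", "float").

def stequiv(s, t):
    if isinstance(s, str) or isinstance(t, str):
        return s == t                      # same basic type
    if s[0] != t[0] or len(s) != len(t):
        return False
    if s[0] == "array":                    # sizes compared with equal()
        return s[1] == t[1] and stequiv(s[2], t[2])
    if s[0] in ("tuple", "arrow", "pointer"):
        return all(stequiv(a, b) for a, b in zip(s[1:], t[1:]))
    return False                           # unrelated constructors

print(stequiv(("pointer", "int"), ("pointer", "int")))      # True
print(stequiv(("array", 10, "int"), ("array", 20, "int")))  # False
```

Note that, like the slide's algorithm, this checks structure only; it would loop on the cyclic graphs used for recursive types, which need the marking technique discussed next.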
Recursive
Types
Where: Linked Lists, Trees, etc.
How: records containing pointers to similar records
Example (Pascal): type link = ↑cell;
                  cell = record info: int; next: link end
Representation: a cyclic type graph – the pointer node for next
points back to the cell type itself.
• Example (C):
  – struct cell { int info; struct cell *next; }
414
Overloading Functions
& Operators
• Overloaded Symbol: one that has different meanings
depending on its context
415
Overloading
Example
function “*” (i, j: integer) return complex;
function “*” (x, y: complex) return complex;
* has the following types:
arrow(tuple(integer, integer), integer)
arrow(tuple(integer, integer), complex)
arrow(tuple(complex, complex), complex)
int i, j;
k = i * j;
416
Narrowing
Down Types
E’ -> E        { E’.types = E.types;
                 E.unique = if E’.types = {t} then t else error }
E -> id        { E.types = lookup(id.entry) }
E -> E1 (E2)   { E.types = { s’ | s ∈ E2.types and s -> s’ ∈ E1.types };
                 t = E.unique;
                 S = { s | s ∈ E2.types and s -> t ∈ E1.types };
                 E2.unique = if S = {s} then s else error;
                 E1.unique = if S = {s} then s -> t else error }
417
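The narrowing rules can be sketched concretely. In this Python fragment (the set-of-types encoding and the helper name apply_types are illustrative assumptions), the candidate result types of an overloaded application shrink as argument types become known:

```python
# Candidate-set propagation for an overloaded application E1(E2):
# E.types = { s' | s in E2.types and s -> s' in E1.types }

def apply_types(f_types, arg_types):
    return {res for (tag, s, res) in f_types
            if tag == "arrow" and s in arg_types}

# "*" overloaded as in the example: int*int -> int,
# int*int -> complex, complex*complex -> complex
star = {("arrow", ("int", "int"), "int"),
        ("arrow", ("int", "int"), "complex"),
        ("arrow", ("complex", "complex"), "complex")}

print(sorted(apply_types(star, {("int", "int")})))  # ['complex', 'int']
print(sorted(apply_types(star, {("complex", "complex")})))  # ['complex']
```

Integer arguments leave two candidates, so the context (the unique type demanded from above) must pick one; complex arguments already narrow the set to a single type.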
Polymorphic
Functions
• Defn: a piece of code (functions, operators) that can be
executed with arguments of different types.
• Example (ML):
fun length(lptr) = if null(lptr) then 0
                   else length(tl(lptr)) + 1
418
A Language for
Polymorphic Functions
P -> D ; E
D -> D ; D | id : Q
Q -> ∀α. Q | T
T -> arrow (T, T) | tuple (T, T)
| unary (T) | (T)
| basic
|α
E -> E (E) | E, E | id
419
Type
Variables
• Why: variables representing type expressions allow us to
talk about unknown types.
– We use Greek letters α, β, γ, …
• Application: check consistent usage of identifiers in a
language that does not require identifiers to be declared
before usage.
– A type variable represents the type of an undeclared
identifier.
• Type Inference Problem: Determine the type of a language
constant from the way it is used.
– We have to deal with expressions containing variables.
420
Examples of Type Inference
type link = ↑cell;
procedure mlist(lptr: link; procedure p);
{ while lptr <> null { p(lptr); lptr := lptr↑.next } }
Hence: p: link -> void

function deref(p) { return p↑; }
p: β, β = pointer(α)
Hence deref: ∀α. pointer(α) -> α
421
Program in Polymorphic Language
deref: ∀α. pointer(α) -> α
q: pointer(pointer(integer))
deref(deref(q))
derefo: pointer(αo) -> αo    applied: αo
derefi: pointer(αi) -> αi    applied: αi
422
Type Checking
Polymorphic Functions
• Distinct occurrences of a polymorphic function in the same
expression need not have arguments of the same type.
– deref(deref(q))
– Replace ∀α with a fresh type variable for each occurrence
(αi for the inner deref, αo for the outer).
423
Substitutions and Unification
• Substitution S: a mapping from type variables to type expressions.
Function subst(t: TypeExpr): TypeExpr {
  if (t is a basic type) return t;
  if (t is a type variable) return S(t);  -- identity if t ∉ S
  if (t is t1 -> t2) return subst(t1) -> subst(t2);
}
424
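The subst function, together with unification, can be sketched as follows. This Python fragment is a minimal illustration: the encoding of type variables as quoted strings and constructed types as tagged tuples is an assumption, and no occurs-check is performed.

```python
# Minimal substitution + unification in the spirit of the slides.
# Basic types are plain strings, type variables are strings starting
# with a quote ("'a"), constructed types are tagged tuples.

def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def subst(t, S):
    if is_var(t):
        return subst(S[t], S) if t in S else t   # identity if t not in S
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, S) for a in t[1:])
    return t                                     # basic type

def unify(s, t, S):
    s, t = subst(s, S), subst(t, S)
    if s == t:
        return True
    if is_var(s):
        S[s] = t; return True
    if is_var(t):
        S[t] = s; return True
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        return all(unify(a, b, S) for a, b in zip(s[1:], t[1:]))
    return False

S = {}
# deref: pointer('a) -> 'a applied to q: pointer(integer)
unify(("pointer", "'a"), ("pointer", "integer"), S)
print(subst("'a", S))   # integer
```

The substitution S grows as unification proceeds, which is exactly how the translation scheme on the next slide resolves the fresh variables introduced for each occurrence of a polymorphic function.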
Polymorphic Type checking
Translation Scheme
E -> E1 (E2)  { p := mkleaf(newtypevar);
                unify(E1.type, mknode(‘->’, E2.type, p));
                E.type := p }
E -> E1, E2   { E.type := mknode(‘x’, E1.type, E2.type) }
E -> id       { E.type := fresh(id.type) }
425
Type Checking Example
Given: derefo(derefi(q)), with q: pointer(pointer(integer))
Bottom up: each occurrence of deref gets a fresh instance of
∀α. pointer(α) -> α. Unifying derefi’s argument type with q’s type
binds αi = pointer(integer); unifying derefo’s argument with
derefi’s result then binds αo = integer.
[Diagram: numbered type-graph nodes for derefo, derefi and q,
before and after unification]
426
UNIT-3
PART-B
SYMBOL TABLE
Symbol tables
l-value r-value
Symbol table entries
• Variable names
• Constants
• Procedure names
• Function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
A compiler uses the following types of information from the symbol table:
1. Data type
2. Name
3. Declaring procedure
4. Offset in storage
Symbol table entries
• Fixed-length names: each entry reserves a fixed number of
characters for the name, e.g.

  name                  attribute
  C A L C U L A T E
  S U M
  B

• Variable-length names: each entry stores a starting index and a
length into a separate array of characters:

  starting index   length   attribute
  0                10
  10               4
  14               2
  16               2
Symbol table management
• Data structures for the symbol table:
  1. Linear list
  2. Arrays
• A pointer variable is maintained just past the last stored record.

  Name 1   Info 1
  Name 2   Info 2
  Name 3   Info 3
  .        .
  .        .

• With a linked organization, a ‘first’ pointer chains the records:
  first -> Name 2, Info 2 -> Name 3, Info 3 -> Name 4, Info 4
• Ex:
• int m, n, p;
• int compute(int a, int b, int c)
• {
•     t = a + b * c;
•     return t;
• }
• main()
• {
•     int k;
•     k = compute(10, 20, 30);
• }
Symbol table management
Symbol table for compute:   Global symbol table:
  a  int                      m  int
  b  int                      n  int
  c  int                      p  int
• Hash tables:
• A hash table is used to search the records of the symbol table.
• In hashing, two tables are maintained: a hash table and a symbol
table.
• The hash table consists of k entries, numbered 0 to k-1. These
entries are pointers into the symbol table, pointing to the names
stored there.
• To determine where a name is in the symbol table, we use a hash
function h such that h(name) yields an integer between 0 and k-1.
We can search for any name via
    position = h(name)
Using this position we can obtain the exact location of the name in
the symbol table.
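The scheme above can be sketched directly. In this Python fragment (the hash function h and the table size K are illustrative assumptions), each symbol-table record carries a hash-link field chaining records whose names hash to the same slot:

```python
# Hashed symbol table: a K-entry hash table of bucket heads, with
# symbol-table records chained through a hash-link field.

K = 8
hash_table = [None] * K   # bucket heads: indices into symtab, or None
symtab = []               # records: [name, info, hash_link]

def h(name):
    # toy hash function for illustration
    return sum(ord(c) for c in name) % K

def insert(name, info):
    pos = h(name)
    symtab.append([name, info, hash_table[pos]])  # chain old head
    hash_table[pos] = len(symtab) - 1

def lookup(name):
    i = hash_table[h(name)]
    while i is not None:                          # walk the chain
        if symtab[i][0] == name:
            return symtab[i][1]
        i = symtab[i][2]
    return None

insert("sum", "int"); insert("avg", "float"); insert("j", "int")
print(lookup("avg"))   # float
print(lookup("zzz"))   # None
```

Because new records are chained in front of the old bucket head, the most recent declaration of a name is found first, which is the behavior needed for nested scopes.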
Symbol table management
• Hash tables :
The hash function should produce a uniform distribution of names
in the symbol table.
• Hash tables:
  Hash table entries point into the symbol table; records whose
  names hash to the same slot (e.g. sum, j, avg) are chained
  through a hash-link field.

  Hash table    name   info   hash link
                sum
                j
                avg
                .      .
                .      .
Symbol table management
• Hash tables:
• The advantage of a hash table is that quick search is possible.
• The disadvantages are that hashing is complicated to implement,
some extra space is required, and obtaining the scope of variables
is difficult.
Symbol table management
• STORAGE ALLOCATION:
• Run-time storage is divided into:
• 1. Code area
• 2. Static data area
• 3. Stack area
• 4. Heap area
• There are 3 different storage allocation strategies based on this
division of run-time storage. The strategies are:
• 1. Static allocation: storage is laid out at compile time.
• 2. Stack allocation: a stack is used to manage run-time storage.
• 3. Heap allocation: a heap is used to manage dynamic memory
allocation.
Static Allocation
Activation Record
• Return value
• Actual parameters
• Control link (dynamic link)
• Access link (static link)
• Saved machine status
• Local variables
• Temporaries
• Temporary values: values needed during the evaluation of
expressions.
• Local variables: data local to the execution of the procedure is
stored in this field of the activation record.
• Saved machine status: the status of the machine just before the
procedure is called; this field contains the machine registers and
the program counter.
• Control link: this field is optional. It points to the activation record
of the calling procedure. This link is also called the dynamic link.
• Access link: this field is optional. It refers to non-local data in
other activation records. This field is also called the static link.
• Actual parameters: this contains information about the actual
parameters.
• Return value: this field is used to store the result of a function call.
Storage for variable-length data
[Diagram: the activation record for A stores pointers to arrays x
and y, whose actual storage is allocated beyond the record; the
activation record for B follows, with top_sp marking the stack top.]
• The contents of the stack, along with the base pointer and offset,
are as shown below.
[Diagram: activation record for procedure A – return value, dynamic
link, saved registers, parameters, and locals (a); base_ptr points
to the record, each local is addressed as base_ptr + offset, and
‘top’ marks the stack top for access to local data.]
Handling non-local data
• Block-structured languages use static scope (lexical scope).
• Non-block-structured languages use dynamic scope.
Access link
• By using pointers to each record. test
• These pointers are called access links. Access link
a:
a: a: i,b:
• Displays:
• It is expensive to traverse down access links every time a
particular non-local variable is accessed. Access to non-locals can
be sped up by maintaining an array of pointers called a display.
• In a display:
• An array of pointers to activation records is maintained.
• The array is indexed by nesting level.
• The pointers point only to accessible activation records.
• The display changes when a new activation occurs, and it must
be reset when control returns from the new activation.
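The display bookkeeping described above can be sketched as follows. This Python fragment is illustrative: activation records are plain dictionaries, and saving the old display entry inside the new record mirrors the reset-on-return rule.

```python
# Display maintenance: display[l] points to the most recent
# activation record at nesting level l. On entry, the old entry for
# that level is saved in the new record; on exit it is restored.

display = {}

def enter_proc(name, level):
    rec = {"name": name, "saved": display.get(level)}
    display[level] = rec          # this record is now visible at level
    return rec

def exit_proc(rec, level):
    display[level] = rec["saved"]  # reset on return

main = enter_proc("main", 1)
p    = enter_proc("p", 2)   # nested in main
q    = enter_proc("q", 2)   # p calls q (same nesting level): saves p
print(display[2]["name"])   # q
exit_proc(q, 2)
print(display[2]["name"])   # p
```

A non-local reference at nesting level l is then a single array lookup, display[l], followed by an offset into that record, instead of a walk down the chain of access links.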
Storage allocation for non-block-structured languages
• Dynamic scope:
• 1. Deep access: keep a stack of active variables and use control
links instead of access links. To find a variable, search the stack
from top to bottom looking for the most recent activation record
that contains space for the desired variable. This method of
accessing non-local variables is called deep access.
• In this method a symbol table is needed at run time.
• 2. Shallow access: keep a central storage area with one slot for
every variable name. If names are not created at run time, the
storage layout can be fixed at compile time; otherwise, when a new
activation of a procedure occurs, that procedure saves and restores
the storage entries for its locals at entry and exit.
Comparison of Deep and Shallow access