CD Module 4new full
------------------------------------------------------------------------------------------------------------------------
⮚ Semantic analysis is a phase of a compiler that adds semantic information to the parse
tree and performs checks based on that information, e.g. type information (type checking)
and the binding of variables and function names to their definitions (object binding).
⮚ It logically follows the parsing phase, in which the parse tree is generated, and logically
precedes the code generation phase, in which (intermediate/target) code is generated. (In
a compiler implementation, it may be possible to fold different phases into one pass.)
⮚ Sometimes also some early code optimization is done in this phase. For this phase the
compiler usually maintains symbol tables in which it stores what each symbol (variable
names, function names, etc.) refers to.
2. Type checking : The process of verifying and enforcing the constraints of types is called type checking.
⮚ This may occur either at compile-time (a static check) or at run-time (a dynamic check).
⮚ Static type checking is a primary task of the semantic analysis carried out by a compiler.
⮚ If type rules are enforced strongly (that is, generally allowing only those automatic type
conversions which do not lose information), the language is called strongly typed; if not, weakly
typed.
3. Uniqueness checking : Whether a variable name is unique or not in its scope.
4. Type coercion : If some kind of mixing of types is allowed. Done in languages which are not
strongly typed. This can be done dynamically as well as statically.
5. Name Checks : Check whether any variable has a name which is not allowed, e.g. a name
that is the same as a reserved keyword (such as int in Java).
⮚ A parser has its own limitations in catching program errors related to semantics.
⮚ Typical features of semantic analysis cannot be modeled using the context-free grammar
formalism.
⮚ If one tries to incorporate those features in the definition of a language, then that language
ceases to be context free.
⮚ The following is an example of what a compiler typically has to do beyond syntax
analysis:
⮚ An identifier x can be declared in two separate functions in the program, once of the type
int and then of the type char. Hence the same identifier will have to be bound to these two
different properties in the two different contexts.
Semantic Errors
Some of the semantic errors that the semantic analyzer is expected to
recognize:
● Type mismatch
● Undeclared variable
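Both error classes can be caught by a tree walk over the expressions. The sketch below is illustrative only (the node layout, the `check` function, and the symbol-table shape are assumptions, not part of any real compiler):

```python
# Minimal semantic-check sketch: detects undeclared variables and
# type mismatches in a tiny expression language. Node layout and
# helper names here are illustrative.

symtab = {"x": "int", "s": "string"}  # name -> declared type
errors = []

def check(node):
    """Return the type of an expression node, recording errors."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        name = node[1]
        if name not in symtab:
            errors.append(f"undeclared variable: {name}")
            return "error"
        return symtab[name]
    if kind == "+":                         # binary operator node
        lt, rt = check(node[1]), check(node[2])
        if "error" in (lt, rt):
            return "error"                  # error already reported below
        if lt != rt:
            errors.append(f"type mismatch: {lt} + {rt}")
            return "error"
        return lt

t1 = check(("+", ("id", "x"), ("num", 3)))   # int + int : ok
t2 = check(("+", ("id", "x"), ("id", "s")))  # int + string : mismatch
t3 = check(("+", ("id", "y"), ("num", 1)))   # y is undeclared
```

Returning a special "error" type stops one fault from triggering a cascade of follow-on messages.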
⮚ There are two ways to represent the semantic rules associated with grammar symbols.
⮚ The right part of a production in the CFG carries the semantic rule that specifies how the
grammar should be interpreted. For example, in E → E1 + T { E.val = E1.val + T.val },
the values of the nonterminals E1 and T are added together and the result is copied to the
nonterminal E.
⮚ Semantic attributes may be assigned their values from their domain at the time of
parsing and evaluated at the time of assignment or in conditions.
⮚ Based on the way the attributes get their values, they can be broadly divided into two
categories : synthesized attributes and inherited attributes
ATTRIBUTES: Synthesized and Inherited
1. Synthesized Attributes: These are those attributes which get their values from their
children nodes i.e. value of synthesized attribute at node is computed from the values of
attributes at children nodes in parse tree.
⮚ Write the SDD using appropriate semantic rules for each production in the given grammar.
⮚ The annotated parse tree is generated and attribute values are computed in bottom up
manner.
Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse
tree for the input string is shown below (figure: annotated parse tree rooted at S).
⮚ For computation of attributes we start from the leftmost bottom node. The rule F → digit is
used to reduce digit to F, and the value of digit is obtained from the lexical analyzer; it
becomes the value of F through the semantic action F.val = digit.lexval.
⮚ Hence F.val = 4, and since T is the parent node of F, we get T.val = 4 from the semantic
action T.val = F.val.
⮚ Then, for the production T → T1 * F, the corresponding semantic action is T.val = T1.val *
F.val, so T.val = 4 * 5 = 20.
⮚ Similarly, the combination E1.val + T.val becomes E.val, i.e. E.val = E1.val + T.val = 26.
Then the production S → E is applied, and the semantic action
associated with it prints the result E.val. Hence the output will be 26.
⮚ The parse tree containing the values of attributes at each node for the given input string is
shown above.
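The bottom-up computation just described can be mimicked with a post-order walk over the parse tree, where `val` plays the role of the synthesized attribute (the tuple-based node layout is illustrative):

```python
# Synthesized attributes: each node's val is computed from its
# children's val fields, i.e. a bottom-up (post-order) evaluation.

def val(node):
    if node[0] == "digit":                # F -> digit : F.val = digit.lexval
        return node[1]
    op, left, right = node
    l, r = val(left), val(right)          # evaluate children first
    return l * r if op == "*" else l + r  # T.val = T1.val * F.val, etc.

# parse tree for 4 * 5 + 6
tree = ("+", ("*", ("digit", 4), ("digit", 5)), ("digit", 6))
result = val(tree)   # 4 * 5 = 20, then 20 + 6 = 26
```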
2. Inherited Attributes: These are attributes which get their values from their parent and/or
sibling nodes in the parse tree.
EXAMPLE: In a production A → B C D,
B can get values from A, C and D. C can take values from A, B, and D. Likewise, D can take
values from A, B, and C.
⮚ The annotated parse tree is generated and attribute values are computed in top down
manner.
Consider the following grammar:
D → T L
T → int
T → float
T → double
L → L1 , id
L → id
The semantic rule for D → T L is L.in = T.type, i.e. the inherited attribute L.in receives the
type recorded in T.type.
Let us assume an input string int a, c for computing inherited attributes. The annotated
parse tree for the input string is shown below.
The value of L nodes is obtained from T.type (sibling) which is basically lexical value
obtained as int, float or double.
Then L node gives type of identifiers a and c. The computation of type is done in top
down manner or preorder traversal.
Using function Enter_type the type of identifiers a and c is inserted in symbol table at
corresponding id.entry.
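The top-down propagation of L.in from T.type can be sketched as follows; `enter_type` stands in for the Enter_type routine, and the dict-based symbol table is an illustrative assumption:

```python
# Inherited attributes: L.in is passed down from T.type to each id,
# mirroring D -> T L { L.in = T.type }. The symbol-table layout and
# helper names are illustrative.

symbol_table = {}

def enter_type(name, typ):
    """Plays the role of Enter_type: insert the type at id.entry."""
    symbol_table[name] = typ

def decl(t_type, ids):
    """D -> T L: propagate the inherited attribute top-down."""
    l_in = t_type                  # L.in = T.type
    for name in ids:               # L -> L1 , id  (preorder traversal)
        enter_type(name, l_in)

decl("int", ["a", "c"])            # input string: int a, c
```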
Dependency Graphs are the most general technique used to evaluate syntax directed
definitions with both synthesized and inherited attributes.
Annotated parse tree shows the values of attributes, dependency graph helps to determine
how those values are computed
The interdependencies among the attributes of the various nodes of a parse-tree can be
depicted by a directed graph called a dependency graph.
● If attribute b depends on an attribute c, there is a link from the node for c to the node for b.
● Dependency Rule: If an attribute b depends on an attribute c, then we need to evaluate the
semantic rule for c first and then the semantic rule for b.
TYPES OF SDT’S
1. S –attributed definition
2. L –attributed definition
S-attributed definition
⮚ S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent nodes
depend upon the values of the child nodes.
L –attributed definition
L stands for the left-to-right order of evaluation (one pass from left to right).
i.e., if an SDT uses both synthesized attributes and inherited attributes with the restriction
that an inherited attribute can inherit values from the parent and left siblings only, it is called
an L-attributed SDT.
EXAMPLE:
A → B C D { B.a = A.a, C.a = B.a }
C.a = D.a is not possible, because C cannot get an attribute value from its right side (right
sibling).
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right parsing
manner.
Semantic actions may be placed anywhere in the RHS:
A → { } B C
A → B { } C
A → B C { }
Note: (Also write SDD for declaration statement as example)
If an attribute is S attributed , it is also L attributed.
Evaluation of L-attributed SDD
SDD for desk calculator/SDD for evaluation of expressions
Evaluate the expression 3*5+4n using the above SDD both in bottom up and top
down approach
Solution: Bottom up evaluation for this expression is shown below
In both case first we need to draw the parse tree.
Then traverse from top to bottom, left to right.
In the bottom-up approach, whenever there is a reduction, go to the production and carry out
the action.
Traversal: the tree is walked depth first, left to right, calling visit(m) on each child m of a node.
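The depth-first, left-to-right evaluation can be rendered as a recursive visit procedure. In this sketch the attributes happen to be purely synthesized, so the inherited-attribute step is only marked as a comment; the dict-based node layout is an assumption for illustration:

```python
# Depth-first, left-to-right evaluation: a child's inherited
# attributes are computed before visiting it, and a node's
# synthesized attributes after all children have been visited.

def visit(node):
    for child in node.get("children", []):
        # 1. evaluate child's inherited attributes here
        #    (none in this purely synthesized example)
        visit(child)               # 2. recurse, left to right
    # 3. evaluate this node's synthesized attributes
    if "lexval" in node:
        node["val"] = node["lexval"]
    elif node["op"] == "*":
        l, r = node["children"]
        node["val"] = l["val"] * r["val"]
    elif node["op"] == "+":
        l, r = node["children"]
        node["val"] = l["val"] + r["val"]

# expression 3 * 5 + 4
root = {"op": "+", "children": [
    {"op": "*", "children": [{"lexval": 3}, {"lexval": 5}]},
    {"lexval": 4},
]}
visit(root)   # root["val"] holds the result of 3 * 5 + 4
```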
A translation needs to relate the static source text of a program to the dynamic actions that must
occur at runtime to implement the program. The program consists of names for procedures,
identifiers etc., that require mapping with the actual memory location at runtime.
Runtime environment is a state of the target machine, which may include software libraries,
environment variables, etc., to provide services to the processes running in the system.
A procedure definition is a declaration that associates an identifier with a statement. The identifier is
procedure name, and statement is the procedure body.
When a procedure name appears within an executable statement, the procedure is said to be
called at that point.
Activation Tree
Each execution of procedure is referred to as an activation of the procedure. Lifetime of
an activation is the sequence of steps present in the execution of the procedure.
If ‘a’ and ‘b’ are two procedures, then their activations will be either non-overlapping (when
one is called after the other) or nested (nested procedures).
A procedure is recursive if a new activation begins before an earlier activation of the
same procedure has ended. An activation tree shows the way control enters and leaves
activations.
Properties of activation trees are :-
❖ The node for procedure ‘x’ is the parent of node for procedure ‘y’ if and only if
the control flows from procedure x to procedure y.
EXAMPLE
Consider the following program of quicksort
main()
{
    readarray(); quicksort(1,10);
}
quicksort(int m, int n)
{
    if (n > m) {                  /* guard needed to terminate the recursion */
        int i = partition(m,n);
        quicksort(m,i-1);
        quicksort(i+1,n);
    }
}
First, the main function is the root; then main calls readarray and quicksort.
Quicksort in turn calls partition and quicksort again. The flow of control in a program corresponds
to a depth-first traversal of the activation tree, which starts at the root.
Control Stack
Control stack or runtime stack is used to keep track of the live procedure activations
i.e the procedures whose execution have not been completed.
A procedure name is pushed on to the stack when it is called (activation begins) and it is
popped when it returns (activation ends).
Information needed by a single execution of a procedure is managed using an activation
record.
When a procedure is called, an activation record is pushed onto the stack, and as soon as
control returns to the caller, its activation record is popped.
Thus the contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of the control stack, the stack contains the nodes along the path
from n to the root.
Consider the above activation tree: when quicksort(4,4) gets executed, the contents of the
control stack are main(), quicksort(1,10), quicksort(1,4), quicksort(4,4).
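The push/pop discipline can be simulated with a Python list. The call order below follows the quicksort example; the actual indices produced by partition depend on the data, so this particular sequence is an assumption taken from the example above:

```python
# Control stack simulation: a procedure name is pushed when its
# activation begins and popped when the activation ends.

stack = []
snapshot = None

def enter(name):
    stack.append(name)     # activation begins

def leave():
    stack.pop()            # activation ends

enter("main()")
enter("readarray()"); leave()
enter("quicksort(1,10)")
enter("quicksort(1,4)")
enter("quicksort(4,4)")
snapshot = list(stack)     # stack contents while quicksort(4,4) is live
leave(); leave(); leave(); leave()
```

The snapshot is exactly the path from quicksort(4,4) to the root of the activation tree.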
A declaration may be implicit or explicit. The portion of the program to which a declaration
applies is called the scope of that declaration.
Binding Of Names
Even if each name is declared once in a program, the same name may denote different
data objects at run time. A “data object” corresponds to a storage location that holds values.
The term environment refers to a function that maps a name to a storage location. The
term state refers to a function that maps a storage location to the value held there.
The executing target program runs in its own logical address space, in which each
program value has a location.
The management and organization of this logical address space is shared between the
compiler, operating system and target machine. The operating system maps the logical
address into physical addresses, which are usually spread through memory.
Typical subdivision of run time memory.
Code area: used to store the generated executable instructions, memory locations for the
code are determined at compile time
Static Data Area: holds data whose size and location can be determined at compile time
Stack Area: Used to store the data object allocated at runtime. eg. Activation records
Heap: Used to store other dynamically allocated data objects at runtime (for example, memory obtained by malloc)
This runtime storage can be subdivided to hold the different components of an executing
program:
1. Generated executable code
2. Static data objects
3. Dynamic data objects-heap
4. Automatic data objects-stack
Activation Records
It is a LIFO structure used to hold information about each instantiation of a procedure.
Procedure calls and returns are usually managed by a run time stack called control stack.
Each live activation has an activation record on the control stack, with the root of the
activation tree at the bottom; the most recent activation has its record at the top of the stack.
The contents of the activation record vary with the language being implemented.
The purpose of the fields of an activation record is as follows, starting from the field for
temporaries.
1. Temporary values, such as those arising in the evaluation of expressions, are stored in the
field for temporaries.
2. The field for local data holds data that is local to an execution of a procedure.
3. The field for saved machine status holds information about the state of the machine just
before the procedure is called. This information includes the values of the program
counter and machine registers that have to be restored when control returns from the
procedure.
4. The optional access link is used to refer to nonlocal data held in other activation records.
5. The optional control link points to the activation record of the caller.
6. The field for actual parameters is used by the calling procedure to supply parameters to
the called procedure.
7. The field for the returned value is used by the called procedure to return a value to the
calling procedure. Again, in practice this value is often returned in a register for greater
efficiency.
A typical activation record layout (top to bottom):
Actual parameters
Returned value
Control link
Access link
Saved machine status
Local data
Temporaries
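As a sketch, the record can be modeled as a small data class whose fields mirror the list above; all field names and the linked-record layout are illustrative, not a fixed format:

```python
# Activation record sketch: one record per live activation, with
# the fields described above. Field names are illustrative.
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    actual_params: list = field(default_factory=list)
    returned_value: Any = None
    control_link: Optional["ActivationRecord"] = None  # caller's record
    access_link: Optional["ActivationRecord"] = None   # nonlocal data
    saved_machine_status: dict = field(default_factory=dict)
    local_data: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)

# main() calls quicksort(1, 10): the callee's control link
# points back to the caller's activation record.
main_ar = ActivationRecord()
qs_ar = ActivationRecord(actual_params=[1, 10], control_link=main_ar)
```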
Heap allocation - allocates and deallocates storage as needed at run time from a data area known
as heap.
Stack Allocation
All compilers for languages that use procedures, functions or methods as units of user-
defined actions manage at least part of their run-time memory as a run-time
stack.
Each time a procedure is called, space for its local variables is pushed onto the stack, and
when the procedure terminates, that space is popped off the stack.
Heap Allocation
Heap allocation parcels out pieces of contiguous storage, as needed for activation records
or other objects.
Pieces may be deallocated in any order, so over time the heap will consist of alternating
areas that are free and in use.
Records for live activations need not be adjacent in heap
The record for an activation of procedure r is retained when the activation ends.
Therefore, the record for the new activation q(1 , 9) cannot follow that for s physically.
If the retained activation record for r is deallocated, there will be free space in the heap
between the activation records for s and q.
INTERMEDIATE CODE GENERATION (ICG)
In compiler, the front-end translates a source program into an intermediate representation from
which the back end generates target code.
Need For ICG
1. If a compiler translates the source language to its target machine language without
generating IC, then for each new machine, a full native compiler is required.
2. IC eliminates the need for a new full compiler for every machine by keeping the analysis
portion the same for all compilers.
3. Only the synthesis part of the back end depends on the target machine.
Two important properties of an intermediate representation:
⮚ It should be easy to produce from the source program.
⮚ It shouldn’t be difficult to produce the target program from the intermediate code.
A source program can be translated directly into the target language, but the benefits of using
an intermediate form are retargetability and the opportunity for machine-independent code
optimization.
INTERMEDIATE LANGUAGES
⮚ Syntax Tree
⮚ Postfix Notation
⮚ 3 Address Code
GRAPHICAL REPRESENTATION
Includes both
Syntax Tree
DAG (Directed Acyclic Graph)
Syntax Tree Or Abstract Syntax Tree(AST)
Graphical Intermediate Representation
Syntax Tree depicts the hierarchical structure of a source program.
Syntax tree (AST) is a condensed form of parse tree useful for representing language
constructs.
EXAMPLE
Parse tree and syntax tree for 3 * 5 + 4 as follows.
Grammar:
E → E + T | E - T | T
T → T * F | F
F → digit
(figure: the parse tree derives 3 * 5 + 4 production by production, while the syntax tree is the
condensed form with + at the root, a * node over the leaves 3 and 5, and the leaf 4)
Parse Tree VS Syntax Tree
Parse Tree                                        Syntax Tree
A graphical representation of the                 A syntax tree (AST) is a condensed form
replacement process in a derivation               of the parse tree
Each interior node represents a grammar rule      Each interior node represents an operator
Each leaf node represents a terminal              Each leaf node represents an operand
Represents every detail of the real syntax        Does not represent every detail of the real
                                                  syntax (e.g. no parentheses)
Each node in a syntax tree can be implemented as a record with several fields.
In the node of an operator, one field contains the operator and the remaining fields contain
pointers to the nodes for the operands.
When used for translation, the nodes in a syntax tree may contain additional fields to hold
the values of attributes attached to the node.
Following functions are used to create syntax tree
1. mknode(op,left,right): creates an operator node with label op and two fields
containing pointers to left and right.
2. mkleaf(id,entry): creates an identifier node with label id and a field containing
entry, a pointer to the symbol table entry for identifier
3. mkleaf(num,val): creates a number node with label num and a field containing val,
the value of the number.
Such functions return a pointer to a newly created node.
EXAMPLE
a - 4 + c
The tree is constructed bottom-up:
P1 = mkleaf(id,entry a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id,entry c)
P5 = mknode(+, P3, P4)
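The three functions can be sketched directly in Python; each returns a dict standing in for a newly created node record, and the string "a"/"c" entries stand in for symbol-table pointers (both are illustrative assumptions):

```python
# mknode / mkleaf sketch: each function returns a newly created
# node; the syntax tree for a - 4 + c is then built bottom-up
# exactly as in the sequence P1 .. P5 above.

def mknode(op, left, right):
    """Operator node with label op and pointers to two children."""
    return {"op": op, "left": left, "right": right}

def mkleaf_id(entry):
    """Identifier node; entry stands in for a symbol-table pointer."""
    return {"label": "id", "entry": entry}

def mkleaf_num(val):
    """Number node holding the value of the number."""
    return {"label": "num", "val": val}

p1 = mkleaf_id("a")
p2 = mkleaf_num(4)
p3 = mknode("-", p1, p2)
p4 = mkleaf_id("c")
p5 = mknode("+", p3, p4)   # root of the tree for a - 4 + c
```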
Syntax trees for assignment statements are produced by the syntax-directed definition.
Non terminal S generates an assignment statement.
The two binary operators + and * are representative of the full operator set in a typical
language. Operator associativities and precedences are the usual ones, even though they
have not been put into the grammar. This definition constructs the tree for the input
a := b * -c + b * -c.
The token id has an attribute place that points to the symbol-table entry for the identifier.
A symbol-table entry can be found from an attribute id.name, representing the lexeme
associated with that occurrence of id.
If the lexical analyser holds all lexemes in a single array of characters, then attribute name
might be the index of the first character of the lexeme.
Two representations of the syntax tree are as follows.
In (a), each node is represented as a record with a field for its operator and additional fields
for pointers to its children.
In Fig (b), nodes are allocated from an array of records and the index or position of the
node serves as the pointer to the node.
All the nodes in the syntax tree can be visited by following pointers, starting from the
root at position 10.
EXAMPLE
a=b*-c + b*-c
Postfix Notation
Linearized representation of syntax tree
In postfix notation, each operator appears immediately after its last operand.
Operators can be evaluated in the order in which they appear in the string
EXAMPLE
Source String : a := b * -c + b * -c
Postfix String: a b c uminus * b c uminus * + assign
Postfix Rules
1. If E is a variable or constant, then the postfix notation for E is E itself.
2. If E is an expression of the form E1 op E2, then the postfix notation for E is E1’ E2’ op, where E1’ and
E2’ are the postfix notations for E1 and E2, respectively.
3. If E is an expression of the form (E1), then the postfix notation for E is the same as the postfix
notation for E1.
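These rules amount to a post-order traversal of the syntax tree: emit the operands' postfix first, then the operator. A sketch for the example string (the tuple node layout and the uminus marker for unary minus are assumptions):

```python
# Postfix notation = post-order traversal: emit each operand's
# postfix first, then the operator (rule 2 above).

def postfix(node):
    if isinstance(node, str):        # rule 1: a name is its own postfix
        return [node]
    op, *operands = node
    out = []
    for operand in operands:         # works for unary and binary ops
        out += postfix(operand)
    return out + [op]

# a := b * -c + b * -c
tree = ("assign", "a",
        ("+", ("*", "b", ("uminus", "c")),
              ("*", "b", ("uminus", "c"))))
tokens = postfix(tree)
```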
THREE ADDRESS CODE
A three-address statement has the general form
a = b op c
where,
a, b, c are the operands that can be names, constants or compiler generated temporaries.
op represents operator, such as fixed or floating point arithmetic operator or a logical operator on
Boolean valued data. Thus a source language expression like x + y * z might be translated into a
sequence
t1 := y*z
t2 := x+t1
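The translation of x + y * z into that two-statement sequence can be sketched as a recursive walk that invents a fresh temporary for each interior node (the `newtemp`/`gen` names are illustrative):

```python
# Three-address code sketch: each interior node of the expression
# tree gets a fresh temporary; x + y * z yields the two statements
# shown above.

code = []
counter = 0

def newtemp():
    """Return a fresh temporary name t1, t2, ..."""
    global counter
    counter += 1
    return f"t{counter}"

def gen(node):
    """Return the address holding the node's value, emitting code."""
    if isinstance(node, str):
        return node                      # a name is its own address
    op, left, right = node
    l, r = gen(left), gen(right)         # inner operands first
    t = newtemp()
    code.append(f"{t} := {l} {op} {r}")
    return t

gen(("+", "x", ("*", "y", "z")))
```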
1. Assignment statements
if x relop y goto L This instruction applies a relational operator (<, =, >=, etc.) to x and y, and
executes the statement with label L next if x stands in relation relop to y. If not, the three-
address statement following if x relop y goto L is executed next, as in the usual sequence.
param x and call p, n are used for procedure calls, and return y, where y (representing a
returned value) is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
……….
param xn
call p, n
generated as part of the call of the procedure p(x1, x2, . . . , xn). The integer n indicating
the number of actual parameters in "call p, n" is not redundant because calls can be
nested.
7. Indexed Assignments
When three-address code is generated, temporary names are made up for the interior
nodes of a syntax tree. For example, the translation of id := E consists of code to evaluate E
into some temporary t, followed by the assignment id.place := t.
Given input a := b * -c + b * -c, it produces the three address code given above (page
no: ). The synthesized attribute S.code represents the three address code for the
assignment S. The nonterminal E has two attributes:
Flow of control statements can be added to the language of assignments. The code for S →
while E do S1 is generated using new attributes S.begin and S.after to mark the first
statement in the code for E and the statement following the code for S, respectively.
The function newlabel returns a new label every time it is called. We assume that a
nonzero expression represents true; that is, when the value of E becomes zero, control
leaves the while statement.
Implementation Of Three-Address Statements
⮚ Quadruples
⮚ Triples
⮚ Indirect triples
QUADRUPLES
A quadruple is a record structure with four fields, which are op, arg1, arg2 and result.
The op field contains an internal code for the operator. The three address statement
x:= y op z is represented by placing y in arg1, z in arg2 and x in result.
The contents of arg1, arg2, and result are normally pointers to the symbol table entries
for the names represented by these fields. If so, temporary names must be entered into the
symbol table as they are created.
EXAMPLE 1
For the expression a + b * c / e ^ f + b * a, the three address code is
t1 = e ^ f
t2 = b * c
t3 = t2 / t1
t4 = b * a
t5 = a + t3
t6 = t5 + t4

Location  OP  arg1  arg2  Result
(0)       ^   e     f     t1
(1)       *   b     c     t2
(2)       /   t2    t1    t3
(3)       *   b     a     t4
(4)       +   a     t3    t5
(5)       +   t5    t4    t6
Exceptions
⇨ A statement like param t1 is represented by placing param in the operator field and t1 in the
arg1 field. Neither the arg2 nor the result field is used.
⇨ Unconditional & Conditional jump statements are represented by placing the target in the result
field.
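The quadruple layout can be sketched as a list of 4-tuples, with a tiny interpreter to show the fields in use. The plain names stand in for symbol-table pointers, and the input values for b and c are assumptions for the demonstration:

```python
# Quadruple sketch: (op, arg1, arg2, result) for a := b * -c + b * -c.
# Unused fields are None, as in the param / jump exceptions above.

quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    (":=",     "t5", None, "a"),
]

env = {"b": 3, "c": 2}              # assumed input values
for op, arg1, arg2, result in quads:
    if op == "uminus":
        env[result] = -env[arg1]
    elif op == "*":
        env[result] = env[arg1] * env[arg2]
    elif op == "+":
        env[result] = env[arg1] + env[arg2]
    elif op == ":=":
        env[result] = env[arg1]
# a = b * -c + b * -c = 3 * -2 + 3 * -2
```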
TRIPLES
In the triples representation, the use of temporary variables is avoided; instead, references to
instructions are made.
So three address statements can be represented by records with only three fields: OP, arg1
and arg2.
Since three fields are used, this intermediate code format is known as triples.
Advantages: temporary names need not be entered into the symbol table, which saves space.
EXAMPLE 1
a + b * c / e ^ f + b * a
t1 = e ^ f
t2 = b * c
t3 = t2 / t1
t4 = b * a
t5 = a + t3
t6 = t5 + t4

Location  OP  arg1  arg2
(0)       ^   e     f
(1)       *   b     c
(2)       /   (1)   (0)
(3)       *   b     a
(4)       +   a     (2)
(5)       +   (4)   (3)
EXAMPLE 2
A ternary operation like x[i] := y requires two entries in the triple structure, while x := y[i] is
naturally represented as two operations.

x[i] := y                      x := y[i]
(0)  []=     x    i            (0)  =[]     y    i
(1)  assign  (0)  y            (1)  assign  x    (0)
INDIRECT TRIPLES
In indirect triples, a separate list of pointers to triples is maintained; a program is then a list
of pointers to instructions, which can be reordered without changing the triples themselves.
The triples for the expression above are:

Location  OP  arg1  arg2
(0)       ^   e     f
(1)       *   b     c
(2)       /   (1)   (0)
(3)       *   b     a
(4)       +   a     (2)
(5)       +   (4)   (3)
Comparison
When we ultimately produce the target code, each temporary and programmer-defined name
will be assigned a runtime memory location.
This location will be entered into symbol table entry of that data.
Using the quadruple notation, a three address statement containing a temporary can
immediately access the location for that temporary via symbol table.
But this is not possible with triples notation.
With quadruple notation, statements can often move around which makes optimization
easier.
This is achieved because, using quadruple notation, the symbol table interposes a high degree of
indirection between the computation of a value and its use.
With quadruple notation, if we move a statement computing x, the statement using x
requires no change.
But with triples, moving a statement that defines a temporary value requires us to change
all references to that statement in the arg1 and arg2 arrays. This makes triples difficult to use in
an optimizing compiler.
With indirect triples also, there is no such problem.
A statement can be moved by reordering the statement list.
Space Utilization
Quadruples and indirect triples require about the same amount of space for storage (in the
normal case).
But if the same temporary value is used more than once, indirect triples can save some space,
because two or more entries in the statement array can point to the same line of the
op-arg1-arg2 structure.
Triples require less space for storage compared to the above two.
PROBLEM 1
Write the quadruples, triples and indirect triples for a := b * -c + b * -c.
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
QUADRUPLES
Location  OP      arg1  arg2  Result
(0)       uminus  c           t1
(1)       *       b     t1    t2
(2)       uminus  c           t3
(3)       *       b     t3    t4
(4)       +       t2    t4    t5
(5)       =       t5          a
TRIPLES
Location  OP      arg1  arg2
(1)       uminus  c
(2)       *       b     (1)
(3)       uminus  c
(4)       *       b     (3)
(5)       +       (2)   (4)
(6)       =       a     (5)
INDIRECT TRIPLES
Statements
35  (1)
36  (2)
37  (3)
38  (4)
39  (5)
40  (6)

Location  OP      arg1  arg2
(1)       uminus  c
(2)       *       b     (1)
(3)       uminus  c
(4)       *       b     (3)
(5)       +       (2)   (4)
(6)       =       a     (5)
********