
VMKV ENGINEERING COLLEGE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

PRINCIPLES OF COMPILER DESIGN


Unit I

PART-A

1. What is a compiler?
A program that converts another program from some source
language (or programming language) to machine language
(object code).

2. Name the different phases of a compiler?


Lexical analysis phase or scanning phase
Syntax analysis phase
Intermediate code generation
Code optimization
Code generation.

3. What is a symbol table?


A symbol table is a data structure that contains a record for every variable and temporary in the program, together with any information needed to reference it or to allocate storage for it.

4. What is a token and give examples?


A token is a basic, grammatically indivisible unit of a language, such as a keyword (if, while), an operator (+, <=) or an identifier (count).

5. What is lexeme? Give an example?


A lexeme is a sequence of characters in the source program that matches the pattern for some token, e.g. int, i, num, ans, choice.

6. Write short note on error handler?


Each phase can encounter errors. After detecting an error, a phase must deal with it so that compilation can proceed and further errors can be detected. The lexical analyzer reports errors where the characters cannot form any token. The syntax analyzer reports errors where the token stream violates the structure rules of the language. The semantic phase tries to detect constructs that have the right syntactic structure but no meaning for the operations involved.

7. What is 3-address code and give example?


Three-address code is an intermediate representation in which each statement contains at most one operator besides the assignment, so a statement has the general form x := y op z. For example, a := b + c * d becomes:
t1 := c * d
t2 := b + t1
a := t2
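As a sketch of how such code is produced, the following Python fragment (our own illustration, not part of the syllabus material) generates three-address statements for the expression a := b * -c + b * -c used later in these notes:

```python
from itertools import count

def gen_tac(node, code, temps):
    """Return the address holding node's value, appending TAC lines to code."""
    if isinstance(node, str):            # identifier: its name is its address
        return node
    op, *args = node                     # e.g. ('*', left, right) or ('uminus', e)
    addrs = [gen_tac(a, code, temps) for a in args]
    t = f"t{next(temps)}"                # fresh temporary name
    if op == "uminus":
        code.append(f"{t} := - {addrs[0]}")
    else:
        code.append(f"{t} := {addrs[0]} {op} {addrs[1]}")
    return t

# a := b * -c + b * -c, written as a nested-tuple AST
ast = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
code = []
result = gen_tac(ast, code, count(1))
code.append(f"a := {result}")
print("\n".join(code))
```

Running it prints five temporaries t1..t5 followed by the final assignment, in the style of three-address code used throughout Unit III.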

8. Define translator
A translator is a program that converts a program written in one language into an equivalent program in another language, for example from a high-level language to a low-level language.

9. Differentiate star closure & positive closure?

Star closure (Kleene closure): L* denotes zero or more occurrences of strings from L, so it always contains the empty string ε. E.g. for L = {a, b}: L* = {ε, a, b, aa, ab, ba, bb, ...}.

Positive closure: L+ denotes one or more occurrences of strings from L, so it does not contain ε (unless ε is in L). L+ = L L*.
10. Define syntax tree

position = initial + rate * 60

The syntax tree for the above statement is:

          =
         / \
  position   +
            / \
     initial   *
              / \
          rate   60

The tree is built during syntax analysis, also called parsing: in this phase the tokens generated by the lexical analyzer are grouped to form a hierarchical structure.

11.What are the issues of the lexical analyzer?


Lexical analysis is the first phase of a compiler. Lexical analysis, also called scanning, scans a source program from left to right character by character and groups the characters into tokens having a collective meaning. Each token or basic syntactic element represents a logically cohesive sequence of characters such as an identifier (also called a variable), a keyword (if, then, else, etc.), or a multi-character operator such as <=. The output of this phase goes to the next phase, i.e., syntax analysis or parsing.
The second task performed during lexical analysis is to make an entry for each token in the symbol table if it is not already there.
Some other tasks performed during lexical analysis are:
- To remove comments, tabs, blank spaces and machine characters.
- To produce error messages (also called diagnostics) for errors in the source program.
12. What is a regular expression?
We use regular expressions to describe tokens of a
programming language.
A regular expression is built up of simpler regular expressions
(using defining rules).
Each regular expression denotes a language.
A language denoted by a regular expression is called as a
regular set.

13. Define parse tree


A parse tree is a graphical representation of a derivation: the root is labeled by the start symbol, each interior node by a non-terminal, and the leaves, read from left to right, form the input string. Parse trees are produced during syntax analysis, the second phase of compilation, also called parsing, which:
1. Obtains a stream of tokens from the lexical analyzer.
2. Determines whether the string of tokens can be generated by the grammar of the language, i.e. checks whether the input is syntactically correct.
3. Reports syntax error(s), if any.

14. What are the two parts of compilation process?


Analysis and synthesis are the two parts of compilation. The analysis part is carried out in three sub-parts: lexical analysis, syntax analysis and semantic analysis.

15. List out the compiler construction tools?


1. Data-flow engines
2. Parser generators
3. Syntax-directed translation engines
4. Automatic code generators
5. Scanner generators
16.What is an assembly code?
Some compilers produce assembly code, which is passed to an assembler for further processing. Others produce relocatable machine code that can be passed to a loader and link editor. Assembly code is a mnemonic version of machine code, in which names are used instead of binary codes for operations, and names are also given to memory addresses.
17.What are the classifications of a compiler?
Compilers can be classified as single-pass, multi-pass, load-and-go, debugging, or optimizing, depending on how they have been constructed or on what function they are supposed to perform.
18.What are the front and back ends of a compiler?
Front end
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
Back end
1. Code optimizer
2. Code generator.

19.What is meant by lexical analysis?


Lexical analysis is the first phase of a compiler. Lexical analysis, also called scanning, scans a source program from left to right character by character and groups the characters into tokens having a collective meaning. Each token or basic syntactic element represents a logically cohesive sequence of characters such as an identifier (also called a variable), a keyword (if, then, else, etc.), or a multi-character operator such as <=. The output of this phase goes to the next phase, i.e., syntax analysis or parsing.
20.What are the possible error recovery actions in lexical
analyzer?
1. Panic mode: delete successive characters from the remaining input until a well-formed token can be found.
2. Delete one extraneous character.
3. Insert a missing character.
4. Replace an incorrect character by a correct one.
5. Transpose two adjacent characters.

1. Explain specification of tokens in detail.

Token specification
Alphabet:

• a finite set of symbols (e.g. the ASCII characters)

String:

• a finite sequence of symbols from an alphabet
• the terms sentence and word are also used in place of string
• ε is the empty string
• |s| is the length of string s

Language:

• a set of strings over some fixed alphabet
• ∅, the empty set, is a language
• {ε}, the set containing only the empty string, is a language
• The set of well-formed C programs is a language
• The set of all possible identifiers is a language
Operations on Strings:

• Concatenation: xy represents the concatenation of strings x and y
• s ε = s and ε s = s (ε is the identity for concatenation)
• (Exponentiation) s^n = s s ... s (n times); s^0 = ε

Parts of a string:

• prefix of s: a string obtained by removing zero or more trailing symbols of the string s; e.g. Com is a prefix of Computer
• suffix of s: a string obtained by removing zero or more leading symbols of the string s; e.g. puter is a suffix of Computer
• substring of s: a string obtained by deleting a prefix and a suffix from the string s; e.g. put in Computer
• proper prefix, suffix or substring of s: any non-empty string x that is, respectively, a prefix, suffix or substring of s such that s ≠ x

Operations on Languages

• Concatenation:
o L1L2 = { s1s2 | s1 ∈ L1 and s2 ∈ L2 }
• Union:
o L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
• Exponentiation:
o L^0 = {ε}, L^1 = L, L^2 = LL
• Kleene closure:
o L* = zero or more occurrences
• Positive closure:
o L+ = one or more occurrences

Example

• L1 = {a,b,c,d}   L2 = {1,2}
• L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
• L1 ∪ L2 = {a,b,c,d,1,2}
• L1^3 = all strings of length three (using a,b,c,d)
• L1* = all strings using the letters a,b,c,d, including the empty string
• L1+ = the same, but not including the empty string
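The language operations above can be tried out directly; the sketch below (our own Python code, not from the notes) models finite languages as sets of strings and mirrors the example:

```python
# Finite languages as Python sets of strings.
L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

def concat(A, B):
    """Concatenation: every string of A followed by every string of B."""
    return {x + y for x in A for y in B}

def power(A, n):
    """Exponentiation: A^0 = {""} (just the empty string), A^n = A . A^(n-1)."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result

def star_upto(A, k):
    """Kleene closure truncated at exponent k: A^0 U A^1 U ... U A^k."""
    out = set()
    for n in range(k + 1):
        out |= power(A, n)
    return out

print(sorted(concat(L1, L2)))      # a1 a2 b1 b2 c1 c2 d1 d2
print(sorted(L1 | L2))             # union of the two sets
print(len(power(L1, 3)))           # 64 strings of length three
print("" in star_upto(L1, 2))      # True: the star always contains ""
```

The positive closure simply drops the empty string: `star_upto(L1, k) - {""}`.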

Regular Expressions

 We use regular expressions to describe tokens of a programming


language.
 A regular expression is built up of simpler regular expressions
(using defining rules)
 Each regular expression denotes a language.
 A language denoted by a regular expression is called as a regular
set.
Regular Expressions (Rules)

• Regular expressions over an alphabet Σ:

Reg. Expr        Language it denotes

ε                {ε}
a ∈ Σ            {a}
(r1) | (r2)      L(r1) ∪ L(r2)
(r1) (r2)        L(r1) L(r2)
(r)*             (L(r))*
(r)              L(r)

• (r)+ = (r)(r)*
• (r)? = (r) | ε

• We may remove parentheses by using precedence rules:
o * highest
o concatenation next
o | lowest
• ab*|c means (a(b)*)|(c)
Examples:

• Σ = {0,1}
• 0|1 => {0,1}
• (0|1)(0|1) => {00,01,10,11}
• 0* => {ε, 0, 00, 000, 0000, ....}
• (0|1)* => all strings of 0s and 1s, including the empty string

Regular Definitions

• Writing the regular expression for some languages can be difficult, because their regular expressions can be quite complex. In those cases, we may use regular definitions.
• We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
• A regular definition is a sequence of definitions of the form:

d1 → r1
d2 → r2
...
dn → rn

where each di is a distinct name, and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1}, i.e. over the basic symbols and the previously defined names.

Ex: Identifiers in Pascal

• letter → A | B | ... | Z | a | b | ... | z
• digit → 0 | 1 | ... | 9
• id → letter (letter | digit)*

• If we try to write the regular expression representing identifiers without using regular definitions, that regular expression will be complex:
o (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*

Ex: Unsigned numbers in Pascal

• digit → 0 | 1 | ... | 9
• digits → digit+
• opt-fraction → ( . digits )?
• opt-exponent → ( E (+|-)? digits )?
• unsigned-num → digits opt-fraction opt-exponent
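The regular definition above can be checked with Python's re module; in the sketch below (the variable names mirror the definition, and the test strings are our own) the unsigned-num pattern is built piece by piece:

```python
import re

# Each named sub-pattern corresponds to one line of the regular definition.
digit        = r"[0-9]"
digits       = digit + r"+"
opt_fraction = r"(\." + digits + r")?"
opt_exponent = r"(E[+-]?" + digits + r")?"
unsigned_num = digits + opt_fraction + opt_exponent

pat = re.compile(r"^" + unsigned_num + r"$")

for s in ["5280", "39.37", "1.894E-4", "6.336E4", "1.", "E4"]:
    print(s, bool(pat.match(s)))
```

"1." and "E4" are rejected: the fraction requires digits after the dot, and the exponent part alone is not a number.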

1. Briefly explain about operator precedence parsing with example

Ambiguity – Operator Precedence
Ambiguous grammars (ambiguous because of their operators) can be disambiguated according to precedence and associativity rules.

E  E+E | E*E | E^E | id | (E)


disambiguate the grammar

precedence: ^ (right to left)


* (left to right)
+ (left to right)

E  E+T | T

T  T*F | F

F  G^F | G

G  id | (E)

Left Recursion
• A grammar is left-recursive if it has a non-terminal A such that there is a derivation
o A ⇒+ Aα for some string α

• Top-down parsing techniques cannot handle left-recursive grammars.
• So, we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive.
• The left recursion may appear in a single step of the derivation (immediate left recursion), or may appear in more than one step of the derivation.

Immediate Left-Recursion

AA|  where  does not start with A

 eliminate immediate left recursion


A   A’

A’   A’ |  an equivalent grammar

In general,

A A 1 | ... | A m | 1 | ... | n where 1 ... n do not start with


A

 eliminate immediate left recursion


A  1 A’ | ... | n A’

A’  1 A’ | ... | m A’ |  an equivalent grammar

Immediate Left-Recursion – Example


E  E+T | T

T  T*F | F
F  id | (E)

 eliminate immediate left recursion

E  T E’

E’  +T E’ | 

T  F T’

T’  *F T’ | 

F  id | (E)
Left-Recursion – Problem

• A grammar may contain no immediate left recursion and still be left-recursive.
• By just eliminating the immediate left recursion, we may not obtain a grammar which is not left-recursive.

S → Aa | b
A → Sc | d

This grammar is not immediately left-recursive, but it is still left-recursive:

S ⇒ Aa ⇒ Sca, or
A ⇒ Sc ⇒ Aac    causes a left recursion

So, we have to eliminate all left recursion from the grammar.

Eliminate Left-Recursion – Algorithm
Arrange non-terminals in some order: A1 ... An

- for i from 1 to n do {
-   for j from 1 to i-1 do {
      replace each production
        Ai → Aj γ
      by
        Ai → α1 γ | ... | αk γ
      where Aj → α1 | ... | αk
    }
-   eliminate immediate left recursion among the Ai productions
  }

Eliminate Left-Recursion – Example


S  Aa | b

A  Ac | Sd | f
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
for A:

- Replace A  Sd with A  Aad | bd

So, we will have A  Ac | Aad | bd | f


- Eliminate the immediate left-recursion in A

A  bdA’ | fA’

A’  cA’ | adA’ | 
So, the resulting equivalent grammar which is not left-recursive is:

S  Aa | b
A  bdA’ | fA’

A’  cA’ | adA’ | 

Eliminate Left-Recursion – Example2


S  Aa | b

A  Ac | Sd | f
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left-recursion in A

A  SdA’ | fA’

A’  cA’ | 
for S:

- Replace S  Aa with S SdA’a | fA’a

So, we will have S  SdA’a | fA’a | b


- Eliminate the immediate left-recursion in S

S  fA’aS’ | bS’

S’  dA’aS’ | 

So, the resulting equivalent grammar which is not left-recursive is:


S  fA’aS’ | bS’

S’  dA’aS’ | 

A  SdA’ | fA’

A’  cA’ | 

2. Briefly explain about a predictive parsing with example

Predictive Parser

a grammar → [eliminate left recursion] → [left factor] → a grammar suitable for predictive parsing (an LL(1) grammar); there is no 100% guarantee.

When re-writing a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

A → α1 | ... | αn        input: ... a .......
                                    (a is the current token)

stmt  if ...... |
while ...... |
begin ...... |
for .....

 When we are trying to write the non-terminal stmt, if the current


token is if we have to choose first production rule.
 When we are trying to write the non-terminal stmt, we can
uniquely choose the production rule by just looking the current
token.
 We eliminate the left recursion in the grammar, and left factor it.
But it may not be suitable for predictive parsing (not LL(1)
grammar).

Recursive Predictive Parsing


Each non-terminal corresponds to a procedure.

Ex: A  aBb (This is only the production rule for A)


proc A
{
- match the current token with a, and move to the next
token;
- call ‘B’;
- match the current token with b, and move to the next
token;
}

A  aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next
token;
- call ‘B’;
- match the current token with b, and move to the next
token;
‘b’: - match the current token with b, and move to the next
token;
- call ‘A’;
- call ‘B’;
}
}

When to apply ε-productions:

A → aA | bB | ε

• If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
• Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can follow A in sentential forms).

Recursive Predictive Parsing (Example)

A  aBe | cBd | C

B  bB | e

Cf

proc C
{
match the current token with f, and move to the next token;
}
proc A
{
case of the current token
{
a: - match the current token with a, and move to the next
token; - call B;
- match the current token with e, and move to the next
token;
c: - match the current token with c, and move to the next
token;
- call B;
- match the current token with d, and move to the next
token;
f: - call C
}
}

proc B
{
case of the current token
{
b: - match the current token with b, and move to the next
token;
- call B;
e,d: do nothing (apply B → ε)
}
}

f is in the first set of C
e, d are in the follow set of B
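The proc A / proc B / proc C pseudocode above can be rendered as a runnable recursive predictive parser. The Python sketch below (class and method names are our own) follows it directly, applying B → ε when the current token is in FOLLOW(B) = {e, d}:

```python
# Grammar: A -> aBe | cBd | C,  B -> bB | eps,  C -> f
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.toks = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    def look(self):
        return self.toks[self.pos]

    def match(self, t):
        if self.look() != t:
            raise ParseError(f"expected {t}, got {self.look()}")
        self.pos += 1

    def A(self):
        if self.look() == "a":
            self.match("a"); self.B(); self.match("e")
        elif self.look() == "c":
            self.match("c"); self.B(); self.match("d")
        elif self.look() == "f":           # f is in FIRST(C)
            self.C()
        else:
            raise ParseError(f"unexpected {self.look()}")

    def B(self):
        if self.look() == "b":
            self.match("b"); self.B()
        elif self.look() in ("e", "d"):    # FOLLOW(B): apply B -> eps
            pass
        else:
            raise ParseError(f"unexpected {self.look()}")

    def C(self):
        self.match("f")

def accepts(s):
    p = Parser(s)
    try:
        p.A()
        return p.look() == "$"
    except ParseError:
        return False

print(accepts("abbe"), accepts("cbd"), accepts("f"), accepts("abd"))
```

For example, "abbe" is accepted via A → aBe with B → bB → bbB → bb, while "abd" fails because d cannot follow aB here.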

Non-Recursive Predictive Parsing -- LL(1) Parser


• Non-recursive predictive parsing is table-driven.
• It is a top-down parser.
• It is also known as the LL(1) parser.

                input buffer
                     |
  stack --- Non-recursive Predictive Parser ---> output
                     |
               parsing table

LL(1) Parser

input buffer
• our string to be parsed; we will assume that its end is marked with a special symbol $.

output
• a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.

stack
• contains the grammar symbols.
• at the bottom of the stack, there is a special end marker symbol $.
• initially the stack contains only the symbol $ and the starting symbol S ($S = initial stack).
• when the stack is emptied (i.e. only $ is left in the stack), parsing is completed.

parsing table
• a two-dimensional array M[A,a]
• each row is a non-terminal symbol
• each column is a terminal symbol or the special symbol $
• each entry holds a production rule.

LL(1) Parser – Parser Actions

The symbol at the top of the stack (say X) and the current symbol in the
input string (say a) determine the parser action.

There are four possible parser actions.


1. If X and a are both $: the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $): the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal: the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error
o all empty entries in the parsing table are errors.
o if X is a terminal symbol different from a, this is also an error case.

LL(1) Parser – Example1


S  aBa

B  bB | 

stack input output

$S abba$ S  aBa
$aBa abba$

$aB bba$ B  bB
$aBb bba$
$aB ba$ B  bB
$aBb ba$

$aB a$ Be
$a a$
$ $ accept, successful completion

a b $
S S
aBa
B B B
bB
LL(1) Parsing Table

Outputs: S  aBa B  bB B bB B

Derivation(left-most): SaBaabBaabbBaabba
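The trace above can be reproduced with a small table-driven parser. The sketch below (our own Python rendering of the LL(1) algorithm, not part of the notes) uses the parsing table for S → aBa, B → bB | ε:

```python
# Parsing table M[X, a]: entries hold right-hand sides as symbol lists.
table = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "a"): [],                 # B -> eps (a is in FOLLOW(B))
    ("B", "b"): ["b", "B"],
}
NONTERMS = {"S", "B"}

def parse(inp, start="S"):
    toks = list(inp) + ["$"]
    stack = ["$", start]            # $ at the bottom, start symbol on top
    output = []
    i = 0
    while True:
        X, a = stack[-1], toks[i]
        if X == "$" and a == "$":
            return output           # successful completion
        if X == a:                  # matching terminals: pop and advance
            stack.pop(); i += 1
        elif X in NONTERMS and (X, a) in table:
            rhs = table[(X, a)]
            output.append(f"{X} -> {' '.join(rhs) or 'eps'}")
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1 so Y1 is on top
        else:
            raise SyntaxError(f"error at token {a}")

print(parse("abba"))
```

On "abba" it emits exactly the four outputs of the trace: S → a B a, B → b B, B → b B, B → eps.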
Unit II

PART-A

1. Define CFG

Context-Free Grammars

Inherently recursive structures of a programming language are


defined by a context-free grammar.
In a context-free grammar,
We have:

• A finite set of terminals (in our case, this will be the set of tokens)
• A finite set of non-terminals (syntactic variables)
• A finite set of production rules, each of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
• A start symbol (one of the non-terminal symbols)

2. What are the advantages of grammar?


• A grammar gives a precise and easy-to-understand syntactic specification of a programming language.
• An efficient parser can be constructed automatically from a properly designed grammar.
• A grammar imparts structure to a program that is useful for its translation into object code and for the detection of errors.
• New constructs can be added to a language easily by adding new rules to the grammar.

3. Define parse tree with example

Parser

A parse tree is a graphical representation of a derivation: the root is the start symbol, interior nodes are non-terminals, and the leaves, read left to right, form the sentence. The parser, which builds the parse tree, works on a stream of tokens; the smallest item it handles is a token.

source program → Lexical Analyzer → (token) → Parser → parse tree
(the parser repeatedly calls the lexical analyzer to get the next token)

4. Define ambiguous grammar.

A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.

5. Eliminate left recursion from the grammar


S → (L) | a
L → L,S | S

Using the rule that A → Aα | β (where β does not start with A) becomes A → βA', A' → αA' | ε:

S → (L) | a
L → S L'
L' → , S L' | ε
6. What is the role of the error handler in a parser?

Each phase can encounter errors. After detecting an error, a phase must deal with it so that compilation can proceed and further errors can be detected. The lexical analyzer reports errors where the characters cannot form any token. The syntax analyzer reports errors where the token stream violates the structure rules of the language. The semantic phase tries to detect constructs that have the right syntactic structure but no meaning for the operations involved.

7. What are the possible action of a shift reduce parsing

The possible actions of a shift-reduce parser are:
1. shift: the next input symbol is shifted onto the top of the stack
2. reduce: the handle at the top of the stack is replaced by the corresponding non-terminal
3. accept: the parser announces successful completion of parsing
4. error: the parser discovers a syntax error

A shift-reduce parser tries to reduce the given input string into the starting symbol:

a string ⇒ ... ⇒ the starting symbol (reduced to)

At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal at the left side of that production rule. If the substring is chosen correctly, the rightmost derivation of the string is created in reverse order.

Rightmost derivation: S ⇒* ω
Shift-reduce parser finds: ω ⇐ ... ⇐ S
2. Define bottom up parsing.

A bottom-up parser creates the parse tree of the given input


starting from leaves towards the root.

3. Define top down parsing.

The parse tree is created from top to bottom.

1. Recursive-descent parsing
• Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
• It is a general parsing technique, but not widely used.
• It is not efficient.

2. Predictive parsing
• No backtracking.
• Efficient.
• Needs a special form of grammar (LL(1) grammars).
• Recursive predictive parsing is a special form of recursive-descent parsing without backtracking.
• The non-recursive (table-driven) predictive parser is also known as the LL(1) parser.

4. List out the error recovery strategies

1. panic mode
2. phrase level
3. error productions
4. global correction
5. Draw the block diagram for syntax analysis

source program → Lexical Analyzer → (token) → Parser → parse tree
(the parser repeatedly calls the lexical analyzer to get the next token)

8. Construct parse tree for the given statement

i. E = E + E * 10

        =
       / \
      E   +
         / \
        E   *
           / \
          E   10

15. What are the terminals, non-terminals and start symbol for the grammar
S → (L) | a
L → L,S | S

Terminals: ( ) a ,
Non-terminals: S, L
Start symbol: S
9. What is the need of left factoring
A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored:

grammar → a new equivalent grammar suitable for predictive parsing

stmt → if expr then stmt else stmt |
       if expr then stmt

10. Define parser.

A parser works on a stream of tokens; the smallest item it handles is a token.

source program → Lexical Analyzer → (token) → Parser → parse tree
(the parser repeatedly calls the lexical analyzer to get the next token)

We categorize parsers into two groups: top-down parsers and bottom-up parsers.

11. What is terminal with example

A terminal is a basic symbol (token) of the grammar from which strings are formed, e.g. if, +, id. L(G) is the language of G (the language generated by G), which is a set of sentences; a sentence of L(G) is a string of terminal symbols of G.

12. Define Handle


Informally, a handle of a string is a substring that matches the right side of a production rule.
But not every substring that matches the right side of a production rule is a handle.


Unit III

PART-A

1. Write the advantage of generating an intermediate


representation

Advantages of a machine-independent intermediate form:

• Retargeting is facilitated: a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
• A machine-independent code optimizer can be applied to the intermediate representation.

2. Define syntax Tree with example.


A syntax tree shows the hierarchical structure of a source program. A DAG gives the same information, but in a more compact way, because common subexpressions are identified.
Statement: a := b*-c + b*-c

3. What are the functions used to create the nodes of syntax trees?

• mknode(op, left, right) creates an operator node with label op and two fields containing pointers to left and right.
• mkleaf(id, entry) creates an identifier node with label id and a field containing entry, a pointer to the symbol-table entry for the identifier.
• mkleaf(num, val) creates a number node with label num and a field containing val, the value of the number.

4. What are the three kinds of intermediate representation?


Types of intermediate representation:

• Syntax trees
• Postfix notation
• Three-address code (with semantic rules for generating three-address code from common programming-language constructs)

5. Define quadruple with an example.


A quadruple is a record structure with four fields, which we call op, arg1, arg2 and result. For example, t1 := -c is represented as (uminus, c, , t1).

6. Define symbol table

It is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.
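As an illustration (our own, not from the notes), a minimal symbol table with fast store and retrieve can be sketched with a Python dictionary keyed by identifier name:

```python
class SymbolTable:
    def __init__(self):
        self.table = {}                    # name -> attribute record

    def insert(self, name, **attrs):
        """Make an entry for an identifier if it is not already there."""
        if name not in self.table:
            self.table[name] = {"name": name, **attrs}
        return self.table[name]

    def lookup(self, name):
        """Return the record for name, or None if it was never declared."""
        return self.table.get(name)

st = SymbolTable()
st.insert("rate", type="real", offset=8)
st.insert("rate", type="real")             # already present: entry is kept
print(st.lookup("rate"))
print(st.lookup("undeclared"))             # None
```

A dictionary gives near-constant-time store and retrieve, which is exactly the requirement stated in the definition above.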

7. What are the various data structure used for implementing


symbol table

1. Linear lists
2. Hash tables
3. Binary search trees

8. Draw the DAG for a:=b*-c+b*-c


Code for the DAG (Statements a:= b*-c +b*-c)
1. t1:= -c
2. t2:= b * t1
3. t5 := t2 + t2
4. a:= t5

9. Translate the expression -(a+b)*(c+d)+(a+b+c) into quadruples

     Op       Arg1   Arg2   Result
(0)  +        a      b      t1
(1)  uminus   t1            t2
(2)  +        c      d      t3
(3)  *        t2     t3     t4
(4)  +        a      b      t5
(5)  +        t5     c      t6
(6)  +        t4     t6     t7

10. Write short note for Triples

In triples, the use of temporary names is avoided: instead of a result field, an operand refers either to a symbol-table entry or to the position of the earlier triple that computes the value.

11. Translate the arithmetic expression a*-(b+c) into a syntax tree

      *
     / \
    a  uminus
         |
         +
        / \
       b   c

12. Write three-address code for the expression a*-(b+c)

t1 := b + c
t2 := - t1
t3 := a * t2

     Op       Arg1   Arg2   Result
(0)  +        b      c      t1
(1)  uminus   t1            t2
(2)  *        a      t2     t3

13. Give the difference between syntax – directed definition


and translation schemes

A syntax-directed definition associates semantic rules with productions but hides implementation details: it does not specify the order in which the rules are evaluated. A translation scheme embeds semantic actions within the right sides of the productions, so the order in which the actions are performed is shown explicitly.

14. Give the form of a syntax - directed definition


A syntax-directed definition is a generalization of a context-free grammar in which each grammar symbol has an associated set of attributes, and each production A → α has an associated set of semantic rules of the form b := f(c1, c2, ..., ck), where b and c1, ..., ck are attributes of the symbols of the production.

15. What is the purpose of DAG

A DAG gives the same information as a syntax tree, but in a more compact way, because common subexpressions are identified. This helps in generating better code for a basic block.

16. Define back patching


Back patching is generating a series of branching statements for boolean expressions and flow-of-control statements with the targets of the jumps temporarily left unspecified; the target labels are filled in later, when they become known.
17. Define dependency graphs

A dependency graph depicts the flow of information among the attribute instances in a parse tree: an edge from one attribute instance to another means that the value of the first is needed to compute the value of the second.

18. Define procedure calls

A procedure or function is an important programming construct that is used to obtain modularity in the user program.

19. What is the use of symbol table

It is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.

20. Give the advantage and disadvantage of linear list


implementation of symbol table

Advantage: a linear list is the simplest implementation and requires minimum space.
Disadvantage: look-up is slow; to find a name we may have to search the entire list, so the cost of each access is proportional to the number of entries examined.

1. Translate the arithmetic expression a*-(b+c) into
i. syntax tree
ii. postfix notation
iii. three-address code

i. Syntax tree:

      *
     / \
    a  uminus
         |
         +
        / \
       b   c

ii. Postfix notation: a b c + uminus *

iii. Three-address code:
t1 := b + c
t2 := - t1
t3 := a * t2

Types of intermediate representation:
• Syntax trees
• Postfix notation
• Three-address code

Graphical representation

A syntax tree shows the hierarchical structure of a source program. A DAG gives the same information, but in a more compact way, because common subexpressions are identified.
Statement: a := b*-c + b*-c

Postfix notation

Linear representation of a syntax tree.
Postfix for the statement a := b*-c + b*-c:

a b c uminus * b c uminus * + assign
The syntax tree for the assignment statement is produced by syntax-directed translation:
• Non-terminal S generates an assignment statement.
• + and - are operators in the typical languages; operator associativity and precedence are the usual ones.

Syntax-directed definition for the statement a := b*-c + b*-c:

Production      Semantic rule
S → id := E     S.nptr := mknode('assign', mkleaf(id, id.place), E.nptr)
E → E1 + E2     E.nptr := mknode('+', E1.nptr, E2.nptr)
E → E1 * E2     E.nptr := mknode('*', E1.nptr, E2.nptr)
E → - E1        E.nptr := mknode('uminus', E1.nptr)
E → ( E1 )      E.nptr := E1.nptr
E → id          E.nptr := mkleaf(id, id.place)

Representation of syntax tree

• Each node is implemented as a record
o Fields: an operator, and pointers to children
• Nodes are allocated from an array
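The mknode / mkleaf functions in the semantic rules above can be sketched in Python (our own rendering: nodes are plain dicts, and the field names "op", "left", "right", "place" are our choice) to build the syntax tree for a := b * -c + b * -c:

```python
def mknode(op, left=None, right=None):
    """Interior node: an operator plus pointers to its children."""
    return {"op": op, "left": left, "right": right}

def mkleaf(kind, place):
    """Leaf node: for an identifier, place stands for its symbol-table entry."""
    return {"op": kind, "place": place}

def b_times_minus_c():
    # E -> E1 * E2 with E2 -> - E1: mknode('*', ..., mknode('uminus', ...))
    return mknode("*", mkleaf("id", "b"),
                       mknode("uminus", mkleaf("id", "c")))

# S -> id := E : S.nptr := mknode('assign', mkleaf(id, id.place), E.nptr)
tree = mknode("assign",
              mkleaf("id", "a"),
              mknode("+", b_times_minus_c(), b_times_minus_c()))

print(tree["op"])            # assign
print(tree["right"]["op"])   # +
```

Building the two b * -c subtrees separately is what a DAG would avoid: it would share a single subtree instead of allocating it twice.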
2. Briefly explain about intermediate code generation?

The front end translates a source program into an intermediate representation, from which the back end generates the target code.

Position of the intermediate code generator:

Parser → Static Checker → Intermediate Code Generator → Code Generator


Advantages of a machine-independent intermediate form:

• Retargeting is facilitated: a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
• A machine-independent code optimizer can be applied to the intermediate representation.

Intermediate languages

Types of intermediate representation:

• Syntax trees
• Postfix notation
• Three-address code (with semantic rules for generating three-address code from common programming-language constructs)

Graphical representation

A syntax tree shows the hierarchical structure of a source program. A DAG gives the same information, but in a more compact way, because common subexpressions are identified.
Statement: a := b*-c + b*-c

Postfix notation

Linear representation of a syntax tree.
Postfix for the statement a := b*-c + b*-c:

a b c uminus * b c uminus * + assign

The syntax tree for the assignment statement is produced by syntax-directed translation:
• Non-terminal S generates an assignment statement.
• + and - are operators in the typical languages; operator associativity and precedence are the usual ones.

Syntax-directed definition for the statement a := b*-c + b*-c:

Production      Semantic rule
S → id := E     S.nptr := mknode('assign', mkleaf(id, id.place), E.nptr)
E → E1 + E2     E.nptr := mknode('+', E1.nptr, E2.nptr)
E → E1 * E2     E.nptr := mknode('*', E1.nptr, E2.nptr)
E → - E1        E.nptr := mknode('uminus', E1.nptr)
E → ( E1 )      E.nptr := E1.nptr
E → id          E.nptr := mkleaf(id, id.place)

Representation of syntax tree

• Each node is implemented as a record
o Fields: an operator, and pointers to children
• Nodes are allocated from an array

Unit IV

PART-A

1. What are the issues in the design of code generators?

 The requirements on a code generator:


o –The output code must be correct.
o –The output code must be high quality.
o –It should make effective use of the resources of the target
machine.
o –It should run efficiently.

2. Give the two standard storage allocation strategies.


1. Static allocation, and
2. Stack allocation.

3. Define static allocation

The position of an activation record in memory is fixed at compile
time.

4. Define stack allocation.

Static allocation can become stack allocation by using relative
addresses for storage in activation records.

5. Define code generation.


A code generator takes an intermediate representation of a
source program, and produces an equivalent program as an output.

6. Write short note for target program.


1.The output of the code generation is the target program.
2.The target program can be in one of the following form:
1. –absolute machine language
2. –relocatable machine language
3. –assembly language
4. –virtual machine codes(Java..)

7. Define flow graph


 We can add the flow-of-control information to the set of basic
blocks making up a program by constructing a directed graph
called a flow graph.

8. Define basic block

A basic block is a sequence of consecutive statements (of
intermediate codes – quadruples) in which flow of control enters at
the beginning and leaves at the end without halt or possibility of
branch (except at the end).

A basic block:
t1 := a * a
t2 := a * b
t3 := t1 – t2

9. Give the rules for partitioning a sequence of three-address
statements into basic blocks.
(i) The first statement is a leader.
(ii) The target of a conditional or unconditional goto statement
is a leader.
(iii) Any statement that immediately follows a goto or
conditional goto statement is a leader.

10. Give the types of transformation of basic blocks


1. Structure-preserving transformations
2. Algebraic transformations

11. Define DAG with example

Directed acyclic graphs (DAGs) are useful data structures for
implementing transformations on basic blocks.
12. What are the typical peephole optimization?.

Peephole optimization is a method to improve the performance of the
target program by examining a short sequence of target instructions
(called the peephole) and replacing these instructions with shorter
and faster instructions.

13. Give the representation of intermediate language


The front end translates a source program into an intermediate
representation from which the back end generates the target code.

Parser → Static checker → Intermediate code generator → Code generator

14. Give the position of the intermediate code generator.
The intermediate code generator sits between the front end (parser
and static checker) and the code generator.

15. List out the primary structure preserving


transformations

Structure-Preserving Transformations

 The primary structure-preserving transformations are:


o –common sub-expression elimination
o –dead-code elimination
o –renaming of temporary variables
o –interchange of two independent adjacent statements.

16. Write short note for flow of control optimizations

Flow-of-Control Optimizations
goto L1                     goto L2
.                 →         .
L1: goto L2                 L1: goto L2
----------------------------------------------
if a<b goto L1              if a<b goto L2
.                 →         .
L1: goto L2                 L1: goto L2
-----------------------------------------------
goto L1                     if a<b goto L2
.                 →         goto L3
L1: if a<b goto L2          .
L3:                         L3:

17. Give the different forms of addressing mode and


associated costs.

Address Modes
 The source and destination fields are not long enough to hold
memory addresses. Certain bit-patterns in these fields specify that
words following the instruction (the instruction is also one word)
contain operand addresses (or constants).
 Of course, there will be cost for having memory addresses and
constants in instructions.
 We will use different addressing modes to get addresses of source
and destination.

18. What are the forms of the output of the code generator?
Target Programs (Output of Code Generation)

 The output of the code generation is the target program.


 The target program can be in one of the following form:
o –absolute machine language
o –relocatable machine language
o –assembly language
o –virtual machine codes(Java..)

19. Write short note for memory management


Implementation of static and stack allocation of data objects.
How the names in the intermediate code are converted into
addresses in the target code.
The labels in the intermediate code must be converted into
the addresses of the target machine instructions.
A quadruple may map to more than one machine
instruction. If that quadruple has a label, this label will be the
address of the first machine instruction corresponding to that
quadruple.

20. Define relocatable machine language.

Producing a relocatable machine language program as output allows
subprograms to be compiled separately.

PART-B

1. What are the issues in the design of a code generator? Explain
in detail.

Code Generation

 A code generator takes an intermediate representation of a source


program, and produces an equivalent program as an output.
 The requirements on a code generator:
o –The output code must be correct.
o –The output code must be high quality.
o –It should make effective use of the resources of the target
machine.
o –It should run efficiently.
 In theory, the problem of generating optimal code is undecidable.
 In practice, we use heuristic techniques to generate sub-optimal
(good, but not optimal) target code. The choice of the heuristic is
important since a carefully designed code generation algorithm can
produce much better code than a naive code generation algorithm.

Input to Code Generator

 The input of a code generator is the intermediate representation of


a source program (together with the information in the symbol
table to figure out the addresses of the symbols).
 The intermediate representation can be:
o –Three-address codes (quadruples).
o –Trees
o –Dags (Directed Acyclic Graphs)
o –or, other representations
 Code generator assumes that codes in the intermediate
representation are free of semantic errors and we have all the type
conversion instructions in these codes.

Target Programs (Output of Code Generation)

 The output of the code generation is the target program.


 The target program can be in one of the following form:
o –absolute machine language
o –relocatable machine language
o –assembly language
o –virtual machine codes(Java..)
 If absolute machine language is used, the target program can be
placed in a fixed location, and immediately executed (WATFIV,
PL/C).
 If relocatable machine language is used, we need a linker and
loader to combine the relocatable object files and load them. It is a
flexible approach (C language)
 If assembly language is used, an assembler is needed.

Memory Management

 Implementation of static and stack allocation of data objects?


 How the names in the intermediate codes are converted into
addresses in the target code?
 The labels in the intermediate codes must be converted into the
addresses of the target machine instructions.
 A quadruple may map to more than one machine instruction. If
that quadruple has a label, this label will be the address of the first
machine instruction corresponding to that quadruple.

Instruction Selection

 The structure of the instruction set of the target machine


determines the difficulty of the instruction selection.
 –The uniformity and completeness of the instruction set are
important factors.
 Instruction speeds are also important.
 –If we do not care about speed, code generation is a straightforward
job. We can map each quadruple into a set of machine instructions.
Naive code generation:
ADD y,z,x  MOV y, R0
ADD z, R0
MOV R0,x

 The quality of the generated code is determined by its speed and


size.
 Instruction speeds are needed to design good code sequences.
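The naive quadruple-to-instruction mapping above can be sketched as follows (a toy sketch; the quadruple tuple format and the generated instruction strings are assumed representations):

```python
def naive_codegen(quad):
    # quad = (op, arg1, arg2, result), e.g. ('ADD', 'y', 'z', 'x')
    op, arg1, arg2, result = quad
    # Each quadruple maps to the same fixed load/operate/store
    # pattern -- correct, but blind to instruction speed and size.
    return [f'MOV {arg1},R0',
            f'{op} {arg2},R0',
            f'MOV R0,{result}']

print(naive_codegen(('ADD', 'y', 'z', 'x')))
# ['MOV y,R0', 'ADD z,R0', 'MOV R0,x']
```

A better generator would notice, for example, when an operand is already in a register and skip the load.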

Register Allocation

 Instructions involving register operands are usually shorter and


faster than those involving operands in memory.
 The efficient utilization of registers is important in generating good
code sequence.
 The use of registers is divided into two sub-problems:
o –Register Allocation – we select the set of variables that will
reside in registers at each point in the program.
o –Register Assignment – we pick the specific register that a
variable will reside in.
 Finding an optimal assignment of registers is difficult.
o –In theory, the problem is NP-complete.
o –The problem is further complicated because some
architectures may require certain register-usage conventions
such as address vs data registers, even vs odd registers for
certain instructions.

Choice of Evaluation Order


 The order of computations affect the efficiency of the target code.
 Some computation orders require less registers to hold
intermediate results.
 Picking the best computation order is also another NP-complete
problem.
 We will try to use the order used in the intermediate codes.
 But, the most important criterion for a code generator is that it
should produce correct code.
 We may use a less efficient code generator as long as it produces
correct codes. But we cannot use a code generator which is
efficient but it does not produce correct codes.

Target Machine
 To design a code generator, we should be familiar with the
structure of the target machine and its instruction set.
 Instead of a specific architecture, we will design our own simple
target machine for the code generation.
– We will decide the instruction set, but it will be close to
actual machine instructions.
– We will decide size and speeds of the instructions, and we
will use them in the creation of good code generators.
– Although we do not use an actual target machine, our
discussions are also applicable to actual target machines.

Our Target Machine


 Our target machine is a byte-addressable machine (each word is
four bytes).
 Our target machine has n general purpose registers – R0, R1, ..., Rn-1.
 Our target machine has two-address instructions of the form:
 op source,destination
 where op is an op-code, and source and destination are data fields.


ADD  add source to destination
SUB  subtract source from destination
MOV  move source to destination

Our Target Machine – Address Modes


 •The source and destination fields are not long enough to hold
memory addresses. Certain bit-patterns in these fields specify that
words following the instruction (the instruction is also one word)
contain operand addresses (or constants).
 •Of course, there will be cost for having memory addresses and
constants in instructions.
 •We will use different addressing modes to get addresses of source
and destination.

a. Discuss the run-time storage management of a code
generator.

Run-Time Addresses
 Stack Variables
o Stack variables are accessed using offsets from the beginning
of the activation records.

 local variable  *OFFSET(SP)

 non-local variable
o access links
o displays

Basic Blocks
A basic block is a sequence of consecutive statements (of intermediate
codes – quadruples) in which flow of control enters at the beginning and
leaves at the end without halt or possibility of branch (except at the end).
A basic block:
t1 := a * a
t2 := a * b
t3 := t1 – t2

Partition into Basic Blocks

Input: A sequence of three-address codes


Output: A list of basic blocks with each three-address statement in
exactly one block.
Algorithm:

 1.Determine the list of leaders. The first statement of each basic


block will be a leader.
o The first statement is a leader
o Any statement that is the target of a jump instruction
(conditional or unconditional) is a leader.
o Any statement immediately following a jump instruction
(conditional or unconditional) is a leader.
 2.For each leader, its basic block consists of the leader and all
statements up to but not including the next leader or the end of the
program.

Example Pascal Program


begin
prod := 0;
i := 1;
do begin
prod := prod + a[i] * b[i];
i := i + 1;
end
while i <= 20
end

Corresponding Quadraples
1: prod := 0
2: i := 1
3: t1 := 4*i
4: t2 := a[t1]
5: t3 := 4*i
6: t4 := b[t3]
7: t5 := t2*t4
8: t6 := prod+t5
9: prod := t6
10: t7 := i+1
11: i := t7
12: if i<=20 goto 3
2. Explain briefly about DAG representation of basic blocks.

DAG Representation of Basic Blocks


 Directed Acyclic Graphs (dags) can be useful data structures for
implementing transformations on basic blocks.
 Using dags
o –we can easily determine common sub-expressions
o –We can determine which names are evaluated outside of the
block, but used in the block.
 First, we will construct a dag for a basic block.
 Then, we apply transformations on this dag.
 Later, we will produce target code from a dag.

A dag for A Basic Block


 A dag for a basic block is:
o –Leaves are labeled with unique identifiers (names,
constants). If the value of a variable is changed in a basic
block, we use subscripts to distinguish two different values of
that name.
o –Interior nodes are labeled by an operator symbol
o –Interior nodes optionally may also have a sequence of
names as labels.
 So, for each basic block we can create a dag for that basic block.

Three-Address Codes for A Basic Block

1: t1 := 4*i
2: t2 := a[t1]
3: t3 := 4*i
4: t4 := b[t3]
5: t5 := t2*t4
6: t6 := prod+t5
7: prod := t6
8: t7 := i+1
9: i := t7
10: if i<=20 goto 1

Corresponding DAG

Construction of DAGs
 •We can systematically create a corresponding dag for a given
basic block.
 •Each name is associated with a node of the dag. Initially, all
names are undefined (i.e. they are not associated with nodes of the
dag).
 •For each three-address code x := y op z
o –Find node(y). If node(y) is undefined, create a leaf node
labeled y and let node(y) to be this node.
o –Find node(z). If node(z) is undefined, create a leaf node
labeled z and let node(z) be this node.
o –If there is already a node with op, node(y) as its left child,
and node(z) as its right child, this node is also treated as
node(x).
o –Otherwise, create node(x) with op, node(y) as its left child,
and node(z) as its right child.
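The construction can be sketched in Python (a simplified sketch; hashing (op, left, right) triples to find an existing node is an implementation choice, not prescribed by the text):

```python
def build_dag(codes):
    # codes: list of (x, op, y, z) for statements of the form "x := y op z"
    node = {}      # name -> node id currently associated with the name
    nodes = []     # node id -> label: ('leaf', name) or (op, left, right)
    cache = {}     # (op, left, right) -> existing node id

    def find(name):
        if name not in node:              # undefined: create a leaf node
            nodes.append(('leaf', name))
            node[name] = len(nodes) - 1
        return node[name]

    for x, op, y, z in codes:
        key = (op, find(y), find(z))
        if key not in cache:              # no matching interior node yet
            nodes.append(key)
            cache[key] = len(nodes) - 1
        node[x] = cache[key]              # reuse => common sub-expression
    return node, nodes

node, nodes = build_dag([('t1', '*', '4', 'i'),
                         ('t3', '*', '4', 'i')])
print(node['t1'] == node['t3'])   # True: common sub-expression detected
```

Here t1 and t3 end up associated with the same node, exactly the reuse a DAG exploits.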

Applications of DAGs
 We automatically detect common sub-expressions.
 We can determine which identifiers' values are used in the
block (the identifiers at the leaves).
 We can create simplified quadruples for a block using its dag.
o –taking advantage of common sub-expressions
o –without performing unnecessary move instructions.
 In general, the interior nodes of the dag can be evaluated in any
order that is a topological sort of the dag.
o –In topological sort, a node is not evaluated until its all
children are evaluated.
o –So, a different evaluation order may correspond to a better
code sequence.

3. Write short notes for


i. flow of control optimizations
ii. Algebraic simplification
iii. Redundant instruction elimination
iv. Reduction in strength

(i)Flow-of-Control Optimizations:
goto L1 goto L2
. 
L1: goto L2 L1: goto L2
----------------------------------------------
if a<b goto L1 if a<b goto L2
. 
L1: goto L2 L1: goto L2
-----------------------------------------------
goto L1 if a<b goto L2
.  goto L3
L1: if a<b goto L2 .
L3: L3:

ii) Algebraic Transformations


x := x+0 → eliminate this statement
x := y+0 → x := y
x := x+1 → INC x
x := y**2 → x := y*y

iii) redundant-instruction elimination:


if we see the instruction sequence
MOV R,a
MOV a,R
We can delete the second instruction if it is an unlabeled
instruction, because the first instruction ensures that the value of a is
already in the register R .
iv) reduction in strength:
It replaces expensive operations with equivalent but cheaper ones
on the target machine.
6.Briefly explain about storage allocation strategies?
Run-Time Storage Organization

 Static Allocation -- the static allocation can be performed by just


reserving enough memory space for static data objects.
o –Static variables can be accessible by just using absolute
memory address.

 Stack Allocation – the code generator should produce machine


codes to allocate the activation records (corresponding to
intermediate codes).
o –Normally we will use a specific register to point to (the
beginning of) the activation record, and we will use this
register to access variables residing in that activation record.
o We cannot know actual address of these stack variables until
run-time.
Stack Allocation – Activation Record

SP →  Return address
      Return value
      Actual parameters
      Other stuff
      Local variables
      Temporaries

 All values in the activation record are accessible from SP by a
positive offset.

 And all these offsets are calculated at compile-time.

Possible Procedure Invocation

ADD #caller.recordsize,SP
MOV PARAM1,*8(SP) // save parameters
MOV PARAM2,*12(SP)
.
MOV PARAMn,*4+4n(SP)
. // saving other stuff
MOV #here+16,*SP // save return address
GOTO callee.codearea // jump to procedure
SUB #caller.recordsize,SP // restore SP after return

Possible Return from A Procedure Call


MOV RETVAL,*4(SP) // save the return value
GOTO *SP // return to caller

Run-Time Addresses
 Static Variables:
o static[12]  staticaddressblock+12

 if the beginning of the static address block is 100,


o MOV #0,static[12] → MOV #0,112

 So, the static variables are absolute addresses and these absolute
addresses are evaluated at compile time (or load time).

Run-Time Addresses
 Stack Variables
o Stack variables are accessed using offsets from the beginning
of the activation records.

 local variable  *OFFSET(SP)


 non-local variable
o access links
o displays

7. Write short note


i) Register allocation
ii) Memory management
iii) Input to the code generator
iv) Instruction selection

i)Register Allocation

 Instructions involving register operands are usually shorter and


faster than those involving operands in memory.
 The efficient utilization of registers is important in generating good
code sequence.
 The use of registers is divided into two sub-problems:
o –Register Allocation – we select the set of variables that will
reside in registers at each point in the program.
o –Register Assignment – we pick the specific register that a
variable will reside in.
 Finding an optimal assignment of registers is difficult.
o –In theory, the problem is NP-complete.
o –The problem is further complicated because some
architectures may require certain register-usage conventions
such as address vs data registers, even vs odd registers for
certain instructions.
ii) Memory Management

 Implementation of static and stack allocation of data objects?


 How the names in the intermediate codes are converted into
addresses in the target code?
 The labels in the intermediate codes must be converted into the
addresses of the target machine instructions.
 A quadruple may map to more than one machine instruction. If
that quadruple has a label, this label will be the address of the first
machine instruction corresponding to that quadruple.

iii)Input to Code Generator

 The input of a code generator is the intermediate representation of


a source program (together with the information in the symbol
table to figure out the addresses of the symbols).
 The intermediate representation can be:
o –Three-address codes (quadruples).
o –Trees
o –Dags (Directed Acyclic Graphs)
o –or, other representations
 Code generator assumes that codes in the intermediate
representation are free of semantic errors and we have all the type
conversion instructions in these codes.

iv)Instruction Selection
 The structure of the instruction set of the target machine
determines the difficulty of the instruction selection.
 –The uniformity and completeness of the instruction set are
important factors.
 Instruction speeds are also important.
 –If we do not care about speed, code generation is a straightforward
job. We can map each quadruple into a set of machine instructions.
Naive code generation:
ADD y,z,x  MOV y, R0
ADD z, R0
MOV R0,x

 The quality of the generated code is determined by its speed and


size.
 Instruction speeds are needed to design good code sequences.

3. Explain about peephole optimization?

Peephole Optimization

 Peephole Optimization is a method to improve performance of the


target program by examining a short sequence of target
instructions (called the peephole), and replacing these instructions
with shorter and faster instructions.
o –peephole optimization can be applicable to both
intermediate codes and target codes.
o –the peephole can be in a basic block (sometimes can be
across blocks).
o –we may need multiple passes to get best improvement in the
target code.
o –we will look at certain program transformations which can
be seen as peephole optimization.

Redundant Instruction Elimination


MOV R0,a  MOV R0,a
MOV a,R0
•We can eliminate the second instruction, if there is no jump instruction
jumping to that instruction.

Unreachable Code
We may remove unreachable codes.
#define debug 0
.
.
if (debug==1) { print debugging info }

This is an unreachable code sequence. So we can eliminate it.

Flow-of-Control Optimizations
goto L1 goto L2
. 
L1: goto L2 L1: goto L2
----------------------------------------------
if a<b goto L1 if a<b goto L2
. 
L1: goto L2 L1: goto L2
-----------------------------------------------
goto L1 if a<b goto L2
.  goto L3
L1: if a<b goto L2 .
L3: L3:

Other Peephole Optimizations


 Algebraic Simplifications:
o –x := x+0
o –x := x*1
o –... more
 Reduction in Strength
o –x := y**2  x := y*y
o –x := y*2  x := lshift(y,1)
 Specific Machine Instructions
o –The target machine may have specific instructions to
implement certain operations.
o –auto increment, auto decrement, ...
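Two of these transformations — redundant-MOV elimination and the x := x+0 simplification — can be sketched as a single peephole pass over instruction strings (a toy sketch; it assumes the eliminated MOV is unlabeled, as the text requires):

```python
def peephole(ins):
    out = []
    for cur in ins:
        # Redundant-instruction elimination:
        #   MOV R0,a  followed by  MOV a,R0  -> drop the second MOV,
        # since the first already guarantees a's value is in R0.
        if out and cur.startswith('MOV '):
            src, dst = cur[4:].split(',')
            if out[-1] == f'MOV {dst},{src}':
                continue
        # Algebraic simplification: x := x+0 changes nothing.
        if ':=' in cur:
            lhs, rhs = [s.strip() for s in cur.split(':=')]
            if rhs == f'{lhs}+0':
                continue
        out.append(cur)
    return out

print(peephole(['MOV R0,a', 'MOV a,R0', 'x := x+0', 'ADD b,R0']))
# ['MOV R0,a', 'ADD b,R0']
```

A real peephole optimizer would slide a window over the code and repeat passes until no rule fires.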
Unit V

PART-A

1. Define activation Tree.

Each execution of a procedure is referred to as an activation of the
procedure. If the procedure is recursive, several of its activations may be
alive at the same time.

2. What is the use of control stack?

The flow of control in a program corresponds to a depth-first
traversal of the activation tree that starts at the root, visits a node before
its children, and recursively visits the children of each node in
left-to-right order.

3. Define scope of declarations.

A declaration in a language is a syntactic construct that
associates information with a name; the scope of a declaration is the
portion of the program to which the declaration applies.

4. What are the strategies of storage allocation?


There are three storage allocation strategies:
1. Static allocation
2. Stack allocation
3. Heap allocation

5. Define static allocation.


The position of an activation record in memory is fixed at
compile time.
6. What are the limitations of static memory allocation?
The size of every data object must be known at compile time,
recursive procedures are not supported (all activations of a procedure
would share the same storage), and data structures cannot be created
dynamically.

7. What are the two approaches to implement dynamic scopes.

1. Deep access, and
2. Shallow access.

8. What is meant by code optimization?

Code optimization is the process of transforming a program so that
the target code runs faster and/or occupies less space, without
changing the meaning of the program.

9. Define optimizing compilers.

Compilers that apply code-improving transformations are called
optimizing compilers.

10. Give the criteria for – improving transformation

A transformation of a program is called local if it can be performed
by looking only at the statements in a basic block.

11. What are the two levels of code optimization technique?

1. Local optimization – performed within a single basic block.
2. Global optimization – performed across basic blocks.
12. What are the phases of code optimization technique.

The code generator produces optimal code for our target machine if
we assume that:
1. –there are no common sub-expressions
2. –there are no algebraic properties of operators which affect
the timing.
Algebraic properties of operators (such as commutativity and
associativity) may affect the generated code.
When there are common sub-expressions, the dag will no longer be
a tree. In this case, we cannot apply the algorithm directly.

13. Define function – preserving transformation

Function-preserving transformations improve code without changing
the function it computes:
1. –Common sub-expression elimination
2. –Copy propagation
3. –Dead-code elimination
4. –Constant folding.

14. what is heap allocation

The stack allocation cannot be used if either of the following is
possible:
i) The value of a local name must be retained when an
activation ends.
ii) A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage as
needed for activation records or other objects.

15. Define dead code elimination?

i)We say that x is dead at a certain point, if it is not used after that
point in the block (or in the following blocks).
ii)If x is dead at the point of the statement x := y op z, this
statement can be safely eliminated without changing the meaning
of the block

16. What is access links

A direct implementation of lexical scope for nested procedures is
obtained by adding a pointer called an access link to each activation
record.

17. Define deep access and shallow access

Deep access:

A simple implementation is to dispense with access links and use the
control link to search down into the stack for the declaration.

Shallow access:

The current value of each name is kept in statically allocated storage.

18. List out three loop optimization technique

These optimizations reduce the running time of a program:
I) Code motion
II) Induction variable elimination
III) Reduction in strength.

19. List the principle source of the code optimization.

 1.Labeling – Label each node of the tree (bottom-up) with an


integer that denotes the fewest number of registers required to
evaluate the tree with no stores of intermediate results.
 2.Code Generation from The Labeled Tree – We traverse the
tree by looking the computed label of the tree, and emit the target
code during that traversal.
–For a binary operator, we evaluate the hardest operand first, then
we evaluate the other operand.
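The labeling step (Sethi–Ullman numbering) can be sketched as follows (a sketch assuming the target machine above, where a right operand may come directly from memory; trees are nested (op, left, right) tuples and leaves are names):

```python
def label(node, is_left=True):
    # Fewest registers to evaluate the subtree with no stores of
    # intermediate results.  A left leaf must be loaded into a register
    # (label 1); a right leaf can be a memory operand (label 0).
    if isinstance(node, str):
        return 1 if is_left else 0
    op, left, right = node
    l = label(left, True)
    r = label(right, False)
    # Unequal labels: evaluate the harder operand first and reuse
    # registers.  Equal labels: one extra register is needed.
    return max(l, r) if l != r else l + 1

# (a - b) + (e - (c + d)) can be evaluated with 2 registers.
tree = ('+', ('-', 'a', 'b'), ('-', 'e', ('+', 'c', 'd')))
print(label(tree))
# 2
```

Code generation then traverses the tree, evaluating the operand with the larger label first, exactly as the two-step scheme above describes.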

20. Define code generation

 This code generation algorithm takes a basic block of three-address
codes, and produces machine codes for our target architecture.
 For each three-address code we perform certain actions.
 We assume that we have getreg routine to determine the location
of the result of the operation.

1. What are the different storage organization strategies? Explain

Storage for Temporaries


If two temporaries are not live at the same time, we can pack these
temporaries into the same location.
We can use next-use information to pack temporaries.
t1 := a*a t1 := a*a
t2 := a*b t2 := a*b
t3 := 2*t2  t2 := 2*t2
t4 := t1+t3 t1 := t1+t2
t5 := b*b t2 := b*b
t6 := t4+t5 t1 := t1+t2

Simple Code Generator

 For simplicity, we will assume that for each intermediate code


operator we have a corresponding target code operator.
 We will also assume that computed results can be left in registers
as long as possible.
o –If the register is needed for another computation, the value
in the register must be stored.
o –Before we leave a basic block, everything must be stored in
memory locations.
 We will try to produce reasonable code for a given basic block.
 The code-generation algorithm will use descriptors keep track of
register contents and addresses for names.

Register Descriptors
 A register descriptor keeps track of what is currently in each
register.
 It will be consulted when a new register is needed by the
code-generation algorithm.
 We assume that all registers are initially empty before we enter
into a basic block. This is not true if the registers are assigned
across blocks.
 At a certain time, each register descriptor will hold zero or more
names.
R1 is empty
MOV a,R1
R1 holds a
MOV R1,b
R1 holds both a and b

Address Descriptors
 An address descriptor keeps track of the locations where the
current value of a name can be found at run-time.
 The location can be a register, a stack location or a memory
location (in static area). The location can be a set of these.
 This information can be stored in the symbol table.

a is in the memory
MOV a,R1
a is in R1 and in the memory
MOV R1,b
b is in R1 and in the memory

A Code Generation Algorithm

 This code generation algorithm takes a basic block of three-address
codes, and produces machine codes for our target architecture.
 For each three-address code we perform certain actions.
 We assume that we have getreg routine to determine the location
of the result of the operation.
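The descriptor bookkeeping can be sketched in Python (a heavily simplified sketch: this getreg only reuses a register already holding the operand or takes an empty one, and register spilling and next-use information are omitted):

```python
class CodeGen:
    def __init__(self, nregs=2):
        # Register descriptor: which names each register currently holds.
        self.reg = {f'R{i}': set() for i in range(nregs)}
        # Address descriptor: where each name's current value lives.
        self.addr = {}
        self.code = []

    def getreg(self, y):
        # Prefer a register that already (and only) holds y; else empty.
        for r, names in self.reg.items():
            if names == {y}:
                return r
        for r, names in self.reg.items():
            if not names:
                return r
        raise NotImplementedError('spilling not sketched')

    def gen(self, x, op, y, z):        # three-address code: x := y op z
        r = self.getreg(y)
        if y not in self.reg[r]:       # load y only if not already there
            self.code.append(f'MOV {y},{r}')
            self.reg[r].add(y)
        self.code.append(f'{op} {z},{r}')
        self.reg[r] = {x}              # r now holds x, no longer y
        self.addr[x] = {r}

cg = CodeGen()
cg.gen('t1', 'SUB', 'a', 'b')
cg.gen('t2', 'ADD', 't1', 'c')       # t1 already in R0: no reload
print(cg.code)
# ['MOV a,R0', 'SUB b,R0', 'ADD c,R0']
```

Because the register descriptor shows t1 already in R0, the second statement emits no load, which is precisely the saving the descriptors exist to find.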
2. Explain about activation tree

Each execution of a procedure is referred to as an activation of the
procedure. If the procedure is recursive, several of its activations may be
alive at the same time.
The lifetime of an activation of a procedure P is the sequence of
steps between the first and the last step in the execution of the procedure
body, including time spent executing procedures called by P.
If a and b are procedure activations, their lifetimes are either
non-overlapping or nested; an activation of a procedure can begin before
an earlier activation of the same procedure has ended only if the
procedure is recursive. An activation tree depicts the way control enters
and leaves activations.
In an activation tree:
i) Each node represents an activation of a
procedure.
ii) The root represents the activation of the main
program.
iii) The node for a is the parent of the node for b
if and only if control flows from activation a to b.
iv) The node for a is to the left of the node for b if
and only if the lifetime of a occurs before the
lifetime of b.
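The four properties can be illustrated by building a small activation tree from a call trace (a sketch; the quicksort-style trace and the node layout are assumed examples, and the stack below plays the role of the control stack):

```python
class Activation:
    def __init__(self, name):
        self.name = name
        self.children = []     # left-to-right order = order of lifetimes

stack = []                     # control stack: activations currently alive

def enter(name):
    node = Activation(name)
    if stack:
        stack[-1].children.append(node)   # parent = calling activation
    stack.append(node)
    return node

def leave():
    stack.pop()

# main calls quicksort(1,9), which makes two recursive calls.
root = enter('main')
enter('quicksort(1,9)')
enter('quicksort(1,3)'); leave()
enter('quicksort(5,9)'); leave()
leave()
leave()

print([c.name for c in root.children[0].children])
# ['quicksort(1,3)', 'quicksort(5,9)']
```

At any instant, the control stack holds exactly the path from the root to the activation currently executing, matching the depth-first traversal described above.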

3. Write short note


a) copy propagation
b) dead code elimination
c) constant folding

a) Copy Propagation
After a copy statement x := y, later uses of x can be replaced by uses
of y (as long as neither x nor y is reassigned in between); this often
turns the original copy into dead code.

b) Dead-Code Elimination

 We say that x is dead at a certain point, if it is not used after that


point in the block (or in the following blocks).
 If x is dead at the point of the statement x := y op z, this statement
can be safely eliminated without changing the meaning of the
block
c) Constant folding:
Constant folding is the compile-time evaluation of expressions whose
operands are known constants (for example, replacing 2*3.14 by 6.28).

Contiguous Evaluation of Expression Trees

 The contiguous evaluation of a tree is:


o –first evaluate left sub-tree , then evaluate right sub-tree, and
the root.
o –Or, first evaluate the right sub-tree, then evaluate the left
sub-tree, and finally the root.
 In non-contiguous evaluations, we may mix the evaluations of the
sub-trees.
 For any given machine-language program P (for register machines)
to evaluate an expression tree T, we can find an equivalent
program Q such that:
 1. Q does not have higher cost than P,
 2. Q uses no more registers than P, and
 3. Q evaluates the tree in a contiguous fashion.
 –This means that every expression tree can be evaluated optimally
by a contiguous program.

Example
 Assume that we have the following machine codes, and the cost of
each of them is one unit.
o –mov M,Ri
o –mov Ri,M
o –mov Ri,Rj
o –OP M,Ri
o –OP Rj,Ri
 Assume that we have only two registers R0 and R1.
 First, we have to evaluate cost arrays for the tree.
