Compiler Design Notes (All Modules)

PART – A
UNIT – 1
Introduction, Lexical analysis: Language processors; The structure of a Compiler; The
evolution of programming languages; The science of building a Compiler; Applications
of compiler technology; Programming language basics.
Lexical analysis: The Role of Lexical Analyzer; Input Buffering; Specifications of
Tokens; Recognition of Tokens.
UNIT – 2: 6 Hours
Syntax Analysis – 1: Introduction; Context-free Grammars; Writing a Grammar; Top-down Parsing; Bottom-up Parsing.
UNIT – 3: 6 Hours
Syntax Analysis – 2: Top-down Parsing; Bottom-up Parsing.
UNIT – 4: 6 Hours
Syntax Analysis – 3: Introduction to LR Parsing: Simple LR; More powerful LR parsers
(excluding efficient construction and compaction of parsing tables); Using ambiguous
grammars; Parser Generators.
PART – B
UNIT – 5: 7 Hours
Syntax-Directed Translation: Syntax-directed definitions; Evaluation orders for SDDs;
Applications of syntax-directed translation; Syntax-directed translation schemes.
UNIT – 6: 6 Hours
Reference Books:
1. Charles N. Fischer, Richard J. LeBlanc, Jr.: Crafting a Compiler with C, Pearson
Education, 1991.
2. Andrew W. Appel: Modern Compiler Implementation in C, Cambridge University
Press, 1997.
3. Kenneth C. Louden: Compiler Construction: Principles and Practice, Cengage Learning,
1997.
UNIT – 1 INTRODUCTION
UNIT – 3 SYNTAX ANALYSIS – 2
PART B
UNIT – 6 INTERMEDIATE CODE GENERATION
Preliminaries Required
• Basic knowledge of programming languages.
• Basic knowledge of DFA and NFA (FAFL concepts).
• Knowledge of a high-level programming language for the programming assignments.
Textbook:
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman,
“Compilers: Principles, Techniques, and Tools”,
Addison-Wesley, 2nd Edition.
Course Outline
• Introduction to Compiling
• Lexical Analysis
• Syntax Analysis
– Context-Free Grammars
– Top-Down Parsing, LL Parsing
– Bottom-Up Parsing, LR Parsing
• Syntax-Directed Translation
– Attribute Definitions
– Evaluation of Attribute Definitions
• Semantic Analysis, Type Checking
• Run-Time Organization
• Intermediate Code Generation
• The science of building a Compiler;
• Applications of compiler technology;
• Programming language basics.
• Lexical analysis: The Role of Lexical Analyzer;
• Input Buffering;
• Specifications of Tokens;
• Recognition of Tokens.
If the target program is executable machine language, it can then be run directly on its inputs:
Fig: input → target program → output

INTERPRETER
Pictorial representation of an Interpreter:
Fig: source program + input → interpreter → output (a hybrid scheme first translates the source into an intermediate program, which a virtual machine then executes)

COMPILERS
• A compiler is a program that takes a program written in a source language and
translates it into an equivalent program in a target language, reporting any error
messages it finds in the source program.
Other Applications
• Many of the techniques used in compiler design are useful for problems outside
compiler design.
• A symbolic equation solver takes an equation as input; such a program must parse
the given input equation.
– Most of the techniques used in compiler design can be used in Natural
Language Processing (NLP) systems.
Major Parts of Compilers
Fig: source program → Analysis → Synthesis → target program
• There are two major parts of a compiler: Analysis and Synthesis
• In analysis phase, an intermediate representation is created from the given source
program.
– Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of
this phase.
• In synthesis phase, the equivalent target program is created from this intermediate
representation.
– Intermediate Code Generator, Code Generator, and Code Optimizer are
the parts of this phase.
Fig: the phases of a compiler, with the error handler and the symbol table shared by all phases.
• Each phase transforms the source program from one representation
into another representation.
• The phases communicate with the symbol table.
Compiler v/s interpreter
• Compiler: source program → compiler → target program (translation happens
before execution; the interpreter, by contrast, executes the source directly).
Example: the statement <id,1> = <id,2> + <id,3> * 60 as it passes through the phases.

Syntax Analyzer (parse tree):
            =
    <id,1>      +
          <id,2>     *
                <id,3>   60

Semantic Analyzer (inserts the inttofloat coercion):
            =
    <id,1>      +
          <id,2>     *
                <id,3>   inttofloat
                              60

Intermediate Code Generator:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Code Generator (target code):
LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1
Lexical Analyzer
• The lexical analyzer reads the source program character by character and returns the
tokens of the source program.
• The lexical analyzer is also called a scanner.
• It reads the stream of characters making up the source program and groups the
characters into meaningful sequences called lexemes.
• For each lexeme, it produces as output a token of the form
<token_name, attribute_value>.
• It puts information about identifiers into the symbol table; the token itself carries
a pointer into the table rather than all the attributes.
• Regular expressions are used to describe tokens (lexical constructs).
Syntax Analyzer
A syntax analyzer creates the syntactic structure (generally a parse tree) of the
given program.
A syntax analyzer is also called a parser.
A parse tree describes a syntactic structure. Ex: for newval = oldval + 12:
          =
   newval     +
         oldval   12
4. E → E * E
5. E → (E)
• where E is an expression.
The syntax analyzer checks, for example:
by rule 1, newval is an expression.
Parsing Techniques
• Depending on how the parse tree is created, there are different parsing techniques.
• These parsing techniques are categorized into two groups:
– Top-Down Parsing, Bottom-Up Parsing
• Top-Down Parsing:
– Construction of the parse tree starts at the root and proceeds towards the
leaves.
– Efficient top-down parsers can be easily constructed by hand.
– Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL
Parsing).
• Bottom-Up Parsing:
– Construction of the parse tree starts at the leaves and proceeds towards
the root.
– Normally, efficient bottom-up parsers are created with the help of some
software tools.
– Bottom-up parsing is also known as shift-reduce parsing.
– Operator-Precedence Parsing – simple, restrictive, easy to implement.
– LR Parsing – a much more general form of shift-reduce parsing: LR, SLR, LALR.
Semantic Analyzer
• A semantic analyzer checks the source program for semantic errors and collects
the type information for the code generation.
• It determines the meaning of the source string, e.g.:
– matching of parentheses,
– matching of if-else statements.
• The type of the identifier newval must match the type of the
expression (oldval + 12).
Intermediate Code Generation
• A compiler may produce explicit intermediate code representing the source
program.
• Intermediate code is a machine-independent kind of code that is:
– easy to generate, and
– easily converted to target code.
• It can be three-address code, quadruples, triples, etc.
• Ex:
newval = oldval + 1
id1 = id2 + 1
Symbol-table implementation: suppose there are n entries in the table and e enquiries
are made to fetch information from it. If n and e are large, the linear-list method
performs poorly; a hash table performs better than the list method.
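For instance (illustrative numbers, not from the text): with n = 1,000 entries and e = 10,000 enquiries, a linear list costs on the order of e·n/2 = 5,000,000 comparisons, while a hash table with, say, 211 buckets costs roughly e·n/(2·211) ≈ 24,000 comparisons, assuming the entries spread evenly over the buckets.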
Grouping of phases into passes
• In an implementation, activities from several phases may be grouped together
into a pass that reads an input file and writes an output file.
• For example,
o the front-end phases of lexical analysis, syntax analysis, semantic analysis,
and intermediate code generation might be grouped together into one pass;
o code optimization might be an optional pass;
o then there could be a back-end pass consisting of code generation for a
particular target machine.
CLASSIFICATION OF LANGUAGES
1. Based on generation:
a) First-generation languages – machine languages.
b) Second-generation languages – assembly languages.
c) Third-generation languages – higher-level languages.
d) Fourth-generation languages – designed for specific applications, like SQL for
database applications.
e) Fifth-generation languages – applied to logic- and constraint-based languages
like Prolog and OPS5.
Imperative languages:
languages in which a program specifies how a computation is to be done.
e.g.: C, C++.
Declarative languages:
languages in which a program specifies what computation is to be done.
e.g.: Prolog.
The science of building a compiler
• Compiler design deals with complicated real-world problems.
• First the problem is taken.
• A mathematical abstraction of the problem is formulated.
• The abstraction is solved using mathematical techniques.
Compiler optimizations – design objectives
• Optimization must improve the performance of many programs.
• Optimization must be correct.
• Compilation time must be kept reasonable.
• The engineering effort required must be manageable.
APPLICATIONS OF COMPILER TECHNOLOGY
IMPLEMENTATION OF HIGH-LEVEL PROGRAMMING LANGUAGES
• The programmer expresses an algorithm using the language, and the compiler must
translate that program to the target language.
• Generally, high-level languages are easier to program in, but are less efficient, i.e.,
the target program runs more slowly.
• Programmers using a low-level language have more control over a computation and
can produce more efficient code.
• Unfortunately, low-level programs are harder to write and, still worse, less portable,
more prone to errors and harder to maintain.
• Optimizing compilers include techniques to improve the performance of generated
code, thus offsetting the inefficiency introduced by high-level abstractions.
PARALLELISM
• Parallelism can be found at several levels: at the instruction level, where multiple
operations are executed simultaneously, and at the processor level, where different
threads of the same application are run on different processors.
MEMORY HIERARCHIES
• Memory hierarchies are a response to the basic limitation that we can build very
fast storage or very large storage, but not storage that is both fast and large.
• A memory hierarchy consists of several levels of storage with different speeds and
sizes.
• A processor usually has a small number of registers consisting of hundreds of bytes,
several levels of caches containing kilobytes to megabytes, and finally secondary
storage that contains gigabytes and beyond.
• Correspondingly, the speed of accesses between adjacent levels of the hierarchy
can differ by two or three orders of magnitude.
• The performance of a system is often limited not by the speed of the processor but
by the performance of the memory subsystem.
• While compilers traditionally focus on optimizing the processor execution, more
emphasis is now placed on making the memory hierarchy more effective.
DESIGN OF NEW COMPUTER ARCHITECTURES
• In modern computer architecture development, compilers are developed in the
processor-design stage, and compiled code, running on simulators, is used to
evaluate the proposed architectural design.
• One of the best-known examples of how compilers influenced the design of computer
architecture was the invention of the RISC (reduced instruction-set computer)
architecture.
PROGRAM TRANSLATIONS
Normally we think of compiling as a translation of a high-level language to machine-level
language, but the same technology can be applied to translate between different kinds of
languages.
The following are some of the important applications of program translation techniques:
BINARY TRANSLATION
• Compiler technology can translate the binary code of one machine to that of
another, and companies have used it to increase the availability of software
for their machines.
HARDWARE SYNTHESIS
• Not only is most software written in high-level languages; even hardware designs
are mostly described in high-level hardware description languages like Verilog and
VHDL (Very High Speed Integrated Circuit Hardware Description Language).
• Hardware designs are typically described at the register transfer level (RTL).
• Hardware synthesis tools translate RTL descriptions automatically into gates,
which are then mapped to transistors and eventually to a physical layout. Unlike
compilers for programming languages, this process takes long hours to optimize
the circuits.
DATABASE QUERY INTERPRETERS
• Query languages like SQL are used to search databases.
• These database queries consist of relational and Boolean operators.
• They can be compiled into commands to search a database for records satisfying
the query.
Example (scope of a declaration):
{
    int i;
    .....
    i = 3;
}
x = i + 1;   /* i is no longer in scope here */
Static Scope and Block Structure
Most languages, including C and its family, use static scope. The scope rules for
C are based on program structure: the scope of a declaration is determined implicitly
by where the declaration appears in the program. Languages such as C++ and Java also
provide explicit control over scopes through keywords like public, private, and protected.
Static-scope rules for a language with blocks – groupings of declarations and
statements – e.g., C uses braces:
main()
{
    int a = 1;
    int b = 1;
    {
        int b = 2;
        {
            int a = 3;
            cout << a << b;   /* prints 3 2 */
        }
        {
            int b = 4;
            cout << a << b;   /* prints 1 4 */
        }
        cout << a << b;       /* prints 1 2 */
    }
    cout << a << b;           /* prints 1 1 */
}
Fig: the lexical analyzer sends tokens to the parser on demand:
source program → Lexical Analyzer —token→ Parser → to semantic analysis
(the parser requests "get next token"; both communicate with the Symbol Table)
– how to implement the symbol table (for identifiers), and what kinds of operations:
• hash table – open addressing, chaining;
• putting a token into the hash table; finding the position of a token from
its lexeme.
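A minimal sketch in C of such a symbol table using hashing with chaining (the names NBUCKETS, Entry and lookupInsert are illustrative assumptions, not from the text):

#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211                      /* a prime number of buckets  */

struct Entry {
    char *lexeme;                         /* the identifier's text      */
    int   token;                          /* e.g. ID or a keyword code  */
    struct Entry *next;                   /* chaining on collisions     */
};
static struct Entry *bucket[NBUCKETS];

static unsigned hash(const char *s) {     /* simple multiplicative hash */
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns the entry for lexeme, inserting it with the given token code
 * if not present (so keywords can be pre-installed before lexing).     */
struct Entry *lookupInsert(const char *lexeme, int token) {
    unsigned h = hash(lexeme);
    for (struct Entry *e = bucket[h]; e; e = e->next)
        if (strcmp(e->lexeme, lexeme) == 0) return e;
    struct Entry *e = malloc(sizeof *e);
    e->lexeme = malloc(strlen(lexeme) + 1);
    strcpy(e->lexeme, lexeme);
    e->token = token;
    e->next = bucket[h];
    bucket[h] = e;
    return e;
}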
Token: describes the class or category of an input string. For example, identifiers,
keywords and constants are tokens.
Pattern: the set of rules that describes a token.
Lexeme: a sequence of characters in the source program that matches the pattern of a
token. Eg: int, i, num, ans etc.
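As a minimal illustration (the names are assumptions, not from the text), a token of the form <token_name, attribute_value> can be represented in C as:

enum TokenName { ID, NUMBER, IF, THEN, ELSE, RELOP };

struct Token {
    enum TokenName name;   /* the token class (which pattern matched)      */
    int attribute;         /* symbol-table index for id/number, or an      */
};                         /* operator code such as LT/LE for relop        */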
There are three general ways to implement a lexical analyzer:
• Using a lexical-analyzer generator such as Lex (easiest to implement).
• Writing it in a conventional programming language.
• Writing it in assembly language and explicitly managing the input
(hardest to implement, but most efficient).
List out the lexemes and tokens in the following example:
1.
int Max(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

Lexeme   Token
int      Keyword
Max      Identifier
(        Operator
a        Identifier
,        Operator
.        .
.        .
2. Examples of tokens
Token     Informal description                     Sample lexemes
if        characters i, f                          if
id        letter followed by letters and digits
number    any numeric constant                     3.14, 0.6, 20
literal   anything surrounded by " "               "total= %d\n", "core dumped"

In FORTRAN (where blanks are insignificant):
DO 5 I = 1.25    here DO5I is a lexeme (an assignment)
DO 5 I = 1, 25   here DO begins a do-statement
This is a disadvantage for the lexical analyzer: it cannot tell whether DO is a
keyword until it reaches the . or the ,.
Lexical analysis v/s parsing (reasons for separating them):
• Simplicity of design is the most important consideration.
• Compiler efficiency is improved:
– we can apply specialized techniques that serve only the lexical task;
– specialized buffering techniques for reading input characters can speed up the
compiler.
• Compiler portability is enhanced:
– input-device-specific peculiarities can be restricted to the lexical analyzer.
Input Buffering
To recognize tokens, the lexical analyzer reads the source program from disk.
Accessing the disk for each character is time-consuming, so special buffering
techniques have been developed to reduce the amount of overhead required.
With a buffer pair, each buffer half ends with a sentinel character eof. When the
forward pointer hits a sentinel, either a buffer half must be reloaded or the input
has truly ended:

switch (*forward++)
{
case eof:
    if (forward is at end of first buffer)
    {   reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer)
    {   reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters
}
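A concrete rendering of this two-buffer scheme in C — a sketch only; BUFSZ, fillBuffer, the sentinel choice and the initialization convention are assumptions, not from the text:

#include <stdio.h>

#define BUFSZ 4096
#define SENTINEL '\0'       /* stands in for the special eof character    */

/* One array holding both halves; each half ends with a sentinel:
 * [ BUFSZ data | sentinel | BUFSZ data | sentinel ]                      */
static char buf[2 * BUFSZ + 2];
static char *forward = buf;
static FILE *src;           /* caller opens src and calls fillBuffer(0)   */

static void fillBuffer(int half) {          /* reload one buffer half     */
    char *base = buf + half * (BUFSZ + 1);
    size_t n = fread(base, 1, BUFSZ, src);
    base[n] = SENTINEL;     /* marks end of half, or true end of input    */
}

int nextChar(void) {
    char c = *forward++;
    if (c != SENTINEL) return (unsigned char)c;
    if (forward == buf + BUFSZ + 1) {       /* at end of first half       */
        fillBuffer(1);
        forward = buf + BUFSZ + 1;
        return nextChar();
    }
    if (forward == buf + 2 * (BUFSZ + 1)) { /* at end of second half      */
        fillBuffer(0);
        forward = buf;
        return nextChar();
    }
    return EOF;             /* sentinel inside a half: end of input       */
}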
Terminology of Languages
• Alphabet: a finite set of symbols (e.g., the ASCII characters).
• String:
– a finite sequence of symbols over an alphabet;
– "sentence" and "word" are also used in place of "string";
– ε is the empty string.
Example
• L1 = {a,b,c,d}   L2 = {1,2}
• L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
• L1 ∪ L2 = {a,b,c,d,1,2}
• L1³ = all strings of length three over {a,b,c,d}
Regular Expressions
• A regular expression r denotes a language L(r). Rules:
ε             denotes {ε}
a             denotes {a}, for each symbol a in Σ
(r1) | (r2)   denotes L(r1) ∪ L(r2)
(r1) (r2)     denotes L(r1) L(r2)
(r)*          denotes (L(r))*
(r)           denotes L(r)
• (r)+ = (r)(r)*
• (r)? = (r) | ε
Regular Expressions (cont.)
• We may remove parentheses by using precedence rules:
– * highest
– concatenation next
– | lowest
• ab*|c means (a(b)*)|(c)
• Ex: Σ = {0,1}
– 0|1 => {0,1}
– (0|1)(0|1) => {00,01,10,11}
– 0* => {ε, 0, 00, 000, 0000, ...}
– (0|1)* => all strings of 0s and 1s, including the empty string
Regular Definitions
• We can give names to regular expressions, and we can use these names
as symbols to define other regular expressions.
• A regular definition is a sequence of definitions of the form:
d1 → r1     where di is a distinct name, and
d2 → r2     ri is a regular expression over symbols in
...         Σ ∪ {d1, d2, ..., di-1}
dn → rn
(i.e., over the basic symbols and the previously defined names)
Regular Definitions
• Ex: Identifiers in Pascal
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*
– If we try to write the regular expression representing identifiers without
using regular definitions, that regular expression will be complex:
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
• Ex: Unsigned numbers in Pascal
digit → 0 | 1 | ... | 9
digits → digit+
opt-fraction → ( . digits )?
opt-exponent → ( E (+|-)? digits )?
unsigned-num → digits opt-fraction opt-exponent
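A small C checker for the unsigned-num definition above — a hand translation of digits (. digits)? (E (+|-)? digits)?; the function names are illustrative:

#include <ctype.h>

static int digits(const char **p) {               /* digit+                */
    int n = 0;
    while (isdigit((unsigned char)**p)) { (*p)++; n++; }
    return n > 0;
}

int isUnsignedNum(const char *s) {
    if (!digits(&s)) return 0;                    /* digits                */
    if (*s == '.') { s++; if (!digits(&s)) return 0; }  /* opt-fraction    */
    if (*s == 'E') {                              /* opt-exponent (E only, */
        s++;                                      /* as in the definition) */
        if (*s == '+' || *s == '-') s++;
        if (!digits(&s)) return 0;
    }
    return *s == '\0';                            /* consumed everything   */
}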
Our current goal is to perform the lexical analysis needed for the following grammar.
Recall that the terminals are the tokens, and the non-terminals produce terminals.
digit → [0-9]
digits → digit+
number → digits (. digits)? (E[+-]? digits)?
letter → [A-Za-z]
id → letter ( letter | digit )*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>
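For instance, a hand translation of id → letter (letter | digit)* into C might look like this sketch (scanId is an illustrative name):

#include <ctype.h>

/* Scans an identifier starting at *p; advances *p past the lexeme.
 * Returns 1 on success, with the lexeme in [start, *p).                */
int scanId(const char **p) {
    const char *start = *p;
    if (!isalpha((unsigned char)**p)) return 0;   /* must start with letter */
    while (isalnum((unsigned char)**p)) (*p)++;   /* then letters or digits */
    return *p > start;
}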
Lexeme           Token     Attribute
an identifier    id        pointer to table entry
a number         number    pointer to table entry
<                relop     LT
<=               relop     LE
=                relop     EQ
<>               relop     NE
>                relop     GT
>=               relop     GE
On the board, show how this can be done with just REs.
We also want the lexer to remove whitespace, so we define a new token
ws → ( blank | tab | newline )+
Recall that the lexer will be called by the parser when the latter needs a new token. If the
lexer then recognizes the token ws, it does not return it to the parser but instead goes on
to recognize the next token, which is then returned. Note that you can't have two
consecutive ws tokens in the input because, for a given token, the lexer will match the
longest lexeme starting at the current position that yields this token. The table above
summarizes the situation.
For the parser, all the relational operators are to be treated the same, so they are all the
same token, relop. Naturally, other parts of the compiler, for example the code generator,
will need to distinguish between the various relational operators so that appropriate code
is generated. Hence, they have distinct attribute values.
Specification of Tokens
To specify tokens, regular expressions are used.
Recognition of Tokens
To recognize tokens, transition diagrams are drawn and then translated into code.
Transition Diagrams
A transition diagram is similar to a flowchart for (a part of) the lexer. We draw one for
each possible token. It shows the decisions that must be made based on the input seen.
The two main components are circles representing states (think of them as decision
points of the lexer) and arrows representing edges (think of them as the decisions made).
1. The double circles represent accepting or final states, at which point a lexeme has
been found. There is often an action to be done (e.g., returning the token), which
is written to the right of the double circle.
2. If we have moved one (or more) characters too far in finding the token, one (or
more) stars are drawn.
3. An imaginary start state exists and has an arrow coming from it to indicate where
to begin the process.
It is fairly clear how to write code corresponding to this diagram. You look at the first
character; if it is <, you look at the next character. If that character is =, you return
(relop, LE) to the parser. If instead that character is >, you return (relop, NE). If it is
another character, return (relop, LT) and adjust the input buffer so that you will read this
character again, since you have not used it for the current lexeme. If the first character
was =, you return (relop, EQ).
• Each state represents a condition that could occur during the process of
scanning the input, looking for a lexeme that matches one of several
patterns.
• Edges are directed from one state of the transition diagram to another.
Fig: start → state 1 —<→ state 2 ... (the transition diagram for relop)
We will continue to assume that the keywords are reserved, i.e., may not be used as
identifiers. (What if this is not the case, as in PL/I, which had no reserved words? Then
the lexer does not distinguish between keywords and identifiers, and the parser must.)
We will use the method mentioned in the last chapter and have the keywords installed into
the identifier table prior to any invocation of the lexer. The table entry will indicate that
the entry is a keyword.
gettoken() examines the lexeme and returns the token name: either id or a name
corresponding to a reserved keyword.
The text also gives another method to distinguish between identifiers and keywords.
Completion of the Running Example
So far we have transition diagrams for identifiers (this diagram also handles keywords)
and for the relational operators. What remain are whitespace and numbers, which are
respectively the simplest and most complicated diagrams seen so far.
Recognizing Whitespace
The diagram itself is quite simple, reflecting the simplicity of the corresponding regular
expression.
• The delim in the diagram represents any of the whitespace characters, say space,
tab, and newline.
• The final star is there because we needed to find a non-whitespace character in
order to know when the whitespace ends and this character begins the next token.
• There is no action performed at the accepting state. Indeed, the lexer does not
return to the parser, but starts again from its beginning as it still must find the next
token.
Recognizing Numbers
This certainly looks formidable, but it is not that bad; it follows from the regular
expression.
In class, go over the regular expression and show the corresponding parts in the diagram.
When an accepting state is reached, action is required but is not shown on the diagram.
Just as identifiers are stored in an identifier table and a pointer is returned, there is a
corresponding number table in which numbers are stored. These numbers are needed
when code is generated. Depending on the source language, we may wish to indicate in
the table whether this is a real or an integer. A similar, but more complicated, transition
diagram could be produced if the language permitted complex numbers as well.
The idea is that we write a piece of code for each decision diagram. I will show the one
for relational operators below. This piece of code contains a case for each state, which
typically reads a character and then goes to the next case depending on the character read.
The numbers in the circles are the names of the cases.
Accepting states often need to take some action and return to the parser. Many of these
accepting states (the ones with stars) need to restore one character of input. This is called
retract() in the code.
What should the code for a particular diagram do if at one state the character read is not
one of those for which a next state has been defined? That is, what if the character read is
not the label of any of the outgoing arcs? This means that we have failed to find the token
corresponding to this diagram.
The code calls fail(). This is not an error case. It simply means that the current input does
not match this particular token. So we need to go to the code section for another diagram,
after restoring the input pointer so that we start the next diagram at the point where this
failing diagram started. If we have tried all the diagrams, then we have a real failure and
need to print an error message and perhaps try to repair the input.
Note that the order in which the diagrams are tried is important: if the input matches more
than one token, the first one tried will be chosen.
while (true)
    switch (state) {
    case 0: c = nextChar();
        if (c == '<') state = 1;
        else if (c == '=') state = 5;
        else if (c == '>') state = 6;
        else fail();          /* lexeme is not a relop */
        break;
    /* cases for the remaining states follow the same pattern */
    }
Alternative ways of arranging the diagrams:
1. Unlike the method above, which tries the diagrams one at a time, the first new
method tries them in parallel. That is, each character read is passed to each
diagram (that hasn't already failed). Care is needed when one diagram has
accepted the input, but others still haven't failed and may accept a longer prefix of
the input.
2. The final possibility discussed, which appears to be promising, is to combine all
the diagrams into one. That is easy for the example we have been considering, because
all the diagrams begin with different characters being matched. Hence we just have
one large start state with multiple outgoing edges. It is more difficult when there is a
character that can begin more than one diagram.
UNIT – 2 topics:
• Syntax Analysis – 1: Introduction; Context-free Grammars; Writing a Grammar;
• Top-down Parsing;
• Bottom-up Parsing.
Introduction
• The syntax analyzer creates the syntactic structure of the given source program.
• This syntactic structure is mostly a parse tree.
• The syntax analyzer is also known as the parser.
• The syntax of a programming language is described by a context-free grammar (CFG).
We will use BNF (Backus-Naur Form) notation in the description of CFGs.
• The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not:
– if it satisfies, the parser creates the parse tree of that program;
– otherwise, the parser gives error messages.
• A context-free grammar:
– gives a precise syntactic specification of a programming language;
– the design of the grammar is an initial phase of the design of a compiler;
– a grammar can be directly converted into a parser by some tools.
• The parser works on a stream of tokens; the smallest item is a token.
Fig: position of the parser in the compiler model (lexical analyzer —token→ parser →
rest of front end, all communicating with the symbol table).
• We categorize the parsers into two groups:
1. Top-Down Parser
– the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
– the parse tree is created bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right (one
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for sub-
classes of context-free grammars:
– LL for top-down parsing,
– LR for bottom-up parsing.
Syntax Error Handling
• Common programming errors can occur at many different levels:
1. Lexical errors: include misspellings of identifiers, keywords, or operators.
2. Syntactic errors: include misplaced semicolons or extra or missing braces.
Error-Recovery Strategies
• Panic-Mode Recovery
• Phrase-Level Recovery
• Error Productions
• Global Correction
Panic-Mode Recovery
• On discovering an error, the parser discards input symbols one at a time until one
of a designated set of synchronizing tokens is found.
• Synchronizing tokens are usually delimiters,
e.g., a semicolon or }, whose role in the source program is clear and unambiguous.
• It often skips a considerable amount of input without checking it for additional
errors.
Advantages:
– simplicity;
– it is guaranteed not to go into an infinite loop.
Phrase-Level Recovery
• A parser may perform local correction on the remaining input, i.e.,
it may replace a prefix of the remaining input by some string that allows the parser to
continue.
Ex: replace a comma by a semicolon; insert a missing semicolon.
• Local correction is left to the compiler designer.
Error Productions
• We can augment the grammar for the language at hand with productions that
generate the erroneous constructs.
• Then we can use the grammar augmented by these error productions to
construct a parser.
• If an error production is used by the parser, we can generate appropriate error
diagnostics to indicate the erroneous construct that has been recognized in the
input.
Global Correction
• We use algorithms that perform a minimal sequence of changes to obtain a globally
least-cost correction.
• Given an incorrect input string x and a grammar G, these algorithms will find a
parse tree for a related string y, such that the number of insertions, deletions and
changes of tokens required to transform x into y is as small as possible.
• It is too costly to implement in terms of time and space, so these techniques are
only of theoretical interest.
Context-Free Grammars
• Inherently recursive structures of a programming language are defined by a
context-free grammar.
• In a context-free grammar, we have:
– A finite set of terminals (in our case, this will be the set of tokens).
– A finite set of non-terminals (syntactic variables).
– A finite set of production rules in the following form:
A → α    where A is a non-terminal and α is a string of terminals and
non-terminals (including the empty string).
– A start symbol (one of the non-terminal symbols).
NOTATIONAL CONVENTIONS
1. Symbols used for terminals are:
– lower-case letters early in the alphabet (such as a, b, c, ...);
– operator symbols (such as +, *, ...);
– punctuation symbols (such as parentheses, comma and so on);
– the digits (0...9);
– boldface strings and keywords (such as id or if), each of which represents
a single terminal symbol.
2. Symbols used for non-terminals are:
– upper-case letters early in the alphabet (such as A, B, C);
– the letter S, which usually stands for the start symbol;
– lower-case italic names (such as expr or stmt).
Derivations
E ⇒ E+E
• E+E derives from E:
– we can replace E by E+E;
– to be able to do this, we have to have a production rule E → E+E in our
grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
• A sequence of replacements of non-terminal symbols is called a derivation of
id+id from E.
• In general, a derivation step is
αAβ ⇒ αγβ    if there is a production rule A → γ in our grammar,
where α and β are arbitrary strings of terminal and non-
terminal symbols.
α1 ⇒ α2 ⇒ ... ⇒ αn   (αn derives from α1, or α1 derives αn)
⇒  : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
CFG – Terminology
• L(G) is the language of G (the language generated by G), which is a set of
sentences.
• A sentence of L(G) is a string of terminal symbols of G.
• If S is the start symbol of G, then
ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G.
• If G is a context-free grammar, L(G) is a context-free language.
• Two grammars are equivalent if they produce the same language.
• S ⇒* α:
– if α contains non-terminals, it is called a sentential form of G;
– if α does not contain non-terminals, it is called a sentence of G.
Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
• At each derivation step, we can choose any of the non-terminals in the sentential
form of G for the replacement.
• If we always choose the left-most non-terminal in each derivation step, this
derivation is called a left-most derivation.
• If we always choose the right-most non-terminal in each derivation step, this
derivation is called a right-most derivation.
Left-Most and Right-Most Derivations
Left-most derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Right-most derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
• We will see that top-down parsers try to find the left-most derivation of the
given source program.
• We will see that bottom-up parsers try to find the right-most derivation of the
given source program in the reverse order.
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a derivation.
Ex: the parse tree for -(id+id):
        E
       - E
        ( E )
        E + E
        id  id
Problems on derivation of a string with parse tree:
1. Consider the grammar
S → (L) | a
L → L,S | S
i. What are the terminals, non-terminals and the start symbol?
ii. Find the parse tree for the following sentences:
a. (a,a)
b. (a, (a, a))
c. (a, ((a,a), (a,a)))
iii. Construct the LMD and RMD for each.
2. Do the above steps for the grammar S → aS | aSbS | ε for the string “aaabaab”.
Ambiguity
• A grammar that produces more than one parse tree for some sentence is called an
ambiguous grammar.
• For most parsers, the grammar must be unambiguous.
• An unambiguous grammar gives a unique selection of the parse tree for a sentence.
• We should eliminate the ambiguity in the grammar during the design phase of the
compiler.
• An ambiguous grammar should be rewritten to eliminate the ambiguity.
• We have to prefer one of the parse trees of a sentence (generated by an ambiguous
grammar) to disambiguate that grammar, restricting it to this choice.
EG: Ambiguity (cont.)
stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts
The sentence  if E1 then if E2 then S1 else S2  has two parse trees:
1. if E1 then (if E2 then S1) else S2
2. if E1 then (if E2 then S1 else S2)
• We prefer the second parse tree (else matches the closest if).
• So, we have to disambiguate our grammar to reflect this choice.
• The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
Problems on ambiguity:
2. S → S+S | SS | (S) | S* | a    with the string (a+a)*a
3. S → aS | aSbS | ε              with the string abab
• Ambiguous grammars (because of ambiguous operators) can be disambiguated
according to the precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
disambiguate the grammar using
precedence: ^ (right to left)
            * (left to right)
            + (left to right)
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
Left Recursion
• A grammar is left recursive if it has a non-terminal A such that there is a
derivation A ⇒+ Aα for some string α.
• Immediate left recursion A → Aα | β is eliminated by rewriting it as:
A → βA’
A’ → αA’ | ε
Left-Recursion – Problem
• A grammar may not be immediately left-recursive, but it still can be
left-recursive.
• By just eliminating the immediate left recursion, we may not get
a grammar which is not left-recursive.
S → Aa | b
A → Sc | d    This grammar is not immediately left-recursive,
but it is still left-recursive:
S ⇒ Aa ⇒ Sca, or
A ⇒ Sc ⇒ Aac, causes left recursion.
• So, we have to eliminate all left recursion from our grammar.
Eliminate Left-Recursion – Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
-   for j from 1 to i-1 do {
      replace each production
        Ai → Aj γ
      by
        Ai → α1 γ | ... | αk γ
      where Aj → α1 | ... | αk
    }
-   eliminate the immediate left recursion among the Ai productions
  }
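As a tiny illustration in C of the immediate-left-recursion rewrite A → Aα | β becomes A → βA', A' → αA' | ε (string manipulation only; the function name is an assumption):

#include <stdio.h>

/* Given A -> A alpha | beta, prints the equivalent right-recursive rules. */
void eliminateImmediateLR(char A, const char *alpha, const char *beta) {
    printf("%c  -> %s%c'\n", A, beta, A);
    printf("%c' -> %s%c' | epsilon\n", A, alpha, A);
}

int main(void) {
    eliminateImmediateLR('E', "+T", "T");   /* E -> E+T | T  becomes       */
    return 0;                               /* E -> TE' ; E' -> +TE' | eps */
}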
Example 2:
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left recursion in A:
A → SdA’ | fA’
A’ → cA’ | ε
for S:
- Replace S → Aa with S → SdA’a | fA’a
So, we will have S → SdA’a | fA’a | b
- Eliminate the immediate left recursion in S:
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
A → SdA’ | fA’
A’ → cA’ | ε
Problems on left recursion:
1. S → S(S)S | ε
2. S → S+S | SS | (S) | S* | a
3. S → SS+ | SS* | a
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
5. S → (L) | a
   L → L,S | S
Left-Factoring
• A predictive parser (a top-down parser without backtracking) insists that the
grammar must be left-factored:
grammar → a new equivalent grammar suitable for predictive parsing
stmt → if expr then stmt else stmt |
       if expr then stmt
• When we see if, we cannot know which production rule to choose to rewrite stmt
in the derivation.
• In general,
A → αβ1 | αβ2    where α is non-empty and the first symbols
of β1 and β2 (if they have one) are different.
• When processing α, we cannot know whether to expand
A to αβ1 or
A to αβ2.
• But, if we rewrite the grammar as follows:
A → αA’
A’ → β1 | β2    so we can immediately expand A to αA’.
Left-Factoring – Algorithm
• For each non-terminal A with two or more alternatives (production rules) with a
common non-empty prefix, let us say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA’ | γ1 | ... | γm
A’ → β1 | ... | βn
Left-Factoring – Example 1
A → abB | aB | cdg | cdeB | cdfB
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Example 2
A → ad | a | ab | abc | b
A → aA’ | b
A’ → d | ε | b | bc
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
Problems on left factoring (do both left factoring and left-recursion elimination
where needed):
1. S → iEtS | iEtSeS | a,  E → b
2. S → S(S)S | ε
3. S → aS | aSbS | ε
4. S → SS+ | SS* | a
5. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
6. S → 0S1 | 01
7. S → S+S | SS | (S) | S* | a
8. S → (L) | a,  L → L,S | S
9. rexpr → rexpr + rterm | rterm
   rterm → rterm rfactor | rfactor
   rfactor → rfactor* | rprimary
   rprimary → a | b
• There are some language constructs in programming languages which are
not context-free. This means that we cannot write a context-free grammar for
these constructs.
• L1 = { ωcω | ω is in (a|b)* } is not context-free:
declaring an identifier and checking whether it is declared or not later.
Top-Down Parsing
– Recursive-Descent Parsing
• Backtracking is needed (if a choice of a production rule does not
work, we backtrack to try other alternatives).
• It is a general parsing technique, but not widely used.
• Not efficient.
– Predictive Parsing
• No backtracking.
• Efficient.
• Needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive
Descent parsing without backtracking.
• Non-Recursive (Table-Driven) Predictive Parsing is also known as LL(1) parsing.
Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.
S → aBc
B → bc | b
input: abc
First S → aBc is tried with B → bc, giving the tree for a bc c, which fails to match
the input abc; the parser backtracks and tries B → b, which succeeds.
Predictive Parser
• When rewriting a non-terminal in a derivation step, a predictive parser can
uniquely choose a production rule by just looking at the current symbol in the input
string.
• When we are trying to rewrite the non-terminal stmt, we can uniquely choose the
production rule by just looking at the current token.
• We eliminate the left recursion in the grammar and left-factor it. But the grammar
may still not be suitable for predictive parsing (not an LL(1) grammar).
Non-Recursive Predictive Parsing – LL(1) Parser
• Non-recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as an LL(1) parser.
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a
special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most
derivation) of the string in the input buffer.
stack
– contains the grammar symbols;
– at the bottom of the stack there is a special end-marker symbol $;
– initially the stack contains only $ and the starting symbol S;
– when the stack is emptied (i.e., only $ is left in the stack), parsing is
completed.
parsing table
– a two-dimensional array M[A, a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing tables:
– FIRST and FOLLOW
• FIRST(α) is the set of the terminal symbols which occur as first symbols in strings
derived from α, where α is any string of grammar symbols.
• If α derives ε, then ε is also in FIRST(α).
• FOLLOW(A) is the set of the terminals which occur immediately after (follow)
the non-terminal A in the strings derived from the starting symbol:
– a terminal a is in FOLLOW(A) if S ⇒* αAaβ
– $ is in FOLLOW(A) if S ⇒* αA
Compute FIRST for Any String X
• If X is a terminal symbol, then FIRST(X) = {X}.
• If X is a non-terminal symbol and X → ε is a production rule, then ε is in
FIRST(X).
• If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule:
– if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1,...,i-1, then a
is in FIRST(X);
– if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in FIRST(X).
• If X is ε, then FIRST(X) = {ε}.
• If X is a string Y1Y2...Yn, the same two rules apply:
if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1,...,i-1,
then a is in FIRST(X);
if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in FIRST(X).
Compute FOLLOW (for non-terminals)
• If S is the start symbol, then $ is in FOLLOW(S).
• If A → αBβ is a production rule, then everything in FIRST(β) except ε is in
FOLLOW(B).
• If A → αB is a production rule, or A → αBβ is a production rule and ε is in
FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any FOLLOW set.
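A minimal C sketch of that fixpoint loop, hard-wired (as an assumption, not from the text) for the toy grammar S → aBa, B → bB | ε used in the example below:

#include <stdio.h>

/* Grammar: S -> aBa ; B -> bB ; B -> (epsilon).
 * Uppercase = non-terminal, lowercase = terminal, '$' = end marker.
 * inFirst[X][t] / inFollow[A][t] are set-membership flags.             */
static const char *lhs[] = { "S", "B", "B" };
static const char *rhs[] = { "aBa", "bB", "" };   /* "" is epsilon      */
enum { NPROD = 3 };

static int inFirst[128][128], nullable[128], inFollow[128][128];

int main(void) {
    /* FIRST sets for this grammar (from the FIRST rules above):        */
    inFirst['a']['a'] = inFirst['b']['b'] = 1;    /* FIRST(t) = {t}     */
    inFirst['S']['a'] = 1;                        /* FIRST(S) = {a}     */
    inFirst['B']['b'] = 1; nullable['B'] = 1;     /* FIRST(B) = {b,eps} */

    inFollow['S']['$'] = 1;           /* rule 1: $ in FOLLOW(start)     */
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            int A = lhs[p][0];
            const char *s = rhs[p];
            for (int i = 0; s[i]; i++) {
                int B = s[i];
                if (B < 'A' || B > 'Z') continue; /* non-terminals only */
                int restNullable = 1;
                /* rule 2: FIRST of what follows B goes into FOLLOW(B)  */
                for (int j = i + 1; s[j] && restNullable; j++) {
                    for (int t = 0; t < 128; t++)
                        if (inFirst[(int)s[j]][t] && !inFollow[B][t])
                            inFollow[B][t] = changed = 1;
                    restNullable = nullable[(int)s[j]];
                }
                /* rule 3: if the rest can vanish, add FOLLOW(A) too    */
                if (restNullable)
                    for (int t = 0; t < 128; t++)
                        if (inFollow[A][t] && !inFollow[B][t])
                            inFollow[B][t] = changed = 1;
            }
        }
    }
    printf("a in FOLLOW(B): %d, $ in FOLLOW(S): %d\n",
           inFollow['B']['a'], inFollow['S']['$']);   /* prints 1, 1    */
    return 0;
}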
ww
parser looks at the parsing table entry M[X, a]. If M[X, a] holds a production
rule XoY1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 into the stack. The
parser also outputs the production rule XoY1Y2...Yk to represent a step of the derivation.
4. none of the above error
– all empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
m
Non-Recursive predictive parsing Algorithm
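Expressed as code, the driver loop might look like the following C sketch — the table M, the rhs encoding and the token handling are assumptions, hard-wired for the grammar S → aBa, B → bB | ε of the example below:

#include <stdio.h>
#include <string.h>

/* LL(1) table for  S -> aBa (rule 1),  B -> bB (2),  B -> epsilon (3).
 * M[X][a] holds the rule number; 0 means error.                        */
int M[128][128];
const char *rhs[] = { "", "aBa", "bB", "" };     /* "" means epsilon    */

int parse(const char *ip) {                      /* input must end in $ */
    char stack[100] = "$S";                      /* $ marker, start S   */
    int top = 1;                                 /* stack[top] = top    */
    while (1) {
        char X = stack[top], a = *ip;
        if (X == '$' && a == '$') return 1;      /* action 1: accept    */
        if (X == a) { top--; ip++; continue; }   /* action 2: match     */
        if (X >= 'A' && X <= 'Z' && M[(int)X][(int)a]) {  /* 3: expand  */
            int r = M[(int)X][(int)a], k = strlen(rhs[r]);
            printf("output rule %d\n", r);
            top--;                               /* pop X ...           */
            for (int i = k - 1; i >= 0; i--)     /* ... push Yk..Y1     */
                stack[++top] = rhs[r][i];
            continue;
        }
        return 0;                                /* action 4: error     */
    }
}

int main(void) {
    M['S']['a'] = 1; M['B']['b'] = 2; M['B']['a'] = 3;
    printf(parse("abba$") ? "accept\n" : "error\n");
    return 0;
}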
Example 1:  S → aBa
            B → bB | ε
FIRST FUNCTION
FIRST(S) = {a}      FIRST(aBa) = {a}
FIRST(B) = {b, ε}   FIRST(bB) = {b}   FIRST(ε) = {ε}
FOLLOW FUNCTION
FOLLOW(S) = {$}
FOLLOW(B) = {a}
LL(1) parse of the input abba:
stack   input   output
$S      abba$   S → aBa
$aBa    abba$
$aB     bba$    B → bB
$aBb    bba$
$aB     ba$     B → bB
$aBb    ba$
$aB     a$      B → ε
$a      a$
$       $       accept, successful completion
Outputs: S → aBa   B → bB   B → bB   B → ε
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
Example 2:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Soln:
FIRST sets:
FIRST(F) = {(, id}     FIRST(T’) = {*, ε}
FIRST(T) = {(, id}     FIRST(E’) = {+, ε}
FIRST(E) = {(, id}
FOLLOW sets:
FOLLOW(E) = FOLLOW(E’) = {), $}
FOLLOW(T) = FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Constructing the LL(1) parsing table, for each production A → α:
– for each terminal a in FIRST(α), add A → α to M[A, a];
– if ε is in FIRST(α), for each terminal a in FOLLOW(A), add A → α to M[A, a];
– if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
– All other undefined entries of the parsing table are error entries.
Constructing LL(1) Parsing Table – Example
E → TE’     FIRST(TE’) = {(, id}    E → TE’ into M[E, (] and M[E, id]
E’ → +TE’   FIRST(+TE’) = {+}       E’ → +TE’ into M[E’, +]
E’ → ε      FIRST(ε) = {ε}          none, but since ε is in FIRST(ε) and
                                    FOLLOW(E’) = {$, )}:
                                    E’ → ε into M[E’, $] and M[E’, )]
T → FT’     FIRST(FT’) = {(, id}    T → FT’ into M[T, (] and M[T, id]
T’ → *FT’   FIRST(*FT’) = {*}       T’ → *FT’ into M[T’, *]
T’ → ε      FIRST(ε) = {ε}          none, but since ε is in FIRST(ε) and
                                    FOLLOW(T’) = {$, ), +}:
                                    T’ → ε into M[T’, $], M[T’, )] and M[T’, +]
F → (E)     FIRST((E)) = {(}        F → (E) into M[F, (]
F → id      FIRST(id) = {id}        F → id into M[F, id]
        id        +           *           (         )        $
E       E → TE’                           E → TE’
E’                E’ → +TE’                         E’ → ε   E’ → ε
T       T → FT’                           T → FT’
T’                T’ → ε      T’ → *FT’             T’ → ε   T’ → ε
F       F → id                            F → (E)
stack     input     output
$E        id+id$    E → TE’
$E’T      id+id$    T → FT’
$E’T’F    id+id$    F → id
$E’T’id   id+id$
$E’T’     +id$      T’ → ε
$E’       +id$      E’ → +TE’
$E’T+     +id$
$E’T      id$       T → FT’
$E’T’F    id$       F → id
$E’T’id   id$
$E’T’     $         T’ → ε
$E’       $         E’ → ε
$         $         accept
Construct the predictive (LL(1)) parser for each of the following grammars and parse
the given string:
1. S → S(S)S | ε           with the string ( ( ) ( ) )
2. S → +SS | *SS | a       with the string +*aaa
3. S → aSbS | bSaS | ε     with the string aabbbab
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
                           with the string not(true or false)
5. S → 0S1 | 01            with the string 00011
6. S → aB | aC | Sd | Se
   B → bBc | f
   C → g
7. P → Ra | Qba
   R → aba | caba | Rbc
   Q → bbc | bc            with the string cababca
8. S → PQR
   P → a | Rb | ε
   Q → c | dP | ε
   R → e | f               with the string adeb
9. E → E+T | T
   T → id | id[ ] | id[X]
   X → E, E | E            with the string id[id]
10. S → (A) | 0
    A → SB
    B → ,SB | ε            with the string (0, (0,0))
11. S → a | n | (T)
    T → T,S | S            with the strings (a,(a,a)) and ((a,a), n, (a), a)
LL(1) Grammars
• A grammar whose parsing table has no multiply-defined entries is said to be an
LL(1) grammar: one input symbol is used as a look-ahead symbol to determine the
parser action; the input is scanned from left to right; a left-most derivation is produced.
• The parsing table of a grammar may contain entries with more than one production
rule. In this case, we say that it is not an LL(1) grammar.
Example (the dangling-else grammar):
S → iCtSE | a
E → eS | ε
C → b

        a       b       e       i           t       $
S       S → a                   S → iCtSE
E                       E → eS
                        E → ε                       E → ε
C               C → b

Two production rules appear in M[E, e]: E → eS and E → ε.
Problem: ambiguity.
• What do we have to do if the resulting parsing table contains multiply-defined
entries?
– If we didn’t eliminate left recursion, eliminate the left recursion in the
grammar.
– If the grammar is not left-factored, we have to left-factor the grammar.
– If the (new grammar’s) parsing table still contains multiply-defined
entries, that grammar is ambiguous or it is inherently not an LL(1)
grammar.
• A left-recursive grammar cannot be an LL(1) grammar:
– A → Aα | β
any terminal that appears in FIRST(β) also appears in FIRST(Aα),
because Aα ⇒ βα.
If β is ε, any terminal that appears in FIRST(α) also appears in
FIRST(Aα) and FOLLOW(A).
• A grammar that is not left-factored cannot be an LL(1) grammar:
– A → αβ1 | αβ2
any terminal that appears in FIRST(αβ1) also appears in
FIRST(αβ2).
• An ambiguous grammar cannot be an LL(1) grammar.
• A grammar G is LL(1) if and only if the following conditions hold for any two
distinct production rules A → α and A → β:
1. Both α and β cannot derive strings starting with the same terminals.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a
terminal in FOLLOW(A).
Error Recovery in Predictive Parsing
• An error may occur in predictive parsing (LL(1) parsing)
– if the terminal symbol on the top of the stack does not match the current
input symbol, or
– if a non-terminal A is on top of the stack, a is the current input symbol,
and the parsing table entry M[A, a] is empty.
Error Recovery Techniques
• Panic-Mode Error Recovery
– Skipping the input symbols until a synchronizing token is found.
• Phrase-Level Error Recovery
– Each empty entry in the parsing table is filled with a pointer to a specific
error routine to take care of that error case.
• Error Productions
– If we have a good idea of the common errors that might be encountered,
we can augment the grammar with productions that generate erroneous
constructs.
– When an error production is used by the parser, we can generate
appropriate error diagnostics.
– Since it is almost impossible to know all the errors that can be made by the
programmers, this method is not practical.
• Global Correction
– Ideally, we would like a compiler to make as few changes as possible in
processing incorrect inputs.
– We have to globally analyze the input to find the error.
– This is an expensive method, and it is not in practice.
Example (panic-mode recovery with sync entries), for the grammar
S → AbS | e | ε
A → a | cAd

        a         b       c         d       e       $
S       S → AbS   sync    S → AbS   sync    S → e   S → ε
A       A → a     sync    A → cAd   sync    sync    sync

Eg: input string “aab”
stack   input   output
$S      aab$    S → AbS
$SbA    aab$    A → a
$Sba    aab$
$Sb     ab$     Error: missing b, inserted
$S      ab$     S → AbS
$SbA    ab$     A → a
$Sba    ab$
$Sb     b$
$S      $       S → ε
$       $       accept
Bottom-Up Parsing
• A bottom-up parser creates the parse tree of the given input starting from the leaves
towards the root.
• A bottom-up parser tries to find the right-most derivation of the given input in the
reverse order.
S ⇒ ... ⇒ ω   (the right-most derivation of ω; the bottom-up parser finds this
derivation in the reverse order)
• Bottom-up parsing is also known as shift-reduce parsing because its two main
actions are shift and reduce.
– At each shift action, the current symbol in the input string is pushed to a
stack.
– At each reduction step, the symbols at the top of the stack (this symbol
sequence is the right side of a production) are replaced by the non-
terminal at the left side of that production.
– There are also two more actions: accept and error.
Shift-Reduce Parsing
• A shift-reduce parser tries to reduce the given input string into the starting
symbol:
a string  → (reduced to)  the starting symbol
• At each reduction step, a substring of the input matching the right side of a
production rule is replaced by the non-terminal at the left side of that production
rule.
• If the substring is chosen correctly, the right-most derivation of that string is
created in the reverse order.
Handle
• Informally, a handle of a string is a substring that matches the right side of a
production rule.
– But not every substring that matches the right side of a production rule is a
handle.
• A handle of a right-sentential form γ (≡ αβω) is
a production rule A → β and a position of γ
where the string β may be found and replaced by A to produce
the previous right-sentential form in a right-most derivation of γ:
S ⇒* αAω ⇒ αβω
• If the grammar is unambiguous, then every right-sentential form of the grammar
has exactly one handle.
• We will see that ω is a string of terminals.
Handle Pruning
A Shift-Reduce Parser
A Stack Implementation of a Shift-Reduce Parser
• There are four possible actions of a shift-reduce parser:
1. Shift: the next input symbol is shifted onto the top of the stack.
2. Reduce: replace the handle on the top of the stack by the non-terminal.
3. Accept: successful completion of parsing.
4. Error: the parser discovers a syntax error and calls an error recovery routine.
• Initially the stack contains only the end-marker $.
• The end of the input string is marked by the end-marker $.
Consider the following grammars and parse the respective strings using a shift-
reduce parser.
(1) E → E+T | T
    T → T*F | F
    F → (E) | id     with the string id+id*id
Rules of thumb:
1. If the incoming operator has higher priority than the operator in the stack,
perform shift.
2. If the operator in the stack has the same or higher priority than the incoming
operator, perform reduce.
stack     input      action
$         id+id*id$  shift
$id       +id*id$    reduce by F → id
$F        +id*id$    reduce by T → F
$T        +id*id$    reduce by E → T
$E        +id*id$    shift
$E+       id*id$     shift
$E+id     *id$       reduce by F → id
$E+F      *id$       reduce by T → F
$E+T      *id$       shift
$E+T*     id$        shift
$E+T*id   $          reduce by F → id
$E+T*F    $          reduce by T → T*F
$E+T      $          reduce by E → E+T
$E        $          accept

Parse tree (the numbers show the order in which the reductions build the nodes):
            E(8)
      E(3)    +    T(7)
      T(2)      T(5) * F(6)
      F(1)      F(4)    id
       id        id
(2) S → TL;
    T → int | float
    L → L, id | id
    Parse the string “int id, id;” with a shift-reduce parser.
(3) S → (L) | a
    L → L,S | S
Example: shift-reduce parsing of an English sentence.
Stack                    Input              Action
$                        the dog jumps$     SHIFT word onto stack
$the                     dog jumps$         REDUCE using grammar rule (Art → the)
$Art                     dog jumps$         SHIFT
$Art dog                 jumps$             REDUCE (Noun → dog)
$Art Noun                jumps$             REDUCE (NounPhrase → Art Noun)
$NounPhrase              jumps$             SHIFT
$NounPhrase jumps        $                  REDUCE (Verb → jumps)
$NounPhrase Verb         $                  REDUCE (VerbPhrase → Verb)
$NounPhrase VerbPhrase   $                  REDUCE (Sentence → NounPhrase VerbPhrase)
$Sentence                $                  SUCCESS
Shift-Reduce Parsers
• There are two main categories of shift-reduce parsers:
1. Operator-Precedence Parser
– simple, but handles only a small class of grammars.
2. LR-Parsers
– cover a wide range of grammars.
LR Parsers
• LR-Parsers
– cover a wide range of grammars.
– SLR – simple LR parser
– LR – most general LR parser (canonical LR)
– LALR – intermediate LR parser (look-ahead LR parser)
– SLR, LR and LALR work the same way (they use the same algorithm);
only their parsing tables are different.
LR Parsing Algorithm
Fig: the LR parser has an input buffer a1 ... ai ... an $, a stack holding states and
grammar symbols S0 X1 S1 ... Xm Sm (Sm on top), an output stream, and a parsing
table with two parts: the Action table (rows: states; columns: terminals and $; each
entry one of four different actions) and the Goto table (rows: states; columns:
non-terminals; each entry is a state number).
A Configuration of the LR Parsing Algorithm
• A configuration of an LR parser is a pair (stack contents, remaining input):
( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ )
Actions of an LR-Parser
1. shift s – shifts the next input symbol and the state s onto the stack:
( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) → ( S0 X1 S1 ... Xm Sm ai s, ai+1 ... an $ )
2. reduce A → β:
– pop 2|β| (= r) items from the stack;
– then push A and s, where s = goto[sm-r, A]:
( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) → ( S0 X1 S1 ... Xm-r Sm-r A s, ai ... an $ )
3. Accept – parsing successfully completed.
4. Error – the parser detected an error (an empty entry in the action table).
Reduce Action
• Pop 2|β| (= r) items from the stack; let us assume that β = Y1Y2...Yr.
• Then push A and s, where s = goto[sm-r, A]:
( S0 X1 S1 ... Xm-r Sm-r Y1 ... Yr Sm, ai ai+1 ... an $ )
→ ( S0 X1 S1 ... Xm-r Sm-r A s, ai ai+1 ... an $ )
• In fact, Y1Y2...Yr is a handle:
X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm-r Y1...Yr ai ai+1 ... an $
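A compact LR driver loop in C, hard-wired (as an assumption, not from the text) with the SLR(1) table for the expression grammar shown further below — rules 1) E→E+T 2) E→T 3) T→T*F 4) T→F 5) F→(E) 6) F→id. Only states are kept on the stack, which is equivalent to the symbol/state pairs above since the table is keyed by state:

#include <stdio.h>

/* Token codes index the action columns: 0=id 1=+ 2=* 3=( 4=) 5=$.
 * action[s][t]: >0 shift to that state, <0 reduce by rule -a,
 * 99 accept, 0 error.                                                */
enum { ID, PLUS, STAR, LP, RP, END };
int action[12][6] = {
  { 5, 0, 0, 4, 0, 0}, { 0, 6, 0, 0, 0,99}, { 0,-2, 7, 0,-2,-2},
  { 0,-4,-4, 0,-4,-4}, { 5, 0, 0, 4, 0, 0}, { 0,-6,-6, 0,-6,-6},
  { 5, 0, 0, 4, 0, 0}, { 5, 0, 0, 4, 0, 0}, { 0, 6, 0, 0,11, 0},
  { 0,-1, 7, 0,-1,-1}, { 0,-3,-3, 0,-3,-3}, { 0,-5,-5, 0,-5,-5} };
int go2[12][3] = { {1,2,3},{0},{0},{0},{8,2,3},{0},{0,9,3},{0,0,10},
                   {0},{0},{0},{0} };            /* columns: E T F    */
int rlen[] = {0,3,1,3,1,3,1};                    /* |rhs| of rule r   */
int rlhs[] = {0,0,0,1,1,2,2};                    /* lhs: 0=E 1=T 2=F  */

int parse(const int *tok) {                      /* ends with END ($) */
    int stack[100], top = 0; stack[0] = 0;       /* state stack, s0   */
    for (;;) {
        int a = action[stack[top]][*tok];
        if (a == 99) return 1;                   /* accept            */
        if (a > 0) { stack[++top] = a; tok++; }  /* shift state a     */
        else if (a < 0) {                        /* reduce by rule -a */
            top -= rlen[-a];                     /* pop handle states */
            stack[top + 1] = go2[stack[top]][rlhs[-a]];
            top++;
            printf("reduce by rule %d\n", -a);
        }
        else return 0;                           /* error             */
    }
}

int main(void) {                                 /* id + id * id $    */
    int toks[] = { ID, PLUS, ID, STAR, ID, END };
    printf(parse(toks) ? "accept\n" : "error\n");
    return 0;
}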
Constructing SLR Parsing Tables – LR(0) Items
• An LR(0) item of a grammar G is a production of G with a dot at some position of
the right side.
• Ex: A → aBb.  Possible LR(0) items: A → .aBb,  A → a.Bb,  A → aB.b,  A → aBb.
• A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis
for constructing SLR parsers.
• Augmented grammar:
G’ is G with a new production rule S’ → S, where S’ is the new starting symbol.
The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0)
items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ is
added to closure(I). This rule is applied until no more new items can be added.
• Ex: for the augmented expression grammar
E’ → E,  E → E+T | T,  T → T*F | F,  F → (E) | id:
closure({E’ → .E}) = { E’ → .E        (kernel item)
                       E → .E+T
                       E → .T
                       T → .T*F
                       T → .F
                       F → .(E)
                       F → .id }
The Goto Operation
• If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal),
then goto(I, X) is defined as follows:
– if A → α.Xβ is in I,
then every item in closure({A → αX.β}) will be in goto(I, X).
Example:
I = { E’ → .E, E → .E+T, E → .T,
      T → .T*F, T → .F,
      F → .(E), F → .id }
goto(I, E) = { E’ → E., E → E.+T }
goto(I, T) = { E → T., T → T.*F }
goto(I, F) = { T → F. }
goto(I, () = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F,
               F → .(E), F → .id }
goto(I, id) = { F → id. }
Construction of the Canonical LR(0) Collection
• To create the SLR parsing tables for a grammar G, we will create the canonical
LR(0) collection of the grammar G’.
• Algorithm:
C is { closure({S’ → .S}) }
repeat the following until no more sets of LR(0) items can be added to C:
for each I in C and each grammar symbol X,
if goto(I, X) is not empty and not in C,
add goto(I, X) to C
• The goto function is a DFA on the sets in C.
The canonical LR(0) collection for the expression grammar:
I0: E’→.E, E→.E+T, E→.T, T→.T*F, T→.F, F→.(E), F→.id
I1: E’→E., E→E.+T
I2: E→T., T→T.*F
I3: T→F.
I4: F→(.E), E→.E+T, E→.T, T→.T*F, T→.F, F→.(E), F→.id
I5: F→id.
I6: E→E+.T, T→.T*F, T→.F, F→.(E), F→.id
I7: T→T*.F, F→.(E), F→.id
I8: F→(E.), E→E.+T
I9: E→E+T., T→T.*F
I10: T→T*F.
I11: F→(E).
Transition Diagram (DFA) of the Goto Function
I0 –E→ I1, I0 –T→ I2, I0 –F→ I3, I0 –(→ I4, I0 –id→ I5
I1 –+→ I6
I2 –*→ I7
I4 –E→ I8, I4 –T→ I2, I4 –F→ I3, I4 –(→ I4, I4 –id→ I5
I6 –T→ I9, I6 –F→ I3, I6 –(→ I4, I6 –id→ I5
I7 –F→ I10, I7 –(→ I4, I7 –id→ I5
I8 –)→ I11, I8 –+→ I6
I9 –*→ I7
Constructing the SLR Parsing Table
(State i of the parser is constructed from the set of items Ii.)
• If a is a terminal, A → α.aβ is in Ii, and goto(Ii, a) = Ij, then action[i, a] is shift j.
• If A → α. is in Ii, then action[i, a] is reduce A → α for all a in FOLLOW(A),
where A ≠ S’.
• If S’ → S. is in Ii, then action[i, $] is accept.
• If any conflicting actions are generated by these rules, the grammar is not SLR(1).
• For all non-terminals A, if goto(Ii, A) = Ij, then goto[i, A] = j.
(SLR) Parsing Table for the Expression Grammar
1) E → E+T   2) E → T   3) T → T*F   4) T → F   5) F → (E)   6) F → id

             Action Table                    Goto Table
state   id    +     *     (     )     $      E    T    F
0       s5                s4                 1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                 8    2    3
5             r6    r6          r6    r6
6       s5                s4                      9    3
7       s5                s4                           10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5
Parse of id*id+id:
stack        input       action
$0           id*id+id$   [0,id]=s5   shift 5
$0id5        *id+id$     [5,*]=r6    reduce by F→id (pop 2|id| symbols, push F; [0,F]=3)
$0F3         *id+id$     [3,*]=r4    reduce by T→F ([0,T]=2)
$0T2         *id+id$     [2,*]=s7    shift 7
$0T2*7       id+id$      [7,id]=s5   shift 5
$0T2*7id5    +id$        [5,+]=r6    reduce by F→id ([7,F]=10)
$0T2*7F10    +id$        [10,+]=r3   reduce by T→T*F (pop 2|T*F| symbols, push T; [0,T]=2)
$0T2         +id$        [2,+]=r2    reduce by E→T ([0,E]=1)
$0E1         +id$        [1,+]=s6    shift 6
$0E1+6       id$         [6,id]=s5   shift 5
$0E1+6id5    $           [5,$]=r6    reduce by F→id ([6,F]=3)
$0E1+6F3     $           [3,$]=r4    reduce by T→F ([6,T]=9)
$0E1+6T9     $           [9,$]=r1    reduce by E→E+T (pop 2|E+T| symbols, push E; [0,E]=1)
$0E1         $           accept
SLR(1) Grammar
• An LR parser using SLR(1) parsing tables for a grammar G is called the
SLR(1) parser for G.
• If a grammar G has an SLR(1) parsing table, it is called an SLR(1) grammar (or SLR
grammar for short).
• Every SLR grammar is unambiguous, but not every unambiguous grammar is an
SLR grammar.
Shift/Reduce and Reduce/Reduce Conflicts
• If a state does not know whether it will make a shift operation or a reduction for a
terminal, we say that there is a shift/reduce conflict.
• If a state does not know whether it will make a reduction using
production rule i or j for a terminal, we say that there is a reduce/reduce conflict.
• If the SLR parsing table of a grammar G has a conflict, we say that the grammar
is not an SLR grammar.
Problems on SLR
1. S → SS+ | SS* | a     with the string “aa+a*”
2. S → (L) | a,  L → L,S | S
3. S → aSb | ab
4. S → aSbS | bSaS | ε
5. S → E#
   E → E-T | T
   T → F↑T | F
   F → (E) | i
   Construct the SLR parsing table for this grammar. The table has two actions in
   one entry, which is why it is not an SLR(1) grammar.
6. S → +SS | *SS | a     with the string “+*aaa”
7. Show that the following grammar is SLR(1) but not LL(1):
   S → SA | A
   A → a
8. X → Xb | a            parse the string “abb”
9. Given the grammar A → (A) | a, parse the string “((a))”
Conflict Example 2
S → AaAb      I0: S’ → .S
S → BbBa          S → .AaAb
A → ε             S → .BbBa
B → ε             A → .
                  B → .
Problem:
FOLLOW(A) = {a, b}
FOLLOW(B) = {a, b}
on a: reduce by A → ε and reduce by B → ε  →  reduce/reduce conflict
on b: reduce by A → ε and reduce by B → ε  →  reduce/reduce conflict

Problems: show that the following grammars are not SLR(1) by constructing the
parsing table.
1. S → S(S)S | ε
2. S → AaAb | BbBa,  A → ε,  B → ε
UNIT – 4 topics:
• Introduction to LR Parsing: Simple LR; More powerful LR parsers (excluding
efficient construction and compaction of parsing tables);
• Using ambiguous grammars;
• Parser Generators.
Why more powerful parsers? (canonical LR)
– In the SLR method, state i reduces by A → α on token a whenever A → α. is in Ii
and a is in FOLLOW(A). In some situations, however, βA cannot be followed by the
terminal a in a right-sentential form when βα and the state i are on the top of the
stack. This means that making the reduction in this case is not correct.
S → AaAb      S ⇒ AaAb ⇒ aAb ⇒ ab
S → BbBa      S ⇒ BbBa ⇒ bBa ⇒ ba
A → ε         (the A → ε and B → ε reductions are valid only in their own contexts)
B → ε
LR(1) Item
• To avoid some invalid reductions, the states need to carry more information.
• Extra information is put into a state by including a terminal symbol as a second
component of each item. An LR(1) item is
A → α.β, a
where a is the look-ahead of the item (a terminal or the end-marker $). When β is
empty (A → α., a), the reduction by A → α is done only if the next input symbol
is a, not for every terminal in FOLLOW(A).
Canonical Collection of Sets of LR(1) Items
• The construction of the canonical collection of the sets of LR(1) items is similar
to the construction of the canonical collection of the sets of LR(0) items, except
that the closure and goto operations work a little differently.
closure operation
• If A → α.Bβ, a is in closure(I) and B → γ is a production rule, then B → .γ, b is
added to closure(I) for each terminal b in FIRST(βa).
goto operation
• If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal),
then goto(I, X) is defined as follows:
– if A → α.Xβ, a is in I, then
every item in closure({A → αX.β, a}) will be in goto(I, X).
Construction of the collection:
C is { closure({S’ → .S, $}) }
repeat the following until no more sets of LR(1) items can be added to C:
for each I in C and each grammar symbol X,
if goto(I, X) is not empty and not in C,
add goto(I, X) to C
The goto function is a DFA on the sets in C.
A Short Notation for Sets of LR(1) Items
• A set of LR(1) items containing the items
A → α.β, a1
...
A → α.β, an
can be written as
A → α.β, a1/a2/.../an
Example: the grammar
1) S → L=R
2) S → R
3) L → *R
4) L → id
5) R → L

SLR(1) parsing table (note the shift/reduce conflict in state 2 on =):
state   id    *     =       $     S   L   R
0       s5    s4                  1   2   3
1                           acc
2                   s6/r5   r5
3                           r2
4       s5    s4                      8   7
5                   r4      r4
6       s5    s4                      10  9
7                   r3      r3
8                   r5      r5
9                           r1
Because action[2, =] holds both s6 and r5, this grammar is not SLR(1).
Canonical collection of sets of LR(1) items for this grammar:
I0: S’→.S,$   S→.L=R,$   S→.R,$   L→.*R,$/=   L→.id,$/=   R→.L,$
I1: S’→S.,$
I2: S→L.=R,$   R→L.,$
I3: S→R.,$
I4: L→*.R,$/=   R→.L,$/=   L→.*R,$/=   L→.id,$/=
I5: L→id.,$/=
I6: S→L=.R,$   R→.L,$   L→.*R,$   L→.id,$
I7: L→*R.,$/=
I8: R→L.,$/=
I9: S→L=R.,$
I10: R→L.,$
I11: L→*.R,$   R→.L,$   L→.*R,$   L→.id,$
I12: L→id.,$
I13: L→*R.,$
(goto(I0,S)=I1, goto(I0,L)=I2, goto(I0,R)=I3, goto(I0,*)=I4, goto(I0,id)=I5,
goto(I2,=)=I6, goto(I4,R)=I7, goto(I4,L)=I8, goto(I4,*)=I4, goto(I4,id)=I5,
goto(I6,R)=I9, goto(I6,L)=I10, goto(I6,*)=I11, goto(I6,id)=I12,
goto(I11,R)=I13, goto(I11,L)=I10, goto(I11,*)=I11, goto(I11,id)=I12)
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G’: C = {I0, ..., In}.
2. Create the parsing action table as follows:
• If a is a terminal, A → α.aβ, b is in Ii, and goto(Ii, a) = Ij, then action[i, a]
is shift j.
• If A → α., a is in Ii, then action[i, a] is reduce A → α, where A ≠ S’.
• If S’ → S., $ is in Ii, then action[i, $] is accept.
3. Create the goto table: for all non-terminals A, if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All other entries are errors; the initial state is the one containing S’ → .S, $.
LR(1) Parsing Table
state   id     *      =     $    |   S   L   R
0       s5     s4                |   1   2   3
1                           acc  |
2                     s6    r5   |
3                           r2   |
4       s5     s4                |       8   7
5                     r4    r4   |
6       s12    s11               |       10  9
7                     r3    r3   |
8                     r5    r5   |
9                           r1   |
10                          r5   |
11      s12    s11               |       10  13
12                          r4   |
13                          r3   |

There is no shift/reduce or reduce/reduce conflict, so it is an LR(1) grammar.
• In fact, the number of the states of the LALR parser for a grammar will be equal
to the number of states of the SLR parser for that grammar.
Construction of LALR(1) parsing tables:
• Create the canonical LR(1) collection of the sets of LR(1) items for the given grammar.
• Find each core; find all sets having that same core; replace those sets having the same core with a single set which is their union.
    C = {I0, ..., In}   →   C’ = {J1, ..., Jm} where m ≤ n
• Create the parsing tables (action and goto tables) in the same way as the construction of the parsing tables of an LR(1) parser.
  – Note that if J = I1 ∪ ... ∪ Ik, then since I1, ..., Ik have the same core, the cores of goto(I1,X), ..., goto(Ik,X) must also be the same.
  – So, goto(J,X) = K where K is the union of all sets of items having the same core as goto(I1,X).
• If no conflict is introduced, the grammar is an LALR(1) grammar. (We may only introduce reduce/reduce conflicts; we cannot introduce a shift/reduce conflict.)
Shift/Reduce Conflict
• We say that we cannot introduce a shift/reduce conflict during the shrink process for the creation of the states of an LALR parser.
• Assume that we could introduce a shift/reduce conflict. In this case, a state of the LALR parser would have to contain
    A → α., a   and   B → β.aγ, b
  But then some state of the canonical LR(1) parser contains
    A → α., a   and   B → β.aγ, c
  and that state already has the same shift/reduce conflict, since the shift operation does not depend on lookaheads. So the conflict was already present in the LR(1) parser.
Merging the same-core states (I4/I11, I5/I12, I7/I13, I8/I10) gives the LALR(1) collection for the grammar
1) S → L=R  2) S → R  3) L → *R  4) L → id  5) R → L

I0  : S’ → .S, $
      S → .L=R, $
      S → .R, $
      L → .*R, $/=
      L → .id, $/=
      R → .L, $
    goto: S → I1, L → I2, R → I3, * → I411, id → I512
I1  : S’ → S., $
I2  : S → L.=R, $         goto: = → I6
      R → L., $
I3  : S → R., $
I411: L → *.R, $/=
      R → .L, $/=
      L → .*R, $/=
      L → .id, $/=
    goto: R → I713, L → I810, * → I411, id → I512
I512: L → id., $/=
I6  : S → L=.R, $
      R → .L, $
      L → .*R, $
      L → .id, $
    goto: R → I9, L → I810, * → I411, id → I512
I713: L → *R., $/=
I810: R → L., $/=
I9  : S → L=R., $

Same cores: I4 and I11 (→ I411); I5 and I12 (→ I512); I7 and I13 (→ I713); I8 and I10 (→ I810).
Using Ambiguous Grammars
• Ex:
    Ambiguous grammar:                 E → E+E | E*E | (E) | id
    Equivalent unambiguous grammar:    E → E+T | T
                                       T → T*F | F
                                       F → (E) | id
Sets of LR(0) Items for the Ambiguous Grammar
I0: E’ → .E ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
    goto: E → I1, ( → I2, id → I3
I1: E’ → E. ; E → E.+E ; E → E.*E
    goto: + → I4, * → I5
I2: E → (.E) ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
    goto: E → I6, ( → I2, id → I3
I3: E → id.
I4: E → E+.E ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
    goto: E → I7, ( → I2, id → I3
I5: E → E*.E ; E → .E+E ; E → .E*E ; E → .(E) ; E → .id
    goto: E → I8, ( → I2, id → I3
I6: E → (E.) ; E → E.+E ; E → E.*E
    goto: ) → I9, + → I4, * → I5
I7: E → E+E. ; E → E.+E ; E → E.*E
    goto: + → I4, * → I5
I8: E → E*E. ; E → E.+E ; E → E.*E
    goto: + → I4, * → I5
I9: E → (E).
States I7 and I8 have shift/reduce conflicts for the symbols + and *. For I8, after reading E * E the stack holds I0 E I1 * I5 E I8.
  When the current token is *:
    shift  →  * is treated as right-associative
    reduce →  * is treated as left-associative
  When the current token is +:
    shift  →  + has higher precedence than *
    reduce →  * has higher precedence than +
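Parser generators resolve such conflicts from declared precedence and associativity rather than by rewriting the grammar. A minimal sketch of that decision rule in C; the helper and its encoding are illustrative, not part of any generator's API:

#include <stdio.h>

enum action { SHIFT, REDUCE };

/* Precedence: higher binds tighter; '+' = 1, '*' = 2 (assumed encoding). */
static int prec(char op) { return op == '*' ? 2 : 1; }

/* Both '+' and '*' are declared left-associative here. */
static int left_assoc(char op) { return op == '+' || op == '*'; }

/* Decide a shift/reduce conflict between the operator inside the handle
 * on the stack and the incoming lookahead operator. */
static enum action resolve(char stack_op, char lookahead_op)
{
    if (prec(lookahead_op) > prec(stack_op)) return SHIFT;   /* lookahead binds tighter */
    if (prec(lookahead_op) < prec(stack_op)) return REDUCE;  /* handle binds tighter */
    return left_assoc(stack_op) ? REDUCE : SHIFT;            /* equal precedence */
}

int main(void)
{
    /* state I8 (handle E*E), lookahead '+': reduce, since * > + */
    printf("%s\n", resolve('*', '+') == REDUCE ? "reduce" : "shift");
    /* state I7 (handle E+E), lookahead '*': shift, since * > + */
    printf("%s\n", resolve('+', '*') == SHIFT ? "shift" : "reduce");
    return 0;
}

This is exactly the choice recorded in the table below: row 7 reduces on + (left associativity) but shifts on *, while row 8 reduces on both.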
SLR-Parsing Table for the Ambiguous Grammar
(productions: 1) E → E+E  2) E → E*E  3) E → (E)  4) E → id; conflicts resolved with * above +, both left-associative)

        Action                              Goto
state   id    +     *     (     )     $     E
0       s3                s2                1
1             s4    s5                acc
2       s3                s2                6
3             r4    r4          r4    r4
4       s3                s2                7
5       s3                s2                8
6             s4    s5          s9
7             r1    s5          r1    r1
8             r2    r2          r2    r2
9             r3    r3          r3    r3
Error Detection in LR Parsing
• An LR parser will announce an error as soon as there is no valid continuation for the scanned portion of the input.
• A canonical LR parser (LR(1) parser) will never make even a single reduction before announcing an error.
• The SLR and LALR parsers may make several reductions before announcing an error.
• But all LR parsers (LR(1), LALR and SLR parsers) will never shift an erroneous input symbol onto the stack.
Panic Mode Error Recovery in LR Parsing
• Scan down the stack until a state s with a goto on a particular nonterminal A is found. (Get rid of everything from the stack before this state s.)
• Discard zero or more input symbols until a symbol a is found that can legitimately follow A.
  – The symbol a is simply in FOLLOW(A), but this may not work for all situations.
• The parser stacks the nonterminal A and the state goto[s,A], and it resumes normal parsing.
• This nonterminal A is normally a basic programming block (there can be more than one choice for A).

Phrase-Level Error Recovery in LR Parsing
• Each empty entry in the action table is marked with a specific error routine.
• An error routine reflects the error that the user most likely will make in that case.
• An error routine inserts symbols into the stack or the input (or it deletes symbols from the stack and the input, or it can do both insertion and deletion), for errors such as:
  – missing operand
  – unbalanced right parenthesis
PART-B
UNIT V: SYNTAX-DIRECTED DEFINITIONS
SYLLABUS:
x Syntax-directed definitions;
x Evaluation orders for SDDs;
x Applications of syntax-directed translation;
x Syntax-directed translation schemes
Overview
input → parse tree → dependency graph → attribute evaluation order

Grammar symbols are associated with attributes to associate information with the programming language constructs that they represent.
Values of these attributes are evaluated by the semantic rules associated with the production rules.
Evaluation of these semantic rules:
  o may generate intermediate code
  o may put information into the symbol table
  o may perform type checking
  o may issue error messages
  o may perform some other activities
  o in fact, they may perform almost any activity.
An attribute may hold almost anything:
  o a string, a number, a memory location, a complex record.
Attributes for expressions:
  o type of value: int, float, double, char, string, ...
  o type of construct: variable, constant, operation, ...
Attributes for constants: values
Attributes for variables: name, scope
  o Attributes for operations: arity, operands, operator, ...
When we associate semantic rules with productions, we use two notations:
  o Syntax-Directed Definitions
  o Translation Schemes
Syntax-Directed Definitions:
  o give high-level specifications for translations
  o each production has associated with it a set of semantic rules of the form
      a := f(b1, b2, ..., bk)
    where a is an attribute obtained from the function f.
• A syntax-directed definition is a generalization of a context-free grammar in which:
  – Each grammar symbol is associated with a set of attributes.
  – This set of attributes for a grammar symbol is partitioned into two subsets called synthesized and inherited attributes of that grammar symbol.
  – Each production rule is associated with a set of semantic rules.
• Semantic rules set up dependencies between attributes which can be represented by a dependency graph.
• This dependency graph determines the evaluation order of these semantic rules.
• Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have some side effects such as printing a value.
The two kinds of attributes for non-terminals are:
1) Synthesized attribute (S-attribute):
   An attribute is said to be synthesized if its value at a parse tree node is determined from attribute values at the children of the node.
2) Inherited attribute:
   An attribute is said to be inherited if its value at a parse tree node is determined from attribute values at the parent and/or siblings of the node.
PROBLEM: Write the syntax-directed definition for the following grammar of a simple desk calculator:
S → EN
E → E + T
E → E - T
E → T
T → T * F
T → T / F
T → F
F → (E)
F → digit
N → ;
Solution:
The syntax-directed definition can be written for the above grammar by using semantic actions for each production.

Production rule     Semantic actions
S → EN              S.val = E.val
E → E1 + T          E.val = E1.val + T.val
E → E1 - T          E.val = E1.val - T.val
E → T               E.val = T.val
T → T1 * F          T.val = T1.val * F.val
T → T1 / F          T.val = T1.val / F.val
T → F               T.val = F.val
F → (E)             F.val = E.val
F → digit           F.val = digit.lexval

For the non-terminals E, T and F the values can be obtained using the attribute val. The token digit has the synthesized attribute lexval. In S → EN, the symbol S is the start symbol; this rule is used to obtain the final value of the expression.
The following steps are followed to compute an S-attributed definition (a simple evaluator in this style is sketched after the list):
1. Write the SDD using the appropriate semantic actions for the corresponding production rules of the given grammar.
2. Generate the annotated parse tree and compute the attribute values. The computation is done in bottom-up manner.
3. The value obtained at the root node is the final output.
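Step 2, the bottom-up computation of val, is exactly a post-order traversal of the tree. A minimal sketch in C; the tree for 5*6+7 is hard-coded and all names are illustrative:

#include <stdio.h>
#include <stdlib.h>

/* A node of the expression tree: a digit leaf or an operator node. */
struct node {
    char op;                  /* '+', '-', '*', '/' for interior nodes, 'd' for a digit leaf */
    int lexval;               /* used only when op == 'd' */
    struct node *left, *right;
};

static struct node *leaf(int v) {
    struct node *n = malloc(sizeof *n);
    n->op = 'd'; n->lexval = v; n->left = n->right = NULL;
    return n;
}

static struct node *inner(char op, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    n->op = op; n->lexval = 0; n->left = l; n->right = r;
    return n;
}

/* Bottom-up (post-order) evaluation: the val of a node is computed
 * from the vals of its children, exactly like a synthesized attribute. */
static int val(struct node *n) {
    if (n->op == 'd') return n->lexval;
    int l = val(n->left), r = val(n->right);
    switch (n->op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;
    }
}

int main(void) {
    /* 5*6+7 parsed as (5*6)+7 */
    struct node *e = inner('+', inner('*', leaf(5), leaf(6)), leaf(7));
    printf("%d\n", val(e));   /* prints 37 */
    return 0;
}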
PROBLEM 1:
Consider the string 5*6+7; Construct the syntax tree, parse tree and annotated parse tree.
Solution:
Syntax tree: (figure)
Parse tree for the string 5*6+7; (indentation shows children):

S
  E
    E
      T
        T
          F
            digit (5)
        *
        F
          digit (6)
    +
    T
      F
        digit (7)
  N
    ;

The corresponding annotated parse tree for the string 5*6+7; is shown below (figure); evaluating bottom-up gives T.val = 30 for 5*6 and E.val = S.val = 37 at the root.
Advantages: SDDs are more readable and hence useful for specifications
Disadvantages: not very efficient.
Ex 2: PROBLEM: Consider the grammar that is used for a simple desk calculator. Obtain the semantic actions and also the annotated parse tree for the string 3*5+4n.
L → En
E → E1 + T
E → T
T → T1 * F
T → F
F → (E)
F → digit
Solution:
Production rule     Semantic actions
L → En              L.val = E.val
E → E1 + T          E.val = E1.val + T.val
E → T               E.val = T.val
T → T1 * F          T.val = T1.val * F.val
T → F               T.val = F.val
F → (E)             F.val = E.val
F → digit           F.val = digit.lexval
The corresponding annotated parse tree for the string 3*5+4n is shown below.
Fig: Annotated parse tree (figure)
Exercise:
For the SDD of problem 1 give the annotated parse tree for the following expressions:
a) (3+4)*(5+6)n
b) 1*2*3*(4+5)n
c) (9+8*(7+6)+5)*4n
Solution: the annotated parse trees for a), b) and c) are drawn as figures.
Dependency Graphs
A dependency graph depicts the flow of information among the attribute instances in a particular parse tree: an edge from one attribute instance to another means that the value of the first is needed to compute the second. Edges express the constraints implied by the semantic rules. (figure)
2) Inherited attributes:
Consider an example and compute the inherited attributes; annotate the parse tree for the computation of inherited attributes for the given string int a, b, c.
Ex:
S → TL
T → int
T → float
T → char
T → double
L → L, id
L → id
The steps to be followed are:
1) Construct the syntax-directed definition using semantic actions.
2) Annotate the parse tree with inherited attributes by processing in top-down fashion.

Production rule     Semantic actions
S → TL              L.inh = T.type
T → int             T.type = int
T → float           T.type = float
T → char            T.type = char
T → double          T.type = double
L → L1, id          L1.inh = L.inh; addtype(id.entry, L.inh)
L → id              addtype(id.entry, L.inh)

The annotated parse tree for the string int a, b, c is drawn as a figure.
Ex 2: PROBLEM: Consider the following context-free grammar for evaluating arithmetic expressions with operator *:
T → FT’
T’ → *FT’
T’ → ε
F → digit
(The SDD and the annotated parse tree for this grammar are drawn as figures.)
Dependency graph for the above example (the annotated parse tree for 3*5) is drawn as a figure. Its nodes are numbered 1 to 9: nodes 1 and 2 are the digit.lexval leaves, 3 and 4 are the F.val attributes, 5 and 6 are the T’.inh attributes, 7 and 8 are the T’.syn attributes, and 9 is T.val.
A topological sort of the above graph gives the order of evaluation of the SDD. One topological sort order is (1,2,3,4,5,6,7,8,9); another is (1,2,3,5,4,6,7,8,9).
Advantage: the dependency graph helps in computing the order of evaluation of the attributes.
Disadvantage: it cannot give the order of evaluation of attributes if there is a cycle in the graph. However, this disadvantage can be overcome by using S-attributed and L-attributed definitions.
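The evaluation order can be computed mechanically with a topological sort. A minimal sketch of Kahn's algorithm in C; the edge list encodes the dependencies of the SDD above (lexval → F.val, F.val → T’.inh, inh and F.val → the inner T’.inh, syn flowing upward), and the node numbering follows the text:

#include <stdio.h>

#define N 9

int main(void)
{
    /* dependency edges u -> v: "u is needed to compute v" */
    int edges[][2] = { {1,3}, {2,4}, {3,5}, {5,6}, {4,6}, {6,7}, {7,8}, {8,9} };
    int m = sizeof edges / sizeof edges[0];
    int indeg[N + 1] = {0};
    int done[N + 1] = {0};

    for (int i = 0; i < m; i++) indeg[edges[i][1]]++;

    /* Kahn's algorithm: repeatedly evaluate any attribute whose
     * prerequisites have all been evaluated. */
    printf("evaluation order:");
    for (int k = 0; k < N; k++) {
        int v = 0;
        for (int u = 1; u <= N; u++)
            if (!done[u] && indeg[u] == 0) { v = u; break; }
        if (v == 0) { printf(" cycle!\n"); return 1; }   /* cyclic SDD */
        done[v] = 1;
        printf(" %d", v);
        for (int i = 0; i < m; i++)
            if (edges[i][0] == v) indeg[edges[i][1]]--;
    }
    printf("\n");       /* prints: evaluation order: 1 2 3 4 5 6 7 8 9 */
    return 0;
}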
Exercises:
1. E → TE’
   E’ → +TE’
   E’ → ε
   T → FT’
   T’ → *FT’
   T’ → ε
   F → (E)
   F → id
   Strings: (i) “id+id*id”  (ii) “(id+id*id)”
2. T → BC
   B → int
   B → float
   C → [num]C
   C → ε
   Strings: (i) “int[2][3]” (note: int[2][3] should be passed as array(2, array(3, integer)))  (ii) “float [3]”  (iii) “float [3][3][2]”
S-Attributed Definitions
• Syntax-directed definitions are used to specify syntax-directed translations.
• Creating a translator for an arbitrary syntax-directed definition can be difficult.
• We would like to evaluate the semantic rules during parsing (i.e. in a single pass, we parse and also evaluate the semantic rules during the parsing).
• We will look at two sub-classes of the syntax-directed definitions:
  – S-Attributed Definitions: only synthesized attributes are used in the syntax-directed definitions.
  – L-Attributed Definitions: in addition to synthesized attributes, we may also use inherited attributes in a restricted fashion.
• Implementing S-attributed and L-attributed definitions is easy (we can evaluate the semantic rules in a single pass during the parsing).
• Implementations of S-attributed definitions are a little bit easier than implementations of L-attributed definitions.
L-Attributed Definitions
• In an L-attributed definition, attributes can be evaluated in a depth-first, left-to-right traversal of the parse tree.
• This means that they can also be evaluated during the parsing.
• A syntax-directed definition is L-attributed if each inherited attribute of Xj, where 1 ≤ j ≤ n, on the right side of A → X1X2...Xn depends only on:
  1. the attributes of the symbols X1,...,Xj-1 to the left of Xj in the production, and
  2. the inherited attributes of A.
• Every S-attributed definition is L-attributed; the restrictions only apply to inherited attributes (not to synthesized attributes).
Semantic Rules with Controlled Side Effects
• Permit incidental side effects that do not constrain attribute evaluation.
• Constrain the allowable evaluation orders, so that the same translation is produced for any allowable order.
  – Ex: for the production L → En the semantic rule is print(E.val).
Application: construction of syntax trees. The following SDD builds a syntax tree for an expression:

Production      Semantic Rules
E → E1 + T      E.node = new Node(‘+’, E1.node, T.node)
E → E1 - T      E.node = new Node(‘-’, E1.node, T.node)
E → T           E.node = T.node
T → (E)         T.node = E.node
T → id          T.node = new Leaf(id, id.entry)
T → num         T.node = new Leaf(num, num.val)

This is an example of an S-attributed definition.
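A minimal sketch in C of the Node and Leaf constructors this SDD assumes; the struct layout and names are illustrative:

#include <stdlib.h>

/* A syntax-tree node: an interior operator node or a leaf. */
struct tnode {
    char op;                    /* '+', '-', or 0 for a leaf */
    int  leafval;               /* value (or symbol-table index) for leaves */
    struct tnode *left, *right;
};

/* new Node('+', l, r) of the SDD */
struct tnode *new_node(char op, struct tnode *l, struct tnode *r)
{
    struct tnode *n = malloc(sizeof *n);
    n->op = op; n->leafval = 0; n->left = l; n->right = r;
    return n;
}

/* new Leaf(num, num.val) of the SDD */
struct tnode *new_leaf(int val)
{
    struct tnode *n = malloc(sizeof *n);
    n->op = 0; n->leafval = val; n->left = n->right = NULL;
    return n;
}

int main(void)
{
    /* the rule E -> E1 + T becomes: E.node = new_node('+', E1.node, T.node) */
    struct tnode *e = new_node('+', new_leaf(1), new_leaf(2));
    return e == NULL;
}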
Syntax-Directed Translation Schemes
An SDT scheme is a context-free grammar with program fragments embedded within production bodies. The program fragments are called semantic actions and can appear at any position within the production body.
Any SDT can be implemented by first building a parse tree and then performing the actions in a left-to-right depth-first order.
Two important classes of SDTs are:
  x Postfix SDTs, and
  x L-attributed SDTs.
(A leftmost derivation is one in which the leftmost nonterminal is always chosen for expansion at each step of the derivation; a parse tree shows a graphical depiction of a derivation.)
Postfix SDT implemented on an LR-parser value stack:

Production      Semantic Rules
L → E n         { print(stack[top-1].val); top = top - 1; }
E → E1 + T      { stack[top-2].val = stack[top-2].val + stack[top].val; top = top - 2; }
E → T
T → T1 * F      { stack[top-2].val = stack[top-2].val * stack[top].val; top = top - 2; }
T → F
F → (E)         { stack[top-2].val = stack[top-1].val; top = top - 2; }
F → digit

• At each shift of digit, we also push digit.lexval onto the val-stack.
• At all other shifts, we do not put anything into the val-stack because the other terminals do not have attributes (but we increment the stack pointer for the val-stack).
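A minimal sketch of how these stack actions execute for the input 3*5+4n. The reductions are hard-coded rather than driven by a real LR table, the unit reductions (E → T, T → F, F → digit) that leave the value stack unchanged are omitted, and all names are illustrative:

#include <stdio.h>

int val[100];      /* parallel value stack */
int top = -1;      /* index of the parser-stack top */

/* shift a digit: push its lexval */
void shift_digit(int lexval) { val[++top] = lexval; }

/* shift a terminal with no attribute: just grow the stack */
void shift_other(void) { ++top; }

/* reduce E -> E1 + T : E1, +, T occupy top-2, top-1, top */
void reduce_add(void) { val[top-2] = val[top-2] + val[top]; top -= 2; }

/* reduce T -> T1 * F */
void reduce_mul(void) { val[top-2] = val[top-2] * val[top]; top -= 2; }

int main(void)
{
    shift_digit(3); shift_other(); shift_digit(5);  /* 3 * 5 */
    reduce_mul();                                   /* T.val = 15 */
    shift_other(); shift_digit(4);                  /* + 4 */
    reduce_add();                                   /* E.val = 19 */
    printf("%d\n", val[top]);                       /* prints 19 */
    return 0;
}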
Translation Schemes
• Translation schemes indicate the order of evaluation of the semantic actions associated with a production rule (i.e. when the semantic rules should be evaluated).
• A translation scheme is a context-free grammar in which:
  • attributes are associated with the grammar symbols, and
  • semantic actions enclosed between braces {} are inserted within the right sides of productions.
• Ex: A → { ... } X { ... } Y { ... }
Semantic Actions
• The position of a semantic action on the right side indicates when that semantic action will be evaluated.
Translation Schemes for S-attributed Definitions
• If our syntax-directed definition is S-attributed, the construction of the corresponding translation scheme is simple.
• Each semantic rule in an S-attributed syntax-directed definition is inserted as a semantic action at the end of the right side of the associated production.

Production      Semantic Rule
E → E1 + T      E.val = E1.val + T.val         (a production of a syntax-directed definition)
E → E1 + T      { E.val = E1.val + T.val }     (the production of the corresponding translation scheme)

SDT for infix-to-prefix translation during parsing:
L → E n
E → { print(‘+’); } E1 + T
E → T
T → { print(‘*’); } T1 * F
T → F
F → (E)
F → digit { print(digit.lexval); }
A Translation Scheme Example
• A simple translation scheme that converts infix expressions to the corresponding postfix expressions:
E → T R
R → + T { print(“+”) } R1
R → ε
T → id { print(id.name) }

Example: a+b+c → ab+c+
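Because this grammar is suitable for predictive parsing, the scheme can be run directly by a recursive-descent parser, with each embedded action executed at its position in the production body. A minimal sketch, assuming single-character identifiers and no error handling:

#include <stdio.h>

static const char *in;          /* remaining input */

static void T(void);
static void R(void);

/* E -> T R */
static void E(void) { T(); R(); }

/* R -> + T { print("+") } R1  |  epsilon */
static void R(void)
{
    if (*in == '+') {
        in++;                   /* match '+' */
        T();
        putchar('+');           /* the embedded semantic action */
        R();
    }                           /* else: R -> epsilon */
}

/* T -> id { print(id.name) } */
static void T(void)
{
    putchar(*in);               /* print the identifier */
    in++;                       /* match id */
}

int main(void)
{
    in = "a+b+c";
    E();                        /* prints ab+c+ */
    putchar('\n');
    return 0;
}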
When designing a translation scheme for an L-attributed definition, the following conditions must be satisfied:
1. An inherited attribute of a symbol on the right side of a production must be computed in a semantic action placed before that symbol.
2. A semantic action must not refer to a synthesized attribute of a symbol to its right.
3. A synthesized attribute of the non-terminal on the left side can only be computed after all the attributes it references have been computed (normally the action computing it is placed at the end of the right side of the production).
4. With an L-attributed syntax-directed definition, it is always possible to construct a corresponding translation scheme which satisfies these three conditions (this may not be possible for a general syntax-directed translation).
A Translation Scheme with Inherited Attributes
D → T id { addtype(id.entry, T.type); L.in = T.type } L
T → int { T.type = integer }
T → real { T.type = real }
L → , id { addtype(id.entry, L.in); L1.in = L.in } L1
L → ε
This is a translation scheme for an L-attributed definition.
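In a predictive parser, an inherited attribute such as L.in becomes a parameter of the procedure for L. A minimal sketch of that idea in C; the token handling is faked with an identifier array, and all names are illustrative:

#include <stdio.h>

typedef char Type;             /* 'i' for integer, 'r' for real (assumed encoding) */

static void addtype(const char *id, Type t)
{
    printf("%s : %c\n", id, t);   /* stand-in for a symbol-table update */
}

/* L -> , id { addtype(id.entry, L.in); L1.in = L.in } L1  |  epsilon */
static void L(const char **names, Type in)
{
    if (*names == NULL) return;   /* L -> epsilon */
    addtype(*names, in);          /* semantic action */
    L(names + 1, in);             /* L1 inherits L.in */
}

/* D -> T id { addtype(id.entry, T.type); L.in = T.type } L */
static void D(Type t, const char *id, const char **rest)
{
    addtype(id, t);
    L(rest, t);                   /* pass the inherited attribute down */
}

int main(void)
{
    const char *rest[] = { "b", "c", NULL };
    D('i', "a", rest);            /* int a, b, c */
    return 0;
}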
UNIT VI: INTERMEDIATE CODE GENERATION
SYLLABUS:
x Control flow; Back patching;
x Switch statements;
x Procedure calls.
Background
A compiler front end translates the source program into an intermediate representation, from which the back end generates target code:

  source program → front end → intermediate code → back end → target code

Static checking includes type checking, which ensures that operators are applied to compatible operands. It also includes any syntactic checks that remain after parsing.
Ex: static checking assures that a break statement in C is enclosed within a while, for or switch statement; otherwise an error message is issued.
Intermediate representation
A compiler may construct a sequence of intermediate representations, as in Fig 6.2.
Fig 6.2: A compiler might use a sequence of intermediate representations (figure)
High-level representations are close to the source language and low-level representations are close to the target machine.
There are three types of intermediate representation:
1. Syntax trees
2. Postfix notation
3. Three-address code
Eg 2: syntax tree for the assignment statement a = b*-c + b*-c, and the DAG for a = b*-c + b*-c (figures; in the DAG the repeated subexpression b*-c is represented by a single shared node).
The DAG representation may expose instances where redundancies can be eliminated.
SDD to construct a DAG for the expression a + a * ( b - c ) + ( b - c ) * d (figure).
6.1.2 The Value-Number Method for Constructing DAG’s
In many applications, nodes are implemented as records stored in an array, as in Figure 7. In the figure, each record has a label field that determines the nature of the node. We can refer to a node by its index in the array; the integer index of a node is often called its value number. For example, using value numbers, we can say node 3 has label +, its left child is node 1, and its right child is node 2. The following algorithm can be used to create nodes for a DAG representation of an expression.
(Figure 7: nodes of a DAG stored in an array of records, with a leaf entry pointing to the symbol-table entry for i.)
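A minimal sketch of the search-then-create lookup that assigns value numbers; the fixed-size array and the label encoding are illustrative:

#include <stdio.h>

/* One record per DAG node: an operator label and the value numbers
 * of its children (0 = no child). */
struct node { char label; int left, right; };

static struct node nodes[100];
static int nnodes = 0;

/* Return the value number of node (label, l, r): search the array
 * first and create a new record only if no such node exists. */
int value_number(char label, int l, int r)
{
    for (int i = 1; i <= nnodes; i++)
        if (nodes[i].label == label && nodes[i].left == l && nodes[i].right == r)
            return i;                            /* reuse the existing node */
    nodes[++nnodes] = (struct node){ label, l, r };
    return nnodes;
}

int main(void)
{
    /* build the DAG for i + 10 twice: the second request reuses node 3 */
    int i_   = value_number('i', 0, 0);          /* leaf for i, value number 1 */
    int ten  = value_number('c', 0, 0);          /* leaf for constant 10, value number 2 */
    int add  = value_number('+', i_, ten);
    int add2 = value_number('+', i_, ten);
    printf("%d %d\n", add, add2);                /* prints 3 3 */
    return 0;
}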
6.2 Three-Address Code
• In three-address code, there is at most one operator on the right side of an instruction; that is, no built-up arithmetic expressions are permitted.
    x + y * z    translates to    t1 = y * z
                                  t2 = x + t1
• Example 6.4: (figure)
Problems: write the three-address code for the following:
1. if (x + y * z > x * y + z) a = 0;
2. (2 + a * (b - c / d)) / e
3. a := b * -c + b * -c
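As a worked check, one possible three-address sequence for problem 2, (2 + a * (b - c / d)) / e, is:

t1 = c / d
t2 = b - t1
t3 = a * t2
t4 = 2 + t3
t5 = t4 / e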
Kinds of three-address statements:
– Assignment statements: x = y op z, where op is a binary operation
– x = op y, where op is a unary operation
– Copy statements: x = y
– Indexed assignments: x = y[i] and x[i] = y
– Pointer assignments: x = &y, *x = y and x = *y
Control-flow statements:
– Unconditional jump: goto L
– Conditional jump: if x relop y goto L ; if x goto L ; ifFalse x goto L
– Procedure calls: call procedure p with n parameters, and return y, where y is optional:
    param x1
    param x2
    ...
    param xn
    call p, n
• Example 6.5:
  – do i = i + 1; while (a[i] < v);  (figure)
6.2.2 Quadruples
• Three-address instructions can be implemented as objects or as records with fields for the operator and the operands.
• Three such representations: quadruples, triples, and indirect triples.
• A quadruple (or quad) has four fields: op, arg1, arg2, and result.
• Example 6.6: (figure)
6.2.3 Triples
• A triple has only three fields: op, arg1, and arg2.
• Using triples, we refer to the result of an operation x op y by its position, rather than by an explicit temporary name.
• Example 6.7: (figure)
Fig 6.11: Representations of a = b * - c + b * - c (figure)
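A minimal sketch of the two record layouts in C; the field widths and the string encoding of operands are illustrative:

#include <stdio.h>

/* Quadruple: the result names a temporary explicitly. */
struct quad   { char op[8]; char arg1[8], arg2[8], result[8]; };

/* Triple: the result is the instruction's own position, so
 * operands may refer to earlier triples by index, e.g. "(0)". */
struct triple { char op[8]; char arg1[8], arg2[8]; };

int main(void)
{
    /* t1 = minus c ; t2 = b * t1 as quadruples */
    struct quad q[] = {
        { "minus", "c", "",   "t1" },
        { "*",     "b", "t1", "t2" },
    };
    /* the same code as triples: (0) minus c ; (1) * b (0) */
    struct triple t[] = {
        { "minus", "c", ""    },
        { "*",     "b", "(0)" },
    };
    printf("%s %s -> %s\n", q[1].op, q[1].arg2, q[1].result);
    printf("(1): %s %s %s\n", t[1].op, t[1].arg1, t[1].arg2);
    return 0;
}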
6.3 Types and Declarations
• The applications of types can be grouped under checking and translation:
  – Type checking uses logical rules to reason about the behavior of a program at run time. Specifically, it ensures that the types of the operands match the type expected by an operator.
  – Translation applications: from the type of a name, a compiler can determine the storage that will be needed for that name at run time.
• Type information is also needed to calculate the address denoted by an array reference such as a[i].
6.3.1 Type Expressions
– A basic type is a type expression.
– A type name is a type expression.
– A type expression can be formed by applying the array type constructor to a number and a type expression.
– A record is a data structure with named fields.
– A type expression can be formed by using the type constructor → for function types.
– If s and t are type expressions, then their Cartesian product s × t is a type expression.
– Type expressions may contain variables whose values are type expressions.
6.3.2 Type Equivalence
• When type expressions are represented by graphs, two types are structurally equivalent if and only if one of the following conditions is true:
  – They are the same basic type.
  – They are formed by applying the same constructor to structurally equivalent types.
  – One is a type name that denotes the other.
6.3.3 Declarations
• We shall study types and declarations using a simplified grammar that declares just one name at a time.
    D → T id ; D | ε
    T → B C | record ‘{‘ D ‘}’
    B → int | float
    C → ε | [ num ] C
• Example 6.9: storage layout for local names (figure)
• Computing types and their widths (figure)
• Syntax-directed translation of array types (figure)
Sequences of declarations (figure)
6.4 Translation of Expressions
• An expression with more than one operator, like a + b * c, will translate into instructions with at most one operator per instruction.
• An array reference A[i][j] will expand into a sequence of three-address instructions that calculate an address for the reference.
6.4.1 Operations within Expressions
• Example 6.11: a = b + -c translates to
    t1 = minus c
    t2 = b + t1
    a = t2
Three-address code for expressions (figure)
Incremental translation (figure)
Semantic actions for an array reference (figure; the nonterminal L carries the attributes L.addr, L.array and L.type)
Conversions between primitive types in Java (figure)
Introducing type conversions into expression evaluation (figure)
Abstract syntax tree for a function definition (figure)
The unification algorithm (pseudocode; find and union are the usual union-find operations on type-graph nodes):

boolean unify(Node m, Node n)
{
    s = find(m); t = find(n);          /* representatives of m and n */
    if ( s == t ) return true;         /* already the same class */
    else if ( nodes s and t represent the same basic type ) return true;
    else if ( s is an op-node with children s1 and s2 and
              t is an op-node with children t1 and t2 ) {
        union(s, t);
        return unify(s1, t1) and unify(s2, t2);   /* unify corresponding children */
    }
    else if ( s or t represents a variable ) {
        union(s, t);                   /* bind the variable to the other class */
        return true;
    }
    else return false;                 /* structural mismatch: not unifiable */
}
Control Flow
Boolean expressions are often used to:
  1. alter the flow of control, and
  2. compute logical values.
Flow-of-control statements (figure)
Syntax-directed definition for flow-of-control statements (figure)
Translation of a simple if-statement (figure)
Backpatching
The previous codes for Boolean expressions insert symbolic labels for jumps; a separate pass is therefore needed to set them to the appropriate addresses. We can use a technique named backpatching to avoid this.
We assume instructions are saved into an array, and labels are indices into this array.
For nonterminal B we use two attributes, B.truelist and B.falselist, together with the following functions:
  makelist(i): creates a new list containing only i, an index into the array of instructions
  merge(p1, p2): concatenates the lists pointed to by p1 and p2 and returns a pointer to the concatenated list
  backpatch(p, i): inserts i as the target label for each of the instructions on the list pointed to by p
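A minimal sketch of the three list-manipulation functions in C, with patch lists as singly linked lists and jump targets kept in a parallel array; all names are illustrative:

#include <stdio.h>
#include <stdlib.h>

/* An entry on a patch list: the index of one incomplete jump. */
struct list { int instr; struct list *next; };

static int target[1000];   /* target[i] = jump target of instruction i (0 = unfilled) */

/* makelist(i): a new list containing only instruction index i. */
struct list *makelist(int i)
{
    struct list *p = malloc(sizeof *p);
    p->instr = i; p->next = NULL;
    return p;
}

/* merge(p1, p2): concatenate two patch lists. */
struct list *merge(struct list *p1, struct list *p2)
{
    if (p1 == NULL) return p2;
    struct list *q = p1;
    while (q->next) q = q->next;
    q->next = p2;
    return p1;
}

/* backpatch(p, i): fill i in as the target of every jump on list p. */
void backpatch(struct list *p, int i)
{
    for (; p; p = p->next)
        target[p->instr] = i;
}

int main(void)
{
    /* two incomplete jumps at instructions 100 and 102 */
    struct list *truelist = merge(makelist(100), makelist(102));
    backpatch(truelist, 104);                     /* both now jump to 104 */
    printf("%d %d\n", target[100], target[102]);  /* prints 104 104 */
    return 0;
}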
Annotated parse tree for x < 100 || x > 200 && x != y (figure)
Flow-of-control statements (figure)
Translation of a switch-statement (figure)
UNIT VII: RUN-TIME ENVIRONMENTS
SYLLABUS:
x Heap management;
x Introduction to garbage collection.
The compiler must do the storage allocation and provide access to variables and data. This involves:
  Memory management
  Stack allocation
  Heap management
  Garbage collection
Storage Organization
• We assume a logical address space
  – The operating system will later map it to physical addresses, decide how to use cache memory, etc.
• Memory is typically divided into areas for
  – Program code
  – Other static data storage, including global constants and compiler-generated data
  – Stack, to support the call/return policy for procedures
  – Heap, to store data that can outlive a call to a procedure
Static vs. Dynamic Allocation
Static: compile-time allocation; Dynamic: run-time allocation.
Many compilers use some combination of the following:
  Stack storage: for local variables, parameters and so on
  Heap storage: for data that may outlive the call to the procedure that created it
Stack allocation is a valid allocation scheme for procedures since procedure calls are nested.
Activations for quicksort (figure)
Activation tree representing calls during an execution of quicksort (figure)
Activation records
Procedure calls and returns are usually managed by a run-time stack called the control stack.
Each live activation has an activation record (sometimes called a frame).
The root of the activation tree is at the bottom of the stack.
The current execution path specifies the content of the stack, with the record of the last activation at the top of the stack.
Activation Record
An activation record holds:
  Temporary values
  Local data
  A saved machine status
  An “access link”
  A control link
  Space for the return value of the called function
  The actual parameters used by the calling procedure
Elements in the activation record:
  temporary values that could not fit into registers
  local variables of the procedure
  saved machine status for the point at which this procedure was called; this includes the return address and the contents of registers to be restored
  access link to the activation record of the previous block or procedure in the lexical scope chain
  control link pointing to the activation record of the caller
  space for the return value of the function, if any
  actual parameters (or they may be placed in registers, if possible)
Access to dynamically allocated arrays (figure)
ML
ML is a functional language.
Variables are defined, and have their unchangeable values initialized, by a statement of the form:
  val (name) = (expression)
Functions are defined using the syntax:
  fun (name) ( (arguments) ) = (body)
Access links for finding nonlocal data (figure)
Maintaining the display (figure)
Memory Manager
Two basic functions:
  Allocation
  Deallocation
Properties of memory managers:
  Space efficiency
  Program efficiency
  Low overhead
Typical memory hierarchy configurations (figure)
Locality in Programs
The conventional wisdom is that programs spend 90% of their time executing 10% of the code:
  Programs often contain many instructions that are never executed.
  Only a small fraction of the code that could be invoked is actually executed in a typical run of the program.
  The typical program spends most of its time executing innermost loops and tight recursive cycles in a program.
UNIT VIII: CODE GENERATION
SYLLABUS:
x Basic blocks and Flow graphs;
x Optimization of basic blocks;
x A Simple Code Generator.
• Requirements on a code generator:
  – Preserving the semantic meaning of the source program and being of high quality
  – Making effective use of the available resources of the target machine
  – The code generator itself must run efficiently.
• A code generator has three primary tasks:
  – Instruction selection, register allocation, and instruction ordering
Issues in the Design of a Code Generator
• General tasks in almost all code generators: instruction selection, register allocation and assignment.
  – The details also depend on the specifics of the intermediate representation, the target language, and the run-time system.
• The most important criterion for a code generator is that it produce correct code. Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and maintained is an important design goal.
Input to the Code Generator
• The input to the code generator is
  – the intermediate representation of the source program produced by the front end, along with
  – information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.
The Target Program
• A stack-based target machine performs the operations on the operands at the top of the stack.
• Java Virtual Machine (JVM)
  – Just-in-time Java compiler
• The target program can be produced as
  – An absolute machine-language program
  – A relocatable machine-language program
  – An assembly-language program
• In this chapter we
  – use a very simple RISC-like computer as the target machine,
  – add some CISC-like addressing modes, and
  – use assembly code as the target language.
Instruction Selection
• The code generator must map the IR program into a code sequence that can be executed by the target machine.
• The complexity of the mapping is determined by factors such as
  – the level of the IR
  – the nature of the instruction-set architecture
  – the desired quality of the generated code
• If the IR is high level, code templates can be used to translate each IR statement into a sequence of machine instructions.
• The quality of the generated code is usually determined by its speed and size.
• A given IR program can be implemented by many different code sequences, with significant cost differences between the different implementations.
• A naïve translation of the intermediate code may therefore lead to correct but unacceptably inefficient target code.
• For example, use INC for a = a + 1 instead of
    LD R0, a
    ADD R0, R0, #1
    ST a, R0
• We need to know instruction costs in order to design good code sequences but, unfortunately, accurate cost information is often difficult to obtain.
Register Allocation
• A key problem in code generation is deciding what values to hold in what registers.
• Efficient utilization of registers is particularly important.
• The use of registers is often subdivided into two subproblems:
  1. Register allocation, during which we select the set of variables that will reside in registers at each point in the program, and
  2. Register assignment, during which we pick the specific register that a variable will reside in.
Example: (figure)
Evaluation Order
• The order in which computations are performed can affect the efficiency of the target code.
• Some computation orders require fewer registers to hold intermediate results than others.
• However, picking a best order in the general case is a difficult NP-complete problem.
A Simple Target Machine Model
• Our target computer models a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps.
• The underlying computer is a byte-addressable machine with n general-purpose registers.
• Assume the following kinds of instructions are available:
  – Load operations: LD dst, addr
  – Store operations: ST x, r
  – Computation operations: OP dst, src1, src2
  – Unconditional jumps: BR L
  – Conditional jumps: Bcond r, L
Program and Instruction Costs
• For simplicity, we take the cost of an instruction to be one plus the costs associated with the addressing modes of the operands.
• Addressing modes involving registers have zero additional cost, while those involving a memory location or a constant have an additional cost of one.
• For example,
  – LD R0, R1         cost = 1
  – LD R0, M          cost = 2
  – LD R1, *100(R2)   cost = 3
Addresses in the Target Code
• We show how names in the IR can be converted into addresses in the target code by looking at code generation for simple procedure calls and returns, using static and stack allocation.
• In Section 7.1, we described how each executing program runs in its own logical address space that is partitioned into four code and data areas:
  1. A statically determined area Code that holds the executable target code.
  2. A statically determined data area Static, for holding global constants and other data generated by the compiler.
  3. A dynamically managed area Heap, for holding data objects that are allocated and freed during program execution.
  4. A dynamically managed area Stack, for holding activation records as they are created and destroyed during procedure calls and returns.
• Code optimization:
  – A transformation of a program to make it run faster and/or take up less space.
  – Optimization should be safe, i.e. preserve the meaning of the program.
  – Code optimization is an important component in a compiler.
  – Examples:
    • Flow-of-control optimization:
        goto L1                 goto L2
        ...             →       ...
        L1: goto L2             L1: goto L2
    • Algebraic simplification:
        x := x + 0  and  x := x * 1  are equivalent to a nop.
    • Reduction in strength:
        x^2    →  x * x
        x * 4  →  x << 2
    • Instruction selection:
        Sometimes some hardware instructions can implement certain operations efficiently.
• Code optimization can be either high level or low level:
  – High-level code optimizations:
    • Loop unrolling, loop fusion, procedure inlining
  – Low-level code optimizations:
    • Instruction selection, register allocation
  – Some optimizations can be done at both levels:
    • Common subexpression elimination, strength reduction, etc.
  – The flow graph is a common intermediate representation for code optimization.
• Basic block: a sequence of consecutive statements with exactly one entry and one exit.
• Flow graph: a directed graph where the nodes are basic blocks, with an edge B1 → B2 if and only if B2 can be executed immediately after B1.
• Algorithm to construct the flow graph (see the C sketch after the example below):
  – Finding the leaders of the basic blocks:
    • The first statement is a leader.
    • Any statement that is the target of a conditional or unconditional goto is a leader.
    • Any statement that immediately follows a goto or conditional goto statement is a leader.
  – For each leader, its basic block consists of all statements up to the next leader.
  – There is an edge B1 → B2 if and only if B2 can be executed immediately after B1.
• Example:
100: sum = 0
ww
101: j = 0
102: goto 107
103: t1 = j << 2
104: t2 = addr(a)
105: t3 = t2[t1]
106: sum = sum + t3
107: if j < n goto 103
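A minimal sketch of the leader-finding rules applied to the example above; the instruction encoding is illustrative:

#include <stdio.h>

/* One three-address instruction: whether it can jump, and where to. */
struct instr { int is_jump; int target; };

int main(void)
{
    /* instructions 100..107 of the example; target is an index 0..7 */
    struct instr code[8] = {
        {0, 0}, {0, 0}, {1, 7},          /* 102: goto 107 */
        {0, 0}, {0, 0}, {0, 0}, {0, 0},
        {1, 3},                          /* 107: if j < n goto 103 */
    };
    int n = 8, leader[8] = {0};

    leader[0] = 1;                            /* rule 1: first statement */
    for (int i = 0; i < n; i++)
        if (code[i].is_jump) {
            leader[code[i].target] = 1;       /* rule 2: jump target */
            if (i + 1 < n) leader[i + 1] = 1; /* rule 3: follows a jump */
        }

    for (int i = 0; i < n; i++)
        if (leader[i]) printf("leader: %d\n", 100 + i);
    /* prints 100, 103, 107 */
    return 0;
}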
• Optimizations within a basic block are called local optimizations.
• Optimizations across basic blocks are called global optimizations.
• Some common optimizations:
  – Instruction selection
  – Register allocation
  – Common subexpression elimination
  – Code motion
  – Strength reduction
  – Induction variable elimination
  – Dead code elimination
  – Branch chaining
  – Jump elimination
  – Instruction scheduling
  – Procedure inlining
  – Loop unrolling
  – Loop fusing
  – Code hoisting
• Instruction selection:
  – Using a more efficient instruction to replace a sequence of instructions (space and speed).
  – Example:
      Mov R2, (R3)
      Add R2, #1, R2     →     Add (R3), 1, (R3)
      Mov (R3), R2
• Register allocation: allocate variables to registers (speed). Example: (figure)
• Induction variable elimination: the value of one variable can be induced from another variable. Example: (figure)
• Common subexpression elimination: an expression was previously calculated and the variables in the expression have not changed since, so we can avoid re-computing the expression. Example: (figure)
ALGEBRAIC TRANSFORMATION
Statements such as
  x := x + 0
or
  x := x * 1
can be eliminated from a basic block without changing the set of expressions it computes. The exponentiation operator in the statement
  x := y ** 2
usually requires a function call to implement. Using an algebraic transformation, this statement can be replaced by the cheaper, but equivalent, statement
  x := y * y
FLOW GRAPHS
We can add the flow-of-control information to the set of basic blocks making up a program by constructing a directed graph called a flow graph. The nodes of the flow graph are the basic blocks. One node is distinguished as initial; it is the block whose leader is the first statement. There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some execution sequence; that is, if
1. there is a conditional or unconditional jump from the last statement of B1 to the first statement of B2, or
2. B2 immediately follows B1 in the order of the program, and B1 does not end in an unconditional jump.
We say B1 is a predecessor of B2, and B2 is a successor of B1.
Example 4: The flow graph of the program of fig. 7 is shown in fig. 9; B1 is the initial node.
B1:  prod := 0
     i := 1
B2:  t1 := 4 * i
     t2 := a[t1]
     t3 := 4 * i
     t4 := b[t3]
     t5 := t2 * t4
     t6 := prod + t5
     prod := t6
     t7 := i + 1
     i := t7
     if i <= 20 goto B2

Basic blocks can be represented by a variety of data structures. For example, after partitioning the three-address statements by Algorithm 1, each basic block can be represented by a record consisting of a count of the number of quadruples in the block, followed by a pointer to the leader of the block, and by the lists of predecessors and successors of the block. Jumps should then refer to blocks rather than to quadruple positions: if the block B2, running from statements (3) through (12) in the intermediate code of figure 2, were moved elsewhere in the quadruples array or were shrunk, the (3) in "if i <= 20 goto (3)" would have to be changed.
LOOPS
A loop is a collection of nodes in a flow graph such that
1. All nodes in the collection are strongly connected: from any node in the loop to any other, there is a path of length one or more, wholly within the loop, and
2. The collection of nodes has a unique entry, that is, a node in the loop such that the only way to reach a node of the loop from a node outside the loop is to first go through the entry.
A loop that contains no other loops is called an inner loop.
REDUNDANT LOADS AND STORES
If we see the instruction sequence
  (1) MOV R0, a
  (2) MOV a, R0
we can delete instruction (2), because whenever (2) is executed, (1) will ensure that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).
UNREACHABLE CODE
Another opportunity for peephole optimization is the removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions. For example, for debugging purposes, a large program may have within it certain segments that are executed only if a variable debug is 1. In C, the source code might look like:

#define debug 0
...
if ( debug ) {
    print debugging information
}

In the intermediate representation the if-statement may be translated as:

    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2: ...                                        (a)

One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the value of debug, (a) can be replaced by:

    if debug != 1 goto L2
    print debugging information
L2: ...

Similarly, we can replace the jump sequence
    goto L1
    ...
L1: goto L2
by the sequence
    goto L2
    ...
L1: goto L2
If there are now no jumps to L1, then it may be possible to eliminate the statement L1: goto L2, provided it is preceded by an unconditional jump. Similarly, the sequence
    if a < b goto L1
    ...
L1: goto L2
can be replaced by
    if a < b goto L2
    ...
L1: goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then the sequence
    goto L1
    ...
L1: if a < b goto L2
L3:                                            (1)
may be replaced by
    if a < b goto L2
    goto L3
    ...
L3:                                            (2)
While the number of instructions in (1) and (2) is the same, we sometimes skip the unconditional jump in (2), but never in (1). Thus (2) is superior to (1) in execution time.
ALGEBRAIC SIMPLIFICATION
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization; statements such as x := x + 0 or x := x * 1 can simply be eliminated.
REDUCTION IN STRENGTH
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. For example, x^2 is invariably cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.
USE OF MACHINE IDIOMS
The target machine may have hardware instructions to implement certain specific operations efficiently. Detecting situations that permit the use of these instructions can reduce execution time significantly. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after using its value. The use of these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like i := i + 1.
Getting Better Performance
Dramatic improvements in the running time of a program, such as cutting the running time from a few hours to a few seconds, are usually obtained by improving the program at all levels, from the source level to the target level, as suggested by the figure. At each level, the available options fall between the two extremes of finding a better algorithm and of implementing a given algorithm so that fewer operations are performed.
Algorithmic transformations occasionally produce spectacular improvements in running time. For example, Bentley relates that the running time of a program for sorting N elements dropped from 2.02N^2 microseconds to 12N log2 N microseconds when a carefully coded "insertion sort" was replaced by "quicksort".
THE PRINCIPAL SOURCES OF OPTIMIZATION
Here we introduce some of the most useful code-improving transformations. Techniques for implementing these transformations are presented in subsequent sections. A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise, it is called global.
Frequently, a program will include several calculations of the same value, such as an offset in an array. Some of these duplicate calculations cannot be avoided by the programmer because they lie below the level of detail accessible within the source language. For example, block B5 shown in the figure recalculates 4*i and 4*j.
Local common subexpression elimination
Common Subexpressions
An occurrence of an expression E is called a common subexpression if E was previously computed, and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value. For example, the assignments to t7 and t10 have the common subexpressions 4*i and 4*j, respectively, on the right side in the figure. They have been eliminated by using t6 instead of t7 and t8 instead of t10. This change is what would result if we reconstructed the intermediate code from the DAG for the basic block.
Example: The figure shows the result of eliminating both global and local common subexpressions from blocks B5 and B6 in the flow graph. We first discuss the transformation of B5 and then mention some subtleties involving arrays.
For example, the statements
  t8 := 4*j ; t9 := a[t8] ; a[t8] := x
in B5 can be replaced by
  t9 := a[t4] ; a[t4] := x
using t4 computed in block B3. In the figure, observe that as control passes from the evaluation of 4*j in B3 to B5, there is no change in j, so t4 can be used if 4*j is needed.
Another common subexpression comes to light in B5 after t4 replaces t8. The new expression a[t4] corresponds to the value of a[j] at the source level. Not only does j retain its value as control leaves B3 and then enters B5, but a[j], a value computed into a temporary t5, does too, because there are no assignments to elements of the array a in the interim. The statement
  t9 := a[t4] ; a[t6] := t9
in B5 can therefore be replaced by
  a[t6] := t5
The expression a[t1] in blocks B1 and B6 is not considered a common subexpression, although t1 can be used in both places. After control leaves B1 and before it reaches B6, it can go through B5, where there are assignments to a. Hence, a[t1] may not have the same value on reaching B6 as it did on leaving B1, and it is not safe to treat a[t1] as a common subexpression.
Copy Propagation
Block B5 in the figure can be further improved by eliminating x, using two new transformations. One concerns assignments of the form f := g, called copy statements, or copies for short. Had we gone into more detail in Example 10.2, copies would have arisen much sooner, because the algorithm for eliminating common subexpressions introduces them, as do several other algorithms. For example, when the common subexpression in c := d+e is eliminated in the figure, the algorithm uses a new variable t to hold the value of d+e. Since control may reach c := d+e either after the assignment to a or after the assignment to b, it would be incorrect to replace c := d+e by either c := a or by c := b.
The idea behind the copy-propagation transformation is to use g for f, wherever possible after the copy statement f := g. For example, the assignment x := t3 in block B5 of the figure is a copy. Copy propagation applied to B5 yields:
  x := t3
  a[t2] := t5
  a[t4] := t3
  goto B2
Dead-Code Elimination
A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A related idea is dead or useless code: statements that compute values that never get used. Earlier we discussed the use of a variable debug that is set to true or false at various points in the program, and used in statements like
  if (debug) print ...
By a data-flow analysis, it may be possible to deduce that each time the program reaches this statement, the value of debug is false. Usually, it is because there is one particular statement
  debug := false
that we can deduce to be the last assignment to debug prior to the test, no matter what sequence of branches the program actually takes. If copy propagation replaces debug by false, then the print statement is dead because it cannot be reached. We can eliminate both the test and the printing from the object code. More generally, deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding.
One advantage of copy propagation is that it often turns the copy statement into dead code. For example, copy propagation followed by dead-code elimination removes the assignment to x and transforms the code above into:
  a[t2] := t5
  a[t4] := t3
  goto B2
Loop Optimizations
We now give a brief introduction to a very important place for optimizations, namely loops, especially the inner loops where programs tend to spend the bulk of their time. The running time of a program may be improved if we decrease the number of instructions in an inner loop, even if we increase the amount of code outside that loop. Three techniques are important for loop optimization: code motion, which moves code outside a loop; induction-variable elimination, which we apply to eliminate i and j from the inner loops B2 and B3; and reduction in strength, which replaces an expensive operation by a cheaper one, such as a multiplication by an addition.
Code Motion
An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times a loop is executed (a loop-invariant computation) and places the expression before the loop. Note that the notion "before the loop" assumes the existence of an entry for the loop. For example, evaluation of limit-2 is a loop-invariant computation in the following while-statement:
  while ( i <= limit-2 )
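Code motion transforms this into the equivalent of:

  t = limit - 2
  while ( i <= t )

so that limit-2 is computed once, before the loop, instead of on every iteration.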
Induction Variables and Reduction in Strength
When there are two or more induction variables in a loop, it may be possible to get rid of all but one, by the process of induction-variable elimination. For the inner loop around B3 in the figure we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4. However, we can illustrate reduction in strength and a part of the process of induction-variable elimination. Eventually j will be eliminated when the outer loop of B2-B5 is considered.
Example: As the relationship t4 := 4*j surely holds after such an assignment to t4 in the figure, and t4 is not changed elsewhere in the inner loop around B3, it follows that just after the statement j := j-1 the relationship t4 = 4*j - 4 must hold. We may therefore replace the assignment t4 := 4*j by t4 := t4 - 4. The only problem is that t4 does not have a value when we enter block B3 for the first time. Since we must maintain the relationship t4 = 4*j on entry to block B3, we place an initialization of t4 at the end of the block where j itself is initialized, shown by the dashed addition to block B1 in the second figure.
The replacement of a multiplication by a subtraction will speed up the object code if multiplication takes more time than addition or subtraction, as is the case on many machines.
Before and after strength reduction (figure)