SP Unit III-2024-25

This document outlines the syllabus for the Compilers unit in the Systems Programming course at MIT School of Computing, Pune. It covers the phases of compilation, the role of lexical analyzers and parsers, and various parsing techniques including top-down and bottom-up parsing. Additionally, it discusses error recovery methods, token specifications, and the construction of predictive parsing tables.

Uploaded by

GAYATRI BHOSALE


MIT Art Design and Technology University

MIT School of Computing, Pune

21BTCS601 – Systems Programming

Class - T.Y. (SEM-II)

Unit – III
COMPILERS

AY 2024-2025 SEM-II
Unit III - Syllabus

Unit III –Compilers 09 hours

• Phase structure of Compiler and entire compilation process.
• Lexical Analyzer: The Role of the Lexical Analyzer
• Input Buffering, Specification of Tokens, Recognition of Tokens,
• Design of Lexical Analyzer using Uniform Symbol Table,
• Lexical Errors.
• Role of parsers,
• Classification of Parsers:
• Top down parsers- recursive descent parser and predictive parser (LL parser),
• Bottom up Parsers – Shift Reduce parser, LR parser. YACC specification and Automatic
construction of Parser (YACC).
Compilers
• “Compilation”
• Translation of a program written in a source language into a semantically
equivalent program written in a target language
• Oversimplified view:

[Figure: Source Program → Compiler → Target Program. The compiler reports error messages; the target program turns Input into Output.]
Preprocessors, Compilers, Assemblers, and Linkers
• Skeletal Source Program → Preprocessor → Source Program
• Source Program → Compiler → Target Assembly Program
• Target Assembly Program → Assembler → Relocatable Object Code
• Relocatable Object Code, together with Libraries and Relocatable Object Files → Linker → Absolute Machine Code

Try for example:
gcc -v myprog.c
The Phases of a Compiler
Each phase is listed with its output and a sample.
• Programmer (source code producer)
  Output: source string
  Sample: A=B+C;
• Scanner (performs lexical analysis)
  Output: token string, and symbol table with names
  Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
• Parser (performs syntax analysis based on the grammar of the programming language)
  Output: parse tree or abstract syntax tree
  Sample:
        ;
        |
        =
       / \
      A   +
         / \
        B   C
• Semantic analyzer (type checking, etc.)
  Output: annotated parse tree or abstract syntax tree
• Intermediate code generator
  Output: three-address code, quads, or RTL
  Sample:
    int2fp B t1
    + t1 C t2
    := t2 A
• Optimizer
  Output: three-address code, quads, or RTL
  Sample:
    int2fp B t1
    + t1 #2.3 A
• Code generator
  Output: assembly code
  Sample:
    MOVF #2.3,r1
    ADDF2 r1,r2
    MOVF r2,A
• Peephole optimizer
  Output: assembly code
  Sample:
    ADDF2 #2.3,r2
    MOVF r2,A
The Grouping of Phases
• Compiler front and back ends:
• Front end: analysis (machine independent)
• Back end: synthesis (machine dependent)
• Compiler passes:
• A collection of phases is done only once (single pass) or multiple times (multi pass)
• Single pass: usually requires everything to be defined before being used in the source program
• Multi pass: the compiler may have to keep the entire program representation in memory
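The role of lexical analyzer

[Figure: the source program feeds the Lexical Analyzer, which returns a token to the Parser each time the parser calls getNextToken; the parser's output goes to semantic analysis; both the lexical analyzer and the parser consult the symbol table]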
Input buffering
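• Sometimes the lexical analyzer needs to look ahead some symbols to decide which token to return
• In C: we need to look at the character after -, = or < to decide which token to return
• In Fortran: DO 5 I = 1.25 (only the '.' reveals that this is an assignment to DO5I; with a ',' it would be the start of a DO loop)
• We need to introduce a two-buffer scheme to handle large lookaheads safely

[Figure: input buffer holding E = M * C ** 2 followed by eof]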
Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional attribute value
• A pattern is a description of the form that the lexemes of a token may take
• A lexeme is a sequence of characters in the source program that matches the pattern for a token
Example

Token        Informal description                     Sample lexemes
if           Characters i, f                          if
else         Characters e, l, s, e                    else
comparison   < or > or <= or >= or == or !=           <=, !=
id           Letter followed by letters and digits    pi, score, D2
number       Any numeric constant                     3.14159, 0, 6.02e23
literal      Anything but “ surrounded by “           “core dumped”

Sample source line: printf(“total = %d\n”, score);

Attributes for tokens
• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
Specification of tokens
• In the theory of compilation, regular expressions are used to formalize the specification of tokens
• Regular expressions are a means for specifying regular languages
• Example:
• letter_ (letter_ | digit)*
• Each regular expression is a pattern specifying the form of strings
Regular expressions
• Ɛ is a regular expression, L(Ɛ) = {Ɛ}
• If a is a symbol in ∑ then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn

• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Recognition of tokens
• Starting point is the language grammar to understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt
|Ɛ
expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
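
The patterns above can be tried out with an ordinary regular expression library. The following is a minimal sketch, assuming Python's `re` module; the names `TOKEN_SPEC`, `KEYWORDS` and `tokenize` are illustrative and not part of the slides. Keywords are recognized by first matching the `id` pattern and then looking the lexeme up in a keyword set, which is one common choice among several.

```python
import re

# One named group per token class, mirroring the regular definitions above.
# Longer operators are listed first so that "<=" is not split into "<" and "=".
TOKEN_SPEC = [
    ("ws",     r"[ \t\n]+"),
    ("number", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("id",     r"[A-Za-z_][A-Za-z_0-9]*"),
    ("relop",  r"<=|>=|<>|<|>|="),
]
KEYWORDS = {"if", "then", "else"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token name, lexeme) pairs; whitespace is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # No pattern matches here: this is a lexical error.
            raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme            # reserved word, not an identifier
        tokens.append((kind, lexeme))
    return tokens
```

For instance, `tokenize("if x1 <= 6.02E23 then y = 2")` yields the tokens if, id, relop, number, then, id, relop, number in that order.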

Mohamed Sathak Engineering College B.E CSE III Year

Transition diagrams
• Transition diagram for relop
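
A transition diagram of this kind can be coded directly as a chain of character tests. The sketch below is an assumption about how such a diagram is typically hand-coded, not a transcription of the slide's figure; the attribute names LT, LE, NE, EQ, GT, GE and the function `relop` are illustrative. It covers the relop pattern < | > | <= | >= | = | <>.

```python
def relop(text, pos=0):
    """Return (attribute, next position), or None if no relop starts at pos."""
    c = text[pos] if pos < len(text) else ""
    nxt = text[pos + 1] if pos + 1 < len(text) else ""
    if c == "<":
        if nxt == "=":
            return ("LE", pos + 2)
        if nxt == ">":
            return ("NE", pos + 2)
        return ("LT", pos + 1)   # retract: the lookahead character is not consumed
    if c == "=":
        return ("EQ", pos + 1)
    if c == ">":
        if nxt == "=":
            return ("GE", pos + 2)
        return ("GT", pos + 1)   # retract as above
    return None
```

The two "retract" cases correspond to accepting states that are reached only after reading one character too many.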
Transition diagrams (cont.)
• Transition diagram for reserved words and identifiers
Transition diagrams (cont.)
• Transition diagram for unsigned numbers

Transition diagrams (cont.)
• Transition diagram for whitespace
Lexical errors
• Some errors are beyond the power of the lexical analyzer to recognize:
• fi (a == f(x)) …
• However, it may be able to recognize errors like:
• d = 2r
• Such errors are recognized when no token pattern matches a character sequence
Error recovery
• Panic mode: successive characters are ignored until we reach a well-formed token
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent characters

Lexical Analyzer Generator - Lex

• Lex source program lex.l → Lex compiler → lex.yy.c
• lex.yy.c → C compiler → a.out
• Input stream → a.out → sequence of tokens
Role of Parsers
Classification of Parsers
Elimination of left recursion
• A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα (one or more steps)

• Top-down parsing methods cannot handle left-recursive grammars

• A simple rule for direct left recursion elimination:
• For a rule like:
• A -> A α | β
• We may replace it with
• A -> β A’
• A’ -> α A’ | ɛ
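
The replacement rule above can be sketched as a small function. This is an illustrative sketch only: the representation (each right side is a list of symbols, with the empty list standing for ɛ) and the name `eliminate_direct_left_recursion` are assumptions, and only direct left recursion is handled.

```python
def eliminate_direct_left_recursion(nt, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ɛ.

    alternatives is a list of right sides; each right side is a list of
    symbols and [] denotes ɛ.
    """
    # The α parts: alternatives that start with A itself, with A stripped.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The β parts: every other alternative.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}      # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }
```

Applied to E -> E + T | T, the function returns E -> T E' and E' -> + T E' | ɛ.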
Left recursion elimination (cont.)
• There are cases like the following, where the left recursion is indirect (S ⇒ Aa ⇒ Sda):
• S -> Aa | b
• A -> Ac | Sd | ɛ
• Left recursion elimination algorithm:
Arrange the nonterminals in some order A1,A2,…,An.
for (each i from 1 to n) {
    for (each j from 1 to i-1) {
        replace each production of the form Ai -> Aj γ by the productions
        Ai -> δ1 γ | δ2 γ | … | δk γ, where Aj -> δ1 | δ2 | … | δk are all current Aj-productions
    }
    eliminate the immediate left recursion among the Ai-productions
}
Left factoring
• Left factoring is a grammar transformation that is useful for producing
a grammar suitable for predictive or top-down parsing.
• Consider following grammar:
• Stmt -> if expr then stmt else stmt
• | if expr then stmt
• On seeing the input token if, it is not clear to the parser which production to use
• We can easily perform left factoring:
• If we have A->αβ1 | αβ2 then we replace it with
• A -> αA’
• A’ -> β1 | β2
Left factoring (cont.)
• Algorithm
• For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ɛ, then replace all of the A-productions A -> αβ1 | αβ2 | … | αβn | γ by
• A -> αA’ | γ
• A’ -> β1 | β2 | … | βn
• Example:
• S -> i E t S | i E t S e S | a
• E -> b
Top Down Parsing
• A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right
• It can also be viewed as finding a leftmost derivation for an input string
• Example: id+id*id

E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

[Figure: sequence of partial parse trees built by leftmost derivation steps (lm): E expands to T E’, T to F T’, F to id, T’ to Ɛ, and E’ to + T E’]
Recursive descent parsing
• Consists of a set of procedures, one for each nonterminal
• Execution begins with the procedure for the start symbol
• A typical procedure for a non-terminal:
void A() {
    choose an A-production, A->X1X2..Xk
    for (i=1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else /* an error has occurred */
    }
}
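
As a concrete instance of the procedure scheme above, here is a minimal sketch of a recursive descent recognizer for the expression grammar E -> TE’, E’ -> +TE’ | Ɛ, T -> FT’, T’ -> *FT’ | Ɛ, F -> (E) | id. The class and method names (`Parser`, `ParseError`, `accepts`) are illustrative assumptions; each production is chosen by looking at one input token, so no backtracking is needed for this grammar.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | ɛ
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | ɛ
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")        # the whole input must be consumed
        return True
    except ParseError:
        return False
```

The ɛ-alternatives are taken simply by returning without consuming input; a trailing `match("$")` rejects inputs with leftover tokens.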
Recursive descent parsing (cont)
• General recursive descent may require backtracking
• The previous code needs to be modified to allow backtracking
• In its general form it cannot choose an A-production easily
• So we need to try all alternatives
• If one fails, the input pointer needs to be reset and another alternative should be tried
• Recursive descent parsers cannot be used for left-recursive grammars
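Example

S->cAd
A->ab | a
Input: cad

[Figure: three parse trees. S is expanded to c A d; A is first expanded to a b, which fails on the input symbol d; the parser backtracks and expands A to a, which succeeds]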
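
The backtracking in this example can be sketched with Python generators, where trying the next alternative automatically restarts from the saved input position. This is only one possible formulation; the names `GRAMMAR`, `derive` and `accepts` are illustrative, input symbols are single characters, and the sketch would loop forever on a left-recursive grammar.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching symbols from pos."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: try each alternative in order; a failed alternative
        # simply yields nothing, and the next one starts again at pos.
        for alternative in GRAMMAR[first]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif pos < len(text) and text[pos] == first:
        # Terminal: it must equal the current input symbol.
        yield from derive(rest, text, pos + 1)

def accepts(text):
    return any(end == len(text) for end in derive(["S"], text, 0))
```

On input cad, the alternative A -> ab fails at d, and the parser falls back to A -> a, as in the figure.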
First and Follow
• First(α) is the set of terminals that begin strings derived from α
• If α ⇒* ɛ then ɛ is also in First(α)
• In predictive parsing, when we have A -> α | β, if First(α) and First(β) are disjoint sets then we can select the appropriate A-production by looking at the next input symbol
• Follow(A), for any nonterminal A, is the set of terminals a that can appear immediately after A in some sentential form
• If we have S ⇒* αAaβ for some α and β then a is in Follow(A)
• If A can be the rightmost symbol in some sentential form, then $ is in Follow(A)
Computing First
• To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ɛ can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal and X->Y1Y2…Yk is a production for some k>=1, then place a in First(X) if for some i a is in First(Yi) and ɛ is in all of First(Y1),…,First(Yi-1), that is, Y1…Yi-1 ⇒* ɛ. If ɛ is in First(Yj) for all j=1,…,k then add ɛ to First(X).
3. If X -> ɛ is a production then add ɛ to First(X)
• Example!
Computing follow
• To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A -> αBβ then everything in First(β) except ɛ is in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ɛ, then everything in Follow(A) is in Follow(B)
• Example!
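
The First and Follow rules can be run to a fixed point mechanically. The sketch below does this for the expression grammar E -> TE’, E’ -> +TE’ | Ɛ, T -> FT’, T’ -> *FT’ | Ɛ, F -> (E) | id; the data layout (a dict of right sides, with the empty list for ɛ) and the function names are assumptions made for illustration.

```python
EPS = "ɛ"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_string(symbols, first):
    """First of a symbol string, given the First sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)              # every symbol (or none at all) can vanish
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:               # repeat until no set grows any more
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")       # rule 1
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_string(alt[i + 1:], first)
                    new = tail - {EPS}           # rule 2
                    if EPS in tail:
                        new |= follow[nt]        # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Calling `compute_follow(compute_first())` gives, for example, Follow(F) = {+, *, ), $}.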
LL(1) Grammars
• Predictive parsers are those recursive descent parsers needing no backtracking
• Grammars for which we can create predictive parsers are called LL(1)
• The first L means scanning the input from left to right
• The second L means leftmost derivation
• And 1 stands for using one input symbol of lookahead
• A grammar G is LL(1) if and only if whenever A -> α | β are two distinct productions of G, the following conditions hold:
• For no terminal a do α and β both derive strings beginning with a
• At most one of α or β can derive the empty string
• If α ⇒* ɛ then β does not derive any string beginning with a terminal in Follow(A). Likewise, if β ⇒* ɛ then α does not derive any string beginning with a terminal in Follow(A).
Construction of Predictive Parsing table
• For each production A->α in the grammar do the following:
1. For each terminal a in First(α) add A->α to M[A,a]
2. If ɛ is in First(α), then for each terminal b in Follow(A) add A->α to M[A,b]. If ɛ is in First(α) and $ is in Follow(A), add A->α to M[A,$] as well
• If after performing the above there is no production in M[A,a], then set M[A,a] to error
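Example
Grammar:
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

Non-terminal   First      Follow
F              {(,id}     {+, *, ), $}
T              {(,id}     {+, ), $}
E              {(,id}     {), $}
E’             {+,ɛ}      {), $}
T’             {*,ɛ}      {+, ), $}

Predictive parsing table M (input symbols id + * ( ) $; entries not listed are error):
E:   M[E,id] = E -> TE’      M[E,(] = E -> TE’
E’:  M[E’,+] = E’ -> +TE’    M[E’,)] = E’ -> Ɛ     M[E’,$] = E’ -> Ɛ
T:   M[T,id] = T -> FT’      M[T,(] = T -> FT’
T’:  M[T’,+] = T’ -> Ɛ       M[T’,*] = T’ -> *FT’  M[T’,)] = T’ -> Ɛ    M[T’,$] = T’ -> Ɛ
F:   M[F,id] = F -> id       M[F,(] = F -> (E)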
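
A parsing table like the one above is used by a stack-driven loop rather than by recursive procedures. The following sketch hard-codes the table for the expression grammar and assumes the usual scheme (stack initialized with $ and the start symbol, right sides pushed in reverse); the names `TABLE`, `NONTERMINALS` and `ll1_parse` are illustrative.

```python
# Predictive parsing table for the expression grammar; [] stands for ɛ.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return the productions used (a leftmost derivation), or None on error."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                 # start symbol on top of the end marker
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:           # empty table entry: syntax error
                return None
            output.append((top, body))
            stack.extend(reversed(body))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1                   # terminal (or $) matches the input
        else:
            return None
    return output
```

For id+id*id the loop applies eleven productions, starting with E -> TE’.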
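Another example
S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b

Predictive parsing table M (input symbols a b e i t $; entries not listed are error):
S:   M[S,a] = S -> a         M[S,i] = S -> iEtSS’
S’:  M[S’,e] = S’ -> Ɛ and S’ -> eS (two entries, so the grammar is not LL(1))    M[S’,$] = S’ -> Ɛ
E:   M[E,b] = E -> b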
Bottom-Up Parsing
• A bottom-up parser creates the parse tree of the given input starting from the leaves towards the root.
• A bottom-up parser tries to find the right-most derivation of the given input in the reverse order.
S ⇒ ... ⇒ ω (the right-most derivation of ω)
← (the bottom-up parser finds the right-most derivation in the reverse order)
• Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce.
• At each shift action, the current symbol in the input string is pushed onto a stack.
• At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) will be replaced by the non-terminal at the left side of that production.
• There are also two more actions: accept and error.
CS416 Compiler Design 42
Shift-Reduce Parsing
• A shift-reduce parser tries to reduce the given input string into the starting symbol.

a string 🡺 (reduced to) the starting symbol

• At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal at the left side of that production rule.
• If the substring is chosen correctly, the right-most derivation of that string is created in the reverse order.

Rightmost Derivation: S ⇒* ω (every step is a rightmost step, rm)
Shift-Reduce Parser finds: ω ⇐ ... ⇐ S (the same rm steps, in reverse)

Shift-Reduce Parsing -- Example
Grammar:
S → aABb
A → aA | a
B → bB | b

Input string: aaabb
Reductions (⇓): aaabb, aaAbb, aAbb, aABb, S

Rightmost derivation (every step rm): S ⇒ aABb ⇒ aAbb ⇒ aaAbb ⇒ aaabb
All strings in this derivation are right-sentential forms.

• How do we know which substring is to be replaced at each reduction step?
Handle
• Informally, a handle of a string is a substring that matches the right side of a production rule.
• But not every substring that matches the right side of a production rule is a handle.
• A handle of a right-sentential form γ (≡ αβω) is a production rule A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ:
S ⇒* αAω ⇒ αβω (every step is a rightmost step, rm)
• If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.
• We will see that ω is a string of terminals.

Handle Pruning
• A right-most derivation in reverse can be obtained by handle-pruning.

S = γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn-1 ⇒ γn = ω (every step rm; ω is the input string)

• Start from γn, find a handle An→βn in γn, and replace βn by An to get γn-1.
• Then find a handle An-1→βn-1 in γn-1, and replace βn-1 by An-1 to get γn-2.
• Repeat this, until we reach S.
A Shift-Reduce Parser
Grammar:
E → E+T | T
T → T*F | F
F → (E) | id

Right-Most Derivation of id+id*id:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-Most Sentential Form    Reducing Production
id+id*id                      F → id
F+id*id                       T → F
T+id*id                       E → T
E+id*id                       F → id
E+F*id                        T → F
E+T*id                        F → id
E+T*F                         T → T*F
E+T                           E → E+T
E
In each right-sentential form, the handle is the occurrence of the reducing production's right side that the reduction replaces (shown red and underlined on the original slide).

A Stack Implementation of A Shift-Reduce Parser
• There are four possible actions of a shift-reduce parser:

1. Shift: The next input symbol is shifted onto the top of the stack.
2. Reduce: Replace the handle on the top of the stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: The parser discovers a syntax error, and calls an error recovery routine.

• The initial stack contains only the end-marker $.

• The end of the input string is marked by the end-marker $.

A Stack Implementation of A Shift-Reduce Parser
Stack      Input        Action
$          id+id*id$    shift
$id        +id*id$      reduce by F → id
$F         +id*id$      reduce by T → F
$T         +id*id$      reduce by E → T
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce by F → id
$E+F       *id$         reduce by T → F
$E+T       *id$         shift
$E+T*      id$          shift
$E+T*id    $            reduce by F → id
$E+T*F     $            reduce by T → T*F
$E+T       $            reduce by E → E+T
$E         $            accept

[Figure: parse tree for id+id*id; the nodes are numbered in the order in which the reductions create them: F1, T2, E3, F4, T5, F6, T7, E8]
Conflicts During Shift-Reduce Parsing
• There are context-free grammars for which shift-reduce parsers cannot be used.
• Stack contents and the next input symbol may not decide the action:
• shift/reduce conflict: whether to make a shift operation or a reduction.
• reduce/reduce conflict: the parser cannot decide which of several reductions to make.
• If a shift-reduce parser cannot be used for a grammar, that grammar is called a non-LR(k) grammar.
• In LR(k): L stands for left-to-right scanning, R for right-most derivation, and k for the number of lookahead symbols.
• An ambiguous grammar can never be an LR grammar.
Shift-Reduce Parsers
• There are two main categories of shift-reduce parsers

1. Operator-Precedence Parser
• simple, but only a small class of grammars.

2. LR-Parsers
• cover a wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
• SLR, LR and LALR work the same; only their parsing tables are different.

[Figure: nested grammar classes, SLR inside LALR inside LR inside CFG]
Actions of A LR-Parser
1. shift s -- shifts the next input symbol and the state s onto the stack
( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( S0 X1 S1 ... Xm Sm ai s, ai+1 ... an $ )

2. reduce A→β (or rn where n is a production number)
• pop 2|β| items from the stack (r = |β| grammar symbols and r states);
• then push A and s where s=goto[Sm-r,A]
( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( S0 X1 S1 ... Xm-r Sm-r A s, ai ... an $ )
• Output is the reducing production A→β

3. accept -- parsing successfully completed

4. error -- parser detected an error (an empty entry in the action table)
Reduce Action
• pop 2|β| items from the stack (r = |β| grammar symbols and r states); let us assume that β = Y1Y2...Yr
• then push A and s where s=goto[Sm-r,A]

( S0 X1 S1 ... Xm-r Sm-r Y1 Sm-r+1 ... Yr Sm, ai ai+1 ... an $ )
🡺 ( S0 X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

• In fact, Y1Y2...Yr is a handle.

X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm-r Y1...Yr ai ai+1 ... an $
Constructing SLR Parsing Tables – LR(0) Item
• An LR(0) item of a grammar G is a production of G with a dot at some position of the right side.
• Ex: A → aBb
Possible LR(0) items (four different possibilities):
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
• Sets of LR(0) items will be the states of the action and goto tables of the SLR parser.
• A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers.
• Augmented Grammar:
G’ is G with a new production rule S’→S where S’ is the new starting symbol.
The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items constructed from I by the two rules:

1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B→γ is a production rule of G, then B → .γ will be in closure(I). We apply this rule until no more new LR(0) items can be added to closure(I).

The Closure Operation -- Example
Grammar:
E’ → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id

closure({E’ → .E}) =
{ E’ → .E (kernel item)
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id }
Goto Operation
• If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:
• If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X).

Example:
I = { E’ → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }

goto(I,E) = { E’ → E., E → E.+T }
goto(I,T) = { E → T., T → T.*F }
goto(I,F) = { T → F. }
goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }
goto(I,id) = { F → id. }

Construction of The Canonical LR(0)
Collection
• To create the SLR parsing tables for a grammar G, we will create the canonical LR(0) collection of the grammar G’.

• Algorithm:
C is { closure({S’→.S}) }
repeat the following until no more sets of LR(0) items can be added to C:
    for each I in C and each grammar symbol X
        if goto(I,X) is not empty and not in C
            add goto(I,X) to C

• The goto function is a DFA on the sets in C.
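
The closure, goto and collection algorithms can be sketched in a few lines for the augmented expression grammar. In this sketch an LR(0) item is represented as a pair (production index, dot position); this representation and the names `GRAMMAR`, `closure`, `goto` and `canonical_collection` are assumptions made for illustration.

```python
GRAMMAR = [
    ("E'", ("E",)),            # production 0: the augmenting rule E' -> E
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, body in GRAMMAR for sym in body})

def closure(items):
    """items is a set of (production index, dot position) pairs."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        body = GRAMMAR[prod][1]
        # Dot in front of a nonterminal B: add B -> .γ for every B-production.
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)

def goto(items, symbol):
    # Move the dot over `symbol` wherever possible, then take the closure.
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved) if moved else frozenset()

def canonical_collection():
    states = [closure({(0, 0)})]
    for state in states:               # the list grows while we iterate
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states
```

For this grammar the collection has twelve sets of items; the order in which they are discovered depends on the iteration order and need not match any particular numbering.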

Example
Consider the grammar
S->(L)
S->a
L->S
L->L,S

Prepare the LR(0) parsing table for the above grammar

The Canonical LR(0) Collection -- Example
I0: E’ → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I1: E’ → E., E → E.+T
I2: E → T., T → T.*F
I3: T → F.
I4: F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I5: F → id.
I6: E → E+.T, T → .T*F, T → .F, F → .(E), F → .id
I7: T → T*.F, F → .(E), F → .id
I8: F → (E.), E → E.+T
I9: E → E+T., T → T.*F
I10: T → T*F.
I11: F → (E).

Transition Diagram (DFA) of Goto Function
I0: E to I1, T to I2, F to I3, ( to I4, id to I5
I1: + to I6
I2: * to I7
I4: E to I8, T to I2, F to I3, ( to I4, id to I5
I6: T to I9, F to I3, ( to I4, id to I5
I7: F to I10, ( to I4, id to I5
I8: ) to I11, + to I6
I9: * to I7

Actions of A (S)LR-Parser -- Example
stack input action output
0 id*id+id$ shift 5
0id5 *id+id$ reduce by F→id F→id
0F3 *id+id$ reduce by T→F T→F
0T2 *id+id$ shift 7
0T2*7 id+id$ shift 5
0T2*7id5 +id$ reduce by F→id F→id
0T2*7F10 +id$ reduce by T→T*F T→T*F
0T2 +id$ reduce by E→T E→T
0E1 +id$ shift 6
0E1+6 id$ shift 5
0E1+6id5 $ reduce by F→id F→id
0E1+6F3 $ reduce by T→F T→F
0E1+6T9 $ reduce by E→E+T E→E+T
0E1 $ accept


Constructing SLR Parsing Table
(of an augmented grammar G’)

1. Construct the canonical collection of sets of LR(0) items for G’. C←{I0,...,In}

2. Create the parsing action table as follows
• If a is a terminal, A→α.aβ is in Ii and goto(Ii,a)=Ij, then action[i,a] is shift j.
• If A→α. is in Ii, then action[i,a] is reduce A→α for all a in FOLLOW(A), where A≠S’.
• If S’→S. is in Ii, then action[i,$] is accept.
• If any conflicting actions are generated by these rules, the grammar is not SLR(1).

3. Create the parsing goto table
• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j

4. All entries not defined by (2) and (3) are errors.

5. The initial state of the parser is the one containing S’→.S
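(SLR) Parsing Tables for Expression Grammar
Productions:
1) E → E+T
2) E → T
3) T → T*F
4) T → F
5) F → (E)
6) F → id

Action table (columns id to $) and goto table (columns E, T, F); a dash marks an empty entry:
state   id    +     *     (     )     $      E    T    F
0       s5    -     -     s4    -     -      1    2    3
1       -     s6    -     -     -     acc    -    -    -
2       -     r2    s7    -     r2    r2     -    -    -
3       -     r4    r4    -     r4    r4     -    -    -
4       s5    -     -     s4    -     -      8    2    3
5       -     r6    r6    -     r6    r6     -    -    -
6       s5    -     -     s4    -     -      -    9    3
7       s5    -     -     s4    -     -      -    -    10
8       -     s6    -     -     s11   -      -    -    -
9       -     r1    s7    -     r1    r1     -    -    -
10      -     r3    r3    -     r3    r3     -    -    -
11      -     r5    r5    -     r5    r5     -    -    -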
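
The SLR tables for the expression grammar can be exercised with a short driver loop. The sketch below is illustrative only: the dictionaries `ACTION`, `GOTO`, `PRODUCTIONS`, `NAMES` and the function `lr_parse` are assumed names, and the stack holds state numbers only (the grammar symbols are implied by the states), so a reduction pops |β| entries rather than 2|β|.

```python
# Production number -> (left side, length of right side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}
NAMES = {1: "E→E+T", 2: "E→T", 3: "T→T*F", 4: "T→F", 5: "F→(E)", 6: "F→id"}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the list of reductions performed, or None on a syntax error."""
    tokens = list(tokens) + ["$"]
    stack = [0]                        # states only
    pos = 0
    output = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:             # empty entry in the action table
            return None
        if action == "acc":
            return output
        number = int(action[1:])
        if action[0] == "s":           # shift: push the state, advance input
            stack.append(number)
            pos += 1
        else:                          # reduce by production `number`
            head, length = PRODUCTIONS[number]
            del stack[-length:]        # safe here: no right side is empty
            stack.append(GOTO[stack[-1]][head])
            output.append(NAMES[number])
```

For the input id*id+id the reductions come out in the order F→id, T→F, F→id, T→T*F, E→T, F→id, T→F, E→E+T, which is the rightmost derivation in reverse.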

SLR(1) Grammar
• An LR parser using SLR(1) parsing tables for a grammar G is called the SLR(1) parser for G.
• If a grammar G has an SLR(1) parsing table, it is called an SLR(1) grammar (or SLR grammar for short).
• Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR grammar.

shift/reduce and reduce/reduce conflicts
• If a state does not know whether it will make a shift operation or a reduction for a terminal, we say that there is a shift/reduce conflict.

• If a state does not know whether it will make a reduction using production rule i or j for a terminal, we say that there is a reduce/reduce conflict.

• If the SLR parsing table of a grammar G has a conflict, we say that the grammar is not an SLR grammar.

Conflict Example
Grammar:
S → L=R
S → R
L → *R
L → id
R → L

I0: S’ → .S, S → .L=R, S → .R, L → .*R, L → .id, R → .L
I1: S’ → S.
I2: S → L.=R, R → L.
I3: S → R.
I4: L → *.R, R → .L, L → .*R, L → .id
I5: L → id.
I6: S → L=.R, R → .L, L → .*R, L → .id
I7: L → *R.
I8: R → L.
I9: S → L=R.

Problem (in I2)
FOLLOW(R)={=,$}
On =: shift 6, and also reduce by R → L
shift/reduce conflict
Conflict Example2
Grammar:
S → AaAb
S → BbBa
A → ε
B → ε

I0: S’ → .S, S → .AaAb, S → .BbBa, A → ., B → .

Problem
FOLLOW(A)={a,b}
FOLLOW(B)={a,b}
On a: reduce by A → ε, and also reduce by B → ε (reduce/reduce conflict)
On b: reduce by A → ε, and also reduce by B → ε (reduce/reduce conflict)
Constructing Canonical LR(1) Parsing Tables
• In the SLR method, state i makes a reduction by A→α when the current token is a:
• if A→α. is in Ii and a is in FOLLOW(A)

• In some situations, βA cannot be followed by the terminal a in a right-sentential form when βα and the state i are on the top of the stack. This means that making the reduction in this case is not correct.

Grammar: S → AaAb, S → BbBa, A → ε, B → ε
Rightmost derivations: S⇒AaAb⇒Aab⇒ab and S⇒BbBa⇒Bba⇒ba
• Aab ⇒ ε ab (A is reduced with next input a)
• AaAb ⇒ Aa ε b (A is reduced with next input b)
• Bba ⇒ ε ba (B is reduced with next input b)
• BbBa ⇒ Bb ε a (B is reduced with next input a)
LR Parsers

• The most powerful shift-reduce parsing method (yet still efficient) is LR(k) parsing.
• L stands for left-to-right scanning, R for right-most derivation, and k for the number of lookahead symbols (when k is omitted, it is 1).

• LR parsing is attractive because:
• LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.
• The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers. LL(1)-Grammars ⊂ LR(1)-Grammars
• An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

LR Parsers
• LR-Parsers
• cover a wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
• SLR, LR and LALR work the same (they use the same algorithm); only their parsing tables are different.

LR Parsing Algorithm
[Figure: the LR parsing algorithm reads the input a1 ... ai ... an $, keeps a stack S0 X1 S1 ... Xm-1 Sm-1 Xm Sm (Sm on top) and produces the output. It consults two tables whose rows are states: the Action Table, with columns for the terminals and $ and four different actions, and the Goto Table, with columns for the non-terminals, where each entry is a state number]

A Configuration of LR Parsing Algorithm
• A configuration of an LR parser is:

( S0 X1 S1 ... Xm Sm, ai ai+1 ... an $ )
(the first component is the stack, the second is the rest of the input)

• Sm and ai decide the parser action by consulting the parsing action table. (The initial stack contains just S0.)

• A configuration of an LR parser represents the right-sentential form:

X1 ... Xm ai ai+1 ... an $
LR(1) Item
• To avoid some invalid reductions, the states need to carry more information.
• Extra information is put into a state by including a terminal symbol as a second component in an item.

• An LR(1) item is:
A → α.β, a where a is the lookahead of the LR(1) item (a is a terminal or the end-marker.)

LR(1) Item (cont.)
• When β (in the LR(1) item A → α.β, a) is not empty, the lookahead does not have any effect.
• When β is empty (A → α., a), we do the reduction by A→α only if the next input symbol is a (not for any terminal in FOLLOW(A)).
• A state will contain the items A → α., a1 ... A → α., an where {a1,...,an} ⊆ FOLLOW(A)

Steps to solve LR parser
• Augmented grammar
• Computation of canonical items by closure and goto functions
• DFA generation
• CLR parsing table generation
• Parsing the input with the help of parsing table


Augmented Grammar G’
This equals G ∪ {S’ 🡪 S} where S is the start symbol of G.
The start symbol of G’ is S’.
This is done to signal to the parser when it should stop parsing and announce acceptance of the input.

What is meant by canonical item
• Item: An LR (0) item or simply, an item of a grammar G is a
production of G with a dot ‘.’ at some position of the right side. For
example, the production A 🡪 XYZ yields four items,
• A 🡪 .XYZ
• A 🡪 X.YZ
• A 🡪 XY.Z
• A 🡪 XYZ.
• A production rule of the form A 🡪 ε yields only one item, A 🡪 .
Intuitively, an item shows how much of a production we have seen up to the current point in the parsing procedure.

Canonical Collection of Sets of LR(1) Items
• The construction of the canonical collection of the sets of LR(1) items is similar to the construction of the canonical collection of the sets of LR(0) items, except that the closure and goto operations work a little differently.
• An LR(0) item has the format A→α.Bβ
• An LR(1) item has the format A→α.Bβ, a (production rule, lookahead)
closure(I) is: (where I is a set of LR(1) items)
• every LR(1) item in I is in closure(I)
• if A→α.Bβ, a is in closure(I) and B→γ is a production rule of G, then B→.γ, b will be in closure(I) for each terminal b in FIRST(βa).
goto operation
• If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:
• If A → α.Xβ, a is in I then every item in closure({A → αX.β, a}) will be in goto(I,X).
• That is, the dot is shifted one symbol ahead while the lookahead symbol is kept as it is, and the closure is then taken.

Construction of The Canonical LR(1)
Collection
• Algorithm:
C is { closure({S’→.S,$}) }
repeat the following until no more sets of LR(1) items can be added to C:
    for each I in C and each grammar symbol X
        if goto(I,X) is not empty and not in C
            add goto(I,X) to C

• The goto function is a DFA on the sets in C.
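
The LR(1) versions of closure and goto differ from the LR(0) ones only in how lookaheads are carried. The sketch below builds the collection for the small grammar S’ -> S, S -> CC, C -> aC | d. An item is represented as (production index, dot position, lookahead); this layout and all names are assumptions for illustration, and `first_of` is deliberately simplified: it relies on this grammar having no ɛ-productions and no left recursion.

```python
GRAMMAR = [
    ("S'", ("S",)),            # production 0: the augmenting rule
    ("S", ("C", "C")),
    ("C", ("a", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, body in GRAMMAR for sym in body})

def first_of(symbols):
    """FIRST of a non-empty symbol string (simplified: no ɛ-productions)."""
    sym = symbols[0]
    if sym not in NONTERMINALS:
        return {sym}           # a terminal or the end marker "$"
    result = set()
    for head, body in GRAMMAR:
        if head == sym and body[0] != sym:
            result |= first_of(body)
    return result

def closure(items):
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot, look = worklist.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            # For A -> α.Bβ, a add B -> .γ, b for each b in FIRST(βa).
            lookaheads = first_of(body[dot + 1:] + (look,))
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot]:
                    for b in lookaheads:
                        item = (index, 0, b)
                        if item not in result:
                            result.add(item)
                            worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    # Shift the dot over `symbol`; the lookahead is kept unchanged.
    moved = {(p, d + 1, a) for p, d, a in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else frozenset()

def canonical_collection():
    states = [closure({(0, 0, "$")})]
    for state in states:               # the list grows while we iterate
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states
```

For this grammar the collection has ten sets of LR(1) items, and the initial set contains six items because each C-item appears once per lookahead a and d.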


A Short Notation for The Sets of LR(1) Items
• A set of LR(1) items containing the following items
A → α.β, a1
...
A → α.β, an

can be written as

A → α.β, a1/a2/.../an
EXAMPLE
GRAMMAR:
1. S’ -> S
2. S -> CC
3. C -> aC
4. C -> d

SET OF ITEMS:
I0: S’ -> .S, $
    S -> .CC, $
    C -> .aC, a/d
    C -> .d, a/d
Goto(I0,S) = I1: S’ -> S., $
    (no goto and closure operations, because the . is at the end of the production rule)
Goto(I0,C) = I2: S -> C.C, $
    C -> .aC, $
    C -> .d, $
Goto(I0,a) = I3: C -> a.C, a/d
    C -> .aC, a/d
    C -> .d, a/d
Goto(I0,d) = I4: C -> d., a/d
Goto(I2,C) = I5: S -> CC., $
Goto(I2,a) = I6: C -> a.C, $
    C -> .aC, $
    C -> .d, $
Goto(I2,d) = I7: C -> d., $
Goto(I3,C) = I8: C -> aC., a/d
Goto(I3,a) = I3
Goto(I3,d) = I4
Goto(I6,C) = I9: C -> aC., $
Goto(I6,a) = I6
Goto(I6,d) = I7
DFA
I0: S to I1, C to I2, a to I3, d to I4
I2: C to I5, a to I6, d to I7
I3: C to I8, a to I3, d to I4
I6: C to I9, a to I6, d to I7

PARSING TABLE GENERATION
• Input: An augmented grammar G’.
• Output: The canonical LR parsing table functions action and goto for G’
• Method:
• Construct C={I0,I1,...,In}, the collection of sets of LR(1) items for G’.
• State i of the parser is constructed from Ii. The parsing actions for state i are determined as follows:
• a) If [A 🡪 α.aβ, b] is in Ii, and goto(Ii, a) = Ij, then set action[i,a] to “shift j.” Here, a is required to be a terminal.
• b) If [A 🡪 α., a] is in Ii, A ≠ S’, then set action[i,a] to “reduce A 🡪 α.”
• c) If [S’ 🡪 S., $] is in Ii, then set action[i,$] to “accept.”

Cont…
• The goto transitions for state i are determined as follows: if goto(Ii, A) = Ij, then goto[i,A]=j.
• All entries not defined by the action rules a) to c) and the goto rule are made “error.”
• The initial state of the parser is the one constructed from the set containing the item [S’ 🡪 .S, $].

Parsing table
State a d $ S C

0 1 2

2 5

3 8

6 9

CS416 Compiler Design 91

Parsing table (step 2: shift and accept entries)
State   a     d     $        S    C
0       S3    S4             1    2
1                   ACCEPT
2       S6    S7                  5
3       S3    S4                  8
6       S6    S7                  9

Parsing table (step 3: reduce entries, complete table)
State   a     d     $        S    C
0       S3    S4             1    2
1                   ACCEPT
2       S6    S7                  5
3       S3    S4                  8
4       R4    R4
5                   R2
6       S6    S7                  9
7                   R4
8       R3    R3
9                   R3

Parsing the input string aadd
Stack        Input buffer   Action table          Goto table     Parsing action
$0           aadd$          action[0,a]=s3                       Shift
$0a3         add$           action[3,a]=s3                       Shift
$0a3a3       dd$            action[3,d]=s4                       Shift
$0a3a3d4     d$             action[4,d]=r4        goto[3,C]=8    Reduce C -> d
$0a3a3C8     d$             action[8,d]=r3        goto[3,C]=8    Reduce C -> aC
$0a3C8       d$             action[8,d]=r3        goto[0,C]=2    Reduce C -> aC
$0C2         d$             action[2,d]=s7                       Shift
$0C2d7       $              action[7,$]=r4        goto[2,C]=5    Reduce C -> d
$0C2C5       $              action[5,$]=r2        goto[0,S]=1    Reduce S -> CC
$0S1         $              action[1,$]=accept                   Accept
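The trace above follows a simple loop that consults action and goto. As a rough Python sketch (the dictionary layout and the function name `parse` are assumptions made for illustration; the entries are copied from the complete parsing table of the example), the driver can be written as:

```python
# Sketch of the LR driver loop for the grammar S' -> S, S -> CC, C -> aC | d.
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {2: ("S", 2), 3: ("C", 2), 4: ("C", 1)}
ACTION = {
    (0, "a"): "s3", (0, "d"): "s4",
    (1, "$"): "acc",
    (2, "a"): "s6", (2, "d"): "s7",
    (3, "a"): "s3", (3, "d"): "s4",
    (4, "a"): "r4", (4, "d"): "r4",
    (5, "$"): "r2",
    (6, "a"): "s6", (6, "d"): "s7",
    (7, "$"): "r4",
    (8, "a"): "r3", (8, "d"): "r3",
    (9, "$"): "r3",
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

def parse(tokens):
    """Return the production numbers used for reductions, or None on error."""
    tokens = list(tokens) + ["$"]
    stack, pos, reductions = [0], 0, []      # the stack holds states only
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:                    # empty entry: syntax error
            return None
        if entry == "acc":
            return reductions
        if entry[0] == "s":                  # shift: push the state, advance input
            stack.append(int(entry[1:]))
            pos += 1
        else:                                # reduce: pop |rhs| states, then goto
            number = int(entry[1:])
            lhs, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(number)

print(parse("aadd"))   # [4, 3, 3, 4, 2]
```

The printed list is the sequence of reductions in the trace: C -> d, C -> aC, C -> aC, C -> d, S -> CC. Grammar symbols are left off the stack here because the states alone determine every move.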

SLR parsing
• Continuing the same way, we define all LR(0) item states:
I0: S' → ∙S, S → ∙L=R, S → ∙R, L → ∙*R, L → ∙id, R → ∙L
I1: S' → S∙
I2: S → L∙=R, R → L∙
I3: L → id∙
I4: S → R∙
I5: L → *∙R, R → ∙L, L → ∙*R, L → ∙id
I6: S → L=∙R, R → ∙L, L → ∙*R, L → ∙id
I7: R → L∙
I8: L → *R∙
I9: S → L=R∙
Canonical LR(1) Collection -- Example
Grammar:
S → AaAb
S → BbBa
A → ε
B → ε

I0: S’ → .S ,$
    S → .AaAb ,$
    S → .BbBa ,$
    A → . ,a
    B → . ,b
I1: S’ → S. ,$        (goto(I0,S))
I2: S → A.aAb ,$      (goto(I0,A))
I3: S → B.bBa ,$      (goto(I0,B))
I4: S → Aa.Ab ,$      (goto(I2,a))
    A → . ,b
I5: S → Bb.Ba ,$      (goto(I3,b))
    B → . ,a
I6: S → AaA.b ,$      (goto(I4,A))
I7: S → BbB.a ,$      (goto(I5,B))
I8: S → AaAb. ,$      (goto(I6,b))
I9: S → BbBa. ,$      (goto(I7,a))

Canonical LR(1) Collection – Example2
Grammar:
S’ → S
1) S → L=R
2) S → R
3) L → *R
4) L → id
5) R → L

I0: S’ → .S,$   S → .L=R,$   S → .R,$   L → .*R,$/=   L → .id,$/=   R → .L,$
I1: S’ → S.,$
I2: S → L.=R,$   R → L.,$
I3: S → R.,$
I4: L → *.R,$/=   R → .L,$/=   L → .*R,$/=   L → .id,$/=
I5: L → id.,$/=
I6: S → L=.R,$   R → .L,$   L → .*R,$   L → .id,$
I7: L → *R.,$/=
I8: R → L.,$/=
I9: S → L=R.,$
I10: R → L.,$
I11: L → *.R,$   R → .L,$   L → .*R,$   L → .id,$
I12: L → id.,$
I13: L → *R.,$

Transitions:
I0: S to I1, L to I2, R to I3, * to I4, id to I5
I2: = to I6
I4: R to I7, L to I8, * to I4, id to I5
I6: R to I9, L to I10, * to I11, id to I12
I11: R to I13, L to I10, * to I11, id to I12

Sets with the same cores: I4 and I11, I5 and I12, I7 and I13, I8 and I10
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G’. C←{I0,...,In}

2. Create the parsing action table as follows

• If a is a terminal, A→α.aβ,b is in Ii and goto(Ii,a)=Ij, then action[i,a] is shift j.
• If A→α.,a is in Ii, then action[i,a] is reduce A→α, where A≠S’.
• If S’→S.,$ is in Ii, then action[i,$] is accept.

• If any conflicting actions are generated by these rules, the grammar is not LR(1).

3. Create the parsing goto table

• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j

4. All entries not defined by (2) and (3) are errors.

5. Initial state of the parser contains S’→.S,$
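Step 2 can be sketched in Python as follows. This is a hedged illustration, not a reference implementation: the item encoding and the helper names `item`, `STATES`, `TRANSITIONS` and `build_action` are assumptions, and the item sets are typed in by hand from the first canonical LR(1) example (S → AaAb | BbBa, A → ε, B → ε).

```python
# Sketch: filling the action table from LR(1) item sets (step 2 of the method).
# An item is (lhs, rhs, dot position, lookahead); an epsilon body is the empty tuple.
TERMINALS = {"a", "b", "$"}

def item(lhs, rhs, dot, lookahead):
    return (lhs, tuple(rhs), dot, lookahead)

STATES = [
    {item("S'", "S", 0, "$"), item("S", "AaAb", 0, "$"), item("S", "BbBa", 0, "$"),
     item("A", "", 0, "a"), item("B", "", 0, "b")},        # I0
    {item("S'", "S", 1, "$")},                             # I1
    {item("S", "AaAb", 1, "$")},                           # I2
    {item("S", "BbBa", 1, "$")},                           # I3
    {item("S", "AaAb", 2, "$"), item("A", "", 0, "b")},    # I4
    {item("S", "BbBa", 2, "$"), item("B", "", 0, "a")},    # I5
    {item("S", "AaAb", 3, "$")},                           # I6
    {item("S", "BbBa", 3, "$")},                           # I7
    {item("S", "AaAb", 4, "$")},                           # I8
    {item("S", "BbBa", 4, "$")},                           # I9
]
TRANSITIONS = {(0, "S"): 1, (0, "A"): 2, (0, "B"): 3, (2, "a"): 4, (3, "b"): 5,
               (4, "A"): 6, (5, "B"): 7, (6, "b"): 8, (7, "a"): 9}

def build_action(states, transitions):
    action = {}

    def set_entry(key, value):
        # Two different actions for one entry mean the grammar is not LR(1).
        if action.get(key, value) != value:
            raise ValueError(f"conflict at {key}: {action[key]} vs {value}")
        action[key] = value

    for i, items in enumerate(states):
        for lhs, rhs, dot, lookahead in items:
            if dot < len(rhs):
                if rhs[dot] in TERMINALS:            # A -> alpha . a beta, b
                    set_entry((i, rhs[dot]), ("shift", transitions[(i, rhs[dot])]))
            elif lhs == "S'":                        # S' -> S ., $
                set_entry((i, "$"), ("accept",))
            else:                                    # A -> alpha ., a
                set_entry((i, lookahead), ("reduce", lhs, rhs))
    return action

action = build_action(STATES, TRANSITIONS)
print(action[(0, "a")])   # ('reduce', 'A', ())
print(action[(0, "b")])   # ('reduce', 'B', ())
```

State 0 reduces by A → ε on a and by B → ε on b, so the two reductions never compete for the same entry. The lookahead attached to each item is what keeps them apart.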
LR(1) Parsing Tables – (for Example2)
State   id    *     =     $        S    L    R
0       s5    s4                   1    2    3
1                         acc
2                   s6    r5
3                         r2
4       s5    s4                        8    7
5                   r4    r4
6       s12   s11                       10   9
7                   r3    r3
8                   r5    r5
9                         r1
10                        r5
11      s12   s11                       10   13
12                        r4
13                        r3

no shift/reduce or no reduce/reduce conflict
⇓
so, it is an LR(1) grammar

LALR Parsing Tables
• LALR stands for LookAhead LR.

• LALR parsers are often used in practice because LALR parsing tables
are smaller than LR(1) parsing tables.
• The number of states in SLR and LALR parsing tables for a grammar
G are equal.
• But LALR parsers recognize more grammars than SLR parsers.
• yacc creates a LALR parser for the given grammar.
• A state of LALR parser will be again a set of LR(1) items.
Creating LALR Parsing Tables

Canonical LR(1) Parser 🡺 LALR Parser

shrink # of states

• This shrink process may introduce a reduce/reduce conflict in the resulting LALR parser (so the grammar is NOT LALR)
• But, this shrink process does not produce a shift/reduce conflict.

The Core of A Set of LR(1) Items
• The core of a set of LR(1) items is the set of the first components of its items.

Ex:   S → L.=R,$    ⇒    S → L.=R    (Core)
      R → L.,$           R → L.

• We will find the states (sets of LR(1) items) in a canonical LR(1) parser with same cores. Then we will merge them as a single state.

I1: L → id.,=
I2: L → id.,$
have same core, merge them ⇒ a new state:
I12: L → id.,=
     L → id.,$

• We will do this for all states of a canonical LR(1) parser to get the states of the LALR parser.
• In fact, the number of the states of the LALR parser for a grammar will be equal to the number of states of the SLR parser for that grammar.
Creation of LALR Parsing Tables
• Create the canonical LR(1) collection of the sets of LR(1) items for the given grammar.
• Find each core; find all sets having that same core; replace those sets having same cores with a single set which is their union.
C={I0,...,In} ⇒ C’={J1,...,Jm} where m ≤ n
• Create the parsing tables (action and goto tables) same as the construction of the parsing tables of LR(1) parser.
• Note that: If J=I1 ∪ ... ∪ Ik, since I1,...,Ik have same cores
⇒ cores of goto(I1,X),...,goto(Ik,X) must be same.
• So, goto(J,X)=K where K is the union of all sets of items having same cores as goto(I1,X).

• If no conflict is introduced, the grammar is LALR(1) grammar. (We may only introduce reduce/reduce conflicts; we cannot introduce a shift/reduce conflict)
Shift/Reduce Conflict
• We say that we cannot introduce a shift/reduce conflict during the shrink process for the creation of the states of a LALR parser.
• Assume that we can introduce a shift/reduce conflict. In this case, a state of LALR parser must have:
A → α.,a and B → β.aγ,b
• This means that a state of the canonical LR(1) parser must have:
A → α.,a and B → β.aγ,c
But, this state has also a shift/reduce conflict, i.e. the original canonical LR(1) parser has a conflict.
(The reason is that the shift operation does not depend on lookaheads.)

Reduce/Reduce Conflict
• But, we may introduce a reduce/reduce conflict during the shrink process for the creation of the states of a LALR parser.

I1: A → α.,a        I2: A → α.,b
    B → β.,b            B → β.,c
⇓
I12: A → α.,a/b     ⇒ reduce/reduce conflict
     B → β.,b/c
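Merging by core and the resulting conflict check can be illustrated with a short Python sketch. The representation is an assumption chosen for brevity: an item is a pair (production text with its dot, lookahead), and `core`, `merge_by_core` and `reduce_reduce_conflicts` are hypothetical helper names. The letters x and y stand in for α and β.

```python
# Sketch: merge LR(1) states that share a core, then look for reduce/reduce
# conflicts in the merged state. Items are (production with dot, lookahead).
from collections import defaultdict

def core(state):
    """The core keeps only the first component of every item."""
    return frozenset(production for production, _ in state)

def merge_by_core(states):
    groups = defaultdict(set)
    for state in states:
        groups[core(state)] |= set(state)        # union of the item sets
    return [frozenset(items) for items in groups.values()]

def reduce_reduce_conflicts(state):
    """Lookaheads on which two different complete items ask for a reduction."""
    by_lookahead = defaultdict(set)
    for production, lookahead in state:
        if production.endswith("."):             # dot at the end: reduce item
            by_lookahead[lookahead].add(production)
    return {la for la, productions in by_lookahead.items() if len(productions) > 1}

# I5 and I12 of Example2 share the core {L -> id.}
i5 = {("L -> id.", "$"), ("L -> id.", "=")}
i12 = {("L -> id.", "$")}
merged = merge_by_core([i5, i12])
print(len(merged), reduce_reduce_conflicts(merged[0]))       # 1 set()

# The generic pair I1, I2 from the reduce/reduce slide
i1 = {("A -> x.", "a"), ("B -> y.", "b")}
i2 = {("A -> x.", "b"), ("B -> y.", "c")}
print(reduce_reduce_conflicts(merge_by_core([i1, i2])[0]))   # {'b'}
```

Neither I1 nor I2 has a conflict on its own. The conflict on b appears only after the union, which is why a grammar can be LR(1) without being LALR(1).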

Canonical LALR(1) Collection – Example2
Grammar:
S’ → S
1) S → L=R
2) S → R
3) L → *R
4) L → id
5) R → L

I0: S’ → .S,$   S → .L=R,$   S → .R,$   L → .*R,$/=   L → .id,$/=   R → .L,$
I1: S’ → S.,$
I2: S → L.=R,$   R → L.,$
I3: S → R.,$
I411: L → *.R,$/=   R → .L,$/=   L → .*R,$/=   L → .id,$/=
I512: L → id.,$/=
I6: S → L=.R,$   R → .L,$   L → .*R,$   L → .id,$
I713: L → *R.,$/=
I810: R → L.,$/=
I9: S → L=R.,$

Transitions:
I0: S to I1, L to I2, R to I3, * to I411, id to I512
I2: = to I6
I411: R to I713, L to I810, * to I411, id to I512
I6: R to I9, L to I810, * to I411, id to I512

Same Cores: I4 and I11, I5 and I12, I7 and I13, I8 and I10
LALR(1) Parsing Tables – (for Example2)
(States 4, 5, 7 and 8 are the merged states I411, I512, I713 and I810.)
State   id    *     =     $        S    L    R
0       s5    s4                   1    2    3
1                         acc
2                   s6    r5
3                         r2
4       s5    s4                        8    7
5                   r4    r4
6       s5    s4                        8    9
7                   r3    r3
8                   r5    r5
9                         r1

no shift/reduce or no reduce/reduce conflict
⇓
so, it is a LALR(1) grammar

Using Ambiguous Grammars
• All grammars used in the construction of LR-parsing tables must be unambiguous.
• Can we create LR-parsing tables for ambiguous grammars?
• Yes, but they will have conflicts.
• We can resolve each of these conflicts in favor of one of the conflicting actions to disambiguate the grammar.
• At the end, we will again have an unambiguous grammar.

• Why would we want to use an ambiguous grammar?

• Some ambiguous grammars are much more natural, and a corresponding unambiguous grammar can be very complex.
• Usage of an ambiguous grammar may eliminate unnecessary reductions.

• Ex.
Ambiguous grammar:
E → E+E | E*E | (E) | id
⇒ corresponding unambiguous grammar:
E → E+T | T
T → T*F | F
F → (E) | id
Sets of LR(0) Items for Ambiguous Grammar
I0: E’ → .E   E → .E+E   E → .E*E   E → .(E)   E → .id
I1: E’ → E.   E → E.+E   E → E.*E
I2: E → (.E)   E → .E+E   E → .E*E   E → .(E)   E → .id
I3: E → id.
I4: E → E+.E   E → .E+E   E → .E*E   E → .(E)   E → .id
I5: E → E*.E   E → .E+E   E → .E*E   E → .(E)   E → .id
I6: E → (E.)   E → E.+E   E → E.*E
I7: E → E+E.   E → E.+E   E → E.*E
I8: E → E*E.   E → E.+E   E → E.*E
I9: E → (E).

Transitions:
I0: E to I1, ( to I2, id to I3
I1: + to I4, * to I5
I2: E to I6, ( to I2, id to I3
I4: E to I7, ( to I2, id to I3
I5: E to I8, ( to I2, id to I3
I6: ) to I9, + to I4, * to I5
I7: + to I4, * to I5
I8: + to I4, * to I5

SLR-Parsing Tables for Ambiguous Grammar
FOLLOW(E) = { $,+,*,) }

State I7 has shift/reduce conflicts for symbols + and *.

I0 --E--> I1 --+--> I4 --E--> I7

when current token is +
shift ⇒ + is right-associative
reduce ⇒ + is left-associative

when current token is *
shift ⇒ * has higher precedence than +
reduce ⇒ + has higher precedence than *

SLR-Parsing Tables for Ambiguous Grammar
FOLLOW(E) = { $,+,*,) }

State I8 has shift/reduce conflicts for symbols + and *.

I0 --E--> I1 --*--> I5 --E--> I8

when current token is *
shift ⇒ * is right-associative
reduce ⇒ * is left-associative

when current token is +
shift ⇒ + has higher precedence than *
reduce ⇒ * has higher precedence than +

SLR-Parsing Tables for Ambiguous Grammar
(Productions: 1) E → E+E   2) E → E*E   3) E → (E)   4) E → id)
                    Action                      Goto
State   id    +     *     (     )     $         E
0       s3                s2                    1
1             s4    s5                acc
2       s3                s2                    6
3             r4    r4          r4    r4
4       s3                s2                    7
5       s3                s2                    8
6             s4    s5          s9
7             r1    s5          r1    r1
8             r2    r2          r2    r2
9             r3    r3          r3    r3
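The choices made in states 7 and 8 of this table correspond to treating * as binding tighter than + and both operators as left-associative. A minimal Python sketch of that decision rule follows; the names `PRECEDENCE`, `LEFT_ASSOCIATIVE` and `resolve` are illustrative assumptions, not part of any parser generator's API.

```python
# Sketch: resolving a shift/reduce conflict by precedence and associativity.
PRECEDENCE = {"+": 1, "*": 2}        # a larger number binds tighter
LEFT_ASSOCIATIVE = {"+", "*"}

def resolve(rule_operator, lookahead):
    """Choose between reducing by E -> E rule_operator E and shifting lookahead."""
    if PRECEDENCE[rule_operator] > PRECEDENCE[lookahead]:
        return "reduce"
    if PRECEDENCE[rule_operator] < PRECEDENCE[lookahead]:
        return "shift"
    # Same precedence: left-associative operators reduce, right-associative shift.
    return "reduce" if rule_operator in LEFT_ASSOCIATIVE else "shift"

for state, operator in ((7, "+"), (8, "*")):
    print(state, [resolve(operator, lookahead) for lookahead in "+*"])
# 7 ['reduce', 'shift']
# 8 ['reduce', 'reduce']
```

The output matches rows 7 and 8 of the table: r1 on + and s5 on * in state 7, r2 on both + and * in state 8.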
Error Recovery in LR Parsing
• An LR parser will detect an error when it consults the parsing action table and
finds an error entry. All empty entries in the action table are error entries.
• Errors are never detected by consulting the goto table.
• An LR parser will announce error as soon as there is no valid continuation for
the scanned portion of the input.
• A canonical LR parser (LR(1) parser) will never make even a single reduction
before announcing an error.
• The SLR and LALR parsers may make several reductions before announcing an
error.
• But, all LR parsers (LR(1), LALR and SLR parsers) will never shift an erroneous
input symbol onto the stack.
Panic Mode Error Recovery in LR Parsing
• Scan down the stack until a state s with a goto on a particular
nonterminal A is found. (Get rid of everything from the stack before
this state s).
• Discard zero or more input symbols until a symbol a is found that can
legitimately follow A.
• The symbol a is simply in FOLLOW(A), but this may not work for all situations.

• The parser stacks the nonterminal A and the state goto[s,A], and it
resumes the normal parsing.
• This nonterminal A is normally a basic programming block (there can be more than one choice for A).
• stmt, expr, block, ...
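The three steps of panic-mode recovery can be sketched in Python as below. Everything in this example is hypothetical: the function name `panic_mode_recover`, the one-entry goto table, the FOLLOW set and the token list are made up for illustration, and the stack holds states only, without the grammar symbols.

```python
# Sketch of panic-mode recovery on the state stack of an LR parser.
def panic_mode_recover(stack, tokens, pos, goto_table, follow, nonterminal):
    """Return (stack, pos) after recovery, or None if recovery is impossible."""
    stack = list(stack)
    # 1. Pop states until one has a goto entry on the chosen nonterminal A.
    while stack and (stack[-1], nonterminal) not in goto_table:
        stack.pop()
    if not stack:
        return None
    # 2. Discard zero or more input symbols until one that can follow A.
    while pos < len(tokens) and tokens[pos] not in follow[nonterminal]:
        pos += 1
    if pos == len(tokens):
        return None
    # 3. Push goto[s, A] and resume normal parsing.
    stack.append(goto_table[(stack[-1], nonterminal)])
    return stack, pos

# Hypothetical data: only state 0 has a goto on stmt, and ';' or '$' may follow it.
goto_table = {(0, "stmt"): 1}
follow = {"stmt": {";", "$"}}
print(panic_mode_recover([0, 5, 9], ["@", "x", ";", "id", "$"], 0,
                         goto_table, follow, "stmt"))   # ([0, 1], 2)
```

States 9 and 5 are popped, the tokens @ and x are skipped, and parsing resumes in state 1 with ; as the current token.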
Phrase-Level Error Recovery in LR Parsing
• Each empty entry in the action table is marked with a specific error
routine.
• An error routine reflects the error that the user most likely will
make in that case.
• An error routine inserts the symbols into the stack or the input (or it
deletes the symbols from the stack and the input, or it can do both
insertion and deletion).
• missing operand
• unbalanced right parenthesis


YACC
Basic Operational Sequence

gram.y     File containing desired grammar in YACC format
   ↓ yacc  (YACC program)
y.tab.c    C source program created by YACC
   ↓ cc or gcc  (C compiler)
a.out      Executable program that will parse grammar given in gram.y
YACC program/ File Format
Definitions
%%
Rules
%%
Supplementary Code
Definitions Section
Example

%{
#include <stdio.h>
#include <stdlib.h>
%}
%token ID NUM      /* ID and NUM are terminals (tokens) */
%start expr        /* the start symbol (a non-terminal) */
Rules Section
• Is a grammar

• Example

expr : expr '+' term | term;

term : term '*' factor | factor;
factor : '(' expr ')' | ID | NUM;
Rules Section
• Normally written like this
• Example:
expr : expr '+' term
     | term
     ;
OR
Expr : Expr '+' NUM { $$ = $1 + $3; }
     | Expr '-' NUM { $$ = $1 - $3; }
     | NUM { $$ = $1; }
     ;
Semantic actions
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
Semantic actions (cont’d)
• In the rule expr : expr '+' term { $$ = $1 + $3; }
  $1 is the value of the first right-hand-side symbol (expr),
  $2 is the value of the second ('+'),
  $3 is the value of the third (term),
  and $$ is the value of the left-hand side.
• In factor : '(' expr ')' { $$ = $2; }, $2 is the value of expr.
• Default: $$ = $1;
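One way to picture $$ and $1 ... $n is as positions on a value stack that the parser keeps alongside its state stack. The Python sketch below is only an analogy under that assumption; `reduce_values` is a hypothetical helper and does not correspond to any code that yacc generates.

```python
# Sketch: how $$, $1 and $3 relate to a value stack during a reduction.
def reduce_values(value_stack, rhs_length, action=None):
    """Pop the values of the right-hand side, push the value of the left-hand side."""
    start = len(value_stack) - rhs_length
    rhs_values = value_stack[start:]            # [$1, $2, ..., $n]
    del value_stack[start:]
    if action is not None:
        result = action(*rhs_values)            # the rule's own { ... } action
    else:
        result = rhs_values[0] if rhs_values else None   # default: $$ = $1
    value_stack.append(result)                  # $$ becomes the new top value
    return result

values = [2, "+", 3]                                   # values of expr '+' term
reduce_values(values, 3, lambda d1, d2, d3: d1 + d3)   # { $$ = $1 + $3; }
print(values)                                          # [5]
```

Calling `reduce_values` without an action mimics a rule such as factor : NUM, where the default $$ = $1 simply passes the value through.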
