
SP Unit III-2024-25

This document outlines the syllabus for the Compilers unit in the Systems Programming course at MIT School of Computing, Pune. It covers the phases of compilation, the role of lexical analyzers and parsers, and various parsing techniques including top-down and bottom-up parsing. Additionally, it discusses error recovery methods, token specifications, and the construction of predictive parsing tables.

Uploaded by

GAYATRI BHOSALE

MIT Art Design and Technology University

MIT School of Computing, Pune


21BTCS601 – Systems Programming

Class - T.Y. (SEM-II)

Unit – III
COMPILERS

AY 2024-2025 SEM-II
Unit III - Syllabus

Unit III –Compilers 09 hours


• Phase structure of Compiler and entire compilation process.
• Lexical Analyzer: The Role of the Lexical Analyzer
• Input Buffering. Specification of Tokens, Recognition Tokens,
• Design of Lexical Analyzer using Uniform Symbol Table,
• Lexical Errors.
• Role of parsers,
• Classification of Parsers:
• Top down parsers- recursive descent parser and predictive parser (LL parser),
• Bottom up Parsers – Shift Reduce parser, LR parser. YACC specification and Automatic
construction of Parser (YACC).
Compilers
• “Compilation”
• Translation of a program written in a source language into a semantically
equivalent program written in a target language
• Oversimplified view:

Source Program -> Compiler -> Target Program
The compiler also reports error messages; the target program then maps its input to its output.
Preprocessors, Compilers, Assemblers, and Linkers

Skeletal Source Program
-> Preprocessor -> Source Program
-> Compiler -> Target Assembly Program
-> Assembler -> Relocatable Object Code
-> Linker (together with Libraries and Relocatable Object Files) -> Absolute Machine Code

Try for example:
gcc -v myprog.c
The Phases of a Compiler
Each entry below lists the phase, its output, and a sample.

Phase: Programmer (source code producer)
Output: Source string
Sample: A=B+C;

Phase: Scanner (performs lexical analysis)
Output: Token string, and symbol table with names
Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’

Phase: Parser (performs syntax analysis based on the grammar of the programming language)
Output: Parse tree or abstract syntax tree
Sample:
    ;
    |
    =
   / \
  A   +
     / \
    B   C

Phase: Semantic analyzer (type checking, etc)
Output: Annotated parse tree or abstract syntax tree

Phase: Intermediate code generator
Output: Three-address code, quads, or RTL
Sample:
  int2fp B t1
  + t1 C t2
  := t2 A

Phase: Optimizer
Output: Three-address code, quads, or RTL
Sample:
  int2fp B t1
  + t1 #2.3 A

Phase: Code generator
Output: Assembly code
Sample:
  MOVF #2.3,r1
  ADDF2 r1,r2
  MOVF r2,A

Phase: Peephole optimizer
Output: Assembly code
Sample:
  ADDF2 #2.3,r2
  MOVF r2,A
The Grouping of Phases
• Compiler front and back ends:
• Front end: analysis (machine independent)
• Back end: synthesis (machine dependent)
• Compiler passes:
• A collection of phases is done only once (single pass) or multiple times
(multi pass)
• Single pass: usually requires everything to be defined before being used in source
program
• Multi pass: compiler may have to keep entire program representation in memory

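The role of lexical analyzer

Source program -> Lexical Analyzer -> (token) -> Parser -> to semantic analysis
• The parser requests the next token from the lexical analyzer through getNextToken
• Both the lexical analyzer and the parser use the symbol table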
Input buffering
• Sometimes the lexical analyzer needs to look ahead several symbols to decide which token to return
• In C: after reading -, = or < we must look at the next character to decide which token to return
• In Fortran: DO 5 I = 1.25
• We need a two-buffer scheme to handle large lookaheads safely

Buffer contents example: E = M* C**2 eof
Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional attribute value
• A pattern is a description of the form that the lexemes of a token may take
• A lexeme is a sequence of characters in the source program that matches the pattern for a token
Example

Token        Informal description                     Sample lexemes
if           Characters i, f                          if
else         Characters e, l, s, e                    else
comparison   < or > or <= or >= or == or !=           <=, !=
id           Letter followed by letters and digits    pi, score, D2
number       Any numeric constant                     3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "          "core dumped"

Example statement: printf("total = %d\n", score);

Attributes for tokens
• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
Specification of tokens
• In the theory of compilation, regular expressions are used to formalize the specification of tokens
• Regular expressions are a means of specifying regular languages
• Example:
• letter_ (letter_ | digit)*
• Each regular expression is a pattern specifying the form of strings
Regular expressions
• Ɛ is a regular expression, L(Ɛ) = {Ɛ}
• If a is a symbol in ∑ then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn

• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Recognition of tokens
• Starting point is the language grammar to understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt

expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
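The patterns above can be tried out with a small regular-expression scanner. This is only a sketch: the names TOKEN_SPEC, MASTER and tokenize are made up for illustration, Python's re module stands in for the transition diagrams, and keywords are listed before id so that they win when both match.

```python
import re

# Patterns follow the regular definitions above. Order matters: Python tries
# the alternatives left to right, so keywords come before the general id rule.
TOKEN_SPEC = [
    ("WS",     r"[ \t\n]+"),
    ("IF",     r"if\b"),
    ("THEN",   r"then\b"),
    ("ELSE",   r"else\b"),
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("ID",     r"[A-Za-z_][A-Za-z_0-9]*"),
    ("RELOP",  r"<=|>=|<>|<|>|="),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token name, lexeme) pairs; whitespace is discarded."""
    pos, tokens = 0, []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # No pattern matches at this position: a lexical error.
            raise SyntaxError(f"lexical error at position {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup.lower(), m.group()))
        pos = m.end()
    return tokens

print(tokenize("if x1 <= 25 then y else z"))
```

A real scanner would also enter identifiers into the symbol table and return a pointer to the entry as the attribute value; that step is left out here.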

Mohamed Sathak Engineering College B.E CSE III Year


Transition diagrams
• Transition diagram for relop
Transition diagrams (cont.)
• Transition diagram for reserved words and identifiers
Transition diagrams (cont.)
• Transition diagram for unsigned numbers



Transition diagrams (cont.)
• Transition diagram for whitespace
Lexical errors
• Some errors are beyond the power of the lexical analyzer to recognize:
• fi (a == f(x)) …
• However, it may be able to recognize errors like:
• d = 2r
• Such errors are recognized when no token pattern matches a character sequence
Error recovery
• Panic mode: successive characters are ignored until we reach a well-formed token
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent characters



Lexical Analyzer Generator - Lex

Lex source program lex.l -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
Role of Parsers
Classification of Parsers
Elimination of left recursion
• A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα

• Top-down parsing methods can't handle left-recursive grammars


• A simple rule for direct left recursion elimination:
• For a rule like:
• A -> A α|β
• We may replace it with
• A -> β A’
• A’ -> α A’ | ɛ
Left recursion elimination (cont.)
• There are cases of indirect left recursion like the following:
• S -> Aa | b
• A -> Ac | Sd | ɛ
• Left recursion elimination algorithm:
• Arrange the nonterminals in some order A1,A2,…,An.
• For (each i from 1 to n) {
•     For (each j from 1 to i-1) {
•         Replace each production of the form Ai -> Aj γ by the productions Ai -> δ1 γ | δ2 γ | … | δk γ, where Aj -> δ1 | δ2 | … | δk are all current Aj-productions
•     }
•     Eliminate the immediate left recursion among the Ai-productions
• }
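The algorithm can be sketched in Python as follows. The grammar encoding is an assumption made for the example: a dict mapping each nonterminal to a list of alternatives, each alternative a tuple of symbols with () standing for ɛ, and the new nonterminal named by appending an apostrophe. The textbook algorithm is only guaranteed for grammars without cycles or ɛ-productions; the example grammar above happens to work despite its ɛ-production.

```python
def eliminate_direct(A, alts):
    """A -> A a | b  becomes  A -> b A',  A' -> a A' | ɛ."""
    rec = [alt[1:] for alt in alts if alt and alt[0] == A]      # the alpha parts
    non = [alt for alt in alts if not alt or alt[0] != A]       # the beta parts
    if not rec:
        return {A: alts}
    A1 = A + "'"
    return {
        A: [beta + (A1,) for beta in non],
        A1: [alpha + (A1,) for alpha in rec] + [()],
    }

def eliminate_left_recursion(grammar, order):
    g = {A: list(alts) for A, alts in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            new_alts = []
            for alt in g[Ai]:
                if alt and alt[0] == Aj:
                    # Substitute every current Aj-production for the leading Aj.
                    new_alts += [delta + alt[1:] for delta in g[Aj]]
                else:
                    new_alts.append(alt)
            g[Ai] = new_alts
        g.update(eliminate_direct(Ai, g[Ai]))
    return g

grammar = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
result = eliminate_left_recursion(grammar, ["S", "A"])
for lhs, alts in result.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ɛ" for alt in alts))
```

For the example grammar this yields A -> b d A' | A' and A' -> c A' | a d A' | ɛ, with the S-productions unchanged.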
Left factoring
• Left factoring is a grammar transformation that is useful for producing
a grammar suitable for predictive or top-down parsing.
• Consider following grammar:
• Stmt -> if expr then stmt else stmt
• | if expr then stmt
• On seeing the input token if, it is not clear to the parser which production to use
• We can easily perform left factoring:
• If we have A->αβ1 | αβ2 then we replace it with
• A -> αA’
• A’ -> β1 | β2
Left factoring (cont.)
• Algorithm
• For each non-terminal A, find the longest prefix α common to two or more
of its alternatives. If α<> ɛ, then replace all of A-productions A->αβ1 |αβ2
| … | αβn | γ by
• A -> αA’ | γ
• A’ -> β1 |β2 | … | βn
• Example:
• S -> i E t S | i E t S e S | a
• E -> b
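One left-factoring step for a single nonterminal might look like the sketch below. The helper names and the tuple encoding of alternatives are assumptions for illustration, and a full implementation would repeat the step until no two alternatives share a prefix.

```python
def common_prefix(x, y):
    """Longest common prefix of two tuples of grammar symbols."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]

def left_factor_once(A, alts):
    """Factor out the longest prefix shared by two or more alternatives of A."""
    best = ()
    for i in range(len(alts)):
        for j in range(i + 1, len(alts)):
            p = common_prefix(alts[i], alts[j])
            if len(p) > len(best):
                best = p
    if not best:                       # nothing to factor
        return {A: alts}
    A1 = A + "'"
    suffixes = [alt[len(best):] for alt in alts if alt[:len(best)] == best]
    others = [alt for alt in alts if alt[:len(best)] != best]
    return {A: [best + (A1,)] + others, A1: suffixes}

alts = [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]
print(left_factor_once("S", alts))
```

On the example this gives S -> i E t S S' | a and S' -> ɛ | e S, where the empty tuple stands for ɛ.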
Top Down Parsing
• A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right
• It can also be viewed as finding a leftmost derivation for an input string
• Example: id+id*id

E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

Figure: sequence of parse trees for the first leftmost-derivation steps
E ⇒lm TE’ ⇒lm FT’E’ ⇒lm id T’E’ ⇒lm id E’ ⇒lm id + TE’
Recursive descent parsing
• Consists of a set of procedures, one for each
nonterminal
• Execution begins with the procedure for start symbol
• A typical procedure for a non-terminal
void A() {
    choose an A-production, A -> X1 X2 ... Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */
    }
}
Recursive descent parsing (cont)
• General recursive descent may require backtracking
• The previous code needs to be modified to allow backtracking
• In its general form it can't choose an A-production easily
• So we need to try all alternatives
• If one fails, the input pointer needs to be reset and another alternative should be tried
• Recursive descent parsers can't be used for left-recursive grammars
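Example

S -> cAd
A -> ab | a
Input: cad

Figure: three parse trees. (1) S is expanded to c A d. (2) A is expanded with A -> ab; b does not match the input symbol d, so the parser backtracks. (3) A is expanded with A -> a, which matches.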
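The backtracking in this example can be sketched with Python generators. The function names and the GRAMMAR dict are illustrative assumptions; each call yields every input position at which a symbol could end, so a failed alternative is simply abandoned and the next one is tried from the saved position. Like any recursive descent parser, it would loop forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [("c", "A", "d")],
    "A": [("a", "b"), ("a",)],
}

def parse(symbol, text, pos):
    """Yield every position reachable after deriving `symbol` starting at `pos`."""
    if symbol not in GRAMMAR:                  # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alt in GRAMMAR[symbol]:                # try the alternatives in order
        yield from parse_seq(alt, text, pos)

def parse_seq(symbols, text, pos):
    """Yield every position reachable after deriving the whole symbol sequence."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos):   # each candidate end of the first symbol
        yield from parse_seq(symbols[1:], text, mid)

def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))

print(accepts("cad"), accepts("cabd"), accepts("cab"))
```

For the input cad, the alternative A -> ab is tried first and fails at d; the generator then falls through to A -> a, which is the reset of the input pointer described above.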
First and Follow
• First(α) is the set of terminals that begin strings derived from α
• If α ⇒* ɛ then ɛ is also in First(α)
• In predictive parsing, when we have A -> α | β and First(α) and First(β) are disjoint sets, we can select the appropriate A-production by looking at the next input symbol
• Follow(A), for any nonterminal A, is the set of terminals a that can appear immediately after A in some sentential form
• If we have S ⇒* αAaβ for some α and β then a is in Follow(A)
• If A can be the rightmost symbol in some sentential form, then $ is in Follow(A)
Computing First
• To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ɛ can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal and X -> Y1Y2…Yk is a production for some k>=1, then place a in First(X) if for some i a is in First(Yi) and ɛ is in all of First(Y1),…,First(Yi-1), that is, Y1…Yi-1 ⇒* ɛ. If ɛ is in First(Yj) for all j=1,…,k then add ɛ to First(X).
3. If X -> ɛ is a production then add ɛ to First(X).
• Example!
Computing follow
• To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A -> αBβ then everything in First(β) except ɛ is in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ɛ, then everything in Follow(A) is in Follow(B)
• Example!
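As a worked example, the rules for First and Follow can be run as two fixed-point loops over the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id. The encoding (a dict of lists, [] for ɛ) and the function names are assumptions made for this sketch.

```python
EPS, END = "ɛ", "$"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """First of a sequence of grammar symbols, given the current First sets."""
    out = set()
    for X in symbols:
        fx = first[X] if X in GRAMMAR else {X}   # rule 1: First(terminal) = {terminal}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)                                 # every symbol can vanish
    return out

def compute_first():
    first = {A: set() for A in GRAMMAR}
    changed = True
    while changed:                               # repeat until nothing can be added
        changed = False
        for A, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

def compute_follow(first, start="E"):
    follow = {A: set() for A in GRAMMAR}
    follow[start].add(END)                       # rule 1
    changed = True
    while changed:
        changed = False
        for A, alts in GRAMMAR.items():
            for alt in alts:
                for i, B in enumerate(alt):
                    if B not in GRAMMAR:
                        continue
                    rest = first_of_string(alt[i + 1:], first)
                    new = rest - {EPS}           # rule 2
                    if EPS in rest:
                        new |= follow[A]         # rule 3
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for A in GRAMMAR:
    print(A, sorted(FIRST[A]), sorted(FOLLOW[A]))
```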
LL(1) Grammars
• Predictive parsers are those recursive descent parsers needing no
backtracking
• Grammars for which we can create predictive parsers are called LL(1)
• The first L means scanning input from left to right
• The second L means leftmost derivation
• And 1 stands for using one input symbol for lookahead
• A grammar G is LL(1) if and only if whenever A -> α | β are two distinct productions of G, the following conditions hold:
• For no terminal a do both α and β derive strings beginning with a
• At most one of α and β can derive the empty string
• If α ⇒* ɛ then β does not derive any string beginning with a terminal in Follow(A). Likewise, if β ⇒* ɛ then α does not derive any string beginning with a terminal in Follow(A).
Construction of Predictive Parsing table
• For each production A -> α in the grammar, do the following:
1. For each terminal a in First(α), add A -> α to M[A,a]
2. If ɛ is in First(α), then for each terminal b in Follow(A), add A -> α to M[A,b]. If ɛ is in First(α) and $ is in Follow(A), add A -> α to M[A,$] as well
• If after performing the above there is no production in M[A,a], then set M[A,a] to error
Example

E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id

Non-terminal   First       Follow
F              {(, id}     {+, *, ), $}
T              {(, id}     {+, ), $}
E              {(, id}     {), $}
E’             {+, ɛ}      {), $}
T’             {*, ɛ}      {+, ), $}

Predictive parsing table (rows: non-terminals, columns: input symbols; blank entries are errors):

Non-terminal   id          +             *             (           )           $
E              E -> TE’                                E -> TE’
E’                         E’ -> +TE’                              E’ -> Ɛ     E’ -> Ɛ
T              T -> FT’                                T -> FT’
T’                         T’ -> Ɛ       T’ -> *FT’                T’ -> Ɛ     T’ -> Ɛ
F              F -> id                                 F -> (E)
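The table construction can be checked with a short Python sketch that takes the First and Follow sets of this example as given (they are hard-coded below, not computed). The stack-driven parse function is an addition for illustration, showing how such a table is typically consulted; all names are assumptions.

```python
EPS, END = "ɛ", "$"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# First and Follow sets copied from the example (assumed, not computed here).
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}

def first_of_string(symbols):
    out = set()
    for X in symbols:
        fx = FIRST[X] if X in GRAMMAR else {X}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}

def build_table():
    table = {}
    for A, alts in GRAMMAR.items():
        for alt in alts:
            fs = first_of_string(alt)
            targets = fs - {EPS}                # rule 1
            if EPS in fs:
                targets |= FOLLOW[A]            # rule 2 (Follow already holds $)
            for a in targets:
                if (A, a) in table:
                    raise ValueError(f"not LL(1): two entries in M[{A},{a}]")
                table[A, a] = alt
    return table

def parse(tokens, table, start="E"):
    """Return the productions used, in leftmost-derivation order."""
    stack, output, i = [END, start], [], 0
    tokens = tokens + [END]
    while stack:
        top = stack.pop()
        if top not in GRAMMAR:                  # terminal or $: must match input
            if top != tokens[i]:
                raise SyntaxError(f"expected {top!r}, got {tokens[i]!r}")
            i += 1
        else:
            alt = table.get((top, tokens[i]))
            if alt is None:                     # blank entry means error
                raise SyntaxError(f"no entry M[{top},{tokens[i]}]")
            output.append((top, alt))
            stack.extend(reversed(alt))         # leftmost symbol ends up on top
    return output

TABLE = build_table()
for lhs, rhs in parse(["id", "+", "id", "*", "id"], TABLE):
    print(lhs, "->", " ".join(rhs) or EPS)
```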
Another example
S -> iEtSS’ | a
S’ -> eS | Ɛ
E -> b

Non-terminal   a          b          e            i               t    $
S              S -> a                             S -> iEtSS’
S’                                   S’ -> Ɛ                           S’ -> Ɛ
                                     S’ -> eS
E                         E -> b

The entry M[S’, e] holds two productions, so this grammar is not LL(1).
Bottom-Up Parsing
• A bottom-up parser creates the parse tree of the given input
starting from leaves towards the root.
• A bottom-up parser tries to find the right-most derivation of the
given input in the reverse order.
S ⇒ ... ⇒ ω (the right-most derivation of ω)
← (the bottom-up parser finds the right-most derivation in the reverse order)

• Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce.
• At each shift action, the current symbol in the input string is pushed onto a stack.
• At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) will be replaced by the non-terminal at the left side of that production.
• There are also two more actions: accept and error.
CS416 Compiler Design 42
Shift-Reduce Parsing
• A shift-reduce parser tries to reduce the given input string to the starting symbol.

a string --(reduced to)--> the starting symbol

• At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal at the left side of that production rule.
• If the substring is chosen correctly, the rightmost derivation of that string is created in reverse order.

Rightmost derivation: S ⇒*rm ω
Shift-reduce parser finds: ω ⇐rm ... ⇐rm S


Shift-Reduce Parsing -- Example
Grammar:
S → aABb
A → aA | a
B → bB | b

Input string: aaabb
Reductions (top to bottom):
aaabb
aaAbb
aAbb
aABb
S

Rightmost derivation (each form is a right-sentential form):
S ⇒rm aABb ⇒rm aAbb ⇒rm aaAbb ⇒rm aaabb

• How do we know which substring to replace at each reduction step?
Handle
• Informally, a handle of a string is a substring that matches the right side of a production rule.
• But not every substring that matches the right side of a production rule is a handle.
• A handle of a right-sentential form γ (≡ αβω) is a production rule A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ:

S ⇒*rm αAω ⇒rm αβω

• If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.
• We will see that ω is a string of terminals.


Handle Pruning
• A rightmost derivation in reverse can be obtained by handle pruning.

S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string)

• Start from γn, find a handle An→βn in γn, and replace βn by An to get γn-1.
• Then find a handle An-1→βn-1 in γn-1, and replace βn-1 by An-1 to get γn-2.
• Repeat this until we reach S.
A Shift-Reduce Parser
Grammar:
E → E+T | T
T → T*F | F
F → (E) | id

Right-most derivation of id+id*id:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-Most Sentential Form    Reducing Production
id+id*id                      F → id
F+id*id                       T → F
T+id*id                       E → T
E+id*id                       F → id
E+F*id                        T → F
E+T*id                        F → id
E+T*F                         T → T*F
E+T                           E → E+T
E

In the original slides the handles are shown red and underlined in the right-sentential forms.


A Stack Implementation of A Shift-Reduce
Parser
• There are four possible actions of a shift-reduce parser:

1. Shift: The next input symbol is shifted onto the top of the stack.
2. Reduce: Replace the handle on the top of the stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: Parser discovers a syntax error, and calls an error recovery routine.

• The initial stack contains only the end-marker $.
• The end of the input string is marked by the end-marker $.


A Stack Implementation of A Shift-Reduce Parser

Stack       Input        Action
$           id+id*id$    shift
$id         +id*id$      reduce by F → id
$F          +id*id$      reduce by T → F
$T          +id*id$      reduce by E → T
$E          +id*id$      shift
$E+         id*id$       shift
$E+id       *id$         reduce by F → id
$E+F        *id$         reduce by T → F
$E+T        *id$         shift
$E+T*       id$          shift
$E+T*id     $            reduce by F → id
$E+T*F      $            reduce by T → T*F
$E+T        $            reduce by E → E+T
$E          $            accept

Parse tree (the number on each node is the order in which it is created by a reduction):
E8 has children E3, + and T7
E3 has child T2, T2 has child F1, F1 has child id
T7 has children T5, * and F6
T5 has child F4, F4 has child id
F6 has child id
Conflicts During Shift-Reduce Parsing
• There are context-free grammars for which shift-reduce parsers cannot be used.
• The stack contents and the next input symbol may not be enough to decide the action:
• shift/reduce conflict: the parser cannot decide whether to shift or to reduce.
• reduce/reduce conflict: the parser cannot decide which of several reductions to make.
• If a shift-reduce parser cannot be used for a grammar, that grammar is called a non-LR(k) grammar.

In LR(k): L stands for left-to-right scanning, R for rightmost derivation, and k for the number of lookahead symbols.

• An ambiguous grammar can never be an LR grammar.
Shift-Reduce Parsers
• There are two main categories of shift-reduce parsers

1. Operator-Precedence Parser
• simple, but handles only a small class of grammars.
2. LR-Parsers
• cover a wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
• SLR, LR and LALR work the same way; only their parsing tables are different.

Figure: nested classes of grammars, SLR inside LALR inside LR inside CFG.
Actions of A LR-Parser
1. shift s -- shifts the next input symbol and the state s onto the stack
( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm Sm ai s, ai+1 ... an $ )

2. reduce A→β (or rn where n is a production number)
• pop 2r items from the stack, where r = |β| (one grammar symbol and one state per symbol of β);
• then push A and s where s = goto[Sm-r, A]

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ ) 🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

• The output is the reducing production A→β

3. Accept -- Parsing successfully completed

4. Error -- Parser detected an error (an empty entry in the action table)
Reduce Action
• pop 2r items from the stack, where r = |β|; let us assume that β = Y1Y2...Yr
• then push A and s where s = goto[Sm-r, A]

( So X1 S1 ... Xm-r Sm-r Y1 Sm-r+1 ... Yr Sm, ai ai+1 ... an $ )
🡺 ( So X1 S1 ... Xm-r Sm-r A s, ai ... an $ )

• In fact, Y1Y2...Yr is a handle.

X1 ... Xm-r A ai ... an $ ⇒ X1 ... Xm-r Y1...Yr ai ai+1 ... an $
Constructing SLR Parsing Tables – LR(0) Item
• An LR(0) item of a grammar G is a production of G with a dot at some position of the right side.

Ex: A → aBb
Possible LR(0) items (four different possibilities):
A → .aBb
A → a.Bb
A → aB.b
A → aBb.

• Sets of LR(0) items will be the states of the action and goto tables of the SLR parser.
• A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers.
• Augmented Grammar:
G’ is G with a new production rule S’→S where S’ is the new starting symbol.
The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items constructed from I by the two rules:

1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B→γ is a production rule of G, then B → .γ will be in closure(I). We apply this rule until no more new LR(0) items can be added to closure(I).


The Closure Operation -- Example
Grammar:
E’ → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id

closure({E’ → .E}) =
{ E’ → .E      (kernel item)
  E → .E+T
  E → .T
  T → .T*F
  T → .F
  F → .(E)
  F → .id }
Goto Operation
• If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:
If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X).

Example:
I = { E’ → .E, E → .E+T, E → .T,
      T → .T*F, T → .F,
      F → .(E), F → .id }

goto(I,E) = { E’ → E., E → E.+T }
goto(I,T) = { E → T., T → T.*F }
goto(I,F) = { T → F. }
goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F,
              F → .(E), F → .id }
goto(I,id) = { F → id. }


Construction of The Canonical LR(0)
Collection
• To create the SLR parsing tables for a grammar G, we will create the canonical LR(0) collection of the augmented grammar G’.

• Algorithm:
C is { closure({S’ → .S}) }
repeat the following until no more sets of LR(0) items can be added to C:
    for each I in C and each grammar symbol X
        if goto(I,X) is not empty and not in C
            add goto(I,X) to C

• The goto function is a DFA on the sets in C.
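The closure and goto operations and the collection algorithm can be sketched in Python for the expression grammar. The item encoding (lhs, rhs, dot position) and all names are assumptions made for this example. The states come out in discovery order, so their numbers need not match the I0 to I11 numbering used in these slides.

```python
GRAMMAR = [                         # augmented grammar; production 0 is E' -> E
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add B -> .gamma for every item with the dot in front of nonterminal B."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for l2, r2 in GRAMMAR:
                if l2 == rhs[dot] and (l2, r2, 0) not in result:
                    result.add((l2, r2, 0))
                    work.append((l2, r2, 0))
    return frozenset(result)

def goto(items, X):
    """Move the dot over X in every item that allows it, then take the closure."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == X}
    return closure(moved) if moved else frozenset()

def canonical_collection():
    states = [closure({GRAMMAR[0] + (0,)})]
    trans = {}                                   # (state index, symbol) -> state index
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):                       # states grows while we iterate
        for X in symbols:
            J = goto(states[i], X)
            if J:
                if J not in states:
                    states.append(J)
                trans[i, X] = states.index(J)
        i += 1
    return states, trans

STATES, TRANS = canonical_collection()
print(len(STATES), "item sets,", len(TRANS), "transitions")
```

The trans dictionary is the DFA mentioned above: shift entries and goto entries of the SLR table are read off from it.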
Example
Consider the grammar
S->(L)
S->a
L->S
L->L,S

Prepare the LR(0) parsing table for the above grammar

The Canonical LR(0) Collection -- Example
I0: E’ → .E
    E → .E+T
    E → .T
    T → .T*F
    T → .F
    F → .(E)
    F → .id

I1: E’ → E.
    E → E.+T

I2: E → T.
    T → T.*F

I3: T → F.

I4: F → (.E)
    E → .E+T
    E → .T
    T → .T*F
    T → .F
    F → .(E)
    F → .id

I5: F → id.

I6: E → E+.T
    T → .T*F
    T → .F
    F → .(E)
    F → .id

I7: T → T*.F
    F → .(E)
    F → .id

I8: F → (E.)
    E → E.+T

I9: E → E+T.
    T → T.*F

I10: T → T*F.

I11: F → (E).


Transition Diagram (DFA) of Goto Function
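Transitions of the DFA (state, symbol -> state):
I0: E -> I1, T -> I2, F -> I3, ( -> I4, id -> I5
I1: + -> I6
I2: * -> I7
I4: E -> I8, T -> I2, F -> I3, ( -> I4, id -> I5
I6: T -> I9, F -> I3, ( -> I4, id -> I5
I7: F -> I10, ( -> I4, id -> I5
I8: ) -> I11, + -> I6
I9: * -> I7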


Actions of A (S)LR-Parser -- Example
stack input action output
0 id*id+id$ shift 5
0id5 *id+id$ reduce by F→id F→id
0F3 *id+id$ reduce by T→F T→F
0T2 *id+id$ shift 7
0T2*7 id+id$ shift 5
0T2*7id5 +id$ reduce by F→id F→id
0T2*7F10 +id$ reduce by T→T*F T→T*F
0T2 +id$ reduce by E→T E→T
0E1 +id$ shift 6
0E1+6 id$ shift 5
0E1+6id5 $ reduce by F→id F→id
0E1+6F3 $ reduce by T→F T→F
0E1+6T9 $ reduce by E→E+T E→E+T
0E1 $ accept



Constructing SLR Parsing Table
(of an augmented grammar G’)

1. Construct the canonical collection of sets of LR(0) items for G’: C ← {I0,...,In}

2. Create the parsing action table as follows:
• If a is a terminal, A→α.aβ is in Ii and goto(Ii,a)=Ij, then action[i,a] is shift j.
• If A→α. is in Ii, then action[i,a] is reduce A→α for all a in FOLLOW(A), where A≠S’.
• If S’→S. is in Ii, then action[i,$] is accept.
• If any conflicting actions are generated by these rules, the grammar is not SLR(1).

3. Create the parsing goto table:
• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j

4. All entries not defined by (2) and (3) are errors.

5. The initial state of the parser is the one containing S’→.S
(SLR) Parsing Tables for Expression Grammar
Productions:
1) E → E+T
2) E → T
3) T → T*F
4) T → F
5) F → (E)
6) F → id

         Action Table                         Goto Table
state    id    +     *     (     )     $      E    T    F
0        s5                s4                 1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4                 8    2    3
5              r6    r6          r6    r6
6        s5                s4                      9    3
7        s5                s4                           10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5
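A table-driven LR parser for this table can be sketched in a few lines of Python. The table is copied by hand into dictionaries whose names (ACTION, GOTO, PRODUCTIONS, lr_parse) are illustrative assumptions. One simplification compared with the slides: the stack holds only states, so a reduction by A→β pops |β| entries instead of 2|β|.

```python
# Production number -> (left side, length of right side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    """Return the production numbers in the order the reductions are made."""
    stack = [0]                                  # states only; symbols are implied
    tokens = tokens + ["$"]
    i, output = 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[i])
        if act is None:                          # empty entry in the action table
            raise SyntaxError(f"unexpected {tokens[i]!r} in state {stack[-1]}")
        if act == "acc":
            return output
        if act[0] == "s":                        # shift: push the new state
            stack.append(int(act[1:]))
            i += 1
        else:                                    # reduce by production n
            n = int(act[1:])
            lhs, length = PRODUCTIONS[n]
            del stack[len(stack) - length:]      # pop one state per symbol of the right side
            stack.append(GOTO[stack[-1], lhs])
            output.append(n)

print(lr_parse(["id", "*", "id", "+", "id"]))
```

For id*id+id the reductions come out as F→id, T→F, F→id, T→T*F, E→T, F→id, T→F, E→E+T, which is a rightmost derivation in reverse.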




SLR(1) Grammar
• An LR parser using SLR(1) parsing tables for a grammar G is called the SLR(1) parser for G.
• If a grammar G has an SLR(1) parsing table, it is called an SLR(1) grammar (or SLR grammar for short).
• Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR grammar.


shift/reduce and reduce/reduce conflicts
• If a state does not know whether it should make a shift or a reduction for a terminal, we say that there is a shift/reduce conflict.

• If a state does not know whether it should make a reduction using production rule i or j for a terminal, we say that there is a reduce/reduce conflict.

• If the SLR parsing table of a grammar G has a conflict, we say that the grammar is not an SLR grammar.


Conflict Example
Grammar:
S → L=R
S → R
L → *R
L → id
R → L

I0: S’ → .S
    S → .L=R
    S → .R
    L → .*R
    L → .id
    R → .L

I1: S’ → S.

I2: S → L.=R
    R → L.

I3: S → R.

I4: L → *.R
    R → .L
    L → .*R
    L → .id

I5: L → id.

I6: S → L=.R
    R → .L
    L → .*R
    L → .id

I7: L → *R.

I8: R → L.

I9: S → L=R.

Problem (in state I2):
FOLLOW(R) = {=, $}
On input =, the parser can either shift 6 or reduce by R → L: a shift/reduce conflict.
Conflict Example2
Grammar:
S → AaAb
S → BbBa
A → ε
B → ε

I0: S’ → .S
    S → .AaAb
    S → .BbBa
    A → .
    B → .

Problem:
FOLLOW(A) = {a,b}
FOLLOW(B) = {a,b}
On input a: reduce by A → ε or reduce by B → ε (reduce/reduce conflict)
On input b: reduce by A → ε or reduce by B → ε (reduce/reduce conflict)
Constructing Canonical LR(1) Parsing Tables
• In the SLR method, state i makes a reduction by A→α when the current token is a:
• if A→α. is in Ii and a is in FOLLOW(A)

• In some situations, βA cannot be followed by the terminal a in a right-sentential form when βα and the state i are on the top of the stack. This means that making the reduction in this case is not correct.

Grammar:
S → AaAb
S → BbBa
A → ε
B → ε

S ⇒ AaAb ⇒ Aab ⇒ ab              S ⇒ BbBa ⇒ Bba ⇒ ba
Aab ⇒ ε ab                       Bba ⇒ ε ba
AaAb ⇒ Aa ε b                    BbBa ⇒ Bb ε a
LR Parsers

• The most powerful shift-reduce parsing method (that is still efficient) is LR(k) parsing:
L stands for left-to-right scanning, R for rightmost derivation, and k for the number of lookahead symbols (when k is omitted, it is 1).

• LR parsing is attractive because:
• LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.
• The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers. LL(1)-Grammars ⊂ LR(1)-Grammars
• An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.


LR Parsers
• LR-Parsers
• cover a wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
• SLR, LR and LALR work the same way (they use the same algorithm); only their parsing tables are different.


LR Parsing Algorithm
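Figure: model of an LR parser.
• Input: a1 ... ai ... an $
• Stack (top to bottom): Sm Xm Sm-1 Xm-1 ... S1 X1 S0
• The LR parsing algorithm reads the state on top of the stack and the current input symbol, consults the two tables, and produces the output
• Action table: rows are states, columns are terminals and $; each entry is one of four different actions
• Goto table: rows are states, columns are non-terminals; each entry is a state number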


A Configuration of LR Parsing Algorithm
• A configuration of an LR parser is:

( So X1 S1 ... Xm Sm, ai ai+1 ... an $ )

The first component is the stack and the second is the rest of the input.

• Sm and ai decide the parser action by consulting the parsing action table. (The initial stack contains just So.)

• A configuration of an LR parser represents the right-sentential form:

X1 ... Xm ai ai+1 ... an $
LR(1) Item
• To avoid some invalid reductions, the states need to carry more information.
• Extra information is put into a state by including a terminal symbol as a second component in an item.

• An LR(1) item is:

A → α.β, a     where a is the lookahead of the LR(1) item (a is a terminal or the end-marker.)


LR(1) Item (cont.)
• When β (in the LR(1) item A → α.β, a) is not empty, the lookahead does not have any effect.
• When β is empty (A → α., a), we do the reduction by A→α only if the next input symbol is a (not for any terminal in FOLLOW(A)).
• A state will contain
A → α., a1
...
A → α., an
where {a1,...,an} ⊆ FOLLOW(A)


Steps to solve LR parser
• Augmented grammar
• Computation of canonical items by closure and goto functions
• DFA generation
• CLR parsing table generation
• Parsing the input with the help of parsing table



Augmented Grammar G’
G’ equals G ∪ {S’ → S}, where S is the start symbol of G.
The start symbol of G’ is S’.
This is done to signal to the parser when it should stop parsing and announce acceptance of the input.


What is meant by canonical item
• Item: An LR (0) item or simply, an item of a grammar G is a
production of G with a dot ‘.’ at some position of the right side. For
example, the production A 🡪 XYZ yields four items,
• A 🡪 .XYZ
• A 🡪 X.YZ
• A 🡪 XY.Z
• A 🡪 XYZ.
• A production rule of the form A 🡪 ε yields only one item A 🡪 . .
Intuitively, an item shows how much of a production we have seen
till the current point in the parsing procedure.



Canonical Collection of Sets of LR(1) Items
• The construction of the canonical collection of the sets of LR(1) items is similar to the construction of the canonical collection of the sets of LR(0) items, except that the closure and goto operations work a little differently.

An LR(0) item has the format A→α.Bβ
An LR(1) item has the format A→α.Bβ, a (production rule, lookahead)

closure(I) is: (where I is a set of LR(1) items)
• every LR(1) item in I is in closure(I)
• if A→α.Bβ, a is in closure(I) and B→γ is a production rule of G, then B→.γ, b will be in closure(I) for each terminal b in FIRST(βa).
goto operation
• If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:
• If A → α.Xβ, a is in I, then every item in closure({A → αX.β, a}) will be in goto(I,X).
• In other words, the dot is shifted one symbol ahead while the lookahead symbol is kept as it is, and then the closure is taken.


Construction of The Canonical LR(1)
Collection
• Algorithm:
C is { closure({S’ → .S, $}) }
repeat the following until no more sets of LR(1) items can be added to C:
    for each I in C and each grammar symbol X
        if goto(I,X) is not empty and not in C
            add goto(I,X) to C

• The goto function is a DFA on the sets in C.
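The LR(1) versions of closure and goto, and the collection algorithm, can be sketched for the small grammar S’ → S, S → CC, C → aC | d. The item encoding (lhs, rhs, dot, lookahead) and the names are assumptions for this example. To stay short, the sketch also assumes a grammar without ε-productions, so FIRST of a string is just FIRST of its first symbol.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("a", "C")), ("C", ("d",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST of each nonterminal (assumes the grammar has no ε-productions)."""
    first = {A: set() for A in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            X = rhs[0]
            new = first[X] if X in NONTERMINALS else {X}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = first_sets()

def first_of(symbols):
    X = symbols[0]                     # valid only because nothing derives ε
    return FIRST[X] if X in NONTERMINALS else {X}

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # For A -> alpha . B beta, a: add B -> . gamma, b for b in FIRST(beta a)
            for b in first_of(rhs[dot + 1:] + (la,)):
                for l2, r2 in GRAMMAR:
                    item = (l2, r2, 0, b)
                    if l2 == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, X):
    moved = {(l, r, d + 1, a) for l, r, d, a in items if d < len(r) and r[d] == X}
    return closure(moved) if moved else frozenset()

def canonical_collection():
    states, trans = [closure({GRAMMAR[0] + (0, "$")})], {}
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):
        for X in symbols:
            J = goto(states[i], X)
            if J:
                if J not in states:
                    states.append(J)
                trans[i, X] = states.index(J)
        i += 1
    return states, trans

STATES, TRANS = canonical_collection()
print(len(STATES), "LR(1) item sets,", len(TRANS), "transitions")
```

Each lookahead is stored as a separate item here, so an item written with lookaheads a/d in short notation counts as two items.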


A Short Notation for The Sets of LR(1) Items
• A set of LR(1) items containing the following items

A → α.β, a1
...
A → α.β, an

can be written as

A → α.β, a1/a2/.../an
EXAMPLE
GRAMMAR:
1. S’ -> S
2. S -> CC
3. C -> aC
4. C -> d

SET OF ITEMS:
I0: S’ -> .S, $
    S -> .CC, $
    C -> .aC, a/d
    C -> .d, a/d

Goto(I0,S) = I1: S’ -> S., $
(No further goto and closure operations, because the . is at the end of the production rule)

Goto(I0,C) = I2: S -> C.C, $
                 C -> .aC, $
                 C -> .d, $

Goto(I0,a) = I3: C -> a.C, a/d
                 C -> .aC, a/d
                 C -> .d, a/d

Goto(I0,d) = I4: C -> d., a/d

Goto(I2,C) = I5: S -> CC., $

Goto(I2,a) = I6: C -> a.C, $
                 C -> .aC, $
                 C -> .d, $

Goto(I2,d) = I7: C -> d., $

Goto(I3,C) = I8: C -> aC., a/d
Goto(I3,a) = I3
Goto(I3,d) = I4

Goto(I6,C) = I9: C -> aC., $
Goto(I6,a) = I6
Goto(I6,d) = I7
DFA
Transitions (state, symbol -> state):
I0: S -> I1, C -> I2, a -> I3, d -> I4
I2: C -> I5, a -> I6, d -> I7
I3: C -> I8, a -> I3, d -> I4
I6: C -> I9, a -> I6, d -> I7


PARSING TABLE GENERATION
• Input: An augmented grammar G’.
• Output: The canonical LR parsing table functions action and goto for G’
• Method:
1. Construct C={I0,I1,...,In}, the collection of sets of LR(1) items for G’.
2. State i of the parser is constructed from Ii. The parsing actions for state i are determined as follows:
a) If [A → α.aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i,a] to “shift j.” Here, a is required to be a terminal.
b) If [A → α., a] is in Ii, A ≠ S’, then set action[i,a] to “reduce A → α.”
c) If [S’ → S., $] is in Ii, then set action[i,$] to “accept.”


Cont…
• The goto transition for state i are determined as follows: If goto(Ii ,
A)= Ij ,then goto[i,A]=j.
• All entries not defined by rules(2) and (3) are made “error.”
• The initial state of the parser is the one constructed from the set
containing item [S’🡪.S, $].
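The method above can be sketched as a small Python function. This is a sketch under assumptions, not a definitive implementation: the item format (lhs, rhs, dot, lookahead), the function name build_tables and the tiny grammar S’ → S, S → d used to exercise it are all made up for illustration. Missing entries in the returned dictionaries play the role of "error".

```python
def build_tables(states, transitions, nonterminals, start="S'"):
    """Fill the action and goto tables from LR(1) item sets and their transitions."""
    action, goto_table = {}, {}

    def set_action(i, a, entry):
        # Two different entries for the same cell mean the grammar is not LR(1).
        if action.setdefault((i, a), entry) != entry:
            raise ValueError(f"conflict in state {i} on {a!r}")

    for i, items in enumerate(states):
        for lhs, rhs, dot, la in items:
            if dot < len(rhs):
                sym = rhs[dot]
                if sym not in nonterminals:           # rule a): shift on a terminal
                    set_action(i, sym, ("shift", transitions[(i, sym)]))
            elif lhs == start:                         # rule c): accept
                set_action(i, la, ("accept",))
            else:                                      # rule b): reduce on the lookahead
                set_action(i, la, ("reduce", lhs, rhs))

    for (i, sym), j in transitions.items():            # goto entries for nonterminals
        if sym in nonterminals:
            goto_table[(i, sym)] = j
    return action, goto_table


# Hypothetical three-state collection for the toy grammar S' -> S, S -> d.
toy_states = [
    {("S'", ("S",), 0, "$"), ("S", ("d",), 0, "$")},
    {("S'", ("S",), 1, "$")},
    {("S", ("d",), 1, "$")},
]
toy_transitions = {(0, "S"): 1, (0, "d"): 2}
toy_action, toy_goto = build_tables(toy_states, toy_transitions, {"S'", "S"})
```

Raising an exception on a conflicting cell mirrors the statement that a grammar with conflicting actions is not LR(1); a real generator would more likely report the conflict and continue.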



Parsing table (goto entries)
State   a    d    $        S   C
0                          1   2
2                              5
3                              8
6                              9

Parsing table (shift and accept entries added)
State   a    d    $        S   C
0       S3   S4            1   2
1                 ACCEPT
2       S6   S7                5
3       S3   S4                8
6       S6   S7                9

Parsing table (reduce entries added)
State   a    d    $        S   C
0       S3   S4            1   2
1                 ACCEPT
2       S6   S7                5
3       S3   S4                8
4       R4   R4
5                 R2
6       S6   S7                9
7                 R4
8       R3   R3
9                 R3


Parsing the input string aadd
Stack        Input buffer   Action table       Goto table   Parsing action
$0           aadd$          action[0,a]=s3                  Shift
$0a3         add$           action[3,a]=s3                  Shift
$0a3a3       dd$            action[3,d]=s4                  Shift
$0a3a3d4     d$             action[4,d]=r4     [3,C]=8      Reduce C -> d
$0a3a3C8     d$             action[8,d]=r3     [3,C]=8      Reduce C -> aC
$0a3C8       d$             action[8,d]=r3     [0,C]=2      Reduce C -> aC
$0C2         d$             action[2,d]=s7                  Shift
$0C2d7       $              action[7,$]=r4     [2,C]=5      Reduce C -> d
$0C2C5       $              action[5,$]=r2     [0,S]=1      Reduce S -> CC
$0S1         $              action[1,$]=acc                 Accept
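The trace can be reproduced with a small table-driven driver. The sketch below hard-codes the action and goto tables of the example (productions numbered 1. S’ → S, 2. S → CC, 3. C → aC, 4. C → d); the names ACTION, GOTO, PRODUCTIONS and lr_parse are assumptions chosen for this illustration, and the stack holds only state numbers rather than symbol/state pairs.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("S'", 1), 2: ("S", 2), 3: ("C", 2), 4: ("C", 1)}

# Action table of the example; missing entries are errors.
ACTION = {
    0: {"a": ("s", 3), "d": ("s", 4)},
    1: {"$": ("acc", None)},
    2: {"a": ("s", 6), "d": ("s", 7)},
    3: {"a": ("s", 3), "d": ("s", 4)},
    4: {"a": ("r", 4), "d": ("r", 4)},
    5: {"$": ("r", 2)},
    6: {"a": ("s", 6), "d": ("s", 7)},
    7: {"$": ("r", 4)},
    8: {"a": ("r", 3), "d": ("r", 3)},
    9: {"$": ("r", 3)},
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}


def lr_parse(text):
    """Return (accepted, list of production numbers used in reductions)."""
    tokens = list(text) + ["$"]
    stack = [0]              # stack of states; state 0 is the initial state
    pos = 0
    reductions = []
    while True:
        entry = ACTION[stack[-1]].get(tokens[pos])
        if entry is None:    # empty action entry: syntax error
            return False, reductions
        kind, arg = entry
        if kind == "s":      # shift: push the new state and advance the input
            stack.append(arg)
            pos += 1
        elif kind == "r":    # reduce: pop |rhs| states, then push goto[top, lhs]
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:                # accept
            return True, reductions


accepted, used = lr_parse("aadd")
```

For the input aadd the reductions are applied in the order 4, 3, 3, 4, 2, which is the sequence of reduce steps in the trace.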



SLR parsing
• Continuing the same way, we define all LR(0) item states:
(Figure: transition diagram of the LR(0) item sets I0 to I9 for the grammar S → L=R | R, L → *R | id, R → L.)

Canonical LR(1) Collection -- Example
Grammar:
S → AaAb
S → BbBa
A → ε
B → ε

I0: S’ → .S ,$
    S → .AaAb ,$
    S → .BbBa ,$
    A → . ,a
    B → . ,b
goto(I0,S) = I1: S’ → S. ,$
goto(I0,A) = I2: S → A.aAb ,$
goto(I0,B) = I3: S → B.bBa ,$
goto(I2,a) = I4: S → Aa.Ab ,$
    A → . ,b
goto(I3,b) = I5: S → Bb.Ba ,$
    B → . ,a
goto(I4,A) = I6: S → AaA.b ,$
goto(I5,B) = I7: S → BbB.a ,$
goto(I6,b) = I8: S → AaAb. ,$
goto(I7,a) = I9: S → BbBa. ,$


Canonical LR(1) Collection – Example2
Grammar:
S’ → S
1) S → L=R
2) S → R
3) L → *R
4) L → id
5) R → L

I0: S’ → .S,$
    S → .L=R,$
    S → .R,$
    L → .*R,$/=
    L → .id,$/=
    R → .L,$
    (S to I1, L to I2, R to I3, * to I4, id to I5)
I1: S’ → S.,$
I2: S → L.=R,$
    R → L.,$
    (= to I6)
I3: S → R.,$
I4: L → *.R,$/=
    R → .L,$/=
    L → .*R,$/=
    L → .id,$/=
    (R to I7, L to I8, * to I4, id to I5)
I5: L → id.,$/=
I6: S → L=.R,$
    R → .L,$
    L → .*R,$
    L → .id,$
    (R to I9, L to I10, * to I11, id to I12)
I7: L → *R.,$/=
I8: R → L.,$/=
I9: S → L=R.,$
I10: R → L.,$
I11: L → *.R,$
    R → .L,$
    L → .*R,$
    L → .id,$
    (R to I13, L to I10, * to I11, id to I12)
I12: L → id.,$
I13: L → *R.,$

Sets with the same cores: I4 and I11, I5 and I12, I7 and I13, I8 and I10
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G’. C←{I0,...,In}

2. Create the parsing action table as follows
• If a is a terminal, A→α.aβ,b is in Ii and goto(Ii,a)=Ij, then action[i,a] is shift j.
• If A→α.,a is in Ii, then action[i,a] is reduce A→α, where A≠S’.
• If S’→S.,$ is in Ii, then action[i,$] is accept.
• If any conflicting actions are generated by these rules, the grammar is not LR(1).

3. Create the parsing goto table
• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j

4. All entries not defined by (2) and (3) are errors.

5. The initial state of the parser is the one containing S’→.S,$
LR(1) Parsing Tables – (for Example2)
State   id    *     =     $     S    L    R
0       s5    s4                1    2    3
1                         acc
2                   s6    r5
3                         r2
4       s5    s4                     8    7
5                   r4    r4
6       s12   s11                    10   9
7                   r3    r3
8                   r5    r5
9                         r1
10                        r5
11      s12   s11                    10   13
12                        r4
13                        r3

There is no shift/reduce or reduce/reduce conflict, so it is an LR(1) grammar.


LALR Parsing Tables
• LALR stands for LookAhead LR.

• LALR parsers are often used in practice because LALR parsing tables
are smaller than LR(1) parsing tables.
• The number of states in SLR and LALR parsing tables for a grammar
G are equal.
• But LALR parsers recognize more grammars than SLR parsers.
• yacc creates an LALR parser for the given grammar.
• A state of an LALR parser is again a set of LR(1) items.
Creating LALR Parsing Tables

Canonical LR(1) Parser 🡺 (shrink # of states) 🡺 LALR Parser

• This shrink process may introduce a reduce/reduce conflict in the resulting LALR parser (so the grammar is NOT LALR).
• But this shrink process does not produce a shift/reduce conflict.


The Core of A Set of LR(1) Items
• The core of a set of LR(1) items is the set of their first components, i.e. the items without their lookaheads.

Ex: S → L.=R,$ 🡺 S → L.=R (Core)
    R → L.,$       R → L.

• We will find the states (sets of LR(1) items) in a canonical LR(1) parser that have the same core. Then we will merge them into a single state.

I1: L → id.,=
I2: L → id.,$
These have the same core, so merge them 🡺 a new state I12: L → id.,=
                                                         L → id.,$

• We will do this for all states of a canonical LR(1) parser to get the states of the LALR parser.
• In fact, the number of states of the LALR parser for a grammar will be equal to the number of states of the SLR parser for that grammar.
Creation of LALR Parsing Tables
• Create the canonical LR(1) collection of the sets of LR(1) items for the given
grammar.
• Find each core; find all sets having that same core; replace those sets having same
cores with a single set which is their union.
C={I0,...,In} 🡺 C’={J1,...,Jm} where m ≤ n
• Create the parsing tables (action and goto tables) same as the construction of the
parsing tables of LR(1) parser.
• Note that: if J = I1 ∪ ... ∪ Ik, then since I1,...,Ik have the same core
🡺 the cores of goto(I1,X),...,goto(Ik,X) must also be the same.
• So, goto(J,X)=K where K is the union of all sets of items having the same core as goto(I1,X).

• If no conflict is introduced, the grammar is an LALR(1) grammar. (We may only introduce reduce/reduce conflicts; we cannot introduce a shift/reduce conflict.)
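The merging step can be sketched as follows. This is only an illustration under assumptions: items are written as tuples (lhs, rhs, dot, lookahead), the helper names core and merge_by_core are invented for this example, and the three sample sets are taken from Example2 (I5, I9 and I12).

```python
from collections import defaultdict


def core(state):
    """The core of an LR(1) item set: its items with the lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _) in state)


def merge_by_core(states):
    """Union all item sets that share a core.

    Returns the merged sets and a map from old state index to new state index,
    which is what is needed to renumber shift and goto entries afterwards.
    """
    groups = defaultdict(list)
    for i, state in enumerate(states):
        groups[core(state)].append(i)
    merged, mapping = [], {}
    for members in groups.values():
        new_index = len(merged)
        merged.append(frozenset().union(*(states[i] for i in members)))
        for i in members:
            mapping[i] = new_index
    return merged, mapping


# Three sets from Example2: I5 and I12 share the core {L -> id.}.
I5 = frozenset({("L", ("id",), 1, "$"), ("L", ("id",), 1, "=")})
I9 = frozenset({("S", ("L", "=", "R"), 3, "$")})
I12 = frozenset({("L", ("id",), 1, "$")})

merged_states, old_to_new = merge_by_core([I5, I9, I12])
```

Here I5 and I12 collapse into one set with lookaheads $/=, while I9 stays on its own. Keeping the old-to-new index map is a design choice of this sketch; it lets the transitions be rewritten without recomputing goto.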
Shift/Reduce Conflict
• We said that the shrink process used to create the states of an LALR parser cannot introduce a shift/reduce conflict.
• Assume that it can. In this case, a state of the LALR parser must have:
A → α.,a and B → β.aγ,b
• Since merged states have the same core, a state of the canonical LR(1) parser must then have:
A → α.,a and B → β.aγ,c
But this state also has a shift/reduce conflict, i.e. the original canonical LR(1) parser already has a conflict.
(The reason is that the shift operation does not depend on lookaheads.)


Reduce/Reduce Conflict
• But, we may introduce a reduce/reduce conflict during the shrink
process for the creation of the states of a LALR parser.

I1: A → α.,a        I2: A → α.,b
    B → β.,b            B → β.,c

I12: A → α.,a/b    🡺 reduce/reduce conflict
     B → β.,b/c
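A quick way to see the conflict is to group the complete items of a state by lookahead. The sketch below is illustrative only: the tuple item format, the function name reduce_reduce_conflicts and the placeholder symbols "alpha" and "beta" (standing in for the strings α and β) are assumptions of this example.

```python
def reduce_reduce_conflicts(state):
    """Map each lookahead to the productions that would be reduced on it,
    keeping only lookaheads claimed by more than one production."""
    by_lookahead = {}
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):  # complete item: the dot is at the end
            by_lookahead.setdefault(la, set()).add((lhs, rhs))
    return {la: prods for la, prods in by_lookahead.items() if len(prods) > 1}


# "alpha" and "beta" are placeholders for the right-hand sides α and β.
I1 = {("A", ("alpha",), 1, "a"), ("B", ("beta",), 1, "b")}
I2 = {("A", ("alpha",), 1, "b"), ("B", ("beta",), 1, "c")}
I12 = I1 | I2  # same cores, so the LALR construction merges them

conflicts = reduce_reduce_conflicts(I12)
```

Neither I1 nor I2 has a conflict on its own; after the merge, both A → α and B → β ask for a reduction on lookahead b.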



Canonical LALR(1) Collection – Example2
Grammar:
S’ → S
1) S → L=R
2) S → R
3) L → *R
4) L → id
5) R → L

I0: S’ → .S,$
    S → .L=R,$
    S → .R,$
    L → .*R,$/=
    L → .id,$/=
    R → .L,$
    (S to I1, L to I2, R to I3, * to I411, id to I512)
I1: S’ → S.,$
I2: S → L.=R,$
    R → L.,$
    (= to I6)
I3: S → R.,$
I411: L → *.R,$/=
    R → .L,$/=
    L → .*R,$/=
    L → .id,$/=
    (R to I713, L to I810, * to I411, id to I512)
I512: L → id.,$/=
I6: S → L=.R,$
    R → .L,$
    L → .*R,$
    L → .id,$
    (R to I9, L to I810, * to I411, id to I512)
I713: L → *R.,$/=
I810: R → L.,$/=
I9: S → L=R.,$

Same Cores: I4 and I11, I5 and I12, I7 and I13, I8 and I10
LALR(1) Parsing Tables – (for Example2)
(States 4, 5, 7 and 8 denote the merged states I411, I512, I713 and I810.)
State   id    *     =     $     S    L    R
0       s5    s4                1    2    3
1                         acc
2                   s6    r5
3                         r2
4       s5    s4                     8    7
5                   r4    r4
6       s5    s4                     8    9
7                   r3    r3
8                   r5    r5
9                         r1

There is no shift/reduce or reduce/reduce conflict, so it is an LALR(1) grammar.


Using Ambiguous Grammars
• All grammars used in the construction of LR-parsing tables must be unambiguous.
• Can we create LR-parsing tables for ambiguous grammars?
• Yes, but they will have conflicts.
• We can resolve each of these conflicts in favor of one of the actions to disambiguate the grammar.
• At the end, we will again have an unambiguous grammar.

• Why would we want to use an ambiguous grammar?
• Some ambiguous grammars are much more natural, and a corresponding unambiguous grammar can be very complex.
• Usage of an ambiguous grammar may eliminate unnecessary reductions.

• Ex.
E → E+E | E*E | (E) | id 🡺 E → E+T | T
                            T → T*F | F
                            F → (E) | id
Sets of LR(0) Items for Ambiguous Grammar
I0: E’ → .E
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
    (E to I1, ( to I2, id to I3)
I1: E’ → E.
    E → E.+E
    E → E.*E
    (+ to I4, * to I5)
I2: E → (.E)
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
    (E to I6, ( to I2, id to I3)
I3: E → id.
I4: E → E+.E
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
    (E to I7, ( to I2, id to I3)
I5: E → E*.E
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
    (E to I8, ( to I2, id to I3)
I6: E → (E.)
    E → E.+E
    E → E.*E
    ( ) to I9, + to I4, * to I5)
I7: E → E+E.
    E → E.+E
    E → E.*E
    (+ to I4, * to I5)
I8: E → E*E.
    E → E.+E
    E → E.*E
    (+ to I4, * to I5)
I9: E → (E).


SLR-Parsing Tables for Ambiguous Grammar
FOLLOW(E) = { $,+,*,) }

State I7 has shift/reduce conflicts for symbols + and *.

I0 --E--> I1 --+--> I4 --E--> I7

when current token is +


shift 🡺 + is right-associative
reduce 🡺 + is left-associative

when current token is *


shift 🡺 * has higher precedence than +
reduce 🡺 + has higher precedence than *



SLR-Parsing Tables for Ambiguous Grammar
FOLLOW(E) = { $,+,*,) }

State I8 has shift/reduce conflicts for symbols + and *.

I0 --E--> I1 --*--> I5 --E--> I8

when current token is *


shift 🡺 * is right-associative
reduce 🡺 * is left-associative

when current token is +


shift 🡺 + has higher precedence than *
reduce 🡺 * has higher precedence than +
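The choices listed for states I7 and I8 can be summarised as a small decision rule. The sketch below assumes the usual conventions (both operators left-associative, * binding tighter than +); the names PRECEDENCE, ASSOCIATIVITY and resolve are invented for this illustration and are not part of any parser generator's API.

```python
# Assumed conventions: a larger number binds tighter; both operators are left-associative.
PRECEDENCE = {"+": 1, "*": 2}
ASSOCIATIVITY = {"+": "left", "*": "left"}


def resolve(stack_op, lookahead):
    """Choose between shift and reduce when 'E stack_op E' is on top of the stack
    and `lookahead` is the next operator token."""
    if PRECEDENCE[lookahead] > PRECEDENCE[stack_op]:
        return "shift"    # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[stack_op]:
        return "reduce"   # the operator on the stack binds tighter
    # Same precedence: associativity decides.
    return "reduce" if ASSOCIATIVITY[stack_op] == "left" else "shift"


state_7 = {tok: resolve("+", tok) for tok in "+*"}  # E -> E+E. on the stack
state_8 = {tok: resolve("*", tok) for tok in "+*"}  # E -> E*E. on the stack
```

Under these assumptions state I7 reduces on + and shifts on *, while state I8 reduces on both, which is one consistent way of filling the conflicting entries.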



SLR-Parsing Tables for Ambiguous Grammar
Productions: 1) E → E+E   2) E → E*E   3) E → (E)   4) E → id

        Action                                Goto
State   id    +     *     (     )     $       E
0       s3                s2                  1
1             s4    s5                acc
2       s3                s2                  6
3             r4    r4          r4    r4
4       s3                s2                  7
5       s3                s2                  8
6             s4    s5          s9
7             r1    s5          r1    r1
8             r2    r2          r2    r2
9             r3    r3          r3    r3
Error Recovery in LR Parsing
• An LR parser will detect an error when it consults the parsing action table and
finds an error entry. All empty entries in the action table are error entries.
• Errors are never detected by consulting the goto table.
• An LR parser will announce error as soon as there is no valid continuation for
the scanned portion of the input.
• A canonical LR parser (LR(1) parser) will never make even a single reduction
before announcing an error.
• The SLR and LALR parsers may make several reductions before announcing an
error.
• But, all LR parsers (LR(1), LALR and SLR parsers) will never shift an erroneous
input symbol onto the stack.
Panic Mode Error Recovery in LR Parsing
• Scan down the stack until a state s with a goto on a particular nonterminal A is found. (Discard everything on the stack above this state s.)
• Discard zero or more input symbols until a symbol a is found that can legitimately follow A.
• Usually a is simply a symbol in FOLLOW(A), but this may not work for all situations.

• The parser stacks the nonterminal A and the state goto[s,A], and it
resumes the normal parsing.
• This nonterminal A normally represents a basic programming block (there can be more than one choice for A).
• stmt, expr, block, ...
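The three steps of panic-mode recovery can be sketched as below. This is a rough illustration, not a production recovery routine: the function name panic_mode_recover, its parameters, and the use of the goto table and FOLLOW set of the earlier grammar S → CC, C → aC | d are assumptions made for the example, and the stack is a plain list of state numbers.

```python
def panic_mode_recover(stack, tokens, pos, goto_table, follow, candidates):
    """Try to recover from a syntax error.

    stack: list of parser states (top at the end), modified in place.
    Returns the input position at which parsing resumes, or None on failure.
    """
    # 1. Scan down the stack for a state s with a goto on some candidate nonterminal A.
    while stack:
        s = stack[-1]
        nonterminal = next((A for A in candidates if (s, A) in goto_table), None)
        if nonterminal is not None:
            break
        stack.pop()
    else:
        return None  # the stack ran out: no recovery point exists
    # 2. Discard input symbols until one that can follow A is found.
    while pos < len(tokens) and tokens[pos] not in follow[nonterminal]:
        pos += 1
    if pos == len(tokens):
        return None
    # 3. Push goto[s, A] and let normal parsing resume.
    stack.append(goto_table[(s, nonterminal)])
    return pos


# Goto entries and FOLLOW(C) for the grammar S -> CC, C -> aC | d (assumed from earlier).
goto_table = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}
follow = {"C": {"a", "d", "$"}}

stack = [0, 2, 7]
resume_at = panic_mode_recover(stack, ["x", "x", "$"], 0, goto_table, follow, ["C"])
```

In this run state 7 is popped, the two unexpected tokens are skipped, and state goto[2,C] is pushed, so the parser continues as if a C had been recognised.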
Phrase-Level Error Recovery in LR Parsing
• Each empty entry in the action table is marked with a specific error
routine.
• An error routine reflects the error that the user most likely will
make in that case.
• An error routine inserts the symbols into the stack or the input (or it
deletes the symbols from the stack and the input, or it can do both
insertion and deletion).
• missing operand
• unbalanced right parenthesis



YACC
Basic Operational Sequence

gram.y : File containing desired grammar in YACC format
   |  yacc (YACC program)
   v
y.tab.c : C source program created by YACC
   |  cc or gcc (C compiler)
   v
a.out : Executable program that will parse grammar given in gram.y
YACC program/ File Format
Definitions

%%

Rules

%%

Supplementary Code
Definitions Section
Example

%{
#include <stdio.h>
#include <stdlib.h>
%}

%token ID NUM     /* ID and NUM are terminals (tokens) */
%start expr       /* expr is the start symbol (a non-terminal) */
Rules Section
• The rules section is a grammar

• Example

expr : expr '+' term | term;
term : term '*' factor | factor;
factor : '(' expr ')' | ID | NUM;
Rules Section
• Normally written like this
• Example:
expr : expr '+' term
     | term
     ;
OR
Expr : Expr '+' NUM { $$ = $1 + $3; }
     | Expr '-' NUM { $$ = $1 - $3; }
     | NUM { $$ = $1; }
     ;
Semantic actions
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
Semantic actions (cont’d)
• $1, $2 and $3 refer to the values of the first, second and third symbols on the right-hand side of a rule; $$ is the value of the left-hand side.
• For example, in expr : expr '+' term { $$ = $1 + $3; } the value $1 belongs to expr, $2 to '+' and $3 to term.
• In factor : '(' expr ')' { $$ = $2; } the value $2 belongs to expr.
• Default: $$ = $1;
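The effect of a semantic action on the parser's value stack can be mimicked in a few lines of Python. This is only a conceptual sketch of what happens at a reduction, not how yacc itself is implemented; the function name apply_reduction and the list-based value stack are assumptions of this example.

```python
def apply_reduction(values, rhs_length, action=None):
    """Pop the values of the right-hand-side symbols ($1..$n) and push the result ($$).

    `action` receives $1..$n as positional arguments; when it is omitted,
    the default rule $$ = $1 is applied.
    """
    args = values[len(values) - rhs_length:]
    del values[len(values) - rhs_length:]
    if action is not None:
        result = action(*args)
    else:
        result = args[0] if args else None  # default: $$ = $1
    values.append(result)
    return result


# expr : expr '+' term { $$ = $1 + $3; } with $1 = 2, $2 = '+', $3 = 3
values = [2, "+", 3]
apply_reduction(values, 3, lambda d1, d2, d3: d1 + d3)

# factor : NUM with no action, so $$ = $1
single = [7]
apply_reduction(single, 1)
```

After the first call the three values are replaced by their sum; after the second the single value is left unchanged, which is what the default action means.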
