Chapter 4: Automata and Compiler Design (Final)
Contents
4 Pushdown Automata
4.1 Introduction
4.2 PDA Components
4.3 Formal Definition of PDA
4.4 What Is an Instantaneous Description?
4.5 Representation of Pushdown Automata (PDA)
4.5.1 Transition Function of Pushdown Automata
4.5.2 Graphical Notation of Pushdown Automata (PDA)
4.5.3 Acceptance of PDA
4.6 Syntax Analysis Phase of the Compiler
4.6.1 Bottom-up Parsing
4.6.1.1 Handle and Handle Pruning
4.6.1.2 Shift-Reduce Parser
4.6.1.3 Shift-Reduce Conflict
4.6.1.4 Reduce-Reduce Conflict
4.6.1.5 LR PARSING
Mallikarjuna G D Notes 1
AUTOMATA AND COMPILER DESIGN
Module 4
Pushdown Automata: Definition of Pushdown Automata, The Languages of a PDA
Syntax Analysis Phase of the Compiler (Part 2): Bottom-up Parsing, Introduction to LR Parsing,
More Powerful LR Parsers
4.1 Introduction
Pushdown automata (PDA) are a type of automaton used in theoretical computer science to model
and analyze processes that rely on stack-based memory. A PDA extends a finite automaton, whose
memory is limited to its finite set of states, with a stack: an unbounded last-in, first-out store.
This extra memory lets PDAs recognize a broader class of languages: nondeterministic PDAs accept
exactly the context-free languages.
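To make the role of the stack concrete, here is a minimal Python sketch (not part of the original notes) of a stack-based recognizer for the language {aⁿbⁿ | n ≥ 1}. No finite automaton can accept this language, because it would have to count an unbounded number of a's. The function name accepts_anbn and the stack symbols Z and A are illustrative choices of our own. The sketch mimics a deterministic PDA: push a marker for every a, pop one for every b, and accept only if the stack is back to its bottom marker.

```python
def accepts_anbn(w: str) -> bool:
    """Recognize {a^n b^n | n >= 1} with an explicit stack, as a PDA would."""
    stack = ["Z"]           # Z is the initial stack symbol (bottom marker)
    phase = "push"          # "push": still reading a's; "pop": reading b's
    for ch in w:
        if phase == "push" and ch == "a":
            stack.append("A")          # remember one unmatched 'a'
        elif ch == "b" and stack[-1] == "A":
            stack.pop()                # match this 'b' with an earlier 'a'
            phase = "pop"              # no 'a' may appear after a 'b'
        else:
            return False               # no move defined for this symbol: reject
    # Accept iff at least one pair was matched and only Z is left on the stack.
    return phase == "pop" and stack == ["Z"]
```

For instance, accepts_anbn("aabb") returns True, while accepts_anbn("aab") returns False because one A is still on the stack when the input ends.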
Why pushdown automata are important:
Modeling Context-Free Languages: Context-free grammars are widely used in programming
languages, parsing, and natural language processing. A grammar generates the strings of a
context-free language, and a pushdown automaton recognizes them, so the two formalisms together
are essential for understanding the structure of context-free languages.
Parsing: PDAs underlie parsing algorithms such as the well-known LR and LL parsers. These parsers
are crucial components of compilers and other language-processing tools: they analyze the syntax
of programs and check that programs are correctly structured.
Expressive Power: Pushdown automata can recognize more languages than finite automata (for
example, {aⁿbⁿ | n ≥ 1}, which requires unbounded counting) while keeping a relatively simple
structure. They strike a balance between expressiveness and ease of understanding, making them a
valuable tool for studying computational complexity and language recognition.
Formal Language Theory: Pushdown automata play a central role in formal language theory,
providing insights into the computational capabilities of various language classes. They are used
to prove properties of context-free languages and to establish relationships between
different classes of languages.
Applications in Software Engineering: Beyond theoretical applications, pushdown automata
have practical implications in software engineering. They are used in static code analysis,
verification of software correctness, and designing efficient algorithms for language processing
tasks.
Overall, pushdown automata are essential in both theoretical and practical aspects of computer
science, providing a foundation for understanding and solving a wide range of problems related
to formal languages and automata theory.
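The following diagram shows a transition in a PDA from state q1 to state q2, labeled a, b → c. The
label means: if the PDA is in state q1, reads the input symbol a, and finds b on top of the stack, it
pops b, pushes c, and moves to state q2.
[Figure: transition from q1 to q2 labeled a, b → c]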
For example, consider a pushdown automaton whose transition function contains the rule
δ(q1, a, b) = {(q2, cd), (q3, ε)}
If the control unit is in state q1, the input symbol read is ‘a’, and the symbol on top of the stack is
‘b’, then either of the following two moves can occur:
• The control unit goes into state q2, and the string ‘cd’ replaces ‘b’ on top of the stack (c becomes
the new top).
• The control unit goes into state q3, and ‘b’ is removed (popped) from the top of the stack.
In the deterministic case, applying δ yields at most one move: the automaton enters a new state
q ∈ Q and replaces the top stack symbol with a string x ∈ Γ*. Since we are dealing with a
nondeterministic pushdown automaton, the result of applying δ is a finite (possibly empty) set of
(q, x) pairs.
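A nondeterministic transition function can be stored directly as a table from (state, input symbol, stack top) to a set of (state, string) pairs. The Python sketch below uses conventions of our own: a dict for δ, the empty string "" standing for ε, and the stack written as a string with its top symbol first (Z is a hypothetical bottom symbol used only in the example). It encodes the rule δ(q1, a, b) = {(q2, cd), (q3, ε)} and computes every configuration reachable in one move.

```python
# δ maps (state, input symbol, stack top) to a set of (next state, string pushed).
# "" stands for ε; stack strings are written with the top symbol first.
delta = {
    ("q1", "a", "b"): {("q2", "cd"), ("q3", "")},   # δ(q1, a, b) = {(q2, cd), (q3, ε)}
}

def moves(config, delta):
    """All configurations reachable from config = (state, input, stack) in one move."""
    state, inp, stack = config
    if not stack:
        return set()                      # a PDA cannot move with an empty stack
    top, rest = stack[0], stack[1:]
    result = set()
    if inp:                               # moves that consume the next input symbol
        for p, push in delta.get((state, inp[0], top), set()):
            result.add((p, inp[1:], push + rest))
    for p, push in delta.get((state, "", top), set()):   # ε-moves: input not consumed
        result.add((p, inp, push + rest))
    return result
```

Calling moves(("q1", "a", "bZ"), delta) yields both successors, ("q2", "", "cdZ") and ("q3", "", "Z"), one for each pair in the set returned by δ.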
Acceptance by Final State: The PDA accepts its input by final state if, after reading the entire
input, it can enter a final state in zero or more moves. The stack contents at that point do not matter.
Let P = (Q, ∑, Γ, δ, q0, Z, F) be a PDA. The language accepted by final state is defined
as:
L(P) = {w | (q0, w, Z) ⊢* (q, ε, α), q ∈ F, α ∈ Γ*}
Acceptance by Empty Stack: The PDA accepts its input by empty stack if, starting from the initial
configuration, it can read the entire input and empty its stack.
Let P = (Q, ∑, Γ, δ, q0, Z, F) be a PDA. The language accepted by empty stack is defined as:
N(P) = {w | (q0, w, Z) ⊢* (q, ε, ε), q ∈ Q}
Here q may be any state; the set F plays no role in this mode of acceptance.
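The two acceptance conditions can be compared by searching all configurations reachable from (q0, w, Z). The following is a minimal Python sketch, assuming a small PDA of our own for {aⁿbⁿ | n ≥ 1} (states q0, q1, q2 with F = {q2}; "" stands for ε; stacks are written top-first). The breadth-first search terminates here because this PDA has no ε-moves that grow the stack; a general simulator would need an extra bound.

```python
from collections import deque

# A PDA for {a^n b^n | n >= 1}; Z is the bottom marker, stacks are written top-first.
DELTA = {
    ("q0", "a", "Z"): {("q0", "AZ")},
    ("q0", "a", "A"): {("q0", "AA")},
    ("q0", "b", "A"): {("q1", "")},
    ("q1", "b", "A"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "")},     # ε-move: pop Z and enter the final state q2
}
START, BOTTOM, FINAL = "q0", "Z", {"q2"}

def step(config):
    """All configurations reachable in one move (⊢) from config = (state, input, stack)."""
    state, inp, stack = config
    if not stack:
        return []
    top, rest = stack[0], stack[1:]
    out = []
    if inp:
        out += [(p, inp[1:], push + rest) for p, push in DELTA.get((state, inp[0], top), ())]
    out += [(p, inp, push + rest) for p, push in DELTA.get((state, "", top), ())]
    return out

def accepts(w, by="final"):
    """Breadth-first search over configurations reachable from (q0, w, Z)."""
    seen, queue = set(), deque([(START, w, BOTTOM)])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, inp, stack = config
        if inp == "":
            if by == "final" and state in FINAL:
                return True            # (q0, w, Z) ⊢* (q, ε, α) with q ∈ F
            if by == "empty" and stack == "":
                return True            # (q0, w, Z) ⊢* (q, ε, ε)
        queue.extend(step(config))
    return False
```

Because the last move pops Z while entering q2, this particular PDA accepts the same strings under both criteria. Changing that move to ("q2", "Z") would leave L(P) unchanged but make N(P) empty, since Z would then never be popped.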
Shift:
• In the shift step, the parser reads the next input symbol from the input string.
• It then pushes this symbol onto the top of the stack.
• The parser keeps shifting until a handle, the right-hand side of the production that should be
reduced next, appears on top of the stack. (In LR parsers, the parsing table decides whether to
shift or reduce.)
Reduce (Handle Pruning):
• When a handle is on top of the stack, the parser reduces it: it pops the symbols that match the
right-hand side of the production.
• It then pushes the non-terminal on the left-hand side of that production in their place.
• Reductions are repeated as long as a handle remains on top of the stack; the parser then resumes
shifting.
Accept:
• After processing the entire input string and successfully reducing it to the start symbol of the
grammar, the parser accepts the input.
• This means that the input string is syntactically correct according to the grammar rules
defined.
Error:
• If the parser encounters a situation where it cannot shift or reduce and there are still symbols
remaining on the stack, or if it reaches the end of the input string without successfully
reducing it to the start symbol, it indicates a syntax error.
• Common syntax errors include unexpected symbols, missing symbols, or invalid combinations
of symbols according to the grammar rules.
Overall, the shift-reduce parsing process involves a combination of shifting input symbols onto
the stack and reducing symbols on the stack using production rules until the input string is either
accepted as syntactically correct or an error is encountered.
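The shift and reduce steps can be traced with a short Python sketch. It is deliberately naive, an assumption made for illustration: instead of a parsing table, it reduces greedily, trying the longest matching right-hand side first. That rule happens to find the correct handle for the small grammar E → E + T | T, T → id chosen here (not from the notes), but it fails on many grammars; real shift-reduce parsers such as LR parsers make this decision with a parsing table.

```python
# Toy grammar (an assumption for illustration): E -> E + T | T,  T -> id
GRAMMAR = [("E", ["E", "+", "T"]), ("E", ["T"]), ("T", ["id"])]
# Try longer right-hand sides first; sorted() is stable, so ties keep grammar order.
BY_LENGTH = sorted(GRAMMAR, key=lambda prod: -len(prod[1]))

def shift_reduce(tokens, start="E"):
    """Naive shift-reduce parse; returns (accepted?, trace of actions)."""
    stack, buffer, trace = [], list(tokens), []
    while True:
        # Reduce: replace a matching right-hand side on top of the stack by its LHS.
        # (Assumes no ε-productions, since an empty RHS would match everywhere.)
        for lhs, rhs in BY_LENGTH:
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if buffer:                             # Shift the next input symbol.
                stack.append(buffer.pop(0))
                trace.append(f"shift {stack[-1]}")
            elif stack == [start]:                 # Accept: input used up, stack = start.
                trace.append("accept")
                return True, trace
            else:                                  # Error: no shift or reduce possible.
                trace.append("error")
                return False, trace
```

For the input id + id, the trace is: shift id, reduce T -> id, reduce E -> T, shift +, shift id, reduce T -> id, reduce E -> E + T, accept. For id + the parser ends with E + on the stack and reports an error.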
4.6.1.5 LR PARSING
LR parsing is a bottom-up parsing technique used in compiler design to analyze and recognize the
structure of input strings based on a context-free grammar. The name stands for "Left-to-right scan,
Rightmost derivation in reverse". In LR parsing, the input string is scanned from left to right, and a
rightmost derivation of the input string is built in reverse: the parser starts from the input symbols
and works up to the start symbol of the grammar.
At each step, the parser applies shift and reduce operations guided by a parsing table constructed
from the grammar.
LR parsing is classified into different types based on the construction of the parsing table and the
lookahead symbols used during parsing. Here are the main classifications:
• LR(0): In LR(0) parsing, the parser makes decisions based solely on its current state and does not
consider any lookahead symbols. Without lookahead, many grammars, including every ambiguous
grammar and many unambiguous ones, produce shift-reduce or reduce-reduce conflicts.
• SLR (Simple LR): SLR parsing builds its states from the same LR(0) items but uses one symbol of
lookahead to place reduce moves: a reduce by A → α is entered only under the terminals in
FOLLOW(A). This resolves many of the conflicts that arise in LR(0) tables.
• LALR (Look-Ahead LR): LALR parsing starts from LR(1) items and merges states that have
identical cores (the same LR(0) items) but differ only in their lookahead sets. The resulting table
has as many states as an SLR table, far fewer than canonical LR(1), yet it handles more grammars
than SLR.
• CLR(1): CLR(1), which stands for Canonical LR(1), is full LR(1) parsing. It builds the canonical
collection of LR(1) items, in which each item is an LR(0) item paired with a lookahead terminal, to
capture all possible LR(1) configurations of the parsing process. It handles the largest class of
grammars of the four, at the cost of the largest tables.
Each type of LR parsing has its advantages and disadvantages in terms of table size, parsing
efficiency, and the class of grammars it can handle. SLR and LALR parsers are commonly used in
practice because they balance simplicity and power, while canonical LR(1) parsers are used for
grammars that need the more precise LR(1) lookaheads to avoid conflicts.
Why LR Parsing?
LR parsers are favored in compiler design and parsing for several reasons:
• Powerful and General: LR parsers are capable of parsing a wide range of context-free
grammars, including those that cannot be parsed by simpler parsing techniques like LL
parsers. This makes LR parsing suitable for parsing many programming languages, which
often have complex grammatical structures.
• Efficient: LR parsers run in linear time: the time taken to parse an input string is proportional to
the length of the string, because each token is shifted once and the number of reductions grows
linearly with the input. (Table-driven LL(1) parsers are also linear; the main advantage of LR
parsing is the larger class of grammars it accepts.)
• Bottom-Up Parsing: LR parsing is a bottom-up technique: it starts from the input symbols and
builds up to the start symbol of the grammar. It handles left-recursive grammars directly, whereas
top-down parsers require left recursion to be eliminated first, and it detects a syntax error as soon
as the input read so far cannot be extended to a valid sentence.
• Automatic Construction: LR parsers can be automatically generated from a given context-
free grammar using parser generator tools like YACC (Yet Another Compiler Compiler) or
Bison. These tools take a grammar specification as input and produce the corresponding LR
parsing tables and parser code.
• Error Reporting: LR parsers can provide detailed error messages when syntax errors are
encountered during parsing. This is particularly useful for compiler writers and developers,
as it helps them quickly identify and fix issues in their code.
• Widely Used: LR parsing has been extensively studied and is widely used in practice. For example,
PostgreSQL's SQL parser is generated by Bison, and GCC's C and C++ front ends used Bison-generated
parsers before switching to hand-written ones.
Overall, LR parsers offer a powerful, efficient, and widely applicable approach to parsing context-
free grammars, making them a popular choice in compiler design and related fields.
The input buffer contains the string to be parsed, followed by the end marker $, which indicates the
end of the input.
The stack holds a sequence of grammar symbols and parser states, with $ (and the initial state 0) at
the bottom of the stack.
The parsing table is a two-dimensional array with two parts: the ACTION part, indexed by state and
terminal (including $), and the GOTO part, indexed by state and non-terminal.
Note:
• All LR parsers (LR(0), SLR(1), LALR(1), CLR(1)) have the same structure and use the same driver
algorithm; the only difference is the parsing table (ACTION and GOTO).
• To construct LR(0) and SLR(1) tables, we use the canonical collection of LR(0) items.
• To construct LALR(1) and CLR(1) tables, we use the canonical collection of LR(1) items.
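Because all LR parsers share one driver, the driver can be sketched independently of how the table was built. The Python sketch below assumes the grammar S → AA, A → aA | b from the LR(0) item example in these notes, numbered as productions 1 to 3 (with S' → S as production 0), and an SLR(1) table that we derived by hand; the state numbers are our own and may differ from the notes' diagrams. The stack keeps only states, which is all the driver needs. An entry ("s", n) means shift and go to state n, and ("r", k) means reduce by production k.

```python
# Productions: number -> (left-hand side, length of right-hand side)
#   1: S -> A A    2: A -> a A    3: A -> b     (0: S' -> S, the augmented production)
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}

# Hand-built SLR(1) table for this grammar (state numbering is illustrative).
ACTION = {
    (0, "a"): ("s", 3), (0, "b"): ("s", 4),
    (1, "$"): ("acc", None),
    (2, "a"): ("s", 3), (2, "b"): ("s", 4),
    (3, "a"): ("s", 3), (3, "b"): ("s", 4),
    (4, "a"): ("r", 3), (4, "b"): ("r", 3), (4, "$"): ("r", 3),
    (5, "$"): ("r", 1),
    (6, "a"): ("r", 2), (6, "b"): ("r", 2), (6, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 6}

def lr_parse(tokens):
    """Generic LR driver: returns (accepted?, list of moves taken)."""
    buffer = list(tokens) + ["$"]        # input followed by the end marker
    stack = [0]                          # stack of states, initial state 0 at the bottom
    pos, moves = 0, []
    while True:
        entry = ACTION.get((stack[-1], buffer[pos]))
        if entry is None:                # blank ACTION entry: syntax error
            moves.append("error")
            return False, moves
        kind, arg = entry
        if kind == "s":                  # shift: push the new state, advance the input
            stack.append(arg)
            pos += 1
            moves.append(f"shift {arg}")
        elif kind == "r":                # reduce: pop |rhs| states, then consult GOTO
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            moves.append(f"reduce {arg}")
        else:                            # accept
            moves.append("accept")
            return True, moves
```

For the input abb, the moves are: shift 3, shift 4, reduce 3, reduce 2, shift 4, reduce 3, reduce 1, accept. Replacing the table is all it takes to turn this into an LR(0), LALR(1), or CLR(1) parser.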
Augmenting a grammar means adding a new start symbol and a new production rule to the existing
grammar. This is done so that the start symbol never appears on the right-hand side of a production,
which gives the parser a unique point of acceptance: it accepts exactly when it would reduce by the
new production S' → S. It also makes it easier to construct parsers using techniques such as LR parsing.
• Choose a new non-terminal symbol that does not already exist in the original grammar. This symbol
will be the new start symbol for the augmented grammar.
• Create a new production rule that uses the new start symbol to derive the original start symbol of the
grammar.
• This new rule effectively makes the new start symbol the parent of the original start symbol.
For example:
S -> A
A -> aA | ε
After augmenting:
S' -> S
S -> A
A -> aA | ε
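The augmentation steps are mechanical, as this small Python sketch shows. The grammar representation (a dict from each non-terminal to a list of right-hand sides, with [] standing for ε) and the function name augment are our own assumptions. The sketch appends primes until it finds an unused name, and it only checks non-terminal names for clashes.

```python
def augment(grammar, start):
    """Return (new_start, augmented_grammar) with the production new_start -> start added."""
    new_start = start + "'"
    while new_start in grammar:          # the new symbol must not already exist
        new_start += "'"
    augmented = {new_start: [[start]]}   # S' -> S becomes the first production
    augmented.update(grammar)            # keep all original productions unchanged
    return new_start, augmented

# The example grammar from the notes: S -> A, A -> aA | ε
grammar = {"S": [["A"]], "A": [["a", "A"], []]}
new_start, augmented = augment(grammar, "S")
for lhs, alternatives in augmented.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

Running it prints the three productions S' -> S, S -> A, and A -> a A | ε, matching the augmented grammar above.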
An LR(0) item is a production of G with a dot (•) at some position in its right-hand side.
An LR(0) item indicates how much of that production's right-hand side has been seen (scanned) at a
given point in the process of parsing.
For example:
Given grammar:
S → AA
A → aA | b
Augment the grammar and insert the '•' symbol at the first position of every production in G:
S' → •S
S → •AA
A → •aA
A → •b
• If a state goes to some other state on a terminal, the transition corresponds to a shift move.
• If a state goes to some other state on a variable (non-terminal), the transition corresponds to a
GOTO move.
• If a state contains a final item A → α• (dot at the right end), write the reduce move for that
production in every ACTION entry of that state's row; for the final item S' → S•, write accept
under $ instead.
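The closure/goto construction of LR(0) items and the three table-filling rules above can be sketched in Python for the example grammar S → AA, A → aA | b. The encoding is our own assumption: productions are numbered 0 to 3 (0 is the augmented S' → S), an item is a pair (production number, dot position), and states are numbered in the order a breadth-first construction discovers them, which may differ from the notes' diagram. Conflicting entries raise an error instead of being silently overwritten.

```python
from collections import deque

PRODS = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {lhs for lhs, _ in PRODS}
TERMINALS = sorted({s for _, rhs in PRODS for s in rhs} - NONTERMINALS) + ["$"]

def next_symbol(item):
    """Symbol right after the dot, or None for a final item."""
    p, dot = item
    rhs = PRODS[p][1]
    return rhs[dot] if dot < len(rhs) else None

def closure(items):
    """Add B -> •γ for every item whose dot stands before a non-terminal B."""
    result, work = set(items), list(items)
    while work:
        B = next_symbol(work.pop())
        if B in NONTERMINALS:
            for q, (lhs, _) in enumerate(PRODS):
                if lhs == B and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, X):
    """Move the dot over X wherever X comes next, then take the closure."""
    return closure({(p, dot + 1) for p, dot in items if next_symbol((p, dot)) == X})

def canonical_lr0():
    """Canonical collection of LR(0) item sets and the transitions between them."""
    states, transitions = [closure({(0, 0)})], {}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for X in sorted({next_symbol(it) for it in states[i]} - {None}):
            J = goto(states[i], X)
            if J not in states:
                states.append(J)
                queue.append(len(states) - 1)
            transitions[(i, X)] = states.index(J)
    return states, transitions

def put(table, key, value):
    """Store a table entry, refusing to overwrite a different one (a conflict)."""
    if table.get(key, value) != value:
        raise ValueError(f"conflict at {key}: {table[key]} vs {value}")
    table[key] = value

def lr0_table():
    states, transitions = canonical_lr0()
    action, goto_table = {}, {}
    for (i, X), j in transitions.items():
        if X in NONTERMINALS:
            put(goto_table, (i, X), j)          # transition on a variable: GOTO move
        else:
            put(action, (i, X), f"s{j}")        # transition on a terminal: shift move
    for i, state in enumerate(states):
        for p, dot in state:
            if next_symbol((p, dot)) is None:   # final item A -> α•
                if p == 0:
                    put(action, (i, "$"), "acc")
                else:
                    for t in TERMINALS:         # LR(0): reduce in the whole row
                        put(action, (i, t), f"r{p}")
    return states, transitions, action, goto_table
```

For this grammar the construction yields 7 states and no conflicts, so the grammar is LR(0).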
GOTO: a move on a non-terminal (variable).
In SLR(1) parsing, we place the reduce move for a final item A → α• only in the columns of the
terminals in FOLLOW(A), the FOLLOW set of the left-hand side.
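The only change from LR(0) is where reduce moves go, so an SLR(1) sketch mainly needs FOLLOW sets. The Python code below is a sketch for the same example grammar S → AA, A → aA | b, using our own encoding (productions numbered from 0, with 0 being S' → S). It also handles ε-productions, written as empty right-hand sides. It computes FIRST and FOLLOW by fixed-point iteration and lists the lookahead columns that receive each reduce move.

```python
PRODS = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NT = {lhs for lhs, _ in PRODS}

def first_sets():
    """FIRST of every non-terminal, plus the set of nullable non-terminals."""
    first, nullable = {A: set() for A in NT}, set()
    changed = True
    while changed:
        changed = False
        for A, rhs in PRODS:
            for X in rhs:
                new = first[X] if X in NT else {X}
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
                if X not in nullable:          # terminals are never nullable
                    break
            else:                              # every symbol of rhs was nullable
                if A not in nullable:
                    nullable.add(A)
                    changed = True
    return first, nullable

def follow_sets(start="S'"):
    first, nullable = first_sets()
    follow = {A: set() for A in NT}
    follow[start].add("$")                     # $ follows the augmented start symbol
    changed = True
    while changed:
        changed = False
        for A, rhs in PRODS:
            trailer = set(follow[A])           # what may follow the suffix already scanned
            for X in reversed(rhs):
                if X in NT:
                    if not trailer <= follow[X]:
                        follow[X] |= trailer
                        changed = True
                    trailer = (trailer | first[X]) if X in nullable else set(first[X])
                else:
                    trailer = {X}
    return follow

def slr_reduce_columns():
    """Production number -> terminals whose ACTION entry gets the reduce move."""
    follow = follow_sets()
    return {p: sorted(follow[lhs]) for p, (lhs, _) in enumerate(PRODS) if p > 0}
```

For this grammar the reduce by S → AA goes only under $, whereas the LR(0) rule puts it in every column of its row; the reduces for A go under a, b, and $ because FOLLOW(A) = {a, b, $}.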
Step 1: For the given input string, write a context-free grammar.
o The lookahead determines where the reduce move for a final item is placed: only in the column(s)
of that item's lookahead symbol(s).
o The lookahead of the augmented production S' → •S is always $.
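The two lookahead rules can be seen in a small LR(1) closure sketch. It assumes the example grammar S → AA, A → aA | b with our own encoding: an LR(1) item is (production number, dot position, lookahead), and the FIRST sets are written in by hand, which is valid only because this grammar has no ε-productions. For an item [A → α•Bβ, a], closure adds [B → •γ, b] for every b in FIRST(βa).

```python
PRODS = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NT = {"S'", "S", "A"}
# FIRST sets written by hand; first_of() below is only valid because no production derives ε.
FIRST = {"S'": {"a", "b"}, "S": {"a", "b"}, "A": {"a", "b"}}

def first_of(symbols, lookahead):
    """FIRST(β a) for a string β followed by the lookahead terminal a."""
    if not symbols:
        return {lookahead}
    X = symbols[0]
    return FIRST[X] if X in NT else {X}

def closure1(items):
    """LR(1) closure: for [A -> α•Bβ, a], add [B -> •γ, b] for each b in FIRST(βa)."""
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs) and rhs[dot] in NT:
            for b in first_of(rhs[dot + 1:], la):
                for q, (lhs, _) in enumerate(PRODS):
                    if lhs == rhs[dot] and (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto1(items, X):
    """Move the dot over X (lookaheads unchanged), then take the LR(1) closure."""
    return closure1({(p, dot + 1, la) for p, dot, la in items
                     if dot < len(PRODS[p][1]) and PRODS[p][1][dot] == X})

# The augmented item always starts with lookahead $.
I0 = closure1({(0, 0, "$")})
```

Here goto1(I0, "b") contains A → b• with lookaheads a and b, while goto1(goto1(I0, "A"), "b") contains A → b• with lookahead $ only. The same final item therefore reduces on different columns in different states, which is the extra precision that lookaheads give over SLR(1).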
LALR(1)
LALR stands for Look-Ahead LR. To construct the LALR(1) parsing table, we use the canonical
collection of LR(1) items.
In LALR(1) parsing, LR(1) item sets that have the same productions (the same core) but different
lookaheads are combined to form a single item set.
LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table.
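Merging by core can be checked on the example grammar S → AA, A → aA | b. This Python sketch uses our own item encoding, (production number, dot position, lookahead), and hand-written FIRST sets that are valid only because this grammar has no ε-productions. It builds the canonical collection of LR(1) item sets and then merges the sets that share a core. Merging the transitions is omitted: states with the same core always move to states with the same core, so the GOTO entries carry over.

```python
from collections import defaultdict

PRODS = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NT = {"S'", "S", "A"}
FIRST = {"S'": {"a", "b"}, "S": {"a", "b"}, "A": {"a", "b"}}   # valid: no ε-productions

def first_of(symbols, lookahead):
    """FIRST(β a) for this ε-free grammar."""
    if not symbols:
        return {lookahead}
    return FIRST[symbols[0]] if symbols[0] in NT else {symbols[0]}

def closure1(items):
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs) and rhs[dot] in NT:
            for b in first_of(rhs[dot + 1:], la):
                for q, (lhs, _) in enumerate(PRODS):
                    if lhs == rhs[dot] and (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto1(items, X):
    return closure1({(p, d + 1, la) for p, d, la in items
                     if d < len(PRODS[p][1]) and PRODS[p][1][d] == X})

def lr1_collection():
    """Canonical collection of LR(1) item sets, starting from [S' -> •S, $]."""
    states, work = [closure1({(0, 0, "$")})], [0]
    while work:
        I = states[work.pop()]
        for X in {PRODS[p][1][d] for p, d, _ in I if d < len(PRODS[p][1])}:
            J = goto1(I, X)
            if J not in states:
                states.append(J)
                work.append(len(states) - 1)
    return states

def core(state):
    """The LR(0) part of an LR(1) item set (lookaheads dropped)."""
    return frozenset((p, d) for p, d, _ in state)

def merge_lalr(states):
    """Union the item sets that share a core, combining their lookaheads."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= state
    return [frozenset(items) for items in merged.values()]
```

For this grammar the 10 canonical LR(1) states collapse to 7 LALR(1) states, the same number as in the LR(0) collection; for example, the two states holding A → b• merge into one that reduces on a, b, and $.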