LexYacc Final

This document provides an overview of Lex and Yacc, tools for generating lexical analyzers and parsers. It discusses the structure and components of Lex specification (.l) files and Yacc specification (.y) files. The key points covered are:
- Lex generates scanners (lexers) that break input streams into tokens, using regular expressions to define patterns and actions.
- Yacc generates parsers from context-free grammars, defined as production rules for a language.
- Both Lex and Yacc files have definitions, rules, and user code sections that specify the language and its processing.

Uploaded by Vinutha K

Tutorial on Lex & Yacc
Compiler Phases

Front End: a high-level program flows through the Lexical Analyzer (producing tokens), the Syntax Analyzer (producing a syntax tree), the Semantic Analyzer, and the Intermediate Code Generator, producing intermediate code.

Back End: the intermediate code flows through the (machine-independent) Code Optimizer and the (machine-dependent) Code Generator, producing target machine code.
Lex: what is it?
1. Lex: a tool for automatically generating a lexer or scanner given a lex specification (.l file).
2. A lexer or scanner performs lexical analysis: breaking up an input stream into meaningful units, or tokens.
3. For example, consider breaking a text file up into individual words.

x = a + b * 2;

[(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b),
(operator, *), (literal, 2), (separator, ;)]
Skeleton of a lex specification (.l file)
• Input specification for Lex
• Three parts: Definitions, Rules, User subroutines
• Use “%%” as the delimiter between parts

• First part: Definitions (Declarations)
  • Defines the options used by Lex and, generally, the running environment of the lexer.
  • The programming code within “%{” and “%}” is copied verbatim into the lexer.
  • The second part can use these definitions.

• Second part: Rules
  • This part contains patterns and the corresponding action code that is executed when a pattern matches.
  • Patterns are defined by regular expressions.
Skeleton of a lex specification (.l file)

[_a-zA-Z][_a-zA-Z0-9]*   {action}

• Third part: User subroutines
  – The code in this area is copied verbatim into ‘lex.yy.c’, the output of Lex.
  – This part generally holds the sub-programs used by the action code.
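Putting the three sections together, a minimal .l file might look like the following sketch (the NUMBER pattern and the printed message are illustrative, not taken from the slides):

```lex
%{
/* definitions section: this C code is copied verbatim into lex.yy.c */
#include <stdio.h>
%}
DIGIT   [0-9]
%%
{DIGIT}+   { printf("NUMBER: %s\n", yytext); }   /* rules section */
.|\n       { /* ignore everything else */ }
%%
/* user subroutines section: helpers for the action code */
int yywrap(void) { return 1; }
```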
The rules section
%%
[RULES SECTION]

<pattern> { <action to take when matched> }


<pattern> { <action to take when matched> }

%%

Patterns are specified by regular expressions.


For example:
%%
[A-Za-z]* { printf("this is a word"); }
%%
Regular Expression Basics
. : matches any single character except \n
* : matches 0 or more instances of the preceding regular expression
+ : matches 1 or more instances of the preceding regular expression
? : matches 0 or 1 of the preceding regular expression
| : matches the preceding or following regular expression
[ ] : defines a character class
() : groups enclosed regular expression into a new regular expression
"…" : matches everything within the quotes literally
Lex Regular Exp (cont)
x|y      x or y
{i}      the expansion of the named definition i
x/y      x, only if followed by y (y not removed from input)
x{m,n}   m to n occurrences of x
^x       x, but only at beginning of line
x$       x, but only at end of line
"s"      exactly what is in the quotes (except for "\" and the following character)
Meta-characters
– meta-characters do not match themselves, because they serve as the regular-expression operators above:
  • ( ) [ ] { } < > + / , ^ * | . \ " $ ? - %

– to match a meta-character, prefix with "\"

– to match a backslash, tab or newline, use \\, \t, or \n


Regular Expression Examples
• an integer: 12345
[1-9][0-9]*
• a word: cat
[a-zA-Z]+
• a (possibly) signed integer: 12345 or -12345
[-+]?[1-9][0-9]*
• a floating point number: 1.2345
[0-9]*"."[0-9]+
Two Rules
1. Lex will always match the longest possible token (in number of characters).
   If two or more possible tokens are of the same length, then the token whose regular expression is defined first in the lex specification is favoured.

2. Lex patterns only match a given input character or string once.


Regular Expression Examples
• a delimiter for an English sentence

"." | "?" | "!"    or, as a character class:   [.?!]

• C++ comment: // call foo() here!!

"//".*

• white space

[ \t]+

• English sentence: Look at this!

([ \t]+|[a-zA-Z]+)+("."|"?"|"!")
Generation of a lexical analyzer using LEX

Lex specification file x.l → LEX Compiler → lex.yy.c
lex.yy.c → C compiler → a.out (executable program)
Input strings from the source program → a.out → stream of tokens
Lex program example 1
%{
/* program to count blanks, words, lines and characters in the given file */
#include <stdio.h>
int nblk, nword, nchar, nline;
%}
%%
\n        { nline++; nchar++; }
[^ \t\n]+ { nword++; nchar += yyleng; }
" "       { nblk++; nchar++; }
.         { nchar++; }
%%
int yywrap()
{
    return 1;
}
Contd …
int main()
{
    yyin = fopen("in.dat", "r");
    yylex();
    fclose(yyin);
    printf("\n char count is :%d", nchar);
    printf("\n blk count is :%d", nblk);
    printf("\n word count is :%d", nword);
    printf("\n line count is :%d\n", nline);
    return 0;
}
Lex program compilation steps

• 1) lex example.l
• 2) cc lex.yy.c -ll
• 3) ./a.out
When both lex and yacc programs are involved

• 1) lex 1a.l
• 2) yacc -d 1b.y
• 3) cc lex.yy.c y.tab.c -ll
• 4) ./a.out
Example program 2
%{
/* program to count comments in the input file and delete them
   (note: this simple pattern only handles comments contained on one line) */
#include <stdio.h>
#include <stdlib.h>
int comment = 0;
%}

%%

"/*".*"*/" { comment++; }
.          ECHO;

%%
Contd …
int main(int argc, char *argv[])
{
    FILE *fp;
    if (argc > 2)
    {
        fp = fopen(argv[1], "r");
        yyout = fopen(argv[2], "w");
        if (!fp)
        {
            printf("error in opening the file");
            exit(1);
        }
        yyin = fp;
        yylex();
    }
    printf("no of comment lines:%d\n", comment);
    return 0;
}
Special Functions

• yytext
  – where the text matched most recently is stored
• yyleng
  – number of characters in the text most recently matched
• yylval
  – associated value of the current token
• yymore()
  – append the next matched string to the current contents of yytext
• yyless(n)
  – remove from yytext all but the first n characters
• unput(c)
  – return character c to the input stream
• yywrap()
  – when the end of the file is reached, the return value of yywrap() is checked
  – if it is non-zero, scanning terminates; if it is 0, scanning continues with the next input file
Grammar
• A grammar is a collection of rules.
• These rules arrange the tokens in a proper sequence so that the syntax of the language can be defined.
• Ex: to be a sentence, some rule must be followed.
  – Ex: noun verb adjective, noun verb, noun verb noun
• For defining such rules, we need a grammar. Grammar for YACC is described using BNF.
• A BNF grammar can describe context-free languages.
• The general form of writing a grammar in YACC is

      A → bb | c

  Here → means the symbol on the left can be replaced by the productions on the right-hand side, and | indicates “or”.
Parser-lexer communication
• In the process of compilation, the lexical analyzer and parser work together.
• When the parser requires a token, it invokes the lexical analyzer; in turn, the lexical analyzer supplies tokens to the parser on demand.

[Diagram: source program → lexical analyzer ⇄ parser (demand for / supply of tokens) → parse tree → rest of compiler → output code; both phases share the symbol table and an error handler.]
Yacc: what is it?
Yacc: a tool for automatically generating a parser given a grammar written in a yacc specification (.y file).

A grammar specifies a set of production rules, which define a language. A production rule specifies a sequence of symbols (a sentence) that is legal in the language.
Structure of yacc File

Definition section
declarations of tokens
type of values used on parser stack

Rules section
list of grammar rules with semantic routines

User code
Skeleton of a yacc specification (.y file)

Running yacc on x.y generates a *.c file.

%{
< C global variables, prototypes, comments >
%}
     This part is embedded into the generated *.c file.

[DEFINITION SECTION]
     Contains token declarations. Tokens are recognized in the lexer.

%%
[PRODUCTION RULES SECTION]
     Defines how to “understand” the input language, and what actions to take for each “sentence”.
%%

< C auxiliary subroutines >
     Any user code, for example a main function to call the parser function yyparse().
The Production Rules Section
%%
production : symbol1 symbol2 …   { action }
           | symbol3 symbol4 …   { action }
           | …
           ;

production : symbol1 symbol2     { action }
           ;
%%
An example
%%
statement  : expression { printf(" = %g\n", $1); }
expression : expression '+' expression { $$ = $1 + $3; }
           | expression '-' expression { $$ = $1 - $3; }
           | NUMBER { $$ = $1; }
%%

According to these productions, 5 + 4 - 3 + 2 is parsed left-to-right: each NUMBER reduces to an expression, and the binary productions group the operators as ((5 + 4) - 3) + 2, which finally reduces to a statement.
Choosing a Grammar

Grammar 1 (unambiguous, precedence built in):   Grammar 2 (ambiguous):
S -> E                                          S -> E
E -> E + T                                      E -> E + E
E -> E - T                                      E -> E - E
E -> T                                          E -> E * E
T -> T * F                                      E -> E / E
T -> T / F                                      E -> ( E )
T -> F                                          E -> ID
F -> ( E )
F -> ID
Precedence and Associativity

%right '='
%left '-' '+'
%left '*' '/'
%right '^'
Precedence rules are used in two situations:

• In expression grammars
• To resolve the “dangling else” conflict in grammars for if-then-else language constructs.
Defining Values

expr   : expr '+' term   { $$ = $1 + $3; }
       | term            { $$ = $1; }
       ;
term   : term '*' factor { $$ = $1 * $3; }
       | factor          { $$ = $1; }
       ;
factor : '(' expr ')'    { $$ = $2; }
       | ID
       | NUM
       ;

In an action, $1, $2, $3, … refer to the values of the first, second, third, … symbols on the right-hand side of the production, and $$ is the value of the left-hand side. When no action is given, the default is $$ = $1;
Example: Lex scanner.l

%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
wspc [ \t\n]+
semi [;]
comma [,]
%%
int { return INT; }
char { return CHAR; }
float { return FLOAT; }
{comma} { return COMMA; } /* Necessary? */
{semi} { return SEMI; }
{id} { return ID;}
{wspc} {;}
decl.y
Example: Definitions

%{
#include <stdio.h>
#include <stdlib.h>
%}
%start decl
%token CHAR COMMA FLOAT ID INT SEMI
%%
decl.y

Example: Rules

decl : type ID list  { printf("Success!\n"); }
     ;
list : COMMA ID list
     | SEMI
     ;
type : INT | CHAR | FLOAT
     ;

%%
decl.y

Example: Supplementary Code

extern FILE *yyin;

int main()
{
    do {
        yyparse();
    } while (!feof(yyin));
    return 0;
}

int yyerror(char *s)
{
    /* Don't have to do anything! */
    return 0;
}
What yacc cannot parse
• It cannot deal with ambiguous grammars.
• It also cannot deal with grammars that need more than one token of lookahead to tell whether it has matched a rule.
Example
Phrase      -> cart_animal AND CART
             | work_animal AND PLOW
cart_animal -> HORSE | GOAT
work_animal -> HORSE | OX

Input: HORSE AND CART
After reading HORSE, the parser must decide whether to reduce it to cart_animal or work_animal, but with one token of lookahead it only sees AND; the deciding token (CART or PLOW) is two tokens away.

A rewrite yacc can handle, since the token after the animal now decides the rule:
Phrase -> cart_animal CART
        | work_animal PLOW
Conflicts
• A conflict occurs when the parser has
multiple possible actions in some state for a
given next token.
• Two kinds of conflicts:
– shift-reduce conflict:
• The parser can either keep reading more of the input
(“shift action”), or it can mimic a derivation step using
the input it has read already (“reduce action”).
– reduce-reduce conflict:
• There is more than one production that can be used
for mimicking a derivation step at that point.
Example of a conflict
Grammar rules:
S → if ( e ) S             /* 1 */
  | if ( e ) S else S      /* 2 */

Input: if ( e1 ) if ( e2 ) S2 else S3

Parser state when the input token is ‘else’:
– Input already seen: if ( e1 ) if ( e2 ) S2
– Choices for continuing:
  1. Keep reading input (“shift”): ‘else’ becomes part of the innermost if; eventual parse structure: if (e1) { if (e2) S2 else S3 }
  2. Mimic a derivation step using S → if ( e ) S (“reduce”): ‘else’ becomes part of the outermost if; eventual parse structure: if (e1) { if (e2) S2 } else S3

This is a shift-reduce conflict.
Handling Conflicts
General approach, iterating as necessary:
1. Use “yacc -v” to generate the file y.output.
2. Examine y.output to find parser states with conflicts.
3. For each such state, examine the items to figure out why the conflict is occurring.
4. Transform the grammar to eliminate the conflict.

Reason for conflict → possible grammar transformation:
• Ambiguity with operators in expressions → specify associativity and precedence
• Error action → move or eliminate the offending error action
• Semantic action → move the offending semantic action
• Insufficient lookahead → “expand out” the nonterminal involved
• Other → …???…
Reference Books
• lex & yacc, 2nd Edition, by John R. Levine, Tony Mason & Doug Brown, O'Reilly. ISBN: 1-56592-000-7
• Mastering Regular Expressions, by Jeffrey E. F. Friedl, O'Reilly. ISBN: 1-56592-257-3
