4-Intro To Flex and Bison-09!09!2024

Uploaded by

aryanshah1957

Lexical Analysis and Lexical Analyzer Generators

COP5621 Compiler Construction

The Reason Why Lexical Analysis is a Separate Phase
• Simplifies the design of the compiler
– Without a separate lexical analyzer, LL(1) or LR(1) parsing with 1 token lookahead would
not be possible (the parser would have to match multiple characters per token)
• Provides efficient implementation
– Systematic techniques to implement lexical analyzers
by hand or automatically from specifications
– Stream buffering methods to scan input
• Improves portability
– Non-standard symbols and alternate character
encodings can be normalized (e.g. trigraphs)
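
Interaction of the Lexical Analyzer with the Parser

[Figure: The source program is read by the lexical analyzer. The parser asks the lexical analyzer to "get next token" and receives a (token, tokenval) pair in return. Both the lexical analyzer and the parser report errors, and both are connected to the symbol table.]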

Attributes of Tokens

Input to the lexical analyzer:
y := 31 + 28*x

Output passed to the parser, as <token, tokenval> pairs where tokenval is the token attribute:
<id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">

Tokens, Patterns, and Lexemes

• A token is a classification of lexical units
– For example: id and num
• Lexemes are the specific character strings that
make up a token
– For example: abc and 123
• Patterns are rules describing the set of lexemes
belonging to a token
– For example: “letter followed by letters and digits” and
“non-empty sequence of digits”

Specification of Patterns for Tokens: Definitions
• An alphabet Σ is a finite set of symbols (characters)
• A string s is a finite sequence of symbols from Σ
– |s| denotes the length of string s
– ε denotes the empty string, thus |ε| = 0
• A language is a specific set of strings over some fixed alphabet Σ

Specification of Patterns for Tokens: String Operations
• The concatenation of two strings x and y is denoted by xy
• The exponentiation of a string s is defined by

s^0 = ε
s^i = s^{i-1} s for i > 0

note that sε = εs = s

Specification of Patterns for Tokens: Language Operations
• Union
L ∪ M = {s | s ∈ L or s ∈ M}
• Concatenation
LM = {xy | x ∈ L and y ∈ M}
• Exponentiation
L^0 = {ε}; L^i = L^{i-1} L
• Kleene closure
L* = ∪_{i=0,…,∞} L^i
• Positive closure
L+ = ∪_{i=1,…,∞} L^i
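
A small worked example may make these operations concrete. The languages L and M below are chosen purely for illustration and do not appear in the slides.

```latex
\begin{align*}
L &= \{a, b\}, \qquad M = \{c, d\} \\
L \cup M &= \{a, b, c, d\} \\
LM &= \{ac, ad, bc, bd\} \\
L^0 &= \{\varepsilon\} \\
L^2 &= L^1 L = \{aa, ab, ba, bb\} \\
L^* &= \{\varepsilon, a, b, aa, ab, ba, bb, aaa, \dots\} \\
L^+ &= \{a, b, aa, ab, ba, bb, aaa, \dots\}
\end{align*}
```

Because ε is not a member of L here, L+ is exactly L* without ε.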

Specification of Patterns for Tokens: Regular Expressions
• Basis symbols:
– ε is a regular expression denoting language {ε}
– a ∈ Σ is a regular expression denoting {a}
• If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
– r|s is a regular expression denoting L(r) ∪ M(s)
– rs is a regular expression denoting L(r)M(s)
– r* is a regular expression denoting L(r)*
– (r) is a regular expression denoting L(r)
• A language defined by a regular expression is called a regular set
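
As an illustration of these rules, the following worked example lists the languages denoted by a few regular expressions over the alphabet {a, b}. The expressions are chosen for this example and are not taken from the slides.

```latex
\begin{align*}
L(a \mid b) &= \{a, b\} \\
L((a \mid b)(a \mid b)) &= \{aa, ab, ba, bb\} \\
L(a^*) &= \{\varepsilon, a, aa, aaa, \dots\} \\
L((a \mid b)^*) &= \text{all strings over } \{a, b\}, \text{ including } \varepsilon \\
L(a \mid a^* b) &= \{a, b, ab, aab, aaab, \dots\}
\end{align*}
```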

Specification of Patterns for Tokens: Regular Definitions
• Regular definitions introduce a naming convention:
d_1 → r_1
d_2 → r_2
…
d_n → r_n
where each r_i is a regular expression over Σ ∪ {d_1, d_2, …, d_{i-1}}
• Any d_j in r_i can be textually substituted in r_i to obtain an equivalent set of definitions

Specification of Patterns for Tokens: Regular Definitions
• Example:

letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*

• Regular definitions are not recursive:

digits → digit digits | digit     wrong!

Specification of Patterns for Tokens: Notational Shorthand
• The following shorthands are often used:

r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z

• Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+|-)? digit+ )?

Regular Definitions and Grammars

Grammar
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num

Regular definitions
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
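
Coding Regular Definitions in Transition Diagrams

relop → < | <= | <> | > | >= | =
state 0 (start): on < go to 1; on = go to 5; on > go to 6
state 1: on = go to 2; on > go to 3; on other go to 4
state 2: return(relop, LE)
state 3: return(relop, NE)
state 4 *: return(relop, LT)
state 5: return(relop, EQ)
state 6: on = go to 7; on other go to 8
state 7: return(relop, GE)
state 8 *: return(relop, GT)

id → letter ( letter | digit )*
state 9 (start): on letter go to 10
state 10: on letter or digit stay in 10; on other go to 11
state 11 *: return(gettoken(), install_id())

(States marked * are reached on "other" and retract the input by one character.)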

Coding Regular Definitions in Transition Diagrams: Code

token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      if (c==blank || c==tab || c==newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c=='<') state = 1;
      else if (c=='=') state = 5;
      else if (c=='>') state = 6;
      else state = fail();
      break;
    case 1:
      …
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …

/* Decides the next start state to check */
int fail()
{ forward = token_beginning;
  switch (start) {
  case 0: start = 9; break;
  case 9: start = 12; break;
  case 12: start = 20; break;
  case 20: start = 25; break;
  case 25: recover(); break;
  default: /* error */ break;
  }
  return start;
}

The Lex and Flex Scanner Generators
• Lex and its newer cousin flex are scanner
generators
• Systematically translate regular definitions
into C source code for efficient scanning
• Generated code is easy to integrate in C
applications

Creating a Lexical Analyzer with Lex and Flex

lex source program (lex.l) → lex or flex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens

{ definitions }
%%
{ rules }
%%
{ user subroutines }
How to run:
Generate the scanner: lex Programname.l (writes lex.yy.c)
Compile: cc lex.yy.c -lfl (-lfl links the flex library; classic lex uses -ll)
Run: ./a.out

Lex Specification
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }

Program to count the number of vowels and consonants

%{
#include <stdio.h>
int vow_count = 0;
int const_count = 0;
%}
%%
[aeiouAEIOU] { vow_count++; }
[a-zA-Z] { const_count++; /* vowels are already taken by the rule above */ }
%%
int yywrap(void) { return 1; } /* no further input after end of file */
int main(void)
{
    printf("Enter the string of vowels and consonants:");
    yylex();
    printf("Number of vowels are: %d\n", vow_count);
    printf("Number of consonants are: %d\n", const_count);
    return 0;
}

/* DESCRIPTION/DEFINITION SECTION */
%{
#include <stdio.h>
int lc=0, sc=0, tc=0, ch=0, wc=0; /* lines, spaces, tabs, characters, words */
%}

/* RULE SECTION */
%%
\n { lc++; ch+=yyleng; }
" " { sc++; ch+=yyleng; }
\t { tc++; ch+=yyleng; }
[^ \t\n]+ { wc++; ch+=yyleng; }
%%

int yywrap(void) { return 1; }

/* After typing the input, press ctrl+d */

/* MAIN FUNCTION */
int main(void) {
    printf("Enter the Sentence : ");
    yylex();
    printf("Number of lines : %d\n", lc);
    printf("Number of spaces : %d\n", sc);
    printf("Number of tabs, words, charc : %d , %d , %d\n", tc, wc, ch);

    return 0;
}

Number of words

/* lex program to count number of words */
%{
#include <stdio.h>
#include <string.h>
int i = 0;
%}
/* Rules Section */
%%
[a-zA-Z0-9]+ { i++; /* count one word */ }
"\n" { printf("%d\n", i); i = 0; }
%%
int yywrap(void) { return 1; }
int main(void)
{ /* The function that starts the analysis */
    yylex();
    return 0;
}

Mobile Number

/* Lex Program to check valid Mobile Number */
%{
/* Definition section */
#include <stdio.h>
%}
/* Rule Section */
%%
[1-9][0-9]{9} { printf("\nMobile Number Valid\n"); }
.+ { printf("\nMobile Number Invalid\n"); }
%%
int yywrap(void) { return 1; }
/* driver code */
int main(void)
{
    printf("\nEnter Mobile Number : ");
    yylex();
    printf("\n");
    return 0;
}

Regular Expressions in Lex

x          match the character x
\.         match the character .
"string"   match the contents of the string of characters
.          match any character except newline
^          match the beginning of a line
$          match the end of a line
[xyz]      match one character x, y, or z (use \ to escape -)
[^xyz]     match any character except x, y, and z
[a-z]      match one of a to z
r*         closure (match zero or more occurrences)
r+         positive closure (match one or more occurrences)
r?         optional (match zero or one occurrence)
r1r2       match r1 then r2 (concatenation)
r1|r2      match r1 or r2 (union)
(r)        grouping
r1/r2      match r1 when followed by r2
{d}        match the regular expression defined by d

Example Lex Specification 1

%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("%s\n", yytext); }
.|\n { }
%%
int main(void)
{ yylex();
  return 0;
}

The lines between the two %% markers are the translation rules. yytext contains the matching lexeme, and yylex() invokes the lexical analyzer.

lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l

Example Lex Specification 2

%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
delim [ \t]+
%%
\n { ch++; wd++; nl++; }
^{delim} { ch+=yyleng; }
{delim} { ch+=yyleng; wd++; }
. { ch++; }
%%
int main(void)
{ yylex();
  printf("%8d%8d%8d\n", nl, wd, ch);
  return 0;
}

delim is a regular definition. The lines between the two %% markers are the translation rules.

Example Lex Specification 3

%{
#include <stdio.h>
%}
digit [0-9]
letter [A-Za-z]
id {letter}({letter}|{digit})*
%%
{digit}+ { printf("number: %s\n", yytext); }
{id} { printf("ident: %s\n", yytext); }
. { printf("other: %s\n", yytext); }
%%
int main(void)
{ yylex();
  return 0;
}

digit, letter, and id are regular definitions. The lines between the two %% markers are the translation rules.