4-Intro To Flex and Bison-09!09!2024

Lexical Analysis and Lexical Analyzer Generators

COP5621 Compiler Construction


Copyright Robert van Engelen, Florida State University, 2007
2

The Reason Why Lexical Analysis is a Separate Phase
• Simplifies the design of the compiler
– Without a separate lexer, LL(1) or LR(1) parsing with 1 token lookahead would not be possible (multiple characters/tokens to match)
• Provides efficient implementation
– Systematic techniques to implement lexical analyzers by hand or automatically from specifications
– Stream buffering methods to scan input
• Improves portability
– Non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs)
3

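Interaction of the Lexical Analyzer with the Parser
[Figure: the source program feeds the lexical analyzer. Each time the parser issues "get next token", the lexical analyzer returns a (token, tokenval) pair. Both the lexical analyzer and the parser report errors, and both are connected to the symbol table.]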
4

Attributes of Tokens

Input to the lexical analyzer: y := 31 + 28*x

Output passed to the parser:
<id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">

Each pair is <token, tokenval>, where tokenval is the token attribute.
5

Tokens, Patterns, and Lexemes


• A token is a classification of lexical units
– For example: id and num
• Lexemes are the specific character strings that
make up a token
– For example: abc and 123
• Patterns are rules describing the set of lexemes
belonging to a token
– For example: “letter followed by letters and digits” and
“non-empty sequence of digits”
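
The distinction between tokens, lexemes, and patterns can be sketched in C. This is only an illustration under stated assumptions: the names TokenKind and classify_lexeme are hypothetical (they do not appear in the slides), and the two patterns are hard-coded from the examples above ("letter followed by letters and digits" for id, "non-empty sequence of digits" for num).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Tokens: the classifications. TOK_NONE means no pattern matched. */
typedef enum { TOK_NONE, TOK_ID, TOK_NUM } TokenKind;

/* Decide which token a complete lexeme belongs to by checking it
 * against the two patterns from the slide. */
TokenKind classify_lexeme(const char *lexeme)
{
    assert(lexeme != NULL);
    const unsigned char *p = (const unsigned char *)lexeme;

    if (isalpha(*p)) {                 /* pattern for id: letter ... */
        for (p++; isalnum(*p); p++)    /* ... followed by letters and digits */
            ;
        return *p == '\0' ? TOK_ID : TOK_NONE;
    }
    if (isdigit(*p)) {                 /* pattern for num: digit ... */
        for (p++; isdigit(*p); p++)    /* ... followed by more digits */
            ;
        return *p == '\0' ? TOK_NUM : TOK_NONE;
    }
    return TOK_NONE;                   /* empty string or unknown character */
}
```

With this sketch, the lexeme abc is classified as id and the lexeme 123 as num, matching the examples on the slide.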
6

Specification of Patterns for Tokens: Definitions
• An alphabet Σ is a finite set of symbols (characters)
• A string s is a finite sequence of symbols from Σ
– |s| denotes the length of string s
– ε denotes the empty string, thus |ε| = 0
• A language is a specific set of strings over some fixed alphabet Σ
7

Specification of Patterns for Tokens: String Operations
• The concatenation of two strings x and y is denoted by xy
• The exponentiation of a string s is defined by

s^0 = ε
s^i = s^(i-1) s for i > 0

note that sε = εs = s
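
String exponentiation can be sketched directly from the definition. The function name str_power is hypothetical; the sketch assumes ordinary NUL-terminated C strings and represents ε as the empty string "".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding s^i (s concatenated i times).
 * s^0 is the empty string. The caller frees the result.
 * Returns NULL if memory allocation fails. */
char *str_power(const char *s, unsigned i)
{
    assert(s != NULL);
    size_t n = strlen(s);
    char *out = malloc(n * i + 1);
    if (out == NULL)
        return NULL;
    for (unsigned k = 0; k < i; k++)   /* append one copy of s per step */
        memcpy(out + k * n, s, n);
    out[n * i] = '\0';                 /* for i == 0 this yields "" */
    return out;
}
```

For example, str_power("ab", 3) builds the string ababab, and str_power("ab", 0) builds the empty string.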
8

Specification of Patterns for Tokens: Language Operations
• Union
L ∪ M = {s | s ∈ L or s ∈ M}
• Concatenation
LM = {xy | x ∈ L and y ∈ M}
• Exponentiation
L^0 = {ε}; L^i = L^(i-1) L
• Kleene closure
L* = ∪ i=0,…,∞ L^i
• Positive closure
L+ = ∪ i=1,…,∞ L^i
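
For a finite language L, membership in the Kleene closure L* can be tested with a small dynamic program. This is a sketch, not part of the original slides: the function name in_kleene_closure is hypothetical, and L is assumed to be given as an array of C strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if w is in L*, where L = { lang[0], ..., lang[n-1] }.
 * Returns false if memory allocation fails. */
bool in_kleene_closure(const char *w, const char *const lang[], size_t n)
{
    assert(w != NULL && lang != NULL);
    size_t len = strlen(w);

    /* reach[k] is true when the prefix of w of length k is a
     * concatenation of zero or more strings from L. */
    bool *reach = calloc(len + 1, sizeof *reach);
    if (reach == NULL)
        return false;
    reach[0] = true;                   /* the empty string is in L^0 */

    for (size_t k = 0; k < len; k++) {
        if (!reach[k])
            continue;
        for (size_t j = 0; j < n; j++) {
            size_t m = strlen(lang[j]);
            /* extend a reachable prefix by one string from L */
            if (m > 0 && m <= len - k && strncmp(w + k, lang[j], m) == 0)
                reach[k + m] = true;
        }
    }
    bool result = reach[len];
    free(reach);
    return result;
}
```

With L = {ab, c}, the string abcab is in L* (it is ab, c, ab concatenated), while abb is not. The empty string is always in L* because L^0 = {ε}.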
9

Specification of Patterns for Tokens: Regular Expressions
• Basis symbols:
– ε is a regular expression denoting language {ε}
– a ∈ Σ is a regular expression denoting {a}
• If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
– r|s is a regular expression denoting L(r) ∪ M(s)
– rs is a regular expression denoting L(r)M(s)
– r* is a regular expression denoting L(r)*
– (r) is a regular expression denoting L(r)
• A language defined by a regular expression is called a regular set
10

Specification of Patterns for Tokens: Regular Definitions
• Regular definitions introduce a naming convention:
d1 → r1
d2 → r2
…
dn → rn
where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}
• Any dj in ri can be textually substituted in ri to obtain an equivalent set of definitions
11

Specification of Patterns for Tokens: Regular Definitions
• Example:

letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*

• Regular definitions are not recursive:

digits → digit digits | digit    wrong!


12

Specification of Patterns for Tokens: Notational Shorthand
• The following shorthands are often used:

r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z

• Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
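
The num definition can be tried out with the POSIX regular expression library. This is a sketch under assumptions: it relies on a POSIX system providing <regex.h>, the function name is_num is hypothetical, and the pattern is anchored with ^ and $ so that the whole string must be a num lexeme.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if lexeme matches
 *   num -> digit+ (. digit+)? ( E (+|-)? digit+ )?
 * written as a POSIX extended regular expression. */
bool is_num(const char *lexeme)
{
    assert(lexeme != NULL);
    regex_t re;
    const char *pattern = "^[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?$";

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                  /* pattern failed to compile */
    bool ok = (regexec(&re, lexeme, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Under this pattern 31, 3.14, and 6.02E+23 are accepted, while .5 and 1. are rejected because digit+ requires at least one digit on each side of the decimal point.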
13

Regular Definitions and Grammars
Grammar
stmt → if expr then stmt
 | if expr then stmt else stmt
 | ε
expr → term relop term
 | term
term → id
 | num
Regular definitions
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
14

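Coding Regular Definitions in Transition Diagrams
relop → < | <= | <> | > | >= | =
[Transition diagram for relop. A state marked * is reached on "other" and retracts the input by one character.]
state 0 (start): on < go to 1; on = go to 5; on > go to 6
state 1: on = go to 2; on > go to 3; on other go to 4
state 2: return(relop, LE)
state 3: return(relop, NE)
state 4 *: return(relop, LT)
state 5: return(relop, EQ)
state 6: on = go to 7; on other go to 8
state 7: return(relop, GE)
state 8 *: return(relop, GT)

id → letter ( letter | digit )*
[Transition diagram for id]
state 9 (start): on letter go to 10
state 10: on letter or digit stay in 10; on other go to 11
state 11 *: return(gettoken(), install_id())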
15

Coding Regular Definitions in Transition Diagrams: Code

token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      if (c==blank || c==tab || c==newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c=='<') state = 1;
      else if (c=='=') state = 5;
      else if (c=='>') state = 6;
      else state = fail();
      break;
    case 1:
      …
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;
    …

/* Decides the next start state to check */
int fail()
{ forward = token_beginning;
  switch (start) {
  case 0: start = 9; break;
  case 9: start = 12; break;
  case 12: start = 20; break;
  case 20: start = 25; break;
  case 25: recover(); break;
  default: /* error */ break;
  }
  return start;
}
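
The relop transition diagram (states 0 to 8) can be turned into a small self-contained C function. This is a sketch rather than the scanner from the slides: the names RelOp and scan_relop are hypothetical, the input is assumed to be a NUL-terminated string instead of a buffered stream, and the retraction in the starred states is modeled by simply not consuming the "other" character.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    RELOP_NONE, RELOP_LT, RELOP_LE, RELOP_EQ, RELOP_NE, RELOP_GT, RELOP_GE
} RelOp;

/* Runs the relop transition diagram on the start of input.
 * On success, stores the lexeme length in *len and returns the operator.
 * If input does not start with a relop, stores 0 and returns RELOP_NONE. */
RelOp scan_relop(const char *input, size_t *len)
{
    assert(input != NULL && len != NULL);
    int state = 0;
    size_t forward = 0;                /* number of characters consumed */

    for (;;) {
        char c = input[forward];
        switch (state) {
        case 0:
            if (c == '<')      { forward++; state = 1; }
            else if (c == '=') { forward++; state = 5; }
            else if (c == '>') { forward++; state = 6; }
            else { *len = 0; return RELOP_NONE; }
            break;
        case 1:
            if (c == '=')      { forward++; state = 2; }
            else if (c == '>') { forward++; state = 3; }
            else state = 4;            /* "other": leave c unread */
            break;
        case 2: *len = forward; return RELOP_LE;
        case 3: *len = forward; return RELOP_NE;
        case 4: *len = forward; return RELOP_LT;
        case 5: *len = forward; return RELOP_EQ;
        case 6:
            if (c == '=')      { forward++; state = 7; }
            else state = 8;            /* "other": leave c unread */
            break;
        case 7: *len = forward; return RELOP_GE;
        case 8: *len = forward; return RELOP_GT;
        }
    }
}
```

Calling scan_relop on "<=" yields RELOP_LE with length 2, while "<5" yields RELOP_LT with length 1, because the character 5 takes the "other" edge and stays in the input.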

16

The Lex and Flex Scanner Generators
• Lex and its newer cousin flex are scanner
generators
• Systematically translate regular definitions
into C source code for efficient scanning
• Generated code is easy to integrate in C
applications
17

Creating a Lexical Analyzer with Lex and Flex
[Figure: three-step pipeline]
lex source program (lex.l) → lex or flex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens
18

{ definitions }
%%
{ rules }
%%
{ user subroutines }
How to run:
Generate the scanner: lex Programname.l
Compile: cc lex.yy.c -lfl (-lfl is the flex library; classic lex uses -ll)
Run: ./a.out
19

Lex Specification
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
20

Program to count the number of vowels and consonants

%{
#include <stdio.h>
int vow_count = 0;
int const_count = 0;
%}
%%
[aeiouAEIOU] { vow_count++; }
[a-zA-Z] { const_count++; }
%%
int yywrap(void) { return 1; } /* no further input files */
int main(void)
{
printf("Enter the string of vowels and consonants:");
yylex();
printf("Number of vowels are: %d\n", vow_count);
printf("Number of consonants are: %d\n", const_count);
return 0;
}
21

/* DESCRIPTION/DEFINITION SECTION */
%{
#include <stdio.h>
int lc=0, sc=0, tc=0, ch=0, wc=0; /* lines, spaces, tabs, characters, words */
%}

/* RULE SECTION */
%%
\n { lc++; ch+=yyleng; }
" " { sc++; ch+=yyleng; }
\t { tc++; ch+=yyleng; }
[^ \t\n]+ { wc++; ch+=yyleng; }
%%

int yywrap(void){ return 1; }


/* After inputting press ctrl+d */

/* MAIN FUNCTION */
int main(void){
printf("Enter the Sentence : ");
yylex();
printf("Number of lines : %d\n",lc);
printf("Number of spaces : %d\n",sc);
printf("Number of tabs, words, charc : %d , %d , %d\n",tc,wc,ch);

return 0;
}
22

Number of words
/* lex program to count number of words */
%{
#include <stdio.h>
#include <string.h>
int i = 0;
%}
/* Rules Section */
%%
[a-zA-Z0-9]+ { i++; } /* Rule for counting number of words */
"\n" { printf("%d\n", i); i = 0; }
%%
int yywrap(void) { return 1; }
int main(void)
{ /* The function that starts the analysis */
yylex();
return 0;
}
23

Mobile Number
/* Lex Program to check valid Mobile Number */
%{
/* Definition section */
#include <stdio.h>
%}
/* Rule Section */
%%
[1-9][0-9]{9} { printf("\nMobile Number Valid\n"); }
.+ { printf("\nMobile Number Invalid\n"); }
%%
/* driver code */
int yywrap(void) { return 1; } /* no further input files */
int main(void)
{ printf("\nEnter Mobile Number : ");
yylex();
printf("\n");
return 0;
}
24

Regular Expressions in Lex


x          match the character x
\.         match the character .
"string"   match contents of string of characters
.          match any character except newline
^          match beginning of a line
$          match the end of a line
[xyz]      match one character x, y, or z (use \ to escape -)
[^xyz]     match any character except x, y, and z
[a-z]      match one of a to z
r*         closure (match zero or more occurrences)
r+         positive closure (match one or more occurrences)
r?         optional (match zero or one occurrence)
r1 r2      match r1 then r2 (concatenation)
r1|r2      match r1 or r2 (union)
(r)        grouping
r1/r2      match r1 when followed by r2
{d}        match the regular expression defined by d
25

Example Lex Specification 1

The lines between the two %% markers are the translation rules.

%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("%s\n", yytext); /* yytext contains the matching lexeme */ }
.|\n { }
%%
int main(void)
{ yylex(); /* invokes the lexical analyzer */
return 0;
}

lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
26

Example Lex Specification 2

The definitions section holds one regular definition (delim); the lines between the two %% markers are the translation rules.

%{
#include <stdio.h>
int ch = 0, wd = 0, nl = 0;
%}
delim [ \t]+
%%
\n { ch++; wd++; nl++; }
^{delim} { ch+=yyleng; }
{delim} { ch+=yyleng; wd++; }
. { ch++; }
%%
int main(void)
{ yylex();
printf("%8d%8d%8d\n", nl, wd, ch);
return 0;
}
27

Example Lex Specification 3

The definitions section holds the regular definitions (digit, letter, id); the lines between the two %% markers are the translation rules.

%{
#include <stdio.h>
%}
digit [0-9]
letter [A-Za-z]
id {letter}({letter}|{digit})*
%%
{digit}+ { printf("number: %s\n", yytext); }
{id} { printf("ident: %s\n", yytext); }
. { printf("other: %s\n", yytext); }
%%
int main(void)
{ yylex();
return 0;
}