
UNIT I

INTRODUCTION TO COMPILERS

Lexical Analysis
Lex – The Lexical-Analyzer Generator
Introduction
• A recent implementation is Flex
• Lex lets us specify a lexical analyzer by writing
regular expressions to describe the patterns for
tokens
• Tool: the Lex compiler
• Input notation: the Lex language
• The Lex compiler transforms the input patterns into
a transition diagram and generates code, in a file
lex.yy.c, that simulates this transition diagram
Use of Lex
• Creating a lexical analyzer with Lex
Use of Lex
• An input file, lex.l, is written in the Lex language
and describes the lexical analyzer to be generated
• The Lex compiler transforms lex.l to a C program
always named lex.yy.c
• The latter file is compiled by the C compiler into a
file called a.out
• The C-compiler output is a lexical analyzer that
takes a stream of input characters and produces a
stream of tokens
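• As a rough sketch (assuming a POSIX lex or flex installation; the exact
library flag varies between systems), the two steps above can be run as:

lex lex.l                  # or: flex lex.l      -- produces lex.yy.c
cc lex.yy.c -o a.out -ll   # -ll (or -lfl with flex) supplies a default main() and yywrap()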
Use of Lex
• a.out
– A subroutine of the parser
– Returns an integer, a code for one of the possible token
names
– The attribute value, whether a numeric code, a pointer
to the symbol table, or nothing, is placed in a global
variable yylval
– yylval is shared between the lexical analyzer and parser
– Thereby returns both the name and an attribute value
of a token
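• A minimal sketch of how a parser might call this subroutine (the loop and
the use of 0 as an end-of-input code are illustrative, not from the slides):

extern int yylex(void);   /* generated by Lex in lex.yy.c */
extern int yylval;        /* shared attribute value */

void parse(void) {
    int token;
    while ((token = yylex()) != 0) {   /* yylex() conventionally returns 0 at end of input */
        /* use token (the token name) and yylval (its attribute value) here */
    }
}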
Structure of Lex Programs
• A Lex program has the following form:

declarations
%%
translation rules
%%
auxiliary functions

• The declarations section includes declarations of
– variables
– manifest constants
(identifiers declared to stand for a constant, e.g., the name of a token)
– regular definitions
Structure of Lex Programs
• The translation rules have the form
Pattern { Action }
– Each pattern is a regular expression
– The actions are fragments of code written in C
• The third section holds additional functions
used in the actions
– These functions can be compiled separately and
loaded with the lexical analyzer
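• For illustration only (not part of the original slides), a complete Lex
program with all three sections, which counts the lines and characters in
its input, might look like this; it would be built with the commands shown
earlier, linking the Lex library:

%{
#include <stdio.h>
int lines = 0, chars = 0;          /* declarations, copied verbatim into lex.yy.c */
%}
%%
\n      { lines++; chars++; }
.       { chars++; }
%%
int main(void) {                   /* auxiliary functions used by the rules above */
    yylex();
    printf("%d lines, %d chars\n", lines, chars);
    return 0;
}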
Structure of Lex Programs
• The lexical analyzer created by Lex behaves in concert with
the parser as follows
– When called by the parser, the lexical analyzer begins reading its
remaining input, one character at a time, until it finds the longest
prefix of the input that matches one of the patterns Pi
– It then executes the associated action Ai
– Typically, Ai returns control to the parser
• If it does not (e.g., when the lexeme is whitespace or a comment), the
lexical analyzer proceeds to find additional lexemes, until one of the
corresponding actions causes a return to the parser
• The lexical analyzer returns a single value, the token name,
to the parser
• It uses the shared, integer variable yylval to pass additional
information about the lexeme when needed
Example
• Tokens, patterns, and attribute values
Example
%{
/* definitions of manifest constants
LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
Example
{ws}     {/* no action and no return */}
if       {return (IF);}
then     {return (THEN);}
else     {return (ELSE);}
{id}     {yylval = (int) installID(); return (ID);}
{number} {yylval = (int) installNum(); return (NUMBER);}
"<"      {yylval = LT; return (RELOP);}
"<="     {yylval = LE; return (RELOP);}
"="      {yylval = EQ; return (RELOP);}
"<>"     {yylval = NE; return (RELOP);}
">"      {yylval = GT; return (RELOP);}
">="     {yylval = GE; return (RELOP);}
%%
Example
int installID() { /* function to install the lexeme, whose
                     first character is pointed to by yytext,
                     and whose length is yyleng, into the
                     symbol table and return a pointer
                     thereto */
}

int installNum() { /* similar to installID, but puts numerical
                      constants into a separate table */
}
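• The slides leave these bodies as comments. One possible sketch (assuming a
simple append-only table; the names symtable, numtable, and MAXENTRIES are
illustrative, not from the slides, and yytext is declared with the flex
convention):

#include <string.h>

extern char *yytext;                 /* first character of the current lexeme */
extern int   yyleng;                 /* length of the current lexeme */

#define MAXENTRIES 1024
static char *symtable[MAXENTRIES];   /* hypothetical identifier table */
static int   nsyms = 0;
static char *numtable[MAXENTRIES];   /* hypothetical table for numeric constants */
static int   nnums = 0;

int installID() {
    /* a real implementation would first search for an existing entry */
    symtable[nsyms] = strndup(yytext, yyleng);
    return nsyms++;                  /* the table index stands in for the pointer */
}

int installNum() {
    numtable[nnums] = strndup(yytext, yyleng);
    return nnums++;
}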
Example
• The Lex program recognizes the tokens given
and returns the token found
• Declarations section
– A pair of special brackets, %{ and %}
• Anything within these brackets is copied directly to the
file lex.yy.c, and is not treated as a regular definition
• The definitions of the manifest constants are placed
here using C #define statements
– These associate a unique integer code with each of the manifest
constants
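• For example, the #define statements between %{ and %} might read as follows
(the specific codes are illustrative; any distinct integers outside the
single-character range would do):

#define LT     256
#define LE     257
#define EQ     258
#define NE     259
#define GT     260
#define GE     261
#define IF     262
#define THEN   263
#define ELSE   264
#define ID     265
#define NUMBER 266
#define RELOP  267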
Example
– A sequence of regular definitions
• These use the extended notation for regular
expressions
• Regular definitions that are used in later definitions or
in the patterns of the translation rules are surrounded
by curly braces
• For instance, delim
– defined to be a shorthand for the character class consisting of
the blank, the tab, and the newline
– ws is defined to be one or more delimiters, by the regular
expression {delim}+
Example
• Patterns and rules in the middle section
• Auxiliary-function section
– Everything in the auxiliary section is copied directly to file lex.yy.c,
but may be used in the actions
– Function installID ()
• Called to place the lexeme found in the symbol table
• Returns a pointer to the symbol-table entry, which is placed in the global
variable yylval, where it can be used by the parser or a later component of
the compiler
• yytext is a pointer to the beginning of the lexeme
• yyleng is the length of the lexeme found
• The token name ID is returned to the parser
– Function installNum ()
• A similar action is taken when a lexeme matching the pattern number is
found: the constant is installed in a separate table and the token name
NUMBER is returned to the parser
Conflict Resolution in Lex
• The two rules that Lex uses to decide on the
proper lexeme to select, when several prefixes
of the input match one or more patterns:
– Always prefer a longer prefix to a shorter prefix
– If the longest possible prefix matches two or more
patterns, prefer the pattern listed first in the Lex
program
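• To see both rules at work with the example rules above (a sketch reusing
the patterns already defined):

if       {return (IF);}
{id}     {yylval = (int) installID(); return (ID);}
"<"      {yylval = LT; return (RELOP);}
"<="     {yylval = LE; return (RELOP);}

• On input <=, the longest-prefix rule selects <= (RELOP with attribute LE)
rather than stopping at <. On input if, the longest prefix matches both the
pattern if and the pattern {id}; the rule for if wins because it is listed
first, so the keyword is recognized rather than an identifier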
The Lookahead Operator
• Lex automatically reads one character ahead of the last
character that forms the selected lexeme
• It then retracts the input so that only the lexeme itself is consumed
• Sometimes we want a certain pattern to be matched to the input only
when it is followed by certain other characters
• If so, we may use the slash / in a pattern to indicate the end of the part
of the pattern that matches the lexeme
• What follows the / is additional pattern that must be matched
before we can decide that the token in question was seen
• This second pattern is not part of the lexeme
The Lookahead Operator
• In some languages (e.g., Fortran), keywords are not reserved

IF(I,J) = 3                  here IF is the name of an array
IF( condition ) THEN ...     here IF is a keyword

• A Lex rule that uses the lookahead operator to recognize the keyword IF:

IF / \( .* \) {letter}
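• In an actual Lex file, blanks inside a pattern end the pattern, so the rule
is written without spaces. A sketch of how it might appear among the
translation rules, reusing the earlier definitions and token codes (an
assumption, not from the slides):

IF/\(.*\){letter}    {return (IF);}

• Only IF becomes the lexeme; the part after / is checked but left in the
input for later rules to match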
