Lex
Lex
INTRODUCTION TO COMPILERS
Lexical Analysis
Lex – The Lexical-Analyzer Generator
Introduction
• Recent implementation – Flex
• To specify a lexical analyzer by specifying
regular expressions to describe patterns for
tokens
• Tool: Lex compiler
• Input notation: Lex language
• Lex compiler transforms the input patterns into
a transition diagram and generates code,
lex.yy.c, simulates the transition diagram
Use of Lex
• Creating a lexical analyzer with Lex
Use of Lex
• An input file, lex.l, is written in the Lex language
and describes the lexical analyzer to be generated
• The Lex compiler transforms lex.l to a C program
always named lex.yy.c
• The latter file is compiled by the C compiler into a
file called a.out
• The C-compiler output is a lexical analyzer that
take a stream of input characters and produce a
stream of tokens
Use of Lex
• a.out
– A subroutine of the parser
– Returns an integer, a code for one of the possible token
names
– The attribute value, whether a numeric code, a pointer
to the symbol table, or nothing, is placed in a global
variable yylval
– yylval is shared between the lexical analyzer and parser
– Thereby returns both the name and an attribute value
of a token
Structure of Lex Programs
• A Lex program has the following form:
declarations
%%
translation rules
%%
auxiliary functions
/* regular definitions */
delim [ \t\n]
ws {del im}+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit }+)?(E[+-]?{digit}+)?
%%
Example
{ws} {/* no action and no return */}
if {return (IF);}
then {return (THEN);}
else {return (ELSE) ;}
{id} {yylval = (int) installID (); return (ID);}
{number} {yylval = (int) installNum (); return (NUMBER);}
"<“ {yylval = LT; return (RELOP);}
" <=“ {yylval = LE; return (RELOP);}
"=“ {yylval = EQ; return (RELOP);}
" <>“ {yylval = NE; return (RELOP);}
">“ {yylval = GT; return (RELOP);}
" >=“ {yylval = GE; return (RELOP);}
%%
Example
int installID () {/* function to install the lexeme, whose
first character is pointed to by yytext,
and whose length is yyleng, into the
symbol table and return a pointer
thereto */
}