LexYacc Final
On Lex & Yacc
Compiler Phases
• Front End: translates the high-level source program into a syntax tree, then into machine-independent intermediate code.
• Back End: translates the intermediate code into machine-dependent target machine code.
Lex: what is it?
1. Lex: a tool for automatically generating a lexer or scanner
given a lex specification (.l file)
2. A lexer or scanner is used to perform lexical analysis, or the
breaking up of an input stream into meaningful units, or
tokens.
3. For example, consider breaking a text file up into individual
words.
x = a + b * 2;
[(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b),
(operator, *), (literal, 2), (separator, ;)]
Skeleton of a lex specification (.l file)
• Input specification for Lex
• Three parts: Definitions, Rules, User subroutines
• Use "%%" as the delimiter between parts
• Example patterns:
  – identifier: [_a-zA-Z][_a-zA-Z0-9]*   {action}
  – single-line comment: "//".*
  – white space: [ \t]+
  – English sentence (Look at this!): ([ \t]+|[a-zA-Z]+)+("."|"?"|"!")
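Putting these patterns into the three-part layout, a minimal .l file might look like the following sketch (the action bodies are illustrative, not from the slides):

```lex
%{
/* Definitions section: C code copied verbatim into lex.yy.c */
#include <stdio.h>
int ids = 0;
%}
%%
 /* Rules section: pattern {action} */
"//".*                  { /* discard single-line comment */ }
[_a-zA-Z][_a-zA-Z0-9]*  { ids++; printf("id: %s\n", yytext); }
[ \t\n]+                { /* skip white space */ }
.                       { /* ignore anything else */ }
%%
 /* User subroutines section */
int yywrap() { return 1; }
```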
Generation of a lexical analyzer using Lex
• Lex compiler: lex specification file x.l → lex.yy.c
• C compiler: lex.yy.c → executable program a.out
• a.out: input strings from the source program → stream of tokens
Lex program example 1
%{
/* program to count words, lines and characters in the given file */
#include<stdio.h>
int nblk,nword,nchar,nline;
%}
%%
\n        { nline++; nchar++; }
[^ \t\n]+ { nword++; nchar += yyleng; }
" "       { nblk++; nchar++; }
.         { nchar++; }
%%
int yywrap()
{
return 1;
}
Contd.
int main()
{
yyin = fopen("in.dat","r");
yylex();
fclose(yyin);
printf("\n char count is :%d",nchar);
printf("\n blk count is :%d",nblk);
printf("\n word count is :%d",nword);
printf("\n line count is :%d",nline);
}
Lex program compilation steps
• 1) lex example.l
• 2) cc lex.yy.c -ll
• 3) ./a.out
When both lex and yacc programs are involved:
• 1) lex 1a.l
• 2) yacc -d 1b.y
• 3) cc lex.yy.c y.tab.c -ll
• 4) ./a.out
Example program 2
%{
/* program to count comments in the input file and delete them */
#include <stdio.h>
#include <stdlib.h> /* for exit() */
int comment=0;
%}
%%
"/*".*"*/" {comment++;}
. ECHO;
%%
Contd.
int main(int argc, char *argv[])
{
    FILE *fp;
    if(argc > 2)   /* argv[1] = input file, argv[2] = output file */
    {
        fp = fopen(argv[1], "r");
        yyout = fopen(argv[2], "w");
        if(!fp)
        {
            printf("error in opening the file");
            exit(1);
        }
        yyin = fp;
        yylex();
    }
    printf("no of comment lines: %d", comment);
    return 0;
}
Special Functions
• yytext
  – where text matched most recently is stored
• yyleng
– number of characters in text most recently matched
• yylval
– associated value of current token
• yymore()
– append next string matched to current contents of yytext
• yyless(n)
– remove from yytext all but the first n characters
• unput(c)
– return character c to input stream
• yywrap()
– When the end of the file is reached the return value of
yywrap() is checked.
– If it is non-zero, scanning terminates and if it is 0 scanning
continues with next input file.
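As a small sketch of yytext and yyleng in use (the rule and message are illustrative, not from the slides):

```lex
%%
[a-zA-Z]+  { printf("word '%s' has %d letters\n", yytext, yyleng); }
.|\n       { /* skip everything else */ }
%%
```

Here yytext holds the matched word and yyleng its length, exactly as defined above.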
Grammar
• Grammar is nothing but the collection of rules.
• These rules are required to arrange the tokens in some proper sequence so that the syntax of the language can be defined.
• Ex: for being a sentence, there should be some rule to be followed:
  sentence → noun verb adjective | noun verb | noun verb noun
Here → means a particular token can be replaced by the productions present on the RHS, and | indicates "or".
Parser-lexer communication
• In the process of compilation, the lexical analyzer and parser work together.
• When the parser requires a string of tokens, it invokes the lexical analyzer; in turn, the lexical analyzer supplies tokens to the parser.
[Diagram: source program → lexical analyzer ⇄ parser (demand for / supply of tokens) → parse tree → rest of compiler → output; both phases consult the symbol table and the error handler.]
Yacc: what is it?
Yacc: a tool for automatically generating a parser given
a grammar written in a yacc specification (.y file)
• Definition section: declarations of tokens, type of values used on the parser stack
• Rules section: list of grammar rules with semantic routines
• User code
Skeleton of a yacc specification (.y file)
• *.c is generated after running yacc on x.y
%{
< C global variables, prototypes, comments >  /* this part will be embedded into the generated *.c */
%}
• Example input (number tokens joined by operators): 5 + 4 - 3 + 2
Choosing a Grammar
• Unambiguous grammar (precedence built into the rules):
  S -> E
  E -> E + T | E - T | T
  T -> T * F | T / F | F
  F -> ( E ) | ID
• Ambiguous grammar (needs precedence declarations):
  S -> E
  E -> E + E | E - E | E * E | E / E | ( E ) | ID
Precedence and Associativity
%right '='
%left '-' '+'
%left '*' '/'
%right '^'
Precedence Rules are used
in two situations
• In expression grammars
• To resolve the “dangling else”
conflict in grammars for if-then-else
language constructs.
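Combining the ambiguous expression grammar with the precedence declarations above, a yacc specification might be sketched as follows (token name NUMBER and the actions are illustrative assumptions):

```yacc
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%left '-' '+'
%left '*' '/'
%%
line : expr            { printf("= %d\n", $1); }
     ;
expr : expr '+' expr   { $$ = $1 + $3; }
     | expr '-' expr   { $$ = $1 - $3; }
     | expr '*' expr   { $$ = $1 * $3; }
     | expr '/' expr   { $$ = $1 / $3; }
     | '(' expr ')'    { $$ = $2; }
     | NUMBER          { $$ = $1; }
     ;
%%
```

The %left/%right lines resolve every shift-reduce conflict the ambiguous rules would otherwise cause, so yacc accepts this grammar without complaint.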
Defining Values
%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
wspc [ \t\n]+
semi [;]
comma [,]
%%
int { return INT; }
char { return CHAR; }
float { return FLOAT; }
{comma} { return COMMA; } /* Necessary? */
{semi} { return SEMI; }
{id} { return ID;}
{wspc} {;}
decl.l
Example: Definitions
%{
#include <stdio.h>
#include <stdlib.h>
%}
%start line
%token CHAR COMMA FLOAT ID INT SEMI
%%
decl.y
Example: Rules
%%
decl.y
extern FILE *yyin; /* defined in lex.yy.c */
int main()
{
    do {
        yyparse();
    } while(!feof(yyin));
    return 0;
}

void yyerror(char *s)
{
    /* Don't have to do anything! */
}
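The rules section of decl.y is not shown on the slide; for declarations such as `int a, b;` it might look like the following sketch, using the tokens declared above and the %start symbol line (the nonterminal names decl, type and id_list are illustrative):

```yacc
%%
line : /* empty */
     | line decl
     ;
decl : type id_list SEMI
     ;
type : INT
     | CHAR
     | FLOAT
     ;
id_list : ID
        | id_list COMMA ID
        ;
%%
```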
What yacc cannot Parse
• It cannot deal with ambiguous
grammars.
• It also cannot deal with grammars
that need more than one token of
lookahead to tell whether it has
matched a rule.
Example
Phrase -> cart_animal AND CART
        | work_animal AND PLOW
cart_animal -> HORSE | GOAT
work_animal -> HORSE | OX
On input HORSE AND CART, after reading HORSE with only one token (AND) of lookahead, yacc cannot tell whether to reduce HORSE to cart_animal or to work_animal: a reduce-reduce conflict.
Handling Conflicts
General approach:
Iterate as necessary:
1. Use “yacc -v” to generate the file y.output.
2. Examine y.output to find parser states with conflicts.
3. For each such state, examine the items to figure out why the conflict is occurring.
4. Transform the grammar to eliminate the conflict.