Lexical Analyzer

Uploaded by

Maheen Munir

COMPILER CONSTRUCTION
Lexical Analysis
• Lexical analysis (LA) is the first phase of the compiler.
• It is the only phase that interacts directly with the source program.
• The basic responsibility of LA is to read the source program and convert it into a stream of tokens.
The Role of the Lexical Analyzer
• As the first phase of a compiler, the main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens, one for each lexeme in the source program.
• The stream of tokens is sent to the parser for syntax analysis. It is common for the lexical analyzer to interact with the symbol table as well.
• When the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that lexeme into the symbol table.
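The three steps above (read characters, group them into lexemes, emit one token per lexeme) can be sketched as a small hand-written scanner. This is only an illustration under assumptions: the function name `tokenize`, the use of a dictionary as the symbol table, and the tiny set of token kinds are not from the slides.

```python
def tokenize(source, symbol_table):
    """Group characters into lexemes and return one token per lexeme."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # separators produce no token
            i += 1
        elif ch.isalpha() or ch == "_":       # identifier: letter (letter | digit)*
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            lexeme = source[start:i]
            # Enter the lexeme on first sight; the attribute is its table index.
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append(("id", index))
        elif ch.isdigit():                    # unsigned integer literal
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", int(source[start:i])))
        else:                                 # any other character is its own token
            tokens.append((ch, None))
            i += 1
    return tokens

symbols = {}                                  # hypothetical symbol table: lexeme -> index
tokens = tokenize("count = count + 1", symbols)
```

Both occurrences of `count` map to the same symbol-table entry, which is the point of entering identifiers into the table during scanning.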
• Because the lexical analyzer is the part of the compiler that reads the source text, it may perform certain other tasks besides identifying lexemes.
• One such task is stripping out comments and whitespace (blank, newline, tab, and perhaps other characters that are used to separate tokens in the input).
• It may also keep count of the lines in the source program.
• Another task is correlating error messages generated by the compiler with the source program, for instance by attaching the current line number to each message.
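A rough sketch of these side tasks, assuming C-style `/* ... */` comments: the hypothetical function `scan_chunks` drops whitespace and comments, counts newlines as it goes, and uses the running line number in its error message.

```python
def scan_chunks(source):
    """Return (line_number, text) for each run of significant characters."""
    chunks = []
    line = 1
    i = 0
    n = len(source)
    while i < n:
        if source.startswith("/*", i):            # comment: skip it entirely
            end = source.find("*/", i + 2)
            if end == -1:
                raise SyntaxError(f"line {line}: unterminated comment")
            line += source.count("\n", i, end)    # comments may span several lines
            i = end + 2
        elif source[i] == "\n":                   # count lines for error messages
            line += 1
            i += 1
        elif source[i].isspace():                 # blanks, tabs, and similar
            i += 1
        else:
            start = i
            while i < n and not source[i].isspace() and not source.startswith("/*", i):
                i += 1
            chunks.append((line, source[start:i]))
    return chunks

program = "x = 1; /* set x\n   to one */\n\ty = @;\n"
chunks = scan_chunks(program)
```

Here the chunks are not yet tokens; the sketch only shows that the line number recorded next to `@;` is 3, so a later error about `@` can be reported against the right source line.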
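Separation of Lexical Analysis from Syntax Analysis
• Simplicity of design is perhaps the most important consideration. Separating lexical analysis from syntax analysis often allows us to simplify at least one of these phases. For example, a parser that had to treat comments and whitespace as syntactic units would be considerably more complex.
• Compiler efficiency is improved. A separate lexical analyzer can use specialized techniques, such as input buffering, that serve only the lexical task.
• Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer.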
Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier.
• A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword.
• A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.
Examples of Tokens
• Figure 3.2 gives some typical tokens, their informally described patterns, and some sample lexemes.
• To see how these concepts are used in practice, consider the C statement
printf("Total = %d\n", score);
• Both printf and score are lexemes matching the pattern for token id, and "Total = %d\n" is a lexeme matching literal.
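To make the token / pattern / lexeme distinction concrete, the sketch below writes three informal patterns as regular expressions and runs them over the printf statement. The pattern set, the token name `punct`, and the helper `lexemes` are assumptions for illustration, not the table from Figure 3.2.

```python
import re

# Token name -> pattern, written as a regular expression (assumed, simplified set).
PATTERNS = [
    ("literal", r'"[^"\n]*"'),                 # anything but " surrounded by "s
    ("id",      r"[A-Za-z_][A-Za-z0-9_]*"),    # letter followed by letters and digits
    ("punct",   r"[(),;]"),                    # each punctuation mark stands for itself
]
SCANNER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS))

def lexemes(source):
    """Return (token_name, lexeme) pairs, skipping whitespace between lexemes."""
    pairs, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = SCANNER.match(source, pos)
        if m is None:
            raise SyntaxError(f"no pattern matches at position {pos}")
        pairs.append((m.lastgroup, m.group()))   # lastgroup names the pattern that matched
        pos = m.end()
    return pairs

statement = 'printf("Total = %d\\n", score);'    # the two characters \ and n, as in C source
result = lexemes(statement)
```

In the result, `printf` and `score` are two different lexemes of the same token `id`, while the quoted string is a single lexeme of the token `literal`.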
Attributes for Tokens / Examples of Tokens
• The token names and associated attribute values for the Fortran statement
E = M * C ** 2
are written below as a sequence of pairs.
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
• Note that in certain pairs, especially operators, punctuation, and keywords, there is no need for an attribute value.
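The pairs for this statement can be reproduced with a short sketch. Assumptions: the `Token` type, the function `fortran_tokens`, and the underscore spelling of the operator token names are illustrative, and a dictionary key stands in for the "pointer to symbol-table entry".

```python
from typing import NamedTuple, Optional, Union

class Token(NamedTuple):
    name: str
    attribute: Optional[Union[int, str]] = None    # operators carry no attribute

OPERATORS = {"**": "exp_op", "*": "mult_op", "=": "assign_op"}

def fortran_tokens(statement, symbol_table):
    tokens, i = [], 0
    while i < len(statement):
        ch = statement[i]
        if ch == " ":
            i += 1
        elif statement[i:i + 2] in OPERATORS:      # try the two-character operator first
            tokens.append(Token(OPERATORS[statement[i:i + 2]]))
            i += 2
        elif ch in OPERATORS:
            tokens.append(Token(OPERATORS[ch]))
            i += 1
        elif ch.isalpha():
            start = i
            while i < len(statement) and statement[i].isalnum():
                i += 1
            name = statement[start:i]
            symbol_table.setdefault(name, {"name": name})   # stand-in for a real entry
            tokens.append(Token("id", name))                # the key plays the pointer's role
        elif ch.isdigit():
            start = i
            while i < len(statement) and statement[i].isdigit():
                i += 1
            tokens.append(Token("number", int(statement[start:i])))
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens

symbols = {}
pairs = fortran_tokens("E = M * C ** 2", symbols)
```

Checking for `**` before `*` is what keeps the exponent operator from being read as two multiplications.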
Lexical Error
• It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, suppose the string fi is encountered for the first time in a C program in the context
fi(a==b)
• A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier.
• Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler, probably the parser in this case, handle the error.
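A small sketch of why fi slips through: a scanner typically matches the identifier pattern first and only then checks a keyword table. The keyword subset and the function `classify_word` below are assumptions for illustration.

```python
import re

KEYWORDS = {"if", "else", "while", "return"}      # small assumed subset of C keywords
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify_word(source, pos=0):
    """Match an identifier-shaped lexeme and decide between keyword and id."""
    m = IDENT.match(source, pos)
    if m is None:
        return None
    lexeme = m.group()
    # A keyword is its own token; anything else that fits the pattern is an id.
    return (lexeme, None) if lexeme in KEYWORDS else ("id", lexeme)

misspelled = classify_word("fi(a==b)")
correct = classify_word("if(a==b)")
```

Nothing in this lookup is an error from the scanner's point of view: fi simply is not in the keyword table, so it is reported as an ordinary identifier.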
• However, a situation may arise in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input. Some recovery action is then needed.
Panic Mode Recovery
• The simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.
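Panic mode can be sketched as follows, assuming a toy token set described by one regular expression. The name `tokenize_with_panic_mode` and the choice to record the skipped text as an error entry are illustrative assumptions.

```python
import re

# Assumed toy token set: integers, identifiers, and a few one-character operators.
TOKEN = re.compile(r"(?P<number>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/=();])")

def tokenize_with_panic_mode(source):
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(source, pos)
        if m:
            tokens.append((m.lastgroup, m.group()))
            pos = m.end()
            continue
        # Panic mode: delete characters until some pattern matches again
        # (or whitespace is reached, which the main loop then skips).
        start = pos
        while (pos < len(source) and not source[pos].isspace()
               and not TOKEN.match(source, pos)):
            pos += 1
        errors.append((start, source[start:pos]))
    return tokens, errors

tokens, errors = tokenize_with_panic_mode("x = 3 @# y")
```

The scanner reports the deleted text `@#` once and then carries on with `y`, so one bad stretch of input does not stop the rest of the program from being tokenized.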
Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character with another character.
4. Transpose two adjacent characters.
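The four actions above can be enumerated mechanically. The sketch below lists every string reachable from a bad lexeme by exactly one action and keeps those that are valid lexemes. The function name `single_edit_repairs`, the alphabet, and the set of valid lexemes are hypothetical.

```python
def single_edit_repairs(text, alphabet):
    """Map each string reachable by one repair action to that action's name."""
    repairs = {}
    for i in range(len(text)):                        # 1. delete one character
        repairs.setdefault(text[:i] + text[i + 1:], "delete")
    for i in range(len(text) + 1):                    # 2. insert a missing character
        for c in alphabet:
            repairs.setdefault(text[:i] + c + text[i:], "insert")
    for i in range(len(text)):                        # 3. replace a character
        for c in alphabet:
            if c != text[i]:
                repairs.setdefault(text[:i] + c + text[i + 1:], "replace")
    for i in range(len(text) - 1):                    # 4. transpose adjacent characters
        if text[i] != text[i + 1]:
            swapped = text[:i] + text[i + 1] + text[i] + text[i + 2:]
            repairs.setdefault(swapped, "transpose")
    return repairs

VALID = {"<=", ">=", "==", "!="}                      # assumed set of valid lexemes
candidates = single_edit_repairs("=<", "<>=!")
hits = {s: action for s, action in candidates.items() if s in VALID}
```

For the input `=<` two repairs survive, `==` by replacement and `<=` by transposition, which shows that a single-action repair is not always unique and the scanner needs some rule for choosing among candidates.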
Example: Dividing a Program into Lexemes

float limitedSquare(x) float x; {
/* returns x-squared, but never more than 100 */
return (x <= -10.0 || x >= 10.0) ? 100 : x*x;
}

Which lexemes should get associated lexical values?
