Lexical Analysis of Compiler

Uploaded by

Nikhil Ashok

Compiler Construction

1
Lexical Analysis

• The scanner is the first component of the front end of a
compiler; the parser is the second.

2
Lexical Analysis

• The task of the scanner is to take a program written in some
programming language as a stream of characters and break
it into a stream of tokens.
• This activity is called lexical analysis.
• A token, however, contains more than just the words
extracted from the input.
• The lexical analyzer partitions the input string into substrings,
called words, and classifies them according to their role.

3
Tokens

• A token is a syntactic category in a sentence of a language.
• Consider the English sentence “He wrote the program”.
• The words in the sentence are: “He”, “wrote”, “the” and
“program”.
• The blanks between words have been ignored.
• These words are classified as subject, verb, object, etc.
These are the roles.

4
Tokens

• Similarly, consider a sentence in a programming language like C:
• if(b == 0) a = b
• The words are “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=” and
“b”.
• The roles are keyword, punctuation, variable, comparison
operator, integer constant and assignment operator.
• The pair <role,word> is given the name token.
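
As a rough illustration of the <role,word> idea, the sketch below splits the example statement into such pairs. The names `RoleWord` and `tokenize`, and the role strings themselves, are assumptions made for this example and do not come from the slides; only the constructs in the sample sentence are handled.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// A token as the pair <role, word> (hypothetical type name).
using RoleWord = std::pair<std::string, std::string>;

// Splits a small C-like statement into <role, word> pairs.
// Only the constructs of the sample sentence are recognised.
std::vector<RoleWord> tokenize(const std::string& src)
{
    std::vector<RoleWord> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }            // blanks are ignored
        if (std::isalpha(c) || c == '_') {                  // keyword or variable
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            std::string word = src.substr(start, i - start);
            out.emplace_back(word == "if" ? "keyword" : "variable", word);
        } else if (std::isdigit(c)) {                       // integer constant
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            out.emplace_back("integer constant", src.substr(start, i - start));
        } else if (c == '=' && i + 1 < src.size() && src[i + 1] == '=') {
            out.emplace_back("comparison operator", "==");  // two-character word
            i += 2;
        } else if (c == '=') {
            out.emplace_back("assignment operator", "=");
            ++i;
        } else {                                            // "(", ")" and anything else
            out.emplace_back("punctuation", std::string(1, src[i]));
            ++i;
        }
    }
    return out;
}
```

Calling `tokenize("if(b == 0) a = b")` yields nine pairs, one for each word listed on the slide.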

5
Tokens

• Here are some familiar tokens in programming languages:
• Identifiers: x , y11 , maxsize
• Keywords: if , else , while , for
• Integers: 2 , 1000 , -44 , 5L
• Floats: 2.0 , 0.0034 , 1e5
• Symbols: ( ) , + , * , / , { } , < , > , ==
• Strings: “enter x” , “error”
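
One way to picture these categories is a function that assigns a category to a word that has already been cut out of the input. The sketch below is a simplification under stated assumptions: `Kind` and `classify` are hypothetical names, signed literals such as -44 are not treated as a single word, and any digit-led word containing '.', 'e' or 'E' is taken to be a float.

```cpp
#include <cctype>
#include <set>
#include <string>

// Token categories from the list above (hypothetical enum).
enum class Kind { Identifier, Keyword, Integer, Float, String, Symbol };

// Classifies a word that has already been separated from its neighbours.
Kind classify(const std::string& w)
{
    static const std::set<std::string> keywords = {"if", "else", "while", "for"};
    if (keywords.count(w)) return Kind::Keyword;

    unsigned char first = w.empty() ? 0 : static_cast<unsigned char>(w[0]);
    if (first == '"') return Kind::String;                 // "enter x"
    if (std::isalpha(first) || first == '_') return Kind::Identifier;
    if (std::isdigit(first)) {
        // A decimal point or an exponent marks a float; otherwise an integer.
        return w.find_first_of(".eE") != std::string::npos ? Kind::Float
                                                           : Kind::Integer;
    }
    return Kind::Symbol;                                   // ( ) + * / { } < > ==
}
```

Keywords are checked before identifiers because every keyword also has the shape of an identifier.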

6
Ad-hoc Lexer

• The task of writing a scanner is fairly straightforward.
• We can hand-write code to generate tokens.
• We do this by partitioning the input string by reading
left-to-right, recognizing one token at a time.
• We will need to look ahead in order to decide where
one token ends and the next token begins.

7
Ad-hoc Lexer

• The following C++ code is a template for a Lexer
class. An object of this class can produce the desired tokens
from the input stream.
class Lexer
{
    Inputstream s;   // input stream being scanned
    char next;       // look-ahead character
public:
    Lexer( Inputstream _s )
    {
        s = _s;
        next = s.read();
    }
8
Ad-hoc Lexer

Token nextToken()
{
    // digits are tested first, because idChar() also accepts digits
    if ( isDigit(next) )
        return readNumber();
    if ( idChar(next) )
        return readId();
    if ( next == '"' )
        return readString();
    . . . .
}
9
Ad-hoc Lexer

Token readId()
{
    string id = "";
    while ( idChar(next) )
    {
        id = id + next;     // keep the look-ahead character
        next = s.read();    // then advance the look-ahead
    }
    return Token(TID, id);
}
10
Ad-hoc Lexer

bool idChar(char c)
{
    if ( isAlpha(c) )
        return true;
    if ( isDigit(c) )
        return true;
    if ( c == '_' )
        return true;

    return false;
}
11
Ad-hoc Lexer

Token readNumber()
{
    string num = "";
    while ( isDigit(next) )
    {
        num = num + next;   // keep the look-ahead digit
        next = s.read();    // then advance the look-ahead
    }
    return Token(TNUM, num);
}
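
The fragments on the preceding slides can be put together into something that compiles. The following is a minimal sketch, not the slides' own implementation: it assumes `std::istream` in place of `Inputstream`, and the names `Tok`, `TSTRING`, `TSYM`, `TEOF`, `advance` and `readWhile` are introduced here for illustration only.

```cpp
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

enum TokenType { TID, TNUM, TSTRING, TSYM, TEOF };

struct Tok {
    TokenType type;
    std::string text;
};

class Lexer {
public:
    explicit Lexer(std::istream& in) : s(in) { advance(); }

    Tok nextToken()
    {
        while (next != EOF && std::isspace(next)) advance();   // skip blanks
        if (next == EOF) return {TEOF, ""};
        if (std::isdigit(next)) return readWhile(TNUM, isDigitChar);
        if (idChar(next)) return readWhile(TID, idChar);
        if (next == '"') return readString();
        Tok t{TSYM, std::string(1, static_cast<char>(next))};  // any other single character
        advance();
        return t;
    }

private:
    std::istream& s;
    int next = EOF;   // look-ahead; an int so that EOF can be represented

    void advance() { next = s.get(); }

    static bool isDigitChar(int c) { return std::isdigit(c) != 0; }
    static bool idChar(int c) { return std::isalnum(c) || c == '_'; }

    // Collects characters while accept() holds, starting with the look-ahead.
    Tok readWhile(TokenType type, bool (*accept)(int))
    {
        std::string text;
        while (next != EOF && accept(next)) {
            text += static_cast<char>(next);
            advance();
        }
        return {type, text};
    }

    Tok readString()
    {
        std::string text;
        advance();                                   // skip the opening quote
        while (next != EOF && next != '"') {
            text += static_cast<char>(next);
            advance();
        }
        if (next == '"') advance();                  // skip the closing quote
        return {TSTRING, text};
    }
};
```

With a `std::istringstream` holding `x1 = 42 "hi"`, successive calls to `nextToken()` return an identifier, a symbol, a number, a string and then the end-of-input token. The look-ahead is always appended before it is advanced, so the first character of each token is not lost.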
12
Ad-hoc Lexer

Problems:
• From its first character alone, we cannot tell what kind of
token we are about to read.
• If a token begins with “i”, is it the identifier “i” or the keyword
“if”?
• If a token begins with “=”, is it “=” or “==”?
• We need a more principled approach. The most frequently
used approach is to use a lexer generator, which generates an
efficient tokenizer automatically.
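
For comparison, the two ambiguities above can be resolved by hand with one character of look-ahead and a keyword table, taking the longest word that matches. The sketch below is hand-written code, not the output of a lexer generator; `Cat`, `Lexeme` and `scanOne` are hypothetical names, and the function assumes `pos` is a valid index into `src`.

```cpp
#include <cctype>
#include <set>
#include <string>

enum class Cat { Assign, Equal, Keyword, Identifier, Other };

struct Lexeme {
    Cat kind;
    std::string text;
};

// Reads one token starting at src[pos] and moves pos past it.
// Precondition: pos < src.size().
Lexeme scanOne(const std::string& src, std::size_t& pos)
{
    static const std::set<std::string> keywords = {"if", "else", "while", "for"};
    char c = src[pos];

    if (c == '=') {
        // Peek one character ahead: "==" wins over "=" when both match.
        if (pos + 1 < src.size() && src[pos + 1] == '=') {
            pos += 2;
            return {Cat::Equal, "=="};
        }
        ++pos;
        return {Cat::Assign, "="};
    }

    if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
        // Read the whole word first, then decide: keyword or identifier.
        std::size_t start = pos;
        while (pos < src.size() &&
               (std::isalnum(static_cast<unsigned char>(src[pos])) || src[pos] == '_'))
            ++pos;
        std::string word = src.substr(start, pos - start);
        return {keywords.count(word) ? Cat::Keyword : Cat::Identifier, word};
    }

    ++pos;
    return {Cat::Other, std::string(1, c)};
}
```

Deciding only after the whole word has been read is what lets “i”, “if” and “iffy” come out as identifier, keyword and identifier respectively.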

13
