
Bonga University

College of Engineering and Technology


Department of Computer Science
CoSc4103 – COMPILER DESIGN
Chapter 2 Handouts – Lexical Analysis

Overview of Lexical Analysis

A lexical analyzer, also called a scanner, typically has the following functionality and
characteristics.

• Its primary function is to convert an (often very long) sequence of characters into a
(much shorter, perhaps 10x shorter) sequence of tokens.
• The scanner must identify and categorize specific character sequences into tokens. It
must know whether two adjacent characters in the file belong together in the same
token, or whether the second character starts a different token.
• Most lexical analyzers discard comments and whitespace. In most languages these
characters serve to separate tokens from each other, but once lexical analysis is
completed they serve no purpose.
• It handles lexical errors (illegal characters, malformed tokens) by reporting them
intelligibly to the user.
• Efficiency is crucial; a scanner may perform elaborate input buffering.
• Token categories can be specified precisely and formally using regular expressions, e.g.
IDENTIFIER = [a-zA-Z][a-zA-Z0-9]*
• Lexical analyzers can be written by hand (a minimal hand-written sketch follows below),
or implemented automatically using finite automata.
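
To make the hand-written option concrete, here is a sketch in C of a scanner that recognizes
only identifiers and integers on standard input; the token names and the error handling are
illustrative assumptions, not part of this handout.

#include <ctype.h>
#include <stdio.h>

/* Hypothetical token categories, for this sketch only. */
enum token { TOK_ID, TOK_INT, TOK_EOF };

/* Read one token from stdin and store its lexeme in buf. */
enum token next_token(char *buf)
{
    int c = getchar();
    while (c == ' ' || c == '\t' || c == '\n')   /* discard whitespace */
        c = getchar();
    if (c == EOF)
        return TOK_EOF;
    char *p = buf;
    if (isalpha(c)) {            /* IDENTIFIER = [a-zA-Z][a-zA-Z0-9]* */
        do { *p++ = (char)c; c = getchar(); } while (isalnum(c));
        ungetc(c, stdin);        /* push the lookahead character back */
        *p = '\0';
        return TOK_ID;
    }
    if (isdigit(c)) {            /* INTEGER = [0-9]+ */
        do { *p++ = (char)c; c = getchar(); } while (isdigit(c));
        ungetc(c, stdin);
        *p = '\0';
        return TOK_INT;
    }
    fprintf(stderr, "lexical error: illegal character '%c'\n", c);
    return next_token(buf);      /* report the error and skip the character */
}

A driver that calls next_token() until it returns TOK_EOF turns the character stream into the
token stream that the parser consumes.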

Role of Lexical Analysis

Issues (why separate lexical analysis from parsing?):

• Simpler design

• Compiler efficiency

• Compiler portability (e.g., Linux to Windows)



What’s a Token?

A syntactic category

In English:

noun, verb, adjective, …

In a programming language:

Identifier, Integer, Keyword, Whitespace …

Tokens, Patterns and Lexemes

• A token is a pair consisting of a token name and an optional token value (attribute).

• A pattern is a description of the form that the lexemes of a token may take.

• A lexeme is a sequence of characters in the source program that matches the pattern for a
token.
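
For example (a standard textbook illustration, not from the handout): in count = count + 1, the
character sequence count is a lexeme matching the identifier pattern letter(letter | digit)*, so
the scanner returns the token (id, pointer to the symbol-table entry for count); the lexeme =
matches the assignment-operator pattern and yields a token with no attribute value.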

Input buffering

• Sometimes the lexical analyzer needs to look ahead some symbols to decide which token to
return.

• In C: we need to look at the character after -, = or < to decide what token to return
(e.g., - vs. --, = vs. ==, < vs. <=).

• In Fortran: DO 5 I = 1.25. Since Fortran ignores blanks, the scanner cannot tell whether DO
is a keyword or the start of the identifier DO5I until it reaches the . (an assignment to
DO5I) or a , (a DO loop).

• We need to introduce a two-buffer scheme to handle large look-aheads safely, as sketched
below.
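
Here is a minimal sketch of that two-buffer scheme in C, with a sentinel character marking the
end of each half so the scanner pays only one test per character. The buffer size, the choice of
'\0' as sentinel (assuming the source contains no NUL bytes), and the omission of the
lexeme_begin pointer are simplifications made for this illustration.

#include <stdio.h>

#define BUF_SIZE 4096
#define SENTINEL '\0'   /* assumes the source text contains no NUL bytes */

static char buf[2 * BUF_SIZE + 2];  /* two halves, each followed by a sentinel */
static char *forward;               /* scanning pointer */
static FILE *src;

/* Fill one half of the buffer and terminate it with the sentinel. */
static void load(char *half)
{
    size_t n = fread(half, 1, BUF_SIZE, src);
    half[n] = SENTINEL;
}

static void init(FILE *f) { src = f; load(buf); forward = buf; }

/* Return the next character.  Only a sentinel triggers the boundary
 * test, so the common case costs a single comparison. */
static int next_char(void)
{
    for (;;) {
        char c = *forward;
        if (c != SENTINEL) { forward++; return (unsigned char)c; }
        if (forward == buf + BUF_SIZE) {                /* end of first half */
            load(buf + BUF_SIZE + 1);
            forward = buf + BUF_SIZE + 1;
        } else if (forward == buf + 2 * BUF_SIZE + 1) { /* end of second half */
            load(buf);
            forward = buf;
        } else {
            return EOF;   /* sentinel inside a half: true end of input */
        }
    }
}

A real scanner also keeps a lexeme_begin pointer and must ensure that no lexeme grows longer
than one buffer half.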

Specification of tokens

• In the theory of compilation, regular expressions are used to formalize the specification of
tokens.

• Regular expressions are a means of specifying regular languages.

Example:

letter(letter | digit)*

• Each regular expression is a pattern specifying the form of strings.

Terminology of Languages

• Alphabet: a finite set of symbols (e.g., the ASCII characters)

• String:

A finite sequence of symbols over an alphabet.

"Sentence" and "word" are also used as synonyms for "string".

ε is the empty string.

|s| is the length of string s.




• Language: a set of strings over some fixed alphabet.

∅, the empty set, is a language.

{ε}, the set containing only the empty string, is a language.

The set of well-formed C programs is a language.

The set of all possible identifiers is a language.

Operators on Strings:

• Concatenation: xy represents the concatenation of strings x and y;  sε = εs = s

• sⁿ = s s s … s (n times);  s⁰ = ε
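
For example: if s = ab, then s² = abab, s³ = ababab, s⁰ = ε, and |s| = 2.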
Regular Expressions

• We use regular expressions to describe the tokens of a programming language.

• A regular expression is built up from simpler regular expressions (using defining rules).

• Each regular expression denotes a language.

• A language denoted by a regular expression is called a regular set.

Rules

Regular expressions over alphabet Σ:

Reg. Expr.      Language it denotes
ε               {ε}
a ∈ Σ           {a}
(r1) | (r2)     L(r1) ∪ L(r2)
(r1) (r2)       L(r1) L(r2)
(r)*            (L(r))*
(r)             L(r)

• (r)+ = (r)(r)*

• (r)? = (r) | ε

We may remove parentheses by using precedence rules:

• * highest
• concatenation next
• | lowest

• Thus ab*|c means (a(b)*)|(c)
Example:

Σ = {0,1}

0|1 => {0,1}

(0|1)(0|1) => {00,01,10,11}

0* => {ε, 0, 00, 000, 0000, ...}

(0|1)* => all strings of 0s and 1s, including the empty string
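
These denotations can be checked directly on a machine. The sketch below uses the POSIX
<regex.h> interface (a standard C library facility, not something the handout itself uses) to
test strings against (0|1)*:

#include <regex.h>
#include <stdio.h>

int main(void)
{
    regex_t re;
    /* Anchored extended-RE equivalent of the regular expression (0|1)*. */
    if (regcomp(&re, "^(0|1)*$", REG_EXTENDED | REG_NOSUB) != 0)
        return 1;
    const char *tests[] = { "", "0", "0110", "012" };
    for (int i = 0; i < 4; i++)
        printf("%-6s %s\n", tests[i],
               regexec(&re, tests[i], 0, NULL, 0) == 0 ? "accepted" : "rejected");
    regfree(&re);
    return 0;
}

The first three strings (including the empty string) are accepted; 012 is rejected because 2 is
outside the alphabet.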



Finite Automata

• A recognizer for a language is a program that takes a string x and answers "yes" if x is a
sentence of that language, and "no" otherwise.

• We call the recognizer of the tokens a finite automaton.

• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).

• This means that we may use a deterministic or a non-deterministic automaton as a lexical
analyzer.

• Both deterministic and non-deterministic finite automata recognize regular sets.

Which one?

• deterministic – faster recognizer, but it may take more space

• non-deterministic – slower, but it may take less space

• Deterministic automata are widely used in lexical analyzers.

First, we define regular expressions for tokens; then we convert them into a DFA to get a
lexical analyzer for our tokens.

• Algorithm 1: Regular Expression → NFA → DFA (two steps: first to NFA, then to DFA)

• Algorithm 2: Regular Expression → DFA (directly convert a regular expression into a DFA)

Non-Deterministic Finite Automaton (NFA)

A non-deterministic finite automaton (NFA) is a mathematical model that consists of:

• S – a set of states

• Σ – a set of input symbols (the alphabet)

• move – a transition function mapping state-symbol pairs to sets of states

• s0 – a start (initial) state

• F – a set of accepting (final) states

• ε-transitions are allowed in NFAs. In other words, we can move from one state to another
without consuming any symbol.

An NFA accepts a string x if and only if there is a path from the start state to one of the
accepting states such that the edge labels along this path spell out x.
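
As an illustration (added here; the handout's own diagrams did not survive reproduction): the
classic NFA for (a|b)*abb has states 0–3, transitions 0→0 on a and on b, 0→1 on a, 1→2 on b,
2→3 on b, start state 0 and accepting state 3. It accepts aabb because the path 0→0→1→2→3
spells a a b b; the non-determinism lies in state 0, which has two possible moves on a.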

Deterministic Finite Automaton (DFA)

• A Deterministic Finite Automaton (DFA) is a special form of an NFA:

o no state has an ε-transition

o for each symbol a and state s, there is at most one edge labeled a leaving s

o i.e., the transition function maps a state-symbol pair to a single state (not to a set of
states)
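
Because a DFA has a single move per (state, symbol) pair, it maps directly onto a transition
table. Below is a minimal table-driven sketch in C for the identifier pattern
letter(letter | digit)*; the state numbering and table layout are assumptions made for this
illustration.

#include <ctype.h>
#include <stdio.h>

/* States: START, IN_ID (accepting), DEAD. */
enum { START = 0, IN_ID = 1, DEAD = 2 };

/* Map a character onto an input class: 0 = letter, 1 = digit, 2 = other. */
static int cls(int ch)
{
    if (isalpha(ch)) return 0;
    if (isdigit(ch)) return 1;
    return 2;
}

/* delta[state][class]: exactly one next state per pair -- the DFA property. */
static const int delta[3][3] = {
    /* letter  digit  other */
    {  IN_ID,  DEAD,  DEAD },   /* START */
    {  IN_ID,  IN_ID, DEAD },   /* IN_ID */
    {  DEAD,   DEAD,  DEAD },   /* DEAD  */
};

/* Return 1 if s matches letter(letter|digit)*, else 0. */
static int is_identifier(const char *s)
{
    int state = START;
    for (; *s; s++)
        state = delta[state][cls((unsigned char)*s)];
    return state == IN_ID;      /* accept iff we stop in an accepting state */
}

int main(void)
{
    printf("%d %d %d\n", is_identifier("x9"), is_identifier("9x"), is_identifier(""));
    return 0;                   /* prints: 1 0 0 */
}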

Converting a Regular Expression into an NFA (Thompson's Construction)

• This is one way to convert a regular expression into an NFA.

• There are other (more efficient) ways to do the conversion.

• Thompson's Construction is a simple and systematic method. It guarantees that the resulting
NFA will have exactly one final state and one start state.




Construction starts from the simplest parts (the alphabet symbols). To create an NFA for a
complex regular expression, the NFAs of its sub-expressions are combined.
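
The handout's construction diagrams are not reproduced here, so the construction is described in
words for the example a(b|c)* (an illustration added for this text):

• For each alphabet symbol x, build a two-state fragment: a start state with a single edge
labeled x to a final state.

• For b|c, add a new start state with ε-moves into the b fragment and the c fragment, and
ε-moves from their final states into a new common final state.

• For (b|c)*, wrap that fragment in a new start and final state, with ε-moves that bypass the
fragment (zero repetitions) and an ε-move from its final state back to its start state
(further repetitions).

• For the concatenation a(b|c)*, connect the final state of the a fragment to the start state
of the (b|c)* fragment with an ε-move.

Each step introduces at most two new states, so the NFA for a regular expression of length n has
O(n) states, one start state, and one final state.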

Minimizing the Number of States of a DFA

• Partition the set of states into two groups:

G1: the set of accepting states

G2: the set of non-accepting states

• For each new group G:

partition G into subgroups such that states s1 and s2 are in the same subgroup iff, for
all input symbols a, states s1 and s2 have transitions to states in the same group.

• The start state of the minimized DFA is the group containing the start state of the original
DFA.

• The accepting states of the minimized DFA are the groups containing the accepting states of
the original DFA.
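
A tiny worked example (added for illustration): consider a DFA over {a} with states S0 (start),
S1 and S2, where S1 and S2 are accepting, and moves S0 --a--> S1, S1 --a--> S2, S2 --a--> S2.
The initial partition is G1 = {S1, S2} (accepting) and G2 = {S0} (non-accepting). On input a,
both S1 and S2 move to states in G1, so G1 is not split and the partitioning stops. The
minimized DFA has two states, {S0} and {S1, S2}, and still recognizes aa*.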



Deterministic and Nondeterministic Automata

• Deterministic Finite Automata (DFA)

One transition per input symbol per state

No ε-moves

• Nondeterministic Finite Automata (NFA)

Can have multiple transitions for one input symbol in a given state

Can have ε-moves

• Finite automata have finite memory

They need only encode the current state

NFA vs. DFA



NFAs and DFAs recognize the same set of languages (regular languages)

DFAs are easier to implement

There are no choices to consider

Regular Expressions to Finite Automata

(The diagram from the handout is not reproduced here; it shows the pipeline from a lexical
specification through regular expressions to an NFA, from the NFA to a DFA, and from the DFA to
a table-driven implementation.)


Overview of Lex and Yacc

• Lex (A LEXical Analyzer Generator)

Generates lexical analyzers (scanners or lexers)

• Yacc (Yet Another Compiler-Compiler)

Generates a parser based on an analytic grammar

• Flex is a free scanner generator, an alternative to Lex

• Bison is a free parser generator written for the GNU project, an alternative to Yacc

Lex: what is it?

1. Lex: a tool for automatically generating a lexer or scanner given a lex specification (.l
file)
2. A lexer or scanner is used to perform lexical analysis, or the breaking up of an input
stream into meaningful units, or tokens.
3. For example, consider breaking a text file up into individual words.

Lexical analyzer: scans the input stream and converts sequences of characters into tokens.

Token: a classification of groups of characters.


Examples:

Lexeme        Token
Sum ID
for FOR
= ASSIGN_OP
== EQUAL_OP
57 INTEGER_CONST
* MULT_OP
, COMMA
( LEFT_PAREN

• Lex / Flex is a tool for writing lexical analyzers.

• Lex / Flex reads a specification file containing regular expressions and generates a C
routine that performs lexical analysis: it matches character sequences that identify tokens.



Skeleton of a lex specification (.l file)

Running lex on x.l generates a *.c file.

%{
< C global variables, prototypes, comments >     <-- this part is embedded into *.c
%}

[DEFINITION SECTION]     <-- substitutions, code and start states; copied into *.c

%%

[RULES SECTION]          <-- defines how to scan and what action to take for each token

%%

< C auxiliary subroutines >     <-- any user code, e.g., a main function that calls the
                                    scanning function yylex()



The rules section

%%

<pattern>    { <action to take when matched> }

<pattern>    { <action to take when matched> }

%%

Patterns are specified by regular expressions.


For example:
%%
[A-Za-z]* { printf("this is a word"); }
%%

Two Rules

1. lex will always match the longest possible token (the greatest number of characters).

2. If two or more possible tokens are of the same length, then the token whose regular
expression is defined first in the lex specification is favored.
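
A small illustration of both rules (an example added here, with made-up actions):

%%
if        { printf("IF\n"); }
[a-z]+    { printf("ID\n"); }
%%

On input ifdef, the second rule fires because it matches five characters while if matches only
two (rule 1). On input if, both patterns match two characters, so the keyword rule fires because
it is listed first (rule 2).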

Regular Expressions in lex / Flex:

a matches a
abc matches abc
[abc] matches a, b or c
[a-f] matches a, b, c, d, e, or f
[0-9] matches any digit
X+ matches one or more of X
X* matches zero or more of X
[0-9]+ matches any integer
(…) grouping an expression into a single unit
| alternation (or)
(a|b|c)* is equivalent to [a-c]*
X? X is optional (0 or 1 occurrence)
if(def)? matches if or ifdef (equivalent to if|ifdef)
[A-Za-z] matches any alphabetical character
. matches any character except newline character
\. matches the . character
\n matches the newline character
\t matches the tab character
\\ matches the \ character
[ \t] matches either a space or tab character
[^a-d] matches any character other than a,b,c and d

Examples:

Real numbers, e.g., 0, 27, 2.10, .17, can be described by any of:

[0-9]+|[0-9]+\.[0-9]+|\.[0-9]+
[0-9]+(\.[0-9]+)?|\.[0-9]+
[0-9]*(\.)?[0-9]+

To include an optional preceding sign: [+-]?[0-9]*(\.)?[0-9]+

Special Functions
• yytext
–where text matched most recently is stored
• yyleng
–number of characters in text most recently matched
• yylval
–associated value of current token
• yymore()
–append next string matched to current contents of yytext
• yyless(n)
–remove from yytext all but the first n characters
• unput(c)
–return character c to input stream
• yywrap()
–may be replaced by the user
–The yywrap function is called by the lexical analyzer whenever it inputs an
EOF as the first character when trying to match a regular expression
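
When the program is linked without lex's default library, a definition such as the following is
commonly supplied (with flex, the directive %option noyywrap has the same effect); returning 1
tells the scanner that there is no further input file:

int yywrap(void) { return 1; }   /* no more input after EOF */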

Yacc / Bison: what is it?

Yacc: a tool for automatically generating a parser given a grammar written in a yacc
specification (.y file).

A grammar specifies a set of production rules, which define a language. A production rule
specifies a sequence of symbols that forms a legal "sentence" of the language.

Skeleton of a yacc specification (.y file)

Running yacc on x.y generates a *.c file.

%{
< C global variables, prototypes, comments >     <-- this part is embedded into *.c
%}

[DEFINITION SECTION]     <-- contains token declarations; the tokens are recognized in the lexer

%%

[PRODUCTION RULES SECTION]     <-- defines how to "understand" the input language, and what
                                   actions to take for each "sentence"

%%

< C auxiliary subroutines >     <-- any user code, e.g., a main function that calls the parser
                                    function yyparse()

Structure of a yacc File

• Definition section

Declarations of tokens

Type of values used on the parser stack

• Rules section

List of grammar rules with semantic routines

• User code

The Production Rules Section

%%

production : symbol1 symbol2 …   { action }
           | symbol3 symbol4 …   { action }
           | …
           ;

production : symbol1 symbol2     { action }
           ;

%%
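
As a concrete instance of this shape (an illustrative fragment, not from the handout), assuming
NUMBER is declared as a token in the definition section and the lexer supplies its value through
yylval:

%%
expr : expr '+' term   { $$ = $1 + $3; }
     | term            { $$ = $1; }
     ;
term : NUMBER          { $$ = $1; }
     ;
%%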



1. Lex program to count the number of vowels and consonants
%{
/* Vowels hit the first rule; other letters fall through to the second. */
int v = 0, c = 0;
%}
%%
[aeiouAEIOU] v++;
[a-zA-Z]     c++;
%%
int main()
{
    printf("ENTER INPUT : \n");
    yylex();
    printf("VOWELS=%d\nCONSONANTS=%d\n", v, c);
    return 0;
}
2. Lex program to count the type of numbers
%{
int pi=0,ni=0,pf=0,nf=0;
%}
%%
\+?[0-9]+ pi++;
\+?[0-9]*\.[0-9]+ pf++;
\-[0-9]+ ni++;
\-[0-9]*\.[0-9]+ nf++;
%%
int main()
{
printf("ENTER INPUT : ");
yylex();
printf("\nPOSITIVE INTEGER : %d",pi);
printf("\nNEGATIVE INTEGER : %d",ni);
printf("\nPOSITIVE FRACTION : %d",pf);
printf("\nNEGATIVE FRACTION : %d\n",nf);
}
3. Lex program to find simple and compound statements
%{
#include <stdlib.h>   /* for exit() */
%}
%%
"and"          |
"or"           |
"but"          |
"because"      |
"nevertheless" { printf("COMPOUND STATEMENT"); exit(0); }
.  ;
\n return 0;
%%
int main()
{
    printf("\nENTER THE STATEMENT : ");
    yylex();
    printf("SIMPLE STATEMENT");
    return 0;
}
4. Lex program to count words
/* just like Unix wc */
%{
#include <string.h>   /* for strlen() */
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n        { chars++; lines++; }
.         { chars++; }
%%
int main(int argc, char **argv)
{
    yylex();
    printf("%8d%8d%8d\n", lines, words, chars);
    return 0;
}
5. Lex program for English to American
%%
"colour" { printf("color"); }
"flavour" { printf("flavor"); }
"clever" { printf("smart"); }
"smart" { printf("elegant"); }
"conservative" { printf("liberal"); }
… lots of other words …
. { printf("%s", yytext); }
%%

