
Lecture08 4up

The document discusses finite automata and scanners. It defines what a finite automaton is and its key components - states, transitions, a start state, and accepting states. It also discusses how finite automata can be represented graphically and how they are used to recognize tokens specified by regular expressions. Deterministic and nondeterministic finite automata are introduced.

Uploaded by

Mahamad Ali

Examples

Let D be the ten single digits and let L be the set of all 52 letters. Then

• A Java or C++ single-line comment that begins with // and ends with Eol can be defined as:
  Comment = // Not(Eol)* Eol

• A fixed decimal literal (e.g., 12.345) can be defined as:
  Lit = D+ . D+

• An optionally signed integer literal can be defined as:
  IntLiteral = ( '+' | − | λ ) D+
  (Why the quotes on the plus?)

• A comment delimited by ## markers, which allows single #’s within the comment body:
  Comment2 = ## ((# | λ) Not(#) )* ##

All finite sets and many infinite sets are regular. But not all infinite sets are regular. Consider the set of balanced brackets of the form [ [ […] ] ]. This set is defined formally as { [^m ]^m | m ≥ 1 }. This set is known not to be regular. Any regular expression that tries to define it either does not get all balanced nestings or includes extra, unwanted strings.

CS 536 Spring 2007
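The token definitions in the Examples above can be tried out with an ordinary regular expression library. The following is a minimal sketch using Python's re module, not the notation of any scanner generator. It assumes Eol is the newline character and D is the ASCII digits; the pattern names and the matches helper are made up for illustration.

```python
import re

# Assumption: Eol is "\n" and D is the ASCII digits 0-9.
COMMENT = re.compile(r"//[^\n]*\n")          # // Not(Eol)* Eol
LIT = re.compile(r"[0-9]+\.[0-9]+")          # D+ . D+
# The plus sign is escaped because + is also the repetition operator.
INT_LITERAL = re.compile(r"(\+|-|)[0-9]+")   # ( '+' | - | λ ) D+
COMMENT2 = re.compile(r"##((#|)[^#])*##")    # ## ((# | λ) Not(#))* ##


def matches(pattern, text):
    """Return True if the whole of text belongs to the set defined by pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for pattern, text in [(COMMENT, "// hi\n"), (LIT, "12.345"),
                          (INT_LITERAL, "+42"), (COMMENT2, "## a # b ##")]:
        print(repr(text), matches(pattern, text))
```

The empty alternative in (#|) plays the role of λ: each repetition consumes either one non-# character or a single # followed by a non-# character, so two adjacent #'s can only end the comment.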

Finite Automata and Scanners

A finite automaton (FA) can be used to recognize the tokens specified by a regular expression. FAs are simple, idealized computers that recognize strings belonging to regular sets. An FA consists of:

• A finite set of states
• A set of transitions (or moves) from one state to another, labeled with characters in Σ
• A special state called the start state
• A subset of the states called the accepting, or final, states

These four components of a finite automaton are often represented graphically:

[Figure: legend showing the symbols for a state, a transition, the start state, and an accepting state]

Finite automata (the plural of automaton is automata) are represented graphically using transition diagrams. We start at the start state. If the next input character matches the label on a transition from the current state, we go to the state it points to. If no move is possible, we stop. If we finish in an accepting state, the sequence of characters read forms a valid token; otherwise, we have not seen a valid token.

In this diagram, the valid tokens are the strings described by the regular expression (a b (c)+ )+.

[Figure: transition diagram with transitions labeled a, b, and c]

Deterministic Finite Automata

As an abbreviation, a transition may be labeled with more than one character (for example, Not(c)). The transition may be taken if the current input character matches any of the characters labeling the transition.

If an FA always has a unique transition (for a given state and character), the FA is deterministic (that is, a deterministic FA, or DFA). Deterministic finite automata are easy to program and often drive a scanner.

If there are transitions to more than one state for some character, then the FA is nondeterministic (that is, an NFA).
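As a rough illustration of how a deterministic FA is followed, here is a minimal Python sketch of a DFA for (a b (c)+ )+. The state numbers 0 to 3 and the names START, ACCEPTING, TRANSITIONS, and accepts are assumptions made for this example; the slides do not number the states of the diagram. The sketch also assumes that the whole input must form one token, so getting stuck before the input ends counts as a rejection.

```python
# Hypothetical state numbering; the diagram on the slide does not name its states.
START = 0
ACCEPTING = {3}

# TRANSITIONS[state][char] gives the next state; a missing entry means "no move".
# Each (state, char) pair has at most one target, so the FA is deterministic.
TRANSITIONS = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"c": 3},
    3: {"c": 3, "a": 1},   # more c's, or start another "a b c..." group
}


def accepts(text):
    """Follow transitions from the start state; accept only in an accepting state."""
    state = START
    for ch in text:
        next_state = TRANSITIONS[state].get(ch)
        if next_state is None:
            return False   # no move is possible, but input remains
        state = next_state
    return state in ACCEPTING


if __name__ == "__main__":
    for sample in ["abc", "abccabc", "ab", "abca"]:
        print(sample, accepts(sample))
```

If some row mapped one character to a set of several states, the automaton would be an NFA instead, and this single-state loop would no longer be enough to run it.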

A DFA is conveniently represented in a computer by a transition table. A transition table, T, is a two dimensional array indexed by a DFA state and a vocabulary symbol.

Table entries are either a DFA state or an error flag (often represented as a blank table entry). If we are in state s, and read character c, then T[s,c] will be the next state we visit, or T[s,c] will contain an error marker indicating that c cannot extend the current token. For example, the regular expression

  // Not(Eol)* Eol

which defines a Java or C++ single-line comment, might be translated into

[Figure: transition diagram with states 1, 2, 3, and 4 and transition labels /, /, Eol, eof, and Not(Eol)]

The corresponding transition table is:

  State   Character
          /     Eol   a     b     …
  1       2
  2       3
  3       3     4     3     3     3
  4

A complete transition table contains one column for each character. To save space, table compression may be used. Only non-error entries are explicitly represented in the table, using hashing, indirection or linked structures.
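The difference between a complete table and a compressed one can be sketched as follows. This is only an illustration under stated assumptions: the vocabulary is taken to be 7-bit ASCII, Eol is the newline character, None stands for the error flag, and the names dense, sparse, and lookup are invented here. The compressed form uses hashing (a Python dict), which is one of the options the slide lists.

```python
EOL = "\n"
ERROR = None          # error flag: the character cannot extend the current token
NUM_STATES = 4        # states 1..4 as in the table; row 0 is unused padding
NUM_CHARS = 128       # assumption: the vocabulary is 7-bit ASCII

# Complete table: one row per state, one column per character.
dense = [[ERROR] * NUM_CHARS for _ in range(NUM_STATES + 1)]
dense[1][ord("/")] = 2
dense[2][ord("/")] = 3
for code in range(NUM_CHARS):
    dense[3][code] = 4 if chr(code) == EOL else 3

# Compressed table: keep only the non-error entries, hashed by (state, char).
sparse = {
    (state, chr(code)): next_state
    for state, row in enumerate(dense)
    for code, next_state in enumerate(row)
    if next_state is not ERROR
}


def lookup(state, ch):
    """Return T[state, ch], or ERROR if that entry is blank."""
    return sparse.get((state, ch), ERROR)


if __name__ == "__main__":
    print(lookup(1, "/"), lookup(3, EOL), lookup(1, "a"))
```

Rows 1, 2, and 4 are almost entirely blank, so they shrink to one, one, and zero stored entries; row 3 has no blank entries and gains nothing from this scheme.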
All regular expressions can be translated into DFAs that accept (as valid tokens) the strings defined by the regular expressions. This translation can be done manually by a programmer or automatically using a scanner generator.

A DFA can be coded in:

• Table-driven form
• Explicit control form

In the table-driven form, the transition table that defines a DFA’s actions is explicitly represented in a run-time table that is “interpreted” by a driver program.

In the explicit control form, the transition table that defines a DFA’s actions appears implicitly as the control logic of the program.

For example, suppose CurrentChar is the current input character. End of file is represented by a special character value, eof. Using the DFA for the Java comments shown earlier, a table-driven scanner is:

  State = StartState
  while (true) {
      if (CurrentChar == eof)
          break
      NextState = T[State][CurrentChar]
      if (NextState == error)
          break
      State = NextState
      read(CurrentChar)
  }
  if (State in AcceptingStates)
      // Process valid token
  else
      // Signal a lexical error
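A runnable counterpart of the table-driven pseudocode might look like the sketch below. It is not the output of any scanner generator. It assumes the input is a Python string, that running off the end of the string stands in for eof, that the vocabulary is 7-bit ASCII, and that the scanner should return the text of the token it matched; the names T, scan, and ACCEPTING_STATES are chosen for this example.

```python
EOL = "\n"
START_STATE = 1
ACCEPTING_STATES = {4}

# Transition table for // Not(Eol)* Eol; a missing entry is the error flag.
# Assumption: the vocabulary is 7-bit ASCII.
T = {
    1: {"/": 2},
    2: {"/": 3},
    3: {**{chr(code): 3 for code in range(128)}, EOL: 4},
    4: {},
}


def scan(text, table=T, start_state=START_STATE, accepting=ACCEPTING_STATES):
    """Drive the table over text; return the matched token text, or None on error."""
    state = start_state
    pos = 0
    while True:
        if pos >= len(text):          # CurrentChar == eof
            break
        next_state = table[state].get(text[pos])
        if next_state is None:        # NextState == error
            break
        state = next_state
        pos += 1                      # read(CurrentChar)
    if state in accepting:
        return text[:pos]             # process valid token
    return None                       # signal a lexical error


if __name__ == "__main__":
    print(repr(scan("// hi\nx = 1")))
    print(repr(scan("// no end of line")))
```

Nothing in scan mentions comments: passing a different table, start state, and accepting set makes the same driver recognize a different token.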

This form of scanner is produced by a scanner generator; it is definition-independent. The scanner is a driver that can scan any token if T contains the appropriate transition table.

Here is an explicit-control scanner for the same comment definition:

  if (CurrentChar == '/') {
      read(CurrentChar)
      if (CurrentChar == '/')
          repeat
              read(CurrentChar)
          until (CurrentChar in {eol, eof})
      else // Signal lexical error
  }
  else // Signal lexical error
  if (CurrentChar == eol)
      // Process valid token
  else // Signal lexical error

The token being scanned is “hardwired” into the logic of the code. The scanner is usually easy to read and often is more efficient, but is specific to a single token definition.
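The explicit-control pseudocode can be approximated in Python as shown below. This is a sketch under assumptions: the input is a string, the empty string plays the role of eof once the input is exhausted, eol is the newline character, and a lexical error is reported by returning None. The function name scan_comment is made up for the example.

```python
EOL = "\n"
EOF = ""   # assumption: "" stands in for eof once the input is exhausted


def scan_comment(text):
    """Return the // comment at the start of text (including its Eol), or None."""
    pos = 0

    def current():
        return text[pos] if pos < len(text) else EOF

    if current() != "/":
        return None                 # signal lexical error
    pos += 1                        # read(CurrentChar)
    if current() != "/":
        return None                 # signal lexical error
    # repeat ... until: always read one character, then test for eol or eof
    while True:
        pos += 1
        if current() in (EOL, EOF):
            break
    if current() == EOL:
        return text[:pos + 1]       # process valid token
    return None                     # hit eof before eol: lexical error


if __name__ == "__main__":
    print(repr(scan_comment("// hi\nx = 1")))
    print(repr(scan_comment("// no end of line")))
```

There is no table here: the states of the DFA correspond to positions in the control flow, which is why the code handles only this one token definition.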
More Examples

• A FORTRAN-like real literal (which requires digits on either or both sides of a decimal point, or just a string of digits) can be defined as
  RealLit = (D+ (λ | . )) | (D* . D+)
  This corresponds to the DFA

[Figure: DFA for RealLit, with transitions labeled D and .]

• An identifier consisting of letters, digits, and underscores, which begins with a letter and allows no adjacent or trailing underscores, may be defined as
  ID = L (L | D)* ( _ (L | D)+)*
  This definition includes identifiers like sum or unit_cost, but excludes _one and two_ and grand___total. The DFA is:

[Figure: DFA for ID, with transitions labeled L, L|D, and _]
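Both definitions can be checked against the sample strings with a short Python sketch. The regular expressions below are direct transcriptions, assuming D is the ASCII digits and L the 52 ASCII letters. The hand-coded DFA for ID uses state names (start, body, after_underscore) that are invented here, since the figure's states are not named.

```python
import re
import string

LETTERS = set(string.ascii_letters)   # L: the 52 letters
DIGITS = set(string.digits)           # D: the ten digits

REAL_LIT = re.compile(r"[0-9]+\.?|[0-9]*\.[0-9]+")            # (D+ (λ | .)) | (D* . D+)
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*")   # L (L|D)* ( _ (L|D)+ )*


def is_identifier(text):
    """Hand-coded DFA for ID; the state names are hypothetical."""
    state = "start"
    for ch in text:
        letter_or_digit = ch in LETTERS or ch in DIGITS
        if state == "start" and ch in LETTERS:
            state = "body"
        elif state == "body" and letter_or_digit:
            state = "body"
        elif state == "body" and ch == "_":
            state = "after_underscore"
        elif state == "after_underscore" and letter_or_digit:
            state = "body"
        else:
            return False            # no move is possible
    return state == "body"          # "body" is the only accepting state


if __name__ == "__main__":
    for name in ["sum", "unit_cost", "_one", "two_", "grand___total"]:
        print(name, is_identifier(name))
```

The after_underscore state is not accepting, which is what rules out trailing underscores, and it has no transition on _, which is what rules out adjacent ones.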

Lex/Flex/JLex

Lex is a well-known Unix scanner generator. It builds a scanner, in C, from a set of regular expressions that define the tokens to be scanned.

Flex is a newer and faster version of Lex.

JLex is a Java version of Lex. It generates a scanner coded in Java, though its regular expression definitions are very close to those used by Lex and Flex.

Lex, Flex and JLex are largely non-procedural. You don’t need to tell the tools how to scan. All you need to do is tell them what you want scanned (by giving them definitions of valid tokens).

This approach greatly simplifies building a scanner, since most of the details of scanning (I/O, buffering, character matching, etc.) are automatically handled.
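The non-procedural style can be imitated in a few lines of Python to give a feel for it. This sketch is not Lex, Flex, or JLex, and it does not use their specification syntax. The token names, the TOKEN_DEFS list, and the tokens function are assumptions for illustration. One difference to keep in mind: Python's re tries alternatives in order and takes the first that matches, so REAL is listed before INT here.

```python
import re

# Only the token definitions are written by hand: (hypothetical name, regex).
TOKEN_DEFS = [
    ("COMMENT", r"//[^\n]*\n"),
    ("REAL", r"[0-9]*\.[0-9]+|[0-9]+\."),
    ("INT", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*"),
    ("SKIP", r"[ \t\n]+"),
]

# One combined pattern with a named group per token definition.
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_DEFS))


def tokens(text):
    """Return (token name, matched text) pairs; whitespace is skipped."""
    result = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"lexical error at offset {pos}")
        if match.lastgroup != "SKIP":
            result.append((match.lastgroup, match.group()))
        pos = match.end()
    return result


if __name__ == "__main__":
    print(tokens("sum 12 3.5 // note\n"))
```

Adding a new kind of token means adding one line to TOKEN_DEFS; the scanning loop itself stays untouched.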
JLex

JLex is coded in Java. To use it, you enter
  java JLex.Main f.jlex
Your CLASSPATH should be set to search the directories where JLex’s classes are stored. (The CLASSPATH we gave you includes JLex’s classes).

After JLex runs (assuming there are no errors in your token specifications), the Java source file f.jlex.java is created. (f stands for any file name you choose. Thus csx.jlex might hold token definitions for CSX, and csx.jlex.java would hold the generated scanner).

You compile f.jlex.java just like any Java program, using your favorite Java compiler.

After compilation, the class file Yylex.class is created. It contains the methods:

• Token yylex() which is the actual scanner. The constructor for Yylex takes the file you want scanned, so
  new Yylex(System.in)
  will build a scanner that reads from System.in. Token is the token class you want returned by the scanner; you can tell JLex what class you want returned.
• String yytext() returns the character text matched by the last call to yylex.

A simple example of the use of JLex is in
  ~cs536-1/public/jlex
Just enter
  make test
