Lecture08 4up
Lecture08 4up
© ©
CS 536 Spring 2007 94 CS 536 Spring 2007 95
© ©
CS 536 Spring 2007 96 CS 536 Spring 2007 97
a transition from the current Deterministic Finite Automata
state, we go to the state it
points to. If no move is As an abbreviation, a transition
possible, we stop. If we finish may be labeled with more than
in an accepting state, the one character (for example,
Not(c)). The transition may be
sequence of characters read taken if the current input
forms a valid token; otherwise, character matches any of the
we have not seen a valid token. characters labeling the transition.
If an FA always has a unique
In this diagram, the valid transition (for a given state and
character), the FA is deterministic
tokens are the strings (that is, a deterministic FA, or
described by the regular DFA). Deterministic finite
expression (a b (c)+ )+. automata are easy to program
and often drive a scanner.
a
If there are transitions to more
than one state for some character,
a b c then the FA is nondeterministic
(that is, an NFA).
© ©
CS 536 Spring 2007 98 CS 536 Spring 2007 99
© ©
CS 536 Spring 2007 100 CS 536 Spring 2007 101
All regular expressions can be For example, suppose
translated into DFAs that accept CurrentChar is the current input
(as valid tokens) the strings character. End of file is
defined by the regular represented by a special character
expressions. This translation can value, eof. Using the DFA for the
be done manually by a Java comments shown earlier, a
programmer or automatically table-driven scanner is:
using a scanner generator. State = StartState
A DFA can be coded in: while (true){
if (CurrentChar == eof)
• Table-driven form
break
• Explicit control form NextState =
T[State][CurrentChar]
In the table-driven form, the if(NextState == error)
transition table that defines a break
DFA’s actions is explicitly State = NextState
represented in a run-time table read(CurrentChar)
that is “interpreted” by a driver }
program. if (State in AcceptingStates)
In the direct control form, the // Process valid token
transition table that defines a else // Signal a lexical error
DFA’s actions appears implicitly as
the control logic of the program.
© ©
CS 536 Spring 2007 102 CS 536 Spring 2007 103
© ©
CS 536 Spring 2007 104 CS 536 Spring 2007 105
More Examples • An identifier consisting of letters,
digits, and underscores, which begins
• A FORTRAN-like real literal (which with a letter and allows no adjacent
requires digits on either or both sides or trailing underscores, may be
of a decimal point, or just a string of defined as
digits) can be defined as
+ * +
ID = L (L | D)* ( _ (L | D)+)*
RealLit = (D (λ | . )) | (D .D )
This definition includes identifiers
This corresponds to the DFA like sum or unit_cost, but
excludes _one and two_ and
D grand___total. The DFA is:
.
D D L|D
D L _
.
L|D
© ©
CS 536 Spring 2007 106 CS 536 Spring 2007 107
© ©
CS 536 Spring 2007 108 CS 536 Spring 2007 109
JLex You compile f.jlex.java just
like any Java program, using your
JLex is coded in Java. To use it, favorite Java compiler.
you enter After compilation, the class file
java JLex.Main f.jlex Yylex.class is created.
Your CLASSPATH should be set to It contains the methods:
search the directories where JLex’s
• Token yylex() which is the actual
classes are stored.
(The CLASSPATH we gave you scanner. The constructor for Yylex
includes JLex’s classes). takes the file you want scanned, so
After JLex runs (assuming there new Yylex(System.in)
are no errors in your token will build a scanner that reads from
specifications), the Java source System.in. Token is the token
file class you want returned by the
f.jlex.java is created. (f stands scanner; you can tell JLex what class
for any file name you choose. you want returned.
Thus csx.jlex might hold token • String yytext() returns the
definitions for CSX, and character text matched by the last call
csx.jlex.java would hold the
to yylex.
generated scanner).
© ©
CS 536 Spring 2007 110 CS 536 Spring 2007 111
©
CS 536 Spring 2007 112