Implementation of Scanner: Build Understand Categorize
Objectives:
• Be able to build a simple scanner
• Be able to understand symbol table construction
• Be able to categorize source program components
Tokens, Patterns, and Lexemes
• A token is a classification of lexical units
• For example: id and num
• Lexemes are the specific character strings that make up a token
• For example: abc and 123
• Patterns are rules describing the set of lexemes belonging to a token
• For example: “letter followed by letters and digits” and “non-empty sequence of digits”
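To make the distinction concrete, here is a minimal sketch in C that maps a lexeme to a token by testing it against the two patterns above. The names token_class, TOK_ID, TOK_NUM, TOK_UNKNOWN and classify are assumptions made for this illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token classes for this sketch; the slides name only id and num. */
enum token_class { TOK_ID, TOK_NUM, TOK_UNKNOWN };

/* Classify a complete lexeme by checking it against each pattern in turn. */
enum token_class classify(const char *lexeme)
{
    assert(lexeme != NULL);
    const unsigned char *p = (const unsigned char *)lexeme;

    if (isalpha(*p)) {              /* id: letter followed by letters and digits */
        while (isalnum(*p))
            p++;
        return *p == '\0' ? TOK_ID : TOK_UNKNOWN;
    }
    if (isdigit(*p)) {              /* num: non-empty sequence of digits */
        while (isdigit(*p))
            p++;
        return *p == '\0' ? TOK_NUM : TOK_UNKNOWN;
    }
    return TOK_UNKNOWN;             /* empty string or any other character */
}
```

With this sketch, classify("abc") yields TOK_ID and classify("123") yields TOK_NUM: two different lexemes, each matched by the pattern of its token.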
Regular Definitions and Grammars
Grammar
stmt → if expr then stmt
     | if expr then stmt else stmt
expr → term relop term
     | term
term → id
     | num
Regular definitions
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+|-)? digit+ )?
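The regular definition for num can be turned into a hand-written matcher almost line by line. The sketch below is one possible way to do it, not the course's reference scanner; the function names match_num and skip_digits are assumptions. It returns the length of the longest prefix that matches num, treating each optional part as all-or-nothing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skip a run of digits starting at s[i]; return the index just past it. */
static size_t skip_digits(const char *s, size_t i)
{
    while (isdigit((unsigned char)s[i]))
        i++;
    return i;
}

/* Return the length of the longest prefix of s matching num, or 0 if none. */
size_t match_num(const char *s)
{
    assert(s != NULL);
    size_t end = skip_digits(s, 0);          /* digit+ */
    if (end == 0)
        return 0;

    if (s[end] == '.') {                     /* (. digit+)? */
        size_t frac = skip_digits(s, end + 1);
        if (frac > end + 1)                  /* accept only if a digit followed */
            end = frac;
    }
    if (s[end] == 'E') {                     /* ( E (+|-)? digit+ )? */
        size_t i = end + 1;
        if (s[i] == '+' || s[i] == '-')
            i++;
        size_t exp = skip_digits(s, i);
        if (exp > i)                         /* accept only if a digit followed */
            end = exp;
    }
    return end;
}
```

For instance, match_num("3.14E-2") consumes all 7 characters, while match_num("12.") stops after "12" because the fraction part requires at least one digit.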
Context Free Grammar
How do we define a CFG for a mathematical expression?
Testing? Or Verification?
BNF Notation
What strings are produced by these CFGs?
• S → 1S | 0A0S | ε
• A → 1A | ε
and
• S → 1S | 0T | ε
• T → 1T | 0S
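One way to explore the question is to code the second grammar directly, one function per nonterminal. The sketch below does that; the names derive_S, derive_T and in_language are assumptions for this illustration. Tracing a few inputs suggests that S is reached after an even number of 0s and T after an odd number, so the grammar appears to generate the binary strings with an even number of 0s. The first grammar pairs its 0s as 0 1* 0 and seems to describe the same set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool derive_T(const char *w);

/* S -> 1S | 0T | ε */
static bool derive_S(const char *w)
{
    if (*w == '\0') return true;              /* S -> ε */
    if (*w == '1')  return derive_S(w + 1);   /* S -> 1S */
    if (*w == '0')  return derive_T(w + 1);   /* S -> 0T */
    return false;                             /* symbol outside {0, 1} */
}

/* T -> 1T | 0S  (T has no ε production) */
static bool derive_T(const char *w)
{
    if (*w == '1') return derive_T(w + 1);    /* T -> 1T */
    if (*w == '0') return derive_S(w + 1);    /* T -> 0S */
    return false;                             /* end of input or bad symbol */
}

/* True if w can be derived from the start symbol S. */
bool in_language(const char *w)
{
    assert(w != NULL);
    return derive_S(w);
}
```

Because every production has at most one nonterminal, at the right end, each function simply consumes one symbol and hands the rest of the string to the next nonterminal.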
Applications of CFGs
1. Validity of syntax (parsing)
• A → AA | (A) | ε
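The grammar A → AA | (A) | ε generates the strings of balanced parentheses. As a rough illustration of a validity check, the sketch below uses a nesting counter instead of parsing with the grammar itself (a swapped-in technique that accepts the same strings); the function name balanced is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if s consists only of parentheses and they are properly balanced. */
bool balanced(const char *s)
{
    assert(s != NULL);
    int depth = 0;                      /* number of currently open '(' */
    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (--depth < 0)            /* ')' with no matching '(' */
                return false;
        } else {
            return false;               /* symbol outside the alphabet */
        }
    }
    return depth == 0;                  /* every '(' was closed */
}
```

The empty string is accepted (A → ε), "(()())" is accepted, and ")(" is rejected because the counter goes negative.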
The Front End
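• For this CFG:
S = goal (start symbol)
T = { number, id, +, - } (terminals)
N = { goal, expr, term, op } (nonterminals)
P = { 1, 2, 3, 4, 5, 6, 7 } (productions)
The seven productions, as implied by the parse of x + 2 - y:
1 goal → expr
2 expr → expr op term
3 expr → term
4 term → number
5 term → id
6 op → +
7 op → -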
Parse
Production   Result
             goal
1            expr
2            expr op term
5            expr op y
7            expr - y
2            expr op term - y
4            expr op 2 - y
6            expr + 2 - y
3            term + 2 - y
5            x + 2 - y
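The derivation above can be mirrored by a small recognizer. The sketch below assumes the productions implied by the parse (expr → expr op term | term, term → number | id, op → + | -) and replaces the left recursion with a loop, reading expr as term (op term)*. The function names and the choice of blanks as separators are assumptions; it only accepts or rejects and builds no tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Skip blanks, then consume one term: a number (digits) or an id. */
static bool parse_term(const char **p)
{
    while (**p == ' ')
        (*p)++;
    const char *start = *p;
    if (isdigit((unsigned char)**p)) {            /* term -> number */
        while (isdigit((unsigned char)**p))
            (*p)++;
    } else if (isalpha((unsigned char)**p)) {     /* term -> id */
        while (isalnum((unsigned char)**p))
            (*p)++;
    }
    return *p > start;                            /* false if nothing consumed */
}

/* Skip blanks, then consume one op (+ or -) if present. */
static bool parse_op(const char **p)
{
    while (**p == ' ')
        (*p)++;
    if (**p == '+' || **p == '-') {
        (*p)++;
        return true;
    }
    return false;
}

/* goal -> expr, with expr handled as term (op term)*. */
bool parse_goal(const char *s)
{
    assert(s != NULL);
    if (!parse_term(&s))
        return false;
    while (parse_op(&s))                          /* each op must be followed by a term */
        if (!parse_term(&s))
            return false;
    while (*s == ' ')
        s++;
    return *s == '\0';                            /* all input must be consumed */
}
```

With this sketch, parse_goal("x + 2 - y") succeeds, while "x +" fails because the final op has no term after it.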
Example Grammar
with productions P =
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Table generation
Symbol table fields: Name | Class | Type | Other | *Location
Symbol Table
The symbol table is globally accessible (to all phases of the compiler)
Each entry in the symbol table contains a string and a token value:
struct entry
{ char *lexptr; /* lexeme (string) for tokenval */
  int token;    /* token value */
};
struct entry symtable[];
insert(s, t): returns the array index of the new entry for string s with token t
lookup(s): returns the array index of the entry for string s, or 0 if s is not in the table
Possible implementations:
- simple C code as in the project
- hash tables
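The insert and lookup operations described above can be sketched with a plain array and a linear search. This is a minimal illustration, not the project code: the capacity SYMMAX, the counter lastentry, and the convention of leaving index 0 unused (so that 0 can mean "not found") are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SYMMAX 100                  /* assumed capacity of the table */

struct entry {
    char *lexptr;                   /* lexeme (string) */
    int token;                      /* token value */
};

static struct entry symtable[SYMMAX];
static int lastentry = 0;           /* index 0 stays unused: 0 means "not found" */

/* Return the index of the entry for s, or 0 if s is not in the table. */
int lookup(const char *s)
{
    assert(s != NULL);
    for (int p = lastentry; p > 0; p--)
        if (strcmp(symtable[p].lexptr, s) == 0)
            return p;
    return 0;
}

/* Add s with token t; return the new index, or 0 if the table is full
   or memory for the copy of s cannot be allocated. */
int insert(const char *s, int t)
{
    assert(s != NULL);
    if (lastentry + 1 >= SYMMAX)
        return 0;
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, s);
    lastentry++;
    symtable[lastentry].lexptr = copy;
    symtable[lastentry].token = t;
    return lastentry;
}
```

A scanner would typically call lookup(s) first and call insert(s, t) only when the result is 0. Lookup here is linear in the number of entries, which is the cost a hash table avoids.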
Structure of the Symbol Table
• We will implement the symbol table as a linked list of hash tables, one hash table for each block level.
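A linked list of hash tables, one per block level, might look like the sketch below. All names (struct scope, struct symbol, scope_push, scope_pop, scope_insert, scope_lookup), the bucket count and the hash function are assumptions for illustration; the course's actual structure may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 31                       /* assumed bucket count */

struct symbol {
    char *name;
    int token;
    struct symbol *next;                  /* next symbol in the same bucket */
};

struct scope {
    struct symbol *buckets[NBUCKETS];     /* hash table for one block level */
    struct scope *outer;                  /* enclosing block, NULL at top level */
};

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Enter a block: put a fresh, empty hash table at the front of the list. */
struct scope *scope_push(struct scope *outer)
{
    struct scope *sc = calloc(1, sizeof *sc);
    if (sc != NULL)
        sc->outer = outer;
    return sc;
}

/* Leave a block: free the innermost table, return the enclosing one. */
struct scope *scope_pop(struct scope *sc)
{
    assert(sc != NULL);
    struct scope *outer = sc->outer;
    for (int i = 0; i < NBUCKETS; i++) {
        struct symbol *sym = sc->buckets[i];
        while (sym != NULL) {
            struct symbol *next = sym->next;
            free(sym->name);
            free(sym);
            sym = next;
        }
    }
    free(sc);
    return outer;
}

/* Declare name in the innermost block; NULL on allocation failure. */
struct symbol *scope_insert(struct scope *sc, const char *name, int token)
{
    assert(sc != NULL && name != NULL);
    struct symbol *sym = malloc(sizeof *sym);
    if (sym == NULL)
        return NULL;
    sym->name = malloc(strlen(name) + 1);
    if (sym->name == NULL) {
        free(sym);
        return NULL;
    }
    strcpy(sym->name, name);
    sym->token = token;
    unsigned h = hash(name);
    sym->next = sc->buckets[h];           /* chain at the head of the bucket */
    sc->buckets[h] = sym;
    return sym;
}

/* Search from the innermost block outward; NULL if name is not declared. */
struct symbol *scope_lookup(const struct scope *sc, const char *name)
{
    assert(name != NULL);
    unsigned h = hash(name);
    for (; sc != NULL; sc = sc->outer)
        for (struct symbol *sym = sc->buckets[h]; sym != NULL; sym = sym->next)
            if (strcmp(sym->name, name) == 0)
                return sym;
    return NULL;
}
```

Searching the list from the front means a declaration in an inner block hides one with the same name in an outer block, and popping the block makes the outer declaration visible again.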