Syntax
http://www.cs.stonybrook.edu/~cse307
A non_zero_digit “is” 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Regular Expressions
A regular expression is one of the following:
a character
Regular Expressions
A digit “is” 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
| decimal ( exponent | ε )
Regular Expressions
op → + | - | * | /
Chomsky Hierarchy
to recognize, so
syntax tree, while Context Free grammars build the syntax tree
Chomsky Hierarchy:
Types 0 and 1 are not for practical use in defining programming languages
Type 2, for very restricted practical use (O(N^3) in the worst case)
Type 3 are fast (linear time to recognize tokens), but not expressive enough
for most languages
(c) Paul Fodor (CS Stony Brook) and Elsevier
op → + | - | * | /
a set of non-terminals N
a set of productions
John Backus and Peter Naur used the BNF form for
Algol
Peter Naur also won the ACM Turing Award in 2005 for
notation
id_list → id ( , id )*
is shorthand for
id_list → id id_list_tail
id_list_tail → , id id_list_tail
id_list_tail → ε
id_list → id
id_list → id_list , id
As −> a As
| ε
S −> b As b S
| ε
G −> As S
replace the start symbol with the right-hand side of that production
right-hand side of P
op → + | - | * | /
⇒ expr op id + id
⇒ expr * id + id
⇒ id * id + id
derivation
grammar
children represent a
production
3 + 4 * 5 is:
Grammar:
expr → id | number
| - expr | ( expr )
| expr op expr
op → + | - | * | /
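The production expr → expr op expr does not say which operator groups first, so a string such as 3 + 4 * 5 has more than one parse tree. The sketch below is only an illustration, not part of the original slides: it handles just the number and expr op expr alternatives, and the helper names parse_trees and evaluate are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse_trees(tokens):
    """Return every parse tree for tokens under expr -> number | expr op expr."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Operators sit at odd positions; try each one as the root production.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Evaluate a tree built by parse_trees."""
    if not isinstance(tree, tuple):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


trees = parse_trees([3, "+", 4, "*", 5])
values = sorted(evaluate(t) for t in trees)
print(len(trees), values)  # 2 [23, 35]
```

The two trees give 23 and 35, which is why a grammar that encodes precedence (separate add_op and mult_op levels) is preferred.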
add_op → + | -
mult_op → * | /
add_op → + | -
mult_op → * | /
Scanning
tokenizing source
removing comments
comments)
messages
Scanning
There are two syntaxes for regular expressions: Perl-style Regex and
EBNF
Table-driven DFA
Scanning
expression: cases
Scanning
Construction of an NFA equivalent to a given regular
expression: cases
Scanning
expression d* ( .d | d. ) d*
Scanning
expression d* ( .d | d. ) d*
Scanning
The state of the DFA after reading any input will be the set
of states that the NFA might have reached on the same input
States 2, 4, 5, or 8
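The idea that each DFA state is a set of NFA states can be sketched with the subset construction. The NFA below is a hypothetical one for d* ( .d | d. ) d*; its state numbers are chosen for this example and do not match the numbering in the figure.

```python
# Hypothetical NFA for d* ( .d | d. ) d*; "eps" marks an epsilon transition.
NFA = {
    0: {"digit": {0}, "eps": {1}},
    1: {"dot": {2}, "digit": {3}},
    2: {"digit": {4}},
    3: {"dot": {4}},
    4: {"digit": {4}},
}
START, ACCEPT = 0, {4}


def epsilon_closure(states):
    """All NFA states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        for nxt in NFA.get(stack.pop(), {}).get("eps", ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def move(states, symbol):
    """Set of NFA states reachable from `states` on one input symbol."""
    targets = set()
    for s in states:
        targets |= NFA.get(s, {}).get(symbol, set())
    return epsilon_closure(targets)


def build_dfa():
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start = epsilon_closure({START})
    table, worklist = {}, [start]
    while worklist:
        state = worklist.pop()
        if state in table:
            continue
        table[state] = {}
        for symbol in ("digit", "dot"):
            target = move(state, symbol)
            if target:
                table[state][symbol] = target
                worklist.append(target)
    return start, table


def classify(ch):
    """Map an input character to the NFA's symbol classes."""
    if ch in "0123456789":
        return "digit"
    return "dot" if ch == "." else None


def accepts(text):
    start, table = build_dfa()
    state = start
    for ch in text:
        state = table[state].get(classify(ch))
        if state is None:
            return False
    return bool(state & ACCEPT)


print(accepts("3.14"), accepts("5."), accepts("5"))  # True True False
```

A DFA state is accepting when the set contains at least one accepting NFA state.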
Scanning
in A.
to any of States 2, 4, 5, or 8.
Scanning
Calculator:
assign → :=
plus → +
minus → -
times → *
div → /
lparen → (
rparen → )
| // ( non-newline )* newline
Scanning
if cur_char = ‘:’
if cur_char = ‘/’
if it is ‘*’ or ‘/’
is seen, respectively
Scanning
if cur_char = .
if it is a digit
return number
if cur_char is a digit
return number
if cur_char is a letter
write
else return id
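The ad-hoc scanner outlined above could be written roughly as follows. This is a minimal sketch under assumptions: the token names follow the calculator token list, identifiers are taken to be a letter followed by letters, digits, or underscores, and the function name scan is hypothetical.

```python
KEYWORDS = {"read", "write"}
SINGLE = {"+": "plus", "-": "minus", "*": "times", "(": "lparen", ")": "rparen"}


def scan(text):
    """Return a list of (token_name, lexeme) pairs for the calculator language."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in SINGLE:
            tokens.append((SINGLE[ch], ch))
            i += 1
        elif ch == ":":
            if text[i:i + 2] != ":=":
                raise SyntaxError("expected '=' after ':'")
            tokens.append(("assign", ":="))
            i += 2
        elif ch == "/":
            nxt = text[i + 1:i + 2]
            if nxt == "/":                    # comment running to end of line
                end = text.find("\n", i)
                i = n if end == -1 else end + 1
            elif nxt == "*":                  # comment running to the closing */
                end = text.find("*/", i + 2)
                if end == -1:
                    raise SyntaxError("unterminated comment")
                i = end + 2
            else:
                tokens.append(("div", "/"))
                i += 1
        elif ch.isdigit() or ch == ".":
            j = i
            while j < n and text[j].isdigit():
                j += 1
            if j < n and text[j] == ".":
                j += 1
                while j < n and text[j].isdigit():
                    j += 1
            lexeme = text[i:j]
            if lexeme == ".":
                raise SyntaxError("a '.' must be next to a digit")
            tokens.append(("number", lexeme))
            i = j
        elif ch.isalpha():
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            lexeme = text[i:j]
            # read and write are keywords; anything else is an id.
            tokens.append((lexeme if lexeme in KEYWORDS else "id", lexeme))
            i = j
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


print(scan("write sum / 2"))
```

Each branch always consumes the longest possible lexeme before returning a token, which is the usual longest-match rule.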
Scanning
Pictorial
representation of
a scanner for
calculator
tokens, in the
form of a finite
automaton
Scanning
foob
Scanning
token
next token
proceed
In Pascal, for example, when you have a 3 and you see a dot
Scanning
programming technique
scangen produce
Perl-style Regexp
Learning by examples:
abcd - concatenation
a(b|c)d - grouping
? = 0-1
* = 0-inf
+ = 1-inf
[a-zA-Z0-9_] - ranges
\w - word character (letter, digit, or underscore)
\d - numeric
Perl-style Regexp
Learning by examples:
floats?
digit*(.digit|digit.)digit*
\d*(\.\d|\d\.)\d*
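The float pattern can be tried directly with Python's re module, whose syntax is Perl-style. A small sketch; the helper name is_float is hypothetical.

```python
import re

# d* ( .d | d. ) d*  written in Perl-style syntax
FLOAT = re.compile(r"\d*(\.\d|\d\.)\d*")


def is_float(text):
    """True when the whole string matches the float pattern."""
    return FLOAT.fullmatch(text) is not None


for sample in ["3.14", ".5", "5.", "5", "."]:
    print(sample, is_float(sample))
```

The pattern requires a dot with a digit on at least one side, so "5" and "." are rejected while ".5" and "5." are accepted.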
Parsing
translation).
Parsing
cycles)
Parsing
time
LL and LR
terminated by a semicolon:
id_list → id id_list_tail
id_list_tail → , id id_list_tail
id_list_tail → ;
nodes.
id_list → id id_list_tail
id_list_tail → , id id_list_tail
id_list_tail → ;
“A, B, C;”
- The parser finds the left-most
side.
an id_list_tail.
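A top-down parse of "A, B, C;" with this grammar can be sketched as a recursive-descent parser, with one function per non-terminal. The tokenizer and the tuple-based tree shape are assumptions made for this example; the tokenizer silently skips characters it does not recognize.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: identifiers, commas, and semicolons only."""
    return re.findall(r"[A-Za-z_]\w*|[,;]", text)


def parse_id_list(tokens):
    """Build a parse tree top-down, predicting on one token of look-ahead."""
    pos = 0

    def expect_id():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in {",", ";"}:
            raise SyntaxError("identifier expected")
        pos += 1
        return ("id", tokens[pos - 1])

    def id_list_tail():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == ",":
            # id_list_tail -> , id id_list_tail
            pos += 1
            ident = expect_id()
            return ("id_list_tail", ",", ident, id_list_tail())
        if pos < len(tokens) and tokens[pos] == ";":
            # id_list_tail -> ;
            pos += 1
            return ("id_list_tail", ";")
        raise SyntaxError("',' or ';' expected")

    # id_list -> id id_list_tail
    tree = ("id_list", expect_id(), id_list_tail())
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return tree


print(parse_id_list(tokenize("A, B, C;")))
```

The tree grows from the root down: the parser commits to a production as soon as it sees the next token.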
Parsing
ahead
LL grammars requirements:
An LL(1) grammar
| ε
stmt → id := expr
| read id
| write expr
| ε
factor → ( expr )
| id
| number
add_op → +
| -
mult_op → *
LL Parsing
LL Parsing
Example (the average program):
read A
read B
sum := A + B
write sum
write sum / 2 $$
symbol inserted
LL Parsing
OR
production rules to
production the
could start it
LL Parsing
will be used?
LL Parsing
calculator language
read A
read B
sum := A + B
write sum
write sum / 2 $$
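To make the LL idea concrete, here is a recursive-descent sketch that parses and runs the average program. It assumes the usual LL(1) calculator grammar (program → stmt_list $$, expr → term term_tail, term → factor factor_tail, and so on). The tail non-terminals are written as loops, the end marker $$ is appended by the tokenizer rather than typed in the source, and the class and method names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+\.?\d*|\.\d+)|([A-Za-z_]\w*)|(:=|[-+*/()]))")


def tokenize(text):
    """Return (kind, value) pairs, ending with the $$ end marker."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        number, name, symbol = m.groups()
        if number:
            tokens.append(("number", float(number)))
        elif name:
            tokens.append((name if name in ("read", "write") else "id", name))
        else:
            tokens.append((symbol, symbol))
        pos = m.end()
    tokens.append(("$$", None))
    return tokens


class Parser:
    def __init__(self, tokens, inputs):
        self.tokens, self.pos = tokens, 0
        self.inputs = iter(inputs)          # values consumed by read
        self.vars, self.output = {}, []

    def peek(self):
        return self.tokens[self.pos][0]

    def match(self, kind):
        found, value = self.tokens[self.pos]
        if found != kind:
            raise SyntaxError(f"expected {kind}, found {found}")
        self.pos += 1
        return value

    def program(self):                      # program -> stmt_list $$
        while self.peek() in ("id", "read", "write"):
            self.stmt()                     # stmt_list -> stmt stmt_list | ε
        self.match("$$")
        return self.output

    def stmt(self):
        kind = self.peek()
        if kind == "id":                    # stmt -> id := expr
            name = self.match("id")
            self.match(":=")
            self.vars[name] = self.expr()
        elif kind == "read":                # stmt -> read id
            self.match("read")
            name = self.match("id")
            self.vars[name] = next(self.inputs)
        else:                               # stmt -> write expr
            self.match("write")
            self.output.append(self.expr())

    def expr(self):                         # expr -> term term_tail
        value = self.term()
        while self.peek() in ("+", "-"):    # term_tail -> add_op term term_tail | ε
            op = self.match(self.peek())
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):                         # term -> factor factor_tail
        value = self.factor()
        while self.peek() in ("*", "/"):    # factor_tail -> mult_op factor factor_tail | ε
            op = self.match(self.peek())
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):                       # factor -> ( expr ) | id | number
        kind = self.peek()
        if kind == "(":
            self.match("(")
            value = self.expr()
            self.match(")")
            return value
        if kind == "id":
            return self.vars[self.match("id")]
        return self.match("number")


SOURCE = "read A\nread B\nsum := A + B\nwrite sum\nwrite sum / 2"
print(Parser(tokenize(SOURCE), [3, 5]).program())  # [8, 4.0]
```

Each method looks only at the next token to choose a production, which is what makes the grammar LL(1).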
LL Parsing
left recursion
example:
id_list → id_list , id
id_list → id
id_list → id id_list_tail
id_list_tail → , id id_list_tail
id_list_tail → ε
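The rewrite above follows a mechanical pattern: A → A α | β becomes A → β A_tail and A_tail → α A_tail | ε. A minimal sketch of that transformation for immediate left recursion only; the function name and the "_tail" naming convention are assumptions, and an empty list stands for ε.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Remove immediate left recursion from one non-terminal.

    productions is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping non-terminals to their new right-hand sides.
    """
    tail = nonterminal + "_tail"
    # alpha parts: what follows the left-recursive occurrence of A
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # beta parts: alternatives that do not start with A
    others = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    return {
        nonterminal: [beta + [tail] for beta in others],
        tail: [alpha + [tail] for alpha in recursive] + [[]],
    }


grammar = eliminate_left_recursion("id_list", [["id_list", ",", "id"], ["id"]])
for lhs, alternatives in grammar.items():
    for rhs in alternatives:
        print(lhs, "→", " ".join(rhs) or "ε")
```

Applied to the id_list productions it reproduces the id_list_tail form shown in the example.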
LL Parsing
common prefixes
example:
stmt → id := expr
| id ( arg_list )
"left-factoring"
stmt → id id_stmt_tail
id_stmt_tail → := expr
| ( arg_list )
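Left-factoring can also be done mechanically. The sketch below factors out only a single shared first symbol, which is enough for the stmt example; the function name and the generated tail name are assumptions chosen to mirror id_stmt_tail.

```python
def left_factor(nonterminal, productions):
    """Factor out a first symbol shared by two or more alternatives."""
    groups = {}
    for rhs in productions:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)

    result = {nonterminal: []}
    for first, alternatives in groups.items():
        if first is None or len(alternatives) == 1:
            # Nothing to factor: keep the alternative unchanged.
            result[nonterminal].extend(alternatives)
            continue
        tail = f"{first}_{nonterminal}_tail"
        result[nonterminal].append([first, tail])
        # The new non-terminal holds what followed the common prefix.
        result[tail] = [rhs[1:] for rhs in alternatives]
    return result


factored = left_factor("stmt", [["id", ":=", "expr"],
                                ["id", "(", "arg_list", ")"]])
for lhs, alternatives in factored.items():
    for rhs in alternatives:
        print(lhs, "→", " ".join(rhs))
```

After factoring, one token of look-ahead (:= or a left parenthesis) is enough to pick the right alternative.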
LL Parsing
make a grammar LL
| other_stuff
LL Parsing
else balanced_stmt
| other_stuff
else unbalanced_stmt
LL Parsing
| other_stuff
LL Parsing
problem.
if A = B then
if C = D then E := F end
else
G := H
end
LL Parsing
One problem with end markers is that they tend to bunch up.
if A = B then …
else if A = C then …
else if A = D then …
else if A = E then …
else ...;
if A = B then …
else if A = C then …
else if A = D then …
else if A = E then …
else ...;
LR Parsing
LR parsers are almost always table-driven:
state
LR Parsing
in their place
id_list_tail → , id id_list_tail
id_list_tail → ;
identifiers grammar:
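For the semicolon-terminated identifier-list grammar, bottom-up parsing can be illustrated with a naive shift-reduce loop. This is a simplified sketch and not a table-driven LR parser: it reduces whenever the top of the stack equals a right-hand side, trying longer right-hand sides first. The rule ordering and the function name are assumptions made for the example.

```python
RULES = [  # longer right-hand sides first, so the naive matcher prefers them
    ("id_list_tail", [",", "id", "id_list_tail"]),
    ("id_list_tail", [";"]),
    ("id_list", ["id", "id_list_tail"]),
]


def shift_reduce(tokens):
    """Return (accepted, trace), where trace records each shift and reduce."""
    stack, trace = [], []
    for token in tokens:
        stack.append(token)
        trace.append(("shift", list(stack)))
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    # Replace the handle on top of the stack by its left-hand side.
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append(("reduce", list(stack)))
                    reduced = True
                    break
    return stack == ["id_list"], trace


accepted, trace = shift_reduce(["id", ",", "id", ",", "id", ";"])
for action, stack in trace:
    print(f"{action:6} {' '.join(stack)}")
```

With this grammar every token is shifted before the first reduction happens, so the stack grows with the length of the list.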
LR Parsing
LR Parsing
read A
read B
sum := A + B
write sum
write sum / 2 $$
LR Parsing
When we begin execution, the parse stack is
program → . stmt_list $$
LR Parsing
program → . stmt_list $$
stmt_list → . stmt
LR Parsing
program → . stmt_list $$
stmt_list → . stmt
stmt → . id := expr
stmt → . read id
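The way items accumulate when the dot precedes a non-terminal is the closure operation. A minimal sketch follows, assuming a left-recursive stmt_list (stmt_list → stmt_list stmt | stmt) and omitting the expr productions, which are not needed for the initial state; the item representation and function names are hypothetical.

```python
# Assumed fragment of the calculator grammar used for LR parsing.
GRAMMAR = {
    "program": [["stmt_list", "$$"]],
    "stmt_list": [["stmt_list", "stmt"], ["stmt"]],
    "stmt": [["id", ":=", "expr"], ["read", "id"], ["write", "expr"]],
}


def closure(items):
    """Close a set of LR items, each written as (lhs, rhs_tuple, dot_position)."""
    result, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        # If the dot is in front of a non-terminal, add all of its productions.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for production in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], tuple(production), 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


def show(item):
    """Format an item with the dot at its position."""
    lhs, rhs, dot = item
    return f"{lhs} → " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


initial = closure({("program", ("stmt_list", "$$"), 0)})
for line in sorted(show(item) for item in initial):
    print(line)
```

Under these assumptions the initial state contains six items, including the ones listed above.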
LR Parsing
stmt → read . id
precedes a terminal.
stmt → read id .
LR Parsing
stmt_list → . stmt
we now have
stmt_list → stmt .
LR Parsing
program → stmt_list . $$
stmt → . id := expr
stmt → . read id
Table entries indicate whether to shift (s), reduce (r), or shift and
A scanner is a DFA
stack
Actions
compiler.
directly
Parsing problems
Programming
and machine
language
expressions
Lex and Yacc generate C code for your analyzer & parser
Parsed
char
stream
(Tokenizer)
Lex Example
/* lexer.l */
%{
#include "header.h"
int lineno = 1;
%}
%%
\n { lineno++; }
return NUMBER; }
return ID; }
\+ { return PLUS; }
- { return MINUS; }
\* { return TIMES; }
\/ { return DIVIDE; }
= { return EQUALS; }
%%
Yacc Example
/* parser.y */
%{
#include "header.h"
%}
%union {
    char *name;
    int val;
}
%token<name> ID;
%token<val> NUMBER;
%%
| term
...
Bison Overview
The programmer puts BNF rules and
myparser.y
your grammar.
myparser.tab.c yylex.c
myprog
executable program
PLY
A bit of history:
PLY: 2001
PLY
PLY
ply.lex example:
import ply.lex as lex
tokens = [ 'NAME','NUMBER','PLUS','MINUS','TIMES',
           'DIVIDE','EQUALS' ]
t_ignore = ' \t'
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_EQUALS = r'='
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
def t_NUMBER(t):
    r'\d+'
    return t
PLY
...
while True:
    tok = lex.token()
    if not tok:
        break
PLY
| term'''
| factor'''
'''factor : NUMBER'''
'factor : NUMBER'
p[0] = p[1]
precedence = (
    ('left','PLUS','MINUS'),
    ('left','TIMES','DIVIDE'),
    ('nonassoc','UMINUS'),
)
p[0] = -p[1]
...
http://groups.google.com/group/ply-hack
TPG
programming languages, …)
while parsing
Syntax:
Example:
import tpg
class Calc:
r"""
"""
http://cdsoft.fr/tpg
TPG example
Non-terminal productions:
TPG example
import tpg
class Calc:
r"""
"""
TPG example
return {
}[s]
TPG example
a variable n: number/n.
import operator
import string
import tpg
return {
'+': lambda x,y : x+y,
}[s]
r"""
"""
calc = Calc()
if tpg.__python__ == 3:
    operator.div = operator.truediv
    raw_input = input
#!/usr/bin/env python
import math
import operator
import string
import tpg
if tpg.__python__ == 3:
    operator.div = operator.truediv
    raw_input = input
def make_op(op):
    return {
        '+' : operator.add,
        '-' : operator.sub,
        '*' : operator.mul,
        '/' : operator.div,
        '%' : operator.mod,
        'cos' : math.cos,
        'sin' : math.sin,
        'tan' : math.tan,
        'asin': math.asin,
        'atan': math.atan,
        'sqrt': math.sqrt,
        'abs' : abs,
    }[op]
r"""
$ float
START/e ->
'vars' $ e=self.mem()
| Expr/e
;
Var/$self.get(v,0)$ -> VarId/v ;
)*
)*
Fact/f ->
| Pow/f
)?
Atom/a ->
real/a
| integer/a
| Function/a
| Var/a
Function/y ->
"""
def mem(self):
    vars = sorted(self.items())
    memory = [ "%s = %s"%(var, val) for (var, val) in vars ]
calc = Calc()
while 1:
    l = raw_input("\n:")
    if l:
        try:
            print(calc(l))
        except Exception:
            print(tpg.exc())
    else:
        break
ANTLR
lexers
parsers
Tasks Divided
Tree Generation
Code Generation
import java.io.*;
class Main {
try {
System.out.println (x);
} catch(Exception e) {