Parsing
Outline
Parsing with CFGs
Bottom-up, top-down
CKY parsing
Mention of Earley and chart parsing
Parsing
Parsing with CFGs refers to the task of
assigning trees to input strings
Trees that cover all and only the
elements of the input and have an S at the
top
Parsing
Parsing involves a search through the space of possible trees
Top-Down Search
We’re trying to find trees rooted with an
S; start with the rules that give us an S.
Then we can work our way down from
there to the words.
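A minimal sketch of top-down search, assuming a hypothetical toy grammar (the rules and the names `GRAMMAR`, `match`, and `recognize` are illustrations, not part of the lecture). It expands the leftmost non-terminal depth-first and backtracks when a word fails to match. Like any naive top-down parser it would loop forever on a left-recursive rule such as NP -> NP PP, so the toy grammar avoids one.

```python
# Hypothetical toy grammar: keys are non-terminals; any symbol that is not a key is a word.
# It has no left-recursive rules, which this naive top-down search requires.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["John"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N": [["spaghetti"]],
    "V": [["ate"]],
}


def match(symbols, words, pos):
    """Yield every input position reachable by matching `symbols` starting at `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Expand the leftmost non-terminal with each of its rules in turn (top-down).
        for rhs in GRAMMAR[first]:
            yield from match(rhs + rest, words, pos)
    elif pos < len(words) and words[pos] == first:
        # A word on the frontier must match the next input word.
        yield from match(rest, words, pos + 1)


def recognize(words):
    # Start from S and succeed if some expansion consumes the whole input.
    return any(end == len(words) for end in match(["S"], words, 0))


print(recognize("John ate the spaghetti".split()))  # True
print(recognize("ate John".split()))  # False
```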
Top-Down Space
Bottom-Up Parsing
We also want trees that cover the input
words.
Start with trees that link up with the
words
Then work your way up from there to
larger and larger trees.
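A bottom-up counterpart, again over a hypothetical toy grammar: an exhaustive shift-reduce search that starts from the words and, at each step, either shifts the next word onto a stack or reduces a suffix of the stack that matches some rule's right-hand side. The names `RULES` and `recognize` are illustrative. A `seen` set keeps the search from revisiting states, so it terminates. Note that it happily builds an S over "John ate" (via VP -> V) in the middle of a longer sentence, a constituent that makes no sense globally.

```python
# Hypothetical toy grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("John",)),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("Det", ("the",)),
    ("N", ("spaghetti",)),
    ("V", ("ate",)),
]


def recognize(words):
    words = tuple(words)
    start = ((), 0)  # (stack of symbols, number of words consumed)
    agenda = [start]
    seen = {start}
    while agenda:
        stack, pos = agenda.pop()
        if stack == ("S",) and pos == len(words):
            return True
        successors = []
        # Reduce: replace a suffix of the stack that matches a right-hand side.
        for lhs, rhs in RULES:
            size = len(rhs)
            if len(stack) >= size and stack[-size:] == rhs:
                successors.append((stack[:-size] + (lhs,), pos))
        # Shift: push the next input word onto the stack.
        if pos < len(words):
            successors.append((stack + (words[pos],), pos + 1))
        for state in successors:
            if state not in seen:
                seen.add(state)
                agenda.append(state)
    return False


print(recognize("John ate the spaghetti".split()))  # True
print(recognize("ate John".split()))  # False
```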
Bottom-Up Space
Top-Down and Bottom-Up
Top-down
Only searches for trees that can be S’s
But also suggests trees that are not consistent
with any of the words
Bottom-up
Only forms trees consistent with the words
But suggests trees that make no sense
globally
Control
Which node to try to expand next
Which grammar rule to use to expand a node
One approach: exhaustive search of the
space of possibilities
Not feasible
Time is exponential in the length of the input
LOTS of repeated work, as the same constituent is
created over and over (shared sub-problems)
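To make the repeated work concrete, the sketch below ignores labels and simply counts the binary bracketings of a 10-word input by brute-force recursion over split points, recording how often each span is visited. `count_trees` and `calls` are hypothetical names chosen for illustration.

```python
from collections import Counter

# How many times each span (i, j) is visited by the naive recursive search.
calls = Counter()


def count_trees(i, j):
    """Count the binary bracketings of words i..j-1 by brute-force recursion."""
    calls[(i, j)] += 1
    if j - i == 1:
        return 1  # a single word is a single leaf
    # Try every split point; sub-spans are recomputed every time they come up.
    return sum(count_trees(i, k) * count_trees(k, j) for k in range(i + 1, j))


n = 10
print(count_trees(0, n))    # 4862 distinct bracketings
print(sum(calls.values()))  # 19683 recursive calls in total
print(len(calls))           # but only 55 distinct spans
print(calls[(0, 1)])        # 256 visits to the span holding just the first word
```

The total number of calls triples with every extra word, while the number of distinct sub-problems grows only quadratically.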
Dynamic Programming
DP search methods fill tables with partial results
and thereby
Avoid repeating work
Solve exponential problems in polynomial time (well,
not really; we’ll return to this point)
Efficiently store ambiguous structures with shared
sub-parts.
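A minimal sketch of the same bracketing count with memoization, using Python's `functools.lru_cache` as a stand-in for an explicit table (`count_trees_dp` is a hypothetical name). Each span is solved once. The 55 cached entries stand for 4862 trees; listing those trees one by one would still take time proportional to their number, which is why the polynomial-time claim is "not really" true for enumerating every tree.

```python
from functools import lru_cache


def count_trees_dp(n):
    """Count binary bracketings of n words, solving each span (i, j) only once."""
    evaluations = 0

    @lru_cache(maxsize=None)  # the cache plays the role of the DP table
    def count(i, j):
        nonlocal evaluations
        evaluations += 1  # runs only on a cache miss, i.e. once per distinct span
        if j - i == 1:
            return 1
        return sum(count(i, k) * count(k, j) for k in range(i + 1, j))

    return count(0, n), evaluations


print(count_trees_dp(10))  # (4862, 55)
```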
We’ll cover two approaches that roughly
correspond to bottom-up and top-down
approaches.
CKY
Earley
CKY Parsing
Solved in class
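A minimal CKY recognizer sketch, assuming a hypothetical toy grammar that is already in Chomsky Normal Form (`BINARY`, `LEXICON`, and `cky_recognize` are illustrative names). Cell `table[i][j]` collects every non-terminal that can span words i through j-1, and shorter spans are filled before longer ones. The three nested loops over span length, span start, and split point give time cubic in the sentence length.

```python
# Hypothetical toy grammar in Chomsky Normal Form.
BINARY = [  # rules A -> B C, written as (A, B, C)
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {  # rules A -> word, indexed by word
    "John": ("NP",), "ate": ("V",), "the": ("Det",), "spaghetti": ("N",),
    "with": ("P",), "chopsticks": ("NP",),
}


def cky_recognize(words):
    n = len(words)
    # table[i][j] holds every non-terminal that can span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    for length in range(2, n + 1):      # shorter spans first (bottom-up)
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):   # every split point
                for a, b, c in BINARY:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)
    # Success means an S in cell [0, n].
    return "S" in table[0][n], table


ok, table = cky_recognize("John ate the spaghetti with chopsticks".split())
print(ok)                # True
print(sorted(table[1][6]))  # ['VP']
print(cky_recognize("ate John".split())[0])  # False
```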
CKY Parsing
Is that really a parser?
So far it is only a recognizer
Success: an S in cell [0,N]
To turn it into a parser … see Lecture
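One common way to turn the recognizer into a parser is to store backpointers: for each label in a cell, remember the split point and the two child labels that produced it, then rebuild a tree starting from the S in cell [0,N]. The sketch below keeps only the first backpointer found per label, so it returns a single tree; the toy grammar and the names `cky_parse` and `build_tree` are hypothetical.

```python
# Hypothetical toy grammar in Chomsky Normal Form.
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")]
LEXICON = {"John": ("NP",), "ate": ("V",), "the": ("Det",), "spaghetti": ("N",),
           "with": ("P",), "chopsticks": ("NP",)}


def cky_parse(words):
    """Return one parse tree as nested tuples, or None if no S spans the input."""
    n = len(words)
    # back[i][j] maps each label spanning words[i:j] to how it was built:
    # a word for lexical entries, or (split point, left label, right label).
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for a in LEXICON.get(word, ()):
            back[i][i + 1][a] = word
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    # Keep only the first backpointer per label: enough for one tree.
                    if a not in back[i][j] and b in back[i][k] and c in back[k][j]:
                        back[i][j][a] = (k, b, c)
    if "S" not in back[0][n]:
        return None
    return build_tree(back, "S", 0, n)


def build_tree(back, label, i, j):
    entry = back[i][j][label]
    if isinstance(entry, str):  # lexical entry: label -> word
        return (label, entry)
    k, left, right = entry
    return (label, build_tree(back, left, i, k), build_tree(back, right, k, j))


print(cky_parse("John ate the spaghetti".split()))
# ('S', ('NP', 'John'), ('VP', ('V', 'ate'), ('NP', ('Det', 'the'), ('N', 'spaghetti'))))
```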
CKY Notes
Since it’s bottom-up, CKY populates the
table with a lot of worthless constituents.
To avoid this we can switch to a top-down
control strategy
Or we can add some kind of filtering that
blocks constituents in positions where they cannot
occur in a final analysis.
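The worthless constituents can be made visible with a small sketch: rebuild a CKY table for a hypothetical toy grammar, then walk down from the S in cell [0,N] and mark every constituent that takes part in some complete analysis; whatever is left over was built for nothing. This is only a post-hoc measurement, not a filter: a real filter would have to block those constituents while the table is being filled. The names `cky_table` and `useful_items` are hypothetical.

```python
# Hypothetical toy grammar in Chomsky Normal Form.
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")]
LEXICON = {"John": ("NP",), "ate": ("V",), "the": ("Det",), "spaghetti": ("N",),
           "with": ("P",), "chopsticks": ("NP",)}


def cky_table(words):
    """Plain CKY: table[i][j] is the set of labels spanning words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)
    return table


def useful_items(table, n):
    """Items (A, i, j) reachable top-down from an S spanning the whole input."""
    useful = set()
    agenda = [("S", 0, n)] if "S" in table[0][n] else []
    while agenda:
        item = agenda.pop()
        if item in useful:
            continue
        useful.add(item)
        a, i, j = item
        for k in range(i + 1, j):
            for lhs, b, c in BINARY:
                if lhs == a and b in table[i][k] and c in table[k][j]:
                    agenda.extend([(b, i, k), (c, k, j)])
    return useful


words = "John ate the spaghetti with chopsticks".split()
n = len(words)
table = cky_table(words)
built = {(a, i, j) for i in range(n + 1) for j in range(n + 1) for a in table[i][j]}
useful = useful_items(table, n)
print(len(built), len(useful))  # 13 12
print(sorted(built - useful))   # [('S', 0, 4)]
```

The one wasted item is an S over "John ate the spaghetti": locally valid, but unable to fit into any analysis of the full sentence.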
Dynamic Programming Parsing Methods
The CKY (Cocke-Kasami-Younger) algorithm is
based on bottom-up parsing and requires
first normalizing the grammar (into Chomsky Normal Form).
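Normalizing here means converting the grammar to Chomsky Normal Form, where every rule has the shape A -> B C or A -> word. The sketch below shows only one step of that conversion, splitting long right-hand sides into chains of binary rules; a full conversion must also remove unit and empty rules and give terminals inside long rules their own pre-terminals. The function name `binarize` and the `X1`, `X2`, ... naming of new non-terminals are hypothetical conventions.

```python
def binarize(rules):
    """Split right-hand sides longer than two symbols into chains of binary rules.

    New non-terminals are named X1, X2, ... (hypothetical naming; assumes these
    names are not already used in the grammar).
    """
    result = []
    fresh = 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            fresh += 1
            new_symbol = f"X{fresh}"
            # A -> B C D ...  becomes  A -> B X1  plus  X1 -> C D ...
            result.append((lhs, (rhs[0], new_symbol)))
            lhs, rhs = new_symbol, rhs[1:]
        result.append((lhs, rhs))
    return result


print(binarize([("S", ("NP", "VP")), ("VP", ("V", "NP", "PP"))]))
# [('S', ('NP', 'VP')), ('VP', ('V', 'X1')), ('X1', ('NP', 'PP'))]
```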
The Earley parser is based on top-down
parsing and does not require normalizing the
grammar, but it is more complex. [self-study]
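For self-study, a minimal Earley recognizer sketch. It assumes a hypothetical toy grammar with no empty rules (handling those needs extra care in the completer) and keeps, for each input position, a list of dotted rules. The grammar is deliberately not normalized: it contains a three-symbol rule and a left-recursive rule, and the predictor copes with the latter because an item is never added to the same position twice. `GRAMMAR` and `earley_recognize` are illustrative names.

```python
# Hypothetical toy grammar, deliberately not in CNF: it has a three-symbol rule
# (VP -> V NP PP) and a left-recursive rule (NP -> NP PP). Keys are non-terminals;
# any other symbol is a word. No rule has an empty right-hand side.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP"), ("John",), ("chopsticks",)],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",)],
    "N": [("spaghetti",)],
    "V": [("ate",)],
    "P": [("with",)],
}


def earley_recognize(words, start="S"):
    n = len(words)
    # chart[pos] lists items (lhs, rhs, dot, origin): rule lhs -> rhs with
    # rhs[:dot] already matched against words[origin:pos].
    chart = [[] for _ in range(n + 1)]

    def add(pos, item):
        if item not in chart[pos]:
            chart[pos].append(item)

    for rhs in GRAMMAR[start]:
        add(0, (start, rhs, 0, 0))
    for pos in range(n + 1):
        i = 0
        while i < len(chart[pos]):  # the list may grow while we walk it
            lhs, rhs, dot, origin = chart[pos][i]
            i += 1
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in GRAMMAR:
                    # Predictor: expect any rule for the next non-terminal (top-down).
                    for expansion in GRAMMAR[sym]:
                        add(pos, (sym, expansion, 0, pos))
                elif pos < n and words[pos] == sym:
                    # Scanner: the next word matches, so advance the dot.
                    add(pos + 1, (lhs, rhs, dot + 1, origin))
            else:
                # Completer: lhs spans words[origin:pos]; advance items waiting for it.
                # Without empty rules origin < pos, so chart[origin] is already final.
                for lhs2, rhs2, dot2, origin2 in chart[origin]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(pos, (lhs2, rhs2, dot2 + 1, origin2))
    return any(item[0] == start and item[2] == len(item[1]) and item[3] == 0
               for item in chart[n])


print(earley_recognize("John ate the spaghetti with chopsticks".split()))  # True
print(earley_recognize("John ate".split()))  # False
```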
More generally, chart parsers retain
completed phrases in a chart and can
combine top-down and bottom-up search.
Conclusions
Syntactic parse trees specify the
structure of a sentence, which helps determine its
meaning.
John ate the spaghetti with meatballs with chopsticks.
How did John eat the spaghetti?
What did John eat?
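To put a number on this ambiguity, the sketch below extends CKY to count trees instead of just recording labels, over a hypothetical CNF toy grammar that lets a prepositional phrase attach to either a noun phrase or a verb phrase (`count_parses` and the grammar are illustrations). Different readings correspond to different attachments: "with chopsticks" attached to the verb phrase answers how John ate, while "with meatballs" attached to "the spaghetti" describes what he ate. The grammar also licenses odd readings such as "meatballs with chopsticks", which is exactly what syntactic ambiguity looks like.

```python
# Hypothetical toy grammar in Chomsky Normal Form.
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")]
LEXICON = {"John": ("NP",), "ate": ("V",), "the": ("Det",), "spaghetti": ("N",),
           "with": ("P",), "meatballs": ("NP",), "chopsticks": ("NP",)}


def count_parses(words, start="S"):
    n = len(words)
    # counts[i][j][A] = number of distinct subtrees rooted in A over words[i:j].
    counts = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for a in LEXICON.get(word, ()):
            counts[i][i + 1][a] = counts[i][i + 1].get(a, 0) + 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    left = counts[i][k].get(b, 0)
                    right = counts[k][j].get(c, 0)
                    if left and right:
                        # Every left subtree combines with every right subtree.
                        counts[i][j][a] = counts[i][j].get(a, 0) + left * right
    return counts[0][n].get(start, 0)


sentence = "John ate the spaghetti with meatballs with chopsticks".split()
print(count_parses(sentence))  # 5
```

The count comes out of a cubic-time table fill, even though writing out every tree can take exponential time as sentences grow.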
CFGs can be used to define the grammar of a
natural language.
Dynamic programming algorithms allow
computing a single parse tree in time cubic in the
sentence length, or all parse trees in exponential time.