
Commit 4887b0e

antmarakis authored and norvig committed
NLP Notebook + Tests: Chomsky Normal Form (aimacode#607)
* add cnf_rules to grammar
* Update nlp.ipynb
* Update test_nlp.py
* add more to CNF section
1 parent 790213a commit 4887b0e

3 files changed

+144
-0
lines changed

nlp.ipynb

Lines changed: 111 additions & 0 deletions
@@ -81,6 +81,25 @@
     "Now we know it is more likely for `S` to be replaced by `aSb` than by `e`."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Chomsky Normal Form\n",
+    "\n",
+    "A grammar is in Chomsky Normal Form (or **CNF**, not to be confused with *Conjunctive Normal Form*) if each of its rules has one of the following three forms:\n",
+    "\n",
+    "* `X -> Y Z`\n",
+    "* `A -> a`\n",
+    "* `S -> ε`\n",
+    "\n",
+    "Here *X*, *Y*, *Z* and *A* are non-terminals, *a* is a terminal, *ε* is the empty string and *S* is the start symbol (the start symbol must not appear on the right-hand side of any rule). Note that a left-hand side non-terminal can have multiple rules, as long as each of them follows one of the forms above. For example, the rules for *X* might be: `X -> Y Z | A B | a | b`.\n",
+    "\n",
+    "A grammar in *CNF* can also be probabilistic, with a probability attached to each rule.\n",
+    "\n",
+    "This type of grammar may seem restrictive, but it can be proven that any context-free grammar can be converted to an equivalent grammar in CNF."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -275,6 +294,52 @@
     "print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If the grammar is in Chomsky Normal Form, we can call the method `cnf_rules` to get all the rules as `(X, Y, Z)` tuples, one for each `X -> Y Z` rule. Since the above grammar is not in *CNF*, though, we have to create a new one."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "E_Chomsky = Grammar('E_Chomsky',  # A Grammar in Chomsky Normal Form\n",
+    "                    Rules(\n",
+    "                        S='NP VP',\n",
+    "                        NP='Article Noun | Adjective Noun',\n",
+    "                        VP='Verb NP | Verb Adjective',\n",
+    "                    ),\n",
+    "                    Lexicon(\n",
+    "                        Article='the | a | an',\n",
+    "                        Noun='robot | sheep | fence',\n",
+    "                        Adjective='good | new | sad',\n",
+    "                        Verb='is | say | are'\n",
+    "                    ))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[('NP', 'Article', 'Noun'), ('NP', 'Adjective', 'Noun'), ('VP', 'Verb', 'NP'), ('VP', 'Verb', 'Adjective'), ('S', 'NP', 'VP')]\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(E_Chomsky.cnf_rules())"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -428,6 +493,52 @@
     "print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If we have a probabilistic grammar in *CNF*, we can get a list of all its rules with `cnf_rules`. Let's create such a grammar and print its *CNF* rules:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "E_Prob_Chomsky = ProbGrammar('E_Prob_Chomsky',  # A Probabilistic Grammar in CNF\n",
+    "                             ProbRules(\n",
+    "                                 S='NP VP [1]',\n",
+    "                                 NP='Article Noun [0.6] | Adjective Noun [0.4]',\n",
+    "                                 VP='Verb NP [0.5] | Verb Adjective [0.5]',\n",
+    "                             ),\n",
+    "                             ProbLexicon(\n",
+    "                                 Article='the [0.5] | a [0.25] | an [0.25]',\n",
+    "                                 Noun='robot [0.4] | sheep [0.4] | fence [0.2]',\n",
+    "                                 Adjective='good [0.5] | new [0.2] | sad [0.3]',\n",
+    "                                 Verb='is [0.5] | say [0.3] | are [0.2]'\n",
+    "                             ))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[('NP', 'Article', 'Noun', 0.6), ('NP', 'Adjective', 'Noun', 0.4), ('VP', 'Verb', 'NP', 0.5), ('VP', 'Verb', 'Adjective', 0.5), ('S', 'NP', 'VP', 1.0)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(E_Prob_Chomsky.cnf_rules())"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

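For readers of this diff, the three rule forms described in the new notebook cell can be checked mechanically. The following is a minimal sketch, not part of the commit: `is_cnf` and its dict-based layout are hypothetical and do not use the repository's `Grammar` class. It assumes binary rules live in `rules`, `A -> a` entries live in `lexicon`, and an empty right-hand side stands for `ε`.

```python
def is_cnf(rules, lexicon, start='S'):
    """Return True if every rule has one of the three CNF forms.

    Hypothetical dict layout (not the repository's Grammar class):
      rules:   {'X': [['Y', 'Z'], ...]}   non-terminal rewrites
      lexicon: {'A': ['a', ...]}          rules of the form A -> a
    An empty right-hand side [] stands for the empty string.
    """
    nonterminals = set(rules) | set(lexicon)
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if not rhs:
                # S -> ε is allowed only for the start symbol
                if lhs != start:
                    return False
            elif len(rhs) != 2:
                # X -> Y Z needs exactly two symbols on the right
                return False
            elif any(sym == start or sym not in nonterminals for sym in rhs):
                # both symbols must be non-terminals other than the start symbol
                return False
    return True


rules = {'S': [['NP', 'VP']],
         'NP': [['Article', 'Noun'], ['Adjective', 'Noun']],
         'VP': [['Verb', 'NP'], ['Verb', 'Adjective']]}
lexicon = {'Article': ['the', 'a', 'an'], 'Noun': ['robot', 'sheep', 'fence'],
           'Adjective': ['good', 'new', 'sad'], 'Verb': ['is', 'say', 'are']}

print(is_cnf(rules, lexicon))                        # True
print(is_cnf({'S': [['NP', 'VP', 'PP']]}, lexicon))  # False: three symbols on the right
```

The rule and lexicon values mirror the `E_Chomsky` grammar added in this commit, so the first call reflects the grammar the notebook builds.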
nlp.py

Lines changed: 25 additions & 0 deletions
@@ -52,6 +52,16 @@ def isa(self, word, cat):
         """Return True iff word is of category cat"""
         return cat in self.categories[word]
 
+    def cnf_rules(self):
+        """Return a list of (X, Y, Z) tuples, one for each rule of the form
+        X -> Y Z."""
+        cnf = []
+        for X, rules in self.rules.items():
+            for (Y, Z) in rules:
+                cnf.append((X, Y, Z))
+
+        return cnf
+
     def generate_random(self, S='S'):
         """Replace each token in S by a random entry in grammar (recursively)."""
         import random
@@ -229,6 +239,21 @@ def __repr__(self):
         Digit="0 [0.35] | 1 [0.35] | 2 [0.3]"
     ))
 
+
+
+E_Chomsky = Grammar('E_Chomsky',  # A Grammar in Chomsky Normal Form
+                    Rules(
+                        S='NP VP',
+                        NP='Article Noun | Adjective Noun',
+                        VP='Verb NP | Verb Adjective',
+                    ),
+                    Lexicon(
+                        Article='the | a | an',
+                        Noun='robot | sheep | fence',
+                        Adjective='good | new | sad',
+                        Verb='is | say | are'
+                    ))
+
 E_Prob_Chomsky = ProbGrammar('E_Prob_Chomsky',  # A Probabilistic Grammar in CNF
                              ProbRules(
                                  S='NP VP [1]',
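
The flattening done by `cnf_rules` can be tried outside the class. This is a standalone sketch, not the code in `nlp.py`: the free functions below are hypothetical, and the dict shapes are assumed from the `rewrites_for` assertions in `tests/test_nlp.py` (lists of symbol lists, or `(symbol list, probability)` pairs for the probabilistic case).

```python
def cnf_rules(rules):
    """Flatten {X: [[Y, Z], ...]} into [(X, Y, Z), ...]."""
    return [(X, Y, Z)
            for X, alternatives in rules.items()
            for (Y, Z) in alternatives]


def prob_cnf_rules(rules):
    """Flatten {X: [([Y, Z], p), ...]} into [(X, Y, Z, p), ...]."""
    return [(X, Y, Z, p)
            for X, alternatives in rules.items()
            for (Y, Z), p in alternatives]


plain = {'S': [['NP', 'VP']],
         'NP': [['Article', 'Noun'], ['Adjective', 'Noun']]}
prob = {'S': [(['NP', 'VP'], 1.0)],
        'NP': [(['Article', 'Noun'], 0.6), (['Adjective', 'Noun'], 0.4)]}

print(cnf_rules(plain))
# [('S', 'NP', 'VP'), ('NP', 'Article', 'Noun'), ('NP', 'Adjective', 'Noun')]
print(prob_cnf_rules(prob))
# [('S', 'NP', 'VP', 1.0), ('NP', 'Article', 'Noun', 0.6), ('NP', 'Adjective', 'Noun', 0.4)]
```

Note that the `(Y, Z)` unpacking raises `ValueError` for any right-hand side that does not have exactly two symbols, so this approach only works on grammars whose rules are already binary.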

tests/test_nlp.py

Lines changed: 8 additions & 0 deletions
@@ -32,6 +32,10 @@ def test_grammar():
     assert grammar.rewrites_for('A') == [['B', 'C'], ['D', 'E']]
     assert grammar.isa('the', 'Article')
 
+    grammar = nlp.E_Chomsky
+    for rule in grammar.cnf_rules():
+        assert len(rule) == 3
+
 
 def test_generation():
     lexicon = Lexicon(Article="the | a | an",
@@ -77,6 +81,10 @@ def test_prob_grammar():
     assert grammar.rewrites_for('A') == [(['B', 'C'], 0.3), (['D', 'E'], 0.7)]
     assert grammar.isa('the', 'Article')
 
+    grammar = nlp.E_Prob_Chomsky
+    for rule in grammar.cnf_rules():
+        assert len(rule) == 4
+
 
 def test_prob_generation():
     lexicon = ProbLexicon(Verb="am [0.5] | are [0.25] | is [0.25]",
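
The added tests only assert the tuple length. A slightly stronger check could also verify the probabilities. The sketch below is not part of the commit: `check_prob_cnf` is a hypothetical helper, and it assumes, as in `E_Prob_Chomsky`, that the binary rules of each non-terminal are normalised on their own (separately from the lexicon).

```python
from collections import defaultdict


def check_prob_cnf(cnf, tol=1e-9):
    """Check (X, Y, Z, p) tuples: four fields each, and the
    probabilities of all rules sharing a left-hand side X sum to 1."""
    totals = defaultdict(float)
    for rule in cnf:
        assert len(rule) == 4
        X, Y, Z, p = rule
        assert 0 <= p <= 1
        totals[X] += p
    return all(abs(total - 1) <= tol for total in totals.values())


# Rule list copied from the notebook output shown in this commit
cnf = [('NP', 'Article', 'Noun', 0.6), ('NP', 'Adjective', 'Noun', 0.4),
       ('VP', 'Verb', 'NP', 0.5), ('VP', 'Verb', 'Adjective', 0.5),
       ('S', 'NP', 'VP', 1.0)]

print(check_prob_cnf(cnf))  # True
```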
