NLP Notebook + Tests: Chomsky Normal Form #607

Merged
merged 4 commits into from
Aug 8, 2017
111 changes: 111 additions & 0 deletions nlp.ipynb
@@ -81,6 +81,25 @@
"Now we know it is more likely for `S` to be replaced by `aSb` than by `e`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Chomsky Normal Form\n",
"\n",
"A grammar is in Chomsky Normal Form (or **CNF**, not to be confused with *Conjunctive Normal Form*) if each of its rules has one of the following three forms:\n",
"\n",
"* `X -> Y Z`\n",
"* `A -> a`\n",
"* `S -> ε`\n",
"\n",
"Here *X*, *Y*, *Z* and *A* are non-terminals, *a* is a terminal, *ε* is the empty string and *S* is the start symbol (the start symbol must not appear on the right-hand side of any rule). Note that there can be multiple rules for each left-hand side non-terminal, as long as each of them follows one of the above forms. For example, the rules for *X* might be: `X -> Y Z | A B | a | b`.\n",
"\n",
"A grammar in *CNF* can also be probabilistic, with a probability attached to each rule.\n",
"\n",
"This type of grammar may seem restrictive, but it can be proven that any context-free grammar can be converted to an equivalent grammar in CNF."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -275,6 +294,52 @@
"print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the grammar is in Chomsky Normal Form, we can call the method `cnf_rules` to get a list of all its rules, with each `X -> Y Z` rule returned as a tuple `(X, Y, Z)`. Since the above grammar is not in *CNF*, we have to create a new one."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"E_Chomsky = Grammar('E_Chomsky', # A Grammar in Chomsky Normal Form\n",
" Rules(\n",
" S='NP VP',\n",
" NP='Article Noun | Adjective Noun',\n",
" VP='Verb NP | Verb Adjective',\n",
" ),\n",
" Lexicon(\n",
" Article='the | a | an',\n",
" Noun='robot | sheep | fence',\n",
" Adjective='good | new | sad',\n",
" Verb='is | say | are'\n",
" ))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('NP', 'Article', 'Noun'), ('NP', 'Adjective', 'Noun'), ('VP', 'Verb', 'NP'), ('VP', 'Verb', 'Adjective'), ('S', 'NP', 'VP')]\n"
]
}
],
"source": [
"print(E_Chomsky.cnf_rules())"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -428,6 +493,52 @@
"print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we have a probabilistic grammar in *CNF*, we can get a list of all its rules, each with its probability. Let's create a grammar in this form and print the *CNF* rules:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"E_Prob_Chomsky = ProbGrammar('E_Prob_Chomsky', # A Probabilistic Grammar in CNF\n",
" ProbRules(\n",
" S='NP VP [1]',\n",
" NP='Article Noun [0.6] | Adjective Noun [0.4]',\n",
" VP='Verb NP [0.5] | Verb Adjective [0.5]',\n",
" ),\n",
" ProbLexicon(\n",
" Article='the [0.5] | a [0.25] | an [0.25]',\n",
" Noun='robot [0.4] | sheep [0.4] | fence [0.2]',\n",
" Adjective='good [0.5] | new [0.2] | sad [0.3]',\n",
" Verb='is [0.5] | say [0.3] | are [0.2]'\n",
" ))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('NP', 'Article', 'Noun', 0.6), ('NP', 'Adjective', 'Noun', 0.4), ('VP', 'Verb', 'NP', 0.5), ('VP', 'Verb', 'Adjective', 0.5), ('S', 'NP', 'VP', 1.0)]\n"
]
}
],
"source": [
"print(E_Prob_Chomsky.cnf_rules())"
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down
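The three rule forms listed in the notebook's CNF cell can be checked mechanically. The following is a minimal sketch, not part of this PR: `is_cnf` is a hypothetical helper, and it assumes the rules are held in a plain dict mapping each non-terminal to a list of right-hand sides, with the lexicon as a separate dict of terminals. The `Grammar` class in `nlp.py` may store things differently.

```python
def is_cnf(rules, lexicon, start='S'):
    """Return True if every rule is X -> Y Z, A -> a, or S -> (empty).

    rules:   dict, non-terminal -> list of right-hand sides (lists of symbols)
    lexicon: dict, non-terminal -> list of terminal words
    Both layouts are assumptions made for this sketch.
    """
    nonterminals = set(rules) | set(lexicon)
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if len(rhs) == 0:
                # Only the start symbol may produce the empty string.
                if lhs != start:
                    return False
            elif len(rhs) == 1:
                # A -> a: the single symbol must be a terminal.
                if rhs[0] in nonterminals:
                    return False
            elif len(rhs) == 2:
                # X -> Y Z: both symbols are non-terminals, and the
                # start symbol may not appear on a right-hand side.
                if any(s not in nonterminals or s == start for s in rhs):
                    return False
            else:
                return False  # three or more symbols is never CNF
    return True


# The E_Chomsky grammar from the notebook, written out as plain dicts.
rules = {
    'S': [['NP', 'VP']],
    'NP': [['Article', 'Noun'], ['Adjective', 'Noun']],
    'VP': [['Verb', 'NP'], ['Verb', 'Adjective']],
}
lexicon = {
    'Article': ['the', 'a', 'an'],
    'Noun': ['robot', 'sheep', 'fence'],
    'Adjective': ['good', 'new', 'sad'],
    'Verb': ['is', 'say', 'are'],
}

print(is_cnf(rules, lexicon))
```

A rule with three symbols on the right, a unit rule such as `NP -> Noun`, or a rule that mentions `S` on its right-hand side would make the check return `False`.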
25 changes: 25 additions & 0 deletions nlp.py
@@ -52,6 +52,16 @@ def isa(self, word, cat):
        """Return True iff word is of category cat"""
        return cat in self.categories[word]

    def cnf_rules(self):
        """Return a list of (X, Y, Z) tuples, one for each rule of the form
        X -> Y Z. Assumes the grammar is in Chomsky Normal Form."""
        cnf = []
        for X, rules in self.rules.items():
            for (Y, Z) in rules:
                cnf.append((X, Y, Z))

        return cnf

    def generate_random(self, S='S'):
        """Replace each token in S by a random entry in grammar (recursively)."""
        import random
@@ -229,6 +239,21 @@ def __repr__(self):
Digit="0 [0.35] | 1 [0.35] | 2 [0.3]"
))



E_Chomsky = Grammar('E_Chomsky',  # A Grammar in Chomsky Normal Form
                    Rules(
                        S='NP VP',
                        NP='Article Noun | Adjective Noun',
                        VP='Verb NP | Verb Adjective',
                    ),
                    Lexicon(
                        Article='the | a | an',
                        Noun='robot | sheep | fence',
                        Adjective='good | new | sad',
                        Verb='is | say | are'
                    ))

E_Prob_Chomsky = ProbGrammar('E_Prob_Chomsky',  # A Probabilistic Grammar in CNF
                             ProbRules(
                                 S='NP VP [1]',
8 changes: 8 additions & 0 deletions tests/test_nlp.py
@@ -32,6 +32,10 @@ def test_grammar():
    assert grammar.rewrites_for('A') == [['B', 'C'], ['D', 'E']]
    assert grammar.isa('the', 'Article')

    grammar = nlp.E_Chomsky
    for rule in grammar.cnf_rules():
        assert len(rule) == 3


def test_generation():
    lexicon = Lexicon(Article="the | a | an",
@@ -77,6 +81,10 @@ def test_prob_grammar():
    assert grammar.rewrites_for('A') == [(['B', 'C'], 0.3), (['D', 'E'], 0.7)]
    assert grammar.isa('the', 'Article')

    grammar = nlp.E_Prob_Chomsky
    for rule in grammar.cnf_rules():
        assert len(rule) == 4


def test_prob_generation():
    lexicon = ProbLexicon(Verb="am [0.5] | are [0.25] | is [0.25]",
Expand Down
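The added tests only check the length of each returned tuple. The sketch below mirrors the idea behind `cnf_rules` on a plain dict so that the behaviour on a non-binary rule can be seen in isolation. It is an illustration under stated assumptions, not the code merged here: `binary_rules` is a hypothetical name, and the dict-of-lists layout is assumed rather than taken from `Grammar`.

```python
def binary_rules(rules):
    """Collect a (X, Y, Z) tuple for every X -> Y Z alternative.

    rules: dict, non-terminal -> list of right-hand sides (lists of symbols).
    Raises ValueError for any alternative that does not have exactly two
    symbols, instead of failing on tuple unpacking.
    """
    triples = []
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if len(rhs) != 2:
                raise ValueError(
                    "{} -> {} is not of the form X -> Y Z".format(
                        lhs, ' '.join(rhs)))
            y, z = rhs
            triples.append((lhs, y, z))
    return triples


# Same syntactic rules as E_Chomsky, written as a plain dict.
chomsky_rules = {
    'S': [['NP', 'VP']],
    'NP': [['Article', 'Noun'], ['Adjective', 'Noun']],
    'VP': [['Verb', 'NP'], ['Verb', 'Adjective']],
}

for triple in binary_rules(chomsky_rules):
    print(triple)
```

A test built on the same idea could compare the returned tuples against the expected set of rules rather than only their length, which would also catch a rule that is dropped or duplicated.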