Lecture 3
Learning for Categorization
Sample Category Learning Problem
Bayes' theorem (QED):
    P(H | E) = P(E | H) P(H) / P(E)

Law of total probability:
    P(X = xk) = Σi P(Y = yi) P(X = xk | Y = yi)
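As an illustration (my own sketch, not from the slides; the numbers are invented), the two identities work together: the law of total probability supplies the denominator P(E) in Bayes' theorem.

    # Minimal sketch with invented numbers: Bayes' theorem, with the denominator
    # P(E) expanded by the law of total probability over the hypotheses.
    priors = {"H1": 0.7, "H2": 0.3}        # P(H)
    likelihoods = {"H1": 0.2, "H2": 0.9}   # P(E | H)

    p_e = sum(likelihoods[h] * priors[h] for h in priors)               # P(E)
    posteriors = {h: likelihoods[h] * priors[h] / p_e for h in priors}  # P(H | E)
    print(posteriors)                      # the posteriors sum to 1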
Bayesian Categorization (cont.)
• Need to know:
  – Priors: P(Y=yi)
  – Conditionals: P(X=xk | Y=yi)
• P(Y=yi) are easily estimated from data.
  – If ni of the examples in D are in yi then P(Y=yi) = ni / |D|
• Too many possible instances (e.g. 2^n for n binary features) to estimate all P(X=xk | Y=yi).
• Still need to make some sort of independence assumptions about the features to make learning tractable.

Generative Probabilistic Models
• Assume a simple (usually unrealistic) probabilistic method by which the data was generated.
• For categorization, each category has a different parameterized generative model that characterizes that category.
• Training: Use the data for each category to estimate the parameters of the generative model for that category.
  – Maximum Likelihood Estimation (MLE): Set parameters to maximize the probability that the model produced the given training data.
  – If Mλ denotes a model with parameter values λ and Dk is the training data for the kth class, find the model parameters for class k (λk) that maximize the likelihood of Dk:
    λk = argmax_λ P(Dk | Mλ)
• Testing: Use Bayesian analysis to determine the category model that most likely generated a specific test instance.
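To make the training step concrete, here is a minimal sketch (my own illustration, not the slides' code) of MLE for per-class generative models over discrete, independent features: the estimated parameters are just relative frequencies counted from each class's training data Dk.

    from collections import Counter, defaultdict

    # Sketch: MLE for a per-class generative model with discrete features.
    # Under the independence assumption, the maximum-likelihood parameters are
    # simple relative frequencies counted from each class's training data.
    def train_mle(examples):
        """examples: list of (feature_dict, class_label) pairs (toy format)."""
        class_counts = Counter(label for _, label in examples)
        priors = {c: n / len(examples) for c, n in class_counts.items()}   # P(Y = yi)

        value_counts = defaultdict(Counter)          # (class, feature) -> value counts
        for features, label in examples:
            for f, v in features.items():
                value_counts[(label, f)][v] += 1

        conditionals = {                             # P(X_f = v | Y = class)
            key: {v: n / sum(counts.values()) for v, n in counts.items()}
            for key, counts in value_counts.items()
        }
        return priors, conditionals

Testing then scores a new instance under each class's model and picks the class with the highest posterior, as in the worked examples below.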
[Figure slides: naïve Bayes generative model and inference for the size/color/shape examples — a category (pos or neg) is drawn first, then an example's features (e.g. lg, red, circ) are drawn from that category's model; test instances of unknown category (??) are categorized by asking which category model most likely generated them.]
Naïve Bayes Categorization Example

P(positive | X) = P(positive)*P(medium | positive)*P(red | positive)*P(circle | positive) / P(X)
                = 0.5 * 0.1 * 0.9 * 0.9 / P(X) = 0.0405 / P(X) = 0.0405 / 0.0495 = 0.8181
P(negative | X) = P(negative)*P(medium | negative)*P(red | negative)*P(circle | negative) / P(X)
                = 0.5 * 0.2 * 0.3 * 0.3 / P(X) = 0.009 / P(X) = 0.009 / 0.0495 = 0.1818
P(positive | X) + P(negative | X) = 0.0405 / P(X) + 0.009 / P(X) = 1
P(X) = (0.0405 + 0.009) = 0.0495

Naïve Bayes Diagnosis Example

Prob          Well   Cold   Allergy
P(ci)         0.9    0.05   0.05
P(sneeze|ci)  0.1    0.9    0.9
P(cough|ci)   0.1    0.8    0.7
P(fever|ci)   0.01   0.7    0.4
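Using the table's numbers, a minimal sketch (my own illustration; the choice of observed symptoms is hypothetical) of how naïve Bayes would diagnose a patient who is observed to sneeze and cough:

    # Sketch: naïve Bayes diagnosis from the table above for a hypothetical patient
    # observed to sneeze and cough (the fever feature is left out of this example).
    priors   = {"Well": 0.90, "Cold": 0.05, "Allergy": 0.05}   # P(ci)
    p_sneeze = {"Well": 0.10, "Cold": 0.90, "Allergy": 0.90}   # P(sneeze | ci)
    p_cough  = {"Well": 0.10, "Cold": 0.80, "Allergy": 0.70}   # P(cough | ci)

    scores = {c: priors[c] * p_sneeze[c] * p_cough[c] for c in priors}
    p_x = sum(scores.values())                                  # normalizer P(X)
    posteriors = {c: s / p_x for c, s in scores.items()}
    print(posteriors)   # the evidence outweighs the strong "Well" prior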
Laplace Smoothing Example
• Assume training set contains 10 positive examples:
  – 4: small
  – 0: medium
  – 6: large
• Estimate parameters as follows (if m=1, p=1/3):
  – P(small | positive) = (4 + 1/3) / (10 + 1) = 0.394
  – P(medium | positive) = (0 + 1/3) / (10 + 1) = 0.03
  – P(large | positive) = (6 + 1/3) / (10 + 1) = 0.576
  – P(small or medium or large | positive) = 1.0
  (A quick code check of these estimates appears after the applications list below.)

Text Categorization Applications
• Web pages
  – Recommending
  – Yahoo-like classification
• Newsgroup/Blog Messages
  – Recommending
  – Spam filtering
  – Sentiment analysis for marketing
• News articles
  – Personalized newspaper
• Email messages
  – Routing
  – Prioritizing
  – Folderizing
  – Spam filtering
  – Advertising on Gmail
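Returning to the Laplace smoothing example above: a quick check of the m-estimate formula P = (nc + m·p) / (n + m) with m = 1 and p = 1/3 (a sketch using the counts from that slide).

    # Quick check of the smoothed estimates above: (n_c + m*p) / (n + m)
    # with n = 10 positive examples, m = 1 virtual example, prior p = 1/3 per size.
    n, m, p = 10, 1, 1/3
    counts = {"small": 4, "medium": 0, "large": 6}
    smoothed = {size: (c + m * p) / (n + m) for size, c in counts.items()}
    print(smoothed)                 # ≈ {'small': 0.394, 'medium': 0.030, 'large': 0.576}
    print(sum(smoothed.values()))   # 1.0 — the estimates still form a distribution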
Naïve Bayes Generative Model for Text
[Figure: each category (spam, legit) has its own bag of words (the figure shows words such as Viagra, lottery, win, Nigeria, deal, hot, $, !, homework, exam, test, score, computer, Friday, March, May, PM, science). A document is generated by first drawing a category, then repeatedly drawing words from that category's bag.]

Naïve Bayes Text Classification
[Figure: a test document "Win lotttery $ !" of unknown category (??) is classified by asking which category's word model (spam or legit) is more likely to have generated it.]
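The generative story in the figure can be read as a sampling procedure. A minimal sketch (my own illustration; the vocabularies and probabilities below are toy values, not from the slides):

    import random

    # Toy illustration of the bag-of-words generative model: pick a category from
    # the prior, then draw each word independently from that category's distribution.
    priors = {"spam": 0.4, "legit": 0.6}
    word_probs = {
        "spam":  {"viagra": 0.3, "lottery": 0.3, "win": 0.2, "deal": 0.2},
        "legit": {"homework": 0.3, "exam": 0.3, "score": 0.2, "friday": 0.2},
    }

    def generate_document(length=5):
        category = random.choices(list(priors), weights=list(priors.values()))[0]
        dist = word_probs[category]
        words = random.choices(list(dist), weights=list(dist.values()), k=length)
        return category, " ".join(words)

    print(generate_document())   # e.g. ('legit', 'exam homework score exam friday')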
Text Naïve Bayes Algorithm (Train)

Let V be the vocabulary of all words in the documents in D
For each category ci ∈ C
    Let Di be the subset of documents in D in category ci
    P(ci) = |Di| / |D|
    Let Ti be the concatenation of all the documents in Di
    Let ni be the total number of word occurrences in Ti
    For each word wj ∈ V
        Let nij be the number of occurrences of wj in Ti
        Let P(wj | ci) = (nij + 1) / (ni + |V|)

Text Naïve Bayes Algorithm (Test)

Given a test document X
Let n be the number of word occurrences in X
Return the category:
    argmax(ci ∈ C)  P(ci) ∏(i=1..n) P(ai | ci)
where ai is the word occurring in the ith position in X
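The two boxes above translate almost line for line into code. A sketch under common assumptions (documents are given as token lists; log-probabilities replace the raw product in the Test step to avoid underflow; words of X that never occurred in training are skipped):

    import math
    from collections import Counter, defaultdict

    # Sketch of the Train/Test pseudocode above, using add-one (Laplace) smoothing
    # and log-probabilities in place of the raw product.
    def train(docs_by_class):
        """docs_by_class: dict mapping category ci -> list of token-list documents."""
        vocab = {w for docs in docs_by_class.values() for d in docs for w in d}   # V
        total_docs = sum(len(docs) for docs in docs_by_class.values())            # |D|
        log_prior, log_cond = {}, defaultdict(dict)
        for ci, docs in docs_by_class.items():
            log_prior[ci] = math.log(len(docs) / total_docs)       # P(ci) = |Di| / |D|
            counts = Counter(w for d in docs for w in d)            # occurrences in Ti
            ni = sum(counts.values())                               # word count of Ti
            for w in vocab:
                # P(wj | ci) = (nij + 1) / (ni + |V|)
                log_cond[ci][w] = math.log((counts[w] + 1) / (ni + len(vocab)))
        return log_prior, log_cond

    def classify(x_tokens, log_prior, log_cond):
        """Return argmax over ci of log P(ci) + sum_i log P(ai | ci)."""
        def score(ci):
            return log_prior[ci] + sum(log_cond[ci][w] for w in x_tokens if w in log_cond[ci])
        return max(log_prior, key=score)

For example, on toy data:

    lp, lc = train({"spam": [["win", "lottery", "now"]],
                    "legit": [["homework", "due", "friday"]]})
    print(classify(["win", "lottery", "homework"], lp, lc))   # -> 'spam'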