2101CS521 (DM) Unit 4 - Classification
Predictive
• These tasks predict the value of one attribute on the basis of the values of other attributes.
• E.g., predicting customer demand or product sales at a store during a festival.
Model Usage: Classifier (Predictor)
The learned classifier (model) is applied to testing data and then to unseen data.

Testing data:

Name         | Age         | income | loan_decision
Juan Bello   | senior      | low    | safe
Sylvia Crest | middle_aged | low    | risky
Anne Yee     | middle_aged | high   | safe

Unseen data: (XYZ, youth, low) -> loan_decision? -> risky
Decision Tree Induction
Decision tree induction is the learning of decision trees from class-labeled
training tuples.
A decision tree is a flowchart-like tree structure.
• An internal node represents a test on an attribute.
• A branch (edge) represents an outcome of the test.
• A leaf node represents one of the classes.

Figure: a decision tree for the concept buys_computer, indicating whether an AllElectronics customer is likely to purchase a computer. The root tests age?, with branches youth, middle_aged, and senior.
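As a rough illustration (not from the slides), a decision tree can be induced with scikit-learn; the tiny encoded dataset and feature names below are hypothetical, and note that scikit-learn grows binary (CART-style) trees rather than the multiway tree in the figure:

```python
# Hypothetical sketch: inducing a decision tree with scikit-learn (assumed available).
# age: 0=youth, 1=middle_aged, 2=senior; student: 0=no, 1=yes
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]  # encoded training tuples
y = ["no", "yes", "yes", "yes", "no", "yes"]          # class label: buys_computer

tree = DecisionTreeClassifier(criterion="entropy")    # entropy-based (information gain) splits
tree.fit(X, y)
print(export_text(tree, feature_names=["age", "student"]))
print(tree.predict([[0, 1]]))                         # classify a youth who is a student
```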
The attribute A with the highest information gain is chosen as the splitting attribute at node N.
Similarly,
Gain(income) = 0.029 bits
Gain(student) = 0.151 bits
Gain(credit_rating) = 0.048 bits
The age attribute has the highest information gain among all attributes.
Therefore node N is labelled with age, and branches are grown for each of the attribute's values.
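The gain values above follow from the standard entropy formulas; a small sketch, using the buys_computer class counts (9 yes / 5 no) of the running AllElectronics example:

```python
# Sketch: information gain from entropy (standard definitions, not slide code).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, partitions):
    """Gain(A) = Info(D) - sum_j |Dj|/|D| * Info(Dj) over the branches of A."""
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)

labels = ["yes"] * 9 + ["no"] * 5            # 9 buy, 5 do not
by_age = [["yes"] * 2 + ["no"] * 3,          # youth
          ["yes"] * 4,                       # middle_aged (pure partition)
          ["yes"] * 3 + ["no"] * 2]          # senior
print(round(info_gain(labels, by_age), 3))   # 0.246 bits -> Gain(age)
```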
The attribute with the maximum gain ratio is selected as the splitting
attribute.
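For reference (standard C4.5 definitions, not spelled out on the slide): GainRatio(A) = Gain(A) / SplitInfo_A(D), where SplitInfo_A(D) = -Σj=1..v (|Dj|/|D|) · log2(|Dj|/|D|) over the v partitions produced by splitting on A.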
For a discrete-valued attribute, the subset that gives the minimum Gini
index for that attribute is selected as its splitting subset.
income      | Yes | No | Total
low, medium | 7   | 3  | 10
high        | 2   | 2  | 4

Gini_{income ∈ {low, medium}}(D)
  = (10/14) · (1 − (7/10)² − (3/10)²) + (4/14) · (1 − (2/4)² − (2/4)²)
  = 0.443
  = Gini_{income ∈ {high}}(D)
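The 0.443 value can be checked directly; a short sketch using the counts from the table above:

```python
# Sketch: weighted Gini index for the split income in {low, medium} vs. {high}.
def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

n = 14
gini_split = (10 / n) * gini([7, 3]) + (4 / n) * gini([2, 2])
print(round(gini_split, 3))  # 0.443, matching the computation above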
P(H|X):
The probability that customer X will buy a computer, given the information about the customer's age and income.
Here H is conditioned on X.
Bayes’ Theorem
The Bayes theorem:
P(H|X) = P(X|H) · P(H) / P(X)

P(X|H):
The probability that customer X is 35 years old and earns Rs. 40,000, given that we know he/she intends to purchase the computer.
Here X is conditioned on H.
P(H):
The probability that the customer will buy the computer.
P(X):
The probability that a customer X from the set of customers is 35 years old and earns Rs. 40,000.
Naive Bayes example (PlayTennis data; from the conditional-probability table, e.g. P(Outlook = Rain | Yes) = 3/9 and P(Outlook = Rain | No) = 2/5):
Classify x′ = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong) by computing P(Ci|X) ∝ P(X|Ci) · P(Ci) for each class Ci.
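A minimal sketch of the computation, assuming the standard PlayTennis counts (P(Yes) = 9/14, P(No) = 5/14; the conditional probabilities below are quoted from the classic dataset, consistent with the Rain entries above):

```python
# Sketch: naive Bayes for x' = (Sunny, Cool, High, Strong), classic PlayTennis counts.
priors = {"Yes": 9 / 14, "No": 5 / 14}
cond = {  # P(attribute value | class), assumed from the standard dataset
    "Yes": {"Sunny": 2 / 9, "Cool": 3 / 9, "High": 3 / 9, "Strong": 3 / 9},
    "No":  {"Sunny": 3 / 5, "Cool": 1 / 5, "High": 4 / 5, "Strong": 3 / 5},
}
x = ["Sunny", "Cool", "High", "Strong"]
for c, prior in priors.items():
    score = prior
    for value in x:
        score *= cond[c][value]        # naive conditional-independence assumption
    print(c, round(score, 5))          # P(X|Ci)P(Ci): Yes ~ 0.00529, No ~ 0.02057
# The larger score wins, so x' is classified as "No".
```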
Rule-Based Classification
The “IF” part (left side) of a rule is known as the rule antecedent or precondition.
The “THEN” part (right side) is the rule consequent.
Rule-based ordering
The rules are organized into one long priority list, according to some measure of rule quality, such as accuracy, coverage, or size (number of attribute tests in the rule antecedent).
The rule that appears earliest in the list has the highest priority, so it gets to fire its class prediction, as in the sketch below. Any other rule that satisfies X is ignored.
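A hypothetical sketch of first-match rule firing (the rules and attribute names are illustrative, not the slide's rule set):

```python
# Sketch: rule-based ordering as a priority list; the earliest matching rule fires.
rules = [  # ordered by some quality measure, e.g. accuracy or coverage
    (lambda t: t["age"] == "youth" and t["student"] == "yes", "buys_computer = yes"),
    (lambda t: t["credit_rating"] == "excellent",             "buys_computer = yes"),
]
DEFAULT = "buys_computer = no"  # fallback when no rule covers the tuple

def classify(t):
    for antecedent, consequent in rules:
        if antecedent(t):       # the first satisfied rule fires its prediction...
            return consequent   # ...and all later rules are ignored
    return DEFAULT

print(classify({"age": "youth", "student": "yes", "credit_rating": "fair"}))
```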
Exhaustive rules
There exists a rule for each combination of attribute values.
This ensures that every record is covered by at least one rule.
Output:
A set of IF-THEN rules.
Metrics for Performance Evaluation: Confusion Matrix

Actual \ Predicted | Yes | No
Yes                | TP  | FN
No                 | FP  | TN

• TP: True Positive   • FN: False Negative
• FP: False Positive  • TN: True Negative

True positives: the positive tuples that were correctly labeled by the classifier.
True negatives: the negative tuples that were correctly labeled by the classifier.
False positives: the negative tuples that were incorrectly labeled as positive.
False negatives: the positive tuples that were mislabeled as negative.
Worked example (Actual \ Predicted):

    | Yes | No | Total
Yes | 2   | 2  | 4
No  | 1   | 2  | 3

Accuracy = (TP + TN) / total = (2 + 2) / 7 ≈ 0.57
Error rate = (FP + FN) / total = (1 + 2) / 7 ≈ 0.43
Precision = TP / (TP + FP) = 2 / (2 + 1) ≈ 0.67
Recall
Completeness: what percentage of positive tuples did the classifier label as positive?
It calculates the ratio of true positives to the total number of actual positive instances:

Recall = TP / (TP + FN) = TP / P

Worked example (same matrix as above, Actual \ Predicted):

    | Yes | No | Total
Yes | 2   | 2  | 4
No  | 1   | 2  | 3

Recall = 2 / (2 + 2) = 0.5
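All four metrics follow directly from the worked 2×2 matrix above; a small sketch:

```python
# Sketch: evaluation metrics from the worked confusion matrix (TP=2, FN=2, FP=1, TN=2).
TP, FN, FP, TN = 2, 2, 1, 2
total = TP + FN + FP + TN                 # 7 test tuples
accuracy   = (TP + TN) / total            # 4/7 ~ 0.57
error_rate = (FP + FN) / total            # 3/7 ~ 0.43
precision  = TP / (TP + FP)               # 2/3 ~ 0.67
recall     = TP / (TP + FN)               # 2/4 = 0.50
print(accuracy, error_rate, precision, recall)
```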
Cross-Validation
K = 10: training and testing are performed k times. In iteration i, fold i is held out as the test set and the remaining k − 1 folds are used for training.

Iteration 1:  Test  Train Train Train Train Train Train Train Train Train
Iteration 2:  Train Test  Train Train Train Train Train Train Train Train
...
Iteration 10: Train Train Train Train Train Train Train Train Train Test
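A minimal sketch of the iteration scheme, assuming scikit-learn's KFold (the toy data is hypothetical):

```python
# Sketch: 10-fold cross-validation index generation with scikit-learn's KFold.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(-1, 1)              # hypothetical 20-tuple dataset
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for i, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # fold i is held out for testing; the other nine folds train the model
    print(f"iteration {i}: test tuples = {test_idx}")
```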
ROC Curve
• The vertical axis represents the true positive rate (TPR); the horizontal axis represents the false positive rate (FPR).
• The probabilities are returned by a probabilistic classifier for each of the 10 tuples in a test set, sorted in decreasing probability order.
• A model with perfect accuracy will have an area under the curve of 1.0.

TPR = TP / P,  FPR = FP / N

Tuple | Class | Prob. | TP | FP | TPR | FPR
6     | N     | 0.54  | 4  | 2  | 0.8 | 0.4
7     | N     | 0.53  | 4  | 3  | 0.8 | 0.6
8     | N     | 0.51  | 4  | 4  | 0.8 | 0.8
9     | P     | 0.50  | 5  | 4  | 1.0 | 0.8
10    | N     | 0.40  | 5  | 5  | 1.0 | 1.0

[Figure: ROC curve plotting TPR (vertical axis) against FPR (horizontal axis)]
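The TPR/FPR columns come from a single pass down the ranked scores; in the sketch below the first five scored tuples are assumed (hypothetical values chosen to be consistent with the surviving rows 6-10):

```python
# Sketch: computing ROC points by sweeping a threshold down the ranked test tuples.
# The first five (class, prob) pairs are assumed; rows 6-10 match the table above.
P = N = 5                                   # 5 positive and 5 negative test tuples
ranked = [("P", 0.90), ("P", 0.80), ("N", 0.70), ("P", 0.60), ("P", 0.55),
          ("N", 0.54), ("N", 0.53), ("N", 0.51), ("P", 0.50), ("N", 0.40)]
tp = fp = 0
for label, prob in ranked:                  # decreasing probability order
    if label == "P":
        tp += 1
    else:
        fp += 1
    print(prob, tp, fp, tp / P, fp / N)     # prob, TP, FP, TPR, FPR -> one ROC point
```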
Techniques to Improve Classification Accuracy
We focus on ensemble methods for improving classification accuracy.
Ensemble methods combine the predictions of multiple individual models (classifiers) to improve the overall performance.
The individual classifiers vote, and a class label prediction is returned by
the ensemble based on the collection of votes.
Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved composite model M*.

[Figure: samples D1, D2, …, Dk are drawn from the data; classifiers C1, C2, …, Ck are trained on them; their votes are combined to produce the class prediction for new data]
Bagging
Analogy: Diagnosis based on multiple doctors’ majority vote.
Training:
Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled
with replacement from D (i.e., bootstrap)
A classifier model Mi is learned for each training set Di
Classification: classify an unknown sample X
Each classifier Mi returns its class prediction
The bagged classifier M* counts the votes and assigns the class with the most votes
to X
Bagging can also be applied to the prediction of continuous values by taking the average of the individual models' predictions for a given test tuple.
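A rough sketch of bagging by hand, on hypothetical data (scikit-learn's BaggingClassifier packages the same idea):

```python
# Sketch: bagging with bootstrap samples and majority voting (hypothetical data).
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                     # toy training set D of d=100 tuples
y = (X[:, 0] + X[:, 1] > 0).astype(int)

models = []
for i in range(11):                               # k = 11 iterations
    idx = rng.integers(0, len(X), size=len(X))    # sample d tuples WITH replacement
    models.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))

x_new = np.array([[0.5, 0.5]])                    # unknown sample X
votes = [int(m.predict(x_new)[0]) for m in models]
print(Counter(votes).most_common(1)[0][0])        # class with the most votes wins
```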