ML-Lec-06-Supervised Learning-Decision Trees
CS-13410
Introduction to Machine Learning
Supervised Learning
(Decision Trees – Ch. 3, Tom Mitchell)
by
Mudasser Naseer
Types of Learning
Supervised Learning
Supervised Learning – Detailed Definition
In supervised learning we have input variables (X) and an output
variable (Y), and we use an algorithm to learn the mapping
function from the input to the output: Y = f(X).
The goal is to approximate the mapping function so well that,
given new input data (X), we can predict the output
variable (Y) for that data.
It is called supervised learning because the process of an
algorithm learning from the training dataset can be thought of as
a teacher supervising the learning process. We know the correct
answers; the algorithm iteratively makes predictions on the
training data and is corrected by the teacher. Learning stops when
the algorithm achieves an acceptable level of performance.
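One standard way to formalize "approximate the mapping function well" (a textbook formulation, not from the slide) is empirical risk minimization: pick the function that minimizes the average loss on the training pairs.

```latex
% Empirical risk minimization: given training pairs (x_i, y_i), i = 1..n,
% choose \hat{f} from a hypothesis class \mathcal{F} to minimize the
% average loss L between known answers y_i and predictions f(x_i).
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i,\, f(x_i)\bigr)
```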
Supervised Learning
Supervised learning problems can be further grouped
into regression and classification problems.
Classification: A classification problem is when the output
variable is a category, such as “red” or “blue” or “disease”
and “no disease”.
Regression: A regression problem is when the output
variable is a real value, such as “dollars” or “weight”.
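A minimal sketch of the two problem types using scikit-learn (the library choice and the toy data are my own, not from the slides): the same tree learner comes in a classifier flavor for categorical outputs and a regressor flavor for real-valued outputs.

```python
# Minimal sketch: classification vs. regression with scikit-learn.
# The toy data below is made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[25], [40], [60], [80]]            # one input feature, e.g. income in $K

# Classification: the output variable is a category ("no" / "yes")
y_class = ["no", "no", "yes", "yes"]
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[70]]))              # predicts a category, e.g. ['yes']

# Regression: the output variable is a real value, e.g. dollars
y_reg = [1000.0, 1500.0, 2200.0, 3100.0]
reg = DecisionTreeRegressor().fit(X, y_reg)
print(reg.predict([[70]]))              # predicts a real number
```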
Catching tax-evasion
Motivating example: given past records of taxpayers (refund status, marital status, taxable income), each labeled as cheat or no cheat, learn a model that predicts whether a new taxpayer will cheat.
What is classification?
Classification is the task of learning a target function f that maps each attribute set x to one of a predefined set of class labels y.
The target function f is known as a classification model.
Examples of Classification Tasks
Predicting tumor cells as benign or malignant
Illustrating Classification Task
Evaluation of classification models
Counts of test records that are correctly (or incorrectly)
predicted by the classification model are tabulated in a confusion matrix:

                          Predicted Class = 1    Predicted Class = 0
    Actual Class = 1             f11                    f10
    Actual Class = 0             f01                    f00
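From these four counts the usual summary metrics follow (standard definitions, not shown on the extracted slide):

```latex
\text{Accuracy} = \frac{f_{11} + f_{00}}{f_{11} + f_{10} + f_{01} + f_{00}},
\qquad
\text{Error rate} = 1 - \text{Accuracy} = \frac{f_{10} + f_{01}}{f_{11} + f_{10} + f_{01} + f_{00}}
```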
Classification Techniques
Decision Tree based Methods
Rule-based Methods
Memory based reasoning
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines
Decision Trees
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Example of a Decision Tree
(Training data attributes: Refund and Marital Status are categorical, Taxable Income is continuous; Cheat is the class label.)

Refund?                              <- splitting attribute
|-- Yes -> NO
|-- No  -> MarSt?
    |-- Single, Divorced -> TaxInc?
    |   |-- <= 80K -> NO
    |   |-- >  80K -> YES
    |-- Married -> NO

Internal nodes test an attribute, each branch is a test outcome, and each leaf carries a class label.
Training Data -> Model: Decision Tree
Another Example of Decision Tree
(Same training data; Cheat is still the class label.)

MarSt?
|-- Married -> NO
|-- Single, Divorced -> Refund?
    |-- Yes -> NO
    |-- No  -> TaxInc?
        |-- <= 80K -> NO
        |-- >  80K -> YES

More than one tree can fit the same data.
Decision Tree Classification Task
Apply Model to Test Data
Start from the root of the tree. At each internal node, follow the branch that matches the test record's attribute value, until a leaf is reached; the leaf's label is the predicted class.

Refund?
|-- Yes -> NO
|-- No  -> MarSt?
    |-- Single, Divorced -> TaxInc?
    |   |-- <= 80K -> NO
    |   |-- >  80K -> YES
    |-- Married -> NO

For the test record (Refund = No, MarSt = Married): take the No branch at Refund, then the Married branch at MarSt, reaching a leaf. Assign Cheat to "No".
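A minimal sketch of this apply-model walk in Python (the nested-dict tree encoding and its field names are my own, chosen to mirror the tree above, not any library's format):

```python
# Walk a decision tree encoded as nested dicts; leaves are plain label strings.
# The "Single, Divorced" branch is modeled as two keys sharing one subtree.

taxinc = {"attribute": "TaxInc", "threshold": 80,      # continuous test: <= 80K vs > 80K
          "branches": {"<=": "NO", ">": "YES"}}

tree = {"attribute": "Refund",
        "branches": {"Yes": "NO",
                     "No": {"attribute": "MarSt",
                            "branches": {"Married": "NO",
                                         "Single": taxinc,
                                         "Divorced": taxinc}}}}

def classify(record, node):
    """Start from the root; follow matching branches until a leaf is reached."""
    while not isinstance(node, str):                   # internal node
        attr = node["attribute"]
        if "threshold" in node:                        # continuous attribute test
            branch = "<=" if record[attr] <= node["threshold"] else ">"
        else:                                          # categorical attribute test
            branch = record[attr]
        node = node["branches"][branch]
    return node                                        # leaf = class label

# The test record from the slides: Refund = No, MarSt = Married
print(classify({"Refund": "No", "MarSt": "Married", "TaxInc": 80}, tree))  # -> NO
```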
Tree Induction
Finding the best decision tree is NP-hard.
Greedy strategy:
Split the records based on an attribute test that
optimizes a certain criterion.
Many algorithms:
Hunt's Algorithm (one of the earliest)
CART
ID3, C4.5
SLIQ, SPRINT
ID3 Algorithm
ID3 (Iterative Dichotomiser 3) is an algorithm
invented by Ross Quinlan, used to generate a decision
tree from a dataset.
C4.5 is its successor.
These algorithms employ a top-down, greedy search
through the space of possible decision trees.
Which Attribute is the Best Classifier?
The central choice in the ID3 algorithm is selecting
which attribute to test at the root of the tree, and
then at each subsequent node in the tree.
Entropy
The entropy of a data set D, denoted H(D), is

H(D) = -\sum_{i=1}^{c} p_i \log_2 p_i

where the C_i are the possible classes and p_i is the fraction of records from D that have class C_i.
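A direct translation of H(D) into Python (a sketch; the function and variable names are my own):

```python
from math import log2

def entropy(counts):
    """Entropy H(D) from per-class record counts: H = -sum(p_i * log2(p_i))."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([50, 50]))   # 1.0    (a 50/50 split is maximally impure)
print(entropy([9, 5]))     # ~0.940 (the PlayTennis value used later)
```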
Splitting Criterion
Example:
Two classes, +/-
100 records overall (50 +s and 50 -s)
A and B are two binary attributes
Records with A=0: 48+, 2-
Records with A=1: 2+, 48-
Records with B=0: 26+, 24-
Records with B=1: 24+, 26-
Splitting on A is better than splitting on B
A does a good job of separating +s and -s
B does a poor job of separating +s and -s
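A quick numerical check of the claim above (a sketch; entropy is redefined here so the snippet runs standalone):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(parent_counts, child_counts_list):
    """Information gain: parent entropy minus weighted average child entropy."""
    n = sum(parent_counts)
    children = sum(sum(c) / n * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - children

# 100 records: 50+/50-.  A=0: 48+,2-;  A=1: 2+,48-.  B=0: 26+,24-;  B=1: 24+,26-.
print(gain([50, 50], [[48, 2], [2, 48]]))    # ~0.758  (A separates well)
print(gain([50, 50], [[26, 24], [24, 26]]))  # ~0.001  (B barely helps)
```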
Which Attribute is the Best Classifier?: Information Gain
The expected information needed to classify a tuple in D is

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

(this is exactly the entropy H(D)).
Information Gain
Gain of an attribute split: compare the impurity of
the parent node with the weighted average impurity of the
child nodes. If attribute A splits D into partitions D_1, ..., D_v:

Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, Info(D_j)

Gain(A) = Info(D) - Info_A(D)
DECISION TREES
Which Attribute is the Best Classifier?: Information Gain
For Mitchell's PlayTennis training set (Ch. 3), the collection S contains 9 positive and 5 negative examples, so Entropy(S) = 0.940.
The information gain obtained by separating the examples
according to the attribute Wind is calculated as:

Values(Wind) = {Weak, Strong}
S = [9+, 5-],  S_Weak = [6+, 2-],  S_Strong = [3+, 3-]

Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_Weak) - (6/14) Entropy(S_Strong)
             = 0.940 - (8/14)(0.811) - (6/14)(1.000)
             = 0.048
DECISION TREES
Which Attribute is the Best Classifier?: Information Gain
We calculate the information gain for every attribute and select
the attribute with the highest information gain as the test at the current node.
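A compact sketch of this selection step in the spirit of ID3 (the helper names and the toy records are my own, hypothetical choices):

```python
from collections import Counter
from math import log2

def entropy(records):
    counts = Counter(r["label"] for r in records)
    n = len(records)
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(records, attr):
    """Gain(attr) = H(D) - weighted average entropy of D partitioned by attr."""
    n = len(records)
    partitions = {}
    for r in records:
        partitions.setdefault(r[attr], []).append(r)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(records) - remainder

def best_attribute(records, attributes):
    """ID3's central choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: info_gain(records, a))

# Tiny made-up sample in the spirit of the Wind example
data = [{"Wind": "Weak",   "Humidity": "High",   "label": "+"},
        {"Wind": "Weak",   "Humidity": "Normal", "label": "+"},
        {"Wind": "Strong", "Humidity": "High",   "label": "-"},
        {"Wind": "Strong", "Humidity": "Normal", "label": "-"}]
print(best_attribute(data, ["Wind", "Humidity"]))  # -> Wind (gain 1.0 vs 0.0)
```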
DECISION TREES
From Decision Trees to Rules
Next step: make rules from the decision tree. Each root-to-leaf path becomes one IF-THEN rule: the attribute tests along the path are joined by conjunction (AND), and the leaf's class label is the rule's conclusion.
Example: truth table for conjunction

A   B   A ∧ B
F   F   F
T   F   F
F   T   F
T   T   T
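A sketch of rule extraction over the same hypothetical nested-dict encoding used in the apply-model sketch earlier (repeated so this runs standalone): every root-to-leaf path yields one conjunctive rule.

```python
# Turn each root-to-leaf path of the Refund/MarSt/TaxInc tree
# into an IF <cond AND cond ...> THEN <label> rule.

taxinc = {"attribute": "TaxInc", "threshold": 80,
          "branches": {"<=": "NO", ">": "YES"}}
tree = {"attribute": "Refund",
        "branches": {"Yes": "NO",
                     "No": {"attribute": "MarSt",
                            "branches": {"Married": "NO",
                                         "Single": taxinc,
                                         "Divorced": taxinc}}}}

def rules(node, conds=()):
    if isinstance(node, str):                  # leaf: emit one rule
        yield "IF " + " AND ".join(conds) + f" THEN Cheat = {node}"
        return
    for outcome, child in node["branches"].items():
        if "threshold" in node:                # continuous test, e.g. TaxInc <= 80K
            cond = f'{node["attribute"]} {outcome} {node["threshold"]}K'
        else:                                  # categorical test
            cond = f'{node["attribute"]} = {outcome}'
        yield from rules(child, conds + (cond,))

for r in rules(tree):
    print(r)
# e.g. IF Refund = No AND MarSt = Single AND TaxInc > 80K THEN Cheat = YES
```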