Decision Trees (I) : ISOM3360 Data Mining For Business Analytics, Session 4
Things to look at
Class imbalance
Dispersion of data attribute values
Skewness, outliers, missing values
Correlation analysis
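A minimal pandas sketch of these checks; the DataFrame, file name, and "default" target column below are hypothetical, purely for illustration:

import pandas as pd

df = pd.read_csv("customers.csv")            # hypothetical dataset

# Class imbalance: distribution of the target variable
print(df["default"].value_counts(normalize=True))

# Dispersion and skewness of numeric attributes
numeric = df.select_dtypes("number")
print(numeric.describe())                    # count, mean, std, quartiles
print(numeric.skew())                        # skewness per numeric column

# Missing values per attribute
print(df.isna().sum())

# Simple outlier check: values more than 3 standard deviations from the mean
print(((numeric - numeric.mean()).abs() > 3 * numeric.std()).sum())

# Correlation analysis among numeric attributes
print(numeric.corr())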
Question
What Induction Algorithm Shall We Use?
Commonly Used Induction Algorithms
Why Decision Trees?
Classification Tree
Classification Tree: Upside-Down
[Figure: example classification tree drawn upside-down (root at top). The root tests Employed (Yes / No); further nodes test Balance (>=50K / <50K) and Age (>=45 / <45); leaves assign Class = Not Default or Class = Default.]
Classification Tree: Divide and Conquer
“Recursive Partitioning”
Nodes
Each node represents a test on one attribute.
Tests on a nominal attribute: the number of splits (branches) is the number of possible values, or 2 (one value vs. the rest).
Continuous attributes are discretized (e.g., Balance >= 50K vs. < 50K, Age >= 45 vs. < 45).
[Figure: the same example tree, with internal nodes Employed (Yes / No), Balance (>=50K / <50K), and Age (>=45 / <45), and leaves labelled Class = Not Default or Class = Default.]
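A small scikit-learn sketch of this recursive partitioning; the attribute names follow the example tree, while the data values are purely hypothetical:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data mirroring the example attributes
data = pd.DataFrame({
    "Employed": [1, 1, 0, 0, 0, 0],      # nominal attribute encoded as 0/1
    "Balance":  [60000, 20000, 80000, 30000, 40000, 90000],
    "Age":      [30, 50, 40, 48, 30, 55],
    "Default":  ["No", "No", "No", "Yes", "Yes", "No"],
})
X, y = data[["Employed", "Balance", "Age"]], data["Default"]

# Each internal node tests one attribute; the continuous attributes
# (Balance, Age) are discretized into threshold splits by the learner
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=["Employed", "Balance", "Age"]))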
Assigning Probability Estimates
[Figure: the entire population is recursively segmented, e.g., into Age >= 45 vs. Age < 45, and the Age < 45 segment further by Balance >= 50K; each resulting segment receives a class-probability estimate.]
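One common way to assign these estimates is the class frequency within each segment (leaf); the counts below are hypothetical:

# Hypothetical counts of Default vs. Not Default in one segment,
# e.g. the segment Age < 45 and Balance < 50K
defaults, non_defaults = 12, 4

# Frequency-based estimate: P(Default | segment)
p_default = defaults / (defaults + non_defaults)
print(round(p_default, 2))              # 0.75

# Laplace-smoothed estimate, often preferred for small segments
p_smoothed = (defaults + 1) / (defaults + non_defaults + 2)
print(round(p_smoothed, 2))             # 0.72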
Classification Tree Learning
[Figures: step-by-step walkthrough of learning a classification tree on a toy set of Yes / No examples. A first split on Body colour separates Purple-body examples (a mix of Yes and No) from Orange-body examples (mostly Yes, with one No); each impure group is then split again on the remaining attributes (Head colour: Red vs. Green; Limb colour: Blue vs. Yellow). The resulting tree tests Body at the root and Head and Limbs below it, separating the Yes examples from the No examples.]
Summary: Classification Tree Learning
Let’s play a game. I have someone in mind, and your job is to guess this person. You can only ask yes/no questions.
Go!
Next…
How to Choose Which Attribute to Split Over?
Objectives
For each splitting node, choose the attribute that best partitions the population into less impure groups.
All else being equal, fewer nodes is better (more comprehensible, easier to use, less overfitting).
Entropy
Entropy Exercise
A dataset is composed of 10 cases of class “Positive” and 10 cases of class “Negative”. Entropy = ?
A dataset is composed of 0 cases of class “Positive” and 20 cases of class “Negative”. Entropy = ?
[Figure: entropy plotted against the proportion of one class; it is 0 when the set is pure, rises to a maximum of 1 at a 50/50 split, and falls back to 0.]
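Entropy of a set with class proportions p_i is -sum_i p_i log2 p_i. A small sketch that answers both exercise cases:

import math

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]     # skip empty classes
    return sum(-p * math.log2(p) for p in probs)

print(entropy([10, 10]))   # 1.0 -> a 50/50 split is maximally impure
print(entropy([0, 20]))    # 0.0 -> a pure set has zero entropy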
Information Gain (based on Entropy)
Information Gain = entropy(parent) − [weighted average] entropy(children)
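The same formula as code; the entropy helper repeats the sketch above, and the class counts in the example call are hypothetical:

import math

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """entropy(parent) minus the weighted average of the children's entropies."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Hypothetical split: a 16/14 parent divided into a 12/1 child and a 4/13 child
print(round(information_gain([16, 14], [[12, 1], [4, 13]]), 3))   # 0.381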
Information Gain Example
[Figure: information-gain exercise. A parent node of 30 instances is split into child nodes of 17 and 13 instances, and in another split into 15 and 15 instances. Impurity of each node = ? Information Gain = ?]
Recall: the gain from first splitting on “Balance” = 0.382.
Our Original Question
Decision Tree Algorithm (Full Tree)
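A rough sketch of how a full tree can be grown: recursively split on the highest-information-gain attribute until each group is pure or no attributes remain. The function names and the toy data below are hypothetical, not the exact algorithm from the slides:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain of splitting `examples` [(features, label), ...] on `attribute`."""
    labels = [y for _, y in examples]
    groups = {}
    for features, y in examples:
        groups.setdefault(features[attribute], []).append(y)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

def grow_tree(examples, attributes):
    """Recursively partition until each group is pure or attributes run out."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]      # leaf: majority class
    best = max(attributes, key=lambda a: information_gain(examples, a))
    children = {}
    for value in {f[best] for f, _ in examples}:
        subset = [(f, y) for f, y in examples if f[best] == value]
        children[value] = grow_tree(subset, [a for a in attributes if a != best])
    return (best, children)                              # internal node

# Hypothetical usage with nominal attributes
data = [({"Employed": "Yes", "Balance": "High"}, "Not Default"),
        ({"Employed": "No",  "Balance": "High"}, "Not Default"),
        ({"Employed": "No",  "Balance": "Low"},  "Default"),
        ({"Employed": "Yes", "Balance": "Low"},  "Not Default")]
print(grow_tree(data, ["Employed", "Balance"]))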