
ISOM3360 Data Mining for Business Analytics, Session 4

Decision Trees (I)

Instructor: Rong Zheng


Department of ISOM
Fall 2020
Recap: Data Understanding

Preliminary investigation of the data to better understand its specific characteristics
 Helps in selecting appropriate data mining algorithms

Things to look at
 Class imbalance
 Dispersion of data attribute values
 Skewness, outliers, missing values
 Correlation analysis

Visualization tools are important
 Histograms, box plots
 Scatter plots
2
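A minimal Python sketch of these checks, assuming a pandas DataFrame loaded from a hypothetical customers.csv with columns such as Default, Balance, and Age (none of these names come from the slides):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("customers.csv")                     # hypothetical file

print(df["Default"].value_counts(normalize=True))     # class imbalance
print(df.describe())                                  # dispersion / skewness of attribute values
print(df.isna().sum())                                # missing values per column
print(df.corr(numeric_only=True))                     # correlation analysis

df["Balance"].plot(kind="hist")                       # histogram
plt.show()
df[["Age", "Balance"]].plot(kind="box")               # box plots
plt.show()
df.plot(kind="scatter", x="Age", y="Balance")         # scatter plot
plt.show()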
Recap: Data Preprocessing

Data transformation might be needed
 Handling missing values
 Handling categorical variables
 Feature transformation
 e.g., log transformation
 Normalization (back to this when clustering is discussed)
 Feature discretization

3
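A rough pandas sketch of these transformations, again using the hypothetical customers.csv and column names from the earlier sketch:

import numpy as np
import pandas as pd

df = pd.read_csv("customers.csv")                            # hypothetical file

# Handling missing values: e.g., fill numeric gaps with the column median
df["Balance"] = df["Balance"].fillna(df["Balance"].median())

# Handling categorical variables: one-hot (dummy) encoding
df = pd.get_dummies(df, columns=["Employed"])

# Feature transformation: e.g., log transformation of a skewed variable
df["LogBalance"] = np.log1p(df["Balance"])

# Feature discretization: e.g., bin Age into ranges
df["AgeBand"] = pd.cut(df["Age"], bins=[0, 30, 45, 60, 120])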
Question

Which of the following normalization method(s) may transform the original variable to a negative value: min-max, z-score, both, or neither?

4
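If you want to check your answer empirically, here is a tiny NumPy sketch applying both normalizations to a made-up toy vector:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])               # toy values

min_max = (x - x.min()) / (x.max() - x.min())    # rescales to the range [0, 1]
z_score = (x - x.mean()) / x.std()               # centers at 0, unit variance

print(min_max)   # all values fall in [0, 1]
print(z_score)   # values below the mean come out negative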
What Induction Algorithm Shall We Use?

5
Commonly Used Induction Algorithms

Post-mortem analysis of a popular data mining competition

6
Why Decision Trees?

Decision trees are one of the most popular data mining tools.
 Classification trees (used when the target variable is categorical)
 Regression trees (used when the target variable is numeric)

They are easy to understand, implement, and use, and computationally cheap.
Model comprehensibility is important for communication to non-DM-savvy stakeholders.

7
Classification Tree

Employed   Balance   Age   Default
Yes        123,000   50    No
No          51,100   40    Yes
No          68,000   55    No
Yes         34,000   46    Yes
Yes         50,000   44    No
No         100,000   50    Yes

Objective: predicting borrowers who will default on loan payments.

8
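As an illustration (not part of the original slides), the six examples above can be fed to scikit-learn to fit a classification tree; encoding Employed as 1/0 is an arbitrary choice made just for this sketch:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The six training examples from the slide (Employed encoded as Yes=1, No=0).
data = pd.DataFrame({
    "Employed": [1, 0, 0, 1, 1, 0],
    "Balance":  [123000, 51100, 68000, 34000, 50000, 100000],
    "Age":      [50, 40, 55, 46, 44, 50],
    "Default":  ["No", "Yes", "No", "Yes", "No", "Yes"],
})

X, y = data[["Employed", "Balance", "Age"]], data["Default"]

tree = DecisionTreeClassifier(criterion="entropy")   # entropy-based splits, as in these slides
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))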
Classification Tree: Upside-Down

Employed?  (root node)
  Yes -> Class = Not Default  (leaf)
  No  -> Balance?
           >= 50K -> Class = Not Default
           <  50K -> Age?
                       >= 45 -> Class = Not Default
                       <  45 -> Class = Default
9
Classification Tree: Divide and Conquer
“Recursive Partitioning”
(The tree from the previous slide is shown alongside.)

Nodes
 Each node represents one attribute
 Tests on a nominal attribute: the number of splits (branches) is the number of possible values, or 2 (one value vs. the rest)
 Continuous attributes are discretized

Leaves
 A class assignment (e.g., default / not default)
 Also provide a distribution over all possible classes (e.g., default with probability 0.25, not default with prob. 0.75)
10
How a Tree is Used for Classification?
To determine the class of a new example, e.g., Mark: age 40, retired, balance 38K.
 The example is routed down the tree according to the values of its attributes.
 At each node, a test is applied to one attribute.
 When a leaf is reached, the example is assigned to a class, or alternatively to a distribution over the possible classes (e.g., default with probability 0.25, not default with prob. 0.75).

(The tree from the previous slides is shown alongside.)
11
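The routing logic for this tree can be written as a few nested tests; a minimal sketch (the function name classify is just for illustration):

def classify(employed, balance, age):
    """Route one example down the tree from the slides and return the predicted class."""
    if employed == "Yes":
        return "Not Default"          # left branch: employed borrowers
    if balance >= 50_000:
        return "Not Default"          # not employed, but high balance
    if age >= 45:
        return "Not Default"          # not employed, low balance, older
    return "Default"                  # not employed, low balance, younger

# Mark: age 40, retired (not employed), balance 38K
print(classify(employed="No", balance=38_000, age=40))   # -> "Default"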
Assigning Probability Estimates

[Figure: the entire population is partitioned into nested subgroups by Balance (>= 50k / < 50k) and, within each, by Age (>= 45 / < 45).]

Q: How would you assign estimates of class probability (e.g., probability of default / not default) based on such trees?
12
Exercise: Assume this is the classification tree you learned from the past defaulting data. A new guy, who is 45 and has a 20K balance, is applying for a credit card issued by your company. Can you predict whether this new guy is going to default? How confident are you about your prediction? Another girl is also applying for the same credit card, but the only information we have about her is that she has a 70K balance. Can you predict whether she will default? How sure are you about that?

13
Classification Tree Learning

Objective: based on customer attributes, partition the customers into subgroups that are less impure with respect to the class (i.e., such that in each group most instances belong to the same class).

[Figure: 12 example customers labeled Yes / No (7 Yes, 5 No), before any partitioning.]
14
Classification Tree Learning

Partitioning into “purer” groups

[Figure: splitting on body color into Orange Bodies (5 Yes, 1 No) and Purple Bodies (2 Yes, 4 No).]
15
Classification Tree Learning

Partitioning into “purer” groups, recursively

[Figure: the Purple Bodies subgroup (2 Yes, 4 No) is selected for further splitting.]
16
Classification Tree Learning

Purple Bodies

[Figure: the Purple Bodies subgroup is split by head color: Red Head (2 Yes) and Green Head (4 No).]
17
Classification Tree Learning

Partitioning into “purer” groups

[Figure: the body-color split shown again: Orange Bodies (5 Yes, 1 No) and Purple Bodies (2 Yes, 4 No).]
18
Classification Tree Learning

Partitioning into “purer” groups, recursively

[Figure: the Orange Bodies subgroup (5 Yes, 1 No) is selected for further splitting.]
19
Classification Tree Learning
Orange Bodies

[Figure: the Orange Bodies subgroup is split by limb color: Blue Limbs (5 Yes) and Yellow Limbs (1 No).]
20
Classification Tree Learning
[Figure: the full tree. The root splits on Body color into Orange Bodies (5 Yes, 1 No) and Purple Bodies (2 Yes, 4 No); the Orange branch is then split on Limbs (Blue Limbs: 5 Yes; Yellow Limbs: 1 No) and the Purple branch on Head color (Red Head: 2 Yes; Green Head: 4 No).]
21
Summary: Classification Tree Learning

A tree is constructed by recursively partitioning the examples.
With each partition, the examples are split into subgroups that are “increasingly pure”.

22
Let’s play a game. I have someone in mind, and your job is to guess this person. You can only ask yes/no questions.

This person is an entrepreneur.

Go!

23
Next…

Some important questions have not been answered yet:
 How to automatically choose which attribute to use to split the data?
 When to stop splitting?

24
How to Choose Which Attribute to Split Over?

Objectives
 For each splitting node, choose the attribute that best partitions the population into less impure groups.
 All else being equal, fewer nodes is better (more comprehensible, easier to use, and less prone to overfitting).

Impurity measures: many are available, but the most common one (from information theory) is entropy.

25
Entropy

Entropy = − Σᵢ pᵢ log₂ pᵢ , where pᵢ is the proportion of class i in the data.

For example: our initial population is composed of 16 cases of class “Default” and 14 cases of class “Not default”.

Entropy (entire population of examples) = −(16/30) log₂(16/30) − (14/30) log₂(14/30) ≈ 0.997
26
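A small Python helper that computes this entropy from class counts, checked against the slide’s example (16 “Default” vs. 14 “Not default”):

import math

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts (0 * log2(0) is treated as 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# The slide's population: 16 cases of "Default" and 14 of "Not default".
print(entropy([16, 14]))   # ~0.997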
Entropy Exercise

A dataset is composed of 10 cases of class “Positive” and 10 cases of class “Negative”. Entropy = ?

A dataset is composed of 0 cases of class “Positive” and 20 cases of class “Negative”. Entropy = ?

[Figure: the two-class entropy function, with entropy (0 to 1) plotted against the % of one class.]

 Tip: 0 log₂ 0 = 0
27
Information Gain (based on Entropy)

The information gain is the reduction in entropy after a dataset is split on an attribute.

Information Gain = entropy(parent) − [weighted average] entropy(children)

where the weight of each child is given by the proportion of the examples in that child.

[Figure: a parent node split into two children, Balance < 50k and Balance >= 50k.]
28
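A sketch of this formula in code, reusing the entropy helper from the previous slide; the example counts in the docstring are placeholders, not numbers taken from the slides:

def information_gain(parent_counts, children_counts):
    """entropy(parent) minus the weighted average entropy of the children.

    parent_counts:   class counts at the parent node, e.g. [16, 14]
    children_counts: one list of class counts per child, e.g. [[10, 3], [6, 11]]
    """
    n = sum(parent_counts)
    weighted_children = sum(
        (sum(child) / n) * entropy(child) for child in children_counts
    )
    return entropy(parent_counts) - weighted_children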
Information Gain Example
30 instances at the parent node (entropy = 0.997).

Split on Balance (< 50k vs. >= 50k): one child receives 17 instances and the other 13 instances.

(Weighted) average impurity of children = (17/30) × entropy(child with 17) + (13/30) × entropy(child with 13) = 0.615

Information Gain = 0.997 − 0.615 = 0.382
29
What If We Split Over “Age” First?
30 instances at the parent node.

Split on Age: Age < 45 → 15 instances; Age >= 45 → 15 instances.

Impurity of each child = ?  (Exercise!)

Information Gain = ?
(Recall: the gain from first splitting on “Balance” was 0.382.)
30
Our Original Question

Now we are ready to answer the anxiously awaited question: how to choose which attribute to split over?

Answer: at each node, choose the attribute that obtains the maximum information gain!

31
Decision Tree Algorithm (Full Tree)

 Step 1: Calculate the information gain from splitting over each attribute, using the dataset.
 Step 2: Split the set into subsets using the attribute for which the information gain is maximum.
 Step 3: Make a decision tree node containing that attribute, divide the dataset by its branches, and repeat the same process on every node.
 Step 4a: A node with entropy of 0 is a leaf.
 Step 4b: A node with entropy more than 0 needs further splitting.

32
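A rough ID3-style sketch of these steps for categorical attributes, reusing the entropy and information_gain helpers sketched on the earlier slides; it illustrates the procedure above and is not the exact implementation used in the course:

from collections import Counter

def build_tree(rows, attributes, target):
    """rows: list of dicts; attributes: candidate attribute names; target: name of the label column."""
    labels = [row[target] for row in rows]

    # Step 4a: a node with entropy 0 (all labels identical) is a leaf;
    # we also stop if no candidate attributes remain (an extra guard not on the slide).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Step 1: information gain of splitting on each attribute.
    def gain(attr):
        parent = list(Counter(labels).values())
        values = set(row[attr] for row in rows)
        children = [
            list(Counter(row[target] for row in rows if row[attr] == v).values())
            for v in values
        ]
        return information_gain(parent, children)

    # Step 2: pick the attribute with maximum information gain.
    best = max(attributes, key=gain)

    # Steps 3 and 4b: make a node for that attribute and recurse on every branch.
    node = {best: {}}
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, target)
    return node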
