
CS 60050

Machine Learning

Decision Tree Classifier

Slides taken from course materials of Tan, Steinbach, Kumar


Illustrating Classification Task

Training Set (Tid, Attrib1, Attrib2, Attrib3, Class):
 1   Yes  Large   125K  No
 2   No   Medium  100K  No
 3   No   Small    70K  No
 4   Yes  Medium  120K  No
 5   No   Large    95K  Yes
 6   No   Medium   60K  No
 7   Yes  Large   220K  No
 8   No   Small    85K  Yes
 9   No   Medium   75K  No
 10  No   Small    90K  Yes

Test Set (Tid, Attrib1, Attrib2, Attrib3, Class):
 11  No   Small    55K  ?
 12  Yes  Medium   80K  ?
 13  Yes  Large   110K  ?
 14  No   Small    95K  ?
 15  No   Large    67K  ?

[Figure: a learning algorithm is applied to the Training Set to learn a model
(Induction); the learned Model is then applied to the Test Set to predict the
unknown class labels (Deduction).]
Intuition behind a decision tree

● Ask a series of questions about a given record


– Each question is about one of the attributes
– Answer to one question decides what question to ask
next (or if a next question is needed)
– Continue asking questions until we can infer the class
of the given record
Example of a Decision Tree

Training Data (Tid, Refund, Marital Status, Taxable Income, Cheat):
 1   Yes  Single    125K  No
 2   No   Married   100K  No
 3   No   Single     70K  No
 4   Yes  Married   120K  No
 5   No   Divorced   95K  Yes
 6   No   Married    60K  No
 7   Yes  Divorced  220K  No
 8   No   Single     85K  Yes
 9   No   Married    75K  No
 10  No   Single     90K  Yes

Model: Decision Tree (splitting attributes shown at the internal nodes)

 Refund?
 ├─ Yes → NO
 └─ No  → MarSt?
          ├─ Single, Divorced → TaxInc?
          │                     ├─ < 80K → NO
          │                     └─ > 80K → YES
          └─ Married → NO


Structure of a decision tree

● Decision tree: hierarchical structure


– One root node: no incoming edge, zero or more
outgoing edges
– Internal nodes: exactly one incoming edge, two or
more outgoing edges
– Leaf or terminal nodes: exactly one incoming edge, no
outgoing edge
● Each leaf node is assigned a class label
● Each non-leaf node contains a test condition on
one of the attributes
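
To make the structure concrete, here is a minimal sketch (not from the slides) of how such a hierarchical structure could be represented in Python; the names TreeNode, attribute, children and label are illustrative choices, not part of any standard API:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeNode:
    """Illustrative decision-tree node.

    A leaf node carries a class label; an internal node carries a test on one
    attribute and a mapping from test outcomes to child nodes.
    """
    attribute: Optional[str] = None               # attribute tested here (None for a leaf)
    children: dict = field(default_factory=dict)  # test outcome -> child TreeNode
    label: Optional[str] = None                   # class label (set only for leaf nodes)

    def is_leaf(self) -> bool:
        return not self.children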
Applying a Decision Tree Classifier

[Figure: the same Training Set and Test Set as before. A Tree Induction
algorithm learns a Decision Tree from the Training Set (Induction); the
Decision Tree is then applied to the Test Set to predict the unknown class
labels (Deduction).]
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Once a decision tree has been constructed (learned), it is easy to apply it to
test data. Start from the root of the tree and follow the branch that matches
the test record at each node:

 Refund?
 ├─ Yes → NO
 └─ No  → MarSt?
          ├─ Single, Divorced → TaxInc?
          │                     ├─ < 80K → NO
          │                     └─ > 80K → YES
          └─ Married → NO

 1. Refund = No, so follow the “No” branch to the MarSt node.
 2. Marital Status = Married, so follow the “Married” branch.
 3. The branch ends at a leaf labeled NO: assign Cheat to “No”.
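
As an illustration, the walkthrough above can be written out as a few lines of Python. This is a hand-coded sketch of the Refund / MarSt / TaxInc tree from the slides; the dictionary keys are assumptions made for this example:

def classify(record):
    """Walk the Refund / MarSt / TaxInc tree from the slides for one record.

    `record` is assumed to be a dict with keys 'Refund', 'MarSt', 'TaxInc'
    (key names chosen for this sketch).
    """
    if record["Refund"] == "Yes":
        return "No"                                # Refund = Yes leaf: NO
    if record["MarSt"] == "Married":
        return "No"                                # Married branch leaf: NO
    # Single or Divorced: test Taxable Income
    return "No" if record["TaxInc"] < 80 else "Yes"

# The test record from the slides: Refund = No, Married, income 80K.
print(classify({"Refund": "No", "MarSt": "Married", "TaxInc": 80}))  # -> No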
Learning a Decision Tree Classifier

[Figure: the same Training Set / Test Set picture as before. A Tree Induction
algorithm learns a Decision Tree from the Training Set; the tree is then
applied to the Test Set.]

How to learn a decision tree?
A Decision Tree (seen earlier)

 Refund?
 ├─ Yes → NO
 └─ No  → MarSt?
          ├─ Single, Divorced → TaxInc?
          │                     ├─ < 80K → NO
          │                     └─ > 80K → YES
          └─ Married → NO

(Training data: the ten-record Refund / Marital Status / Taxable Income / Cheat
table shown earlier.)


Another Decision Tree on same dataset

 MarSt?
 ├─ Married → NO
 └─ Single, Divorced → Refund?
                       ├─ Yes → NO
                       └─ No  → TaxInc?
                                ├─ < 80K → NO
                                └─ > 80K → YES

There could be more than one tree that fits the same data!
Challenge in learning decision tree

● Exponentially many decision trees can be constructed from a given set of
  attributes
  – Some of the trees are more ‘accurate’ or better classifiers than the others
  – Finding the optimal tree is computationally infeasible
● Efficient algorithms are available to learn a reasonably accurate (although
  potentially suboptimal) decision tree in reasonable time
  – They employ a greedy strategy
  – Locally optimal choices are made about which attribute to use next to
    partition the data
Decision Tree Induction

● Many Algorithms:
– Hunt’s Algorithm (one of the earliest)
– CART
– ID3, C4.5
– SLIQ, SPRINT
General Structure of Hunt’s Algorithm

● Let Dt be the set of training records that reach a node t
● General Procedure:
  – If Dt contains records that all belong to the same class yt, then t is a
    leaf node labeled as yt
  – If Dt is an empty set, then t is a leaf node labeled by the default
    class yd
  – If Dt contains records that belong to more than one class, use an
    attribute test to split the data into smaller subsets. Recursively apply
    the procedure to each subset.

[The slide shows the ten-record Refund / Marital Status / Taxable Income /
Cheat training set as the data Dt reaching the root node.]
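
A minimal Python sketch of this general procedure is shown below. It assumes records are (attribute-dict, label) pairs and stubs out the attribute-selection step, which the following slides cover (Gini, entropy, etc.); all names here are illustrative:

from collections import Counter

def choose_attribute(records, attributes):
    # Placeholder: a real implementation would pick the attribute whose split
    # minimizes impurity (Gini, entropy, ...); here we just take the first one.
    return attributes[0]

def hunt(records, attributes, default_class):
    """Minimal sketch of Hunt's general procedure (not an optimized implementation).

    `records` is a list of (attribute_dict, class_label) pairs;
    `attributes` is the list of attribute names still available for testing.
    """
    if not records:                                   # empty D_t: leaf with default class
        return {"label": default_class}
    labels = [y for _, y in records]
    if len(set(labels)) == 1:                         # all records in one class y_t
        return {"label": labels[0]}
    if not attributes:                                # no test left: majority class
        return {"label": Counter(labels).most_common(1)[0][0]}

    attr = choose_attribute(records, attributes)
    node = {"attribute": attr, "children": {}}
    majority = Counter(labels).most_common(1)[0][0]
    remaining = [a for a in attributes if a != attr]
    for value in {x[attr] for x, _ in records}:       # one child per observed value
        subset = [(x, y) for x, y in records if x[attr] == value]
        node["children"][value] = hunt(subset, remaining, majority)
    return node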
Hunt’s Algorithm

(Applied to the ten-record Refund / Marital Status / Taxable Income / Cheat
training set shown earlier.)

Step 1: Start with a single leaf node predicting “Don’t Cheat”. The default
class is “Don’t Cheat” since it is the majority class in the dataset.

Step 2: Split on Refund. For now, assume that “Refund” has been decided to be
the best attribute for splitting in some way (to be discussed soon).
 Refund?
 ├─ Yes → Don’t Cheat
 └─ No  → Don’t Cheat

Step 3: The “Refund = No” subset still contains both classes, so split it on
Marital Status.
 Refund?
 ├─ Yes → Don’t Cheat
 └─ No  → Marital Status?
          ├─ Single, Divorced → Cheat
          └─ Married → Don’t Cheat

Step 4: The “Single, Divorced” subset still contains both classes, so split it
on Taxable Income.
 Refund?
 ├─ Yes → Don’t Cheat
 └─ No  → Marital Status?
          ├─ Single, Divorced → Taxable Income?
          │                     ├─ < 80K  → Don’t Cheat
          │                     └─ >= 80K → Cheat
          └─ Married → Don’t Cheat
Tree Induction

● Greedy strategy
  – Split the records based on an attribute test that optimizes a certain
    criterion

● Issues
  – Determine how to split the records
    • How to specify the attribute test condition?
    • How to determine the best split?
  – Determine when to stop splitting


How to Specify Test Condition?

● Depends on attribute types


– Nominal: two or more distinct values (special
case: binary) E.g., marital status: {single,
divorced, married}
– Ordinal: two or more distinct values that have
an ordering. E.g. shirt size: {S, M, L, XL}
– Continuous: continuous range of values
● Depends on number of ways to split
– 2-way split
– Multi-way split
Splitting Based on Nominal Attributes

● Multi-way split: Use as many partitions as distinct values.
   CarType → {Family}, {Sports}, {Luxury}

● Binary split: Divides values into two subsets. Need to find the optimal
  partitioning.
   CarType → {Sports, Luxury} vs {Family}   OR   CarType → {Family, Luxury} vs {Sports}
Splitting Based on Ordinal Attributes

● Multi-way split: Use as many partitions as distinct values.
   Size → {Small}, {Medium}, {Large}

● Binary split: Divides values into two subsets. Need to find the optimal
  partitioning.
   Size → {Small, Medium} vs {Large}   OR   Size → {Small} vs {Medium, Large}

● What about this split?   Size → {Small, Large} vs {Medium}
  (This grouping does not respect the ordering of the values.)
Splitting Based on Continuous Attributes

● Different ways of handling
  – Discretization to form an ordinal categorical attribute
    • Static – discretize once at the beginning
    • Dynamic – ranges can be found by equal-interval bucketing,
      equal-frequency bucketing (percentiles), or clustering
  – Binary Decision: (A < v) or (A ≥ v)
    • consider all possible splits and find the best cut
    • can be more compute intensive
Splitting Based on Continuous Attributes

 (i)  Binary split:    Taxable Income > 80K?  → Yes / No
 (ii) Multi-way split: Taxable Income? → < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K


Tree Induction

● Greedy strategy
  – Split the records based on an attribute test that optimizes a certain
    criterion

● Issues
  – Determine how to split the records
    • How to specify the attribute test condition?
    • How to determine the best split?
  – Determine when to stop splitting


What is meant by “determine best split”

Before Splitting: 10 records of class 0, 10 records of class 1

Three candidate test conditions:

 Own Car?     Yes: C0 = 6, C1 = 4    No: C0 = 4, C1 = 6
 Car Type?    Family: C0 = 1, C1 = 3   Sports: C0 = 8, C1 = 0   Luxury: C0 = 1, C1 = 7
 Student ID?  c1 … c10: C0 = 1, C1 = 0 each   c11 … c20: C0 = 0, C1 = 1 each

Which test condition is the best?


How to determine the Best Split

● Greedy approach:
  – Nodes with homogeneous class distribution are preferred
● Need a measure of node impurity:

   C0: 5, C1: 5  →  Non-homogeneous, high degree of impurity
   C0: 9, C1: 1  →  Homogeneous, low degree of impurity
Measures of Node Impurity

● Gini Index

● Entropy

● Misclassification error
How to Find the Best Split

Before Splitting: class counts C0 = N00, C1 = N01, with impurity M0.

Candidate attribute A splits the records into nodes N1 and N2; candidate
attribute B splits them into nodes N3 and N4:

 A?  Yes → Node N1 (C0 = N10, C1 = N11), impurity M1
     No  → Node N2 (C0 = N20, C1 = N21), impurity M2
     Weighted impurity of children: M12

 B?  Yes → Node N3 (C0 = N30, C1 = N31), impurity M3
     No  → Node N4 (C0 = N40, C1 = N41), impurity M4
     Weighted impurity of children: M34

Gain = M0 – M12 vs M0 – M34: choose the split with the larger gain.
Measures of Node Impurity

● Gini Index

● Entropy

● Misclassification error
Measure of Impurity: GINI Index

● Gini Index for a given node t:

   GINI(t) = 1 - \sum_j [p(j|t)]^2

  p(j | t) is the relative frequency of class j at node t


Examples for computing GINI

GINI(t) = 1 - \sum_j [p(j|t)]^2

 C1 = 0, C2 = 6:  P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                  Gini = 1 – P(C1)^2 – P(C2)^2 = 1 – 0 – 1 = 0

 C1 = 1, C2 = 5:  P(C1) = 1/6,  P(C2) = 5/6
                  Gini = 1 – (1/6)^2 – (5/6)^2 = 0.278

 C1 = 2, C2 = 4:  P(C1) = 2/6,  P(C2) = 4/6
                  Gini = 1 – (2/6)^2 – (4/6)^2 = 0.444
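
These computations can be checked with a small helper; gini here takes the per-class record counts of a node (an illustrative function, not from the slides):

def gini(counts):
    """Gini index of a node, given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini([0, 6]))            # 0.0
print(round(gini([1, 5]), 3))  # 0.278
print(round(gini([2, 4]), 3))  # 0.444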
Measure of Impurity: GINI Index

● Gini Index for a given node t:

   GINI(t) = 1 - \sum_j [p(j|t)]^2

  p(j | t) is the relative frequency of class j at node t

  – Maximum (1 - 1/nc) when records are equally distributed among all classes,
    implying least interesting information [nc: number of classes]
  – Minimum (0.0) when all records belong to one class, implying most
    interesting information

   C1 = 0, C2 = 6: Gini = 0.000     C1 = 1, C2 = 5: Gini = 0.278
   C1 = 2, C2 = 4: Gini = 0.444     C1 = 3, C2 = 3: Gini = 0.500
Splitting Based on GINI

● Used in CART, SLIQ, SPRINT.
● When a node p is split into k partitions (children), the quality of the
  split is computed as

   GINI_split = \sum_{i=1}^{k} (n_i / n) GINI(i)

  where n_i = number of records at child i, and n = number of records at
  node p.
Binary Attributes: Computing GINI Index

● Splits into two partitions
● Effect of weighing partitions:
  – Larger and purer partitions are sought for

 Parent: C1 = 6, C2 = 6, Gini = 0.500

 Split on B?  Yes → Node N1: C1 = 5, C2 = 2
              No  → Node N2: C1 = 1, C2 = 4

 Gini(N1) = 1 – (5/7)^2 – (2/7)^2 = 0.408
 Gini(N2) = 1 – (1/5)^2 – (4/5)^2 = 0.320
 Gini(Children) = 7/12 × 0.408 + 5/12 × 0.320 = 0.371
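
A small sketch reproducing this computation; gini_split takes one per-class count list per child and returns the weighted Gini of the split (names are illustrative):

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Weighted Gini of a split; `children` is a list of per-class count lists."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# Split B from the slide: N1 = (C1=5, C2=2), N2 = (C1=1, C2=4).
print(round(gini_split([[5, 2], [1, 4]]), 3))  # 0.371  (parent Gini was 0.500)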
Categorical Attributes: Computing Gini Index

● For each distinct value, gather counts for each class in the dataset
● Use the count matrix to make decisions

 Multi-way split (Family / Sports / Luxury):
   C1: 1 / 2 / 1,  C2: 4 / 1 / 1            Gini = 0.393

 Two-way split {Sports, Luxury} vs {Family} (find best partition of values):
   C1: 3 / 1,  C2: 2 / 4                    Gini = 0.400

 Two-way split {Family, Luxury} vs {Sports}:
   C1: 2 / 2,  C2: 1 / 5                    Gini = 0.419
Continuous Attributes: Computing Gini Index

● Use binary decisions based on one value, e.g. Taxable Income > 80K? (Yes/No)
● Several choices for the splitting value
  – Number of possible splitting values = number of distinct values
● Each splitting value v has a count matrix associated with it
  – Class counts in each of the partitions, A < v and A ≥ v
● Simple method to choose the best v
  – For each v, scan the database to gather the count matrix and compute its
    Gini index
  – Computationally inefficient! Repetition of work.

[The slide shows the ten-record Refund / Marital Status / Taxable Income /
Cheat training set next to the binary test “Taxable Income > 80K?”.]
Continuous Attributes: Computing Gini Index...

● For efficient computation: for each attribute,
  – Sort the attribute on values
  – Linearly scan these values, each time updating the count matrix and
    computing the Gini index
  – Choose the split position that has the least Gini index

 Cheat:                 No   No   No   Yes  Yes  Yes  No   No   No   No
 Sorted Taxable Income: 60   70   75   85   90   95   100  120  125  220

 Candidate split positions v (midpoints):
   55   65   72   80   87   92   97   110  122  172  230

 Class counts in the two partitions (≤ v / > v) for each position:
   Yes: 0/3  0/3  0/3  0/3  1/2  2/1  3/0  3/0  3/0  3/0  3/0
   No:  0/7  1/6  2/5  3/4  3/4  3/4  3/4  4/3  5/2  6/1  7/0

 Gini: 0.420 0.400 0.375 0.343 0.417 0.400 0.300 0.343 0.375 0.400 0.420

 The best split position is v = 97, with Gini = 0.300.
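
A rough Python sketch of this sort-and-scan procedure, applied to the Taxable Income column of the earlier training set. It only tries midpoints between adjacent distinct values (the end positions in the table above split off an empty partition), and all names are illustrative:

def best_split(values, labels):
    """Scan candidate cut points for a continuous attribute (sketch).

    Sorts the records by attribute value, tries the midpoint between each pair
    of adjacent distinct values, and returns the cut with the lowest weighted
    Gini index.
    """
    def gini(counts):
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    best = (None, float("inf"))
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                                  # no cut between equal values
        cut = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [sum(1 for v, y in pairs[: i + 1] if y == c) for c in classes]
        right = [sum(1 for v, y in pairs[i + 1:] if y == c) for c in classes]
        n = len(pairs)
        w = (i + 1) / n * gini(left) + (n - i - 1) / n * gini(right)
        if w < best[1]:
            best = (cut, w)
    return best

income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
cheat  = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_split(income, cheat))  # cut near 97 with weighted Gini 0.300, as in the table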
Measures of Node Impurity

● Gini Index

● Entropy

● Misclassification error
Alternative Splitting Criteria based on INFO

● Entropy at a given node t:

   Entropy(t) = - \sum_j p(j|t) \log_2 p(j|t)

  p(j | t) is the relative frequency of class j at node t

● Measures homogeneity of a node


Examples for computing Entropy

Entropy(t) = - \sum_j p(j|t) \log_2 p(j|t)

 C1 = 0, C2 = 6:  P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                  Entropy = – 0 log2 0 – 1 log2 1 = – 0 – 0 = 0

 C1 = 1, C2 = 5:  P(C1) = 1/6,  P(C2) = 5/6
                  Entropy = – (1/6) log2 (1/6) – (5/6) log2 (5/6) = 0.65

 C1 = 2, C2 = 4:  P(C1) = 2/6,  P(C2) = 4/6
                  Entropy = – (2/6) log2 (2/6) – (4/6) log2 (4/6) = 0.92
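
These values can be verified with a short helper (illustrative, not from the slides); the convention 0 log 0 = 0 is applied by skipping empty classes:

from math import log2

def entropy(counts):
    """Entropy of a node, given its per-class record counts."""
    n = sum(counts)
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)

print(entropy([0, 6]))            # 0.0
print(round(entropy([1, 5]), 2))  # 0.65
print(round(entropy([2, 4]), 2))  # 0.92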
Alternative Splitting Criteria based on INFO

● Entropy at a given node t:

   Entropy(t) = - \sum_j p(j|t) \log_2 p(j|t)

  p(j | t) is the relative frequency of class j at node t

● Measures homogeneity of a node
  • Maximum (log nc) when records are equally distributed among all classes,
    implying least information
  • Minimum (0.0) when all records belong to one class, implying most
    information
Splitting Based on INFO...

● Information Gain:

   GAIN_split = Entropy(p) - \sum_{i=1}^{k} (n_i / n) Entropy(i)

  Parent node p is split into k partitions;
  n_i is the number of records in partition i

  – Measures Reduction in Entropy achieved because of the split. Choose the
    split that achieves most reduction (maximizes GAIN)
  – Used in ID3 and C4.5
  – Disadvantage: Tends to prefer splits that result in large number of
    partitions, each being small but pure.
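
A small sketch of the GAIN_split computation, using per-class counts for the parent and for each child; the (10, 10) → (8, 2), (2, 8) split is a made-up example, not data from the slides:

from math import log2

def entropy(counts):
    n = sum(counts)
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child_i)."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in children_counts)
    return entropy(parent_counts) - weighted

# Hypothetical split of a (10, 10) parent into (8, 2) and (2, 8) children.
print(round(information_gain([10, 10], [[8, 2], [2, 8]]), 3))  # ~0.278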
Splitting Based on INFO...

● Gain Ratio:

   GainRATIO_split = GAIN_split / SplitINFO

   SplitINFO = - \sum_{i=1}^{k} (n_i / n) \log_2 (n_i / n)

  Parent node p is split into k partitions;
  n_i is the number of records in partition i

  – Adjusts Information Gain by the entropy of the partitioning (SplitINFO).
    Higher entropy partitioning (large number of small partitions) is
    penalized!
  – Used in C4.5
  – Designed to overcome the disadvantage of Information Gain
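
A minimal sketch of SplitINFO and GainRATIO (base-2 logarithm assumed, as in the entropy formula). The numbers below are made up to show how the same gain is penalized more heavily as the number of partitions grows:

from math import log2

def split_info(children_sizes):
    """SplitINFO = -sum_i (n_i / n) * log2(n_i / n)."""
    n = sum(children_sizes)
    return sum(-(ni / n) * log2(ni / n) for ni in children_sizes if ni > 0)

def gain_ratio(gain, children_sizes):
    return gain / split_info(children_sizes)

# The same gain of 0.278 looks less attractive once spread over 4 equal
# partitions than over 2, because SplitINFO grows with the number of parts.
print(round(gain_ratio(0.278, [10, 10]), 3))      # SplitINFO = 1.0 -> 0.278
print(round(gain_ratio(0.278, [5, 5, 5, 5]), 3))  # SplitINFO = 2.0 -> 0.139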
Measures of Node Impurity

● Gini Index

● Entropy

● Misclassification error
Splitting Criteria based on Classification Error

● Classification error at a node t:

   Error(t) = 1 - \max_i P(i|t)

  P(i | t) is the relative frequency of class i at node t

● Measures misclassification error made by a node


Examples for Computing Error

Error(t) = 1 - \max_i P(i|t)

 C1 = 0, C2 = 6:  P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                  Error = 1 – max(0, 1) = 1 – 1 = 0

 C1 = 1, C2 = 5:  P(C1) = 1/6,  P(C2) = 5/6
                  Error = 1 – max(1/6, 5/6) = 1 – 5/6 = 1/6

 C1 = 2, C2 = 4:  P(C1) = 2/6,  P(C2) = 4/6
                  Error = 1 – max(2/6, 4/6) = 1 – 4/6 = 1/3
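
The same three node examples, checked with a small helper (illustrative):

def classification_error(counts):
    """Error(t) = 1 - max_i P(i|t), from per-class record counts."""
    n = sum(counts)
    return 1.0 - max(c / n for c in counts)

print(classification_error([0, 6]))            # 0.0
print(round(classification_error([1, 5]), 3))  # 0.167  (= 1/6)
print(round(classification_error([2, 4]), 3))  # 0.333  (= 1/3)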
Splitting Criteria based on Classification Error

● Classification error at a node t:

   Error(t) = 1 - \max_i P(i|t)

● Measures misclassification error made by a node
  • Maximum (1 - 1/nc) when records are equally distributed among all classes,
    implying least interesting information
  • Minimum (0.0) when all records belong to one class, implying most
    interesting information
Comparison among Splitting Criteria

For a 2-class problem:

[Figure: Gini index, Entropy and Misclassification error plotted as functions
of the fraction p of records in one class; all three peak at p = 0.5 and are
zero at p = 0 and p = 1.]


Tree Induction

● Greedy strategy
  – Split the records based on an attribute test that optimizes a certain
    criterion

● Issues
  – Determine how to split the records
    • How to specify the attribute test condition?
    • How to determine the best split?
  – Determine when to stop splitting


Stopping Criteria for Tree Induction

● Stop expanding a node when all the records belong to the same class

● Stop expanding a node when all the records have similar attribute values
  (if different class values, then usually assign the majority class)

● Early termination, usually to prevent overfitting (to be discussed later)
DT classification: points to note

● Finding an optimal DT is NP-complete, but efficient and fast heuristic
  methods are available

● Advantages:
  – Extremely fast at classifying unknown records
  – Easy to interpret, especially for small-sized trees
  – Accuracy is comparable to other classification techniques for many simple
    data sets
DT classification: points to note

● In what we discussed till now, the test condition always involved a single
  attribute
  – Decision boundaries are ‘rectilinear’, i.e., parallel to the ‘coordinate
    axes’ of the feature space
  – Limits the expressiveness of DTs
● Oblique DTs – allow test conditions that involve more than one attribute
  (e.g., x + y < 1)
  – Better expressiveness
  – But finding a good tree is computationally more expensive
Decision Boundary

[Figure: two classes of points in the unit square (x, y ∈ [0, 1]), classified
by a tree that first tests x < 0.43 and then tests y (y < 0.33 on one branch,
y < 0.47 on the other); each leaf region contains points of a single class.]

• The border line between two neighboring regions of different classes is
  known as the decision boundary
• The decision boundary is parallel to the axes because each test condition
  involves a single attribute at a time
Oblique Decision Trees

[Figure: a dataset separated by the single oblique test condition x + y < 1
into a “Class = +” region and a second class region.]

• Test condition may involve multiple attributes
• More expressive representation
• Finding optimal test condition is computationally expensive
Example: C4.5

● Simple depth-first construction.


● Uses Information Gain
● Sorts Continuous Attributes at each node.
● Needs entire data to fit in memory.
● Unsuitable for Large Datasets.
– Needs out-of-core sorting.

● You can download the software from:


http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz
Practical issues of Decision Tree classifier
Underfitting and Overfitting (Example)

500 circular and 500 triangular data points.

Circular points:   0.5 ≤ sqrt(x1^2 + x2^2) ≤ 1
Triangular points: sqrt(x1^2 + x2^2) < 0.5 or sqrt(x1^2 + x2^2) > 1
Underfitting and Overfitting

[Figure: training and test error curves; the test error starts rising again
once the tree grows too large (overfitting).]

Underfitting: when the DT is too simple, both training and test errors are large
Overfitting: the DT has grown too large, and is now fitting the noise in the dataset
Overfitting

● Overfitting
results in decision trees that are more
complex than necessary

● Training
error no longer provides a good estimate
of how well the tree will perform on previously
unseen records
Overfitting due to Noise

Decision boundary is distorted by noise point


Overfitting due to Insufficient Examples

Lack of data points in the lower half of the diagram makes it difficult
to predict correctly the class labels of that region
- Insufficient number of training records in the region causes the
decision tree to predict the test examples using other training
records that are irrelevant to the classification task
Occam’s Razor

● Given two models of similar generalization errors,


one should prefer the simpler model over the
more complex model

● For complex models, there is a greater chance that the model was fitted
  accidentally by errors in the data

● Therefore, one should include model complexity


when evaluating a model
Minimum Description Length (MDL)

[Figure: a sender knows both the attribute values X and the class labels y of
the records; the receiver knows only X. The sender can either transmit the
labels directly or transmit a decision tree (model) plus the records it
misclassifies.]

● Cost(Model, Data) = Cost(Data | Model) + Cost(Model)
  – Cost is the number of bits needed for encoding.
  – Search for the least costly model.
● Cost(Data | Model) encodes the misclassification errors.
● Cost(Model) uses node encoding (number of children) plus splitting condition
  encoding.
How to Address Overfitting

● Pre-Pruning (Early Stopping Rule)
  – Stop the algorithm before it becomes a fully-grown tree
  – Typical stopping conditions for a node:
    • Stop if all instances belong to the same class
    • Stop if all the attribute values are the same
  – More restrictive conditions:
    • Stop if the number of instances is less than some user-specified
      threshold
    • Stop if expanding the current node does not improve impurity measures
      (e.g., Gini or information gain)
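
As a concrete (though library-specific) illustration, scikit-learn's DecisionTreeClassifier exposes pre-pruning controls that correspond to these stopping rules. A minimal sketch, assuming scikit-learn is installed; the dataset and threshold values are arbitrary choices for the example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    max_depth=3,                 # early stopping: do not grow past depth 3
    min_samples_split=10,        # stop if a node has fewer than 10 records
    min_impurity_decrease=0.01,  # stop if the best split barely reduces impurity
    random_state=0,
).fit(X_tr, y_tr)

print(tree.get_depth(), tree.score(X_te, y_te))  # depth of the pruned tree, test accuracy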
How to Address Overfitting…

● Post-pruning
– Grow decision tree to its entirety
– Trim the nodes of the decision tree in a bottom-up
fashion
– If generalization error improves after trimming, replace
sub-tree by a leaf node.
– Class label of leaf node is determined from majority
class of instances in the sub-tree
– Can use MDL for post-pruning
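
For comparison, scikit-learn post-prunes with cost-complexity pruning (the ccp_alpha parameter) rather than MDL, but the workflow matches the slide: grow the tree to its entirety, then trim sub-trees and keep the version that generalizes best on held-out data. A minimal sketch under those assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
path = full.cost_complexity_pruning_path(X_tr, y_tr)   # candidate pruning strengths

# Refit with each alpha and keep the tree that does best on the validation split.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_val, y_val),
)
print(full.get_n_leaves(), "->", best.get_n_leaves())   # the pruned tree is smaller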
Other Issues

● Data Fragmentation
● Search Strategy
● Expressiveness
● Tree Replication
Data Fragmentation

● Number of instances gets smaller as you traverse


down the tree

● Number of instances at the leaf nodes could be


too small to make any statistically significant
decision
Search Strategy

● Finding an optimal decision tree is NP-hard

● The algorithm presented so far uses a greedy, top-down, recursive
  partitioning strategy to induce a reasonable solution

● Other strategies?
  – Bottom-up
  – Bi-directional
Expressiveness

● Decision trees provide an expressive representation for learning
  discrete-valued functions
  – But they do not generalize well to certain types of Boolean functions
    • Example: parity function:
      – Class = 1 if there is an even number of Boolean attributes with truth
        value = True
      – Class = 0 if there is an odd number of Boolean attributes with truth
        value = True
    • For accurate modeling, must have a complete tree

● Not expressive enough for modeling continuous variables
  – Particularly when the test condition involves only a single attribute at
    a time
Tree Replication

[Figure: a tree rooted at P with children Q and R; the same subtree (rooted
at S) appears under both branches.]

• Same subtree appears in multiple branches
