Decision Trees Example Problem

Consider the following data, where the Y label is whether or not the child goes out to play.

Day  Weather  Temperature  Humidity  Wind    Play?
 1   Sunny    Hot          High      Weak    No
 2   Cloudy   Hot          High      Weak    Yes
 3   Sunny    Mild         Normal    Strong  Yes
 4   Cloudy   Mild         High      Strong  Yes
 5   Rainy    Mild         High      Strong  No
 6   Rainy    Cool         Normal    Strong  No
 7   Rainy    Mild         High      Weak    Yes
 8   Sunny    Hot          High      Strong  No
 9   Cloudy   Hot          Normal    Weak    Yes
10   Rainy    Mild         High      Strong  No


Step 1: Calculate the IG (information gain) for each attribute (feature)

Initial entropy = H(Y) = − Σ_y P(Y = y) log2 P(Y = y)

= −P(Y = yes) log2 P(Y = yes) − P(Y = no) log2 P(Y = no)

= −(0.5) log2(0.5) − (0.5) log2(0.5)

= 1
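
To check this arithmetic in code, here is a minimal Python sketch (the helper name entropy and the label list play are our own, introduced only for illustration):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    # The Play? column for days 1-10, copied from the table above.
    play = ["No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"]
    print(entropy(play))  # 1.0  (5 Yes, 5 No)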

Temperature:

    HOT:  N, Y, N, Y
    MILD: Y, Y, N, Y, N
    COOL: N

Total entropy of this division is:

H(Y | temp) = − Σ_x P(temp = x) Σ_y P(Y = y | temp = x) log2 P(Y = y | temp = x)

= −(P(temp = H) Σ_y P(Y = y | temp = H) log2 P(Y = y | temp = H)
   + P(temp = M) Σ_y P(Y = y | temp = M) log2 P(Y = y | temp = M)
   + P(temp = C) Σ_y P(Y = y | temp = C) log2 P(Y = y | temp = C))

= −((0.4)((1/2) log2(1/2) + (1/2) log2(1/2))
   + (0.5)((3/5) log2(3/5) + (2/5) log2(2/5))
   + (0.1)((1) log2(1) + (0) log2(0)))
= 0.8855

IG(Y, temp) = 1 − 0.8855 = 0.1145
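
Continuing the sketch above, the same gain can be reproduced with a small helper for the conditional entropy (information_gain and the temp list are again names we introduce here, not part of the original problem):

    def information_gain(feature_values, labels):
        """IG(Y, feature) = H(Y) - sum over x of P(feature = x) * H(Y | feature = x)."""
        n = len(labels)
        conditional = 0.0
        for x in set(feature_values):
            subset = [y for v, y in zip(feature_values, labels) if v == x]
            conditional += (len(subset) / n) * entropy(subset)
        return entropy(labels) - conditional

    # The Temperature column for days 1-10, copied from the table above.
    temp = ["Hot", "Hot", "Mild", "Mild", "Mild", "Cool", "Mild", "Hot", "Hot", "Mild"]
    print(information_gain(temp, play))  # about 0.1145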


Weather:

    SUNNY:  N, Y, N
    CLOUDY: Y, Y, Y
    RAINY:  N, N, Y, N

Total entropy of this division is:

H(Y | weather) = − Σ_x P(weather = x) Σ_y P(Y = y | weather = x) log2 P(Y = y | weather = x)

= −(P(weather = S) Σ_y P(Y = y | weather = S) log2 P(Y = y | weather = S)
   + P(weather = C) Σ_y P(Y = y | weather = C) log2 P(Y = y | weather = C)
   + P(weather = R) Σ_y P(Y = y | weather = R) log2 P(Y = y | weather = R))

= −((0.3)((1/3) log2(1/3) + (2/3) log2(2/3))
   + (0.3)((1) log2(1) + (0) log2(0))
   + (0.4)((1/4) log2(1/4) + (3/4) log2(3/4)))

= 0.6

IG(Y, weather) = 1 – 0.6 = 0.4


Humidity:

    HIGH:   Y, Y, Y, N, N, N, N
    NORMAL: Y, N, Y

Total entropy of this division is:

H(Y | hum) = − Σ_x P(hum = x) Σ_y P(Y = y | hum = x) log2 P(Y = y | hum = x)

= −(P(hum = H) Σ_y P(Y = y | hum = H) log2 P(Y = y | hum = H)
   + P(hum = N) Σ_y P(Y = y | hum = N) log2 P(Y = y | hum = N))

= −((0.7)((3/7) log2(3/7) + (4/7) log2(4/7))
   + (0.3)((2/3) log2(2/3) + (1/3) log2(1/3)))

= 0.9651

IG(Y, hum) = 1 − 0.9651 = 0.0349


Wind:

    STRONG: Y, Y, N, N, N, N
    WEAK:   N, Y, Y, Y

Total entropy of this division is:

H(Y | wind) = − Σ_x P(wind = x) Σ_y P(Y = y | wind = x) log2 P(Y = y | wind = x)

= −(P(wind = S) Σ_y P(Y = y | wind = S) log2 P(Y = y | wind = S)
   + P(wind = W) Σ_y P(Y = y | wind = W) log2 P(Y = y | wind = W))

= −((0.6)((2/6) log2(2/6) + (4/6) log2(4/6))
   + (0.4)((1/4) log2(1/4) + (3/4) log2(3/4)))

= 0.8755

IG(Y, wind) = 1 – 0.8755 = 0.1245

Step 2: Choose which feature to split with!

IG(Y, wind) = 0.1245

IG(Y, hum) = 0.0349

IG(Y, weather) = 0.4

IG(Y, temp) = 0.1145

Weather has the largest information gain, so the root node splits on Weather.
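
As a check, this choice can be reproduced by computing the gain of every attribute and taking the largest. A sketch, reusing entropy, information_gain and play from the snippets above (the column lists are copied from the data table):

    data = {
        "Weather":     ["Sunny", "Cloudy", "Sunny", "Cloudy", "Rainy",
                        "Rainy", "Rainy", "Sunny", "Cloudy", "Rainy"],
        "Temperature": ["Hot", "Hot", "Mild", "Mild", "Mild",
                        "Cool", "Mild", "Hot", "Hot", "Mild"],
        "Humidity":    ["High", "High", "Normal", "High", "High",
                        "Normal", "High", "High", "Normal", "High"],
        "Wind":        ["Weak", "Weak", "Strong", "Strong", "Strong",
                        "Strong", "Weak", "Strong", "Weak", "Strong"],
    }
    gains = {name: information_gain(column, play) for name, column in data.items()}
    print(gains)                      # Weather has the largest gain (0.4)
    print(max(gains, key=gains.get))  # "Weather"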


Step 3: Repeat for each level (sad, I know)

Temperature

After the Weather split:

    SUNNY:  N, Y, N
    CLOUDY: Y, Y, Y
    RAINY:  N, N, Y, N

Splitting “Sunny” on Temperature:

    HOT:  N, N
    MILD: Y
    COOL: -

Splitting “Rainy” on Temperature:

    HOT:  -
    MILD: N, Y, N
    COOL: N

Entropy of “Sunny” node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183

Entropy of its children = 0

IG = 0.9183

Entropy of “Rainy” node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113

Entropy of its children = −(3/4)((1/3) log2(1/3) + (2/3) log2(2/3)) + 0 = 0.6887

IG = 0.1226
Humidity

After the Weather split:

    SUNNY:  N, Y, N
    CLOUDY: Y, Y, Y
    RAINY:  N, N, Y, N

Splitting “Sunny” on Humidity:

    HIGH:   N, N
    NORMAL: Y

Splitting “Rainy” on Humidity:

    HIGH:   N, Y, N
    NORMAL: N

Entropy of “Sunny” node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183

Entropy of its children = 0

IG = 0.9183

Entropy of “Rainy” node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113

Entropy of its children = −(3/4)((1/3) log2(1/3) + (2/3) log2(2/3)) + 0 = 0.6887

IG = 0.1226
Wind

After the Weather split:

    SUNNY:  N, Y, N
    CLOUDY: Y, Y, Y
    RAINY:  N, N, Y, N

Splitting “Sunny” on Wind:

    STRONG: N, Y
    WEAK:   N

Splitting “Rainy” on Wind:

    STRONG: N, N, N
    WEAK:   Y

Entropy of “Sunny” node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183

Entropy of its children = −(2/3)((1/2) log2(1/2) + (1/2) log2(1/2)) + 0 = 0.6667

IG = 0.2516

Entropy of “Rainy” node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113

Entropy of its children = 0

IG = 0.8113
Step 4: Choose feature for each node to split on!

“Sunny” node:

IG(Y, temp) = IG(Y, humidity) = 0.9183
IG(Y, wind) = 0.2516

Temperature and Humidity tie (either one separates the Sunny rows perfectly); the tree below splits on Humidity.

“Rainy” node:

IG(Y, temp) = IG(Y, humidity) = 0.1226
IG(Y, wind) = 0.8113

Wind has the largest gain, so the Rainy node splits on Wind.
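
Steps 3 and 4 amount to restricting the rows to one Weather branch and recomputing the gains inside that subset. A sketch, again reusing the names defined in the earlier snippets:

    for branch in ("Sunny", "Rainy"):
        rows = [i for i, w in enumerate(data["Weather"]) if w == branch]
        branch_play = [play[i] for i in rows]
        branch_gains = {name: information_gain([column[i] for i in rows], branch_play)
                        for name, column in data.items() if name != "Weather"}
        print(branch, branch_gains)
    # Sunny: Temperature and Humidity tie at about 0.9183, Wind about 0.2516
    # Rainy: Wind about 0.8113, Temperature and Humidity about 0.1226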

Final Tree!

Weather

    SUNNY: N, Y, N → Humidity
        HIGH:   N, N → No
        NORMAL: Y → Yes
    CLOUDY: Y, Y, Y → Yes
    RAINY: N, N, Y, N → Wind
        STRONG: N, N, N → No
        WEAK:   Y → Yes
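
Written out as code, the finished tree is just nested conditionals. This is a hand-translation of the diagram above (the function name predict is ours):

    def predict(weather, humidity, wind):
        """Hand-coded version of the final decision tree."""
        if weather == "Cloudy":
            return "Yes"
        if weather == "Sunny":
            return "Yes" if humidity == "Normal" else "No"
        # Remaining case: weather == "Rainy"
        return "Yes" if wind == "Weak" else "No"

    print(predict("Rainy", "High", "Weak"))  # "Yes", matching day 7

Every row of the training table is classified correctly by this tree.
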
Boosting

(https://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/boosting.pdf)
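
The linked slides discuss boosting in detail. As a loosely related sketch (not part of the original notes), boosted decision stumps could be fit to the same toy table with scikit-learn; this assumes scikit-learn 1.2 or newer (older releases take base_estimator instead of estimator) and reuses the data and play columns defined in the earlier snippets:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.tree import DecisionTreeClassifier

    # One-hot encode the four categorical columns.
    rows = list(zip(data["Weather"], data["Temperature"], data["Humidity"], data["Wind"]))
    X = OneHotEncoder().fit_transform(rows)

    # Boost depth-1 trees (decision stumps), the classic AdaBoost setup.
    stumps = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                n_estimators=10)
    stumps.fit(X, play)
    print(stumps.score(X, play))  # training accuracy on the 10 rows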
