Data Mining

The document discusses proximity measures for binary and mixed attributes, detailing methods for calculating dissimilarity and similarity among objects based on various attribute types. It provides examples of calculations for binary attributes, nominal, ordinal, and numeric data, as well as the use of Minkowski, Euclidean, Manhattan, and cosine similarity distances. Additionally, it introduces concepts of support and confidence in the context of frequent patterns in datasets.


Chapter-2

Similarity and Dissimilarity (Proximity):

Proximity Measures for Binary Attributes: Contingency Table for Binary Attributes

For objects i and j:
q = number of attributes where i = 1 and j = 1
r = number of attributes where i = 1 and j = 0
s = number of attributes where i = 0 and j = 1
t = number of attributes where i = 0 and j = 0

1. Symmetric binary dissimilarity:  d(i,j) = (r + s) / (q + r + s + t)
2. Asymmetric binary dissimilarity: d(i,j) = (r + s) / (q + r + s)

Example :

Name Gender Fever Cough test-1 test-2 test-3 test-4

Jack M Y N P N N N

Jim M Y Y N N N N

Mary F Y N P N P N

Solution:
Name Fever Cough test-1 test-2 test-3 test-4

Jack Y (1) N (0) P (1) N (0) N (0) N (0)

Jim Y (1) Y (1) N (0) N (0) N (0) N (0)

Mary Y (1) N (0) P (1) N (0) P (1) N (0)

Asymmetric binary dissimilarity: d(i,j) = (r + s) / (q + r + s)

d(Jack, Jim):  q = 1, r = 1, s = 1  ->  d = (1 + 1) / (1 + 1 + 1) = 0.67
d(Jack, Mary): q = 2, r = 0, s = 1  ->  d = (0 + 1) / (2 + 0 + 1) = 0.33
d(Jim, Mary):  q = 1, r = 1, s = 2  ->  d = (1 + 2) / (1 + 1 + 2) = 0.75

These measurements suggest that Jim and Mary are unlikely to have a similar
disease because they have the highest dissimilarity value among the three
pairs. Of the three patients, Jack and Mary are the most likely to have a similar
disease.
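The same counting can be scripted. Below is a minimal Python sketch (the function name and variables are illustrative, not from these notes or any library) that tallies q, r and s over 0/1 vectors and reproduces the three values above.

def asymmetric_binary_dissimilarity(x, y):
    # x and y are equal-length lists of 0/1 attribute values
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    return (r + s) / (q + r + s)   # 0/0 matches (t) are ignored

jack = [1, 0, 1, 0, 0, 0]   # Fever, Cough, test-1 ... test-4 encoded as above
jim  = [1, 1, 0, 0, 0, 0]
mary = [1, 0, 1, 0, 1, 0]

print(round(asymmetric_binary_dissimilarity(jack, jim), 2))   # 0.67
print(round(asymmetric_binary_dissimilarity(jack, mary), 2))  # 0.33
print(round(asymmetric_binary_dissimilarity(jim, mary), 2))   # 0.75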

Proximity Measures for Mixed Attributes:

Identify nominal, ordinal and numeric attributes:

Nominal: categorical, with no rank or order.
  Examples: Gender (Male, Female, Other), Marital Status (Single, Married, Divorced), Color (Brown, Blue, Green)

Ordinal: categorical, with a meaningful rank or order.
  Examples: Satisfaction (Low, Medium, High), Ranks (Lieutenant, Captain, Major), Ratings (1-star, 2-star, 3-star, 4-star, 5-star)

Numeric: measurable quantities on which mathematical operations can be performed.
  Examples: Years (1990, 2000, 2020), Score (90, 100, 120)

Nominal: d(i,j) = (p - m) / p, where p = number of attributes and m = number of matches.

Ordinal: z_if = (r_if - 1) / (M_f - 1), where r_if = the rank of the value and M_f = the total number of ranks.

Numeric: d_if = |x_if - x_jf| / (max - min)

Example:
Object Identifier Test-1 Test-2 Test-3

1 Code A Excellent 45

2 Code B Fair 22

3 Code C Good 64

4 Code A Excellent 28

Test-1 -> Nominal
Test-2 -> Ordinal
Test-3 -> Numeric

For Nominal (Test-1):

d(i,j) = (p - m) / p, with p = 1 (a single nominal attribute):

d(2,1) = (1 - 0)/1 = 1
d(3,1) = (1 - 0)/1 = 1
d(3,2) = (1 - 0)/1 = 1
d(4,1) = (1 - 1)/1 = 0
d(4,2) = (1 - 0)/1 = 1
d(4,3) = (1 - 0)/1 = 1

Dissimilarity Matrix:
     1    2    3    4
1    0
2    1    0
3    1    1    0
4    0    1    1    0

[Note: even when there are two or more attributes, a single final dissimilarity matrix represents the dissimilarities between objects across all attributes.]

For Ordinal (Test-2):

z_if = (r_if - 1) / (M_f - 1), with M_f = 3 ranks:
Excellent (rank 1): z = (1 - 1)/(3 - 1) = 0
Fair (rank 2):      z = (2 - 1)/(3 - 1) = 0.5
Good (rank 3):      z = (3 - 1)/(3 - 1) = 1

Using Manhattan distance on the normalized values:
d(2,1) = |0.5 - 0| = 0.5
d(3,1) = |1 - 0| = 1
d(3,2) = |1 - 0.5| = 0.5
d(4,1) = |0 - 0| = 0
d(4,2) = |0 - 0.5| = 0.5
d(4,3) = |0 - 1| = 1

Dissimilarity Matrix:
     1     2     3     4
1    0
2    0.5   0
3    1     0.5   0
4    0     0.5   1     0

For Numeric (Test-3):

d(i,j) = |x_if - x_jf| / (max - min), with max = 64 and min = 22:

d(2,1) = |22 - 45| / (64 - 22) = 0.54
d(3,1) = |64 - 45| / (64 - 22) = 0.45
d(3,2) = |64 - 22| / (64 - 22) = 1
d(4,1) = |28 - 45| / (64 - 22) = 0.40
d(4,2) = |28 - 22| / (64 - 22) = 0.14
d(4,3) = |28 - 64| / (64 - 22) = 0.86

Dissimilarity Matrix:
     1     2     3     4
1    0
2    0.54  0
3    0.45  1     0
4    0.40  0.14  0.86  0

Final Dissimilarity Matrix (all attributes combined):

d(i,j) = [ sum over f = 1..p of delta_ij(f) * d_ij(f) ] / [ sum over f = 1..p of delta_ij(f) ]

where delta_ij(f) = 0 if x_if or x_jf is missing, or if x_if = x_jf = 0 for an asymmetric binary attribute; delta_ij(f) = 1 otherwise.

d(2,1) = (1*1 + 1*0.5 + 1*0.54) / (1 + 1 + 1) = 0.68
d(3,1) = (1*1 + 1*1 + 1*0.45) / 3 = 0.82
d(3,2) = (1*1 + 1*0.5 + 1*1) / 3 = 0.83
d(4,1) = (1*0 + 1*0 + 1*0.40) / 3 = 0.13
d(4,2) = (1*1 + 1*0.5 + 1*0.14) / 3 = 0.55
d(4,3) = (1*1 + 1*1 + 1*0.86) / 3 = 0.95

Dissimilarity Matrix:
     1     2     3     4
1    0
2    0.68  0
3    0.82  0.83  0
4    0.13  0.55  0.95  0
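The whole mixed-attribute combination can be sketched in a few lines of Python (names are illustrative, not a library API); it normalizes the ordinal and numeric attributes exactly as above and averages the per-attribute dissimilarities, assuming no missing values.

# Each object: (Test-1 nominal code, Test-2 ordinal rank, Test-3 numeric value)
# Ordinal ranks as used above: Excellent = 1, Fair = 2, Good = 3
objects = {1: ("Code A", 1, 45), 2: ("Code B", 2, 22),
           3: ("Code C", 3, 64), 4: ("Code A", 1, 28)}

M = 3                                     # number of ordinal ranks
values = [o[2] for o in objects.values()]
mx, mn = max(values), min(values)         # 64 and 22

def mixed_dissimilarity(i, j):
    a, b = objects[i], objects[j]
    d_nom = 0.0 if a[0] == b[0] else 1.0                      # (p - m) / p with p = 1
    d_ord = abs((a[1] - 1) / (M - 1) - (b[1] - 1) / (M - 1))  # |z_if - z_jf|
    d_num = abs(a[2] - b[2]) / (mx - mn)                      # |x_if - x_jf| / (max - min)
    return (d_nom + d_ord + d_num) / 3                        # every delta_ij(f) = 1 here

print(round(mixed_dissimilarity(2, 1), 2))  # ~0.68
print(round(mixed_dissimilarity(4, 1), 2))  # ~0.13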

Dissimilarity of Numeric Data:

Minkowski distance: d(i,j) = ( |x_i1 - x_j1|^h + |x_i2 - x_j2|^h + ... + |x_ip - x_jp|^h )^(1/h)

Euclidean distance (h = 2): d(i,j) = sqrt( |x_i1 - x_j1|^2 + |x_i2 - x_j2|^2 + ... + |x_ip - x_jp|^2 )

Manhattan (city block) distance (h = 1): d(i,j) = |x_i1 - x_j1| + |x_i2 - x_j2| + ... + |x_ip - x_jp|

Supremum distance (h -> infinity): d(i,j) = max( |x_i1 - x_j1|, |x_i2 - x_j2|, ..., |x_ip - x_jp| )
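A short Python sketch of these four distances (illustrative helper names, not a library API):

def minkowski(x, y, h):
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def euclidean(x, y):
    return minkowski(x, y, 2)

def manhattan(x, y):
    return minkowski(x, y, 1)

def supremum(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

# Two example points chosen here only for illustration
x, y = (1, 2), (3, 5)
print(euclidean(x, y))   # sqrt(2^2 + 3^2) = 3.605...
print(manhattan(x, y))   # 2 + 3 = 5.0
print(supremum(x, y))    # max(2, 3) = 3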

Cosine Similarity:

cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||),   where ||d|| = sqrt( sum over i of d_i^2 )

Example: Document Vector or Term-Frequency Vector

Document    team  coach  hockey  baseball  soccer  penalty  score  win  loss  season
document1   5     0      3       0         2       0        0      2    0     0
document2   3     0      2       0         1       1        0      1    0     1
document3   0     7      0       2         1       0        0      3    0     0
document4   0     1      0       0         1       2        2      0    3     0

Which pair has the most similarity and the most dissimilarity?
d1.d2= (5*3)+(0*0)+(3*2)+(0*0)+(2*1)+(0*1)+(0*0)+(2*1)+(0*0)+(0*1)
= 15 + 6 + 2 + 2
= 25

d1.d3= (5*0)+(0*7)+(3*0)+(0*2)+(2*1)+(0*0)+(0*0)+(2*3)+(0*0)+(0*0)
=2+6
=8

d1.d4= (5*0)+(0*1)+(3*0)+(0*0)+(2*1)+(0*2)+(0*2)+(2*0)+(0*3)+(0*0)
=2

d2.d3= (3*0)+(0*7)+(2*0)+(0*2)+(1*1)+(1*0)+(0*0)+(1*3)+(0*0)+(1*0)
=1+3
=4

d2.d4= (3*0)+(0*1)+(2*0)+(0*0)+(1*1)+(1*2)+(0*2)+(1*0)+(0*3)+(1*0)
=1+2
=3

d3.d4= (0*0)+(7*1)+(0*0)+(2*0)+(1*1)+(0*2)+(0*2)+(3*0)+(0*3)+(0*0)
=7+1
=8

||d1|| = sqrt(5^2 + 3^2 + 2^2 + 2^2) = sqrt(42) = 6.48
||d2|| = sqrt(3^2 + 2^2 + 1^2 + 1^2 + 1^2 + 1^2) = sqrt(17) = 4.12
||d3|| = sqrt(7^2 + 2^2 + 1^2 + 3^2) = sqrt(63) = 7.93
||d4|| = sqrt(1^2 + 1^2 + 2^2 + 2^2 + 3^2) = sqrt(19) = 4.35

Cosine similarity between document-1 and document-2:
cos(d1,d2) = (d1 . d2) / (||d1|| * ||d2||) = 25 / (6.48 * 4.12) = 0.94

Cosine similarity between document-1 and document-3:
cos(d1,d3) = 8 / (6.48 * 7.93) = 0.15

Cosine similarity between document-1 and document-4:
cos(d1,d4) = 2 / (6.48 * 4.35) = 0.07

Cosine similarity between document-2 and document-3:
cos(d2,d3) = 4 / (4.12 * 7.93) = 0.12

Cosine similarity between document-2 and document-4:
cos(d2,d4) = 3 / (4.12 * 4.35) = 0.16

Cosine similarity between document-3 and document-4:
cos(d3,d4) = 8 / (7.93 * 4.35) = 0.23

(d1, d2) is the most similar pair, with a cosine similarity of 0.94 (94%).

(d1, d4) is the most dissimilar pair, with a cosine similarity of only 0.07 (7%).
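The full pairwise comparison can be reproduced with a few lines of Python (a sketch with illustrative names, using the four term-frequency vectors exactly as listed in the table):

import math
from itertools import combinations

docs = {
    "d1": [5, 0, 3, 0, 2, 0, 0, 2, 0, 0],
    "d2": [3, 0, 2, 0, 1, 1, 0, 1, 0, 1],
    "d3": [0, 7, 0, 2, 1, 0, 0, 3, 0, 0],
    "d4": [0, 1, 0, 0, 1, 2, 2, 0, 3, 0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

for a, b in combinations(docs, 2):
    print(a, b, round(cosine(docs[a], docs[b]), 2))
# (d1, d2) comes out most similar (~0.94); (d1, d4) least similar (~0.07)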

Chapter-6

Frequent pattern:

Support: how frequently a pattern (or itemset) appears in a dataset.

Confidence: the probability of Y occurring given that X is present.

Support    S(X -> Y) = sigma(X U Y) / N
Confidence C(X -> Y) = sigma(X U Y) / sigma(X)

Tid Items Bought

10 Beef,Nuts,Durian

20 Beef,Coffee,Durian

30 Beef,Durian,Eggs

40 Nuts,Eggs,Milk

50 Nuts,Coffee,Durian,Eggs,Milk

S(Beef -> Durian)  = sigma(Beef U Durian) / N = 3/5 = 0.6
C(Beef -> Durian)  = sigma(Beef U Durian) / sigma(Beef) = 3/3 = 1
S(Durian -> Beef)  = sigma(Durian U Beef) / N = 3/5 = 0.6
C(Durian -> Beef)  = sigma(Durian U Beef) / sigma(Durian) = 3/4 = 0.75
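Both measures follow directly from the transaction list; the Python sketch below (illustrative names only) reproduces the Beef/Durian numbers.

transactions = [
    {"Beef", "Nuts", "Durian"},
    {"Beef", "Coffee", "Durian"},
    {"Beef", "Durian", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Durian", "Eggs", "Milk"},
]

def sigma(itemset):
    # number of transactions containing every item of the itemset
    return sum(1 for t in transactions if itemset <= t)

def support(x, y):
    return sigma(x | y) / len(transactions)

def confidence(x, y):
    return sigma(x | y) / sigma(x)

print(support({"Beef"}, {"Durian"}))      # 3/5 = 0.6
print(confidence({"Beef"}, {"Durian"}))   # 3/3 = 1.0
print(confidence({"Durian"}, {"Beef"}))   # 3/4 = 0.75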

Apriori Algorithm:

sup_min = 2

Dataset:
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan -> C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
Prune with sup_min = 2 -> L1: {A}:2, {B}:3, {C}:3, {E}:3

2nd scan -> C2: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
Prune -> L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

3rd scan -> C3: {A,B,C}:1, {A,B,E}:1, {A,C,E}:1, {B,C,E}:2
Prune -> L3: {B,C,E}:2

Frequent pattern: {B, C, E}

[Note: if a candidate set becomes empty at some level, the frequent patterns are taken from the previous level.]
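A minimal level-wise Apriori sketch in Python (illustrative and unoptimized, not a library implementation) that reproduces the result {B, C, E} for sup_min = 2:

def apriori(transactions, min_sup):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets
    level = {frozenset([i]) for i in items
             if sum(i in t for t in transactions) >= min_sup}
    frequent = set(level)
    k = 2
    while level:
        # join step: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: keep candidates that meet the minimum support
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_sup}
        frequent |= level
        k += 1
    return frequent

data = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
for itemset in sorted(apriori(data, 2), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
# the largest itemset printed is ['B', 'C', 'E']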
Second example. Given sup_min = 2 and a minimum confidence of 50% (used for the rule generation below).

Dataset:
Tid    Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

1st scan -> C1: {I1}:6, {I2}:7, {I3}:6, {I4}:2, {I5}:2
All candidates meet sup_min -> L1: {I1}:6, {I2}:7, {I3}:6, {I4}:2, {I5}:2

2nd scan -> C2: {I1,I2}:4, {I1,I3}:4, {I1,I4}:1, {I1,I5}:2, {I2,I3}:4, {I2,I4}:2, {I2,I5}:2, {I3,I4}:0, {I3,I5}:1, {I4,I5}:0
Prune -> L2: {I1,I2}:4, {I1,I3}:4, {I1,I5}:2, {I2,I3}:4, {I2,I4}:2, {I2,I5}:2

3rd scan -> C3: {I1,I2,I3}:2, {I1,I2,I4}:1, {I1,I2,I5}:2, {I2,I3,I4}:0, {I2,I3,I5}:1
Prune -> L3: {I1,I2,I3}:2, {I1,I2,I5}:2

4th scan -> the remaining candidate {I1,I2,I3,I5} has support 1 < sup_min, so L4 is empty and the result comes from the previous level.

Frequent patterns: {I1, I2, I3} and {I1, I2, I5}
Association Rule Generation: a rule S -> (I - S) is Valid (strong) if its confidence is at least 50%, otherwise Invalid (weak).

Confidence = Support(I) / Support(S)

I = {I1, I2, I3}
S in { {I1}, {I2}, {I3}, {I1,I2}, {I1,I3}, {I2,I3} }

Rule-1: {I1} -> {I2, I3}:   Support(I) = 2/9, Support(S) = 6/9, Confidence = 2/6 = 0.33  -> Invalid / weak association
Rule-2: {I2} -> {I1, I3}:   Support(I) = 2/9, Support(S) = 7/9, Confidence = 2/7 = 0.28  -> Invalid / weak association
Rule-3: {I3} -> {I1, I2}:   Support(I) = 2/9, Support(S) = 6/9, Confidence = 2/6 = 0.33  -> Invalid / weak association
Rule-4: {I1, I2} -> {I3}:   Support(I) = 2/9, Support(S) = 4/9, Confidence = 2/4 = 0.5   -> Valid / strong association
Rule-5: {I1, I3} -> {I2}:   Support(I) = 2/9, Support(S) = 4/9, Confidence = 2/4 = 0.5   -> Valid / strong association
Rule-6: {I2, I3} -> {I1}:   Support(I) = 2/9, Support(S) = 4/9, Confidence = 2/4 = 0.5   -> Valid / strong association

I = {I1, I2, I5}
S in { {I1}, {I2}, {I5}, {I1,I2}, {I1,I5}, {I2,I5} }

Rule-1: {I1} -> {I2, I5}:   Support(I) = 2/9, Support(S) = 6/9, Confidence = 2/6 = 0.33  -> Invalid / weak association
Rule-2: {I2} -> {I1, I5}:   Support(I) = 2/9, Support(S) = 7/9, Confidence = 2/7 = 0.28  -> Invalid / weak association
Rule-3: {I5} -> {I1, I2}:   Support(I) = 2/9, Support(S) = 2/9, Confidence = 2/2 = 1     -> Valid / strong association
Rule-4: {I1, I2} -> {I5}:   Support(I) = 2/9, Support(S) = 4/9, Confidence = 2/4 = 0.5   -> Valid / strong association
Rule-5: {I1, I5} -> {I2}:   Support(I) = 2/9, Support(S) = 2/9, Confidence = 2/2 = 1     -> Valid / strong association
Rule-6: {I2, I5} -> {I1}:   Support(I) = 2/9, Support(S) = 2/9, Confidence = 2/2 = 1     -> Valid / strong association
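The rule check can be scripted the same way; the sketch below (illustrative names) enumerates every non-empty proper subset S of a frequent itemset I and labels each rule by the 50% confidence threshold.

from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def count(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

def rules(frequent_itemset, min_conf=0.5):
    I = set(frequent_itemset)
    for k in range(1, len(I)):
        for S in combinations(sorted(I), k):
            conf = count(I) / count(S)
            status = "Valid/Strong" if conf >= min_conf else "Invalid/Weak"
            print(sorted(S), "->", sorted(I - set(S)), round(conf, 2), status)

rules({"I1", "I2", "I5"})
# e.g. ['I5'] -> ['I1', 'I2'] has confidence 2/2 = 1.0 and is strong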

FP-Growth Algorithm:

Example of how to count item support:
1. Count how many rows (transactions) contain the item at least once.
2. If a row has one or more occurrences of the item, it still counts as 1.

For the item "O": T1 has "O" -> count 1; T2 has "O" -> count 1; T3 has no "O" -> count 0; T4 has no "O" -> count 0; T5 has "O" -> count 1. So O = 3.

Item counts: K=5, E=4, M=3, O=3, Y=3, N=2, C=2, D=1, A=1, U=1, I=1

Tid   Items
T1    {K, E, M, N, O, Y}
T2    {D, E, K, N, O, Y}
T3    {A, E, K, M}
T4    {C, K, M, U, Y}
T5    {C, E, I, K, O, O}

Given sup_min = 3,
L = {K:5, E:4, M:3, O:3, Y:3}

Ordered lists (infrequent items removed, remaining items sorted by descending support):
Tid   Items              Ordered List
T1    {K,E,M,N,O,Y}      {K, E, M, O, Y}
T2    {D,E,K,N,O,Y}      {K, E, O, Y}
T3    {A,E,K,M}          {K, E, M}
T4    {C,K,M,U,Y}        {K, M, Y}
T5    {C,E,I,K,O,O}      {K, E, O}

The FP-tree is built by inserting these ordered lists one by one, and is then mined as follows:

Item   Conditional Pattern Base           Conditional FP-Tree   Frequent Patterns
K      Null                               Null                  Null
E      {K:4}                              {K:4}                 {<K,E:4>}
M      {K,E:2}, {K:1}                     {K:3}                 {<K,M:3>}
O      {K,E,M:1}, {K,E:2}                 {K,E:3}               {<K,O:3>}, {<E,O:3>}, {<K,E,O:3>}
Y      {K,E,M,O:1}, {K,E,O:1}, {K,M:1}    {K:3}                 {<K,Y:3>}
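The first FP-Growth step (counting supports, dropping infrequent items and reordering each transaction by descending support) can be sketched in Python as follows; building the full FP-tree and mining the conditional pattern bases is omitted here, so this only reproduces the Ordered List column above.

from collections import Counter

transactions = [
    ["K", "E", "M", "N", "O", "Y"],
    ["D", "E", "K", "N", "O", "Y"],
    ["A", "E", "K", "M"],
    ["C", "K", "M", "U", "Y"],
    ["C", "E", "I", "K", "O", "O"],
]
min_sup = 3

# count each item once per transaction (duplicates such as O, O count as 1)
counts = Counter(item for t in transactions for item in set(t))
frequent = {i: c for i, c in counts.items() if c >= min_sup}
print(frequent)   # K:5, E:4, M:3, O:3, Y:3 (dictionary order may vary)

# ordered list: keep only frequent items, sorted by descending support
for t in transactions:
    ordered = sorted({i for i in t if i in frequent},
                     key=lambda i: (-frequent[i], i))
    print(ordered)  # e.g. the first transaction gives ['K', 'E', 'M', 'O', 'Y']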
Chapter-8

Decision Tree (ID3):

Entropy(S) = - sum over i = 1..m of p_i * log2(p_i)

For the play-tennis training data used in this chapter (14 examples: 9 "Yes" and 5 "No"):

Entropy(S) = - (9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

Now consider the Outlook attribute (Sunny, Overcast, Rain):

Sunny:    total = 5, Yes = 2, No = 3
Overcast: total = 4, Yes = 4, No = 0
Rain:     total = 5, Yes = 3, No = 2

Entropy(Sunny)    = - (2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
Entropy(Overcast) = - (4/4) log2(4/4) - (0/4) log2(0/4) = 0
Entropy(Rain)     = - (3/5) log2(3/5) - (2/5) log2(2/5) = 0.971

Gain(Outlook) = Entropy(S) - (5/14)*Entropy(Sunny) - (4/14)*Entropy(Overcast) - (5/14)*Entropy(Rain)
              = 0.94 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971)
              = 0.2464
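The entropy and gain computations can be written as two small Python functions (a sketch with illustrative names; the Yes/No counts are the ones used in this chapter for the 14-example play-tennis data).

import math

def entropy(yes, no):
    total = yes + no
    result = 0.0
    for count in (yes, no):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

def gain(parent_yes, parent_no, splits):
    # splits: list of (yes, no) counts, one pair per value of the attribute
    total = parent_yes + parent_no
    remainder = sum((y + n) / total * entropy(y, n) for y, n in splits)
    return entropy(parent_yes, parent_no) - remainder

# Outlook: Sunny (2 Yes, 3 No), Overcast (4, 0), Rain (3, 2)
print(round(gain(9, 5, [(2, 3), (4, 0), (3, 2)]), 4))  # ~0.2467, i.e. the 0.2464 above up to rounding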

Now consider the Temp attribute (Hot, Mild, Cool):

Hot:  total = 4, Yes = 2, No = 2
Mild: total = 6, Yes = 4, No = 2
Cool: total = 4, Yes = 3, No = 1

Entropy(Hot)  = - (2/4) log2(2/4) - (2/4) log2(2/4) = 1
Entropy(Mild) = - (4/6) log2(4/6) - (2/6) log2(2/6) = 0.918
Entropy(Cool) = - (3/4) log2(3/4) - (1/4) log2(1/4) = 0.811

Gain(Temp) = Entropy(S) - (4/14)*Entropy(Hot) - (6/14)*Entropy(Mild) - (4/14)*Entropy(Cool)
           = 0.94 - (4/14)(1) - (6/14)(0.918) - (4/14)(0.811)
           = 0.0289

Now consider the Humidity attribute (High, Normal):

High:   total = 7, Yes = 3, No = 4
Normal: total = 7, Yes = 6, No = 1

Entropy(High)   = - (3/7) log2(3/7) - (4/7) log2(4/7) = 0.9852
Entropy(Normal) = - (6/7) log2(6/7) - (1/7) log2(1/7) = 0.5916

Gain(Humidity) = Entropy(S) - (7/14)*Entropy(High) - (7/14)*Entropy(Normal)
               = 0.94 - (7/14)(0.9852) - (7/14)(0.5916)
               = 0.1516

Now consider the Wind attribute (Weak, Strong):

Weak:   total = 8, Yes = 6, No = 2
Strong: total = 6, Yes = 3, No = 3

Entropy(Weak)   = - (6/8) log2(6/8) - (2/8) log2(2/8) = 0.8113
Entropy(Strong) = - (3/6) log2(3/6) - (3/6) log2(3/6) = 1

Gain(Wind) = Entropy(S) - (8/14)*Entropy(Weak) - (6/14)*Entropy(Strong)
           = 0.94 - (8/14)(0.8113) - (6/14)(1)
           = 0.0478

Gain(Outlook)  = 0.2464
Gain(Temp)     = 0.0289
Gain(Humidity) = 0.1516
Gain(Wind)     = 0.0478

Gain(Outlook) is the largest, so the root node is Outlook.
For Sunny:

Day   Temp   Humidity   Wind     Play Tennis
D1    Hot    High       Weak     No
D2    Hot    High       Strong   No
D8    Mild   High       Weak     No
D9    Cool   Normal     Weak     Yes
D11   Mild   Normal     Strong   Yes

Entropy(S_sunny) = - (2/5) log2(2/5) - (3/5) log2(3/5) = 0.97

Temp attribute (Hot, Mild, Cool):

Entropy(Hot)  = - (0/2) log2(0/2) - (2/2) log2(2/2) = 0
Entropy(Mild) = - (1/2) log2(1/2) - (1/2) log2(1/2) = 1
Entropy(Cool) = - (1/1) log2(1/1) - (0/1) log2(0/1) = 0

Gain(Temp) = Entropy(S_sunny) - (2/5)*Entropy(Hot) - (2/5)*Entropy(Mild) - (1/5)*Entropy(Cool)
           = 0.97 - (2/5)(0) - (2/5)(1) - (1/5)(0)
           = 0.57

Humidity attribute (High, Normal):

Entropy(High)   = - (0/3) log2(0/3) - (3/3) log2(3/3) = 0
Entropy(Normal) = - (2/2) log2(2/2) - (0/2) log2(0/2) = 0

Gain(Humidity) = Entropy(S_sunny) - (3/5)*Entropy(High) - (2/5)*Entropy(Normal)
               = 0.97 - (3/5)(0) - (2/5)(0)
               = 0.97

Wind attribute (Weak, Strong):

Entropy(Weak)   = - (1/3) log2(1/3) - (2/3) log2(2/3) = 0.9183
Entropy(Strong) = - (1/2) log2(1/2) - (1/2) log2(1/2) = 1

Gain(Wind) = Entropy(S_sunny) - (3/5)*Entropy(Weak) - (2/5)*Entropy(Strong)
           = 0.97 - (3/5)(0.9183) - (2/5)(1)
           = 0.019

Gain(Temp)     = 0.57
Gain(Humidity) = 0.97
Gain(Wind)     = 0.019

Humidity has the highest gain, so Humidity is the node for the Sunny branch.
For Rain:

Day   Temp   Wind     Play Tennis
D4    Mild   Weak     Yes
D5    Cool   Weak     Yes
D6    Cool   Strong   No
D10   Mild   Weak     Yes
D14   Mild   Strong   No

Entropy(S_rain) = - (3/5) log2(3/5) - (2/5) log2(2/5) = 0.97

Temp attribute (Mild, Cool):

Entropy(Mild) = - (2/3) log2(2/3) - (1/3) log2(1/3) = 0.9183
Entropy(Cool) = - (1/2) log2(1/2) - (1/2) log2(1/2) = 1

Gain(Temp) = 0.97 - (3/5)(0.9183) - (2/5)(1) = 0.019

Wind attribute (Weak, Strong):

Entropy(Weak)   = - (3/3) log2(3/3) - (0/3) log2(0/3) = 0
Entropy(Strong) = - (0/2) log2(0/2) - (2/2) log2(2/2) = 0

Gain(Wind) = 0.97 - (3/5)(0) - (2/5)(0) = 0.97

Gain(Temp) = 0.019 and Gain(Wind) = 0.97: Wind has the highest gain, so Wind is the node for the Rain branch.

Final Decision Tree:

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Weak -> Yes
    Strong -> No
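For reference, the finished tree can be written as a small nested dictionary and used for prediction. This is only an illustrative encoding of the result derived above, not code from the original notes.

tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def predict(node, example):
    # walk down the tree until a leaf ("Yes"/"No") is reached
    while isinstance(node, dict):
        attribute = next(iter(node))
        node = node[attribute][example[attribute]]
    return node

print(predict(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))   # Yes
print(predict(tree, {"Outlook": "Rain", "Humidity": "High", "Wind": "Strong"}))    # No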
