Saad Iqbal 301-211073 Assign 2
Section: BSCS-B
Semester: 6th
Assignment 2
Question 1
You are given a dataset containing information about weather conditions and whether people
decide to play tennis under those conditions. The dataset includes the following attributes:
1. Outlook: {Sunny, Overcast, Rain}
2. Temperature: {Hot, Mild, Cool}
3. Humidity: {High, Normal}
4. Wind: {Weak, Strong}
5. PlayTennis: {Yes, No} (target attribute)
To calculate the information gain for each attribute, we first need the entropy of the full dataset S, and then the entropy of each subset of the data formed by the attribute values, which we plug into the information gain formula.
➢ Entropy of the full dataset:
Instances with "PlayTennis = Yes" = 9
Instances with "PlayTennis = No" = 5
Entropy (S) = -(9/14 * log2(9/14)) - (5/14 * log2(5/14)) = 0.9403
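As a quick check on this and the calculations that follow, the entropy formula can be evaluated with a few lines of Python (a minimal sketch; the entropy helper is a name introduced here for illustration):

from math import log2

def entropy(counts):
    """Shannon entropy of a class distribution, e.g. counts = [num_yes, num_no]."""
    total = sum(counts)
    # Convention: 0 * log2(0) = 0, so classes with zero count contribute nothing.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Full dataset: 9 "Yes" and 5 "No" instances.
print(round(entropy([9, 5]), 4))  # 0.9403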
➢ Entropy of sunny:
Outlook = Sunny
Instances with "PlayTennis = Yes" = 2
Instances with "PlayTennis = No" = 3
Entropy (Sunny) = -(2/5 * log2(2/5)) - (3/5 * log2(3/5)) = 0.9710
➢ Entropy of overcast:
Outlook = Overcast
Instances with "PlayTennis = Yes" = 4
Instances with "PlayTennis = No" = 0
Entropy (Overcast) = -(4/4 * log2(4/4)) - (0/4 * log2(0/4)) = 0 (using the convention 0 * log2(0) = 0)
➢ Entropy of rain:
Outlook = Rain
Instances with "PlayTennis = Yes" = 3
Instances with "PlayTennis = No" = 2
Entropy (Rain) = -(3/5 * log2(3/5)) - (2/5 * log2(2/5)) = 0.9710
➢ Information gain of outlook:
Information Gain (Outlook) = Entropy(S) - ((5/14 * Entropy(Sunny)) + (4/14 * Entropy(Overcast)) + (5/14 * Entropy(Rain)))
= 0.9403 - ((5/14 * 0.9710) + (4/14 * 0) + (5/14 * 0.9710))
= 0.9403 - 0.6936
= 0.2467
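These outlook numbers can be verified with the same kind of helper (a standalone sketch; the (Yes, No) counts are the ones listed in this section):

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# (Yes, No) counts for each Outlook value: Sunny, Overcast, Rain.
subsets = [[2, 3], [4, 0], [3, 2]]
weighted = sum(sum(c) / 14 * entropy(c) for c in subsets)
print(round(entropy([9, 5]) - weighted, 4))  # IG(Outlook) = 0.2467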
➢ Entropy of hot:
Temperature = Hot
Instances with "PlayTennis = Yes" = 2
Instances with "PlayTennis = No" = 2
Entropy (Hot) = -(2/4 * log2(2/4)) - (2/4 * log2(2/4)) = 1
➢ Entropy of mild:
Temperature = Mild
Instances with "PlayTennis = Yes" = 4
Instances with "PlayTennis = No" = 2
Entropy (Mild) = -(4/6 * log2(4/6)) - (2/6 * log2(2/6)) = 0.9183
➢ Entropy of cool:
Temperature = Cool
Instances with "PlayTennis = Yes" = 3
Instances with "PlayTennis = No" = 1
Entropy (Cool) = -(3/4 * log2(3/4)) - (1/4 * log2(1/4)) = 0.8113
➢ Information gain of temperature:
Information Gain (Temperature) = Entropy(S) - ((4/14 * Entropy(Hot)) + (6/14 * Entropy(Mild)) + (4/14 * Entropy(Cool)))
= 0.9403 - ((4/14 * 1) + (6/14 * 0.9183) + (4/14 * 0.8113))
= 0.9403 - (0.2857 + 0.3936 + 0.2318)
= 0.9403 - 0.9111
= 0.0292
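Packaging the weighted-average step into a function makes the remaining checks one-liners (again a standalone sketch; info_gain is an illustrative name, not code required by the assignment):

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(subsets):
    # subsets: (Yes, No) counts per attribute value; the full dataset is 9 Yes / 5 No.
    n = sum(sum(c) for c in subsets)
    weighted = sum(sum(c) / n * entropy(c) for c in subsets)
    return entropy([9, 5]) - weighted

# Temperature: Hot = (2, 2), Mild = (4, 2), Cool = (3, 1).
print(round(info_gain([[2, 2], [4, 2], [3, 1]]), 4))  # 0.0292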
➢ Entropy of high:
Humidity = High
Instances with "PlayTennis = Yes" = 3
Instances with "PlayTennis = No" = 4
Entropy (High) = -(3/7 * log2(3/7)) - (4/7 * log2(4/7)) = 0.9852
➢ Entropy of normal:
Humidity = Normal
Instances with "PlayTennis = Yes" = 6
Instances with "PlayTennis = No" = 1
Entropy (Normal) = -(6/7 * log2(6/7)) - (1/7 * log2(1/7)) = 0.5917
➢ Information gain of humidity:
Information Gain (Humidity) = Entropy(S) - ((7/14 * Entropy(High)) + (7/14 * Entropy(Normal)))
= 0.9403 - ((7/14 * 0.9852) + (7/14 * 0.5917))
= 0.9403 - 0.7885
= 0.1518
➢ Entropy of weak:
Wind = Weak
Instances with "PlayTennis = Yes" = 6
Instances with "PlayTennis = No" = 2
Entropy (Weak) = -(6/8 * log2(6/8)) - (2/8 * log2(2/8)) = 0.8113
➢ Entropy of strong:
Wind = Strong
Instances with "PlayTennis = Yes" = 3
Instances with "PlayTennis = No" = 3
Entropy (Strong) = -(3/6 * log2(3/6)) - (3/6 * log2(3/6)) = 1
➢ Information gain of wind:
Information Gain (Wind) = Entropy(S) - ((8/14 * Entropy(Weak)) + (6/14 * Entropy(Strong)))
= 0.9403 - ((8/14 * 0.8113) + (6/14 * 1))
= 0.9403 - 0.8922
= 0.0481
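Using the same info_gain helper (redefined here so the snippet runs on its own), the Humidity and Wind gains come out as computed above:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(subsets):
    n = sum(sum(c) for c in subsets)
    weighted = sum(sum(c) / n * entropy(c) for c in subsets)
    return entropy([9, 5]) - weighted

print(round(info_gain([[3, 4], [6, 1]]), 4))  # IG(Humidity) = 0.1518 (High, Normal)
print(round(info_gain([[6, 2], [3, 3]]), 4))  # IG(Wind) = 0.0481 (Weak, Strong)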
Comparing the four results (Outlook = 0.2467, Humidity = 0.1518, Wind = 0.0481, Temperature = 0.0292), the attribute with the highest information gain is "Outlook". Therefore, the ID3 algorithm will choose "Outlook" as the root node of the decision tree.
After selecting "Outlook" as the root node, the ID3 algorithm will create branches for each
possible value of "Outlook" (Sunny, Overcast, and Rain). Then, for each branch, the algorithm
will recursively construct the decision tree by selecting the next attribute with the highest
information gain on the subset of instances corresponding to that branch.
The recursion continues until all instances in a branch belong to the same class (i.e., the entropy is zero) or there are no remaining attributes to split on; in the latter case, the algorithm assigns the majority class of that subset to the leaf node. A compact sketch of this recursion is shown below.
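This is an illustrative Python implementation of ID3's core loop, not the assignment's required code; rows are assumed to be dicts such as {'Outlook': 'Sunny', 'Temperature': 'Hot', ...} and labels a parallel list of "Yes"/"No" values:

from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    gain = entropy(labels)
    for v in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:          # pure subset: entropy is zero
        return labels[0]
    if not attrs:                      # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    # Split on the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        tree[best][v] = id3(sub_rows, sub_labels, attrs - {best})
    return tree

# Example call: id3(rows, labels, {"Outlook", "Temperature", "Humidity", "Wind"})
# On the full dataset this puts "Outlook" at the root, as derived above.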