Decision-Tree-Classifier_Manual
ML Assignment No. 2
R (2)   C (4)   V (2)   T (2)   Total (10)   Dated Sign
2.1 Title
A dataset collected in a cosmetics shop, showing details of customers and whether or not they responded to
a special offer to buy a new lipstick, is shown in the table below. Use this dataset to build a decision tree, with
Buys as the target variable, to help predict lipstick purchases in the future. Find the root node of the decision tree.
According to the decision tree you have built from this training dataset, what is the decision for the
test data: [Age < 21, Income = Low, Gender = Female, Marital Status = Married]?
2.3 Prerequisite:
Knowledge of how to apply a Decision Tree Classifier to find the root node of a decision tree, and how to
classify test data according to a decision tree built from a previous training dataset.
2.7 Outcomes:
After completion of this assignment, students will be able to implement code that creates a decision tree for
a given dataset and finds its root node based on the given conditions.
2.8.1 Motivation
Suppose we have the following plot for two classes, represented by black circles and blue squares. Is it possible
to draw a single separating line? Perhaps not.
Can you draw a single division line for these classes?
We will need more than one line to divide the data into classes, something similar to the following image:
We need two lines here, one separating according to a threshold value of x and the other according to a threshold value of y.
A Decision Tree Classifier repetitively divides the working area (plot) into subparts by identifying
lines (repetitively, because there may be two distant regions of the same class divided by another, as shown in the
image below).
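A minimal sketch of this behaviour, using a made-up toy dataset in place of the plotted points (the data values below are illustrative assumptions, not the manual's figure): scikit-learn's DecisionTreeClassifier learns exactly such axis-aligned threshold tests on x and y.

# Minimal sketch: a decision tree separates two classes with
# axis-aligned threshold splits on x and y (toy data, made up
# for illustration).
from sklearn.tree import DecisionTreeClassifier, export_text

# Two classes that no single straight line can separate cleanly:
# class 1 sits in two distant regions (high x OR high y).
X = [[1, 1], [1, 2], [2, 1], [2, 2],   # class 0 (black circles)
     [1, 8], [2, 9], [8, 1], [9, 2]]   # class 1 (blue squares)
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Each internal node of the printed tree is a threshold test on x or y.
print(export_text(clf, feature_names=["x", "y"]))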
1. Impurity
In the above division we had a clear separation of classes. But what if we had the following case?
Impurity is when we have traces of one class division inside another. This can arise due to the following reasons:
1. We run out of available features to divide the class upon.
2. We tolerate some percentage of impurity (we stop further division) for faster performance. (There is
always a trade-off between accuracy and performance.)
For example, in the second case we may stop our division when fewer than x elements are left. The measure
commonly used for this is the Gini impurity, sketched below.
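The Gini impurity of a node with class probabilities p_i is 1 - sum(p_i^2): 0 for a pure node, larger when classes mix. A minimal helper (the function name is our own, for illustration):

# Gini impurity: 1 - sum(p_i^2) over the class probabilities p_i.
# 0 means a pure node; higher values mean more class mixing.
from collections import Counter

def gini_impurity(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["a", "a", "a", "a"]))  # 0.0 (pure node)
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5 (maximally mixed, two classes)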
2. Entropy
Entropy is the degree of randomness of the elements, or in other words it is a measure of
impurity. Mathematically, it can be calculated from the probabilities of the items as:
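Entropy(x) = -\sum_i p_i \log_2(p_i)

where p_i is the probability (relative frequency) of class i among the elements of x.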
3. Information Gain
Suppose we have multiple features on which to divide the current working set. Which feature should we select for
the division? Perhaps the one that gives us the least impurity.
Suppose we divide the classes into multiple branches as follows; the information gain at any node is
defined as:
Information Gain(n) = Entropy(x) - [weighted average] * Entropy(children for each feature value)
This needs a bit of explanation!
Suppose we have the following class to work with initially:
1, 1, 2, 2, 3, 4, 4, 4, 5
Suppose we divide them based on the property: divisible by 2.
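Reading the numbers above as the multiset {1, 1, 2, 2, 3, 4, 4, 4, 5}, the entropy before the split and the information gain of this split can be computed as in the sketch below:

# Worked example: information gain of splitting {1,1,2,2,3,4,4,4,5}
# on the property "divisible by 2".
from collections import Counter
from math import log2

def entropy(items):
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

parent = [1, 1, 2, 2, 3, 4, 4, 4, 5]
even = [v for v in parent if v % 2 == 0]   # [2, 2, 4, 4, 4]
odd  = [v for v in parent if v % 2 != 0]   # [1, 1, 3, 5]

weighted = (len(even) / len(parent)) * entropy(even) \
         + (len(odd) / len(parent)) * entropy(odd)

print(round(entropy(parent), 3))            # entropy of parent, ~2.197 bits
print(round(entropy(parent) - weighted, 3)) # information gain, ~0.991 bits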
What is the decision for the test data: [Age < 21, Income = Low, Gender = Female, Marital Status =
Married]?
2.10 Algorithm
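The steps can be carried out with scikit-learn, as in the sketch below. The two training rows are placeholders (the dataset table is not reproduced here), and the column names and label encodings are assumptions; substitute the actual table values.

# Sketch of the assignment workflow with scikit-learn.
# NOTE: the two rows below are placeholders, not the actual table.
# Replace them with the full training data from the problem statement.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

train = pd.DataFrame({
    "Age":           ["<21", ">35"],        # placeholder values
    "Income":        ["High", "Low"],
    "Gender":        ["Male", "Female"],
    "MaritalStatus": ["Single", "Married"],
    "Buys":          ["No", "Yes"],
})

# Encode each categorical column as integers.
encoders = {c: LabelEncoder().fit(train[c]) for c in train.columns}
X = pd.DataFrame({c: encoders[c].transform(train[c])
                  for c in train.columns if c != "Buys"})
y = encoders["Buys"].transform(train["Buys"])

clf = DecisionTreeClassifier(criterion="entropy")  # entropy-based splits
clf.fit(X, y)

# The root node is the feature tested at node 0 of the fitted tree.
print("Root node:", X.columns[clf.tree_.feature[0]])

# Decision for the test data from the problem statement.
test = pd.DataFrame({"Age": ["<21"], "Income": ["Low"],
                     "Gender": ["Female"], "MaritalStatus": ["Married"]})
test_enc = pd.DataFrame({c: encoders[c].transform(test[c]) for c in test.columns})
print("Decision:", encoders["Buys"].inverse_transform(clf.predict(test_enc))[0])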
Conclusion:-
In this way, we have learned how to create a decision tree from the given data and how to find the root node of
the tree using a Decision Tree Classifier.
References:-
1. https://medium.com/machine-learning-101/chapter-3-decision-trees-theory-e7398adac567
2. Mittu Skillologies YouTube Channel
3. http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/