Decision tree learning is one of the most widely adopted algorithms for classification. As the name indicates, it builds a model in the form of a tree structure. Its classification accuracy is comparable with that of other methods, and it is highly efficient.
A decision tree is used for multi-dimensional analysis with multiple classes. It is characterized by fast execution time and ease in the interpretation of the rules. The goal of decision tree learning is to create a model (based on the past data, called the past vector) that predicts the value of the output variable based on the input variables in the feature vector.
Each node (or decision node) of a decision tree corresponds to one of the features in the feature vector. From every node, there are edges to children, wherein there is an edge for each of the possible values (or range of values) of the feature associated with the node. The tree terminates at different leaf nodes (or terminal nodes), where each leaf node represents a possible value of the output variable. The output variable is determined by following a path that starts at the root and is guided by the values of the input variables.
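To make this structure concrete, here is a minimal sketch (not taken from the text) of how such a tree can be represented and traversed; the feature names and the tiny example tree are purely illustrative:

```python
# A decision node is a dict {"feature": ..., "children": {value: subtree}};
# a leaf is simply a value of the output variable. Names below are illustrative.
tree = {
    "feature": "A",
    "children": {
        "a1": "class1",                              # leaf node
        "a2": {"feature": "B",
               "children": {"b1": "class1", "b2": "class2"}},
    },
}

def predict(node, feature_vector):
    """Follow the path from the root, guided by the values of the input variables."""
    while isinstance(node, dict):                    # still at a decision node
        node = node["children"][feature_vector[node["feature"]]]
    return node                                      # leaf: value of the output variable

print(predict(tree, {"A": "a2", "B": "b1"}))         # -> class1
```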
A decision tree is usually represented in the format depicted in Figure 7.8.
[Figure: a generic tree with a root decision node, internal decision nodes and leaf nodes.]
FIG. 7.8
Decision tree structure
Figure 7.9 shows an example decision tree for car driving - the decision to be taken is whether to 'Keep Going' or to 'Stop', which depends on various situations as depicted in the figure. If the signal is RED in colour, then the car should be stopped. If there is not enough gas (petrol) in the car, the car should be stopped at the next available gas station.
[Figure: decision tree for the car-driving example, with decision nodes 'Is there a stoplight ahead?', 'Is the stoplight red?' and 'Is there enough gas in the car?', leading to the outcomes 'Keep going', 'Stop' and 'Find a petrol station, and buy gas'.]
FIG. 7.9
Decision tree example
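Read as code, the logic depicted in Figure 7.9 might look roughly like the sketch below; the function name, argument names and exact branch ordering are assumptions made for illustration:

```python
def driving_decision(stoplight_ahead, light_is_red, enough_gas):
    """Follow the branches of the car-driving tree and return the decision."""
    if stoplight_ahead:
        return "Stop" if light_is_red else "Keep going"
    if not enough_gas:
        return "Find a petrol station and stop"      # stop at the next available gas station
    return "Keep going"

print(driving_decision(stoplight_ahead=True, light_is_red=True, enough_gas=True))  # -> Stop
```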
FIG. 7.10
Training data for GTS recruitment
Let us try to solve this problem, i.e. predicting whether Chandra will get a job offer, by using the decision tree model. First, we need to draw the decision tree corresponding to the training data given in Figure 7.10. According to the table, the job offer condition (i.e. the outcome) is FALSE for all the cases where Aptitude = Low, irrespective of the other conditions. So, the feature Aptitude can be taken up as the first node of the decision tree.
For Aptitude = High, the job offer condition is TRUE for all the cases where Communication = Good. For cases where Communication = Bad, the job offer condition is TRUE for all the cases where CGPA = High.
Figure 7.11 depicts the complete decision tree diagram for the table given in Figure 7.10.
[Figure: decision tree with root node 'Aptitude?' - the Low branch leads to 'Job not offered' and the High branch leads to 'Communication?'; from 'Communication?' the Good branch leads to 'Job offered' and the Bad branch leads to 'CGPA?'; from 'CGPA?' the High branch leads to 'Job offered'.]
FIG. 7.11
Decision tree based on the training data
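The same tree can also be written down as a set of nested rules. The sketch below simply restates the branches derived above (with the CGPA = Medium/Low outcome taken from Figure 7.12); it is an illustration, not part of the original figure:

```python
def job_offer(aptitude, communication, cgpa):
    """Rules read off the decision tree built from the GTS recruitment data."""
    if aptitude == "Low":
        return "Job not offered"                     # first split: Aptitude
    if communication == "Good":
        return "Job offered"                         # second split: Communication
    return "Job offered" if cgpa == "High" else "Job not offered"   # third split: CGPA

print(job_offer("High", "Bad", "High"))              # -> Job offered (Chandra's case)
```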
Exhaustive search
1. Place the item in the first group (class). Recursively examine solutions with the item in the first group (class).
2. Place the item in the second group (class). Recursively examine solutions with the item in the second group (class).
3. Repeat the above steps until the solution is reached.
Exhaustive search travels through the decision tree exhaustively, but it will take much time when the decision tree is big with multiple leaves and multiple attribute values.
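A minimal sketch of this exhaustive, two-way recursion, assuming the items carry numeric values and the goal is to split them into two groups with sums as equal as possible (one common form of the partitioning problem mentioned below):

```python
def exhaustive_partition(values):
    """Try every assignment of items to two groups; keep the most balanced split.
    Visits all 2**n leaves of the decision tree, so it is only usable for small n."""
    best = {"diff": float("inf"), "groups": None}

    def recurse(i, group1, group2):
        if i == len(values):                              # a leaf: a complete assignment
            diff = abs(sum(group1) - sum(group2))
            if diff < best["diff"]:
                best["diff"], best["groups"] = diff, (list(group1), list(group2))
            return
        recurse(i + 1, group1 + [values[i]], group2)      # 1. item i in the first group
        recurse(i + 1, group1, group2 + [values[i]])      # 2. item i in the second group

    recurse(0, [], [])
    return best

print(exhaustive_partition([7, 5, 6, 4]))                 # {'diff': 0, 'groups': ([7, 4], [5, 6])}
```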
Branch and bound search
Branch and bound uses an existing best solution to sidestep searching the entire decision tree in full. When the algorithm starts, the best solution is defined to have the worst possible value; thus, any solution it finds is an improvement. This makes the algorithm initially run down to the left-most branch of the tree, even though that is unlikely to produce a realistic result. In the partitioning problem, that left-most solution corresponds to putting every item in one group, and it is an unacceptable solution. A programme can speed up the process by using a fast heuristic to find an initial solution, which can then be used as an input for branch and bound. If the heuristic is right, the savings can be substantial.
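A sketch of the same search with branch-and-bound pruning, again assuming the balanced two-group partitioning problem; a simple greedy pass supplies the initial best solution, as suggested above:

```python
def branch_and_bound_partition(values):
    """Explore the same decision tree, but prune branches that cannot improve
    on the best difference found so far."""
    # Fast greedy heuristic for an initial solution: always add to the lighter group.
    s1 = s2 = 0
    for v in sorted(values, reverse=True):
        if s1 <= s2:
            s1 += v
        else:
            s2 += v
    best = {"diff": abs(s1 - s2)}

    def recurse(i, diff, remaining):
        # The unplaced items can change the difference by at most `remaining`,
        # so this subtree is hopeless if even that cannot beat the current best.
        if max(abs(diff) - remaining, 0) >= best["diff"]:
            return                                        # prune this branch
        if i == len(values):
            best["diff"] = abs(diff)                      # strictly better, by the check above
            return
        recurse(i + 1, diff + values[i], remaining - values[i])   # item i in group 1
        recurse(i + 1, diff - values[i], remaining - values[i])   # item i in group 2

    recurse(0, 0, sum(values))
    return best["diff"]

print(branch_and_bound_partition([7, 5, 6, 4]))           # -> 0
```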
[Figure: the decision tree of Figure 7.11 with one path highlighted: Aptitude = High → Communication = Bad → CGPA = High → 'Job offered'; the CGPA = Medium/Low branch leads to 'Job not offered'.]
FIG. 7.12
Decision tree based on the training data (depicting a sample path)
Figure 7.12 depicts a sample path (thick line) for the conditions CGPA = High, Communication = Bad, Aptitude = High and Programming Skills = Bad. According to the above decision tree, the prediction can be made that Chandra will get the job offer.
There are many implementations of the decision tree, the most prominent ones being the C5.0, CART (Classification and Regression Tree), CHAID (Chi-square Automatic Interaction Detector) and ID3 (Iterative Dichotomiser 3) algorithms. The biggest challenge of a decision tree algorithm is to find out which feature to split upon. The main driver for identifying the feature is that the data should be split in such a way that the partitions created by the split contain examples belonging to a single class. If that happens, the partitions are considered to be pure. Entropy is a measure of impurity of an attribute or feature adopted by many algorithms such as ID3 and C5.0. The information gain is calculated on the basis of the decrease in entropy (S) after a data set is split according to a particular attribute (A). Constructing a decision tree is all about finding an attribute that returns the highest information gain (i.e. the most homogeneous branches).
Note:
Like information gain, there are other measures like Gini index or chi-square for individual nodes to decide the feature on the basis of which the split has to be applied. The CART algorithm uses the Gini index, while the CHAID algorithm uses chi-square for deciding the feature for applying the split.
\mathrm{Entropy}(S_{AS}) = \sum_{i=1}^{n} w_i \times \mathrm{Entropy}(p_i)

where the split creates n partitions, w_i is the proportion of records falling into the i-th partition, and Entropy(p_i) is the entropy of that partition.
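Both quantities can be computed directly from the class counts. The sketch below (written for this chapter, not taken from any particular library) implements the entropy of a set of records and the weighted entropy after a split, from which information gain follows as the difference:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, e.g. ['Yes', 'No', 'Yes', ...]."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(records, feature, target):
    """Decrease in entropy when `records` (a list of dicts) is split on `feature`."""
    entropy_bs = entropy([r[target] for r in records])          # before the split
    entropy_as = 0.0                                            # weighted, after the split
    for value in {r[feature] for r in records}:
        subset = [r[target] for r in records if r[feature] == value]
        entropy_as += (len(subset) / len(records)) * entropy(subset)
    return entropy_bs - entropy_as
```

Applied to the 18 training records of Figure 7.10 with 'Job offered?' as the target, entropy() reproduces the pre-split value of 0.99 used below.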
Let us examine the value of information gain for the training data set shown in Figure 7.10. We will find the value of entropy at the beginning before any split happens and then again after the split happens. We will compare the values for all the cases -
1. when the feature 'CGPA' is used for the split
2. when the feature 'Communication' is used for the split
3. when the feature 'Aptitude' is used for the split
4. when the feature 'Programming Skills' is used for the split
Figure 7.13a gives the entropy values for the first level split for each of the cases
mentioned above.
As calculated, the entropy of the data set before the split (i.e. Entropy(S_BS)) is 0.99, and the entropy of the data set after the split (i.e. Entropy(S_AS)) is
• 0.69 when the feature 'CGPA' is used for the split
• 0.63 when the feature 'Communication' is used for the split
• 0.52 when the feature 'Aptitude' is used for the split
• 0.95 when the feature 'Programming Skill' is used for the split
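Combining these values with the pre-split entropy gives the information gain for each candidate feature (a quick arithmetic check, not shown in the original figure):
Information gain ('CGPA') = 0.99 − 0.69 = 0.30
Information gain ('Communication') = 0.99 − 0.63 = 0.36
Information gain ('Aptitude') = 0.99 − 0.52 = 0.47
Information gain ('Programming Skill') = 0.99 − 0.95 = 0.04
'Aptitude' offers the highest information gain and is therefore chosen for the first split, which is consistent with the tree in Figure 7.11.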
(a) Original data set:

              Yes     No      Total
Count         8       10      18
pi            0.44    0.56
-pi*log(pi)   0.52    0.47    0.99
Aptitude = High

CGPA     Communication   Programming Skill   Job offered?
High     Good            Good                Yes
Medium   Good            Good                Yes
High     Good            Bad                 Yes
High     Good            Good                Yes
High     Bad             Good                Yes
Medium   Good            Good                Yes
Low      Bad             Bad                 No
Low      Bad             Bad                 No
Medium   Good            Bad                 Yes
Medium   Bad             Good                No
Medium   Good            Bad                 Yes

FIG. 7.13B (Continued)
(a) Level 2 starting set:

              Yes     No      Total
Count         8       3       11
pi            0.73    0.27
-pi*log(pi)   0.33    0.51    0.85

FIG. 7.13B
Entropy and information gain calculation (Level 2)
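As a quick check of the values in this table: the Aptitude = High branch contains 8 'Yes' and 3 'No' records, so
Entropy = −(8/11) log2(8/11) − (3/11) log2(3/11) ≈ 0.85,
which is the value shown for the level 2 starting set.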
As a part of level 3, we will thus have only one branch to navigate in this case - the one for Communication = Bad. Figure 7.13c presents the calculations for level 3. As can be seen from the figure, the entropy value is as follows:
• 0.81 before the split
• 0 when the feature 'CGPA' is used for the split
• 0.50 when the feature 'Programming Skill' is used for the split
Level 3 starting set (Communication = Bad branch):

              Yes     No      Total
Count         1       3       4
pi            0.25    0.75
-pi*log(pi)   0.50    0.31    0.81

Split on 'CGPA' - every partition is pure, so each has entropy 0:

              CGPA = High        CGPA = Medium      CGPA = Low
Count         1 Yes, 0 No (1)    0 Yes, 1 No (1)    0 Yes, 2 No (2)
pi            1.00, 0.00         0.00, 1.00         0.00, 1.00
-pi*log(pi)   0.00               0.00               0.00
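A similar check for level 3: the Communication = Bad branch contains 1 'Yes' and 3 'No' records, so the entropy before the split is
−(1/4) log2(1/4) − (3/4) log2(3/4) ≈ 0.81.
Splitting on 'CGPA' produces three pure partitions (High: 1 'Yes'; Medium: 1 'No'; Low: 2 'No'), so the weighted entropy after the split is 0. 'CGPA' is therefore selected for the final split, completing the tree.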