Unit 3
• Every internal node holds a "test" on an attribute, each branch corresponds to an outcome of the test, and every leaf node represents a class label. Decision trees are among the most widely used supervised learning techniques.
Introduction
• Decision tree learning is a method for approximating discrete-valued target functions
• The learned function is represented by a decision tree
• Decision trees can also be re-represented as if-then rules to improve human readability
• Decision trees classify instances by sorting them down the tree from the root to
some leaf node
• A node
– Specifies some attribute of an instance to be tested
• A branch
– Corresponds to one of the possible values for an attribute
Terminology
• Let us now see the common terms used in decision trees, stated below:
1. If the target variable is categorical, e.g., whether a loan applicant will default or not (yes/no), the tree is called a categorical variable decision tree, also called a classification tree.
2. If the target variable is numeric or continuous, e.g., when we have to predict a house price, the tree is called a continuous variable decision tree, also called a regression tree (a code sketch contrasting the two follows below).
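As a minimal sketch of the two kinds of tree, scikit-learn provides DecisionTreeClassifier and DecisionTreeRegressor; the tiny loan and house datasets below are hypothetical, purely for illustration:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Classification tree: categorical target (will the applicant default? yes/no)
    # Hypothetical features: [age, income]
    clf = DecisionTreeClassifier()
    clf.fit([[25, 30000], [40, 90000], [35, 45000], [50, 120000]],
            ["yes", "no", "yes", "no"])

    # Regression tree: continuous target (house price)
    # Hypothetical features: [square feet, bedrooms]
    reg = DecisionTreeRegressor()
    reg.fit([[900, 2], [1200, 3], [1500, 4], [2000, 4]],
            [180000, 250000, 310000, 400000])

    print(clf.predict([[30, 40000]]))  # a class label, e.g. ['yes']
    print(reg.predict([[1300, 3]]))    # a numeric price, e.g. [250000.]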
• Such a classification is made by posing a sequence of questions, starting at the root node and ending at a terminal (leaf) node.
An Illustrative Example
Day   Outlook    Temperature   Humidity   Wind     Play Tennis
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No
Training examples for the target concept PlayTennis (Table 3.2)
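For these canonical training examples (Mitchell's Table 3.2), the learned tree tests Outlook at the root; a minimal Python sketch re-expressing that tree as the if-then rules mentioned in the introduction:

    def play_tennis(outlook, humidity, wind):
        # Root test: Outlook
        if outlook == "Overcast":
            return "Yes"
        if outlook == "Sunny":   # Sunny branch tests Humidity
            return "Yes" if humidity == "Normal" else "No"
        if outlook == "Rain":    # Rain branch tests Wind
            return "Yes" if wind == "Weak" else "No"

    print(play_tennis("Sunny", "High", "Weak"))  # "No", matching D1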
Entropy is defined as the randomness, or a measure of the disorder, of the information being processed in machine learning. In other words, entropy is the metric that measures the unpredictability or impurity in the system.
• Entropy characterizes the (im)purity of an arbitrary collection of examples
• Entropy specifies the minimum number of bits of information needed to encode the classification of an arbitrary member of S
• For example, for the set X = {a, a, a, b, b, b, b, b} (8 instances in total: 3 of a, 5 of b):
  Entropy(X) = -(3/8)log2(3/8) - (5/8)log2(5/8)
             = -[0.375 × (-1.415) + 0.625 × (-0.678)]
             = -(-0.531 - 0.424)
             ≈ 0.954
• The information required for classification of Table 3.2 (9 Yes, 5 No):
  Entropy = -(9/14)log2(9/14) - (5/14)log2(5/14) ≈ 0.940
  (see the Python sketch below)
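A minimal sketch of the same calculation in Python (the function name is ours; the math follows the formula above):

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy, in bits, of a collection of class labels."""
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    print(round(entropy(list("aaabbbbb")), 3))          # 0.954
    print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # 0.940 for Table 3.2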
Lists of Algorithms
• ID3 (Iterative Dichotomiser 3) – This DT algorithm was developed by Ross Quinlan; it uses a greedy algorithm to generate multi-way branching trees. Trees are grown to maximum size before pruning.
• C4.5 extended ID3 by removing the restriction that features must be categorical: it dynamically defines discrete intervals for numerical attributes. It also converts the trained trees into sets of if-then rules.
• C5.0 uses less memory and creates smaller rulesets than C4.5.
• CART (Classification and Regression Trees) is similar to C4.5, but it supports numerical target variables (regression) and does not compute rule sets. It generates a binary tree (see the sketch below).
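scikit-learn's tree module implements an optimized version of CART; a minimal sketch that grows a binary tree on the classic Iris data and prints it as nested if-then tests:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    # criterion="entropy" uses the information measure discussed above;
    # the default "gini" is CART's classical impurity measure
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
    print(export_text(tree))  # each internal node is a binary test on a feature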
• Disadvantages
• Even small changes in the training data can transform the learned logic (the tree is unstable).
• Larger trees are difficult to interpret.
• Biased towards attributes having more levels.
BNN        ANN
Soma       Node
Dendrites  Input
Synapse    Weights or connections
Axon       Output
ANN Vs. BNN
• Criteria Based Comparison
Criteria          BNN                                              ANN
Processing        Massively parallel processing                    Some parallel processing
Learning          Can tolerate ambiguity                           Requires very precise, structured, and formatted data
Fault tolerance   BNN exhibits high fault tolerance                ANN is only slightly fault-tolerant
Storage           Stores information in the synapses (unlimited)   Stores information in contiguous memory locations (limited)
Backpropagation
● Most common method of obtaining the many
weights in the network
● A form of supervised training
● The basic backpropagation algorithm is based on minimizing the error of the network using the derivatives of the error function (see the NumPy sketch below)
► Simple to implement
► Slow to converge
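A minimal sketch of that idea in Python/NumPy: a tiny 2-4-1 sigmoid network trained on XOR by descending the derivatives of the squared error (the network size, learning rate, and iteration count are illustrative choices, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # input -> hidden weights
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output weights
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    lr = 0.5

    for _ in range(10000):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: error derivatives, layer by layer
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # Gradient-descent weight updates
        W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

    print(out.round(2).ravel())  # should approach [0, 1, 1, 0]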
Backpropagation Network
Representations
• Individual units interconnected in layers that form a directed graph.
• Learning corresponds to choosing a weight value for each edge in the graph.
• Certain types of cycles are allowed.
• The vast majority of practical applications are acyclic feed-forward networks, like ALVINN.
• Single layer – single-layer perceptrons can learn only linearly separable patterns.
[Figure: input neurons connected through weights to an output neuron]
• Multilayer – multilayer perceptrons, i.e., feedforward neural networks with two or more layers, have greater processing power (see the sketch after this list).
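A minimal sketch of this limitation with scikit-learn (AND is linearly separable, XOR is not; the hidden-layer size and random seed are arbitrary choices):

    from sklearn.linear_model import Perceptron
    from sklearn.neural_network import MLPClassifier

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y_and = [0, 0, 0, 1]  # linearly separable
    y_xor = [0, 1, 1, 0]  # not linearly separable

    print(Perceptron().fit(X, y_and).score(X, y_and))  # 1.0
    print(Perceptron().fit(X, y_xor).score(X, y_xor))  # stuck below 1.0
    mlp = MLPClassifier(hidden_layer_sizes=(4,), solver="lbfgs",
                        max_iter=2000, random_state=0)
    print(mlp.fit(X, y_xor).score(X, y_xor))  # typically 1.0 with a hidden layer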
Applications of NN
• Speech Recognition
• Character Recognition
• Signature Verification
• Human Face Recognition