Unit 3

Satish Bansal

Decision Tree & NN

What is a Decision Tree?

• A decision tree, as the name suggests, is a flowchart-like tree structure that works on
the principle of conditions. It is an efficient algorithm widely used for predictive
analysis. Its main components are internal nodes, branches, and terminal (leaf) nodes.

• Every internal node holds a "test" on an attribute, each branch holds an outcome of
the test, and every leaf node holds a class label. It is among the most widely used
supervised learning techniques.

• It is used for both classification and regression, and is often termed
"CART", meaning Classification and Regression Tree. Tree algorithms are
often preferred for their stability and reliability.
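As a quick illustration of CART-style classification, here is a hedged sketch using scikit-learn (the library and the Iris dataset are assumptions for illustration; the slides do not name them):

```python
# Minimal CART sketch: fit a decision tree classifier and score it on
# held-out data. The entropy criterion matches the information-gain
# discussion later in this unit (Gini impurity is scikit-learn's default).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))   # test-set accuracy
```

The same estimator handles regression via `DecisionTreeRegressor`, reflecting the "Classification and Regression Tree" naming.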

Introduction
• Decision tree learning is a method for approximating discrete-valued
target functions
• The learned function is represented by a decision tree
• A learned tree can also be re-represented as a set of if-then rules to improve
human readability
• Decision trees classify instances by sorting them down the tree from the root to
some leaf node
• A node
– Specifies some attribute of the instance to be tested
• A branch
– Corresponds to one of the possible values for that attribute

How can an algorithm be used to represent a tree?
• Let us see an example of a basic decision tree that decides under what
conditions to play cricket and under what conditions not to play.
Decision Tree Representation

[Figure: decision tree for PlayTennis. Root: Outlook, with branches Sunny, Overcast, Rain.
Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).]

• Each path corresponds to a conjunction of attribute tests. For
example, if the instance is (Outlook=Sunny, Temperature=Hot,
Humidity=High, Wind=Strong), then the path (Outlook=Sunny ∧
Humidity=High) is matched, so the target value is No, as
shown in the tree.

• A decision tree represents a disjunction of
conjunctions of constraints on the attribute values of
instances. For example, the three positive paths can
be represented as (Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak), as
shown in the tree.
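The tree above can be encoded as nested dictionaries, with a tiny classifier that sorts an instance down the branches (a hedged sketch; this representation is an illustration, not from the slides):

```python
# The PlayTennis tree: each internal node is {attribute: {value: subtree}},
# each leaf is a plain class label string.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, instance):
    """Follow the branch matching the instance's attribute value
    until a leaf (a plain string label) is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))          # attribute tested at this node
        node = node[attribute][instance[attribute]]
    return node

print(classify(tree, {"Outlook": "Sunny", "Temperature": "Hot",
                      "Humidity": "High", "Wind": "Strong"}))   # → No
```

Note that Temperature is never consulted: the path (Outlook=Sunny ∧ Humidity=High) alone determines the No label, exactly as the conjunction-of-tests reading describes.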

Terminology
• Let us now see the common terms used with decision trees, as stated below:

• Branches - Divisions of the whole tree are called branches.
• Root Node - Represents the whole sample, which is further divided.
• Splitting - Division of nodes is called splitting.
• Terminal Node - A node that does not split further is called a terminal node.
• Decision Node - A sub-node that gets further divided into different sub-nodes.
• Pruning - Removal of sub-nodes from a decision node.
• Parent and Child Node - When a node gets divided further, that node is termed the
parent node, whereas the divided nodes (sub-nodes) are termed child nodes of the
parent node.

How Does the Decision Tree Algorithm Work?

• It works with both categorical and continuous inputs and outputs. In
classification problems, the decision tree asks questions, and
based on the answers (yes/no) it splits the data into further sub-branches.
• It can be used for binary classification problems, such as predicting whether
a bank customer will churn or whether an individual who has requested
a loan from the bank will default, and it can also handle multiclass
classification problems. But how does it do these tasks?
• In a decision tree, the algorithm starts at the root node, then
compares the values of different attributes and follows the corresponding branch until
it reaches a leaf node. It uses different criteria to choose the
split and variable that yield the most homogeneous subsets of the population.

Types of Decision Tree

• The type of decision tree depends on the type of target variable,
categorical or numerical:
• Categorical Variable Decision Tree
• Continuous Variable Decision Tree

1.If the target is a categorical variable, like whether a loan applicant will
default or not (yes/no), the decision tree is called a
categorical variable decision tree, also called a classification tree.
2.If the target is numeric or continuous in nature, such as when we have
to predict a house price, the decision tree is called a continuous
variable decision tree, also called a regression tree.

Decision Tree and Classification Task

• A decision tree helps us to classify data.
• Internal nodes test some attribute

• Edges are the values of attributes

• External nodes are the outcomes of classification

• Such a classification is, in fact, made by posing questions, starting from the
root node and continuing down to a terminal node.

What are appropriate problems for Decision tree learning? (When to use DT learning?)
• Decision tree learning is generally best suited to problems with the
following characteristics:
• 1. Instances are represented by attribute-value pairs.
• “Instances are described by a fixed set of attributes (e.g.,
Temperature) and their values (e.g., Hot). The easiest situation for
decision tree learning is when each attribute takes on a small number
of disjoint possible values (e.g., Hot, Mild, Cold). However, extensions
to the basic algorithm allow handling real-valued attributes as well
(e.g., representing Temperature numerically).”

What are appropriate problems for Decision tree learning?
• 2. The target function has discrete output values.
• “The decision tree is usually used for Boolean classification
(e.g., yes or no) kinds of examples. Decision tree methods easily extend
to learning functions with more than two possible output values. A
more substantial extension allows learning target functions with real-
valued outputs, though the application of decision trees in this setting
is less common.”
• 3. Disjunctive descriptions may be required.
• Decision trees naturally represent disjunctive expressions.

What are appropriate problems for Decision tree learning?
• 4. The training data may contain errors.
• “Decision tree learning methods are robust to errors, both errors in
classifications of the training examples and errors in the attribute
values that describe these examples.”
• 5. The training data may contain missing attribute values.
• “Decision tree methods can be used even when some training
examples have unknown values (e.g., if the Humidity of the day is
known for only some of the training examples).”

Appropriate problems for Decision tree learning
• Many practical problems have been found to fit these characteristics:
• Equipment classification
• Medical diagnosis
• Credit risk analysis
• Several tasks in natural language processing
Problems in which the task is to classify examples into one of a
discrete set of possible categories are often referred to as classification problems.
Learning Algorithm (cont.)

An Illustrative Example

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Training examples for the target concept PlayTennis
Learning Algorithm (cont.)

Entropy is defined as the randomness, or the measure of disorder, of the
information being processed in machine learning. In other words,
entropy is the machine learning metric that measures the
unpredictability or impurity in the system.

• Entropy
– Characterizes the (im)purity of an arbitrary collection of examples
– Entropy specifies the minimum number of bits of information needed
to encode the classification of an arbitrary member of S
– For example, for the set X = {a,a,a,b,b,b,b,b}:
  Total instances: 8; instances of a: 3; instances of b: 5
  Entropy(X) = -(3/8)log2(3/8) - (5/8)log2(5/8)
             = -[0.375 × (-1.415) + 0.625 × (-0.678)]
             = 0.954
• The information required for classification of Table 3.2:
  Entropy(S) = -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.940
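The worked example above can be checked with a few lines of Python (a hedged sketch; the function name is an illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels, in bits:
    -sum over classes of p * log2(p)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

# The slide's example X = {a,a,a,b,b,b,b,b}:
print(round(entropy(list("aaabbbbb")), 3))           # → 0.954
# The PlayTennis table: 9 Yes, 5 No:
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # → 0.94
```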

Lists of Algorithms
• ID3 (Iterative Dichotomiser 3) - This DT algorithm was developed by
Ross Quinlan and uses a greedy algorithm to generate multi-branch
trees. Trees are grown to maximum size before pruning.
• C4.5 improved on ID3 by overcoming the restriction that features must
be categorical. It dynamically defines discrete attributes for
numerical features, and converts the trained trees into sets of if-then rules.
• C5.0 uses less space and creates smaller rulesets than C4.5.
• CART (Classification and Regression Trees) is similar to C4.5, but it
supports numerical target variables and does not compute rule sets. It
generates binary trees.
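ID3's greedy choice at each node is the attribute with the highest information gain: the parent's entropy minus the weighted entropy of the child subsets. A hedged sketch (function and data names are illustrations):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    """examples: list of dicts; attribute: key to split on; target: label key.
    Gain = entropy(parent) - sum of |subset|/|parent| * entropy(subset)."""
    labels = [e[target] for e in examples]
    gain = entropy(labels)
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

# Tiny illustration: a perfectly predictive attribute has maximal gain.
data = [{"Wind": "Weak", "Play": "Yes"}, {"Wind": "Weak", "Play": "Yes"},
        {"Wind": "Strong", "Play": "No"}, {"Wind": "Strong", "Play": "No"}]
print(information_gain(data, "Wind", "Play"))   # → 1.0
```

ID3 applies this measure recursively, splitting each node on the best-scoring attribute until the subsets are pure or attributes are exhausted.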

Hypothesis Space Search

• ID3 can be seen as searching the space of possible decision trees
• Simple-to-complex, hill-climbing search
• Complete hypothesis space of finite discrete-valued functions
• ID3 maintains only a single current hypothesis
• Can't tell how many alternative decision trees are consistent with the
available training data
• Can't pose queries for new instances that optimally resolve the competing
hypotheses
• Pure ID3 performs no backtracking - it can converge to a local optimum
• ID3 is not incremental - less sensitive to errors in individual training instances -
easily extended to handle noisy data

Inductive Bias in Decision Tree Learning


• Much harder to define because of heuristic search.
• Shorter trees are preferred over longer ones.
• Trees that place high information gain attributes close to the root are
preferred over those that do not.
Issues in Decision Tree Learning
• Determine how deeply to grow the decision tree
• Handling continuous attributes
• Choosing an appropriate attribute selection measure
• Handling training data with missing attribute values
• Handling attributes with differing costs
• Improving computational efficiency

What are the Advantages and Disadvantages of Decision Trees?

• Advantages
• The decision tree algorithm is effective and very simple.
• Decision tree algorithms can be used while dealing with missing values in the
dataset.
• Decision tree algorithms can handle numeric as well as categorical features.
• Results generated by a decision tree do not require any
statistical or mathematical knowledge to be explained.

• Disadvantages
• The learned logic can change even with small changes in the training data.
• Larger trees are difficult to interpret.
• Splitting is biased towards attributes having more levels.

Applications of Decision Trees


• Decision Tree is one of the basic and widely-used algorithms in the fields of
Machine Learning. It’s put into use across different areas in classification and
regression modeling. Due to its ability to depict visualized output, one can
easily draw insights from the modeling process flow. Here are a few examples
wherein Decision Tree could be used,
• Business Management
• Customer Relationship Management
• Fraudulent Statement Detection
• Energy Consumption
• Healthcare Management
• Fault Diagnosis
In Artificial Neural
Networks…
• We want to create machines that
imitate the working of a
human brain
• Instead of biological neurons, there are
artificial neurons that are connected in
much the same way as neurons in the
human brain
• These neurons also learn and train on data
and try to generate better outputs
• The basic idea of ANNs is to put human brain
capability into a machine so that
it can learn from its experiences

ANN Vs. BNN


• Similarities

BNN ANN
Soma Node
Dendrites Input
Synapse Weights or connections
Axon Output
ANN Vs. BNN
• Criteria Based Comparison

Criteria           BNN                                      ANN
Processing         Massively parallel processing            Some parallel processing
Speed              Slower (milliseconds)                    Faster (nanoseconds)
Control Mechanism  No control unit                          There is a control unit to monitor the computation
Size               Huge: ~10^11 neurons and ~10^15          Comparatively small: 10^2 to 10^4 nodes;
                   interconnections                         mainly depends on the type of network and
                                                            network design
Learning           Can tolerate ambiguity                   Very precise, structured and formatted data
                                                            is required
Fault Tolerance    BNN exhibits high fault tolerance        ANNs are only slightly fault tolerant
Storage            Stores information in the synapses       Stores information in contiguous memory
                   (unlimited)                              locations (limited)

Artificial Neural Networks

• A robust approach to approximating real-valued, discrete-valued, and vector-
valued target functions
• For certain types of problems (complex real-world sensor data), ANNs are the
most effective learning methods currently known
• e.g., learning to recognize handwritten characters, spoken words, or faces
• Inspiration: biological learning systems are built of very complex webs of
interconnected neurons
• ANNs are built of a densely interconnected set of simple units, where each
unit takes a number of real-valued inputs (possibly the outputs of other
units) and produces a single real-valued output (which may become the
input to many other units)

Neural Network Representations

• ALVINN (Autonomous Land Vehicle In a Neural Network) - an ANN
to steer an autonomous vehicle driving at normal speeds on public highways
• Input - 30 x 32 grid of pixel intensities from a forward-pointed camera
• Output - direction the vehicle is steered
• Trained to mimic the observed steering commands of a human driving
the vehicle for 5 minutes

Backpropagation
● The most common method of obtaining the many
weights in the network
● A form of supervised training
● The basic backpropagation algorithm minimizes
the error of the network using the
derivatives of the error function
► Simple
► Slow
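The core step, minimizing error with the derivative of the error function, can be shown on a single sigmoid unit (a hedged numeric sketch; the learning rate and initial weight are arbitrary illustrations):

```python
import math

# One gradient-descent step on a single sigmoid unit, minimizing the
# squared error E = (y - t)^2 / 2. The chain rule gives
# dE/dw = (y - t) * y * (1 - y) * x, the delta used by backpropagation.
def forward(w, b, x):
    return 1 / (1 + math.exp(-(w * x + b)))

w, b, x, t, lr = 0.5, 0.0, 1.0, 1.0, 1.0   # weight, bias, input, target, rate
y = forward(w, b, x)
delta = (y - t) * y * (1 - y)              # error signal propagated backwards
w -= lr * delta * x                        # weight update
b -= lr * delta                            # bias update (bias input is 1)

error_before = (y - t) ** 2 / 2
error_after = (forward(w, b, x) - t) ** 2 / 2
print(error_after < error_before)          # → True
```

In a multi-layer network the same delta is propagated backwards through the hidden layers, weighting each unit's error by the connections it feeds, hence the name.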

Backpropagation Network
Representations
• individual units interconnected in layers that form a
directed graph.
• learning corresponds to choosing a weight value for
each edge in the graph.
• certain types of cycles are allowed.
• vast majority of practical applications are acyclic feed-
forward networks like ALVINN.

Appropriate Problems for ANN

• Training data is noisy, complex sensor data.
• Also problems where symbolic algorithms are used (e.g., decision tree
learning (DTL)) - ANN and DTL produce results of
comparable accuracy.
• Instances are attribute-value pairs; attributes may be highly
correlated or independent; values can be any real value.
• The target function may be discrete-valued, real-valued or a vector.
• Training examples may contain errors.
• Long training times are acceptable.
• Fast evaluation of the learned target function may be required.
Perceptrons
● First neural network with the ability to learn
● Made up of only input neurons and output neurons
● Input neurons typically have two states: ON and OFF
● Output neurons use a simple threshold activation function
● In basic form, can only solve linear problems
● A perceptron is a linear (binary) classifier used in supervised
learning. It helps to classify the given input data.

[Figure: input neurons connected to an output neuron through weights, e.g. 0.5, 0.2, 0.8]

There are two types of perceptrons: single layer and multilayer.
• Single layer - single layer perceptrons can learn only linearly separable
patterns
• Multilayer - multilayer perceptrons, or feedforward neural networks with two
or more layers, have greater processing power
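A single-layer perceptron with a threshold activation can be trained with the classic perceptron learning rule. A hedged sketch on the linearly separable AND function (the learning rate and epoch count are arbitrary illustrations):

```python
# Perceptron learning rule: on each error, nudge the weights towards
# the target by lr * error * input. Converges for linearly separable data.
def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0   # threshold unit
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in AND]
print(preds)   # → [0, 0, 0, 1]
```

On a non-separable target such as XOR, this loop never converges, which is exactly the limitation the multilayer networks on the next slide remove.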

Multilayer Feedforward Networks


● Most common neural network
● An extension of the perceptron
► Multiple layers
■ The addition of one or more “hidden” layers in
between the input and output layers
► Activation function is not simply a threshold
■ Usually a sigmoid function
► A general function approximator
■ Not limited to linear problems

● Information flows in one direction


► The outputs of one layer act as inputs to the
next layer
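A forward pass through such a network can be sketched in a few lines. Here the weights are hand-chosen for illustration (an assumption, not learned) so that the hidden layer computes OR and NAND and the output combines them into XOR, a nonlinear problem a single perceptron cannot solve:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def xor_net(x1, x2):
    # Hidden layer: two sigmoid units with large weights approximating
    # boolean gates; information flows strictly input → hidden → output.
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)     # ≈ OR(x1, x2)
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)    # ≈ NAND(x1, x2)
    return sigmoid(20 * h1 + 20 * h2 - 30)   # ≈ AND(h1, h2) = XOR

outputs = [round(xor_net(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)   # → [0, 1, 1, 0]
```

In practice these weights would be found by backpropagation; the sigmoid's smoothness (unlike a hard threshold) is what makes the error derivatives well-defined.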

Application of NN
• Speech Recognition
• Character Recognition
• Signature Verification
• Human Face Recognition
