ML-Lec-06-Supervised Learning-Decision Trees

The document discusses supervised learning, specifically classification, which involves using a training dataset containing examples with known labels to learn a model that can predict the labels of new examples. Classification problems involve learning to classify examples into categories, and common techniques include decision trees, rules, neural networks, naive Bayes, and support vector machines. Decision trees represent a classification model as a tree structure where internal nodes represent attributes and leaves represent class labels that are predicted for new examples.


Lecture 06

CS-13410
Introduction to Machine Learning
Supervised Learning
(Decision Trees – Ch # 3 by Tom Mitchell)
by
Mudasser Naseer
Types of Learning
• Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). (Wikipedia)

2
Supervised Learning – Detailed Definition
In supervised learning we have input variables (X) and an output
variable (Y) and we use an algorithm to learn the mapping
function from the input to the output. Y = f(X)
The goal is to approximate the mapping function so well that,
when you have new input data (X), you can predict the output
variables (Y) for that data.
It is called supervised learning because the process of an
algorithm learning from the training dataset can be thought of as
a teacher supervising the learning process. We know the correct
answers, the algorithm iteratively makes predictions on the
training data and is corrected by the teacher. Learning stops when
the algorithm achieves an acceptable level of performance.

3
Supervised Learning
Supervised learning problems can be further grouped
into regression and classification problems.
Classification: A classification problem is when the output
variable is a category, such as “red” or “blue” or “disease”
and “no disease”.
Regression: A regression problem is when the output
variable is a real value, such as “dollars” or “weight”.

4
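
To make the distinction concrete, here is a minimal Python sketch (not from the lecture; scikit-learn, the toy data, and the printed values are illustrative assumptions):

# Classification: categorical output. Regression: real-valued output.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0.5], [1.5], [2.5], [3.5]]            # input variables (X)

y_class = ["red", "red", "blue", "blue"]    # categories -> classification
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[1.0]]))                 # e.g. ['red']

y_real = [10.0, 20.0, 30.0, 40.0]           # real values -> regression
reg = DecisionTreeRegressor().fit(X, y_real)
print(reg.predict([[1.0]]))                 # e.g. [10.]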
Catching tax-evasion
• Tax-return data for year 2011
• A new tax return for 2012
• Is this a cheating tax return?
• An instance of the classification problem: learn a method for discriminating between records of different classes (cheaters vs non-cheaters)
5
What is classification?
Classification is the task of learning a target function f that maps an attribute set x to one of the predefined class labels y

[Training-data table omitted; attribute types: categorical, categorical, continuous, class]

• One of the attributes is the class attribute
• In this case: Cheat
• Two class labels (or classes): Yes (1), No (0)

6
What is classification? (cont…)
The target function f is known as a classification model

• Descriptive modeling: an explanatory tool to distinguish between objects of different classes (e.g., understand why people cheat on their taxes)

• Predictive modeling: predict the class of a previously unseen record

7
Examples of Classification Tasks
• Predicting tumor cells as benign or malignant

• Classifying credit card transactions as legitimate or fraudulent

• Categorizing news stories as finance, weather, sports, entertainment, etc.

• Identifying spam email, spam web pages, adult content

• Understanding if a web query has commercial intent or not

8
General approach to classification
• The training set consists of records with known class labels

• The training set is used to build a classification model

• A labeled test set of previously unseen data records is used to evaluate the quality of the model

• The classification model is applied to new records with unknown class labels

9
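
As a sketch of this workflow (assumptions: scikit-learn and one of its toy datasets; none of this is from the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)      # records with known class labels

# Hold out a labeled test set of previously unseen records for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)  # build the model
print(accuracy_score(y_test, model.predict(X_test)))    # quality of the model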
Illustrating Classification Task

10
Evaluation of classification models
Counts of test records that are correctly (or incorrectly) predicted by the classification model are summarized in a confusion matrix:

                    Predicted Class
                    Class = 1    Class = 0
Actual Class = 1       f11          f10
Actual Class = 0       f01          f00

11
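
As an illustration of these four counts (a sketch; the label vectors are made up):

y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classes predicted by the model

f11 = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
f10 = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
f01 = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
f00 = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
print(f11, f10, f01, f00)            # 3 1 1 3
print((f11 + f00) / len(y_true))     # fraction correct: 0.75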
Classification Techniques
Decision Tree based Methods
Rule-based Methods
Memory based reasoning
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines

12
Decision Trees
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution

14
Example of a Decision Tree

[Training-data table omitted; attribute types: categorical, categorical, continuous, class]

Splitting attributes (internal nodes), test outcomes (branches), class labels (leaves):

Refund?
├─ Yes → NO
└─ No → MarSt?
    ├─ Married → NO
    └─ Single, Divorced → TaxInc?
        ├─ <= 80K → NO
        └─ > 80K → YES

Training Data → Model: Decision Tree
15
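
Read as code, this model is nothing but nested attribute tests. A sketch (attribute names and thresholds from the slide; the dict record format is an assumption):

def predict_cheat(record):
    # record example: {"Refund": "No", "MarSt": "Single", "TaxInc": 90}
    # TaxInc is in thousands, matching the 80K threshold on the slide.
    if record["Refund"] == "Yes":
        return "No"
    if record["MarSt"] == "Married":
        return "No"
    return "No" if record["TaxInc"] <= 80 else "Yes"

print(predict_cheat({"Refund": "No", "MarSt": "Single", "TaxInc": 90}))  # Yes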
Another Example of Decision Tree

[Same training data; attribute types: categorical, categorical, continuous, class]

MarSt?
├─ Married → NO
└─ Single, Divorced → Refund?
    ├─ Yes → NO
    └─ No → TaxInc?
        ├─ <= 80K → NO
        └─ > 80K → YES

There could be more than one tree that fits the same data!

16
Decision Tree Classification Task

Decision Tree

17
Apply Model to Test Data
Test Data

Start from the root of the tree and, at each internal node, follow the branch that matches the test record:

Refund?
├─ Yes → NO
└─ No → MarSt?
    ├─ Married → NO
    └─ Single, Divorced → TaxInc?
        ├─ <= 80K → NO
        └─ > 80K → YES
18
Apply Model to Test Data (cont.)
Test Data

For this test record the walk ends at the Married branch under MarSt:

Refund?
├─ Yes → NO
└─ No → MarSt?
    ├─ Married → NO   ← Assign Cheat to “No”
    └─ Single, Divorced → TaxInc?
        ├─ <= 80K → NO
        └─ > 80K → YES
23
Decision Tree Classification Task

Decision Tree

24
Tree Induction
Finding the best decision tree is NP-hard

Greedy strategy:
Split the records based on an attribute test that
optimizes a certain criterion.

Many Algorithms:
Hunt’s Algorithm (one of the earliest)
CART
ID3, C4.5
SLIQ, SPRINT

25
ID3 Algorithm
ID3 (Iterative Dichotomiser 3) is an algorithm
invented by Ross Quinlan used to generate a decision
tree from a dataset.
C4.5 is its successor.
These algorithms employ a top-down, greedy search
through the space of possible decision trees.

26
Which Attribute is the Best Classifier?
The central choice in the ID3 algorithm is selecting
which attribute should be tested at the root of the tree and
then at each node in the tree

• We would like to select the attribute which is most useful for classifying examples

• For this we need a good quantitative measure

• For this purpose a statistical property, called information gain, is used
27
Which Attribute is the Best Classifier?
Definition of Entropy
• In order to define information gain precisely, we begin by defining entropy

• Entropy is a measure commonly used in information theory

• Entropy characterizes the impurity of an arbitrary collection of examples

28
Entropy
Entropy of a data set D is denoted by H(D). If the Ci are the possible classes and pi is the fraction of records from D that have class Ci, then

    H(D) = − Σi pi log2(pi)

The range of entropy is from 0 to log2(m), where m is the number of classes. The maximum value is attained when all the classes have equal proportion.
29
Entropy Examples
Example:
10 records have class A
20 records have class B
30 records have class C
40 records have class D
Entropy = −[(.1 log2 .1) + (.2 log2 .2) + (.3 log2 .3) + (.4 log2 .4)]
Entropy = 1.846

30
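
A quick arithmetic check of this slide (a sketch; standard library only):

import math

def entropy(fractions):
    # H(D) = -sum(p * log2(p)) over the class proportions p
    return -sum(p * math.log2(p) for p in fractions if p > 0)

print(round(entropy([0.1, 0.2, 0.3, 0.4]), 3))   # 1.846
print(entropy([0.25, 0.25, 0.25, 0.25]))         # 2.0 = log2(4), the maximum for m = 4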
Splitting Criterion
Example:
Two classes, +/-
100 records overall (50 +s and 50 -s)
A and B are two binary attributes
Records with A=0: 48+, 2-
Records with A=1: 2+, 48-
Records with B=0: 26+, 24-
Records with B=1: 24+, 26-
Splitting on A is better than splitting on B
A does a good job of separating +s and -s
B does a poor job of separating +s and -s

31
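
Numerically, with the entropy() helper from the previous sketch (counts copied from this slide):

def weighted_entropy(children):
    # children: list of (positives, negatives) per child node
    total = sum(p + n for p, n in children)
    return sum((p + n) / total * entropy([p / (p + n), n / (p + n)])
               for p, n in children)

parent = entropy([0.5, 0.5])                             # 50 +s, 50 -s -> 1.0
print(parent - weighted_entropy([(48, 2), (2, 48)]))     # ~0.76  gain of split on A
print(parent - weighted_entropy([(26, 24), (24, 26)]))   # ~0.001 gain of split on B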
Which Attribute is the Best Classifier?
Information Gain
• The expected information needed to classify a tuple in D is

      Info(D) = H(D) = − Σi pi log2(pi)

• How much more information would we still need (after partitioning at attribute A) to arrive at an exact classification? This amount is measured by

      InfoA(D) = Σj (|Dj| / |D|) × Info(Dj)

• In general, we write Gain(A) = Info(D) − InfoA(D), where D is the collection of examples and A is an attribute
32
Entropy (for two classes, i.e. m = 2)

    H(D) = − p+ log2(p+) − p− log2(p−)

[Figure omitted: entropy of a two-class set as a function of the proportion of positive examples; it peaks at 1.0 when the classes are balanced]
33
Information Gain
Gain of an attribute split: compare the impurity of the parent node with the weighted average impurity of the child nodes

Maximizing the gain is equivalent to minimizing the weighted average impurity measure of the children nodes

34
DECISION TREES
Which Attribute is the Best Classifier?: Information Gain

35
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain


• The collection of examples has 9 positive values and 5 negative ones

• Entropy(D) = Entropy(S) = 0.940

• Eight of these examples (6 positive and 2 negative ones) have the attribute value Wind = Weak

• Six of these examples (3 positive and 3 negative ones) have the attribute value Wind = Strong
36
DECISION TREES
Which Attribute is the Best Classifier?: Information Gain
The information gain obtained by separating the examples
according to the attribute Wind is calculated as:

Gain(S, Wind) = Entropy(S) − (8/14) × Entropy(S_Weak) − (6/14) × Entropy(S_Strong)
              = 0.94 − (8/14) × 0.811 − (6/14) × 1.0 = 0.048

37
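
Checking the slide's numbers with the same entropy() helper (a sketch):

e_S      = entropy([9/14, 5/14])   # 0.940, the whole collection (9+, 5-)
e_weak   = entropy([6/8, 2/8])     # 0.811, Wind = Weak (6+, 2-)
e_strong = entropy([3/6, 3/6])     # 1.0,   Wind = Strong (3+, 3-)
print(round(e_S - (8/14) * e_weak - (6/14) * e_strong, 3))   # 0.048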
DECISION TREES
Which Attribute is the Best Classifier?: Information Gain
We calculate the Info Gain for each attribute and select
the attribute having the highest Info Gain

38
DECISION TREES
Example

Which attribute should be selected as the first test?

• “Outlook” provides the most information, so we put “Outlook” at the root node of the decision tree.
39
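
For reference, here is a sketch that computes all four gains on the PlayTennis training set of Mitchell, Ch. 3 (the table is reproduced from the textbook; the code reuses the entropy() helper defined earlier):

from collections import Counter

# (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def H(rows):
    # Entropy of the class labels (last column) of a set of rows
    counts = Counter(r[-1] for r in rows)
    return entropy([c / len(rows) for c in counts.values()])

def gain(rows, i):
    # Info gain of splitting rows on attribute index i
    remainder = 0.0
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / len(rows) * H(subset)
    return H(rows) - remainder

for i, name in enumerate(ATTRS):
    print(name, round(gain(DATA, i), 3))
# Outlook 0.246, Temperature 0.029, Humidity 0.151, Wind 0.048 -> Outlook wins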
DECISION TREES

40
DECISION TREES
Example

• The process of selecting a new attribute is now repeated for each (non-terminal) descendant node, this time using only the training examples associated with that node

• Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree
41
DECISION TREES
Example

• This process continues for each new leaf node until either:

1. Every attribute has already been included along this path through the tree, or

2. The training examples associated with a leaf node have zero entropy

42
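
Putting these pieces together, a compact recursive sketch of ID3 (an illustration consistent with the slides, not Quinlan's original code; it reuses gain(), Counter, and DATA from the sketches above):

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:            # stopping rule 2: zero entropy
        return labels[0]
    if not attrs:                        # stopping rule 1: attributes exhausted
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda i: gain(rows, i))    # highest information gain
    branches = {}
    for v in {r[best] for r in rows}:    # one subtree per attribute value,
        subset = [r for r in rows if r[best] == v]    # using only its examples
        branches[v] = id3(subset, [a for a in attrs if a != best])
    return (best, branches)              # (attribute index, value -> subtree)

tree = id3(DATA, list(range(4)))         # the root test comes out as Outlook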
DECISION TREES
Example

43
DECISION TREES
From Decision Trees to Rules
Next step: make rules from the decision tree

• After making the identification tree, we trace each path from the root node to a leaf node, recording the test outcomes as antecedents and the leaf-node classification as the consequent

• Simple way: one rule for each leaf

For our example we have:

• If the Outlook is Sunny and the Humidity is High then No
• If the Outlook is Sunny and the Humidity is Normal then Yes
44
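
This tracing step can be written as a short recursion over the nested tuples produced by the id3() sketch above (illustrative only):

def rules(tree, path=()):
    # Yield (antecedents, consequent): one rule per leaf
    if not isinstance(tree, tuple):      # a leaf holds the class label
        yield path, tree
        return
    attr, branches = tree
    for value, subtree in branches.items():
        yield from rules(subtree, path + ((attr, value),))

for antecedents, label in rules(tree):
    tests = " and ".join(f"the {ATTRS[a]} is {v}" for a, v in antecedents)
    print(f"If {tests} then {label}")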
DECISION TREES FOR BOOLEAN FUNCTIONS

Example

• Develop a decision tree for the following Boolean function: A ∧ B

A     B     A ∧ B
F     F     F
T     F     F
F     T     F
T     T     T

45
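
One possible answer (a sketch): test A at the root; B only needs to be examined on the A = T branch.

A?
├─ F → F
└─ T → B?
    ├─ F → F
    └─ T → T

def tree_and(a, b):
    # Root test on A; the B test is reached only when A is true
    if not a:
        return False
    return b

assert all(tree_and(a, b) == (a and b)
           for a in (False, True) for b in (False, True))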
