ML-Lec-06-Supervised Learning-Decision Trees
CS-13410
Introduction to Machine Learning
Supervised Learning
(Decision Trees – Ch. 3, Tom Mitchell)
by
Mudasser Naseer
Types of Learning
Supervised Learning
Supervised Learning – Detailed Definition
In supervised learning we have input variables (X) and an output
variable (Y), and we use an algorithm to learn the mapping
function from the input to the output: Y = f(X).
The goal is to approximate the mapping function so well that,
given new input data (X), we can predict the output
variable (Y) for that data.
It is called supervised learning because the process of an
algorithm learning from the training dataset can be thought of as
a teacher supervising the learning process. We know the correct
answers; the algorithm iteratively makes predictions on the
training data and is corrected by the teacher. Learning stops when
the algorithm achieves an acceptable level of performance.
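One standard way to formalize "approximate the mapping function well" (a textbook formulation, not from the slide) is empirical risk minimization: pick the function that minimizes the average loss on the training pairs.

```latex
% Empirical risk minimization: given training pairs (x_i, y_i), i = 1..n,
% choose \hat{f} from a hypothesis class \mathcal{F} to minimize the
% average loss L between known answers y_i and predictions f(x_i).
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i,\, f(x_i)\bigr)
```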
Supervised Learning
Supervised learning problems can be further grouped
into regression and classification problems.
Classification: A classification problem is when the output
variable is a category, such as “red” or “blue” or “disease”
and “no disease”.
Regression: A regression problem is when the output
variable is a real value, such as “dollars” or “weight”.
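A minimal sketch of the two problem types using scikit-learn (the library choice and the toy data are my own, not from the slides): the same tree learner comes in a classifier flavor for categorical outputs and a regressor flavor for real-valued outputs.

```python
# Minimal sketch: classification vs. regression with scikit-learn.
# The toy data below is made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[25], [40], [60], [80]]            # one input feature, e.g. income in $K

# Classification: the output variable is a category ("no" / "yes")
y_class = ["no", "no", "yes", "yes"]
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[70]]))              # predicts a category, e.g. ['yes']

# Regression: the output variable is a real value, e.g. dollars
y_reg = [1000.0, 1500.0, 2200.0, 3100.0]
reg = DecisionTreeRegressor().fit(X, y_reg)
print(reg.predict([[70]]))              # predicts a real number
```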
Catching tax-evasion
Motivating example: given past records of taxpayers (refund status, marital status, taxable income), each labeled as cheat or no cheat, learn a model that predicts whether a new taxpayer will cheat.
What is classification?
Classification is the task of learning a target function f that maps each attribute set x to one of a predefined set of class labels y.
The target function f is known as a classification model.
Examples of Classification Tasks
Predicting tumor cells as benign or malignant
Illustrating Classification Task
Evaluation of classification models
Counts of test records that are correctly (or incorrectly)
predicted by the classification model are tabulated in a confusion matrix:

                          Predicted Class = 1    Predicted Class = 0
    Actual Class = 1             f11                    f10
    Actual Class = 0             f01                    f00
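From these four counts the usual summary metrics follow (standard definitions, not shown on the extracted slide):

```latex
\text{Accuracy} = \frac{f_{11} + f_{00}}{f_{11} + f_{10} + f_{01} + f_{00}},
\qquad
\text{Error rate} = 1 - \text{Accuracy} = \frac{f_{10} + f_{01}}{f_{11} + f_{10} + f_{01} + f_{00}}
```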
Classification Techniques
Decision Tree based Methods
Rule-based Methods
Memory based reasoning
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines
Decision Trees
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Example of a Decision Tree
(Training data attributes: Refund and Marital Status are categorical, Taxable Income is continuous; Cheat is the class label.)

Refund?                              <- splitting attribute
|-- Yes -> NO
|-- No  -> MarSt?
    |-- Single, Divorced -> TaxInc?
    |   |-- <= 80K -> NO
    |   |-- >  80K -> YES
    |-- Married -> NO

Internal nodes test an attribute, each branch is a test outcome, and each leaf carries a class label.
Training Data -> Model: Decision Tree
Another Example of Decision Tree
(Same training data; Cheat is still the class label.)

MarSt?
|-- Married -> NO
|-- Single, Divorced -> Refund?
    |-- Yes -> NO
    |-- No  -> TaxInc?
        |-- <= 80K -> NO
        |-- >  80K -> YES

More than one tree can fit the same data.
Decision Tree Classification Task
Apply Model to Test Data
Start from the root of the tree. At each internal node, follow the branch that matches the test record's attribute value, until a leaf is reached; the leaf's label is the predicted class.

Refund?
|-- Yes -> NO
|-- No  -> MarSt?
    |-- Single, Divorced -> TaxInc?
    |   |-- <= 80K -> NO
    |   |-- >  80K -> YES
    |-- Married -> NO

For the test record (Refund = No, MarSt = Married): take the No branch at Refund, then the Married branch at MarSt, reaching a leaf. Assign Cheat to "No".
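A minimal sketch of this apply-model walk in Python (the nested-dict tree encoding and its field names are my own, chosen to mirror the tree above, not any library's format):

```python
# Walk a decision tree encoded as nested dicts; leaves are plain label strings.
# The "Single, Divorced" branch is modeled as two keys sharing one subtree.

taxinc = {"attribute": "TaxInc", "threshold": 80,      # continuous test: <= 80K vs > 80K
          "branches": {"<=": "NO", ">": "YES"}}

tree = {"attribute": "Refund",
        "branches": {"Yes": "NO",
                     "No": {"attribute": "MarSt",
                            "branches": {"Married": "NO",
                                         "Single": taxinc,
                                         "Divorced": taxinc}}}}

def classify(record, node):
    """Start from the root; follow matching branches until a leaf is reached."""
    while not isinstance(node, str):                   # internal node
        attr = node["attribute"]
        if "threshold" in node:                        # continuous attribute test
            branch = "<=" if record[attr] <= node["threshold"] else ">"
        else:                                          # categorical attribute test
            branch = record[attr]
        node = node["branches"][branch]
    return node                                        # leaf = class label

# The test record from the slides: Refund = No, MarSt = Married
print(classify({"Refund": "No", "MarSt": "Married", "TaxInc": 80}, tree))  # -> NO
```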
Tree Induction
Finding the best decision tree is NP-hard.
Greedy strategy:
Split the records based on an attribute test that
optimizes a certain criterion.
Many algorithms:
Hunt's Algorithm (one of the earliest)
CART
ID3, C4.5
SLIQ, SPRINT
ID3 Algorithm
ID3 (Iterative Dichotomiser 3) is an algorithm
invented by Ross Quinlan, used to generate a decision
tree from a dataset.
C4.5 is its successor.
These algorithms employ a top-down, greedy search
through the space of possible decision trees.
Which Attribute is the Best Classifier?
The central choice in the ID3 algorithm is selecting
which attribute to test at the root of the tree, and
then at each subsequent node in the tree.
Entropy
The entropy of a data set D, denoted H(D), is

H(D) = -\sum_{i=1}^{c} p_i \log_2 p_i

where the C_i are the possible classes and p_i is the fraction of records from D that have class C_i.
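A direct translation of H(D) into Python (a sketch; the function and variable names are my own):

```python
from math import log2

def entropy(counts):
    """Entropy H(D) from per-class record counts: H = -sum(p_i * log2(p_i))."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([50, 50]))   # 1.0    (a 50/50 split is maximally impure)
print(entropy([9, 5]))     # ~0.940 (the PlayTennis value used later)
```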
Splitting Criterion
Example:
Two classes, +/-
100 records overall (50 +s and 50 -s)
A and B are two binary attributes
Records with A=0: 48+, 2-
Records with A=1: 2+, 48-
Records with B=0: 26+, 24-
Records with B=1: 24+, 26-
Splitting on A is better than splitting on B
A does a good job of separating +s and -s
B does a poor job of separating +s and -s
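A quick numerical check of the claim above (a sketch; entropy is redefined here so the snippet runs standalone):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(parent_counts, child_counts_list):
    """Information gain: parent entropy minus weighted average child entropy."""
    n = sum(parent_counts)
    children = sum(sum(c) / n * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - children

# 100 records: 50+/50-.  A=0: 48+,2-;  A=1: 2+,48-.  B=0: 26+,24-;  B=1: 24+,26-.
print(gain([50, 50], [[48, 2], [2, 48]]))    # ~0.758  (A separates well)
print(gain([50, 50], [[26, 24], [24, 26]]))  # ~0.001  (B barely helps)
```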
Which Attribute is the Best Classifier?: Information Gain
The expected information needed to classify a tuple in D is

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

(this is exactly the entropy H(D)).
Information Gain
Gain of an attribute split: compare the impurity of
the parent node with the weighted average impurity of the
child nodes. If attribute A splits D into partitions D_1, ..., D_v:

Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, Info(D_j)

Gain(A) = Info(D) - Info_A(D)
DECISION TREES
Which Attribute is the Best Classifier?: Information Gain
For Mitchell's PlayTennis training set (Ch. 3), the collection S contains 9 positive and 5 negative examples, so Entropy(S) = 0.940.
The information gain obtained by separating the examples
according to the attribute Wind is calculated as:

Values(Wind) = {Weak, Strong}
S = [9+, 5-],  S_Weak = [6+, 2-],  S_Strong = [3+, 3-]

Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_Weak) - (6/14) Entropy(S_Strong)
             = 0.940 - (8/14)(0.811) - (6/14)(1.000)
             = 0.048
DECISION TREES
Which Attribute is the Best Classifier?: Information Gain
We calculate the information gain for every attribute and select
the attribute with the highest information gain as the test at the current node.
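A compact sketch of this selection step in the spirit of ID3 (the helper names and the toy records are my own, hypothetical choices):

```python
from collections import Counter
from math import log2

def entropy(records):
    counts = Counter(r["label"] for r in records)
    n = len(records)
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(records, attr):
    """Gain(attr) = H(D) - weighted average entropy of D partitioned by attr."""
    n = len(records)
    partitions = {}
    for r in records:
        partitions.setdefault(r[attr], []).append(r)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(records) - remainder

def best_attribute(records, attributes):
    """ID3's central choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: info_gain(records, a))

# Tiny made-up sample in the spirit of the Wind example
data = [{"Wind": "Weak",   "Humidity": "High",   "label": "+"},
        {"Wind": "Weak",   "Humidity": "Normal", "label": "+"},
        {"Wind": "Strong", "Humidity": "High",   "label": "-"},
        {"Wind": "Strong", "Humidity": "Normal", "label": "-"}]
print(best_attribute(data, ["Wind", "Humidity"]))  # -> Wind (gain 1.0 vs 0.0)
```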
DECISION TREES
From Decision Trees to Rules
Next step: make rules from the decision tree. Each root-to-leaf path becomes one IF-THEN rule: the attribute tests along the path are joined by conjunction (AND), and the leaf's class label is the rule's conclusion.
Example: truth table for conjunction

A   B   A ∧ B
F   F   F
T   F   F
F   T   F
T   T   T
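A sketch of rule extraction over the same hypothetical nested-dict encoding used in the apply-model sketch earlier (repeated so this runs standalone): every root-to-leaf path yields one conjunctive rule.

```python
# Turn each root-to-leaf path of the Refund/MarSt/TaxInc tree
# into an IF <cond AND cond ...> THEN <label> rule.

taxinc = {"attribute": "TaxInc", "threshold": 80,
          "branches": {"<=": "NO", ">": "YES"}}
tree = {"attribute": "Refund",
        "branches": {"Yes": "NO",
                     "No": {"attribute": "MarSt",
                            "branches": {"Married": "NO",
                                         "Single": taxinc,
                                         "Divorced": taxinc}}}}

def rules(node, conds=()):
    if isinstance(node, str):                  # leaf: emit one rule
        yield "IF " + " AND ".join(conds) + f" THEN Cheat = {node}"
        return
    for outcome, child in node["branches"].items():
        if "threshold" in node:                # continuous test, e.g. TaxInc <= 80K
            cond = f'{node["attribute"]} {outcome} {node["threshold"]}K'
        else:                                  # categorical test
            cond = f'{node["attribute"]} = {outcome}'
        yield from rules(child, conds + (cond,))

for r in rules(tree):
    print(r)
# e.g. IF Refund = No AND MarSt = Single AND TaxInc > 80K THEN Cheat = YES
```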