
Classification in Machine Learning
Lecture 05

Discussion

Whenever fear, worry, or despair creeps into your heart, remind yourself that Allah is your guardian, and nothing can go wrong when you put your trust in Him and please Him. Let your faith be greater than your fear!
Agenda:
• Supervised vs. Unsupervised Learning (recap)
• Mandatory steps in the ML journey
• Classification in ML
• Classification vs. numeric prediction
• Applications of classification
• The classification process
• Accuracy assessment
• Classification task illustration
• Classifier-I: Decision Tree (D.T)
• Information gain and entropy
• Examples
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
  – Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  – Two main tasks: regression and classification
  – New data is classified based on the training set
• Unsupervised learning (clustering)
  – The class labels of the training data are unknown
  – Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Better Model Performance:
– Depends on the quality of the data we provide to the model
– How much ground truth is embedded inside the provided data
– How much data we provide for training
– How much of the data is relevant for training the model
– E.g., student performance prediction
– In ML, our target is a better-performing model that makes good predictions
– In classification: the model output should be as close as possible to the true output
– In regression: residual calculation, R-squared
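As a quick side illustration (not from the slides), the residuals and R-squared mentioned above can be computed directly; the values here are made up for the example:

```python
import numpy as np

# Hypothetical actual and predicted values for a regression model
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 6.7, 9.4])

residuals = y_true - y_pred                     # prediction errors
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r_squared = 1.0 - ss_res / ss_tot               # coefficient of determination

print(residuals, r_squared)
```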
Mandatory Steps in the ML Journey:
– Data acquisition
– Data pre-processing
  – Data pre-processing is an integral step in machine learning: the quality of the data, and the useful information that can be derived from it, directly affects the ability of our model to learn. It is therefore extremely important that we pre-process our data before feeding it into our model.
– Data cleaning
  – Handling missing values, encoding, imputation, scaling
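A minimal sketch of these cleaning steps with scikit-learn, assuming a small made-up table with one missing value and one categorical column:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: a numeric column with a missing value,
# and a categorical column that needs encoding
df = pd.DataFrame({"income": [125.0, 100.0, np.nan, 120.0],
                   "status": ["Single", "Married", "Single", "Married"]})

income = SimpleImputer(strategy="mean").fit_transform(df[["income"]])  # imputation
income = StandardScaler().fit_transform(income)                        # scaling
status = OneHotEncoder().fit_transform(df[["status"]]).toarray()       # encoding

X = np.hstack([income, status])  # cleaned feature matrix for the model
print(X)
```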
Mandatory Steps in the ML Journey:
– Feature selection (to provide the most relevant data to the model)
  – We identify the most suitable features for model building
– Feature extraction
  – Data dimensions are reduced and provided as a compact representation to the model
  – Feature merging
  – Dimensionality reduction
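A minimal feature-extraction sketch, assuming scikit-learn's bundled iris data: PCA reduces the four original dimensions to a compact two-dimensional representation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)                 # 150 samples, 4 features
X_compact = PCA(n_components=2).fit_transform(X)  # reduced to 2 dimensions
print(X.shape, "->", X_compact.shape)             # (150, 4) -> (150, 2)
```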
What is Classification in Machine Learning?
– The process of categorizing a given set of data into classes
– The process starts with predicting the class of given data points
– The classes are often referred to as targets, labels, or categories
– The main goal is to identify which class/category new data will fall into

Classification predictive modelling is the task of approximating the mapping function from input variables to discrete output variables.
What is Classification in Machine Learning?
Example: Spam Detection
– This is binary classification, since there are only two classes: spam and not spam
– A classifier uses training data to understand how the given input variables relate to the class
– In this case, known spam and non-spam emails have to be used as the training data
– Once the classifier is trained accurately, it can be used to classify an unseen email
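A minimal sketch of such a spam classifier, assuming scikit-learn and a tiny made-up set of labelled emails. A bag-of-words model with naive Bayes is one common choice; the lecture does not prescribe a specific classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled training emails: 1 = spam, 0 = not spam
emails = ["win a free prize now", "meeting agenda attached",
          "free money click now", "lunch at noon tomorrow"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X_train = vec.fit_transform(emails)         # word counts as input variables
clf = MultinomialNB().fit(X_train, labels)  # learn how words relate to the class

print(clf.predict(vec.transform(["free prize click now"])))  # expected: [1] (spam)
```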
Some Classification Terminologies in ML:
– Classifier: an algorithm that maps the input data to a specific category
– Classification model: the model draws a conclusion from the input data given for training; it predicts the class or category for new data
– Binary classification: classification with two outcomes, e.g., either true or false
– Multi-class classification: classification with more than two classes; in multi-class classification each sample is assigned to one and only one label or target
What is Classification?
– Given a collection of records (the training set)
  – Each record contains a set of attributes; one of the attributes is the class
– Find a model for the class attribute as a function of the values of the other attributes
– Goal: previously unseen records should be assigned a class as accurately as possible
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
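A minimal sketch of this train/test workflow, assuming scikit-learn and its bundled iris data; the split proportion is an arbitrary choice for the example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Divide the data set: training set to build the model, test set to validate it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on previously unseen records
```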
What is Classification?

Tid | Refund | Marital Status | Taxable Income | Cheat
----|--------|----------------|----------------|------
1   | Yes    | Single         | 125K           | No
2   | No     | Married        | 100K           | No
3   | No     | Single         | 70K            | No
4   | Yes    | Married        | 120K           | No
5   | No     | Divorced       | 95K            | Yes
6   | No     | Married        | 60K            | No
7   | Yes    | Divorced       | 220K           | No
8   | No     | Single         | 85K            | Yes
9   | No     | Married        | 75K            | No
10  | No     | Single         | 90K            | Yes
Classification vs. Numeric Prediction:
Classification:
– Predicts categorical class labels (discrete or nominal)
– Classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it to classify new data, e.g., decision trees
Numeric Prediction:
– Models continuous-valued functions, i.e., predicts unknown or missing values, e.g., regression
Applications of Classification:
– Credit/loan approval
– Medical diagnosis: whether a tumour is cancerous or benign
– Fraud detection: whether a transaction is fraudulent
– Web page categorization: which category a page belongs to
Classification – A Two-Step Process:
Step I: Model Construction
– Describing a set of predetermined classes
– Each instance is assumed to belong to a predefined class, as determined by the class label attribute
– The set of tuples used for model construction is the training set
– The model is represented as classification rules, decision trees, or mathematical formulae
Step I: Model Construction

Training data:

NAME | RANK           | YEARS | TENURED
Mike | Assistant Prof | 3     | no
Mary | Assistant Prof | 7     | yes
Bill | Professor      | 2     | yes
Jim  | Associate Prof | 7     | yes
Dave | Assistant Prof | 6     | no
Anne | Associate Prof | 3     | no

The classification algorithm learns a classifier (model) from the training data, e.g.:

IF rank = 'professor' OR years > 6
THEN tenured = 'yes'
Classification – A Two-Step Process:
Step II: Model Usage
– Classify future or unknown objects
– Estimate the accuracy of the model
  – The known label of each test sample is compared with the classified result from the model
  – The accuracy rate is the percentage of test set samples that are correctly classified by the model
  – The test set is independent of the training set (otherwise overfitting occurs)
– If the accuracy is acceptable, use the model to classify new data

*Note: if the test set is used to select models, it is called a validation (test) set
Step II: Model Usage / Model for Prediction

Testing data:

NAME    | RANK           | YEARS | TENURED
Tom     | Assistant Prof | 2     | no
Merlisa | Associate Prof | 7     | no
George  | Professor      | 5     | yes
Joseph  | Assistant Prof | 7     | yes

Unseen data: (Jeff, Professor, 4) → Tenured?
Accuracy Assessment:
Accuracy:
– The proportion of values that are correctly predicted by the classifier
– Number of correct predictions / total number of instances (how often the model gave the correct result)
– If the classifier predicts an apple as an orange, that is a misclassification
– A satisfactory accuracy level is required before model deployment
– In regression we talked about the difference between predicted and actual values, i.e., residuals/errors
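A minimal sketch of this accuracy computation, using made-up apple/orange labels that include one misclassification:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and classifier predictions;
# the second prediction (apple -> orange) is a misclassification
y_true = ["apple", "apple", "orange", "orange", "apple"]
y_pred = ["apple", "orange", "orange", "orange", "apple"]

correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))           # 4 / 5 = 0.8
print(accuracy_score(y_true, y_pred))  # same value via scikit-learn
```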
Illustrating the Classification Task:
– Training and test sets are randomly sampled
– Find a mapping/function that can predict the class label of a given tuple X
Classification Techniques:
• Decision tree based methods
• Bayes classification methods
• Rule-based methods
• Nearest-neighbour classifiers
• Artificial neural networks
• Support vector machines
• Memory-based reasoning
Classifier-I: Decision Tree
Introduction to Decision Trees:
The decision tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, a decision tree can be used for solving both regression and classification problems.
Decision Tree Algorithm (Pseudocode)
1. Place the best attribute of the dataset at the root of the tree.
2. Split the training set into subsets, such that each subset contains data with the same value for an attribute.
3. Repeat steps 1 and 2 on each subset until you find leaf nodes in all the branches of the tree (a minimal implementation sketch follows).
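A minimal Python sketch of this pseudocode, assuming categorical attributes stored as dictionaries, and weighted entropy as the "best attribute" criterion (the pseudocode itself does not fix the criterion):

```python
from collections import Counter
import math

def entropy(labels):
    """Node impurity: -sum(p * log2 p) over class proportions p."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    """Step 1: pick the attribute whose split has the lowest weighted entropy."""
    def split_entropy(a):
        total = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            total += len(sub) / len(labels) * entropy(sub)
        return total
    return min(attrs, key=split_entropy)

def build_tree(rows, labels, attrs):
    """Steps 2-3: split into subsets and recurse until leaves are reached."""
    if len(set(labels)) == 1 or not attrs:   # pure node or no attributes left
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attrs)
    tree = {a: {}}
    for v in set(r[a] for r in rows):
        sub_rows = [r for r in rows if r[a] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[a] == v]
        tree[a][v] = build_tree(sub_rows, sub_labels, [x for x in attrs if x != a])
    return tree

# Tiny made-up example in the spirit of the lecture's Refund/Marital Status data
rows = [{"Refund": "Yes", "MarSt": "Single"},
        {"Refund": "No", "MarSt": "Married"},
        {"Refund": "No", "MarSt": "Single"}]
print(build_tree(rows, ["No", "No", "Yes"], ["Refund", "MarSt"]))
```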
Assumptions While Creating a D.T:
1. At the beginning, the whole training set is considered as the root.
2. Feature values are preferred to be categorical. If the values are continuous, they are discretized prior to building the model.
3. Records are distributed recursively on the basis of attribute values.
4. The order of placing attributes as the root or as internal nodes of the tree is decided using some statistical approach.
D.T Example:

[Figure: a decision tree built on the training table above. Refund = Yes covers 3 records (1, 4, 7); Refund = No covers 7 records (2, 3, 5, 6, 8, 9, 10), which split by marital status into Single/Divorced with 4 records (3, 5, 8, 10) and Married with 3 records (2, 6, 9); the Single/Divorced branch splits on taxable income into 1 record (3) and 3 records (5, 8, 10).]
Apply Model to Test Data

Test data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree:

Refund?
├─ Yes → NO
└─ No → Marital Status?
        ├─ Single, Divorced → Taxable Income?
        │                     ├─ < 80K → NO
        │                     └─ > 80K → YES
        └─ Married → NO

Following Refund = No and then Marital Status = Married reaches the Married leaf, so assign Cheat = "No".
Illustrating the Classification Task using a D.T:

Training set (Learn Model):

Tid | Attrib1 | Attrib2 | Attrib3 | Class
1   | Yes     | Large   | 125K    | No
2   | No      | Medium  | 100K    | No
3   | No      | Small   | 70K     | No
4   | Yes     | Medium  | 120K    | No
5   | No      | Large   | 95K     | Yes
6   | No      | Medium  | 60K     | No
7   | Yes     | Large   | 220K    | No
8   | No      | Small   | 85K     | Yes
9   | No      | Medium  | 75K     | No
10  | No      | Small   | 90K     | Yes

The learned decision tree is then applied to the test set (Apply Model):

Tid | Attrib1 | Attrib2 | Attrib3 | Class
11  | No      | Small   | 55K     | ?
12  | Yes     | Medium  | 80K     | ?
13  | Yes     | Large   | 110K    | ?
14  | No      | Small   | 95K     | ?
15  | No      | Large   | 67K     | ?
Decision Tree Induction:
Many algorithms exist:
1. Hunt's Algorithm
2. ID3, C4.5
3. CART
4. SLIQ, SPRINT
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
  – The tree is constructed in a top-down, recursive, divide-and-conquer manner
  – At the start, all the training examples are at the root
  – Attributes are categorical (if continuous-valued, they are discretized in advance)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain); see the sketch after this list
• Conditions for stopping partitioning
  – All samples for a given node belong to the same class
  – There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
  – There are no samples left
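A minimal sketch of this greedy induction using scikit-learn, where criterion="entropy" corresponds to selecting test attributes by information gain; the abbreviated feature names are chosen here just for the printout:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="entropy" selects test attributes by information gain; the tree is
# grown top-down and recursively until nodes are pure or no samples remain
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=["sep_len", "sep_wid", "pet_len", "pet_wid"]))
```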
Tree Induction:
• Greedy strategy
  – Split the records based on an attribute test that optimizes a certain criterion
• Issues
  – Determine how to split the records
    • How do we specify the attribute test condition?
    • How do we determine the best split?
  – Determine when to stop splitting
How to Specify Test Conditions?
• Depends on attribute type
  – Nominal
  – Ordinal
  – Continuous
• Depends on the number of ways to split
  – 2-way split
  – Multi-way split
Splitting Based on Nominal Attributes
• Multi-way split: use as many partitions as there are distinct values, e.g. CarType → {Family}, {Sports}, {Luxury}
• Binary split: divides the values into two subsets; we need to find the optimal partitioning, e.g. CarType → {Sports, Luxury} vs. {Family}, or {Family, Luxury} vs. {Sports}
Splitting Based on Ordinal Attributes
• Multi-way split: use as many partitions as there are distinct values, e.g. Size → {Small}, {Medium}, {Large}
• Binary split: divides the values into two subsets; we need to find the optimal partitioning, e.g. Size → {Small, Medium} vs. {Large}, or {Medium, Large} vs. {Small}
• What about the split {Small, Large} vs. {Medium}? It violates the ordering of the attribute, so it is not a valid ordinal split.
Splitting Based on Continuous Attributes
• Different ways of handling
  – Discretization to form an ordinal categorical attribute
    • Static – discretize once at the beginning
    • Dynamic – ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering
  – Binary decision: (A < v) or (A ≥ v)
    • Consider all possible splits and find the best cut (see the sketch below)
    • Can be more compute-intensive
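A minimal sketch of this exhaustive search for the best cut v, reusing the Taxable Income column from the earlier training table; weighted entropy is assumed as the split criterion (Gini would work the same way):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Taxable Income and Cheat columns from the lecture's 10-record training table
income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
cheat = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

def weighted_entropy(v):
    """Impurity after the binary split (income < v) vs. (income >= v)."""
    left = [c for x, c in zip(income, cheat) if x < v]
    right = [c for x, c in zip(income, cheat) if x >= v]
    return (len(left) * entropy(left) + len(right) * entropy(right)) / len(cheat)

# Candidate cuts midway between consecutive sorted values; keep the best one
vals = sorted(income)
candidates = [(a + b) / 2 for a, b in zip(vals, vals[1:])]
print(min(candidates, key=weighted_entropy))  # best cut for this data: 97.5
```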
How to Determine the Best Split?
Before splitting: 10 records of class 0, 10 records of class 1.
Which test condition is the best?
How to Determine the Best Split
• Greedy approach:
  – Nodes with a homogeneous class distribution are preferred
• We need a measure of node impurity:
  – Non-homogeneous: high degree of impurity
  – Homogeneous: low degree of impurity
The degree of impurity or confusion is called entropy.
Attribute Selection – Splitting Rules Measures (Measures of Node Impurity)

These measures provide a ranking for each attribute describing the given training tuples. The attribute having the best score for the measure is chosen as the splitting attribute for the given tuples.

• Information gain (entropy)
• Gini index
• Misclassification error

All of these methods measure the impurity of nodes; we try to select the purest nodes (those that lead towards homogeneity). A comparison of the three measures is sketched below.
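A minimal sketch comparing the three impurity measures on a perfectly mixed node and a nearly pure node; the class counts are made up for the example:

```python
import math
from collections import Counter

def impurities(labels):
    """Entropy, Gini index, and misclassification error of a single node."""
    n = len(labels)
    probs = [c / n for c in Counter(labels).values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    gini = 1 - sum(p * p for p in probs)
    misclass = 1 - max(probs)
    return entropy, gini, misclass

print(impurities(["C0"] * 5 + ["C1"] * 5))  # non-homogeneous: (1.0, 0.5, 0.5)
print(impurities(["C0"] * 9 + ["C1"] * 1))  # nearly homogeneous: all lower
```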
Brief Review of Entropy

For $m$ classes with probabilities $p_i$, entropy is defined as (note: log with base 2, not the natural log):

$Entropy(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$

For $m = 2$ with class proportions 9/10 and 1/10:

$-(9/10)\log_2(9/10) - (1/10)\log_2(1/10) \approx 0.469$
Attribute Selection Measure: Information Gain (ID3/C4.5)

Select the attribute with the highest information gain. With $Info(D)$ the entropy of the full data set and $Info_A(D)$ the weighted entropy after splitting on attribute $A$ into partitions $D_1, \dots, D_v$:

$Info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$

$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$

$Gain(A) = Info(D) - Info_A(D)$
Attribute Selection: Information Gain
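The original worked example did not survive extraction, so here is a comparable sketch (an assumption about what was shown): computing Gain(Refund) on the lecture's own 10-record training table.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Refund and Cheat columns from the lecture's 10-record training table
refund = ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"]
cheat = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

info_d = entropy(cheat)  # Info(D): impurity before splitting

info_a = 0.0             # Info_Refund(D): weighted impurity after splitting
for v in set(refund):
    subset = [c for r, c in zip(refund, cheat) if r == v]
    info_a += len(subset) / len(cheat) * entropy(subset)

print(info_d - info_a)   # Gain(Refund) ~ 0.19 bits
```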
Thank you!
Any questions?
