Lecture 5: Classification in ML
Machine Learning – Lecture 05
Discussion

Whenever fear, worry, or despair creeps into your heart, remind yourself that Allah is your guardian, and nothing can go wrong when you put your trust in Him and please Him. Let your faith be greater than your fear!
Agenda:
• Supervised vs. Unsupervised Learning (Recap)
• Mandatory steps in the ML journey
• Classification in ML
• Classification vs. Numeric Prediction
• Applications of Classification
• Classification Process
• Accuracy Assessment
• Classification Task Illustration
• Classifier-I: Decision Tree (D.T)
• Information Gain and Entropy
• Examples
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
– Two main tasks: regression and classification
– New data is classified based on the training set
• Unsupervised learning (clustering)
– The class labels of the training data are unknown
– Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Better Model Performance:
– Depends on the quality of the data we provide to the model
– How well ground-truth facts are embedded in the provided data
– How much data we provide for training
– How much of the data is relevant for training the model
– E.g., student performance prediction
– In ML our target is a better-performing model that makes good predictions
– In classification: the model output should be approximately equal to the true output
– In regression: residual calculation, R-squared (see the sketch below)
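As a concrete illustration of these two notions of model performance, here is a minimal sketch assuming scikit-learn is available; the toy datasets and the decision-tree/linear models are illustrative choices, not prescribed by the lecture.

```python
# Scoring a classifier by accuracy and a regressor by R-squared.
# Toy datasets and model choices are illustrative only.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Classification: model output should be (approximately) the true class label
X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier().fit(Xtr, ytr)
print("accuracy:", accuracy_score(yte, clf.predict(Xte)))

# Regression: judged via residuals, summarized by R-squared
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(Xtr, ytr)
print("R-squared:", r2_score(yte, reg.predict(Xte)))
```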
Mandatory Steps in the ML Journey:
– Data Acquisition
– Data Pre-processing
– Data pre-processing is an integral step in machine learning, as the quality of the data and the useful information that can be derived from it directly affect the ability of our model to learn; therefore, it is extremely important that we pre-process our data before feeding it into our model
– Data Cleaning
– Handling missing values, encoding, imputation, scaling (sketched below)
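A minimal pre-processing sketch, assuming pandas and scikit-learn are available; the toy columns and the mean-imputation choice are illustrative assumptions.

```python
# Pre-processing sketch: imputation of missing values, one-hot encoding,
# and feature scaling. Column names and toy values are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "income": [125.0, 100.0, None, 120.0],          # one missing value
    "marital_status": ["Single", "Married", "Single", "Married"],
})

prep = ColumnTransformer([
    # numeric: fill missing values with the column mean, then standardize
    ("num", make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()),
     ["income"]),
    # categorical: one-hot encode
    ("cat", OneHotEncoder(), ["marital_status"]),
])
print(prep.fit_transform(df))
```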
Mandatory Steps in the ML Journey:
– Feature Selection (to provide the most relevant data to the model)
– We identify the most suitable features for model building
– Feature Extraction
– Data dimensions are reduced and provided as a compact representation to the model
– Feature merging
– Dimensionality reduction (see the sketch below)
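To make the selection/extraction distinction concrete, here is a hedged sketch assuming scikit-learn; SelectKBest and PCA are illustrative choices of one selection and one extraction technique, not methods mandated by the lecture.

```python
# Feature selection (keep the most relevant original features) vs.
# feature extraction (merge features into new compact dimensions).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)  # 2 most relevant features
X_pca = PCA(n_components=2).fit_transform(X)             # 2 extracted components
print(X_sel.shape, X_pca.shape)  # (150, 2) (150, 2)
```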
What is Classification in Machine Learning?
– The process of categorizing a given set of data into classes
– The process starts with predicting the class of given data points
– Classes are often referred to as targets, labels, or categories
– The main goal is to identify which class/category new data will fall into
What is Classification in Machine Learning?
Example: Spam Detection
– This is binary classification, since there are only two classes: spam and not spam
– A classifier utilizes some training data to understand how the given input variables relate to the class
– In this case, known spam and non-spam emails have to be used as the training data
– When the classifier is trained accurately, it can be used to classify an unseen email (a minimal sketch follows)
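A toy spam-detection sketch, assuming scikit-learn; the tiny corpus and the bag-of-words + naive Bayes pipeline are illustrative assumptions, not the lecture's prescribed method.

```python
# Toy spam detector: bag-of-words features + naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting agenda attached",
          "free money click here", "lunch tomorrow?"]
labels = ["spam", "not spam", "spam", "not spam"]  # known spam / non-spam

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)                         # train on labeled emails
print(clf.predict(["claim your free prize"]))   # -> ['spam']
```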
Some Classification Terminologies in ML:
– Classifier: an algorithm that is used to map the input data to a specific category
– Classification Model: the model draws conclusions from the input data given for training; it predicts the class or category for new data
– Binary Classification: classification with two outcomes, e.g., either true or false
– Multi-Class Classification: classification with more than two classes; in multi-class classification each sample is assigned to one and only one label or target
What is Classification?
– Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the class
– Find a model for the class attribute as a function of the values of the other attributes
– Goal: previously unseen records should be assigned a class as accurately as possible
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
What is Classification?
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Classification vs. Numeric Prediction:
Classification:
– Predicts categorical class labels (discrete or nominal)
– Classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it to classify new data, e.g., decision trees
Numeric Prediction:
– Models continuous-valued functions, i.e., predicts unknown or missing values, e.g., regression
Applications of Classification:
– Credit/loan approval
– Spam detection (as in the earlier example)
Classification – A Two-Step Process:
– Step I: model construction – learn a classifier from the training data
– Step II: model usage – apply the classifier to classify future or unseen data
Step-I: Model Construction
[Figure: training data is fed into a classification algorithm, which constructs the classifier (model)]
– The accuracy rate is the percentage of test-set samples that are correctly classified by the model
*Note: If the test set is used to select models, it is called a validation (test) set
Step-II: Model Usage / Model for Prediction
[Figure: the trained classifier is applied first to testing data and then to unseen data]

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) → Tenured?
Accuracy Assessment:
Accuracy:
– The proportion of values that are correctly predicted by the classifier
– Number of correct predictions / total number of instances (i.e., how often the model gave the correct result)
– If the classifier predicts an apple as an orange, that is a misclassification
– Once the accuracy reaches a satisfactory level, the model can be deployed
– In regression we instead looked at the difference between predicted and actual values, i.e., residuals/errors (a small sketch follows)
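A minimal sketch of the "correct predictions / total instances" computation, assuming scikit-learn; the fruit labels are invented for illustration.

```python
# Accuracy = number of correct predictions / total number of instances.
from sklearn.metrics import accuracy_score

y_true = ["apple", "apple", "orange", "orange", "apple"]
y_pred = ["apple", "orange", "orange", "orange", "apple"]  # one misclassification

print(accuracy_score(y_true, y_pred))  # 4 correct out of 5 -> 0.8
```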
Illustrating Classification Task:
– Training and test sets are randomly sampled
Find a mapping/function y = f(X) that can predict the class label y of a given tuple X
Classification Techniques:
Classifier-I: Decision Tree

Introduction to Decision Tree:
Decision Tree Algorithm (Pseudocode)
1. Place the best attribute of the dataset at the root of the tree
2. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
3. Repeat steps 1 and 2 on each subset until you reach leaf nodes in all the branches of the tree.
(A hedged library-based sketch follows.)
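As a sketch of this procedure, the snippet below fits scikit-learn's DecisionTreeClassifier (criterion="entropy" selects the "best attribute" by information gain) on the Cheat table from the earlier slide. Encoding Marital Status as the integers 0/1/2 is an illustrative simplification (scikit-learn trees do not handle categorical values natively), so the learned thresholds may differ from the hand-built tree shown later.

```python
# Sketch: decision-tree induction with scikit-learn on the Cheat table.
# Refund: 1 = Yes, 0 = No; Marital: 0 = Single, 1 = Married, 2 = Divorced
# (the integer encoding of Marital is an illustrative simplification).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 0, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
     [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["Refund", "Marital", "Income"]))
print(tree.predict([[0, 1, 80]]))  # test record (No, Married, 80K) -> ['No']
```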
Assumptions While Creating a D.T
1. At the beginning, the whole training set is considered as the root
2. Feature values are preferred to be categorical. If the values are continuous, they are discretized prior to building the model
3. Records are distributed recursively on the basis of attribute values
4. The order in which attributes are placed as the root or internal nodes of the tree is decided using a statistical measure
D.T Example:
[Figure: decision tree built from the Cheat table above – Refund = Yes covers 3 tuples (1, 4, 7); Refund = No covers 7 tuples (2, 3, 5, 6, 8, 9, 10), which Marital Status splits into Single/Divorced with 4 tuples (3, 5, 8, 10) and Married with 3 tuples (2, 6, 9); Taxable Income < 80K covers 1 tuple (3) and > 80K covers 3 tuples (5, 8, 10)]
Illustrating Classification Task using D.T:
Apply Model to Test Data

Test record:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree:

Refund?
├─ Yes → NO
└─ No → MarSt?
        ├─ Single, Divorced → TaxInc?
        │       ├─ < 80K → NO
        │       └─ > 80K → YES
        └─ Married → NO

Traversal: Refund = No → MarSt = Married → leaf NO. Assign Cheat = "No".
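The same traversal, written as an if/else sketch of the tree above (the function and argument names are illustrative):

```python
# If/else rendering of the decision tree above. The slide labels the income
# branches "< 80K" and "> 80K"; here 80K is routed to the right branch.
def predict_cheat(refund, marital_status, taxable_income):
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: fall through to the Taxable Income test
    return "No" if taxable_income < 80 else "Yes"

print(predict_cheat("No", "Married", 80))  # -> "No"
```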
Illustrating Classification Task using D.T:
Training set (excerpt):

Tid  Attrib1  Attrib2  Attrib3  Class
2    No       Medium   100K     No
3    No       Small    70K      No
6    No       Medium   60K      No

Test set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Apply the learned decision tree model to assign a class to each test tuple.
Decision Tree Induction Algorithms:
– ID3, C4.5
– CART
– SLIQ, SPRINT
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
– Tree is constructed in a top-down recursive divide-and-conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are
discretized in advance)
– Examples are partitioned recursively based on selected
attributes
– Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
• Conditions for stopping partitioning
– All samples for a given node belong to the same class
– There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
– There are no samples left
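A compact sketch of this basic greedy algorithm, assuming categorical attributes and information gain as the selection measure; all names (entropy, info_gain, build_tree) are illustrative, and the stopping conditions above appear as the base cases.

```python
# Minimal sketch of top-down, recursive, divide-and-conquer tree induction.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Information gain of splitting (rows, labels) on categorical attribute attr
    gain, n = entropy(labels), len(rows)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:          # all samples belong to the same class
        return labels[0]
    if not attrs:                      # no attributes left: majority voting
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # greedy step
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return tree

rows = [{"Refund": "Yes", "Marital": "Single"},
        {"Refund": "No",  "Marital": "Married"},
        {"Refund": "No",  "Marital": "Single"}]
print(build_tree(rows, ["No", "No", "Yes"], ["Refund", "Marital"]))
```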
Tree Induction:
• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion
• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
How to Specify Test Conditions?
Splitting Based on Nominal Attributes
• Multi-way split: use as many partitions as there are distinct values, e.g., CarType with values {Family, Sports, Luxury}
Splitting Based on Ordinal Attributes
• Multi-way split: use as many partitions as there are distinct values, e.g., Size with values {Small, Medium, Large}
Splitting Based on Continuous Attributes
• Different ways of handling:
– Discretization to form an ordinal categorical attribute
• Static – discretize once at the beginning
• Dynamic – ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering (see the sketch below)
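A small sketch of static discretization, assuming pandas is available; the income values are taken from the Cheat table and the three-bucket choice is arbitrary.

```python
# Equal-interval vs. equal-frequency bucketing of a continuous attribute.
import pandas as pd

income = pd.Series([60, 70, 75, 85, 90, 95, 100, 120, 125, 220])
print(pd.cut(income, bins=3))   # equal-interval bucketing
print(pd.qcut(income, q=3))     # equal-frequency bucketing (percentiles)
```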
How to determine the Best Split?
Before Splitting: 10 records of class 0, 10 records of class 1
How to determine the Best Split
• Greedy approach:
– Nodes with homogeneous class distribution are preferred
• Need a measure of node impurity
Non-homogeneous class distribution: high degree of impurity. Homogeneous class distribution: low degree of impurity.
Brief Review of Entropy
– For a node whose samples fall into $m$ classes with probabilities $p_1, \dots, p_m$: $Entropy = -\sum_{i=1}^{m} p_i \log_2 p_i$ (note: log with base 2, not the natural log)
– Example, $m = 2$: $-(9/10)\log_2(9/10) - (1/10)\log_2(1/10) \approx 0.469$
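A two-line check of the worked value above (plain Python, math module only):

```python
import math

p = [9/10, 1/10]                             # m = 2 class probabilities
print(-sum(pi * math.log2(pi) for pi in p))  # ≈ 0.469
```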
Attribute Selection Measure: Information Gain (ID3/C4.5)
– Expected information (entropy) needed to classify a tuple in $D$: $Info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$
– Information still needed after splitting $D$ on attribute $A$ into $v$ partitions: $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$
– $Gain(A) = Info(D) - Info_A(D)$; the attribute with the highest information gain is chosen as the splitting attribute
Attribute Selection: Information Gain
– Example: computing $Gain(Refund)$ on the Cheat table above (see the sketch below)
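A numeric sketch of that computation; the class counts come from the Cheat table (3 Yes / 7 No overall; Refund = Yes gives 0 Yes / 3 No, Refund = No gives 3 Yes / 4 No).

```python
import math

def entropy(counts):
    # Entropy of a node given its per-class counts (base-2 log)
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

info_D = entropy([3, 7])                                           # ≈ 0.881
info_refund = (3/10) * entropy([0, 3]) + (7/10) * entropy([3, 4])  # ≈ 0.690
print(info_D - info_refund)                                        # Gain(Refund) ≈ 0.19
```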
Thank you
Any Questions?