Lecture 5: Classification in ML
Machine Learning – Lecture 05
Discussion

Whenever fear, worry, or despair creeps into your heart, remind yourself that Allah is your guardian, and nothing can go wrong when you put your trust in Him and please Him. Let your faith be greater than your fear!
Agenda:
• Supervised vs. Unsupervised Learning (Recap)
• Mandatory steps in the ML journey
• Classification in ML
• Classification vs. Numeric Prediction
• Applications of Classification
• Classification Process
• Accuracy Assessment
• Classification Task Illustration
• Classifier-I: Decision Tree (D.T)
• Information Gain and Entropy
• Examples
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
– Two main tasks: regression and classification
– New data is classified based on the training set
• Unsupervised learning (clustering)
– The class labels of the training data are unknown
– Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Better Model Performance:
– Depends on the quality of the data we provide to the model
– How well ground-truth facts are embedded in the provided data
– How much data we provide for training
– How much of the data is relevant for training the model
– E.g., student performance prediction
– In ML our target is a better-performing model that makes good predictions
– In classification: the model output should be approximately equal to the true output
– In regression: residual calculation, R-squared (see the sketch below)
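As a concrete illustration of these two notions of model performance, here is a minimal sketch assuming scikit-learn is available; the toy datasets and the decision-tree/linear models are illustrative choices, not prescribed by the lecture.

```python
# Scoring a classifier by accuracy and a regressor by R-squared.
# Toy datasets and model choices are illustrative only.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Classification: model output should be (approximately) the true class label
X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier().fit(Xtr, ytr)
print("accuracy:", accuracy_score(yte, clf.predict(Xte)))

# Regression: judged via residuals, summarized by R-squared
X, y = load_diabetes(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(Xtr, ytr)
print("R-squared:", r2_score(yte, reg.predict(Xte)))
```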
Mandatory Steps in the ML Journey:
– Data Acquisition
– Data Pre-processing
– Data pre-processing is an integral step in machine learning, as the quality of the data and the useful information that can be derived from it directly affect the ability of our model to learn; therefore, it is extremely important that we pre-process our data before feeding it into our model
– Data Cleaning
– Handling missing values, encoding, imputation, scaling (sketched below)
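A minimal pre-processing sketch, assuming pandas and scikit-learn are available; the toy columns and the mean-imputation choice are illustrative assumptions.

```python
# Pre-processing sketch: imputation of missing values, one-hot encoding,
# and feature scaling. Column names and toy values are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "income": [125.0, 100.0, None, 120.0],          # one missing value
    "marital_status": ["Single", "Married", "Single", "Married"],
})

prep = ColumnTransformer([
    # numeric: fill missing values with the column mean, then standardize
    ("num", make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()),
     ["income"]),
    # categorical: one-hot encode
    ("cat", OneHotEncoder(), ["marital_status"]),
])
print(prep.fit_transform(df))
```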
Mandatory Steps in the ML Journey:
– Feature Selection (to provide the most relevant data to the model)
– We identify the most suitable features for model building
– Feature Extraction
– Data dimensions are reduced and provided as a compact representation to the model
– Feature merging
– Dimensionality reduction (see the sketch below)
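To make the selection/extraction distinction concrete, here is a hedged sketch assuming scikit-learn; SelectKBest and PCA are illustrative choices of one selection and one extraction technique, not methods mandated by the lecture.

```python
# Feature selection (keep the most relevant original features) vs.
# feature extraction (merge features into new compact dimensions).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)  # 2 most relevant features
X_pca = PCA(n_components=2).fit_transform(X)             # 2 extracted components
print(X_sel.shape, X_pca.shape)  # (150, 2) (150, 2)
```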
What is Classification in Machine Learning?
– The process of categorizing a given set of data into classes
– The process starts with predicting the class of given data points
– Classes are often referred to as targets, labels, or categories
– The main goal is to identify which class/category new data will fall into
What is Classification in Machine Learning?
Example: Spam Detection
– This is binary classification, since there are only two classes: spam and not spam
– A classifier utilizes some training data to understand how the given input variables relate to the class
– In this case, known spam and non-spam emails have to be used as the training data
– When the classifier is trained accurately, it can be used to classify an unseen email (a minimal sketch follows)
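A toy spam-detection sketch, assuming scikit-learn; the tiny corpus and the bag-of-words + naive Bayes pipeline are illustrative assumptions, not the lecture's prescribed method.

```python
# Toy spam detector: bag-of-words features + naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting agenda attached",
          "free money click here", "lunch tomorrow?"]
labels = ["spam", "not spam", "spam", "not spam"]  # known spam / non-spam

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)                         # train on labeled emails
print(clf.predict(["claim your free prize"]))   # -> ['spam']
```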
Some Classification Terminologies in ML:
– Classifier: an algorithm that is used to map the input data to a specific category
– Classification Model: the model draws conclusions from the input data given for training; it predicts the class or category for new data
– Binary Classification: classification with two outcomes, e.g., either true or false
– Multi-Class Classification: classification with more than two classes; in multi-class classification each sample is assigned to one and only one label or target
What is Classification?
– Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the class
– Find a model for the class attribute as a function of the values of the other attributes
– Goal: previously unseen records should be assigned a class as accurately as possible
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
What is Classification?
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Classification vs. Numeric Prediction:
Classification:
– Predicts categorical class labels (discrete or nominal)
– Classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it to classify new data, e.g., decision trees
Numeric Prediction:
– Models continuous-valued functions, i.e., predicts unknown or missing values, e.g., regression
Applications of Classification:
– Credit/loan approval
– Spam detection (as in the earlier example)
Classification – A Two-Step Process:
– Step I: model construction – learn a classifier from the training data
– Step II: model usage – apply the classifier to classify future or unseen data
Step-I: Model Construction
[Figure: training data is fed into a classification algorithm, which constructs the classifier (model)]
– The accuracy rate is the percentage of test-set samples that are correctly classified by the model
*Note: If the test set is used to select models, it is called a validation (test) set
Step-II: Model Usage / Model for Prediction
[Figure: the trained classifier is applied first to testing data and then to unseen data]

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) → Tenured?
Accuracy Assessment:
Accuracy:
– The proportion of values that are correctly predicted by the classifier
– Number of correct predictions / total number of instances (i.e., how often the model gave the correct result)
– If the classifier predicts an apple as an orange, that is a misclassification
– Once the accuracy reaches a satisfactory level, the model can be deployed
– In regression we instead looked at the difference between predicted and actual values, i.e., residuals/errors (a small sketch follows)
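A minimal sketch of the "correct predictions / total instances" computation, assuming scikit-learn; the fruit labels are invented for illustration.

```python
# Accuracy = number of correct predictions / total number of instances.
from sklearn.metrics import accuracy_score

y_true = ["apple", "apple", "orange", "orange", "apple"]
y_pred = ["apple", "orange", "orange", "orange", "apple"]  # one misclassification

print(accuracy_score(y_true, y_pred))  # 4 correct out of 5 -> 0.8
```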
Illustrating Classification Task:
– Training and test sets are randomly sampled
Find a mapping/function y = f(X) that can predict the class label y of a given tuple X
Classification Techniques:
Classifier-I: Decision Tree

Introduction to Decision Tree:
Decision Tree Algorithm (Pseudocode)
1. Place the best attribute of the dataset at the root of the tree
2. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
3. Repeat steps 1 and 2 on each subset until you reach leaf nodes in all the branches of the tree.
(A hedged library-based sketch follows.)
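As a sketch of this procedure, the snippet below fits scikit-learn's DecisionTreeClassifier (criterion="entropy" selects the "best attribute" by information gain) on the Cheat table from the earlier slide. Encoding Marital Status as the integers 0/1/2 is an illustrative simplification (scikit-learn trees do not handle categorical values natively), so the learned thresholds may differ from the hand-built tree shown later.

```python
# Sketch: decision-tree induction with scikit-learn on the Cheat table.
# Refund: 1 = Yes, 0 = No; Marital: 0 = Single, 1 = Married, 2 = Divorced
# (the integer encoding of Marital is an illustrative simplification).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 0, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
     [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["Refund", "Marital", "Income"]))
print(tree.predict([[0, 1, 80]]))  # test record (No, Married, 80K) -> ['No']
```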
Assumptions While Creating a D.T
1. At the beginning, the whole training set is considered as the root
2. Feature values are preferred to be categorical. If the values are continuous, they are discretized prior to building the model
3. Records are distributed recursively on the basis of attribute values
4. The order in which attributes are placed as the root or internal nodes of the tree is decided using a statistical measure
D.T Example:
[Figure: decision tree built from the Cheat table above – Refund = Yes covers 3 tuples (1, 4, 7); Refund = No covers 7 tuples (2, 3, 5, 6, 8, 9, 10), which Marital Status splits into Single/Divorced with 4 tuples (3, 5, 8, 10) and Married with 3 tuples (2, 6, 9); Taxable Income < 80K covers 1 tuple (3) and > 80K covers 3 tuples (5, 8, 10)]
Illustrating Classification Task using D.T:
Apply Model to Test Data

Test record:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree:

Refund?
├─ Yes → NO
└─ No → MarSt?
        ├─ Single, Divorced → TaxInc?
        │       ├─ < 80K → NO
        │       └─ > 80K → YES
        └─ Married → NO

Traversal: Refund = No → MarSt = Married → leaf NO. Assign Cheat = "No".
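The same traversal, written as an if/else sketch of the tree above (the function and argument names are illustrative):

```python
# If/else rendering of the decision tree above. The slide labels the income
# branches "< 80K" and "> 80K"; here 80K is routed to the right branch.
def predict_cheat(refund, marital_status, taxable_income):
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: fall through to the Taxable Income test
    return "No" if taxable_income < 80 else "Yes"

print(predict_cheat("No", "Married", 80))  # -> "No"
```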
Illustrating Classification Task using D.T:
Training set (excerpt):

Tid  Attrib1  Attrib2  Attrib3  Class
2    No       Medium   100K     No
3    No       Small    70K      No
6    No       Medium   60K      No

Test set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Apply the learned decision tree model to assign a class to each test tuple.
Decision Tree Induction Algorithms:
– ID3, C4.5
– CART
– SLIQ, SPRINT
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
– Tree is constructed in a top-down recursive divide-and-conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are
discretized in advance)
– Examples are partitioned recursively based on selected
attributes
– Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
• Conditions for stopping partitioning
– All samples for a given node belong to the same class
– There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
– There are no samples left
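A compact sketch of this basic greedy algorithm, assuming categorical attributes and information gain as the selection measure; all names (entropy, info_gain, build_tree) are illustrative, and the stopping conditions above appear as the base cases.

```python
# Minimal sketch of top-down, recursive, divide-and-conquer tree induction.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Information gain of splitting (rows, labels) on categorical attribute attr
    gain, n = entropy(labels), len(rows)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:          # all samples belong to the same class
        return labels[0]
    if not attrs:                      # no attributes left: majority voting
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # greedy step
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return tree

rows = [{"Refund": "Yes", "Marital": "Single"},
        {"Refund": "No",  "Marital": "Married"},
        {"Refund": "No",  "Marital": "Single"}]
print(build_tree(rows, ["No", "No", "Yes"], ["Refund", "Marital"]))
```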
Tree Induction:
• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion
• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
How to Specify Test Conditions?
Splitting Based on Nominal Attributes
• Multi-way split: use as many partitions as there are distinct values, e.g., CarType with values {Family, Sports, Luxury}
Splitting Based on Ordinal Attributes
• Multi-way split: use as many partitions as there are distinct values, e.g., Size with values {Small, Medium, Large}
Splitting Based on Continuous Attributes
• Different ways of handling:
– Discretization to form an ordinal categorical attribute
• Static – discretize once at the beginning
• Dynamic – ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering (see the sketch below)
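A small sketch of static discretization, assuming pandas is available; the income values are taken from the Cheat table and the three-bucket choice is arbitrary.

```python
# Equal-interval vs. equal-frequency bucketing of a continuous attribute.
import pandas as pd

income = pd.Series([60, 70, 75, 85, 90, 95, 100, 120, 125, 220])
print(pd.cut(income, bins=3))   # equal-interval bucketing
print(pd.qcut(income, q=3))     # equal-frequency bucketing (percentiles)
```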
How to determine the Best Split?
Before Splitting: 10 records of class 0, 10 records of class 1
How to determine the Best Split
• Greedy approach:
– Nodes with homogeneous class distribution are preferred
• Need a measure of node impurity
Non-homogeneous class distribution: high degree of impurity. Homogeneous class distribution: low degree of impurity.
Brief Review of Entropy
– For a node whose samples fall into $m$ classes with probabilities $p_1, \dots, p_m$: $Entropy = -\sum_{i=1}^{m} p_i \log_2 p_i$ (note: log with base 2, not the natural log)
– Example, $m = 2$: $-(9/10)\log_2(9/10) - (1/10)\log_2(1/10) \approx 0.469$
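A two-line check of the worked value above (plain Python, math module only):

```python
import math

p = [9/10, 1/10]                             # m = 2 class probabilities
print(-sum(pi * math.log2(pi) for pi in p))  # ≈ 0.469
```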
Attribute Selection Measure: Information Gain (ID3/C4.5)
– Expected information (entropy) needed to classify a tuple in $D$: $Info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$
– Information still needed after splitting $D$ on attribute $A$ into $v$ partitions: $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$
– $Gain(A) = Info(D) - Info_A(D)$; the attribute with the highest information gain is chosen as the splitting attribute
Attribute Selection: Information Gain
– Example: computing $Gain(Refund)$ on the Cheat table above (see the sketch below)
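A numeric sketch of that computation; the class counts come from the Cheat table (3 Yes / 7 No overall; Refund = Yes gives 0 Yes / 3 No, Refund = No gives 3 Yes / 4 No).

```python
import math

def entropy(counts):
    # Entropy of a node given its per-class counts (base-2 log)
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

info_D = entropy([3, 7])                                           # ≈ 0.881
info_refund = (3/10) * entropy([0, 3]) + (7/10) * entropy([3, 4])  # ≈ 0.690
print(info_D - info_refund)                                        # Gain(Refund) ≈ 0.19
```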
Thank you
Any Questions?