ML.4-Classification Techniques (Week 5,6,7)

Nhân bản – Phụng sự – Khai phóng (Humanity – Service – Liberation)

Chapter 4

Classification Techniques
Machine Learning
CONTENTS

• Classification Problems
• Classification Algorithms
• K-Nearest Neighbors
• Naïve Bayes Classification
• Decision Tree
• Support Vector Machines
• Metrics to measure Classification Performance

Chapter Content
➢ Classification Problems
➢ Classification Algorithms
○ K-Nearest Neighbors
○ Naïve Bayes Classification
○ Decision Tree
○ Support Vector Machines
➢ Metrics to measure Classification Performance
Classification Problem

➢ Classification is a supervised task that requires the use of machine learning algorithms that learn how to assign a class label to examples from the problem domain.

Classification Problem
➢ Classification requires a training dataset with many
examples of inputs and outputs from which to learn.
➢ An ML model will use the training dataset and will
calculate how to best map examples of input data to
specific class labels.
➢ The training dataset must be sufficiently representative
of the problem and have many examples of each class
label.

Example

• Can we predict the class of the new example?

Classification Examples

● Email Spam Detection
● Speech Recognition
● Identification of Cancer Tumor Cells
● Drug Classification
● Biometric Identification, etc.

Binary Classification
➢ Binary classification classifies the input data into two mutually exclusive categories.
➢ The training data in such a situation is labeled in a binary format: true and false; positive and negative; 0 and 1; spam and not spam, etc., depending on the problem being tackled.

Multi-Class Classification
➢ Multi-class classification is where each data sample is
assigned one and only one label from more than two
classes.

Multi-Label Classification
➢ Multi-label classification refers to predicting zero or more labels for each data sample.
➢ Example: auto-tagging in Natural Language Processing, where a given text can contain multiple topics, or in computer vision, where an image can contain multiple objects.

Example

????

Chapter Content
➢ Classification Problems
➢ Classification Algorithms
○ K-Nearest Neighbors
○ Naïve Bayes Classification
○ Decision Tree
○ Support Vector Machines
➢ Metrics to measure Classification Performance
K-Nearest Neighbor (K-NN)
➔ Given training data D = {(x1, y1), …, (xN, yN)} and a test point
➔ Prediction Rule: Look at the K most similar training examples
➔ For classification: assign the majority class label (majority voting)
➔ For regression: assign the average response
➔ The algorithm requires:
◆ Parameter K: number of nearest neighbors to look for
◆ Distance function: to compute the similarities between examples
➔ Special Case: 1-Nearest Neighbor

K-Nearest Neighbor (K-NN): Steps
➔ Compute the test point's distance from each training point
➔ Sort the distances in ascending (or descending) order
➔ Use the sorted distances to select the K nearest neighbors
➔ Use majority rule (for classification) or averaging (for regression), as in the sketch below

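A minimal Python sketch of these steps, assuming NumPy arrays and Euclidean distance; the names (knn_predict, X_train, x_test) are illustrative choices, not from the slides.

```python
# K-NN prediction following the four steps above (classification by majority vote).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # 1. Compute the test point's distance from each training point (Euclidean)
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # 2./3. Sort the distances and select the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # 4. Majority vote over the neighbors' labels (use a mean instead for regression)
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two classes in 2-D
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```
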
K-Nearest Neighbor (K-NN)
➔ K-NN is called a non-parametric method
➔ Unlike other supervised learning algorithms, K-Nearest Neighbors doesn't learn an explicit mapping f from the training data
➔ It simply uses the training data at test time to make predictions

K-Nearest Neighbor (K-NN): Computing the Distance
➔ The K-NN algorithm requires computing distances of the test example from each of the training examples
➔ Several ways to compute distances
➔ The choice depends on the type of the features in the data
◆ Real-valued features (xi ∈ R^D): Euclidean distance is commonly used

K-Nearest Neighbor (K-NN): Computing the Distance
➔ Some other distance measures
◆ Binary-valued features
● Use Hamming distance
● Hamming distance counts the number of features where the two examples disagree
◆ Mixed feature types (some real-valued and some binary-valued)?
● Can use mixed distance measures (see the sketch below)
● E.g., Euclidean for the real part, Hamming for the binary part
◆ Can also assign weights to features

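A small sketch of these distance choices (Euclidean, Hamming, and a weighted mix), assuming NumPy; the function names and weights are illustrative, not from the slides.

```python
import numpy as np

def euclidean(a, b):
    # for real-valued feature vectors
    return np.sqrt(np.sum((a - b) ** 2))

def hamming(a, b):
    # for binary-valued features: number of positions where the two examples disagree
    return int(np.sum(a != b))

def mixed_distance(a_real, b_real, a_bin, b_bin, w_real=1.0, w_bin=1.0):
    # weighted combination: Euclidean on the real part, Hamming on the binary part
    return w_real * euclidean(a_real, b_real) + w_bin * hamming(a_bin, b_bin)

print(euclidean(np.array([1.0, 2.0]), np.array([4.0, 6.0])))  # 5.0
print(hamming(np.array([1, 0, 1]), np.array([1, 1, 0])))      # 2
```
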
K-Nearest Neighbor (K-NN): Choice of K
➔ Small K
◆ Creates many small regions for each class
◆ May lead to non-smooth decision boundaries and overfitting
➔ Large K
◆ Creates fewer, larger regions
◆ Usually leads to smoother decision boundaries (caution: a too-smooth decision boundary can underfit)
➔ Choosing K
◆ Often data-dependent and heuristic-based
◆ Or use cross-validation (using some held-out data), as sketched below
◆ In general, a K too small or too big is bad!

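A sketch of choosing K by cross-validation, assuming scikit-learn is available; the candidate K values and the synthetic dataset are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

best_k, best_score = None, -np.inf
for k in [1, 3, 5, 7, 9, 15, 25]:
    # 5-fold cross-validated accuracy for this candidate K
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best K = {best_k}, cross-validated accuracy = {best_score:.3f}")
```
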
K-Nearest Neighbor (K-NN): Properties
➔ What's nice
◆ Simple and intuitive; easily implementable
◆ Asymptotically consistent (a theoretical property)
● With infinite training data and a large enough K, K-NN approaches the best possible classifier (Bayes optimal)
➔ What's not so nice
◆ Stores all the training data in memory, even at test time
● Can be memory-intensive for large training datasets
● An example of non-parametric, or memory/instance-based, methods
● Different from parametric, model-based learning models
◆ Expensive at test time: O(ND) computations for each test point
● Has to search through all training data to find the nearest neighbors
● Distance computations with N training points (D features each)
◆ Sensitive to noisy features

Chapter Content
➢ Classification Problems
➢ Classification Algorithms
○ K-Nearest Neighbors
○ Naïve Bayes Classification
○ Decision Tree
○ Support Vector Machines
➢ Metrics to measure Classification Performance
Naïve Bayes Classification: Rule
➔ Bayes Rule: P(A|B) = P(B|A) P(A) / P(B)
➔ Prior: P(A)
➔ Posterior: P(A|B)

"…by no means merely a curious speculation in the doctrine of chances, but necessary to be solved in order to a sure foundation for all our reasonings concerning past facts, and what is likely to be hereafter… necessary to be considered by any that would give a clear account of the strength of analogical or inductive reasoning…"

Naïve Bayes Classification: Rule
➔ Bayes Rule: P(A|B) = P(B|A) P(A) / P(B)
➔ A = you got the flu, B = you just coughed
➔ What is P(flu|cough) = P(A|B)?

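A worked version of this question; the probabilities below are made-up illustrative values, not numbers from the slides.

```python
# Bayes Rule: P(flu|cough) = P(cough|flu) * P(flu) / P(cough)
p_flu = 0.05               # P(A): prior probability of having the flu (assumed)
p_cough_given_flu = 0.80   # P(B|A): probability of coughing given the flu (assumed)
p_cough = 0.20             # P(B): overall probability of coughing (assumed)

p_flu_given_cough = p_cough_given_flu * p_flu / p_cough
print(p_flu_given_cough)   # 0.2
```
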
Naïve Bayes Classification: in a Nutshell
➔ Bayes Rule: P(Y | X1, …, Xn) = P(X1, …, Xn | Y) P(Y) / P(X1, …, Xn)
➔ If Xi and Xj are conditionally independent given Y, for all i ≠ j:
P(X1, …, Xn | Y) = P(X1 | Y) × … × P(Xn | Y)
➔ So, to pick the most probable Y for a new example x = (x1, …, xn):
y* = argmax over y of P(Y = y) × P(X1 = x1 | Y = y) × … × P(Xn = xn | Y = y)

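A minimal sketch of this decision rule for binary features (a Bernoulli-style naïve Bayes with Laplace smoothing); the toy data, smoothing constant, and function names are illustrative assumptions.

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate P(Y=c) and P(Xi=1 | Y=c) from counts, with Laplace smoothing."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    likelihoods = {c: (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                   for c in classes}
    return classes, priors, likelihoods

def predict_naive_bayes(x, classes, priors, likelihoods):
    """Pick y* = argmax_y P(y) * prod_i P(xi | y), computed with log-probabilities."""
    scores = {}
    for c in classes:
        p = likelihoods[c]
        scores[c] = np.log(priors[c]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return max(scores, key=scores.get)

# Toy usage with 3 binary features
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
model = fit_naive_bayes(X, y)
print(predict_naive_bayes(np.array([1, 0, 1]), *model))  # -> 1
```
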
Chapter Content
➢ Classification Problems
➢ Classification Algorithms
○ K-Nearest Neighbors
○ Naïve Bayes Classification
○ Decision Tree
○ Support Vector Machines
➢ Metrics to measure Classification Performance
Decision Tree
Example: learn the concept PlayTennis (i.e., decide whether our friend will play tennis or not on a given day)
Simple Training Data Set

Decision Tree
➔ Each internal node: tests one (discrete-valued) attribute Xi
➔ Each branch from a node: corresponds to one possible value of Xi
➔ Each leaf node: predicts Y (or P(Y=1 | x ∊ leaf))
➔ Example: a decision tree for f: PlayTennis?
E.g., x = (Outlook=Sunny, Temperature=Hot, Humidity=Normal, Wind=High), f(x) = Yes.

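A sketch of fitting a decision tree on a tiny PlayTennis-style dataset, assuming scikit-learn and pandas; the rows below are illustrative and are not the table from the slide.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny",  "Overcast", "Rain", "Rain",   "Overcast"],
    "Temperature": ["Hot",   "Hot",    "Hot",      "Mild", "Cool",   "Cool"],
    "Humidity":    ["High",  "Normal", "High",     "High", "Normal", "Normal"],
    "Wind":        ["Weak",  "High",   "Weak",     "Weak", "High",   "High"],
    "PlayTennis":  ["No",    "Yes",    "Yes",      "Yes",  "No",     "Yes"],
})

# One-hot encode the discrete attributes so the tree can test them at internal nodes
X = pd.get_dummies(data.drop(columns="PlayTennis"))
y = data["PlayTennis"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # the learned tree as text

# Classify the example from the slide: Outlook=Sunny, Temperature=Hot, Humidity=Normal, Wind=High
query = pd.DataFrame([{"Outlook": "Sunny", "Temperature": "Hot",
                       "Humidity": "Normal", "Wind": "High"}])
query = pd.get_dummies(query).reindex(columns=X.columns, fill_value=0)
print(tree.predict(query))  # predicted PlayTennis label for this toy model
```
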
Decision Tree
➔ Suppose X = <x1, …, xn>, where the xi are Boolean-valued variables
➔ How would you represent the following as DTs?

Decision Tree
➔ Input: labeled training examples {(x(i), y(i))} of an unknown target function f
◆ Examples are described by their values on some set of features or attributes
● E.g., 4 attributes: Humidity, Wind, Outlook, Temp
● Set of possible instances X (a.k.a. the instance space)
◆ Unknown target function f : X → Y
● e.g., Y = {0, 1} label space
● e.g., 1 if we play tennis on this day, else 0
➔ Output: hypothesis h ∈ H that (best) approximates the target function f
◆ Set of function hypotheses H = { h | h : X → Y }
◆ Each hypothesis h is a decision tree

Chapter Content
➢ Classification Problems
➢ Classification Algorithms
○ K-Nearest Neighbors
○ Naïve Bayes Classification
○ Decision Tree
○ Support Vector Machines
➢ Metrics to measure Classification Performance
Support Vector Machines
➔ If the margin is large, the number of mistakes the Perceptron makes is small (independent of the dimension of the ambient space)!
➔ A large margin can help prevent overfitting.
◆ If the margin 𝛾 is large and the algorithm produces a large-margin classifier, then the amount of data needed depends only on R/𝛾 [Bartlett & Shawe-Taylor '99].
➔ Idea: directly search for a large-margin classifier!!!
Support Vector Machines (SVMs)

Support Vector Machines: Geometric Margin
➔ Definition: The margin of example 𝑥 w.r.t. a linear separator 𝑤 is the distance from 𝑥 to the plane 𝑤 ⋅ 𝑥 = 0.
➔ Definition: The margin 𝛾𝑤 of a set of examples 𝑆 w.r.t. a linear separator 𝑤 is the smallest margin over points 𝑥 ∈ 𝑆.
➔ Definition: The margin 𝛾 of a set of examples 𝑆 is the maximum 𝛾𝑤 over all linear separators 𝑤.

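A small numeric sketch of these definitions; the example points, labels, and candidate separator w are illustrative assumptions.

```python
import numpy as np

def margin_of_example(w, x, y):
    """Signed distance of x to the hyperplane w . x = 0 (positive if x is on the correct side)."""
    return y * np.dot(w, x) / np.linalg.norm(w)

def margin_of_set(w, X, y):
    """gamma_w: the smallest margin over all examples in the set S."""
    return min(margin_of_example(w, xi, yi) for xi, yi in zip(X, y))

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w = np.array([1.0, 1.0])        # a candidate linear separator through the origin
print(margin_of_set(w, X, y))   # gamma_w for this w (about 2.12 here)
```
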
Support Vector Machines: Geometric Margin
➔ Directly optimize for the maximum margin separator: SVMs
➔ First, assume we know a lower bound on the margin 𝜸

Input: 𝛾, S = {(x1, 𝑦1), …, (xm, 𝑦m)}
Find: some w where:
• ||w||2 = 1
• For all i, 𝑦𝑖 𝑤 ⋅ 𝑥𝑖 ≥ 𝛾
Output: w, a separator of margin 𝛾 over S

The case where the data is truly linearly separable by margin 𝛾

Support Vector Machines: Geometric Margin
➔ Directly optimize for the maximum margin separator: SVMs
➔ E.g., search for the best possible 𝜸

Input: 𝛾, S = {(x1, 𝑦1), …, (xm, 𝑦m)}
Find: some w where:
• ||w||2 = 1
• For all i, 𝑦𝑖 𝑤 ⋅ 𝑥𝑖 ≥ 𝛾
Output: w, a separator of margin 𝛾 over S

The case where the data is truly linearly separable by margin 𝛾

Support Vector Machines: Geometric Margin
➔ Directly optimize for the maximum margin separator: SVMs

Input: S = {(x1, 𝑦1), …, (xm, 𝑦m)}
Maximize 𝛾 under the constraints:
• ||w||2 = 1
• For all i, 𝑦𝑖 𝑤 ⋅ 𝑥𝑖 ≥ 𝛾
(𝛾 is the objective function)

Famous example of constrained optimization: linear programming, where the objective function is linear and the constraints are linear (in)equalities

Support Vector Machines: Geometric Margin
➔ Directly optimize for the maximum margin separator: SVMs

Input: S = {(x1, 𝑦1), …, (xm, 𝑦m)}
Maximize 𝛾 under the constraints:
• ||w||2 = 1
• For all i, 𝑦𝑖 𝑤 ⋅ 𝑥𝑖 ≥ 𝛾

This constraint is non-linear; in fact, it's even non-convex.

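In practice the non-convex formulation above is handled by fixing the scale of w and minimizing ||w||^2 subject to yi (w ⋅ xi) ≥ 1, which is a convex problem; that is what standard SVM solvers optimize. A sketch assuming scikit-learn, with illustrative toy data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates a hard-margin SVM

w, b = clf.coef_[0], clf.intercept_[0]
margin = 1.0 / np.linalg.norm(w)              # geometric margin of the learned separator
print("w =", w, "b =", b, "margin =", margin)
print("support vectors:\n", clf.support_vectors_)
```
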
Chapter Content
➢ Classification Problems
➢ Classification Algorithms
○ K-Nearest Neighbors
○ Naïve Bayes Classification
○ Decision Tree
○ Support Vector Machines
➢ Metrics to measure Classification Performance
Metrics to measure Classification Performance
➢ Confusion Matrix
➢ Accuracy
➢ Precision
➢ Recall
➢ F1 score

Confusion Matrix
➢ Describes the performance of a classification model on a set of test data for which the true values (actual classes) are known.
Binary-class Classification

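A sketch of building a binary confusion matrix, assuming scikit-learn; the label vectors below are illustrative.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual classes (1 = positive, 0 = negative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions on the test data

# For binary labels, rows are actual classes and columns are predicted classes: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN =", tn, "FP =", fp, "FN =", fn, "TP =", tp)
```
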
Example
A confusion matrix for a binary classifier for disease diagnosis, in which "yes" means the patient has the disease and "no" means they do not have the disease.

Confusion Matrix
Multi-class Classification

Confusion Matrix
Multi-label Classification

The role of the confusion matrix
➢ Evaluates the performance of classification models when they make predictions on test data.
➢ Measures how good the generated classification model is.
➢ Allows calculating different metrics for evaluating the model, such as accuracy, precision, etc.

Accuracy
➢ Determined as the ratio of the number of correct predictions to the total number of predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Accuracy in Binary Classification

Accuracy = (100 + 50) / 165 ≈ 0.91

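Recomputing the value above from confusion-matrix counts: TP + TN = 150 and the total of 165 are from the slide, but the split of the remaining 15 errors into FP and FN below is an illustrative assumption.

```python
tp, tn, fp, fn = 100, 50, 10, 5     # FP/FN split assumed for illustration

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy, 2))  # 0.91
```
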
Accuracy in Multi-class Classification

Accuracy in Multi-label Classification

When to use Accuracy?
➢ Use the Accuracy metric when the target variable classes in the data are approximately balanced (unbiased).
➢ In the case of imbalanced data (biased), where one class is much larger than another, accuracy can be highly misleading.

Example
Consider a model for disease prediction in which, out of 100 persons, only 5 have the disease and 95 do not. If the classifier predicts "no disease" for everyone, the Accuracy value is 95% (TP = 0; TN = 95; FP = 0; FN = 5; all examples = 100).
=> the high accuracy is misleading here

Example
60% of the images in a fruit image dataset are of Apple and 40% are of Mango (approximately balanced data).
If the classifier predicts whether an image is of Apple or Mango, a prediction with 97% accuracy is credible.

Precision
➢ Precision measures how good the model is at correctly identifying the positive class, i.e., of all the examples predicted as positive, how many are actually positive:
Precision = TP / (TP + FP)

Precision in Multi-class Classification
(Confusion matrix: actual labels vs. predicted labels)

Recall
➢ Recall measures how good the model is at correctly predicting all the positive observations in the dataset, i.e., of all the actually positive examples, how many are predicted as positive:
Recall = TP / (TP + FN)

Recall in Multi-class Classification
(Confusion matrix: actual labels vs. predicted labels)

F1 Score
➢ F1 Score (F-measure) is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
➢ Used to compare the performance of two classifiers when determining which one produces better results.

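A sketch of computing precision, recall, and F1 together, assuming scikit-learn; the label vectors are illustrative (the same toy predictions as in the confusion matrix example above).

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                 # harmonic mean of precision and recall

print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1 = {f1:.2f}")
```
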
SUMMARY

• Classification Problems
• Classification Algorithms
• K-Nearest Neighbors
• Naïve Bayes Classification
• Decision Tree
• Support Vector Machines
• Metrics to measure Classification Performance

Nhân bản – Phụng sự – Khai phóng (Humanity – Service – Liberation)

Enjoy the Course…!

