Lecture 5

Classification: principles and examples

Terminology

✓ Classification (En) = « classification » (Fr)


✓ Clustering (En) = « partitionnement (de données) » (Fr)
✓ A category or label in a classification problem is called a class
✓ Data points are called samples



Basics of Classification
- Classification is assigning labels to data
- Classification is usually done with Supervised Learning (SL): training a model
with examples (the training set) and applying the trained model to unseen
data.
- A great diversity of techniques can be employed: Neural Networks, k-NN, Support Vector Machines, decision trees, …
[Figures from [AG20]; geometric problem: segmenting]
A feature is a property or characteristic of a sample that serves as an input to the algorithm. Selecting good features, ones having genuine predictive ability, is crucial!
Classification: a labeled training set for spam evaluation.
What are some possible features for spam classification?
Classification with supervised learning
[Diagram: training examples → learning algorithm → model; the trained model is applied to classify unseen data and outputs the class of the data.]
The learning algorithm covered in this lecture: K-Nearest Neighbors (k-NN).



Multi-label classification

Multi-label classification: predicting classes which are not mutually exclusive.

This lecture does not cover multi-label classification; it focuses on classification between mutually exclusive classes.



K-Nearest Neighbours (K-NN): principles
[Figure: samples of coins used to “train” a vending machine]
- Principle: classify observations by assigning them to the same category as their similar / “nearest” neighbors.
- Supervised learning: uses training data that has already been classified into categories.
- K-NN identifies the k samples in the training set that are the “nearest” to the unseen data point, then assigns it to the category that is most frequent among these k neighbors.
- Similarity of data is (usually) measured by the distance between the data points in the feature space. Each “feature” is a coordinate: n features → an n-dimensional space.
- Parameters of K-NN to be chosen by the user:
  - the set of features to be considered
  - k: the number of nearest neighbors to be used
Question: what will be the result of the classification if k is set equal to the size of the training set?
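Below is a minimal sketch of this procedure in plain Python (illustrative only: the function name, the toy data and the choice of Euclidean distance are assumptions, not part of the lecture material).

import math
from collections import Counter

def knn_classify(training_set, new_point, k):
    # training_set: list of (features, label) pairs, features being a tuple of numbers
    # 1) compute the distance from the new point to every training sample (Euclidean here)
    distances = []
    for features, label in training_set:
        d = math.sqrt(sum((f - x) ** 2 for f, x in zip(features, new_point)))
        distances.append((d, label))
    # 2) keep the k nearest samples and return the most frequent label among them
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy usage with two features and two classes
training = [((1.0, 2.0), "A"), ((2.0, 1.0), "A"), ((8.0, 9.0), "B"), ((9.0, 8.0), "B")]
print(knn_classify(training, (1.5, 1.5), k=3))   # -> "A"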



Different concepts of distance

x, y: two data points in an n-dimensional space.

Minkowski distance of order p: d(x, y) = ( Σ_{i=1}^{n} |x_i − y_i|^p )^{1/p}
- p = 1: Manhattan distance (aka L1 or city-block distance): d(x, y) = Σ_{i=1}^{n} |x_i − y_i|
- p = 2: Euclidean distance: d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )

There are various other concepts of distance; which one works best is application dependent.
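As an illustration, a small Python sketch of these distances; writing a single generic function with p as a parameter is one possible choice and is not taken from the slides.

def minkowski_distance(x, y, p):
    # x, y: sequences of coordinates of the same length; p = 1 gives Manhattan, p = 2 Euclidean
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (1, 2), (4, 6)
print(minkowski_distance(x, y, p=1))   # Manhattan distance: 7.0
print(minkowski_distance(x, y, p=2))   # Euclidean distance: 5.0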



Different concepts of distance

[Figure from Wikipedia: two points, at (0, 0) and (6, 6)]

Question: what are the Euclidean and Manhattan distances between the two points?

- p = 1: Manhattan distance (aka city-block distance): d(x, y) = Σ_{i=1}^{n} |x_i − y_i|
- p = 2: Euclidean distance: d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )
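Worked answer to the question above: with the points (0, 0) and (6, 6), the Manhattan distance is |6 − 0| + |6 − 0| = 12, and the Euclidean distance is √(6² + 6²) = √72 ≈ 8.49.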



K-Nearest Neighbours (K-NN) applications
- Simple but powerful, K-NN has been successfully used in character recognition, face recognition in images and video, recommendation systems, and diagnosing diseases based on patient data such as symptoms and blood test results.
- A good choice when the relationships between features and classes are numerous and complex to understand, but data in the same class tend to be homogeneous and there is a clear distinction between classes.
- A “lazy” learning algorithm, since computation is deferred until classification (≠ eager learning, where the algorithm processes the training data before receiving queries) → it relies heavily on the quality of the training set.
- A good first approach: if k-NN yields positive results, classification is possible, and a more powerful approach such as neural networks is likely to perform even better.



Example: food classification
- Plot the following foods in Python with “how sweet the food tastes” on the x-axis and “how crunchy the food is” on the y-axis. There should be a label near each point indicating the name of the ingredient.
Hint: look at https://www.tutorialspoint.com/matplotlib/matplotlib_scatter_plot.htm, then add the names, e.g.
for i, txt in enumerate(Ingredient):
    ax.annotate(txt, (Sweetness[i], Crunchiness[i]))

Example from [1]
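A self-contained sketch of such a plot; the ingredient names and the sweetness/crunchiness values below are made up for illustration and are not the values used in [1].

import matplotlib.pyplot as plt

# Hypothetical data: each ingredient with a sweetness and a crunchiness score (1 to 10)
Ingredient  = ["apple", "celery", "cheese", "banana", "carrot"]
Sweetness   = [9, 3, 1, 9, 6]
Crunchiness = [8, 9, 2, 1, 10]

fig, ax = plt.subplots()
ax.scatter(Sweetness, Crunchiness)

# Add the name of the ingredient next to each point
for i, txt in enumerate(Ingredient):
    ax.annotate(txt, (Sweetness[i], Crunchiness[i]))

ax.set_xlabel("How sweet the food tastes")
ax.set_ylabel("How crunchy the food is")
plt.show()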



Example: more food types

Propose a classification of the foods shown on the plot into a few categories.

Figure from [1]



Training set: classified food types

This is the training set, meaning the data has already been classified by food type, regardless of how this classification was done.

Figure from [1]



Classifying tomatoes

Figure from [1]


Measuring similarity with the distance between features

- Euclidean distance in 2D: d(p, q) = √( (p_1 − q_1)² + (p_2 − q_2)² )   [Wikipedia]
- It generalizes to higher dimensions (here illustrated in 3D) [Wikipedia]: d(p, q) = √( Σ_{k=1}^{n} (p_k − q_k)² )
- p_k is the value of the k-th feature of the first data point and q_k is the value of the k-th feature of the second data point.


Illustration: food example

Calculate the distance between the tomato (sweetness = 6, crunchiness = 4) and its four closest neighbors listed in the table.

Classify the tomato with 1-NN and with 3-NN: which class does it belong to?
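A possible way to answer with a few lines of Python. The table from [1] is not reproduced here, so the neighbor names, coordinates and classes below are placeholders to be replaced by the actual values.

import math
from collections import Counter

tomato = (6, 4)   # (sweetness, crunchiness), as given above

# Placeholder neighbors: (name, (sweetness, crunchiness), class)
neighbours = [
    ("neighbour 1", (8, 5), "fruit"),
    ("neighbour 2", (3, 7), "vegetable"),
    ("neighbour 3", (3, 6), "protein"),
    ("neighbour 4", (7, 3), "fruit"),
]

# Euclidean distance from the tomato to each neighbor, sorted from nearest to farthest
ranked = sorted((math.dist(tomato, coords), cls) for _, coords, cls in neighbours)

print("1-NN class:", ranked[0][1])                                                 # class of the nearest neighbor
print("3-NN class:", Counter(cls for _, cls in ranked[:3]).most_common(1)[0][0])   # majority vote among the 3 nearest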



Choosing the appropriate K value
- Larger K will reduce the negative impact of noisy data, but rare patterns might
be ignored
- With smaller K, such as 1-NN, noisy data can negatively impact classification and
lead to incorrect results

The challenge is that we don't know in advance which value of K is best for capturing the true underlying pattern.

Common practices: start with K equal to the square root of the training set size, use a larger K with a
weighted voting process based on the distances of the neighbors, and/or use cross-validation to evaluate
the model’s performance (this will be discussed in a later lecture).
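A small sketch of the first two practices (square-root starting point and distance-weighted voting); the helper names and the 1/distance weighting scheme are illustrative choices, not prescribed by the slide.

import math
from collections import defaultdict

def initial_k(n_training_samples):
    # Rule of thumb: start with K close to the square root of the training set size
    return max(1, round(math.sqrt(n_training_samples)))

def weighted_vote(neighbours):
    # neighbours: list of (distance, label); closer neighbors get a larger vote (weight 1/distance)
    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += 1.0 / (d + 1e-9)   # small constant avoids division by zero
    return max(votes, key=votes.get)

print(initial_k(100))                                                            # -> 10
print(weighted_vote([(0.5, "fruit"), (2.0, "vegetable"), (2.5, "vegetable")]))   # -> fruit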



Feature scaling, aka data normalization
- Classification algorithms, and machine learning algorithms at large, do not perform well when their inputs (i.e., the values of the features) have very different scales, because the features will then have very different weights in the classification.

- The usual method is min-max normalization: X_new = (X − min(X)) / (max(X) − min(X)), where X is the value of a feature. X_new will be in [0, 1].

Limitations:
• Not robust to outliers or errors in the data (e.g., extremely large values).
• Requires knowledge of plausible minimum and maximum values in advance, as the full range of values may not be represented in the training set.
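A minimal sketch of min-max normalization for a single feature (plain Python, hypothetical values):

def min_max_normalize(values):
    # Rescale a list of feature values to [0, 1]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

sweetness = [1, 3, 6, 9, 10]         # hypothetical feature values
print(min_max_normalize(sweetness))  # [0.0, 0.222..., 0.555..., 0.888..., 1.0]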



Data normalization continued
- An alternative method is z-score normalization: X_new = (X − μ) / σ, where μ is the mean and σ the standard deviation of the values of the feature.

- If a value is exactly equal to the mean of all the feature values, it is normalized to 0. If it is below the mean, the result is negative; if it is above the mean, the result is positive (see examples here).

• Handles outliers better.
• But does not produce normalized data with the exact same scale (e.g., not in [0, 1]).
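And the corresponding sketch for z-score normalization (the population standard deviation is used here; that choice is not specified in the slide):

from statistics import mean, pstdev

def z_score_normalize(values):
    # Center each value on the mean and scale by the standard deviation
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

sweetness = [1, 3, 6, 9, 10]          # hypothetical feature values, mean = 5.8
print(z_score_normalize(sweetness))   # values below the mean are negative, values above are positive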



References
1. B. Lantz, “Machine Learning with R”, 2nd edition, 2015.
2. Peter Bruce et al., “Practical Statistics for Data Scientists”, O’Reilly, 2nd edition, 2020.



Appendix
- “The unreasonable effectiveness of data” in supervised learning
- Neural networks excel at utilizing data



Supervised Learning:
“The Unreasonable Effectiveness of Data”
- “Garbage in, garbage out” principle: the quality of the data is crucial; it is very hard to compensate for bad data (e.g., wrong labels).
- Famous studies in the 2000s (before deep learning) showed that very different ML algorithms performed almost identically well on a natural-language problem (deciding when to write “two”, “to”, or “too”) once they had “enough” data: all techniques performed similarly.
- This suggests spending more time on collecting quality data than on algorithms.
- Deep learning has since proven to make better use of data (especially large data sets) than “traditional” ML algorithms, which tend to plateau after a certain point.
Figure from [AG20]



Neural networks vs. traditional machine learning
[Figure: performance (e.g., classification accuracy) as a function of the amount of data]
✓ Neural networks excel at utilizing large amounts of data, while traditional machine learning techniques
tend to plateau after reaching a certain data threshold.
✓ If a traditional ML algorithm is performing well and there is a significant amount of data, a neural network is likely to yield even better results.
