
Advanced Topics in

Machine Learning
Unit 2 : Outlier Detection and Dimensionality
Reduction
Text Books:
1. Charu C. Aggarwal, Outlier Analysis, 2nd edition, Springer, 2017

Web references:
https://www.analyticsvidhya.com/blog/2024/03/one-class-svm-for-anomaly-detection/
Outliers

 An outlier is a data point that significantly deviates from the rest of the data.
 It can be either much higher or much lower than the other data points, and its
presence can have a significant impact on the results of machine learning
algorithms.
 They can be caused by measurement or execution errors.

There are two main types of outliers:


• Global outliers: Global outliers are isolated data points that are far away from
the main body of the data. They are often easy to identify and remove.
• Contextual outliers: Contextual outliers are data points that are unusual in a
specific context but may not be outliers in a different context (for example, a
temperature of 30 °C is normal in summer but anomalous in winter). They are often
more difficult to identify and may require additional information or domain
knowledge to determine their significance.
Outliers

Algorithm (distance-from-cluster-mean outlier detection)
1. Calculate the mean of each cluster.
2. Initialize the threshold value.
3. Calculate the distance of the test point from each cluster mean.
4. Find the cluster nearest to the test point.
5. If the distance to the nearest cluster mean exceeds the threshold, flag the test point as an outlier.
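
A minimal sketch of this procedure in Python, assuming scikit-learn's KMeans supplies the cluster means; the cluster count and threshold below are illustrative assumptions, not values from the slides:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated clusters of "normal" training data.
X_train = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=10.0, scale=1.0, size=(100, 2)),
])

# Step 1: calculate the mean of each cluster.
means = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train).cluster_centers_

# Step 2: initialize the threshold (illustrative value).
threshold = 4.0

def is_outlier(x):
    # Steps 3-4: distance to each cluster mean, then take the nearest cluster.
    dists = np.linalg.norm(means - x, axis=1)
    # Step 5: outlier if even the nearest mean is farther than the threshold.
    return dists.min() > threshold

print(is_outlier(np.array([0.5, -0.3])))   # False: close to the first cluster
print(is_outlier(np.array([50.0, 50.0])))  # True: far from both means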
Outliers

Importance of outlier detection in machine learning


Outlier detection is important in machine learning for several reasons:
1. Biased models: Outliers can bias a machine learning model toward the outlier
values, leading to poor performance on the rest of the data. This is particularly
problematic for algorithms that are sensitive to outliers, such as linear regression.
2. Reduced accuracy: Outliers introduce noise into the data, making it harder for a
machine learning model to learn the true underlying patterns, which reduces its
accuracy and performance.
3. Increased variance: Outliers can increase the variance of a machine learning
model, making it more sensitive to small changes in the data and harder to
train into a stable, reliable model.
4. Reduced interpretability: Outliers can obscure what a machine learning model has
actually learned from the data, which undermines trust in the model’s predictions
and hampers efforts to improve its performance.
One Class SVM (OC-SVM)

Anomalies
 Anomalies are observations or instances that deviate significantly from a
dataset’s normal behavior.
 These deviations can manifest in various forms, such as outliers, noise, errors,
or unexpected patterns.
 Outlier detection and novelty detection are the tasks of identifying such
abnormal or uncommon observations.

The One-Class Support Vector Machine (One-Class SVM, or OC-SVM) is a variant of the
traditional SVM tailored specifically to anomaly detection: its primary aim is to locate
instances that deviate notably from normal behavior.
SVM Revisited

(Figure: soft-margin vs. hard-margin SVM decision boundaries)

A new regularization parameter C controls the trade-off between maximizing the margin and
minimizing the loss.
The difference between this primal problem and the hard-margin one is the addition of
slack variables: the slack variables ξi (shown in the figure) add flexibility by
tolerating some misclassifications of the model.
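
The two primal problems themselves appear only in the original figure. For reference, the standard soft-margin primal (stated here from the usual textbook formulation, not reproduced from the slides) is

\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w\cdot x_i + b)\ \ge\ 1-\xi_i,\quad \xi_i\ \ge\ 0,\ \ i=1,\dots,n,

and forcing every ξi = 0 (no slack) recovers the hard-margin problem.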
One Class SVM vs Traditional SVM

 One-class SVMs represent a variant of the traditional SVM algorithm primarily
employed for outlier and novelty detection tasks. Unlike traditional SVMs, which
handle binary classification tasks, One-Class SVM exclusively trains on data
points from a single class, known as the target class.
 Traditional SVMs aim to find a decision boundary that maximizes the margin
between different classes, allowing for optimal classification of new data points.
On the other hand, One-Class SVM seeks to find a boundary that encapsulates
the target class while minimizing the risk of including outliers or novel instances
outside this boundary.
 Traditional SVMs require labeled data with instances from multiple classes,
making them suitable for supervised classification tasks. In contrast, One-Class
SVM needs only data from the target class, making it well-suited for unsupervised
anomaly detection and novelty detection tasks.
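
A minimal sketch of this API difference in scikit-learn (the data below is synthetic and illustrative):

import numpy as np
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)
X_target = rng.normal(0.0, 1.0, size=(100, 2))   # target-class data only
X_other = rng.normal(5.0, 1.0, size=(100, 2))    # a second class

# Traditional SVM: requires labeled data from multiple classes.
X = np.vstack([X_target, X_other])
y = np.array([0] * 100 + [1] * 100)
svc = SVC(kernel="rbf").fit(X, y)

# One-Class SVM: trains on the target class alone, with no labels.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(X_target)
# predict() returns +1 for inliers and -1 for outliers/novelties.
print(ocsvm.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))  # [ 1 -1]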
One Class SVM

 One-class SVM aims to discover a hyperplane with maximum margin in the
feature space by separating the mapped data from the origin. Given a dataset
Dn = {x1, . . . , xn} of n samples, with each xi ∈ X a feature vector, the
primal problem is

\min_{w,\,\xi,\,\rho}\ \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad (w\cdot\phi(x_i))\ \ge\ \rho-\xi_i,\quad \xi_i\ \ge\ 0,\ \ i=1,\dots,n,

where w is the separating hyperplane,
φ is the feature map induced by the kernel,
ρ is the offset from the origin, and
ξi are slack variables. They allow for a soft margin but penalize violations ξi.
A hyperparameter ν ∈ (0, 1] controls the effect of the slack variables and should be
tuned to the application. The objective is to minimize the norm of w while
penalizing deviations from the margin; this allows a fraction of the data to
fall within the margin or on the wrong side of the hyperplane.
One Class SVM

gamma (γ): a crucial parameter that influences the shape of the decision boundary. A
smaller gamma value results in a broader decision boundary, which makes the model less
sensitive to individual data points. Conversely, a larger gamma value leads to a more
complex decision boundary, potentially capturing intricate patterns in the data.
Fine-tuning gamma is essential for achieving optimal model performance.

(Figure: w · x + b = 0 is the decision boundary; the slack variables penalize deviations.)

nu (ν): a crucial hyperparameter in One-Class SVM that controls the proportion of
outliers allowed. It sets an upper bound on the fraction of training errors and a
lower bound on the fraction of support vectors. It lies in the range (0, 1]: lower
values imply a stricter margin and flag fewer points as outliers, while higher values
are more permissive. The default value in scikit-learn is 0.5.
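
A short sketch of how these two hyperparameters are passed to scikit-learn's OneClassSVM (the data and values are illustrative):

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))

for nu in (0.01, 0.35):
    for gamma in (0.1, 10.0):
        clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
        flagged = np.mean(clf.predict(X) == -1)
        # nu upper-bounds the fraction of training errors, so the
        # flagged fraction stays at or below roughly nu.
        print(f"nu={nu}, gamma={gamma}: flagged {flagged:.2%} as outliers")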
One Class SVM

Kernel Functions in One-Class SVM

 Kernel functions play a crucial role in One-Class SVM by allowing the algorithm to
operate in higher-dimensional feature spaces without explicitly computing the
transformations.
 These kernels map the original input space into a higher-dimensional space, where
data points become linearly separable or exhibit more distinct patterns, facilitating
learning.
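
For illustration, the common kernels can be compared directly in scikit-learn; this sketch only reports how many training points each kernel flags (data and settings are illustrative):

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = OneClassSVM(kernel=kernel, nu=0.1).fit(X)
    flagged = np.mean(clf.predict(X) == -1)
    print(f"{kernel}: flagged {flagged:.2%} of training points")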
One Class SVM

Margin and Support Vectors

 In One-Class SVM, the margin represents the region where most of the data points
belonging to the target class lie.
 Maximizing the margin is crucial for One-Class SVM, as it helps the model
generalize well to new data points and improves its robustness.
 In One-Class SVM, support vectors are the data points from the target class closest
to the decision boundary.
 These support vectors play a significant role in determining the shape and
orientation of the decision boundary and, thus, in the overall performance of the
One-Class SVM model.
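
After fitting, scikit-learn exposes the support vectors directly, so their number and location can be inspected; a brief sketch on synthetic data:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(X)

# Support vectors: the training points on or beyond the margin;
# they determine the shape and orientation of the decision boundary.
print(clf.support_vectors_.shape)  # (n_support_vectors, 2)
# nu is (approximately) a lower bound on the fraction of support vectors.
print(f"fraction of support vectors: {len(clf.support_) / len(X):.2%}")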
One Class SVM

The plots allow us to visually inspect the performance of the One-Class SVM models in detecting
outliers in the Wine dataset. By comparing the results of hard margin and soft margin One-Class
SVM models, we can observe how the choice of margin setting (nu parameter) affects outlier
detection.
 The hard margin model with a very small nu value (0.01) likely results in a more
conservative decision boundary. It tightly wraps around the majority of the data points and
potentially classifies fewer points as outliers.
 Conversely, the soft margin model with a larger nu value (0.35) likely results in a more
flexible decision boundary, allowing for a wider margin and potentially flagging more
points as outliers.