
15: Anomaly Detection


Anomaly detection - problem motivation


Anomaly detection is a reasonably common type of machine learning application
Can be thought of as a solution to an unsupervised learning problem
But, has aspects of supervised learning
What is anomaly detection?
Imagine you're an aircraft engine manufacturer
As engines roll off your assembly line you're doing QA
Measure some features from engines (e.g. heat generated and vibration)
You now have a dataset of x^(1) to x^(m) (i.e. m engines were tested)
Say we plot that dataset
Next day you have a new engine
An anomaly detection method is used to see if the new engine is anomalous (when compared to the previous engines)
If the new engine looks like this;
Probably OK - looks like the ones we've seen before
But if the engine looks like this

Uh oh! - this looks like an anomalous data-point

More formally
We have a dataset which contains normal data
How we ensure they're normal is up to us
In reality it's OK if there are a few which aren't actually normal
Using that dataset as a reference point we can see if other examples are anomalous
How do we do this?
First, using our training dataset we build a model
We can access this model using p(x)
This asks, "What is the probability that example x is normal"
Having built a model
if p(xtest) < ε --> flag this as an anomaly
if p(xtest) >= ε --> this is OK
ε is some threshold probability value which we define, depending on how sure we need/want to be
We expect our model to (graphically) look something like this;

i.e. this would be our model if we had 2D data
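As a minimal sketch of this decision rule in MATLAB/Octave (assuming p_x is a vector of modeled probabilities for some new examples and epsilon is our chosen threshold):

    % Flag examples whose modeled probability falls below the threshold
    is_anomaly = (p_x < epsilon);   % logical vector: 1 = anomaly, 0 = OK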

Applications

Fraud detection
Users have activity associated with them, such as
Length of time online
Location of login
Spending frequency
Using this data we can build a model of what normal users' activity is like
What is the probability of "normal" behavior?
Identify unusual users by sending their data through the model
Flag up anything that looks a bit weird
Automatically block cards/transactions
Manufacturing
Already spoke about aircraft engine example
Monitoring computers in data center
If you have many machines in a cluster
Compute features of each machine
x1 = memory use
x2 = number of disk accesses/sec
x3 = CPU load
In addition to the measurable features you can also define your own complex features
x4 = CPU load/network traffic
If you see an anomalous machine
Maybe about to fail
Look at replacing bits from it

The Gaussian distribution (optional)


Also called the normal distribution
Example
Say x (data set) is made up of real numbers
Mean is μ
Variance is σ²
σ is also called the standard deviation - it specifies the width of the Gaussian curve
The data has a Gaussian distribution
Then we can write this as x ~ N(μ, σ²)
~ means "is distributed as"
N (should really be a "script" N - even curlier!) means the normal distribution
μ, σ² represent the mean and variance, respectively
These are the two parameters a Gaussian needs
Looks like this;

This specifies the probability density of x taking a given value
As you move away from μ the probability falls off
The Gaussian equation is

p(x; μ, σ²) = (1 / (σ√(2π))) * exp( -(x - μ)² / (2σ²) )

(the probability of x, parameterized by the mean μ and variance σ²)

Some examples of Gaussians below


Area under the curve is always the same (must integrate to 1)
But width changes as standard deviation changes
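To make this concrete, here is a small MATLAB/Octave sketch of the Gaussian density, plotting a few choices of variance to show the width/height trade-off (the plotted values are just illustrative):

    % Univariate Gaussian density p(x; mu, sigma2)
    gaussian = @(x, mu, sigma2) (1 ./ sqrt(2 * pi * sigma2)) ...
                                .* exp(-(x - mu).^2 ./ (2 * sigma2));

    x = linspace(-5, 5, 200);
    plot(x, gaussian(x, 0, 1)); hold on;   % N(0, 1)
    plot(x, gaussian(x, 0, 0.25));         % smaller variance: narrower, taller
    plot(x, gaussian(x, 0, 4));            % larger variance: wider, flatter
    % all three curves enclose the same unit area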

Parameter estimation problem

What is it?
Say we have a data set of m examples
Given each example is a real number, we can plot the data on the x axis as shown below

Problem is - say you suspect these examples come from a Gaussian


Given the dataset can you estimate the distribution?
Could be something like this

Seems like a reasonable fit - the data seems to have a higher probability of being in the central region, lower probability of
being further away
Estimating μ and σ²
μ = average of the examples: μ = (1/m) * sum over i of x^(i)
σ² = average squared deviation from the mean: σ² = (1/m) * sum over i of (x^(i) - μ)²

As a side comment
These parameters are the maximum likelihood estimates of μ and σ²
You can divide by m or by m-1 - slightly different mathematical problems, but in practice it makes little difference
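A minimal MATLAB/Octave sketch of this estimation step (assuming x is a vector of m training values):

    % Maximum likelihood estimates for a single feature
    mu     = mean(x);               % (1/m) * sum(x)
    sigma2 = mean((x - mu).^2);     % (1/m) * sum((x - mu).^2)
    % var(x) divides by m-1 instead; in practice the difference is negligible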

Anomaly detection algorithm


Unlabeled training set of m examples
Data = {x^(1), x^(2), ..., x^(m)}
Each example is an n-dimensional vector (i.e. a feature vector)
We have n features!
Model p(x) from the data set
Which values of the features occur with high probability, and which with low probability
x is a vector
So model p(x) as
p(x) = p(x₁; μ₁, σ₁²) * p(x₂; μ₂, σ₂²) * ... * p(xₙ; μₙ, σₙ²)
i.e. multiply together the probabilities of the individual features
We model each feature by assuming it is distributed according to a Gaussian distribution
p(xᵢ; μᵢ, σᵢ²)
The probability of feature xᵢ given μᵢ and σᵢ², using a Gaussian distribution
As a side comment
Turns out this equation makes an independence assumption about the features, although the algorithm often works well whether
or not the features are actually independent
Don't worry too much about this - if your features are tightly linked you should be able to do some
dimensionality reduction anyway!
We can write this chain of multiplication more compactly as follows;

p(x) = Π (for i = 1 to n) p(xᵢ; μᵢ, σᵢ²)

Capital pi (Π) denotes the product of a set of values
The problem of estimating this distribution is sometimes called the problem of density estimation
Algorithm
1 - Choose features
Try to come up with features which might help identify something anomalous - features which may take unusually large or small values on anomalies
More generally, choose features which describe the general properties of the examples
This is nothing unique to anomaly detection - it's just the idea of building a sensible feature vector
2 - Fit parameters
Determine the parameters μᵢ and σᵢ² for each of your features
"Fit" is a bit misleading - really you just calculate the parameters for features 1 to n
So you're calculating the mean and variance for each feature
You should of course use a vectorized implementation rather than a loop
3 - Compute p(x)
Compute the formula shown (i.e. the formula for the Gaussian probability)
If the number is very small, there's a very low chance of the example being "normal"
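Putting the steps together, a hedged MATLAB/Octave sketch (X is an assumed [m x n] training matrix, Xnew holds new examples as rows, and epsilon is a chosen threshold; the broadcasting needs Octave or MATLAB R2016b+):

    % 2 - Fit parameters: per-feature mean and variance
    mu     = mean(X);                   % 1 x n
    sigma2 = mean((X - mu).^2);         % 1 x n

    % 3 - Compute p(x) for each new example: product of per-feature densities
    P = (1 ./ sqrt(2 * pi * sigma2)) .* exp(-(Xnew - mu).^2 ./ (2 * sigma2));
    p = prod(P, 2);                     % one p(x) value per row of Xnew

    anomalies = find(p < epsilon);      % indices flagged as anomalous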

Anomaly detection example

x1
Mean is about 5
Standard deviation looks to be about 2
x2
Mean is about 3
Standard deviation about 1
So we have the following system

If we plot the Gaussian for x1 and x2 we get something like this

If you plot the product of these things you get a surface plot like this
With this surface plot, the height of the surface is the probability - p(x)
We can't always do surface plots, but for this example it's quite a nice way to show the probability of a 2D feature vector
Check if a value is anomalous
Set epsilon as some value
Say we have two new data points with the following values
x_test^(1)
x_test^(2)
We compute
p(x_test^(1)) = 0.436 >= ε
Normal - the modeled probability is relatively high
p(x_test^(2)) = 0.0021 < ε
Anomalous - the modeled probability is very low
What this is saying is if you look at the surface plot, all values above a certain height are normal, all the values below that threshold
are probably anomalous

Developing and evaluating an anomaly detection system


Here we talk about developing a system for anomaly detection
How to evaluate an algorithm
Previously we spoke about the importance of real-number evaluation
Often need to make a lot of choices (e.g. features to use)
Easier to evaluate your algorithm if it returns a single number to show if changes you made improved or worsened an
algorithm's performance
To develop an anomaly detection system quickly, would be helpful to have a way to evaluate your algorithm
Assume we have some labeled data
So far we've been treating anomaly detection as an unlabeled-data problem
If you have some labeled data it allows evaluation
i.e. if you think something is anomalous you can check whether it really is or not
So, taking our engine example
You have some labeled data
Data for engines which were non-anomalous -> y = 0
Data for engines which were anomalous -> y = 1
Training set is the collection of normal examples
OK even if we have a few anomalous data examples
Next define
Cross validation set
Test set
For both, assume you include a few anomalous examples
Specific example
Engines
Have 10 000 good engines
OK even if a few bad ones are here...
LOTS of y = 0
20 flawed engines
Typically when y = 1 you have somewhere between 2 and 50 examples
Split into
Training set: 6000 good engines (y = 0)
CV set: 2000 good engines, 10 anomalous
Test set: 2000 good engines, 10 anomalous
Ratio is 3:1:1
Sometimes we see a different way of splitting
Take 6000 good engines for training
Use the same 4000 good engines for both the CV and test sets, with 10 anomalous examples in each
Or even the same 20 anomalous examples in both sets
This is bad practice - the CV and test sets should use different data
Algorithm evaluation
Take the training set {x^(1), x^(2), ..., x^(m)}
Fit model p(x)
On cross validation and test set, test the example x
y = 1 if p(x) < epsilon (anomalous)
y = 0 if p(x) >= epsilon (normal)
Think of the algorithm as trying to predict if something is anomalous
But you have a label so can check!
Makes it look like a supervised learning algorithm
What's a good metric to use for evaluation?
y = 0 is very common
So classification accuracy would be a bad metric (always predicting y = 0 would score highly)
Compute the numbers of true positives, false positives, false negatives, and true negatives
Compute precision/recall
Compute F1-score
Earlier, also had epsilon (the threshold value)
Threshold to show when something is anomalous
If you have a CV set you can see how varying ε affects various evaluation metrics
Then pick the value of epsilon which maximizes the score on your CV set
Evaluate algorithm using cross validation
Do final algorithm evaluation on the test set
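A sketch of that threshold search in MATLAB/Octave (assuming pval holds p(x) computed on the CV examples and yval their ground-truth labels, with 1 = anomalous):

    bestEpsilon = 0;  bestF1 = 0;
    for epsilon = linspace(min(pval), max(pval), 1000)
        pred = (pval < epsilon);                 % predicted anomalies
        tp = sum((pred == 1) & (yval == 1));     % true positives
        fp = sum((pred == 1) & (yval == 0));     % false positives
        fn = sum((pred == 0) & (yval == 1));     % false negatives
        prec = tp / (tp + fp);
        rec  = tp / (tp + fn);
        F1   = (2 * prec * rec) / (prec + rec);  % NaN when undefined, safely skipped
        if F1 > bestF1
            bestF1 = F1;  bestEpsilon = epsilon;
        end
    end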

Anomaly detection vs. supervised learning


If we have labeled data, why not use a supervised learning algorithm?
Here we'll try and understand when you should use supervised learning and when anomaly detection would be better

Anomaly detection

Very small number of positive examples


Save positive examples just for CV and test set
Consider using an anomaly detection algorithm
Not enough data to "learn" positive examples
Have a very large number of negative examples
Use these negative examples for p(x) fitting
Only need negative examples for this
Many "types" of anomalies
Hard for an algorithm to learn from positive examples when anomalies may look nothing like one another
So anomaly detection doesn't know what they look like, but knows what they don't look like
When we looked at SPAM email,
Many types of SPAM
For the spam problem, usually enough positive examples
So this is why we usually think of SPAM as supervised learning
Applications and why they're anomaly detection
Fraud detection
Many ways you may do fraud
If you're a major online retailer/very subject to attacks, you might sometimes shift to supervised learning
Manufacturing
If you make HUGE volumes maybe have enough positive data -> make supervised
Means you make an assumption about the kinds of errors you're going to see
It's the unknown unknowns we don't like!
Monitoring machines in a data center

Supervised learning

Reasonably large number of positive and negative examples


Have enough positive examples to give your algorithm the opportunity to see what they look like
If you expect anomalies to look anomalous in the same way
Application
Email/SPAM classification
Weather prediction
Cancer classification

Choosing features to use


One of the things which has a huge effect is which features are used
Non-Gaussian features
Plot a histogram of the data to check it looks roughly Gaussian - a nice sanity check
Often still works if data is non-Gaussian
Use hist command to plot histogram
Non-Gaussian data might look like this

Can play with different transformations of the data to make it look more Gaussian
Might take a log transformation of the data
i.e. if you have some feature x1 , replace it with log( x1 )
This looks much more Gaussian
Or do log(x1 +c)
Play with c to make it look as Gaussian as possible
Or try x1^(1/2) (a square root)
Or x1^(1/3) - experiment with the exponent (see the sketch below)
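For example, in MATLAB/Octave you might eyeball transformations like this (x1 is an assumed feature vector; the constants are just starting points to play with):

    hist(x1, 50)             % raw feature - skewed?
    hist(log(x1 + 1), 50)    % log transform, here with c = 1
    hist(x1 .^ 0.5, 50)      % square root
    hist(x1 .^ 0.3, 50)      % tune the exponent until it looks Gaussian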

Error analysis for anomaly detection

Good way of coming up with features


Like supervised learning error analysis procedure
Run algorithm on CV set
See which examples it got wrong
Develop new features based on trying to understand why the algorithm got those examples wrong
Example
p(x) large for normal, p(x) small for abnormal
e.g.

Here we have one dimension, and our anomalous value is sort of buried in it (in green - Gaussian superimposed in blue)
Look at data - see what went wrong
Looking at that example may help us develop a new feature (x2) which can distinguish the anomalous example
Example - data center monitoring
Features
x1 = memory use
x2 = number of disk access/sec
x3 = CPU load
x4 = network traffic
We suspect CPU load and network traffic grow linearly with one another
If server is serving many users, CPU is high and network is high
The failure case we care about is a machine stuck in an infinite loop - CPU load grows but network traffic stays low
New feature - CPU load/network traffic
May need to do feature scaling
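As a sketch (with hypothetical per-machine measurement vectors cpu_load and network_traffic):

    % Derived feature: large when CPU is busy but the machine isn't serving traffic
    x5 = cpu_load ./ network_traffic;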

Multivariate Gaussian distribution


This is a slightly different technique which can sometimes catch anomalies that the per-feature (non-multivariate) Gaussian
anomaly detection fails to catch
Unlabeled data looks like this

Say you can fit a Gaussian distribution to CPU load and memory use
Let's say in the test set we have an example which looks like an anomaly (e.g. x1 = 0.4, x2 = 1.5)
Looks like most of data lies in a region far away from this example
Here memory use is high and CPU load is low (if we plot x1 vs. x2 our green example looks miles away from the
others)
Problem is, if we look at each feature individually they may fall within acceptable limits - the issue is we know we shouldn't
get those kinds of values together
But individually, they're both acceptable

This is because our function makes probability predictions in concentric circles around the means of both features
Probability of the two red circled examples is basically the same, even though we can clearly see the green one as an outlier
The per-feature model doesn't capture the relationship between the features

Multivariate Gaussian distribution model

To get around this we develop the multivariate Gaussian distribution


Model p(x) all in one go, instead of each feature separately
What are the parameters for this new model?
μ - which is an n dimensional vector (where n is number of features)
Σ - which is an [n x n] matrix - the covariance matrix

For the sake of completeness, the formula for the multivariate Gaussian distribution is as follows

p(x; μ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) * exp( -(1/2) (x - μ)ᵀ Σ⁻¹ (x - μ) )
NB don't memorize this - you can always look it up


What does this mean?
|Σ| is the determinant of Σ
The determinant is a mathematical function of a square matrix
You can compute it in MATLAB using det(sigma)
More importantly, what does this p(x) look like?
2D example
Say Σ is the identity matrix
p(x) looks like this
For inputs of x1 and x2 the height of the surface gives the value of p(x)
What happens if we change Sigma?

So now we change the plot to


Now the width of the bump decreases and the height increases
If we set Σ to different values (moving away from the identity matrix) we change the shape of our graph

Using these values we can, therefore, define the shape of this to better fit the data, rather than assuming symmetry in every
dimension
One of the cool things is you can use it to model correlation between data
If you start to change the off-diagonal values in the covariance matrix you can control how strongly the various dimensions correlate

So we see here the final example gives a very tall, thin distribution, showing a strong positive correlation
We can also make the off-diagonal values negative to show a negative correlation
Hopefully this shows an example of the kinds of distribution you can get by varying sigma
We can, of course, also move the mean ( μ) which varies the peak of the distribution
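A short MATLAB/Octave sketch of this, evaluating the multivariate density on a grid for an assumed 2D Σ with positive off-diagonal values:

    [X1, X2] = meshgrid(linspace(-3, 3, 100));
    mu    = [0 0];
    Sigma = [1 0.8; 0.8 1];                        % strong positive correlation

    D = [X1(:) - mu(1), X2(:) - mu(2)];            % deviations from the mean
    p = exp(-0.5 * sum((D / Sigma) .* D, 2)) ...   % (x-mu)' * inv(Sigma) * (x-mu)
        / (2 * pi * sqrt(det(Sigma)));             % normalizer for n = 2
    contour(X1, X2, reshape(p, size(X1)))          % contours stretch along the diagonal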

Applying multivariate Gaussian distribution to anomaly detection


Saw some examples of the kinds of distributions you can model
Now let's take those ideas and look at applying them to different anomaly detection algorithms
As mentioned, multivariate Gaussian modeling uses the following equation;

Which comes with the parameters μ and Σ


Where
μ - the mean (an n-dimensional vector)
Σ - covariance matrix ([nxn] matrix)
Parameter fitting/estimation problem
If you have a set of examples
{x^(1), x^(2), ..., x^(m)}
The formulas for estimating the parameters are

μ = (1/m) * sum over i of x^(i)
Σ = (1/m) * sum over i of (x^(i) - μ)(x^(i) - μ)ᵀ
Using these two formulas you get the parameters

Anomaly detection algorithm with multivariate Gaussian distribution

1) Fit model - take data set and calculate μ and Σ using the formula above
2) We're next given a new example (x_test) - see below
For it compute p(x) using the following formula for multivariate distribution

3) Compare the value with ε (threshold probability value)


if p(xtest) < ε --> flag this as an anomaly
if p(xtest) >= ε --> this is OK
If you fit a multivariate Gaussian model to our data we build something like this

Which means it's likely to identify the green value as anomalous
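The whole pipeline as a hedged MATLAB/Octave sketch (X is the assumed [m x n] training matrix, Xtest holds new examples as rows, epsilon is the chosen threshold):

    % 1) Fit: mean vector and covariance matrix
    m  = size(X, 1);
    n  = size(X, 2);
    mu = mean(X);                               % 1 x n
    Xc = X - mu;                                % centered data
    Sigma = (Xc' * Xc) / m;                     % n x n covariance matrix

    % 2) Compute p(x) for each test example
    Dc = Xtest - mu;
    p  = exp(-0.5 * sum((Dc / Sigma) .* Dc, 2)) ...
         / ((2 * pi)^(n / 2) * sqrt(det(Sigma)));

    % 3) Compare with the threshold
    anomalies = find(p < epsilon);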


Finally, we should mention how multivariate Gaussian relates to our original simple Gaussian model (where each feature is looked
at individually)
Original model corresponds to multivariate Gaussian where the Gaussians' contours are axis aligned
i.e. the normal Gaussian model is a special case of multivariate Gaussian distribution
This can be shown mathematically
This has the constraint that the covariance matrix Σ has zeros on the off-diagonal values

If you plug the per-feature variances into the diagonal of the covariance matrix, the two models are actually identical

Original model vs. Multivariate Gaussian

Original Gaussian model

Probably used more often


There is a need to manually create features to capture anomalies where x1 and x2 take unusual combinations of values
So need to make extra features
Might not be obvious what they should be
This is always a risk - where you're using your own expectation of a problem to "predict" future anomalies
Typically, the things that catch you out aren't going to be the things you thought of
If you thought of them they'd probably be avoided in the first place
Obviously this is a bigger issue, and one which may or may not be relevant depending on your problem space
Much cheaper computationally
Scales much better to very large feature vectors
Even if n = 100 000 the original model works fine
Works well even with a small training set
e.g. 50, 100
Because of these factors it's used more often - it really represents an optimized but axis-aligned specialization of the general
model

Multivariate Gaussian model

Used less frequently


Can capture feature correlation
So no need to create extra values
Less computationally efficient
Must compute inverse of matrix which is [n x n]
So lots of features is bad - makes this calculation very expensive
So if n = 100 000 not very good
Needs m > n
i.e. number of examples must be greater than number of features
If this is not true then we have a singular matrix (non-invertible)
So it should only be used when m >> n
If you find the matrix is non-invertible, could be for one of two main reasons
m<n
So use original simple model
Redundant features (i.e. linearly dependent)
i.e. two features that are the same
If this is the case you could use PCA or sanity check your data
