Anomaly/Fraud Detection
Anomaly/Outlier Detection
✔ What are anomalies/outliers?
– The set of data points that are considerably different from the remainder of the data
Causes of Anomalies
✔ Natural variation
– Unusually tall people
✔ Data errors
– A 200-pound 2-year-old
Distinction Between Noise and Anomalies
✔ Noise doesn’t necessarily produce unusual values or objects
✔ Noise is not interesting
✔ Noise and anomalies are related but distinct concepts
Model-based vs Model-free
✔ Model-based Approaches
– Model can be parametric or non-parametric
– Anomalies are those points that don’t fit the model well
– Anomalies are those points that distort the model
✔ Model-free Approaches
– Anomalies are identified directly from the data without building a model
– Often the underlying assumption is that most of the points in the data are normal
General Issues: Label vs Score
✔ Some anomaly detection techniques provide only a binary categorization
✔ Other approaches measure the degree to which an object is an anomaly
– This allows objects to be ranked
– Scores can also have associated meaning (e.g., statistical significance)
Anomaly Detection Techniques
✔ Statistical Approaches
✔ Proximity-based
– Anomalies are points far away from other points
✔ Clustering-based
– Points far away from cluster centers are outliers
– Small clusters are outliers
✔ Reconstruction Based
Statistical Approaches
✔ Probabilistic definition of an outlier: An outlier is an
object that has a low probability with respect to a
probability distribution model of the data.
✔ Usually assume a parametric model describing the
distribution of the data (e.g., normal distribution)
✔ Apply a statistical test that depends on
– Data distribution
– Parameters of distribution (e.g., mean, variance)
– Number of expected outliers (confidence limit)
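A minimal sketch of the Gaussian case in Python (the helper name and the 3-standard-deviation cutoff are illustrative choices, not from the slides): fit the mean and standard deviation, then flag points whose z-score exceeds the cutoff.

```python
# Sketch of a Gaussian-based outlier test; assumes the data really is
# roughly normal, and uses a common but arbitrary 3-sigma threshold.
import numpy as np

def gaussian_outliers(x, z_thresh=3.0):
    """Flag points more than z_thresh standard deviations from the mean."""
    mu, sigma = x.mean(), x.std()
    z = np.abs(x - mu) / sigma           # z-score of each point
    return z > z_thresh                  # True marks a suspected outlier

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 100), [8.0]])  # one planted outlier
print(np.where(gaussian_outliers(data))[0])  # the planted point at index 100 should be flagged
```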
Normal Distributions

[Figure: a one-dimensional Gaussian and a two-dimensional Gaussian probability density, plotted over axes x and y; the plots are not reproduced here.]
Strengths/Weaknesses of Statistical Approaches
✔ Firm mathematical foundation.
✔ Can be very efficient.
✔ Good results if distribution is known.
✔ In many cases, data distribution may not be known.
✔ For high dimensional data, it may be difficult to estimate the true distribution.
✔ Anomalies can distort the parameters of the distribution.
Distance-Based Approaches
One of the simplest ways to define a proximity-based anomaly
score of a data instance x is to use the distance to its kth
nearest neighbor, dist(x,k).
If an instance x has many other instances located close to it
(characteristic of the normal class), it will have a low value of
dist(x,k).
On the other hand, an anomalous instance x will be quite distant from its k neighboring instances and would thus have a high value of dist(x, k). A minimal computation of this score is sketched below.
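A sketch of the dist(x, k) anomaly score using scikit-learn (the function name and k = 5 are illustrative choices):

```python
# Anomaly score of each point = distance to its kth nearest neighbor.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_distance_score(X, k=5):
    """Return dist(x, k) for each row of X."""
    # k + 1 neighbors because each point's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    return dist[:, -1]                   # distance to the kth real neighbor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[6.0, 6.0]]])  # one planted outlier
scores = knn_distance_score(X)
print(scores.argmax())                   # the planted point should rank highest
```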
One Nearest Neighbor - One Outlier

The figure shows a set of points in a two-dimensional space that have been shaded according to their distance to their kth nearest neighbor. Note that point D has been correctly assigned a high outlier score.

[Figure: outlier scores with k = 1; the color bar shows the Outlier Score, and the isolated point D is darkest.]
One Nearest Neighbor - Two Outliers
0.55
Note that dist(x,k) can be quite sensitive to the value of D
k. If k is too small, e.g., 1, then a small number of 0.5
0.35
0.3
0.25
0.2
0.15
0.1
0.05
For example, Figure shows anomaly scores using k =
1 for a set of normal points and two outliers that are
located close to each other (shading reflects anomaly Outlier Score
scores). Note that both D and its neighbor have a low
14
anomaly score.
Five Nearest Neighbors - Small Cluster
✔
If k is too large, then it is possible for all objects in a 2
cluster that has fewer than k objects to become
anomalies.
D
1.8
✔
For example, Figure below shows a data set that has 1.6
0.8
0.6
0.4
Outlier Score
✔
For k = 5, the anomaly score of all points in the 15
smaller cluster is very high.
Five Nearest Neighbors - Differing Density

[Figure: outlier scores with k = 5 on data with clusters of differing density; points C and D are highlighted.]

Strengths/Weaknesses of Distance-Based Approaches

✔ Simple
✔ Expensive – O(n²)
✔ Sensitive to parameters
✔ Sensitive to variations in density
✔ Distance becomes less meaningful in high-dimensional space
Density-Based Approaches
✔ Consider the density of a point relative to that of its k nearest neighbors.
✔ We define the following measures of density, based on two distance measures (a minimal computation is sketched below):
– density(x, k) = 1 / dist(x, k), the inverse of the distance to the kth nearest neighbor
– avg.density(x, k) = 1 / avg.dist(x, k), the inverse of the average distance from x to its k nearest neighbors
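A sketch of both density measures (the function name and k = 5 are illustrative; duplicate points with zero distance are assumed absent):

```python
# Compute density(x, k) and avg.density(x, k) for every point.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def densities(X, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)                    # column 0 is the point itself
    density = 1.0 / dist[:, -1]                   # inverse distance to kth neighbor
    avg_density = 1.0 / dist[:, 1:].mean(axis=1)  # inverse mean distance to the k neighbors
    return density, avg_density
```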
Relative Density Outlier Scores

✔ In scenarios where the data contains regions of varying densities, methods based on dist(x, k) alone would not be able to correctly identify anomalies.

[Figure: outlier scores on data with a compact cluster and a loose cluster; scores of 6.85, 1.40, and 1.33 appear near points C, D, and A.]
Relative Density Outlier Scores

✔ Assigning anomaly scores to points according to dist(x, k) with k = 5 correctly identifies point C to be an anomaly, but gives point D a low anomaly score.
✔ In fact, the score for D is much lower than the scores of many points that are part of the loose cluster.

[Figure: same data set; C stands out under dist(x, k) but D does not.]
Relative Density Outlier Scores

✔ To correctly identify anomalies in such data sets, we need a notion of density that is relative to the densities of neighboring instances.
✔ For example, point D in the figure has a higher absolute density than point A, but its density is lower relative to its nearest neighbors.

[Figure: same data set; D lies next to the compact cluster, A inside the loose cluster.]
Relative Density Outlier Scores

✔ There are many ways to define the relative density of an instance.
✔ For a point x, one approach is to take the ratio of the average density of its k nearest neighbors to its own density:

relative density(x, k) = (average of density(y, k) over all y in N(x, k)) / density(x, k)

where N(x, k) is the set of the k nearest neighbors of x.

[Figure: same data set; the relative density scores 6.85 (C), 1.40 (D), and 1.33 (A) are shown.]
Relative Density Outlier Scores

✔ The relative density of a point is high when the average density of the points in its neighborhood is significantly higher than its own density, so relative density serves directly as an outlier score.

[Figure: same data set; both C and D now receive elevated scores.]
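A sketch of the relative density score as defined above (the function name and k = 5 are illustrative choices):

```python
# relative density(x, k) = mean density of the k neighbors / density of x.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def relative_density_score(X, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)          # column 0 is the point itself
    density = 1.0 / dist[:, -1]           # density(x, k) = 1 / dist(x, k)
    neighbor_avg = density[idx[:, 1:]].mean(axis=1)  # mean density of the k neighbors
    return neighbor_avg / density         # high ratio = likely anomaly
```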
Relative Density-based: LOF approach
✔ Note that by replacing density(x, k) with avg.density(x, k) in the above equation, we can obtain a more robust measure of relative density.
✔ The above approach is similar to that used by the Local Outlier Factor (LOF) score, which is a widely used measure for detecting anomalies using relative density (see the sketch below).
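A hedged sketch using scikit-learn's LocalOutlierFactor, which implements the LOF idea described above (the data and parameter choices are illustrative):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.2, (50, 2)),         # compact cluster
    rng.normal(4, 1.0, (50, 2)),         # loose cluster
    [[2.0, 2.0]],                        # isolated point between them
])

lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)              # -1 marks predicted outliers
scores = -lof.negative_outlier_factor_   # larger = more anomalous
print(scores.argmax())                   # the isolated point should rank highest
```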
Strengths/Weaknesses of Density-Based Approaches
✔ Simple.
✔ Expensive – O(n²).
✔ Sensitive to parameters.
✔ Density becomes less meaningful in high-dimensional space.
Clustering-Based Approaches
Distance of Points from Closest Centroids
✔ Here, using the K-means algorithm, the anomaly score of a point is computed as the point’s distance from its closest centroid.
Relative Distance of Points from Closest Centroid
✔ Here, using the K-means algorithm, the anomaly score of a point is computed as the point’s relative distance from its closest centroid, where the relative distance is the ratio of the point’s distance from the centroid to the median distance from the centroid of all points in the cluster.
✔ This approach adjusts for the large difference in density between compact and loose clusters (see the sketch below).
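A sketch of the relative-distance score (the function name, number of clusters, and seed are illustrative; dropping the median normalization gives the plain distance-to-centroid score from the previous slide):

```python
import numpy as np
from sklearn.cluster import KMeans

def relative_centroid_distance(X, n_clusters=2, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    centers = km.cluster_centers_[km.labels_]    # each point's own centroid
    dist = np.linalg.norm(X - centers, axis=1)   # distance to that centroid
    # Normalize by the median distance within each cluster.
    medians = np.array([np.median(dist[km.labels_ == c])
                        for c in range(n_clusters)])
    return dist / medians[km.labels_]
```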
Strengths/Weaknesses of Clustering-Based Approaches
✔ Simple
✔ Many clustering techniques can be used
✔ Can be difficult to decide on a clustering technique
✔ Can be difficult to decide on the number of clusters
✔ Outliers can distort the clusters
Reconstruction-Based Approaches
✔ Let x be the original data object
✔ Find the representation of the object in a lower-dimensional space
✔ Project the object back to the original space
✔ Call this object x̂
– The reconstruction error is the distance between x and x̂
✔ Objects with large reconstruction errors are anomalies (see the sketch below)
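A sketch of reconstruction-based scoring with PCA as the dimensionality reduction step (the data and the single retained component are illustrative choices):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(0, 1, 100)
X = np.column_stack([t, 2 * t + rng.normal(0, 0.1, 100)])  # near-linear data
X = np.vstack([X, [[0.0, 5.0]]])        # one point far off the line

pca = PCA(n_components=1).fit(X)        # lower-dimensional representation
X_hat = pca.inverse_transform(pca.transform(X))   # project back
errors = np.linalg.norm(X - X_hat, axis=1)        # reconstruction error
print(errors.argmax())                  # the off-line point should rank highest
```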
Reconstruction of two-dimensional data
Basic Architecture of an Autoencoder
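A hedged sketch of a small autoencoder in PyTorch (the framework choice and layer sizes are mine; the slides do not prescribe an implementation):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_features=10, n_hidden=2):
        super().__init__()
        # Encoder compresses to a low-dimensional code; decoder projects back.
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(5, 10)                   # a small batch of examples
x_hat = model(x)
# Per-example reconstruction error serves as the anomaly score.
scores = ((x - x_hat) ** 2).mean(dim=1)
```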
Strengths and Weaknesses
✔ Does not require assumptions about the distribution of the normal class
✔ Can use many dimensionality reduction approaches
✔ The reconstruction error is computed in the original space
– This can be a problem if dimensionality is high
One Class SVM
✔ Uses an SVM approach to classify normal objects
✔ Uses the given data to construct such a model
– This data may contain outliers
– But the data does not contain class labels
✔ How to build a classifier given one class?
How Does One-Class SVM Work?
✔ Uses the “origin” trick
✔ Use a Gaussian kernel
– Every point is mapped to a unit hypersphere
– Every point lies in the same orthant (quadrant)
✔ Aim to maximize the distance of the separating plane from the origin
Two-dimensional One Class SVM
Equations for One-Class SVM
✔ Equation of the hyperplane: w · φ(x) − ρ = 0
✔ φ is the mapping to the high-dimensional space
✔ w is the weight vector, ρ the offset
✔ ν is the fraction of outliers (an upper bound on the fraction of training points outside the boundary)
✔ The optimization problem is the following:
minimize over w, ξ, ρ:  (1/2) ||w||² + (1/(ν n)) Σᵢ ξᵢ − ρ
subject to:  w · φ(xᵢ) ≥ ρ − ξᵢ,  ξᵢ ≥ 0
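A sketch using scikit-learn's OneClassSVM with an RBF (Gaussian) kernel, matching the formulation above (the data and nu = 0.05 are illustrative choices):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[5.0, 5.0]]])  # one planted outlier

# nu bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)
labels = ocsvm.predict(X)                # -1 marks predicted outliers
print(np.where(labels == -1)[0])         # should include the planted point
```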
Finding Outliers with a One-Class SVM
Strengths and Weaknesses
✔ Strong theoretical foundation
✔ Choice of ν is difficult
✔ Computationally expensive
Information Theoretic Approaches

✔ Key idea: anomalies increase the information (e.g., entropy) needed to describe the data, so the anomaly score of an object can be taken as the reduction in entropy when the object is removed.
Information Theoretic Example
✔ Survey of height and weight for 100 participants
✔ Eliminating the last group gives a gain of 2.08 − 1.89 = 0.19

[The table of height/weight groups referenced here is not reproduced.]
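A worked sketch of this gain computation (the group counts below are invented for illustration, since the original table is not reproduced; with the actual groups, the same computation would yield 2.08 − 1.89 = 0.19):

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by group counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

groups = [20, 20, 20, 20, 15, 5]         # hypothetical group sizes, n = 100
gain = entropy(groups) - entropy(groups[:-1])  # entropy drop from removing the last group
print(round(gain, 2))
```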
Strengths and Weaknesses
✔ Solid theoretical foundation.
✔ Theoretically applicable to all kinds of data.
✔ Difficult and computationally expensive to implement in practice.
Evaluation of Anomaly Detection
Distribution of Anomaly Scores