Anomaly-Detection 112940

Anomaly detection

Uploaded by

Assem Mahmoud
Introduction to Anomaly Detection

Linghao Chen

HOMEPAGE: https://lhchen.top
lhchen@stu.xidian.edu.cn
School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, P.R. China
Anomaly Detection

What is it?

[1]: Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008.
[2]: Eswaran, Dhivya, et al. "Spotlight: Detecting anomalies in streaming graphs." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
Problems

Why is anomaly detection so hard?

✓ Unsupervised learning in most cases;
✓ The data is extremely imbalanced;
✓ It often involves density estimation, which requires a large number of distance or similarity calculations and is computationally expensive;
✓ Real-time detection;
✓ Interpretability of methods.
Methods

Classic Methods:

➢ kNN(K-Nearest Neighbor)
➢ LOF(Local Outlier Factor)
➢ PCA(Principal Component Analysis)
➢ HBOS(Histogram-based Outlier Score)
➢ Isolation Forest
➢ AE(Auto Encoder)
kNN(K-Nearest Neighbor)

$\mathrm{Dis}(x, y) = \left( \sum_{i=1}^{N} |x_i - y_i|^p \right)^{1/p}$

Choose the K-th smallest distance (the distance to the K-th nearest neighbor) as the anomaly score

Simple but expensive!

[1]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM Sigmod
Record, 29(2), pp. 427-438.
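The kNN score from this slide can be sketched in a few lines of NumPy. This is a brute-force illustration under stated assumptions, not the optimized algorithm from [1]: the function name `knn_anomaly_scores`, the defaults `k=5` and `p=2`, and the toy data are all hypothetical.

```python
import numpy as np

def knn_anomaly_scores(X, k=5, p=2):
    """Score each row of X by its Minkowski distance to its k-th nearest neighbor."""
    X = np.asarray(X, dtype=float)
    # Pairwise Minkowski distances: (sum_i |x_i - y_i|^p)^(1/p).
    diff = np.abs(X[:, None, :] - X[None, :, :])
    dist = (diff ** p).sum(axis=2) ** (1.0 / p)
    # Ignore each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    # The k-th smallest remaining distance is the anomaly score.
    return np.sort(dist, axis=1)[:, k - 1]

# Toy data (assumed): four points on a unit square plus one far-away point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [8.0, 8.0]])
scores = knn_anomaly_scores(X, k=2)
print(scores)
```

The full pairwise distance matrix costs O(n^2) time and memory, which is the "expensive" part noted on the slide.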
LOF(Local Outlier Factor)

K-distance of an object p: the distance from p to its K-th nearest neighbor, written $d_k(p)$

Figure: the 5-distance of an object

[1]: Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM Sigmod Record,
29(2), pp. 93-104.
LOF(Local Outlier Factor)

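K-distance neighborhood of an object O

$N_k(O) = \{ P' \in D \setminus \{O\} \mid d(O, P') \le d_k(O) \}$

Example (5-distance): $N_5(O) = \{P_1, P_2, P_3, P_4, P_5, P_6\}$

Reachability distance of an object P w.r.t. object O

$d_k(P, O) = \max\{ d_k(O),\ d(P, O) \}$

Local reachability density of P

$\rho_k(P) = \frac{|N_k(P)|}{\sum_{O \in N_k(P)} d_k(P, O)}$

Local outlier factor of P

$\mathrm{LOF}_k(P) = \frac{\sum_{O \in N_k(P)} \frac{\rho_k(O)}{\rho_k(P)}}{|N_k(P)|}$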

[1]: Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM Sigmod Record,
29(2), pp. 93-104.
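A minimal NumPy sketch of the LOF computation follows. It uses the k-distance neighborhood, the reachability distance max{d_k(O), d(P, O)} from Breunig et al. [1], the local reachability density, and the LOF ratio. The function name `lof_scores`, the Euclidean metric, `k=3`, and the toy grid are assumptions for illustration, and the sketch assumes no duplicate points (they would make a density infinite).

```python
import numpy as np

def lof_scores(X, k=3):
    """Brute-force Local Outlier Factor for each row of X (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # k-distance d_k(O): distance to the k-th nearest neighbor.
    k_dist = np.sort(dist, axis=1)[:, k - 1]
    # N_k(O): every point within the k-distance (ties may give more than k).
    neighbors = [np.flatnonzero(dist[i] <= k_dist[i]) for i in range(n)]
    # Local reachability density rho_k(P).
    lrd = np.empty(n)
    for i in range(n):
        nb = neighbors[i]
        reach = np.maximum(k_dist[nb], dist[i, nb])  # max{d_k(O), d(P, O)}
        lrd[i] = len(nb) / reach.sum()
    # LOF_k(P): mean ratio of the neighbors' densities to P's own density.
    return np.array([lrd[neighbors[i]].mean() / lrd[i] for i in range(n)])

# Toy data (assumed): a 3x3 unit grid plus one distant point at index 9.
grid = [(i, j) for i in range(3) for j in range(3)]
X = np.array(grid + [(10, 10)], dtype=float)
lof = lof_scores(X, k=3)
print(lof.round(2))
```

Values near or below 1 indicate a point about as dense as its neighbors; values well above 1 indicate a local outlier.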
PCA(Principal Component Analysis)

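Algorithm
Input: $X \in \mathbb{R}^{n \times m}$ with $n$ samples (rows) and $m$ features
Output: $Y = XW \in \mathbb{R}^{n \times m'}$, where the columns of $W \in \mathbb{R}^{m \times m'}$ are the top $m'$ eigenvectors

Normalization: $x_i \leftarrow x_i - \frac{1}{n} \sum_{j=1}^{n} x_j$
Covariance matrix: $C = \frac{1}{n} X^T X$

Calculate the eigenvectors of $C$
Anomaly score: the distance between a sample and the principal eigenvectors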

[1]: Shyu, Mei-Ling, et al. A novel anomaly detection scheme based on principal component classifier. MIAMI UNIV CORAL GABLES
FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING, 2003.
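The PCA steps can be sketched as below, treating rows of X as samples. The anomaly score used here is the reconstruction error after projecting onto the top eigenvectors. That is one common way to measure the distance from a sample to the principal directions and is an assumption of this sketch, not necessarily the exact score of Shyu et al. [1]. The function name `pca_anomaly_scores` and the toy data are hypothetical.

```python
import numpy as np

def pca_anomaly_scores(X, n_components=1):
    """Reconstruction error of each row of X after projecting onto the top components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # normalization: subtract the mean sample
    C = Xc.T @ Xc / len(Xc)                  # covariance matrix, shape (m, m)
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :n_components]   # top eigenvectors, shape (m, m')
    Y = Xc @ W                               # low-dimensional representation
    X_hat = Y @ W.T                          # back-projection into the input space
    return np.linalg.norm(Xc - X_hat, axis=1)

# Toy data (assumed): seven points on the line y = x plus one point off the line.
X = np.array([[t, t] for t in range(-3, 4)] + [[2, -2]], dtype=float)
scores = pca_anomaly_scores(X, n_components=1)
print(scores.round(3))
```

Points that follow the dominant direction reconstruct well and score low; the off-line point at index 7 gets the largest score.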
HBOS(Histogram-based Outlier Score)

Methods

Low density area

[1]: Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In KI-2012:
Poster and Demo Track, pp.59-63.
HBOS(Histogram-based Outlier Score)

Assumption
The dimensions (features) of the data are assumed to be independent of each other.

Algorithm
➢ Draw a histogram of the data for each dimension
➢ Divide the value range into K buckets of equal (or sometimes dynamic) width; the frequency of values falling into each bucket is used as an estimate of density.

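Anomaly Score
$\mathrm{HBOS}(p) = \sum_{i=0}^{d} \log\left( \frac{1}{\mathrm{hist}_i(p)} \right)$
where $\mathrm{hist}_i(p)$ is the height of the histogram bucket that contains $p$ in dimension $i$.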
[1]: Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In KI-2012:
Poster and Demo Track, pp.59-63.
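A minimal sketch of HBOS with equal-width buckets is shown below. Each dimension gets its own histogram, heights are scaled so the tallest bucket has height 1, and the score sums log(1 / height) over dimensions. The function name `hbos_scores`, the default `n_bins=10`, the height normalization, and the toy data are assumptions of this sketch; dynamic-width buckets are not covered.

```python
import numpy as np

def hbos_scores(X, n_bins=10):
    """Histogram-based outlier score for each row of X (equal-width buckets)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    scores = np.zeros(n)
    for j in range(d):
        # Equal-width bucket edges over the value range of dimension j.
        edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
        # Bucket index per value; using inner edges puts the maximum in the last bucket.
        idx = np.digitize(X[:, j], edges[1:-1])
        counts = np.bincount(idx, minlength=n_bins)
        # Scale heights so the tallest bucket has height 1 (assumed normalization).
        heights = counts / counts.max()
        scores += np.log(1.0 / heights[idx])
    return scores

# Toy data (assumed): a 5x5 grid of values 0..4 plus one point at (20, 20), index 25.
X = np.array([[i % 5, i // 5] for i in range(25)] + [[20, 20]], dtype=float)
scores = hbos_scores(X, n_bins=10)
print(scores.round(3))
```

Because every dimension is handled independently, the cost grows linearly with the number of samples and dimensions, which is why the method is fast.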
AE(Auto Encoder)

Latent Representation

[1]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM Sigmod
Record, 29(2), pp. 427-438.
Isolation Forest

ICDM '08

[1]: Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining (ICDM), pp. 413-
422. IEEE
Isolation Forest

Anomaly Detection

[1]: Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining (ICDM), pp. 413-
422. IEEE
REFERENCE

[1]: Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." 2008 Eighth IEEE International
Conference on Data Mining. IEEE, 2008.
[2]: Eswaran, Dhivya, et al. "Spotlight: Detecting anomalies in streaming graphs." Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.
[3]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from
large data sets. ACM Sigmod Record, 29(2), pp. 427-438.
[4]: Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local
outliers. ACM Sigmod Record, 29(2), pp. 93-104.
[5]: Shyu, Mei-Ling, et al. A novel anomaly detection scheme based on principal component classifier.
MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING, 2003.
[6]: Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly
detection algorithm. In KI-2012: Poster and Demo Track, pp.59-63.
[7]: Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from
large data sets. ACM Sigmod Record, 29(2), pp. 427-438.
[8]: Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on
Data Mining (ICDM), pp. 413-422. IEEE
Introduction to
Anomaly Detection

Q&A

Linghao Chen
HOMEPAGE: https://lhchen.top
lhchen@stu.xidian.edu.cn
School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, P.R. China
