
Unit 5 - Lecture 2 - Statistical - Methods - Mining - Techniques

Chapter 12 discusses outlier analysis, including definitions, challenges, and detection methods such as supervised, unsupervised, and semi-supervised approaches. It highlights clustering-based and statistical techniques for identifying outliers in data sets, with applications in fraud detection and other fields. The chapter also covers specific methods such as Grubbs’ test for univariate outliers and the Mahalanobis distance for multivariate outliers.

Uploaded by

julybabies2804


1

Data Mining & Analytics

— Unit 5 —
— (Chapter 12) —

Outliers: Introduction, Challenges & Detection Methods

2
Chapter 12. Outlier Analysis

• Outlier and Outlier Analysis
• Outlier Detection Methods
• Statistical Approaches
• Proximity-Based Approaches
• Clustering-Based Approaches
• Classification Approaches

3
Outlier Detection Methods

4
Outlier Detection Methods: Supervised Methods

5
Outlier Detection Methods: Supervised Methods
• Supervised methods model data normality and abnormality.
• Domain experts examine and label a sample of the underlying data.
• Outlier detection can then be modeled as a classification problem.
• The sample is used for training and testing.
• In some applications, the experts may label just the normal objects, and any other objects not matching the model of normal objects are reported as outliers.
• Other methods model the outliers and treat objects not matching the model of outliers as normal.
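Treating outlier detection as a classification problem can be sketched with any classifier. The minimal example below assumes a tiny, hypothetical expert-labeled sample of transactions with two made-up features (amount, transactions per day) and uses a nearest-centroid classifier as a deliberately simple stand-in; part of the labeled sample trains the model and the rest tests it.

```python
import math

def centroid(points):
    """Component-wise mean of equal-length feature tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def train(labeled):
    """labeled: list of (features, label) pairs labeled by domain experts.
    Returns one centroid per class, i.e. a nearest-centroid classifier."""
    by_class = {}
    for features, label in labeled:
        by_class.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_class.items()}

def predict(model, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(features, model[label]))

# Hypothetical labeled sample: (amount in $, transactions per day) -> label
train_part = [((20.0, 2.0), "normal"), ((35.0, 3.0), "normal"), ((25.0, 1.0), "normal"),
              ((900.0, 15.0), "outlier"), ((1200.0, 20.0), "outlier")]
test_part = [((30.0, 2.0), "normal"), ((1000.0, 18.0), "outlier")]

model = train(train_part)
accuracy = sum(predict(model, x) == y for x, y in test_part) / len(test_part)
print(accuracy)  # → 1.0
```

In practice, features on different scales should be standardized first, and because outliers are rare, the class imbalance usually calls for a stronger classifier and for judging it by recall on the outlier class rather than plain accuracy.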
6
Outlier Detection Methods: Unsupervised Methods
• In some application scenarios, objects labeled as “normal” or “outlier” are not available. Thus, an unsupervised learning method has to be used.
• Many clustering methods can be adapted to act as unsupervised outlier detection methods.
• The central idea is to find clusters first; the data objects not belonging to any cluster are then detected as outliers.
• However, such methods suffer from two issues:
  • First, a data object not belonging to any cluster may be noise instead of an outlier.
  • Second, it is often costly to find clusters first and then find outliers.

7
Outlier Detection Methods: Unsupervised Methods
Clustering (DBSCAN, K-Means):
• DBSCAN (density-based clustering): finds points that do not belong to any dense cluster.
• K-Means: points far from their cluster centroids can be outliers.
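The DBSCAN idea of "points that do not belong to any dense cluster" reduces to a labeling rule, sketched below under stated assumptions: a point is a core point if at least `min_pts` points (itself included) lie within `eps`; a non-core point within `eps` of a core point is a border point; everything else is noise and becomes an outlier candidate. The points and the `eps`/`min_pts` values are hypothetical.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Return indices of points that DBSCAN would label as noise."""
    # eps-neighborhood of every point (each point is its own neighbor)
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    # Noise: not a core point and not within eps of any core point (not a border point)
    return [i for i, nb in enumerate(neighbors)
            if i not in core and not any(j in core for j in nb)]

pts = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1), (1.0, 0.9),   # dense group A
       (5.0, 5.0), (5.1, 5.1), (4.9, 5.0), (5.0, 4.9),   # dense group B
       (9.0, 0.5)]                                         # isolated point
print(dbscan_noise(pts, eps=0.5, min_pts=3))  # → [8]
```

This version takes O(n²) time and only reproduces the noise rule. scikit-learn's `sklearn.cluster.DBSCAN` marks the same kind of points with the label `-1` in its `labels_` attribute.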

8
Outlier Detection Methods: Unsupervised Methods
Use Case: Detecting Fraudulent Credit Card Transactions That Don’t Fit into Usual Spending Clusters

Approach:
Clustering-Based Outlier Detection (Unsupervised)

When labeled fraudulent transactions are not available, we can use clustering to find usual spending patterns (clusters) and identify transactions that do not fit well into any cluster as potential fraud.

9
Outlier Detection Methods: Unsupervised Methods
Clustering-Based Outlier Detection (Unsupervised)
1. Collect and Prepare Data
Include features such as:
• Transaction amount
• Time of transaction
• Merchant category
• Location
• Frequency of spending

10
Outlier Detection Methods: Unsupervised Methods
2. Apply Clustering Algorithm
Use K-Means or DBSCAN to group similar transactions.

• K-Means:
  • Find the distance of each transaction to its assigned cluster center.
  • Transactions with large distances from the cluster centroid may be fraudulent.

• DBSCAN:
  • Identifies dense clusters of normal behavior.
  • Any transaction not assigned to a cluster (labeled as noise) is a potential outlier (possible fraud).
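The K-Means variant of step 2 can be sketched as follows. The transactions are hypothetical (amount in $, hour of day), seeding the centroids with the first k points is a simplification, and the cutoff "more than twice the mean distance within the same cluster" is an illustrative assumption rather than a rule from the slide.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; real implementations use k-means++ seeding and restarts."""
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return [nearest(p, centroids) for p in points], centroids

def flag_far_points(points, k, factor=2.0):
    """Flag points farther from their centroid than `factor` x the cluster's mean distance."""
    labels, centroids = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        mean_d = sum(dists[i] for i in idx) / len(idx)
        flagged.extend(i for i in idx if dists[i] > factor * mean_d)
    return sorted(flagged)

# Hypothetical transactions: (amount in $, hour of day)
tx = [(20, 12), (200, 20), (25, 13), (205, 21), (22, 11),
      (198, 19), (18, 12), (202, 20), (420, 3)]
print(flag_far_points(tx, k=2))  # → [8]
```

The features are left unscaled here, so the amount dominates the distance; real transaction features would normally be standardized first. Note also that an extreme point can pull a centroid toward itself or even capture a cluster of its own, which is one reason a density-based method such as DBSCAN is often preferred.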

11
Outlier Detection Methods: Unsupervised Methods

Example Scenario

12
Outlier Detection Methods: Semi-Supervised Methods
• Only normal data is available during training, and the model identifies deviations from it.
• Examples: manufacturing defects, disease detection.
Techniques:
1. One-Class SVM: trains only on normal data and flags anything that deviates.
   Example: training on non-fraudulent credit card transactions and flagging any unusual activity.
2. Autoencoders (neural networks): a model is trained to reconstruct normal data; outliers have high reconstruction error.
   Example: identifying faults in industrial machines by training on normal vibration patterns.
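The "train only on normal data, flag deviations" setting can be illustrated without an SVM or a neural network. The sketch below swaps in a much simpler one-class nearest-neighbor detector: it memorizes the normal readings, sets a threshold equal to the largest nearest-neighbor distance inside the normal set, and flags any new reading farther than that from every normal reading. The vibration readings (RMS amplitude, dominant frequency) are hypothetical.

```python
import math

def fit_one_class_nn(normal):
    """'Train' on normal data only: keep the points plus a distance threshold equal to
    the largest nearest-neighbor distance found inside the normal set itself."""
    threshold = max(
        min(math.dist(p, q) for j, q in enumerate(normal) if j != i)
        for i, p in enumerate(normal)
    )
    return normal, threshold

def is_novel(model, x):
    """Flag x if it is farther from every normal training point than the threshold."""
    normal, threshold = model
    return min(math.dist(x, q) for q in normal) > threshold

# Hypothetical normal vibration readings: (RMS amplitude, dominant frequency in Hz)
normal_readings = [(1.0, 50.0), (1.1, 51.0), (0.9, 49.5), (1.05, 50.5), (0.95, 50.2)]
model = fit_one_class_nn(normal_readings)
print(is_novel(model, (1.02, 50.3)))  # → False
print(is_novel(model, (3.0, 62.0)))   # → True
```

A One-Class SVM or an autoencoder learns a smoother boundary or a reconstruction of normal data instead of memorizing points, but the workflow is the same: fit on normal data, then score new data by how badly it fits.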

13
Outlier Detection Methods: Semi-Supervised Methods
• In many applications, although obtaining some labeled examples is feasible, the number of such labeled examples is often small. We may encounter cases where only a small set of the normal and/or outlier objects are labeled, but most of the data are unlabeled.
• Semi-supervised outlier detection methods were developed to tackle such scenarios.
• When some labeled normal objects are available, we can use them, together with unlabeled objects that are close by, to train a model for normal objects.
• The model of normal objects can then be used to detect outliers: objects not fitting the model of normal objects are classified as outliers.
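Using labeled normal objects "together with unlabeled objects that are close by" can be sketched as a simple expansion loop. Here the "model of normal objects" is just the grown set of normal points plus a closeness radius; the radius and all data points are hypothetical, and real methods would fit a proper model to the expanded set.

```python
import math

def expand_normal_set(labeled_normal, unlabeled, radius):
    """Start from expert-labeled normal objects and repeatedly absorb unlabeled
    objects lying within `radius` of any object already considered normal.
    Whatever is never absorbed does not fit the normal model."""
    normal = list(labeled_normal)
    remaining = list(unlabeled)
    changed = True
    while changed:
        changed = False
        for obj in list(remaining):
            if any(math.dist(obj, n) <= radius for n in normal):
                normal.append(obj)
                remaining.remove(obj)
                changed = True
    return normal, remaining

labeled = [(0.0, 0.0), (0.5, 0.2)]                             # hypothetical labeled normals
unlabeled = [(1.0, 0.4), (1.4, 0.7), (0.2, -0.3), (6.0, 6.0)]  # hypothetical unlabeled data
normal, outliers = expand_normal_set(labeled, unlabeled, radius=0.6)
print(outliers)  # → [(6.0, 6.0)]
```

Note that (1.4, 0.7) is only reachable through (1.0, 0.4), so the loop keeps passing over the data until nothing new is absorbed.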

14
Outlier Detection Methods

(1) Statistical Methods
(2) Proximity-Based Methods
(3) Clustering-Based Methods

15
Outlier Detection (2): Proximity-Based Methods

16
Outlier Detection (3): Clustering-Based Methods

17
Chapter 12. Outlier Analysis
• Outlier and Outlier Analysis
• Outlier Detection Methods
• Statistical Approaches
• Proximity-Based Approaches
• Clustering-Based Approaches
• Classification Approaches

18
Outlier Detection (1): Statistical Methods

19
Outlier Detection (1): Statistical Methods

• If the heights of adult males in a city follow a normal distribution with a mean of 170 cm and most people are between 160 and 180 cm, a man who is 210 cm tall is very rare.
• The model says, "This doesn’t fit the expected pattern," so it's flagged.
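The height example can be made concrete with a z-score. The slide gives only the mean (170 cm) and the usual range (160 to 180 cm), so the standard deviation of 5 cm below is an assumption (reading that range as mean ± 2 standard deviations), and the cutoff of 3 standard deviations is a common convention rather than part of the slide.

```python
import math

MEAN_CM = 170.0
SD_CM = 5.0  # assumption: "most people between 160 and 180 cm" read as mean +/- 2 SD

def z_score(height_cm):
    return (height_cm - MEAN_CM) / SD_CM

def is_outlier(height_cm, threshold=3.0):
    """Flag heights more than `threshold` standard deviations from the mean."""
    return abs(z_score(height_cm)) > threshold

def upper_tail_probability(height_cm):
    """P(X >= height) under the assumed normal model; erfc keeps tiny tails accurate."""
    return 0.5 * math.erfc(z_score(height_cm) / math.sqrt(2))

print(z_score(210))                       # → 8.0
print(is_outlier(210), is_outlier(178))   # → True False
print(upper_tail_probability(210))        # about 6.2e-16 under the assumed model
```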

20
Statistical Approaches

21
Parametric Methods I: Detection of Univariate Outliers Based on Normal Distribution

22
Parametric Methods I: Grubbs’ Test

Detects one outlier at a time in a normally distributed dataset.

23
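Grubbs’ Test (contd.)

For each object x in a data set, compute the z-score z = |x − x̄| / s, where x̄ is the mean and s is the standard deviation of the data. The object x is an outlier if

$$z \ge \frac{N-1}{\sqrt{N}} \sqrt{\frac{t^2_{\alpha/(2N),\,N-2}}{N-2+t^2_{\alpha/(2N),\,N-2}}}$$

where $t^2_{\alpha/(2N),\,N-2}$ is the value taken by a t-distribution (with N − 2 degrees of freedom) at a significance level of α/(2N), and N is the number of objects in the data set.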
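One round of Grubbs' test can be sketched with the standard library. Python's standard library has no t-distribution quantile, so the critical t value is passed in: 3.833 is the tabulated upper 0.0025 point of a t-distribution with 8 degrees of freedom, matching α = 0.05 and N = 10 for the hypothetical data below. For other N or α it could be computed, for example, with SciPy as `scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2)`.

```python
import math
import statistics

def grubbs_statistic(data):
    """Return (index, G) for the value farthest from the mean, G = |x - mean| / s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (N - 1 denominator)
    idx = max(range(len(data)), key=lambda i: abs(data[i] - mean))
    return idx, abs(data[idx] - mean) / s

def grubbs_critical(n, t):
    """((N - 1) / sqrt(N)) * sqrt(t^2 / (N - 2 + t^2)), with t = t_{alpha/(2N), N-2}."""
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]   # hypothetical measurements
idx, g = grubbs_statistic(data)
g_crit = grubbs_critical(len(data), t=3.833)    # alpha = 0.05, N = 10
print(data[idx], round(g, 3), round(g_crit, 3), g >= g_crit)
```

Because the test detects one outlier at a time, a second round would remove 30, recompute the statistic on the remaining N − 1 values, and look up a new critical t for the smaller N.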
24
Parametric Methods II: Detection of Multivariate Outliers

25
Parametric Methods II: Detection of Multivariate Outliers (contd.)

1. Calculate the mean vector ō from the multivariate data set.
2. For each object o, calculate MDist(o, ō), the Mahalanobis distance from o to ō, where $\mathrm{MDist}(o, \bar{o}) = (o - \bar{o})^{T} S^{-1} (o - \bar{o})$ and S is the covariance matrix.
3. Detect outliers in the transformed univariate data set {MDist(o, ō) | o ∈ D}.
4. If MDist(o, ō) is determined to be an outlier, then o is regarded as an outlier as well.
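The four steps can be sketched for two-dimensional data, where the 2 × 2 covariance matrix has a closed-form inverse. For step 3, this sketch compares each MDist value with a chi-square cutoff: for roughly Gaussian 2-D data the quadratic form is approximately chi-square with 2 degrees of freedom, whose upper-α point is exactly −2 ln α. That cutoff is an assumption; applying Grubbs’ test to the MDist values would be another option. The data set is hypothetical.

```python
import math

def mean_vector(data):
    n = len(data)
    return (sum(p[0] for p in data) / n, sum(p[1] for p in data) / n)

def covariance_2d(data, m):
    """Sample covariance entries (sxx, sxy, syy) with an N - 1 denominator."""
    n = len(data)
    sxx = sum((x - m[0]) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - m[1]) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - m[0]) * (y - m[1]) for x, y in data) / (n - 1)
    return sxx, sxy, syy

def mdist(p, m, cov):
    """(o - o_bar)^T S^{-1} (o - o_bar) via the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - m[0], p[1] - m[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def mahalanobis_outliers(data, alpha=0.01):
    m = mean_vector(data)                       # step 1
    cov = covariance_2d(data, m)
    scores = [mdist(p, m, cov) for p in data]   # step 2: univariate data set
    cutoff = -2 * math.log(alpha)               # step 3: chi-square(2) upper-alpha point
    return [i for i, s in enumerate(scores) if s > cutoff]  # step 4

grid = [(float(x), float(y)) for x in (-1, 0, 1) for y in (-1, 0, 1)]
data = grid + grid + [(0.0, 0.0), (0.0, 0.0), (6.0, -6.0)]
print(mahalanobis_outliers(data))  # → [20]
```

The mean vector and covariance matrix are estimated from data that include the outliers, which pulls both toward them and can mask outliers in small samples; robust estimates of location and scatter reduce this effect.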

26
Parametric Methods II: Detection of Multivariate Outliers (contd.)

Using the χ²-statistic:

$$\chi^2 = \sum_{i=1}^{n} \frac{(o_i - E_i)^2}{E_i}$$

• o_i is the value of o on the i-th dimension.
• E_i is the mean of the i-th dimension among all objects, and n is the dimensionality.
• If the χ²-statistic is large, then object o is an outlier.
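The χ²-statistic translates directly into code. This sketch assumes every dimension has a positive mean (E_i appears in a denominator), as with counts or positive measurements; the objects are hypothetical, and since the slide gives no fixed cutoff for "large", the scores are simply ranked.

```python
def chi_square_scores(data):
    """chi^2(o) = sum_i (o_i - E_i)^2 / E_i, with E_i the mean of dimension i."""
    n_dims = len(data[0])
    means = [sum(o[i] for o in data) / len(data) for i in range(n_dims)]
    return [sum((o[i] - means[i]) ** 2 / means[i] for i in range(n_dims)) for o in data]

objects = [(2, 4), (3, 5), (2, 6), (3, 5), (10, 20)]
scores = chi_square_scores(objects)
print(scores)  # → [3.0, 1.375, 1.5, 1.375, 27.0]
most_suspicious = max(range(len(objects)), key=scores.__getitem__)
print(objects[most_suspicious])  # → (10, 20)
```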

27
Parametric Methods III: Using Mixture of Parametric Distributions

28
Non-Parametric Methods: Detection Using Histogram
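The histogram approach can be sketched as follows: build a histogram of the data without assuming any distribution, then treat values that fall into sparsely populated bins as outliers. Equal-width bins, the number of bins, and the 10% frequency cutoff are illustrative assumptions, and results depend heavily on the bin size; the amounts are hypothetical.

```python
def histogram_outliers(values, n_bins=5, min_fraction=0.1):
    """Equal-width histogram; values in bins holding less than `min_fraction`
    of all values are reported as outliers."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    return [v for v, b in zip(values, bins) if counts[b] / len(values) < min_fraction]

# Hypothetical transaction amounts
amounts = [10, 12, 11, 13, 12, 11, 10, 14, 12, 13, 95]
print(histogram_outliers(amounts))  # → [95]
```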

29
Other Methodologies of Data Mining

• Statistical Data Mining

30
Major Statistical Data Mining Methods

31
Statistical Data Mining (1)

32
33
Scientific and Statistical Data Mining (2)

34
Generalized linear models
Linear models

35
Mixed-effect models
• When there are multiple levels, such as patients seen by the same doctor, the variability in the outcome can be thought of as being either within group or between group.
• Patient-level observations are not independent, because patients of the same doctor tend to be more similar to one another.
• Units sampled at the highest level (in our example, doctors) are independent. The figure below shows a sample where the dots are patients within doctors, the larger circles.
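To make the within-group versus between-group idea concrete, the sketch below splits the total sum of squares of a hypothetical patient outcome into a between-doctor part and a within-doctor part. It only illustrates this variance decomposition; it does not fit a mixed-effect model, which would normally be done with a statistics package.

```python
from statistics import mean

def variance_split(groups):
    """groups: dict doctor -> list of patient outcomes.
    Returns (between_ss, within_ss, total_ss); total = between + within."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    between = sum(len(vals) * (mean(vals) - grand) ** 2 for vals in groups.values())
    within = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
    total = sum((v - grand) ** 2 for v in all_values)
    return between, within, total

# Hypothetical outcomes for three patients per doctor
outcomes = {"doctor_A": [5.0, 6.0, 7.0], "doctor_B": [9.0, 10.0, 11.0], "doctor_C": [1.0, 2.0, 3.0]}
between, within, total = variance_split(outcomes)
print(between, within, total)     # → 96.0 6.0 102.0
print(round(between / total, 2))  # → 0.94
```

Here most of the variability lies between doctors, so patients of the same doctor are much more alike than patients of different doctors, which is exactly why patient-level observations cannot be treated as independent.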

36
Scientific and Statistical Data Mining (3)

37
Statistical Data Mining (4)

www.spss.com/datamine/factor.htm
38
Discriminant analysis

39
Statistical Data Mining (5)

40
Thank You!!!
