0% found this document useful (0 votes)

41 views10 pages

Data Minning Unit 4-1

The document discusses different types of outliers including global outliers, collective outliers, and contextual outliers. It also discusses challenges of outlier detection such as scalability, high dimensionality, and interpretability. Statistical approaches for outlier detection include standard deviation, interquartile range, box plots, z-scores, and distance-based methods.

Uploaded by

yadavchilki

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

41 views10 pages

Data Minning Unit 4-1

Uploaded by

yadavchilki

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

DATA MINNING UNIT 4 :

Outlier is a data object that deviates significantly from the rest of the data objects and
behaves in a different manner. They can be caused by measurement or execution errors.
The analysis of outlier data is referred to as outlier analysis or outlier mining.

An outlier cannot be termed as a noise or error. Instead, they are suspected of not being
generated by the same method as the rest of the data objects.

Outliers are of three types, namely –

1. Global (or Point) Outliers

2. Collective Outliers
3. Contextual (or Conditional) Outliers

1. Global Outliers

1. Definition: Global outliers are data points that deviate significantly from the overall
distribution of a dataset.
2. Causes: Errors in data collection, measurement errors, or truly unusual events can
result in global outliers.
3. Impact: Global outliers can distort data analysis results and affect machine learning
model performance.
4. Detection: Techniques include statistical methods (e.g., z-score, Mahalanobis
distance), machine learning algorithms (e.g., isolation forest, one-class SVM), and data
visualization techniques.
5. Handling: Options may include removing or correcting outliers, transforming data, or
using robust methods.
6. Considerations: Carefully considering the impact of global outliers is crucial for
accurate data analysis and machine learning model outcomes.

The red data point is a global outlier.

2. Collective Outliers

1. Definition: Collective outliers are groups of data points that collectively deviate
significantly from the overall distribution of a dataset.
2. Characteristics: Collective outliers may not be outliers when considered individually,
but as a group, they exhibit unusual behavior.
3. Detection: Techniques for detecting collective outliers include clustering algorithms,
density-based methods, and subspace-based approaches.
4 Impact: Collective outliers can represent interesting patterns or anomalies in data that
may require special attention or further investigation.
5. Handling: Handling collective outliers depends on the specific use case and may
involve further analysis of the group behavior, identification of contributing factors, or
considering contextual information.
6. Considerations: Detecting and interpreting collective outliers can be more complex
than individual outliers, as the focus is on group behavior rather than individual data
points. Proper understanding of the data context and domain knowledge is crucial for
effective handling of collective outliers.

The red data points as a whole are collective outliers.

3. Contextual Outliers

1. Definition: Contextual outliers are data points that deviate significantly from the
expected behavior within a specific context or subgroup.
2. Characteristics: Contextual outliers may not be outliers when considered in the entire
dataset, but they exhibit unusual behavior within a specific context or subgroup.
3. Detection: Techniques for detecting contextual outliers include contextual clustering,
contextual anomaly detection, and context-aware machine learning approaches.
4. Contextual Information: Contextual information such as time, location, or other
relevant factors are crucial in identifying contextual outliers.
5. Impact: Contextual outliers can represent unusual or anomalous behavior within a
specific context, which may require further investigation or attention.
6. Handling: Handling contextual outliers may involve considering the contextual
information, contextual normalization or transformation of data, or using context-specific
models or algorithms.
7. Considerations: Proper understanding of the context and domain-specific knowledge
is crucial for accurate detection and interpretation of contextual outliers, as they may vary
based on the specific context or subgroup being considered.

A low temperature value in June is a contextual outlier because the

same value in December is not an outlier.

Outlier detection challenges :

Outlier detection poses several challenges due to the complexity and variability of real-world datasets.
Some of the key challenges include:

4. Scalability: Outlier detection algorithms must be able to handle large-scale datasets efficiently. As
dataset sizes increase, the computational complexity of outlier detection algorithms can become a
bottleneck, requiring scalable and parallelizable approaches to maintain acceptable performance.
5. High Dimensionality: Many real-world datasets have high dimensionality, meaning they contain
a large number of features or attributes. In high-dimensional spaces, the notion of distance and
similarity becomes less intuitive, making it challenging to define and detect outliers accurately.
6. Data Quality: Outlier detection algorithms are sensitive to noise and errors in the data. Noisy or
corrupted data can lead to false positives (normal data incorrectly classified as outliers) or false
negatives (outliers missed by the algorithm), reducing the effectiveness of outlier detection
methods.
7. Complex Data Patterns: Outliers may not always exhibit simple patterns or deviations from the
norm. In some cases, outliers may be part of complex data patterns or clusters, making them
difficult to detect using traditional statistical methods or distance-based approaches.
8. Imbalanced Data: In datasets where outliers are rare compared to normal data points,
imbalanced class distributions can pose a challenge for outlier detection algorithms. Traditional
statistical methods may struggle to distinguish outliers from the majority class, leading to biased
results.
9. Concept Drift: In dynamic or evolving environments, the underlying data distribution may change
over time, leading to concept drift. Outlier detection models trained on historical data may become
less effective when applied to new data, requiring continuous monitoring and adaptation to detect
emerging outliers.
10. Interpretability: While outlier detection algorithms can effectively identify anomalies in the data,
interpreting the reasons behind outliers and understanding their significance can be challenging.
Domain knowledge and contextual information are often necessary to interpret the implications of
detected outliers accurately.
11. Computational Cost: Some outlier detection algorithms, especially those based on complex
models or iterative optimization techniques, can be computationally expensive. Balancing the
trade-off between computational cost and detection accuracy is essential, especially for real-time
or resource-constrained applications.

statistical approaches for outlier detection:

In data mining, outlier detection is a crucial step in understanding and cleaning datasets. There are
several statistical approaches for outlier detection, each with its own strengths and weaknesses. Here
are some commonly used statistical methods:

12. Standard Deviation Method:

• Outliers are often defined as data points that lie beyond a certain number of standard deviations from
the mean.
• Data points that fall more than a certain number of standard deviations away from the mean are
considered outliers.
13. Interquartile Range (IQR) Method:
• The interquartile range is the range between the first quartile (25th percentile) and the third quartile
(75th percentile) of the data.
• Outliers are identified as points that fall below the first quartile minus a specified multiplier times the
IQR or above the third quartile plus the same multiplier times the IQR.
14. Box Plot Method:
• Box plots visually represent the distribution of data based on quartiles.
• Outliers are identified as points that fall outside of the whiskers of the box plot, typically defined as 1.5
times the IQR.
15. Z-Score:
• Z-score measures how many standard deviations a data point is from the mean.
• Outliers are identified as data points with an absolute Z-score greater than a threshold (often 2 or 3).
16. Density-Based Methods:
• These methods identify outliers based on the density of data points in a neighborhood.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular density-based
clustering algorithm that can be used for outlier detection.
17. Robust Statistical Methods:
• These methods are less sensitive to outliers compared to traditional statistical methods.
• Examples include robust regression techniques like RANSAC (RANdom SAmple Consensus) and robust
estimation of covariance matrices.
18. Distance-Based Methods:
• These methods compute the distance between data points and identify outliers based on unusually
large distances.
• For example, the k-nearest neighbors algorithm can be used to detect outliers based on the distance to
the kth nearest neighbor.

Proximity-Based Methods in Data Mining

Proximity-based methods are an important technique in data mining. They are employed to

find patterns in large databases by scanning documents for certain keywords and phrases.

They are highly prevalent because they do not require expensive hardware or much storage

space, and they scale up efficiently as the size of databases increases.

Advantages of Proximity-Based Methods:

1. Proximity-based methods make use of machine learning techniques, in which algorithms

are trained to respond to certain patterns.
2. Using a random sample of documents, the machine learning algorithm analyzes the
keywords and phrases used in them and makes predictions about the probability that
these words appear together across all documents.
3. Proximity can be calculated by calculating a similarity score between two collections of
training data and then comparing these scores. The algorithm then tries to compute the
maximum similarity score for two distinct sets of training items.

Disadvantages of Proximity-Based Methods:

1. Important words may not be as close in proximity as we expected.

2. Over-segmentation of documents into phrases. To counter these problems, a lexical chain-
based algorithm has been proposed.
Proximity-based methods perform very well for finding sets of documents that contain certain
words based on background knowledge. But performance is limited when the background
knowledge has not been pre-classified into categories.
To find sets of documents containing certain categories, one must assign categorical values to
each document and then run proximity-based methods on these documents as training data,
hoping for accurate representations of the categories.
One way to identify outliers is by calculating their distance from the rest of the data set in is
known as density-based outlier detection.
Types of Proximity-Based Outlier Detection Methods:
• Distance-based outlier detection methods: A distance-based outlier detection method is
a statistical technique. Such methods typically measure distances between individual data
points and the rest of their respective groups. Many approaches also have a configurable
error threshold for determining when a point is an outlier. Many distance-based outliers
methods have been developed. The methods use distance statistics such as Euclidean,
Manhattan, or Mahalanobis distance for calculating distances between individual points and
to detect outliers. The following three outlier detection methods have been selected based
on their performance:
• WLSMV (Weighted Least Squares Minimization) method
• SVM (Support Vector Machines) method,
• RMSProp method.
• Density-based Outlier detection methods: A density-based outlier detection method is
used for checking the density of an entity object and its closest objects. Key applications of
this method are used in many applications including Malware Detection, Awareness,
Behavior Analysis, and Network Intrusion Detection. There are some limitations to density-
based outlier detection methods that are effective until it is determined that the outliers
being detected are not necessarily outliers but just a part of a much larger distribution of
data. A limitation with using density-based outlier detection methods is that the density
function must be defined and clearly understood before implementation and the proper
value set.

Clustering-based approaches for outlier detection:

Clustering-based approaches for outlier detection in data mining involve leveraging
clustering algorithms to group data points into clusters and then identifying outliers as data
points that do not belong to any cluster or form clusters of their own. These approaches
utilize the notion that outliers often lie in sparsely populated regions of the data or exhibit
dissimilarities with the majority of the data points. Here are some common clustering-based
methods for outlier detection:

1. Density-Based Outlier Detection:

- **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**:
DBSCAN groups together densely connected data points into clusters and identifies outliers
as noise points that do not belong to any cluster. Outliers are typically located in regions
with low density or insufficient neighbors to form a cluster.

2. Distance-Based Outlier Detection:

- **K-Means Clustering**: K-Means partitions the dataset into K clusters based on the
similarity of data points to the cluster centroids. Outliers are identified as data points that
are not assigned to any cluster or have large distances to the nearest cluster centroid.
- **Hierarchical Clustering**: Hierarchical clustering builds a tree-like hierarchy of
clusters by recursively merging or splitting clusters based on their similarity. Outliers can
be identified as data points that form singleton clusters or lie in separate branches of the
hierarchical tree.

3. Graph-Based Outlier Detection:

- **Minimum Spanning Tree (MST)**: MST constructs a tree that connects all data
points with minimum total edge weights. Outliers are identified as data points with high
edge weights or long distances to the rest of the tree.
- **Isolation Forest**: Isolation Forest builds an ensemble of decision trees to isolate
outliers by recursively partitioning the dataset into subsets. Outliers are identified as data
points that require fewer splits to isolate them from the rest of the dataset.

4. Subspace Clustering-Based Outlier Detection:

- **CLIQUE (CLustering In QUEst)**: CLIQUE identifies clusters in subspaces of the data
that exhibit high density. Outliers are detected as data points that do not belong to any
dense subspace or have low membership probabilities in any subspace cluster.

5. Cluster-Based Outlier Factor:

- **COF (Cluster-Based Outlier Factor)**: COF measures the degree to which a data
point deviates from its cluster members in terms of distance. Outliers are identified as data
points with high COF scores, indicating significant deviations from the cluster center.

6. Probabilistic Model-Based Outlier Detection:

- **EM Clustering (Expectation-Maximization)**: EM clustering models the dataset as
a mixture of multivariate Gaussian distributions. Outliers are identified as data points with
low posterior probabilities of belonging to any cluster or having low likelihoods under the
mixture model.

These clustering-based approaches provide effective methods for outlier detection in

various types of datasets and can be adapted to different domains and applications. By
leveraging clustering algorithms and analyzing the clustering structure of the data, these
methods can effectively identify anomalies and outliers that deviate from the majority of
the data points.

Classification-based approaches for outlier detection:

Classification-based approaches for outlier detection in data mining involve training a
supervised learning model to classify data points as either normal or outlier. These approaches
require labeled training data, where outliers are explicitly identified or labeled, and then use
this labeled data to train a classifier. Here's how classification-based outlier detection works:

1. **Labeling Data**: The first step in classification-based outlier detection is to label the data.
This typically involves identifying outliers in the dataset and assigning them a label (e.g., 1 for
outliers, 0 for normal data points). The labeled dataset is then used for training the classifier.

2. **Feature Extraction and Selection**: Next, features are extracted from the data and
selected based on their relevance to outlier detection. Feature engineering techniques may be
applied to transform or combine raw features to improve the performance of the classifier.

3. Training a Classifier: A supervised learning classifier, such as logistic regression,

decision trees, random forests, support vector machines (SVM), or neural networks, is trained
using the labeled dataset. The classifier learns to distinguish between normal data points and
outliers based on the features extracted from the data.

4. **Model Evaluation**: The trained classifier is evaluated using evaluation metrics such as
accuracy, precision, recall, F1-score, or area under the ROC curve (AUC). Cross-validation
techniques may be used to assess the generalization performance of the model and identify
potential overfitting.

5. **Predicting Outliers**: Once the classifier is trained and evaluated, it can be used to
predict outliers in new, unseen data. Data points that are classified as outliers by the classifier
are flagged as anomalous and may require further investigation or monitoring.

6. **Model Tuning and Optimization**: The performance of the classifier may be further
improved by tuning hyperparameters, optimizing feature selection, or using ensemble
techniques to combine multiple classifiers. Iterative refinement may be performed to enhance
the robustness and accuracy of the outlier detection model.

Classification-based outlier detection offers several advantages, including the ability to

leverage labeled data for training, the flexibility to handle different types of features and data
distributions, and the potential for high accuracy and interpretability. However, it also has
limitations, such as the reliance on labeled training data, the need for feature engineering, and
the risk of overfitting to the training data. Careful consideration of these factors is essential
when applying classification-based approaches to outlier detection in real-world applications.

Detecting outliers in multidimensional data:

Detecting outliers in multidimensional data (also known as multivariate outlier detection) requires
techniques that can handle multiple dimensions simultaneously. Here are some methods commonly used
for outlier detection in multidimensional data:

1. **Mahalanobis Distance**:
- Mahalanobis distance measures the distance of a data point from the centroid of the data distribution,
taking into account the covariance structure of the data.
- Points with a Mahalanobis distance exceeding a certain threshold are considered outliers.

2. Principal Component Analysis (PCA):

- PCA is a dimensionality reduction technique that can be used for outlier detection by projecting the
data onto a lower-dimensional subspace.
- Outliers can be identified as data points with large reconstruction errors or points that lie far from the
principal components.

3. **Isolation Forest**:
- Isolation Forest is an ensemble method that constructs isolation trees to isolate outliers.
- Outliers are identified as data points that require fewer splits to isolate them from the rest of the data.

4. Local Outlier Factor (LOF):

- LOF measures the density of data points relative to their neighbors, identifying points with
significantly lower density as outliers.
- It considers the local neighborhood of each data point to detect outliers.

5. **One-Class SVM**:
- One-Class Support Vector Machine (SVM) learns a decision boundary around the majority of the data
points and identifies outliers as points lying outside this boundary.
- It is particularly useful when only normal data is available for training.

6. **Clustering-Based Approaches**:
- Clustering algorithms such as k-means or DBSCAN can be used to cluster the data and identify outliers
as points that do not belong to any cluster or form singleton clusters.

7. Robust Covariance Estimation:

- Robust covariance estimation techniques, such as Minimum Covariance Determinant (MCD), estimate
the covariance matrix of the data while downweighting the influence of outliers.
- Outliers are detected based on their influence on the estimated covariance matrix.

8. **Probabilistic Models**:
- Probabilistic models such as Gaussian Mixture Models (GMMs) can be used to model the distribution
of the data and identify outliers as points with low probability under the model.

When applying these methods to multidimensional data, it's essential to consider the characteristics of
the data, such as its distribution, dimensionality, and the presence of correlation between variables.
Additionally, it's often beneficial to use multiple approaches in combination or to customize the method
to the specific properties of the dataset.

Krishnendu PCB-IT602B
No ratings yet
Krishnendu PCB-IT602B
11 pages
Outlier Analysis in Data Mining
No ratings yet
Outlier Analysis in Data Mining
5 pages
UNIT 4
No ratings yet
UNIT 4
17 pages
07 OUTLIER DETECTION
No ratings yet
07 OUTLIER DETECTION
54 pages
A Survey on Outlier Detection Methods
No ratings yet
A Survey on Outlier Detection Methods
4 pages
Data Mining:: Concepts and Techniques
No ratings yet
Data Mining:: Concepts and Techniques
44 pages
Outlier Detection
No ratings yet
Outlier Detection
22 pages
Handling Outliers
No ratings yet
Handling Outliers
6 pages
Outliers ML
No ratings yet
Outliers ML
14 pages
12Outlier
No ratings yet
12Outlier
16 pages
Outlier Detection Techniques
100% (2)
Outlier Detection Techniques
56 pages
Feature Engineering
No ratings yet
Feature Engineering
66 pages
Outlier Detection
No ratings yet
Outlier Detection
45 pages
Lec3. Outlier Analysis
No ratings yet
Lec3. Outlier Analysis
54 pages
LECTURE 12
No ratings yet
LECTURE 12
54 pages
Feature Engineering
No ratings yet
Feature Engineering
63 pages
Outlier detection
No ratings yet
Outlier detection
10 pages
ISAT 600 Progress Report 3
No ratings yet
ISAT 600 Progress Report 3
4 pages
Outliers
No ratings yet
Outliers
3 pages
Methods To Detect Different Types of Outliers: March 2016
No ratings yet
Methods To Detect Different Types of Outliers: March 2016
7 pages
Outlier
No ratings yet
Outlier
2 pages
12Outlier-1
No ratings yet
12Outlier-1
45 pages
Unit 5_Lecture 1_Outlier Detection
No ratings yet
Unit 5_Lecture 1_Outlier Detection
30 pages
Unit 5
No ratings yet
Unit 5
70 pages
Unit5
No ratings yet
Unit5
47 pages
Data Mining:: Concepts and Techniques
No ratings yet
Data Mining:: Concepts and Techniques
13 pages
ADII10 Analisa Outlier
No ratings yet
ADII10 Analisa Outlier
37 pages
Datamining_seminar
No ratings yet
Datamining_seminar
19 pages
741OutlierDetection
No ratings yet
741OutlierDetection
55 pages
Unit-5 Outlier Analysis
No ratings yet
Unit-5 Outlier Analysis
32 pages
17 dm2 Anomaly Detection 2022 23
No ratings yet
17 dm2 Anomaly Detection 2022 23
113 pages
Chapter 12. Outlier Analysis
No ratings yet
Chapter 12. Outlier Analysis
4 pages
12 Outlier
No ratings yet
12 Outlier
55 pages
12 Outlier
No ratings yet
12 Outlier
18 pages
Lecture 8 Data Prepration Techniques
No ratings yet
Lecture 8 Data Prepration Techniques
4 pages
4_Outliers_+Transformaations ML
No ratings yet
4_Outliers_+Transformaations ML
28 pages
Anomoly Detection - Ensemble - Classifiers
No ratings yet
Anomoly Detection - Ensemble - Classifiers
68 pages
Unit - 3: Big Data Analytics
No ratings yet
Unit - 3: Big Data Analytics
23 pages
Outlier Mining Techniques For Uncertain Data
No ratings yet
Outlier Mining Techniques For Uncertain Data
7 pages
Chapter 4 Part 2
No ratings yet
Chapter 4 Part 2
12 pages
Expt2
No ratings yet
Expt2
3 pages
What is Outlier
No ratings yet
What is Outlier
3 pages
ADS EXP 7
No ratings yet
ADS EXP 7
10 pages
Data Cleaning
No ratings yet
Data Cleaning
4 pages
Unit V Outlier 2
No ratings yet
Unit V Outlier 2
13 pages
Outlier or Anomaly Detection
No ratings yet
Outlier or Anomaly Detection
9 pages
Missing and Outlier
No ratings yet
Missing and Outlier
20 pages
A-fuzzy-proximity-relation-approach-for-outlier-detection-in_2021_Soft-Compu
No ratings yet
A-fuzzy-proximity-relation-approach-for-outlier-detection-in_2021_Soft-Compu
12 pages
Concepts of EDA, Outliers-Detection and Treatment
No ratings yet
Concepts of EDA, Outliers-Detection and Treatment
99 pages
Anomaly Detection and Outlier Analysis
No ratings yet
Anomaly Detection and Outlier Analysis
25 pages
Outlier Analysis
No ratings yet
Outlier Analysis
28 pages
Ppt Data Mining
No ratings yet
Ppt Data Mining
13 pages
Unit5-OutliersDetection
No ratings yet
Unit5-OutliersDetection
37 pages
Outlier Detection and Removal
No ratings yet
Outlier Detection and Removal
2 pages
Elastic Anomalies
No ratings yet
Elastic Anomalies
7 pages
Energy Conversion and Econom - 2023 - Patel - Taxonomy of Outlier Detection Methods For Power System Measurements
No ratings yet
Energy Conversion and Econom - 2023 - Patel - Taxonomy of Outlier Detection Methods For Power System Measurements
16 pages
Lecture 12 Outliers and Guidelines For Exercises
No ratings yet
Lecture 12 Outliers and Guidelines For Exercises
6 pages
Studentdata 1
No ratings yet
Studentdata 1
89 pages
Data Mining - Outlier Analysis
100% (3)
Data Mining - Outlier Analysis
11 pages
Introduction to Robotics
From Everand
Introduction to Robotics
Swarnalata Verma
No ratings yet
ML Probable Questions 2026 – أسئلة محتملة لامتحان تعلم الآلة 2026 ??
No ratings yet
ML Probable Questions 2026 – أسئلة محتملة لامتحان تعلم الآلة 2026 ??
2 pages
Control Chart Example: Date Meas'ent Mean UCL LCL
No ratings yet
Control Chart Example: Date Meas'ent Mean UCL LCL
5 pages
AP Psych Myers Research - Methods PDF
No ratings yet
AP Psych Myers Research - Methods PDF
200 pages
MStat PSB 2018 PDF
No ratings yet
MStat PSB 2018 PDF
2 pages
Task Complexity As A Moderator of Goal Effects: A Meta-Analysis
No ratings yet
Task Complexity As A Moderator of Goal Effects: A Meta-Analysis
12 pages
Sampling Techniques: Advantages and Disadvantages
No ratings yet
Sampling Techniques: Advantages and Disadvantages
1 page
James Steiger R For MultipleRegressionIntro
No ratings yet
James Steiger R For MultipleRegressionIntro
54 pages
Coefficient of Skewness (Grouped Data)
No ratings yet
Coefficient of Skewness (Grouped Data)
4 pages
ECON 322 ECONOMETRICS 11 - Kabarak University
No ratings yet
ECON 322 ECONOMETRICS 11 - Kabarak University
6 pages
Midterm Prep
No ratings yet
Midterm Prep
2 pages
Test Questions For Grade 11
No ratings yet
Test Questions For Grade 11
10 pages
Multivariate Statistical Methods A Primer Third Edition Manly 2024 scribd download
100% (1)
Multivariate Statistical Methods A Primer Third Edition Manly 2024 scribd download
47 pages
Supervised Learning With R
No ratings yet
Supervised Learning With R
30 pages
Maximum Likelihood
No ratings yet
Maximum Likelihood
10 pages
Probs-Stats Revision Notes
No ratings yet
Probs-Stats Revision Notes
19 pages
Role of Nurse in Suicide and Its Prevention: By-Himani Raj 1 Year MSC Nursing College of Nursing, Aiims Raipur
No ratings yet
Role of Nurse in Suicide and Its Prevention: By-Himani Raj 1 Year MSC Nursing College of Nursing, Aiims Raipur
21 pages
Measures of Central Tendency For Ungrouped Data
100% (6)
Measures of Central Tendency For Ungrouped Data
3 pages
Tesis - Garcia Anchelia Rodolfo Manuel - Fpycf
No ratings yet
Tesis - Garcia Anchelia Rodolfo Manuel - Fpycf
84 pages
CS 747, Autumn 2023 - Lecture 3
No ratings yet
CS 747, Autumn 2023 - Lecture 3
27 pages
1.110 Are The NAEP U.S. History Scores Approximately Normal?
No ratings yet
1.110 Are The NAEP U.S. History Scores Approximately Normal?
5 pages
Ebook Ebook PDF Introduction To Probability and Statistics 15Th Edition All Chapter PDF Docx Kindle
100% (39)
Ebook Ebook PDF Introduction To Probability and Statistics 15Th Edition All Chapter PDF Docx Kindle
41 pages
Lab-5-1-Regression and Multiple Regression
100% (2)
Lab-5-1-Regression and Multiple Regression
8 pages
Independent-Samples T Test Worksheet 2 - ANSWER KEY
No ratings yet
Independent-Samples T Test Worksheet 2 - ANSWER KEY
2 pages
Note Multivariate Analysis of Variance
No ratings yet
Note Multivariate Analysis of Variance
3 pages
Quantitative Analysis Using Spss
100% (1)
Quantitative Analysis Using Spss
42 pages
CONTOH UJI HOMEGENITAS SAMPEL
No ratings yet
CONTOH UJI HOMEGENITAS SAMPEL
3 pages
Problem Set 7. Statistics and Probability
No ratings yet
Problem Set 7. Statistics and Probability
3 pages
Marketing Management - I by Prof. Jayanta Chatterjee & Prof. Shashi Shekhar
No ratings yet
Marketing Management - I by Prof. Jayanta Chatterjee & Prof. Shashi Shekhar
3 pages
MBS_7e_PPT_15 (1)
No ratings yet
MBS_7e_PPT_15 (1)
51 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Data Minning Unit 4-1

Uploaded by

Data Minning Unit 4-1

Uploaded by

DATA MINNING UNIT 4 :

Outliers are of three types, namely –

1. Global (or Point) Outliers

The red data point is a global outlier.

The red data points as a whole are collective outliers.

A low temperature value in June is a contextual outlier because the

Outlier detection challenges :

statistical approaches for outlier detection:

12. Standard Deviation Method:

Proximity-Based Methods in Data Mining

space, and they scale up efficiently as the size of databases increases.

Advantages of Proximity-Based Methods:

1. Proximity-based methods make use of machine learning techniques, in which algorithms

Disadvantages of Proximity-Based Methods:

1. Important words may not be as close in proximity as we expected.

Clustering-based approaches for outlier detection:

1. Density-Based Outlier Detection:

2. Distance-Based Outlier Detection:

3. Graph-Based Outlier Detection:

4. Subspace Clustering-Based Outlier Detection:

5. Cluster-Based Outlier Factor:

6. Probabilistic Model-Based Outlier Detection:

These clustering-based approaches provide effective methods for outlier detection in

Classification-based approaches for outlier detection:

3. Training a Classifier: A supervised learning classifier, such as logistic regression,

Classification-based outlier detection offers several advantages, including the ability to

Detecting outliers in multidimensional data:

2. Principal Component Analysis (PCA):

4. Local Outlier Factor (LOF):

7. Robust Covariance Estimation:

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Data Minning Unit 4-1

Uploaded by

Data Minning Unit 4-1

Uploaded by

DATA MINNING UNIT 4 :

Outliers are of three types, namely –

1. Global (or Point) Outliers

The red data point is a global outlier.

The red data points as a whole are collective outliers.

A low temperature value in June is a contextual outlier because the

Outlier detection challenges :

statistical approaches for outlier detection:

12. Standard Deviation Method:

Proximity-Based Methods in Data Mining

space, and they scale up efficiently as the size of databases increases.

Advantages of Proximity-Based Methods:

1. Proximity-based methods make use of machine learning techniques, in which algorithms

Disadvantages of Proximity-Based Methods:

1. Important words may not be as close in proximity as we expected.

Clustering-based approaches for outlier detection:

1. **Density-Based Outlier Detection**:

2. **Distance-Based Outlier Detection**:

3. **Graph-Based Outlier Detection**:

4. **Subspace Clustering-Based Outlier Detection**:

5. **Cluster-Based Outlier Factor**:

6. **Probabilistic Model-Based Outlier Detection**:

These clustering-based approaches provide effective methods for outlier detection in

Classification-based approaches for outlier detection:

3. **Training a Classifier**: A supervised learning classifier, such as logistic regression,

Classification-based outlier detection offers several advantages, including the ability to

Detecting outliers in multidimensional data:

2. **Principal Component Analysis (PCA)**:

4. **Local Outlier Factor (LOF)**:

7. **Robust Covariance Estimation**:

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

1. Density-Based Outlier Detection:

2. Distance-Based Outlier Detection:

3. Graph-Based Outlier Detection:

4. Subspace Clustering-Based Outlier Detection:

5. Cluster-Based Outlier Factor:

6. Probabilistic Model-Based Outlier Detection:

3. Training a Classifier: A supervised learning classifier, such as logistic regression,

2. Principal Component Analysis (PCA):

4. Local Outlier Factor (LOF):

7. Robust Covariance Estimation: