
Chapter 1

Outlier detection using Isolation Forest

Figure 1.1: Splits to determine outlier

Isolation Forest is an unsupervised algorithm that can effectively identify anomalies or outliers
in a dataset. These outliers often represent noisy or unrepresentative instances of the majority
class, and their inclusion in the training data can have adverse effects on the performance of a
classification model.
The Isolation Forest “isolates” observations by randomly selecting a feature and then randomly
selecting a split value between the maximum and minimum values of the selected feature. Since
recursive partitioning can be represented by a tree structure, the number of splittings required
to isolate a sample is equivalent to the path length from the root node to the terminating node.
This path length, averaged over a forest of such random trees, is a measure of normality and our
decision function. Random partitioning produces noticeably shorter paths for anomalies. Hence,
when a forest of random trees collectively produces shorter path lengths for particular samples,
those samples are highly likely to be anomalies [1].
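The recursive splitting described above can be illustrated with a minimal one-dimensional sketch. The function names, toy data, and depth cap below are illustrative assumptions, not part of the original text; the sketch only demonstrates that a far-away point tends to be isolated in fewer random splits than a point inside the main cluster.

```python
import random

def path_length(x, data, depth=0, max_depth=20):
    """Count the random splits needed to isolate x within data (1-D sketch)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    # Random split value between the minimum and maximum of the data,
    # mirroring the split rule described in the text.
    split = random.uniform(lo, hi)
    # Keep only the partition that still contains x.
    if x < split:
        data = [v for v in data if v < split]
    else:
        data = [v for v in data if v >= split]
    return path_length(x, data, depth + 1, max_depth)

random.seed(0)
# A tight cluster around 10, plus one obvious outlier at 50.
data = [9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 50.0]

def avg_path(x, trials=200):
    """Average path length over many random trees (the 'forest' average)."""
    return sum(path_length(x, data) for _ in range(trials)) / trials

print(avg_path(50.0))  # outlier: isolated after very few splits on average
print(avg_path(10.0))  # cluster point: needs noticeably more splits
```

Averaging over many random trees is what makes the score stable: any single random tree can be unlucky, but the outlier's expected path length stays short.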
The advantages of using Isolation Forest for outlier detection include its capability to handle
high-dimensional and large datasets effectively, as well as its insensitivity to the shape and
distribution of normal instances.
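In practice, the algorithm described in this chapter is available as `sklearn.ensemble.IsolationForest` [1]. The following sketch is a hedged usage example, not the source's own experiment: the dataset is synthetic, and the `contamination` value is an assumed expected outlier fraction that a practitioner would tune.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# 100 normal points clustered near the origin, plus two injected outliers.
X_normal = 0.3 * rng.randn(100, 2)
X_outliers = np.array([[4.0, 4.0], [-4.0, -4.0]])
X = np.vstack([X_normal, X_outliers])

# contamination = expected fraction of outliers (an assumption to tune).
clf = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
labels = clf.fit_predict(X)  # +1 for inliers, -1 for outliers

print(labels[-2:])                         # labels of the injected outliers
print(clf.decision_function(X_outliers))   # lower score = more anomalous
```

Removing the points labelled -1 before fitting a classifier is one way to avoid the adverse training effects mentioned earlier in this chapter.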

References

[1] scikit-learn, "IsolationForest - scikit-learn 1.4.1 documentation," [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
