
L08 Clustering

This document provides an overview of clustering techniques for unsupervised learning. It discusses popular clustering algorithms like K-means, agglomerative clustering, and DBSCAN. It also covers anomaly detection using clustering. K-means groups data by assigning points to centroids, while agglomerative clustering merges clusters iteratively. DBSCAN finds clusters of varying shapes based on density. Anomaly detection identifies outliers that are distant from clusters of normal data points.


UCCD2063

Artificial Intelligence Techniques

Unit 08:
Unsupervised Learning:
Clustering

Outline
• Unsupervised Learning - Clustering
• Anomaly Detection

What is Clustering?
▪ Clustering: the process of grouping data samples that are similar in some way into classes of similar objects
▪ Clustering is a form of unsupervised learning – class labels are not known in advance (i.e., no target y is provided)
▪ It is a method of data exploration – a way of looking for interesting patterns or structure in the data

Popular Clustering Algorithms

▪ K-means clustering – tries to separate samples into k groups of equal variance.

▪ Agglomerative clustering – a bottom-up approach in which each observation starts in its own cluster, and clusters are successively merged together.

▪ DBSCAN – views clusters as areas of high density separated by areas of low density; clusters found by DBSCAN can have any shape.

K-means Clustering

1. Randomly initialize k (e.g. 3) cluster centers
2. Assign each data point to the closest cluster center (using some distance measure)
3. Re-compute cluster centers (mean of data points in each cluster)
4. Repeat steps 2–3 and stop when there are no new re-assignments
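The four steps above can be sketched as a minimal NumPy implementation (illustrative only, not the full `sklearn.cluster.KMeans`; the function name and defaults here are assumptions):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means following the four steps above."""
    rng = np.random.default_rng(seed)
    # 1. Randomly initialize k cluster centers (here: k distinct data points)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # 2. Assign each point to the closest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # 4. Stop when there are no new re-assignments
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 3. Re-compute each center as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

In practice, library implementations such as scikit-learn's `KMeans` run several random initializations and keep the best result, to mitigate the sensitivity to the starting centers.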

K-means Clustering

Problems with k-means:

• Must choose the number of clusters k in advance, which can be difficult
• Starts from a random choice of cluster centers, so different runs may yield different clustering results
• Assumes clusters are convex, so it cannot deal with complex cluster shapes

K-means Clustering Problems

Failure cases shown in the figures: (a) different starting cluster centers, (b) incorrect number of clusters, (c) non-convex clusters
Agglomerative Clustering

▪ Each point is initialized as its own cluster
▪ Compute the linkage between clusters
▪ Merge the two clusters with the smallest linkage
▪ Repeat the process until the desired number of clusters is obtained
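A minimal sketch of this bottom-up procedure, assuming single linkage (the smallest pairwise distance between two clusters; scikit-learn's `AgglomerativeClustering` supports several linkage criteria):

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Bottom-up clustering: every point starts as its own cluster,
    then the two closest clusters are merged repeatedly."""
    clusters = [[i] for i in range(len(X))]          # one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: smallest pairwise distance between clusters
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])              # merge the closest pair
        del clusters[b]
    return clusters
```

This naive version recomputes all pairwise linkages on every merge; real implementations cache distances to avoid the cubic cost.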

DBSCAN
▪ DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
▪ Core point: a data point that has at least MinPts points within the Eps radius around it
▪ Border point: a data point that has fewer than MinPts points within Eps, but is in the neighborhood of a core point
▪ Noise point: any point that is neither a core point nor a border point

(Hyperparameters: MinPts and Eps)


DBSCAN

▪ Randomly select a core point to start a new cluster
▪ Iteratively add all points (core and border) within the Eps distance to the cluster
▪ Stop when no more points are within the Eps neighborhood
▪ Repeat the procedure with an unvisited core point, and stop when no unvisited core points remain
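The procedure above can be sketched as follows (a minimal implementation, following the common convention that a core point has at least MinPts neighbours, itself included, within Eps):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(X)
    # Eps-neighbourhood of every point (the point itself is included)
    nbrs = [np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue              # start clusters from unvisited core points only
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:              # grow the cluster through the Eps neighbourhood
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # core and border points join the cluster
                if core[j]:           # but only core points expand it further
                    queue.extend(nbrs[j])
        cluster += 1
    return labels
```

Points left with label -1 at the end are the noise points, which is what makes DBSCAN useful for the anomaly detection discussed later in this unit.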

DBSCAN

▪ Strengths:
• Handles noise (outliers) very well
• Handles clusters of different shapes and sizes

▪ Weaknesses:
• Does not work well with clusters of varying densities
• Sensitive to the hyperparameters
• May not work well on high-dimensional data

Anomaly Detection

▪ Anomaly detection identifies data points that do not fit well with the rest of the data. It has a wide range of applications such as fraud detection, surveillance, diagnosis, data cleanup, etc.
▪ Anomaly detection can be approached in many ways depending on the nature of the data (labeled or unlabeled, ordered or unordered, ...)
▪ Here we focus on anomaly detection for multivariate unordered data using clustering and the Mahalanobis distance.

Anomaly Detection with Clustering
▪ The underlying assumption is that if we cluster the data, normal data points will belong to (large) clusters, while anomalies will belong to no cluster or only to small clusters.
▪ A data point is considered an anomaly if its distance to the known large clusters is too large.

Problem with Euclidean Distance
▪ For the distance measure, Euclidean distance can fail here: it measures ordinary straight-line distance and ignores the shape and spread of the cluster.

Mahalanobis Distance
▪ The solution is the Mahalanobis distance, which takes the direction of the variance into account in order to normalize the distance properly:

D_M(x) = sqrt( (x − μ)ᵀ S⁻¹ (x − μ) )

where μ is the mean of the cluster and S is the covariance matrix of the cluster

Multivariate Outlier Detection With Mahalanobis Distance
▪ To detect outliers, we must specify a distance threshold
▪ The threshold is set by multiplying the STD (standard deviation) of the Mahalanobis distances by an extremeness degree k, such that:
thresh = k * std
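Putting the last three slides together, a sketch (assuming the cluster is given as an array of its points, and `k` is the extremeness degree above; function names are illustrative):

```python
import numpy as np

def mahalanobis(x, cluster):
    """Mahalanobis distance from point x to the cluster's distribution."""
    mu = cluster.mean(axis=0)
    S = np.cov(cluster, rowvar=False)      # covariance matrix of the cluster
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))

def is_anomaly(x, cluster, k=3.0):
    """Flag x if its distance exceeds thresh = k * std of the
    Mahalanobis distances of the cluster's own points."""
    d = np.array([mahalanobis(p, cluster) for p in cluster])
    thresh = k * d.std()
    return mahalanobis(x, cluster) > thresh
```

Because S⁻¹ rescales each direction by its variance, a point that is only moderately far in a low-variance direction can still receive a large Mahalanobis distance, which is exactly what Euclidean distance misses.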

Next:

Search: Problems and Algorithms
