
Clustering Part-A

This document provides an overview of unsupervised learning techniques, specifically clustering using the k-means algorithm. It explains that k-means clustering is an iterative algorithm that groups unlabeled data points into k clusters based on similarity. It works by initially assigning data points to the closest cluster centroid and then iteratively updating the centroid positions until the clusters converge. The document also discusses applications of unsupervised learning like customer segmentation and anomaly detection.

Uploaded by

Waseem Sajjad

High Impact Skills Development Program

in Artificial Intelligence, Data Science, and Blockchain

Module 2: Unsupervised Learning


Lecture 1: Clustering

Instructor: Dr. Nazia Perwaiz


Assistant Professor, SEECS, NUST

Someone Messed Up the Library!

French

German

Spanish
[Figure: scatter plots of Feature 1 (x-axis) vs. Feature 2 (y-axis), one panel titled "Supervised Learning" and one titled "Unsupervised Learning"]
Supervised vs Unsupervised Learning
| Properties | Unsupervised Learning | Supervised Learning |
| --- | --- | --- |
| Definition | Type of machine learning that happens without human supervision; the machine tries to find the patterns in the data itself | Type of machine learning that happens under human supervision, meaning people label the input data with answer keys that guide (supervise) the machine in learning the desired outputs |
| Input data | Unlabeled | Labeled |
| Use of data | Model is given only the input variables (X) and no corresponding output data | Model is given input variables (X), output variables (Y), and an algorithm to learn the function from input to output |
Supervised vs Unsupervised Learning (continued)
| Properties | Unsupervised Learning | Supervised Learning |
| --- | --- | --- |
| When to use | You don't know what you are looking for in the data | You know what you are looking for in the data |
| Applicable in | Identifying patterns and relationships in data (clustering and association problems) | Determining a specific output (classification and regression problems) |
| Accuracy of results | May provide less accurate results | Provides more accurate results |
| Methods | Computationally complex | Simple |
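The "Use of data" row can be made concrete with a small sketch. The toy data and the choice of LogisticRegression as the supervised model are assumptions made here for illustration; only KMeans appears in the lecture itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data (made up): two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised: the model is given the inputs X and the answer key y.
clf = LogisticRegression().fit(X, y)
predicted = clf.predict(X)

# Unsupervised: the model is given only X and has to find the groups itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_  # cluster numbering is arbitrary (0 and 1 may be swapped)
```

Note that the clustering result has no notion of "class 0" or "class 1"; it only says which points belong together.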
Why/ When Unsupervised Learning?
• Unlabeled data is easier to get and less time-consuming to prepare
  • no need to manually label the data
• Understanding raw data
• Finding unknown patterns to get useful insights from raw data
  • e.g. user categorization by their social media activity
• Similar to the human mind
  • baby-cat example
When Unsupervised Learning?
• When you need to identify patterns and relationships in data
• When the data set is large
  • so that labeling the data may be time-consuming or impractical
Applications of Unsupervised Learning
• Medical diagnosis
• Customer segmentation
• Recommendation systems
• Anomaly detection
• Cyber security (data preparation for unknown threats)
• Preparing data for supervised learning (image segmentation)
Unsupervised ML Approaches
• Clustering: identifies similarities and differences between
unlabelled data entries and groups them based on their
properties.

• Dimensionality reduction: reduces the amount of data while
maintaining its integrity; useful when there is so much data
to analyse that the algorithms' performance may suffer.

• Association: finds relationships between variables, i.e.
identifies sets of items which often occur together in a dataset.
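As a rough illustration of the dimensionality reduction bullet, the sketch below uses PCA from scikit-learn. PCA is not named in the lecture; it is picked here only as one common technique, and the data is made up.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Made-up data: 100 samples with 5 features, where the last 3 features are
# noisy linear combinations of the first 2 (so most of the data is redundant).
base = rng.normal(size=(100, 2))
extra = base @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(100, 3))
X = np.hstack([base, extra])

# Keep 2 components instead of the original 5 features.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the original variance that the 2 components retain.
explained = pca.explained_variance_ratio_.sum()
```

Here the data shrinks from 5 columns to 2 while keeping almost all of its variance, which is the sense in which the "integrity" of the data is maintained.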

Clustering

K-means Algorithm

K-means Clustering
Motivation:
• to summarize a complex real-valued data point with a single categorical variable

K-means clustering:
• is an iterative algorithm
• that divides a set of n data points
• into k different clusters/ subgroups
• based on their similarity and their mean distance from the central point (centroid) of the subgroup formed.
Start:
• Pick K random points as cluster centroids.

Here, K = 2
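A minimal sketch of this starting step, assuming the K random points are drawn from the data set itself (the slide does not say how they are picked). The function name init_centroids and the toy data are hypothetical.

```python
import numpy as np

def init_centroids(X, k, seed=0):
    """Pick k distinct data points at random to serve as initial centroids."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)  # k different row indices
    return X[idx].astype(float)

# Toy data (made up) with K = 2, as on the slide.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = init_centroids(X, k=2)
```

Different seeds give different starting centroids, which is why K-means can end in different clusterings from run to run.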
Iterative Step 1
• Assign data points to the closest cluster centroid
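The assignment step can be sketched as below, assuming "closest" means smallest Euclidean distance. The function name assign_clusters and the toy values are hypothetical.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Return, for every data point, the index of its closest centroid."""
    # dists[i, j] is the Euclidean distance from point i to centroid j.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Toy data (made up): two points near the origin, two near (10, 10).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = assign_clusters(X, centroids)  # array([0, 0, 1, 1])
```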
Iterative Step 2
• Compute the average position of all data points assigned to a centroid

Iterative Step 3
• Move the cluster centroid to the average of the assigned points
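Steps 2 and 3 together amount to one update function. This is a sketch under the assumption that a centroid with no assigned points simply stays where it is (the slides do not cover that case); update_centroids is a hypothetical name.

```python
import numpy as np

def update_centroids(X, labels, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = np.array(centroids, dtype=float)  # copy, keep old values
    for j in range(len(new_centroids)):
        members = X[labels == j]
        if len(members) > 0:  # assumption: an empty cluster keeps its centroid
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

# Toy data (made up), with assignments as they would come out of Step 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
old_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
new_centroids = update_centroids(X, labels, old_centroids)
# new_centroids is [[0.0, 0.5], [10.0, 10.5]]
```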
Repeat:
• Reassign each data point to its closest centroid
• Calculate the average of the data points assigned to each centroid
• Move each centroid to the new average position
• Continue until convergence, i.e. until no reassignment of data points occurs

Converged or not converged?
When does the K-means algorithm end?

1. No re-assignment of the data points occurs

2. No relocation/ re-positioning of the centroids occurs
K-means Algorithm

Repeat until convergence:
  For all data points of the training data:
    find the closest centroid c(i)
  For all centroids:
    recompute the average of the assigned points and relocate the centroid there
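Putting the two loops together gives roughly the sketch below. It assumes Euclidean distance, initial centroids drawn from the data, and an iteration cap max_iter as a safety net; the function name kmeans and its parameters are hypothetical and are not the scikit-learn API.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start: pick k distinct data points as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # For all data points: find the closest centroid c(i).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Converged: no data point changed cluster, so centroids stay put too.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # For all centroids: relocate to the average of the assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: empty clusters are left as is
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data (made up): one group near the origin, one near (10, 10).
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = kmeans(X, k=2)
```

On this toy data the two groups end up in separate clusters; which one is numbered 0 depends on the random start.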
K-means Optimization Objective

x(i): training example i

μc(i): centroid of the cluster to which x(i) is assigned
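The formula on this slide did not survive extraction. A common way to write the K-means objective (often called the distortion), assuming squared Euclidean distance, m training examples, and K centroids (the symbols J, m and K are assumptions introduced here), is:

```latex
J\left(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\right)
  = \frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - \mu_{c^{(i)}}\right\rVert^{2}
```

K-means looks for assignments c(i) and centroids μ that make J small. The assignment step and the centroid update step can each only lower J or leave it unchanged, which is why the iteration settles down.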


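Using K-means Clustering

from sklearn.cluster import KMeans

# k (number of clusters), kmeans_kwargs (dict of extra KMeans options) and
# features (array of shape (n_samples, n_features)) must be defined beforehand.
kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
kmeans.fit(features)  # runs the iterative assignment/update steps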
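The snippet above leaves k, kmeans_kwargs and features undefined. A self-contained version might look like the following sketch, where the synthetic data from make_blobs and the particular option values are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled data (assumed): 150 points around 3 centers in 2-D.
features, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=42)

k = 3
kmeans_kwargs = {
    "init": "random",    # pick random starting centroids, as in the slides
    "n_init": 10,        # run 10 random starts and keep the best one
    "max_iter": 300,     # cap on iterations per run
    "random_state": 42,  # make the result repeatable
}

kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
kmeans.fit(features)

labels = kmeans.labels_             # cluster index of every data point
centers = kmeans.cluster_centers_   # final centroid positions
inertia = kmeans.inertia_           # sum of squared distances to closest centroid
```

After fitting, kmeans.predict(new_points) assigns unseen points to the nearest learned centroid.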
Happy Learning!
