Clustering
Subhajit Chattopadhyay
• Supervised - such methods have a dependent variable (labelled data)
– regression
– time series
– multiple discriminant analysis
– support vector machine
– decision tree
Figure 1: ML Methods
• Unsupervised - such methods do not have any dependent variable or labelled data
– clustering
• Semi-supervised - such methods use labelled data for one part of the task and unlabelled data for another
Figure 2: Cluster Analysis
Types of Clustering Techniques
• Connectivity models: distance connectivity between observations is the measure, e.g. hierarchical clustering
• Centroid models: distance from the mean value (centroid) of each cluster is the measure, e.g. k-means clustering
• Distribution models: significance of the statistical distribution of variables in the dataset is the measure, e.g. expectation-maximization algorithms
• Density models: density in the data space is the measure, e.g. DBSCAN (one representative of each family is sketched below)
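As a quick illustration, here is a minimal scikit-learn sketch instantiating one representative algorithm from each family; the dataset and parameter values are illustrative, not prescriptive:

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

# Illustrative toy data: 300 points around 3 centres
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Connectivity model: hierarchical (agglomerative) clustering
hier = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Centroid model: k-means
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Distribution model: Gaussian mixture fitted by expectation-maximization
gmm = GaussianMixture(n_components=3, random_state=42).fit(X).predict(X)

# Density model: DBSCAN (label -1 marks noise points)
db = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)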
Another way to categorize clustering models is by the variant of the k-means algorithm used:
• MacQueen algorithm => the MacQueen and Lloyd algorithms are very similar; the only difference is (see the sketch after this list):
– The Lloyd algorithm updates the centroids only after each full pass over the data, so it is called a batch or offline algorithm.
– The MacQueen algorithm updates the centroids whenever a case changes cluster, as well as when the algorithm passes through the entire dataset. The MacQueen algorithm converges quicker than the Lloyd algorithm.
• Hartigan-Wong algorithm => this algorithm, for each case of the dataset, calculates the sum of squared error of that case’s current cluster excluding the case, $SS = \sum_{i \in k}(x_i - c_k)^2$, where $c_k$ is the centroid of the cluster under consideration, and also the sums of squared errors of the other clusters where the case may be assigned.
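A rough numpy sketch of the difference between the Lloyd and MacQueen update rules follows; the function names and the seeding scheme are illustrative, not from any library:

import numpy as np

def lloyd_pass(X, centroids):
    # Lloyd (batch/offline): assign every case first, then recompute
    # all centroids once at the end of the full pass.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                     else centroids[k]            # keep empty clusters in place
                     for k in range(len(centroids))])

def macqueen_pass(X, centroids, counts):
    # MacQueen (online): move the winning centroid immediately after
    # each assignment, maintaining a running mean of its members.
    centroids = centroids.copy()
    for x in X:
        k = ((centroids - x) ** 2).sum(axis=1).argmin()
        counts[k] += 1
        centroids[k] += (x - centroids[k]) / counts[k]
    return centroids

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
seeds = X[:3].copy()                              # illustrative seeding
after_lloyd = lloyd_pass(X, seeds)
after_macqueen = macqueen_pass(X, seeds, counts=np.ones(3))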
Measuring the distance between two data points
The distance between two cases expresses their similarity or dissimilarity: the higher the distance between two cases, the lower the similarity, and vice versa.
• Euclidean distance => $d(x, y) = \left[\sum_{i=1}^{n} (x_i - y_i)^2\right]^{1/2}$
• Manhattan distance => $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
• Pearson correlation index => $d(x, y) = 1 - \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\left[\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2\right]^{1/2}}$
• Eisen cosine correlation distance => $d(x, y) = 1 - \frac{\left|\sum_i x_i y_i\right|}{\left[\sum_i x_i^2 \sum_i y_i^2\right]^{1/2}}$
• Minkowski distance => $d(x_i, x_j) = \left[\sum_{k=1}^{p} |x_{ik} - x_{jk}|^r\right]^{1/r}$
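Most of these distances are available directly in scipy; a small sketch on two illustrative vectors (note that scipy’s cosine distance omits the absolute value used in the Eisen variant):

import numpy as np
from scipy.spatial import distance

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 1.0, 3.0])

print(distance.euclidean(x, y))       # [sum (x_i - y_i)^2]^(1/2)
print(distance.cityblock(x, y))       # Manhattan: sum |x_i - y_i|
print(distance.correlation(x, y))     # 1 - Pearson correlation
print(distance.cosine(x, y))          # 1 - cosine similarity
print(distance.minkowski(x, y, p=3))  # [sum |x_i - y_i|^r]^(1/r), here r = 3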
To decide the number of clusters, internal cluster metrics are examined:
• Davies-Bouldin’s index => It looks at the ratio of within-cluster variance (scatter) to the distance between the centroids of the clusters. This ratio is computed for every possible pair of clusters and for each cluster.
$$scatter_k = \left[\frac{1}{n_k}\sum_{i \in k}(x_i - c_k)^2\right]^{1/2}$$

$$separation_{j,k} = \left[\sum (c_j - c_k)^2\right]^{1/2}$$

$$ratio_{j,k} = \frac{scatter_j + scatter_k}{separation_{j,k}}$$
The largest such ratio for cluster $k$ is termed $R_k$, and $DBIndex = \frac{1}{N}\sum_{k=1}^{N} R_k$, where $N$ is the number of clusters; lower values indicate better clustering, as the scan below illustrates.
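scikit-learn exposes this metric as davies_bouldin_score; a sketch scanning candidate values of k on illustrative data:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Lower Davies-Bouldin => more compact, better separated clusters
    print(k, davies_bouldin_score(X, labels))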
• Dunn’s index => It aims to identify dense and well-separated clusters. It is defined as the ratio of the minimal inter-cluster distance to the maximal cluster diameter (the largest within-cluster distance):

$$D = \frac{\min_{i \neq j} d(C_i, C_j)}{\max_k \, diam(C_k)}$$

Higher values indicate better clustering.
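scikit-learn has no built-in Dunn index, so here is a direct numpy/scipy sketch of the definition above (quadratic in the number of cases, so suitable only for small data):

import numpy as np
from scipy.spatial.distance import cdist

def dunn_index(X, labels):
    # Dunn = (minimal inter-cluster distance) / (maximal cluster diameter)
    clusters = [X[labels == k] for k in np.unique(labels)]
    # Diameter: largest pairwise distance within one cluster
    diameters = [cdist(c, c).max() for c in clusters]
    # Separation: smallest pairwise distance between two different clusters
    separations = [cdist(a, b).min()
                   for i, a in enumerate(clusters)
                   for b in clusters[i + 1:]]
    return min(separations) / max(diameters)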
• Pseudo F-statistic => It is the ratio of the between-cluster sum of squares to the within-cluster sum of squares, each divided by its degrees of freedom:

$$Pseudo\text{-}F = \frac{SS_{between}/(k-1)}{SS_{within}/(n-k)}$$

$$SS_{between} = \sum_{j=1}^{k} n_j (c_j - c_g)^2$$

$$SS_{within} = \sum_{j=1}^{k} \sum_{i \in j} (x_i - c_j)^2$$

where $k$ is the number of clusters, $n$ is the number of cases, $n_j$ is the size of cluster $j$, and $c_g$ is the grand centroid of the whole dataset.
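A numpy sketch computing the pseudo-F directly from these formulas; this is the same quantity that scikit-learn provides as calinski_harabasz_score:

import numpy as np

def pseudo_f(X, labels):
    # Pseudo-F = [SS_between / (k - 1)] / [SS_within / (n - k)]
    n, k = len(X), len(np.unique(labels))
    c_g = X.mean(axis=0)                      # grand centroid
    ss_between = ss_within = 0.0
    for lab in np.unique(labels):
        cluster = X[labels == lab]
        c = cluster.mean(axis=0)              # cluster centroid
        ss_between += len(cluster) * ((c - c_g) ** 2).sum()
        ss_within += ((cluster - c) ** 2).sum()
    return (ss_between / (k - 1)) / (ss_within / (n - k))

Higher values indicate a stronger cluster structure, so the number of clusters that maximizes the pseudo-F is preferred.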