Lecture 1 Clustering PDF
MACHINE LEARNING (21CSH-286)
Faculty: Prof. (Dr.) Vineet Mehan (E13038)
Lecture – 1
Clustering
DISCOVER . LEARN . EMPOWER

3. To develop skills in supervised and unsupervised learning techniques and in implementing them to solve real-life problems.
4. To develop basic knowledge of machine learning techniques for building an intelligent machine that makes decisions on behalf of humans.
5. To develop skills for selecting suitable model parameters and applying them to design optimized machine learning applications.
• Applications
4/28/2023
• How?
By: Prof. (Dr.) Vineet Mehan 7 By: Prof. (Dr.) Vineet Mehan 8
[Figure: Unlabeled Data grouped into Cluster 1, Cluster 2 and Cluster 3]
• Buy a product on Amazon.
• Links to products that are relevant to the search are then shown by means of clustering.
[Figure: Cluster of Diamonds]
• There is no response (target) class in clustering.
• If the data is in any other form, convert it into numeric form (Label Encoding).
3. Distribution-based Clustering
4. Hierarchical Clustering
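A minimal sketch of label encoding in plain Python, assuming each distinct category is mapped to an integer in sorted order (the same convention scikit-learn's `LabelEncoder` follows). The function name `label_encode` and the sample colours are hypothetical.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in sorted category order."""
    # The sorted distinct categories define the code table: classes[i] -> i
    classes = sorted(set(values))
    code_of = {c: i for i, c in enumerate(classes)}
    # Replace every category with its integer code
    return [code_of[v] for v in values], classes


colours = ["red", "green", "blue", "green", "red"]  # hypothetical categorical feature
codes, classes = label_encode(colours)
print(classes)  # ['blue', 'green', 'red']
print(codes)    # [2, 1, 0, 1, 2]
```

Note that the integer codes impose an artificial order (blue < green < red), which a distance-based clustering algorithm will treat as meaningful; one-hot encoding is a common alternative when that matters.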
[Figure: Centroid of Cluster 1 and Centroid of Cluster 2]
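In centroid-based clustering, a cluster's centroid is usually taken as the mean of its points. A minimal sketch in plain Python; the function name `centroid` and the 2-D points are hypothetical.

```python
def centroid(points):
    """Return the mean point (centroid) of a list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    # Average each coordinate independently
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


cluster_1 = [(1.0, 2.0), (2.0, 2.0), (3.0, 5.0)]  # hypothetical points
cluster_2 = [(8.0, 8.0), (10.0, 6.0)]
print(centroid(cluster_1))  # (2.0, 3.0)
print(centroid(cluster_2))  # (9.0, 7.0)
```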
• Initial conditions: choosing adequate initial seeds (starting centroids) affects both the speed and the quality of the clustering.
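The effect of the initial seeds can be observed with a small k-means run. The slides do not name the centroid-based algorithm, so Lloyd's k-means is an assumption here; the function names, the 1-D data and both seed choices are hypothetical. The sketch starts the same algorithm from two seed sets and compares the within-cluster sum of squared errors (SSE).

```python
def kmeans_1d(data, seeds, max_iter=100):
    """Lloyd's k-means on 1-D data from the given initial centroids.

    Returns (centroids, labels, iterations_used).
    """
    centroids = list(seeds)
    labels = []
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
                  for x in data]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [x for x, lab in zip(data, labels) if lab == k]
            # An empty cluster keeps its previous centroid
            new_centroids.append(sum(members) / len(members) if members else c)
        if new_centroids == centroids:  # converged: no centroid moved
            return centroids, labels, it
        centroids = new_centroids
    return centroids, labels, max_iter


def sse(data, centroids, labels):
    """Within-cluster sum of squared distances (lower is better)."""
    return sum((x - centroids[k]) ** 2 for x, k in zip(data, labels))


data = [0, 1, 10, 11, 20, 21]                # hypothetical 1-D data, three groups
good = kmeans_1d(data, seeds=[0, 10, 20])    # one seed per group
poor = kmeans_1d(data, seeds=[0, 1, 15])     # two seeds in the same group
print(good[0], sse(data, good[0], good[1]))  # [0.5, 10.5, 20.5] 1.5
print(poor[0], sse(data, poor[0], poor[1]))  # [0.0, 1.0, 15.5] 101.0
```

With the poor seeds the algorithm still converges, but to a clustering that merges two groups, which is why seeding strategies (such as k-means++) and multiple restarts are commonly used.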
• Further, by design, these algorithms do not assign outliers to clusters.
[Figure: Outliers not assigned]
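The slide does not name the algorithm; assuming "these algorithms" refers to density-based clustering such as DBSCAN, the minimal sketch below shows how points lying in no dense region stay unassigned (labelled -1). The function name `dbscan_1d`, the parameters `eps` and `min_pts`, and the 1-D data are hypothetical.

```python
def dbscan_1d(data, eps, min_pts):
    """Minimal DBSCAN on 1-D data. Returns one label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(data)

    def neighbours(i):
        # All points (including i itself) within distance eps of point i
        return [j for j in range(len(data)) if abs(data[i] - data[j]) <= eps]

    cluster = 0
    for i in range(len(data)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding through it
                queue.extend(nbrs)
        cluster += 1
    return labels


points = [1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0]  # hypothetical data; 9.0 is isolated
labels = dbscan_1d(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```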
Pre-requisite
• Data can be "distributed" (spread out) in different ways.
• It can be spread out more on the left.
• But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution".
Z Score
• A Z-score is a statistical measurement of a score's relationship to the mean in a group of scores.
• The z-score of a value is calculated using the following formula:
• z = (x - μ) / σ
• Where:
• z = Z-score
• x = the value being evaluated
• μ = the mean
• σ = the standard deviation
• In general, a Z-score between -3.0 and 3.0 indicates that a value lies within three standard deviations of the mean.
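The formula z = (x - μ) / σ can be applied to a whole group of scores in a few lines. A minimal sketch, assuming the population standard deviation (`statistics.pstdev`) is the intended σ; the function name `z_scores` and the scores are hypothetical.

```python
import statistics


def z_scores(values):
    """Return z = (x - mu) / sigma for every value, using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical group of scores: mean 5, sigma 2
z = z_scores(scores)
print(z)                              # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(all(-3 <= v <= 3 for v in z))   # True: every score lies within three sigma of the mean
```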
• In the figure, the distribution-based algorithm clusters data into three Gaussian distributions.
• As distance from the distribution's center increases, the probability that a point belongs to the distribution decreases.
• E.g. the Expectation Maximization (EM) algorithm, which uses the Normal Distribution to cluster the data points.
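A minimal sketch of Expectation Maximization for a 1-D mixture of two Gaussians in plain Python. The data, the number of components, the starting means, the equal starting weights, unit starting variances, the fixed iteration count and the small variance floor are all hypothetical choices for illustration. The `normal_pdf` helper also shows the point above: the density, and with it the membership probability, falls as a point moves away from a component's mean.

```python
import math


def normal_pdf(x, mu, var):
    """Density of the Normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, n_iter=50, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances, labels)."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k      # assumed starting variances
    weights = [1.0 / k] * k    # assumed equal starting mixing weights
    for _ in range(n_iter):
        # E-step: responsibility r[j] = P(component j | point x)
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its soft-assigned points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor,
                               sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj)
            weights[j] = nj / len(data)
    # Hard assignment: each point goes to its most probable component
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


samples = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]  # hypothetical data
weights, means, variances, labels = em_gmm_1d(samples, means=[0.0, 6.0])
print([round(m, 3) for m in means])  # [1.0, 5.0]
print(labels)                        # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```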
• Start with large clusters and divide them until the clusters reach individual elements.
• Also called Divisive Clustering.
• The hierarchy of clusters that is built is represented as a tree called a Dendrogram.

• Insurance: It is used to identify clusters of customers, clusters of policies and clusters of frauds.
• Libraries: It is used to cluster different books on the basis of topics and information.
• Biology: It can be used to cluster different species of plants and animals.
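The divisive (top-down) hierarchical clustering described earlier can be sketched for 1-D data. The splitting rule here is a hypothetical simplification: repeatedly cut the cluster that contains the widest gap between neighbouring values, until every element is its own cluster. Practical divisive methods use other splitting criteria. The recorded splits, read top-down, form the dendrogram.

```python
def divisive_1d(data):
    """Top-down (divisive) clustering of 1-D data.

    Returns the splits in order as (parent, left, right) tuples.
    """

    def widest_gap(c):
        # Largest distance between neighbouring values in a sorted cluster
        return max(c[i + 1] - c[i] for i in range(len(c) - 1))

    clusters = [sorted(data)]  # start with one large cluster
    splits = []
    while any(len(c) > 1 for c in clusters):
        # Pick the non-singleton cluster containing the widest gap
        target = max((c for c in clusters if len(c) > 1), key=widest_gap)
        # Cut it just after its widest gap
        gaps = [target[i + 1] - target[i] for i in range(len(target) - 1)]
        cut = gaps.index(max(gaps)) + 1
        left, right = target[:cut], target[cut:]
        clusters.remove(target)
        clusters.extend([left, right])
        splits.append((target, left, right))
    return splits


splits = divisive_1d([1, 2, 10, 11, 30])  # hypothetical data
for parent, left, right in splits:
    print(parent, "->", left, right)
# [1, 2, 10, 11, 30] -> [1, 2, 10, 11] [30]
# [1, 2, 10, 11] -> [1, 2] [10, 11]
# [1, 2] -> [1] [2]
# [10, 11] -> [10] [11]
```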
Summary
• Clustering
• Types of Clustering
• Applications

Task
• Identify any 5 application areas of clustering in detail and infer which type of clustering technique would be best suited to each of them. (BT-Level 4)
For queries
Email: vineet.e13038@cumail.in