Clustering PPT 1233

Clustering in Machine Learning
BY
BODA SANTOSH NAIK(EC21B020)
BANOTH ROHITH(EC21B015)
DESAVATH SIVA NAIK(EC21B024)
Introduction to Clustering
Clustering is an unsupervised learning technique used to group similar data points.

It helps in discovering inherent patterns within datasets without prior labels.

Clustering is widely used in various applications such as image segmentation and customer segmentation.

Importance of Clustering

Clustering simplifies complex datasets by summarizing many data points as a small number of groups.

It facilitates better data analysis by grouping similar items together.

Clustering can improve decision-making processes in business and research.

Types of Clustering

Clustering can be categorized into several types, including centroid-based, density-based, and hierarchical clustering.

Each type has its own methodology and use cases suited for different data distributions.

Understanding the types of clustering is crucial for selecting the appropriate algorithm.
Centroid-Based Clustering
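K-Means is a widely used centroid-based clustering algorithm.

It partitions the data into K clusters by minimizing the variance within each cluster, that is, the sum of squared distances from each point to its cluster centroid.

The algorithm alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, until convergence is reached.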
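
The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, the random initialization from data points, the iteration cap, and the toy data are all assumptions made for this example.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Minimal K-Means sketch: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: pick k distinct data points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Converged once the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy blobs (made-up data).
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster is numbered 0 depends on the random initialization.
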
Density-Based Clustering

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters based on the density of data points.

DBSCAN requires two parameters: epsilon (the neighborhood radius) and minPts (the minimum number of points within that radius for a point to count as a core point).

DBSCAN distinguishes three types of data points:

Core point: has at least minPts points within its epsilon-neighborhood.

Border point: not a core point itself, but lies within epsilon of a core point.

Noise point: neither a core point nor a border point.
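
The three point types can be illustrated with a short NumPy sketch. It only classifies points and does not build the clusters themselves. The function name `classify_points` and the toy data are hypothetical, and the sketch assumes that a point counts as a member of its own neighborhood (conventions differ between implementations).

```python
import numpy as np

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (classification only)."""
    # Pairwise Euclidean distances; a point counts as its own neighbour (assumption).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbours = dists <= eps
    # Core: at least min_pts points inside the eps-neighbourhood.
    is_core = neighbours.sum(axis=1) >= min_pts
    # Border: not core, but within eps of at least one core point.
    near_core = (neighbours & is_core[None, :]).any(axis=1)
    kinds = np.where(is_core, "core", np.where(near_core, "border", "noise"))
    return kinds.tolist()

# Made-up data: a dense square, one point on its edge, one isolated point.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
                   [2.0, 1.0],
                   [8.0, 8.0]])
print(classify_points(points, eps=1.1, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point at (2, 1) has only two points in its neighborhood, so it is not a core point, but it lies within epsilon of the core point (1, 1) and is therefore a border point.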


Hierarchical Clustering

Hierarchical clustering creates a tree-like structure to represent data relationships.

It can be agglomerative (bottom-up) or divisive (top-down) in its approach.

Dendrograms are commonly used to visualize the results of hierarchical clustering.
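
A bottom-up (agglomerative) pass can be sketched in plain Python. The slides do not name a linkage criterion, so this example assumes single linkage (distance between clusters = distance between their closest members). The function name `single_linkage` and the data are hypothetical. In practice, `scipy.cluster.hierarchy` offers `linkage` and `dendrogram` for the same purpose.

```python
def single_linkage(points, n_clusters):
    """Agglomerative sketch: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    merges = []  # (members_a, members_b, distance) in merge order

    def dist(a, b):
        # Euclidean distance between points with indices a and b.
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (assumed): distance of the closest pair.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

# Made-up one-dimensional data.
points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
clusters, merges = single_linkage(points, n_clusters=2)
print(clusters)                  # [[0, 1, 2, 3], [4]]
print([m[2] for m in merges])    # [1.0, 1.5, 4.0]
```

The recorded merge distances are the heights at which a dendrogram would draw each join. Cutting the tree at a chosen height yields a flat clustering.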

Evaluation Metrics

Evaluation metrics are crucial tools in machine learning for assessing the performance of a model. They provide quantitative measures of how well a model is making predictions. Here are some commonly used evaluation metrics.

Classification metrics:

Accuracy: measures how often a machine learning model correctly predicts the outcome, that is, the fraction of all predictions that are correct.

Precision: measures the quality of the positive predictions made by the model, that is, the fraction of predicted positives that are actually positive.

Recall: measures how well a model can identify positive instances in a dataset, that is, the fraction of actual positives that the model predicts as positive.
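
The three definitions can be checked with a small worked example in plain Python. Note that all three compare predictions against known true labels. The function name `classification_metrics` and the label vectors are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against division by zero when there are no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Made-up labels: 4 actual positives, 5 predicted positives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)  # 0.625 0.6 0.75
```

Here 5 of 8 predictions are correct (accuracy 0.625), 3 of 5 predicted positives are real (precision 0.6), and 3 of 4 real positives are found (recall 0.75).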

Challenges in Clustering

Many clustering algorithms, K-Means in particular, are sensitive to outliers, which can distort the results significantly.

The choice of the number of clusters (K) in algorithms like K-Means can be subjective.

High-dimensional data often leads to the “curse of dimensionality,” complicating clustering.
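
The effect of an outlier on a mean-based centroid can be seen in a tiny NumPy example. The data below are made up, and the sketch only illustrates the first point above for centroid-based methods.

```python
import numpy as np

# A tight cluster of four points (made-up data).
cluster = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]])
# A single far-away outlier.
outlier = np.array([[50.0, 50.0]])

centroid_clean = cluster.mean(axis=0)
centroid_with_outlier = np.vstack([cluster, outlier]).mean(axis=0)
shift = np.linalg.norm(centroid_with_outlier - centroid_clean)

print(centroid_clean)          # [1.5 1.5]
print(centroid_with_outlier)   # [11.2 11.2]
print(shift)
```

One outlier drags the centroid from (1.5, 1.5) to (11.2, 11.2), far outside the original cluster, because the mean weights every point equally.
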
Practical Applications

Clustering is used in customer segmentation to tailor marketing strategies effectively.

It plays a critical role in image and video processing for object recognition.

In bioinformatics, clustering helps in gene expression analysis and protein classification.

Tools and Libraries

Popular libraries for clustering in Python include scikit-learn, SciPy, and HDBSCAN.

R also offers robust clustering packages such as 'cluster' and 'factoextra'.

These tools provide easy-to-use implementations of various clustering algorithms.

Pandas is useful for data manipulation and preprocessing before clustering.

NumPy is useful for numerical operations and is often used for implementing clustering algorithms from scratch.
Case Study: Customer Segmentation
A retail company used K-Means clustering to segment its
customer base into distinct groups.

This segmentation enabled targeted marketing campaigns and improved customer engagement.

The results showed a significant increase in sales and customer satisfaction.

Case Study: Image Segmentation

Researchers applied DBSCAN for segmenting complex images in a computer vision project.

The algorithm effectively identified regions of interest while ignoring background noise.

This segmentation improved the accuracy of subsequent image classification tasks.

The segmentation approach was applied to real-world data, such as satellite images and medical scans, where DBSCAN successfully identified key regions like urban areas or tumor boundaries, further validating its effectiveness.

Future Directions

The integration of clustering with deep learning techniques is an emerging trend.

Research is focusing on developing algorithms that can handle dynamic and streaming data.

Further advancements in clustering will enhance its applicability across various domains.

Best Practices

Always preprocess your data to remove noise and handle missing values before clustering.

Experiment with multiple algorithms and parameters to find the most suitable method for your data.

Visualize the clusters formed to gain insights and validate the clustering results.
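
As one possible way to handle missing values before clustering, the sketch below fills each gap with its column median and then standardizes the columns. Both choices are assumptions made for this example (the slides do not prescribe a method), and the function name `preprocess` and the data are hypothetical.

```python
import numpy as np

def preprocess(X):
    """Fill NaNs with column medians, then standardize each column."""
    X = np.asarray(X, dtype=float).copy()
    # Fill each missing value (NaN) with the median of its column (assumed strategy).
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    # Standardize so that no feature dominates distance computations
    # simply because of its scale (assumed extra step).
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std

# Made-up data: two features on very different scales, two gaps.
raw = [[1.0, 200.0],
       [2.0, np.nan],
       [3.0, 400.0],
       [np.nan, 600.0]]
clean = preprocess(raw)
print(clean)
```

After this step every column has zero mean and unit variance, so distance-based algorithms such as K-Means treat both features comparably.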

Conclusion
Clustering is a powerful tool for data analysis that
uncovers hidden structures in data.

Understanding different clustering algorithms and their applications is essential for practitioners.

As data continues to grow, the importance and relevance of clustering in machine learning will only increase.
