Lecture 1 Clustering PDF

1. The course aims to provide an understanding of machine learning techniques including data handling, visualization, supervised and unsupervised learning.
2. Students will learn skills like clustering, association rule learning, and reinforcement learning and how to apply these techniques to solve real-life problems.
3. The objectives are to understand basic learning algorithms, analyze large datasets, implement machine learning models, and build intelligent systems to make automated decisions.

Uploaded by

Pika Xavier


4/28/2023

ML: Course Objectives

APEX INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

MACHINE LEARNING (21CSH-286)
Faculty: Prof. (Dr.) Vineet Mehan (E13038)

Lecture – 1
Clustering
DISCOVER . LEARN . EMPOWER

COURSE OBJECTIVES

The Course aims to:
1. Understand and apply various data handling and visualization techniques.
2. Understand some basic learning algorithms and techniques and their applications, as well as general questions related to analysing and handling large data sets.
3. Develop skills in supervised and unsupervised learning techniques and implement them to solve real-life problems.
4. Develop basic knowledge of machine learning techniques to build an intelligent machine that makes decisions on behalf of humans.
5. Develop skills for selecting suitable model parameters and applying them to design optimized machine learning applications.

COURSE OUTCOMES

On completion of this course, the students shall be able to:
CO3: Identify and implement simple learning strategies using data science and statistics principles.
CO4: Evaluate a machine learning model's performance and apply learning strategies to improve the performance of supervised and unsupervised learning models.

Unit-3 Syllabus

Unit-3: Unsupervised Learning
Clustering: Types of Clustering: Centroid-based clustering, Density-based clustering, Distribution-based clustering and Hierarchical clustering; K-Means Clustering, KNN (K-Nearest Neighbours), DBSCAN clustering algorithm; Performance metrics for clustering: Silhouette Score.
Association Rule Learning: Apriori algorithm, F-P Growth Algorithm, Applications of Association Rule Learning, Market Basket Analysis.
Reinforcement Learning: Types of Reinforcement Learning, Key Features of Reinforcement Learning, Elements of Reinforcement Learning, Applications of Reinforcement Learning.

SUGGESTIVE READINGS

• TEXT BOOKS:
• There is no single textbook covering the material presented in this course. Here is a list of books recommended for further reading in connection with the material presented:
• T1: Tom M. Mitchell, "Machine Learning", McGraw Hill International Edition.
• T2: Ethem Alpaydin, "Introduction to Machine Learning", Eastern Economy Edition, Prentice Hall of India, 2005.
• T3: Andreas C. Müller, Sarah Guido, "Introduction to Machine Learning with Python", O'Reilly (2016).

• REFERENCE BOOKS:
• R1: Sebastian Raschka, Vahid Mirjalili, "Python Machine Learning" (2014).
• R2: Richard O. Duda, Peter E. Hart, David G. Stork, "Pattern Classification", Wiley, 2nd Edition.
• R3: Christopher Bishop, "Pattern Recognition and Machine Learning", illustrated Edition, Springer, 2006.

Index

• Clustering
• Types of Clustering
• Applications


Clustering

• Clustering → To cluster the data.
• How?
• Similar kinds of data are put together to form a cluster.

Real Life Example

[Figure: List of 15 Brightest Star Clusters]
Data with nearly similar characteristics

Example

Grouping unlabeled data is called clustering.
[Figure: unlabeled data grouped into Cluster 1, Cluster 2 and Cluster 3]

Practical Example

• Search on Google
• Buy a product on Amazon
• Then links / products that are relevant to the search are shown by means of clustering.
• Idea: Groups of similar objects are made.

Clustering

• Clustering does not need a response class, unlike classification, which needs a response class.
• In the dataset we have a response class → in classification
• No response class → in clustering
• After grouping → visually look at the clusters → and optionally associate meaning to each cluster.

Example

Grouping unlabeled data is called clustering.
[Figure: unlabeled data grouped into Cluster 1, Cluster 2 and Cluster 3, given the meanings "cluster of stars", "cluster of circles" and "cluster of diamonds"]


Clustering

• Prediction in clustering → is the set of clusters themselves.
• But data must be in numeric form.
• If the data is in any other form, convert it into numeric form (Label Encoding).

Types of Clustering

1. Centroid-based Clustering
2. Density-based Clustering
3. Distribution-based Clustering
4. Hierarchical Clustering
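The label-encoding step mentioned on the Clustering slide above can be sketched as follows. This is a minimal illustration, not a library API: the helper `label_encode` and the `colours` column are hypothetical, and the codes simply follow the sorted order of the categories.

```python
def label_encode(values):
    """Map each distinct category to an integer code (codes follow sorted order)."""
    categories = sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


# Hypothetical categorical column, made up for illustration.
colours = ["red", "green", "blue", "green", "red"]
codes, mapping = label_encode(colours)
print(codes)    # [2, 1, 0, 1, 2]
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
```

Note that the integer codes impose an order ("blue" < "green" < "red") that the original categories do not have, and a distance-based clustering algorithm will treat that order as meaningful.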

1. Centroid-based Clustering

• Centroid → Center
• Clusters are formed according to the centroid.
• How?
• The distance of the data points to the centroid should be minimal.
• E.g. the K-means algorithm is one of the popular examples of this approach.
(K → number of clusters, to be defined by the user)

K-means algorithm

• K is chosen (i.e. the number of clusters to be made, e.g. K = 3).
• Randomly place centroids.
• Iteration: it minimizes the aggregate (mean) intra-cluster distances, and every iteration results in different clusters.
• After multiple iterations, the centroid positions are identified that have the minimum distance to the data points.
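The K-means steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `k_means`, the `seed` and `max_iter` parameters and the sample `points` are all hypothetical, and the "random placement" of centroids is done by picking K distinct data points.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Basic K-means: returns (centroids, labels) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # Randomly place centroids (assumption: start from k distinct data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(sorted(centroids))
print(labels)
```

With data this well separated the two groups are recovered whichever starting points are drawn; on harder data the result can depend on the initial centroids.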

K-means algorithm

[Figure: centroid of Cluster 1 and centroid of Cluster 2]

1. Centroid-based Clustering

[Figure: two clusters, centroid algorithm]


1. Centroid-based Clustering

[Figure: four clusters, centroid algorithm]

1. Centroid-based Clustering

• Centroid-based algorithms are efficient but sensitive to initial conditions and outliers.
• Initial conditions:
• Choosing adequate initial seeds affects both the speed and quality.
• Iterating improves the centroids' positions, starting from the previous centroids.

Outliers

• An outlier is an observation that appears far away and diverges from the overall pattern in a sample.
• Outliers in input data can skew and mislead the training process of machine learning algorithms.
• This results in longer training times, less accurate models and ultimately poorer results.

Reasons for Outliers

• Experimental errors (data extraction or experiment planning/executing errors)
• Measurement errors (instrument errors)
• Data entry errors (human errors)
• Intentional (dummy outliers made to test detection methods)
• Data processing errors (data manipulation errors)
• Sampling errors (extracting or mixing data from wrong or various sources)
• Natural (not an error, novelties in data)
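A small sketch of why centroid-based methods are sensitive to outliers: the centroid is a mean, so a single far-away observation can drag it out of the dense region. The helper `centroid` and the sample points are hypothetical, made up for illustration.

```python
def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Hypothetical cluster, made up for illustration.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(centroid(cluster))                   # (1.5, 1.5)
# One far-away observation moves the centroid outside the original group.
print(centroid(cluster + [(21.5, 21.5)]))  # (5.5, 5.5)
```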

2. Density-based Clustering

• Density-based clustering connects areas of high density (concentrated density) into clusters.
• This allows for arbitrary-shaped distributions as long as dense areas can be connected.

2. Density-based Clustering

[Figure: connects areas of high density; arbitrary-shaped distributions]


2. Density-based Clustering

• These algorithms have difficulty with data of varying densities and high dimensions.
• Further, by design, these algorithms do not assign outliers to clusters.

2. Density-based Clustering

[Figure: outliers not assigned]
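As one concrete density-based method, DBSCAN (listed in the Unit-3 syllabus) can be sketched as below. This is a simplified, unoptimized illustration, not the slides' own code: the function name `dbscan`, the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours, counting the point itself, for a dense region) and the sample data are assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Simplified DBSCAN: returns one label per point, -1 meaning outlier (noise)."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:   # not dense enough: provisionally noise
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:    # noise reachable from a core point: border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
    (5.0, 5.0), (5.1, 5.2), (4.9, 4.8), (5.2, 4.9),
    (9.0, 1.0),
]
print(dbscan(points, eps=0.6, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point keeps the label -1, which matches the statement above that outliers are not assigned to any cluster.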

Pre-requisite

• Data can be "distributed" (spread out) in different ways.
• It can be spread out more on the left
• It can be spread out more on the right
• It can be jumbled

Pre-requisite

• But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution" like this:
[Figure: The blue curve is a Normal Distribution. The yellow histogram shows some data that follows it closely, but not perfectly (which is usual).]

Pre-requisite

• We say the data is "normally distributed":
• The Normal Distribution has:
• mean = median = mode
• symmetry about the center
• 50% of values less than the mean
• and 50% of values greater than the mean

Standard Normal Distribution

[Figure: standard normal distribution]


Z Score

• A Z-score is a statistical measurement of a score's relationship to the mean in a group of scores.
• In general, a Z-score between -3.0 and 3.0 means that the value (for example, the price at which a stock is trading) lies within three standard deviations of the mean.

Z Score

• A value's z-score is calculated using the following formula:
• z = (x - μ) / σ
• Where:
• z = Z-score
• x = the value being evaluated
• μ = the mean
• σ = the standard deviation
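The formula above can be applied directly with Python's standard library. This is a minimal sketch: the helper `z_scores` and the sample `scores` are hypothetical, and σ is taken to be the population standard deviation (an assumption, since the slide does not say whether the population or sample form is meant).

```python
import statistics


def z_scores(values):
    """z = (x - mu) / sigma for every x, using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical scores, made up for illustration: mean 5.0, sigma 2.0.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_scores(scores))  # the value 9.0 lies two standard deviations above the mean
```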

3. Distribution-based Clustering

• This clustering approach assumes data is composed of distributions, such as Gaussian distributions.
• In the figure, the distribution-based algorithm clusters data into three Gaussian distributions.
• As distance from the distribution's center increases, the probability that a point belongs to the distribution decreases.

3. Distribution-based Clustering

[Figure: data clustered into three Gaussian distributions]
E.g. the Expectation Maximization algorithm, which uses the Normal Distribution for clustering the data points.
It is similar to centroid-based clustering, except that probability is used to compute the clusters rather than the mean.
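The idea that membership probability falls as a point moves away from a distribution's center can be sketched for a one-dimensional mixture. This shows only the probability (expectation) step with fixed parameters; a full Expectation Maximization run would alternate it with re-estimating the parameters. The function names and the three component parameters are hypothetical, made up for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, components):
    """Probability that x belongs to each (weight, mean, std) component."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]


# Hypothetical mixture of three Gaussians centred at 0, 5 and 10.
components = [(1 / 3, 0.0, 1.0), (1 / 3, 5.0, 1.0), (1 / 3, 10.0, 1.0)]
for x in (0.0, 2.0, 2.5, 4.0):
    print(x, [round(p, 3) for p in responsibilities(x, components)])
```

The probability of belonging to the first component shrinks as x moves away from 0, and a point halfway between two centers (x = 2.5) is shared equally between them instead of being forced into one cluster.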

4. Hierarchical Clustering

• Hierarchical clustering creates a tree of clusters.
• Hierarchical clustering, not surprisingly, is well suited to hierarchical data, such as taxonomies (categorization).
• Taxonomy is a system for naming and organizing things, especially plants and animals, into groups that share similar qualities.

4. Hierarchical Clustering

In the animal kingdom, animals have been categorized into two main groups: vertebrates and invertebrates.
This differentiation is mainly based on the presence or absence of a backbone (spinal column).


4. Hierarchical Clustering

[Figure: 5 data points (A, B, C, D, E) plotted on XY axes. Data points that are close form a cluster. Hierarchy of clusters built: A, B, C, D, E → AB, CD → CDE → ABCDE]

Hierarchical Clustering

• Approach: Bottom to Up
• Individual elements to clusters
• Combination of clusters according to similarity
• Large clusters
• Also called Agglomerative Clustering
• The hierarchy of clusters built and represented is called a dendrogram
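The bottom-up approach can be sketched as follows. Two assumptions are made for illustration: "similarity" is measured by single linkage (the distance between the closest members of two clusters), and the coordinates of A to E are hypothetical values chosen so that the merges follow the hierarchy in the figure.

```python
import math
from itertools import combinations


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the list of merges in order."""
    clusters = [(name,) for name in points]  # start: every element is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                math.dist(points[p], points[q]) for p in pair[0] for q in pair[1]
            ),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append("".join(sorted(a + b)))
    return merges


# Hypothetical coordinates for the five data points.
points = {"A": (1.0, 1.0), "B": (1.5, 1.0), "C": (5.0, 4.0), "D": (5.5, 4.6), "E": (6.0, 6.5)}
print(agglomerative(points))  # ['AB', 'CD', 'CDE', 'ABCDE']
```

The list of merges, read in order, is the information a dendrogram displays from bottom to top.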

Hierarchical Clustering

• Approach: Top to Bottom
• Large clusters
• Divide clusters
• Clusters to individual elements
• Also called Divisive Clustering
• The hierarchy of clusters built and represented is called a dendrogram

Applications of Clustering

• Marketing: It can be used to cluster different customer segments for marketing purposes.
• Insurance: It is used to cluster customers, policies and frauds.
• Libraries: It is used to cluster different books on the basis of topics and information.
• Biology: It can be used to cluster different species of plants and animals.

Summary

• Clustering
• Types of Clustering
• Applications

Task

• Identify any 5 application areas of clustering in detail and infer which type of clustering technique would be best suited to each one of them. (BT-Level 4)


REFERENCES

• https://www.geeksforgeeks.org/clustering-in-machine-learning/
• https://www.javatpoint.com/clustering-in-machine-learning
• https://developers.google.com/machine-learning/clustering/overview

THANK YOU

For queries
Email: vineet.e13038@cumail.in
