
UNIT-IV
Unsupervised Machine Learning

Contents
• Different clustering methods (Distance, Density, Hierarchical)
• Iterative distance-based clustering; K-Means Clustering Algorithm and Image Quantization
• Basics of Principal Component Analysis

Book: Machine Learning by Saikat Dutt
https://youtu.be/3QE_SueBMzc?si=wVEco4z88BM-4XU3
(Statistical characteristics of a population, such as age, gender, ethnicity, education, and income.)

Image segmentation and compression
(Image Quantization)
• Consider the problem of identifying groups, or clusters, of data points in a multidimensional space.

• Suppose we have a data set {x1,...,xN} consisting of N observations of a random D-dimensional Euclidean variable x.

• Our goal is to partition the data set into K clusters.

• We might think of a cluster as comprising a group of data points whose inter-point distances are small compared with the distances to points outside of the cluster.

• We can formalize this notion by introducing a set of D-dimensional vectors µk, where k = 1,...,K, in which µk is a prototype associated with the kth cluster.

• We can think of µk as representing the centres of the clusters.

• Our goal is then to find an assignment of data points to clusters, and a set of vectors {µk}, such that the sum of the squares of the distances of each data point to its closest vector µk is a minimum.
Source: Pattern Recognition and Machine Learning by Christopher Bishop
• Notation to describe the assignment of data points to clusters:
• For each data point xn, we introduce a corresponding set of binary indicator variables rnk ∈ {0, 1}, where k = 1,...,K, describing which of the K clusters the data point xn is assigned to: if data point xn is assigned to cluster k, then rnk = 1 and rnj = 0 for j ≠ k.
• This is known as the 1-of-K coding scheme.

• We can then define an objective function, sometimes called a distortion measure, given by

      J = Σ(n=1..N) Σ(k=1..K) rnk ||xn − µk||²

• which represents the sum of the squares of the distances of each data point to its assigned vector µk.
• Our goal is to find values for the {rnk} and the {µk} so as to minimize J.
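As a minimal sketch (not from the textbook), this is how the distortion J could be computed in NumPy for a given data matrix, assignment matrix, and set of centres; the names X, R, and mu are illustrative:

import numpy as np

def distortion(X, R, mu):
    """Sum of squared distances of each point to its assigned centre (the measure J).

    X  : (N, D) data matrix
    R  : (N, K) binary 1-of-K assignment matrix (the r_nk)
    mu : (K, D) cluster centres
    """
    # squared Euclidean distance from every point to every centre, shape (N, K)
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # keep only the distance to the assigned centre and sum over all points
    return (R * sq_dists).sum()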

• We can do this through an iterative procedure in which each iteration involves two successive steps, corresponding to successive optimizations w.r.t. the rnk and the µk.
• First phase:
  • choose some initial values for the µk;
  • then minimize J w.r.t. the rnk, keeping the µk fixed.
• Second phase:
  • minimize J w.r.t. the µk, keeping the rnk fixed.

• This two-stage optimization is then repeated until convergence.

• These two stages of updating rnk and updating µk correspond respectively to the E (expectation) and M (maximization) steps of the EM algorithm.
• In the E step, we simply assign the nth data point to the closest cluster centre. Formally, this can be expressed as

      rnk = 1 if k = arg min_j ||xn − µj||², and rnk = 0 otherwise.

• In the M step, setting the derivative of J w.r.t. µk to zero gives µk = (Σn rnk xn) / (Σn rnk), i.e. each µk is set to the mean of the data points assigned to cluster k, which is the origin of the name "K-means".
• The two phases of re-assigning data points to clusters and re-computing the cluster means are repeated in turn until there is no further change in the assignments (or until some maximum number of iterations is exceeded).

• Because each phase reduces the value of the objective function J, convergence of the algorithm is assured.

• However, it may converge to a local rather than a global minimum of J.
• We have deliberately chosen poor initial values for the cluster centres so that the algorithm takes several steps before convergence.
• In practice, a better initialization procedure would be to choose the cluster centres µk to be equal to a random subset of K data points.
• It is also worth noting that the K-means algorithm itself is often used to initialize the parameters in a Gaussian mixture model before applying the EM algorithm.
• A direct implementation of the K-means algorithm, as discussed here, can be relatively slow because, in each E step, it is necessary to compute the Euclidean distance between every prototype vector and every data point.
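A minimal NumPy sketch of the full two-phase procedure, using the initialization suggested above (centres chosen as a random subset of K data points); the function name kmeans and its arguments are illustrative, not taken from the book:

import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Alternate the E step (assign points) and M step (re-compute means) until convergence."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # initialize the centres as a random subset of K data points
    mu = X[rng.choice(N, size=K, replace=False)].copy()

    assignments = None
    for _ in range(max_iters):
        # E step: assign each point to its nearest centre
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # shape (N, K)
        new_assignments = sq_dists.argmin(axis=1)

        # stop when the assignments no longer change
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments

        # M step: re-compute each centre as the mean of the points assigned to it
        for k in range(K):
            members = X[assignments == k]
            if len(members) > 0:          # guard against empty clusters
                mu[k] = members.mean(axis=0)

    return mu, assignments

Note that this direct implementation computes all N×K distances in every E step, which is exactly the cost mentioned above.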
Image segmentation and compression
(quantization)
• Illustration of the application of the K-means algorithm:
• Consider the related problems of image segmentation and image compression.
• The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance or which corresponds to objects or parts of objects.

• Each pixel in an image is a point in a 3-dimensional space comprising the intensities of the red, blue, and green channels.

• The segmentation algorithm treats each pixel in the image as a separate data point.

• For a given value of K, the algorithm represents the image using only K colours.
• This use of K-means is not a sophisticated approach to image segmentation, because it takes no account of the spatial proximity of different pixels.
• We illustrate the result of running K-means to convergence, for any particular value of K, by re-drawing the image and replacing each pixel vector with the {R,G,B} intensity triplet given by the centre µk to which that pixel has been assigned.

• Results for various values of K are shown in the figure.
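A sketch of the re-drawing step described above, reusing the kmeans sketch from earlier; it assumes the image is an H×W×3 array of RGB intensities:

import numpy as np

def quantize_image(image, K):
    """Re-draw an image using only the K colours found by K-means.

    image : (H, W, 3) array of {R,G,B} intensities
    """
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3)          # treat every pixel as a separate data point
    mu, assignments = kmeans(pixels, K)    # cluster the pixels in RGB space
    # replace each pixel by the {R,G,B} triplet of the centre it was assigned to
    return mu[assignments].reshape(H, W, 3)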


• We can also use a clustering algorithm to perform data compression.

• It is important to distinguish between lossless data compression (reconstructing the original data exactly) and lossy data compression (accepting some errors in the reconstruction).

• We can apply the K-means algorithm to the problem of lossy data compression as follows:
  • For each of the N data points, we store only the identity k of the cluster to which it is assigned.
  • We also store the values of the K cluster centres µk, which typically requires significantly less data, provided we choose K << N.
  • Each data point is then approximated by its nearest centre µk.
  • New data points can similarly be compressed by first finding the nearest µk and then storing the label k instead of the original data vector.
  • This framework is often called vector quantization, and the vectors µk are called code-book vectors.
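A short sketch of the compression and reconstruction steps just described; the codebook is simply the array of centres µk, and the function names are illustrative:

import numpy as np

def vq_encode(X, codebook):
    """Store only the index k of the nearest code-book vector for each data point."""
    sq_dists = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)          # one small integer label per point

def vq_decode(labels, codebook):
    """Approximate each data point by its code-book vector."""
    return codebook[labels]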
• Suppose the original image has N pixels comprising {R,G,B} values, each of which is stored with 8 bits of precision.
• Transmitting the whole image directly would therefore cost 24N bits.

• Suppose we first run K-means on the image data, and then, instead of transmitting the original pixel intensity vectors, we transmit the identity of the nearest vector µk.
• Because there are K such vectors, this requires log2 K bits per pixel (rounded up to the nearest integer).
• We must also transmit the K code-book vectors µk, which requires 24K bits, so the total number of bits required to transmit the image is 24K + N log2 K.

• The original image shown in Figure 9.3 (above) has 240×180 = 43,200 pixels and so requires 24×43,200 = 1,036,800 bits to transmit directly.

• The compressed images require 43,248 bits (K = 2), 86,472 bits (K = 3), and 173,040 bits (K = 10), respectively.

• These represent compression ratios, compared to the original image, of 4.2%, 8.3%, and 16.7%, respectively.
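These figures can be checked with a few lines of arithmetic: log2 K bits (rounded up) per pixel, plus 24 bits per code-book vector.

import math

N = 240 * 180                      # 43,200 pixels
original_bits = 24 * N             # 1,036,800 bits

for K in (2, 3, 10):
    bits = 24 * K + N * math.ceil(math.log2(K))
    print(K, bits, f"{100 * bits / original_bits:.1f}%")
# K = 2  -> 43,248 bits   (~4.2%)
# K = 3  -> 86,472 bits   (~8.3%)
# K = 10 -> 173,040 bits  (~16.7%)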

• We see that there is a trade-off between the degree of compression and image quality.

• If we had been aiming to produce a good image compressor, then it would be more fruitful to consider small blocks of adjacent pixels, for instance 5×5, and thereby exploit the correlations that exist in natural images between nearby pixels.
https://youtu.be/0NNcVu9v3nw?si=q8yyJ9-0z8ypBLhW
Example (next slide)
https://youtu.be/YH0r47m0kFM?si=3oVMsdODjZLpfsHZ
Worked example: agglomerative (hierarchical) clustering (see figure):

• The minimum distance is between 42 and 43, so merge 42 and 43.
• The next minimum distance is between 25 and 27, so merge 25 and 27.
• The next minimum distance is between 22 and the cluster (25, 27), so merge 22 with (25, 27).
• The next minimum distance is between 18 and the cluster (22, 25, 27), so merge 18 with (22, 25, 27).

Second example (see figure):
• The minimum distance is between P3 and P6, so merge P3 and P6.
• The next minimum distance is between P4 and the cluster (P3, P6), so merge P4 with (P3, P6).
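The merging sequence above can be reproduced with SciPy's agglomerative clustering; this sketch assumes single linkage and takes the six values appearing in the merge steps as the data set:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# the six values seen in the worked example, as 1-D observations
points = np.array([[18.0], [22.0], [25.0], [27.0], [42.0], [43.0]])

# single linkage: at each step merge the two clusters whose closest members
# are nearest (42-43 first, then 25-27, then 22 with (25, 27), and so on)
Z = linkage(points, method="single")
print(Z)            # each row: cluster i, cluster j, merge distance, new cluster size

# dendrogram(Z)     # with matplotlib available, this draws the merge tree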
Density-based
clustering methods
• When partitioning and hierarchical clustering methods are used, the resulting clusters are spherical or nearly spherical in nature.

• In the case of other cluster shapes, such as S-shaped or unevenly shaped clusters, the above two types of method do not provide accurate results.

• The density-based clustering approach provides a solution for identifying clusters of arbitrary shapes.

• The principle is based on identifying the dense and sparse areas within the data set and then running the clustering algorithm.

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is one of the popular density-based algorithms; it creates clusters from connected regions of high density.
Why do we need a density-based clustering algorithm like DBSCAN when we already have K-means clustering?

• K-means clustering may cluster loosely related observations together. Every observation becomes part of some cluster, even if the observations are scattered far away in the vector space.
• Since clusters depend on the mean value of the cluster elements, each data point plays a role in forming the clusters, and a slight change in the data points might affect the clustering outcome.
• This problem is greatly reduced in DBSCAN due to the way clusters are formed.
• Another challenge with K-means is that you need to specify the number of clusters ("k") in order to use it. Much of the time, we won't know a reasonable value of k a priori.
• In DBSCAN, you don't have to specify the number of clusters to use it.
• All you need is a function to calculate the distance between values and some guidance on what amount of distance is considered "close". DBSCAN also produces more reasonable results than K-means across a variety of different distributions.
• The figure below illustrates this.
(Density-Based Spatial Clustering of Applications with Noise)

https://youtu.be/jqKAAVEwX9M?si=cnvsppj2X-KFBWA6
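A short scikit-learn sketch of DBSCAN on a non-spherical (two half-moons) data set; the eps and min_samples values are illustrative and would normally need tuning:

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# two interleaving half-moons: an S-like shape that K-means handles poorly
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: how close two points must be to count as neighbours
# min_samples: how many neighbours a point needs to be a core (dense) point
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_                 # cluster index per point; -1 marks noise
print("clusters found:", len(set(labels) - {-1}))
print("noise points:", int(np.sum(labels == -1)))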