ML Unit 4 V1
UNIT-IV
By
Dr. Satyabrata Dash
Professor
Department of Computer Science & Engineering
Ramachandra College of Engineering, Eluru
1. Unsupervised learning is different from supervised learning; as its name suggests, there is no
need for supervision.
2. It means, in unsupervised machine learning, the machine is trained using the unlabeled dataset,
and the machine predicts the output without any supervision.
3. In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.
4. The main aim of an unsupervised learning algorithm is to group or categorize the
unsorted dataset according to its similarities, patterns, and differences.
5. Machines are instructed to find the hidden patterns from the input dataset.
6. Unsupervised learning cannot be directly applied to a regression or classification problem
because unlike supervised learning, we have the input data but no corresponding output data.
7. The goal of unsupervised learning is to find the underlying structure of the dataset, group the
data according to similarities, and represent the dataset in a compressed format.
Unsupervised Learning
1. Clustering: Clustering is a method of grouping objects into clusters such that objects with
the most similarities remain in one group and have little or no similarity with objects of another
group. Cluster analysis finds the commonalities between data objects and categorizes them
according to the presence or absence of those commonalities.
2. Association: An association rule is an unsupervised learning method used for finding
relationships between variables in large databases. It determines the sets of items that
occur together in the dataset. Association rules make marketing strategies more effective; for
example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A
typical example of association rule mining is Market Basket Analysis.
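The support and confidence arithmetic behind a rule like "bread → butter" can be sketched in plain Python; the transactions below are made-up illustrative data, not from the slides.

```python
# Hypothetical market-basket transactions for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Rule: people who buy bread also tend to buy butter.
sup_bread = support({"bread"}, transactions)           # 4/5 = 0.8
sup_both = support({"bread", "butter"}, transactions)  # 3/5 = 0.6
confidence = sup_both / sup_bread                      # 0.6 / 0.8 = 0.75
print(sup_bread, sup_both, confidence)
```

Here 75% of the baskets containing bread also contain butter, which is the kind of co-occurrence Market Basket Analysis looks for.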
Clustering
1. Clustering is a technique in machine learning and data analysis that involves grouping similar
data points together into clusters.
2. The goal of clustering is to identify patterns or structures in the data that may not be
immediately apparent, and to gain insights into the underlying relationships among the data
points.
3. Clustering is commonly used in a wide range of applications, including image analysis,
natural language processing, bioinformatics, and customer segmentation.
4. There are several different types of clustering algorithms, including hierarchical clustering,
k-means clustering, and density-based clustering.
5. Clustering can be used for a variety of purposes, including exploratory data analysis, data
visualization, and anomaly detection.
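As a concrete sketch of points 4 and 5, the following groups synthetic data with scikit-learn's KMeans; the blob parameters and cluster count are illustrative assumptions, not from the slides.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three well-separated blobs (illustrative).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit k-means with k=3 and assign each point to a cluster.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print(len(set(labels)))          # number of distinct clusters found
print(km.cluster_centers_.shape) # one centroid per cluster, in 2-D
```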
Limitations of K-Means
1. Sensitivity to initial conditions: The final clustering result of K-Means can depend heavily on the
initial choice of cluster centers. Depending on the random initialization, K-Means can converge to
different local optima, resulting in different clustering results.
2. Assumes spherical clusters: The K-Means algorithm assumes that clusters are spherical in shape,
which may not be true for all types of data. If the clusters are elongated or have irregular shapes,
K-Means may fail to capture the true underlying structure of the data.
3. Requires predefined number of clusters: K-Means requires the number of clusters to be
predefined, which may not always be known in advance. Selecting the optimal number of clusters
can be a difficult task, and an incorrect choice can lead to poor clustering results.
4. Sensitive to outliers: K-Means is sensitive to outliers, as they can significantly affect the
location of cluster centers. Outliers can distort the clustering result and lead to incorrect
conclusions.
5. Cannot handle categorical data: K-Means is designed to work with numerical data, and
cannot handle categorical or binary data directly. Data preprocessing is required to convert
categorical data to numerical data before applying K-Means.
6. Cannot handle non-linear relationships: K-Means assumes that the clusters are separated
by linear boundaries. If the data has non-linear relationships between the features, K-Means
may not be able to capture the underlying structure of the data.
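Limitations 2 and 6 can be seen on scikit-learn's two-moons dataset, where the true clusters are crescent-shaped and not linearly separable; this is a minimal sketch with illustrative parameters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving crescent-shaped clusters (non-spherical).
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Agreement with the true moon labels stays well below 1.0 because
# k-means separates the points with a straight boundary.
ari = adjusted_rand_score(y_true, labels)
print(round(ari, 3))
```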
Clustering for Image Segmentation
1. In image segmentation, clustering is used to group pixels or regions in an image into different
clusters based on their similarity in terms of color, texture, or other visual features.
2. The resulting clusters can be used to segment the image into different regions or objects.
3. There are different clustering algorithms that can be used for image segmentation, including
k-means clustering, fuzzy c-means clustering, and spectral clustering.
4. K-means clustering is a popular algorithm that partitions the data into k clusters by
minimizing the distance between the data points and the centroid of each cluster, and it is
commonly used for image segmentation.
5. To apply clustering for image segmentation, we first represent the image as a matrix of pixels
with each pixel having a set of visual features such as color and texture. We then apply the
clustering algorithm to group the pixels into different clusters based on their similarity. Finally,
we assign each pixel to the cluster it belongs to, and we can use the resulting clusters to segment
the image into different regions.
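The recipe in point 5 can be sketched on a tiny synthetic "image" whose pixels carry only colour features; the image, its colours, and k=2 are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic 20x20 RGB image: left half reddish, right half bluish.
img = np.zeros((20, 20, 3))
img[:, :10] = [0.9, 0.1, 0.1] + rng.normal(0, 0.02, (20, 10, 3))
img[:, 10:] = [0.1, 0.1, 0.9] + rng.normal(0, 0.02, (20, 10, 3))

# Step 1: represent the image as a matrix of pixels (one row per pixel).
pixels = img.reshape(-1, 3)

# Step 2: cluster the pixels by colour similarity.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pixels)

# Step 3: map each pixel back to its cluster -> a segmentation mask.
segmented = labels.reshape(20, 20)
print(np.unique(segmented[:, :10]), np.unique(segmented[:, 10:]))
```

Each half of the image falls into its own segment, i.e. the mask recovers the two colour regions.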
Clustering for Feature Extraction and Preprocessing
1. Clustering can also be used for feature extraction. In some datasets, the raw data may be
high-dimensional and difficult to work with. Clustering can help identify the underlying structure in
the data and extract meaningful features that can be used for machine learning. For example, in
image processing, clustering can be used to extract texture features or color features from the
image.
2. Clustering can be a useful preprocessing technique for machine learning.
3. It can help with data cleaning, dimensionality reduction, and feature extraction, which can
improve the performance and generalization of machine learning models. However, it is
important to carefully choose the clustering algorithm and parameters based on the
characteristics of the dataset and the desired preprocessing task.
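One way clustering can serve as a feature extractor, sketched here, is scikit-learn's `KMeans.transform`, which maps each sample to its distances from the k centroids, turning 10 raw features into a compact k-dimensional representation; the data and parameters are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative high-ish dimensional data: 200 samples, 10 features.
X, _ = make_blobs(n_samples=200, n_features=10, centers=4, random_state=1)

km = KMeans(n_clusters=4, n_init=10, random_state=1).fit(X)

# transform() returns each sample's distance to every centroid,
# a 4-dimensional feature vector usable by a downstream model.
features = km.transform(X)
print(X.shape, "->", features.shape)
```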
DBSCAN Algorithm
1. Find all the neighboring points within eps of each point, and identify the core points: those
with more than MinPts neighbors.
2. For each core point if it is not already assigned to a cluster, create a new cluster.
3. Find recursively all its density connected points and assign them to the same cluster as the
core point.
4. Two points a and b are said to be density connected if there exists a point c that has a
sufficient number of points among its neighbors, and both a and b are within the eps distance of
it. This is a chaining process: if b is a neighbor of c, c is a neighbor of d, and d is a neighbor
of e, which in turn is a neighbor of a, then b is density connected to a.
5. Iterate through the remaining unvisited points in the dataset. Those points that do not belong
to any cluster are noise.
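The steps above are what scikit-learn's DBSCAN implements; this minimal sketch runs it on the two-moons data, with illustrative eps and min_samples (MinPts) values.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two crescent-shaped clusters that k-means handles poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps = neighborhood radius, min_samples = MinPts (illustrative values).
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_  # label -1 marks noise points (step 5)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```

Because density connectivity chains along each crescent, DBSCAN recovers the two moons without needing the cluster count in advance.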
Gaussian Mixture Models (GMM)
1. GMM has many applications, such as density estimation, clustering, and image segmentation. For
density estimation, GMM can be used to estimate the probability density function of a set of data
points.
2. For clustering, GMM can be used to group together data points that come from the same Gaussian
distribution.
3. For image segmentation, GMM can be used to partition an image into different regions.
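A minimal sketch of the first two uses with scikit-learn's GaussianMixture; the two-component synthetic data is illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative data drawn from two well-separated Gaussians.
X = np.vstack([rng.normal(-3, 1, (100, 2)),
               rng.normal(+3, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

labels = gmm.predict(X)            # clustering: most likely component
log_density = gmm.score_samples(X) # density estimation: log p(x)
print(len(set(labels)), log_density.shape)
```

`predict` groups points by their most probable Gaussian, while `score_samples` evaluates the fitted density itself.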
Dimensionality Reduction
1. The Curse of Dimensionality,
2. Main Approaches for Dimensionality Reduction,
3. PCA,
4. Using Scikit-Learn,
5. Randomized PCA,
6. Kernel PCA.
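As a short preview of these topics, a scikit-learn PCA sketch on the Iris dataset; the dataset and component count are illustrative (passing svd_solver="randomized" to PCA gives the randomized variant, and KernelPCA the kernelized one).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data          # 150 samples, 4 features

# Project the 4-D data onto its 2 leading principal components.
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

print(X2.shape)
print(pca.explained_variance_ratio_.sum())  # variance kept by 2 components
```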