Unsupervised Learning (1)
What is Unsupervised Learning?
Unsupervised learning is a machine learning approach in which algorithms explore unlabeled data to discover patterns.
Unlike supervised learning, it does not require labeled examples or explicit target variables.
Supervised vs. Unsupervised Learning:
Data Input: supervised learning uses labeled data (inputs x, labels y); unsupervised learning uses unlabeled data (inputs x only).
What is Clustering?
Clustering is an unsupervised machine learning method used to identify groups (clusters) of similar data points in an unlabeled dataset.
The algorithm automatically organizes data points into meaningful groups based on similarity or distance criteria
without predefined labels.
Intuition:
Data points within the same cluster share similarities, while data points in different clusters exhibit distinct
characteristics.
Motivation:
Materials science data (structural, compositional, and performance-related) are often complex and unlabeled.
Clustering can uncover meaningful groupings and hidden patterns that provide new insights and guide further investigation.
The K-Means Algorithm
What is K-Means?
K-Means clustering is a popular unsupervised algorithm that partitions data into K distinct clusters based on distance to
centroids.
Step-by-Step Example:
Step 1: Initialization: Randomly choose K initial centroids (cluster centers) from the dataset.
Step 2: Assigning Points to Centroids: Assign each data point to the nearest centroid based on Euclidean distance.
Step 3: Updating (move) Centroids: Calculate new centroid positions by taking the average of all points assigned to each
centroid.
Repeat Steps 2 and 3 until convergence: the centroids no longer change position significantly, and the cluster assignments remain stable.
(Figures illustrating the initialization, assignment, and update steps; source: https://www.nvidia.com/en-au/glossary/k-means/)
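The steps above can be sketched in a few lines of NumPy (an illustrative implementation, not from the slides; the two-blob data and the seed are made up for the example):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-Means following Steps 1-3 above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialization -- pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                  else centroids[j] for j in range(k)])
        # Convergence: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two made-up, well-separated blobs of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
```

With blobs this far apart, the two clusters found coincide with the two blobs regardless of which points are drawn as initial centroids.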
Formalizing the K-Means Algorithm
Notation:
Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, with $x^{(i)} \in \mathbb{R}^n$. $K$: number of clusters.
Centroids: $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$.
Algorithm Steps:
1. Initialization: Randomly select $K$ initial centroids $\mu_1, \dots, \mu_K$.
2. Cluster Assignment:
o Assign each data point $x^{(i)}$ to the nearest centroid: $c^{(i)} := \arg\min_k \| x^{(i)} - \mu_k \|^2$.
3. Centroid Update:
o Update centroid positions by computing the mean of assigned points:
$$\mu_k := \frac{1}{|C_k|} \sum_{x^{(i)} \in C_k} x^{(i)}$$
where $C_k$ is the set of points assigned to cluster $k$.
4. Repeat:
o Repeat Steps 2 and 3 until the centroids no longer move significantly or the assignments stop changing.
Optimization Objective in K-Means
Cost (Distortion) Function:
$$J\left(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K\right) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$
The cluster assignment step minimizes $J$ over the assignments $c^{(i)}$ by assigning each point to its nearest centroid.
The centroid update step minimizes $J$ over the centroids $\mu_k$ by moving each centroid to the position that reduces the squared distances within its cluster.
Convergence:
The distortion function monotonically decreases (or remains constant) with each iteration.
The algorithm is therefore guaranteed to converge to a local optimum (a local minimum of the distortion).
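The monotone decrease of the distortion can be checked numerically. This sketch (the data and helper name are my own, made up for illustration) records $J$ after every assignment step:

```python
import numpy as np

def distortion(X, labels, centroids):
    # J = (1/m) * sum_i || x^(i) - mu_{c^(i)} ||^2
    return float(np.mean(np.sum((X - centroids[labels]) ** 2, axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # made-up data
centroids = X[rng.choice(200, size=3, replace=False)]

history = []
for _ in range(10):
    # assignment step, then record J, then update step
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    history.append(distortion(X, labels, centroids))
    centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                          else centroids[j] for j in range(3)])

# J never increases from one iteration to the next
assert all(b <= a + 1e-9 for a, b in zip(history, history[1:]))
```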
Practical Aspects – Initialization of K-Means
Importance of Initialization: Different initial centroid positions can lead to very different clustering outcomes.
Common Initialization Strategies:
Random Selection: Choose K random data points from the dataset as initial centroids.
Multiple Initializations:
o Run K-Means multiple times with different initial centroids.
o Select clustering with lowest distortion.
Example of Good vs. Poor Initialization:
Good Initialization: Leads to clear, intuitive clusters with minimal distortion.
Poor Initialization: Can result in poor local minima, suboptimal clusters, and higher distortion values.
Best Practices:
Typically use multiple random initializations (e.g., 50-100 times).
Evaluate and select the clustering result that gives the lowest distortion.
Consider advanced initialization methods (e.g., K-Means++ algorithm) for improved results.
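The multiple-initialization practice can be sketched as follows (the helper, the synthetic three-blob data, and the run count are illustrative assumptions):

```python
import numpy as np

def one_kmeans_run(X, k, rng, n_iters=50):
    """One K-Means run from a random initialization; returns (distortion, labels)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                              else centroids[j] for j in range(k)])
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    J = float(np.mean(np.sum((X - centroids[labels]) ** 2, axis=1)))
    return J, labels

# Synthetic data with 3 clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in (0.0, 3.0, 6.0)])

# Run K-Means 50 times and keep the clustering with the lowest distortion
runs = [one_kmeans_run(X, k=3, rng=rng) for _ in range(50)]
best_J, best_labels = min(runs, key=lambda r: r[0])
```

In practice, scikit-learn's `KMeans` performs this automatically via its `n_init` parameter, and defaults to k-means++ initialization.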
The Challenge of Selecting 'k'
Why is choosing 'k' difficult?
Ambiguity Illustrated:
Different observers may suggest different numbers of clusters from the same dataset.
Example: One scientist sees 2 distinct clusters, another might identify 4 distinct clusters in the same dataset.
Ambiguity occurs because clustering outcomes depend on interpretation, research context, and data complexity.
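One concrete way to see the difficulty: the distortion always shrinks as k grows, so it cannot select k by itself. A common heuristic (an assumption here, not stated in the slides) is to compute the distortion over a range of k values and look for the "elbow" where the curve flattens. A sketch with synthetic three-cluster data:

```python
import numpy as np

def kmeans_distortion(X, k, n_init=5, n_iters=50):
    """Best distortion over a few random initializations (illustrative helper)."""
    best = np.inf
    for seed in range(n_init):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
            centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                  else centroids[j] for j in range(k)])
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        best = min(best, float(np.mean(np.sum((X - centroids[labels]) ** 2, axis=1))))
    return best

# Synthetic data with 3 true clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in (0.0, 4.0, 8.0)])

# Distortion drops sharply until k reaches the true cluster count, then flattens
Js = {k: kmeans_distortion(X, k) for k in range(1, 7)}
```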
Anomaly Detection vs. Supervised Learning:
Number of positive examples: very few (0-20) for anomaly detection; moderate to large for supervised learning.
The Gaussian distribution models the probability of a random variable x being seen in the dataset:
$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
Parameters:
$\mu$: mean (center of the distribution); $\sigma^2$: variance (spread).
Visual Intuition:
A bell-shaped curve: probability density is highest near the mean and low for points far from the mean.
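A quick numerical check of this intuition, using the density formula above (a short sketch; the standard normal parameters are chosen purely for illustration):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density p(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Density is highest at the mean and falls off in the tails
p_mean = gaussian_pdf(0.0, mu=0.0, sigma2=1.0)  # ≈ 0.3989
p_far = gaussian_pdf(3.0, mu=0.0, sigma2=1.0)   # ≈ 0.0044
```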
Anomaly Detection Algorithm: Mathematical Formulation
Step 1: Model the Data with Gaussian Distributions: Assume each feature $x_j$ is modeled as a Gaussian: $x_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$.
o Estimate the mean ($\mu_j$) and variance ($\sigma_j^2$) for each feature $j$:
$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left( x_j^{(i)} - \mu_j \right)^2$$
Step 2: Compute the Probability of a Data Point
Given a new point $x$, compute its probability:
$$p(x) = \prod_{j=1}^{n} p\!\left(x_j; \mu_j, \sigma_j^2\right) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left( -\frac{(x_j - \mu_j)^2}{2\sigma_j^2} \right)$$
Flag $x$ as anomalous if $p(x) < \varepsilon$ for a chosen threshold $\varepsilon$.
Example:
Anomalous engine data points have significantly lower probability than normal points
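Putting Steps 1 and 2 together (a minimal sketch; the engine-like feature values, the function names, and the threshold $\varepsilon = 10^{-4}$ are illustrative assumptions, not from the slides):

```python
import numpy as np

def fit_gaussian(X):
    """Step 1: estimate per-feature mean and variance."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # (1/m) * sum_i (x_j^(i) - mu_j)^2
    return mu, sigma2

def p(x, mu, sigma2):
    """Step 2: product of per-feature univariate Gaussian densities."""
    dens = np.exp(-((x - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return float(np.prod(dens))

# Made-up "normal engine" measurements: two features (e.g. heat, vibration)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[10.0, 5.0], scale=[1.0, 0.5], size=(500, 2))
mu, sigma2 = fit_gaussian(X_train)

x_normal = np.array([10.2, 5.1])    # close to the training distribution
x_anomaly = np.array([14.0, 2.0])   # far from it
epsilon = 1e-4                      # illustrative threshold

is_anomaly = p(x_anomaly, mu, sigma2) < epsilon
```

The anomalous point lands several standard deviations from the mean in both features, so its probability falls far below that of the normal point.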