004 UnSupervised Learning

The document covers the fundamental concepts of unsupervised learning, including key clustering algorithms and dimensionality reduction techniques. It outlines learning outcomes, content structure, and evaluation methods for clustering results. Additionally, it provides examples of various clustering algorithms and their applications in real-world scenarios.

Uploaded by

yjun042003
Copyright © All Rights Reserved

CT032-3-3-FAI & Further Artificial Intelligence

UNSUPERVISED LEARNING

Module Code & Module Title Slide Title SLIDE 1


TOPIC LEARNING OUTCOMES
At the end of this topic, you should be able to:
1. Explain the fundamental concepts of unsupervised learning
2. Distinguish between key clustering algorithms
3. Understand dimensionality reduction techniques
4. Apply basic clustering algorithms to real-world datasets
5. Evaluate the quality of clustering results



CONTENTS & STRUCTURE
1. Fundamental concepts of unsupervised learning
2. Key clustering algorithms
3. Dimensionality reduction techniques
4. Quality of clustering results



What is Unsupervised Learning?

Core Concepts of Unsupervised Learning:


1. Unlabelled Data
2. Pattern Recognition
3. Data Distribution Understanding
4. Dimensionality
5. Iterative Nature
6. Evaluation Challenges

Image credits: google.com


What is Unsupervised Learning?...cont.

Unlabeled Data
• No labels are given to the learning algorithm, leaving it on its
own to find structure in its input.
• Unsupervised machine learning cannot be applied directly to a regression problem: the output values are unknown, so there are no targets to train the algorithm against in the usual way.

Image credits: google.com


What is Unsupervised Learning?...cont.

Pattern Recognition
• Typical patterns to find include:
1. Natural groupings of similar items
2. Hidden relationships between variables
3. Common features or characteristics
4. Abnormal or unusual patterns

Image credits: google.com


What is Unsupervised Learning?...cont.

Data Distribution Understanding


• Unsupervised learning helps understand how data is distributed
and organized in the feature space.

• This includes:
1. Density estimation: Understanding where data points are
concentrated
2. Feature relationships: How different variables correlate or
interact
3. Data structure: The inherent organization or hierarchy in the
data
Image credits: google.com
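Density estimation can be illustrated with a kernel density estimate (KDE). The sketch below assumes SciPy is available and uses made-up 2-D points (a dense blob plus a few scattered points); `gaussian_kde` estimates the density at any location, so a crowded spot should score far higher than a sparse one.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical 2-D data: a dense blob around (0, 0) plus sparse points spread widely
dense = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
sparse = rng.uniform(low=-6.0, high=6.0, size=(20, 2))
points = np.vstack([dense, sparse])

# gaussian_kde expects shape (n_dims, n_points), hence the transpose
kde = gaussian_kde(points.T)

# Estimated density at a crowded location and at a sparse one
query = np.array([[0.0, 0.0], [5.0, 5.0]]).T
density_center, density_corner = kde(query)
print(f"density near (0, 0): {density_center:.4f}")
print(f"density near (5, 5): {density_corner:.6f}")
```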
What is Unsupervised Learning?...cont.

Dimensionality
• Many unsupervised learning tasks involve working with high-
dimensional data where patterns are not immediately obvious.

• The algorithms help by:


1. Reducing the dimensions while retaining important
information
2. Finding the most important features
3. Creating more compact representations of the data

Image credits: google.com


What is Unsupervised Learning?...cont.

Iterative Nature
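• K-means is an unsupervised clustering algorithm that partitions unlabelled data into a chosen number (the "K") of distinct groups.
• It works iteratively: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing.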

Image credits: towardsdatascience.com
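The iterative loop behind K-means can be sketched in a few lines of NumPy. This is a minimal version of the standard assign-then-update procedure (Lloyd's algorithm) with random initial centroids and made-up 2-D data; the function name `kmeans` and its parameters are illustrative, not a library API.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal Lloyd's-algorithm K-means: returns (centroids, labels, n_iterations)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random (a copy of those rows)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance)
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = distances.argmin(axis=1)
        # Converged: no point changed cluster in this pass
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties out
                centroids[j] = members.mean(axis=0)
    return centroids, labels, iteration

# Usage on three made-up blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ([0, 0], [4, 4], [0, 4])])
centroids, labels, n_iter = kmeans(X, k=3)
print("iterations:", n_iter)
print("centroids:\n", np.round(centroids, 2))
```

Production implementations such as scikit-learn's `KMeans` add smarter initialization (k-means++) and several restarts, because the final result depends on the starting centroids.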


What is Unsupervised Learning?...cont.

Iterative Nature

Image credits: towardsdatascience.com


What is Unsupervised Learning?...cont.

Evaluation Challenges

• Unlike supervised learning, there's often no clear "right answer" to validate against.

• Success is typically measured by:
1. Internal metrics (like cluster cohesion)
2. Domain expertise validation
3. Practical usefulness of the discovered patterns

Image credits: google.com


Quick Review Question

Why is unsupervised learning called "unsupervised" and what is its main goal?

Image credits: google.com


Key Clustering Algorithms

K-means Clustering
• Example: Segmentation in a gym:
– Input: Member data (age, visit frequency, class preferences)
• Process: Divides into K groups (e.g., K=3)
• Results:
– Cluster 1: Young, frequent visitors, prefer group classes
– Cluster 2: Middle-aged, moderate visits, prefer machines
– Cluster 3: Seniors, regular visits, prefer morning sessions

Image credits: google.com
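A hedged scikit-learn sketch of the gym example. The member data is synthetic (three made-up groups matching the ages, visit rates and class preferences above), and the features are standardized first because K-means uses Euclidean distance, so age, measured in tens of years, would otherwise dominate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical members: [age, visits per week, fraction of visits that are group classes]
young = np.column_stack([rng.normal(24, 3, 100), rng.normal(5, 1, 100), rng.uniform(0.6, 0.9, 100)])
middle = np.column_stack([rng.normal(42, 4, 100), rng.normal(3, 1, 100), rng.uniform(0.0, 0.3, 100)])
senior = np.column_stack([rng.normal(67, 4, 100), rng.normal(4, 1, 100), rng.uniform(0.2, 0.5, 100)])
members = np.vstack([young, middle, senior])

# Standardize each feature to mean 0, variance 1 before clustering
scaled = StandardScaler().fit_transform(members)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Describe each cluster in the original units
for k in range(3):
    age, visits, group_share = members[labels == k].mean(axis=0)
    print(f"Cluster {k}: age {age:.0f}, {visits:.1f} visits/week, {group_share:.0%} group classes")
```

In practice K is not known in advance; trying several values and comparing the results (for example with the silhouette score) is a common way to choose it.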


Key Clustering Algorithms ...cont.

Hierarchical Clustering
• Example: Organizing a company's product catalog:
• Top Level:
• Electronics, Clothing, Home goods
• Second Level:
• Electronics → Phones, Laptops, Accessories
• Clothing → Men's, Women's, Children's
• Third Level:
• Phones → Budget, Mid-range, Premium
• Creates a tree-like structure showing relationships at different levels.

Image credits: google.com
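One way to build such a tree from data is agglomerative (bottom-up) hierarchical clustering, sketched here with SciPy. The eight 2-D "product" vectors are made up to contain two broad groups, each with two tighter sub-groups; cutting the same tree at 2 clusters and at 4 clusters mirrors the top and second levels of the catalog.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2-D feature vectors for 8 products
products = np.array([
    [0.0, 0.0], [0.2, 0.1],      # group A, sub-group A1
    [1.0, 0.0], [1.1, 0.2],      # group A, sub-group A2
    [10.0, 10.0], [10.1, 10.2],  # group B, sub-group B1
    [11.0, 10.0], [11.2, 10.1],  # group B, sub-group B2
])

# Agglomerative clustering: repeatedly merge the two closest groups.
# Ward linkage merges the pair that least increases within-cluster variance.
tree = linkage(products, method="ward")

# Cutting the same tree at different heights gives coarser or finer groupings
top_level = fcluster(tree, t=2, criterion="maxclust")
second_level = fcluster(tree, t=4, criterion="maxclust")
print("2 clusters:", top_level)
print("4 clusters:", second_level)
```

`fcluster` numbers clusters from 1. `scipy.cluster.hierarchy.dendrogram(tree)` can draw the full tree-like structure.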


Key Clustering Algorithms ...cont.

DBSCAN (Density-Based)
• Example: Analyzing city locations:
• Core Points: Dense areas (shopping districts)
• Border Points: Edge of neighborhoods
• Noise: Isolated locations

• Finds irregularly shaped clusters, such as:
1. Shopping districts of any shape
2. Connected neighborhoods
• Identifies outliers (isolated stores) as noise

Image credits: google.com
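A minimal DBSCAN sketch with scikit-learn. The data (two interleaving half-moons plus three hand-placed isolated points) stands in for the city map, and `eps=0.2` / `min_samples=5` are illustrative values chosen for this data, not general defaults. DBSCAN labels noise points `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: non-spherical clusters that K-means tends to split badly
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Add a few isolated points far from both moons to act as outliers
outliers = np.array([[-2.0, 2.0], [3.0, -1.5], [2.5, 2.0]])
X = np.vstack([X, outliers])

# eps: neighborhood radius; min_samples: points (including itself) needed for a core point
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_  # -1 marks noise

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("noise point indices:", np.where(labels == -1)[0])
print("core points:", len(db.core_sample_indices_))
```

Unlike K-means, the number of clusters is not an input; it follows from the density settings.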


Key Clustering Algorithms ...cont.

Gaussian Mixture Models


• Example: Analyzing customer spending:
• Overlapping groups based on probability
• Can identify:
• Budget shoppers ($0-50 range)
• Mid-range ($30-150 range)
• Luxury ($100+ range)
• Allows for fuzzy boundaries: each customer gets a probability of belonging to each group, so a customer spending $40 might belong partially to both the budget and mid-range groups.

Image credits: google.com
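A Gaussian mixture sketch with scikit-learn on made-up spending amounts for three overlapping groups. `predict_proba` returns one probability per component for each customer, which is the soft membership described above. Component order from the fit is arbitrary, so the code sorts components by their mean before reporting.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical spend per visit ($) for three overlapping shopper groups
spend = np.concatenate([
    rng.normal(25, 8, 300),    # budget
    rng.normal(90, 25, 300),   # mid-range
    rng.normal(220, 60, 150),  # luxury
]).reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

gmm = GaussianMixture(n_components=3, random_state=0).fit(spend)

# Sort components by mean so columns read budget, mid-range, luxury
order = np.argsort(gmm.means_.ravel())
print("component means ($):", np.round(gmm.means_.ravel()[order], 1))

# Soft assignment: probability of belonging to each group
for amount in (20.0, 45.0, 130.0, 300.0):
    probs = gmm.predict_proba([[amount]])[0][order]
    print(f"${amount:.0f}: budget {probs[0]:.2f}, mid-range {probs[1]:.2f}, luxury {probs[2]:.2f}")
```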


Quick Review Question

What's the key difference between K-means and DBSCAN clustering?

Image credits: google.com



Dimensionality Reduction Techniques

Principal Component Analysis (PCA)


• Example: Analyzing smartphone features
– Original 6 dimensions:
• Screen size, Battery life, Camera quality, Processor speed, RAM,
Storage

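• Reduced to 2 key components:
• PC1: "Performance" (weighted mainly toward processor, RAM, storage)
• PC2: "User Experience" (weighted mainly toward screen, battery, camera)
• Each component is a weighted combination of all six original features; names such as "Performance" are human interpretations of the largest weights.
• This simplifies analysis while keeping as much of the original variance (information) as two components can.
Image credits: google.com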
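A PCA sketch with scikit-learn. The smartphone data is synthetic: by assumption, a hidden "performance" factor drives processor, RAM and storage, and a noisier "experience" factor drives screen, battery and camera. Under these assumptions the first component should weight the performance features most heavily, but the component names remain human readings of the printed weights.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_phones = 200

# Hypothetical data: two hidden factors drive the six observed features
performance = rng.normal(size=n_phones)
experience = rng.normal(size=n_phones)

def noisy(factor, scale):
    """Observed feature = hidden factor + measurement noise."""
    return factor + rng.normal(scale=scale, size=n_phones)

phones = np.column_stack([
    noisy(experience, 0.8),   # screen size
    noisy(experience, 0.8),   # battery life
    noisy(experience, 0.8),   # camera quality
    noisy(performance, 0.3),  # processor speed
    noisy(performance, 0.3),  # RAM
    noisy(performance, 0.3),  # storage
])
names = ["screen", "battery", "camera", "processor", "RAM", "storage"]

# PCA centers the data but does not rescale it, so standardize features first
scaled = StandardScaler().fit_transform(phones)
pca = PCA(n_components=2)
scores = pca.fit_transform(scaled)  # each phone is now 2 numbers instead of 6

print("share of variance kept:", np.round(pca.explained_variance_ratio_, 3))
for i, weights in enumerate(pca.components_, start=1):
    print(f"PC{i} weights:", {n: round(float(w), 2) for n, w in zip(names, weights)})
```

`explained_variance_ratio_` shows how much of the standardized variance each component keeps; with real data it guides how many components are worth retaining.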
Dimensionality Reduction Techniques ...cont.

t-SNE (t-Distributed Stochastic Neighbor Embedding)


• Example: Visualizing customer behaviors
• Original dimensions:
• Purchase history, Browsing patterns, Time spent on site, Cart
abandonment, Return history, Review ratings

• Reduces to 2D visualization showing:
• Clear clusters of similar customers
• Outlier detection
• Relationship patterns
• Especially good for visualizing complex patterns.
Image credits: google.com
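A t-SNE sketch with scikit-learn, using synthetic "customer" data from `make_blobs` (six features, four hidden customer types) in place of real behavior logs. The `perplexity` and `random_state` values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Hypothetical customers: 6 behavioral features, 4 underlying customer types
X, customer_type = make_blobs(n_samples=400, n_features=6, centers=4,
                              cluster_std=1.5, random_state=0)

# perplexity roughly sets how many neighbors each point pays attention to
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)  # one (x, y) position per customer, ready to plot

# Average 2-D position of each (normally unknown) customer type, for reference
for t in np.unique(customer_type):
    print(t, np.round(embedding[customer_type == t].mean(axis=0), 1))
```

t-SNE preserves local neighborhoods, so nearby points in the plot are similar customers, but distances between clusters and cluster sizes in the 2D picture are not reliable. scikit-learn's `TSNE` also has no `transform` method for new points, unlike `PCA`, which is one reason t-SNE is used mainly for visualization.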
Dimensionality Reduction Techniques ...cont.

Autoencoders
• Example: Image compression
• Input: 1000x1000 pixel image
• Process:
• Encoder compresses to key features
• Decoder reconstructs from compressed form

• Learns an efficient representation
• Output: An approximate reconstruction of the same image from much less data (the compression is lossy)

Image credits: google.com
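Image autoencoders are usually built as convolutional networks in a deep learning framework. As a small, self-contained stand-in, this sketch trains scikit-learn's `MLPRegressor` to reproduce its own input through a 16-unit bottleneck, using the 8x8 digit images bundled with scikit-learn rather than 1000x1000 photos. Layer size, activation and training settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPRegressor

# 1,797 grayscale digit images, 8x8 = 64 pixels each, scaled to [0, 1]
images = load_digits().data / 16.0

# Train the network to reproduce its own input (X -> X) through a narrow
# hidden layer: 64 pixels -> 16-number code -> 64 pixels
autoencoder = MLPRegressor(hidden_layer_sizes=(16,), activation="tanh",
                           max_iter=400, tol=1e-6, random_state=0)
autoencoder.fit(images, images)

# Encoder: apply the first layer's learned weights to get each image's code
codes = np.tanh(images @ autoencoder.coefs_[0] + autoencoder.intercepts_[0])

# predict() runs encoder and decoder together
reconstructed = autoencoder.predict(images)

mse = np.mean((images - reconstructed) ** 2)
baseline = np.mean((images - images.mean(axis=0)) ** 2)  # always predict the average image
print(f"code size: {codes.shape[1]} numbers (from {images.shape[1]} pixels)")
print(f"reconstruction MSE {mse:.4f} vs. average-image baseline {baseline:.4f}")
```

Because 64 pixels pass through only 16 numbers, the reconstruction is approximate; comparing its error with the average-image baseline shows how much structure the code has captured.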


Dimensionality Reduction Techniques ...cont.

LDA (Linear Discriminant Analysis)


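• Example: Text document analysis
• Original: Thousands of word-count features per document, each document labelled with a known category (e.g., sports, politics, technology)
• Reduced to:
• At most (number of categories - 1) dimensions
• The directions that best separate the known categories
• Helps in document classification. Because it needs class labels, Linear Discriminant Analysis is a supervised technique; discovering key topics and main themes in unlabelled text is the job of a different "LDA", Latent Dirichlet Allocation (topic modeling).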

Image credits: google.com
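A Linear Discriminant Analysis sketch with scikit-learn. Real documents would first be converted to word-count features; here `make_classification` generates synthetic labelled data (300 "documents", 50 features, 3 categories) as a stand-in. The key contrast with PCA is that fitting requires the labels `y`.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical labelled data standing in for documents: 300 samples,
# 50 numeric features (e.g. word counts), 3 known categories
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

# Unlike PCA, fitting needs the labels y: LDA finds directions that best separate the classes
lda = LinearDiscriminantAnalysis(n_components=2)  # at most n_classes - 1 = 2
X_reduced = lda.fit_transform(X, y)

print(X_reduced.shape)  # (300, 2)
print("training accuracy:", round(lda.score(X, y), 3))
```

With 3 categories LDA can produce at most 2 components, which is why `n_components=2` here.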


Quick Review Question

When would you choose t-SNE over PCA for dimensionality reduction?

Image credits: google.com



Quality of Clustering Results

Silhouette Score
• Example: Analyzing customer segments
• Score range: -1 to 1
• Higher is better
• Example scores:
• 0.8: Clear separation between young, middle-aged, and senior
shoppers
• 0.3: Some overlap in shopping patterns between age groups
• -0.1: Poor clustering, age groups not meaningfully separated

Image credits: google.com
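The silhouette score can be computed directly from its definition: for each point, a is the mean distance to the other members of its own cluster, b is the mean distance to the nearest other cluster, and the point's score is (b - a) / max(a, b). The sketch below implements that with NumPy, checks it against scikit-learn's `silhouette_score`, and compares well-separated with overlapping synthetic blobs (the `cluster_std` values are arbitrary choices).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def silhouette(X, labels):
    """Mean silhouette coefficient, computed directly from its definition."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:  # singleton cluster: score defined as 0
            continue
        # a(i): mean distance to the other members of i's cluster (dist[i, i] is 0)
        a = dist[i, same].sum() / (same.sum() - 1)
        # b(i): mean distance to the nearest other cluster
        b = min(dist[i, labels == k].mean() for k in set(labels) if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

results = {}
for std in (0.5, 3.0):
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=std, random_state=0)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    results[std] = (silhouette(X, labels), silhouette_score(X, labels))
    print(f"cluster_std={std}: from scratch {results[std][0]:.3f}, scikit-learn {results[std][1]:.3f}")
```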


Quality of Clustering Results...cont.

Intra-Cluster Distance
• Example: Restaurant groupings by cuisine and price
• Good clustering:
• All Italian fine dining restaurants tightly grouped
• All fast food chains closely clustered
• Poor clustering:
• Mix of expensive and budget restaurants in same cluster
• Different cuisines randomly mixed together

Image credits: google.com
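Intra-cluster distance has several definitions (average pairwise distance, cluster diameter, and others); this sketch assumes the average distance from each point to its own cluster's centroid. The 2-D restaurant features are made up, and the "poor" clustering is simulated by shuffling the labels so that each cluster mixes both restaurant types.

```python
import numpy as np

def mean_intra_cluster_distance(X, labels):
    """Average distance from each point to its own cluster's centroid (lower = tighter)."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return total / len(X)

rng = np.random.default_rng(0)
# Hypothetical restaurants described by [average price, formality], both on a 0-10 scale
fine_dining = rng.normal([9.0, 8.0], 0.3, size=(20, 2))
fast_food = rng.normal([2.0, 1.0], 0.3, size=(20, 2))
X = np.vstack([fine_dining, fast_food])

good_labels = np.repeat([0, 1], 20)         # each type in its own cluster
poor_labels = rng.permutation(good_labels)  # same cluster sizes, types mixed at random

good_score = mean_intra_cluster_distance(X, good_labels)
poor_score = mean_intra_cluster_distance(X, poor_labels)
print(f"good clustering: {good_score:.2f}")
print(f"poor clustering: {poor_score:.2f}")
```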


Quality of Clustering Results...cont.

Inter-Cluster Distance
• Example: Product categories in e-commerce
• Good separation:
• Electronics cluster clearly distinct from clothing cluster
• Beauty products well-separated from tools
• Poor separation:
• Sports equipment overlapping with casual wear
• Home decor mixing with kitchen appliances

Image credits: google.com
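Inter-cluster distance also has several definitions; this sketch assumes the distance between cluster centroids and reports the smallest one, since the closest pair of clusters is where separation is weakest. The 2-D product vectors are made up, with a "sports gear" group placed deliberately close to "clothing".

```python
import numpy as np
from itertools import combinations

def min_centroid_distance(X, labels):
    """Smallest distance between any two cluster centroids (higher = better separated)."""
    centroids = {k: X[labels == k].mean(axis=0) for k in np.unique(labels)}
    return min(np.linalg.norm(centroids[a] - centroids[b])
               for a, b in combinations(centroids, 2))

rng = np.random.default_rng(1)
# Hypothetical 2-D product vectors for three categories
electronics = rng.normal([0.0, 0.0], 0.5, size=(30, 2))
clothing = rng.normal([6.0, 0.0], 0.5, size=(30, 2))
sports_gear = rng.normal([6.5, 0.5], 0.5, size=(30, 2))  # placed close to clothing

separated = min_centroid_distance(np.vstack([electronics, clothing]), np.repeat([0, 1], 30))
overlapping = min_centroid_distance(np.vstack([electronics, clothing, sports_gear]),
                                    np.repeat([0, 1, 2], 30))
print(f"electronics vs clothing only: {separated:.2f}")
print(f"with sports gear added:       {overlapping:.2f}")
```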


Quality of Clustering Results...cont.

Stability Analysis
• Example: Customer behavior clustering
• Stable results:
• Similar clusters appear even with different data samples
• Patterns remain consistent over time
• Unstable results:
• Clusters change dramatically with small data changes
• No consistent patterns emerge

Image credits: google.com
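One common way to test stability is to re-run clustering on random subsamples and measure how much the results agree. This sketch assumes K-means, 80% subsamples and the adjusted Rand index (ARI: 1.0 for identical partitions, around 0 for chance agreement) as the agreement measure; each fitted model labels the full dataset so runs can be compared point by point. Data with clear clusters is contrasted with uniform noise that has no real clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

def stability(X, k, n_rounds=10, frac=0.8, seed=0):
    """Average agreement (ARI) between K-means runs fitted on random subsamples."""
    rng = np.random.default_rng(seed)
    runs = []
    for r in range(n_rounds):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        model = KMeans(n_clusters=k, n_init=10, random_state=r).fit(X[idx])
        runs.append(model.predict(X))  # label every point with this run's model
    # Compare every pair of runs; ARI ignores how the cluster IDs are numbered
    scores = [adjusted_rand_score(runs[i], runs[j])
              for i in range(n_rounds) for j in range(i + 1, n_rounds)]
    return float(np.mean(scores))

X_clear, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X_noise = np.random.default_rng(0).uniform(-10, 10, size=(300, 2))  # no real clusters

clear_score = stability(X_clear, k=3)
noise_score = stability(X_noise, k=3)
print("clear structure:", round(clear_score, 3))
print("no structure:   ", round(noise_score, 3))
```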


Quick Review Question

What does a high silhouette score (close to 1) tell you about your clustering results?

Image credits: google.com



Summary / Recap of Main Points

• Fundamental Concepts of Unsupervised Learning


• Works with unlabeled data – no predefined correct answers
• Focuses on finding hidden patterns and structures
• Aims to understand data distribution and relationships
• Handles high-dimensional data through reduction
• Uses iterative processes to improve results
• Requires careful evaluation due to lack of ground truth



Summary / Recap of Main Points…cont.

• Key Clustering Algorithms


• K-means: Simple, fast, requires predefined number of clusters
• Hierarchical Clustering: Creates tree-like structure, good for
nested relationships
• DBSCAN: Handles irregular shapes, identifies outliers
automatically
• Gaussian Mixture Models: Allows probabilistic cluster membership
Each suitable for different types of data and objectives



Summary / Recap of Main Points…cont.

• Dimensionality Reduction Techniques


• PCA: Linear reduction, preserves maximum variance
• t-SNE: Good for visualization, preserves local relationships
• Autoencoders: Non-linear reduction using neural networks
• LDA: Supervised reduction, optimizes class separation
Choose based on data type and visualization needs



Summary / Recap of Main Points…cont.

• Quality of Clustering Results


• Silhouette Score: Measures cluster cohesion and separation
• Intra-Cluster Distance: Evaluates similarity within clusters
• Inter-Cluster Distance: Assesses separation between clusters
• Stability Analysis: Tests robustness of clustering results
Essential for validating clustering effectiveness



What To Expect Next Week

In Class:
• Unsupervised Learning

Preparation for Class:
• Clustering & Classification
