004 Unsupervised Learning
UNSUPERVISED LEARNING
Unlabeled Data
• No labels are given to the learning algorithm, leaving it on its
own to find structure in its input.
• Unsupervised machine learning cannot be applied directly to
regression problems: the output values are unknown, so there are
no targets to train the algorithm against in the usual way.
Pattern Recognition
• This could be finding:
1. Natural groupings of similar items
2. Hidden relationships between variables
3. Common features or characteristics
4. Abnormal or unusual patterns
• This includes:
1. Density estimation: Understanding where data points are
concentrated
2. Feature relationships: How different variables correlate or
interact
3. Data structure: The inherent organization or hierarchy in the
data
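• A minimal Python sketch of two of these tasks (scikit-learn and NumPy are assumed; the data is randomly generated for illustration):

import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic, unlabeled two-feature data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(loc=[0, 0], scale=[1.0, 2.0], size=(500, 2))

# Density estimation: where are data points concentrated?
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)
print(kde.score_samples(X[:5]))  # log-density at the first 5 points

# Feature relationships: how do the variables correlate?
print(np.corrcoef(X, rowvar=False))  # 2x2 correlation matrix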
What is Unsupervised Learning?...cont.
Dimensionality
• Many unsupervised learning tasks involve working with high-
dimensional data where patterns are not immediately obvious.
Iterative Nature
• Many unsupervised algorithms refine their solution over repeated
passes, updating the groupings until they stop changing.
Evaluation Challenges
• With no ground-truth labels there is no single correct answer to
score results against; quality is judged with internal measures
such as those later in this section.
K-means Clustering
• K-means is an unsupervised clustering algorithm designed to
partition unlabelled data into a chosen number of distinct
groupings (that number is the "K").
• Example: Segmentation in a gym:
– Input: Member data (age, visit frequency, class preferences)
• Process: Divides members into K groups (e.g., K=3)
• Results:
– Cluster 1: Young, frequent visitors, prefer group classes
– Cluster 2: Middle-aged, moderate visits, prefer machines
– Cluster 3: Seniors, regular visits, prefer morning sessions
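• A minimal scikit-learn sketch of this gym example; the member values below are invented for illustration:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented member data: [age, visits per week, group-class preference 0-1]
members = np.array([
    [22, 5, 0.9], [25, 4, 0.8], [45, 2, 0.3],
    [50, 2, 0.2], [68, 3, 0.4], [72, 3, 0.5],
])

# Scale features so age does not dominate the distance computation
X = StandardScaler().fit_transform(members)

# Partition into K=3 groups; n_init=10 reruns the algorithm from
# different random starts and keeps the best result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index for each member
print(kmeans.cluster_centers_)  # centroids in scaled feature space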
Hierarchical Clustering
• Example: Organizing a company's product catalog:
• Top Level:
• Electronics, Clothing, Home goods
• Second Level:
• Electronics → Phones, Laptops, Accessories
• Clothing → Men's, Women's, Children's
• Third Level:
• Phones → Budget, Mid-range, Premium
• This creates a tree-like structure showing relationships at
different levels.
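• A minimal SciPy sketch of agglomerative (bottom-up) hierarchical clustering; the product features are invented for illustration:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented catalog items: [price, weight_kg]
products = np.array([
    [300, 0.2], [900, 0.3],    # phones
    [800, 1.5], [1500, 2.0],   # laptops
    [20, 0.5], [35, 0.6],      # clothing
])

# Build the merge tree bottom-up using Ward linkage
Z = linkage(products, method="ward")

# Cut the tree at different depths for coarser or finer groupings
print(fcluster(Z, t=2, criterion="maxclust"))  # 2 broad groups
print(fcluster(Z, t=3, criterion="maxclust"))  # 3 finer groups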
DBSCAN (Density-Based)
• Example: Analyzing city locations:
• Core Points: Dense areas (shopping districts)
• Border Points: Edge of neighborhoods
• Noise: Isolated locations
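• A minimal scikit-learn sketch; the coordinates are invented to mimic two dense districts and one isolated point:

import numpy as np
from sklearn.cluster import DBSCAN

# Invented (x, y) map coordinates
locations = np.array([
    [1.0, 1.0], [1.1, 1.0], [0.9, 1.1],  # dense area A
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.1],  # dense area B
    [9.0, 0.5],                          # isolated location
])

# eps = neighborhood radius, min_samples = density threshold
db = DBSCAN(eps=0.5, min_samples=3).fit(locations)

print(db.labels_)               # -1 marks noise (the isolated point)
print(db.core_sample_indices_)  # indices of core points in dense areas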
Autoencoders
• Example: Image compression
• Input: 1000x1000 pixel image
• Process:
• Encoder compresses to key features
• Decoder reconstructs from compressed form
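• A minimal autoencoder sketch in PyTorch (the framework choice is an assumption; the slides name none). The sizes are shrunk from the 1000x1000 example so it runs instantly:

import torch
import torch.nn as nn

# Toy sizes: a real 1000x1000 image flattens to 1,000,000 inputs;
# here we compress 64 inputs down to an 8-value code
encoder = nn.Sequential(nn.Linear(64, 8), nn.ReLU())
decoder = nn.Sequential(nn.Linear(8, 64), nn.Sigmoid())
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(32, 64)  # a batch of 32 fake flattened "images"
for _ in range(100):
    recon = model(x)          # encode, then decode
    loss = loss_fn(recon, x)  # reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

code = encoder(x)  # the compressed representation (32 x 8)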
Silhouette Score
• Example: Analyzing customer segments
• Score range: -1 to 1
• Higher is better
• Example scores:
• 0.8: Clear separation between young, middle-aged, and senior
shoppers
• 0.3: Some overlap in shopping patterns between age groups
• -0.1: Poor clustering, age groups not meaningfully separated
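• A minimal sketch computing the silhouette score with scikit-learn; make_blobs stands in for real customer data:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic "customers" in three well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette over all points: near 1 = clear separation,
# near 0 = overlapping clusters, below 0 = likely misassignments
print(silhouette_score(X, labels))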
Intra-Cluster Distance
• Measures how close together the points within one cluster sit;
smaller is better.
• Example: Restaurant groupings by cuisine and price
• Good clustering:
• All Italian fine dining restaurants tightly grouped
• All fast food chains closely clustered
• Poor clustering:
• Mix of expensive and budget restaurants in same cluster
• Different cuisines randomly mixed together
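• A minimal sketch of measuring compactness: mean distance from each point to its own centroid (synthetic data stands in for the restaurants):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# A tight cluster (all Italian fine dining together) scores low
for k in range(3):
    pts = X[km.labels_ == k]
    dists = np.linalg.norm(pts - km.cluster_centers_[k], axis=1)
    print(f"cluster {k}: mean intra-cluster distance = {dists.mean():.2f}")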
Inter-Cluster Distance
• Measures how far apart different clusters are from one another;
larger is better.
• Example: Product categories in e-commerce
• Good separation:
• Electronics cluster clearly distinct from clothing cluster
• Beauty products well-separated from tools
• Poor separation:
• Sports equipment overlapping with casual wear
• Home decor mixing with kitchen appliances
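• A minimal sketch of measuring separation: pairwise distances between cluster centroids (synthetic data stands in for the product categories):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Larger centroid-to-centroid distances = better-separated categories
print(np.round(squareform(pdist(km.cluster_centers_)), 2))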
Stability Analysis
• Example: Customer behavior clustering
• Stable results:
• Similar clusters appear even with different data samples
• Patterns remain consistent over time
• Unstable results:
• Clusters change dramatically with small data changes
• No consistent patterns emerge
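• One common way to check stability (a sketch, not the only method): re-cluster bootstrap resamples and measure agreement with the adjusted Rand index:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
base = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

rng = np.random.default_rng(0)
scores = []
for _ in range(10):
    idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap sample
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(X[idx])
    # ARI = 1 when the resampled clustering matches the base labels
    scores.append(adjusted_rand_score(base.labels_[idx], labels))

print(f"mean stability (ARI): {np.mean(scores):.2f}")  # near 1 = stable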