Week 6: Clustering and Regression
IE4483
Weekly Plan
Mini Project
• Grouping
• Application type: Natural Language Processing
• Training data:
• Testing data:

Mini Project
• Training data: images in dog / cat folders
  |— dog
  |— cat
• Testing data:
Clustering
Outline
• Concept of Clustering
• Distance Metrics
• K-Means
• Examples
Carry-on Questions
Recap: Classification
• What if we do NOT have the labels? Which pixels form the flower?
From classification to clustering
(Figure: a flower image whose pixels are separated into Flower Pixels vs. Background Pixels; without labels, the same pixels are grouped into Cluster 1 and Cluster 2.)
• Classification is supervised: each training sample comes with a ground-truth label.
• Clustering is unsupervised: no labels are given; the groups must be discovered from the data alone.
Clustering
• Clustering: group a set of unlabeled data samples so that samples in the same group (cluster) are more similar to each other than to samples in other groups.
• An unsupervised method.
Distance Measures / Metrics
• Given a set of N data samples / points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_i, \dots, \mathbf{x}_N$ that we would like to cluster.
• We define the distance between any two data samples $\mathbf{x}_i$ and $\mathbf{x}_j$ as a real scalar $d(\mathbf{x}_i, \mathbf{x}_j) \in \mathbb{R}$, satisfying:
  • non-negativity: $d(\mathbf{x}_i, \mathbf{x}_j) \ge 0$
  • symmetry: $d(\mathbf{x}_i, \mathbf{x}_j) = d(\mathbf{x}_j, \mathbf{x}_i)$
  • triangle inequality: $d(\mathbf{x}_i, \mathbf{x}_j) \le d(\mathbf{x}_i, \mathbf{x}_k) + d(\mathbf{x}_k, \mathbf{x}_j)$
• Examples of distances (a small code sketch follows this list):
  1. Euclidean Distance: $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{d} (x_j - y_j)^2} = \|\mathbf{x} - \mathbf{y}\|_2$
  2. Manhattan Distance: $d(\mathbf{x}, \mathbf{y}) = \sum_{j=1}^{d} |x_j - y_j| = \|\mathbf{x} - \mathbf{y}\|_1$
  3. Infinity (Sup) Distance: $d(\mathbf{x}, \mathbf{y}) = \max_{1 \le j \le d} |x_j - y_j| = \|\mathbf{x} - \mathbf{y}\|_\infty$
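To make these concrete, here is a minimal Python/NumPy sketch of the three metrics (the sample vectors are made up for illustration):

import numpy as np

def euclidean(x, y):
    # L2 norm of the difference: square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # L1 norm: sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

def sup_distance(x, y):
    # L-infinity norm: largest absolute coordinate difference
    return np.max(np.abs(x - y))

x = np.array([1.0, 2.0]); y = np.array([4.0, 6.0])
print(euclidean(x, y), manhattan(x, y), sup_distance(x, y))  # 5.0, 7.0, 4.0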
Clustering Algorithms
• Partition Algorithms
• K-Means
• Mixture of Gaussian
• Spectral Clustering
• Hierarchical Algorithms
• Agglomerative
• Divisive
Clustering Algorithms
• Partition Algorithm
(Figure: data points partitioned into Cluster 1 and Cluster 2.)
Clustering Algorithms
• Hierarchical Algorithm
K-Means
• Given a set of d-dimensional data points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_i, \dots, \mathbf{x}_N$, and a distance metric $d(\mathbf{x}, \mathbf{y})$.
• Clustering Goal:
  1. Partition the data points into K clusters.
  2. Represent each cluster k by a centroid $\mu_k$.
  3. The sum of distances between each $\mathbf{x}_i$ and its centroid $\mu_k$ is minimized.
• Algorithm: starting from K initial cluster centers $\mu_1, \dots, \mu_K$, repeat the following two steps until the assignments no longer change (a code sketch follows below):
  1. Assign every data point $\mathbf{x}_i$ to its closest cluster center, according to the given distance metric, i.e., find the $\mu_k$ such that $d(\mathbf{x}_i, \mu_k)$ is minimized.
  2. Update each cluster center $\mu_k$ to be the average of its assigned data points.
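A minimal NumPy sketch of this loop, assuming Euclidean distance, random initial centers drawn from the data, and a cap on the number of iterations; the function name and defaults are illustrative rather than a reference implementation:

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    # X: (N, d) array of data points; K: number of clusters.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # initial centers picked from the data
    for _ in range(n_iters):
        # Step 1: assign each point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K)
        labels = dists.argmin(axis=1)
        # Step 2: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(K)])
        if np.allclose(new_centers, centers):  # centers (and hence assignments) stopped changing
            break
        centers = new_centers
    return labels, centers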
K-Means Example 1
• Given the three initial cluster centers A1, B1, and C1.
• Step 1: Determine which data point belongs to which cluster by calculating its distance to each of the three centers (the yellow-highlighted columns of the distance matrix on the slide).
K-Means Example 1
• Cluster 1={A1}, Cluster 2={B1, B2, B3, A3, C2}, Cluster 3={A2, C1}
• Step 2: The cluster centers after the first round of iteration are obtained by computing the mean of all the data points belonging to each cluster:
C1 = (2, 10); C2 = (6, 6); C3 = (1.5, 3.5)
• The assignment and update steps are then repeated with the new centers until the cluster assignments no longer change (one full first iteration is sketched below).
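A one-iteration sketch of this example. The original table of point coordinates is on a slide image that is not reproduced here, so the coordinates below are assumptions, chosen only so that they are consistent with the centers reported above:

import numpy as np

# ASSUMED coordinates (hypothetical; the original data table is not shown in this text).
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
          "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}
centers = {1: np.array(points["A1"]), 2: np.array(points["B1"]), 3: np.array(points["C1"])}

# Step 1: assign each point to the nearest initial center (Euclidean distance).
assign = {name: min(centers, key=lambda k: np.linalg.norm(np.array(p) - centers[k]))
          for name, p in points.items()}
# Step 2: recompute each center as the mean of its assigned points.
new_centers = {k: np.mean([points[n] for n, c in assign.items() if c == k], axis=0)
               for k in centers}
print(assign)        # A1 -> cluster 1; A3, B1, B2, B3, C2 -> cluster 2; A2, C1 -> cluster 3
print(new_centers)   # cluster 1: (2, 10), cluster 2: (6, 6), cluster 3: (1.5, 3.5)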
K-Means Example 2
(Figure: data samples with Centroid 1 and Centroid 2 marked.)
K-Means Example 3
(Figure: clustering results for K = 2 and K = 3.)
K-Means
• Objective, with assignment indicators $r_{ik}$ (a small evaluation sketch follows below):
$$\min_{\mu_k} \min_{r_{ik}} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \, \|\mathbf{x}_i - \mu_k\|_2^2$$
• Normalization: $\sum_{k=1}^{K} r_{ik} = 1 \;\; \forall i$ (each data point is assigned to exactly one cluster, so $r_{ik} \in \{0, 1\}$).
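To make the objective concrete, a small sketch that evaluates it for a given hard assignment, reusing the labels and centers returned by the kmeans sketch above (names are illustrative):

def kmeans_objective(X, labels, centers):
    # 0.5 * sum over all points of the squared Euclidean distance to the assigned center
    return 0.5 * float(np.sum(np.linalg.norm(X - centers[labels], axis=1) ** 2))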
K-Means: Is the algorithm good?
• Pros:
  • Easy to implement.
• Cons:
  • Need to choose K.
  • Sensitive to initialization: can get stuck at a poor local minimum.
  • Needs a good distance metric.
K-Means
• Good Initialization (figure: well-placed initial centroids and the clusters they lead to).
• Bad Initialization (figure: poorly placed initial centroids lead to a poor clustering, i.e., a poor local minimum).
Hierarchical Agglomerative Algorithm (HAC)
• At each step, merge the closest pair of clusters, until only one cluster (or K clusters) is left; K is a given number.
• How to merge?
• Iteration: merge the two clusters with the minimum distance.
• Stopping criterion: all objects are merged into a single cluster, or the dendrogram is cut at the desired number of clusters (e.g., cut at 2 clusters).
(Figure: dendrogram cut into Cluster 1 and Cluster 2.)
• Advantage of HAC: any clustering result with the desired number of clusters K can be obtained by "cutting" the dendrogram at the corresponding level.
• How should the distance between two clusters be defined?
• Average Linkage: the average distance over all pairs formed by one data sample from each cluster (a sketch of the agglomerative loop follows below).
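A minimal sketch of this agglomerative loop, assuming centroid linkage (the inter-cluster distance used in the worked example that follows); the function name and structure are illustrative:

import numpy as np

def hac(X, K=1):
    # Start with every point in its own cluster; each cluster is a list of point indices.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > K:
        # Centroid linkage: distance between clusters = distance between their centroids.
        cents = [X[c].mean(axis=0) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(cents[a] - cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters and continue.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# e.g. hac(np.array([[1.9, 1.0], [1.8, 0.9], [2.3, 1.6], [2.3, 2.1]]), K=2) -> [[0, 1], [2, 3]]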
Example: HAC – Centroid distance

points:   #1    #2    #3    #4
x:        1.9   1.8   2.3   2.3
y:        1.0   0.9   1.6   2.1

Pairwise (Euclidean) distances:
        #1     #2     #3     #4
#1      0
#2      0.14   0
#3      0.72   0.86   0
#4      1.17   1.3    0.5    0
• Merge #1 and #2 (distance 0.14, the minimum). The new clusters and their centroids:

Cluster     #1+#2          #3           #4
Centroid    (1.85, 0.95)   (2.3, 1.6)   (2.3, 2.1)
• Distances between the new centroids:

          #1+#2   #3     #4
#1+#2     0
#3        0.79    0
#4        1.23    0.5    0
• Merge #3 and #4 (distance 0.5, the new minimum):

Cluster     #1+#2          #3+#4
Centroid    (1.85, 0.95)   (2.3, 1.85)

• Finally, merge the two remaining clusters into a single cluster:

Cluster     #1+#2+#3+#4
Centroid    (2.075, 1.4)
Example: HAC – Centroid distance (summary of the three merge steps)

Step 1 (merge #1 and #2):
        #1     #2     #3     #4
#1      0
#2      0.14   0
#3      0.72   0.86   0
#4      1.17   1.3    0.5    0

Step 2 (merge #3 and #4):
          #1+#2   #3     #4
#1+#2     0
#3        0.79    0
#4        1.23    0.5    0

Step 3 (merge the two remaining clusters):
          #1+#2   #3+#4
#1+#2     0
#3+#4     1.01    0
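A quick numerical check of the centroid distances used above (NumPy assumed; variable names are illustrative):

import numpy as np

pts = np.array([[1.9, 1.0], [1.8, 0.9], [2.3, 1.6], [2.3, 2.1]])  # points #1..#4

def centroid_dist(a, b):
    # Centroid linkage: Euclidean distance between the two clusters' centroids.
    return round(float(np.linalg.norm(pts[a].mean(axis=0) - pts[b].mean(axis=0))), 2)

print(centroid_dist([0], [1]))        # 0.14 -> first merge: #1 and #2
print(centroid_dist([2], [3]))        # 0.5  -> second merge: #3 and #4
print(centroid_dist([0, 1], [2, 3]))  # 1.01 -> final merge of the two remaining clusters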
K-Means vs. HAC
• K-Means: a partition method; the number of clusters K must be chosen in advance, and the result depends on the initialization of the centers.
• HAC: a hierarchical method; it builds a dendrogram, from which a clustering with any desired number of clusters can be obtained by cutting at the corresponding level.
What we learn
• What is clustering?
• K-Means
• HAC
• Need to choose K; can get stuck at a poor local minimum; need a good distance metric.
Regression
Outline
• Concept of Regression
• Linear Regression
• Examples
Carry-on Questions
Recap: Classification
• It may output a continuous value, in the form of the probability (e.g., 0.9213) for a discrete class label.
Linear Regression
• The training dataset contains N data points $\mathbf{x}_1, \dots, \mathbf{x}_i, \dots, \mathbf{x}_N$ and their ground truth $y_1, \dots, y_i, \dots, y_N$.
Linear Regression
• Given a set of N data points $\mathbf{x}_1, \dots, \mathbf{x}_i, \dots, \mathbf{x}_N$ and their $y_1, \dots, y_i, \dots, y_N$, solve
$$\min_{\mathbf{w}, b} \hat{L}(f_{\mathbf{w},b}) = \min_{\mathbf{w}, b} \frac{1}{N} \sum_{i=1}^{N} (\mathbf{w}^T \mathbf{x}_i + b - y_i)^2$$
• Loss function: the mean squared error between $\mathbf{w}^T \mathbf{x}_i + b$ and $y_i$ (a fitting sketch follows below).
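A minimal NumPy sketch of minimizing this loss with a least-squares solver; the bias b is handled by appending a constant column, and the small dataset is made up for illustration:

import numpy as np

def fit_linear_regression(X, y):
    # Append a constant-1 column so the bias b is absorbed into the weight vector.
    Xb = np.hstack([X, np.ones((len(X), 1))])         # shape (N, d+1)
    w_full, *_ = np.linalg.lstsq(Xb, y, rcond=None)   # least-squares solution
    return w_full[:-1], w_full[-1]                     # (w, b)

# Hypothetical 1-D example: y is roughly 2x + 1 with a little noise.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.1, 2.9, 5.2, 6.9])
w, b = fit_linear_regression(X, y)
print(w, b)  # approximately [2.0] and 1.0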
Linear Regression - Example
(Figure: data points $(\mathbf{x}_i, y_i)$ and the fitted line $\mathbf{w}^T \mathbf{x}_i + b$.)
Linear Regression - Derivation
• Matrix $\mathbf{X} \in \mathbb{R}^{N \times (d+1)}$: each row is a data point $\mathbf{x}_i^T$ with a constant 1 appended (to absorb the bias b).
• Vector $\mathbf{y} \in \mathbb{R}^{N}$: the ground-truth values.
• Vector $\mathbf{w} \in \mathbb{R}^{d+1}$: the weights, including the bias.
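For reference, minimizing the loss in this matrix notation leads to the standard least-squares solution; a brief sketch, stated here in general form rather than taken verbatim from the slides:
$$\hat{L}(\mathbf{w}) = \frac{1}{N} \|\mathbf{X}\mathbf{w} - \mathbf{y}\|_2^2, \qquad
\nabla_{\mathbf{w}} \hat{L} = \frac{2}{N} \mathbf{X}^T (\mathbf{X}\mathbf{w} - \mathbf{y}) = \mathbf{0}
\;\;\Rightarrow\;\; \mathbf{w}^\ast = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$
(assuming $\mathbf{X}^T \mathbf{X}$ is invertible).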
Linear Regression - Derivation
• Polynomial fit? No, thanks…
• Solution: Feature Learning
Carry-on Questions
• Loss function: the mean squared error between $\mathbf{w}^T \mathbf{x}_i + b$ and $y_i$.
Thank you! Now, questions?