Tutorial Series 4: Exercise 1
Exercise 1:
1. What are the similarities and differences between the KNN and K-means methods?
2. Given n observations represented by p binary variables, using a hierarchical clustering
algorithm with single linkage, what is the maximum depth of the resulting dendrogram? If
we now use complete linkage and suppose that p << n, does this change the answer?
3. What distance metric is used in a hierarchical clustering algorithm applied to data
represented by binary variables?
Exercise 2:
Let D be the following set of integers: D = {2, 5, 8, 10, 11, 18, 20}.
We want to divide the data in D into three (3) clusters using the K-means algorithm.
The distance d between two numbers a and b is calculated as follows: d(a, b) = |a - b| (the absolute
value of a minus b).
Apply K-means with the initial cluster centers 8, 10, and 11. Show all calculation steps.
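The hand calculation above can be checked with a minimal 1D K-means sketch (not part of the exercise; the function name and structure are my own). It alternates the assignment step (each point joins its nearest center under d(a, b) = |a - b|) and the update step (each center moves to its cluster mean) until the centers stop changing:

```python
def kmeans_1d(data, centers, max_iter=100):
    """Plain 1D K-means with distance d(a, b) = |a - b|."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged: centers unchanged
            break
        centers = new_centers
    return clusters, centers

D = [2, 5, 8, 10, 11, 18, 20]
clusters, centers = kmeans_1d(D, [8, 10, 11])
print(clusters)  # final clusters
print(centers)   # final centers
```

Starting from centers 8, 10, 11, the first assignment gives {2, 5, 8}, {10}, {11, 18, 20}; the centers then move and the algorithm settles on {2, 5}, {8, 10, 11}, {18, 20}.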
Exercise 3:
Given the points: 1, 2, 9, 12, 20
1. Apply the hierarchical clustering algorithm and draw the corresponding dendrogram.
2. Consider a top-down hierarchical clustering algorithm that, at each iteration, looks for the
best way to split a set of points into two parts.
Describe in detail the first iteration of this algorithm (using the minimal-jump strategy).
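Both questions can be sketched in a few lines of Python (an illustration only; the helper names are my own). Part 1 is bottom-up single-linkage merging, where the inter-cluster distance is the closest pair of points, and the merge heights give the dendrogram levels. Part 2 is one top-down step under the minimal-jump (largest-gap) rule: cut the sorted points at the widest gap between consecutive values:

```python
import itertools

def single_linkage_merges(points):
    """Agglomerative single-linkage clustering on 1D points.
    Returns the merge sequence (pair of clusters, merge distance),
    which is exactly what is needed to draw the dendrogram."""
    clusters = [[p] for p in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Single-linkage distance = closest pair across two clusters.
        dist = lambda i, j: min(abs(a - b)
                                for a in clusters[i] for b in clusters[j])
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(*ij))
        merges.append((clusters[i], clusters[j], dist(i, j)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

def split_at_largest_gap(points):
    """One divisive step: cut the sorted points at the widest gap."""
    pts = sorted(points)
    gaps = [pts[k + 1] - pts[k] for k in range(len(pts) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return pts[:cut + 1], pts[cut + 1:]

P = [1, 2, 9, 12, 20]
print(single_linkage_merges(P))    # merge heights: 1, 3, 7, 8
print(split_at_largest_gap(P))     # ([1, 2, 9, 12], [20]), widest gap 12-20
```

On these points the first divisive split separates {20}, since the gap 20 - 12 = 8 is the largest between consecutive sorted values.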
Exercise 4:
Given the following 1D dataset: X = {1, 2, 2, 3, 6, 7, 8, 9}
Assume the data comes from a mixture of two Gaussian distributions (K = 2).
Use the Expectation-Maximization algorithm to cluster the data (1 iteration).
Initial Conditions:
• μ₁ = 2, μ₂ = 8
• σ₁² = 1, σ₂² = 1
• π₁ = 0.5, π₂ = 0.5