
tut1_solution

The document presents solutions to a tutorial on clustering analysis, detailing steps for hierarchical clustering and k-means clustering methods. It includes calculations of distances, cluster centers, and characteristics of different customer segments based on their internet usage. The results indicate the formation of three distinct clusters with varying demographics and spending behaviors.

Uploaded by Chan Hufflepuff
Copyright © All Rights Reserved

STAT3613

Tutorial 1 Solution

Q1
a)
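$$
\begin{aligned}
d(C_i \cup C_j,\, C_k) &= \min_{x \in C_i \cup C_j,\; y \in C_k} d(x, y) \\
&= \min\{\, d(x, y) \mid x \in C_i \cup C_j,\ y \in C_k \,\} \\
&= \min\big(\{\, d(x, y) \mid x \in C_i,\ y \in C_k \,\} \cup \{\, d(x, y) \mid x \in C_j,\ y \in C_k \,\}\big) \\
&= \min\big\{\, \min\{\, d(x, y) \mid x \in C_i,\ y \in C_k \,\},\ \min\{\, d(x, y) \mid x \in C_j,\ y \in C_k \,\} \,\big\} \\
&= \min\Big( \min_{x \in C_i,\, y \in C_k} d(x, y),\ \min_{x \in C_j,\, y \in C_k} d(x, y) \Big) \\
&= \min\big( d(C_i, C_k),\ d(C_j, C_k) \big)
\end{aligned}
$$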
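As a quick check of this result against the numbers in b), applying it to the first merge (B and D at 0.04) produces the BD row of the Step 2 matrix directly from the Step 1 distances:

$$
\begin{aligned}
d(BD, A) &= \min\big(d(B, A),\ d(D, A)\big) = \min(4.04,\ 3.28) = 3.28 \\
d(BD, C) &= \min\big(d(B, C),\ d(D, C)\big) = \min(1.00,\ 1.04) = 1.00 \\
d(BD, E) &= \min\big(d(B, E),\ d(D, E)\big) = \min(5.00,\ 4.24) = 4.24
\end{aligned}
$$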

b)
Step 1:
A B C D E
A 0
B 4.04 0
C 4.64 1.00 0
D 3.28 0.04 1.04 0
E 0.64 5.00 4.00 4.24 0
BD merge. d(B,D)=0.04.

Step 2:
A C E BD
A 0
C 4.64 0
E 0.64 4.00 0
BD 3.28 1.00 4.24 0
AE merge. d(A,E)=0.64.

Step 3:
C BD AE
C 0
BD 1.00 0
AE 4.00 3.28 0
BD, C merge. d(BD,C)=1.00.

Step 4:
AE BCD
AE 0
BCD 3.28 0
AE, BCD merge. d(AE,BCD)=3.28.

[Figure: single-linkage dendrogram of A to E; Height axis from 0.0 to 3.0]
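The agglomeration in b) can be replayed mechanically. The Python sketch below is not part of the original solution: it starts from the Step 1 distance matrix, repeatedly merges the closest pair of clusters under single linkage (the rule derived in a)), and records each merge height. The names `D` and `single_linkage` are illustrative only.

```python
from itertools import combinations

# Step 1 distance matrix from Q1 b), keyed by unordered pairs of points.
D = {
    frozenset("AB"): 4.04, frozenset("AC"): 4.64, frozenset("AD"): 3.28,
    frozenset("AE"): 0.64, frozenset("BC"): 1.00, frozenset("BD"): 0.04,
    frozenset("BE"): 5.00, frozenset("CD"): 1.04, frozenset("CE"): 4.00,
    frozenset("DE"): 4.24,
}


def single_linkage(points, dist):
    """Merge the two closest clusters until one remains.

    Returns a list of (cluster_a, cluster_b, height) tuples in merge order.
    """
    clusters = [frozenset(p) for p in points]

    def d(c1, c2):
        # Single linkage: smallest distance between a member of c1 and a member of c2.
        return min(dist[frozenset((x, y))] for x in c1 for y in c2)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        merges.append(("".join(sorted(a)), "".join(sorted(b)), d(a, b)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


for a, b, h in single_linkage("ABCDE", D):
    print(f"merge {a} + {b} at height {h:.2f}")
```

Run as is, it should reproduce the four steps above: B + D at 0.04, A + E at 0.64, C + BD at 1.00 and AE + BCD at 3.28. Computing cluster distances from the original matrix each time is wasteful for large data, but it keeps the single-linkage definition visible.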
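Q2
Iteration 1: initial partition (BE), (ACD)

Centers
BE    0.5        1
ACD   0.6        1.266667

Distances
      A          B          C          D          E
BE    1.09       1.25       1.25       0.89       1.25
ACD   1.644444   0.697778   0.897778   0.444444   1.964444

A and E are closer to the BE center, while B, C and D are closer to the ACD center. So A moves to the first cluster and B moves to the second, giving the new partition (AE), (BCD).

Iteration 2

Centers
AE    0.4        0
BCD   0.666667   1.933333

Distances
      A          B          C          D          E
AE    0.16       4.36       4.16       3.6        0.16
BCD   3.755556   0.115556   0.448889   0.128889   4.182222

Every point is already closest to the center of its own cluster, so there is no more reallocation.

The clustering is (AE), (BCD).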
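The reallocation in Q2 can be sketched in Python as a batch k-means loop that starts from a given partition. The tutorial's raw data are not included in this excerpt, so the coordinates in `PTS` are an assumption: they were back-solved so that their squared Euclidean distances reproduce every entry of the Q1 b) matrix and the centers in both Q2 iterations. The helper names are hypothetical.

```python
# ASSUMED coordinates, back-solved from the Q1 b) distances and the Q2 centers
# (the original data table is not reproduced here).
PTS = {"A": (0.8, 0.0), "B": (1.0, 2.0), "C": (0.0, 2.0), "D": (1.0, 1.8), "E": (0.0, 0.0)}


def centre(members):
    """Mean of the points named in `members`, e.g. "BE"."""
    coords = [PTS[m] for m in members]
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def sqdist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_from_partition(partition, max_iter=10):
    """Recompute centers, move every point to its nearest center, stop when nothing moves.

    Assumes no cluster becomes empty (true for this example).
    """
    for _ in range(max_iter):
        centres = [centre(group) for group in partition]
        groups = [[] for _ in partition]
        for name, p in PTS.items():
            nearest = min(range(len(centres)), key=lambda j: sqdist(p, centres[j]))
            groups[nearest].append(name)
        new_partition = ["".join(g) for g in groups]
        if new_partition == partition:
            break
        partition = new_partition
    return partition


print(kmeans_from_partition(["BE", "ACD"]))  # ['AE', 'BCD']
```

This batch version reassigns all points against the current centers before recomputing them, which matches the two iterations tabulated above.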

Q3
b)
[Figure: dendrogram of the 20 customers c1 to c20; Height axis from 0 to 80]

c)
The plot of merge height against stage shows a big increase in the coefficient when the 3-cluster
solution is merged into the 2-cluster solution. Therefore, the number of clusters is 3.

[Figure: merge height plotted against stage; height axis from 0 to 75]
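One simple way to make the "big increase" criterion explicit is to pick the number of clusters that exists just before the largest rise in merge height. The Python sketch below is a hypothetical helper, not the tutorial's method; since the Q3 heights are only shown graphically, the usage line uses the Q1 b) heights instead.

```python
def clusters_by_largest_jump(heights):
    """Number of clusters just before the largest rise in merge height.

    `heights` are the merge heights of an agglomerative run on n objects,
    in merge order (n - 1 values, at least 2). After merge t (1-indexed)
    there are n - t clusters.
    """
    n = len(heights) + 1
    jumps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(jumps)), key=lambda k: jumps[k])  # largest jump follows merge i + 1
    return n - (i + 1)


# Merge heights from Q1 b): the largest jump (1.00 to 3.28) suggests 2 clusters.
print(clusters_by_largest_jump([0.04, 0.64, 1.00, 3.28]))  # 2
```

Applied to the Q3 heights, the same rule would return 3 if, as the plot indicates, the largest rise is the final merge from 3 clusters to 2.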

d)
Cluster 1: customers 1, 5, 8, 10, 11, 12, 14, 17
Cluster 2: customers 2, 3, 4, 7, 15, 16, 18, 19
Cluster 3: customers 6, 9, 13, 20

e)
cluster   income   age     edu     expense   hours
1         -0.049   0.24    0.20    0.40      0.32
2         -0.754   -0.90   -0.82   -0.81     -0.73
3         1.607    1.32    1.23    0.83      0.83

[Figure: profile plot of the cluster means of income, age, edu, expense and hours for clusters 1 to 3]

f)
Cluster 1 contains customers 1, 5, 8, 10, 11, 12, 14 and 17. They are middle-aged
people with moderate income and a moderate education level. They spend a moderate
amount of money on the internet and have moderate internet usage.
Cluster 2 contains customers 2, 3, 4, 7, 15, 16, 18 and 19. They are younger people
with low income and a low education level. They spend less money on the internet and
have low internet usage.
Cluster 3 contains customers 6, 9, 13 and 20. They are older people with high income
and a higher education level. They spend a lot of money on the internet and have high
internet usage.
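The verbal profiles above can be read off the e) table with a simple threshold rule. The Python sketch below treats the e) values as standardized means (an assumption) and uses a hypothetical cut-off of 0.5 for "high" and "low"; it is an illustration, not the rule the solution used.

```python
# Cluster means from Q3 e), treated here as standardized values (assumption).
MEANS = {
    1: {"income": -0.049, "age": 0.24, "edu": 0.20, "expense": 0.40, "hours": 0.32},
    2: {"income": -0.754, "age": -0.90, "edu": -0.82, "expense": -0.81, "hours": -0.73},
    3: {"income": 1.607, "age": 1.32, "edu": 1.23, "expense": 0.83, "hours": 0.83},
}


def level(z, cut=0.5):
    """Label a standardized mean; the cut-off of 0.5 is a hypothetical choice."""
    if z >= cut:
        return "high"
    if z <= -cut:
        return "low"
    return "moderate"


PROFILE = {k: {var: level(z) for var, z in row.items()} for k, row in MEANS.items()}
for k, row in PROFILE.items():
    print(k, sorted(set(row.values())))  # 1 ['moderate'], 2 ['low'], 3 ['high']
```

With this cut-off every variable in cluster 1 is "moderate", every variable in cluster 2 is "low" and every variable in cluster 3 is "high", in line with the descriptions above.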

Proportion of customers in each cluster using each provider (Y):
cluster   PCCW    NWT     Pacific
1         0.000   0.125   0.875
2         1.000   0.000   0.000
3         0.000   1.000   0.000

Comparing cluster membership with the value of Y, the internet service providers appear to
segment the customers into distinct markets along the variables used in this cluster analysis.
PCCW serves the low-usage group (cluster 2).
NWT serves the high-usage group (cluster 3).
Pacific serves the moderate-usage group (cluster 1).
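The provider table is a cross-tabulation of cluster against Y, divided by row totals. The Python sketch below recomputes it; the counts are back-computed from the proportions and the cluster sizes in d) (8, 8 and 4 customers), so they are derived rather than taken from the raw data, and the helper name is hypothetical.

```python
# Counts DERIVED from the row proportions and the cluster sizes in d).
COUNTS = {
    1: {"PCCW": 0, "NWT": 1, "Pacific": 7},
    2: {"PCCW": 8, "NWT": 0, "Pacific": 0},
    3: {"PCCW": 0, "NWT": 4, "Pacific": 0},
}


def row_proportions(counts):
    """Divide each row of a cluster-by-provider table by its row total."""
    return {k: {y: c / sum(row.values()) for y, c in row.items()} for k, row in counts.items()}


props = row_proportions(COUNTS)
# Provider used by the largest share of each cluster.
main_provider = {k: max(row, key=row.get) for k, row in props.items()}
print(main_provider)  # {1: 'Pacific', 2: 'PCCW', 3: 'NWT'}
```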

Q4
a)
> fit1<-kmeans(x=eg1,centers=4,algorithm="MacQueen")
> fit1
K-means clustering with 4 clusters of sizes 11, 10, 10, 9

Cluster means:
English Math Chinese Science Music PE
1 0.75676842 -0.8289215 0.66097269 0.71634801 0.81428861 0.6973030
2 -1.54493946 -0.9524401 -1.60051497 -1.57003049 -1.53244087 -1.6212116
3 -0.04555104 0.6845897 0.09240849 0.03987673 -0.05603237 0.1334654
4 0.84227248 1.3107378 0.86781836 0.82463438 0.76972864 0.8007922

> fit2<-kmeans(x=eg1,centers=4,algorithm="Hartigan-Wong")
> fit2
K-means clustering with 4 clusters of sizes 9, 11, 10, 10

Cluster means:
English Math Chinese Science Music PE
1 0.84227248 1.3107378 0.86781836 0.82463438 0.76972864 0.8007922
2 0.75676842 -0.8289215 0.66097269 0.71634801 0.81428861 0.6973030
3 -0.04555104 0.6845897 0.09240849 0.03987673 -0.05603237 0.1334654
4 -1.54493946 -0.9524401 -1.60051497 -1.57003049 -1.53244087 -1.6212116

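The two algorithms give the same solution: the cluster sizes and centers agree and only the cluster labels differ, as the cross-tabulation below shows (rows: fit1 clusters, columns: fit2 clusters).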

> table(fit1$cluster,fit2$cluster)

1 2 3 4
1 0 11 0 0
2 0 0 0 10
3 0 0 10 0
4 9 0 0 0
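Two labelings describe the same partition exactly when their cross-table has one non-zero cell in every row and column, as above. A small Python check of that condition follows; the label vectors are hypothetical and only reproduce the table's counts, since the actual order of students is not shown.

```python
def same_partition(labels_a, labels_b):
    """True if two label vectors describe the same grouping up to renaming,
    i.e. their cross-table has exactly one non-zero cell in each row and column."""
    forward, backward = {}, {}
    for a, b in set(zip(labels_a, labels_b)):  # the non-zero cells of the table
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


# Hypothetical label vectors matching the table above (real student order unknown).
fit1 = [1] * 11 + [2] * 10 + [3] * 10 + [4] * 9
fit2 = [2] * 11 + [4] * 10 + [3] * 10 + [1] * 9
print(same_partition(fit1, fit2))  # True
```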

b)
Centers by Ward’s method
cluster   English   Math    Chinese   Science   Music    PE
1         0.757     -0.83   0.661     0.72      0.814    0.70
2         -1.545    -0.95   -1.601    -1.57     -1.532   -1.62
3         0.842     1.31    0.868     0.82      0.770    0.80
4         -0.046    0.68    0.092     0.04      -0.056   0.13
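Ward's criterion merges, at each step, the two clusters whose union increases the total within-cluster sum of squares the least; the centers tabulated above are then the means of the final clusters. The naive Python sketch below illustrates the criterion on made-up one-dimensional data. It is not the code behind the table and does not use `eg1`.

```python
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of a list of points (tuples)."""
    return [sum(axis) / len(cluster) for axis in zip(*cluster)]


def ward_cost(c1, c2):
    """Increase in the within-cluster sum of squares if clusters c1 and c2 are merged."""
    n1, n2 = len(c1), len(c2)
    m1, m2 = centroid(c1), centroid(c2)
    return n1 * n2 / (n1 + n2) * sum((a - b) ** 2 for a, b in zip(m1, m2))


def ward(points, k):
    """Naive Ward agglomeration: always merge the cheapest pair, stop at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


# Hypothetical one-dimensional data with three obvious groups.
data = [(1.0,), (1.2,), (5.0,), (5.1,), (9.0,)]
for c in ward(data, 3):
    print(sorted(x for (x,) in c), centroid(c))
```

The pairwise search makes this cubic in the number of points, which is fine for illustration; library implementations use the Lance-Williams update instead.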

c)
The class sizes are approximately the same and the numbers of students in classes 1 to 4
are 11, 10, 9 and 10 respectively.

d)
K-means clustering with 4 clusters of sizes 11, 10, 9, 10

Cluster means:
English Math Chinese Science Music PE
1 0.75676842 -0.8289215 0.66097269 0.71634801 0.81428861 0.6973030
2 -1.54493946 -0.9524401 -1.60051497 -1.57003049 -1.53244087 -1.6212116
3 0.84227248 1.3107378 0.86781836 0.82463438 0.76972864 0.8007922
4 -0.04555104 0.6845897 0.09240849 0.03987673 -0.05603237 0.1334654

e)

[Figure: profile plot of the cluster means of English, Math, Chinese, Science, Music and PE for clusters 1 to 4]

Students in cluster 1 have high ability in all subjects except mathematics, where they are below average.
Students in cluster 2 have low ability in all subjects.
Students in cluster 3 have high ability in all subjects.
Students in cluster 4 have medium ability in most subjects, with above-average mathematics.
