Cluster analysis is a technique used to segment a market. It involves several steps:
1. Standardize the dataset variables to ensure each has equal effect on cluster selection.
2. For more than two variables, decide the number of clusters and select initial cluster anchors.
3. Use a solver tool to iteratively assign data points to clusters to minimize the distance between points and their cluster anchor, finding the optimal cluster configuration.
The number of clusters can be varied to determine the right number for the data.
Cluster Analysis - Market Segmentation
T RAMA NAMBI RAJAN || 16PGM36
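Step I – Start with your data set
Create a data set whose attributes are measured on a numeric scale rather than a nominal one, so that distances between observations can be computed.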
Step II – If there are just two variables, use a scatter plot in Excel
When the data set involves only two variables, a scatter plot is sufficient to identify the clusters visually.
Step III – Decide the number of clusters
If there are more than two variables, decide on the number of clusters. Then select the cluster anchors so that they are clearly differentiated from one another.
Step IV – Standardize the attributes
To do this, identify the mean and standard deviation (SD) of each attribute involved.
Standardized value = (attribute value – attribute’s mean value) / attribute’s SD
Working with standardized values for each attribute ensures that your analysis is unit-free and that each attribute has the same effect on your cluster selection. The standardized value is known as the z-score.
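The standardization formula can be sketched in Python as follows. This is a minimal illustration, not the handout's own method: the attribute values are made up, and the use of the population SD (`pstdev`) is an assumption, since the text does not say whether the population or sample SD is intended.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score of each value: (value - mean) / SD."""
    mu = mean(values)
    # Assumption: population SD. Swap in statistics.stdev for the sample SD.
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("attribute has no spread and cannot be standardized")
    return [(v - mu) / sd for v in values]

# Hypothetical values of one attribute for five cities (not from the source data).
median_income = [40, 50, 60, 70, 80]
z = z_scores(median_income)
print([round(v, 3) for v in z])
```

After standardization every attribute has mean 0 and SD 1, which is why no single attribute dominates the distance calculation.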
Step V – Use Solver to find the optimal clusters
Each identifier in a cluster should be similar to its cluster anchor, and every identifier outside the cluster should be different from that anchor. The initial cluster anchors can be chosen arbitrarily.
For example (from book)
For each city in the data set, you can determine the squared distance (using z-scores) of each city from each of the four cluster anchors. Then you assign each city the squared distance to the closest anchor and have your Solver target cell equal the sum of these squared distances.
To begin, set up a way to “look up” the z-scores for candidate cluster centers:
In H5:H8 enter “trial values” for cluster anchors. Each of these values can be any integer between 1 and 49. For simplicity you can let the four trial anchors be cities 1–4.
After naming A9:N58 as the range Lookup, in G5 look up the name of the first cluster anchor with the formula =VLOOKUP(H5,Lookup,2). Copy this formula to G6:G8 to identify the name of each cluster center candidate.
In I5:N8 identify the z-scores for each cluster anchor candidate by copying from I5 to I5:N8 the formula =VLOOKUP($H5,Lookup,I$3).
Figure: Look up z-scores for cluster anchors
You can now compute the squared distance from each city to each cluster candidate. To compute the distance from city 1 (Albuquerque) to cluster candidate anchor 1, enter in O10 the formula =SUMXMY2($I$5:$N$5,$I10:$N10). This Excel function computes the following:
(I5-I10)^2+(J5-J10)^2+(K5-K10)^2+(L5-L10)^2+(M5-M10)^2+(N5-N10)^2
To compute the squared distance of Albuquerque from the second cluster anchor, enter the same formula in P10 with each 5 changed to a 6. Similarly, in Q10 change each 5 to a 7. Finally, in R10 change each 5 to an 8. Copy from O10:R10 to O11:R58 to compute the squared distance of each city from each cluster anchor.
Figure: Computing squared distances from cluster anchors
In S10:S58 compute the distance from each city to the “closest” cluster anchor by entering the formula =MIN(O10:R10) in cell S10 and copying it to the cell range S11:S58. In S8 compute the sum of squared distances of all cities from their cluster anchor with the formula =SUM(S10:S58).
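The spreadsheet calculation above can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the z-scores are made-up values for five cities on two attributes, and the function names are hypothetical. `sum_sq_diff` plays the role of SUMXMY2, the `min` corresponds to the MIN formula in column S, and `total` corresponds to the Solver target cell.

```python
def sum_sq_diff(a, b):
    """Sum of squared differences of paired values, like Excel's SUMXMY2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_anchor_distances(rows, anchor_ids):
    """For each row, return the squared distance to its nearest anchor.

    anchor_ids are 0-based row indices; the spreadsheet uses 1-based city numbers.
    """
    anchors = [rows[i] for i in anchor_ids]
    return [min(sum_sq_diff(anchor, row) for anchor in anchors) for row in rows]

# Hypothetical z-scores (not from the source data).
rows = [(-1.0, -1.0), (-0.9, -1.1), (1.0, 1.0), (1.1, 0.9), (0.0, 0.0)]

nearest = closest_anchor_distances(rows, anchor_ids=[0, 2])
total = sum(nearest)  # the quantity the Solver target cell would hold
print(round(total, 4))
```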
In T10:T58 compute the cluster to which each city is assigned by entering in T10 the formula =MATCH(S10,O10:R10,0) and copying this formula to T11:T58. This formula identifies which element in columns O:R gives the smallest squared distance to the city. Use the Solver window, as shown in the figure, to find the optimal cluster anchors for the four clusters.
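The assignment and optimization steps can be sketched as follows. Note the swapped-in technique: instead of Excel's Solver, this illustration simply tries every possible set of anchor rows and keeps the best one, which is only practical for small data sets (for 49 cities and 4 anchors there are 211,876 combinations). The data and function names are hypothetical.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared distance between two rows of z-scores."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(rows, anchor_ids):
    """Return (total squared distance, 1-based cluster number for each row)."""
    total, clusters = 0.0, []
    for row in rows:
        dists = [sq_dist(rows[i], row) for i in anchor_ids]
        best = min(dists)
        total += best
        clusters.append(dists.index(best) + 1)  # same idea as MATCH(min, range, 0)
    return total, clusters

def best_anchors(rows, k):
    """Exhaustive search (not Solver): try every set of k rows as anchors."""
    candidates = combinations(range(len(rows)), k)
    return min(candidates, key=lambda ids: assign(rows, ids)[0])

# Hypothetical z-scores (not from the source data).
rows = [(-1.0, -1.0), (-0.9, -1.1), (1.0, 1.0), (1.1, 0.9), (-1.1, -0.9)]

anchors = best_anchors(rows, k=2)
total, clusters = assign(rows, anchors)
print(anchors, clusters, round(total, 4))
```

The anchors here are always actual rows of the data set, matching the spreadsheet setup in which each trial anchor is a city number.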
Determining the right number of clusters:
Rerun Solver with the number of clusters increased or decreased by one (or by another small step) each time, and compare the results to identify the optimal number of clusters.
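One way to compare different numbers of clusters is to record the best total squared distance for each candidate number and see where it stops dropping sharply. The sketch below assumes made-up z-scores and again uses exhaustive search in place of Solver; the function names are hypothetical.

```python
from itertools import combinations

def total_sq_dist(rows, anchor_ids):
    """Sum over rows of the squared distance to the nearest anchor row."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(rows[i], row)) for i in anchor_ids)
        for row in rows
    )

def best_total(rows, k):
    """Smallest total over every choice of k anchor rows (exhaustive search)."""
    return min(total_sq_dist(rows, ids) for ids in combinations(range(len(rows)), k))

# Hypothetical z-scores forming three tight pairs (not from the source data).
rows = [(-1.0, -1.0), (-0.9, -1.1), (1.0, 1.0), (1.1, 0.9), (0.0, 2.0), (0.1, 2.1)]

totals = {k: best_total(rows, k) for k in range(1, 5)}
for k, t in totals.items():
    print(k, round(t, 3))
```

The total can only stay the same or fall as clusters are added, so the smallest total alone does not pick the answer. In this toy data the drop from two to three clusters is large and the drop from three to four is small, which suggests three clusters.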