
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/257201599

Customer Segmentation and Analysis of a Mobile Telecommunication Company of Pakistan using Two Phase Clustering algorithm

Conference Paper · September 2013


DOI: 10.1109/ICDIM.2013.6693978


Salar Masood, Moaz Ali, Faryal Arshad, Ali Mustafa Qamar, Aatif Kamal
Department of Computing
School of Electrical Engineering and Computer Science (SEECS)
National University of Sciences and Technology (NUST)
Islamabad, Pakistan
{09bitsmasood, 09bitmoaza, 09bitfarshad, mustafa.qamar, aatif.kamal}@seecs.edu.pk

Ahsan Rehman
Business Analytic Consultant
IBM - Global Business Services
ahsanr@pk.ibm.com

Abstract—Pakistan hosts a competitive and fluid telecommunication market, and for a company to sustain itself, create customer value and increase economic efficiency, it needs to better understand its customers. The purpose of clustering or customer segmentation is to deliver actionable results for marketing, product development and business planning. In this paper, we focus on customer segmentation using clustering algorithms on real data of a telecommunication company in Pakistan. After choosing appropriate attributes for clustering, we used the two-step clustering algorithm in order to create different customer segments. Moreover, the insights obtained from each segment were analyzed before suggesting marketing strategies for up-selling and better targeted campaigns.

Keywords-Customer segmentation, Two-Step clustering algorithm, Business Intelligence, Clustering

I. INTRODUCTION

Pakistan hosts some of the world's largest and most experienced telecommunication companies. The number of mobile subscribers has reached around 123 million, with more than 90% of the country having cellular services and with a tele-density of over 62% as of January 2013. In such an environment, in order to increase profits and create a loyal customer base, a company first needs to better understand its customers. For this purpose, different clustering algorithms are used.

Segmentation is applied in order to classify the customers into different groups according to one or more attributes. The customers within the same group have greater similarity, and the ones in a different group have greater differences. It further helps in better understanding the customers, in producing optimal price plans, in creating tailored products and ultimately helps in reducing churn. Moreover, it helps in providing a multidimensional view of the customer for better treatment targeting. Cross and Thompson [2] have applied supervised (decision trees) as well as unsupervised (K-means clustering) machine learning algorithms to identify customer segments and to predict different risks. Another related work is by McCarty and Hastak [7], who compared different segmentation approaches such as RFM, CHAID (CHi-squared Automatic Interaction Detection) and logistic regression in data mining. Li [5] has identified various customer segments belonging to a retail supermarket before using association rules in order to perform customer characteristic analysis.

Zhang et al. [9] also clustered telecom customers of Liaoning, China based on consumers' behavior. Like the work of Li [5], they also described the characteristics of different cluster groups. However, they also put forward a marketing strategy based on their study.

In this research, we first applied the two-step clustering algorithm on real customer data of a telecommunications company of Pakistan pertaining to call usage, revenue and recharge analysis. The two-step algorithm was selected since it can cluster data that include categorical attributes, which was our case. We made five revenue segments, whereby each one was further segmented based on the user's call usage data. Although many related research works have been performed earlier, we know of no such work in the Pakistani telecommunication market.

The paper is organized as follows: Section II describes the research methodology including the data transformation, correlation analysis and the use of the two-step clustering algorithm along with its results. Section III discusses various marketing strategies which could be developed based on the performed analysis, whereas the paper is concluded in Section IV along with shedding light on future perspectives.

II. RESEARCH METHODOLOGY

This section describes in detail the research methodology including the data transformation, correlation analysis and the application of the two-step clustering algorithm. In the first step, the data was cleansed and transformed into categorical variable types. Next, correlation analysis was performed so as to select the attributes for clustering. This is followed by performing clustering on customer call usage data, revenue data and recharge analysis data, resulting in five revenue segments. Each revenue segment was further divided into five sub-segments based on usage and recharge data.
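As a rough illustration of the attribute-selection step mentioned above, the sketch below computes the Pearson correlation between a reference attribute and each candidate attribute and keeps the candidates above a cut-off. The function names, the toy values and the default cut-off of 0.5 are assumptions made for illustration only; this is not the tooling used for the analysis in this paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_attributes(reference, candidates, threshold=0.5):
    """Keep the candidate names whose correlation with `reference` exceeds `threshold`."""
    return [name for name, values in candidates.items()
            if pearson(reference, values) > threshold]


# Toy per-customer totals (hypothetical numbers, not the paper's data).
calls_usage = [10, 20, 30, 40, 50]
candidates = {
    "on_net": [8, 18, 25, 33, 45],   # rises with total call usage
    "off_peak": [5, 1, 4, 2, 3],     # no clear relation to total call usage
}
selected = select_attributes(calls_usage, candidates)
print(selected)
```

With real data, `reference` would be an aggregate such as total call usage and `candidates` its component attributes.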
A. Transformation of Data

The data set comprised nine days of daily call usage, SMS (Short Message Service) usage and revenue generation for 5,109 customers. This daily data was aggregated using pivot tables so as to get good insights from different clusters.

B. Correlation Analysis

Correlation analysis is performed in order to see the relationship between two attributes on the basis of the Pearson correlation value. A positive correlation means that if one variable increases, the other also tends to increase, and vice versa. If the value is negative, it means that if one variable decreases, the other tends to increase, and vice versa [1]. The values of the Pearson correlation fall in the range [-1, 1]. Table I shows the various attributes used in correlation analysis.

Table II shows the correlation results for the revenue attributes. The Pearson correlation values for revenue related to calls, SMS and VAS with the Final Revenue attribute are 0.925, 0.592 and 0.568 respectively. These values are fairly high, which indicates that these attributes have a strong relationship with the Final Revenue attribute. Therefore, these attributes could be selected for clustering purposes.

Table III shows the results for different attributes related to call usage. The call usage parameters for On-Net, Off-Net, Peak and Off-Peak had correlation values of 0.951, 0.493, 0.910 and 0.455 respectively with the Calls Usage attribute. However, only On-Net calls usage and Peak calls usage were selected for clustering since their values are more than 0.5.

Table IV shows the Pearson correlation values related to the SMS usage attributes. In this case, both On-Net SMS and Off-Net SMS have high values (0.862 and 0.832 respectively) and hence are selected for clustering purposes.

TABLE I
ATTRIBUTES USED FOR CORRELATION ANALYSIS

Name of Attribute
Final Revenue (Pakistani Rupees - PKR)
Calls Revenue
SMS Revenue
VAS (Value Added Services) Revenue
Call Usage (Minutes of Use - MOU)
Peak Call Usage (MOU)
Off Peak Call Usage (MOU)
On-Net Call Usage (MOU)
SMS Usage (Count)
On-Net SMS Usage (Count)
Off-Net SMS Usage (Count)

TABLE II
CORRELATION MATRIX FOR REVENUE ATTRIBUTES

             Final Rev   Calls Rev   SMS Rev   VAS Rev
Final Rev    1           0.925       0.592     0.568
Calls Rev    0.925       1           0.429     0.405
SMS Rev      0.592       0.420       1         0.912
VAS Rev      0.568       0.405       0.912     1

TABLE III
CORRELATION MATRIX FOR CALL USAGE ATTRIBUTES

              Calls Usage   On-Net   Off-Net   Peak    Off-Peak
Calls Usage   1             0.951    0.493     0.910   0.455
On-Net        0.951         1        0.335     0.854   0.432
Off-Net       0.493         0.335    1         0.594   0.261
Peak          0.910         0.854    0.594     1       0.424
Off-Peak      0.455         0.432    0.261     0.424   1

TABLE IV
CORRELATION MATRIX FOR SMS USAGE ATTRIBUTES

              Total SMS   On-Net   Off-Net   Peak    Off-Peak
Total SMS     1           0.862    0.832     0.964   0.645
On-Net SMS    0.862       1        0.437     0.877   0.425
Off-Net SMS   0.832       0.437    1         0.752   0.680

C. Discretization of Continuous Attributes

It is easier for a company to mark its customers as low, medium and high valued customers and analyze them rather than assigning some numeric value to them. In our case, all of the aforementioned attributes were continuous and were discretized with an equal number of cases in each bin. Fig. 1 and Table V show the bins formed for the Final Revenue attribute. There are seven bins in total: less than 2, 2-13, 14-27, 28-45, 46-69, 70-113 and 114+. We used the visual binning method with equal percentiles to discretize our data for each attribute. The minimum number of bins formed by SPSS against total revenue was seven, and more bins could not be formed since the whole of the customer data was equally distributed in these bins.

Fig. 1. Binning Final Revenue Attribute with Seven Bins

Similarly, the Calls Usage attribute was discretized into ten bins: less than 0, 0 - 3, 4 - 10, 11 - 19, 20 - 32, 33 - 56, 57 - 95, 96 - 164, 165 - 316 and 317+, as shown in Fig. 2 and Table VI. Here also, the visual binning method was employed with equal percentiles. The minimum number of bins formed by SPSS against Calls Usage was ten.

Moreover, the SMS Usage attribute was discretized into five bins: 0 or less, 1-7, 8-11, 12-36 and 37+, as shown in Fig. 3 as well as Table VII.
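The equal-percentile binning described above can be sketched as follows. This is a minimal illustration, not SPSS's visual binning procedure: the helper names are hypothetical, and reading each entry in the Value column of Table V as the exclusive upper bound of its bin is an assumption inferred from the bin labels.

```python
def equal_frequency_cut_points(values, n_bins):
    """Return n_bins - 1 cut points that split the sorted values into groups of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut point is the value found k/n_bins of the way through the sorted data.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_bin(value, cut_points):
    """1-based bin number: bin i holds values below cut_points[i - 1]; the last bin holds the rest."""
    for index, cut in enumerate(cut_points, start=1):
        if value < cut:
            return index
    return len(cut_points) + 1


# Cut points and labels copied from Table V (Final Revenue); upper bounds assumed exclusive.
final_revenue_cuts = [2, 14, 28, 46, 70, 114]
labels = ["<2", "2 - 13", "14 - 27", "28 - 45", "46 - 69", "70 - 113", "114+"]


def revenue_label(value):
    """Map a Final Revenue amount to its Table V bin label."""
    return labels[assign_bin(value, final_revenue_cuts) - 1]


# Toy data: twelve values split into three bins of four values each.
toy = list(range(1, 13))
toy_cuts = equal_frequency_cut_points(toy, 3)
print(toy_cuts)
print(revenue_label(50))
```

When many customers share the same value, several cut points can coincide, so the number of distinct bins may be smaller than requested.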
TABLE V
BINNING FINAL REVENUE ATTRIBUTE

Bin   Value   Label
1     2.0     <2
2     14.0    2 - 13
3     28.0    14 - 27
4     46.0    28 - 45
5     70.0    46 - 69
6     114.0   70 - 113
7     High    114+

TABLE VII
BINNING SMS USAGE ATTRIBUTE

Bin   Value                        Label
1     Less than and Equal to 0     <= 0
2     7                            1 - 7
3     11                           8 - 11
4     36                           12 - 36
5     High                         37+

Fig. 2. Binning Call Usage Attribute With Ten Bins

D. Two Phase/Step Clustering Algorithm

Namvar et al. [8] developed a two phase clustering algorithm for intelligent customer segmentation. In the first step, they used K-means clustering in order to cluster the customers into different segments using their RFM (Recency, Frequency, Monetary) value. This was followed by further clustering of each cluster based on the demographic data. The same algorithm is used in this research since it can handle large data sets with both categorical and continuous attributes at the same time. This algorithm works in two steps:
• In the first step, all the cases are assigned to pre-clusters.
• During the second step, these pre-clusters are handled as individual cases and are further clustered using the hierarchical clustering algorithm. The two-step algorithm can determine the number of clusters itself, or the number can be specified by the user.
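The two steps listed above can be sketched in a few lines. This is a simplified stand-in under stated assumptions, not the implementation used in this study: it handles numeric attributes only, replaces the pre-clustering step with a single sequential pass governed by a hypothetical `threshold`, and merges pre-clusters by Euclidean distance between centroids until a user-chosen `k` remains.

```python
from math import dist


def centroid(cluster):
    """Mean point of a pre-cluster stored as a running count and per-dimension sums."""
    return [s / cluster["n"] for s in cluster["sum"]]


def pre_cluster(points, threshold):
    """Step 1: scan the cases once, adding each to the nearest pre-cluster
    within `threshold`, or starting a new pre-cluster if none is close enough."""
    clusters = []
    for index, point in enumerate(points):
        best, best_d = None, threshold
        for cluster in clusters:
            d = dist(point, centroid(cluster))
            if d <= best_d:
                best, best_d = cluster, d
        if best is None:
            clusters.append({"n": 1, "sum": list(point), "members": [index]})
        else:
            best["n"] += 1
            best["sum"] = [s + x for s, x in zip(best["sum"], point)]
            best["members"].append(index)
    return clusters


def merge_pre_clusters(clusters, k):
    """Step 2: treat pre-clusters as single cases and repeatedly merge the
    closest pair of centroids (agglomerative clustering) until k clusters remain."""
    clusters = list(clusters)
    while len(clusters) > k:
        _, i, j = min((dist(centroid(a), centroid(b)), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        a, b = clusters[i], clusters[j]
        merged = {"n": a["n"] + b["n"],
                  "sum": [x + y for x, y in zip(a["sum"], b["sum"])],
                  "members": a["members"] + b["members"]}
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return clusters


# Toy two-attribute cases (hypothetical values).
points = [(1, 1), (1.2, 0.9), (0.8, 1.1), (4, 4), (4.1, 3.9),
          (9, 9), (9.2, 9.1), (8.8, 9.0)]
pre = pre_cluster(points, threshold=1.0)
final = merge_pre_clusters(pre, k=2)
print(len(pre), sorted(c["n"] for c in final))
```

Because the second step only sees the pre-clusters, its cost depends on their number rather than on the number of customers, which is what makes the approach practical for large data sets.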
TABLE VI
BINNING CALL USAGE ATTRIBUTE

Bin   Value         Label
1     Less than 0   <0
2     4             0 - 3
3     11            4 - 10
4     20            11 - 19
5     33            20 - 32
6     57            33 - 56
7     96            57 - 95
8     165           96 - 164
9     317           165 - 316
10    High          317+

Fig. 3. Binning SMS Usage Attribute With Five Bins

A rather classical approach is to cluster customers based on their Call Detail Records (CDRs) [6]. However, in our case, the customers were segmented based on their revenue attributes such as Final Revenue, Calls Revenue, SMS Revenue and VAS Revenue. The number of clusters was manually selected as five because the silhouette measure of cluster cohesion and separation (which rates the results as poor, fair or good) was then found to be more than 0.5, which shows that good clustering has been done by the algorithm, as visible from Fig. 4. The classification of results into poor, fair or good is based on the work of Kaufman and Rousseeuw [4] regarding interpretation of cluster structures. The silhouette measure averages over all instances the quantity shown in equation 1:

$$\frac{B - A}{\max(A, B)} \qquad (1)$$

where A is the distance between an instance and the center of the cluster to which it belongs and B is the distance between an instance and the nearest center from other nearby clusters. A silhouette coefficient of 1 would mean that all cases are located directly on their cluster centers. A value of -1 would mean that all cases are located on the cluster centers of some other cluster. A value of 0 shows that, on average, cases are equidistant between their own cluster center and the nearest different cluster.

The noise handling was kept to 3%. This meant that the algorithm ran on 4517 cases after removing 592 outlier cases from a total of 5109 customers. With this value, we got a silhouette value of more than 0.5, which classifies our result as good. With noise handling less than 3%, the results were fair.
TABLE VIII
REVENUE CLUSTERS DETAILED ANALYSIS

No.   Total Cases   Total Rev (Rs.)   Calls Rev (Rs.)   SMS Rev (Rs.)   VAS Rev (Rs.)
1     922           > 114             > 84              > 16            > 31
2     1035          > 114             > 84              10 - 15         21 - 30
3     836           14 - 27           10 - 24           0 - 3           1 - 8
4     1006          < 2               0 - 9             0 - 3           1 - 8
5     718           2 - 13            0 - 9             4 - 9           9 - 20

Fig. 4. Silhouette measure of cohesion and separation

On the other hand, with noise handling greater than 3%, the results became very good. However, in this case, a large number of cases were removed from the data.
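The silhouette measure of equation 1, as defined in this section (distances to cluster centers, averaged over all instances), can be sketched as below. The function name and the toy points are assumptions for illustration; an instance lying on both its own center and another center at once would cause a division by zero and is not handled.

```python
from math import dist


def silhouette(points, labels, centers):
    """Average of (B - A) / max(A, B) over all instances, where A is the distance
    to the instance's own cluster center and B the distance to the nearest other center."""
    total = 0.0
    for point, label in zip(points, labels):
        a = dist(point, centers[label])
        b = min(dist(point, center)
                for other, center in centers.items() if other != label)
        total += (b - a) / max(a, b)
    return total / len(points)


# Two toy cluster centers on a line (hypothetical values).
centers = {0: (0, 0), 1: (10, 0)}

on_own_center = silhouette([(0, 0), (10, 0)], [0, 1], centers)
on_other_center = silhouette([(0, 0), (10, 0)], [1, 0], centers)
halfway = silhouette([(5, 0)], [0], centers)
print(on_own_center, on_other_center, halfway)
```

The three toy cases correspond to the boundary values discussed above: every case on its own center, every case on another cluster's center, and a case equidistant between two centers.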

E. Results of Two Step Clustering Algorithm

The clusters obtained as a result of using the two-step clustering algorithm are given in Table VIII and shown in Fig. 5. One can observe that the biggest cluster is Cluster 2. This cluster contains 22.9% (1035 cases) of the total customers. The customers in this cluster have a total revenue of more than Rs. 114, calls revenue of more than Rs. 84, SMS revenue between Rs. 10 - 15 and VAS revenue between Rs. 21 - 30. One can also note that although cluster 2 is the biggest one and has the largest total revenue, it does not have the largest SMS revenue or VAS revenue.

Cluster 4 is the second largest cluster, having 22.3% (1006 cases) of the total cases. The customers in this cluster have the lowest total revenue (less than Rs. 2), whereas the calls revenue is between Rs. 0 - 9, SMS revenue between Rs. 0 - 3 and VAS revenue between Rs. 1 - 8.

Similarly, cluster 1 has 20.4% (922 cases) of the total cases. The customers in this cluster have a total revenue of more than Rs. 114, calls revenue of more than Rs. 84, SMS revenue of more than Rs. 16 and VAS revenue of more than Rs. 31. This cluster has the largest revenue from calls, SMS as well as VAS.

Cluster 3 is a relatively small cluster, consisting of just 18.5% (836 cases) of the total cases. The customers in this cluster have a total revenue between Rs. 14 - 27, calls revenue between Rs. 10 - 24, SMS revenue between Rs. 0 - 3 and VAS revenue between Rs. 1 - 8. The smallest cluster is cluster 5, having only 15.9% (718 cases) of the total cases. The customers in this cluster have a total revenue between Rs. 2 - 13, calls revenue between Rs. 0 - 9, SMS revenue between Rs. 4 - 9 and VAS revenue between Rs. 9 - 20.

In all of the clusters, the VAS revenue is greater than the SMS revenue. Similarly, calls revenue is also greater than SMS revenue for all of the clusters, which shows that people have a higher tendency to make a call than to send an SMS.

Fig. 5. Revenue Clusters

The next step was performing further segmentation of each revenue segment based on the customer's call and SMS usage. One of the challenges was that we did not know whether a customer is subscribed to free minutes and other such subscriptions. Because of these subscriptions, a segment might show more or less usage than the amount of revenue generated against it.

As a first step, revenue cluster 2 was segmented into a number of clusters, but only the top five segments were analyzed and compared. In Fig. 6, one can see that four clusters, i.e. 5, 6, 8 and 9, which are of almost the same size, fall in categories 33 - 56, 317+, 96 - 164 and 57 - 95 minutes of use respectively. Moreover, this justifies the high revenue of Rs. 114+ for cluster 2 as shown in Fig. 5. Jansen [3] performed customer segmentation based on call usage behavior (incoming or outgoing). Moreover, he estimated the customer segment using a Support Vector Machine (SVM) based on the customer profile. He got an accuracy of 80.3%.

Similarly, the cluster comparison on the basis of SMS shows that this segment has relatively medium SMS usage. Most of the clusters fall in categories with an SMS count between 8 and 36 in total. In short, this cluster shows higher call usage and medium SMS usage.

Cluster 4 of revenue was similarly analyzed and compared. Fig. 7 shows that most of the cases related to calls fall within only 0 - 3 minutes of use, hence justifying a revenue of less than Rs. 2 for segment 4 of revenue. On analyzing the division along SMS usage, one can observe that a very big cluster falls in the first category, i.e. 1 - 7 SMS. In general, this cluster has very low call as well as SMS usage.
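The idea of splitting each revenue cluster further by usage and comparing only its largest sub-segments can be sketched as below. This is a simplification under a stated assumption: it groups customers by their binned call-usage label and counts them, as a stand-in for re-running the two-step algorithm inside each revenue cluster as done in this study. The function name and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def top_sub_segments(customers, top_n=5):
    """customers: iterable of (revenue_cluster, usage_bin) pairs.

    Returns {revenue_cluster: [(usage_bin, count), ...]} holding the top_n
    largest usage groups of each revenue cluster, largest first.
    """
    by_cluster = defaultdict(Counter)
    for revenue_cluster, usage_bin in customers:
        by_cluster[revenue_cluster][usage_bin] += 1
    return {cluster: counts.most_common(top_n)
            for cluster, counts in by_cluster.items()}


# Toy records: (revenue cluster number, call-usage bin label from Table VI).
customers = (
    [(2, "317+")] * 3
    + [(2, "33 - 56")] * 2
    + [(2, "0 - 3")]
    + [(4, "0 - 3")] * 4
)
result = top_sub_segments(customers, top_n=2)
print(result)
```

Counting by bin label keeps the example short; with real data, each revenue cluster's customers would be clustered again on their usage attributes.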
Fig. 6. Segmentation of Revenue Cluster 2

Fig. 7. Segmentation of Revenue Cluster 4

Fig. 8 shows the segmentation of cluster 1. It can be observed that both call and SMS usage are very high in this segment, as all segments fall in higher categories.

Fig. 8. Segmentation of Revenue Cluster 1

Fig. 9 shows cluster 3 of revenue further segmented on the basis of call and SMS usage. It can be seen that this segment has high call usage and low to medium SMS usage.

Fig. 9. Segmentation of Revenue Cluster 3

Fig. 10 shows cluster 5 segmented on the basis of call and SMS usage. This segment has medium call usage and very low to low SMS usage.

Fig. 10. Segmentation of Revenue Cluster 5

III. MARKETING STRATEGIES

Since our segmentation is done on nine days of customer data, it would be better to discuss weekly and daily packages which could be offered to these segments. Different tailored packages, bundles and offers can be given to these segments based on the type of revenue they generate and their usage behavior. Cluster 1 is indeed the segment of loyal customers, as their revenue from all categories is the highest. Such customers need to be retained as they generate most of the company revenue. They must be given special offers, rewards and discounts so that they feel more loyal to the company.

When we analyze cluster 2 from Table VIII, we can see that its customers are more inclined towards making calls, which shows that they are more comfortable communicating through calls than sending SMS. There is a need to provide them with optimum price plans for making calls, with packages such as:
• Talk Monthly 24 hours a day throughout the month on same network and all land-line numbers for Rs. 300 + tax per month.
• Talk Weekly 24 hours a day on same network for Rs. 80 + tax per week.
• Talk Daily 24 hours a day on same network and friends and family numbers for Rs. 12 + tax daily.

Cluster 3 shows good call usage and some signs of VAS usage as well. Its customers are least interested in communicating through SMS. Hence they should be offered good price plans for calls and good VAS offers such as:
• Talk Daily 24 hours a day on same network and friends and family numbers for Rs. 12 + tax daily.
• To enhance their VAS usage, we need to offer them free Internet buckets of 3 - 10 MB etc. on a weekly basis.

Analyzing cluster 4 of revenue from Table VIII, we can see that its customers generate the least revenue in all of the categories. We need to offer them different packages and bundles related to call and SMS only so that their usage of the core services could increase. They should not be offered VAS promotions, as their core services usage is already so low that they would not bother with offers for extra services. They could be offered packages such as:
• On recharge of Rs. 50, get free 10 min + 100 SMS on all networks and on recharge of Rs. 100, get 30 min + 300 SMS.
• For only Rs. 10 + tax per day, all calls on same network are free. Moreover, get 100 min + 300 SMS free as a bonus.

Cluster 5 consists of technology lovers because their VAS (Value Added Services) revenue is good. Their SMS revenue is also fairly good as compared to their call revenue, as shown in Table VIII. We should give them incentives for using VAS. The following offers could be given:
• For Rs. 250, get 5000 SMS and MMS each with 1.5 GB Internet for the whole month. Note that such an offer will also cater for their SMS use.
• For Rs. 30, send unlimited messages through WhatsApp (a free on-line chatting application on cellphone) monthly.

IV. CONCLUSION AND FUTURE WORK

In this paper, we have shown that through the use of customer segmentation, a telecommunication company could easily market the right products and services to its customers. Moreover, this also helps in offering tailored packages, offers and bundles for customers. In this way, it becomes easier for company officials to create marketing campaigns from scratch for specific customer segments instead of the whole customer base. The data set consisted of 5,109 customers' daily call and SMS usage as well as revenue generation data spanning over 9 days. The continuous attributes were discretized using the binning method. The two-step clustering algorithm was applied and the results were thoroughly analyzed. It was shown that VAS (Value Added Services) revenue was greater than the SMS revenue for all customer segments. Every cluster was analyzed so as to uncover its call as well as SMS usage behaviour.

In future, a revenue prediction model could be built which predicts total revenue based on call, SMS and VAS usage for each segment using different machine learning algorithms such as multinomial logistic regression. With data on the different services used by customers, each segment could further be analyzed on the basis of the services used. This would eventually allow offering better targeted promotions for increasing sales.

ACKNOWLEDGMENT

We would like to thank IBM for rendering help under IBM Academic Initiative.

REFERENCES

[1] N. Bhatia and Vandana. Survey of nearest neighbor techniques. International Journal of Computer Science and Information Security, Vol. 8, No. 2, 2010.
[2] G. Cross and W. Thompson. Understanding your customer: Segmentation techniques for gaining customer insight and predicting risk in the telecom industry: Data mining and predictive modeling, SAS Global Forum, 2008.
[3] S. M. H. Jansen. Customer segmentation and customer profiling for a mobile telecommunications company based on usage behavior, A Vodafone Case Study, Master thesis, University of Maastricht, July 2007.
[4] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience, New York (Series in Applied Probability and Statistics), 1990.
[5] Z. Li. Research on customer segmentation in retailing based on clustering model. In Computer Science and Service System (CSSS), 2011 International Conference on, pages 3437–3440, 2011.
[6] Q. Lin and Y. Wan. Mobile customer clustering based on call detail records for marketing campaigns. In Management and Service Science, 2009. MASS '09. International Conference on, pages 1–4, 2009.
[7] J. A. McCarty and M. Hastak. Segmentation approaches in data-mining: A comparison of RFM, CHAID, and logistic regression. Journal of Business Research, 60(6):656–662, 2007.
[8] M. Namvar, M. R. Gholamian, and S. KhakAbi. A two phase clustering method for intelligent customer segmentation. In Intelligent Systems, Modelling and Simulation (ISMS), 2010 International Conference on, pages 215–219, 2010.
[9] J. F. Tang T. J. Zhang, X. H. Huang and X. G. Luo. Case study on cluster analysis of the telecom customers based on consumers' behavior. In Industrial Engineering and Engineering Management (IE&EM), 2011 IEEE 18th International Conference on, volume Part 2, pages 1358–1362, 2011.
