Customer Churn
Churn Management
OLANREWAJU ADENIJI
Applied project submitted in partial fulfillment of the requirements for the degree of
August, 2020
DECLARATION
I declare that this applied project that I have submitted to Dublin Business School for the award
of Masters in Data Analytics is the result of my own investigations, except where otherwise
stated, where it is clearly acknowledged by references. Furthermore, this work has not been
submitted for any other degree.
ACKNOWLEDGMENT
This dissertation is the product of a collaborative process; as the author, I remain the owner of,
and responsible for, the work in this project. I would like to extend my sincere
and heartfelt gratitude to every individual who has contributed to this endeavor.
I am grateful to Anshu Mukhergee, my supervisor, for his valuable guidance and support toward the
completion of this project.
I would like to dedicate this project to God Almighty, my late mother, and myself, for sponsoring and
seeing this dream come true independently regardless of the adversity caused by the
COVID-19 outbreak this year.
ABSTRACT
Customer churn, also known as customer attrition, is one of the major challenges faced by
telecom service providers and other types of businesses. Revenue is lost annually and marketing
budget is wasted due to customer attrition. To maintain strong business to consumer
relationships, companies adopt business intelligence and data analytics models to extract and
process the necessary customer information. The research is divided into two parts: (a)
predicting customer churn, and (b) creating an innovative idea to maximize profit for telecom operators in the business to
consumer sector. Two different analytical tools were used to process a public telecom dataset and
to model classification algorithms. The project aims to suggest how to reduce losses from
marketing cost and fraud, and to create an innovative digital idea that bridges the revenue gap between
telecom and digital platforms, using customer and network data for profit maximization.
List of Figures
2.6 1 Big Data Online Survey
2.6 2 Data Processing
2.8 1 Sample of Payment Database
2.9.1 Telecom Big Data Analytics Framework
2.11 1 Customer Profiling
List of Tables
2.5 1 Confusion Matrix
4.1 1 Data Exploration
4.2 1 Feature Selection
4.2 2 Logistic Regression Performance & Confusion Matrix
4.2 3 Churn Analysis Result
4.2 4 Research Survey
5.2 1 Evaluation
List of Abbreviations
% - percent
BI – business intelligence
RE – relation extraction
E.g. – example
ER – entity recognition
ETC – Etcetera
QA – question answering
B2B – business to business
B2C – business to consumers
CCP – customer churn prediction
CDR – call detail record
CRM – customer relationship management
CUG – closed user group
ETL – extract, transform, and load
JIT – just in time
QOE – quality of experience
CRISP-DM – Cross-Industry Standard Process for Data Mining
TABLE OF CONTENTS
DECLARATION ….………………………………………………………………………. 2
ACKNOWLEDGMENT ……………………………………………………………….......3
ABSTRACT ……………………………………………………………………………...... 4
CHAPTER 1
INTRODUCTION ……………………………………………………………………….. 10
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction…………………………………………………………………………….. 22
2.3 The CRISP-DM model: The New Blueprint for Data Mining (2016)………………..... 23
2.4 Feature Selection for Text Categorization on Imbalanced Data. ACM Sigkdd Explorations
(2004) ……………………………………………………………………………………… 24
2.5 Understand System’s Relative Effectiveness Using Adapted Confusion Matrix (2013).25
2.6 Beyond the Hype: Big Data Concepts, Methods, and Analytics (2015)………………. 26
2.7 Customer Retention, Loyalty, and Satisfaction in the German Mobile Cellular
2.8 The Use of Call Detail Records and Data Mart Dimensioning for Telecommunication
Companies (2012)………………………………………………………………………….. 31
2.9 Use Cases and Challenges in Telecom Big Data Analytics,” APSIPA Transactions on Signal
(1999)……………………………………………………………………………………… .35
2.13 Introduction to Machine Learning, Second Edition (Adaptive Computation and Machine
Learning)…………………………………………………………………………………… 38
CHAPTER 3
RESEARCH METHODOLOGY
3.1 Introduction…………………………………………………………………………….. 41
3.3 CRISP-DM.……………………………………………………..…............................... 41
CHAPTER 4
4.1 Python………………………………………………………………………………….. 46
CHAPTER 5
CONCLUSION
5.1 Introduction………………………………………………………………….................. 61
REFERENCES…………………………………………………………………… 65
APPENDICES…………………………………………………………………….. 67
CHAPTER 1
INTRODUCTION
This chapter focuses on the relevant background of the topic. It discusses the
requirements of the study, the problem statement, the research objectives, the importance of the research, and
Machine learning is an application of artificial intelligence in which a computer uses algorithms to
parse data, learn from it, and make predictions. There are three types of machine learning;
the task of learning a function that maps an input to an output based on example input-output
pairs; it learns from data that has been labelled with the desired output [4]. There are two types
Regression is a statistical technique used to determine the relationship between two or
more variables, where the dependent variable changes according to a change in one or more
independent variables. It enables the prediction of a continuous outcome variable (y) based on the
value of one or more predictor variables (x); the x variable is called the independent
variable and the y variable is called the dependent variable. A common technique for continuous
prediction is linear regression, which models the relationship between input variables and the output e.g. sales
Classification is a technique used to categorize data into specific labels or groups based on
values given for training. It predicts the class of a given data point by approximating a function from input
variables to a discrete output variable e.g. spam filtering, churn prediction. In
classification, it is important to learn more than a single modelling technique. There are
different types of classification algorithms, and the choice among them is determined by use and evaluation
Regression is often considered simpler than classification, but classification is more widely used in analysis. The
main difference between the two is that regression algorithms are used to predict
continuous values like price or salary, while classification algorithms are used to predict or classify
discrete values such as Male or Female, True or False, Churn or Not Churn.
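The contrast between a continuous and a discrete output can be illustrated with a minimal sketch in Python. The data, the function names, and the learning-rate and epoch settings below are assumptions made for illustration only; they are not the dataset or the models used later in this project.

```python
import math

def fit_linear_regression(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, labels, lr=0.3, epochs=5000):
    """Batch gradient descent on the log-loss for one predictor: returns (weight, bias)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, labels)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_class(x, w, b):
    """Discrete output: 1 (churn) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Regression: continuous output (hypothetical points lying on y = 2x + 1).
slope, intercept = fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])

# Classification: discrete output (hypothetical complaint counts and churn labels).
complaints = [0, 1, 2, 3, 4, 5]
churned = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(complaints, churned)
predictions = [predict_class(x, w, b) for x in complaints]
```

The regression returns real-valued parameters that can predict any value on a continuous scale, while the classifier only ever returns one of two labels.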
Business to consumer (B2C) commerce is the model of selling products and services to individual
end users; it is a relationship between a company and the consumer. Marketing is how
telecom companies reach out to consumers about their products and services. The aim of
marketing in this sector is to generate the highest level of revenue, increase customer retention,
and strengthen customer loyalty. Unlike the usual consumer market that focuses on selling more
bundles to customers, the business sector aims to build a strong, mutually beneficial, and
continuous relationship in order to market additional services to customers. Creating value for individual
consumers is the key managerial goal of B2C operations; it can create opportunities to serve or
benefit other businesses, but the main goal is to create satisfaction for the end users.
Businesses are set up to offer goods and services that satisfy customer wants and needs. Goods are items
that can be purchased by or transferred to consumers for the purpose of creating satisfaction e.g.
food, cars, clothes, jewelry, vouchers, and more. Services are intangible offerings that deliver value
to consumers for the purpose of creating satisfaction e.g. teaching, haircuts, hospitality,
mail delivery, and car repair. Services can be classified into four groups;
e.g. haircut
Mental stimulus processing services: it provides services that influence the consumers
consumer's wants and needs. Businesses use this strategy to research which kinds of service
customers are most interested in, and follow up by creating an offer that satisfies those wants and
needs. Traditional business models use human representatives to connect with and contact consumers,
but with improved technology, artificial intelligence uses machine-to-machine and digital networks
to analyze, contact, and connect users to one another. The challenges of services vary: they can be
inconsistent, they cannot be separated from the service provider, and customer perception of service
quality can be linked to the interest and skill of the provider. The quality and speed of the service
provided also determine the rate of customer consumption, which can be responsible for
the selection of one service provider over another. The four elements (4Ps) that determine
consumer satisfaction and value creation are people, process, pace, and proof [17]. The
objective now is to create a mutual digital channel between telecom operators and their service
consumers.
Telecommunication, the exchange of information, sound, messages, and images through optical
and electromagnetic systems, has advanced rapidly in the 21st century. As wired data communication
expanded, a separate form of data exchange that required no wires developed concurrently
and evolved into digital wireless packet data protocols. Since the early
2000s, the world has seen technological advancement in 3G, broadband,
and the newly launched 5G network, with increasing customer interest and subscription. As telecom services became an
essential part of daily life, telecom customer databases have grown at a fast pace into what is called big
data because of the volume of records, transactions, and tables; this makes it difficult to extract and
analyze information about a particular customer. [6] “Big data is a term that describes large
volumes of high velocity, complex and variable data that require advanced techniques and
technologies to enable the capture, storage, distribution, management, and analysis of the
information.” Companies have tapped into business intelligence (BI) and analytics for their business
to consumer (B2C) operations: the direct process of selling goods and services to end-user consumers.
Gerpott, T.J. et al. [7], in research conducted in Germany, found that customer care
performance has no significant impact on customer retention in the telecom market. The restructuring
of telecom monopolies under European Union policy opened the market to competition in
providing services to customers. The change in policy is expected to promote
technology and innovation in the free market. Telecom innovations come as different offers and
subscription plans that attract customers from one company to another. Some companies and
customers gain from innovative technologies, while their competitors in the market lose when
this occurs. It is cheaper to keep an existing customer than to acquire a new one, which is why it is
Customer churn, also known as customer attrition, occurs when a customer fails to renew their
subscription for a service. Amin, A. et al. [1] described a customer who leaves one service
provider for another in a saturated market as a churned customer. The business to consumer
model creates two kinds of customer: one who subscribes and prepays, and another
who comes and goes at whim without stating when the next purchase will take place. The churn
of a random or important customer can lead to more churn, which is unprofitable for a
business organization. Customer is king: it is better to retain an existing customer
because it is more expensive to acquire a new one. Churn delays business growth and wastes
marketing cost; therefore, it is important for companies to monitor the rate at which customers
unsubscribe from their services and to determine retention success. Customer churn can be divided
subscribes to a service provided by a network due to situations beyond the customer’s control. A
customer might not intend to leave a service, but a payment failure can lead to a cancelled
subscription. This type of churn can be due to natural causes or arise when a third party like a bank is
involved; an update to terms and conditions may no longer favor the current service provider. Hence, the
Voluntary churn: also referred to as deliberate churn, this occurs when a customer chooses to leave and
refuses to subscribe to a service provided. A customer can decide to leave for several reasons,
such as a better offer from a competitor, poor customer service, better technology, and so on. Also
Telecom operators store information for different purposes, which can be classified into three types;
1. Consumer Data: consumer data or customer data is the collection of personal information
about users from their devices. Personal information comes from sources and channels such as
social media networks, marketing campaigns, customer service requests, call centers,
online browsing data, mobile applications, and more. This information is mostly used for
marketing purposes; examples of consumer data collected are age, name, country,
2. Call Detail Data: this contains basic information collected from mobile phone usage through
cell towers. The data gathered can be used by analysts to gain information about
subscriber behavior and to develop features for predictive models. Examples of data
collected are the identities of the call origin and destination and the duration of the call; the cell tower can be
used to approximate the location of a user when pinpointing people [8]. The
information in the Call Detail Data is stored in binary code and divided into
blocks.
3. Network Data: telecoms allow users to share experiences by connecting through a network,
which allows the operator to gain a deeper understanding of the network. The knowledge
gathered can be used to identify trends and patterns, and to maintain and improve quality of
service. This may enable better decision making for business outcomes.
Customer detail records are a valuable source of data about network users and subscribers;
their analysis helps gain insight into customer behavior.
It gives distinctive insight about individual customers from collective big data, showing
unique or similar patterns in three powerful metrics: customer activity pattern, top cell
It conducts analysis by using towers to identify the cell most commonly used in the network
by a customer. Although this triangulates the user, it does not give the exact user
conducting by clustering areas with the highest number of customers and identifies churn
Social Network
communication between a user and a group of people. Network operators can use the
analysis to offer discounts on calls or data for a closed user group (CUG). A CUG is a group
of telephone subscribers who can make and receive calls from members within the group
for free, while calls beyond the group are charged.
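The CUG charging rule described above can be expressed as a short sketch. The function name, the phone numbers, and the per-minute rate are hypothetical and serve only to illustrate the rule; real operator billing is more involved.

```python
def call_charge(caller, callee, cug_members, minutes, rate_per_minute):
    """Return the charge for one call under a closed user group (CUG) plan."""
    if caller in cug_members and callee in cug_members:
        return 0.0  # calls between members of the same group are free
    return minutes * rate_per_minute  # calls beyond the group are charged

# Hypothetical group and calls, for illustration only.
family_group = {"+353-000-0001", "+353-000-0002"}
in_group = call_charge("+353-000-0001", "+353-000-0002", family_group, 10, 0.25)
out_group = call_charge("+353-000-0001", "+353-000-0009", family_group, 10, 0.25)
```

Here `in_group` is free because both numbers belong to the group, while `out_group` is billed at the assumed rate for all ten minutes.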
Telecom operators can use collected data for sales forecasting: knowing where consumers make calls or use
services, and projecting service use, operators can boost coverage and network performance.
Analysis of use may assess a planned obsolescence strategy or define complementary services.
Forecasting also looks at the number of customers and market share, and can forecast usage and estimate
Network operators that face similar business intelligence challenges adopt artificial
intelligence (AI) for problems in the data mining and data warehousing domain. Artificial
intelligence is the ability of a machine to emulate and recreate human intelligence, first by learning.
The big data challenges that originate from advances in computing and communication technologies
include its properties, the 3Vs (volume, variety, and velocity), and fraud [9].
Volume is the massive amount of data that grows over time. The exponential growth in telecom data is
due to advancing technology, from 3G to 4G and 5G, which allows more user access and
contribution on the internet. The continuous inflow of data from different sources is measured
in terabytes per minute and stored in a cloud system; it is also referred to as a flood of data e.g. Tweets.
Velocity is the speed at which data is generated and processed in the storage system.
Variety describes the various forms of data stored. In social media use, the data influx
from different platforms (Facebook, Twitter, Snapchat, WhatsApp, and more) comes in various
In all, the challenge telecoms face with big data is how to extract useful information from
this huge, varied, and fast-growing data: to identify the correlations among the variables
and uncover the relationships between them for meaningful prediction. Data integration is
another challenge data analysts face when processing information, and it is also affected
by the quality of the data. The data processing stage in data mining involves the extract, transform, and
load (ETL) of data, and it consumes more time than the analysis itself.
Telecom operators face challenges and losses due to fraud. The inability to predict and identify
potential defectors masked as customers contributes to churn losses. Fraud in telecom can be
classified as subscription fraud and superimposed fraud [10]. Subscription fraud is observed
when a user obtains an account without the intention to pay the bill, or takes advantage of a
promotion without the intention to continue the account. Superimposed fraud is observed when
Organizations that focus on business to consumer operations should consider monitoring customer churn, as it
contributes to annual losses. A major business organization is established to make profit,
so what happens when the loss of customers is not being monitored, controlled, and
understood? Customer churn or attrition may be due to different factors in the services provided
The purpose of this research is to develop a model that enables telecom operators, with the use
of machine learning tools and the right algorithm, to predict customers who are likely to
unsubscribe from a service; to save recurrent expenditure in the marketing campaign budget; to reduce
fraud; to manage customer retention through the introduction of a self-marketing [12] digital
directory that offers services from one user to another; and to close the profit gap between
How can machine learning accurately predict customers who are likely to churn?
How can marketing cost and fraud be reduced with algorithms in telecom business to
consumer operations?
How can the profit gap between digital companies, B2B, and B2C in telecom be bridged?
To use a public dataset to justify the best algorithm for customer churn prediction in
Telecom
To establish a business model and solution to manage customer churn, marketing, and losses
in Telecom
To research consumer interest in a self-marketing digital directory for user service offering to
This section describes how the dissertation is divided into different parts. It outlines each
chapter with a brief description to give a background understanding of the
project.
Introduction: the chapter includes the problem definition, research questions, aim, and
objectives
Literature Review: the chapter summarizes the existing research for time series
forecasting with the use of journals and books, which include the theories, concepts and
research, each of the six phases of the methodology is modified for the research
Tool Application, Analysis, and Result: it includes the process of creating classification
Conclusion: the final chapter includes the inferences gathered from the research
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
This literature review covers the academic studies and theoretical results in
the areas of customer churn, CRISP-DM, classification algorithms, marketing cost and revenue
loss, and network employment in telecom business to consumer operations. It summarizes
previous research on the topic from journals, articles, books, and more to further understand the
(2017)
Amin, A. et al. [1] explained customer churn and developed a Just-in-Time (JIT) approach as an alternative to
Customer Churn Prediction (CCP). Building on the benefit of a Customer Relationship Management (CRM)
database through the application of business processes and machine learning, they examined the
possibility of applying the JIT approach to cross-company data in the telecom sector. They
reviewed the performance of JIT-homogeneous and JIT-heterogeneous ensemble models for CCP
in the telecommunication sector, and proposed the Support Vector Machine (SVM) algorithm as the best
The study uses a 10-fold cross-validation process for performance validation of the cross-company
JIT-CCP model. They observed that data sharing can have implications for privacy laws, and
concluded by validating the proposed JIT approach for customer churn prediction in telecom and
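The 10-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration of how the folds are formed, assuming contiguous, unshuffled folds and a hypothetical function name; the study's exact splitting procedure (for example shuffling or stratification) is not stated here.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

# Hypothetical dataset of 25 customers split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

Each sample is used for testing exactly once and for training in the other nine folds, so the model is always evaluated on data it was not trained on.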
2.3 The CRISP-DM model: The New Blueprint for Data Mining (2016)
Shearer, C., in an article in the Journal of Data Warehousing, 5(4), pp. 13-22, described a free data mining model, CRISP-DM (Cross-Industry
Standard Process for Data Mining). The model organizes data
mining as a cycle that comprises six phases [2]; business understanding, data
CRISP-DM for big data optimizes performance and fulfills business objectives. The model
works and does not require high skill proficiency from management, although it can be
learned. Planning, limitation precautions, and time management, especially at the data preparation
stage, should be taken into consideration when using this model. It might have a downside
in decision making, and SEMMA is a related model whose integration can be considered for deployment.
The researchers (Zheng, Z. et al.) in this paper focused on how to improve scalability, efficiency,
and accuracy on imbalanced data [3]. The experiment showed the potential and actual benefit of combining
positive and negative features in imbalanced data, using naïve Bayes, multinomial, and logistic
regression as classifiers. The paper focuses on the extent to which performance can be improved by a better
combination of positive and negative features, and how the optimal combination can be learned.
Four feature selection metrics were presented: information gain, chi-square, correlation
coefficient, and odds ratio. The paper further discusses the challenges of imbalanced data and classification
algorithm accuracy. Performance is measured with precision and recall, alongside the effect of positive and
negative features;
TP (True Positive)
FP (False Positive)
FN (False Negative)
TN (True Negative)
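Precision and recall follow directly from the four counts listed above. The sketch below uses hypothetical counts for an imbalanced dataset, chosen only to show why accuracy alone can be misleading; the numbers are not results from the paper.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

# Hypothetical imbalanced case: 50 positives among 1000 samples.
tp, fp, fn, tn = 30, 10, 20, 940
p = precision(tp, fp)        # 30 / 40
r = recall(tp, fn)           # 30 / 50
a = accuracy(tp, fp, fn, tn) # 970 / 1000
```

With these assumed counts the accuracy is high although 40% of the positive class is missed, which is why precision and recall are preferred on imbalanced data.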
In the experiment, it was observed that combining positive and
negative features on imbalanced data is beneficial. Taking into consideration the dataset,
performance measure, and classification method, a good combination of features
enhances the performance of both logistic regression and naïve Bayes algorithms.
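Two of the feature selection metrics named in this review, chi-square and odds ratio, can be computed from a 2x2 table of term and class counts. The sketch below uses the standard textbook forms with hypothetical counts; the paper may use a variant (for example a logarithmic odds ratio), so this is an illustration rather than the authors' exact formulation.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: in-class documents containing the term
    b: out-of-class documents containing the term
    c: in-class documents without the term
    d: out-of-class documents without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0

def odds_ratio(a, b, c, d):
    """Plain odds ratio; infinite when a zero count makes the denominator vanish."""
    return (a * d) / (b * c) if b and c else float("inf")

# Hypothetical counts: the term appears mostly in the positive class.
score = chi_square(40, 10, 10, 40)
ratio = odds_ratio(40, 10, 10, 40)
```

A high chi-square score indicates that the term and the class are strongly dependent; an odds ratio above 1 marks a positive feature and one below 1 marks a negative feature.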
2.5 Understand System’s Relative Effectiveness Using Adapted Confusion Matrix (2013)
Jiang, N. and Liu, H. proposed a predictive method to measure a system’s relative effectiveness
in terms of the accuracy and completeness with which users achieve an objective [5]. It focuses on
different aspects of the user’s output because effectiveness is not directly achieved.
Effectiveness is defined by completeness and accuracy; if the aggregate effectiveness is not
established, comparisons will be inaccurate. The methodology takes a holistic view and establishes the
need to correlate the measurements of errors and completion; the errors can be in different
They described the confusion matrix as a statistical tool for interpreting the intelligent information
The confusion matrix case study compared effectiveness on two e-commerce websites (Amazon and
Play). The experiment was done by searching for the cheapest 16GB memory card on sale
and the best-selling picture book for a 16 year old girl with a budget of 10 pounds. An error
is defined as a user action, such as a mouse click, that was not needed for
error-free task completion with minimal effort. The results revealed that the Play website
performed better than the Amazon website at a lower error rate of 82% and 51%, higher task
between error and completion with introduced confusion matrix. The confusion matrix selects a
compatible classifier for accuracy by using error rates instead of the number of errors as
classifier.
2.6 Beyond the Hype: Big Data Concepts, Methods, and Analytics (2015)
The paper by Gandomi, A. and Haider, M. redefined and elaborated the meaning of big data in terms of its
individual attributes. The swift embrace of big data by the public and private sectors increases the benefit of the subject,
which focuses on the analytics of unstructured data. The paper introduces new tools for
predictive analytics for structured data and describes the types of big data (text, audio, video, social
media) and the tools to analyze data in each format [6]. Some definitions of big data, based on an online
survey conducted in 2012, focus on what it is, while others try to answer what it does.
The best definition of big data depends on factors like time, type of data, and
the industry. “Big data is a term that describes large volumes of high velocity, complex and
variable data that require advanced techniques and technologies to enable the
capture, storage, distribution, management, and analysis of the information”. Big data
Volume: a survey on Facebook revealed that the social media network processes up to one
million photographs per second. There are 1024 terabytes in 1 petabyte of storage, and a result of 20
data as images, audio and videos; and semi-structured data is extensible markup language (XML)
for exchanging data online. An example of the variety property in big data is social media.
Velocity is described as the speed at which data is generated and processed. A high volume of data is observed
in mobile devices or in retail, where Wal-Mart processes more than one million transactions per hour.
Other areas discussed include veracity, which IBM describes as dealing with unbalanced data collected from
unreliable sources and providing solutions using mining and management tools for analytics.
Variability, according to SAS, is the rate at which complex, imbalanced data flows. Oracle described value in
big data by density; analyzed data is considered to have a higher value than the original data.
Techniques and analytical tools for structured data in big data are discussed under text mining, audio
Text analysis is described as the technique of using machine learning and statistics to extract
information from text data. Information can be extracted from social media, emails, call center
logs, blogs, and more e.g. stock market prediction. Methods of text analytics include information
extraction, which uses algorithms to extract information from unstructured data.
Information extraction is classified into two tasks: Entity Recognition (ER) and Relation Extraction
(RE). ER finds names in text and classifies them into predefined categories like person, date, and
location, while RE finds and extracts meaningful relationships between entities. The text
summarization method creates abstracts from single or multiple documents e.g. email and blog
summaries.
Question answering (QA) techniques find answers to questions composed in natural
language e.g. Apple’s Siri. Question answering techniques are classified into three categories:
the information retrieval approach, the knowledge approach, and the hybrid approach. Sentiment
analysis, also known as opinion mining, is a technique used to extract meaningful opinion from
mass opinions about an organization in a public digital space. Opinions can be categorized as
positive or negative, and the technique is mainly applied in areas of business, finance,
Audio analysis, or speech analytics, examines and extracts information from
unstructured audio data. For example, it is used in healthcare to support diagnosis and treatment, and
in call centers.
Video analytics is a technique used to analyze, monitor, and extract meaningful information from
uploaded videos. It can be used in security and surveillance systems and for business intelligence.
Server-based architecture and edge-based architecture are the two approaches to video analytics.
Social media analytics is the analysis of structured and unstructured data from social
media. Social media can be grouped into social networks, blogs, media streams, and more.
Predictive analytics uses statistical methods to make future predictions by learning
from past data. It recognizes patterns and relationships in data; in business it can be used to
Gerpott, T.J. et al., in this paper, described the relationship between customer retention, customer
loyalty, and customer satisfaction using a sample of 684 customers of digital cellular network operators
in Germany. It is believed that customer care performance had no major effect on customer
retention and on the market competition that allows number portability procedures between network
operators. The analysis gives an insight into the German telecom market and explains how the
mobile telephone market dominates the sub-market for mobile communication in revenue
from sales. Keeping a contract with a customer is more important in a telecom business than in the sale of purchased
goods [7].
In business to business (B2B), there are executives who make the decision to extend or continue
determines customer loyalty, which in the end determines customer retention. Customer
satisfaction is defined as the experience a customer derives from the service rendered by the
network operator. There are two issues facing customer loyalty in Germany: one is where a customer
neglects using the service if they wish to terminate their contract, and the other is when a
customer who is indifferent about loyalty has no choice but to continue the subscription with the network
provider because the current mobile number cannot be ported or assigned to another mobile
network provider; also, the tariffs charged by all networks in Germany are similar.
Data collected for the survey by a market research firm excluded some groups of individuals, as
follows: VIAG Interkom mobile network customers, because the company had only a 0.2% market share
in 1999, and T-Mobil customers on the analogue C-Netz, with a 2.5% market share in 1999. Customer
retention is determined by three factors; phone number holding benefit, cost benefit, and
personal benefit.
2.8 The Use of Call Detail Records and Data Mart Dimensioning for Telecommunication Companies (2012)
Jukić, O. and Heđi, at the 20th Telecommunication Forum in 2012, discussed the use of CDR files to
record customer call information using a database model known as the star model. Profit is observed in the
telecom market due to the implementation of advanced technology, regardless of the open market.
Telecom operators in business analyze data collected from customers to stay ahead of
A call detail record (CDR) is a record kept by network operators that contains the details of
each phone call. The information recorded in the CDR includes the phone numbers of both the caller and
recipient, call duration, time of call, and more. CDR files are mostly recorded in binary code;
the accuracy of the data recorded is more important for business achievement than its processing. The
challenges of the big data variety property are faced when processing different data formats. The
solution is to have all data formats converted into one, transformed, and loaded for
analysis.
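The idea of converting call records from different formats into one before loading them for analysis can be sketched as follows. Real CDR files are binary, as noted above; the CSV and JSON inputs, the field names, and the `calls` table in this sketch are hypothetical stand-ins chosen to keep the example short and runnable.

```python
import csv
import io
import json
import sqlite3

def first_present(row, *keys):
    """Return the value of the first key found in the row."""
    for key in keys:
        if key in row:
            return row[key]
    raise KeyError(keys)

def extract(csv_text, json_text):
    """Extract: read records from two differently formatted sources."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.extend(json.loads(json_text))
    return rows

def transform(rows):
    """Transform: map every record to one (caller, recipient, seconds) format."""
    return [
        (
            str(first_present(row, "caller", "from")),
            str(first_present(row, "recipient", "to")),
            int(first_present(row, "duration_s", "seconds")),
        )
        for row in rows
    ]

def load(records, connection):
    """Load: store the unified records in a table ready for analysis."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS calls (caller TEXT, recipient TEXT, duration_s INTEGER)"
    )
    connection.executemany("INSERT INTO calls VALUES (?, ?, ?)", records)
    connection.commit()

# Hypothetical call records in two source formats.
csv_source = "caller,recipient,duration_s\n1001,1002,60\n"
json_source = '[{"from": "1003", "to": "1001", "seconds": 125}]'

conn = sqlite3.connect(":memory:")
load(transform(extract(csv_source, json_source)), conn)
total_seconds = conn.execute("SELECT SUM(duration_s) FROM calls").fetchone()[0]
```

Once every source is mapped to the same three fields, a single query can aggregate call durations regardless of where each record originated.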
Database modeling is set up to record the information of a call i.e. the user's full details and the recipient's
approved details. The database architecture is built to link MSISDN (Mobile Station International
A data warehouse, also known as a data mart, is used for customer behavior analysis. It follows the
ETL process of extraction, transformation, and loading: extraction separates data collected from
multiple origins; transformation converts data in different formats to a single format for
processing; loading transports the processed data into the warehouse. Dimensional modeling of
telecom CDRs connects relationships between calls, subscriber, calling party, service, time,
demographics, and more, indicating primary keys, foreign keys, relations, entities, and columns.
The quality of the data in the CDR files transformed into a single format depends on the hardware equipment.
Data warehousing of telecom information from calls assists in learning customer
2.9 Use Cases and Challenges in Telecom Big Data Analytics,” APSIPA
Chen, C. explored the advantages and challenges of big data analytics in the telecom sector, along with its tools and
technology. A telecom survey recorded that 75% of operator marketing and operations would benefit
financially from the use of analytical tools for big data, e.g. Hadoop and NoSQL. Before adopting
advanced tools, telecom operators used a management framework from the TM Forum [9].
The resource layer in telecoms plans and monitors operations, the service layer is responsible for
providing audio, video, and data services, and the customer layer is responsible for customer
relationship management, which controls consumer orders, inquiries, user satisfaction, and more, which
The paper identifies two use cases of big data: quality of experience (QoE) and SIM box
detection. Customer care in telecom is familiar with the importance of quality of experience for
churn prediction, and the SIM box holds information about service use by customers that can be
a scale of 1-10 from analyzing data collected from audio, video, and data. It identifies problems,
improves performance, and predicts telecom marketing. Supervised machine learning algorithms
can be used to gather more insight into customer satisfaction, because a positive network service
does not guarantee positive user satisfaction. SIM box detection helps uncover fraudulent activities by
placing test calls to a suspicious card from a different country to confirm SIM card fraud,
or by conducting a call detail record (CDR) analysis. Fraudsters can also learn
these network tracking operations and adopt different models; hence the need for an algorithm with a low
false positive rate.
Big data challenges originate from advances in computing and communication technologies
and include its properties, the 3Vs (volume, variety, and velocity), and fraud. Volume is the massive
amount of data growing over time. The exponential growth in telecom data, driven by the advance
of technology through the ages of 3G, 4G, and 5G, allows more user access and contribution to
internet cloud computing. The continuous inflow of data from different sources is measured
in terabytes per minute and stored in a cloud system [9]; this can be seen as a flood of data, e.g.
tweets. Velocity is the speed at which the data is generated and processed in the storage
system. Variety describes the various forms of data stored. In social media use, the data
influx from different platforms (Facebook, Twitter, Tinder, WhatsApp, and more) comes in various
Solutions (1999)
Rosset, S. et al. researched the losses that telecom operators incur from
fraud. The paper emphasizes rules for detecting fraudulent activities using advanced tools, which make
detections and draw conclusions by learning from past data and fitting a pattern to the observations.
The challenges facing fraud detection rules are customer credit rating and fraudulent behavior. The churn
management and fraud analysis application created by Amdocs learns and explains
why customers churn and identifies the next churners by building a prediction model.
As explained in the paper, “Customers who make many international calls, and whose overall
usage is low, tend to churn. This pattern had an explanation, as it was cheaper to make
international calls from one of the competitors” [10]. Fraud in telecom can be classified into two types:
subscription fraud and superimposed fraud. Subscription fraud occurs when a user
obtains an account without the intention to pay the bill, or takes advantage of a promotion
without the intention to continue the account. Superimposed fraud occurs when an active
account is hacked or cloned. Price plan and credit rating are key factors to consider in churn
analysis, while the call detail record (CDR) is used to understand customer behavior.
Rules in fraud detection are known as fraud patterns that follow a condition. The research aims at
understanding the challenges of rule discovery for fraud analysis and at differentiating it from the standard
classification rule-discovery problem. These rules are set as triggers to detect fraud, and the
quality of a rule is measured by how many fraud cases it has identified.
Rule quality is determined by high accuracy (specificity) and high coverage (sensitivity):
with a high-accuracy rule, most of the cases it flags are fraudulent, while a high-coverage rule finds most of the
fraudulent cases. The analysis encountered setbacks using machine learning rule-discovery system
Renjith, S., in this paper, examined the challenges telecom operators face in providing services to the
consumer market. The focus is on identifying customers who have the potential to churn at an early
stage, identifying customer preferences, and executing a retention strategy. The objective of the
research is to offer an integrated model for customer churn prediction and retention strategies.
Statistics, data mining, and machine learning are used for prediction to attain the objectives by
populating a risk score from customer profile and transactional behavior. Logistic regression was
suggested as a model for predictive analysis that sits between machine learning and
regression techniques. The resulting risk quotient is used for churn; the logistic regression model
formula [11] is
$$\operatorname{logit}(p) = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$$
Cluster analysis is also used on transactional behavior and demographic information for churn
Customer churn can be identified and predicted with logistic regression as a viable
algorithm, and customers that exhibit similar behavioral patterns are clustered using the k-means
algorithm to achieve a high customer retention rate that will result in increased profit.
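A logistic regression risk score of the kind described above can be sketched in a few lines. This is an illustration only: the function name `churn_probability`, the coefficient values, the intercept, and the two example features are assumptions chosen for the example, not values from the cited paper.

```python
import math


def churn_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and customer values, for illustration only.
coefficients = [0.8, -0.05]   # e.g. share of international calls, tenure in months
intercept = -1.0
customer = [0.9, 6]

p = churn_probability(customer, coefficients, intercept)
label = "churn" if p >= 0.5 else "no churn"
```

The linear combination is passed through the logistic function, so the score always lies between 0 and 1 and can be thresholded (0.5 here, also an assumption) to obtain a class label.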
The technology revolution discussed in this paper by Mohr, N. and Meffert, J. describes the
panic of the existing telecom operators that mainly serve the business-to-consumer market. A growing
population of about 2.5 billion digital consumers under the age of 25 defines the market with
over-the-top technology consumption. More entertainment time is spent on new social
media platforms like YouTube, Snapchat, TikTok, and so on. The major worry of the incumbent
network operators is how the new tech world has taken over their consumer market with
communication software or applications like WhatsApp, IMO, WeChat, iMessage, and the like. This has caused a
fall in the profit margin of direct calls, international roaming, and multimedia texting to the tune of $300
billion, and an estimated compound annual growth of 0.7% in 2020 worldwide [12].
In order to maintain their business market, telecom operators are advised to make important decisions
such as:
- To make the core business “super-slim,” cost efficient, and act smart by transforming
their business model. Implement digital operations for commercial, marketing, and sales
- To identify new growth areas in the space that combines the great potential of
The suggestion is how to make network operators the backbone of other machine services like
Machine learning is the ability to make predictions by using algorithms to parse and learn from
data with computers [4]. Computers are programmed for conditions where humans might not be
able to explain their knowledge. It can be used for speech and handwriting recognition by
learning from past experience; the application of machine learning to large databases is called
data mining. In retail, machine learning can be used for basket analysis, fraud detection or credit
knowledge from a data which has been labeled for a desired output e.g. Classification
Customer retention is a churn management strategy that should not be ignored in
business. The focus should be on winning more customers rather than losing one after another,
since that is bad for business and profitability. Tamaddoni Jahromi, A. et al., in this paper,
analyzed churn management in the business-to-business (B2B) context; their research focuses on two
techniques: (a) predicting customer churn and (b) maximizing profit for businesses [14].
They began the analysis with customer data from a B2B company, which was used for the predictive
model using the data mining techniques of the Classification and Regression Tree (CART) model,
boosting, and logistic regression. The ROC (receiver operating characteristic) curve is used
to evaluate the performance of the classifiers. Data mining is not as common in B2B as in B2C
because less data is available to learn from; big data is considered underdeveloped in B2B.
The aim of this research is to use classifiers to identify churners and target selected churners with
churn as a change in monetary variable for prediction. It is important to disburse incentives
by percentage rather than as a fixed discount, which can be measured using a calibration period; they chose
the first half of a year as the calibration period and the second half as the prediction period.
The calibration period is the time set for measurement to determine inactive customers for
the churn prediction protocol, while the prediction period is the interval for predictive analysis. The retention
approach followed a different method that 1) maximizes the total profit of a retention campaign, and
2) determines the optimum target size. Price incentive, the primary tool in B2B, is adopted for the
retention campaign, and the cost of retaining different customers will differ depending on
how much they would spend in future and on the discount parameter. The model predictor uses a target
variable, a training set of 70%, and a test set of 30%. The model differences are measured
CHAPTER 3
RESEARCH METHODOLOGY
3.1 Introduction
The procedure starts with how to analyze big data in telecom using the standard
process for data mining known as CRISP-DM. Machine learning tools (Python and Rapid Miner)
are used to process and analyze customer data for churn prediction in this work. Customer
needs, views, and requirements are important when carrying out business research and
operations as a service provider. The evaluation of a conducted survey has been used to
determine a feasible approach that can contribute to customer loyalty and retention.
This includes the plans that guide the planning, execution, and monitoring of
the research study. Research methods for data collection can be qualitative or quantitative and
include questionnaires, interviews, focus groups, and documents. A qualitative method has been
3.3 CRISP-DM
Cross-industry standard process for data mining (CRISP-DM) is an advanced yet simple model
that provides essential knowledge of data mining for beginners and experts. It explains the
step-by-step process in six phases of data mining, from business understanding, data understanding,
Business Understanding: this phase entails gaining insight into the telecommunication business and
operations. It aims to understand why customers churn, and the effects and disadvantages of churn to
Data Understanding: this phase describes the business variables collected from the repository and gives
insight into the data. The telecom dataset used is labeled and contains 7,043 rows × 21 columns,
including customer ID, gender, tenure, internet service, contract, payment method, and more. 1,869
Data Preparation: this technique converts the raw data into a useful format that can be easily
analyzed. Data are often inconsistent and can contain missing values, duplicates, and other
errors. Irrelevant information in a dataset is called noise in machine learning. The three main
factors that contribute to the quality of data are accuracy, completeness, and consistency.
Pre-processing is important to improve the quality of the information in a dataset, and it can be
divided into the following steps: data cleaning, data integration, data reduction, data
transformation, and feature selection. There are no missing values in the sample dataset.
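A data cleaning check for missing values and duplicates can be sketched as below. This is a minimal illustration using only the Python standard library; the helper `quality_report` is hypothetical, the column names resemble those of the telecom dataset, and the row values are invented. Treating a blank string as a missing value is also an assumption of the sketch.

```python
def quality_report(rows, key):
    """Count rows with a blank or missing field and rows whose key repeats."""
    missing = sum(
        1 for row in rows
        if any(value is None or str(value).strip() == "" for value in row.values())
    )
    seen, duplicates = set(), 0
    for row in rows:
        if row[key] in seen:
            duplicates += 1
        seen.add(row[key])
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Toy rows: invented values, for illustration only.
rows = [
    {"customerID": "A1", "tenure": 12, "TotalCharges": "350.5"},
    {"customerID": "A2", "tenure": 0, "TotalCharges": " "},
    {"customerID": "A1", "tenure": 12, "TotalCharges": "350.5"},
]
report = quality_report(rows, key="customerID")
```

A report with zero missing and zero duplicate rows would support the statement that no cleaning is needed; any other result points to the rows that need attention before modeling.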
Feature selection [3] is a process of automatically or manually selecting features that contribute
irrelevant features that do not add meaning to the variables. Two techniques can be
used for feature selection: chi-square and information gain. Information gain is the
amount of information gained by reducing entropy in a dataset when building a decision tree
gain can be biased when an attribute with a large number of values is selected as the root node.
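Information gain can be illustrated with a short calculation: the entropy of the target before a split minus the weighted entropy after splitting on a feature. The sketch below uses only the standard library; the function names and the four toy rows are assumptions made for the example and do not come from the project data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Toy rows, invented for illustration: the feature separates the classes perfectly.
rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]
gain = information_gain(rows, "Contract", "Churn")
```

In this toy case the split removes all uncertainty about the target, so the gain equals the full starting entropy of one bit; a feature unrelated to the target would yield a gain close to zero.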
Modeling: this involves the application of tools and algorithms to the data collected. The Rapid Miner
tool is selected to model three different algorithms, Decision Tree, Naïve Bayes, and
Logistic Regression, among others that are available for classification. The algorithm with the
best accuracy will be selected after running the auto-model; accuracy and error will be
determined alongside the performance, best information gain, and fastest scoring time. In this research,
accuracy was set as the priority with class imbalance in mind, so that the algorithm makes the right
prediction between churners and non-churners; the number of non-churners is expected to be greater
than the number of actual churners [14]. Misclassification is likely when using error rate; it can occur if all
Evaluation: this involves the assessment of the model for deployment. The ROC curve (receiver
operating characteristic curve) is a graph that shows the performance of a classification model at
all thresholds, and it is used to compare the performance of the three models. It plots the
sensitivity (true positive rate) against the false positive rate (1 − specificity).
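How the points of a ROC curve arise from model scores can be sketched as follows. The code sweeps a decision threshold over the predicted scores and records the false positive rate and true positive rate at each step; the function names, scores, and labels are toy assumptions for illustration, not project results.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy scores: predicted churn probabilities; labels: 1 = churner, 0 = non-churner.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

A model whose curve lies closer to the top-left corner (larger area) separates churners from non-churners better across thresholds, which is why the curve is suitable for comparing several classifiers.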
Deployment: this is the final stage, in which the selected model is applied for prediction. The insight
gathered from the independent variables would be reviewed, the predictive analytics would be
applied for improvement in the telecom operation, and a final report would be presented to the
Artificial intelligence applications use algorithms that enable computers to learn and improve without
human interference, as in data mining and predictive modeling. The modeling concept uses real
Python: a programming language that is used for web development, creating software
prototypes, and data science. In data science it is used for deep learning, image processing,
and data processing. The Python script has been used to gain insight for data understanding and
Rapid Miner: it is data science software that merges different functions for machine learning,
data preparation, data mining, and predictive analysis. Rapid Miner features analytical
algorithms that can be used for churn modeling, basket analysis, fraud detection, credit risk
modeling, and more. It includes turbo prep and auto model extension – auto model expedites the
process of building and validating models in case of prediction, clustering, and outlier detection.
CHAPTER 4
4.1 Python
Data exploration and missing-value detection show that there are no missing values in the telecom
dataset.
The software is applied to the business understanding and data understanding in the telecom
The telecom data is loaded into the software and retrieved to create a single-row version of
the data, which can be used in deployment. The data parsed for preprocessing used feature
selection to identify and remove less important variables in order to improve the scalability,
efficiency, and accuracy of data modeling; customer ID and Phone services were deselected
Modeling phase technique uses algorithms to build relationship in the data. Algorithm is a set of
rules computers use in solving specific problems. The three algorithms that have been used for
the following churn prediction task are naïve bayes, logistic regression, and decision tree.
Naïve Bayes is a probabilistic machine learning classifier that uses Bayes' theorem with strong
application in text classification, sentiment analysis, spam filtering, and so on. Bayes' theorem
The logistic model uses the logistic function to model a binary dependent variable, predicts the
healthy/sick. It uses a linear method to combine the inputs to form a prediction, and
the probability prediction should be transformed to binary values (0 or 1). Logistic regression
uses training data to derive the model coefficients that would predict a value (No) very close to 0
P(x) = P(y=1|x)
A decision tree is a non-parametric supervised learning technique that uses branches for
prediction and classification. It can handle both categorical and numerical data, and it breaks down a
dataset into smaller subsets as the depth of the tree increases. It is a high-variance machine
learning algorithm that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute,
each branch denotes the outcome of a test, and each leaf node holds a class label. Decision tree algorithms and
splitting criteria include ID3, Chi-Square, Gini Index, and Reduction in Variance.
Auto model expedites the process of building and validating models automatically without the
need to set up a training set or test set; in machine learning, a training set is used to build
a model, while a test set is used to validate the model built. After running the process for accuracy
and error, logistic regression produced the best accuracy at 80.6%, together with the best performance, best
information gain, and fastest scoring time, followed by naïve Bayes at 79.1% and decision tree at
73.5%. Logistic regression is a supervised learning technique for solving classification problems;
it builds a regression model to predict the probability that a given data point belongs to class yes or no.
The evaluation phase is the model assessment for deployment. It uses ROC (receiver operating
characteristic) curve, a graph showing the performance of the model at all classification
thresholds comparing the performance between the three model samples. It is the relationship
The Lift Chart measures the effectiveness of a model by calculating the ratio between the result
obtained with a model and the result obtained without a model. The focus is on the true positives
and thus it can be argued that they indicate the sensitivity of the model. Lift chart captures the
percentage of the customer population that can be targeted for churn management.
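The ratio behind a lift chart can be sketched as the churn rate among the highest-scored customers divided by the overall churn rate (the result expected without a model). The function name `lift_at`, the scores, and the labels below are toy assumptions for illustration and are not the project's results.

```python
def lift_at(scores, labels, fraction):
    """Churn rate among the top-scored fraction divided by the overall churn rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Toy data: 10 customers, 3 of whom churned (1 = churner).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift_top_20 = lift_at(scores, labels, 0.2)
```

A lift above 1 for a given fraction means that targeting that share of the customer population by model score reaches more churners than targeting the same number of customers at random.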
The performance of the telecom customer churn prediction model using logistic regression is evaluated by
its accuracy, precision, recall, and F1 score, which are measured and visualized with a confusion matrix
[5].
Accuracy = 0.8061630219 × 100%
= 80.6%
Precision: this is the positive predictive value i.e. the ratio of correct positive predictions to the
Precision = 0.82538 × 100%
= 82.5%
Recall: this is the ratio of true positive predictions to the total of positive values
Recall = 0.93454 × 100%
= 93.45%
F1 Score: it is the function of precision and recall. F1 Score is needed when you want to seek a
The algorithm predicts the customer churn class as churner or non-churner. It can classify a real
churner as ‘churner’ (true positive), a real churner as ‘non-churner’ (false negative), a real
non-churner as ‘non-churner’ (true negative), and a real non-churner as ‘churner’ (false positive).
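The four outcomes of the confusion matrix lead directly to the evaluation measures discussed in this section. The sketch below uses the standard definitions; the function name and the counts passed to it are hypothetical and are not the project's confusion matrix. The last line combines the precision and recall values reported above into the F1 score they imply.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures derived from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical confusion-matrix counts, not the project's actual results.
accuracy, precision, recall, f1 = classification_metrics(tp=80, fp=20, tn=90, fn=10)

# F1 implied by the precision (0.82538) and recall (0.93454) reported in this section.
reported_f1 = 2 * 0.82538 * 0.93454 / (0.82538 + 0.93454)
```

Because F1 is the harmonic mean of precision and recall, it falls between the two values and is pulled toward the lower one, which makes it useful when a balance between them is sought.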
The logistic regression result above shows the target customers and the prediction result. The
labeled columns ‘Churn’ and ‘Churn Prediction’ hold Yes/No values. Each row in the
Churn column states whether a customer has churned, while the Churn Prediction
column holds the model's prediction, against which the model is validated. Logistic regression marks
customers who are likely to churn as yes, and customers who are not likely to churn as no.
The deployment phase is crucial to the success of the project, and it is important to consider it from
the business understanding phase onward. The insight gathered will be used to make improvements
within the organization; this is where predictive analytics enables improvement in telecom business
operations, and a report will be presented to the management for implementation. The
variables that most urgently require review are internet service, monthly charges, tenure, contract, and payment
method.
Retention Strategy
The deployment review leads to the final objective of this project, which aims at marketing in the telecom
sector to generate the highest level of revenue and increase customer retention, since customer
satisfaction determines customer loyalty, which in the end determines customer retention [7].
A retention strategy should understand customers better and tailor products and services to their
needs. Technology upgrades, marketing strategies, and attractive offers are the executive advice for the
affected factors. Technology for internet service could be upgraded from 4G to 5G, with service
maintenance for pre-existing ones. Unused data may be reviewed for rollover within monthly
charges. A customer may be allowed more payment method options to support the service
provided; payment methods may include cash deposit, cheque, credit card, debit card, and wire
transfer. Promotional marketing strategies may include brand activation, incentives, and bundle
offers, although incentives have been found not to be an effective churn management
strategy [14]. Churn is measured by customer frequency and consistency, which describe the total volume of
subscription. A fixed and equal discount is common in B2C, whereas in B2B a percentage
volume discount is advised to avoid excess and unnecessary budget cost. The innovative idea of
this work is to cross-sell a digital directory platform to customers that lets users benefit by
offering vocational skills or services for extra disposable personal income, and that provides quick
solutions for personal needs. These vocational skills or services may include truck driving,
hairdressing, painting, woodwork, gardening, etc. Users would add their choice of service by
category so that other users who need their offer can identify and contact them based on
location. Because a directory is a file system cataloging structure that contains references to
other computer files, a telecom operator would connect one user to another through the platform
without the need to disclose private contact details (phone number and email). To protect users
from abuse, there will be options to rate users and to report inappropriate activities.
A survey was conducted among 65 individuals to get feedback on user interest in a telecom operator
introducing a digital directory for self-marketing. The feedback helped to gain insight into
whether the method would build loyalty and serve their needs as customers;
because the idea was new, some respondents were hesitant to give feedback, but the overall result
of the survey remained positive. The survey includes five simple questions, as follows:
Employment status
Would you like your network operator to list your vocational skill for casual
employment?
The first question asks which service satisfies the customer and contributes to loyalty:
data, call tariff, or others. The second question, on duration, asks how long the user has been
subscribed to the network operator, as a measure of loyalty. The third question asks about the user's
employment status for disposable income, the fourth question asks about the user's interest in
the listing service, and the final question asks about the user's willingness to pay extra for the service.
Customer satisfaction feedback from the survey reveals that 47.7% of users prefer data, compared with 35.4% for
calls and 16.9% for others; demand for data consumption is high. 37.5% have been subscribed for
over 10 years and 23.4% for 5 to 10 years, while 39.1% joined a telecom service in the past 4
years. 62.5% are employed, while 31.3% are self-employed. The majority of the respondents, at 50.8%,
would like to be listed, 30.8% would not like to be listed, and 27.7% are indifferent to joining the
innovative platform. In conclusion, most respondents are employed, have subscribed to a
network service in the past four years, and derive more satisfaction from data service. They would also
like to benefit from the digital directory platform for user service marketing, since it can
contribute to casual employment and personal disposable income, and provide solutions. But
according to the conducted survey, users are not willing to pay extra for the listing. The uncertainty of some users
who are willing to pay but not sure relates to consumer price sensitivity. Price sensitivity
towards a service is an important factor that reflects consumer uncertainty about product or
service benefits [16]. Product or service uncertainty among consumers develops from
information asymmetry, because brands or network operators are more informed about the
service than the consumers. Consumer price sensitivity about a service is inversely related to the
The implementation of the idea is expected to increase customer satisfaction among more
employed user subscription without an additional cost, should it increase customer loyalty and in
CHAPTER 5
CONCLUSION
5.1 Introduction
The final chapter addresses the research questions using insights gathered from using machine
learning tools to analyze customer churn in this research. It reveals the importance of
CRISP-DM, customer prediction using machine learning tools, loss reduction, and profitability in
Telecom data has been shown to be compatible with the CRISP-DM methodology. It
developed an intelligent analytical approach and revealed unfamiliar factors to consider in business
Customer churn was predicted with machine learning tools using
logistic regression because of its accuracy and coefficient model. Implementing the
right algorithm as a set of rules contributes to the accuracy of customer churn prediction for
Regression
Churn prediction drives management optimism and identifies customers that are about to leave
the company, and a Lift Chart analysis is used to target a customer range for strategic retention
marketing. Churn prediction also identifies the amount or range of customers that have left
the company in order to set an accurate recurrent budget for marketing, which helps operators
manage losses once the correct unit of expenditure is budgeted. Losses to fraud are analytically
identified after setting rules to detect patterns in credit and account abusers that leave bad
debt; these users are immediately cut off or blocked, or extra verification is requested by
Previous works in this field mostly focus only on identifying customers who are likely to churn,
leaving retention strategy to telecom marketing executives, and on reducing losses through fraud
detection. This work aims at managing customer churn by facilitating a digital means that could
contribute to household disposable income [13] and provide solutions in order to raise the level
of customer satisfaction, which in turn increases customer loyalty, so that operators leverage the
benefit directly as a retention strategy. The self-marketing directory idea would also contribute to
a growth strategy for network operators that focuses on closing the profit margin between the
business-to-consumer telecom business and digital establishments. The conducted survey led to the
observation that additional disposable income for consumers would increase spending and
subscription capacity. Most consumers are not willing to pay an additional cost to use the
digital service, which affirms the concept of price elasticity of demand, which states that if the price of a
The calibration period, described as the time set for measurement to determine inactive customers
for churn prediction, is usually the first half of the year, a six-month period, and the
second half is used as the prediction period. It is advised that churn prediction in telecom for
effective retention management should set a quarterly calibration period whereby the first four
months are used to study customer behavior and the second is used for prediction.
The concluded research might face limiting rules depending on a country's or region's labor law that
regulates employment. There is more to learn from the success of the digital directory service in
churn management by using machine learning to identify, rate, and recommend satisfaction among its
users. Once a company collects and analyzes market research data, it can create marketing
materials based on the message it believes would appeal to a target demographic. Some
recommendation algorithms can use the data collected to make scientific suggestions to
consumers based on their history, to improve the user experience and promote sales. In future,
clustering and recommendation algorithms would be modeled and evaluated for the directory
software development.
REFERENCE
1. Amin, A., Al-Obeidat, F., Shah, B., Al Tae, M., Khan, C., Durrani, H.U.R. and Anwar,
S., 2017. Just-in-time customer churn prediction in the telecommunication sector. The
2. Shearer, C., 2000. The CRISP-DM model: the new blueprint for data mining. Journal of
3. Zheng, Z., Wu, X. and Srihari, R., 2004. Feature selection for text categorization on
5. Jiang, N. and Liu, H., 2013, July. Understand system’s relative effectiveness using
6. Gandomi, A. and Haider, M., 2015. Beyond the hype: Big data concepts, methods, and
7. Gerpott, T.J., Rams, W. and Schindler, A., 2001. Customer retention, loyalty, and
8. Jukić, O. and Heđi, I., 2012, November. The use of call detail records and data mart
9. Chen, C.-M. (2016) “Use cases and challenges in telecom big data analytics,” APSIPA
10. Rosset, S., Murad, U., Neumann, E., Idan, Y. and Pinkas, G., 1999, August. Discovery of
ACM SIGKDD international conference on Knowledge discovery and data mining (pp.
409-413).
11. Renjith, S., 2015. An integrated framework to recommend personalized retention actions
12. Mohr, N. and Meffert, J., 2017. Overwhelming OTT: Telcos’ growth strategy in a digital
14. Jahromi, A.T., Stakhovych, S. and Ewing, M., 2014. Managing B2B customer churn,
15. Agnius Valaitis, Analyzing Telecom Subscriber Behavior: 3 Advanced Metrics Built
16. Erdem, T., Swait, J. and Louviere, J., 2002. The impact of brand credibility on consumer
marketing.html
APPENDICES
The following instructions and code provide step-by-step guidance for understanding the
project artifacts. The machine learning tools used are Python and Rapid Miner for
visualization, data processing, and modeling Logistic Regression.
1. Dataset: “WA_Fn-UseC_-Telco-Customer-Churn.csv”
2. Source: https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction/data
import pandas as pd  # required by pd.read_csv below
import plotly.offline as py  # provides init_notebook_mode and iplot
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.figure_factory as ff
import itertools
import warnings
warnings.filterwarnings("ignore")
telchurn = pd.read_csv("C:/Users/OLANREWAJU/Documents/Courses/Second Semeter/Data Mining/CA/CA Project/WA_Fn-UseC_-Telco-Customer-Churn.csv")
7. #Data Understanding - Size description, to see total number of rows and columns
print("Rows : " , telchurn.shape[0])
print ("Columns : " , telchurn.shape[1])
11. #Data Preparation - Manipulation, Replace null in Total Charges, Drop null
13. #Customers w/o internet are unhappy. Replace "No internet service" to "Not Happy"
replace_cols = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                'TechSupport', 'StreamingTV', 'StreamingMovies']
for i in replace_cols:
    telchurn[i] = telchurn[i].replace({'No internet service': 'Not happy'})
telchurn["SeniorCitizen"] = telchurn["SeniorCitizen"].replace({1: "Happy", 0: "Not Happy"})
lab = telchurn["Churn"].value_counts().values.tolist()
data = [trace]
fig = go.Figure(data = data,layout = layout)
py.iplot(fig)
#Churn Customers
#Non-churn Customers
trace2 = go.Bar(x = tg_nch["tenure_group"] , y = tg_nch["count"],
name = "Non Churn Customers",
marker = dict(line = dict(width = .5,color = "black")),
opacity = .9)
Rapid Miner
1. Load Data: Select ‘import process’ from file in Rapid Miner. Choose ‘.rmp’ file
format from repository.
2. Double-click ‘load and process’
3. Insert or drag and drop dataset.csv file from folder into ‘load data’
4. Execute process: click play button
5. Result