Customer Churn

The document discusses the application of machine learning in managing customer churn within the telecom industry, highlighting its significance for business-to-consumer (B2C) relationships. It aims to predict customer attrition and propose innovative strategies to enhance profitability by utilizing customer and network data. The research employs various analytical tools and models to address the challenges of customer retention and optimize marketing efforts in the telecom sector.

Business to Consumers (B2C):

The Effect of Machine Learning Application in Telecom Customer

Churn Management

OLANREWAJU ADENIJI

Applied project submitted in partial fulfillment of the requirements for the degree of

Masters in Data Analytics

at Dublin Business School

Supervisor: Anshu Mukhergee

August, 2020

DECLARATION
I declare that this applied project that I have submitted to Dublin Business School for the award
of Masters in Data Analytics is the result of my own investigations, except where otherwise
stated, where it is clearly acknowledged by references. Furthermore, this work has not been
submitted for any other degree.

Signed: OLANREWAJU ADENIJI


Student No: 10531388
Date: 25th August, 2020

ACKNOWLEDGMENT
This is a copy of my dissertation. Although it is the product of a collaborative process, as the author I am responsible for, and remain the owner of, the work in this project. I would like to extend my sincere and heartfelt gratitude to every individual who has contributed to this endeavor.

I am grateful to Anshu Mukhergee, my supervisor for his valuable guidance and support for the
completion of this project.

I would like to dedicate this project to God Almighty, to my late mother, and to myself for sponsoring and seeing this dream come true independently, regardless of the adversity caused by the COVID-19 outbreak this year.

ABSTRACT
Customer churn, also known as customer attrition, is one of the major challenges faced by telecom service providers and other types of businesses. Revenue is lost annually and marketing budget is wasted due to customer attrition. To maintain a strong business to consumer relationship, companies adopt business intelligence and data analytics models to extract and process the necessary customer information. The research is divided into two parts: (a) predicting customer churn and (b) creating an innovative idea to maximize profit for telecoms in the business to consumer sector. Two different analytical tools were used to process a public telecom dataset and to model algorithms for classification. The project suggests how to reduce losses from marketing cost and fraud, and proposes an innovative digital idea to bridge the revenue gap between telecom and digital platforms, using customer and network data for profit maximization.

List of Figures
2.6 1 Big Data Online Survey
2.6 2 Data Processing
2.8 1 Sample of Payment Database
2.9.1 Telecom Big Data Analytics Framework
2.11 1 Customer Profiling

3.3 1 CRISP-DM process

4.1 1 Data Visualization


4.2 1 Data Understanding
4.2 2 Data Processing
4.2 3 Auto Model
4.2 4 Evaluation
4.2 5 Lift Chart
4.2 6 Service Deployment

List of Tables
2.5 1 Confusion Matrix
4.1 1 Data Exploration
4.2 1 Feature Selection
4.2 2 Logistic Regression Performance & Confusion Matrix
4.2 3 Churn Analysis Result
4.2 4 Research Survey
5.2 1 Evaluation

List of Abbreviations
% - percent
BI – business intelligence
RE – relation extraction
E.g. – example
ER – entity recognition
ETC – Etcetera
QA – question answering
B2B – business to business
B2C – business to consumers
CCP – customer churn prediction
CDR – call detail record
CRM – customer relationship management
CUG – closed user group
ETL – extraction, transform, and load
JIT – just in time
QOE – quality of experience
CRISP-DM – Cross-Industry Standard Process for Data Mining

TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGMENT
ABSTRACT
List of Figures
List of Tables
List of Abbreviations

CHAPTER 1
INTRODUCTION
1.1 Introduction and Background
1.1.1 Introduction to Machine Learning
1.1.2 Introduction to Business to Consumers
1.1.3 Evolution of Data in Telecom
1.1.4 Customer Churn in Telecom
1.1.5 Types of Data in Telecom
1.1.6 Big Data Challenges in Telecom
1.2 Research Problem & Research Purpose
1.3 Research Questions & Research Objectives
1.4 Research Structure

CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
2.2 Just-in-time Customer Churn Prediction in the Telecommunication Sector (2017)
2.3 The CRISP-DM model: The New Blueprint for Data Mining (2016)
2.4 Feature Selection for Text Categorization on Imbalanced Data. ACM Sigkdd Explorations (2004)
2.5 Understand System’s Relative Effectiveness Using Adapted Confusion Matrix (2013)
2.6 Beyond the Hype: Big Data Concepts, Methods, and Analytics (2015)
2.7 Customer Retention, Loyalty, and Satisfaction in the German Mobile Cellular Telecommunications Market (2001)
2.8 The Use of Call Detail Records and Data Mart Dimensioning for Telecommunication Companies (2012)
2.9 Use Cases and Challenges in Telecom Big Data Analytics, APSIPA Transactions on Signal and Information Processing (2016)
2.10 Discovery of Fraud Rules for Telecommunications – Challenges and Solutions (1999)
2.11 An Integrated Framework to Recommend Personalized Retention Actions to Control B2C E-Commerce Customer Churn (2015)
2.12 Overwhelming OTT: Telcos’ Growth Strategy in a Digital World (2017)
2.13 Introduction to Machine Learning, Second Edition (Adaptive Computation and Machine Learning)
2.14 Managing B2B customer churn, retention and profitability

CHAPTER 3
RESEARCH METHODOLOGY
3.1 Introduction
3.2 Research Strategy
3.3 CRISP-DM
3.4 Machine Learning Tools

CHAPTER 4
TOOL APPLICATION, ANALYSIS, & RESULT
4.1 Python
4.2 Rapid Miner

CHAPTER 5
CONCLUSION
5.1 Introduction
5.2 Model Summary
5.3 Limitation and Future Work

REFERENCES
APPENDICES

CHAPTER 1

INTRODUCTION

1.1 Introduction & Background

This chapter presents the background relevant to the topic. It discusses the requirement of the study, the problem statement, the research objectives, the importance of the research, and the structure of the thesis.

1.1.1 Introduction to Machine Learning

Machine learning is an application of artificial intelligence in which a computer uses algorithms to parse data, learn from it, and make predictions. There are three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is the task of learning a function that maps an input to an output based on example input-output pairs; it learns from data that has been labelled with the desired output [4]. There are two types of supervised learning techniques, namely regression and classification.

Regression is a statistical technique used to determine the relationship between two or more variables, where the dependent variable changes according to a change in one or more independent variables. It enables the prediction of a continuous outcome variable (Y) based on the value of one or more predictor variables (X); X is called the independent variable and Y the dependent variable. A common technique for continuous prediction is linear regression, which models a linear relationship between the input variables and the output, e.g. sales prediction. Simple linear regression uses the equation Y = a + bX, where a is the intercept and b is the slope.
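
As an illustration of the equation Y = a + bX, the following minimal Python sketch estimates a and b by ordinary least squares. The function name and the advertising and sales figures are hypothetical and are not taken from the project dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) that minimize squared error for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(X, Y) / variance(X)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept a makes the line pass through the point of means
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: advertising spend (X) and sales (Y), generated as Y = 1 + 2X
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_simple_linear_regression(ad_spend, sales)
predicted_sales = a + b * 6.0  # continuous prediction for an unseen X
print(a, b, predicted_sales)
```

The output of the model is a real number, which is what distinguishes regression from the classification techniques described next.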

Classification is a technique used to categorize data into specific labels or groups, based on labelled values provided for training. It predicts the class of a given data point by approximating a function from the input variables to a discrete output variable, e.g. spam filtering or churn prediction. For classification, it is important to learn more than a single modelling technique. There are different types of classification algorithms, and the choice among them depends on the use case and on evaluation, e.g. naïve Bayes, decision tree, random forest, etc.

Regression is often considered easier than classification, but classification is more popular in analysis. The main difference between them is that regression algorithms are used to predict continuous values such as price or salary, while classification algorithms are used to predict or classify discrete values such as Male or Female, True or False, Churn or Not Churn.
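
To contrast with regression, the sketch below trains a very small logistic regression classifier by gradient descent so that it outputs a discrete Churn / Not Churn label. Logistic regression is used here only for illustration; the single feature (number of customer service calls), the data, and the hyperparameters are assumptions and do not come from the project dataset.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, labels, learning_rate=0.1, epochs=2000):
    """Fit P(churn | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y  # derivative of the log loss w.r.t. (w*x + b)
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_churn(x, w, b, threshold=0.5):
    """Return the discrete class: 1 = Churn, 0 = Not Churn."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical training data: customer service calls and churn label
service_calls = [0, 1, 1, 2, 5, 6, 7, 8]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(service_calls, churned)
print(predict_churn(0, w, b), predict_churn(8, w, b))
```

The model still computes a continuous probability internally; the threshold converts it into one of two discrete classes.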

1.1.2 Introduction to Business to Consumers (B2C)

Business to consumers (B2C) in commerce is the model of selling products and services to individual end-users; it is a relationship between a company and the consumer. Marketing is the technique telecom companies use to reach out to consumers about their products and services. The aim of marketing in this sector is to generate the highest level of revenue, reduce customer churn, and strengthen customer loyalty. Unlike the usual consumer market, which focuses on selling more bundles to customers, the business sector aims to build a strong, mutually beneficial, and continuous relationship in order to market additional services to customers. Creating value for individual consumers is the key managerial goal of a B2C operation; it can create opportunities to serve or benefit other businesses, but the main goal is to create satisfaction for the end users.

Businesses are set up to offer goods and services that satisfy customer wants and needs. Goods are items that can be purchased or transferred to consumers for the purpose of creating satisfaction, e.g. food, cars, clothes, jewelry, vouchers, and more. Services are intangible offers that deliver value to consumers for the purpose of creating satisfaction, e.g. teaching, haircuts, hospitality, mail delivery, and car repair. Services can be classified into four groups:

• People processing services: these require the customer to be present in person for the service to be fulfilled, e.g. a haircut

• Product processing services: these are provided with limited or no participation from the customer, e.g. gardening

• Mental stimulus processing services: these influence the consumer's cognition and promote mental awareness or development, e.g. teaching

• Information processing services: these use technology as the product and require customer participation, e.g. telecom

Service marketing in business is regarded as the application of strategies to predict consumers' wants and needs. Businesses use these strategies to research which kinds of service customers are most interested in, and follow up by creating an offer that satisfies those wants and needs. The traditional business model uses human representatives to contact and connect with consumers, but with improved technology, artificial intelligence uses machine-to-machine communication and digital networks to analyze, contact, and connect users to one another. Services pose several challenges: they can be inconsistent, they cannot be separated from the service provider, and customer perception of service quality can be linked to the interest and skill of the provider. The quality and speed of the service provided also determine its utility and the rate of customer consumption, which can be responsible for the selection of one service provider over another. The four elements, or 4Ps, that determine consumer satisfaction and value creation are people, process, pace, and proof [17]. The objective now is to create a mutual digital means for telecom operators and their service consumers.

1.1.3 Evolution of Data in Telecom

Telecommunication, the exchange of information, sound, messages, and images through optical and electromagnetic systems, has advanced since the start of the 21st century. As wired data communication expanded, a separate form of data exchange that required no wires developed concurrently and evolved into digital wireless packet data protocols. Since the early 2000s, the world has seen technological advancement in the form of 3G, broadband, and the recent launch of 5G networks, with increasing customer interest and subscription. Because telecom is an essential part of daily life, the telecom customer database grows at a fast pace into what is called big data because of the volume of its records, transactions, and tables; this makes it difficult to extract and analyze information about a particular customer. According to [6], “Big data is a term that describes large volumes of high velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.” Companies have tapped into business intelligence (BI) and analytics for their business to consumers (B2C) operations, the direct process of selling goods and services to end-user consumers.

Gerpott, T.J. et al. [7], in research conducted in Germany, found that customer care performance has no significant impact on customer retention in the telecom market. The restructuring of telecom monopolies under European Union policy opened the market to competition in serving customers. The change in policy is expected to promote technology and innovation in the free market. Telecom innovations come as different offers and subscription plans that attract customers from one company to another. Some companies and customers gain from innovative technologies, while their competitors in the market lose when this occurs. It is cheaper to keep an existing customer than to acquire a new one, which is why it is advisable to invest in retention rather than spend more on acquisition.

1.1.4 Customer Churn in Telecom

Customer churn, also known as customer attrition, occurs when a customer fails to renew their subscription for a service. Amin, A. et al. [1] described a churned customer as one who leaves one service provider for another in a saturated market. The business to consumer model creates two kinds of customers: one that subscribes and prepays, and another that comes and goes at whim without stating when the next purchase will take place. The churn of a random or important customer can lead to more churn, which is unprofitable for a business organization. Customer is king; this is why it is better to retain an existing customer, because it is more expensive to acquire a new one. Churn delays business growth and wastes marketing spend; it is therefore important for companies to monitor the rate at which customers unsubscribe from their services and to determine retention success. Customer churn can be divided into two main groups:

Figure 1.1: Types of Churners

Involuntary churn: also referred to as unconditioned churn, this occurs when a customer no longer subscribes to the service provided by a network due to situations beyond the customer’s control. A customer might not intend to leave a service, but a payment failure can lead to a cancelled subscription. This type of churn can be due to natural causes, or can arise when a third party such as a bank is involved and an update to its terms and conditions no longer favors the current service provider. Hence, the company loses a customer and recurring revenue.

Voluntary churn: also referred to as deliberate churn, this occurs when a customer chooses to leave and refuses to subscribe to the service provided. A customer can decide to leave for several reasons, such as a better offer from a competitor, poor customer service, or better technology elsewhere. The churn could also be incidental.

1.1.5 Types of Telecom Data

Telecoms store information for different purposes, and this information can be classified into three types:

1. Consumer Data: consumer data, or customer data, is the collection of personal information about users from their devices. Personal information comes from sources and channels such as social media networks, marketing campaigns, customer service requests, call centers, online browsing data, mobile applications, and more. This information is mostly used for marketing purposes; examples of consumer data collected are age, name, country, email address, and more, depending on preference.

2. Call Detail Data: this contains basic information on mobile phone usage collected from cell towers. The data gathered can be used by analysts to gain information about subscriber behavior and to develop features for predictive models. Examples of data collected are the identities of the call origin and destination and the duration of the call; the cell tower can also be used to approximate the location of a user [8]. The information in the Call Detail Data is stored in binary code and divided into blocks.

3. Network Data: telecoms allow users to share experiences by connecting through a network, which allows the operator to gain a deeper understanding of the network. The knowledge gathered can be used to identify trends and patterns and to maintain and improve quality of service. This may enable better decision-making for business outcomes.

Customer detail records are a valuable data source about network users and subscribers, and their analysis helps to gain insight into customer behavior. They give distinctive insight into individual customers from collective big data, showing unique or similar patterns in three powerful metrics: customer activity pattern, top cell metrics, and social networks [15].

Customer Activity Pattern:

• Subscriber usage (Minutes of Use, or MOU)

• MBs of use (MBU), data

• SMS sent and consumption frequency

Top Cell Metrics

This analysis uses cell towers to identify the cell in the network most commonly used by a customer. Although this only triangulates the user and does not give the exact user location, it provides insight into customer satisfaction by cluster. The analysis is conducted by clustering the areas with the highest number of customers, and it identifies churn through the cells with the highest unsubscribe rate.
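
The top cell idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout (customer, cell), the identifiers, and the churn labels are hypothetical, and real call detail data would need parsing and cleaning first.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (customer_id, cell_id)
call_records = [
    ("c1", "cell_A"), ("c1", "cell_A"), ("c1", "cell_B"),
    ("c2", "cell_B"), ("c2", "cell_B"),
    ("c3", "cell_A"),
    ("c4", "cell_B"), ("c4", "cell_C"), ("c4", "cell_B"),
]
churned_customers = {"c2", "c4"}  # hypothetical churn labels


def top_cell_per_customer(records):
    """Map each customer to the cell that appears most often in their records."""
    counts = defaultdict(Counter)
    for customer, cell in records:
        counts[customer][cell] += 1
    return {c: cells.most_common(1)[0][0] for c, cells in counts.items()}


def churn_rate_per_cell(top_cells, churned):
    """For each top cell, the share of its customers who churned."""
    totals, lost = Counter(), Counter()
    for customer, cell in top_cells.items():
        totals[cell] += 1
        if customer in churned:
            lost[cell] += 1
    return {cell: lost[cell] / totals[cell] for cell in totals}


top_cells = top_cell_per_customer(call_records)
rates = churn_rate_per_cell(top_cells, churned_customers)
print(top_cells)
print(rates)
```

Cells with an unusually high rate would be candidates for a closer look at coverage or service quality in that area.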

Social Network

This enables the identification of friends and family among subscribers as a group on a social network. Information is gathered by accumulating and analyzing the frequency of communication between a user and a group of people. Network operators can use the analysis to offer call or data discounts to a closed user group (CUG). A CUG is a group of telephone subscribers who can make calls to and receive calls from members within the group for free, while calls beyond the group are charged.
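
A minimal sketch of how frequency of communication could be used to propose a CUG is shown below. The call log, the subscriber names, and the threshold of three calls are assumptions made for illustration; an operator would choose its own criteria.

```python
from collections import Counter

# Hypothetical call log: (caller, callee)
call_log = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "bob"), ("ann", "cat"),
    ("cat", "ann"), ("ann", "cat"), ("ann", "dan"), ("bob", "eve"),
]


def frequent_contacts(log, user, min_calls=3):
    """Contacts who exchanged at least min_calls calls with user, in either direction."""
    counts = Counter()
    for caller, callee in log:
        if caller == user:
            counts[callee] += 1
        elif callee == user:
            counts[caller] += 1
    return sorted(contact for contact, n in counts.items() if n >= min_calls)


def is_free_call(caller, callee, group):
    """Within a CUG, calls are free only when both parties are members."""
    return caller in group and callee in group


contacts = frequent_contacts(call_log, "ann")
cug = set(contacts) | {"ann"}
print(contacts, is_free_call("ann", "bob", cug), is_free_call("bob", "eve", cug))
```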

Telecoms can use the data collected for sales forecasting based on where consumers make calls or use services, and by projecting service use, operators can boost coverage and network performance. Analysis of use may assess a planned obsolescence strategy or define complementary services. Forecasting also looks at the number of customers and market share, and can project usage and estimate revenue from the customer community.

1.1.6 Big Data Challenges in Telecom

Network operators that face similar business intelligence challenges adopt artificial intelligence (AI) in the data mining and data warehousing problem domain. Artificial intelligence is the ability of a machine to emulate and recreate human intelligence, first by learning. Big data challenges originate from advances in computing and communication technologies and include its properties, the 3Vs (volume, variety, and velocity), as well as fraud [9].

Volume is the massive amount of data that grows over time. The exponential growth in telecom data, driven by technological advances through the 3G, 4G, and 5G eras, allows more users to access and contribute to the internet. The continuous inflow of data from different sources is measured in terabytes per minute and stored in cloud systems; it is also referred to as a flood of data, e.g. Tweets. Velocity is the speed at which the data is generated and processed in the storage system. The various forms of data stored are described as Variety. In social media use, the data influx from different platforms such as Facebook, Twitter, Snapchat, WhatsApp, and more comes in various forms, which together form big data.

In all, the challenge telecoms face with big data is how to extract useful information from this huge, varied, and fast-growing data: identifying the correlations among the variables and uncovering the relationships between them for meaningful prediction. Data integration is another challenge data analysts face when processing information, and it is also affected by the quality of the data. The data processing stage in data mining involves the extraction, transformation, and loading (ETL) of data, and it consumes more time than the analysis itself.

Telecom operators face challenges and losses due to fraud. The inability to predict and identify

potential defectors masked as customers contributes to losses in churn. Fraud in telecom can be

classified as subscription fraud and superimposed fraud [10]. Subscription fraud is observed

when a user obtains an account without the intention to pay the bill or take advantage of the

promotion without the intention to continue the account. Superimposed fraud is observed when

an active account is hacked or cloned.

1.2 Research Problem & Research Purpose

Organizations that focus on business to consumers should consider monitoring customer churn, as it contributes to annual losses. The purpose of most business organizations is to make a profit, so what happens when the loss of customers is not being monitored, controlled, and understood? Customer churn or attrition may be due to different factors in the services provided by an operator, and understanding these factors is necessary for customer feedback.

The purpose of this research is to develop a model, using machine learning tools and the right algorithm, that enables telecom operators to predict customers who are likely to unsubscribe from a service; to save recurrent expenditure in the marketing campaign budget; to reduce fraud; to manage customer retention through the introduction of a self-marketing [12] digital directory that offers services from one user to another; and to close the profit gap between B2C and digital platforms.

1.3 Research Questions & Research Objectives

• Is the CRISP-DM data mining methodology a feasible approach to analyzing big data in the telecom business to consumer model?

• How can machine learning accurately predict customers who are likely to churn?

• How can marketing cost and fraud be reduced with algorithms in telecom business to consumers?

• How can the profit gap between digital companies, B2B, and B2C in telecom be bridged?

Hence, the research objectives are:

• To use the CRISP-DM methodology to process and analyze big data in telecom

• To use a public dataset to justify the best algorithm for customer churn prediction in telecom

• To establish a business model and solution to manage customer churn, marketing, and losses in telecom

• To research consumer interest in a self-marketing digital directory for user service offerings to improve customer retention

1.4 Research Structure

This section describes how the dissertation is divided into different parts. It outlines each chapter with a brief description, which aims to create a background understanding for the project.

Introduction: the chapter includes the problem definition, research questions, aim, and objectives

Literature Review: the chapter summarizes the existing research on customer churn prediction, drawing on journals and books that cover the theories, concepts, and models of classification and machine learning

Research Methodology: the chapter uses the CRISP-DM approach to conduct the research; each of the six phases of the methodology is adapted for the research

Tool Application, Analysis, and Result: the chapter covers the process of creating classification models using the Python and Rapid Miner analysis tools

Conclusion: the final chapter presents the inferences drawn from the research

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

This literature review provides information about the academic studies and theoretical results in the areas of customer churn, CRISP-DM, classification algorithms, marketing cost and revenue loss, and network employment in telecom business to consumer. It summarizes previous research on the topic from journals, articles, books, and other sources to further clarify the purpose of this research.

2.2 Just-in-time Customer Churn Prediction in the Telecommunication Sector

(2017)

In The Journal of Supercomputing, pp. 1-25, A. Amin et al. [1] explained customer churn and developed a Just-in-Time (JIT) approach as an alternative for Customer Churn Prediction (CCP). Drawing on Customer Relationship Management (CRM) databases, business processes, and machine learning, they examined the feasibility of a JIT approach on cross-company data in the telecom sector. They reviewed the performance of JIT-homogeneous and JIT-heterogeneous ensemble models for CCP in the telecommunication sector, proposed the Support Vector Machine (SVM) algorithm as the best base classifier, and used the mathematical formula of the Kappa statistic for classification.
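
For reference, a minimal sketch of the Kappa statistic is given below, assuming that the reviewed study refers to Cohen's kappa for a binary classifier (the source text does not spell out the formula). Kappa compares the observed agreement between predicted and actual classes with the agreement expected by chance. The counts in the example are hypothetical.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2 x 2 table of actual versus predicted classes."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement, computed from the row and column totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    # Note: undefined when expected == 1 (all cases in one class)
    return (observed - expected) / (1 - expected)


# Hypothetical counts for a churn classifier evaluated on 100 customers
kappa = cohens_kappa(tp=40, fp=10, fn=5, tn=45)
print(kappa)
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance.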

The study uses a 10-fold cross-validation process to validate the performance of the cross-company JIT-CCP model. The authors observed that data sharing can have implications under privacy laws, and concluded by validating the proposed JIT approach for customer churn prediction in telecom and in other sectors.
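
The idea behind 10-fold cross-validation can be sketched as follows: the data is split into ten folds, and each fold is used once as the test set while the remaining nine are used for training. This is a generic illustration, not the procedure of the reviewed study; the function name is hypothetical, and shuffling and stratification are omitted for brevity.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so that sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(25, k=10))
for train, test in folds:
    # A model would be fitted on `train` and scored on `test` here;
    # the k scores are then averaged to give the validation result.
    print(len(train), len(test))
```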

2.3 The CRISP-DM model: The New Blueprint for Data Mining (2016)

Shearer, C., in the Journal of Data Warehousing, 5(4), pp. 13-22, described CRISP-DM (Cross-Industry Standard Process for Data Mining) as a freely available data mining model. It organizes data mining planning as a cycle that comprises six phases [2]: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

CRISP-DM for big data optimizes performance and fulfills business objectives. The model works and does not require high skill proficiency from management, although it can be learned. Planning, precautions about limitations, and time management, especially at the data preparation stage, should be taken into consideration when using this model. It might have a downside in decision making, and SEMMA is a related model that can be integrated for deployment.

2.4 Feature Selection for Text Categorization on Imbalanced Data. ACM

Sigkdd Explorations (2004)

In this paper, Zheng, Z. et al. focused on how to improve scalability, efficiency, and accuracy in text categorization [3]. The experiments showed the potential and actual benefit of combining positive and negative features on imbalanced data, using naïve Bayes, multinomial, and logistic regression as classifiers. The focus was on the extent to which performance can be improved by a better combination of positive and negative features, and on how the optimal combination can be learned.

Four feature selection metrics were presented: information gain, chi-square, correlation coefficient, and odds ratio. The paper also discussed the challenges of imbalanced data and classification algorithm accuracy. Performance was measured with precision and recall, and the effect of positive and negative features was examined, in terms of:

TP (True Positive)

FP (False Positive)

FN (False Negative)

TN (True Negative)
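
Two of the metrics named above, chi-square and odds ratio, can be computed from a 2 x 2 table of term and category counts. The sketch below uses the standard textbook forms, which may differ in detail from the variants used in [3]; the variable names and counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2 x 2 term/category table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category without the term
    d: documents outside the category without the term
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def odds_ratio(a, b, c, d):
    """Odds of the term inside the category divided by the odds outside it."""
    return (a * d) / (b * c)


# Hypothetical counts for one term and one category
chi2 = chi_square(a=30, b=10, c=20, d=40)
ratio = odds_ratio(a=30, b=10, c=20, d=40)
print(chi2, ratio)
```

Terms with higher scores are more strongly associated with the category and are therefore stronger candidates for selection as features.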

From the experiments, the conclusion was that combining positive and negative features on imbalanced data is always ideal. Taking into consideration the dataset, performance measure, and classification method, a good combination of features enhances the performance of both the logistic regression and naïve Bayes algorithms.

2.5 Understand System’s Relative Effectiveness Using Adapted Confusion

Matrix (2013)

Jiang, N. and Liu, H. proposed a predictive method to measure a system’s relative effectiveness, predicting performance in terms of accuracy and completeness for the objective [5]. It focuses on different aspects of the user’s output because effectiveness is not directly achieved. Effectiveness is defined by completeness and accuracy; if the aggregate effectiveness is not considered, comparisons will be inaccurate. The methodology takes a holistic view that correlates the measurements of errors and completion, where errors can occur under different conditions depending on various factors. Errors are classified as critical or non-critical, as determined by performance. They described the confusion matrix as a statistical tool for interpreting intelligent information retrieval by determining the true and predicted classes using a 2 x 2 matrix.

Table 2.5 1: Confusion Matrix

• True positive: true cases correctly identified as true

• False negative: true cases mistakenly identified as false

• True negative: false cases correctly identified as false

• False positive: false cases mistakenly identified as true
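
The four cells of the confusion matrix can be counted directly from actual and predicted labels, and common measures follow from them. The sketch below is a generic illustration with hypothetical churn labels (1 = churn, 0 = no churn); it is not the procedure of the reviewed paper.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # true case correctly identified
        elif a == 0 and p == 1:
            fp += 1  # false case mistakenly identified as true
        elif a == 1 and p == 0:
            fn += 1  # true case mistakenly identified as false
        else:
            tn += 1  # false case correctly identified
    return tp, fp, fn, tn


# Hypothetical labels for ten customers
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision = tp / (tp + fp)          # share of predicted churners who did churn
recall = tp / (tp + fn)             # share of actual churners who were found
accuracy = (tp + tn) / len(actual)  # share of all cases classified correctly
print(tp, fp, fn, tn, precision, recall, accuracy)
```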



The confusion matrix case study compared the effectiveness of two e-commerce websites (Amazon and Play). The experiment was done by searching for the cheapest 16GB memory card on sale and the best-selling picture book for a 16-year-old girl with a budget of 10 pounds. An error was counted as a user attempt, initiated by a mouse click, that was not needed to complete the task error-free with minimal effort. The results revealed that the Play website performed better than the Amazon website, with a lower error rate (51% versus 82%) and a higher task completion rate (68% versus 16%).

The paper described a predictive method that measures the effectiveness of a system through the correlation between errors and completion, using the confusion matrix it introduced. The confusion matrix helps to select a suitable classifier for accuracy by using error rates, rather than the number of errors, as the criterion.

2.6 Beyond the Hype: Big Data Concepts, Methods, and Analytics (2015)

The paper by Gandomi, A. and Haider, M. redefines and elaborates the meaning of big data in terms
of its individual attributes. The swift embrace of big data by the public and private sectors has
increased interest in the subject, and the paper focuses on the analytics of unstructured data. It
introduces tools for predictive analytics on structured data and describes the types of big data
(text, audio, video, and social media) together with the tools used to analyze each format [6].
Some definitions of big data, based on an online survey conducted in 2012, focus on what it is,
while others try to answer what it does.

Figure 2.6 1: Big Data online survey, April 2012

The most suitable definition of big data depends on factors such as time, type of data, and
industry. “Big data is a term that describes large volumes of high velocity, complex and
variable data that require advanced techniques and technologies to enable the
capture, storage, distribution, management, and analysis of the information”. The challenges of
big data are commonly summarized as the 3Vs: volume, variety, and velocity.

Volume refers to the magnitude of data. A survey on Facebook revealed that the social media
network processes up to one million photographs per second and stores 260 billion photos in 20
petabytes of storage, where one petabyte equals 1,024 terabytes.

Variety describes the accumulation of structured, semi-structured, and unstructured data from
various sources. Spreadsheets and relational databases are examples of structured data; images,
audio, and video are unstructured; and extensible markup language (XML), used for exchanging data
online, is semi-structured. Social media is an example of the variety property in big data.

Velocity describes the speed at which data is generated and processed. A high rate of data
generation is observed in mobile devices and in retail, where Wal-Mart processes more than one
million transactions per hour.

Other dimensions discussed include veracity, introduced by IBM, which deals with uncertain data
collected from unreliable sources and is addressed with mining and management tools for analytics.
Variability, introduced by SAS, is the variation in the rate at which complex data flows. Oracle
describes value in big data in terms of density: analyzed data is considered to have a higher
value than the original data.

Figure 2.6 2: Data processing

Techniques and analytical tools for big data are discussed under text analytics, audio analytics,
video analytics, and social media analytics.

Text analytics is the technique of using machine learning and statistics to extract information
from text data. Information can be extracted from social media, emails, call center logs, blogs,
and more, for example to support stock market prediction. Text analytics methods include
information extraction, text summarization, question answering, and sentiment analysis.
Information extraction uses algorithms to extract information from unstructured data and is
divided into two tasks, Entity Recognition (ER) and Relation Extraction (RE). ER finds names in
text and classifies them into predefined categories such as person, date, and location, while RE
finds and extracts meaningful relationships between entities. Text summarization creates
summaries of single or multiple documents, e.g. email and blog summaries.

Question answering (QA) techniques find answers to questions posed in natural language, e.g.
Apple’s Siri. They fall into three categories: the information retrieval approach, the
knowledge-based approach, and the hybrid approach. Sentiment analysis, also known as opinion
mining, is a technique that analyzes the opinions expressed about an organization in public
digital spaces. Opinions can be categorized as positive or negative, and the technique is mainly
applied in business, finance, marketing, and political and social science.

Audio analytics, or speech analytics, examines and extracts information from unstructured audio
data. It is used in healthcare to support diagnosis and treatment, and in call centers.

Video analytics is a technique used to analyze, monitor, and extract meaningful information from

uploaded videos. This can be used in security and surveillance systems for business intelligence.

Server-based architecture and Edge-based architecture are the two approaches to video analytics.

Social media analytics refers to the analysis of structured and unstructured data from social
media. The sources can be grouped into social networks, blogs, media streams, and more.

Predictive analytics uses statistical methods to make future predictions by learning from past
data. It recognizes patterns and relationships in data; in business it can be used to predict a
customer’s next purchase.



2.7 Customer Retention, Loyalty, and Satisfaction in the German Mobile

Cellular Telecommunications Market (2001)

Gerpott, T.J. et al. describe the relationship between customer retention, customer loyalty, and
customer satisfaction using a sample of 684 customers of digital cellular network operators in
Germany. The study reports that customer care performance had no major effect on customer
retention, and it discusses market competition that would allow number portability between
network operators. The analysis gives an insight into the German telecom market and explains how
mobile telephony dominates the mobile communication sub-markets in sales revenue. Keeping a
customer under contract matters more in the telecom business than in markets for purchased
goods [7].

In business to business (B2B), executives decide whether to extend or continue a subscription
contract, unlike in business to consumer (B2C), where the individual customer decides. Customer
satisfaction determines customer loyalty, which in turn determines customer retention. Customer
satisfaction is defined as the experience a customer derives from the service rendered by the
network operator. Two issues affect customer loyalty in Germany. First, a customer who wishes to
terminate a contract may simply stop using the service. Second, a customer who is indifferent
about loyalty may have no choice but to continue the subscription, because the current mobile
number cannot be ported to another mobile network provider and the tariffs charged by all
networks in Germany are similar.

Data collected for the survey by a market research firm excluded some groups of individuals as

follows: VIAG Interkom mobile network customers because the company only had a 0.2% share

in 1999, and T-Mobil customers on the analogue C-Netz, which had a 2.5% market share in 1999.
Customer retention is determined by three factors: phone number holding benefit, cost benefit,
and personal benefit.

2.8 The Use of Call Detail Records and Data Mart Dimensioning for

Telecommunication Companies (2012)

Jukić, O. and Heđi, at the 20th Telecommunications Forum in 2012, discussed the use of CDR files
to record customer call information using a database model known as the star model. Telecom
operators remain profitable in an open market because they implement advanced technology.
Operators analyze data collected from customers to stay ahead of the competition by understanding
customer behavior and needs [8].

A call detail record (CDR) is a record produced by the network operator’s equipment that contains
the details of each phone call. The information recorded in a CDR includes the phone numbers of
both the caller and the recipient, the call duration, the time of the call, and more. CDR files
are mostly recorded in binary format, and the accuracy of the recorded data matters more to the
business than the processing itself. The variety property of big data appears as a challenge when
processing the different data formats. The solution is to convert all data formats into one, then
transform and load the data for analysis.

The database model is set up to record the information of a call, i.e. the full details of the
user and the permitted details of the recipient. The database architecture links the MSISDN
(Mobile Station International Subscriber Directory Number) to calls in order to bill for the
service.



Figure 2.8 1: Sample of Payment Database

A data warehouse, or a data mart as its subject-specific subset, is used for customer behavior
analysis. It is filled through the ETL process of extraction, transformation, and loading:
extraction gathers data from multiple origins; transformation converts data in different formats
to a single format for processing; loading transports the processed data into the warehouse.
Dimensional modeling of telecom CDRs connects calls, subscriber, calling party, service, time,
demographics, and more, indicating primary keys, foreign keys, relations, entities, and columns.
The quality of the CDR data transformed into a single format depends on the hardware equipment.
Warehousing call data helps in learning customer behavior and in making predictions for better
market decisions.
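
The ETL steps described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two source layouts, the field names, and the use of a plain list as the "warehouse" are hypothetical and are not the schema used by Jukić and Heđi.

```python
from datetime import datetime

# Hypothetical raw CDR rows from two sources with different layouts.
SOURCE_A = ["353870000001;353870000002;2020-01-05 10:15:00;125"]
SOURCE_B = [{"caller": "353860000003", "callee": "353870000001",
             "start": "05/01/2020 11:00", "minutes": 2.5}]

def extract():
    """Extraction: collect raw records from each origin, tagged by source."""
    return [("A", r) for r in SOURCE_A] + [("B", r) for r in SOURCE_B]

def transform(tagged):
    """Transformation: convert every record to one common format."""
    rows = []
    for source, rec in tagged:
        if source == "A":
            caller, callee, start, seconds = rec.split(";")
            rows.append({"caller": caller, "callee": callee,
                         "start": datetime.strptime(start, "%Y-%m-%d %H:%M:%S"),
                         "duration_s": int(seconds)})
        else:
            rows.append({"caller": rec["caller"], "callee": rec["callee"],
                         "start": datetime.strptime(rec["start"], "%d/%m/%Y %H:%M"),
                         "duration_s": int(rec["minutes"] * 60)})
    return rows

def load(rows, warehouse):
    """Loading: append the unified rows to the warehouse (a list in this sketch)."""
    warehouse.extend(rows)
    return warehouse

warehouse = load(transform(extract()), [])
```

In a real data mart the loading step would write to fact and dimension tables keyed by subscriber, service, and time rather than to an in-memory list.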



2.9 Use Cases and Challenges in Telecom Big Data Analytics, APSIPA

Transactions on Signal and Information Processing (2016)

Chen, C. explored the advantages and challenges of big data analytics in the telecom sector,
together with its tools and technology. A telecom survey recorded that 75% of operators’
marketing and operations would benefit financially from the use of analytical tools for big data,
e.g. Hadoop and NoSQL. Before adopting these advanced tools, telecom operators used a management
framework from the TM Forum [9].

Figure 2.9 1: Telecom big data analytics framework

In this framework, the resource layer plans and monitors operations, the service layer is
responsible for providing audio, video, and data services, and the customer layer is responsible
for customer relationship management, which handles consumer orders, inquiries, user
satisfaction, and more, and helps to predict customer churn by observing quality of experience.

The paper identifies two use cases of big data: quality of experience (QoE) and SIM box
detection. Customer care in telecom is familiar with the importance of quality of experience for
churn prediction, while SIM box detection addresses fraudulent use of the service that leads to
revenue loss. QoE is a statistical measurement of customer satisfaction on a scale of 1-10,
derived from analyzing audio, video, and data usage. It identifies problems, improves
performance, and supports telecom marketing predictions. Supervised machine learning algorithms
can be used to gather more insight into customer satisfaction, because good network service does
not guarantee user satisfaction. SIM box fraud can be detected by placing test calls from a
different country to confirm a suspicious SIM card, or by conducting a call detail record (CDR)
analysis. Fraudsters can learn these network tracking operations and change their behavior, hence
the need for detection algorithms with a low false positive rate.

Big data challenges originate from advances in computing and communication technologies and
include its properties, the 3Vs (volume, variety, and velocity), and fraud. Volume is the massive
amount of data, which keeps growing over time. The exponential growth in telecom data, driven by
advancing technology from 3G and 4G to 5G, allows more users to access and contribute to the
internet and cloud computing. The continuous inflow of data from different sources is measured in
terabytes per minute and stored in a cloud system [9]; this can be seen as a flood of data, e.g.
tweets. Velocity is the speed at which data is generated and processed in the storage system.
Variety describes the various forms of data stored. In social media, the data arriving from
platforms such as Facebook, Twitter, Tinder, WhatsApp, and more comes in various forms that
together make up big data.



2.10 Discovery of Fraud Rules for Telecommunications – Challenges and

Solutions (1999)

Rosset, S. et al. researched the losses telecom operators incur from fraud. The emphasis is on
rules for detecting fraudulent activities using advanced tools, which detect and conclude by
learning from past data and revealing patterns in the observations. The challenges facing fraud
detection rules are customer credit rating and fraudulent behavior. The churn management and
fraud analysis applications created by Amdocs learn why customers churn and identify who is
likely to churn next by building a prediction model.

As explained in the paper, “Customers who make many international calls, and whose overall
usage is low, tend to churn. This pattern had an explanation, as it was cheaper to make
international calls from one of the competitors” [10]. Fraud in telecom can be classified into
two types: subscription fraud and superimposed fraud. Subscription fraud occurs when a user
obtains an account without intending to pay the bill, or takes advantage of a promotion without
intending to keep the account. Superimposed fraud occurs when an active account is hacked or
cloned. Price plan and credit rating are key factors in churn analysis, while call detail
records (CDRs) are used to understand customer behavior.

Rules in fraud detection are fraud patterns expressed as conditions. The research aims to
understand the challenges of rule discovery for fraud analysis and to differentiate it from the
standard classification rule-discovery problem. The rules are set as triggers to detect fraud,
and their quality is measured by how many fraud cases they identify. Quality is determined by
high accuracy and high coverage (sensitivity): for a rule with high accuracy, most of the cases
it flags are fraudulent, while a rule with high coverage finds most of the fraudulent cases. The
analysis encountered setbacks with the machine learning rule-discovery system C4.5 when
fraudulent customers were classified based on their location.

2.11 An Integrated Framework to Recommend Personalized Retention

Actions to Control B2C E-Commerce Customer Churn (2015)

Rosset, S. et al. examine the telecom challenges of providing services to the consumer market.
They focus on identifying customers with the potential to churn at an early stage, identifying
customer preferences, and executing a retention strategy. The objective of the researchers is to
offer an integrated model for customer churn prediction and retention strategies. Statistics,
data mining, and machine learning are used for prediction, producing a risk score from the
customer profile and transactional behavior. Logistic regression is suggested as a model that
sits between machine learning and regression techniques and can be used for predictive analysis.
The resulting risk quotient is used to predict churn; the logistic regression model formula
is [11]

$$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$

Cluster analysis is also applied to transactional behavior and demographic information for churn
prediction, using the iterative k-means clustering algorithm.
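
The iterative nature of k-means can be seen in a short sketch: each pass assigns every customer to the nearest centroid and then moves each centroid to the mean of its cluster. The two features, the sample customers, and the starting centroids below are hypothetical; the reviewed paper's own variables are not reproduced here.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(v) / len(cluster) for v in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster unchanged
        if new_centroids == centroids:     # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (orders per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
centroids, clusters = kmeans(customers, centroids=[(1, 10), (9, 80)])
```

The result depends on the starting centroids, which is why practical implementations usually run several random initializations and keep the best one.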



Figure 2.11 1: Customer profiling

Customer churn can be predicted with logistic regression as a viable algorithm, and customers
that exhibit similar behavioral patterns can be clustered using the k-means algorithm, with the
aim of a high customer retention rate and a resulting increase in profit.

2.12 Overwhelming OTT: Telcos’ Growth Strategy in a Digital World (2017)

The technology revolution discussed in this paper by Mohr, N. and Meffert, J. describes the
alarm among incumbent telecom operators that mainly serve the business-to-consumer market. A
growing population of about 2.5 billion digital consumers under the age of 25 defines the market
through its consumption of over-the-top (OTT) technology. More entertainment time is spent on new
social media platforms such as YouTube, Snapchat, TikTok, and so on. The major worry of the
incumbent network operators is how the new technology players have taken over their consumer
market with communication applications such as WhatsApp, IMO, WeChat, iMessage, and the like. The
result is a fall in the profit margin of direct calls, international roaming, and multimedia
messaging to the tune of $300 billion, and an estimated compound annual growth of 0.7% in 2020
worldwide [12].

In order to maintain their market, telecom operators are advised to make important decisions
such as:

- To make the core business “super-slim,” cost efficient, and act smart by transforming

their business model. Implement digital operations for commercial, marketing, and sales

in B2B and B2C.

- To identify new growth areas in the space that combines the great potential of

digitization and telecom existing core competencies.

The suggestion is to make network operators the backbone of other machine services such as
social media, analytics, logistics, and so on.

2.13 Introduction to Machine Learning, Second Edition (Adaptive

Computation and Machine Learning)

Machine learning is the ability of computers to make predictions by using algorithms to parse and
learn from data [4]. It is applied in situations where humans cannot fully explain their own
expertise. It can be used for speech and handwriting recognition by learning from past
experience; the application of machine learning to large databases is called data mining. In
retail, machine learning can be used for basket analysis, and in financial institutions for fraud
detection or credit applications. Machine learning is categorized into supervised, unsupervised,
and semi-supervised learning.

- Supervised learning trains an algorithm on data that has been labeled with the desired output,
e.g. classification

- Unsupervised learning trains an algorithm on unlabeled data so that it derives unknown
information on its own, e.g. clustering

2.14 Managing B2B customer churn, retention and profitability

Customer retention is a churn management strategy that businesses should not ignore. The focus
should be on winning customers rather than losing one after another, since continued losses are
bad for business and profitability. Tamaddoni Jahromi, A. et al. analyzed churn management in the
business-to-business setting; their research focuses on two tasks: (a) predicting customer churn
and (b) maximizing profit for businesses [14].

They built predictive models from B2B customer data using the data mining techniques of the
Classification and Regression Tree (CART) model, boosting, and logistic regression. The ROC
(receiver operating characteristic) curve is used to evaluate the performance of the classifiers.
Data mining is less common in B2B than in B2C because less data is available to learn from; big
data is considered underdeveloped in B2B.

The aim of this research is to use classifiers to identify churners and to target selected
churners with incentives to stay. The incentives or discounts are distributed as percentages
among customers based on churn evaluation, frequency of purchase, or customer spending; churn is
described as a change in a monetary variable for prediction. The authors argue that incentives
should be disbursed as a percentage rather than as a fixed discount. Churn is measured using a
calibration period: they choose the first half of a year as the calibration period and the second
half as the prediction period. The calibration period is the time set for measurement to
determine inactive customers for churn prediction, while the prediction period is the interval
for predictive analysis. The retention approach (1) maximizes the total profit of a retention
campaign and (2) determines the optimum target size. Price incentive, as the primary tool in B2B,
is adopted for the retention campaign, and the cost of retaining customers differs depending on
how much each would spend in future and on the discount parameter. The model uses a target
variable with a training set of 70% and a test set of 30%. The differences between models are
measured using the area under the ROC curve (AUC).



CHAPTER 3

RESEARCH METHODOLOGY

3.1 Introduction

The procedure starts with how to analyze big data in telecom using the standard process for data
mining known as CRISP-DM. Machine learning tools (Python and Rapid Miner) are used to process and
analyze customer data for churn prediction in this work. The customer's needs, views, and
requirements are important when carrying out business research and operations as a service
provider. The evaluation of a conducted survey has been used to determine a feasible approach
that can contribute to customer loyalty and retention.

3.2 Research Strategy

The research strategy is the plan that guides the planning, execution, and monitoring of the
research study. Research methods can be qualitative or quantitative, and data collection includes
questionnaires, interviews, focus groups, and documents. A qualitative method has been used to
gather feedback for an innovative telecom retention strategy.

3.3 CRISP-DM

Cross-industry standard process for data mining (CRISP-DM) is an advanced yet simple model that
provides essential knowledge of data mining for beginners and experts. It explains the
step-by-step process of data mining in six phases: business understanding, data understanding,
data preparation, modeling, evaluation, and deployment [2].

Figure 3.3 1: CRISP-DM Process

Business Understanding: this phase entails gaining insight into the telecommunication business
and its operations. It aims to understand why customers churn, and the effects and disadvantages
of churn for the business, by learning from the data.

Data Understanding: this phase describes the business variables collected from the repository
and gives insight into the data. The telecom dataset used is labeled and contains 7043 rows x 21
columns, including customer ID, gender, tenure, internet service, contract, payment method, and
more. 1,869 customers churned and 5,174 renewed their subscription.

Data Preparation: this phase converts the raw data into a format that can be easily analyzed. Raw
data is often inconsistent and can contain missing values, duplicates, and other errors.
Irrelevant information in a dataset is called noise in machine learning. The three main factors
that contribute to the quality of data are accuracy, completeness, and consistency.
Pre-processing improves the quality of the information in a dataset and can be divided into the
following steps: data cleaning, data integration, data reduction, data transformation, and
feature selection. There are no missing values in the sample dataset.

Feature selection [3] is the process of automatically or manually selecting the features that
contribute most to the predicted variable or output of interest. It involves identifying and
removing irrelevant features that add no meaning to the model. Two techniques that can be used
for feature selection are Chi-square and information gain. Information gain is the reduction in
entropy obtained by splitting a dataset on a feature when building a decision tree for
prediction. Entropy is a measure of randomness or uncertainty in data. Information gain can be
biased toward features with a large number of distinct values when the root node is selected.

Gain(X, Y) = Entropy(X) – Entropy(X, Y)

Information gain = (Information before split) – (information after split)
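
The two formulas above can be worked through on a toy sample. This is a minimal sketch: the six customers and the `contract` feature values are hypothetical and are not rows from the telecom dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy before the split minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    after = 0.0
    for value in set(feature_values):
        subset = [label for f, label in zip(feature_values, labels) if f == value]
        after += len(subset) / total * entropy(subset)
    return entropy(labels) - after

# Hypothetical toy sample: contract type against churn.
contract = ["monthly", "monthly", "monthly", "yearly", "yearly", "yearly"]
churn    = ["yes",     "yes",     "no",      "no",     "no",     "no"]
gain = information_gain(contract, churn)
print(f"information gain = {gain:.3f}")
```

A feature with a unique value per row, such as a customer ID, would split the data into pure single-row subsets and score the maximum gain, which illustrates the bias toward many-valued features.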

Modeling: this phase applies tools and algorithms to the data collected. The Rapid Miner tool is
selected to model three algorithms, Decision Tree, Naïve Bayes, and Logistic Regression, from
among those available for classification. The algorithm with the best accuracy is selected after
running the auto model; the comparison considers accuracy and error, performance, information
gain, and scoring time. In this research, accuracy was set as the priority with class imbalance
in mind, so that the algorithm distinguishes correctly between churners and non-churners; the
number of non-churners is expected to exceed the number of actual churners [14]. Error rate alone
is prone to hide misclassification: a model that classifies all customers as non-churners still
has a low error rate while identifying no churners.
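
The risk of labeling every customer a non-churner can be quantified with the class counts reported under Data Understanding (1,869 churners and 5,174 non-churners). The sketch below only does that arithmetic; it does not reproduce any model from this work.

```python
# Class counts reported for the telecom dataset in Section 3.3.
churners, non_churners = 1869, 5174
total = churners + non_churners

# A "model" that labels every customer as a non-churner.
baseline_accuracy = non_churners / total   # share of customers it labels correctly
baseline_error = churners / total          # share of customers it labels wrongly
churners_found = 0                         # it never flags a single churner

print(f"accuracy={baseline_accuracy:.3f}, error={baseline_error:.3f}")
```

Any classifier is therefore only useful for churn management to the extent that its accuracy exceeds this majority-class baseline of roughly 73%.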

Evaluation: this phase assesses the models before deployment. The ROC curve (receiver operating
characteristic curve) is a graph that shows the performance of a classification model at all
thresholds, and it is used to compare the three models. It plots sensitivity (the true positive
rate) against the false positive rate (1 − specificity).
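
A ROC curve is built by sweeping the classification threshold and recording one point per threshold, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis. The sketch below assumes hypothetical labels and scores; it is not the Rapid Miner output.

```python
def roc_points(actual, scores, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a and p)
        fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical data: 1 = churner, scores are predicted churn probabilities.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
points = roc_points(actual, scores, thresholds=[1.0, 0.75, 0.5, 0.0])
```

A curve that rises quickly toward the top-left corner indicates a model that finds most churners while raising few false alarms.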

Deployment: this is the final stage, in which the selected model is applied for prediction. The
insight gathered from the independent variables would be reviewed, the predictive analytics would
be applied to improve telecom operations, and a final report would be presented to the
organization for implementation.

3.4 Machine Learning Tools

Artificial intelligence applications use algorithms that enable computers to learn and improve
without human intervention, as in data mining and predictive modeling. The modeling concept takes
real-world data, which is then fed into the model to improve it.

Python: a programming language used for web development, software prototyping, and data science.
In data science it is used for deep learning, data processing, and image processing. A Python
script has been used to gain insight for data understanding and to check for missing values in
the dataset.



Rapid Miner: data science software that combines functions for machine learning, data
preparation, data mining, and predictive analysis. Rapid Miner features analytical algorithms
that can be used for churn modeling, basket analysis, fraud detection, credit risk modeling, and
more. It includes the Turbo Prep and Auto Model extensions; Auto Model expedites the process of
building and validating models for prediction, clustering, and outlier detection.

CHAPTER 4

TOOL APPLICATION, ANALYSIS, & RESULT

4.1 Python

Python was used for data exploration and missing value detection; there are no missing values in
the telecom dataset.
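
The script used for this step is not reproduced in the text. A minimal sketch of a missing-value check is given below, under stated assumptions: the three sample rows and the column names are hypothetical stand-ins, and in practice the rows would be read from the dataset file.

```python
import csv
import io

# Hypothetical three-row extract; real rows would be read from the dataset file.
SAMPLE = """customerID,gender,tenure,Contract,Churn
0001-A,Female,1,Month-to-month,No
0002-B,Male,34,One year,No
0003-C,Male,2,Month-to-month,Yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Count empty or whitespace-only cells per column.
missing = {col: sum(1 for r in rows if not r[col].strip()) for col in rows[0]}

# Tally the target column to see the class balance.
churn_counts = {}
for r in rows:
    churn_counts[r["Churn"]] = churn_counts.get(r["Churn"], 0) + 1

print(missing)
print(churn_counts)
```

Checking for whitespace-only cells as well as empty ones matters because a blank string in a CSV file is not reported as missing by a naive null check.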

Table 4.1 1: Data Exploration Python

Figure 4.1 1: Data Visualization – Churn by Tenure



4.2 Rapid Miner

The software is applied to business understanding and data understanding of the telecom dataset
sample, which has 7043 rows x 21 columns covering different variables.

Figure 4.2 1: Data Understanding

- Customer ID: the unique identifier for each customer

- Gender: male or female customer

- Tenure: duration of subscription

- Internet service: fiber optic, DSL, or none

- Device Protection: insurance plan

- Contract: describes agreed allowance

- Payment Method: describes mode of payment

- Churn: describes if the customer has unsubscribed or not



Figure 4.2 2: Data Processing

The telecom data is loaded into the software and retrieved to create a single-row version of the
data that can be used in deployment. During preprocessing, feature selection was used to identify
and remove less important variables in order to improve the scalability, efficiency, and accuracy
of data modeling; customer ID and phone service were deselected among the variables [3].

Table 4.2 1: Feature Selection



The modeling phase uses algorithms to learn relationships in the data. An algorithm is a set of
rules that computers use to solve specific problems. The three algorithms used for the churn
prediction task are naïve Bayes, logistic regression, and decision tree.

Naïve Bayes is a probabilistic machine learning classifier that applies Bayes' theorem with
strong independence assumptions between the features. It performs well in applications such as
text classification, sentiment analysis, and spam filtering. Bayes' theorem calculates the
posterior probability P(c|x) from P(c), P(x), and P(x|c):

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

Under the independence assumption the numerator factorizes over the features, so that

$$P(c \mid x) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \dots \times P(x_n \mid c) \times P(c)$$

- P(c|x) is the posterior probability of the class (c, the target) given the predictor (x, the
attribute)

- P(x|c) is the likelihood, which is the probability of the predictor given the class

- P(c) is the prior probability of the class

- P(x) is the prior probability of the predictor
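
A small numerical example shows how the terms defined above combine: the per-class products of likelihoods and prior are computed first, and dividing by their sum, which plays the role of P(x), turns them into posterior probabilities. All probabilities below are hypothetical values chosen for illustration, not estimates from the telecom dataset.

```python
def naive_bayes_posterior(priors, likelihoods, observation):
    """Posterior P(c|x) per class, assuming features are independent given the class."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for feature, value in observation.items():
            score *= likelihoods[c][feature][value]
        scores[c] = score                   # P(x1|c) * ... * P(xn|c) * P(c)
    evidence = sum(scores.values())         # P(x), the normalising constant
    return {c: s / evidence for c, s in scores.items()}

# Hypothetical probabilities, for illustration only.
priors = {"churn": 0.25, "stay": 0.75}
likelihoods = {
    "churn": {"contract": {"monthly": 0.8, "yearly": 0.2}, "fiber": {"yes": 0.6, "no": 0.4}},
    "stay":  {"contract": {"monthly": 0.4, "yearly": 0.6}, "fiber": {"yes": 0.3, "no": 0.7}},
}
posterior = naive_bayes_posterior(priors, likelihoods, {"contract": "monthly", "fiber": "yes"})
print(posterior)
```

In this example the customer is assigned to the churn class even though its prior is lower, because both observed attributes are more likely among churners.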

The logistic model uses the logistic function to model a binary dependent variable; it predicts
the probability of a certain class or event, such as pass/fail, win/lose, alive/dead, or
healthy/sick. It combines the inputs linearly to form a prediction, and the predicted probability
is then transformed to a binary value (0 or 1). Logistic regression uses training data to
estimate the model coefficients, so that the model predicts a value very close to 0 for one class
(No) and a value very close to 1 for the other (Yes).

It is easy to implement and interpret, and very efficient to train.



P(x) = P(y=1|x)
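
The probability above is obtained by passing a linear combination of the inputs through the logistic function and then applying a threshold to obtain the Yes/No label. The sketch below assumes two hypothetical features (tenure and monthly charge) and hypothetical coefficients that were not fitted to the telecom dataset.

```python
from math import exp

def churn_probability(features, coefficients, intercept):
    """P(y=1|x): the logistic function applied to a linear combination of the inputs."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + exp(-z))

def classify(probability, threshold=0.5):
    """Turn a probability into the binary label used in the churn column."""
    return "Yes" if probability >= threshold else "No"

# Hypothetical coefficients for (tenure in months, monthly charge); not fitted values.
coefficients, intercept = [-0.05, 0.02], -0.5
p_new = churn_probability([2, 90], coefficients, intercept)    # short tenure, high charge
p_old = churn_probability([60, 40], coefficients, intercept)   # long tenure, low charge
print(classify(p_new), classify(p_old))
```

The 0.5 threshold is a default rather than a requirement; lowering it flags more customers as likely churners at the cost of more false alarms.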

A decision tree is a non-parametric supervised learning technique that uses branches for
prediction and classification. It can handle both categorical and numerical data. It is a
high-variance machine learning algorithm made up of a root node, branches, and leaf nodes, and it
breaks a dataset down into smaller subsets as the depth of the tree increases. Each internal node
denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node
holds a class label. Splitting methods used in decision trees include ID3 (information gain),
Chi-square, the Gini index, and reduction in variance.
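
Of the splitting methods listed, the Gini index is the simplest to compute by hand: a candidate split is scored by the weighted impurity of the child nodes it produces, and the split with the lowest score is preferred. The sample of six customers below is hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_of_split(groups):
    """Weighted Gini impurity of the child nodes produced by a candidate split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

# Hypothetical split of six customers on contract type (labels are churn yes/no).
monthly = ["yes", "yes", "no"]
yearly = ["no", "no", "no"]
before = gini(monthly + yearly)
after = gini_of_split([monthly, yearly])
print(f"impurity before={before:.3f}, after={after:.3f}")
```

The drop in impurity from the parent node to the children is what makes this split attractive to the tree-building algorithm.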

Figure 4.2 3: Auto Model

Auto model expedites the process of building and validating models automatically without the need
to set up a training set or test set; in machine learning, a training set is used to build a
model, while a test set is used to validate it. After running the process for accuracy and error,
logistic regression produced the best accuracy at 80.6%, along with the best performance, best
information gain, and fastest scoring time, followed by naïve Bayes at 79.1% and decision tree at
73.5%. Logistic regression is a supervised learning technique for solving classification
problems; it builds a regression model to predict the probability that a given record belongs to
class yes or no.

Figure 4.2 4: Evaluation

The evaluation phase is the model assessment for deployment. It uses the ROC (receiver operating
characteristic) curve, a graph showing the performance of a model at all classification
thresholds, to compare the performance of the three models. The curve plots sensitivity (the true
positive rate) against the false positive rate (1 − specificity).

Figure 4.2 5: Lift Chart



The Lift Chart measures the effectiveness of a model by calculating the ratio between the result

obtained with a model and the result obtained without a model. The focus is on the true positives

and thus it can be argued that they indicate the sensitivity of the model. Lift chart captures the

percentage of the customer population that can be targeted for churn management.
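
The ratio described above can be computed for any targeted share of customers: rank customers by predicted churn score, take the top share, and divide the churn rate in that group by the overall churn rate. The labels and scores below are hypothetical, and `lift_at` is an illustrative helper, not a Rapid Miner function.

```python
def lift_at(actual, scores, fraction):
    """Churn rate among the top-scored `fraction` of customers divided by the overall rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    top = ranked[:max(1, round(len(ranked) * fraction))]
    return (sum(top) / len(top)) / (sum(actual) / len(actual))

# Hypothetical data: 1 = churner, scores are predicted churn probabilities.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.35, 0.15, 0.05]
lift_top_30 = lift_at(actual, scores, 0.3)
print(f"lift at 30% = {lift_top_30:.2f}")
```

A lift of 1.0 means the model does no better than contacting customers at random, so values well above 1.0 in the first few deciles are what justify a targeted retention campaign.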

Table 4.2 2: Logistic Regression Performance & Confusion Matrix

Telecom customer churn prediction model performance using logistic regression is evaluated by its
accuracy, precision, recall, and F1, which are measured from and visualized in a confusion
matrix. A confusion matrix is a technique for summarizing the performance of a classification
algorithm [5].

Accuracy: the ratio of correctly predicted examples to the total number of examples.



(1385 + 237) ÷ (1385+293+97+237)

= 0.8061630219 × 100%

= 80.6%

Precision: this is the positive predictive value i.e. the ratio of correct positive predictions to the

total predicted positives

1385 ÷ (1385 + 293)

= 0.82538 × 100%

= 82.5%

Recall: the ratio of true positive predictions to the total number of actual positive values.

1385 ÷ (1385 + 97)

= 0.93454 × 100%

= 93.5%

F1 Score: the harmonic mean of precision and recall. The F1 score is useful when seeking a balance between precision and recall.

2 × (0.82538 × 0.93454) ÷ (0.82538 + 0.93454)

= 0.87658 × 100%

= 87.7%

• Positive (P): the observation is positive
• Negative (N): the observation is not positive
• True Positive (TP): true cases correctly identified
• False Negative (FN): true cases mistakenly identified as false ones
• True Negative (TN): false cases correctly identified
• False Positive (FP): false cases mistakenly identified as true ones

The algorithm predicts the customer churn class as churner or non-churner. Each prediction therefore falls into one of four cases: a real churner classified as ‘churner’ (true positive), a real churner classified as ‘non-churner’ (false negative), a real non-churner classified as ‘non-churner’ (true negative), and a real non-churner classified as ‘churner’ (false positive).
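The four metrics can be reproduced from the confusion matrix counts with a few lines of Python. The assignment of counts to cells (TP = 1385, FP = 293, FN = 97, TN = 237) is inferred from the accuracy, precision, and recall calculations in this section and is an assumption about how the matrix is laid out.

```python
# Counts inferred from the worked calculations in this section (assumed cell layout)
tp, fp, fn, tn = 1385, 293, 97, 237

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct predictions over all predictions
precision = tp / (tp + fp)                   # correct positives over predicted positives
recall = tp / (tp + fn)                      # correct positives over actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy  = {accuracy:.1%}")
print(f"precision = {precision:.1%}")
print(f"recall    = {recall:.1%}")
print(f"F1 score  = {f1:.1%}")
```

Because recall is much higher than precision here, the F1 score sits between the two and gives a single figure that penalizes an imbalance between them.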

Table 4.2 3: Churn Analysis Result

The logistic regression result above shows the target customers and the prediction result. The labeled columns ‘Churn’ and ‘Churn Prediction’ each hold a Yes/No value. Each row of the Churn column states whether a customer has churned, while the corresponding entry in the Churn Prediction column is the model's prediction, which validates the prediction model. Logistic regression marks customers who are likely to churn as yes, and customers who are not likely to churn as no.

Figure 4.2 10: Service Deployment



The deployment phase is crucial to the success of the project, so it is important to consider it from the business understanding phase onward. The insight gathered will be used to make improvements within the organization: this is where predictive analytics enables improvement in telecom business operations, and a report will be presented to management for implementation. The variables that most urgently require review are internet service, monthly charges, tenure, contract, and payment method.

Retention Strategy

The deployment review leads to the final objective of this project, which is aimed at marketing in the telecom sector to generate the highest level of revenue and increase customer retention, since customer satisfaction determines customer loyalty, which in the end determines customer retention [7]. A retention strategy should understand customers better and tailor products and services to their needs. Technology upgrades, marketing strategies, and attractive offers are the executive advice for the affected factors. Internet service technology could be upgraded from 4G to 5G, with service maintenance for pre-existing technology. Unused data may be reviewed for rollover in monthly charges. Customers may be allowed more payment method options to support the service provided; payment methods may include cash deposit, cheque, credit card, debit card, and wire transfer. Promotional marketing strategies may include brand activation, incentives, and bundle offers, although incentives have been found not to be an effective churn management strategy [14]. Churn can be measured by customer frequency and consistency, where frequency describes the total volume of purchases by a customer and consistency describes how recently the customer purchased a product or subscription. Fixed and equal discounts are common in B2C, compared to B2B, where a percentage volume discount is advised to avoid excess and unnecessary budget cost.

The innovative idea of this work is to cross-sell a digital directory platform to customers. The platform would let users offer vocational skills or services for extra disposable personal income, and would provide quick solutions for personal needs. These vocational skills or services may include truck driving, hairdressing, painting, woodwork, gardening, etc. Users would add their choice of service by category so that other users who need the service can identify and contact them based on location. Just as a file-system directory is a cataloging structure that contains references to other computer files, the platform would hold references to its users, so a telecom operator could connect one user to another through the platform without disclosing private contact details (phone number and email). To protect users from abuse, there will be options to rate users and to report inappropriate activities.

A survey of 65 individuals was conducted to gather feedback on user interest in a telecom operator introducing a digital directory for self-marketing. The feedback helped to gain insight into whether the method would build loyalty and serve customers' needs. Because the idea was new, some respondents were skeptical about giving feedback, but the overall result of the survey remained positive. The survey includes five simple questions, as follows:

• Why your mobile network operator?
• Duration of subscription with mobile network?
• Employment status?
• Would you like your network operator to list your vocational skill for casual employment?
• Would you like to pay for the listing service?

The first question asks which service (data, call tariff, or other) satisfies the customer and contributes to loyalty. The second question, on duration, asks how long the user has been subscribed to the network operator, as an indicator of loyalty. The third question asks about the user's employment status, as an indicator of disposable income; the fourth question gauges the user's interest in the listing service; and the final question asks about the user's willingness to pay extra for the service.

Table 4.2 1: Research Survey

Question 1 | Call Tariff (%) | Data (%) | Other (%) | Responses
Why Your Network Operator | 35.4 | 47.7 | 16.9 | 65

Question 2 | 0 – 4 Years (%) | 5 – 10 Years (%) | 10 & above (%) | Responses
Duration of Subscription | 39.1 | 23.4 | 37.5 | 64

Question 3 | Employed (%) | Self-employed (%) | Unemployed (%) | Responses
Employment Status | 62.5 | 31.3 | N/A | 64

Question 4 | Yes (%) | No (%) | Maybe (%) | Responses
Would you like your network operator to list your vocational skill for casual employment | 50.8 | 30.8 | 18.5 | 65

Question 5 | Yes (%) | No (%) | Maybe (%) | Responses
Would you like to pay for the listing service | 29.2 | 43.1 | 27.7 | 65

Customer satisfaction feedback from the survey reveals that 47.7% of users prefer data, compared with 35.4% for calls and 16.9% for other services, so demand for data consumption is high. 37.5% have been subscribed for over 10 years, 23.4% for 5 to 10 years, while 39.1% joined a telecom service in the past 4 years. 62.5% are employed, while 31.3% are self-employed. A majority of the respondents, 50.8%, would like to be listed, 30.8% would not like to be listed, and 18.5% are undecided about joining the innovative platform. In conclusion, the respondents are mostly employed, most often subscribed to a network service within the past four years, and derive more satisfaction from data service. They would also like to benefit from the digital directory platform for user service marketing, since it can contribute to casual employment and personal disposable income and provide solutions. However, according to the survey, users are not willing to pay extra for the listing: 43.1% would not pay, against 29.2% who would. The 27.7% who are unsure about paying reflect consumer price sensitivity. Price sensitivity towards a service is an important factor that determines consumer uncertainty about product or service benefits [16]. Product or service uncertainty in consumers develops from information asymmetry, because brands or network operators are more informed about the service than consumers are. Consumer price sensitivity about a service is inversely related to the level of performance expectation.

Implementing the idea is expected to increase customer satisfaction among the largely employed subscriber base at no additional cost to them, which should increase customer loyalty and, in the end, retention.



CHAPTER 5

CONCLUSION

5.1 Introduction

The final chapter addresses the research questions using insights gathered from applying machine learning tools to analyze customer churn in this research. It reveals the importance of CRISP-DM, customer churn prediction using machine learning tools, loss reduction, and profitability in the telecom business-to-consumer sector.

5.2 Model Summary

Telecom data has been shown to be compatible with the CRISP-DM methodology. The methodology supported an intelligent analytical approach and revealed unfamiliar factors to consider in business applications, e.g. data inconsistency in pre-processing.

Customer churn was predicted with logistic regression because of its accuracy and its coefficient model. Implementing the right algorithm, as a set of rules, contributes to the accuracy of customer churn prediction, which supports investing more in retention and less in acquisition.



Table 5.2 1: Evaluation

MODEL | ACCURACY | STANDARD DEVIATION | GAINS | TOTAL TIME | TRAINING TIME
Naïve Bayes | 79.1% | 1.1% | 200 | 1 min 19 s | 38 ms
Logistic Regression | 80.6% | 0.3% | 280 | 1 min 18 s | 66 ms
Decision Tree | 73.5% | 0.9% | 0 | 1 min 8 s | 22 ms

Churn prediction gives management grounds for optimism by identifying customers who are about to leave the company, and a Lift Chart analysis can be used to select the range of customers to target for strategic retention marketing. Churn prediction also identifies the number or range of customers that have left the company, so that an accurate recurrent marketing budget can be set; this helps operators manage losses once the correct unit of expenditure is budgeted. Losses to fraud are identified analytically by setting rules that detect patterns in credit and account abusers who leave bad debt. These users are immediately cut off or blocked, or extra verification is requested by the operator’s system.

Previous works in this field mostly focus on identifying customers who are likely to churn, leaving retention strategy to telecom marketing executives, and on reducing losses through fraud detection. This work aims at managing customer churn by facilitating a digital means that could contribute to household disposable income [13] and provide solutions, in order to raise customer satisfaction, which in turn increases customer loyalty; operators leverage the benefit directly as a retention strategy. The self-marketing directory idea would also contribute to a growth strategy for network operators that focus on closing the profit margin gap between the telecom business-to-consumer sector and digital establishments. The survey suggested that additional disposable income for consumers would increase spending and subscription capacity. Most consumers are not willing to pay an additional cost to use the digital service, which is consistent with the concept of price elasticity of demand: if the price of a commodity or service increases, the quantity demanded will fall.

The calibration period, the time set for measurement to determine inactive customers for churn prediction, is usually the first half of the year, a six-month period, with the second half used as the prediction period. For effective retention management, it is advised that churn prediction in telecom set a quarterly calibration period, whereby the first four months are used to study customer behavior and the second period is used for prediction.
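The calibration and prediction windows can be sketched as follows, using the half-year split described above. The activity log, customer identifiers, cut-off dates, and the function name are hypothetical; a customer is treated as churned when they were active during calibration but show no activity in the prediction window.

```python
from datetime import date

def label_churners(activity, calibration_end, prediction_end):
    """Customers active in the calibration window but silent in the prediction window."""
    active_cal = {cust for cust, day in activity if day <= calibration_end}
    active_pred = {cust for cust, day in activity
                   if calibration_end < day <= prediction_end}
    return sorted(active_cal - active_pred)

# Hypothetical activity log: (customer id, date of a recharge or usage event)
activity = [
    ("A", date(2020, 1, 15)), ("A", date(2020, 8, 3)),
    ("B", date(2020, 2, 20)),
    ("C", date(2020, 5, 9)), ("C", date(2020, 11, 30)),
    ("D", date(2020, 9, 1)),
]

# First half of the year for calibration, second half for prediction
churned = label_churners(activity, date(2020, 6, 30), date(2020, 12, 31))
print(churned)  # ['B']
```

Shortening the two windows only requires changing the two cut-off dates; customer D is ignored because they were not observed during calibration.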

5.3 Limitation and Future Work

The research might face limiting rules, depending on the labor law that regulates employment in a given country or region. There is more to learn from the success of the digital directory service in churn management, using machine learning to identify, rate, and recommend services according to user satisfaction. Once a company collects and analyzes market research data, it can create marketing materials based on the message it believes would appeal to a target demographic. Recommendation algorithms can use the collected data to make data-driven suggestions to consumers based on their history, to improve user experience and promote sales. In future, clustering and recommendation algorithms would be modeled and evaluated for the directory software development.

REFERENCE

1. Amin, A., Al-Obeidat, F., Shah, B., Al Tae, M., Khan, C., Durrani, H.U.R. and Anwar,

S., 2017. Just-in-time customer churn prediction in the telecommunication sector. The

Journal of Supercomputing, pp.1-25.

2. Shearer, C., 2000. The CRISP-DM model: the new blueprint for data mining. Journal of

data warehousing, 5(4), pp.13-22.

3. Zheng, Z., Wu, X. and Srihari, R., 2004. Feature selection for text categorization on

imbalanced data. ACM Sigkdd Explorations Newsletter, 6(1), pp.80-89.

4. Alpaydin, E., 2020. Introduction to machine learning. MIT press.

5. Jiang, N. and Liu, H., 2013, July. Understand system’s relative effectiveness using

adapted confusion matrix. In International Conference of Design, User Experience, and

Usability (pp. 294-302). Springer, Berlin, Heidelberg

6. Gandomi, A. and Haider, M., 2015. Beyond the hype: Big data concepts, methods, and

analytics. International journal of information management, 35(2), pp.137-144.

7. Gerpott, T.J., Rams, W. and Schindler, A., 2001. Customer retention, loyalty, and

satisfaction in the German mobile cellular telecommunications

market. Telecommunications policy, 25(4), pp.249-269.

8. Jukić, O. and Heđi, I., 2012, November. The use of call detail records and data mart

dimensioning for telecommunication companies. In 2012 20th Telecommunications

Forum (TELFOR) (pp. 292-295). IEEE.



9. Chen, C.-M., 2016. Use cases and challenges in telecom big data analytics. APSIPA Transactions on Signal and Information Processing, 5, p.e19. doi: 10.1017/ATSIP.2016.20.

10. Rosset, S., Murad, U., Neumann, E., Idan, Y. and Pinkas, G., 1999, August. Discovery of

fraud rules for telecommunications—challenges and solutions. In Proceedings of the fifth

ACM SIGKDD international conference on Knowledge discovery and data mining (pp.

409-413).

11. Renjith, S., 2015. An integrated framework to recommend personalized retention actions

to control B2C E-commerce customer churn. arXiv preprint arXiv:1511.06975.

12. Mohr, N. and Meffert, J., 2017. Overwhelming OTT: Telcos’ growth strategy in a digital

world. McKinsey Quarterly, pp.2-12.

13. De Wolff, P., 1941. Income elasticity of demand, a micro-economic and a macro-economic interpretation. The Economic Journal, 51(201), pp.140-145. doi:10.2307/2225666

14. Jahromi, A.T., Stakhovych, S. and Ewing, M., 2014. Managing B2B customer churn,

retention and profitability. Industrial Marketing Management, 43(7), pp.1258-1268.

15. Valaitis, A. Analyzing Telecom Subscriber Behavior: 3 Advanced Metrics Built from CDR’s. https://www.exacaster.com/call-detail-record-analysis/

16. Erdem, T., Swait, J. and Louviere, J., 2002. The impact of brand credibility on consumer

price sensitivity. International journal of Research in Marketing, 19(1), pp.1-19.

17. Prachi M., 2019. Service Marketing. 11 July. https://theinvestorsbook.com/service-marketing.html

APPENDICES

The following instructions and code provide step-by-step guidance for understanding the project artifacts. The machine learning tools used are Python and Rapid Miner, for visualization, data processing, and logistic regression modeling.

1. Dataset: “WA_Fn-UseC_-Telco-Customer-Churn.csv”

2. Source: https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction/data

3. User environment is Python

4. Import necessary library for the task


import numpy as np
import io
import pandas as pd
import seaborn as sns
import os
import matplotlib.pyplot as plt
import plotly.offline as py

from PIL import Image


%matplotlib inline

py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.figure_factory as ff
import itertools
import warnings
warnings.filterwarnings("ignore")

5. Gain access into dataset


# General form:
# telchurn = pd.read_csv("file_path.csv")

telchurn = pd.read_csv("C:/Users/OLANREWAJU/Documents/Courses/Second Semeter/Data Mining/CA/CA Project/WA_Fn-UseC_-Telco-Customer-Churn.csv")

6. Return the first n rows in dataset


telchurn.head()

7. #Data Understanding - Size description, to see total number of rows and columns
print("Rows : " , telchurn.shape[0])
print ("Columns : " , telchurn.shape[1])

8. #Data Preparation - check for missing values


print ("\nMissing values : ", telchurn.isnull().sum().values.sum())

9. #Data Preparation - check for N/A, NaN, bias,variance, error


telchurn.isnull().any()

10. #Check for unique values

print ("\nUnique values : \n",telchurn.nunique())

11. #Data Preparation - Manipulation, Replace null in Total Charges, Drop null

telchurn['TotalCharges'] = telchurn["TotalCharges"].replace(" ",np.nan)


telchurn = telchurn[telchurn["TotalCharges"].notnull()]
telchurn = telchurn.reset_index()[telchurn.columns]

12. #Convert to float


telchurn["TotalCharges"] = telchurn["TotalCharges"].astype(float)

13. #Customers w/o internet are unhappy. Replace "No internet service" with "Not happy"
replace_cols = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                'TechSupport', 'StreamingTV', 'StreamingMovies']
for i in replace_cols:
    telchurn[i] = telchurn[i].replace({'No internet service': 'Not happy'})
telchurn["SeniorCitizen"] = telchurn["SeniorCitizen"].replace({1: "Happy", 0: "Not Happy"})

14. #Tenure to categorical column


def tenure_lab(telchurn):
    # Map a row's tenure value to a tenure group label
    if telchurn["tenure"] <= 12:
        return "Tenure_0-12"
    elif (telchurn["tenure"] > 12) & (telchurn["tenure"] <= 24):
        return "Tenure_12-24"
    elif (telchurn["tenure"] > 24) & (telchurn["tenure"] <= 48):
        return "Tenure_24-48"
    elif (telchurn["tenure"] > 48) & (telchurn["tenure"] <= 60):
        return "Tenure_48-60"
    elif telchurn["tenure"] > 60:
        return "Tenure_gt_60"

# Apply the labelling function row by row
telchurn["tenure_group"] = telchurn.apply(lambda telchurn: tenure_lab(telchurn), axis=1)

15. #Data Preparation - separation


#customer churn from non churn

churn = telchurn[telchurn["Churn"] == "Yes"]


not_churn = telchurn[telchurn["Churn"] == "No"]

#categorical and numerical


Id_col = ['customerID']
target_col = ["Churn"]
cat_cols = telchurn.nunique()[telchurn.nunique() < 6].keys().tolist()
cat_cols = [x for x in cat_cols if x not in target_col]
num_cols = [x for x in telchurn.columns if x not in cat_cols + target_col + Id_col]

16. # Pie chart Visualization -

# Labels are the class names (Yes/No); values are the matching counts
lab = telchurn["Churn"].value_counts().keys().tolist()
val = telchurn["Churn"].value_counts().values.tolist()

trace = go.Pie(labels = lab,
               values = val,
               marker = dict(colors = ['lime', 'red'],
                             line = dict(color = "white",
                                         width = 1.3)
                             ),
               rotation = 90,
               hoverinfo = "label+value+text",
               hole = .5
               )
layout = go.Layout(dict(title = "Customer Churn Data",
                        plot_bgcolor = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                        )
                   )

data = [trace]
fig = go.Figure(data = data,layout = layout)
py.iplot(fig)

17. #Visualization of customer churn by tenure


tg_ch = churn["tenure_group"].value_counts().reset_index()
tg_ch.columns = ["tenure_group","count"]
tg_nch = not_churn["tenure_group"].value_counts().reset_index()
tg_nch.columns = ["tenure_group","count"]

#Churn Customers
trace1 = go.Bar(x = tg_ch["tenure_group"], y = tg_ch["count"],
                name = "Churn Customers",
                marker = dict(line = dict(width = .5, color = "black")),
                opacity = .9)

#Non-churn Customers
trace2 = go.Bar(x = tg_nch["tenure_group"], y = tg_nch["count"],
                name = "Non Churn Customers",
                marker = dict(line = dict(width = .5, color = "black")),
                opacity = .9)

layout = go.Layout(dict(title = "Customer Churn by Tenure",
                        plot_bgcolor = "rgb(243,243,243)",
                        paper_bgcolor = "rgb(243,243,243)",
                        xaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     title = "Tenure",
                                     zerolinewidth=1, ticklen=5, gridwidth=2),
                        yaxis = dict(gridcolor = 'rgb(255, 255, 255)',
                                     title = "Customer Count",
                                     zerolinewidth=1, ticklen=5, gridwidth=2),
                        )
                   )
data = [trace1,trace2]
fig = go.Figure(data=data,layout=layout)
py.iplot(fig)

Rapid Miner

1. Load Data: Select ‘import process’ from file in Rapid Miner. Choose ‘.rmp’ file
format from repository.
2. Double-click ‘load and process’
3. Insert or drag and drop dataset.csv file from folder into ‘load data’
4. Execute process: click play button
5. Result
