GiaoHoThanh - RFM and CLV Paper - V2

The document summarizes a study that performed customer segmentation and predicted customer lifetime value (CLV) using machine learning algorithms. Specifically, it used the Recency-Frequency-Monetary (RFM) model to segment customers into groups and the Pareto/Negative Binomial Distribution and Gamma-Gamma models to predict CLV. The study experimented on transaction data from 121,317 customers and found the models achieved high accuracy according to evaluation metrics. The proposed models can help businesses better understand customers and implement effective marketing strategies tailored to each customer group.


Customer segmentation analysis and customer lifetime value prediction using Pareto/NBD and Gamma-Gamma model
Kim-Giao Tran1,2, Van-Ho Nguyen1,2, Thanh Ho1,2,*
1University of Economics and Law, Ho Chi Minh City, Vietnam
2Vietnam National University, Ho Chi Minh City, Vietnam
*Corresponding author, Email: thanhht@uel.edu.vn

ABSTRACT

Customer segmentation divides customers into groups with common characteristics such as demographics, interests, needs, or locations, helping an organization manage customer relationships expertly and gain a deep understanding of its customers. With the advancement of current technology and the proliferation of machine learning methods, this study applies data science algorithms to traditional marketing models: the Recency, Frequency, and Monetary (RFM) model for customer segmentation, and the Pareto/Negative Binomial Distribution (NBD) and Gamma-Gamma models for predicting customer lifetime value (CLV) to determine customer value. The study experiments with a customer segmentation model on a dataset of 121,317 historical transactions, covering both individual customers and retailers. Customers are then divided into different groups based on similar characteristics and behaviors to help managers make better decisions to retain customers. In addition, predicting CLV helps the business view customers more comprehensively.
Experimental results and model evaluation show high accuracy, with Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and F1 scores of 0.79, 0.89, 0.94, and 0.91, respectively. Based on the empirical results, the proposed research model can also be applied in other businesses, helping them form the right and effective business strategies for each customer group depending on their financial and human potential.
Keywords: RFM model, customer segmentation, clustering, CLV, Pareto/NBD, customer
retention
1. Introduction

In the intense competition and complexity of the business environment, customer segmentation helps marketing departments easily define the pivotal solutions to attract each group of customers. Based on data segmentation, customers are classified into different groups according to distinguishing similarities such as gender, age, income, products of interest, and purchasing behaviors (Anitha, P. & Patil, MM, 2019). These characteristics are analyzed and categorized based on the historical purchasing data of the business. The Recency, Frequency, and Monetary (RFM) model is well known in marketing as a tool to identify a company's best customers by calculating and analyzing their spending habits. RFM analysis weighs customers' importance by scoring them on three measures: how recently they have made a purchase (Recency), how often they have bought (Frequency), and how much they have spent (Monetary) (Thanh, HT, Son, NT, 2021).
Besides using RFM for customer segmentation, customer lifetime value (CLV), retention rate, and churn rate form a combination of robust metrics to measure customer satisfaction. While CLV is the discounted value of future profits that the customer spends on the company (Glady, N., Baesens, B. & Croux, C., 2007), the retention rate shows the ability of a company to keep its existing customers (Ismail, M.B.M. & Safrana, M.J., 2015).

In contrast, the churn rate is the percentage of customers moving out of a cohort over a
particular period.
As Kotler and Keller describe customer churn as a phenomenon that results in wasted money and effort (Kotler, P. & Keller, KL, 2006), choosing to focus on retaining existing customers and turning potential customers into loyal ones helps businesses reduce costs compared with building advertising campaigns to attract new customers.
However, when a business has many customers, each with many transactions, it is tough to know whether they are still attached to the business. Moreover, businesses cannot calculate exactly when a customer will leave; they can only predict it based on probabilities, which makes the problem even more difficult.
Realizing the importance of customers to businesses, this study identifies the goal of
analyzing customer segmentation and customer lifetime value with a combination of business,
marketing, and information technology knowledge bases. The final result is to give managers
a multi-dimensional view of their customers. That makes it easier for managers to decide
whether to implement appropriate marketing strategies for each customer group as well as to
assess whether existing customer care policies are still appropriate for retaining customers or
not.

The rest of the article is organized as follows. Section 2 covers the theoretical basis and related studies used to identify models and algorithms suitable for the set goals. Section 3 presents the methodology, describing relevant issues and the experimental process. Section 4 presents the results and discussion of the identified customer segments. The last section contains the conclusion and implications of the study.
2. Theoretical background and related work
This section provides an overview of the literature and related research relevant to the purpose of this study.
The RFM model is usually used to classify customers and define their behaviors.
RFM records the customers' transactions under three factors:

(1) Recency is the time between the last purchase date of that customer and the date of implementing the model;
(2) Frequency is the total number of transactions of that customer;
(3) Monetary is the actual amount of money the customer has spent on the business's products or services.

The most well-known clustering methods based on RFM are customer quintiles (Miglautsch, 2000) and clustering by K-Means.
Clustering using the K-Means algorithm is an unsupervised learning method used for data analysis (Anitha, P. & Patil, MM, 2019). It randomly generates k points as initial centroids, where k is chosen by the user. Each point is assigned to the cluster with the closest centroid, and the centroids are then updated by taking the mean of the points in each cluster (Anitha, P. & Patil, MM, 2019) (Ismail, M. & Dauda, U., 2013) (Yedla, M., Pathakota, S. R. & Srinivasa, T.M., 2010). Data points may move to different clusters after each iteration. The algorithm converges when no point changes clusters and the centroids remain unchanged. K-Means mainly uses Euclidean distance to measure the distance between data points and centroids (Dwivedi, S., Pandey, P., Tiwari, MS & Kalam, A., 2014). The Euclidean distance between two m-dimensional data points x = (x1, x2, x3, …, xm) and y = (y1, y2, y3, …, ym) is described as equation (1):

d(x, y) = √((x1 − y1)² + (x2 − y2)² + … + (xm − ym)²)    (1)
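To make the algorithm concrete, the assignment and update steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the toy standardized RFM rows and the choice k = 2 are invented for the example, not taken from the paper's dataset.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) using Euclidean distance."""
    rng = np.random.default_rng(seed)
    # pick k distinct data points as the random initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point joins the cluster of its closest centroid (equation 1)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid becomes the mean of the points in its cluster
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged: centroids unchanged
            break
        centroids = new_centroids
    return labels, centroids

# hypothetical standardized (R, F, M) rows, two obvious groups
rfm = np.array([[0.1, 0.2, 0.1], [0.2, 0.1, 0.3],
                [2.9, 3.1, 3.0], [3.1, 2.8, 3.2]])
labels, centroids = kmeans(rfm, k=2)
```

In practice a library implementation (e.g. scikit-learn's KMeans) would be used; the sketch only shows the two alternating steps the text describes.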

Although K-Means is the most common clustering algorithm, it still has some drawbacks. Because the centroids are first chosen randomly, the results can differ between runs. Besides, defining the right number of clusters is also a difficult problem. Thanh HT and Son ND (2021) used the Elbow method to find the optimal number of clusters and then used the Silhouette method to re-evaluate the results, while Anitha and Patil (2019) only used the Silhouette score to find the optimal k. These studies point out the efficiency of clustering methods in data science, demonstrate the clustering results in RFM analysis, and describe customers' different behaviors in specific clusters.
The Elbow method determines the number of clusters of a dataset using a visual technique. The graph plots the Sum of Squared Errors (SSE), which measures the differences between points within clusters. The larger the number of clusters k, the smaller the SSE value. When the SSE values of successive cluster counts form an angle, the cluster count at the elbow flexion point is chosen, i.e., the value of k with the biggest reduction in SSE compared with its predecessor (Thanh, HT, Son, NT, 2021) (Humaira, H. & Rasyidah, R., 2020) (Nainggolan, R., Perangin-angin, R., Simarmata, E. & Tarigan, F.A., 2015). The SSE calculation is described as equation (2):

SSE = Σ_{j=1}^{k} Σ_{x_i ∈ C_j} (x_i − m_j)²    (2)

where m_j is the centroid of cluster C_j and k is the number of clusters. The graph of the SSE values for different numbers of clusters looks like an elbow arm. The Elbow method is easy to implement and adequately fits complex, huge datasets, but its weakness is that the user must choose the number of clusters based on experience (Humaira, H. & Rasyidah, R., 2020).
Along with the Elbow method, the Silhouette score is an effective way to see how well each cluster is separated from the others. In the two studies (Anitha, P. & Patil, M. M., 2019) (Humaira, H. & Rasyidah, R., 2020), the authors give two different theories about the range of the Silhouette score. On closer inspection, the Silhouette score lies in the range [−1, +1]: a score near +1 indicates that the clustering performed well, a value around 0 means there is no clear distinction between the clusters, and a score near −1 means the clusters are not well separated (Ogbuabor, G. & Ugwoke, FN, 2018). The Silhouette score of a data point i is written as equation (3):

s_i = (b_i − a_i) / max(a_i, b_i)    (3)

where a_i is the average intra-cluster distance (the mean distance between i and the other data points in the same cluster), and b_i is the average inter-cluster distance (the mean distance between i and the data points in the nearest cluster that i does not belong to).
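Equations (2) and (3) are straightforward to compute directly. The sketch below implements both on a hypothetical two-cluster toy set (the points and labels are illustrative, not the paper's data); b_i is taken as the mean distance to the nearest other cluster, following the standard definition.

```python
import numpy as np

def sse(points, labels):
    """Sum of squared errors to the cluster centroids (equation 2)."""
    total = 0.0
    for c in set(labels):
        members = points[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def silhouette_scores(points, labels):
    """Per-point silhouette s_i = (b_i - a_i) / max(a_i, b_i) (equation 3)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = np.empty(n)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False
        # a_i: mean distance to the other points in i's own cluster
        a = dist[i, same].mean()
        # b_i: smallest mean distance to the points of any other cluster
        b = min(dist[i, labels == c].mean() for c in set(labels) - {labels[i]})
        scores[i] = (b - a) / max(a, b)
    return scores

# toy clustering: two well-separated clusters in 2-D
pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
lbl = np.array([0, 0, 1, 1])
mean_silhouette = silhouette_scores(pts, lbl).mean()
```

For this well-separated toy set the mean silhouette is close to +1 and the SSE is small, matching the interpretation given above. (The sketch assumes every cluster has at least two points.)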
The Pareto/Negative Binomial Distribution (NBD) model is one of the classic RFM-based models used to calculate CLV. The model mostly uses the recency, frequency, and length of the customer's observation period to predict the customer's future purchases (Qismat, T. & Feng, Y., 2020). The Pareto/NBD model was originally proposed by Schmittlein, Morrison, and Colombo and later popularized by Fader and Hardie, who describe the model as based on five assumptions (Peter, SF, Bruce, GSH & Ka, L.L., 2004):
(1) The transactions made by a customer in a period of length t follow a Poisson distribution with transaction rate λ. That is, customers can purchase randomly whenever they want in their active period, but the rate (per unit time) is constant.
(2) Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter α.
(3) Each customer has an unobserved lifetime. In other words, the point at which the customer becomes inactive or churns is distributed exponentially with dropout rate μ.
(4) Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β.
(5) The transaction rate λ and the dropout rate μ vary independently across customers.
The Gamma-Gamma model is an extension of the Pareto/NBD model. While the Pareto/NBD model only focuses on the recency and frequency factors, the Gamma-Gamma model uses the monetary component to predict the average future purchase value (Aslekar, A., Piyali, S. & Arunima, P., 2019) (Qismat, T. & Feng, Y., 2020).
The Pareto/NBD and Gamma-Gamma models are a powerful combination for calculating CLV: while Pareto/NBD predicts future purchases, the Gamma-Gamma model assigns a monetary value to each of those future purchases. To ensure the best estimated CLV, these models can be evaluated on the holdout period before making forecasts.
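As a sketch of how the Gamma-Gamma model assigns a monetary value, its conditional expected average order value can be written as a weighted average of the population mean p·γ/(q−1) and the customer's observed average spend, with the weight shifting toward the observed average as the number of transactions x grows. The parameter values below are hypothetical, not fitted to the paper's data.

```python
def expected_avg_order_value(p, q, gamma, x, mbar_x):
    """Conditional expectation of a customer's average order value under the
    Gamma-Gamma model: a weighted average of the population mean p*gamma/(q-1)
    and the customer's observed average mbar_x over x transactions."""
    population_mean = p * gamma / (q - 1)
    weight = (q - 1) / (p * x + q - 1)  # shrinks toward mbar_x as x grows
    return weight * population_mean + (1 - weight) * mbar_x

# illustrative (not fitted) parameters and one hypothetical customer:
# 5 observed transactions averaging 40.0 per order
p, q, g = 6.25, 3.74, 15.44
val = expected_avg_order_value(p, q, g, x=5, mbar_x=40.0)
```

A customer with no repeat purchases (x = 0) is simply assigned the population mean; in practice libraries such as `lifetimes` fit p, q, and γ from the data and perform this computation internally.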
3. Methodology and proposed research model
Figure 1 describes the methodology and proposed research model, with three main stages:
(1) Stage 1 is customer segmentation analysis according to RFM. The input data is a dataset extracted from the sales records of Microsoft's Adventure Works sample database; the study performs data preprocessing and calculates the recency, frequency, and monetary values used in the RFM model. After preprocessing the data and observing the differences between the data points, the study standardizes the input data and then uses K-Means-related methods to find the optimal number of clusters for segmentation;

Figure 1. Overview of the proposed research model

(2) Stage 2 uses the Pareto/NBD and Gamma-Gamma models to predict the number of purchases and the revenue that customers will yield in the future; both models are exploited and developed from the RFM model. The two models are built on the training set, and the predictions are re-evaluated on the test set to assess their accuracy. This loop is repeated, adjusting the models' parameters until they give the most optimal results;
(3) Stage 3 uses the two optimal models above to perform customer lifetime value (CLV) prediction.
4. Experimental results and discussion
4.1. Customer segmentation using RFM
The first stage of the experimental process includes data preprocessing and standardization, RFM data construction, and K-Means customer segmentation (Figure 1).
4.1.1. Dataset and data preprocessing
The study uses a dataset of customer transactions extracted from the dataset of the
company Adventure Works Cycles. This is a multinational company that manufactures and
sells bicycles to the North American, European, and Asian markets. The extracted dataset
records 121,317 transactions of the company from 06/2011 to 07/2014. This includes both
individual customers and retailers. To analyze the optimal customer segment for each
different market, the study filters out the transactions made in the US (United States) market
for use in further analysis.
4.1.2. Customer retention analysis
Before clustering customers with the RFM model, the study briefly analyzes the company's customer retention situation to gain insight into its business status. Adventure Works manufactures and sells bicycles to both individuals and resellers; since bicycles are non-essential, durable goods, the proportion of customers with only one transaction over the three years is very high, at 74.31% (Figure 2). Meanwhile, customers with repeat transactions accounted for only 25.69% but brought even higher revenue than the others over time. In particular, there were sudden increases in the revenue that this group of customers brought to the business roughly every month.

Figure 2. Sales by customers over time

It can be seen in Figure 3 that, in the period from 05/2011 to 06/2013, approximately
two years, the number of regular customers was higher, almost all customers returned to
make transactions again with the company. However, starting from 07/2013, when the
business had a sudden growth in attracting more customers, the number of customers
leaving when they only transacted once with the business was very high.

Figure 3. Number of churn and repeated customers over time

Grouping customers by cohort means grouping customers according to the timeline of each customer's first transaction (Croll, A. & Yoskovitz, B., 2013). The retention rate of a cohort in period t is described as equation (4):

Retention rate_t = (number of cohort customers active in period t) / (initial cohort size)    (4)
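Equation (4) can be computed from raw transactions with plain Python; the (customer, month) records below are hypothetical and only illustrate the cohort bookkeeping.

```python
from collections import defaultdict

# hypothetical (customer_id, month_index) transaction records
transactions = [("c1", 0), ("c1", 1), ("c1", 3),
                ("c2", 0), ("c2", 3),
                ("c3", 1), ("c3", 2)]

# cohort = month of each customer's first transaction
first_month = {}
for cust, month in sorted(transactions, key=lambda t: t[1]):
    first_month.setdefault(cust, month)

# per cohort, record which customers are active n months after joining
cohort_members = defaultdict(set)
active = defaultdict(set)  # (cohort, offset) -> customers seen in that offset
for cust, month in transactions:
    cohort = first_month[cust]
    cohort_members[cohort].add(cust)
    active[(cohort, month - cohort)].add(cust)

def retention(cohort, offset):
    """Share of the cohort transacting `offset` months after their first purchase."""
    return len(active[(cohort, offset)]) / len(cohort_members[cohort])
```

For instance, `retention(0, 1)` here is 0.5: of the two customers who joined in month 0, only one transacted again in month 1. A heat map such as Figure 4 plots this value for every (cohort, offset) pair.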

The retention rate of each cohort is shown along the horizontal axis of Figure 4. From the heat-map analysis of customer retention rates, it can be seen that:

Figure 4. Retention rates in cohort analysis

Customers of the business did not transact regularly once a month, but on average, customers
came back every 2-3 months. With the group of customers having transactions from the beginning of the
observed period, from June 2011, only 4% of customers returned to transact in the next month. However,
with a cycle every 2-3 months, the customer
retention rate of the business at this time was very high, in the 34th month, it still maintained
67% of the total number of original customers.
In contrast, the retention rate for new customer groups decreased significantly.
Generally, the company's customer retention policy was appropriate for the period before 2012
and was able to retain this group of loyal customers until the end of the period.
However, it seemed to be no longer suitable for new customer groups, especially when the business in the later period promoted marketing and attracted more customers but could not keep them. The business should focus more on customer care policies as well as targeted marketing campaigns to attract returning customers.
4.1.3. Customer segmentation based on RFM scores
This is the traditional and simplest way to explain how the RFM model works. The RFM model is famous for transforming transactional data, which basically includes CustomerID (unique customer code), SalesOrderID (unique invoice code), ProductID (unique product code), InvoiceDate (date of the transaction), Quantity (quantity of purchased items), UnitPrice (price of one item), and Country (country of the transaction), into profitability scores (Zaki, M., Kandeil, D. & Neely, A., 2016). After calculating recency, frequency, and monetary for the RFM analysis, the statistical distribution of these factors (mean, minimum, maximum, and quartiles) is described in Table 1. On average, a customer's last purchase was 206 days ago, with nearly 1.5 purchases and a total spend of 1,473.8.

Table 1. Quartiles description in RFM table

              Recency       Frequency   Monetary
Mean          206.377101    1.466626    1473.809070
Min           0.000000      1.000000    1.374000
Max           1122.000000   12.000000   58662.190608
1st quartile  91.000000     1.000000    21.490000
2nd quartile  177.000000    1.000000    69.990000
3rd quartile  277.000000    2.000000    2294.990000

While the authors in (Zaki, M., Kandeil, D. & Neely, A., 2016) ranked customers in quintiles, this study ranks them based on quartiles. Following the related works, customers with the highest recency value receive an R score of 1, while those with the lowest recency receive an R score of 4, because customers with more recent transactions are considered more valuable to the business. This step is repeated for frequency and monetary but in reverse: the highest frequency and monetary values receive a score of 4 and the lowest a score of 1. Note that customers' value is proportional to their RFM scores. By combining the RFM scores, the least valuable customers have an overall RFM score of 111, while those with an RFM score of 444 are considered the company's top customers.
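The quartile scoring just described can be sketched with pandas' qcut. The per-customer RFM rows below are invented for illustration, and breaking frequency ties by rank is an assumption of this sketch that the paper does not spell out.

```python
import pandas as pd

# hypothetical per-customer RFM table (illustrative values only)
rfm = pd.DataFrame({
    "recency":   [5, 40, 120, 300, 15, 210, 90, 360],
    "frequency": [8, 5, 3, 1, 6, 2, 4, 1],
    "monetary":  [900.0, 450.0, 120.0, 20.0, 700.0, 60.0, 300.0, 15.0],
})

# recency is scored in reverse: the most recent buyers (smallest recency) get 4
rfm["R"] = pd.qcut(rfm["recency"], 4, labels=[4, 3, 2, 1]).astype(int)
# frequency and monetary: higher value -> higher score (ties broken by rank)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 4, labels=[1, 2, 3, 4]).astype(int)

# combined score: 444 = top customers, 111 = least valuable
rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
```

In this toy table the customer who bought 5 days ago, 8 times, for 900.0 scores 444, while the one who last bought 360 days ago, once, for 15.0 scores 111.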
This study divided customers into segments based on distinguished combinations of the RFM scores, which Jasmin (2020) described as a graphic in her blog. This study uses her exemplary segments with the RFM scores reversed.
Figure 5. Customer segmentation distribution

Figure 5 shows the RFM segmentation labeling results in a treemap. Based on the different segments, the company needs specific strategies to develop its business. The company had a huge number of new customers in the Unsteady Customers segment (36.53%) with high monetary value; it is advisable to build long-term relationships with these customers through cross-selling strategies or specific promotions. Besides, the Customers At Risk segment, which accounted for 16.95%, is also a potential group the business can exploit: with a very high monetary value but no trading for a long time, finding a way to contact and pull these customers back will bring great benefit to the business. Top Customers and Active Customers accounted for a small percentage, but the profits they brought were considerable; the company cannot afford to lose them. Finally, the company had quite a few Inactive and Lost customers. A company whose main products are long-term usable bicycles can expect many churned customers, but managers can research these customer groups for deeper insight to find the exact churn reasons and re-engage these customers as much as possible.
4.1.4. Data standardization

After preprocessing the data and preparing the input data for the RFM model with the corresponding recency, frequency, and monetary values, it was found that there are huge differences between these three values, which can affect the model run time and the accuracy of the algorithms. The study standardized the data using the Standard Score (Z-Score) method (Ismail, M. & Dauda, U., 2013) to bring the data to a distribution where the mean of the observations is 0 and the standard deviation is 1. The formula for standardizing the data is described as equation (5):

z = (x − μ) / σ    (5)
where x is the initial value before standardization, μ is the mean of the observations, and σ is the standard deviation of the observations. After standardizing the data, the recency, frequency, and monetary values weigh equally when included in the K-Means clustering analysis.
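A minimal sketch of equation (5) applied to one RFM column; the sample monetary values are illustrative, not the paper's data.

```python
import numpy as np

def z_score(column):
    """Standard score (equation 5): z = (x - mu) / sigma."""
    return (column - column.mean()) / column.std()

# hypothetical monetary column with a wide spread of values
monetary = np.array([21.49, 69.99, 2294.99, 11299.87, 1473.81])
z = z_score(monetary)
# after standardization the column has mean ~0 and standard deviation ~1
```

Applying the same transformation to recency and frequency puts all three columns on a comparable scale before K-Means.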
4.1.5. The optimal number of clusters for K-Means
As described in the theoretical background, this study uses the Elbow method and the Silhouette score to find the optimal number of clusters for the dataset. The results for cluster counts from 2 to 9 are described in Figure 6. The graph has the shape of an elbow, and the SSE line shows that the elbow flexion point is around k = 3 or k = 4. The Silhouette score is then used to re-evaluate the Elbow result and obtain the final choice of k.

Figure 6. Elbow method result Figure 7. Silhouette score result


Figure 7 illustrates the Silhouette score for each number of clusters. It can be seen that k = 3, with a score of 0.74, is the highest among the candidates. This indicates that with k = 3, the distance from data points to their centroid in each cluster is optimized and cluster overlap barely occurs. Therefore, the study uses k = 3 to cluster customers into different levels based on the three factors of the RFM model (Recency, Frequency, and Monetary).
The number of customers and the average value of recency, frequency, and
monetary of each cluster after being divided into 3 clusters by K-Means are all described in Table 2.
It can be seen that the Gold cluster includes the least number of customers who have the
most transactions, relatively recent purchases and bring the highest revenue for the business.
The other two groups have quite similar mean frequency and monetary values, but the average recency of one group is nearly twice that of the other. The group with the highest recency value is labeled the Bronze group because of its lack of recent transactions with the business.
Table 2. Each cluster description

Cluster   Customers   Mean recency   Mean frequency   Mean monetary

Gold      216         149.527778     8.606481         11299.873848
Silver    4665        148.998928     1.231726         1162.314786
Bronze    3329        290.471012     1.332532         1272.754954

Figure 8 describes the clustering result with k = 3 in three-dimensional space. The Silver level is the group with the highest convergence, but there is still confusion between the data points in the Silver and Bronze levels. The Silver group contains customers who
have more stable recency and frequency indexes while the ones in the Bronze level have
even higher monetary value but stopped trading for a long time. Besides the Gold group
with its distinction from the others, Silver and Bronze were labeled mainly based on the
average recency value.

Figure 8. Clustering result by RFM level

4.2. Predicting CLV using Pareto/NBD and Gamma-Gamma model

The second stage of the experimental process includes constructing the data for the Pareto/NBD and Gamma-Gamma models (Figure 1), dividing the data into calibration and holdout datasets for evaluation, and then using the models to predict CLV.
4.2.1. Constructing input data for Pareto/NBD and Gamma-Gamma model
Because the Pareto/NBD and Gamma-Gamma models use the RFM basis, the data used for these models is quite similar to the previous data construction. The difference is that Pareto/NBD only considers customers with repeated transactions, which means customers with only one transaction have frequency = 0 and recency = 0 in this model. The Pareto/NBD model also uses another factor, the customer lifetime (T), calculated as the distance from the customer's first purchase date to the model implementation date.

The data for the Gamma-Gamma model is the same as the data for the Pareto/NBD model, but only the rows with frequency and monetary value greater than 0.
4.2.2. Calibration and holdout dataset

The calibration dataset starts at the beginning of the observed period from June 7,
2011 to July 7, 2013, and the holdout period spans from July 8, 2013 to July 7, 2014,
exactly 365 days. The percentage is approximately 70% in calibration and 30% in the
holdout dataset.

4.2.3. Predicting future purchases using Pareto/ NBD model


Figure 9 illustrates the number of purchases in the calibration dataset on the x-axis
and the corresponding average number of purchases in the holdout dataset on the y-axis.
As can be seen, the model predicts that customers with more purchases in the calibration period will also have more purchases in the holdout period, except for a slight reduction for customers with 5 purchases in the calibration. In contrast, the actual holdout data shows more unpredictable volatility.

Figure 9. Actual and predicted purchases of Pareto/NBD model in holdout dataset

Evaluating models can be the most important step in Data Science. This study uses
several indicators to evaluate the quality of the prediction model. The formulas of these
indicators are described as equations (6), (7), and (8) (Chicco, D., Warrens, M.J. & Jurman,
G., 2021):
MAE = (1/n) Σ_{i=1}^{n} |x_i − y_i|    (6)

MSE = (1/n) Σ_{i=1}^{n} (x_i − y_i)²    (7)

RMSE = √((1/n) Σ_{i=1}^{n} (x_i − y_i)²)    (8)

where x_i are the actual values and y_i the predicted values. MAE measures the errors between actual and predicted observations. MSE measures the average of the squares of the errors, i.e., the average squared difference between the actual and estimated values. RMSE is also a measure of the differences between the two sets of observations. The closer these indicators are to 0, the fewer errors the predictions have. Table 3 shows that the Pareto/NBD model predicts future purchases fairly well, as the evaluation values are all small.
Table 3. Pareto/NBD purchases prediction evaluation

Types Results

Mean Absolute Error (MAE) 0.7904071127295603

Mean Squared Error (MSE) 0.8850164704192321

Root Mean Squared Error (RMSE) 0.9407531399996665
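Equations (6), (7), and (8) can be checked with a few lines of NumPy; the actual and predicted vectors below are hypothetical, not the study's results.

```python
import numpy as np

def mae(x, y):
    """Mean Absolute Error (equation 6)."""
    return np.abs(x - y).mean()

def mse(x, y):
    """Mean Squared Error (equation 7)."""
    return ((x - y) ** 2).mean()

def rmse(x, y):
    """Root Mean Squared Error (equation 8)."""
    return np.sqrt(mse(x, y))

# hypothetical actual vs predicted purchase counts
actual = np.array([1.0, 2.0, 0.0, 3.0])
pred = np.array([1.5, 1.0, 0.0, 2.0])
# mae -> 0.625, mse -> 0.5625, rmse -> 0.75
```

Note that RMSE is simply the square root of MSE, which is why the two values in Table 3 are so close.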


4.2.4. Predicting the future average order value using the Gamma-Gamma model
The estimated results of the Gamma-Gamma model are shown in Figure 10. The histogram plots the monetary value distribution of the actual and estimated observations. It shows that the predicted results tend to be smaller than the actual ones, and that both predicted and actual monetary values are concentrated near zero.

Figure 10. Actual and predicted of the Gamma-Gamma model in the holdout set

Because the spread of the monetary values is much larger than that of the factors used in the previous model, instead of using metrics such as MAE, MSE, and RMSE, which are suited to normalized, standardized datasets or values close to zero, the chosen option is to divide the monetary values into 5 bins according to an ordinal variable and K-Means, and then use the confusion matrix and the F1 score to evaluate the accuracy of the model.
The confusion matrix in Figure 11 shows that the Gamma-Gamma model worked well on the holdout set, as the predictions were mostly assigned to the right bins. The F1 score is 0.9, which means the estimation had high accuracy.
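The binning-plus-F1 evaluation can be sketched as follows. The monetary values and bin edges below are invented for illustration (the paper derives its 5 bins from an ordinal variable and K-Means), and scikit-learn supplies the confusion matrix and weighted F1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# hypothetical monetary values: actual vs predicted by the model
actual = np.array([10.0, 15.0, 80.0, 95.0, 400.0, 450.0, 2000.0, 9000.0])
pred = np.array([12.0, 14.0, 70.0, 210.0, 380.0, 500.0, 1800.0, 8500.0])

# shared ordinal bin edges (illustrative); digitize maps each value to a bin 1..5
edges = [0, 50, 200, 1000, 5000, np.inf]
actual_bin = np.digitize(actual, edges)
pred_bin = np.digitize(pred, edges)

# compare the binned predictions against the binned actual values
cm = confusion_matrix(actual_bin, pred_bin)
f1 = f1_score(actual_bin, pred_bin, average="weighted")
```

Here one prediction (210.0 vs an actual 95.0) lands in the wrong bin, so the diagonal of the confusion matrix holds 7 of the 8 customers and the weighted F1 drops below 1.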

Figure 11. Confusion matrix of actual and predicted monetary value

After training and evaluating the models to check their quality, the Gamma-Gamma model was implemented again on the initial dataset to check for overfitting. Fortunately, the model also performed well on the original dataset: Figure 12 shows that the actual and predicted monetary values have a linear correlation, and the histogram shows the actual and predicted values almost overlapping (Figure 13). After training and tuning the Pareto/NBD and Gamma-Gamma models and finding that both worked quite well with high evaluation scores, the study applies these two models to predict the CLV of the company's customers.

Figure 12. Scatter plot of actual and Figure 13. Histogram of actual and predicted
predicted monetary value in initial monetary value in initial dataset
dataset

4.2.5. Predicting CLV values

Figure 14. Average predicted CLV by customer segmentation

Among the 5 groups of repeat customers shown in Figure 14, the model predicts that the Top Customers group has the largest customer lifetime value. The Active Customers and Unsteady groups have fairly low values because they did not make many transactions with the business during the observed period. Although the Customers At Risk group had not traded with the business for a long time, the number of orders and the amount of revenue this customer group could bring is very large for the business, so its estimated CLV is fairly high.
4.3. Discussion

The study found a relationship between the original RFM customer segmentation and the RFM customer clustering by K-Means. Figure 15 describes the total number of customers in the 8 segments divided by RFM score and the 3 clusters classified by K-Means. As can be seen, customers at the Gold level mostly fell into the Top Customers, Emerging Customers, and Customers At Risk segments, which were also the three segments with the highest predicted CLV in the previous analysis. This shows that the models used in the study are closely related to each other.
Besides, as mentioned above, the Bronze and Silver groups had almost the same
Frequency and Monetary values, differing substantially only in Recency, so K-Means
clustering was not fully effective here. Moreover, since each run of K-Means can give
different clustering results, and the user must label the clusters for each customer group
based on those results, considerable domain expertise is required to cluster and label the
groups thoroughly. Nevertheless, the K-Means clusters and the Pareto/NBD and
Gamma-Gamma predictions were consistent: the three customer groups with the highest
CLV consisted mainly of Gold-level customers, and the next two highest-CLV groups
contained mostly Silver-level customers.
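The comparison in Figure 15 amounts to a cross-tabulation of RFM level against K-Means cluster. A minimal sketch of that counting step, with made-up labels and counts purely for illustration:

```python
from collections import Counter

# Toy per-customer labels: RFM level from scoring, segment from K-Means.
rfm_level = ["Gold", "Gold", "Silver", "Bronze", "Gold", "Silver"]
cluster   = ["Top Customers", "Customers At Risk", "Unsteady",
             "Unsteady", "Top Customers", "Active Customers"]

# Count customers per (level, cluster) cell, as plotted in Figure 15.
crosstab = Counter(zip(rfm_level, cluster))
for (level, segment), n in sorted(crosstab.items()):
    print(level, "|", segment, "->", n)
```

With real data the same table is typically built in one call (e.g. `pandas.crosstab`); inspecting which clusters each RFM level concentrates in is what reveals the Gold-to-top-CLV correspondence discussed above.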

Figure 15. The number of customers by segmentation and RFM level

5. Conclusions and implications


By combining marketing and business knowledge with information technology, a clearer
view of the Adventure Works company was obtained. RFM is an easy-to-apply and
flexible method for customer segmentation. However, as Mark Patron observed, RFM
alone does not reveal a customer's profitability or potential (Mark, P., 2004), so combining
the RFM and CLV results to find hidden potential customers can be very profitable.
Managers can build on these results to implement customer care policies such as discounts
and customer gratitude programs for Gold and Silver customers, or use cross-selling
strategies to maximize profits from existing customers as well as attract new ones. The
models in this study were tuned for this dataset; they can be developed further according
to a company's needs.

References

Anitha, P. & Patil, M. M. (2019). RFM model for customer purchase behavior using K-Means
algorithm. Journal of King Saud University – Computer and Information Sciences,
1-8. doi:10.1016/j.jksuci.2019.12.011
Aslekar, A., Piyali, S. & Arunima, P. (2019). Big Data Analytics for Customer Lifetime Value
Prediction. Telecom Business Review, 12(1), 46-49. Retrieved from http://
publishingindia.com/tbr/
Chicco, D., Warrens, M.J. & Jurman, G. (2021). The coefficient of determination R-squared
is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression
analysis evaluation. PeerJ Computer Science, 7(3), e623. doi:10.7717/peerj-cs.623
Croll, A. & Yoskovitz, B. (2013). Lean Analytics: Use Data to Build a Better Startup
Faster (1st ed.). O'Reilly Media.
Dwivedi, S., Pandey, P., Tiwari, M.S. & Kalam, A. (2014). Comparative Study of Clustering
Algorithms Used in Counter Terrorism. IOSR Journal of Computer Engineering (IOSR-
JCE), 16(6), 13-17.
Glady, N., Baesens, B. & Croux, C. (2007). A modified Pareto/NBD approach for
predicting customer lifetime value. Expert Systems with Applications. Elsevier Ltd.
doi:10.1016/j.eswa.2007.12.049
Humaira, H. & Rasyidah, R. (2020). Determining The Appropriate Cluster Number Using
Elbow Method for K-Means Algorithm. Proceedings of the 2nd Workshop on
Multidisciplinary and Applications (WMA) 2018 (pp. 24-25). Padang: EAI. doi:10.4108/
eai.24-1-2018.2292388

Ismail, M. & Dauda, U. (2013). Standardization and Its Effects on K-Means Clustering
Algorithm. Research Journal of Applied Sciences, Engineering and Technology,
6(17), 3299-3303. doi:10.19026/rjaset.6.3638
Ismail, M.B.M. & Safrana, M.J. (2015). Impact Of Marketing Strategy On Customer Retention
In Handloom Industry. Sri Lanka: 5th International Conference, SEUSL.
Jasmine. (2020, November 12). Machine Learning In Customer Segmentation With
RFM-Analysis. Retrieved from Nextlytics:
https://www.nextlytics.com/blog/machine-learning-in-customer-segmentation-with-rfm-analysis
Kotler, P. & Keller, K.L. (2006). Marketing Management (12th ed.). New Jersey: Pearson
Prentice Hall.

Mark, P. (2004). Applying RFM segmentation to the SilverMinds catalog. Journal of
Direct, Data and Digital Marketing Practice, 5(3), 269-275. doi:10.1057/palgrave.im.4340243

Miglautsch, J.R. (2000). Thoughts on RFM scoring. Journal of Database Marketing &
Customer Strategic Management, 8(1), 67-72. doi:10.1057/palgrave.jdm.3240019
Nainggolan, R., Perangin-angin, R., Simarmata, E. & Tarigan, F.A. (2015). Improved the
Performance of the K-Means Cluster Using the Sum of Squared Error (SSE)
optimized by using the Elbow Method. Journal of Physics: Conference Series, 1361.
doi:10.1088/1742-6596/1361/1/012015

Ogbuabor, G. & Ugwoke, F. N. (2018). Clustering Algorithm For A Healthcare Dataset
Using Silhouette Score Value. International Journal of Computer Science &
Information Technology (IJCSIT), 10(2), 27-37. doi:10.5121/ijcsit.2018.10203
Fader, P.S., Hardie, B.G.S. & Lee, K.L. (2005). "Counting Your Customers" the Easy
Way: An Alternative to the Pareto/NBD Model. Marketing Science, 24(2), 275-284.
doi:10.1287/mksc.1040.0098

Qismat, T. & Feng, Y. (2020). Comparison of classical RFM models and machine
learning (Master's thesis). Norway.
Thanh, H.T. & Son, N.D. (2021). An interdisciplinary research between analyzing
customer segmentation in marketing and machine learning method. Sci. Tech. Dev.
J. - Eco. Law Manag., 6(1), 2005-2015.
Yedla, M., Pathakota, S.R. & Srinivasa, T.M. (2010). Enhancing K-means Clustering
Algorithm with Improved Initial Center. International Journal of Computer Science
and Information Technologies, 1(2), 121-125.
Zaki, M., Kandeil, D. & Neely, A. (2016). The Fallacy of the Net Promoter Score: Customer
Loyalty Predictive Model. UK: Cambridge Service Alliance.
