Recommender System Based On Customer Segmentation (RSCS)
Downloaded by University of Birmingham At 05:32 09 June 2016 (PT)
1. Introduction
For internet-based businesses, the importance of appropriate recommendations is growing fast,
and people increasingly expect suitable recommendations from these businesses to help them
identify products and services (Carrer-Neto et al., 2012). This is why many companies and
websites have implemented recommendation systems in recent years to identify customer
interests. A recommender system is designed to help users identify
2. Literature Review
Research on recommendation systems aims to improve the accuracy of the suggestions made to
customers. The goal of customer segmentation is to recognize the valuable customers who have
a greater impact on the company's profitability and future purchases. If recommendation
systems are combined with segmentation methods, the accuracy of the recommendations to each
cluster of customers can be improved, because the customers in each cluster have considerably
similar attitudes towards commodity selection and similar purchase trends. This improves the
recommendations to each group of customers, which is the final objective of all
recommendation systems. In this research, a customer segmentation method (Rezaeinia et al.,
2012) is combined with a recommendation system to improve the accuracy of the
recommendations, which yields a novel recommendation method.
The aim of clustering is to group data so that each group differs as much as possible from
the other groups, while the data within each group are as similar as possible to one another.
Bottcher et al. (2009) describe customer segmentation as the process of dividing customers
into homogeneous groups on the basis of common attributes. Mizuno et al. (2008) also explain
that the segmentation of potentially profitable customers, whom they call "good" customers,
has become significantly important. Many customer segmentation methods exist. Cuadros and
Dominguez (2014) used the self-organizing map (SOM) method for customer segmentation.
Casabayó et al. (2015) segmented customers using a fuzzy method. Coussement et al. (2014)
compared customer segmentation based on the RFM (Recency, Frequency and Monetary)
variables, decision trees and logistic regression. The RFM variables are discussed further in
later sections. Wang (2009) argues that, "In spite of various types of segmentation variables
(demographic, psychographic, or purchasing behavior patterns) proposed so far, practical
marketers continue to use RFM (recency, frequency and monetary) model since it is easy to use
and to be understood by decision makers". Rezaeinia et al. (2012) combined RFM, the
Analytic Hierarchy Process (AHP) and the K-Means algorithm to cluster customers in the
banking industry according to their benefit to the bank.
The recommendation methodology is the core of a recommender system and directly affects the
recommendation results (Lee, 2010). Many recommender systems exist, often addressing
different needs; several relevant ones are summarized below. Xia et al. (2006) presented a
heuristic method to address the problems of SVM in recommendation systems. Zahra et al.
(2015) presented a recommendation system using K-Means clustering that improves the
accuracy of the system. Brito et al. (2015) combined the K-Medoids and CN2-SD algorithms to
improve the process of determining customers' preferences.
As mentioned, recommender systems are generally divided into three categories:
content-based filtering (CBF), collaborative filtering (CF) and hybrid systems. Among these,
owing to its characteristics, CF is widely recognized as the most successful (Tsai and Hung,
2012). Such systems receive information about customers, analyze it and finally make their
recommendations (Huang and Huang, 2009). Numerous systems use the CF technique, e.g.,
Tapestry, which is used to filter users' emails, and Ringo, which is employed for music
recommendations (Carrer-Neto et al., 2012). The MovieLens website is another example in
the movie recommendation field.
Increasing accuracy is one of the major research topics in recommender systems and the CF
technique. Recently, many CF systems have been combined with other techniques to enhance
the quality of the results (Lee, 2010). Recommender systems can also be combined with
clustering methods to increase the accuracy of recommendations (Carrer-Neto et al., 2012).
customer spends during a period”. RFM provides effective attributes for customer
segmentation. In other words, RFM is a model that extracts important customers from large
volumes of transaction data (Chang and Tsai, 2011). RFM is a behavior-based model: it is used
to analyze the behavior of an engaged customer and to make predictions based on that behavior
(Yeh et al., 2009). According to Coussement et al. (2014), "RFM analysis is a popular
approach in database marketing because of its simplicity and reasonable performance". The
basic assumption of the RFM model is that future patterns of consumer trading resemble past
and current patterns (Chan, 2008). Its advantage lies in its ability to extract customer
characteristics using few criteria (three dimensions) as cluster attributes, which reduces the
complexity of customer value analysis models. In this method, the RFM variables of each
customer are extracted; they are defined as follows:
R (Recency) is the time since the customer's most recent transaction, measured relative to the last day of the period,
F (Frequency) is the number of the customer's transactions during the period, and
M (Monetary) is the average of the customer's deposits during the period.
Cheng and Chen (2009) also pointed out that the RFM model is one of the well-known
customer value analysis methods. The calculated RFM values are summarised to clarify
customer behaviour patterns.
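The three definitions above can be turned into a short extraction routine. The sketch below is an illustration under stated assumptions, not the authors' implementation: it reads transactions as (CustomerID, RequestDate, Box) tuples from the dataset described in Section 3.1, measures R in days before the last day of the period, and, because the wholesale data record quantities rather than deposits, uses the average Box value per transaction as a stand-in for M. The function name `extract_rfm` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def extract_rfm(transactions, period_end):
    """Return {customer_id: (R, F, M)} from (customer_id, request_date, box) tuples.

    R: days from the customer's last transaction to period_end (smaller = more recent)
    F: number of transactions in the period
    M: average Box quantity per transaction (assumed proxy for monetary value)
    """
    last_date = {}
    count = defaultdict(int)
    total = defaultdict(float)
    for customer_id, request_date, box in transactions:
        # Keep the most recent purchase date seen for each customer.
        if customer_id not in last_date or request_date > last_date[customer_id]:
            last_date[customer_id] = request_date
        count[customer_id] += 1
        total[customer_id] += box
    return {
        cid: ((period_end - last_date[cid]).days, count[cid], total[cid] / count[cid])
        for cid in count
    }


# Hypothetical toy transactions: (CustomerID, RequestDate, Box)
toy = [
    ("c1", date(2014, 1, 5), 10),
    ("c1", date(2014, 3, 20), 30),
    ("c2", date(2014, 2, 1), 5),
]
rfm = extract_rfm(toy, period_end=date(2014, 3, 31))
print(rfm)  # {'c1': (11, 2, 20.0), 'c2': (58, 1, 5.0)}
```

If the real M is a monetary amount rather than a box count, only the third tuple element fed to the function changes.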
3. Research Framework
Fig. 1 shows the research framework of this study together with the comparison methods. We
provide an overview of each phase below and then expand on the details in the subsections
that follow. First, in the Data Collection and Cleanup phase, the data are acquired and then
cleaned by removing transactions that have missing values. Then, in the RFM Variables
Extraction phase, the RFM variables are extracted according to the methodology outlined in
Section 2.3. In the RFM Variable Weight Calculation phase, the weight of each RFM variable
is calculated as described in Section 3.4. In the Customer Segmentation phase, the customers
are segmented using AHP and the EM algorithm. Next, collaborative filtering is calculated for
each cluster of customers based on the nearest neighbors within that cluster. The proposed
method is evaluated and compared with the Conventional Method, which uses collaborative
filtering without segmentation. In addition, our proposed method is evaluated and compared
with the Multi-Class SVM method, which is described in Section 4; collaborative filtering (CF)
is calculated for both the Multi-Class SVM method and our proposed method. The final step is
to evaluate the acquired results, select the optimum method and then apply an appropriate
strategy for the above-mentioned customers.
The goal of this section is to compare the proposed method, which is based on customer
segmentation, with the two aforementioned methods; in the proposed method, CF is calculated
separately for each cluster using KNN. At the end of this section, the results of the proposed
method are compared against the conventional and Multi-Class SVM methods.
Fig. 1 The Research Framework of the Proposed Method versus Comparison Methods
Our dataset contains 10 months of sales data from a wholesale center in Tehran, which
distributes commodities among various sellers in the city. The dataset contains five main
fields: CustomerID, SellerID, RequestDate, ComID (commodity ID) and Box. The Box field
indicates the quantity of commodities received by each seller from the center. The data
include 500,220 sales records covering the sale of 179 different commodities to 12,429
customers. Noise in this research is defined as records with unfilled fields; for example,
SellerID, RequestDate or Box is not reported in some records. These records are excluded
from further calculations.
First, the 500,220 sale transactions of the center over the 10-month period are taken into
account. These transactions cover the sale of 179 different types of items to 12,429
customers in the city. Because some items were sold only over a short period of time, some
customers bought only once and others bought only during a short period, the initial data
contained considerable noise; the cleanup of transactions was therefore carried out with high
precision. Consequently, the number of transactions was reduced to 274,716. These
transactions relate to the products and customers active during the study period. The
numbers of active clients and types of goods were reduced to 4472 clients and 117 types,
respectively. Fig. 2 shows the details of the customers' data before the cleanup step.
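The cleanup rules described above (dropping records with unfilled fields and discarding customers with too little purchase history) can be sketched as follows. This is a minimal illustration under assumptions: records are plain dicts keyed by the five field names of the dataset, and the `min_transactions` cutoff and the function name `clean_transactions` are hypothetical, because the text does not state the thresholds actually used; the item-level filter for short-lived goods is omitted.

```python
# Field names follow the dataset description; the threshold below is hypothetical.
REQUIRED_FIELDS = ("CustomerID", "SellerID", "RequestDate", "ComID", "Box")


def clean_transactions(records, min_transactions=2):
    """Drop records with missing fields, then customers with too few transactions."""
    # Rule 1: a record counts as noise if any required field is missing or empty.
    complete = [
        r for r in records
        if all(r.get(field) not in (None, "") for field in REQUIRED_FIELDS)
    ]
    # Rule 2 (assumed cutoff): count the remaining transactions per customer.
    counts = {}
    for r in complete:
        counts[r["CustomerID"]] = counts.get(r["CustomerID"], 0) + 1
    return [r for r in complete if counts[r["CustomerID"]] >= min_transactions]


records = [
    {"CustomerID": "c1", "SellerID": "s1", "RequestDate": "2014-01-05", "ComID": 7, "Box": 3},
    {"CustomerID": "c1", "SellerID": "s1", "RequestDate": "2014-02-05", "ComID": 9, "Box": 1},
    {"CustomerID": "c2", "SellerID": None, "RequestDate": "2014-01-07", "ComID": 7, "Box": 2},
    {"CustomerID": "c3", "SellerID": "s2", "RequestDate": "2014-03-01", "ComID": 7, "Box": 4},
]
cleaned = clean_transactions(records)
print(len(cleaned))  # 2: c2's record lacks SellerID and c3 bought only once
```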
Bose and Chen (2009) state that customer behaviour data include customers' transaction
records, customer feedback and web browsing records. Coussement et al. (2014) evaluated the
impact of data accuracy issues on the performance of RFM analysis. The commonly used RFM
variables are usually extracted from customers' transaction records. In this step, the
variables R, F and M of each customer are calculated.
such as slow convergence, brute-force computation, long computation times and low
stability. In association rules, a major drawback is the huge number of generated rules, which
may be redundant (Cheng and Chen, 2009).
On the other hand, AHP is a hierarchical process for solving complex problems involving
multiple attributes by structuring the problem into a goal, attributes and alternatives for the
decision-maker (Chen, 2009). Chen and Wang (2010) argue that AHP, as both a qualitative and
a quantitative method, is a useful approach for evaluating the alternatives of complex
multiple-criteria problems involving subjective judgement.
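Since the RFM weights are derived with AHP, a minimal sketch of the standard AHP weighting step may help: the weights are the normalized principal eigenvector of a pairwise comparison matrix, approximated here by power iteration, and Saaty's consistency ratio checks the judgments. The pairwise judgments in the example are hypothetical (the text does not report the matrix the authors used), `ahp_weights` is an illustrative name, and the random index table only covers 3 to 5 criteria.

```python
def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a 3x3 to 5x5 pairwise comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    # Power iteration: repeatedly multiply by the matrix and renormalise to sum to 1.
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue lambda_max from (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's random consistency index
    return w, consistency_index / random_index


# Hypothetical judgments, order R, F, M: F is judged twice as important as R,
# and M twice as important as F (a perfectly consistent matrix).
pairwise = [
    [1.0, 1 / 2, 1 / 4],
    [2.0, 1.0, 1 / 2],
    [4.0, 2.0, 1.0],
]
weights, cr = ahp_weights(pairwise)
print([round(w, 3) for w in weights])  # [0.143, 0.286, 0.571]
# A consistency ratio below 0.1 is conventionally treated as acceptable.
```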
The SVM method is a classic method in customer segmentation; we use it as a comparison
method and report its results in the Experimental Results section. For this application,
Mizuno et al. (2008) reported that the logistic regression method (LRM) is more efficient than
SVM in customer segmentation. Huang et al. (2007) presented a method called support vector
clustering (SVC) to improve the SVM method in customer clustering. Tu and Yang (2013)
claimed that the SVM method is efficient for imbalanced data; however, it is not
recommended, because it improves performance on the minority classes at the expense of the
main, larger classes. Ha (2010) showed that the ANN method has higher accuracy than SVM in
customer clustering. Lee et al. (2006) reported that the CART and MARS methods are
considerably more efficient than SVM. Xia et al. (2006) reported that the performance of SVM
is not acceptable in recommendation systems because of the sparsity of the user-item matrix,
and they presented a heuristic method to address these issues.
SVM is an efficient classification method, but it is also used for clustering. Because the
number of clusters is not known in advance in the customer segmentation process, a clustering
method has to be employed, and SVM is therefore combined with other methods to improve its
performance in clustering.
Studies that emphasise the benefit of customer retention indicate that a 1% improvement in
the customer retention rate improves firm value by 5%. Similarly, Reichheld and Sasser
showed that a 5% increase in customer retention increases a firm's profits by between 25% and
85%. Different clustering algorithms exist (Hidalgo et al., 2008).
3.5.1 EM Algorithm
In this paper, the EM algorithm is used for customer clustering. Bilmes (1998) stated that the
EM algorithm is an efficient technique for this task. The algorithm finds maximum-likelihood
estimates of the parameters of an underlying distribution from a given data set when the data
are incomplete. The 'E-step' is so called because it is not necessary to form a probability
distribution over completions explicitly; it only needs to compute the 'expected' sufficient
statistics over these completions. Likewise, the name 'M-step' reflects that model
re-estimation is in fact the 'maximization' of the expected log-likelihood of the data (Do and
Batzoglou, 2008). The two steps can also be compared in the following way: the E-step
constructs a local lower bound for the posterior distribution, whereas the M-step improves the
estimates of the unknowns by optimizing that bound. Dempster et al. (1977) presented a
detailed discussion of the EM algorithm.
The clustering is performed on the weighted RFM variables and, as shown in Fig. 3,
customers are classified into five clusters.
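The EM-based segmentation step can be illustrated with a Gaussian mixture model, one common way of using EM for clustering. The text does not specify the mixture family, initialization or stopping rule, so the diagonal Gaussians, farthest-first initialization, fixed iteration count and the name `em_gmm` are assumptions of this standard-library sketch. In the study itself the inputs would be the weighted RFM vectors and k would be 5.

```python
import math


def em_gmm(points, k, iterations=50, var_floor=1e-6):
    """Cluster points with a k-component, diagonal-covariance Gaussian mixture fitted by EM.

    Returns (labels, means), where labels[i] is the most probable component of points[i].
    """
    d = len(points[0])
    # Assumed initialization: farthest-first choice of k starting means.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point (log-sum-exp for stability).
        resp = []
        for p in points:
            logs = []
            for c in range(k):
                ll = math.log(weights[c])
                for x, m, v in zip(p, means[c], variances[c]):
                    ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                logs.append(ll)
            top = max(logs)
            probs = [math.exp(lv - top) for lv in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate mixing weights, means and variances from the responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / len(points)
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, points)) / nc for j in range(d)]
            variances[c] = [
                max(var_floor, sum(r[c] * (p[j] - means[c][j]) ** 2 for r, p in zip(resp, points)) / nc)
                for j in range(d)
            ]
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, means


# Hypothetical weighted RFM vectors for six customers forming two obvious groups.
customers = [
    (0.10, 0.10, 0.10), (0.12, 0.09, 0.11), (0.08, 0.11, 0.10),
    (0.90, 0.85, 0.95), (0.88, 0.90, 0.92), (0.91, 0.87, 0.90),
]
labels, means = em_gmm(customers, k=2)
print(labels)
```

A library implementation with multiple restarts would normally be preferred for real data; the sketch only makes the alternation of E-step and M-step concrete.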
Table 1 shows the characteristics of these clusters, including the number of members in each
cluster and the totals of the RFM variables. As shown in the table, Clusters 1 and 2 have the
lowest and highest numbers of customers, respectively.
Knowing the value ratio of each customer cluster is crucial for all salespeople, because
attracting new customers is usually much more expensive than retaining existing ones, and
loyal, valuable customers should be maintained as a competitive asset. According to Liu and
Shih (2005) and Rezaeinia et al. (2012), to measure the loyalty of each customer cluster j, the
normalized averages of recency ($C_R^j$), frequency ($C_F^j$) and monetary value ($C_M^j$)
of that cluster must first be calculated separately. The value (loyalty) ratio of each cluster is
then obtained from

$$C_I^j = W_R \, C_R^j + W_F \, C_F^j + W_M \, C_M^j$$

where $C_I^j$ denotes the value ratio of cluster j, and $W_R$, $W_F$ and $W_M$ are the weights of
the R, F and M variables calculated in the RFM Variable Weight Calculation phase.
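The cluster value formula can be sketched as a weighted sum of normalized cluster averages. The text cites Liu and Shih (2005) without spelling out the normalization, so min-max scaling across clusters, with recency inverted so that more recent purchases score higher, is an assumption here; the weights and cluster averages in the example are hypothetical, as are the names `min_max` and `cluster_value`.

```python
def min_max(values, higher_is_better=True):
    """Min-max normalise a list to [0, 1]; invert it when smaller raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # no spread: the variable cannot discriminate clusters
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def cluster_value(cluster_averages, w_r, w_f, w_m):
    """Compute C_I^j = W_R*C_R^j + W_F*C_F^j + W_M*C_M^j for every cluster j.

    cluster_averages maps a cluster id to its average (R, F, M); R is in days, so a
    smaller R (more recent purchases) is treated as better (assumption).
    """
    ids = list(cluster_averages)
    c_r = min_max([cluster_averages[j][0] for j in ids], higher_is_better=False)
    c_f = min_max([cluster_averages[j][1] for j in ids])
    c_m = min_max([cluster_averages[j][2] for j in ids])
    return {j: w_r * c_r[i] + w_f * c_f[i] + w_m * c_m[i] for i, j in enumerate(ids)}


# Hypothetical cluster averages (R in days, F in transactions, M in average Box) and weights.
averages = {"A": (10, 20, 50.0), "B": (40, 5, 20.0), "C": (25, 10, 30.0)}
values = cluster_value(averages, w_r=1 / 7, w_f=2 / 7, w_m=4 / 7)
ranking = sorted(values, key=values.get, reverse=True)
print(ranking)  # ['A', 'C', 'B']
```

Ranking clusters by $C_I^j$ is what lets the later evaluation relate cluster value to recommendation accuracy.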
One of the methods most widely used in recommender systems is collaborative filtering (CF),
which is the most suitable method for this research given the available data. The method
proposed in this paper combines CF with K-nearest neighbors (KNN); in other words, KNN is
used to identify the goods that customers are likely to favor. Recommendation systems based
on the combination of CF and KNN are popular in both business and academia (Bobadilla et
al., 2013). Similarity ranking based on KNN is a classic approach, but it is still used in CF, so
the combination of CF and KNN remains a high-priority research topic (Luo et al., 2013). For
example, Park et al. (2015) combined CF and KNN and presented a fast algorithm. CF is a
well-known recommendation technique whose methods can be categorized as
neighborhood-based or model-based (Zhu et al., 2015). One of the preferred approaches in CF
recommendation systems is the use of KNN classifiers, which rely on similarity or distance
measures (Bagchi, 2015).
In this step, a customer-good matrix is first built for each cluster from the customers' purchase
transactions. A conventional customer-good matrix only shows which goods each customer
purchased; in the proposed method, however, the matrix also records the quantity of each
good sold, so the quantity of customers' purchases within each cluster is captured as well. This
matrix is shown in Fig. 4.
Fig. 4 A View of the Customer-good Matrix of one of the Clusters
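A sparse version of the per-cluster customer-good matrix, storing purchase quantities rather than 0/1 flags, might look like the sketch below. It assumes transactions are (CustomerID, ComID, Box) tuples and that a mapping from each customer to a cluster label already exists; the function names are hypothetical. Goods a customer never bought are simply absent from that customer's row, which stands for a quantity of zero.

```python
from collections import defaultdict


def customer_goods_matrix(transactions):
    """Build {customer: {good: total quantity}} from (customer_id, com_id, box) tuples."""
    matrix = defaultdict(lambda: defaultdict(float))
    for customer_id, com_id, box in transactions:
        matrix[customer_id][com_id] += box  # accumulate quantities, not just purchase flags
    return {c: dict(goods) for c, goods in matrix.items()}


def cluster_matrices(transactions, cluster_of):
    """Split transactions by the buyer's cluster and build one matrix per cluster."""
    per_cluster = defaultdict(list)
    for t in transactions:
        per_cluster[cluster_of[t[0]]].append(t)
    return {k: customer_goods_matrix(ts) for k, ts in per_cluster.items()}


# Hypothetical transactions and cluster assignments.
tx = [("c1", "g1", 2), ("c1", "g1", 3), ("c1", "g2", 1), ("c2", "g2", 4)]
matrices = cluster_matrices(tx, cluster_of={"c1": 0, "c2": 1})
print(matrices)  # {0: {'c1': {'g1': 5.0, 'g2': 1.0}}, 1: {'c2': {'g2': 4.0}}}
```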
The next step uses the Pearson correlation coefficient to compute the purchasing similarity of
customers. The Pearson coefficient is a well-known similarity metric in the literature (e.g.,
Chen et al., 2015; Ekstrand et al., 2011; Candillier et al., 2007; Eckhardt et al., 2012). The
similarity between customers a and b is obtained from the following formula:

$$\mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \quad (1)$$

where $\bar{r}_a$ and $\bar{r}_b$ represent the average number of products bought by customers a
and b, respectively, $r_{a,p}$ and $r_{b,p}$ indicate the purchases of item p by customers a and
b, and P is the set of goods considered.
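Equation (1) can be sketched over rows of the quantity matrix as follows. This is an illustration, not the authors' code: it assumes that the sums run over a given set of goods P and that a good missing from a customer's row counts as a quantity of 0. `pearson` is a hypothetical name, and it returns 0 when either customer's quantities do not vary (zero denominator).

```python
import math


def pearson(row_a, row_b, goods):
    """Pearson similarity between two customers' purchase-quantity rows.

    row_a, row_b: {good: quantity}; goods absent from a row count as 0.
    goods: the set P of goods over which the sums in Eq. (1) run.
    """
    goods = list(goods)
    a = [row_a.get(p, 0.0) for p in goods]
    b = [row_b.get(p, 0.0) for p in goods]
    mean_a = sum(a) / len(a)  # average quantity bought by customer a over P
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)) * math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


row_a = {"g1": 1, "g2": 2, "g3": 3}
row_b = {"g1": 2, "g2": 4, "g3": 6}
row_c = {"g1": 3, "g2": 2, "g3": 1}
goods = ["g1", "g2", "g3"]
print(round(pearson(row_a, row_b, goods), 3), round(pearson(row_a, row_c, goods), 3))  # 1.0 -1.0
```

Restricting P to goods bought by both customers is a common variant; which one the study used is not stated.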
After the similarities are calculated, customers are sorted by their similarity to the target
customer. Then the K nearest neighbors of the customer are selected, and the process is
repeated for all customers.
After the K customers most similar to the customer of interest are determined, the quantity of
each good bought by these K customers is calculated, and the N goods with the highest
purchase quantities are suggested as recommended goods.
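The neighbor selection and top-N steps can be sketched as below. The sketch assumes that the input matrix holds a single cluster (so neighbors come from the same cluster), that neighbors are ranked by Pearson similarity, that ties between goods are broken by good ID, and that goods the target already bought are not excluded, since the text does not say they are. The names `recommend`, `k` and `n` are illustrative, and `pearson` is repeated so the block runs on its own.

```python
import math


def pearson(row_a, row_b, goods):
    """Pearson similarity over the goods list; missing goods count as quantity 0."""
    a = [row_a.get(p, 0.0) for p in goods]
    b = [row_b.get(p, 0.0) for p in goods]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)) * math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def recommend(matrix, target, k=2, n=2):
    """Return (K nearest neighbors, top-N goods by total quantity bought by those neighbors)."""
    goods = sorted({g for row in matrix.values() for g in row})
    # Rank every other customer in the same cluster by similarity to the target.
    neighbors = sorted(
        (c for c in matrix if c != target),
        key=lambda c: pearson(matrix[target], matrix[c], goods),
        reverse=True,
    )[:k]
    # Sum the quantities each good was bought in by the K neighbors.
    totals = {}
    for c in neighbors:
        for g, q in matrix[c].items():
            totals[g] = totals.get(g, 0.0) + q
    ranked = sorted(totals, key=lambda g: (-totals[g], g))
    return neighbors, ranked[:n]


# Hypothetical customer-good quantity matrix for one cluster.
matrix = {
    "a": {"g1": 5, "g2": 1},
    "b": {"g1": 4, "g2": 1, "g3": 2},
    "c": {"g2": 6, "g4": 3},
    "d": {"g1": 6, "g2": 2, "g3": 1},
}
neighbors, recommended = recommend(matrix, "a", k=2, n=2)
print(neighbors, recommended)  # ['d', 'b'] ['g1', 'g2']
```

Calling `recommend` on a matrix built from all customers instead of one cluster gives the conventional, unsegmented baseline.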
3.8 Recommendations to Customers based on Conventional Method
Here, in order to compare the proposed method with the conventional method, collaborative
filtering (CF) is applied to the existing data regardless of customer clustering, and the results
are examined. Unlike the previous step, customers are not compared only with customers of
their own cluster; all customers are compared with one another. The results are then studied
for the customers of each cluster in order to compare them with the proposed method.
4. Experimental Results
To evaluate the results, the customers' data are divided into training and test parts, with 75%
of the data used for training and 25% for testing. The training set consists of the goods
purchased by customers in a particular period (approximately 7 months), and the results are
evaluated on the test set.
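The 75/25 division above reads as a chronological split, with training on roughly the first 7 of the 10 months. The sketch below makes that reading explicit. Cutting at 75% of the elapsed days, the (CustomerID, RequestDate, ComID, Box) tuple layout and the name `chronological_split` are assumptions, since the text does not say whether the 75% refers to time or to the number of records.

```python
from datetime import date, timedelta


def chronological_split(transactions, train_fraction=0.75):
    """Split (customer_id, request_date, com_id, box) tuples at the date lying
    train_fraction of the way through the observed period."""
    dates = [t[1] for t in transactions]
    start, end = min(dates), max(dates)
    cutoff = start + timedelta(days=int((end - start).days * train_fraction))
    train = [t for t in transactions if t[1] <= cutoff]
    test = [t for t in transactions if t[1] > cutoff]
    return train, test, cutoff


# Hypothetical transactions spanning 100 days.
tx = [
    ("c1", date(2014, 1, 1), "g1", 1),
    ("c1", date(2014, 1, 31), "g2", 1),
    ("c2", date(2014, 4, 11), "g1", 2),
]
train, test, cutoff = chronological_split(tx)
print(cutoff, len(train), len(test))  # 2014-03-17 2 1
```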
With respect to our dataset (described in Section 3.1), a True Positive (TP) means that a
product was recommended to a customer and the customer bought it. A False Positive (FP)
means that a product was recommended to a customer and the customer did not buy it. A
False Negative (FN) means that a product was not recommended to a customer but the
customer bought it. A True Negative (TN) means that a product was not recommended to a
customer and the customer did not buy it.
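The four outcomes defined above can be counted per customer with set operations, as in the following sketch. The inputs (the recommended goods, the goods the customer bought in the test period and the full catalogue of goods) and the function name `confusion_counts` are illustrative assumptions.

```python
def confusion_counts(recommended, bought, catalogue):
    """Return (TP, FP, FN, TN) for one customer's recommendations."""
    rec, buy, all_goods = set(recommended), set(bought), set(catalogue)
    tp = len(rec & buy)              # recommended and bought
    fp = len(rec - buy)              # recommended but not bought
    fn = len(buy - rec)              # bought but not recommended
    tn = len(all_goods - rec - buy)  # neither recommended nor bought
    return tp, fp, fn, tn


print(confusion_counts({"g1", "g2", "g3"}, {"g2", "g4"}, ["g1", "g2", "g3", "g4", "g5"]))  # (1, 2, 1, 1)
```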
Precision and recall are popular measures of quality in recommender and information retrieval
systems, and they are combined in the F1 measure. Precision-recall is considered the most
prevalent evaluation criterion (Cao and Li, 2007).
The formulas used for the Precision and Recall are defined as follows:
$$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$
Increasing the number of recommendations typically decreases precision and increases recall.
The F1 measure, $F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$, balances precision and recall.
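Equations (2) and (3), together with the usual F1 formula, translate directly into code. This small sketch (hypothetical name `precision_recall_f1`) returns 0 when a denominator is zero, which is a common convention rather than something the text specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion counts; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print([round(x, 3) for x in precision_recall_f1(1, 2, 1)])  # [0.333, 0.5, 0.4]
```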
Each of the above measures is calculated separately for each customer cluster, and the mean
value for each cluster is then computed so that the proposed method can be assessed.
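The per-cluster averaging step can be sketched as follows: per-customer scores (for example F1) are averaged within each cluster. The input dictionaries and the name `mean_per_cluster` are hypothetical.

```python
from collections import defaultdict


def mean_per_cluster(per_customer_scores, cluster_of):
    """Average a per-customer score (e.g. F1) within each customer cluster."""
    sums, counts = defaultdict(float), defaultdict(int)
    for customer, score in per_customer_scores.items():
        k = cluster_of[customer]
        sums[k] += score
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


scores = {"c1": 0.25, "c2": 0.75, "c3": 0.125}
print(mean_per_cluster(scores, {"c1": 1, "c2": 1, "c3": 2}))  # {1: 0.5, 2: 0.125}
```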
As shown in Table 3, the cluster with the highest value, which includes 17 customers, ranks
highest among the clusters, and the recommendation quality in this cluster is also the highest.
These results suggest that the higher the customer value of a cluster, the higher the accuracy of
the recommendations for that cluster. In conclusion, the customers' value ratio is directly
related to the accuracy of the recommender system. Results for the conventional collaborative
filtering method were obtained in the same way.
Customers were also segmented into five clusters using the SVM method. As Table 5 shows,
the SVM method is more accurate than the conventional CF method. However, comparing the
results of our proposed method with those of SVM shows that the proposed method has higher
accuracy.
CF recommender systems are used successfully today. In the current research, a new method
was presented to improve the accuracy of CF. In this method, customer clustering was
combined with CF based on the K nearest neighbors and applied to the transactions of a sales
center in Tehran, Iran. The results indicated that the proposed method has higher accuracy
than the conventional CF method. Likewise, the clusters with higher values received more
accurate recommendations. This is very important for businesses and trade centers, as more
than 80% of their profits come from valued customers; hence, more accurate recommendations
to these customers lead to more profit for sales centers. In addition, the proposed method
obtained results faster than the conventional method, because recommendations were
computed only with respect to customers of the same cluster, whereas the conventional
method assesses all clients, so its calculation becomes slower as the number of customers
increases. Another advantage of this approach is the identification of valuable customers, on
whom some businesses can capitalize. Since the proposed method calculates the value of each
customer and makes it visible to sales representatives, the recommendations can be
coordinated with sales strategies to make them more targeted.
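The speed argument can be quantified by counting similarity computations, assuming that the conventional method compares every pair of customers while the proposed method compares only pairs inside the same cluster. The cluster sizes below are an inference: they are taken from the third column of the cluster table reproduced at the end of this text, whose values sum to the 4472 active customers reported for the cleaned data. `pair_count` is an illustrative name.

```python
def pair_count(n):
    """Number of distinct customer pairs whose similarity must be computed."""
    return n * (n - 1) // 2


# Assumed cluster sizes, read from the third column of the cluster table (sum = 4472).
cluster_sizes = [980, 246, 610, 1567, 1069]
all_pairs = pair_count(sum(cluster_sizes))                # conventional CF: everyone vs everyone
within_pairs = sum(pair_count(n) for n in cluster_sizes)  # proposed: only within each cluster
print(sum(cluster_sizes), all_pairs, within_pairs, round(all_pairs / within_pairs, 2))
# 4472 9997156 2493397 4.01
```

Under these assumptions, clustering cuts the number of pairwise similarity computations by roughly a factor of four; actual run-time savings would also depend on matrix sparsity and implementation details.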
For future work, this research can be extended with other customer clustering methods to
improve clustering accuracy and, in turn, the accuracy of the recommendation system. An open
question is whether recommendations become more accurate as the number of clusters
increases. The authors plan to use new machine learning methods, such as deep learning, for
customer clustering and to study the resulting recommendation accuracy. Likewise, profitable
commodities could be combined with low-income products to increase the latter's sales; these
issues all relate to the company's sales and marketing strategies.
References:
Ekstrand, M.D., Riedl, J.T. and Konstan, J.A. (2011), "Collaborative filtering recommender systems",
Foundations and Trends in Human-Computer Interaction, Vol. 4, pp. 81-173.
Ghani, R. and Fano, A. (2002), "Building Recommender Systems using a Knowledge Base of Product
Semantics", in 2nd International Conference on Adaptive Hypermedia and Adaptive Web-Based
Systems, Malaga.
Ha, S.H. (2010), "Behavioral assessment of recoverable credit of retailer's customers", Information
Sciences, Vol. 180, pp. 3703-3717.
Hidalgo, P., Manzur, E., Olavarrieta, S. and Farias, P. (2008), "Customer retention and price
matching: The AFPs case", Journal of Business Research, Vol. 61, pp. 691-696.
Huang, C.L. and Huang, W.L. (2009), "Handling sequential pattern decay: Developing a two-stage
collaborative recommender system", Electronic Commerce Research and Applications, Vol.
8, pp. 117-129.
Huang, J.J., Tzeng, G.H. and Ong, C.S. (2007), "Marketing segmentation using support vector
clustering", Expert Systems with Applications, Vol. 32, pp. 313-317.
Lee, S.L. (2010), "Commodity recommendations of retail business based on decision tree induction",
Expert Systems with Applications, Vol. 37, pp. 3685-3694.
Lee, T.S., Chiu, C.C., Chou, Y.C. and Lu, C.J. (2006), "Mining the customer credit using classification
and regression tree and multivariate adaptive regression splines", Computational Statistics
& Data Analysis, Vol. 50, pp. 1113-1130.
Liu, D.R. and Shih, Y.Y. (2005), "Hybrid approaches to product recommendation based on customer
lifetime value and purchase preferences", The Journal of Systems and Software, Vol. 77,
pp. 181-191.
Loh, S., Lorenzi, F., Saldana, R. and Licthnow, D. (2004), "A tourism recommender system based on
collaboration and text analysis", Information Technology and Tourism, Vol. 6.
Luo, X., Xia, Y., Zhu, Q. and Li, Y. (2013), "Boosting the K-Nearest-Neighborhood based incremental
collaborative filtering", Knowledge-Based Systems, Vol. 53, pp. 90-99.
McCarty, J.A. and Hastak, H. (2007), "Segmentation approaches in data-mining: a comparison of
RFM, CHAID, and logistic regression", Journal of Business Research, Vol. 60, pp. 656-662.
Mazurowski, M.A. (2013), "Estimating confidence of individual rating predictions in collaborative
filtering recommender systems", Expert Systems with Applications, Vol. 40, pp. 3847-3857.
Mizuno, M., Saji, A., Sumita, U. and Suzuki, H. (2008), "Optimal threshold analysis of segmentation
methods for identifying target customers", European Journal of Operational Research, Vol.
186, pp. 358-379.
Park, Y., Park, S., Jung, W. and Lee, S.G. (2015), "Reversed CF: A fast collaborative filtering algorithm
using a k-nearest neighbor graph", Expert Systems with Applications, Vol. 42, pp. 4022-4028.
Perez-Gallardo, Y., Alor-Hernandez, G., Cortes-Robles, G. and Rodriguez-Gonzalez, A. (2013),
"Collective intelligence as mechanism of medical diagnosis: The iPixel approach", Expert
Systems with Applications, Vol. 40, pp. 2726-2737.
Rezaeinia, S.M., Keramati, A. and Albadvi, A. (2012), "An integrated AHP–RFM method to banking
customer segmentation", International Journal of Electronic Customer Relationship
Management, Vol. 6, pp. 153-168.
Tsai, C.F. and Hung, C. (2012), "Cluster ensembles in collaborative filtering recommendation",
Applied Soft Computing, Vol. 12, pp. 1417-1425.
Tu, Y. and Yang, Z. (2013), "An enhanced Customer Relationship Management classification
framework with Partial Focus Feature Reduction", Expert Systems with Applications, Vol. 40,
pp. 2137-2146.
Wang, C.H. (2009), "Outlier identification and market segmentation using kernel-based clustering
Fig. 1 (flowchart; recoverable labels): Data Collection; Data Cleaning; R, F and M Variable Extraction; R, F and M Variable Weight Calculation; EM Algorithm; Customer Segmentation; Value Calculation of Each Customer Cluster; CF Calculation; branches: Proposed Method, Conventional Method, Multi-Class SVM.
Cluster     Col. 2   Customers   Col. 4   Col. 5
Cluster 1   4        980         0.42     0.593
Cluster 2   1        246         0.521    0.717
Cluster 3   2        610         0.504    0.698
Cluster 4   5        1567        0.402    0.55
Cluster 5   3        1069        0.475    0.644