Market Basket Analysis Using Apriori and FP Growth Algorithm
tree. So if we can reduce our computation by some approach, it will be productive. Our proposed way is to reduce the items of the datasets to the top selling products. So we reshape the datasets by taking those products that are bought most often by the customers. But how many top selling products will be suitable for this proposed approach is a key question.

A. Analysis over French Retail Dataset

Here, with minimum support=1% and minimum confidence=50%, the time required for the two algorithms is given in Figure 2. The results in Figure 2 indicate that the FP Growth algorithm takes a shorter time than the Apriori algorithm for various numbers of transactions. In this paper, minimum confidence has been kept at 50% for all experiments.

Fig. 2. The Required Time for Apriori and FP Growth Algorithms
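Each rule in this paper is reported as a (support%, confidence%) pair. Both measures can be computed directly from the raw transactions; the following is a minimal plain-Python sketch (the baskets below are illustrative, not taken from either dataset):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support%, confidence%) for the rule antecedent => consequent."""
    ante = set(antecedent)
    both = ante | set(consequent)
    n_ante = sum(1 for t in transactions if ante <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = 100.0 * n_both / len(transactions)  # share of all baskets containing every item
    confidence = 100.0 * n_both / n_ante          # share of antecedent baskets that also hold the consequent
    return support, confidence

# Illustrative baskets (not from the paper's datasets)
baskets = [
    ["eggs", "groundbeef", "mineralwater"],
    ["eggs", "groundbeef"],
    ["milk", "mineralwater"],
    ["eggs", "groundbeef", "mineralwater", "milk"],
]
support, confidence = rule_metrics(baskets, ["eggs", "groundbeef"], ["mineralwater"])
# support = 50.0 (2 of 4 baskets), confidence is about 66.7 (2 of 3 antecedent baskets)
```

A rule passes the thresholds used in the experiments when support >= 1 and confidence >= 50.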
[TABLE: SUPPORT VERSUS TIME FOR APRIORI ALGORITHM]

Fig. 3. Time of Apriori for Without and With 55% Product Reduction

• Sample 1: Here, we get the same two rules which we had achieved before by using without product reduction-
[eggs, groundbeef] => [mineralwater](1.19, 60)
[milk, groundbeef] => [mineralwater](1.2, 51.3)
• Sample 2: In this, we get the same one rule-
[eggs, groundbeef] => [mineralwater](1.26, 65.5)
• Sample 3: Again, we get the same one rule-
[milk, groundbeef] => [mineralwater](1.66, 64.1)
• Sample 4: But in this sample, we get no same rules.

Taking 50% of the top selling products: As there are 94 unique items, 50% will contain the 47 most popular items. Taking 55% of the top selling products: As there are 94 unique items, 55% will contain the 52 most popular items.

With minimum confidence=50%, Figure 5 displays that the required time for the FP Growth algorithm is smaller than that of the Apriori algorithm.

[TABLE: COMPARISON OF EXECUTION TIME (MS), Apriori vs. FP Growth]
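The product-reduction step described above (keeping only the top selling items, e.g. 47 of 94 items at 50%) can be sketched as follows; a minimal plain-Python illustration, with function and variable names of my own choosing:

```python
from collections import Counter

def reduce_to_top_sellers(transactions, fraction):
    """Keep only the top `fraction` of unique items by purchase count
    and drop every other item from each transaction."""
    counts = Counter(item for t in transactions for item in t)
    n_keep = round(len(counts) * fraction)   # e.g. round(94 * 0.50) = 47, round(94 * 0.55) = 52
    top = {item for item, _ in counts.most_common(n_keep)}
    return [[item for item in t if item in top] for t in transactions]

# Illustrative use: keep the top 50% (2 of 4) items
reduced = reduce_to_top_sellers(
    [["coffee", "cake"], ["coffee", "scone"], ["tea", "cake"], ["coffee"]],
    0.50,
)
# coffee (bought 3 times) and cake (bought 2 times) survive; scone and tea are dropped
```

Mining is then run on the reduced transactions, so the candidate itemset space is much smaller.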
TABLE V
COMPARISON OF FREQUENT ITEMSETS AND ASSOCIATION RULES BETWEEN WITHOUT REDUCTION AND WITH REDUCTION

Support(%)   Without Reduction            With 50% Reduction
             Frequent Items   Rules       Frequent Items   Rules
1            61               11          61               11
2            33               8           33               8

From Table V we can observe that, when we take the 50% top selling products, the generated rules are totally similar to the rules which we get without reduction. The frequent itemsets are also quite similar. For this dataset, there is no need to compute a 55% product reduction, because we have already got similar frequent itemsets and rules; therefore, we have used the 50% product reduction for this dataset. So we can say that it is beneficial to use reduction: previously we needed 94 products for mining important associations among frequently purchased items, and now we need only 47 products for the same purpose.

TABLE VI
SUPPORT VERSUS RULES WHEN CONFIDENCE=50%

After performing the support versus rules comparison for without reduction and with 50% reduction, Table VI again indicates that the generated frequent items and association rules with 50% reduction are totally similar to those generated without reduction.

Fig. 7. Time of FP Growth for Without and With 50% Product Reduction

3) Rule Analysis Using Sampling Without Replacement: At first we have computed all 9465 transactions with all 94 products, without product reduction, keeping minimum support=1% and minimum confidence=50%. The rules are-
• [alfajores] => [coffee](1.96, 54.06)
• [cake] => [coffee](5.47, 52.69)
• [cookies] => [coffee](2.82, 51.84)
• [hotchocolate] => [coffee](2.95, 50.72)
• [juice] => [coffee](2.06, 53.42)
• [medialuna] => [coffee](3.51, 56.92)
• [pastry] => [coffee](4.75, 55.21)
• [sandwich] => [coffee](3.82, 53.25)
• [scone] => [coffee](1.80, 52.29)
• [spanishbrunch] => [coffee](1.08, 59.88)
• [toast] => [coffee](2.36, 70.44)

After the 50% reduction, we have done sampling without replacement. As there are 9465 transactions, we have taken five samples, each with 1893 transactions. The results are-
• Sample 1: Here, we get the same two rules which we had achieved before without product reduction-
[alfajores] => [coffee](1.69, 56.14)
[spanishbrunch] => [coffee](1.0, 67.85)
• Sample 2: Here, we get the same six rules-
[hotchocolate] => [coffee](3.16, 53.09)
[medialuna] => [coffee](3.22, 58.65)
• Sample 3: We get the same two rules here-
[hotchocolate] => [coffee](3.22, 55.45)
[pastry] => [coffee](5.01, 58.64)
• Sample 4: Here, we get the same six rules-
[alfajores] => [coffee](2.69, 67.10)
[cake] => [coffee](6.28, 57.48)
[hotchocolate] => [coffee](2.79, 58.24)
[pastry] => [coffee](5.44, 57.86)
[scone] => [coffee](1.58, 54.54)
[toast] => [coffee](2.58, 71.01)
• Sample 5: Again, we get the same six rules-
[cookies] => [coffee](3.38, 56.63)
[juice] => [coffee](2.27, 60.56)
[medialuna] => [coffee](3.38, 59.81)
[pastry] => [coffee](4.64, 56.41)
[scone] => [coffee](2.16, 56.16)
[spanishbrunch] => [coffee](1.47, 75.67)

So we can see that, after using sampling without replacement, the same rules are generated, and they are more accurate compared to the rules generated without product reduction.

VI. CONCLUSION

From the experimental analysis, the results show that if we use reduction with the top selling products, the time required for both algorithms is less than when using all the products. Moreover, after product reduction we get the same rules and almost the same frequent itemsets for various support levels. So, from our point of view, it is beneficial to use reduction of items, because the reduction needs less computation than before. Again, FP Growth requires a shorter time than the Apriori algorithm both without and with product reduction. We have also done rule analysis by using sampling without replacement, and the results show that we get the same rules with higher confidence. So we can say that the reduction of items is capable of identifying customers' purchasing patterns while requiring less computation. In future, more transactional datasets can be used to determine the suitable range of percentages for product reduction. Analysis of individual rules with correlation analysis will also be interesting.
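The sampling without replacement used in the rule analysis (9465 transactions split into five disjoint samples of 1893) can be sketched as follows; a minimal plain-Python illustration, with names of my own choosing:

```python
import random

def split_without_replacement(transactions, k, seed=42):
    """Shuffle once, then cut into k disjoint, equally sized samples,
    so no transaction is drawn twice (sampling without replacement)."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // k            # 9465 // 5 = 1893
    return [shuffled[i * size:(i + 1) * size] for i in range(k)]

# Illustrative use with transaction ids 0..9464
samples = split_without_replacement(range(9465), k=5)
# five samples of 1893 transactions each; their union contains no duplicates
```

Because the samples are disjoint, rules that reappear across samples are good evidence that the reduced dataset preserves the dominant purchasing patterns.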