
2019 22nd International Conference on Computer and Information Technology (ICCIT), 18-20 December, 2019

Market Basket Analysis Using Apriori and FP Growth Algorithm

Maliha Hossain
Dept. of Computer Science & Engineering
Rajshahi University of Engineering & Technology
Rajshahi, Bangladesh
maliha143099@gmail.com

A H M Sarowar Sattar
Dept. of Computer Science & Engineering
Rajshahi University of Engineering & Technology
Rajshahi, Bangladesh
sarowar@gmail.com

Mahit Kumar Paul
Dept. of Computer Science & Engineering
Rajshahi University of Engineering & Technology
Rajshahi, Bangladesh
mahit.cse@gmail.com
Abstract—Market basket analysis finds customers' purchasing patterns by discovering important associations among the products they place in their shopping baskets. It not only assists the decision-making process but also increases sales in many business organizations. Apriori and FP Growth are the most common algorithms for mining frequent itemsets. Both algorithms require a predefined minimum support to be satisfied for identifying the frequent itemsets, but when the minimum support is low, a huge number of candidate sets is generated, which requires large computation. In this paper, an approach is proposed to avoid this large computation by reducing the items of the dataset to the top selling products. Various percentages of top selling products (30%, 40%, 50%, 55%) have been taken, and frequent itemsets and association rules are generated for both algorithms. The results show that if the top selling items are used, it is possible to get almost the same frequent itemsets and association rules in a short time compared with the outputs derived by computing all the items. From the time comparison it is also found that the FP Growth algorithm takes less time than the Apriori algorithm.

Index Terms—Market Basket Analysis, Association Rule Mining, Apriori Algorithm, FP Growth Algorithm.

I. INTRODUCTION

In this digital world, terabytes of commercial data are generated every second. Huge amounts of data are produced in day-to-day activities, so the volume of data is increasing dramatically. Mining information from this explosive growth of data has become one of the major challenges for the data management and mining communities. Moreover, the majority of recognized organizations collect and store massive amounts of customer transaction data [1]. However, having this massive data does not mean the organizations have rich commercial information [2]. Business industries need to discover valuable information and knowledge from this vast quantity of data, which leads to market basket analysis. This process discovers customers' buying patterns by finding associations among the different items that customers place in their shopping baskets [3]. The aim of market basket analysis is to determine which items are frequently purchased together by customers. The term frequent items means the itemsets that satisfy a user-specified, predefined percentage value. For example, if customers have purchased milk in a supermarket, how likely are they to purchase bread together with the milk [2]? This analysis helps shop owners take many important business decisions: identifying regular customers, increasing product sales, catalog design and many more. The main goal of market basket analysis is to extract associations among purchased products. It also helps retailers with product placement on shelves by putting similar products close to one another. For example, if customers who purchase computers also tend to buy anti-virus software at the same time, then placing the hardware display close to the software display may help increase the sale of both items [3].

Many algorithms have been proposed for discovering knowledge from these large databases, and mining association rules is one of the most important of them. An association rule is of the form X => Y, where X is referred to as the antecedent and Y as the consequent, and the rule represents that customers who purchase X are more likely to purchase Y [1]. The interestingness of rules is measured by support and confidence, which reflect the usefulness and certainty of the discovered rules. Association rules need to satisfy a user-specified minimum support and minimum confidence. Apriori and FP Growth are the two most basic algorithms for finding frequent itemsets and discovering associations among products [3].

In this paper, we have used the Apriori and FP Growth algorithms for discovering popular items in transactional datasets and obtaining relations among those items. We have also proposed a new approach for mining association rules by selecting a specific percentage of frequent items from our dataset, and have performed many tests to support our proposal.

The remainder of the paper is structured as follows: previous related work on market basket analysis is discussed in Section II. In Section III, information on the datasets is given. Data preprocessing, frequent itemset mining methods and the proposed approach are described in Section IV. Section V contains the implementation of the existing and proposed approaches and mainly discusses the experimental results and analysis of the overall work. In Section VI, we conclude our work.

978-1-7281-5842-6/19/$31.00 © 2019 IEEE

II. LITERATURE REVIEW

A lot of studies have been done in the area of association rule mining. Rakesh Agrawal and others presented an efficient algorithm that generates all significant association rules among items in a database.
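The support and confidence measures defined in the introduction can be made concrete with a small sketch. The transactions and items below are invented for illustration only; they are not from the paper's datasets:

```python
# Toy illustration of support and confidence for a rule X => Y.
# The five transactions below are invented for illustration only.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(X union Y) / support(X): how often Y appears when X does.
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"milk", "bread"}, transactions))       # 0.6
print(confidence({"milk"}, {"bread"}, transactions))  # 0.75
```

For a rule X => Y, support is the fraction of all transactions containing both X and Y, and confidence is that fraction divided by the support of X alone.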
The authors proposed the Apriori property, which identifies frequent itemsets in a database [4], [5]. They used sales data from a large retailing company and tried to find the associations between products, taking minimum support = 1% and minimum confidence = 50%. They also assured the effectiveness of their estimation and pruning techniques by measuring accuracy.

Abdulsalam, Hambali and others used the Apriori algorithm for market basket analysis. They tried to represent the sales pattern of a supermarket with six (6) distinct products across thirty (30) unique transactions. The authors assumed a minimum support of 50% and found the frequent itemsets using the Apriori algorithm in the Java programming language [6].

Dhanabhakyam and Punithavalli highlighted Classification Dependent Predictive Association Rules (CPAR), Associative Classification, Classification Association Rule Mining (CARM), Distributed Apriori Association Rule, the Six Sigma Technique and the Apriori algorithm in their paper [1]. They marked out the advantages and disadvantages of the methods and tried to draw a conclusion about which method is better. According to the authors, among all the methods the Apriori algorithm is found to be better for association, but it has many difficulties, so they proposed to combine fuzzy logic with the Apriori algorithm, which returns a better result.

Liu and Guan used FP Growth in their paper, which can overcome the disadvantages of Apriori [2]. According to the authors, FP Growth constructs an FP-tree that holds highly compressed information. The authors took five transactions and generated the FP-tree to discover the relationships between transactions.

Jiangtao Qiu and others tried to build a model of customer purchase behavior in the e-commerce context, known as the customer purchase prediction model (COREL) [7]. This model mainly has two stages. First, a candidate product collection is built by discovering associations among products, from which it predicts customers' motivations. The second stage determines the most purchased candidate products based on customers' preferences. The authors obtained customer information and product reviews from "Jingdong". The outcome of their paper showed that customer preference plays a great role in purchasing decisions.

Kaur and Kang proposed an approach to identify the changing trends of market data using association rule mining [8]. They first described various techniques of data mining and then tried to describe why market basket analysis is important. They used extended bakery datasets and tried to detect outliers. The authors also suggested extending this method to other areas.

III. DATASET DESCRIPTION

In our work, we have used two datasets, both obtained from Kaggle. The first dataset provides transaction information over the course of a week at a French retail store [9]. The second dataset consists of observations from a bakery and provides transactions of bakery items [10]. The details are shown in Table I.

TABLE I
DATASET DESCRIPTION

Dataset Name          No. of Instances   No. of Attributes
French Retail Store        7501                 20
Bakery Shop               21293                  4

IV. METHODOLOGIES

A. Data Preprocessing

For the French Retail Store dataset, we first checked for 'NaN' values, which mean that the item represented by the column was not purchased in that specific transaction, and replaced them with 0. Then we identified all unique items, which shows how many non-replicated items there are: 120 different unique items are sold by the French retail store. Then TransactionEncoder is used to map the items per transaction; TransactionEncoder means that if the product is present in that transaction, the value of the product is 1, otherwise 0 [11].

The Bakery Shop dataset has four columns: Date, Time, Transaction, Item. We first checked for 'NONE' values in these four columns. In the Item column we found 'NONE' values, which mean that no item was purchased; the number of such rows is 786, so they have no use in the dataset and we dropped them. Again, in the Transaction column, rows that share the same value belong to the same transaction, so the dataset has fewer transactions than observations; we finally have 9465 transactions in this dataset. Then we computed the unique items and found 94 items, which means only these items are present in the Item column. After this, TransactionEncoder is used so that we can transform our data into the correct format for applying the mining algorithms. Table II shows the number of instances and attributes after data preprocessing.

TABLE II
PREPROCESSED DATASET

Dataset Name          No. of Instances   No. of Attributes
French Retail Store        7501                120
Bakery Shop                9465                 97
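The effect of TransactionEncoder described above can be sketched in plain Python. This is a minimal stdlib-only illustration of the 1/0 encoding, with invented transactions; the paper itself uses the mlxtend implementation [11]:

```python
# Minimal sketch of the one-hot transaction encoding described above:
# each unique item becomes a column that is 1 if the item appears in
# the transaction and 0 otherwise. The transactions are invented.
transactions = [
    ["bread", "coffee"],
    ["coffee"],
    ["bread", "cake", "coffee"],
]

items = sorted({item for t in transactions for item in t})
encoded = [[1 if item in t else 0 for item in items] for t in transactions]

print(items)    # ['bread', 'cake', 'coffee']
print(encoded)  # [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
```

The encoded matrix has one row per transaction and one column per unique item, which is the input format the frequent-itemset mining algorithms expect.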
B. Frequent Itemset Mining Methods

For finding frequent itemsets and the corresponding association rules in our datasets, we use two mining algorithms.

1) Apriori Algorithm: Apriori is the first and most basic algorithm for finding frequent itemsets, proposed by R. Agrawal and R. Srikant in 1994 [4]. Apriori uses an approach known as level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the frequent 1-itemsets that satisfy the minimum support are found by scanning the database. Then the frequent 2-itemsets are found using the frequent 1-itemsets, and this process continues until no more frequent k-itemsets can be found [3]. Apriori relies on an anti-monotone property, which states that every subset of a frequent itemset must also be frequent, and it uses a breadth-first search to count the candidate itemsets. The algorithm has two main steps:
• Joining step: To find L_k, a set of candidate k-itemsets is generated by joining L_{k-1} with itself [3].
• Pruning step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset [3].

2) FP Growth Algorithm: The Apriori algorithm has two major demerits: it generates a huge number of candidate sets, and it scans the database many times. The FP Growth algorithm is used to overcome these disadvantages. FP Growth follows a divide-and-conquer strategy: it first constructs a frequent pattern tree (FP-tree) from the frequent items sorted in descending order of support count, and then uses that FP-tree to obtain the association information [3]. The main advantage of FP Growth is that it scans the database only twice and does not generate a huge number of candidate sets.

C. Proposed Approach

The main goal of this study is a performance evaluation of the Apriori and FP Growth algorithms. For both algorithms, we first discover all the frequent itemsets that satisfy a predefined minimum support and then find the associations between frequent itemsets that satisfy a predefined minimum confidence. We then compare the execution time against the number of transactions and mark out the results, which are given in the experimental analysis section. When the database is large, Apriori generates a huge number of candidate sets and the FP Growth algorithm cannot construct a main-memory-based FP-tree, so reducing this computation by some approach will be productive. Our proposed way is to reduce the items of the datasets to the top selling products, so we reshape the datasets by keeping those products that are bought most by the customers. How many top selling products are suitable for this proposed approach is a key question; we have therefore taken 30%, 40%, 50% and 55% of the top selling products and compared the results against the frequent itemsets and association rules obtained by computing all the products in the datasets. Figure 1 shows the flow chart of our proposed approach: Start → Input Dataset → Data Preprocessing → Reduce products to top selling products → Apply Apriori and FP Growth algorithms to the reduced datasets → Compare the results derived without and with item reduction → Result Analysis → End.

Fig. 1. Proposed Approach

V. EXPERIMENTAL ANALYSIS

The overall experiment is performed on a PC with an Intel(R) Core(TM) i5-4210U CPU (2.40 GHz), 4 GB of main memory and the Microsoft Windows 10 operating system. All the analysis is done using the Python programming language.

A. Analysis over French Retail Dataset

With minimum support = 1% and minimum confidence = 50%, the time required for the two algorithms is given in Figure 2. The results in Figure 2 indicate that the FP Growth algorithm takes a shorter time than the Apriori algorithm across various numbers of transactions. In this paper, the minimum confidence has been kept at 50% for all experiments.

Fig. 2. The Required Time for Apriori and FP Growth Algorithms (execution time in ms versus number of transactions)
1) Execution of Proposed Approach: We have taken various percentages of the top selling products and compared the results. For product reduction:
• Taking 30% of top selling products: as there are 120 unique items, 30% contains the 36 most popular items.
• Taking 40% of top selling products: 40% of 120 contains the 48 most popular items.
• Taking 50% of top selling products: 50% of 120 contains the 60 most popular items.
• Taking 55% of top selling products: 55% of 120 contains the 66 most popular items.

In Table III, frequent itemsets are denoted as FI and rules as R. From Table III we can see that if we take the 55% top selling products, the number of generated rules is exactly the same as the number of rules obtained without reduction, and the frequent itemsets are also quite similar. So for this dataset 55% product reduction is taken.

TABLE III
COMPARISON WITH 30%, 40%, 50%, 55% REDUCTION

Transaction  Without Reduction  30%       40%       50%       55%
             FI    R            FI   R    FI   R    FI   R    FI   R
1000         350   20           261  17   290  18   316  19   326  20
2000         293    8           223   7   249   7   269   7   275   8
3000         294    4           226   4   252   4   270   4   276   4
4000         302    3           230   3   256   3   276   3   282   3
5000         285    3           217   3   244   3   263   3   269   3
6000         274    4           208   4   235   4   252   4   258   4
7000         264    2           204   2   228   2   244   2   249   2

If we now perform a support-versus-rules comparison without and with 55% product reduction, Table IV indicates that the number of rules generated with 55% reduction is exactly the same as the number obtained without reduction.

TABLE IV
SUPPORT VERSUS RULES WHEN CONFIDENCE = 50%

Support(%)  Without Reduction       With 55% Reduction
            Frequent Items  Rules   Frequent Items  Rules
1           257             63      242             63
2           103             20      102             20
3            54              7       54              7
4            35              4       35              4
5            28              2       28              2

2) Time Comparison between Existing and Proposed Approach: Figures 3 and 4 display that with 55% product reduction the required time for both algorithms is less than without reduction.

Fig. 3. Time of Apriori for Without and With 55% Product Reduction
Fig. 4. Time of FP Growth for Without and With 55% Product Reduction

3) Rule Analysis Using Sampling Without Replacement: To support our proposed approach more precisely, we have performed sampling without replacement on the datasets. First we computed all 7501 transactions with all 120 products and generated all the associations among frequent itemsets, keeping minimum support = 1% and minimum confidence = 50%. Rules are written in the form X => Y (s, c), where s and c represent the support and confidence respectively, expressed as percentages.
• [eggs, ground beef] => [mineral water] (1.01, 50.66): 1.01% of all transactions under analysis show that eggs, ground beef and mineral water are purchased together, and 50.66% of customers who purchased eggs and ground beef also bought mineral water.
• [milk, ground beef] => [mineral water] (1.1, 50.3): 1.1% of all transactions under analysis show that milk, ground beef and mineral water are purchased together, and 50.3% of customers who purchased milk and ground beef also bought mineral water.

After 55% product reduction, that is, using only 66 products instead of 120 for mining association rules, we analyzed the generated rules using sampling without replacement. As there are about 7500 transactions, we took five samples of 1500 transactions each. Each time, we randomly chose 1500 transactions and generated the associations among the frequent itemsets in the sample, noting the results. Many rules are generated in each sample, but we kept only those rules that match the rules generated without product reduction. The results for the five samples are as follows:
• Sample 1: we get the same two rules that were achieved before without product reduction: [eggs, ground beef] => [mineral water] (1.19, 60) and [milk, ground beef] => [mineral water] (1.2, 51.3).
• Sample 2: we get one of the same rules: [eggs, ground beef] => [mineral water] (1.26, 65.5).
• Sample 3: again, one of the same rules: [milk, ground beef] => [mineral water] (1.66, 64.1).
• Sample 4: in this sample we get no matching rules.
• Sample 5: we get one of the same rules: [eggs, ground beef] => [mineral water] (1.06, 55.17).

If we look carefully, after sampling without replacement we obtain the same rules in many samples with higher confidence values than the previous result. So we get the same rules with higher confidence values.
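The product-reduction step evaluated above can be sketched as follows. This is a minimal illustration with invented transactions; `reduce_to_top_selling` is a hypothetical helper name, not from the paper:

```python
from collections import Counter

# Sketch of the proposed reduction step: keep only the top pct% best
# selling items and drop all other items from each transaction.
# (The paper applies this idea to its two Kaggle datasets before
# running the Apriori and FP Growth algorithms.)
def reduce_to_top_selling(transactions, pct):
    counts = Counter(item for t in transactions for item in t)
    keep_n = max(1, int(len(counts) * pct / 100))
    top = {item for item, _ in counts.most_common(keep_n)}
    reduced = [[i for i in t if i in top] for t in transactions]
    return top, [t for t in reduced if t]  # drop now-empty transactions

ts = [["a", "b"], ["a", "c"], ["a", "b", "d"], ["b"]]
top, reduced = reduce_to_top_selling(ts, 50)  # keeps 2 of 4 items: a, b
```

With 120 items, pct = 55 keeps the 66 most popular items, matching the counts used for the French Retail Store dataset above.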
B. Analysis over Bakery Shop Dataset

With minimum support = 1% and minimum confidence = 50%, Figure 5 displays that the required time for the FP Growth algorithm is smaller than for the Apriori algorithm.

Fig. 5. The Required Time for Apriori and FP Growth Algorithms

1) Execution of Proposed Approach: For product reduction:
• Taking 30% of top selling products: as there are 94 unique items, 30% contains the 28 most popular items.
• Taking 40% of top selling products: 40% of 94 contains the 38 most popular items.
• Taking 50% of top selling products: 50% of 94 contains the 47 most popular items.
• Taking 55% of top selling products: 55% of 94 contains the 52 most popular items.

In Table V, frequent itemsets are denoted as FI and rules as R.

TABLE V
COMPARISON OF FREQUENT ITEMSETS AND ASSOCIATION RULES BETWEEN WITHOUT REDUCTION AND WITH REDUCTION

Transaction  Without Reduction  30%       40%       50%
             FI   R             FI   R    FI   R    FI   R
1000         60    9            50   7    56   8    58   9
2000         64   10            54   7    62   9    64  10
3000         60    8            53   7    59   8    60   8
4000         59    9            52   8    58   9    59   9
5000         57   10            52   8    57  10    57  10
6000         56    9            53   9    56   9    56   9
7000         56    9            54   9    56   9    56   9
8000         59   10            57  10    59  10    59  10
9000         60   11            59  11    60  11    60  11

From Table V we can observe that when we take the 50% top selling products, the number of generated rules is exactly the same as the number obtained without reduction, and the frequent itemsets are also quite similar. For this dataset there is no need to compute a 55% product reduction, because we have already obtained similar frequent itemsets and rules, so 50% product reduction is used. We can therefore say that it is beneficial to use the reduction: previously we needed 94 products for mining the important associations among frequently purchased items, and now we need only 47 products for the same purpose.

After performing the support-versus-rules comparison without and with 50% reduction, Table VI again indicates that the generated frequent items and association rules for 50% reduction are exactly the same as those generated without reduction.

TABLE VI
SUPPORT VERSUS RULES WHEN CONFIDENCE = 50%

Support(%)  Without Reduction       With 50% Reduction
            Frequent Items  Rules   Frequent Items  Rules
1           61              11      61              11
2           33               8      33               8
3           23               4      23               4
4           14               2      14               2
5           11               1      11               1

2) Time Comparison between Existing and Proposed Approach: Again, support-versus-time charts without reduction and with 50% reduction have been produced separately for the Apriori and FP Growth algorithms; Figures 6 and 7 show the results. The results show that when 50% product reduction is used, both algorithms take a shorter time.

Fig. 6. Time of Apriori for Without and With 50% Product Reduction
Fig. 7. Time of FP Growth for Without and With 50% Product Reduction

3) Rule Analysis Using Sampling Without Replacement: First we computed all 9465 transactions with all 94 products, without product reduction, keeping minimum support = 1% and minimum confidence = 50%. The rules are:
• [alfajores] => [coffee] (1.96, 54.06)
• [cake] => [coffee] (5.47, 52.69)
• [cookies] => [coffee] (2.82, 51.84)
• [hot chocolate] => [coffee] (2.95, 50.72)
• [juice] => [coffee] (2.06, 53.42)
• [medialuna] => [coffee] (3.51, 56.92)
• [pastry] => [coffee] (4.75, 55.21)
• [sandwich] => [coffee] (3.82, 53.25)
• [scone] => [coffee] (1.80, 52.29)
• [spanish brunch] => [coffee] (1.08, 59.88)
• [toast] => [coffee] (2.36, 70.44)

After 50% reduction, we performed sampling without replacement. As there are 9465 transactions, we took five samples of 1893 transactions each. The results are:
• Sample 1: we get the same two rules that were achieved before without product reduction: [alfajores] => [coffee] (1.69, 56.14) and [spanish brunch] => [coffee] (1.0, 67.85).
• Sample 2: we get the same six rules: [hot chocolate] => [coffee] (3.16, 53.09); [medialuna] => [coffee] (3.22, 58.65); [sandwich] => [coffee] (4.06, 62.09); [scone] => [coffee] (2.00, 61.29); [spanish brunch] => [coffee] (1.21, 63.88); [toast] => [coffee] (3.27, 83.78).
• Sample 3: we get the same two rules: [hot chocolate] => [coffee] (3.22, 55.45); [pastry] => [coffee] (5.01, 58.64).
• Sample 4: we get the same six rules: [alfajores] => [coffee] (2.69, 67.10); [cake] => [coffee] (6.28, 57.48); [hot chocolate] => [coffee] (2.79, 58.24); [pastry] => [coffee] (5.44, 57.86); [scone] => [coffee] (1.58, 54.54); [toast] => [coffee] (2.58, 71.01).
• Sample 5: again, we get the same six rules: [cookies] => [coffee] (3.38, 56.63); [juice] => [coffee] (2.27, 60.56); [medialuna] => [coffee] (3.38, 59.81); [pastry] => [coffee] (4.64, 56.41); [scone] => [coffee] (2.16, 56.16); [spanish brunch] => [coffee] (1.47, 75.67).

So we can see that after using sampling without replacement, the same rules are generated with higher confidence compared to the rules generated without product reduction.

VI. CONCLUSION

From the experimental analysis, the results show that if we use reduction with the top selling products, the time required by both algorithms is less than when using all the products. Moreover, after product reduction the same rules and almost the same frequent itemsets are produced for various support levels, so from our point of view it is beneficial to reduce the items, because the reduction requires less computation than before. Again, FP Growth requires a shorter time than the Apriori algorithm, both without and with product reduction. We have also performed rule analysis using sampling without replacement, and the results show that we get the same rules with higher confidence. So we can say that the reduction of items is capable of identifying customers' purchasing patterns with less computation.

In future, more transactional datasets can be used to determine the suitable range of percentages for product reduction. An analysis of individual rules with correlation analysis will also be interesting.

REFERENCES

[1] M. Dhanabhakyam and M. Punithavalli, "A survey on data mining algorithm for market basket analysis," Global Journal of Computer Science and Technology, 2011.
[2] Y. Liu and Y. Guan, "FP-growth algorithm for application in research of market basket analysis," in 2008 IEEE International Conference on Computational Cybernetics, pp. 269-272, IEEE, 2008.
[3] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, third edition, The Morgan Kaufmann Series in Data Management Systems, 2011.
[4] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," in ACM SIGMOD Record, vol. 22, pp. 207-216, ACM, 1993.
[5] R. Agrawal, R. Srikant, et al., "Fast algorithms for mining association rules," in Proc. 20th Int. Conf. Very Large Data Bases (VLDB), vol. 1215, pp. 487-499, 1994.
[6] S. Abdulsalam, K. Adewole, A. Akintola, and M. Hambali, "Data mining in market basket transaction: An association rule mining approach," International Journal of Applied Information Systems, vol. 7, no. 10, pp. 15-20, 2014.
[7] J. Qiu, Z. Lin, and Y. Li, "Predicting customer purchase behavior in the e-commerce context," Electronic Commerce Research, vol. 15, no. 4, pp. 427-452, 2015.
[8] M. Kaur and S. Kang, "Market basket analysis: Identify the changing trends of market data using association rule mining," Procedia Computer Science, vol. 85, pp. 78-85, 2016.
[9] R. Sharma, Market Basket Optimization (May 2019), Version 1. Retrieved from https://www.kaggle.com/roshansharma/market-basket optimization/metadata.
[10] S. Sarwar, Transactions from a Bakery (November 2018), Version 1. Retrieved from https://www.kaggle.com/sulmansarwar/transactions from-a-bakery/metadata.
[11] S. Raschka, "MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack," J. Open Source Software, vol. 3, no. 24, p. 638, 2018.
