
Market Basket Analysis

Association mining is used to identify products that are frequently purchased together in order to make recommendations. It analyzes transaction data and generates rules of the form "if a customer buys item A, they often also buy item B". The strength of a rule is measured by its support, confidence and lift. The Apriori algorithm is commonly used to find these patterns efficiently in large transaction datasets, and its parameters let you control the number and type of rules produced.
Copyright: © All Rights Reserved

Association Mining (Market Basket Analysis)

Links

Association Rule Mining


Rule
Measure the strength of a rule
Support
Confidence
Lift
Calculation
Example : Groceries
Transactions data
Association Rule Analysis
Most frequent items
Product recommendation rules
Control The Number Of Rules in Output
Remove Redundant Rules
Find Rules Related To Given Item/s
To find what factors influenced purchase of product X
Find out what products were purchased after/along with product X
Convert Data into Transactions Format

Links
https://rpubs.com/sbushmanov/180410
Visualisation
http://www.rdatamining.com/examples/association-rules

Association Rule Mining


Association mining is commonly used to make product recommendations by identifying products that are
frequently bought together. It generates association rules, which have to be interpreted correctly.

Association mining is usually done on transaction data from a retail store or an online e-commerce
store. Because transaction datasets are usually large, the Apriori algorithm is used to find these patterns (rules)
efficiently.
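The Apriori idea fits in a few lines of base R. This is a minimal sketch for illustration only. It uses hypothetical toy transactions and a made-up helper, apriori_frequent(). It is not the arules implementation, because arules' apriori() calls an optimized C implementation. The sketch searches level by level and keeps the itemsets whose support is at least min_support, where support is the fraction of transactions that contain the itemset (defined formally below). It builds a (k+1)-item candidate only if every one of its k-item subsets is already frequent.

```r
# Hypothetical toy transactions (not the Groceries data)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer", "bread"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)

# Illustrative sketch of Apriori's level-wise frequent-itemset search (hypothetical helper)
apriori_frequent <- function(transactions, min_support) {
  # support = fraction of transactions that contain every item of the itemset
  support <- function(itemset) {
    mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
  }
  key <- function(itemset) paste(itemset, collapse = ",")

  candidates <- as.list(sort(unique(unlist(transactions))))  # 1-item candidates
  found <- numeric(0)
  while (length(candidates) > 0) {
    supp <- vapply(candidates, support, numeric(1))
    frequent <- candidates[supp >= min_support]
    if (length(frequent) == 0) break
    frequent_keys <- vapply(frequent, key, character(1))
    found[frequent_keys] <- supp[supp >= min_support]

    # Join step: merge two frequent k-itemsets that share their first k-1 items
    k <- length(frequent[[1]])
    next_candidates <- list()
    for (i in seq_along(frequent)) {
      for (j in seq_along(frequent)) {
        a <- frequent[[i]]
        b <- frequent[[j]]
        if (j > i && identical(a[seq_len(k - 1)], b[seq_len(k - 1)])) {
          candidate <- sort(c(a, b[k]))
          # Prune step: every k-item subset of the candidate must itself be frequent
          subsets <- combn(candidate, k, simplify = FALSE)
          if (all(vapply(subsets, key, character(1)) %in% frequent_keys)) {
            next_candidates[[length(next_candidates) + 1]] <- candidate
          }
        }
      }
    }
    candidates <- next_candidates
  }
  found
}

apriori_frequent(transactions, min_support = 0.4)
```

With min_support = 0.4, the sketch keeps bread, butter, milk, the three pairs among them and {bread, butter, milk}. Beer (support 0.2) is dropped at the first level, so no candidate containing it is ever generated. This pruning is what makes Apriori cheaper than counting every possible itemset.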

Rule

A rule is a notation that states which item(s) are frequently bought together with which other item(s). It has a
left-hand side (LHS) and a right-hand side (RHS) and is written as follows:

itemset A => itemset B

This means that the item(s) on the right (B) were frequently purchased along with the item(s) on the left (A).
Measure the strength of a rule

The apriori() function generates, from the given transaction data, the rules whose support and confidence meet
the minimum thresholds you set, and it reports the support, confidence and lift of each rule. These three measures
can be used to judge the relative strength of the rules. So what do these terms mean?

Let's consider the rule A => B in order to compute these metrics.

Support

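Support = (number of transactions containing both A and B) / (total number of transactions)

$$\text{support}(A \Rightarrow B) = P(A \cap B) = \frac{n(A \cap B)}{N}$$

where n(·) is the number of transactions containing the given items and N is the total number of transactions.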

Confidence

Confidence = (number of transactions containing both A and B) / (number of transactions containing A)

$$\text{confidence}(A \Rightarrow B) = \frac{P(A \cap B)}{P(A)} = \frac{n(A \cap B)}{n(A)}$$

Lift

Expected confidence of B = (number of transactions containing B) / (total number of transactions)

$$\text{expconf}(B) = P(B) = \frac{n(B)}{N}, \qquad P(A) = \frac{n(A)}{N}$$

If A and B are independent, the expected probability of A and B occurring together is P(A) · P(B). The observed probability is P(A ∩ B) = n(A ∩ B) / N.

Lift(A => B) = confidence / expected confidence

$$\text{lift}(A \Rightarrow B) = \frac{\text{confidence}(A \Rightarrow B)}{\text{expconf}(B)} = \frac{P(A \cap B)}{P(A)\,P(B)}$$

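Lift is the factor by which the observed co-occurrence of A and B exceeds the co-occurrence expected if A and B
were independent. A lift of 1 means A and B are independent. A lift above 1 means they occur together more often than chance would predict, and the higher the lift, the stronger the association. A lift below 1 means they occur together less often than expected.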

Calculation
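A small example lets you check the support, confidence and lift formulas by hand. The base R sketch below uses hypothetical toy transactions (not the Groceries data) and a made-up helper, rule_measures(). It counts n(A ∩ B), n(A) and n(B) directly and then applies the definitions given earlier.

```r
# Hypothetical toy transactions (not the Groceries data)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer", "bread"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)

# n(itemset): number of transactions that contain every item of the itemset
n_containing <- function(itemset) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Support, confidence and lift of the rule lhs => rhs (hypothetical helper)
rule_measures <- function(lhs, rhs) {
  N    <- length(transactions)
  n_ab <- n_containing(c(lhs, rhs))  # n(A & B)
  n_a  <- n_containing(lhs)          # n(A)
  n_b  <- n_containing(rhs)          # n(B)
  confidence <- n_ab / n_a
  c(support    = n_ab / N,
    confidence = confidence,
    lift       = confidence / (n_b / N))  # confidence / expected confidence P(B)
}

rule_measures("butter", "milk")  # support 0.6, confidence 1, lift 1.25
rule_measures("bread", "milk")   # support 0.6, confidence 0.75, lift 0.9375
```

{butter} => {milk} has a lift of 1.25 (> 1), so the two items co-occur more often than expected under independence. {bread} => {milk} has a fairly high confidence of 0.75 but a lift below 1, because milk appears in 80% of all transactions anyway. This is why confidence alone can be misleading.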
Example : Groceries
Transactions data
This example uses the Groceries data that comes with the arules package. Unlike with a data frame, head(Groceries)
does not display the items in each transaction. To view the transactions, use the inspect() function
instead.

Since association mining deals with transactions, the data has to be converted to an object of class
transactions, which the arules package makes available in R. This step matters because apriori() operates
on objects of class transactions; other inputs, such as a binary incidence matrix or a data frame, are coerced to transactions before mining.

Association Rule Analysis


 

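library(arules)    # transactions class, apriori(), eclat() and the Groceries data
data("Groceries")  # load the Groceries transactions that ship with arules

class(Groceries)             # "transactions"
inspect(head(Groceries, 3))  # print the first 3 transactions

size(head(Groceries))        # number of items in each of the first 6 transactions

LIST(head(Groceries, 3))     # convert 'transactions' to a list; note that LIST is in capitals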

Most frequent items

eclat() takes a transactions object and returns the frequent itemsets in the data, i.e. the itemsets whose
support is at least the value you pass to the supp argument. The maxlen argument sets the maximum number
of items in each frequent itemset.

frequentItems <- eclat(Groceries, parameter = list(supp = 0.07, maxlen = 15)) # mine itemsets with support >= 0.07

inspect(frequentItems)

itemFrequencyPlot(Groceries, topN = 10, type = "absolute", main = "Item Frequency") # plot the 10 most frequent items
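For single items, the support reported by eclat() is simply the fraction of transactions that contain the item. The same value is what arules' itemFrequency() returns with type = "relative". A base R sketch of that calculation follows. It needs no arules and uses hypothetical toy transactions and an illustrative threshold of 0.5.

```r
# Hypothetical toy transactions (not the Groceries data)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer", "bread"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)

# Count each item once per transaction, then divide by the number of transactions
counts <- table(unlist(lapply(transactions, unique)))
item_support <- setNames(as.numeric(counts) / length(transactions), names(counts))
item_support <- sort(item_support, decreasing = TRUE)

# Keep items whose support reaches the minimum, as eclat()'s supp argument does
frequent_items <- item_support[item_support >= 0.5]
frequent_items  # bread and milk (0.8), butter (0.6); beer (0.2) is dropped
```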
Product recommendation rules

# Support: the fraction of transactions in which the itemset occurs.
# Confidence: the probability that a new transaction containing the LHS items also contains the RHS items.
# Lift: the ratio by which the confidence of a rule exceeds the expected confidence. A lift of 1
# indicates that the items on the left and right are independent.

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5)) # minimum support 0.001, minimum confidence 0.5

rules_conf <- sort(rules, by = "confidence", decreasing = TRUE) # 'high-confidence' rules

inspect(head(rules_conf)) # show support, confidence and lift of the top 6 rules

The rules with a confidence of 1 (see rules_conf above) imply that whenever the LHS items were
purchased, the RHS item was also purchased, 100% of the time.

rules_lift <- sort(rules, by = "lift", decreasing = TRUE) # 'high-lift' rules

inspect(head(rules_lift)) # show support, confidence and lift of the top 6 rules

A rule with a lift of 18 (see rules_lift above) implies that the items in the LHS and RHS are purchased
together 18 times as often as would be expected if they were unrelated (independent).
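The brute-force base R sketch below shows how ranking by confidence and by lift can differ. It uses hypothetical toy transactions and illustrative thresholds of supp = 0.4 and conf = 0.6. It enumerates every rule with exactly one item on each side, applies the minimum support and confidence thresholds, and sorts the result. Exhaustive enumeration like this only works for tiny data and is not the Apriori algorithm.

```r
# Hypothetical toy transactions (not the Groceries data)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer", "bread"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)
N <- length(transactions)
count <- function(items) sum(vapply(transactions, function(t) all(items %in% t), logical(1)))

# All ordered pairs of distinct items: {lhs} => {rhs}
items <- sort(unique(unlist(transactions)))
pair_rules <- expand.grid(lhs = items, rhs = items, stringsAsFactors = FALSE)
pair_rules <- pair_rules[pair_rules$lhs != pair_rules$rhs, ]

n_ab <- mapply(function(a, b) count(c(a, b)), pair_rules$lhs, pair_rules$rhs)
pair_rules$support    <- n_ab / N
pair_rules$confidence <- n_ab / vapply(pair_rules$lhs, count, numeric(1))
pair_rules$lift       <- pair_rules$confidence / (vapply(pair_rules$rhs, count, numeric(1)) / N)

# Apply the thresholds, then sort by confidence (ties broken by lift)
pair_rules <- pair_rules[pair_rules$support >= 0.4 & pair_rules$confidence >= 0.6, ]
pair_rules <- pair_rules[order(-pair_rules$confidence, -pair_rules$lift), ]
pair_rules
```

{beer} => {bread} has a confidence of 1, but the support threshold drops it because it rests on a single transaction. {butter} => {milk} and {milk} => {butter} have different confidences (1 and 0.75) but the same lift of 1.25, because lift is symmetric in A and B while confidence is not.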
Control The Number Of Rules in Output 

Adjust the maxlen, supp and conf arguments of the apriori() function to control the number of rules
generated. You will have to tune these values based on the sparseness of your data.

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5, maxlen = 3)) # maxlen = 3 limits each rule
# to at most 3 items (LHS and RHS combined)

1. To get 'stronger' rules, increase the value of the conf parameter.

2. To get 'longer' rules, increase maxlen.
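You can check the effect of these parameters by counting the rules for a few settings. The sketch below assumes the arules package is installed and is skipped otherwise. The conf values are illustrative. Raising conf while keeping supp and maxlen fixed can only shrink the rule set, because every rule that passes a higher threshold also passes a lower one.

```r
# Assumes the arules package is installed; the block is skipped otherwise
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  data("Groceries")

  conf_values <- c(0.5, 0.7, 0.9)  # illustrative thresholds
  # Number of rules mined for each minimum confidence (supp and maxlen held fixed)
  n_rules <- vapply(conf_values, function(cf) {
    mined <- suppressWarnings(apriori(Groceries,  # maxlen = 3 triggers a "maxlen reached" warning
      parameter = list(supp = 0.001, conf = cf, maxlen = 3),
      control = list(verbose = FALSE)))
    as.numeric(length(mined))
  }, numeric(1))
  names(n_rules) <- paste0("conf=", conf_values)
  print(n_rules)
}
```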

Remove Redundant Rules

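Sometimes it is desirable to remove redundant rules. In arules, a rule is redundant if a more general rule
(one with the same RHS but fewer items in the LHS) has the same or a higher confidence. To drop them, filter
the rules with is.redundant().

sum(is.redundant(rules)) # number of redundant rules

(redundant <- which(is.redundant(rules))) # indices of the redundant rules

# remove them (logical indexing also works when no rule is redundant)
rulesNR <- rules[!is.redundant(rules)]

sum(is.redundant(rulesNR)) # should now be 0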
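The redundancy definition can be illustrated without arules. The base R sketch below is a simplified, hypothetical re-implementation, not the code arules uses. It flags a rule when another rule in the set has the same RHS, an LHS that is a strictly smaller subset of its LHS, and the same or a higher confidence. The rules and toy transactions are made up for illustration.

```r
# Hypothetical toy transactions (not the Groceries data)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer", "bread"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)
count <- function(items) sum(vapply(transactions, function(t) all(items %in% t), logical(1)))

# Hypothetical rules; each LHS is stored as a comma-separated string
rules <- data.frame(
  lhs = c("butter", "bread,butter", "bread", "beer,bread"),
  rhs = "milk",
  stringsAsFactors = FALSE
)
lhs_sets <- strsplit(rules$lhs, ",", fixed = TRUE)
rules$confidence <- mapply(function(l, r) count(c(l, r)) / count(l), lhs_sets, rules$rhs)

# Flag a rule when a more general rule (same RHS, LHS a proper subset) is at least as confident
redundant <- vapply(seq_len(nrow(rules)), function(i) {
  more_general <- vapply(seq_len(nrow(rules)), function(j) {
    rules$rhs[j] == rules$rhs[i] &&
      length(lhs_sets[[j]]) < length(lhs_sets[[i]]) &&
      all(lhs_sets[[j]] %in% lhs_sets[[i]])
  }, logical(1))
  any(rules$confidence[more_general] >= rules$confidence[i])
}, logical(1))

redundant            # FALSE TRUE FALSE TRUE
rules[!redundant, ]  # keeps {butter} => {milk} and {bread} => {milk}
```

{bread, butter} => {milk} adds nothing because {butter} => {milk} already has confidence 1. {beer, bread} => {milk} (confidence 0) is weaker than {bread} => {milk} (confidence 0.75).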
 

Find Rules Related To Given Item/s 


This can be achieved by setting the appearance parameter of the apriori() function. For example:

To find what factors influenced purchase of product X


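To find out which items tend to lead to the purchase of 'whole milk', put 'whole milk' on the RHS. This helps you
understand the patterns that led to the purchase of 'whole milk'. Note that the rules describe items bought in the same transaction, not the order in which they were bought.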

rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.08),
                 appearance = list(default = "lhs", rhs = "whole milk"),
                 control = list(verbose = FALSE)) # get rules that lead to buying 'whole milk'

rules_conf <- sort(rules, by = "confidence", decreasing = TRUE) # 'high-confidence' rules

inspect(head(rules_conf))

Find out what products were purchased after/along with product X


This is the case of finding out "customers who bought 'whole milk' also bought ...". Here, 'whole milk'
is on the LHS (left-hand side).

rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                 appearance = list(default = "rhs", lhs = "whole milk"),
                 control = list(verbose = FALSE)) # those who bought 'whole milk' also bought...
# minlen = 2 excludes rules with an empty LHS

rules_conf <- sort(rules, by = "confidence", decreasing = TRUE) # 'high-confidence' rules

inspect(head(rules_conf))

One limitation is that apriori() in arules always generates rules with a single item on the RHS, irrespective
of the support, confidence or minlen parameters.
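Another approach is to mine the rules once and then filter them with subset() and the %in% operator on lhs or rhs, instead of re-running apriori() with a different appearance argument for every item of interest. The sketch below assumes arules is installed and is skipped otherwise. The thresholds are illustrative. lhs %in% "whole milk" keeps rules whose LHS contains 'whole milk', possibly together with other items. That is broader than appearance = list(lhs = "whole milk").

```r
# Assumes the arules package is installed; the block is skipped otherwise
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  data("Groceries")

  # Mine once with general thresholds
  all_rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5),
                       control = list(verbose = FALSE))

  # Rules that lead to 'whole milk' (whole milk on the RHS)
  to_milk <- subset(all_rules, subset = rhs %in% "whole milk")

  # Rules whose LHS contains 'whole milk' (possibly with other items)
  from_milk <- subset(all_rules, subset = lhs %in% "whole milk")

  inspect(head(sort(to_milk, by = "confidence"), 3))
}
```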

Convert Data into Transactions Format

If you have to read transaction data from a file, use read.transactions().

tdata <- read.transactions("transactions_data.txt", sep = "\t")

If your transactions are already stored in a data frame, you can convert it to class transactions
as follows. The columns should be factors or logicals, and each column=value pair becomes an item.

tData <- as(myDataFrame, "transactions") # convert to 'transactions' class
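Transaction data often arrives in long format, with one row per (transaction ID, item) pair. Applying as() directly to such a data frame would treat each column as a variable rather than each row as a purchase. A common approach is to first group the items by transaction with base R's split() and then coerce the resulting list of baskets. The sketch below assumes hypothetical column names order_id and item and toy rows, and it calls arules only if the package is installed.

```r
# Hypothetical long-format purchases: one row per (order_id, item) pair
purchases <- data.frame(
  order_id = c(1, 1, 2, 2, 2, 3),
  item     = c("bread", "milk", "bread", "butter", "milk", "beer"),
  stringsAsFactors = FALSE
)

# Group items by transaction and drop duplicate items within a basket
baskets <- lapply(split(purchases$item, purchases$order_id), unique)

# Coerce the list of baskets to class 'transactions' (requires arules)
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  tData <- as(baskets, "transactions")
  inspect(tData)
}
```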
