
Market Basket Analysis and Association Rules
Instructor: Junghye Lee

Ref.: M.J.A. Berry and G. Linoff, Data Mining Techniques, Wiley, 1997.

1
Contents

 Introduction
 Association Rules
 Basic Process
- Choosing the right set of items
- Generating rules and their measures
- Overcoming the practical limits
 Strengths and Weaknesses
 Application Areas

2
Introduction: What is Market Basket Analysis?

 Finding useful information in the ‘Market Basket’
 Useful information
– Who customers are
– Which products tend to be purchased together
– Why some products tend to be purchased together
 Useful information like “If Item A, then Item B” is called an ‘association rule’.

Example: a shopping cart containing window cleaner, detergent, orange juice, milk, and bananas

3
Introduction: Point of Sale Transactions

 Two basic elements: transactions and items

Ex) Grocery point-of-sale transactions

customer   items
1          orange juice, banana
2          orange juice, milk
3          detergent, window cleaner

(each row of the table is one transaction)
4
Introduction: Transactions and Co-Occurrence
Customer   Items
1          Orange juice, Soda
2          Milk, Orange juice, Window Cleaner
3          Orange juice, Detergent
4          Orange juice, Detergent, Soda
5          Window Cleaner, Soda

• OJ and soda are more likely to be purchased together.
• Detergent is never purchased with window cleaner or milk.
• Milk is never purchased with soda or detergent.

Co-occurrence matrix (counts):

                 OJ   Window Cleaner   Milk   Soda   Detergent
OJ               4    1                1      2      1
Window Cleaner   1    2                1      1      0
Milk             1    1                1      0      0
Soda             2    1                0      3      1
Detergent        1    0                0      1      2

Confidence of the rule
- “if soda, then orange juice”: 2/3 (67%)
- “if orange juice, then soda”: 2/4 (50%)
5
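The co-occurrence counts and the two confidence values above can be reproduced with a few lines of Python. The snippet below is a minimal sketch (the `transactions` list and the `confidence` helper are illustrative, not part of the slides):

```python
from itertools import combinations
from collections import Counter

# The five transactions from the table above.
transactions = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window Cleaner", "Soda"},
]

# Off-diagonal co-occurrence counts: how often each pair appears together.
pairs = Counter(p for t in transactions for p in combinations(sorted(t), 2))
print(pairs[("OJ", "Soda")])  # 2, matching the matrix

def confidence(condition, result):
    """P(result | condition), estimated from the transactions."""
    cond = [t for t in transactions if condition in t]
    return sum(1 for t in cond if result in t) / len(cond)

print(confidence("Soda", "OJ"))  # 2/3 ~ 0.67
print(confidence("OJ", "Soda"))  # 2/4 = 0.50
```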
Association Rules

 The clear and useful result of market basket analysis
 Rules with only one item in the “result” are strongly recommended.
– “If diapers and Thursday, then beer” is more useful than “If Thursday, then diapers and beer”.
 Three types of rules
– the useful
– the trivial
– the inexplicable

6
Association Rules - The Useful Rule

 Contains high-quality, actionable information
 Once the pattern (rule) is found, it is often not hard to justify.
– Example: rules like “On Thursday, customers who purchase diapers are likely to purchase beer”
 Young couples prepare for the weekend by stocking up on diapers for the infants and beer for dad.
 Can be applied to store layout
– Ex) placing other baby products within sight of the beer

7
Association Rules - The Trivial Rule

 Trivial results are already known by anyone familiar with the business.
Ex) - “Customers who purchase maintenance agreements are very likely to purchase large appliances” (they purchase both at the same time)
- “Customers purchasing paint buy paint brushes”
 Results from market basket analysis may simply be measuring the success of previous marketing campaigns.

8
Association Rules - The inexplicable rules

 Inexplicable rules give a new fact but no explanation about consumer behavior or future actions.
– Example) “When a new hardware store opens, one of the most commonly sold items is toilet rings”
 Inexplicable rules can be flukes in the data.
 More investigation might give an explanation.

9
The Basic Process in Market Basket Analysis

 Choosing the right set of items and the right level
- using taxonomies and virtual items

 Generating rules via the co-occurrence matrix
(measures: support, confidence, improvement)

 Overcoming the practical limits
(pruning)
10
Basic Process:
Choosing the right set of items - Taxonomy
 Why?
– To reduce too many item combinations

 How to find the appropriate level?
– Consider frequency (roll rare items up to higher levels to help generalize items)
– Consider importance (roll expensive items down to lower levels)

Example taxonomy: Frozen Foods → Desserts → Ice Cream → Vanilla

 When the items occur about the same number of times in the data, the analysis produces the best results.
11
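As an illustration of rolling rare items up a taxonomy, here is a hypothetical Python sketch; the `taxonomy` dict and the `roll_up` helper are invented for this example and are not from the slides:

```python
from collections import Counter

# Hypothetical taxonomy: each item maps to its parent category.
taxonomy = {"Vanilla": "Ice Cream", "Ice Cream": "Desserts",
            "Desserts": "Frozen Foods"}

def roll_up(item, counts, min_count):
    """Replace a rare item by its parent until it occurs often enough."""
    while counts.get(item, 0) < min_count and item in taxonomy:
        item = taxonomy[item]
    return item

counts = Counter({"Vanilla": 3, "Ice Cream": 40, "Desserts": 200})
print(roll_up("Vanilla", counts, 25))  # 'Ice Cream'
```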
Basic Process:
Choosing the right set of items- Virtual Items

 Items that cross product boundaries
– They do not appear in the product taxonomy
(e.g., “Calvin Klein”, “cash”, “month”)

 Cautions
– Prime cause of redundant rules
(e.g., “If Coke products, then Coke”)
– A virtual item and a generalized item appearing together can be a proxy for individual items
(e.g., “If coke product and diet soda, then pretzels” really means “If diet coke, then pretzels”)
12
Basic Process: Generating rules

1. Gather transactions for the selected items (including virtual items)
2. Build the co-occurrence matrix
3. Find the most frequent combinations in the matrix
4. Split each combination into a “condition” and a “result”:
   If “condition”, then “result”.
– evaluated by the measures: support, confidence, improvement
13
Performance Measures - Support

Rule: If “condition” then “result”.

 Support
- How many transactions contain both “condition” and “result”?

S = P(“condition” ∩ “result”)
  = (# of transactions that include “condition” and “result”) / (# of total transactions)

- Support can be used to eliminate uninteresting rules.

14
Performance Measures - Confidence

 Confidence
- Among the transactions that include “condition”, how many also contain “result”?

C = P(“result” | “condition”) = P(“condition” ∩ “result”) / P(“condition”)
  = (# of transactions that include condition and result) / (# of transactions that include condition)

- A conditional probability
- A degree of association; may not imply causality
- Not symmetric

15
Performance Measures - Improvement

 Improvement (lift)
- Lift (improvement) tells us how much better a rule is at predicting the result than just guessing the result at random.

I = P(result | condition) / P(result) = P(condition ∩ result) / (P(condition) · P(result))

Improvement   Meaning                         Example
= 1           the two items are independent   pepper and cookie
> 1           complementary                   bread and butter
< 1           substitutional                  butter and margarine

16
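The three measures fit together in a few lines of code. The following is a sketch with assumed names (`rule_measures`, the abbreviated `carts` data), run on the grocery data of slide 5:

```python
def rule_measures(transactions, condition, result):
    """Estimate support, confidence, and improvement (lift) for the rule
    'if condition then result'; condition and result are sets of items."""
    n = len(transactions)
    n_cond = sum(1 for t in transactions if condition <= t)
    n_res = sum(1 for t in transactions if result <= t)
    n_both = sum(1 for t in transactions if condition | result <= t)
    support = n_both / n                    # P(condition and result)
    confidence = n_both / n_cond            # P(result | condition)
    improvement = confidence / (n_res / n)  # = P(both) / (P(cond) * P(res))
    return support, confidence, improvement

carts = [{"OJ", "Soda"}, {"Milk", "OJ", "WC"}, {"OJ", "Det"},
         {"OJ", "Det", "Soda"}, {"WC", "Soda"}]
print(rule_measures(carts, {"Soda"}, {"OJ"}))  # (0.4, 0.667, 0.833)
```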
Basic Process: Generating rules -
Example

 Grocery shopping carts

Customer   Items
1          Orange juice, Soda
2          Milk, Orange juice, Window Cleaner
3          Orange juice, Detergent
4          Orange juice, Detergent, Soda
5          Window Cleaner, Soda

17
Basic Process: Generating rules -
Example
 Co-Occurrence Matrix (counts, with proportions of the 5 transactions in parentheses)

                 OJ        Window Cleaner   Milk      Soda      Detergent
OJ               4 (0.8)   1 (0.2)          1 (0.2)   2 (0.4)   1 (0.2)
Window Cleaner   1 (0.2)   2 (0.4)          1 (0.2)   1 (0.2)   0
Milk             1 (0.2)   1 (0.2)          1 (0.2)   0         0
Soda             2 (0.4)   1 (0.2)          0         3 (0.6)   1 (0.2)
Detergent        1 (0.2)   0                0         1 (0.2)   2 (0.4)

(e.g., OJ appears in 4 of the 5 transactions, hence 0.8)
18
Basic Process: Generating rules -
Example
 Assume the most common combination is ‘A, B, C’

Combination     Probability
A               45%
B               42.5%
C               40%
A and B         25%
A and C         20%
B and C         15%
A and B and C   5%
19
Basic Process: Generating rules -
Example
 Which of A, B, C should be the result?
 Set the result on the basis of ‘confidence’.
 Confidence of the rule “If condition then result”
– P(Result | Condition) = P(Result and Condition) / P(Condition)
 e.g., Confidence of “If AB then C” = P(ABC) / P(AB)

Association Rule   P(condition)   P(condition and result)   Confidence
If AB then C       25%            5%                        0.20
If AC then B       20%            5%                        0.25
If BC then A       15%            5%                        0.33
20
Basic Process: Generating rules -
Example
 What if P(R) > P(R|C) (= confidence)?
 ‘Improvement’ tells how much better a rule is than just randomly guessing the result.
 Improvement = P(R|C) / P(R) = P(RC) / (P(R)·P(C))
– If Improvement > 1, the rule is better.
– If Improvement < 1, “If C, then NOT R” is better (a negative rule).

Rule               Confidence           Improvement
If BC then A       0.33                 0.74 (= 0.33/0.45)
If BC then Not A   0.67                 1.22 (= 0.67/0.55)
If A then B        0.56 (= 0.25/0.45)   1.31 (= 0.56/0.425)
21
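For transparency, a few lines of Python (probabilities copied from the table on slide 19) reproduce these confidence and improvement figures:

```python
# Probabilities from the slides, as fractions of all transactions.
p_a, p_b, p_ab, p_bc, p_abc = 0.45, 0.425, 0.25, 0.15, 0.05

conf_bc_a = p_abc / p_bc              # 0.33
print(conf_bc_a / p_a)                # improvement of 'If BC then A', ~0.74
print((1 - conf_bc_a) / (1 - p_a))    # 'If BC then Not A', ~1.22
conf_a_b = p_ab / p_a                 # ~0.56
print(conf_a_b / p_b)                 # improvement of 'If A then B', ~1.31
```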
Basic Process:
Overcoming the practical limits
 Exponential growth as the problem size increases
– On a menu with 100 items, how many 3-item combinations are there? → 161,700
 How to handle this on big data?
– Use the taxonomy: generalize items so that they meet the criterion.
– Use pruning: throw out items or combinations of items that do not meet the criterion. “Minimum support pruning” is the most common pruning mechanism.
22
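The 161,700 figure is just the number of 3-element subsets of 100 items, C(100, 3); one line of Python confirms it:

```python
import math

print(math.comb(100, 3))  # 161700 three-item combinations from a 100-item menu
```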
Strengths of Market Basket Analysis

 It produces clear and understandable results
– because the results are association rules
 It supports both directed and undirected data mining
 It can handle the transactions themselves
 The computations it uses are simple to understand
– the calculation of confidence and improvement is simple

23
Weaknesses of Market Basket Analysis

 It requires exponentially more computational effort as the problem size grows
– number of items, complexity of the rules
 It has limited support for data attributes
– virtual items make rules more expressive
 It is difficult to determine the right number of items
 It discounts rare items

24
Application of Market Basket Analysis

 It can suggest store layouts.
 It can tell which products are amenable to promotion.
 It is used to compare stores.
 It can be applied to time-series problems.

25
Apriori Algorithm

 Agrawal and Srikant, 1994

(Phase 1) Find all frequent itemsets having the minimum support s_min.

(Phase 2) Consider a subset A of a frequent itemset L.
For a specified confidence c_min,
if supp(L)/supp(A) ≥ c_min,
then generate a rule R: A → (L − A).

So the support of this rule will be supp(R) = supp(L), and
the confidence will be conf(R) = supp(L)/supp(A).

26
Apriori Algorithm – Phase 1
Step 0. Specify the minimum support s_min.
        k = 1; C_1 = [{i_1}, {i_2}, ..., {i_m}]; L_1 = {c ∈ C_1 | supp(c) ≥ s_min}

Step 1. k = k + 1
        Generate new candidate itemsets C_k from L_{k-1} (the apriori-gen function):
        Step 1-1 (join): generate candidate k-itemsets by joining L_{k-1} with itself: C = L_{k-1} * L_{k-1}
        Step 1-2 (prune): delete from C any itemset having a (k-1)-subset that does not belong to L_{k-1};
                          the pruned set is C_k. Stop if C_k = ∅.

Step 2. Generate L_k such that L_k = {c ∈ C_k | supp(c) ≥ s_min}.
        Repeat Step 1.
27
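The following is a compact Python sketch of Phase 1 (function and variable names are assumptions, not from the slides); it follows the join-prune-count loop above. Running it on the five transactions of the next slide with s_min = 0.4 yields the listed L1 through L4:

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, s_min):
    """Return all itemsets whose support is at least s_min."""
    n = len(transactions)
    support = lambda c: sum(1 for t in transactions if c <= t) / n
    items = sorted({i for t in transactions for i in t})
    L = [frozenset([i]) for i in items if support(frozenset([i])) >= s_min]
    frequent, k = list(L), 2
    while L:
        prev = set(L)
        # Step 1-1 (join): unions of two (k-1)-itemsets that give a k-itemset
        C = {a | b for a in L for b in L if len(a | b) == k}
        # Step 1-2 (prune): every (k-1)-subset must itself be frequent
        C = {c for c in C
             if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Step 2 (count): keep candidates meeting the minimum support
        L = [c for c in C if support(c) >= s_min]
        frequent.extend(L)
        k += 1
    return frequent
```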
Apriori Algorithm – Example
transaction   items
1             b, c, g
2             a, b, d, e, f
3             a, b, c, g
4             b, c, e, f
5             b, c, e, f, g

s_min = 0.4

C1=[{a}, {b}, {c}, {d}, {e}, {f}, {g}]
L1=[{a}, {b}, {c}, {e}, {f}, {g}]

C2=[{a,b}, {a,c}, {a,e}, {a,f}, {a,g}, {b,c}, {b,e}, {b,f}, {b,g}, {c,e}, {c,f}, {c,g}, {e,f}, {e,g}, {f,g}]
L2=[{a,b}, {b,c}, {b,e}, {b,f}, {b,g}, {c,e}, {c,f}, {c,g}, {e,f}]

C3=[{b,c,e}, {b,c,f}, {b,c,g}, {b,e,f}, {c,e,f}]
L3=[{b,c,e}, {b,c,f}, {b,c,g}, {b,e,f}, {c,e,f}]

C4=[{b,c,e,f}]=L4
28
Apriori Algorithm – Example

 Rules generated from L = {b,c,g} (supp(L) = 0.6)

Candidate rules having one item in the “result”:

R1: {b,c} → {g}   conf(R1) = 0.6/0.8 = 0.75
R2: {b,g} → {c}   conf(R2) = 0.6/0.6 = 1
R3: {c,g} → {b}   conf(R3) = 0.6/0.6 = 1

29
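A matching Phase 2 sketch (again with assumed names) enumerates the subsets A of a frequent itemset L and keeps the rules A → (L − A) whose confidence reaches c_min; on the example above it recovers R1 through R3, along with any other subset rules that qualify:

```python
from itertools import combinations

def rules_from_itemset(L, transactions, c_min):
    """Generate rules A -> (L - A) with conf = supp(L)/supp(A) >= c_min."""
    n = len(transactions)
    supp = lambda s: sum(1 for t in transactions if s <= t) / n
    rules = []
    for r in range(1, len(L)):
        for A in map(frozenset, combinations(L, r)):
            conf = supp(L) / supp(A)
            if conf >= c_min:
                rules.append((set(A), set(L - A), conf))
    return rules

trans = [frozenset(s) for s in ("bcg", "abdef", "abcg", "bcef", "bcefg")]
for rule in rules_from_itemset(frozenset("bcg"), trans, 0.7):
    print(rule)  # includes ({'b','c'}, {'g'}, 0.75), ({'b','g'}, {'c'}, 1.0), ...
```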
Sequential Patterns
 Sequence: a list of itemsets in the order of time, etc.
– e.g., s1 = <A_1, A_2, ..., A_n>, s2 = <B_1, B_2, ..., B_m>
 Length of a sequence: the number of items in the sequence (a k-sequence contains k items)
 Subsequence: s1 is a subsequence of s2 if there exist indices i_1 < i_2 < ... < i_n such that
A_1 ⊆ B_{i_1}, A_2 ⊆ B_{i_2}, ..., A_n ⊆ B_{i_n}
– e.g., s1=<{a}, {b,c}, {e}>, s2=<{f}, {a,g}, {h}, {b,c,d}, {e,j}>, s3=<{a}, {b}, {c}, {e}>
 s1 is a subsequence of s2, but s3 is not a subsequence of s2
30
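Since the subsequence test (itemset containment in strictly increasing order) is easy to get wrong, here is a small Python sketch of the definition, checked on the slide's example (the function name is an assumption):

```python
def is_subsequence(s1, s2):
    """True if each itemset of s1 is contained in a strictly later
    itemset of s2, preserving order (A_j subset of B_{i_j}, i_1 < i_2 < ...)."""
    i = 0
    for A in s1:
        while i < len(s2) and not A <= s2[i]:
            i += 1
        if i == len(s2):
            return False
        i += 1  # the next itemset of s1 must match later in s2
    return True

s2 = [{"f"}, {"a", "g"}, {"h"}, {"b", "c", "d"}, {"e", "j"}]
print(is_subsequence([{"a"}, {"b", "c"}, {"e"}], s2))    # True  (s1)
print(is_subsequence([{"a"}, {"b"}, {"c"}, {"e"}], s2))  # False (s3)
```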
Sequential Patterns
 A sequence s is called maximal if it is not a subsequence of any other large sequence.
 Support of a sequence s:
supp(s) = proportion of customers whose sequence contains s

<example>

#   Customer sequence
1   <{a}, {b}>
2   <{c,d}, {a}, {e,f,g}>
3   <{a,h,g}>
4   <{a}, {e,g}, {b}>
5   <{b}>

Maximal sequences having min. support 0.4: s1 = <{a}, {b}>, s2 = <{a}, {e,g}>
(s3 = <{a}, {e}> is large but not maximal.)
31
Algorithm for Finding Sequences
 Agrawal and Srikant (1995)
1. Sort Phase: convert the transaction database into customer sequences.
2. Litemset Phase: find the set of all large itemsets L by considering the minimum support.
3. Transformation Phase: transform each customer sequence by replacing each transaction with the set of large itemsets it contains.
4. Sequence Phase: generate large sequences.
5. Maximal Phase: find the maximal sequences among the large sequences.
32
Algorithm for Finding Sequences
<example> (cont’d)
Min. support: 0.4

2) Litemset Phase

itemset   support   mapped to #
{a}       4         1
{b}       3         2
{e}       2         3
{e, g}    2         4
{g}       3         5

3) Transformation Phase

cust #   Customer sequence        Transformed sequence
1        <{a}, {b}>               <{1}, {2}>
2        <{c,d}, {a}, {e,f,g}>    <{1}, {3, 4, 5}>
3        <{a,h,g}>                <{1, 5}>
4        <{a}, {e,g}, {b}>        <{1}, {3, 4, 5}, {2}>
5        <{b}>                    <{2}>
33
Algorithm for Finding Sequences
4) Sequence Phase
• AprioriAll: does not guarantee maximal sequences, so it requires the Maximal Phase.
• AprioriSome, DynamicSome: guarantee maximal sequences.

AprioriAll
Step 0. Set all large 1-sequences to L_1; k = 1.
Step 1. k = k + 1; C_k = L_{k-1} * L_{k-1}
Step 2. Obtain L_k from C_k.
        Stop if L_k = ∅; repeat Step 1 otherwise.

Example (cont’d)
L1=[<1>, <2>, <3>, <4>, <5>]
L2=[<1, 2>, <1, 3>, <1, 4>, <1, 5>]
L3=[] → stop
Max sequences: <1, 2> and <1, 4>
34
Algorithm for Finding Sequences
Example

cust   Transformed sequence
1      <{1,5}, {2}, {3}, {4}>
2      <{1}, {3}, {4}, {3,5}>
3      <{1}, {2}, {3}, {4}>
4      <{1}, {3}, {5}>
5      <{4}, {5}>

Min. support: 0.4
L1=[<1>, <2>, <3>, <4>, <5>]
L2=[<1 2>, <1 3>, <1 4>, <1 5>, <2 3>, <2 4>, <3 4>, <3 5>, <4 5>]
C3=[<1 2 3>, <1 2 4>, <1 3 4>, <1 3 5>, <1 4 5>, <2 3 4>, <2 3 5>, <2 4 5>, <3 4 5>]
L3=[<1 2 3>, <1 2 4>, <1 3 4>, <1 3 5>, <2 3 4>]
C4=L4=[<1 2 3 4>]
Max sequences: <1 2 3 4>, <1 3 5>, <4 5>

35
Algorithm for Finding Sequences
AprioriSome

(Forward phase)
Step 0. k = 1; obtain L_1; C_1 = L_1; last = 1.
Step 1. (Generate C_k) k ← k + 1
        1) If L_{k-1} is known: C_k = L_{k-1} * L_{k-1}
        2) If L_{k-1} is unknown: C_k = C_{k-1} * C_{k-1}
Step 2. (Select the k for which L_k is counted)
        Stop if C_k = ∅; proceed otherwise.
        1) If k = next(last): obtain L_k, set last = k, go to Step 1.
        2) If k ≠ next(last): go to Step 1.

(Backward phase)
Step 0. k = k_max
Step 1.
        1) If L_k is known: delete from L_k all subsequences of sequences in L_i (i > k).
        2) If L_k is unknown: delete from C_k all subsequences of sequences in L_i (i > k), then obtain L_k from the remaining candidates.
Step 2. k ← k − 1; go to Step 1.
36
Algorithm for Finding Sequences
 The function ‘next’ determines the length of the sequences to count next.
– Agrawal & Srikant (1995)

hit(k) = |L_k| / |C_k|

1) If hit(k) < 0.666:          next(k) = k + 1
2) If 0.666 ≤ hit(k) < 0.75:   next(k) = k + 2
3) If 0.75 ≤ hit(k) < 0.80:    next(k) = k + 3
4) If 0.80 ≤ hit(k) < 0.85:    next(k) = k + 4
5) If hit(k) ≥ 0.85:           next(k) = k + 5

 Once all the large sequences are obtained, their union gives the maximal sequences.
37
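The step-size heuristic above translates directly into a few lines of Python (a sketch; `next_k` is an assumed name):

```python
def next_k(k, n_large, n_cand):
    """AprioriSome heuristic: the more candidates turn out to be large
    (hit = |L_k| / |C_k|), the further ahead we jump before counting again."""
    hit = n_large / n_cand
    if hit < 0.666:
        return k + 1
    if hit < 0.75:
        return k + 2
    if hit < 0.80:
        return k + 3
    if hit < 0.85:
        return k + 4
    return k + 5
```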
Algorithm for Finding Sequences
Example (cont’d) - AprioriSome, with next(i) = 2i

(Forward phase)
Iteration 0.
L1=C1=[<1>, <2>, <3>, <4>, <5>], last=1
Iteration 1. (k=2)
C2=[<1 2>, <1 3>, <1 4>, <1 5>, <2 3>, <2 4>, <2 5>, <3 4>, <3 5>, <4 5>]
next(1)=2=k, so count supports:
L2=[<1 2>, <1 3>, <1 4>, <1 5>, <2 3>, <2 4>, <3 4>, <3 5>, <4 5>]
last=2
Iteration 2. (k=3)
C3=[<1 2 3>, <1 2 4>, <1 3 4>, <1 3 5>, <1 4 5>, <2 3 4>, <2 3 5>, <2 4 5>, <3 4 5>]
next(2)=4 ≠ 3, so L3 is not counted.
Iteration 3. (k=4)
C4=[<1 2 3 4>, <1 2 3 5>, <1 2 4 5>, <1 3 4 5>, <2 3 4 5>]
next(2)=4=k, so count supports:
L4=[<1 2 3 4>]
38
Algorithm for Finding Sequences
Example (cont’d)

(Backward phase)
Iteration 0.
k_max=4
Iteration 1.
L4=[<1 2 3 4>]; k=3
Iteration 2.
L3 was not counted: delete from C3 all subsequences of <1 2 3 4>, count the rest: L3=[<1 3 5>]; k=2
Iteration 3.
L2: delete subsequences of the longer maximal sequences: L2=[<4 5>]

Max sequences: <1 2 3 4>, <1 3 5>, <4 5>

39
