06apriori Edited v3
— Chapter 6 —
May 10, 2021 Data Mining: Concepts and Techniques 2
Chapter 5: Mining Frequent Patterns, Association
and Correlations: Basic Concepts and Methods
Basic Concepts
Evaluation Methods
Summary
What Is Frequent Pattern Analysis?
Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set
First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context
of frequent itemsets and association rule mining
Motivation: Finding inherent regularities in data
What products were often purchased together? Beer and diapers?!
What are the subsequent purchases after buying a PC?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
Applications
Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Why Is Freq. Pattern Mining Important?
Basic Concepts: Frequent Itemset
Basic Concepts: Association Rules
Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk
Find all the rules X → Y with minimum support and confidence
support, s: probability that a transaction contains X ∪ Y
confidence, c: conditional probability that a transaction having X also contains Y
[Figure: Venn diagram of "Customer buys beer" and "Customer buys diaper"; the overlap is "Customer buys both"]
Let minsup = 50%, minconf = 50%
Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
Association rules: (many more!)
Beer → Diaper (60%, 100%)
Diaper → Beer (60%, 75%)
Compute Support and Confidence
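As a minimal sketch of the computation, the five transactions from the table above can be checked directly. The helper names `support_count`, `support`, and `confidence` are illustrative assumptions, not part of the original slides.

```python
# Transactions from the Tid / Items-bought table (Tid -> set of items).
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}

def support_count(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values())

def support(itemset, db):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, db) / len(db)

def confidence(lhs, rhs, db):
    """Estimate of P(rhs | lhs): count(lhs and rhs together) / count(lhs)."""
    return support_count(set(lhs) | set(rhs), db) / support_count(lhs, db)

print(support({"Beer", "Diaper"}, transactions))       # 0.6
print(confidence({"Beer"}, {"Diaper"}, transactions))  # 1.0
print(confidence({"Diaper"}, {"Beer"}, transactions))  # 0.75
```

Confidence is computed from integer counts rather than from two rounded support fractions, which avoids floating-point noise in the ratio.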
Scalable Frequent Itemset Mining Methods
Approach
The Downward Closure Property and Scalable
Mining Methods
The downward closure property of frequent patterns
Any subset of a frequent itemset must be frequent
If {beer, diaper, nuts} is frequent, so is {beer, diaper}
i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
Frequent pattern growth (FP-growth: Han, Pei & Yin @SIGMOD’00)
Vertical data format approach (Charm: Zaki & Hsiao @SDM’02)
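The downward closure property can be checked on a small database. The sketch below reuses the five beer/diaper transactions from the association-rule slide. The helper names and the choice of `min_count = 3` (50% of 5 transactions, rounded up) are assumptions made for illustration.

```python
from itertools import combinations

# The five transactions from the Beer/Diaper table earlier in the chapter.
db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(set(itemset) <= t for t in db)

def proper_subsets(itemset):
    """All non-empty proper subsets of `itemset`."""
    items = sorted(itemset)
    return [set(c) for r in range(1, len(items)) for c in combinations(items, r)]

# Every subset occurs at least as often as its superset.
big = {"Beer", "Diaper", "Nuts"}
closure_holds = all(count(s) >= count(big) for s in proper_subsets(big))

# Contrapositive, used for pruning: {Coffee} is infrequent,
# so no 2-itemset containing Coffee can be frequent.
min_count = 3
all_items = set().union(*db)
coffee_supersets = [{"Coffee", other} for other in all_items - {"Coffee"}]
can_prune = count({"Coffee"}) < min_count and all(
    count(s) < min_count for s in coffee_supersets
)

print(closure_holds, can_prune)
```

The second check is the form that scalable miners rely on: once an itemset fails the support threshold, none of its supersets needs to be counted.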
Apriori: A Candidate Generation & Test Approach
The Apriori Algorithm—An Example
Supmin = 2
Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
1st scan → C1
Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3
L1
Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3
C2 (candidates generated from L1)
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}
2nd scan → C2 with support counts
Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2
L2
Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
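The pseudocode above can be sketched in Python as follows. This is one possible reading, not the authors' implementation: the function name `apriori`, the use of an absolute `min_count` threshold, and the pairwise-union join followed by a subset-based prune are assumptions chosen for brevity.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)

    k = 1
    while current:  # loop while Lk is non-empty
        # Join: union pairs of frequent k-itemsets that give a (k+1)-itemset.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        # Scan the database once to count the surviving candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Database TDB from the Apriori example, with Supmin = 2.
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On TDB this reproduces the L1 and L2 tables from the example and, on the third pass, also finds {B, C, E} with support count 2. The candidates {A, B, C} and {A, C, E} are pruned before the third scan because {A, B} and {A, E} are not in L2.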
Interestingness Measure: Correlations (Lift)
play basketball → eat cereal [40%, 66.7%] is misleading
The overall % of students eating cereal is 75% > 66.7%.
play basketball → not eat cereal [20%, 33.3%] is more accurate,
although with lower support and confidence
Measure of dependent/correlated events: lift
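Lift is commonly defined as lift(A, B) = P(A ∪ B) / (P(A) P(B)), where P(A ∪ B) is the probability that a transaction contains both A and B. A value below 1 suggests negative correlation, above 1 positive correlation, and exactly 1 independence. The sketch below applies that definition to the percentages quoted above. P(play basketball) = 0.60 is not stated on the slide; it is derived here from support / confidence = 0.40 / 0.667, treating 66.7% as 2/3.

```python
def lift(p_ab, p_a, p_b):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

# Values taken from the two rules above.
p_basketball = 0.60             # assumption: derived as 0.40 / (2/3)
p_cereal = 0.75                 # overall share of students eating cereal
p_not_cereal = 1 - p_cereal     # 0.25
p_basketball_and_cereal = 0.40      # support of the first rule
p_basketball_and_not_cereal = 0.20  # support of the second rule

lift_cereal = lift(p_basketball_and_cereal, p_basketball, p_cereal)
lift_not_cereal = lift(p_basketball_and_not_cereal, p_basketball, p_not_cereal)

print(round(lift_cereal, 2))      # 0.89
print(round(lift_not_cereal, 2))  # 1.33
```

The first rule has high confidence but a lift below 1, so playing basketball and eating cereal are negatively correlated. The second rule has lower support and confidence but a lift above 1.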
Summary
Ref: Basic Concepts of Frequent Pattern Mining
Ref: Apriori and Its Improvements
Ref: Mining Correlations and Interesting Rules