SE 458 - Data Mining (DM) : Spring 2019 Section W1
Basic Concepts
Evaluation Methods
Summary
Scalable Frequent Itemset Mining Methods
Approach
Approach
Vertical Data Format
The Downward Closure Property and Scalable Mining Methods
The downward closure property (also called the Apriori property) of frequent patterns
Any subset of a frequent itemset must also be frequent
If {cola, diaper, nuts} is frequent, so is {cola, diaper}
i.e., every transaction containing {cola, diaper, nuts} also contains {cola, diaper}, so the support of {cola, diaper} is at least that of {cola, diaper, nuts}
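A minimal Python sketch of the downward closure property, assuming support is counted as the number of transactions that contain every item of the itemset. The function name `support` and the four toy transactions are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


# Hypothetical toy transactions, for illustration only.
transactions = [
    {"cola", "diaper", "nuts"},
    {"cola", "diaper"},
    {"cola", "diaper", "nuts", "eggs"},
    {"diaper", "nuts"},
]

superset = {"cola", "diaper", "nuts"}
print(support(superset, transactions))              # 2
print(support({"cola", "diaper"}, transactions))    # 3

# Every proper subset of the superset has support at least as high.
for r in range(1, len(superset)):
    for sub in combinations(sorted(superset), r):
        assert support(sub, transactions) >= support(superset, transactions)
```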
Apriori: A Candidate Generation & Test Approach
C_k: candidate itemsets of size k; L_k: frequent itemsets of size k
L_1 = {frequent items};
for (k = 1; L_k != ∅; k++) do begin
    C_{k+1} = candidates generated from L_k;
    for each transaction t in database do
        increment the count of all candidates in C_{k+1} that are contained in t
    L_{k+1} = candidates in C_{k+1} with min_support
end
return ∪_k L_k;
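The pseudo-code above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes `min_support` is an absolute transaction count, and for brevity it forms C_{k+1} by taking the union of any two frequent k-itemsets whose union has k+1 items and then pruning, a simpler but less efficient variant of the self-join step. The toy database at the end is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori; min_support is an absolute transaction count."""
    db = [frozenset(t) for t in transactions]
    # L_1: count single items in one scan and keep the frequent ones.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(level)
    k = 1
    while level:  # for (k = 1; L_k != empty; k++)
        # C_{k+1}: union two frequent k-itemsets that differ in one item,
        # then prune unless every k-subset is frequent (downward closure).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        # One scan of the database: count candidates contained in each t.
        counts = dict.fromkeys(candidates, 0)
        for t in db:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        # L_{k+1}: candidates that meet min_support.
        level = {s: c for s, c in counts.items() if c >= min_support}
        result.update(level)
        k += 1
    return result  # the union of all L_k, mapped to support counts


toy = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent = apriori(toy, 2)
print(frequent[frozenset({"B", "C", "E"})])  # 2
```

On this toy database the loop stops after L_3 = {BCE}, because no 4-itemset candidate survives, so C_4 is empty.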
Implementation of Apriori
How to generate candidates?
Step 1: self-joining L_k
Step 2: pruning (discard any candidate that has a k-subset not in L_k)
Example of candidate generation
L_3 = {abc, abd, acd, ace, bcd}
Self-joining: L_3 * L_3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because its subset ade is not in L_3
C_4 = {abcd}
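A sketch of the two candidate-generation steps that reproduces the example above. It assumes items are single characters and each itemset is stored as a sorted string (so "abc" stands for {a, b, c}); `apriori_gen` is a hypothetical name. Two k-itemsets are joined when they agree on their first k-1 items, which yields abcd from abc and abd and acde from acd and ace; pruning then drops acde.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Return C_{k+1} from L_k; itemsets are sorted strings of one-letter items."""
    L = sorted(frequent_k)
    known = set(L)
    k = len(L[0]) if L else 0
    candidates = set()
    for i, p in enumerate(L):
        for q in L[i + 1:]:
            # Step 1 (self-join): p and q agree on their first k-1 items.
            if p[:k - 1] == q[:k - 1]:
                cand = p + q[-1]  # stays sorted because p < q
                # Step 2 (prune): every k-subset of cand must be in L_k.
                if all("".join(s) in known for s in combinations(cand, k)):
                    candidates.add(cand)
    return candidates


L3 = {"abc", "abd", "acd", "ace", "bcd"}
print(apriori_gen(L3))  # {'abcd'}
```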
Exercise
Format
Further Improvement of the Apriori Method
candidates
Improving Apriori: general ideas
etc.
Scan the database again to find missed frequent patterns
H. Toivonen. Sampling large databases for
DIC: Reduce Number of Scans
[Figure: itemset lattice from {} through A, B, C, D up to ABCD, alongside a timeline over the transactions comparing when Apriori and DIC count 1-itemsets, 2-itemsets, and 3-itemsets]
Once both A and D are determined frequent, the counting of AD begins
Once all length-2 subsets of BCD are determined frequent, the counting of BCD begins
S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In SIGMOD’97
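A simplified Python sketch of the DIC idea, written only to illustrate the two counting rules stated on this slide; it is not the full algorithm from the Brin et al. paper. Assumptions: the database is read in fixed-size blocks (`block_size` is a hypothetical parameter), `min_count` is an absolute support count, a new itemset starts being counted at a block boundary once all of its immediate subsets have reached `min_count` (counts never decrease, so that is final), and every itemset is counted over exactly one full pass, wrapping around to the start of the data if needed.

```python
def dic(transactions, min_count, block_size):
    """Simplified Dynamic Itemset Counting sketch (illustrative assumptions)."""
    db = [frozenset(t) for t in transactions]
    n = len(db)
    # itemset -> [count so far, transactions seen so far]
    active = {frozenset([i]): [0, 0] for i in set().union(*db)}
    started = set(active)   # itemsets whose counting has ever begun
    finished = {}           # itemset -> support count after one full pass
    pos = 0                 # transactions read so far (wraps around the data)
    while active:
        # Read one block; each itemset sees exactly n transactions in total.
        for _ in range(block_size):
            t = db[pos % n]
            pos += 1
            for itemset, state in active.items():
                if state[1] < n:
                    if itemset <= t:
                        state[0] += 1
                    state[1] += 1
        # Retire itemsets that have now seen every transaction once.
        for itemset in [s for s, st in active.items() if st[1] == n]:
            finished[itemset] = active.pop(itemset)[0]
        # Reaching min_count, even mid-pass, means "determined frequent".
        frequent = {s for s, st in active.items() if st[0] >= min_count}
        frequent |= {s for s, c in finished.items() if c >= min_count}
        # Start counting any itemset whose immediate subsets are all frequent.
        for a in frequent:
            for b in frequent:
                cand = a | b
                if len(cand) == len(a) + 1 and cand not in started:
                    if all(cand - {x} in frequent for x in cand):
                        active[cand] = [0, 0]
                        started.add(cand)
    return {s: c for s, c in finished.items() if c >= min_count}


toy = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(len(dic(toy, min_count=2, block_size=2)))  # 9
```

Because counting of a longer itemset can begin partway through a pass instead of waiting for the next full pass, the data may be read fewer times than under Apriori's one-pass-per-level schedule.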
Transaction Reduction
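Reducing the number of transactions scanned in future iterations: a transaction that contains no frequent k-itemset cannot contain any frequent (k+1)-itemset, so it can be removed from later scans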
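A minimal sketch of the transaction-reduction filter, assuming transactions and frequent k-itemsets are Python frozensets; `reduce_transactions` is a hypothetical helper. The toy database and its frequent 3-itemset {B, C, E} (at an absolute minimum support of 2) are illustrative only.

```python
def reduce_transactions(transactions, frequent_k):
    """Keep only transactions that contain at least one frequent k-itemset.

    Any (k+1)-itemset in a transaction has all of its k-subsets in that
    transaction, so with no frequent k-itemset present, no (k+1)-itemset
    there can be frequent and the transaction can be skipped in later scans.
    """
    return [t for t in transactions if any(s <= t for s in frequent_k)]


# Hypothetical toy database and its frequent 3-itemsets at min_support = 2.
db = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]
L3 = {frozenset("BCE")}
kept = reduce_transactions(db, L3)
print(sorted("".join(sorted(t)) for t in kept))  # ['ABCE', 'BCE']
```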
Approach
Data Format