
SE 458 - Data Mining (DM)
Spring 2019, Section W1

Lecture 22: Apriori and Association Rule Mining

Dr. Malik Tahir Hassan, University of Management and Technology


Chapter 5: Mining Frequent Patterns, Associations and
Correlations: Basic Concepts and Methods

 Basic Concepts

 Frequent Itemset Mining Methods

 Which Patterns Are Interesting?—Pattern Evaluation Methods

 Summary
Scalable Frequent Itemset Mining Methods

 Apriori: A Candidate Generation-and-Test Approach

 Improving the Efficiency of Apriori

 FPGrowth: A Frequent Pattern-Growth Approach

 ECLAT: Frequent Pattern Mining with Vertical Data Format
The Downward Closure Property and
Scalable Mining Methods

 The downward closure property of frequent patterns:
any subset of a frequent itemset must be frequent.
If {cola, diaper, nuts} is frequent, so is {cola, diaper};
i.e., every transaction having {cola, diaper, nuts} also
contains {cola, diaper}.
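The property is easy to verify directly. Below is a small illustrative sketch; the mini transaction database and the `support` helper are invented for this example, not taken from the slides:

```python
# A tiny, invented transaction database to illustrate downward closure:
# the support of a subset is always >= the support of its superset.
transactions = [
    {"cola", "diaper", "nuts"},
    {"cola", "diaper"},
    {"beer", "nuts"},
    {"cola", "diaper", "nuts", "beer"},
]

def support(itemset, db):
    """Count transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)

sup_super = support({"cola", "diaper", "nuts"}, transactions)  # 2
sup_sub = support({"cola", "diaper"}, transactions)            # 3
# Every transaction holding the superset also holds the subset:
assert sup_sub >= sup_super
```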
Apriori: A Candidate Generation & Test
Approach

 Apriori pruning principle: if any itemset is
infrequent, its supersets should not be
generated/tested! (Agrawal & Srikant @VLDB'94,
Mannila, et al. @KDD'94)
 Method:
Initially, scan the DB once to get the frequent 1-itemsets
Generate length-(k+1) candidate itemsets from
the length-k frequent itemsets
Prune all candidates having any infrequent subset
Test the surviving candidates against the DB
Terminate when no frequent or candidate set can be generated
The Apriori Algorithm—An Example (min_sup = 2)

Database TDB:
  Tid | Items
  10  | A, C, D
  20  | B, C, E
  30  | A, B, C, E
  40  | B, E

1st scan:
  C1: {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
  L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3

2nd scan:
  C2: {A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
  L2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2

3rd scan:
  C3: {B, C, E}
  L3: {B, C, E}: 2
The Apriori Algorithm
(Pseudo-Code)
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
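As a concrete rendering of this pseudo-code, here is a minimal Python sketch run on the worked example; the set-of-frozensets representation and all names are illustrative choices, not part of the slides:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frozenset: support} for all frequent itemsets.

    `transactions` is a list of sets; `min_sup` is an absolute count.
    """
    # First scan: count items and keep the frequent 1-itemsets (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Candidate generation: self-join Lk, then Apriori-prune any
        # candidate that has an infrequent k-subset.
        prev = set(Lk)
        candidates = set()
        for a in prev:
            for b in prev:
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(s) in prev for s in combinations(u, k)
                ):
                    candidates.add(u)
        # One scan of the database counts all surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        Lk = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(Lk)
        k += 1
    return frequent

# The worked example from the slides (min_sup = 2):
TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
freq = apriori(TDB, 2)
# freq holds 9 itemsets, including {B, C, E} with support 2.
```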
Implementation of Apriori
 How to generate candidates?
Step 1: self-join Lk with itself
Step 2: prune candidates with an infrequent subset
 Example of candidate generation:
L3 = {abc, abd, acd, ace, bcd}
Self-joining: L3 * L3
 abcd from abc and abd
 acde from acd and ace
Pruning:
 acde is removed because ade is not in L3
C4 = {abcd}
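The join-then-prune step can be sketched in Python; representing itemsets as sorted tuples makes the shared-(k-1)-prefix join explicit (function and variable names are illustrative):

```python
from itertools import combinations

def gen_candidates(Lk):
    """Build C(k+1) from L(k): prefix self-join, then Apriori pruning.

    Itemsets are sorted tuples; two k-itemsets join only when they
    share their first k-1 items.
    """
    Lk = sorted(Lk)
    k = len(Lk[0])
    joined = set()
    for i in range(len(Lk)):
        for j in range(i + 1, len(Lk)):
            if Lk[i][:k - 1] == Lk[j][:k - 1]:
                joined.add(tuple(sorted(set(Lk[i]) | set(Lk[j]))))
    freq = set(Lk)
    # Prune: drop any candidate with a k-subset missing from Lk.
    pruned = {c for c in joined
              if all(s in freq for s in combinations(c, k))}
    return joined, pruned

L3 = [tuple(s) for s in ("abc", "abd", "acd", "ace", "bcd")]
joined, C4 = gen_candidates(L3)
# joined: abcd (from abc, abd) and acde (from acd, ace);
# acde is pruned because ade is not in L3, leaving C4 = {abcd}.
```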
Exercise
Let minimum support = 2.
[Exercise transaction table was an image and is not preserved in this export.]

Exercise (continued)
What about L4?
Generating Association Rules
from Frequent Itemsets

For each frequent itemset l, generate all
nonempty proper subsets of l.
For every nonempty proper subset s of l, output
the rule
    s ⇒ (l − s)
if its confidence, support(l) / support(s), meets the
minimum confidence threshold.
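A minimal sketch of this rule-generation loop, using the usual confidence measure conf(s ⇒ l − s) = sup(l) / sup(s); the support table reuses the worked example's counts (min_sup = 2), and all names are illustrative:

```python
from itertools import combinations

def gen_rules(freq, min_conf):
    """Emit rules s => (l - s) with confidence sup(l)/sup(s) >= min_conf.

    `freq` maps frozenset itemsets to their support counts.
    """
    rules = []
    for l, sup_l in freq.items():
        if len(l) < 2:
            continue
        for r in range(1, len(l)):           # proper, nonempty subsets
            for s in combinations(l, r):
                s = frozenset(s)
                conf = sup_l / freq[s]
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
    return rules

# Support counts from the worked example (min_sup = 2):
freq = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3,
    frozenset("E"): 3, frozenset("AC"): 2, frozenset("BC"): 2,
    frozenset("BE"): 3, frozenset("CE"): 2, frozenset("BCE"): 2,
}
rules = gen_rules(freq, min_conf=1.0)
# Five rules hold with confidence 1.0, e.g. {B, C} => {E} (2/2).
```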
Scalable Frequent Itemset Mining Methods

 Apriori: A Candidate Generation-and-Test Approach

 Improving the Efficiency of Apriori

 FPGrowth: A Frequent Pattern-Growth Approach

 ECLAT: Frequent Pattern Mining with Vertical Data Format
Further Improvement of the Apriori Method

 Major computational challenges:

Multiple scans of the transaction database

A huge number of candidates

A tedious workload of support counting for candidates

 Improving Apriori: general ideas

Reduce the number of passes over the transaction database

Shrink the number of candidates

Facilitate the support counting of candidates
Partition: Scan Database Only Twice

 Any itemset that is potentially frequent in DB must
be frequent in at least one of the partitions of DB
Scan 1: partition the database and find the local
frequent patterns in each partition
Scan 2: consolidate the global frequent patterns

DB1 ∪ DB2 ∪ … ∪ DBk = DB

If sup1(i) < σ|DB1|, sup2(i) < σ|DB2|, …, supk(i) < σ|DBk|,
then sup(i) < σ|DB|: an itemset infrequent in every
partition is infrequent globally.
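A two-scan sketch of the partition idea; the brute-force local miner stands in for a real Apriori run on each partition, and the round-robin split and all names are illustrative choices:

```python
from itertools import combinations

def local_frequent(part, sigma):
    """Brute-force miner (stand-in for Apriori on one partition):
    all itemsets with relative support >= sigma inside `part`."""
    items = sorted(set().union(*part))
    found = set()
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            s = frozenset(c)
            if sum(1 for t in part if s <= t) >= sigma * len(part):
                found.add(s)
    return found

def partition_mine(db, sigma, k=2):
    """Two scans: mine each partition locally, then verify the union
    of local winners globally. Any globally frequent itemset must be
    locally frequent in at least one partition."""
    parts = [db[i::k] for i in range(k)]   # scan 1 (round-robin split)
    candidates = set().union(*(local_frequent(p, sigma) for p in parts))
    # Scan 2: count candidates against the full database.
    return {s for s in candidates
            if sum(1 for t in db if s <= t) >= sigma * len(db)}

TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = partition_mine(TDB, sigma=0.5)   # 0.5 * 4 = absolute support 2
```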
Sampling for Frequent Patterns

 Select a sample of the original database and mine
frequent patterns within the sample using Apriori

 Scan the database once to verify the frequent itemsets
found in the sample; only the borders of the closure of
the frequent patterns are checked
Example: check abcd instead of ab, ac, …, etc.

 Scan the database again to find missed frequent patterns

 H. Toivonen. Sampling large databases for association
rules. In VLDB'96
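A rough sketch of the sampling approach under simplifying assumptions: mine a random sample at a lowered threshold to reduce misses, then verify the candidates in a single full scan. Toivonen's negative-border check for missed patterns is omitted here, and all names and parameters are invented:

```python
import random
from itertools import combinations

def sample_mine(db, sigma, sample_frac=0.5, lower=0.8, seed=0):
    """Mine a random sample at a lowered threshold (lower * sigma),
    then verify candidates in one scan of the full database."""
    rng = random.Random(seed)
    sample = [t for t in db if rng.random() < sample_frac] or db[:1]
    items = sorted(set().union(*db))
    candidates = set()
    for r in range(1, len(items) + 1):       # brute-force sample miner
        for c in combinations(items, r):
            s = frozenset(c)
            if sum(1 for t in sample if s <= t) >= lower * sigma * len(sample):
                candidates.add(s)
    # Full scan: keep only candidates that are truly frequent in db.
    return {s for s in candidates
            if sum(1 for t in db if s <= t) >= sigma * len(db)}

TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
found = sample_mine(TDB, sigma=0.5)
# Everything returned is verified frequent; patterns infrequent in the
# sample can still be missed, hence the second scan on the slide.
```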
DIC: Reduce the Number of Scans

 DIC (dynamic itemset counting) starts counting a candidate
as soon as all of its subsets are determined frequent, part-way
through a scan, instead of waiting for the next full pass:
Once both A and D are determined frequent, the
counting of AD begins
Once all length-2 subsets of BCD are determined
frequent, the counting of BCD begins
[Itemset-lattice figure contrasting Apriori's pass-by-pass counting
with DIC's earlier starts is not preserved in this export.]

 S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic
itemset counting and implication rules for market basket
data. In SIGMOD'97
Transaction Reduction

 Reduce the number of transactions scanned in
future iterations:

A transaction that does not contain any frequent
k-itemset cannot contain any frequent (k+1)-itemset

Mark or remove such transactions
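This reduction can be folded into the support-counting pass: a transaction that matches no candidate k-itemset certainly contains no frequent k-itemset, so it can be dropped for later rounds. A small illustrative sketch, with all names invented:

```python
def count_with_reduction(db, candidates):
    """Count candidate itemsets and keep only transactions that matched
    at least one candidate; the rest cannot contain any frequent
    k-itemset, hence no frequent (k+1)-itemset either."""
    counts = {c: 0 for c in candidates}
    kept = []
    for t in db:
        hit = False
        for c in candidates:
            if c <= t:
                counts[c] += 1
                hit = True
        if hit:
            kept.append(t)       # only these survive to the next round
    return counts, kept

TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
C3 = [frozenset({"B", "C", "E"})]
counts, kept = count_with_reduction(TDB, C3)
# Only transactions 20 and 30 contain the C3 candidate, so later
# iterations would scan 2 transactions instead of 4.
```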


Scalable Frequent Itemset Mining Methods

 Apriori: A Candidate Generation-and-Test Approach

 Improving the Efficiency of Apriori

 ECLAT: Frequent Pattern Mining with Vertical Data Format
