DWDM Unit-3
Syllabus:
• Mining Frequent Patterns, Associations and Correlations
• Mining Methods
• Mining Various Kinds of Association Rules
• Correlation Analysis
• Constraint-Based Association Mining
INTRODUCTION
Frequent itemset mining methods
1. Apriori
2. FP-growth
3. Vertical mining algorithm (Apriori-TID)
Apriori Example
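A minimal Python sketch of the level-wise Apriori loop follows: find L1 in one scan, then repeatedly join frequent (k-1)-itemsets into candidates, prune any candidate with an infrequent (k-1)-subset (the Apriori property), and scan the data once per level to count supports. The function name `apriori` is hypothetical. The toy data are the five transactions from the transaction reduction example in this unit, in lowercase, with minimum support count 2. They stand in for the slide's own example, which is not reproduced here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Scan 1: count 1-itemsets (C1) and keep those meeting min_support (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        candidates = set()
        # Join step: union two frequent (k-1)-itemsets that differ in one item.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step (Apriori property): all (k-1)-subsets must be frequent.
                if len(union) == k and all(
                    frozenset(sub) in level for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # One more database scan per level to count candidate supports.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent


transactions = [
    {"bread", "butter"},
    {"egg", "cheese", "butter"},
    {"bread", "butter", "egg"},
    {"bread", "egg", "cheese"},
    {"milk", "yogurt"},
]
result = apriori(transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Here {bread, butter, egg} becomes a candidate because all three of its 2-subsets are frequent, but it is dropped after counting (support 1). {bread, egg, cheese} never becomes a candidate, because {bread, cheese} is infrequent.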
Drawbacks of Apriori
• Requires many database scans (one full scan for each itemset size k)
• Suitable only for small datasets
Improvements to Apriori
• Partitioning
• Sampling
• Transaction reduction
• Dynamic itemset counting
• Direct hashing and pruning (DHP)
Partitioning
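Partitioning, as it is commonly described, needs only two database scans. In the first scan, D is divided into n non-overlapping partitions and the itemsets that are frequent locally in each partition are found, using the same relative support threshold. Any itemset that is frequent in D must be frequent in at least one partition, so the union of the local results is a complete candidate set. The second scan counts those candidates over all of D. The sketch below is illustrative only. The names `partition_mine` and `local_frequent` are hypothetical, and the local miner is a brute-force subset enumeration standing in for Apriori, which is workable only on tiny data.

```python
from itertools import combinations


def local_frequent(partition, min_count):
    """Brute-force frequent itemsets of one partition (only sensible for tiny data)."""
    counts = {}
    for t in partition:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, n in counts.items() if n >= min_count}


def partition_mine(transactions, min_sup_ratio, n_parts):
    """Two-scan partitioning: local frequent itemsets -> global candidates -> verify."""
    transactions = [frozenset(t) for t in transactions]
    size = -(-len(transactions) // n_parts)  # ceiling division
    candidates = set()
    # Scan 1: a globally frequent itemset is locally frequent in at least one
    # partition, so the union of local results contains every frequent itemset.
    for start in range(0, len(transactions), size):
        part = transactions[start:start + size]
        candidates |= local_frequent(part, min_sup_ratio * len(part))
    # Scan 2: count every candidate over the whole database.
    global_min = min_sup_ratio * len(transactions)
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= global_min}


transactions = [
    {"bread", "butter"},
    {"egg", "cheese", "butter"},
    {"bread", "butter", "egg"},
    {"bread", "egg", "cheese"},
    {"milk", "yogurt"},
]
result = partition_mine(transactions, min_sup_ratio=0.4, n_parts=2)
print(len(result))  # 8
```

With a relative support of 0.4 (a count of 2 out of 5), the result matches what a full Apriori run finds on the same five transactions.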
Sampling
Sampling (mining on a subset of the given data):
The basic idea of the sampling approach is to pick a random sample S of the given data D and then search for frequent itemsets in S instead of D.
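The sampling idea can be sketched as follows: mine a random sample S, then make one scan of D to compute the true support of each itemset found in S. A common refinement is to mine S with a support threshold lower than the minimum support, so that fewer globally frequent itemsets are missed. The names `sample_mine` and `lowered_ratio` are hypothetical. The brute-force miner stands in for Apriori on the small sample, and the data reuse the five transactions from the transaction reduction example.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force frequent itemsets (stand-in for Apriori on a small sample)."""
    counts = {}
    for t in transactions:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, n in counts.items() if n >= min_count}


def sample_mine(transactions, min_sup_ratio, sample_size, lowered_ratio, seed=0):
    """Mine a random sample S with a lowered threshold, then verify on all of D."""
    transactions = [frozenset(t) for t in transactions]
    sample = random.Random(seed).sample(transactions, sample_size)
    # Itemsets frequent in S under the lowered threshold become candidates.
    candidates = frequent_itemsets(sample, lowered_ratio * len(sample))
    # One scan of D gives the true support of each candidate.
    global_min = min_sup_ratio * len(transactions)
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= global_min}


transactions = [
    {"bread", "butter"},
    {"egg", "cheese", "butter"},
    {"bread", "butter", "egg"},
    {"bread", "egg", "cheese"},
    {"milk", "yogurt"},
]
small = sample_mine(transactions, 0.4, sample_size=3, lowered_ratio=0.3, seed=1)
full = sample_mine(transactions, 0.4, sample_size=5, lowered_ratio=0.3)
print(sorted(sorted(s) for s in small))
```

Every itemset returned is truly frequent in D, because its support was verified on all of D. However, an itemset that is frequent in D but not in S is never returned. This is the accuracy the approach trades for efficiency.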
Transaction reduction
Transaction reduction (reducing the number of transactions scanned in future iterations):
A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k + 1)-itemsets. Therefore, such a transaction can be marked or removed from further consideration.
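Transaction reduction: example (minimum support = 2)

TID   Items bought
1     Bread, butter
2     Egg, cheese, butter
3     Bread, butter, egg
4     Bread, egg, cheese
5     Milk, yogurt

C1 itemset   Support
{bread}      3
{butter}     3
{egg}        3
{cheese}     2
{milk}       1
{yogurt}     1

L1 = {bread}, {butter}, {egg}, {cheese}. Transaction 5 contains none of these frequent 1-itemsets, so it cannot contain a frequent 2-itemset and is removed before the next scan.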
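The reduction rule can be applied to the example above in a few lines of Python: count C1, keep L1, then drop every transaction that contains no frequent 1-itemset. The helper name `reduce_transactions` is hypothetical, and the items are lowercased.

```python
from collections import Counter


def reduce_transactions(transactions, frequent_k_itemsets):
    """Keep only transactions that contain at least one frequent k-itemset.

    A transaction with no frequent k-itemset cannot contain a frequent
    (k + 1)-itemset, so it can be skipped in later scans.
    """
    return {
        tid: items
        for tid, items in transactions.items()
        if any(itemset <= items for itemset in frequent_k_itemsets)
    }


transactions = {
    1: frozenset({"bread", "butter"}),
    2: frozenset({"egg", "cheese", "butter"}),
    3: frozenset({"bread", "butter", "egg"}),
    4: frozenset({"bread", "egg", "cheese"}),
    5: frozenset({"milk", "yogurt"}),
}
min_support = 2

# C1 support counts from one scan, then L1 = items meeting min_support.
c1 = Counter(item for items in transactions.values() for item in items)
l1 = [frozenset([item]) for item, count in c1.items() if count >= min_support]

kept = reduce_transactions(transactions, l1)
print(sorted(kept))  # [1, 2, 3, 4]
```

The same helper applies at any level k: pass Lk to skip transactions for the (k + 1) scan.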
Dynamic itemset counting
Direct hashing and pruning
Hash-based technique (hashing itemsets into corresponding buckets): A hash-based
technique can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1.
For example, when scanning each transaction in the database to generate the frequent
1-itemsets, L1, we can generate all the 2-itemsets for each transaction, hash (i.e., map)
them into the different buckets of a hash table structure, and increase the corresponding
bucket counts.
A 2-itemset with a corresponding bucket count in the hash table that is below the support threshold cannot be
frequent and thus should be removed from the candidate set. Such a hash-based technique may substantially reduce
the number of candidate k-itemsets examined (especially when k = 2).
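The bucket-counting idea can be sketched as a single pass that counts items and hashes every 2-itemset of each transaction. A pair of frequent items then becomes a candidate only if its bucket count reaches the support threshold. The hash function h(x, y) = (10 · order(x) + order(y)) mod 7, which orders items alphabetically, is a hypothetical choice. The function name `dhp_candidate_pairs` and the data (the transaction reduction example, minimum support 2) are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def dhp_candidate_pairs(transactions, min_support, n_buckets=7):
    """First scan of DHP: count items and hash every 2-itemset into a bucket.

    Returns L1, the candidate 2-itemsets that survive bucket pruning,
    and the bucket counts.
    """
    items = sorted({item for t in transactions for item in t})
    order = {item: i for i, item in enumerate(items)}

    def bucket(pair):
        # Hypothetical hash: h(x, y) = (10 * order(x) + order(y)) mod n_buckets
        x, y = sorted(pair, key=order.__getitem__)
        return (10 * order[x] + order[y]) % n_buckets

    item_counts = Counter()
    bucket_counts = Counter()
    for t in transactions:  # a single pass over the data
        item_counts.update(t)
        for pair in combinations(t, 2):
            bucket_counts[bucket(pair)] += 1

    l1 = sorted((i for i, n in item_counts.items() if n >= min_support), key=order.__getitem__)
    # Apriori would keep every pair of frequent items; DHP also requires the
    # pair's bucket count to reach min_support (necessary, not sufficient).
    c2 = [
        frozenset(pair)
        for pair in combinations(l1, 2)
        if bucket_counts[bucket(pair)] >= min_support
    ]
    return l1, c2, bucket_counts


transactions = [
    {"bread", "butter"},
    {"egg", "cheese", "butter"},
    {"bread", "butter", "egg"},
    {"bread", "egg", "cheese"},
    {"milk", "yogurt"},
]
l1, c2, buckets = dhp_candidate_pairs(transactions, min_support=2)
print(l1)  # ['bread', 'butter', 'cheese', 'egg']
print(sorted(sorted(p) for p in c2))
```

Here {butter, cheese} is pruned because its bucket count is 1. {bread, cheese} survives only because it shares bucket 2 with {cheese, egg}. A collision like this can only add false candidates, never remove a frequent pair, and those false candidates are eliminated when actual supports are counted.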
DHP: Example
Item support counts: I1: 6, I2: 7, I3: 6, I4: 2
Transactions (partial): T800: I2, I1, I3, I5; T900: I2, I1, I3
Step 4
4) Construct the FP-tree for the sorted database:
Tree construction starts by creating the root node.
The root node of an FP-tree is labeled null:
{ }
Step 4: FP-tree construction
Get the first transaction from the sorted database and insert it into the FP-tree.
T100 = I2, I1, I5
{ }
  I2:1
    I1:1
      I5:1
Step 4: FP-tree construction
Get the second transaction from the sorted database and insert it into the FP-tree. It shares the prefix I2, so the count of I2 becomes 2 and a new child I4 is created.
T200 = I2, I4
{ }
  I2:2
    I1:1
      I5:1
    I4:1
Step 4: FP-tree construction
Get the third transaction from the sorted database and insert it into the FP-tree. I2 is shared again (count 3) and a new child I3 is created.
T300 = I2, I3
{ }
  I2:3
    I1:1
      I5:1
    I4:1
    I3:1
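The insertion steps above can be reproduced with a small Python sketch. Each transaction, already sorted by descending support, is walked down from the root. An existing child with the same item has its count incremented, and otherwise a new branch is started. The class `FPNode` and the helpers `insert_transaction` and `render` are hypothetical names. The header table and node-links of a full FP-tree are omitted for brevity.

```python
class FPNode:
    """One FP-tree node: item name, count, parent link and children by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def insert_transaction(root, items):
    """Insert a transaction whose items are already sorted by descending support.

    Shared prefixes reuse existing nodes (count += 1); the first item without
    a matching child starts a new branch.
    """
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
        child.count += 1
        node = child


def render(node, depth=0):
    """Return the tree as indented 'item:count' lines (root itself omitted)."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


root = FPNode(None)  # the null root, written { }
for tid, items in [("T100", ["I2", "I1", "I5"]), ("T200", ["I2", "I4"]), ("T300", ["I2", "I3"])]:
    insert_transaction(root, items)
    print(f"after {tid}:")
    print("\n".join(render(root)))
```

The output after each transaction matches the three trees drawn above.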
Step 4
Final FP-tree after inserting all transactions
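Step 5
5) Find the conditional pattern base for each item in the FP-tree.
Consider item I5.
Prefix paths for I5: {I2:1, I1:1, I5:1} and {I2:1, I1:1, I3:1, I5:1}
Conditional pattern base of I5: {I2, I1: 1}, {I2, I1, I3: 1}
Conditional FP-tree of I5 (minimum support count 2): ⟨I2:2, I1:2⟩; I3 is dropped because its count is only 1.
[Figure: conditional FP-tree with nodes I2:4 and I1:2]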
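The I5 case can be checked with a short Python sketch. Each prefix path is cut just before I5 and inherits I5's count on that path, which gives the conditional pattern base. Counts are then summed per item, and only items meeting the minimum support count (2 here) are kept as nodes of the conditional FP-tree. The function names are hypothetical. The input is the pair of prefix paths listed above for I5.

```python
from collections import Counter


def conditional_pattern_base(prefix_paths, item):
    """Strip `item` from each prefix path; each prefix inherits `item`'s count."""
    base = []
    for path in prefix_paths:
        names = [name for name, _ in path]
        count = dict(path)[item]
        base.append((names[: names.index(item)], count))
    return base


def conditional_frequent_items(base, min_support):
    """Sum counts per item across the pattern base and keep the frequent ones."""
    totals = Counter()
    for prefix, count in base:
        for name in prefix:
            totals[name] += count
    return {name: n for name, n in totals.items() if n >= min_support}


# Prefix paths for I5 (the count on I5 is the count of the whole path).
i5_paths = [
    [("I2", 1), ("I1", 1), ("I5", 1)],
    [("I2", 1), ("I1", 1), ("I3", 1), ("I5", 1)],
]
base = conditional_pattern_base(i5_paths, "I5")
print(base)  # [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)]
print(conditional_frequent_items(base, min_support=2))  # {'I2': 2, 'I1': 2}
```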
Step 6
Conditional FP-tree constructed for the conditional pattern base of "I3"
I3: {I2:4, I3:6} → frequent pattern {I2, I3: 4}
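When a conditional FP-tree is a single path, the frequent patterns can be read off directly. Each non-empty combination of its nodes, joined with the suffix item, is frequent, and its support is the smallest count in the combination. The sketch below applies this to I5's single-path conditional FP-tree ⟨I2:2, I1:2⟩ from Step 5. The function name `patterns_from_single_path` is hypothetical. Trees with several branches need the recursive FP-growth step instead.

```python
from itertools import combinations


def patterns_from_single_path(path, suffix):
    """Enumerate frequent patterns from a single-path conditional FP-tree.

    path: [(item, count), ...] from the root downward; suffix: the item whose
    conditional tree this is. Support of a pattern = min count of its nodes.
    """
    patterns = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            itemset = frozenset(name for name, _ in combo) | {suffix}
            patterns[itemset] = min(count for _, count in combo)
    return patterns


result = patterns_from_single_path([("I2", 2), ("I1", 2)], "I5")
for itemset, support in result.items():
    print(sorted(itemset), support)
```

For I5 this gives {I2, I5: 2}, {I1, I5: 2} and {I2, I1, I5: 2}.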