Unit-7 Apriori
Apriori algorithm
The Apriori algorithm was the first algorithm proposed for frequent itemset mining. It was introduced by R. Agrawal and R. Srikant.
The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties.
Apriori Property
All subsets of a frequent itemset must be frequent (Apriori property).
Join Step: This step generates candidate (K+1)-itemsets from the frequent K-itemsets by joining L_K with itself.
Prune Step: This step scans the database to count each candidate itemset. If a candidate does not meet the minimum support count,
it is regarded as infrequent and removed. This step reduces the size of the candidate itemsets.
The join and prune steps are repeated iteratively until no further frequent itemsets can be found.
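The loop below is a minimal Python sketch of this join-and-prune iteration, assuming transactions are given as plain Python sets and min_sup is an absolute support count; the function name apriori and its internal structure are illustrative, not part of the original text.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return a dict mapping each frequent itemset (frozenset) to its support count."""
    # C1 -> L1: count individual items and keep those meeting min_sup
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_sup}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: build candidate k-itemsets from the frequent (k-1)-itemsets
        prev = list(frequent)
        candidates = {prev[i] | prev[j]
                      for i in range(len(prev))
                      for j in range(i + 1, len(prev))
                      if len(prev[i] | prev[j]) == k}
        # Apriori property: drop candidates that have an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Prune step: count each candidate in the data and keep those meeting min_sup
        counts = {c: sum(1 for t in transactions if c <= set(t)) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        all_frequent.update(frequent)
        k += 1
    return all_frequent
```

Support is kept as an absolute count here; a percentage threshold such as 50% would first be converted by multiplying it with the number of transactions.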
Consider the following dataset, find the frequent itemsets and generate association rules for them. Assume that the minimum
support threshold is s = 50% and the minimum confidence threshold is c = 80%.
T1 I1, I2, I3
T2 I2, I3, I4
T3 I4, I5
T4 I1, I2, I4
T5 I1, I2, I3, I5
T6 I1, I2, I3, I4
Solution
Step-1:
(i) Create a table containing the support count of each item present in the dataset. This is the candidate set C1.
Item Count
I1 4
I2 5
I3 4
I4 4
I5 2
(ii) Prune Step: Compare each candidate item's support count with the minimum support count. Here min_sup = 50% of 6 transactions = 3.
The table above shows that I5 does not meet min_sup = 3, so it is removed; only I1, I2, I3, I4 meet the min_sup count, giving the frequent 1-itemset L1.
Item Count
I1 4
I2 5
I3 4
I4 4
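As a quick check on the two tables above, the snippet below counts the 1-itemset supports for the six transactions and keeps those with count >= 3; the variable names are illustrative.

```python
from collections import Counter

# Transactions T1-T6 from the exercise above, as sets of items
transactions = [
    {"I1", "I2", "I3"},           # T1
    {"I2", "I3", "I4"},           # T2
    {"I4", "I5"},                 # T3
    {"I1", "I2", "I4"},           # T4
    {"I1", "I2", "I3", "I5"},     # T5
    {"I1", "I2", "I3", "I4"},     # T6
]
min_sup = 3  # 50% of 6 transactions

# C1: support count of every individual item
c1 = Counter(item for t in transactions for item in t)
# L1: keep only the items that meet the minimum support count
l1 = {item: n for item, n in c1.items() if n >= min_sup}
print(sorted(l1.items()))  # [('I1', 4), ('I2', 5), ('I3', 4), ('I4', 4)]
```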
Step-2:
(i) Join Step: Generate the candidate set C2 (2-itemsets) by joining L1 with itself, and find the occurrences of each 2-itemset in the given
dataset.
Item Count
I1, I2 4
I1, I3 3
I1, I4 2
I2, I3 4
I2, I4 3
I3, I4 2
(ii) Prune Step: Compare each candidate itemset's support count with the minimum support count. The table above shows that the itemsets
{I1, I4} and {I3, I4} do not meet min_sup = 3, so they are removed, giving the frequent 2-itemset L2.
Item Count
I1, I2 4
I1, I3 3
I2, I3 4
I2, I4 3
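The same counting reproduces the C2 and L2 tables above; the sketch below joins the L1 items into candidate pairs and prunes them against min_sup = 3 (variable names are illustrative).

```python
from itertools import combinations

# Same six transactions as in the previous snippet
transactions = [set(t) for t in (
    ("I1", "I2", "I3"), ("I2", "I3", "I4"), ("I4", "I5"),
    ("I1", "I2", "I4"), ("I1", "I2", "I3", "I5"), ("I1", "I2", "I3", "I4"),
)]
min_sup = 3

# Join step: candidate 2-itemsets are all pairs of the frequent 1-items in L1
l1_items = ["I1", "I2", "I3", "I4"]
c2 = [frozenset(pair) for pair in combinations(l1_items, 2)]

# Prune step: count each candidate pair and keep those meeting min_sup
c2_counts = {c: sum(1 for t in transactions if c <= t) for c in c2}
l2 = {c: n for c, n in c2_counts.items() if n >= min_sup}
for itemset, n in sorted(l2.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), n)
# ['I1', 'I2'] 4
# ['I1', 'I3'] 3
# ['I2', 'I3'] 4
# ['I2', 'I4'] 3
```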
Step-3:
(i) Join Step: Generate the candidate set C3 (3-itemsets) using L2, and find the occurrences of each 3-itemset in the given
dataset.
Item Count
I1, I2, I3 3
I1, I2, I4 2
I1, I3, I4 1
I2, I3, I4 2
(ii) Prune Step: Compare each candidate itemset's support count with the minimum support count. The table above shows that the
itemsets {I1, I2, I4}, {I1, I3, I4} and {I2, I3, I4} do not meet min_sup = 3, so they are removed. Only the itemset {I1, I2, I3}
meets the min_sup count, so L3 = {{I1, I2, I3}}.
Association rules are now generated from the frequent itemset {I1, I2, I3}. For the rule {I1, I3} ⇒ {I2}, confidence = support({I1, I2, I3}) / support({I1, I3}) = 3/3 = 100%, which meets the minimum confidence threshold of 80%. This shows that the association rule {I1, I3} ⇒ {I2} is strong. The other rules derived from {I1, I2, I3} reach at most 75% confidence and are therefore rejected.
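These confidence figures can be checked with a short sketch that enumerates every rule X ⇒ Y derivable from {I1, I2, I3} and computes confidence = support(X ∪ Y) / support(X); the support dictionary below is transcribed from the tables above, and the variable names are illustrative.

```python
from itertools import combinations

# Support counts taken from the C1/L1, L2 and C3 tables above
support = {
    frozenset({"I1"}): 4, frozenset({"I2"}): 5, frozenset({"I3"}): 4,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 3, frozenset({"I2", "I3"}): 4,
    frozenset({"I1", "I2", "I3"}): 3,
}
itemset = frozenset({"I1", "I2", "I3"})
min_conf = 0.80

# Enumerate non-empty proper subsets X and test the rule X => (itemset - X)
for r in (1, 2):
    for antecedent in map(frozenset, combinations(itemset, r)):
        consequent = itemset - antecedent
        conf = support[itemset] / support[antecedent]
        strong = "strong" if conf >= min_conf else "not strong"
        print(f"{sorted(antecedent)} => {sorted(consequent)}: "
              f"confidence = {conf:.0%} ({strong})")
# Only {I1, I3} => {I2} reaches 100% >= 80%; every other rule has 75% or 60% confidence.
```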