Data Mining Practical 6
(VI SEMESTER)
EXPERIMENT NO: 6
TITLE: Study of the Apriori Algorithm (Finding Frequent Itemsets Using Candidate
Generation).
THEORY:
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent
itemsets for Boolean association rules.
The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent
itemset properties, as we shall see in the following.
Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to
explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to
accumulate the count for each item and collecting those items that satisfy minimum support. The
resulting set is denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used
to find L3, and so on, until no more frequent k-itemsets can be found. The finding of each Lk requires
one full scan of the database.
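As a minimal sketch of this first scan, the following Python snippet counts each item's support in one
pass over the database and keeps the items that meet the minimum support count. The names
(find_frequent_1_itemsets, transactions, min_sup) and the three-transaction toy dataset are
illustrative, not taken from [1].

    from collections import Counter

    def find_frequent_1_itemsets(transactions, min_sup):
        # One scan of the database: accumulate a count for every item.
        counts = Counter(item for t in transactions for item in t)
        # Keep the items whose support count meets min_sup; this is L1.
        return {frozenset([item]): c for item, c in counts.items() if c >= min_sup}

    transactions = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}]
    print(find_frequent_1_itemsets(transactions, min_sup=2))
    # {frozenset({'I2'}): 3, frozenset({'I3'}): 2}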
Property of Apriori
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
The Apriori property is based on the following observation. By definition, if an itemset I does not
satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup. If an
item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A) cannot occur more frequently
than I. Therefore, I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup. [1]
This property belongs to a special category of properties called antimonotone, in the sense that
if a set cannot pass a test, all of its supersets will fail the same test as well.
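This is exactly the test Apriori uses to prune candidates before counting them. Below is a hedged
sketch of the subset check; has_infrequent_subset and L_prev are illustrative names (L_prev stands
for the set of frequent (k-1)-itemsets).

    from itertools import combinations

    def has_infrequent_subset(candidate, L_prev):
        # By the Apriori property, if any (k-1)-subset of the candidate is not
        # frequent, the candidate itself cannot be frequent and can be pruned.
        return any(frozenset(s) not in L_prev
                   for s in combinations(candidate, len(candidate) - 1))

    L2 = {frozenset({"I1", "I2"}), frozenset({"I1", "I5"}), frozenset({"I2", "I5"})}
    print(has_infrequent_subset(frozenset({"I1", "I2", "I5"}), L2))  # False: keep it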
Apriori Algorithm [1]:
Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based on
candidate generation.
Method:
    L1 = find_frequent_1-itemsets(D);
    for (k = 2; Lk-1 ≠ ∅; k++) {
        Ck = apriori_gen(Lk-1);
        for each transaction t ∈ D {        // scan D for counts
            Ct = subset(Ck, t);             // get the subsets of t that are candidates
            for each candidate c ∈ Ct
                c.count++;
        }
        Lk = {c ∈ Ck | c.count ≥ min_sup};
    }
    return L = ∪k Lk;
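The pseudocode above can be rendered as a short runnable Python program. This is a teaching
sketch under stated assumptions, not the textbook's exact procedure: transactions are assumed to be
sets of item labels, and apriori_gen builds candidates by combining all frequent items and then
pruning by the Apriori property, which yields the same candidate set as the textbook's join-and-prune
steps but is written for clarity rather than speed.

    from itertools import combinations

    def apriori_gen(L_prev, k):
        # A k-itemset is a candidate exactly when every one of its (k-1)-subsets
        # is frequent (the Apriori property); this matches join + prune.
        items = sorted({i for itemset in L_prev for i in itemset})
        candidates = set()
        for combo in combinations(items, k):
            cand = frozenset(combo)
            if all(frozenset(sub) in L_prev for sub in combinations(cand, k - 1)):
                candidates.add(cand)
        return candidates

    def apriori(D, min_sup):
        # L1: one scan of D to accumulate a count for each individual item.
        counts = {}
        for t in D:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        L = {c: n for c, n in counts.items() if n >= min_sup}
        all_frequent = dict(L)
        k = 2
        while L:
            Ck = apriori_gen(set(L), k)
            # One full scan of D per level, as in the pseudocode.
            counts = {c: sum(1 for t in D if c <= t) for c in Ck}
            L = {c: n for c, n in counts.items() if n >= min_sup}
            all_frequent.update(L)
            k += 1
        return all_frequent   # maps each frequent itemset to its support count

Note how each pass of the while loop performs exactly one full scan of D, mirroring the theory
section's remark that finding each Lk requires one scan of the database.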
Once the frequent itemsets from transactions in a database D have been found, it is straightforward
to generate strong association rules from them (where strong association rules satisfy both minimum
support and minimum confidence). This can be done using the following equation for confidence,
which we show again here for completeness:
    confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)
where support_count(X) is the number of transactions containing the itemset X.
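A minimal sketch of this rule-generation step follows: for each nonempty proper subset s of a
frequent itemset l, emit the rule s ⇒ (l - s) whenever its confidence meets the threshold. The names
generate_rules and min_conf are illustrative, and support_count is assumed to be a dictionary from
frozenset itemsets to their counts, as produced by the apriori sketch above.

    from itertools import combinations

    def generate_rules(l, support_count, min_conf):
        # For every nonempty proper subset s of l, test the rule s => (l - s)
        # with confidence support_count(l) / support_count(s).
        rules = []
        for r in range(1, len(l)):
            for subset in combinations(l, r):
                s = frozenset(subset)
                conf = support_count[l] / support_count[s]
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
        return rules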
Example:
Apriori. Let’s look at a concrete example, based on the AllElectronics transaction database, D, of
Table 1. There are nine transactions in this database, that is, |D| = 9. We use Figure 1 to illustrate
the Apriori algorithm for finding frequent itemsets in D.
Table 1 Transactional data for AllElectronics [1].
TID     List of item IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3
Figure 1 Generation of candidate itemsets and frequent itemsets, where the minimum support count is 2.
Let’s try an example based on the transactional data for AllElectronics shown in Table 1. Suppose
the data contain the frequent itemset l = {I1, I2, I5}. What are the association rules that can be
generated from l? The nonempty subsets of l are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}.
The resulting association rules are as shown below, each listed with its confidence:
    I1 ∧ I2 ⇒ I5,  confidence = 2/4 = 50%
    I1 ∧ I5 ⇒ I2,  confidence = 2/2 = 100%
    I2 ∧ I5 ⇒ I1,  confidence = 2/2 = 100%
    I1 ⇒ I2 ∧ I5,  confidence = 2/6 = 33%
    I2 ⇒ I1 ∧ I5,  confidence = 2/7 = 29%
    I5 ⇒ I1 ∧ I2,  confidence = 2/2 = 100%
If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules above
are output, because these are the only ones generated that are strong. Note that, unlike conventional
classification rules, association rules can contain more than one conjunct in the right-hand side of
the rule.
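This worked example can be checked mechanically with the generate_rules sketch above. The
support counts below are read off the nine transactions of Table 1 (for instance, {I1, I2, I5} occurs in
T100 and T800, so its count is 2).

    support_count = {
        frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
        frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
        frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
    }
    l = frozenset({"I1", "I2", "I5"})
    for lhs, rhs, conf in generate_rules(l, support_count, min_conf=0.70):
        print(sorted(lhs), "=>", sorted(rhs), f"{conf:.0%}")
    # Prints the three strong (100%) rules: {I1,I5} => {I2}, {I2,I5} => {I1},
    # and {I5} => {I1,I2}, matching the hand computation above.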
References:
[1] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition,
Morgan Kaufmann, 2006.
EXERCISE:
1) Take one dataset from http://archive.ics.uci.edu/ml/ or any other source, and perform the Apriori
algorithm on that data in the Weka tool. Take a screenshot.
2) How do we interpret the association rule output?
3) Write down the disadvantages of the Apriori algorithm.
EVALUATION:
Observation & Implementation   Timely completion   Viva   Total
             4                         2             4      10
Signature: ____________
Date: ________________