
Shree Swaminarayan Institute of Technology CE DEPT.

(VI SEMESTER)

EXPERIMENT NO: 6

TITLE: Study of the Apriori Algorithm (Finding Frequent Itemsets Using Candidate
Generation).

OBJECTIVE: On completion of this exercise, students will be able to:

1. Explain what a frequent pattern is.
2. Explain what association rule mining is.
3. Describe the Apriori association rule mining algorithm.
4. Run the Apriori algorithm in the Weka tool and interpret the output.

THEORY:

What is the Apriori algorithm?

Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent
itemsets for Boolean association rules.

The algorithm is named for the fact that it uses prior knowledge of frequent itemset properties, as we shall see below.

Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support. The resulting set is denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. The finding of each Lk requires one full scan of the database.

Property of Apriori

Apriori property: All nonempty subsets of a frequent itemset must also be frequent.

The Apriori property is based on the following observation. By definition, if an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup. If an item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A) cannot occur more frequently than I. Therefore, I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup. [1] For example, if {I1, I2} is not frequent, then no superset such as {I1, I2, I3} can be frequent.

This property belongs to a special category of properties called antimonotone, in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. It is called antimonotone because the property is monotonic in the context of failing a test.

Apriori Algorithm [1]:

Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based on candidate generation.

Input: D, a database of transactions;



min_sup, the minimum support count threshold.

Output: L, frequent itemsets in D.

Method:
1. L1 = find_frequent_1-itemsets(D);
2. for (k = 2; Lk-1 ≠ Φ; k++) {
3.     Ck = apriori_gen(Lk-1);
4.     for each transaction t ∈ D { // scan D for counts
5.         Ct = subset(Ck, t); // get the subsets of t that are candidates
6.         for each candidate c ∈ Ct
7.             c.count++;
8.     }
9.     Lk = {c ∈ Ck | c.count ≥ min_sup};
10. }
11. return L = ∪k Lk;

procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)

1. for each itemset l1 ∈ Lk-1
2.     for each itemset l2 ∈ Lk-1
3.         if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
4.             c = l1 ⋈ l2; // join step: generate candidates
5.             if has_infrequent_subset(c, Lk-1) then
6.                 delete c; // prune step: remove unfruitful candidate
7.             else add c to Ck;
8.         }
9. return Ck;

procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets) // use prior knowledge

1. for each (k-1)-subset s of c
2.     if s ∉ Lk-1 then
3.         return TRUE;
4. return FALSE;
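
The pseudocode above translates almost line for line into a short program. Below is a minimal Python sketch of the full level-wise search, assuming transactions are given as a list of Python sets and min_sup is a support count; the function and variable names are ours, not from the textbook [1]:

from itertools import combinations

def apriori(transactions, min_sup):
    # Level-wise search: find L1 first, then L2 from L1, and so on.
    # Scan 1: count individual items to form L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, n in counts.items() if n >= min_sup}
    L = set(Lk)
    k = 2
    while Lk:
        # Join step: unite pairs of frequent (k-1)-itemsets that
        # differ in exactly one item, yielding candidate k-itemsets.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step (has_infrequent_subset): discard candidates
        # having any (k-1)-subset that is not frequent.
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # One full scan of the database per level to count candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in Ck}
        Lk = {c for c, n in counts.items() if n >= min_sup}
        L |= Lk
        k += 1
    return L

On the Table 1 database below with min_sup = 2, this sketch finds the same frequent itemsets as the worked example, including {I1, I2, I5}.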

Generating Association Rules from Frequent Itemsets

Once the frequent itemsets from the transactions in a database D have been found, it is straightforward to generate strong association rules from them (where strong association rules satisfy both minimum support and minimum confidence). This can be done using the following equation for confidence, which we show again here for completeness:

confidence(A => B) = P(B | A) = support_count(A ∪ B) / support_count(A)


The conditional probability is expressed in terms of itemset support count, where support_count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and support_count(A) is the number of transactions containing the itemset A. Based on this equation, association rules can be generated as follows:

•  For each frequent itemset l, generate all nonempty subsets of l.

•  For every nonempty subset s of l, output the rule “s => (l - s)” if (support_count(l) / support_count(s)) ≥ min_conf, where min_conf is the minimum confidence threshold (a code sketch of this step follows).
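
As a companion to the apriori sketch above, here is a minimal Python sketch of this rule-generation step. It assumes support_counts maps every frequent itemset to its support count (the Apriori property guarantees all subsets of a frequent itemset are themselves frequent, so the needed counts exist); the names are ours, not from the textbook:

from itertools import combinations

def generate_rules(frequent_itemsets, support_counts, min_conf):
    # Yield strong rules as (antecedent, consequent, confidence).
    for l in frequent_itemsets:
        if len(l) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for r in range(1, len(l)):  # every nonempty proper subset s of l
            for s in map(frozenset, combinations(l, r)):
                conf = support_counts[l] / support_counts[s]
                if conf >= min_conf:
                    yield s, l - s, conf  # the rule s => (l - s)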

Example:

Let’s look at a concrete example of Apriori, based on the AllElectronics transaction database, D, of Table 1. There are nine transactions in this database; that is, |D| = 9. Figure 1 illustrates the Apriori algorithm for finding frequent itemsets in D.

Table 1: Transactional data for an AllElectronics branch.

TID     List of item_IDs

T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3


Figure 1: Generation of candidate itemsets and frequent itemsets, where the minimum support count is 2.

Let’s try an example based on the transactional data for AllElectronics shown in Table 1. Suppose the data contain the frequent itemset l = {I1, I2, I5}. What are the association rules that can be generated from l? The nonempty subsets of l are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}. The resulting association rules are as shown below, each listed with its confidence:

I1 ∧ I2 => I5, confidence = 2/4 = 50%
I1 ∧ I5 => I2, confidence = 2/2 = 100%
I2 ∧ I5 => I1, confidence = 2/2 = 100%
I1 => I2 ∧ I5, confidence = 2/6 = 33%
I2 => I1 ∧ I5, confidence = 2/7 = 29%
I5 => I1 ∧ I2, confidence = 2/2 = 100%

If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules above are output, because these are the only ones generated that are strong. Note that, unlike conventional classification rules, association rules can contain more than one conjunct in the right-hand side of the rule.
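
As a quick check, the two sketches given earlier reproduce these numbers on the Table 1 data. The driver code below is a hypothetical usage example built on those sketches, not on anything from the textbook:

D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]

L = apriori(D, min_sup=2)
counts = {s: sum(1 for t in D if s <= t) for s in L}
l = frozenset({"I1", "I2", "I5"})
for s, c, conf in generate_rules([l], counts, min_conf=0.70):
    print(set(s), "=>", set(c), "confidence = {:.0%}".format(conf))
# Prints exactly the three strong rules:
# {I1, I5} => {I2}, {I2, I5} => {I1}, and {I5} => {I1, I2}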

References:
[1] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, University of Illinois at Urbana-Champaign.

EXERCISE:


1) Take one dataset from http://archive.ics.uci.edu/ml/ or any other source, and run the Apriori algorithm on that data in the Weka tool. Take a screenshot.
2) How do we interpret the Apriori association rule output in Weka?
3) Write down the disadvantages of the Apriori algorithm.

EVALUATION:

Observation & Implementation    Timely completion    Viva    Total
4                               2                    4       10

Signature: ____________

Date: ________________
