DMT Unit-IV - UR20 - New
ASSOCIATION ANALYSIS:
“Association analysis is the process
of discovering interesting
relationships hidden in large data
sets”.
For example, huge amounts of customer purchase data are collected daily at the counters of grocery stores. Such data are commonly known as market basket transactions.
Each row in a market basket table corresponds to a transaction, which contains a unique identifier labeled TID and the set of items bought by a given customer.
The following rule can be extracted from the data set:
{Diapers} → {Beer}.
The rule suggests that many customers who buy
diapers also buy beer.
PROBLEM DEFINITION:
This section introduces the basic terminology used in association analysis.
Binary Representation
Market basket data can be represented in a binary format, as shown in the table below.

TID  Bread  Milk  Diapers  Beer  Eggs  Cola
 1     1     1       0      0     0     0
 2     1     0       1      1     1     0
 3     0     1       1      1     0     1
 4     1     1       1      1     0     0
 5     1     1       1      0     0     1
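As a minimal sketch (the variable and function names here are illustrative, not from the notes), this binary table can be stored and queried in plain Python; the support count of an itemset is simply the number of rows whose corresponding columns are all 1:

items = ["Bread", "Milk", "Diapers", "Beer", "Eggs", "Cola"]

# One row per TID; a 1 means the item appears in that transaction.
table = {
    1: [1, 1, 0, 0, 0, 0],
    2: [1, 0, 1, 1, 1, 0],
    3: [0, 1, 1, 1, 0, 1],
    4: [1, 1, 1, 1, 0, 0],
    5: [1, 1, 1, 0, 0, 1],
}

def support_count(itemset):
    # Number of transactions containing every item in `itemset`.
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in table.values())

print(support_count({"Diapers", "Beer"}))  # 3 of the 5 transactions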
[Figure: the itemset lattice over the items {a, b, c, d, e}, enumerating candidate itemsets from the single items up to the full set.]
Apriori Principle.
“If an item set is frequent, then all of its subsets
must also be frequent”.
• Suppose {c, d, e} is a frequent itemset. Clearly, any transaction that contains {c, d, e} must also contain its subsets {c, d}, {c, e}, {d, e}, {c}, {d}, and {e}.
• As a result, if {c, d, e} is frequent, then all
subsets of {c, d, e} (i.e., the shaded itemsets in
this figure) must also be frequent.
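To make this concrete, a tiny Python snippet (illustrative, not part of the notes) enumerates the non-empty proper subsets of {c, d, e}, every one of which must be frequent if {c, d, e} is:

from itertools import combinations

itemset = ("c", "d", "e")
# All non-empty proper subsets: ('c',), ('d',), ('e',),
# ('c', 'd'), ('c', 'e'), ('d', 'e')
subsets = [s for k in range(1, len(itemset))
           for s in combinations(itemset, k)]
print(subsets)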
[Figure: the itemset lattice rooted at null over {a, b, c, d, e}, with the subsets of the frequent itemset {c, d, e} shaded.]
• Apriori pruning principle: if any itemset is infrequent, its supersets need not be generated or tested.
• Method (a sketch of the candidate-generation step follows this list):
– Initially, scan the DB once to get the frequent 1-itemsets.
– Generate length-(k+1) candidate itemsets from the length-k frequent itemsets.
– Test the candidates against the DB.
– Terminate when no frequent or candidate set can be generated.
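The generation step can be sketched as follows (the function name and the sorted-tuple representation are my own assumptions): candidates of length k+1 are formed by joining frequent k-itemsets that share their first k−1 items, and any candidate with an infrequent k-subset is pruned.

from itertools import combinations

def generate_candidates(frequent_k):
    # Join frequent k-itemsets (sorted tuples) sharing their first k-1
    # items, then apply Apriori pruning: every k-subset must be frequent.
    freq = set(frequent_k)
    candidates = []
    for a in frequent_k:
        for b in frequent_k:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                if all(s in freq for s in combinations(cand, len(cand) - 1)):
                    candidates.append(cand)
    return candidates

print(generate_candidates([("B", "C"), ("B", "E"), ("C", "E")]))
# [('B', 'C', 'E')]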
The Apriori Algorithm—An Example-1
Supmin = 2

Database TDB:
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan → C1:
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1 (candidates in C1 with sup ≥ Supmin):
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2 (generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan → C2 with counts:
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2:
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

From L2 the only length-3 candidate is C3 = {{B, C, E}}; a 3rd scan gives L3 = {{B, C, E}} with support 2, after which no further candidates can be generated.
The Apriori Algorithm
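Below is a compact, illustrative Python implementation of the whole loop (the structure and names are my own, not from the notes); running it on the TDB database above with Supmin = 2 reproduces L1, L2, and L3.

from itertools import combinations

def apriori(transactions, min_sup):
    # Return {itemset (sorted tuple): support count} for all frequent itemsets.
    def count(cands):
        counts = {c: 0 for c in cands}
        for t in transactions:
            for c in cands:
                if set(c) <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_sup}

    items = sorted({i for t in transactions for i in t})
    frequent = count([(i,) for i in items])  # L1, from the 1st scan
    result = dict(frequent)
    k = 1
    while frequent:
        level = sorted(frequent)
        # Join step: merge k-itemsets sharing their first k-1 items.
        cands = [a + (b[-1],) for a in level for b in level
                 if a[:-1] == b[:-1] and a[-1] < b[-1]]
        # Prune step: drop candidates with an infrequent k-subset.
        cands = [c for c in cands
                 if all(s in frequent for s in combinations(c, k))]
        frequent = count(cands)  # one additional DB scan per level
        result.update(frequent)
        k += 1
    return result

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(apriori(tdb, 2))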
RULE GENERATION
How can association rules be extracted efficiently from a given frequent itemset?
Each frequent k-itemset Y can produce up to 2^k − 2 association rules.
An association rule can be extracted by partitioning the itemset Y into two non-empty subsets, X and Y − X, such that X → Y − X satisfies the confidence threshold.
Example.
Let X = {1, 2, 3} be a frequent itemset. There are six candidate association rules that can be generated from X:
1) {1, 2} → {3},
2) {1, 3} → {2},
3) {2, 3} → {1},
4) {1} → {2, 3},
5) {2} → {1, 3}, and
6) {3} → {1, 2}.
Note: because the support of each of these rules is identical to the support of X, they all automatically satisfy the support threshold.
Consider the rule {1, 2} → {3},
which is generated from the frequent item set
X = {1, 2, 3}.
The confidence for this rule is
σ({1, 2, 3})/σ({1, 2}).
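A minimal sketch of this partitioning scheme in Python (sigma and rules_from are illustrative names; the small transaction set and the 0.6 confidence threshold are made up for the demonstration):

from itertools import combinations

transactions = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {2, 3}, {1, 3}]

def sigma(itemset):
    # Support count: transactions containing every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)

def rules_from(Y, min_conf):
    # Emit X -> Y - X for every non-empty proper subset X of Y
    # whose confidence sigma(Y)/sigma(X) meets the threshold.
    for k in range(1, len(Y)):
        for X in map(set, combinations(Y, k)):
            conf = sigma(Y) / sigma(X)
            if conf >= min_conf:
                yield X, Y - X, conf

for X, rhs, conf in rules_from({1, 2, 3}, 0.6):
    print(X, "->", rhs, f"(conf = {conf:.2f})")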
FP-GROWTH ALGORITHM
The FP-growth algorithm first constructs a data structure called an FP-tree and then extracts frequent itemsets directly from this structure.
FP-TREE REPRESENTATION
An FP-tree is a compressed representation of
the input data.
It is constructed by reading the data set one
transaction at a time and mapping each
transaction onto a path in the FP-tree.
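A bare-bones sketch of this construction (class and field names are my own assumptions; a real implementation would also maintain header-table links to support mining):

from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Pass 1: count item supports and keep only the frequent items.
    sup = defaultdict(int)
    for t in transactions:
        for i in t:
            sup[i] += 1
    frequent = {i for i, n in sup.items() if n >= min_sup}

    # Pass 2: insert each transaction as a path, items ordered by
    # decreasing support so that common prefixes share tree nodes.
    root = FPNode(None, None)
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in frequent),
                        key=lambda i: (-sup[i], i)):
            node = node.children.setdefault(i, FPNode(i, node))
            node.count += 1
    return root

tree = build_fp_tree([{"a", "b"}, {"b", "c", "d"}, {"a", "b", "c"}], 2)
print(tree.children["b"].count)  # 'b' is the most frequent item: count 3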
EXAMPLE-1