
06apriori Edited v3

Chapter 6 of 'Data Mining: Concepts and Techniques' (3rd ed.) discusses frequent pattern mining, including basic concepts, methods for mining frequent itemsets, and evaluation techniques for identifying interesting patterns. It highlights the significance of frequent patterns in applications such as market basket analysis and DNA sequence analysis. The chapter also covers scalable mining methods such as the Apriori algorithm and introduces closed patterns and max-patterns, which reduce the number of patterns that must be reported.

Uploaded by

Ali


Data Mining:

Concepts and Techniques

(3rd ed.)

— Chapter 6 —

May 10, 2021 Data Mining: Concepts and Techniques
Chapter 6: Mining Frequent Patterns, Association and Correlations: Basic Concepts and Methods

• Basic Concepts
• Frequent Itemset Mining Methods
• Which Patterns Are Interesting? Pattern Evaluation Methods
• Summary

What Is Frequent Pattern Analysis?

• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
• Motivation: finding inherent regularities in data
  • What products are often purchased together? Beer and diapers?!
  • What are the subsequent purchases after buying a PC?
  • What kinds of DNA are sensitive to this new drug?
  • Can we automatically classify web documents?
• Applications
  • Basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Why Is Freq. Pattern Mining Important?

• Frequent pattern: an intrinsic and important property of datasets
• Foundation for many essential data mining tasks
  • Association, correlation, and causality analysis
  • Sequential, structural (e.g., sub-graph) patterns
  • Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  • Classification: discriminative, frequent pattern analysis
  • Cluster analysis: frequent pattern-based clustering
  • Data warehousing: iceberg cube and cube-gradient

Basic Concepts: Frequent Itemset

| Tid | Items bought |
|-----|--------------|
| 10 | Beer, Nuts, Diaper |
| 20 | Beer, Coffee, Diaper |
| 30 | Beer, Diaper, Eggs |
| 40 | Nuts, Eggs, Milk |
| 50 | Nuts, Coffee, Diaper, Eggs, Milk |

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

• Itemset: a set of one or more items
• k-itemset: X = {x1, …, xk}
• (Absolute) support, or support count, of X: the frequency (number of occurrences) of itemset X
• (Relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
• An itemset X is frequent if X's support is no less than a minsup threshold

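The definitions of absolute and relative support can be checked directly against the five transactions in the table. The following is a minimal sketch; the function names (`support_count`, `relative_support`, `is_frequent`) are illustrative choices, not names from the slides.

```python
# Transactions from the slide's table (Tid -> items bought).
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset, transactions):
    """Absolute support: number of transactions that contain every item."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def relative_support(itemset, transactions):
    """Relative support: fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """An itemset is frequent if its relative support is at least minsup."""
    return relative_support(itemset, transactions) >= minsup


print(support_count({"Beer", "Diaper"}, transactions))     # 3
print(relative_support({"Beer", "Diaper"}, transactions))  # 0.6
print(is_frequent({"Coffee"}, transactions, minsup=0.5))   # False
```

With minsup = 50%, an itemset needs at least 3 of the 5 transactions, so {Beer, Diaper} qualifies and {Coffee} (2 transactions) does not.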
Basic Concepts: Association Rules

| Tid | Items bought |
|-----|--------------|
| 10 | Beer, Nuts, Diaper |
| 20 | Beer, Coffee, Diaper |
| 30 | Beer, Diaper, Eggs |
| 40 | Nuts, Eggs, Milk |
| 50 | Nuts, Coffee, Diaper, Eggs, Milk |

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

• Find all the rules X → Y with minimum support and confidence
  • Support, s: probability that a transaction contains X ∪ Y
  • Confidence, c: conditional probability that a transaction having X also contains Y
• Let minsup = 50%, minconf = 50%
  • Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
• Association rules (many more!):
  • Beer → Diaper (60%, 100%)
  • Diaper → Beer (60%, 75%)

Compute Support and Confidence
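
As a sketch of this computation, the snippet below derives support and confidence for the two rules on the previous slide from the same five transactions. The helper names (`count`, `rule_metrics`) are hypothetical; confidence is computed from raw counts to avoid floating-point noise, and the sketch assumes the antecedent occurs in at least one transaction.

```python
# The five transactions from the slide's table.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = count(X and Y together) / number of transactions
    confidence = count(X and Y together) / count(X)
    Assumes count(X) > 0.
    """
    both = count(set(antecedent) | set(consequent), transactions)
    support = both / len(transactions)
    confidence = both / count(antecedent, transactions)
    return support, confidence


for x, y in [({"Beer"}, {"Diaper"}), ({"Diaper"}, {"Beer"})]:
    s, c = rule_metrics(x, y, transactions)
    print(f"{sorted(x)} -> {sorted(y)}: support={s:.0%}, confidence={c:.0%}")
```

The two rules come out at (60%, 100%) and (60%, 75%), matching the figures on the association rules slide.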


Closed Patterns and Max-Patterns

• A long pattern contains a combinatorial number of sub-patterns, e.g., {a1, …, a100} contains $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ sub-patterns!
• Solution: mine closed patterns and max-patterns instead
• An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X (proposed by Pasquier et al. @ ICDT'99)
• An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X (proposed by Bayardo @ SIGMOD'98)
• Closed patterns are a lossless compression of frequent patterns
  • Reducing the number of patterns and rules

Closed Patterns and Max-Patterns

• Exercise. DB = {<a1, …, a100>, <a1, …, a50>}
  • Min_sup = 1
• What is the set of closed itemsets?
  • <a1, …, a100>: 1
  • <a1, …, a50>: 2
• What is the set of max-patterns?
  • <a1, …, a100>: 1
• What is the set of all patterns?
  • Every non-empty subset of {a1, …, a100}: $2^{100} - 1$ itemsets!!

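The exercise is too large to enumerate, but a scaled-down analogue shows the same behaviour. The sketch below assumes a toy database {<a1, …, a4>, <a1, a2>} in place of the slide's 100-item one, and uses brute-force enumeration, which is only feasible for tiny inputs. The function names are illustrative.

```python
from itertools import combinations


def frequent_itemsets(db, min_sup):
    """Brute force: map every itemset with support count >= min_sup to its count."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            itemset = frozenset(cand)
            count = sum(1 for t in db if itemset <= t)
            if count >= min_sup:
                result[itemset] = count
    return result


def closed_and_max(freq):
    """Split frequent itemsets into closed patterns and max-patterns."""
    closed, maximal = {}, {}
    for x, sup in freq.items():
        # Proper supersets of x that are themselves frequent.
        supersets = [y for y in freq if x < y]
        # Closed: no proper superset has the same support.
        if all(freq[y] != sup for y in supersets):
            closed[x] = sup
        # Max-pattern: no proper superset is frequent at all.
        if not supersets:
            maximal[x] = sup
    return closed, maximal


# Toy analogue of the exercise (4 items instead of 100; an assumption).
db = [frozenset({"a1", "a2", "a3", "a4"}), frozenset({"a1", "a2"})]
freq = frequent_itemsets(db, min_sup=1)
closed, maximal = closed_and_max(freq)

print(len(freq))  # 15 frequent itemsets, i.e. 2**4 - 1
for itemset, sup in closed.items():
    print(sorted(itemset), sup)
```

Here 15 frequent itemsets collapse to two closed patterns ({a1, a2} with support 2 and {a1, …, a4} with support 1) and a single max-pattern, mirroring the answer to the exercise.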
Scalable Frequent Itemset Mining Methods

• Apriori: A Candidate Generation-and-Test Approach
• Mining Closed Frequent Patterns and Max-Patterns

The Downward Closure Property and Scalable Mining Methods

• The downward closure property of frequent patterns
  • Any subset of a frequent itemset must be frequent
  • If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  • i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
• Scalable mining methods: three major approaches
  • Apriori (Agrawal & Srikant @ VLDB'94)
  • Frequent pattern growth (FPgrowth: Han, Pei & Yin @ SIGMOD'00)
  • Vertical data format approach (Charm: Zaki & Hsiao @ SDM'02)

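Downward closure follows from the fact that a subset can never occur in fewer transactions than its superset. The sketch below illustrates this on the beer/diaper transactions from the earlier slides; the helper names are hypothetical, and the check enumerates all subsets, so it is meant only for small itemsets.

```python
from itertools import combinations

# Transactions reused from the "Basic Concepts" slides.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]


def support_count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def all_subsets_frequent(itemset, transactions, min_count):
    """True if every non-empty proper subset of itemset meets min_count."""
    items = sorted(itemset)
    return all(
        support_count(sub, transactions) >= min_count
        for k in range(1, len(items))
        for sub in combinations(items, k)
    )


big = {"Beer", "Diaper", "Nuts"}
small = {"Beer", "Diaper"}
# The superset's count can never exceed the subset's count.
print(support_count(big, transactions), support_count(small, transactions))
print(all_subsets_frequent(big, transactions, support_count(big, transactions)))
```

Read in the other direction, the same property gives Apriori its pruning rule: once an itemset is known to be infrequent, none of its supersets needs to be counted.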
Apriori: A Candidate Generation & Test Approach

• Apriori pruning principle: if any itemset is infrequent, its supersets should not be generated or tested! (Agrawal & Srikant @ VLDB'94; Mannila et al. @ KDD'94)
• Method:
  • Initially, scan the DB once to get the frequent 1-itemsets
  • Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  • Test the candidates against the DB
  • Terminate when no frequent or candidate set can be generated

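The "generate" step can be sketched as a join followed by a prune. The version below is one simple way to do it, not necessarily the exact procedure from the cited papers: it joins every pair of frequent k-itemsets whose union has k+1 items, then discards any candidate that has an infrequent k-subset. The name `apriori_gen` and the sample input are assumptions for illustration.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Generate (k+1)-item candidates from a collection of frequent k-itemsets.

    Join: union pairs of k-itemsets that differ in exactly one item.
    Prune: keep a candidate only if all of its k-subsets are frequent.
    """
    frequent_k = {frozenset(s) for s in frequent_k}
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # not a valid join of two k-itemsets
            k_subsets = combinations(union, len(a))
            if all(frozenset(sub) in frequent_k for sub in k_subsets):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
L2 = [{"A", "C"}, {"B", "C"}, {"B", "E"}, {"C", "E"}]
C3 = apriori_gen(L2)
print([sorted(c) for c in C3])  # [['B', 'C', 'E']]
```

{A, B, C} and {A, C, E} are joined but pruned, because {A, B} and {A, E} are not in the frequent 2-itemsets; only {B, C, E} survives to be counted against the database.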
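The Apriori Algorithm: An Example

Sup_min = 2

Database TDB

| Tid | Items |
|-----|-------|
| 10 | A, C, D |
| 20 | B, C, E |
| 30 | A, B, C, E |
| 40 | B, E |

1st scan: C1

| Itemset | sup |
|---------|-----|
| {A} | 2 |
| {B} | 3 |
| {C} | 3 |
| {D} | 1 |
| {E} | 3 |

L1

| Itemset | sup |
|---------|-----|
| {A} | 2 |
| {B} | 3 |
| {C} | 3 |
| {E} | 3 |

C2 (generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan: C2 with counts

| Itemset | sup |
|---------|-----|
| {A, B} | 1 |
| {A, C} | 2 |
| {A, E} | 1 |
| {B, C} | 2 |
| {B, E} | 3 |
| {C, E} | 2 |

L2

| Itemset | sup |
|---------|-----|
| {A, C} | 2 |
| {B, C} | 2 |
| {B, E} | 3 |
| {C, E} | 2 |

C3 (generated from L2): {B, C, E}

3rd scan: L3

| Itemset | sup |
|---------|-----|
| {B, C, E} | 2 |
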
Finding the Association Rules

| Itemset | sup |
|---------|-----|
| {B, C, E} | 2 |
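
One way to derive rules from the frequent itemset {B, C, E} is to try every non-empty proper subset as the antecedent and keep the rules whose confidence meets a threshold. The sketch below does this over the example database TDB; the slide does not state a minconf for this example, so the value 0.75 is an assumption, as are the function names.

```python
from itertools import combinations

# Example database TDB from the Apriori example slide.
tdb = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset, transactions, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = frozenset(itemset)
    total = count(itemset, transactions)
    rules = []
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            confidence = total / count(lhs, transactions)
            if confidence >= min_conf:
                rules.append((lhs, itemset - lhs, confidence))
    return rules


# min_conf=0.75 is a hypothetical threshold chosen for illustration.
for lhs, rhs, conf in rules_from_itemset({"B", "C", "E"}, tdb, min_conf=0.75):
    print(f"{sorted(lhs)} -> {sorted(rhs)} (confidence {conf:.0%})")
```

On this data, {B, C} → {E} and {C, E} → {B} reach 100% confidence, while the four remaining rules have confidence 2/3 and are filtered out at this threshold.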

The Apriori Algorithm (Pseudo-Code)

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;

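The pseudo-code can be turned into a short runnable sketch. The version below follows the same level-wise loop but makes no attempt at the optimizations used in practice (hash trees, transaction trimming); the function name `apriori` and the use of an absolute support count are choices made here for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining; min_support is an absolute count.

    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}

    frequent = {}
    k = 1
    while current:  # loop while Lk is non-empty
        frequent.update(current)
        prev = set(current)

        # Ck+1: join Lk with itself, prune candidates with an infrequent k-subset.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k + 1 and all(
                    frozenset(sub) in prev for sub in combinations(union, k)
                ):
                    candidates.add(union)

        # Scan the database and count each candidate contained in a transaction.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # Lk+1: candidates that meet the minimum support.
        current = {s: c for s, c in counts.items() if c >= min_support}
        k += 1
    return frequent


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_support=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Run on the example database TDB with a minimum support count of 2, this yields four frequent 1-itemsets, four frequent 2-itemsets, and {B, C, E} with support 2, in line with the worked example.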
Interestingness Measure: Correlations (Lift)

• play basketball → eat cereal [40%, 66.7%] is misleading
  • The overall percentage of students eating cereal is 75% > 66.7%
• play basketball → not eat cereal [20%, 33.3%] is more accurate, although with lower support and confidence
• Measure of dependent/correlated events: lift

$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

| | Basketball | Not basketball | Sum (row) |
|------------|------------|----------------|-----------|
| Cereal | 2000 | 1750 | 3750 |
| Not cereal | 1000 | 250 | 1250 |
| Sum (col.) | 3000 | 2000 | 5000 |

$$\mathrm{lift}(B, C) = \frac{2000/5000}{(3000/5000) \times (3750/5000)} = 0.89$$

$$\mathrm{lift}(B, \neg C) = \frac{1000/5000}{(3000/5000) \times (1250/5000)} = 1.33$$

• lift >= 1: accept rule; lift < 1: reject rule (lift = 1 indicates independence, lift > 1 positive correlation, lift < 1 negative correlation)

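The two lift values can be reproduced from the contingency table counts. This is a minimal sketch; the function name and argument order are assumptions made for the example.

```python
def lift(n_both, n_a, n_b, n_total):
    """lift(A, B) = P(A and B together) / (P(A) * P(B)), from raw counts."""
    p_both = n_both / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return p_both / (p_a * p_b)


# Counts from the basketball/cereal contingency table.
n_total = 5000
n_basketball = 3000
n_cereal = 3750
n_not_cereal = 1250

lift_b_c = lift(2000, n_basketball, n_cereal, n_total)
lift_b_not_c = lift(1000, n_basketball, n_not_cereal, n_total)

print(f"lift(basketball, cereal)     = {lift_b_c:.2f}")      # 0.89
print(f"lift(basketball, not cereal) = {lift_b_not_c:.2f}")  # 1.33
```

A lift below 1 for basketball and cereal means the two occur together less often than independence would predict, which is why the 66.7% confidence of that rule is misleading.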
Are lift and χ² Good Measures of Correlation?

• "Buy walnuts → buy milk [1%, 80%]" is misleading if 85% of customers buy milk
• Support and confidence are not good indicators of correlation
• Over 20 interestingness measures have been proposed (see Tan, Kumar, Srivastava @ KDD'02)
• Which are good ones?

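The walnuts example can be worked through with lift. Since lift divides the joint probability by both marginals, it equals the rule's confidence divided by the overall probability of the consequent. The symbols W (buys walnuts) and M (buys milk) are introduced here for illustration.

```latex
\mathrm{lift}(W, M)
  = \frac{P(W \cup M)}{P(W)\,P(M)}
  = \frac{\mathrm{confidence}(W \rightarrow M)}{P(M)}
  = \frac{0.80}{0.85}
  \approx 0.94 < 1
```

So although the rule has 80% confidence, buying walnuts is slightly negatively correlated with buying milk under these figures.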
Summary

• Basic concepts: association rules, support-confidence framework, closed and max-patterns
• Scalable frequent pattern mining methods
  • Apriori (candidate generation & test)
• Which patterns are interesting?
  • Pattern evaluation methods

Ref: Basic Concepts of Frequent Pattern Mining

• (Association rules) R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93.
• (Max-pattern) R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98.
• (Closed pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99.
• (Sequential pattern) R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95.

Ref: Apriori and Its Improvements

• R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94.
• H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. KDD'94.
• A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. VLDB'95.
• J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95.
• H. Toivonen. Sampling large databases for association rules. VLDB'96.
• S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket analysis. SIGMOD'97.
• S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. SIGMOD'98.

Ref: Depth-First, Projection-Based FP Mining

• R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. J. Parallel and Distributed Computing'02.
• J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD'00.
• J. Liu, Y. Pan, K. Wang, and J. Han. Mining frequent item sets by opportunistic projection. KDD'02.
• J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining top-k frequent closed patterns without minimum support. ICDM'02.
• J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. KDD'03.
• G. Liu, H. Lu, W. Lou, and J. X. Yu. On computing, storing and querying frequent patterns. KDD'03.
• G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. Proc. ICDM'03 Int. Workshop on Frequent Itemset Mining Implementations (FIMI'03), Melbourne, FL, Nov. 2003.

Ref: Vertical Format and Row Enumeration Methods

• M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of association rules. DAMI'97.
• Zaki and Hsiao. CHARM: An efficient algorithm for closed itemset mining. SDM'02.
• C. Bucila, J. Gehrke, D. Kifer, and W. White. DualMiner: A dual-pruning algorithm for itemsets with constraints. KDD'02.
• F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. Zaki. CARPENTER: Finding closed patterns in long biological datasets. KDD'03.
• H. Liu, J. Han, D. Xin, and Z. Shao. Mining interesting patterns from very high dimensional data: A top-down row enumeration approach. SDM'06.

Ref: Mining Correlations and Interesting Rules

• M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM'94.
• S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. SIGMOD'97.
• C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB'98.
• P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. KDD'02.
• E. Omiecinski. Alternative interest measures for mining associations. TKDE'03.
• T. Wu, Y. Chen, and J. Han. Association mining in large databases: A re-examination of its measures. PKDD'07.

Ref: Freq. Pattern Mining Applications

• Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. Efficient discovery of functional and approximate dependencies using partitions. ICDE'98.
• H. V. Jagadish, J. Madar, and R. Ng. Semantic compression and pattern extraction with fascicles. VLDB'99.
• T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk. Mining database structure; or how to build a data quality browser. SIGMOD'02.
• K. Wang, S. Zhou, and J. Han. Profit mining: From patterns to actions. EDBT'02.