
21CSE355T- Data Mining and Analytics

Unit-2
School of Computing - SRMIST Kattankulathur Campus
Association Rules

● Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.

● Analyses and predicts customer behaviour.


● If/then statements

Example
bread => butter
buys{onions, potatoes} => buys{tomatoes}

This information forms the basis for marketing activities such as product promotion and product pricing.

2
Association Rules (continued)

Understanding the buying patterns can help to increase sales in several ways.
Example:
● If there is a pair of items, X and Y, that are frequently bought together:

● Both X and Y can be placed on the same shelf, so that buyers of one item
would be prompted to buy the other.

● Promotional discounts could be applied to just one out of the two items.

● Advertisements on X could be targeted at buyers who purchase Y.

● X and Y could be combined into a new product, such as having Y in flavours of X.
3
Parts of Association Rule

bread => butter[20%,45%]


● Bread: Antecedent
● Butter: Consequent
● 20%: Support
● 45%: Confidence

● Support: denotes the probability that a transaction contains both bread and butter.
● Confidence: denotes the probability that a transaction containing bread also
contains butter.

4
Examples to calculate the Support and confidence

● Consider, in a supermarket:

Total transactions: 100
Transactions containing bread: 20
So 20/100 * 100 = 20% [Support].

Of those 20 transactions, butter occurs in 9.
So 9/20 * 100 = 45% [Confidence].

5
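The same calculation can be scripted. Below is a minimal Python sketch; the transaction list is hypothetical and chosen only to illustrate the two formulas, not taken from the slides.

# Minimal sketch: support and confidence of the rule bread => butter.
# The transactions below are made-up illustrative data.
transactions = [
    {"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "jam"},
    {"milk", "eggs"}, {"bread"}, {"butter", "milk"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Fraction of transactions containing the antecedent that also contain the consequent.
    return support(set(antecedent) | set(consequent), transactions) / support(antecedent, transactions)

print(support({"bread", "butter"}, transactions))       # support of the rule
print(confidence({"bread"}, {"butter"}, transactions))  # confidence of the rule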
Examples to calculate the Support and confidence
● Support: This says how popular an itemset is, as measured by the proportion of transactions in which the itemset appears. In the table, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.

● Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X in which item Y also appears. In the table, the confidence of {apple -> beer} is 3 out of 4, or 75%.

6
Classification of Association Rules

● Single Dimensional Association Rule
Eg: Bread => butter
Dimension: buying (one dimension)
● Multidimensional Association Rule
With 2 or more predicates or dimensions.
Eg: Occupation(IT), age(>22) => buys(laptop)
Dimensions must be unique; they should not repeat.
● Hybrid Dimensional Association Rule
With repeated predicates or dimensions.
Eg: Time(5 o'clock), buys(tea) => buys(biscuits)

7
Association Mining – Fields & Algorithms

● Web Usage Mining


● Banking
● Bioinformatics
● Market Basket Analysis
● Credit/Debit Card Analysis
● Product Clustering
● Catalog Design
Algorithms
● Apriori Algorithm
● Eclat Algorithm
● FP Growth Algorithm

8
● Frequent patterns are patterns (such as itemsets,
subsequences, or substructures) that appear in a data set
frequently.
● Eg: milk and bread.

● A subsequence, such as buying first a PC, then a digital camera, and then a memory card, if it occurs frequently in a shopping history database, is a (frequent) sequential pattern.

● If a substructure occurs frequently, it is called a (frequent) structured pattern.
● Eg: subgraphs, subtrees.
9
Frequent Itemset

10
Market basket Analysis

11
Market Basket Analysis

● Frequent item set mining leads to the discovery of


associations and correlations among items in large
transactional or relational data sets.
● This process analyses customer buying habits by finding
associations between the different items that customers
place in their “shopping baskets”.

12
Market Basket Analysis

13
Frequent Item sets, Closed Item sets, and Association Rules

• Rules that satisfy both a minimum support threshold (min sup) and a minimum confidence
threshold (min conf ) are called strong.
• A set of items is referred to as an item set.
• The occurrence frequency of an item set is the number of transactions that contain the itemset.
• If the relative support (the proportion of transactions in the dataset that contain a specific itemset) of an item set I satisfies a pre-specified minimum support threshold (i.e., the absolute support of I satisfies the corresponding minimum support count threshold), then I is a frequent item set.

14
Frequent Item sets, Closed Item sets, and Association Rules

In general, association rule mining can be viewed as a two-step process:

1. Find all frequent item sets: By definition, each of these itemsets will
occur at least as frequently as a predetermined minimum support
count, min sup.
2. Generate strong association rules from the frequent itemsets: By
definition, these rules must satisfy minimum support and minimum
confidence

15
Frequent Item sets, Closed Item sets, and Association Rules

● Maximal Itemset: An itemset is maximal frequent if none of its supersets are frequent.
● Closed Itemset: An itemset is closed if none of its immediate supersets has the same support count as the itemset.
● K-Itemset: An itemset that contains K items is a K-itemset. An itemset is frequent if its support count is greater than or equal to the minimum support count.

16
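The two definitions can be checked mechanically. The following Python sketch uses a tiny made-up transaction set and brute-force enumeration (the min_sup count of 2 is an assumption); it is only meant to illustrate the closed/maximal distinction, not to be an efficient miner.

from itertools import combinations

transactions = [{"A","B","C"}, {"A","B"}, {"A","C"}, {"B","C"}, {"A","B","C"}]
min_sup = 2
items = sorted(set().union(*transactions))

def count(itemset):
    # Number of transactions containing every item of `itemset`.
    return sum(set(itemset) <= t for t in transactions)

frequent = {fs: count(fs)
            for k in range(1, len(items) + 1)
            for fs in map(frozenset, combinations(items, k))
            if count(fs) >= min_sup}

for fs, sup in frequent.items():
    supersets = [g for g in frequent if fs < g]
    is_maximal = not supersets                              # no frequent proper superset
    is_closed = all(frequent[g] != sup for g in supersets)  # no superset with the same support
    print(sorted(fs), sup, "closed" if is_closed else "", "maximal" if is_maximal else "")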
Maximal & Closed Frequent Item set

17
Maximal & Closed Frequent Item set

All maximal frequent itemsets are closed frequent itemsets, but not all closed frequent itemsets are maximal frequent itemsets.

18
Apriori Algorithm

19
Frequent Pattern Mining

● Based on the completeness of patterns to be mined.
Eg: closed frequent itemsets, maximal frequent itemsets, constrained frequent itemsets, etc.
● Based on the levels of abstraction involved in the rule set.

● Based on the number of data dimensions involved in the rule.
Eg: single-dimensional association rule, multidimensional association rule.

20
● Based on the types of values handled in the rule.
Eg: Boolean association rule, quantitative association
rule

● Based on the kinds of rules to be mined.
Eg: Association rules, correlation rules.

● Based on the kinds of patterns to be mined.
Eg: Sequential pattern mining, structured pattern mining, etc.

21
Frequent Itemset Mining Methods – Apriori Alg

It is used for mining frequent itemsets for Boolean association rules.

Apriori property: All nonempty subsets of a frequent itemset must also be frequent.

Two steps are involved:
1. Join step
2. Prune step

22
23
Example -2

24
Steps of Apriori Algorithm

1. Generate the candidate 1-itemset C1.
2. Check each candidate against the required minimum support count.
3. The set of frequent 1-itemsets L1 is generated from C1 by keeping only the itemsets that satisfy the minimum support count (pruning).
4. Discover the candidate 2-itemsets by joining L1 with itself (L1 * L1) and generate C2.
5. L2 is generated by pruning the candidates that do not satisfy the minimum support.
6. Discover the candidate 3-itemsets by joining L2 with itself (L2 * L2) and generate C3.
7. L3 is generated by pruning the candidates that do not satisfy the minimum support.
8. Discover the candidate 4-itemsets by joining L3 with itself (L3 * L3) and generate C4.
9. The algorithm ends the frequent pattern mining when no frequent itemset can be generated at the next level (here, when no frequent 4-itemset is available).
A short Python sketch of these steps is given below.
25
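A compact Python sketch of this level-wise join-and-prune procedure follows; the function name and data are illustrative, and the code favours clarity over efficiency.

from itertools import combinations

def apriori(transactions, min_sup_count):
    # Returns {frozenset(itemset): support_count} for all frequent itemsets.
    transactions = [set(t) for t in transactions]

    def counts(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    # C1 -> L1
    c1 = {frozenset([i]) for t in transactions for i in t}
    Lk = {c: s for c, s in counts(c1).items() if s >= min_sup_count}
    frequent = dict(Lk)
    k = 2
    while Lk:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        prev = list(Lk)
        Ck = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        Lk = {c: s for c, s in counts(Ck).items() if s >= min_sup_count}
        frequent.update(Lk)
        k += 1
    return frequent

# The example that follows: min support 50% of 4 transactions = support count 2.
db = [{"A","B","C"}, {"A","C"}, {"A","D"}, {"B","E","F"}]
print(apriori(db, 2))   # expected to contain {A}, {B}, {C} and {A,C}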
Apriori Algorithm

26
● Find the frequent item sets in the following database
with min support 50% & min confidence 50%.
Transaction id Items Bought
2000 A,B,C
1000 A,C
4000 A,D
5000 B,E,F

● 50/100 * 4 = 2
● So the minimum support count is 2.

27
● Step 1: Find C1
Items Support count
[A] 3
[B] 2
[C] 2
[D] 1
[E] 1
[F] 1

● The minimum support count is 2, so eliminate the itemsets whose count is less than that.

28
● Step 2: Compare the candidate support counts with the minimum support count; L1 will be
Items Support
[A] 3
[B] 2
[C] 2

● Step 3: Generate candidate C2 from L1.

Items

[A,B]
[A,C]
[B,C]

29
● Step 4: Scan D for count of each candidate in C2 and find
support.
Items Support
[A,B] 1
[A,C] 2
[B,C] 1

● Step 5: Compare the candidate C2 support counts with the minimum support count; L2 will be

Items Support
[A,C] 2

● Step 6: So the data contains the frequent itemset [A,C].


30
Association Rule Support Confidence Confidence %
A->C 2 2/3=0.66 66%
C->A 2 2/2=1 100%

Min Confidence - 50%

So final rules are

Rule 1: A -> C
Rule 2: C -> A

31
Generating Association Rules for Frequent Item sets

● Association rules can be generated as follows:

32
Generating Association Rules for Frequent Item sets

Minimum Confidence : 70%


33
Generating Association Rules for Frequent Item sets

● R1 I1^I2 -> I5
Confidence = SC(I1,I2,I5)/SC(I1,I2) = 2/4=50%. (Rejected)
● R2 I1^I5 -> I2
Confidence = SC(I1,I5,I2)/SC(I1,I5) = 2/2=100%. (Accepted)
● R3 I2^I5 -> I1
Confidence = SC(I2,I5,I1)/SC(I2,I5) = 2/2=100%. (Accepted)
● R4 I1->I2^I5
Confidence = SC(I1,I2,I5)/SC(I1) = 2/6=33%. (Rejected)
● R5 I2->I1^I5
Confidence = SC(I2,I1,I5)/SC(I2) = 2/7=29%. (Rejected)
● R6 I5->I1^I2
Confidence = SC(I5,I1,I2)/SC(I5) = 2/2=100%. (Accepted)
34
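The rule-generation step can be sketched in the same way: for every frequent itemset with at least two items, each non-empty proper subset is tried as an antecedent, and the rule is kept if its confidence meets the threshold. The support counts below are the ones used in the I1/I2/I5 walkthrough above; the function name is illustrative.

from itertools import combinations

def generate_rules(frequent, min_conf):
    # frequent: {frozenset(itemset): support_count}. Returns (antecedent, consequent, confidence) triples.
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]   # SC(itemset) / SC(antecedent)
                if conf >= min_conf:
                    rules.append((set(antecedent), set(consequent), conf))
    return rules

support_counts = {frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I5"]): 2,
                  frozenset(["I1","I2"]): 4, frozenset(["I1","I5"]): 2, frozenset(["I2","I5"]): 2,
                  frozenset(["I1","I2","I5"]): 2}
for a, c, conf in generate_rules(support_counts, 0.70):
    print(a, "->", c, f"{conf:.0%}")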
Problem - 1
● A database has five transactions. Let the minimum support and confidence be min_sup = 60%, min_conf = 100%.
● Find the frequent itemsets and generate the association rules using the Apriori algorithm.

TID   ITEMS
T1    {M,O,N,K,E,Y}
T2    {D,O,N,K,E,Y}
T3    {M,A,K,E}
T4    {M,U,C,K,Y}
T5    {C,O,O,K,I,E}

35
Problem - 2
● A database has five transactions. Let the minimum support and confidence be min_sup = 3, min_conf = 80%.
TID ITEMS
T1 {1,2,3,4,5,6}
T2 {7,2,3,4,5,6}
T3 {1,8,4,5}
T4 {1,9,0,4,6}
T5 {0,2,2,4,5}

● Find the frequent item sets and generate the


association rules using Apriori algorithm.

36
Improving the Efficiency of Apriori

● Transaction Reduction (reducing the number of transactions scanned in future iterations): A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets.

● Partitioning (partitioning the data to find candidate itemsets): A partitioning technique can be used that requires just two database scans to mine the frequent itemsets.
● In Phase I, the algorithm subdivides the transactions of D into n non-overlapping partitions. If the minimum support threshold for transactions in D is min_sup, then the minimum support count for a partition is min_sup × the number of transactions in that partition.
● All frequent itemsets within a partition are found. These are referred to as local frequent itemsets.

37
Improving the Efficiency of Apriori

● In Phase II, any itemset that is potentially frequent with respect to D must occur as a frequent itemset in at least one of the partitions. Therefore, all local frequent itemsets are candidate itemsets with respect to D.
● The collection of frequent itemsets from all partitions forms the global candidate itemsets.

38
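A rough sketch of this two-phase idea is shown below. It assumes some frequent-itemset miner is available for the local passes (for example the apriori() sketch given earlier); the partitioning scheme and parameter names are illustrative.

def partition_mine(transactions, min_sup_fraction, n_partitions, find_local_frequent):
    size = -(-len(transactions) // n_partitions)          # ceiling division
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Phase I: local frequent itemsets per partition (local minimum support count).
    global_candidates = set()
    for part in partitions:
        local_min_count = max(1, int(min_sup_fraction * len(part)))
        global_candidates |= set(find_local_frequent(part, local_min_count))

    # Phase II: one more full scan of D to count the global candidates.
    global_min_count = min_sup_fraction * len(transactions)
    counts = {c: sum(c <= set(t) for t in transactions) for c in global_candidates}
    return {c: n for c, n in counts.items() if n >= global_min_count}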
Improving the Efficiency of Apriori

Sampling (mining on a subset of the given data):
● Pick a random sample S of the given data D, and then search for frequent itemsets in S instead of D.

Dynamic itemset counting (adding candidate itemsets at different points during a scan):
● The database is partitioned into blocks marked by start points.
● New candidate itemsets can be added at any start point.

39
Hash Based techniques

● Hash-based technique (hashing itemsets into corresponding buckets):
● A hash-based technique can be used to reduce the size of the candidate k-itemsets.

40
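A minimal sketch of the hashing idea for 2-itemsets (in the spirit of the DHP technique): while the first scan counts 1-itemsets, every 2-itemset of every transaction is hashed into a bucket; any candidate 2-itemset whose bucket count is below the minimum support count can be discarded. The bucket count and hash function here are illustrative assumptions.

from itertools import combinations

def bucket_counts(transactions, n_buckets=7):
    # Hash every 2-itemset of every transaction into one of n_buckets buckets.
    counts = [0] * n_buckets
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[hash(pair) % n_buckets] += 1
    return counts

def prune_candidates(candidates, counts, min_sup_count, n_buckets=7):
    # Keep only candidate 2-itemsets whose bucket count can reach min_sup_count.
    return [c for c in candidates
            if counts[hash(tuple(sorted(c))) % n_buckets] >= min_sup_count]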
Hash Based techniques

41
Frequent Pattern Growth Algorithm

42
Mining Frequent Item sets without Candidate
Generation
Disadvantages of the Apriori algorithm:
● It may need to generate a huge number of candidate sets.
● It may need to repeatedly scan the database and check a large set of
candidates by pattern matching.
FP Growth Algorithm

43
Mining Frequent Item sets without Candidate Generation

● We start from the node that has the minimum support count, i.e., I5.
● We exclude the node with the maximum support count, i.e., I2, when preparing the table.

44
Mining Frequent Item sets without Candidate Generation

The Conditional FP-Tree associated with the Conditional node I3.

45
FP Growth Algorithm

46
FP Growth Algorithm Vs Apriori Algorithm

FP Growth Algorithm | Apriori Algorithm

1. The FP growth algorithm is faster than the Apriori algorithm. | It is slower than the FP growth algorithm.

2. It is a tree-based algorithm (it builds an FP-tree). | It is a candidate-generation (join-and-prune) algorithm.

3. It requires only 2 database scans. | It requires multiple database scans to generate the candidate sets.

4. It uses depth-first search. | It uses breadth-first search.

5. Less accurate. | More accurate.

47
FP GROWTH ALGORITHM Vs APRIORI ALGORITHM

48
Problems 1 – FP Growth Tree

● A database has five transactions. Let the minimum support min_sup = 60%.
● Find the frequent itemsets using the FP growth algorithm.

TID   ITEMS
T1    {M,O,N,K,E,Y}
T2    {D,O,N,K,E,Y}
T3    {M,A,K,E}
T4    {M,U,C,K,Y}
T5    {C,O,O,K,I,E}

49
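Answers to problems like this one can be cross-checked against a library implementation, for example mlxtend (assuming the package is installed; the result should still be verified against a manual FP-tree construction).

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [list("MONKEY"), list("DONKEY"), list("MAKE"), list("MUCKY"), list("COOKIE")]

te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

# min_sup = 60% of 5 transactions, i.e. a support count of 3.
print(fpgrowth(df, min_support=0.6, use_colnames=True))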
Problems 2 – FP Growth Tree

● A database has eight transactions. Let the minimum support min_sup = 30%.

TID ITEMS
1 {E,A,D,B}
2 {D,A,C,E,B}
3 {C,A,B,E}
4 {B,A,D}
5 {D}
6 {D,B}
7 {A,D,E}
8 {B,C}

● Find the frequent item sets using FP growth Algorithm.

50
Mining Frequent Item sets Using Vertical Data Format
The horizontal data format is converted to the vertical data format.

51
Mining Frequent Item sets Using Vertical Data Format

52
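Converting the horizontal format (TID -> items) into the vertical format (item -> TID set) is a simple transformation, and the support of a larger itemset is then the size of the intersection of its TID sets. A brief sketch with made-up data:

from collections import defaultdict

# Horizontal format: TID -> items (toy data, for illustration only).
horizontal = {"T1": {"A","B","E"}, "T2": {"B","D"}, "T3": {"B","C"}, "T4": {"A","B","D"}}

# Vertical format: item -> set of TIDs that contain it.
vertical = defaultdict(set)
for tid, items in horizontal.items():
    for item in items:
        vertical[item].add(tid)

# Support count of {A,B} = size of the intersection of the TID sets of A and B.
tids_ab = vertical["A"] & vertical["B"]
print(dict(vertical))
print("support count of {A,B}:", len(tids_ab))   # T1 and T4 -> 2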
Mining Closed Frequent Item sets

● A closed frequent itemset is an itemset that is both closed and frequent (its support is greater than or equal to min_sup).
● An itemset is closed in a data set if there exists no superset that has the same support count as the original itemset.

● Frequent itemset mining may generate a huge number of frequent itemsets when the min_sup threshold is set low or when long patterns exist in the data set.

53
Mining Closed Frequent Item sets

“How can we mine closed frequent itemsets?”


● First mine the complete set of frequent itemsets.
● Then remove every frequent itemset that is a proper subset of, and carries the same support as, an
existing frequent itemset.

● A better approach is to search for closed frequent itemsets directly during the mining process.
● This requires us to prune the search space as soon as we can identify the case of closed itemsets during
mining.

54
Pruning strategies

● Item merging: If every transaction containing a frequent itemset X also contains an itemset Y but no proper superset of Y, then X ∪ Y forms a frequent closed itemset, and there is no need to search for any itemset containing X but not Y.

55
Pruning strategies

● Sub-itemset pruning: If a frequent itemset X is a proper subset of an already found frequent closed itemset Y and support count(X) = support count(Y), then X and all of X's descendants in the set enumeration tree cannot be frequent closed itemsets and thus can be pruned.

56
Pruning strategies
Item skipping: In the depth-first mining of closed itemsets, at each level, there will
be a prefix itemset X associated with a header table and a projected database.
● If a local frequent item p has the same support in several header tables at
different levels, we can safely prune p from the header tables at higher levels.

57
Pruning strategies
● An important optimization is to perform efficient closure checking.

Perform two kinds of closure checking:


● superset checking: checks if this new frequent itemset is a superset of some
already found closed itemsets with the same support.

● subset checking: checks whether the newly found itemset is a subset of an already
found closed itemset with the same support.

● For efficient subset checking, we can use the following property:

● If the current itemset Sc can be subsumed by another already found closed itemset Sa,
then
(1) Sc and Sa have the same support.
(2) the length of Sc is smaller than that of Sa.
(3) all of the items in Sc are contained in Sa.

58
Which Patterns Are Interesting?—Pattern
Evaluation Method
● Most association rule mining algorithms employ a support-confidence
framework.
● Many interesting rules can be found using low support thresholds.
● Strong Rules Are Not Necessarily Interesting.
● Whether or not a rule is interesting can be assessed either subjectively
or objectively.
● Only the user can judge if a given rule is interesting, and this judgment, being subjective, may differ from one user to another.
● Objective interestingness measures are based on the statistics “behind” the data.

59
Association Mining to Correlation Analysis

A misleading “strong” association rule.


● Let game refer to the transactions containing computer games,
and video refer to those containing videos. Of the 10,000
transactions analyzed, the data show that 6,000 of the customer
transactions included computer games, while 7,500 included
videos, and 4,000 included both computer games and videos.
● Minimum support: 30%; minimum confidence: 60%.
● Support of the rule buys(game) => buys(video): 4000/10000 = 40%
● Confidence of the rule: 4000/6000 = 66%

60
Association Mining to Correlation Analysis

● The probability of purchasing videos is 75%, which is even larger than 66%.
● In fact, computer games and videos are negatively associated because
the purchase of one of these items actually decreases the likelihood of
purchasing the other.

61
From Association Analysis to Correlation Analysis

● The support and confidence measures are insufficient for filtering out uninteresting association rules.
● This leads to correlation rules of the form
A => B [support, confidence, correlation].
● A correlation rule is measured not only by its support and confidence
but also by the correlation between item sets A and B.

62
Correlation Measures
● Lift is a simple correlation measure.
● The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, itemsets A and B are dependent and correlated as events.
● Lift(A,B) = P(A ∪ B) / (P(A)P(B)).

• Lift(A,B)<1 – A & B are negatively correlated.


• Lift(A,B)>1 – A & B are positively correlated.
• Lift(A,B)=1 – A & B are not correlated, they are independent.

• It assesses the degree to which the occurrence of one “lifts” the occurrence of
the other.

63
Correlation analysis using lift

Lift = 0.40 / (0.60 × 0.75) = 0.89 < 1, so game and video are negatively correlated.

64
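The lift above follows directly from the figures in the example (6,000 game, 7,500 video, 4,000 both, out of 10,000 transactions):

total = 10_000
games, videos, both = 6_000, 7_500, 4_000

p_game, p_video, p_both = games / total, videos / total, both / total
lift = p_both / (p_game * p_video)   # P(game and video) / (P(game) * P(video))
print(round(lift, 2))                # 0.89 < 1 -> negatively correlated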
Correlation analysis using Chi square

65
Correlation analysis using Chi square

66
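The same game/video data can also be tested with a chi-square test of independence, using the 2x2 contingency table implied by the counts (4,000 game and video, 2,000 game only, 3,500 video only, 500 neither). A sketch using scipy, assuming it is available:

from scipy.stats import chi2_contingency

#                  video  no video
observed = [[4000, 2000],    # game
            [3500,  500]]    # no game

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 1), p_value)   # a large chi-square with a tiny p-value: game and video are not independent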
67
Thank You

68
