ARM and Clustering
• Support
• Support is one of the measures of interestingness. It tells us about the usefulness and certainty of a rule.
• It can be calculated as the number of transactions containing a particular item divided by the total number of transactions. Suppose we want to find the support for item B. This can be calculated as:
• Support(B) = (Transactions containing B) / (Total transactions)
• For instance, if 100 out of 1000 transactions contain Ketchup, then the support for the item Ketchup can be calculated as:
• Support(Ketchup) = 100 / 1000 = 10%
• Confidence
• Confidence refers to the likelihood that item B is also bought if item A is bought. It can be calculated as the number of transactions where A and B are bought together, divided by the total number of transactions where A is bought. Mathematically, it can be represented as:
• Confidence(A -> B) = (Transactions containing both A and B) / (Transactions containing A)
• Coming back to our problem, we had 50 transactions where Burger and Ketchup were bought together, while Burgers were bought in 150 transactions. The likelihood of buying Ketchup when a Burger is bought is the confidence of Burger -> Ketchup, and can be written as:
• Confidence(Burger -> Ketchup) = 50 / 150 = 33.3%
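These two measures are easy to compute directly. Below is a minimal Python sketch; the transaction list and item names are illustrative only, not the dataset used in these slides.

# Support and confidence over a list of transactions (each transaction is a set).
transactions = [
    {"Burger", "Ketchup", "Cola"},   # illustrative data only
    {"Burger", "Fries"},
    {"Burger", "Ketchup"},
    {"Milk", "Bread"},
]

def support(item_set, transactions):
    # Fraction of transactions that contain every item in item_set
    return sum(1 for t in transactions if item_set <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # P(consequent | antecedent): joint support divided by the antecedent's support
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"Ketchup"}, transactions))                 # 0.5 for the toy data
print(confidence({"Burger"}, {"Ketchup"}, transactions))   # 0.666... for the toy data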
• Step 1: Create a frequency table of all the items that occur in all the transactions.
• For our case:
• Step 2: We know that only those items are significant whose support is greater than or equal to the support threshold. Here, the support threshold is 50%, so only items occurring in more than three transactions are significant; these are Onion (O), Potato (P), Burger (B), and Milk (M). Therefore, we are left with:
• Step 3: The next step is to make all possible pairs of the significant items, keeping in mind that order doesn't matter, i.e., AB is the same as BA. To do this, take the first item and pair it with all the others: OP, OB, OM. Similarly, take the second item and pair it with the items that follow it: PB, PM. We skip PO because it is the same as OP, which already exists. So, all the pairs in our example are OP, OB, OM, PB, PM, and BM.
• Step 4: We will now count the occurrences of each pair in all the transactions.
• Step 5: Again, only those itemsets which cross the support threshold are significant, and those are OP, OB, PB, and PM.
• Step 6: Now let's say we would like to look for a set of three items that are purchased together. We will use the itemsets found in Step 5 to create sets of 3 items.
• To create a set of 3 items, another rule, called self-join, is required. It says that from the item pairs OP, OB, PB, and PM we look for two pairs with an identical first letter, and so we get:
• OP and OB, which gives OPB
• PB and PM, which gives PBM
• Next, we find the frequency for these two itemsets.
• Applying the threshold rule again, we find that OPB is the only significant itemset.
• Therefore, the set of 3 items that was purchased most frequently is OPB.
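The counting in Steps 1-6 can be sketched directly in Python. The transaction table below is illustrative (the slides' actual table is shown as an image), chosen so that it reproduces the frequent itemsets of the example; the 50% threshold is the one used above.

from itertools import combinations

# Illustrative transactions (O = Onion, P = Potato, B = Burger, M = Milk);
# replace with the real transaction table from the slides.
transactions = [
    {"O", "P", "B"},
    {"O", "P", "B", "M"},
    {"O", "P", "B"},
    {"P", "M"},
    {"P", "M"},
    {"P", "B"},
]
min_count = 0.5 * len(transactions)   # 50% support threshold, as a transaction count

def frequent(candidates):
    # Count each candidate itemset and keep those meeting the threshold (Steps 2 and 5)
    counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_count}

# Steps 1-2: frequency table of single items, filtered by the threshold
items = sorted({item for t in transactions for item in t})
L1 = frequent([(item,) for item in items])

# Steps 3-5: candidate pairs from the surviving items, counted and filtered
L2 = frequent(list(combinations(sorted(item for (item,) in L1), 2)))

# Step 6: self-join pairs that agree on their first element into 3-item candidates
pairs = sorted(L2)
candidates3 = {tuple(sorted(set(a) | set(b)))
               for a, b in combinations(pairs, 2) if a[0] == b[0]}
L3 = frequent(sorted(candidates3))

print(L1)   # {('B',): 4, ('M',): 3, ('O',): 3, ('P',): 6}
print(L2)   # the frequent pairs: OB, PB, PM, OP
print(L3)   # the frequent triple: OPB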
General Process of the Apriori Algorithm
• The same idea is repeated level by level: generate candidate itemsets of length k from the frequent itemsets of length k - 1 (self-join), count their support over the transactions, and discard candidates below the threshold, until no new frequent itemsets are found.
Pros and Cons of the Apriori Algorithm
• Pros:
• It is an easy-to-implement and easy-to-understand algorithm.
• It can be used on large itemsets.
• Cons:
• Sometimes, it may need to generate a large number of candidate rules, which can be computationally expensive.
• Calculating support is also expensive because it requires scanning the entire database.
Implementation in Python
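The code on these slides is shown as images. A minimal sketch of an end-to-end Apriori run in Python, assuming the mlxtend library is available (the slides' actual library and dataset are not specified here), could look like this:

# A sketch using mlxtend's apriori and association_rules (assumed library choice).
# The transactions below are illustrative; load the real basket data instead.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["Onion", "Potato", "Burger"],
    ["Onion", "Potato", "Burger", "Milk"],
    ["Onion", "Potato", "Burger"],
    ["Potato", "Milk"],
    ["Potato", "Milk"],
    ["Potato", "Burger"],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with at least 50% support, then rules with at least 60% confidence
frequent_itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)

print(frequent_itemsets)
print(rules[["antecedents", "consequents", "support", "confidence"]])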
Assignment
When dealing with real-world problems, most of the time data will not come with predefined labels.
So, we want to group data points that are similar in nature (feature values) into clusters.
Introduction to Clustering
In supervised learning, we have a labelled set of examples.
Introduction to Unsupervised Learning
In unsupervised learning, we have an unlabeled set of examples.
In unsupervised learning, we form clusters in the dataset.
Clustering
Applications of Unsupervised Learning
K-Means Algorithm
In K-means, assume we have the dataset shown in the figure and we need to find 2 clusters.
The first step is to randomly initialize two points, called cluster centroids; two because we need 2 clusters.
K-means is an iterative algorithm that repeats two steps:
First, the cluster assignment step: assign each example to the closest cluster centroid.
Second, the move centroid step: move each centroid to the mean of the points assigned to it.
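A minimal NumPy sketch of these two steps (illustrative only; it assumes no cluster ever becomes empty, and in practice the scikit-learn KMeans class used later in these slides handles all of this):

# Minimal K-means: alternate the cluster assignment and move centroid steps.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initialization: pick k training examples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move centroid step: move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Tiny illustrative dataset with two obvious groups
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)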
K-Means Algorithm Optimization Objective (Cost Function)
• K-means can be viewed as minimizing a cost (distortion) function: the average squared distance between each example and the centroid of the cluster it is currently assigned to:
• J(c(1), …, c(m), μ1, …, μK) = (1/m) Σ_{i=1}^{m} ||x(i) − μ_{c(i)}||²
• The cluster assignment step minimizes J with respect to the assignments c(i) while holding the centroids fixed, and the move centroid step minimizes J with respect to the centroids μk while holding the assignments fixed.
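This cost is straightforward to evaluate for a given clustering. A small NumPy sketch follows; note that scikit-learn's KMeans, used later in these slides, exposes the closely related sum of squared distances as inertia_.

# Sketch: the K-means distortion J for a given assignment and set of centroids.
import numpy as np

def distortion(X, labels, centroids):
    # Average squared distance of each example to its assigned centroid
    diffs = X - centroids[labels]
    return np.mean(np.sum(diffs ** 2, axis=1))

# Tiny illustrative example (not the slides' data)
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.1, 0.9], [8.1, 7.95]])
print(distortion(X, labels, centroids))
# Note: sklearn's KMeans.inertia_ is the sum (not the mean) of these squared distances.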
K-Means Random Initialization
• A common way to initialize K-means is to randomly pick K training examples and set the cluster centroids equal to those examples.
• Depending on the random initialization, K-means can converge to different local optima, i.e., different final clusterings with different costs J.
• A standard remedy is to run K-means many times with different random initializations and keep the clustering with the lowest cost J (this is what the n_init parameter does in the scikit-learn code later in these slides).
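A sketch of that multiple-restart idea using scikit-learn (its n_init parameter already does this internally; the loop below only makes the idea explicit, and the dataset is illustrative):

# Run K-means several times with different random initializations and keep the
# run with the lowest cost (inertia_ plays the role of the cost J).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])   # illustrative data only

best_model, best_cost = None, np.inf
for seed in range(10):
    model = KMeans(n_clusters=2, init='random', n_init=1, random_state=seed).fit(X)
    if model.inertia_ < best_cost:
        best_model, best_cost = model, model.inertia_

print(best_cost)
print(best_model.cluster_centers_)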
Conclusion:
1. To conclude, for the most part, the number of clusters K is still chosen by hand, based on human input or insight.
2. One way to try to choose it is the Elbow Method, but we shouldn't expect it to always work well.
3. Thus, a better way to think about choosing the number of clusters is to ask: for what purpose are you running K-means?
Implementation of K-Means Clustering Algorithm in Python (Live Demo)
K-Means Implementation
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the mall dataset with pandas
dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3, 4]].values

# Using the elbow method to find the optimal number of clusters
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# Plot the graph to visualise the Elbow Method and find the optimal number of clusters
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# Applying KMeans to the dataset with the optimal number of clusters
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)
# Visualising the clusters
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s=100, c='red', label='Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s=100, c='blue', label='Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s=100, c='green', label='Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s=100, c='cyan', label='Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s=100, c='magenta', label='Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
plt.title('Clusters of clients')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
K-Means (Disadvantages)
1. Choosing k manually.
2. Being dependent on initial values.
3. Clustering data of varying sizes and density.
4. Clustering outliers.
5. Scaling with number of dimensions.