CSA 106 Market Basket Analysis

The document discusses association rule mining and market basket analysis. It defines key concepts like support, confidence and lift used to evaluate association rules. Association rule mining is used to discover relationships between variables in large datasets. Applications include retail market basket analysis to understand customer purchasing patterns and identify frequently bought item combinations. The Apriori algorithm is described as a popular method to identify frequent itemsets and generate association rules from transactional data.

Uploaded by

Harold Costales

Association Rule Mining

Market-Basket Analysis
Objectives
• Discuss the concept of association rule mining and provide
examples of its application.
• Calculate support, confidence, and lift for a given set of association
rules and interpret the results.
• Implement association rule mining algorithms such as Apriori or FP-
Growth on a dataset and generate meaningful rules.
Introduction to Association Rule Mining
• Association mining, also known as Association Rule Learning, is a
data mining technique that involves discovering interesting
relationships or associations between variables in a large dataset.
• The goal of association mining is to identify patterns, correlations,
and dependencies between variables that are not immediately
obvious, and to use these insights to improve decision-making,
customer segmentation, or product recommendations.
Applications
• Association mining is commonly used in the field of marketing,
where it is used to understand customer behavior and preferences.
• For example, a retailer might use association mining to uncover
patterns in customer purchasing behavior, such as which products
are frequently purchased together, which products are most likely
to be purchased based on the time of day, or which products are
most likely to be purchased by specific customer segments.
• It can also be used in healthcare to understand the association
between patient symptoms and diagnoses, in financial services to
detect fraudulent transactions, and in telecommunications to
identify customer usage patterns.
Metrics
• Association mining typically relies on two main metrics: support
and confidence.
• Support measures how often a specific item or itemset occurs in
the dataset, while confidence measures how often the consequent
of a rule appears in the transactions that contain its antecedent.
• A third metric, lift, compares the observed co-occurrence of two
items or itemsets with the frequency that would be expected if the
items were independent.
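For reference, the three metrics are commonly written as below for a rule A ⇒ B over N transactions, where count(X) is the number of transactions that contain every item in X. The notation is a common convention assumed here; it does not come from the slides.

```latex
\begin{align*}
\operatorname{support}(X) &= \frac{\operatorname{count}(X)}{N} \\
\operatorname{confidence}(A \Rightarrow B) &= \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)} \\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{confidence}(A \Rightarrow B)}{\operatorname{support}(B)}
  = \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)\,\operatorname{support}(B)}
\end{align*}
```

A lift of 1 means A and B co-occur about as often as independence would predict, a value above 1 suggests a positive association, and a value below 1 suggests a negative one.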
Steps
• First, the dataset is prepared and preprocessed to ensure that it is in a format
that can be easily analyzed.
• Next, frequent item sets are identified using algorithms such as Apriori or FP-
Growth.
• These algorithms use the support metric to identify item sets that occur frequently in the
dataset.
• Once frequent item sets have been identified, association rules are generated
by examining the item sets for interesting patterns or relationships.
• Rules are typically generated using the confidence metric to measure the strength of the
association between the antecedent and consequent of the rule.
• Finally, the association rules are evaluated and pruned to identify the most
interesting or actionable patterns.
• This might involve applying additional filters or constraints to the rules, such as only
considering rules that involve a minimum number of items or that have a lift value above a
certain threshold.
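The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions: the rows, item names, and thresholds are hypothetical, and frequent pairs are found by brute-force enumeration as a stand-in for Apriori or FP-Growth.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw data: (transaction_id, item) rows, as in a sales log.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"), (4, "butter"),
]

# Step 1: preprocess the rows into one set of items per transaction.
baskets = defaultdict(set)
for tid, item in rows:
    baskets[tid].add(item)
transactions = list(baskets.values())
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

# Step 2: find frequent pairs by brute force (not Apriori; used for brevity).
MIN_SUPPORT = 0.5  # assumed threshold
items = sorted(set().union(*transactions))
frequent_pairs = [frozenset(p) for p in combinations(items, 2)
                  if support(frozenset(p)) >= MIN_SUPPORT]

# Step 3: generate rules A -> B from each frequent pair, keeping confident ones.
MIN_CONFIDENCE = 0.6  # assumed threshold
rules = []
for pair in frequent_pairs:
    for a in pair:
        antecedent, consequent = frozenset([a]), pair - {a}
        conf = support(pair) / support(antecedent)
        if conf >= MIN_CONFIDENCE:
            lift = conf / support(consequent)
            rules.append((set(antecedent), set(consequent), conf, lift))

# Step 4: prune to rules whose lift exceeds 1 (positive association).
interesting = [r for r in rules if r[3] > 1.0]
for antecedent, consequent, conf, lift in interesting:
    print(antecedent, "->", consequent, f"conf={conf:.2f} lift={lift:.2f}")
```

On this toy data only the bread and butter rules survive the lift filter; the bread and milk rules are confident but have lift below 1, because milk is common in every basket type.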
Market-Basket Analysis
• Market Basket Analysis (MBA) is a data mining technique used to
uncover the association between items that customers purchase
together.
• It is based on the idea that if a customer buys a certain product,
they are likely to buy other related or complementary products as
well.
• This type of analysis is often used in retail and e-commerce
businesses to understand customer behavior, improve product
recommendations, and optimize pricing and promotions.
Market-Basket Analysis
• MBA uses the concept of association rules, which are statements
that describe the co-occurrence of items in a dataset.
• The two primary metrics used in MBA are support and confidence.
Support refers to the percentage of transactions that contain a
particular item or set of items, while confidence measures the
likelihood of an item being purchased given that another item has
already been purchased.
• Support is used to identify frequent itemsets, which are groups of
items that are frequently purchased together; confidence is then
used to assess the rules built from those itemsets.
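As a small worked example of the two definitions, the sketch below computes support and confidence directly from a list of baskets. The baskets and the function names `support` and `confidence` are hypothetical, chosen only for illustration.

```python
# Hypothetical baskets; each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

sup = support({"diapers", "beer"}, transactions)
conf = confidence({"diapers"}, {"beer"}, transactions)
print(f"support(diapers, beer) = {sup:.2f}")
print(f"confidence(diapers -> beer) = {conf:.2f}")
```

Here 3 of the 5 baskets contain both items and 4 of the 5 contain diapers, so the support is 3/5 and the confidence is 3/4.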
Market-Basket Analysis (Limitation)
• One limitation of MBA is that it identifies only statistical
associations between items; it does not establish causality or the
direction of any causal influence.
• Therefore, MBA should be used in conjunction with other data
analysis techniques to gain a more complete understanding of
customer behavior.
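To illustrate why these metrics describe co-occurrence rather than cause, the sketch below uses hypothetical counts (not taken from any dataset) to show that confidence changes with the side of the rule treated as the antecedent, while lift is the same either way.

```python
# Hypothetical counts from 1,000 transactions.
n_total = 1000
n_a = 100      # transactions containing item A
n_b = 500      # transactions containing item B
n_both = 80    # transactions containing both A and B

conf_a_to_b = n_both / n_a   # estimate of P(B | A)
conf_b_to_a = n_both / n_b   # estimate of P(A | B)

# Lift is symmetric: swapping A and B gives the same value.
lift = (n_both / n_total) / ((n_a / n_total) * (n_b / n_total))

print(f"conf(A -> B) = {conf_a_to_b:.2f}")
print(f"conf(B -> A) = {conf_b_to_a:.2f}")
print(f"lift         = {lift:.2f}")
```

The two confidence values differ widely (80/100 versus 80/500) even though they describe the same co-occurrence, and neither says that buying one item leads to buying the other.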
Apriori Algorithm
• The Apriori algorithm is a popular algorithm used in association rule
mining to find frequent item sets in a large dataset.
• The algorithm was introduced by Rakesh Agrawal and
Ramakrishnan Srikant in 1994 and is based on the idea of
building larger frequent itemsets from smaller frequent itemsets.
Apriori Algorithm – How it works?
• The Apriori algorithm works by first identifying all the frequent itemsets of size
1 in the dataset, which are called level-1 itemsets.
• It then uses these level-1 item sets to generate candidate itemsets of size 2,
which are then used to find the frequent item sets of size 2.
• This process is repeated iteratively, generating candidate itemsets of size k+1
from the frequent itemsets of size k, until no more frequent itemsets can be
found.
• The algorithm uses a minimum support threshold to determine which item sets
are frequent.
• An itemset is considered frequent if its support is greater than or equal to the
minimum support threshold.
• The support of an itemset is the number of transactions in which the itemset
appears (its support count), often expressed as a fraction of all transactions.
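The level-wise procedure described above can be sketched as follows. This is a minimal, unoptimized illustration, not the authors' reference implementation: the function name `apriori`, the use of fractional support, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    `transactions` is a list of sets; support is a fraction of transactions.
    """
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 1
    while current:
        # Join: union pairs of frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        # Count: keep the candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
]
result = apriori(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5, the loop stops after level 2: the only size-3 candidate, bread, butter, and milk, is discarded because its subset butter and milk is not frequent.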
Apriori Algorithm – Optimization
• The Apriori algorithm has several optimizations to reduce the
computational complexity of the algorithm.
• One such optimization is the "pruning" technique, which uses the
downward closure property of frequent itemsets: every subset of a
frequent itemset must also be frequent, so any candidate that
contains an infrequent subset can be eliminated without counting
its support.
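The pruning check can be shown in isolation. In this sketch the helper name `has_infrequent_subset` and the sets of frequent 2-itemsets and candidates are hypothetical; the point is only that a size-k candidate is dropped as soon as one of its (k-1)-subsets is missing from the previous level.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """True if any (k-1)-subset of a k-item candidate is not in `frequent_prev`."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets from an earlier pass.
frequent_2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}

# Two size-3 candidates: {a, b, c} and {a, b, d}.
candidates_3 = [frozenset("abc"), frozenset("abd")]
kept = [c for c in candidates_3 if not has_infrequent_subset(c, frequent_2)]
print([sorted(c) for c in kept])
```

The candidate {a, b, d} is pruned because {a, d} is not among the frequent 2-itemsets, so its support never has to be counted against the transactions.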
Apriori Algorithm – Applications
• The Apriori algorithm is widely used in many applications such as
market basket analysis, recommendation systems, and customer
behavior analysis.
• However, it has some limitations: it scales poorly to datasets with
a large number of items or a large number of transactions, because
many candidate itemsets must be generated and the dataset must be
scanned repeatedly.
• Overall, the Apriori algorithm is a simple and effective method
for identifying frequent itemsets in a dataset and generating
association rules from them.
