DMW Unit4
Association rules of the form {IF} -> {THEN} are used in Market
Basket Analysis in data mining. For example, customers who buy
a domain are likely to also need extra plugins/extensions
to make things easier for their users.
The antecedent is the item set that is found in the data.
In terms of the rule it is the {IF} component, and in the
example it is the domain.
Similarly, the consequent is the item that is found in
combination with the antecedent. In terms of the rule it is
the {THEN} component, and in the example it is the extra
plugins/extensions.
With the help of these rules, we can predict customer
behavioral patterns. From this, we can put together
combination offers that customers are likely to buy, which in
turn can increase the sales and revenue of the company.
With the help of the Apriori algorithm, we can further narrow
down the item sets which are frequently bought together by
consumers.
There are three components in the Apriori algorithm:
SUPPORT
CONFIDENCE
LIFT
Now take an example. Suppose 5000 transactions have been
made through a popular eCommerce website, and we want
to calculate the support, confidence, and lift for two
products, say pen and notebook. Out of the 5000
transactions, 500 contain a pen, 700 contain a
notebook, and 100 contain both.
SUPPORT: It is calculated as the number of transactions
containing the items divided by the total number of
transactions:
Support = freq(A,B)/N
support(pen) = transactions related to pen/total transactions
i.e. support(pen) = 500/5000 = 10 percent
CONFIDENCE: It indicates how often the product is bought in
combination rather than on its own. It is calculated as
combined transactions/individual transactions:
Confidence = freq(A,B)/freq(A)
i.e. confidence(pen -> notebook) = 100/500 = 20 percent
LIFT: Lift is the ratio of the confidence of the rule to the
support of the consequent. It shows how much more often the
items are bought together than would be expected if they were
bought independently:
Lift = confidence percent/support percent of the consequent
support(notebook) = 700/5000 = 14 percent
i.e. lift = 20/14 ≈ 1.43
A lift value below 1 means the combination is bought less
often than expected. In this case the lift is above 1, which
shows that the probability of buying both items together is
higher than would be expected from the transactions for the
individual items.
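The three measures can be checked with a few lines of Python. This is only a sketch: the function name rule_metrics is made up for illustration, the joint count of 100 is an assumption (a joint count cannot be larger than either individual count), and lift is taken as confidence divided by the support of the consequent.

```python
def rule_metrics(n_total, n_a, n_b, n_both):
    """Return (support, confidence, lift) for the rule A -> B from raw counts."""
    if n_both > min(n_a, n_b):
        raise ValueError("joint count cannot exceed either individual count")
    support = n_both / n_total            # share of all baskets holding A and B
    confidence = n_both / n_a             # share of A-baskets that also hold B
    lift = confidence / (n_b / n_total)   # confidence relative to B's base rate
    return support, confidence, lift


# Pen -> notebook example; n_both=100 is an assumed value, not a measured one.
support, confidence, lift = rule_metrics(n_total=5000, n_a=500, n_b=700, n_both=100)
print(f"support={support:.2%} confidence={confidence:.2%} lift={lift:.2f}")
```

Here support is the joint support of the rule (pen and notebook together), which is different from the support of pen alone.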
With this, we have an overall view of Market Basket
Analysis in data mining and of how to evaluate the sales of
product combinations.
Types of Market Basket Analysis
There are three types of Market Basket Analysis. They are as
follows:
1. Descriptive market basket analysis: This sort of analysis
looks for patterns and connections in the data that exist
between the components of a market basket. This kind of
study is mostly used to understand consumer behavior,
including what products are purchased in combination and
what the most typical item combinations are. Retailers can
place products in their stores more profitably by
understanding which products are frequently bought
together with the aid of descriptive market basket
analysis.
2. Predictive Market Basket Analysis: Market basket analysis
that predicts future purchases based on past purchasing
patterns is known as predictive market basket analysis.
Large volumes of data are analyzed using machine learning
algorithms in this sort of analysis in order to create
predictions about which products are most likely to be
bought together in the future. Retailers may make data-
driven decisions about which products to carry, how to
price them, and how to optimize store layouts with the help
of predictive market basket analysis.
3. Differential Market Basket Analysis: Differential market
basket analysis compares two sets of market basket data to
identify variations between them. Comparing the behavior
of various customer segments or the behavior of customers
over time is a common use for this kind of study.
Retailers can respond to shifting consumer behavior by
modifying their marketing and sales tactics with the help
of differential market basket analysis.
Benefits of Market Basket Analysis
1. Enhanced Customer Understanding: Market basket
analysis offers insights into customer behavior, including
what products they buy together and which products they
buy the most frequently. Retailers can use this information
to better understand their customers and make informed
decisions.
2. Improved Inventory Management: By examining market
basket data, retailers can determine which products are
slow sellers and which ones are commonly bought
together. Retailers can use this information to make
well-informed choices about what products to stock and how
to manage their inventory most effectively.
3. Better Pricing Strategies: A better understanding of the
connection between product prices and consumer
behavior might help merchants develop better pricing
strategies. Using this knowledge, pricing plans that boost
sales and profitability can be created.
4. Sales Growth: Market basket analysis can assist businesses
in determining which products are most frequently bought
together and where they should be positioned in the store
to grow sales. Retailers may boost revenue and enhance
customer shopping experiences by improving store layouts
and product positioning.
Applications of Market Basket Analysis
1. Retail: Market basket analysis is frequently used in the
retail sector to examine consumer buying patterns and
inform decisions about product placement, inventory
management, and pricing tactics. Retailers can use
market basket analysis to identify which items are slow
sellers and which ones are commonly bought together, and
then modify their inventory management strategy
accordingly.
2. E-commerce: Market basket analysis can help online
merchants better understand customer buying habits
and make data-driven decisions about product
recommendations and targeted advertising campaigns.
The behaviour of visitors to a website can also be examined
using market basket analysis to pinpoint problem areas.
3. Finance: Market basket analysis can be used to evaluate
investor behaviour and forecast the types of investment
items that investors will likely buy in the future. The
performance of investment portfolios can be enhanced by
using this information to create tailored investment
strategies.
4. Telecommunications: The telecommunications industry can
use market basket analysis to evaluate consumer behaviour
and make data-driven decisions about which goods and
services to provide. This information can improve
customer satisfaction and the overall customer
experience.
5. Manufacturing: The manufacturing sector can use market
basket analysis to evaluate consumer behaviour and
make data-driven decisions about which products to
produce and which materials to use in the production
process. This knowledge can increase efficiency and
cut costs.
1. Frequent item sets are a fundamental concept in
association rule mining, which is a technique used in data
mining to discover relationships between items in a
dataset. The goal of association rule mining is to identify
items that frequently occur together in a dataset.
2. A frequent item set is a set of items that occur together
frequently in a dataset. The frequency of an item set is
measured by the support count, which is the number of
transactions or records in the dataset that contain the item
set. For example, if a dataset contains 100 transactions and
the item set {milk, bread} appears in 20 of those
transactions, the support count for {milk, bread} is 20.
3. Association rule mining algorithms, such as Apriori or
FP-Growth, are used to find frequent item sets and generate
association rules. Apriori works by iteratively
generating candidate item sets and pruning those that do
not meet the minimum support threshold, while FP-Growth
builds a compact tree of the transactions and mines it
without generating candidates. Once the frequent item sets
are found, association rules can be generated by using the
concept of confidence, which is the ratio of the number of
transactions that contain the whole item set to the number
of transactions that contain the antecedent (left-hand
side) of the rule.
4. Frequent item sets and association rules can be used for a
variety of tasks such as market basket analysis, cross-
selling and recommendation systems. However, it should
be noted that association rule mining can generate a large
number of rules, many of which may be irrelevant or
uninteresting. Therefore, it is important to use appropriate
measures such as lift and conviction to evaluate the
interestingness of the generated rules.
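To make the idea of a support count and a minimum support threshold concrete, here is a minimal brute-force sketch in Python. It is not Apriori or FP-Growth; it simply counts every possible item set. The baskets and the threshold of 3 are invented for illustration, and the function name frequent_itemsets is hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute force: count every candidate item set, keep those meeting min_count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            # Support count = number of transactions containing the whole set.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                frequent[frozenset(candidate)] = count
                found_any = True
        if not found_any:
            break  # no frequent k-item set, so no larger set can be frequent
    return frequent


# Hypothetical baskets, not taken from the notes.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
result = frequent_itemsets(baskets, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Brute force is only workable for a handful of items, because the number of candidate sets doubles with every extra item. That is the reason algorithms such as Apriori prune candidates early.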
Association mining searches for frequent items in a data set.
Frequent mining usually finds interesting associations and
correlations between item sets in transactional and relational
databases. In short, frequent mining shows which
items appear together in a transaction or relation.
Need of Association Mining: Frequent mining is used to generate
association rules from a transactional dataset. If two
items X and Y are frequently purchased together, it is a good
idea to place them together in stores or to offer a discount
on one item with the purchase of the other. This can increase
sales. For example, a customer who buys milk and bread is
likely to also buy butter. The association rule is
['milk'] ^ ['bread'] => ['butter'], so the seller can suggest
butter to a customer who buys milk and bread.
Important Definitions :
Support: It is one of the measures of interestingness and
reflects the usefulness of a rule. 5% support means that
5% of all transactions in the database follow the rule.
Support(A -> B) = Support_count(A ∪ B) / Total number of transactions
Confidence: A confidence of 60% means that 60% of the
customers who purchased milk and bread also bought
butter.
Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
If a rule satisfies both minimum support and minimum
confidence, it is a strong rule.
Support_count(X): Number of transactions in which X
appears. If X is A union B then it is the number of
transactions in which A and B both are present.
Maximal Itemset: An itemset is maximal frequent if it is
frequent and none of its supersets are frequent.
Closed Itemset: An itemset is closed if none of its
immediate supersets has the same support count as the
itemset.
K-Itemset: An itemset which contains K items is a K-itemset.
An itemset is frequent if its support count is greater than
or equal to the minimum support count.
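The difference between maximal and closed itemsets is easier to see on a small case. The sketch below assumes we already have the support counts of all frequent itemsets. The counts are hypothetical (they match four transactions {milk, bread} and one transaction {bread}, with a minimum support count of 2), and the function name classify is made up.

```python
def classify(frequent):
    """Label each frequent itemset as (is_maximal, is_closed).

    `frequent` maps frozenset -> support count and must hold ALL frequent itemsets.
    """
    labels = {}
    for itemset, count in frequent.items():
        supersets = [s for s in frequent if itemset < s]  # frequent proper supersets
        is_maximal = not supersets
        # A superset with the same support would itself be frequent, so it is
        # enough to look at the frequent supersets when testing closedness.
        is_closed = all(frequent[s] < count for s in supersets)
        labels[itemset] = (is_maximal, is_closed)
    return labels


# Hypothetical support counts, not taken from the notes.
frequent = {
    frozenset({"milk"}): 4,
    frozenset({"bread"}): 5,
    frozenset({"milk", "bread"}): 4,
}
labels = classify(frequent)
for itemset, (is_maximal, is_closed) in labels.items():
    print(sorted(itemset), "maximal" if is_maximal else "-", "closed" if is_closed else "-")
```

In this case {milk} is neither maximal nor closed (its superset {milk, bread} has the same count), {bread} is closed but not maximal, and {milk, bread} is both. Every maximal frequent itemset is also closed, but not the other way round.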
Example on finding frequent itemsets – Consider the given
dataset of transactions.
1 Bread, Milk
Before we start defining the rule, let us first see the basic
definitions.
Support Count – Frequency of occurrence of an itemset.
Here Support_count({Milk, Bread, Diaper}) = 2
Frequent Itemset – An itemset whose support is greater than or
equal to the minsup threshold.
Association Rule – An implication expression of the form
X -> Y, where X and Y are any 2 itemsets.
Example: {Milk, Diaper} -> {Beer}
Rule Evaluation Metrics –
Support(s) – The number of transactions that include the
items in both the {X} and {Y} parts of the rule, as a
percentage of the total number of transactions. It is a
measure of how frequently the collection of items occurs
together.
Support = (X+Y) / total – It is interpreted as the fraction of
transactions that contain both X and Y.
Confidence(c) – It is the ratio of the number of transactions
that include all items in both {X} and {Y} to the number of
transactions that include all items in {X}.
Conf(X=>Y) = Supp(X ∪ Y) / Supp(X) – It measures how
often the items in Y appear in transactions that also
contain the items in X.
Lift(l) – The lift of the rule X=>Y is the confidence of the
rule divided by the expected confidence, assuming that the
itemsets X and Y are independent of each other. The
expected confidence is the frequency (support) of {Y}.
Lift(X=>Y) = Conf(X=>Y) / Supp(Y) – A lift value near 1
indicates that X and Y appear together about as often as
expected, greater than 1 means they appear together
more than expected, and less than 1 means they appear
together less than expected. Greater lift values indicate
stronger association.
Example – From the above table, {Milk, Diaper}=>{Beer}
s = Support_count({Milk, Diaper, Beer}) / |T|
= 2/5
= 0.4
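The same calculation, together with confidence and lift for the rule, can be sketched in Python. Note the assumption: only the first transaction (Bread, Milk) appears in these notes, so the other four transactions below are a hypothetical dataset chosen to agree with the counts quoted in the text (support count 2 for {Milk, Bread, Diaper}, and 2/5 for {Milk, Diaper, Beer}). The helper names are made up.

```python
# Assumed dataset: only the first transaction is given in the notes.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def metrics(x, y):
    """Return (support, confidence, lift) for the rule x => y."""
    n = len(transactions)
    both = support_count(x | y)
    s = both / n
    c = both / support_count(x)
    lift = c / (support_count(y) / n)
    return s, c, lift


s, c, lift = metrics({"Milk", "Diaper"}, {"Beer"})
print(f"s={s:.2f} c={c:.2f} lift={lift:.2f}")
```

With this assumed dataset the support is 0.4, the confidence is about 0.67 and the lift is about 1.11, so the rule holds slightly more often than independence would suggest.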
Step-3:
o Generate candidate set C3 using L2 (join step).
The condition for joining Lk-1 with Lk-1 is that the
itemsets should have (K-2) elements in common. So here,
for L2, the first element should match.
So the itemsets generated by joining L2 are {I1, I2, I3},
{I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5},
{I2, I3, I5}.
o Check whether all subsets of these itemsets are frequent,
and if not, remove that itemset. (Here the subsets
of {I1, I2, I3} are {I1, I2}, {I2, I3}, {I1, I3}, which are
all frequent. For {I2, I3, I4}, the subset {I3, I4} is not
frequent, so remove it. Check every itemset in the same way.)
o Find the support count of the remaining itemsets by
searching the dataset.
(II) Compare the support count of each candidate in C3 with
the minimum support count (here min_support = 2). If the
support count of a candidate itemset is less than min_support,
remove that itemset. This gives us itemset L3.
Step-4:
o Generate candidate set C4 using L3 (join step).
The condition for joining Lk-1 with Lk-1 (K=4) is that
they should have (K-2) elements in common. So here, for
L3, the first 2 elements (items) should match.
o Check whether all subsets of these itemsets are frequent.
(Here the itemset formed by joining L3 is {I1, I2, I3, I5},
and its subsets include {I1, I3, I5}, which is not
frequent.) So there is no itemset in C4.
o We stop here because no further frequent itemsets are
found.
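The join and prune steps of Step-3 and Step-4 can be sketched in Python as below. This is a minimal illustration, not a full Apriori implementation: it leaves out the support counting against the dataset, the function name apriori_gen is made up, and the contents of L2 are an assumption inferred from the joined candidates listed in Step-3.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Build C_k from L_(k-1): join sets sharing their first k-2 items, then prune.

    Returns (joined, pruned): all joined candidates, and those whose
    (k-1)-item subsets are all frequent.
    """
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    if not prev:
        return [], []
    prev_set = set(prev)
    k = len(prev[0]) + 1
    joined, pruned = [], []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:  # first k-2 items match
            candidate = a + (b[-1],)
            joined.append(candidate)
            if all(sub in prev_set for sub in combinations(candidate, k - 1)):
                pruned.append(candidate)
    return joined, pruned


# Assumed L2, inferred from the candidates listed in Step-3.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
joined3, c3 = apriori_gen(L2)
print("joined:", joined3)
print("after prune:", c3)

# Taking both surviving candidates as L3 (the text uses min_support = 2).
joined4, c4 = apriori_gen(c3)
print("C4 after prune:", c4)
```

Under this assumed L2, the join produces the six candidates listed in Step-3, the prune keeps only {I1, I2, I3} and {I1, I2, I5}, and C4 ends up empty because {I1, I3, I5} is not frequent.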
L = {K : 5, E : 4, M : 3, O : 4, Y : 3}
Here, all the items are simply linked one after the other in
the order of occurrence in the set, and the support count of
each item is initialized to 1.
b) Inserting the set {K, E, O, Y}:
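The insertion of ordered sets into the tree can be sketched in Python as follows. This is a simplified illustration: the class and function names are made up, the node links between equal items and the header table are left out, and the first inserted set {K, E, M, O, Y} is an assumption based on the order of the items in L.

```python
class FPNode:
    """One tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def insert(root, ordered_items):
    """Walk down from the root, reusing a shared prefix and adding nodes as needed."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
        child.count += 1  # a new node ends at 1, a shared node is incremented
        node = child


def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines


root = FPNode(None)
insert(root, ["K", "E", "M", "O", "Y"])  # assumed first set
insert(root, ["K", "E", "O", "Y"])       # the set from step b)
print("\n".join(dump(root)))
```

After the second insertion the shared prefix K, E has count 2, and a new branch O, Y (each with count 1) starts under E, next to the existing branch M, O, Y.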
For each row, two association rules can be inferred. For
example, for the first row, which contains the elements K and
Y, the rules K -> Y and Y -> K can be inferred. To determine
the valid rule, the confidence of both rules is calculated,
and the one with confidence greater than or equal to the
minimum confidence value is retained.
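The selection between the two rules can be sketched in Python. The support counts K = 5 and Y = 3 are taken from L. The joint count of 3 for {K, Y} and the minimum confidence of 0.8 are assumptions made for illustration, and the function name retained_rules is made up.

```python
def retained_rules(a, b, count_a, count_b, count_ab, min_confidence):
    """Keep whichever of a -> b and b -> a reaches the minimum confidence."""
    candidates = {
        f"{a} -> {b}": count_ab / count_a,  # confidence of a -> b
        f"{b} -> {a}": count_ab / count_b,  # confidence of b -> a
    }
    return {rule: conf for rule, conf in candidates.items() if conf >= min_confidence}


# count_ab=3 and min_confidence=0.8 are assumed values.
rules = retained_rules("K", "Y", count_a=5, count_b=3, count_ab=3, min_confidence=0.8)
print(rules)
```

With these numbers K -> Y has confidence 3/5 = 0.6 and is dropped, while Y -> K has confidence 3/3 = 1.0 and is retained.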