Problem Statement
A Grocery Store shared the transactional data with you. Your job is to identify the most
popular combos that can be suggested to the Grocery Store chain after a thorough analysis of
the most commonly occurring sets of items in the customer orders. The Store doesn’t have
any combo offers. Can you suggest the best combos & offers?
We aim to analyze the association rules to suggest the best combos and offers for the Grocery Store.
The data provided in the CSV file is Point of Sale (POS) data.
TOOLS USED
Tableau used for EDA visualization
KNIME workflow: Sivaramakrishnan_S_MRA_Project_MileStone_2.knwf
ABOUT DATA
DATA DICTIONARY
No. of transactions: 20682
No. of features: 3
INFORMATION & ASSUMPTIONS
No missing values
No duplicates
No. of unique orders: 1139 (Order IDs 1 to 1139)
No. of unique products: 37
The data covers January to September for 2 years (2018 and 2019), plus 2 months (January and February) of 2020.
YEARLY TREND
The year 2018 has the highest number of orders, followed by 2019. The year 2020 contains only 2 months of data, so its order count is very low.
OVERALL MONTHLY
January has the highest number of unique orders (174) and June has the lowest (105).
MONTHLY TREND
Q1 2019 and Q3 2018 have the highest number of orders (180). Q1 2020 has the lowest number of orders because it contains only 2 months of data.
DAY WISE TREND
The number of orders is low at the start of the month, peaks in the middle of the month, and declines toward the end of the month.
PRODUCTS COUNT
Poultry has the highest number of orders and hand soap has the lowest.
POULTRY - 480
CEREALS - 451
WAFFLES - 449
CHEESES - 445
SODA - 445
EGGS - 444
BAGELS - 439
YOGURT - 438
MILK - 433
COFFEE/TEA - 432
SOAP - 432
JUICE - 429
MIXES - 428
BEEF - 427
KETCHUP - 423
PASTA - 423
FRUITS - 422
TORTILLAS - 421
SHAMPOO - 420
BUTTER - 419
SUGAR - 411
PORK - 405
FLOUR - 402
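A product count like the one listed above can also be reproduced outside Tableau. The following is a minimal sketch, assuming the POS data is available as (order ID, product) rows and that the count means the number of distinct orders containing each product. The sample rows and the function name `orders_per_product` are hypothetical and are not taken from the project files.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_id, product). The real CSV has 20682 rows.
rows = [
    (1, "poultry"), (1, "cereals"), (1, "poultry"),
    (2, "poultry"), (2, "waffles"),
    (3, "cereals"),
]

def orders_per_product(rows):
    """Count the number of distinct orders that contain each product."""
    orders = defaultdict(set)
    for order_id, product in rows:
        orders[product].add(order_id)
    counts = {product: len(ids) for product, ids in orders.items()}
    # Sort from most to least ordered; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

for product, n in orders_per_product(rows):
    print(f"{product.upper()} - {n}")
```

Using a set per product means a product that appears twice in the same order is counted once for that order.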
"All Purpose" is a generic product, so we remove it from the data to get better combos.
Market Basket Analysis creates If-Then scenario rules, for example, if item A is purchased then
item B is likely to be purchased. The rules are probabilistic in nature or, in other words, they are
derived from the frequencies of co-occurrence in the observations. Frequency is the proportion of
baskets that contain the items of interest. The rules can be used in pricing strategies, product
placement, and various types of cross-selling strategies. In order to make it easier to understand,
think of Market Basket Analysis in terms of shopping at a supermarket. Market Basket Analysis
takes data at transaction level, which lists all items bought by a customer in a single purchase.
The technique determines relationships of what products were purchased with which other
product(s). These relationships are then used to build profiles containing If-Then rules of the
items purchased. The rules could be written as If {A} Then {B}
The If part of the rule (the {A} above) is known as the antecedent and the THEN part of the rule is
known as the consequent (the {B} above).
The antecedent is the condition and the consequent is the result. An association rule has three measures that express the strength of the rule: Support, Confidence, and Lift.
Threshold Values
Support: the default popularity of an item. In mathematical terms, the support of item A is the ratio of transactions involving A to the total number of transactions.
Confidence: the likelihood that a customer who bought A also bought B. It divides the number of transactions involving both A and B by the number of transactions involving A.
Lift: the ratio of the confidence of the rule to the support of B. A lift above 1 means A and B are bought together more often than would be expected if they were independent.
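To make the three measures concrete, here is a minimal sketch for a rule If {A} Then {B}. It uses the standard definitions: support is the share of baskets containing an itemset, confidence is support(A and B) / support(A), and lift is confidence / support(B). The baskets and the function names are hypothetical examples and are not taken from the store data.

```python
# Hypothetical baskets; each set is the items of one order.
baskets = [
    {"milk", "eggs"},
    {"milk", "eggs", "soda"},
    {"milk"},
    {"eggs", "soda"},
]

def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence of the rule relative to the baseline popularity of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"milk"}, baskets))               # 3 of 4 baskets contain milk
print(confidence({"milk"}, {"eggs"}, baskets))  # 2 of the 3 milk baskets contain eggs
print(lift({"milk"}, {"eggs"}, baskets))
```

In this toy data the lift of If {milk} Then {eggs} is below 1, so milk buyers are slightly less likely to buy eggs than customers overall.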
Data Load
Filtered Data
The filtered data is then grouped by Order ID, which gives 1139 rows, one for each unique order.
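The filter and group steps can be sketched in plain Python as well. This is only an illustration of the idea, not the KNIME workflow itself. The row layout, the exact spelling of the generic product label ("all purpose") and the function name `build_baskets` are assumptions.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_id, product).
rows = [
    (1, "milk"), (1, "all purpose"), (1, "eggs"),
    (2, "soda"), (2, "milk"), (2, "milk"),
    (3, "all purpose"),
]

def build_baskets(rows, excluded=frozenset({"all purpose"})):
    """Drop excluded products, then group the remaining items by order ID."""
    baskets = defaultdict(set)
    for order_id, product in rows:
        name = product.strip().lower()
        if name in excluded:  # assumed label of the generic product
            continue
        baskets[order_id].add(name)
    return dict(baskets)

baskets = build_baskets(rows)
for order_id, items in baskets.items():
    print(order_id, sorted(items))
```

Note that an order containing only excluded products (order 3 here) disappears after filtering, so it is worth checking that the number of grouped rows still matches the number of unique orders.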
Grouped Data
MBA – CELL SPLITTER
The Association Rule Learner is the most important node for our Market Basket Analysis. It works with three metrics: Support, Confidence and Lift. The minimum support takes a value between 0 and 1; we set it to 0.03, meaning an itemset must appear in at least 3% of all transactions. We also selected the association rules option with a minimum confidence of 0.05. So as you can see the values
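The effect of the two thresholds can be illustrated with a small brute-force rule miner. This is a sketch for understanding only and is not the algorithm used inside the KNIME node. The function names, the `max_len` limit and the choice to emit only single-item consequents are assumptions of this sketch, and the baskets are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_len=3):
    """Brute force: test every item combination up to `max_len` items."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    frequent = {}
    for size in range(1, max_len + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            sup = sum(itemset <= basket for basket in baskets) / n
            if sup >= min_support:
                frequent[itemset] = sup
                found = True
        if not found:  # no frequent set of this size, so no larger one exists
            break
    return frequent

def association_rules(baskets, min_support=0.03, min_confidence=0.05, max_len=3):
    """Build rules with a single-item consequent from the frequent itemsets."""
    frequent = frequent_itemsets(baskets, min_support, max_len)
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            conf = sup / frequent[antecedent]
            if conf >= min_confidence:
                rules.append({
                    "antecedent": antecedent,
                    "consequent": consequent,
                    "support": sup,
                    "confidence": conf,
                    "lift": conf / frequent[frozenset([consequent])],
                })
    return rules

# Hypothetical baskets; each set is the items of one order.
baskets = [{"milk", "eggs"}, {"milk", "eggs", "soda"}, {"milk"}, {"eggs", "soda"}]
for rule in association_rules(baskets, min_support=0.03, min_confidence=0.05):
    print(sorted(rule["antecedent"]), "->", rule["consequent"], round(rule["lift"], 2))
```

Low thresholds such as 0.03 and 0.05 let many rules through, which is consistent with a large result table; raising either value shrinks the rule set.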
INSIGHTS & INFERENCE
As seen in the previous slide, the result table shows 145104 records, and each row contains a different rule.
The Association Rule Learner node created these rules based on the threshold limits that we set earlier, and we recommend to the customer the products from the rules with the higher lift values.
The Consequent column contains the recommended products, and we have sorted the lift values from higher to lower.
In the result table of the Association Rule Learner, some items are single as well as double and
So generally we recommend the products listed in the Consequent column that have a higher lift value. A higher lift means the product is more likely to be purchased by a customer who has already bought the antecedent items.
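The recommendation step described above, sorting rules by lift and suggesting the consequent, can be sketched as follows. The rule records, their lift values and the function name `recommend` are hypothetical and only illustrate the idea.

```python
# Hypothetical rules; the lift values are made up for illustration.
rules = [
    {"antecedent": frozenset({"soda"}), "consequent": "eggs", "lift": 1.33},
    {"antecedent": frozenset({"milk"}), "consequent": "eggs", "lift": 0.89},
    {"antecedent": frozenset({"eggs"}), "consequent": "soda", "lift": 1.33},
]

def recommend(cart, rules, top_n=3):
    """Suggest consequents of matching rules, highest lift first."""
    cart = set(cart)
    ranked = sorted(rules, key=lambda rule: rule["lift"], reverse=True)
    suggestions = []
    for rule in ranked:
        item = rule["consequent"]
        # A rule matches when all of its antecedent items are in the cart.
        if rule["antecedent"] <= cart and item not in cart and item not in suggestions:
            suggestions.append(item)
    return suggestions[:top_n]

print(recommend({"soda"}, rules))
print(recommend({"milk", "eggs"}, rules))
```

A store could apply the same ranking to choose combo offers, for example by bundling the antecedent items of a high-lift rule with its consequent.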