Data Analysis
Data Analytics Report
Retail Sales Forecasting for a Chain of Supermarkets
1. Introduction
Background of the Dataset
Today, stores use data to make better decisions, help customers, and work more smoothly. Big
store chains that have shops in many places collect a lot of data every day. This data comes
from sales, customer interactions, stock checks, and discount campaigns. When this data is
studied, it helps stores know what to sell, how much to charge, how much stock to keep, and how
to advertise to different kinds of customers.
This report uses a made-up but realistic dataset to show how a supermarket chain might work in
both cities and villages. The data is modeled on real-life examples found on websites like
Kaggle and UCI that share datasets for learning. It includes over 100,000 sales made during one
year, showing shopping habits, changes across seasons, and how discounts can affect buying.
Each row in the dataset shows one shopping transaction and includes the following important
details:
• Date of purchase: Shows the exact day a purchase was made. This helps us see patterns over
days, weeks, months, and seasons.
• Store location and type: Tells us whether the store is in a city or a village, and in which
city. This helps compare how well stores are doing in different places.
• Product category: Products are grouped into types like groceries, personal care, electronics,
clothes, and household items. This helps us see which types of products make the most money
and which are most affected by sales and offers.
• Units sold and money earned: Shows how many items were sold and how much money was made from
them. This helps measure store performance, how fast products sell, and how much money is
made per customer.
• Discounts: Shows whether a product was sold at a lower price and how big the discount was.
This helps us understand whether sales and offers really work.
• Customer details: Basic customer information like age group, gender, and whether they are a
member of the loyalty program. This helps group customers and study how differently people
shop.
Using this kind of dataset is helpful because it looks like the real data that medium and large
stores use every day. Big retail companies around the world track this kind of information. It
helps store managers make smart decisions based on data, not just on guesses.
One great thing about this dataset is that it has many kinds of data. It has numbers, like how
many items were sold and how much money was made, and it also has categories, like store
location, product type, and customer type. Because of this, we can study the data in many ways,
such as looking for patterns, making predictions, and grouping similar things.
From a business point of view, this data can help answer important questions like:
• How do customer choices change by age, gender, or location?
This report will look at the data step by step, using modern tools to show how a store can use its
data to improve and make better decisions.
2. Methodology
Analytical Workflow and Tools
To get useful information from the data, we followed a clear step-by-step process used in the
data industry. We used ideas from a common method called CRISP-DM, which helps guide data
projects. We used both basic math (statistics) and computer models (machine learning) to answer
the business questions mentioned earlier.
The goal was to find patterns, understand how different things in the data are connected, and
help the retail business make smart decisions based on facts.
• Removed Duplicates: We deleted repeated records so that the results wouldn’t be skewed or
wrong.
• Fixed Missing Data: Some parts of the data were empty, like customer details or discount
info. We filled the missing numbers using the average, and for missing categories (like
gender), we used the most common value. If too much data was missing in a row, we removed
that row.
• Made Categories Consistent: We cleaned up category names like store type (city or village),
product types, and gender so everything matched. We then changed these words into numbers
using simple encoding methods so computer models could understand them.
• Changed Date Format: We changed the date field into a date format that lets us easily find
the day, month, or quarter. This helped us study sales over time and see trends.
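The cleaning steps above can be sketched with Pandas roughly as follows. This is only an
illustration under stated assumptions: the column names, the sample rows, and the rule of
keeping a row only when at least three fields are filled are hypothetical, since the report
does not give the exact schema or threshold.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the report's schema.
raw = pd.DataFrame({
    "date": ["2023-01-05", "2023-01-05", "2023-02-11", "2023-03-20", "2023-04-02", None],
    "store_type": ["City", "City", "village ", "CITY", "Village", None],
    "gender": ["F", "F", None, "M", "F", None],
    "units": [2, 2, 5, None, 1, None],
    "revenue": [10.0, 10.0, 22.5, 8.0, 4.0, None],
})


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    # Remove exact duplicate records.
    df = df.drop_duplicates()
    # Drop rows with too many gaps: keep rows with at least 3 filled fields
    # (assumed threshold). Done before filling so the gaps are still visible.
    df = df.dropna(thresh=3).copy()
    # Fill missing numbers with the column average.
    df["units"] = df["units"].fillna(df["units"].mean())
    # Fill missing categories with the most common value.
    df["gender"] = df["gender"].fillna(df["gender"].mode()[0])
    # Make category labels consistent, then encode them as integer codes.
    df["store_type"] = df["store_type"].str.strip().str.lower()
    df["store_type_code"] = df["store_type"].astype("category").cat.codes
    # Parse dates so that month and quarter can be extracted.
    df["date"] = pd.to_datetime(df["date"])
    df["month"] = df["date"].dt.month
    df["quarter"] = df["date"].dt.quarter
    return df


cleaned = clean_sales(raw)
print(cleaned)
```

Integer codes are the simplest encoding; for models that would read the codes as an ordering,
one-hot columns (for example via pd.get_dummies) are a common alternative.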
• Basic Statistics: We used Python tools like Pandas and NumPy to find the average, the middle
value (median), how much the values vary, and how often things appear.
• Graphs and Charts: We used tools like Seaborn and Matplotlib to make:
◦ Line graphs to see how sales changed over time
From these graphs, we learned that city stores sold more electronics, while village stores made
more money from groceries.
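As a minimal sketch of the basic statistics step, the snippet below computes the average,
median, spread, and category frequencies with Pandas, plus a store-by-category revenue table of
the kind that would support the comparison above. The rows and column names are invented for
illustration and are not taken from the report's dataset.

```python
import pandas as pd

# Hypothetical transactions; values are invented for illustration only.
sales = pd.DataFrame({
    "store_type": ["city", "city", "village", "village", "city", "village"],
    "category": ["electronics", "groceries", "groceries", "groceries",
                 "electronics", "clothes"],
    "revenue": [250.0, 40.0, 35.0, 45.0, 310.0, 60.0],
})

# Average, middle value, and spread of revenue per transaction.
summary = {
    "mean": sales["revenue"].mean(),
    "median": sales["revenue"].median(),
    "std": sales["revenue"].std(),  # sample standard deviation
}

# How often each product category appears.
category_counts = sales["category"].value_counts()

# Total revenue for each store type and product category.
revenue_by_store_and_category = sales.pivot_table(
    index="store_type", columns="category", values="revenue",
    aggfunc="sum", fill_value=0,
)

print(summary)
print(category_counts)
print(revenue_by_store_and_category)
```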
• ARIMA Model: This method helped us predict future sales by looking at past trends and
patterns.
• Exponential Smoothing: This helped us see seasonal effects (like higher sales during
holidays) by reducing short-term ups and downs.
• STL (Seasonal and Trend decomposition using Loess): This method broke the sales data into
three parts: trend, seasonality, and residual.
These tools helped us understand how sales change over time and plan better for the future.
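To make the smoothing idea concrete, here is a small sketch of simple (single) exponential
smoothing in plain Python. The report does not say which variant or library routine was used,
so this most basic form, the function name, the sample values, and the choice of alpha are all
assumptions for illustration.

```python
def simple_exponential_smoothing(values, alpha):
    """Return s_t = alpha * x_t + (1 - alpha) * s_(t-1), starting from s_0 = x_0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [float(values[0])]
    for x in values[1:]:
        # Blend the new observation with the previous smoothed value.
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily sales with a one-day spike.
daily_sales = [100, 102, 98, 180, 101, 99]
smoothed = simple_exponential_smoothing(daily_sales, alpha=0.3)
print(smoothed)
```

A smaller alpha gives a smoother curve that reacts more slowly to spikes, while a larger alpha
follows the raw data more closely. Capturing seasonality directly would need a seasonal variant
such as Holt-Winters.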
4. Regression Analysis
To find out what affects sales, we used the following methods:
• Linear Regression: This helped us see how things like discounts, customer details (like age
or gender), and store location affect sales and the number of items sold.
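A linear regression of this kind can be sketched as below. The example fits ordinary least
squares with NumPy instead of Scikit-learn to stay dependency-light, and the data is generated
from a made-up rule (20 base units, plus 1.5 units per discount point, plus 8 units for city
stores) so that the recovered coefficients can be checked. None of these numbers come from the
report.

```python
import numpy as np

# Hypothetical predictors: discount in percent and a city/village flag (1 = city).
discount = np.array([0, 5, 10, 15, 20, 0, 5, 10, 15, 20], dtype=float)
is_city = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=float)

# Units sold generated from a known, invented rule so the fit can be verified.
units = 20 + 1.5 * discount + 8 * is_city

# Design matrix with an intercept column, then an ordinary least squares fit.
X = np.column_stack([np.ones_like(discount), discount, is_city])
coef, *_ = np.linalg.lstsq(X, units, rcond=None)
intercept, discount_effect, city_effect = coef

print(f"intercept={intercept:.2f}, per discount point={discount_effect:.2f}, "
      f"city effect={city_effect:.2f}")
```

Each coefficient is read as the change in units sold for a one-unit change in that predictor
while the others are held fixed.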
• We grouped customers based on how much they spend, how often they shop, and what
they buy using a method called K-Means Clustering.
• We found different types of shoppers (like discount hunters, premium buyers, and bulk
buyers) to help improve marketing and customer loyalty strategies.
This helped the business make better marketing and loyalty plans.
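The grouping step could look roughly like the sketch below, which uses Scikit-learn's KMeans on
three invented features per customer. The feature choice, the nine sample customers, and the
use of three clusters are assumptions for illustration; the report does not list its actual
inputs or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical features per customer:
# [average spend per visit ($), visits per month, share of items bought on discount]
customers = np.array([
    [15, 8, 0.80], [18, 9, 0.75], [16, 7, 0.85],  # frequent, discount-driven
    [90, 2, 0.10], [95, 3, 0.05], [88, 2, 0.15],  # high spend, few discounts
    [45, 1, 0.50], [50, 1, 0.45], [48, 1, 0.55],  # infrequent, mid-range spend
], dtype=float)

# Put all features on the same scale so that spend does not dominate the distances.
scaled = StandardScaler().fit_transform(customers)

# Fit K-Means with a fixed seed so the run is repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(scaled)
print(labels)
```

The cluster numbers themselves carry no meaning. Names such as "discount hunters" are assigned
afterwards by inspecting the average feature values within each cluster.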
6. Tools Used
We used both coding tools and spreadsheet software to do the analysis.
• Python Libraries:
◦ Pandas and NumPy for data handling and numerical operations
◦ Seaborn and Matplotlib for advanced data visualization
◦ Scikit-learn for regression and clustering models
◦ Statsmodels for statistical modeling and time series analysis
• Excel: Used at the beginning to look at the data, make quick summaries, and create pivot
tables before doing deeper analysis with code.
• Sales were higher on weekends than on weekdays. On average, people spent about $18,500 on
weekends and $14,000 on weekdays, a 32% increase in weekend spending.
• The highest sales in one day happened on December 23rd, with $78,000 in sales, likely because
of last-minute holiday shopping.
• We also saw more sales in stationery and clothes during August and September, which matches
the back-to-school season.
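A weekend versus weekday comparison like the one above can be derived from dated sales with
Pandas. In this sketch the seven daily totals are hypothetical values chosen so that their
averages match the reported $14,000 and $18,500; they are not rows from the dataset.

```python
import pandas as pd

# Hypothetical daily totals for one week, Monday 2023-01-02 to Sunday 2023-01-08.
daily = pd.DataFrame({
    "date": pd.date_range("2023-01-02", periods=7, freq="D"),
    "sales": [14000, 13500, 14500, 14000, 14000, 18000, 19000],
})

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
daily["is_weekend"] = daily["date"].dt.dayofweek >= 5

# Average sales for weekdays (False) and weekends (True).
avg = daily.groupby("is_weekend")["sales"].mean()
uplift_pct = (avg.loc[True] / avg.loc[False] - 1) * 100

print(avg)
print(f"Weekend uplift: {uplift_pct:.1f}%")
```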
Store Type   Avg Monthly Revenue ($)   Avg Transaction Value ($)   Customer Footfall (avg/month)
Urban        450,000                   38                          11,842
Rural        320,000                   31                          10,322
• Urban stores made about 40.6% more money than rural stores. This is because city stores have
more people, higher spending, and bigger purchases.
• City stores also had more customers and bigger sales per purchase than rural stores.
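The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below
assumes that each customer visit corresponds to one transaction, so that monthly revenue
divided by average transaction value approximates footfall. That reading is an assumption, not
something the report states.

```python
# Values copied from the store comparison table.
stores = {
    "Urban": {"revenue": 450_000, "avg_transaction": 38, "footfall": 11_842},
    "Rural": {"revenue": 320_000, "avg_transaction": 31, "footfall": 10_322},
}

# How much more revenue urban stores make than rural stores, in percent.
revenue_gap_pct = (stores["Urban"]["revenue"] / stores["Rural"]["revenue"] - 1) * 100

# Footfall implied by revenue / average transaction value (one transaction per visit assumed).
implied_footfall = {
    name: s["revenue"] / s["avg_transaction"] for name, s in stores.items()
}

print(f"Urban vs rural revenue gap: {revenue_gap_pct:.1f}%")
for name, value in implied_footfall.items():
    print(f"{name}: implied footfall {value:,.1f}, reported {stores[name]['footfall']:,}")
```

Under that assumption the implied footfall agrees with the reported column to within one
customer per month for both store types.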
Product Performance Matrix
• Electronics and household items made more profit (35% and 28%).
• The best-selling product was “Family Pack Milk 2L”, selling about 3,450 units per month and
making $6,200 every month.
• Promotions for Cluster 2 (Premium Shoppers) worked well: 18.3% more of them bought expensive
products, showing that targeted ads and deals were effective.
• Cluster 1 customers didn’t spend as much per visit, but they made up more than half of all
shoppers, so it’s smart to keep them happy with bulk or value deals.
Strengths
• Accuracy: The error was low, so we can trust the predictions.
• We grouped customers in helpful ways for better marketing and planning.
• We used charts and tables to make the findings easy to understand.
• The results gave direct ideas for stock, ads and running the business.
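The report does not name the error measure behind the accuracy claim. As a hedged illustration,
two common choices for judging a sales forecast are the mean absolute error (MAE) and the mean
absolute percentage error (MAPE), sketched below with invented numbers.

```python
def mean_absolute_error(actual, forecast):
    """Average size of the forecast error, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(actual, forecast):
    """Average error as a percentage of the actual value (actual values must be non-zero)."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and predicted sales for two periods.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]

mae = mean_absolute_error(actual, forecast)
mape = mean_absolute_percentage_error(actual, forecast)
print(f"MAE = {mae:.1f}, MAPE = {mape:.1f}%")
```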
Even though the data was not real but based on realistic examples, the results show how data
can be used to make smart business decisions. This analysis helps increase sales and cut down
on waste in both urban and rural areas.
Recommendations
Based on the findings from the data, we propose some recommendations to benefit the business:
If the company follows these steps, it can improve how it runs the stores, market to the right
people, and make better plans, which could increase profits by 10-15% each year. Using data in
daily decisions is not just helpful; it’s a big advantage in today’s business world.