Hive Case Study: Problem Statement
PROBLEM STATEMENT:
With online sales growing in popularity, IT companies are looking for ways to
improve their sales by analysing customer behaviour and learning about product trends.
Their websites also make it easy for users to find the products they need without having
to hunt for them. As a result, the big data analyst is one of the most
sought-after job profiles of the decade. In this assignment, you will act
as a big data analyst: extract data and generate insights from a real-world data set
from an e-commerce company.
CASE STUDY:
1. We uploaded the files to S3 before launching the Hadoop EMR Cluster.
2. In the user directory, we created a directory called "case study".
12. We now compare the prior table sales with the optimised table sales_data.
The sales table took 3.4 seconds, whereas the sales_data table took 0.4 seconds.
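The steps that built the optimised table are not shown here. As an illustration only, one common way to speed up month-based queries in Hive is to partition the data by month and store it in a columnar format. The sketch below assumes a hypothetical table name (sales_partitioned_sketch), assumes the listed columns are stored as strings, and assumes event_time is in a format that month() can parse. It is not necessarily how sales_data was actually created.

```sql
-- Allow Hive to derive partition values from the SELECT output.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical table: only columns that appear in the queries of this
-- case study are listed, and their STRING types are an assumption.
CREATE TABLE IF NOT EXISTS sales_partitioned_sketch (
  event_time    STRING,
  category_code STRING,
  brand         STRING,
  price         STRING
)
PARTITIONED BY (event_month INT)
STORED AS ORC;

-- Copy rows from the unoptimised table; the last SELECT column
-- supplies the partition value for each row.
INSERT OVERWRITE TABLE sales_partitioned_sketch PARTITION (event_month)
SELECT event_time, category_code, brand, price,
       month(event_time) AS event_month
FROM sales;
```

With a layout like this, a filter such as `WHERE event_month = 10` lets Hive read only the October partition instead of scanning the whole table.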
Solutions
Write a query to yield the total sum of purchases per month in a single output.
month(event_time)=10 then cast(price as decimal(9,2)) else null end) as oct_rev from sales_data);
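Only the tail of the query for this question is preserved above. A minimal sketch of the same conditional-aggregation idea, returning both months in a single row, could look like the following. The alias nov_rev is assumed by analogy with oct_rev, and the commented-out filter on event_type is an assumption about the schema; neither is confirmed by the text.

```sql
-- One output row with one revenue column per month.
-- CASE without ELSE yields NULL, which SUM ignores.
SELECT
  SUM(CASE WHEN month(event_time) = 10
           THEN CAST(price AS DECIMAL(9,2)) END) AS oct_rev,
  SUM(CASE WHEN month(event_time) = 11
           THEN CAST(price AS DECIMAL(9,2)) END) AS nov_rev
FROM sales_data
-- Assumption: uncomment if the table has an event_type column and
-- only purchase events should count towards revenue.
-- WHERE event_type = 'purchase'
;
```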
Query: Select distinct category_code from sales_data where category_code is not null;
Find the total number of products available under each category.
Query: Select count(category_code) as product_count, category_code from sales_data where category_code is
not null group by category_code;
Which brand had the maximum sales in October and November combined?
Query: Select brand, count(*) as sales_count from sales_data group by brand order by sales_count
desc limit 5;
O/P: The brand with no name comes first with 3645290, followed by runail.
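Because the top row is the unnamed brand, it may be more useful to exclude rows without a brand before ranking. The sketch below assumes a missing brand is stored either as NULL or as an empty string, which the output above does not confirm.

```sql
-- Rank named brands only; rows with a missing brand are skipped.
SELECT brand, count(*) AS sales_count
FROM sales_data
WHERE brand IS NOT NULL
  AND brand <> ''
GROUP BY brand
ORDER BY sales_count DESC
LIMIT 1;
```

Based on the output reported above, this would be expected to return runail, provided the unnamed rows match one of the two assumed encodings.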
Which brands increased their sales from October to November?
Query: Select brand, count(case when month(event_time)=11 then 1 else null end) as nov_sales,
count(case when month(event_time)=10 then 1 else null end) as oct_sales from sales_data group by
brand having nov_sales > oct_sales;
Your company wants to reward the top 10 users of its website with a Golden
Customer plan. Write a query to generate a list of top 10 users who spend the
most.
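No query for this question is given here. A minimal sketch, assuming the table has a user_id column and an event_type column whose value 'purchase' marks completed purchases (both names are assumptions), might be:

```sql
-- Total spend per user, highest spenders first.
SELECT user_id,
       SUM(CAST(price AS DECIMAL(9,2))) AS total_spend
FROM sales_data
WHERE event_type = 'purchase'   -- assumed column and value
GROUP BY user_id
ORDER BY total_spend DESC
LIMIT 10;
```

The cast to decimal(9,2) mirrors the revenue query earlier in this case study, which keeps the totals exact to two decimal places.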