SMDM Coded Project - Vidya Sawant
Contents
Problem Statement, Objective
Data Description & Pre-Processing
Outlier Detection and Treatment
Univariate Analysis
Bivariate Analysis
Correlation plot of all Numeric variables
Pairplot of all Numeric variables
Problem 1 - Key questions to be answered
Problem 1 - Actionable insights and recommendation
Problem 2 - Framing analytical problem
Problem 2 - Business Justification
Problem Statement:
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback models. In its recent board meeting, concerns were
raised by the members on the efficiency of the marketing campaign currently being used. The board decides to rope in an analytics professional to
improve the existing campaign.
Objective:
Deep dive into data to get a fair idea about the demand of customers which will help them in enhancing their customer experience. Suppose you are a
Data Scientist at the company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to
find answers to these questions that will help the company to improve the business.
Data Description:
age: The age of the individual in years.
gender: The gender of the individual, categorized as male or female.
profession: The occupation or profession of the individual.
marital_status: The marital status of the individual, either married or single.
education: The educational qualification of the individual, either Graduate or Post Graduate.
no_of_dependents: The number of dependents (e.g., children, elderly parents) that the individual supports financially.
personal_loan: A binary variable indicating whether the individual has taken a personal loan "Yes" or "No"
house_loan: A binary variable indicating whether the individual has taken a housing loan "Yes" or "No"
partner_working: A binary variable indicating whether the individual's partner is employed "Yes" or "No"
salary: The individual's salary or income.
partner_salary: The salary or income of the individual's partner, if applicable.
Total_salary: The total combined salary of the individual and their partner (if applicable).
price: The price of a product or service.
make: The type of automobile (SUV, Sedan, or Hatchback).
Dataset – austo_automobile.csv
The dataset has 1581 rows and 14 columns: 8 object, 5 integer, and 1 float datatype columns.
There are 106 null values in 'Partner_salary' and 53 null values in 'Gender'.
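The shape, datatype counts, and null counts quoted above can be read off with a few pandas calls. The following is a minimal sketch, assuming pandas is available; the small frame is a made-up stand-in for austo_automobile.csv, and the column spellings are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for pd.read_csv("austo_automobile.csv"); values are made up.
df = pd.DataFrame({
    "Age": [53, 44, 29, 31],
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_salary": [70700.0, np.nan, 0.0, 40500.0],
    "Price": [61000, 35000, 22000, 27000],
})

n_rows, n_cols = df.shape                 # (rows, columns)
dtype_counts = df.dtypes.value_counts()   # number of columns per datatype
null_counts = df.isnull().sum()           # missing values per column

print(n_rows, n_cols)
print(dtype_counts)
print(null_counts[null_counts > 0])       # only columns that contain nulls
```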
Statistical summary of the numerical variables before proceeding with the analysis
After outlier treatment, the 'Total_salary' column no longer contains outliers, as the boxplot below shows.
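The report does not state which treatment was applied. One common choice, shown here purely as an assumption, is to cap values at the boxplot whisker limits (1.5 times the interquartile range beyond the quartiles). The values below are made up.

```python
import pandas as pd

# Made-up Total_salary values with one extreme point; not taken from the dataset.
total_salary = pd.Series(
    [60000, 72000, 65000, 80000, 58000, 171000], name="Total_salary"
)

q1 = total_salary.quantile(0.25)
q3 = total_salary.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr   # lower whisker limit
upper = q3 + 1.5 * iqr   # upper whisker limit

# Cap values outside the whisker limits instead of dropping the rows.
capped = total_salary.clip(lower=lower, upper=upper)

# Count values still outside the original limits after capping.
outliers_after = int(((capped < lower) | (capped > upper)).sum())
print(lower, upper, outliers_after)
```

Capping keeps all 1581 rows available for later analysis, whereas dropping outlier rows would shrink the dataset.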
Univariate Analysis
Univariate analysis of the numerical and categorical variables was performed to check the pattern of the data.
Numeric Variables - Histograms and boxplots of the numeric variables are used below to understand their distribution and pattern.
Observations:
Salary has a roughly symmetrical distribution. The median salary is Rs. 59,500, and most car buyers earn close to the median salary.
The Age distribution is roughly right-skewed: most car buyers are between 22 and 39 years old, with fewer buyers at older ages. To get a clearer
understanding of the 'Age' feature, ages have been grouped into age buckets.
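Grouping ages into buckets can be sketched with pandas.cut. The bin edges and labels below are assumptions chosen to resemble the age groups referred to in this report; the ages are made up.

```python
import pandas as pd

ages = pd.Series([22, 25, 26, 31, 39, 40, 47, 53], name="Age")  # made-up ages

# Assumed bin edges; right=True means each bin includes its upper edge.
bins = [0, 25, 39, 49, 59]
labels = ["<=25", "26-39", "40-49", "50-59"]
age_bucket = pd.cut(ages, bins=bins, labels=labels, right=True)

# Number of customers per bucket, in bucket order.
print(age_bucket.value_counts().sort_index())
```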
Price - Higher-priced cars are in much lower demand, so the distribution is right-skewed, with a long tail toward high prices. The average car price is Rs. 35,597.
Categorical Variables – Let's look at the univariate analysis of the categorical variables based on ratio (%).
Gender – Demand from male customers is clearly higher than from female customers: close to 80% of car buyers are male.
Marital Status – Married customers buy more cars than single customers: almost 91% of the automobiles were bought by married customers.
Make – Sedan and Hatchback models are in higher demand than SUVs.
Education – Post Graduate customers form the majority (62%) compared with Graduate customers.
Partner_working – Customers with a working partner account for more demand than customers with a non-working partner.
Personal_loan – No conclusion can be drawn at this point, as the ratio is almost the same for customers with and without a personal loan.
House_loan – Customers without a house loan appear to have more capacity to buy automobiles than customers with a house loan.
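Percentage shares like the ones above can be obtained with value_counts(normalize=True). This is a minimal sketch on made-up values for a single column; the same call applies to any of the categorical columns.

```python
import pandas as pd

# Made-up Gender values; the real column has 1581 entries.
gender = pd.Series(["Male", "Male", "Female", "Male", "Male"], name="Gender")

# Share of each category as a percentage, rounded to one decimal place.
share_pct = gender.value_counts(normalize=True).mul(100).round(1)
print(share_pct)
```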
Below are bar plots of the categorical variables based on the number of cars bought.
From the age-group view above, demand from the 26 to 39 age group is clearly higher than from the other age groups.
Bivariate Analysis:
Let's look at the bivariate analysis of numeric vs numeric variables.
There is a positive correlation between Age and Salary: salary increases with age, which in turn may lead customers to buy
higher-priced cars.
There is a positive correlation between car Price and customer Age: older customers have purchased higher-priced cars.
Customers aged 22 to 30 prefer cars priced between approximately 20k and 35k.
There is also a positive correlation between Total Salary and Price: the higher the total salary, the higher the price of the car bought.
Observations – Based on the above plots, we can observe the following relationships between categorical variables:
Married males account for more car purchases than married females (refer Graph 1).
The 26 to 39 age group buys more cars than any other age group (refer Graph 2).
The number of customers with a working partner is higher than the number with a non-working partner (refer Graph 3).
For Hatchback, demand from female customers is very low while demand from male customers is very high; female customers prefer SUV over Sedan and
Hatchback (refer Graph 4).
For SUV and Sedan, salaried customers are in the majority, whereas Hatchback demand comes from salaried and business customers in almost equal ratio.
Customers with 2 or 3 dependents form the largest groups; customers with 0 dependents are very few.
As mentioned earlier, there is a very strong correlation between the Age and Price variables.
Salary and Age are also positively correlated.
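The correlations described above correspond to entries of a Pearson correlation matrix. The sketch below uses DataFrame.corr on made-up values, so the printed numbers say nothing about the real dataset; the column names are assumptions.

```python
import pandas as pd

# Made-up rows; only the mechanics of the calculation are illustrated.
df = pd.DataFrame({
    "Age": [22, 28, 35, 41, 50],
    "Salary": [52000, 58000, 61000, 74000, 88000],
    "Price": [20000, 24000, 33000, 47000, 62000],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

A heatmap of this matrix is a common way to present the same information visually.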
Observations:
For SUV, the average price is higher than the median price.
For the 26-39 age group, the distribution is symmetrical, but an outlier is also detected.
For the 50-59 and <25 age groups, the distribution is right-skewed. As age increases, the Total Salary of customers also increases.
Based on the Gender vs Price boxplot, the median spend by female customers is higher than the median spend by male customers, although the
male group contains some extreme values for car price.
3. What evidence or data supports Sheldon Cooper's claim that a salaried male is an easier target for an SUV sale over a Sedan sale?
Answer - The visualization below shows that a salaried male is an easier target for an SUV sale than for a Sedan sale: salaried males
buy fewer Sedans than SUVs.
4. How does the amount spent on purchasing automobiles vary by gender?
Answer - Based on the visualization below, female customers have spent more on automobiles than male customers, even though there are fewer female buyers.
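Spending by gender can be summarised with a groupby. Because total spend and typical spend per car can point in different directions when one group has far fewer buyers, the sketch below reports count, sum, mean, and median side by side. The rows are made up and the column names are assumptions.

```python
import pandas as pd

# Made-up purchases; not taken from the dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female"],
    "Price": [20000, 30000, 40000, 45000, 55000],
})

# Buyers, total spend, average spend and median spend per gender.
by_gender = df.groupby("Gender")["Price"].agg(["count", "sum", "mean", "median"])
print(by_gender)
```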
5. How much money was spent on purchasing automobiles by individuals who took a personal loan?
Answer – From the table below, individuals who took a personal loan spent Rs. 27,290,000 on purchasing automobiles.
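A total of this kind is a filtered sum over the price column. This is a minimal sketch on made-up rows, assuming the columns are named Personal_loan and Price.

```python
import pandas as pd

# Made-up rows; the real figure comes from the full dataset.
df = pd.DataFrame({
    "Personal_loan": ["Yes", "No", "Yes", "No"],
    "Price": [30000, 45000, 25000, 60000],
})

# Total price paid by customers who have a personal loan.
loan_spend = df.loc[df["Personal_loan"] == "Yes", "Price"].sum()
print(loan_spend)
```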
6. How does having a working partner influence the purchase of higher-priced cars?
Answer – From the above scatter plot, customers with a working partner have purchased higher-priced cars than customers with a non-working partner.
Objective
As a Data Scientist at the company, you have been given some data by the Data Science team. You are supposed to find the key variables that have a vital
impact on the analysis, which will help the company improve the business.
Based on analysis and study of the data, below are a few important variables for growing the credit card business
and reducing the risk of attrition:
1) Card_type
2) high_networth
3) Transactor_revolver
4) Occupation at source
5) Avg Spends
6) Annual Income at source
7) CC Limit
With regard to the above important variables, we have framed some analytical questions and tried to answer them.
1) How many credit card types are there, and what is the volume of customers for each?
Answer - There are 15 credit card types: 'rewards', 'prosperity', 'edge', 'chartered', 'smartearn', 'shoprite', 'indianoil', 'cashback', 'aura', 'gold',
'prime', 'pulse', 'elite', 'centurion', and 'platinum'.
The barplot above shows that 'rewards', 'prosperity', 'edge', 'chartered', 'smartearn', 'shoprite', 'indianoil', 'cashback' and 'aura' have more
customers. 'rewards' and 'prosperity' are the top credit card types by customer volume.
3) How many credit cards have had no activity in the past 90 days?
Answer: Out of a total of 8448 credit cards, 3106 have had no activity in the past 90 days.
4) Which credit card types have had no activity in the past 90 days?
Answer: Below are the credit card types with no activity in the past 90 days. Customers with the Rewards card type form the largest group
with no activity in the past 90 days.
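Both inactivity questions reduce to a boolean mask on the 90-day activity column. The sketch below assumes that cc_active90 holds the number of transactions in the past 90 days, so 0 means no activity; the rows are made up.

```python
import pandas as pd

# Made-up cards; cc_active90 is assumed to be a 90-day transaction count.
cards = pd.DataFrame({
    "card_type": ["rewards", "edge", "rewards", "gold", "prosperity"],
    "cc_active90": [0, 3, 0, 1, 0],
})

inactive = cards["cc_active90"] == 0          # True where no activity in 90 days
n_inactive = int(inactive.sum())              # number of inactive cards
inactive_by_type = cards.loc[inactive, "card_type"].value_counts()

print(n_inactive, "of", len(cards), "cards inactive")
print(inactive_by_type)
```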
5) How many customers fall under the Transactor and Revolver categories?
Answer – Approximately 85% of customers are Transactors and 15% are Revolvers. The data of Revolver customers needs further analysis.
6) Which credit card types have card holders who are revolvers?
Answer – Below are the card types held by the 15% of customers who are revolvers.
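The Transactor and Revolver split, and the card types held by revolvers, can be sketched as follows. The codes "T" and "R" in Transactor_revolver are an assumption about how the column is encoded, and the rows are made up.

```python
import pandas as pd

# Made-up cards; "T"/"R" are assumed codes for Transactor/Revolver.
cards = pd.DataFrame({
    "card_type": ["edge", "rewards", "edge", "gold", "rewards", "edge"],
    "Transactor_revolver": ["T", "T", "R", "T", "R", "T"],
})

# Share of customers in each category.
share = cards["Transactor_revolver"].value_counts(normalize=True)

# Card types held by revolver customers only.
is_revolver = cards["Transactor_revolver"] == "R"
revolver_types = cards.loc[is_revolver, "card_type"].value_counts()

print(share.round(3))
print(revolver_types)
```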
Business Justification:
As a result of the above analysis, we can conclude that the bank should focus on improving or reviewing its credit card policy towards
Edge, Prosperity and Chartered card holders issued by Visa. These customers are mostly self-employed or salaried, their average spending over the last
three months is high, and their spending is unaffected even though they own other banks' credit cards. In addition, they settle their balances in
full every month.
Hence, we can conclude that below are the top five important variables for framing a good analytical business problem for GODIGTBank to
review its credit card policy.
The columns look as follows:
1. card_type
2. annual_income_at_source
3. Occupation_at_source
4. avg_spends_l3m
5. Transactor_revolver
If cc_active30, cc_active60 and cc_active90 all show 0 transactions, then avg_spends_l3m should also be 0, but non-zero values are still present in the
data.
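The inconsistency described above can be flagged with a row-wise check. This is a minimal sketch on made-up rows, assuming the three cc_active columns are transaction counts and avg_spends_l3m is the average spend over the last three months.

```python
import pandas as pd

# Made-up rows; row 1 has no activity in any window but a non-zero average spend.
cards = pd.DataFrame({
    "cc_active30": [0, 0, 2, 0],
    "cc_active60": [0, 0, 4, 1],
    "cc_active90": [0, 0, 5, 1],
    "avg_spends_l3m": [0.0, 1500.0, 8200.0, 300.0],
})

activity_cols = ["cc_active30", "cc_active60", "cc_active90"]
no_activity = (cards[activity_cols] == 0).all(axis=1)   # zero in all three windows

# Rows that report spending despite showing no activity at all.
inconsistent = cards[no_activity & (cards["avg_spends_l3m"] > 0)]
print(len(inconsistent), "inconsistent rows")
```

Rows flagged this way would need to be reviewed before avg_spends_l3m is used as a key variable.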