SMDM Coded Project - Vidya Sawant

The document analyzes customer data from an automobile company to understand customer demand and improve their marketing campaign. It describes the data, performs outlier detection and treatment, and univariate and bivariate analysis. Key findings are that most customers are male, married, and between ages 22-39, and demand is higher for lower to mid-priced vehicles.

Uploaded by

Vidya Sawant
SMDM Coded Project- Vidya Sawant

Contents
Problem Statement, Objective
Data Description & Pre-Processing
Outlier Detection and Treatment
Univariate Analysis
Bivariate Analysis
Correlation plot of all numeric variables
Pairplot of all numeric variables
Problem 1 - Key questions to be answered
Problem 1 - Actionable insights and recommendations
Problem 2 - Framing analytical problem
Problem 2 - Business Justification

Problem Statement:
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback models. In a recent board meeting, members raised
concerns about the effectiveness of the marketing campaign currently being used. The board decided to bring in an analytics professional to
improve the existing campaign.

Objective:
Deep dive into the data to get a fair idea of customer demand, which will help the company enhance its customer experience. Suppose you are a
Data Scientist at the company and the Data Science team has shared some key questions that need to be answered. Perform the data analysis to
find answers to these questions that will help the company improve its business.

Data Description:
• age: The age of the individual in years.
• gender: The gender of the individual, categorized as male or female.
• profession: The occupation or profession of the individual.
• marital_status: The marital status of the individual, such as married or single.
• education: The educational qualification of the individual: Graduate or Post Graduate.
• no_of_dependents: The number of dependents (e.g., children, elderly parents) that the individual supports financially.
• personal_loan: A binary variable indicating whether the individual has taken a personal loan: "Yes" or "No".
• house_loan: A binary variable indicating whether the individual has taken a housing loan: "Yes" or "No".
• partner_working: A binary variable indicating whether the individual's partner is employed: "Yes" or "No".
• salary: The individual's salary or income.
• partner_salary: The salary or income of the individual's partner, if applicable.
• Total_salary: The total combined salary of the individual and their partner (if applicable).
• price: The price of the automobile.
• make: The type of automobile.

Dataset – austo_automobile.csv
The dataset has a total of 1581 rows and 14 columns: 8 object, 5 integer and 1 float datatype columns.

There are 106 null values in ‘Partner_salary’ and 53 null values in ‘Gender’.

 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Age               1581 non-null   int64
 1   Gender            1528 non-null   object
 2   Profession        1581 non-null   object
 3   Marital_status    1581 non-null   object
 4   Education         1581 non-null   object
 5   No_of_Dependents  1581 non-null   int64
 6   Personal_loan     1581 non-null   object
 7   House_loan        1581 non-null   object
 8   Partner_working   1581 non-null   object
 9   Salary            1581 non-null   int64
 10  Partner_salary    1475 non-null   float64
 11  Total_salary      1581 non-null   int64
 12  Price             1581 non-null   int64
 13  Make              1581 non-null   object
dtypes: float64(1), int64(5), object(8)
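
The column summary above is the kind of output pandas produces with `DataFrame.info()`. A minimal sketch of that inspection step follows. It assumes pandas and uses a tiny inline CSV with made-up values as a stand-in for austo_automobile.csv; only the column names are taken from the table above.

```python
import io
import pandas as pd

# Made-up stand-in rows; with the real file this would be
# pd.read_csv("austo_automobile.csv") (path is an assumption).
csv_text = """Age,Gender,Partner_working,Partner_salary,Price
53,Male,No,,61000
53,Femal,Yes,70300,61000
30,,Yes,,33000
"""
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape        # number of rows and columns
null_counts = df.isnull().sum()  # missing values per column

df.info()                        # prints a column / non-null / dtype table
print(null_counts)
```

On the full dataset the same two calls would give the row and column counts and the null counts per column reported above.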

Statistical summary of the numerical variables before proceeding with the analysis

Correction of missing data, anomalies and duplicates before proceeding with the analysis


• As seen above, there are 106 null values in Partner_salary. These have been imputed with the mean partner salary,
only for customers with a working partner.
• Null values in ‘Gender’ have been imputed with the mode of the column, i.e. ‘Male’ in this case.
• The ‘Gender’ column also has some bad data (‘Femle’, ‘Femal’), which has been corrected to ‘Female’.
• There are no duplicates in the data.
• Total_salary has been recalculated, because null values in Partner_salary were replaced with the mean of
Partner_salary.
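
The cleaning steps listed above could be sketched roughly as follows. This is an illustration on made-up rows, not the exact code behind the report. Treating a missing partner salary as 0 when the partner is not working is an assumption; the report only states how nulls were handled for customers with a working partner.

```python
import pandas as pd

# Made-up rows; column names follow the dataset description.
df = pd.DataFrame({
    "Gender": ["Male", "Femal", None, "Male", "Male", "Femle"],
    "Partner_working": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Salary": [50000, 60000, 40000, 70000, 55000, 65000],
    "Partner_salary": [30000.0, None, None, 40000.0, 0.0, 20000.0],
})

# Fix misspelled labels, then impute missing Gender with the mode.
df["Gender"] = df["Gender"].replace({"Femal": "Female", "Femle": "Female"})
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# Impute Partner_salary with the mean over customers whose partner works.
working = df["Partner_working"] == "Yes"
mean_working = df.loc[working, "Partner_salary"].mean()
df.loc[working & df["Partner_salary"].isna(), "Partner_salary"] = mean_working
# Assumption (not stated in the report): no working partner means salary 0.
df.loc[~working & df["Partner_salary"].isna(), "Partner_salary"] = 0

# Recalculate the total after imputation and check for duplicate rows.
df["Total_salary"] = df["Salary"] + df["Partner_salary"]
has_duplicates = bool(df.duplicated().any())
```

Correcting the misspelled labels before taking the mode keeps the variants from being counted as separate categories.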

Statistical summary of the numerical variables after removing anomalies, bad data, etc.

Outlier Detection and Treatment


• Using a boxplot, we detected some outliers in Total_salary and treated them with the IQR method.

• After the treatment, there are no outliers left in the ‘Total_salary’ column. Please see the boxplot below.
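
The report does not say whether the IQR treatment removed or capped the outliers. The sketch below assumes capping at the usual boxplot whiskers (Q1 - 1.5 x IQR and Q3 + 1.5 x IQR) and uses made-up salary values.

```python
import pandas as pd

# Made-up Total_salary values with one extreme point.
total_salary = pd.Series([40000, 55000, 60000, 65000, 70000, 80000, 90000, 250000])

q1 = total_salary.quantile(0.25)
q3 = total_salary.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr   # lower whisker limit
upper = q3 + 1.5 * iqr   # upper whisker limit

# Points outside the whiskers are what the boxplot flags as outliers.
outliers = total_salary[(total_salary < lower) | (total_salary > upper)]

# Assumed treatment: cap values at the whisker limits instead of dropping rows.
capped = total_salary.clip(lower=lower, upper=upper)
```

Capping keeps every row in the dataset, which matters here because the other columns of those customers are still used in later analysis.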

Univariate Analysis
• Univariate analysis of the numerical and categorical variables has been performed to check the pattern of the data.

Numeric Variables - Here is the visualization for the numeric univariate analysis, using histplots and boxplots to understand the distribution
and pattern.

Observations:

• Salary - The distribution is roughly symmetrical. The median salary is Rs. 59,500, and most customers buying cars have a salary close to
the median.
• Age - The distribution is right-skewed, and most car buyers are between 22 and 39 years old. To get a clearer understanding of the ‘Age’
feature, we have grouped Age into age buckets.
• Price - Higher-priced cars are in much lower demand, so the distribution is right-skewed. The average car price is Rs. 35,597.

Categorical Variables - Let's look at the univariate analysis of the categorical variables based on their share (%)

• Gender - Demand from male customers is clearly higher than from female customers. Close to 80% of car buyers are male.

• Marital Status - Married customers buy more cars than single customers. Almost 91% of the automobiles were bought by married
customers.

• Make - Sedan and Hatchback models are in higher demand than SUVs.

• Education - Post Graduate customers make up 62% of buyers, more than Graduate customers.

• Profession - Salaried customers have more buying capacity than business customers.

• Partner_working - Customers with a working partner generate more demand than customers with a non-working partner.

• Personal_loan - It is difficult to draw a conclusion at this point, as the share is almost the same for customers with and without a personal loan.

• House_loan - Customers with no house loan have more capacity to buy automobiles than customers with a house loan.
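
The percentage shares quoted for the categorical variables can be computed with `value_counts(normalize=True)`. The sketch below uses five made-up rows, so the shares it prints are illustrative and are not the report's figures.

```python
import pandas as pd

# Made-up rows; column names follow the dataset description.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female"],
    "Marital_status": ["Married", "Married", "Married", "Single", "Married"],
})

# Share of each category in percent, one Series per categorical column.
shares = {
    col: (df[col].value_counts(normalize=True) * 100).round(1)
    for col in df.columns
}

for col, pct in shares.items():
    print(col)
    print(pct.to_string())
```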

Below are the bar plots for the categorical variables, based on the number of cars bought.

• From the age group view above, demand from the 26 to 39 age group is clearly higher than from other age groups.
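
Grouping Age into buckets, as done for the age group view, can be sketched with `pd.cut`. The bucket edges and labels below are assumptions chosen to resemble the groups named in this report (<25, 26-39, 50-59); the exact edges used in the analysis are not stated, and the ages are made up.

```python
import pandas as pd

ages = pd.Series([22, 25, 26, 31, 39, 40, 54])  # made-up ages

# Assumed bucket edges: (0, 25], (25, 39], (39, 49], (49, 59].
labels = ["<=25", "26-39", "40-49", "50-59"]
age_group = pd.cut(ages, bins=[0, 25, 39, 49, 59], labels=labels)

# Number of customers per bucket, in bucket order.
counts = age_group.value_counts().sort_index()
print(counts)
```

The resulting column behaves like any other categorical variable, so it can be used in bar plots and cross-tabulations.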

Bivariate Analysis:
Let's look at the bivariate analysis of Numeric vs Numeric variables

Observations - Based on the scatterplots above, we can observe the following relationships between variables:

• There is a positive correlation between Age and Salary: as age increases, salary also increases, which in turn lets people buy
higher-priced cars.
• There is a positive correlation between Price and Age: older customers have purchased higher-priced cars.
• Customers aged 22 to 30 prefer to buy cars priced approximately between 20k and 35k.
• There is also a positive correlation between Total Salary and Price: the higher the total salary, the higher the price of the car bought.

Let's look at the bivariate analysis of Categorical vs Categorical variables

Observations - Based on the plots above, we can observe the following relationships between categorical variables:

• Married males generate more demand for cars than married females (refer Graph 1).
• The 26 to 39 age group buys more cars than other age groups (refer Graph 2).
• The number of customers with a working partner is higher than the number with a non-working partner (refer Graph 3).
• For Hatchback, demand from females is very low while demand from males is very high. Females, on the other hand, prefer SUVs over
Sedans and Hatchbacks (refer Graph 4).
• For SUV and Sedan, most customers are salaried, whereas Hatchback demand comes from salaried and business customers in almost equal ratio.
• Customers with 2 or 3 dependents make up more volume than the others. Customers with 0 dependents are very few in number.
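
Comparisons between two categorical variables, such as Gender against Make, can be tabulated with `pd.crosstab`. The sketch below uses made-up rows; the row percentages are included because raw counts alone can mislead when one group is much larger than the other.

```python
import pandas as pd

# Made-up rows; column names follow the dataset description.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Make": ["Hatchback", "Hatchback", "Sedan", "SUV", "SUV", "Sedan"],
})

# Number of cars per Gender and Make.
counts = pd.crosstab(df["Gender"], df["Make"])

# Share of each Make within a gender (each row sums to 100).
row_pct = pd.crosstab(df["Gender"], df["Make"], normalize="index") * 100

print(counts)
print(row_pct.round(1))
```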

Correlation Plot for all Numeric Variables

• As mentioned earlier, there is a very strong correlation between the Age and Price variables.
• Salary and Age are also positively correlated.
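
The numbers behind a correlation plot like this are a pairwise Pearson correlation matrix. A minimal sketch on made-up values follows; drawing the matrix as a heatmap (for example with a plotting library) is left out so that the example depends only on pandas.

```python
import pandas as pd

# Made-up numeric columns; names follow the dataset description.
df = pd.DataFrame({
    "Age": [22, 28, 35, 41, 50],
    "Salary": [40000, 52000, 61000, 70000, 85000],
    "Price": [20000, 26000, 33000, 45000, 60000],
})

# Pearson correlation between every pair of numeric columns.
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```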

Pairplot for all Numeric Variables

Now let's do the bivariate analysis of Categorical vs Numeric variables

Observations:

The median price for SUV is higher than for Sedan and Hatchback.

For SUV, the average price is higher than the median price.

For Sedan and Hatchback, the distribution is symmetrical.

For the 26-39 age group, the distribution is symmetrical, but we have also detected an outlier in this group.

For the 50-59 and <25 age groups, the distribution is right-skewed. As age increases, the total salary of customers also increases.

Based on the Gender and Price boxplot, the median spend by female customers is clearly higher than the median spend by male customers. There are
some exceptions among male customers, however, as we can see extreme values in the prices of the cars they bought.
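
Observations such as "the average price is higher than the median price" can be checked numerically alongside the boxplots. The sketch below uses made-up prices and compares mean and median per Make; a mean above the median is a common sign of a right-skewed distribution.

```python
import pandas as pd

# Made-up prices; column names follow the dataset description.
df = pd.DataFrame({
    "Make": ["SUV", "SUV", "SUV", "Sedan", "Sedan", "Sedan",
             "Hatchback", "Hatchback", "Hatchback"],
    "Price": [50000, 55000, 75000, 30000, 35000, 40000, 20000, 25000, 30000],
})

# Mean and median price per Make.
summary = df.groupby("Make")["Price"].agg(["mean", "median"])
# True where the mean exceeds the median (suggests a right skew).
summary["mean_above_median"] = summary["mean"] > summary["median"]
print(summary)
```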

Problem 1: Key questions to be answered based on the overall analysis


1. Do men tend to prefer SUVs more compared to women?
Answer - From the analysis below, it is clear that women bought more SUVs (173 cars) than men (124 cars), so men do not prefer SUVs more than women.

2. What is the likelihood of a salaried person buying a Sedan?


Answer - From the visualization above, salaried people are more likely to buy a Sedan. A total of 396 Sedans were purchased by salaried people.
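
A likelihood is more naturally reported as a proportion than as a count. One way to express question 2 is the conditional share P(Sedan | Salaried) = (salaried customers who bought a Sedan) / (all salaried customers). The sketch below computes it on made-up rows, so the result is illustrative and is not the report's figure.

```python
import pandas as pd

# Made-up rows; column names follow the dataset description.
df = pd.DataFrame({
    "Profession": ["Salaried"] * 5 + ["Business"] * 3,
    "Make": ["Sedan", "Sedan", "Sedan", "SUV", "Hatchback",
             "Sedan", "Hatchback", "Hatchback"],
})

salaried = df[df["Profession"] == "Salaried"]
# Fraction of salaried customers whose car is a Sedan.
p_sedan_given_salaried = (salaried["Make"] == "Sedan").mean()
print(f"P(Sedan | Salaried) = {p_sedan_given_salaried:.2f}")
```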

3. What evidence or data supports Sheldon Cooper's claim that a salaried male is an easier target for a SUV sale over a Sedan sale?

Answer - From the visualization below, a salaried male is an easier target for an SUV sale than for a Sedan sale: salaried males
are less inclined to buy a Sedan than an SUV.

4. How does the amount spent on purchasing automobiles vary by gender?
Answer - Based on the visualization below, female customers have spent more on purchasing automobiles than male customers, even though there are fewer female buyers.

5. How much money was spent on purchasing automobiles by individuals who took a personal loan?
Answer - From the table below, Rs. 27,290,000 was spent on purchasing automobiles by individuals who took a personal loan.
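
A table like the one referred to in question 5 can be produced by summing Price within each Personal_loan group. The sketch below uses made-up rows, so its totals are illustrative only.

```python
import pandas as pd

# Made-up rows; column names follow the dataset description.
df = pd.DataFrame({
    "Personal_loan": ["Yes", "No", "Yes", "No"],
    "Price": [30000, 45000, 25000, 60000],
})

# Total amount spent on cars, split by whether a personal loan was taken.
spend_by_loan = df.groupby("Personal_loan")["Price"].sum()
print(spend_by_loan)
```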

6. How does having a working partner influence the purchase of higher-priced cars?
Answer - From the scatter plot above, we can see that customers with a working partner have purchased higher-priced cars than customers with a non-working partner.

Problem 1: Actionable Insights and Recommendations


1. Married males tend to buy more Sedans and Hatchbacks, possibly due to budget constraints and other liabilities such as personal
loans and house loans. Discount schemes on SUVs can be offered to them, bringing within their budget a car that is more
spacious for family requirements.
2. Single males tend to buy more Hatchbacks than the other two categories. A marketing strategy can be devised to
target single males specifically to increase Hatchback sales.
3. Married females tend to buy more SUVs and Sedans than Hatchbacks. A marketing strategy can be devised
to target married females specifically to increase sales of SUVs and Sedans.
4. We found that people with lower salaries take personal loans to buy cars, whereas people with higher salaries buy
higher-priced cars without taking a personal loan.
5. Males appear to prefer lower-priced cars compared to females. Hence, males are an easier target for selling lower-priced cars, and
females are the target for selling higher-priced cars.

Problem 2 – Framing analytical problem


Context
A bank generates revenue through interest, transaction fees, and financial advice, with interest charged on customer loans being a significant source
of profits. GODIGT Bank, a mid-sized private bank, offers various banking products and cross-sells asset products to existing customers through
different communication methods. However, the bank is facing high credit card attrition, leading them to reevaluate their credit card policy to ensure
customers receive the right card for higher spending and intent, resulting in profitable relationships.

Objective
As a Data Scientist at the company, you have been given some data by the Data Science team. You are supposed to find the key variables that have a vital
impact on the analysis, which will help the company improve its business.

Type of Data and Its Features


• Data set: godigt_cc_data.csv
• The data has 8448 rows and 28 columns
• The data contains 19 integer, 8 object and 1 datetime datatype columns
• The dataset does not have any duplicate entries

Critical Analysis of the data set:


• There are 38 missing values in the “Transactor_revolver” column, which can be replaced by the mode value
• There are some ‘0’ values in the ‘Occupation_at_source’ column, which can be replaced with the mode value
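
The two replacements proposed above could be sketched as follows. The rows and category labels ("T", "R", "Salaried", and so on) are made up, and the sketch assumes the placeholder is stored as the string "0". The placeholder is masked before the mode is taken so that it cannot itself be picked as the most frequent value.

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset.
df = pd.DataFrame({
    "Transactor_revolver": ["T", "T", None, "R", "T"],
    "Occupation_at_source": ["Salaried", "0", "Self Employed", "Salaried", "0"],
})

# Missing Transactor_revolver values take the most frequent category.
tr_mode = df["Transactor_revolver"].mode()[0]
df["Transactor_revolver"] = df["Transactor_revolver"].fillna(tr_mode)

# Treat the '0' placeholder as missing, then fill with the mode of valid values.
occupation = df["Occupation_at_source"].mask(df["Occupation_at_source"] == "0")
df["Occupation_at_source"] = occupation.fillna(occupation.mode()[0])
```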

Based on analysis and study of the data, below are a few important variables for growing the credit card business
and reducing the risk of attrition:

1) Card_type
2) high_networth
3) Transactor_revolver
4) Occupation at source
5) Avg Spends
6) Annual Income at source
7) CC Limit
With regard to the above variables, we have framed some analytical questions and tried to find answers to them.
1) How many credit card types are there, and what is the volume of customers for each?

Answer - There are 15 credit card types: 'rewards', 'prosperity', 'edge', 'chartered', 'smartearn', 'shoprite', 'indianoil', 'cashback', 'aura',
'gold', 'prime', 'pulse', 'elite', 'centurion' and 'platinum'.

The barplot above shows that 'rewards', 'prosperity', 'edge', 'chartered', 'smartearn', 'shoprite', 'indianoil', 'cashback' and 'aura' have more
customers. 'rewards' and 'prosperity' are the top credit card types by customer volume.

2) What is the volume of customers based on high_networth category?


Answer - Customers are almost equally distributed among these categories, but high_networth category A has the most
customers.

3) How many credit cards have had no activity in the past 90 days?
Answer - Out of a total of 8448 credit cards, 3106 (about 36.8%) have had no activity in the past 90 days.

4) Which credit card types have had no activity in the past 90 days?
Answer - Below are the credit card types with no activity in the past 90 days. Customers with the Rewards card type are the largest
group with no activity in the past 90 days.
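
Questions 3 and 4 can both be answered from one filter. The sketch below assumes that cc_active90 equals 0 for a card with no activity in the past 90 days; the rows are made up and the counts are illustrative only.

```python
import pandas as pd

# Made-up rows; column names come from the dataset.
df = pd.DataFrame({
    "card_type": ["rewards", "rewards", "edge", "gold", "rewards", "edge"],
    "cc_active90": [0, 0, 1, 0, 1, 0],
})

# Assumption: cc_active90 == 0 marks a card with no activity in 90 days.
inactive = df[df["cc_active90"] == 0]

n_inactive = len(inactive)                             # question 3
inactive_by_type = inactive["card_type"].value_counts()  # question 4
print(n_inactive)
print(inactive_by_type)
```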

5) How many customers fall under the Transactor and Revolver categories?
Answer - Approximately 85% of customers are transactors and 15% are revolvers. The data of revolver customers needs further analysis.

6) Which credit card types have card holders who are revolvers?
Answer - Among the 15% revolver customers, the card types below fall under the revolver category.

7) What is the average spend amount based on customer's occupation?


Answer - Salaried and Self Employed customers have higher average spends than other categories. Housewives have much lower spends
compared to others.
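
Average spend by occupation is a group-wise mean. The sketch below uses made-up rows and the column names Occupation_at_source and avg_spends_l3m from the dataset.

```python
import pandas as pd

# Made-up rows; column names come from the dataset.
df = pd.DataFrame({
    "Occupation_at_source": ["Salaried", "Salaried", "Self Employed",
                             "Housewife", "Housewife"],
    "avg_spends_l3m": [40000, 60000, 55000, 5000, 7000],
})

# Mean spend over the last three months per occupation, highest first.
avg_spend = (
    df.groupby("Occupation_at_source")["avg_spends_l3m"]
      .mean()
      .sort_values(ascending=False)
)
print(avg_spend)
```

Grouping by card_type instead and taking `.head(5)` of the sorted result would give a top 5 ranking of card types by average spend.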

8) What is the correlation between some of the important variables?


Answer - The heatmap below shows the correlation between these variables.
Customers' annual income is highly correlated with the credit card limit: the higher the annual income, the higher the credit card
limit.
There is also a fair amount of correlation between average spends over the past 3 months and credit card limit. It suggests that
customers in the highest annual income category have more spending capacity.

9) Which are the top 5 card types by average spend?


Answer - edge, prosperity, chartered, rewards and smartearn card types have higher average spends than the other credit cards.

Business Justification:

Based on the above analysis, we can conclude that the bank should focus on reviewing and improving its credit card policy for
Edge, Prosperity and Chartered card holders (cards issued by Visa), who are mostly self-employed or salaried. Their average spending over the last
three months is high, and it is unaffected even though they own credit cards from other banks. In addition, they settle their balances in
full every month.

Hence, we can conclude that the following are the top five variables for framing a good analytical business problem for GODIGT Bank to
review its credit card policy.
The columns are as follows:
1. card_type
2. annual_income_at_source
3. Occupation_at_source
4. avg_spends_l3m
5. Transactor_revolver

If cc_active30, cc_active60 and cc_active90 show 0 transactions, then avg_spends_l3m should also be 0, but non-zero values are still present in the
data.
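
The inconsistency noted above can be located with a simple filter. The sketch below assumes the three cc_active columns hold 0 when there were no transactions in the respective window; the rows are made up.

```python
import pandas as pd

# Made-up rows; column names come from the dataset.
df = pd.DataFrame({
    "cc_active30": [0, 1, 0, 0],
    "cc_active60": [0, 1, 0, 1],
    "cc_active90": [0, 1, 0, 1],
    "avg_spends_l3m": [0, 12000, 3500, 8000],
})

activity_cols = ["cc_active30", "cc_active60", "cc_active90"]
# True for cards with no recorded activity in any of the three windows.
no_activity = (df[activity_cols] == 0).all(axis=1)

# Inconsistent rows: no activity flagged, yet a non-zero average spend.
inconsistent = df[no_activity & (df["avg_spends_l3m"] > 0)]
print(inconsistent)
```

Rows flagged this way are candidates for a data-quality review before avg_spends_l3m is used as a key variable.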
