Credit EDA Case Study

This document presents an analysis of credit application data to determine loan approval risk. It loads current and previous loan application data and performs exploratory data analysis including:
• Inspecting and cleaning the data by imputing missing values, handling data types, and removing outliers.
• Checking for data imbalance in the target variable of payment difficulties, finding an imbalance ratio of 11.3. The data is split into non-defaulted and defaulted populations for further analysis.
• Performing univariate analysis on categorical variables like gender, age, income type, education, and family type to understand patterns in loan risk for different groups. Female clients and middle-aged clients showed higher risk overall.

Uploaded by

varsha


CREDIT EDA CASE STUDY

PGDDS C26 November 2020

Presented By
VADAGAM PRAVALIKA
Introduction

1 Objective

• When the company receives a loan application, it must decide whether to approve the loan based on the applicant’s profile. Two types of risk are associated with the bank’s decision:

• If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.

• If the applicant is not likely to repay the loan, i.e., he/she is likely to default, then approving the loan may lead to a financial loss for the company.

2 Data set used

• Current applications “application_data.csv”

• Previous applications “previous_application.csv”

Steps Involved

1 Loading Data
2 Inspecting Data
3 Data Analysis
4 Checking the Data Imbalance
5 Univariate Analysis for Categorical columns
6 Univariate Analysis for Numerical columns
7 Bivariate Analysis : Numerical – Categorical
8 Bivariate Analysis : Numerical – Numerical
9 Correlations
10 Loading Data "previous_application.csv"
11 Final Conclusions

2. Loading Data / EDA
EDA ANALYSIS

• Reading Data

1. Reading Dataset application_data.csv.
2. Reading Dataset previous_application.csv.

• Inspecting Data frame

1. Inspecting and understanding Data
• Checking a few records of the dataset and inspecting it with .shape, .info() and .describe().
2. Data Cleaning
• Checking the percentage of null values in the data frame in descending order.
• Analyzing the number of columns with null values : 49
• Dropped columns having null values > 35%.
• Imputed columns having null values ≤ 19% with the mode for numeric columns, except continuous numeric columns, which we imputed with the median.
3. Handling Errors in Data types and Data
• We found that the column CODE_GENDER has the value ‘XNA’, which simply means Not Available. So, we imputed those values with the most frequent value (mode), i.e., F.
• Then we found that 18% of the values in the column ORGANIZATION_TYPE are also ‘XNA’. So we first checked whether the values are Missing Completely at Random (MCAR), Missing at Random (MAR) or Missing Not at Random (MNAR). After comparing ORGANIZATION_TYPE with a third variable, the column NAME_INCOME_TYPE, we found that the clients who are pensioners had XNA values in ORGANIZATION_TYPE. So, we replaced “XNA” with “Pensioner”.
• Similarly, we found the same relation between OCCUPATION_TYPE and NAME_INCOME_TYPE, so we imputed its null values with “Pensioner”.
• After observing the data frame, we found that the columns 'DAYS_BIRTH', 'DAYS_EMPLOYED', 'DAYS_REGISTRATION', 'DAYS_ID_PUBLISH' and 'DAYS_LAST_PHONE_CHANGE' had negative or mixed values, so we replaced them with their absolute values for our analysis.
• Then we changed the values of the columns FLAG_OWN_CAR and FLAG_OWN_REALTY from ‘Y’ and ‘N’ to 1 and 0 respectively for convenience in analysis.
• For convenience in our analysis, we binned the following columns based on quantiles: AMT_INCOME_TOTAL, AMT_CREDIT and AGE_DAYS.
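The cleaning steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the code used for the slides: the function name `clean_applications`, the `DAY_COLUMNS` list constant, the `MOSTLY_NULL` column and the toy data frame are assumptions made for the example, and the median/mode imputation and the binning are left out for brevity.

```python
import numpy as np
import pandas as pd

# DAYS_* columns named on the slide as having negative or mixed values.
DAY_COLUMNS = [
    "DAYS_BIRTH", "DAYS_EMPLOYED", "DAYS_REGISTRATION",
    "DAYS_ID_PUBLISH", "DAYS_LAST_PHONE_CHANGE",
]


def clean_applications(df, drop_threshold=35.0):
    """Hypothetical helper: apply the cleaning steps to a copy of df."""
    df = df.copy()

    # Percentage of null values per column, in descending order.
    null_pct = df.isnull().mean().mul(100).sort_values(ascending=False)

    # Drop columns whose null percentage exceeds the threshold (35%).
    df = df.drop(columns=null_pct[null_pct > drop_threshold].index)

    # CODE_GENDER: replace the 'XNA' placeholder with the mode of the valid values.
    valid_gender = df.loc[df["CODE_GENDER"] != "XNA", "CODE_GENDER"]
    df["CODE_GENDER"] = df["CODE_GENDER"].replace("XNA", valid_gender.mode().iloc[0])

    # ORGANIZATION_TYPE: 'XNA' was found to belong to pensioners.
    df["ORGANIZATION_TYPE"] = df["ORGANIZATION_TYPE"].replace("XNA", "Pensioner")

    # Negative or mixed day counts become absolute values.
    for col in DAY_COLUMNS:
        if col in df.columns:
            df[col] = df[col].abs()

    # 'Y'/'N' flags become 1/0.
    for col in ("FLAG_OWN_CAR", "FLAG_OWN_REALTY"):
        df[col] = df[col].map({"Y": 1, "N": 0})

    return df


# Toy data (invented for illustration only).
demo = pd.DataFrame({
    "CODE_GENDER": ["F", "M", "F", "XNA"],
    "ORGANIZATION_TYPE": ["School", "XNA", "Bank", "XNA"],
    "DAYS_BIRTH": [-12000, -20000, -15000, -9000],
    "DAYS_EMPLOYED": [-300, 365243, -1200, -50],
    "FLAG_OWN_CAR": ["Y", "N", "N", "Y"],
    "FLAG_OWN_REALTY": ["N", "Y", "Y", "Y"],
    "MOSTLY_NULL": [np.nan, np.nan, 1.0, np.nan],  # 75% null, so it is dropped
})
cleaned = clean_applications(demo)
print(cleaned)
```

On the real data, the same function would be called on the frame returned by `pd.read_csv("application_data.csv")`.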
3. Analysis
3.1 Data Type Conversions for better understanding of Variables

In the data frame, we observed some columns of dtype “object” that can be converted to dtype “category”, which is convenient for our analysis and also saves memory.
The data frame has a large number of columns, so we remove the columns we don’t need for further analysis.
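The object-to-category conversion and its memory effect can be illustrated with a small sketch. The helper name `objects_to_category` and the toy frame are assumptions for the example; only the column names come from the dataset.

```python
import pandas as pd


def objects_to_category(df):
    """Hypothetical helper: convert every object column to the category dtype."""
    obj_cols = df.select_dtypes(include="object").columns
    return df.astype({col: "category" for col in obj_cols})


# Toy frame with one repetitive text column (invented for illustration).
demo = pd.DataFrame({
    "NAME_CONTRACT_TYPE": pd.Series(["Cash loans", "Revolving loans"] * 5000, dtype="object"),
    "AMT_CREDIT": range(10000),
})

before = demo.memory_usage(deep=True).sum()
converted = objects_to_category(demo)
after = converted.memory_usage(deep=True).sum()

print(converted.dtypes)
print(f"memory before: {before} bytes, after: {after} bytes")
```

The saving is largest for columns with few distinct values repeated many times, because the category dtype stores each distinct string once plus a small integer code per row.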
3.2 Finding and Analyzing outliers

We made a list of all numerical columns and then plotted a boxplot for each of them.

• The IQR for AMT_INCOME_TOTAL is very slim, and the column has many outliers. Its maximum value of 117000000 is far above the 75th percentile.

• The third quartile of AMT_GOODS_PRICE and AMT_CREDIT is larger than the first quartile, which means that most of the customers’ credit amounts lie in the third quartile. Here the maximum value is 4050000, which is far above the 75th percentile.

• DAYS_EMPLOYED has a maximum value of 375000, which is far from the mean and the 75th percentile.

• The visual representation by boxplot is shown for AMT_INCOME_TOTAL, AMT_GOODS_PRICE, AMT_CREDIT and DAYS_EMPLOYED.
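The outliers that a boxplot draws beyond its whiskers can also be flagged numerically. The sketch below assumes the conventional 1.5 × IQR whisker rule, which the slides do not state explicitly; the function name `iqr_outlier_mask` and the sample incomes (apart from the reported maximum of 117000000) are made up for illustration.

```python
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Six ordinary incomes plus the extreme maximum reported on the slide.
income = pd.Series(
    [90000, 112500, 135000, 157500, 180000, 202500, 117000000],
    name="AMT_INCOME_TOTAL",
)
mask = iqr_outlier_mask(income)
print(income[mask])
```

Only the extreme value is flagged here, which matches the picture of a slim box with a distant maximum.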
4. Checking Imbalance
4.1 Checking Data Imbalance for Target Variable

Since there is a huge imbalance between the TARGET values 0 and 1, it makes more sense to divide the data frame into two sub-datasets and then continue our analysis.

We have split the data frame as follows:

• Target0 : (Non-Defaulted Population) Clients without Payment Difficulties.

• Target1 : (Defaulted Population) Clients with Payment Difficulties.

Conclusion
We have found that the ratio of data imbalance is “11.3”.
In order to analyze the imbalance and various aspects of the data, we will perform various types of analysis such as:

• Univariate analysis, Bivariate analysis, Multivariate analysis
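The imbalance check and the split into the two populations can be sketched as below. The slides do not define the ratio, so this sketch assumes it is the count of TARGET 0 rows divided by the count of TARGET 1 rows; the toy counts (113 and 10) were chosen only so that the example reproduces a ratio of 11.3.

```python
import pandas as pd


def imbalance_ratio(target):
    """Assumed definition: number of TARGET 0 rows per TARGET 1 row."""
    counts = target.value_counts()
    return counts.loc[0] / counts.loc[1]


# Toy data: 113 non-defaulters and 10 defaulters (invented for illustration).
demo = pd.DataFrame({"TARGET": [0] * 113 + [1] * 10, "AMT_CREDIT": range(123)})

ratio = imbalance_ratio(demo["TARGET"])

# Split into the two sub-datasets used in the rest of the analysis.
target0 = demo[demo["TARGET"] == 0]  # clients without payment difficulties
target1 = demo[demo["TARGET"] == 1]  # clients with payment difficulties

print(f"imbalance ratio: {ratio:.1f}")
```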

5. Univariate Analysis
Categorical
5.1 Gender/Age Group with respect to Target variables

• It seems that female clients applied for loans more than male clients.

• 66.6% female clients and 33.4% male clients are without payment difficulties.

• 57% female clients and 42% male clients are with payment difficulties.

• The Middle Age Group (35 – 60) applied the most and has the highest payment difficulties of all.

• Senior Citizens (60 - 100) and the Very Young (19 - 25) age group face fewer payment difficulties than the other age groups.
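Two different percentages appear in this kind of univariate comparison: the share of each category within a target population, and the default rate within each category. A minimal sketch of both, using hypothetical helper names and toy data that do not reproduce the figures above:

```python
import pandas as pd


def share_within_target(df, column, target="TARGET"):
    """Percentage of each category inside each target population (columns sum to 100)."""
    return pd.crosstab(df[column], df[target], normalize="columns").mul(100)


def default_rate_by(df, column, target="TARGET"):
    """Percentage of clients with payment difficulties inside each category."""
    return df.groupby(column)[target].mean().mul(100)


# Toy data: 8 female clients (1 defaulter) and 4 male clients (1 defaulter).
demo = pd.DataFrame({
    "CODE_GENDER": ["F"] * 8 + ["M"] * 4,
    "TARGET": [0] * 7 + [1] + [0] * 3 + [1],
})

shares = share_within_target(demo, "CODE_GENDER")
rates = default_rate_by(demo, "CODE_GENDER")
print(shares)
print(rates)
```

In the toy data, female clients make up half of the defaulters but have the lower default rate, which shows why the two views should be read separately.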
5.2 CONTRACT/INCOME Type with respect to Target variables

• Most of the clients applied for a Cash Loan, while a very small proportion applied for a Revolving Loan.

• However, clients who applied for a Cash Loan have higher payment difficulties.

• Clients whose income type is Working, Commercial associate or Pensioner are more likely to apply for a loan, the highest being the Working category.

• Businessmen, students and unemployed clients are less likely to apply for a loan.

• The Working category has a high risk of default.

• State Servants are at minimal risk of default.

5.3 Education/Family Type with respect to Target variables

• Clients with Secondary/Secondary Special Education are more likely to apply for a loan.

• Clients with Secondary/Secondary Special Education also face higher payment difficulties, so they have a high risk of default. Other education types are at minimal risk.

• It seems that married clients applied the most for loans and have higher payment difficulties.

• Widows are less likely to apply for a loan and have minimal risk.

• Single clients have minimal risk of default, i.e., fewer payment difficulties.
5.4 Occupation/Income Range with respect to Target variables

• Pensioners and Laborers have applied the most for loans.

• Pensioners, followed by Laborers, have the highest payment difficulties, so they have a high risk of default.

• Clients in the medium salary range are more likely to apply for a loan and have higher payment difficulties.

• Clients with low and medium income are at high risk of default.

• Clients with high salaries are at minimal risk.

5.5 Housing Type/Credit Range with respect to Target variables

• Most of the clients who applied for a loan own a house/apartment and have higher payment difficulties.

• Clients with other housing types have fewer payment difficulties and fewer loan applications.

• Most of the clients applied for a medium credit amount.

• Clients applying for medium and low credit have high payment difficulties and a high risk of default.
6. Univariate Analysis
Numerical
6.1 Annuity/Credit with respect to Target variables

• The distribution of AMT_ANNUITY for Target1 is broader than for Target0.

• The distribution of AMT_CREDIT for Target1 and Target0 is similar.
6.2 Goods Price/Amt Income with respect to Target variables

• The distribution of AMT_GOODS_PRICE for Target1 and Target0 is similar.

• The distribution of AMT_INCOME_TOTAL for Target1 and Target0 is similar.
6.3 Points To Conclude

Notes

• People with Target 1 have largely staggered income compared to Target 0. The dist. plots clearly show that the shapes of Income Total, Annuity, Credit and Goods Price are similar for Target 0 and similar for Target 1.

• The plots also highlight the people who have difficulty paying back loans with respect to their income, loan amount, price of the goods against which the loan is procured, and annuity.

• The dist. plots show a curve that is wider for Target 1, whereas the curve for Target 0 is narrower with well-defined edges.
7. Bivariate Analysis
Numerical V/s Categorical
7.1 Income_Amount Vs Education_Status Vs FAMILY_Status among Target Variables

• Widowed clients with an academic degree have very few outliers and show no first and third quartile. Also, clients of every family status with an academic degree have far fewer outliers than clients with other types of education.

• For the remaining education types, the income of clients of every family status lies below the first quartile, i.e., 25%.

• The income amount for married clients with an academic degree is much lower than for others.

• Defaulters have relatively less income than non-defaulters.
7.2 Credit Vs Education_Status Vs FAMILY_Status among Target Variables

• Clients of every education type except academic degree have a large number of outliers.

• For most of the population, the credit amount lies below the 25th percentile.

• Widowed clients with an academic degree tend to take higher credit loans.

• Some of the clients with Higher Education, Incomplete Higher Education, Lower Secondary Education and Secondary/Secondary Special Education are more likely to take a high amount of credit.

• Married clients with an academic degree applied for higher credit loans and have no outliers. Single clients with an academic degree have a very slim boxplot with no outliers.
8. Bivariate Analysis
Categorical V/s Categorical
• Though the count of working clients applying for loans is significantly high, their risk of default is lower than that of others.

• The count of clients with income type Maternity leave is only 5, but their risk of default is the lowest among all the income types.

• The same is observed for unemployed clients: though the count is very low, the risk of default is low.

• Pensioners, State servants and Commercial associates have a higher risk of default.

• Cash loans have a higher risk of default; Revolving loans have a comparatively lower risk.

• Clients with low income have a high risk of default, followed by clients with medium and very low income.

• Clients with high salaries have minimal risk of default.

• Clients with an Academic Degree or Higher Education have a lower risk of default.

• Clients with Lower Secondary or Secondary/Secondary Special Education have a very high risk of default.

• Low-skill Laborers have a higher risk of default.

• Managers, High skill tech staff and Accountants have a relatively lower risk of default.
8. Correlations
Points to Mark:

1. Goods price and credit amount are highly correlated for both defaulters and non-defaulters. So as the home price increases, the loan amount also increases.

2. Home price and EMI (annuity) are highly correlated for both defaulters and non-defaulters. So as the home price increases, the EMI amount also increases, which is logical.

[Figures: scatter plots of GOODS PRICE Vs CREDIT AMOUNT, CREDIT AMOUNT Vs ANNUITY and ANNUITY Vs GOODS PRICE, each shown for T-0 and T-1]

Conclusion

• All three variables are highly correlated for both defaulters and non-defaulters, so they might not be good indicators for defaulter detection.
8.2 Correlations between numerical variables using Heatmaps

TARGET 0

From the correlation heat map for TARGET 0, there are a number of observations we can point out:

• AMT_CREDIT is negatively correlated with DAYS_BIRTH: people in the low-age group take higher credit amounts, and vice versa.

• AMT_CREDIT is negatively correlated with CNT_CHILDREN: the credit amount is higher for clients with fewer children, and vice versa.

• AMT_INCOME_TOTAL is negatively correlated with CNT_CHILDREN: clients with fewer children have more income, and vice versa.

• Clients in densely populated areas have fewer children.

• AMT_CREDIT is higher in densely populated areas.

• AMT_INCOME_TOTAL is also higher in densely populated areas.
8.2 Correlations between numerical variables using Heatmaps

TARGET 1

• AMT_CREDIT is negatively correlated with DAYS_BIRTH: people in the low-age group take higher credit amounts, and vice versa.

• AMT_CREDIT is negatively correlated with CNT_CHILDREN: the credit amount is higher for clients with fewer children, and vice versa.

• AMT_INCOME_TOTAL is negatively correlated with CNT_CHILDREN: clients with fewer children have more income, and vice versa.

• Clients in densely populated areas have fewer children.

• AMT_CREDIT is higher in densely populated areas.

• AMT_INCOME_TOTAL is also higher in densely populated areas.

Conclusion
The heat map for Target 1 shows nearly the same observations as the one for Target 0, but a few points are different. They are listed below.

• Clients whose permanent address does not match their contact address have fewer children.

• Clients whose permanent address does not match their work address have fewer children.
9. Top 10 Correlations
9.1 Correlation Remarks

• The top 10 correlations for default (TARGET 1) and non-default (TARGET 0) clients are at almost the same level for the different variables.

[Figure: top 10 correlations, TARGET 0 V/s TARGET 1]
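A top-N correlation ranking like the one compared above can be computed per population from the correlation matrix. The sketch below is one possible way to do it, not necessarily the approach used for the slides; `top_correlations` is a hypothetical helper and the toy numbers are invented.

```python
import pandas as pd


def top_correlations(df, n=10):
    """Return the n strongest absolute pairwise correlations, each pair listed once."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    # Walk the upper triangle so that (a, b) and (b, a) are not both reported.
    pairs = {
        (cols[i], cols[j]): corr.iloc[i, j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    }
    return pd.Series(pairs).dropna().sort_values(ascending=False).head(n)


# Toy data: AMT_CREDIT is an exact multiple of AMT_GOODS_PRICE (illustration only).
demo = pd.DataFrame({
    "AMT_GOODS_PRICE": [100.0, 200.0, 300.0, 400.0, 500.0],
    "AMT_CREDIT": [110.0, 220.0, 330.0, 440.0, 550.0],
    "AMT_ANNUITY": [10.0, 19.0, 33.0, 41.0, 52.0],
    "CNT_CHILDREN": [2, 0, 1, 0, 1],
})

top = top_correlations(demo, n=2)
print(top)
```

Calling the function once on the Target 0 rows and once on the Target 1 rows gives the two rankings to compare side by side.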
10. Loan Distributions and Purposes
10.1 Percentage of `NAME_CONTRACT_STATUS` and `NAME_CLIENT_TYPE`

• Around 80.7% of the clients applying for a loan were repeaters.

• 14.5% of the clients applying for a loan are new.

Percentage of contracts approved or not in previous applications

• Approved :- 38.8%
• Refused :- 58.5%
• Canceled :- 2.3%
• Unused offer :- 0.31%
10.2 CONTRACT_STATUS V/s LOAN_PURPOSE

• Most loan rejections came from the purpose "Repairs".

• For the "Education" and "Medicine" purposes, we have an equal number of approvals and rejections.

• "Paying other loans" and "Buying a new car" have significantly more rejections than approvals.
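Approvals and rejections per loan purpose can be tabulated with a cross-tabulation. In this sketch the purpose column name `NAME_CASH_LOAN_PURPOSE` is an assumption (the slides only call it LOAN_PURPOSE), `status_by_purpose` is a hypothetical helper, and the rows are invented.

```python
import pandas as pd


def status_by_purpose(prev, purpose_col="NAME_CASH_LOAN_PURPOSE",
                      status_col="NAME_CONTRACT_STATUS"):
    """Count previous applications for every (purpose, status) combination."""
    return pd.crosstab(prev[purpose_col], prev[status_col])


# Toy previous-application rows (invented for illustration).
prev = pd.DataFrame({
    "NAME_CASH_LOAN_PURPOSE": ["Repairs"] * 4 + ["Education"] * 4,
    "NAME_CONTRACT_STATUS": [
        "Approved", "Refused", "Refused", "Refused",
        "Approved", "Approved", "Refused", "Refused",
    ],
})

table = status_by_purpose(prev)
print(table)
```

Passing `normalize="index"` to `pd.crosstab` would turn the counts into per-purpose proportions, which makes purposes with very different application volumes easier to compare.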
10.3 TARGET V/s LOAN_PURPOSE

• Here also there is high variation for "Repairs" for both targets.

• Comparing the "Education" and "Medicine" purposes, "Medicine" has a higher number of clients having difficulties repaying the loan amount than "Education".

• For the "Buying used car" and "Building" purposes, clients with payment difficulties are in equal ratio.
10.4 AMT_CREDIT vs NAME_HOUSING_TYPE

Here, for housing type:

• Office apartment has higher credit for TARGET 0, and Co-op apartment has higher credit for TARGET 1.

Conclusion

So, we can conclude that the bank should avoid giving loans to clients with housing type Co-op apartment, as they have difficulties in payment. The bank can focus mostly on housing types With parents, House or Municipal apartment for successful payments.
Conclusions
Final Insights

• Comparing clients with payment difficulties against clients without payment difficulties, the percentage of pensioners is lower and the percentage of working clients is higher among those with payment difficulties.

• Likewise, the percentage of married and widowed clients is lower, and the percentage of single and civil-marriage clients is higher, among those with loan payment difficulties.

• The percentage of clients with secondary/secondary special education is higher, and the percentage of clients who have completed higher education is lower, among those with loan payment difficulties.

• The count of ‘Low skilled Laborers’ in ‘OCCUPATION_TYPE’ is comparatively very low, yet it has the maximum % of payment difficulties, around 17%. Hence, clients with occupation type ‘Low skilled Laborers’ are a driving factor for loan defaults.

• The count of ‘Lower Secondary’ in ‘NAME_EDUCATION_TYPE’ is comparatively very low, yet it has the maximum % of payment difficulties, around 11%. Hence, clients with education type ‘Lower Secondary’ are a driving factor for loan defaults.

• Banks should focus more on income types Student, Pensioner and Businessman, with housing types other than Co-op apartment and Office apartment, for successful payments.

• Banks should focus less on income type Working, as it has the greatest number of unsuccessful payments.

• Applicants living in House/Apartments have the highest number of loan applications, while applicants living in rented apartments or with parents have a very high percentage of defaults.

• Get as many clients as possible from housing type With parents, as they have the least number of unsuccessful payments.

• Also, loan purpose Repair has a higher number of unsuccessful payments.
THANK YOU
