Credit EDA Case Study
Presented By
VADAGAM PRAVALIKA
Introduction
1 Objective
• When the company receives a loan application, it must decide whether to approve the loan based on the applicant’s profile. Two types of risk are associated with the bank’s decision:
• If the applicant is likely to repay the loan, then not approving the loan results in a loss of business for the company.
• If the applicant is not likely to repay the loan, i.e., he/she is likely to default, then approving the loan may lead to a financial loss for the company.
• Reading the data.
• Checking the percentage of null values in the data frame, in descending order.
• Number of columns with null values: 49
• Dropped columns having null values > 35%.
• Imputed columns having null values ≤ 19% with the mode; continuous numeric columns were imputed with the median instead.
AMT_INCOME_TOTAL, AMT_GOODS_PRICE
• For convenience in our analysis, we binned the columns mentioned below based on their quantiles:
AMT_INCOME_TOTAL, AMT_CREDIT and AGE_DAYS
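The cleaning steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook code behind the slides: the function names, the toy data frame, the five bin labels, and the choice to leave columns between 19% and 35% nulls untouched are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def null_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of null values per column, sorted in descending order."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)


def drop_and_impute(df, drop_above=35.0, impute_up_to=19.0, continuous_cols=()):
    """Drop sparse columns, then impute the lightly affected ones.

    Columns named in `continuous_cols` get the median, all others the mode.
    Columns between the two thresholds are left as they are (an assumption;
    the slides do not say what happens to them).
    """
    pct = null_percentages(df)
    out = df.drop(columns=pct[pct > drop_above].index)
    for col in out.columns:
        if 0 < pct[col] <= impute_up_to:
            if col in continuous_cols:
                fill_value = out[col].median()
            else:
                fill_value = out[col].mode().iloc[0]
            out[col] = out[col].fillna(fill_value)
    return out


def quantile_bins(series, labels=("Very Low", "Low", "Medium", "High", "Very High")):
    """Bin a numeric column into equal-sized quantile groups (labels are hypothetical)."""
    return pd.qcut(series, q=len(labels), labels=list(labels))


# Toy data standing in for the application data set.
demo = pd.DataFrame({
    "AMT_INCOME_TOTAL": [100.0, 200.0, np.nan, 400.0, 500.0,
                         600.0, 700.0, 800.0, 900.0, 1000.0],
    "CNT_FAM_MEMBERS": [2, 2, 1, np.nan, 2, 3, 2, 1, 2, 2],
    "OWN_CAR_AGE": [np.nan] * 6 + [3, 5, 7, 9],
})

pct = null_percentages(demo)
cleaned = drop_and_impute(demo, continuous_cols=("AMT_INCOME_TOTAL",))
income_bins = quantile_bins(cleaned["AMT_INCOME_TOTAL"])
```

On real data, `pd.qcut` can fail when many rows share the same value and the quantile edges coincide; passing `duplicates="drop"` without explicit labels is one way around that.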
Conclusion
• Target1 (defaulted population): clients with payment difficulties.
We have found that:
The data imbalance ratio is 11.3.
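One way such a ratio could be computed is sketched below. It assumes the ratio is the count of Target0 rows divided by the count of Target1 rows and that the column is coded 0/1; the slides do not spell out the definition, and the toy series is not the real data.

```python
import pandas as pd


def imbalance_ratio(target: pd.Series) -> float:
    """Number of Target0 rows per Target1 row (assumed definition)."""
    counts = target.value_counts()
    return counts.loc[0] / counts.loc[1]


# Toy target column: 9 clients without and 3 with payment difficulties.
demo_target = pd.Series([0] * 9 + [1] * 3, name="TARGET")
ratio = imbalance_ratio(demo_target)
```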
To analyze the imbalance and other aspects of the data, we perform several types of analysis, such as:
Clients whose income type is Working, Commercial associate, or Pensioner are more likely to apply for a loan, with Working being the largest category.
• Married clients apply for loans most often and have higher payment difficulties.
• Widows are less likely to apply for a loan and have minimal risk.
• Single clients have minimal risk of default, i.e., they have fewer payment difficulties.
5.3 Occupation/Income Range with respect to Target variables
• Clients in the Medium salary range are more likely to apply for a loan and have higher payment difficulties.
• Clients with low and medium income are at high risk of default.
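Observations of this kind combine two numbers per category: how many applications it has and what share of them ended in payment difficulties. A minimal sketch of that summary, assuming a 0/1 `TARGET` column; the function name and the toy rows are hypothetical.

```python
import pandas as pd


def category_summary(df, category_col, target_col="TARGET"):
    """Application count and default rate (%) for each category value."""
    grouped = df.groupby(category_col)[target_col]
    summary = pd.DataFrame({
        "applications": grouped.size(),
        # The mean of a 0/1 column is the share of rows equal to 1.
        "default_rate_pct": grouped.mean() * 100,
    })
    return summary.sort_values("applications", ascending=False)


# Toy rows, not taken from the real data set.
demo = pd.DataFrame({
    "NAME_FAMILY_STATUS": ["Married", "Married", "Married", "Married",
                           "Single", "Single", "Widow"],
    "TARGET": [1, 0, 0, 1, 0, 0, 0],
})
family_summary = category_summary(demo, "NAME_FAMILY_STATUS")
```

Reading both columns together helps separate "applies most often" from "defaults most often", which need not point to the same category.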
• Distribution of AMT_ANNUITY for Target1 is broader than Target0.
• Distribution of AMT_GOODS_PRICE for Target1 and Target0 is similar.
Notes
• The plots also highlight the clients who have difficulty paying back loans with respect to their income, loan amount, the price of the goods against which the loan is procured, and annuity.
• The distribution plots show a wider curve for Target 1, while the curve for Target 0 is narrower with well-defined edges.
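A visual impression such as "broader" or "narrower" can be cross-checked with simple spread statistics per target group. The sketch below is an assumption about how one might do that, using the standard deviation and the interquartile range; it is not taken from the original analysis, and the numbers are toy values.

```python
import pandas as pd


def spread_by_target(df, value_col, target_col="TARGET"):
    """Median, standard deviation and interquartile range per target group."""
    grouped = df.groupby(target_col)[value_col]
    iqr = grouped.quantile(0.75) - grouped.quantile(0.25)
    return pd.DataFrame({
        "median": grouped.median(),
        "std": grouped.std(),
        "iqr": iqr,
    })


# Toy values: the Target1 group is deliberately more spread out.
demo = pd.DataFrame({
    "TARGET": [0] * 5 + [1] * 5,
    "AMT_ANNUITY": [10, 11, 12, 13, 14, 2, 8, 12, 18, 25],
})
spread = spread_by_target(demo, "AMT_ANNUITY")
```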
7. Bivariate Analysis
Numerical V/s Categorical
7.1 Income_Amount Vs Education_Status Vs FAMILY_Status among Target Variables
• Widowed clients with an Academic degree have very few outliers, and their box plot shows no first and third quartiles. Clients with an Academic degree, across all family statuses, also have far fewer outliers than clients with other education types.
• For the remaining education types, across all family statuses, client income lies below the first quartile, i.e., 25%.
• Clients with every education type except Academic degree have a large number of outliers.
• For most of the population, client credit amounts lie below the 25th percentile.
• Widowed clients with an Academic degree tend to take higher credit amounts.
• Some clients with Higher Education, Incomplete Higher Education, Lower Secondary Education, and Secondary/Secondary Special Education are more likely to take high credit amounts.
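The quartiles and outlier counts read off the box plots can also be tabulated per group. The sketch below assumes the usual box-plot convention that a point is an outlier when it lies more than 1.5 × IQR beyond the first or third quartile; the function names and toy rows are hypothetical.

```python
import pandas as pd


def boxplot_stats(values: pd.Series) -> pd.Series:
    """Quartiles and the number of points outside the 1.5 * IQR whiskers."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = ((values < lower) | (values > upper)).sum()
    return pd.Series({"q1": q1, "median": median, "q3": q3, "outliers": n_outliers})


def grouped_boxplot_stats(df, value_col, group_cols):
    """One row of box-plot statistics per combination of the grouping columns."""
    return df.groupby(group_cols)[value_col].apply(boxplot_stats).unstack()


# Toy rows, not taken from the real data set.
demo = pd.DataFrame({
    "NAME_EDUCATION_TYPE": ["Higher education"] * 6 + ["Academic degree"] * 3,
    "NAME_FAMILY_STATUS": ["Married"] * 6 + ["Widow"] * 3,
    "AMT_INCOME_TOTAL": [100, 110, 120, 130, 140, 1000, 200, 200, 200],
})
stats = grouped_boxplot_stats(
    demo, "AMT_INCOME_TOTAL", ["NAME_EDUCATION_TYPE", "NAME_FAMILY_STATUS"]
)
```

In the toy data the second group has identical values, so its first and third quartiles coincide and the box collapses to a line; a very small group can look like this in a box plot.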
• Only 5 clients have income type Maternity leave, but their risk of defaulting on payments is the lowest among all income types.
• The same holds for unemployed clients: although the count is very low, the risk of default is low.
• Pensioner, State servant, and Commercial associate clients have a higher risk of default.
• Cash loans have a higher risk of default; Revolving loans have a comparatively lower risk.
• Clients having Low income have high risk to default, followed by clients with medium and very low income.
• Clients having Academic Degree and higher Education have lower risk to default.
Conclusion
• AMT_CREDIT is inversely related to DAYS_BIRTH: clients in the lower age group take higher credit amounts, and vice versa.
• AMT_CREDIT is inversely related to CNT_CHILDREN: the credit amount is higher for clients with fewer children, and vice versa.
TARGET 1
Conclusion
• The heat map for Target 1 shows much the same observations as the previous one for Target 0, but a few points differ. They are listed below.
TARGET 0 V/s TARGET 1
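The matrices behind two such heat maps can be sketched as one correlation matrix per target group. This is a minimal illustration with toy numbers; the function name is hypothetical, and the default Pearson correlation is an assumption about what the heat maps show.

```python
import pandas as pd


def correlation_by_target(df, numeric_cols, target_col="TARGET"):
    """Pearson correlation matrix of the chosen columns for each target value."""
    return {
        target: group[list(numeric_cols)].corr()
        for target, group in df.groupby(target_col)
    }


# Toy rows: credit falls as the number of children rises in both groups.
demo = pd.DataFrame({
    "TARGET": [0, 0, 0, 0, 1, 1, 1, 1],
    "AMT_CREDIT": [500, 400, 300, 200, 450, 300, 350, 100],
    "CNT_CHILDREN": [0, 1, 2, 3, 0, 1, 2, 3],
})
corr = correlation_by_target(demo, ["AMT_CREDIT", "CNT_CHILDREN"])
credit_vs_children_t0 = corr[0].loc["AMT_CREDIT", "CNT_CHILDREN"]
credit_vs_children_t1 = corr[1].loc["AMT_CREDIT", "CNT_CHILDREN"]
```

Each matrix can then be passed to a plotting function such as `seaborn.heatmap` to draw the Target 0 and Target 1 maps side by side. A negative coefficient indicates an inverse relationship, not strict inverse proportionality.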
10. Loan Distributions and Purposes
10.1 Percentage of `NAME_CONTRACT_STATUS` and `NAME_CLIENT_TYPE`
• Approved :- 38.8%
• Refused :- 58.5%
• Canceled :- 2.3%
• Unused offer :- 0.31%
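Percentages like these can be obtained from normalized value counts, and a row-normalized cross-tabulation gives the same breakdown per client type. A minimal sketch, with hypothetical function names and toy rows that do not reproduce the figures above:

```python
import pandas as pd


def status_percentages(status: pd.Series) -> pd.Series:
    """Share of each contract status in percent, rounded to two decimals."""
    return (status.value_counts(normalize=True) * 100).round(2)


def status_by_client_type(df: pd.DataFrame) -> pd.DataFrame:
    """Contract status shares (%) within each client type; every row sums to 100."""
    table = pd.crosstab(
        df["NAME_CLIENT_TYPE"], df["NAME_CONTRACT_STATUS"], normalize="index"
    )
    return table * 100


# Toy rows, not taken from the real data set.
demo = pd.DataFrame({
    "NAME_CONTRACT_STATUS": ["Approved"] * 5 + ["Refused"] * 3 + ["Canceled"] * 2,
    "NAME_CLIENT_TYPE": ["Repeater", "Repeater", "Repeater", "New", "New",
                         "Repeater", "New", "New", "Repeater", "New"],
})
status_pct = status_percentages(demo["NAME_CONTRACT_STATUS"])
status_table = status_by_client_type(demo)
```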
10.2 CONTRACT_STATUS V/s LOAN_PURPOSE
Conclusion
• Compared with the combined population (clients with and without payment difficulties), the share of pensioners is lower among clients with payment difficulties, and the share of working clients is higher.
• Compared with the combined population, the share of married and widowed clients is lower among clients with payment difficulties, and the share of single and civil-married clients is higher.
• Compared with the combined population, the share of clients with secondary/secondary special education is higher among clients with payment difficulties, and the share of clients who have completed higher education is lower.
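The comparison described in these bullets can be sketched as the difference between each category's share among clients with payment difficulties and its share in the whole population. The function name, column names, and toy rows are assumptions for illustration.

```python
import pandas as pd


def share_shift(df, category_col, target_col="TARGET"):
    """Category shares (%) overall vs. among Target1 rows, and their difference."""
    overall = df[category_col].value_counts(normalize=True) * 100
    difficulties = (
        df.loc[df[target_col] == 1, category_col].value_counts(normalize=True) * 100
    )
    out = pd.DataFrame({"overall_pct": overall, "difficulties_pct": difficulties})
    out = out.fillna(0.0)  # categories with no Target1 rows get a 0% share
    out["shift_pct_points"] = out["difficulties_pct"] - out["overall_pct"]
    return out


# Toy rows, not taken from the real data set.
demo = pd.DataFrame({
    "NAME_INCOME_TYPE": ["Working"] * 6 + ["Pensioner"] * 4,
    "TARGET": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0],
})
shift = share_shift(demo, "NAME_INCOME_TYPE")
```

A positive shift means the category is over-represented among clients with payment difficulties; a negative shift means it is under-represented.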
Final Insights
• The count of ‘Low skilled Laborers’ in ‘OCCUPATION_TYPE’ is comparatively very low, yet this group has the highest percentage of payment difficulties, around 17%. Hence, the occupation type ‘Low skilled Laborers’ is a driving factor for loan default.
• The count of ‘Lower Secondary’ in ‘NAME_EDUCATION_TYPE’ is comparatively very low, yet this group has the highest percentage of payment difficulties, around 11%. Hence, the education type ‘Lower Secondary’ is a driving factor for loan default.
• Banks should focus more on clients with income type Student, Pensioner, and Businessman and a housing type other than Co-op apartment or Office apartment for successful payments.
• Banks should focus less on income type Working, as it has the greatest number of unsuccessful payments.
• Applicants living in House/Apartment have the highest number of loan applications, while applicants in Rented apartments and applicants living with parents have a very high percentage of defaults.
• Acquire as many clients as possible from housing type With parents, as they have the least number of unsuccessful payments.
• Loans with purpose Repair also have a higher number of payments not made on time.
THANK YOU