EDA Assignment

The document analyzes risk factors for loan defaults using an applicant dataset. It provides univariate analysis of various applicant attributes for the overall dataset as well as segmented by those who experienced payment difficulties. Key observations include that males and younger applicants are more likely to default, as are those who are unmarried, lower educated, renting housing, or unemployed.


EDA Assignment

By Ankur Singh
aryan1312@gmail.com
TYPES OF RISKS

• Not providing credit facilities to a capable borrower: loss of revenue.
• Providing credit facilities to a borrower with malicious intention: loss of asset.
TYPES OF SCENARIOS

• The client with payment difficulties: late payment of more than X days on at least one of the first Y installments of the loan.
• All other cases: all other cases when the payment is paid on time.
TYPES OF DECISIONS

• Approved: The company has approved the loan application.
• Refused: The company has rejected the loan (because the client does not meet its requirements, etc.).
• Unused offer: The loan has been cancelled by the client, but at different stages of the process.
• Cancelled: The client cancelled the application at some point during approval. Either the client changed her/his mind about the loan or, in some cases, due to a higher risk of the client, he received worse pricing which he did not want.
Business Objectives

• Identify patterns that indicate whether a client has difficulty paying their installments. These patterns may be used for taking actions such as denying the loan, reducing the loan amount, or lending (to risky applicants) at a higher interest rate. This will ensure that consumers capable of repaying the loan are not rejected.
• Understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default.
DATASET

• 'application_data.csv': Contains all the information of the client at the time of application. The data is about whether a client has payment difficulties.
• 'previous_application.csv': Contains information about the client's previous loan data. It contains the data on whether the previous application had been Approved, Cancelled, Refused, or Unused offer.
• 'columns_description.csv': Data dictionary which describes the meaning of the variables.
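
As a rough illustration of how the two data files relate, the sketch below loads small in-memory stand-ins and attaches each client's target flag to their previous applications. The join key SK_ID_CURR appears later in this deck; the column names TARGET, SK_ID_PREV and NAME_CONTRACT_STATUS, and all values, are assumptions made for the example.

```python
import io
import pandas as pd

# Tiny in-memory stand-ins for the real CSV files (all values are made up).
application_csv = io.StringIO(
    "SK_ID_CURR,TARGET,AMT_INCOME_TOTAL\n"
    "1,0,90000\n"
    "2,1,135000\n"
    "3,0,176000\n"
)
previous_csv = io.StringIO(
    "SK_ID_PREV,SK_ID_CURR,NAME_CONTRACT_STATUS\n"
    "10,1,Approved\n"
    "11,1,Refused\n"
    "12,3,Cancelled\n"
)

# With the real data these would be file paths such as "application_data.csv".
application = pd.read_csv(application_csv)
previous = pd.read_csv(previous_csv)

# Attach each client's current target flag to their previous applications.
merged = previous.merge(
    application[["SK_ID_CURR", "TARGET"]], on="SK_ID_CURR", how="left"
)
status_counts = merged["NAME_CONTRACT_STATUS"].value_counts()
print(merged)
print(status_counts)
```

A left join keeps every previous application even when the client is missing from the current application file, which makes such gaps visible as missing TARGET values.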
Points to Remember
• Present the overall approach of the analysis in a presentation. Mention the problem statement and the analysis
approach briefly.
• Identify the missing data and use an appropriate method to deal with it (remove the columns or replace the missing values with an appropriate value).
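
One possible way to handle missing data is sketched below: drop columns that are mostly empty, then impute the rest. The 50% cut-off, the choice of median and mode, and the column names and values are assumptions for illustration, not the method used in this analysis.

```python
import numpy as np
import pandas as pd

# Made-up sample; column names are illustrative assumptions.
df = pd.DataFrame({
    "AMT_ANNUITY": [24700.5, np.nan, 6750.0, 29686.5],
    "OCCUPATION_TYPE": ["Laborers", None, "Core staff", "Laborers"],
    "MOSTLY_EMPTY": [np.nan, np.nan, np.nan, 1.0],
})

# Share of missing values per column, highest first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Drop columns whose missing share exceeds a threshold (50% is an assumed cut-off).
threshold = 0.5
cleaned = df.drop(columns=missing_share[missing_share > threshold].index)

# Impute what remains: median for numeric columns, mode for categorical ones.
for col in cleaned.columns:
    if pd.api.types.is_numeric_dtype(cleaned[col]):
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
    else:
        cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

print(missing_share)
print(cleaned)
```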

Hints
• Identify whether there are outliers in the dataset, and explain why you think each one is an outlier. Remember that for this exercise it is not necessary to remove any data points.
• Identify whether the data is imbalanced, and find the ratio of the imbalance.
• Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms.
• Find the top 10 correlations for the clients with payment difficulties and for all other cases (Target variable). Note that you have to segment the data frame with respect to the target variable, find the top correlations within each segment, and check whether they yield any insight. Say there are 5+1 (target) variables in a dataset: Var1, Var2, Var3, Var4, Var5, Target. If you have to find the top 3 correlations, they can be: Var1 & Var2, Var2 & Var3, Var1 & Var3. The target variable will not feature in these correlations because it is a categorical variable, not a continuous variable that increases or decreases.
• Include visualizations and summarise the most important results in the presentation. You are free to choose the
graphs which explain the numerical/categorical variables. Insights should explain why the variable is important
for differentiating the clients with payment difficulties from all other cases.
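
The segmented top-correlation hint above can be sketched as follows. The helper name `top_correlations`, the TARGET column name and the toy values are assumptions for illustration; the idea is to split the frame on the target, compute the correlation matrix per segment, and keep each variable pair once.

```python
import numpy as np
import pandas as pd

def top_correlations(frame: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n strongest absolute pairwise correlations, each pair once."""
    corr = frame.select_dtypes("number").corr().abs()
    # Keep only the strict upper triangle so (A, B) and (B, A) are not both
    # listed and the diagonal of self-correlations is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)

# Made-up data: TARGET = 1 stands for clients with payment difficulties.
df = pd.DataFrame({
    "TARGET": [0, 0, 0, 0, 1, 1, 1, 1],
    "Var1":   [1, 2, 3, 4, 1, 2, 3, 4],
    "Var2":   [2, 4, 6, 8, 4, 3, 2, 1],
    "Var3":   [5, 3, 6, 2, 1, 4, 2, 3],
})

# The target itself is dropped before correlating, as the hint requires.
features = df.drop(columns="TARGET")
top_other = top_correlations(features[df["TARGET"] == 0], n=3)
top_difficulty = top_correlations(features[df["TARGET"] == 1], n=3)
print(top_other)
print(top_difficulty)
```

Comparing the two result series side by side shows whether a pair of variables is strongly related in one segment but not in the other.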
GENERAL OBSERVATION

Percentage of Payment Default

Observation: The default rate is around 8.1 percent, which is in line with industry standards.
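
A minimal sketch of how a default rate and an imbalance ratio can be computed from a 0/1 target column. The column name TARGET, the encoding (1 = payment difficulties) and the sample counts are assumptions for illustration.

```python
import pandas as pd

# Made-up target column: 1 = client with payment difficulties, 0 = all other cases.
target = pd.Series([0] * 92 + [1] * 8, name="TARGET")

counts = target.value_counts()
default_rate = counts[1] / counts.sum()   # share of clients with difficulties
imbalance_ratio = counts[0] / counts[1]   # other cases per client with difficulties

print(f"Default rate: {default_rate:.1%}")
print(f"Imbalance ratio: {imbalance_ratio:.1f} : 1")
```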
GENERAL OBSERVATION

Applicant Gender

Observation: Contrary to usual industry norms, female applicants are about twice as numerous as male applicants.
GENERAL OBSERVATION

Car and Realty Ownership

Observation: The majority of applicants own a car and a house. Realty ownership is around 69%, while car ownership is around 66%.
GENERAL OBSERVATION

Income Distribution

Observation: The largest share of applicants, 32.3% of the total, falls in the 0-25 percentile income bucket, with an average income of 90,500/-. It is followed by the 50-75 percentile bucket with 26.4% (average income 1,76,000/-) and the 75-100 percentile bucket with 23.5% (average income 2,96,000/-). The fewest applications, 17.8%, fall in the 25-50 percentile income bucket, with an average income of 1,33,000/-.
GENERAL OBSERVATION

Age Distribution

Observation: The average age is 43 years. The largest group is aged 30-45 years, making up 40% of applicants, followed by 45-60 years at 33%, 18-30 years at 17%, and senior citizens at only 8%.
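
Age groups like the ones above can be built with fixed bins. The sketch below assumes a DAYS_BIRTH column holding age as a negative number of days (an assumption about the data layout), made-up values, and 60 years as the lower edge of the senior group.

```python
import pandas as pd

# DAYS_BIRTH (negative days relative to the application date) is an assumed column.
df = pd.DataFrame({"DAYS_BIRTH": [-9000, -12000, -15700, -20000, -24000]})

df["AGE_YEARS"] = -df["DAYS_BIRTH"] / 365.25

# Left-closed bins: [18, 30), [30, 45), [45, 60), [60, 120).
bins = [18, 30, 45, 60, 120]
labels = [
    "Young Adult (18-30)",
    "Middle-Age Adult (30-45)",
    "Old Adult (45-60)",
    "Senior Citizen (60+)",
]
df["AGE_GROUP"] = pd.cut(df["AGE_YEARS"], bins=bins, labels=labels, right=False)

# Percentage of applicants in each age group.
age_share = df["AGE_GROUP"].value_counts(normalize=True).mul(100).round(1)
print(age_share)
```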
TARGETED OBSERVATION

Applicants with Payment Difficulties

Observation: Against the general gender ratio of 66:34 (female to male), the ratio among applicants with payment difficulties shifts to 57:43, i.e. more males have payment difficulties. The age mix also shifts from the general population towards Middle-Age Adults and Young Adults, with increases of 3% and 7% respectively.
TARGETED OBSERVATION

External Rating

Observation: External rating 2 gives a direct indication that defaulters have a significantly lower rating.
TARGETED OBSERVATION

External Rating

Observation: Applicants with payment difficulties have a worse external rating than others, while the ratings of the others are more in line with the whole dataset.
TARGETED OBSERVATION

Applicants with Payment Difficulties

Conclusion: The default rate is 10.2% for males and 7.08% for females, so males are more likely to default. By age, the default rate is 11.41% for Young Adults, 8.8% for Middle-Age Adults, 6.4% for Old Adults, and 5.23% for Senior Citizens. Young Adults below the age of 30 are the most likely to default.
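
Per-segment default rates like those quoted above are the mean of a 0/1 target within each group. The sketch below uses made-up data, and the column names CODE_GENDER and TARGET are assumptions; it also shows the composition of each target segment, which is the kind of ratio compared in the 66:34 versus 57:43 observation.

```python
import pandas as pd

# Made-up sample: TARGET = 1 stands for a client with payment difficulties.
df = pd.DataFrame({
    "CODE_GENDER": ["M", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "TARGET":      [1,   0,   1,   0,   0,   0,   1,   0,   0,   0],
})

# The mean of a 0/1 target within each group is that group's default rate.
default_rate_pct = df.groupby("CODE_GENDER")["TARGET"].mean().mul(100).round(2)

# Gender mix within each target segment; each row sums to 100%.
gender_mix_pct = (
    pd.crosstab(df["TARGET"], df["CODE_GENDER"], normalize="index")
    .mul(100)
    .round(1)
)

print(default_rate_pct)
print(gender_mix_pct)
```

The same two lines work for any categorical attribute (age group, education, housing type) by swapping the grouping column.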
TARGETED OBSERVATION

Annuity Amount

Observation: The annuity amount is higher for applicants who have difficulties with payments.
TARGETED OBSERVATION

Marital Status and Education Type

Observation:
Marital: People in a civil marriage default the most, followed by singles, while widows tend to default the least.
Education Type: Applicants with an academic degree default the least, while Lower Secondary and Secondary have the highest and second-highest default rates.
TARGETED OBSERVATION

Housing and Income Type

Observation:
Housing: People staying in rented apartments, followed by those staying with parents, are more likely to default. People staying in office apartments and those who live in their own house are less likely to have payment difficulties.
Income Type: Applicants on maternity leave and unemployed applicants have the highest and second-highest default rates, while state servants and pensioners are least likely to have payment difficulties.
MULTI VARIABLE OBSERVATION

Family and Education

Observation: Academic degree holders have the fewest applicants with payment difficulties, while lower secondary has the most. The riskiest categories are Civil Marriage and Separated with lower secondary education.
MULTI VARIABLE OBSERVATION

Age and Housing Type

Observation: Young Adults (18-30 years) have the highest probability of having difficulties in payment, followed by Middle-Age Adults (30-45 years).
MULTI VARIABLE OBSERVATION

Annuity and Income

Observation: Managers have the highest income but also the highest annuity. The income-to-annuity ratio is steady across all occupations except HR staff, who tend to have a higher annuity relative to their income.
Loan amounts to HR staff should be lower than for others.
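
A comparison of annuity against income by occupation can be sketched as below. The column names OCCUPATION_TYPE, AMT_INCOME_TOTAL and AMT_ANNUITY, the sample values, and the use of the median are assumptions for illustration.

```python
import pandas as pd

# Made-up sample values; column names are illustrative assumptions.
df = pd.DataFrame({
    "OCCUPATION_TYPE":  ["Managers", "Managers", "HR staff", "HR staff", "Drivers", "Drivers"],
    "AMT_INCOME_TOTAL": [300000, 280000, 150000, 170000, 180000, 200000],
    "AMT_ANNUITY":      [36000, 33600, 30000, 34000, 21600, 24000],
})

# Annuity as a share of income, per applicant.
df["ANNUITY_TO_INCOME"] = df["AMT_ANNUITY"] / df["AMT_INCOME_TOTAL"]

# Median per occupation, with the most burdened occupation first.
by_occupation = (
    df.groupby("OCCUPATION_TYPE")[["AMT_INCOME_TOTAL", "AMT_ANNUITY", "ANNUITY_TO_INCOME"]]
    .median()
    .sort_values("ANNUITY_TO_INCOME", ascending=False)
)
print(by_occupation)
```

The median is used here rather than the mean so that a few very high incomes do not dominate an occupation's figure.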
MULTI VARIABLE OBSERVATION

Days to Reapply

Observation: People whose application has previously been approved tend to reapply within a year of loan approval.
MULTI VARIABLE OBSERVATION

Type of Clients

Observation: The number of approved applications from repeaters is very high, with a default rate of only 7.28%. New applicants have the highest approval ratio, but they also default the most, at 8.9%.
MULTI VARIABLE OBSERVATION

Channel, Loan Type and Yield for approved loans

Observation: New applicants have the highest interest rate, followed by refreshed applicants. Consumer loans of new applicants have the highest probability of approval, while cash loans of repeaters have the lowest. The most effective channels are Stone, Regional/Local, and Countrywide, while the least effective are credit and cash offices for new customers and corporate sales for repeaters.
MULTI VARIABLE OBSERVATION

External Rating and Job Type

Observation: Occupations with the highest risk of default: Low-skill Laborers, Drivers, Laborers, Cooking staff, Security staff, and Waiters/barmen staff. Low-risk occupations: Accountants, High skill tech staff, Core staff, and Managers.
MULTI VARIABLE OBSERVATION

External Rating and Income Type

Observation: Unemployed applicants and applicants on maternity leave have the highest default rates, yet their external ratings point in two contradictory directions. When they are not having difficulties in paying, their rating tends to stay high, especially in the case of applicants on maternity leave, while previous applications show that their loans are approved even with a below-average rating.
MULTI VARIABLE OBSERVATION

Purpose of Loan

Observation: Among those whose loans have already been sanctioned, Hobby and Car Repairs have the highest default rates, while New Car, Furniture, and Everyday Expenses have the lowest.
CONCLUSION

• After the analysis, recommendations are made on the basis of three criteria:

1. Whether an applicant will repay the loan.
2. Whether an applicant will have difficulties in payment.
3. Whether the loan can be given on the condition of a high interest rate to mitigate any default risk leading to business loss.
CONCLUSION

Factors supporting repayment capabilities

1. Applicants with an academic degree have fewer defaults.
2. Students and Businessmen have no defaults.
3. People above the age of 45 years have a low probability of default.
4. State servants staying in their own home or in office apartments are more likely to repay.
5. Accountants, High skill tech staff, Core staff, and Managers are more likely to repay.
6. Married and widowed applicants are more likely to repay.

Least-Risk Individual: Married state servant manager, aged above 45, living in an office apartment, with an academic degree.
CONCLUSION

Factors affecting payment capabilities

1. Avoid Low-skill Laborers, Drivers, Laborers, Cooking staff, Security staff, and Waiters/barmen staff.
2. Applicants with Lower Secondary or Secondary education.
3. Men have a higher default rate.
4. Applicants who are single or in a civil marriage may default.
5. Clients who are either on maternity leave or unemployed default a lot.
6. Adults in the 18-45 age group have a higher probability of default.

Extremely High-Risk Individual: Unemployed single male with Lower Secondary education, working in low-skill labor, aged between 18 and 30 years.
CONCLUSION

Condition of high interest rate to mitigate any default risk leading to business loss

1. Applicants whose External Rating is below 0.5.
2. Applicants who live in rented apartments or with parents.
3. Applicants aged between 30 and 45 years.
Other Insights
• The following applicants have a high probability of default if a loan is given, because they are outliers in the observations of defaults in the client's social surroundings:
  Index = 148403, SK_ID_CURR = 272071
  Index = 133829, SK_ID_CURR = 255214
  Index = 198047, SK_ID_CURR = 329624
  Index = 279348, SK_ID_CURR = 423593
  Index = 298686, SK_ID_CURR = 446031
• Applicants above the age of 65 should be avoided, as there is a high chance of default due to the high risk of mortality and the high insurance premium against the loan.
• Concerning the 'AMT_REQ_CREDIT_BUREAU_MON' column, applicants with more than 10 inquiries should be avoided to mitigate the risk of multiple financing.
• We can check for the sustainability of applicants: the annuity should not be more than 65% of total income.
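
The screening rules above can be expressed as simple boolean flags, sketched below on made-up data. The thresholds (more than 10 inquiries, annuity above 65% of income, age above 65) come from the insights above; the column names other than AMT_REQ_CREDIT_BUREAU_MON and SK_ID_CURR, and the REVIEW flag itself, are assumptions for illustration.

```python
import pandas as pd

# Made-up applicants; AGE_YEARS, AMT_INCOME_TOTAL and AMT_ANNUITY are assumed columns.
df = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4],
    "AMT_REQ_CREDIT_BUREAU_MON": [0, 12, 1, 3],
    "AMT_INCOME_TOTAL": [100000, 150000, 50000, 200000],
    "AMT_ANNUITY": [20000, 30000, 40000, 25000],
    "AGE_YEARS": [35, 40, 50, 68],
})

# One boolean column per screening rule.
flags = pd.DataFrame({
    "many_inquiries": df["AMT_REQ_CREDIT_BUREAU_MON"] > 10,
    "high_annuity_share": df["AMT_ANNUITY"] / df["AMT_INCOME_TOTAL"] > 0.65,
    "age_above_65": df["AGE_YEARS"] > 65,
})

# An applicant is marked for review if any rule fires.
df["REVIEW"] = flags.any(axis=1)
flagged_ids = df.loc[df["REVIEW"], "SK_ID_CURR"].tolist()
print(flags)
print(flagged_ids)
```

Keeping one column per rule makes it possible to report which rule triggered the review for each applicant.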
