EDA Assignment
By Ankur Singh
aryan1312@gmail.com
TYPES OF RISKS
Identify patterns that indicate if a client has difficulty paying their installments, which
may be used for taking actions such as denying the loan, reducing the loan, lending (to
risky applicants) at a higher interest rate, etc. This will ensure that the consumers
capable of repaying the loan are not rejected.
Understand the driving factors (or driver variables) behind loan default, i.e. the
variables which are strong indicators of default.
DATASET
Hints
• Identify whether there are outliers in the dataset, and explain why you consider each one an outlier. Remember that
for this exercise it is not necessary to remove any data points.
• Identify whether the data is imbalanced, and find the imbalance ratio.
• Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms.
• Find the top 10 correlations for clients with payment difficulties and for all other cases (Target variable). Note
that you have to segment the data frame w.r.t. the target variable, find the top correlations within each
segment, and check whether they yield any insight. Say there are 5+1 (target)
variables in a dataset: Var1, Var2, Var3, Var4, Var5, Target. If you have to find the top 3 correlations, they could be:
Var1 & Var2, Var2 & Var3, Var1 & Var3. The target variable will not feature in these correlations, as it is a categorical
variable and not a continuous variable that is increasing or decreasing.
• Include visualizations and summarise the most important results in the presentation. You are free to choose the
graphs which explain the numerical/categorical variables. Insights should explain why the variable is important
for differentiating the clients with payment difficulties from all other cases.
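The checks described in the hints can be sketched in pandas as follows. This is a minimal illustration, not the method used in this assignment: the helper names, the 1.5 × IQR outlier rule, and the tiny made-up data frame are all assumptions; only the column names Var1 to Var3 and the idea of a target column come from the hints.

```python
import numpy as np
import pandas as pd


def imbalance_ratio(target: pd.Series) -> float:
    """Ratio of the majority-class count to the minority-class count."""
    counts = target.value_counts()
    return counts.max() / counts.min()


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; nothing is removed."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Top-n absolute pairwise correlations among the numeric columns."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the upper triangle so each pair appears once and
    # self-correlations are dropped.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return upper.unstack().dropna().sort_values(ascending=False).head(n)


# Tiny made-up frame, for illustration only.
demo = pd.DataFrame({
    "TARGET": [0, 0, 0, 0, 0, 0, 1, 1],
    "Var1": [1, 2, 3, 4, 5, 6, 7, 100],
    "Var2": [2, 4, 6, 8, 10, 12, 14, 16],
    "Var3": [5, 3, 6, 2, 7, 1, 8, 4],
})

print(imbalance_ratio(demo["TARGET"]))       # 6 rows of class 0 vs 2 of class 1
print(demo[iqr_outlier_mask(demo["Var1"])])  # rows flagged, not dropped

# Segment by the target first, then correlate within each segment.
for label, segment in demo.groupby("TARGET"):
    print(label)
    print(top_correlations(segment.drop(columns="TARGET"), n=3))
```

Dropping the target column before correlating mirrors the hint that the target should not feature in the top correlations.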
GENERAL OBSERVATION
[Charts: Applicant Gender, Income Distribution, Age Distribution]
Conclusion: The default rate is 10.2% for male applicants and 7.08% for female applicants, so males are more likely to default.
By age group, the default rate is 11.41% for Young Adults, 8.8% for Middle-Age, 6.4% for Old Adults, and 5.23% for Senior
Citizens. Young Adults below the age of 30 are the most likely to default.
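Per-group default percentages like those above can be computed with a group-by on the target column. The sketch below is illustrative only: the column names, the toy data, and every age cut point other than 30 are assumptions, not values taken from the assignment dataset.

```python
import pandas as pd


def default_rate_pct(df: pd.DataFrame, by: str, target: str = "TARGET") -> pd.Series:
    """Percentage of rows with target == 1 within each category of `by`."""
    return (df.groupby(by, observed=True)[target].mean() * 100).round(2)


# Made-up data, for illustration only.
demo = pd.DataFrame({
    "GENDER": ["M", "M", "F", "F", "F", "M"],
    "AGE_YEARS": [24, 52, 33, 67, 28, 41],
    "TARGET": [1, 0, 0, 0, 1, 0],
})

# Hypothetical age bands; right=False makes each band [low, high),
# so "Young Adult" means strictly below 30.
demo["AGE_GROUP"] = pd.cut(
    demo["AGE_YEARS"],
    bins=[0, 30, 45, 60, 120],
    right=False,
    labels=["Young Adult", "Middle-Age", "Old Adult", "Senior Citizen"],
)

gender_rates = default_rate_pct(demo, "GENDER")
age_rates = default_rate_pct(demo, "AGE_GROUP")
print(gender_rates)
print(age_rates)
```

Because the target is coded 0/1, the group mean is the share of defaulters in that group, which is why no explicit counting is needed.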
TARGETED OBSERVATION
Marital Status and Education Type
Observation:
Marital: Applicants in a civil marriage default the most, followed by
singles; widows tend to default the least.
Education Type: Applicants with an academic degree have the lowest
default rate, while Lower Secondary and Secondary have the highest and
second-highest default rates.
TARGETED OBSERVATION
Housing and Income Type
Observation:
Housing: Applicants living in rented apartments, followed by those living
with parents, are more likely to default. Applicants living in office
accommodation and those who live in their own house are less
likely to have payment difficulties.
Income Type: Applicants on maternity leave or unemployed have the highest
and second-highest default rates, while state
servants and pensioners are least likely to have payment difficulties.
MULTI VARIABLE OBSERVATION
Least-Risk Individual: a married state servant working as a manager, aged above 45, living in
an office apartment, with an academic degree
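A profile like the one above can be checked by grouping on several variables at once and ranking the resulting segments by default rate. The sketch below is an assumption-laden illustration: the column names, the toy rows, and the `min_count` threshold (used to ignore segments too small to be meaningful) are hypothetical.

```python
import pandas as pd


def segment_default_rates(df, keys, target="TARGET", min_count=2):
    """Default rate per combination of `keys`, lowest-risk segments first."""
    stats = df.groupby(keys)[target].agg(count="size", default_pct="mean")
    stats["default_pct"] = stats["default_pct"] * 100
    # Drop segments with too few applicants to support a conclusion.
    return stats[stats["count"] >= min_count].sort_values("default_pct")


# Made-up data, for illustration only.
demo = pd.DataFrame({
    "INCOME_TYPE": ["State servant", "State servant", "Unemployed",
                    "Unemployed", "Working"],
    "HOUSING": ["Office apartment", "Office apartment", "Rented apartment",
                "Rented apartment", "With parents"],
    "TARGET": [0, 0, 1, 0, 1],
})

ranked = segment_default_rates(demo, ["INCOME_TYPE", "HOUSING"])
print(ranked)
```

The count column matters as much as the rate: a segment with a 0% default rate built on a handful of applicants is weak evidence.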
CONCLUSION