Loan Prediction System Using Machine Learning
ICACC-2022
Abstract. As people's needs grow, the demand for bank loans rises every day. Banks typically
process an applicant's loan after screening and verifying the applicant's eligibility, which is a
difficult and time-consuming process. In some cases, applicants default and banks lose capital.
A machine learning approach is well suited to reducing human effort and supporting effective
decision making in the loan approval process, by using classification algorithms to predict
eligible loan applicants.
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution
License 4.0 (http://creativecommons.org/licenses/by/4.0/).
ITM Web of Conferences 44, 03019 (2022) https://doi.org/10.1051/itmconf/20224403019
applications. The model is available in the open-source software R. This application works well
and meets the requirements of all banks. The downside of this model is that it gives each element
a different weight, but in reality it may be possible to approve a loan based on a single powerful
element alone, which is not possible with this system. This component can be easily connected to
many other systems. There are cases of computer failure and content errors, and the most important
feature weights are fixed in the automatic prediction system; in the near future, such software
may be made safer and more reliable [4].

Risk assessment and forecasting is an important task in the banking industry for determining
whether a loan applicant is good or bad. To improve accuracy, risk assessment is conducted at
primary and secondary levels. Customer data is extracted and the relevant attributes are selected
using information gain theory. Rule prediction is performed for each credit type based on
predefined criteria. Approved and rejected applicants are evaluated as "Applicable" and "Not
Applicable", respectively. The experimental results show that the proposed method predicts with
better accuracy and takes less time than existing methods [5].

The main purpose of this design is to predict which customers will repay a loan, because the
lender needs to anticipate the risk that the borrower will not be able to repay. A study of three
models shows that logistic regression rates higher than the other models, random forests and
decision trees. Applicants with poor credit are not accepted, presumably because they are likely
not to repay. In most cases, high-value applicants may be eligible for a reduction, which makes
them more likely to repay the loan. Attributes such as gender and marital status appear not to be
considered by the company [6].

3 Feature Engineering
Based on domain knowledge, this system can develop new features that may affect the target
variable. Three new features were created:

3.2 EMI – EMI (equated monthly installment), shown in Fig. 2, is the periodic amount that the
applicant must pay to repay the loan. The idea behind this variable is that people with a high EMI
may have difficulty repaying their loans. EMI can be calculated as the ratio of the loan amount
to the loan term.

Fig. 2. Density vs EMI

3.3 Balance Income – Shown in Fig. 3, this is the income left over after paying the EMI. The idea
behind creating this variable is that the higher the value, the more likely a person is to repay
the loan, and thus the more likely the loan is to be approved.
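
As an illustration of the engineered features, the following minimal sketch computes EMI and Balance Income for a single applicant. It is not the authors' code: the Applicant class and its field names (income, loan_amount, loan_term) are hypothetical, all amounts are assumed to be in the same currency unit, and EMI is taken as the simple ratio of loan amount to loan term with interest ignored.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical field names; all amounts are assumed to share one currency unit.
    income: float       # applicant income per repayment period
    loan_amount: float  # total amount borrowed
    loan_term: int      # number of repayment periods


def emi(applicant: Applicant) -> float:
    """Approximate EMI as loan amount divided by loan term (interest ignored)."""
    if applicant.loan_term <= 0:
        raise ValueError("loan_term must be positive")
    return applicant.loan_amount / applicant.loan_term


def balance_income(applicant: Applicant) -> float:
    """Income left over after the EMI has been paid."""
    return applicant.income - emi(applicant)


if __name__ == "__main__":
    example = Applicant(income=5000.0, loan_amount=36000.0, loan_term=360)
    print(emi(example))
    print(balance_income(example))
```

A larger Balance Income value corresponds to more income remaining after the installment, which is the signal the feature is meant to capture.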
4 Proposed Framework
4.1 Business understanding:
In the early stage, the focus is on understanding the project from a business perspective and
translating that knowledge into a data mining problem definition and a preliminary plan.
4.4 Modelling:
The algorithms used for data modelling are Logistic Regression with stratified k-fold
cross-validation and Random Forest.
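
To make the stratified k-fold idea concrete, the sketch below splits sample indices into k folds so that each fold keeps roughly the same class proportions as the full data set. It is an illustrative standard-library version, not the authors' implementation; the function name stratified_k_fold and the "Y"/"N" labels are assumptions. In practice a library routine such as scikit-learn's StratifiedKFold would normally be used.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, validation_indices) for k stratified folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Shuffle within each class, then deal indices round-robin so that
    # every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation


if __name__ == "__main__":
    # Hypothetical labels: 70 approved ("Y") and 30 rejected ("N") applications.
    labels = ["Y"] * 70 + ["N"] * 30
    for train, validation in stratified_k_fold(labels, k=5):
        approved = sum(1 for i in validation if labels[i] == "Y")
        print(len(train), len(validation), approved)
```

Averaging a model's accuracy over the k validation folds gives a mean validation accuracy that is less sensitive to one particular split than a single hold-out set.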
Fig. 4. AUC value of 0.5626

Random Forest: This system attempted to improve the accuracy by tuning the hyperparameters of
this model. The model used grid search to obtain optimized values for the hyperparameters. Grid
search is a way to select the best one from the family of hyperparameters parameterized by the
parameter grid. The max_depth and n_estimators parameters were tuned. max_depth determines the
maximum depth of each tree and n_estimators determines the number of trees used in the random
forest model. Table 2 reports the mean validation accuracy for the hyperparameters.

Table 2. Mean Validation Accuracy of Hyperparameters
Mean Validation Accuracy    0.7947

References
[1] M. Sheikh, A. Goel, T. Kumar, “An Approach for Prediction of Loan Approval using Machine
Learning Algorithm,” International Conference on Electronics and Sustainable Communication
Systems (ICESC), (2020).
[2] S. M S, R. Sunny T, “Loan Credibility Prediction System Based on Decision Tree Algorithm,”
International Journal of Engineering Research & Technology (IJERT), Vol. 4 Issue 09, (2015).
[3] A. Kumar, I. Garg, S. Kaur, “Loan Approval Prediction based on Machine Learning Approach,”
IOSR Journal of Computer Engineering, (2016).
[4] Dr K. Kavitha, “Clustering Loan Applicants based on Risk Percentage using K-Means Clustering
Techniques,” IJARCSSE, Volume 6, Issue 2, (2016).
[5] P. Dutta, “A Study on Machine Learning Algorithm for Enhancement of Loan Prediction,”
International Research