Loan Pre Research Paper

This document discusses using decision trees, naive bayes, and PCA (principal component analysis) models to predict loan approvals. It implements a decision tree model to classify loan approvals based on home, personal, and other attributes. It also uses naive bayes for loan prediction, though it has lower accuracy than decision trees. PCA is used to remove dimensions from the naive bayes model to improve its accuracy. The document reviews related works on evaluating data mining methods for loan prediction and credit score risk management. It discusses the research methodology for using these algorithms to help banks determine which loan applications to approve or reject.

Uploaded by

Vaseem Akram

Loan Prediction Using Decision Trees, Naive Bayes and PCA

Purushottam
Computer Science, VIT, Vellore
purushottam1@gmail.com

Vaseem Akram
Computer Science, VIT, Vellore
skakram100@gmail.com

Sanjay Kumar
Computer Science, VIT, Vellore
sanjay@gmail.com

Abstract - Loan prediction is important for financial companies, which must assess the credit score of customers who apply for a loan. The results produced by the model are used to decide whether a customer's loan should be approved; without them it is difficult for the institution to know whether an applicant qualifies.

ACKNOWLEDGEMENT - We would like to thank Dr. Santhi K. for giving us the opportunity to do our project under her guidance. We are extremely thankful to her for giving us valuable inputs and ideas, resolving our doubts whenever they arose, giving us timely feedback after the completion of every milestone, and helping us throughout the project and the semester.

I. INTRODUCTION

As large volumes of data are now available, data mining models and methods have become more useful, and we can acquire knowledge from them. These methods are used in many areas, such as retail, communication, biological data analysis, intrusion detection and many others. They are also helpful in the banking sector, where they help banks keep pace with other sectors. Here we implement a model for the banking sector for loan prediction. Loan prediction is important for financial companies, which must assess the credit score of customers who apply for a loan. The results produced by the model are used to decide whether a customer's loan should be approved; without them it is difficult for the institution to know whether an applicant qualifies.

With the huge growth of data in the banking sector, analysing the data and using it to gain the required knowledge has become a task beyond human ability. Data mining methods are therefore applied to uncover business problems by finding the patterns, groupings and associations hidden in the database. Using these techniques, banks can obtain more accurate results about their customers, their credit scores and their likelihood of obtaining a loan. Growth and competition have made the banking sector focus on managing its customers and on fraud.

To guard against this kind of fraud, data mining techniques have been implemented. They use the previous records of customers to help banks estimate how many customers they can trust. Banks can thus keep fraudulent customers away and create new offers that build customers' trust in the bank. As a result, the bank gains customers who are truly credible. In these areas data mining techniques are being used effectively.

II. IMPLEMENTATION OF PAPER 1 AND PAPER 2

In this section we implement a decision tree based on the technique adopted in the paper. With it we can find the number of customers who are eligible to take a loan.

A. Decision Tree

The decision tree algorithm is used in the proposed model. A tree consists of a parent node, child nodes and branches. The parent node represents an attribute of the given data set, a branch denotes an outcome for that attribute, and a child node represents the corresponding class label in the data. The topmost node is the parent (root) node.

The tree we developed is built from the data set we were given. The figure shows three sections: home, personal and other. It shows whether or not a person can get a loan of that particular type.

B. Naïve Bayes

Naïve Bayes is based on Bayes' theorem and assumes that the attributes are independent of one another. The model is very simple to build, as it is not complicated even with a very large data set. However, the accuracy we obtain is very low compared with the decision tree algorithm. It also suffers when there are many dimensions, i.e., it cannot give accurate output for data with many dimensions. Although its accuracy is low, it is still used because it is simple to apply and still gives useful results.
According to Mileris, every evaluation should start with an initial (prior) probability estimate for the events of interest. A sample of data is then taken to obtain more information about those events. With this additional information the prior values are updated by calculating revised probabilities, referred to as posterior probabilities.
Anderson et al. explained Bayes' theorem as the means of making these probability calculations.
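The prior-to-posterior update described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes continuous attributes modelled with one Gaussian per class and attribute, and the function names (`fit_gaussian_nb`, `predict`) and the toy applicant data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the class priors and a mean/SD per class and attribute."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Population variance (divide by the number of members of the class).
            var = sum((v - mean) ** 2 for v in column) / len(column)
            # A tiny constant keeps the SD non-zero for constant columns.
            stats.append((mean, math.sqrt(var) + 1e-9))
        model[label] = (prior, stats)
    return model

def log_gaussian(x, mean, sd):
    """Log of the Gaussian density with the given mean and SD at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def predict(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        score += sum(log_gaussian(x, m, s) for x, (m, s) in zip(row, stats))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (monthly income, existing debt), both in thousands.
applicants = [(60, 5), (75, 10), (80, 8), (20, 15), (25, 30), (30, 25)]
decisions = ["approve", "approve", "approve", "reject", "reject", "reject"]

model = fit_gaussian_nb(applicants, decisions)
print(predict(model, (70, 7)))
print(predict(model, (22, 28)))
```

Working with log probabilities avoids numerical underflow when many attribute likelihoods are multiplied together; the class that wins is the same as with the raw products.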

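For comparison, the structure of the decision tree from subsection A (a parent node that tests an attribute, branches for its outcomes, and child nodes that carry the result) can be sketched as nested dictionaries. The tree below is written by hand rather than learned from data, and its attributes (`owns_property`, `income_band`) and outcomes are hypothetical; only the three loan types home, personal and other come from the paper.

```python
# Internal nodes test an attribute, branches are attribute values,
# and leaves are plain strings holding the decision.
# NOTE: the attributes and outcomes are hypothetical, not the paper's tree.
tree = {
    "attribute": "loan_type",  # parent (root) node
    "branches": {
        "home": {
            "attribute": "owns_property",
            "branches": {"yes": "approve", "no": "reject"},
        },
        "personal": {
            "attribute": "income_band",
            "branches": {"high": "approve", "medium": "approve", "low": "reject"},
        },
        "other": "reject",
    },
}

def classify(node, applicant):
    """Walk from the root to a leaf, following the branch for each attribute value."""
    while isinstance(node, dict):
        value = applicant[node["attribute"]]
        node = node["branches"][value]
    return node

print(classify(tree, {"loan_type": "home", "owns_property": "yes"}))
print(classify(tree, {"loan_type": "personal", "income_band": "low"}))
print(classify(tree, {"loan_type": "other"}))
```

In practice the tree would be induced from the historical loan records instead of being written out; the traversal in `classify` stays the same either way.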
C. PCA (Principal Component Analysis)

PCA is a statistical technique that derives uncorrelated components from the given data set, ordered by the variance they explain.

Principal Component Analysis is based on the singular value decomposition (SVD). PCA gives results that relate to the variance of the given values, and its output can sometimes be more useful than, or preferable to, the original attributes. It is used to reduce the number of dimensions. In this project we used PCA to reduce the dimensions of the input to naïve Bayes and obtain more accurate results.
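The dimension reduction step can be sketched as follows. To stay free of dependencies, this sketch finds only the first principal component, using power iteration on the covariance matrix rather than the full SVD mentioned above; the helper names and the toy data are hypothetical.

```python
import math

def center(rows):
    """Subtract the column means so that every attribute has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of rows that are already centred."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: converges to the unit direction of largest variance."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical toy data: two strongly correlated attributes.
rows = [(2.0, 4.1), (3.0, 6.0), (4.0, 7.9), (5.0, 10.2), (6.0, 11.8)]

centered = center(rows)
cov = covariance(centered)
component = first_component(cov)

# Project every row onto the component: two attributes become one.
reduced = [sum(c * v for c, v in zip(component, row)) for row in centered]
print(component)
print(reduced)
```

The one-dimensional `reduced` values could then be passed to a naïve Bayes classifier in place of the original two attributes.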
III. RELATED WORKS

Our team reviewed many articles on the evaluation of different data mining methods and models, and those we use are explained here. Loan prediction is important for financial companies, which must assess the credit score of customers who apply for a loan. The results produced by the model are used to decide whether a customer's loan should be approved; without them it is difficult for the institution to know whether an applicant qualifies.

To remove the point values and obtain exact credit scores we use PCA (Principal Component Analysis). We also use a decision tree to obtain the number of customers who can get a loan.

IV. METHODOLOGY

The use of data mining to estimate defaults accurately is essential in the banking sector, both to avoid risk in credit score management and to protect trusted customers from fraud. Among comparable models of moderate accuracy, the Bayesian classifier is used to predict the probability. Antonakis and Sfakianakis showed how a case can be predicted by grouping data. Rosner explained posterior prediction on the basis of the information available.

A. Research Methodology

The banking sector today is highly competitive, so credit risk management needs careful attention, as it plays a major role in the safety of the institution. When a customer approaches a bank to apply for a loan of any type, the bank should first calculate that person's credit score. The same process is followed for every customer who requests a loan. Because this is so important for the banking sector, many evaluation techniques are used to help banks decide which applications to accept and which to reject. Here we briefly discuss the algorithms used in the project: decision tree, naïve Bayes and principal component analysis. Naïve Bayes is a model based on Bayes' rule which assumes that the attributes X1 ... Xn are all independent of one another given Y. This assumption simplifies the representation of P(X|Y), which is estimated from the training data.

The BN denotes the joint probability distribution (JPD) over a set of inputs Xi. In the present model, the naïve Bayes classifier is built on equation (1).

Given a new occurrence Xnew = (X1 ... Xn), equation (1) gives the probability that Y takes each of its possible values, given the observed attribute values of Xnew and the distributions P(Y) and P(Xi|Y) estimated from the training data. If we are interested only in the most probable value of Y, the naïve Bayes classification rule is given by equation (2).
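Equations (1) and (2) are not reproduced in the text. Assuming they are the standard naïve Bayes posterior and classification rule, with y_k denoting the k-th possible value of Y (a symbol taken from the Gaussian discussion in this section), they can be written as:

```latex
\begin{align}
P(Y = y_k \mid X_1, \dots, X_n)
  &= \frac{P(Y = y_k)\,\prod_{i=1}^{n} P(X_i \mid Y = y_k)}
          {\sum_{j} P(Y = y_j)\,\prod_{i=1}^{n} P(X_i \mid Y = y_j)} \tag{1} \\
Y &\leftarrow \arg\max_{y_k} \; P(Y = y_k)\,\prod_{i=1}^{n} P(X_i \mid Y = y_k) \tag{2}
\end{align}
```

The denominator of (1) is the same for every y_k, which is why it can be dropped when only the most probable value of Y is needed, as in (2).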
However, Mitchell (2010) noted that when the Xi are continuous, another way must be chosen to represent the distribution P(Xi|Y). A common approach is to assume that, for every value yk of Y, the distribution of Xi is Gaussian, described by a mean and a standard deviation (SD) specific to Xi and yk. To build such a naïve Bayes model, the mean and SD of each of these Gaussians must be estimated.

This is done for every attribute Xi and each possible value yk of Y. Note that there are 2nK of these parameters (a mean and an SD for each of the n attributes and each of the K values of Y), all of which must be estimated independently. Accordingly, we must also estimate the priors on Y.

The model described above is a naïve Bayes model which assumes that X is generated by a mixture of class-conditional (CC) Gaussians, i.e. Gaussians that depend on the class variable Y. In addition, the naïve Bayes assumption introduces the constraint that the attribute values Xi are independent of one another within each of these mixture components.

V. CONCLUSION

We conclude that decision trees are much more accurate than naïve Bayes, as can be seen from the difference in accuracy. However, naïve Bayes can be improved by applying PCA and thereby reducing the number of dimensions.

VI. REFERENCES

[1] Abdelmoula, Aida Krichene. "Bank credit risk analysis with k-nearest-neighbor classifier: Case of Tunisian banks." Accounting and Management Information Systems 14.1 (2015): 79.
[2] Arutjothi, G., Dr. C. Senthamarai. "Comparison of Feature Selection Methods for Credit Risk Assessment", International Journal of Computer Science, Volume 5, Issue 1, No 5, 2017.
[3] Arutjothi, G., Dr. C. Senthamarai. "Credit Risk Evaluation using Hybrid Feature Selection Method" Software Engineering and Technology 9.2 (2017): 23-26.
[4] Attig, Anja, and Petra Perner. "The Problem of Normalization and a Normalized Similarity Measure by Online Data." Tran. CBR 4.1 (2011): 3-17.
[5] Babu, Ram, and A. Rama Satish. "Improved of K-Nearest Neighbor Techniques in Credit Scoring." International Journal For Development of Computer Science & Technology 1 (2013).
[6] Bach, Mirjana Pejic, et al. "Selection of Variables for Credit Risk Data Mining Models: Preliminary research" MIPRO 2017 - 40th Jubilee International Convention. 2017.
[7] Byanjankar, Ajay, Markku Heikkila, and Jozsef Mezei. "Predicting credit risk in peer-to-peer lending: A neural network approach." Computational Intelligence, 2015 IEEE Symposium Series on. IEEE, 2015.
[8] Devi, CR Durga, and R. Manicka Chezian. "A relative evaluation of the performance of ensemble learning in credit scoring." Advances in Computer Applications (ICACA), IEEE International Conference on. IEEE, 2016.
[9] G. Arutjothi, Dr. C. Senthamarai, "Effective Analysis of Financial Data using Knowledge Discovery Database", International Journal of Computational Intelligence and Informatics, Vol. 6: No. 2, September 2016.
[10] Goel, Dr Himani, and Gurbhej Singh. "Evaluation of Expectation Maximization based Clustering Approach for Reusability Prediction of Function based Software Systems." International Journal of Computer Applications (0975-8887) Volume (2010).
[11] Abid, F. and Zouari, A. (2000), "Financial distress prediction using neural networks".
[12] Abramowicz, W., Nowak, M. and Sztykiel, J. (2003), Bayesian Networks as a Decision Support Tool in Credit Scoring Domain, Idea Group Publishing.
[13] Altman, E.I. (1968), "Financial ratios, discriminant analysis and the prediction of corporate bankruptcy", Journal of Finance, Vol. 23 No. 4, pp. 589-609.
[14] Anderson, D.R., Sweeney, D.J., Freeman, J., Williams, T.A. and Shoesmith, E. (2007), Statistics for Business and Economics, Thomson Learning EMEA, London.
[15] Antonakis, A.C. and Sfakianakis, M.E. (2009), "Assessing naïve Bayes as a method for screening credit applicants", Journal of Applied Statistics, Vol. 36 No. 5, pp. 537-545.
