Loan Approval Prediction

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv("LoanApprovalPrediction.csv")
data

Output: a DataFrame of 598 rows x 13 columns (Loan_ID, Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Credit_History, Property_Area, Loan_Status).

In [ ]:
data.head(5)

Output: the first five rows of the dataset (Loan_ID LP001002 through LP001008).

In [ ]:
obj = (data.dtypes == 'object')
print("Categorical variables:", len(list(obj[obj].index)))

Output:
Categorical variables: 7

In [ ]:
# Loan_ID is completely unique and not correlated with any of the other columns,
# so drop the Loan_ID column.
data.drop(['Loan_ID'], axis=1, inplace=True)

In [ ]:
# Bar plot of the value counts of each remaining categorical column.
obj = (data.dtypes == 'object')
object_cols = list(obj[obj].index)
plt.figure(figsize=(18, 36))
index = 1
for col in object_cols:
    y = data[col].value_counts()
    plt.subplot(11, 4, index)
    plt.xticks(rotation=90)
    sns.barplot(x=list(y.index), y=y)
    index += 1

Output: a grid of bar plots, one per categorical column.

Import the label encoder and convert each categorical value into a numerical value, e.g. levels such as low, medium and high become integer codes like 1, 2 and 3.

In [ ]:
from sklearn import preprocessing

# The label encoder learns how to map word labels to integers.
label_encoder = preprocessing.LabelEncoder()
obj = (data.dtypes == 'object')
for col in list(obj[obj].index):
    data[col] = label_encoder.fit_transform(data[col])

In [ ]:
# Check the object-dtype columns again to confirm that none are left.
obj = (data.dtypes == 'object')
print("Categorical variables:", len(list(obj[obj].index)))

Output:
Categorical variables: 0

In [ ]:
plt.figure(figsize=(12, 6))
sns.heatmap(data.corr(), cmap='BrBG', fmt='.2f', linewidths=2, annot=True)

This calculates the correlation matrix of the dataset; the correlation matrix shows how closely related two variables are, with values ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation). fmt='.2f' makes the numbers inside the heatmap show two decimal places, linewidths=2 sets the width of the lines between the cells to 2 units, and annot=True makes seaborn print the exact correlation value (e.g. 1.00, 0.85, -0.38) inside each coloured box.

Output: a correlation heatmap over all columns.

In [ ]:
# Visualise the plot for the Gender and Married status of the applicants.
sns.catplot(x="Gender", y="Married", hue="Loan_Status", kind="bar", data=data)

Output: a grouped bar plot of Married against Gender, split by Loan_Status.

In [ ]:
# Find out whether there are any missing values in the dataset and fill them.
for col in data.columns:
    data[col] = data[col].fillna(data[col].mean())

data[col] refers to the current column of the dataset, fillna() replaces all missing values in that column with a specified value, and data[col].mean() calculates the mean (average) of the non-missing values, so if a column has any missing values they are filled with that column's average.
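The label-encoding and mean-imputation steps above can also be wrapped into a single reusable helper that keeps one encoder per column, so the original string-to-integer mapping stays inspectable. This is only a sketch under the notebook's assumptions (the LoanApprovalPrediction.csv layout with a Loan_ID identifier column); the function name preprocess_loan_data is an illustrative choice, not part of the original code.

import pandas as pd
from sklearn import preprocessing

def preprocess_loan_data(path="LoanApprovalPrediction.csv"):
    # Load the raw CSV and drop the identifier column, as the notebook does.
    df = pd.read_csv(path).drop(columns=["Loan_ID"])

    # Encode every object-dtype column with its own LabelEncoder; astype(str)
    # turns NaN entries into the string "nan" so they receive a code of their own.
    encoders = {}
    for col in df.columns[df.dtypes == "object"]:
        enc = preprocessing.LabelEncoder()
        df[col] = enc.fit_transform(df[col].astype(str))
        encoders[col] = enc

    # Fill any remaining missing numeric values with the column mean,
    # mirroring the fillna loop above.
    df = df.fillna(df.mean())
    return df, encoders

After calling df, encoders = preprocess_loan_data(), something like encoders["Property_Area"].classes_ shows which original label each integer code stands for.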
In [ ]:
# Check how many missing values (NaNs) are left in each column after filling:
# isna() identifies where the missing values are, and sum() adds up the number
# of missing values in each column.
data.isna().sum()

Output:
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64

In [ ]:
from sklearn.model_selection import train_test_split

X = data.drop(['Loan_Status'], axis=1)
Y = data['Loan_Status']
X.shape, Y.shape

In [ ]:
# 60/40 train/test split, as implied by the shapes printed below.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

Output:
((358, 11), (240, 11), (358,), (240,))

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

knn = KNeighborsClassifier(n_neighbors=3)
rfc = RandomForestClassifier(n_estimators=7, criterion='entropy', random_state=7)
svc = SVC()
lc = LogisticRegression()

# Making predictions on the training set.
for clf in (rfc, knn, svc, lc):
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_train)
    print("Accuracy score of", clf.__class__.__name__, "=",
          100 * metrics.accuracy_score(Y_train, Y_pred))

Output:
Accuracy score of RandomForestClassifier = 98.04469273743017
Accuracy score of KNeighborsClassifier = 78.49162011173185
Accuracy score of SVC = 68.71508379888268
Accuracy score of LogisticRegression = 80.16759776536313

...\anaconda\Lib\site-packages\sklearn\linear_model\_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

In [ ]:
# Making predictions on the testing set.
for clf in (rfc, knn, svc, lc):
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    print("Accuracy score of", clf.__class__.__name__, "=",
          100 * metrics.accuracy_score(Y_test, Y_pred))

Output:
Accuracy score of RandomForestClassifier = 82.5
Accuracy score of KNeighborsClassifier = 63.74999999999999
Accuracy score of SVC = 69.16666666666667
Accuracy score of LogisticRegression = 79.58333333333333

(The same lbfgs ConvergenceWarning is printed again for LogisticRegression.)
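The ConvergenceWarning itself suggests two fixes: raise max_iter or scale the data. Below is a minimal sketch of the scaling route, assuming the X_train, X_test, Y_train and Y_test arrays from the split above are still in scope; the Pipeline/StandardScaler combination is a standard scikit-learn pattern and not part of the original notebook.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Standardise the features before fitting so lbfgs can converge without the
# warning; the pipeline applies the same scaling automatically at predict time.
scaled_lr = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
scaled_lr.fit(X_train, Y_train)
print("Scaled LogisticRegression test accuracy =",
      100 * metrics.accuracy_score(Y_test, scaled_lr.predict(X_test)))

Scaling matters most for the distance- and gradient-based models used here (KNeighborsClassifier, SVC, LogisticRegression); the tree-based RandomForestClassifier is insensitive to feature scale.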
