Capstone Removed

The document outlines a data analysis and machine learning workflow using a loan detection dataset. It includes data loading, preprocessing, model training with Logistic Regression, Decision Trees, and XGBoost, and evaluation through accuracy scores and confusion matrices. Additionally, it features hyperparameter tuning for the XGBoost model using GridSearchCV.

In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

import warnings
warnings.filterwarnings('ignore')

In [2]: # Load dataset

df = pd.read_csv('loan_detection.csv')
df

Out[2]:        age  campaign  pdays  previous  no_previous_contact  not_working  job_admin.  ...
0               56         1    999         0                    1            0           0  ...
1               57         1    999         0                    1            0           0  ...
2               37         1    999         0                    1            0           0  ...
3               40         1    999         0                    1            0           1  ...
4               56         1    999         0                    1            0           0  ...
...            ...       ...    ...       ...                  ...          ...         ...  ...
41183           73         1    999         0                    1            1           0  ...
41184           46         1    999         0                    1            0           0  ...
41185           56         2    999         0                    1            1           0  ...
41186           44         1    999         0                    1            0           0  ...
41187           74         3    999         1                    1            1           0  ...

41188 rows × 60 columns

In [3]: # Display dataset info

def data_overview(data):
    print("Data Shape:", data.shape)
    print("Columns:\n", data.columns.tolist())
    print("\nMissing Values:\n", data.isna().sum())
    print("\nSample Data:\n", data.head())

data_overview(df)
Data Shape: (41188, 60)
Columns:
['age', 'campaign', 'pdays', 'previous', 'no_previous_contact', 'not_working', 'job_admin.', 'job_blue-collar', 'job_entrepreneur', 'job_housemaid', 'job_management', 'job_retired', 'job_self-employed', 'job_services', 'job_student', 'job_technician', 'job_unemployed', 'job_unknown', 'marital_divorced', 'marital_married', 'marital_single', 'marital_unknown', 'education_basic.4y', 'education_basic.6y', 'education_basic.9y', 'education_high.school', 'education_illiterate', 'education_professional.course', 'education_university.degree', 'education_unknown', 'default_no', 'default_unknown', 'default_yes', 'housing_no', 'housing_unknown', 'housing_yes', 'loan_no', 'loan_unknown', 'loan_yes', 'contact_cellular', 'contact_telephone', 'month_apr', 'month_aug', 'month_dec', 'month_jul', 'month_jun', 'month_mar', 'month_may', 'month_nov', 'month_oct', 'month_sep', 'day_of_week_fri', 'day_of_week_mon', 'day_of_week_thu', 'day_of_week_tue', 'day_of_week_wed', 'poutcome_failure', 'poutcome_nonexistent', 'poutcome_success', 'Loan_Status_label']

Missing Values:
age 0
campaign 0
pdays 0
previous 0
no_previous_contact 0
not_working 0
job_admin. 0
job_blue-collar 0
job_entrepreneur 0
job_housemaid 0
job_management 0
job_retired 0
job_self-employed 0
job_services 0
job_student 0
job_technician 0
job_unemployed 0
job_unknown 0
marital_divorced 0
marital_married 0
marital_single 0
marital_unknown 0
education_basic.4y 0
education_basic.6y 0
education_basic.9y 0
education_high.school 0
education_illiterate 0
education_professional.course 0
education_university.degree 0
education_unknown 0
default_no 0
default_unknown 0
default_yes 0
housing_no 0
housing_unknown 0
housing_yes 0
loan_no 0
loan_unknown 0
loan_yes 0
contact_cellular 0
contact_telephone 0
month_apr 0
month_aug 0
month_dec 0
month_jul 0
month_jun 0
month_mar 0
month_may 0
month_nov 0
month_oct 0
month_sep 0
day_of_week_fri 0
day_of_week_mon 0
day_of_week_thu 0
day_of_week_tue 0
day_of_week_wed 0
poutcome_failure 0
poutcome_nonexistent 0
poutcome_success 0
Loan_Status_label 0
dtype: int64

Sample Data:
    age  campaign  pdays  previous  no_previous_contact  not_working  \
0    56         1    999         0                    1            0
1    57         1    999         0                    1            0
2    37         1    999         0                    1            0
3    40         1    999         0                    1            0
4    56         1    999         0                    1            0

   job_admin.  job_blue-collar  job_entrepreneur  job_housemaid  ...
0           0                0                 0              1  ...
1           0                0                 0              0  ...
2           0                0                 0              0  ...
3           1                0                 0              0  ...
4           0                0                 0              0  ...

   month_sep  day_of_week_fri  day_of_week_mon  day_of_week_thu  \
0          0                0                1                0
1          0                0                1                0
2          0                0                1                0
3          0                0                1                0
4          0                0                1                0

   day_of_week_tue  day_of_week_wed  poutcome_failure  poutcome_nonexistent  \
0                0                0                 0                     1
1                0                0                 0                     1
2                0                0                 0                     1
3                0                0                 0                     1
4                0                0                 0                     1

   poutcome_success  Loan_Status_label
0                 0                  0
1                 0                  0
2                 0                  0
3                 0                  0
4                 0                  0

[5 rows x 60 columns]

In [69]: # Correlation Matrix Heatmap

plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()
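A fully annotated 60 × 60 heatmap is hard to read at this size. As a hedged alternative, not part of the original notebook, the features can instead be ranked by absolute correlation with the target:

# Sketch: list the ten features most correlated with the target.
top_corr = (df.corr()['Loan_Status_label']
            .drop('Loan_Status_label')
            .abs()
            .sort_values(ascending=False)
            .head(10))
print(top_corr)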

In [71]: # Distribution of numerical features

df.hist(bins=20, figsize=(15, 10), edgecolor='black')
plt.suptitle('Feature Distributions')
plt.show()

In [4]: # Target column analysis
print("\nTarget Value Counts:\n", df['Loan_Status_label'].value_counts())

Target Value Counts:
Loan_Status_label
0    36548
1     4640
Name: count, dtype: int64
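The counts show a roughly 8:1 class imbalance (36548 negatives vs 4640 positives), which foreshadows the weak class-1 recall in the evaluation below. A minimal sketch of one common mitigation for XGBoost, offered as an assumption rather than part of the original workflow:

# Sketch: weight the positive class by the negative/positive ratio.
neg, pos = df['Loan_Status_label'].value_counts()[[0, 1]]
xgb_weighted = XGBClassifier(scale_pos_weight=neg / pos,  # about 7.9
                             eval_metric='logloss', random_state=42)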

In [5]: plt.pie(df['Loan_Status_label'].value_counts(), labels=['Not Approved', 'Approved'])
plt.title('Target Distribution')
plt.show()
In [6]: X = df.drop(columns=['Loan_Status_label'])

In [7]: y = df['Loan_Status_label']

In [8]: # Train-test split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
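Because the classes are imbalanced, an optional variant (not the notebook's original call) is to stratify the split so both partitions keep the same ~8:1 class ratio:

# Optional variant: preserve class proportions across the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)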

In [9]: # Standardize numerical features

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
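Note that the scaler is fit on the training split only and merely applied to the test split, which avoids leaking test-set statistics into training.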

In [10]: # Initialize models

lr = LogisticRegression()

In [11]: dtree = DecisionTreeClassifier(max_depth=5, random_state=42)

In [12]: xgb = XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=42)

In [13]: # Train Logistic Regression

lr.fit(X_train, y_train)
lr_train_acc = lr.score(X_train, y_train)
lr_test_acc = lr.score(X_test, y_test)

In [14]: print(f"Logistic Regression - Training Accuracy: {lr_train_acc*100:.2f}%, Test Accu

Logistic Regression - Training Accuracy: 89.81%, Test Accuracy: 89.57%


In [15]: # Train Decision Tree
dtree.fit(X_train, y_train)

Out[15]: DecisionTreeClassifier(max_depth=5, random_state=42)

In [16]: dtree_train_acc = dtree.score(X_train, y_train)
dtree_test_acc = dtree.score(X_test, y_test)

In [17]: print(f"Decision Tree - Training Accuracy: {dtree_train_acc*100:.2f}%, Test Accurac

Decision Tree - Training Accuracy: 90.14%, Test Accuracy: 89.45%

In [18]: # Train XGBoost

xgb.fit(X_train, y_train)

Out[18]: XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric='logloss',
              feature_types=None, gamma=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None, ...)
In [19]: xgb_train_acc = xgb.score(X_train, y_train)
xgb_test_acc = xgb.score(X_test, y_test)

In [20]: print(f"XGBoost - Training Accuracy: {xgb_train_acc*100:.2f}%, Test Accuracy: {xgb_

XGBoost - Training Accuracy: 92.17%, Test Accuracy: 89.29%

In [21]: # Model Evaluation - XGBoost

y_pred_train = xgb.predict(X_train)
y_pred_test = xgb.predict(X_test)

In [22]: print("\nConfusion Matrix - Train:")

sns.heatmap(confusion_matrix(y_train, y_pred_train), annot=True, fmt='d', cmap='Blues')
plt.show()

Confusion Matrix - Train:


In [23]: print("\nClassification Report - Train:\n", classification_report(y_train, y_pred_train))

Classification Report - Train:
               precision    recall  f1-score   support

           0       0.92      0.99      0.96     29245
           1       0.86      0.36      0.51      3705

    accuracy                           0.92     32950
   macro avg       0.89      0.68      0.73     32950
weighted avg       0.92      0.92      0.91     32950

In [24]: print("\nConfusion Matrix - Test:")

sns.heatmap(confusion_matrix(y_test, y_pred_test), annot=True, fmt='d', cmap='Blues')
plt.show()

Confusion Matrix - Test:


In [25]: print("\nClassification Report - Test:\n", classification_report(y_test, y_pred_test))

Classification Report - Test:
               precision    recall  f1-score   support

           0       0.91      0.98      0.94      7303
           1       0.57      0.23      0.33       935

    accuracy                           0.89      8238
   macro avg       0.74      0.60      0.63      8238
weighted avg       0.87      0.89      0.87      8238
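Class-1 recall falls to 0.23 on the test set even though accuracy stays near 0.89, which is roughly what always predicting class 0 would score given the 7303/935 split. One hedged follow-up, not in the original notebook, is to tune the decision threshold on the predicted probabilities (ideally on a validation split rather than the test set):

from sklearn.metrics import f1_score

# Sketch: scan thresholds and keep the one with the best F1 for class 1.
probs = xgb.predict_proba(X_test)[:, 1]
thresholds = np.linspace(0.1, 0.9, 17)
best_t = max(thresholds, key=lambda t: f1_score(y_test, probs >= t))
print(f"Best threshold: {best_t:.2f}, F1: {f1_score(y_test, probs >= best_t):.2f}")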

In [77]: # Find misclassified instances

misclassified = X_test[y_test != y_pred_test]
print("\nMisclassified Instances:\n", misclassified)

Misclassified Instances:
[[ 0.28580588 -0.56620036 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[ 1.53153166 0.15371713 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[ 0.57328106 -0.56620036 0.19466067 ... 2.93701532 -2.51020518
-0.18496534]
...
[ 2.3939572 -0.56620036 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[-1.24739509 -0.56620036 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[-0.86409485 -0.20624161 0.19466067 ... -0.34048171 0.39837381
-0.18496534]]
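These rows print in standardized units, which obscures the original values. A small sketch, reusing the scaler fitted above, maps them back to the raw feature scale:

# Sketch: undo the standardization so misclassified rows are readable.
misclassified_orig = pd.DataFrame(scaler.inverse_transform(misclassified),
                                  columns=X.columns)
print(misclassified_orig.head())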
In [73]: from sklearn.metrics import roc_curve, auc

# ROC-AUC for XGBoost
y_pred_prob = xgb.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'XGBoost (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'r--') # Random classifier line
plt.title('ROC Curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()

In [26]: # Hyperparameter tuning for XGBoost

params = {
    'n_estimators': [100, 200],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 4, 5],
    'gamma': [0, 0.1, 0.2],
    'reg_alpha': [0.1, 0.5, 1],
    'reg_lambda': [0.1, 0.5, 1]
}
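This grid spans 2 × 3 × 3 × 3 × 3 × 3 = 486 parameter combinations, so 3-fold cross-validation performs 486 × 3 = 1458 fits, matching the log below.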
In [27]: grid_search = GridSearchCV(estimator=XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=42),
                                    param_grid=params, scoring='accuracy', cv=3, verbose=2)
grid_search.fit(X_train, y_train)
Fitting 3 folds for each of 486 candidates, totalling 1458 fits
[CV] END gamma=0, learning_rate=0.01, max_depth=3, n_estimators=100, reg_alpha=0.1, reg_lambda=0.1; total time= 0.2s
[CV] END gamma=0, learning_rate=0.01, max_depth=3, n_estimators=100, reg_alpha=0.1, reg_lambda=0.5; total time= 0.2s
[CV] END gamma=0, learning_rate=0.01, max_depth=3, n_estimators=100, reg_alpha=0.1, reg_lambda=1; total time= 0.2s
...
[CV] END gamma=0.2, learning_rate=0.1, max_depth=5, n_estimators=200, reg_alpha=1, reg_lambda=1; total time= 0.5s
(remaining verbose [CV] output omitted)
Out[27]: GridSearchCV(estimator=XGBClassifier(...))
In [67]: print("\nBest Parameters from GridSearchCV:\n", grid_search.best_params_)

Best Parameters from GridSearchCV:
{'gamma': 0, 'learning_rate': 0.05, 'max_depth': 3, 'n_estimators': 100, 'reg_alpha': 0.5, 'reg_lambda': 0.5}
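Because GridSearchCV defaults to refit=True, grid_search has already been refit on the full training set with these best parameters, which is why grid_search.predict can be called directly below.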

In [29]: # Evaluate the best model

y_pred_gs_test = grid_search.predict(X_test)

In [75]: # Feature importance for XGBoost

importance = xgb.feature_importances_
features = X.columns
plt.figure(figsize=(10, 6))
sns.barplot(x=importance, y=features)
plt.title('Feature Importance')
plt.show()
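Plotting all 60 importances in one bar chart is crowded; a sketch (not part of the original notebook) that keeps only the strongest signals:

# Sketch: plot the 15 most important features, sorted descending.
imp = pd.Series(xgb.feature_importances_, index=X.columns).nlargest(15)
plt.figure(figsize=(8, 5))
sns.barplot(x=imp.values, y=imp.index)
plt.title('Top 15 Feature Importances')
plt.show()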

In [30]: print("\nConfusion Matrix - Best Model (Test):")

sns.heatmap(confusion_matrix(y_test, y_pred_gs_test), annot=True, fmt='d', cmap='Blues')
plt.show()

Confusion Matrix - Best Model (Test):


In [31]: print("\nClassification Report - Best Model (Test):\n", classification_report(y_test, y_pred_gs_test))

Classification Report - Best Model (Test):
               precision    recall  f1-score   support

           0       0.90      0.99      0.94      7303
           1       0.66      0.18      0.28       935

    accuracy                           0.90      8238
   macro avg       0.78      0.58      0.61      8238
weighted avg       0.88      0.90      0.87      8238

In [79]: import joblib

# Save the best XGBoost model
joblib.dump(grid_search.best_estimator_, 'xgb_model.pkl')
print("Best XGBoost model saved as 'xgb_model.pkl'")

Best XGBoost model saved as 'xgb_model.pkl'
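One caveat the notebook leaves implicit: the saved model expects standardized input, so the fitted StandardScaler must be persisted alongside it for inference. A minimal sketch, with 'scaler.pkl' as an assumed file name:

# Sketch: persist the scaler too, then reload both for inference.
joblib.dump(scaler, 'scaler.pkl')

model = joblib.load('xgb_model.pkl')
fitted_scaler = joblib.load('scaler.pkl')
preds = model.predict(fitted_scaler.transform(X))  # X: the unscaled feature frame from above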

