Capstone Removed
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Out[2]:
age campaign pdays previous no_previous_contact not_working job_admin.
0 56 1 999 0 1 0 0
1 57 1 999 0 1 0 0
2 37 1 999 0 1 0 0
3 40 1 999 0 1 0 1
4 56 1 999 0 1 0 0
41183 73 1 999 0 1 1 0
41184 46 1 999 0 1 0 0
41185 56 2 999 0 1 1 0
41186 44 1 999 0 1 0 0
41187 74 3 999 1 1 1 0
Missing Values:
age 0
campaign 0
pdays 0
previous 0
no_previous_contact 0
not_working 0
job_admin. 0
job_blue-collar 0
job_entrepreneur 0
job_housemaid 0
job_management 0
job_retired 0
job_self-employed 0
job_services 0
job_student 0
job_technician 0
job_unemployed 0
job_unknown 0
marital_divorced 0
marital_married 0
marital_single 0
marital_unknown 0
education_basic.4y 0
education_basic.6y 0
education_basic.9y 0
education_high.school 0
education_illiterate 0
education_professional.course 0
education_university.degree 0
education_unknown 0
default_no 0
default_unknown 0
default_yes 0
housing_no 0
housing_unknown 0
housing_yes 0
loan_no 0
loan_unknown 0
loan_yes 0
contact_cellular 0
contact_telephone 0
month_apr 0
month_aug 0
month_dec 0
month_jul 0
month_jun 0
month_mar 0
month_may 0
month_nov 0
month_oct 0
month_sep 0
day_of_week_fri 0
day_of_week_mon 0
day_of_week_thu 0
day_of_week_tue 0
day_of_week_wed 0
poutcome_failure 0
poutcome_nonexistent 0
poutcome_success 0
Loan_Status_label 0
dtype: int64
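The cell that produced this report is not shown. A minimal sketch of how such a per-column count is usually obtained with pandas follows; `toy_df` and its values are made up for illustration and are not the capstone data.

```python
import pandas as pd

# Hypothetical stand-in for the capstone DataFrame; the values are made up.
toy_df = pd.DataFrame({
    "age": [56, 57, None],
    "campaign": [1, 1, 2],
    "Loan_Status_label": [0, 0, 1],
})

# isnull() flags missing cells; sum() counts the flags column by column.
missing_counts = toy_df.isnull().sum()

print("Missing Values:")
print(missing_counts)
```

A column of zeros, as in the report above, would mean no imputation step is needed before modelling.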
Sample Data:
age campaign pdays previous no_previous_contact not_working \
0 56 1 999 0 1 0
1 57 1 999 0 1 0
2 37 1 999 0 1 0
3 40 1 999 0 1 0
4 56 1 999 0 1 0
poutcome_success Loan_Status_label
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
[5 rows x 60 columns]
In [7]: y = df['Loan_Status_label']
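The cells between the target selection and the model fits are missing. The standardized values in the later misclassified-instances output suggest the features were split and scaled, so the following is a hedged sketch of that step. Only the column name `Loan_Status_label` comes from the notebook; `toy_df`, the split ratio, the seed, and the use of `StandardScaler` are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df; only the target column name is from the notebook.
toy_df = pd.DataFrame({
    "age": [56, 57, 37, 40, 56, 73, 46, 56, 44, 74],
    "campaign": [1, 1, 1, 1, 1, 1, 1, 2, 1, 3],
    "Loan_Status_label": [0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
})

# Separate the target from the feature matrix.
y = toy_df["Loan_Status_label"]
X = toy_df.drop(columns=["Loan_Status_label"])

# Assumed split settings; the original values are not shown.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then reuse it on the test rows,
# so that no test-set statistics leak into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```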
Out[15]: DecisionTreeClassifier(max_depth=5, random_state=42)
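The fitting cell itself is not shown, only the estimator's repr. A minimal sketch of a fit with those same hyperparameters follows, run on synthetic data from `make_classification` because the capstone features are not available here; the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the capstone features (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters taken from the repr shown in the notebook output.
tree = DecisionTreeClassifier(max_depth=5, random_state=42)
tree.fit(X_train, y_train)

# score() returns mean accuracy on the held-out rows.
accuracy = tree.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Capping `max_depth` limits how finely the tree can partition the training data, which is a common guard against overfitting.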
Out[18]: XGBClassifier
Misclassified Instances:
[[ 0.28580588 -0.56620036 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[ 1.53153166 0.15371713 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[ 0.57328106 -0.56620036 0.19466067 ... 2.93701532 -2.51020518
-0.18496534]
...
[ 2.3939572 -0.56620036 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[-1.24739509 -0.56620036 0.19466067 ... -0.34048171 0.39837381
-0.18496534]
[-0.86409485 -0.20624161 0.19466067 ... -0.34048171 0.39837381
-0.18496534]]
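The code behind this printout is not shown. One common way to pull out misclassified rows is a boolean mask comparing predictions with true labels; the sketch below uses small made-up arrays, and the names `X_test`, `y_test`, and `y_pred` are assumptions.

```python
import numpy as np

# Hypothetical scaled test features, true labels, and model predictions.
X_test = np.array([
    [0.29, -0.57],
    [1.53, 0.15],
    [-1.25, -0.57],
    [0.57, -0.57],
])
y_test = np.array([0, 1, 0, 1])
y_pred = np.array([0, 0, 0, 1])

# True wherever the prediction disagrees with the label.
mask = y_pred != y_test

# Boolean indexing keeps only the misclassified rows.
misclassified = X_test[mask]

print("Misclassified Instances:")
print(misclassified)
```

If `y_test` is a pandas Series rather than an array, converting it with `.to_numpy()` before the comparison avoids index-alignment surprises.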
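In [73]: from sklearn.metrics import roc_curve, auc
# fpr, tpr and roc_auc are not defined in the visible cells. The three lines
# below assume a fitted classifier `xgb_model` and held-out `X_test`, `y_test`
# (hypothetical names); adjust them to the notebook's actual variables.
y_prob = xgb_model.predict_proba(X_test)[:, 1]  # positive-class probabilities
fpr, tpr, _ = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'XGBoost (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'r--')  # Diagonal: performance of a random classifier
plt.title('ROC Curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()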