Team 14 - Project Documentation - Taiwan Credit Defaults v1.0

1. The document analyzes credit card payment default data from Taiwan in 2005, when consumer debt was high.
2. Three machine learning algorithms (logistic regression, random forest, and naive Bayes) are tested on the data to predict payment defaults.
3. Random forest performs best, with a sensitivity of 88% (non-defaulters correctly identified) and a specificity of 45% (defaulters correctly identified). Recent payments, billed amounts, and payment status are the most important predictors.

Uploaded by

Swetha Karthikeyan

Taiwan Credit Defaults

Group 14: Atindra Bandi, Henry Chang, Avani Sharma, Abraham Khan

Introduction
The Taiwanese economy experienced tremendous growth during the 1990s, almost doubling in
value, as did the other economies known as the Asian Tigers. The country's financial sector
was heavily involved in the growth of real estate during this period. However, in the early 2000s,
this growth slowed and banks in Taiwan turned towards consumer lending to continue the
expansion. As a result, credit requirements were loosened and consumers were encouraged to
spend by borrowing capital.

We will be analyzing data on Taiwanese credit card holders from mid-2005, as the flood of debt
was reaching its peak. The following algorithms will be tested for accuracy in predicting whether an
individual will miss their next payment (the response variable):
- Logistic Regression
- Random Forest
- Naïve Bayes

The banking data is skewed, with only 22.1% defaulters, which could bias predictions
towards the majority class. Thus, we compared the models in terms of their sensitivity and
specificity. Treating non-defaulters as the positive class, sensitivity is the proportion of
non-defaulters correctly identified and specificity is the proportion of defaulters correctly
identified. Each method was also compared on the basis of its Receiver Operating
Characteristic (ROC) curve.
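The following is a minimal Python sketch of these two measures, treating non-defaulters as the positive class as the text does. The function name and the toy labels are illustrative assumptions, not part of the original analysis. The example shows why accuracy alone misleads on skewed data: a model that never predicts a default still scores 80% accuracy on a sample with 20% defaulters.

```python
def sensitivity_specificity(actual, predicted, positive="non-default"):
    """Return (sensitivity, specificity, accuracy) for two label lists."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # non-defaulter correctly identified
            else:
                fn += 1  # non-defaulter flagged as defaulter
        else:
            if p == positive:
                fp += 1  # defaulter missed
            else:
                tn += 1  # defaulter correctly identified
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(actual)
    return sensitivity, specificity, accuracy


# Toy data (made up): 8 non-defaulters, 2 defaulters.
actual = ["non-default"] * 8 + ["default"] * 2
# A trivial model that always predicts the majority class.
always_non_default = ["non-default"] * 10

result = sensitivity_specificity(actual, always_non_default)
print(result)
```

The trivial model reaches full sensitivity but zero specificity, which is the kind of imbalance the comparison above is meant to expose.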

Exploratory Data Analysis

The data contains five categorical variables: sex, education level, marital status, payment status
for the prior six months, and whether or not the person missed their next month's payment. There
are also four numerical variables: the person's credit limit, age, and bill and payment amounts for
the last six months.

Demographic measures:

                   University   Graduate   High School   Others
Share of sample    47%          35%        16%           2%

                   Married   Single   Others   Male   Female
Share of sample    53%       45%      2%       40%    60%
Defaulters         23%       21%      24%      24%    21%
Non-Defaulters     77%       79%      76%      76%    79%

Feature Engineering
We created the variables spend per month, weighted spending, pay spend ratio and payment month
status, as the demographics did not show a strong indication of defaulting.
1. Spend = Bill amount this month – Bill amount last month + Payment this month
2. Pay Spend Ratio = ∑ Payment / ∑ Spending
3. Mean Spend Ratio = Weighted Spending / Limit Balance
4. Weighted Payment Delay = 0.4*PAY_0 + 0.25*PAY_2 + 0.15*PAY_3 + 0.1*PAY_4 + 0.05*PAY_5 +
0.05*PAY_6
5. Grouped ages
6. Interaction terms SEX*MARRIAGE, MARRIAGE*AGE, EDUCATION*AGE
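As a rough illustration, three of the engineered features above can be sketched in Python for a single customer record. Several things here are assumptions rather than details from the report: index 1 is taken as the most recent month, "payment this month" is read as PAY_AMT for the same index, the fifth term of the delay formula is read as PAY_5, and the example record is made up. Weighted spending is left out because its weights are not given.

```python
# Weights from the Weighted Payment Delay formula (PAY_0 = most recent month).
DELAY_WEIGHTS = {"PAY_0": 0.4, "PAY_2": 0.25, "PAY_3": 0.15,
                 "PAY_4": 0.1, "PAY_5": 0.05, "PAY_6": 0.05}


def monthly_spend(row):
    """Spend = bill this month - bill last month + payment this month.

    Index k is the newer month and k + 1 the month before it, so only
    five of the six months have a previous bill to compare against.
    """
    return [row[f"BILL_AMT{k}"] - row[f"BILL_AMT{k + 1}"] + row[f"PAY_AMT{k}"]
            for k in range(1, 6)]


def pay_spend_ratio(row):
    """Sum of payments divided by sum of spending over the same five months."""
    total_spend = sum(monthly_spend(row))
    total_paid = sum(row[f"PAY_AMT{k}"] for k in range(1, 6))
    return total_paid / total_spend if total_spend else None


def weighted_payment_delay(row):
    """Weighted sum of repayment status codes, recent months weighted most."""
    return sum(weight * row[col] for col, weight in DELAY_WEIGHTS.items())


# Made-up record for illustration only.
example = {"PAY_0": 2, "PAY_2": 2, "PAY_3": 1, "PAY_4": 0, "PAY_5": 0, "PAY_6": -1}
example.update({f"BILL_AMT{k}": v for k, v in
                enumerate([5000, 4000, 3500, 3000, 2000, 1000], start=1)})
example.update({f"PAY_AMT{k}": v for k, v in
                enumerate([1000, 500, 500, 1000, 0, 0], start=1)})

spend = monthly_spend(example)
ratio = pay_spend_ratio(example)
delay = weighted_payment_delay(example)
print(spend, ratio, delay)
```

The ratio returns None when total spending is zero, which avoids a division error for inactive accounts; how the authors handled that case is not stated.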

Analysis
We performed a lasso analysis on logistic regression with all variables, for both variable selection
and regularization. Cross-validation gave us the optimal lambda along with the
variables that significantly affected our dependent variable. The most significant
variables from the analysis are:

1. The dummy variables derived from education, marriage and payment status were significant
and hence were incorporated.
2. The most recent pay status, payment and bill amount were also significant
variables.
3. Of the engineered features, the two most significant variables were weighted spend and
mean spend.
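To illustrate how a lasso penalty performs variable selection, here is a small pure-Python sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not the authors' implementation (the report does not say which software was used). The toy data, the step size, the lambda value and the omission of an intercept are all assumptions made for brevity. The soft-thresholding step is what sets weak coefficients to exactly zero.

```python
import math


def soft_threshold(value, amount):
    """Shrink value towards zero by amount; values within +/- amount become 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.5, steps=500):
    """Proximal gradient descent for logistic loss plus lam * sum(|w|)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of y = 1
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the loss, then soft-threshold for the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Toy data: the first feature separates the classes, the second is weak noise.
X = [(1.0, 0.1), (0.8, -0.1), (0.6, 0.1),
     (-0.6, -0.1), (-0.8, 0.1), (-1.0, -0.1)]
y = [1, 1, 1, 0, 0, 0]

w_dense = fit_l1_logistic(X, y, lam=0.0)    # no penalty
w_sparse = fit_l1_logistic(X, y, lam=0.05)  # lasso penalty
print(w_dense, w_sparse)
```

With the penalty switched on, the noise feature's coefficient is held at exactly zero while the informative feature keeps a positive weight. In practice lambda would be chosen by cross-validation, as described above, rather than fixed by hand.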

We then moved to techniques such as Naïve Bayes and Random Forest to improve our results further,
keeping the variables selected by the lasso. The optimal number of trees from the random forest
plot is 700. We selected random forest because it had a higher AUC on the ROC curve, which indicates
that it can reach a higher sensitivity at a low false positive rate. Although we could reach a very high
accuracy of 94%, this came with a trade-off between sensitivity (96%) and specificity (37%).
For our problem statement we selected a lower accuracy of 80%, with a sensitivity of 88% and a
specificity of 45%. Since the bank has to take remedial measures to lower credit debt, it has to
identify the probable defaulters correctly (specificity) and at the same time keep its market hold
by not penalizing the non-defaulters (sensitivity). Our sensitivity, specificity and accuracy were
selected in that spirit.
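The trade-off described here can be sketched in Python. AUC is computed as the share of (non-defaulter, defaulter) pairs that the model ranks correctly, and moving the decision threshold exchanges sensitivity for specificity. The scores and labels are made up, label 1 stands for a non-defaulter (the positive class used in this report), and the helper names are assumptions. The last function is only a rough consistency check that assumes the test set has the same 22.1% defaulter share as the full data.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rates_at(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold means positive."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def expected_accuracy(sensitivity, specificity, positive_share):
    """Accuracy implied by the two rates and the share of positives."""
    return sensitivity * positive_share + specificity * (1 - positive_share)


# Made-up predicted probabilities of being a non-defaulter.
scores = [0.9, 0.8, 0.4, 0.7, 0.3]
labels = [1, 1, 1, 0, 0]

area = auc(scores, labels)
lenient = rates_at(scores, labels, 0.35)  # favours sensitivity
strict = rates_at(scores, labels, 0.75)   # favours specificity
implied = expected_accuracy(0.88, 0.45, 1 - 0.221)
print(area, lenient, strict, implied)
```

Under the stated assumption, a sensitivity of 88% and a specificity of 45% imply an accuracy of about 78.5%, which is close to the roughly 80% reported for the chosen operating point.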

Conclusion

1. Random forest has the best balance between TPR and FPR. The model is accurate and also
predicts the number of defaulters best among all the algorithms.
2. Recent payments, billed amounts and payment status are the most important variables for
prediction.
3. Demographic variables are not important predictors of default, as could already be
seen in our data exploration exercise.
4. Feature engineering led us to many interesting variables, of which a few turned out
to be significant in our final model equation.
5. We could predict our test set with 80% accuracy and defaulter rate prediction of 53% and
non-defaulter rate prediction accuracy of

Dataset:
http://archive.ics.uci.edu/ml/machine-learning-databases/00350/

Cross Validation Results:

Data Set Information:

The data contains 30000 observations with the predictor variables as well as the response
variable. Our test set contained 6000 random observations with the response variable removed.
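A hold-out split of this size can be sketched as follows. The seed and the function name are illustrative assumptions; the report does not say how the 6000 test observations were drawn beyond being random.

```python
import random


def split_indices(n_rows, n_test, seed=0):
    """Randomly pick n_test row indices for testing; the rest are for training."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test = set(rng.sample(range(n_rows), n_test))
    train = [i for i in range(n_rows) if i not in test]
    return train, sorted(test)


train_idx, test_idx = split_indices(30000, 6000)
print(len(train_idx), len(test_idx))
```

The response variable would then be withheld from the rows in test_idx until the final evaluation.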

Variable descriptions:
This study employed a binary variable, default payment (Yes = 1, No = 0), as the response variable.
It reviewed the literature and used the following 23 variables as explanatory variables:
LIMIT_BAL: Amount of the given credit (NT dollar): it includes both the individual consumer credit
and his/her family (supplementary) credit.
SEX: Gender (1 = male; 2 = female).
Education: (1 = graduate school; 2 = university; 3 = high school; 4 = others).
Marital status: (1 = married; 2 = single; 3 = others).
Age: In years.
PAY_1 – PAY_6: History of past payment (the most recent month is named PAY_0 in the data file and
in the Weighted Payment Delay formula). We tracked the past monthly payment records (from
April to September, 2005) as follows: PAY_1 = the repayment status in September, 2005; PAY_2 =
the repayment status in August, 2005; . . .; PAY_6 = the repayment status in April, 2005. The
measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2
= payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for
nine months and above.
BILL_AMT1 – BILL_AMT6: Amount of bill statement (NT dollar). BILL_AMT1 = amount of bill
statement in September, 2005; BILL_AMT2 = amount of bill statement in August, 2005; . . .;
BILL_AMT6 = amount of bill statement in April, 2005.
PAY_AMT1 – PAY_AMT6: Amount of previous payment (NT dollar). PAY_AMT1 = amount paid in
September, 2005; PAY_AMT2 = amount paid in August, 2005; . . .; PAY_AMT6 = amount paid in
April, 2005.
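The coding scheme above can be turned into readable labels and dummy (one-hot) variables with a short Python sketch. The helper names and column prefixes are assumptions for illustration. Status codes outside the documented scale are reported as undocumented rather than guessed.

```python
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARRIAGE = {1: "married", 2: "single", 3: "others"}


def describe_pay_status(code):
    """Translate a repayment status code using the documented scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        return f"payment delay for {code} month(s)"
    if code == 9:
        return "payment delay for nine months and above"
    return "undocumented code"


def one_hot(code, levels, prefix):
    """Build 0/1 dummy variables, one per documented level of a category."""
    return {f"{prefix}_{level}": int(code == level) for level in levels}


status = describe_pay_status(2)
education_dummies = one_hot(2, EDUCATION, "EDUCATION")
marriage_dummies = one_hot(1, MARRIAGE, "MARRIAGE")
print(status, education_dummies, marriage_dummies)
```

When such dummies feed a regression with an intercept, one level per category is usually dropped as the reference level.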
