Analytics Report: Gray Hunter and John Mcclintock Yamelle Gonzalez Dummy Variables APRIL 14, 2020
Introduction
As requested, the researcher further analyzed the credit card data from Taiwanese
customers. The goal was to accurately predict each customer’s credit card limit and the
likelihood of defaulting on next month’s payment. For the model predicting credit
limit, the researcher used the following variables: gender, education level, marital
status, and age. For the model predicting the chance of default, the researcher used
age, average bill amount, and average previous payment amount. With this information,
the researcher ran regression models and determined the best model for each of these
predictions. Below you will find the two regression models.
Data Analysis
Regression Output: Best Model to Predict Credit Limit
SUMMARY OUTPUT
Made by: Yamelle Gonzalez Dabdoub
Regression Statistics
Multiple R           0.344472735
R Square             0.118661465
Adjusted R Square    0.118444074
Standard Error       4027.958073
Observations         24332
ANOVA
             df       SS             MS            F             Significance F
Regression   6        53136100338    8856016723    545.844006    0
Residual     24325    3.9466E+11     16224446.2
Total        24331    4.47796E+11
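The regression statistics can be cross-checked against the ANOVA table. The sketch below is a minimal illustration, assuming the standard definitions (R Square = SS Regression / SS Total, Standard Error = square root of MS Residual, F = MS Regression / MS Residual); the variable names are hypothetical, and SS Total is used as rounded in the table, so the results agree with the output only to about five significant digits.

```python
import math

# Values copied from the ANOVA table for the credit limit model.
ss_regression = 53136100338
ss_total = 4.47796e11          # rounded in the table
ms_regression = 8856016723
ms_residual = 16224446.2
n_obs = 24332
n_predictors = 6               # regression degrees of freedom

# Share of the variation in credit limit explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R Square penalizes the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Standard error of the regression, in dollars of credit limit.
standard_error = math.sqrt(ms_residual)

# Overall F statistic for the model.
f_statistic = ms_regression / ms_residual

print(r_squared, adj_r_squared, standard_error, f_statistic)
```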
Regression Equation
Predicted Limit = 376.50 + 366.04(Female) − 1135.59(High School) + 2321.85(Graduate School) + 2978.66(Married) − 67.03(Married*Age) + 117.96(Age)
Standard Error: The researcher’s predictions of the credit limit are off by an average of
$4,027.96.
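High School: Someone with a high school degree has a credit limit $1135.59 lower
than someone in the baseline education category (the education level left out of the
model), on average and all else constant.
Graduate School: Someone with a graduate school degree has a credit limit $2321.85
higher than someone in the baseline education category, on average and all else
constant. Compared directly, a graduate school degree is associated with a credit limit
$3457.44 higher than a high school degree (2321.85 + 1135.59).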
Married: At zero years old, someone who is married would have a credit limit of
$2978.66 higher than someone who is single, on average and all else constant.
Married*Age: For married people, as age increases by 1 year, credit limit increases by
$50.93 (117.96 − 67.03), on average and all else constant.
OR
For married people, as age increases by 1 year, credit limit increases by $67.03 less than
for a single person, on average and all else constant.
Age: For single people, as age increases by 1 year, credit limit increases by $117.96, on
average and all else constant.
Note: The age slopes above are given separately for single and married people, along
with the difference between the two.
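The way the Married, Married*Age, and Age coefficients combine can be sketched in a few lines. This is only an illustration using the coefficients reported above; the function names age_slope and married_gap are hypothetical and not part of the original analysis.

```python
# Coefficients copied from the credit limit regression equation.
MARRIED = 2978.66       # intercept shift for married customers
MARRIED_AGE = -67.03    # change in the age slope for married customers
AGE = 117.96            # age slope for single customers


def age_slope(married: bool) -> float:
    """Dollars of credit limit gained per extra year of age."""
    return AGE + (MARRIED_AGE if married else 0.0)


def married_gap(age: float) -> float:
    """Predicted limit of a married customer minus a single one of the same age."""
    return MARRIED + MARRIED_AGE * age


print(round(age_slope(married=False), 2))  # slope for single customers
print(round(age_slope(married=True), 2))   # slope for married customers
print(round(married_gap(35), 2))           # married-minus-single gap at age 35
```

Because the interaction coefficient is negative, the gap between married and single customers shrinks as age increases.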
Regression Output: Best Model to Predict the Chance of Defaulting on Next Month’s
Payment
SUMMARY OUTPUT
Made by: Yamelle Gonzalez Dabdoub
Regression Statistics
Multiple R           0.11812777
R Square             0.01395417
Adjusted R Square    0.01383258
Standard Error       0.41209391
Observations         24332
ANOVA
             df       SS             MS            F             Significance F
Regression   3        58.46631187    19.4887706    114.760399    8.5694E-74
Residual     24328    4131.414791    0.16982139
Total        24331    4189.881103
Which variables are included in the best model? How do you know?
The best model includes the following variables: “Average Bill”, “Average Payment”,
and “Age”. All three are kept because each is statistically significant, with a low
p-value and a large t statistic in absolute value. Removing a significant variable would
make the model less precise and less reliable.
Regression Equation
Predicted P(Default = 1) = 0.20 + 0.0000069(Average Bill) − 0.00017(Average Payment) + 0.0010(Age)
Standard Error: The researcher’s predictions of the default chance are off by an average
of 41.21 percentage points.
For the “Limit” Model: Prediction of the Limit of a 35-Year-Old, Single Female with a
High School Degree
Predicted Limit = 376.50 + 366.04(Female) − 1135.59(High School) + 2321.85(Graduate School) + 2978.66(Married) − 67.03(Married*Age) + 117.96(Age)
= 376.50 + 366.04(1) − 1135.59(1) + 2321.85(0) + 2978.66(0) − 67.03(0*35) + 117.96(35)
= $3735.55
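The substitution above can be reproduced with a short function. This is a minimal sketch using the coefficients from the credit limit model; the function name predict_limit and its parameter names are hypothetical, and each dummy variable is passed as 0 or 1.

```python
def predict_limit(female: int, high_school: int, graduate_school: int,
                  married: int, age: float) -> float:
    """Predicted credit limit in dollars from the reported regression equation."""
    return (376.50
            + 366.04 * female
            - 1135.59 * high_school
            + 2321.85 * graduate_school
            + 2978.66 * married
            - 67.03 * married * age   # Married*Age interaction term
            + 117.96 * age)


# 35-year-old, single female with a high school degree.
limit = predict_limit(female=1, high_school=1, graduate_school=0, married=0, age=35)
print(round(limit, 2))
```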
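For the “Default” Model: Prediction of the Default Chance of a 25-Year-Old with an
Average Bill of 1150 and an Average Payment of 900
Predicted P(Default = 1) = 0.20 + 0.0000069(Average Bill) − 0.00017(Average Payment) + 0.0010(Age)
= 0.20 + 0.0000069(1150) − 0.00017(900) + 0.0010(25)
= 0.079935, or about a 7.99% chance of default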
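The default chance calculation can be checked the same way. This is a sketch under the assumption that the equation's output is read as a probability between 0 and 1; the function name predict_default_probability and its parameter names are hypothetical.

```python
def predict_default_probability(average_bill: float, average_payment: float,
                                age: float) -> float:
    """Predicted probability of default from the reported regression equation."""
    return (0.20
            + 0.0000069 * average_bill
            - 0.00017 * average_payment
            + 0.0010 * age)


# 25-year-old with an average bill of 1150 and an average payment of 900.
probability = predict_default_probability(average_bill=1150, average_payment=900, age=25)
print(f"{probability:.6f} ({probability * 100:.2f}%)")
```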
Conclusion
The researcher recommends the credit limit regression model and the default chance
regression model as the best of the models considered, because all of the variables in
each are significant: all of the p-values were low, and all of the t statistics were
relatively high. In more depth, the credit limit model had a standard error of 4027.96
and an adjusted R2 of 0.1184, whereas the default chance model had a standard error of
0.4121 and an adjusted R2 of 0.0138. Although the credit limit model appears to have a
much higher standard error, the two cannot be compared directly, because they are
measured in different units: dollars of credit limit versus probability of default. The
adjusted R2 values show that the models explain about 11.8% and 1.4% of the variation
in credit limit and default, respectively, so individual predictions should be read as
rough estimates. To conclude, these models with the given variables are statistically
significant tools for predicting the credit limit and chance of default of Taiwanese
customers.