Analytics Report: Gray Hunter and John Mcclintock Yamelle Gonzalez Dummy Variables APRIL 14, 2020

The researcher analyzed credit card data from Taiwanese customers to predict credit limits and the likelihood of default. Two regression models were created: one used sex, age, education, and marital status to predict limits; the other used age, average bill, and average payment amounts to predict default. In both models every predictor was statistically significant (low p-values), although the adjusted R-squares were modest (0.118 and 0.014). The models allow estimating a customer's limit and default chance from their characteristics.


ANALYTICS REPORT

TO: GRAY HUNTER AND JOHN MCCLINTOCK

FROM: YAMELLE GONZALEZ

SUBJECT: DUMMY VARIABLES

DATE: APRIL 14, 2020

Introduction
As requested, the researcher studied and analyzed the credit card data from Taiwanese
customers further. The goal was to accurately predict customers’ credit card limits and
the likelihood of someone defaulting on their next month’s payment. For the model
predicting credit limits, the researcher used the following variables: sex, age,
education level, and marital status. For the model predicting default chance, the
researcher used age, average bill amount, and average previous payment amount. Using this
information and estimating regression models, the researcher determined the best model for
each of these predictions. The two best models are presented below as regression outputs.

Data Analysis
Regression Output: Best Model to Predict Credit Limit
SUMMARY OUTPUT
Made by: Yamelle Gonzalez Dabdoub

Regression Statistics
Multiple R           0.344472735
R Square             0.118661465
Adjusted R Square    0.118444074
Standard Error       4027.958073
Observations         24332

ANOVA
             df      SS             MS            F            Significance F
Regression   6       53136100338    8856016723    545.844006   0
Residual     24325   3.9466E+11     16224446.2
Total        24331   4.47796E+11

                   Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept          376.5023861    159.5896786      2.35919008   0.01832273   63.6967991   689.307973
Female             366.0376855    53.33062783      6.86355478   6.8781E-12   261.506374   470.568997
High School        -1135.585699   75.081855        -15.124636   1.9102E-51   -1282.7508   -988.42065
Graduate School    2321.850475    57.82486723      40.1531484   0            2208.51018   2435.19077
Married            2978.65553     234.7488337      12.6886915   8.9362E-37   2518.53338   3438.77768
Married*Age        -67.02842584   6.391249245      -10.487531   1.1183E-25   -79.555668   -54.501184
AGE                117.9569722    4.713377908      25.0259951   1.699E-136   108.718462   127.195483

Which variables are included in the best model? How do you know?
The variables included in the best model are “Female”, “High School”, “Graduate School”,
“Married”, “Married*Age”, and “Age”. They are all included because each is statistically
significant: every p-value is well below 0.05 and every t statistic is large in absolute
value. In addition, a researcher would not remove a significant variable, because doing so
would make the model less precise and reliable.
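The significance argument above rests on each coefficient's t statistic, which is the coefficient divided by its standard error. The short Python sketch below recomputes those t statistics from the credit limit output and attaches an approximate two-sided p-value. It assumes a standard normal approximation (reasonable with 24,325 residual degrees of freedom, but not identical to Excel's t-based p-values), and the names `LIMIT_COEFS`, `t_stat`, and `approx_two_sided_p` are hypothetical.

```python
import math

# (coefficient, standard error) pairs copied from the credit limit regression output.
LIMIT_COEFS = {
    "Intercept": (376.5023861, 159.5896786),
    "Female": (366.0376855, 53.33062783),
    "High School": (-1135.585699, 75.081855),
    "Graduate School": (2321.850475, 57.82486723),
    "Married": (2978.65553, 234.7488337),
    "Married*Age": (-67.02842584, 6.391249245),
    "AGE": (117.9569722, 4.713377908),
}


def t_stat(coef: float, se: float) -> float:
    """t statistic for the null hypothesis that the coefficient is 0."""
    return coef / se


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value from the standard normal distribution (an approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


for name, (coef, se) in LIMIT_COEFS.items():
    t = t_stat(coef, se)
    p = approx_two_sided_p(t)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:16s} t = {t:9.3f}   p ~ {p:.3g}   ({verdict} at the 5% level)")
```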

Regression Equation
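$$\widehat{\text{Limit}} = 376.50 + 366.04\,(\text{Female}) - 1135.59\,(\text{High School}) + 2321.85\,(\text{Graduate School}) + 2978.66\,(\text{Married}) - 67.03\,(\text{Married} \times \text{Age}) + 117.96\,(\text{Age})$$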

Interpretation of the R² and the Standard Error


R²: The model explains 11.87% of the variation in credit limits using the variables
female, high school, graduate school, married, married*age, and age. In other words, it is
11.87% of the way from predicting every customer at the average limit to predicting every
limit perfectly.

Standard Error: The model’s credit limit predictions typically miss the actual limit by
about $4,027.96 (the standard deviation of the residuals).
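The R² and standard error interpretations can be checked against the ANOVA table: R² is the regression sum of squares divided by the total sum of squares, and the standard error is the square root of the residual mean square. A minimal Python sketch, assuming the rounded values printed in the output (so the last digits can differ slightly from Excel's); `summary_from_anova` is a hypothetical helper name.

```python
import math


def summary_from_anova(ss_reg: float, ms_res: float, n: int, k: int) -> dict:
    """Rebuild the regression statistics from ANOVA quantities.

    n is the number of observations, k the number of predictors (excluding the intercept).
    """
    df_reg, df_res, df_total = k, n - k - 1, n - 1
    ss_res = ms_res * df_res
    ss_total = ss_reg + ss_res
    return {
        "r2": ss_reg / ss_total,                       # share of variation explained
        "adj_r2": 1 - ms_res / (ss_total / df_total),  # penalizes extra predictors
        "std_error": math.sqrt(ms_res),                # typical residual size, in dollars here
        "f_stat": (ss_reg / df_reg) / ms_res,
    }


# Values copied from the credit limit model's ANOVA table.
limit_stats = summary_from_anova(ss_reg=53136100338, ms_res=16224446.2, n=24332, k=6)
for name, value in limit_stats.items():
    print(f"{name:10s} {value:,.6f}")
```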

Interpretation of the Coefficients


Female: Females have a credit limit $366.04 higher than males, on average and all else
constant.

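High School: Someone with a high school degree has a credit limit $1,135.59 lower than
someone in the omitted (baseline) education category, that is, the education level without
its own dummy variable in the model, on average and all else constant.

Graduate School: Someone with a graduate school degree has a credit limit $2,321.85 higher
than someone in the omitted (baseline) education category, on average and all else
constant. Compared with someone with a high school degree, the gap is $2,321.85 + $1,135.59
= $3,457.44.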
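Because High School and Graduate School are both dummy variables, each coefficient is measured against the education category that has no dummy in the model, and the gap between two included categories is the difference of their coefficients. A small illustrative sketch using the rounded coefficients; the dictionary keys and `education_gap` are hypothetical names, and the output does not say which education level is the baseline.

```python
# Rounded education coefficients from the credit limit model. The omitted
# (baseline) education category has no dummy, so its coefficient is 0.
EDUCATION_EFFECT = {
    "baseline": 0.0,
    "High School": -1135.59,
    "Graduate School": 2321.85,
}


def education_gap(level_a: str, level_b: str) -> float:
    """Predicted limit of level_a minus level_b, holding sex, marital status and age fixed."""
    return EDUCATION_EFFECT[level_a] - EDUCATION_EFFECT[level_b]


print(round(education_gap("Graduate School", "High School"), 2))  # 3457.44
print(round(education_gap("High School", "baseline"), 2))         # -1135.59
```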

Married: At zero years old, someone who is married would have a credit limit of
$2978.66 higher than someone who is single, on average and all else constant.

Married*Age: For married people, as age increases by 1 year, credit limit increases by about
$50.93 ($117.96 − $67.03), on average and all else constant.
Equivalently, for married people each additional year of age raises the credit limit by
$67.03 less than it does for a single person, on average and all else constant.

Age: For single people, as age increases by 1 year, credit limit increases by $117.96, on
average and all else constant.

Note: Because the model includes the Married*Age interaction, the Age coefficient is the
slope for single people, and the Married*Age coefficient is the change in that slope for
married people.
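With the Married*Age interaction, the age slope and the married-versus-single gap each depend on the other variable. The Python sketch below (hypothetical function names, coefficients copied from the output) computes the two age slopes and the married-minus-single gap at a chosen age. By the fitted equation's arithmetic the gap shrinks by about $67.03 per year and reaches zero near age 44.4; that is only what the equation implies, not a separately tested result.

```python
# Coefficients copied from the credit limit regression output.
AGE_COEF = 117.9569722             # age slope for single people
MARRIED_X_AGE_COEF = -67.02842584  # change in the age slope for married people
MARRIED_COEF = 2978.65553          # married-minus-single gap at age 0


def age_slope(married: bool) -> float:
    """Change in predicted limit for one more year of age."""
    return AGE_COEF + (MARRIED_X_AGE_COEF if married else 0.0)


def married_gap(age: float) -> float:
    """Predicted limit of a married person minus that of a single person of the same age."""
    return MARRIED_COEF + MARRIED_X_AGE_COEF * age


def crossover_age() -> float:
    """Age at which the fitted married-minus-single gap equals zero."""
    return -MARRIED_COEF / MARRIED_X_AGE_COEF


print(round(age_slope(married=False), 2))  # 117.96
print(round(age_slope(married=True), 2))   # 50.93
print(round(married_gap(35), 2))           # 632.66
print(round(crossover_age(), 1))           # 44.4
```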

Regression Output: Best Model to Predict the Chance of Defaulting on Next Month’s
Payment

SUMMARY OUTPUT
Made by: Yamelle Gonzalez Dabdoub

Regression Statistics
Multiple R           0.11812777
R Square             0.01395417
Adjusted R Square    0.01383258
Standard Error       0.41209391
Observations         24332

ANOVA
             df      SS             MS            F            Significance F
Regression   3       58.46631187    19.4887706    114.760399   8.5694E-74
Residual     24328   4131.414791    0.16982139
Total        24331   4189.881103

                   Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept          0.20453129     0.010650007      19.204803    1.3566E-81   0.18365662   0.22540596
Average Bill       6.8707E-06     1.34359E-06      5.11366262   3.1837E-07   4.2372E-06   9.5042E-06
Average Payment    -0.0001685     9.23401E-06      -18.24578    6.957E-74    -0.0001866   -0.0001504
AGE                0.00100365     0.000289205      3.47036809   0.00052065   0.00043679   0.00157051

Which variables are included in the best model? How do you know?
The variables included in the best model are “Average Bill”, “Average Payment”, and “Age”.
They are all included because each is statistically significant: every p-value is well
below 0.05 and every t statistic is large in absolute value. As mentioned before, a
researcher would not remove a significant variable, because doing so would make the model
less precise and reliable.

Regression Equation
$$\widehat{P}(\text{Default}=1) = 0.20 + 0.0000069\,(\text{Average Bill}) - 0.00017\,(\text{Average Payment}) + 0.0010\,(\text{Age})$$

Interpretation of the R² and the Standard Error


R²: The model explains 1.40% of the variation in defaulting on next month’s payment using
the variables age, average bill amount, and average previous payment amount. In other
words, it is only 1.40% of the way toward predicting default perfectly.

Standard Error: The model’s predictions of the default chance typically miss the actual
outcome (0 = no default, 1 = default) by about 0.4121, or 41.21 percentage points.

Interpretation of the Coefficients


Average Bill: As average bill amount increases by $1000, chance of defaulting increases
by 0.69 percentage points, on average and all else constant.

Average Payment: As average payment amount increases by $10, chance of defaulting
decreases by 0.17 percentage points, on average and all else constant.

Age: As age increases by 1 year, chance of defaulting increases by 0.10 percentage points
(0.0010 in probability), on average and all else constant.
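Since the default model's outcome is coded 0/1, each coefficient is a change in probability per one-unit change in its predictor; multiplying by the step size and by 100 converts it to percentage points. A minimal sketch with a hypothetical `pp_change` helper, using the unrounded coefficients from the output table:

```python
# Coefficients from the default model (outcome: 1 = default, 0 = no default),
# so each one is a change in probability per one-unit change in the predictor.
DEFAULT_COEFS = {
    "Average Bill": 6.8707e-06,
    "Average Payment": -0.0001685,
    "AGE": 0.00100365,
}


def pp_change(coef: float, step: float) -> float:
    """Change in default probability, in percentage points, for a `step`-unit increase."""
    return coef * step * 100


print(round(pp_change(DEFAULT_COEFS["Average Bill"], 1000), 2))   # 0.69
print(round(pp_change(DEFAULT_COEFS["Average Payment"], 10), 2))  # -0.17
print(round(pp_change(DEFAULT_COEFS["AGE"], 1), 2))               # 0.1
```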

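For the “Limit” Model: Prediction of the Limit of a 35-Year-Old, Single Female with a High School Degree

$$\widehat{\text{Limit}} = 376.50 + 366.04\,(\text{Female}) - 1135.59\,(\text{High School}) + 2321.85\,(\text{Graduate School}) + 2978.66\,(\text{Married}) - 67.03\,(\text{Married} \times \text{Age}) + 117.96\,(\text{Age})$$

$$= 376.50 + 366.04\,(1) - 1135.59\,(1) + 2321.85\,(0) + 2978.66\,(0) - 67.03\,(0 \times 35) + 117.96\,(35) = 3735.55$$

Predicted credit limit: $3,735.55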
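The prediction above plugs dummy values (0 or 1) and the age into the regression equation. A minimal Python sketch of that calculation, using the rounded coefficients from the equation; the function `predict_limit` and its string encoding of education ("high_school", "graduate_school", or "baseline" for the omitted category) are hypothetical.

```python
# Rounded coefficients taken from the credit limit regression equation.
LIMIT_EQ = {
    "intercept": 376.50,
    "female": 366.04,
    "high_school": -1135.59,
    "graduate_school": 2321.85,
    "married": 2978.66,
    "married_x_age": -67.03,
    "age": 117.96,
}

# Hypothetical encoding: "baseline" stands for the omitted education category.
EDUCATION_LEVELS = {"high_school", "graduate_school", "baseline"}


def predict_limit(female: bool, education: str, married: bool, age: float) -> float:
    """Plug dummy values (0/1) and age into the fitted credit limit equation."""
    if education not in EDUCATION_LEVELS:
        raise ValueError(f"unknown education level: {education!r}")
    f = 1 if female else 0
    hs = 1 if education == "high_school" else 0
    gs = 1 if education == "graduate_school" else 0
    m = 1 if married else 0
    return (
        LIMIT_EQ["intercept"]
        + LIMIT_EQ["female"] * f
        + LIMIT_EQ["high_school"] * hs
        + LIMIT_EQ["graduate_school"] * gs
        + LIMIT_EQ["married"] * m
        + LIMIT_EQ["married_x_age"] * (m * age)  # interaction term
        + LIMIT_EQ["age"] * age
    )


# 35-year-old, single, female, high school degree (the case worked above).
print(round(predict_limit(female=True, education="high_school", married=False, age=35), 2))  # 3735.55
```

The same function covers married customers, where both the Married dummy and the Married*Age term switch on.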

For the “Default” Model: Prediction of the Chance of Defaulting for Someone Who Is 25 Years Old, Has an Average Bill Amount of $1,150, and Has Average Payments of $900

$$\widehat{P}(\text{Default}=1) = 0.20 + 0.0000069\,(\text{Average Bill}) - 0.00017\,(\text{Average Payment}) + 0.0010\,(\text{Age})$$

$$= 0.20 + 0.0000069\,(1150) - 0.00017\,(900) + 0.0010\,(25) = 0.079935$$

Predicted chance of default: 0.079935, or about 8.0%
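The same plug-in calculation for the default model can be sketched in Python. The sketch (hypothetical names) evaluates the equation once with the rounded coefficients used above and once with the unrounded coefficients from the output table; for this customer, rounding moves the prediction from about 8.0% to about 8.6%. As with any linear probability model, the equation is not restricted to the 0 to 1 range, and the sketch does not clip its output.

```python
# Rounded coefficients (as in the regression equation) and unrounded
# coefficients (as printed in the output table) for the default model.
ROUNDED = {"intercept": 0.20, "bill": 0.0000069, "payment": -0.00017, "age": 0.0010}
FULL = {"intercept": 0.20453129, "bill": 6.8707e-06, "payment": -0.0001685, "age": 0.00100365}


def predict_default(coefs: dict, avg_bill: float, avg_payment: float, age: float) -> float:
    """Linear probability model: predicted P(Default = 1), not clipped to [0, 1]."""
    return (
        coefs["intercept"]
        + coefs["bill"] * avg_bill
        + coefs["payment"] * avg_payment
        + coefs["age"] * age
    )


p_rounded = predict_default(ROUNDED, avg_bill=1150, avg_payment=900, age=25)
p_full = predict_default(FULL, avg_bill=1150, avg_payment=900, age=25)
print(f"rounded coefficients:   {p_rounded:.4f} ({p_rounded:.1%})")
print(f"unrounded coefficients: {p_full:.4f} ({p_full:.1%})")
```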

Conclusion
The researcher would recommend using either the credit limit regression model or the
default chance regression model, because every variable in each model is statistically
significant: all of the p-values were low, and all of the t statistics were large in
absolute value. In more depth, the credit limit model had a standard error of 4,027.96 and
an adjusted R² of 0.1184, whereas the default chance model had a standard error of 0.4121
and an adjusted R² of 0.0138. The two standard errors cannot be compared directly, because
they are measured in different units (dollars for the credit limit, probability for the
default chance). Adjusted R² is unit-free, and by that measure the credit limit model
explains more of the variation in its outcome (11.84% versus 1.38%). Both adjusted R²
values are low, however, so predictions for an individual customer should be treated as
rough estimates. To conclude, these models with the given variables are useful for
predicting the credit limit and chance of default of Taiwanese customers, keeping their
limited explanatory power in mind.
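The point about different units can be made concrete by dividing each model's standard error by the standard deviation of its own outcome, which gives a unit-free ratio equal to the square root of 1 minus the adjusted R². A minimal Python sketch, assuming the rounded sums of squares from the two ANOVA tables; `se_to_sd_ratio` is a hypothetical name.

```python
import math


def se_to_sd_ratio(ms_res: float, ss_total: float, n: int) -> float:
    """Regression standard error divided by the outcome's standard deviation.

    Both numbers share the outcome's units, so the ratio is unit-free;
    algebraically it equals sqrt(1 - adjusted R^2).
    """
    sd_outcome = math.sqrt(ss_total / (n - 1))
    return math.sqrt(ms_res) / sd_outcome


# MS Residual and SS Total copied from the two ANOVA tables.
limit_ratio = se_to_sd_ratio(ms_res=16224446.2, ss_total=4.47796e11, n=24332)
default_ratio = se_to_sd_ratio(ms_res=0.16982139, ss_total=4189.881103, n=24332)
print(f"credit limit model: {limit_ratio:.3f}")
print(f"default model:      {default_ratio:.3f}")
```

With these inputs the ratios come out to roughly 0.939 and 0.993. A ratio near 1 means the model's typical error is only slightly smaller than that of predicting the average outcome for everyone.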
