DS II Mid Term 2017 Solution

1) The regression model between per capita income and corruption index explains 74.3% of the variation in per capita income. There is a statistically significant relationship between the two variables. 2) At a 95% confidence level, the minimum average value of per capita income when corruption index is 50 is 25552.6836 dollars. 3) The average per capita income of communist states based on the regression model is 22912.282 dollars. However, the model is not statistically significant due to heteroscedasticity in the residuals.

Uploaded by

kapadia krunal

Decision Sciences II
Mid-Term Examination (Solution)
Wednesday, October 25, 2017
Time: 180 minutes
Total No. of Pages: 20
Total No. of Questions: 3
Total marks: 55
Name ________________________
Roll No. ________________________
Section ________________________

Instructions
1. This is a closed book exam. You are NOT allowed to use text book and class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any
rough work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume 95% confidence level if necessary (α = 0.05).
8. Use approximate critical values for Z, t, F, and χ² tests if the exact value is not available in the
tables attached with the question paper.

Question Number   Q1   Q2   Q3   Total
Max Marks         20   15   20   55
Marks Scored

Question 1 (20 points)

Per Capita Income of 20 countries was analysed using the variables described in Table 1.

Table 1. Data Dictionary

S.No   Variable                                                                             Variable Type                               Code in SPSS output
1      Per Capita Income (in Dollars)                                                       Numerical                                   Per Capita
2      Corruption Index (higher value indicates lower level of corruption in the country)   Integer                                     CI
3      Gini Index (measure of wealth distribution and discrimination)                       Numerical                                   Gini
4      Communist State (whether the country was/is a communist state)                       Binary (1 = Communist State; 0 otherwise)   CS

Descriptive statistics of the variables and correlations are shown in Tables 2 and 3 respectively.

Table 2 Descriptive Statistics

                     N    Minimum   Maximum   Mean        Std. Deviation
CI                   20   29.0      90.0      61.700      20.6171
Gini                 20   23.5      53.7      34.740      7.3846
CS                   20   .0        1.0       .250        .4443
PerCapita            20   12275.0   69249.0   37789.050   15847.4829
Valid N (listwise)   20

Table 3 Correlations
CI Gini CS PerCapita
CI 1 -.464* -.612** .862**
Gini -.464* 1 .253 -.338
CS -.612** .253 1 -.556*
Per Capita .862** -.338 -.556* 1

Model 1
Y (Per Capita) = β0 + β1 × CI

A simple linear regression model (Model 1) is developed between per capita income (Y) and corruption
index (CI). SPSS model outputs are shown in Tables 4 and 5. Normal P-P Plot and Residual Plot
are shown in Figures 1 and 2 respectively.

Table 4 Model Summary (b)

Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1                                          8241.4390
a. Predictors: (Constant), CI
b. Dependent Variable: Per Capita

Table 5 Coefficients (a)
Model            B           Std. Error   Beta   t       Sig.
1  (Constant)    -3112.753   5950.818            -.523   .607
   CI            662.914     91.706       .862
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: Per Capita

Figure 1 Normal Probability Plot Figure 2 Residual Plot


Question 1.1 (1 point)
What proportion of the variation in per capita income is explained by corruption index (CI)?

SOLUTION:
r = 0.862 (correlation between PerCapita and CI)
R² = r² = (0.862)² = 0.743
74.3% of the variation in per capita income is explained by CI.

Question 1.2 (1 point)


Is there a statistically significant relationship between corruption index and per capita income of
the countries at 5% significance?

SOLUTION:
We can use either the t test or the F test for this problem.
H0 : β1 = 0
H1 : β1 ≠ 0
t test:
t-stat = (B1 - β1) / SE(B1) = (662.914 - 0) / 91.706 = 7.229
t-critical (0.025, 18) = 2.101
t-stat > t-critical, so we reject the null hypothesis.
F test:
F = (SSR/k) / (SSE/(n-k-1)) = 52.05
F-critical (0.05, 1, 18) = 4.414
F > F-critical, so we reject the null hypothesis.
There is a statistically significant relationship between per capita income and
CI at 5% significance.
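As a quick numerical check of both test statistics, the following Python sketch recomputes them from the values in Tables 3 and 5. The variable names are illustrative assumptions, and the critical values are simply the table values quoted in the solution (they are not computed here).

```python
# Inputs taken from Table 3 (correlation) and Table 5 (coefficients).
b1 = 662.914       # estimated slope for CI
se_b1 = 91.706     # standard error of the slope
r = 0.862          # correlation between PerCapita and CI
n, k = 20, 1       # number of observations and of predictors

# t test of H0: beta1 = 0.
t_stat = (b1 - 0.0) / se_b1

# F test written in terms of R^2, which is equivalent to (SSR/k) / (SSE/(n-k-1)).
r_sq = r ** 2
f_stat = (r_sq / k) / ((1 - r_sq) / (n - k - 1))

# Critical values quoted in the solution (read from statistical tables).
T_CRIT = 2.101     # t(0.025, 18)
F_CRIT = 4.414     # F(0.05, 1, 18)

print(f"t = {t_stat:.3f}, reject H0: {t_stat > T_CRIT}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```

In simple regression the two tests agree by construction, since F equals t² up to rounding of the inputs.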

Question 1.3 (2 points)


Is it possible to conclude that the per capita income increases by at least 500 dollars for every one
unit increase in corruption index at 10% significance level? Clearly write all the steps.

SOLUTION:
H0 : β1 ≤ 500
H1 : β1 > 500
t test:
t-stat = (B1 - β1) / SE(B1) = (662.914 - 500) / 91.706 = 1.776
t-critical (0.1, 18) = 1.33 (one-tailed test)
t-stat > t-critical, so we reject the null hypothesis.
Therefore, we can conclude that the per capita income increases by at least 500 dollars for every one
unit increase in corruption index at 10% significance level.
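The one-tailed test against a non-zero hypothesised slope can be sketched in Python as below. This is only an illustration: the names are assumptions, and the critical value is the table value quoted in the solution.

```python
# Slope estimate and standard error from Table 5.
b1 = 662.914
se_b1 = 91.706

# Hypothesised slope: an increase of 500 dollars per unit of CI.
hypothesised_slope = 500.0

# Right-tailed t statistic: how many standard errors the estimate lies above 500.
t_stat = (b1 - hypothesised_slope) / se_b1

# One-tailed critical value t(0.10, 18) quoted in the solution.
T_CRIT_ONE_TAILED = 1.33

reject_h0 = t_stat > T_CRIT_ONE_TAILED
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```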

Question 1.4 (1 Point)


What can you conclude about model between per capita and CI based on the plots in Figures 1
and 2?

SOLUTION:
Figure 1 : The error follows approximately normal distribution.
Figure 2 : The residual plot shows no pattern and therefore we can conclude that there is
homoscedasticity.

Question 1.5 (3 Points)


What is the minimum average value of per capita income at 95% confidence level when CI = 50?

SOLUTION:

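The confidence interval for the average value of Y at Xi is

$$\hat{Y}_i \pm t_{\alpha/2,\,n-k-1} \cdot s_e \cdot \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}}$$

Ŷi = -3112.753 + 662.914 × 50 = 30032.947

t(α/2, 18) = 2.101

se = 8241.439

Xi = 50, X̄ = 61.70, n = 20

SSX = (n-1) × SD² = 19 × (20.6171)² = 8076.23

Minimum average value (lower limit of the interval) = 25552.6836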
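The interval above can be evaluated with a short Python sketch. The function name and argument names are assumptions made for illustration; the inputs come from Tables 2, 4 and 5, and the critical value is the table value used in the solution.

```python
import math

def mean_response_interval(b0, b1, x0, x_bar, sd_x, n, s_e, t_crit):
    """Confidence interval for the mean of Y at x0 in simple linear regression."""
    y_hat = b0 + b1 * x0                      # point estimate on the fitted line
    ssx = (n - 1) * sd_x ** 2                 # sum of squared deviations of X
    margin = t_crit * s_e * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / ssx)
    return y_hat - margin, y_hat, y_hat + margin

lower, fit, upper = mean_response_interval(
    b0=-3112.753, b1=662.914,     # Table 5
    x0=50, x_bar=61.70,           # CI value of interest and mean of CI (Table 2)
    sd_x=20.6171, n=20,           # Table 2
    s_e=8241.439,                 # Table 4
    t_crit=2.101,                 # t(0.025, 18) from tables
)
print(f"fit = {fit:.3f}, lower = {lower:.2f}, upper = {upper:.2f}")
```

The lower limit returned by the function is the "minimum average value" asked for in the question.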

A second model is developed between Per Capita and Communist State (CS).
Model 2
Y (Per Capita) = β0 + β1 × CS
The output is shown in Table 6. Normal probability and residual plots are shown in Figures 3 and
4 respectively.
Table 6 Coefficients (a)
Model            B           Std. Error   Beta    t        Sig.
1  (Constant)    42743.933   3495.319             12.229   .000
   CS                        6990.639     -.556
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: PerCapita

Figure 3 Normal Probability Plot


Figure 4 Residual Plot
Question 1.6 (2 Points)
Calculate the average per capita income of communist states. Clearly write all the steps.

SOLUTION:
$$\hat{Y} = \beta_0 + \beta_1 X$$

β1 = Std β × (SY / SX) = -0.556 × (15847.4829 / 0.4443) = -19831.646
β0 = 42743.933

X = 1 (for communist states)
Ŷ = 22912.282
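The conversion from the standardized coefficient back to the unstandardized slope can be checked with a few lines of Python. This is a sketch under the assumption that the standard deviations in Table 2 are the ones SPSS used; all names are illustrative, and the result may differ from the value above in the last decimal because of rounding.

```python
# Standardized coefficient of CS from Table 6 and standard deviations from Table 2.
std_beta_cs = -0.556
sd_y = 15847.4829      # standard deviation of PerCapita
sd_x = 0.4443          # standard deviation of CS
b0 = 42743.933         # intercept from Table 6

# Unstandardized slope: b1 = standardized beta * (S_Y / S_X).
b1 = std_beta_cs * sd_y / sd_x

# CS is a 0/1 dummy, so the fitted value at CS = 1 is the predicted
# average per capita income of communist states.
y_hat_communist = b0 + b1 * 1
y_hat_other = b0 + b1 * 0

print(f"b1 = {b1:.3f}")
print(f"communist states: {y_hat_communist:.2f}, other states: {y_hat_other:.2f}")
```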

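Question 1.7 (2 points)


Is Model 2 statistically significant at 5% significance? Use all the information provided (Table 6,
Figures 3 and 4). Clearly write all the arguments.

SOLUTION:
From Table 6 and the slope computed in Question 1.6, the t-statistic for CS is -19831.646 / 6990.639 = -2.837.
Since |t| > t-critical (0.025, 18) = 2.101, the p-value is below 0.05. (The Sig. value of .000 shown in Table 6
belongs to the constant, not to CS.)
When p-value < 0.05, we can say that the model is significant, but from Figure 4 it is clear that
there is heteroscedasticity. So, the p-value may not be reliable.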

A stepwise regression model is developed using CI and Gini as independent variables and the
outputs are shown in Table 7.
Table 7 Stepwise Regression Output
Model            B            Std. Error   t       Sig.   Zero-order   Partial   Part
1  (Constant)    -3112.753    5950.818     -.523   .607
   CI            662.914      91.706       7.229   .000   .862         .862      .862
2  (Constant)    -10781.284   14572.250    -.740   .469
   CI            691.235      105.487      6.553   .000   .862         .846      .797
   Gini                                                   -.338        .139      .070
(B and Std. Error are unstandardized coefficients; Zero-order, Partial and Part are correlations.)

Question 1.8 (2 Points)


What is the value of R-square after adding the variable Gini to the model?

SOLUTION:
R² after adding Gini = R² of the model without Gini + (part correlation of Gini in the new model)²
= 0.743 + (0.070)² = 0.7479

Question 1.9 (2 points)


Carry out an appropriate hypothesis test to check whether the variable “Gini” is worth adding to
the model at 5% significance.

SOLUTION:
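Use the partial F test for this problem.
H0 : βGini = 0
H1 : βGini ≠ 0
Partial F test:

$$F = \frac{(R_F^2 - R_R^2)/r}{(1 - R_F^2)/(n - k - 1)} = \frac{(0.070)^2/1}{(1 - 0.747944)/(20 - 2 - 1)} = 0.33048$$

where R_F² = (0.862)² + (0.070)² = 0.747944 is the R-square of the full model, R_R² = (0.862)² is the R-square of the
reduced model, r is the number of added variables and k is the number of predictors in the full model.
F-critical (0.05, 1, 17) = 4.451

F < F-critical, so we do not reject the null hypothesis.

Thus, Gini is not worth adding to the model.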
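The R-square update and the partial F test from Questions 1.8 and 1.9 can be reproduced with the Python sketch below. The helper name partial_f is an assumption introduced for illustration, and the unrounded value (0.862)² is used for the reduced-model R-square.

```python
def partial_f(r2_full, r2_reduced, num_added, n, k_full):
    """Partial F statistic for adding num_added predictors to a nested model."""
    numerator = (r2_full - r2_reduced) / num_added
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# Reduced model: PerCapita on CI only (R^2 is the squared correlation, Table 3).
r2_reduced = 0.862 ** 2

# Full model: add Gini; R^2 increases by the squared part correlation (Table 7).
part_corr_gini = 0.070
r2_full = r2_reduced + part_corr_gini ** 2

f_gini = partial_f(r2_full, r2_reduced, num_added=1, n=20, k_full=2)

F_CRIT = 4.451  # F(0.05, 1, 17) quoted in the solution
print(f"R^2 full = {r2_full:.4f}, partial F = {f_gini:.5f}, add Gini: {f_gini > F_CRIT}")
```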

Question 1.10 (2 points)


Calculate the variance inflation factor between variables CI and Gini. What can you conclude
from the calculated VIF value?

SOLUTION:
VIF = 1 / (1 - R²) = 1 / (1 - (-0.464)²) = 1.27436
(R = correlation between CI and Gini)
Since VIF < 4, there is no serious multicollinearity between CI and Gini.
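With only two predictors, the VIF depends solely on their pairwise correlation, as this small Python sketch shows. The names are illustrative assumptions, and the threshold of 4 is the rule of thumb used in the solution.

```python
# Correlation between CI and Gini from Table 3.
r_ci_gini = -0.464

# Tolerance is the share of a predictor's variance not explained by the other predictor.
tolerance = 1 - r_ci_gini ** 2
vif = 1 / tolerance

VIF_THRESHOLD = 4  # rule of thumb used in the solution
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.5f}, concern: {vif > VIF_THRESHOLD}")
```

The tolerance computed here (about 0.785) matches the tolerance reported for Gini in Table 9, which is a useful cross-check.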

A stepwise regression model is developed using all the 3 independent variables and the SPSS
outputs are given in Tables 8 and 9
Table 8 Coefficients (a)
Model            B           Std. Error   Beta   t       Sig.
1  (Constant)    -3112.753   5950.818            -.523   .607
   CI            662.914     91.706       .862   7.229   .000
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: PerCapita

Table 9 Excluded Variables (a)

Model      Beta In     t       Sig.   Partial Correlation   Collinearity Statistics (Tolerance)
1  Gini    .079 (b)    .579    .570   .139                  .785
   CS      -.044 (b)   -.287   .777   -.070                 .625
a. Dependent Variable: PerCapita
b. Predictors in the Model: (Constant), CI

Question 1.11 (2 points)


Based on the information provided in Tables 8 and 9, is it possible to conclude that there is no
statistically significant relationship between Per Capita and independent variables Gini and CS?
Excluded variables in Table 9 are variables that are not part of the regression model (statistically
not significant) when stepwise regression is used.

SOLUTION:
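We have to carry out an F test for the simple regression of PerCapita on Gini and of PerCapita on CS.
F test:
F = R² / ((1 - R²)/18), where R is the zero-order correlation with PerCapita from Table 3.

PerCapita and Gini:
H0 : β1 = 0
H1 : β1 ≠ 0
F (Gini) = (-0.338)² / ((1 - (-0.338)²)/18) = 2.3215

PerCapita and CS:
H0 : β1 = 0
H1 : β1 ≠ 0
F (CS) = (-0.556)² / ((1 - (-0.556)²)/18) = 8.0543

F-critical (0.05, 1, 18) = 4.414

F (Gini) < F-critical, so we do not reject the null hypothesis. Thus, Gini is not significant.

F (CS) > F-critical, so we reject the null hypothesis. Thus, CS is significant.

Hence it is not possible to conclude that neither variable is related to Per Capita: taken on its own, CS has a
statistically significant relationship with Per Capita.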
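Both F statistics follow directly from the zero-order correlations in Table 3. The Python sketch below is one way to compute them; the helper name f_from_correlation is an assumption for illustration.

```python
def f_from_correlation(r, n):
    """F statistic of a simple linear regression, computed from the correlation r."""
    r_sq = r ** 2
    return r_sq / ((1 - r_sq) / (n - 2))

N_COUNTRIES = 20
F_CRIT = 4.414  # F(0.05, 1, 18) quoted in the solution

# Zero-order correlations with PerCapita (Table 3).
correlations = {"Gini": -0.338, "CS": -0.556}

f_values = {name: f_from_correlation(r, N_COUNTRIES) for name, r in correlations.items()}
for name, f_value in f_values.items():
    print(f"{name}: F = {f_value:.4f}, significant on its own: {f_value > F_CRIT}")
```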

Question 2 (15 points)


Applicants who apply for a job at Precision Watches Inc., which requires extensive manual
assembly of small intricate parts, are initially given three different tests to measure their manual
dexterity. The ones who are hired are then periodically given a performance rating on a 0-100
scale that combines their speed and accuracy in performing the required assembly operations.
Data is collected on the test scores and performance ratings for a randomly selected group of 80
employees who continued working for the company. Their seniority (months with the company)
at the time of the performance rating is also noted. The summary information and the results from
four regression models developed using the data are given below:
Pairwise Correlation Matrix
            JobPerf   Seniority   Test1   Test2   Test3
JobPerf     1
Seniority   0.43      1.00
Test1       0.58                  1.00
Test2       0.52                  0.60    1.00
Test3       0.62                  0.66    0.80    1.00
Descriptive Statistics
                     N    Minimum   Maximum   Mean    Std. Deviation
JobPerf              80   38        100       65.75   10.630
Seniority            80   7         30        18.89   5.00
Test1                80   31        82        60.53   9.576
Test2                80   37        86        60.75   9.872
Test3                80   26        77        50.71   9.181
Valid N (listwise)   80
Model 1 Summary
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1                      .176                9.651                        1.856
a Predictors: (Constant), Seniority
b Dependent Variable: JobPerf

Model 1 ANOVA
Model           Sum of Squares   Df   Mean Square   F        Sig.
1  Regression   1662.584         1    1662.584      17.852   .000
   Residual     7264.416         78   93.134
   Total        8927.000         79

Model 1 Coefficients
Model            B        Std. Error   Beta   t        Sig.
1  (Constant)    48.928   4.125               11.861   .000
   Seniority     .891                  .432
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)

Model 2 Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .764   .583       .561                7.042                        1.878
a Predictors: (Constant), Test3, Seniority, Test1, Test2
b Dependent Variable: JobPerf

Model 2 ANOVA
Model           Sum of Squares   Df   Mean Square   F        Sig.
1  Regression   5208.110         4    1302.027      26.258   .000
   Residual     3718.890         75   49.585
   Total        8927.000         79

Model 2 Coefficients
Model            B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1  (Constant)    6.557   6.187               1.060   .293
   Seniority     .801    .155         .388   5.171   .000   .986        1.014
   Test1         .300    .112         .271   2.693   .009   .550        1.819
   Test2         .086    .135         .080   .640    .524   .355        2.816
   Test3         .407    .154         .352   2.638   .010   .313        3.197
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)

Model 3 Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .762   .581       .565                7.014                        1.891
a Predictors: (Constant), Test3, Seniority, Test1
b Dependent Variable: JobPerf

Model 3 ANOVA
Model           Sum of Squares   Df   Mean Square   F        Sig.
1  Regression   5187.803         3    1729.268      35.148   .000
   Residual     3739.197         76   49.200
   Total        8927.000         79

Model 3 Coefficients
Model            B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1  (Constant)    7.893   5.801               1.361   .178
   Seniority     .793    .154                5.157   .000   .993        1.008
   Test1         .312    .110                2.844   .006   .565        1.771
   Test3         .473    .114                4.145   .000   .567        1.764
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)

Model 4 Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .757   .574       .562                7.031                        1.843
a Predictors: (Constant), AvgScore, Seniority
b Dependent Variable: JobPerf

Model 4 ANOVA
Model           Sum of Squares   Df   Mean Square   F        Sig.
1  Regression   5120.011         2    2560.006      51.779   .000
   Residual     3806.989         77   49.441
   Total        8927.000         79

Model 4 Coefficients
Model            B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1  (Constant)    5.407   6.010               .900    .371
   Seniority     .821    .154         .398   5.339   .000   .997        1.003
   AvgScore      .782    .094         .623   8.362   .000   .997        1.003
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)

Use the information given above to answer the following questions. Specify the model(s) you
use to draw your conclusions, where relevant.

a) Can it be concluded that performance rating improves with length of stay with the company
(Seniority), irrespective of the original test scores? Select the appropriate model to answer the
question. (3 points)

Ans. Here we use Model 1 to test the hypothesis since test scores are ignored.
H0 : β1 ≤ 0 vs. H1 : β1 > 0 (one-sided t-test)
S.E(β1) = Se / (SSx)^(.5), where SSx = (n-1) × Sx² = 79 × 25
        = 9.651 / (79 × 25)^(.5) = 0.217
So the t-statistic is given by
tcalc = (β1 - 0) / S.E(β1) = .891 / .217 = 4.103
From the table, the critical value is t.05,78 = 1.665
So, tcalc > t.05,78.
So, we reject H0 and hence conclude that performance rating increases with length
of stay with the company.
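Because the Model 1 output omits the standard error of the Seniority coefficient, it has to be rebuilt from summary statistics. A minimal Python sketch of that step follows; the variable names are assumptions, and the critical value is the table value used above.

```python
import math

# Summary statistics for Model 1 (JobPerf regressed on Seniority).
n = 80
s_e = 9.651            # standard error of the estimate (Model 1 Summary)
sd_seniority = 5.00    # standard deviation of Seniority (Descriptive Statistics)
b1 = 0.891             # slope for Seniority (Model 1 Coefficients)

# Sum of squared deviations of X, then the standard error of the slope.
ssx = (n - 1) * sd_seniority ** 2
se_b1 = s_e / math.sqrt(ssx)

# Right-tailed test of H0: beta1 <= 0.
t_calc = (b1 - 0.0) / se_b1
T_CRIT = 1.665         # t(0.05, 78) from tables

print(f"SE(b1) = {se_b1:.4f}, t = {t_calc:.3f}, reject H0: {t_calc > T_CRIT}")
```

As a consistency check, t² is close to the F value of 17.852 reported in the Model 1 ANOVA table, subject to rounding of the inputs.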

b) Predict the average performance rating for a worker who has 15 months of Seniority. What
are the highest and lowest performance ratings that this worker is likely to get at 90%
confidence level? (3 Points)

Ans. To predict the average performance rating of a worker with 15 months of experience:
AvgY = b0 + b1 * Seniority = 48.928 + .891*15 = 62.293
Hence, the average performance rating of the worker is 62.293

Prediction interval at 90% confidence level (note: PI for an individual, not the average)
P.I = (Y - t.05,78 × Se × (1 + (1/n) + (X - X̄)²/SSx)^(.5) , Y + t.05,78 × Se × (1 + (1/n) + (X - X̄)²/SSx)^(.5))
= (62.293 - 1.664 × 9.651 × (1 + (1/80) + (15 - 18.89)²/(79 × 25))^(.5) ,
62.293 + 1.664 × 9.651 × (1 + (1/80) + (15 - 18.89)²/(79 × 25))^(.5))
= (62.293 - 16.220 , 62.293 + 16.220)
= (46.073 , 78.513)
So the highest and lowest performance ratings are 78.513 and 46.073.
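The prediction interval for an individual worker can be evaluated with the Python sketch below. The function and argument names are assumptions for illustration, and the inputs are the Model 1 values and descriptive statistics given above.

```python
import math

def prediction_interval(y_hat, x0, x_bar, sd_x, n, s_e, t_crit):
    """Prediction interval for a single new observation at x0 (simple regression)."""
    ssx = (n - 1) * sd_x ** 2
    # The extra "1 +" term accounts for the variability of an individual outcome.
    margin = t_crit * s_e * math.sqrt(1 + 1.0 / n + (x0 - x_bar) ** 2 / ssx)
    return y_hat - margin, y_hat + margin, margin

# Point prediction from Model 1 for a worker with 15 months of seniority.
y_hat = 48.928 + 0.891 * 15

lowest, highest, margin = prediction_interval(
    y_hat, x0=15, x_bar=18.89, sd_x=5.00, n=80, s_e=9.651, t_crit=1.664)
print(f"prediction = {y_hat:.3f}, margin = {margin:.3f}")
print(f"90% prediction interval = ({lowest:.3f}, {highest:.3f})")
```

Dropping the leading 1 inside the square root would give the narrower confidence interval for the average rating instead.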

c) If Test 2 was used to predict performance scores on its own, is it likely to be a significant
predictor of JobPerf? Justify. In the presence of the other 3 variables, is it a significant predictor?
Why or why not? (3 Points)

Ans. We can see from the correlation matrix that Corr(Test 2, JobPerf) = 0.52.
So if we build a model using Test 2 alone, it will have R² = (0.52)² = 0.2704, i.e. it will
explain about 27% of the variation in JobPerf. The corresponding F-statistic,
R² / ((1 - R²)/(n - 2)) = 0.2704 / (0.7296/78) ≈ 28.9, is far above the 5% critical value of about 3.96.
So, we can conclude that a significant relationship between Job Performance and Test 2 is likely.
From Model 2, we see that in the presence of the other 3 variables, the coefficient of Test2 has a
p-value = .524, indicating that in the presence of the other variables, Test 2 is not a
significant predictor. The reason for this is that Test 2 is highly correlated with Test 3
(.8) and Test 1 (.6). The part of the variation explained by Test 2 has largely been
explained by these other variables, leading to Test 2 becoming insignificant.

d) Can it be concluded that employees with higher average scores on the tests stay longer with
the company? Choose the appropriate models to compare. (3 points)

Ans. Here we consider Model 1 and Model 4. We use omitted variable bias to
conclude.
Model 1: 48.928 + 0.891*Seniority
Model 4: 5.407 + 0.821*Seniority + 0.782*AvgScore
The formula for omitted variable bias is given by -
α1 = β1 + β2 * Cov(X1,X2) / Var(X1) where, X1=Seniority , X2=AvgScore
Now , we are given α1 =0.891 , β1 = 0.821 , β2 = 0.782
Therefore,
Cov(X1,X2) / Var(X1) = (.891-.821)/.782 = .07/.782 > 0
Also, Var(X1) > 0 always.
=> Cov(X1,X2) > 0 i.e. X1 and X2 are positively correlated.
So, we can conclude that employees with higher average scores stay longer with the
company.
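The omitted-variable-bias argument can be traced numerically with the short Python sketch below. It only recovers the sign and size of Cov(X1, X2) / Var(X1) implied by the two fitted models; it is not a significance test, and the names are illustrative assumptions.

```python
# Slope of Seniority when AvgScore is omitted (Model 1) and included (Model 4).
alpha1 = 0.891   # Model 1
beta1 = 0.821    # Model 4, Seniority
beta2 = 0.782    # Model 4, AvgScore

# Omitted variable bias: alpha1 = beta1 + beta2 * Cov(X1, X2) / Var(X1).
# Solving for the ratio gives the slope of AvgScore regressed on Seniority.
cov_over_var = (alpha1 - beta1) / beta2

positively_related = cov_over_var > 0
print(f"Cov(X1, X2) / Var(X1) = {cov_over_var:.4f}, positive: {positively_related}")
```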

e) Two employees whose seniority differs by 5 months have the same average test score. Can it
be concluded that the performance rating of the more senior employee will be at least 3 points
higher at 5% significance level? (3 Points)

Ans. Here, the seniority differs by 5 months with the same average test score (so we use Model 4), and we
have to test whether the performance rating increases by at least 3 points.
So, per month the rating would have to increase by (3/5) = 0.6 points.
The hypotheses:
H0 : β1 ≤ 0.6 vs. H1 : β1 > 0.6 (right-tailed t-test)
The t-statistic is computed as
tcalc = (0.821 - 0.6) / 0.154 = 0.221 / 0.154 = 1.435
From the t table we get: t.05,77 = 1.664
Hence tcalc < t.05,77
Therefore, we cannot reject the null hypothesis.
Hence, we cannot conclude that, given a seniority difference of 5 months, the
performance rating of the more senior employee will be at least 3 points higher.

Question 3 (20 Points)


A data analytics start up works with political parties during elections. They have got access to
voting patterns from various official sources. They are trying to understand how the percent of
votes obtained by the winner is determined. As a first cut they are using the following data:

% VOTES – the percent of votes polled obtained by the winning candidate


MARGIN – the margin of victory measured in number of votes
Gender – 1 is for Men and 0 for women
College – 1 is for college educated winners and 0 for those who did not go to college.
They run the regression for all 543 elected MPs. The model output is provided below (with some
information missing):
Table 3.1 Regression Statistics
Multiple R          0.728809
R Square            0.531163
Adjusted R Square   0.528553
Standard Error      5.6332
Observations        543

Table 3.2 ANOVA
             Df    SS         MS          F          Significance F
Regression   3     19377.83   6459.2667   203.5514
Residual     539   17104.06   31.7329
Total        542   36481.89

Table 3.3 Coefficients
            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept   38.59235       0.937225         41.1772             36.75129    40.4334106
MARGIN      5.32E-05       2.18E-06         24.4036             4.89E-05    5.7463E-05
Gender      1.551306       0.777806         1.99446             0.023404    3.07920835
College     -1.47506       0.586995         -2.5129             -2.62814    -0.3219783

(i) Fill up the Tables 3.1, 3.2 and 3.3 above (except the p values and the Significance
F values). Clearly write all the steps. [10 points]

SOLUTION
STEP I:
We first fill Table 3.3. For reference, the standard deviation of the response variable Y (the
percentage of votes polled by the winning candidate) is

$$S_Y = \sqrt{\frac{SS_Y}{N-1}} = \sqrt{\frac{36481.89}{543-1}} = 8.20425$$

The t-stat of each coefficient is the coefficient divided by its standard error, $t_{\beta_i} = \hat{\beta}_i / SE(\hat{\beta}_i)$:

$$t_{\beta_0} = \frac{38.59235}{0.937225} = 41.177252$$

$$t_{\beta_1} = \frac{5.32 \times 10^{-5}}{2.18 \times 10^{-6}} = 24.40367$$

$$t_{\beta_2} = \frac{1.551306}{0.777806} = 1.99446$$

$$t_{\beta_3} = \frac{-1.47506}{0.586995} = -2.5129$$
STEP II:
We next fill Table 3.2. We have
𝑆𝑆𝐸 = 17104.06
𝑆𝑆𝑇 = 36481.89
so that 𝑆𝑆𝑅 = 𝑆𝑆𝑇 − 𝑆𝑆𝐸 = 19377.83. The regression degrees of freedom is 4 − 1 = 3 and the
residual degrees of freedom is the total minus this 3 so that the residual degrees of freedom is
539. Hence, the Mean-Squared Values are the respective squared-sums divided by their degrees
of freedom:
$$MS_{Regression} = MSR = \frac{SSR}{df_R} = \frac{19377.83}{3} = 6459.2667$$

and

$$MS_{Residual} = MSE = \frac{SSE}{df_{residual}} = \frac{17104.06}{539} = 31.732949.$$

Thus, the F-value for the model is

$$F_{cal,3,539} = \frac{MSR}{MSE} = \frac{6459.2667}{31.732949} = 203.551.$$

STEP III:
We now fill table 3.1. We have:
$$R^2 = \frac{SSR}{SST} = \frac{19377.83}{36481.89} = 0.531163$$

$$\text{Multiple } R = \sqrt{R^2} = \sqrt{0.531163} = 0.728809$$

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(N - 1)}{N - k - 1} = 1 - \frac{(1 - 0.531163)(542)}{543 - 3 - 1} = 0.528553$$

$$\text{Standard Error (of the model)} = \sqrt{MSE} = \sqrt{31.732949} = 5.6332$$

This completes this part.
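Steps II and III can be reproduced in a few lines of Python, starting only from SSE, SST, the sample size and the number of predictors. This is a sketch for checking the arithmetic; the variable names are assumptions.

```python
import math

# Given in Table 3.2 and the problem statement.
n, k = 543, 3            # observations and predictors (MARGIN, Gender, College)
sse = 17104.06           # residual sum of squares
sst = 36481.89           # total sum of squares

# Table 3.2: ANOVA quantities.
ssr = sst - sse
df_regression = k
df_residual = n - k - 1
msr = ssr / df_regression
mse = sse / df_residual
f_value = msr / mse

# Table 3.1: regression statistics.
r_square = ssr / sst
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = math.sqrt(mse)

print(f"SSR = {ssr:.2f}, MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_value:.3f}")
print(f"R = {multiple_r:.6f}, R^2 = {r_square:.6f}, "
      f"adj R^2 = {adjusted_r_square:.6f}, SE = {standard_error:.4f}")
```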

(i) Assuming that t is significant for any value greater than 1.964 at 5%, are the
variables (margin, gender and college) significant? [2 points]
SOLUTION
From (i), we have

$$|t_{\beta_1}| = |t_{margin}| = 24.40367 > 1.964$$

so that the variable "margin" is significant at 5%.

Next,

$$|t_{\beta_2}| = |t_{gender}| = 1.99446 > 1.964$$

$$|t_{\beta_3}| = |t_{college}| = 2.5129 > 1.964$$

so that "gender" is significant and "college" is also significant.

(ii) Assuming that the critical value of F is 2.621 at 5% significance, is the overall
regression significant? [2 points]

SOLUTION
The overall regression is significant because

$$F_{cal,3,539} = \frac{MSR}{MSE} = \frac{6459.2667}{31.732949} = 203.551 > 2.621$$

at 5% significance level.

The analytics firm decides to dig a little deeper and looks at two outlying states, UP and AP, one
of which has significantly lower assets per winner and the other significantly higher. Both the
new variables are 0-1 variables. The values for some of the regressions are given below (Table
3.4).

Table 3.4 Regression Models with Corresponding R-Square

Regression Model   Independent Variables               R²
1                  MARGIN
2                  MARGIN, Gender                      0.52567
3                  MARGIN, Gender, College             0.531163
4                  MARGIN, Gender, College, UP         0.56051
5                  MARGIN, Gender, College, UP, AP     0.581339

(iii) What is the part correlation for College and % of votes in Regression model 3? [2
points]
SOLUTION
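$$(\text{Part correlation for College})^2 = R_3^2 - R_2^2 = 0.531163 - 0.52567 = 0.005493$$

and hence, in magnitude,

$$|\text{Part correlation for College}| = \sqrt{0.005493} = 0.07411$$

Since the coefficient of College in Regression model 3 is negative, the part correlation itself is -0.07411.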

(iv) Between regression 2 and 5 is it justified to add the additional variables?


[2 points]
SOLUTION
Partial F test:

$$F_{cal} = \frac{(R^2_{New} - R^2_{Old}) / (\text{No. of new regressors})}{(1 - R^2_{New}) / (N - k - 1)} = \frac{(0.581339 - 0.52567)/3}{(1 - 0.581339)/(543 - 5 - 1)} = 23.80148$$

which is > 2.621 at 5% significance level. Thus, it is justified to add the new variables.
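The same partial F formula applies to any pair of nested models in Table 3.4. The Python sketch below, with an assumed helper named partial_f, computes it for models 2 versus 5 and, as a cross-check, for models 2 versus 3, where adding a single variable should give an F close to the square of that variable's t-stat in Table 3.3.

```python
def partial_f(r2_new, r2_old, num_added, n, k_new):
    """Partial F statistic for a nested model comparison based on R-square values."""
    return ((r2_new - r2_old) / num_added) / ((1 - r2_new) / (n - k_new - 1))

N_OBS = 543
# R-square values and number of predictors from Table 3.4.
r2_by_model = {2: (0.52567, 2), 3: (0.531163, 3), 5: (0.581339, 5)}

def compare(old_model, new_model):
    r2_old, k_old = r2_by_model[old_model]
    r2_new, k_new = r2_by_model[new_model]
    return partial_f(r2_new, r2_old, k_new - k_old, N_OBS, k_new)

f_2_to_5 = compare(2, 5)   # adds College, UP and AP
f_2_to_3 = compare(2, 3)   # adds College only

F_CRIT = 2.621             # critical value quoted in the solution
print(f"models 2 vs 5: F = {f_2_to_5:.4f}, justified: {f_2_to_5 > F_CRIT}")
print(f"models 2 vs 3: F = {f_2_to_3:.4f}, t^2 for College = {(-2.5129) ** 2:.4f}")
```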

Regression model 5 in Table 3.4 has a standard error of 5.333135 and an overall F value of 149.1324
with a significance of 4.4 × 10⁻⁹⁹. The standard deviation for the dependent variable is 8.204253.
The values of standard deviation for the dependent and independent variables are given below
(Table 3.5).

Table 3.5
            Coefficients   Standard deviation
Intercept   38.56993
MARGIN      5.58E-05       111365.7
Gender      1.498308       0.311494
College     -1.53774       0.412796
UP          -3.71439       0.354761
AP          5.715821       0.209766
(v) Which variable has the greatest impact on Voting %? [2 points]

SOLUTION
To compare the relative impact, we have to standardise these coefficients. The formula is

$$\hat{\beta}_i^{std} = \hat{\beta}_i \times \frac{S_{X_i}}{S_Y}$$

where S_Xi is the standard deviation of the independent variable (Table 3.5) and S_Y = 8.204253 is the
standard deviation of the dependent variable, so that the values of the standardised coefficients are

            Coefficients   Standard deviation   Standardised Coefficients
Intercept   38.56993
MARGIN      5.58E-05       111365.7             0.7574
Gender      1.498308       0.311494             0.0569
College     -1.53774       0.412796             -0.0774
UP          -3.71439       0.354761             -0.1606
AP          5.715821       0.209766             0.1461

Hence, MARGIN has the greatest impact on voting percentage, since its standardised coefficient is the
largest in absolute value.
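As a hedged cross-check, the Python sketch below standardises the Table 3.5 coefficients using the textbook definition, coefficient × (standard deviation of the predictor) / (standard deviation of the dependent variable), and ranks the variables by absolute value. The dictionary and variable names are assumptions introduced for illustration.

```python
# Standard deviation of the dependent variable (% VOTES), as stated in the text.
S_Y = 8.204253

# (unstandardised coefficient, standard deviation of the predictor) from Table 3.5.
table_3_5 = {
    "MARGIN": (5.58e-05, 111365.7),
    "Gender": (1.498308, 0.311494),
    "College": (-1.53774, 0.412796),
    "UP": (-3.71439, 0.354761),
    "AP": (5.715821, 0.209766),
}

# Standardised coefficient: change in Y (in SDs) per one-SD change in the predictor.
standardised = {name: b * sd_x / S_Y for name, (b, sd_x) in table_3_5.items()}

# The variable with the largest absolute standardised coefficient has the greatest impact.
most_influential = max(standardised, key=lambda name: abs(standardised[name]))

for name, value in sorted(standardised.items(), key=lambda item: -abs(item[1])):
    print(f"{name:8s} {value: .4f}")
print("greatest impact:", most_influential)
```

Ranking by absolute value matters here because UP and College have negative coefficients.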
