DS II Mid Term 2017 Solution
Decision Sciences II
Mid-Term Examination (Solution)
Wednesday, October 25, 2017
Time: 180 minutes
Total No. of Pages: 20
Total No. of Questions: 3
Total marks: 55
Name ________________________
Roll No. ________________________
Section ________________________
Instructions
1. This is a closed book exam. You are NOT allowed to use text book and class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any
rough work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume 95% confidence level if necessary (α = 0.05).
8. Use approximate critical values for Z, t, F, and χ² tests if the exact value is not available in the
tables attached with the question paper.
Question Number    Q1    Q2    Q3    Total
Max Marks          20    15    20    55
Marks Scored
Per capita income of 20 countries was analysed using the variables described in Table 1.
Descriptive statistics of the variables and correlations are shown in Tables 2 and 3 respectively.
Table 3 Correlations
             CI         Gini      CS         PerCapita
CI           1          -.464*    -.612**    .862**
Gini         -.464*     1         .253       -.338
CS           -.612**    .253      1          -.556*
PerCapita    .862**     -.338     -.556*     1
Model 1
Y (Per Capita) = β0 + β1 × CI
A simple linear regression model (Model 1) is developed between per capita income (Y) and corruption
index (CI). SPSS model outputs are shown in Tables 4 and 5. The Normal P-P Plot and Residual Plot
are shown in Figures 1 and 2 respectively.
Table 5 Coefficients (a)
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.
                 B             Std. Error       Beta
1  (Constant)    -3112.753     5950.818                                      -.523    .607
   CI            662.914       91.706           .862
a. Dependent Variable: Per Capita
SOLUTION:
r = 0.862 (correlation between per capita income and CI)
R² = r² = (0.862)² = 0.743
74.3% of the variation in per capita income is explained by CI.
SOLUTION:
We can use either F test or t test for this problem.
H0 : β1 = 0
H1 : β1 ≠ 0
t test:
t-stat = (B1 - β1) / SE(B1) = (662.914 - 0) / 91.706 = 7.229
t-critical (0.025, 18) = 2.101
t-stat > t-critical, so we reject the null hypothesis.
F test:
F = (SSR/k) / (SSE/(n-k-1)) = 52.05
F-critical (0.05, 1, 18) = 4.414
F > F-critical, so we reject the null hypothesis.
There is a statistically significant relationship between per capita income and
CI at the 5% significance level.
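The two test statistics above can be reproduced with a short script. This is only a sketch: because the ANOVA table (Table 4) is not reproduced here, F is computed from R² via the identity F = (R²/k) / ((1 - R²)/(n-k-1)), which is an assumption about how the quoted value was obtained. The critical values are the table values quoted in the solution, not computed.

```python
# Slope significance for Model 1, using the figures quoted in the solution.
b1, se_b1 = 662.914, 91.706   # slope and its standard error (Table 5)
r, n, k = 0.862, 20, 1        # correlation, sample size, number of predictors

t_stat = (b1 - 0.0) / se_b1                    # tests H0: beta1 = 0
r2 = r ** 2
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # same as (SSR/k) / (SSE/(n-k-1))

T_CRIT, F_CRIT = 2.101, 4.414                  # table values used in the solution

print(f"t = {t_stat:.3f}, reject H0: {abs(t_stat) > T_CRIT}")
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```

With one predictor, t² and F agree only up to the rounding of r, which is why the two routes give slightly different numbers.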
SOLUTION:
H0 : β1 ≤ 500
H1 : β1 > 500
T test:
t-stat = (B1 - β1) / SE(B1) = (662.914 - 500) / 91.706 = 1.776
t-critical(0.1,18) = 1.33 (since one tailed test)
t-stat > t-critical, we reject null hypothesis.
Therefore, the per capita income increases by at least 500 dollars for every one
unit increase in corruption index at 10% significance level.
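A minimal sketch of this one-tailed test, assuming the slope and standard error from Table 5 and the approximate table value 1.33 used in the solution. The variable names are illustrative only.

```python
# One-tailed t test of the slope against a hypothesised value of 500.
b1, se_b1 = 662.914, 91.706   # slope and standard error (Table 5)
beta_h0 = 500.0               # hypothesised slope under H0

t_stat = (b1 - beta_h0) / se_b1
T_CRIT_ONE_TAILED = 1.33      # t(0.10, 18), table value quoted in the solution

reject_h0 = t_stat > T_CRIT_ONE_TAILED
print(f"t = {t_stat:.3f}, reject H0 at the 10% level: {reject_h0}")
```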
SOLUTION:
Figure 1: The errors approximately follow a normal distribution.
Figure 2: The residual plot shows no pattern, so we can conclude that the errors are
homoscedastic.
SOLUTION:
Ŷ_i ± t(α/2, n-k-1) × s_e × √( 1/n + (X_i - X̄)² / SSX )
Ŷ_i = 30032.947
t(α/2, 18) = 2.101
s_e = 8241.439
X_i = 50, X̄ = 61.70
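The interval formula can be wrapped in a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and SSX is not given in this section, so the value used in the call is a placeholder and not exam data.

```python
import math

def mean_response_ci(y_hat, t_crit, s_e, n, x, x_bar, ssx):
    """Confidence interval for the mean response at x in simple linear regression."""
    half_width = t_crit * s_e * math.sqrt(1.0 / n + (x - x_bar) ** 2 / ssx)
    return y_hat - half_width, y_hat + half_width

# SSX below is a placeholder only; it is NOT taken from the exam (Table 2 is not shown).
SSX_PLACEHOLDER = 8000.0
low, high = mean_response_ci(30032.947, 2.101, 8241.439, 20, 50, 61.70, SSX_PLACEHOLDER)
print(f"interval with placeholder SSX: ({low:.1f}, {high:.1f})")
```

Replacing the placeholder with the actual SSX from the descriptive statistics gives the interval asked for in the question.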
A second model is developed between Per Capita and Communists States (CS).
Model 2
Y (Per Capita) = β0 + β1 × CS
The output is shown in Table 6. Normal probability and residual plots are shown in Figures 3 and
4 respectively.
Table 6 Coefficients (a)
Model            Unstandardized Coefficients    Standardized Coefficients    t         Sig.
                 B             Std. Error       Beta
1  (Constant)    42743.933     3495.319                                      12.229    .000
   CS                          6990.639         -.556
a. Dependent Variable: PerCapita
SOLUTION:
Ŷ = β0 + β1 × X
β1 = Std β × (S_Y / S_X) = -19831.646
β0 = 42743.933
X = 1 (for communist states)
Ŷ = 42743.933 - 19831.646 = 22912.287
SOLUTION:
From Table 6, the p-value is reported as .000.
Since the p-value is below 0.05, the model appears significant. However, Figure 4 shows clear
heteroscedasticity, so the p-value may not be reliable.
A stepwise regression model is developed using CI and Gini as independent variables and the
outputs are shown in Table 7.
Table 7 Stepwise Regression Output
Model            Unstandardized Coefficients    t        Sig.    Correlations
                 B             Std. Error                        Zero-order    Partial    Part
1  (Constant)    -3112.753     5950.818         -.523    .607
   CI            662.914       91.706           7.229    .000    .862          .862       .862
2  (Constant)    -10781.284    14572.250        -.740    .469
   CI            691.235       105.487          6.553    .000    .862          .846       .797
   Gini                                                          -.338         .139       .070
SOLUTION:
R² after adding Gini = R² of model without Gini + (part correlation of Gini in new model)²
= 0.743 + (0.070)² = 0.7479
SOLUTION:
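Use the partial F test for this problem.
H0 : β2 = 0 (coefficient of Gini)
H1 : β2 ≠ 0
Partial F test:
F = ((R²_F - R²_R) / r) / ((1 - R²_F) / (n - k - 1)) = 0.33048
where R²_F and R²_R are the R² values of the full and reduced models and r = 1 is the number of added variables.
F critical (0.05, 1, 17) = 4.451
Since F < F critical, we fail to reject H0: adding Gini does not significantly improve the model.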
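The R² increment and the partial F statistic can be checked together. A minimal sketch, assuming the reduced-model R² is the unrounded 0.862² and that the part correlation of Gini is the .070 read from Table 7; the helper name `partial_f` is illustrative.

```python
def partial_f(r2_full, r2_reduced, added, n, k_full):
    """Partial F statistic for `added` extra predictors in a model with k_full predictors."""
    return ((r2_full - r2_reduced) / added) / ((1 - r2_full) / (n - k_full - 1))

r2_reduced = 0.862 ** 2          # Model with CI only
part_gini = 0.070                # part correlation of Gini (Table 7)
r2_full = r2_reduced + part_gini ** 2

f_stat = partial_f(r2_full, r2_reduced, added=1, n=20, k_full=2)
F_CRIT = 4.451                   # F(0.05, 1, 17), table value quoted above

print(f"R2 with Gini = {r2_full:.4f}")
print(f"partial F = {f_stat:.5f}, significant: {f_stat > F_CRIT}")
```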
SOLUTION:
VIF = 1 / (1 - R²) = 1 / (1 - (0.464)²) = 1.27436
(R = correlation between CI and Gini = -0.464)
Since VIF < 4, there is no serious multicollinearity between CI and Gini.
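As a quick check of the arithmetic, the VIF for two predictors depends only on their pairwise correlation. The threshold of 4 is the rule of thumb used in the solution, not a universal cutoff.

```python
# VIF for a two-predictor model: R^2 of one predictor on the other is r^2.
r_ci_gini = -0.464               # correlation between CI and Gini (Table 3)
vif = 1.0 / (1.0 - r_ci_gini ** 2)

VIF_THRESHOLD = 4.0              # rule of thumb used in the solution
print(f"VIF = {vif:.5f}, multicollinearity concern: {vif >= VIF_THRESHOLD}")
```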
A stepwise regression model is developed using all 3 independent variables, and the SPSS
outputs are given in Tables 8 and 9.
Table 8 Coefficients (a)
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.
                 B             Std. Error       Beta
1  (Constant)    -3112.753     5950.818                                      -.523    .607
   CI            662.914       91.706           .862                         7.229    .000
a. Dependent Variable: PerCapita
SOLUTION:
We have to do an F test for each simple regression: Per Capita on Gini, and Per Capita on CS.
F test:
F = R² / ((1 - R²) / 18)
For each of the two models (Per Capita on Gini, Per Capita on CS):
H0 : β1 = 0
H1 : β1 ≠ 0
Model 1 ANOVA
Model            Sum of Squares    Df    Mean Square    F         Sig.
1  Regression    1662.584          1     1662.584       17.852    .000
   Residual      7264.416          78    93.134
   Total         8927.000          79
Model 1 Coefficients
Coefficients Coefficients
Model 2 Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .764    .583        .561                 7.042                         1.878
a Predictors: (Constant), Test3, Seniority, Test1, Test2
b Dependent Variable: JobPerf
Model 2 ANOVA
Model            Sum of Squares    Df    Mean Square    F         Sig.
1  Regression    5208.110          4     1302.027       26.258    .000
   Residual      3718.890          75    49.585
   Total         8927.000          79
Model 2 Coefficients
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.    Collinearity Statistics
                 B        Std. Error            Beta                                          Tolerance    VIF
1  (Constant)    6.557    6.187                                              1.060    .293
   Seniority     .801     .155                  .388                         5.171    .000    .986         1.014
   Test1         .300     .112                  .271                         2.693    .009    .550         1.819
   Test2         .086     .135                  .080                         .640     .524    .355         2.816
   Test3         .407     .154                  .352                         2.638    .010    .313         3.197
Model 3 Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .762    .581        .565                 7.014                         1.891
a Predictors: (Constant), Test3, Seniority, Test1
b Dependent Variable: JobPerf
Model 3 ANOVA
Model            Sum of Squares    Df    Mean Square    F         Sig.
1  Regression    5187.803          3     1729.268       35.148    .000
   Residual      3739.197          76    49.200
   Total         8927.000          79
Model 3 Coefficients
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.    Collinearity Statistics
                 B        Std. Error            Beta                                          Tolerance    VIF
1  (Constant)    7.893    5.801                                              1.361    .178
   Seniority     .793     .154                                               5.157    .000    .993         1.008
   Test1         .312     .110                                               2.844    .006    .565         1.771
   Test3         .473     .114                                               4.145    .000    .567         1.764
Model 4 Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .757    .574        .562                 7.031                         1.843
a Predictors: (Constant), AvgScore, Seniority
b Dependent Variable: JobPerf
Model 4 ANOVA
Model            Sum of Squares    Df    Mean Square    F         Sig.
1  Regression    5120.011          2     2560.006       51.779    .000
   Residual      3806.989          77    49.441
   Total         8927.000          79
Model 4 Coefficients
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.    Collinearity Statistics
                 B        Std. Error            Beta                                          Tolerance    VIF
1  (Constant)    5.407    6.010                                              .900     .371
   Seniority     .821     .154                  .398                         5.339    .000    .997         1.003
   AvgScore      .782     .094                  .623                         8.362    .000    .997         1.003
Use the information given above to answer the following questions. Specify the model(s) you
use to draw your conclusions, where relevant.
a) Can it be concluded that performance rating improves with length of stay with the company
(Seniority), irrespective of the original test scores? Select the appropriate model to answer the
question. (3 points)
Ans. Here we use Model 1 to test the hypothesis, since test scores are ignored.
H0 : β1 <= 0 vs. H1 : β1 > 0 (one-sided t-test)
S.E(β1) = Se / ((n-1) * Sx²)^(.5) = 9.651 / (79*25)^(.5) = 0.217
where Sx² denotes the sample variance of Seniority (taken as 25).
So the t-statistic is given by
tcalc = (β1 - 0) / S.E(β1) = .891 / .217 = 4.103
From the table, the critical value is t.05,78 = 1.665
So, tcalc > t.05,78.
So, we reject H0 and conclude that performance rating increases with length
of stay with the company.
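The standard error and t-statistic for part (a) can be recomputed from the Model 1 ANOVA table. A sketch only: the slope .891 and the variance 25 are the values used in the answer above (the Model 1 coefficient table is not reproduced), and the critical value is the quoted table value.

```python
import math

mse = 93.134        # residual mean square, Model 1 ANOVA
n = 80              # total df 79 + 1
var_x = 25.0        # sample variance of Seniority, as used in the answer
b1 = 0.891          # Model 1 slope for Seniority, as used in the answer

s_e = math.sqrt(mse)                          # standard error of the estimate
se_b1 = s_e / math.sqrt((n - 1) * var_x)      # standard error of the slope
t_stat = b1 / se_b1                           # tests H0: beta1 <= 0

T_CRIT = 1.665                                # t(0.05, 78), table value
print(f"Se = {s_e:.3f}, SE(b1) = {se_b1:.3f}, t = {t_stat:.3f}, reject H0: {t_stat > T_CRIT}")
```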
b) Predict the average performance rating for a worker who has 15 months of Seniority. What
are the highest and lowest performance ratings that this worker is likely to get at 90%
confidence level? (3 Points)
Ans. To predict the average performance rating of a worker with 15 months of experience:
AvgY = b0 + b1 * Seniority = 48.928 + .891*15 = 62.293
Hence, the average performance rating of the worker is 62.293
Prediction interval at 90% confidence level (note: PI for an individual, not the average)
P.I = (Y - t.05,78 * Se * (1+(1/n)+(X-X̄)²/SSx)^(.5) , Y + t.05,78 * Se * (1+(1/n)+(X-X̄)²/SSx)^(.5))
= (62.293 - 1.664 * 9.651 * (1+(1/80)+(15-18.89)²/(79*25))^(.5) ,
62.293 + 1.664 * 9.651 * (1+(1/80)+(15-18.89)²/(79*25))^(.5))
= (62.293 - 16.220 , 62.293 + 16.220)
= (46.073 , 78.513)
So the highest and lowest performance ratings are 78.513 and 46.073.
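The prediction interval arithmetic is easy to get wrong by hand, so here is a minimal sketch that evaluates the expression step by step. All inputs are the values quoted in the answer (mean Seniority 18.89, SSx = 79 × 25, t = 1.664); nothing else is assumed.

```python
import math

b0, b1 = 48.928, 0.891      # Model 1 intercept and slope, as used in the answer
x, x_bar = 15.0, 18.89      # Seniority of the worker and sample mean
n = 80
s_e = 9.651                 # standard error of the estimate
ssx = 79 * 25               # (n - 1) * sample variance of Seniority
t_crit = 1.664              # t(0.05, 78), table value for a 90% two-sided interval

y_hat = b0 + b1 * x
half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / ssx)
lower, upper = y_hat - half_width, y_hat + half_width

print(f"prediction = {y_hat:.3f}, margin = {half_width:.3f}")
print(f"90% prediction interval = ({lower:.3f}, {upper:.3f})")
```

The leading 1 under the square root is what distinguishes a prediction interval for an individual worker from a confidence interval for the mean rating.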
c) If Test 2 was used to predict performance scores on its own, is it likely to be a significant
predictor of JobPerf? Justify. In the presence of the other 3 variables, is it a significant predictor?
Why or why not? (3 Points)
Ans. We can see from the correlation matrix that Corr(Test 2, JobPerf) = 0.52.
So if we build a model using Test 2 alone, it will have R² = 0.52² = 0.2704, i.e. it will
explain 27% of the variation in JobPerf, which is satisfactory. So, we can conclude
that a significant relationship between Job Performance and Test 2 is likely.
From Model 2, we see that in the presence of the other 3 variables, the coefficient of Test2 has a
p-value = .524, indicating that in the presence of the other variables, Test 2 is not a
significant predictor. The reason for this is that Test 2 has high correlation with Test 3
(.8) and Test 1 (.6). The part of the variation explained by Test 2 has largely been
explained by these other variables, leading to Test 2 becoming insignificant.
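The claim that Test 2 alone is likely to be significant can be backed by the standard identity t = r·√(n-2) / √(1-r²) for a simple regression. This check is an addition, not part of the original answer, and the critical value 1.99 is an approximate two-tailed table value for 78 degrees of freedom.

```python
import math

r = 0.52            # Corr(Test 2, JobPerf) from the correlation matrix
n = 80              # sample size (total df 79 + 1)

r2 = r ** 2
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r2)   # t for the slope in a simple regression

T_CRIT_APPROX = 1.99   # approximate t(0.025, 78); an assumption, read from a t table
print(f"R2 = {r2:.4f}, t = {t_stat:.3f}, significant: {abs(t_stat) > T_CRIT_APPROX}")
```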
d) Can it be concluded that employees with higher average scores on the tests stay longer with
the company? Choose the appropriate models to compare. (3 points)
Ans. Here we consider Model 1 and Model 4. We use omitted variable bias to
conclude.
Model 1: 48.928 + 0.891*Seniority
Model 4: 5.407 + 0.821*Seniority + 0.782*AvgScore
The formula for omitted variable bias is given by -
α1 = β1 + β2 * Cov(X1,X2) / Var(X1) where, X1=Seniority , X2=AvgScore
Now , we are given α1 =0.891 , β1 = 0.821 , β2 = 0.782
Therefore,
Cov(X1,X2) / Var(X1) = (.891-.821)/.782 = .07/.782 > 0
Also, Var(X1) > 0 always.
=> Cov(X1,X2) > 0 i.e. X1 and X2 are positively correlated.
So, we can conclude that employees with higher average scores stay longer with the
company.
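The omitted variable bias argument reduces to one line of arithmetic. A minimal sketch using the three coefficients quoted above; the variable names are illustrative.

```python
# Omitted variable bias: alpha1 = beta1 + beta2 * Cov(X1, X2) / Var(X1)
alpha1 = 0.891   # Seniority slope in Model 1 (AvgScore omitted)
beta1 = 0.821    # Seniority slope in Model 4
beta2 = 0.782    # AvgScore slope in Model 4

cov_over_var = (alpha1 - beta1) / beta2   # = Cov(X1, X2) / Var(X1)
positively_related = cov_over_var > 0     # Var(X1) > 0, so the sign is that of Cov(X1, X2)

print(f"Cov/Var = {cov_over_var:.4f}, Seniority and AvgScore positively related: {positively_related}")
```

The ratio is also the slope from regressing AvgScore on Seniority, so it shows only the direction of the association, not its statistical significance.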
e) Two employees whose seniority differs by 5 months have the same average test score. Can it
be concluded that the performance rating of the more senior employee will be at least 3 points
higher at 5% significance level? (3 Points)
Ans. Here, the seniority differs by 5 months and we have to test if performance ratings
changes by at least 3 points.
So, in one month the rating would have to change by (3/5) = 0.6 points.
The hypothesis:
H0 : β1 <= 0.6 vs. H1 : β1 > 0.6 (right-tailed t-test)
The t-statistic is computed as -
tcalc = (0.821 – 0.6) / 0.154 = 0.221 / 0.154 = 1.435
From the t table we get : t.05,77 = 1.664
Hence tcalc < t.05,77
Therefore, we cannot reject the null hypothesis.
Hence, we cannot conclude that given a seniority difference 5 months, the
performance rating of the more senior employee will differ by at least 3 points.
Table 3.2
ANOVA
              Df     SS          MS           F           Significance F
Regression    3      19377.83    6459.2767    203.5514
Residual      539    17104.06    31.7329
Total         542    36481.89
(i) Fill up the Tables 3.1, 3.2 and 3.3 above (except the p values and the Significance
F values). Clearly write all the steps. [10 points]
SOLUTION
STEP I:
We first fill the data of Table 3.3. We need to standardize the coefficients, which means we have
to find the standard deviation of the response variable Y (= percentage of votes polled to the winning
candidate).
S_Y = √(SS_Y / (N - 1)) = √(36481.89 / (543 - 1)) = 8.20425
Each t-statistic is the estimated coefficient divided by its standard error, t = β̂_i / SE(β̂_i):
t(β0) = 38.59235 / 0.937225 = 41.177252
t(β1) = 5.32E-05 / 2.18E-06 = 24.40367
t(β2) = 1.551306 / 0.777806 = 1.99446
t(β3) = -1.47506 / 0.586995 = -2.5129
STEP II:
We next fill Table 3.2. We have
SSE = 17104.06
SST = 36481.89
so that SSR = SST - SSE = 19377.83. The regression degrees of freedom is 4 - 1 = 3, and the
residual degrees of freedom is the total (542) minus this 3, i.e. 539.
Hence, the mean-square values are the respective sums of squares divided by their degrees
of freedom:
MS Regression = MSR = SSR / df_R = 19377.83 / 3 = 6459.2767 and
MS Residual = MSE = SSE / df_residual = 17104.06 / 539 = 31.732949.
STEP III:
We now fill Table 3.1. We have:
R² = SSR / SST = 19377.83 / 36481.89 = 0.531163
Adjusted R² = 1 - (1 - R²)(N - 1) / (N - k - 1) = 1 - (1 - 0.531163)(542) / (543 - 3 - 1) = 0.528553
Standard Error (of the model) = √MSE = √31.732949 = 5.6332
This completes this part.
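Steps II and III can be reproduced from just SST, SSE, N and k. The sketch below recomputes the ANOVA and summary entries; it uses only the figures quoted in the solution, and the variable names are illustrative.

```python
import math

sst, sse = 36481.89, 17104.06   # total and residual sums of squares
n_obs, k = 543, 3               # observations and number of regressors

ssr = sst - sse
df_reg, df_res = k, n_obs - k - 1
msr, mse = ssr / df_reg, sse / df_res
f_stat = msr / mse

r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)
std_error = math.sqrt(mse)

print(f"SSR = {ssr:.2f}, MSR = {msr:.4f}, MSE = {mse:.6f}, F = {f_stat:.3f}")
print(f"R2 = {r2:.6f}, adjusted R2 = {adj_r2:.6f}, standard error = {std_error:.4f}")
```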
(i) Assuming that t is significant for any value greater than 1.964 at 5%, are the
variables (margin, gender and college) significant? [2 points]
SOLUTION
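From (i), we have,
t(β1) = t(margin) = 24.40367 > 1.964
t(β2) = t(gender) = 1.99446 > 1.964
|t(β3)| = |t(college)| = 2.5129 > 1.964
so all three variables are significant at the 5% level.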
(ii) Assuming that the critical value of F is 2.621 at 5% significance, is the overall
regression significant? [2 points]
SOLUTION
The overall regression is SIGNIFICANT because
F_cal(3, 539) = MSR / MSE = 6459.2767 / 31.732949 = 203.551 > 2.621 at 5% significance level.
The analytics firm decides to dig a little deeper and looks at two outlying states, UP and AP, one
of which has significantly lower assets per winner and the other significantly higher. Both the
new variables are 0-1 variables. The values for some of the regressions are given below (Table
3.4).
(iii) What is the part correlation for College and % of votes in Regression model 3? [2
points]
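SOLUTION
(Part correlation for College)² = R²(model 3) - R²(model 2) = 0.531163 - 0.52567 = 0.005493
and hence the part correlation for College has magnitude √0.005493 ≈ 0.0741 (negative in sign, since the College coefficient is negative).
F_cal = ((R²_New - R²_Old) / No. of new regressors) / ((1 - R²_New) / (N - k - 1))
      = ((0.581339 - 0.52567) / 3) / ((1 - 0.581339) / (543 - 5 - 1))
      = 23.80148
> 2.621 at 5% significance level. Thus, it is justified to add the new variables.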
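Both calculations above follow from differences in R². A short sketch, assuming the R² values quoted in the solution; the negative sign given to the part correlation is an assumption based on the negative College coefficient.

```python
import math

r2_model2 = 0.52567     # R2 without College
r2_model3 = 0.531163    # R2 with College added
r2_model5 = 0.581339    # R2 of the model with 5 regressors
n_obs, k_new, added = 543, 5, 3

# Squared part correlation = increase in R2 when the variable enters last.
part_corr_college = -math.sqrt(r2_model3 - r2_model2)   # sign assumed from the coefficient

# Partial F for the regressors added between the old and new models.
f_cal = ((r2_model5 - r2_model2) / added) / ((1 - r2_model5) / (n_obs - k_new - 1))

F_CRIT = 2.621          # critical value quoted in the exam
print(f"part correlation (College) = {part_corr_college:.4f}")
print(f"partial F = {f_cal:.5f}, justified to add: {f_cal > F_CRIT}")
```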
Regression model 5 in Table 3.4 has a standard error of 5.333135 and an overall F value of 149.1324
with significance of 4.4 × 10^-99. The standard deviation for the dependent variable is 8.204253.
The values of standard deviation for the dependent and independent variables are given below
(Table 3.5).
Table 3.5
             Coefficients    Standard deviation
Intercept    38.56993
MARGIN       5.58E-05        111365.7
Gender       1.498308        0.311494
College      -1.53774        0.412796
UP           -3.71439        0.354761
AP           5.715821        0.209766
SOLUTION
To compare the relative impact, we have to standardise these coefficients. The formula is
Standardised β_i = β_i × (S_Xi / S_Y)
and here S_Y = 8.204253 (the standard deviation of the dependent variable), so that the values of the
standardised coefficients are
             Coefficients    Standard deviation    Standardised Coefficients
Intercept    38.56993
MARGIN       5.58E-05        111365.7              0.7574
Gender       1.498308        0.311494              0.0569
College      -1.53774        0.412796              -0.0774
UP           -3.71439        0.354761              -0.1606
AP           5.715821        0.209766              0.1461
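For reference, a sketch that standardises the coefficients using the usual definition, coefficient × (standard deviation of the regressor / standard deviation of Y). It takes the coefficients and standard deviations from Table 3.5 and the stated S_Y = 8.204253; the dictionary layout is illustrative.

```python
S_Y = 8.204253   # standard deviation of the dependent variable (given)

# name: (unstandardised coefficient, standard deviation of the regressor), from Table 3.5
table_3_5 = {
    "MARGIN": (5.58e-05, 111365.7),
    "Gender": (1.498308, 0.311494),
    "College": (-1.53774, 0.412796),
    "UP": (-3.71439, 0.354761),
    "AP": (5.715821, 0.209766),
}

# Standardised coefficient = b * S_X / S_Y
standardised = {name: b * s_x / S_Y for name, (b, s_x) in table_3_5.items()}

# Rank regressors by the absolute size of their standardised coefficient.
for name, beta in sorted(standardised.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:8s} {beta: .4f}")
```

Standardised coefficients are unit-free, so their absolute sizes can be compared directly across regressors measured on very different scales, such as MARGIN and the 0-1 variables.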