STAT 302-1 Sample Final Exam

STAT 302 Sample Final
Simon Fraser University
STAT 302: Final Examination

Date:
Student Number:
Last Name:
First Name:

Programmable and graphic calculators are NOT allowed. Point values are given in parentheses. 65 points maximum. Duration: 3 hrs.

Question   1a-b   2a-c   3a-b   3c-d   4a   4b
Maximum    6      5      9      5      2    3
Score

Question   5f-g   6a-b   6c-d   7a   7b   7c
Maximum    4      4      5      2    2    2
Score

Question 1.
(a) (2 marks) Explain the difference between an influential point and an outlier.
(b) (4 marks) A simple linear regression (call this regression #1) was carried out and one point was determined to be an outlier but not an influential point. It is removed from the data and the regression line is refit using the remaining data (call this regression #2). Which of the following quantities will differ by a substantial amount between the two regressions? If there is a substantial difference, indicate for which regression (regression #1 or regression #2) the quantity is larger. Indicate if you don't have enough information to make a conclusion. If there is no substantial difference, give a reason.
(i) the slope
(ii) [illegible in source]

Question 2. (5 marks) For each of the following models, state whether its parameters can be estimated using standard linear regression techniques. If linear regression can be used, what are the independent and dependent variables?
(a) $Y_i = \beta_0 + \exp(\beta_1 X_i) + \varepsilon_i$ [Note: $\exp(x)$ means $e^x$]
(b) $Y_i = 1/(\beta_0 + \beta_1 X_i + \varepsilon_i)$
(c) $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$ where $\beta$ is known to be 8.

Question 3. The data analysed in this question are from a random sample of healthy adults. Simple linear regression was used to consider how well the age of an adult (in years, variable name is age) predicts his/her systolic blood pressure (in mmHg, variable name is pressure). Use the output below to answer the questions that follow.
Parameter estimates
              Estimate    Std.Err   t value   Pr(>|t|)
(Intercept)   112.31666   1.28744   87.24     <2e-16 ***
age           0.44509     0.02777   16.03     4.24e-12 ***

Residual standard error: 2.12 on 18 degrees of freedom
Multiple R-Squared: 0.9345, Adjusted R-squared: 0.9309
F-statistic: 256.8 on 1 and 18 DF, p-value: 4.239e-12

Analysis of Variance
            DF   Sum Sq    Mean Sq   F value   Pr(>F)
Age         1    1154.12   1154.12   256.84    4.239e-12
Residuals   18   80.88     4.49

(a) (7 points) Complete the chart (A through G) below.

Statistic                                                                          Observed Value
Slope of line                                                                      (A)
Correlation between age and systolic blood pressure                                (B)
Average change in systolic blood pressure for an increase of 10 in age             (C)
Estimate of systolic blood pressure when age is 45                                 (D)
Estimated variance of intercept                                                    (E)
P-value for test of $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$                (F)
Estimate of $\sigma$                                                               (G)

(b) (2 points) Briefly explain the principle of least squares.
(c) (2 points) Find a 95% confidence interval for the slope.
(d) (3 points) Given that a 90% prediction interval for the systolic blood pressure of a 50 year old is (130.7914, 138.3508), find a 90% confidence interval for the mean of systolic blood pressure when age is 50.

Question 4. A simple random sample of 15 apparently healthy children between the ages of 6 months and 15 years yielded the following data on age, X, and liver volume per unit of body weight (ml/kg), Y:

X      Y        X      Y
0.5    41       10.0   26
0.7    55       10.1   35
2.5    41       10.9   25
4.1    39       11.5   31
5.9    50       12.1   31
6.1    32       14.1   29
7.0    41       15.0   23
8.2    42

(a) (2 marks) Compute the sample correlation coefficient. You might find the following useful: $\sum x = 118.7$, $\sum x^2 = 1235.35$, $\sum y = 541$, $\sum y^2 = 20695$, $\sum xy = 3814.5$.
(b) (3 marks) Test $H_0: \rho = 0$ at the 0.05 level of significance and state your conclusion.

Question 5. A study was conducted on medical devices from three different suppliers, for the continuous delivery of an anti-inflammatory hormone. The remaining hormone in a device is expected to be linearly related to the time the device has been in use.
Hence, we study the relationship between the remaining hormone in a device (Y) and the time the device has been in use (X). The following two dummy (indicator) variables
Z1 = 1 if device is from Supplier A, 0 otherwise
Z2 = 1 if device is from Supplier B, 0 otherwise
index the three groups. Use the following output to answer the questions that follow:

Parameter estimates
              Estimate    Std.Err    t value   Pr(>|t|)
(Intercept)   37.193671   1.506316             <2e-16
X             -0.074518   0.012740             8.33e-06
Z1            -3.833616   1.933112             0.0606
Z2            -1.987554   1.844497             [illegible]
X*Z1          0.006222    0.014670             [illegible]
X*Z2          0.018232    0.013348   1.366     0.1864

Analysis of Variance Table 1:
                             DF   Sum Sq
X                            1    936.54
Z1 | X                       1    81.24
Z2 | Z1, X                   1    0.88
X*Z1 | Z2, Z1, X             1    3.88
X*Z2 | X*Z1, Z2, Z1, X       1    4.52
Residuals                    21   50.87

Analysis of Variance Table 2:
                             DF   Sum Sq
X                            1    936.54
Z1 | X                       1    81.24
X*Z1 | Z1, X                 1    4.59
Z2 | X*Z1, Z1, X             1    0.18
X*Z2 | Z2, X*Z1, Z1, X       1    4.52
Residuals                    21   50.87

(a) (1 mark) State a single regression model that defines straight-line models relating Y to X for all 3 suppliers.
(b) (3 marks) Test the null hypothesis that the straight lines for the three suppliers coincide.
(c) (3 marks) Test H0: "The (three) lines are parallel" versus Ha: "The lines are not parallel".
(d) (3 marks) Provide estimates of
(i) difference in intercepts between suppliers A and B
(ii) difference in intercepts between suppliers B and C
(e) (2 marks) Using the ANOVA tables given above, is it possible to test the hypothesis that the slopes and intercepts are the same for suppliers A and B? If it is possible, perform the test. If it is not possible, write "not possible" and state the null and alternative hypotheses in terms of regression coefficients.
(f) (2 marks) Using the ANOVA tables given above, is it possible to test the hypothesis that the slopes and intercepts are the same for suppliers A and C? If it is possible, perform the test.
If it is not possible, write "not possible" and state the null and alternative hypotheses in terms of regression coefficients.
(g) (2 marks) Using the ANOVA tables given above, is it possible to test the hypothesis that the slopes and intercepts are the same for suppliers B and C? If it is possible, perform the test. If it is not possible, write "not possible" and state the null and alternative hypotheses in terms of regression coefficients.

Question 6. A company wants to compare three different point-of-sale promotions for its snack foods. The three promotions are:
Promotion 1: Buy two items, get a third free.
Promotion 2: Mail in a rebate for $1.00 with any $2.00 purchase.
Promotion 3: Buy reduced-price multi-packs of each snack food.
The company is interested in the average increase in sales volume due to the promotions. Fifteen grocery stores were selected in a targeted market, and each store was randomly assigned one of the promotion types. During the month-long run of the promotions, the company collected data on increase in sales volume (Y, in hundreds of units) at each store, to be gauged against average monthly sales volume (X, in hundreds of units) prior to the promotions. Let Z1 = 1 if promotion type 1, or 0 otherwise. Let Z2 = 1 if promotion type 2, or 0 otherwise. The sample data are shown in the following table:

Store #   Promotion   Y    X
[entries for stores 1 to 9 illegible in source]
10                    13   44
11                    7    26
12                    5    20
13                    8    32
14                    7    36
15                    19   29

Output:

Parameter estimates
              Estimate   Std.Err   t value   Pr(>|t|)
(Intercept)   31.9216    24.4351   1.306     0.224
X             -0.4069    0.6919    -0.588    0.571
Z1            -39.9627   27.1686   -1.471    0.175
Z2            -36.2958   26.0337   -1.394    0.197
X*Z1          0.9802     0.7595    1.291     0.229
X*Z2          0.9975     0.7576    1.317     0.220

Analysis of Variance Table 1 (Y regressed on X, Z1, Z2, X*Z1, X*Z2):
                             DF   Sum Sq
X                            1    115.602
Z1 | X                       1    61.212
Z2 | Z1, X                   1    6.852
X*Z1 | Z2, Z1, X             1    2.414
X*Z2 | X*Z1, Z2, Z1, X       1    33.864
Residuals                    9    175.790

Parameter estimates
              Estimate   Std.Err   t value   Pr(>|t|)
(Intercept)   0.3004     7.5836    0.040     0.9691
X             0.4915     0.2081    2.362     0.0377
Z1            -5.2812    2.8145    -1.876    0.0874
Z2            -1.8580    3.1167    -0.596    0.5631

Analysis of Variance Table 2 (Y regressed on X, Z1, Z2):
                             DF   Sum Sq
X                            1    115.602
Z1 | X                       1    61.212
Z2 | Z1, X                   1    6.852
Residuals                    11   212.068

(a) (1 mark) State an ANACOVA regression model for comparing the three promotion types, controlling for average pre-promotion monthly sales.
(b) (3 marks) Identify the model that should be used to check whether the ANACOVA model in part (a) is appropriate. Carry out the appropriate test (α = 0.05).
(c) (3 marks) Using ANACOVA, fill in the table below with adjusted and unadjusted mean increases in sales volume for the three promotions.

Sales Increase   Adjusted Means   Unadjusted Means
Promotion 1
Promotion 2
Promotion 3

(d) (2 marks) Test whether the adjusted mean increases in sales volume for the three promotions differ significantly from one another.

Question 7. An experiment was conducted to evaluate the effects of X1, X2 and X3 (independent variables) on Y (the dependent variable).
Use the following output to answer the questions that follow:

Dependent variable: Y
Parameter estimates
              Estimate      Std.Err   t value   Pr(>|t|)
(Intercept)   [illegible]   4.004     16.125    6.01e-08 ***
X3            [illegible]   15.821    -6.337    0.000135 ***

Residual standard error: 3.765 on 9 degrees of freedom
Multiple R-Squared: 0.8169, Adjusted R-squared: 0.7966
F-statistic: 40.15 on 1 and 9 DF, p-value: 0.000135

Analysis of Variance Table 1 (Y regressed on X3):
            Df   Sum Sq
X3          1    569.04
Residuals   9    127.55

Dependent variable: Y
Parameter estimates
              Estimate   Std.Err   t value   Pr(>|t|)
(Intercept)   19.2978    1.5365    12.56     5.22e-07 ***
X2            3.1892     0.2193    14.54     1.48e-07 ***

Residual standard error: 1.778 on 9 degrees of freedom
Multiple R-Squared: 0.9592, Adjusted R-squared: 0.9546
F-statistic: 211.4 on 1 and 9 DF, p-value: 1.477e-07

Analysis of Variance Table 2 (Y regressed on X2):
            Df   Sum Sq
X2          1    668.14
Residuals   9    28.44

(a) (2 marks) For the two simple linear regressions, which variable (X3 or X2) do you think is a better predictor of Y? Why?

Here is an output from the multiple linear regression with independent variables X1, X2 and X3.

Dependent variable: Y
Parameter estimates
              Estimate   Std.Err   t value   Pr(>|t|)
(Intercept)   -1.5953    18.0435   -0.088    0.9320
X1            76.4568    44.2951   1.726     0.1280
X2            1.5758     0.7313    2.155     0.0681
X3            -23.7705   13.3461   -1.781    0.1181

Analysis of Variance Table 3:
[rows: Model, Residuals; entries illegible in source]

(b) (2 marks) What are the hypotheses for the analysis of variance F-test (Table 3) and what do you conclude? (Use α = 0.05)

Output of Pearson correlation coefficients:
      Y            X1           X2           X3
Y     1.00000      0.9588417    0.9793729    -0.9038247
X1    0.9588417    1.00000      0.9515815    -0.8190615
X2    0.9793729    0.9515815    1.00000      -0.8784658
X3    -0.9038247   -0.8190615   -0.8784658   1.00000

(c) (2 marks) Given the output of Pearson correlation coefficients, is there any indication of multicollinearity? How do you tell?
(d) (4 marks) Sketch typical residual plots that illustrate each of the following conditions. Clearly indicate what you are plotting.
(i) The error variance increases with X.
(ii) There is a non-linear relationship with X3.

TABLE A2 Percentiles of the t Distribution
[table values illegible in source]
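
The arithmetic behind Question 4 can be sketched in Python. This is only an illustrative sketch, not an official solution: it assumes the five quantities listed in Question 4(a) are the raw sums of x, x squared, y, y squared and xy for the n = 15 children, and the function names pearson_r_from_sums and t_stat_for_rho_zero are hypothetical helpers introduced here. The second function gives the usual t statistic for testing that the population correlation is zero, to be compared with a t distribution on n - 2 degrees of freedom.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Sample correlation computed from raw sums (hypothetical helper).

    Uses the corrected sums of squares and cross-products:
    Sxx = sum(x^2) - (sum x)^2 / n, and similarly for Syy and Sxy.
    """
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxy / math.sqrt(sxx * syy)


def t_stat_for_rho_zero(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


if __name__ == "__main__":
    # Assumption: these are the sums quoted in Question 4(a), with n = 15.
    n = 15
    r = pearson_r_from_sums(n, 118.7, 1235.35, 541, 20695, 3814.5)
    t = t_stat_for_rho_zero(r, n)
    print(f"r = {r:.4f}, t = {t:.3f} on {n - 2} df")
```

The absolute value of the resulting t statistic would then be compared with the two-sided 0.05 critical value from the t table for 13 degrees of freedom.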
