Causality
When there is a significant regression of y on x, it is tempting to conclude that x
causes y. However, one or more unknown variables that you have not measured, and
that are not included in the analysis, may be causing the observed relationship. In
general, the statistician reports the results of an analysis but leaves conclusions
concerning causality to the scientists and investigators who are experts in these areas.
These experts are better prepared to make such decisions!
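A small simulation can make this warning concrete. In the sketch below (Python, standard library only), a hypothetical lurking variable z drives both x and y, while x has no effect on y at all. The regression of y on x is nonetheless strong. The helper names, sample size, and noise levels are arbitrary assumptions chosen for illustration. The second slope uses the Frisch-Waugh-Lovell result: regressing the part of y not explained by z on the part of x not explained by z gives the coefficient of x in a regression of y on both x and z.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def residuals(v, z):
    """Residuals of v after a least-squares fit of v on z (hypothetical helper)."""
    n = len(v)
    b = slope(z, v)
    a = sum(v) / n - b * sum(z) / n
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]

random.seed(1)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]        # unmeasured lurking variable
x = [zi + random.gauss(0, 0.5) for zi in z]       # x depends only on z
y = [2 * zi + random.gauss(0, 0.5) for zi in z]   # y depends only on z, never on x

naive = slope(x, y)                                 # population value is 2 / 1.25 = 1.6
adjusted = slope(residuals(x, z), residuals(y, z))  # population value is 0
print(f"slope of y on x:          {naive:.2f}")
print(f"slope after accounting z: {adjusted:.2f}")
```

In practice z is usually unknown or unmeasured, so the second calculation is not available; that is exactly why a significant slope alone cannot establish that x causes y.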
12.5 EXERCISES

BASIC TECHNIQUES

12.19 Refer to Exercise 12.6. The data are reproduced below.

x  -2  -1  0  1  2

a. Do the data present sufficient evidence to indicate that y and x are linearly
related? Test the hypothesis that β = 0 at the 5% level of significance.
b. Use the ANOVA table from Exercise 12.6 to calculate F = MSR/MSE. Verify that the
square of the t statistic used in part a is equal to F.
c. Compare the two-tailed critical value for the t-test in part a with the critical
value for F with α = .05. What is the relationship between the critical values?

12.20 Refer to Exercise 12.19. Find a 95% confidence interval for the slope of the
line. What does the phrase “95% confident” mean?

12.21 Refer to Exercise 12.7. The data, along with the MINITAB analysis of variance
table, are reproduced below.

x  1    2    3    4    5    6
y  5.6  4.6  4.5  3.7  3.2  2.7

MINITAB ANOVA table for Exercise 12.21

Regression Analysis: y versus x

Analysis of Variance
Source          DF  SS      MS      F       P
Regression       1  5.4321  5.4321  152.10  0.000
Residual Error   4  0.1429  0.0357
Total            5  5.5750

a. Do the data provide sufficient evidence to indicate that y and x are linearly
related? Use the information in the MINITAB printout to answer this question at the
1% level of significance.
b. Calculate the coefficient of determination r². What information does this value
give about the usefulness of the linear model?

12.22 Refer to Exercise 12.8. The data, along with the MS Excel analysis of variance
table, are reproduced below:

x  1  2  3  4  5  6

MS Excel ANOVA table for Exercise 12.22

ANOVA
            df  SS        MS        F       Significance F
Regression   1  49.72857  49.72857  111.45  0.000
Residual     4   1.78476   0.4462
Total        5  51.51333

a. Do the data provide sufficient evidence to indicate that y and x are linearly
related? Use the information in the printout to answer this question at the 5% level
of significance.
b. Calculate the coefficient of determination r². What information does this value
give about the usefulness of the linear model?

APPLICATIONS

12.23 Chirping Crickets (EX1223) In Exercise 3.18, we found that male crickets chirp
by rubbing their front wings together, and their chirping is temperature dependent.
The table below shows the number of chirps per second for a cricket, recorded at 10
different temperatures:

Chirps per Second  20  16  19  18  18  16  14  17  15  16
Temperature        88  73  91  85  82  75  69  82  69  83

a. Use the formulas given in this chapter to find the least-squares regression line
relating the number of chirps to temperature. Compare to the results obtained in
Exercise 3.18.
b. Do the data provide sufficient evidence to indicate that there is a linear
relationship between number of chirps and temperature?
c. Calculate r². What does this value tell you about the effectiveness of the linear
regression analysis?
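The arithmetic behind the ANOVA tables in these exercises can be sketched in a few lines of Python (standard library only). The function below computes the least-squares line, the sums of squares, F = MSR/MSE, the t statistic for testing β = 0, and a 95% confidence interval for the slope. The name simple_regression and its dictionary layout are hypothetical, and the critical value t.025 = 2.776 for n - 2 = 4 degrees of freedom is hard-coded from a t table rather than computed. Run on the Exercise 12.21 data, it reproduces the SSR, SSE, and F shown in the MINITAB printout.

```python
import math

def simple_regression(x, y):
    """Least-squares fit y-hat = a + b*x plus the ANOVA quantities (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)          # Total SS
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # intercept
    ssr = sxy ** 2 / sxx          # regression SS; MSR = SSR / 1 for one predictor
    sse = syy - ssr               # residual (error) SS
    mse = sse / (n - 2)           # estimate of sigma^2
    se_b = math.sqrt(mse / sxx)   # standard error of the slope
    return {"a": a, "b": b, "SSR": ssr, "SSE": sse, "TotalSS": syy,
            "MSE": mse, "F": ssr / mse, "t": b / se_b, "se_b": se_b,
            "r2": ssr / syy}

# Data from Exercise 12.21
x = [1, 2, 3, 4, 5, 6]
y = [5.6, 4.6, 4.5, 3.7, 3.2, 2.7]
fit = simple_regression(x, y)

print(f"SSR = {fit['SSR']:.4f}, SSE = {fit['SSE']:.4f}, F = {fit['F']:.2f}")
# prints: SSR = 5.4321, SSE = 0.1429, F = 152.10  (matches the MINITAB table)
print(f"t^2 = {fit['t'] ** 2:.2f}")   # equals F for a single predictor

# 95% CI for beta: b +/- t_.025 * SE(b), with t_.025 = 2.776 for 4 df (assumed from a t table)
t_crit = 2.776
lo = fit["b"] - t_crit * fit["se_b"]
hi = fit["b"] + t_crit * fit["se_b"]
print(f"95% CI for beta: ({lo:.4f}, {hi:.4f})")
```

Because t² = F whenever there is a single predictor, the t-test and the F-test of β = 0 always lead to the same conclusion. The same function can be applied to the cricket data of Exercise 12.23, provided the critical t value is taken for that sample's n - 2 = 8 degrees of freedom.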
03758_13_ch12_p482-529.qxd 9/7/11 1:06 PM Page 501
12.24 Gestation Times and Longevity (EX1224) The table below, a subset of the data
given in Exercise 3.33, shows the gestation time in days and the average longevity in
years for a variety of mammals in captivity.⁴

Animal            Gestation (days)  Avg Longevity (yrs)
Baboon            187               20
Bear (black)      219               18
Bison             285               15
Cat (domestic)     63               12
Elk               250               15
Fox (red)          52                7
Goat (domestic)   151                8
Gorilla           258               20
Horse             330               20
Monkey (rhesus)   166               15
Mouse (meadow)     21                3
Pig (domestic)    112               10
Puma               90               12
Sheep (domestic)  154               12
Wolf (maned)       63                5

a. If you want to estimate the average longevity of an animal based on its gestation
time, which variable is the response variable and which is the independent predictor
variable?
b. Assume that there is a linear relationship between gestation time and longevity.
Calculate the least-squares regression line describing longevity as a linear
function of gestation time.
c. Plot the data points and the regression line. Does it appear that the line fits
the data?
d. Use the appropriate statistical tests and measures to explain the usefulness of
the regression model for predicting longevity.

12.25 Professor Asimov, continued Refer to the data in Exercise 12.9, relating x, the
number of books written by Professor Isaac Asimov, to y, the number of months he took
to write his books (in increments of 100). The data are reproduced below.

Number of Books, x  100  200  300  400  490
Time in Months, y   237  350  419  465  507

a. Do the data support the hypothesis that β = 0? Use the p-value approach, bounding
the p-value using Table 4 of Appendix I. Explain your conclusions in practical terms.
b. Use the ANOVA table in Exercise 12.9, part c, to calculate the coefficient of
determination r². What percentage reduction in the total variation is achieved by
using the linear regression model?
c. Plot the data or refer to the plot in Exercise 12.9, part b. Do the results of
parts a and b indicate that the model provides a good fit for the data? Are there any
assumptions that may have been violated in fitting the linear model?

12.26 Refer to the sleep deprivation experiment described in Exercises 12.11 and 12.12
and data set EX1211. The data and the MINITAB and MS Excel printouts are reproduced
here.

Number of Errors, y               8, 6  6, 10  8, 14  14, 12  16, 12
Number of Hours without Sleep, x  8     12     16     20      24

MINITAB output for Exercise 12.26

Regression Analysis: y versus x

The regression equation is
y = 3.00 + 0.475 x

Predictor  Coef    SE Coef  T     P
Constant   3.000   2.127    1.41  0.196
x          0.4750  0.1253   3.79  0.005

S = 2.24165   R-Sq = 64.2%   R-Sq(adj) = 59.8%

Analysis of Variance
Source          DF  SS       MS      F      P
Regression       1   72.200  72.200  14.37  0.005
Residual Error   8   40.200   5.025
Total            9  112.400

MS Excel output for Exercise 12.26

ANOVA
            df  SS     MS     F       Significance F
Regression   1   72.2  72.2   14.368  0.005
Residual     8   40.2  5.025
Total        9  112.4

           Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept  3             2.1266          1.4107  0.1960   -1.9040    7.9040
x          0.475         0.1253          3.7905  0.0053    0.1860    0.7640

a. Do the data present sufficient evidence to indicate that the number of errors is
linearly related to the number of hours without sleep? Identify the two test
statistics in the printout that can be used to answer this question.
b. Would you expect the relationship between y and x to be linear if x varied over a
wider range (say, x = 4 to x = 48)?
c. How do you describe the strength of the relationship between y and x?
d. What is the best estimate of the common population variance σ²?
e. Find a 95% confidence interval for the slope of the line.

MINITAB output for Exercise 12.28

Regression Analysis: y versus x

The regression equation is
y = -26.8 + 1.26 x

Predictor  Coef    SE Coef  T      P
Constant   -26.82  14.76    -1.82  0.086
x          1.2617  0.1685    7.49  0.000