AP Stats Bivariate Data
BIVARIATE DATA
1. Each of the following except one is a description of correlation.
a. correlation can only be between -1 and 1
b. correlation is a unit of measure
c. correlation identifies the strength of a linear trend
d. a measure of zero means absolutely no correlation
e. correlation can only be computed for quantitative data

2. Given is the equation FinalScore = 55.3 + 0.2(MidtermScore), with r = 0.31. If the professor adds 10 points to each student’s MidtermScore, the new correlation will be
a. 0.31
b. 0.41
c. 0.49
d. 0.97
e. cannot predict without the original data

5. The equation MPG = 0.12(Speed) + 19.1 is given for the speed of the vehicle vs. miles per gallon. If our sample contained MPG of 26 for 50 mph, what is the residual for this speed?
a. -0.3
b. 0.9
c. -0.9
d. 0.3
e. 1.8

6. Adding a point along the same line, but far away from the rest of the points
a. makes the correlation stronger
b. makes the correlation weaker
c. makes the correlation negative
d. does not change the correlation
e. cannot tell with the given information

7. The equation Y = 0.88X + 1.69, with r = 0.885, is given. Which of these points will make the correlation stronger?
a. 0.9936
b. 0.9924
c. 0.9848
d. 0.9836
e. 0.9902
Copyright © 2000 William K. Bradford Publishing Company. All rights reserved. Permission to reproduce this master is granted to registered purchasers for their classroom use. Printed in the United States of America. 18
Name: __________________________________ Date: ____________
a. the distance away from the mean
b. how close the points are to the straight line
c. the value of Y when X is plugged into the equations
d. the difference between the observed and predicted response variable
e. the difference between the observed and predicted explanatory variable

10. Correlation tells us when the regression is the proper one.
a. this statement is completely true
b. this statement is only true for linear regressions
c. this statement is only true when the correlation is strong
d. this statement is never true
e. this statement is never true except when the correlation is close to 1

11. If Y = 3 + 4X and the average value for the response variable is 11, then the average value for the explanatory variable is

a. variables that are being tested that the tester does not know
b. variables that are only important some of the time
c. variables that are not measured but may explain the relationship
d. variables that tell us what happened prior to the test
e. variables that cause the relationship

14. What is the least-squares regression line?
a. a line that minimizes the sum of the squares of the vertical distances of the observed response variable
b. a line that minimizes the correlation coefficient on the response variable
c. a line that minimizes the coefficient of determination on the response variable
d. a line that minimizes the sum of the squares of the vertical distances of the observed explanatory variable
e. a line that minimizes the sum of the squares of the horizontal distances of the observed explanatory variable
a. 9.46
b. 0.9716
c. 0.9857
d. 1.08
e. cannot be determined from the given information

a. 22.095
b. -32.77
c. -12.67
d. 7.5124
e. 0

a. scattered residuals
b. high positive correlation
c. correlation close to 1 or -1
d. positive slope
e. having no outliers
20. Which of these cannot be true?
I. points that lie close to a straight line will have a correlation near 1
II. scattered residuals is an indicator of a good linear fit
III. influential points increase the correlation coefficient
a. I
b. II
c. III
d. I and III
e. II and III

21. A study by the New Mexico State Highway Department compared the fatality rate on New Mexico highways to United States highways for the years 1945-1984. The results of a regression analysis are shown.
i. In the above study, what is the correlation coefficient for the regression?
a. 0.635
b. 0.661
c. 0.789
d. 0.903
e. 0.950
ii. In the above study, if the New Mexico fatality rate was 6.71 deaths per million vehicle miles, what would be the predicted US rate?
a. 2.07
b. 4.26
c. 4.35
d. 4.44
e. 5.29

22. The following statement was made in a study of scores on a memory exam for senior citizens. “The paired sample data of age and score result in a linear correlation coefficient very close to 0.” Which of the following is incorrect?
a. older people tend to get lower scores
b. the line of best fit is virtually horizontal
c. age does not appear to influence score
d. there does not seem to be any relationship between age and score
e. score does not appear to influence age

23. The following statement was made in a report on spending habits of urban shoppers. “There is a significant positive linear correlation between per capita income and per capita spending.” Which of the following is incorrect?

24. All but one of the following statements contain a blunder in statistical reasoning. Which conclusion is stated correctly?
a. there is a correlation of 0.54 between the position a football player plays and his weight
b. the correlation between planting rate and yield of corn was found to be r = 0.23
c. the correlation between the gas mileage of a car and its weight is r = 0.71 MPG
d. researchers found a high correlation (r = 1.09) between the sleep time and IQ of children
e. the correlation between the age of a child and its height is -0.65
25. What does the squared correlation coefficient, r², measure?

28. Given the scatter plot as shown.
30. The average prices per ounce for gold and silver for a selected time range are shown in the table. What is the equation of the regression line if silver is the response variable?

Silver (x)   Gold (y)
5.47         368
7.01         478
6.53         438
5.50         383
4.82         385
4.04         363
3.94         345
4.30         361
5.30         389

a. Silver = -3.88 + 0.023 Gold
b. Gold = -3.88 + 0.023 Silver
c. Silver = 200.50 + 36.36 Gold
d. Gold = 200.50 + 36.36 Silver
e. none of the above

32. A study of selected western states calculated the regression line for average number of tourist visits versus the number of dollars spent on advertising in vacation magazines. The y-intercept was 16,000 and the slope was 0.95.
i. Given the information above, what average number of tourist visits does this regression predict for a state that spends $10,000 on advertising?
a. 16,950
b. 24,700
c. 25,200
d. 25,500
e. 26,840
ii. Given the information above, how much money should a state spend if it wants to average 40,000 tourist visits?
34. A regression was done on a randomly selected group of girls and their scores on the Wide Range Achievement Test. The equation was Score = -48.3 + 11.9 Age with r² = 0.912. About how old must a girl be if she received a score of 60?
a. 8.3
b. 9.1
c. 10.0
d. 11.7
e. 11.9

35. In order to transform test scores to a standard scale, a teacher added 6% to each student’s score. If the correlation was 0.42, what will be the new correlation?
a. 0.42
b. 0.45
c. 0.48
d. 0.78
e. none of the above

36. Consider the set of data points: (2, 8), (3, 11), (7, 23), (9, 29), (14, y). What value of y will make the correlation between x- and y-values equal to 1?
a. 44
b. 46
c. 51
d. 56
e. no value will work

38. The given data points were obtained.
x   1     3     4     7     9
y   0.2   1.8   3.2   9.8   16.2
Which of the following transformations to the response data will change the scatterplot of the data to a straight line?
a. take the natural logarithm of the response variable
b. take the square root of the response variable
c. take the exponential of the response variable
d. take the square of the response variable
e. add a constant to the response variable

39. The given data points were obtained.
x   1      3      4      7      9
y   2.24   3.87   4.47   5.92   6.71
Which of the following transformations to the response data will change the scatterplot of the data to a straight line?
a. take the natural logarithm of the response variable
b. take the square root of the response variable
c. take the exponential of the response variable
d. take the square of the response variable
e. add a constant to the response variable

40. What is the correlation coefficient for this plot?
a. -1
b. -0.99
c. 0
d. 0.99
e. 1

41. What is the correlation coefficient for this plot?
a. -1
b. -0.99
c. 0
d. 0.99
e. 1

42. What is the correlation coefficient for this plot?
a. -1
b. -0.98
c. 0
d. 0.98
e. 1

43. The following is a graph of residuals. Approximately, what is the residual for C1 = 15?
a. 5
b. 50
c. 100
d. -5
e. -50
BIVARIATE DATA • Free Response
1. The following table gives the number of hours studied and the grades obtained by
ten students selected at random from a large class.
Student 1 2 3 4 5 6 7 8 9 10
Hours studied (x ) 9 5 14 10 6 18 16 2 12 16
Exam Grade (y ) 52 37 66 64 43 90 81 27 51 86
d. Compute the residual for student number seven. Show your work.
e. Predict the score a student would likely receive if he studied for 8 hours.
Show your work.
2. This data is from the 1988-89 National Basketball Association Chicago Bulls team.
a. Create a scatter plot of Minutes Played (x) vs. Points Scored (y).
b. Calculate the regression equation for the data. Plot the line on the scatter plot.
d. Calculate the regression equation for the data with the influential observation
removed. Using a dotted line, graph the line on the scatter plot.
3. The following table gives the data for the number of women participating in the Olympic
Games from 1896 to 1992.
Year 1896 1900 1904 1908 1912 1920 1924 1928 1932 1936 1948
Women 0 11 6 36 57 64 136 290 127 328 385
Year 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992
Women 518 397 610 683 781 1171 1272 1192 1620 2438 3008
a. Find a proper regression for this data. Include a complete explanation why this
particular regression is used.
b. 1916, 1940, 1944 are the three years when Olympic Games were not held because
of the World Wars. Use extrapolation to estimate the number of women that
would have been present if the Games were held.
c. In 1980, the Games were held in Moscow, USSR. Due to the Cold War, some of the
countries boycotted the Games and did not attend. Eliminate the data for 1980 and
figure out a new regression.
d. Use the new regression to estimate the number of women that would have
attended in 1916, 1940, and 1944 if the Games were held.
4. The following is a computer graph of the life expectancy of males vs. females in
different countries all over the world.
a. Describe what you can tell from the graph about the relationship in this data.
b. If the correlation coefficient is 0.97, what does that say about how well the
regression fits the data?
5. The regression line y = 1.05 + 0.23x for the five points on the scatterplot was
computed using the method of least-squares.
x   y
1   1.2
2   1.6
3   1.8
4   1.9
5   2.2
[Scatterplot of the five points; both axes run from -5 to 5.]
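The stated line y = 1.05 + 0.23x can be checked directly from the five tabulated points. The following is a minimal Python sketch using the standard least-squares formulas; the helper name `least_squares` is chosen here for illustration and is not part of the worksheet.

```python
# Five (x, y) points from the table in free-response question 5.
xs = [1, 2, 3, 4, 5]
ys = [1.2, 1.6, 1.8, 1.9, 2.2]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = least_squares(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x")  # y = 1.05 + 0.23x
```

The line always passes through the point of means, here (3, 1.74), which is why the intercept is computed as the mean of y minus the slope times the mean of x.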
Answer Key
b. The second exam has much lower scores than the first exam. The spread is about the same, though the first exam has more outliers on the lower end.
5a. Five Number Summaries:
           Min    Q1     Med    Q3     Max
Route I    19.3   23     28.4   32.1   38.4
Route II   23.7   27.7   32.2   35.5   42.9
b. The box plots show the second route as basically the same shape, but a bit to the right. Therefore, route one seems to be faster, but more tests might be necessary to prove that the difference is significant.
6a.
b. Skewed to the right
c. None
d. The five number summary shows skewed behavior. The mean and standard deviation are most appropriate for symmetric distributions.
e. Mean: 83.1; St. Dev. 66.0
Min: 1; q1: 24; Median: 68; q3: 141; Max: 216
7a. The mean is most helpful since the number of parking spaces is to be computed on the total number of cars. Neither the mode nor the median can be used to find the total.
b. The median represents the middle value. It must be either an integer or a number that ends in 0.5. The mode is the most commonly occurring value. Cars come in discrete integer values and cannot be measured in fractional parts.
c. Assuming a normal distribution:
(275)(1.8) + (275)(0.13)(1.282) = 540.83 ≈ 541 spaces.
d. 540.83 = (275)(1.8) + (sd)(275)(1.476)
sd = 0.113
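The arithmetic in answers 7c and 7d can be reproduced as follows. This is only a sketch of the key's own formulas: the variable names are illustrative, the numbers 275, 1.8, 0.13, 1.282, and 1.476 are copied from the key, and the question they come from is not reproduced here, so no meaning beyond the formula itself is assumed.

```python
# Numbers copied from answers 7c and 7d in the key.
count = 275        # multiplier applied to both the mean and the spread term
mean = 1.8         # mean used in the key
sd_given = 0.13    # standard deviation used in 7c
z_7c = 1.282       # z-value used in 7c
z_7d = 1.476       # z-value used in 7d

# 7c: total = count * mean + count * sd * z
spaces_7c = count * mean + count * sd_given * z_7c

# 7d: solve 540.83 = count * mean + sd * count * z for sd.
sd_7d = (540.83 - count * mean) / (count * z_7d)

print(round(spaces_7c, 2))  # 540.83
print(round(sd_7d, 3))      # 0.113
```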
Bivariate Data
1. b   2. a   3. a   4. c   5. b   6. a   7. e   8. e   9. d
10. d   11. d   12. e   13. c   14. a   15. d   16. e   17i. b   17ii. a
18. a   19i. b   19ii. a   19iii. a   20. a   21i. d   21ii. c   22. a   23. c
24. b   25. d   26. d   27. a   28. a   29. c   30. a   31. c   32i. d
32ii. b   33. e   34. b   35. e   36. a   37. e   38. b   39. d   40. d
41. b   42. c   43. e
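Several keyed answers depend only on numbers printed in the questions, so they can be spot-checked. The sketch below, in Python, covers questions 5, 30, 32i, 34, and 36; the helper `least_squares` and all variable names are illustrative and not part of the worksheet.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Question 5: residual = observed - predicted at Speed = 50, observed MPG = 26.
residual_q5 = 26 - (0.12 * 50 + 19.1)

# Question 30: silver is the response, so gold is the explanatory variable.
silver = [5.47, 7.01, 6.53, 5.50, 4.82, 4.04, 3.94, 4.30, 5.30]
gold = [368, 478, 438, 383, 385, 363, 345, 361, 389]
a30, b30 = least_squares(gold, silver)

# Question 32i: predicted visits for $10,000 of advertising.
visits_q32 = 16000 + 0.95 * 10000

# Question 34: solve 60 = -48.3 + 11.9 * Age for Age.
age_q34 = (60 + 48.3) / 11.9

# Question 36: r = 1 requires the fifth point to lie on the line
# through the first four (which are exactly collinear).
a36, b36 = least_squares([2, 3, 7, 9], [8, 11, 23, 29])
y_q36 = a36 + b36 * 14

print(f"Q5 residual: {residual_q5:.1f}")              # 0.9  (choice b)
print(f"Q30: Silver = {a30:.2f} + {b30:.3f} Gold")    # choice a
print(f"Q32i visits: {visits_q32:.0f}")               # 25500 (choice d)
print(f"Q34 age: {age_q34:.1f}")                      # 9.1  (choice b)
print(f"Q36 y: {y_q36:.0f}")                          # 44   (choice a)
```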
1a. y = 18.1 + 3.86x
b. r = 0.960
c. Strong, since r > 0.90.
d. Predicted Grade = 18.1 + 3.86(16) = 79.86
Residual = Actual – Predicted = 81 – 79.86 = 1.14
e. Grade = 18.1 + 3.86(8) = 48.98 ≈ 49
f. The residual plot shows no discernible pattern. The residual values appear to be randomly scattered about 0. As a result, the equation in part a is a reasonable model for the data given.
3c. Using the above regression, the answers change only in the hundredth digits, and so do not need to be redone. In a different model, the answers may come out slightly different.
d. Same as c.
4a. The plot and the correlation suggest strong positive association between the two variables.
b. It says that the points lie close to the least-square regression line. It does not say that the line is the proper model for this data.
c. The difficulties in extrapolating refer to the fact that life expectancies are bounded by the number of years a person may live on the right and zero on the left.
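The values in answers 1a, 1b, 1d, and 1e can be reproduced from the hours-studied table in free-response question 1. The following Python sketch uses the standard formulas for the least-squares slope, intercept, and correlation; the variable names are illustrative. Parts d and e are computed with the rounded coefficients 18.1 and 3.86, as the key does.

```python
from math import sqrt

# Data from free-response question 1.
hours = [9, 5, 14, 10, 6, 18, 16, 2, 12, 16]
grades = [52, 37, 66, 64, 43, 90, 81, 27, 51, 86]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n

# Sums of squared deviations and cross-deviations.
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in grades)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))

slope = sxy / sxx                      # part a
intercept = y_bar - slope * x_bar      # part a
r = sxy / sqrt(sxx * syy)              # part b

# Parts d and e, using the rounded equation y = 18.1 + 3.86x from the key.
predicted_7 = 18.1 + 3.86 * 16         # student 7 studied 16 hours
residual_7 = 81 - predicted_7          # observed grade minus predicted grade
predicted_8h = 18.1 + 3.86 * 8         # prediction for 8 hours of study

print(f"y = {intercept:.1f} + {slope:.2f}x, r = {r:.3f}")
print(f"residual for student 7: {residual_7:.2f}")
print(f"predicted grade for 8 hours: {predicted_8h:.2f}")
```

With the unrounded coefficients the residual for student 7 comes out near 1.25 rather than 1.14, so small differences from the key are expected depending on where rounding is applied.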
2a, b, and d. [Graph not reproduced.]
5a. [Graph not reproduced; both axes run from -5 to 5.]