
Ap Stats Bivariate Data

Uploaded by

jh seo

Name: __________________________________ Date: ____________

BIVARIATE DATA

1. Each of the following except one is a description of correlation.

a. correlation can only be between -1 and 1
b. correlation is a unit of measure
c. correlation identifies the strength of a linear trend
d. a measure of zero means absolutely no correlation
e. correlation can only be computed for quantitative data

2. Given is the equation FinalScore = 55.3 + 0.2(MidtermScore), with r = 0.31. If the professor adds 10 points to each student's MidtermScore, the new correlation will be

a. 0.31
b. 0.41
c. 0.49
d. 0.97
e. cannot predict without the original data

3. Which of the following correlations are not possible?

a. the correlation between eye color and number of children in the family is 0.42
b. the correlation between the height and weight in teenage girls is 0.76
c. the correlation between years of experience and salary is 0.87
d. the correlation between amount of TV watched and GPA is -0.59
e. two of these are not possible

4. Find the proportion of error explained by the linear regression.

X    4    6    8    9    12
Y    12   16   25   30   39

a. 0.9936
b. 0.9924
c. 0.9848
d. 0.9836
e. 0.9902

5. The equation MPG = 0.12(Speed) + 19.1 is given for the speed of the vehicle vs. miles per gallon. If our sample contained MPG of 26 for 50 mph, what is the residual for this speed?

a. -0.3
b. 0.9
c. -0.9
d. 0.3
e. 1.8

6. Adding a point along the same line, but far away from the rest of the points

a. makes the correlation stronger
b. makes the correlation weaker
c. makes the correlation negative
d. does not change the correlation
e. cannot tell with the given information

7. The equation Y = 0.88X + 1.69, with r = 0.885, is given. Which of these points will make the correlation stronger?

a. (1,1)
b. (1,4)
c. (3,5)
d. (4,1)
e. (6,7)

8. The equation Y = 0.88X + 1.69, with r = 0.885, is given. Compute the sum of the residuals.

a. 0.88
b. 1.69
c. 0.885
d. 1
e. 0
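
As an illustrative check of questions 4 and 5 (not part of the original worksheet), the sketch below computes the squared correlation for the question 4 table and the residual for question 5. The helper name `pearson_r` and the variable names are hypothetical choices for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Question 4: data from the X/Y table
x = [4, 6, 8, 9, 12]
y = [12, 16, 25, 30, 39]
r_squared = pearson_r(x, y) ** 2

# Question 5: residual = observed - predicted, using MPG = 0.12(Speed) + 19.1
predicted_mpg = 0.12 * 50 + 19.1
residual = 26 - predicted_mpg

print(f"r^2 = {r_squared:.4f}")
print(f"residual = {residual:.1f}")
```

The squared correlation is the fraction of the variation in Y accounted for by the least-squares line, and a positive residual means the observed value lies above the line.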

Copyright © 2000 William K. Bradford Publishing Company. All rights reserved. Permission to reproduce this master is granted to registered purchasers for their classroom use. Printed in the United States of America. 18

9. Residuals are

a. the distance away from the mean
b. how close the points are to the straight line
c. the value of Y when X is plugged into the equation
d. the difference between the observed and predicted response variable
e. the difference between the observed and predicted explanatory variable

10. Correlation tells us when the regression is the proper one.

a. this statement is completely true
b. this statement is only true for linear regressions
c. this statement is only true when the correlation is strong
d. this statement is never true
e. this statement is never true except when the correlation is close to 1

11. If Y = 3 + 4X and the average value for the response variable is 11, then the average value for the explanatory variable is

a. 4
b. 3
c. 7
d. 1.75
e. 47

12. What are limitations of extrapolation?

a. can only be done within the range of the data
b. does not give an exact value
c. unrealistic data may be produced when outside of the range
d. all three are limitations
e. two of the above are limitations

13. What are lurking variables?

a. variables that are being tested that the tester does not know
b. variables that are only important some of the time
c. variables that are not measured but may explain the relationship
d. variables that tell us what happened prior to the test
e. variables that cause the relationship

14. What is the least-squares regression line?

a. a line that minimizes the sum of the squares of the vertical distances of the observed response variable
b. a line that minimizes the correlation coefficient on the response variable
c. a line that minimizes the coefficient of determination on the response variable
d. a line that minimizes the sum of the squares of the vertical distances of the observed explanatory variable
e. a line that minimizes the sum of the squares of the horizontal distances of the observed explanatory variable

15. The amount of change in the response variable when the explanatory variable is increased by 1 is called the

a. residual
b. correlation
c. intercept
d. slope
e. determination

16. A point is removed from a data set which results in a significant change in the position of the line of best fit. The point removed is called

a. a robust point
b. a residual point
c. a response variable
d. an outlier
e. an influential point


17.

x    y
5    12
8    16
9    18
10   19
12   22
13   26
15   28
16   29
19   31
21   36
25   37
28   39
29   40
30   42
32   45
36   46
37   48

i. Use the table above to find the fraction of the variation explained by the regression line.

a. 9.46
b. 0.9716
c. 0.9857
d. 1.08
e. cannot be determined from the given information

ii. Use the table above to find the value of the response variable when the explanatory variable is zero.

a. 9.46
b. 0.9716
c. 0.9857
d. 1.08
e. cannot be determined from the given information

18. Which is a better indicator of a good linear fit?

a. scattered residuals
b. high positive correlation
c. correlation close to 1 or -1
d. positive slope
e. having no outliers

19. The following table contains the IQ and SAT scores for a group of high school seniors.

IQ    SAT
100   820
107   890
115   900
120   920
125   1010
132   1130
145   1210

i. Using the table above, what is the correlation coefficient for this data?

a. 0.9339
b. 0.9664
c. 8.9907
d. -101.17
e. 0.5788

ii. Using the table above, what fraction of error in the SAT score is explained by the linear regression on the IQ?

a. 0.9339
b. 0.9664
c. 8.9907
d. -101.17
e. 0.5788

iii. Using the table above, what is the residual for an IQ of 100?

a. 22.095
b. -32.77
c. -12.67
d. 7.5124
e. 0
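
The quantities asked for in question 19 can be reproduced with a short Python sketch. This is an illustration added for checking work, not the worksheet's own method; the function name `summarize` is hypothetical.

```python
from math import sqrt

# Question 19 table
iq = [100, 107, 115, 120, 125, 132, 145]
sat = [820, 890, 900, 920, 1010, 1130, 1210]

def summarize(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = summarize(iq, sat)

# Residual for the student with IQ 100: observed SAT minus predicted SAT
residual_at_100 = sat[0] - (intercept + slope * iq[0])

print(f"SAT = {intercept:.2f} + {slope:.4f} * IQ")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"residual at IQ 100 = {residual_at_100:.3f}")
```

Several of the answer choices in question 19 are other outputs of the same fit (the slope, the intercept, r, and r squared), so printing all of them helps tell the distractors apart.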


20. Which of these cannot be true?

I. points that lie close to a straight line will have a correlation near 1
II. scattered residuals is an indicator of a good linear fit
III. influential points increase the correlation coefficient

a. I
b. II
c. III
d. I and III
e. II and III

21. A study by the New Mexico State Highway Department compared the fatality rate on New Mexico highways to United States highways for the years 1945-1984. The results of a regression analysis are shown.

The regression equation is
US = -0.083 + 0.661 NM.

Predictor   Coef      St. dev.   t-ratio   p
Constant    -0.0831   0.3085     -0.27     0.789
NM          0.66143   0.03509    18.85     0.0000

s = 0.6355   R-sq = 90.3%   R-sq(adj) = 90.1%

i. In the above study, what is the correlation coefficient for the regression?

a. 0.635
b. 0.661
c. 0.789
d. 0.903
e. 0.950

ii. In the above study, if the New Mexico fatality rate was 6.71 deaths per million vehicle miles, what would be the predicted US rate?

a. 2.07
b. 4.26
c. 4.35
d. 4.44
e. 5.29

22. The following statement was made in a study of scores on a memory exam for senior citizens. “The paired sample data of age and score result in a linear correlation coefficient very close to 0.” Which of the following is incorrect?

a. older people tend to get lower scores
b. the line of best fit is virtually horizontal
c. age does not appear to influence score
d. there does not seem to be any relationship between age and score
e. score does not appear to influence age

23. The following statement was made in a report on spending habits of urban shoppers. “There is a significant positive linear correlation between per capita income and per capita spending.” Which of the following is incorrect?

a. the line of best fit is increasing from left to right
b. r is greater than 0
c. increased spending is caused by increased income
d. r² is greater than 0.5
e. the more money people have, the more money they spend

24. All but one of the following statements contain a blunder in statistical reasoning. Which conclusion is stated correctly?

a. there is a correlation of 0.54 between the position a football player plays and his weight
b. the correlation between planting rate and yield of corn was found to be r = 0.23
c. the correlation between the gas mileage of a car and its weight is r = 0.71 MPG
d. researchers found a high correlation (r = 1.09) between the sleep time and IQ of children
e. the correlation between the age of a child and its height is -0.65
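
For question 21, the arithmetic that links the regression output to the two sub-questions can be sketched as follows. This is an added illustration that uses only the numbers printed in the output above; the variable names are hypothetical.

```python
from math import copysign, sqrt

# Values read from the regression output in question 21
r_sq = 0.903        # R-sq = 90.3%
intercept = -0.083  # constant term of the fitted line
slope = 0.661       # coefficient on NM

# The correlation coefficient is the square root of R-sq,
# carrying the sign of the slope (positive here).
r = copysign(sqrt(r_sq), slope)

# Predicted US rate when the New Mexico rate is 6.71
predicted_us = intercept + slope * 6.71

print(f"r = {r:.3f}")
print(f"predicted US rate = {predicted_us:.2f}")
```

Note that R-sq itself (0.903) is the squared correlation, so it should not be reported as r.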

25. What does the squared correlation coefficient, r², measure?

a. the slope of the least squares regression line
b. the intercept of the least squares regression line
c. the extent to which cause and effect is present in the data
d. the fraction of the variation of one variable that is explained by the least squares regression on the other
e. the amount of fit between two variables

26. Given the scatter plot as shown.

[Scatter plot not reproduced]

Which of the following is the most reasonable value for the correlation coefficient?

a. 0.01
b. 0.43
c. 0.73
d. 0.91
e. 1.00

27. A study of elementary school children, ages 6 to 10, found a high negative correlation between number of teeth and score on a vocabulary test. What is the most likely reason for the correlation?

a. age as a lurking variable
b. a causal relationship
c. Simpson’s paradox
d. the Hawthorne effect
e. an error in computations since teeth have nothing to do with test scores

28. Given the scatter plot as shown.

[Scatter plot not reproduced]

Which of the following is the most reasonable value for the correlation coefficient?

a. 0.01
b. 0.43
c. 0.73
d. 0.91
e. 1.00

29. Students were tested before and after a course to see if a pre-test could be used to predict a student’s success. The results are shown in the table.

Pre-test   Post-test
17         73
21         66
11         64
16         61
15         70
11         71
24         90
27         68
19         84
8          52

Which of the following is true?

a. there is a strong positive correlation between pre-test and post-test scores
b. there is a strong negative correlation between pre-test and post-test scores
c. there is a weak positive correlation between pre-test and post-test scores
d. there is a weak negative correlation between pre-test and post-test scores
e. the correlation coefficient is 0.318
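
A quick way to sort through the choices in question 29 is to compute both r and r squared for the table. The following Python sketch is an added illustration; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

# Question 29 table
pre = [17, 21, 11, 16, 15, 11, 24, 27, 19, 8]
post = [73, 66, 64, 61, 70, 71, 90, 68, 84, 52]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(pre, post)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

The value 0.318 offered in choice e turns out to be r squared for these data rather than r, which is why the two numbers need to be kept apart.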


30. The average prices per ounce for gold and silver for a selected time range are shown in the table. What is the equation of the regression line if silver is the response variable?

Silver (x)   Gold (y)
5.47         368
7.01         478
6.53         438
5.50         383
4.82         385
4.04         363
3.94         345
4.30         361
5.30         389

a. Silver = -3.88 + 0.023 Gold
b. Gold = -3.88 + 0.023 Silver
c. Silver = 200.50 + 36.36 Gold
d. Gold = 200.50 + 36.36 Silver
e. none of the above

31. A researcher wants to compare neck sizes and wrist sizes of two-year-old children. She isn’t sure which one is the response variable. She decides to calculate two different regression lines; the first one with neck size as the response variable and the second one with wrist size as the response variable. If r1 is the correlation coefficient of the first calculation and r2 is the correlation coefficient of the second calculation, which of the following statements is true?

a. r1 > r2
b. r1 < r2
c. r1 = r2
d. r1 = -r2
e. r1 ≠ r2

32. A study of selected western states calculated the regression line for average number of tourist visits versus the number of dollars spent on advertising in vacation magazines. The y-intercept was 16,000 and the slope was 0.95.

i. Given the information above, what average number of tourist visits does this regression predict for a state that spends $10,000 on advertising?

a. 16,950
b. 24,700
c. 25,200
d. 25,500
e. 26,840

ii. Given the information above, how much money should a state spend if it wants to average 40,000 tourist visits?

a. $22,800
b. $25,267
c. $38,000
d. $42,000
e. $53,200

33. A group of college professors and a group of college students were asked to rate a collection of essays. The correlation coefficient was 1.03. A reasonable conclusion of this report is that

a. professors and students tend to agree on what is a good essay
b. there is little relationship between professor and student ratings of the essays
c. there is a strong relationship between professor and student ratings of the essays
d. there is a strong causation relationship between professor and student ratings of the essays
e. an arithmetic error was made
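
Questions 30 and 31 both turn on which variable is treated as the response. The sketch below, added as an illustration, fits silver on gold for the question 30 table and also shows that the correlation coefficient is the same whichever variable plays the response role. The helper names `fit_line` and `pearson_r` are hypothetical.

```python
from math import sqrt

# Question 30 table
silver = [5.47, 7.01, 6.53, 5.50, 4.82, 4.04, 3.94, 4.30, 5.30]
gold = [368, 478, 438, 383, 385, 363, 345, 361, 389]

def fit_line(xs, ys):
    """Least-squares (slope, intercept) for predicting ys from xs (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Silver is the response, so gold is the explanatory (x) argument here.
slope, intercept = fit_line(gold, silver)
print(f"Silver = {intercept:.2f} + {slope:.3f} Gold")

# Swapping the roles changes the fitted line but not the correlation.
r_silver_on_gold = pearson_r(gold, silver)
r_gold_on_silver = pearson_r(silver, gold)
print(f"{r_silver_on_gold:.4f} {r_gold_on_silver:.4f}")
```

Although the table labels silver as x, the question asks for silver as the response, so the roles are swapped before fitting.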


34. A regression was done on a randomly selected group of girls and their scores on the Wide Range Achievement Test. The equation was Score = -48.3 + 11.9 Age with r² = 0.912. About how old must a girl be if she received a score of 60?

a. 8.3
b. 9.1
c. 10.0
d. 11.7
e. 11.9

35. In order to transform test scores to a standard scale, a teacher added 6% to each student’s score. If the correlation was 0.42, what will be the new correlation?

a. 0.42
b. 0.45
c. 0.48
d. 0.78
e. none of the above

36. Consider the set of data points: (2, 8), (3, 11), (7, 23), (9, 29), (14, y). What value of y will make the correlation between x- and y-values equal to 1?

a. 44
b. 46
c. 51
d. 56
e. no value will work

37. Which of the following statements is true about the linear correlation coefficient, r?

a. when r = 0, there is no relationship between the variables
b. when r = 0.60, 60% of the variables are closely related
c. when r = -1, there is a perfect cause and effect relationship between the variables
d. when r = -1, all values must be negative
e. r² measures the ratio of predicted amount of variation and the actual variation

38. The given data points were obtained.

x    1     3     4     7     9
y    0.2   1.8   3.2   9.8   16.2

Which of the following transformations to the response data will change the scatterplot of the data to a straight line?

a. take the natural logarithm of the response variable
b. take the square root of the response variable
c. take the exponential of the response variable
d. take the square of the response variable
e. add a constant to the response variable

39. The given data points were obtained.

x    1      3      4      7      9
y    2.24   3.87   4.47   5.92   6.71

Which of the following transformations to the response data will change the scatterplot of the data to a straight line?

a. take the natural logarithm of the response variable
b. take the square root of the response variable
c. take the exponential of the response variable
d. take the square of the response variable
e. add a constant to the response variable
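
One way to test a candidate transformation in questions 38 and 39 is to check whether the correlation between x and the transformed response moves to essentially 1. The Python sketch below is an added illustration under that approach; `pearson_r` and the variable names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 3, 4, 7, 9]
y38 = [0.2, 1.8, 3.2, 9.8, 16.2]      # question 38 response values
y39 = [2.24, 3.87, 4.47, 5.92, 6.71]  # question 39 response values

# Question 38: compare the raw response with its square root
r38_raw = pearson_r(x, y38)
r38_sqrt = pearson_r(x, [sqrt(v) for v in y38])

# Question 39: compare the raw response with its square
r39_raw = pearson_r(x, y39)
r39_square = pearson_r(x, [v ** 2 for v in y39])

print(f"Q38: raw r = {r38_raw:.4f}, sqrt(y) r = {r38_sqrt:.4f}")
print(f"Q39: raw r = {r39_raw:.4f}, y^2 r = {r39_square:.4f}")
```

A correlation near 1 alone does not prove linearity, so in practice a residual plot of the transformed data would be the follow-up check.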

40. What is the correlation coefficient for this plot?

[Plot not reproduced]

a. -1
b. -0.99
c. 0
d. 0.99
e. 1

41. What is the correlation coefficient for this plot?

[Plot not reproduced]

a. -1
b. -0.99
c. 0
d. 0.99
e. 1

42. What is the correlation coefficient for this plot?

[Plot not reproduced]

a. -1
b. -0.98
c. 0
d. 0.98
e. 1

43. The following is a graph of residuals. Approximately, what is the residual for C1 = 15?

[Residual plot not reproduced]

a. 5
b. 50
c. 100
d. -5
e. -50

BIVARIATE DATA • Free Response

1. The following table gives the number of hours studied and the grades obtained by
ten students selected at random from a large class.

Student 1 2 3 4 5 6 7 8 9 10
Hours studied (x ) 9 5 14 10 6 18 16 2 12 16
Exam Grade (y ) 52 37 66 64 43 90 81 27 51 86

a. Compute the linear regression equation.

b. Find the correlation coefficient.

c. Describe the overall strength of the relationship (strong, moderate, or weak).

d. Compute the residual for student number seven. Show your work.

e. Predict the score a student would likely receive if he studied for 8 hours.
Show your work.

f. Use residuals to analyze whether the equation found in part a. is a reasonable
model for the data. Include a sketch of your residual plot.
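
For readers who want to check their calculator work on parts a, b, d, and e, the following Python sketch fits the line and lists the residuals. It is an added illustration, and the function name `fit_with_r` is hypothetical. Values computed from unrounded coefficients can differ slightly from hand work that rounds the slope and intercept first.

```python
from math import sqrt

# Free Response 1 table
hours = [9, 5, 14, 10, 6, 18, 16, 2, 12, 16]
grade = [52, 37, 66, 64, 43, 90, 81, 27, 51, 86]

def fit_with_r(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

slope, intercept, r = fit_with_r(hours, grade)

# Residual = observed grade - predicted grade, one per student
residuals = [y - (intercept + slope * x) for x, y in zip(hours, grade)]

# Part e: predicted grade for 8 hours of study
predicted_8_hours = intercept + slope * 8

print(f"grade = {intercept:.1f} + {slope:.2f} * hours, r = {r:.3f}")
print(f"student 7 residual = {residuals[6]:.2f}")
print(f"predicted grade for 8 hours = {predicted_8_hours:.1f}")
```

Plotting `residuals` against `hours` gives the residual plot asked for in part f.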


2. This data is from the 1988-89 National Basketball Association Chicago Bulls team.

PLAYER MINUTES PLAYED POINTS SCORED


Brown 591 197
Corzine 2328 804
Grant 1827 622
Jordan 3311 2868
Oakley 2816 1014
Paxson 1888 640
Pippen 1650 625
Sellers 2212 777
Sparrow 1044 260
Vincent 1501 573

a. Create a scatter plot of Minutes Played (x) vs. Points Scored (y).

b. Calculate the regression equation for the data. Plot the line on the scatter plot.

c. Circle the influential observation.

d. Calculate the regression equation for the data with the influential observation
removed. Using a dotted line, graph the line on the scatter plot.

e. Which player has the smallest residual in both regression equations?
Explain your reasoning.
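
The effect of an influential observation in parts b through e can be seen by fitting the line twice, once with all ten players and once with one player left out. The Python sketch below is an added illustration; it assumes the point to drop is the one with the largest points total, and the helper names `fit_line` and `closest_to_line` are hypothetical.

```python
# Free Response 2 table: player -> (minutes played, points scored)
players = {
    "Brown": (591, 197), "Corzine": (2328, 804), "Grant": (1827, 622),
    "Jordan": (3311, 2868), "Oakley": (2816, 1014), "Paxson": (1888, 640),
    "Pippen": (1650, 625), "Sellers": (2212, 777), "Sparrow": (1044, 260),
    "Vincent": (1501, 573),
}

def fit_line(data):
    """Least-squares (slope, intercept) for a dict of name -> (x, y) (hypothetical helper)."""
    pts = list(data.values())
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def closest_to_line(data, slope, intercept):
    """Name of the player whose residual is smallest in absolute value (hypothetical helper)."""
    return min(data, key=lambda name: abs(data[name][1] - (intercept + slope * data[name][0])))

# Fit with every player
slope_all, intercept_all = fit_line(players)

# Assumption for this sketch: drop the player with the most points
dropped = max(players, key=lambda name: players[name][1])
remaining = {name: pt for name, pt in players.items() if name != dropped}
slope_rest, intercept_rest = fit_line(remaining)

print(f"all players:     y = {slope_all:.3f}x + ({intercept_all:.2f})")
print(f"without {dropped}: y = {slope_rest:.3f}x + ({intercept_rest:.2f})")
print(closest_to_line(players, slope_all, intercept_all),
      closest_to_line(remaining, slope_rest, intercept_rest))
```

Comparing the two slopes shows how strongly a single point far from the others can pull the least-squares line.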


3. The following table gives the data for the number of women participating in the Olympic
Games from 1896 to 1992.
Year 1896 1900 1904 1908 1912 1920 1924 1928 1932 1936 1948
Women 0 11 6 36 57 64 136 290 127 328 385
Year 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992
Women 518 397 610 683 781 1171 1272 1192 1620 2438 3008

a. Find a proper regression for this data. Include a complete explanation why this
particular regression is used.

b. 1916, 1940, 1944 are the three years when Olympic Games were not held because
of the World Wars. Use extrapolation to estimate the number of women that
would have been present if the Games were held.

c. In 1980, the Games were held in Moscow, USSR. Due to the Cold War, some of the
countries boycotted the Games and did not attend. Eliminate the data for 1980 and
figure out a new regression.

d. Use the new regression to estimate the number of women that would have
attended in 1916, 1940, and 1944 if the Games were held.


4. The following is a computer graph of the life expectancy of males vs. females in
different countries all over the world.

a. Describe what you can tell from the graph about the relationship in this data.

b. If the correlation coefficient is 0.97, what does that say about how well the
regression fits the data?

c. What might be the difficulties in extrapolating this data?


5. The regression line y = 1.05 + 0.23x for the five points on the scatterplot was
computed using the method of least-squares.

x    y
1    1.2
2    1.6
3    1.8
4    1.9
5    2.2

[Scatterplot of the five points, with both axes running from -5 to 5, not reproduced]

a. Sketch the regression line on the plot.

b. Use the above scatterplot to demonstrate the meaning of “least-squares.”

c. What is the residual when x = 3?
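
The idea behind parts b and c can be illustrated numerically: compute the residuals for the given line, add up their squares, and compare that total with the total for any other line. The Python sketch below is an added example; the two comparison lines are arbitrary choices, and `sum_squared_residuals` is a hypothetical helper name.

```python
# Free Response 5 data
xs = [1, 2, 3, 4, 5]
ys = [1.2, 1.6, 1.8, 1.9, 2.2]

def sum_squared_residuals(intercept, slope):
    """Sum of squared vertical distances from the points to y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Part c: residual at x = 3 for the given line y = 1.05 + 0.23x
residual_at_3 = ys[2] - (1.05 + 0.23 * 3)

# Part b: the given line versus two arbitrary alternatives
sse_given = sum_squared_residuals(1.05, 0.23)
sse_steeper = sum_squared_residuals(1.05, 0.25)  # slightly larger slope
sse_higher = sum_squared_residuals(1.10, 0.23)   # slightly larger intercept

print(f"residual at x = 3: {residual_at_3:.2f}")
print(f"SSE given: {sse_given:.4f}, steeper: {sse_steeper:.4f}, higher: {sse_higher:.4f}")
```

Any change to the slope or intercept of the least-squares line increases the sum of squared residuals, which is the sense in which the line is "least-squares."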

Answer Key

b. The second exam has much lower scores than the first exam. The spread is about the same, though the first exam has more outliers on the lower end.
5a. [Figure not reproduced]

Five Number Summaries:

           Min    Q1     Med    Q3     Max
Route I    19.3   23     28.4   32.1   38.4
Route II   23.7   27.7   32.2   35.5   42.9

b. The box plots show the second route as basically the same shape, but a bit to the right. Therefore, route one seems to be faster, but more tests might be necessary to prove that the difference is significant.
6a. [Figure not reproduced]
b. Skewed to the right
c. None
d. The five number summary shows skewed behavior. The mean and standard deviation are most appropriate for symmetric distributions.
e. Mean: 83.1; St. Dev. 66.0
Min: 1; q1: 24; Median: 68; q3: 141; Max: 216
7a. The mean is most helpful since the number of parking spaces is to be computed on the total number of cars. Neither the mode nor the median can be used to find the total.
b. The median represents the middle value. It must be either an integer or a number that ends in 0.5. The mode is the most commonly occurring value. Cars come in discrete integer values and cannot be measured in fractional parts.
c. Assuming a normal distribution:
(275)(1.8) + (275)(0.13)(1.282) = 540.83 ≈ 541 spaces.
d. 540.83 = (275)(1.8) + (sd)(275)(1.476)
sd = 0.113

Bivariate Data
1. b 10. d 18. a 24. b ii. b 41. b
2. a 11. d 19i. b 25. d 33. e 42. c
3. a 12. e ii. a 26. d 34. b 43. e
4. c 13. c iii. a 27. a 35. e
5. b 14. a 20. a 28. a 36. a
6. a 15. d 21i. e 29. c 37. e
7. e 16. e ii. c 30. a 38. b
8. e 17i. b 22. a 31. c 39. d
9. d ii. a 23. c 32i. d 40. d


Bivariate Data • Free Response

1a. y = 18.1 + 3.86x
b. r = 0.960
c. Strong, since r > 0.90.
d. Predicted Grade = 18.1 + 3.86(16) = 79.86
Residual = Actual – Predicted = 81 – 79.86 = 1.14
e. Grade = 18.1 + 3.86(8) = 48.98 ≈ 49
f. The residual plot shows no discernible pattern. The residual values appear to be randomly scattered about 0. As a result, the equation in part a is a reasonable model for the data given.

2a, b, and d. [Scatter plot of Minutes Played (x) vs. Points Scored (y), with both axes marked at 1000, 2000, and 3000, not reproduced]
b. y = 0.78x – 655.0
c. The influential observation is (3311, 2868).
d. y = 0.375x – 48.38
e. Pippen had the smallest residual for the first data set. Sellers had the smallest when Jordan was removed.

3a. The best regression for this model is a power regression with the equation ln(women) = 1.88 · ln(year) + 1.37. A linear model is not proper in this situation because the plot of residuals has a definite pattern; linearizing the model by taking the logs of x and y gets rid of the pattern. In this model the years are coded as 1, 2, 3, etc., and the first data point is thrown out because ln(0) does not exist. Also, to keep the pattern in the years, numbers 5, 11, 12 are skipped.
Other models are acceptable as long as the explanation is reasonable. Residual plots must be mentioned for the answer to be correct.
b. 1916 extrapolates to exp(4.39) = 80.86 ≈ 80. This can be found by plugging year = 5 into the equation above and then taking e to the power of the answer.
1940 gives 355.68, 1944 gives 418.85
Answers may vary if different coding was used for the years or if a different model was used.
c. Using the above regression, the answers change only in the hundredth digits, and so do not need to be redone. In a different model, the answers may come out slightly different.
d. Same as c.

4a. The plot and the correlation suggest strong positive association between the two variables.
b. It says that the points lie close to the least-squares regression line. It does not say that the line is the proper model for this data.
c. The difficulties in extrapolating refer to the fact that life expectancies are bounded by the number of years a person may live on the right and zero on the left.

5a. [Scatterplot with the regression line sketched, both axes running from -5 to 5, not reproduced]
b. The “least-squares line” means that the sum of the squared differences of the observed and computed y values is a minimum, or, in other words, the least it can be. For this situation, the first point would have a difference of 1.2 – 1.28 = -0.08. The other differences are shown in the table below.

x    y     ŷ      y – ŷ    (y – ŷ)²
1    1.2   1.28   -0.08    0.0064
2    1.6   1.51   0.09     0.0081
3    1.8   1.74   0.06     0.0036
4    1.9   1.97   -0.07    0.0049
5    2.2   2.2    0        0

Then the sum of the squares (shown in the last column) would be 0.0064 + 0.0081 + 0.0036 + 0.0049 + 0 = 0.023. No other line fit to these same five points would have a smaller sum of the squares.
c. 0.06
