Chapter 3 Notes 2024 2025 PDF

Chapter 3 discusses the analysis of relationships between quantitative variables using scatterplots, correlation, and least-squares regression. It emphasizes the importance of visualizing data, identifying explanatory and response variables, and interpreting the strength and direction of relationships. The chapter also covers practical applications, such as predicting outcomes based on regression models and evaluating the appropriateness of linear models using residual plots.

Uploaded by

Nancy Skocik

Chapter 3: Describing Relationships

3.1 Scatterplots and Correlation


3.2 Least-Squares Regression

Name___________________________________
What information do the park rangers at Yellowstone use to predict when Old Faithful’s next eruption will be?

Identify the explanatory and the response variables in this situation.

What kind of graph do we use to show the relationship between two quantitative variables?

Here is a scatterplot that plots the interval between consecutive eruptions of Old Faithful against the duration of the
previous eruption, for the month prior to the Starnes visit.

1. Describe the direction of the relationship.

2. What form does the relationship take?

3. How strong is the relationship?

4. Are there any unusual features?

5. If the previous eruption lasted 3 minutes and 42 seconds, how long do you think it would be until the next eruption?

3.1 Describing Relationships (Read pgs. 143-149)
In Chapter 1, we explored the relationships between categorical variables. In this chapter we investigate the
relationships between two quantitative variables.

The principles that guide our work remain the same:

• Plot the data, then add numerical summaries.


• Look for overall patterns and departures from those patterns.
• When there’s a regular overall pattern, use a simplified model to describe it.

A response variable _____________________________________________________________________.

An explanatory variable _____________________________________________________________________________

_____________________________________________.

Check Your Understanding

Identify the explanatory and response variables in each setting.

1. How does drinking beer affect the level of alcohol in people’s blood? The legal limit for driving in all states is 0.08%.
In a study, adult volunteers drank different numbers of cans of beer. Thirty minutes later, a police officer measured
their blood alcohol levels.

2. The National Student Loan Survey provides data on the amount of debt for recent college graduates, their current
income, and how stressed they feel about college debt. A sociologist looks at the data with the goal of using amount
of debt and income to explain the stress caused by college debt.

A ___________________________________ shows the relationship between two quantitative variables measured on
the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable
appear on the vertical axis. Each individual in the data appears as a point in the graph.

Always plot the ___________________________________________, if there is one, on the horizontal axis (the x-axis) of
a scatterplot. Note: you don’t always start at (0, 0).

What is the easiest way to lose points when making a scatterplot? Failure to label your axes.

What four characteristics should you consider when interpreting a scatterplot?

The following scatterplot displays the average number of points scored per game and the number of wins for college
football teams in the Southeastern Conference. Describe what the scatterplot reveals about the relationship between
points per game and wins.

Study the scatterplots below and evaluate the direction and strength of each relationship. Complete the table below.

            Strong      Moderate    Weak
Negative
Positive

[Scatterplots A, B, C, D, E, and F]

3.1 Measuring Linear Association: Correlation

The correlation r measures the ________________________________________________________________________

between two quantitative variables.

Facts about Correlation

• The correlation r is always between ________ and ________.

• r > 0 indicates a _______________ relationship.

• r < 0 indicates a _______________ relationship.

• Values near 0 indicate a ____________ linear relationship.

• Values near 1 or -1 indicate a _______________ linear relationship.

1. The table shows the weight (in pounds) and cost (in dollars) of a sample of 11 stand mixers
(from Consumer Reports, November 2005).

Weight (lb)    Price ($)
23             180
28             250
19             300
17             150
25             300
26             370
21             400
32             350
16             200
17             150
8              30

a. Enter the data into List 1 and List 2 of your calculator, sketch a scatterplot of the data, and
describe the scatterplot. (Discuss strength, form, direction, and outliers in context.)

b. Follow the steps below to calculate the correlation.

1. Select MODE and make sure your stats diagnostics are turned on.

2. To calculate the correlation, r, use the following keystrokes: STAT, CALC, 8:LinReg(a+bx)

r = ______

c. The last mixer in the table is from Walmart; put an x through this point. What happens to the correlation when
you remove the point?

d. What happens to the correlation if the Walmart mixer weighs 25 pounds instead of 8 pounds? Add the point
(25, 30) and recalculate the correlation. (Make it a star)

e. What happens to the correlation if the Walmart mixer weighs 25 pounds and costs $310? Add the point (25,
310) and recalculate the correlation.

f. Suppose that a new titanium mixer was introduced that weighed 8 pounds, but the cost was $500. Remove the
point (25, 30) and add the point (8, 500), circle it, and recalculate the correlation.

g. When a point is added that is far away from the other points but still fits the linear pattern what happens to the
correlation?

h. When a point is added that is far away from the other points and doesn’t fit the linear pattern what happens to
the correlation?

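The formula for correlation is as follows:

$$ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$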
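As a cross-check on the calculator keystrokes, the correlation can also be computed directly from its definition as the average product of standardized scores (dividing by n − 1). The sketch below is only an illustration in Python, not the calculator's own routine; the function and list names are assumptions made for this example, and the data are the 11 stand mixers from problem 1.

```python
from math import sqrt

# Stand mixer data from problem 1: weight (lb) and price ($).
weight = [23, 28, 19, 17, 25, 26, 21, 32, 16, 17, 8]
price = [180, 250, 300, 150, 300, 370, 400, 350, 200, 150, 30]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y over all individuals."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


r = correlation(weight, price)
print(f"r = {r:.3f}")
```

Dropping the last pair from both lists and calling `correlation` again mimics part (c); the other parts can be explored the same way by editing the lists.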

2. Would switching the explanatory and response variables change the correlation?

3. Would switching the units (for example feet to inches) of one or both of the variables change the correlation?

4. a. For the data below, is the relationship between x and y linear?

x    y
1    1
2    4
3    9
4    16
5    25

b. Find the correlation.

c. Does a correlation close to one always imply that the relationship is linear?
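For anyone checking problem 4 without a calculator, here is a minimal Python sketch. It uses an algebraically equivalent form of the correlation, r = S_xy / sqrt(S_xx · S_yy); the variable names are assumptions for this illustration.

```python
from math import sqrt

# Data from problem 4: y is exactly x squared, so the form is curved.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Equivalent form of the correlation: r = S_xy / sqrt(S_xx * S_yy).
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")  # about 0.981, even though the relationship is not linear
```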

3.2 Least-Squares Regression
How Much Is That Truck Worth?

Everyone knows that cars and trucks lose value the more they are driven. Can we predict the price of a used Ford F-
150 SuperCrew 4 × 4 if we know how many miles it has on the odometer? A random sample of 16 used Ford F-150
SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The number of miles driven and
price (in dollars) were recorded for each of the trucks. Here are the data:

1. Identify the explanatory variable.

2. Identify the response variable.

3. Sketch a scatterplot of the data. Then describe what the scatterplot reveals about the relationship between miles
driven and price.

Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the
horizontal axis). A regression line relating y to x has an equation of the form

$$ \hat{y} = a + bx $$
In this equation,

• _______ (read “y hat”) is the ______________________ value of the response variable y for a given value of
the explanatory variable x.

• b is the _____________, the amount by which y is predicted to change when x increases by one unit.

• a is the ______________, the predicted value of y when x = 0.

4. Calculate the regression equation for predicting the price of a Ford F-150 based on number of miles driven.

5. Identify and interpret the slope.

6. Identify and interpret the y-intercept.

7. Predict the price of a Ford F-150 that has been driven 100,000 miles.

8. Can we predict the price of a Ford F-150 with 300,000 miles driven?

Residuals and the Least Squares Regression Line

A ___________________ is the difference between an observed value of the response variable and the value
predicted by the regression line. That is,

residual = ____________________________________________

Find and interpret the residual for the Ford F-150 that
had 70,583 miles driven and a price of $21,994.

The least-squares regression line of y on x is the line that makes ______________________________________

_____________________________________________________________________________________________
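The idea behind "least squares" can be checked numerically: among all lines, the least-squares regression line has the smallest sum of squared residuals. The sketch below is an illustration under stated assumptions. The six data points are invented for this example (they are not the F-150 data), and the function names are hypothetical.

```python
# Hypothetical data, invented only to illustrate the definition.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]


def least_squares(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes SSE."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
         / sum((xi - x_bar) ** 2 for xi in xs))
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


a, b = least_squares(x, y)

# residual = observed y - predicted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse_lsrl = sum_squared_residuals(a, b, x, y)

# Any other line, such as one with a slightly different slope or
# intercept, gives a larger sum of squared residuals.
sse_steeper = sum_squared_residuals(a, b + 0.1, x, y)
sse_shifted = sum_squared_residuals(a + 0.5, b, x, y)

print(f"LSRL: y-hat = {a:.3f} + {b:.3f}x, SSE = {sse_lsrl:.4f}")
print(f"Steeper line SSE = {sse_steeper:.4f}, shifted line SSE = {sse_shifted:.4f}")
```

Trying other slopes and intercepts in `sum_squared_residuals` is a quick way to convince yourself that no competing line does better.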

Determining Whether a Linear Model Is Appropriate: Residual Plots

A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess whether a
linear model is appropriate. When an obvious curved pattern exists in a residual plot, the linear model is not
appropriate.

Scatterplot / Residual Plot / Notes

[Scatterplot and residual plot, example 1] Because there is only random scatter in the residual plot, we know the linear
model is appropriate.

[Scatterplot and residual plot, example 2] There is a definite curved pattern in the residual plot. A linear model is not
appropriate.
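To see how a curved pattern shows up in the residuals themselves, the sketch below fits a least-squares line to the x and y = x² table from Section 3.1 and lists each residual. It is a minimal illustration, with variable names chosen for this example; it prints the residuals rather than drawing the plot.

```python
# Curved data from Section 3.1, problem 4 (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# residual = observed y - predicted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, res in zip(x, residuals):
    print(f"x = {xi}: residual = {res:+.1f}")
# The residuals run positive, negative, negative, negative, positive:
# a U-shaped pattern rather than random scatter.
```

Plotting these residuals against x would give the same kind of curved residual plot described in the notes, even though the correlation for these data is close to 1.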

Construct and interpret a residual plot for the Ford F-150 data. Your calculator calculates the residuals for you AFTER
finding the regression equation. Set up your Stat Plot as shown, then select 9:ZoomStat. (To access the list of residuals
press 2nd STAT)

Sketch the residual plot. Does the linear model seem appropriate? Explain.

A residual plot is a graphical tool for determining if a least-squares regression line is an appropriate model for a
relationship between two variables. Once we have determined that a least-squares regression line is appropriate, it
makes sense to ask: How well does the line work? If we use the least-squares regression line to make predictions, how
good will these predictions be?

If we use a least-squares line to predict the values of a response variable y from an explanatory variable x, the standard
deviation of the residuals (s) is given by

$$ s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} $$
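The calculation of s can be sketched in a few lines of Python. Because the F-150 data table is not reproduced in these notes, the example below reuses the stand mixer data from Section 3.1 instead; the variable names are assumptions for this illustration.

```python
from math import sqrt

# Stand mixer data from Section 3.1: weight (lb) and price ($).
weight = [23, 28, 19, 17, 25, 26, 21, 32, 16, 17, 8]
price = [180, 250, 300, 150, 300, 370, 400, 350, 200, 150, 30]

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(price) / n

# Least-squares line for predicting price from weight.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, price))
     / sum((x - x_bar) ** 2 for x in weight))
a = y_bar - b * x_bar

# residual = observed price - predicted price
residuals = [y - (a + b * x) for x, y in zip(weight, price)]

# s = sqrt( sum of squared residuals / (n - 2) )
s = sqrt(sum(res ** 2 for res in residuals) / (n - 2))
print(f"s = {s:.1f} dollars")
```

The result is in dollars, the units of the response variable, and describes the typical size of a prediction error when using this line.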

Calculate and interpret the standard deviation of the residuals for the F-150 data and interpret this value in context.

There is another numerical quantity that tells us how well the least-squares line predicts the values of the response
variable y. It is r², the coefficient of determination.

Suppose that we randomly selected an additional used Ford F-150 that was on sale. What should we predict for its
price? Figure 3.14 shows a scatterplot of the truck data that we have studied throughout this section, including the
least-squares regression line. Another horizontal line has been added at the mean y-value, ȳ = $27,834. If we don’t
know the number of miles driven for the additional truck, we can’t use the regression line to make a prediction. What
should we do? Our best strategy is to use the mean price of the other 16 trucks as our prediction.

Calculate the ratio of the sum of squared residuals from the least-squares regression line to the sum of squared
residuals from the mean line to determine the percent of variation that is unaccounted for by the least-squares
regression line.

What are some other factors, besides miles driven, that may determine the price of the truck?

What is the proportion of the total variation in price that is accounted for by the least squares regression line? Interpret
this value in context.

The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the
least-squares regression line of y on x. We can calculate r² using the formula:

$$ r^2 = 1 - \frac{\sum \text{residuals}^2}{\sum (y_i - \bar{y})^2} $$

r² is the percent of variation in the (y variable) that is accounted for by the linear model relating (y variable)
to (x variable).

How is r² related to r? How is r² related to s?

• r² and s both measure how well the least-squares regression line models the data (how much scatter there is
from the least-squares regression line).
• s is measured in the units of the response variable; r² is on a standard scale (no units).
• Neither addresses form!
• Careful: if you want to find r from r², you must take into account the direction of the association.
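The r² calculation and the warning about recovering r can be sketched as follows. This is a minimal illustration that reuses the stand mixer data from Section 3.1 (the F-150 data table is not reproduced in these notes); the variable names are assumptions for this example.

```python
from math import copysign, sqrt

# Stand mixer data from Section 3.1: weight (lb) and price ($).
weight = [23, 28, 19, 17, 25, 26, 21, 32, 16, 17, 8]
price = [180, 250, 300, 150, 300, 370, 400, 350, 200, 150, 30]

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(price) / n

# Least-squares line for predicting price from weight.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, price))
     / sum((x - x_bar) ** 2 for x in weight))
a = y_bar - b * x_bar

# SSE: squared residuals from the least-squares regression line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, price))
# SST: squared residuals from the horizontal line at the mean of y.
sst = sum((y - y_bar) ** 2 for y in price)

r_squared = 1 - sse / sst

# Recovering r from r-squared: the square root alone loses the sign,
# so take the sign from the slope (the direction of the association).
r = copysign(sqrt(r_squared), b)

print(f"r^2 = {r_squared:.3f}, r = {r:.3f}")
```

The ratio `sse / sst` is the fraction of variation left unaccounted for by the line, which is why r² is one minus that ratio.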

3.2 Interpreting Computer Output

Does seat location affect grades?


Many people believe that students learn better if they sit closer to the front of the classroom. Does sitting closer cause
higher achievement, or do better students simply choose to sit in the front? To investigate, an AP Statistics teacher
randomly assigned students to seat locations in her classroom for a particular chapter. At the end of the chapter, she
recorded the row number (Row 1 is closest to the front) and test score for each student. Least-squares regression was
performed on the data. A scatterplot with the regression line added, a residual plot, and some computer output from
the regression are shown below.
[Computer output with labels marking the slope, y-intercept, r², and standard deviation of the residuals]

(a) Is this an observational study or an experiment? Explain.

(b) Identify the type and scope of inference.

(c) Describe the relationship between row and test score for students in this class.

(d) What is the equation of the least-squares regression line? Define any variables you use.

(e) Interpret the slope of the least-squares regression line.

(f) What is the correlation?

(g) Is a linear model appropriate for these data? Explain.

(h) Calculate and interpret the residual for a student who was seated in the fourth row and had a test score of 75.

(i) Interpret the value of r² in context.

(j) Interpret s in context.

(k) Would it be reasonable to use the fitted regression equation to predict the test score for a student sitting in row 50
of a lecture hall? Explain.

3.2 How to Calculate the Least-Squares Regression Line

The least-squares regression line is the line ____________________________________ with slope _______________

and y-intercept _______________________.

1. A random sample of 15 high school students was selected from the CensusAtSchool database. The foot length
(in centimeters) and height (in centimeters) of each student in the sample were recorded. The mean and
standard deviation of foot lengths are x̄ = 24.76 cm and sx = 2.71 cm. The mean and standard deviation of the
heights are ȳ = 171.43 cm and sy = 10.69 cm. The correlation between foot length and height is r = 0.697.
Find the equation of the least-squares regression line for predicting height from foot length.

2. The mean height of married American women in their early twenties is 64.5 inches and the standard deviation is
2.5 inches. The mean height of married men the same age is 68.5 inches, with standard deviation of 2.7 inches.
The correlation between the heights of husbands and wives is about r = 0.5. Find the equation of the least-
squares regression line for predicting a husband’s height from his wife’s height for married couples in their early
20s.
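Both problems can be worked from summary statistics alone, using the standard relationships b = r · sy / sx for the slope and a = ȳ − b · x̄ for the y-intercept (the line passes through the point (x̄, ȳ)). The sketch below is one way to organize that arithmetic; the function name `lsrl_from_summary` and its parameter names are hypothetical.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x, where b = r * s_y / s_x
    and a = y_bar - b * x_bar, so the line passes through (x_bar, y_bar)."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Problem 1: predict height (cm) from foot length (cm).
a1, b1 = lsrl_from_summary(x_bar=24.76, s_x=2.71, y_bar=171.43, s_y=10.69, r=0.697)

# Problem 2: predict husband's height (in) from wife's height (in).
a2, b2 = lsrl_from_summary(x_bar=64.5, s_x=2.5, y_bar=68.5, s_y=2.7, r=0.5)

print(f"Problem 1: y-hat = {a1:.2f} + {b1:.3f}x")
print(f"Problem 2: y-hat = {a2:.2f} + {b2:.2f}x")
```

Remember to define x and y in context when writing the final equation by hand.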

3.2 Linear Regression Practice

Can we predict the number of wins for an MLB team from their run differential (runs scored – runs allowed)?

The scatterplot below shows the run differential and number of wins for all 30 MLB teams for the 2023 season. A
residual plot and some computer output are also provided.

(a) Describe the relationship between run differential and # of wins.

(b) Interpret the slope and y-intercept of the least-squares regression line.

(c) Calculate and interpret the residual for the Chicago Cubs who had a run differential of 96 and won 83 games.

(d) Interpret the value of r² in context.

(e) Interpret s in context.

(f) Statistician Bill James developed the Pythagorean Winning Percentage formula to predict the number of games a
team “should” win based on its total number of runs scored versus its number of runs allowed. The initial formula
for Pythagorean winning percentage was as follows: (runs scored^2)/(runs scored^2 + runs allowed^2). Since then,
other analysts have attempted to find an even more accurate formula. For instance, Baseball-Reference.com uses
1.83 as its exponent of choice. Use both versions of the formula to predict how many games the Cubs should have
won. They scored 819 runs and allowed 723 runs in 162 games.
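The arithmetic in part (f) can be sketched as below. The function name `pythagorean_wins` and its parameters are assumptions for this illustration; the formula, the two exponents, and the Cubs' totals come from the problem statement.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins = games * RS^k / (RS^k + RA^k), where k is the exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Cubs totals from part (f): 819 runs scored, 723 allowed, 162 games.
wins_original = pythagorean_wins(819, 723, 162, exponent=2)
wins_alternate = pythagorean_wins(819, 723, 162, exponent=1.83)

print(f"Exponent 2:    {wins_original:.1f} expected wins")
print(f"Exponent 1.83: {wins_alternate:.1f} expected wins")
```

Comparing either prediction with the 83 games the Cubs actually won gives a residual-style measure of how the team did relative to its run totals.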

How Do Outliers Affect the LSRL?

1. Use the Correlation and Regression applet at tinyurl.com/regressionapplet

• Click on the graphing area to add 10 points in the center so that the correlation is about r = 0.90.
• Check the box to show the least-squares line.

[Picture: applet graphing area with added points labeled (a), (b), (c), and (d)]

2. Predict if the slope, y-intercept and correlation will increase, decrease, or stay about the same. Check your
prediction, then delete the point (click on it) before moving to the next part.

(a) Adding point (a) as shown in the picture above.

slope: y-intercept: correlation:

(b) Adding point (b) as shown in the picture above.

slope: y-intercept: correlation:

(c) Adding point (c) as shown in the picture above.

slope: y-intercept: correlation:

(d) Adding point (d) as shown in the picture above.

slope: y-intercept: correlation:

3. Why are points (a), (b), (c), and (d) considered outliers?

4. Which outliers had the greatest impact on the LSRL: vertical or horizontal outliers?

As you learned in this activity, unusual points may or may not have an influence on the least-squares regression line and
the correlation r. The same is true for the coefficient of determination r² and the standard deviation of residuals s. Here
are 4 scatterplots that summarize the possibilities. In all four scatterplots, the 8 points in the lower left are the same.

Case 1: No unusual points

Case 2: A point far from the other points in the x direction, but in the same pattern.

Compared to case 1, determine whether the following stayed the same, increased, decreased or changed from positive
to negative.

Slope: y-intercept: r:

r²: s:

Case 3: A point that is far from the other points in the x direction, and not in the same pattern.

Compared to case 1, determine whether the following stayed the same, increased, decreased or changed from positive
to negative.

Slope: y-intercept: r:

r²: s:

Case 4: A point that is far from the other points in the y direction, and not in the same pattern.

Compared to case 1, determine whether the following stayed the same, increased, decreased or changed from positive
to negative.

Slope: y-intercept: r:

r²: s:

In Cases 2 and 3, the unusual point had a much bigger x value than the other points. Points whose x values are much
smaller or much larger than the other points in a scatterplot have high leverage. In Case 4, the unusual point had a very
large residual. Points with large residuals are called outliers. All three of these unusual points are considered influential
points because adding them to the scatterplot substantially changed either the equation of the least-squares regression
line or one or more of the other summary statistics (r, r², s).

Definitions

Points with high leverage in regression have much larger or much smaller x values than the other points in the data set.

An outlier in regression is a point that does not follow the pattern of the data and has a large residual.

An influential point in regression is any point that, if removed, substantially changes the slope, y intercept, correlation,
coefficient of determination, or standard deviation of the residuals.
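The effect of each kind of unusual point can be explored numerically. The sketch below is an illustration under stated assumptions: the 8 base points and the three added points are invented for this example (they are not the points in the scatterplots from these notes), and the function name `fit` is hypothetical.

```python
from math import sqrt


def fit(points):
    """Return (slope, intercept, r) of the least-squares line for (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar, s_xy / sqrt(s_xx * s_yy)


# Hypothetical base data: 8 points with a positive linear pattern.
base = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]

cases = {
    "no unusual points": base,
    "high leverage, in pattern": base + [(20, 20)],
    "high leverage, off pattern": base + [(20, 0)],
    "outlier in y, typical x": base + [(4.5, 15)],
}

results = {name: fit(points) for name, points in cases.items()}
for name, (slope, intercept, r) in results.items():
    print(f"{name:28s} slope = {slope:6.3f}  intercept = {intercept:6.3f}  r = {r:6.3f}")
```

With these made-up points, the in-pattern high-leverage point strengthens the correlation, the off-pattern high-leverage point flips the slope from positive to negative, and the y-direction outlier at a typical x value leaves the slope unchanged while weakening the correlation.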

Example 10

Here is a scatterplot showing the cost in dollars and the battery life in hours
for a sample of netbooks (small laptop computers).

[Scatterplot: battery life (hours) versus cost (dollars), with cost on the horizontal axis]

What effect do the two netbooks that cost $500 have on the following?

Slope:

y-intercept:

correlation r:

coefficient of determination r²:

standard deviation of the residuals:

2007B

Matching Graphs & Coefficients of determination

A circled point was added to graph A to create graphs B, C and D. Likewise, a circled point was added to graph E to
create graphs F, G and H. Match the graphs with the regression equations and coefficients of determination. NO
CALCULATORS!

1) _______ $\hat{y} = -0.366x + 3$; $r^2 = 0.012$
2) _______ $\hat{y} = -0.436x + 4.53$; $r^2 = 0.33$
3) _______ $\hat{y} = 0.327x + 1.3$; $r^2 = 0.72$
4) _______ $\hat{y} = -0.0617x + 3.3$; $r^2 = 0.026$
5) _______ $\hat{y} = -0.888x + 6.7$; $r^2 = 0.34$
6) _______ $\hat{y} = 0.231x + 2.3$; $r^2 = 0.28$
7) _______ $\hat{y} = 0.536x + 0.31$; $r^2 = 0.65$
8) _______ $\hat{y} = 0.53x + 1.1$; $r^2 = 0.14$

[Graphs A, B, C, D, E, F, G, and H]

Developing an equation for estimating body height from linear body measurements of Ethiopian adults

Alemayehu Digssie, Alemayehu Argaw, and Tefera Belachew https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6258443/

Measurements of erect height in older people, hospitalized and bedridden patients, and people with skeletal deformity
is difficult. As a result, using body mass index for assessing nutritional status is not valid. Height estimated from linear
body measurements such as arm span, knee height, and half arm span was shown to be useful surrogate measures of
stature. However, the relationship between linear body measurements and stature varies across populations implying
the need for the development of population-specific prediction equation. The objective of this study was to develop a
formula that predicts height from arm span, half arm span, and knee height for Ethiopian adults and assess its
agreement with measured height.

A cross-sectional study was conducted from March 15 to April 21, 2016 in Jimma University among a total of 660 (330
females and 330 males) subjects aged 18–40 years. A two-stage sampling procedure was employed to select study
participants. Data were collected using interviewer-administered questionnaire and measurement of anthropometric
parameters. The data were edited and entered into Epi Data version 3.1 and exported to SPSS for windows version 20
for cleaning and analyses. Linear regression model was fitted to predict height from knee height, half arm span, and arm
span. Bland-Altman analysis was employed to see the agreement between actual height and predicted heights. P values
< 0.05 was used to declare as statistically significance.

On multivariable linear regression analyses after adjusting for age and sex, arm span (β = 0.63, p < 0.001, R² = 87%), half
arm span (β = 1.05, p < 0.001, R² = 83%), and knee height (β = 1.62, p < 0.001, R² = 84%) predicted height significantly.
The Bland-Altman analyses showed a good agreement between measured height and predicted height using all the
three linear body measurements.

The findings imply that in the context where height cannot be measured, height predicted from arm span, half arm span,
and knee height is a valid proxy indicator of height. Arm span was found to be the best predictor of height. The
prediction equations can be used to assess the nutritional status of hospitalized and/or bedridden patients, people with
skeletal deformity, and elderly population in Ethiopia.

1. Identify the explanatory and response variables.

2. Identify the population and the sample.

3. “P values < 0.05 was used to declare as statistically significant.” What does this mean?

4. According to this study, which measurement is best for predicting height in Ethiopian adults?

5. Interpret the slope for the arm span equation. (Note: β = slope)

6. Interpret the r² value for the arm span data.


How much candy can you grab?

Can students with a larger handspan grab more candy than those with smaller handspans?
Today we will investigate this question.

1. Measure the span of your dominant hand to the nearest half centimeter (cm). Handspan is the distance from the tip
of the thumb to the tip of the pinkie finger on your fully stretched-out hand. Handspan = ________ cm

2. Use the same hand to grab as many candies as possible from the container. You must grab the candies with your
fingers pointing down (no scooping!) and hold the candies for 2 seconds before counting them. After counting, put
the candy back into the container. Record your data on the board.

3. Sketch a scatterplot of the relationship and describe what the scatterplot reveals about the relationship between
hand span and number of candies held.

4. Using technology determine the least squares regression equation for predicting the number of candies grabbed
from hand size. Add the line to the scatterplot.

5. Record the correlation coefficient below and interpret its value.

6. What is the slope of the line? Interpret the slope in context.

7. What is the y-intercept of the line? Interpret the y-intercept in context.

8. LeBron James has a handspan of 23.5 cm. Use the equation to predict how many candies LeBron can grab.

9. Suppose LeBron actually grabbed 38 candies. Find and interpret the residual.

10. Suppose we did not gather information about your hand sizes and had simply used the mean number of candies
grabbed for your class to make a prediction for any given student. Use your data to find the mean number of candies
grabbed in this class. Record the mean.

11. Using the mean you found in #10, draw a horizontal line at the value on the graph in #3. For the majority of the
students in the class, does it appear that the mean line or the LSRL better predicts the number of candies a given
student can grab? Explain.

12. The coefficient of determination, r², is a measure of the improvement in prediction when using the LSRL to predict
the value of a response variable rather than simply using the average value of the response variable. Find and
interpret the value of r² for the LSRL of number of candies grabbed and hand size.

Chapter 3 Review

Exercises 1-5 refer to the following setting.


Measurements on young children in Mumbai, India, found this least-squares line for predicting height y from arm span x:

$$ \hat{y} = 6.4 + 0.93x $$


Measurements are in centimeters (cm).

1. By looking at the equation of the least-squares regression line, you can see that the correlation between height
and arm span is

(a) greater than zero.


(b) less than zero.
(c) 0.93.
(d) 6.4.
(e) Can’t tell without seeing the data.

2. In addition to the regression line, the report on the Mumbai measurements says that r² = 0.95. This suggests
that

(a) although arm span and height are correlated, arm span does not predict height very accurately.
(b) height increases by .97 cm for each additional centimeter of arm span.
(c) 95% of the relationship between height and arm span is accounted for by the regression line.
(d) 95% of the variation in height is accounted for by the regression line.
(e) 95% of the height measurements are accounted for by the regression line.

3. One child in the Mumbai study had height 59 cm and arm span 60 cm. This child’s residual is

(a) −3.2 cm.


(b) −2.2 cm.
(c) −1.3 cm.
(d) 3.2 cm.
(e) 62.2 cm.

4. Suppose that a tall child with arm span 120 cm and height 118 cm was added to the sample used in this study.
What effect will adding this child have on the correlation and the slope of the least-squares regression line?
(a) Correlation will increase, slope will increase.
(b) Correlation will increase, slope will stay the same.
(c) Correlation will increase, slope will decrease.
(d) Correlation will stay the same, slope will stay the same.
(e) Correlation will stay the same, slope will increase.

5. Suppose that the measurements of arm span and height were converted from centimeters to meters by dividing
each measurement by 100. How will this conversion affect the values of r² and s?

(a) r² will increase, s will increase.
(b) r² will increase, s will stay the same.
(c) r² will increase, s will decrease.
(d) r² will stay the same, s will stay the same.
(e) r² will stay the same, s will decrease.

6. You have data for many years on the average price of a barrel of oil and the average retail price of a gallon of
unleaded regular gasoline. If you want to see how well the price of oil predicts the price of gas, then you should
make a scatterplot with _______ as the explanatory variable.

(a) the price of oil


(b) the price of gas
(c) the year
(d) either oil price or gas price
(e) time

7. In a scatterplot of the average price of a barrel of oil and the average retail price of a gallon of gas, you expect to
see

(a) very little association.


(b) a weak negative association.
(c) a strong negative association.
(d) a weak positive association.
(e) a strong positive association.

8. If women always married men who were 2 years older than themselves, what would the correlation between
the ages of husband and wife be?

(a) 2
(b) 1
(c) 0.5
(d) 0
(e) Can’t tell without seeing the data

9. Which of the following is not a characteristic of the least-squares regression line?


(a) The slope of the least-squares regression line is always between −1 and 1.
(b) The least-squares regression line always goes through the point (x̄, ȳ).
(c) The least-squares regression line minimizes the sum of squared residuals.
(d) The slope of the least-squares regression line will always have the same sign as the correlation.
(e) The least-squares regression line is not resistant to outliers.

10. The figure to the right is a scatterplot of reading test scores against IQ test scores for 14 fifth-grade children.
There is one low outlier in the plot. What effect does this low outlier have on the correlation?

(a) It makes the correlation closer to 1.


(b) It makes the correlation closer to 0 but still positive.
(c) It makes the correlation equal to 0.
(d) It makes the correlation negative.
(e) It has no effect on the correlation.

11. The manager of a grocery store selected a random sample of 11 customers to investigate the relationship
between the number of customers in a checkout line and the time to finish checkout. As soon as the selected
customer entered the end of the checkout line, data were collected on the number of customers in line who
were in front of the selected customer and the time, in seconds until the selected customer was finished with
the checkout. The data are shown in the following scatterplot along with the corresponding LSRL and computer
output.

a. Describe what the scatterplot reveals about the relationship between number of customers in line and the
time it takes to checkout.

b. What is the equation of the LSRL?

c. Identify and interpret in context the estimate of the slope for the LSRL.

d. Identify and interpret in context the estimate of the intercept for the LSRL.

e. Calculate and interpret the residual for a customer who was in a line with 3 people and finished checking out
after 200 seconds.

f. Identify and interpret the standard deviation of the residuals in context.

g. Identify and interpret in context the coefficient of determination, r².

h. One of the data points was determined to be an outlier. Circle the point on the scatterplot and explain why
it is considered an outlier. If this point were removed from the plot, what effect would it have on the
correlation?

12. Each year, students in an elementary school take a standardized math test at the end of the school year. For a
class of fourth-graders, the average score was 55.1 with a standard deviation of 12.3. In the third grade, these
same students had an average score of 61.7 with a standard deviation of 14.0. The correlation between the two
sets of scores is r = 0.95. Calculate the equation of the least-squares regression line for predicting a fourth-grade
score from a third-grade score.
