
Chapter 5 - Regression

Chapter 5 discusses regression analysis, focusing on how one variable can predict or explain another. It introduces key concepts such as response and explanatory variables, the regression line, residuals, and the coefficient of determination. The chapter includes examples and calculations to illustrate these concepts in practical scenarios.

Uploaded by jaylabee15

Chapter 5 – Regression

(Copyrighted – may not be repurposed, posted, or disseminated anywhere else.)

Chapter 4 – Scatterplots and Correlation


• Scatterplot – a graph that displays the relationship (or association) between two quantitative variables.
• We describe a scatterplot (or a relationship) by giving its form, direction, strength, and anything unusual (e.g., outliers).
• 𝑟 = correlation coefficient – measures the strength and direction of a linear relationship between two quantitative variables.

Chapter 5 – Regression

Sometimes we want to do more than measure the strength of a relationship; we want to use one
variable to help predict (or explain) the other.

Response variable
• The variable that measures the outcome of a study, or the variable that we would like to explain or predict.
• Plotted on the y-axis.

Explanatory variable
• The variable that may help to explain, predict, or influence changes in the response variable.
• Plotted on the x-axis.

Example – For each of the following, identify the response and explanatory variables.
(a) Student volunteers at a university drank different numbers of cans of beer. Thirty
minutes later, a police officer measured their blood alcohol content.

% of alcohol in the blood = __________ Number of beers consumed = __________

(b) The results of the National Student Loan Survey include data on the amount of debt of
recent graduates, their current income, and how stressed they feel about college debt.

Amount of debt = __________ Current income = __________

How stressed they feel about their debt = __________

(c) Number of times a student accessed the website for a course = __________

Grade on the final exam = __________

(d) Calories burned per week = __________

Number of hours per week spent exercising = __________


Example – Researchers wonder if a person’s waist (which is easy to measure) could be a good
predictor of their body fat percentage (which might be harder to measure). To investigate the
relationship, researchers measure the waists of 10 male subjects and then immerse them in water
to accurately measure their body fat percent. The data is below.

Person            1    2    3    4    5    6    7    8    9   10
Waist (inches)   32   33   33   34   36   38   39   41   41   44
Body Fat (%)      6    6   10   12   16   21   22   27   32   33

(a) Identify the response and explanatory variables.

(b) Construct a scatterplot of the data. Describe the nature of the relationship, i.e., give the
form, direction, and strength of the relationship.

Regression Line
• The line that best* fits the scatterplot (or best models the relationship between 𝑥 and 𝑦).
(* What is meant by "best"? See the end of these notes.)

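• The equation is $\hat{y} = a + bx$, where
o $b$ = slope of the line = $r \cdot \frac{s_y}{s_x}$
o $a$ = y-intercept of the line = $\bar{y} - b\,\bar{x}$
o $\hat{y}$ = predicted y value
o $s_x$ and $s_y$ are the standard deviations of the x and y values, and $r$ is their correlation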
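The two formulas above are plain arithmetic on five summary statistics, so they can be checked with a few lines of Python. This is a minimal sketch; the function name `regression_from_summary` is our own illustration, not something from the notes. Note that the slope must be computed first, because the intercept formula uses it.

```python
def regression_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    b = r * s_y / s_x        # slope: correlation scaled by the ratio of standard deviations
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Summary statistics for the waist / body fat example (Part (c))
a, b = regression_from_summary(x_bar=37.1, s_x=4.122, y_bar=18.5, s_y=10.091, r=0.982)
print(f"y-hat = {a:.2f} + {b:.3f}x")  # prints: y-hat = -70.69 + 2.404x
```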

Example Continued

(c) Use the information below to calculate the equation of the regression line.

                             Mean    Std. Dev.
𝑥 = waist (inches)           37.1    4.122
𝑦 = body fat (%)             18.5    10.091
𝑟 = correlation              0.982
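The summary statistics in this table can be recomputed from the raw data for the 10 subjects. The sketch below assumes the usual sample standard deviation (dividing by n − 1, which is what Python's `statistics.stdev` does) and computes 𝑟 as the sum of cross-products divided by (n − 1)·s_x·s_y; the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Waist (inches) and body fat (%) for the 10 subjects in the data table
waist = [32, 33, 33, 34, 36, 38, 39, 41, 41, 44]
body_fat = [6, 6, 10, 12, 16, 21, 22, 27, 32, 33]

def summarize(xs, ys):
    """Means, sample standard deviations, and correlation r for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 in the denominator)
    n = len(xs)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = cross / ((n - 1) * s_x * s_y)
    return x_bar, s_x, y_bar, s_y, r

x_bar, s_x, y_bar, s_y, r = summarize(waist, body_fat)
print(f"waist:    mean {x_bar:.1f}, sd {s_x:.3f}")
print(f"body fat: mean {y_bar:.1f}, sd {s_y:.3f}")
print(f"r = {r:.3f}")
```

Rounded to the table's precision, the output matches the values above (37.1, 4.122, 18.5, 10.091, 0.982).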

(d) Graph the regression line on your scatterplot. How well do you think the line fits the data
(or models the relationship between 𝑥 and 𝑦)?

(e) In the context of this problem, give an interpretation of the slope.


Explain the meaning of the y-intercept.

(f) Predict the body fat percent for people having waist measurements of 33, 40, and 50
inches.

(g) Would you trust the accuracy of all 3 predictions made in Part (f)? Why or why not?
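Parts (f) and (g) can be sketched in Python. One common reason to distrust a prediction is extrapolation: the line is only supported by the waist sizes actually observed (32 to 44 inches in the data table), so a prediction outside that range rests on the assumption that the linear pattern continues. The names `predict`, `is_extrapolation`, `X_MIN`, and `X_MAX` are illustrative only.

```python
# Line from Part (c): b = r * s_y / s_x and a = y_bar - b * x_bar (summary statistics)
b = 0.982 * 10.091 / 4.122
a = 18.5 - b * 37.1

# Smallest and largest waist sizes in the data table (inches)
X_MIN, X_MAX = 32, 44

def predict(waist):
    """Predicted body fat % from the regression line."""
    return a + b * waist

def is_extrapolation(waist):
    """True when the waist size lies outside the range of the observed data."""
    return not (X_MIN <= waist <= X_MAX)

for waist in (33, 40, 50):
    flag = "extrapolation" if is_extrapolation(waist) else "within data range"
    print(f"waist {waist} in: predicted body fat {predict(waist):.1f}% ({flag})")
```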

Residual
• The vertical distance that a point is from the regression line.
• A residual represents the prediction error between the observed (or actual) 𝑦 value and the predicted 𝑦 value.
• When a point is
o above the regression line it has a positive residual. The line under-predicts the actual y value.
o below the regression line it has a negative residual. The line over-predicts the actual y value.
• Residual = (Observed 𝑦 value) – (Predicted 𝑦 value) = 𝑦 − 𝑦̂

Example Continued

(h) Calculate the residuals for Person #2 and Person #9. Is the regression line over-
predicting or under-predicting the actual body fat % in these two cases?
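Part (h) applies the residual formula 𝑦 − 𝑦̂ directly. A minimal sketch, assuming the line built from the summary statistics in Part (c); the helper names `residual` and `describe` are our own:

```python
# Line from Part (c), built from the summary statistics
b = 0.982 * 10.091 / 4.122
a = 18.5 - b * 37.1

def residual(observed_y, predicted_y):
    """Residual = observed y - predicted y."""
    return observed_y - predicted_y

def describe(res):
    """Positive residual: point above the line; negative: point below it."""
    if res > 0:
        return "above the line, so the line under-predicts"
    if res < 0:
        return "below the line, so the line over-predicts"
    return "on the line"

# (person number, waist in inches, observed body fat %) from the data table
for person, waist, fat in [(2, 33, 6), (9, 41, 32)]:
    res = residual(fat, a + b * waist)
    print(f"Person {person}: residual = {res:+.2f} ({describe(res)})")
```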

Coefficient of Determination
Question – Why is there variation among the 𝑦 values, i.e., why do different people have
different body fat %?
There could be many reasons, but we group them into two major categories.
• 𝑦 values vary because body fat % is related to waist.
(The larger a person's waist, the larger their body fat % tends to be.)
• 𝑦 values vary because body fat % is related to other factors.
(This could explain why Persons 2 and 3 have the same waist but different body fat %; the same is true of Persons 8 and 9.)
Statisticians measure the percent of variation in the 𝑦 values that is due to each category.

• $r^2$ =
o the coefficient of determination
o the percent of the variation in the 𝑦 values that is due to the relationship with 𝑥
o the percent of the variation in the 𝑦 values that can be explained by the regression model

• $1 - r^2$ = the percent of the variation in the 𝑦 values that remains unexplained
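Why does squaring 𝑟 give "percent of variation explained"? For a least-squares line (with an intercept), $r^2$ also equals $1 - \text{SSE}/\text{SST}$, where SST $= \sum (y - \bar{y})^2$ is the total variation in the 𝑦 values and SSE is the variation left over in the residuals. The sketch below checks this identity on the body fat data; the least-squares slope $S_{xy}/S_{xx}$ it uses is algebraically the same as $r \cdot s_y/s_x$. The function name `r_squared_two_ways` is hypothetical.

```python
from statistics import mean

waist = [32, 33, 33, 34, 36, 38, 39, 41, 41, 44]
body_fat = [6, 6, 10, 12, 16, 21, 22, 27, 32, 33]

def r_squared_two_ways(xs, ys):
    """Return (r**2, 1 - SSE/SST) for the least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)       # total variation in y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / (sxx * sst) ** 0.5
    b = sxy / sxx                                 # least-squares slope
    a = y_bar - b * x_bar                         # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return r ** 2, 1 - sse / sst

via_r, via_sse = r_squared_two_ways(waist, body_fat)
print(f"r^2 = {via_r:.3f}; 1 - SSE/SST = {via_sse:.3f}")
print(f"explained: {via_r:.1%}, unexplained: {1 - via_r:.1%}")
```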

Example Continued

(i) What percent of the variation in the body fat % can be explained by the regression
model? What percent remains unexplained?

Notes
• $0 \le r^2 \le 1$
• If our model (that uses 𝑥 to predict 𝑦) is a good model, then “most” of the variation in the 𝑦 values should be explained by 𝑥, i.e., $r^2$ should be “close” to 1.
• The closer $r^2$ is to 1.0, the better the regression line is for modeling the relationship between 𝑥 and 𝑦, and the better it is for predicting 𝑦 by using 𝑥.
Example – As soon as a bottle of soda is opened, it begins to lose its carbonation. Fourteen 12-
ounce bottles of cola were obtained, and each was assigned a randomly selected time period (in
hours). Each bottle was opened and allowed to stand at room temperature. The carbonation (y)
in each bottle was measured after the prescribed time period (x). Summaries of the data appear
below.
                   Mean     Std. Dev.
Time (hours)       0.614    0.390
Carbonation        2.671    0.891
Correlation       -0.744

(a) Identify the response and explanatory variables.

(b) Calculate the equation of the least squares regression line.

(c) In the context of this problem, give an interpretation of the slope of the regression line.

(d) Predict the carbonation after 1 hour and 15 minutes.

(e) After sitting for 2 hours, one particular bottle had an actual carbonation of 0.300.
Calculate the residual for the bottle. Would the regression line under-predict or over-
predict the actual amount of carbonation for the bottle?

(f) What percent of the variation in the y values can be explained by the regression model?
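Parts (b) through (f) use the same formulas as the body fat example, now with a negative correlation, so the slope comes out negative. A minimal sketch, assuming 1 hour and 15 minutes is entered as 1.25 hours; variable names are illustrative:

```python
# Summary statistics from the soda table (x = time in hours, y = carbonation)
x_bar, s_x = 0.614, 0.390
y_bar, s_y = 2.671, 0.891
r = -0.744

b = r * s_y / s_x          # slope (negative because r is negative)
a = y_bar - b * x_bar      # intercept

def predict(hours):
    """Predicted carbonation after the given number of hours."""
    return a + b * hours

y_hat_125 = predict(1.25)            # 1 hour 15 minutes = 1.25 hours
res_2h = 0.300 - predict(2.0)        # residual = observed - predicted
verdict = "under-predicts" if res_2h > 0 else "over-predicts"

print(f"y-hat = {a:.3f} + ({b:.3f})x")
print(f"prediction at 1.25 h: {y_hat_125:.3f}")
print(f"residual at 2 h: {res_2h:.3f} (the line {verdict})")
print(f"r^2 = {r ** 2:.1%}")
```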

Question: How do we determine whether one line fits a scatterplot better than another?

Answer:
• We use the method of least squares to assign a rating (SSE) to each line.
• The line that achieves the best rating (i.e., the smallest SSE) is the regression line.
• Residual = distance (measured vertically) that a point is from a line.
• Points above the line have positive residuals; points below the line have negative residuals.
• SSE = sum of squared errors = $\sum (\text{residual})^2 = \sum (y - \hat{y})^2$
• The line achieving the smallest SSE is the best line. This is the regression line.
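The least-squares idea can be checked numerically: compute the SSE of the regression line for the waist / body fat data, then nudge its intercept and slope and confirm that every nudged line scores worse. This is a sketch, not a proof; the particular nudges below are arbitrary choices, and the helper names `sse` and `least_squares` are our own. The least-squares slope $S_{xy}/S_{xx}$ used here equals $r \cdot s_y/s_x$.

```python
from statistics import mean

waist = [32, 33, 33, 34, 36, 38, 39, 41, 41, 44]
body_fat = [6, 6, 10, 12, 16, 21, 22, 27, 32, 33]

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Intercept and slope of the line with the smallest SSE."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = least_squares(waist, body_fat)
best = sse(a, b, waist, body_fat)

# Shift the intercept and/or slope a little: each alternative line has a larger SSE.
for da, db in [(1, 0), (-1, 0), (0, 0.05), (0, -0.05), (2, -0.05)]:
    assert sse(a + da, b + db, waist, body_fat) > best

print(f"least-squares line: y-hat = {a:.2f} + {b:.3f}x, SSE = {best:.2f}")
```

The intercept computed from the raw data (about −70.67) differs slightly from the one obtained from the rounded summary statistics in Part (c) (about −70.69); the difference comes only from rounding in the summary table.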
