Interactive Lecture Notes 12-Regression Analysis
Describing and assessing the significance of relationships between variables is very important
in research. We will first learn how to do this in the case when the two variables are
quantitative. Quantitative variables have numerical values that can be ordered according to
those values.
Main idea
We wish to study the relationship between two quantitative variables.
The first step in examining the relationship is to use a graph, a scatterplot, to display the
relationship. We will look for an overall pattern and see if there are any departures from this
overall pattern.
If a linear relationship appears to be reasonable from the scatterplot, we will take the next step
of finding a model (an equation of a line) to summarize the relationship. The resulting equation
may be used for predicting the response for various values of the explanatory variable. If
certain assumptions hold, we can assess the significance of the linear relationship and make
some confidence intervals for our estimations and predictions.
Let's begin with an example that we will carry throughout our discussions.
Graphing the Relationship: Restaurant Bill vs Tip
How well does the size of a restaurant bill predict the tip the server receives? Below are the
bills and tips from six different restaurant visits in dollars.
Bill ($)  41  98  25  85  50  73
Tip ($)    8  17   4  12   5  14
Response (dependent) variable y = tip amount (in dollars).
Explanatory (independent) variable x = bill amount (in dollars).
Step 1: Examine the data graphically with a scatterplot.
Add the points to the scatterplot below:
[Blank scatterplot for plotting tip (y) against bill (x)]
The least squares regression line will be written as ŷ = b0 + b1x, where b0 is the y-intercept and b1 is the slope.
Goal:
To find a line that is “close” to the data points, that is, find the “best fitting” line.
How?
What do we mean by best?
One measure of how well a line fits is to look at the
“observed errors” in prediction.
Observed errors = observed y − predicted y = y − ŷ,
and these observed errors are called residuals.
The calculation table has column totals Σx = 372 and Σy = 60, so the sample means are
x̄ = 372/6 = 62 and ȳ = 60/6 = 10.
Slope Estimate: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = 666/3900 = 0.17077
y-intercept Estimate: b0 = ȳ − b1x̄ = 10 − 0.17077(62) = −0.5877
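A minimal R sketch of this hand computation, using the bill and tip data above (the variable names here are ours):

x <- c(41, 98, 25, 85, 50, 73)   # bills
y <- c(8, 17, 4, 12, 5, 14)      # tips
Sxy <- sum((x - mean(x)) * (y - mean(y)))   # 666
Sxx <- sum((x - mean(x))^2)                 # 3900
b1 <- Sxy / Sxx                  # slope estimate: 0.17077
b0 <- mean(y) - b1 * mean(x)     # intercept estimate: -0.58769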
Note: The 5th dinner guest in the sample had a bill of $50 and the observed tip was $5.
The residual for this guest is e = y − ŷ = 5 − (−0.5877 + 0.17077(50)) = 5 − 7.95 = −2.95 dollars.
You found the residual for one observation. You could compute the residual for each
observation. The following table shows each residual.
ŷ = −0.5877 + 0.17077x

x = bill   y = tip   ŷ (predicted value)   e = y − ŷ (residual)   e² (squared residual)
41         8         6.41                  1.59                   2.52
98         17        16.15                 0.85                   0.72
25         4         3.68                  0.32                   0.10
85         12        13.93                 −1.93                  3.73
50         5         7.95                  −2.95                  8.70
73         14        11.88                 2.12                   4.49
Totals:                                    ≈ 0                    20.26
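The residual column and its totals can be checked in R (continuing the sketch above):

yhat <- b0 + b1 * x   # predicted tips
e <- y - yhat         # residuals
sum(e)                # approximately 0 (a property of least squares)
sum(e^2)              # sum of squared residuals, about 20.27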
Some pictures: [scatterplots illustrating linear relationships of various directions and strengths, omitted]
Tips Example:
r = 0.921
Interpretation: There is a strong positive linear association between the size of the bill and the size of the tip.
The square of the correlation, r², is
sometimes presented as a percent. It can be shown that the square of the correlation is related
to the sums of squares that arise in regression.
The responses (the amounts of tip) in the data set are not all the same; they do vary. We
would measure the total variation in these responses as SSTO = Σ(y − ȳ)² (the last column
total in the calculation table that we said we would use later).
Part of the reason why the amount of tip varies is that there is a linear relationship
between amount of tip and amount of bill, and the study included different amounts of bill.
When we found the least squares regression line, there was still some small variation remaining
in the responses around the line. This amount of variation that is not accounted for by the linear
relationship is called the SSE (sum of squared errors).
The amount of variation that is accounted for by the linear relationship is called the sum of
squares due to the model (or regression), denoted by SSM (or sometimes as SSR).
So we have: SSTO = SSM + SSE
It can be shown that
r² = SSM/SSTO = (SSTO − SSE)/SSTO
= the proportion of total variability in the responses that can be explained by the linear
relationship with the explanatory variable x.
Note: The value of r² and these sums of squares are summarized in an ANOVA table that is
standard output from computer packages when doing regression.
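As a check on these ideas with the tips data, the sums of squares can be computed directly in R (continuing the sketch above; the values match the ANOVA output shown later):

SSTO <- sum((y - mean(y))^2)   # total variation: 134
SSE  <- sum(e^2)               # unexplained variation: 20.268
r2   <- 1 - SSE / SSTO         # 0.84875, about 84.9%
cor(x, y)^2                    # the squared correlation, same value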
Measuring Strength and Direction for Exam 2 vs Final
SSTO =
SSE =
So the squared correlation coefficient for our exam scores regression is:
r² = (SSTO − SSE)/SSTO =
Interpretation:
We accounted for ______% of the variation in ______.
Nonlinear relationships
Detecting Outliers and their influence on regression results.
Dangers of Extrapolation (predicting outside the range of your data)
Dangers of combining groups inappropriately (Simpson’s Paradox)
Correlation does not prove causation
R Regression Analysis for Bill vs Tips
Let’s look at the R output for our Bill and Tip data.
We will see that many of the computations are done for us.
Call:
lm(formula = Tip ~ Bill, data = Tips)
Residuals:
1 2 3 4 5 6
1.5862 0.8523 0.3185 -1.9277 -2.9508 2.1215
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.58769 2.41633 -0.243 0.81980
Bill 0.17077 0.03604 4.738 0.00905 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation “Matrix”
Bill Tip
Bill 1.0000000 0.9212755
Tip 0.9212755 1.0000000
ANOVA Table
Response: Tip
          Df  Sum Sq Mean Sq F value   Pr(>F)
Bill       1 113.732 113.732  22.446 0.009052 **
Residuals 4 20.268 5.067
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
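All of the output above can be reproduced with a few lines of R. This is a sketch; the lm() call in the output implies the data sit in a data frame named Tips with columns Bill and Tip:

Tips <- data.frame(Bill = c(41, 98, 25, 85, 50, 73),
                   Tip  = c(8, 17, 4, 12, 5, 14))
fit <- lm(Tip ~ Bill, data = Tips)
summary(fit)   # coefficients, standard errors, t-tests
cor(Tips)      # the correlation "matrix"
anova(fit)     # the ANOVA table with the sums of squares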
Inference in Linear Regression Analysis
The material covered so far focuses on using the data for a sample to graph and describe the
relationship. The slope and intercept values we have computed are statistics; they are
estimates of the underlying true relationship for the larger population.
Next we turn to making inferences about the relationship for the larger population. Here is a
nice summary to help us distinguish between the regression line for the sample and the
regression line for the population.
To do formal inference, we think of our b0 and b1 as estimates of the unknown parameters β0
and β1. Below we have the somewhat statistical way of expressing the underlying model that
produces our data:

y = β0 + β1x + ε, where ε is N(0, σ)
This statistical model for simple linear regression assumes that for each value of x, the observed
values of the response (the population of y values) are normally distributed, varying around
some true mean (that may depend on x in a linear way) with a standard deviation σ that does
not depend on x. This true mean is sometimes expressed as E(Y) = β0 + β1x. The
components and assumptions regarding this statistical model are shown visually below.
[Figure: normal curves with common standard deviation σ centered on the true regression line E(Y) = β0 + β1x, plotted against x]
The ε represents the true error term. These would be the deviations of a particular value of the
response y from the true regression line. As these are the deviations from the mean, these
error terms should have a normal distribution with mean 0 and constant standard deviation σ.
Now, we cannot observe these ε's. However, we will be able to use the estimated (observable)
errors, namely the residuals, to come up with an estimate of the standard deviation σ and to
check the conditions about the true errors.
So what have we done, and where are we going?
1. Estimate the regression line based on some data. DONE!
2. Measure the strength of the linear relationship with the correlation. DONE!
3. Use the estimated equation for predictions. DONE!
4. Assess if the linear relationship is statistically significant.
5. Provide interval estimates (confidence intervals) for our predictions.
6. Understand and check the assumptions of our model.
We have already discussed the descriptive goals of 1, 2, and 3. For the inferential goals of 4 and
5, we will need an estimate of the unknown standard deviation σ in regression:

s = √(SSE/(n − 2)) = √(Σ(y − ŷ)²/(n − 2))
Note: Why n − 2? We lose two degrees of freedom because two parameters, b0 and b1, had to be estimated from the data before the residuals could be computed.
From Summary: Residual standard error: 2.251 on 4 degrees of freedom.
Or from ANOVA: s = √MSE = √5.067 ≈ 2.251.
Response: Tip
Df Sum Sq Mean Sq F value Pr(>F)
Bill 1 113.732 113.732 22.446 0.009052 **
Significant Linear Relationship?
Consider the following hypotheses: H0: β1 = 0 versus Ha: β1 ≠ 0.
What happens if the null hypothesis is true? Then β1 = 0, the true mean response E(Y) = β0 is the same for every x, and x is of no use in predicting y.
There are a number of ways to test this hypothesis. One way is through a t-test statistic (think
about why it is a t and not a z test). The general form for a t test statistic is:

t = (sample estimate − null value) / standard error of the estimate

We have our sample estimate for β1: it is b1. And we have the null value of 0. So we need the
standard error for b1. We could “derive” it, using the idea of sampling distributions (think about
the population of all possible b1 values if we were to repeat this procedure over and over many
times). Here is the result:

s.e.(b1) = s / √Σ(x − x̄)²
This t-statistic could be modified to test a variety of hypotheses about the population slope (different null values for β1).
Try It!
Significant Relationship between Bill and Tip?
Is there a significant (non-zero) linear relationship between the total cost of a restaurant
bill and the tip that is left? (Is the bill a useful linear predictor for the tip?)
That is, test H0: β1 = 0 versus Ha: β1 ≠ 0 using a 5% level of significance.
t = b1 / s.e.(b1) = 0.17077/0.03604 = 4.738 with df = n − 2 = 4, giving a p-value of 0.00905.
Since 0.00905 < 0.05, we reject H0 and conclude there is a significant (non-zero) linear relationship between bill and tip.
Think about it:
Based on the results of the previous t-test conducted at the 5% significance level, do you think a
95% confidence interval for the true slope β1 would contain the value of 0?
(No: because H0: β1 = 0 was rejected at the 5% level, the 95% confidence interval for β1 will not contain 0.)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.58769    2.41633  -0.243  0.81980
Bill         0.17077    0.03604   4.738  0.00905 **
Response: Tip
Df Sum Sq Mean Sq F value Pr(>F)
Bill 1 113.732 113.732 22.446 0.009052 **
Residuals 4 20.268 5.067
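R can answer the confidence interval question directly; confint() computes b1 ± t* s.e.(b1) with df = n − 2 = 4 (continuing with the fitted model fit from the sketch above):

confint(fit, level = 0.95)
# the interval for Bill is 0.17077 +/- 2.776(0.03604), about (0.071, 0.271),
# which does not contain 0, agreeing with the rejected t-test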
Predicting for Individuals versus Estimating the Mean
Consider the relationship between the bill and tip …
Least squares regression line (or estimated regression function): ŷ = −0.5877 + 0.17077x
We also have: s = 2.251
How would you predict the tip for Barb, who had a $50 restaurant bill? ŷ = −0.5877 + 0.17077(50) = $7.95.
How would you estimate the mean tip for all customers who had a $50 restaurant bill? With the same value: ŷ = −0.5877 + 0.17077(50) = $7.95.
So our estimate for predicting a future observation and for estimating the mean response are
found using the same least squares regression equation. What about their standard errors?
(We would need the standard errors to be able to produce an interval estimate.)
Construct a 95% prediction interval for the tip from an individual customer who had a $50 bill (x = 50):
ŷ ± t* s.e.(pred) = 7.95 ± 2.776 √(s² + [s.e.(fit)]²) = 7.95 ± 2.776(2.47) ≈ 7.95 ± 6.86, or about ($1.09, $14.81).
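Both interval estimates come from predict() in R (a sketch, continuing with fit from above):

new <- data.frame(Bill = 50)
predict(fit, new, interval = "prediction")  # PI for one customer's tip
predict(fit, new, interval = "confidence")  # CI for the mean tip at a $50 bill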
Checking Assumptions in Regression
Let’s recall the statistical way of expressing the underlying model that produces our data:
yi = β0 + β1xi + εi, where the εi are independent N(0, σ).
Thus there are four essential technical assumptions required for inference in linear regression:
(1) the mean of y is a linear function of x; (2) the errors are independent; (3) the errors are
normally distributed; and (4) the errors have the same standard deviation σ for every x.
Now, we cannot observe these ε's. However, we will be able to use the estimated (observable)
errors, namely the residuals, to come up with an estimate of the standard deviation σ and to
check the conditions about the true errors.
So how can we check these assumptions with our data and estimated model?
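A minimal sketch of the usual residual checks in R (continuing with fit from above):

plot(fitted(fit), resid(fit))   # want random scatter around 0 with constant spread
abline(h = 0)
qqnorm(resid(fit))              # a roughly straight line suggests normal errors
qqline(resid(fit))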
Now, if we saw …
Let's turn to one last full regression problem
that includes checking assumptions.
            mean        sd   n
foot    27.78125  1.549701  32
height  71.68750  3.057909  32
Call:
lm(formula = foot ~ height, data = heightfoot)
Residuals:
     Min       1Q   Median       3Q      Max
-1.74925 -0.81825  0.07875  0.58075  2.25075

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.25313    4.33232   0.058    0.954
height       0.38400    0.06038   6.360 5.12e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation Matrix
            foot    height
foot   1.0000000 0.7577219
height 0.7577219 1.0000000
Analysis of Variance Table
Response: foot
           Df Sum Sq Mean Sq F value    Pr(>F)
height      1 42.744  42.744  40.446 5.124e-07 ***
Residuals  30 31.705   1.057
c. Give the equation of the least squares regression line for predicting foot length from height.
ŷ = 0.25313 + 0.38400x, where x = height (inches) and ŷ = predicted foot length (cm).
d. Suppose Max is 70 inches tall and has a foot length of 28.5 centimeters. Based on the least
squares regression line, what is the value of the prediction error (residual) for Max? Show
all work.
ŷ = 0.25313 + 0.38400(70) = 27.133 cm, so the residual is e = y − ŷ = 28.5 − 27.133 = 1.367 cm.
Conclusion:
f. Calculate a 95% confidence interval for the average foot length for all college men who are
70 inches tall. (Just clearly plug in all numerical values.)
ŷ ± t* s.e.(fit) = 27.133 ± 2.042 × 1.028 × √(1/32 + (70 − 71.6875)²/(31 × 3.057909²)),
where s = √1.057 ≈ 1.028 and t* = 2.042 for df = 30.
Does this plot support the conclusion that the linear regression model is appropriate?
Yes No
Explain:
Regression

Linear Regression Model
  Population version:
    Mean: E(Y) = β0 + β1x
    Individual: yi = β0 + β1xi + εi, where εi is N(0, σ)
  Sample version:
    Mean: ŷ = b0 + b1x
    Individual: yi = b0 + b1xi + ei

Parameter Estimators
  b1 = SXY/SXX = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
  b0 = ȳ − b1x̄

Residuals
  e = y − ŷ = observed y − predicted y

Standard Error of the Sample Slope
  s.e.(b1) = s/√SXX = s/√Σ(x − x̄)²

Confidence Interval for β1
  b1 ± t* s.e.(b1), df = n − 2

t-Test for β1
  To test H0: β1 = 0, use t = b1/s.e.(b1), df = n − 2
  (or F = MSREG/MSE, df = 1, n − 2)

Confidence Interval for the Mean Response
  ŷ ± t* s.e.(fit), df = n − 2
  where s.e.(fit) = s √(1/n + (x − x̄)²/SXX)

Prediction Interval for an Individual Response
  ŷ ± t* s.e.(pred), df = n − 2
  where s.e.(pred) = √(s² + [s.e.(fit)]²)