Ch11-Simple Linear Regression

This document discusses simple linear regression models. It covers the linear regression steps of hypothesizing a deterministic relationship between variables, estimating unknown parameters, specifying a probability distribution for random errors, evaluating the model, and using the model for prediction. The document also discusses different types of regression models including simple, multiple, linear, and nonlinear models.

Uploaded by

Nida Buhurcu
© 2003 Pearson Prentice Hall

Chapter 11

Simple Linear Regression

Learning Objectives

1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
   - Understand and check model assumptions
4. Compute Regression Coefficients
5. Predict Response Variable
6. Interpret Computer Output

Models

1. Representation of Some Phenomenon
2. Mathematical Model Is a Mathematical Expression of Some Phenomenon
3. Often Describe Relationships Between Variables
4. Types
   - Deterministic Models
   - Probabilistic Models

Deterministic Models

1. Hypothesize Exact Relationships
2. Suitable When Prediction Error Is Negligible
3. Example: Force Is Exactly Mass Times Acceleration
   - F = m·a

Probabilistic Models

1. Hypothesize 2 Components
   - Deterministic
   - Random Error
2. Example: Sales Volume Is 10 Times Advertising Spending + Random Error
   - $Y = 10X + \varepsilon$
   - Random Error May Be Due to Factors Other Than Advertising

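The difference between the deterministic and the probabilistic version of the sales example can be sketched in a few lines of Python. This is only an illustration: the normal error with standard deviation 2 and the function names are assumptions made here, not part of the slides.

```python
import random


def deterministic_sales(advertising):
    """Deterministic component only: sales are exactly 10 times advertising."""
    return 10 * advertising


def probabilistic_sales(advertising, rng, error_sd=2.0):
    """Deterministic component plus a random error with mean 0.

    The normal error and error_sd=2.0 are hypothetical choices for illustration.
    """
    return deterministic_sales(advertising) + rng.gauss(0.0, error_sd)


rng = random.Random(0)  # fixed seed so the run is repeatable
for x in (1, 2, 3):
    # Same X, but the probabilistic value scatters around 10 * X.
    print(x, deterministic_sales(x), round(probabilistic_sales(x, rng), 2))
```

Averaged over many draws, the probabilistic values settle near 10X, which is the deterministic component.
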
Types of Probabilistic Models

Probabilistic Models
   - Regression Models
   - Correlation Models
   - Other Models

Regression Models

1. Answer ‘What Is the Relationship Between the Variables?’
2. Equation Used
   - 1 Numerical Dependent (Response) Variable
      - What Is to Be Predicted
   - 1 or More Numerical or Categorical Independent (Explanatory) Variables
3. Used Mainly for Prediction & Estimation

Regression Modeling Steps

1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Model Specification

Specifying the Model

1. Define Variables
2. Hypothesize Nature of Relationship
   - Expected Effects (i.e., Coefficients’ Signs)
   - Functional Form (Linear or Non-Linear)
   - Interactions

Model Specification Is Based on Theory

1. Theory of Field (e.g., Sociology)
2. Mathematical Theory
3. Previous Research
4. ‘Common Sense’

Thinking Challenge: Which Is More Logical?

[Figure: four candidate plots of Sales (vertical axis) against Advertising (horizontal axis)]

Types of Regression Models

Regression Models
   - Simple (1 Explanatory Variable)
      - Linear
      - Non-Linear
   - Multiple (2+ Explanatory Variables)
      - Linear
      - Non-Linear

Linear Regression Model

Linear Equations

Y = mX + b
   - m = Slope = (Change in Y) / (Change in X)
   - b = Y-intercept

[Figure: straight line on X-Y axes showing the slope and the Y-intercept]

Linear Regression Model

1. Relationship Between Variables Is a Linear Function

   $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

   - $Y_i$: Dependent (Response) Variable (e.g., income)
   - $\beta_0$: Population Y-Intercept
   - $\beta_1$: Population Slope
   - $X_i$: Independent (Explanatory) Variable (e.g., education)
   - $\varepsilon_i$: Random Error

Population & Sample Regression Models

Population: Unknown Relationship
   $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Random Sample
   $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\varepsilon_i$

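Population Linear Regression Model

   $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$   (observed value)
   $E(Y) = \beta_0 + \beta_1 X_i$   (population regression line)
   $\varepsilon_i$ = Random error

[Figure: observed values scattered around the line E(Y) on X-Y axes; each random error is the vertical distance from an observed value to the line]
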
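Sample Linear Regression Model

   $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\varepsilon_i$   (observed value)
   $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$   (fitted line)
   $\hat\varepsilon_i$ = Random error

[Figure: observed values and one unsampled observation around the fitted line on X-Y axes]
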
Estimating Parameters: Least Squares Method

Scattergram

1. Plot of All (Xi, Yi) Pairs
2. Suggests How Well Model Will Fit

[Figure: scattergram of Y against X, both axes running from 0 to 60]

Thinking Challenge

How would you draw a line through the points? How do you determine which line ‘fits best’?

[Figure: scattergram of Y against X, both axes running from 0 to 60]

Least Squares

1. ‘Best Fit’ Means Differences Between Actual Y Values & Predicted Y Values Are a Minimum
   - But Positive Differences Offset Negative Ones, So the Differences Are Squared

   $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat\varepsilon_i^2$

2. LS Minimizes the Sum of the Squared Differences (SSE)

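Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat\varepsilon_i^2 = \hat\varepsilon_1^2 + \hat\varepsilon_2^2 + \hat\varepsilon_3^2 + \hat\varepsilon_4^2$

   $Y_2 = \hat\beta_0 + \hat\beta_1 X_2 + \hat\varepsilon_2$   (an observed value)
   $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$   (fitted line)

[Figure: four observed points on X-Y axes; the residuals $\hat\varepsilon_1$ to $\hat\varepsilon_4$ are the vertical distances to the fitted line]
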
Coefficient Equations

Prediction Equation
   $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$

Sample Slope
   $\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

Sample Y-intercept
   $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$

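The coefficient equations translate almost line for line into code. The sketch below is one possible rendering in plain Python; the function name `least_squares` and the four test points are hypothetical and chosen only so the answer is easy to check by eye.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # SSxy and SSxx in their deviation-from-the-mean form
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical points that lie exactly on y = 1 + 2x
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because the points fall exactly on a line, the function recovers that line's intercept and slope.
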
Computation Table

Xi     Yi     Xi²     Yi²     XiYi
X1     Y1     X1²     Y1²     X1Y1
X2     Y2     X2²     Y2²     X2Y2
:      :      :       :       :
Xn     Yn     Xn²     Yn²     XnYn
ΣXi    ΣYi    ΣXi²    ΣYi²    ΣXiYi

Interpretation of Coefficients

1. Slope ($\hat\beta_1$)
   - Estimated Y Changes by $\hat\beta_1$ for Each 1 Unit Increase in X
   - If $\hat\beta_1$ = 2, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising (X)
2. Y-Intercept ($\hat\beta_0$)
   - Average Value of Y When X = 0
   - If $\hat\beta_0$ = 4, then Average Sales (Y) Is Expected to Be 4 When Advertising (X) Is 0

Parameter Estimation Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

What is the relationship between sales & advertising?

Scattergram: Sales vs. Advertising

[Figure: scattergram of Sales (0 to 4) against Advertising (0 to 5)]

Guess The Parameters!

Parameter Estimation Solution Table

        Xi    Yi    Xi²   Yi²   XiYi
        1     1     1     1     1
        2     1     4     1     2
        3     2     9     4     6
        4     2     16    4     8
        5     4     25    16    20
Sum     15    10    55    26    37

Parameter Estimation Solution

$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = 0.70$

$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 2 - (0.70)(3) = -0.10$

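The same arithmetic can be reproduced from the column sums of the solution table. This is a minimal sketch using the Hasbro data from the slides; the variable names are illustrative only.

```python
ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
n = len(ad)

# Column sums, as in the solution table
sum_x = sum(ad)
sum_y = sum(sales)
sum_xx = sum(x * x for x in ad)
sum_xy = sum(x * y for x, y in zip(ad, sales))

# Computational (sums) form of the slope and intercept
slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n

print(sum_x, sum_y, sum_xx, sum_xy)
print(round(slope, 2), round(intercept, 2))
```

The sums form and the deviation form (SSxy / SSxx) are algebraically the same; the sums form is simply easier to fill in by hand from a table.
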
Coefficient Interpretation Solution

1. Slope ($\hat\beta_1$)
   - Sales Volume (Y) Is Expected to Increase by .7 Units for Each $1 Increase in Advertising (X)
2. Y-Intercept ($\hat\beta_0$)
   - Average Value of Sales Volume (Y) Is -.10 Units When Advertising (X) Is 0
      - Difficult to Explain to Marketing Manager
      - Expect Some Sales Without Advertising

Stata: Dataset Creation

First, create the data set (as CSV) and save it as a file (hasbro.txt):

ad, sales
1, 1
2, 1
3, 2
4, 2
5, 4

Stata: Read Dataset

clear
insheet using hasbro.txt

Stata: Regress

. regress sales ad

      Source |       SS       df       MS              Number of obs =       5
-------------+------------------------------           F(  1,     3) =   13.36
       Model |         4.9     1         4.9           Prob > F      =  0.0354
    Residual |         1.1     3  .366666667           R-squared     =  0.8167
-------------+------------------------------           Adj R-squared =  0.7556
       Total |           6     4         1.5           Root MSE      =  .60553

------------------------------------------------------------------------------
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ad |         .7   .1914854     3.66   0.035     .0906079    1.309392
       _cons |        -.1   .6350853    -0.16   0.885    -2.121125    1.921125
------------------------------------------------------------------------------

In the Coef. column, _cons is $\hat\beta_0$ and ad is $\hat\beta_1$.

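Derivation of Parameter Equations

Goal: Minimize squared error

$0 = \frac{\partial \sum \hat\varepsilon_i^2}{\partial \hat\beta_0} = \frac{\partial \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\partial \hat\beta_0}$
  $= -2 \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)$
  $= -2 (n\bar{y} - n\hat\beta_0 - n\hat\beta_1 \bar{x})$

$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$

$0 = \frac{\partial \sum \hat\varepsilon_i^2}{\partial \hat\beta_1} = \frac{\partial \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\partial \hat\beta_1}$
  $= -2 \sum x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i)$
  $= -2 \sum x_i (y_i - \bar{y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i)$

$\hat\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$

Because $\sum \bar{x}(x_i - \bar{x}) = 0$ and $\sum \bar{x}(y_i - \bar{y}) = 0$, the factor $x_i$ can be replaced by $(x_i - \bar{x})$ on both sides:

$\hat\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y})$

$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}}$
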
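The two first-order conditions in this derivation can be checked numerically. The sketch below plugs in the Hasbro estimates from the earlier slides (intercept -0.1, slope 0.7) and confirms that the residuals sum to zero, that they are orthogonal to x, and that nudging either coefficient raises SSE. The helper name `sse` is an assumption made for this illustration.

```python
ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7  # estimates from the Hasbro solution slides

residuals = [y - (b0 + b1 * x) for x, y in zip(ad, sales)]

# d(SSE)/d(b0) = 0  <=>  the residuals sum to zero
sum_resid = sum(residuals)
# d(SSE)/d(b1) = 0  <=>  the residuals are orthogonal to x
sum_x_resid = sum(x * e for x, e in zip(ad, residuals))


def sse(intercept, slope):
    """Sum of squared errors for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ad, sales))


best = sse(b0, b1)
neighbours = [sse(0.0, 0.7), sse(-0.2, 0.7), sse(-0.1, 0.8), sse(-0.1, 0.6)]

print(abs(sum_resid) < 1e-9, abs(sum_x_resid) < 1e-9)
print(best < min(neighbours))
```
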
Parameter Estimation Thinking Challenge

You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0

What is the relationship between fertilizer & crop yield?

Scattergram: Crop Yield vs. Fertilizer*

[Figure: scattergram of Yield (lb.), 0 to 10, against Fertilizer (lb.), 0 to 15]

Parameter Estimation Solution Table*

        Xi    Yi     Xi²    Yi²      XiYi
        4     3.0    16     9.00     12
        6     5.5    36     30.25    33
        10    6.5    100    42.25    65
        12    9.0    144    81.00    108
Sum     32    24.0   296    162.50   218

Parameter Estimation Solution*

$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = 0.65$

$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 6 - (0.65)(8) = 0.80$

Coefficient Interpretation Solution*

1. Slope ($\hat\beta_1$)
   - Crop Yield (Y) Is Expected to Increase by .65 lb. for Each 1 lb. Increase in Fertilizer (X)
2. Y-Intercept ($\hat\beta_0$)
   - Average Crop Yield (Y) Is Expected to Be 0.8 lb. When No Fertilizer (X) Is Used

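The interpretation of the two coefficients can be made concrete by evaluating the fitted line. This is a small sketch on the fertilizer data from the slides; the function name `predicted_yield` is hypothetical.

```python
fertilizer = [4, 6, 10, 12]
crop_yield = [3.0, 5.5, 6.5, 9.0]

n = len(fertilizer)
x_bar = sum(fertilizer) / n
y_bar = sum(crop_yield) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fertilizer, crop_yield))
ss_xx = sum((x - x_bar) ** 2 for x in fertilizer)
slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar


def predicted_yield(lb_fertilizer):
    """Point on the fitted line for a given amount of fertilizer."""
    return intercept + slope * lb_fertilizer


# The intercept is the predicted yield with no fertilizer.
print(round(predicted_yield(0), 2))
# The slope is the change in predicted yield per extra pound of fertilizer.
print(round(predicted_yield(9) - predicted_yield(8), 2))
```

Note that X = 0 lies outside the observed range of 4 to 12 lb., so the intercept is an extrapolation rather than something the data show directly.
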
Probability Distribution of Random Error

The Big Picture Again

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Model of true population relationship
   - Parameters for the coefficients
   - Parameters to characterize distribution of errors
   - Some assumptions about distribution required
Regression based on sample yields statistics
   - Best estimates of the parameters
      - Coefficients and error parameters
   - Confidence in coefficient estimates depends on estimates of error parameters

Linear Regression Assumptions

1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance
   - Exercise: Constant across what?
   - $\forall x_i:\ \mathrm{Var}(\varepsilon) = \mathrm{Var}(\varepsilon \mid x_i)$
3. Probability Distribution of Error Is Normal
4. Errors Are Independent
   - $\forall c_1, c_2:\ \Pr(\varepsilon_i < c_1) = \Pr(\varepsilon_i < c_1 \mid \varepsilon_j < c_2)$

Error Probability Distribution

[Figure: density $f(\varepsilon)$ of the error drawn around the regression line at X1 and at X2, on X-Y axes]

Error Term Parameters

We assume errors are normally distributed
   - With mean 0
Just need to estimate the variance

Sample Squared Error

Given best estimates for coefficients $\hat\beta_0, \hat\beta_1$

For each data point i, have predicted y value $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$

And thus an observed error $\hat\varepsilon_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$

$SSE = \sum \hat\varepsilon_i^2$

$s^2 = MSE = \frac{SSE}{n - 2}$

$s^2$ estimates $\sigma^2$, the true variance of errors in the population; s estimates $\sigma$

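For the Hasbro data these quantities take only a few lines to compute. The sketch below uses the estimates reported on the earlier slides; the variable names are illustrative.

```python
import math

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7  # estimates from the Hasbro solution slides
n = len(ad)

# Observed error for each data point: actual minus predicted
residuals = [y - (b0 + b1 * x) for x, y in zip(ad, sales)]

sse = sum(e ** 2 for e in residuals)  # sum of squared errors
mse = sse / (n - 2)                   # s squared, with n - 2 degrees of freedom
s = math.sqrt(mse)                    # estimate of sigma

print(round(sse, 4), round(mse, 4), round(s, 5))
```

The value of s agrees with the Root MSE that Stata reports for this regression (.60553).
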
Random Error Variation

1. Variation of Actual Y from Predicted Y
2. Measured by Standard Error of Regression Model
   - Sample Standard Deviation of $\hat\varepsilon$, s
3. Affects Several Factors
   - Parameter Significance
   - Prediction Accuracy

Stata: Finding estimate of s

In the output of regress sales ad, the estimate of s is reported as Root MSE = .60553. It is the square root of the Residual MS, .366666667.

Evaluating the Model
Testing for Significance

Test of Slope Coefficient

1. Shows If There Is a Linear Relationship Between X & Y
2. Involves Population Slope $\beta_1$
3. Hypotheses
   - H0: $\beta_1 = 0$ (No Linear Relationship)
   - Ha: $\beta_1 \ne 0$ (Linear Relationship)
4. Theoretical Basis Is Sampling Distribution of Slope

Sampling Distribution of Sample Slopes

[Figure: Sample 1 Line, Sample 2 Line, and Population Line on X-Y axes]

All Possible Sample Slopes
   - Sample 1: 2.5
   - Sample 2: 1.6
   - Sample 3: 1.8
   - Sample 4: 2.1
   - : :
   - Very large number of sample slopes

Earlier, we considered different choices of slope for a single sample. Here we have many possible samples; always choose the best slope for any particular sample, using the formulas from the book.

[Figure: sampling distribution of $\hat\beta_1$, centered at $\beta_1$, with standard error $S_{\hat\beta_1}$]

Slope Coefficient Test Statistic

$t_{n-2} = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}}$

where

$S_{\hat\beta_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}}$

T Distribution

t-statistic from previous slide will follow a T-distribution
   - (if regression assumptions hold)
   - n-2 degrees of freedom
   - (n is sample size for each sample)
With known distribution for t-statistic
   - Hypothesis test (null hypothesis about slope)
   - P-value
   - Even confidence interval
      - But express it not as interval around t-statistic, but as interval around slope coefficient

Test of Slope Coefficient Example

You’re a marketing analyst for Hasbro Toys. You find b0 = -.1, b1 = .7 & s = .60553.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Is the relationship significant at the .05 level?

Test of Slope Parameter Solution

H0: $\beta_1 = 0$
Ha: $\beta_1 \ne 0$
$\alpha = .05$
df = 5 - 2 = 3
Critical Value(s): ±3.1824 (rejection regions of .025 in each tail)

Test Statistic:
   $t = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} = \frac{0.70 - 0}{0.1915} = 3.656$
Decision:
   Reject at $\alpha = .05$
Conclusion:
   There is evidence of a relationship

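The test can be retraced step by step in code. The sketch below recomputes the fit from the Hasbro data, forms the standard error of the slope and the t-statistic, and compares it with the critical value. The critical value 3.1824 is copied from the slide rather than computed, since the Python standard library has no t-distribution quantile function.

```python
import math

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
n = len(ad)

x_bar = sum(ad) / n
y_bar = sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in ad)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad, sales))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# s: standard error of the regression, with n - 2 degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ad, sales))
s = math.sqrt(sse / (n - 2))

se_slope = s / math.sqrt(ss_xx)  # standard error of the slope
t_stat = (b1 - 0) / se_slope     # H0: slope = 0

critical_t = 3.1824              # t(.025, 3 df), taken from the slide
reject_h0 = abs(t_stat) > critical_t

print(round(se_slope, 4), round(t_stat, 3), reject_h0)
```

The standard error and t-statistic agree with the Std. Err. and t columns that Stata reports for ad.
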
Stata: Slope T-statistic

In the output of regress sales ad, the ad row reports Coef. = .7, Std. Err. = .1914854 and t = 3.66:
   - Coef. is $\hat\beta_1$
   - Std. Err. is $S_{\hat\beta_1}$
   - t is $\frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} = \frac{\hat\beta_1}{S_{\hat\beta_1}}$ (under H0: $\beta_1 = 0$)

Stata: Slope p-value, CI

In the output of regress sales ad, the ad row also reports:
   - P>|t| = 0.035: the P-value for the test of the slope
   - [95% Conf. Interval] = .0906079 to 1.309392: the CI for the coefficient

Measures of Variation in Regression

1. Total Sum of Squares (SSyy)
   - Measures Variation of Observed Yi Around the Mean $\bar{Y}$
2. Explained Variation (SSR)
   - Variation Due to Relationship Between X & Y
3. Unexplained Variation (SSE)
   - Variation Due to Other Factors

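Variation Measures

   - Total sum of squares: $(Y_i - \bar{Y})^2$
   - Unexplained sum of squares: $(Y_i - \hat{Y}_i)^2$
   - Explained sum of squares: $(\hat{Y}_i - \bar{Y})^2$
   - Fitted line: $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$

[Figure: one observed point $Y_i$ at $X_i$, the fitted line, and the horizontal line at $\bar{Y}$, with the three distances marked]
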
Coefficient of Determination

1. Proportion of Variation ‘Explained’ by Relationship Between X & Y

   $0 \le r^2 \le 1$

   $r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$

Coefficient of Determination Examples

[Figure: four scatterplots of Y against X, labeled r² = 1, r² = 1, r² = .8, and r² = 0]

Stata: R-squared Statistic

In the output of regress sales ad, R-squared = 0.8167. It follows from the Total SS (6) and the Residual SS (1.1):

$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{6 - 1.1}{6} = 0.8167$

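The three measures of variation and the resulting r² can be recomputed directly from the Hasbro data. This is a minimal sketch; the variable names are illustrative.

```python
ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
n = len(ad)

x_bar = sum(ad) / n
y_bar = sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ad, sales))
      / sum((x - x_bar) ** 2 for x in ad))
b0 = y_bar - b1 * x_bar

ss_yy = sum((y - y_bar) ** 2 for y in sales)                    # total variation
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ad, sales))  # unexplained
ssr = ss_yy - sse                                               # explained

r_squared = ssr / ss_yy
print(round(ss_yy, 4), round(sse, 4), round(ssr, 4), round(r_squared, 4))
```

The three sums of squares correspond to the Total, Residual and Model rows of the SS column in the Stata output.
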
Coefficient of Determination Example

You’re a marketing analyst for Hasbro Toys. You find $\hat\beta_0$ = -0.1 & $\hat\beta_1$ = 0.7.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Interpret a coefficient of determination of 0.8167.

Correlation Coefficient

$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$

A rescaling of the slope coefficient

$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}}$, so $r = \hat\beta_1 \sqrt{\frac{SS_{xx}}{SS_{yy}}}$

   - Always between -1 and 1

For univariate regression only
$r^2$ = coefficient of determination

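Both routes to r, the direct formula and the rescaled slope, can be compared on the Hasbro data. The sketch below is an illustration with names chosen here.

```python
import math

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
n = len(ad)

x_bar = sum(ad) / n
y_bar = sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in ad)
ss_yy = sum((y - y_bar) ** 2 for y in sales)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad, sales))

# Direct formula
r_direct = ss_xy / math.sqrt(ss_xx * ss_yy)

# Rescaling of the slope coefficient
slope = ss_xy / ss_xx
r_from_slope = slope * math.sqrt(ss_xx / ss_yy)

print(round(r_direct, 4), round(r_from_slope, 4), round(r_direct ** 2, 4))
```

Squaring r gives back the coefficient of determination for this one-variable regression.
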
Stata Correlation Coefficients

Can generate all pairwise correlations
   - Just two in this case

. corr sales ad
(obs=5)

             |    sales       ad
-------------+------------------
       sales |   1.0000
          ad |   0.9037   1.0000

The correlation .9037 is the square root of R-squared, .8167.

Using the Model for Prediction & Estimation

Prediction With Regression Models

Estimate Population Mean Response E(Y) for Given X
   - Point on Population Regression Line
Predict Individual Response (Yi) for Given X

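Point Estimation/Prediction Is Same

   - True relationship: $E(Y) = \beta_0 + \beta_1 X$
   - Fitted line: $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$

[Figure: at X = Xp, the plot marks an individual Y, the mean Y, E(Y), on the true line, and the prediction $\hat{Y}$ on the fitted line]
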
Estimation Experiment

Assume true relationship $Y = \beta_0 + \beta_1 X + \varepsilon$
   - But parameters unknown
For particular $x_p$, want to estimate E(Y)
   - Correct (but unknown) value is $E(Y) = \beta_0 + \beta_1 x_p$
Draw a sample at random
   - Compute regression line $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$
   - Make point estimate $\hat{E}(Y) = \hat\beta_0 + \hat\beta_1 x_p$
   - Draw a confidence interval such that, with probability $1 - \alpha$, the true value of E(Y) lies in the interval

Prediction Experiment

Assume true relationship $Y = \beta_0 + \beta_1 X + \varepsilon$
   - But parameters unknown
For particular $x_p$, select a one-point sample at random
   - Correct value (but unobserved) is $y = \beta_0 + \beta_1 x_p + \varepsilon_i$
Draw a sample at random
   - Compute regression line $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$
   - Make point estimate $\hat{y} = \hat\beta_0 + \hat\beta_1 x_p$
   - Draw a confidence interval such that, with probability $1 - \alpha$, the true value of y lies in the interval

Sampling Distributions for Estimation, Prediction

Estimation: $\sigma_{\hat{y}} = \sigma \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$

Prediction: $\sigma_{(y - \hat{y})} = \sigma \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$

Sigma unknown; confidence interval based on T distribution, n-2 dof

Confidence Interval Estimate of Mean Y

$\hat{Y} - t_{n-2,\alpha/2} S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\alpha/2} S_{\hat{Y}}$

where

$S_{\hat{Y}} = S \sqrt{\frac{1}{n} + \frac{(X_p - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

Factors Affecting Interval Width

1. Level of Confidence ($1 - \alpha$)
   - Width Increases as Confidence Increases
2. Data Dispersion (s)
   - Width Increases as Variation Increases
3. Sample Size
   - Width Decreases as Sample Size Increases
4. Distance of Xp from Mean $\bar{X}$
   - Width Increases as Distance Increases

Why Distance from Mean?

[Figure: Sample 1 Line and Sample 2 Line plotted against X, with X1, $\bar{X}$ and X2 marked on the X axis and $\bar{Y}$ on the Y axis; callout: ‘Greater dispersion than X1’]

Confidence Interval Estimate Example

You’re a marketing analyst for Hasbro Toys. You find b0 = -.1, b1 = .7 & s = .60553.

Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4

Estimate the mean sales when advertising is $4 at the .05 level.

Confidence Interval Estimate Solution

$\hat{Y} - t_{n-2,\alpha/2} S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\alpha/2} S_{\hat{Y}}$

$\hat{Y} = -0.1 + (0.7)(4) = 2.7$   (X to be predicted: 4)

$S_{\hat{Y}} = .60553 \sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 0.33166$

$2.7 - (3.1824)(0.33166) \le E(Y) \le 2.7 + (3.1824)(0.33166)$

$1.6445 \le E(Y) \le 3.7555$

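The interval for the mean response can be recomputed without intermediate rounding. The sketch below uses the Hasbro data; the critical value 3.1824 is copied from the slides rather than computed, and the variable names are chosen for this illustration.

```python
import math

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
n = len(ad)

x_bar = sum(ad) / n
y_bar = sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in ad)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad, sales)) / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ad, sales)) / (n - 2))

x_p = 4              # advertising level of interest
critical_t = 3.1824  # t(.025, 3 df), taken from the slides

y_hat = b0 + b1 * x_p
# Standard error for estimating the mean response E(Y) at x_p
se_mean = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
lower = y_hat - critical_t * se_mean
upper = y_hat + critical_t * se_mean

print(round(y_hat, 2), round(se_mean, 5), round(lower, 4), round(upper, 4))
```

The standard error corresponds to what Stata calls stdp for the observation with ad = 4.
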
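Prediction Interval of Individual Response

$\hat{Y} - t_{n-2,\alpha/2} S_{(Y - \hat{Y})} \le Y_P \le \hat{Y} + t_{n-2,\alpha/2} S_{(Y - \hat{Y})}$

where

$S_{(Y - \hat{Y})} = S \sqrt{1 + \frac{1}{n} + \frac{(X_P - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$

Note the extra 1 under the square root, compared with the interval for the mean.
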
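The effect of the extra term is easiest to see by computing both standard errors side by side. The sketch below does this for advertising of $4 in the Hasbro data; the critical value 3.1824 is copied from the slides, and the variable names are chosen for this illustration.

```python
import math

ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
n = len(ad)

x_bar = sum(ad) / n
y_bar = sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in ad)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad, sales)) / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ad, sales)) / (n - 2))

x_p = 4
critical_t = 3.1824  # t(.025, 3 df), taken from the slides
y_hat = b0 + b1 * x_p

leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
se_mean = s * math.sqrt(leverage)      # for the mean response E(Y)
se_pred = s * math.sqrt(1 + leverage)  # for an individual Y: note the extra 1

pred_lower = y_hat - critical_t * se_pred
pred_upper = y_hat + critical_t * se_pred

print(round(se_mean, 3), round(se_pred, 3))
print(round(pred_lower, 3), round(pred_upper, 3))
```

The prediction interval is wider than the interval for the mean at the same X, because an individual response carries its own random error on top of the uncertainty in the fitted line.
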
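Why the Extra ‘S’?

   - Expected (Mean) Y: $E(Y) = \beta_0 + \beta_1 X$
   - Fitted line: $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$

[Figure: X-Y axes with the fitted line and the line E(Y); at X = XP the plot marks the Y we're trying to predict, the expected (mean) Y, and the prediction $\hat{Y}$]
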
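Hyperbolic Interval Bands

[Figure: fitted line $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$ with interval bands that are narrowest at $\bar{X}$ and widen with distance from it; XP is marked on the X axis]
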
In Stata

Terminology a little different, be careful
Predict command used for both estimation and prediction
   - Run immediately after regress
   - (it remembers the last regress you did)
For estimating E(Y)
   - stdp (standard error of prediction)
For predicting a particular Y
   - stdf (standard error of forecast)
You have to look up critical t-value and generate confidence interval yourself

Stata Code

regress sales ad
* point estimate (fitted value) for each observation
predict point
* standard error for estimating the mean response E(Y)
predict esterror, stdp
* standard error for predicting an individual Y
predict prederror, stdf
*** with n-2=3 dof, alpha=.05, critical t-value is 3.182
gen estlower = point-3.182*esterror
gen esthigher = point+3.182*esterror
gen predlower = point-3.182*prederror
gen predhigher = point+3.182*prederror
format esterror prederror estlower esthigher predlower predhigher %9.3g
list ad point esterror prederror estlower esthigher predlower predhigher, clean

Stata Output

. list ad point esterror prederror estlower esthigher predlower predhigher, clean

       ad   point   esterror   preder~r   estlower   esthig~r   predlo~r   predhi~r
  1.    1      .6       .469       .766      -.892       2.09      -1.84       3.04
  2.    2     1.3       .332        .69       .245       2.36      -.897        3.5
  3.    3       2       .271       .663       1.14       2.86      -.111       4.11
  4.    4     2.7       .332        .69       1.64       3.76       .503        4.9
  5.    5     3.4       .469       .766       1.91       4.89       .963       5.84

Column key: esterror is $s_{\hat{y}}$; prederror is $s_{(y - \hat{y})}$; estlower and esthigher bound the CI for the estimate; predlower and predhigher bound the CI for the prediction.

Review: Prediction With Regression Models

Estimate Population Mean Response E(Y) for Given X
   - Point on Population Regression Line
Predict Individual Response (Yi) for Given X

More confident that you’ll get prediction of mean response correct than that you’ll get prediction of individual response correct
   - Why?

Conclusion

1. Described the Linear Regression Model
2. Stated the Regression Modeling Steps
3. Explained Ordinary Least Squares
4. Computed Regression Coefficients
5. Predicted Response Variable
6. Interpreted Computer Output

End of Chapter
