Chapter 11: Simple Linear Regression
Learning Objectives
© 2003 Pearson Prentice Hall
Models
Deterministic Models
Probabilistic Models
1. Hypothesize 2 Components
   Deterministic
   Random Error
2. Example: Sales Volume Is 10 Times Advertising Spending + Random Error
   \( Y = 10X + \varepsilon \)
   Random Error May Be Due to Factors Other Than Advertising
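A minimal sketch of the sales example, simulated in Stata. The slide specifies only Y = 10X + random error; the seed, sample size, range of X, and error standard deviation below are assumptions chosen purely for illustration.

```stata
* Simulate Y = 10X + random error (all settings are illustrative assumptions)
clear
set seed 12345
set obs 100
* hypothetical advertising spending between 0 and 10
generate double x = 10*runiform()
* random error: normal with mean 0 and sd 5 (assumed)
generate double e = rnormal(0, 5)
* deterministic component plus random error
generate double y = 10*x + e
* the fitted slope should land near 10 but not exactly on it
regress y x
```

Rerunning with a different seed gives a different fitted slope, which is the point of calling the model probabilistic.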
Types of Probabilistic Models
[Diagram: Probabilistic Models -> Regression Models]
Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation
Model Specification
1. Define Variables
2. Hypothesize Nature of Relationship
   Expected Effects (i.e., Coefficients’ Signs)
   Functional Form (Linear or Non-Linear)
   Interactions
Model Specification Is Based on Theory
Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots, each with axes labeled Sales and Advertising]
Types of Regression Models
Regression Models
  Simple (1 Explanatory Variable)
    Linear
    Non-Linear
  Multiple
    Linear
    Non-Linear
Linear Equations
\( Y = mX + b \)
\( m \) = Slope = (Change in Y) / (Change in X)
\( b \) = Y-intercept
[Figure: straight line plotted on X and Y axes]
Linear Regression Model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
\( Y_i \): Dependent (Response) Variable (e.g., income)
\( X_i \): Independent (Explanatory) Variable (e.g., education)
Population & Sample Regression Models
Population: Unknown Relationship
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
Sample: Estimated Relationship
\[ Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\varepsilon_i \]
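Population Linear Regression Model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
\[ E(Y) = \beta_0 + \beta_1 X_i \]
\( \varepsilon_i \) = Random error
[Figure: population regression line E(Y) plotted against X, with observed values scattered around it]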
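Sample Linear Regression Model
\[ Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat\varepsilon_i \]
\[ \hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i \]
\( \hat\varepsilon_i \) = Random error
[Figure: fitted sample regression line plotted against X, with observed values and an unsampled observation]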
Estimating Parameters:
Least Squares Method
Thinking Challenge
Least Squares
\[ \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat\varepsilon_i^2 \]
Least Squares Graphically
LS minimizes
\[ \sum_{i=1}^{n} \hat\varepsilon_i^2 = \hat\varepsilon_1^2 + \hat\varepsilon_2^2 + \hat\varepsilon_3^2 + \hat\varepsilon_4^2 \]
\( Y_2 = \hat\beta_0 + \hat\beta_1 X_2 + \hat\varepsilon_2 \)
[Figure: four observed points with residuals \( \hat\varepsilon_1, \ldots, \hat\varepsilon_4 \) measured from the fitted line \( \hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i \)]
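Coefficient Equations
Prediction Equation
\[ \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i \]
Sample Slope
\[ \hat\beta_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
Sample Y-intercept
\[ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \]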
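The solution tables later in the deck work from column totals rather than deviations from the mean. As a bridging step (standard algebra, not shown on the slide), the two sums of squares can be expanded as:

```latex
\[
\begin{aligned}
SS_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
         = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \\
SS_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2
         = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\end{aligned}
\]
```

This is why the computation table only needs the totals of X, Y, X squared, and XY.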
Computation Table
Xi      Yi      Xi^2      Yi^2      XiYi
X1      Y1      X1^2      Y1^2      X1Y1
X2      Y2      X2^2      Y2^2      X2Y2
:       :       :         :         :
Xn      Yn      Xn^2      Yn^2      XnYn
ΣXi     ΣYi     ΣXi^2     ΣYi^2     ΣXiYi
Interpretation of Coefficients
1. Slope (\( \hat\beta_1 \))
   Estimated Y Changes by \( \hat\beta_1 \) for Each 1 Unit Increase in X
   If \( \hat\beta_1 \) = 2, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising (X)
2. Y-Intercept (\( \hat\beta_0 \))
   Average Value of Y When X = 0
   If \( \hat\beta_0 \) = 4, then Average Sales (Y) Is Expected to Be 4 When Advertising (X) Is 0
Parameter Estimation Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
What is the relationship between sales & advertising?
Scattergram
[Figure: Sales vs. Advertising; Sales (0 to 4) on the vertical axis, Advertising (0 to 5) on the horizontal axis]
Parameter Estimation Solution Table
        Xi    Yi    Xi^2   Yi^2   XiYi
         1     1      1      1      1
         2     1      4      1      2
         3     2      9      4      6
         4     2     16      4      8
         5     4     25     16     20
Total   15    10     55     26     37
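Parameter Estimation Solution
\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = 0.70 \]
\[ \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 2 - (0.70)(3) = -0.10 \]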
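The column-total arithmetic can be reproduced step by step in Stata. This is only a sketch: the variable names ad and sales follow the regress command used in this deck, and the scalar names are made up for illustration.

```stata
* Hand computation of the slope and intercept from column totals
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
* extra columns of the computation table
generate double xy = ad*sales
generate double xsq = ad^2
* column totals
quietly summarize ad
scalar nobs  = r(N)
scalar sum_x = r(sum)
quietly summarize sales
scalar sum_y = r(sum)
quietly summarize xy
scalar sum_xy = r(sum)
quietly summarize xsq
scalar sum_xsq = r(sum)
* SSxy, SSxx, then slope and intercept
scalar ss_xy = sum_xy - sum_x*sum_y/nobs
scalar ss_xx = sum_xsq - sum_x^2/nobs
scalar b1 = ss_xy/ss_xx
scalar b0 = sum_y/nobs - b1*sum_x/nobs
display "slope = " b1 "   intercept = " b0
```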
Coefficient Interpretation Solution
1. Slope (\( \hat\beta_1 \))
   Sales Volume (Y) Is Expected to Increase by .7 Units for Each $1 Increase in Advertising (X)
2. Y-Intercept (\( \hat\beta_0 \))
   Average Value of Sales Volume (Y) Is -.10 Units When Advertising (X) Is 0
   Difficult to Explain to Marketing Manager
   Expect Some Sales Without Advertising
Stata: Dataset Creation
Stata: Read Dataset
clear
insheet using hasbro.txt
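The contents of hasbro.txt are not shown here. One possible way to produce such a file, assuming it holds the five observations of the example in two columns named ad and sales (the names used by the regress command in this deck), is sketched below.

```stata
* Enter the example data, write it to a text file, and read it back
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
* outsheet writes a tab-delimited file with variable names in the first row
outsheet ad sales using hasbro.txt, replace
* read the file back the same way the slide does
clear
insheet using hasbro.txt
list, clean
```

The file name and layout are assumptions; any delimited file that insheet can read would serve.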
Stata: Regress
. regress sales ad
------------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ad | .7 .1914854 3.66 0.035 .0906079 1.309392
_cons | -.1 .6350853 -0.16 0.885 -2.121125 1.921125
------------------------------------------------------------------------------
Coef. column: \( \hat\beta_1 \) (ad row) and \( \hat\beta_0 \) (_cons row)
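Derivation of Parameter Equations
Goal: Minimize squared error
Intercept:
\[ 0 = \frac{\partial \sum \hat\varepsilon_i^2}{\partial \hat\beta_0} = \frac{\partial \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\partial \hat\beta_0} \]
\[ = -2 \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i) \]
\[ = -2 (n\bar{y} - n\hat\beta_0 - n\hat\beta_1 \bar{x}) \]
\[ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \]
Slope:
\[ 0 = \frac{\partial \sum \hat\varepsilon_i^2}{\partial \hat\beta_1} = \frac{\partial \sum (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\partial \hat\beta_1} \]
\[ = -2 \sum x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) \]
\[ = -2 \sum x_i (y_i - \bar{y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i) \]
\[ \hat\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y}) \]
\[ \hat\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y}) \]
(the last step uses \( \sum (x_i - \bar{x}) = 0 \) and \( \sum (y_i - \bar{y}) = 0 \))
\[ \hat\beta_1 = \frac{SS_{xy}}{SS_{xx}} \]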
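A quick numerical check of the minimization, using the Hasbro data from the earlier example: the least squares line (intercept -0.1, slope 0.7) should give a smaller sum of squared errors than nearby lines. The two alternative lines are arbitrary perturbations chosen only for illustration.

```stata
* Compare the sum of squared errors of the LS line with two perturbed lines
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
* squared residuals under each candidate line
generate double sq_ls   = (sales - (-0.1 + 0.7*ad))^2
generate double sq_alt0 = (sales - ( 0.0 + 0.7*ad))^2
generate double sq_alt1 = (sales - (-0.1 + 0.8*ad))^2
quietly summarize sq_ls
scalar sse_ls = r(sum)
quietly summarize sq_alt0
scalar sse_alt0 = r(sum)
quietly summarize sq_alt1
scalar sse_alt1 = r(sum)
display "SSE, least squares line:   " sse_ls
display "SSE, intercept moved to 0: " sse_alt0
display "SSE, slope moved to 0.8:   " sse_alt1
```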
Parameter Estimation Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:
Fertilizer (lb.)    Yield (lb.)
4                   3.0
6                   5.5
10                  6.5
12                  9.0
What is the relationship between fertilizer & crop yield?
Scattergram
[Figure: Crop Yield vs. Fertilizer*; Yield (lb.) from 0 to 10 on the vertical axis, Fertilizer (lb.) from 0 to 15 on the horizontal axis]
Parameter Estimation Solution Table*
        Xi     Yi    Xi^2     Yi^2   XiYi
         4    3.0      16     9.00     12
         6    5.5      36    30.25     33
        10    6.5     100    42.25     65
        12    9.0     144    81.00    108
Total   32   24.0     296   162.50    218
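Parameter Estimation Solution*
\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = 0.65 \]
\[ \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 6 - (0.65)(8) = 0.80 \]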
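The fertilizer solution can be checked with regress. The variable names fertilizer and yield are assumptions made for this sketch; the data are the four observations from the thinking challenge.

```stata
* Check the fertilizer example with regress
clear
input fertilizer yield
4 3.0
6 5.5
10 6.5
12 9.0
end
quietly regress yield fertilizer
* slope and intercept as stored by regress
display "slope = " _b[fertilizer] "   intercept = " _b[_cons]
```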
Coefficient Interpretation Solution*
1. Slope (\( \hat\beta_1 \))
   Crop Yield (Y) Is Expected to Increase by .65 lb. for Each 1 lb. Increase in Fertilizer (X)
2. Y-Intercept (\( \hat\beta_0 \))
   Average Crop Yield (Y) Is Expected to Be 0.8 lb. When No Fertilizer (X) Is Used
Probability Distribution
of Random Error
The Big Picture Again
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
Model of true population relationship
   Parameters for the coefficients
   Parameters to characterize distribution of errors
   Some assumptions about distribution required
Regression based on sample yields statistics
   Best estimates of the parameters
   Coefficients and error parameters
Confidence in coefficient estimates depends on estimates of error parameters
Linear Regression Assumptions
1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance
   Exercise: Constant across what?
   \( \forall x_i: \; \mathrm{Var}(\varepsilon) = \mathrm{Var}(\varepsilon \mid x_i) \)
Error Probability Distribution
[Figure: error density \( f(\varepsilon) \) drawn at \( X_1 \) and \( X_2 \), on X and Y axes]
Error Term Parameters
Random Error Variation
Stata: Finding estimate of s
. regress sales ad
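A sketch of where s can be read off after regress, using the Hasbro data. It relies on the standard definition of s as the square root of SSE divided by n - 2, and on the stored results e(rss), e(df_r), and e(rmse); the scalar name s_manual is made up for illustration.

```stata
* The estimate s after regress
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
quietly regress sales ad
* s computed by hand from SSE and its degrees of freedom (n - 2)
scalar s_manual = sqrt(e(rss)/e(df_r))
* the same quantity as stored by regress (shown as Root MSE in the output header)
display "s by hand = " s_manual "   e(rmse) = " e(rmse)
```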
Sampling Distribution of Sample Slopes
[Figure: Sample 1 Line, Sample 2 Line, and Population Line plotted on X and Y axes]
All Possible Sample Slopes
   Sample 1: 2.5
   Sample 2: 1.6
   Sample 3: 1.8
   Sample 4: 2.1
   :
   Very large number of sample slopes
Earlier, we considered different choices of slope for a single sample. Here we have many possible samples; always choose the best slope for any particular sample, using the formulas from the book.
[Figure: sampling distribution of \( \hat\beta_1 \), centered at \( \beta_1 \), with standard error \( S_{\hat\beta_1} \)]
Slope Coefficient Test Statistic
\[ t_{n-2} = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} \]
where
\[ S_{\hat\beta_1} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}} \]
T Distribution
Test of Slope Coefficient Example
You’re a marketing analyst for Hasbro Toys. You find \( b_0 \) = -.1, \( b_1 \) = .7 & s = .60553.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Is the relationship significant at the .05 level?
Test of Slope Parameter Solution
H0: \( \beta_1 = 0 \)
Ha: \( \beta_1 \neq 0 \)
\( \alpha \) = .05
df = 5 - 2 = 3
Critical Value(s): \( \pm 3.1824 \) (rejection region of .025 in each tail)
Test Statistic:
\[ t = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} = \frac{0.70 - 0}{0.1915} = 3.656 \]
Decision: Reject at \( \alpha \) = .05
Conclusion: There is evidence of a relationship
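The same test can be sketched in Stata without a t table, using the built-in functions invttail() for the critical value and ttail() for the tail probability. The scalar names are made up for illustration.

```stata
* Slope t test for the Hasbro data
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
quietly regress sales ad
* test statistic under H0: beta1 = 0
scalar tstat = _b[ad]/_se[ad]
* two-sided critical value at alpha = .05 with n - 2 degrees of freedom
scalar tcrit = invttail(e(df_r), 0.025)
* two-sided p-value
scalar pval = 2*ttail(e(df_r), abs(tstat))
display "t = " %6.3f tstat "   critical value = " %6.4f tcrit "   p-value = " %5.3f pval
```

H0 is rejected when the absolute value of tstat exceeds tcrit, or equivalently when pval is below .05.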
Stata: Slope T-statistic
. regress sales ad
------------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ad | .7 .1914854 3.66 0.035 .0906079 1.309392
_cons | -.1 .6350853 -0.16 0.885 -2.121125 1.921125
------------------------------------------------------------------------------
Std. Err. column: \( S_{\hat\beta_1} \)
t column: \( t = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} = \frac{\hat\beta_1}{S_{\hat\beta_1}} \) (under \( H_0: \beta_1 = 0 \))
Stata: Slope p-value, CI
. regress sales ad
------------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ad | .7 .1914854 3.66 0.035 .0906079 1.309392
_cons | -.1 .6350853 -0.16 0.885 -2.121125 1.921125
------------------------------------------------------------------------------
P>|t| column: P-value
[95% Conf. Interval] columns: CI for Coefficient
Measures of Variation in Regression
1. Total Sum of Squares (SSyy)
   Measures Variation of Observed \( Y_i \) Around the Mean \( \bar{Y} \)
2. Explained Variation (SSR)
   Variation Due to Relationship Between X & Y
3. Unexplained Variation (SSE)
   Variation Due to Other Factors
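Variation Measures
[Figure: fitted line \( \hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i \) with an observed \( Y_i \), showing the unexplained sum of squares \( (Y_i - \hat{Y}_i)^2 \) and the total sum of squares \( (Y_i - \bar{Y})^2 \)]
Coefficient of Determination
\[ r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 - \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \]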
Coefficient of Determination Examples
[Figure: four scatterplots of Y against X, with \( r^2 = 1 \), \( r^2 = 1 \), \( r^2 = .8 \), and \( r^2 = 0 \)]
Stata: R-squared Statistic
. regress sales ad
------------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ad | .7 .1914854 3.66 0.035 .0906079 1.309392
_cons | -.1 .6350853 -0.16 0.885 -2.121125 1.921125
------------------------------------------------------------------------------
\[ r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{6 - 1.1}{6} = 0.8167 \]
Coefficient of Determination Example
You’re a marketing analyst for Hasbro Toys. You find \( \hat\beta_0 \) = -0.1 & \( \hat\beta_1 \) = 0.7.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Interpret a coefficient of determination of 0.8167.
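A sketch of how the 0.8167 figure can be reproduced from the sums of squares that regress stores: e(mss) is the explained (model) sum of squares and e(rss) the unexplained (residual) one. The scalar names are made up for illustration.

```stata
* Coefficient of determination from stored sums of squares
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
quietly regress sales ad
* total variation = explained + unexplained
scalar ss_yy = e(mss) + e(rss)
scalar r2_manual = (ss_yy - e(rss))/ss_yy
display "SSyy = " ss_yy "   SSE = " e(rss)
display "r-squared by hand = " %6.4f r2_manual "   e(r2) = " %6.4f e(r2)
```

Read as an interpretation: about 82% of the sample variation in sales is explained by the linear relationship with advertising.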
Correlation Coefficient
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} \]
. corr sales ad
(obs=5)

             |    sales       ad
-------------+------------------
       sales |   1.0000
          ad |   0.9037   1.0000

\( r = .9037 \), \( r^2 = .8167 \)
Point Estimation/Prediction is Same
[Figure: at \( X_p \), the fitted line \( \hat{Y}_i = \hat\beta_0 + \hat\beta_1 X \) gives the Prediction \( \hat{Y} \), shown alongside the Mean Y, E(Y), and an Individual Y]
True relationship: \( E(Y) = \beta_0 + \beta_1 X \)
Estimation Experiment
Assume true relationship \( Y = \beta_0 + \beta_1 X + \varepsilon \)
But parameters unknown
For particular \( x_p \), want to estimate E(Y)
Correct (but unknown) value is \( E(Y) = \beta_0 + \beta_1 x_p \)
Draw a sample at random
Compute regression line \( \hat{Y} = \hat\beta_0 + \hat\beta_1 X \)
Make point estimate \( \widehat{E(Y)} = \hat\beta_0 + \hat\beta_1 x_p \)
Draw a confidence interval such that
With probability 1 - alpha, true value of E(Y) lies in the interval
Prediction Experiment
Assume true relationship \( Y = \beta_0 + \beta_1 X + \varepsilon \)
But parameters unknown
For particular \( x_p \), select a one-point sample at random
Correct value (but unobserved) is \( y = \beta_0 + \beta_1 x_p + \varepsilon_i \)
Draw a sample at random
Compute regression line \( \hat{Y} = \hat\beta_0 + \hat\beta_1 X \)
Make point estimate \( \hat{y} = \hat\beta_0 + \hat\beta_1 x_p \)
Draw a confidence interval such that
With probability 1 - alpha, true value of y lies in the interval
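Sampling Distributions for Estimation, Prediction
Estimation:
\[ \sigma_{\hat{y}} = \sigma \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]
Prediction:
\[ \sigma_{(y - \hat{y})} = \sigma \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]
Sample estimate for estimation:
\[ S_{\hat{Y}} = S \sqrt{\frac{1}{n} + \frac{(X_p - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \]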
Factors Affecting Interval Width
1. Level of Confidence (1 - \( \alpha \))
   Width Increases as Confidence Increases
2. Data Dispersion (s)
   Width Increases as Variation Increases
3. Sample Size
   Width Decreases as Sample Size Increases
4. Distance of \( X_p \) from Mean \( \bar{X} \)
   Width Increases as Distance Increases
Why Distance from Mean?
[Figure: Sample 1 Line and Sample 2 Line plotted on X and Y axes, with \( X_1 \), \( \bar{X} \), and \( X_2 \) marked on the X axis; annotation: greater dispersion than \( X_1 \)]
Confidence Interval Estimate Example
You’re a marketing analyst for Hasbro Toys. You find \( b_0 \) = -.1, \( b_1 \) = .7 & s = .60553.
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
Estimate the mean sales when advertising is $4 at the .05 level.
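Confidence Interval Estimate Solution
\[ \hat{Y} - t_{n-2,\,\alpha/2}\, S_{\hat{Y}} \le E(Y) \le \hat{Y} + t_{n-2,\,\alpha/2}\, S_{\hat{Y}} \]
\[ \hat{Y} = -0.1 + (0.7)(4) = 2.7, \qquad t_{3,\,.025} = 3.1824 \]
\[ S_{\hat{Y}} = .60553 \sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 0.3317 \]
\[ 1.6445 \le E(Y) \le 3.7555 \]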
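The interval for mean sales at $4 of advertising can be reproduced from the formula directly. This is a sketch with made-up scalar names; SSxx is recovered from the sample variance of ad as Var times (n - 1).

```stata
* 95% confidence interval for E(Y) at ad = 4, from the formula
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
quietly regress sales ad
quietly summarize ad
scalar xbar  = r(mean)
scalar ss_xx = r(Var)*(r(N) - 1)
scalar xp    = 4
* point estimate on the fitted line
scalar yhat  = _b[_cons] + _b[ad]*xp
* standard error for estimating the mean response
scalar se_mean = e(rmse)*sqrt(1/e(N) + (xp - xbar)^2/ss_xx)
scalar tcrit = invttail(e(df_r), 0.025)
scalar ci_lo = yhat - tcrit*se_mean
scalar ci_hi = yhat + tcrit*se_mean
display "S_yhat = " %6.4f se_mean "   CI: " %6.4f ci_lo " to " %6.4f ci_hi
```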
Prediction Interval of Individual Response
\[ S_{(Y - \hat{Y})} = S \sqrt{1 + \frac{1}{n} + \frac{(X_P - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \]
Note the extra 1 under the square root!
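For comparison, a sketch of the prediction interval for an individual sales value at $4 of advertising, using the Hasbro data. Only the standard error changes relative to the mean-response case; the scalar names are made up for illustration.

```stata
* 95% prediction interval for an individual Y at ad = 4, from the formula
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
quietly regress sales ad
quietly summarize ad
scalar xbar  = r(mean)
scalar ss_xx = r(Var)*(r(N) - 1)
scalar xp    = 4
scalar yhat  = _b[_cons] + _b[ad]*xp
* note the leading 1: variation of an individual Y around its mean
scalar se_ind = e(rmse)*sqrt(1 + 1/e(N) + (xp - xbar)^2/ss_xx)
scalar tcrit = invttail(e(df_r), 0.025)
scalar pi_lo = yhat - tcrit*se_ind
scalar pi_hi = yhat + tcrit*se_ind
display "S_(Y-Yhat) = " %6.4f se_ind "   PI: " %6.4f pi_lo " to " %6.4f pi_hi
```

The resulting interval is considerably wider than the interval for the mean response at the same advertising level.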
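Why the Extra ‘S’?
[Figure: at \( X_P \), the Prediction \( \hat{Y} \) on the fitted line \( \hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i \), the Expected (Mean) Y on the true line \( E(Y) = \beta_0 + \beta_1 X \), and the individual Y we're trying to predict]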
Hyperbolic Interval Bands
[Figure: interval bands around the fitted line \( \hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i \), with \( \bar{X} \) and \( X_P \) marked on the X axis]
In Stata
Terminology is a little different, so be careful
The predict command is used for both estimation and prediction
   Run it immediately after regress (it remembers the last regress you did)
For estimating E(Y)
   stdp (standard error of the prediction)
For predicting a particular Y
   stdf (standard error of the forecast)
You have to look up the critical t-value and generate the confidence interval yourself
Stata Code
regress sales ad
* point estimate (fitted value) at each observed ad level
predict point
* standard error for estimating E(Y)
predict esterror, stdp
* standard error for predicting an individual Y
predict prederror, stdf
*** with n-2=3 dof, alpha=.05, critical t-value is 3.182
gen estlower = point-3.182*esterror
gen esthigher = point+3.182*esterror
gen predlower = point-3.182*prederror
gen predhigher = point+3.182*prederror
format esterror prederror estlower esthigher predlower predhigher %9.3g
list ad point esterror prederror estlower esthigher predlower predhigher, clean
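As an alternative to looking up the critical value in a table, a sketch that computes it with invttail() from the degrees of freedom stored by regress. It is a variation on the code above, not part of the original slide; the data are entered inline so that the block runs on its own.

```stata
* Confidence limits for E(Y) with a computed critical t-value
clear
input ad sales
1 1
2 1
3 2
4 2
5 4
end
quietly regress sales ad
* alpha = .05, two-sided, with n - 2 degrees of freedom from e(df_r)
scalar tcrit = invttail(e(df_r), 0.05/2)
predict double point, xb
predict double esterror, stdp
generate double estlower  = point - scalar(tcrit)*esterror
generate double esthigher = point + scalar(tcrit)*esterror
format point esterror estlower esthigher %9.4f
list ad point esterror estlower esthigher, clean
```

The same pattern with stdf in place of stdp gives the prediction limits.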
Stata Output
Review: Prediction With Regression Models
Estimate Population Mean Response E(Y) for Given X
   Point on Population Regression Line
Predict Individual Response (\( Y_i \)) for Given X
Conclusion
End of Chapter