Lecture 7 - Regression
• Email: anisf@neduet.edu.pk
OneDrive link:
https://1drv.ms/f/s!AkgKqDvMcQJRgjeZOGE0fgYluSRX
Books and lecture notes
Correlation vs. Regression
• A scatter diagram can be used to show the
relationship between two variables
• Correlation analysis is used to measure the
strength of the association (linear relationship)
between two variables
– Correlation is only concerned with the strength of the
relationship
– No causal effect is implied with correlation
– Scatter diagrams were first presented
– Correlation was first presented
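The strength of a linear association can be computed directly from the data. The sketch below is only an illustration: the function name `pearson_r` and the five data points are hypothetical and do not come from the lecture.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: y rises with x, but not perfectly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

A value of r near +1 or -1 indicates a strong linear relationship and a value near 0 a weak one. It says nothing about cause and effect.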
Introduction to Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the value
of at least one independent variable
– Explain the impact of changes in an independent variable on
the dependent variable
Dependent variable: the variable we wish to predict
or explain
Independent variable: the variable used to explain
the dependent variable
Simple Linear Regression Model
[Figure: four scatter plots of Y against X]
Types of Relationships
(continued)
[Figure: scatter plots of Y against X in two columns, strong relationships and weak relationships]
Types of Relationships
(continued)
[Figure: scatter plot against X showing no relationship]
Simple Linear Regression Model
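$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
– $Y_i$ is the dependent variable
– $\beta_0$ is the population Y intercept
– $\beta_1$ is the population slope coefficient
– $X_i$ is the independent variable
– $\varepsilon_i$ is the random error term
Linear component: $\beta_0 + \beta_1 X_i$
Random error component: $\varepsilon_i$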
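To make the two components concrete, the sketch below generates observations as a linear part plus a random error. Everything in it is an assumption for illustration: the function name `simulate`, the parameter values, and the choice of normally distributed errors with standard deviation `sigma`.

```python
import random

def simulate(beta0, beta1, xs, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i with eps_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical population parameters and X values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = simulate(beta0=1.0, beta1=2.0, xs=xs, sigma=0.5)

# The random error component is what remains after removing the linear component.
errors = [y - (1.0 + 2.0 * x) for x, y in zip(xs, ys)]
for x, y, e in zip(xs, ys, errors):
    print(f"X = {x:.1f}  Y = {y:.3f}  error = {e:+.3f}")
```

With `sigma=0` every point falls exactly on the line, which is the linear component alone.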
Simple Linear Regression Model
(continued)
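[Figure: the true regression line with intercept $\beta_0$ and slope $\beta_1$, plotted as Y against X. The observed value of Y for $X_i$ is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, and it lies a vertical distance $\varepsilon_i$ from the line.]
Individual observations around true regression line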
Simple Linear Regression
Least Squares Estimates
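The least-squares estimates of the intercept and slope in the simple linear regression model are
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x \tag{11-1}$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \tag{11-2}$$
where $\bar y = (1/n)\sum_{i=1}^{n} y_i$ and $\bar x = (1/n)\sum_{i=1}^{n} x_i$.
The numerator of (11-2) is written as
$$S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar x) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$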
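Equations (11-1) and (11-2) translate almost line for line into code. The following is a minimal sketch: the function name `least_squares` and the small data set are hypothetical, chosen so the answer is easy to check by hand.

```python
def least_squares(x, y):
    """Return (beta0_hat, beta1_hat) using the computing formulas (11-1), (11-2)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    s_xy = sum_xy - sum_x * sum_y / n   # numerator of (11-2)
    s_xx = sum_x2 - sum_x ** 2 / n      # denominator of (11-2)
    beta1_hat = s_xy / s_xx
    beta0_hat = sum_y / n - beta1_hat * sum_x / n   # (11-1)
    return beta0_hat, beta1_hat

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
beta0_hat, beta1_hat = least_squares(x, y)
print(beta0_hat, beta1_hat)
```

Because these points lie exactly on a line, the estimates recover its intercept and slope.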
11-2: Simple Linear Regression
The fitted or estimated regression line is therefore
$$\hat y = \hat\beta_0 + \hat\beta_1 x \tag{11-3}$$
The residual for the $i$th observation is
$$e_i = y_i - \hat y_i$$
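Fitted values and residuals follow directly from the estimated line. The sketch below uses a small hypothetical data set, not data from the lecture, and checks a well-known property of least squares with an intercept: the residuals sum to zero.

```python
# Hypothetical data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values from the estimated line, then residuals e_i = y_i - y_hat_i.
y_hat = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

for xi, yi, yh, e in zip(x, y, y_hat, residuals):
    print(f"x = {xi:.1f}  y = {yi:.2f}  fitted = {yh:.3f}  residual = {e:+.3f}")
print("sum of residuals:", sum(residuals))
```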
Oxygen Purity
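We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:
$$n = 20 \qquad \sum_{i=1}^{20} x_i = 23.92 \qquad \sum_{i=1}^{20} y_i = 1{,}843.21$$
$$\bar x = 1.1960 \qquad \bar y = 92.1605$$
$$\sum_{i=1}^{20} y_i^2 = 170{,}044.5321 \qquad \sum_{i=1}^{20} x_i^2 = 29.2892 \qquad \sum_{i=1}^{20} x_i y_i = 2{,}214.6566$$
$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$
and
$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744$$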
Oxygen Purity - continued
Therefore, the least squares estimates of the slope and intercept are
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$
and
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 92.1605 - (14.94748)(1.196) = 74.28331$$
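The oxygen purity arithmetic can be reproduced from the summary statistics alone. In this sketch the variable names are my own. Only the five sums and n are taken from the example.

```python
# Summary statistics quoted in the oxygen purity example.
n = 20
sum_x = 23.92
sum_y = 1843.21
sum_x2 = 29.2892
sum_xy = 2214.6566

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares estimates of the slope and intercept.
beta1_hat = s_xy / s_xx
beta0_hat = sum_y / n - beta1_hat * sum_x / n

print(f"S_xx = {s_xx:.5f}, S_xy = {s_xy:.5f}")
print(f"slope = {beta1_hat:.5f}, intercept = {beta0_hat:.5f}")
```

Up to rounding, the printed values should agree with the figures in the example.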
Simple Linear Regression
Estimating $\sigma^2$
The error sum of squares is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2$$
An unbiased estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{SS_E}{n - 2} \tag{11-4}$$
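Table 11-1 is not reproduced in these notes, so the residuals cannot be summed directly. The sketch below instead assumes the standard identity SS_E = SS_T - β̂1·S_xy, where SS_T is the sum of y_i² minus (sum of y_i)²/n. This identity is not shown on the slides. It is applied to the oxygen purity summary statistics, and the variable names are my own.

```python
# Summary statistics quoted in the oxygen purity example.
n = 20
sum_x = 23.92
sum_y = 1843.21
sum_x2 = 29.2892
sum_y2 = 170044.5321
sum_xy = 2214.6566

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
beta1_hat = s_xy / s_xx

# Assumed identity (not on the slides): SS_E = SS_T - beta1_hat * S_xy.
ss_t = sum_y2 - sum_y ** 2 / n
ss_e = ss_t - beta1_hat * s_xy

# Unbiased estimator (11-4): divide by n - 2, not n.
sigma2_hat = ss_e / (n - 2)
print(f"SS_T = {ss_t:.2f}, SS_E = {ss_e:.2f}, sigma2_hat = {sigma2_hat:.2f}")
```

The divisor is n - 2 because two parameters, the intercept and the slope, were estimated from the data.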
Properties of the Least Squares Estimators
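• Slope Properties
$$E(\hat\beta_1) = \beta_1 \qquad V(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}$$
• Intercept Properties
$$E(\hat\beta_0) = \beta_0 \quad \text{and} \quad V(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]$$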
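In practice σ² is unknown, so the variances are estimated by plugging in σ̂². The sketch below does this for the oxygen purity figures quoted in this lecture (σ̂² = 1.18, S_xx = 0.68088, n = 20, x̄ = 1.1960). Taking the square root to get an estimated standard error is my own step and the variable names are hypothetical.

```python
import math

# Values quoted in the oxygen purity example.
n = 20
x_bar = 1.1960
s_xx = 0.68088
sigma2_hat = 1.18   # estimate of sigma^2, used in place of the unknown sigma^2

# Plug-in versions of V(beta1_hat) and V(beta0_hat).
var_beta1 = sigma2_hat / s_xx
var_beta0 = sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx)

# Estimated standard errors are the square roots of the estimated variances.
se_beta1 = math.sqrt(var_beta1)
se_beta0 = math.sqrt(var_beta0)
print(f"se(slope) = {se_beta1:.4f}, se(intercept) = {se_beta0:.4f}")
```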
Confidence Intervals
11-5.1 Confidence Intervals on the Slope and
Intercept
Definition
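Under the assumption that the observations are normally and independently distributed, a $100(1 - \alpha)\%$ confidence interval on the slope $\beta_1$ in simple linear regression is
$$\hat\beta_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \tag{11-11}$$
Similarly, a $100(1 - \alpha)\%$ confidence interval on the intercept $\beta_0$ is
$$\hat\beta_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]} \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]} \tag{11-12}$$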
EXAMPLE Oxygen Purity Confidence Interval on the Slope
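We will find a 95% confidence interval on the slope of the regression line using the data in Example 11-1. Recall that $\hat\beta_1 = 14.947$, $S_{xx} = 0.68088$, and $\hat\sigma^2 = 1.18$ (see Table 11-2). Then, from Equation 11-11 we find
$$\hat\beta_1 - t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}}$$
or
$$14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \le \beta_1 \le 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}}$$
This simplifies to
$$12.181 \le \beta_1 \le 17.713$$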
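The interval can be checked numerically. This sketch takes every input from the example, including the tabulated value t(0.025, 18) = 2.101, so no statistics library is needed. The variable names are my own.

```python
import math

# Inputs quoted in the example.
beta1_hat = 14.947
s_xx = 0.68088
sigma2_hat = 1.18
t_crit = 2.101   # t_{0.025, 18}, as quoted in the example

# Half-width of the interval in Equation 11-11.
half_width = t_crit * math.sqrt(sigma2_hat / s_xx)
lower = beta1_hat - half_width
upper = beta1_hat + half_width
print(f"{lower:.3f} <= beta1 <= {upper:.3f}")
```

For a different confidence level or sample size, only `t_crit` changes. It would be read from a t table with n - 2 degrees of freedom.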
Adequacy of the Regression Model
EXAMPLE Oxygen Purity Residuals
The regression model for the oxygen purity data is
$$\hat y = 74.283 + 14.947x$$
Coefficient of Determination ($R^2$)
• The quantity
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$
is called the coefficient of determination.
• For the oxygen purity regression model,
$$R^2 = \frac{SS_R}{SS_T} = \frac{152.13}{173.38} = 0.877$$
• Thus, the model accounts for 87.7% of the variability in the data.
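Both forms of the coefficient of determination give the same number, since SS_T = SS_R + SS_E. The sketch below checks this with the two sums of squares quoted for the oxygen purity model. The variable names are my own.

```python
# Sums of squares quoted for the oxygen purity model.
ss_r = 152.13   # regression sum of squares
ss_t = 173.38   # total sum of squares

# Error sum of squares implied by SS_T = SS_R + SS_E.
ss_e = ss_t - ss_r

r2_from_ssr = ss_r / ss_t
r2_from_sse = 1.0 - ss_e / ss_t
print(f"R^2 = {r2_from_ssr:.3f} (from SS_R), {r2_from_sse:.3f} (from SS_E)")
```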
Some Useful Transformations to Linearize
Diagrams depicting functions listed in Table
11.6
Data for Example
11 - 33
Pressure and volume data and fitted
regression
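As an illustration of the idea of linearizing, the sketch below fits an exponential curve y = a·exp(b·x) by regressing ln(y) on x. The exponential form is my own choice of example and is not claimed to be the one used for the pressure and volume data. The data are synthetic and noise-free, and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(x, y):
    """Least squares intercept and slope for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Synthetic, noise-free data from y = 2 * exp(0.5 * x) (assumed for illustration).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Taking logs gives ln(y) = ln(a) + b * x, which is linear in x.
log_y = [math.log(yi) for yi in y]
intercept_log, b_hat = fit_line(x, log_y)
a_hat = math.exp(intercept_log)   # transform the intercept back to the original scale
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

With real, noisy data the fit is least squares on the log scale, so the residuals should be examined on that scale as well.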
Example
• A study was made on the amount of converted sugar
in a certain process at various temperatures. The
data were coded and recorded as follows:
• Estimate the linear
regression line.
• Estimate the mean amount
of converted sugar produced
when the coded
temperature is 1.75.
• Plot the residuals versus
temperature. Comment.