
Regression Algebra and Fit

Sisir Debnath
Indian Institute of Technology Delhi

August 31, 2021


The Sum of Squared Residuals

The sum of squared residuals was our minimand in the quest to find the best
estimator for β.

Recall that e'e = sum of squared residuals.

Note that e'e = y'e = e'y.

Proof:

    e'e = (y − Xb)'e
        = (y' − b'X')e
        = y'e − b'X'e
        = y'e                    [X'e = 0 by the normal equations]

Since y'e is a scalar, e'e = e'y.
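The identity is easy to check numerically. A minimal numpy sketch with simulated data (the dataset, dimensions, and coefficients are made up for illustration):

```python
import numpy as np

# Simulated data: n observations, a constant plus two regressors
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: b = (X'X)^{-1} X'y
e = y - X @ b                           # residuals

# X'e = 0 by the normal equations, hence e'e = y'e = e'y
print(np.allclose(X.T @ e, 0))
print(np.isclose(e @ e, e @ y))
```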

HSL719 2021-22 IITD Sisir Debnath


The Sum of Squared Residuals

b minimizes e'e.
OLS solution: b = (X'X)⁻¹X'y

Any coefficient vector other than b produces a larger sum of squares.

Quick proof:
Let d (≠ b) be any other coefficient vector, with residuals

    u = y − Xd
      = y − Xb + Xb − Xd
      = e − X(d − b)

Then,

    u'u = (y − Xd)'(y − Xd)
        = (e − X(d − b))'(e − X(d − b))
        = e'e + (d − b)'X'X(d − b)       [cross terms vanish since X'e = 0]
        ≥ e'e
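The inequality, including the exact size of the gap (d − b)'X'X(d − b), can be verified numerically. A sketch with simulated data; the perturbed vector d is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2 + 3 * X[:, 1] + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS coefficients
e = y - X @ b

d = b + np.array([0.1, -0.2])              # any other coefficient vector
u = y - X @ d

# u'u = e'e + (d - b)'X'X(d - b) >= e'e
gap = (d - b) @ X.T @ X @ (d - b)
print(np.isclose(u @ u, e @ e + gap))
print(u @ u >= e @ e)
```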



The Sum of Squared Residuals

Adding a regressor reduces the sum of squared residuals.

Quick proof:

Let Z = [X z] and b_z = (Z'Z)⁻¹Z'y = [d; c], stacking the coefficients d on X
over the scalar coefficient c on z. Then

    u = y − Zb_z
      = y − Xd − zc

where c = (z'Mz)⁻¹z'My and M = I − X(X'X)⁻¹X' is the residual maker for X.
[Using the Frisch–Waugh theorem]

Therefore d = (X'X)⁻¹X'(y − zc), and

    u = y − X[(X'X)⁻¹X'(y − zc)] − zc
      = y − Xb + X(X'X)⁻¹X'zc − zc
      = e + [X(X'X)⁻¹X' − I]zc
      = e − Mzc



The Sum of Squared Residuals

    u'u = (e − Mzc)'(e − Mzc)
        = (e' − cz'M')(e − Mzc)          [c is a scalar]
        = e'e − 2cz'M'e + c²z'M'Mz
        = e'e − 2cz'My + c²z'Mz          [M symmetric and idempotent; Me = e, e = My]
        = e'e − 2c²z'Mz + c²z'Mz         [z'My = c·z'Mz by the definition of c]
        = e'e − c²z'Mz
        ≤ e'e                            [since z'Mz = (Mz)'(Mz) ≥ 0]
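The exact reduction u'u = e'e − c²z'Mz can be checked numerically. A sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.normal(size=n)
y = 1 + X[:, 1] + 0.5 * z + rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual maker for X
e = M @ y                                          # residuals from y on X alone
c = (z @ M @ y) / (z @ M @ z)                      # FWL coefficient on z

Z = np.column_stack([X, z])
u = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # residuals from the long regression

print(np.isclose(u @ u, e @ e - c**2 * (z @ M @ z)))
print(u @ u <= e @ e)
```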



The Sum of Squared Residuals

Dropping a regressor cannot reduce the sum of squared residuals.

This follows trivially from the previous result:

    u'u ≤ e'e

Dropping a variable (or variables) cannot improve the fit; that is, it cannot
reduce the sum of squared residuals.

Adding a variable (or variables) cannot degrade the fit; that is, it cannot
increase the sum of squared residuals.



The Fit of the Regression

The smaller the sum of squared residuals, the better the line fits the data.

The sum of squared residuals by itself, however, is not a good measure of
goodness of fit.

Instead we may ask whether variation in X is a good predictor of variation in y.

"Variation": in the context of the model, we speak of variation of a variable
as movement of the variable, usually associated with (not necessarily caused
by) movement of another variable.

Total variation in y:

    SST = Σ_{i=1}^{n} (y_i − ȳ)²



Aside: Sum of Squared Deviations

i = [1 1 ··· 1]' is an (n × 1) column of ones.

What is the residual maker for i?

    M⁰ = I − i(i'i)⁻¹i'

         ⎡ 1 − 1/n    −1/n    ···    −1/n  ⎤
         ⎢  −1/n     1 − 1/n  ···    −1/n  ⎥
       = ⎢    ⋮         ⋮      ⋱       ⋮   ⎥
         ⎣  −1/n      −1/n    ···  1 − 1/n ⎦

Let a = [a₁ a₂ ··· aₙ]' be an (n × 1) vector. Then

    M⁰a = [a₁ − ā,  a₂ − ā,  …,  aₙ − ā]'

Therefore,

    (M⁰a)'(M⁰a) = Σ_{i=1}^{n} (a_i − ā)²

and, since M⁰ is symmetric and idempotent,

    a'M⁰a = Σ_{i=1}^{n} (a_i − ā)²
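A quick numerical illustration of M⁰ as the demeaning matrix; the vector a here is arbitrary:

```python
import numpy as np

n = 5
M0 = np.eye(n) - np.ones((n, n)) / n     # M0 = I - i(i'i)^{-1} i'

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

print(np.allclose(M0 @ a, a - a.mean()))                  # M0 demeans a
print(np.isclose(a @ M0 @ a, ((a - a.mean())**2).sum()))  # sum of squared deviations
```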



The Fit of the Regression

Total variation in y:

    SST = Σ_{i=1}^{n} (y_i − ȳ)² = y'M⁰y

Decomposing the variation in y:  M⁰y = M⁰(Xb + e)

With a constant term in X, M⁰e = e and the cross terms vanish (X'e = 0), so

    y'M⁰y = (Xb)'M⁰(Xb) + e'e

    Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} [(x_i − x̄)'b]² + Σ_{i=1}^{n} e_i²
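The decomposition can be confirmed numerically. A sketch with simulated data (names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ b
e = y - yhat

M0 = np.eye(n) - np.ones((n, n)) / n
SST = y @ M0 @ y           # total variation in y
SSR = yhat @ M0 @ yhat     # (Xb)'M0(Xb)
SSE = e @ e                # equals e'M0e since X contains a constant

print(np.isclose(SST, SSR + SSE))
```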



R-squared

    Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} [(x_i − x̄)'b]² + Σ_{i=1}^{n} e_i²

Total sum of squares = regression sum of squares + residual sum of squares:

    SST = SSR + SSE

    R² = SSR / SST
       = 1 − SSE / SST
       = 1 − e'e / (y'M⁰y)
       = b'X'M⁰Xb / (y'M⁰y)

R² is bounded by zero and one only if:

  - there is a constant term in X, and
  - the line is computed by linear least squares.
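The two expressions for R² agree whenever both conditions hold; a sketch with simulated data (which includes a constant and uses least squares):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
M0 = np.eye(n) - np.ones((n, n)) / n

SST = y @ M0 @ y
r2_from_sse = 1 - (e @ e) / SST              # 1 - SSE/SST
r2_from_ssr = (X @ b) @ M0 @ (X @ b) / SST   # b'X'M0 Xb / y'M0 y

print(np.isclose(r2_from_sse, r2_from_ssr))
print(0 <= r2_from_sse <= 1)
```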



The Adjusted R-Squared

There are some problems with the use of R² as a measure of goodness of fit:

R² will never decrease when another variable is added to the model.

It is therefore tempting to include irrelevant regressors in the model to push
R² toward its upper limit.

Adjusted R-squared (R̄²):

    R̄² = 1 − [e'e / (n − K)] / [y'M⁰y / (n − 1)]

       = 1 − (n − 1)/(n − K) · (1 − R²)

R̄² includes a penalty for variables that do not add much fit.

R̄² can fall when a variable is added to the equation. We will see that later.

When will R̄² rise if another variable, z, is added to the regression?

R̄² is higher with z than without z if and only if the t ratio on z (in the
regression where it is included) is larger than one in absolute value.
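The second formula is easy to apply directly. A sketch using the numbers from the first Stata regression in this deck (n = 74, K = 4 regressors including the constant); the small discrepancy with Stata's reported 0.6484 comes from R² itself being rounded to four decimals:

```python
def adjusted_r2(r2, n, K):
    """R-bar-squared from R-squared: 1 - (n-1)/(n-K) * (1 - r2)."""
    return 1 - (1 - r2) * (n - 1) / (n - K)

# Stata example: n = 74, K = 4 (foreign, weight, displacement, constant)
print(round(adjusted_r2(0.6629, 74, 4), 4))
```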



Analysis of Variance: STATA

    Source               SS            df      MS
    ----------------------------------------------------------------
    Regression (Model)   b'X'y − nȳ²   K − 1   (b'X'y − nȳ²)/(K − 1)
    Residual             e'e           n − K   e'e/(n − K)
    Total                y'y − nȳ²     n − 1   (y'y − nȳ²)/(n − 1)
. reg mpg foreign weight displacement

      Source |        SS     df        MS       Number of obs =      74
       Model |  1619.71935    3  539.906448     F(3, 70)      =   45.88
    Residual |  823.740114   70  11.7677159     Prob > F      =  0.0000
       Total |  2443.45946   73  33.4720474     R-squared     =  0.6629
                                                Adj R-squared =  0.6484
                                                Root MSE      =  3.4304

         mpg |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
     foreign |  -1.600631    1.113648  -1.44    0.155    -3.821732   .6204699
      weight |  -.0067745    .0011665  -5.81    0.000    -.0091011  -.0044479
displacement |   .0019286    .0100701   0.19    0.849    -.0181556   .0220129
       _cons |   41.84795    2.350704  17.80    0.000     37.15962   46.53628



R-Squared: STATA

. reg mpg foreign weight displacement

      Source |        SS     df        MS       Number of obs =      74
       Model |  1619.71935    3  539.906448     F(3, 70)      =   45.88
    Residual |  823.740114   70  11.7677159     Prob > F      =  0.0000
       Total |  2443.45946   73  33.4720474     R-squared     =  0.6629
                                                Adj R-squared =  0.6484
                                                Root MSE      =  3.4304

         mpg |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
     foreign |  -1.600631    1.113648  -1.44    0.155    -3.821732   .6204699
      weight |  -.0067745    .0011665  -5.81    0.000    -.0091011  -.0044479
displacement |   .0019286    .0100701   0.19    0.849    -.0181556   .0220129
       _cons |   41.84795    2.350704  17.80    0.000     37.15962   46.53628

. reg mpg foreign weight displacement length

      Source |        SS     df        MS       Number of obs =      74
       Model |  1645.32866    4  411.332165     F(4, 69)      =   35.56
    Residual |  798.130799   69   11.567113     Prob > F      =  0.0000
       Total |  2443.45946   73  33.4720474     R-squared     =  0.6734
                                                Adj R-squared =  0.6544
                                                Root MSE      =   3.401

         mpg |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
     foreign |  -1.692645    1.105846  -1.53    0.130    -3.898747   .5134562
      weight |  -.0044303    .0019544  -2.27    0.027    -.0083292  -.0005315
displacement |   .0005878    .0100245   0.06    0.953    -.0194106   .0205861
      length |  -.0824511    .0554128  -1.49    0.141    -.1929966   .0280944
       _cons |   50.55702    6.300024   8.02    0.000     37.98882   63.12523

R-Squared: STATA

. reg mpg foreign weight displacement length headroom gear_ratio trunk

      Source |        SS     df        MS       Number of obs =      74
       Model |  1666.67596    7  238.096565     F(7, 66)      =   20.23
    Residual |  776.783504   66   11.769447     Prob > F      =  0.0000
       Total |  2443.45946   73  33.4720474     R-squared     =  0.6821
                                                Adj R-squared =  0.6484
                                                Root MSE      =  3.4307

         mpg |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
     foreign |  -2.451147    1.259653  -1.95    0.056    -4.966124   .0638309
      weight |  -.0041847    .0019917  -2.10    0.039    -.0081613  -.0002082
displacement |    .007931    .0115234   0.69    0.494    -.0150762   .0309382
      length |   -.090164    .0611049  -1.48    0.145    -.2121639   .0318358
    headroom |   -.086585    .6401843  -0.14    0.893    -1.364754   1.191584
  gear_ratio |   2.383442    1.777187   1.34    0.184    -1.164826   5.931711
       trunk |   .0077513    .1582042   0.05    0.961    -.3081135   .3236162
       _cons |   43.00843    8.706864   4.94    0.000     25.62461   60.39224



R-Squared and Partial Correlation

Let R2Xz be the R-square in the regression of y on X and z, Let R2X be the same in the
∗ be the partial correlation between y and z after
regression of y on X alone, and let ryz
controlling for X. Then
 
∗2
R2Xz = R2X + 1 − R2X ryz

R2Xz − R2X

∗2
ryz =
1 − R2X


t2z
'
t2z + df

R − squared without length: 0.6629


R − squared with length: 0.6734

R2Xz − R2X

∗ 0.6734 − 0.6629
rmpg,lenght = = = 0.03115
1 − R2X

1 − 0.6629
t2z (−1.49)2
' = = 0.03117
t2z + df (−1.49)2 + 74 − 5
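The arithmetic above can be reproduced directly; a sketch using the values reported in the two Stata regressions above:

```python
# Numbers taken from the Stata output: mpg regressed on X without/with length
r2_X  = 0.6629    # R-squared without length
r2_Xz = 0.6734    # R-squared with length
t_z   = -1.49     # t ratio on length in the larger regression
df    = 74 - 5    # n - K in the larger regression

r_star_sq = (r2_Xz - r2_X) / (1 - r2_X)   # exact squared partial correlation
approx    = t_z**2 / (t_z**2 + df)        # t-ratio approximation

print(round(r_star_sq, 5), round(approx, 5))
```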



Comparing Models

Make sure the denominator in R² is the same across models; that is, the models
must have the same left-hand-side variable. Example: linear vs. loglinear. The
loglinear model will almost always appear to fit better because taking logs
reduces variation.



Comparing Models: STATA

. reg mpg foreign weight displacement

      Source |        SS     df        MS       Number of obs =      74
       Model |  1619.71935    3  539.906448     F(3, 70)      =   45.88
    Residual |  823.740114   70  11.7677159     Prob > F      =  0.0000
       Total |  2443.45946   73  33.4720474     R-squared     =  0.6629
                                                Adj R-squared =  0.6484
                                                Root MSE      =  3.4304

         mpg |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
     foreign |  -1.600631    1.113648  -1.44    0.155    -3.821732   .6204699
      weight |  -.0067745    .0011665  -5.81    0.000    -.0091011  -.0044479
displacement |   .0019286    .0100701   0.19    0.849    -.0181556   .0220129
       _cons |   41.84795    2.350704  17.80    0.000     37.15962   46.53628

. reg ln_mpg foreign weight displacement

      Source |        SS     df        MS       Number of obs =      74
       Model |  3.63346676    3  1.21115559     F(3, 70)      =   64.83
    Residual |  1.30775659   70  .018682237     Prob > F      =  0.0000
       Total |  4.94122335   73  .067687991     R-squared     =  0.7353
                                                Adj R-squared =  0.7240
                                                Root MSE      =  .13668

      ln_mpg |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
     foreign |  -.1062946    .0443727  -2.40    0.019    -.1947933    -.017796
      weight |  -.0003058    .0000465  -6.58    0.000    -.0003985   -.0002131
displacement |  -.0001345    .0004012  -0.34    0.739    -.0009347    .0006658
       _cons |   4.006155    .0936626  42.77    0.000     3.819351    4.192959

