Session CLRM Review 3
Sisir Debnath
Indian Institute of Technology Delhi
The sum of squared residuals was our minimand in the search for the best estimator of β.
Note that $e'e = y'e = e'y$.
Proof:
\[
\begin{aligned}
e'e &= (y - Xb)'e \\
    &= (y' - b'X')e \\
    &= y'e - b'X'e \\
    &= y'e \qquad (\text{since } X'e = 0 \text{ by the normal equations})
\end{aligned}
\]
Since $y'e$ is a scalar, $y'e = e'y$, so $e'e = e'y$.
$b$ minimizes $e'e$. The OLS solution is
\[
b = (X'X)^{-1}X'y
\]
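A minimal numpy sketch (not from the lecture, using simulated hypothetical data) of the OLS formula and of the two facts used above: the normal equations give $X'e = 0$, and $e'e = y'e$.

```python
import numpy as np

# Minimal sketch with simulated (hypothetical) data.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# OLS coefficients from the normal equations: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                      # least squares residuals

print(b)                           # estimated coefficients
print(np.round(X.T @ e, 10))       # ~0: the normal equations X'e = 0
print(e @ e, y @ e)                # e'e equals y'e, as claimed above
```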
Any Coefficient Vector Other than b Produces a Larger Sum of Squares
Quick proof: let $d \neq b$ be any other coefficient vector and set $u = y - Xd$. Then
\[
u = y - Xb + Xb - Xd = e - X(d - b),
\]
so
\[
\begin{aligned}
u'u &= (y - Xd)'(y - Xd) \\
    &= \bigl(e - X(d - b)\bigr)'\bigl(e - X(d - b)\bigr) \\
    &= e'e + (d - b)'X'X(d - b) \\
    &\ge e'e,
\end{aligned}
\]
where the cross terms vanish because $X'e = 0$.
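A quick numerical check of this inequality and of the exact decomposition above, again on simulated hypothetical data (a sketch, not part of the original derivation):

```python
import numpy as np

# Check: any d != b produces a larger sum of squared residuals,
# and u'u = e'e + (d-b)'X'X(d-b) exactly. Simulated (hypothetical) data.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

d = b + np.array([0.1, -0.2, 0.3])            # some vector other than b
u = y - X @ d

print(u @ u, e @ e)                            # u'u > e'e
print(np.isclose(u @ u, e @ e + (d - b) @ (X.T @ X) @ (d - b)))
```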
Now add a variable $z$ to the regression. Let $b_z$ collect the least squares coefficients $d$ (on $X$) and $c$ (on $z$) in the regression of $y$ on $Z = [X \;\; z]$, so the residuals are
\[
u = y - Zb_z = y - [X \;\; z]\begin{bmatrix} d \\ c \end{bmatrix} = y - Xd - zc.
\]
By the partitioned-regression results, $d = b - (X'X)^{-1}X'z\,c$ and $c = \dfrac{z'My}{z'Mz}$, where $M = I - X(X'X)^{-1}X'$ is the residual maker for $X$; hence $u = e - Mzc$. Then
\[
\begin{aligned}
u'u &= (e - Mzc)'(e - Mzc) \\
    &= e'e - 2c\,z'M'My + c^2 z'M'Mz && (\text{since } e = My) \\
    &= e'e - 2c\,z'My + c^2 z'M'Mz && (M'M = M) \\
    &= e'e - 2c\,c\,[z'Mz] + c^2 z'M'Mz && (\text{since } z'My = c\,z'Mz) \\
    &= e'e - 2c^2 z'Mz + c^2 z'Mz \\
    &= e'e - c^2 z'Mz \\
    &\le e'e.
\end{aligned}
\]
Hence $u'u \le e'e$.
Dropping a variable(s) cannot improve the fit - that is, it cannot reduce the sum of
squared residuals.
Adding a variable(s) cannot degrade the fit - that is, it cannot increase the sum of
squared residuals.
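A minimal sketch (simulated hypothetical data, numpy) illustrating both statements at once: the sum of squared residuals from the longer regression is never larger than that from the shorter one, even when the added regressor is irrelevant.

```python
import numpy as np

# Adding a regressor (even an irrelevant one) cannot increase the SSR;
# equivalently, dropping it cannot decrease the SSR. Hypothetical data.
rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.normal(size=n)                               # an irrelevant regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def sum_sq_resid(W, y):
    """Sum of squared residuals from regressing y on W."""
    resid = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    return resid @ resid

ee = sum_sq_resid(X, y)                              # short regression
uu = sum_sq_resid(np.column_stack([X, z]), y)        # long regression
print(ee, uu, uu <= ee)                              # always True
```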
The smaller the sum of squared residuals, the better the line fits the data.
Total variation in y:
\[
\text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]
\[
M^0 = I - i(i'i)^{-1}i' =
\begin{bmatrix}
1 - \tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} \\
-\tfrac{1}{n} & 1 - \tfrac{1}{n} & \cdots & -\tfrac{1}{n} \\
\vdots & \vdots & \ddots & \vdots \\
-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & 1 - \tfrac{1}{n}
\end{bmatrix}
\]
where $i$ is the $n \times 1$ column of ones. For any vector $a$, $M^0 a$ is the vector of deviations from the mean, so
\[
(M^0 a)'(M^0 a) = \sum_{i=1}^{n} (a_i - \bar{a})^2,
\]
and since $M^0$ is symmetric and idempotent,
\[
a'M^0 a = \sum_{i=1}^{n} (a_i - \bar{a})^2.
\]
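A small numpy sketch of $M^0$ in action (the vector a below is arbitrary and purely illustrative):

```python
import numpy as np

# The centering matrix M0 = I - i(i'i)^{-1}i' turns a vector into deviations
# from its mean, so a'M0a is the sum of squared deviations. Illustrative only.
n = 5
i = np.ones((n, 1))
M0 = np.eye(n) - i @ i.T / n             # (i'i)^{-1} = 1/n

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # arbitrary example vector
print(M0 @ a)                            # a_i - abar for each i
print(a @ M0 @ a, np.sum((a - a.mean()) ** 2))  # the two agree
```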
So the total variation in y can be written as
\[
\text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = y'M^0 y
\]
Therefore, substituting $y = Xb + e$ (the cross terms vanish because $X'e = 0$ and, with a constant in the regression, $M^0 e = e$):
\[
y'M^0 y = (Xb)'M^0(Xb) + e'M^0 e
\]
\[
\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \left[(x_i - \bar{x})'b\right]^2 + \sum_{i=1}^{n} e_i^2
\]
Total Sum of Squares = Regression Sum of Squares + Residual Sum of Squares
SST = SSR + SSE
\[
R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{e'e}{y'M^0 y} = \frac{b'X'M^0 Xb}{y'M^0 y}
\]
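A short numerical check (simulated hypothetical data, numpy) that SST = SSR + SSE and that the equivalent $R^2$ expressions above coincide when the regression contains a constant:

```python
import numpy as np

# Verify SST = SSR + SSE and the equivalent R^2 formulas. Hypothetical data;
# the identities rely on the regression including a constant term.
rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
yhat = X @ b

SST = np.sum((y - y.mean()) ** 2)        # total sum of squares, y'M0y
SSR = np.sum((yhat - y.mean()) ** 2)     # regression sum of squares, b'X'M0Xb
SSE = e @ e                              # residual sum of squares, e'e

print(np.isclose(SST, SSR + SSE))
print(SSR / SST, 1 - SSE / SST)          # the two R^2 expressions agree
```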
There are some problems with the use of $R^2$ as a measure of goodness of fit.
$R^2$ will never decrease when another variable is added to the model.
It is therefore tempting to include irrelevant regressors in the model to push $R^2$ toward its upper limit.
The adjusted $\bar{R}^2 = 1 - \frac{n-1}{n-K}(1 - R^2)$, where $K$ is the number of estimated coefficients, includes a penalty for variables that do not add much fit.
$\bar{R}^2$ can fall when a variable is added to the equation; we will see this below.
$\bar{R}^2$ is higher with $z$ than without $z$ if and only if the t ratio on $z$ (in the regression in which it is included) is larger than one in absolute value.
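A quick simulation sketch (numpy, hypothetical data, not from the lecture) checking the $\bar{R}^2$ / t-ratio claim numerically:

```python
import numpy as np

# Check: adjusted R^2 rises when z is added if and only if |t_z| > 1.
# Simulated (hypothetical) data, repeated over a few draws.
rng = np.random.default_rng(4)

def adj_r2_and_t(W, y):
    """Adjusted R^2 and t ratios from regressing y on W."""
    nn, k = W.shape
    b = np.linalg.solve(W.T @ W, W.T @ y)
    e = y - W @ b
    s2 = (e @ e) / (nn - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(W.T @ W)))
    r2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (nn - 1) / (nn - k), b / se

n = 60
for _ in range(5):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    z = rng.normal(size=n)
    y = X @ np.array([1.0, 0.5]) + 0.1 * z + rng.normal(size=n)

    r2adj_short, _ = adj_r2_and_t(X, y)
    r2adj_long, t = adj_r2_and_t(np.column_stack([X, z]), y)
    print(abs(t[-1]) > 1, r2adj_long > r2adj_short)   # the two flags always agree
```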
Source               Sum of Squares          Degrees of Freedom    Mean Square
Regression (Model)   $b'X'y - n\bar{y}^2$    $K - 1$               $(b'X'y - n\bar{y}^2)/(K - 1)$
Residual             $e'e$                   $n - K$               $e'e/(n - K)$
Total                $y'y - n\bar{y}^2$      $n - 1$               $(y'y - n\bar{y}^2)/(n - 1)$
Example in Stata (auto dataset):
. reg mpg foreign weight displacement
Let $R^2_{Xz}$ be the R-square in the regression of y on X and z, let $R^2_X$ be the same in the regression of y on X alone, and let $r^*_{yz}$ be the partial correlation between y and z after controlling for X. Then
\[
R^2_{Xz} = R^2_X + \left(1 - R^2_X\right) r^{*2}_{yz},
\]
so that
\[
r^{*2}_{yz} = \frac{R^2_{Xz} - R^2_X}{1 - R^2_X} \simeq \frac{t_z^2}{t_z^2 + \text{df}}
\]
\[
r^{*2}_{mpg,length} = \frac{R^2_{Xz} - R^2_X}{1 - R^2_X} = \frac{0.6734 - 0.6629}{1 - 0.6629} = 0.03115
\]
\[
\simeq \frac{t_z^2}{t_z^2 + \text{df}} = \frac{(-1.49)^2}{(-1.49)^2 + 74 - 5} = 0.03117
\]
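The auto-data numbers above cannot be reproduced here without the dataset, but the two identities themselves are easy to verify on simulated hypothetical data (a numpy sketch, using the FWL expressions for the coefficient on z):

```python
import numpy as np

# Check the identities R^2_Xz = R^2_X + (1 - R^2_X) r*^2 and
# r*^2 = t_z^2 / (t_z^2 + (n - K)) on simulated (hypothetical) data.
rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.normal(size=n)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.3 * z + rng.normal(size=n)

def r_squared(W, y):
    e = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    return 1 - (e @ e) / np.sum((y - y.mean()) ** 2)

R2_X = r_squared(X, y)
R2_Xz = r_squared(np.column_stack([X, z]), y)

# Partial correlation of y and z controlling for X: correlate the residuals
# of y on X with the residuals of z on X (M is the residual maker for X).
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
r_star = (z @ M @ y) / np.sqrt((z @ M @ z) * (y @ M @ y))

# t ratio on z in the long regression, via the partitioned-regression results.
K = X.shape[1] + 1                     # parameters in the regression on [X z]
c = (z @ M @ y) / (z @ M @ z)          # coefficient on z
u = M @ y - (M @ z) * c                # residuals from the long regression
t_z = c / np.sqrt((u @ u) / (n - K) / (z @ M @ z))

print(np.isclose(R2_Xz, R2_X + (1 - R2_X) * r_star ** 2))
print(r_star ** 2, (R2_Xz - R2_X) / (1 - R2_X), t_z ** 2 / (t_z ** 2 + n - K))
```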
When comparing $R^2$ across models, make sure the denominator is the same - i.e., the same left-hand-side variable. For example, linear vs. loglinear: the loglinear model will almost always appear to fit better because taking logs reduces variation.
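A small illustration of the point (numpy, hypothetical data): the $R^2$ of a levels model and of a logs model are computed against different total variation, so the two numbers are not directly comparable.

```python
import numpy as np

# R^2 from a model for y and from a model for log(y) use different
# denominators (different SST), so comparing them is misleading.
rng = np.random.default_rng(6)
n = 100
x = rng.uniform(1.0, 10.0, size=n)
y = np.exp(0.5 + 0.3 * x + 0.2 * rng.normal(size=n))   # hypothetical DGP
X = np.column_stack([np.ones(n), x])

def r_squared(W, target):
    e = target - W @ np.linalg.lstsq(W, target, rcond=None)[0]
    return 1 - (e @ e) / np.sum((target - target.mean()) ** 2)

print(r_squared(X, y))          # denominator: variation in y
print(r_squared(X, np.log(y)))  # denominator: variation in log(y)
```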