Sarmila Banerjee
Lecture II
K-Variable CLRM
Derivation of the OLS estimator $\hat{\beta}$;
Derivation of $V(\hat{\beta})$ and its estimator $s_u^2$;
$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{21} & \cdots & X_{k1} \\ 1 & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{2n} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$
$$Y = X\beta + U$$
In this formulation the intercept term is represented by a sum vector, $[1, 1, \ldots, 1]'$, the first column of X;
The variance-covariance matrix of U has the $V(u_i)$'s along the diagonal and the $\mathrm{Cov}(u_i, u_j)$, $\forall\, i \neq j$, along the off-diagonal:
$$\mathrm{Var}(U) = E(UU') = E\begin{bmatrix} u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\ u_1 u_2 & u_2^2 & \cdots & u_2 u_n \\ \vdots & \vdots & & \vdots \\ u_1 u_n & u_2 u_n & \cdots & u_n^2 \end{bmatrix} = \sigma_u^2 I_n$$
The data matrix $[X]_{n \times k}$ is non-stochastic, which means that in repeated sampling the values of the X's are held constant;
Since the regression equation has a linear structure, the included causes should not interact among themselves in producing the effect;
This implies that the columns of the matrix X should be linearly independent, i.e., $\rho(X) = k$;
However, for the model to be estimable, an additional assumption is needed: $k < n$.
The number of unknowns in the k-variable model is k (the $\beta_j$'s) plus n (the $u_i$'s), i.e., (k + n) unknowns against n equations, where the $u_i$'s are stochastic.
Since $u_i \sim IN(0, \sigma_u^2)\ \forall\, i$, only (k + 1) parameters are really unknown in the system, viz., $(\beta_1, \beta_2, \ldots, \beta_k)$ and $\sigma_u^2$;
The estimators are obtained by minimizing the RSS (residual sum of squares), and k F.O.C.s (first-order conditions) are obtained as:
$$\frac{\partial(e'e)}{\partial \hat{\beta}} = 0,$$
where RSS = $e'e$ and $e = (Y - \hat{Y}) = (Y - X\hat{\beta})$;
These F.O.C.s give the number of constraints that the optimal value of RSS has to satisfy.
So, in the presence of k parameters, and therefore k F.O.C.s, out of the n $e_i$'s only (n − k) can be chosen freely.
Thus, (n − k) is the number of degrees of freedom of the system and indicates the statistical power of the inferences drawn about the unknown parameters.
So, the requirement in a compact form turns out to be: ρ(X) = k < n;
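As a concrete illustration of this setup, a minimal numpy sketch (the sample size, coefficient values, and variable names below are illustrative assumptions, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                   # k regressors including the intercept, k < n

# First column of X is the sum vector [1, 1, ..., 1]' carrying the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, 1.5, -0.7])              # illustrative beta_1, ..., beta_k
u = rng.normal(scale=1.0, size=n)              # u_i ~ IN(0, sigma_u^2) with sigma_u = 1
Y = X @ beta + u                               # Y = X beta + U

# Estimability requirement in compact form: rho(X) = k < n
assert np.linalg.matrix_rank(X) == k and k < n
```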
OLS Estimation
$$e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta})$$
$$= Y'Y - Y'X\hat{\beta} - \hat{\beta}'X'Y + \hat{\beta}'(X'X)\hat{\beta}$$
$$= Y'Y - 2Y'X\hat{\beta} + \hat{\beta}'(X'X)\hat{\beta}$$
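This expansion is an algebraic identity in the coefficient vector and can be checked numerically; a short sketch with simulated, illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.array([0.5, 1.0, 0.0])                  # any candidate coefficient vector
e = Y - X @ b
# e'e = Y'Y - 2 Y'Xb + b'(X'X)b
assert np.isclose(e @ e, Y @ Y - 2 * (Y @ X @ b) + b @ (X.T @ X) @ b)
```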
In this expansion, $Y'X\hat{\beta}$ and $\hat{\beta}'X'Y$ are both scalars and one is the transpose of the other, hence they are equal;
$Y'X\hat{\beta}$ is a linear function of $\hat{\beta}$ and $\hat{\beta}'(X'X)\hat{\beta}$ is a quadratic function of $\hat{\beta}$;
$$\sum_{i=1}^{n} c_i x_i = c'x$$
is a linear form in x, where x is (n×1) and c is (n×1);
$$\frac{\partial(c'x)}{\partial x} = \begin{bmatrix} \partial(c'x)/\partial x_1 \\ \vdots \\ \partial(c'x)/\partial x_n \end{bmatrix} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = c;$$
so,
$$\frac{\partial(Y'X\hat{\beta})}{\partial \hat{\beta}} = X'Y$$
Similarly, for a symmetric (2×2) matrix A,
$$x'Ax = (x_1\ \ x_2)\begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2$$
$$\frac{\partial(x'Ax)}{\partial x} = \begin{bmatrix} \partial(x'Ax)/\partial x_1 \\ \partial(x'Ax)/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2a_{11}x_1 + 2a_{12}x_2 \\ 2a_{12}x_1 + 2a_{22}x_2 \end{bmatrix} = 2Ax$$
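Both differentiation rules, $\partial(c'x)/\partial x = c$ and $\partial(x'Ax)/\partial x = 2Ax$ for symmetric A, can be checked against numerical gradients; a small sketch (the finite-difference helper num_grad and all values are illustrative):

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference numerical gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2 * h)
    return g

rng = np.random.default_rng(2)
c = rng.normal(size=4)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                                    # symmetric A
x = rng.normal(size=4)

assert np.allclose(num_grad(lambda z: c @ z, x), c)              # d(c'x)/dx = c
assert np.allclose(num_grad(lambda z: z @ A @ z, x), 2 * A @ x)  # d(x'Ax)/dx = 2Ax
```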
Applying these two rules,
$$\therefore\ \frac{\partial[\hat{\beta}'(X'X)\hat{\beta}]}{\partial \hat{\beta}} = 2(X'X)\hat{\beta}$$
$$\therefore\ \frac{\partial(e'e)}{\partial \hat{\beta}} = -2X'Y + 2(X'X)\hat{\beta} = 0$$
$$\therefore\ (X'X)\hat{\beta} = X'Y;$$
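A sketch of solving the normal equations $(X'X)\hat{\beta} = X'Y$ on simulated data, cross-checked against numpy's least-squares routine (the data and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, 1.5, -0.7])
Y = X @ beta + rng.normal(size=n)

# Normal equations: (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer from the library least-squares solver
beta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_ls)

# The residuals satisfy the k F.O.C.s: X'e = 0
e = Y - X @ beta_hat
assert np.allclose(X.T @ e, 0, atol=1e-8)
```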
Properties of $\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$:
Linear in Y; since Y is a linear function of U, so will be $\hat{\beta}$;
The marginal distribution of each $\hat{\beta}_j$ would be normal;
$\hat{\beta}$ is an unbiased estimator of $\beta$; $E(\hat{\beta}) = \beta$;
Variance-covariance matrix of $\hat{\beta}$: $V(\hat{\beta}) = \sigma_u^2 (X'X)^{-1}$;
$\hat{\beta}$ is the minimum variance unbiased estimator (MVUE); i.e., in the class of all linear and unbiased estimators of $\beta$, the OLS estimator has the minimum variance.
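Unbiasedness and the form of the variance-covariance matrix can be illustrated by repeated sampling with X held fixed; a Monte Carlo sketch (replication count and parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma_u, reps = 40, 3, 2.0, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # fixed in repeated samples
beta = np.array([2.0, 1.5, -0.7])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, k))
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma_u, size=n)
    draws[r] = XtX_inv @ X.T @ Y                                  # beta_hat = (X'X)^{-1} X'Y

print(draws.mean(axis=0))            # ~ beta           (unbiasedness)
print(np.cov(draws, rowvar=False))   # ~ sigma_u^2 (X'X)^{-1}
print(sigma_u**2 * XtX_inv)
```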
Minimum Variance Property (Best):
Direct comparison of the variance-covariance matrices would be inappropriate, as they contain information not only about variances but also about covariances, while the claim is placed only against variances.
Way out:
Consider a linear combination of the elements of the vector $\hat{\beta}$, corresponding to b = c'β, where c (k×1) is a vector of constant coefficients;
Suppose c' = [0, 0, …, 1, 0, …, 0], with 1 in the jth position. Then b = c'β = $\beta_j$ and $c'\hat{\beta} = \hat{\beta}_j$;
$$\hat{b} = c'\hat{\beta} = c'(X'X)^{-1}X'Y = c'(X'X)^{-1}X'(X\beta + U)$$
$$= c'\beta + c'(X'X)^{-1}X'U$$
$$E(\hat{b}) = c'\beta = b, \ \text{as } E(U) = 0.$$
$$V(\hat{b}) = E[\{\hat{b} - E(\hat{b})\}\{\hat{b} - E(\hat{b})\}']$$
$$= E[c'(X'X)^{-1}X'UU'X(X'X)^{-1}c]$$
$$= \sigma_u^2\, c'(X'X)^{-1}c$$
Propose another linear estimator of b, say b*, defined as b* = a'Y, where a (n×1) is a vector of constant coefficients; for b* to be unbiased, E(b*) = a'Xβ must equal c'β for all β, i.e., a'X = c'; then
$$V(b^*) = \sigma_u^2\, a'a$$
Definition:
Characteristic roots (latent roots/Eigen values) and characteristic vectors (latent vectors/Eigen vectors) of a square matrix A are defined through the equation AX = λX, i.e., (A − λI)X = 0.
It can produce a non-trivial solution for X if and only if the matrix (A − λI) is singular, i.e., |A − λI| = 0; this is known as the characteristic equation;
For a given root, all these vectors will be scalar multiples of one another and, therefore, parallel and collinear (i.e., not independent of each other).
Example: Consider the 2×2 real, square, symmetric matrix
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}$$
$$\therefore\ |A - \lambda I| = 0 \;\Rightarrow\; \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{12} & a_{22} - \lambda \end{vmatrix} = 0 \;\Rightarrow\; \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}^2) = 0;$$
Suppose $a_{11} = a_{22} = 1$ and $a_{12} = 2$; then $\lambda_1 = 3$ and $\lambda_2 = -1$. Solving the system AX = λX for λ = 3, we get $x_{11} = x_{12}$; similarly, the solution corresponding to λ = −1 gives $x_{21} = -x_{22}$;
i.e., the solution comes in the form of a ratio and is solvable only up to a scalar multiple.
To get an exact solution, the system needs to be closed by adding another equation involving the same variables, e.g., the normalization $X'X = 1$ (i.e., $x_1^2 + x_2^2 = 1$).
Using that,
$$X_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \quad \text{and} \quad X_2 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix};$$
Diagonalization of Matrix A:
Or, P’AP = Λ, or, A = PΛP’ → the matrix A can be completely characterized in terms of
its Eigen values and the associated Eigen vectors.
Hence, matrix A and Λ are similar matrices and the transformation is called similarity
transformation.
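For the worked 2×2 example ($a_{11} = a_{22} = 1$, $a_{12} = 2$), the roots, the normalized vectors, and the similarity transformation can be verified with numpy:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])                 # a11 = a22 = 1, a12 = 2

lam, P = np.linalg.eigh(A)                 # eigh: eigen-decomposition of a real symmetric matrix
print(lam)                                 # [-1.  3.]  -> the two characteristic roots
print(P)                                   # columns are the normalized vectors, entries +-1/sqrt(2)

Lam = np.diag(lam)
assert np.allclose(P.T @ A @ P, Lam)       # P'AP = Lambda
assert np.allclose(P @ Lam @ P.T, A)       # A = P Lambda P'
```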
Definiteness of Matrices:
Consider a quadratic form X'AX; if all characteristic roots of A are positive, then (X'AX) > 0 for every X ≠ 0 and A is called positive definite;
Similarly, negative definiteness and positive semi-definiteness can be defined for the quadratic form X'AX according as the $\lambda_i$'s are all negative or all non-negative.
$$\mathrm{Var}(b^*) - \mathrm{Var}(\hat{b}) = \sigma_u^2\,(a'a - c'(X'X)^{-1}c) = \sigma_u^2\, a'Ma \ge 0,$$
where $M = I_n - X(X'X)^{-1}X'$ and, by the unbiasedness condition, $c' = a'X$; M is symmetric and idempotent, hence positive semi-definite.
$$\Rightarrow\ \mathrm{Var}(b^*) \ge \mathrm{Var}(\hat{b});$$
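The inequality can be illustrated by comparing the OLS estimator of a single coefficient with another linear unbiased estimator b* = a'Y; a Monte Carlo sketch in which the alternative weight vector a is an illustrative choice satisfying a'X = c':

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma_u, reps = 40, 3, 1.0, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, 1.5, -0.7])

c = np.array([0.0, 1.0, 0.0])                      # picks out beta_2, so b = c'beta = beta_2
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T                  # M X = 0

a_ols = X @ XtX_inv @ c                            # OLS weights: b_hat = a_ols' Y
a_alt = a_ols + M @ rng.normal(size=n)             # another a with a'X = c', hence unbiased

b_hat = np.empty(reps)
b_star = np.empty(reps)
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma_u, size=n)
    b_hat[r] = a_ols @ Y
    b_star[r] = a_alt @ Y

print(b_hat.mean(), b_star.mean())                 # both ~ beta_2 (unbiased)
print(b_hat.var(), b_star.var())                   # Var(b*) >= Var(b_hat)
```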
Unbiased estimator of $V(u_i) = \sigma_u^2$:
$e_i$ helps to form an idea about $u_i$;
Try V($e_i$) = [RSS/n] as an unbiased estimator of $\sigma_u^2$;
RSS = $e'e$;
$$e = (Y - \hat{Y}) = (Y - X\hat{\beta}) = Y - X(X'X)^{-1}X'Y = [I_n - X(X'X)^{-1}X']\,Y = MY = MU,$$
since MX = 0; hence $e'e = U'MU$.
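The properties of $M = I_n - X(X'X)^{-1}X'$ used in this step (symmetry, idempotency, MX = 0) can be checked directly; a short sketch with an illustrative X:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(M, M.T)                 # symmetric
assert np.allclose(M @ M, M)               # idempotent
assert np.allclose(M @ X, 0, atol=1e-10)   # MX = 0, so e = MY = MU
assert np.isclose(np.trace(M), n - k)      # tr(M) = n - k (used below)
```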
For n = 2,
$$E(U'MU) = E\left[(u_1\ \ u_2)\begin{bmatrix} m_{11} & m_{12} \\ m_{12} & m_{22} \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\right] = E[m_{11}u_1^2 + 2m_{12}u_1u_2 + m_{22}u_2^2] = \sigma_u^2(m_{11} + m_{22}) = \sigma_u^2\, tr(M);$$
in general, $E(e'e) = E(U'MU) = \sigma_u^2\, tr(M)$.
$$tr(M) = tr[I_n - X(X'X)^{-1}X'] = tr(I_n) - tr[(X'X)^{-1}X'X] = n - k$$
$$\therefore\ E\left(\frac{e'e}{n-k}\right) = \sigma_u^2;\quad \text{or},\ s_u^2 = \frac{RSS}{n-k}$$
$s_u^2$ is an unbiased estimator of $\sigma_u^2$;
Note:
Since tr(M) = (n − k) and the characteristic roots of M are either 1 or 0, exactly (n − k) columns of M are linearly independent and the remaining k are not; so the system has (n − k) degrees of freedom.
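A Monte Carlo sketch illustrating that RSS/(n − k) is unbiased for $\sigma_u^2$ while RSS/n is biased downward (replication count and parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, sigma_u, reps = 25, 3, 1.5, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, 1.5, -0.7])
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix; residual maker M = I - H

rss = np.empty(reps)
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma_u, size=n)
    e = Y - H @ Y                              # e = MY
    rss[r] = e @ e

print(np.mean(rss / (n - k)))                  # ~ sigma_u^2 = 2.25  (unbiased s_u^2)
print(np.mean(rss / n))                        # ~ sigma_u^2 (n-k)/n < sigma_u^2 (biased)
```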