Properties of Least Squares Estimators
Table of Contents
Chapter: Properties of Least Squares Estimators
Introduction
Assumptions of the regression model
Unbiasedness of the OLS estimators
Variances of OLS estimators
Gauss-Markov Theorem
Sampling distributions of OLS estimators
Summary
Exercises
Solved Problems
Practice Questions
References
Learning Outcomes
After reading this lesson you should be able to:
1. state the assumptions of the classical linear regression model;
2. prove that the OLS estimators are unbiased;
3. derive the variances and standard errors of the OLS estimators;
4. state and prove the Gauss-Markov theorem;
5. describe the sampling distributions of the OLS estimators.
Introduction
We learnt in Lesson 1 the method of ordinary least squares (OLS) for estimating the population regression coefficients: the intercept and the slope coefficient. The method was quite simple. However, are these estimators reliable? Are they the best estimators, or are better estimators available? To answer these questions, we need to know the properties of the least squares estimators, which in turn depend upon the assumptions of the classical linear regression model. The current lesson deals with the assumptions of the regression model, the properties of the least squares estimators, the Gauss-Markov theorem and the sampling distributions of the estimators.
Assumptions of the regression model

A1:- The regression model is linear in parameters. That is, the model is of the form

$Y_i = \beta_1 + \beta_2 X_i + u_i$   (1)

and not, for example,

$Y_i = \beta_1 + \beta_2^2 X_i + u_i$   (2)

Basically, what linearity in parameters implies is that, as in eq. (1), we are estimating $\beta_1$ and $\beta_2$ simply, and not any function of $\beta_1$ and $\beta_2$ or any relationship between the two, as in eq. (2).
A2:- It is assumed further that there is no specification bias in the regression model. For instance, while estimating the relationship between food expenditure and income, one knows that food expenditure increases at a decreasing rate as income increases; therefore a simple linear relationship between the two may not be correct, and we should include another variable, squared income, that captures the change in food expenditure at higher levels of income. This assumption also implies that we have included all the possible variables that could affect the given dependent variable.
A3:- The error term has a zero mean given the values of $X_i$. That is,

$E(u_i \mid X_i) = 0$   (3)
Recall that the error term incorporates all the variables that are not included as regressors but may affect $Y_i$. Thus, assumption A3 states that all these variables are not correlated with $X_i$ and, therefore, given the value of $X_i$, their mean value is zero.
Fig. 1

As shown in fig. 1 above, the mean of $u_i$ is zero for all the values of $X_i$.
A4:- The values of $X$ are fixed in repeated sampling and thus don't contain any stochastic or random components. This assumption also implies that the explanatory variable $X$ is uncorrelated with the error term, $u_i$.
A5:- It is assumed further that the disturbance term $u_i$ has a constant variance, denoted by $\sigma^2$. That is,

$\text{var}(u_i \mid X_i) = \sigma^2$   (5)
Figs. 2(a) and 2(b) show the cases of homoscedasticity and heteroscedasticity
respectively:-
Fig. 2
A6:-The regression model also assumes that the error terms are not correlated
amongst themselves. This is known as the assumption of no autocorrelation.
$\text{cov}(u_i, u_j) = 0, \quad i \neq j$   (6)
What this assumption means is that no two error terms are systematically dependent or
related. In other words, they are purely random.
Note that, by eq. (1), given that $\beta_1$ and $\beta_2$ are fixed constants and $X_i$ is non-stochastic, the variance of $Y_i$ will be the same as the variance of $u_i$. Also, eq. (6) and eq. (1) would imply that $\text{cov}(Y_i, Y_j) = 0$ for $i \neq j$. On the other hand, if this covariance is not zero, then the error terms are said to be autocorrelated. Fig. 3 below shows different patterns of autocorrelation between two error terms.
Fig. 3
A7:- The variance of the regressor/independent variable should not be zero, i.e. the values of $X$ in a given sample should not all be identical.
To see why, recall the OLS formula for the slope coefficient:

$\hat\beta_2 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{\sum x_i y_i}{\sum x_i^2}$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ denote deviations from the sample means.
This ratio will be undefined if all the $X_i$ are identical. Not only $\hat\beta_2$; $\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}$ will also be indeterminate. Intuitively, if the independent variable has little variation, it will not be able to explain the variation in the dependent variable.
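To make the formulas concrete, here is a minimal numerical sketch (not part of the original lesson) that computes $\hat\beta_1$ and $\hat\beta_2$ directly from the deviation form above; the data values are invented purely for illustration. Note that if all the $X_i$ were identical, the denominator $\sum x_i^2$ would be zero and the computation would fail, exactly as assumption A7 warns.

```python
import numpy as np

# Illustrative (made-up) sample; X is treated as fixed (assumption A4)
X = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
Y = np.array([5.1, 8.9, 10.2, 14.3, 18.0])

x = X - X.mean()                      # deviations x_i = X_i - X-bar
y = Y - Y.mean()                      # deviations y_i = Y_i - Y-bar

beta2_hat = (x * y).sum() / (x**2).sum()     # slope: sum(x_i y_i) / sum(x_i^2)
beta1_hat = Y.mean() - beta2_hat * X.mean()  # intercept: Y-bar - beta2_hat * X-bar

print(f"beta1_hat = {beta1_hat:.3f}, beta2_hat = {beta2_hat:.3f}")
```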
Unbiasedness of the OLS estimators

We will prove that the slope coefficient $\hat\beta_2$ is unbiased in the text, and leave the proof for $\hat\beta_1$ as a solved exercise at the end of the lesson.
Before we start the proof, let us recall the formula for the slope coefficient:

$\hat\beta_2 = \dfrac{\sum x_i y_i}{\sum x_i^2} = \dfrac{\sum x_i Y_i}{\sum x_i^2}$   (7)

where the second equality holds because $\sum x_i \bar{Y} = \bar{Y} \sum (X_i - \bar{X}) = 0$. Defining the fixed weights $a_i = x_i / \sum x_i^2$, we can write

$\hat\beta_2 = \sum a_i Y_i$   (8)

Note that $\sum a_i = 0$ and $\sum a_i X_i = \dfrac{\sum x_i X_i}{\sum x_i^2} = \dfrac{\sum x_i^2}{\sum x_i^2} = 1$. In order to prove the unbiasedness of $\hat\beta_2$, substitute the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$ into eq. (8):

$\hat\beta_2 = \beta_1 \sum a_i + \beta_2 \sum a_i X_i + \sum a_i u_i = \beta_2 + \sum a_i u_i$

Now run the expectation operator through to obtain

$E(\hat\beta_2) = \beta_2 + \sum a_i E(u_i) = \beta_2$   (9)

since $E(u_i) = 0$ by assumption of the CLRM. Hence $\hat\beta_2$ is an unbiased estimator of $\beta_2$.
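The unbiasedness result can be illustrated by simulation. The sketch below (an addition for illustration, with all parameter values assumed) holds $X$ fixed across repeated samples, draws fresh zero-mean errors each time, and averages the resulting slope estimates; the average should settle near the true $\beta_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2, sigma = 2.0, 0.5, 1.0   # assumed "true" parameters
X = np.linspace(1, 10, 20)            # fixed regressor values (A4)
x = X - X.mean()

estimates = []
for _ in range(10_000):               # repeated samples of the same size
    u = rng.normal(0, sigma, X.size)  # E(u_i) = 0 (A3), constant variance (A5)
    Y = beta1 + beta2 * X + u
    estimates.append((x * (Y - Y.mean())).sum() / (x**2).sum())

print(np.mean(estimates))             # close to beta2 = 0.5: E(beta2_hat) = beta2
```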
Now, notice that the slope coefficient $\hat\beta_2$ is not the only estimator of $\beta_2$ that is unbiased. For instance, consider a naïve estimator $b_2$ that is obtained by joining the 1st and 2nd observations in the sample, as shown in fig. 4 below, and then finding the slope of that line.
Fig.4
$b_2 = \dfrac{Y_2 - Y_1}{X_2 - X_1}$   (10)

Recall eq. (1), that is, $Y_i = \beta_1 + \beta_2 X_i + u_i$ for all $i$, and note that

$Y_1 = \beta_1 + \beta_2 X_1 + u_1$ and $Y_2 = \beta_1 + \beta_2 X_2 + u_2$   (11)

Substituting for $Y_1$ and $Y_2$ from eq. (11) into eq. (10), we obtain

$b_2 = \beta_2 + \dfrac{u_2 - u_1}{X_2 - X_1}$   (12)

Note that $E(u_1) = E(u_2) = 0$. Thus, $E(b_2) = \beta_2$, and hence the naïve estimator is also an unbiased estimator of $\beta_2$. Also, the estimator is relatively simple to compute.
Should we then prefer the naïve estimator? The answer is no, because, as you notice, this estimator uses only two observations to find an estimate of $\beta_2$ and wastes the rest of them.
Therefore, this naïve estimator will be highly sensitive to the values of the error term for the first two observations. On the other hand, the OLS estimator makes use of all the observations in the sample and has the advantage that many of the error terms may cancel each other out and thus may not affect the regression much. Indeed, it can be shown that the naïve estimator has a much higher variance than that of the OLS estimator and is therefore less precise, which is what we discuss in the next section.
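A quick simulation makes this variance comparison vivid. In the sketch below (illustrative assumptions as before), both estimators are computed over many repeated samples; both average out near the true $\beta_2$, but the naïve two-point estimator scatters far more widely.

```python
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2, sigma = 2.0, 0.5, 1.0   # assumed parameters
X = np.linspace(1, 10, 20)            # fixed regressors
x = X - X.mean()

ols, naive = [], []
for _ in range(10_000):
    u = rng.normal(0, sigma, X.size)
    Y = beta1 + beta2 * X + u
    ols.append((x * (Y - Y.mean())).sum() / (x**2).sum())
    naive.append((Y[1] - Y[0]) / (X[1] - X[0]))   # slope through obs 1 and 2

print(np.var(ols), np.var(naive))     # naive variance is many times larger
```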
Variances of OLS estimators

The variances of the OLS estimators are

$\text{var}(\hat\beta_1) = \dfrac{\sigma^2 \sum X_i^2}{n \sum x_i^2}$ and $\text{var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_i^2}$   (13)
We will derive the formula for $\text{var}(\hat\beta_2)$ now and leave the derivation of the formula for $\text{var}(\hat\beta_1)$ as a solved exercise question.
We know that $\hat\beta_2 = \beta_2 + \sum a_i u_i$, so that $\hat\beta_2 - \beta_2 = \sum a_i u_i$. Hence,

$\text{var}(\hat\beta_2) = E(\hat\beta_2 - \beta_2)^2 = E\left(\sum a_i u_i\right)^2 = \sigma^2 \sum a_i^2 = \dfrac{\sigma^2}{\sum x_i^2}$   (14)

where the cross terms vanish because $E(u_i u_j) = 0$ for $i \neq j$ (A6), and $\sum a_i^2 = \sum x_i^2 / \left(\sum x_i^2\right)^2 = 1/\sum x_i^2$.
Thus, as eq. (14) states, $\text{var}(\hat\beta_2)$ is directly proportional to the variance of the error term and inversely proportional to the sum of squared deviations of $X_i$ from its mean. That is to say, the higher the variance of the error term, the less efficient $\hat\beta_2$ will be (the greater the variance of $\hat\beta_2$), and the lower the variance of the error term, the more precise $\hat\beta_2$ will be.
The expressions $\text{var}(\hat\beta_1) = \dfrac{\sigma^2 \sum X_i^2}{n \sum x_i^2}$ and $\text{var}(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_i^2}$ above are the population variances of the least squares estimators $\hat\beta_1$ and $\hat\beta_2$ respectively. In reality, however, one has to work with just a sample of the population, so the error variance $\sigma^2$ is unknown and must be estimated, for example by the average of the squared residuals, $\sum \hat{u}_i^2 / n$.
However, it can be shown that the above measure is not a preferable estimator of $\sigma^2$, as it is biased. More precisely, it can be shown that an unbiased estimator of $\sigma^2$, denoted by $\hat\sigma^2$, is

$\hat\sigma^2 = \dfrac{\sum \hat{u}_i^2}{n - 2}$
Further, by taking the square roots of the variances of $\hat\beta_1$ and $\hat\beta_2$ (with $\sigma^2$ replaced by its estimate $\hat\sigma^2$), one can obtain their standard deviations, which are popularly known as the standard errors of the regression coefficients and are abbreviated to $se(\hat\beta_1)$ and $se(\hat\beta_2)$ in the literature:

$se(\hat\beta_1) = \sqrt{\dfrac{\hat\sigma^2 \sum X_i^2}{n \sum x_i^2}}$ and $se(\hat\beta_2) = \dfrac{\hat\sigma}{\sqrt{\sum x_i^2}}$
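The following sketch (again an illustration added to the lesson, with invented data) shows how $\hat\sigma^2$ and the standard errors would be computed from a single sample: fit the line, form the residuals, divide the residual sum of squares by $n - 2$, and plug the result into the variance formulas.

```python
import numpy as np

X = np.array([2.0, 4.0, 5.0, 7.0, 9.0])   # made-up sample
Y = np.array([5.1, 8.9, 10.2, 14.3, 18.0])
n = X.size

x = X - X.mean()
beta2_hat = (x * (Y - Y.mean())).sum() / (x**2).sum()
beta1_hat = Y.mean() - beta2_hat * X.mean()

resid = Y - (beta1_hat + beta2_hat * X)    # residuals u_i-hat
sigma2_hat = (resid**2).sum() / (n - 2)    # unbiased estimator of sigma^2

se_b2 = np.sqrt(sigma2_hat / (x**2).sum())                       # se(beta2_hat)
se_b1 = np.sqrt(sigma2_hat * (X**2).sum() / (n * (x**2).sum()))  # se(beta1_hat)
print(se_b1, se_b2)
```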
The subsequent section, using the Gauss-Markov theorem, shows why the OLS estimator must be preferred over the naïve estimator even though both are unbiased.
Gauss-Markov Theorem

In the class of all linear unbiased estimators, OLS estimators have the minimum variance. Let us prove that $\hat\beta_2$ has minimum variance in the class of all linear unbiased estimators of $\beta_2$.
Recall that $\hat\beta_2 = \sum a_i Y_i$, where $a_i = x_i / \sum x_i^2$. Consider any other linear estimator of $\beta_2$, say

$\beta_2^* = \sum w_i Y_i$

for some fixed weights $w_i$. Substituting the PRF gives $E(\beta_2^*) = \beta_1 \sum w_i + \beta_2 \sum w_i X_i$, so $\beta_2^*$ is unbiased only if $\sum w_i = 0$ and $\sum w_i X_i = 1$. Note that these two conditions also imply $\sum w_i x_i = \sum w_i X_i - \bar{X} \sum w_i = 1$.

Let us compute the variance of $\beta_2^*$. Since the $Y_i$ have common variance $\sigma^2$ and are uncorrelated (A5 and A6),

$\text{var}(\beta_2^*) = \text{var}\left(\sum w_i Y_i\right) = \sigma^2 \sum w_i^2$

Now write $w_i = a_i + (w_i - a_i)$ and expand:

$\sum w_i^2 = \sum \left(w_i - \dfrac{x_i}{\sum x_i^2}\right)^2 + \dfrac{1}{\sum x_i^2}$

because the cross term vanishes:

$\sum (w_i - a_i) a_i = \dfrac{\sum w_i x_i}{\sum x_i^2} - \dfrac{\sum x_i^2}{\left(\sum x_i^2\right)^2} = \dfrac{1}{\sum x_i^2} - \dfrac{1}{\sum x_i^2} = 0$

Hence,

$\text{var}(\beta_2^*) = \sigma^2 \sum \left(w_i - \dfrac{x_i}{\sum x_i^2}\right)^2 + \dfrac{\sigma^2}{\sum x_i^2}$

If $w_i = a_i = \dfrac{x_i}{\sum x_i^2}$, then $\text{var}(\beta_2^*) = \dfrac{\sigma^2}{\sum x_i^2} = \text{var}(\hat\beta_2)$.

If $w_i \neq a_i$, then $\text{var}(\beta_2^*) > \dfrac{\sigma^2}{\sum x_i^2} = \text{var}(\hat\beta_2)$.

This proves that the OLS estimator $\hat\beta_2$ has minimum variance in the class of linear unbiased estimators.
Thus, by proving linearity, unbiasedness and minimum variance, we have proved the Gauss-Markov theorem, or that OLS estimators are BLUE (Best Linear Unbiased Estimators).
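The inequality at the heart of the proof can also be checked numerically. The sketch below (illustrative, with assumed values) builds an alternative weight vector $w_i$ that still satisfies $\sum w_i = 0$ and $\sum w_i X_i = 1$, so the resulting estimator is linear and unbiased, and confirms that $\sum w_i^2$ exceeds $\sum a_i^2$, i.e. its variance exceeds that of OLS.

```python
import numpy as np

X = np.linspace(1, 10, 20)
x = X - X.mean()
a = x / (x**2).sum()                       # OLS weights a_i

# Build a perturbation d with sum(d) = 0 and sum(d * x) = 0, so that
# w = a + c*d still satisfies the unbiasedness constraints for any c.
d = x**2 - (x**2).mean()                   # sums to zero by construction
d = d - x * (d * x).sum() / (x**2).sum()   # remove the component along x
w = a + 0.01 * d                           # alternative linear unbiased weights

print(w.sum(), (w * X).sum())              # ~0 and ~1: constraints hold
print((a**2).sum(), (w**2).sum())          # sum(w^2) > sum(a^2): larger variance
```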
Sampling distributions of OLS estimators

A8:- The error term is normally distributed: $u_i \sim N(0, \sigma^2)$.

The above assumption is added to the initial assumptions to arrive at the sampling distributions of the OLS estimators.
An explanation of A8 lies in the Central Limit Theorem (CLT), which can be stated as follows:
If there is a large number of independently and identically distributed random variables then, with a few exceptions, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely (Gujarati and Porter, Essentials of Econometrics).
Now, since the error term contains all those random variables that affect the dependent variable other than the explanatory variables included in the model, it can be thought of as the sum of all these random variables. By invoking the CLT, $u_i$ therefore follows a normal distribution.
Since any linear function of a normally distributed variable is itself normally distributed, $\hat\beta_1$ and $\hat\beta_2$ will also be normally distributed. We have already proved that $\hat\beta_1$ and $\hat\beta_2$ are linear functions of the error term $u_i$. Therefore,
if $u_i \sim N(0, \sigma^2)$, then

$\hat\beta_1 \sim N\left(\beta_1, \sigma^2_{\hat\beta_1}\right)$ and $\hat\beta_2 \sim N\left(\beta_2, \sigma^2_{\hat\beta_2}\right)$

Also, we know that

$\text{var}(\hat\beta_1) = \sigma^2_{\hat\beta_1} = \dfrac{\sigma^2 \sum X_i^2}{n \sum x_i^2}$ and $\text{var}(\hat\beta_2) = \sigma^2_{\hat\beta_2} = \dfrac{\sigma^2}{\sum x_i^2}$
Fig. 5: Sampling distributions of $\hat\beta_2$ and $\beta_2^*$.

Since $\hat\beta_2$ is unbiased, $E(\hat\beta_2) = \beta_2$. You can also see the sampling distribution of $\beta_2^*$: though it has the same mean, its variance is much higher than the variance of $\hat\beta_2$. This illustrates graphically that $\hat\beta_2$ is the more efficient of the two estimators.
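A simulation sketch (added for illustration, with assumed parameters) can confirm both moments of this sampling distribution: across repeated samples with normal errors, the mean of the $\hat\beta_2$ draws matches $\beta_2$ and their variance matches $\sigma^2/\sum x_i^2$; a histogram of the draws would trace out the bell shape of fig. 5.

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2, sigma = 2.0, 0.5, 1.0    # assumed parameters
X = np.linspace(1, 10, 20)
x = X - X.mean()

draws = []
for _ in range(10_000):
    Y = beta1 + beta2 * X + rng.normal(0, sigma, X.size)   # normal errors (A8)
    draws.append((x * (Y - Y.mean())).sum() / (x**2).sum())
draws = np.array(draws)

print(draws.mean(), beta2)                        # mean ~ beta2 (unbiased)
print(draws.var(), sigma**2 / (x**2).sum())       # variance ~ sigma^2 / sum(x_i^2)
```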
Summary
1. The assumptions of the classical linear regression model are linearity in parameters, non-stochastic regressors, zero conditional mean of the error term, homoscedasticity, no autocorrelation and normality of the error term.
2. Under the assumptions of the CLRM, it is shown that OLS estimators are unbiased
and their variances and standard errors are derived.
3. The Gauss-Markov theorem states that under the assumptions of the CLRM (A1-A7), OLS estimators are BLUE, which has been proved in the lesson.
4. Using the last assumption of the CLRM (A8), it is shown that the OLS estimators
are normally distributed.
Exercises
Solved Exercises:-
1. Prove that $\hat\beta_1$, the OLS estimator of the intercept, is unbiased.

Proof: Consider the formula for $\hat\beta_1$ from the normal equations: $\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}$. Averaging the PRF over the sample gives $\bar{Y} = \beta_1 + \beta_2 \bar{X} + \bar{u}$, so that

$\hat\beta_1 = \beta_1 + \beta_2 \bar{X} + \bar{u} - \hat\beta_2 \bar{X}$

Taking expectations and using $E(\bar{u}) = 0$ and $E(\hat\beta_2) = \beta_2$,

$E(\hat\beta_1) = \beta_1 + \beta_2 \bar{X} - \beta_2 \bar{X} = \beta_1$

Proved.
2. Derive the expression for $\text{var}(\hat\beta_1)$ as given in eq. (13) in the text.

Solution: To prove: $\text{var}(\hat\beta_1) = \dfrac{\sigma^2 \sum X_i^2}{n \sum x_i^2}$.

Proof: Note that, from the previous exercise and $\hat\beta_2 - \beta_2 = \sum a_i u_i$, we have $\hat\beta_1 - \beta_1 = \bar{u} - \bar{X} \sum a_i u_i$. Hence,

$\text{var}(\hat\beta_1) = E\left(\bar{u} - \bar{X} \sum a_i u_i\right)^2 = \dfrac{\sigma^2}{n} + \dfrac{\bar{X}^2 \sigma^2}{\sum x_i^2} - 0 = \sigma^2 \dfrac{\sum x_i^2 + n \bar{X}^2}{n \sum x_i^2} = \dfrac{\sigma^2 \sum X_i^2}{n \sum x_i^2}$

where the cross term is zero because $E\left(\bar{u} \sum a_i u_i\right) = \dfrac{\sigma^2}{n} \sum a_i = 0$, and $\sum x_i^2 + n \bar{X}^2 = \sum X_i^2$. Proved.
3. A researcher proposes to estimate the slope coefficient by the naïve estimator $b_2 = \dfrac{Y_n - Y_1}{X_n - X_1}$, obtained by joining the first and the last observations in the sample.

(a) Which estimator is better, the naïve estimator or the OLS estimator, if the researcher is concerned with the property of unbiasedness? Explain your answer.

(b) Find the variance of the naïve estimator, $b_2$. Does your answer remain the same if efficiency is also a desirable property of the estimator?
Solution:
(a) In order to answer this question, we need to find out whether the naïve estimator is an unbiased estimator of $\beta_2$ or not.

Proof: Substituting the PRF into the naïve estimator and taking the expectation, we obtain

$b_2 = \dfrac{Y_n - Y_1}{X_n - X_1} = \beta_2 + \dfrac{u_n - u_1}{X_n - X_1}$, so $E(b_2) = \beta_2$ since $E(u_1) = E(u_n) = 0$. Proved.

Thus, since both the naïve estimator and the OLS estimator are unbiased estimators of $\beta_2$, the researcher can go for either of them on this criterion.
(b) Now, $b_2 = \dfrac{Y_n - Y_1}{X_n - X_1}$ and $b_2 - \beta_2 = \dfrac{u_n - u_1}{X_n - X_1}$, so

$\text{var}(b_2) = E(b_2 - \beta_2)^2 = \dfrac{E(u_n - u_1)^2}{(X_n - X_1)^2} = \dfrac{E(u_n^2) + E(u_1^2) - 2E(u_1 u_n)}{(X_n - X_1)^2}$

By assumption, $E(u_i^2) = \sigma_u^2$ and $E(u_i u_j) = 0$ for $i \neq j$, so

$\text{var}(b_2) = \dfrac{2\sigma_u^2}{(X_n - X_1)^2}$
As far as precision is concerned, we know from the Gauss-Markov theorem that $\hat\beta_2$ (the OLS estimator) is BLUE, so it must be preferred.
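As a quick check of part (b), the simulation sketch below (illustrative values assumed) compares the empirical variance of $b_2$ over repeated samples with the analytic result $2\sigma_u^2/(X_n - X_1)^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
beta1, beta2, sigma = 2.0, 0.5, 1.0   # assumed parameters
X = np.linspace(1, 10, 20)

b2 = []
for _ in range(100_000):
    Y = beta1 + beta2 * X + rng.normal(0, sigma, X.size)
    b2.append((Y[-1] - Y[0]) / (X[-1] - X[0]))   # first and last observations

print(np.var(b2), 2 * sigma**2 / (X[-1] - X[0])**2)   # should agree closely
```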
(2) An estimate that will be equal to the true parameter in large samples.

(4) If repeated samples of the same size are taken, on average their value will be equal to the true parameter value.
6. Given the assumptions in column (1) of the table, show that the assumptions in column (2) are equivalent to them.

(1) $E(u_i \mid X_i) = 0$   |   (2) $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$
(1) $\text{cov}(u_i, u_j) = 0, \; i \neq j$   |   (2) $\text{cov}(Y_i, Y_j) = 0, \; i \neq j$
(1) $\text{var}(u_i \mid X_i) = \sigma^2$   |   (2) $\text{var}(Y_i \mid X_i) = \sigma^2$
Solution: Using the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$:

$E(Y_i \mid X_i) = \beta_1 + \beta_2 E(X_i \mid X_i) + E(u_i \mid X_i) = \beta_1 + \beta_2 X_i + 0 = \beta_1 + \beta_2 X_i$

Hence, proved.

Now,

$\text{cov}(Y_i, Y_j) = E\left\{\left[Y_i - E(Y_i)\right]\left[Y_j - E(Y_j)\right]\right\} = E(u_i u_j) = 0, \; i \neq j$

Hence, proved.

Finally, since $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$, we have $Y_i - E(Y_i \mid X_i) = u_i$, so

$\text{var}(Y_i \mid X_i) = E\left[\left(Y_i - E(Y_i \mid X_i)\right)^2 \mid X_i\right] = E(u_i^2 \mid X_i) = \sigma^2$

Hence, proved.
Practice questions:
1. State whether true or false.
2. Show that
3. Find the expression for the variance of the naïve estimator $b_2$ discussed in the text. Compare it with the variance of $\hat\beta_2$, the OLS estimator of $\beta_2$.
5. Prove that $\hat\beta_1$ (the OLS estimator of $\beta_1$) is BLUE, i.e. linear, unbiased and efficient.
REFERENCES
1. Dougherty, C., Introduction to Econometrics, 3rd edition, OUP.
2. Gujarati, D.N. and D.C. Porter, Essentials of Econometrics, 4th edition, McGraw Hill.