
Properties of Least Squares Estimators

Lesson: Properties of Least Squares


Estimators
Lesson Developer: Niti Khandelwal Garg;
Sonam
College/Department: KM College and
Hansraj College, University of Delhi


Table of Contents
Chapter: Properties of Least Squares Estimators

• Introduction
• Assumptions of the regression model
• Unbiasedness of the OLS estimators
• Variances of OLS estimators
• Gauss-Markov Theorem
• Sampling distributions of OLS estimators
• Summary
• Exercises
• Solved Problems
• Practice Questions
• References


Learning Outcomes
After reading this lesson, you should be able to:

(a) Understand the assumptions of the classical linear regression model.
(b) Understand the Gauss-Markov Theorem.
(c) Examine the properties of the OLS estimators.
(d) Derive the sampling distributions of the OLS estimators.


Introduction
We learnt in lesson 1 the method of ordinary least squares for estimating the population regression coefficients, the intercept and the slope coefficient. The method was quite simple. However, are these estimators reliable?

Are these the best estimators, or are better estimators available? In order to answer these questions, we need to know the properties of the least squares estimators, which in turn depend upon the assumptions of the classical linear regression model. The current lesson deals with the assumptions of the regression model, the properties of the least squares estimators, the Gauss-Markov Theorem and the sampling distributions of the estimators.

Assumptions of the classical linear regression model


(CLRM)
A1: The regression model is assumed to be linear in parameters, though not necessarily in the regressors. For instance, the regression model of the type given in eq. (1) below is linear in parameters, while the one given in eq. (2) below is not.

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (1)$$

$$Y_i = \beta_1 + \beta_2^2 X_i + u_i \qquad (2)$$

Basically, what linearity in parameters implies is that, as in eq. (1), we are estimating $\beta_1$ and $\beta_2$ themselves, and not a function of $\beta_1$ or $\beta_2$ (such as $\beta_2^2$) or a relationship between the two, as in eq. (2).

A2: The regression model is correctly specified.

It is assumed, further, that there is no specification bias in the regression model. For instance, while estimating the relationship between food expenditure and income, one knows that food expenditure increases at a decreasing rate as income increases; therefore a simple linear relationship between the two may not be correct, and we should include another variable, squared income, that captures the change in food expenditure at higher levels of income. It also implies that we have included all the possible variables that could affect the given dependent variable.

A3: The error term has a zero mean, given the values of $X$.

That is,

$$E(u_i \mid X_i) = 0 \qquad (3)$$

Recall that the error term incorporates all the variables that are not included as regressors but may affect $Y$. Thus, assumption A3 states that all these variables are not correlated with $X$ and, therefore, given the value of $X_i$, their mean value is zero.


Fig. 1

As shown in fig. 1 above, the mean of $u_i$ for each value of $X_i$ is zero.

A4: The variable $X$ is non-stochastic.

Assumption A4 states that the values of $X$ are fixed in a given sample and thus don't contain any stochastic or random component. This assumption also implies that the explanatory variable $X$ is uncorrelated with the error term, $u_i$.

That is,

$$\mathrm{Cov}(X_i, u_i) = 0 \qquad (4)$$

Note: Cov stands for covariance.

A5: It is assumed further that the disturbance term $u_i$ has a constant variance, denoted by $\sigma_u^2$; that is,

$$\mathrm{Var}(u_i) = \sigma_u^2 \qquad (5)$$

Thus, the variance of $u_i$ is independent of the observation, and we say that $u_i$ is homoscedastic. On the other hand, if the variance of the error term doesn't remain constant, then the error term is said to be heteroscedastic.

Figs. 2(a) and 2(b) show the cases of homoscedasticity and heteroscedasticity respectively:


Fig. 2

A6: The regression model also assumes that the error terms are not correlated amongst themselves. This is known as the assumption of no autocorrelation.

Algebraically, the assumption implies that

$$\mathrm{Cov}(u_i, u_j) = 0, \quad i \neq j \qquad (6)$$

What this assumption means is that no two error terms are systematically dependent or related. In other words, they are purely random.

Note that, by eq. (1), given that $\beta_1$ and $\beta_2$ are fixed constants and $X_i$ is non-stochastic, the variance of $Y_i$ will be the same as the variance of $u_i$. Also, eq. (6) and eq. (1) together imply that $\mathrm{Cov}(Y_i, Y_j) = 0$ for $i \neq j$. On the other hand, if this covariance is not zero, then the error terms are said to be autocorrelated. Fig. 3 below shows different patterns of autocorrelation between the error terms.


Fig. 3

(a) No Autocorrelation   (b) Positive Autocorrelation   (c) Negative Autocorrelation

A7: The variance of the regressor / independent variable should not be zero, i.e. the values of $X$ in a given sample should not all be identical.

To understand the importance of this assumption, recall from lesson 1 that

$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

This ratio will be undefined if all the $X_i$ are identical, since the denominator is then zero. Not only $\hat{\beta}_2$: $\hat{\beta}_1$ will also be indeterminate. More generally, if the independent variable has little variation, it will not be able to explain the variation in the dependent variable.
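To see assumption A7 at work numerically, here is a minimal Python sketch; the data values are made up purely for illustration:

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope: sum of products of deviations over sum of squared X-deviations."""
    xd = x - x.mean()
    denom = np.sum(xd ** 2)        # zero whenever every X_i is identical (A7 violated)
    if denom == 0:
        raise ValueError("A7 violated: the regressor has zero variance")
    return np.sum(xd * (y - y.mean())) / denom

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
print(ols_slope(x, y))             # a well-defined slope estimate

try:
    ols_slope(np.full(5, 3.0), y)  # all X_i identical
except ValueError as e:
    print(e)                       # the estimator is undefined
```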

Unbiasedness of the estimators


As you must have studied in your statistics class, one of the most desirable properties of any estimator is unbiasedness. Indeed, it can be shown that both the OLS estimators, the intercept as well as the slope, are unbiased estimators of their respective population parameters, $\beta_1$ and $\beta_2$.

We will prove that the slope coefficient $b_2$ is unbiased in the text and leave the proof for $b_1$ as an exercise (solved) at the end of the lesson.

Before we start the proof, let us recall the formula for the slope coefficient:

$$b_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

Using eq. (1) above in the expression for $b_2$, we obtain

$$b_2 = \beta_2 + \frac{\sum x_i u_i}{\sum x_i^2} \qquad (7)$$

Note that $\sum x_i \bar{u} = 0$, since $\bar{u}$ is a constant and $\sum x_i = \sum (X_i - \bar{X})$ equals zero.

Using notation from Dougherty (2008), let

$$a_i = \frac{x_i}{\sum x_i^2}$$

so that eq. (7) can be rewritten as

$$b_2 = \beta_2 + \sum a_i u_i \qquad (8)$$

In order to prove the unbiasedness of $b_2$, let us run the expectations operator through eq. (8) to obtain

$$E(b_2) = \beta_2 + \sum a_i E(u_i) = \beta_2 \qquad (9)$$

Note that $E(u_i) = 0$ by assumption A3.

Thus, the slope coefficient $b_2$ is an unbiased estimator of the population parameter $\beta_2$.

Similarly, it can be shown that $b_1$ is an unbiased estimator of $\beta_1$.
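The unbiasedness result lends itself to a quick simulation check. The following sketch, with illustrative values $\beta_1 = 2$ and $\beta_2 = 0.5$ (these numbers are not from the text), draws many samples from the PRF and averages the slope estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 2.0, 0.5          # illustrative population parameters
x = np.linspace(1, 10, 30)       # fixed regressor values (A4: non-stochastic X)

slopes = []
for _ in range(10_000):
    u = rng.normal(0, 1, size=x.size)   # errors with zero mean (A3) and constant variance (A5)
    y = beta1 + beta2 * x + u           # the PRF, eq. (1)
    xd = x - x.mean()
    slopes.append(np.sum(xd * (y - y.mean())) / np.sum(xd ** 2))

print(np.mean(slopes))           # close to 0.5: E(b2) = beta2
```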

Now, notice that the slope coefficient $b_2$ is not the only unbiased estimator of $\beta_2$. For instance, consider an estimator that is obtained by joining the 1st and 2nd observations in the sample, as shown in fig. 4 below, and then finding the slope of that line.


Fig. 4

That is, let

$$b_2' = \frac{Y_2 - Y_1}{X_2 - X_1} \qquad (10)$$

Let us examine whether this estimator is unbiased or not.

Recall eq. (1), that is, $Y_i = \beta_1 + \beta_2 X_i + u_i$ for all $i$, and note that

$$Y_1 = \beta_1 + \beta_2 X_1 + u_1 \quad \text{and} \quad Y_2 = \beta_1 + \beta_2 X_2 + u_2 \qquad (11)$$

Substituting for $Y_1$ and $Y_2$ from eq. (11) into eq. (10), we obtain

$$b_2' = \beta_2 + \frac{u_2 - u_1}{X_2 - X_1} \qquad (12)$$

Now, taking the expectations operator through eq. (12), we obtain

$$E(b_2') = \beta_2 + \frac{E(u_2) - E(u_1)}{X_2 - X_1}$$

Note that $E(u_1) = E(u_2) = 0$.

Thus, $E(b_2') = \beta_2$, and hence the naïve estimator is also an unbiased estimator of $\beta_2$. Also, the estimator is relatively simple to compute.

But is that sufficient? Do we stop here in our hunt for estimators?

The answer is no because, as you will notice, this estimator uses only two observations to compute an estimate of $\beta_2$ and wastes the rest of them.

Therefore, this naïve estimator will be highly sensitive to the values of the error term for the first two observations. On the other hand, the OLS estimator makes use of all the observations in the sample and has the advantage that many of the error terms may cancel each other out and not affect the regression much.


Indeed, it can be shown that the naïve estimator has a much higher variance than the OLS estimator and is therefore less precise, which is what we discuss in the next section.
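As a rough illustration of this claim (the same made-up design as the previous sketch), the following lines estimate the sampling variances of the naïve two-point estimator and of OLS side by side:

```python
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2 = 2.0, 0.5
x = np.linspace(1, 10, 30)
xd = x - x.mean()

ols, naive = [], []
for _ in range(10_000):
    u = rng.normal(0, 1, size=x.size)
    y = beta1 + beta2 * x + u
    ols.append(np.sum(xd * (y - y.mean())) / np.sum(xd ** 2))
    naive.append((y[1] - y[0]) / (x[1] - x[0]))   # slope through the first two points only

print(np.var(ols), np.var(naive))   # both centred on 0.5, but the naive variance is far larger
```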

Variances of the OLS estimators


In this section, we shall discuss the formulae for the variances of the OLS estimators. The population variances of $b_1$ and $b_2$, denoted by $\sigma_{b_1}^2$ and $\sigma_{b_2}^2$, are given by the following expressions:

$$\sigma_{b_1}^2 = \frac{\sigma_u^2 \sum X_i^2}{n \sum x_i^2} \quad \text{and} \quad \sigma_{b_2}^2 = \frac{\sigma_u^2}{\sum x_i^2} \qquad (13)$$

We will derive the formula for $\sigma_{b_2}^2$ now and leave the derivation of the formula for $\sigma_{b_1}^2$ as a solved exercise question.

We know that, for a random variable $X$,

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$

Hence,

$$\mathrm{Var}(b_2) = E\left[(b_2 - \beta_2)^2\right] \quad \text{[since } E(b_2) = \beta_2 \text{ from eq. (9)]}$$

$$= E\left[\left(\sum a_i u_i\right)^2\right] \quad \text{[substituting for } b_2 - \beta_2 \text{ from eq. (8)]}$$

$$= \sum a_i^2 E(u_i^2) = \sigma_u^2 \sum a_i^2 = \frac{\sigma_u^2}{\sum x_i^2} \qquad (14)$$

where the cross terms vanish because $E(u_i u_j) = 0$ for $i \neq j$, and $\sum a_i^2 = \sum x_i^2 / \left(\sum x_i^2\right)^2 = 1/\sum x_i^2$.

Thus, as eq. (14) states, $\sigma_{b_2}^2$ is directly proportional to the variance of the error term and inversely proportional to the sum of squared deviations of $X_i$ about the mean. That is to say, the higher the variance of the error term, the less efficient $b_2$ will be (the greater the variance of $b_2$), and the lower the variance of the error term, the more precise $b_2$ will be.

As discussed earlier, the expressions

$$\sigma_{b_1}^2 = \frac{\sigma_u^2 \sum X_i^2}{n \sum x_i^2} \quad \text{and} \quad \sigma_{b_2}^2 = \frac{\sigma_u^2}{\sum x_i^2}$$

Institute of Life Long Learning 11


Properties of Least Squares Estimators

were the population variances of the least squares estimators $b_1$ and $b_2$ respectively. In reality, however, one has to work with just a sample of the population.

Hence, we don't have $\sigma_u^2$ known to us, and therefore we need a sample counterpart of it. One alternative is simply the variance of the residuals, i.e. $\sum \hat{u}_i^2 / n$.

However, it can be shown that the above measure is not a preferable estimator of $\sigma_u^2$, as it is biased. More precisely, it can be shown that an unbiased estimator of $\sigma_u^2$ is $\sum \hat{u}_i^2 / (n-2)$, denoted by $\hat{\sigma}_u^2$.

Hence, from now on, we will replace $\sigma_u^2$ by $\hat{\sigma}_u^2$ everywhere.

Further, by taking the square roots of the population variances of $b_1$ and $b_2$, one can obtain their standard deviations, popularly known as the standard errors of the regression coefficients, which are abbreviated to s.e. in the literature.

Thus, the standard errors of $b_1$ and $b_2$ are expressed as follows:

$$\mathrm{s.e.}(b_1) = \hat{\sigma}_u \sqrt{\frac{\sum X_i^2}{n \sum x_i^2}} \quad \text{and} \quad \mathrm{s.e.}(b_2) = \frac{\hat{\sigma}_u}{\sqrt{\sum x_i^2}}$$
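Putting these formulae to work, here is a minimal sketch with invented data that computes $\hat{\sigma}_u^2$ with the $n-2$ correction and then the two standard errors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 3.8, 5.2, 5.9, 7.1])
n = x.size

xd = x - x.mean()
b2 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b1 = y.mean() - b2 * x.mean()

resid = y - (b1 + b2 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)        # unbiased estimator of the error variance

se_b2 = np.sqrt(sigma2_hat / np.sum(xd ** 2))
se_b1 = np.sqrt(sigma2_hat * np.sum(x ** 2) / (n * np.sum(xd ** 2)))
print(b1, b2, se_b1, se_b2)
```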

The subsequent section, using the Gauss-Markov Theorem, shows why the OLS estimators must be preferred over naïve estimators even though both are unbiased.

The Gauss Markov Theorem


Given the assumptions of the classical linear regression model, the OLS estimators are BLUE (best linear unbiased estimators), i.e. they have the minimum variance in the class of linear unbiased estimators. An estimator is said to be BLUE if the following properties hold:

(1) LINEARITY: An estimator is said to be linear if it is a linear function of the sample observations.

Let us prove that the OLS estimators are linear:


$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad \text{and} \quad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$$

Consider the numerator:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i - \bar{Y} \sum (X_i - \bar{X})$$

Since $\sum X_i = n\bar{X}$, we have $\sum (X_i - \bar{X}) = 0$, and we get

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i$$

Let

$$a_i = \frac{x_i}{\sum x_i^2}$$

Then

$$\hat{\beta}_2 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \sum a_i Y_i$$

So this proves that $\hat{\beta}_2$ is a linear function of the $Y_i$.

Now, $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = \sum \left(\tfrac{1}{n} - a_i \bar{X}\right) Y_i$; therefore $\hat{\beta}_1$ is a linear function too.

(2) UNBIASEDNESS: An estimator is said to be unbiased if the mean value of the estimator in repeated sampling is equal to the true value of the parameter. Formally, an estimator $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$. Let us prove that $\hat{\beta}_2$ is unbiased.

$$\hat{\beta}_2 = \sum a_i Y_i$$

We know the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$.

So,

$$\hat{\beta}_2 = \sum a_i (\beta_1 + \beta_2 X_i + u_i) = \beta_1 \sum a_i + \beta_2 \sum a_i X_i + \sum a_i u_i$$

Given $a_i = \dfrac{x_i}{\sum x_i^2}$:

$$\sum a_i = \frac{\sum x_i}{\sum x_i^2} = 0$$

(since the numerator, the sum of deviations of a variable from its mean, is equal to zero), and

$$\sum a_i X_i = \frac{\sum x_i X_i}{\sum x_i^2} = \frac{\sum x_i (x_i + \bar{X})}{\sum x_i^2} = \frac{\sum x_i^2 + \bar{X} \sum x_i}{\sum x_i^2} = 1$$

Therefore,

$$\hat{\beta}_2 = \beta_1 \cdot 0 + \beta_2 \cdot 1 + \sum a_i u_i = \beta_2 + \sum a_i u_i$$

$$E(\hat{\beta}_2) = \beta_2 + \sum a_i E(u_i) = \beta_2 \quad \text{(since } E(u_i) = 0 \text{ by assumption of the CLRM)}$$

Similarly, it can be proved that $\hat{\beta}_1$ is also an unbiased estimator of $\beta_1$.

(3) EFFICIENCY: An unbiased estimator with minimum variance is called an efficient estimator.

In the class of all linear unbiased estimators, the OLS estimators have the minimum variance.

Let us prove that $\hat{\beta}_2$ has minimum variance in the class of all linear unbiased estimators of $\beta_2$.

Consider $\hat{\beta}_2 = \sum a_i Y_i$, where $a_i = \dfrac{x_i}{\sum x_i^2}$.

Now define an alternative linear estimator of $\beta_2$ as $\beta_2^* = \sum w_i Y_i$, where $w_i$ is not necessarily equal to $a_i$.

Taking expectations,

$$E(\beta_2^*) = \sum w_i E(Y_i) = \sum w_i (\beta_1 + \beta_2 X_i) = \beta_1 \sum w_i + \beta_2 \sum w_i X_i$$

For $\beta_2^*$ to be unbiased, we should have $\sum w_i = 0$ and $\sum w_i X_i = 1$.

Let us compute the variance of $\beta_2^*$: $\mathrm{var}(\beta_2^*) = \mathrm{var}\left(\sum w_i Y_i\right)$.

Since $\mathrm{var}(Y_i) = \mathrm{var}(u_i) = \sigma^2$,

$$\mathrm{var}(\beta_2^*) = \sigma^2 \sum w_i^2$$

Now write $w_i = \left(w_i - \dfrac{x_i}{\sum x_i^2}\right) + \dfrac{x_i}{\sum x_i^2}$ and expand:

$$\sum w_i^2 = \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \frac{1}{\sum x_i^2} + 2\sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)\frac{x_i}{\sum x_i^2}$$

The cross term vanishes, because $\sum w_i x_i = \sum w_i X_i - \bar{X}\sum w_i = 1$ and $\sum a_i x_i = \sum x_i^2 / \sum x_i^2 = 1$, so

$$\mathrm{var}(\beta_2^*) = \sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \frac{\sigma^2}{\sum x_i^2}$$

If $w_i = a_i = \dfrac{x_i}{\sum x_i^2}$, then $\mathrm{var}(\beta_2^*) = \dfrac{\sigma^2}{\sum x_i^2} = \mathrm{var}(\hat{\beta}_2)$.

If $w_i \neq a_i$, then $\mathrm{var}(\beta_2^*) > \dfrac{\sigma^2}{\sum x_i^2}$, or we can say $\mathrm{var}(\beta_2^*) > \mathrm{var}(\hat{\beta}_2)$, since the first term in the expression is positive.

So, $\beta_2^*$ will have minimum variance only if $w_i = a_i$; otherwise $\mathrm{var}(\beta_2^*) > \mathrm{var}(\hat{\beta}_2)$. This proves that the OLS estimator $\hat{\beta}_2$ has minimum variance in the class of linear unbiased estimators.

Similarly, it can be proved that $\hat{\beta}_1$ has least variance.

Thus, by proving (1), (2) and (3), we have proved the Gauss-Markov Theorem, i.e. that the OLS estimators are BLUE.
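As a numerical companion to this proof (the alternative weights below are arbitrary, chosen only to satisfy the two unbiasedness constraints), the sketch compares the variance implied by the OLS weights $a_i$ with that of an alternative weight vector $w_i$:

```python
import numpy as np

x = np.linspace(1, 10, 30)
xd = x - x.mean()
a = xd / np.sum(xd ** 2)                 # OLS weights a_i

# An arbitrary alternative: perturb the OLS weights, then re-impose the
# unbiasedness constraints sum(w) = 0 and sum(w * x) = 1.
rng = np.random.default_rng(2)
w = a + rng.normal(0, 0.01, size=x.size)
w -= w.mean()                            # enforce sum(w) = 0
w /= np.sum(w * x)                       # enforce sum(w * x) = 1 (scaling keeps sum(w) = 0)

sigma2 = 1.0                             # error variance; any positive value works
print(sigma2 * np.sum(a ** 2))           # var of OLS estimator: sigma^2 / sum(xd^2)
print(sigma2 * np.sum(w ** 2))           # always at least as large, per Gauss-Markov
```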

Sampling distribution of OLS estimators


A8: The error term $u_i$ in the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$ follows a normal distribution with mean zero and variance $\sigma^2$, i.e.

$$u_i \sim N(0, \sigma^2)$$

The above assumption is added to the initial assumptions to arrive at the sampling distribution of the OLS estimators.


An explanation of A8 lies in the Central Limit Theorem (CLT), which can be stated as follows:

If there is a large number of independently and identically distributed random variables then, with a few exceptions, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely (Gujarati and Porter: Essentials of Econometrics).

Now, since the error term contains all those random variables which affect the dependent variable other than the explanatory variables included in the model, it can be thought of as the sum of all these random variables. By invoking the CLT, $u_i$ therefore follows a normal distribution with mean zero and variance $\sigma^2$.

Since any linear function of a normally distributed variable is itself normally distributed, $\hat{\beta}_1$ and $\hat{\beta}_2$ will also be normally distributed. We have already proved that $\hat{\beta}_1$ and $\hat{\beta}_2$ are linear functions of the error term $u_i$. Therefore,

If i ~ N  0, 2  then 
ˆ 1 ~ N 1 , 2ˆ
1
 and ˆ 2  
~ N 2 , 2ˆ
2

We have already derived that  


E ˆ 1  1 and E  
ˆ 
2 2

X 2
2
   
i
Also we know that, var ˆ 1  2ˆ   2
and var ˆ 2  2ˆ 
n x n xi2
1 2
2
i


Fig. 5: Sampling distributions of $\hat{\beta}_2$ and $\beta_2^*$.

In fig. 5, you can see the sampling (or probability) distribution of $\hat{\beta}_2$. Since $\hat{\beta}_2$ is unbiased, $E(\hat{\beta}_2) = \beta_2$. You can also see the sampling distribution of $\beta_2^*$: though it has the same mean, its variance is much higher than the variance of $\hat{\beta}_2$. This happens because, as we have discussed, $\hat{\beta}_2$ is BLUE.
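A quick way to visualise this sampling distribution is to simulate it. The sketch below, under the same made-up design as the earlier sketches, draws repeated samples with normal errors and compares the simulated mean and variance of $\hat{\beta}_2$ with the theoretical values:

```python
import numpy as np

rng = np.random.default_rng(3)
beta1, beta2, sigma = 2.0, 0.5, 1.0
x = np.linspace(1, 10, 30)
xd = x - x.mean()

b2_draws = []
for _ in range(10_000):
    u = rng.normal(0, sigma, size=x.size)    # A8: normally distributed errors
    y = beta1 + beta2 * x + u
    b2_draws.append(np.sum(xd * (y - y.mean())) / np.sum(xd ** 2))

print(np.mean(b2_draws), beta2)                        # simulated mean vs true beta2
print(np.var(b2_draws), sigma**2 / np.sum(xd ** 2))    # simulated vs theoretical variance
# A histogram of b2_draws would trace out the bell shape of fig. 5.
```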

Summary
1. The assumptions of the classical linear regression model are linearity, non-stochastic regressors, no autocorrelation, homoscedasticity, zero conditional mean of the error term and normality of the error term.

2. Under the assumptions of the CLRM, it is shown that OLS estimators are unbiased
and their variances and standard errors are derived.

3. The Gauss-Markov Theorem states that under the assumptions of the CLRM (A1-A7), the OLS estimators are BLUE, which has been proved in the lesson.

4. Using the last assumption of the CLRM (A8), it is shown that the OLS estimators
are normally distributed.

Exercises
Solved Exercises:-

1. Show that $b_1$, the OLS estimator of the intercept, is an unbiased estimator of $\beta_1$.

Proof: Consider the formula for $b_1$ from the normal equations:

$$b_1 = \bar{Y} - b_2 \bar{X}$$

Averaging the PRF, eq. (1), over the sample gives $\bar{Y} = \beta_1 + \beta_2 \bar{X} + \bar{u}$, so that

$$b_1 = \beta_1 + \beta_2 \bar{X} + \bar{u} - b_2 \bar{X}$$

Running the expectations operator throughout, we obtain

$$E(b_1) = \beta_1 + \beta_2 \bar{X} + E(\bar{u}) - E(b_2)\bar{X} = \beta_1 + \beta_2 \bar{X} + 0 - \beta_2 \bar{X} = \beta_1$$

Proved.

2. Derive the expression for $\sigma^2_{b_1}$ as given in eq. (13) in the text.

Solution: To prove: $\mathrm{var}(b_1) = \dfrac{\sigma_u^2 \sum X_i^2}{n \sum x_i^2}$.

Proof: Note that $b_1 = \bar{Y} - b_2 \bar{X} = \beta_1 + \beta_2 \bar{X} + \bar{u} - b_2 \bar{X}$, so that

$$b_1 - \beta_1 = \bar{u} - (b_2 - \beta_2)\bar{X}$$

Hence,

$$\mathrm{var}(b_1) = E\left[(b_1 - \beta_1)^2\right] = E(\bar{u}^2) + \bar{X}^2 E\left[(b_2 - \beta_2)^2\right] - 2\bar{X} E\left[\bar{u}(b_2 - \beta_2)\right]$$

Now $E(\bar{u}^2) = \sigma_u^2/n$, $E[(b_2 - \beta_2)^2] = \sigma_u^2/\sum x_i^2$ from eq. (14), and the cross term is $E\left[\bar{u} \sum a_i u_i\right] = \frac{\sigma_u^2}{n}\sum a_i = 0$, since $\sum a_i = 0$. Therefore,

$$\mathrm{var}(b_1) = \frac{\sigma_u^2}{n} + \frac{\sigma_u^2 \bar{X}^2}{\sum x_i^2} = \sigma_u^2 \, \frac{\sum x_i^2 + n\bar{X}^2}{n\sum x_i^2} = \frac{\sigma_u^2 \sum X_i^2}{n \sum x_i^2}$$

Proved.

3. Suppose a researcher used a naïve estimator, $b_2^* = \dfrac{Y_n - Y_1}{X_n - X_1}$, as the estimator of the slope coefficient on a dataset on two variables $X$ and $Y$, instead of the OLS estimator $b_2$.

(a) Which estimator is better, the naïve estimator or the OLS estimator, if the researcher is concerned with the property of unbiasedness? Explain your answer.

(b) Find the variance of the naïve estimator $b_2^*$. Does your answer remain the same if efficiency is also a desirable property of the estimator?

Solution:

(a) In order to answer this question, we need to find out whether the naïve estimator is an unbiased estimator of $\beta_2$ or not.

Proof: From the PRF, $Y_n - Y_1 = \beta_2 (X_n - X_1) + (u_n - u_1)$, so taking the expectation of the naïve estimator we obtain:

$$E(b_2^*) = \beta_2 + \frac{E(u_n) - E(u_1)}{X_n - X_1} = \beta_2$$

Proved.

Thus, since both the naïve estimator and the OLS estimator are unbiased estimators of $\beta_2$, the researcher can go for either of them on this criterion.

Y Y E  b2   2 var  b2   E  b2  2 
2
(b) Now, b2  n 1 and , so
X n  X1

 u  u   
2


 E  2  n 1   2 

  X n  X1   

1
 X n  X1 
2     
E un2  E u12  2 E  u1 un  
By assumption: E  ui2   u2 and E  ui u j   0 where i  j

1 2u2
var  b2       0 
2 2

 X n  X1   X n  X1 
2 u u 2

As far as precision is concerned, we know from Gauss Markov Theorem that b2 (OLS
estimator) is BLUE. So that must be preferred.
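Under an invented design (the parameter values below are illustrative only), the following lines check the $2\sigma_u^2/(X_n - X_1)^2$ formula by simulation:

```python
import numpy as np

rng = np.random.default_rng(4)
beta1, beta2, sigma = 2.0, 0.5, 1.0
x = np.linspace(1, 10, 30)

draws = []
for _ in range(10_000):
    u = rng.normal(0, sigma, size=x.size)
    y = beta1 + beta2 * x + u
    draws.append((y[-1] - y[0]) / (x[-1] - x[0]))   # endpoint estimator b2*

print(np.var(draws))                                # simulated variance
print(2 * sigma**2 / (x[-1] - x[0])**2)             # theoretical: 2*sigma^2/(X_n - X_1)^2
```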

4. Which of these is not a classical assumption?

(1) The values taken by the dependent variable are fixed in repeated sampling

(2) Regression model is linear in parameters

(3) Error term has zero mean

(4) Error term has constant variance.

Sol. Answer is point (1)

5. An unbiased slope coefficient estimate is:

(1) An estimate that is always equal to the true parameter value

(2) An estimate that will be equal to the true parameter in large samples

(3) An estimate for which the mean of the sampling distribution of the slope parameter is zero

(4) An estimate such that, if repeated samples of the same size are taken, on average its value will be equal to the true parameter value

Sol. Answer is point (4)

6. Given the assumptions in column (1) of the table below, show that the assumptions in column (2) are equivalent to them.

Assumptions of the Classical Model

(1)                                          (2)
$E(u_i \mid X_i) = 0$                        $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$
$\mathrm{cov}(u_i, u_j) = 0, \; i \neq j$    $\mathrm{cov}(Y_i, Y_j) = 0, \; i \neq j$
$\mathrm{var}(u_i \mid X_i) = \sigma^2$      $\mathrm{var}(Y_i \mid X_i) = \sigma^2$

Sol. (1) To prove: $E(u_i \mid X_i) = 0 \;\Rightarrow\; E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$.

If $E(u_i \mid X_i) = 0$, then running the expectations operator through the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$, we get:

$$E(Y_i \mid X_i) = \beta_1 + \beta_2 E(X_i \mid X_i) + E(u_i \mid X_i) = \beta_1 + \beta_2 X_i + 0 = \beta_1 + \beta_2 X_i$$

Hence, proved.

(2) Consider the PRF, $Y_i = \beta_1 + \beta_2 X_i + u_i$.

$E(Y_i) = \beta_1 + \beta_2 X_i$, and so $Y_i - E(Y_i) = u_i$.

Now,

$$\mathrm{cov}(Y_i, Y_j) = E\left[\left(Y_i - E(Y_i)\right)\left(Y_j - E(Y_j)\right)\right] = E(u_i u_j) = 0$$

The last step follows from the assumption that $\mathrm{cov}(u_i, u_j) = 0$ for $i \neq j$.

Hence, proved.

(3) By applying the formula for a conditional variance, we get:

$$\mathrm{var}(Y_i \mid X_i) = E\left[\left(Y_i - E(Y_i \mid X_i)\right)^2 \mid X_i\right]$$

From the PRF, $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i + E(u_i \mid X_i) = \beta_1 + \beta_2 X_i$, so $Y_i - E(Y_i \mid X_i) = u_i$.

Now,

$$\mathrm{var}(Y_i \mid X_i) = E(u_i^2 \mid X_i) = \sigma^2$$

Hence, proved.

Practice questions:
1. State whether true or false.

(a) The OLS estimator $b_2$ is the only estimator of $\beta_2$ that is unbiased.

(b) The assumption of the regressors being non-stochastic is equivalent to saying that the regressors are constant, i.e. their variance is zero.

2. Show that


3. Find out the expression for the variance of the naïve estimator $b_2' = \dfrac{Y_2 - Y_1}{X_2 - X_1}$ discussed in the text. Compare it with the variance of $b_2$, the OLS estimator of $\beta_2$.

4. You propose to study the relationship $PT_i = \alpha + \beta \, RD_i + \mu_i$, where $PT$ is the number of patent applications filed during a given period and $RD$ is the expenditure on research and development as a ratio of gross domestic product. The error term $\mu$ satisfies all the assumptions made by the classical linear regression model. What are the properties of the least squares estimator $\hat{\beta}$? Prove any two of these properties.

5. Prove that $\hat{\beta}_1$ (the OLS estimator of $\beta_1$) is BLUE, i.e. linear, unbiased and efficient.

6. Given the following hypothetical data on 20 pairs of observations on Y and X:


S. No. Y X
1 1.1 11
2 1.5 10.4
3 1.9 10.2
4 2.1 9.9
5 2.5 9.6
6 2.7 9.4
7 3.5 9.1
8 3.9 8.5
9 4.2 8.1
10 4.8 7.5
11 5.3 7.1
12 5.7 6.7
13 6.1 6.2
14 6.5 5.8
15 7.0 5.3
16 7.4 4.9
17 8.1 4.7
18 8.2 3.5
19 8.6 3.3
20 8.9 3

(a) Obtain $b_1$ and $b_2$.

(b) Obtain the standard errors of $b_1$ and $b_2$.
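If you wish to check your hand computations for this question, a sketch along the following lines, using the formulae from the text, reproduces the estimates and standard errors (the printed numbers are left for you to verify):

```python
import numpy as np

y = np.array([1.1, 1.5, 1.9, 2.1, 2.5, 2.7, 3.5, 3.9, 4.2, 4.8,
              5.3, 5.7, 6.1, 6.5, 7.0, 7.4, 8.1, 8.2, 8.6, 8.9])
x = np.array([11, 10.4, 10.2, 9.9, 9.6, 9.4, 9.1, 8.5, 8.1, 7.5,
              7.1, 6.7, 6.2, 5.8, 5.3, 4.9, 4.7, 3.5, 3.3, 3.0])
n = x.size

xd = x - x.mean()
b2 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b1 = y.mean() - b2 * x.mean()

resid = y - (b1 + b2 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)    # unbiased estimator of the error variance
se_b2 = np.sqrt(sigma2_hat / np.sum(xd ** 2))
se_b1 = np.sqrt(sigma2_hat * np.sum(x ** 2) / (n * np.sum(xd ** 2)))
print(b1, b2, se_b1, se_b2)
```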

REFERENCES
1. Dougherty, C., Introduction to Econometrics, 3rd edition, Oxford University Press.
2. Gujarati, D. N. and D. C. Porter, Essentials of Econometrics, 4th edition, McGraw-Hill.
