Figure 2.2. The scatter diagram
We can express the population model for each observation of the sample:

$$ y_i = \beta_0 + \beta_1 x_i + u_i, \qquad i = 1, 2, \ldots, n \tag{6} $$
In figure 2.3 the population regression function and the scatter diagram are put together, but it is important to keep in mind that $\beta_0$ and $\beta_1$ are fixed, but unknown.
According to the model it is possible, from a theoretical point of view, to make the following decomposition:

$$ y_i = E(y|x_i) + u_i, \qquad i = 1, 2, \ldots, n \tag{7} $$
which has been represented in figure 2.3 for observation $i$. However, from an empirical point of view, it is not possible to do so, because $\beta_0$ and $\beta_1$ are unknown parameters and $u_i$ is not observable.
Figure 2.3. The population regression function and the scatter diagram
Sample regression function
The basic idea of regression is to estimate the population parameters, $\beta_0$ and $\beta_1$, from a given sample.
The sample regression function (SRF) is the sample counterpart of the
population regression function (PRF). Since the SRF is obtained for a given sample, a
new sample will generate different estimates.
The SRF, which is an estimation of the PRF, is given by

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \tag{8} $$

and allows us to calculate the fitted value ($\hat{y}_i$) for $y$ when $x = x_i$. In the SRF, $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimators of the parameters $\beta_0$ and $\beta_1$.
For each $x_i$ we have an observed value ($y_i$) and a fitted value ($\hat{y}_i$).
We call the difference between $y_i$ and $\hat{y}_i$ the residual ($\hat{u}_i$). That is,

$$ \hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \tag{9} $$
In other words, the residual $\hat{u}_i$ is the difference between the sample point and the fitted line, as can be seen in figure 2.4. In this case, it is possible to calculate the decomposition

$$ y_i = \hat{y}_i + \hat{u}_i $$

for a given sample.
Figure 2.4. The sample regression function and the scatter diagram
To sum up, $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{y}_i$ and $\hat{u}_i$ are the sample counterparts of $\beta_0$, $\beta_1$, $E(y|x)$ and $u_i$, respectively.

Criterion 1

It is possible to choose $\hat{\beta}_0$ and $\hat{\beta}_1$ so that the sum of all the residuals is as near to zero as possible. With this criterion, the expression to minimize would be the following:
$$ \text{Min. } \sum_{i=1}^{n} \hat{u}_i \tag{10} $$
The main problem of this method of estimation is that residuals of different sign can compensate each other. Such a situation can be observed graphically in figure 2.5, in which three aligned observations are graphed: $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$. In this case the following holds:

$$ \frac{y_2 - y_1}{x_2 - x_1} = \frac{y_3 - y_1}{x_3 - x_1} $$
Figure 2.5. The problems of criterion 1.
If a straight line is fitted so that it crosses through the three points, each one of the residuals takes the value zero, so that

$$ \sum_{i=1}^{n} \hat{u}_i = 0 $$

This fit could be considered optimal. But $\sum_{i=1}^{3} \hat{u}_i = 0$ is also obtained for any other straight line passing through the point $(\bar{x}, \bar{y})$, because positive and negative residuals cancel out. This simple example shows us that this criterion is not appropriate for the estimation of the parameters because, for any set of observations, infinitely many straight lines exist that satisfy it.
Criterion 2

A way to avoid the compensation of positive residuals with negative ones consists of taking the absolute values of the residuals. In this case, the following expression would be minimized:

$$ \text{Min. } \sum_{i=1}^{n} \left| \hat{u}_i \right| \tag{11} $$

Unfortunately, although the estimators thus obtained have some interesting properties, their calculation is complicated, requiring the solution of a linear programming problem or the application of an iterative calculation procedure.
Criterion 3

A third method consists of minimizing the sum of the squared residuals, that is to say,

$$ \text{Min. } S = \text{Min. } \sum_{i=1}^{n} \hat{u}_i^2 \tag{12} $$
The estimators obtained are called least squares estimators, and they enjoy certain desirable statistical properties, which we will study later. On the one hand, as opposed to the first of the examined criteria, squaring the residuals avoids their compensation; on the other hand, unlike the second criterion, the least squares estimators are simple to obtain. It is important to note that, by squaring the residuals, we penalize large residuals more than proportionally with respect to small ones (if a residual is double another one, its square will be four times greater), which also distinguishes least squares estimation from other possible methods.
Application of the least squares criterion

Now we are going to look at the process of obtaining the least squares estimators. The objective is to minimize the sum of squared residuals ($S$). To do this, in the first place we express $S$ as a function of the estimators, using (9). Therefore, we must solve

$$ \min_{\hat{\beta}_0, \hat{\beta}_1} S = \min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2 \tag{13} $$
To minimize $S$, we differentiate partially with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$:

$$ \frac{\partial S}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) $$

$$ \frac{\partial S}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) x_i $$
The least squares estimators are obtained by setting the previous derivatives equal to 0:

$$ \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \tag{14} $$

$$ \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) x_i = 0 \tag{15} $$

The equations (14) and (15) are called normal equations or LS first order conditions.
In operations with summations, the following rules must be taken into account:

$$ \sum_{i=1}^{n} a = na \qquad \sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i \qquad \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i $$
Operating with the normal equations, we have

$$ \sum_{i=1}^{n} y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i \tag{16} $$

$$ \sum_{i=1}^{n} y_i x_i = \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 \tag{17} $$
Dividing both sides of (16) by $n$, we have

$$ \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x} \tag{18} $$

Therefore,

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{19} $$
Substituting this value of $\hat{\beta}_0$ into (17) and solving for $\hat{\beta}_1$, we have

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i} \tag{20} $$
Taking into account that

$$ \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i - \bar{x} \sum_{i=1}^{n} y_i + n \bar{y} \bar{x} = \sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i $$

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2 \bar{x} \sum_{i=1}^{n} x_i + n \bar{x}^2 = \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i $$
then (20) can be expressed in the following way:

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{21} $$
Once $\hat{\beta}_1$ has been calculated, $\hat{\beta}_0$ is obtained by using (19).

These are the least squares (LS) estimators. Since other, more complicated methods that are also called least squares exist, the method we have applied is denominated ordinary least squares (OLS), due to its simplicity.
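As a numerical illustration, the following sketch computes the OLS estimates directly from formulas (19) and (21). The five data points are hypothetical, chosen only for the example.

```python
# A minimal sketch of the OLS formulas (19) and (21) using numpy.
# The data are hypothetical, for illustration only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope: cross deviations over squared deviations of x, eq. (21)
beta1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept from eq. (19)
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta0_hat, beta1_hat)
```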
Algebraic properties of the OLS estimation

In the preceding epigraphs $\hat{\beta}_0$ and $\hat{\beta}_1$ have been obtained by minimizing the sum of squared residuals. These estimators satisfy some algebraic properties, which derive directly from the normal equations. Dividing the normal equations (14) and (15) by $n$, we have

$$ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \tag{24} $$

$$ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) x_i = 0 \tag{25} $$

From the two previous equations, the following properties of the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are obtained.

1. The sum of the OLS residuals is equal to zero:

$$ \sum_{i=1}^{n} \hat{u}_i = 0 \tag{26} $$

This follows from (24), since $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$. Because $\hat{u}_i = y_i - \hat{y}_i$, (26) also implies

$$ \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i \tag{29} $$

and, dividing (26) and (29) by $n$, we obtain

$$ \bar{\hat{u}} = 0, \qquad \bar{\hat{y}} = \bar{y} \tag{30} $$
2. The OLS line always goes through the mean of the sample, $(\bar{x}, \bar{y})$.

Effectively, dividing the equation (16) by $n$, we have

$$ \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x} \tag{31} $$
3. The sample covariance between the regressor and the OLS residuals is zero.

The sample covariance between the regressor and the OLS residuals is equal to

$$ S_{x\hat{u}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\hat{u}_i - \bar{\hat{u}})}{n} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \hat{u}_i}{n} = \frac{\sum_{i=1}^{n} x_i \hat{u}_i - \bar{x} \sum_{i=1}^{n} \hat{u}_i}{n} = \frac{\sum_{i=1}^{n} x_i \hat{u}_i}{n} $$
Therefore, if

$$ \sum_{i=1}^{n} x_i \hat{u}_i = 0 \tag{32} $$

then the sample covariance between the regressor and the OLS residuals is equal to 0. It can be seen that (32) is precisely the second normal equation,

$$ \sum_{i=1}^{n} x_i \hat{u}_i = \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 $$

given in (15). Therefore, the sample covariance between the regressor and the OLS residuals is zero.
4. The sample covariance between the fitted values ($\hat{y}_i$) and the OLS residuals ($\hat{u}_i$) is zero.

Analogously to property 3, the sample covariance between the fitted values ($\hat{y}_i$) and the OLS residuals ($\hat{u}_i$) is equal to 0 if the following holds:

$$ \sum_{i=1}^{n} \hat{y}_i \hat{u}_i = 0 \tag{33} $$
Proof

Taking into account the algebraic properties 1, given in (26), and 3, given in (32), we have

$$ S_{\hat{y}\hat{u}} = \frac{\sum_{i=1}^{n} \hat{y}_i \hat{u}_i}{n} = \frac{\sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i) \hat{u}_i}{n} = \frac{\hat{\beta}_0 \sum_{i=1}^{n} \hat{u}_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i \hat{u}_i}{n} = \frac{\hat{\beta}_0 \cdot 0 + \hat{\beta}_1 \cdot 0}{n} = 0 $$
Decomposition of the variance of y

By definition,

$$ y_i = \hat{y}_i + \hat{u}_i \tag{34} $$

Subtracting $\bar{y}$ on both sides of the previous expression (remember that $\bar{y}$ is equal to $\bar{\hat{y}}$), we have

$$ y_i - \bar{y} = \hat{y}_i - \bar{y} + \hat{u}_i $$
Squaring both sides:

$$ (y_i - \bar{y})^2 = \left[ (\hat{y}_i - \bar{y}) + \hat{u}_i \right]^2 = (\hat{y}_i - \bar{y})^2 + \hat{u}_i^2 + 2 \hat{u}_i (\hat{y}_i - \bar{y}) $$

Summing for all $i$:

$$ \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) $$
Taking into account the algebraic properties 1 and 4, the third term on the right hand side is equal to 0. Analytically,

$$ \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = \sum_{i=1}^{n} \hat{u}_i \hat{y}_i - \bar{y} \sum_{i=1}^{n} \hat{u}_i = 0 \tag{35} $$

Therefore, we have

$$ \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} \hat{u}_i^2 \tag{36} $$

It must be stressed that relation (26) is needed to ensure that (35) is equal to 0. We must remember that (26) is associated with the first normal equation, that is to say, with the equation corresponding to the intercept. If the fitted model has no intercept, then in general the decomposition (36) will not hold.
In words,

Total sum of squares (SST) = Explained sum of squares (SSE) + Residual sum of squares (SSR)

or,

SST = SSE + SSR

This decomposition can also be done in terms of variances, dividing both sides of (36) by $n$:

$$ \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{n} + \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n} \tag{37} $$

In words,

Total variance = explained variance + residual variance
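The decomposition can be checked numerically. The following sketch, which reuses the hypothetical five-point example from the earlier OLS sketch, verifies that SST equals SSE plus SSR up to floating point error.

```python
# Numerical check of the decomposition SST = SSE + SSR, eq. (36).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

y_hat = beta0 + beta1 * x              # fitted values
u_hat = y - y_hat                      # residuals

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # residual sum of squares

print(sst, sse + ssr)                  # equal up to rounding
```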
Goodness of fit: Coefficient of determination ($R^2$)

A priori, we have obtained the estimators by minimizing the sum of squared residuals. Now, once the estimation has been done, we can look at how well our sample regression line fits our data.

The statistics that indicate how well the sample regression line fits the data are called goodness of fit measures. We are going to look at the best-known measure, which is called the coefficient of determination or R-squared ($R^2$). This measure is defined in the following way:
$$ R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{38} $$
Therefore, $R^2$ is the proportion of the total sum of squares that is explained by the regression, that is to say, explained by the model. We can also say that $100 \cdot R^2$ is the percentage of the sample variation in $y$ explained by $x$.
Alternatively, taking into account (36), we have

$$ \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} \hat{u}_i^2 $$

Substituting in (38), we have

$$ R^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{39} $$

Therefore, $R^2$ is equal to 1 minus the proportion of the total sum of squares that is not explained by the regression.
According to the definition of $R^2$, the following must hold:

$$ 0 \le R^2 \le 1 $$
Extreme cases:

a) If we have a perfect fit, then $\hat{u}_i = 0$ for all $i$, which implies that $y_i = \hat{y}_i$ for all $i$ and, therefore, $R^2 = 1$.

b) If $\hat{y}_i = c$ for all $i$, then $\bar{\hat{y}} = c = \bar{y}$ and $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 0$, so that $R^2 = 0$.

If $R^2$ is close to 0, it implies that we have a poor fit. In other words, very little variation in $y$ is explained by $x$.

In many cases, a high $R^2$ is obtained when the model is fitted using time series data, due to the effect of a common trend. On the contrary, when we use cross-sectional data, a low value is obtained in many cases, but this does not mean that the fitted model is bad.
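Both expressions for $R^2$ can be computed directly. The sketch below, again with the hypothetical five-point data, evaluates (38) and (39) and shows that they coincide.

```python
# R^2 computed by (38) and by (39); they agree because of eq. (36).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

r2_a = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)  # (38)
r2_b = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)     # (39)
print(r2_a, r2_b)
```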
Coefficient of determination and coefficient of correlation
In descriptive statistics, the coefficient of correlation is studied. It is calculated
according to the following formula:
$$ r_{xy} = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)} \sqrt{\operatorname{var}(y)}} = \frac{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{n}}{\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n}}} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{40} $$
What is the relationship between the coefficient of determination and the coefficient of correlation?

Answer: the coefficient of determination is equal to the squared coefficient of correlation. Analytically,

$$ R^2 = r_{xy}^2 \tag{41} $$

(This equality is valid in the simple regression model, but not in the multiple regression model.)
Proof

In the first place, we are going to see an equivalence that will be used in the proof. By definition,

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i $$

From the first normal equation, we have

$$ \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x} $$

Subtracting the second equation from the first one:

$$ \hat{y}_i - \bar{y} = \hat{\beta}_1 (x_i - \bar{x}) $$

Squaring both sides:

$$ (\hat{y}_i - \bar{y})^2 = \hat{\beta}_1^2 (x_i - \bar{x})^2 $$

and summing for all $i$, we have

$$ \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \hat{\beta}_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
Taking into account the previous equivalence, we have

$$ R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \left[ \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]^2 \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\left[ \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) \right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2} = r_{xy}^2 $$
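The equality (41) is easy to verify numerically, as in the following sketch with the hypothetical five-point data used above.

```python
# Check of eq. (41): in simple regression, R^2 equals the squared
# sample correlation coefficient between x and y.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

r_squared = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
r_xy = np.corrcoef(x, y)[0, 1]     # sample correlation coefficient

print(r_squared, r_xy ** 2)        # identical up to rounding
```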
2.4 Units of measurement and functional form

Units of Measurement

Changing the units of measurement (changing of scale)

If $x$ is multiplied/divided by a constant $c \ne 0$, then the OLS slope is divided/multiplied by the same constant $c$. Thus,

$$ \hat{y}_i = \hat{\beta}_0 + \left( \frac{\hat{\beta}_1}{c} \right) (c x_i) \tag{42} $$
Example

Let us suppose the following estimated function of consumption, in which both variables are measured in thousands of euros:

$$ \widehat{cons}_i = 0.2 + 0.85 \, inc_i \tag{43} $$

If we now express income in euros (multiplying by 1000) and call it ince, the fitted model with the new units of measurement of income would be the following:

$$ \widehat{cons}_i = 0.2 + 0.00085 \, ince_i $$
As can be seen, changing the units of measurement of the explanatory variable
does not affect the intercept.
If $y$ is multiplied/divided by a constant $c \ne 0$, then the OLS slope and intercept are both multiplied/divided by the same constant $c$. Thus,

$$ (c \, \hat{y}_i) = (c \, \hat{\beta}_0) + (c \, \hat{\beta}_1) x_i \tag{44} $$
Example

If we express, in model (43), consumption in euros (multiplying by 1000) and call it conse, the fitted model with the new units of measurement of consumption would be the following:

$$ \widehat{conse}_i = 200 + 850 \, inc_i $$
Changing the origin

If one adds/subtracts a constant $d$ to/from $x$ and/or $y$, then the OLS slope is not affected. However, changing the origin of either $x$ and/or $y$ affects the intercept of the regression.

If one subtracts a constant $d$ from $x$, the intercept will change in the following way:

$$ \hat{y}_i = (\hat{\beta}_0 + \hat{\beta}_1 d) + \hat{\beta}_1 (x_i - d) \tag{45} $$
Example

Let us suppose that the average income is 20 thousand euros. If we define the variable $incd_i = inc_i - \overline{inc}$, with both variables measured in thousands of euros, the fitted model with this change of origin will be the following:

$$ \widehat{cons}_i = (0.2 + 0.85 \times 20) + 0.85 \, incd_i = 17.2 + 0.85 \, incd_i $$

If, instead, we subtract 15 from consumption, defining $consd_i = cons_i - 15$, the fitted model becomes

$$ \widehat{consd}_i = 0.2 - 15 + 0.85 \, inc_i = -14.8 + 0.85 \, inc_i $$
Finally, note that $R^2$ is invariant to changes in the units of $x$ and/or $y$, and is also invariant to changes in the origin of the variables.
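These scaling and origin rules can be confirmed with a short computation. The sketch below builds income data consistent with the fitted consumption function (43); the income values themselves are hypothetical.

```python
# Scaling rules (42) and (44) and the change-of-origin rule (45), applied
# to the consumption example cons = 0.2 + 0.85 inc (thousands of euros).
import numpy as np

def ols(x, y):
    b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1   # (intercept, slope)

inc = np.array([10.0, 15.0, 20.0, 25.0, 30.0])   # hypothetical incomes
cons = 0.2 + 0.85 * inc                          # exact fit, for clarity

print(ols(inc, cons))                # (0.2, 0.85)
print(ols(inc * 1000, cons))         # income in euros: slope 0.00085
print(ols(inc, cons * 1000))         # consumption in euros: (200.0, 850.0)
print(ols(inc - inc.mean(), cons))   # origin change: slope unchanged
```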
Functional Form

Linear relationships are not adequate for many economic applications. However, we can incorporate many nonlinearities (in the variables) into simple regression analysis by appropriately redefining the dependent and independent variables.

Before examining the different functional forms, we are going to look at some definitions which will be useful in the exposition.
Some definitions

We are going to look at some definitions of variation measures that will be useful in the interpretation of the coefficients corresponding to the different functional forms. Specifically, we will look at the following: absolute change, proportional change, percentage change and change in logarithms.

The absolute change between $x_1$ and $x_0$ is given by

$$ \Delta x = x_1 - x_0 \tag{47} $$

This measure of variation is not much used by economists.
The proportional change (or relative rate of variation) between $x_1$ and $x_0$ is given by:

$$ \frac{\Delta x}{x_0} = \frac{x_1 - x_0}{x_0} \tag{48} $$
Multiplying a proportional change by 100, a percentage change is obtained. That is to say:

$$ \frac{\Delta x}{x_0} \times 100 \, \% \tag{49} $$

In economic reports, this is the most used measure.
The change in logarithms between $x_1$ and $x_0$ is given by

$$ \Delta \log(x) = \log(x_1) - \log(x_0) \tag{50} $$
This is also a rate of variation, which is used in economic research. The relationship between the proportional change and the change in logarithms can be seen if we expand (50) in a Taylor series:

$$ \log(x_1) - \log(x_0) = \log\left( \frac{x_1}{x_0} \right) = \log\left( 1 + \frac{x_1 - x_0}{x_0} \right) = \frac{\Delta x}{x_0} - \frac{1}{2} \left( \frac{\Delta x}{x_0} \right)^2 + \frac{1}{3} \left( \frac{\Delta x}{x_0} \right)^3 - \cdots \tag{51} $$
Therefore, if we take the linear approximation of this expansion, we have

$$ \Delta \log(x) = \log(x_1) - \log(x_0) = \log\left( \frac{x_1}{x_0} \right) \approx \frac{\Delta x}{x_0} \tag{52} $$
This approximation is good when the proportional change is small, but the differences can be important when the proportional change is big, as can be seen in table 1.
Table 1. Examples of proportional change and change in logarithms

x_1                       202     210     220     240     300
x_0                       200     200     200     200     200
Proportional change       0.010   0.050   0.100   0.200   0.500
Change in logarithms      0.010   0.049   0.095   0.182   0.405
Proportional change %     1.0%    5.0%    10.0%   20.0%   50.0%
Change in logarithms %    1.0%    4.9%    9.5%    18.2%   40.5%
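The values in table 1 follow directly from definitions (48) and (50), as the following sketch shows.

```python
# Reproducing table 1: proportional change versus change in logarithms.
import numpy as np

x0 = 200.0
for x1 in (202, 210, 220, 240, 300):
    prop = (x1 - x0) / x0              # proportional change, eq. (48)
    logc = np.log(x1) - np.log(x0)     # change in logarithms, eq. (50)
    print(f"x1={x1}: proportional {prop:.3f}, log change {logc:.3f}")
```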
Elasticity is the ratio of the relative changes of two variables. If we use proportional changes, the elasticity of the variable $y$ with respect to the variable $x$ is given by

$$ \varepsilon_{y/x} = \frac{\Delta y / y_0}{\Delta x / x_0} \tag{53} $$
If we use changes in logarithms and consider infinitesimal changes, the elasticity of the variable $y$ with respect to the variable $x$ is given by

$$ \varepsilon_{y/x} = \frac{d \log(y)}{d \log(x)} = \frac{dy / y}{dx / x} \tag{54} $$

In general, in econometric models, elasticity is defined by using (54).
Alternative functional forms

The OLS method can be applied without any problem to models in which transformations of the endogenous variable and/or of the exogenous variable have been made. In the presentation of the model (1) we said that exogenous variable and regressor were equivalent terms. But from now on, regressor will denote the specific form in which an exogenous variable appears in the equation. For example, in the model

$$ y = \beta_0 + \beta_1 \log(x) + u $$

the exogenous variable is $x$, but the regressor is $\log(x)$.

In the presentation of the model (1) we also said that endogenous variable and regressand were equivalent terms. But from now on, regressand will denote the specific form in which an endogenous variable appears in the equation. For example, in the model

$$ \log(y) = \beta_0 + \beta_1 x + u $$

the endogenous variable is $y$, but the regressand is $\log(y)$.
The two previous models are linear in the parameters, although they are not linear in the variable $x$ (the first one) or in the variable $y$ (the second one). In any case, if a model is linear in the parameters, it can be estimated by applying the OLS method. On the contrary, if a model is not linear in the parameters, iterative methods must be used in the estimation. However, there are certain nonlinear models that, by means of suitable transformations, can become linear. These models are called linearizable.
Thus, on some occasions, potential models are postulated in economic theory, as in the well-known Cobb-Douglas production function. A potential model with a unique explanatory variable is given by

$$ y = A x^{\beta_1} $$

If we introduce the error term in the following way:

$$ y = A x^{\beta_1} e^{u} \tag{55} $$

then, taking natural logarithms on both sides of (55), we obtain a model that is linear in the parameters:

$$ \log(y) = \beta_0 + \beta_1 \log(x) + u \tag{56} $$

where we have called $\beta_0 = \log(A)$.
On the contrary, if we introduce the error term in the following way:

$$ y = A x^{\beta_1} + u $$

then there is no transformation that allows us to turn this model into a linear one. This is a non-linearizable model.
Now we are going to consider some models with alternative functional forms, all of them linear in the parameters. We will look at the interpretation of the coefficient $\beta_1$ in each case.
a) Linear model

In this model, defined in (1), if the other factors included in $u$ are held fixed, so that the change in $u$ is zero ($\Delta u = 0$), then $x$ has a linear effect on $y$:

$$ \Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0 $$

Therefore, $\beta_1$ is the change in $y$ (in the units in which $y$ is measured) per unit change of $x$ (in the units in which $x$ is measured).

For example, in the fitted function (43), if income increases by 1 unit, consumption will increase by 0.85 units.

The linearity of this model implies that a one-unit change in $x$ always has the same effect on $y$, regardless of the value of $x$ considered.
b) Linear-log model

A linear-log model is given by

$$ y = \beta_0 + \beta_1 \log(x) + u \tag{57} $$

Taking first differences in (57) and setting $\Delta u = 0$, we have

$$ \Delta y = \beta_1 \Delta \log(x) \quad \text{if } \Delta u = 0 $$

Therefore, if $x$ increases by 1%, then $y$ will increase by $(\beta_1 / 100)$ units, with $\Delta u = 0$.
c) Log-linear model

A log-linear model is given by

$$ \log(y) = \beta_0 + \beta_1 x + u \tag{58} $$

This model can be obtained from the following one:

$$ y = \exp(\beta_0 + \beta_1 x + u) $$

by taking natural logs on both sides. For this reason, this model is also called exponential.

Taking first differences in (58) and setting $\Delta u = 0$, we have

$$ \Delta \log(y) = \beta_1 \Delta x \quad \text{if } \Delta u = 0 $$

Therefore, if $x$ increases by 1 unit, then $y$ will increase by $(100 \, \beta_1)$%, with $\Delta u = 0$.
d) Log-log model

The model given in (56) is a log-log model or, before the transformation, a potential model (55). This model is also called the constant elasticity model.

Taking first differences in (56) and setting $\Delta u = 0$, we have

$$ \Delta \log(y) = \beta_1 \Delta \log(x) \quad \text{if } \Delta u = 0 $$

Therefore, if $x$ increases by 1%, then $y$ will increase by $\beta_1$%, with $\Delta u = 0$. It is important to remark that, in this model, $\beta_1$ is the elasticity of $y$ with respect to $x$, for any value of $x$ and $y$. Consequently, in this model the elasticity is constant.
Table 2 summarizes the interpretation of $\hat{\beta}_1$ in the different fitted models; a numerical illustration follows the table.

Table 2. Interpretation of $\hat{\beta}_1$ in different models

Model        If x increases by    then y will increase by
linear       1 unit               $\hat{\beta}_1$ units
linear-log   1%                   $(\hat{\beta}_1 / 100)$ units
log-linear   1 unit               $(100 \, \hat{\beta}_1)$%
log-log      1%                   $\hat{\beta}_1$%
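The rules for the log models are approximations based on (52); for large coefficients or large changes the exact effect can differ noticeably. The sketch below compares the approximate and exact effects, using an assumed coefficient value of 0.85.

```python
# Numerical illustration of table 2 with an assumed beta1 = 0.85.
# For the log models, the exact change is shown next to the approximation.
import numpy as np

beta1 = 0.85

# linear: x up 1 unit -> y up beta1 units (exact)
print("linear:", beta1)

# linear-log: x up 1% -> y up about beta1/100 units
print("linear-log:", beta1 / 100, "exact:", beta1 * np.log(1.01))

# log-linear: x up 1 unit -> y up about 100*beta1 percent
print("log-linear:", 100 * beta1, "exact:", 100 * (np.exp(beta1) - 1))

# log-log: x up 1% -> y up about beta1 percent (constant elasticity)
print("log-log:", beta1, "exact:", 100 * (1.01 ** beta1 - 1))
```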
2.5 Statistical Properties of OLS

We are now going to study the statistical properties of the OLS estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$. For this purpose, we adopt a set of assumptions on the model and on the sampling process.

SLR.1: Linear in parameters

The population model is linear in the parameters:

$$ y = \beta_0 + \beta_1 x + u \tag{59} $$

SLR.2: Random sampling

We have a random sample of size $n$, $\{(x_i, y_i): i = 1, 2, \ldots, n\}$, following the population model, so that, for each observation,

$$ y_i = \beta_0 + \beta_1 x_i + u_i \tag{60} $$

SLR.3: Sample variation in the explanatory variable

The sample values of $x$ are not all equal, so that

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0 \tag{61} $$

SLR.4: Zero conditional mean

$$ E(u|x) = 0 $$

For a random sample, this assumption implies that

$$ E(u_i|x_i) = 0, \qquad i = 1, 2, \ldots, n $$

According to SLR.4, all derivations using this assumption will be conditional on the sample values of $x$.
We are going to prove only the unbiasedness of the estimator $\hat{\beta}_1$, which is the most important one. In order to do this proof, we need to rewrite our estimator in terms of the population parameters. Start with a simple rewrite of the OLS formula (21) as

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{62} $$

because

$$ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x}) y_i - \bar{y} \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x}) y_i $$

since $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
Substituting (60) into (62), we have

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} (x_i - \bar{x}) x_i + \sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{63} $$
The estimator $\hat{\beta}_1$ equals the population slope, $\beta_1$, plus a term that is a linear combination of the errors $\{u_1, \ldots, u_n\}$.
In the derivation of (63), we have taken into account that

$$ \beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) = \beta_0 \cdot 0 = 0 $$

$$ \sum_{i=1}^{n} (x_i - \bar{x}) x_i = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
THEOREM 2.1 Unbiasedness of OLS

Under assumptions SLR.1 to SLR.4, the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased, conditional on $x$.
Proof

$\hat{\beta}_1$ is unbiased:

$$ E(\hat{\beta}_1 | X) = E\left[ \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \,\Bigg|\, X \right] = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) E(u_i | X)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1 \tag{64} $$

since, by SLR.4, $E(u_i | X) = 0$.
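Unbiasedness is a statement about the average behaviour of the estimator across repeated samples, and it can be illustrated with a small Monte Carlo experiment. In the sketch below, all parameter values are assumptions chosen for the illustration; the regressor values are held fixed across replications.

```python
# Monte Carlo sketch of Theorem 2.1: with fixed x's and E(u|x) = 0, the
# average of the OLS slope estimates across many samples is close to beta1.
import numpy as np

rng = np.random.default_rng(0)
beta0_true, beta1_true = 1.0, 2.0        # assumed population parameters
x = rng.uniform(0, 10, size=50)          # fixed regressor values
sxx = np.sum((x - x.mean()) ** 2)

estimates = []
for _ in range(5000):
    u = rng.normal(0, 1, size=50)        # errors with zero mean
    y = beta0_true + beta1_true * x + u
    estimates.append(np.sum((y - y.mean()) * (x - x.mean())) / sxx)

print(np.mean(estimates))                # close to 2.0
```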
(From now on, in order to simplify the notation, we will use $E(\hat{\beta}_1)$ instead of $E(\hat{\beta}_1 | X)$.)

To study the variances of $\hat{\beta}_1$ (and $\hat{\beta}_0$), an additional assumption is needed.
SLR.5: Homoskedasticity

Homoskedasticity implies that all errors have the same variance. That is to say,

$$ \operatorname{var}(u|x) = \sigma^2 \tag{65} $$

We add SLR.5 because it simplifies the variance calculations and because it implies that OLS has certain efficiency properties.
The homoskedasticity assumption is quite distinct from the zero conditional mean assumption, $E(u|x) = 0$. SLR.4 involves the expected value of $u$, while SLR.5 concerns the variance of $u$. Homoskedasticity plays no role in showing that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased.
Taking into account that $E(u|x) = 0$ and that $E(u^2|x) = E(u^2)$, we obtain

$$ \operatorname{var}(u|x) = E(u^2|x) - \left[ E(u|x) \right]^2 = E(u^2|x) = E(u^2) = \operatorname{var}(u) = \sigma^2 $$

Thus, $\sigma^2$ is the conditional variance, but also the unconditional variance. It is called the error variance. The square root of the error variance, $\sigma$, is called the standard deviation of the error.
We can then say that

$$ E(y|x) = \beta_0 + \beta_1 x \qquad \text{and} \qquad \operatorname{var}(y|x) = \sigma^2 $$

So the conditional expectation of $y$ given $x$ is linear in $x$, but the variance of $y$ given $x$ is constant. When $\operatorname{var}(u|x)$ depends on $x$, the error term is said to exhibit heteroskedasticity.
On the other hand,

$$ \operatorname{var}(y|x) = E\left[ \left( y - E(y|x) \right)^2 \Big| x \right] = E\left[ \left( \beta_0 + \beta_1 x + u - \beta_0 - \beta_1 x \right)^2 \Big| x \right] = E(u^2|x) = \operatorname{var}(u|x) \tag{66} $$

Since $\operatorname{var}(u|x) = \operatorname{var}(y|x)$, heteroskedasticity is present whenever $\operatorname{var}(y|x)$ is a function of $x$.
THEOREM 2.2 Sampling variances of the OLS estimators

Under assumptions SLR.1 to SLR.5, the variances of $\hat{\beta}_1$ and $\hat{\beta}_0$, conditional on $\{x_1, \ldots, x_n\}$, are the following:

$$ \operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{67} $$

and

$$ \operatorname{var}(\hat{\beta}_0) = \frac{\sigma^2 \, n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{68} $$
Proof of (67):

Taking into account (63), and conditioning on $X = \{x_1, \ldots, x_n\}$,

$$ \operatorname{var}(\hat{\beta}_1 | X) = E\left[ (\hat{\beta}_1 - \beta_1)^2 \Big| X \right] = E\left[ \left( \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)^2 \Bigg| X \right] $$

Expanding the square of the sum in the numerator,

$$ \operatorname{var}(\hat{\beta}_1 | X) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 E(u_i^2 | X) + \sum_{i \ne j} (x_i - \bar{x})(x_j - \bar{x}) E(u_i u_j | X)}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^2} $$

Since $E(u_i^2 | X) = \sigma^2$ and $E(u_i u_j | X) = 0$ for $i \ne j$,

$$ \operatorname{var}(\hat{\beta}_1 | X) = \frac{\sigma^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^2} = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{n S_X^2} \tag{69} $$

where

$$ S_X^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} $$

is the sample variance of $x$.

In the above proof we have taken into account that $E(u_i u_j)$, for $i \ne j$, is equal to 0, because the sampling is random and the errors are therefore independent.

(From now on, in order to simplify the notation, we will not write the conditional expectations explicitly.)
We can draw the following conclusions from the last expression (a small simulation appears after this list):

1) The larger the error variance, $\sigma^2$, the larger the variance of the slope estimator. This is not at all surprising: more noise in the equation (a larger $\sigma^2$) makes it more difficult to estimate the effect of $x$ on $y$, and this is reflected in a higher variance for the OLS slope estimator. Since $\sigma^2$ is a feature of the population, it has nothing to do with the sample size.

2) The larger the variability in the $x_i$'s, the smaller the variance of the slope estimator. Everything else being equal, for estimating $\beta_1$ we prefer to have as much sample variation in $x$ as possible. In any case, Assumption SLR.3 rules out $S_X^2 = 0$.

3) The larger the sample size $n$, the smaller the variance of the slope estimator.
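The variance formula (67) can also be checked by simulation. In the sketch below, all parameter values are assumptions chosen for the illustration; the empirical variance of the slope estimates across replications is compared with the theoretical value.

```python
# Simulation sketch of (67): the empirical variance of the OLS slope
# should be close to sigma^2 / sum_i (x_i - xbar)^2.
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 2.0, 1.5      # assumed population values
x = rng.uniform(0, 10, size=40)          # fixed regressor values
sxx = np.sum((x - x.mean()) ** 2)

slopes = []
for _ in range(10000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=40)
    slopes.append(np.sum((y - y.mean()) * (x - x.mean())) / sxx)

print(np.var(slopes), sigma ** 2 / sxx)  # should be close
```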
We have a problem in calculating $\operatorname{var}(\hat{\beta}_1)$: the error variance, $\sigma^2$, is unknown.
Estimating the Error Variance

We don't know what the error variance, $\sigma^2$, is, and we cannot compute it from the errors, $u_i$, because we don't observe them.
Given that $\sigma^2 = E(u^2)$, an unbiased estimator would be

$$ \frac{\sum_{i=1}^{n} u_i^2}{n} \tag{70} $$
Unfortunately, this is not a true estimator, because we don't observe the errors $u_i$. But we do have estimates of the $u_i$, namely the OLS residuals $\hat{u}_i$.
The relation between errors and residuals is given by

$$ \hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i = (\beta_0 + \beta_1 x_i + u_i) - \hat{\beta}_0 - \hat{\beta}_1 x_i = u_i - (\hat{\beta}_0 - \beta_0) - (\hat{\beta}_1 - \beta_1) x_i \tag{71} $$
Hence $\hat{u}_i$ is not the same as $u_i$, although the difference between them does have an expected value of zero. If we replace the errors with the OLS residuals, we have

$$ \tilde{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n} \tag{72} $$
This is a true estimator, because it gives a computable rule for any sample of data on $x$ and $y$. However, this estimator is biased, essentially because it does not account for two restrictions that must be satisfied by the OLS residuals:

$$ \sum_{i=1}^{n} \hat{u}_i = 0, \qquad \sum_{i=1}^{n} x_i \hat{u}_i = 0 \tag{73} $$
One way to view these restrictions is the following: if we know $n-2$ of the residuals, we can get the other two residuals by using the restrictions implied by the moment conditions. Thus, there are only $n-2$ degrees of freedom in the OLS residuals, as opposed to $n$ degrees of freedom in the errors. The unbiased estimator of $\sigma^2$ that we will use makes an adjustment for the degrees of freedom:

$$ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - 2} \tag{74} $$
THEOREM 2.3 Unbiased estimator of $\sigma^2$

Under assumptions SLR.1 to SLR.5,

$$ E(\hat{\sigma}^2) = \sigma^2 \tag{75} $$

If $\hat{\sigma}^2$ is plugged into the variance formulas, we then have unbiased estimators of $\operatorname{var}(\hat{\beta}_0)$ and $\operatorname{var}(\hat{\beta}_1)$.

The natural estimator of $\sigma$ is $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$, and it is called the standard error of the regression.
The square root of the variance of $\hat{\beta}_1$ is its standard deviation, that is to say,

$$ \operatorname{sd}(\hat{\beta}_1) = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \tag{76} $$

Therefore, its natural estimator is the standard error:

$$ \operatorname{se}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \tag{77} $$
Note that $\operatorname{se}(\hat{\beta}_1)$ varies with different samples, since it is computed from the data. The standard error of any estimate gives us an idea of how precise the estimator is.
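The following sketch computes the error variance estimator (74) and the standard error (77) for the hypothetical five-point example used throughout.

```python
# Error variance estimator (74) and standard error of the slope (77).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
u_hat = y - beta0 - beta1 * x                    # OLS residuals

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)        # eq. (74), n-2 d.o.f.
se_beta1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))  # eq. (77)

print(sigma2_hat, se_beta1)
```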
2.6 Regression through the origin

If we force the regression line to pass through the point (0,0), we are constraining the intercept to be zero, as can be seen in figure 2.10. This is called a regression through the origin.

Figure 2.10. A regression through the origin.
This is not very often done because, among other problems, if $\beta_0 \ne 0$ then the slope estimator will be biased.

We are now going to see how to estimate a regression line through the origin.
The fitted model is the following:

$$ \hat{y}_i = \tilde{\beta}_1 x_i \tag{78} $$

where $\tilde{\beta}_1$ denotes the slope estimator of the model without intercept. Therefore, we must solve

$$ \min_{\tilde{\beta}_1} S = \min_{\tilde{\beta}_1} \sum_{i=1}^{n} \left( y_i - \tilde{\beta}_1 x_i \right)^2 \tag{79} $$
To minimize $S$, we differentiate with respect to $\tilde{\beta}_1$ and set the derivative equal to 0:

$$ \frac{\partial S}{\partial \tilde{\beta}_1} = -2 \sum_{i=1}^{n} \left( y_i - \tilde{\beta}_1 x_i \right) x_i = 0 \tag{80} $$
Solving for $\tilde{\beta}_1$:

$$ \tilde{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2} \tag{81} $$
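The through-the-origin slope (81) generally differs from the OLS slope with an intercept, as this sketch with the hypothetical five-point data shows.

```python
# Through-the-origin slope, eq. (81), versus the ordinary OLS slope (21).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1_origin = np.sum(y * x) / np.sum(x ** 2)   # eq. (81), no intercept
b1_ols = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

print(b1_origin, b1_ols)
```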
Another problem with fitting a regression line through the origin is that, in general, the following decomposition does not hold:

$$ \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} \hat{u}_i^2 $$

If the decomposition of the variance of $y$ into two components (explained and residual) is not possible, then the $R^2$ is meaningless. In this case, the coefficient can take values that are negative or greater than 1.

To sum up, an intercept must be included in the regressions, unless there are strong theoretical reasons, from economic theory, to exclude it.
Case study. Engel's curve: demand for dairy products

The expression Engel's curve refers to the line which shows the relationship between the quantities of a good a consumer is willing to purchase at varying income levels.

In a survey of 40 households, data on expenditure on dairy products and on income were obtained. These data appear in table 3. In order to avoid distortions due to the different sizes of the households, both consumption and income are expressed in per capita terms. The data are expressed in euros per month.

Table 3. Expenditure on dairy products (dairy) and disposable income (inc), in per capita terms.
Unit: euros per month.
household dairy inc household dairy inc
1 8.87 1,250 21 16.20 2,100
2 6.59 985 22 10.39 1,470
3 11.46 2,175 23 13.50 1,225
4 15.07 1,025 24 8.50 1,380
5 15.60 1,690 25 19.77 2,450
6 6.71 670 26 9.69 910
7 10.02 1,600 27 7.90 690
8 7.41 940 28 10.15 1,450
9 11.52 1,730 29 13.82 2,275
10 7.47 640 30 13.74 1,620
11 6.73 860 31 4.91 740
12 8.05 960 32 20.99 1,125
13 11.03 1,575 33 20.06 1,335
14 10.11 1,230 34 18.93 2,875
15 18.65 2,190 35 13.19 1,680
16 10.30 1,580 36 5.86 870
17 15.30 2,300 37 7.43 1,620
18 13.75 1,720 38 7.15 960
19 11.49 850 39 9.10 1,125
20 6.69 780 40 15.31 1,875
There are multiple types of models used in demand studies. In particular, in this case study we are going to consider the following models: linear, inverse, semi-logarithmic, potential, exponential and inverse exponential. In the first three models, the regressand of the equation is directly the endogenous variable, whereas in the last three the regressand is the natural logarithm of the endogenous variable.

In all the models we will calculate the marginal propensity to expenditure, as well as the expenditure/income elasticity.
Linear model

The linear model for the demand for dairy products is the following:

$$ dairy = \beta_0 + \beta_1 \, inc + u \tag{82} $$

As is known, the marginal propensity indicates how expenditure changes when income varies, and it is obtained by differentiating expenditure with respect to income in the demand equation. In the linear model, the marginal propensity to expenditure is given by

$$ \frac{d \, dairy}{d \, inc} = \beta_1 \tag{83} $$
In other words, in the linear model the marginal propensity is constant and, therefore, independent of the level of income. The fact that it is constant is an advantage, but at the same time it has the disadvantage of not being suited to describing the behaviour of consumers, especially when there are important differences in the income of households. Thus, it is unreasonable that the marginal propensity of expenditure on dairy products should be the same in a low-income family as in a high-income family. However, if the variation of income in the sample is not very high, a linear model can be used to describe the demand for certain goods.
In this model, the expenditure/income elasticity is the following:

$$ \varepsilon_{dairy/inc}^{linear} = \frac{d \, dairy}{d \, inc} \frac{inc}{dairy} = \beta_1 \frac{inc}{dairy} \tag{84} $$
Estimating the model (82) with the data of table 3, we obtain

$$ \widehat{dairy} = 4.012 + 0.005288 \, inc \qquad R^2 = 0.4584 \tag{85} $$
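The estimates reported in this case study can be reproduced with the data of table 3. The sketch below transcribes the data and fits the linear model and, for later reference, the log-log model (95); up to rounding and assuming the data are transcribed exactly, the output should match (85) and (98).

```python
# Fitting Engel curve models with the table 3 data (euros per month).
import numpy as np

dairy = np.array([
    8.87, 6.59, 11.46, 15.07, 15.60, 6.71, 10.02, 7.41, 11.52, 7.47,
    6.73, 8.05, 11.03, 10.11, 18.65, 10.30, 15.30, 13.75, 11.49, 6.69,
    16.20, 10.39, 13.50, 8.50, 19.77, 9.69, 7.90, 10.15, 13.82, 13.74,
    4.91, 20.99, 20.06, 18.93, 13.19, 5.86, 7.43, 7.15, 9.10, 15.31])
inc = np.array([
    1250, 985, 2175, 1025, 1690, 670, 1600, 940, 1730, 640,
    860, 960, 1575, 1230, 2190, 1580, 2300, 1720, 850, 780,
    2100, 1470, 1225, 1380, 2450, 910, 690, 1450, 2275, 1620,
    740, 1125, 1335, 2875, 1680, 870, 1620, 960, 1125, 1875], dtype=float)

def ols(x, y):
    """OLS by formulas (19) and (21); returns (b0, b1, R^2)."""
    b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2

print("linear: ", ols(inc, dairy))                  # cf. (85)
print("log-log:", ols(np.log(inc), np.log(dairy)))  # cf. (98)
```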
Inverse model

In the inverse model there is a linear relationship between expenditure and the inverse of income. Therefore, this model is directly linear in the parameters and is expressed in the following way:

$$ dairy = \beta_0 + \beta_1 \frac{1}{inc} + u \tag{86} $$
The sign of the coefficient $\beta_1$ will be negative if income is positively correlated with expenditure. It is easy to see that, when income tends towards infinity, expenditure tends towards a limit equal to $\beta_0$. In other words, $\beta_0$ represents the maximum consumption of this good.

In figure 2.6, we can see a double representation of the population function corresponding to this model. In the first one, the relationship between the dependent variable and the explanatory variable is represented. In the second one, the relationship between the regressand and the regressor is represented. As can be seen, the second function is linear.
Figure 2.6. The inverse model
In the inverse model, the marginal propensity to expenditure is given by

$$ \frac{d \, dairy}{d \, inc} = -\beta_1 \frac{1}{inc^2} \tag{87} $$
According to (87), the propensity is inversely proportional to the square of the income level.

On the other hand, the elasticity is inversely proportional to the product of expenditure and income, as can be seen in the following expression:

$$ \varepsilon_{dairy/inc}^{inv} = \frac{d \, dairy}{d \, inc} \frac{inc}{dairy} = -\beta_1 \frac{1}{inc \cdot dairy} \tag{88} $$
Estimating the model (86) with the data of table 3, we obtain

$$ \widehat{dairy} = 18.652 - 8702 \, \frac{1}{inc} \qquad R^2 = 0.4281 \tag{89} $$

In this case, the coefficient $\hat{\beta}_1$ does not have an economic meaning.
Linear-log model

This model is called the linear-log model because expenditure is a linear function of the logarithm of income, that is to say,

$$ dairy = \beta_0 + \beta_1 \log(inc) + u \tag{90} $$

In this model, the marginal propensity to expenditure is given by

$$ \frac{d \, dairy}{d \, inc} = \frac{d \, dairy}{d \log(inc)} \frac{d \log(inc)}{d \, inc} = \beta_1 \frac{1}{inc} \tag{91} $$
and the expenditure/income elasticity by

$$ \varepsilon_{dairy/inc}^{lin\text{-}log} = \frac{d \, dairy}{d \, inc} \frac{inc}{dairy} = \beta_1 \frac{1}{dairy} \tag{92} $$

As can be seen, in the linear-log model the marginal propensity is inversely proportional to the level of income, while the elasticity is inversely proportional to the level of expenditure on dairy products.
In figure 2.7, we can see a double representation of the population function
corresponding to this model.
Figure 2.7. The linear log model
Estimating the model (90) with the data of table 3, we obtain

$$ \widehat{dairy} = -41.623 + 7.399 \log(inc) \qquad R^2 = 0.4567 \tag{93} $$
The interpretation of $\hat{\beta}_1$ is the following: if income increases by 1%, the demand for dairy products will increase by 0.07399 euros.
Log-log model or potential model

The potential model is defined in the following way:

$$ dairy = A \, inc^{\beta_1} e^{u} \tag{94} $$

This model is not linear in the parameters, but it is linearizable by taking natural logarithms, obtaining the model

$$ \log(dairy) = \beta_0 + \beta_1 \log(inc) + u \tag{95} $$

where $\beta_0 = \log(A)$.

This model is also called the log-log model, because that is the structure of the corresponding linearized model.
In this model, the marginal propensity to expenditure is given by

$$ \frac{d \, dairy}{d \, inc} = \beta_1 \frac{dairy}{inc} \tag{96} $$
In the log-log model, the elasticity is constant. Therefore, if income increases by 1%, expenditure will increase by $\beta_1$%, since

$$ \varepsilon_{dairy/inc}^{log\text{-}log} = \frac{d \, dairy}{d \, inc} \frac{inc}{dairy} = \frac{d \log(dairy)}{d \log(inc)} = \beta_1 \tag{97} $$
In figure 2.8, we can see a double representation of the population function
corresponding to this model.
Figure 2.8. The log log model
Estimating the model (95) with the data of table 3, we obtain

$$ \widehat{\log(dairy)} = -2.556 + 0.6866 \log(inc) \qquad R^2 = 0.5190 \tag{98} $$
In this case, $\hat{\beta}_1$ is the expenditure/income elasticity. Its interpretation is the following: if income increases by 1%, the demand for dairy products will increase by 0.68%.
Log-linear or exponential model

The exponential model is defined in the following way:

$$ dairy = \exp(\beta_0 + \beta_1 \, inc + u) \tag{99} $$

By taking natural logarithms on both sides of (99), we obtain the following model, which is linear in the parameters:

$$ \log(dairy) = \beta_0 + \beta_1 \, inc + u \tag{100} $$
In this model, the marginal propensity to expenditure is given by

$$ \frac{d \, dairy}{d \, inc} = \beta_1 \, dairy \tag{101} $$
In the exponential model, unlike the other models seen previously, the marginal propensity increases with the level of expenditure. For this reason, this model is adequate for describing the demand for luxury goods. On the other hand, the elasticity is proportional to the level of income:

$$ \varepsilon_{dairy/inc}^{exp} = \frac{d \, dairy}{d \, inc} \frac{inc}{dairy} = \frac{d \log(dairy)}{d \, inc} \, inc = \beta_1 \, inc \tag{102} $$
In figure 2.9, we can see a double representation of the population function
corresponding to this model.
Figure 2.9. The log linear model
Estimating the model (100) with the data of table 3, we obtain

$$ \widehat{\log(dairy)} = 1.694 + 0.00048 \, inc \qquad R^2 = 0.4978 \tag{103} $$
The interpretation of $\hat{\beta}_1$ is the following: if income increases by 1 euro, the demand for dairy products will increase by 0.048%.
Inverse exponential model

The inverse exponential model, which is a mixture of the exponential model and the inverse model, is given by

$$ dairy = \exp\left( \beta_0 + \beta_1 \frac{1}{inc} + u \right) \tag{104} $$
By taking natural logarithms on both sides of (104), we obtain the following model, which is linear in the parameters:

$$ \log(dairy) = \beta_0 + \beta_1 \frac{1}{inc} + u \tag{105} $$
In this model, the marginal propensity to expenditure is given by

$$ \frac{d \, dairy}{d \, inc} = -\beta_1 \frac{dairy}{inc^2} \tag{106} $$

and the elasticity by

$$ \varepsilon_{dairy/inc}^{invexp} = \frac{d \, dairy}{d \, inc} \frac{inc}{dairy} = \frac{d \log(dairy)}{d \, inc} \, inc = -\beta_1 \frac{1}{inc} \tag{107} $$
Estimating the model (105) with the data of table 3, we obtain

$$ \widehat{\log(dairy)} = 3.049 - 822.02 \, \frac{1}{inc} \qquad R^2 = 0.5040 \tag{108} $$

In this case, as in the inverse model, the coefficient $\hat{\beta}_1$ does not have an economic meaning.
In table 4, the marginal propensity, the expenditure/income elasticity and the $R^2$ of the six fitted models are shown. The marginal propensities and elasticities are evaluated at the sample means, $\overline{dairy}$ and $\overline{inc}$.

Table 4. Marginal propensity, expenditure/income elasticity and $R^2$ in the fitted models.

Model          Marginal propensity                                          Elasticity                                                        $R^2$
Linear         $\hat{\beta}_1 = 0.0053$                                     $\hat{\beta}_1 \, \overline{inc}/\overline{dairy} = 0.6505$       0.4440
Inverse        $-\hat{\beta}_1 / \overline{inc}^2 = 0.0044$                 $-\hat{\beta}_1 / (\overline{dairy} \, \overline{inc}) = 0.5361$  0.4279
Linear-log     $\hat{\beta}_1 / \overline{inc} = 0.0052$                    $\hat{\beta}_1 / \overline{dairy} = 0.6441$                       0.4566
Log-log        $\hat{\beta}_1 \, \overline{dairy}/\overline{inc} = 0.0056$  $\hat{\beta}_1 = 0.6864$                                          0.5188
Log-linear     $\hat{\beta}_1 \, \overline{dairy} = 0.0055$                 $\hat{\beta}_1 \, \overline{inc} = 0.6783$                        0.4978
Inverse exp.   $-\hat{\beta}_1 \, \overline{dairy}/\overline{inc}^2 = 0.0047$  $-\hat{\beta}_1 / \overline{inc} = 0.5815$                     0.5038
The $R^2$ obtained in the first three models is not comparable with the $R^2$ obtained in the last three, because the functional form of the regressand is different: $y$ in the first three models and $\log(y)$ in the last three.

Comparing the first three models, the best one is the linear-log model if we use the $R^2$ as a goodness of fit measure. Comparing the last three models, the best one is the log-log model. If we had used the Akaike Information Criterion (AIC), which allows comparing models with different functional forms for the regressand, the log-log model would have been the best among the six fitted models.