Chapter 3 Econometrics
3.1 INTRODUCTION
We have studied the two-variable model extensively in the previous unit. In economics, however, one hardly ever finds that a variable is affected by only one explanatory variable. For example, the demand for a commodity depends on the price of the commodity itself, the prices of related commodities, consumer income, and so on. Hence the two-variable model is often inadequate in practical work. Therefore, we need to discuss multiple regression models. Multiple linear regression is concerned with the relationship between a dependent variable (Y) and two or more explanatory variables:
Y = f(X1, X2)
3.2 SPECIFICATION OF THE MODEL
Example: Demand for a commodity may be influenced not only by the price of the commodity itself (X1) but also by consumer income (X2). Since the theory does not specify the mathematical form of the demand function, we assume that the relationship between Y, X1, and X2 is linear. Hence we may write the three-variable model as

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$$

where β1 is the coefficient of X1, whose expected sign is negative by the law of demand, and β2 is the coefficient of X2, whose expected sign is positive assuming that the good is a normal good.

The coefficients β1 and β2 are called the partial regression coefficients; their interpretation is discussed further in the section on hypothesis testing below.
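To make the role of each coefficient concrete, here is a minimal Python sketch; the coefficient values and the helper name demand are purely hypothetical:

```python
# Hypothetical linear demand function: Y = b0 + b1*X1 + b2*X2
b0, b1, b2 = 100.0, -2.0, 0.05   # made-up coefficients for illustration

def demand(price, income):
    """Mean demand implied by the linear specification."""
    return b0 + b1 * price + b2 * income

print(demand(10.0, 2000.0))   # 180.0
print(demand(11.0, 2000.0))   # 178.0: a one-unit price rise changes demand by b1 = -2
```

The second call illustrates what a partial regression coefficient measures: the change in the mean value of Y per unit change in one regressor while the other is held fixed.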
3.3 ASSUMPTIONS
To complete the specification of our model we need some assumptions about the random variable U. These assumptions are the same as those already explained in unit 2.
1. Zero mean: The random variable U has zero expected value for every observation, E(Ui) = 0.
2. Homoscedasticity: The variance of each Ui is constant, Var(Ui) = σu².
3. Normality: Ui ~ N(0, σu²).
4. No autocorrelation: The values of Ui (corresponding to Xi) are independent from the values of any other Uj (corresponding to Xj).
5. Independence of Ui and Xi: The values of the X's are a set of fixed numbers in all hypothetical samples (refer to unit 2).
6. No perfect multicollinearity: The explanatory variables are not perfectly linearly correlated; there is no exact linear relationship between X1 and X2.
7. Correct specification: The model has no specification error, in that all the important explanatory variables appear explicitly in it and the mathematical form is correctly specified.
3.4 ESTIMATION
We have specified our model in the previous subsection. We have also stated the assumptions required in subsection 3.3. Now let us take n sample observations on Y, X1, and X2 and obtain estimates of the true parameters β0, β1, and β2:
Yi     X1i     X2i
Y1     X11     X21
Y2     X12     X22
Y3     X13     X23
...    ...     ...
Yn     X1n     X2n
As discussed in unit 2, the estimates are obtained by choosing the values of the unknown parameters that minimize the sum of squared residuals (OLS requires that ΣÛi² be minimized). In deviation form, the resulting estimators are:
$$\hat\beta_1 = \frac{\left(\sum x_{1i} y_i\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{2i} y_i\right)\left(\sum x_{1i} x_{2i}\right)}{\left(\sum x_{1i}^2\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{1i} x_{2i}\right)^2}$$

$$\hat\beta_2 = \frac{\left(\sum x_{2i} y_i\right)\left(\sum x_{1i}^2\right) - \left(\sum x_{1i} y_i\right)\left(\sum x_{1i} x_{2i}\right)}{\left(\sum x_{1i}^2\right)\left(\sum x_{2i}^2\right) - \left(\sum x_{1i} x_{2i}\right)^2}$$

where the lower-case letters denote deviations from the sample means (yi = Yi − Ȳ, x1i = X1i − X̄1, x2i = X2i − X̄2) and the intercept follows from β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2.
The unbiased estimator of the variance of U is

$$\hat\sigma_u^2 = \frac{\sum \hat U_i^2}{n - k}$$

k being the total number of parameters that are estimated. In the above case (three-variable model), k = 3.
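As an illustration, the following Python sketch applies the deviation-form formulas above to a small hypothetical sample (the data values are made up) and then computes σ̂u² with n − k degrees of freedom:

```python
import numpy as np

# Hypothetical observations on Y, X1, X2 (made up for illustration)
Y  = np.array([70., 65., 90., 95., 110., 115., 120., 140., 155., 150.])
X1 = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])
X2 = np.array([810., 1009., 1273., 1425., 1633., 1876., 2052., 2201., 2435., 2686.])

# Deviations from sample means (the lower-case variables of the text)
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Deviation-form OLS formulas for the three-variable model
den = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2)**2
b1 = (np.sum(x1 * y) * np.sum(x2**2) - np.sum(x2 * y) * np.sum(x1 * x2)) / den
b2 = (np.sum(x2 * y) * np.sum(x1**2) - np.sum(x1 * y) * np.sum(x1 * x2)) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()    # intercept

# Residuals and the unbiased variance estimator (k = 3 parameters)
u = Y - (b0 + b1 * X1 + b2 * X2)
n, k = len(Y), 3
sigma2_u = np.sum(u**2) / (n - k)
print(b0, b1, b2, sigma2_u)
```

The same estimates would be obtained from any standard OLS routine; the point of the sketch is only to mirror the formulas.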
The Multiple Coefficient of Determination
Recall that r² measures the proportion of the variation in Y explained by the (two-variable) regression equation. This notion of r² can be easily extended to regression models containing more than two variables. In the three-variable model we would like to know the proportion of the variation in Y explained by X1 and X2 jointly. The quantity that gives this information is known as the multiple coefficient of determination. It is denoted by R², with subscripts indicating the variables whose relationship is being studied.
Example: R²y.X1X2 shows the percentage of the total variation of Y explained by the regression plane, that is, by X1 and X2 jointly:
$$R^2_{y.X_1X_2} = \frac{\sum \hat y_i^2}{\sum y_i^2} = \frac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2} = 1 - \frac{\sum \hat U_i^2}{\sum y_i^2} = 1 - \frac{RSS}{TSS}$$
Recall that TSS = Σyi² denotes the total sum of squares and RSS = ΣÛi² the residual sum of squares.
The value of R² lies between 0 and 1. The higher R² is, the greater the percentage of the variation of Y explained by the regression plane, that is, the better the goodness of fit of the regression plane to the sample observations. The closer R² is to zero, the worse the fit.
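A minimal Python sketch of this computation on hypothetical data (here the fit is obtained with NumPy's least-squares routine rather than the deviation formulas):

```python
import numpy as np

# Hypothetical observations (made up for illustration)
Y  = np.array([70., 65., 90., 95., 110., 115., 120., 140., 155., 150.])
X1 = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])
X2 = np.array([810., 1009., 1273., 1425., 1633., 1876., 2052., 2201., 2435., 2686.])

X = np.column_stack([np.ones_like(Y), X1, X2])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, Y, rcond=None)        # OLS estimates
u = Y - X @ b                                    # residuals

TSS = np.sum((Y - Y.mean())**2)                  # total sum of squares
RSS = np.sum(u**2)                               # residual sum of squares
R2 = 1 - RSS / TSS                               # multiple coefficient of determination
print(R2)
```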
The Adjusted R2
Note that as the number of regressors (explanatory variables) increases, the coefficient of multiple determination will usually increase. To see this, recall the definition of R²:
$$R^2 = 1 - \frac{\sum \hat U_i^2}{\sum y_i^2}$$
Now Σyi² is independent of the number of X variables in the model because it is simply Σ(Yi − Ȳ)². The residual sum of squares (RSS), ΣÛi², however, depends on the number of explanatory variables present in the model. It is clear that as the number of X variables increases, ΣÛi² is bound to decrease (at least it will not increase); hence R² will increase. Therefore, in comparing two regression models with the same dependent variable but differing numbers of X variables, one should be very wary of choosing the model with the highest R². An explanatory variable which is not statistically significant may be retained in the model if one looks at R² only. Therefore, to correct for this defect we adjust R² by taking into account the degrees of freedom:
$$\bar R^2 = 1 - \frac{\sum \hat U_i^2 / (n - k)}{\sum y_i^2 / (n - 1)} \qquad\text{or}\qquad \bar R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
where k = the number of parameters in the model (including the intercept term)
As the number of explanatory variables increases, the adjusted R² is increasingly less than the unadjusted R². The adjusted R² (R̄²) can be negative, although R² is necessarily non-negative.
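The degrees-of-freedom penalty is easy to see numerically; the R² values and sample sizes in this sketch are made up for illustration:

```python
def adjusted_r2(R2, n, k):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k)."""
    return 1 - (1 - R2) * (n - 1) / (n - k)

# With n = 10 observations, adding a regressor (k: 3 -> 4) raises R2
# slightly, yet the adjusted R2 falls:
print(adjusted_r2(0.960, n=10, k=3))   # about 0.949
print(adjusted_r2(0.962, n=10, k=4))   # about 0.943
```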
Testing the Significance of Individual Parameters
The t test is used to test a hypothesis about any individual partial regression coefficient. The partial regression coefficient measures the change in the mean value of Y per unit change in the corresponding regressor, holding the other explanatory variables constant. The test statistic is the t ratio

$$t = \frac{\hat\beta_1}{se(\hat\beta_1)}$$

This is the observed (or sample) value of the t ratio, which we compare with the theoretical value of t. The theoretical values of t (at the chosen level of significance) are the critical values that define the boundaries of the acceptance and rejection regions. Suppose we test

H0: β1 = 0

The null hypothesis states that, holding X2 constant, X1 has no (linear) influence on Y.
If the computed t value exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may accept it (β1 is not significant at the chosen level of significance, and hence the corresponding regressor does not appear to contribute to the explanation of the variation in Y).
[Figure: the t distribution, showing the 95% acceptance region and the two 2.5% critical (rejection) regions in the tails.]
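A sketch of the mechanics of this test (the estimate, standard error, and sample size below are hypothetical; SciPy supplies the critical value):

```python
from scipy import stats

# Hypothetical estimate and standard error for beta_1
b1, se_b1 = 0.65, 0.20
t_obs = b1 / se_b1                       # observed t ratio for H0: beta_1 = 0

n, k = 10, 3                             # sample size and number of parameters
t_crit = stats.t.ppf(0.975, df=n - k)    # two-tailed 5% critical value

# Reject H0 if |t_obs| exceeds the critical value
print(t_obs, t_crit, abs(t_obs) > t_crit)
```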
For degrees of freedom higher than 8, the critical value of t (at the 5% level of significance, two-tailed) is approximately 2. Hence, as a rule of thumb, if the computed t ratio is smaller than 2 in absolute value we may conclude that the corresponding regressor does not exert any significant influence on the dependent variable. The test of the overall significance of the regression involves the joint hypothesis

H0: β1 = β2 = … = βk = 0
If the null hypothesis is true, then there is no linear relationship between Y and the regressors. The above joint hypothesis can be tested by the analysis of variance (AOV) technique. The AOV table is set out as follows:

Source of variation            Sum of squares    Degrees of freedom
Due to regression (ESS)        Σŷi²              k – 1
Due to residuals (RSS)         ΣÛi²              N – k
Total (total variation, TSS)   Σyi²              N – 1
Therefore, to undertake the test, first find the calculated value of F and compare it with the tabulated F. The calculated value of F can be obtained by using the following formula:

$$F = \frac{ESS/(k - 1)}{RSS/(N - k)} = \frac{\sum \hat y_i^2/(k - 1)}{\sum \hat U_i^2/(N - k)}$$

Decision Rule: If Fcalculated > Ftabulated (F(k – 1, N – k)), reject H0; otherwise you may accept it, where F(k – 1, N – k) is the critical F value at the chosen level of significance with (k – 1) numerator degrees of freedom and (N – k) denominator degrees of freedom.
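A small sketch of this decision rule, with hypothetical sums of squares (SciPy's F distribution supplies the tabulated value):

```python
from scipy import stats

# Hypothetical ANOVA quantities for a three-variable model
ESS, RSS = 8500.0, 350.0   # explained and residual sums of squares
k, N = 3, 10               # number of parameters, sample size

F_calc = (ESS / (k - 1)) / (RSS / (N - k))
F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=N - k)   # 5% critical value
print(F_calc, F_crit, F_calc > F_crit)             # reject H0 if F_calc > F_crit
```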
Note that there is a relationship between the coefficient of determination R² and the F test used in the analysis of variance:

$$F = \frac{R^2/(k - 1)}{(1 - R^2)/(N - k)}$$

[Figure: the density of the F distribution, with the 5% critical area in the right tail.]
When R² = 0, F is zero. The larger the R², the greater the F value. In the limit, when R² = 1, F is infinite. Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of the significance of R². Testing the null hypothesis H0: β1 = β2 = … = βk = 0 is equivalent to testing the hypothesis that R² = 0.
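The equivalence is easy to verify numerically; the sums of squares below are hypothetical:

```python
# Check that F computed from R2 equals the ANOVA F.
ESS, RSS = 8500.0, 350.0   # hypothetical sums of squares
TSS = ESS + RSS
R2 = ESS / TSS
k, N = 3, 10

F_from_anova = (ESS / (k - 1)) / (RSS / (N - k))
F_from_R2 = (R2 / (k - 1)) / ((1 - R2) / (N - k))
print(F_from_anova, F_from_R2)   # identical (both equal 85.0 here)
```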