Introduction to Econometrics_Module
March, 2023
1. INTRODUCTION .......... 5
Chapter Two .......... 14
2. The Classical Linear Regression Analysis: The Simple Linear Regression Models .......... 14
2.1.2. Terminology .......... 15
2.7.3. The Errors That We Can Make Using Hypothesis Tests .......... 39
Chapter Three .......... 43
3. The Classical Linear Regression Analysis: Multiple Linear Regression Models .......... 43
4.2. Assumption 1: The assumption of mean of the disturbance is zero (E(u) = 0) .......... 51
Appendix .......... 74
Course description
Econometrics is the quantitative application of statistical and mathematical methods, together with economic theory, to data in order to develop or test hypotheses in economics and to forecast future trends from historical data. Its objective is to quantify economic relationships
using available data and statistical techniques, and to interpret and use the resulting outcomes. So,
Econometrics is the application of statistical and mathematical methods to the analysis of
economic data, with the purpose of giving empirical content to economic theories and then
verifying or refuting them. Bridging the gap between theory and policy analysis requires
acquiring the practice of applying the concepts, theories and methods of Economics to policy
analysis. This course is designed to meet this challenge by providing insights on how the three
elements of Econometrics namely: economic theory, data and statistical procedures can be
combined, to provide useful information to policy analysts and decision makers. In this course,
practical exercises using econometric and statistical software such as SPSS, STATA, EViews and Excel will be conducted to equip students with the knowledge and skills needed to use software for data analysis.
The study of econometrics has become an essential part of every undergraduate course in
economics, and it is not an exaggeration to say that it is also an essential part of every
economist's training. This is because the importance of applied economics is constantly increasing, and the ability to quantify and evaluate economic theories and hypotheses constitutes, now more than ever, a bare necessity. Theoretical economics may suggest that there is a relationship between two or more variables, but applied economics demands both evidence that this relationship is a real one, observed in everyday life, and quantification of the relationship using actual data. This quantification of economic relationships using actual data is what econometrics provides.
Literally, econometrics means measurement in economics (from the Greek word "metrics"). However, econometrics includes all those statistical and mathematical techniques that are utilized in the analysis of economic data. The main aim of using these tools is to prove or
disprove particular economic propositions and models.
Econometrics, the result of a certain outlook on the role of economics, consists of the application of mathematical statistics to economic data to lend empirical support to the models constructed by mathematical economics and to obtain numerical results. Econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.
Econometrics may also be defined as the social science in which the tools of economic theory,
mathematics and statistical inference are applied to the analysis of economic phenomena.
Econometrics is concerned with the empirical determination of economic laws.
Econometrics differs from both mathematical statistics and economic statistics. An economic
statistician gathers empirical data, records them, tabulates them or charts them, and attempts to
describe the pattern in their development over time and perhaps detect some relationship
between various economic magnitudes. Economic statistics is mainly a descriptive aspect of
economics. It does not provide explanations of the development of the various variables and it
does not provide measurements of the coefficients of economic relationships.
Mathematical (or inferential) statistics deals with methods of measurement which are
developed on the basis of controlled experiments. But statistical methods of measurement are not
appropriate for a number of economic relationships because for most economic relationships
controlled or carefully planned experiments cannot be designed due to the fact that the nature of
relationships among economic variables are stochastic or random. Yet the fundamental ideas of
inferential statistics are applicable in econometrics, but they must be adapted to the problems of economic life. Econometric methods are adjusted so that they become appropriate for the measurement of economic relationships which are stochastic or random in nature.
Example: Economic theory postulates that the demand for a commodity depends on its price (P),
on the prices of other related commodities (Po), on consumers’ income(Y) and on tastes (t). This
is an exact relationship which can be written mathematically as:
Q = b1P + b2Po + b3Y + b4t
The above demand equation is exact. However, many more factors may affect demand. In
econometrics the influence of these ‘other’ factors is taken into account by the introduction into
the economic relationships of random variable. In our example, the demand function studied
with the tools of econometrics would be of the stochastic form:

Q = b1P + b2Po + b3Y + b4t + u

where u stands for the random factors which affect the quantity demanded.
Econometrics can be divided into two branches:
1. Theoretical Econometrics
2. Applied Econometrics
Theoretical Econometrics: is concerned with the development of appropriate methods for
measuring economic relationships specified by econometric models. In this aspect, econometrics
leans heavily on mathematical statistics. For example, one of the tools that are used extensively
is the method of least squares. It is the concern of theoretical econometrics to spell out the
assumptions of this method, its properties, and what happens to these properties when one or
more of the assumptions of the method are not fulfilled.
In applied Econometrics we use the tools of theoretical econometrics to study some special
field(s) of economics, such as the production function, consumption function, investment
function, demand and supply functions, etc.
Applied econometrics includes the applications of econometric methods to specific branches of
economic theory. It involves the application of the tools of theoretical econometrics for the
analysis of economic phenomena and forecasting economic behavior.
Although there are of course many different ways to go about the process of model building, a
logical and valid approach would be to follow the steps described in the figure below.

Figure: Steps involved in formulating an econometric model: general statement of the problem (steps 1a and 1b), collection of data (step 2), model estimation (step 3), statistical evaluation of the model (step 4, with a yes/no decision on adequacy), evaluation of the model from a theoretical perspective (step 5), and use of the model (step 6).
Step 1a and 1b: General statement of the problem. This will usually involve the formulation
of a theoretical model, or intuition from economic or financial theory that two or more variables
should be related to one another in a certain way. The model is unlikely to be able to completely
capture every relevant real-world phenomenon, but it should present a sufficiently good
approximation that it is useful for the purpose at hand.
Step 2: Collection of data relevant to the model. The data may be time series, cross-sectional, or panel data.
Panel data: these are the results of a repeated survey of a single (cross-sectional) sample in different periods of time. Example: employment data across individuals and over time.
Step 3: Choice of estimation method relevant to the model proposed in step 1. For example,
is a single equation or multiple equation technique to be used?
Step 4: Statistical evaluation of the model. What assumptions were required to estimate the
parameters of the model optimally? Were these assumptions satisfied by the data or the model?
Also, does the model adequately describe the data? If the answer is ‘yes’, proceed to step 5; if
not, go back to steps 1--3 and either reformulate the model, collect more data, or select a
different estimation technique that has less stringent requirements.
Step 5: Evaluation of the model from a theoretical perspective Are the parameter estimates of
the sizes and signs that the theory or intuition from step 1 suggested? If the answer is ‘yes’,
proceed to step 6; if not, again return to stages 1-3.
It is important to note that the process of building a robust empirical model is an iterative one,
and it is certainly not an exact science. Often, the final preferred model could be very different
from the one originally proposed, and need not be unique in the sense that another researcher
with the same data and the same initial theory could arrive at a different final specification.
Econometrics has three main goals:
Analysis, i.e. testing economic theory: Economists formulated the basic principles of
the functioning of the economic system using verbal exposition and applying a
deductive procedure. Economic theories thus developed in an abstract level were not
tested against economic reality. Econometrics aims primarily at the verification of
economic theories.
Policy making i.e. obtaining numerical estimates of the coefficients of economic
relationships for policy simulations. In many cases we apply the various econometric
techniques in order to obtain reliable estimates of the individual coefficients of the
economic relationships from which we may evaluate elasticity or other parameters of
economic theory (multipliers, technical coefficients of production, marginal costs,
marginal revenues, etc.) The knowledge of the numerical value of these coefficients is
very important for the decisions of firms as well as for the formulation of the
economic policy of the government. It helps to compare the effects of alternative
policy decisions.
Forecasting i.e. using the numerical estimates of the coefficients in order to forecast
the future values of economic magnitudes. In formulating policy decisions it is essential to be able to forecast the values of the relevant economic magnitudes.
Review questions
1. Define econometrics.
2. How does it differ from mathematical economics and statistics?
3. Why is econometrics a separate discipline?
4. Describe the main steps involved in any econometrics research.
5. Describe the types of data
6. Differentiate between economic and econometric model.
7. What are the goals of econometrics?
Broadly speaking, we may say Regression analysis is concerned with the study of the
dependence of one variable, the dependent variable, on one or more other variables, the
explanatory variables, with a view to estimating and/or predicting the (population) mean or
average value of the former in terms of the known or fixed (in repeated sampling) values of the
latter.
Although regression analysis deals with the dependence of one variable on other variables, it
does not necessarily imply causation. In the words of Kendall and Stuart, “A statistical
relationship, however strong and however suggestive, can never establish causal connection: our
ideas of causation must come from outside statistics, ultimately from some theory or other.”
2.1.2. Terminology
In the literature the terms dependent variable and explanatory variable are described variously.
A representative list is: explained variable / explanatory variable; predictand / predictor; regressand / regressor; response / stimulus; endogenous / exogenous; outcome / covariate; controlled variable / control variable (the first term in each pair refers to the dependent variable and the second to the explanatory variable).
The variables that we will generally encounter fall into four broad categories: ratio scale,
interval scale, ordinal scale, and nominal scale. It is important that we understand each.
Ratio Scale: For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 -
X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the
values along the scale. Therefore, comparisons such as X2 ≤ X1 or X2 ≥ X1 are meaningful. Most
economic variables belong to this category. E.g., it is meaningful to ask how big this year’s GDP
is compared with the previous year's GDP.
Interval Scale: A variable in this category satisfies the last two properties of the ratio-scale variable but not the first; that is, the distance between two values is meaningful, but their ratio is not. The classic example is time: the interval between the years 2000 and 1995 is a meaningful quantity, but the ratio 2000/1995 is not.
Ordinal Scale: A variable belongs to this category only if it satisfies the third property of the
ratio scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income
class (upper, middle, lower). For these variables the ordering exists but the distances between the
categories cannot be quantified.
Nominal Scale: Variables in this category have none of the features of the ratio scale variables.
Variables such as gender (male, female) and marital status (married, unmarried, divorced,
separated) simply denote categories.
It is clear that each conditional mean E(Y | Xi) is a function of Xi, where Xi is a given value of X.
Symbolically,

E(Y | Xi) = f(Xi)

where f(Xi) denotes some function of the explanatory variable X. E(Y | Xi) is a linear function of
Xi and is known as the conditional expectation function (CEF) or population regression
function (PRF) or population regression (PR) for short. It states merely that the expected value
of the distribution of Y given Xi is functionally related to Xi. In simple terms, it tells how the
mean or average response of Y varies with X.
We may assume that the PRF E(Y | Xi) is a linear function of Xi, say, of the type
E(Y | Xi) = α + β1Xi

where α and β1 are unknown but fixed parameters known as the regression coefficients.
Therefore, we can express the deviation of an individual Yi around its expected value as follows:
ui = Yi - E(Y | Xi)
or
Yi = E(Y | Xi) + ui
It is about time to face up to the sampling problems, for in most practical situations what we
have is but a sample of Y values corresponding to some fixed X’s. Therefore, the task now is to
estimate the PRF on the basis of the sample information.
Now, analogously to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample counterpart of the PRF may be written as:
Ŷi = α̂ + β̂1Xi

where Ŷi is read as “Y-hat” = estimator of E(Y | Xi),
α̂ = estimator of α, and
β̂1 = estimator of β1.
Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or
method that tells how to estimate the population parameter from the information provided by the
sample at hand. A particular numerical value obtained by the estimator in an application is
known as an estimate.
In its stochastic form, the SRF may be written as

Yi = α̂ + β̂1Xi + ûi

To sum up, our primary objective in regression analysis is to estimate the PRF

Yi = α + β1Xi + ui

on the basis of the SRF

Yi = α̂ + β̂1Xi + ûi

because more often than not our analysis is based upon a single sample from some population.
The deviations of the observations from the line may be attributed to several factors.
(1) Omission of variables from the function
In economic reality each variable is influenced by a very large number of factors.
However, not all the factors influencing a certain variable can be included in the
function for various reasons.
(2) Random behavior of the human beings
The scatter of points around the line may be attributed to an erratic element which is
inherent in human behavior. Human reactions are to a certain extent unpredictable
and may cause deviations from the normal behavioral pattern depicted by the line.
(3) Imperfect specification of the mathematical form of the model
We may have linearized a possibly nonlinear relationship. Or we may have left out of
the model some equations.
(4) Errors of aggregation
We often use aggregate data (aggregate consumption, aggregate income), in which
we add magnitudes referring to individuals whose behavior is dissimilar. In this case
we say that variables expressing individual peculiarities are missing.
(5) Errors of measurement
This refers to errors of measurement of the variables, which are inevitable due to the
methods of collecting and processing statistical information.
The first four sources of error render the form of the equation wrong, and they are
usually referred to as error in the equation or error of omission. The fifth source of
error is called error of measurement or error of observation.
To estimate the coefficients α and β1 we need observations on X, Y and u. Yet u is never observed like the other variables, and therefore in order to estimate the function Yi = α + β1Xi + ui,
we should guess the values of u, that is we should make some reasonable assumptions about the
shape of the distribution of each ui (its means, variance and covariance with other u’s). These
assumptions are guesses about the true, but unobservable, value of ui.
The linear regression model is based on certain assumptions, some of which refer to the distribution of the random variable u, some to the relationship between u and the explanatory variables, and some to the relationship between the explanatory variables themselves.
1. ui is a random real variable and has zero mean value: E(ui) = 0, or equivalently E(ui | Xi) = 0.
   This implies that for each value of X, u may assume various values, some positive and some negative, but on average zero.
   Further, E(Yi | Xi) = α + β1Xi gives the relationship between X and Y on the average, i.e. when X takes the value Xi, Y will on average take the value E(Yi | Xi).
2. The variance of ui is constant for all i, i.e., var(ui | Xi) = E(ui² | Xi) = σ², and is called the assumption of homoscedasticity (constant variance).
Thus far we have completed the work involved in the first stage of any econometric application,
namely we have specified the model and stated explicitly its assumptions. The next step is the
estimation of the model, that is, the computation of the numerical values of its parameters. The
linear relationship Yi = α + β1Xi + ui holds for the population of the values of X and Y, so that we could obtain the numerical values of α and β1 only if we could have all the possible values
of X, Y and u which form the population of these variables. Since this is impossible in practice,
we get a sample of observed values of Y and X, specify the distribution of the u’s and try to get
satisfactory estimates of the true parameters of the relationship. This is done by fitting a
regression line through the observations of the sample, which we consider as an approximation
to the true line.
The method of ordinary least squares is one of the econometric methods which enable us to find
the estimate of the true parameter and is attributed to Carl Friedrich Gauss, a German
mathematician. To understand this method, we first explain the least squares principle.
However, as noted in earlier, the PRF is not directly observable. We estimate it from the SRF:
Yi = α̂ + β̂1Xi + ûi

or

Yi = Ŷi + ûi

where Ŷi is the estimated (conditional mean) value of Yi.
But how is the SRF itself determined? To see this, let us proceed as follows. First, express the above equation as:

ûi = Yi - Ŷi = Yi - α̂ - β̂1Xi

which shows that the ûi (the residuals) are simply the differences between the actual and estimated Y values.
Now given n pairs of observations on Y and X, we would determine the SRF in such a manner
that it is as close as possible to the actual Y. To this end, we adopt the least-squares criterion,
which states that the SRF can be fixed in such a way that

Σûi² = Σ(Yi - Ŷi)² = Σ(Yi - α̂ - β̂1Xi)²

is as small as possible, where ûi² are the squared residuals.
The principle or the method of least squares chooses α̂ and β̂1 in such a manner that, for a given sample or set of data, Σûi² is as small as possible. In other words, for a given sample, the method of least squares provides us with unique estimates of α̂ and β̂1 that give the smallest possible value of Σûi².
The process of partial differentiation yields the following equations for estimating α and β1:

∂Σûi²/∂α̂ = -2Σ(Yi - α̂ - β̂1Xi) = 0
∂Σûi²/∂β̂1 = -2ΣXi(Yi - α̂ - β̂1Xi) = 0

Rearranging gives

ΣYi = nα̂ + β̂1ΣXi
ΣXiYi = α̂ΣXi + β̂1ΣXi²

where n is the sample size. These simultaneous equations are known as the normal equations. Solving them for α̂ and β̂1 gives

β̂1 = [nΣXiYi - (ΣXi)(ΣYi)] / [nΣXi² - (ΣXi)²] = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² = Σxiyi / Σxi²

α̂ = Ȳ - β̂1X̄

where X̄ and Ȳ are the sample means of X and Y, and where we define xi = Xi - X̄ and yi = Yi - Ȳ. The lowercase letters denote deviations from the mean values; the deviation form of β̂1 is obtained directly from the first expression by dividing both its numerator and denominator by n.
Note that, by making use of simple algebraic identities, the formula for estimating β1 can alternatively be expressed as

β̂1 = Σxiyi / Σxi² = Σxiyi / (ΣXi² - nX̄²)
The estimators obtained previously are known as the least-squares estimators, for they are
derived from the least-squares principle. We finally write the regression line equation as:
Ŷi = α̂ + β̂1Xi
Interpretation of estimates
Estimated intercept, α̂: the estimated average value of the dependent variable when the independent variable takes the value zero.
Estimated slope, β̂1: the estimated change in the average value of the dependent variable when the independent variable increases by one unit.
Ŷi gives the average relationship between Y and X, i.e. Ŷi is the average (estimated mean) value of Y given Xi.
We illustrate the econometric theory developed so far by considering the Keynesian consumption
function. As a test of the Keynesian consumption function, we use the sample data below.
Hypothetical data on weekly family consumption expenditure Y and weekly family income X
Y(Birr) X(Birr)
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260
A. Determine the regression equation.
β̂1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² = 16800 / 33000 = 0.509

α̂ = Ȳ - β̂1X̄ = 111 - 0.509 × 170 = 24.47

Ŷi = α̂ + β̂1Xi = 24.47 + 0.509Xi

B. If X is 300, then Ŷi = 24.47 + 0.509 × 300 = 177.17
C. The estimated regression line is interpreted as follows: each point on the regression line gives an estimate of the expected or mean value of Y corresponding to the chosen X value; that is, Ŷi is an estimate of E(Y | Xi). The value of β̂1 = 0.509, which measures the slope of the line, shows that, within the sample range of X between Birr 80 and Birr 260 per week, as X increases by Birr 1, the estimated increase in the mean or average weekly consumption expenditure amounts to about 51 cents. The value of α̂ = 24.47, which is the intercept of the line, indicates the estimated average level of weekly consumption expenditure when weekly income is zero.
Given the assumptions of the classical linear regression model, the least-squares estimates
possess some ideal or optimum properties. These properties are contained in the well-known
Gauss–Markov theorem. An estimator, say the OLS estimator β̂1, is said to be a best linear unbiased estimator (BLUE) of β1 if the following hold:
1. It is linear, that is, a linear function of a random variable, such as the dependent variable
Y in the regression model.
2. It is unbiased, that is, its average or expected value, E(β̂1), is equal to the true value, β1.
3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased
estimator with the least variance is known as an efficient estimator.
In the regression context it can be proved that the OLS estimators (α̂, β̂1) are BLUE.
After the estimation of the parameters and the determination of the least square regression line,
we need to know how ‘good’ is the fit of this line to the sample observation of Y and X, that is to
say we need to measure the dispersion of the observations around the regression line. It is clear
that if all the observations were to lie on the regression line, we would obtain a “perfect fit”, but
this is rarely the case. Hence, the knowledge of the dispersion of the observation around the
regression line is essential because the closer the observations to the line, the better the goodness
of fit. That is the better is the explanation of the variations of Y by the changes in the explanatory
variables. In general the coefficient of determination R2 is a summary measure that tells how
well the sample regression line fits the data.
Figure: The sample regression line (SRF) and the decomposition of the deviation of an observed Yi from its mean into the part explained by the line and the unexplained part (Yi - Ŷi) due to the error.
By fitting the line Ŷi = α̂ + β̂1Xi we try to obtain the explanation of the variation of the dependent variable Y produced by the changes of the explanatory variable X. However, the fact that the observations deviate from the estimated line shows that the regression line explains only a part of the total variation of the dependent variable. A part of the variation, defined as ûi = Yi - Ŷi, remains unexplained. Note the following:
a) We may compute the total variation of the dependent variable by comparing each value of Y to the mean value Ȳ and adding all the resulting squared deviations:

[Total variation in Y] = Σyi² = Σ(Yi - Ȳ)²
b) In the same way we define the deviation of the regressed values (i.e., the estimates from the line), Ŷi, from the mean value: ŷi = Ŷi - Ȳ. This is the part of the total variation of Yi which is explained by the regression line. Thus, the sum of the squares of these deviations is the total variation explained by the regression line:

[Explained variation] = Σŷi² = Σ(Ŷi - Ȳ)²
c) Recall that we have defined the error term ûi as the difference ûi = Yi - Ŷi. This is the part of the variation of the dependent variable which is not explained by the regression line and is attributed to the existence of the disturbance variable u. Thus the sum of the squared residuals gives the total unexplained variation of the dependent variable Y around its mean. This is given by

[Unexplained variation] = Σûi² = Σ(Yi - Ŷi)²
Adding the explained and the unexplained parts, we obtain

Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σûi²
This shows the total variation in the observed Y values about their mean values can be
partitioned in to two parts. One attributed to the regression line and the other to random forces
because not all actual Y observations lie on the fitted line. In other words total sum of square
(TSS) is equal to explained sum of square (ESS) plus residuals sum of squares (RSS).
Symbolically,
Σyi² = Σŷi² + Σûi²

Total variation = Explained variation + Unexplained variation
Note that because an OLS estimator minimizes the sum of squared residuals (i.e., the
unexplained variation) it automatically maximizes R2. Thus maximization of R2 as a criterion of
an estimator is formally identical to the least squares criterion. Dividing the decomposition through by TSS, we obtain

1 = ESS/TSS + RSS/TSS

1 = Σ(Ŷi - Ȳ)²/Σ(Yi - Ȳ)² + Σ(Yi - Ŷi)²/Σ(Yi - Ȳ)²
We now define R² as

R² = Σ(Ŷi - Ȳ)² / Σ(Yi - Ȳ)² = Σŷi² / Σyi² = ESS/TSS
R2 measures the proportion of the total variation in Y explained by the regression model.
Note that if we are working with cross-section data, an R² value equal to 0.5 may be a good fit, but for time series data 0.5 may be too low. This means that there is no hard-and-fast rule as to how high R² should be. Generally, however, the higher the value of R², the better the fit.
• Any set of regression estimates of α and β are specific to the sample used in their estimation. Recall that the estimators of α and β from the sample data (α̂ and β̂) are given by

β̂ = (Σxy - n·x̄·ȳ) / (Σx² - n·x̄²)   and   α̂ = ȳ - β̂·x̄
• What we need is some measure of the reliability or precision of the estimators (α̂ and β̂).
The precision of the estimate is given by its standard error and it can be shown to be
given by
SE(α̂) = s·√[ Σx² / (n·Σ(x - x̄)²) ] = s·√[ Σx² / (n·(Σx² - n·x̄²)) ]

SE(β̂) = s·√[ 1 / Σ(x - x̄)² ] = s·√[ 1 / (Σx² - n·x̄²) ]

where s is an estimate of the standard deviation of the disturbances. A natural candidate would be

s² = (1/n)·Σu²
• Unfortunately this is not workable since u is not observable. We can use the sample
counterpart to u, which is û:

s² = (1/n)·Σû²

But this estimator is a biased estimator of σ². An unbiased estimator, whose square root is known as the standard error of the regression, is

s = √[ Σû² / (n - 2) ]
1. Both SE(α̂) and SE(β̂) depend on s² (or s). The greater the variance s², the more
dispersed the errors are about their mean value and therefore the more dispersed y will be
about its mean value.
2. The sum of the squares of x about their mean appears in both formulae. The larger the
sum of squares, the smaller the coefficient variances.
3. The larger the sample size, n, the smaller will be the coefficient variances. n appears explicitly in SE(α̂) and implicitly in SE(β̂); it appears implicitly since the sum Σ(x - x̄)² runs from t = 1 to n.
4. The term Σx² appears in SE(α̂). The reason is that Σx² measures how far the points are away from the y-axis.
• Assume we have the following data calculated from a regression of y on a single variable x and a constant over 22 observations.

Data: Σxy = 830102, n = 22, x̄ = 416.5, ȳ = 86.65, Σx² = 3919654, Σû² = 130.6
• Calculations:

β̂ = (Σxy - n·x̄·ȳ) / (Σx² - n·x̄²) = 0.35
α̂ = ȳ - β̂·x̄ = 86.65 - 0.35 × 416.5 = -59.12

• We write ŷ = α̂ + β̂x, i.e. ŷ = -59.12 + 0.35x
SE(regression): s = √[ Σû² / (n - 2) ] = √(130.6 / 20) = 2.55

SE(α̂) = 2.55 × √[ 3919654 / (22 × (3919654 - 22 × 416.5²)) ] = 3.35

SE(β̂) = 2.55 × √[ 1 / (3919654 - 22 × 416.5²) ] = 0.0079
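These standard-error calculations can be checked with a short Python sketch; the numbers below are simply the summary statistics quoted above, not new data.

import numpy as np

n          = 22
sum_xy     = 830102.0
xbar, ybar = 416.5, 86.65
sum_x2     = 3919654.0
rss        = 130.6                                                  # sum of squared residuals

beta_hat  = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)   # about 0.35
alpha_hat = ybar - beta_hat * xbar                                  # about -59.1

s = np.sqrt(rss / (n - 2))                                          # standard error of the regression
se_alpha = s * np.sqrt(sum_x2 / (n * (sum_x2 - n * xbar ** 2)))
se_beta  = s * np.sqrt(1.0 / (sum_x2 - n * xbar ** 2))

print(round(beta_hat, 4), round(alpha_hat, 2), round(s, 3))
print(round(se_alpha, 3), round(se_beta, 4))

Tiny differences from the rounded figures in the text (2.55, 3.35, 0.0079) arise only because the text rounds β̂ and s before computing the remaining quantities.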
Under the classical assumptions, if the true variance of the disturbances were known, the standardized estimators would follow the standard normal distribution:

(α̂ - α)/√var(α̂) ~ N(0, 1)   and   (β̂ - β)/√var(β̂) ~ N(0, 1)

Replacing the unknown variances by their estimates (the standard errors), the statistics follow the t-distribution with n - 2 degrees of freedom:

(α̂ - α)/SE(α̂) ~ t(n-2)   and   (β̂ - β)/SE(β̂) ~ t(n-2)
2.7.1. Testing Hypotheses: The Test of Significance Approach
Figure: The normal distribution and the t-distribution compared; the t-distribution has fatter tails but approaches the normal as the degrees of freedom increase.
5. Perform the test: if the hypothesised value of β (β*) lies outside the confidence interval, then reject the null hypothesis that β = β*; otherwise, do not reject the null.
Confidence Intervals Versus Tests of Significance
Under the test of significance approach, we do not reject H0: β = β* if

-tcrit ≤ (β̂ - β*)/SE(β̂) ≤ +tcrit

• Rearranging, we would not reject if

β̂ - tcrit × SE(β̂) ≤ β* ≤ β̂ + tcrit × SE(β̂)

• But this is just the rule under the confidence interval approach.
Example
• Using the regression results above,

ŷ = 20.3 + 0.5091x,   n = 22
     (14.38)  (0.2561)

where the figures in parentheses are the standard errors of the coefficient estimates.
• Using both the test of significance and confidence interval approaches, test the hypothesis
that β = 1 against a two-sided alternative.
• The first step is to obtain the critical value. We want tcrit = t20;5%
Determining the rejection region: with 20 degrees of freedom and a 5% two-sided test, the critical values are ±2.086, so the rejection regions are t < -2.086 and t > +2.086.
test statistic = (β̂ - β*)/SE(β̂) = (0.5091 - 1)/0.2561 = -1.917
Do not reject the null hypothesis (H0) since test stat lies within non-rejection region
Confidence interval approach
(β̂ - tcrit × SE(β̂), β̂ + tcrit × SE(β̂))
= (0.5091 - 2.086 × 0.2561, 0.5091 + 2.086 × 0.2561)
= (-0.0251, 1.0433)

Since the hypothesised value β* = 1 lies within the confidence interval, do not reject the null hypothesis (H0).
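Both approaches can be carried out in a few lines of Python. This is a minimal sketch that assumes scipy is available for the critical value and reuses the estimates β̂ = 0.5091, SE(β̂) = 0.2561 and 20 degrees of freedom quoted above.

from scipy import stats

beta_hat, se_beta, beta_star, df = 0.5091, 0.2561, 1.0, 20

t_stat = (beta_hat - beta_star) / se_beta      # test of significance approach, about -1.917
t_crit = stats.t.ppf(0.975, df)                # two-sided 5% critical value, about 2.086
print(abs(t_stat) > t_crit)                    # False: do not reject H0 at the 5% level

lower = beta_hat - t_crit * se_beta            # confidence interval approach
upper = beta_hat + t_crit * se_beta
print(round(lower, 4), round(upper, 4))        # (-0.0251, 1.0433): 1 lies inside, do not reject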
Changing the Size of the Test
But note that we looked at only a 5% size of test. In marginal cases (e.g. H0: β = 1), we may get
a completely different answer if we use a different size of test. This is where the test of
significance approach is better than a confidence interval.
For example, say we wanted to use a 10% size of test. Using the test of significance approach,
test statistic = (β̂ - β*)/SE(β̂) = (0.5091 - 1)/0.2561 = -1.917, as above. The only thing that changes is the critical t-value.
Changing the Size of the Test: The New Rejection Regions
With a 10% two-sided test and 20 degrees of freedom, the critical values become ±1.725, so the rejection regions are t < -1.725 and t > +1.725. The test statistic is still

test statistic = (β̂ - β*)/SE(β̂) = -1.917

which now lies in the rejection region, so H0: β = 1 would be rejected at the 10% level.
If the test is H0: β = 0 against
H1: β ≠ 0,
i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a
t-ratio test:
Since β* = 0, the test statistic reduces to

test statistic = β̂ / SE(β̂)
• The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
2.7.3. The Errors That We Can Make Using Hypothesis Tests
We usually reject H0 if the test statistic is statistically significant at a chosen significance
level. There are two possible errors we could make:
1. Rejecting H0 when it was really true. This is called a type I error.
2. Not rejecting H0 when it was in fact false. This is called a type II error.
So there is always a trade-off between type I and type II errors when choosing a significance
level. The only way we can reduce the chances of both is to increase the sample size.
Review Questions
2) Explain, with the use of equations, the difference between the sample regression function
and the population regression function
3) Differentiate simple and multiple linear regression models
5) Econometrics deals with the measurement of economic relationships which are stochastic
or random. The simplest form of economic relationships between two variables X and Y
can be represented by:
Yi = α + β1Xi + ui

where α and β1 are regression parameters and ui is the stochastic disturbance term. What are the reasons for the insertion of the u-term in the model?
6) The following data refer to the demand for money (M) and the rate of interest (R) in eight different economies:
M (In billions) 56 50 46 30 20 35 37 61
R% 6.3 4.6 5.1 7.3 8.9 5.3 6.7 3.5
A. Assuming a relationship M = α + βR + ui, obtain the OLS estimates of α and β.
B. Interpret the values of the parameter estimates α̂ and β̂.
C. If in a 9th economy the rate of interest is R = 8.1, predict the demand for money (M) in that economy.
D. Test the statistical significance of the slope estimate (H0: β = 0) at the 1%, 5% and 10% levels of significance.
E. Construct a 95% confidence interval for the true slope(coefficient)
7) Are hypotheses tested concerning the actual values of the coefficients (i.e. β) or their
estimated values (i.e. β̂), and why?
The simple linear regression model was assumed implicitly that only one independent variable
(X) affects the dependent variable (Y). But economic theory is seldom so simple for; a number
of other variables are also likely to affect the dependent variable. Therefore, we need to extend
our simple two-variable regression model to cover models involving more than two variables.
Adding more variables leads us to the discussion of multiple linear regression models, that is,
models in which the dependent variable, or regressand, Y depends on two or more explanatory
variables, or regressors.
The general linear regression model with k explanatory variables is of the form
Y = α + β1X1 + β2X2 + ... + βkXk + ui
There are K parameters to be estimated (K = k+1). Clearly the system of normal equations will
consist of K equations, in which the unknowns are the parameters α, β1, β2, ..., βk.
The simplest possible multiple regression model is three-variable regression, with one dependent
variable and two explanatory variables.
In this part we shall extend the simple linear regression model to relationships with two
explanatory variables and consequently to relationships with any number of explanatory
variables.
The population regression model with two explanatory variables is given as:
Y = α + β1X1 + β2X2 + ui

α is the intercept term, which gives the average value of Y when X1 and X2 are zero.
β1 and β2 are called the partial slope coefficients, or partial regression coefficients.
β1 measures the change in the mean value of Y resulting from a unit change in X1, given X2 (i.e. holding the value of X2 constant); equivalently, β1 measures the direct (net) effect of a unit change in X1 on the mean value of Y. β2 is interpreted analogously.
To complete the specification of our simple model we need some assumptions about the random
variable u. These assumptions are the same as in the single explanatory variable model
developed previously. That is:
Homoscedasticity, or var(ui) = σ²
Zero covariance between ui and each X variable, or cov(ui, X1) = cov(ui, X2) = 0
No exact collinearity between X1 and X2
The assumption of no collinearity is a new one and means the absence of possibility of one of the
explanatory variables being expressed as a linear combination of the other. Existence of exact
linear dependence between X1 and X2 would mean that we have only one independent variable in our model rather than two. If such a regression is estimated there is no way to estimate the separate influence of X1 (β1) and X2 (β2) on Y, since such a regression gives us only the combined influence of X1 and X2 on Y. Suppose, for example, that X2 = 2X1. Then

Y = α + β1X1 + β2X2 + ui
  = α + β1X1 + β2(2X1) + ui
  = α + (β1 + 2β2)X1 + ui
  = α + γX1 + ui,   where γ = β1 + 2β2
This assumption does not guarantee that there will be no correlations among the explanatory variables; it only means that the correlations are not exact or perfect, as it is almost impossible to find two or more (economic) variables that are not correlated to some extent. Likewise, the assumption does not rule out non-linear relationships among the X's.
Having specified our model, we next use sample observations on Y, X1 and X2 to obtain estimates of the true parameters α, β1 and β2:

Ŷ = α̂ + β̂1X1 + β̂2X2

where α̂, β̂1 and β̂2 are estimates of the true parameters α, β1 and β2 of the relationship.
As before, the estimates will be obtained by minimizing the sum of squared residuals
Σûi² = Σ(Yi - Ŷi)² = Σ(Yi - α̂ - β̂1X1 - β̂2X2)²

A necessary condition for this expression to assume a minimum value is that its partial derivatives with respect to α̂, β̂1 and β̂2 be equal to zero:

∂Σ(Yi - α̂ - β̂1X1 - β̂2X2)² / ∂α̂ = 0
∂Σ(Yi - α̂ - β̂1X1 - β̂2X2)² / ∂β̂1 = 0
∂Σ(Yi - α̂ - β̂1X1 - β̂2X2)² / ∂β̂2 = 0

Expanding these conditions gives the normal equations:

ΣYi = nα̂ + β̂1ΣX1 + β̂2ΣX2
ΣX1Yi = α̂ΣX1 + β̂1ΣX1² + β̂2ΣX1X2
ΣX2Yi = α̂ΣX2 + β̂1ΣX1X2 + β̂2ΣX2²
From the solution of this system (by any method, for example using determinants) we obtain values for α̂, β̂1 and β̂2. Alternatively, working with the normal equations expressed in deviation form,

Σx1yi = β̂1Σx1² + β̂2Σx1x2
Σx2yi = β̂1Σx1x2 + β̂2Σx2²

the following formulae, in which the variables are expressed in deviations from their means, may be obtained for the parameter estimates:

α̂ = Ȳ - β̂1X̄1 - β̂2X̄2

β̂1 = [(Σx1yi)(Σx2²) - (Σx2yi)(Σx1x2)] / [(Σx1²)(Σx2²) - (Σx1x2)²]

β̂2 = [(Σx2yi)(Σx1²) - (Σx1yi)(Σx1x2)] / [(Σx1²)(Σx2²) - (Σx1x2)²]

where x1 = X1 - X̄1, x2 = X2 - X̄2 and yi = Yi - Ȳ.
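In practice these normal equations are solved numerically rather than by hand. The following Python sketch uses purely illustrative data (the numbers and variable names are assumptions, not taken from the module) and obtains α̂, β̂1 and β̂2 by least squares.

import numpy as np

# Illustrative observations on Y and two regressors X1 and X2
Y  = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 22.0, 25.0])
X1 = np.array([ 2.0,  3.0,  4.0,  4.0,  6.0,  7.0,  8.0,  9.0])
X2 = np.array([ 1.0,  1.0,  2.0,  3.0,  3.0,  4.0,  4.0,  5.0])

X = np.column_stack([np.ones_like(Y), X1, X2])   # design matrix with an intercept column
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)     # least-squares solution of the normal equations

alpha_hat, beta1_hat, beta2_hat = coef
print(round(alpha_hat, 3), round(beta1_hat, 3), round(beta2_hat, 3))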
H0 : β3 = 2
H0 : β3 + β4 = 1
H0 : β3 + β4 = 1 and β5 = 1
H0 : β2β3 = 1
3. Which would you expect to be bigger – the unrestricted residual sum of squares or the
restricted residual sum of squares, and why?
4. What are the most common units of R2?
This assumption is imposed by the stochastic nature of economic relationships, which it would otherwise be impossible to estimate with the ordinary rules of mathematics. The assumption
implies that the observations of Y and X must be scattered around the line in a random way (and hence the estimated line Ŷ = α̂ + β̂1X is a good approximation of the true line). This defines the relationship connecting Y and X 'on the average'. The alternative possible assumptions are either
E(u) > 0 or E(u) < 0. Assume that for some reason the u's did not have an average value of zero, but tended, most of them, to be positive. This would imply that the observations of Y and X would lie above the true line.
It can be shown that by using these observations we would get a bad estimate of the true line. If
the true line lies below or above the observations, the estimated line would be biased
Note that there is no test for the verification of this assumption because the assumption E(u) = 0
is forced upon us if we are to establish the true relationship. That is, we set E(u) = 0 at the outset
of our estimation procedure. Its plausibility should be examined in each particular case on a
priori grounds. In any econometric application we must be sure that the following things are
fulfilled so as to be safe from violating the assumption of E(u) = 0
All the important variables have been included into the function.
There are no systematically positive or systematically negative errors of measurement in
the dependent variable.
4.3. Assumption 2: The assumption of homoscedasticity (Var(ut) = σ² < ∞)
The assumption of homoscedasticity (or constant variance) about the random variable u is that its
probability distribution remains the same over all observations of X, and in particular that the
variance of each u i is the same for all values of the explanatory variable. Symbolically we have
Var(ui) = σ²

If the errors do not have a constant variance, we say that they are heteroscedastic. Note that if σu² is not constant but its value depends on Xi, we may write σui² = f(Xi): as X increases, so does the variance of u.

Figure: Increasing variance of u. In a plot of consumption against income, the spread of the observations around the regression line is larger for high-income than for low-income households.
Furthermore, suppose we have a cross-section sample of family budget from which we want to
measure the savings function. That means Saving = f(income). In this case the assumption of
constant variance of the u’s is not appropriate, because high-income families show a much
greater variability in their saving behavior than do low income families. Families with high
income tend to stick to a certain standard of living and when their income falls they cut down
their savings rather than their consumption expenditure. But this is not the case in low income
families. Hence, the variance of ui’s increase as income increases.
Note, however, that heteroscedasticity is more a problem of cross-sectional data than of time series data; that is, the problem is more serious in cross-section data.
Heteroscedasticity can also arise for several reasons. The first is the presence of outliers (i.e., extreme values compared to the majority of observations on a variable). The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of the regression analysis.
Another source of heteroscedasticity arises from violating the assumption that the regression model is correctly specified. Very often what looks like heteroscedasticity may be due to the fact
that some important variables are omitted from the model. In such situation the residuals
obtained from the regression may give the distinct impression that the error variance may not be
constant. But if the omitted variables are included in the model, the impression may disappear.
In summary, we may say that on a priori grounds there are reasons to believe that the assumption of homoscedasticity may often be violated in practice. It is therefore important to examine the consequences of heteroscedasticity.
a. If u is heteroscedastic, the OLS estimates do not have the minimum variance property in the class of unbiased estimators; that is, they are inefficient in small samples. Furthermore, they are also inefficient in large samples.
b. The coefficient estimates would still be statistically unbiased; that is, the expected value of each estimate equals the corresponding true parameter value.
c. The prediction (of Y for a given value of X) would be inefficient because of high variance.
This is because the variance of the prediction includes the variances of u and of the
parameter estimates, which are not minimum due to the incidence of heteroscedasticity.
In any case, how does one detect whether the problem really exists? The usual first step in attacking this problem is to determine whether or not heteroscedasticity actually exists. There are several tests for this, most of which are based on the examination of the OLS residuals (i.e., ûi). Among the many formal tests, we will discuss the Goldfeld-Quandt test and White's test.
The Goldfeld-Quandt test is applicable to large samples. The observations must be at least twice as many as the parameters to be estimated. The test assumes normally distributed and serially independent disturbance terms, ui. Suppose we wish to assess whether heteroscedasticity is present. The test proceeds as follows:
1. Split the total sample of length N into two sub-samples of length N1 and N2. The
regression model is estimated on each sub-sample and the two residual variances are
calculated.
2. The null hypothesis is that the variances of the disturbances are equal:
H0: σ1² = σ2²
3. The test statistic, denoted GQ, is simply the ratio of the two residual variances where the
larger of the two variances must be placed in the numerator.
GQ = s1² / s2²
4. The test statistic is distributed as an F(N1-k, N2-k) under the null of homoscedasticity.
And the null of a constant variance is rejected if the test statistic exceeds the critical
value.
5. If GQ > F(N1-k, N2-k) we conclude that there is heteroscedasticity (that is, we reject the null hypothesis of no difference between the variances of the u's in the two sub-samples). If GQ < F(N1-k, N2-k), we conclude that the u's are homoscedastic (in other words, we do not reject the null hypothesis of equal variances).
White’s general test for heteroscedasticity is one of the best approaches because it makes few
assumptions about the form of the heteroscedasticity.
1) Suppose the estimated model is Yi = α + β1x1 + β2x2 + ui, and we want to test whether Var(u) = σ² is constant. We estimate the model by OLS and obtain the residuals, û.
2) Then run the auxiliary regression(Regress the squared residuals on a constant, the
original regressors, the original regressors squared and, if enough data, the cross-products
of the X’s), and get R2
û² = γ0 + γ1x1 + γ2x2 + γ3x1² + γ4x2² + γ5x1x2 + v
3) Obtain R2 from the auxiliary regression and multiply it by the number of observations, N.
It can be shown that, under the null hypothesis of homoscedasticity,

N·R² ~ χ²(k)

where k is the number of regressors in the auxiliary regression, excluding the constant term.
4) If the χ² test statistic from step 3 is greater than the corresponding critical value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic.
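The steps of White's test translate directly into code. The sketch below, for a model with two regressors, is written out with numpy rather than a packaged routine and follows the auxiliary-regression recipe above.

import numpy as np
from scipy import stats

def white_test(y, x1, x2):
    n = len(y)
    X = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ b) ** 2                                    # squared OLS residuals

    # Auxiliary regression: u^2 on a constant, the regressors, their squares and cross-product
    Z = np.column_stack([np.ones(n), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    fitted = Z @ g
    r2 = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

    lm = n * r2                                              # N * R^2 ~ chi-square(k) under H0
    k = Z.shape[1] - 1                                       # regressors excluding the constant
    p_value = 1 - stats.chi2.cdf(lm, k)
    return lm, p_value                                       # small p-value: reject homoscedasticity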
C. Remedial Measures: Solutions for Hetroscedastic Disturbances
As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but they are no longer efficient, not even asymptotically (i.e., in large samples). This lack of efficiency makes the usual hypothesis testing procedures of dubious value. Therefore, remedial measures are clearly called for. When heteroscedasticity is established, the following approaches may be used.
1. If the form (i.e. the cause) of the heteroscedasticity is known, then we can use an estimation
method which takes this into account (called generalised least squares, GLS). A simple
illustration of GLS is as follows: Suppose that the error variance is related to another variable
i.e Suppose that we assume the error variance is proportional to Xi2. That is,
E(ui²) = σ²Xi²
If it is believed that the variance of ui is proportional to the square of the explanatory variable X, one may transform the original model as follows. Divide the original model through by Xi to obtain
Yi/Xi = α/Xi + β1 + ui/Xi
      = α·(1/Xi) + β1 + Vi

where Vi is the transformed disturbance term, equal to ui/Xi. Now it is easy to verify that

E(Vi²) = E(ui/Xi)² = (1/Xi²)·E(ui²) = (1/Xi²)·σ²Xi² = σ²

Thus the variance of Vi is homoscedastic and one may proceed to apply OLS to the transformed equation. Notice that in the transformed regression the intercept term β1 is the slope coefficient of the original equation and the slope coefficient α is the intercept term of the original model.
Therefore, to get back to the original model we shall have to multiply the estimated transformed equation by Xi.
2. Given the model Yi = α + β1Xi + ui, suppose that we assume the error variance to be proportional to Xi. That is,

E(ui²) = σ²Xi
In this case the original model can be transformed by dividing it through by √Xi. That is,

Yi/√Xi = α/√Xi + β1·Xi/√Xi + ui/√Xi
       = α·(1/√Xi) + β1·√Xi + Vi,   where Vi = ui/√Xi and Xi > 0

Given the assumption E(ui²) = σ²Xi, one can readily verify that E(Vi²) = σ², a homoscedastic situation. That is,

Var(Vi) = E(Vi²) = E(ui/√Xi)² = (1/Xi)·E(ui²) = (1/Xi)·σ²Xi = σ²
Therefore, one may proceed to apply OLS to the transformed equation. Note an important feature
of the transformed model: It has no intercept term. Therefore, one will have to use the regression
through the origin model to estimate α and β1. Having run the regression on the transformed model, one can get back to the original model simply by multiplying it by √Xi.
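The transformation just described is easy to apply in code. A minimal Python sketch, under the assumption E(ui²) = σ²Xi and with purely illustrative data, divides every term by √Xi and runs the regression through the origin on the transformed variables.

import numpy as np

# Illustrative data; assume Var(u_i) is proportional to X_i
X = np.array([ 4.0,  9.0, 16.0, 25.0, 36.0, 49.0, 64.0, 81.0])
Y = np.array([ 6.0, 11.0, 14.0, 20.0, 23.0, 30.0, 33.0, 41.0])

w = np.sqrt(X)
Y_star = Y / w                                  # transformed dependent variable Y / sqrt(X)
Z = np.column_stack([1.0 / w, w])               # regressors 1/sqrt(X) and sqrt(X); no intercept

coef, *_ = np.linalg.lstsq(Z, Y_star, rcond=None)
alpha_hat, beta1_hat = coef                     # these are the coefficients of the ORIGINAL model
print(round(alpha_hat, 3), round(beta1_hat, 3))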
A logarithmic specification such as ln Yi = α + β1 ln Xi + ui also very often reduces heteroscedasticity when compared with the regression in levels, Yi = α + β1Xi + ui.
This result arises because log transformation compresses the scales in which the variables are
measured. For example log transformation reduces a ten-fold difference between two values
(such as between 8 and 80) into a two-fold difference (because ln(80) = 4.32 and ln(8) = 2.08)
To conclude, the remedial measures based on transformation explained earlier show that we are essentially speculating about the nature of σi². Note also that the OLS estimators obtained from the transformed equation are BLUE. Which of the transformations discussed will work depends on the nature of the problem and the severity of heteroscedasticity. Moreover, we may not know a priori which of the X variables should be chosen for transforming the data in a multiple regression model. In addition, the log transformation is not applicable if some of the Y and X values are zero or negative. Besides, t tests, F tests, etc. are valid only in large samples when the regression is conducted with transformed variables.
If the assumption of no serial correlation among the disturbances (cov(ui, uj) = 0 for i ≠ j) is violated, the disturbances are said to be autocorrelated. This could arise for several reasons.
Figure: Positive autocorrelation. Plots of the residuals ût against time and against ût-1 show long runs of residuals with the same sign and a clear positive relationship between successive residuals.

Figure: No autocorrelation. Plots of ût against time and against ût-1 show no systematic pattern.
Consequences of Autocorrelation
When the disturbance term exhibits serial correlation, the values as well as the standard errors of the parameter estimates are affected.
i) If disturbances are correlated, the previous values of the disturbances have some information to convey about the current disturbances. If this information is ignored, it is clear that the sample data are not being used with maximum efficiency. However, the estimates of the parameters are not statistically biased even when the residuals are serially correlated; that is, the OLS parameter estimates are statistically unbiased in the sense that their expected value is equal to the true parameter.
ii) The variance of the random term u may be seriously underestimated. In particular, the underestimation of the variance of u will be more serious in the case of positive autocorrelation of the error term (ut). With positive first-order autocorrelated errors, fitting an OLS estimating line may give an estimate quite wide of the mark. The high variation in these estimates will cause the variance of the β̂'s to be greater than it would have been had the errors been distributed randomly.
iii) The prediction based on ordinary least squares estimate will be inefficient with
autocorrelated errors. This is because of having a larger variance as compared with
predictions based on estimates obtained from other econometric techniques. Recall that the
variance of the forecast depends on the variances of the coefficient estimates and the variance of u.
Note that since the population disturbances ut cannot be observed directly, we use their proxy, the residuals ût, which can be obtained from the usual OLS procedure. The examination of the ût can provide useful information not only about autocorrelation but also about heteroscedasticity, model inadequacy, or specification bias.
A. Durbin-Watson d Test
The most celebrated test for detecting serial correlation is the one developed by statisticians
Durbin and Watson. It is popularly known as the Durbin-Watson d-Statistic which is defined as
DW = Σt=2..n (ût - ût-1)² / Σt=1..n ût²

which is simply the ratio of the sum of squared differences in successive residuals to the residual sum of squares, RSS. Note that in the numerator of the d statistic the number of observations is n - 1 because one observation is lost in taking successive differences. Expanding the above formula allows us to obtain the approximation

DW ≈ 2(1 - ρ̂)

where ρ̂ is the estimated first-order autocorrelation coefficient of the residuals; since -1 ≤ ρ̂ ≤ 1, the DW statistic lies between 0 and 4.
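Computing the statistic from a residual series is straightforward; the residual values in this Python sketch are purely illustrative.

import numpy as np

def durbin_watson(resid):
    # DW = sum of squared successive differences / residual sum of squares
    diff = np.diff(resid)                       # u_t - u_{t-1}, for t = 2, ..., n
    return (diff ** 2).sum() / (resid ** 2).sum()

u_hat = np.array([0.5, -1.0, 1.2, -0.8, 0.3, -0.6, 0.9, -1.1])   # illustrative residuals
print(round(durbin_watson(u_hat), 3))
# Values near 2 suggest no autocorrelation; near 0, positive; near 4, negative autocorrelation.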
Although they are not always stated routinely, it is important to note the assumptions underlying the d-statistic:
a) the regression model includes an intercept term;
b) the explanatory variables are non-stochastic (fixed in repeated sampling);
c) the disturbances are generated by the first-order autoregressive scheme ut = ρ·ut-1 + εt;
d) the regression model does not include lagged value(s) of the dependent variable as one of
the explanatory variables
e) there are no missing observations in the data
Note from the Durbin-Watson statistic that for positive autocorrelation (ρ > 0), successive disturbance values will tend to have the same sign and the quantities (ût - ût-1)² will tend to be small relative to the squares of the actual values of the disturbances. We can therefore expect the value of DW to be low. Indeed, for the extreme case ρ = 1 it is possible that ût = ût-1 for all t, so that the minimum possible value of the statistic is zero. For negative autocorrelation, since positive disturbance values now tend to be followed by negative ones and vice versa, the quantities (ût - ût-1)² will tend to be large relative to the squares of the û's; hence the value of DW now tends to be high, and in the extreme case ρ = -1 it approaches its maximum of 4. Notice that when ρ = 0 the autoregressive scheme reduces to ut = εt for all t, so that ut takes on all the properties of εt; in particular it is no longer autocorrelated. Thus in the absence of autocorrelation we can expect DW ≈ 2(1 - ρ̂) to take a value close to 2; when negative autocorrelation is present, a value in excess of 2 and possibly as high as 4; and when positive autocorrelation is present, a value lower than 2 and possibly close to zero.
The Durbin-Watson test tests the hypothesis H0: ρ = 0 (implying that the error terms are not autocorrelated with a first-order scheme) against the alternative. However, the sampling distribution of the d-statistic depends on the sample size n, the number of explanatory variables k and also on the actual sample values of the explanatory variables. Thus, the critical values at which we might, for example, reject the null hypothesis at the 5 percent level of significance depend very much on the sample we have chosen. Notice that it is impracticable to tabulate critical values for all possible sets of sample values. What is possible, however, is, for given values of n and k, to find upper and lower bounds such that the actual critical value for any set of sample values will fall within these bounds.
The Durbin-Watson test procedure for testing the null hypothesis ρ = 0 against the alternative hypothesis of positive autocorrelation is illustrated in the figure below.
Note that under the null hypothesis the actual sampling distribution of d, for the given n and k
and for the given sample X values is shown by the unbroken curve. It is such that 5 percent of
the area beneath it lies to the left of the point d*, i.e., P(d < d*) = 0.05. If d* were known we
would reject the null hypothesis at the 5 percent level of significance if for our sample d < d *.
Unfortunately, for the reason given above, d* is unknown. The broken curves labelled dL and dU represent, for given values of n and k, the lower and upper limits to the sampling distribution of d, within which the actual sampling distribution must lie whatever the sample X values.
Figure: Sampling distribution of the Durbin-Watson d statistic, together with its lower-bound (dL) and upper-bound (dU) distributions and the critical points d*L, d* and d*U on the interval from 0 to 4.
The points d*U and d*L are such that the areas under the respective dU and dL curves to the left of these points are in each case 5 percent of the total area, i.e., P(dL < d*L) = P(dU < d*U) = 0.05. It is the points d*U and d*L, representing the upper and lower bounds to the unknown d*, that are tabulated for varying values of n and k. Clearly, if the sample value of the Durbin-Watson statistic lies to the left of d*L it must also lie to the left of d*, while if it lies to the right of d*U it must also lie to the right of d*.
The decision criterion for the Durbin-Watson test is therefore, of the following form
- for DW < d*L reject the null hypothesis of no autocorrelation in favor of positive
autocorrelation;
- for DW > d*U do not reject null hypothesis, i.e., insufficient evidence to suggest positive
autocorrelation;
- for d*L < DW < d*U test inconclusive.
Because of the symmetry of the distribution illustrated in the previous figure it is also possible to
use the tables for d*L and d*U to test the null hypothesis of no autocorrelation against the
alternative hypothesis of negative autocorrelation, i.e. ρ < 0. The decision criterion then takes the
form.
- for DW > 4 - d*L reject the null hypothesis of no autocorrelation in favor of negative
autocorrelation.
- for DW < 4 - d*U do not reject null hypothesis, i.e., insufficient evidence to suggest negative
autocorrelation
- for 4 - d*L > DW > 4- d*U test inconclusive.
Note that tables for d*U and d*L are constructed to facilitate the use of one-tail rather than two tail
tests. The usual representation of the test procedure places all the regions on the interval from 0 to 4, the limits of d: reject H0 in favour of positive autocorrelation for DW < d*L; inconclusive for d*L ≤ DW ≤ d*U; do not reject H0 for d*U < DW < 4 - d*U; inconclusive for 4 - d*U ≤ DW ≤ 4 - d*L; and reject H0 in favour of negative autocorrelation for DW > 4 - d*L.
A more general test for autocorrelation up to order r (the Breusch-Godfrey LM test) assumes that the disturbances follow

ut = ρ1·ut-1 + ρ2·ut-2 + ρ3·ut-3 + ... + ρr·ut-r + vt,   vt ~ N(0, σv²)

and tests H0: ρ1 = 0 and ρ2 = 0 and ... and ρr = 0 against
H1: ρ1 ≠ 0 or ρ2 ≠ 0 or ... or ρr ≠ 0

1. Estimate the linear regression using OLS and obtain the residuals, ût.
2. Regress ût on all of the regressors from stage 1 (the x's) plus ût-1, ût-2, ..., ût-r and obtain R² from this auxiliary regression; the test statistic is (n - r)·R², which is distributed as χ²(r) under the null.
If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation.
The Cochrane-Orcutt iterative procedure: consider the two-variable model

Yt = α + β1Xt + ut

where the disturbances follow the first-order autoregressive scheme ut = ρ·ut-1 + εt.

Step 1: Estimate the two-variable model by the standard OLS routine and obtain the residuals, ût.
Step 2: Estimate ρ by regressing the residuals on their own lagged values: ût = ρ̂·ût-1 + vt.
Step 3: Using the ρ̂ obtained from the step 2 regression, run the generalized (quasi-)difference equation

Yt - ρ̂Yt-1 = α(1 - ρ̂) + β1(Xt - ρ̂Xt-1) + (ut - ρ̂ut-1)

or   Y*t = α* + β1*X*t + u*t
Step 4: Since a priori it is not known whether the ρ̂ obtained from the regression in step 2 is the best estimate of ρ, substitute the values of α̂* and β̂1* obtained from the regression in step 3 into the original regression and obtain the new residuals, say û**t, as

û**t = Yt - α̂* - β̂1*Xt

Note that this can be easily computed since Yt, Xt, α̂* and β̂1* are all known.
Since we do not know whether this second-round estimate of ρ is the best estimate of ρ, we can go on to a third-round estimate, and so on. That is why the Cochrane-Orcutt method is said to be iterative. But how long should we go on? The general procedure is to stop carrying out iterations when the successive estimates of ρ converge (i.e., differ by a very small amount). The ρ̂ so chosen is then used to transform the model, and a kind of GLS estimation is applied that minimizes the problem of autocorrelation.
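A hedged sketch of the iteration loop in Python; the convergence tolerance, the maximum number of iterations and the way the original intercept is recovered from α* = α(1 - ρ̂) are assumptions of this sketch, not prescriptions from the module.

import numpy as np

def ols(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

def cochrane_orcutt(y, x, tol=1e-4, max_iter=50):
    X = np.column_stack([np.ones_like(x), x])
    _, resid = ols(y, X)                                # step 1: OLS on the original model
    rho = 0.0
    for _ in range(max_iter):
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])   # step 2: estimate rho
        y_star = y[1:] - rho_new * y[:-1]               # step 3: generalized differences
        x_star = x[1:] - rho_new * x[:-1]
        Xs = np.column_stack([np.ones_like(x_star), x_star])
        b_star, _ = ols(y_star, Xs)
        alpha = b_star[0] / (1 - rho_new)               # recover the original intercept
        beta = b_star[1]
        resid = y - (alpha + beta * x)                  # step 4: new residuals from the original model
        if abs(rho_new - rho) < tol:                    # stop when successive rho estimates converge
            break
        rho = rho_new
    return alpha, beta, rho_new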
One of the assumption of the classical linear regression model (CLRM) is that there is no perfect
multicollinearity among the regressors included in the regression model. Note that although the
assumption is said to be violated only in the case of exact multicollinearity (i.e., an exact linear
relationship among some of the regressors), the presence of multicollinearity (an approximate
linear relationship among some of the regressors) leads to estimating problems important enough to warrant treating it as a violation of the classical linear regression model.
Multicollinearity does not depend on any theoretical or actual linear relationship among any of
the regressors; it depends on the existence of an approximate linear relationship in the data set at
hand. Unlike most other estimating problems, this problem is caused by the particular sample
available. Multicollinearity in the data could arise for several reasons. For example, the
independent variables may all share a common time trend, one independent variable might be the
lagged value of another that follows a trend, some independent variable may have varied
together because the data were not collected from a wide enough base, or there could in fact exist
some kind of approximate relationship among some of the regressors.
Note that the existence of multicollinearity will affect seriously the parameter estimates.
Intuitively, when any two explanatory variables are changing in nearly the same way, it becomes extremely difficult to establish the separate influence of each regressor on the dependent variable.
B. Consequences of Multicollinearity
In the case of near or high multicollinearity, one is likely to encounter the following
consequences
i) Although BLUE, the OLS estimators have large variances and covariances, making precise
estimation difficult. This is clearly seen through the formula of variance of the estimators.
For example, in the multiple linear regression with two regressors, Var(β̂1) can be written as

Var(β̂1) = σ² / [ Σx1i² · (1 - r12²) ]
It is apparent from the above formula that as r12 (which is the coefficient of correlation between
X1 and X2) tends towards 1, which is as collinearity increases, the variance of the estimator
increases. The same holds for Var(β̂2) and cov(β̂1, β̂2).
ii) Because of consequence (i), the confidence interval tend to be much wider, leading to the
acceptance of the “Zero null hypothesis” (i.e., the true population coefficient is zero).
iii) Because of consequence (i), the t-ratios of one or more coefficients tend to be statistically insignificant.
iv) Although the t-ratio of one or more coefficients is statistically insignificant, R2, the overall
measure of goodness of fit, can be very high. This is the basic symptom of the problem.
v) The OLS estimators and their standard errors can be sensitive to small changes in the data.
That is when few observations are included, the pattern of relationship may change and
affect the result.
Note that multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees.
Multicollinearity is a feature of the sample and not of the population. Therefore, we do not “test
for multicollinearity” but can, if we wish, measure its degree in any particular sample. The
following are some rules of thumb and formal procedures for detecting multicollinearity.
i) High R² but few significant t-ratios: if R² is high, say in excess of 0.8, the F-test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero.
ii) High pair-wise correlation among regressors. If the pair-wise correlation coefficient among
two regressors is high, say in excess of 0.8, then multicolinearity is a serious problem.
iii) Auxiliary regressions: since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each Xi on the remaining X variables and compute the corresponding R², which helps to decide about the problem. For example, consider the following auxiliary regression:

Xk = α + β1X1 + β2X2 + … + βk-1Xk-1 + v

If the R² of this regression is high, it implies that Xk is highly correlated with the rest of the explanatory variables, and hence Xk may be dropped from the model.
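The auxiliary-regression idea can be automated: regress each regressor on all the others and inspect the resulting R² (or the variance inflation factor, VIF = 1/(1 - R²)). A minimal Python sketch:

import numpy as np

def auxiliary_r2(X):
    # For each column of X, regress it on the remaining columns (plus a constant)
    # and return the R^2 of that auxiliary regression.
    n, k = X.shape
    r2 = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ b
        r2.append(1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum())
    return np.array(r2)

# Usage: the columns of X are the regressors (without the constant term)
# r2 = auxiliary_r2(X); vif = 1 / (1 - r2)    # a high R^2 or VIF flags a collinear regressor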
D. Remedial Measures
The existence of multicollinearity in a data set does not necessarily mean that the coefficient estimators in which the researcher is interested have unacceptably high variances. Thus, the following remedial measures are called for only when the multicollinearity actually harms the estimates of interest.
a) Obtain more data: - Because the multicollinearity is essentially a data problem, additional
data that do not contain the multicollinearity feature could solve the problem. For example,
in the three variable model we saw that
Var(β̂1) = σ² / [ Σx1i² · (1 - r12²) ]
Now as the sample size increases, Σx1i² will generally increase. Thus for any given r12, the variance of β̂1 will decrease, thus decreasing the standard error, which will enable us to estimate β1 more precisely.
b) Drop a variable: - when faced with severe multicollinearity, one of the “simplest” thing to
do is to drop one of the collinear variables. But note that in dropping a variable from the
model we may be committing a specification bias or specification error. Specification bias
arises from incorrect specification of the model used in the analysis. Thus, if economic
theory requires some variables to be included in the model, dropping one of the variables
due to multicollinearity problem would constitute specification bias. This is because we are
dropping a variable when its true coefficient in the equation being estimated is not zero.
c) Transformation of variables: - In time series analysis, one reason for high multicollinearity
between two variables is that over time both variables tend to move in the same direction.
One way of minimizing this dependence is to transform the variables. That is, suppose

Yt = α + β1X1t + β2X2t + ut

This relation must also hold at time t - 1, because the origin of time is arbitrary anyway. Therefore we have

Yt-1 = α + β1X1,t-1 + β2X2,t-1 + ut-1

Subtracting the second relation from the first gives

Yt - Yt-1 = β1(X1t - X1,t-1) + β2(X2t - X2,t-1) + vt,   where vt = ut - ut-1
This is known as the first difference form because we run the regression, not on the original
variables, but on the difference of successive values of the variables. The first difference
regression model often reduces the severity of multicollinearity because, although the levels of
X1 and X2 may be highly correlated, there is no a priori reason to believe that their difference
will also be highly correlated
σ̂2² based on the last 30 observations = 140, df = 25. Carry out the GQ test of heteroscedasticity at the 5% level of significance.