
Jigdan College

Introduction to Econometrics (Econ…)

March, 2023

Addis Ababa, Ethiopia



Contents

Course description
Objectives of the Course
Chapter One
1. INTRODUCTION
1.1. Definition/Meaning of Econometrics
1.2. Why Is Econometrics a Separate Discipline?
1.3. Econometrics vs. Mathematical Economics
1.4. Econometrics vs. Statistics
1.5. Economic Models vs. Econometric Models
1.6. Division of Econometrics
1.7. Steps involved in formulating an econometric model
1.8. Goals of Econometrics
Chapter Two
2. The Classical Linear Regression Analysis: The Simple Linear Regression Models
2.1. Basic Concepts and Assumptions
2.1.1. The Modern Interpretation of Regression
2.1.2. Terminology
2.1.3. A Note on the Measurement Scales of Variables
2.1.4. The Population Regression Function (PRF)
2.1.5. The Sample Regression Function (SRF)
2.2. The Ordinary Least Squares Method (OLS)
2.3. The Least Squares Criterion and Normal Equations of OLS
2.4. Properties of Least-Squares Estimators: The Gauss–Markov Theorem
2.5. Measure of "Goodness of Fit" R²
2.6. Precision and Standard Errors
2.7. Statistical Inferences: Hypothesis Testing
2.7.1. Testing Hypotheses: The Test of Significance Approach
2.7.2. The Confidence Interval Approach to Hypothesis Testing
2.7.3. The Errors That We Can Make Using Hypothesis Tests
Chapter Three
3. The Classical Linear Regression Analysis: Multiple Linear Regression Models
Chapter Four
4. Classical Linear Regression Model Assumptions and Diagnostics: Violation of the Assumptions of the CLRM
4.1. Introduction
4.2. Assumption 1: The Assumption That the Mean of the Disturbance Is Zero (E(u) = 0)
4.4. Assumption 3: The Assumption of No Autocorrelation
4.5. Assumption 4: Multicollinearity
Appendix



Introduction to Econometrics

Course description
Econometrics is the quantitative application of statistical and mathematical methods to economic
theories and models, using data to develop new theories, test existing hypotheses, and forecast
future trends from historical data. The objective of econometrics is to quantify such relationships
using available data and statistical techniques, and to interpret and use the resulting outcomes.
Econometrics is thus the application of statistical and mathematical methods to the analysis of
economic data, with the purpose of giving empirical content to economic theories and then
verifying or refuting them. Bridging the gap between theory and policy analysis requires
practice in applying the concepts, theories and methods of economics to policy
analysis. This course is designed to meet this challenge by providing insights into how the three
elements of econometrics, namely economic theory, data and statistical procedures, can be
combined to provide useful information to policy analysts and decision makers. In this course,
practical exercises using econometric and statistical software such as SPSS, STATA, EViews
and EXCEL will be conducted to equip students with the knowledge and skill of using
software for data analysis.

Objectives of the Course


Students are expected to:

• Understand the main goals of econometrics and its purpose;
• Develop/formulate regression models based on theory in their field of study;
• Estimate regression models using real data and interpret the results;
• Use estimated equations to make predictions and forecasts;
• Understand and apply the methodology of econometrics in their research projects.



Chapter One
1. INTRODUCTION

The study of econometrics has become an essential part of every undergraduate course in
economics, and it is not an exaggeration to say that it is also an essential part of every
economist's training. This is because the importance of applied economics is constantly
increasing, and the ability to quantify and evaluate economic theories and hypotheses constitutes,
now more than ever, a bare necessity. Economic theory may suggest that there is a
relationship between two or more variables, but applied economics demands both evidence that
this relationship is a real one, observed in everyday life, and quantification of the relationship
using actual data. This quantification of economic relationships using actual data is known as
econometrics.

1.1. Definition/Meaning of Econometrics

Literally, econometrics means measurement (the meaning of the Greek word "metrics") in
economics. However, econometrics includes all those statistical and mathematical techniques
that are utilized in the analysis of economic data. The main aim of using those tools is to prove
or disprove particular economic propositions and models.

Econometrics, the result of a certain outlook on the role of economics, consists of the application
of mathematical statistics to economic data to lend empirical support to the models constructed
by mathematical economics and to obtain numerical results. Econometrics may be defined as the
quantitative analysis of actual economic phenomena based on the concurrent development of
theory and observation, related by appropriate methods of inference.

Econometrics may also be defined as the social science in which the tools of economic theory,
mathematics and statistical inference are applied to the analysis of economic phenomena.
Econometrics is concerned with the empirical determination of economic laws.

1.2. Why Is Econometrics a Separate Discipline?

Based on the definition above, econometrics is an amalgam of economic theory, mathematical


economics, economic statistics and mathematical statistics. However, the course (Econometrics)
deserves to be studied in its own right for the following reasons:



1. Economic theory makes statements or hypotheses that are mostly qualitative in nature.
For example, microeconomic theory states that, other things remaining the same, a
reduction in the price of a commodity is expected to increase the quantity demanded of
that commodity. Thus, economic theory postulates a negative or inverse relationship
between the price and quantity demanded of a commodity. But the theory itself does not
provide any numerical measure of the relationship between the two; that is, it does not
tell by how much the quantity will go up or down as a result of a certain change in the
price of the commodity. It is the job of the econometrician to provide such numerical
estimates. Stated differently, econometrics gives empirical content to most economic theory.
2. The main concern of mathematical economics is to express economic theory in
mathematical form (equations) without regard to measurability or empirical verification
of the theory. Econometrics, as noted in our discussion above, is mainly interested in the
empirical verification of economic theory. As we shall see later in this course, the
econometrician often uses the mathematical equations proposed by the mathematical
economist but puts these equations in such a form that they lend themselves to empirical
testing, and this conversion requires both mathematical and practical skill.
3. Economic statistics is mainly concerned with collecting, processing and presenting
economic data in the form of charts and tables. These are the jobs of the economic
statistician. It is he or she who is primarily responsible for collecting data on gross
national product (GNP), employment, unemployment, prices, etc. The data thus collected
constitute the raw data for econometric work, but the economic statistician does not go
any further, not being concerned with using the collected data to test economic theories;
one who does that becomes an econometrician.
4. Although mathematical statistics provides many tools used in the trade, the
econometrician often needs special methods in view of the unique nature of most
economic data, namely, that the data are not generated as the result of a controlled
experiment. The econometrician, like the meteorologist, generally depends on data that
cannot be controlled directly.



1.3. Econometrics vs. Mathematical Economics

Mathematical economics states economic theory in terms of mathematical symbols. There is no
essential difference between mathematical economics and economic theory. Both state the same
relationships; but while economic theory uses verbal exposition, mathematical economics uses
mathematical symbols. Both express economic relationships in an exact or deterministic form.
Neither mathematical economics nor economic theory allows for random elements which might
affect the relationship and make it stochastic. Furthermore, they do not provide numerical values
for the coefficients of economic relationships.

Econometrics differs from mathematical economics in that, although econometrics presupposes
that economic relationships can be expressed in mathematical form, it does not assume exact or
deterministic relationships. Econometrics assumes random relationships among economic
variables. Econometric methods are designed to take into account random disturbances which
create deviations from the exact behavioral patterns suggested by economic theory and
mathematical economics. Furthermore, econometric methods provide numerical values for the
coefficients of economic relationships.

1.4. Econometrics vs. Statistics

Econometrics differs from both mathematical statistics and economic statistics. An economic
statistician gathers empirical data, records them, tabulates them or charts them, and attempts to
describe the pattern in their development over time and perhaps detect some relationship
between various economic magnitudes. Economic statistics is mainly a descriptive aspect of
economics. It does not provide explanations of the development of the various variables, and it
does not provide measurements of the coefficients of economic relationships.

Mathematical (or inferential) statistics deals with methods of measurement which are
developed on the basis of controlled experiments. But such statistical methods of measurement
are not appropriate for a number of economic relationships, because controlled or carefully
planned experiments cannot be designed for most economic relationships: the relationships
among economic variables are stochastic or random. Yet the fundamental ideas of
inferential statistics are applicable in econometrics, but they must be adapted to the problems of
economic life. Econometric methods are adjusted so that they may become appropriate for the



measurement of economic relationships which are stochastic. The adjustment consists primarily
in specifying the stochastic (random) elements that are supposed to operate in the real world and
enter into the determination of the observed data.

1.5. Economic Models vs. Econometric Models


I. Economic models: Any economic theory is an abstraction from the real world. For one
reason, the immense complexity of the real world economy makes it impossible for us to
understand all interrelationships at once. Another reason is that all the interrelationships
are not equally important for the understanding of the economic phenomenon
under study. The sensible procedure is, therefore, to pick out the important factors and
relationships relevant to our problem and to focus our attention on these alone. Such a
deliberately simplified analytical framework is called an economic model. It is an
organized set of relationships that describes the functioning of an economic entity under a
set of simplifying assumptions. All economic reasoning is ultimately based on models.
Economic models consist of the following three basic structural elements:
• A set of variables;
• A list of fundamental relationships; and
• A number of strategic coefficients.
II. Econometric models: The most important characteristic of economic relationships is that
they contain a random element which is ignored by mathematical economic models
which postulate exact relationships between economic variables.

Example: Economic theory postulates that the demand for a commodity depends on its price (P),
on the prices of other related commodities (Po), on consumers' income (Y) and on tastes (t). This
is an exact relationship which can be written mathematically as:

Q = β0 + β1P + β2Po + β3Y + β4t

The above demand equation is exact. However, many more factors may affect demand. In
econometrics the influence of these "other" factors is taken into account by introducing into
the economic relationship a random variable. In our example, the demand function studied
with the tools of econometrics would be of the stochastic form:

Q = β0 + β1P + β2Po + β3Y + β4t + u

where u stands for the random factors which affect the quantity demanded.
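The distinction can be made concrete with a minimal simulation sketch in Python; the sample size, coefficient values and distributions below are all hypothetical choices made purely for illustration of how u turns an exact relationship into a stochastic one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical regressors for the demand equation
P = rng.uniform(1, 10, n)     # own price
Po = rng.uniform(1, 10, n)    # prices of related commodities
Y = rng.uniform(20, 100, n)   # consumers' income
t = rng.uniform(0, 1, n)      # a crude index of tastes

# Assumed (illustrative) population coefficients
b0, b1, b2, b3, b4 = 50.0, -2.0, 0.5, 0.3, 5.0

# Exact (deterministic) relationship: no scatter at all
Q_exact = b0 + b1 * P + b2 * Po + b3 * Y + b4 * t

# Stochastic relationship: u collects all omitted 'other' factors
u = rng.normal(0, 2.0, n)
Q = Q_exact + u

print(Q_exact[:3])  # values implied by the exact model
print(Q[:3])        # observed values scatter around them
```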

1.6. Division of Econometrics

Econometrics may be divided into two broad categories:

1. Theoretical Econometrics
2. Applied Econometrics
Theoretical Econometrics: is concerned with the development of appropriate methods for
measuring economic relationships specified by econometric models. In this aspect, econometrics
leans heavily on mathematical statistics. For example, one of the tools that are used extensively
is the method of least squares. It is the concern of theoretical econometrics to spell out the
assumptions of this method, its properties, and what happens to these properties when one or
more of the assumptions of the method are not fulfilled.
In applied Econometrics we use the tools of theoretical econometrics to study some special
field(s) of economics, such as the production function, consumption function, investment
function, demand and supply functions, etc.
Applied econometrics includes the applications of econometric methods to specific branches of
economic theory. It involves the application of the tools of theoretical econometrics for the
analysis of economic phenomena and forecasting economic behavior.



1.7. Steps involved in formulating an econometric model

Although there are of course many different ways to go about the process of model building, a
logical and valid approach would be to follow the steps described in the figure below.

1a. Economic or financial theory (previous studies)
1b. Formulation of an estimable theoretical model
2. Collection of data
3. Model estimation
4. Is the model statistically adequate?
   No: reformulate the model and return to step 1b
   Yes: 5. Interpret the model
6. Use for analysis

Steps 1a and 1b: General statement of the problem. This will usually involve the formulation
of a theoretical model, or intuition from economic or financial theory that two or more variables
should be related to one another in a certain way. The model is unlikely to be able to completely
capture every relevant real-world phenomenon, but it should present a sufficiently good
approximation that it is useful for the purpose at hand.



Step 2: Collection of data relevant to the model. The data used in the estimation of an
econometric model may be of various types.

• Cross-sectional data: data on one or more variables collected at a particular
point in time. Example: data for people, households, businesses, countries, cities, etc.

• Data may be qualitative or quantitative.
Qualitative data are sometimes called dummy variables or categorical
variables. These are variables that cannot be quantified, such as male or
female, married or unmarried, religion, etc.
Quantitative data are data that can be quantified, such as income, prices,
money, etc.

• Time series data: a sequence of observations over time on an
individual or group of individuals. Example: government budget deficit,
GDP, growth rates, etc.

• Panel data: the results of a repeated survey of a single (cross-sectional)
sample in different periods of time. Example: employment data across
individuals and over time (the three structures are illustrated in the sketch below).
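The following short Python/pandas sketch shows the three data structures side by side; the variables and figures are invented purely for illustration.

```python
import pandas as pd

# Cross-sectional data: many units observed at one point in time
cross_section = pd.DataFrame(
    {"household": ["A", "B", "C"], "income": [520, 610, 480]}
)

# Time series data: one unit observed over successive periods
time_series = pd.DataFrame(
    {"year": [2020, 2021, 2022], "gdp_growth": [6.1, 5.6, 5.3]}
)

# Panel data: the same cross-section observed repeatedly over time
panel = pd.DataFrame(
    {
        "household": ["A", "A", "B", "B"],
        "year": [2021, 2022, 2021, 2022],
        "income": [520, 540, 610, 640],
    }
).set_index(["household", "year"])

print(panel)
```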

Step 3: Choice of estimation method relevant to the model proposed in step 1. For example,
is a single equation or multiple equation technique to be used?

Step 4: Statistical evaluation of the model. What assumptions were required to estimate the
parameters of the model optimally? Were these assumptions satisfied by the data or the model?
Also, does the model adequately describe the data? If the answer is ‘yes’, proceed to step 5; if
not, go back to steps 1--3 and either reformulate the model, collect more data, or select a
different estimation technique that has less stringent requirements.

Step 5: Evaluation of the model from a theoretical perspective. Are the parameter estimates of
the sizes and signs that the theory or intuition from step 1 suggested? If the answer is ‘yes’,
proceed to step 6; if not, again return to stages 1-3.



Step 6: Use of model. When a researcher is finally satisfied with the model, it can then be used
for testing the theory specified in step 1, or for formulating forecasts or suggested courses of
action. This suggested course of action might be for an individual (e.g. ‘if inflation and GDP rise,
buy stocks in sector X’), or as an input to government policy (e.g. ‘when equity markets fall,
program trading causes excessive volatility and so should be banned’).

It is important to note that the process of building a robust empirical model is an iterative one,
and it is certainly not an exact science. Often, the final preferred model could be very different
from the one originally proposed, and need not be unique in the sense that another researcher
with the same data and the same initial theory could arrive at a different final specification.

1.8. Goals of Econometrics

Three main goals of Econometrics are identified:

• Analysis, i.e. testing economic theory: Economists formulated the basic principles of
the functioning of the economic system using verbal exposition and applying a
deductive procedure. Economic theories thus developed at an abstract level were not
tested against economic reality. Econometrics aims primarily at the verification of
economic theories.
• Policy making, i.e. obtaining numerical estimates of the coefficients of economic
relationships for policy simulations: In many cases we apply the various econometric
techniques in order to obtain reliable estimates of the individual coefficients of
economic relationships, from which we may evaluate elasticities or other parameters of
economic theory (multipliers, technical coefficients of production, marginal costs,
marginal revenues, etc.). Knowledge of the numerical values of these coefficients is
very important for the decisions of firms as well as for the formulation of the
economic policy of the government. It helps to compare the effects of alternative
policy decisions.
• Forecasting, i.e. using the numerical estimates of the coefficients in order to forecast
the future values of economic magnitudes: In formulating policy decisions it is
essential to be able to forecast the values of economic magnitudes. Such forecasts
enable the policy-maker to judge whether it is necessary to take any measures in order
to influence the relevant economic variables.

Review questions

1. Define econometrics.
2. How does econometrics differ from mathematical economics and statistics?
3. Why is econometrics a separate discipline?
4. Describe the main steps involved in any econometric research.
5. Describe the types of data.
6. Differentiate between an economic and an econometric model.
7. What are the goals of econometrics?



Chapter Two
2. The Classical Linear Regression Analysis: The Simple Linear Regression Models
2.1. Basic Concepts and Assumptions
2.1.1. The Modern Interpretation of Regression

Broadly speaking, we may say that regression analysis is concerned with the study of the
dependence of one variable, the dependent variable, on one or more other variables, the
explanatory variables, with a view to estimating and/or predicting the (population) mean or
average value of the former in terms of the known or fixed (in repeated sampling) values of the
latter.

Example:

Dependent variable (Y); Explanatory variable(s) (X):

• Y = Personal Consumption Expenditure; X = Personal Disposable Income
• Y = Quantity demanded; X = Price of the product
• Y = Crop yield; Xs = temperature, rainfall, sunshine, fertilizer

Statistical versus Deterministic Relationships

In statistical relationships among variables we essentially deal with random or stochastic
variables, that is, variables that have probability distributions. In functional or deterministic
dependency, on the other hand, we also deal with variables, but these variables are not random
or stochastic. In regression analysis we are concerned with statistical, not functional or
deterministic, dependence among variables; that is, we essentially deal with random or
stochastic variables with probability distributions.

Regression versus Causation

Although regression analysis deals with the dependence of one variable on other variables, it
does not necessarily imply causation. In the words of Kendall and Stuart, "A statistical
relationship, however strong and however suggestive, can never establish causal connection: our
ideas of causation must come from outside statistics, ultimately from some theory or other."

Regression versus Correlation



Closely related to but conceptually very much different from regression analysis is correlation
analysis, where the primary objective is to measure the strength or degree of linear association
between two variables. In regression analysis, however, we are not primarily interested in such a
measure. Instead, we try to estimate or predict the average value of one variable on the basis of
the fixed values of other variables.

2.1.2. Terminology

In the literature the terms dependent variable and explanatory variable are described variously.
A representative list is:

The dependent variable | The independent variable(s)
Explained variable | Explanatory variable(s)
Predictand | Predictor(s)
Regressand | Regressor(s)
Response variable | Control variable(s)
Endogenous variable | Exogenous variable(s)
If we are studying the dependence of a variable on only a single explanatory variable, such as
that of consumption expenditure on real income, such a study is known as simple, or two-
variable, regression analysis. However, if we are studying the dependence of one variable on
more than one explanatory variable, as in the crop-yield, rainfall, temperature, sunshine, and
fertilizer examples, it is known as multiple regression analysis.

2.1.3. A Note on the Measurement Scales of Variables

The variables that we will generally encounter fall into four broad categories: ratio scale,
interval scale, ordinal scale, and nominal scale. It is important that we understand each.

Ratio Scale: For a variable X, taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 -
X1) are meaningful quantities. Also, there is a natural ordering (ascending or descending) of the
values along the scale. Therefore, comparisons such as X2 ≤ X1 or X2 ≥ X1 are meaningful. Most
economic variables belong to this category. E.g., it is meaningful to ask how big this year’s GDP
is compared with the previous year’s GDP.



Interval Scale: An interval scale variable satisfies the last two properties of the ratio scale
variable but not the first. Thus, the distance between two time periods, say (2000–1995) is
meaningful, but not the ratio of two time periods (2000/1995).

Ordinal Scale: A variable belongs to this category only if it satisfies the third property of the
ratio scale (i.e., natural ordering). Examples are grading systems (A, B, C grades) or income
class (upper, middle, lower). For these variables the ordering exists but the distances between the
categories cannot be quantified.

Nominal Scale: Variables in this category have none of the features of the ratio scale variables.
Variables such as gender (male, female) and marital status (married, unmarried, divorced,
separated) simply denote categories.
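The four scales map naturally onto data types in statistical software. The following Python/pandas sketch (illustrative only; the values are invented) shows the distinction in code:

```python
import pandas as pd

# Ratio scale: both ratios and distances are meaningful (e.g. GDP)
gdp = pd.Series([95.3, 102.1, 107.8])
print(gdp.iloc[1] / gdp.iloc[0])        # a meaningful growth ratio

# Interval scale: distances are meaningful, ratios are not (e.g. years)
years = pd.Series([1995, 2000, 2005])
print(years.iloc[1] - years.iloc[0])    # meaningful: 5 years apart

# Ordinal scale: ordering exists, distances do not (e.g. grades)
grades = pd.Series(
    pd.Categorical(["B", "A", "C"], categories=["C", "B", "A"], ordered=True)
)
print(grades.max())                     # meaningful: the highest grade

# Nominal scale: pure categories, no ordering (e.g. gender)
gender = pd.Categorical(["male", "female", "female"])
print(gender.categories)
```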

2.1.4. The Population Regression Function (PRF)

It is clear that each conditional mean E(Y | Xi) is a function of Xi, where Xi is a given value of X.
Symbolically,

E(Y | Xi) = f (Xi)

Where f (Xi) denotes some function of the explanatory variable X. E(Y | Xi) is known as the
conditional expectation function (CEF) or population regression function (PRF) or
population regression (PR) for short. It states merely that the expected value of the
distribution of Y given Xi is functionally related to Xi. In simple terms, it tells how the
mean or average response of Y varies with X.

We may assume that the PRF E(Y | Xi) is a linear function of Xi, say, of the type

E(Y | Xi) = α + β1Xi

Where α and β1 are unknown but fixed parameters known as the regression coefficients.

Therefore, we can express the deviation of an individual Yi around its expected value as follows:

ui = Yi − E(Y | Xi)
or
Yi = E(Y | Xi) + ui



Where the deviation ui is an unobservable random variable taking positive or negative values.
Technically, ui is known as the stochastic disturbance or stochastic error term.

If E(Y | Xi) is assumed to be linear in Xi, it may be written as:

Yi = E(Y | Xi) + ui
Yi = α + β1Xi + ui

We call this equation the stochastic specification of the PRF (true PRF).
2.1.5. The Sample Regression Function (SRF)

It is about time to face up to the sampling problems, for in most practical situations what we
have is but a sample of Y values corresponding to some fixed X’s. Therefore, the task now is to
estimate the PRF on the basis of the sample information.

Now, analogously to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample counterpart of the PRF may be written as:

Ŷi = α̂ + β̂1Xi

Where Ŷi is read as "Y-hat" = estimator of E(Y | Xi)
α̂ = estimator of α
β̂1 = estimator of β1

Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or
method that tells how to estimate the population parameter from the information provided by the
sample at hand. A particular numerical value obtained by the estimator in an application is
known as an estimate.

Now we can express the SRF in its stochastic form as follows:

Yi = α̂ + β̂1Xi + ûi

Where, in addition to the symbols already defined, ûi denotes the (sample) residual term.
Conceptually ûi is analogous to ui and can be regarded as an estimate of ui.
To sum up, our primary objective in regression analysis is to estimate the PRF

Yi = α + β1Xi + ui

on the basis of the SRF

Yi = α̂ + β̂1Xi + ûi

because more often than not our analysis is based upon a single sample from some population.
The deviations of the observations from the line may be attributed to several factors.
(1) Omission of variables from the function
In economic reality each variable is influenced by a very large number of factors.
However, not all the factors influencing a certain variable can be included in the
function for various reasons.
(2) Random behavior of the human beings
The scatter of points around the line may be attributed to an erratic element which is
inherent in human behavior. Human reactions are to a certain extent unpredictable
and may cause deviations from the normal behavioral pattern depicted by the line.
(3) Imperfect specification of the mathematical form of the model
We may have linearized a possibly nonlinear relationship. Or we may have left out of
the model some equations.
(4) Errors of aggregation
We often use aggregate data (aggregate consumption, aggregate income), in which
we add magnitudes referring to individuals whose behavior is dissimilar. In this case
we say that variables expressing individual peculiarities are missing.
(5) Errors of measurement
This refers to errors of measurement of the variables, which are inevitable due to the
methods of collecting and processing statistical information.
The first four sources of error render the form of the equation wrong, and they are
usually referred to as error in the equation or error of omission. The fifth source of
error is called error of measurement or error of observation.



In order to take into account the above sources of error, we introduce into econometric functions
a random variable u, called the random disturbance term of the function, so called because u is
supposed to disturb the exact linear relationship which is assumed to exist between X and Y.

2.2. The Ordinary Least Squares Method (OLS)

To estimate the coefficients α and β1 we need observations on X, Y and u. Yet u is never
observed, unlike the other variables, and therefore in order to estimate the function
Yi = α + β1Xi + ui we should guess the values of u; that is, we should make some reasonable
assumptions about the shape of the distribution of each ui (its mean, variance and covariance
with the other u's). These assumptions are guesses about the true, but unobservable, values of ui.

The assumptions underlying the method of least squares

The linear regression model is based on certain assumptions, some of which refer to the
distribution of the random variable u, some to the relationship between u and the explanatory
variables, and some to the relationship between the explanatory variables themselves.

1. ui is a random real variable and has zero mean value: E(ui) = 0 or E(ui | Xi) = 0.
   • This implies that for each value of X, u may assume various values, some
     positive and some negative, but on average zero.
   • Further, E(Yi | Xi) = α + β1Xi gives the relationship between X and Y on
     average; i.e. when X takes the value Xi, Y will on average take the value E(Yi | Xi).
2. The variance of ui is constant for all i, i.e., var(ui | Xi) = E(ui² | Xi) = σ². This is called the
   assumption of common variance or homoscedasticity.
   • The implication is that for all values of X, the values of u show the same
     dispersion around their mean.
   • The consequence of this assumption is that var(Yi | Xi) = σ².
   • If, on the other hand, the variance of the Y population varies as X changes, a situation
     of non-constancy of the variance of Y, called heteroscedasticity, arises.
3. ui has a normal distribution, i.e., ui ∼ N(0, σ²), which also implies Yi ∼ N(α + β1Xi, σ²).
4. The random terms of different observations are independent: cov(ui, uj) = E(uiuj) = 0 for
   i ≠ j, where i and j run from 1 to n. This is called the assumption of no (serial)
   autocorrelation among the error terms.
   • The consequence of this assumption is that cov(Yi, Yj) = 0 for i ≠ j, i.e. no
     autocorrelation among the Y's.
5. The Xi's are a set of fixed values in the process of repeated sampling which underlies the
   linear regression model, i.e. they are non-stochastic.
6. u is independent of the explanatory variables, i.e., cov(ui, Xi) = E(uiXi) = 0.
7. Variability in X values. The X values in a given sample must not all be the same.
   Technically, var(X) must be a finite positive number.
8. The regression model is correctly specified. This assumption implies that there is no
   specification bias or error in the model used in empirical analysis. This means that we
   have included all the important regressors explicitly in the model and that its
   mathematical form is correct.

Unfortunately, in practice one rarely knows the correct variables to include in the model,
the correct functional form of the model, or the correct probabilistic assumptions about the
variables entering the model, for the theory underlying the particular investigation may not be
strong or robust enough to answer all these questions. Therefore, in practice, the econometrician
has to use some judgment in choosing the number of variables entering the model and the
functional form of the model. To some extent there is some trial and error involved in choosing
the "right" model; model building is more often an art than a science. A data-generating process
satisfying the assumptions above is sketched below.
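The following Python sketch simulates a data-generating process that satisfies the assumptions (the parameter values are hypothetical) and then checks assumptions 1, 2 and 6 empirically on the simulated disturbances:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Fixed, non-stochastic X values with variability (assumptions 5 and 7)
X = np.linspace(1, 50, n)

# i.i.d. normal disturbances: zero mean (1), constant variance (2),
# normality (3), no autocorrelation (4), independent of X (6)
sigma = 2.0
u = rng.normal(0, sigma, n)

alpha, beta1 = 5.0, 0.8            # hypothetical true parameters
Y = alpha + beta1 * X + u          # correctly specified model (8)

print(u.mean())                    # close to 0   (assumption 1)
print(u.var())                     # close to 4   (assumption 2)
print(np.corrcoef(u, X)[0, 1])     # close to 0   (assumption 6)
```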
2.3. The Least Squares Criterion and Normal Equations of OLS

Thus far we have completed the work involved in the first stage of any econometric application:
we have specified the model and stated explicitly its assumptions. The next step is the
estimation of the model, that is, the computation of the numerical values of its parameters. The
linear relationship Yi = α + β1Xi + ui holds for the population of the values of X and Y, so that
we could obtain the numerical values of α and β1 only if we could have all the possible values
of X, Y and u which form the population of these variables. Since this is impossible in practice,
we get a sample of observed values of Y and X, specify the distribution of the u's and try to get
satisfactory estimates of the true parameters of the relationship. This is done by fitting a
regression line through the observations of the sample, which we consider as an approximation
to the true line.

The method of ordinary least squares is one of the econometric methods which enable us to find
estimates of the true parameters; it is attributed to Carl Friedrich Gauss, a German
mathematician. To understand this method, we first explain the least squares principle.

Recall the two-variable PRF:

Yi = α + β1Xi + ui

However, as noted earlier, the PRF is not directly observable. We estimate it from the SRF:

Yi = α̂ + β̂1Xi + ûi = Ŷi + ûi

where Ŷi is the estimated (conditional mean) value of Yi.

But how is the SRF itself determined? To see this, let us proceed as follows. First, express the
above equation as:

ûi = Yi − Ŷi = Yi − α̂ − β̂1Xi

which shows that the ûi (the residuals) are simply the differences between the actual and the
estimated Y values.

Now given n pairs of observations on Y and X, we would like to determine the SRF in such a
manner that it is as close as possible to the actual Y. To this end, we adopt the least-squares
criterion, which states that the SRF can be fixed in such a way that

Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − α̂ − β̂1Xi)²

is as small as possible, where ûi² are the squared residuals.

It is obvious that Σûi² = f(α̂, β̂1); that is, the sum of the squared residuals is some function of
the estimators α̂ and β̂1. For any given set of data, choosing different values for α̂ and β̂1 will
give different û's and hence different values of Σûi².

The principle or method of least squares chooses α̂ and β̂1 in such a manner that, for a given
sample or set of data, Σûi² is as small as possible. In other words, for a given sample, the
method of least squares provides us with unique estimates of α̂ and β̂1 that give the smallest
possible value of Σûi².

The process of differentiation yields the following equations for estimating α and β1:

∂(Σûi²)/∂α̂ = −2Σ(Yi − α̂ − β̂1Xi) = 0
∂(Σûi²)/∂β̂1 = −2ΣXi(Yi − α̂ − β̂1Xi) = 0

Rearranging these equations gives the normal equations below:

ΣYi = nα̂ + β̂1ΣXi
ΣXiYi = α̂ΣXi + β̂1ΣXi²

where n is the sample size. These simultaneous equations are known as the normal equations.

Solving the normal equations simultaneously, we obtain

β̂1 = (nΣYiXi − ΣXiΣYi) / (nΣXi² − (ΣXi)²)

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

α̂ = Ȳ − β̂1X̄

where X̄ and Ȳ are the sample means of X and Y, and where we define xi = Xi − X̄ and
yi = Yi − Ȳ. Henceforth these lowercase letters denote deviations from the mean values. (The
expression for α̂ follows directly from dividing both sides of the first normal equation by n.)

Note that, by making use of simple algebraic identities, the formula for estimating β1 can be
alternatively expressed in deviation form as

β̂1 = Σxiyi / Σxi² = Σxiyi / (ΣXi² − nX̄²)

The estimators obtained previously are known as the least-squares estimators, for they are
derived from the least-squares principle. We finally write the regression line equation as:

Ŷi = α̂ + β̂1Xi

Interpretation of estimates

• Estimated intercept, α̂: the estimated average value of the dependent variable when the
  independent variable takes the value zero.
• Estimated slope, β̂1: the estimated change in the average value of the dependent variable
  when the independent variable increases by one unit.
• Ŷi gives the average relationship between Y and X, i.e. Ŷi is the average value of Y given Xi.



Numerical Example

We illustrate the econometric theory developed so far by considering the Keynesian consumption
function. As a test of the Keynesian consumption function, we use the sample data below.

Hypothetical data on weekly family consumption expenditure Y and weekly family income X

Y(Birr) X(Birr)
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260
A. Determine the regression equation.

B. Determine the value of Ŷ when X is 300.

C. Interpret the values of α̂ and β̂1.



Yi    Xi    Xi − X̄   Yi − Ȳ   (Xi − X̄)²   (Xi − X̄)(Yi − Ȳ)
70    80    −90      −41      8100        3690
65    100   −70      −46      4900        3220
90    120   −50      −21      2500        1050
95    140   −30      −16      900         480
110   160   −10      −1       100         10
115   180   10       4        100         40
120   200   30       9        900         270
140   220   50       29       2500        1450
155   240   70       44       4900        3080
150   260   90       39       8100        3510
Sum   1110  1700     0        0           33000       16800
Mean  111   170
A.

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = 16800 / 33000 = 0.509

α̂ = Ȳ − β̂1X̄ = 111 − 0.509 × 170 = 24.47

The estimated regression line therefore is

Ŷi = α̂ + β̂1Xi = 24.47 + 0.509Xi

B. If X is 300, then Ŷi = 24.47 + 0.509 × 300 = 177.17

C. The estimated regression line is interpreted as follows: Each point on the regression
line gives an estimate of the expected or mean value of Y corresponding to the chosen X
value; that is, Ŷi is an estimate of E(Y | Xi). The value of β̂1 = 0.509, which measures the
slope of the line, shows that, within the sample range of X between Birr 80 and Birr 260
per week, as X increases by, say, Birr 1, the estimated increase in the mean or average weekly
consumption expenditure amounts to about 51 cents. The value of α̂ = 24.47, which is the
intercept of the line, indicates the average level of weekly consumption expenditure when
weekly income is zero.
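The hand computation above can be verified with a few lines of Python using only NumPy; this is simply a check of the same formulas, and the results agree with the text up to rounding.

```python
import numpy as np

# Weekly consumption expenditure (Y) and weekly income (X) from the table
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150])
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260])

x = X - X.mean()                             # deviations from the mean
y = Y - Y.mean()

beta1_hat = (x * y).sum() / (x ** 2).sum()   # 16800 / 33000 = 0.509
alpha_hat = Y.mean() - beta1_hat * X.mean()  # about 24.47

print(beta1_hat, alpha_hat)
print(alpha_hat + beta1_hat * 300)           # prediction at X = 300: about 177.2
```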

2.4. Properties of Least-Squares Estimators: The Gauss–Markov Theorem

Given the assumptions of the classical linear regression model, the least-squares estimates
possess some ideal or optimum properties. These properties are contained in the well-known
Gauss–Markov theorem. An estimator, say the OLS estimator β̂1, is said to be a best linear
unbiased estimator (BLUE) of β1 if the following hold:

1. It is linear, that is, a linear function of a random variable, such as the dependent variable
   Y in the regression model.
2. It is unbiased, that is, its average or expected value, E(β̂1), is equal to the true value, β1.
3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased
   estimator with the least variance is known as an efficient estimator.

In the regression context it can be proved that the OLS estimators (α̂, β̂1) are BLUE. The
unbiasedness property is illustrated by simulation in the sketch below.
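Unbiasedness can be illustrated (not proved) by a small Monte Carlo sketch: generate many samples from a known population with X fixed in repeated sampling, estimate β1 by OLS in each sample, and average. All numbers below are hypothetical choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 5_000
alpha, beta1, sigma = 2.0, 0.7, 1.5   # hypothetical true parameters
X = np.linspace(0, 10, n)             # fixed in repeated sampling
x = X - X.mean()

estimates = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, n)       # fresh disturbances each sample
    Y = alpha + beta1 * X + u
    estimates[r] = (x * (Y - Y.mean())).sum() / (x ** 2).sum()

# Unbiasedness: the average of the OLS estimates is close to beta1
print(estimates.mean())               # approximately 0.7
```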

2.5. Measure of “Goodness of Fit” R2

After the estimation of the parameters and the determination of the least squares regression line,
we need to know how "good" the fit of this line is to the sample observations of Y and X; that is,
we need to measure the dispersion of the observations around the regression line. It is clear
that if all the observations were to lie on the regression line, we would obtain a "perfect fit", but
this is rarely the case. Knowledge of the dispersion of the observations around the regression
line is essential because the closer the observations are to the line, the better the goodness of
fit; that is, the better the explanation of the variation of Y by the changes in the explanatory
variables. In general, the coefficient of determination R² is a summary measure that tells how
well the sample regression line fits the data.



Consider the following diagram

[Figure 1: Breakdown of the variation of Yi into two components. For each observation, the
deviation of Yi from the sample mean Ȳ splits into an explained part due to the regression,
Ŷi − Ȳ, and an unexplained part due to error, the residual ûi = Yi − Ŷi, measured from the
sample regression line (SRF).]

By fitting the line Ŷi = α̂ + β̂1Xi we try to obtain the explanation of the variation of the
dependent variable Y produced by the changes of the explanatory variable X. However, the fact
that the observations deviate from the estimated line shows that the regression line explains only
a part of the total variation of the dependent variable. A part of the variation, defined as
ûi = Yi − Ŷi, remains unexplained. Note the following:

a) We may compute the total variation of the dependent variable by comparing each value of
Y to the mean value Ȳ and adding all the resulting squared deviations:

[Total variation in Y] = Σyi² = Σ(Yi − Ȳ)²

b) In the same way we define the deviation of the regressed (i.e., estimated from the line)
values Ŷi from the mean value, ŷi = Ŷi − Ȳ. This is the part of the total variation of Yi
which is explained by the regression line. Thus, the sum of the squares of these deviations is
the total variation explained by the regression line:

[Explained variation] = Σŷi² = Σ(Ŷi − Ȳ)²

c) Recall that we have defined the residual ûi as the difference ûi = Yi − Ŷi. This is the part of
the variation of the dependent variable which is not explained by the regression line and is
attributed to the existence of the disturbance variable u. Thus the sum of the squared
residuals gives the total unexplained variation of the dependent variable Y around its mean:

[Unexplained variation] = Σûi² = Σ(Yi − Ŷi)²

Adding the explained and unexplained parts, we obtain

Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²

This shows that the total variation in the observed Y values about their mean value can be
partitioned into two parts, one attributed to the regression line and the other to random forces,
because not all actual Y observations lie on the fitted line. In other words, the total sum of
squares (TSS) is equal to the explained sum of squares (ESS) plus the residual sum of squares
(RSS). Symbolically,

TSS = ESS + RSS

or, in deviation form,

Σyi² = Σŷi² + Σûi²

Total variation = Explained variation + Unexplained variation

Note that because an OLS estimator minimizes the sum of squared residuals (i.e., the
unexplained variation) it automatically maximizes R². Thus maximization of R² as a criterion of
an estimator is formally identical to the least squares criterion. Dividing the decomposition
through by TSS, we obtain

1 = ESS/TSS + RSS/TSS



In terms of the deviations above, this result can be rewritten as

1 = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² + Σûi² / Σ(Yi − Ȳ)²

We now define R² as

R² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = Σŷi² / Σyi² = ESS/TSS

R² measures the proportion of the total variation in Y explained by the regression model.

Two properties of R² may be noted:

i) It is a non-negative quantity. This is because we are dealing with sums of squares.

ii) Its limits are 0 ≤ R² ≤ 1.

In this regard,

a) R² = 1 means a perfect fit, that is, Ŷi = Yi for each i (or, alternatively, Σûi² = 0);

b) R² = 0 means there is no relationship between the regressand and the regressor.

Hence, the closer R² is to 1, the better the fit. For instance, if R² = 0.90, the regression line
gives a good fit to the observed data, since the line explains 90 percent of the total variation of
the dependent variable values around their mean. The remaining 10 percent of the total variation
of the dependent variable is unaccounted for by the regression line and is attributed to the
factors included in the disturbance variable, u.

Note that if we are working with cross-sectional data, an R² value equal to 0.5 may be a good
fit, but for time series data 0.5 may be too low. There is no hard-and-fast rule as to how high R²
should be; generally, however, the higher the value of R², the better the fit. The sketch below
computes R² for the consumption example of Section 2.3.
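As a check, the following sketch computes R² for the consumption example both as ESS/TSS and as 1 − RSS/TSS (the identity holds exactly when the unrounded OLS estimates are used):

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150])
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260])

x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()  # unrounded slope
a = Y.mean() - b1 * X.mean()
Y_hat = a + b1 * X                                # fitted values

TSS = ((Y - Y.mean()) ** 2).sum()      # total sum of squares
ESS = ((Y_hat - Y.mean()) ** 2).sum()  # explained sum of squares
RSS = ((Y - Y_hat) ** 2).sum()         # residual sum of squares

print(ESS / TSS)        # R-squared, about 0.96
print(1 - RSS / TSS)    # identical, via TSS = ESS + RSS
```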



2.6. Precision and Standard Errors

• Any set of regression estimates α̂ and β̂ is specific to the sample used in their
estimation. Recall that the estimators of α and β from the sample are given by

β̂ = (Σxy − n x̄ȳ) / (Σx² − n x̄²)  and  α̂ = ȳ − β̂x̄

• What we need is some measure of the reliability or precision of the estimators α̂ and β̂.
The precision of an estimate is given by its standard error, which can be shown to be
given by

SE(α̂) = s √( Σx² / (n Σ(x − x̄)²) ) = s √( Σx² / (n Σx² − n² x̄²) )

SE(β̂) = s √( 1 / Σ(x − x̄)² ) = s √( 1 / (Σx² − n x̄²) )

where s is the estimated standard deviation of the residuals.

Estimating the Variance of the Disturbance Term

• The variance of the random variable u is given by

Var(u) = E[u − E(u)]², which reduces to Var(u) = E(u²) since E(u) = 0.

• We could estimate this using the average of the squared disturbances:

s² = (1/n) Σu²

• Unfortunately this is not workable, since u is not observable. We can use its sample
counterpart, the residual û:

s² = (1/n) Σû²

But this estimator is a biased estimator of σ².

• An unbiased estimator of σ is given by

s = √( Σû² / (n − 2) )

where Σû² is the residual sum of squares and n is the sample size.

Some Comments on the Standard Error Estimators

1. Both SE(α̂) and SE(β̂) depend on s² (or s). The greater the variance s², the more
   dispersed the errors are about their mean value, and therefore the more dispersed y will
   be about its mean value.
2. The sum of the squares of x about their mean appears in both formulae. The larger the
   sum of squares, the smaller the coefficient variances.
3. The larger the sample size, n, the smaller will be the coefficient variances. n appears
   explicitly in SE(α̂) and implicitly in SE(β̂); it appears implicitly since the sum
   Σ(x − x̄)² runs from t = 1 to n.
4. The term Σx² appears in SE(α̂). The reason is that Σx² measures how far the points are
   away from the y-axis.

Example: How to Calculate the Parameters and Standard Errors

• Assume we have the following data, calculated from a regression of y on a single
variable x and a constant over 22 observations:

Data: Σxy = 830102, n = 22, x̄ = 416.5, ȳ = 86.65, Σx² = 3919654, RSS = 130.6

• Calculations:

β̂ = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35

α̂ = 86.65 − 0.35 × 416.5 = −59.12

• We write ŷ = α̂ + β̂x, i.e.

ŷ = −59.12 + 0.35x

SE(regression): s = √( Σû² / (n − 2) ) = √(130.6 / 20) = 2.55

SE(α̂) = 2.55 × √( 3919654 / (22 × (3919654 − 22 × 416.5²)) ) = 3.35

SE(β̂) = 2.55 × √( 1 / (3919654 − 22 × 416.5²) ) = 0.0079

We now write the results as:

ŷt = −59.12 + 0.35xt
       (3.35)   (0.0079)
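The arithmetic can be reproduced from the summary statistics alone; the sketch below is only a check of the same formulas (small discrepancies with the text come from rounding the slope to 0.35).

```python
import math

# Summary statistics given in the example
sum_xy, n, xbar, ybar = 830102.0, 22, 416.5, 86.65
sum_x2, rss = 3919654.0, 130.6

denom = sum_x2 - n * xbar ** 2                  # sum of squared deviations
beta_hat = (sum_xy - n * xbar * ybar) / denom   # about 0.35
alpha_hat = ybar - beta_hat * xbar              # about -59.1

s = math.sqrt(rss / (n - 2))                    # SE of regression: 2.55
se_alpha = s * math.sqrt(sum_x2 / (n * denom))  # about 3.35
se_beta = s * math.sqrt(1 / denom)              # about 0.0079

print(beta_hat, alpha_hat)
print(s, se_alpha, se_beta)
```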

2.7. Statistical Inferences: Hypothesis Testing


We want to make inferences about the likely population values from the regression parameters.
Example: Suppose we have the following regression results:

ŷ = 20.3 + 0.5091x
     (14.38) (0.2561)

β̂ = 0.5091 is a single (point) estimate of the unknown population parameter β. How "reliable"
is this estimate? The reliability of the point estimate is measured by the coefficient's standard
error.
We can use the information in the sample to make inferences about the population. We will
always have two hypotheses that go together, the null hypothesis (denoted H0) and the alternative
hypothesis (denoted H1).
The null hypothesis is the statement or the statistical hypothesis that is actually being tested. The
alternative hypothesis represents the remaining outcomes of interest.
For example, suppose that, given the regression results above, we are interested in the hypothesis
that the true value of β is in fact 0.5. We would use the notation
H0 : β = 0.5
H1 : β ≠ 0.5
This would be known as a two-sided test.



One-Sided Hypothesis Tests
Sometimes we may have some prior information that, for example, we would expect β > 0.5
rather than β < 0.5. In this case, we would do a one-sided test:
H0 : β = 0.5
H1 : β > 0.5
or we could have had
H0 : β = 0.5
H1 : β < 0.5
There are two ways to conduct a hypothesis test: via the test of significance approach or via the
confidence interval approach.
The Probability Distribution of the Least Squares Estimators
We assume that ut ∼ N(0, σ²). What if the errors are not normally distributed? Will the parameter
estimates still be normally distributed? Yes, if the other assumptions of the classical linear
regression model (CLRM) hold and the sample size is sufficiently large.

• Standard normal variates can be constructed from α̂ and β̂:

(α̂ − α) / √var(α̂) ∼ N(0, 1)  and  (β̂ − β) / √var(β̂) ∼ N(0, 1)

• But var(α̂) and var(β̂) are unknown, so

(α̂ − α) / SE(α̂) ∼ t(n−2)  and  (β̂ − β) / SE(β̂) ∼ t(n−2)
2.7.1. Testing Hypotheses: The Test of Significance Approach

Assume the regression equation is given by

y = α + βx + u

The steps involved in doing a test of significance are:

1. Estimate α̂, β̂ and SE(α̂), SE(β̂) in the usual way.
2. Calculate the test statistic. This is given by the formula



test statistic = (β̂ − β*) / SE(β̂)

where β* is the value of β under the null hypothesis.
3. We need some tabulated distribution with which to compare the estimated test statistics.
Test statistics derived in this way can be shown to follow a t-distribution with n-2
degrees of freedom.
As the number of degrees of freedom increases, we need to be less cautious in our
approach since we can be more sure that our results are robust.
4. We need to choose a "significance level", often denoted α. This is also sometimes called
   the size of the test, and it determines the region where we will reject or not reject the null
   hypothesis that we are testing. It is conventional to use a significance level of 5%,
   although 10% and 1% are also commonly used. The intuitive explanation is that we
   would expect a result as extreme as this, or more extreme, only 5% of the time as a
   consequence of chance alone.
5. Given a significance level, we can determine a rejection region and non-rejection region.
For a 2-sided test:

[Figure 2: The rejection region for a two-sided test at the 5% level: a 95% non-rejection region
in the centre of the distribution, with 2.5% rejection regions in each tail.]

[Figure 3: The rejection region for a one-sided test (upper tail): a 95% non-rejection region and
a 5% rejection region in the upper tail.]

[Figure 4: The rejection region for a one-sided test (lower tail): a 95% non-rejection region and
a 5% rejection region in the lower tail.]


6. Use the t-tables to obtain a critical value or values with which to compare the test
statistic.
7. Finally perform the test. If the test statistic lies in the rejection region then reject the null
hypothesis (H0), else do not reject H0.

A Note on the t and the Normal Distribution


You should all be familiar with the normal distribution and its characteristic “bell” shape. We
can scale a normal variate to have zero mean and unit variance by subtracting its mean and
dividing by its standard deviation.



There is, however, a specific relationship between the t- and the standard normal distribution.
Both are symmetrical and centred on zero. The t-distribution has another parameter, its degrees
of freedom (for the time being, the number of observations minus 2), which we will always know.

What Does the t-Distribution Look Like?

[Figure: the t-distribution compared with the normal distribution. Both are symmetric and
centred on zero, but the t-distribution has fatter tails; as the degrees of freedom increase, it
approaches the normal distribution.]

2.7.2. The Confidence Interval Approach to Hypothesis Testing


An example of its usage: we estimate a parameter, say β̂ = 0.93, and a "95% confidence
interval" to be (0.77, 1.09). This means that we are 95% confident that the interval contains the
true (but unknown) value of β.
Confidence intervals are almost invariably two-sided, although in theory a one-sided interval
can be constructed. The steps are:
1. Calculate α̂, β̂ and SE(α̂), SE(β̂) as before.
2. Choose a significance level α (again the convention is 5%). This is equivalent to
   choosing a (1 − α) × 100% confidence interval, i.e. a 5% significance level = 95%
   confidence interval.
3. Use the t-tables to find the appropriate critical value, which will again have n − 2
   degrees of freedom.
4. The confidence interval is given by (β̂ − tcrit × SE(β̂), β̂ + tcrit × SE(β̂)).
5. Perform the test: if the hypothesised value of β (β*) lies outside the confidence interval,
   then reject the null hypothesis that β = β*; otherwise, do not reject the null.
Confidence Intervals Versus Tests of Significance



• Note that the test of significance and confidence interval approaches always give the same answer. Under the test of significance approach, we would not reject H0: β = β* if the test statistic lies within the non-rejection region, i.e. if

    −t_crit ≤ (β̂ − β*) / SE(β̂) ≤ +t_crit

• Rearranging, we would not reject if

    β̂ − t_crit × SE(β̂) ≤ β* ≤ β̂ + t_crit × SE(β̂)

• But this is just the rule under the confidence interval approach.
Example

• Using the regression results above,

    ŷ = 20.3 + 0.5091x,   n = 22
        (14.38)  (0.2561)

with standard errors given in parentheses.
• Using both the test of significance and confidence interval approaches, test the hypothesis that β = 1 against a two-sided alternative.
• The first step is to obtain the critical value. We want t_crit = t(20; 5%).
Determining the Rejection Region

Figure: rejection regions for the two-sided 5% test, 2.5% in each tail, with critical values −2.086 and +2.086.

Performing the Test


• The hypotheses are:

    H0: β = 1
    H1: β ≠ 1
Test of significance approach

    test statistic = (β̂ − β*) / SE(β̂) = (0.5091 − 1) / 0.2561 = −1.917

Do not reject the null hypothesis (H0), since the test statistic lies within the non-rejection region.

Confidence interval approach

    (β̂ − t_crit × SE(β̂), β̂ + t_crit × SE(β̂))
    = (0.5091 − 2.086 × 0.2561, 0.5091 + 2.086 × 0.2561)
    = (−0.0251, 1.0433)

Since 1 lies within the confidence interval, do not reject the null hypothesis (H0).
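
To make the mechanics concrete, here is a minimal sketch in Python (scipy is an assumption about tooling; the numbers are those of the example above), showing that the two approaches coincide:

```python
# Reproduce the example: beta_hat = 0.5091, SE = 0.2561, n = 22, H0: beta = 1.
from scipy import stats

beta_hat, se_beta, n = 0.5091, 0.2561, 22
beta_star = 1.0                                  # hypothesised value under H0

t_stat = (beta_hat - beta_star) / se_beta        # -1.917
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)     # 2.086 for a 5% two-sided test

# Test of significance approach: reject if |t_stat| > t_crit
print(abs(t_stat) > t_crit)                      # False -> do not reject H0

# Confidence interval approach gives the same answer
ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
print(ci)                                        # (-0.0251, 1.0433); 1 lies inside
```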
Changing the Size of the Test
But note that we looked at only a 5% size of test. In marginal cases (e.g. H0: β = 1 here), we may get a completely different answer if we use a different size of test. This is where the test of significance approach is better than a confidence interval. For example, say we wanted to use a 10% size of test. Using the test of significance approach,

    test statistic = (β̂ − β*) / SE(β̂) = (0.5091 − 1) / 0.2561 = −1.917

as above. The only thing that changes is the critical t-value.
Changing the Size of the Test: The New Rejection Regions

Figure: rejection regions for the two-sided 10% test, 5% in each tail, with critical values −1.725 and +1.725.



t(20; 10%) = 1.725. So now, as the test statistic lies in the rejection region, we would reject the null hypothesis (H0).
If we reject the null hypothesis at the 5% level, we say that the result of the test is statistically
significant.
The Exact Significance Level or p-value
This is equivalent to choosing an infinite number of critical t-values from tables. It gives us the
marginal significance level where we would be indifferent between rejecting and not rejecting
the null hypothesis.
If the test statistic is large in absolute value, the p-value will be small, and vice versa. The p-
value gives the plausibility of the null hypothesis.
e.g. a test statistic distributed as t(62) takes the value 1.47, with p-value = 0.12.
• Do we reject at the 5% level?...........................No
• Do we reject at the 10% level?.........................No
• Do we reject at the 20% level?.........................Yes
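
In practice the p-value is read off from the t-distribution rather than from tables. A minimal sketch (assuming scipy is available) of the decision rule "reject H0 whenever the p-value is below the significance level":

```python
from scipy import stats

def p_value_two_sided(t_stat, df):
    # Probability under H0 of a test statistic at least this extreme
    return 2 * stats.t.sf(abs(t_stat), df)

p = p_value_two_sided(1.47, 62)
for alpha in (0.05, 0.10, 0.20):
    print(alpha, "reject" if p < alpha else "do not reject")
# -> do not reject at the 5% and 10% levels, reject at the 20% level,
#    matching the answers above
```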
A Special Type of Hypothesis Test: The t-ratio
Recall that the formula for a test of significance approach to hypothesis testing using a t-test was

    test statistic = (β̂ − β*) / SE(β̂)

If the test is H0: β = 0 against H1: β ≠ 0, i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test. Since β* = 0,

    test statistic = β̂ / SE(β̂)

• The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
2.7.3. The Errors That We Can Make Using Hypothesis Tests
We usually reject H0 if the test statistic is statistically significant at a chosen significance
level. There are two possible errors we could make:
1. Rejecting H0 when it was really true. This is called a type I error.
2. Not rejecting H0 when it was in fact false. This is called a type II error.



                                     Reality
                          H0 is true            H0 is false
  Significant
  (Reject H0)             Type I error = α      correct decision
  Insignificant
  (Do not reject H0)      correct decision      Type II error = β

The Trade-off between Type I and Type II Errors


The probability of a type I error is just α, the significance level or size of test we chose. To see this, recall what we said significance at the 5% level meant: it is only 5% likely that a result as extreme as this, or more extreme, could have occurred purely by chance. Note that there is no free lunch here! What happens if we reduce the size of the test (e.g. from a 5% test to a 1% test)? We reduce the chances of making a type I error, but we also reduce the probability that we will reject the null hypothesis at all, so we increase the probability of a type II error:

Reduce size of test → more strict criterion for rejection → reject the null hypothesis less often → less likely to falsely reject (fewer type I errors), but more likely to incorrectly fail to reject (more type II errors).

So there is always a trade-off between type I and type II errors when choosing a significance
level. The only way we can reduce the chances of both is to increase the sample size.

Review Questions

1) Briefly explain the assumptions of classical linear regression models

2) Explain, with the use of equations, the difference between the sample regression function
and the population regression function
3) Differentiate simple and multiple linear regression models

4) Discuss the properties of the least square Estimates

5) Econometrics deals with the measurement of economic relationships which are stochastic
or random. The simplest form of economic relationships between two variables X and Y
can be represented by:

Yi = α + β1Xi + ui

where α and β1 are regression parameters and ui is the stochastic disturbance term. What are the reasons for the insertion of the u-term in the model?

6) The following data refer to the demand for money (M) and the rate of interest (R) for eight different economies:

    M (in billions)   56    50    46    30    20    35    37    61
    R (%)             6.3   4.6   5.1   7.3   8.9   5.3   6.7   3.5

A. Assuming a relationship M = α + βR + ui, obtain the OLS estimates of α and β.
B. Interpret the values of the parameters α and β.
C. If in a ninth economy the rate of interest is R = 8.1, predict the demand for money (M) in that economy.
D. Test the statistical significance of the parameter estimate β (H0: β = 0) at the 1%, 5% and 10% levels of significance.
E. Construct a 95% confidence interval for the true slope coefficient β.
7) Are hypotheses tested concerning the actual values of the coefficients (i.e. β) or their estimated values (i.e. β̂), and why?



Chapter Three
3. The Classical Linear Regression Analysis: Multiple Linear Regression Models

The simple linear regression model implicitly assumed that only one independent variable (X) affects the dependent variable (Y). But economic theory is seldom so simple: a number of other variables are also likely to affect the dependent variable. We therefore need to extend our simple two-variable regression model to cover models involving more than two variables. Adding more variables leads us to multiple linear regression models, that is, models in which the dependent variable, or regressand, Y depends on two or more explanatory variables, or regressors.

The general linear regression model with k explanatory variables is of the form

    Y = α + β1X1 + β2X2 + ... + βkXk + ui

There are K parameters to be estimated (K = k + 1). Clearly the system of normal equations will consist of K equations, in which the unknowns are the parameters α, β1, β2, ..., βk.

The simplest possible multiple regression model is three-variable regression, with one dependent
variable and two explanatory variables.

In this part we shall extend the simple linear regression model to relationships with two
explanatory variables and consequently to relationships with any number of explanatory
variables.

3.1. Models with Two Explanatory Variables

The population regression model with two explanatory variables is given as:

    Y = α + β1X1 + β2X2 + ui

• α is the intercept term, which gives the average value of Y when X1 and X2 are zero.
• β1 and β2 are called the partial slope coefficients, or partial regression coefficients.
• β1 measures the change in the mean value of Y resulting from a unit change in X1, given X2 (i.e. holding the value of X2 constant). Equivalently, β1 measures the direct or net effect of a unit change in X1 on the mean value of Y (net of any effect that X2 may have on the mean of Y). The interpretation of β2 is similar.

To complete the specification of our model we need some assumptions about the random variable u. These assumptions are the same as in the single-explanatory-variable model developed previously. That is:

• Zero mean value of ui, or E(ui | X1, X2) = 0 for each i
• No serial correlation, or cov(ui, uj) = 0 where i ≠ j
• Homoscedasticity, or var(ui) = σ²
• Normality of ui, i.e. ui ~ N(0, σ²)
• Zero covariance between ui and each X variable, or cov(ui, X1) = cov(ui, X2) = 0
• No specification bias, or the model is correctly specified
• No exact collinearity between the X variables, i.e. no exact linear relationship between X1 and X2

The assumption of no collinearity is a new one and means the absence of the possibility of one of the explanatory variables being expressed as a linear combination of the other. Existence of exact linear dependence between X1 and X2 would mean that we effectively have only one independent variable in our model rather than two. If such a regression is estimated, there is no way to estimate the separate influences of X1 (β1) and X2 (β2) on Y, since such a regression gives us only the combined influence of X1 and X2 on Y.

To see this, suppose X2 = 2X1. Then

    Y = α + β1X1 + β2X2 + ui
      = α + β1X1 + β2(2X1) + ui
      = α + (β1 + 2β2)X1 + ui
      = α + δX1 + ui,   where δ = β1 + 2β2

Estimating this regression yields only the combined effect of X1 and X2, represented by δ = β1 + 2β2; there is no possibility of separating their individual effects, which are represented by β1 and β2.

This assumption does not rule out correlations among the explanatory variables; it only requires that the correlations are not exact or perfect, as it is nearly impossible to find two or more (economic) variables that are not correlated to some extent. Likewise, the assumption does not rule out non-linear relationships among the X's.

Having specified our model, we next use sample observations on Y, X1 and X2 to obtain estimates of the true parameters α, β1 and β2:

    Ŷ = α̂ + β̂1X1 + β̂2X2

where α̂, β̂1 and β̂2 are estimates of the true parameters α, β1 and β2 of the relationship. As before, the estimates will be obtained by minimizing the sum of squared residuals,

    Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − α̂ − β̂1X1 − β̂2X2)²

A necessary condition for this expression to assume a minimum value is that its partial derivatives with respect to α̂, β̂1 and β̂2 be equal to zero:

    ∂Σ(Yi − α̂ − β̂1X1 − β̂2X2)² / ∂α̂ = 0
    ∂Σ(Yi − α̂ − β̂1X1 − β̂2X2)² / ∂β̂1 = 0
    ∂Σ(Yi − α̂ − β̂1X1 − β̂2X2)² / ∂β̂2 = 0



Performing the partial differentiations, we get the following system of three normal equations in the three unknown parameters α̂, β̂1 and β̂2:

    ΣYi = nα̂ + β̂1ΣX1 + β̂2ΣX2
    ΣX1Yi = α̂ΣX1 + β̂1ΣX1² + β̂2ΣX1X2
    ΣX2Yi = α̂ΣX2 + β̂1ΣX1X2 + β̂2ΣX2²

From the solution of this system (by any method, for example using determinants) we obtain values for α̂, β̂1 and β̂2. Equivalently, solving the system of normal equations in deviation form,

    Σx1yi = β̂1Σx1² + β̂2Σx1x2
    Σx2yi = β̂1Σx1x2 + β̂2Σx2²

yields the following formulae, in which the variables are expressed as deviations from their means, for the parameter estimates:

    α̂ = Ȳ − β̂1X̄1 − β̂2X̄2

    β̂1 = [(Σx1yi)(Σx2²) − (Σx2yi)(Σx1x2)] / [(Σx1²)(Σx2²) − (Σx1x2)²]

    β̂2 = [(Σx2yi)(Σx1²) − (Σx1yi)(Σx1x2)] / [(Σx1²)(Σx2²) − (Σx1x2)²]

where x1 = X1 − X̄1, x2 = X2 − X̄2 and yi = Yi − Ȳ.
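
A minimal numpy sketch of these formulae (the data below are hypothetical, purely to illustrate the computation):

```python
import numpy as np

# Hypothetical observations on Y, X1 and X2
Y  = np.array([10., 12., 15., 14., 18., 20.])
X1 = np.array([ 2.,  3.,  4.,  4.,  6.,  7.])
X2 = np.array([ 1.,  1.,  2.,  3.,  3.,  4.])

# Deviations from the means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

den   = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1hat = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / den
b2hat = ((x2 @ y) * (x1 @ x1) - (x1 @ y) * (x1 @ x2)) / den
ahat  = Y.mean() - b1hat * X1.mean() - b2hat * X2.mean()

print(ahat, b1hat, b2hat)
```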

3.2. Testing Multiple Hypotheses: The F-test



We used the t-test to test single hypotheses, i.e. hypotheses involving only one coefficient. But what if we want to test more than one coefficient simultaneously? We do this using the F-test, which involves estimating two regressions.

• The unrestricted regression is the one in which the coefficients are freely determined by the data, as we have done before.
• The restricted regression is the one in which the coefficients are restricted, i.e. restrictions are imposed on some of the βs.
The F-test: Restricted and Unrestricted Regressions
• Example
The general regression is

    Y = α + β1x1 + β2x2 + β3x3 + u

We want to test the restriction that β2 + β3 = 1 (we have some hypothesis from theory which suggests that this would be an interesting hypothesis to study). The unrestricted regression is the above equation, but what is the restricted regression?

    Y = α + β1x1 + β2x2 + β3x3 + u   subject to β2 + β3 = 1

We substitute the restriction (β2 + β3 = 1) into the regression so that it is automatically imposed on the data:

    β2 + β3 = 1  ⇒  β3 = 1 − β2
    Y = α + β1x1 + β2x2 + (1 − β2)x3 + u
    Y = α + β1x1 + β2x2 + x3 − β2x3 + u

• Gather terms in the βs together and rearrange:

    Y − x3 = α + β1x1 + β2(x2 − x3) + u

• This is the restricted regression. We actually estimate it by creating two new variables, call them, say, P and Q:

    P = Y − x3
    Q = x2 − x3

So P = α + β1x1 + β2Q + u is the restricted regression we actually estimate.
Calculating the F-Test Statistic
The test statistic is given by



    test statistic = [(RRSS − URSS) / URSS] × [(N − k) / m]

where URSS = RSS from the unrestricted regression,
      RRSS = RSS from the restricted regression,
      m = number of restrictions,
      N = number of observations,
      k = number of regressors in the unrestricted regression, including a constant (i.e. the total number of parameters to be estimated).
The test statistic follows the F-distribution, which has two degrees-of-freedom (d.f.) parameters.
• The values of the degrees-of-freedom parameters are m and (N − k) respectively (the order of the d.f. parameters is important).
• The appropriate critical value will be in column m, row (N − k). The F-distribution takes only positive values and is not symmetrical. We therefore only reject the null if the test statistic > critical F-value.
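
A minimal sketch of the F-test mechanics (assuming scipy is available; the RSS values and dimensions below are hypothetical):

```python
from scipy import stats

def f_test(rrss, urss, n, k, m, alpha=0.05):
    """F-test of m linear restrictions, with k parameters (incl. the constant)
    in the unrestricted model and n observations."""
    f_stat = ((rrss - urss) / urss) * ((n - k) / m)
    f_crit = stats.f.ppf(1 - alpha, dfn=m, dfd=n - k)  # column m, row (N - k)
    return f_stat, f_crit, f_stat > f_crit             # True -> reject H0

print(f_test(rrss=120.0, urss=100.0, n=50, k=4, m=2))
```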
Determining the Number of Restrictions in an F-test
• Examples:

    H0: hypothesis                         No. of restrictions, m
    β1 + β2 = 2                            1
    β1 = 1 and β2 = −1                     2
    β1 = 0, β2 = 0 and β3 = 0              3

If the model is Y = α + β1x1 + β2x2 + β3x3 + u, then the null hypothesis H0: β1 = 0 and β2 = 0 and β3 = 0 is tested by the regression F-statistic. It tests the null hypothesis that all of the coefficients except the intercept coefficient are zero. Note the form of the alternative hypothesis for all tests when more than one restriction is involved: H1: β1 ≠ 0 or β2 ≠ 0 or β3 ≠ 0.

We cannot test within this framework hypotheses which are not linear or which are multiplicative, e.g. H0: β2β3 = 2 or H0: β2² = 1 cannot be tested.
Review Questions
1. Explain, with the use of equations, the difference between the simple linear regression
function and the multiple linear regression function



2. Which of the following hypotheses about the coefficients can be tested using a t-test?
Which of them can be tested using an F-test? In each case, state the number of
restrictions

H0 : β3 = 2

H0 : β3 + β4 = 1

H0 : β3 + β4 = 1 and β5 = 1

H0 : β2 = 0 and β3 = 0 and β4 = 0 and β5 = 0

H0 : β2β3 = 1

3. Which would you expect to be bigger – the unrestricted residual sum of squares or the
restricted residual sum of squares, and why?
4. What are the most common units of R2?



Chapter Four

4. Classical Linear Regression Model Assumptions and Diagnostics: Violation of the Assumptions of the CLRM
4.1. Introduction

Recall that in the classical model we have assumed:

• zero mean of the random term;
• constant variance of the error term (i.e., the assumption of homoscedasticity);
• no autocorrelation of the error term;
• normality of the error term;
• no multicollinearity among the explanatory variables.

We will now study these assumptions further, and in particular look at:
• how we test for violations;
• causes;
• consequences.

In general we could encounter any combination of three problems:
• the coefficient estimates are wrong;
• the associated standard errors are wrong;
• the distribution that we assumed for the test statistics is inappropriate.

Solutions: either respecify or transform the model so that the assumptions are no longer violated, or work around the problem by using alternative techniques which are still valid.
It was on the basis of these assumptions that we estimated the model and tested its significance. But what would be the implication if some or all of these assumptions were violated? That is, if the assumptions are not fulfilled, what will be the outcome? In this unit we discuss violations of the more important assumptions.



4.2. Assumption 1: The mean of the disturbance is zero (E(u) = 0)

This assumption is imposed by the stochastic nature of economic relationships, which would otherwise be impossible to estimate with the common rules of mathematics. The assumption implies that the observations of Y and X must be scattered around the line in a random way (and hence the estimated line Ŷ = α̂ + β̂1X is a good approximation of the true line). This defines the relationship connecting Y and X 'on the average'. The alternative possible assumptions are either E(u) > 0 or E(u) < 0. Assume that for some reason the u's did not average zero, but tended mostly to be positive. This would imply that the observations of Y and X would lie above the true line.

It can be shown that by using these observations we would get a bad estimate of the true line. If
the true line lies below or above the observations, the estimated line would be biased

Note that there is no test for the verification of this assumption because the assumption E(u) = 0
is forced upon us if we are to establish the true relationship. That is, we set E(u) = 0 at the outset
of our estimation procedure. Its plausibility should be examined in each particular case on a
priori grounds. In any econometric application we must be sure that the following things are
fulfilled so as to be safe from violating the assumption of E(u) = 0

 All the important variables have been included into the function.
 There are no systematically positive or systematically negative errors of measurement in
the dependent variable.
4.3. Assumption 2: The assumption of homoscedasticity (Var(ut) = σ² < ∞)

The assumption of homoscedasticity (or constant variance) about the random variable u is that its probability distribution remains the same over all observations of X, and in particular that the variance of each ui is the same for all values of the explanatory variable. Symbolically,

    Var(u) = σ²

I. The Nature of Heteroscedasticity

If the errors do not have a constant variance, we say that they are heteroscedastic. If σu² is not constant but depends on X, we may write σui² = f(Xi). A common form of heteroscedasticity assumed in econometric applications is that the variance of u increases as X increases: the larger an independent variable, the larger the variance of the associated disturbance. Various examples can be given in support of this argument. For instance, if consumption is a function of the level of income, at higher levels of income (the independent variable) there is greater scope for the consumer to act on whims and deviate by larger amounts from the specified consumption relationship. The following diagram depicts this case.

Figure: Increasing variance of u (consumption plotted against income: the dispersion of the observations about the line is small at low income and large at high income).

Furthermore, suppose we have a cross-section sample of family budgets from which we want to measure the savings function, i.e. Saving = f(income). In this case the assumption of constant variance of the u's is not appropriate, because high-income families show much greater variability in their saving behaviour than do low-income families. Families with high income tend to stick to a certain standard of living, and when their income falls they cut down their savings rather than their consumption expenditure. This is not the case in low-income families. Hence the variance of the ui's increases as income increases.

Note, however, that heteroscedasticity is more a problem of cross-sectional data than of time series data; that is, the problem is more serious in cross-section data.

II. Causes of Heteroscedasticity

Heteroscedasticity can arise for several reasons. The first is the presence of outliers (i.e., extreme values compared to the majority of a variable). The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of regression analysis. With outliers it would be hard to maintain the assumption of homoscedasticity.

Another source of heteroscedasticity is violation of the assumption that the regression model is correctly specified. Very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model. In such situations the residuals obtained from the regression may give the distinct impression that the error variance is not constant. But if the omitted variables are included in the model, that impression may disappear.

In summary, there are a priori reasons to believe that the assumption of homoscedasticity may often be violated in practice. It is therefore important to examine the consequences of heteroscedasticity.

III. The Consequences of Heteroscedasticity

If the assumption of homoscedastic disturbances is not fulfilled, we have the following consequences:

a. If u is heteroscedastic, the OLS estimates do not have the minimum variance property in the class of unbiased estimators; that is, they are inefficient in small samples. Furthermore, they are inefficient in large samples.
b. The coefficient estimates are still statistically unbiased; that is, the expected values of the β̂'s equal the true parameters: E(β̂i) = βi.
c. The prediction (of Y for a given value of X) would be inefficient because of high variance. This is because the variance of the prediction includes the variances of u and of the parameter estimates, which are not minimal due to the incidence of heteroscedasticity.

In any case, how does one detect whether the problem really exists?

IV. Detecting heteroscedasticity

The usual first step in attacking this problem is to determine whether or not heteroscedasticity actually exists. There are several tests for this, based on examination of the OLS residuals ûi. Of the many formal tests, we will discuss the Goldfeld-Quandt test and White's test.



A. The Goldfeld-Quandt (GQ) test is carried out as follows.

This test is applicable to large samples. The number of observations must be at least twice the number of parameters to be estimated. The test assumes normality and a serially independent disturbance term ui. Consider the following:

    Yi = α + β1X1i + β2X2i + … + βkXki + ui

Suppose the test is to assess whether heteroscedasticity exists or not. The hypotheses to be tested are:

    H0: the ui's are homoscedastic
    H1: the ui's are heteroscedastic (with increasing variance)

To test this, Goldfeld and Quandt propose the following steps:

1. Split the total sample of length N into two sub-samples of lengths N1 and N2. The regression model is estimated on each sub-sample and the two residual variances are calculated.
2. The null hypothesis is that the variances of the disturbances are equal, H0: σ1² = σ2².
3. The test statistic, denoted GQ, is simply the ratio of the two residual variances, where the larger of the two variances must be placed in the numerator:

    GQ = S1² / S2²

4. The test statistic is distributed as F(N1 − k, N2 − k) under the null of homoscedasticity, and the null of constant variance is rejected if the test statistic exceeds the critical value.
5. If GQ > F(N1 − k, N2 − k) we accept that there is heteroscedasticity (that is, we reject the null hypothesis of no difference between the variances of the u's in the two sub-samples). If GQ < F(N1 − k, N2 − k), we accept that the u's are homoscedastic (in other words, we do not reject the null hypothesis). The higher the observed ratio, the stronger the heteroscedasticity of the u's.
6. A problem with the test is that the choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.
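
A minimal sketch of the GQ computation (assuming scipy is available; the residual sums of squares from the two sub-sample regressions are taken as given, and the numbers below are hypothetical):

```python
from scipy import stats

def goldfeld_quandt(rss1, rss2, df1, df2, alpha=0.05):
    s1, s2 = rss1 / df1, rss2 / df2          # residual variances of the sub-samples
    if s2 >= s1:                             # larger variance in the numerator
        gq, dfn, dfd = s2 / s1, df2, df1
    else:
        gq, dfn, dfd = s1 / s2, df1, df2
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
    return gq, f_crit, gq > f_crit           # True -> reject homoscedasticity

print(goldfeld_quandt(rss1=40.0, rss2=95.0, df1=24, df2=24))
```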
B. Detection of Heteroscedasticity using White’s Test

White’s general test for heteroscedasticity is one of the best approaches because it makes few
assumptions about the form of the heteroscedasticity.

The test is carried out as follows:

1) Assume that the regression we carried out is as follows:

    Yi = α + β1X1i + β2X2i + ui

and we want to test Var(ui) = σ². We estimate the model, obtaining the residuals û.

2) Then run the auxiliary regression (regress the squared residuals on a constant, the original regressors, the original regressors squared and, if there are enough data, the cross-products of the X's) and get R²:

    û² = α + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2 + v

3) Obtain R² from the auxiliary regression and multiply it by the number of observations, N. It can be shown that

    N·R² ~ χ²(k)

where k is the number of regressors in the auxiliary regression, excluding the constant term.

4) If the χ² test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic.
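
A minimal numpy/scipy sketch of these steps, assuming the OLS residuals and the two regressors are already available as 1-D arrays:

```python
import numpy as np
from scipy import stats

def white_test(u_hat, x1, x2, alpha=0.05):
    n = len(u_hat)
    # Auxiliary regressors: constant, levels, squares and cross-product
    Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
    y = u_hat**2
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    lm = n * r2                              # N * R^2 from the auxiliary regression
    k = Z.shape[1] - 1                       # regressors excluding the constant
    chi2_crit = stats.chi2.ppf(1 - alpha, k)
    return lm, chi2_crit, lm > chi2_crit     # True -> reject homoscedasticity
```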
C. Remedial Measures: Solutions for Heteroscedastic Disturbances

As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but the estimators are no longer efficient, not even asymptotically (i.e., in large samples). This lack of efficiency makes the usual hypothesis testing procedures of dubious value. Remedial measures are therefore clearly called for. When heteroscedasticity is established on the basis of a test, the appropriate solution is to transform the original model in such a way that the transformed disturbance term has constant variance. We may then apply the method of classical least squares to the transformed model. The adjustment of the model depends on the particular form of heteroscedasticity; the transformation rests on plausible assumptions about the heteroscedasticity pattern.

Given the model Yi = α + β1Xi + ui:

1. If the form (i.e. the cause) of the heteroscedasticity is known, then we can use an estimation method which takes this into account (called generalised least squares, GLS). A simple illustration of GLS is as follows. Suppose that the error variance is related to another variable, say proportional to Xi². That is,

    E(ui²) = σ²Xi²

If, as a matter of 'speculation' or on the basis of graphical methods, it is believed that the variance of ui is proportional to the square of the explanatory variable X, one may transform the original model by dividing through by Xi to obtain

    Yi/Xi = α/Xi + β1 + ui/Xi
          = α(1/Xi) + β1 + Vi

where Vi is the transformed disturbance term, equal to ui/Xi. Now it is easy to verify that

    E(Vi²) = E(ui/Xi)² = (1/Xi²)·E(ui²)

This is because by definition we assumed that E(ui²) = σ²Xi². Substituting this into the above result, we obtain

    E(Vi²) = (1/Xi²)·(σ²Xi²) = σ²

Thus the variance of Vi is homoscedastic, and one may proceed to apply OLS to the transformed equation. Notice that in the transformed regression the intercept term β1 is the slope coefficient of the original equation, and the slope coefficient α (the coefficient on 1/Xi) is the intercept term of the original model. Therefore, to get back to the original model we shall have to multiply the estimated equation through by Xi.
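
A minimal numpy sketch of this transformation: divide the whole model through by Xi and run OLS on the transformed variables:

```python
import numpy as np

def gls_divide_by_x(Y, X):
    """Assumes E(u_i^2) = sigma^2 * X_i^2, with all X_i != 0."""
    y_star = Y / X
    # Transformed regressors: 1/X_i (whose coefficient is alpha)
    # and a constant (whose coefficient is beta_1)
    Z = np.column_stack([1.0 / X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    alpha_hat, beta1_hat = coef
    return alpha_hat, beta1_hat
```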

2. Given the model Yi = α + β1Xi + ui, suppose instead that we assume the error variance to be proportional to Xi. That is,

    E(ui²) = σ²Xi

This calls for a square-root transformation. The original model is transformed by dividing through by √Xi:

    Yi/√Xi = α/√Xi + β1·Xi/√Xi + ui/√Xi
           = α(1/√Xi) + β1√Xi + Vi

where Vi = ui/√Xi and Xi > 0. Given this assumption, one can readily verify that E(Vi²) = σ², a homoscedastic situation. That is,

    Var(Vi) = E(Vi²) = E(ui/√Xi)² = (1/Xi)·E(ui²) = (1/Xi)·σ²Xi = σ²

Therefore, one may proceed to apply OLS to the transformed equation. Note an important feature of the transformed model: it has no intercept term. One will therefore have to use the regression-through-the-origin model to estimate α and β1. Having run the regression on the transformed model, one can get back to the original model simply by multiplying through by √Xi.

3. A log transformation such as

    ln Yi = α + β1 ln Xi + ui

very often reduces heteroscedasticity when compared with the regression Yi = α + β1Xi + ui. This is because the log transformation compresses the scales in which the variables are measured. For example, it reduces a ten-fold difference between two values (such as between 8 and 80) into roughly a two-fold difference (because ln 80 ≈ 4.38 and ln 8 ≈ 2.08).

To conclude, the remedial measures explained above through transformation show that we are essentially speculating about the nature of σi². Note also that the OLS estimators obtained from the transformed equation are BLUE. Which of the transformations will work depends on the nature of the problem and the severity of the heteroscedasticity. Moreover, in a multiple regression model we may not know a priori which of the X variables should be chosen for transforming the data. In addition, the log transformation is not applicable if some of the Y and X values are zero or negative. Furthermore, t-tests, F-tests, etc. are valid only in large samples when the regression is conducted on transformed variables.

4.4. Assumption 3: The Assumption of No Autocorrelation


1) The Nature of Autocorrelation



An important assumption of the classical linear model is that there is no autocorrelation or serial correlation among the disturbances ui entering into the population regression function. This assumption implies that the covariance of ui and uj is equal to zero. That is,

    Cov(ui, uj) = 0 for i ≠ j

If this assumption is violated, the disturbances are said to be autocorrelated. This can arise for several reasons.

i) Spatial autocorrelation: in regional cross-section data, a random shock affecting economic activity in one region may cause economic activity in an adjacent region to change because of close economic ties between the regions. Shocks due to weather similarities might also tend to cause the error terms of adjacent regions to be related.
ii) Prolonged influence of shocks: in time series data, random shocks (disturbances) have effects that often persist over more than one time period. An earthquake, flood, strike or war, for example, will probably affect the economy's operation in subsequent periods.
iii) Inertia: past actions often have a strong effect on current actions, so that a positive disturbance in one period is likely to influence activity in succeeding periods.
iv) Data manipulation: published data often undergo interpolation or smoothing, procedures that average true disturbances over successive time periods.
v) Misspecification: an omitted relevant independent variable that is autocorrelated will make the disturbance (associated with the misspecified model) autocorrelated. An incorrect functional form or a misspecification of the equation's dynamics could do the same. In these instances the appropriate procedure is to correct the misspecification.

Note that autocorrelation is a special case of correlation. Autocorrelation refers to the relationship not between two (or more) different variables, but between successive values of the same variable (here we are particularly interested in the autocorrelation of the u's). The terms autocorrelation and serial correlation are treated synonymously.



Positive autocorrelation is indicated by a cyclical residual plot over time.

Figure: Positive Autocorrelation (û_t plotted against time, and û_t plotted against û_{t-1})

Negative autocorrelation is indicated by an alternating pattern, where the residuals cross the time axis more frequently than if they were distributed randomly.

Figure: Negative Autocorrelation (û_t plotted against time, and û_t plotted against û_{t-1})

No pattern in residuals, i.e. no autocorrelation: no pattern in the residuals at all is what we would like to see.

Figure: No Autocorrelation (û_t plotted against time, and û_t plotted against û_{t+1})

Consequences of Autocorrelation

When the disturbance term exhibits serial correlation, the values as well as the standard errors of the parameter estimates are affected.

i) If the disturbances are correlated, the previous value of the disturbance carries information about the current disturbance. If this information is ignored, the sample data are clearly not being used with maximum efficiency. However, the parameter estimates remain statistically unbiased even when the residuals are serially correlated; that is, the OLS parameter estimates are unbiased in the sense that their expected values equal the true parameters.
ii) The variance of the random term u may be seriously underestimated. In particular, the underestimation of the variance of u will be more serious in the case of positive autocorrelation of the error term ut. With positive first-order autocorrelated errors, fitting an OLS line can give an estimate quite wide of the mark. The high variation in these estimates will cause the variance of the β̂'s to be greater than it would have been had the errors been distributed randomly.
iii) Predictions based on ordinary least squares estimates will be inefficient with autocorrelated errors, because they have a larger variance than predictions based on estimates obtained from other econometric techniques. Recall that the variance of the forecast depends on the variances of the coefficient estimates and the variance of u. Since these variances are not minimal compared with other techniques, the standard error of the (OLS) forecast will not have the least value, due to the autocorrelated u's.
2) Detecting Autocorrelation

Autocorrelation is potentially a serious problem. Hence, it is essential to find out whether autocorrelation exists in a given situation. Consider the following commonly used tests of serial correlation.

Note that since the population disturbances ut cannot be observed directly, we use their proxy, the residuals ût, obtained from the usual OLS procedure. Examination of the ût can provide useful information not only about autocorrelation but also about heteroscedasticity, model inadequacy, or specification bias.

A. Durbin-Watson d Test

The most celebrated test for detecting serial correlation is the one developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d-statistic, defined as

    DW = Σ_{t=2}^{n} (û_t − û_{t-1})² / Σ û_t²

which is simply the ratio of the sum of squared differences in successive residuals to the residual sum of squares, RSS. Note that in the numerator of the d-statistic the number of observations is n − 1, because one observation is lost in taking successive differences. Expanding the above formula allows us to obtain

    DW ≈ 2(1 − ρ̂)

where ρ̂ is the estimated correlation coefficient between successive residuals. Since ρ̂ is a correlation,

    −1 ≤ ρ̂ ≤ 1

Although they are not checked routinely, it is important to note the assumptions underlying the d-statistic:


a) the regression model includes an intercept term;
b) the explanatory variables are non-stochastic, or fixed in repeated sampling;
c) the disturbances ut are generated by the first-order autoregressive scheme

    u_t = ρu_{t-1} + ε_t

d) the regression model does not include lagged value(s) of the dependent variable as explanatory variables;
e) there are no missing observations in the data.
Note from the Durbin-Watson statistic that for positive autocorrelation (ρ > 0), successive disturbance values will tend to have the same sign, and the quantities (û_t − û_{t-1})² will tend to be small relative to the squares of the actual values of the disturbances. We can therefore expect the value of DW to be low. Indeed, for the extreme case ρ = 1 it is possible that û_t = û_{t-1} for all t, so that the minimum possible value of DW is zero. For negative autocorrelation, since positive disturbance values now tend to be followed by negative ones and vice versa, the quantities (û_t − û_{t-1})² will tend to be large relative to the squares of the u's, so the value of DW tends to be high; the extreme case ρ = −1 pushes DW towards its maximum of 4. When ρ = 0 the autoregressive scheme reduces to u_t = ε_t for all t, so that u_t takes on all the properties of ε_t; in particular, it is no longer autocorrelated. Thus in the absence of autocorrelation we can expect DW ≈ 2(1 − ρ̂) to take a value close to 2; when negative autocorrelation is present, a value in excess of 2 and possibly as high as 4; and when positive autocorrelation is present, a value lower than 2, possibly close to zero.

The Durbin-Watson test tests the hypothesis H0: ρ = 0 (implying that the error terms are not autocorrelated with a first-order scheme) against the alternative. However, the sampling distribution of the d-statistic depends on the sample size n, the number of explanatory variables k, and also on the actual sample values of the explanatory variables. Thus the critical values at which we might, for example, reject the null hypothesis at the 5 percent level of significance depend very much on the sample we have chosen. It is impracticable to tabulate critical values for all possible sets of sample values. What is possible, however, is, for given values of n and k, to find upper and lower bounds such that the actual critical values for any set of sample values will fall within these known limits. Tables are available which give these upper and lower bounds for various levels of n and k and for specified levels of significance. (The Durbin-Watson table is given in the appendix.)

The Durbin-Watson test procedure for testing the null hypothesis ρ = 0 against the alternative of positive autocorrelation is illustrated in the figure below.

Under the null hypothesis, the actual sampling distribution of d, for the given n and k and the given sample X values, is shown by the unbroken curve. It is such that 5 percent of the area beneath it lies to the left of the point d*, i.e. P(d < d*) = 0.05. If d* were known, we would reject the null hypothesis at the 5 percent level of significance if, for our sample, d < d*. Unfortunately, for the reason given above, d* is unknown. The broken curves labelled dU and dL represent, for given values of n and k, the upper and lower limits to the sampling distribution of d, within which the actual sampling distribution must lie whatever the sample X values.

Figure: Distribution of dL and dU (broken curves bounding the unknown sampling distribution of d, with critical points d*L, d* and d*U marked on the interval from 0 to 4)

The points d*U and d*L are such that the areas under the respective dU and dL curves to the left of these points are in each case 5 percent of the total area, i.e. P(dL < d*L) = P(dU < d*U) = 0.05. It is the points d*U and d*L, representing the upper and lower bounds to the unknown d*, that are tabulated for varying values of n and k. Clearly, if the sample value of the Durbin-Watson statistic lies to the left of d*L it must also lie to the left of d*, while if it lies to the right of d*U it must also lie to the right of d*. However, there is an inconclusive region: if d lies between d*L and d*U we cannot know whether it lies to the left or right of d*.

The decision criterion for the Durbin-Watson test is therefore of the following form:

- for DW < d*L, reject the null hypothesis of no autocorrelation in favour of positive autocorrelation;
- for DW > d*U, do not reject the null hypothesis, i.e. there is insufficient evidence to suggest positive autocorrelation;
- for d*L < DW < d*U, the test is inconclusive.

Because of the symmetry of the distribution illustrated in the previous figure, it is also possible to use the tables for d*L and d*U to test the null hypothesis of no autocorrelation against the alternative hypothesis of negative autocorrelation, i.e. ρ < 0. The decision criterion then takes the form:

- for DW > 4 − d*L, reject the null hypothesis of no autocorrelation in favour of negative autocorrelation;
- for DW < 4 − d*U, do not reject the null hypothesis, i.e. there is insufficient evidence to suggest negative autocorrelation;
- for 4 − d*U < DW < 4 − d*L, the test is inconclusive.

Note that the tables for d*U and d*L are constructed to facilitate the use of one-tail rather than two-tail tests. The limits of d are 0 and 4, and the regions of the test can be summarised as:

    0 to d*L              reject H0: positive autocorrelation
    d*L to d*U            inconclusive
    d*U to 4 − d*U        do not reject H0
    4 − d*U to 4 − d*L    inconclusive
    4 − d*L to 4          reject H0: negative autocorrelation



From the above presentation we can develop the following rule of thumb: if d is found to be close to 2 in an application, one may assume that there is no first-order autocorrelation, either positive or negative. If d is close to 0, ρ̂ is close to 1, indicating strong positive autocorrelation in the residuals. Similarly, the closer d is to 4 (ρ̂ close to −1), the greater the evidence of negative serial correlation.
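
A minimal numpy sketch computing the d-statistic from a vector of OLS residuals:

```python
import numpy as np

def durbin_watson(u_hat):
    diff = np.diff(u_hat)                    # u_t - u_{t-1}, t = 2..n
    return np.sum(diff**2) / np.sum(u_hat**2)

# Rule of thumb: DW ~ 2 -> no first-order autocorrelation,
# DW near 0 -> strong positive, DW near 4 -> strong negative autocorrelation.
```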

B. The Breusch-Godfrey Test

It is a more general test, for rth-order autocorrelation:

    u_t = ρ1·u_{t-1} + ρ2·u_{t-2} + ρ3·u_{t-3} + ... + ρr·u_{t-r} + v_t,   v_t ~ N(0, σv²)

The null and alternative hypotheses are:

    H0: ρ1 = 0 and ρ2 = 0 and ... and ρr = 0
    H1: ρ1 ≠ 0 or ρ2 ≠ 0 or ... or ρr ≠ 0

The test is carried out as follows:

1. Estimate the linear regression using OLS and obtain the residuals û_t.
2. Regress û_t on all of the regressors from stage 1 (the x's) plus û_{t-1}, û_{t-2}, ..., û_{t-r}, and obtain R² from this regression.
3. It can be shown that (N − r)R² ~ χ²(r).

If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation.
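
A minimal numpy/scipy sketch of the procedure; X is the stage-1 regressor matrix (including the constant column), and missing lagged residuals are set to zero, a common convention:

```python
import numpy as np
from scipy import stats

def breusch_godfrey(u_hat, X, r, alpha=0.05):
    n = len(u_hat)
    # Lagged residuals u_{t-1}, ..., u_{t-r}; zeros where the lag is unavailable
    lags = np.column_stack([np.concatenate([np.zeros(j), u_hat[:-j]])
                            for j in range(1, r + 1)])
    Z = np.column_stack([X, lags])           # stage-1 regressors plus the lags
    coef, *_ = np.linalg.lstsq(Z, u_hat, rcond=None)
    resid = u_hat - Z @ coef
    tss = np.sum((u_hat - u_hat.mean()) ** 2)
    r2 = 1 - resid @ resid / tss
    lm = (n - r) * r2                        # (N - r) R^2 ~ chi-square(r) under H0
    chi2_crit = stats.chi2.ppf(1 - alpha, r)
    return lm, chi2_crit, lm > chi2_crit     # True -> reject "no autocorrelation"
```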

3) Remedies for Autocorrelation


If the form of the autocorrelation is known, we could use a GLS procedure – i.e. an approach that
allows for autocorrelated residuals e.g., Cochrane-Orcutt.

The Cochrane-Orcutt Iterative Procedure



This procedure estimates ρ from the estimated residuals û_t, so that information about the unknown ρ is obtained.

To explain the method, consider the two-variable model

    Y_t = α + β1X_t + u_t

and assume that u_t is generated by the AR(1) scheme, namely

    u_t = ρu_{t-1} + ε_t

Cochrane and Orcutt then recommend the following steps to estimate ρ:

Step 1: Estimate the two-variable model by the standard OLS routine and obtain the residuals û_t.

Step 2: Using the estimated residuals, run the regression

    û_t = ρ̂û_{t-1} + v_t

Step 3: Using ρ̂ obtained from the step 2 regression, run the generalised difference equation

    (Y_t − ρ̂Y_{t-1}) = α(1 − ρ̂) + β1(X_t − ρ̂X_{t-1}) + (u_t − ρ̂u_{t-1})

or

    Y*_t = α* + β*1·X*_t + u*_t

Step 4: Since it is not known a priori that the ρ̂ obtained from step 2 is the best estimate of ρ, substitute the values of α̂* and β̂1* obtained from the step 3 regression into the original regression and obtain the new residuals, say û**_t, as

    û**_t = Y_t − α̂* − β̂1*·X_t

Note that this is easily computed, since Y_t, X_t, α̂* and β̂1* are all known.

Step 5: Now estimate the regression

    û**_t = ρ̂(2)·û**_{t-1} + w_t

where ρ̂(2) is the second-round estimate of ρ.

Since we do not know whether this second-round estimate of ρ is the best estimate, we can go on to a third-round estimate, and so on. That is why the Cochrane-Orcutt method is called iterative. How long should we go on? The general procedure is to stop iterating when the successive estimates of ρ converge (i.e., differ by a very small amount). We then use the chosen ρ̂ to transform the model and apply a form of GLS estimation that minimizes the problem of autocorrelation.
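
A minimal numpy sketch of the iteration for the two-variable model (a simplified reading of the steps above, stopping when successive ρ̂ estimates converge):

```python
import numpy as np

def ols(y, x):
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return y.mean() - b * x.mean(), b        # (intercept, slope)

def cochrane_orcutt(Y, X, tol=1e-6, max_iter=100):
    a, b = ols(Y, X)                         # step 1: OLS on the original model
    rho = 0.0
    for _ in range(max_iter):
        u = Y - a - b * X                                # residuals
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])   # step 2: u_t on u_{t-1}
        y_star = Y[1:] - rho_new * Y[:-1]                # step 3: generalised
        x_star = X[1:] - rho_new * X[:-1]                # difference equation
        a_star, b = ols(y_star, x_star)
        a = a_star / (1 - rho_new)                       # recover the intercept
        if abs(rho_new - rho) < tol:                     # stop on convergence
            return a, b, rho_new
        rho = rho_new
    return a, b, rho
```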

4.5. Assumption 4: Multicollinearity


A. The nature of the problem

One of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity among the regressors included in the regression model. Although the assumption is strictly violated only in the case of exact multicollinearity (an exact linear relationship among some of the regressors), the presence of near multicollinearity (an approximate linear relationship among some of the regressors) leads to estimating problems important enough to warrant treating it as a violation of the CLRM.

Multicollinearity does not depend on any theoretical or actual linear relationship among any of the regressors; it depends on the existence of an approximate linear relationship in the data set at hand. Unlike most other estimating problems, this problem is caused by the particular sample available. Multicollinearity in the data can arise for several reasons. For example, the independent variables may all share a common time trend, one independent variable might be the lagged value of another that follows a trend, some independent variables may have varied together because the data were not collected from a wide enough base, or there could in fact exist some kind of approximate relationship among some of the regressors.

The existence of multicollinearity seriously affects the parameter estimates. Intuitively, when any two explanatory variables change in nearly the same way, it becomes extremely difficult to establish the influence of each regressor on the dependent variable separately. If two explanatory variables change by the same proportion, the influence on the dependent variable of one explanatory variable may be erroneously attributed to the other. Their effects cannot be sensibly investigated, due to the high intercorrelation.

In general, the problem of multicollinearity arises when the individual effects of explanatory variables cannot be isolated and the corresponding parameter magnitudes cannot be determined with the desired degree of precision. Though it is quite frequent in cross-section data as well, it tends to be a more common and more serious problem in time series data.

B. Consequences of Multicollinearity

In the case of near or high multicollinearity, one is likely to encounter the following consequences:

i) Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult. This is clearly seen from the formula for the variance of the estimators. For example, in multiple linear regression, Var(β̂1) can be written as

    Var(β̂1) = σ² / [Σx1i²(1 − r12²)]

It is apparent from this formula that as r12 (the coefficient of correlation between X1 and X2) tends towards 1, that is, as collinearity increases, the variance of the estimator increases. The same holds for Var(β̂2) and cov(β̂1, β̂2).

ii) Because of consequence (i), the confidence intervals tend to be much wider, leading to acceptance of the "zero null hypothesis" (i.e., that the true population coefficient is zero).
iii) Because of consequence (i), the t-ratios of one or more coefficients tend to be statistically insignificant.
iv) Although the t-ratios of one or more coefficients are statistically insignificant, R², the overall measure of goodness of fit, can be very high. This is the basic symptom of the problem.
v) The OLS estimators and their standard errors can be sensitive to small changes in the data. That is, when a few observations are added or dropped, the pattern of relationships may change and affect the results.
vi) Forecasting is still possible if the nature of the collinearity remains the same within the new (future) sample observations. That is, if collinearity exists in the data of the past 15 years and is expected to be the same for the future sample period, then forecasting will not be a problem.
C. Detecting Multicollinearity

Note that multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and absence of multicollinearity, but between its various degrees. Multicollinearity is a feature of the sample and not of the population. Therefore, we do not "test for multicollinearity" but can, if we wish, measure its degree in any particular sample. The following are some rules of thumb and formal rules for the detection of multicollinearity.

i) High R² but few significant t-ratios: if R² is high, say in excess of 0.8, the F-test will in most cases reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t-tests will show that none or very few of the partial slope coefficients are statistically different from zero.

ii) High pair-wise correlations among regressors: if the pair-wise correlation coefficient between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem.

iii) Auxiliary regressions: since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each Xi on the remaining X variables and compute the corresponding R², which will help to decide about the problem. For example, consider the auxiliary regression

    Xk = α1X1 + α2X2 + … + α(k−1)X(k−1) + V

If the R² of this regression is high, it implies that Xk is highly correlated with the rest of the explanatory variables, and hence Xk may be dropped from the model.
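
A minimal numpy sketch of the auxiliary-regression check: regress each regressor on all the others and inspect the resulting R² (1/(1 − R²) is the familiar variance inflation factor):

```python
import numpy as np

def auxiliary_r2(X):
    """X: n-by-k matrix of regressors (without the constant column)."""
    n, k = X.shape
    r2 = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2[j] = 1 - resid @ resid / tss      # high value -> X_j nearly collinear
    return r2
```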

D. Remedial Measures

The existence of multicollinearity in a data set does not necessarily mean that the coefficient estimators in which the researcher is interested have unacceptably high variances. Thus, the econometrician should not worry about multicollinearity if the R² from the regression exceeds the R² of any individual independent variable regressed on the other independent variables, nor worry if the t-statistics are all greater than 2. Because multicollinearity is essentially a sample problem, there are no infallible guides. However, one can try the following rules of thumb, the success of which depends on the severity of the collinearity problem.

a) Obtain more data: because multicollinearity is essentially a data problem, additional data that do not contain the multicollinearity feature can solve the problem. For example, in the three-variable model we saw that

    Var(β̂1) = σ² / [Σx1i²(1 − r12²)]

As the sample size increases, Σx1i² will generally increase. Thus, for any given r12, the variance of β̂1 will decrease, decreasing the standard error, which will enable us to estimate β1 more precisely.

b) Drop a variable: when faced with severe multicollinearity, one of the "simplest" things to do is to drop one of the collinear variables. But note that in dropping a variable from the model we may be committing a specification bias or specification error. Specification bias arises from incorrect specification of the model used in the analysis. Thus, if economic theory requires some variable to be included in the model, dropping it because of a multicollinearity problem constitutes specification bias: we would be dropping a variable whose true coefficient in the equation being estimated is not zero.

c) Transformation of variables: in time series analysis, one reason for high multicollinearity between two variables is that over time both tend to move in the same direction. One way of minimizing this dependence is to transform the variables. Suppose

    Yt = α + β1X1t + β2X2t + ut

This relation must also hold at time t − 1, because the origin of time is arbitrary. Therefore we have

    Y(t−1) = α + β1X1(t−1) + β2X2(t−1) + u(t−1)



Subtracting the second equation from the first gives

    Yt − Y(t−1) = β1(X1t − X1(t−1)) + β2(X2t − X2(t−1)) + Vt

where Vt = ut − u(t−1). This is known as the first-difference form, because we run the regression not on the original variables but on the differences of successive values of the variables. The first-difference regression model often reduces the severity of multicollinearity because, although the levels of X1 and X2 may be highly correlated, there is no a priori reason to believe that their differences will also be highly correlated.

d) Formalize relationships among regressors: if it is believed that the multicollinearity arises not from an unfortunate data set but from an actual approximate linear relationship among some of the regressors, this relationship could be formalized, and the estimation could then proceed in the context of a simultaneous-equation estimation problem.
Review Questions
1. True or false? Explain where necessary.
A. In the presence of heteroscedasticity the usual OLS method always overestimates the standard errors of estimators.
B. If a regression model is mis-specified, the OLS residuals will show a distinct pattern.
C. The Durbin-Watson d test assumes that the variance of the error term ut is homoscedastic.
D. In the case of high multicollinearity, it is not possible to assess the individual significance of one or more partial regression coefficients.
2. State in algebraic notation and explain the assumption about the CLRM's disturbances that is referred to by the term 'homoscedasticity'.
3. What would the consequences be for a regression model if the errors were not homoscedastic?
4. How might you proceed if you found that heteroscedasticity were actually the case?
5. What do you understand by the term 'autocorrelation'?
6. An econometrician suspects that the residuals of her model might be autocorrelated. Explain the steps involved in testing this theory using the Durbin-Watson (DW) test.
7. You are given the following data:

   Σû1², based on the first 30 observations = 55, df = 25
   Σû2², based on the last 30 observations = 140, df = 25

Carry out the GQ test of heteroscedasticity at the 5% level of significance.



Appendix

[Statistical tables: upper percentage points of the F distribution, and the Durbin-Watson bounds referred to in the text.]
